100 Years of Math Milestones: The Pi Mu Epsilon Centennial Collection
ISBN 1470436523, 9781470436520


English, 597 pages, 2019


The Pi Mu Epsilon Centennial Collection

Stephan Ramon Garcia

Steven J. Miller



2010 Mathematics Subject Classification. Primary 00A08, 00A30, 00A35, 05-01, 11-01, 30-01, 54-01, 60-01.

For additional information and updates on this book, visit www.ams.org/bookpages/mbk-121

Library of Congress Cataloging-in-Publication Data

Names: Garcia, Stephan Ramon, author. | Miller, Steven J., 1974– author.
Title: 100 years of math milestones : the Pi Mu Epsilon centennial collection / Stephan Ramon Garcia, Steven J. Miller.
Other titles: One hundred years of math milestones | Pi Mu Epsilon centennial collection
Description: Providence, Rhode Island : American Mathematical Society, [2019] | Includes bibliographical references and indexes.
Identifiers: LCCN 2019000982 | ISBN 9781470436520 (alk. paper)
Subjects: LCSH: Mathematics–United States–History. | Pi Mu Epsilon. | AMS: General – General and miscellaneous specific topics – Philosophy of mathematics. msc | General – General and miscellaneous specific topics – Methodology of mathematics, didactics. msc | Combinatorics – Instructional exposition (textbooks, tutorial papers, etc.). msc | Number theory – Instructional exposition (textbooks, tutorial papers, etc.). msc | Functions of a complex variable – Instructional exposition (textbooks, tutorial papers, etc.). msc | General topology – Instructional exposition (textbooks, tutorial papers, etc.). msc | Probability theory and stochastic processes – Instructional exposition (textbooks, tutorial papers, etc.). msc
Classification: LCC QA27.U5 G37 2019 | DDC 510.9–dc23
LC record available at https://lccn.loc.gov/2019000982

Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting for them, are permitted to make fair use of the material, such as to copy select pages for use in teaching or research. Permission is granted to quote brief passages from this publication in reviews, provided the customary acknowledgment of the source is given.

Republication, systematic copying, or multiple reproduction of any material in this publication is permitted only under license from the American Mathematical Society. Requests for permission to reuse portions of AMS publication content are handled by the Copyright Clearance Center. For more information, please visit www.ams.org/publications/pubpermissions. Send requests for translation rights and licensed reprints to [email protected]

© 2019 by the authors. All rights reserved. Printed in the United States of America. ∞ The paper used in this book is acid-free and falls within the guidelines established to ensure permanence and durability. Visit the AMS home page at https://www.ams.org/

Stephan Ramon Garcia dedicates this book to his wife, Gizem Karaali, and their children, Reyhan and Altay. Thanks also go to his parents for their constant support and affection. Steven Miller dedicates this book with thanks to his many colleagues and students who assisted in writing this book, to his in-laws Jeffrey and Judy Gelfand for providing a hospitable environment where many of these entries were written and edited, and to his friends at Pi Mu Epsilon (especially Harold Reiter, a previous editor of the Problem Section) for their support of this project.

Contents

Preface




1913. Paul Erdős


1914. Martin Gardner


1915. General Relativity and the Absolute Differential Calculus


1916. Ostrowski’s Theorem


1917. Morse Theory, but Really Cantor


1918. Georg Cantor


1919. Brun’s Theorem


1920. Waring’s Problem


1921. Mordell’s Theorem


1922. Lindeberg Condition


1923. The Circle Method


1924. The Banach–Tarski Paradox


1925. The Schrödinger Equation


1926. Ackermann’s Function


1927. William Lowell Putnam Mathematical Competition


1928. Random Matrix Theory


1929. Gödel's Incompleteness Theorems


1930. Ramsey Theory


1931. The Ergodic Theorem


1932. The 3x + 1 Problem


1933. Skewes’s Number


1934. Khinchin’s Constant




1935. Hilbert’s Seventh Problem


1936. Alan Turing


1937. Vinogradov’s Theorem


1938. Benford’s Law


1939. The Power of Positive Thinking


1940. A Mathematician’s Apology


1941. The Foundation Trilogy


1942. Zeros of ζ(s)


1943. Breaking Enigma


1944. Theory of Games and Economic Behavior


1945. The Riemann Hypothesis in Function Fields


1946. Monte Carlo Method


1947. The Simplex Method


1948. Elementary Proof of the Prime Number Theorem


1949. Beurling’s Theorem


1950. Arrow’s Impossibility Theorem


1951. Tennenbaum's Proof of the Irrationality of √2


1952. NSA Founded


1953. The Metropolis Algorithm


1954. Kolmogorov–Arnold–Moser Theorem


1955. Roth’s Theorem


1956. The GAGA Principle


1957. The Ross Program


1958. Smale’s Paradox


1959. QR Decomposition


1960. The Unreasonable Effectiveness of Mathematics


1961. Lorenz’s Nonperiodic Flow


1962. The Gale–Shapley Algorithm and the Stable Marriage Problem


1963. Continuum Hypothesis




1964. Principles of Mathematical Analysis


1965. Fast Fourier Transform


1966. Class Number One Problem


1967. The Langlands Program


1968. Atiyah–Singer Index Theorem


1969. Erdős Numbers


1970. Hilbert’s Tenth Problem


1971. Society for American Baseball Research


1972. Zaremba’s Conjecture


1973. Transcendence of e Centennial


1974. Rubik’s Cube


1975. Szemerédi's Theorem


1976. Four Color Theorem


1977. RSA Encryption


1978. Mandelbrot Set


1979. TeX


1980. Hilbert’s Third Problem


1981. The Mason–Stothers Theorem


1982. Two Envelopes Problem


1983. Julia Robinson


1984. 1984


1985. The Jones Polynomial


1986. Sudokus and Look and Say


1987. Primes, the Zeta Function, Randomness, and Physics


1988. Mathematica


1989. PROMYS


1990. The Monty Hall Problem


1991. arXiv


1992. Monstrous Moonshine




1993. The 15-Theorem


1994. AIM


1995. Fermat’s Last Theorem


1996. Great Internet Mersenne Prime Search (GIMPS)


1997. The Nobel Prize of Merton and Scholes


1998. The Kepler Conjecture


1999. Baire Category Theorem


2000. R


2001. Colin Hughes Founds Project Euler


2002. PRIMES is in P


2003. Poincaré Conjecture


2004. Primes in Arithmetic Progression


2005. William Stein Developed Sage


2006. The Strong Perfect Graph Theorem


2007. Flatland


2008. 100th Anniversary of the t-Test


2009. 100th Anniversary of Brouwer’s Fixed-Point Theorem


2010. Carmichael Numbers


2011. 100th Anniversary of Egorov’s Theorem


2012. National Museum of Mathematics


Index of People




Preface

In 2013, the second named author had the honor of succeeding Ashley Ahlin and Harold Reiter as the editor of the Problem Department of the ΠME Journal. This event essentially coincided with the 100th anniversary of Pi Mu Epsilon, so Miller thought it would be fun and appropriate to recognize this milestone in some way. Many others agreed. For example, Mike Pinter, from Belmont University in Nashville, Tennessee, proposed the base-16 celebratory equation PMEMATH + SOCIETY = HUNDRED (which was used in the Spring 2014 issue). Many readers submitted correct solutions, the first being Jessica Lehr of Elizabethtown College. We leave the task of determining all possible solutions as a fun exercise for you.

Being still somewhat young, energetic, and new to the job, while also gravely worried about finding enough good problems for issue after issue (not yet aware of the excellent submissions that would consistently arrive), Miller decided to celebrate with one hundred problems related to important mathematical milestones of the past century. Since one hundred is a large number of problems relative to the normal operation of the Problem Department (there are typically five or six problems per issue), he asked many colleagues for contributions. This resulted in four centennial articles, which appeared in The Pi Mu Epsilon Journal in 2013–2014 (13 (2013), no. 9, 513–534; 13 (2014), no. 10, 577–608; 14 (2014), no. 1, 65–99; and 14 (2014), no. 2, 100–134).

The four articles were well received and there was strong interest in converting them into a book. The first named author came on board early in the process as a collaborator. Every entry was either expanded jointly by us from the four centennial articles or simply written anew. The second option was an essential step in converting the collection from a series of disjointed problems into a unified whole.
We have used the original descriptions as springboards to introduce a variety of mathematical ideas, techniques, and applications. Whenever possible, we have quoted primary sources. Concepts are often introduced early on and then threaded through and expanded upon in later entries. The final result is a tour through much of mathematics, with an emphasis on beauty, big ideas, and interesting problems.

There are several influential collections of problems that have motivated and guided mathematics. Hilbert's problems and the Clay Millennium Problems are notable examples. We have a different emphasis here. Pi Mu Epsilon is an undergraduate mathematics honor society and thus, in addition to being important, the problems must be accessible to students. Although some of them do require analysis or algebra, number theory or probability, as a whole we hope they will be



appealing to energetic and enthusiastic math majors of all stripes. We wanted to create a collection that would motivate people who are still trying to decide what to do with their lives, as well as those who already have.

No list can be complete and there are far too many items to celebrate. This book necessarily misses many old favorites. It is largely a reflection of the personal tastes and inclinations of the two authors. Accessibility counted far more than importance in breaking the many ties, and thus the collection below is well represented with problems that are somewhat recreational but also serve as springboards to great mathematics.

We thank all the people who have helped us over the last several years. This includes the problem proposers; James M. Andrews and Avery T. Carr, who helped edit some of the original collection of problems; Miles C. Fippinger, who helped with some of the initial organization; and Ben Logsdon, who carefully read an early draft. We owe particular gratitude to Zachary Glassman, who made numerous TikZ drawings for some of the earlier entries. We learned many TikZ tricks and techniques from him, without which many of the remaining illustrations would not have been possible. In addition, we are greatly indebted to Yo Akiyama, Katherine Blake, Paula Burkhardt-Guim, Max Chao-Haft, Amina Diop, Alexandre Gueganic, Mark Hay, Bjørn Kjos-Hanssen, Forest Kobayashi, Scott Duke Kominers, Jeffrey Lagarias, David Lee, Clayton Mizgerd, José Muñoz-López, Giebien Na, Carl Pomerance, Harald Schilly, Zachary Siegel, Lily Shao, William A. Stein, Hong Suh, Alexander Summers, James Tener, Gabe Udell, and Hunter Wieman for spotting numerous mistakes, typos, and errors throughout the book or suggesting various improvements to the text. The first named author also thanks Kathy Sheldon for her considerable logistical support.
We were fortunate to work with a terrific staff at AMS (Marcia Almeida, Brian Bartling, John Brady, Sergei Gelfand, Eriko Hironaka, Arlene O'Sean, and Courtney Rose), whose tireless efforts from the start of this project years ago to the careful reading of the final draft greatly enhanced the book before you.

Although we are no longer young or energetic, it has been a fun and enlightening experience working on so many diverse topics and with so many distinguished people. Read on, enjoy, and for those of you who someday aspire to be the Problem Editor for PME, here is some useful advice: start assembling the next hundred problems today!

Stephan Ramon Garcia
Claremont, CA
May 2, 2019

Steven J. Miller
Williams College, Williamstown, MA 01267
Carnegie Mellon University, Pittsburgh, PA 15213
May 2, 2019

Notation

• ∅ – empty set
• |A| – cardinality of a set A
• (. . .)_b – number in base b
• log x – base-e logarithm of x
• log_b x – base-b logarithm of x
• a|b – a divides b
• ⌊x⌋ – greatest integer function
• gcd – greatest common divisor
• lcm – least common multiple
• a ≡ b (mod m) – congruence modulo m
• ∏_{i=1}^{n} a_i – product of a_1, a_2, . . . , a_n
• N – the set {1, 2, 3, . . .} of natural numbers
• Z – the set {. . . , −2, −1, 0, 1, 2, . . .} of integers
• Q – the set of rational numbers
• R – the set of real numbers
• C – the set of complex numbers
• Re z – real part of the complex number z
• Im z – imaginary part of the complex number z
• ≅ – equinumerosity (p. 28)
• f ∼ g – asymptotic equivalence (p. 33)
• π(x) – the number of primes at most x (p. 33)
• Li(x) – (offset) logarithmic integral of x (p. 107)



Paul Erdős

Introduction

How many contacts do you have in your cell phone? How many friends do you have on Facebook? Over the course of his life, Paul Erdős (1913–1996) published over 1,500 mathematical papers with more than 500 different people. These are staggering numbers, and it is fitting to begin with a problem related to him. He worked in many fields, especially in combinatorics and number theory, often using probabilistic methods. The Erdős number (see the problem from 1969 for more details) measures a mathematician's collaborative distance from Erdős; the famous "Six Degrees of Kevin Bacon" game is based upon it.

Erdős is best known for solving difficult problems and making profound conjectures, as opposed to developing new theories. Many conjectures he formulated remain open. Some have small cash prizes associated with them to attract attention and encourage further investigation. One of his most famous conjectures deals with finding arithmetic progressions contained in a given set of integers. An arithmetic progression is a (finite or infinite) sequence of integers, such as 4, 9, 14, 19, 24, whose terms differ by a fixed amount.

Let N = {1, 2, 3, . . .} denote the set of natural numbers, let
A = {n ∈ N : n is divisible by a prime congruent to 3 mod 4},
and let B = N \ A. The first few primes congruent to 3 modulo 4 are
3, 7, 11, 19, 23, 31, 43, 47, 59, 67, 71, 79, 83, 103, 107, 127, 131, 139, . . . ,
so
A = {3, 6, 7, 9, 11, 12, 14, 15, 18, 19, 21, 22, 23, 24, . . .}
and
B = {1, 2, 4, 5, 8, 10, 13, 16, 17, 20, 25, . . .}.
Examining the numbers that are at most 25, we see A contains a lot of arithmetic progressions, from short ones of length three (for example, 7, 11, 15) to long ones of length seven (for example, 3, 6, 9, 12, 15, 18, 21). However, it is harder to find progressions among the elements of B at most 25.
A little work turns up many of length three (for example, 2, 5, 8 or 4, 10, 16 or 1, 13, 25), but we do not have a progression as long as seven. What happens if we look at the full sets A and B? Do you think that there are arithmetic progressions of length ℓ for any finite ℓ? Why might the two sets behave differently?
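A short script can reproduce the sets A and B and measure their longest progressions up to 25 (a sketch; the helper names are ours, not from the text):

```python
def in_A(n):
    """True if n is divisible by some prime congruent to 3 (mod 4)."""
    p, m = 2, n
    while p * p <= m:
        while m % p == 0:
            if p % 4 == 3:
                return True
            m //= p
        p += 1
    return m > 1 and m % 4 == 3   # any leftover factor m > 1 is prime

def longest_ap(s):
    """Length of the longest arithmetic progression contained in the set s."""
    best = 1 if s else 0
    elems = sorted(s)
    for a in elems:
        for d in range(1, max(elems) - a + 1):
            length = 1
            while a + length * d in s:
                length += 1
            best = max(best, length)
    return best

A = {n for n in range(1, 26) if in_A(n)}
B = set(range(1, 26)) - A
print(sorted(A))                      # matches the list for A displayed above
print(sorted(B))                      # matches the list for B displayed above
print(longest_ap(A), longest_ap(B))   # 8 3
```

Up to 25, A in fact contains the eight-term progression 3, 6, . . . , 24, one step longer than the length-seven example above, while B tops out at three terms.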



Centennial Problem 1913
Proposed by Craig Corsi and Steven J. Miller, Williams College.

Erdős suspected that any set of natural numbers that is "not too sparse" contains "lots" of arithmetic progressions. More specifically, he conjectured that if S ⊆ N and the reciprocal sum
∑_{s ∈ S} 1/s
diverges, then S contains arithmetic progressions of any given finite length. Currently $5,000 is offered for the proof of the conjecture.

1913: Comments

More on the Erdős conjecture. It is important to note that the Erdős conjecture is not an "if and only if" statement. A set of natural numbers may contain arbitrarily long arithmetic progressions and have a convergent reciprocal sum. An example is
1, 10, 11, 100, 101, 102, 1000, 1001, 1002, 1003, 1004, 10001, . . . , 10005, . . . .
Erdős's conjecture only asserts that a divergent reciprocal sum is sufficient to ensure the existence of arbitrarily long arithmetic progressions. It is not a necessary condition.

Notable progress on Erdős's problem includes the celebrated Green–Tao theorem (see the 2004 entry), which states that the primes contain arbitrarily long arithmetic progressions. That the sum of the reciprocals of the primes diverges is an old result of Leonhard Euler (1707–1783); see p. 4 for a proof. Even though the Green–Tao theorem is a special case of Erdős's more general conjecture, it is a profound one. It shows that a set of natural numbers as seemingly erratic as the primes enjoys some occasional semblance of regularity.

While the proof of the Green–Tao theorem is beyond the scope of this book, we can look at some famous sequences whose reciprocal sums converge, to see if Erdős's conjecture is reasonable. Two well-known examples are
∑_{n=1}^∞ 1/2^n = 1   and   ∑_{n=1}^∞ 1/n^2 = π^2/6;
see the notes for 1919 for a proof of the second identity. Suppose there is a three-term arithmetic progression in the powers of 2, say 2^a < 2^b < 2^c. Since the two gaps between the three terms are the same, 2^b − 2^a = 2^c − 2^b, or equivalently 2^{b+1} = 2^c + 2^a = 2^a(2^{c−a} + 1). Since b > a, the left-hand side is divisible by a higher power of 2 than the right-hand side, a contradiction. Thus, the longest arithmetic progression in this sequence is of length 2 (which is not impressive). Now for perfect squares.
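Both claims, the impossibility just proved for powers of 2 and the abundance about to be shown for perfect squares, can be checked by brute force (a quick sketch; the helper name is ours):

```python
def three_term_aps(values):
    """Return all three-term arithmetic progressions (x, y, z) whose
    entries are drawn from the given list, with x < y < z."""
    s = set(values)
    hits = []
    for x in values:
        for y in values:
            # The third term of the progression x, y, ... is 2y - x.
            if x < y and 2 * y - x in s:
                hits.append((x, y, 2 * y - x))
    return hits

powers_of_two = [2 ** n for n in range(1, 40)]
print(three_term_aps(powers_of_two))   # [] -- none, exactly as the argument shows

squares = [n * n for n in range(1, 30)]
print(three_term_aps(squares)[0])      # (1, 25, 49)
```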



Arithmetic progressions of perfect squares. Imagine that we have a three-term arithmetic progression of perfect squares, say a^2 < b^2 < c^2. Then there is a common difference d so that
a^2 = b^2 − d   and   c^2 = b^2 + d.
Therefore, a^2 + c^2 = 2b^2, in which a < b < c. If (a, b, c) is one such solution, then so is (ma, mb, mc) for m = 1, 2, . . .. Thus, we only need to find a single primitive solution, in which gcd(a, b, c) = 1, in order to conclude that the set of perfect squares has infinitely many arithmetic progressions of length three. A quick computer search turns up several primitive solutions, such as (1, 5, 7), (7, 13, 17), and (7, 17, 23), which lead to the arithmetic progressions
1, 25, 49 with d = 24,
49, 169, 289 with d = 120,
49, 289, 529 with d = 240.
What about quadruples of perfect squares in arithmetic progressions? See the notes for the 2004 entry for the answer.

Primes in arithmetic progressions. Erdős remarked that one does not have to believe in a supreme being to be a mathematician, but one has to believe in The Book, in which is collected the most elegant "aha" proofs of results. Martin Aigner (1942– ) and Günter M. Ziegler (1963– ) compiled a beautiful approximation of The Book [1]. The first chapter gives six proofs of the infinitude of primes, including the shocking topological proof of Hillel Furstenberg (1935– ); see the 1955 entry.

After proving the infinitude of the primes, it is natural to study primes in arithmetic progressions. This is a fascinating subject and a terrific window into mathematics. For example, consider the following two statements (recall that two integers a and b are relatively prime if gcd(a, b) = 1).
(a) Given two relatively prime natural numbers a and b, there is a prime congruent to b modulo a (that is, there is a prime p such that p − b is a multiple of a).¹
(b) Given two relatively prime natural numbers a and b, there are infinitely many primes congruent to b modulo a.
For instance, if a = 1000 and b = 123, then (a) asserts that there is at least one prime ending in 123 (one such example is 1123). On the other hand, (b) says that there are infinitely many primes ending in 123; this is much more difficult to prove.
Except it is not! The only way that we currently know to prove that there exists a prime congruent to b modulo a is to show there are infinitely many such primes (and hence there must be at least one). Proving that there are infinitely many such primes is difficult; Peter Gustav Lejeune Dirichlet (1805–1859) succeeded in the 1830s by introducing and developing properties of L-functions (generalizations of the Riemann zeta function; see the 1967 entry).

¹If a and b are not relatively prime, there can be at most one prime congruent to b modulo a, and this happens precisely when b is prime. Thus, this case is uninteresting.
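For a = 1000 and b = 123, a brute-force search makes statement (b) plausible (a sketch; the function name is ours, and it is Dirichlet's theorem, not this search, that guarantees the list never ends):

```python
def is_prime(n):
    """Trial-division primality test; fine for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# Primes congruent to 123 (mod 1000), i.e., primes whose last three digits are 123.
found = [n for n in range(123, 100000, 1000) if is_prime(n)]
print(found[0])    # 1123, the example mentioned in the text
print(len(found))  # several more turn up below 100000
```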



Let us look at elementary approaches to finding primes in arithmetic progressions. Euclid's proof of the infinitude of the primes goes like this. If there are only finitely many primes p_1, p_2, . . . , p_n, then division of N = p_1 p_2 · · · p_n + 1 by any p_i leaves a remainder of 1. Thus, any prime factor of N is not on our list, a contradiction. It is natural to ask if this argument extends to arithmetic progressions.

Here is a proof that there are infinitely many primes congruent to 3 (mod 4). If there are only finitely many (say p_1 = 3 and p_2, p_3, . . . , p_n), consider the natural number M = 4 p_2 p_3 · · · p_n + 3. It is not divisible by 2, nor by any prime in our list. Since any product of primes congruent to 1 (mod 4) is congruent to 1 (mod 4), while M ≡ 3 (mod 4), the number M must have a prime factor congruent to 3 (mod 4); that is, M is divisible by one of the primes p_1, p_2, . . . , p_n, a contradiction. How far can this method be pushed? See [5]; since this paper is hard to find, the argument was reproduced in [6].

Sum of the reciprocals of the primes. Most mathematicians suspected that the primes contain arbitrarily long arithmetic progressions. Why? Because they believed in Erdős's conjecture and because Euler proved in 1737 that the sum of the reciprocals of the primes diverges. Their faith was rewarded in 2004 when Ben Green (1977– ) and Terence Tao (1975– ) proved this special case of Erdős's conjecture [3]. While we cannot yet prove Erdős's conjecture, there is a beautiful elementary proof of Euler's result [2] that uses an idea similar to Euclid's proof of the infinitude of the primes.

Let p_n denote the nth prime number and suppose toward a contradiction that ∑_{n=1}^∞ 1/p_n converges. Since the tail end of a convergent series tends to zero, let K be so large that
∑_{j=K+1}^∞ 1/p_j < 1/2.
Let Q = p_1 p_2 · · · p_K and note that none of the numbers Q + 1, 2Q + 1, 3Q + 1, . . . is divisible by any of the primes p_1, p_2, . . . , p_K. Now observe that
∑_{n=1}^N 1/(nQ + 1) ≤ ∑_{m=0}^∞ ( ∑_{j=K+1}^∞ 1/p_j )^m < ∑_{m=0}^∞ (1/2)^m = 2
for N ≥ 1; the first inequality holds because the middle sum, when expanded term by term, includes every term on the left-hand side. This is a contradiction, since the series ∑_{n=1}^∞ 1/(nQ + 1) diverges.

Bibliography

[1] M. Aigner and G. M. Ziegler, Proofs from The Book, 6th ed., corrected reprint of the 1998 original [MR1723092]; including illustrations by Karl H. Hofmann, Springer, Berlin, 2018. MR3823190
[2] J. A. Clarkson, On the series of prime reciprocals, Proc. Amer. Math. Soc. 17 (1966), 541, DOI 10.2307/2035210. MR0188132
[3] B. Green and T. Tao, The primes contain arbitrarily long arithmetic progressions, Ann. of Math. (2) 167 (2008), no. 2, 481–547, DOI 10.4007/annals.2008.167.481. MR2415379
[4] P. Hoffman, The man who loved only numbers: The story of Paul Erdős and the search for mathematical truth, Hyperion Books, New York, 1998. MR1666054



[5] M. Ram Murty, Primes in certain arithmetic progressions, Journal of the Madras University (1988), 161–169.
[6] M. R. Murty and N. Thain, Prime numbers in certain arithmetic progressions, Funct. Approx. Comment. Math. 35 (2006), 249–259, DOI 10.7169/facm/1229442627. MR2271617


Martin Gardner

Introduction

Few twentieth-century mathematical authors have written on such diverse subjects as Martin Gardner (1914–2010), whose books, numbering over seventy, cover not only numerous fields of mathematics but also literature, philosophy, pseudoscience, religion, and magic. He is best known as a recreational mathematician, due to the accessible and entertaining manner in which he wrote. This is an important role and should not be overlooked or minimized, as it both draws people to study mathematics and helps with public awareness and appreciation. In the introduction to his first book of puzzles, Hexaflexagons, Probability Paradoxes, and the Tower of Hanoi, he wrote:

There is not much difference between the delight a novice experiences in cracking a clever brain teaser and the delight a mathematician experiences in mastering a more advanced problem. Both look on beauty bare—that clean, sharply defined, mysterious, entrancing order that underlies all structure.

A philosophy major at the University of Chicago, Gardner worked as a reporter, a yeoman in the Navy, and a writer for a children's magazine before writing his first article for Scientific American in 1956. The publisher enjoyed the article and asked Gardner to turn it into a monthly puzzle column. The column ran for over twenty-five years and spawned fifteen books, reaching and inspiring countless mathematical hobbyists.

There are many problems Gardner popularized that would make excellent additions to this collection. A good problem has to have many features: in addition to being interesting, there should be something wonderful about the solution, something that forces you to stop, smile, and marvel at the beauty of the argument. We decided upon a fun geometry question that meets these criteria. Its solution highlights a powerful method: passing from simple cases to the general case. Before getting to the question, let us look at a couple of examples of this method.

(a) Most students remember that cos(x + y) involves cos x cos y and sin x sin y, but do we add them or subtract? Let us suppose that cos(x + y) = a cos x cos y + b sin x sin y and refine our guess; as long as the formula is of this general shape we can determine a and b from special cases. When investigating special cases, try the simplest. For example, if we take x = y = 0, then we see a = 1. Setting



x = y = π/2 gives −1 = a · 0² + b · 1², so b = −1. We remembered enough of the formula to figure out the rest.¹

(b) The sum of the zeroth powers of the first n positive integers is n. The sum of the first powers is
1 + 2 + ⋯ + n = n²/2 + n/2;
this is a familiar exercise in mathematical induction. In fact, there are similar formulas for sums of squares, cubes, and so forth. In general,
1^k + 2^k + ⋯ + n^k = P_k(n),    (1914.1)


in which P_k(n) is a polynomial in n of degree k + 1 with rational coefficients. If you remember that the sum of the first kth powers has this form, then one can evaluate (1914.1) for n = 0, 1, 2, . . . , k + 1 and solve the resulting linear equations for the unknown coefficients of P_k(n).

(c) The method of undetermined coefficients from differential equations is another example of this technique. When confronted with a differential equation, such as
y″(t) + y(t) = 2te^t,    (1914.2)
one makes an educated guess about the form of a particular solution y_p(t). Frequently, our guess has several undetermined coefficients (hence the name) that we try to find. In (1914.2), we have the unknown function and its second derivative on the left-hand side and a linear polynomial times an exponential on the right. This suggests that we substitute y_p(t) = ate^t + be^t into (1914.2) and attempt to solve for the constants a, b. In this case, solving the resulting linear equations for a and b yields the particular solution y_p(t) = te^t − e^t.

(d) We end with an example from complex analysis. If you have never seen this before, it is a great way to discover the Cauchy–Riemann equations. Consider a complex function f(z), in which z = x + iy and i² = −1. We can write f(z) = u(x, y) + iv(x, y) for two real-valued functions u and v. The partial derivatives of u and v enjoy a simple relationship of the form u_x = av_y and u_y = bv_x (where u_x = ∂u/∂x and similarly for the rest), with one of a, b equal to 1 and the other equal to −1. The difficulty is remembering where the minus sign goes. To figure out the correct signs, take f(z) = z². For this function, f(z) = (x + iy)² = (x² − y²) + i(2xy); thus, u(x, y) = x² − y² and v(x, y) = 2xy. Going through the calculations, we see that u_x = 2x and v_y = 2x, so u_x = v_y. A similar calculation shows that u_y = −v_x.

The idea above can be used to do more than just recover a forgotten formula; it can help us discover something new.
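Item (b) can be made concrete. Treating the k + 2 coefficients of P_k(n) as unknowns and evaluating (1914.1) at n = 0, 1, . . . , k + 1 yields a linear system that can be solved exactly (a sketch; the function name is ours):

```python
from fractions import Fraction

def power_sum_poly(k):
    """Coefficients c_0, ..., c_{k+1} of P_k(n) = 1^k + ... + n^k,
    found by solving the linear system from n = 0, 1, ..., k + 1
    with exact rational Gauss-Jordan elimination."""
    size = k + 2
    # Augmented rows: [n^0, n^1, ..., n^(k+1) | 1^k + ... + n^k].
    rows = []
    for n in range(size):
        target = sum(Fraction(m) ** k for m in range(1, n + 1))
        rows.append([Fraction(n) ** j for j in range(size)] + [target])
    for col in range(size):
        piv = next(r for r in range(col, size) if rows[r][col] != 0)
        rows[col], rows[piv] = rows[piv], rows[col]
        rows[col] = [x / rows[col][col] for x in rows[col]]
        for r in range(size):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[col])]
    return [rows[j][-1] for j in range(size)]

print(power_sum_poly(1))  # coefficients 0, 1/2, 1/2: the formula n/2 + n^2/2
print(power_sum_poly(2))  # coefficients 0, 1/6, 1/2, 1/3
```

The invertibility of the system is guaranteed because the coefficient matrix is a Vandermonde matrix built from distinct sample points.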
If we can show that a solution has to have a certain form, then we can often determine the answer by investigating a special case. This idea comes into play in our problem, where it turns out that a large class of configurations all lead to the same answer. Thus, if we can solve the problem for

¹Along the same lines as the dictum on p. 187, the use of complex numbers provides a more powerful method. Euler's formula implies that cos(x + y) + i sin(x + y) = e^{i(x+y)} = e^{ix}e^{iy} = (cos x + i sin x)(cos y + i sin y) = (cos x cos y − sin x sin y) + i(cos x sin y + sin x cos y). Compare real and imaginary parts to obtain the addition formulas for cosine and sine.




the simplest configuration, then we can solve it for all configurations! Of course, it is often hard to show that all the different possibilities lead to the same answer and that we need only deal with one case. Fortunately, this idea is still useful even if we cannot prove the equivalence since we can use it as a starting point to guess the correct solution.

Centennial Problem 1914
Proposed by Byron Perpetua, Williams College.

The following problem, which Gardner popularized in the 1950s, is classic Gardner: easily stated and solvable without advanced techniques, yet challenging and surprising. Take a solid sphere and drill a cylindrical hole 6 inches long through its center (this means that the height of the cylinder is 6 inches; the caps on the bottom and top, which are removed from the sphere when we drill our hole, are not counted); see Figure 1. What is the remaining volume of the sphere? One approach is tedious and slow; the other is clever and skips several computations. Hint: although the problem seems to be missing necessary information, it would not be posed unless it had a unique solution. While some effort is required to prove that all possible realizations lead to the same answer, there is a particularly simple case that you can solve by inspection.

Figure 1. The sphere with the removed cylinder.

1914: Comments

Solution to the problem. If we have a rough idea of the answer, checking a special case can help us determine it precisely. Let us use this idea to attack the problem from Gardner's column. We therefore assume that the answer is independent of the radius of the given sphere since that information is not given to us. What would be a good choice for the radius of the sphere? An excellent option is to have the diameter of the sphere equal 6, so the volume of the removed cylinder is zero! If instead of choosing the diameter to be 6 we considered the general case, we would have to argue as in Figure 2. This is certainly possible, but it is not fun.
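If we insist on the general case, we can at least verify numerically that the answer does not depend on R. The sketch below uses the standard spherical-cap volume formula πh²(3R − h)/3; the formula and the function name are ours, not from the text:

```python
import math

def remaining_volume(R, hole_length=6.0):
    """Volume left after drilling a hole of the given length through the
    center of a sphere of radius R (requires 2R >= hole_length)."""
    h = hole_length / 2.0                    # half-length of the hole
    r = math.sqrt(R * R - h * h)             # radius of the drilled cylinder
    sphere = 4.0 / 3.0 * math.pi * R ** 3
    cylinder = math.pi * r * r * hole_length
    cap_h = R - h                            # height of each removed cap
    cap = math.pi * cap_h * cap_h * (3 * R - cap_h) / 3.0
    return sphere - cylinder - 2 * cap

for R in (3.0, 4.0, 5.0, 10.0, 100.0):
    print(R, remaining_volume(R))   # each line is about 113.097, independent of R
print(36 * math.pi)                 # 36π, for comparison
```

Every radius gives 36π cubic inches, the volume of a sphere of diameter 6: exactly the simple special case described above, in which the drilled cylinder has zero volume.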







Figure 2. Analysis of the removed cylinder from the sphere.

Some calculus will yield the correct answer, but there is a simpler way! Of course, the difficulty of this problem lies in proving that the answer is independent of the radius of the initial sphere. However, if you are willing to accept this fact (which is implicit in the formulation of the problem), we just need to find the answer in one special case, and we might as well choose the case that is the simplest. This is a truly powerful method, and it is well worth mastering.

Bibliography

[1] M. Gardner, Hexaflexagons, probability paradoxes, and the Tower of Hanoi, New Martin Gardner Mathematical Library, vol. 1, Cambridge University Press, Cambridge; Mathematical Association of America, Washington, DC, 2008. Martin Gardner's first book of mathematical puzzles and games; second edition of The Scientific American book of mathematical puzzles and diversions. MR2444876
[2] E. Peres, Martin Gardner: the mathematical jester, Mathematical lives, Springer, Berlin, 2011, pp. 217–220, DOI 10.1007/978-3-642-13606-1_31. MR2743951
[3] J. J. O'Connor and E. F. Robertson, Martin Gardner, MacTutor History of Mathematics, http://www-history.mcs.st-and.ac.uk/Biographies/Gardner.html.


General Relativity and the Absolute Differential Calculus

Introduction

Gregorio Ricci-Curbastro (1853–1925) developed a branch of mathematics known as the absolute differential calculus in his study of geometrical quantities and physical laws that are invariant under general coordinate transformations. The concept of a tensor first appeared in Ricci's work, although a restricted form of tensors had been previously introduced in vector analysis. In 1901, Ricci and his student, Tullio Levi-Civita (1873–1941), published a complete account of the methods of absolute differential calculus and their applications [12]. Their work was a natural extension of the mathematics of curved surfaces introduced by Gauss and developed by Riemann and others, and of the vector analysis developed by Gibbs and Heaviside.

Albert Einstein's special theory of relativity deals with the study of the dynamics of matter and light in frames of reference that move uniformly with respect to each other, the so-called inertial frames. Those quantities that are invariant under the (Lorentz) transformation from one frame to another are of fundamental importance. They include the invariant interval between two events (ct)² − x², the energy–momentum invariant E² − (pc)², and the frequency–wave number invariant ω² − (kc)². Here c denotes the speed of light in free space. The special theory is formulated in a gravity-free universe.

Ten years after introducing his special theory of relativity, Einstein (1879–1955) published his crowning achievement, the general theory of relativity [6, 7]. This is a theory of space-time and dynamics in the presence of gravity. The essential mathematical methods used in the general theory are differential geometry and the absolute differential calculus (which Einstein referred to as tensor analysis). Einstein devoted more than five years to mastering the necessary mathematical techniques.
He corresponded with Levi-Civita, asking for his advice on applications of tensor analysis. A tensor is a set of functions, fixed in a coordinate system, that transforms under a change of the coordinate system according to definite rules. Each tensor component in a given coordinate system is a linear, homogeneous function of the components in another system. If two tensors have components that are equal when both are written in one coordinate system, then they are equal in all coordinate systems; such an equality of tensors is invariant under a transformation of the coordinates [14]. Physical laws are true in their mathematical forms for all observers in their own frames of reference (coordinate systems), and therefore the laws are necessarily formulated in terms of tensors.
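The "definite rules" mentioned above can be made explicit. For a rank-two contravariant tensor, for example, the standard transformation law (stated here for concreteness; it is not written out in the text above) reads

```latex
T'^{\mu\nu} \;=\; \frac{\partial x'^{\mu}}{\partial x^{\alpha}}
                  \frac{\partial x'^{\nu}}{\partial x^{\beta}}\, T^{\alpha\beta},
```

with summation over the repeated indices α and β. Each primed component is indeed a linear, homogeneous function of the unprimed components, exactly as described.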




Figure 1. Gravitational lensing: light passing the sun during an eclipse is deflected, so the observer sees the star at an observed position different from its actual position.

Einstein's belief that matter generates a curvature of space-time led him to the notion that space-time is Riemannian, that is, locally Euclidean: the entire curved surface can be approximated by tiling with flat frames. Einstein assumed that in such locally flat regions, in which there is no appreciable gradient in the gravitational field, a freely falling observer experiences all physical aspects of special relativity; the effects of gravity are thereby locally removed. This assumption is known as the principle of equivalence.

In special relativity, the energy–momentum invariant is of fundamental importance. It involves energy E, momentum p, and rest mass m: E² − (pc)² = (mc²)². Einstein proposed that in general relativity, it is mass/energy that is responsible for the curvature. He introduced the stress-energy tensor, well known in physics, as the quantity related to the curvature. He proposed that the relationship between them is the simplest possible; they are proportional to each other [12]:

Curvature tensor = k (Stress-Energy tensor),

in which the constant k is chosen so that the equation agrees with Newton's law of gravity for the motion of low-velocity objects in weak gravitational fields (k = 8πG/c⁴, in which G is Newton's constant).

In 1907, Einstein [5] combined his principle of equivalence with the theory of special relativity (1905) and predicted that clocks run at different rates in a gravitational potential and that light rays bend in a gravitational field; see Figure 1. This work predated his introduction of the theory of general relativity (1915). In general relativity, objects falling in a gravitational field are not being acted upon by a gravitational force (in the Newtonian sense). Rather, they are moving along geodesics (distance-minimizing paths) in the warped space-time that surrounds massive objects. The observed deflection of light beams near the sun is a test of



the principle of equivalence. Tests of general relativity are an active part of research in physics and astronomy. The problem below is related to one of these tests; for a review of early tests of gravitational theory see [10].

The Schwarzschild line element in the region of a spherical mass M (obtained as an exact solution of the Einstein field equations) is, in polar coordinates,

ds² = c²(1 − 2GM/rc²)dt² − (1 − 2GM/rc²)⁻¹dr² − r²(dθ² + sin²θ dφ²).

If χ = 2GM/rc² is small, then the coefficient (1 − χ)⁻¹ of dr² in the Schwarzschild line element can be replaced by the first two terms of its binomial expansion, 1 + χ, to give the "weak field" line element

ds²_W = (1 − χ)(c dt)² − (1 + χ)dr² − r²(dθ² + sin²θ dφ²).

At the surface of the sun, the value of χ is 4.2 × 10⁻⁶, so the weak-field approximation is valid for all gravitational phenomena in our solar system. Consider a beam of light traveling radially in the weak field of a mass M. Then ds²_W = 0 (a light-like interval) and

dθ² + sin²θ dφ² = 0,

which gives

0 = (1 − χ)(c dt)² − (1 + χ)dr².

The "velocity" of the light, v_L = dr/dt, as determined by observers far from the gravitational influence of M, is therefore

v_L = c √((1 − χ)/(1 + χ)) < c

since χ > 0. Observers in free fall near M effectively have χ = 0 and hence measure the speed of light to be c. Expanding √((1 − χ)/(1 + χ)) to first order in χ = 2GM/rc² provides the approximation

v_L(r) ≈ c(1 − 2GM/rc² + · · ·).

In geometrical optics, the refractive index n of a material is n = c/v_medium, in which v_medium is the speed of light in the medium. We introduce the concept of the refractive index of space-time n_G(r) at a point r in the gravitational field of a mass M:

n_G(r) = c/v_L(r) ≈ 1 + 2GM/rc².

The value of n_G(r) increases as r decreases. This effect can be interpreted as an increase in the "density" of space-time as M is approached. As a plane wave of light approaches a spherical mass, those parts of the wave front nearest the mass are slowed down more than those parts farthest from the mass. The speed of the wave front is no longer constant along its surface, and therefore the normal to the surface must be deflected. The deflection of a plane wave of light by a spherical mass M of radius R, as it travels through space-time, can be calculated in the weak-field approximation.

Centennial Problem 1915
Proposed by Frank W. K. Firk, Yale University.

Show that in the weak-field approximation the total deflection Δα equals 4GM/Rc². This is Einstein's famous prediction on the bending of light in a gravitational field.
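Plugging solar values into the formula above shows why 1.75 arcseconds is the number quoted for starlight grazing the sun. The check below is ours, not part of the original text; the constants are standard values in SI units:

```python
import math

# Standard physical constants (assumed values, SI units)
G = 6.674e-11        # Newton's constant, m^3 kg^-1 s^-2
M_sun = 1.989e30     # solar mass, kg
R_sun = 6.957e8      # solar radius, m
c = 2.998e8          # speed of light, m/s

# Einstein's weak-field prediction for light grazing the solar limb
delta_alpha = 4 * G * M_sun / (R_sun * c**2)   # radians
arcsec = delta_alpha * 180 / math.pi * 3600    # convert radians to arcseconds

print(f"deflection = {arcsec:.2f} arcseconds")  # about 1.75
assert 1.7 < arcsec < 1.8
```

This is twice the value obtained from a purely Newtonian particle model of light, which is precisely what the 1919 eclipse expeditions set out to distinguish.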



1915: Comments

There are several nice points worth isolating from this problem and remarking on. First, when a new theory is conjectured in the sciences, we test whether it can explain current observations. In the case of the general theory of relativity, this was spectacularly done by its explanation of the precession of the perihelion of Mercury; Isaac Asimov (1920–1992) has a beautiful article on this [1]. If one is lucky, the theory also predicts new phenomena. A terrific example of such a theory is Bohr's model of the hydrogen atom, which not only explained the observed spectral lines but also predicted others previously unseen. Scientists before Einstein, using Newtonian physics and particle models for light, posited a deflection of light passing near a massive object, but Einstein obtained a very different value for this deflection, which experiments then verified. Speaking of gravitational lensing, did you know that the number of images produced by n coplanar point lenses is at most 5n − 5? This was proved in 2008 using complex dynamics and harmonic function theory [9].

The second great lesson here is that the usefulness of mathematics is not always apparent. When asked about the utility of a new invention, Benjamin Franklin (1706–1790) remarked, "What is the use of a new-born child?" The differential geometry that underlies Einstein's theories was not developed for relativity, but it was available and could be used when the proper situation arose. While it can take decades or more for some mathematics to find applications, such connections often arise to the surprise of many of the involved parties. The 1940 entry involves G. H. Hardy's classic book, A Mathematician's Apology; the reader is encouraged to jump to that entry and reflect, while reading the excerpt, on the fact that many of Hardy's results have found a home in modern cryptography (and even in biology [2]).
That said, for those who would like a more down-to-earth answer, here is one: Einstein's general theory of relativity is essential for the Global Positioning System (GPS) to function properly and accurately [13].

Finally, it is important to remember that the jury is always out, and we should constantly explore additional ways to test a theory. It often takes decades or longer to fully explore all the predictions and verify the results of these experiments. To this end, there have been some exciting recent developments in the field of relativity. The Laser Interferometer Gravitational-Wave Observatory (LIGO) recently announced [11] that it has verified another prediction of Einstein's general theory: the existence of gravitational waves. Of course, with monumental discoveries such as this, one must wait for the results to be confirmed. To give the reader a sense of how delicate these measurements are, researchers are looking for effects on the order of one part in 10²¹. One article put this in perspective by saying this is equivalent to squishing our galaxy to the height of a human [8].

Bibliography

[1] I. Asimov, The planet that wasn't, The Magazine of Fantasy and Science Fiction (1975), May. http://geobeck.tripod.com/frontier/planet.htm.
[2] H. E. Christenson and S. R. Garcia, G. H. Hardy: mathematical biologist, J. Humanist. Math. 5 (2015), no. 2, 96–102, DOI 10.5642/jhummath.201502.08. http://scholarship.claremont.edu/jhm/vol5/iss2/8. MR3378780
[3] P. A. M. Dirac, General theory of relativity, reprint of the 1975 original, Princeton Landmarks in Physics, Princeton University Press, Princeton, NJ, 1996. MR1373868



[4] A. Einstein, On the electrodynamics of moving bodies, Annalen der Physik 17 (1905), 891–921. http://www.fourmilab.ch/etexts/einstein/specrel/www/. For more of Einstein's papers from this time period, see http://www.loc.gov/rr/scitech/SciRefGuides/einstein.html.
[5] A. Einstein, Über das Relativitätsprinzip und die aus demselben gezogenen Folgerungen, Jahrbuch Rad. 4 (1907), 410. http://www.relativitycalculator.com/pdfs/Einstein_1907_Comprehensive_Essay_PartsI_II_III.pdf.
[6] A. Einstein, The foundation of the general theory of relativity, Annalen der Physik (1916). http://web.archive.org/web/20060831163721/http://www.alberteinstein.info/gallery/pdf/CP6Doc30_English_pp146-200.pdf.
[7] A. Einstein, The meaning of relativity, reprint of the 1956 edition, Princeton University Press, Princeton, NJ, 1988. MR1042572
[8] C. Hanna, What happens when LIGO texts you to say it's detected one of Einstein's predicted gravitational waves, The Conversation, February 11, 2016. http://theconversation.com/what-happens-when-ligo-texts-you-to-say-its-detected-one-of-einsteins-predicted-gravitational-waves-53259.
[9] D. Khavinson and G. Neumann, From the fundamental theorem of algebra to astrophysics: a "harmonious" path, Notices Amer. Math. Soc. 55 (2008), no. 6, 666–675. MR2431564
[10] D. F. Lawden, An introduction to tensor calculus, relativity and cosmology, 3rd ed., John Wiley & Sons, Ltd., Chichester, 1982. MR665917
[11] LIGO, Gravitational Waves Detected 100 Years After Einstein's Prediction, LIGO News Release, February 11, 2016. https://www.ligo.caltech.edu/news/ligo20160211.
[12] M. M. G. Ricci and T. Levi-Civita, Méthodes de calcul différentiel absolu et leurs applications (French), Math. Ann. 54 (1900), no. 1-2, 125–201, DOI 10.1007/BF01454201. MR1511109
[13] T. Van Flandern, What the Global Positioning System tells us about relativity, in Open Questions in Relativistic Physics (edited by F. Selleri), Apeiron (1998), 81–90.
[14] C. M. Will, in General Relativity (edited by S. W. Hawking and W. Israel), Chapter 2, Cambridge University Press, 1979.


Ostrowski’s Theorem Introduction The absolute value function gives the magnitude of a real or complex number. However, there are other ways to define the “size” of a number. An absolute value on a field F is a real-valued function that satisfies (a) x ≥ 0, (b) x = 0 if and only if x = 0, (c) xy = x y , and (d) x + y ≤ x + y . Josef Kurschak proposed these axioms in 1912, although Kurt Hensel (1861–1941) had started related research in 1897. The standard absolute value on the field Q of rational numbers is  x if x ≥ 0, x 0 = −x if x < 0. Another example is the trivial absolute value, defined by  1 if x = 0, x = 0 if x = 0. There is an important type of absolute value on Q that leads to a notion of the size of a number that is related to its arithmetic properties. Given a prime p and a nonzero rational number x, we may write x = pn a/b, in which n, a, b ∈ Z and a, b, p are pairwise relatively prime. The p-adic absolute value of x is  0 if x = 0, x p = −n p if x = 0 and x = pn a/b as above. The beautiful Artin–Whaples product formula x 0 x p = 1,

x ∈ Q,


p prime

relates the standard absolute value to all of the p-adic absolute values. We say that two absolute values · 1 and · 2 on a field F are equivalent if there is a c > 0 so that x 1 = x c2 for all x ∈ F. In 1916, Alexander Ostrowski (1893– 1986) proved what is now known as Ostrowski’s theorem: each absolute value on the rational numbers is equivalent to the trivial absolute value, the standard absolute value, or a p-adic absolute value. In other words, we have a complete description 17



of all possible ways to generalize the notion of "size" for rational numbers so that the four axioms above hold. The standard absolute value on Q is Archimedean; that is, for each x ≠ 0 there is an N ∈ N so that ‖nx‖₀ > 1 for all n ≥ N. In contrast, the p-adic absolute values are non-Archimedean. Since the Archimedean property is, in a sense, "natural," one might use Ostrowski's theorem to argue that the standard absolute value is the most natural absolute value with which one can endow Q.

Centennial Problem 1916
Proposed by David Burt and Steven J. Miller, Williams College.

A number field is a finite field extension of Q, such as Q[√−5] = {a + b√−5 : a, b ∈ Q}. Observe that unique factorization fails in the ring Z[√−5] since

2 · 3 = (1 + √−5)(1 − √−5)

and none of the factors divide any of the others in Z[√−5]. Are there notions of absolute values in this context? If so, what are they?

1916: Comments

The p-adic numbers. Each absolute value on Q defines a metric. The standard metric on Q is d₀(x, y) = ‖x − y‖₀. On the other hand, each prime number p gives rise to the p-adic metric on Q: d_p(x, y) = ‖x − y‖_p. The real number system is the completion of Q with respect to the standard metric. In the same way, for each prime p we complete Q with respect to the p-adic metric and obtain the p-adic number system Q_p. Just as the completion of Q with respect to the standard metric is a field (namely R), one can show that Q_p is a field.

What do the elements of Q_p look like? First let us examine how Z, the set of integers, sits inside of Q₃; see Figure 1. Modulo 3, the integers come in precisely three flavors: an integer is congruent to exactly one of 0, 1, or 2 modulo 3. If x ≡ y (mod 3), then x and y are "pretty close" to each other in Q₃ since 3 | (x − y) and hence ‖x − y‖₃ ≤ 1/3. If x ≡ y (mod 9), then they are even closer since 9 | (x − y) and hence ‖x − y‖₃ ≤ 1/9. Continuing in this fashion, a famous fractal (a Sierpiński triangle; see the 1963 entry) emerges.
This is suggested by Figure 1(d). To picture how Z sits inside of Q₃, imagine iterating this process "downward" infinitely many times; to picture Q₃ itself, imagine iterating "upward" too! If you are baffled and confused, then we have done our job. The p-adic number system is strange. For instance, the p-adic metric is an example of an ultrametric, a metric that satisfies the strong triangle inequality

d(x, z) ≤ max{d(x, y), d(y, z)}.

One consequence of this is that every triangle in Q_p is isosceles: if ‖x − z‖_p ≠ ‖z − y‖_p, then

‖x − y‖_p = max{‖x − z‖_p, ‖z − y‖_p}.

Even more baffling is the fact that every point in a p-adic open disk is a center of that open disk! Try to prove these results.
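These definitions are easy to experiment with. The sketch below (ours, not from the text; the helper names are our own) implements ‖·‖_p on rationals with Python's fractions module and spot-checks both the Artin–Whaples product formula and the isosceles property:

```python
from fractions import Fraction

def padic_abs(x, p):
    """p-adic absolute value of a rational x: if x = p^n * a/b with p dividing
    neither a nor b, return p**(-n); by convention |0|_p = 0."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    n = 0
    num, den = x.numerator, x.denominator
    while num % p == 0:
        num //= p
        n += 1
    while den % p == 0:
        den //= p
        n -= 1
    return Fraction(p) ** (-n)

def std_abs(x):
    return abs(Fraction(x))

# Artin-Whaples: |x|_0 * prod_p |x|_p = 1 for nonzero rational x.
# Only primes dividing the numerator or denominator contribute a factor != 1.
x = Fraction(-90, 7)                # -90/7 = -(2 * 3^2 * 5)/7
product = std_abs(x)
for p in [2, 3, 5, 7]:
    product *= padic_abs(x, p)
assert product == 1

# Every 3-adic triangle is isosceles: if |x-z| != |z-y|, then |x-y| = max(...)
x, y, z = Fraction(0), Fraction(4), Fraction(3)
a, b = padic_abs(x - z, 3), padic_abs(z - y, 3)
assert a != b and padic_abs(x - y, 3) == max(a, b)
```

Using exact Fraction arithmetic (rather than floats) keeps the product formula an identity rather than an approximation.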


Figure 1. Depiction of the integers in Q₃: (a) location in Q₃ of the integers congruent to 0, 1, 2 (mod 3); (b) the integers congruent to 0, 1, . . . , 8 (mod 9); (c) the integers congruent to 0, 1, . . . , 26 (mod 27); (d) the integers congruent to 0, 1, . . . , 80 (mod 81).

One of the most common mistakes in mathematics is to use a formula without checking to see if its requirements are satisfied. If we ignore the fact that 2 > 1, then the geometric series formula suggests that

1 + 2 + 2² + 2³ + 2⁴ + · · · = 1/(1 − 2) = −1.  (1916.2)

This seems absurd: how can the sum of infinitely many positive numbers be negative? It cannot, if you insist on using the standard metric on Q. It does, however, make sense 2-adically, since

‖ (∑_{n=0}^{N−1} 2ⁿ) − (−1) ‖₂ = ‖ (1 − 2^N)/(1 − 2) + 1 ‖₂ = ‖ 2^N ‖₂ = 2^{−N},



which tends to zero as N → ∞. So the partial sums of our series converge to −1 with respect to the 2-adic metric. Thus, (1916.2) makes sense in Q₂! From this analysis we can extract an important lesson: whether or not something converges depends on what we mean by "converges". Can you show that

1 + 3 + 3² + 3³ + 3⁴ + · · · = −1/2

in Q₃? What is the closure of Q ∩ (0, ∞) in Q_p? Is it all of Q_p?

Now that we have played around with p-adic arithmetic a little, a more concrete description of Q_p is at hand. Each p-adic number can be expressed as an infinite series of the form ∑_{n=N}^{∞} aₙ pⁿ with each aₙ ∈ {0, 1, 2, . . . , p − 1} for some integer N (the series converges with respect to the p-adic metric). At this point, manipulating p-adic numbers is analogous to handling decimal expansions of real numbers. Instead of powers 10ⁿ with n running from N (usually positive) down to −∞, we have powers pⁿ with n running from N (potentially negative) up to +∞.

Of course, Q_p is good for something besides mathematical parlor tricks. For instance, the famous Hasse–Minkowski local-global principle for quadratic forms (see the 1966 and 1993 entries) asserts that a quadratic form with rational coefficients has a nontrivial rational solution if and only if it has one in R and in each Q_p. The unity suggested by the product formula (1916.1) is no illusion!

Bibliography

[1] J. E. Holly, Pictures of ultrametric spaces, the p-adic numbers, and valued fields, Amer. Math. Monthly 108 (2001), no. 8, 721–728, DOI 10.2307/2695615. https://www.colby.edu/math/faculty/Faculty_files/hollydir/Holly01.pdf. MR1865659
[2] A. Ostrowski, Über einige Lösungen der Funktionalgleichung ψ(x) · ψ(y) = ψ(xy) (German), Acta Math. 41 (1916), no. 1, 271–284, DOI 10.1007/BF02422947. http://link.springer.com/article/10.1007%2FBF02422947. MR1555153
[3] W. Stein, Introduction to Algebraic Number Theory, May 2005, http://wstein.org/129-05/notes/129.pdf.


Morse Theory, but Really Cantor

Introduction

Marston Morse (1892–1977) was inspired by the work of Jacques Hadamard (1865–1963), Henri Poincaré (1854–1912), and his advisor George Birkhoff (1884–1944). In choosing a topic for his thesis, he wished to combine the fields of analysis and geometry, a theme that continued throughout his life's work. An entire branch of mathematics, Morse theory, is named after him.

The shortest distance between two points in a plane is a straight line, and straight lines have constant slope. Now consider two points on a surface. The analogue for the straight line is a curve called a geodesic. The analogue for constant slope is that the tangent vectors to the curve remain parallel as they are transported along the curve. For example, on a sphere the geodesic between two points is the arc of the great circle going through them; see Figure 1. Morse often focused on surfaces with negative curvature, such as the "pair of pants" in Figure 2(a). In his 1917 thesis he proved the existence of certain types of nonperiodic geodesics on surfaces of negative curvature; for more information, see Morse's article [2].

On a less happy note, 1917 was the year when Georg Cantor (1845–1918) entered the sanatorium in which he ultimately died. We have a lot of things to say about the work of Cantor, so much so that he has snuck into this entry even though the entire 1918 entry is devoted to him!

Figure 1. Geodesics on the sphere are great circles.



(a) A topological pair of pants. (b) Add a pair of pants to each "leghole" and continue indefinitely.

Figure 2. If you have infinitely many pairs of pants, you could try this at home.

Define a sequence of subsets Cₙ of [0, 1] according to the following scheme: C₀ = [0, 1], C₁ = [0, 1/3] ∪ [2/3, 1], C₂ = [0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1], and so forth. For n ∈ N, the set Cₙ is obtained from Cₙ₋₁ by removing the middle third of every closed interval contained in Cₙ₋₁; see Figure 3. The Cantor set is

C = ∩_{n=0}^{∞} Cₙ,

which is what is "left over" after removing the open intervals (1/3, 2/3), (1/9, 2/9), (7/9, 8/9), . . . from [0, 1]. To pass from stage Cₙ₋₁ to Cₙ we remove 2ⁿ⁻¹ open intervals of length 1/3ⁿ. Thus, the total length of the omitted intervals is

∑_{n=1}^{∞} 2ⁿ⁻¹/3ⁿ = 1.
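The middle-thirds construction is short to code. The script below (an illustration of ours, not from the text; the function name is our own) builds the intervals of Cₙ with exact arithmetic and confirms that the remaining length (2/3)ⁿ tends to zero, so the removed length tends to 1:

```python
from fractions import Fraction

def cantor_stage(n):
    """Return the closed intervals of C_n as (left, right) pairs of Fractions."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        next_intervals = []
        for a, b in intervals:
            third = (b - a) / 3
            # remove the open middle third of [a, b], keep the outer thirds
            next_intervals.append((a, a + third))
            next_intervals.append((b - third, b))
        intervals = next_intervals
    return intervals

# C_n consists of 2^n closed intervals, each of length 3^{-n}.
for n in range(8):
    ivs = cantor_stage(n)
    assert len(ivs) == 2 ** n
    remaining = sum(b - a for a, b in ivs)
    assert remaining == Fraction(2, 3) ** n   # removed length = 1 - (2/3)^n -> 1
```

Exact Fractions are used instead of floats so the lengths agree with the geometric series identically at every stage.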

What remains behind, namely C, has Lebesgue measure zero; that is, it has zero length. The Cantor set is a fractal, a set that demonstrates self-similarity: it contains infinitely many scaled copies of itself. Moreover, C is a compact, uncountable (see the 1918 and 1999 entries), nowhere dense, and totally disconnected subset of [0, 1].

Two metric spaces (X, d_X) and (Y, d_Y) are homeomorphic if there is a continuous bijection f : X → Y whose inverse f⁻¹ : Y → X is continuous; the functions f



Figure 3. The sets C₀, C₁, . . . , C₆.

and f⁻¹ are homeomorphisms. The notion of homeomorphism provides an equivalence relation among metric spaces. One often considers homeomorphic metric spaces to be "the same." The following problem is related, perhaps tangentially, to the work of both Morse and Cantor. It concerns a specific surface of negative curvature that lies at the intersection of analysis and topology, a recurring theme in much of Morse's work [4, p. 29].

Centennial Problem 1917
Proposed by Joanne Snow, Colleen Hoover, and Steven Broad, Saint Mary's College.

Let C ⊂ R ⊂ R² be the Cantor set, embedded in R². Show that R²\C is homeomorphic to the surface pictured in Figure 2(b).

1917: Comments

The Brunn–Minkowski theorem and Cantor dust. It seems appropriate to spend a few pages discussing some little-known, but extremely interesting, properties of the Cantor set. Much of this can be found in [3]. A famous result that combines arithmetic properties of sets with topological and measure-theoretic properties is the Brunn–Minkowski theorem. Let n ∈ N, let A and B be two nonempty, compact subsets of Rⁿ, and let A + B = {a + b : a ∈ A, b ∈ B}. Then

(m(A))^{1/n} + (m(B))^{1/n} ≤ (m(A + B))^{1/n},



Figure 4. The set C × C is sometimes called Cantor dust.

in which m(S) denotes the Lebesgue measure of S.¹ For n = 1, we have

m(A) + m(B) ≤ m(A + B)  (1917.1)

for any nonempty, compact subsets A, B ⊆ R. How large can the gap between the left- and right-hand sides of the inequality be? It turns out that m(A + B) can be made arbitrarily large, even if m(A) = m(B) = 0. This is a consequence of the fact that

C + C = [0, 2].  (1917.2)

To prove (1917.2), it suffices to show that for each b ∈ [0, 2], the line ℓ defined by y = −x + b intersects the Cantor dust C × C; see Figure 4. Indeed, if ℓ contains a point (x₀, y₀) ∈ C × C, then b = x₀ + y₀ ∈ C + C. The Cantor dust can be constructed iteratively in a manner similar to the Cantor set. Start with a unit square, subdivide it into 9 squares with side length 1/3, remove the central "plus sign" consisting of 5 squares, and iterate the process. Since the slope of the line ℓ is −1, it intersects at least one of the four corner squares in stage one; call it S₁. Similarly, ℓ intersects one of the four corner squares of S₁ in stage two; call it S₂. Continuing in this manner, we obtain a decreasing sequence of closed squares S₁ ⊃ S₂ ⊃ · · ·. Consequently, ∩_{n=1}^{∞} Sₙ consists of a single point (x₀, y₀) ∈ C × C.² We have (x₀, y₀) ∈ ℓ since ℓ is closed and ℓ contains points that are arbitrarily close to (x₀, y₀).

Returning to (1917.1), we may let α > 0 and let A = B = {(α/2)c : c ∈ C} be a scaled copy of the Cantor set, so that m(A) = m(B) = 0 and m(A + B) = α.

The Cantor surjection theorem. Another surprising result about the Cantor set is the Cantor surjection theorem: if (X, d) is a compact metric space, then there is a continuous surjection f : C → X (see the 1918 entry). Thus, every compact metric space is the continuous image of the Cantor set. This does not

of m(S) as the “length” of S. That is, until you read the notes to the 1924 entry! a complete metric space, the intersection of a sequence A1 ⊇ A2 ⊇ · · · of nested compact sets whose diameters sup{d(x, y) : x, y ∈ An } tend to zero is a singleton. 2 In



contradict anything about connectedness: although the continuous image of a connected set is connected, the continuous image of a disconnected set can very well be connected. The proof of the Cantor surjection theorem, while not difficult in principle, is notationally cumbersome. The basic premise is that one uses compactness to dissect X into 2^{k₁} nonempty compact subsets of diameter less than 1/2 for some k₁, then one dissects the resulting "pieces" into 2^{k₂} pieces of diameter less than 1/4, and so forth. By labeling things cleverly, one obtains a dyadic filtration of X with which each x ∈ X can be assigned at least one address using a binary string. This is done in a manner so that the strings corresponding to x and y agree for more and more initial bits the closer that x and y are; this guarantees the continuity of our function. Each binary string specifies a point in the Cantor set; the sequence of zeros and ones specifies whether one stays in the left- or right-hand interval at each stage in the Cantor set construction.

A remarkable consequence of the Cantor surjection theorem is that the cardinality of a compact metric space cannot exceed that of R. That is, if (X, d) is a compact metric space, then there is an injection g : X → R (that is, g is one-to-one). For instance, it is impossible to endow the powerset P(R) of R (see the comments for the 1918 entry) with a metric so that it becomes a compact metric space.

Peano curves. A byproduct of the Cantor surjection theorem is the existence of Peano curves, that is, curves whose images have nonempty interiors. To be more specific, if K is a nonempty, compact, convex subset of Rⁿ, then there exists a continuous surjection f : [0, 1] → K. Here is a sketch of the proof.
Let g : C → K be the continuous surjection from the Cantor surjection theorem and let

f(x) = g(x) if x ∈ C,  and  f(x) = (1 − t)g(a) + t g(b) if x = (1 − t)a + tb ∈ (a, b),

in which (a, b) denotes a gap interval from the construction of the Cantor set; that is, (a, b) ⊆ [0, 1]\C and a, b ∈ C. Since K is convex and g(a), g(b) ∈ K, it follows that (1 − t)g(a) + t g(b) ∈ K for t ∈ [0, 1]. Moreover, f is continuous since g is continuous and f extends g linearly on each gap interval. Voilà!

Returning to the concept of homeomorphism, we should mention that C is homeomorphic to C × C. This is a consequence of the Moore–Kline theorem: if X ≠ ∅ is a compact, perfect (closed with no isolated points), totally disconnected metric space, then X is homeomorphic to C.

Bibliography

[1] H. M. Morse, A One-to-One Representation of Geodesics on a Surface of Negative Curvature, Amer. J. Math. 43 (1921), no. 1, 33–51, DOI 10.2307/2370306. http://www.jstor.org/stable/2370306. MR1506428
[2] H. M. Morse, Recurrent geodesics on a surface of negative curvature, Trans. Amer. Math. Soc. 22 (1921), no. 1, 84–100, DOI 10.2307/1988844. http://www.ams.org/journals/tran/1921-022-01/S0002-9947-1921-1501161-8/S0002-9947-1921-1501161-8.pdf. MR1501161
[3] C. C. Pugh, Real mathematical analysis, 2nd ed., Undergraduate Texts in Mathematics, Springer, Cham, 2015. MR3380933
[4] M. Spivak, A comprehensive introduction to differential geometry. Vol. One, Published by M. Spivak, Brandeis Univ., Waltham, Mass., 1970. MR0267467


Georg Cantor

Introduction

It is known that there are an infinite number of worlds, simply because there is an infinite amount of space for them to be in. However, not every one of them is inhabited. Therefore, there must be a finite number of inhabited worlds. Any finite number divided by infinity is as near to nothing as makes no odds, so the average population of all the planets in the Universe can be said to be zero. From this it follows that the population of the whole Universe is also zero, and that any people you may meet from time to time are merely the products of a deranged imagination. [1]

While we hold Douglas Adams (1952–2001), author of the famed Hitchhiker's Guide to the Galaxy "trilogy," in the highest regard, "this argument isn't worth a pair of fetid dingo's kidneys." Find at least three things wrong with his argument!

The most influential mathematician to study infinity was Georg Cantor. His work was so mind-blowing that he even managed to appropriate territory in our 1917 entry. Before getting into Cantor's theory of cardinality and some of its jaw-dropping consequences, let us first warm up with a few infinity-related paradoxes.

Imagine that every second, you are given two numbers that you add to your (initially empty) collection. The first pair is 1, 2, the second pair is 3, 4, and so on. After receiving each pair of numbers, you must discard exactly one number from your collection. Let us examine two strategies for handling this situation.

(a) Every time you receive a pair of numbers, you discard the odd one. Thus, the number 2n arrives in round n and remains in your collection in all successive rounds. You are eventually left with the infinite set {2, 4, 6, . . .}.

(b) Every time you receive a pair of numbers, you discard the lowest number in your collection. Thus, the natural number n arrives in round ⌈(n+1)/2⌉ and is removed in round n. You are eventually left with the empty set ∅.

In both scenarios, you discard exactly one number in each round. How can they lead to two such different outcomes?

This next paradox is from the final book of Galileo Galilei (1564–1642), Discourses and Mathematical Demonstrations Relating to Two New Sciences (1638). Let S = {1, 4, 9, 16, . . .} denote the set of perfect squares. Galileo's paradox is the apparent contradiction that although S is "much smaller than N," the function n ↦ n² exhibits a one-to-one correspondence between S and N. How can this be?

A function f : A → B is injective (one-to-one) if "distinct inputs are sent to distinct outputs", that is, if f(a) = f(a′) implies a = a′.
We say that f : A → B is surjective (onto) if every element of B is "hit by f," that is, if f(A) = B. Two sets A and B are equinumerous if there is a one-to-one and onto function f : A → B.
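These definitions are easy to test exhaustively for finite sets. The following sketch (the helper names are our own, not from the text) checks injectivity and surjectivity by brute force:

```python
def is_injective(f, domain):
    """Injective: distinct inputs are sent to distinct outputs."""
    images = [f(a) for a in domain]
    return len(images) == len(set(images))

def is_surjective(f, domain, codomain):
    """Surjective: every element of the codomain is hit by f."""
    return {f(a) for a in domain} == set(codomain)

# n -> n^2 is injective on {1,...,10} but misses most of {1,...,100}.
print(is_injective(lambda n: n * n, range(1, 11)))                  # True
print(is_surjective(lambda n: n * n, range(1, 11), range(1, 101)))  # False
```

For infinite sets these properties must be proved, not checked; the finite case is only a warm-up for the definitions.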















Figure 1. Of the four functions depicted, only (a) is a bijection: (a) injective and surjective; (b) not injective and not surjective; (c) surjective and not injective; (d) injective and not surjective.

Such a function is called a bijection; see Figure 1. This relationship between A and B is denoted A ≅ B; we also say that A and B have the same cardinality. One of the most important properties of the symbol ≅ is that it is an equivalence relation. In other words, it "behaves like an equal sign" in the sense that it is reflexive (A ≅ A), symmetric (A ≅ B implies that B ≅ A), and transitive (A ≅ B and B ≅ C implies that A ≅ C). Can you prove this?

We say that A is finite if A = ∅ or A ≅ {1, 2, . . . , n} for some n ∈ N. For finite sets, A ≅ B just means that "A and B have the same number of elements." We say that A is infinite if A is not finite, and countable if A is finite or A ≅ N. In fact, A is infinite if and only if it has a proper subset B such that A ≅ B. For instance, Galileo noted that S is a proper subset of N and S ≅ N.

An infinite set A is countable if and only if its elements can be enumerated a₁, a₂, a₃, . . . without repetition. Indeed, if A is so enumerable, then the function f : N → A defined by f(n) = aₙ is a bijection. Conversely, each bijection f : N → A gives rise to an enumeration f(1), f(2), f(3), . . . of A. Even though Z = {. . . , −2, −1, 0, 1, 2, . . .} has ellipses going in two directions, it is countable since 0, 1, −1, 2, −2, 3, −3, 4, −4, . . . is an enumeration of Z. In fact, an explicit bijection f : N → Z is

    f(n) = n/2 if n is even,    f(n) = (1 − n)/2 if n is odd.

Can you use a similar idea to prove that the union of two countable sets is countable?
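As a quick sanity check, the explicit bijection above can be coded directly (a small illustration of ours, not from the text):

```python
def f(n):
    """Bijection from N = {1, 2, 3, ...} onto Z following the enumeration
    0, 1, -1, 2, -2, ...: even n go to n/2, odd n go to (1 - n)/2."""
    return n // 2 if n % 2 == 0 else (1 - n) // 2

print([f(n) for n in range(1, 10)])  # [0, 1, -1, 2, -2, 3, -3, 4, -4]
```

Distinct inputs give distinct outputs and every integer is eventually hit, which is exactly the content of the claimed bijection.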




Figure 2. Diagram illustrating an enumeration of N2 .

What about N² = N × N? It is countable too! Consider the function f : N² → N defined by f(a, b) = 2^(a−1)(2b − 1). If f(a, b) = f(c, d), then 2^(a−1)(2b − 1) = 2^(c−1)(2d − 1), in which 2b − 1 and 2d − 1 are odd. The fundamental theorem of arithmetic ensures that a = c and b = d, so f is injective. Given n ∈ N, we may write n = 2^(a−1)(2b − 1) for some a, b ∈ N and hence f(a, b) = n. Thus, f is surjective and we conclude that N² ≅ N. See Figure 2 for a different approach (a "proof without words"). A similar argument shows that Q is countable too; see Figure 3. Another approach is to first show that the map

    x ↦ 1/(2⌊x⌋ − x + 1)

recursively generates the Calkin–Wilf sequence 1/1, 1/2, 2/1, 1/3, 3/2, 2/3, 3/1, 1/4, 4/3, . . . , which is an enumeration of the positive rational numbers.

At this point, one might suspect that every infinite set is countable. How boring would that be? One of Cantor's most brilliant insights is that uncountable sets exist. One example is R. In fact, it is impossible to find a surjective function f : N → [0, 1). Here is Cantor's classic (1891) diagonal argument [3]. Recall that each real number in [0, 1) can be written uniquely as a sequence 0.d₁d₂d₃ . . . of decimal digits dᵢ that does not eventually terminate in all 9's. Given a function f : N → [0, 1), consider the list f(1), f(2), . . . . Write the decimal representations of these numbers in an




























Figure 3. Diagram illustrating an enumeration of Q. Quotients m/n that are undefined or that have appeared previously in the enumeration are discarded; these are in red.

array

    f(1) = 0.d11 d12 d13 d14 d15 . . .
    f(2) = 0.d21 d22 d23 d24 d25 . . .
    f(3) = 0.d31 d32 d33 d34 d35 . . .
    f(4) = 0.d41 d42 d43 d44 d45 . . .
    f(5) = 0.d51 d52 d53 d54 d55 . . .
    . . .

and consider the number c = 0.c₁c₂c₃ . . . ∈ [0, 1), in which cₙ = 4 if dₙₙ ≠ 4, and cₙ = 7 if dₙₙ = 4.


For each n = 1, 2, . . ., the nth digit of c differs from the nth digit of f(n). Since c ≠ f(n) for any n, the function f : N → [0, 1) is not a surjection.¹

¹Why can this argument not be used to prove that Q is uncountable? Even if each f(1), f(2), . . . is rational, there is no way to guarantee that c has an eventually repeating decimal expansion (a real number is rational if and only if it has an eventually repeating decimal expansion).

A shocking consequence of the uncountability of R is that there are many more irrational numbers than rational numbers. Indeed, if the set Qᶜ of irrational numbers were countable, then R = Q ∪ Qᶜ would be the union of two countable sets and hence be countable, which is not the case. An algebraic number is a complex number that is a root of a polynomial with integer coefficients. The set A of all algebraic numbers includes the rationals and


numbers such as

    2^(1/3),    i = √−1,    √(√3 + √5);    (1918.2)

these are roots of

    x³ − 2,    x² + 1,    and    x⁸ − 16x⁴ + 4,

respectively. The least degree among integer polynomials having an algebraic number α as a root is called the degree of α. For instance, the numbers (1918.2) have degrees 3, 2, and 8, respectively. One can show that the set of all algebraic numbers is countable² and hence most real numbers are transcendental (not algebraic). For more information about transcendental numbers, see the 1935 and 1955 entries.

If all of this has not blown your mind, then maybe our next major revelation will. If A is a set, then the powerset P(A) of A is the set of all subsets of A. For example, if A = {a, b, c}, then

    P(A) = {∅, {a}, {b}, {c}, {a, b}, {b, c}, {a, c}, A}.

Cantor's powerset theorem asserts that if S is any set, then there does not exist a surjection (let alone a bijection) f : S → P(S). Since s ↦ {s} furnishes an injection from S into P(S) (so P(S) is "at least as big as S"), Cantor's theorem tells us that "P(S) is of a strictly larger cardinality than S." Starting with S = N and iterating the preceding result reveals that there are "infinitely many levels of infinity"! If that did not blow your mind, then please consider a pan galactic gargle blaster³ or two.

Here is the proof of Cantor's theorem. Suppose toward a contradiction that f : S → P(S) is a surjection. For each x ∈ S, we have f(x) ⊆ S and hence either x ∈ f(x) or x ∉ f(x). Let E = {x ∈ S : x ∉ f(x)}. Since f is a surjection, there exists a z ∈ S such that f(z) = E. However, z ∈ E ⟺ z ∉ f(z) ⟺ z ∉ E; the first equivalence is from the definition of E, the second since f(z) = E. This contradiction shows that no such f exists.

Centennial Problem 1918

Proposed by Stephan Ramon Garcia, Pomona College.⁴

A chain of subsets of N is a collection C ⊆ P(N) that is totally ordered by the relation ⊆; that is, if A, B ∈ C, then either A ⊆ B or B ⊆ A. We refer to such subsets A and B as links in the chain C. Does there exist a chain C of subsets of N that has uncountably many links?
For the solution, see the comments below.

²First show that A = ⋃_{n=1}^∞ Aₙ, in which Aₙ denotes the set of algebraic numbers of degree n. Show that each Aₙ is countable (how many degree-n polynomials with integer coefficients are there?) and, as a consequence, that A is countable.

³The effect is similar to "having your brains smashed in by a slice of lemon wrapped round a large gold brick" [1].

⁴The original proposed problem, due to Steven Miller of Williams College, dealt with functions taking on only transcendental values. This has been moved to the 1955 entry.
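Before turning to the comments: the diagonal set E = {x ∈ S : x ∉ f(x)} in the proof of Cantor's powerset theorem can be checked exhaustively for small finite sets. In this sketch of ours, subsets of S = {0, . . . , n−1} are coded as n-bit masks:

```python
from itertools import product

def diagonal_set_misses(n):
    """For every f : S -> P(S) with S = {0,...,n-1} and subsets coded as
    n-bit masks, check that E = {x : x not in f(x)} is never in the image."""
    for f in product(range(2 ** n), repeat=n):
        E = sum(1 << x for x in range(n) if not (f[x] >> x) & 1)
        if E in f:
            return False
    return True

print([diagonal_set_misses(n) for n in (1, 2, 3)])  # [True, True, True]
```

The same diagonal set works verbatim for infinite S, which is the heart of the proof above; the brute-force check merely confirms the mechanism on every one of the (2ⁿ)ⁿ possible functions.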



1918: Comments

A common misconception. Cantor's diagonal method is often described as existential and nonconstructive. This is incorrect, since it can be used to produce a real number that is not on the given list. For instance, when Cantor's method is applied to a list of all algebraic numbers, in some specified order, it produces the digits of a transcendental number. For more on these issues, see the excellent article by Gray [4].

Solution to the problem. There are many potential false starts to the problem. If your solution involves the word "next" or "first," then it is probably incorrect! Since N ≅ Q, we may as well replace N with Q to see if that makes things easier. For each x ∈ R, let Aₓ = {q ∈ Q : q < x}. The function f : R → P(Q) defined by f(x) = Aₓ is an injection since the density of Q in R implies that x < y if and only if Aₓ ⊊ A_y. It follows that the collection f(R) = {Aₓ : x ∈ R} is uncountable and linearly ordered by ⊆. Now let g : N → Q be a bijection and let Bₓ = {n ∈ N : g(n) < x}. The collection C = {Bₓ : x ∈ R} is an uncountable chain of subsets of N.

Bibliography

[1] D. Adams, The Restaurant at the End of the Universe, Pan Books, 1980.
[2] G. Cantor, Ueber eine Eigenschaft des Inbegriffs aller reellen algebraischen Zahlen (German), J. Reine Angew. Math. 77 (1874), 258–262, DOI 10.1515/crll.1874.77.258. MR1579605
[3] G. Cantor, Über eine elementare Frage der Mannigfaltigkeitslehre, Jahresbericht der Deutschen Mathematiker-Vereinigung 1 (1891), 75–78.
[4] R. Gray, Georg Cantor and transcendental numbers, Amer. Math. Monthly 101 (1994), no. 9, 819–832, DOI 10.2307/2975129. http://www.jstor.org/stable/2975129. MR1300488
[5] J. J. O'Connor and E. F. Robertson, Georg Ferdinand Ludwig Philipp Cantor, MacTutor History of Mathematics, http://www-history.mcs.st-andrews.ac.uk/Biographies/Cantor.html.


Brun’s Theorem Introduction One of the most tantalizing conjectures in number theory is the twin prime conjecture. It asserts that there are infinitely many pairs of primes that differ by 2; a prime in such a pair is a twin prime. Examples of such pairs are 5 and 7, 29 and 31, and 2,996,863,034,895 × 21,290,000 ± 1 (the numbers in the last example have 388,342 digits when fully written out [11]). More generally, given any even k ∈ N, are there infinitely many pairs of primes whose elements differ by k? This is Polignac’s conjecture. Although both conjectures remain open, there has been remarkable progress over the past 100 years, culminating in the 2013 proof of Yitang Zhang (1955– ) that there is some even number k ≤ 70,000,000 such that infinitely many pairs of primes differ by k. This result has been improved and generalized by many authors, especially James Maynard (1987– ), Terence Tao, and the Polymath8 project [2, 8, 9]. It is now known that there are infinitely many pairs of primes that differ by at most 246. One of the earliest results in the field is due to Viggo Brun (1885–1978), who proved in 1919 that the sum of the reciprocals of the twin primes converges. Compare this to Euler’s result that the sum of the reciprocals of the primes diverges (see p. 4). Thus, in a qualitative sense, the twin primes are far more sparse than the primes. The value of Brun’s sum,       1 1 1 1 1 1 + + + + + + ··· , (1919.1) B = 3 5 5 7 11 13 which is at least 1.83 and less than 2.347 [4], is Brun’s constant. The search for a good approximation to it led Thomas Nicely of Lynchburg College to discover a floating-point arithmetic error in Intel’s Pentium processor [6]. This led to a $475 million loss for Intel, demonstrating the power of pure mathematics! Unfortunately, the convergence of Brun’s series does not resolve the twin prime conjecture since there are many infinite collections of natural numbers that have convergent reciprocal sums. 
The perfect squares are an example, since

    ∑_{n=1}^∞ 1/n² = π²/6    (1919.2)

as Euler showed in 1734 (see the notes for a derivation). Since any finite sum of rational numbers is rational, if one could show that Brun's constant were irrational, then one would have a proof of the twin prime conjecture!

In what follows, log x denotes the base-e logarithm of x. We say that functions f and g are asymptotically equivalent, denoted f ∼ g, if lim_{x→∞} f(x)/g(x) = 1. Let π(x) denote the number of primes at most x. For example, π(10.5) = 4 since



Figure 1. Plot of the prime counting function π(x) (top) versus the twin prime counting function π₂(x) (bottom). The prime number theorem ensures that π(x) grows like x/log x; the first Hardy–Littlewood conjecture asserts that π₂(x) grows like a constant times x/(log x)².

2, 3, 5, 7 ≤ 10.5. Similarly, we let π₂(x) denote the number of twin primes at most x. The celebrated prime number theorem states that π(x) ∼ x/log x; that is,

    lim_{x→∞} π(x)/(x/log x) = 1;

see the 1933 and 1948 entries. Consequently, for any C > 1,

    π(x) ≤ Cx/log x

for sufficiently large x. In contrast, one can show that there is a constant D > 0 such that

    π₂(x) ≤ Dx/(log x)²

for sufficiently large x. The smallest constant known to work here is D = 4.5 [12]. A refinement of the twin prime conjecture is the Hardy–Littlewood conjecture (twin primes), which suggests that x/(log x)² is the appropriate benchmark function for the twin primes. The conjecture is

    π₂(x) ∼ 2C₂ ∫₂ˣ dt/(log t)²,    (1919.3)

in which

    C₂ = ∏_{p≥3} p(p − 2)/(p − 1)² = 0.660161815 . . .

is the twin primes constant [3]; see Figure 1. A simpler expression that is asymptotically equivalent to (1919.3) is 2C₂x/(log x)². See the comments for the 2005 entry for information about the Bateman–Horn conjecture, a wide-ranging generalization of the Hardy–Littlewood conjecture.
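The counts plotted in Figure 1 are easy to reproduce with a sieve. This sketch of ours compares π(x) and π₂(x) at x = 10⁶ against the benchmark functions above (the simple benchmarks undershoot; the logarithmic-integral form in (1919.3) is noticeably better):

```python
from math import log

def primes_up_to(n):
    """Sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p in range(2, n + 1) if sieve[p]]

x = 10 ** 6
ps = primes_up_to(x)
pset = set(ps)
pi_x = len(ps)                               # π(x)
pi2_x = sum(1 for p in ps if p + 2 in pset)  # π₂(x): p and p + 2 both prime
C2 = 0.660161815846869

print(pi_x, round(x / log(x)))                 # 78498 vs 72382
print(pi2_x, round(2 * C2 * x / log(x) ** 2))  # 8169 vs 6917
```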



Centennial Problem 1919

Proposed by Stephan Ramon Garcia, Pomona College, and Steven J. Miller, Williams College.

Let Ntwin be the set of all natural numbers whose only prime factors are twin primes. Thus, Ntwin contains 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 25, but not 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 23, 24. Note that 1 ∈ Ntwin because the set of all primes that divide 1, namely the empty set, is a subset of Ptwin, the set of twin primes! Does

    S = ∑_{n∈Ntwin} 1/n    (1919.5)

converge or diverge? If it converges, approximate the sum.

If this one is too hard to start out with, here is a closely related problem that is a little easier. Let A denote the set of all natural numbers that do not have a "9" in their decimal representations. In other words,

    A = {1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, . . .}.    (1919.6)

Does

    ∑_{n∈A} 1/n = 1 + 1/2 + 1/3 + · · · + 1/8 + 1/10 + · · ·


converge or diverge? To put this another way: have we removed enough terms from the harmonic series to obtain a series that converges?

1919: Comments

The Basel problem. In 1644, Pietro Mengoli (1626–1686) posed the famous Basel problem: evaluate

    1 + 1/4 + 1/9 + 1/16 + 1/25 + · · · .

This was solved by Euler in 1734, who provided the formula (1919.2); the problem is named after his hometown. There are now dozens of proofs of Euler's result. We present a 2015 proof by Samuel G. Moreno, which is also one of the shortest [5]. It simplifies an earlier argument of Eberhard L. Stark [10]. We require the mean value theorem for integrals: if f : [a, b] → R is continuous and g : [a, b] → R is Riemann integrable and nonnegative, then there is a c ∈ (a, b) so that

    ∫ₐᵇ f(x)g(x) dx = f(c) ∫ₐᵇ g(x) dx.



We start by proving the well-known formula

    1/2 + ∑_{k=1}^n cos kx = sin((n + 1/2)x) / (2 sin(x/2))    (1919.7)

from Fourier analysis. Euler's formula e^{ix} = cos x + i sin x, in which i² = −1, implies that

    cos x = (e^{ix} + e^{−ix})/2    and    sin x = (e^{ix} − e^{−ix})/(2i).




Convert the sum on the left-hand side of (1919.7) into a sum of complex exponentials, use the finite geometric series summation formula 1 + r + · · · + rⁿ = (1 − r^{n+1})/(1 − r), and appeal to the exponential representation of the sine function to complete the proof of (1919.7).

Now multiply both sides of (1919.7) by x² − 2πx and integrate over [0, π]. The left-hand side contributes −π³/3 + ∑_{k=1}^n 2π/k², so

    −π³/3 + ∑_{k=1}^n 2π/k² = ∫₀^π (x/2)/sin(x/2) · (x − 2π) sin((n + 1/2)x) dx.

Integrate by parts with u = (x/2)/sin(x/2) · (x − 2π) and dv = sin((n + 1/2)x) dx, so that v = −cos((n + 1/2)x)/(n + 1/2), and apply the mean value theorem for integrals to the remaining integral ∫₀^π u′(x) cos((n + 1/2)x)/(n + 1/2) dx. Since u(π) = −π²/2 and u(0) = −2π, this yields

    −π³/3 + ∑_{k=1}^n 2π/k² = [−2π + (2π − π²/2) cos((n + 1/2)ξₙ)]/(n + 1/2),    ξₙ ∈ [0, π].

Let n → ∞ and obtain

    −π³/3 + ∑_{k=1}^∞ 2π/k² = lim_{n→∞} [−2π + (2π − π²/2) cos((n + 1/2)ξₙ)]/(n + 1/2) = 0,

which is equivalent to (1919.2). This completes the proof.

Solution to the second problem. Before tackling the first (and more difficult) problem, let us address the second: the series converges. One way to interpret this result is: "most natural numbers have 9's in them." Big numbers have lots of digits, and hence a high probability of having a 9 in them somewhere. Since most numbers are "big," we expect that the set (1919.6) omits most natural numbers. Let us try to make this precise. The sum of the terms with single-digit denominators is

    1 + 1/2 + 1/3 + · · · + 1/8 < 9 · 1 = 9.

The sum of the terms with 2-digit denominators is

    1/10 + 1/11 + · · · + 1/88 < 9² · (1/10),

since there are 9² ways of getting an ordered pair of digits from the set {0, 1, 2, 3, 4, 5, 6, 7, 8} and since 1/10 is the largest summand in the group. Similarly, the sum of the terms with 3-digit denominators is less than 9³/10², and so forth. Thus, the series converges¹ and

    ∑_{n∈A} 1/n < 9(1 + 9/10 + (9/10)² + · · ·) = 9 · 1/(1 − 9/10) = 90.
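The grouping-by-digit-count bound can be checked numerically (an illustration of ours; the full sum is known to be roughly 22.9, far below the crude bound of 90):

```python
def no_nine_partial_sum(max_digits):
    """Sum 1/n over all n < 10^max_digits with no digit 9, and count them."""
    total, count = 0.0, 0
    for n in range(1, 10 ** max_digits):
        if "9" not in str(n):
            total += 1.0 / n
            count += 1
    return total, count

s, c = no_nine_partial_sum(5)
print(s, c)  # partial sums stay comfortably below the bound of 90
```

The count of surviving denominators below 10^d is 9^d − 1, so the omitted terms rapidly dominate: most natural numbers really do have 9's in them.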

Solution to the first problem. Let us get back to the first problem. The sum in (1919.5) can be written as

    S = ∏_{p∈Ptwin} (1 − 1/p)^{−1} = ∏_{p∈Ptwin} (1 + 1/p + 1/p² + · · ·).


To see this, multiply the right-hand side of the preceding equation term-by-term and use the fundamental theorem of arithmetic (this is permissible by Mertens's theorem; see the 1933 entry). Consequently,

    log S = ∑_{p∈Ptwin} log(1/(1 − 1/p)) = ∑_{p∈Ptwin} (1/p + 1/(2p²) + 1/(3p³) + · · ·)    (1919.8)

        = ∑_{p∈Ptwin} 1/p + (1/2) ∑_{p∈Ptwin} 1/p² + (1/3) ∑_{p∈Ptwin} 1/p³ + · · ·

        ≤ ∑_{p∈Ptwin} 1/p + (1/(2·3)) ∑_{p∈Ptwin} 1/p + (1/(3·3²)) ∑_{p∈Ptwin} 1/p + · · ·

        = (B − 1/5) ∑_{n=1}^∞ 1/(n·3^{n−1}) = 3(B − 1/5) log(3/2) < ∞,

so the series that defines S converges. The appearance of B − 1/5 in place of Brun's constant is due to the fact that 1/5 occurs twice in the sum (1919.1) that defines B. From (1919.8) we obtain B − 1/5 < log S, so

    5.10 ≈ e^{B−1/5} < S ≤ (3/2)^{3(B−1/5)} ≈ 13.62,

since 1.83 ≤ B ≤ 2.347 [4].

¹To be more specific, ∑_{n∈A} 1/n is a series of positive terms for which the partial sums are bounded above by 90. Thus, the monotone sequence property ensures that ∑_{n∈A} 1/n converges.

Bibliography

[1] V. Brun, La série 1/5 + 1/7 + 1/11 + 1/13 + 1/17 + 1/19 + 1/29 + 1/31 + 1/41 + 1/43 + 1/59 + 1/61 + ..., où les dénominateurs sont nombres premiers jumeaux est convergente ou finie, Bulletin des Sciences Mathématiques 43 (1919), 100–104, 124–128. http://gallica.bnf.fr/ark:/12148/bpt6k486270d.
[2] J. Maynard, Small gaps between primes, Ann. of Math. (2) 181 (2015), no. 1, 383–413, DOI 10.4007/annals.2015.181.1.7. MR3272929
[3] G. H. Hardy and J. E. Littlewood, Some problems of 'Partitio numerorum'; III: On the expression of a number as a sum of primes, Acta Math. 44 (1923), no. 1, 1–70, DOI 10.1007/BF02403921. MR1555183
[4] D. Klyve, Explicit bounds on twin primes and Brun's Constant, Thesis (Ph.D.)–Dartmouth College, ProQuest LLC, Ann Arbor, MI, 2007. MR2712414
[5] S. G. Moreno, A one-sentence and truly elementary proof of the Basel problem, http://arxiv.org/abs/1502.07667.
[6] T. Nicely, Pentium FDIV flaw (2011), http://www.trnicely.net/pentbug/pentbug.html.



[7] T. R. Nicely, Enumeration to 10^14 of the twin primes and Brun's constant, Virginia J. Sci. 46 (1995), no. 3, 195–204. See also http://www.trnicely.net/twins/twins2.html. MR1401560
[8] D. H. J. Polymath, New equidistribution estimates of Zhang type, Algebra Number Theory 8 (2014), no. 9, 2067–2199, DOI 10.2140/ant.2014.8.2067. MR3294387
[9] D. H. J. Polymath, Variants of the Selberg sieve, and bounded intervals containing many primes, Res. Math. Sci. 1 (2014), Art. 12, 83 pp., DOI 10.1186/s40687-014-0012-7. MR3373710
[10] E. L. Stark, Application of a mean value theorem for integrals to series summation, Amer. Math. Monthly 85 (1978), no. 6, 481–483, DOI 10.2307/2320072. MR0476932
[11] The Prime Pages, The List of Largest Known Primes Home Page, http://primes.utm.edu/primes/.
[12] J. Wu, Chen's double sieve, Goldbach's conjecture and the twin prime problem, Acta Arith. 114 (2004), no. 3, 215–273, DOI 10.4064/aa114-3-2. MR2071082


Waring’s Problem Introduction Godfrey Harold Hardy and John Edensor Littlewood wrote a series of influential papers concerning additive problems in number theory. The first paper in this series, published in 1920, addressed Waring’s problem [1]. For each k ≥ 2, is there an s = s(k) such that every natural number is a sum of at most s perfect k-powers? This problem, posed in 1770 by Edward Waring (1736–1798), is closely related to several other famous problems in number theory. In 1769, Euler suggested that for k ≥ 3, it is impossible to write a kth power as a sum of fewer than k nonzero kth powers. The case k = 3 is Fermat’s last theorem for the exponent 3, now known to be true; see the 1995 entry. Euler’s conjecture was disproved in 1966 by Leon J. Lander and Thomas R. Parkin, who showed that 275 + 845 + 1105 + 1335 = 1445 . What about k = 4? In 1986, Noam Elkies (1966– ) constructed an infinite sequence of counterexamples, the smallest of which is 2,682,4404 + 15,365,6394 + 18,796,7604 = 20,615,6734 . The smallest possible counterexample to Euler’s conjecture with k = 4 was provided by Roger Fry in 1988: 95,8004 + 217,5194 + 414,5604 = 422,4814 . The case k = 2 of Waring’s problem was settled in 1770 by Joseph-Louis Lagrange (1736–1813), who proved that every integer is a sum of four squares. For instance, 2 = 12 + 12 + 02 + 02


7 = 22 + 12 + 12 + 12 .
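A short dynamic program (our illustration, not from the text) makes Lagrange's theorem easy to test for small n, and also recovers the famous fact that 23 needs nine cubes:

```python
def min_powers(n, k):
    """Least number of positive k-th powers summing to n (simple DP)."""
    INF = float("inf")
    best = [0] + [INF] * n
    for m in range(1, n + 1):
        base = 1
        while base ** k <= m:
            best[m] = min(best[m], best[m - base ** k] + 1)
            base += 1
    return best[n]

print(max(min_powers(n, 2) for n in range(1, 200)))  # 4: four squares always suffice
print(min_powers(7, 2), min_powers(23, 3))           # 4 9
```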

A key ingredient in Lagrange’s proof is the four-square identity: (a21 + a22 + a23 + a24 )(b21 + b22 + b23 + b24 ) = (a1 b1 − a2 b2 − a3 b3 − a4 b4 )2 + (a1 b2 + a2 b1 + a3 b4 − a4 b3 )2 + (a1 b3 − a2 b4 + a3 b1 + a4 b2 )2 + (a1 b4 + a2 b3 − a3 b2 + a4 b1 )2 . This identity is not as “magical” as it seems; see the notes for a derivation. Do three squares suffice? No, because 7 cannot be written as a sum of three squares (try it). Lagrange proved that a natural number can be represented as the sum of three perfect squares if and only if it is not of the form 4j (8k + 7). Thus, every natural number at most 100 except for 7, 15, 23, 28, 31, 39, 47, 55, 60, 63, 71, 79, 87, 92, 95 can be written as a sum of three squares. 39



The finiteness of s(k) for all k ≥ 2 was not shown until the work of David Hilbert (1862–1943) in 1909. For k = 1, 2, 3, 4, 5, 6, 7, the optimal values of s are 1, 4, 9, 19, 37, 73, 143. For instance, each positive integer is the sum of 19 fourth powers, and there are some positive integers for which 18 fourth powers do not suffice. For most values of k, we still do not know the optimal value of s.

Hilbert's proof is existential; as originally stated it does not provide a bound on how many kth powers are needed. This was remedied by Hardy and Littlewood in their masterful paper, in which they further develop the circle method, introduced by Hardy and Srinivasa Ramanujan (1887–1920) in 1916–1917 to analyze the partition function; see the 1923 entry. This approach involves a delicate analysis of exponential sums, which we now sketch in the more modern trigonometric-polynomial formulation. If we attempt to write integers as a sum of kth powers, we might attempt to use the generating function

    f(x) = ∑_{n=0}^∞ e^{2πinᵏx},

in which e^{ix} = cos x + i sin x (this is Euler's formula). Unfortunately, the series above does not converge since its terms are each of unit magnitude and hence do not tend to zero.¹ We can avoid convergence problems altogether by considering the truncated sum

    f_N(x) = ∑_{n=0}^N e^{2πinᵏx}.

There is now a free parameter N involved; we choose N to be the number we are trying to represent as a sum of kth powers. There is no danger in doing so since if we are trying to express N as a sum of kth powers, none of the summands can be larger than N. The great insight is to consider

    f_N(x)ˢ = (∑_{n₁=0}^N e^{2πin₁ᵏx}) · · · (∑_{n_s=0}^N e^{2πin_sᵏx})    (1920.1)

        = ∑_{0≤n₁,...,n_s≤N} e^{2πi(n₁ᵏ + ··· + n_sᵏ)x}

        = ∑_{m=0}^{sNᵏ} a(m; s, N) e^{2πimx},    (1920.2)

in which a(m; s, N) is the number of ways of writing m = n₁ᵏ + · · · + n_sᵏ with each nᵢ ≤ N. This can be deduced by expanding the product (1920.1) and collecting terms involving e^{2πimx} for each m. To solve Waring's problem, we just need to show that if s is sufficiently large, then a(N; s, N) ≠ 0 for all N. Fortunately, we can isolate a(N; s, N) in (1920.2),

¹The terms of a series that converges must tend to zero. However, a series whose terms tend to zero need not converge. Consider the harmonic series 1 + 1/2 + 1/3 + · · · .



which is the number of ways of writing N as a sum of exactly s perfect kth powers:

    a(N; s, N) = ∫₀¹ f_N(x)ˢ e^{−2πiNx} dx.    (1920.3)

This is because

    ∫₀¹ f_N(x)ˢ e^{−2πiNx} dx = ∫₀¹ ∑_{m=0}^{sNᵏ} a(m; s, N) e^{2πimx} e^{−2πiNx} dx

        = ∑_{m=0}^{sNᵏ} a(m; s, N) ∫₀¹ e^{2πi(m−N)x} dx

        = ∑_{m=0}^{sNᵏ} a(m; s, N) ∫₀¹ [cos(2π(m − N)x) + i sin(2π(m − N)x)] dx.

The integrals of the sine terms are all zero; the cosine integral vanishes unless m − N = 0, in which case the integral is 1. Thus, the preceding sum collapses to (1920.3).

We must still show that the integral (1920.3) is nonzero. This is done by splitting the domain of integration into two pieces, the set M of major arcs (where the integrand is large) and the set m of minor arcs (where the integrand is small). The terms "arc" and "circle method" originate with Hardy and Littlewood, who formulated the method in terms of power series on the unit disk in the complex plane. The modern treatment recasts things in terms of truncated exponential sums and the wrapped interval [0, 1). Since f_N(x) is highly oscillatory, we expect it to exhibit massive amounts of cancellation for most values of x. For which x is there strong reinforcement? It turns out that if x = a/q is a rational number whose denominator q is small relative to N, then f_N(x) is large. If one can show that the integrals over M and m are of different orders of magnitude as N → ∞, we win. See [6] for details of this calculation and [5] for a general introduction to the circle method.

Centennial Problem 1920

Proposed by Steven J. Miller, Williams College.

Often a related problem is significantly easier to attack than the original. This is the case for the well-studied easier Waring's problem, due to Wright [8]. Given a positive integer k, is there a ν(k) such that every integer can be written as

    ε₁n₁ᵏ + · · · + ε_{ν(k)}n_{ν(k)}ᵏ,    in which ε₁, ε₂, . . . , ε_{ν(k)} ∈ {−1, 0, 1}?

Solve the easier Waring's problem and show that ν(k) ≤ 2^{k−1} + (1/2)k!.

1920: Comments

The four-square identity. Before giving the solution to the problem, let us return to the four-square identity. Let

    z₁ = a₁ + ia₂,    z₂ = a₃ + ia₄,    w₁ = b₁ + ib₂,    w₂ = b₃ + ib₄.



To obtain the four-square identity, compute the determinants of both sides of

    ( z₁  −z̄₂ ) ( w₁  −w̄₂ )   ( z₁w₁ − z̄₂w₂    −z₁w̄₂ − z̄₂w̄₁ )
    ( z₂   z̄₁ ) ( w₂   w̄₁ ) = ( z₂w₁ + z̄₁w₂    −z₂w̄₂ + z̄₁w̄₁ ).    (1920.4)

The determinants of the two matrices on the left-hand side of (1920.4) are

    |z₁|² + |z₂|² = a₁² + a₂² + a₃² + a₄²    and    |w₁|² + |w₂|² = b₁² + b₂² + b₃² + b₄²,

respectively. The determinant of the matrix on the right-hand side of (1920.4) is the right-hand side of the four-square identity. The multiplicative property of determinants completes the proof. What is really going on here? That involves quaternions, another story altogether.

Solution to the problem. Here is a quick sketch of the solution; for more details see [6]. The idea is to exploit the fact that both sums and differences are allowed. This allows us to overshoot our target number, then fall back down through subtractions. For f : N → N, let (Δf)(x) = f(x + 1) − f(x), (Δ⁽²⁾f)(x) = (Δ(Δf))(x), and so forth. Induction confirms that

    (Δ⁽ʳ⁾f)(x) = ∑_{ℓ=0}^r (−1)^{r−ℓ} C(r, ℓ) f(x + ℓ)

for r = 1, 2, . . . and that Δ⁽ᵏ⁻¹⁾xᵏ = k!x + dₖ, in which dₖ is an integer that depends on k but not on x. Thus,

    k!x + dₖ = ∑_{ℓ=0}^{k−1} (−1)^{k−1−ℓ} C(k − 1, ℓ) (x + ℓ)ᵏ

is the sum and difference of at most

    ∑_{ℓ=0}^{k−1} C(k − 1, ℓ) = 2^{k−1}

kth powers. Given N, write N − dₖ = k!z + w, in which |w| ≤ k!/2. Since 1 = 1ᵏ, we see that w is at worst the sum or difference of at most (1/2)k! kth powers. Consequently, N is a sum and difference of at most 2^{k−1} + (1/2)k! kth powers. What is the optimal number of summands?

Bibliography

[1] G. H. Hardy and J. E. Littlewood, Some problems of "Partitio numerorum", I: A new solution of Waring's problem, Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse (1920), 33–54. https://eudml.org/doc/59073.
[2] G. H. Hardy and J. E. Littlewood, Some problems of 'Partitio Numerorum': IV. The singular series in Waring's Problem and the value of the number G(k), Math. Z. 12 (1922), no. 1, 161–188, DOI 10.1007/BF01482074. http://link.springer.com/article/10.1007%2FBF01482074. MR1544511
[3] G. H. Hardy and J. E. Littlewood, Some problems of 'Partitio numerorum' (VI): Further researches in Waring's Problem, Math. Z. 23 (1925), no. 1, 1–37, DOI 10.1007/BF01506218. http://link.springer.com/article/10.1007%2FBF01506218. MR1544728



[4] G. H. Hardy and S. Ramanujan, Asymptotic Formulae in Combinatory Analysis, Proc. London Math. Soc. (2) 17 (1918), 75–115, DOI 10.1112/plms/s2-17.1.75. http://plms.oxfordjournals.org/content/s2-17/1/75.full.pdf+html. MR1575586
[5] S. J. Miller and R. Takloo-Bighash, An invitation to modern number theory, with a foreword by Peter Sarnak, Princeton University Press, Princeton, NJ, 2006. MR2208019
[6] M. B. Nathanson, Additive number theory: The classical bases, Graduate Texts in Mathematics, vol. 164, Springer-Verlag, New York, 1996. MR1395371
[7] P. Pollack, On Hilbert's solution of Waring's problem, Cent. Eur. J. Math. 9 (2011), no. 2, 294–301, DOI 10.2478/s11533-011-0009-z. MR2772425
[8] E. M. Wright, An Easier Waring's Problem, J. London Math. Soc. 9 (1934), no. 4, 267–272, DOI 10.1112/jlms/s1-9.4.267. MR1574875


Mordell’s Theorem Introduction An elliptic curve is a plane curve E determined by an equation of the form y 2 = x3 + ax + b,


in which a and b are fixed integers and the discriminant Δ = −16(4a3 + 27b2 ) is nonzero. The nonvanishing of the discriminant ensures that E has no cusps, selfintersections, or isolated points; see Figure 1. Elliptic curves have many fascinating properties; we can barely scratch the surface of this important topic.

(a) a = −2, b = 0, Δ = 512

(b) a = −1, b = 1, Δ = −368

(c) a = −3, b = 3, Δ = −2160

(d) a = 0, b = 2, Δ = −1728

(e) a = −3, b = 2, Δ = 0

(f) a = 0, b = 0, Δ = 0

Figure 1. (a)–(d) are elliptic curves. If Δ > 0, the curve is disconnected; if Δ < 0, the curve is connected. If Δ = 0, then the curve is not an elliptic curve: (e) has a self-intersection and (f) has a cusp.
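The discriminants listed in the panels of Figure 1 follow directly from Δ = −16(4a³ + 27b²):

```python
def discriminant(a, b):
    """Discriminant of the elliptic curve y^2 = x^3 + a*x + b."""
    return -16 * (4 * a ** 3 + 27 * b ** 2)

panels = [(-2, 0), (-1, 1), (-3, 3), (0, 2), (-3, 2), (0, 0)]
print([discriminant(a, b) for a, b in panels])
# [512, -368, -2160, -1728, 0, 0] -- matching panels (a)-(f) of Figure 1
```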






(a) a = −3, b = 3.    (b) a = −2, b = 1.

Figure 2. Addition of points on two elliptic curves.

Let E be an elliptic curve. A point (x, y) ∈ E with rational coordinates is a rational point of E. Amazingly, the set E(Q) of rational points on E (which includes a "point at infinity"¹) can be endowed with the structure of an abelian group. Here is a quick, if imprecise, explanation; see [8, 9] for complete details. Let P = (x₁, y₁) and Q = (x₂, y₂) be distinct rational points on E. The straight line y = αx + β that connects P and Q has rational coefficients α and β. It intersects E in a third point, R = (x₃, y₃), which is also rational. Indeed,

    (αx + β)² = x³ + ax + b    (1921.2)


is a cubic equation in x with rational coefficients and two known rational roots, x₁ and x₂. The third root, x₃, must also be rational; this can be seen by expanding the left-hand side of (x − x₁)(x − x₂)(x − x₃) = 0 and comparing it to the cubic obtained in (1921.2). Thus, y₃ = αx₃ + β is rational as well. The sum of P and Q is defined to be the reflection, with respect to the x-axis, of R; see Figure 2. To be more specific,

    P + Q = ( ((y₂ − y₁)/(x₂ − x₁))² − x₁ − x₂ ,  −((y₂ − y₁)/(x₂ − x₁))x₃ − (y₁x₂ − y₂x₁)/(x₂ − x₁) ).    (1921.3)

It is not easy to show that addition in E(Q) is associative; have a look at (1921.3) and try to prove that (P + Q) + R = P + (Q + R)! Although brute force works, a higher-level approach is to use the Riemann–Roch theorem; see the 1945 entry.

In 1921–1922 Louis Mordell (1888–1972) proved that E(Q) is a finitely generated abelian group. That is, E(Q) is isomorphic to Zʳ ⊕ T for some nonnegative integer r and some finite abelian group T. The number r is the (geometric)

¹To be more precise, one instead considers the curve y²z = x³ + axz² + bz³ in projective space; the point at "infinity" is the equivalence class of (0, 1, 0).



rank of E(Q) and the group T is the torsion subgroup of E(Q). Barry Mazur (1937– ) proved that T must be of the form Z/nZ for 1 ≤ n ≤ 10 or n = 12, or Z/2Z × Z/2nZ for n ∈ {1, 2, 3, 4} [2, 3]. Moreover, each of these groups occurs infinitely often as the torsion subgroup of an elliptic curve. It is possible for r = 0 to occur. In fact, it is conjectured that 50% of elliptic curves have rank 0 and 50% have rank 1 (in a probabilistic sense that we cannot make precise here). For instance, one can show that the elliptic curve E defined by y² = x³ − x has only four rational points:

E(Q) = {(0, 0), (1, 0), (−1, 0), ∞};

see [7, Ex. 1.1]. In this case, rank E = 0 and E(Q) is isomorphic to Z/2Z × Z/2Z. On the other hand, the elliptic curve with the largest known rank is

y² + xy + y = x³ − 120039822036992245303534619191166796374x + 504224992484910670010801799168082726759443756222911415116.

This can be put in “standard form” (1921.1) with a change of variables, although the coefficients become even larger. The rank of this curve is at least 24; the actual rank is unknown (it is suspected to be exactly 24). In addition to being of theoretical interest, elliptic curves play an important role in cryptography, factorization algorithms, and primality testing; see [10] and the references in [7]. Their group structure is far richer than the group structure arising in the cyclic groups (Z/pqZ)×, in which p and q are distinct primes, that are used in RSA (see the entry for 1977).

Centennial Problem 1921
Proposed by Stephan Ramon Garcia, Pomona College, and Steven J. Miller, Williams College.

We sketched a geometric definition of addition in E(Q). However, we glossed over some important details. Think about the following questions geometrically. If P ∈ E(Q), what is P + P? What is the additive identity in E(Q)? What is the inverse −P of P in E(Q)? Why is addition in E(Q) associative? Now consider the equation

y² = x⁴ + ax³ + bx² + cx + d.
For which choices of coefficients a, b, c, d is there an analogous definition of adding two rational points and obtaining a rational point? Or any definition for adding two rational points? More generally, what if we replace a quartic with a fixed polynomial of higher degree?

1921: Comments

The Birch and Swinnerton-Dyer conjecture. The celebrated Birch and Swinnerton-Dyer conjecture, one of the Clay Millennium Problems (see the comments for the 2000 entry), concerns the rank of elliptic curves. The 2014 Fields Medal of Manjul Bhargava (1974– ) was awarded in part for work related to this problem. The conjecture states that the geometric rank of an elliptic curve equals its analytic rank. What does this mean?



We may consider the equation (1921.1) modulo some prime p; for technical reasons, p should not divide Δ. The number of points on the elliptic curve modulo p is

N_p = 1 + #{(x, y) ∈ (Z/pZ)² : y² = x³ + ax + b},

in which the +1 is added for the “point at infinity.” Helmut Hasse (1898–1979) proved that

(1921.4)  |N_p − (p + 1)| ≤ 2√p

for every prime p that does not divide Δ. The Hasse–Weil L-function of the elliptic curve E is the function

L(E, s) = ∏_{p ∤ Δ} (1 − ((p + 1) − N_p) p^{−s} + p^{1−2s})^{−1} × ∏_{p ∣ Δ} ℓ_p(E, s)^{−1}

of the complex variable s, in which ℓ_p(E, s) is a certain polynomial in p^{−s} that does not vanish at s = 1. Hasse’s bound (1921.4) ensures that the product that defines L(E, s) converges absolutely and locally uniformly on the half plane Re s > 3/2. Consequently, L(E, s) is an analytic function of s on this region. The famed Taniyama–Shimura conjecture ensures that L(E, s) has an analytic continuation to C satisfying a certain functional equation analogous to that satisfied by the Riemann zeta function (see the 1928, 1933, 1939, 1942, 1945, 1967, and 1987 entries). The analytic rank of E is the order of the zero of L(E, s) at s = 1. The simplest version of the Birch and Swinnerton-Dyer conjecture asserts that the geometric and analytic ranks of an elliptic curve are equal. See [4, 5, 8] for more on L-functions associated to elliptic curves. See [1, 11] and the references therein for results towards the Birch and Swinnerton-Dyer conjecture and the distribution of ranks.

Bibliography

[1] M. Bhargava and A. Shankar, Ternary cubic forms having bounded invariants, and the existence of a positive proportion of elliptic curves having rank 0, Ann. of Math. (2) 181 (2015), no. 2, 587–621, DOI 10.4007/annals.2015.181.2.4. http://annals.math.princeton.edu/2015/181-2/p04. MR3275847
[2] B. Mazur, Modular curves and the Eisenstein ideal, Inst. Hautes Études Sci. Publ. Math. 47 (1977), 33–186 (1978). http://link.springer.com/article/10.1007%2FBF02684339. MR488287
[3] B. Mazur, Rational isogenies of prime degree (with an appendix by D. Goldfeld), Invent. Math. 44 (1978), no. 2, 129–162, DOI 10.1007/BF01390348. http://link.springer.com/article/10.1007%2FBF01390348. MR482230
[4] A. W. Knapp, Elliptic curves, Mathematical Notes, vol. 40, Princeton University Press, Princeton, NJ, 1992. MR1193029
[5] Á. Lozano-Robledo, Elliptic curves, modular forms, and their L-functions, Student Mathematical Library, vol. 58, American Mathematical Society, Providence, RI; Institute for Advanced Study (IAS), Princeton, NJ, 2011. IAS/Park City Mathematical Subseries. MR2757255
[6] L. J. Mordell, On the rational solutions of the indeterminate equations of the third and fourth degrees, Proc. Cambridge Philos. Soc. 21 (1922).
[7] K. Rubin and A. Silverberg, Ranks of elliptic curves, Bull. Amer. Math. Soc. (N.S.) 39 (2002), no. 4, 455–474, DOI 10.1090/S0273-0979-02-00952-7. MR1920278
[8] J. H. Silverman, The arithmetic of elliptic curves, Graduate Texts in Mathematics, vol. 106, Springer-Verlag, New York, 1986. MR817210



[9] J. H. Silverman and J. Tate, Rational points on elliptic curves, Undergraduate Texts in Mathematics, Springer-Verlag, New York, 1992. MR1171452
[10] L. C. Washington, Elliptic curves: Number theory and cryptography, 2nd ed., Discrete Mathematics and its Applications (Boca Raton), Chapman & Hall/CRC, Boca Raton, FL, 2008. MR2404461
[11] A. Wiles, The Birch and Swinnerton-Dyer conjecture, The millennium prize problems, Clay Math. Inst., Cambridge, MA, 2006, pp. 31–41. http://www.claymath.org/sites/default/files/birchswin.pdf. MR2238272


Lindeberg Condition

Introduction

This year celebrates a milestone in the history of the central limit theorem, one of the most important results in probability theory. We first introduce some of the key concepts. A continuous random variable X has density f_X if

(a) f_X(x) ≥ 0,
(b) ∫_{−∞}^{∞} f_X(x) dx = 1, and
(c) P(a ≤ X ≤ b) = ∫_a^b f_X(x) dx,

in which P(a ≤ X ≤ b) denotes the probability that X takes on a value in the closed interval [a, b]. This leads to one of the most important applications of integration: it allows us to compute probabilities. The nth moment of a random variable X with density f_X, also called the expected value of Xⁿ, is

E[Xⁿ] = ∫_{−∞}^{∞} xⁿ f_X(x) dx.

The two most important moments are the mean

μ_X = E[X] = ∫_{−∞}^{∞} x f_X(x) dx

and the variance (the second centered moment of f_X)

σ_X² = E[(X − μ_X)²] = ∫_{−∞}^{∞} (x − μ_X)² f_X(x) dx = E[X²] − E[X]².

When the random variable X is clear from context, we often simplify the notation and write μ, σ, and f in place of μ_X, σ_X, and f_X, respectively. The mean is the average value of a random variable. The standard deviation, the square root of the variance, is the natural scale to measure fluctuations from the mean. If you assign units to the random variable, say meters, then the mean and the standard deviation are both in meters while the variance is in meters-squared. If we want to have confidence intervals about a measurement, then the units of the uncertainty should be the same as the units of the random variable. This is why the standard deviation is the natural quantity considered in many problems. There are many densities that arise in theory and applications. The normal distribution occupies a central role in the subject; we will see why shortly.

Figure 1. Plots of Gaussians with variance 1 and 1/2.

A random variable X is normally distributed with mean μ and variance σ² if its density is

e^{−(x−μ)²/(2σ²)} / √(2πσ²);

see Figure 1 for two representative plots. We also call this density a Gaussian or a bell curve. We can standardize a random variable X by passing to (X − μ_X)/σ_X, which has mean 0 and variance 1. The central limit theorem says that for appropriate random variables X_i, if we set Y_n = X₁ + · · · + X_n, then Z_n = (Y_n − μ_{Y_n})/σ_{Y_n} converges to being normally distributed as n → ∞. For example, draw each X_i from a uniform random variable on [−1/2, 1/2]. Then

f_{Z₄}(z) =
  (√3 z³ + 18z² + 36√3 z + 72)/(54√3)    if −2√3 < z ≤ −√3,
  (−√3 z³ − 6z² + 12)/(18√3)             if −√3 < z < 0,
  2/(3√3)                                 if z = 0,
  (√3 z³ − 6z² + 12)/(18√3)              if 0 < z < √3,
  (−√3 z³ + 18z² − 36√3 z + 72)/(54√3)   if √3 ≤ z < 2√3.
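As a numerical sanity check (our own sketch; it hard-codes the piecewise cubic above, written for the standardized sum of four uniform variables), the formula integrates to 1 over its support:

```python
import math

SQRT3 = math.sqrt(3.0)

def f(z):
    """Piecewise cubic density of the standardized sum of four uniforms."""
    az = abs(z)                    # the density is an even function of z
    if az >= 2 * SQRT3:
        return 0.0
    if az >= SQRT3:                # the two outer pieces
        return (-SQRT3 * az**3 + 18 * az**2 - 36 * SQRT3 * az + 72) / (54 * SQRT3)
    return (SQRT3 * az**3 - 6 * az**2 + 12) / (18 * SQRT3)  # the middle pieces

# Composite Simpson's rule over the support [-2*sqrt(3), 2*sqrt(3)].
n = 2000                           # an even number of subintervals
a, b = -2 * SQRT3, 2 * SQRT3
h = (b - a) / n
total = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, n))
integral = total * h / 3
```

The value at z = 0 is 2/(3√3) ≈ 0.385, already close to the standard normal density 1/√(2π) ≈ 0.399 at the origin.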

Although the preceding formula is exact, it is hard to work with. The central limit theorem permits us to approximate this exact, but hard to manipulate, expression with the density of a normal distribution; see Figure 2. In particular, observe how rapid the convergence is. The central limit theorem has a long and rich history. It has been a perennial quest to find the weakest possible conditions that suffice to ensure convergence to normality.

Figure 2. Plots of normalized sums Z_n of n copies of the uniform random variable on [−1/2, 1/2] versus the standard normal: (a) Z₁, (b) Z₂, (c) Z₄, (d) Z₈.

Typically the first version students encounter has the random variables identically distributed, and the even moments

m_{2k} = ∫_{−∞}^{∞} x^{2k} f_X(x) dx

growing sufficiently slowly so that

Σ_{k=0}^{∞} (m_{2k}/(2k)!) t^{2k}

converges for all |t| < δ for some δ > 0. This comes from a desire to have the moment generating function

M_X(t) = E[e^{tX}] = ∫_{−∞}^{∞} e^{tx} f_X(x) dx

converge for t in an open neighborhood of the origin. Unfortunately, the moment generating function is not necessarily well-defined, even for some common random variables. This occurs, for example, with the Cauchy random variable, whose density is

(1/π) · 1/(1 + x²).

Thus, one often studies the closely related characteristic function

φ_X(t) = E[e^{itX}] = ∫_{−∞}^{∞} e^{itx} f_X(x) dx,

essentially the Fourier transform of f_X, which always exists. In 1922 Jarl Lindeberg (1876–1932) proved that a certain set of conditions on the X_i forces convergence to a normal distribution. Specifically, consider the following situation.

Central Limit Theorem: Let X_k be random variables on a probability space, and assume that the means μ_{X_k} and variances σ_{X_k}² exist and are finite. Let

I(|X_k| ≥ ε s_n) = 1 if |X_k| ≥ ε s_n, and 0 otherwise,

and let E[·] denote expectation relative to the underlying probability space. If s_n² = Σ_{k=1}^{n} σ_{X_k}² and for all ε > 0 we have

lim_{n→∞} (1/s_n²) Σ_{k=1}^{n} E[(X_k − μ_{X_k})² I(|X_k − μ_{X_k}| ≥ ε s_n)] = 0,

then Z_n converges to a Gaussian. If we additionally assume max_k σ_{X_k}²/s_n² → 0, then this condition is also necessary.
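The theorem is easy to probe by simulation. A minimal Monte Carlo sketch (ours; the seed and sample sizes are arbitrary) standardizes sums of uniform random variables, which certainly satisfy Lindeberg's condition:

```python
import random
import statistics

random.seed(1922)  # arbitrary seed, for reproducibility

def sample_Z(n):
    """One draw of Z_n = (X_1 + ... + X_n)/s_n, X_i uniform on [-1/2, 1/2]."""
    s_n = (n / 12.0) ** 0.5        # s_n^2 is the sum of the n variances, each 1/12
    return sum(random.uniform(-0.5, 0.5) for _ in range(n)) / s_n

draws = [sample_Z(8) for _ in range(20000)]
mean = statistics.fmean(draws)
stdev = statistics.pstdev(draws)
# For a standard normal, about 68.27% of the mass lies in [-1, 1].
inside = sum(abs(z) <= 1 for z in draws) / len(draws)
```

Already at n = 8 the sample mean, standard deviation, and central mass are close to the Gaussian values 0, 1, and 0.6827.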

Centennial Problem 1922
Proposed by Steven J. Miller, Williams College.

Instead of caring about the sum Y_n = X₁ + · · · + X_n, suppose that we only care about its value modulo 1; that is, we are concerned with Y_n − ⌊Y_n⌋, in which ⌊·⌋ denotes the greatest integer function. This expression cannot converge to a Gaussian since it is only nonzero on [0, 1). What do you expect this sum to converge to? What is the most general set of conditions required to ensure convergence?

1922: Comments

A useful trick. By taking logarithms we can convert questions about products of random variables to questions about sums of related random variables. If Y_i = log_B X_i, then to understand the distribution of the product X₁ · · · X_n it suffices to determine the distribution of Y₁ + · · · + Y_n and then exponentiate. In many situations Lindeberg's conditions hold for the Y_i and as n tends to infinity the sum is approximately a Gaussian. Since we have not standardized this sum, we expect the variance of the Gaussian to tend to infinity as we add more and more terms. Given a positive real number r, we may write it uniquely as r = S₁₀(r)·10^{k(r)}, where S₁₀(r) ∈ [1, 10) is the significand and k(r) is an integer. Computing the sum Y₁ + · · · + Y_n modulo 1, in which Y_i = log₁₀ X_i, is equivalent to determining the significand of the product X₁ · · · X_n, which gives the first nonzero digit of this product when rounded down. See [4, 6] for proofs that the sum converges to the uniform distribution, as well as applications to Benford's law; see the 1938 entry. In addition to being of theoretical interest, such probabilistic digit laws are frequently used in a variety of fields. The Internal Revenue Service (IRS) uses them to detect tax fraud and computer scientists use them to optimize systems architecture.
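The logarithm trick is easy to see numerically. In this sketch (our illustration; the factor distribution and seed are arbitrary choices) we reduce log₁₀ of a product modulo 1 and tally leading digits against Benford's prediction log₁₀(1 + 1/d):

```python
import math
import random

random.seed(1938)  # arbitrary seed

# Multiply 50 positive random factors; log10 of the product modulo 1
# equidistributes, so the significand obeys Benford's law:
# P(leading digit = d) = log10(1 + 1/d).
trials = 5000
counts = {d: 0 for d in range(1, 10)}
for _ in range(trials):
    log_product = sum(math.log10(random.uniform(0.5, 10.0)) for _ in range(50))
    significand = 10 ** (log_product % 1.0)   # S_10 of the product, in [1, 10)
    counts[int(significand)] += 1

observed_one = counts[1] / trials             # Benford predicts log10(2) ~ 0.301
```

Leading digit 1 appears about 30% of the time and digit 9 under 5% of the time, as Benford's law predicts.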



Bernoulli random variables. The linearity of expectation frequently simplifies the evaluation of a sum. We say that X is a Bernoulli random variable with parameter p ∈ [0, 1] if X assumes the value 1 with probability p and 0 with probability 1 − p. If X₁, . . . , X_n are independent Bernoulli random variables, then S_n = X₁ + · · · + X_n is a binomial random variable (with parameters p and n). This random variable assumes values in {0, 1, . . . , n}, and the probability that S_n assumes the value k is

\binom{n}{k} p^k (1 − p)^{n−k}.

To calculate the mean of S_n from the definition, we need to evaluate

E[S_n] = Σ_{k=0}^{n} k \binom{n}{k} p^k (1 − p)^{n−k},

which is a tedious exercise. However, the linearity of expectation provides an easier evaluation. Since each X_k is a Bernoulli random variable with parameter p, E[X_k] = 1·p + 0·(1 − p) = p, and hence

E[S_n] = E[X₁ + · · · + X_n] = E[X₁] + · · · + E[X_n] = p + p + · · · + p  (n times) = np.

The central limit theorem ensures that, when appropriately normalized, S_n approaches a Gaussian distribution; see Figure 3.
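The two evaluations of E[S_n] are easy to compare by machine; a short sketch with arbitrarily chosen parameters:

```python
from math import comb

n, p = 20, 0.3  # arbitrary illustrative parameters

# The "tedious" route: evaluate E[S_n] straight from the definition.
direct_mean = sum(k * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))

# The linearity-of-expectation route: n Bernoulli means, each equal to p.
linear_mean = n * p

# The same comparison for the variance, which should equal np(1 - p).
direct_var = sum(
    (k - linear_mean) ** 2 * comb(n, k) * p**k * (1 - p) ** (n - k)
    for k in range(n + 1)
)
```

Both routes give the mean np (here 6) and the variance np(1 − p) (here 4.2), up to floating-point error.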

Figure 3. Convergence of a binomial random variable to a Gaussian: (a) n = 10, (b) n = 20, (c) n = 50, (d) n = 100.



Bibliography

[1] P. Billingsley, Probability and measure, 3rd ed., A Wiley-Interscience Publication, Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, Inc., New York, 1995. MR1324786
[2] J. W. Lindeberg, Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung (German), Math. Z. 15 (1922), no. 1, 211–225, DOI 10.1007/BF01494395. http://link.springer.com/article/10.1007%2FBF01494395. MR1544569
[3] W. Feller, An introduction to probability theory and its applications. Vol. II, 2nd ed., John Wiley & Sons, Inc., New York-London-Sydney, 1971. MR0270403
[4] S. J. Miller (ed.), Benford's Law: theory and applications, Princeton University Press, Princeton, NJ, 2015. MR3408774
[5] S. J. Miller, The probability lifesaver: All the tools you need to understand chance, Princeton Lifesaver Study Guide, Princeton University Press, Princeton, NJ, 2017. MR3585480
[6] S. J. Miller and M. J. Nigrini, The modulo 1 central limit theorem and Benford's law for products, Int. J. Algebra 2 (2008), no. 1-4, 119–130. http://arxiv.org/pdf/math/0607686v2. MR2417189


The Circle Method

Introduction

In 1923 G. H. Hardy and J. E. Littlewood published a landmark paper [3] that further developed the celebrated circle method; see the 1920 entry. This method is well represented in the early years of this book. This is no accident, since it has enjoyed spectacular success in resolving difficult problems in number theory. Their paper attacked many questions in additive number theory, including the ternary Goldbach conjecture, the twin prime conjecture, and the distribution of admissible tuples of primes. The first two problems are discussed in the 1937 and 1919 entries, respectively. We discuss the third problem below. The expressions n, n + 2, and n + 4 are simultaneously prime if and only if n = 3; this yields the primes 3, 5, and 7. Indeed, n, n + 2, and n + 4 are congruent to n, n + 2, and n + 1 modulo 3, respectively. Therefore, exactly one of these numbers is divisible by 3, which leads to only one prime triple, (3, 5, 7). This congruence obstruction prevents (n, n + 2, n + 4) from being a triple of primes infinitely often. On the other hand, there is no congruence obstruction that prevents n and n + 2 from being simultaneously prime. The twin prime conjecture suggests that this occurs infinitely often. Hardy and Littlewood conjectured that if there are no congruence obstructions that prevent a particular k-tuple of primes from occurring, then there are infinitely many such k-tuples. They also gave asymptotic formulas for the expected number of k-tuples of primes below a certain threshold. The main term is the product of a function asymptotic to x/(log x)^k and a constant that depends on the k − 1 neighbor differences and vanishes if there is a congruence obstruction. For example, they predicted that the number π₂(x) of twin primes at most x obeys

π₂(x) ∼ 2 ∏_{p>2} p(p − 2)/(p − 1)² · ∫₂ˣ dt/(log t)²


as x → ∞ (see the comments for the 2005 entry for information about the Bateman–Horn conjecture, a broad generalization). The formula does a phenomenal job; the number of twin primes at most 10¹⁶ is 10,304,195,697,298, which differs from the Hardy–Littlewood prediction by 3,142,802. Although the error might seem large, it is small in terms of percentages. The ratio of the error to the true value is about 3 × 10⁻⁷ (and we believe the ratio gets smaller the higher up we go). To put this in the proper perspective, MapQuest lists it as 2,990.1 miles from Fenway Park in Boston to Dodger Stadium in Los Angeles. A similarly precise measurement here would correspond to an error of about 5 feet (about a third of the length of a typical car).
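The prediction above can be reproduced numerically. The sketch below (our own; the truncation points are arbitrary) approximates the twin prime constant 2∏_{p>2} p(p − 2)/(p − 1)² with primes up to 10⁶ and evaluates ∫₂^{10¹⁶} dt/(log t)² by Simpson's rule after substituting t = eᵘ:

```python
import math

def primes_up_to(n):
    """Simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n**0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(sieve[p * p :: p]))
    return [p for p in range(2, n + 1) if sieve[p]]

# Truncated twin prime constant 2 * prod_{p > 2} p(p - 2)/(p - 1)^2.
C2 = 2.0
for p in primes_up_to(10**6):
    if p > 2:
        C2 *= p * (p - 2) / (p - 1) ** 2

def simpson(g, a, b, n):
    """Composite Simpson's rule; n must be even."""
    h = (b - a) / n
    s = g(a) + g(b) + sum((4 if k % 2 else 2) * g(a + k * h) for k in range(1, n))
    return s * h / 3

# Integral of dt/(log t)^2 from 2 to 10^16; substitute t = e^u so the
# integrand becomes e^u/u^2.
integral = simpson(lambda u: math.exp(u) / u**2, math.log(2), math.log(1e16), 100000)
prediction = C2 * integral          # Hardy-Littlewood estimate of pi_2(10^16)
```

The result lands within a tiny fraction of a percent of the true count 10,304,195,697,298 quoted above.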



Ben Green and Terence Tao extended a seminal result of Endre Szemerédi (1940– ) and established that the primes contain arbitrarily long arithmetic progressions [2]; see the 1975 and 2004 entries. That is, given ℓ, there exist a and b so that an + b is prime for n = 1, 2, . . . , ℓ. This differs from Dirichlet's theorem on primes in arithmetic progressions (see the 1913 entry), in which one fixes relatively prime a and b and then concludes that the set {an + b : n = 1, 2, . . .} contains infinitely many primes (along with many composite numbers). The Green–Tao theorem is a consequence of the more far-reaching Hardy–Littlewood k-tuple conjecture, which is currently beyond our reach.

Centennial Problem 1923
Proposed by Steven J. Miller, Williams College.

Let us consider a significantly easier problem. Prove, without using the Green–Tao theorem, that there are infinitely many triples of primes of the form (p, p + 2m_p, p + 4m_p), in which m_p is a constant that depends upon p. Here the difference between neighboring primes can depend on the first prime in the sequence. That is, we do not require that the triples have the same common differences. For example, the triples (11, 17, 23), which has a common difference of 6, and (29, 41, 53), which has a common difference of 12, are acceptable. More generally, consider a set A of positive integers and let A(x) = |A ∩ {1, 2, . . . , x}|. For the set of prime numbers, A(x) ∼ x/log x by the prime number theorem. For the set of perfect squares, A(x) ∼ √x. The quotient A(x)/x can be interpreted as the density of the set A in {1, 2, . . . , x}. The hope is that if the density of A is sufficiently high as x → ∞, then A should contain infinitely many triples in arithmetic progression. On the other hand, if the density remains low, then there may not be infinitely many triples.

(a) Find a function g(x) such that if A(x) ≥ g(x), then there are infinitely many triples (n, n + m_n, n + 2m_n) in A, in which m_n depends on n.
(b) Find a function h(x) such that if A(x) ≤ h(x), then it is possible for A to lack infinitely many such triples.

(c) Of course, one could take g(x) = x and h(x) = 2. While the analysis is easy in these extreme cases, the goal is to find the best functions possible. What are the smallest g(x) and the largest h(x) that we can take and still successfully resolve the problem?

(d) If you cannot find a function that yields infinitely many triples, can you at least ensure the existence of one, or find a special sequence so that there are no triples?

1923: Comments

The power of counting. The centennial problem illustrates another important idea: the power of counting arguments. Brun proved his theorem on the



convergence of the sum of the reciprocals of the twin primes by showing there is a C > 0 such that

π₂(x) ≤ Cx ((log log x)/(log x))²

for sufficiently large x. Brun's estimate permits us to obtain an upper bound on the number of the twin primes in each interval of the form [2ⁿ, 2ⁿ⁺¹). The reciprocals of such primes are at most 1/2ⁿ, from which one can show that the sum of the reciprocals of the twin primes converges. Suppose that p is prime, p ∤ a, and p ∤ b. For a given integer c, are there any points (x, y) on the “ellipse” ax² + by² ≡ c (mod p)? Although the expression appears familiar, we are working modulo p and things become hard to visualize. The answer involves a beautiful counting argument. First observe that there are exactly (p + 1)/2 distinct values modulo p assumed by x², namely 0 and (p − 1)/2 nonzero values. This is because

x² ≡ y² (mod p)  if and only if  x ≡ y (mod p) or x ≡ −y (mod p).
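Both the count of squares and the conclusion reached below are easy to confirm empirically; a small sketch (ours; the primes and coefficients are arbitrary choices):

```python
# Count the distinct values of x^2 modulo p for several odd primes, and verify
# that a*x^2 + b*y^2 takes every residue class modulo p.
square_counts = {}
for p in [3, 5, 7, 11, 101, 1009]:
    square_counts[p] = len({x * x % p for x in range(p)})

p, a, b = 101, 3, 7
values = {(a * x * x + b * y * y) % p for x in range(p) for y in range(p)}
covers_everything = len(values) == p   # every c is hit, so the "ellipse" is nonempty
```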

By hypothesis, a is invertible modulo p and hence ax² also assumes (p + 1)/2 distinct values modulo p. Similarly, −by² + c assumes (p + 1)/2 distinct values as well. If there did not exist x, y such that ax² ≡ −by² + c (mod p), then there would be at least

(p + 1)/2 + (p + 1)/2 = p + 1

distinct residue classes modulo p, which is impossible. Thus, ax² + by² ≡ c (mod p) has a solution.

The partition function. Although generally attributed to Hardy and Littlewood, the basic ideas of the circle method originated in Hardy and Ramanujan's work on the partition function a few years earlier [4]. The partition function p(n) counts the number of ways to write n as a sum of nonincreasing integers. For example, p(4) = 5 since 4 = 3 + 1 = 2 + 2 = 2 + 1 + 1 = 1 + 1 + 1 + 1. Hardy and Ramanujan investigated properties of the generating function

(1923.1)  Σ_{n=0}^{∞} p(n) zⁿ = ∏_{k=1}^{∞} 1/(1 − z^k)

to study the asymptotic behavior of p(n). To see why (1923.1) is correct, expand each term in the product as a geometric series: (1 − z^k)⁻¹ = Σ_{j=0}^{∞} z^{jk}. Then multiply the product of these series term-by-term. When the terms are gathered together, the coefficient of zⁿ will be p(n). If multiplying together infinitely many infinite series makes you feel queasy, look at the derivation of the Euler product formula in 1933. We give a rigorous derivation there that has a similar flavor. In 1918, Hardy and Ramanujan proved that

p(n) ∼ (1/(4n√3)) exp(π √(2n/3));



that is, the quotient of these two expressions tends to 1 as n → ∞. Ramanujan also discovered some fascinating divisibility properties of p(n). He showed that

p(5k + 4) ≡ 0 (mod 5),
p(7k + 5) ≡ 0 (mod 7),


p(11k + 6) ≡ 0 (mod 11),

for k = 0, 1, 2, . . . . Until recently, only a few other “simple” congruence relations of the type Ramanujan provided had been discovered. In retrospect, this is not surprising since we now know, for example, that

p(711647853449k + 485138482133) ≡ 0 (mod 13)

and

p(28995244292486005245947069k + 28995221336976431135321047) ≡ 0 (mod 29);

see [5, 8]. The numbers involved in these expressions are so large that they could not have been found by computation; they required a deep understanding of the theory of modular forms. Ken Ono (1968– ), who proved that such congruences exist for every prime modulus, is particularly fond of his discovery that

p(4063467631k + 30064597) ≡ 0 (mod 31);

it appears at the top of his homepage. For more information about the partition function and its properties, the reader is strongly encouraged to read the expository article [1] and the references therein.

Bibliography

[1] S. Ahlgren and K. Ono, Addition and counting: the arithmetic of partitions, Notices Amer. Math. Soc. 48 (2001), no. 9, 978–984. http://www.ams.org/notices/200109/fea-ahlgren.pdf. MR1854533
[2] B. Green and T. Tao, The primes contain arbitrarily long arithmetic progressions, Ann. of Math. (2) 167 (2008), no. 2, 481–547, DOI 10.4007/annals.2008.167.481. arXiv:math.NT/0404188. MR2415379
[3] G. H. Hardy and J. E. Littlewood, Some problems of ‘Partitio numerorum’; III: On the expression of a number as a sum of primes, Acta Math. 44 (1923), no. 1, 1–70, DOI 10.1007/BF02403921. http://link.springer.com/article/10.1007%2FBF02403921. MR1555183
[4] G. H. Hardy and S. Ramanujan, Asymptotic formulae in combinatory analysis, Proc. London Math. Soc. (2) 17 (1918), 75–115, DOI 10.1112/plms/s2-17.1.75. MR1575586
[5] F. Johansson, Efficient implementation of the Hardy-Ramanujan-Rademacher formula, LMS J. Comput. Math. 15 (2012), 341–359, DOI 10.1112/S1461157012001088. MR2988821
[6] S. J. Miller and R. Takloo-Bighash, An invitation to modern number theory, with a foreword by Peter Sarnak, Princeton University Press, Princeton, NJ, 2006. MR2208019
[7] M. B. Nathanson, Additive number theory: The classical bases, Graduate Texts in Mathematics, vol. 164, Springer-Verlag, New York, 1996. MR1395371
[8] K. Ono, Distribution of the partition function modulo m, Ann. of Math. (2) 151 (2000), no. 1, 293–307, DOI 10.2307/121118. MR1745012


The Banach–Tarski Paradox

Introduction

A mathematical paradox often leads to a reevaluation of underlying assumptions. A good paradox can greatly influence the development of a subject. The Banach–Tarski paradox is one of the best: there exists a partition of the unit ball in R³ into a finite number of subsets that can be rearranged, using rigid motions, to yield two identical copies of the original ball. In other words, you can dissect an orange and reassemble it into two full-sized oranges; see Figure 1. This is impossible in the real world, hence the word “paradox.” Whereas real oranges are made up of atoms and are cut with a knife, mathematical oranges are made up of infinitely many points that can be partitioned into extremely complicated sets. We can choose which subset each point should belong to, with no regard

Figure 1. The Banach–Tarski paradox asserts that one can partition a ball in R³ into a finite number of disjoint sets that can be rearranged, using rigid motions, to form two balls identical to the original. Raphael Robinson (1911–1995) proved that this can be accomplished with five pieces; fewer than five pieces will not suffice [3].



for nearby points (that is, ignoring continuity). One should not think of the pieces involved as solid pieces that can be handled like everyday objects. It is best to think of the pieces as, perhaps, “dense gas clouds.” The actual construction is more subtle, but it does involve making infinitely many arbitrary choices. This is permitted by the axiom of choice, a topic that deserves a healthy digression; see the 1940 and 1999 entries. Stefan Banach (1892–1945) and Alfred Tarski (1901–1983) actually proved much more. For n ≥ 3, given any two bounded subsets E and F of Rⁿ having nonempty interior, there are partitions E = ⋃_{i=1}^{N} E_i and F = ⋃_{i=1}^{N} F_i into disjoint sets so that the sets E_i and F_i are congruent (in the geometric sense) for i = 1, 2, . . . , N. So for n ≥ 3, it is impossible to find a finitely additive, normalized “volume function,” defined on all subsets of Rⁿ, that is invariant under rigid motions. This defeats any attempt to assign a “volume” to all subsets of Rⁿ. We briefly sketch the main ideas behind “doubling the ball” in R³. Let SO₃ = SO₃(R) denote the group of all 3 × 3 real orthogonal matrices with determinant 1. That is, SO₃ is the set of rigid motions of R³ that fix the origin and preserve orientation. The crucial observation is that SO₃ contains a subgroup that is “isomorphic to the free group on two generators.” In less technical language, SO₃ contains two matrices A and B for which there are no nontrivial relationships between A, A⁻¹, B, and B⁻¹, apart from A⁻¹A = I, BB⁻¹ = I, B⁻¹AA⁻¹B = I, and so forth. For example,

A = ⎡ 3/5  −4/5   0  ⎤      B = ⎡ 1    0     0  ⎤
    ⎢ 4/5   3/5   0  ⎥          ⎢ 0   3/5  −4/5 ⎥
    ⎣  0     0    1  ⎦          ⎣ 0   4/5   3/5 ⎦

are such matrices (the proof is nontrivial and involves elements of Diophantine approximation). These matrices induce rotations by θ = tan⁻¹(4/3), which is not a rational multiple of π, with respect to the z- and x-axes, respectively.
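Whether a given pair of rotations satisfies some short relation can be tested mechanically. The sketch below (our own finite check, which is evidence of freeness rather than a proof) multiplies out every reduced word of length at most 6 in A, A⁻¹, B, B⁻¹ with exact rational arithmetic and confirms that none equals the identity:

```python
from fractions import Fraction
from itertools import product

F = Fraction
A = ((F(3, 5), F(-4, 5), F(0)), (F(4, 5), F(3, 5), F(0)), (F(0), F(0), F(1)))
B = ((F(1), F(0), F(0)), (F(0), F(3, 5), F(-4, 5)), (F(0), F(4, 5), F(3, 5)))

def mat_mul(X, Y):
    return tuple(
        tuple(sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )

def inverse(X):
    # The generators are rotations, so the inverse is the transpose.
    return tuple(tuple(X[j][i] for j in range(3)) for i in range(3))

identity = ((F(1), F(0), F(0)), (F(0), F(1), F(0)), (F(0), F(0), F(1)))
gens = {"A": A, "a": inverse(A), "B": B, "b": inverse(B)}
opposite = {"A": "a", "a": "A", "B": "b", "b": "B"}

# Multiply out every reduced word of length <= 6; exact Fraction arithmetic
# means there is no roundoff. Freeness predicts no word equals the identity.
hits = 0
for length in range(1, 7):
    for word in product("AaBb", repeat=length):
        if any(word[i] == opposite[word[i + 1]] for i in range(length - 1)):
            continue                        # skip words that are not reduced
        M = identity
        for letter in word:
            M = mat_mul(M, gens[letter])
        if M == identity:
            hits += 1
```

No nontrivial reduced word of length up to 6 evaluates to I, consistent with A and B generating a free group.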
Let Γ ⊂ SO₃ denote the group generated by A and B; it consists of words in A, A⁻¹, B, and B⁻¹, such as AB²B⁻²A⁻¹A²B. This word is not reduced because further cancellation is possible; it reduces to A²B. Let w(A) be the set of all reduced words that begin with A, and so on, and let

Γ₁ = w(A) ∪ {I, A⁻¹, A⁻², A⁻³, . . .},
Γ₂ = w(A⁻¹) \ {A⁻¹, A⁻², A⁻³, . . .},
Γ₃ = w(B), and
Γ₄ = w(B⁻¹).

We have the paradoxical decomposition

Γ = Γ₁ ∪ Γ₂ ∪ Γ₃ ∪ Γ₄ = Γ₁ ∪ AΓ₂ = Γ₃ ∪ BΓ₄,

which facilitates “doubling the ball.” Let S² = {x ∈ R³ : ‖x‖ = 1} denote the surface of the unit sphere in R³; the exponent 2 denotes that S² is a “two-dimensional manifold” (a microscopic observer on the surface of the sphere will think that they are in R²; see the 2003 entry for more information). Define a relation on S² by saying that x ∼ y if there is a C ∈ Γ so that Cx = y. This is an equivalence relation: ∼ is reflexive since I ∈ Γ; ∼ is symmetric since Γ is closed under inversion; ∼ is transitive since matrix multiplication is associative.



The relation ∼ partitions S² into equivalence classes, which are necessarily disjoint. Form a set X by selecting one element from each equivalence class. Since S² = ΓX, we have the decompositions

S² “=” Γ₁X ⊔ Γ₂X ⊔ Γ₃X ⊔ Γ₄X

and

S² “=” Γ₁X ⊔ AΓ₂X = Γ₃X ⊔ BΓ₄X.

Now join each point on the sphere to 0 and consider the corresponding decompositions to obtain the Banach–Tarski paradox for the unit ball in R³. The quotation marks indicate that these equalities are not entirely legitimate. We have not discussed the countably many points of S² (the “poles” of some rotation) that are fixed by some element of Γ. Nor have we discussed what happens to 0. We do not explore the details further, although one can see that the introduction of additional “pieces” in some elaborate manner is involved.

Centennial Problem 1924
Proposed by Stephen Bigelow, University of California, Santa Barbara.

Perhaps the most fruitful legacy of the Banach–Tarski paradox is in group theory. Inspired by the paradox, John von Neumann (1903–1957) defined amenable groups, which are groups that do not admit a paradoxical decomposition. More precisely, a discrete group is amenable if and only if it has a finitely additive, left-invariant probability measure. Such a measure gives a reasonable notion of “volume,” exactly the concept that the Banach–Tarski paradox appears to violate. The Thompson group F was introduced by Richard Thompson in 1965 and has unusual properties that make it a good source of counterexamples. It can be defined as the group of piecewise-linear bijections from the unit interval to itself for which all nondifferentiable points are dyadic rational numbers and all slopes are powers of two. Is the Thompson group amenable? The question of its amenability is still controversial. A preprint by E. T. Shavgulidze [4] claims to show it is amenable, and one by Azer Akhmedov [1] claims to show it is not. The consensus seems to be that both preprints contain serious gaps, and the correct answer is not clear.

1924: Comments

Why three dimensions? The Banach–Tarski paradox dealt with a solid ball in R³. What happens if we work with a disk in R²? Recall that SO₃ contains a subgroup isomorphic to the free group on two generators. The group SO₂ of 2 × 2 orthogonal matrices with determinant 1 does not contain such a subgroup. Indeed, SO₂ is the group of all rotations of R² around the origin. Rotations in R² commute with each other, so SO₂ is an abelian (commutative) group. It is too nice for something like the “paradoxical decomposition” required for the Banach–Tarski paradox to occur.

The problem of measure. Here is a closely related paradox that helped launch the field of measure theory. Suppose that we wish to define a function m : P(R) → [0, ∞] that assigns a “length” to each subset of R (recall that P(R) is the set of all subsets of R; see p. 31). Our intuition tells us such a function should



satisfy the following three axioms:   (a) m (a, b) = b − a if a < b. (b) If An is a sequence of disjoint subsets of R, then ∞ ∞ *   m(An ) = m An . n=1


(c) If x + A = {x + a : a ∈ A}, then m(x + A) = m(A) for all x in R. A function m : P(R) → [0, ∞] that satisfies these properties would be fundamental to analysis, topology, and geometry. Unfortunately, no such function exists! Suppose toward a contradiction that m : P(R) → [0, ∞] satisfies (a), (b), and (c). One can show that the relation x∼y


x−y ∈Q

on the open interval (0, 1) is an equivalence relation. Thus, (0, 1) is the disjoint union of equivalence classes Eα , in which α runs over some index set I. Let S ⊆ (0, 1) contain exactly one element sα from each equivalence class Eα , let r1 , r2 , r3 , . . . be an enumeration of Q ∩ (−1, 1), and let Sn = rn + S. Because ∼ partitions (0, 1) into equivalence classes, each x in (0, 1) belongs to some Eα . Since S contains a representative from each equivalence class, there is an sα ∈ S so that x ∼ sα . Thus, x − sα = rn for some n since |x − sα | < 1. We conclude that ∞ * Sn , x = rn + s α ∈ S n ⊆ n=1

and hence (0, 1) ⊆

∞ *

Sn .


If x ∈ Si ∩ Sj , then x = ri + s α = rj + s β for some sα , sβ ∈ S. Consequently, sα − sβ = rj − ri ∈ Q, so sα ∼ sβ . Since S contains exactly one element from each equivalence class, sα = sβ , from which ri = rj and Si = Sj follow. Thus, the Sn are disjoint. A consequence of axiom (b) is that m is a monotone set function: A ⊆ B =⇒ m(A) ≤ m(B). Since Sn ⊆ (−1, 2) for all n, (0, 1) ⊆

∞ *

Sn ⊆ (−1, 2)


and hence

∞  *     1 = m (0, 1) ≤ m Sn ≤ m (−1, 2) = 3. n=1

The translation invariance of m ensures that m(Sn ) = m(S) for all n, so 1≤


m(S) ≤ 3.


Since m(S) is constant, the preceding is impossible. Thus, no such m exists.



The set S constructed above is an example of a Vitali set, named after Giuseppe Vitali (1875–1932). These sets are so strange that there is no reasonable way to define their “length.” Their construction relies upon the axiom of choice; see the 1940 and 1999 entries for more about this fascinating, and somewhat controversial, axiom of set theory.

Bibliography

[1] A. Akhmedov, A new metric criterion for non-amenability III: Non-amenability of R. Thompson’s group F, http://arxiv.org/abs/0902.3849.
[2] S. Banach and A. Tarski, Sur la décomposition des ensembles de points en parties respectivement congruentes, Fund. Math. 6 (1924), 244–277. http://matwbn.icm.edu.pl/ksiazki/fm/fm6/fm6127.pdf
[3] R. M. Robinson, On the decomposition of spheres, Fund. Math. 34 (1947), 246–260, DOI 10.4064/fm-34-1-246-260. http://matwbn.icm.edu.pl/ksiazki/fm/fm34/fm34125.pdf. MR0026093
[4] E. T. Shavgulidze, About amenability of subgroups of the group of diffeomorphisms of the interval, http://arxiv.org/abs/0906.0107.
[5] S. Wagon, The Banach-Tarski paradox, with a foreword by Jan Mycielski, Encyclopedia of Mathematics and its Applications, vol. 24, Cambridge University Press, Cambridge, 1985. MR803509


The Schrödinger Equation

Introduction

Probably the second most famous equation in physics¹ is Newton’s second law: F = ma. Here F is the force acting on a body, m is the mass of the body, and a is the acceleration (the second derivative of position). The analogue for quantum mechanical systems is the Schrödinger equation, formulated in 1925 by Erwin Schrödinger (1887–1961). It is

iℏ (∂/∂t)Ψ = ĤΨ,

in which i² = −1, ℏ = h/2π (h is Planck’s constant), Ĥ is the Hamiltonian operator of the system, and Ψ is the wave function that governs the system. To explain the mathematics behind the Schrödinger equation with any sense of rigor would occupy the remaining pages of this book.²

In the quantum-mechanical setting, the eigenvalues E of the time-independent Schrödinger equation ĤΨ = EΨ are the energy levels of the corresponding quantum system. In this eigenvalue problem, Ĥ is an unbounded, selfadjoint operator on a Hilbert space (pardon the jargon). If one knows the eigenvalues of a given Schrödinger operator, one often wants to predict how these eigenvalues are affected by a slight modification of the original operator. Although this is far too complicated to address here, we can discuss the finite-dimensional setting.

Let Mn(C) denote the set of n × n complex matrices. We say that A ∈ Mn(C) is selfadjoint if A = A*, that is, if A equals its conjugate transpose (physicists tend to use A† instead of A*). A selfadjoint matrix A ∈ Mn(C) has only real eigenvalues, denoted by λ₁(A) ≥ λ₂(A) ≥ ··· ≥ λₙ(A) and repeated according to multiplicity, along with a corresponding orthonormal basis of eigenvectors. This is a special case of the spectral theorem [3].

How do the eigenvalues of a selfadjoint matrix behave under a small perturbation? Suppose that E ∈ Mn(C) is a positive semidefinite matrix of rank one. That is, E = ee* ∈ Mn(C) for some nonzero column vector e ∈ Cⁿ. Then the eigenvalues of A interlace with those of A + E and A − E; that is, each eigenvalue of A is at most the corresponding eigenvalue of A + E and at least the corresponding eigenvalue of

¹ The most famous is undoubtedly E = mc²; see the entry for 1915. See Episode 2 of the 1972 Doctor Who serial The Time Monster for the more dubious E = mc³.
2 It took the genius of John von Neumann to put quantum mechanics on a firm mathematical foundation; see the entries for 1924, 1931, 1944, 1946 for more about him.


















Figure 1. The eigenvalues of A − E, A, and A + E interlace.

A − E. For example, if

A = ⎡ −4   0   2   4 ⎤
    ⎢  0  −4   1   2 ⎥
    ⎢  2   1   4  −2 ⎥
    ⎣  4   2  −2  −4 ⎦

and

E = ⎡ 1  2  1  0 ⎤   ⎡ 1 ⎤
    ⎢ 2  4  2  0 ⎥ = ⎢ 2 ⎥ [1  2  1  0],
    ⎢ 1  2  1  0 ⎥   ⎢ 1 ⎥
    ⎣ 0  0  0  0 ⎦   ⎣ 0 ⎦

then adding E to A increases each of its eigenvalues. Subtracting E from A decreases each of its eigenvalues. This is illustrated in Figure 1. This sort of eigenvalue interlacing result is the tip of the iceberg; for more information see [4]. There are lots of other beautiful results like this that are not typically covered in an undergraduate linear algebra course.

What if we have three selfadjoint matrices A, B, C ∈ Mn(C) that satisfy A + B = C? How are the eigenvalues of A, B, and C related? Taking the trace of this equation indicates that

Σ_{i=1}^n λᵢ(A) + Σ_{i=1}^n λᵢ(B) = Σ_{i=1}^n λᵢ(C).
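Both the rank-one interlacing and the trace identity above are easy to check numerically. Here is a quick NumPy sketch (my own illustration, not part of the original text) using the 4 × 4 matrices A and E = eeᵀ from the example:

```python
import numpy as np

A = np.array([[-4,  0,  2,  4],
              [ 0, -4,  1,  2],
              [ 2,  1,  4, -2],
              [ 4,  2, -2, -4]], dtype=float)
e = np.array([1., 2., 1., 0.])
E = np.outer(e, e)               # rank-one positive semidefinite

lam = np.linalg.eigvalsh         # eigenvalues of a symmetric matrix, ascending
lo, mid, hi = lam(A - E), lam(A), lam(A + E)

# adding the PSD matrix E raises each eigenvalue; subtracting it lowers each one
assert np.all(lo <= mid + 1e-9) and np.all(mid <= hi + 1e-9)

# trace identity: the eigenvalue sums add, since tr(A + E) = tr A + tr E
assert np.isclose(hi.sum(), mid.sum() + np.trace(E))
```

The first assertion is exactly the monotonicity pictured in Figure 1; the second is the n = 4 case of the trace identity above.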



However, there are many other nontrivial relationships between the eigenvalues of A, B, and C. For instance, if n = 2, then

λ₁(C) ≤ λ₁(A) + λ₁(B),
λ₂(C) ≤ λ₁(A) + λ₂(B),
λ₂(C) ≤ λ₂(A) + λ₁(B).

For larger values of n, more and more inequalities emerge. The story was only completed in 1999, with the resolution of the famous Horn conjecture by Alexander Klyachko [1], and by Allen Knutson (1969– ) and Terence Tao [2].

Centennial Problem 1925
Proposed by Stephan Ramon Garcia, Pomona College.

Why does a planet spin on an axis? For instance, Venus spins on its axis once every 5,832 hours, Earth every 24 hours, and Mars every 25 hours. What does this have to do with linear algebra?

1925: Comments

Stone’s theorem and the solution to the problem. Stone’s theorem is a seminal result in the mathematical formulation of quantum mechanics. It says that a strongly continuous, one-parameter semigroup t ↦ U(t) of unitary operators on a Hilbert space H is of the form U(t) = exp(itA), in which A is a (potentially



unbounded) selfadjoint operator on H. In physical terms, the time evolution of a quantum system is obtained by exponentiating its Hamiltonian. If that sounds like a lot of technical jargon, you are right. Stone’s theorem is usually not covered until a second course in functional analysis.³ We content ourselves here with a finite-dimensional manifestation of Stone’s theorem that explains why a body in three-dimensional space spins on some axis.

Consider a planet in space. Fix a coordinate system so that the center of the planet is always at 0 ∈ R³. With its center now fixed and free from external forces, the planet’s movement is governed by a continuous, one-parameter semigroup of matrices t ↦ U(t), in which U(t) is a 3 × 3 real matrix for each time t. The position of a point x on the planet’s surface at time t is U(t)x. The position of x at time s + t is U(s + t)x. This is the same as the position of the point U(t)x at time s, namely U(s)U(t)x. Consequently, U(s + t)x = U(s)U(t)x for each x ∈ R³. This is the semigroup condition: U(s + t) = U(s)U(t).

Consider further properties of the matrices U(t). Since the planet is not deformed as time passes, the matrices involved induce rigid motions of Euclidean space. These are the real orthogonal matrices, which are characterized by the condition Uᵀ U = I. If U is real orthogonal, then

1 = det I = det(Uᵀ U) = det(Uᵀ) det U = (det U)²,

so det U = ±1. The sign of the determinant reveals whether the linear transformation induced by U is orientation preserving (+1) or reversing (−1). The orientation of our planet does not change with time, so we insist that each U(t) has determinant 1. We know that U(0) = I, since U(s) = U(s + 0) = U(s)U(0) for all s ∈ R. The derivative of U(t) at time t = 0 is

S = lim_{t→0} (U(t) − I)/t.

Stone’s theorem (which works in a much more general setting) says that the preceding limit exists and that U(t) = exp(St), in which

exp(A) = Σ_{n=0}^∞ (1/n!) Aⁿ

is the matrix exponential function. Since U(t)ᵀ = U(t)⁻¹ for t ≥ 0, we conclude that S is skew symmetric: Sᵀ = −S. In particular, U(t) = exp(itA), in which A = −iS is selfadjoint:

A* = (−iS)* = iSᵀ = i(−S) = −iS = A.

We say that A is the infinitesimal generator of the semigroup U(t). Since S is 3 × 3 and skew symmetric,

det S = det(Sᵀ) = det(−S) = (−1)³ det S = −det S.     (1925.1)

Thus, det S = 0 and there is a nonzero x ∈ R³ such that Sx = 0. Therefore,

U(t)x = exp(tS)x = Σ_{n=0}^∞ (1/n!) tⁿSⁿx = x + Σ_{n=1}^∞ (1/n!) tⁿSⁿx = x + 0 = x.
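A finite-dimensional sanity check of this argument (my own illustration): take a skew-symmetric S whose kernel we know, exponentiate it with a truncated power series, and verify that exp(S) is a rotation that fixes the kernel of S.

```python
import numpy as np

def expm(M, terms=40):
    # matrix exponential via its power series (adequate for small matrices)
    out, term = np.eye(len(M)), np.eye(len(M))
    for n in range(1, terms):
        term = term @ M / n
        out = out + term
    return out

# skew-symmetric generator; its kernel is spanned by the axis x = (1, 2, 3)
S = np.array([[ 0., -3.,  2.],
              [ 3.,  0., -1.],
              [-2.,  1.,  0.]])
x = np.array([1., 2., 3.])
assert np.allclose(S @ x, 0)             # Sx = 0

U = expm(S)                              # U(1) = exp(S)
assert np.allclose(U.T @ U, np.eye(3))   # exp(S) is orthogonal
assert np.isclose(np.linalg.det(U), 1)   # with determinant 1
assert np.allclose(U @ x, x)             # the axis is fixed by the rotation
```

The last assertion is exactly the computation U(t)x = exp(tS)x = x above, at t = 1.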

3 Interesting historical tidbit: mathematician Marshall H. Stone (1903–1989) was the son of Harlan F. Stone (1872–1946), Chief Justice of the Supreme Court from 1941 to 1946.



That is, the point x is fixed by each U(t). Furthermore, x generates a one-dimensional subspace span{x} of R³ that is fixed by each U(t). In other words, span{x} is an axis of rotation for our planet.

The fact that our model takes place in three dimensions is crucial. In an even-dimensional universe, the planet need not have any axis of rotation. The computation (1925.1) would only yield the useless deduction det S = det S. Here is a 2 × 2 semigroup of real orthogonal matrices:

U(t) = ⎡ cos t  −sin t ⎤
       ⎣ sin t   cos t ⎦.

Their eigenvalues are e^{±it} = cos t ± i sin t and they have no common (real) eigenvector. Although U(t) rotates R² counterclockwise around the origin through an angle of t, there is no nonzero vector that is fixed by each U(t).

Bibliography

[1] A. A. Klyachko, Stable bundles, representation theory and Hermitian operators, Selecta Math. (N.S.) 4 (1998), no. 3, 419–445, DOI 10.1007/s000290050037. MR1654578
[2] A. Knutson and T. Tao, Honeycombs and sums of Hermitian matrices, Notices Amer. Math. Soc. 48 (2001), no. 2, 175–186. MR1811121
[3] S. R. Garcia and R. A. Horn, A Second Course in Linear Algebra, Cambridge University Press, 2017.
[4] R. A. Horn and C. R. Johnson, Matrix analysis, 2nd ed., Cambridge University Press, Cambridge, 2013. MR2978290
[5] D. Hilbert, Über das Unendliche, Math. Ann. 95 (1926), 161–190. http://link.springer.com/article/10.1007%2FBF01206605. See also http://www.ams.org/journals/bull/1902-08-10/S0002-9904-1902-00923-3/S0002-9904-1902-00923-3.pdf.
[6] E. Schrödinger, An undulatory theory of the mechanics of atoms and molecules, Phys. Rev. 28 (1926), no. 6, 1049–1070. http://journals.aps.org/pr/abstract/10.1103/PhysRev.28.1049.


Ackermann’s Function

Introduction

In 1926 David Hilbert published an article on infinity [2], at that time still a controversial topic, in which he famously declared “no one will drive us from the paradise which Cantor created for us” (see the 1918 entry for a brief introduction to Cantor’s theory of cardinality). In this important paper, Hilbert also described a function discovered by his student, Wilhelm Ackermann (1896–1962). Ackermann was trying to unify arithmetic operations on natural numbers. Just as addition is repeated counting, multiplication is repeated addition, and exponentiation is repeated multiplication, one can continue to iterate each successive operation to produce an even faster-growing one. Ackermann defined his function ϕ of three variables recursively in such a way that

ϕ(a, b, 0) = a + b,
ϕ(a, b, 1) = a · b,
ϕ(a, b, 2) = a^b,
ϕ(a, b, 3) = a^(a^(⋰^a))   (a tower of b copies of a),

and so on. In particular, ϕ grows astronomically as its arguments increase. The significance is that ϕ is computable, but only by using tricks like double recursion, unbounded loops, or the operator “the least n such that.” Functions that can be computed in a more direct manner, without resort to such devices, are called primitive recursive. Later authors simplified the definition but kept the spirit. The cleanest version is due to Raphael Robinson:

A(i, j) = j + 1                     if i = 0,
A(i, j) = A(i − 1, 1)               if i > 0 and j = 0,
A(i, j) = A(i − 1, A(i, j − 1))     if i > 0 and j > 0.

To get an idea of how fast this function grows, note that A(2, 3) = 9, A(3, 3) = 61, and A(4, 3) has about 10^20,000 decimal digits. The enormity of A(5, 3) is scarcely conceivable. Because Ackermann’s function (in whatever incarnation) grows very rapidly, one can form a kind of “inverse” function, α, that grows so slowly that for all practical purposes it is constant. This function turns out to play an important role in the analysis of algorithms. For example, although there is no linear-time algorithm for managing a sequence of “union” and “find” operations on a collection of n disjoint sets, Robert Tarjan (1948– ) found a data structure such that these operations can be performed in time O(n · α(n)) [5].
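Robinson’s recursion is short enough to transcribe directly. A minimal Python sketch (my own, with memoization to keep the small cases fast; A(4, 3) remains hopelessly out of reach for any computer):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def A(i, j):
    """Robinson's version of the Ackermann function."""
    if i == 0:
        return j + 1
    if j == 0:
        return A(i - 1, 1)
    return A(i - 1, A(i, j - 1))

assert A(2, 3) == 9    # the small values quoted in the text
assert A(3, 3) == 61
```

One can check from the recursion that A(1, j) = j + 2 and A(2, j) = 2j + 3, which already hints at the pattern of faster and faster growth.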



Centennial Problem 1926
Proposed by Jerrold Grossman, Oakland University.

Here is a problem about a modification (pun intended) of the Ackermann function. For each n > 2, define Aₙ : {0, 1, 2, ...} × {0, 1, 2, ..., n − 1} → {0, 1, 2, ..., n − 1} by

Aₙ(i, j) = j + 1 (mod n)               if i = 0,
Aₙ(i, j) = Aₙ(i − 1, 1)                if i > 0 and j = 0,
Aₙ(i, j) = Aₙ(i − 1, Aₙ(i, j − 1))     if i > 0 and j > 0.
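The table of values the problem asks about can be built row by row, since row i of Aₙ depends only on row i − 1. A small Python sketch (the function name is mine), which also confirms the A₁₃ behavior mentioned below:

```python
def mod_ackermann_rows(n, rows):
    """Return rows 0..rows-1 of the table An(i, j), each a list of length n."""
    table = [[(j + 1) % n for j in range(n)]]      # An(0, j) = j + 1 (mod n)
    for _ in range(rows - 1):
        prev, row = table[-1], [0] * n
        row[0] = prev[1]                           # An(i, 0) = An(i-1, 1)
        for j in range(1, n):
            row[j] = prev[row[j - 1]]              # An(i, j) = An(i-1, An(i, j-1))
        table.append(row)
    return table

t = mod_ackermann_rows(13, 8)
assert all(t[i][j] == 9 for i in (6, 7) for j in range(13))  # constant once i >= 6
```

No recursion is needed: because the second argument lives in {0, ..., n − 1}, each row is a finite lookup into the previous one.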

If you make a table of its values for small i and j, and for various small values of n, you will find that Aₙ(i, j) quickly becomes constant. For example, A₁₃(i, j) = 9 for all j once i ≥ 6. Prove or disprove that this behavior happens for all n.

1926: Comments

Euler’s power tower. Before discussing the solution, let us digress a bit on a few other interesting expressions with a recursive flavor. Under certain circumstances, the function

f(x) = x^(x^(x^⋰))     (1926.1)

can be made sense of. First of all, the preceding denotes the limit of the sequence a₁, a₂, ... defined by a₁ = x and aₙ₊₁ = x^(aₙ) for n = 1, 2, .... That is, we always group exponents “from the top down”:

x^x^x means x^(x^x), not (x^x)^x = x^(x²).

Euler showed in 1783 that the expression that defines f(x) converges if

0.06598... = e^(−e) < x < e^(1/e) = 1.4446...;

see Figure 1. Since √2 = 1.4142...,

s = √2^(√2^(√2^⋰))

is well-defined and nonnegative. Since s = √2^s, we have

s² = (√2^s)² = (2^(s/2))² = 2^s.

Consequently, 2 log s = s log 2 and hence

(log s)/s = (log 2)/2.

Since each term of the defining sequence is at least 1 and the limit of a convergent power tower never exceeds e, we have 1 ≤ s ≤ e; because log x/x is strictly increasing on [1, e], we conclude that s = 2. Assuming convergence, a similar approach can be used to evaluate

r = √(2 + √(2 + √(2 + √(2 + ···)))).

Since r = √(2 + r), we have r² = 2 + r and hence (r − 2)(r + 1) = 0. Since r = −1 is impossible, we must have r = 2.
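Both limits can be observed numerically. A short sketch (iteration counts are arbitrary, chosen simply to exhaust double-precision accuracy):

```python
import math

# power tower with base sqrt(2): a_{n+1} = sqrt(2)**a_n converges to s = 2
a = math.sqrt(2)
for _ in range(200):
    a = math.sqrt(2) ** a
assert abs(a - 2) < 1e-9

# nested radical: r_{n+1} = sqrt(2 + r_n) converges to r = 2
r = math.sqrt(2)
for _ in range(60):
    r = math.sqrt(2 + r)
assert abs(r - 2) < 1e-12
```

Near the fixed point, each tower step shrinks the error by roughly log 2 ≈ 0.693 and each radical step by roughly 1/4, so both iterations settle quickly.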



Figure 1. Graph of Euler’s iterated exponential function (1926.1).

A continued fraction. How can we justify these sorts of computations? Let us work through an example in detail. Consider the sequence defined by x₁ = 1 and xₙ₊₁ = 1 + 1/(1 + xₙ) for n ≥ 1. That is,

x₁ = 1,  x₂ = 1 + 1/2,  x₃ = 1 + 1/(2 + 1/2),  x₄ = 1 + 1/(2 + 1/(2 + 1/2)),

and so forth. Induction confirms that 1 ≤ xₙ ≤ 2 and hence

|xₙ₊₁ − xₙ| = |(1 + 1/(1 + xₙ)) − (1 + 1/(1 + xₙ₋₁))|
           = |1/(1 + xₙ) − 1/(1 + xₙ₋₁)|
           = |xₙ − xₙ₋₁| / ((1 + xₙ)(1 + xₙ₋₁))
           ≤ |xₙ − xₙ₋₁| / ((1 + 1)(1 + 1)) = |xₙ − xₙ₋₁|/4
           ≤ ··· ≤ (1/4^(n−1)) |x₂ − x₁| = 1/(2 · 4^(n−1))

for n ≥ 2. Consequently, the limit

L = lim_{n→∞} xₙ = x₁ + lim_{n→∞} Σ_{k=2}^n (xₖ − xₖ₋₁) = x₁ + Σ_{k=2}^∞ (xₖ − xₖ₋₁)




exists since the series involved converges absolutely. Since L = lim_{n→∞} xₙ₊₁, it follows¹ that L = 1 + 1/(1 + L). Thus, L² − 2 = 0, from which it follows that L = √2 since L ≥ 0. We write this as an infinite continued fraction:

√2 = 1 + 1/(2 + 1/(2 + 1/(2 + ···))).
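The recursion converges quickly, as the 1/4^(n−1) bound above predicts; a two-line numerical check:

```python
# x_{n+1} = 1 + 1/(1 + x_n) converges to sqrt(2)
x = 1.0
for _ in range(30):
    x = 1 + 1 / (1 + x)
assert abs(x - 2 ** 0.5) < 1e-12
```

After 30 steps the theoretical error bound 1/(2 · 4²⁹) is already far below double-precision roundoff.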

See the 1931, 1934, and 1972 entries for more on continued fractions.

¹ Recall that we may interchange limits with continuous functions. Here we are using the fact that f(t) = 1 + t and g(t) = 1/t are continuous functions of the real variable t ≠ 0.



Solution to the problem. It turns out that there is exactly one counterexample for n < 4,000,000, namely n = 1969. In this case, the values A₁₉₆₉(2i, ·) are (1698, 0, 0, 0, 0, 0, ...), and the values A₁₉₆₉(2i+1, ·) are (0, 1698, 0, 1698, 0, 1698, ...) for all i ≥ 4 [3]. It is not known whether there are other counterexamples.

Bibliography

[1] W. Ackermann, Zum Hilbertschen Aufbau der reellen Zahlen (German), Math. Ann. 99 (1928), no. 1, 118–133, DOI 10.1007/BF01459088. http://eretrandre.org/rb/files/Ackermann1928_126.pdf. MR1512441
[2] D. Hilbert, Über das Unendliche (German), Math. Ann. 95 (1926), no. 1, 161–190, DOI 10.1007/BF01206605. http://link.springer.com/article/10.1007%2FBF01206605. MR1512272
[3] J. Froemke and J. W. Grossman, Unsolved Problems: A Mod-n Ackermann Function, or What’s So Special About 1969?, Amer. Math. Monthly 100 (1993), no. 2, 180–183, DOI 10.2307/2323780. MR1542281
[4] R. M. Robinson, Recursion and double recursion, Bull. Amer. Math. Soc. 54 (1948), 987–993, DOI 10.1090/S0002-9904-1948-09121-2. http://www.math.ntnu.no/emner/MA2301/2010h/robinson_doublerec.pdf. MR0026976
[5] R. E. Tarjan, Efficiency of a good but not linear set union algorithm, J. Assoc. Comput. Mach. 22 (1975), 215–225, DOI 10.1145/321879.321884. http://ecommons.library.cornell.edu/handle/1813/5942. MR0458996


William Lowell Putnam Mathematical Competition

Introduction

Many a problem solver will be aware of the William Lowell Putnam Mathematical Competition, a North American undergraduate contest administered by the Mathematical Association of America. It was founded in 1927 by Elizabeth Lowell Putnam¹ (1862–1935) in honor of her late husband, William Lowell Putnam (1861–1923), who firmly believed in the virtues of academic rivalry between universities. Among the many unwritten traditions of the Putnam exam is that every exam should have at least one problem that uses the year number as part of a problem statement or its solution. So, it is a fitting twist that the Putnam exam is the subject of this section.

Joseph Gallian (1942– ) wrote a fabulous overview of the Putnam exam’s history, milestones, statistics, and trivia [2]. Offered every year since 1938 (except in 1943–1945), the Putnam exam’s roots include a math competition also sponsored by Elizabeth Lowell Putnam and held in 1933 between ten Harvard students and ten West Point cadets. The cadets both won the team contest and had the top individual score. Earlier Putnam exams featured problems in areas closer to the introductory technical undergraduate curriculum, such as calculus, differential equations, or geometry; in more recent years, a recognizable blend of topics, also including linear algebra, some abstract algebra, combinatorics, and number theory (or even an occasional advanced topic on harder questions), characterizes each year’s twelve problems. The five most successful contestants each year are named Putnam Fellows, one of whom is also awarded a fellowship for graduate study at Harvard; eight persons so far have been a Putnam Fellow the maximum possible four times. Other substantial monetary team and individual prizes are given, and an Elizabeth Lowell Putnam prize may be awarded to one female contestant.
The original intent to boost team spirit and provide an avenue for students to fight for their institution’s glory in an academic subject helps one to understand the peculiar ranking system, in which every participating institution must designate a three-person team in advance, and the team ranking is obtained by adding the team members’ individual ranks (rather than their scores). Since higher scores are obtained by much fewer students, a university whose three team members solve seven problems will usually rank higher than the one where two brilliant team members solve nine problems and the third solves three. 1 Her

brother was astronomer Percival Lowell (1855–1916), who predicted the location of Pluto and popularized the erroneous theory that Mars bore canals that indicated the presence of intelligent life.



The Putnam exam has been called “the hardest math test in the world” [1, 3]. This is not without reason: the median score has budged above 1 point out of 120 in only four years since 1999 and then never above 3, while fully 62.6% of 2006 entrants scored 0. A student must make substantial progress toward an actual solution to receive any points for a problem; checking small examples or stating some immediate conclusions typically does not make the cut. Each of the 12 problems is graded on a scale from 0 to 10 points, with the only scores allowed being 0, 1, 2, 8, 9, and 10. Thus, the grader must decide whether the problem is essentially solved or not. A submission that solves one of the two main cases or one that contains the structure of a full solution but has a serious flaw might get 1 or 2 points. On the other hand, a submission that contains all the ingredients of a full solution but neglects to check a minor subcase might get 8 or 9 points; the full mark of 10 points is reserved for essentially perfect solutions. The first round of grading currently occurs in December at Santa Clara University. Imagine several dozen mathematicians tackling the collective output of over 4,000 competitors from over 500 colleges on twelve problems one paper at a time over the span of four days. Undergraduate students solve problems in several competitions around the world, such as the annual Schweitzer competition in Hungary, the Jarn´ık competition in central Europe, the famous competitions at Moscow’s and Kiev’s Mech-Mats, or the International Mathematics Competition for University Students [5], an annual contest held in Europe that has also seen participation from several American universities. While many Putnam stars were successful in the high school IMOs, the two contests retain distinct mathematical profiles. 
Opinions differ on the extent to which the Putnam exam or other competitions mimic the mathematical research experience or are somehow reflective of the student’s research aptitude; see several Putnam Fellows’ perspectives in [1]. The Putnam exam was, of course, never designed for such use. Five Fellows, namely Richard Feynman (1918–1988), John Milnor (1931– ), David Mumford (1937– ), Daniel Quillen (1940–2011), and Kenneth Wilson (1936–2013), have been subsequently recognized with a Fields Medal or a Nobel Prize, and many dozens more have become distinguished mathematicians at top universities and research institutes. Notable Putnam competitors include many Abel Prize winners, MacArthur Fellows, AMS and MAA presidents, members of the National Academy of Sciences, as well as many winners of the Morgan Prize for undergraduate research. Many others have chosen entirely different careers, and many top-notch mathematicians have never taken or particularly enjoyed contest mathematics. Ask mathematics graduate program admissions chairs or hedge fund managers and many will tell you that, while neither a prerequisite nor a guarantee of success, a candidate’s good showing on the Putnam exam gets their attention. Putnam problems test a specific kind of ingenuity over technical mastery and are sometimes seen as occupying a universe of their own, but here as in Hamming we must remember that Putnam problems “were not on the stone tablets that Moses brought down from Mt. Sinai” [4]. They are composed by a committee of working mathematicians designated by the MAA, and so their evolution over time perhaps reflects our collective style and taste. What makes a good Putnam problem? Bruce Reznick (1953– ) has written with charm and detail about writing for the Putnam exam [8]. André Weil (1906–1998), paraphrasing the English poet Housman, who had used an example of a fox-terrier hunting for a rat to explain why he cannot define poetry, famously



quipped: “when I smell number-theory I think I know it, and when I smell something else I think I know it too” [9]. He then proceeded to argue that analytic number theory is not number theory, but this is a subject for another article. Putnam takers and experienced problem-solvers will similarly spot a juicy problem. It will be accessible but not trivial, challenging but not impossibly so. It will relate to important mathematics, but with an unexpected twist. It will make you smile and, in Reznick’s words, whistle in your mind like a catchy tune. We propose the following Putnam problem for your whistling enjoyment. It appeared as problem A3 in the 2013 exam and requires nothing beyond calculus. Do not just solve it and bask in your glory. Make yourself a hot beverage, relate the solution to your other mathematical experiences, continue the story that it tells; you will have new problems of your own in no time.

Centennial Problem 1927
Proposed by Djordje Milićević, Bryn Mawr College.

Suppose that the real numbers a₀, a₁, ..., aₙ and x, in which 0 < x < 1, satisfy

a₀/(1 − x) + a₁/(1 − x²) + ··· + aₙ/(1 − x^(n+1)) = 0.

Prove that there exists a real number y with 0 < y < 1 such that

a₀ + a₁y + ··· + aₙyⁿ = 0.     (1927.1)
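To make the statement concrete, here is a tiny numerical instance of my own (not from the exam): with n = 1, a₀ = 1, a₁ = −3/2, and x = 1/2, the hypothesis holds, and bisection locates the promised root y = 2/3.

```python
a = [1.0, -1.5]           # a0, a1 (an illustrative choice, not from the exam)
x = 0.5

# hypothesis: a0/(1 - x) + a1/(1 - x^2) = 2 - 2 = 0
lhs = sum(ai / (1 - x ** (i + 1)) for i, ai in enumerate(a))
assert abs(lhs) < 1e-12

def p(y):                 # the polynomial a0 + a1*y + ... + an*y^n
    return sum(ai * y ** i for i, ai in enumerate(a))

lo, hi = 0.0, 1.0         # p(0) = 1 > 0 and p(1) = -0.5 < 0, so a root lies between
for _ in range(60):
    mid = (lo + hi) / 2
    if p(lo) * p(mid) > 0:
        lo = mid
    else:
        hi = mid
assert abs((lo + hi) / 2 - 2 / 3) < 1e-9
```

The sign change that bisection exploits here is exactly the intermediate value theorem argument used in the solution below the hint.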


1927: Comments

A hint for the problem. There are numerous collections, both in print and online, with solutions and commentaries on the Putnam problems. For example, see [6]. Before looking at the solution, you are strongly encouraged to try it yourself. Here is a hint: suppose that

a₀ + a₁y + ··· + aₙyⁿ ≠ 0     (1927.2)

for 0 < y < 1. If this occurs, then the intermediate value theorem implies that a₀ + a₁y + ··· + aₙyⁿ has the same sign for all 0 < y < 1.

Eigenvalues and the intermediate value theorem. Never underestimate the power of some of those first-semester calculus theorems. Here is a cute application of the intermediate value theorem to linear algebra. Let A be a real n × n matrix, in which n is odd. Then the characteristic polynomial p_A of A is monic and of odd degree. Consequently,

lim_{x→+∞} p_A(x) = +∞   and   lim_{x→−∞} p_A(x) = −∞,

and hence p_A assumes both positive and negative values on R. The intermediate value theorem ensures that p_A has a real zero; that is, A has a real eigenvalue. This argument fails if n is even. For example, the eigenvalues of

A = ⎡ 0  −1 ⎤
    ⎣ 1   0 ⎦

are ±i.
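The odd-dimension claim is easy to test numerically (a sketch of my own, with an arbitrary random seed): complex eigenvalues of a real matrix come in conjugate pairs, so a 3 × 3 real matrix always has at least one real eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(100):
    A = rng.standard_normal((3, 3))        # a random real matrix of odd size
    eig = np.linalg.eigvals(A)
    # at least one of the three eigenvalues is real
    assert np.any(np.abs(eig.imag) < 1e-8)
```

Running the same loop with 2 × 2 matrices would fail for matrices like the rotation above, whose eigenvalues are ±i.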



Solution to the problem. Suppose toward a contradiction that (1927.2) holds for 0 < y < 1. Then the intermediate value theorem ensures that a₀ + a₁y + ··· + aₙyⁿ has the same sign for 0 < y < 1. Without loss of generality, we assume that a₀ + a₁y + ··· + aₙyⁿ > 0 for 0 < y < 1. Then

a₀x^k + a₁x^(2k) + ··· + aₙx^((n+1)k) > 0

for all 0 < x < 1 and k = 1, 2, .... Continuity ensures that a₀ + a₁ + ··· + aₙ ≥ 0. Since |x| < 1, we have

Σ_{k=0}^∞ (a₀x^k + a₁x^(2k) + ··· + aₙx^((n+1)k)) > 0,

and hence

a₀/(1 − x) + a₁/(1 − x²) + ··· + aₙ/(1 − x^(n+1)) > 0

by the geometric series summation formula. Since this is a contradiction, we conclude that there is a y ∈ (0, 1) so that (1927.1) holds.

Bibliography

[1] G. L. Alexanderson, How Putnam fellows view the competition, MAA Focus, December 2004, 14–15, http://www.maa.org/sites/default/files/pdf/pubs/dec04.pdf
[2] J. A. Gallian, The first sixty-six years of the Putnam competition, American Mathematical Monthly 111 (2004), 691–699. See also http://www.d.umn.edu/~jgallian/putnam.pdf.
[3] L. Grossman, Crunching the numbers, Time, December 16, 2002, http://content.time.com/time/magazine/article/0,9171,400000,00.html.
[4] R. W. Hamming, The unreasonable effectiveness of mathematics, Amer. Math. Monthly 87 (1980), no. 2, 81–90, DOI 10.2307/2321982. MR559142
[5] International Mathematics Competition for University Students, http://www.imc-math.org.uk.
[6] K. S. Kedlaya, B. Poonen, and R. Vakil, The William Lowell Putnam Mathematical Competition, 1985–2000, Problems, solutions, and commentary, MAA Problem Books Series, Mathematical Association of America, Washington, DC, 2002. MR1933844
[7] K. S. Kedlaya, The Putnam archive, http://kskedlaya.org/putnam-archive/.
[8] B. Reznick, Some thoughts on writing for the Putnam, http://www.math.uiuc.edu/~reznick/putnam.pdf.
[9] A. Weil, Essais historiques sur la théorie des nombres (French), L’Enseignement Mathématique, Université de Genève, Geneva, 1975. Extrait de l’Enseignement Math. 20 (1974); Monographie No. 22 de L’Enseignement Mathématique. MR0389725


Random Matrix Theory

Introduction

Random matrix theory is, as expected, the study of randomly chosen matrices. What is not immediately apparent is why it should so beautifully model such diverse phenomena as energy levels in nuclear physics, zeros of the Riemann zeta function (which encode information about the primes; see the 1942 entry), and stopping times of bus routes, to name just a few! While the subject began with the 1928 paper of John Wishart (1898–1956) [24], for many people the most exciting dates come later, in the 1950s, 1970s, and 1990s. In the 1950s Eugene Wigner (1902–1995) had the fruitful insight that systems of random matrices could accurately predict properties of heavy nuclei [18–22]. In a classical mechanics course one learns how to solve, in closed form, problems that involve just one or two point masses. Once we have three bodies, chaos sets in and closed-form solutions typically do not exist. Now imagine how much more daunting the task is with heavy nuclei, in which there are hundreds of protons and neutrons interacting under far more complicated forces than gravity. In quantum mechanics, this is represented as HΨₙ = EₙΨₙ, in which H is the Hamiltonian of the system and Ψₙ are the energy eigenstates with eigenvalues Eₙ; see the 1925 entry. While this reduces quantum mechanics to linear algebra, there is a twist. The matrices are infinite¹ and the entries are unknown. These sorts of problems are beyond the techniques learned in an undergraduate linear algebra class. Wigner’s idea, for which he earned a Nobel Prize, is that the complicated interactions actually help us. Rather than trying to find the eigenvalues of the operator associated to our physical system, he looked at many random matrices, diagonalized them, weighted the observed eigenvalue spectra by the probability of choosing those matrices, and then averaged over a family of matrices.
The hope, which has been borne out time and time again in experiments and theories, is that a “typical” system is close to average. A good way to view this universality is to see it as a sort of central limit theorem; see the 1922 entry. We first establish some notation and then give a simple version of his result below; it has since been greatly generalized and extended. See [4–7, 16, 17] for some of the recent successes and surveys, which have greatly weakened the assumptions needed on the random variables. While we concentrate on real symmetric matrices, variants have been proved in many other settings. 1 To

be more precise, they are unbounded selfadjoint operators on an infinite-dimensional Hilbert space. There are additional wrinkles too. On infinite-dimensional vector spaces, linear operators need not have any eigenvalues. Consider the operator T : C[0, 1] → C[0, 1] defined by (Tf)(x) = xf(x). Here C[0, 1] denotes the complex vector space of continuous functions f : [0, 1] → C. If Tf = λf for some λ ∈ C, then (x − λ)f(x) = 0 for all x ∈ [0, 1]. Since f is continuous, it must be the zero function. Thus, no λ ∈ C is an eigenvalue of T!



An N × N real symmetric matrix A has N real eigenvalues, repeated according to multiplicity, which we label λ₁(A) ≥ λ₂(A) ≥ · · · ≥ λ_N(A). Fix a probability distribution p with mean 0, variance 1, and finite higher moments. We consider the ensemble of N × N real symmetric matrices in which the independent entries² are independent (in the probabilistic sense), identically distributed random variables drawn from p. The probability that we choose a matrix A whose (i, j) entry is in [α_{ij}, α_{ij} + ε] is
\[ \prod_{1 \le i \le j \le N} \int_{\alpha_{ij}}^{\alpha_{ij}+\varepsilon} p(a_{ij}) \, da_{ij}. \]


A key tool in understanding the eigenvalues is the eigenvalue trace lemma, which states
\[ \operatorname{tr} A = \sum_{i=1}^{N} a_{ii} = \lambda_1(A) + \cdots + \lambda_N(A) \]

and, more generally,
\[ \operatorname{tr}(A^k) = \sum_{1 \le i_1, \ldots, i_k \le N} a_{i_1 i_2} a_{i_2 i_3} \cdots a_{i_{k-1} i_k} a_{i_k i_1} = \sum_{i=1}^{N} \lambda_i(A)^k. \]
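The middle expression above expands tr(A^k) as a sum over closed paths of indices, and the identity is easy to verify numerically. The following Python sketch (illustrative only, not part of the original text) compares the two sides for a small random symmetric matrix:

```python
import itertools
import random

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace_power(A, k):
    """tr(A^k) computed by repeated matrix multiplication."""
    P = A
    for _ in range(k - 1):
        P = mat_mul(P, A)
    return sum(P[i][i] for i in range(len(A)))

def trace_power_by_indices(A, k):
    """tr(A^k) via the expansion over tuples (i_1, ..., i_k):
    sum of a_{i1 i2} a_{i2 i3} ... a_{ik i1}."""
    n = len(A)
    total = 0.0
    for idx in itertools.product(range(n), repeat=k):
        term = 1.0
        for j in range(k):
            term *= A[idx[j]][idx[(j + 1) % k]]
        total += term
    return total

# a random 4 x 4 real symmetric matrix
random.seed(1)
n = 4
A = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i, n):
        A[i][j] = A[j][i] = random.uniform(-1, 1)

for k in (2, 3, 4):
    assert abs(trace_power(A, k) - trace_power_by_indices(A, k)) < 1e-9
```

The two computations agree for every k, which is exactly the content of the middle equality in the lemma.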


The importance of this lemma is that it allows us to pass from the matrix elements (which we know) to the eigenvalues (which we want to understand). Determining the moments of the eigenvalues yields information on their distribution. For example, taking k = 2 implies that the expected value of tr(A²) is N² and hence the average square of an eigenvalue is N. Thus, a typical eigenvalue should be of order √N. This simple calculation suggests the normalization we shall see in Wigner's semicircle law. The last item we need is the empirical spectral measure of A, denoted by μ_A. We divide by 2√N to normalize each eigenvalue; the factor 2 in 2√N ensures that we ultimately wind up with a circle instead of an ellipse. We write
\[ \mu_A(x) = \frac{1}{N} \sum_{i=1}^{N} \delta\!\left( x - \frac{\lambda_i(A)}{2\sqrt{N}} \right), \]
in which δ is the Dirac delta functional. One can view δ(x − a) as a unit point mass at a. If f is a suitably nice function, then
\[ \int_{-\infty}^{\infty} f(x)\,\delta(x - a)\, dx = f(a). \]

The way to mathematically realize "delta functions" in a rigorous way is to regard them as linear functionals on suitable spaces of functions. For example, δ(x − a) is the linear functional that sends a function f to the scalar f(a).

Wigner's Semicircle Law: Consider the ensemble E(N, p) of N × N real symmetric matrices where the independent entries are independent, identically distributed random variables drawn from a distribution p with mean 0, variance 1, and finite

²For instance, the (1, 2) and (3, 4) entries are independent, in the sense that neither determines the other. On the other hand, the (1, 2) and (2, 1) entries each determine the other since the matrices involved are symmetric.



Figure 1. Eigenvalues of a 1,000 × 1,000 random real (nonsymmetric) matrix, with entries drawn independently from the uniform distribution on [−1, 1].

higher moments. As N → ∞, for almost all A ∈ E(N, p), the empirical spectral measure μ_A converges to the density of the semicircle,
\[ f_{\mathrm{sc}}(x) = \frac{2}{\pi}\sqrt{1 - x^2} \]
if |x| ≤ 1 and 0 otherwise.

Wigner's work was expanded by Freeman Dyson (1923– ) [2, 3], whom we will meet again shortly, and many others. Physical reasons often require the matrices involved to be real symmetric (this ensures that the eigenvalues are real). Modulo such constraints, researchers mostly considered matrices in which the free entries were chosen independently from a fixed distribution. See Figures 1 and 2 for examples.

Fast forward to the 1970s. The Riemann zeta function, defined for Re s > 1 by
\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_{p \text{ prime}} \left( 1 - \frac{1}{p^s} \right)^{-1}, \]
can be meromorphically continued to the entire complex plane with a simple pole at s = 1; see the 1933 entry for a proof of the remarkable product formula above. Using complex analysis, one can show that the zeros of the completed zeta function are intimately connected to many properties of the primes. Hugh Montgomery (1944– ) was working on the pair correlation problem [14], trying to understand the distribution of differences of pairs of zeta zeros. While visiting the Institute for Advanced Study at Princeton, he relayed what he had found to Dyson, who remarked that the same behavior is seen in the eigenvalues of certain ensembles of random matrices. Additional support was later provided by the numerical investigations of Andrew Odlyzko (1949– ) on the zeros of ζ(s); see [15] and the 1987 entry.



Figure 2. Histogram of eigenvalues of a 2,500 × 2,500 random real symmetric matrix, with entries drawn independently from the uniform distribution on [−1, 1].

From that moment, number theory, random matrix theory, and physics had a lot to say to each other. These subjects continue to drive each other. New questions emerged in the 1990s with the work of Nick Katz (1943– ) and Peter Sarnak (1953– ) [11], expanding the universe of matrix families relevant to number theory. For more information on random matrix theory and its connection to number theory, see the books [12, 13] and the survey articles [1, 21, 22]. See also the entry from 1960 for an entertaining look at Wigner's views on the role of mathematics in physics.

Centennial Problem 1928
Proposed by Steven J. Miller, Williams College.

Let f be a "nice" probability distribution with mean 0, variance 1, and finite higher moments. For example, f could be the standard normal distribution f(x) = e^{−x²/2}/√(2π), or f could be the uniform distribution on [−√3, √3]. Consider the family of real symmetric matrices
\[ \left\{ \begin{pmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{pmatrix} : a_{11}, a_{12}, a_{22} \text{ i.i.d. random variables with density } f \right\}; \]
this means that the entries a₁₁, a₁₂, and a₂₂ are chosen independently of each other, and the probability that a_{ij} ∈ [α, β] is ∫_α^β f(x) dx. Calculate the probability that such a randomly chosen matrix has its largest eigenvalue in the interval [A, B]. What about its smallest eigenvalue? What about the difference between its eigenvalues?

1928: Comments

Some connections. It is worth briefly mentioning the remarkable similarities we see in such diverse systems. In the 1922 entry, we saw another example of different systems converging to similar behavior. For more on this phenomenon see the 1960 entry on Wigner's paper The Unreasonable Effectiveness of Mathematics in the Natural Sciences [23], in which we deliberately chose to focus on quantities related to the characters from this year's entry.
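Before attacking the centennial problem analytically, one can explore it by Monte Carlo simulation, since a 2 × 2 symmetric matrix has its eigenvalues in closed form. The following Python sketch is purely illustrative (the function names are ours, and we take f to be the standard normal for concreteness):

```python
import math
import random

def eigs_2x2(a11, a12, a22):
    """Eigenvalues of the real symmetric matrix [[a11, a12], [a12, a22]]:
    ((a11 + a22) -/+ sqrt((a11 - a22)^2 + 4 a12^2)) / 2."""
    d = math.sqrt((a11 - a22) ** 2 + 4 * a12 ** 2)
    t = a11 + a22
    return (t - d) / 2, (t + d) / 2   # (smallest, largest)

def prob_largest_in(A, B, trials=100_000, seed=0):
    """Monte Carlo estimate of P(largest eigenvalue lies in [A, B]) when
    a11, a12, a22 are i.i.d. standard normal."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        _, lam_max = eigs_2x2(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
        if A <= lam_max <= B:
            hits += 1
    return hits / trials

# sanity check against a matrix with known eigenvalues
assert eigs_2x2(2.0, 0.0, 1.0) == (1.0, 2.0)

est = prob_largest_in(0.0, 1.0, trials=20_000)
assert 0.0 <= est <= 1.0
```

The same loop, with `lam_max` replaced by the smallest eigenvalue or by the difference of the two, estimates the other quantities the problem asks about.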



Analogues of the eigenvalue trace lemma arise in number theory. There we wish to understand the zeros of the Riemann zeta function (or other related L-functions); here the zeros play a role analogous to that played by eigenvalues, and the coefficients of the L-functions correspond to the entries of the matrices. The analysis frequently begins through some explicit formula, which allows us to pass from what we understand (the coefficients) to what we wish to understand (the zeros). These correspondences are useless, of course, if one cannot execute the averaging on at least one side. For random matrix theory, this is done through integration and combinatorics; much of the difficulty in number theory is that we do not have similarly powerful results for the expressions involved.

For further reading. The references below contain many great introductions to random matrix theory. These include a general interest article [10], short survey articles [1, 8, 11], textbooks [9, 12], and many of the original papers in nuclear physics and number theory [2, 18–24].

Bibliography

[1] J. B. Conrey, L-functions and random matrices, Mathematics unlimited—2001 and beyond, Springer, Berlin, 2001, pp. 331–352. http://arxiv.org/pdf/math/0005300.pdf?origin=publication_detail. MR1852163
[2] F. Dyson, Statistical theory of the energy levels of complex systems: I, II, III, J. Mathematical Phys. 3 (1962), 140-156, 157-165, 166-175, http://scitation.aip.org/content/aip/journal/jmp/3/1/10.1063/1.1703773, http://scitation.aip.org/content/aip/journal/jmp/3/1/10.1063/1.1703774, http://scitation.aip.org/content/aip/journal/jmp/3/1/10.1063/1.1703773.
[3] F. J. Dyson, The threefold way. Algebraic structure of symmetry groups and ensembles in quantum mechanics, J. Mathematical Phys. 3 (1962), 1199–1215, DOI 10.1063/1.1703863. http://scitation.aip.org/content/aip/journal/jmp/3/6/10.1063/1.1703863. MR0177643
[4] L.
Erd˝ os, Universality of Wigner random matrices: a survey of recent results, http://arxiv. org/abs/1004.0861. [5] L. Erd˝ os, B. Schlein, and H.-T. Yau, Semicircle law on short scales and delocalization of eigenvectors for Wigner random matrices, Ann. Probab. 37 (2009), no. 3, 815–852, DOI 10.1214/08-AOP421. MR2537522 [6] L. Erd˝ os, B. Schlein, and H.-T. Yau, Local semicircle law and complete delocalization for Wigner random matrices, Comm. Math. Phys. 287 (2009), no. 2, 641–655, DOI 10.1007/s00220-008-0636-9. MR2481753 [7] L. Erd˝ os, B. Schlein, and H.-T. Yau, Wegner estimate and level repulsion for Wigner random matrices, Int. Math. Res. Not. IMRN 3 (2010), 436–479, DOI 10.1093/imrn/rnp136. MR2587574 [8] F. W. K. Firk and S. J. Miller, Nuclei, primes and the random matrix connection, Symmetry 1 (2009), no. 1, 64–105, DOI 10.3390/sym1010064. http://arxiv.org/pdf/0909.4914.pdf. MR2756142 [9] P. J. Forrester, Log-gases and random matrices, London Mathematical Society Monographs Series, vol. 34, Princeton University Press, Princeton, NJ, 2010. MR2641363 [10] B. Hayes, The spectrum of Riemannium, American Scientist 91 (2003), no. 4, 296-300. [11] N. M. Katz and P. Sarnak, Zeroes of zeta functions and symmetry, Bull. Amer. Math. Soc. (N.S.) 36 (1999), no. 1, 1–26, DOI 10.1090/S0273-0979-99-00766-1. MR1640151 [12] M. L. Mehta, Random matrices, 2nd ed., Academic Press, Inc., Boston, MA, 1991. MR1083764 [13] S. J. Miller and R. Takloo-Bighash, An invitation to modern number theory, with a foreword by Peter Sarnak, Princeton University Press, Princeton, NJ, 2006. MR2208019



[14] H. L. Montgomery, The pair correlation of zeros of the zeta function, Analytic number theory (Proc. Sympos. Pure Math., Vol. XXIV, St. Louis Univ., St. Louis, Mo., 1972), Amer. Math. Soc., Providence, R.I., 1973, pp. 181–193. http://www-personal.umich.edu/~hlm/paircor1. pdf. MR0337821 [15] A. M. Odlyzko, On the distribution of spacings between zeros of the zeta function, Math. Comp. 48 (1987), no. 177, 273–308, DOI 10.2307/2007890. http://www.ams.org/journals/ mcom/1987-48-177/S0025-5718-1987-0866115-0/. MR866115 [16] B. Schlein, Spectral Properties of Wigner Matrices, Proceedings of the Conference QMath 11, Hradec Kralove, 2010. [17] T. Tao and V. Vu, Random matrices: universality of local eigenvalue statistics, Acta Math. 206 (2011), no. 1, 127–204, DOI 10.1007/s11511-011-0061-3. MR2784665 [18] E. Wigner, On the statistical distribution of the widths and spacings of nuclear resonance levels, Proc. Cambridge Philo. Soc. 47 (1951), 790-798. http://journals.cambridge.org/ abstract_S0305004100027237. [19] E. Wigner, Characteristic vectors of bordered matrices with infinite dimensions, Ann. of Math. 2 (1955), no. 62, 548-564. http://www.jstor.org/stable/1970079?seq=1# page_scan_tab_contents. [20] E. Wigner, Statistical properties of real symmetric matrices, in Canadian Mathematical Congress Proceedings, University of Toronto Press, Toronto, 1957, 174–184. [21] E. P. Wigner, Characteristic vectors of bordered matrices with infinite dimensions. II, Ann. of Math. (2) 65 (1957), 203–207, DOI 10.2307/1969956. MR0083848 [22] E. P. Wigner, On the distribution of the roots of certain symmetric matrices, Ann. of Math. (2) 67 (1958), 325–327, DOI 10.2307/1970008. http://www.jstor.org/stable/1970008? seq=1#page_scan_tab_contents. MR0095527 [23] E. P. Wigner, The unreasonable effectiveness of mathematics in the natural sciences [Comm. Pure Appl. Math. 13 (1960), 1–14; Zbl 102, 7], Mathematical analysis of physical systems, Van Nostrand Reinhold, New York, 1985, pp. 1–14. 
https://www.dartmouth.edu/~matc/ MathDrama/reading/Wigner.html. MR824292 [24] J. Wishart, The generalized product moment distribution in samples from a normal multivariate population, Biometrika 20 A (1928), 32-52. http://www.jstor.org/stable/2331939? seq=1#page_scan_tab_contents.


Gödel's Incompleteness Theorems

Introduction

This statement is false. If it is false, then it is true; likewise, if it is true, then it is false. This paradox, commonly known as the liar's paradox, has been attributed to Eubulides of Miletus (4th century BCE). If you are confused, you are not the only one. The liar's paradox was used to disable an android in the Star Trek episode I, Mudd (1967) and a sentient mainframe in the Doctor Who serial The Green Death (1973). In a similar vein, Bertrand Russell (1872–1970) dealt the death blow to "naive set theory" in 1901 when he observed that the definition of the "set" R = {x : x ∉ x} ensures that R ∈ R if and only if R ∉ R, which is absurd. Thus, R is not a set; it is too big to be treated as a set in a logically sound manner. This is Russell's paradox. Needless to say, the invalidity of naive set theory¹ was a great disappointment. No one was more disappointed than Gottlob Frege (1848–1925), who was just finishing his would-be masterpiece Grundgesetze der Arithmetik, which purported to derive the laws of arithmetic from supposedly logical axioms. As Frege admitted:

A scientist can hardly meet with anything more undesirable than to have the foundation give way just as the work is finished. I was put in this position by a letter from Mr. Bertrand Russell when the work was nearly through the press.

Mathematicians were forced to reevaluate the foundations of their discipline. Sets would have to be treated in a rigorous manner. The rules would have to be explicitly stated so that contradictions would not occur in axiomatic set theory. The Zermelo–Fraenkel axioms (ZF) are a list of eight or so axioms,² depending upon the particular formulation, that codify the properties of sets. For the most part they assert things that most mathematicians take for granted (for example, unions of sets exist). Here are a couple of the axioms.

Axiom of Foundation. ∀x (x ≠ ∅ =⇒ ∃y ∈ x (y ∩ x = ∅)). This axiom prevents a set from being an element of itself.³

¹More specifically, the general comprehension principle, which asserts that given any property, there exists a set that consists of all objects having that property.
²Technically, some of them are axiom schemata; the distinction is not important for us.
³If A is a set, then so is {A} (this requires the axiom of pairing, which asserts that if A and B are sets, then so is {A, B}; let A = B to see that {A} is a set). The axiom of foundation ensures that there is an element of {A} that is disjoint from {A}. The only element of {A} is A, so A and {A} are disjoint. Thus, A ∉ A.




Axiom of Infinity. ∃X (∅ ∈ X ∧ ∀x ∈ X ((x ∪ {x}) ∈ X)). This ensures that an infinite set exists; the set X described by the axiom contains
∅, ∅ ∪ {∅}, ∅ ∪ {∅} ∪ {∅ ∪ {∅}}, . . . ,
which can be used to define the natural numbers 0, 1, 2, . . ..

Is Zermelo–Fraenkel set theory the ultimate answer to the foundational crisis in mathematics? In short, no. We have not even brought up the axiom of choice or the continuum hypothesis yet; see the entries for 1924, 1940, 1963, 1964, and 1999. Self-reference, seen in the liar's paradox and Russell's paradox, lies at the heart of the celebrated first incompleteness theorem of Kurt Gödel (1906–1978).⁴ A set of axioms is consistent if there does not exist a statement S such that both S and its negation ¬S are provable from the axioms (that is, the axioms are not self-contradictory). The first incompleteness theorem says that any "sufficiently complicated" axiomatic system is either incomplete (not all true statements in that system can be proved in that system) or inconsistent (self-contradictory). The second incompleteness theorem states that no "sufficiently complicated" consistent axiomatic system (this includes ZF) can prove its own consistency.

Around the turn of the 20th century, David Hilbert initiated a program that aimed to show that all of mathematics can be derived from a set of self-evident axioms. This program was pursued in earnest by Bertrand Russell and Alfred North Whitehead (1861–1947), who authored the imposing Principia Mathematica. This was an ambitious task; it took over 300 pages to establish that 1 + 1 = 2. Gödel's theorems show that Hilbert's program is doomed. If ZF is consistent, then there are true statements that can be expressed in the language of ZF, but not proved in ZF. Moreover, we cannot even hope to use ZF to prove that ZF is consistent. Indeed, if ZF could be used to prove that ZF is consistent, then the second incompleteness theorem would ensure that ZF is inconsistent!
There is nothing special about ZF. Any other sufficiently complicated system of axioms would be plagued by the same issues.

Centennial Problem 1929
Proposed by Stephan Ramon Garcia, Pomona College.

To obtain the hereditary base-b representation of a natural number, write it in base b, then expand the exponents in base b, and continue until the process terminates. Since 266 = 2⁸ + 2³ + 2¹, the hereditary base-2 representation of 266 is
\[ 266 = 2^{2^{2+1}} + 2^{2+1} + 2. \]

The Goodstein sequence G_b(n) of a natural number n is obtained as follows.
(a) Let G₂(n) = n.
(b) Write G_b(n) in hereditary base-b notation.
(c) Obtain G_{b+1}(n) from G_b(n) by first replacing all occurrences of the base b by b + 1, then subtracting 1 from the end result.

⁴Technically, we are a little early. The completeness theorem for first-order logic was the subject of Gödel's 1929 thesis; the incompleteness theorems actually date from 1931.
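Step (c) can be carried out exactly with arbitrary-precision integer arithmetic. The following Python sketch (illustrative; the function names are ours) builds the hereditary representation recursively and then re-evaluates it at the bumped base:

```python
def to_hereditary(n, b):
    """Decompose n as a list of (coefficient, exponent) pairs in base b,
    where each exponent is itself such a list (recursively), so the
    whole representation is hereditary."""
    terms, e = [], 0
    while n > 0:
        n, c = divmod(n, b)
        if c:
            terms.append((c, to_hereditary(e, b)))
        e += 1
    return terms

def evaluate(terms, b):
    """Evaluate a hereditary representation using base b."""
    return sum(c * b ** evaluate(exp, b) for c, exp in terms)

def goodstein_step(g, b):
    """Given G_b(n) = g, return G_{b+1}(n): bump the base, subtract 1."""
    return evaluate(to_hereditary(g, b), b + 1) - 1

# the sequence for n = 4 begins 4, 26, 41, 60, ...
assert goodstein_step(4, 2) == 26
assert goodstein_step(26, 3) == 41
```

For instance, `goodstein_step(266, 2)` returns the exact integer 3^{3^4} + 3^4 + 2, matching the displayed value of G₃(266) below.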



If n = 4, then G₂(4) = 2² = 4, G₃(4) = 3³ − 1 = 26, and so forth. The first few values of G_b(266) are
\begin{align*}
G_2(266) &= 2^{2^{2+1}} + 2^{2+1} + 2 = 266,\\
G_3(266) &= \left(3^{3^{3+1}} + 3^{3+1} + 3\right) - 1 \approx 4.4 \times 10^{38},\\
G_4(266) &= \left(4^{4^{4+1}} + 4^{4+1} + 2\right) - 1 \approx 3.2 \times 10^{616},\\
G_5(266) &= \left(5^{5^{5+1}} + 5^{5+1} + 1\right) - 1 \approx 2.5 \times 10^{10921},\\
G_6(266) &= 6^{6^{6+1}} + 6^{6+1} - 1 \approx 3.5 \times 10^{217832}; \tag{1929.1}
\end{align*}
see [8]. For n ≥ 4, a few calculations quickly suggest that lim_{b→∞} G_b(n) = ∞. However, this is completely misleading. Prove that every Goodstein sequence terminates with 0.

1929: Comments

Although we do not provide the complete solution here, the following example illustrates the main idea; see [1] for more details. Let us return to the case n = 266 discussed above. In the line corresponding to base b, replace every occurrence of b in the hereditary base-b expansion of G_b(266) with the first infinite ordinal ω; see Figure 1. This yields
\begin{align*}
H_2(266) &= \omega^{\omega^{\omega+1}} + \omega^{\omega+1} + \omega,\\
H_3(266) &= \omega^{\omega^{\omega+1}} + \omega^{\omega+1} + 2,\\
H_4(266) &= \omega^{\omega^{\omega+1}} + \omega^{\omega+1} + 1,\\
H_5(266) &= \omega^{\omega^{\omega+1}} + \omega^{\omega+1},\\
H_6(266) &= \omega^{\omega^{\omega+1}} + \omega^{\omega} \cdot 5 + \omega^5 \cdot 5 + \cdots + \omega \cdot 5 + 5,
\end{align*}
and so forth. The structure of H₆(266) looks different than expected because (1929.1) is not written in hereditary base-6 notation; the "ones" term cannot be −1. Instead,
\[ 6^{6^{6+1}} + 6^{6+1} - 1 = 6^{6^{6+1}} + 5 \cdot 6^6 + 5 \cdot 6^5 + \cdots + 5 \cdot 6 + 5. \]
This leads to a strictly decreasing sequence Hb (266) of countable ordinals. It turns out that any strictly decreasing sequence of ordinal numbers is finite, so Hb (266), and hence Gb (266), terminates with 0. Although Goodstein’s theorem is a statement about natural numbers and their properties, it cannot be proved without “infinitary” means; some form of transfinite mathematics is required to prove Goodstein’s theorem. The Kirby–Paris theorem (1982) implies that Goodstein’s theorem is independent of Peano arithmetic (PA). In other words, Goodstein’s theorem can neither be proved nor disproved in PA. One can think of Peano arithmetic as the system ZFCfin obtained from ZFC (see the 1963, 1964, and 1969 entries) if the axiom of infinity (“there exists an infinite set”) is replaced by its negation (“all sets are finite”). PA is sufficient for almost all familiar statements of elementary number theory. For instance, Euclid’s theorem on the infinitude of the primes can be proved without any reference to infinite sets. “For each prime p there exists a prime q such that p < q” expresses




Figure 1. Graphical depiction of an initial segment of the ordinal numbers. The “set of all ordinal numbers” is well-ordered and has the property that a strictly decreasing sequence of ordinals is finite. However, the “set of all ordinal numbers” is too big to be a set; this leads to the Burali–Forti paradox. But that is a story for another time. the infinitude of primes without discussing infinite sets. One does not need to “hold in one’s hand” the set of all primes in order to prove Euclid’s theorem. G¨odel’s first incompleteness theorem ensures that if Peano arithmetic is consistent (as most people believe), then there are true statements about the integers that cannot be proved in PA. Goodstein’s theorem is one such statement. If you want more information about the foundational crisis and its main characters and you want it in the form of a graphic novel, then [2] is for you. If you want your G¨ odel with a serving of M. C. Escher (1898–1972) and J. S. Bach (1685–1750), then you need the acclaimed book [6]. Another great choice is [3], particularly for debunking the numerous pseudoscientific assertions often ascribed to G¨ odel’s theorems. Bibliography [1] A. E. Caicedo, Goodstein’s function (English, with English and Spanish summaries), Rev. Colombiana Mat. 41 (2007), no. 2, 381–391. MR2585906 [2] A. Doxiadis and C. H. Papadimitriou, Logicomix: An epic search for truth, character design and drawings by Alecos Papadatos, color by Annie Di Donna, Bloomsbury Press, New York, 2009. MR2884886 [3] T. Franz´en, G¨ odel’s theorem: An incomplete guide to its use and abuse, A K Peters, Ltd., Wellesley, MA, 2005. MR2146326 ¨ [4] K. G¨ odel, Uber die Vollst¨ andigkeit des Logikkalk¨ uls, Doctoral dissertation, University of Vienna, 1929. [5] R. L. Goodstein, On the restricted ordinal theorem, J. Symbolic Logic 9 (1944), 33–41, DOI 10.2307/2268019. https://projecteuclid.org/euclid.jsl/1183391360. MR0010515 [6] D. R. 
Hofstadter, G¨ odel, Escher, Bach: an eternal golden braid, Basic Books, Inc., Publishers, New York, 1979. MR530196 [7] L. Kirby and J. Paris, Accessible independence results for Peano arithmetic, Bull. London Math. Soc. 14 (1982), no. 4, 285–293, DOI 10.1112/blms/14.4.285. http://blms. oxfordjournals.org/content/14/4/285.full.pdf. MR663480 [8] Wolfram MathWorld, Goodstein Sequence, http://mathworld.wolfram.com/ GoodsteinSequence.html.


Ramsey Theory

Introduction

There are many questions that could, in principle, be settled by a computation. However, some of these problems are so far beyond the realm of practical computation that we may never know the answer; see the 1933 entry and the comments for the 1992 entry for other examples of this phenomenon. A great source of such problems is Ramsey theory, named after Frank Plumpton Ramsey (1903–1930), an area of mathematics that studies how large a collection of objects must be to ensure the emergence of a desired property. The seminal problem in Ramsey theory is the determination of the Ramsey number R(m, n), which is defined as follows. Imagine there is a long-expected party with N people, and in any pair of two people either both know each other or neither knows the other. Then R(m, n) is the smallest N which guarantees that there are either (a) at least m people that all know each other or (b) at least n people such that none of these n people know each other. It is known that R(3, 3) = 6 and R(4, 4) = 18. Ramsey theory's mantra is "complete disorder is impossible": any large, seemingly disordered, structure should contain a smaller, highly ordered substructure. Unfortunately, there are often so many cases to investigate that these sorts of problems cannot be solved by brute force. For example, we may associate a graph to the party problem (see Figure 1), with vertices representing people and an edge






Figure 1. Since R(3, 3) = 6, in any party of six or more there are at least three people who all know each other (blue lines) or at least three people who are mutual strangers (red lines). In the configuration illustrated, both of these situations occur: people A, C, E all know each other and B, D, F are mutual strangers.





























Figure 2. The van der Waerden number W (2, 3) is at least 8. The coloring of {1, 2, . . . , 8} pictured at the top has no monochromatic arithmetic progression of length three. It is possible to show that W (2, 3) = 9. For instance, appending a red 9 to the original coloring yields the monochromatic progression 1, 5, 9; appending a blue 9 yields the monochromatic progression 3, 6, 9.

connecting people who know each other. The number of graphs on n labeled vertices is $2^{\binom{n}{2}} = 2^{n(n-1)/2}$, which already exceeds 10^{200} for n = 40. According to Paul Erdős:

Suppose aliens invade the earth and threaten to obliterate it in a year's time unless human beings can find the Ramsey number for red five and blue five. We could marshal the world's best minds and fastest computers, and within a year we could probably calculate the value. If the aliens demanded the Ramsey number for red six and blue six, however, we would have no choice but to launch a preemptive attack.
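For the smallest cases, though, brute force is entirely feasible. The following Python sketch (illustrative only, not from the text) verifies R(3, 3) = 6 by examining every 2-coloring of the edges of K₅ and K₆:

```python
from itertools import combinations, product

def has_mono_triangle(n, coloring):
    """coloring maps each unordered pair of vertices to 0 (red) or 1 (blue)."""
    return any(coloring[(a, b)] == coloring[(a, c)] == coloring[(b, c)]
               for a, b, c in combinations(range(n), 3))

def every_coloring_has_mono_triangle(n):
    """Check all 2^C(n,2) edge colorings of the complete graph K_n."""
    edges = list(combinations(range(n), 2))
    return all(has_mono_triangle(n, dict(zip(edges, colors)))
               for colors in product((0, 1), repeat=len(edges)))

assert not every_coloring_has_mono_triangle(5)  # K5 admits a triangle-free 2-coloring
assert every_coloring_has_mono_triangle(6)      # K6 does not, so R(3, 3) = 6
```

Already for K₆ this loop inspects 2¹⁵ = 32,768 colorings; the same strategy for R(5, 5) would require astronomically many cases, which is exactly Erdős's point.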

Erdős's quote is about R(5, 5) (which is between 43 and 49) and R(6, 6) (which is between 102 and 165). A famous theorem in the subject is due to Bartel Leendert van der Waerden (1903–1996). Given c ≥ 2 colors and a natural number n, there is a natural number W(c, n) such that if N ≥ W(c, n) and we paint the integers 1, 2, . . . , N with these colors, there is a length n arithmetic progression in {1, 2, . . . , N}, each element of which has the same color. It is known that W(2, 3) = 9 (see Figure 2), W(2, 4) = 35, W(2, 5) = 178, and W(2, 6) = 1132. Most other values of W(c, n) are unknown, although bounds exist. For instance, the novel approach to Szemerédi's theorem (see the 1975 entry) developed by Fields Medalist Timothy Gowers (1963– ) yields the upper bound
\[ W(c, n) \le 2^{2^{c^{2^{2^{n+9}}}}}. \]
Although W(c, n) grows rapidly, it is hoped that Gowers's bound is overkill. A cash prize of $1,000 was offered by Ronald Graham (1935– ) for a proof that
\[ W(2, n) < 2^{n^2}; \]
see [6] for a list of problems in Ramsey theory with cash prizes attached.
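The value W(2, 3) = 9 can likewise be confirmed by exhaustive search over all 2^N colorings of {1, . . . , N}. A short Python sketch (illustrative only):

```python
from itertools import product

def has_mono_3ap(coloring):
    """coloring is a tuple of 0/1 colors for the integers 1..N; check for
    a monochromatic 3-term arithmetic progression a, a+d, a+2d."""
    N = len(coloring)
    return any(coloring[a - 1] == coloring[a + d - 1] == coloring[a + 2 * d - 1]
               for d in range(1, N) for a in range(1, N - 2 * d + 1))

def is_vdw_bound(N):
    """True if every 2-coloring of {1,...,N} has a monochromatic 3-term AP."""
    return all(has_mono_3ap(c) for c in product((0, 1), repeat=N))

assert not is_vdw_bound(8)  # some coloring of {1,...,8} avoids mono 3-APs (Figure 2)
assert is_vdw_bound(9)      # no coloring of {1,...,9} does, so W(2, 3) = 9
```

Exhaustive search is hopeless already for W(2, 6) = 1132, since 2¹¹³² colorings would have to be examined; clever pruning and SAT solvers are needed for such results.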



Centennial Problem 1930
Proposed by Joel Spencer, NYU, James M. Andrews, University of Memphis, and Steven J. Miller, Williams College, based on the 1953 Putnam Mathematical Examination.

Six points are in general position in R³ (no three on a line, no four in a plane). The fifteen line segments joining them in pairs are drawn and then painted, some segments red, some blue. Prove that some triangle has all its sides the same color.

1930: Comments

Some cautionary tales. Before giving the solution to this problem, let us digress a bit on the dangers of extrapolating from limited data. For instance, the Ramsey numbers R(n, n) are known only for n = 1, 2, 3, 4; it is hard to surmise what R(5, 5) should be based on this information. Here are a couple cautionary tales about careless extrapolation. Moser's circle problem¹ asks for the maximum number f(n) of regions into which a circle can be partitioned by connecting n points on the circle with chords. A couple quick sketches confirm that
f(1) = 1, f(2) = 2, f(3) = 4, f(4) = 8, f(5) = 16;
see Figure 3. This limited data suggests that f(n) = 2^{n−1} for all n. However, it turns out that f(6) = 31; see Figure 4. The correct general answer,
\[ f(n) = \frac{1}{24}\left(n^4 - 6n^3 + 23n^2 - 18n + 24\right), \]
can be obtained by induction or combinatorial topology [3, 11]. The previous conjecture failed at n = 6.

Here is an even more striking example. Let p(n) = n² + n + 41 and consider its values
41, 43, 47, 53, 61, 71, 83, 97, 113, 131, 151, 173, 197, 223, 251, 281, 313, 347, 383, 421, 461, 503, 547, 593, 641, 691, 743, 797, 853, 911, 971, 1033, 1097, 1163, 1231, 1301, 1373, 1447, 1523, 1601, . . .
for n = 0, 1, 2, . . .. Do you see a pattern? They are all prime! Or at least p(0), p(1), . . . , p(39) are; we intentionally left off the composite number p(40) = 1681 = 41². This shows that even a few dozen cases do not a theorem make. This amazing polynomial was discovered by Euler in 1772; see the 1983 entry for an even more amazing "prime generating polynomial."

In 1919, George Pólya (1887–1985) suggested that most natural numbers have an odd number of prime factors [12]. For instance, 108 = 2² · 3³ has 2 + 3 = 5 prime factors. The Liouville lambda function λ(n) is +1 if n has an even number of prime factors and −1 if n has an odd number of prime factors. Pólya's conjecture states that
\[ L(n) = \sum_{i=1}^{n} \lambda(i) \le 0 \]

for n = 2, 3, 4, . . .. Numerical evidence suggests the truth of the conjecture. In fact, it holds for all n < 906,150,257, which is the smallest counterexample to Pólya's

¹The problem appears in a paper of Leo Moser (1921–1970) and W. Bruce Ross [10], so perhaps the "Moser–Ross circle problem" would be more appropriate.
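Tabulating λ(n) by trial division makes it easy to watch Pólya's inequality hold over any range a laptop can reach (the counterexample near 9.06 × 10⁸ is far beyond this loop). A Python sketch, illustrative only:

```python
def liouville(n):
    """lambda(n) = (-1)**Omega(n): -1 if n has an odd number of prime
    factors counted with multiplicity, +1 otherwise."""
    count, d = 0, 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    if n > 1:
        count += 1
    return -1 if count % 2 else 1

assert liouville(108) == -1  # 108 = 2^2 * 3^3 has 2 + 3 = 5 prime factors

# running partial sums L(n); Polya's inequality L(n) <= 0 for 2 <= n < 10000
total = 0
for n in range(1, 10_000):
    total += liouville(n)
    assert n < 2 or total <= 0
```

That the inequality survives every test in reach of such a loop, yet eventually fails, is precisely the cautionary tale.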





Figure 3. Illustration of Moser's circle problem for small n. Here n points on a circle yield chords that determine at most 2^{n−1} regions for n = 1, 2, 3, 4, 5.







Figure 4. Illustration of Moser's circle problem for n = 6. The pattern f(n) = 2^{n−1} that held for n = 1, 2, 3, 4, 5 fails for n = 6. At most f(6) = 31 regions are determined.
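Equivalently, f(n) = C(n, 4) + C(n, 2) + 1, a form that makes the count easy to check in code. The following sketch (illustrative; the function names are ours) confirms the small values and the break in the doubling pattern at n = 6:

```python
from math import comb

def moser(n):
    """Maximal number of regions: f(n) = C(n,4) + C(n,2) + 1."""
    return comb(n, 4) + comb(n, 2) + 1

def moser_quartic(n):
    """The equivalent quartic (n^4 - 6n^3 + 23n^2 - 18n + 24) / 24."""
    return (n**4 - 6 * n**3 + 23 * n**2 - 18 * n + 24) // 24

# doubles five times, then falls one short of 32
assert [moser(n) for n in range(1, 7)] == [1, 2, 4, 8, 16, 31]
# the binomial form agrees with the quartic everywhere we check
assert all(moser(n) == moser_quartic(n) for n in range(1, 100))
```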



conjecture. When C. Brian Haselgrove (1926–1974) disproved the conjecture in 1958, he proved the existence of a counterexample in the vicinity of 1.845 × 10^{361} [8].

Now for our final warning about naive reliance on numerical data (see also the notes for 1957). One can show that
\[ \int_0^{\infty} \frac{\sin t}{t} \cdot \frac{\sin\frac{t}{101}}{\frac{t}{101}} \cdot \frac{\sin\frac{t}{201}}{\frac{t}{201}} \cdots \frac{\sin\frac{t}{100n+1}}{\frac{t}{100n+1}} \, dt = \frac{\pi}{2} \]
for n = 1, 2, . . . , 9.8 × 10^{42}. However, it is possible to show that the integral is unequal to π/2 for all n > 7.4 × 10^{43} [1].

Solution to the problem. The following is from [4]. Let P be any one of the six points. Five of the line segments end at P, and of these at least three, say PQ, PR, and PS, must have the same color, say blue. If any of the segments QR, RS, SQ is blue, we obtain a blue triangle; if not, QRS will be a red triangle. The proposer, Joel Spencer (1946– ), was given this problem while in high school. After many days he had a proof, but only after going through a good many of the 32,768 cases!

Bibliography

[1] J. C. Baez, Patterns that eventually fail, https://johncarlosbaez.wordpress.com/2018/09/20/patterns-that-eventually-fail/.
[2] A. Carr, Party at Ramsey's, http://blogs.ams.org/mathgradblog/2013/05/11/mathematics/.
[3] J. H. Conway and R. K. Guy, The book of numbers, Copernicus, New York, 1996. MR1411676
[4] A. M. Gleason, R. E. Greenwood, and L. M. Kelly, The William Lowell Putnam Mathematical Competition: Problems and solutions: 1938–1964, Mathematical Association of America, Washington, D.C., 1980. MR588757
[5] R. L. Graham and J. H. Spencer, Ramsey Theory, Scientific American (July 1990), 112-117. http://www.math.ucsd.edu/~ronspubs/90_06_ramsey_theory.pdf.
[6] R. L. Graham, Some of my favorite problems in Ramsey theory, INTEGERS 7 (2007), no. 2, #A15.
[7] W. T. Gowers, A new proof of Szemerédi's theorem, Geom. Funct. Anal. 11 (2001), no. 3, 465–588, DOI 10.1007/s00039-001-0332-9. MR1844079
[8] C. B. Haselgrove, A disproof of a conjecture of Pólya, Mathematika 5 (1958), 141–145, DOI 10.1112/S0025579300001480. MR0104638
[9] B. M. Landman and A. Robertson, Ramsey theory on the integers, Student Mathematical Library, vol. 24, American Mathematical Society, Providence, RI, 2004. MR2020361
[10] L. Moser and W. B. Ross, Mathematical Miscellany, Math. Mag. 23 (1949), no. 2, 109–114. MR1570450
[11] The On-Line Encyclopedia of Integer Sequences, A000127 (Maximal number of regions obtained by joining n points around a circle by straight lines. Also number of regions in 4-space formed by n − 1 hyperplanes), http://oeis.org/A000127.
[12] G. Pólya, Verschiedene Bemerkungen zur Zahlentheorie, Jahresbericht der Deutschen Mathematiker-Vereinigung 28, 31-40.
[13] F. P. Ramsey, On a Problem of Formal Logic, Proc. London Math. Soc. (1930), s2-30, no. 1, 264-286. https://londmathsoc.onlinelibrary.wiley.com/doi/abs/10.1112/plms/s2-30.1.264.


The Ergodic Theorem

Introduction

A discrete dynamical system consists of a set of states X and a function T : X → X. Given a system in state x ∈ X, let T(x) be the state of the system one unit of time later. If the system starts in state x, it next moves to T(x), then to T^2(x) = T(T(x)), and so forth. If A is a set of states and

χ_A(x) = 1 if x ∈ A,   χ_A(x) = 0 if x ∉ A,

is the characteristic function (also called the indicator function) of A, then

∑_{i=1}^n χ_A(T^i(x))

counts the number of visits of x to A up to and including time n. The time average of visits of x to A is

lim_{n→∞} (1/n) ∑_{i=1}^n χ_A(T^i(x)),

if this limit exists. Suppose that we have a notion of "size" m(A) for subsets A of X. We insist that this "measure" is normalized so that m(X) = 1. Then m(A) can be thought of as the relative size of A in X (see the notes for 1924 for some of the hazards of naive measure theory). We also assume that T : X → X is measure preserving, in the sense that m(T^{−1}(A)) = m(A) for every measurable set A. That is, although T mixes and rearranges points of X, the size of A is unchanged after an application of T. For instance, X could be a batch of (incompressible) cookie dough and T could be the act of kneading the dough once (in some prescribed manner) for one minute. A particular handful A of dough might be warped, stretched, or cut, but the volume occupied by A, T(A), T^2(A), . . . is always the same.

A consequence of Ludwig Boltzmann (1844–1906) and Josiah Willard Gibbs's (1839–1903) investigations in statistical mechanics was the ergodic hypothesis. A version of the ergodic hypothesis states that the time average of a system should equal the space average, m(A). To see what this means, we consider a simple example. An irrational rotation is a function T : [0, 1) → [0, 1) of the form T(x) = x + θ (mod 1), in which θ is a fixed irrational real number. By x + θ (mod 1), we refer to the fractional part x + θ − ⌊x + θ⌋ of x + θ. For example, if x = 0.5 and θ = √2 = 1.414 . . ., then x + θ (mod 1) = 0.914 . . .. The term "rotation" stems from the fact that the wrapped interval [0, 1) is topologically the same as a circle. From this perspective, addition of θ modulo 1 corresponds to a rotation of the circle through an angle of 2πθ. It is possible to show that the ergodic hypothesis holds for this example. For instance,



Figure 1. T(0), T^2(0), . . . , T^100(0) for θ = √2, e, and π, respectively.

the average amount of time that T(0), T^2(0), . . . spends in an interval [a, b] equals the length b − a of that interval; see Figure 1.

In 1931 John von Neumann [6], followed shortly by the sooner-to-publish George Birkhoff [2], proved that time averages exist and equal the space averages for measure-preserving systems satisfying a condition called ergodicity. A set E is invariant if T(x) ∈ E if and only if x ∈ E. Ergodicity means that the only invariant sets for T are those that differ from ∅ or X by a set of measure zero. If E is invariant and x starts in E, then all of its iterates stay in E and no point outside E visits E. That means that if T were not ergodic, there would exist sets E and E^c, both of positive measure, for which the dynamics of T on E would be totally unrelated to the dynamics of T on E^c. In other words, one could decompose the dynamical system into two independent, simpler systems. We can now state the Birkhoff ergodic theorem: for all measurable sets A there exists a set of measure zero N so that

lim_{n→∞} (1/n) ∑_{i=1}^n χ_A(T^i(x)) = m(A)   for all x outside N.
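The irrational rotation claim is easy to test numerically. The following Python sketch (ours, not from the original text) counts how often the orbit of 0 under T(x) = x + θ (mod 1) visits an interval and compares the result with the interval's length:

```python
import math

def rotation_orbit_frequency(theta, a, b, n):
    """Fraction of the first n iterates of T(x) = x + theta (mod 1),
    starting at x = 0, that land in the interval [a, b]."""
    x, hits = 0.0, 0
    for _ in range(n):
        x = (x + theta) % 1.0
        if a <= x <= b:
            hits += 1
    return hits / n

# For an irrational rotation the time average should approach the
# "space average," namely the interval's length b - a.
freq = rotation_orbit_frequency(math.sqrt(2), 0.25, 0.75, 100_000)
print(freq)  # very close to 0.5
```

The function name is our own invention; the point is only that the observed frequency matches b − a, as the ergodic theorem predicts.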


An important part of the theorem is that the limit exists. In fact, once we know the limit exists, standard results from analysis show that the limit equals the measure of A. An immediate consequence is that for all sets A of positive measure, every point of X (outside a set of measure zero) visits the set A, and furthermore, the visits occur with the "right" frequency. Thus, we know a lot about the orbit of almost every point. This theorem has had a strong influence in analysis and has many consequences. For example, it can be used to prove Weyl's uniform distribution property¹ and the law of large numbers² from probability. The ergodic theorem is in fact a bit more general: one can replace χ_A by any Lebesgue-integrable function f, and then m(A) is replaced by the integral of f. The theorem proved by von Neumann is similar to Birkhoff's, but the convergence is in the norm of a Hilbert space in which the functions reside. For an introduction and proof the reader may consult [7]. Further historical details and current developments can be found in [1].

¹If α is irrational, then the set {nα (mod 1)}_{n=1}^∞ is equidistributed in [0, 1].
²Let X_1, . . . , X_n be independent random variables drawn from a common distribution with mean μ and let X̄_n = (X_1 + · · · + X_n)/n denote the sample mean. Then X̄_n converges in probability to μ. A sequence of random variables S_n converges in probability to a random variable S (which in our case will be the constant μ) if for every ε > 0 we have lim_{n→∞} P(|S_n − S| > ε) = 0.



Centennial Problem 1931 Proposed by Cesar E. Silva, Williams College. In 1988 Jean Bourgain (1954– ) proved that for every square-integrable function f, the time average along polynomial times exists outside a set of measure zero [3]. In other words,

(1/n) ∑_{i=1}^n f(T^{p(i)}(x))   converges for almost all x,

for any polynomial p with integer coefficients. When all powers of T are ergodic, it follows that this limit equals the integral that is expected. It is reasonable to ask what happens when the function f is merely integrable, even in the case of the squares: p(i) = i^2. It was shown recently by Buczolich and Mauldin [4] that the theorem for the squares fails when f is only assumed to be integrable. This proof has been extended recently by P. LaVictoire. It would be interesting to find simpler proofs of all of these results.

1931: Comments

Continued fractions. We briefly discuss a connection between the ergodic theorem and continued fractions; see [5] and the references therein, as well as the 1934 and 1972 entries. Each real number x has a unique continued fraction expansion

(1931.1)   x = a_0(x) + 1/(a_1(x) + 1/(a_2(x) + 1/(a_3(x) + 1/· · ·)))

in which the positive integers a_i(x) are the continued fraction digits of x. For typographical reasons we write x = [a_0; a_1, a_2, . . . ] or x = [a_1, a_2, . . . ] if a_0 = 0. How are the a_i(x) computed? First, let a_0(x) = ⌊x⌋, the greatest integer at most x. Next, let a_1(x) = ⌊1/(x − a_0(x))⌋, and so forth. The continued fraction (1931.1) is finite if and only if x is rational. It is eventually periodic if and only if x is a quadratic irrational.

For an x chosen uniformly at random in [0, 1), what is the probability as n → ∞ that the nth digit is k? The answer is the beautiful Gauss–Kuzmin theorem, due to Gauss and Rodion Kuzmin (1891–1949). It says that for almost all x the probability converges to

log₂(1 + 1/(k(k + 2)));

see [5] for a proof, which is an expanded version of the argument in the classic book by Aleksandr Khinchin (1894–1959). The beauty stems from the clear, simple formula. The problem with the Gauss–Kuzmin theorem is that we do not know much about the exceptional set. For example, although we believe that cubic irrationals follow the Gauss–Kuzmin distribution, we do not know for sure. Some specific numbers, such as e^{1/n} for n = 1, 2, . . ., fail dramatically.
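The digit algorithm just described (take the floor, subtract, reciprocate, repeat) runs in a few lines. A minimal Python sketch (ours, not from the text); with double-precision floating point the iteration is only trustworthy for roughly the first dozen digits, so we check it against the well-known opening pattern of e's expansion:

```python
import math

def cf_digits(x, count):
    """First `count` continued fraction digits [a0; a1, a2, ...] of x,
    computed by repeatedly taking floors and reciprocals."""
    digits = []
    for _ in range(count):
        a = math.floor(x)
        digits.append(a)
        frac = x - a
        if frac == 0:      # x was rational; the expansion terminates
            break
        x = 1.0 / frac     # floating point: trust only the early digits
    return digits

print(cf_digits(math.e, 10))  # [2, 1, 2, 1, 1, 4, 1, 1, 6, 1]
```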



Figure 2. Histograms of the first 10,000 iterates of the Gauss map versus the probability density function (PDF) of the Gauss measure for fixed x: (a) x = π; (b) x = e.

A central ingredient in the proof of the Gauss–Kuzmin theorem is the Gauss map G. This is the ergodic transformation on [0, 1) defined by G(0) = 0 and

G(x) = 1/x − ⌊1/x⌋   for x ≠ 0.

We can use G to obtain the digits of x's continued fraction expansion as follows. Given x ∈ [0, 1), set a_0 = 0, a_1 = ⌊1/x⌋, and, in general, a_n = ⌊1/G^{n−1}(x)⌋. The reader is strongly encouraged to choose a few different x and to look at the sequence {G^n(x)}_{n=1}^∞ with respect to the Gauss measure γ, defined by

γ(A) = (1/log 2) ∫_A dx/(1 + x).

We compare the distribution of iterates of the Gauss map applied to π and e and the Gauss measure in Figure 2. Note the stark difference in behavior; the fit is excellent for π but so bad for e that we cannot show the observed and Gauss–Kuzmin predictions on the same plot. Both of these numbers are transcendental, but they have very different properties. The first few digits of their continued fraction expansions are π = [3; 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, 2, 1, 84, 2, 1, 1, 15, 3, 13, 1, 4, 2, 6, 6, 99, 1, 2, 2, 6, 3, 5, 1, 1, 6, 8, 1, 7, 1, 2, 3, 7, 1, 2, 1, 1, 12, 1, 1, 1, 3, 1, 1, 8, 1, 1, 2, 1, 6, 1, 1, 5, 2, 2, 3, 1, 2, 4, 4, 16, 1, 161, 45, 1, 22, 1, 2, 2, 1, 4, 1, 2, 24, 1, 2, 1, 3, 1, 2, 1, 1, 10, . . . ] and e = [2; 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, 1, 1, 10, 1, 1, 12, 1, 1, 14, 1, 1, 16, 1, 1, 18, 1, 1, 20, 1, 1, 22, 1, 1, 24, 1, 1, 26, 1, 1, 28, 1, 1, 30, 1, 1, 32, 1, 1, 34, 1, 1, 36, 1, 1, 38, 1, 1, 40, 1, 1, 42, 1, 1, 44, 1, 1, 46, 1, 1, 48, 1, 1, 50, 1, 1, 52, 1, 1, 54, 1, 1, 56, 1, 1, 58, 1, 1, 60, 1, 1, 62, 1, 1, 64, 1, 1, 66, 1, . . . ]. Notice that two out of every three continued-fraction digits of e equal 1 and the others form an arithmetic progression. A pattern in π's continued-fraction digits has never been found.
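One can watch the Gauss–Kuzmin statistics emerge numerically by pooling the early continued-fraction digits of many random points. This Python sketch (our illustration, not from the text) reads off digits by iterating the Gauss map and compares the frequency of the digit 1 with the predicted log₂(1 + 1/3) ≈ 0.415:

```python
import math
import random

def digits_via_gauss_map(x, count):
    """Continued fraction digits a_1, ..., a_count of x in (0, 1),
    read off by iterating the Gauss map G(x) = 1/x - floor(1/x)."""
    out = []
    for _ in range(count):
        if x == 0:
            break
        out.append(math.floor(1 / x))
        x = 1 / x - math.floor(1 / x)   # apply the Gauss map
    return out

random.seed(0)
pool = []
for _ in range(2000):
    pool.extend(digits_via_gauss_map(random.random(), 10))

freq_one = pool.count(1) / len(pool)
predicted = math.log2(1 + 1 / (1 * (1 + 2)))   # Gauss-Kuzmin for k = 1
print(freq_one, predicted)   # both are approximately 0.42
```

The agreement is only approximate here: the distribution of the very first digit differs slightly from the Gauss–Kuzmin limit, and the limit is only approached as the digit position grows.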



Bibliography
[1] V. Bergelson, Some historical comments and modern questions around the ergodic theorem, Dynamics of Complex Systems, Research Institute for Math. Sciences, Kyoto, 2004, 1–11. https://people.math.osu.edu/bergelson.1/vb_Kyoto8Nov04.pdf.
[2] G. Birkhoff, Proof of the ergodic theorem, Proc. Nat. Acad. Sci. USA 17 (1931), 656–660. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1076138/.
[3] J. Bourgain, On the maximal ergodic theorem for certain subsets of the integers, Israel J. Math. 61 (1988), no. 1, 39–72, DOI 10.1007/BF02776301. http://link.springer.com/article/10.1007%2FBF02776301. MR937581
[4] Z. Buczolich and R. D. Mauldin, Divergent square averages, Ann. of Math. (2) 171 (2010), no. 3, 1479–1530, DOI 10.4007/annals.2010.171.1479. http://annals.math.princeton.edu/wp-content/uploads/annals-v171-n3-p02-p.pdf. MR2680392
[5] S. J. Miller and R. Takloo-Bighash, An invitation to modern number theory, with a foreword by Peter Sarnak, Princeton University Press, Princeton, NJ, 2006. MR2208019
[6] J. von Neumann, Proof of the quasi-ergodic hypothesis, Proc. Nat. Acad. Sci. USA 18 (1932), 70–82. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1076162/.
[7] C. E. Silva, Invitation to ergodic theory, Student Mathematical Library, vol. 42, American Mathematical Society, Providence, RI, 2008. MR2371216


The 3x + 1 Problem

Introduction

The Collatz function T : N → N is defined by

T(x) = x/2 if x is even,   T(x) = 3x + 1 if x is odd.

Now pick a seed, a natural number n, and consider the corresponding Collatz sequence n, T(n), T^2(n), . . ., in which T^k(n) denotes the k-fold iterate T(T(· · · (T(n)))). This is also called the orbit of n under T. For example, n = 21 yields the Collatz sequence

21, 64, 32, 16, 8, 4, 2, 1, 4, 2, 1, . . . ,

and n = 24 provides

24, 12, 6, 3, 10, 5, 16, 8, 4, 2, 1, 4, 2, 1, . . . .

Both sequences eventually settle down to the repeating pattern 4, 2, 1, 4, 2, 1, . . ., a periodic orbit of period three. It appears that for every initial seed n, the Collatz sequence eventually reaches the number 1; that is, every Collatz sequence ends with 4, 2, 1, 4, 2, 1, . . .. Proving this is the famed 3x + 1 problem (or the 3x + 1 conjecture), often credited to Lothar Collatz (1910–1990). It goes by an astounding number of other names as well: Ulam's conjecture, Kakutani's problem, the Thwaites conjecture, and the Syracuse problem. We are not going to debate the origins of this problem and are content to call it the 3x + 1 problem.

One way to visualize the 3x + 1 problem is with a directed graph; see Figure 1. Each natural number is a vertex in the Collatz graph and there is an arrow from j to k whenever T(j) = k. The 3x + 1 conjecture asserts that no matter which vertex you start on, following the arrows in the Collatz graph always leads to 4, 2, 1, 4, 2, 1, . . ..

Some seeds take a long time to reach 1. For example, n = 27 requires 111 iterations. Its Collatz sequence climbs all the way up to 9,232 before coming back down (see the notes for 1929 for an example of another sequence that demonstrates this sort of behavior). This highlights one of our main obstacles: there is no simple way to predict how high the Collatz sequence for a given seed reaches. As of 2015, the 3x + 1 conjecture has been verified for all seeds less than 2^60. Although this is an overwhelming amount of numerical evidence, it is not a proof (see the notes for 1930 for examples of misleading computations).
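The orbit computation takes only a few lines; this Python sketch (ours, not from the text) reproduces the n = 27 statistics quoted above:

```python
def collatz_orbit(n):
    """Orbit of n under the Collatz function T, stopping at the
    first occurrence of 1."""
    orbit = [n]
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        orbit.append(n)
    return orbit

orbit = collatz_orbit(27)
print(len(orbit) - 1, max(orbit))  # 111 iterations, peak value 9232
```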


1932. THE 3x + 1 PROBLEM

Figure 1. The map T : N → N induces a directed graph on the natural numbers. A portion of this graph is represented here. The 3x + 1 problem suggests that any starting point eventually leads to the cycle 1, 4, 2, 1, 4, 2, . . ..

One can prove that the distribution of digits converges, in an appropriate sense, to Benford's law on digit bias (see the 1938 entry and [4, 7]). That is, if you take a large starting seed and look at all the iterates until it hits the cycle 4, 2, 1, then with probability tending to 1 the digit distribution converges to Benford's law. One can make this precise: given any tolerance, the number of starting seeds in an interval [1, X] that are more than this tolerance away from Benford tends to zero as X → ∞.

How can one disprove the 3x + 1 conjecture? There are two ways in which the conjecture could be false. There might be a seed whose Collatz sequence is unbounded. Or there might be a periodic orbit other than 4, 2, 1 (it is known that there are no other periodic orbits of length 100,000,000 or less [3]). Some have described the 3x + 1 problem as a Soviet conspiracy to slow down American mathematics since so many people tried working on it, tempted by its apparent simplicity. Paul Erdős said that mathematics is not yet ready to address questions such as the 3x + 1 problem.



Centennial Problem 1932 Proposed by Jeffrey Lagarias, University of Michigan. Here we consider the original function G : Z → Z, defined by

G(3n) = 2n,   G(3n + 1) = 4n + 1,   and   G(3n + 2) = 4n + 3,

that Collatz wrote down on July 1, 1932. It is a permutation of Z and its inverse is given by

G⁻¹(2n) = 3n,   G⁻¹(4n + 1) = 3n + 1,   and   G⁻¹(4n + 3) = 3n + 2.
One can show that G maps N onto N, so it induces a permutation of N too. One finds that G(1) = 1 is a fixed point, that G(2) = 3 and G(3) = 2 form a periodic orbit of period 2, and that

G(4) = 5,   G(5) = 7,   G(7) = 9,   G(9) = 6,   G(6) = 4

form a periodic orbit of period 5.

(a) What happens for n = 8? Computation indicates that the forward orbit {G^k(n) : k ≥ 0} of n = 8 includes numbers larger than 10^400. But is the orbit infinite? This question is the original Collatz problem and it has been proposed independently several times, starting with Murray S. Klamkin (1921–2004) in 1963 [2]. It too is unsolved and could be as hopeless as the 3x + 1 problem.

(b) For N = 1, 2, . . ., let S_N = {n ∈ [1, N] : G^k(8) = n for some k ∈ Z}. It is conjectured that

(1932.1)   lim_{N→∞} |S_N|/N = 0.

Probabilistic models suggest that |S_N| = O(log N) as N → ∞ and computer experiments support this. So there seems to be "room to spare" in trying to establish (1932.1). Nevertheless, this problem seems difficult. The reader is warned.

(c) Consider the full forward and backward orbit of n = 8: S_∞ = {n ∈ N : G^k(8) = n for some k ∈ Z}. Disprove that there are only finitely many natural numbers that are not in S_∞. This assertion sounds simple to resolve and it is much weaker than (1932.1). Nevertheless, it is an open problem and may be as intractable as the 3x + 1 problem.

1932: Comments

A heuristic approach. When stuck on a difficult conjecture, one can try to give heuristic arguments for or against its validity. To simplify our model, we omit the troublesome +1 in the definition of the Collatz function. Since half of the even numbers are divisible by 2 and not by 4, and a fourth are divisible by 4 and not by 8, and so on, we consider the functions H_2(x) = 3x/2, H_4(x) = 3x/4, H_8(x) = 3x/8, and so forth. Our heuristic approximation to the Collatz function is denoted H; it is obtained by applying H_{2^k} with probability 1/2^k for k = 1, 2, . . .. The hope is that this related problem is easier to analyze and that its behavior will shed light on the original problem.
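This random model is easy to simulate. In the Python sketch below (our illustration, with k drawn geometrically so that P(k) = 1/2^k), each step multiplies x by 3/2^k, and the average change in log x comes out near log(3/4) ≈ −0.288:

```python
import math
import random

random.seed(1)

def heuristic_step_drift(samples):
    """Average change in log(x) for one step of the random model H:
    with probability 1/2**k the step multiplies x by 3/2**k."""
    total = 0.0
    for _ in range(samples):
        k = 1
        while random.random() < 0.5:   # geometric: P(k) = 1/2**k
            k += 1
        total += math.log(3) - k * math.log(2)
    return total / samples

drift = heuristic_step_drift(200_000)
print(drift, math.log(3 / 4))   # both near -0.2877
```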



It is more appropriate to consider the expected value of log H(x) since there are products involved. According to our model,¹

E[log H(x)] = ∑_{k=1}^∞ (1/2^k) log H_{2^k}(x) = ∑_{k=1}^∞ (1/2^k) log(3x/2^k)
            = log x + log 3 − ∑_{k=1}^∞ (k log 2)/2^k = log x + log(3/4)
            < log x.

Consequently, iterating H once decreases the size of the expected outcome. Repeated iterations should continue to decrease. Not only does such an argument lead to heuristic support for the 3x + 1 conjecture, it also suggests roughly how many steps one needs to iterate until we reach 1. Since each iteration tends to replace x with (3/4)x, the expected number of iterations should satisfy (3/4)^m x = 1; that is,

m ≈ log x / log(4/3).

Numerical data strongly supports this rate; see [5, 6] for more on these ideas.

The idea of replacing a deterministic problem with a random one is applicable in many other settings. One can do this with prime numbers to build intuition about a host of problems. However, one must be careful. Just as the 3x + 1 problem has some structure that is lost in the conversion to a random model, the actual sequence of primes has additional structure not present in random analogues. While random models are useful, they sometimes give the wrong answer in certain regimes.

Lychrel numbers. We end with an example that leads to another simply stated open problem. Consider the function L : N → N defined by L(n) = n + R(n), in which R(n) is the number formed by reversing the decimal representation of n. For instance, L(89) = 89 + 98 = 187, L^2(89) = L(187) = 187 + 781 = 968, and so forth. This leads to the following sequence: 89, 187, 968, 1837, 9218, 17347, 91718, 173437, 907808, 1716517, 8872688, 17735476, 85189247, 159487405, 664272356, 1317544822, 3602001953, 7193004016, 13297007933, 47267087164, 93445163438, 176881317877, 955594506548, 1801200002107, 8813200023188, . . . . The number L^24(89) is the palindrome 8,813,200,023,188; it is the same read forward or backward. Most natural numbers appear eventually to reach a palindrome under repeated applications of L. A Lychrel number is a natural number for which this process never yields a palindrome.
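The reverse-and-add iteration is a few lines of Python (our sketch, with our own helper name). It confirms the computation for 89 and shows 196 still resisting after many steps, which of course proves nothing about whether 196 is a Lychrel number:

```python
def reverse_and_add_steps(n, max_steps):
    """Iterate L(n) = n + R(n) until a palindrome appears.
    Returns (steps, value) on success, or None if max_steps is hit."""
    for steps in range(1, max_steps + 1):
        n = n + int(str(n)[::-1])     # L(n) = n + (digits reversed)
        if str(n) == str(n)[::-1]:    # palindrome check
            return steps, n
    return None

print(reverse_and_add_steps(89, 100))    # (24, 8813200023188)
print(reverse_and_add_steps(196, 1000))  # None: no palindrome yet
```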
Brute force computations show that no n ≤ 195 is a Lychrel number, but no one is sure about 196 (this leads to an alternative name for this iteration: the 196-algorithm). Nobody knows whether Lychrel numbers exist, but 196 sure looks like a strong candidate: 196, 887, 1675, 7436, 13783, 52514, 94039, 187088, 1067869, 10755470, 18211171, 35322452, 60744805, 111589511, 227574622, 454050344, 897100798, 1794102596, 8746117567, 16403234045, 70446464506, 130992928913, 450822227944, 900544455998, . . . . Over a billion iterates have been computed without reaching a palindrome. Extensive computation suggests that the following integers are Lychrel numbers [9]: 196, 295, 394, 493, 592, 689, 691, 788, 790, 879, 887, 978, 986, 1495, 1497, 1585, 1587, 1675, 1677, 1765, 1767, 1855, 1857, 1945, 1947, 1997, 2494, 2496, 2584, 2586, 2674, 2676, 2764, 2766, 2854, 2856, 2944, 2946, 2996, 3493, 3495, 3583, 3585, 3673, 3675. Curiously, Lychrel numbers are known to exist in other bases. For example, in binary the number 10110 (which is 22 in decimal) is a Lychrel number. Can you prove it?

¹To evaluate ∑_{k=1}^∞ k/2^k, differentiate the identity ∑_{n=0}^∞ z^n = (1 − z)^{−1}, valid for |z| < 1, multiply the result by z, and obtain ∑_{n=1}^∞ nz^n = z/(1 − z)^2. Then substitute z = 1/2.

Bibliography
[1] R. K. Guy, Don't try to solve these problems!, Amer. Math. Monthly 90 (1983), 35–41. http://www.jstor.org/discover/10.2307/2975688?uid=3739256&uid=2&uid=4&sid=21102550539183.
[2] M. S. Klamkin, Problem 63-13∗, SIAM Review 5 (1963), 275–276.
[3] L. Halbeisen and N. Hungerbühler, Optimal bounds for the length of rational Collatz cycles, Acta Arith. 78 (1997), no. 3, 227–239, DOI 10.4064/aa-78-3-227-239. MR1432018
[4] A. V. Kontorovich and S. J. Miller, Benford's law, values of L-functions and the 3x + 1 problem, Acta Arith. 120 (2005), no. 3, 269–297, DOI 10.4064/aa120-3-4. http://arxiv.org/pdf/math/0412003v2. MR2188844
[5] J. C. Lagarias, The 3x + 1 problem and its generalizations, Amer. Math. Monthly 92 (1985), no. 1, 3–23, DOI 10.2307/2322189. MR777565
[6] J. C. Lagarias (ed.), The ultimate challenge: the 3x + 1 problem, American Mathematical Society, Providence, RI, 2010. MR2663745
[7] J. C. Lagarias and K. Soundararajan, Benford's law for the 3x + 1 function, J. London Math. Soc. (2) 74 (2006), no. 2, 289–303, DOI 10.1112/S0024610706023131. http://arxiv.org/pdf/math/0509175.pdf. MR2269630
[8] H. L. Montgomery and K. Soundararajan, Primes in short intervals, Comm. Math. Phys. 252 (2004), no. 1-3, 589–617, DOI 10.1007/s00220-004-1222-4. MR2104891
[9] The On-Line Encyclopedia of Integer Sequences, A023108 (Positive integers which apparently never result in a palindrome under repeated applications of the function f(x) = x + (x with digits reversed)), http://oeis.org/A023108.


Skewes's Number

Introduction

For a few decades, Skewes's number held the record as the largest finite number to meaningfully appear in a mathematical research paper. Let π(x) denote the number of primes at most x and let

(1933.1)   Li(x) = ∫₂^x dt/log t

denote the offset logarithmic integral function. One version of the prime number theorem (see the 1913 and 1919 entries) says that

lim_{x→∞} π(x)/Li(x) = 1.

This is illustrated in Figure 1. The logarithmic integral gives a better approximation to π(x) than x/ log x, which is used in other formulations of the prime number theorem; see Table 1.

Figure 1. Graphs of Li(x) versus π(x) on various scales: (a) x ≤ 100; (b) x ≤ 1,000; (c) x ≤ 10,000; (d) x ≤ 100,000.



Table 1. The logarithmic integral Li(x) is a better approximation to the prime-counting function π(x) than is x/log x. The entries in the table have been rounded to the nearest integer.

            x        π(x)      Li(x)    x/log x
        1,000         168        177        145
       10,000       1,229      1,245      1,086
      100,000       9,592      9,629      8,686
    1,000,000      78,498     78,627     72,382
   10,000,000     664,579    664,917    620,421
  100,000,000   5,761,455  5,762,208  5,428,681
For all practically computable values of x, the function li(x) = Li(x) + log 2 satisfies li(x) > π(x). Based upon overwhelming numerical evidence, it was conjectured that this held for all x. In 1914, John Edensor Littlewood (1885–1977) showed that li(x) − π(x) changes sign infinitely many times. Littlewood asked one of his students, a South African named Stanley Skewes (1899–1988), to compute how high one must go to find the first integer s_0 for which π(s_0) > li(s_0). Assuming the truth of the Riemann hypothesis,¹ Skewes proved in 1933 that

s_0 < e^{e^{e^{79}}}.

In 1955, he showed that if the Riemann hypothesis is false, then

s_0 < e^{e^{e^{e^{7.705}}}}.

Both of these extraordinary numbers are sometimes referred to as Skewes's number. While much progress has been made, the best upper bounds on s_0 are still on the order of e^{728} (or about 10^{316}). It seems hopeless to expect the first sign change to be found by computer. Since Skewes's second bound is larger than the first, we can conclude that li(x) − π(x) changes sign somewhere before exp(exp(exp(exp(7.705)))). Why? There are two cases. Either the Riemann hypothesis is true or it is false, and Skewes covered both cases! Voilà! For another striking example of this sort of "magical" reasoning, see the 1935 entry.

Are we overlooking a third possibility? Could the Riemann hypothesis (see the 1942 and 1945 entries) be undecidable, say in ZFC (Zermelo–Fraenkel set theory with the axiom of choice)? If it is false, then it must be provably false in ZFC. Why? Because it is known to be equivalent, under ZFC, to various elementary statements about natural numbers. Let

H_n = 1 + 1/2 + · · · + 1/n

denote the nth harmonic number. In 2002, Lagarias showed that the statement

"for each n ≥ 1, ∑_{d|n} d ≤ H_n + e^{H_n} log H_n"

¹The Riemann hypothesis, one of the seven Clay Millennium Problems (see the comments for the 2000 entry), is one of the most important open problems in mathematics. Its veracity would have numerous applications throughout number theory and cryptography. It's going to take a while to build up to! See below and the entries for 1942, 1945, 1948, 1967, and 1987.



is equivalent to the Riemann hypothesis [3]. Thus, if the Riemann hypothesis (RH) is false, there is a natural number n for which the preceding inequality is violated and hence there is a finite computation that disproves the Riemann hypothesis. On the other hand, if RH is undecidable in ZFC, then it is true (but just not provable in ZFC; see the 1929 entry on Gödel's work). Why? If the RH were undecidable in ZFC, then no natural number n violating Lagarias's condition exists (the existence of such an n would lead to a quick proof of the falsehood of the Riemann hypothesis). Thus, if the RH is undecidable in ZFC, then Lagarias's condition holds, so the RH is true (just not provable). See the 1924, 1929, and 1963 entries for more information on axiom systems, and the 1987 entry for connections between the Riemann hypothesis and counting primes.
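Lagarias's inequality really is checkable by machine for any single n. A short Python sketch (ours; the helper names are our own) verifies it for a range of small n and shows how close the bound can get:

```python
import math

def sigma(n):
    """Sum of the divisors of n (naive trial division)."""
    return sum(d for d in range(1, n + 1) if n % d == 0)

def lagarias_bound(n):
    """H_n + exp(H_n) * log(H_n), the right-hand side of the criterion."""
    H = sum(1 / k for k in range(1, n + 1))
    return H + math.exp(H) * math.log(H)

# A counterexample at any n would disprove the Riemann hypothesis;
# unsurprisingly, none shows up in this small range.
assert all(sigma(n) <= lagarias_bound(n) for n in range(1, 2000))
print(sigma(12), lagarias_bound(12))  # 28 vs roughly 28.3: a near miss
```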

Centennial Problem 1933 Proposed by Steven J. Miller, Williams College. Let e↑n(x) = exp(exp(· · · exp(exp(x)))), in which there are n iterated exponentials. Thus, Skewes's 1955 result is the bound s_0 ≤ e↑4(7.705). If we were to write this as 10^y, what would y equal? More generally, if e↑n(x) = 10^{f(x;n)}, how fast does f grow with n? With x? The functions e↑n(x) are also known as iterated towers. For more rapidly growing quantities, see the 1926 and 1992 entries.

1933: Comments

A proof technique. Skewes's arguments use a powerful proof technique: break the problem into an exhaustive set of cases, where in each case you have additional facts at your disposal. For another example of this approach, see the 1935 entry.

Term-by-term multiplication and Mertens's theorem. Here are some facts about infinite series that we will need shortly. Suppose that ∑_{n=0}^∞ a_n and ∑_{n=0}^∞ b_n are two convergent series of complex numbers. Naively multiplying the two series term-by-term suggests that

(∑_{n=0}^∞ a_n)(∑_{n=0}^∞ b_n) = (a_0 + a_1 + a_2 + · · · )(b_0 + b_1 + b_2 + · · · )
= a_0b_0 + (a_0b_1 + a_1b_0) + (a_0b_2 + a_1b_1 + a_2b_0) + · · · = ∑_{n=0}^∞ c_n,

in which c_n = ∑_{k=0}^n a_k b_{n−k}. The series ∑_{n=0}^∞ c_n is the Cauchy product of ∑_{n=0}^∞ a_n and ∑_{n=0}^∞ b_n. This term-by-term multiplication of series is permissible if both of



the series involved are absolutely convergent.² This is used implicitly in calculus, complex variables, and differential equations whenever power series methods are involved. If both series are conditionally convergent (convergent but not absolutely convergent), then their Cauchy product series can diverge. An example is furnished by a_n = b_n = (−1)^n/√(n + 1). The alternating series test confirms that ∑_{n=0}^∞ a_n and ∑_{n=0}^∞ b_n converge. However,

|c_n| = |∑_{k=0}^n a_k b_{n−k}| = ∑_{k=0}^n 1/√((k + 1)(n − k + 1))
     ≥ ∑_{k=0}^n 1/√((n/2 + 1)(n/2 + 1)) = ∑_{k=0}^n 1/(n/2 + 1) = (n + 1) · 2/(n + 2) = (2n + 2)/(n + 2)

does not tend to zero, so ∑_{n=0}^∞ c_n diverges.

Mertens's theorem, due to Franz Mertens (1840–1927), ensures that if at least one of the two series involved is absolutely convergent, then term-by-term multiplication is permissible. To be more specific, if ∑_{n=0}^∞ a_n = A and ∑_{n=0}^∞ b_n = B are convergent series of complex numbers, at least one of which is absolutely convergent, then their Cauchy product series ∑_{n=0}^∞ c_n converges to AB. Proving Mertens's theorem is a good exercise in analysis. Here is a sketch. Let A_n, B_n, and C_n be the nth partial sums of the three series involved and consider the identity C_n = A_n B + ∑_{i=0}^n (B_i − B)a_{n−i}. Since A_n → A, the key is to show that ∑_{i=0}^n (B_i − B)a_{n−i} → 0 as n → ∞.

The Riemann zeta function and the Euler product formula. In homage to Riemann, who wrote s = σ + it to denote his complex variable, we follow him and use the letter s below to refer to a complex number. The Riemann hypothesis concerns the location of the complex zeros of the Riemann zeta function

(1933.2)   ζ(s) = ∑_{n=1}^∞ 1/n^s,


which is defined initially for Re s > 1. It might at first appear strange to call (1933.2) by such a fancy name. Indeed, (1933.2) is the familiar p-series from calculus. However, the Riemann zeta function is the critical function that links analysis and number theory. In particular, the deepest properties of the prime numbers are encoded in the Riemann zeta function. The connection between the innocuous looking Riemann zeta function and the prime numbers is furnished by the Euler product formula. If Re s > 1, then

(1933.3)   ∑_{n=1}^∞ 1/n^s = ∏_{p prime} (1 − 1/p^s)^{−1}.

²A series ∑ a_n is absolutely convergent if ∑ |a_n| converges. Absolute convergence implies convergence, but the converse is not true. The alternating harmonic series ∑_{n=1}^∞ (−1)^{n+1}/n converges to log 2, but the harmonic series ∑_{n=1}^∞ 1/n diverges.





Since quite a few of our entries (1928, 1942, 1945, 1967, and 1987) involve the Riemann zeta function, we can take the liberty to develop the topic slowly and deliberately. If p is a fixed prime number and s > 1, then the series

∑_{n=0}^∞ 1/(p^n)^s = ∑_{n=0}^∞ 1/p^{ns} = ∑_{n=0}^∞ (1/p^s)^n = (1 − 1/p^s)^{−1}

converges absolutely since |1/p^s| < 1. By Mertens's theorem,

(1 − 1/2^s)^{−1}(1 − 1/3^s)^{−1} = (1 + 1/2^s + 1/4^s + 1/8^s + · · · )(1 + 1/3^s + 1/9^s + 1/27^s + · · · )
= 1 + 1/2^s + 1/3^s + 1/4^s + 1/6^s + 1/8^s + 1/9^s + 1/12^s + · · · ,

in which the last sum includes terms corresponding exactly to those numbers whose prime factorizations involve only 2 or 3. Since Re s > 1, the preceding series is absolutely convergent. Similarly,

(1 − 1/2^s)^{−1}(1 − 1/3^s)^{−1}(1 − 1/5^s)^{−1}
= 1 + 1/2^s + 1/3^s + 1/4^s + 1/5^s + 1/6^s + 1/8^s + 1/9^s + 1/10^s + 1/12^s + 1/15^s + · · · ,

in which the sum involves those numbers whose only prime factors are 2, 3, or 5, and so forth. Since the tail end of a convergent series tends to zero,

|∏_{p prime, p≤N} (1 − 1/p^s)^{−1} − ∑_{n=1}^∞ 1/n^s| ≤ ∑_{n>N} 1/n^{Re s} → 0


as N → ∞. This establishes the Euler product formula (1933.3). We get Euclid's theorem on the infinitude of the primes as a corollary. If there were only finitely many primes, then the right-hand side of (1933.3) would converge to a finite limit as s → 1⁺. However, the left-hand side of (1933.3) diverges as s → 1⁺ since its terms tend to those of the harmonic series.

Bibliography
[1] T. M. Apostol, Introduction to analytic number theory, Undergraduate Texts in Mathematics, Springer-Verlag, New York-Heidelberg, 1976. MR0434929
[2] H. Davenport, Multiplicative number theory, 3rd ed., revised and with a preface by Hugh L. Montgomery, Graduate Texts in Mathematics, vol. 74, Springer-Verlag, New York, 2000. MR1790423
[3] J. C. Lagarias, An elementary problem equivalent to the Riemann hypothesis, Amer. Math. Monthly 109 (2002), no. 6, 534–543, DOI 10.2307/2695443. MR1908008
[4] S. J. Miller and R. Takloo-Bighash, An invitation to modern number theory, with a foreword by Peter Sarnak, Princeton University Press, Princeton, NJ, 2006. MR2208019
[5] S. Skewes, On the difference π(x) − li(x) (I), J. London Math. Soc. 8 (1933), no. 4, 277–283, DOI 10.1112/jlms/s1-8.4.277. MR1573970
[6] S. Skewes, On the difference π(x) − li(x). II, Proc. London Math. Soc. (3) 5 (1955), 48–70, DOI 10.1112/plms/s3-5.1.48. MR0067145


Khinchin’s Constant Introduction Each irrational real number x has a unique infinite continued fraction expansion 1

x = a0 (x) +



a1 (x) + a2 (x) +

1 a3 (x) +

1 ···

in which the a_i(x) are the continued fraction digits of x and a_1(x), a_2(x), . . . are positive integers (see the 1931 and 1972 entries or [6] and the references therein for more details). For instance,

$$\pi = 3 + \cfrac{1}{7 + \cfrac{1}{15 + \cfrac{1}{1 + \cfrac{1}{292 + \cdots}}}},$$
which we write as π = [3; 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, 2, 1, 84, 2, 1, 1, 15, 3, 13, 1, 4, 2, 6, 6, 99, 1, 2, 2, 6, 3, 5, 1, 1, 6, 8, 1, 7, 1, 2, 3, 7, 1, 2, 1, 1, 12, 1, 1, 1, 3, 1, 1, 8, 1, 1, 2, 1, 6, 1, 1, 5, 2, 2, 3, 1, 2, 4, 4, 16, 1, 161, 45, 1, 22, 1, 2, 2, 1, 4, 1, 2, 24, . . . ]. Truncating this expansion after a few steps provides excellent rational approximations to π:

$$3 + \frac{1}{7} = \frac{22}{7} = 3.142857\ldots, \qquad 3 + \cfrac{1}{7 + \cfrac{1}{15}} = \frac{333}{106} = 3.141509\ldots, \qquad 3 + \cfrac{1}{7 + \cfrac{1}{15 + \cfrac{1}{1}}} = \frac{355}{113} = 3.141592\ldots.$$

These approximations are accurate to 2, 4, and 6 decimal places, respectively. Continued fractions provide an alternative to base-dependent expansions, such as binary or decimal expansions. Since they are base-independent, there is a possibility that the continued fraction digits might have some deep meaning. In 1934, Aleksandr Khinchin proved that, for almost every real number x, the geometric



mean of the first n digits in the continued fraction expansion of x converges to the same constant K as n → ∞:

$$\lim_{n \to \infty} \sqrt[n]{a_1(x)\, a_2(x) \cdots a_n(x)} = K. \tag{1934.1}$$

That means that, for every ε > 0, the set of real numbers x for which (1934.1) fails can be covered by countably many open intervals of total length < ε. The constant K is called Khinchin's constant; it is given by

$$K = \prod_{r=1}^{\infty} \left(1 + \frac{1}{r(r+2)}\right)^{\log_2 r} = 2.6854520010653064453\ldots.$$

It is not known whether K is rational, algebraic irrational, or transcendental. Besides examples contrived for the purpose, we do not know a "natural" example of an x for which the geometric mean of the a_i(x) converges to K. However, numerical experiments suggest that π, γ, and Khinchin's constant itself are likely candidates (the geometric mean for e diverges). Since you are dying to know, the continued fraction expansion of Khinchin's constant is K = [2; 1, 2, 5, 1, 1, 2, 1, 1, 3, 10, 2, 1, 3, 2, 24, 1, 3, 2, 3, 1, 1, 1, 90, 2, 1, 12, 1, 1, 1, 1, 5, 2, 6, 1, 6, 3, 1, 1, 2, 5, 2, 1, 2, 1, 1, 4, 1, 2, 2, 3, 2, 1, 1, 4, 1, 1, 2, 5, 2, 1, 1, 3, 29, 8, 3, 1, 4, 3, 1, 10, 50, 1, 2, 2, 7, 6, 2, 2, 16, 4, 4, 2, 2, 3, 1, 1, 7, 1, 5, 1, 2, 1, 5, 3, 1, 1, 1, 2, 2, 2, 1, 13, 11, 770, 1, 4, 2, 1, 14, 1, 14, 2, 1, 6, 1, 1, 1, 9, 2, 53, 1, 2, 2, 1, 9, 5, 6, 2, 1, 2, 1, 5, 4, 1, 234, 7, 1, 1, 4, 3, 19, 3, 1, 10, 18, 8, 24, 1, 12, 1, 1, 10, 3, 2, 1, 32, 112, 5, 1, 1, 3, 2, 5, 1, 2, 1, 3, 2, 1, 2, 1, 1, 2, 2, 4, 1, 6, 4, 1, 2, 1, 8, 2, 1, 4, 2, 1, 1, 11, 1, 1, 1, 5, 3, 4, 2, 6, 2, 1, 2, 1, 1, 19, 1, 38, 2, 1, 1, 4, 6, 2, 50, 2, 1, 1, 2, 1, 4, 1, 5, 1, 2, 8, 13, 1, 2, 1, 1, 9, 1, 6, 3, 6, 1, 4, 2, 1, 272, 1, 1, 1, 1, 4, 1, 21, 3, 1, 2, 87, 1, 8, 1, 2, 3, 2, 1, 1, 2, 3, 16, 1, 5, 3, 5, 1, 1, 1, 10, 11, 45, 2, 331, 2, 1, 2, 1, 4, 1, 2, 2, 1, 3, 1, 1, 3, 1, 2, 2, 1, 13, 1, 3, 3, 2, 4, 4, 1, 4, 40, 1, 9, 1, 4, 1, 1, 1, . . . ]. Does x = K satisfy (1934.1)?

Centennial Problem 1934

Proposed by Jake Wellens, Caltech.

This problem explores some consequences of the conjectured transcendence of K. Assume that K is transcendental and let x be a quadratic irrational (an algebraic irrational of degree 2; see p. 30).
Prove that for any such x, the geometric mean $\sqrt[n]{a_1(x)\, a_2(x) \cdots a_n(x)}$ does not converge to K.

1934: Comments

Solution to the problem. A quadratic irrational has a continued fraction expansion that is eventually periodic (try to prove it). Thus, the geometric means of its continued fraction digits converge to the ℓth root of a product of integers, in which ℓ denotes the length of the period. Consequently, the limit of the geometric means is either rational or algebraic irrational, and hence not transcendental. This solves the proposed problem.
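Both the value of K and the behavior just described are easy to probe numerically. The sketch below (ours, not from the text) evaluates the partial products defining K and checks that for the quadratic irrational √2 = [1; 2, 2, 2, . . . ] the geometric means of the digits sit at 2, an algebraic number, rather than at K.

```python
import math

def khinchin_partial(terms=100000):
    """Partial product for K = prod_{r>=1} (1 + 1/(r(r+2)))**log2(r);
    the product converges slowly, from below."""
    log_k = 0.0
    for r in range(1, terms + 1):
        log_k += math.log2(r) * math.log1p(1.0 / (r * (r + 2)))
    return math.exp(log_k)

def geometric_mean(digits):
    return math.exp(sum(math.log(a) for a in digits) / len(digits))

print(khinchin_partial())          # approaches 2.68545...
print(geometric_mean([2] * 100))   # sqrt(2)'s digits are all 2's
```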



e and Khinchin's constant. While we cannot give an example of a number for which the geometric mean of its continued fraction digits converges to K, we can give a transcendental number for which it diverges, namely e. Its continued fraction expansion is e = [2; 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, 1, 1, 10, 1, 1, 12, 1, 1, 14, 1, 1, 16, 1, 1, 18, 1, 1, 20, 1, 1, 22, 1, 1, 24, 1, 1, 26, 1, 1, 28, 1, 1, 30, 1, 1, 32, 1, 1, 34, 1, 1, 36, 1, 1, 38, 1, 1, 40, 1, 1, 42, 1, 1, 44, 1, 1, 46, 1, 1, 48, 1, 1, 50, 1, 1, 52, 1, 1, 54, 1, 1, 56, 1, 1, 58, 1, 1, 60, 1, 1, 62, 1, 1, 64, 1, 1, 66, 1, . . . ]. Since it will not affect the limiting behavior, let us change the first 2 to a 1, so that our string of digits is 1, 1, 2, 1, 1, 4, 1, 1, 6, . . .. If we look at the geometric mean of the first 3n digits we have

$$\left(1^{2n} \cdot 2^n \cdot n!\right)^{1/3n} = 2^{1/3}\,(n!)^{1/3n}.$$

Since Stirling's formula¹ states that

$$n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n},$$

the geometric mean of the first 3n digits is comparable to (2n/e)^{1/3}, which diverges to infinity.

Nonsimple continued fractions for π and e. Why does e have a "nice" continued fraction expansion while π does not? Maybe we are looking at things the wrong way. Instead of considering simple continued fraction expansions, in which all the numerators are 1's, the situation drastically changes if we allow them to vary. One example, which restores balance between these two fundamental constants, is Brouncker's formula:

$$\frac{4}{\pi} = 1 + \cfrac{1^2}{2 + \cfrac{3^2}{2 + \cfrac{5^2}{2 + \cfrac{7^2}{2 + \cdots}}}}.$$

Of course, e has amazing expansions as well, such as

$$e = 2 + \cfrac{1}{1 + \cfrac{1}{2 + \cfrac{2}{3 + \cfrac{3}{4 + \cdots}}}};$$

see the 1972 entry for a derivation of this formula. This is just the beginning of the story; see [1] and the references therein for the Rogers–Ramanujan continued fraction.
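The divergence for e is visible numerically. The sketch below (ours, not from the text) generates the digit string 1, 1, 2, 1, 1, 4, . . . from the known pattern of e's expansion and compares the geometric mean of the first 3n digits with (2n/e)^{1/3}.

```python
import math

def e_cf_digits(n_blocks):
    """Digit string 1, 1, 2, 1, 1, 4, ..., 1, 1, 2n (the expansion of e
    with its leading 2 changed to a 1, as in the text)."""
    digits = []
    for k in range(1, n_blocks + 1):
        digits.extend([1, 1, 2 * k])
    return digits

def geometric_mean(digits):
    return math.exp(sum(math.log(a) for a in digits) / len(digits))

for n in (10, 100, 1000):
    gm = geometric_mean(e_cf_digits(n))
    print(n, gm, (2 * n / math.e) ** (1 / 3))  # the two columns track each other
```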

¹First note that n! = n(n − 1) · · · 2 · 1, so log(n!) = log n + log(n − 1) + · · · + log 2 + log 1. We then approximate the sum with an integral and find $\log(n!) \approx \int_1^n \log x\, dx = n \log n - (n - 1)$, with error on the order of half the sum of the first and last terms. Exponentiating yields a rough approximation to n!. There are numerous elementary proofs of Stirling's formula; see [2] and [5].



Bibliography [1] B. C. Berndt, H. H. Chan, S.-S. Huang, S.-Y. Kang, J. Sohn, and S. H. Son, The Rogers-Ramanujan continued fraction, Continued fractions and geometric function theory (CONFUN) (Trondheim, 1997), J. Comput. Appl. Math. 105 (1999), no. 1-2, 9– 24, DOI 10.1016/S0377-0427(99)00033-3. http://www.sciencedirect.com/science/article/ pii/S0377042799000333. MR1690576 [2] C. L. Frenzen, A New Elementary Proof of Stirling’s Formula, Math. Mag. 68 (1995), no. 1, 55–58. https://www.maa.org/sites/default/files/269138004440.pdf. MR1573069 [3] A. Khintchine, Metrische Kettenbruchprobleme (German), Compositio Math. 1 (1935), 361–382. http://archive.numdam.org/ARCHIVE/CM/CM_1935__1_/CM_1935__1__361_0/ CM_1935__1__361_0.pdf. MR1556899 [4] A. Ya. Khinchin, Continued fractions, The University of Chicago Press, Chicago, Ill.-London, 1964. MR0161833 [5] S. J. Miller, The probability lifesaver: All the tools you need to understand chance, Princeton Lifesaver Study Guide, Princeton University Press, Princeton, NJ, 2017. MR3585480 [6] S. J. Miller and R. Takloo-Bighash, An invitation to modern number theory, with a foreword by Peter Sarnak, Princeton University Press, Princeton, NJ, 2006. MR2208019


Hilbert's Seventh Problem

Introduction

Our problem collection is inspired, as are so many other collections, by the problems David Hilbert proposed in his keynote address at the International Congress of Mathematicians in 1900; see [1]. These problems were meant to chart important directions for research in the 20th century. A solution to any of Hilbert's problems brings instant fame and membership in "The Honors Class" [3]. Here is a curious warmup to one of Hilbert's problems. We claim that there are irrational numbers α and β so that α^β is rational. To show this, we consider

$$\gamma = \sqrt{2}^{\sqrt{2}}.$$
There are two possibilities.

(a) If γ is rational, take α = β = √2. In this case, α and β are irrational and α^β = γ is rational (by assumption).

(b) If γ is irrational, take α = √2^{√2} and β = √2. Then

$$\alpha^\beta = \left(\sqrt{2}^{\sqrt{2}}\right)^{\sqrt{2}} = \sqrt{2}^{\sqrt{2}\cdot\sqrt{2}} = \sqrt{2}^{\,2} = 2.$$

In this case, α and β are irrational and α^β is rational.

Since both cases lead to the desired conclusion, the proof is finished. If you are like most people, the preceding proof will leave you feeling unsatisfied. The proof is correct, but it does not indicate which of the two possibilities is true. This is a quintessential example of an existential proof; it proves the existence of α and β without indicating specific values of α and β that are guaranteed to work. Hilbert's seventh problem is the following. Let α and β be algebraic numbers with β irrational. Prove that α^β is transcendental whenever α ≠ 0, 1. Problems along these lines have a long and storied history. For example, in 1748 Leonhard Euler proposed that if α ≠ 0, 1 is rational and β is an irrational algebraic number, then α^β is irrational. This is a weak version of Hilbert's seventh problem. In 1934 Aleksandr Gelfond (1906–1968) and Theodor Schneider (1911–1988) independently resolved Hilbert's problem in the affirmative. If we invoke the Gelfond–Schneider theorem, then we can immediately deduce that the Gelfond–Schneider constant

$$2^{\sqrt{2}} \approx 2.665144142690225188650297\ldots$$
is transcendental. Thus, its square root, γ, cannot be algebraic; that is, it is case (b) in the proof above that is correct. The transcendence of the Gelfond–Schneider constant was first established in 1930 by Rodion Kuzmin.



Of course, no discussion along these lines would be complete without mentioning Euler's formula $e^{ix} = \cos x + i \sin x$, in which $i^2 = -1$. Setting x = π yields the marvelous relation $e^{i\pi} + 1 = 0$ between the five most important constants in mathematics (0, 1, π, e, i). Among other things, a transcendental power of a transcendental number can be rational. Currently, it is unknown if either of e + π or eπ is transcendental, but we do know that at least one of them is; see the comments for the 1973 entry.

Centennial Problem 1935

Proposed by Jesse Freeman and Steven J. Miller, Williams College.

The problems below trace the development of the theory of transcendental numbers and highlight the power of the Gelfond–Schneider theorem.

(a) Euler: For α ∈ ℂ, let $B_\alpha = \{\beta \in \mathbb{C} : \alpha^\beta \in \mathbb{Q}\}$. Show that for an algebraic irrational γ,

$$B_\gamma \subsetneq \bigcup_{\alpha \in \mathbb{Q}} B_\alpha. \tag{1935.1}$$

You may use the Gelfond–Schneider theorem. Describe the union on the right-hand side of (1935.1) and investigate the algebraic structure of B_γ.

(b) Cantor: Use the fundamental theorem of algebra to prove that the set of algebraic numbers is countable (see the footnote on p. 31 for an outline of the proof). Since ℝ is uncountable, this shows that almost all real numbers are transcendental. Although this argument proves that almost all numbers are transcendental, it does not provide an explicit example of a transcendental number.

(c) Liouville: Suppose α is an algebraic number of degree d > 1 (see p. 30 for the definition). Liouville's theorem asserts that there exists a positive constant C(α) such that for any rational number a/b,

$$\left|\alpha - \frac{a}{b}\right| > \frac{C(\alpha)}{b^d}. \tag{1935.2}$$

We say α ∈ ℝ is a Liouville number if for every positive integer n there are integers a and b with b > 1 such that

$$0 < \left|\alpha - \frac{a}{b}\right| < \frac{1}{b^n}.$$

The result above implies that all Liouville numbers are transcendental; however, not all transcendental numbers are Liouville numbers. Show that the set of Liouville numbers in the interval [−1, 1] has measure zero.¹ See the notes below for a proof of Liouville's theorem and the explicit construction of a transcendental number.

¹That is, for every ε > 0, the set of Liouville numbers in [−1, 1] can be covered by countably many open intervals of total length < ε.



(d) Gelfond/Schneider/Hilbert: Using the Gelfond–Schneider theorem, show that if the ratio of two angles in an isosceles triangle is algebraic and irrational, then the ratio between the sides opposite those angles is transcendental.
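The definition in part (c) can be explored with exact rational arithmetic. The sketch below (ours, not from the text) verifies that the partial sums a/b of Liouville's constant λ = ∑ 10^{−n!} (constructed in the comments below) witness the defining inequality 0 < |λ − a/b| < 1/b^n.

```python
import math
from fractions import Fraction

def liouville_partial(n):
    """nth partial sum of lambda = sum_{j>=1} 10**(-j!), as an exact fraction
    with denominator 10**(n!)."""
    return sum(Fraction(1, 10 ** math.factorial(j)) for j in range(1, n + 1))

# A much deeper partial sum stands in for lambda itself; the difference is
# below 10**(-5040) and cannot affect the comparisons for n <= 4.
lam = liouville_partial(7)

for n in range(1, 5):
    a_over_b = liouville_partial(n)
    b = 10 ** math.factorial(n)
    assert 0 < lam - a_over_b < Fraction(1, b ** n)
    print(n, "0 < |lambda - a/b| < 1/b^n holds")
```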

1935: Comments

Proof of Liouville's theorem. Suppose that α ∈ ℝ is a root of $f(x) = c_d x^d + c_{d-1}x^{d-1} + \cdots + c_1 x + c_0$, in which the coefficients are integers and c_d ≠ 0. Since f has only finitely many roots, there is a δ > 0 so that f(x) ≠ 0 whenever 0 < |x − α| ≤ δ. Write f(x) = (x − α)g(x), in which g is a polynomial of degree d − 1. Since g is continuous, there is an M > 0 such that |g(x)| ≤ M for |x − α| ≤ δ. Suppose that a, b ∈ ℤ, b > 1, and 0 < |α − a/b| ≤ δ. Then g(a/b) ≠ 0 and hence

$$\frac{a}{b} - \alpha = \frac{f(a/b)}{g(a/b)} = \frac{c_d\left(\frac{a}{b}\right)^{d} + \cdots + c_1\left(\frac{a}{b}\right) + c_0}{g(a/b)} = \frac{c_d a^d + c_{d-1} a^{d-1} b + \cdots + c_0 b^d}{b^d\, g(a/b)}.$$

The numerator is an integer which is nonzero since f(a/b) ≠ 0, so

$$\left|\alpha - \frac{a}{b}\right| \ge \frac{1}{M b^d}$$

whenever 0 < |α − a/b| ≤ δ. On the other hand, if |α − a/b| > δ, then

$$\left|\alpha - \frac{a}{b}\right| > \delta \ge \frac{\delta}{b^d}$$

since b ≥ 1. Consequently, choosing $0 < C(\alpha) < \min\{\delta, \tfrac{1}{M}\}$ ensures that

$$\left|\alpha - \frac{a}{b}\right| > \frac{C(\alpha)}{b^d}.$$
This concludes the proof of Liouville's theorem.

A specific transcendental number. We are in a position to prove, without heavy machinery like the Gelfond–Schneider theorem, that a single, specific number is transcendental. We claim that Liouville's constant

$$\lambda = \sum_{n=1}^{\infty} \frac{1}{10^{n!}} = 0.11000100000000000000000100000\ldots$$

is transcendental; this number was "cooked up" exactly for this purpose. It is irrational since its decimal expansion is not eventually repeating. Thus, if λ is algebraic, its degree is at least 2. So suppose toward a contradiction that λ is an algebraic number of degree d ≥ 2. If n > d, then consider the nth partial sum

$$\frac{a}{b} = \sum_{j=1}^{n} \frac{1}{10^{j!}}$$

of the series defining λ. Putting things over a common denominator, we find that the preceding is a rational number with denominator b = 10^{n!}. Thus,

$$\left|\lambda - \frac{a}{b}\right| = \frac{1}{10^{(n+1)!}} + \frac{1}{10^{(n+2)!}} + \frac{1}{10^{(n+3)!}} + \cdots = \frac{1}{10^{(n+1)!}}\left(1 + \frac{1}{10^{(n+2)!-(n+1)!}} + \frac{1}{10^{(n+3)!-(n+1)!}} + \cdots\right) \le \frac{1}{10^{(n+1)!}}\left(1 + \frac{1}{10^{(n+1)!(n+1)}} + \frac{1}{10^{(n+1)!(n+2)}} + \cdots\right) < \frac{1}{10^{(n+1)!}}\left(1 + \frac{1}{10} + \frac{1}{10^2} + \cdots\right) = \frac{1}{10^{(n+1)!}} \cdot \frac{1}{1 - \frac{1}{10}} < \frac{2}{10^{(n+1)!}}.$$

Liouville's theorem ensures that for n > d,

$$0 < \frac{C(\lambda)}{b^d} = \frac{C(\lambda)}{10^{n! d}} < \left|\lambda - \frac{a}{b}\right| < \frac{2}{10^{(n+1)!}},$$

and hence

$$0 < \frac{C(\lambda)}{2} < \frac{10^{n! d}}{10^{(n+1)!}} = \frac{10^{n! d}}{10^{n!(n+1)}} = 10^{n!(d-n-1)} \to 0$$

as n → ∞. This is a contradiction, so λ must be transcendental.

Bibliography

[1] D. Hilbert, Über das Unendliche, Math. Ann. 95 (1926), 161–190. http://link.springer.com/article/10.1007%2FBF01206605. See also http://www.ams.org/journals/bull/1902-0810/S0002-9904-1902-00923-3/S0002-9904-1902-00923-3.pdf.
[2] S. J. Miller and R. Takloo-Bighash, An invitation to modern number theory, with a foreword by Peter Sarnak, Princeton University Press, Princeton, NJ, 2006. MR2208019
[3] B. H. Yandell, The honors class: Hilbert's problems and their solvers, A K Peters, Ltd., Natick, MA, 2002. MR1880187
[4] Wikipedia, Gelfond–Schneider theorem, https://en.wikipedia.org/wiki/Gelfond-Schneider_theorem.
[5] Wikipedia, Liouville number, https://en.wikipedia.org/wiki/Liouville_number.


Alan Turing Introduction Besides cracking codes at Bletchley Park during World War II and pioneering the field of artificial intelligence, Alan Turing (1912–1954) might be best known for his eponymous model of computation, the Turing machine (see Figure 1). The machine features an infinite tape partitioned into squares and a moving head that overlooks a single square at each moment in time. Squares start out blank but can also contain symbols from a finite alphabet. The head can read symbols from and write symbols to the tape. It also occupies one of n states-of-mind, which we simply call states. These states serve as the machine’s memory. Computation occurs as follows: the head reads a symbol from its current square, writes a new symbol to the square (it might be the same symbol or a blank), and moves either to the left or to the right while also (potentially) changing its state. The alphabet, states, and transition rules constitute a finite description of a Turing machine. In [4], Turing defined a universal machine, one that can take the description of another Turing machine as input and then simulate that Turing machine. It is the first example of the now ubiquitous virtual machine. Turing also used his machine to define computable numbers, which are real numbers whose decimal values can be written down successively, with each additional digit appearing after a finite number of steps. These machines do not halt, but they always make progress. Most

Figure 1. Depiction of a Turing machine.
modern treatments of Turing machines deal with computable functions instead of computable numbers. In this scenario the computation begins with a tape initialized with some finite input. What remains on the tape after the machine halts is the output. Thus, computable functions are functions that can be computed by a Turing machine in a finite number of steps. Unlike the machines writing computable numbers, these machines always halt. A classic function that is not computable asks whether, given the description of a Turing machine, that machine will halt on every input. This is called the halting problem and it remains a natural gateway into the study of computability. Though Stephen Kleene (1909–1994), Alonzo Church (1903–1995), and Emil Leon Post (1897–1954) had already developed models of computation that were equivalent in power, the Turing machine was the first to convince Kurt Gödel (see the 1929 entry) of what it truly meant to be an algorithm. That this really is the correct definition of mechanical computability was established beyond any doubt by Turing. Indeed, the Turing machine has remained the model of choice when explaining, extending, or developing new concepts in computability and complexity theory.

Centennial Problem 1936

Proposed by Brent Heeringa, Williams College.

Suppose we restrict our attention to Turing machines with n states and one additional HALT state, which tells the machine to immediately cease computation. In addition, suppose these machines are only allowed to read and write 0's and 1's, with 0's serving as the blank symbol, so the tape is initially all 0's. Let Σ(n) be the maximum number of 1's appearing on the tape after any n-state Turing machine halts; Σ(n) is called the busy beaver function and any n-state, halting Turing machine achieving Σ(n) is called a busy beaver. It is clear that Σ(n) is well-defined because there are only a finite number of n-state halting Turing machines over the binary alphabet {0, 1}.
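A few lines of code make the definition concrete. The simulator below is ours; the transition table is a known 3-state champion from the busy beaver literature (treat the specific table as an assumption). Running it exhibits a 3-state machine that halts with six 1's on the tape.

```python
def run_turing(rules, max_steps=10**6):
    """Simulate a 2-symbol Turing machine on an initially blank (all-0) tape.
    rules maps (state, symbol) to (write, move, next_state); move is +1 for
    right, -1 for left; state "H" halts."""
    tape, pos, state, steps = {}, 0, "A", 0
    while state != "H" and steps < max_steps:
        symbol = tape.get(pos, 0)
        write, move, state = rules[(state, symbol)]
        tape[pos] = write
        pos += move
        steps += 1
    return sum(tape.values()), steps

# A 3-state busy beaver champion (assumed transition table).
bb3 = {
    ("A", 0): (1, +1, "B"), ("A", 1): (1, +1, "H"),
    ("B", 0): (0, +1, "C"), ("B", 1): (1, +1, "B"),
    ("C", 0): (1, -1, "C"), ("C", 1): (1, -1, "A"),
}

ones, steps = run_turing(bb3)
print(ones, steps)  # halts with 6 ones on the tape
```

Showing that no 3-state machine leaves more than six 1's is the harder half of the warm-up.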
It is known that Σ(3) = 6 and Σ(4) = 13, but the exact value of Σ(5) is unknown (it is at least 4,098). As a warm-up, show that Σ(3) = 6. Then show that, in general, Σ(n) is not computable. Can you find any upper or lower bounds on its growth rate? 1936: Comments Turing and Enigma. It would be a disservice of the highest rank not to mention the valuable work Turing and his colleagues performed for the British government in cracking the German Enigma encryption; see Figures 2 and 3. To put these contributions in perspective, estimates of their worth range from shortening the war by two to four years, to turning the tide to an Allied victory. Turing was one of the driving forces in cracking the supposedly uncrackable codes. For more on these efforts see the 1943 entry. Much of Turing’s work during the war was classified and was kept classified for a variety of reasons afterwards. However, with the passage of time the need for such security lessened, and much of his work is now publicly available; see [5]. Sadly, Turing, who was a homosexual, was prosecuted for “gross indecency” and forced to undergo chemical castration. He committed suicide at the age of 41. Speaking in



Figure 2. An Enigma machine. 2009, British Prime Minister Gordon Brown (1951– ) apologized: Thousands of people have come together to demand justice for Alan Turing and recognition of the appalling way he was treated. While Turing was dealt with under the law of the time and we can’t put the clock back, his treatment was of course utterly unfair and I am pleased to have the chance to say how deeply sorry I and we all are for what happened to him . . . . So on behalf of the British government, and all those who live freely thanks to Alan’s work I am very proud to say: we’re sorry, you deserved so much better.

Alan Turing was officially pardoned by Elizabeth II (1926– ) in 2014.

Cryptography. An entry on Alan Turing seems like the appropriate place to begin discussing cryptography. This is a topic that we will return to every now and then; see the 1943, 1952, and 1977 entries. Suppose that Alice and Bob wish to communicate and that they want to prevent an eavesdropper, Eve, from understanding their conversation. Any information that Alice and Bob wish to exchange can be encoded using numbers. For instance, the American Standard Code for Information Interchange (ASCII) is a standard method for converting symbols into numerical equivalents. There are 256 = 2^8 symbols that can be



Figure 3. Detail of an Enigma machine.

represented in ASCII. For example, the string ASCII corresponds (in decimal) to 65 83 67 73 73 and (in binary) to 01000001 01010011 01000011 01001001 01001001. Each symbol is represented by eight bits, that is, a sequence of eight 0's and 1's. These transmitted segments can be augmented to ensure a more accurate transmission. For instance, the seven bits 0100110 that Alice wants to send might be augmented to 01001101. The additional 1 is a checksum bit; it means that there is an odd number of ones in 0100110. If Bob receives 01001100, then he knows an error has occurred and he can request that Alice resend the block. There are, of course, many more effective and fascinating error-detecting methods that have been developed over the years. If Alice and Bob share a common key beforehand, then there are many methods they can use to encrypt their data. For instance, the National Institute of Standards and Technology (NIST) adopted the Data Encryption Standard (DES) in 1976 and the Advanced Encryption Standard (AES) in 2001. Since this is our first expedition into cryptography, we discuss a simple technique that dates back to antiquity: the Caesar cipher. For the sake of readability and simplicity, we do not consider a blank space as a character. Alice replaces each letter in the plaintext

HERE IS A MESSAGE ENCRYPTED WITH THE CAESAR CIPHER USING THE KEY FIVE.

with the letter that occurs k places after it (with "wraparound"). We say that k is the key that is used to encrypt the message. With k = 5, Alice sends the ciphertext

MJWJ NXFR JXXF LJJS HWDU YJIB NYMY MJHF JXFW HNUM JWZX NSLY MJPJ DKNA JPMN.
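The Caesar cipher and the frequency-analysis attack discussed below are easy to implement. In this sketch (ours, not from the text), the key-guessing step simply assumes that the most common ciphertext letter is the image of E.

```python
from collections import Counter

ALPHA = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def caesar(text, k):
    """Shift each uppercase letter k places forward, with wraparound."""
    return "".join(ALPHA[(ALPHA.index(c) + k) % 26] for c in text)

def guess_key(ciphertext):
    """Guess the key by assuming the most frequent letter encrypts E."""
    most_common = Counter(ciphertext).most_common(1)[0][0]
    return (ALPHA.index(most_common) - ALPHA.index("E")) % 26

plain = "HEREISAMESSAGEENCRYPTEDWITHTHECAESARCIPHERUSINGTHEKEYFIVE"
cipher = caesar(plain, 5)
print(cipher[:16])        # MJWJNXFRJXXFLJJS
print(guess_key(cipher))  # 5
```

The attack succeeds here because E dominates the plaintext; short or unusual messages can fool it, which is why one examines several candidate keys.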



Table 1. Frequency of letters (in percent) in English.

A  8.064    J  0.112    S  6.382
B  1.537    K  0.625    T  9.025
C  2.689    L  4.102    U  2.786
D  4.329    M  2.501    V  1.026
E 12.886    N  6.985    W  2.119
F  2.448    O  7.378    X  0.169
G  1.963    P  1.703    Y  1.806
H  6.099    Q  0.106    Z  0.097
I  6.906    R  6.157

to Bob. Observe that Alice padded the ciphertext with nonsense to ensure that the blocks are of uniform size. For Alice and Bob to use the Caesar cipher, they must first share the key k (see the 1977 entry for an encryption method that eliminates the need to share a secret key before communicating). Eve can use frequency analysis to decipher an intercepted message, even though she does not know k. For example, the letter E is the most common letter in English; see Table 1. The uncommon letter J occurs often in the ciphertext, which suggests that E is replaced by J. Thus, Eve guesses that k = 5 (the distance between E and J) and obtains the plaintext message. As its name suggests, this method of encryption was used by Julius Caesar (100–44 BCE). Although the Caesar cipher is easily broken, in a time when most of the population was illiterate and mathematically unsophisticated, it provided adequate security. The following was encrypted with the Caesar cipher.

HSSN LSNH YVDU HSSA HUKS OLHX TAOL











Use frequency analysis to determine possible keys and then decipher the message. See below for the answer.

Bibliography

[1] G. Boolos and R. Jeffrey, Computability and Logic (third edition), Cambridge University Press, 1999.
[2] K. Gödel, Undecidable Diophantine propositions, in Collected Works III (from the 1930s), 164–175.
[3] T. Radó, On non-computable functions, Bell System Tech. J. 41 (1962), 877–884, DOI 10.1002/j.1538-7305.1962.tb00480.x. MR0133229
[4] A. M. Turing, On Computable Numbers, with an Application to the Entscheidungsproblem, Proc. London Math. Soc. (2) 42 (1936), no. 3, 230–265, DOI 10.1112/plms/s2-42.1.230. https://academic.oup.com/plms/article-abstract/s2-42/1/230/1491926?redirectedFrom=fulltext. MR1577030
[5] A. M. Turing, The Applications of Probability to Cryptography, http://arxiv.org/pdf/1505.04714v2.pdf.

[6] C. Teuscher (ed.), Alan Turing: life and legacy of a great thinker, papers from the Conference "Turing Day: Computing Science 90 Years from the Birth of Alan Mathison Turing" held at the École Polytechnique Fédérale de Lausanne, Lausanne, June 28, 2002, Springer-Verlag, Berlin, 2004. MR2106942

Answer: The key is 7. The message is: "All Gaul is divided into three parts, one of which the Belgae inhabit, the Aquitani another, those who in their own language are called Celts, in our Gauls, the third. All these differ from each other in language, customs and laws. The river Garonne separates the Gauls from the Aquitani; the Marne and the Seine separate them from the Belgae." These are the famous opening lines of Julius Caesar's Commentarii de Bello Gallico (Commentary on the Gallic War). The final XY in the ciphertext is padding.



Vinogradov's Theorem

Introduction

Although we normally view primes from a multiplicative perspective, there are many interesting additive questions to investigate. A famous conjecture, due to Christian Goldbach (1690–1764), is that every even number at least four is the sum of two primes; see Figure 1. This is the binary Goldbach conjecture; it is significantly harder than the ternary Goldbach conjecture: every odd number at least seven is the sum of three primes. A major advance towards the proof of the ternary conjecture was made by Ivan Matveyevich Vinogradov (1891–1983) in 1937, who proved that there is a constant C such that every odd number at least C is the sum of three primes. Thus, the ternary Goldbach conjecture is reduced to a finite computation: show that every odd number less than C is a sum of three primes. Unfortunately, the value of C produced by Vinogradov's proof is too large for practical computation: it was over 10^{1000}. In 2013, the ternary Goldbach conjecture was proved by Harald Andrés Helfgott (1977– ), who brought C down to 10^{27}, well within the range checkable by computers. These approaches all use the circle method (see the 1920 and 1923

Figure 1. Number of ways (vertical axis) to represent an even number at most 10,000 (horizontal axis) as a sum of two primes. Numerical evidence such as this suggests that Goldbach's conjecture is true.
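Counts like those plotted in Figure 1 can be reproduced in a few lines. The sketch below (ours, not from the text) tallies unordered representations N = p + q with primes p ≤ q.

```python
def prime_sieve(n):
    """Boolean sieve of Eratosthenes: sieve[k] is True iff k is prime."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(sieve[p * p::p])
    return sieve

def goldbach_count(N, sieve):
    """Number of unordered pairs of primes p <= q with p + q = N."""
    return sum(1 for p in range(2, N // 2 + 1) if sieve[p] and sieve[N - p])

sieve = prime_sieve(10000)
for N in (4, 6, 8, 100, 1000):
    print(N, goldbach_count(N, sieve))  # e.g. 100 = p + q in 6 ways
```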



entries), which converts the problem to estimating integrals of exponential sums. For example, the number of ways an integer N can be written as the sum of three primes is

$$\int_0^1 \left(\, \sum_{\substack{p \le N \\ p \text{ prime}}} e^{2\pi i p x} \right)^{\!3} e^{-2\pi i N x}\, dx. \tag{1937.1}$$
To see this, expand the sum and integrate term-by-term. We obtain a sum of terms of the form

$$\int_0^1 e^{2\pi i (p_1 + p_2 + p_3 - N)x}\, dx,$$

which equals 0 if p_1 + p_2 + p_3 − N ≠ 0 (by the periodicity of the integrand) and 1 if p_1 + p_2 + p_3 − N = 0 (since the integrand is then the constant function 1). Consequently, each representation N = p_1 + p_2 + p_3 as a sum of three primes adds 1 to (1937.1), so the integral (1937.1) counts the number of ways to write N as a sum of three primes. The Goldbach problems boil down to determining if an integral, which happens to be integer valued, is nonzero. Unfortunately, the integral (1937.1) is fiendishly difficult to analyze. To date, all approaches to understanding these sums are highly technical.

Centennial Problem 1937

Proposed by Steven J. Miller, Williams College.

Perhaps if we are willing to allow more primes we can prove a related result more easily. Or, even more generally, let us consider writing integers as sums and differences of primes.

(a) Can you prove, in an elementary manner, whether or not there is a finite integer r such that every odd number is the sum and difference of at most r primes? For example, if r = 4, we could consider quantities of the form

$$p_1 + p_2 + p_3 + p_4, \quad p_1 + p_2 + p_3 - p_4, \quad p_1 + p_2 - p_3 - p_4, \quad \text{and} \quad p_1 - p_2 - p_3 - p_4.$$
(b) Prove that if for any even n you knew there was at least one pair of primes differing by n, then you could take r = 2,013 in the original problem. Can you get a better value of r than 2,013?

(c) Here is an easier version: find a function f(x) that does not grow too rapidly such that every integer exceeding 4 and at most x is the sum of at most f(x) primes. Helfgott's work shows that the constant function f(x) = 4 works.

1937: Comments

Recent developments. The Hardy–Littlewood k-tuple conjecture (see the 1923 entry) implies that for every even number n there is a positive constant C_n, which can be explicitly written down in terms of functions of the prime factors of n, such that the number of pairs of primes of the form (p, p + n) with p ≤ x is asymptotic to C_n x/(log x)^2. In particular, for each even n the conjecture predicts infinitely many pairs of primes (p, p + n). Although this has not been proved for any n, the landscape has changed dramatically in recent years. In 2013, Yitang Zhang proved that there is some n ≤ 70,000,000 for which there are infinitely many



pairs of primes (p, p + n). Subsequent work has lowered seventy million to 246; see the 1919 entry. Also see the comments for the 2005 entry for information about the more general Bateman–Horn conjecture.

Bertrand's postulate. In 1843, Joseph Bertrand (1822–1900) conjectured that for any integer n > 3, there is at least one prime in the interval (n, 2n). Pafnuty Chebyshev (1821–1894) gave a proof in 1852 by obtaining nontrivial upper and lower bounds for π(x), the number of primes at most x (see the entries for 1913, 1919, and 1948). Bertrand's postulate is now a consequence of the prime number theorem (proved in 1896) since

$$\lim_{n\to\infty} \frac{\pi(2n)}{\pi(n)} = \lim_{n\to\infty} \frac{2n/\log 2n}{n/\log n} = 2 \lim_{n\to\infty} \frac{\log n}{\log n + \log 2} = 2.$$
That is, there are approximately twice as many primes in the interval (0, 2n) as there are in the interval (0, n). But there is a simpler proof that does not rely on such heavy machinery. In 1932, the nineteen-year-old Paul Erdős (see the 1913 entry) gave a beautiful elementary proof of Bertrand's postulate [2]. Our presentation is based upon that given in [1]. Erdős first obtained the estimate

$$\prod_{p \le x} p \le 4^{x-1}$$

for real x ≥ 2; here and henceforth, the subscript p refers to a prime number. He next examined the prime divisors of the central binomial coefficient

$$\binom{2n}{n} = \frac{(2n)!}{(n!)^2}. \tag{1937.2}$$

Erdős then showed that no prime power dividing (1937.2) exceeds 2n, that primes p > √(2n) appear at most once in the factorization of (1937.2), and that primes p satisfying (2/3)n < p ≤ n do not divide (1937.2) at all. This last remark is the key to his argument. To see why it is true, observe that 3p > 2n for n, p ≥ 3 implies that p and 2p are the only multiples of p that are at most 2n; hence p divides (2n)! exactly twice, and since p also divides (n!)^2 exactly twice, it does not divide (1937.2). Consequently,

$$\frac{4^n}{2n} \le \binom{2n}{n} \le (2n)^{\sqrt{2n}} \cdot \prod_{\sqrt{2n} < p \le \frac{2}{3}n} p \cdot \prod_{n < p \le 2n} p,$$

and if there were no primes in (n, 2n), the last product would be empty; comparing the growth rates of the two sides then yields a contradiction for all sufficiently large n.

Let x > 0 and write x = 10^{u+v} = 10^u · 10^v, in which u = ⌊log_10 x⌋ and v ∈ [0, 1) is the fractional part of log_10 x; that is, v = log_10 x − ⌊log_10 x⌋. Since 10^u is an integer power of 10, it follows that the leading digit of x is determined entirely by 10^v. Because 10^v ∈ [1, 10), the probability that 10^v has leading digit d is the probability that 10^v ∈ [d, d + 1); that is, v ∈ [log_10 d, log_10(d + 1)). The equidistribution hypothesis on the data set ensures the probability that v ∈ [log_10 d, log_10(d + 1)) is

$$\log_{10}(d+1) - \log_{10} d = \log_{10}\left(\frac{d+1}{d}\right).$$

This is the prediction of Benford's law.

Centennial Problem 1938

Proposed by Steven J. Miller, Williams College.

The sequences {2^n} and {3^n} are both Benford; what about the sequence {2^m 3^n} (write the numbers in increasing order: 1, 2, 3, 4, 6, 8, 9, . . .)? More generally, is {p^m q^n} Benford for p and q distinct primes?

1938: Comments

The Kronecker–Weyl theorem. Although we will not spoil the problem, we should at least explain why {2^n} and {3^n} obey Benford's law. The Kronecker–Weyl theorem asserts that nξ is equidistributed modulo 1 if ξ is irrational; see Figure 2 and the 1931 entry. Consequently, if ξ = log_10 α is irrational, then x_n = nξ = log_10(α^n) is equidistributed modulo 1 and hence the sequence {α^n} obeys Benford's law. Since log_10 2 and log_10 3 are irrational, we conclude that {2^n} and {3^n} are Benford. A bit more work shows that {e^n} and {π^n} are Benford too.

The version of the Kronecker–Weyl theorem we used above states that nξ is equidistributed modulo 1; it says nothing about how rapidly the equidistribution sets in. This can be remedied by a more involved analysis that takes into account how "irrational" a number is. A real number α has irrationality type κ if κ is the supremum of all γ such that

$$\liminf_{q\to\infty}\, q^{\gamma+1} \min_{p} \left|\alpha - \frac{p}{q}\right| = 0.$$

Roth's theorem (see the 1955 entry) ensures that every algebraic irrational is of type 1.
See [7] for more details on irrationality types, [9] for applications to Benford’s law, and [8, Thm. 3.3, p. 124] for details connecting the irrationality type to the convergence rate.
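The claim that {2^n} is Benford is easy to test numerically. The following sketch (our illustration, not part of the text; all names are ours) uses exact integer arithmetic to tally the leading digits of 2^n and compares the observed frequencies with the Benford probabilities log₁₀(1 + 1/d).

```python
import math
from collections import Counter

def leading_digit_counts(base, N):
    """Tally the leading (most significant) digits of base^1, ..., base^N."""
    counts = Counter()
    power = 1
    for _ in range(N):
        power *= base                      # exact integers, no rounding error
        counts[int(str(power)[0])] += 1
    return counts

N = 10_000
counts = leading_digit_counts(2, N)
for d in range(1, 10):
    benford = math.log10(1 + 1 / d)
    print(f"digit {d}: observed {counts[d] / N:.5f}, Benford {benford:.5f}")
```

With N = 10,000 the observed frequency of leading digit 1 already agrees with log₁₀ 2 ≈ 0.30103 to within a few parts in a thousand, consistent with the Kronecker–Weyl argument above.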



Figure 2. Plots of ξ, 2ξ, . . . , 100ξ (mod 1) for ξ = π, π², e, γ, K, in which γ is the Euler–Mascheroni constant (see (1942.5) in the 1942 entry) and K is Khinchin's constant (see the 1934 entry).

Powers of 2's and 3's. Here is another interesting question about 2 and 3. Is S = {3^n/2^m : 1 ≤ m, n < ∞} dense in the positive real numbers? To handle this question, we need Kronecker's approximation theorem [6, Thm. 440], which asserts that if β > 0 is irrational, α ∈ R, and δ > 0, then there are n, m ∈ N so that |nβ − α − m| < δ. Let ξ, ε > 0 and note that β = log₂ 3 > 0 is irrational. By the continuity of f(x) = 2^x at log₂ ξ, there exists δ > 0 such that |log₂ x − log₂ ξ| < δ implies
\[
|x - \xi| < \varepsilon. \tag{1938.1}
\]


Kronecker’s theorem with β = log2 3 and α = log2 ξ now yields n, m ∈ N so that    n   log2 3 − log2 ξ  = |n log2 3 − log2 ξ − m| < δ.  m 2 In light of (1938.1), it follows that |3n /2m − ξ| < , and thus S is dense in the positive real numbers. This answer, along with many similar results, can be found in [5]. Benford’s law and powers of π and e. A glance at Figure 2 suggests that nπ (mod 1) and nπ 2 (mod 1) are not as “random” as ne (mod 1). Perhaps a similar behavior is seen in π n versus en ? To test this, we calculate the chi-square test statistic1 to see how well Benford’s law fits the first digits of π n and en for n ≤ N for N up to 1,000. If we simulate data randomly from the Benford probabilities, then approximately 95% of the time we should observe a chi-square value of 15.487 or 1 If

the probability of observing a leading digit of d is pd and we have N observations, the 9 2 chi-square statistic (with 8 degrees of freedom) is χ2 = d=1 (Obsd − N pd ) /N pd , where N is the number of observations and Obsd is the number with leading digit d.
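The chi-square computation just described can be sketched as follows (our illustration; the book's figures were presumably produced differently). Since the leading digit of α^n depends only on the fractional part of n log₁₀ α, ordinary double-precision arithmetic suffices for n ≤ 1,000.

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def leading_digit(log10_x):
    """Leading digit of a positive number, given its base-10 logarithm."""
    return int(10 ** (log10_x % 1.0))

def benford_chi_square(digits):
    """Chi-square statistic (8 degrees of freedom) from the footnote:
    sum over d of (Obs_d - N p_d)^2 / (N p_d)."""
    N = len(digits)
    chi2 = 0.0
    for d in range(1, 10):
        expected = N * BENFORD[d - 1]
        observed = digits.count(d)
        chi2 += (observed - expected) ** 2 / expected
    return chi2

N = 1000
pi_digits = [leading_digit(n * math.log10(math.pi)) for n in range(1, N + 1)]
e_digits = [leading_digit(n * math.log10(math.e)) for n in range(1, N + 1)]
print("chi-square, first digits of pi^n:", round(benford_chi_square(pi_digits), 2))
print("chi-square, first digits of e^n: ", round(benford_chi_square(e_digits), 2))
```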



Figure 3. Logarithm of the chi-square statistic for the Benford test of e^n (red) and π^n (blue) for n ≤ N versus N.

lower. We plot the results in Figure 3, where for convenience we plot the logarithm of the chi-square value. Two items are immediately apparent. First, for most N the chi-square values for π^n are significantly larger than those of e^n. Second, there seems to be an almost periodic behavior in the amplitude of the chi-square values for π^n, with a period of approximately 175 (and the amplitude getting smaller in subsequent periods). The latter is not a coincidence. While many people have made it a matter of personal pride to memorize and be able to recite digits of π on demand, very few can perform this feat for π², and almost no one for even higher powers. This is a shame, as that knowledge would be useful here. If we go far down with our powers, we eventually come to π^175 and notice that it is approximately 1.0028 · 10^87. In other words, every time we increase the exponent n by 175 we almost return to our original value padded by 87 zeros. Almost. If we returned to the same leading digits (just with an extra 87 zeros at the end), we would have periodic, non-Benford behavior. The slight difference eventually pushes us to Benford behavior, but very slowly (as can be seen by the slow decay in the maximum amplitudes); this is what we mean by the irrationality of the number controlling the behavior. The fact that a large power of π is almost a large power of 10 produces the peculiar behavior exhibited in Figure 3.

Bibliography
[1] F. Benford, The law of anomalous numbers, Proceedings of the American Philosophical Society 78 (1938), 551–572. http://www.jstor.org/discover/10.2307/984802.
[2] A. Berger and T. P. Hill, Benford online bibliography, http://www.benfordonline.net.
[3] A. Berger and T. P. Hill, A basic theory of Benford's law, Probab. Surv. 8 (2011), 1–126, DOI 10.1214/11-PS175. MR2846899



[4] A. Berger and T. P. Hill, An introduction to Benford’s law, Princeton University Press, Princeton, NJ, 2015. MR3242822 [5] B. Brown, M. Dairyko, S. R. Garcia, B. Lutz, and M. Someck, Four quotient set gems, Amer. Math. Monthly 121 (2014), no. 7, 590–599, DOI 10.4169/amer.math.monthly.121.07.590. MR3229105 [6] G. H. Hardy and E. M. Wright, An introduction to the theory of numbers, 6th ed., revised by D. R. Heath-Brown and J. H. Silverman; with a foreword by Andrew Wiles, Oxford University Press, Oxford, 2008. MR2445243 [7] M. Hindry and J. H. Silverman, Diophantine geometry: An introduction, Graduate Texts in Mathematics, vol. 201, Springer-Verlag, New York, 2000. MR1745599 [8] L. Kuipers and H. Niederreiter, Uniform distribution of sequences, Pure and Applied Mathematics, Wiley-Interscience [John Wiley & Sons], New York-London-Sydney, 1974. MR0419394 [9] A. V. Kontorovich and S. J. Miller, Benford’s law, values of L-functions and the 3x + 1 problem, Acta Arith. 120 (2005), no. 3, 269–297, DOI 10.4064/aa120-3-4. http://arxiv. org/abs/math/0412003. MR2188844 [10] S. J. Miller and M. J. Nigrini, The modulo 1 central limit theorem and Benford’s law for products, Int. J. Algebra 2 (2008), no. 1-4, 119–130. MR2417189 [11] S. J. Miller (editor), The Theory and Applications of Benford’s Law, Princeton University Press, 2015. [12] R. A. Raimi, The first digit problem, Amer. Math. Monthly 83 (1976), no. 7, 521–538. MR0410850 [13] M. F. Schilling, The longest run of heads, College Math. J. 21 (1990), no. 3, 196–207, DOI 10.2307/2686886. MR1070635


The Power of Positive Thinking Introduction A student doing a homework problem has an enormous advantage over a researcher: the problem is known to be solvable. This is especially true in undergraduate and beginning graduate classes, in which assignments are meant to reinforce lessons and help students learn techniques. It is hard to overstate how important this is. It is a huge psychological boost to know a solution exists (let alone having a sense of what methods will be useful in finding it). There are many anecdotes and studies of people who were unaware of the difficulty of a problem and who then proceeded to make great progress. The following story and its variants have circulated for years and are the subject of this year’s entry. We will meet the protagonist, George Dantzig (1914–2005), again in the 1947 entry. The quote below is from a 1986 interview [1]. He was asked why his Ph.D. was on a statistics topic when he had taken so few statistics courses. It happened because during my first year at Berkeley I arrived late one day at one of Neyman’s classes. On the blackboard there were two problems that I assumed had been assigned for homework. I copied them down. A few days later I apologized to Neyman1 for taking so long to do the homework—the problems seemed to be a little harder to do than usual. I asked him if he still wanted it. He told me to throw it on his desk. I did so reluctantly because his desk was covered with such a heap of papers that I feared my homework would be lost there forever. About six weeks later, one Sunday morning about eight o’clock, Anne and I were awakened by someone banging on our front door. It was Neyman. He rushed in with papers in hand, all excited: “I’ve just written an introduction to one of your papers. Read it so I can send it out right away for publication.” For a minute I had no idea what he was talking about. 
To make a long story short, the problems on the blackboard that I had solved thinking they were homework were in fact two famous unsolved problems in statistics. That was the first inkling I had that there was anything special about them.

Later in the interview he discusses how the story found its way into sermons.

The origin of that minister's sermon can be traced to another Lutheran minister, the Reverend Schuler of the Crystal Cathedral in Los Angeles. Several years ago he and I happened to have adjacent seats on an airplane. He told me his ideas about thinking positively, and I told him my story about the homework problems and my thesis. A few months later I received a letter from him asking permission to include my story

¹Jerzy Neyman (1894–1981) was Dantzig's eventual thesis advisor.



in a book he was writing on the power of positive thinking. Schuler’s published version was a bit garbled and exaggerated but essentially correct. The moral of his sermon was this: If I had known that the problems were not homework but were in fact two famous unsolved problems in statistics, I probably would not have thought positively, would have become discouraged, and would never have solved them.

Centennial Problem 1939
Proposed by Steven J. Miller, Williams College.

Find the statements of the two problems Dantzig solved, read papers, and believe in yourself when confronted with challenges in the future. To start you on your journey, one of the papers is available at [2].

1939: Comments

The birthday problem. Another candidate for this year's topic is the birthday problem. In 1939 Richard von Mises (1883–1953) posed the following problem, which is a staple in most probability courses. How many people must there be in a room before there is at least a 50% chance that two people share a birthday? We give a quick discussion of this problem; see [4] for an expanded treatment and additional questions. The first step is to interpret what is going on. Normally people assume that all birthdays are equally likely (and no one is born on February 29th). This assumption is not always met. Malcolm Gladwell (1963– ) has a beautifully humorous passage in his book Outliers [3], in which he investigates the distribution of birthdays among Canadian junior hockey players. What often happens is that the young kids who just miss the cutoff for a program are now the oldest and hence likely to be among the biggest players. This is a tremendous advantage, and it makes them look like better players. They then get more attention, get on to special teams, and the difference grows. In a telling passage, Gladwell substitutes the birthdays for the players' names:

It no longer sounds like the championship of Canadian junior hockey. It now sounds like a strange sporting ritual for teenage boys born under the astrological signs Capricorn, Aquarius, and Pisces. March 11 starts around one side of the Tigers' net, leaving the puck for his teammate January 4, who passes it to January 22, who flips it back to March 12, who shoots point-blank at the Tigers' goalie, April 27. April 27 blocks the shot, but it's rebounded by Vancouver's March 6. He shoots!
Medicine Hat defensemen February 9 and February 14 dive to block the puck while January 10 looks on helplessly. March 6 scores!

Back to the birthday problem. We assume that there are 365 days in each year and that all days are equally likely. We use the law of complementary probability: the probability that an event happens is one minus the probability that it does not happen. The probability that among n people we have n different birthdays is
\[
q_n = \left(1 - \frac{0}{365}\right)\left(1 - \frac{1}{365}\right)\cdots\left(1 - \frac{n-1}{365}\right) = \prod_{k=0}^{n-1}\left(1 - \frac{k}{365}\right).
\]



Indeed, the first person can have any birthday, the next person must avoid that first birthday, then the subsequent person must miss those two days, and so on. As we saw in the 1920 and 1934 entries, it is often profitable to take the logarithm of a product. Thus, we consider
\[
\log q_n = \sum_{k=0}^{n-1} \log\left(1 - \frac{k}{365}\right).
\]

If we choose N so that q_N ≤ 1/2, then 1 − q_N ≥ 1/2; that is, the probability that a birthday is shared among N people is ≥ 1/2. For small x, we use the Taylor approximation log(1 − x) ≈ −x and obtain
\[
\log(1/2) \approx -\sum_{k=0}^{N-1} \frac{k}{365} = -\frac{(N-1)N}{2 \cdot 365} \approx -\frac{(N - 1/2)^2}{2 \cdot 365}
\]
and hence
\[
N \approx \frac{1}{2} + \sqrt{-2 \cdot 365 \log(1/2)} = \frac{1}{2} + \sqrt{365 \log 4} = 22.994\ldots.
\]
Most people unfamiliar with the problem significantly underestimate the chance; the probability is about 70% if there are 30 people, 89% with 40, and 97% with 50. More generally, if there were D days in the year, we need at least √(D log 4) + 1/2 people to have a 50% chance of at least one shared birthday. How close is this approximation? Very close: the probability that among n people at least two share the same birthday is ≥ 50% if n ≥ 23. This is sometimes called the birthday paradox since the answer is strikingly different than the answer to a seemingly similar problem: how many people are needed before there is a 50% chance that someone shares my birthday? We need N so large that
\[
\left(1 - \frac{1}{365}\right)^N \le \frac{1}{2};
\]

this occurs first for N = 253 (if we had D days in a year, we would find N ≈ D log 2). The reason the two answers disagree by so much is that in one version any two people may agree, while in the other someone must agree with a predetermined person. Note the sharp difference in behavior: the first answer grows like D^{1/2}, whereas the second grows linearly with D. In addition to being a source of revenue for probability professors betting their students on the odds that two members of the class share a birthday, an interesting application is the birthday attack to find collisions of hash functions in cryptography (see [5] and the references therein).

The zeta function and relatively prime integers. Now that we have developed a bit of the theory behind the Riemann zeta function (see the 1928 and 1933 entries), here is another probability gem that we cannot resist. What is the probability that two randomly chosen integers a, b are relatively prime? To begin, let us note that gcd(a, b) = 1 if and only if a and b have no prime factors in common. In other words, no prime number p divides both a and b. There is only a 1/4 chance that both a and b are divisible by 2. Therefore there is a
\[
1 - \frac{1}{4} = 1 - \frac{1}{2^2}
\]



chance that 2 is not a common divisor of a and b. Similarly, there is a
\[
1 - \frac{1}{9} = 1 - \frac{1}{3^2}
\]
chance that 3 is not a common divisor of a and b. In particular, the chance that neither 2 nor 3 is a common factor of a and b is
\[
\left(1 - \frac{1}{2^2}\right)\left(1 - \frac{1}{3^2}\right).
\]
Proceeding in this manner for all primes, the Euler product formula (1933.3) and the solution (1919.2) to the Basel problem suggest the probability that a and b share no common prime factors is
\[
\prod_{p \text{ prime}} \left(1 - \frac{1}{p^2}\right) = \frac{1}{\zeta(2)} = \frac{6}{\pi^2} = 0.6079\ldots \approx 60.8\%.
\]
This probabilistic result can be seen in actual computations. For example, in 10 seconds on a desktop computer, Mathematica generated 10^6 random pairs (a, b) of integers belonging to the interval [−10^16, 10^16] and computed gcd(a, b). Of these pairs, approximately 0.6074 ≈ 60.7% satisfied gcd(a, b) = 1. This is remarkably close to the true value 6/π² ≈ 60.8%.

Is our preceding reasoning sound? After all, there is no uniform probability measure on the integers. Indeed, if every integer had the same probability of being selected, then these infinitely many identical probabilities would have to sum to 1, which is impossible (a similar argument arose at the end of the comments portion of the 1924 entry). Thus, we need to be careful about what is meant by "randomly chosen integers." A more precise version of the problem (for which our answer is the correct one) is, "What is the limit as N → ∞ of the probability that two randomly chosen integers a, b with |a|, |b| ≤ N are relatively prime?"

Bibliography
[1] D. J. Albers and C. Reid, An interview with George B. Dantzig: the father of linear programming, College Math. J. 17 (1986), no. 4, 293–314, DOI 10.2307/2686279. http://www.jstor.org/stable/2686279. MR856311
[2] G. B. Dantzig, On the non-existence of tests of "Student's" hypothesis having power functions independent of σ, Ann. Math. Statistics 11 (1940), 186–192, DOI 10.1214/aoms/1177731912. http://projecteuclid.org/download/pdf_1/euclid.aoms/1177731912. MR0002082
[3] M. Gladwell, Outliers: The story of success (reprint edition), Back Bay Books, 2011.
[4] S. J. Miller, The probability lifesaver: All the tools you need to understand chance, Princeton Lifesaver Study Guide, Princeton University Press, Princeton, NJ, 2017. MR3585480
[5] R. Niebuhr, P.-L. Cayrel, and J. Buchmann, Improving the efficiency of generalized birthday attacks against certain structured cryptosystems, published in WCC 2011—Workshop on coding and cryptography (2011), 163–172. https://www.cdc.informatik.tu-darmstadt.de/reports/reports/GBA-final2.pdf.
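As a postscript to the comments above: the Mathematica gcd experiment is easy to replicate. This sketch (ours, with a smaller sample than the 10^6 pairs quoted in the text, for speed) draws random pairs and compares the empirical coprimality frequency with 6/π².

```python
import math
import random

random.seed(1939)  # arbitrary seed so the run is reproducible

N = 200_000        # number of random pairs sampled
B = 10**16         # pairs are drawn from [1, 10^16]
coprime = sum(
    math.gcd(random.randint(1, B), random.randint(1, B)) == 1
    for _ in range(N)
)
print(f"empirical frequency: {coprime / N:.4f}")
print(f"6/pi^2             : {6 / math.pi ** 2:.4f}")
```

With 200,000 pairs the standard deviation of the empirical frequency is about 0.001, so agreement with 6/π² ≈ 0.6079 to two decimal places is expected.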


A Mathematician’s Apology Introduction One of the most important parts of an academic’s job is mentoring the next generation. Some have written extensively to share the lessons they have learned. One of the most prolific is Steven G. Krantz (1951– ), whose titles include A Mathematician’s Survival Guide: Graduate School and Early Career Development; A Primer of Mathematical Writing: Being a Disquisition on Having Your Ideas Recorded, Typeset, Published, Read and Appreciated ; How to Teach Mathematics; A TEX Primer for Scientists; and The Survival of a Mathematician: From Tenure to Emeritus. These books give a nice sample of the issues, challenges, and rewards that lie ahead (the last is available online [9]; all can be purchased for reasonable amounts). Although there are many authors and texts to mention, this entry highlights Godfrey Harold Hardy’s A Mathematician’s Apology, first published in 1940 [7]. While many books discuss the challenges and rewards of being a mathematician, his work is a reflection on his life and whether or not it was well spent. Mathematically it surely was, since he was responsible for numerous advances and new techniques. Regarding his life, Hardy considered it to be a success in terms of the happiness and comfort that he found, but the question remained as to the “triviality” of his life. He resolved it accordingly: The case for my life. . . is this: that I have added something to knowledge, and helped others to add more; and that these somethings have a value which differs in degree only, and not in kind, from that of the creations of the great mathematicians, or of any of the other artists, great or small, who have left some kind of memorial behind them.

Because of the influence of Hardy’s writing and work, we devote the entire entry to him. This is not meant to imply that there were no significant results proved in 1940. One natural candidate is Kurt G¨ odel’s proof [5] of the relative consistency of the axiom of choice with the Zermelo–Fraenkel axioms of set theory; see the entry from 1963 for the rest of the story. A well-known passage from the Apology proclaims: I have never done anything “useful.” No discovery of mine has made, or is likely to make, directly or indirectly, for good or ill, the least difference to the amenity of the world. . . . Judged by all practical standards, the value of my mathematical life is nil; and outside mathematics it is trivial anyhow.

Then it might come as a surprise that Hardy is best known to the world for his work in genetics. His fame stems from a condescending letter to the editor in Science on the stability of genotype distributions from one generation to the next [6]; see Figure 1.



Figure 1. Hardy’s note in Science in which he lays out what is now known as the Hardy–Weinberg law [6].

The result was independently found by the German physician Wilhelm Weinberg (1862–1937) and is now known as the Hardy–Weinberg law; see [1] for more details. During a lecture by Reginald Crundall Punnett (1875–1967) of Punnett square fame, the statistician Udny Yule (1871–1951) asked about the behavior of the ratio of dominant to recessive traits over time. Why does the population not tend towards the dominant trait over time? Punnett brought the problem to his friend and cricket companion Hardy; see [3, 4] for more details. Using only "mathematics of the multiplication-table type", under natural conditions Hardy proved that there is an equilibrium at which the ratio of different genotypes remains constant over time. The mathematical content of the letter can be summarized in one line: (p + q)² = p² + 2pq + q². The following passage from his note gives a good sense of its tone.

I am reluctant to intrude in a discussion concerning matters of which I have no expert knowledge, and I should have expected the very simple point which I wish to make to have been familiar to biologists. . . . There is not the slightest foundation for the idea that a dominant character should show a tendency to spread over a whole population, or that a recessive should tend to die out. [6].
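Hardy's one-line computation can be checked directly: starting from any genotype distribution, one round of random mating lands on the Hardy–Weinberg equilibrium, and further rounds leave it fixed. A minimal sketch (our illustration, not Hardy's notation):

```python
def next_generation(P, Q, R):
    """One generation of random mating.

    P, Q, R are the population frequencies of genotypes AA, Aa, aa
    (so P + Q + R = 1).  The allele frequencies are p = P + Q/2 and
    q = R + Q/2, and the offspring genotype frequencies follow
    (p + q)^2 = p^2 + 2pq + q^2.
    """
    p = P + Q / 2
    q = R + Q / 2
    return p * p, 2 * p * q, q * q

# Start far from equilibrium: 60% AA, no heterozygotes, 40% aa.
gen = (0.6, 0.0, 0.4)
for i in range(3):
    gen = next_generation(*gen)
    print(f"generation {i + 1}: {gen}")
```

After the first generation the frequencies are (0.36, 0.48, 0.16) and never change again: neither the dominant nor the recessive trait spreads or dies out, exactly Hardy's point.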

In an obituary of Hardy, Edward Charles Titchmarsh (1899–1963) states that Hardy “attached little weight to it” [12]. However, its prevalence in introductory biology



texts demonstrates the importance of the Hardy–Weinberg law. One commentator conjectured: It must have embarrassed him that his mathematically most trivial paper is not only far and away his most widely known, but has been of such distastefully practical value. He published this paper not in the obvious place, Nature, but across the Atlantic in Science. Why? . . . I would like to think that he didn’t want it to be seen by his mathematician colleagues. [2]

Centennial Problem 1940
Proposed by Steven J. Miller, Williams College.

Read the masters! Pull up Riemann's original paper [11] or some article in a field that strikes your fancy. Read the rest of A Mathematician's Apology or other similar books. Browse some math blogs. We are fortunate to live in a time when the only cost of posting and publishing certain types of information is the time it takes to write it. The AMS has a great blog for graduate students at http://blogs.ams.org/mathgradblog/. Many people make career decisions by following paths of least resistance; really think about what you want to do. Do not just go with the flow; make as informed a decision as you can.

1940: Comments

More about Hardy. Hardy lived through World War I and the Apology was written at the start of the Second World War. Much of his pride in the uselessness of his work stemmed from the fact that he was not contributing to violence and war.

But here I must deal with a misconception. It is sometimes suggested that pure mathematicians glory in the uselessness of their work. If the theory of numbers could be employed for any practical and obviously honorable purpose, if it could be turned directly to the furtherance of human happiness or the relief of human suffering. . . then surely neither Gauss nor any other mathematician would have been so foolish as to decry or regret such applications. But science works for evil as well as for good (and particularly, of course, in time of war). . . . [7]

Interestingly, what seems useless and pure in one era can become useful and applied a short time later. Hardy’s own work provides an excellent example, where much of elementary number theory (as well as advanced results on L-functions) now plays an important role in cryptography; see the 1921 and 1977 entries. Of course, Hardy is perhaps best known (in the mathematical community at any rate) for his collaborations with Littlewood and Ramanujan. On this Hardy says: I still say to myself when I am depressed and find myself forced to listen to pompous and tiresome people, “Well, I have done one thing you could never have done, and that is to have collaborated with Littlewood and Ramanujan on something like equal terms.” [7]

The 2015 film “The Man Who Knew Infinity,” based upon the outstanding biography of Ramanujan by Robert Kanigel (1946– ) [8], depicts some of Hardy’s many quirks and his working relationship with the great Ramanujan; see Figure 2.



Figure 2. A scene from the 2015 movie "The Man Who Knew Infinity." S. Ramanujan (left) speaks with J. E. Littlewood (right) as G. H. Hardy (middle) observes. Ramanujan, Littlewood, and Hardy are played by Dev Patel (1990– ), Toby Jones (1966– ), and Jeremy Irons (1948– ), respectively.

Bibliography
[1] H. E. Christenson and S. R. Garcia, G. H. Hardy: mathematical biologist, J. Humanist. Math. 5 (2015), no. 2, 96–102, DOI 10.5642/jhummath.201502.08. http://scholarship.claremont.edu/cgi/viewcontent.cgi?article=1273&context=jhm. MR3378780
[2] J. F. Crow, Eighty years ago: the beginnings of population genetics, Genetics 119 (1988), no. 3, 473–476.
[3] A. W. F. Edwards, G. H. Hardy (1908) and Hardy–Weinberg equilibrium, Genetics 179 (2008), no. 3, 1143–1150. http://genetics.org/content/179/3/1143.
[4] C. R. Fletcher, G. H. Hardy—applied mathematician, Bull. Inst. Math. Appl. 16 (1980), no. 2-3, 61–67. MR576086
[5] K. Gödel, The Consistency of the Continuum Hypothesis, Annals of Mathematics Studies, no. 3, Princeton University Press, Princeton, N. J., 1940. MR0002514
[6] G. H. Hardy, Mendelian proportions in a mixed population, Science 28 (1908), 49–50. http://www.esp.org/foundations/genetics/classical/hardy.pdf.
[7] G. H. Hardy, A mathematician's apology, with a foreword by C. P. Snow; reprint of the 1967 edition, Canto, Cambridge University Press, Cambridge, 1992. MR1148590
[8] R. Kanigel, The man who knew infinity: A life of the genius Ramanujan, Charles Scribner's Sons, New York, 1991. MR1113890
[9] S. G. Krantz, The survival of a mathematician: From tenure-track to emeritus, American Mathematical Society, Providence, RI, 2009. http://www.math.wustl.edu/~sk/books/newsurv.pdf. MR3309302
[10] R. C. Punnett, Early days of genetics, Heredity 4 (1950), no. 1, 1–10.
[11] B. Riemann, On the number of prime numbers less than a given quantity, Monatsberichte der Königlich Preußischen Akademie der Wissenschaften zu Berlin, 1859. http://www.claymath.org/sites/default/files/ezeta.pdf.
[12] E. C. Titchmarsh, Obituary: Godfrey Harold Hardy (1877–1947), Obit. Notices Roy. Soc. London 6 (1949), 447–461 (1 plate). MR0037796


The Foundation Trilogy

Introduction

On August 1, 1941, Isaac Asimov visited John Campbell, editor of Astounding Science Fiction. The meeting led to the creation of the Foundation series, one of the most influential science-fiction series of all time. The story is modeled on the celebrated The History of the Decline and Fall of the Roman Empire by Edward Gibbon (1737–1794) and tells the story of how the Galactic Empire will fall and 30,000 years of anarchy will reign before a new empire arises.¹ Hari Seldon develops the mathematical theory of psychohistory. Inspired by statistical mechanics, the Foundation series postulates that it is possible to mathematically predict the general behavior of galactic populations with high precision (despite the fact that it is impossible to predict the behavior of specific individuals). While it is too late to stop the fall, Hari and his colleagues analyze the equations and take steps to minimize its impact, so that a new empire will rise after just a thousand years.

Asimov is but one of many science-fiction writers whose work has inspired scientists and engineers. NASA seriously considered adopting the Star Trek logo; while that never happened, the first shuttle was named Enterprise. Of course, this is not meant to imply that science fiction always gets the math right. In the 1989 Star Trek: The Next Generation episode The Royale, Captain Jean-Luc Picard claims that Fermat's last theorem is still unresolved after 800 years;² it was proved by Andrew Wiles (1953– ) in 1994 (see the 1995 entry). The 2010 Doctor Who episode The Eleventh Hour is notable for conflating anecdotes about the mathematicians Pierre de Fermat (1607–1665) and Évariste Galois (1811–1832). On the other hand, the 1981 Doctor Who story Logopolis and the 1982 story Castrovalva (named after an M. C. Escher lithograph) involve mathematics, in a vague but fascinating sense, as part of the plot.

Centennial Problem 1941 Proposed by Steven J. Miller, Williams College. One of the most famous quotes in Asimov’s original trilogy is “A circle has no end.” In case you are not familiar with the story, we will not spoil it for you by divulging its meaning in the work. While a circle has no end, it does have a perimeter and an area. Consider the following generalization. Find the area

¹Can you think of a Roman Emperor who was captured in battle? Can you find a Fields Medalist with that middle name?
²Although the 1995 Star Trek: Deep Space 9 episode Facets refers to Wiles's proof.



enclosed by the ellipse
\[
\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1.
\]
Now find its perimeter. The first problem is often covered in multivariable calculus. The second problem has been studied by many mathematicians and its solution touches on many fields. This conundrum illustrates that for many problems the boundary is harder to deal with than the interior.

1941: Comments

Elliptical reasoning. If we let u = x/a and v = y/b, then the equation of the ellipse, in uv-space, becomes the equation of the unit circle centered at the origin; the area element dx dy becomes ab du dv. This change of variables yields
\[
\iint_{(x/a)^2 + (y/b)^2 \le 1} 1 \, dx \, dy = \iint_{u^2 + v^2 \le 1} 1 \cdot ab \, du \, dv = \pi ab
\]
since the area of the unit circle is π. If a = b = r, then the area is πr², as expected. A similar calculation shows that the volume of the ellipsoid (x/a)² + (y/b)² + (z/c)² ≤ 1 is (4/3)πabc. Computing the perimeter of an ellipse is a different story; see [1] for a discussion and solution.

Fourier series. Of course, 1941 witnessed many mathematical innovations that are worthy of our attention. We focus on a famous theorem of Norbert Wiener (1894–1964) about absolutely convergent Fourier series. Before tackling Wiener's theorem, we need to talk about Fourier series. This is a subject that every mathematics student should learn about. Under certain circumstances one can approximate a function f : [−π, π] → R by the partial sums of its Fourier series
\[
\frac{a_0}{\sqrt{2}} + \sum_{n=1}^{\infty} (a_n \cos nt + a_{-n} \sin nt), \tag{1941.1}
\]
in which the Fourier coefficients are given by
\[
a_{-n} = \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \sin nt \, dt, \qquad
a_0 = \frac{1}{\pi} \int_{-\pi}^{\pi} \frac{f(t)}{\sqrt{2}} \, dt, \qquad
a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \cos nt \, dt
\]
for n ∈ N. The motivation stems from the study of waves. A 2π-periodic function f : R → R can be regarded as a function f : [−π, π] → R since the values of f on [−π, π] determine the values of f everywhere. Under certain circumstances, one hopes to express f as a superposition of simple sine and cosine waves. The integrals defining the "amplitudes" a_n act as "filters" that isolate the component of f that has "frequency" n. A typical result in the area, used all the time by electrical engineers, is the following. Let f : R → R be a periodic function with period 2π. Suppose that



Figure 1. The graph of the square-wave function (1941.2) and the Fourier approximation π/2 + 2 sin t + (2/3) sin 3t + (2/5) sin 5t + (2/7) sin 7t + (2/9) sin 9t.

f and f′ are both piecewise continuous on [−π, π] and that f(−π) = f(π) and f′(−π) = f′(π). If f is continuous at t, then the Fourier series (1941.1) converges to f(t). If f has a jump discontinuity at t, then the Fourier series (1941.1) converges to the midpoint ½(f(t⁺) + f(t⁻)) of the gap [4, 2.3.10]. This result is of practical value, since it ensures that "nice" waves can be studied using sines and cosines.

Leibniz's series for π/4. Here is a cute example of Fourier series in action. Consider the square-wave function f : [−π, π] → R defined by
\[
f(t) =
\begin{cases}
0 & \text{if } -\pi < t < 0, \\
\pi & \text{if } 0 < t < \pi, \\
\frac{\pi}{2} & \text{if } t = 0 \text{ or } t = \pm\pi.
\end{cases} \tag{1941.2}
\]
Since f and f′ are piecewise continuous, one can show that

\[
f(t) = \frac{a_0}{\sqrt{2}} + \sum_{n=1}^{\infty} (a_n \cos nt + a_{-n} \sin nt) = \frac{\pi}{2} + \sum_{n=1}^{\infty} \frac{2 \sin[(2n-1)t]}{2n-1}
\]
for all t; see Figure 1. Since f(π/2) = π, it follows that
\[
\pi = \frac{\pi}{2} + 2 \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{2n-1},
\]

which can be rearranged to yield the famous series
\[
1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots = \frac{\pi}{4},
\]

discovered in 1674 by Gottfried Wilhelm Leibniz (1646–1716).

Gelfand's proof of Wiener's 1/f theorem. A familiar theme in this book is that everything real is better complex. Let T = {z ∈ C : |z| = 1} = {e^{it} : t ∈ [−π, π]} denote the unit circle in the complex plane. By identifying the interval [−π, π] with T, we may regard a 2π-periodic complex-valued function as a function f : T → C.



This is the natural setting for the study of Fourier series, in which one attempts to find series expansions of the form Σ_{n∈Z} c_n e^{int} for f(e^{it}), in which
\[
c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) e^{-int} \, dt.
\]


The advantage of this approach is that we can work entirely with exponential functions, which are much easier to deal with than sines and cosines. For instance, do you remember the trigonometric identities for cos(x + y) and sin(x + y)? If not, see the footnote on p. 8. However, you certainly know that e^{x+y} = eˣe^y. Wiener's 1/f theorem asserts that if f : T → C has an absolutely convergent Fourier series f(e^{it}) = Σ_{n∈Z} aₙe^{int} and if f does not vanish on T, then 1/f has an absolutely convergent Fourier series. The original proof by Norbert Wiener (1894–1964) from 1932 was delicate and technical (around 100 pages) [7]. It is often described as a tour-de-force of "hard analysis." Using the theory of Banach algebras, Israel Gelfand (1913–2009) gave a "soft" proof of Wiener's theorem in 1941 [5] that requires only a few pages. To discuss Gelfand's proof, we need to view the problem through the lens of Banach algebras [2]. The Wiener algebra W is the set of all functions f : T → C of the form f(e^{it}) = Σ_{n∈Z} aₙe^{int} for which
\[
\|f\|_{\mathcal{W}} = \sum_{n\in\mathbb{Z}} |a_n| < \infty.
\]
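Wiener's theorem can be made concrete with a small example. For f(e^{it}) = 2 + e^{it} we have ‖f‖_W = 3 and f never vanishes on T, so 1/f should again have an absolutely convergent Fourier series; here the geometric series 1/(2+z) = Σ_{n≥0} (−1)ⁿzⁿ/2^{n+1} makes this visible. The sketch below (ours, not from the original text) recovers those coefficients numerically:

```python
import cmath, math

def fourier_coeff(g, n, samples=1024):
    # Riemann sum for c_n = (1/2π) ∫ g(e^{it}) e^{-int} dt, which is
    # extremely accurate for smooth 2π-periodic integrands
    total = 0j
    for k in range(samples):
        t = 2 * math.pi * k / samples
        total += g(cmath.exp(1j * t)) * cmath.exp(-1j * n * t)
    return total / samples

# f(e^{it}) = 2 + e^{it} never vanishes on T; Wiener's theorem promises
# that g = 1/f lies in W, and indeed 1/(2+z) = sum_{n>=0} (-1)^n z^n / 2^{n+1}.
g = lambda z: 1 / (2 + z)
coeffs = {n: fourier_coeff(g, n) for n in range(-5, 21)}
for n in range(6):
    assert abs(coeffs[n] - (-1) ** n / 2 ** (n + 1)) < 1e-12
print(sum(abs(c) for c in coeffs.values()))  # ≈ ||1/f||_W = 1
```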


It turns out that W is closed under pointwise addition and multiplication. It can be endowed with the norm ‖·‖_W and metric d_W(f, g) = ‖f − g‖_W, with respect to which it is a complete metric space. In particular, W consists of those functions whose Fourier series are absolutely convergent. Each f ∈ W is continuous, since it is the uniform limit on T of the continuous functions Σ_{n=−N}^{N} aₙe^{int}. The Wiener algebra is an example of a commutative Banach algebra. That is, it is a complete (in the topological sense) normed vector space endowed with addition and a commutative multiplication that satisfy several natural axioms (for instance, the norm is submultiplicative: ‖fg‖_W ≤ ‖f‖_W ‖g‖_W). The general theory of commutative Banach algebras, developed by Gelfand, tells us that f ∈ W is invertible in W if and only if χ(f) ≠ 0 for every character χ of W. In this context, a character is a multiplicative linear function χ : W → C. That is, χ is a complex-valued linear map on W that satisfies χ(fg) = χ(f)χ(g) for all f, g ∈ W. Gelfand showed that characters on commutative Banach algebras are contractive, so |χ(f)| ≤ ‖f‖_W for all f ∈ W. In particular, every character on W is continuous since |χ(f) − χ(g)| = |χ(f − g)| ≤ ‖f − g‖_W. Let z = e^{it} and observe that zⁿ ∈ W and ‖zⁿ‖_W = 1 for all n ∈ Z. Suppose that χ : W → C is a character and that χ(z) = λ. Then |λ|ⁿ = |λⁿ| = |χ(zⁿ)| ≤ ‖zⁿ‖_W = 1 for all n ∈ Z, so |λ| = 1. Thus, for each character χ, there is a unique e^{iα} ∈ T so that χ(z) = e^{iα}. Consequently, χ(zⁿ) = χ(z)ⁿ = e^{inα}


for all n ∈ Z. Thus, χ simply evaluates the function zⁿ at e^{iα}. If f = Σ_{n∈Z} cₙzⁿ ∈ W, then the continuity of χ ensures that
\[
\chi(f) = \sum_{n\in\mathbb{Z}} c_n \chi(z^n) = \sum_{n\in\mathbb{Z}} c_n e^{in\alpha} = f(e^{i\alpha}).
\]



If f does not vanish on T, then χ(f) ≠ 0 for every character χ of W. Thus, f is invertible in W, so 1/f ∈ W as claimed.

Bibliography
[1] S. Adlaj, An eloquent formula for the perimeter of an ellipse, Notices Amer. Math. Soc. 59 (2012), no. 8, 1094–1099, DOI 10.1090/noti879. http://www.ams.org/notices/201208/rtx120801094p.pdf. MR2985810
[2] W. Arveson, A short course on spectral theory, Graduate Texts in Mathematics, vol. 209, Springer-Verlag, New York, 2002. MR1865513
[3] I. Asimov, Foundation, Gnome Press, 1951.
[4] R. Bhatia, Fourier series, reprint of the 1993 edition [Hindustan Book Agency, New Delhi, MR1657675], Classroom Resource Materials Series, Mathematical Association of America, Washington, DC, 2005. MR2108537
[5] I. M. Gelfand, Normierte Ringe, Mat. Sbornik N.S. 9 (1941), no. 51, 3–24. http://www.mathnet.ru/links/c2c9f3ffc009b4ac303540de01718e1e/sm6046.pdf.
[6] D. J. Newman, A simple proof of Wiener's 1/f theorem, Proc. Amer. Math. Soc. 48 (1975), 264–265, DOI 10.2307/2040730. http://www.ams.org/journals/proc/1975-048-01/S0002-9939-1975-0365002-8/. MR0365002
[7] N. Wiener, Tauberian theorems, Ann. of Math. (2) 33 (1932), 1–100. http://www.jstor.org/stable/1968102?origin=crossref&seq=1#page_scan_tab_contents.


1942. Zeros of ζ(s)

Introduction

The Riemann zeta function is perhaps the most important function in number theory; see the 1928, 1933, 1939, 1942, 1945, 1967, and 1987 entries. It is initially defined for Re s > 1 by the series
\[
\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}.
\]

The Euler product formula (1933.3) is the product representation
\[
\zeta(s) = \prod_{p \text{ prime}} \left(1 - \frac{1}{p^s}\right)^{-1},
\]
also valid for Re s > 1; see the 1933 entry for a proof. Although Euler and others studied the zeta function first, it is named after Georg Friedrich Bernhard Riemann (1826–1866) because of his 1859 masterpiece that relates the distribution of the zeros of ζ(s) to the fine properties of the prime-counting function π(x) [8]. The Euler product formula confirms that the zeta function has no zeros in the half plane Re s > 1. However, neither the series nor the product representation given above converges if Re s ≤ 1. So what do we mean by the zeros of ζ(s)? To resolve this issue and to understand Riemann's contribution, we must discuss analytic continuation. An analytic function is a differentiable function f : U → C defined on a nonempty, connected open set U ⊆ C. By "differentiable," we mean that
\[
f'(z_0) = \lim_{z\to z_0} \frac{f(z) - f(z_0)}{z - z_0}
\]

exists for every z₀ ∈ U. This is the complex version of the single-variable calculus definition. For instance, the zeta function is analytic on Re s > 1 with derivative
\[
\zeta'(s) = -\sum_{n=1}^{\infty} \frac{\log n}{n^s}.
\]

An analytic continuation of an analytic function f : U → C is an analytic function g : V → C, defined on an open set V that contains U, so that f and g agree on U. That is, g is an analytic "extension" of f to the larger set V. An example of analytic continuation involves the geometric series. The summation formula
\[
\sum_{n=0}^{\infty} z^n = \frac{1}{1-z} \tag{1942.1}
\]



is valid for |z| < 1.¹ There is an important asymmetry in (1942.1): the series converges only for |z| < 1, whereas the function (1 − z)⁻¹ is defined for all z ≠ 1. Thus, (1 − z)⁻¹ provides an analytic continuation of Σ_{n=0}^∞ zⁿ from the open disk |z| < 1 to the much larger region C\{1}. Obtaining an analytic continuation of the zeta function is more difficult. We first construct an analytic continuation to Re s > 0. Observe that
\[
\zeta(s) - \frac{1}{s-1} = \sum_{n=1}^{\infty} n^{-s} - \int_{1}^{\infty} x^{-s}\,dx = \sum_{n=1}^{\infty} \int_{n}^{n+1} \left( n^{-s} - x^{-s} \right) dx. \tag{1942.2}
\]
Since
\[
\left| \int_{n}^{n+1} \left( n^{-s} - x^{-s} \right) dx \right| = \left| \int_{n}^{n+1} \left( s \int_{n}^{x} y^{-1-s}\,dy \right) dx \right| \le |s|\, n^{-1-\operatorname{Re} s},
\]

it follows that the series (1942.2) converges absolutely and uniformly on each half-plane Re s ≥ δ > 0. Each summand is an analytic function of s, so (1942.2) provides an analytic continuation of ζ(s) − (s − 1)⁻¹ to the half-plane Re s > 0; the presence of the term (s − 1)⁻¹ on the left-hand side ensures that ζ(s) has a simple pole at s = 1 with residue 1. That is, near the point s = 1, the zeta function behaves like the function (s − 1)⁻¹. The next, and most complicated, step is to show that the zeta function satisfies the functional equation
\[
\zeta(s) = 2^s \pi^{s-1} \sin\left(\frac{\pi s}{2}\right) \Gamma(1-s)\,\zeta(1-s), \tag{1942.3}
\]
in which
\[
\Gamma(s) = \frac{e^{-\gamma s}}{s} \prod_{n=1}^{\infty} \left(1 + \frac{s}{n}\right)^{-1} e^{s/n} \tag{1942.4}
\]
is the gamma function and
\[
\gamma = \lim_{N\to\infty} \left( \sum_{n=1}^{N} \frac{1}{n} - \log N \right) \approx 0.5772156\ldots \tag{1942.5}
\]



is the Euler–Mascheroni constant. For the sake of brevity, we omit this step. Since the product (1942.4) is an analytic function on C\{0, −1, −2, −3, . . .}, the functional equation (1942.3) permits us to define ζ(s) for Re s ≤ 0, since every factor on the right-hand side of (1942.3) is defined when Re(1 − s) ≥ 1, that is, when Re s ≤ 0; at s = 0 the zero of sin(πs/2) cancels the pole of ζ(1 − s). Thus, we have obtained an analytic continuation of the zeta function to C\{1}. The product representation (1942.4) of the gamma function and (1942.3) ensure that ζ has zeros at −2, −4, −6, . . . These are the trivial zeros of the zeta function. Any remaining zeros must be in the critical strip {s ∈ C : 0 < Re s < 1}.

¹The radius of convergence of the series Σ_{n=0}^∞ zⁿ is 1. What students of calculus do not often realize is that the "radius" referred to is the radius of the disk |z| < 1 in the complex plane.



These are the nontrivial zeros of the zeta function. It turns out that the nontrivial zeros govern the main terms in our error estimates of π(x). Neglecting some logarithmic factors, if θ = sup{Re s : 0 < Re s < 1, ζ(s) = 0}, then the maximum deviation² |π(x) − Li(x)| from the prediction of the prime number theorem is essentially of size at most x^θ. Thus, the nontrivial zeros of the zeta function have an enormous influence in number theory: they control the large-scale distribution of the prime numbers. To a few decimal places, these are the first twenty nontrivial zeros that lie in the upper half-plane (read down each column):

0.5 + 14.1347i,  0.5 + 37.5862i,  0.5 + 52.9703i,  0.5 + 67.0798i,
0.5 + 21.0220i,  0.5 + 40.9187i,  0.5 + 56.4462i,  0.5 + 69.5464i,
0.5 + 25.0109i,  0.5 + 43.3271i,  0.5 + 59.3470i,  0.5 + 72.0672i,
0.5 + 30.4249i,  0.5 + 48.0052i,  0.5 + 60.8318i,  0.5 + 75.7047i,
0.5 + 32.9351i,  0.5 + 49.7738i,  0.5 + 65.1125i,  0.5 + 77.1448i.

Notice a pattern? Numerical calculations have confirmed that the first 10¹³ nontrivial zeros lie on the critical line Re s = 1/2; see Figure 1. The Riemann hypothesis, one of the seven Clay Millennium Problems, asserts that the nontrivial zeros all lie on the critical line. Riemann wrote in [8]:

. . . and it is very probable that all roots are real.³ Certainly one would wish for a stricter proof here; I have meanwhile temporarily put aside the search for this after some fleeting futile attempts, as it appears unnecessary for the next objective of my investigation.
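The continuation (1942.2) is concrete enough to compute with. In the sketch below (ours, not from the original text), each integral is evaluated in closed form, ∫ₙ^{n+1} x^{−s} dx = ((n+1)^{1−s} − n^{1−s})/(1 − s), and the series is truncated; the truncation error is roughly N^{−Re s}/2 after N terms.

```python
import math

def zeta(s, terms=100000):
    """Approximate ζ(s) for Re s > 0, s ≠ 1, via the series (1942.2):
    ζ(s) = 1/(s-1) + Σ_n [ n^{-s} - ((n+1)^{1-s} - n^{1-s})/(1-s) ]."""
    total = 1 / (s - 1)
    for n in range(1, terms + 1):
        total += n ** (-s) - ((n + 1) ** (1 - s) - n ** (1 - s)) / (1 - s)
    return total

print(zeta(2))                     # ≈ π²/6 = 1.6449...
print(zeta(0.5))                   # ≈ -1.4603545, inside the critical strip
print(abs(zeta(0.5 + 14.1347j)))   # small: near the first nontrivial zero
```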

The Riemann hypothesis, which was one of Hilbert's problems [10] (see the 1935, 1963, 1970, 1980, and 1983 entries), is considered by many mathematicians to be the most important open problem in mathematics. In 1914, Godfrey Harold Hardy (see the 1920, 1923, and 1940 entries) proved there are infinitely many nontrivial zeros on the critical line. However, he was unable to ascertain whether a positive proportion of them are on the critical line. The situation changed in 1942, when Atle Selberg (1917–2007) showed that a small, but positive, proportion of the zeros of ζ(s) are on the critical line; see the 1948 entry. A major advance came in 1974 with the work of Norman Levinson (1912–1975), who proved more than a third of these zeros are on the line. The best results today are around 40%; there is still a long way to go. Even if we can prove that 100% of the zeros are on the critical line, that still would be insufficient to prove the Riemann hypothesis. There could still be infinitely many zeros in the critical strip that do not lie on the critical line. This is meant in the same sense that "100% of natural numbers are not perfect squares." The proportion of natural numbers at most x that are not perfect squares is approximately (x − √x)/x = 1 − 1/√x, which tends to 1 as x → ∞ since the correction term 1/√x tends to zero. It is still unknown whether or not there is a c < 1 such that all nontrivial zeros of the zeta function have real part at most c; the Riemann hypothesis is equivalent to being able to take c = 1/2 (the nontrivial zeros are symmetric about the line Re s = 1/2).

²Here Li(x) denotes the offset logarithmic integral function (1933.1).
³Riemann was considering a variant of the zeta function, for which the corresponding conjecture is that the zeros are real.



Figure 1. The nontrivial zeros of the Riemann zeta function lie in the critical strip 0 < Re s < 1. The Riemann hypothesis asserts that they all lie on the critical line Re s = 1/2.

The best results are zero-free regions: the amount by which we can push to the left of the line Re s = 1 shrinks rapidly with the height t, giving regions where
\[
\zeta(\sigma + it) \neq 0 \quad \text{if} \quad \sigma > 1 - A(\log|t|)^{-r_1}(\log\log|t|)^{-r_2}
\]
for some positive constants A, r₁, r₂.

Centennial Problem 1942
Proposed by Steven J. Miller, Williams College.

What is wrong with the following "proof" of the Riemann hypothesis?⁴

(a) For each prime p let h_p(s) = (1 − p^{−2s})⁻¹/(1 − p^{−s})⁻¹. Note that h_p(s) is never zero or infinity for Re s > 0.

⁴This is not a valid proof, nor can it be salvaged.



(b) Let ζ₂(s) = h₂(s)ζ(s). The analytic continuation of ζ₂(s) is simply h₂(s) times the analytic continuation of ζ(s). Furthermore, ζ₂(s) and ζ(s) have the same zeros for Re s > 0. Observe that
\[
\zeta_2(s) = \left(1 - 2^{-2s}\right)^{-1} \prod_{\substack{p \text{ prime} \\ p \ge 3}} \left(1 - p^{-s}\right)^{-1}.
\]

(c) Similarly set ζ₃(s) = h₃(s)ζ₂(s), and observe that ζ₃(s) and ζ₂(s) (and hence also ζ(s)) have the same zeros in the region Re s > 0. Note that
\[
\zeta_3(s) = \left(1 - 2^{-2s}\right)^{-1}\left(1 - 3^{-2s}\right)^{-1} \prod_{\substack{p \text{ prime} \\ p \ge 5}} \left(1 - p^{-s}\right)^{-1}.
\]

(d) We continue this process, working initially in the region Re s > 2 so that all the products involved converge uniformly. We let ζ_∞(s) be the limit of ζ_p(s) as p → ∞. This limit exists and equals ζ(2s) for Re s > 2.

(e) Since ζ(2s) has an analytic continuation that does not vanish for Re s > 1/2 (because ζ(s) does not vanish if Re s > 1), each ζ_p(s) does not vanish for Re s > 1/2. Since all these functions have the same zeros in this region, none of them vanish for Re s > 1/2. Thus, ζ(s) does not vanish in this region and the Riemann hypothesis is true.
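Step (d) can at least be sanity-checked numerically: once every prime has been "squared," the Euler product becomes the Euler product of ζ(2s). The sketch below (ours, not from the original text; products truncated at primes below 1000) checks this at s = 3, where ζ(6) = π⁶/945.

```python
import math

def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            for q in range(p * p, n + 1, p):
                sieve[q] = False
    return [p for p, is_p in enumerate(sieve) if is_p]

s = 3.0
# After every prime is replaced by its square, the product in step (d)
# is the Euler product of ζ(2s):
prod = 1.0
for p in primes_up_to(1000):
    prod *= 1 / (1 - p ** (-2 * s))
print(prod, math.pi ** 6 / 945)   # both ≈ ζ(6) = 1.01734...
```

The tail of the product over primes above 1000 changes the value by less than 10⁻¹⁵, so the truncation is harmless here.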

1942: Comments

Solution to the problem. The approach sketched above is fundamentally flawed. The error is that the analytic continuation of the limit is not necessarily the limit of the analytic continuations. Moreover, there is no hope of salvaging the argument above. If instead of replacing each prime with its square we used its cube, we would then deduce that ζ(s) has no zeros for Re s > 1/3. However, this is impossible since the zeta function has infinitely many zeros on the critical line.

Bibliography
[1] E. Bombieri, Problems of the millennium: the Riemann hypothesis, Clay Mathematics Institute, http://www.claymath.org/sites/default/files/official_problem_description.pdf.
[2] Clay Mathematics Institute, Millennium problems, http://www.claymath.org/millennium-problems.
[3] H. Davenport, Multiplicative number theory, 2nd ed., revised by Hugh L. Montgomery, Graduate Texts in Mathematics, vol. 74, Springer-Verlag, New York–Berlin, 1980. MR606931
[4] H. M. Edwards, Riemann's zeta function, Pure and Applied Mathematics, Vol. 58, Academic Press, New York–London, 1974. MR0466039
[5] G. H. Hardy, Sur les zéros de la fonction ζ(s), Comp. Rend. Acad. Sci. 158 (1914), 1012–1014.
[6] H. Iwaniec and E. Kowalski, Analytic number theory, American Mathematical Society Colloquium Publications, vol. 53, American Mathematical Society, Providence, RI, 2004. MR2061214
[7] N. Levinson, More than one third of zeros of Riemann's zeta-function are on σ = 1/2, Advances in Math. 13 (1974), 383–436, DOI 10.1016/0001-8708(74)90074-7. MR0564081



[8] G. F. B. Riemann, Über die Anzahl der Primzahlen unter einer gegebenen Grösse, Monatsber. Königl. Preuss. Akad. Wiss. Berlin, Nov. 1859, 671–680. http://www.maths.tcd.ie/pub/HistMath/People/Riemann/Zeta/EZeta.pdf.
[9] A. Selberg, Contributions to the theory of the Riemann zeta-function, Arch. Math. Naturvid. 48 (1946), no. 5, 89–155. MR0020594
[10] Wikipedia, Hilbert's problems, http://en.wikipedia.org/wiki/Hilbert's_problems.


1943. Breaking Enigma

Introduction

One group of mathematicians played a crucial role in the Allied victory in World War II: the codebreakers. The German Army encrypted its communications with Enigma machines, typewriter-like devices (see Figures 2 and 3 on pp. 123 and 124, respectively) that produce a fiendishly complicated code. The Polish Cipher Bureau developed the strategies to break the Enigma code in the early 1930s, but the largest codebreaking operation was British, headquartered at Bletchley Park, a Victorian manor northwest of London. The top-secret Bletchley Park project, codenamed "Ultra," is legendary. It employed mathematicians, linguists, chess masters, academics, composers, and puzzle experts. Recruiters once asked the Daily Telegraph to organize a crossword competition and then secretly offered jobs to the winners. One of the leaders of Ultra was Alan Turing, the mathematician and pioneer of theoretical computer science whom we met in the 1936 entry.

Mathematically speaking, the Enigma machine generates a permutation τ ∈ S₂₆ of the 26 letters of the alphabet. Here Sₙ denotes the symmetric group on n symbols. The permutation τ changes with each keystroke. Typing one letter sends an electric current through the scrambling mechanisms (a plugboard, then a set of rotors, then a reflector, then back through the rotors and the plugboard), causing a different letter to light up. It also turns the rotors so that the next letter will be scrambled differently. The scramblers are wired as follows: the plugboard has one plug for each letter and ten pairs of letters wired together. It defines a permutation π, which is a product of ten two-cycles. The rotors are rotating wheels with a circle of twenty-six brass pins on one side and twenty-six electrical contacts on the other. The wiring from contacts to pins gives a fixed permutation ρ. Depending on the position of the rotor, this permutation is conjugated by a power of the 26-cycle α = (1 2 3 . . . 26).
The reflector has twenty-six electrical contacts, connected in pairs by thirteen wires. It gives a fixed permutation σ, a product of thirteen 2-cycles. Altogether, the permutation is
\[
\tau = \pi^{-1}\left(\alpha^{-i_1}\rho_1\alpha^{i_1}\right)^{-1}\left(\alpha^{-i_2}\rho_2\alpha^{i_2}\right)^{-1}\left(\alpha^{-i_3}\rho_3\alpha^{i_3}\right)^{-1}\sigma\left(\alpha^{-i_3}\rho_3\alpha^{i_3}\right)\left(\alpha^{-i_2}\rho_2\alpha^{i_2}\right)\left(\alpha^{-i_1}\rho_1\alpha^{i_1}\right)\pi,
\]
where i₁, i₂, i₃, which represent the positions of rotors 1, 2, and 3, vary. Since each permutation τ is a conjugate of σ, it follows that τ is also a product of thirteen 2-cycles and that τ⁻¹ = τ. Thus, a message can be encrypted and decrypted by Enigma machines with the same settings. The operator could choose ten pairs of letters to connect in the plugboard, three out of five exchangeable rotors in any order, and twenty-six initial positions for each rotor. The plugboard alone can be wired in 26!/(6! · 10! · 2¹⁰) = 150,738,274,937,250 ways, and combining this with the rotor choices and positions makes the number of initial settings larger still.
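The algebra above is easy to test in code. The sketch below (ours, not from the original text; the rotor wirings and positions are random stand-ins, not historical ones) builds a toy Enigma permutation τ with the stated cycle structures and checks the two key claims: τ is its own inverse, and no letter ever encrypts to itself. It also verifies the plugboard count.

```python
import math, random

random.seed(1943)
n = 26

def random_pairing(letters):
    # a fixed-point-free involution on `letters`: a product of 2-cycles
    letters = list(letters)
    random.shuffle(letters)
    perm = {}
    for a, b in zip(letters[0::2], letters[1::2]):
        perm[a], perm[b] = b, a
    return perm

sigma = random_pairing(range(n))                 # reflector: thirteen 2-cycles
pi = {x: x for x in range(n)}                    # plugboard: ten 2-cycles,
pi.update(random_pairing(random.sample(range(n), 20)))   # six letters unplugged
pi_inv = {v: k for k, v in pi.items()}

rotors = [random.sample(range(n), n) for _ in range(3)]  # ρ1, ρ2, ρ3
inverses = [[r.index(x) for x in range(n)] for r in rotors]
i1, i2, i3 = 7, 11, 2                            # rotor positions (arbitrary)

def tau(x):
    # τ = π⁻¹ (α^{-i}ρα^{i})⁻¹ ... σ ... (α^{-i}ρα^{i}) π, with α(x) = x+1 mod 26
    x = pi[x]
    for rho, i in zip(rotors, (i1, i2, i3)):
        x = (rho[(x + i) % n] - i) % n           # apply α^{-i} ρ α^{i}
    x = sigma[x]
    for rho_inv, i in zip(reversed(inverses), (i3, i2, i1)):
        x = (rho_inv[(x + i) % n] - i) % n       # apply (α^{-i} ρ α^{i})⁻¹
    return pi_inv[x]

assert all(tau(tau(x)) == x for x in range(n))   # τ is an involution
assert all(tau(x) != x for x in range(n))        # no letter maps to itself

# the plugboard count quoted above:
assert math.factorial(26) // (math.factorial(6) * math.factorial(10) * 2 ** 10) \
       == 150_738_274_937_250
```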



The vast number of initial settings makes the Enigma code almost unbreakable, but it does have weaknesses. Since τ is a product of thirteen 2-cycles, no letter is ever encoded as itself. A codebreaker can look for common words and phrases in the encrypted text and rule them out if any letters match. German messages also had various common formats that made them easier to guess. Furthermore, Allied spies captured parts of Enigma machines, decrypted messages, and information about initial settings. All this was just enough to break the code. By 1943, British Intelligence was able to decrypt most Enigma codes without knowing the initial settings of the machine. This capability was kept utterly secret; the Nazis never knew. Winston Churchill (1874–1965) later told George VI (1895–1952), "It was thanks to Ultra that we won the war."

Centennial Problem 1943
Proposed by Ian Whitehead, University of Minnesota.

In honor of the Bletchley Park crossword contest, here is a cryptography-themed cryptic crossword (see Figure 1), jointly written with Joey McGarvey. As in all cryptic crosswords, each clue contains a regular definition and a pun/anagram/wordplay hint. You must figure out how to parse the clue.

1943: Comments

Derangements. There are n! permutations of {1, 2, . . . , n}. A permutation is a derangement if no element ends up where it started. Thus, (2 4 3 5 1) is not a derangement since 3 is fixed, but (2 3 5 1 4) is. Let pₙ denote the fraction of permutations of {1, 2, . . . , n} that are derangements. Does lim_{n→∞} pₙ exist? If it exists, is it large (close to 1) or small (close to 0)? Think about this before reading on. To determine pₙ, we compute 1 minus the probability that at least one number is fixed. Let A_{i₁,i₂,...,i_k} denote the number of permutations that fix the distinct natural numbers i₁, i₂, . . . , i_k ≤ n; these permutations may fix other numbers as well, so long as i₁, i₂, . . . , i_k are fixed. Then A_{i₁,i₂,...,i_k} = (n − k)!.
The principle of inclusion-exclusion ensures that the number of permutations that fix at least one of {1, 2, . . . , n} is
\[
\sum_{i_1=1}^{n} A_{i_1} - \sum_{i_1 < i_2} A_{i_1,i_2} + \sum_{i_1 < i_2 < i_3} A_{i_1,i_2,i_3} - \cdots + (-1)^{n+1} A_{1,2,\ldots,n}.
\]

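Before (or after) working through the combinatorics, the limiting proportion of derangements can be probed by brute force. A short sketch (ours, not from the original text):

```python
import math
from itertools import permutations

def derangement_fraction(n):
    # fraction p_n of the n! permutations of {0, ..., n-1} with no fixed point
    total = hits = 0
    for perm in permutations(range(n)):
        total += 1
        if all(perm[i] != i for i in range(n)):
            hits += 1
    return hits / total

for n in range(2, 9):
    print(n, derangement_fraction(n))
# the fractions settle down very quickly to 1/e = 0.36787...
print(1 / math.e)
```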
Like the Riemann zeta function, ζ_K(s) can be analytically continued to a larger domain (see the 1942 entry). The Riemann hypothesis over global function fields is a theorem, first proved by André Weil in the 1940s.

The Riemann Hypothesis for Function Fields: Let K be a global function field over F_q. All the roots of ζ_K(s) lie on the line Re s = 1/2.

The result above was first conjectured for hyperelliptic function fields by Emil Artin (1898–1962) in his thesis. The simplest case (elliptic curves; see the 1921 entry) was proved by Helmut Hasse. The first proof of the general result was published by Weil in 1948. Weil presented two proofs of this theorem. The first used the geometry of algebraic surfaces and the theory of correspondences. The second used the theory of abelian varieties; see [13, 14]. The whole project required revisions in the foundations of algebraic geometry, since Weil needed these theories to be valid over arbitrary fields, not just algebraically closed fields of characteristic zero. In the early seventies, Fields Medalist Enrico Bombieri (1940– ) obtained a more elementary proof, building upon important work of Sergei Aleksandrovich Stepanov (1941– ).

Centennial Problem 1945
Proposed by Julio Andrade, IHÉS.

Let K be a global function field in one variable with a finite constant field F_q with q elements. Suppose that the genus of K is g. Prove that there is a polynomial L_K(u) ∈ Z[u] of degree 2g such that
\[
\zeta_K(s) = \frac{L_K(q^{-s})}{(1 - q^{-s})(1 - q^{1-s})}.
\]

You will need to use the Riemann–Roch theorem. For more details about the genus of a function field and the Riemann–Roch theorem see [9].

1945: Comments

Special values of the Riemann zeta function. Since we have discussed the Riemann zeta function in this entry, now is a good time to explore some more of its intriguing properties. While reading the results below, ask yourself: do analogous statements hold in the function field setting?



The Riemann zeta function can be evaluated at the even positive integers in closed form. The first few values are
\[
\begin{aligned}
\zeta(2) &= \frac{\pi^2}{6}, & \zeta(10) &= \frac{\pi^{10}}{93555},\\
\zeta(4) &= \frac{\pi^4}{90}, & \zeta(12) &= \frac{691\pi^{12}}{638512875},\\
\zeta(6) &= \frac{\pi^6}{945}, & \zeta(14) &= \frac{2\pi^{14}}{18243225},\\
\zeta(8) &= \frac{\pi^8}{9450}, & \zeta(16) &= \frac{3617\pi^{16}}{325641566250};
\end{aligned}
\]
see the 1919 entry for an evaluation of ζ(2). In fact, Euler showed that ζ(2k) is a rational multiple of π^{2k} for k = 1, 2, . . . . On the other hand, the exact value of
\[
\zeta(3) = \sum_{n=1}^{\infty} \frac{1}{n^3} = 1.2020569031595942854\ldots
\]

is unknown. Fortune and glory await the mathematician who provides a closed-form evaluation of ζ(3). The most significant result in this direction is due to Roger Apéry (1916–1994), who proved that ζ(3) is irrational in 1979 [1] (see the 1979 entry).

The Riemann zeta function and arithmetic functions. To further highlight the significance of the Riemann zeta function, let us investigate some of its connections with arithmetic functions from number theory. The divisor function τ(n) counts the number of divisors of n. Thus, τ(n) = 2 if and only if n is a prime. If p and q are distinct primes, then τ(p²q) = 6 since the divisors of p²q are 1, p, p², q, pq, and p²q. Moreover, τ(p²)τ(q) = 3 · 2 = 6 = τ(p²q). This is not a coincidence, for τ is a multiplicative function. This means that τ(mn) = τ(m)τ(n) whenever m and n are relatively prime. If Re s > 1, then we may square the zeta function and obtain
\[
\begin{aligned}
\zeta^2(s) &= \left(\sum_{n=1}^{\infty} \frac{1}{n^s}\right)^{2}\\
&= \left(1 + \frac{1}{2^s} + \frac{1}{3^s} + \frac{1}{4^s} + \cdots\right)\left(1 + \frac{1}{2^s} + \frac{1}{3^s} + \frac{1}{4^s} + \cdots\right)\\
&= 1 + \left(\frac{1}{2^s} + \frac{1}{2^s}\right) + \left(\frac{1}{3^s} + \frac{1}{3^s}\right) + \left(\frac{1}{4^s} + \frac{1}{2^s}\cdot\frac{1}{2^s} + \frac{1}{4^s}\right) + \cdots\\
&= 1 + \frac{2}{2^s} + \frac{2}{3^s} + \frac{3}{4^s} + \frac{2}{5^s} + \frac{4}{6^s} + \cdots\\
&= \sum_{n=1}^{\infty} \frac{\tau(n)}{n^s};
\end{aligned}
\]
term-by-term multiplication is permissible here since both series involved are absolutely convergent (see p. 110). This suggests that we might extract information about the divisor function from knowledge of ζ(s). Experience has taught number theorists that almost any interesting arithmetic function can be expressed in terms of the zeta function.
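The identity ζ²(s) = Σ τ(n)/nˢ is easy to spot-check numerically. A sketch (ours, not from the original text), at s = 4, where ζ(4) = π⁴/90:

```python
import math

N = 5000
tau = [0] * (N + 1)
for d in range(1, N + 1):          # sieve the divisor function:
    for m in range(d, N + 1, d):   # d divides m, contributing 1 to τ(m)
        tau[m] += 1

s = 4
lhs = (math.pi ** 4 / 90) ** 2                      # ζ(4)²
rhs = sum(tau[n] / n ** s for n in range(1, N + 1))  # truncated Dirichlet series
print(lhs, rhs)   # agree to many decimal places
```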



A famous open problem in this area is the Dirichlet divisor problem. It asks for the infimum over all α such that
\[
\sum_{j=1}^{n} \tau(j) = n\log n + (2\gamma - 1)n + O(n^{\alpha}),
\]


in which γ is the Euler–Mascheroni constant (1942.5). Dirichlet himself showed that α = 1/2 works, so the infimum is at most 1/2; a simple proof can be found in [2]. In particular, the average value of τ(n) tends to log n + 2γ − 1:
\[
\lim_{n\to\infty}\left( \frac{1}{n}\sum_{k=1}^{n} \tau(k) - (\log n + 2\gamma - 1) \right) = 0.
\]
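Dirichlet's main term is visible numerically. The sketch below (ours, not from the original text) uses the identity Σ_{j≤n} τ(j) = Σ_{d≤n} ⌊n/d⌋, which counts the pairs (d, m) with dm ≤ n:

```python
import math

gamma = 0.5772156649015329        # Euler–Mascheroni constant (1942.5)

def divisor_sum(n):
    # Σ_{j≤n} τ(j) = number of lattice points under the hyperbola dm ≤ n
    return sum(n // d for d in range(1, n + 1))

n = 100000
exact = divisor_sum(n)
main = n * math.log(n) + (2 * gamma - 1) * n
print(exact, main, exact - main)   # the error is tiny compared with n
```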


On the other hand, Edmund Landau (1916) showed that the infimum must be ≥ 1/4. It is customary to write τ(n) = Σ_{d|n} 1, in which the subscript d|n indicates that the sum runs over all of the positive divisors of n. This suggests the generalization
\[
\sigma_k(n) = \sum_{d\mid n} d^k,
\]

which sums the kth powers of the divisors of n. If k = 0, then σ₀ = τ. The case k = 1 is also special; we write σ = σ₁ and refer to this as the sigma function (or the sum-of-divisors function). Like the τ function, the functions σ_k are multiplicative. For Re s > 2, we have
\[
\begin{aligned}
\zeta(s)\zeta(s-1) &= \left(\sum_{n=1}^{\infty}\frac{1}{n^s}\right)\left(\sum_{m=1}^{\infty}\frac{1}{m^{s-1}}\right)\\
&= \left(\sum_{n=1}^{\infty}\frac{1}{n^s}\right)\left(\sum_{m=1}^{\infty}\frac{m}{m^{s}}\right)\\
&= \left(1 + \frac{1}{2^s} + \frac{1}{3^s} + \frac{1}{4^s} + \cdots\right)\left(1 + \frac{2}{2^s} + \frac{3}{3^s} + \frac{4}{4^s} + \cdots\right)\\
&= 1 + \left(\frac{1}{2^s} + \frac{2}{2^s}\right) + \left(\frac{1}{3^s} + \frac{3}{3^s}\right) + \left(\frac{1}{4^s} + \frac{2}{4^s} + \frac{4}{4^s}\right) + \cdots\\
&= \sum_{n=1}^{\infty} \frac{\sigma(n)}{n^s},
\end{aligned}
\]

which reveals a connection between σ and ζ. In a similar vein, Srinivasa Ramanujan derived the identity
\[
\sum_{n=1}^{\infty} \frac{\sigma_a(n)\sigma_b(n)}{n^s} = \frac{\zeta(s)\zeta(s-a)\zeta(s-b)\zeta(s-a-b)}{\zeta(2s-a-b)}
\]


that Albert Ingham (1900–1967) used in 1930 to provide a quick proof of the prime number theorem [7]. Ramanujan's formula reduces to the curious formula
\[
\sum_{n=1}^{\infty} \frac{\tau(n)^2}{n^s} = \frac{\zeta^4(s)}{\zeta(2s)}
\]



when a = b = 0. Another appealing formula of Ramanujan is
\[
\sigma(n) = \frac{\pi^2 n}{6} \sum_{q=1}^{\infty} \frac{c_q(n)}{q^2},
\]
in which
\[
c_q(n) = \sum_{\substack{a=1\\ \gcd(a,q)=1}}^{q} e^{2\pi i a n / q}
\]

is a Ramanujan sum [10].

Bibliography
[1] R. Apéry, Irrationalité de ζ(2) et ζ(3) (French), Luminy Conference on Arithmetic, Astérisque 61 (1979), 11–13. MR3363457
[2] T. M. Apostol, Introduction to analytic number theory, Undergraduate Texts in Mathematics, Springer-Verlag, New York–Heidelberg, 1976. MR0434929
[3] E. Artin, Quadratische Körper im Gebiete der höheren Kongruenzen I and II, Math. Z. 19 (1924), 153–296. https://eudml.org/doc/167773 and https://eudml.org/doc/167774.
[4] E. Bombieri, Problems of the millennium: the Riemann hypothesis, Clay Mathematics Institute, http://www.claymath.org/sites/default/files/official_problem_description.pdf.
[5] E. Bombieri, Counting points on curves over finite fields (d'après S. A. Stepanov), Séminaire Bourbaki, 25ème année (1972/1973), Exp. No. 430, Lecture Notes in Math., Vol. 383, Springer, Berlin, 1974, pp. 234–241. http://link.springer.com/chapter/10.1007%2FBFb0057311. MR0429903
[6] G. Lejeune Dirichlet, Sur l'usage des séries infinies dans la théorie des nombres (French), J. Reine Angew. Math. 18 (1838), 259–274, DOI 10.1515/crll.1838.18.259. MR1578191
[7] A. E. Ingham, Note on Riemann's zeta-function and Dirichlet's L-functions, J. London Math. Soc. 5 (1930), no. 2, 107–112, DOI 10.1112/jlms/s1-5.2.107. MR1574211
[8] K. Ireland and M. Rosen, A classical introduction to modern number theory, 2nd ed., Graduate Texts in Mathematics, vol. 84, Springer-Verlag, New York, 1990. MR1070716
[9] C. Moreno, Algebraic curves over finite fields, Cambridge Tracts in Mathematics, vol. 97, Cambridge University Press, Cambridge, 1991. MR1101140
[10] S. Ramanujan, On certain trigonometrical sums and their applications in the theory of numbers [Trans. Cambridge Philos. Soc. 22 (1918), no. 13, 259–276], Collected papers of Srinivasa Ramanujan, AMS Chelsea Publ., Providence, RI, 2000, pp. 179–199. MR2280864
[11] P. Sarnak, Problems of the millennium: the Riemann hypothesis (2004), Clay Mathematics Institute, http://www.claymath.org/sites/default/files/sarnak_rh_0.pdf.
[12] A. Weil, On the Riemann hypothesis in function fields, Proc. Nat. Acad. Sci. U. S. A. 27 (1941), 345–347. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1078336/. MR0004242
[13] A. Weil, Sur les courbes algébriques et les variétés qui s'en déduisent (French), Actualités Sci. Ind., no. 1041 = Publ. Inst. Math. Univ. Strasbourg 7 (1945), Hermann et Cie., Paris, 1948. MR0027151
[14] A. Weil, Variétés abéliennes et courbes algébriques (French), Actualités Sci. Ind., no. 1064 = Publ. Inst. Math. Univ. Strasbourg 8 (1946), Hermann & Cie., Paris, 1948. MR0029522


1946. Monte Carlo Method

Introduction

While today it is hard to gaze around a room without seeing a computer, be it in a smartphone or a thermostat, the situation was different during World War II. Computers were in their infancy. They were rare, expensive, and big. Early computers could fill an entire room and they had enormous power demands. A major leap came when people realized that they could be used for more than computing exact answers to specific problems. They can be used to approximate the answers to difficult problems through extensive simulations. This led to what is now called the Monte Carlo method (the name refers to the famous casino in Monaco).

The first thoughts and attempts I made to practice [the Monte Carlo method] were suggested by a question which occurred to me in 1946 as I was convalescing from an illness and playing solitaires. The question was what are the chances that a Canfield solitaire laid out with 52 cards will come out successfully? After spending a lot of time trying to estimate them by pure combinatorial calculations, I wondered whether a more practical method than "abstract thinking" might not be to lay it out say one hundred times and count the number of successful plays. This was already possible to envisage with the beginning of the new era of fast computers, and I immediately thought of problems of neutron diffusion and other questions of mathematical physics, and more generally how to change processes described by certain differential equations into an equivalent form interpretable as a succession of random operations.
– Stanislaw Ulam (1909–1984) [2]

Monte Carlo techniques are now used to approximate the solution to numerous problems. Rather than finding exact answers, one can simulate millions of cases and use that information to obtain an excellent approximation to the correct answer. An early application was to nuclear reactions, in which scientists would approximate both the trajectories of neutrons and the numbers released in each collision. A more down-to-earth example involves integration. In calculus, students learn how to compute areas by integration. Instructors work hard to find functions that have nice antiderivatives; a general function does not have a closed-form expression for its integral (see the 1968 and 1976 entries). For instance, the definite integrals
\[
\int_0^1 \sqrt{1 + x^3}\,dx, \qquad \int_0^{1/2} \frac{dx}{\ln x}, \qquad \text{and} \qquad \int_0^1 e^{-x^2}\,dx
\]


cannot be computed directly using antiderivatives; none of the integrands are derivatives of elementary functions. They can, however, be computed quickly and accurately using Monte Carlo integration. Suppose that we want to find the area of a region R in R². For simplicity suppose that R is a subset of the unit square [0, 1]². Then choose N points in [0, 1]² uniformly at random. Whatever fraction lies in R is our approximation to the area. The central limit theorem (see the 1922 entry) ensures that this is a good approximation, and for large N, it gives us bounds on our error. Figure 1 illustrates the Monte Carlo method for computing the area bounded by the ellipse x² + 4y² = 1. As another example, one can use the Monte Carlo method to approximate the definite integral ∫₀¹ e^{−x²} dx. Take N random points (x₁, y₁), (x₂, y₂), . . . , (x_N, y_N) in [0, 1]², count how many of them satisfy y_i ≤ e^{−x_i²}, then divide this number by N to obtain an approximation to the integral.

Another famous example of the Monte Carlo philosophy, which actually predates the method by many years, can be seen in Buffon's needle problem. The following problem was posed by Georges-Louis Leclerc (1707–1788), Comte de Buffon. Given infinitely many parallel lines exactly d units apart, independently drop N needles of length ℓ, and count how many times the needles intersect the lines; see Figure 2. If ℓ ≤ d, then one can show that as N → ∞ the expected number of hits tends to
\[
\frac{2\ell}{\pi d}\,N. \tag{1946.1}
\]
The claimed answer is reasonable: if we rescale the separation of the lines by r and the length of the needle by r, the percentage of hits should not change (this is equivalent to passing from meters to feet, say). Also, in the limit as d → ∞ the percentage of tosses resulting in a hit tends to zero, while as ℓ → ∞ we expect to cross more and more lines each toss. There are many ways to prove this result.
One is a direct, brute-force calculation, putting a probability measure on the space of needle tosses. Without loss of generality, by symmetry we can parametrize the space by saying the center of the rod lands on the $x$-axis (and there is a line at $x = 0$) somewhere between $-d/2$ and $d/2$ and the angle the needle makes with the $x$-axis is $\theta \in [0, 2\pi)$. Actually, we can exploit symmetry even more: it suffices to assume the center lands between $0$ and $d/2$ and that the angle is between $0$ and $\pi/2$. The result now follows from direct integration, which you are encouraged to set up and do.

There is a truly remarkable and elegant derivation of (1946.1) that completely avoids integration. You can find the complete details in [1], although we encourage you to ponder the following sketch and make it rigorous since there are a lot of powerful techniques involved. First, observe that the answer only depends on the ratio $\ell/d$; rescaling both by the same amount is effectively the same as just changing the units. Next, the answer is linear in $\ell$; the expected number of intersections from two sticks of lengths $\ell_1$ and $\ell_2$ is the same as using one stick of length $\ell_1 + \ell_2$. Finally, the answer is linear in $N$ (if we double the number of tosses, we double the number of expected hits). Putting all of this together we see there is some constant $c$ such that the expected number of hits is $c \frac{\ell}{d} N$.
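The symmetric parametrization above (center in $[0, d/2]$, angle in $[0, \pi/2]$) translates directly into a simulation; a minimal sketch (pure Python; names are ours, not from the text):

```python
import math
import random

def buffon(num_tosses, needle_len=0.5, spacing=1.0, seed=0):
    """Simulate Buffon's needle and return the resulting estimate of pi."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_tosses):
        center = rng.uniform(0.0, spacing / 2)  # distance to the nearest line
        theta = rng.uniform(0.0, math.pi / 2)   # angle with the x-axis
        # The needle crosses a line when its half-extent toward the line reaches it.
        if center <= (needle_len / 2) * math.cos(theta):
            hits += 1
    # Expected hits = 2 * l * N / (pi * d); invert to estimate pi.
    return 2 * needle_len * num_tosses / (spacing * hits)

print(buffon(100_000))  # close to pi = 3.14159...
```

This is exactly the experiment of Figure 2, run with many more tosses.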


Figure 1. Monte Carlo method for computing the area $A$ bounded by the ellipse $x^2 + 4y^2 = 1$: (a) $N = 100$, $A \approx 1.24$; (b) $N = 500$, $A \approx 1.504$; (c) $N = 1{,}000$, $A \approx 1.576$; (d) $N = 5{,}000$, $A \approx 1.6016$; (e) $N = 10{,}000$, $A \approx 1.5976$; (f) $N = 20{,}000$, $A \approx 1.5838$. The true area is $\pi/2 = 1.570796\ldots$; see the comments for the 1941 entry for the derivation.




Figure 2. Throwing 1,000 needles of length $1/2$ on an array of vertical lines spaced 1 unit apart. There are 318 hits, yielding an approximation $\pi \approx \frac{1{,}000}{318} = 3.14465$. This differs from $\pi \approx 3.14159$, for a relative error of approximately $0.00097$. Equivalently, with 1,000 tosses we have two digits of decimal accuracy.

All that remains is to determine $c$. Our solution uses a powerful method: if we can find $c$ for one specific choice of $\ell$ and $d$, we know $c$ for all $\ell$ and $d$; see the 1914 entry for more on this method. Let us find $c$ when $\ell = \pi d$. To do this, we imagine that instead of tossing rods we toss a circle¹ with diameter $d$. No matter how the circle lands, it intersects the vertical lines exactly twice (most of the time it will be the same line twice, but if it lands just right, it will be touched at the extremes by two adjacent lines). Thus, with a perimeter of $\pi d$ we have two intersections, so if we toss the circle $N$ times, we expect $2N$ intersections. Thus,
$$c \cdot \frac{\pi d}{d}\, N = 2N \qquad\Longrightarrow\qquad c = \frac{2}{\pi},$$
exactly as we had before.

¹ You might object to our tossing a circle when the problem is about tossing rods; however, we may approximate the circle as a regular $n$-gon with many small sides. As the number of sides tends to infinity, the circle corresponds to lots of little rods falling at all angles equally.

Centennial Problem 1946
Proposed by Steven J. Miller, Williams College.

One of the most important steps in the Monte Carlo method is the ability to choose numbers randomly. You may be surprised by how hard it is to generate a "random" sequence of points. Frequently one generates a sequence of quasirandom points through a deterministic process, which is often good enough for



applications. A popular, early method is the von Neumann middle square digits method, described with some nice references in the "random numbers" section of [3]. Given an $n$-digit natural number, square it to get a $2n$-digit number. Our random number is the middle $n$ digits. We then square that, take the middle $n$ digits of the new product, and obtain our next "random" number. Continuing this process generates our pseudo-random sequence of numbers. For example, if we start with 4321, our next number is 6710 since $4321^2 = 18671041$. Since $6710^2 = 45024100$, our next number is 241.

This process cannot generate numbers uniformly at random, even if we restrict ourselves to numbers from $0$ to $10^n - 1$. The reason is simple: this process generates a periodic sequence! After at most $10^n - 1$ terms we have a repeat, at which point the pattern cycles since all future terms are completely determined by the preceding value. For each $n$, what is the shortest period? The longest? How many of the $10^n$ initial seeds have the shortest (or longest) period? Can you give an example? If you cannot solve this problem exactly, can you approximate the answer using Monte Carlo techniques?

1946: Comments

How many trials? One of the most natural questions to ask when doing a Monte Carlo simulation is, "How many trials $N$ are needed for a given accuracy?" In many settings the convergence is fast, with the error on the order of $1/\sqrt{N}$. Let us explore approximating the area $A$ of a region $R$ contained in the unit square $[0, 1]^2$. Let $X_1, X_2, \ldots, X_N$ be independent, identically distributed random variables, in which each is 1 with probability $A$ and 0 with probability $1 - A$ (each $X_n$ is a Bernoulli random variable; see the comments for the 1922 entry). The fraction

$$Y_N = \frac{X_1 + \cdots + X_N}{N}$$
has expected value $A$ and variance
$$\operatorname{Var} Y_N = \operatorname{Var}\left(\frac{1}{N} \sum_{n=1}^{N} X_n\right) = \frac{1}{N^2} \sum_{n=1}^{N} A(1 - A) = \frac{A(1 - A)}{N}.$$

As $N \to \infty$, the central limit theorem (see the 1922 entry) ensures that the random variable $Y_N$ converges to being normally distributed with mean $A$ and standard deviation
$$\sqrt{\frac{A(1 - A)}{N}}.$$
The greatest uncertainty is when $A = 1/2$, for which the standard deviation is at most $\frac{1}{2\sqrt{N}}$. The probability that the observed estimate is off by more than $2/\sqrt{N}$ is bounded by the probability of being more than four standard deviations from the mean, which is approximately $0.0000633425$. If instead we asked about being within $3/\sqrt{N}$, the probability of failing decreases to at most $1.973176 \cdot 10^{-9}$.
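The middle-square process from the Centennial Problem above is easy to experiment with. Here is a short sketch (pure Python; the helper names are ours) that also measures the eventual cycle length of a seed, the quantity the problem asks about:

```python
def middle_square(x, n_digits=4):
    """One step of von Neumann's middle-square method."""
    s = str(x * x).zfill(2 * n_digits)      # pad the square to 2n digits
    start = (len(s) - n_digits) // 2
    return int(s[start:start + n_digits])

def cycle_length(seed, n_digits=4):
    """Length of the cycle the orbit of `seed` eventually falls into."""
    seen = {}
    x, step = seed, 0
    while x not in seen:
        seen[x] = step
        x = middle_square(x, n_digits)
        step += 1
    return step - seen[x]

print(middle_square(4321))  # 6710, since 4321^2 = 18671041
print(middle_square(6710))  # 241,  since 6710^2 = 45024100
```

Looping `cycle_length` over all $10^n$ seeds gives a brute-force answer to the period questions for small $n$.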



Bibliography

[1] M. Aigner and G. M. Ziegler, Proofs from The Book, 6th ed., see corrected reprint of the 1998 original [MR1723092]; including illustrations by Karl H. Hofmann, Springer, Berlin, 2018. MR3823190
[2] R. Eckhardt, Stan Ulam, John von Neumann, and the Monte Carlo method, with contributions by Tony Warnock, Gary D. Doolen and John Hendricks; Stanislaw Ulam 1909–1984, Los Alamos Sci. 15, Special Issue (1987), 131–137. http://library.lanl.gov/cgi-bin/getfile?00326867.pdf. MR935772
[3] N. Metropolis, The beginning of the Monte Carlo method, Stanislaw Ulam 1909–1984, Los Alamos Sci. 15, Special Issue (1987), 125–130. http://library.lanl.gov/cgi-bin/getfile?00326866.pdf. MR935771
[4] N. Metropolis and S. Ulam, The Monte Carlo method, J. Amer. Statist. Assoc. 44 (1949), 335–341. http://www.jstor.org/stable/2280232. MR0031341
[5] Wikipedia, Monte Carlo method, http://en.wikipedia.org/wiki/Monte_Carlo_method.


The Simplex Method

Introduction

There are many important problems for which an algorithm to find a solution exists but has a prohibitively long run time that limits its practical value. One example is integer factorization: given an integer $N$, write it as a product of primes. We give one solution below without any attempt to improve its efficiency.

• Step 1: Initialize Factors(N) to be the empty set; as the name suggests, we will store the factors of $N$ here. Let $M = N$ and $n = 2$ and continue to Step 2.
• Step 2: If $n$ divides $M$, then append $n$ to Factors(N), replace $M$ with $M/n$, and continue to Step 3. If $n$ does not divide $M$, then let $n = n + 1$; if $n = M$, then append $n$ to Factors(N) and go to Step 4, else repeat this step.
• Step 3: If $M > 1$, then set $n = 2$ and repeat Step 2, else go to Step 4.
• Step 4: Print Factors(N) and stop.

This algorithm is painfully slow since it requires us to check all numbers up to $N$ as potential divisors. We can make many improvements, although none of these yields a practical algorithm. Once we find an $n$ that divides $M$, we should see how many times $n$ divides $M$; this would save us from having to return to $n = 2$ each time we restart Step 2. Next, we can notice that any prime factor of $N$ is at most $\sqrt{N}$, and hence once $n > \sqrt{N}$ we know $N$ is prime. Finally, if we are able to store the earlier prime numbers, we need only check $n$ that are prime. Even if we do all of these, however, we still have to check all primes at most $\sqrt{N}$. The prime number theorem tells us that the number of primes at most $x$ is approximately $x/\log x$. If $N$ is around $10^{406}$, we need to check about $2 \times 10^{200}$ numbers. This is well beyond what modern computers can do. While factorization is easy to do in principle, in practice the "natural" approach is too slow to be useful. It is a major open problem to find a fast way to factor numbers. If such an algorithm existed, then encryption schemes such as RSA (described in the 1977 entry) would be insecure.
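The four-step procedure above, together with the first improvement mentioned (dividing out a found factor repeatedly instead of restarting at $n = 2$), can be sketched as follows (a sketch; the function name is ours):

```python
def factor(n):
    """Trial division: return the prime factors of n >= 2 in increasing order."""
    factors = []
    m, d = n, 2
    while m > 1:
        if m % d == 0:
            factors.append(d)  # d is prime here: m has no factor smaller than d
            m //= d            # divide it out and try the same d again
        else:
            d += 1
    return factors

print(factor(60))  # [2, 2, 3, 5]
```

Even with the $\sqrt{N}$ and primes-only improvements, trial division remains hopeless for numbers with hundreds of digits, exactly as the estimate above shows.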
Interestingly, while we cannot quickly factor a number, we can quickly tell if a number is prime (see the 2002 entry). Our topic for this year concerns a different problem for which a fast algorithm is available. Linear programming is a beautiful subject that is a natural outgrowth of linear algebra. In linear algebra we try to solve systems of linear equations, such as $Ax = b$. In linear programming we have a constraint matrix $A$ and are now looking for a solution to $Ax = b$ that maximizes the profit $c^T x$, in which $c$ is fixed. Initially



one allows inequalities in the linear system of constraints. By introducing additional variables we can replace all the inequalities with equalities. We also require each component of $x$ to be nonnegative (it is a nice exercise to show that we may always do this, though we may need to introduce some additional variables); doing so allows us to put our linear programming problem into a standard, canonical form.

For example, one of the earliest successes in the subject concerns the diet problem. Here the entries of $x$ are constrained to be nonnegative, with $x_k$ equal to the amount of product $k$ consumed. Each food provides a different amount of essential vitamins and minerals, and we wish to find the cheapest diet that will keep us alive while ensuring that we get the minimum daily recommended allowance of each nutrient. See [1] for a humorous recounting of the meeting between linear programming and the diet problem.

One of the first theorems proved in the subject concerns the candidates for our solution. We say $x$ is feasible if it solves $Ax = b$. It turns out that the space of feasible solutions has many nice properties. We call a solution $x$ of the constraints a basic solution if the columns of $A$ corresponding to the nonzero entries of $x$ are linearly independent. It turns out that if there is an optimal solution to our problem, then that optimal solution is a basic solution. Moreover, there are only finitely many basic solutions. Thus, we need only check all the basic solutions to find the optimal solution.

The problem is that naively searching the set of basic solutions is impractical for large, real-world problems. Let $A$ be an $m \times n$ matrix, so that $x \in \mathbb{R}^n$. We assume that $n > m$, since otherwise the system $Ax = b$ is overdetermined. If every subset of at most $m$ columns of $A$ is linearly independent, then the number of basic solutions is at most
$$\binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{m}.$$
For $m$, $n$ large this is approximately $n^m/m!$.
To get some feel for how quickly this expression grows, if $n = 10{,}000$ and $m = 100$, then the number of candidates exceeds $10^{241}$. We need an efficient way to navigate the set of basic solutions. Fortunately, there is such an approach. It is called the simplex algorithm and it was introduced by George Dantzig (whom we met in the 1939 entry) in 1947. His procedure and later generalizations allow us to solve many real-world problems in reasonable amounts of time on everyday laptops.

Centennial Problem 1947
Proposed by Steven J. Miller, Williams College.

Building on the success of the simplex algorithm, it is natural to consider other generalizations of linear programming and ask if they too can be solved efficiently. The first natural candidate is to replace the word "linear" with "quadratic." Unfortunately, while quadratic objective functions can often be handled, to date we still require the constraints to be linear. To see why, we first consider another generalization. Instead of requiring the solution vector $x$ to have nonnegative real entries, let us require it to have nonnegative integral entries. This is an extremely important class of optimization problems.



When the entries are restricted to 0 or 1, we can interpret the components as binary indicator variables. Do we have a plane leaving from Albany to Charlotte at 2:45pm? Do we show The Lego Movie on our biggest screen at 10:30am? If we are trying to solve the traveling salesman problem (what is the route of least distance through a given set of cities?), is the fifth leg of our trip from Boston to Rochester? These examples should convey the importance of solving binary integer programming problems. Prove that if we could modify the simplex method to handle problems with quadratic constraints, then we could solve all integer programming problems! For those familiar with the P versus NP problem (see the comments for the 2000 entry), this would prove P equals NP.
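For a tiny instance, the naive search over candidate solutions described in the introduction is easy to carry out directly. The sketch below (pure Python; names are ours) solves a two-variable problem of the form minimize $c^T x$ subject to $Ax \ge b$, $x \ge 0$, by intersecting pairs of boundary lines and keeping the cheapest feasible intersection point; it is run on the two-food diet problem treated in the comments below:

```python
from itertools import combinations

def solve_2d_lp(c, A, b):
    """Minimize c.x over {x >= 0 : A x >= b} by checking every candidate vertex."""
    # Boundary lines: each constraint's equality case, plus x1 = 0 and x2 = 0.
    lines = list(zip(A, b)) + [((1.0, 0.0), 0.0), ((0.0, 1.0), 0.0)]
    best = None
    for (a1, b1), (a2, b2) in combinations(lines, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if det == 0:
            continue  # parallel lines never meet
        # Cramer's rule for the intersection of a1.x = b1 and a2.x = b2.
        x1 = (b1 * a2[1] - a1[1] * b2) / det
        x2 = (a1[0] * b2 - b1 * a2[0]) / det
        feasible = x1 >= -1e-9 and x2 >= -1e-9 and all(
            row[0] * x1 + row[1] * x2 >= rhs - 1e-9 for row, rhs in zip(A, b))
        if feasible:
            cost = c[0] * x1 + c[1] * x2
            if best is None or cost < best[0]:
                best = (cost, (x1, x2))
    return best

# Two-food diet problem: minimize 20*x1 + 2*x2
# with 30*x1 + 5*x2 >= 60 (iron) and 15*x1 + 10*x2 >= 70 (protein).
print(solve_2d_lp((20, 2), [(30, 5), (15, 10)], (60, 70)))
```

The optimal diet is twelve units of the second food and none of the first, at a cost of \$24; the point of the simplex method is to reach such a vertex without enumerating them all.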

1947: Comments

Overview of the simplex method. While we cannot describe the simplex method in its full glory and prove why it works in a short introduction, we can at least sketch what it is and give a sense of why it should work. Suppose that we wish to solve the canonical linear programming problem: minimize $c^T x$ subject to $Ax = b$ for an $m \times n$ matrix $A$, in which the entries of $x$ and $b$ are nonnegative (if an entry of $b$ were negative, we could multiply the corresponding row of $A$ by $-1$ and reverse its sign). We also make the assumption that the problem is nondegenerate, in the following sense. Suppose that the $m$ rows of $A$ are linearly independent. If the rows are not linearly independent, then either we cannot solve $Ax = b$ or at least one of the rows is unnecessary. We also assume $b$ is not a linear combination of fewer than $m$ columns of $A$. If $b$ is a combination of fewer than $m$ columns, this will create a technical difficulty in the simplex method. Fortunately this is a weak condition: if we change some of the entries of $b$ by small amounts (less than $10^{-10}$, for example), this should suffice to break the degeneracy.

The simplex method has two phases.
• Phase I: Find a basic feasible solution (or prove that none exists).
• Phase II: Given a basic feasible solution, find a basic optimal solution (or prove that none exists). If no optimal solution exists, Phase II produces a sequence of feasible solutions with cost $c^T x$ that tends to minus infinity.

The idea of the proof seems absurd at first: we start by assuming we can do Phase II, use that to do Phase I, and then use Phase I to do Phase II. The reason this argument is not circular is that the input of Phase II is a basic feasible solution. If we have a problem for which we have one such solution, then we can run through Phase II. Instead of the original problem, we instead consider a related one for which we can find a basic feasible solution by inspection.
It is to this related problem that we apply Phase II to determine whether or not there is a basic feasible solution to the original problem; if there is, we then use that solution as an input in applying Phase II to the original problem. We proceed by appending the $m \times m$ identity matrix to $A$ to form the new matrix $A' = [A\ I]$ and consider the following new canonical linear programming



problem: minimize $z_1 + \cdots + z_m$ subject to $A'(x_1, \ldots, x_n, z_1, \ldots, z_m)^T = b$ with $x_i, z_j \ge 0$. By construction we can find a basic feasible solution: set each $x_i = 0$ and set $z_j = b_j$. Now that we have a basic feasible solution to this related problem, we can apply Phase II. The cost cannot go to negative infinity, since the sum of the $z_j$'s is at least zero. Thus, there is an optimal solution and there are two cases. If the sum is zero, then we have found a feasible solution to our original problem (as the only way the sum vanishes is if each $z_j = 0$). If the sum is positive, then at least one of the $z_j$'s is nonzero. This proves that there cannot be a feasible solution to the original problem, for if there were, that would correspond to a solution with all $z_j = 0$ and hence lower cost.

For a full analysis of Phase II, see [2]; we summarize the main ideas here through an application mentioned earlier: the diet problem. The main idea is that the space of feasible solutions is given by the intersection of regions above or below hyperplanes arising from the constraints. The optimal solution, if it exists, is either in the interior or on the boundary. One then shows that if this minimum value is attained inside, that same value is attained somewhere on the boundary and thus it suffices to investigate these points. Consequently, the search for an optimal solution is reduced to the "faces" of our boundary. The power of the simplex method is that it also gives a very efficient way to flow to the optimal solution. We give an example of one such path in Figure 1 and illustrate what is happening by looking at a two-dimensional example.

Consider the diet problem, in which we have two foods with two nutrients (iron and protein).
• The first food costs $20 per unit. Each unit contains 30 units of iron and 15 units of protein.
• The second food costs $2 per unit. Each unit contains 5 units of iron and 10 units of protein.

Assume we need 60 units of iron and 70 units of protein daily to remain alive. If we buy $x_1$ units of the first and $x_2$ units of the second, the constraints become
$$30x_1 + 5x_2 \ge 60 \quad \text{(iron)},$$
$$15x_1 + 10x_2 \ge 70 \quad \text{(protein)},$$
$$x_1, x_2 \ge 0.$$
The first two constraints reflect how much iron and protein we consume, ensuring we meet the minimum requirements. The third constraint prevents us from eating negative quantities. We want to minimize the cost $C$, which is given by $C = 20x_1 + 2x_2$. We illustrate the search for the optimal solution in Figure 2. Changing the value of $C$ shifts the cost line up or down; thus different values generate a family of parallel cost lines. Any two points on one of these lines have the same cost. If we are at an interior point in the feasible region, we can flow down the cost line going through it until we reach the boundary without changing the cost. Thus, there cannot be a diet at an interior point cheaper than all the boundary diets. Next, we



Figure 1. An example of a feasible space for a linear programming problem and a path generated by the simplex algorithm to reach an optimum solution. The set of solutions is a convex polytope, and an optimum solution (if it exists) must be one of the vertices.

Figure 2. An illustration of the simplex method for the two-food diet problem: (a) the region of feasible solutions; (b) searching for the cheapest diet.

can shift the cost line down and to the left and lower the cost. Doing so lands us at one of the three vertices (unless the slope of the cost line equals the slope of one of the boundary lines, but even in that case we would still have the value at a vertex equal to the minimal cost). These arguments generalize to a large class of linear programming problems and show that the optimal solution occurs at a boundary; all that remains is to find a method to quickly reach such a point.

Bibliography

[1] G. Dantzig, The diet problem, Interfaces 20 (1990), no. 4, 43–47.
[2] J. Franklin, Methods of mathematical economics: Linear and nonlinear programming, fixed-point theorems, Undergraduate Texts in Mathematics, Springer-Verlag, New York-Berlin, 1980. MR602694
[3] S. J. Miller, Mathematics of optimization: how to do things faster, Pure and Applied Undergraduate Texts, vol. 30, American Mathematical Society, Providence, RI, 2017. MR3729274
[4] Wikipedia, Simplex algorithm, https://en.wikipedia.org/wiki/Simplex_algorithm.


Elementary Proof of the Prime Number Theorem

Introduction

The prime number theorem states that the number of primes at most $x$, denoted $\pi(x)$, is asymptotic to $x/\log x$:
$$\lim_{x \to \infty} \frac{\pi(x)}{x/\log x} = 1;$$

see the 1919 and 1933 entries. First conjectured in the 1790s, it was not proved until almost 100 years later, when Jacques Hadamard and Charles Jean de la Vallée Poussin (1866–1962) independently established it in 1896. They both used complex analysis to understand the distribution of zeros of the Riemann zeta function (see the 1928, 1933, 1939, 1942, 1945, 1967, and 1987 entries). Since the prime number theorem is a statement about integers and not about complex analysis, these proofs were unsatisfactory to some. It felt unnatural to use complex numbers to study primes.¹ However, it was commonly believed that complex analysis or other similarly "deep" methods were needed to prove it. According to G. H. Hardy (see the 1940 entry):

No elementary proof of the prime number theorem is known, and one may ask whether it is reasonable to expect one. Now we know that the theorem is roughly equivalent to a theorem about an analytic function, the theorem that the Riemann zeta function has no roots on a certain line. A proof of such a theorem, not fundamentally dependent on the theory of functions, seems to me extraordinarily unlikely.

It took almost fifty years for an elementary proof (that is, one that does not rely on complex analysis) to be found. This was done by Paul Erdős [3] (see the 1913 entry) and Atle Selberg [9] in 1948. The story of who contributed what and when, and who should receive what credit, has been the subject of many heated discussions. Dorian Goldfeld (1947– ), who knew the players involved, has written a good description of what happened [5]. See also [6] for a motivated account of the proof.

The term "elementary" should not be confused with "easy." The elementary proofs of the prime number theorem are longer, more technical, and provide less accurate estimates about $\pi(x)$ than the complex analysis proofs do. In fact, the classical approach is still preferred in most textbooks. We devote the remainder of this entry to discussing the traditional, complex analysis approach.

¹ A famous, humorous dictum is that the shortest path between two statements involving real numbers is through the complex plane; that is certainly the case here.
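Before turning to the proofs, it is instructive to check the theorem numerically; a quick sketch (pure Python; the sieve is the standard one of Eratosthenes, not taken from this entry):

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: return a list of all primes <= n."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return [p for p in range(2, n + 1) if is_prime[p]]

x = 10 ** 6
pi_x = len(primes_up_to(x))
print(pi_x)                      # 78498
print(pi_x / (x / math.log(x)))  # about 1.084: the ratio approaches 1 very slowly
```

The slow convergence of the ratio is one reason sharper approximations, such as the logarithmic integral, are preferred in practice.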




In Riemann's seminal 1859 paper [8], he showed that knowledge of the zeros of the zeta function yields information about $\pi(x)$; see the 1942 entry. The fact that the zeta function enjoys the Euler product representation (1933.3) suggests the use of logarithmic derivatives. Recall that the logarithmic derivative of $f$ is
$$(\log f)' = \frac{f'}{f}.$$
The logarithmic derivative of a product is a sum of logarithmic derivatives:
$$\frac{(fg)'}{fg} = \frac{f'g + fg'}{fg} = \frac{f'}{f} + \frac{g'}{g};$$
the same holds true for products with three or more factors. With appropriate limit arguments, one can use this technique to study the Euler product (1933.3) representation of the zeta function. The following theorem from complex analysis permits us to pass from knowledge of the logarithmic derivative of a function to knowledge of the number and location of the roots of that function.

Theorem: Let $\Omega$ be a nonempty, connected open set in $\mathbb{C}$ and let $\gamma$ be a simple closed curve in $\Omega$ with its interior in $\Omega$. Let $f : \Omega \to \mathbb{C}$ be analytic with no zeros on $\gamma$ and let $g : \Omega \to \mathbb{C}$ be analytic. Then $f$ has finitely many zeros in the interior of $\gamma$ and
$$\frac{1}{2\pi i} \int_\gamma \frac{f'(s)}{f(s)}\, g(s)\, ds = \sum_{f(\rho) = 0} g(\rho), \tag{1948.1}$$
in which $\rho$ runs over the zeros of $f$, repeated according to multiplicity.

Up to lower order terms, when we integrate $f(s) = \zeta'(s)/\zeta(s)$ times $g(s) = x^s/s$ along the line² $\operatorname{Re} s = 2$, we find (as is customary in number theory, $p$ always denotes a prime number)
$$\sum_{p \le x} \log p = x - \sum_{\rho} \frac{x^\rho}{\rho}. \tag{1948.2}$$


Some care is needed in writing down the sum so that it converges (this is typically done by summing the zeros in complex-conjugate pairs). As remarked in the 1942 entry, the analytic continuation of $\zeta(s)$ has only a single pole³, which is simple and at $s = 1$ with residue 1; this is responsible for the $x = x^1/1$ term in (1948.2). The remaining terms come from the zeros of $\zeta(s)$. One can show that these zeros have real part at most 1 without too much trouble; this follows from the convergence of the Euler product (1933.3). The prime number theorem asserts that
$$\sum_{p \le x} 1 \sim \frac{x}{\log x},$$

² The line $\operatorname{Re} s = 2$ is not a simple closed curve. However, it is when viewed as a curve that passes through $\infty$. Suitable limit arguments and the Riemann sphere model of the complex plane (see the 1956 entry) are required to push this through.
³ A pole of $f$ is an isolated singularity $s_0$ around which $f$ behaves like a constant times $(s - s_0)^{-k}$ for some natural number $k$. The pole is simple if $k = 1$. The residue of $f$ at a simple pole $s_0$ is $\lim_{s \to s_0} (s - s_0) f(s)$.



which partial summation (the discrete analogue of integration by parts) confirms is equivalent to
$$\sum_{p \le x} \log p \sim x.$$

Then use the Euler product formula (1933.3) to show that $\zeta(\sigma + it) \ne 0$ if $\sigma > 1$. We are left with the case $\sigma = 1$. This was originally independently proved by Hadamard and de la Vallée Poussin in 1896; fill in the details of Mertens's elegant proof from a few years later by proving the following statements.

(a) $3 + 4\cos\theta + \cos 2\theta \ge 0$. (Hint: Consider $(\cos\theta + 1)^2$.)

(b) For $s = \sigma + it$,
$$\log \zeta(s) = \sum_{p} \sum_{k=1}^{\infty} \frac{p^{-k\sigma}}{k}\, e^{-itk \log p}.$$

(c) $$\operatorname{Re} \log \zeta(s) = \sum_{p} \sum_{k=1}^{\infty} \frac{p^{-k\sigma}}{k} \cos\bigl(t \log p^k\bigr).$$

(d) $3 \log \zeta(\sigma) + 4 \operatorname{Re} \log \zeta(\sigma + it) + \operatorname{Re} \log \zeta(\sigma + 2it) \ge 0$.

(e) $\zeta(\sigma)^3\, |\zeta(\sigma + it)^4\, \zeta(\sigma + 2it)| \ge 1$.

(f) If $\zeta(1 + it) = 0$, then as $\sigma$ decreases to 1 from above, $|\zeta(\sigma + it)| < A(\sigma - 1)$ for some $A$.

(g) Since $\zeta(\sigma) \sim (\sigma - 1)^{-1}$ (because $\zeta(s)$ has a simple pole of residue 1 at $s = 1$) and $\zeta(\sigma + 2it)$ is bounded as $\sigma \to 1$ (the only pole of $\zeta(s)$ is at $s = 1$), the preceding implies that if $\zeta(1 + it) = 0$, then as $\sigma \to 1$,
$$\zeta(\sigma)^3\, |\zeta(\sigma + it)^4\, \zeta(\sigma + 2it)| \to 0.$$
Since the product must be at least 1, this proves $\zeta(1 + it) \ne 0$.

The key to Mertens's proof is the positivity of the trigonometric expression in (a).
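Following the hint in (a), the inequality is the identity $3 + 4\cos\theta + \cos 2\theta = 2(1 + \cos\theta)^2$; a quick numerical sanity check (a sketch, not part of the proof):

```python
import math

# Verify 3 + 4cos(t) + cos(2t) = 2(1 + cos(t))^2 >= 0 on a grid of angles.
for k in range(1000):
    t = 2 * math.pi * k / 1000
    lhs = 3 + 4 * math.cos(t) + math.cos(2 * t)
    assert abs(lhs - 2 * (1 + math.cos(t)) ** 2) < 1e-12
    assert lhs >= -1e-12  # nonnegative up to floating-point error
print("identity verified on 1000 sample angles")
```

The identity follows from the double-angle formula $\cos 2\theta = 2\cos^2\theta - 1$.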



Bibliography

[1] M. Aigner and G. M. Ziegler, Proofs from The Book, 6th ed., see corrected reprint of the 1998 original [MR1723092]; including illustrations by Karl H. Hofmann, Springer, Berlin, 2018. MR3823190
[2] D. Burt, S. Donow, S. J. Miller, M. Schiffman, and B. Wieland, Irrationality measure and lower bounds for π(X), Pi Mu Epsilon J. 14 (2017), no. 7, 421–429. http://arxiv.org/pdf/0709.2184.pdf. MR3726946
[3] P. Erdős, On a new method in elementary number theory which leads to an elementary proof of the prime number theorem, Proc. Nat. Acad. Sci. U. S. A. 35 (1949), 374–384, DOI 10.1073/pnas.35.7.374. MR0029411
[4] H. Furstenberg, On the infinitude of primes, Amer. Math. Monthly 62 (1955), 353, DOI 10.2307/2307043. MR0068566
[5] D. Goldfeld, The elementary proof of the prime number theorem: an historical perspective, http://www.math.columbia.edu/~goldfeld/ErdosSelbergDispute.pdf.
[6] N. Levinson, A motivated account of an elementary proof of the prime number theorem, Amer. Math. Monthly 76 (1969), 225–245, DOI 10.2307/2316361. http://www.maa.org/sites/default/files/images/upload_library/22/Ford/NormanLevinson.pdf. MR0241372
[7] S. J. Miller and R. Takloo-Bighash, An invitation to modern number theory, with a foreword by Peter Sarnak, Princeton University Press, Princeton, NJ, 2006. MR2208019
[8] G. F. B. Riemann, Über die Anzahl der Primzahlen unter einer gegebenen Größe, Monatsber. Königl. Preuss. Akad. Wiss. Berlin (1859), 671–680. http://www.maths.tcd.ie/pub/HistMath/People/Riemann/Zeta/EZeta.pdf.
[9] A. Selberg, An elementary proof of the prime-number theorem, Ann. of Math. (2) 50 (1949), 305–313, DOI 10.2307/1969455. http://www.jstor.org/stable/1969455. MR0029410
[10] T. Tao, Mertens' theorems, https://terrytao.wordpress.com/2013/12/11/mertens-theorems/.


Beurling's Theorem

Introduction

The study of linear transformations between finite-dimensional spaces is the purview of linear algebra. Analysis rarely enters into the discussion because it is possible to show any two norms on the vector space $\mathbb{R}^n$ are essentially the same. To be more specific, they give rise to the same open sets, closed sets, convergent sequences, continuous functions, and so forth. The study of linear transformations between infinite-dimensional normed vector spaces is called operator theory. There are lots of things that can go wrong when one steps up to the infinite-dimensional setting; we examine a few of them below. The interplay between linear algebra and analysis is one of the great appeals of the subject.

A norm on a vector space $V$ is a function $\|\cdot\| : V \to [0, \infty)$ that satisfies
(a) $\|v\| = 0$ if and only if $v = 0$,
(b) $\|cv\| = |c|\,\|v\|$ for all $v \in V$ and all scalars $c$, and
(c) $\|u + v\| \le \|u\| + \|v\|$ for all $u, v \in V$.
The Euclidean norm $\|x\| = (x_1^2 + x_2^2 + \cdots + x_n^2)^{1/2}$ on $\mathbb{R}^n$ is an example, as are
$$\|x\|_1 = |x_1| + |x_2| + \cdots + |x_n| \quad\text{and}\quad \|x\|_\infty = \max\bigl\{|x_1|, |x_2|, \ldots, |x_n|\bigr\}, \tag{1949.1}$$
in which $x = (x_1, x_2, \ldots, x_n)$; see Figure 1.

Consider the space $C[a, b]$ of continuous, real-valued functions $f : [a, b] \to \mathbb{R}$, endowed with the norm $\|f\|_\infty = \sup_{x \in [a, b]} |f(x)|$.

This induces a metric $d_\infty(f, g) = \|f - g\|_\infty$ with respect to which $C[a, b]$ is a complete metric space (a sequence converges with respect to $d_\infty$ if and only if it converges uniformly). Let $P$ denote the subspace of all polynomial functions; it is infinite dimensional since it contains polynomials of every finite degree. The celebrated Weierstrass approximation theorem (see the comments) asserts that $P$ is dense in $C[a, b]$: for each $f \in C[a, b]$, there is a sequence of polynomials $p_n$ that converges to $f$ with respect to $d_\infty$. In finite-dimensional spaces, this sort of thing is impossible: a proper subspace cannot be dense in the whole space. For example, a plane through the origin cannot be dense in $\mathbb{R}^3$.

An often used, but underappreciated, result in linear algebra is: if $A$, $B$ are $n \times n$ matrices and $AB = I$, then $BA = I$; that is, a right inverse is a left inverse,



Figure 1. The closed unit balls for the norms (1949.1) on $\mathbb{R}^2$: (a) $\{x \in \mathbb{R}^2 : \|x\|_1 \le 1\}$ and (b) $\{x \in \mathbb{R}^2 : \|x\|_\infty \le 1\}$. These balls have corners. The corners are extreme points: they do not lie in any open line segment that joins two points of the closed ball. Here $e_1 = (1, 0)$ and $e_2 = (0, 1)$.

and vice versa. This fails miserably in infinite dimensions. If
$$A = \begin{pmatrix} 0 & 1 & 0 & 0 & \cdots \\ 0 & 0 & 1 & 0 & \cdots \\ 0 & 0 & 0 & 1 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 0 & 0 & 0 & \cdots \\ 1 & 0 & 0 & \cdots \\ 0 & 1 & 0 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}, \tag{1949.2}$$


then one can verify that $AB = I$ and $BA \ne I$. However, we should be a bit more formal about the space upon which these matrices act. Consider the complex vector space $\ell^2(\mathbb{N})$ of all complex, square-summable infinite sequences $x = (x_0, x_1, \ldots)$; the norm on $\ell^2(\mathbb{N})$ is $\|x\| = \bigl(\sum_{n=0}^{\infty} |x_n|^2\bigr)^{1/2}$. The matrices $A$ and $B$ from (1949.2) operate on $x \in \ell^2(\mathbb{N})$ as follows:
$$A(x_0, x_1, \ldots) = (x_1, x_2, \ldots) \quad\text{and}\quad B(x_0, x_1, \ldots) = (0, x_0, x_1, \ldots).$$
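On finitely supported sequences, represented here as Python lists (a sketch; the function names are ours), the one-sided failure of invertibility is easy to see:

```python
def backward_shift(x):  # A: (x0, x1, x2, ...) -> (x1, x2, ...)
    return x[1:]

def forward_shift(x):   # B: (x0, x1, ...) -> (0, x0, x1, ...)
    return [0] + x

x = [3, 1, 4, 1, 5]
print(backward_shift(forward_shift(x)) == x)  # True:  AB = I
print(forward_shift(backward_shift(x)) == x)  # False: BA loses x0, so BA != I
```

Applying $B$ and then $A$ recovers the original sequence, but applying $A$ first destroys the entry $x_0$, which $B$ cannot restore.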

That is, $A$ is the backward shift operator and $B$ is the forward shift operator. In an interesting twist, observe that $A$ is onto but not one-to-one, and $B$ is one-to-one but not onto. Linear algebra tells us that both of these situations are impossible for a linear transformation from $\mathbb{R}^n$ to itself.

At this point we insist that our subspaces are topologically closed; this avoids some peculiarities that would take us too far afield. Critical to the understanding of any linear transformation is a careful study of its invariant subspaces. These are subspaces that are mapped into themselves by an operator. For instance, each one-dimensional invariant subspace of an operator is spanned by an eigenvector. A complete understanding of the invariant subspaces of an operator on $\mathbb{C}^n$ reveals its Jordan canonical form.

What are the invariant subspaces of the forward shift operator? In other words, what are the (topologically closed) subspaces of $\ell^2(\mathbb{N})$ that $B$ maps into itself? The answer to this question, found by Arne Beurling (1905–1986) [3], requires a radical change of perspective. Instead of $\ell^2(\mathbb{N})$, we must consider the related



Hardy space H² (named after G. H. Hardy; see the 1940 entry), which consists of complex power series f(z) = ∑_{n=0}^∞ aₙzⁿ for which
\[
\|f\| = \Bigl( \sum_{n=0}^{\infty} |a_n|^2 \Bigr)^{1/2}
\]
is finite. Each function f ∈ H² is analytic (see p. 151) on the open unit disk D = {z ∈ C : |z| < 1}. The spaces ℓ²(N) and H² are fundamentally the same; they are relabelled versions of each other. The shift operator on ℓ²(N) is essentially the same as the operator that maps f(z) to zf(z): multiplication by z shifts the Taylor coefficients of f.

Beurling's theorem asserts that the nontrivial invariant subspaces for the forward shift operator are all of the form uH² = {uf : f ∈ H²}, in which u is an inner function. What is an inner function? An inner function is a bounded analytic function on D whose boundary values on the unit circle have absolute value 1 "almost everywhere." For example, a Möbius transformation (see the 1956 entry) of the form u(z) = (a − z)/(1 − āz) with a ∈ D is inner, as is any finite product of such functions. An important factorization theorem asserts that each inner function factors as
\[
e^{i\gamma} z^{N} \prod_{n=1}^{\infty} \frac{|z_n|}{z_n}\,\frac{z_n - z}{1 - \overline{z_n}\,z}\,
\exp\Bigl( -\int_{-\pi}^{\pi} \frac{e^{it} + z}{e^{it} - z}\, d\mu(e^{it}) \Bigr),
\]
in which γ ∈ [0, 2π) is a real constant, N ≥ 0, z₁, z₂, … is a (possibly finite or vacuous) list of points in D that satisfy the Blaschke condition ∑_{n=1}^∞ (1 − |zₙ|) < ∞ (this ensures that the infinite product converges on D), and dμ is a nonnegative singular measure [6]. This is a lot to digest! The point to take away is that Beurling's theorem provides an unexpected link between operator theory and complex analysis. Moreover, the solution to a concrete theorem about a specific linear transformation boils down to a deep factorization theorem for a certain class of analytic functions. For an even more unexpected link between two areas of mathematics, see the 1985 entry.

Centennial Problem 1949
Proposed by Stephan Ramon Garcia, Pomona College.

Give a complete description of the (topologically closed) invariant subspaces for the Volterra integration operator
\[
[Vf](x) = \int_0^x f(t)\,dt
\]
on C[0, 1]. In particular, show that V has no eigenvalues.
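Although the problem calls for a proof, a small symbolic experiment hints at why no nonzero eigenvalue can exist: iterating V on the constant function 1 produces xⁿ/n!, whose sup norm decays at factorial speed, a shadow of the fact that ‖Vⁿ‖¹ᐟⁿ → 0. A sketch with exact rational coefficients (representing polynomials by coefficient lists is our own device, not part of the problem):

```python
from fractions import Fraction

def volterra(coeffs):
    # V maps sum_k c_k x^k to sum_k c_k x^(k+1)/(k+1);
    # coeffs[k] holds the coefficient c_k.
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(coeffs)]

f = [Fraction(1)]          # the constant function 1
for n in range(1, 6):
    f = volterra(f)        # now f represents x^n / n!
    print(n, sum(f))       # sup norm on [0,1] is the value at x = 1
```

An eigenvalue equation Vf = λf with λ ≠ 0 would force Vⁿf = λⁿf, which cannot shrink this fast, and Vf = 0 forces f = 0 by differentiation.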

1949: Comments

Left and right inverses. Here are two proofs that left and right inverses coincide for n × n matrices [4]. Both involve finite dimensionality in a crucial way and avoid the unnecessary use of determinants. You may wish to consider at which point the proofs break down for the infinite matrices in (1949.2).



(a) Let A, B ∈ Mₙ. If AB = I, then A(Bx) = x for all x ∈ Rⁿ. Thus, column space A = Rⁿ and hence null space A = {0} by the dimension theorem. Let I − BA = [x₁ x₂ … xₙ] be written columnwise. Then
\[
[Ax_1\; Ax_2\; \ldots\; Ax_n] = A[x_1\; x_2\; \ldots\; x_n] = A(I - BA) = A - (AB)A = A - IA = 0,
\]
so x₁ = x₂ = ⋯ = xₙ = 0 since A has trivial null space. Thus, I − BA = 0 and BA = I.

(b) Let A, B ∈ Mₙ and suppose that AB = I. The n² + 1 matrices I, A, A², …, A^{n²} in Mₙ are linearly dependent, so there is a polynomial p of degree at most n² such that p(A) = 0. Write p(z) = czʲf(z), in which j ≥ 0, c ∈ C is nonzero, and f(z) = aₖzᵏ + aₖ₋₁zᵏ⁻¹ + ⋯ + a₁z + a₀ has a₀ ≠ 0. Since AB = I implies AʲBʲ = I, and f(A) commutes with Aʲ, we find that 0 = p(A)Bʲ = cAʲf(A)Bʲ = cf(A)AʲBʲ = cf(A), so f(A) = 0. Since AB = I,
\[
0 = f(A)B = (a_k A^k + a_{k-1}A^{k-1} + \cdots + a_1 A + a_0 I)B
= (a_k A^{k-1} + a_{k-1}A^{k-2} + \cdots + a_1 I) + a_0 B.
\]
Thus, B is a polynomial in A, so BA = AB = I.

Weierstrass approximation theorem. While a proof of Beurling's theorem would take us too far afield (see [5] for a modern approach), we can outline an elegant proof of the Weierstrass approximation theorem that is due to Sergei Bernstein (1880–1968) [2]. It suffices to consider [a, b] = [0, 1] since the linear function φ : [a, b] → [0, 1] defined by φ(x) = (x − a)/(b − a) maps polynomials to polynomials and so does its inverse. Let f ∈ C[0, 1]. The nth Bernstein polynomial for f is
\[
(B_n f)(x) = \sum_{k=0}^{n} f\Bigl(\frac{k}{n}\Bigr) \binom{n}{k} x^k (1-x)^{n-k};
\tag{1949.3}
\]

it is a polynomial of degree at most n. Let x ∈ [0, 1] and consider a coin flip with probability x of heads and 1 − x of tails. The probability of k heads in n trials is P(k, n) = C(n, k) xᵏ(1 − x)ⁿ⁻ᵏ. What do the Bernstein polynomials (1949.3) represent? Think of f ∈ C[0, 1] as the "payoff function" for a coin-tossing game: if there are k heads in n trials, then you win f(k/n) dollars. The expected winnings are (Bₙf)(x). For large n, we expect k ≈ nx heads, so
\[
(B_n f)(x) = \text{expected winnings after $n$ tosses} \approx f(k/n) \approx f(x).
\]
The uniform continuity of f on [0, 1] and some probability theory ensure that this informal reasoning can be pushed through.

The Müntz–Szász theorem. Recall that the span of a set of vectors is the collection of all finite linear combinations of elements of that set. The Weierstrass approximation theorem says that span{1, x, x², …} is dense in C[a, b]. This suggests the following question. Let 0 = λ₀ < λ₁ < λ₂ < ⋯. What are necessary and sufficient conditions for span{1, x^{λ₁}, x^{λ₂}, …} to be dense in C[a, b]?
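Before turning to the answer, note that the convergence Bₙf → f suggested by the coin-flipping heuristic is easy to observe numerically (a sketch; the test function f(x) = |x − 1/2| and the degrees are arbitrary choices):

```python
from math import comb

def bernstein(f, n, x):
    # (B_n f)(x) = sum_{k=0}^n f(k/n) C(n,k) x^k (1-x)^(n-k), as in (1949.3)
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

f = lambda x: abs(x - 0.5)          # a continuous but non-smooth test function
for n in (10, 50, 250):
    grid = [i / 100 for i in range(101)]
    err = max(abs(bernstein(f, n, x) - f(x)) for x in grid)
    print(n, err)                   # the sup-norm error shrinks as n grows
```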



This question has a precise and elegant answer, due independently to Herman Müntz (1884–1956) [9] and Otto Szász (1884–1952) [11]. Let S = span{1, x^{λ₁}, x^{λ₂}, …}⁻ denote the closure of span{1, x^{λ₁}, x^{λ₂}, …} in C[a, b], in which a > 0.

(a) If ∑_{n=1}^∞ 1/λₙ = ∞, then S = C[a, b].

(b) If ∑_{n=1}^∞ 1/λₙ < ∞ and if λ ∉ {λₙ}_{n=0}^∞, then x^λ ∉ S (so S is not dense in C[a, b]).
The proof of the Müntz–Szász theorem is beyond the scope of this course. Its key ingredients are the Hahn–Banach theorem and Riesz representation theorem from functional analysis and the Blaschke characterization of the zero sets of bounded analytic functions on the unit disk. A proof can be found in [8]. Here are two curious corollaries of the Müntz–Szász theorem. Let a > 0.

(a) If C[a, b] = span{1, x^{λ₁}, x^{λ₂}, …}⁻, then there is an infinite subsequence of the λₙ that can be removed from the collection {1, x^{λ₁}, x^{λ₂}, …} so that the span of the new collection is also dense in C[a, b].

(b) span{1, x², x³, x⁵, x⁷, x¹¹, x¹³, x¹⁷, x¹⁹, …} is dense in C[a, b]. This follows from Euler's proof that the sum of the reciprocals of the primes diverges; see the notes for the 1913 entry.

Solution to the problem. Now for the answer to our question, which (in the L²[0, 1] Hilbert space setting) was first asked by Israel Gelfand [7]. The invariant subspaces of the Volterra integration operator are all of the form {f ∈ C[0, 1] : f(x) = 0 for x ∈ [0, a)} for some a ∈ [0, 1]. Thus, the invariant subspaces form an uncountable, linearly ordered chain of subspaces of C[0, 1]. This result was established in 1949 by Shmuel Agmon (1922– ) [1], with later proofs given by several other authors (the most influential proof being that of Donald Sarason (1933–2017) [10]).

Bibliography

[1] S. Agmon, Sur un problème de translations (French), C. R. Acad. Sci. Paris 229 (1949), 540–542. MR0031110
[2] S. N. Bernstein, Démonstration du théorème de Weierstrass, fondée sur le calcul des probabilités, Commun. Soc. Math. Kharkow (2) 13, 1–2.
[3] A. Beurling, On two problems concerning linear transformations in Hilbert space, Acta Math. 81 (1948), 17, DOI 10.1007/BF02395019. MR0027954
[4] S. R. Garcia and R. A. Horn, A Second Course in Linear Algebra, Cambridge Mathematical Textbooks, Cambridge University Press, 2017.
[5] S. R. Garcia, J. Mashreghi, and W. T. Ross, Introduction to model spaces and their operators, Cambridge Studies in Advanced Mathematics, vol. 148, Cambridge University Press, Cambridge, 2016. MR3526203
[6] J. B. Garnett, Bounded analytic functions, 1st ed., Graduate Texts in Mathematics, vol. 236, Springer, New York, 2007. MR2261424
[7] I. M. Gelfand, A problem (Russian), Uspehi Matem. Nauk 5 (1938), 233.
[8] P. D. Lax, Functional analysis, Pure and Applied Mathematics (New York), Wiley-Interscience [John Wiley & Sons], New York, 2002. MR1892228



[9] C. H. Müntz, Über den Approximationssatz von Weierstrass, H. A. Schwarz's Festschrift (1914), 303–31.
[10] D. Sarason, A remark on the Volterra operator, J. Math. Anal. Appl. 12 (1965), 244–246, DOI 10.1016/0022-247X(65)90035-1. MR0192355
[11] O. Szász, Über die Approximation stetiger Funktionen durch lineare Aggregate von Potenzen (German), Math. Ann. 77 (1916), no. 4, 482–496, DOI 10.1007/BF01456964. MR1511875


Arrow's Impossibility Theorem

Introduction

Kenneth J. Arrow (1921–2017) was awarded the Nobel Prize in Economics in 1972. Among the contributions cited in the prize committee's statement was the "possibility theorem" from his doctoral dissertation on voting theory that was published as the book Social Choice and Individual Values [1–3]. Arrow set out to determine the best election procedure and narrowed the set of all procedures by requiring them to satisfy a number of desirable properties. These properties were called axioms because they represented what Arrow believed were, in some sense, the most natural properties that an election procedure should satisfy. Arrow showed that no election procedure satisfies the axioms, which we describe below, when two or more voters decide among three or more candidates. That is, the axioms are inconsistent (see the notes for the 1924 entry for another example of inconsistent axioms). His result is now referred to as Arrow's impossibility theorem.

Assume that each of m ≥ 2 voters can rank order n ≥ 3 candidates, listing them from most preferred to least preferred. An election procedure aggregates the voters' rankings and produces a societal ranking of the candidates. A version of Arrow's theorem from 1963 (the second edition of [1]) says that there is no election procedure that satisfies the following three axioms.

• Pareto condition: If every voter prefers A over B, then the group ranks A above B.

• Nondictatorship: There is not a single voter who is able to determine the group's rankings (that is, there is no dictator).

• Independence of Irrelevant Alternatives (or IIA): The societal ranking between candidates A and B should only depend on the voters' preferences for A and B.

The third axiom perhaps requires a bit more explanation. It asserts that for a society to rank A and B, it is irrelevant to factor in how the voters rank other candidates. For example, suppose that several voters change the relative rankings of B and C.
This should not affect how the society ranks A and B. It may, of course, affect how the society ranks B and C.

To appreciate Arrow's theorem, we need to go back to the beginnings of voting theory. Nicolas de Condorcet (1743–1794), whose full name was Marie Jean Antoine Nicolas de Caritat, Marquis de Condorcet, was a French mathematician, political scientist, and philosopher. Although he published several papers on differential and integral calculus, in mathematics he is most famous for observing a fundamental paradox in voting theory, which we describe below. He died in prison under mysterious circumstances (some suggest poison) during the French Revolution.



A candidate is the Condorcet winner if the candidate defeats every other candidate in a pairwise election (by being preferred by more than half of the voters to every other candidate in a head-to-head competition). However, not every collection of voters' preferences has a Condorcet winner. In his 1785 paper Essai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix, Condorcet made a remarkable observation. Suppose that three voters have the following preferences for candidates A, B, and C:

                  voter 1   voter 2   voter 3
  First choice       A         B         C
  Second choice      B         C         A
  Third choice       C         A         B

Suppose that C is removed from consideration. Then we have the voter preferences

                  voter 1   voter 2   voter 3
  First choice       A         B         A
  Second choice      B         A         B

Consequently, A would defeat B in a pairwise election (denoted A ≻ B) because A would receive two first-choice votes (from voters 1 and 3) and B would receive only one such vote (from voter 2). Similarly, B would defeat C in a pairwise election and C would defeat A. Notationally, this is represented by A ≻ B ≻ C ≻ A and is referred to as a Condorcet cycle. A Condorcet cycle can involve more candidates. For example, we might have A ≻ B ≻ D ≻ E ≻ C ≻ A. If a Condorcet cycle contains all of the candidates in an election, then that election does not have a Condorcet winner. An election procedure satisfies the Condorcet winner criterion (CWC) if the following holds: If a Condorcet winner exists, then the election procedure always has the Condorcet winner ranked first.
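The pairwise tallies behind the cycle can be verified mechanically; a short sketch using the three ballots from the table above:

```python
# Condorcet's three ballots, most preferred candidate first.
ballots = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]

def pairwise_winner(x, y):
    # x wins the head-to-head contest if a majority of ballots rank x above y
    x_votes = sum(b.index(x) < b.index(y) for b in ballots)
    return x if 2 * x_votes > len(ballots) else y

print(pairwise_winner("A", "B"))   # A (beats B two votes to one)
print(pairwise_winner("B", "C"))   # B
print(pairwise_winner("C", "A"))   # C -- together, the cycle A > B > C > A
```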

A weaker and easily accessible version of Arrow’s impossibility theorem requires just two axioms, IIA and the Condorcet winner criterion (CWC), but also supposes that the election procedure returns a top-ranked candidate; see [5, p. 343] for details.

Centennial Problem 1950
Proposed by Michael Jones, Mathematical Reviews.

It is possible to show that an election procedure that satisfies IIA and CWC cannot return a single, top-ranked candidate for the three-voter Condorcet cycle above. This idea can be extended. In an election between n candidates, a set of candidates C is a top cycle if the candidates in C all defeat the candidates not in



C in pairwise contests and if there is a Condorcet cycle among all candidates in C. For example, the Condorcet cycle for the three-voter, three-candidate case above is a top cycle. For three candidates, there are two possible top cycles that involve all three candidates: A ≻ B ≻ C ≻ A and A ≻ C ≻ B ≻ A. For n-candidate elections, how many top cycles are possible?

1950: Comments

Penrose–Banzhaf power index. In the discussion above we gave each voter the same weight. However, this is frequently not the case in practice. Consider the following two situations for a private firm. In the first, Adams owns 90% of the stock, Buchanan owns 8%, and Cleveland has the remaining 2%. In the second, Adams has 45%, Buchanan 35%, and Cleveland 20%. How much are their shares worth in each case, assuming that if over 50% of the stock supports a plan, then that plan will be done? Adams effectively controls the company in the first setting, since she can do whatever she wants and the other two cannot outvote her. The second case is more interesting, since any two of the three suffice to control the company. Thus, in this setting, it is reasonable to say each shareholder's stake is worth a third of the company. More generally, we define the Penrose–Banzhaf power index (named after Lionel Penrose (1898–1972) [7] and John F. Banzhaf III (1940– ) [4]) as follows. A winning coalition is a set of voters that is sufficient to pass a measure. That is, if every member of a winning coalition votes "yes," then the measure is guaranteed to pass. A swing vote is an additional vote necessary for a particular coalition; without that "yes" vote, the coalition is not sufficient to pass the measure. The power index of an individual is the fraction of all possible swing votes that they cast, that is, their percentage of all swing votes in all winning coalitions.
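The definition translates directly into a short computation; here is a sketch for weighted voting with a strict-majority quota (the weights below are the two stock scenarios just described):

```python
from itertools import combinations

def banzhaf(weights, quota):
    # Penrose-Banzhaf index: each voter's share of all swing votes,
    # where a coalition wins if its total weight strictly exceeds the quota.
    n = len(weights)
    swings = [0] * n
    for r in range(1, n + 1):
        for coalition in combinations(range(n), r):
            total = sum(weights[i] for i in coalition)
            if total > quota:                       # a winning coalition
                for i in coalition:                 # i swings if its exit loses
                    if total - weights[i] <= quota:
                        swings[i] += 1
    total_swings = sum(swings)
    return [s / total_swings for s in swings]

print(banzhaf([90, 8, 2], 50))     # [1.0, 0.0, 0.0] -- Adams is a dictator
print(banzhaf([45, 35, 20], 50))   # each index is 1/3
```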
In our first example, there are four winning coalitions (any coalition with Adams), and Adams casts every swing vote, so Adams's power index is 1 while Buchanan and Cleveland both get 0. In our second example, there are four winning coalitions (those with at least two voters), and each voter has the same number of swing votes, so they all have power index 1/3. This index is useful in evaluating many real-world voting schemes, such as the United States Electoral College or the European Union.

Voting in Venice. The most convoluted voting system that the authors are aware of was the method used by the Republic of Venice to elect its Doge; see Figure 1. It was established in 1268 and remained in use until the ignominious fall of the Most Serene Republic in 1797. The details and particulars are so mind-bogglingly complicated that we have no choice but to quote John Julius Norwich (1929–2018), an authority on the subject [6, p. 166]. Surely we would get the finer points incorrect otherwise!

On the day appointed for the election, the youngest member of the Signoria was to pray in St Mark's; then on leaving the Basilica, he was to stop the first boy he met and take him to the Doges' Palace, where the Great Council, minus those of its members who were under thirty, was to be in full session. This boy, known as the ballotino, would



Figure 1. The 17th-century Basilica di Santa Maria della Salute in Venice.

have the duty of picking the slips of paper from the urn during the drawing of lots. By the first of such lots, the Council chose thirty of their own number. The second was used to reduce the thirty to nine, and the nine would then vote for forty, each of whom was to receive at least seven nominations. The forty would then be reduced, again by lot, to twelve, whose task was to vote for twenty-five, of whom each this time required nine votes. The twenty-five were in turn reduced to another nine; the nine voted for forty-five, with a minimum of seven votes each, and from these the ballotino picked out the names of eleven. The eleven now voted for forty-one—nine or more votes each—and it was these forty-one who were to elect the Doge. They first attended Mass, and individually swore an oath that they would act honestly and uprightly, for the good of the Republic. They were then locked in secret conclave in the Palace, cut off from all contact or communication with the outside world and guarded by a special force of sailors, day and night, until their work was done. So much for the preliminaries; now the election itself could begin. Each elector wrote the name of his candidate on a paper and dropped it in the urn; the slips were then removed and read, and a list drawn up of all the names proposed, regardless of the number of nominations for each. A single slip for each name was now placed in another urn, and one drawn. If the candidate concerned was present, he retired together with any other elector who bore the same surname, and the remainder proceeded to discuss his suitability. He was then called back to answer questions or to defend himself against any accusations. A ballot followed. If he obtained the required twenty-five votes, he was declared Doge; otherwise a second name was drawn, and so on. With a system so tortuously involved as this, it may seem remarkable that anyone was ever elected at all.



Solution to the Centennial Problem. A top cycle requires three or more candidates. For a set of k candidates, there are (k − 1)! circular permutations of the k candidates. Hence, there are (k − 1)! top cycles for a set of k candidates. For n-candidate elections, the number of possible top cycles is
\[
\sum_{k=3}^{n} \binom{n}{k} (k-1)!
= \sum_{k=3}^{n} \frac{n!\,(k-1)!}{k!\,(n-k)!}
= \sum_{k=3}^{n} \frac{n!}{k\,(n-k)!}.
\]
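The closed form can be double-checked by brute force: for each subset of k ≥ 3 candidates, count distinct directed cycles by canonicalizing each ordering so that its smallest element comes first (a sketch):

```python
from itertools import combinations, permutations
from math import comb, factorial

def top_cycles_formula(n):
    return sum(comb(n, k) * factorial(k - 1) for k in range(3, n + 1))

def top_cycles_brute(n):
    count = 0
    for k in range(3, n + 1):
        for subset in combinations(range(n), k):
            cycles = set()
            for p in permutations(subset):
                i = p.index(min(p))         # rotations give the same cycle,
                cycles.add(p[i:] + p[:i])   # so rotate the minimum to the front
            count += len(cycles)
    return count

for n in range(3, 7):
    print(n, top_cycles_formula(n), top_cycles_brute(n))   # the counts agree
```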



Bibliography

[1] K. J. Arrow, Social Choice and Individual Values, Cowles Commission Monograph No. 12, John Wiley & Sons, Inc., New York, N.Y.; Chapman & Hall, Ltd., London, 1951. MR0039976
[2] K. J. Arrow, Social Choice and Individual Values (second edition), Cowles Foundation Monographs Series 12, Yale University Press, 1963.
[3] K. J. Arrow, Social Choice and Individual Values (third edition), Cowles Foundation Monographs Series 12, Yale University Press, 2012. http://www.jstor.org/stable/j.ctt1nqb90.
[4] J. F. Banzhaf, Weighted voting doesn't work: a mathematical analysis, Rutgers Law Review 19 (1965), no. 2, 317–343. http://heinonline.org/HOL/Page?handle=hein.journals/rutlr19&div=19&g_sent=1&collection=journals.
[5] COMAP, For All Practical Purposes (ninth edition), W. H. Freeman and Company, 2013.
[6] J. J. Norwich, A History of Venice, Vintage Books, 1989.
[7] L. Penrose, The elementary statistics of majority voting, Journal of the Royal Statistical Society 109 (1946), no. 1, 53–57. http://www.jstor.org/stable/2981392?seq=1#page_scan_tab_contents.


Tennenbaum's Proof of the Irrationality of √2

Introduction

There are now hundreds of proofs of the irrationality of √2. Perhaps the most familiar is the following:

Suppose toward a contradiction that √2 = a/b, in which a, b are relatively prime integers and b ≠ 0. Squaring the preceding equation, we obtain 2b² = a². This shows that a² is even, so a is even too. Write a = 2c with c a positive integer, so that 2b² = (2c)² = 4c². Thus, b² = 2c². This shows that b², and hence b itself, is even. Thus, 2 divides both a and b, which is a contradiction. We conclude that √2 is irrational.

It is worth noting that we used the fact that 2 is a prime number. Indeed, if p is a prime number that divides a perfect square a², then p divides a itself. This does not hold in general since, for example, 4 divides 36 = 6², but 4 does not divide 6.

Sometime in the 1950s Stanley Tennenbaum came up with the following geometric gem (see Figure 1). Suppose toward a contradiction that √2 = a/b, in which the positive integer b is as small as possible. Consider a square with sides of length a, and draw squares of side length b in the upper-left and lower-right corners. Since a² = 2b², the area of the two squares of side length b equals that of the large square of side length a. The figure suggests that these two squares miss two small squares with side length a − b and double count a square with side length 2b − a. Consequently, the double-counted region must have the same area as the two missing squares; that is, (2b − a)² = 2(a − b)². Thus, √2 = (2b − a)/(a − b)





Figure 1. Illustration of Tennenbaum's proof of the irrationality of √2.




Figure 2. A misleading "proof by picture." The large triangles have the same area, so subtracting the areas of the four congruent pieces from each large triangle yields the same area. Thus, 0 = 1.

and a little more work shows 0 < a − b < b. This contradicts the minimality of b, so √2 is irrational. For a discussion of the history of Tennenbaum's proof, see [4].

Is Tennenbaum's proof valid? One must always be wary about "proofs by picture." There are many appealing visual "proofs" that are wrong; see Figure 2. Fortunately, the geometric intuition used in Tennenbaum's proof can be formalized. It is instructive to see what the fundamental ingredients are and how the proof can proceed without the use of diagrams. Suppose toward a contradiction that √2 is rational and let b be the smallest natural number so that √2 = a/b for some integer a. More explicitly, the well-ordering principle ensures that
\[
b = \min\{\, n \in \mathbb{N} : \sqrt{2}\,n \in \mathbb{Z} \,\}
\]
exists. We claim that a > b; if not, then we obtain a contradiction: a ≤ b


implies
\[
2b^2 = a^2 \le b^2 \quad (\text{since } \sqrt{2} = a/b),
\]
and hence 2 ≤ 1.

Since 2 > 1, we must have a > b. A few algebraic manipulations lead us to another representation of √2 as a quotient of integers:
\[
\sqrt{2} = \sqrt{2}\cdot\frac{\sqrt{2}-1}{\sqrt{2}-1}
= \frac{2-\sqrt{2}}{\sqrt{2}-1}
= \frac{2-\frac{a}{b}}{\frac{a}{b}-1}
= \frac{2b-a}{a-b}.
\tag{1951.1}
\]
Since a − b > 0 and √2 > 0, it follows from the preceding that 2b − a > 0 and hence 0 < a − b < b. Because (1951.1) contradicts the minimality of b, we conclude that √2 is irrational.
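The area identity that powers the descent is one line of algebra: the point is that the map (a, b) ↦ (2b − a, a − b) sends solutions of a² = 2b² to solutions:

```latex
(2b-a)^2 - 2(a-b)^2
  = \bigl(4b^2 - 4ab + a^2\bigr) - \bigl(2a^2 - 4ab + 2b^2\bigr)
  = 2b^2 - a^2 .
```

Hence a² = 2b² forces (2b − a)² = 2(a − b)², which is exactly the "double-counted square equals the two missing squares" step in Figure 1.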



Centennial Problem 1951
Proposed by Steven J. Miller, Williams College.

Tennenbaum's construction is beautiful and gives the irrationality of √2. Steven J. Miller and David Montague used similar geometric arguments to get the irrationality of √3, √5, √6, and √10 [3]. Can you geometrically prove the irrationality of √7 or ∛2?

1951: Comments

The square root of 2 has been called the Rome of mathematics, for all roads lead to it [2, p. 207]. Here are some lesser-known proofs for your enjoyment.

Linear representation of the gcd. We begin by exploring a consequence of the Euclidean algorithm: if a, b are two nonzero integers, then there exist integers x and y such that gcd(a, b) = ax + by; see [5]. In other words, the greatest common divisor of a and b is an integral linear combination of a and b. In the language of algebra, this says that the ring Z is a principal ideal domain. We can use this property of the integers to give a quick proof that √n is irrational if the natural number n is not a perfect square. Suppose that √n = a/b, in which a, b are relatively prime. Then there exist x, y ∈ Z so that 1 = ax + by. Since √n·a = nb and √n·b = a, it follows that
\[
\sqrt{n} = \sqrt{n}(ax + by) = (\sqrt{n}\,a)x + (\sqrt{n}\,b)y = nbx + ay
\]
is an integer. This contradicts the hypothesis that n is not a perfect square, so √n is irrational.

Analytic proof. Although this proof appears unnecessarily complicated, it contains techniques that can be recycled to great effect in the theory of Diophantine approximation. Assume toward a contradiction that √2 = p/q, in which p, q are integers and q ≥ 1. Let eₙ = (√2 − 1)ⁿ and observe that 0
0 and a_{kₙ,kₙ₊₁} > 0 for 1 ≤ n ≤ m − 1;

(b) is aperiodic; that is, there is at least one i ∈ Z with a_{i,i} > 0;

(c) is reversible; that is, pᵢ a_{i,j} = pⱼ a_{j,i} for all i, j ∈ Z;

(d) leaves p∞ stationary; that is, ∑_{i∈Z} pᵢ a_{i,j} = pⱼ for all j ∈ Z [Hint: Use (c)];

(e) converges to p∞ as described above. Hint: This follows from (a), (b), and (d) by the standard Markov chain convergence theorem [5, Sect. 1.8].
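As a concrete illustration of conditions (c) and (e), consider a Metropolis chain on Z with symmetric ±1 proposals and target pᵢ ∝ 2^{−|i|} (the target and proposal here are illustrative choices, not the specific chain of the problem); a sketch:

```python
import random

def p(i):
    # target distribution p_i proportional to 2^(-|i|); the normalizer is 3
    return 2.0 ** (-abs(i))

def a(i, j):
    # Metropolis transition probability to a neighbor j = i +/- 1:
    # propose j with probability 1/2, accept with probability min(1, p_j/p_i)
    return 0.5 * min(1.0, p(j) / p(i))

# Condition (c), reversibility: p_i a_{i,j} = p_j a_{j,i}.
for i in range(-20, 20):
    assert abs(p(i) * a(i, i + 1) - p(i + 1) * a(i + 1, i)) < 1e-15

# Condition (e): long-run frequencies approach the normalized target.
random.seed(0)
state, visits = 0, {}
for _ in range(100_000):
    j = state + random.choice((-1, 1))
    if random.random() < min(1.0, p(j) / p(state)):
        state = j
    visits[state] = visits.get(state, 0) + 1
print(visits[0] / 100_000)   # should be near p(0)/3 = 1/3
```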



1953: Comments

The interesting posts [6, 9] give a nice background of the history of Markov chains, some surprising examples, and code to explore. Andrey Markov (1856–1922) introduced the chains now named for him in 1913 while performing an analysis of the sequence of consonants and vowels in the work of the Russian writer Alexander Pushkin (1799–1837). In particular, he found that he could create state diagrams in which the transition probability to the next letter depended only on the previous two letters. In the intervening years these ideas have been successfully extended and applied to numerous other problems. There are many readable accounts of the history of these algorithms [1, 8]. Motivation for these extensions and improvements ranges from studying the behavior of neutrons in fissile material to estimating the probability that certain solitaire games are winnable. The applications are almost as varied. One reason for this is that we can use these ideas to estimate integrals and areas; it is desirable to be able to determine areas since these frequently correspond to probabilities. For more on this see the 1946 entry.

Bibliography

[1] D. B. Hitchcock, A history of the Metropolis–Hastings algorithm, Amer. Statist. 57 (2003), no. 4, 254–257, DOI 10.1198/0003130032413. http://www.jstor.org/stable/pdf/30037292.pdf. MR2037852
[2] N. Metropolis, The beginning of the Monte Carlo method, Stanislaw Ulam 1909–1984, Los Alamos Sci. 15, Special Issue (1987), 125–130. http://library.lanl.gov/cgi-bin/getfile?15-12.pdf. MR935771
[3] N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller, Equations of state calculations by fast computing machines, J. Chem. Phys. 21 (1953), 1087–1091.
[4] N. Metropolis and S. Ulam, The Monte Carlo method, J. Amer. Statist. Assoc. 44 (1949), 335–341. http://www.jstor.org/stable/pdf/2280232.pdf. MR0031341
[5] J. R. Norris, Markov chains, Cambridge Series in Statistical and Probabilistic Mathematics, vol. 2, reprint of 1997 original, Cambridge University Press, Cambridge, 1998. http://www.statslab.cam.ac.uk/~james/Markov/. MR1600720
[6] O. Pavlyk, Centennial of Markov Chains, Mathematica Algorithm R&D, http://blog.wolfram.com/2013/02/04/centennial-of-markov-chains/.
[7] http://probability.ca/jeff/java/rwm.html
[8] C. Robert and G. Casella, A short history of Markov chain Monte Carlo: subjective recollections from incomplete data, Statist. Sci. 26 (2011), no. 1, 102–115, DOI 10.1214/10-STS351. http://arxiv.org/pdf/0808.2902.pdf. MR2849912
[9] A. Smith, Surprising examples of Markov chains, http://mathoverflow.net/q/252671.


Kolmogorov–Arnold–Moser Theorem

Introduction

One can regard perturbation theory as a collection of various methods for obtaining approximate solutions to difficult problems based upon exact solutions to closely related, but simpler, problems. Applied mathematicians and scientists use the tools of perturbation theory to infer information about problems that model dynamical systems under the influence of gravitational or quantum forces. For example, the planet Neptune was discovered in 1846 as a result of calculations made by the French mathematician Urbain Le Verrier (1811–1877) and mathematician-astronomer John Couch Adams (1819–1892), based on the perturbations of the planet Uranus due to the gravitational influence of the then-unknown Neptune. It was a momentous day in the history of science when mathematicians told astronomers where to point their telescopes to see the first new planet discovered since 1781, sixty-five years earlier.

Around the turn of the 20th century, Henri Poincaré, expanding on the astronomer Charles-Eugène Delaunay's (1816–1872) work on the "problem of small denominators," first postulated that small perturbations can have large effects on a dynamical system. In popular culture, this is known as "chaos" or the "butterfly effect." The "problem of small denominators" refers to issues arising from potentially small quantities that appear in the denominators of the formal Fourier series constructed to solve the problem. These can cause convergence issues in the proposed perturbative series, a problem solved with the advent of KAM theory [8].

The Kolmogorov–Arnold–Moser theorem concerns the behavior of systems under small perturbations; see [3, 4, 8]. The first set of results is due to Andrey Kolmogorov (1903–1987) in 1954; these were extended in 1962 by Jürgen Moser (1928–1999) and further developed by Vladimir Arnold (1937–2010) a year later.
Essentially, the Kolmogorov–Arnold–Moser theorem provides criteria under which a system of partial differential equations has little "chaotic" behavior under small perturbations. One of the most important examples arises from physics. In the Hamiltonian formulation we have position variables q = (q₁, q₂, …, qₙ), momentum variables p = (p₁, p₂, …, pₙ), the Hamiltonian function H(p, q) (which often corresponds to the total energy of the system), and the time evolution given by
\[
\frac{dp}{dt} = -\partial_q H
\quad\text{and}\quad
\frac{dq}{dt} = \partial_p H.
\]



One then studies how the solutions evolve over time. In this setting, KAM theory states that for sufficiently small perturbations the new behavior should be close to that of the unperturbed system. See [1] for a nice perspective written fifty years after Kolmogorov's work. We can see some of the issues by looking at an example from that paper: complex linearization. Consider a map F(z) = λz + f(z) for some nonzero λ ∈ C. We wish to find a function Φ such that (Φ ∘ F)(z) = λΦ(z). If f has a series expansion, then we can formally find a series expansion for Φ. If 0 < |λ| ≠ 1, the series for Φ converges, but issues arise if |λ| = 1. In that case we may write λ = e^{2πiα} for some α ∈ R, and the behavior depends on how well approximable α is by rational numbers. We elaborate on this in the associated problem and encourage the interested reader to consult [1].
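The "small denominators" here are the quantities λ^q − 1, whose size is governed by the distance from qα to the nearest integer. The contrast between a badly approximable α and a Liouville-type α can be seen numerically (the specific numbers below are illustrative choices):

```python
from fractions import Fraction
from math import sqrt

def dist_to_int(x):
    # distance from x to the nearest integer; controls |lambda^q - 1|
    return abs(x - round(x))

# For alpha = sqrt(2), q * ||q*alpha|| stays bounded away from 0,
# so the small denominators shrink no faster than a constant over q.
worst = min(q * dist_to_int(q * sqrt(2)) for q in range(1, 100_000))
print(worst)   # stays above roughly 0.34

# For a Liouville-type alpha = 10^-1 + 10^-2 + 10^-6 + 10^-24, the
# denominator at q = 10^6 is catastrophically small (exact arithmetic).
alpha = sum(Fraction(1, 10 ** f) for f in (1, 2, 6, 24))
q = 10 ** 6
print(float(dist_to_int(q * alpha)))   # 1e-18, so |lambda^q - 1| is ~ 6.3e-18
```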

Centennial Problem 1954
Proposed by Avery T. Carr, Emporia State University, and Steven J. Miller, Williams College.

A key ingredient in KAM theory is the irrationality type of certain parameters. This concept measures how well one can approximate a given number by rational numbers. In this problem, we need a notion of how well approximable an irrational number is by rationals. We can get as good of an approximation as desired simply by taking more and more decimal digits; thus, our notion cannot simply be how far our rational approximation is from the original number. It is fruitful to measure the "cost" of a rational approximation by the size of the denominator used. This is a reasonable notion, since a large improvement using a small denominator is more impressive than a large improvement obtained with a larger denominator. For example, if we want π to 6 decimal places we could use
31415926/10000000 = 3.1415926;
however,
355/113 = 3.14159292…
also gives us 6 decimal places of accuracy while having a much smaller denominator. Such unusually good rational approximations can be found with continued fractions; see [5–7] and the entries for 1931, 1934, and 1955.

An irrational number α is of type (K, ν) (for positive K, ν) if
\[
\Bigl| \alpha - \frac{p}{q} \Bigr| > \frac{K}{q^{\nu}}
\]
for all integers p, q. In other words, we cannot approximate α too well by rationals. The following problem assumes some familiarity with measure theory; for a brief introduction to these ideas see [7, Appendix A.5].
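Approximations of this quality can be hunted down by brute force: for each denominator q, the best numerator is round(qπ), and we keep those pairs beating the 1/q² threshold of Dirichlet's theorem below. A sketch (the denominator cutoff is arbitrary):

```python
from math import gcd, pi

good = []
for q in range(1, 1001):
    p = round(q * pi)                  # best numerator for this denominator
    if gcd(p, q) == 1 and abs(pi - p / q) < 1 / q**2:
        good.append((p, q))
print(good)   # includes (22, 7) and the remarkable (355, 113)
```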



(a) Prove that for any irrational α there exist infinitely many relatively prime pairs of integers p, q such that

    |α − p/q| < 1/q^2.

This is known as Dirichlet’s approximation theorem. It implies that every irrational number can be approximated fairly well. (b) Consider all irrational numbers in [0, 1] of type (1, 2 + ε) for a fixed ε > 0. What is the measure of such numbers? More generally, what is the measure of all irrational numbers in [0, 1] that are of type (K, 2 + ε) for a fixed ε > 0 (so K is allowed to vary)?

1954: Comments

In the 1938 entry we saw how the irrationality of α affected the rate of convergence of the sequence {αⁿ} to Benford’s law. The results of this year provide another example of irrationality in action and serve as an excellent bridge to the 1955 entry on Roth’s theorem.

Dirichlet’s approximation theorem. The standard solution to problem (a) can be found in [7] and many other number theory books. The proof proceeds by Dirichlet’s box principle (the pigeonhole principle): if we place n + 1 pigeons in n boxes, at least one box must have two pigeons. We may assume that 0 < α < 1. For each Q, partition [0, 1) into Q intervals of length 1/Q: [0, 1/Q),

[1/Q, 2/Q), . . . ,

[(Q − 1)/Q, 1).

Consider qα (mod 1) for 0 ≤ q ≤ Q, the fractional parts of qα. Each must lie in one of the Q bins above. Since we have Q + 1 fractional parts, at least one bin must contain two of them, say q₁α (mod 1) and q₂α (mod 1) with q₁ > q₂. Moreover, these fractional parts must be distinct because α is irrational. This implies there are integers p₁, p₂ such that q₁α − p₁ and q₂α − p₂ are in the same bin. Equivalently, the absolute value of their difference is less than 1/Q:

    |(q₁α − p₁) − (q₂α − p₂)| < 1/Q.

Setting q = q₁ − q₂ and p = p₁ − p₂ gives |qα − p| < 1/Q with 0 < q ≤ Q, and hence |α − p/q| < 1/(qQ) ≤ 1/q^2. Letting Q → ∞ produces infinitely many such approximations.
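Dirichlet’s guarantee is easy to watch in action. As an illustration (the snippet and its arbitrary search cutoff are ours, not part of the original problem), the following Python search finds coprime pairs p, q with |π − p/q| < 1/q², and the famous approximations 22/7 and 355/113 show up among them:

```python
from math import pi, gcd

# Search for coprime p, q with |pi - p/q| < 1/q^2, as promised by
# Dirichlet's approximation theorem (the cutoff 500 is arbitrary).
hits = []
for q in range(1, 500):
    p = round(pi * q)  # nearest integer to q*pi, the best choice for this q
    if gcd(p, q) == 1 and abs(pi - p / q) < 1 / q**2:
        hits.append((p, q))

print(hits)  # includes (22, 7) and (355, 113)
```

Note how sparse the solutions are: most denominators q fail the test, which is exactly why the continued fraction convergents are special.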
The irrationality measure μ(α) of an irrational number α is the supremum of the set of r > 0 such that

    0 < |α − p/q| < 1/q^r    (1955.1)

has infinitely many solutions with relatively prime integers p, q and q > 0. Dirichlet’s approximation theorem asserts that μ(α) ≥ 2 for all irrational α; see Table 1. To be more specific, Dirichlet proved that (1955.1) has infinitely many solutions when r = 2. For such p, q, the error in the approximation α ≈ p/q is much smaller than one has a right to expect since consecutive rational numbers with denominator q are at a distance of 1/q from each other. An error bounded above by 1/q^2 seems like too much to ask for. Such excellent approximations can be produced with truncated continued fraction expansions (see the 1931 entry). On the other hand, each rational α has μ(α) = 1. To see this, write α = a/b in lowest terms, let δ > 0, and suppose that (1955.1) has infinitely many solutions with r = 1 + δ. Then

    0 < |aq − bp|/(bq) = |a/b − p/q| < 1/q^(1+δ),

so 0 < |aq − bp| < b/q^δ. Since |aq − bp| is a positive integer, this forces q^δ < b, which can hold for only finitely many q, a contradiction. Roth’s theorem states that if α is an irrational algebraic number, then for each ε > 0, the inequality

    |α − p/q| < 1/q^(2+ε)

has only finitely many solutions with relatively prime integers p, q and q > 0. Thus, an irrational algebraic real number cannot have many “extremely good” rational approximations. The origins of this work go back to Joseph Liouville (1809–1882), who proved in 1844 that μ(α) ≤ d for an algebraic number of degree d ≥ 2; see the 1935 entry for a proof of this result. In 1909, Axel Thue (1863–1922) improved this to d/2 + 1 + ε for every ε > 0. This bound was reduced to 2√d by Carl Ludwig Siegel (1896–1981) in 1921 and to √(2d) by mathematician-physicist Freeman Dyson in 1947 (see the 1928 entry). Siegel had conjectured that μ(α) = 2 for all algebraic irrational numbers; this was finally proved by Roth in 1955. Due to the recent explosion of work in additive combinatorics [5], the phrase “Roth’s theorem” now often refers to Roth’s theorem on arithmetic progressions (1953), which asserts that if A ⊆ Z has positive upper density, meaning that lim sup_{N→∞}

|A ∩ [−N, N]| / (2N) > 0,

then A contains infinitely many arithmetic progressions of length three; see the 1913 entry. This is the first nontrivial case of Szemerédi’s theorem; see the 1975 entry. To avoid confusion, Roth’s theorem on Diophantine approximation is sometimes referred to as the Thue–Siegel–Roth theorem.

Centennial Problem 1955 Proposed by Steven J. Miller, Williams College. Find a one-to-one function f : [0, 1] → [0, 1] such that f(x) is always transcendental. Can you find a continuous function that does this? If so, can you make your function differentiable?



1955: Comments

Hint for the problem. Consider the binary expansion

    x = ∑_{n=1}^∞ b_n(x)/2^n,    b_n(x) ∈ {0, 1},

of x ∈ [0, 1). The expansion is unique if x is irrational. If x is rational, then it may have two binary expansions; in that case take the infinite one. Motivated by Liouville’s construction of a transcendental number (see the 1935 entry), define

    M(x) = ∑_{n=1}^∞ 10^(−(b_n(x)+1)·n!).


Prove that M(x) is always transcendental. What properties does M have? Is it continuous? Strictly increasing? One-to-one?

The Flint Hills series. Here is another difficult question related to Diophantine approximation. Does the Flint Hills series

    ∑_{n=1}^∞ 1/(n^3 sin^2 n)    (1955.2)



converge? In case you were wondering, the nomenclature refers to Flint Hills, Kansas [6, Chapter 25]. One suspects that the n3 in the denominator should force the series to converge. However, sin n gets close to zero every now and then. For example, the exceptionally good rational approximation π ≈ 355/113 means that sin 355 = −0.000030144 . . . is dangerously close to sin 113π = 0. This results in a big jump: the 354th partial sum of (1955.2) is approximately 4.8 and the 355th partial sum is approximately 29.4. Figures 1 and 2 illustrate this sort of behavior.
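The jump is easy to reproduce numerically. The following short Python computation of the partial sums (our own illustration) makes the spike at n = 355 visible:

```python
from math import sin

# Partial sums of the Flint Hills series: sum of 1/(n^3 * sin(n)^2).
partial = {}
total = 0.0
for n in range(1, 401):
    total += 1.0 / (n**3 * sin(n)**2)
    partial[n] = total

# The 354th partial sum is near 4.8; the 355th jumps to about 29.4
# because sin(355) is dangerously close to zero.
print(partial[354], partial[355])
```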

Figure 1. First 75 partial sums of the Flint Hills series (1955.2). The nth partial sum tends to be significantly larger than the (n − 1)st whenever n is the numerator of an unusually accurate rational approximation to π.



Figure 2. First 400 partial sums of the Flint Hills series (1955.2). The huge jump between the 354th and 355th partial sums is evident.

As of 2018, it is unknown whether the Flint Hills series converges. Its relevance to this entry stems from the fact that its convergence would imply that μ(π) ≤ 2.5 [1]. That would be a huge improvement over the best known result, μ(π) ≤ 7.6063 [4].

Furstenberg’s proof of Euclid’s theorem. The year 1955 is also notable for Hillel Furstenberg’s remarkable topological proof of the infinitude of the primes (Euclid’s theorem) [2]. The essence of Furstenberg’s proof was later highlighted by Idris D. Mercer [3] in 2009 when he provided a variant of the proof without the use of point-set topology. Let X be a set. A collection τ of subsets of X is a topology on X if (a) ∅, X ∈ τ, (b) τ is closed under arbitrary unions, (c) τ is closed under finite intersections. A set X endowed with a topology τ is a topological space. The elements of τ are open sets; a subset of X is a closed set if its complement is open. A base for X is a subset β of τ such that each element of τ is a union of elements of β. If this is the case, one says that “β generates the topology τ on X.” Now for Furstenberg’s proof of Euclid’s theorem. Consider the topology τ on X = Z generated by the collection of all infinite arithmetic progressions B_{a,n} = {a + kn : k ∈ Z}. Thus, each B_{a,n} is open and each open set in Z is a union of some collection of infinite arithmetic progressions. Since

    B_{a,n}^c = ⋃_{j=1}^{n−1} B_{a+j (mod n), n}

is a finite union of open sets, it follows that each B_{a,n}^c is open. Consequently, each B_{a,n} is closed (in addition to being open).



Suppose toward a contradiction that there are only finitely many primes. Since the finite union of closed sets is closed², we conclude that

    A = ⋃_{p prime} B_{0,p}

is closed. Since every integer except −1 and 1 is a multiple of some prime, it follows that {−1, 1} = Z \ A is a nonempty open set that contains no infinite arithmetic progression. This contradiction shows that there are infinitely many prime numbers. Much of the preceding material on Furstenberg’s proof was originally contained in the 1948 entry, whose problem was written by James M. Andrews. The original problem asked: Is the Furstenberg topology Hausdorff? Is it regular? Is it normal? Here is a quick explanation of the terminology for those who have not taken a course in point-set topology. A topological space X is (a) Hausdorff if whenever x, y ∈ X are distinct points, there are disjoint open sets U, V ⊂ X with x ∈ U and y ∈ V, (b) regular if whenever A ⊂ X is closed and x ∈ X \ A, there are disjoint open sets U, V ⊂ X with x ∈ U and A ⊂ V, and (c) normal if whenever A, B ⊂ X are disjoint closed sets, there are disjoint open sets U, V ⊂ X with A ⊂ U and B ⊂ V.

Bibliography

[1] M. A. Alekseyev, On convergence of the Flint Hills series, https://arxiv.org/pdf/1104.5100.pdf
[2] H. Furstenberg, On the infinitude of primes, Amer. Math. Monthly 62 (1955), 353, DOI 10.2307/2307043. MR0068566
[3] I. D. Mercer, On Furstenberg’s proof of the infinitude of primes, Amer. Math. Monthly 116 (2009), no. 4, 355–356, DOI 10.4169/193009709X470218. MR2503321
[4] V. Kh. Salikhov, On the irrationality measure of π (Russian), Uspekhi Mat. Nauk 63 (2008), no. 3(381), 163–164, DOI 10.1070/RM2008v063n03ABEH004543; English transl., Russian Math. Surveys 63 (2008), no. 3, 570–572. MR2483171
[5] T. Tao and V. H. Vu, Additive combinatorics, paperback edition [of MR2289012], Cambridge Studies in Advanced Mathematics, vol. 105, Cambridge University Press, Cambridge, 2010. MR2573797
[6] C. A. Pickover, The mathematics of Oz: Mental gymnastics from beyond the edge, Cambridge University Press, Cambridge, 2002. MR1936664

²This follows from de Morgan’s law (⋃_{i=1}^n S_i)^c = ⋂_{i=1}^n S_i^c and axiom (c).


The GAGA Principle

Introduction

In calculus one encounters a vast array of “transcendental” functions such as e^x, sin x, and log x. In multivariable calculus (with smooth functions) and differential geometry (with smooth maps), the abundance of “transcendental” functions and maps becomes even more pronounced. In 1956, it was shown by Jean-Pierre Serre (1926– ), who had been awarded the Fields Medal in 1954, that in the setting of complex variables, under a compactness hypothesis many “transcendental-looking” geometric and function-theoretic constructions are algebraic from an appropriate point of view and, moreover, that such an “algebraization” of the analytic construction is unique. This result explained many earlier known special cases and was of fundamental importance in the development of algebraic and complex analytic geometry. Not only did it justify the role of transcendental methods in the solution of algebraic problems admitting a sufficiently geometric flavor, it also inspired the profound work of Alexander Grothendieck (1928–2014) and many others during the revolution that swept through algebraic geometry in the 1960s. Serre’s method of proof was sufficiently robust that it was later generalized to apply to geometric constructions over the p-adic numbers instead of C, and this generalization is a ubiquitous tool in contemporary algebraic number theory. His 1956 paper is titled “Géométrie algébrique et géométrie analytique”, or GAGA for short, and the phrase “GAGA principle” expresses the idea that in the presence of compactness, certain analytic constructions in geometry over C not only admit an algebraic description (which is already quite striking) but in fact an essentially unique one.

Centennial Problem 1956 Proposed by Brian Conrad, Stanford University. This problem develops the classical content of Serre’s theorem in the one-dimensional case and assumes familiarity with undergraduate complex analysis. Let f be a meromorphic function on C.
It is meromorphic at ∞ if f(1/z) is meromorphic at 0. (a) Prove that every rational function is meromorphic at ∞. (b) Prove that if f is meromorphic at ∞, then f is a rational function. Deduce that if a holomorphic automorphism f : C → C is meromorphic at ∞, then f(z) = az + b for some a ∈ C^× and b ∈ C. Hint: Show that f has only finitely many zeros and poles in C. Use this to reduce to the case in which f has no



such zeros or poles. By studying the zero or pole order of f(1/z) at z = 0, get to a case where Liouville’s theorem can be applied.

1956: Comments

Fermat’s last theorem. Since J.-P. Serre has played an important role in many entries in this collection, it is worth mentioning another here which we explore in greater detail in the 1995 entry: Fermat’s last theorem. Fermat’s last theorem states that if n ≥ 3, then there are no solutions to a^n + b^n = c^n in natural numbers a, b, c. Although Pierre de Fermat claimed to have a remarkably simple proof almost four hundred years ago, the only known proof uses an enormous amount of machinery from 20th-century mathematics. The following summary is painfully short, and the interested reader is encouraged to peruse the references from the 1995 entry. In the 1960s Yves Hellegouarch (1936– ) considered what could be done if a solution (a, b, c) existed for some n. He associated the elliptic curve y^2 = x(x − a^n)(x + b^n) to this solution and saw that it would have some special properties. Later, in the 1980s, Gerhard Frey (1944– ) explored these curves again and proposed that such curves would not be “modular.” However, it was believed that all elliptic curves are modular (one interpretation is that there is a weight-2 cuspidal newform associated to the curve). Serre noticed a mistake in Frey’s proof of the nonmodularity of his curves; this issue (the epsilon conjecture) was proved by Ken Ribet (1948– ) in 1986. The proof of Fermat’s last theorem follows from showing all semistable elliptic curves over Q are modular, something accomplished by Andrew Wiles with assistance from Richard Taylor (1962– ) in the 1990s.

A word about fonts.
A convention going back to the books of Nicolas Bourbaki (a pseudonym under which a group of mathematicians, mostly French, published a series of influential textbooks known for their high level of abstraction) in the 1930s is that canonical mathematical structures should be denoted with a boldface font, including various number systems such as Z, Q, R, and C. Since such boldface is hard to replicate in handwriting, Kunihiko Kodaira (1915–1997) proposed the blackboard bold variants ℤ, ℚ, ℝ, and ℂ when writing by hand. Paradoxically, with the advent of modern mathematical typesetting, these latter fonts became more widespread in typesetting than the boldface fonts they were invented to replicate. Both Conrad and Serre (whose work is featured here) feel strongly that only boldface should be used in the typography for these number systems, so we have followed that convention here.

Bibliography

[1] J.-P. Serre, Géométrie algébrique et géométrie analytique (French), Ann. Inst. Fourier, Grenoble 6 (1955), 1–42. MR0082175
[2] Wikipedia, Algebraic and analytic geometry, http://en.wikipedia.org/wiki/Algebraic_geometry_and_analytic_geometry.


The Ross Program

Introduction

The Ross Mathematics Program is an intensive residential summer program for talented high school students. Arnold Ross (1906–2002) founded the program at the University of Notre Dame in 1957. He later moved it to the Ohio State University in 1964. Although Dr. Ross stepped down in 2000, the Ross Program continues to run, involving about seventy-five first-year students every summer. The central goal of the program is to train students to think like mathematicians and to write convincing, logical proofs of their mathematical observations. Ross chose number theory as the vehicle for this learning process. Starting from the axioms for the ring of integers, Ross participants analyze topics such as modular arithmetic, Euclid’s algorithm, quadratic reciprocity, and the existence of primitive roots. They also consider analogues of those ideas in other contexts such as the Gaussian integers and the ring of polynomials over Z/pZ. Further information about the Ross Program is posted at http://www.math.osu.edu/ross. The problems below are taken from some of the Ross problem sets.

Centennial Problem 1957 Proposed by Daniel Shapiro, The Ohio State University. Let gcd(a, b) denote the greatest common divisor of integers a and b. The sequence 2^n − 1 enjoys a curious property: gcd(2^m − 1, 2^n − 1) = 2^gcd(m,n) − 1. We give this property a name: a sequence {A_n}_{n≥1} of positive integers has the gcd property if gcd(A_m, A_n) = A_{gcd(m,n)} for every pair of indices m, n.

Problem 1. Show that the following sequences have the gcd property:
(a) the constant sequence C_n = r, in which r ∈ N is fixed,
(b) the linear sequence L_n = rn, in which r ∈ N is fixed,
(c) for fixed c, k ∈ N, the sequence E(k, c)_n = c if n is a multiple of k and E(k, c)_n = 1 otherwise,
(d) for fixed a, b ∈ N with a > b and gcd(a, b) = 1, the sequence R_n = a^n − b^n,
(e) the Fibonacci numbers, defined by F_1 = F_2 = 1 and F_{n+2} = F_{n+1} + F_n.
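These identities are pleasant to test by machine before attempting a proof. Here is a quick Python check (our own illustration, not from the Ross problem sets) of the gcd property for the sequences 2^n − 1 and the Fibonacci numbers:

```python
from math import gcd

def fib(n, memo={1: 1, 2: 1}):
    # Fibonacci numbers: F_1 = F_2 = 1, F_{n+2} = F_{n+1} + F_n
    if n not in memo:
        memo[n] = fib(n - 1) + fib(n - 2)
    return memo[n]

# Verify gcd(A_m, A_n) = A_{gcd(m, n)} for A_n = 2^n - 1 and A_n = F_n.
for m in range(1, 31):
    for n in range(1, 31):
        g = gcd(m, n)
        assert gcd(2**m - 1, 2**n - 1) == 2**g - 1
        assert gcd(fib(m), fib(n)) == fib(g)

print("gcd property verified for 1 <= m, n <= 30")
```

Of course, a finite check is no proof; the point is that a failed identity (for example, dropping the coprimality hypothesis in part (d)) is caught immediately.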



Problem 2. For a sequence {b_n}_{n≥1} of positive integers, define

    B_n = ∏_{d|n} b_d.

For example, B_2 = b_1 b_2, B_4 = b_1 b_2 b_4, and B_6 = b_1 b_2 b_3 b_6.

If gcd(b_m, b_n) = 1 whenever m ≠ n, show that {B_n} has the gcd property. (a) Which {b_n} produce sequences {B_n} with the gcd property? (b) Does every {B_n} with the gcd property arise from some (unique) integer sequence {b_n}?

1957: Comments

Cyclotomic polynomials. The second problem is related to the factorization

    x^n − 1 = ∏_{d|n} Φ_d(x),

in which Φ_d(x) denotes the dth cyclotomic polynomial. To be more specific, Φ_d(x) is the monic (leading coefficient 1) polynomial whose roots are the primitive dth roots of unity. The primitive dth roots of unity are exp(2πij/d), in which i^2 = −1, j ∈ {1, 2, . . . , d}, and gcd(j, d) = 1; see Figure 1.


Figure 1. The sixth roots of unity are the vertices of an equilateral hexagon inscribed in the unit circle in the complex plane. The primitive sixth roots of unity are e^(πi/3) = 1/2 + (√3/2)i and e^(5πi/3) = 1/2 − (√3/2)i, denoted by red dots above.
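The factorization x^n − 1 = ∏_{d|n} Φ_d(x) also gives a practical way to compute cyclotomic polynomials: divide x^n − 1 by the Φ_d for the proper divisors d of n. A small Python sketch of this idea (our own illustration; coefficients are stored lowest degree first):

```python
def poly_div(num, den):
    # Exact division of integer polynomials (coefficient lists, lowest
    # degree first); the division is exact for cyclotomic factorizations.
    num, quot = num[:], [0] * (len(num) - len(den) + 1)
    for i in range(len(quot) - 1, -1, -1):
        c = num[i + len(den) - 1] // den[-1]
        quot[i] = c
        for j, d in enumerate(den):
            num[i + j] -= c * d
    return quot

def cyclotomic(n, cache={}):
    # Phi_n(x) = (x^n - 1) / product of Phi_d(x) over proper divisors d of n
    if n not in cache:
        p = [-1] + [0] * (n - 1) + [1]  # x^n - 1
        for d in range(1, n):
            if n % d == 0:
                p = poly_div(p, cyclotomic(d))
        cache[n] = p
    return cache[n]

print(cyclotomic(6))         # [1, -1, 1], i.e. x^2 - x + 1
print(min(cyclotomic(105)))  # -2: a coefficient outside {-1, 0, 1}
```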



Although they are defined in terms of their roots, which are certain complex roots of unity, cyclotomic polynomials have only integer coefficients. The first few cyclotomic polynomials are

    Φ_1(x) = x − 1,
    Φ_2(x) = x + 1,
    Φ_3(x) = x^2 + x + 1,
    Φ_4(x) = x^2 + 1,
    Φ_5(x) = x^4 + x^3 + x^2 + x + 1,
    Φ_6(x) = x^2 − x + 1,
    Φ_7(x) = x^6 + x^5 + x^4 + x^3 + x^2 + x + 1,
    Φ_8(x) = x^4 + 1,
    Φ_9(x) = x^6 + x^3 + 1,
    Φ_10(x) = x^4 − x^3 + x^2 − x + 1.

If B_n = 2^n − 1, then b_d = Φ_d(2) is a sequence of integers with the gcd property. Before proceeding to another gcd-related gem (see the 1951 and 1977 entries for more applications of the gcd), we first want to dispel a natural conjecture about cyclotomic polynomials. A glance at the first hundred or so cyclotomic polynomials suggests that their coefficients are always −1, 0, or 1. This is false, since

    Φ_105(x) = x^48 + x^47 + x^46 − x^43 − x^42 − 2x^41 − x^40 − x^39 + x^36 + x^35 + x^34 + x^33 + x^32 + x^31 − x^28 − x^26 − x^24 − x^22 − x^20 + x^17 + x^16 + x^15 + x^14 + x^13 + x^12 − x^9 − x^8 − 2x^7 − x^6 − x^5 + x^2 + x + 1.

When one realizes that 105 = 3 · 5 · 7 is the smallest number that is the product of three distinct odd primes, it becomes slightly more reasonable to expect that the first counterexample might take so long to materialize.

Invisible forests. Imagine that there is a slender tree planted at each lattice point (x, y) ∈ Z^2 and pretend that you are at the origin (0, 0). How many lattice points can you “see” from the origin? Which ones are blocked by trees? Are there arbitrarily large portions of the forest that are not visible from the origin? See [2] for interesting generalizations of this problem. If gcd(x, y) = g ≠ 1, then x = gx′ and y = gy′ for some (x′, y′) ∈ Z^2, so that the tree planted at (x′, y′) “blocks” our view of (x, y) = g(x′, y′). In general, a lattice point (x, y) is visible from the origin if and only if gcd(x, y) = 1; see Figures 2 and 3.
Based upon this, one can show that the proportion of lattice points visible from the origin is 6/π^2 ≈ 60.8%; see the notes to the 1939 entry. The following result is from [1, Thm. 5.29]: the set of lattice points visible from the origin contains arbitrarily large square gaps. That is, given any positive integer n, there exists a lattice point (a, b) such that none of the lattice points (a + j, b + k) with 0 < j, k ≤ n is visible from the origin.
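Both the 6/π² density and the visibility criterion are easy to probe numerically. A small Python estimate (our own illustration; the box size N = 200 is an arbitrary choice):

```python
from math import gcd, pi

# Count lattice points visible from the origin in [-N, N]^2.
# A point (x, y) != (0, 0) is visible exactly when gcd(x, y) = 1
# (math.gcd ignores signs, and gcd(x, 0) = |x| handles the axes).
N = 200
visible = sum(1 for x in range(-N, N + 1) for y in range(-N, N + 1)
              if (x, y) != (0, 0) and gcd(x, y) == 1)
total = (2 * N + 1)**2 - 1

print(visible / total, 6 / pi**2)  # the two values agree closely
```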



Figure 2. Visible lattice points in the region [−10, 10] × [−10, 10].

The proof is an elegant use of prime numbers and the Chinese remainder theorem. Let p_i denote the ith prime. Given n > 0, form the n × n matrix

    ⎡ 2              3              · · ·  p_n    ⎤
    ⎢ p_{n+1}        p_{n+2}        · · ·  p_{2n} ⎥
    ⎢   ⋮              ⋮                    ⋮     ⎥
    ⎣ p_{(n−1)n+1}   p_{(n−1)n+2}   · · ·  p_{n²} ⎦




whose first row consists of the first n primes, whose second row consists of the next n primes, and so on. Let r_j be the product of the primes in the jth row and let c_j denote the product of the primes in the jth column. Since none of the primes p_1, p_2, . . . , p_{n²} can lie in two rows or two columns simultaneously, it follows that gcd(r_j, r_k) = gcd(c_j, c_k) = 1 whenever j ≠ k. The Chinese remainder theorem asserts that the system of congruences

    x ≡ −1 (mod r_1),  x ≡ −2 (mod r_2),  . . . ,  x ≡ −n (mod r_n)

has a unique solution a modulo r_1 r_2 · · · r_n. Similarly, the system

    y ≡ −1 (mod c_1),  y ≡ −2 (mod c_2),  . . . ,  y ≡ −n (mod c_n)



Figure 3. Visible lattice points in the region [−50, 50] × [−50, 50].

has a solution b that is unique modulo c_1 c_2 · · · c_n. Observe now that

    r_1 r_2 · · · r_n = c_1 c_2 · · · c_n = 2 · 3 · 5 · · · p_{n²}.

Consider the square with corners at (a, b) and (a + n, b + n). Any lattice point inside of this square can be written in the form (a + j, b + k), in which 0 < j, k < n (the points with either j = n or k = n lie on the boundary of the square). Since a ≡ −j (mod r_j)

and b ≡ −k (mod c_k)

by the definition of a and b, it follows that r_j | (a + j) and c_k | (b + k). Thus, the prime number at the intersection of row j and column k divides a + j and b + k. Consequently, gcd(a + j, b + k) ≠ 1 and hence (a + j, b + k) is not visible from the origin if 0 < j, k ≤ n. This proves that there exists a square of n² lattice points that are not visible from the origin.

Bibliography

[1] T. M. Apostol, Introduction to analytic number theory, Undergraduate Texts in Mathematics, Springer-Verlag, New York-Heidelberg, 1976. MR0434929
[2] E. H. Goins, P. E. Harris, B. Kubik, and A. Mbirika, Lattice point visibility on generalized lines of sight, Amer. Math. Monthly 125 (2018), no. 7, 593–601, DOI 10.1080/00029890.2018.1465760. MR3836421



[3] D. Goss (editor), Arnold Ross Memorial Issue, Journal of Number Theory 110 (2005), no. 1. In particular, see Arnold Ephraim Ross (1906–2002), pp. 1–2. http://www.sciencedirect.com/science/journal/0022314X/110/1.
[4] M. Dziemiańczuk and W. Bajguz, On GCD-morphic sequences, 2008, http://arxiv.org/abs/0802.1303.


Smale’s Paradox

Introduction

There are many remarkable results in topology that are counterintuitive. One of the most famous is the subject of our 1924 entry, the Banach–Tarski paradox. It asserts that the three-dimensional unit ball can be partitioned into finitely many disjoint subsets that can be rearranged using rigid motions to form two identical unit balls. This appears to violate our notion of volume. Smale’s paradox is another strange result about supposedly familiar objects. Imagine a sphere composed of a material that can pass through itself. Without puncturing or creasing the material, is it possible to turn the sphere inside out? Stephen Smale (1930– ) shocked the mathematical world in 1958 when he proved that sphere eversion is possible [2]. However, his proof was difficult to distill into an explicit regular homotopy. It was through the work of many others, including Arnold Shapiro (1921–1962) and Bernard Morin (1931– ), that the first concrete geometric representation of a sphere eversion emerged. In particular, William Thurston (1946–2012) discovered a clever explicit construction, now known as Thurston’s corrugations. In this approach the sphere is first corrugated, and then the top and the bottom of the sphere are pulled through each other without creasing, the geometry of the corrugations permitting the “turning.” An excellent introduction to the topic, including an animation of the eversion, is available online at [3]. Smale’s paradox belongs in the “a video is worth a thousand words” category, so we make no attempt to provide the details here. However, we can introduce some other interesting topological ideas here that are not so exotic. The Möbius strip is perhaps most students’ first brush with topology. It is a peculiar surface that is obtained by gluing two opposite ends of a flexible, rectangular strip with a half twist; see Figure 1.
The Möbius strip is an example of a two-dimensional manifold with boundary. This means that a tiny observer who lives on the surface of the Möbius strip, but not on the boundary curve, could be forgiven for thinking that she lives in R^2. The observer would not be able to deduce, based upon purely local observations, that the universe was curved in some way. Nor would she be able to deduce that the Möbius strip is nonorientable: there are no “front” and “back” sides to the Möbius strip. It has only one side. A torus can be described in a similar manner; see Figure 2. Unlike the Möbius strip, a torus is orientable: it has an inside and an outside (as Homer Simpson could tell you). What is a practical application of the Möbius strip? Large conveyor belts are often fashioned into Möbius strips to ensure that the entire surface wears evenly. The



Figure 1. Construction of a Möbius strip from a square of flexible material. (left) A square of flexible material that has its left- and right-hand edges identified with opposite orientations. (right) Aligning the edges of the square so that the arrows agree results in a Möbius strip.

Möbius strip was independently discovered by August Ferdinand Möbius (1790–1868) and Johann Benedict Listing (1808–1882) in 1858. Unfortunately for Listing, the surface eventually became known as the Möbius strip despite the fact that Möbius’s discovery of the surface became known only after his death. However, Listing was the first to use the term topology. According to Listing: By topology we mean the doctrine of the modal features of objects, or of the laws of connection, of relative position and of succession of points, lines, surfaces, bodies and their parts, or aggregates in space, always without regard to matters of measure or quantity.

Although he used the term as early as 1836, Listing’s 1847 book Vorstudien zur Topologie firmly cemented the word in German mathematics. English speakers continued to use the term analysis situs until the late 1920s when “topology” came into popular use [1].

Centennial Problem 1958 Proposed by Avery T. Carr, Emporia State University, and James M. Andrews, University of Memphis. Smale’s unexpected result brings to question the possibility of everting other shapes. Consider a circle governed by the same rules as the sphere from Smale’s paradox. The circle is composed of a material that can pass through itself but cannot be punctured or creased. Is it possible to evert the circle? What about a torus? More generally, what about a hypersphere in n dimensions? Hint: Look up the Whitney–Graustein theorem.



Figure 2. Construction of a torus from a square of flexible material. (left) A square of flexible material with opposite edges identified with the same orientation. (middle) Aligning the left and right edges so that the arrows agree results in a hollow cylinder. (right) Aligning the top and bottom of the cylinder so that the arrows agree results in a torus.

1958: Comments

Möbius trip. As Figure 1 suggests, we can describe a Möbius strip as a parametrized surface. The parametrization

    x(u, v) = (1 + (v/2) cos(u/2)) cos u,
    y(u, v) = (1 + (v/2) cos(u/2)) sin u,
    z(u, v) = (v/2) sin(u/2),

maps the rectangle [0, 2π] × [−1, 1] in uv-space onto a Möbius strip in xyz-space with width 1 and whose central circle has radius 1. The boundary of a Möbius strip is, topologically speaking, a circle. In the formation of the Möbius strip from our square of flexible material (Figure 1), the upper and lower edges are joined into one continuous curve. Indeed, trace the edge of a Möbius strip and you will find that it is a single curve; see Figure 3. Imagine that we live in a high-dimensional space, or that we had a flexible material that could pass through itself. What would happen if we took a disk and glued its edge to the boundary of a Möbius strip? We cannot accomplish this in R^3 without self-intersections, but we could accomplish this in R^5, which gives us enough wiggle room. See the notes for the 2003 entry for more information.

The Klein bottle. Another popular topological item is the Klein bottle; see Figure 4. This peculiar bottle was first described in 1882 by Felix Klein (1849–1925). Like the Möbius strip, it is nonorientable. You should definitely not use it to store liquids since it has no inside or outside! You can make a Klein bottle by gluing together two Möbius strips along their boundaries. Unfortunately, the resulting object cannot be realized in R^3 without self-intersections, although R^4



Figure 3. The boundary of a Möbius strip is a single continuous curve that can be continuously deformed into a circle.

Figure 4. Construction of a Klein bottle from a square of flexible material. (left) A square of flexible material so that the left and right edges have the same orientation and so that the top and bottom edges have opposite orientations. (middle) Aligning the left- and right-hand edges so that the arrows agree results in a hollow cylinder. (right) The result is the Klein bottle, a nonorientable surface. It has no inside and no outside.

will do nicely. For example,

    x(u, v) = (3 + cos u) cos v,
    y(u, v) = (3 + cos u) sin v,
    z(u, v) = sin u cos(v/2),
    w(u, v) = sin u sin(v/2),

maps the square [0, 2π] × [0, 2π] in uv-space onto a Klein bottle in xyzw-space without self-intersection.
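Both parametrizations can be checked against the gluing diagrams of Figures 1 and 4: on the Möbius strip the edge u = 2π is glued to the edge u = 0 with v reversed, while on the Klein bottle moving v forward by 2π returns you to the same point with u reversed. A short numerical verification in Python (our own sketch):

```python
from math import cos, sin, pi

def mobius(u, v):
    # Mobius strip of width 1 whose central circle has radius 1
    r = 1 + (v / 2) * cos(u / 2)
    return (r * cos(u), r * sin(u), (v / 2) * sin(u / 2))

def klein(u, v):
    # Klein bottle embedded in R^4 without self-intersection
    return ((3 + cos(u)) * cos(v), (3 + cos(u)) * sin(v),
            sin(u) * cos(v / 2), sin(u) * sin(v / 2))

close = lambda a, b: all(abs(p - q) < 1e-9 for p, q in zip(a, b))

for t in [-1.0, -0.4, 0.3, 1.0]:
    # half-twist identification: (2*pi, v) is glued to (0, -v)
    assert close(mobius(2 * pi, t), mobius(0.0, -t))
    # orientation-reversing identification: (u, v + 2*pi) ~ (-u, v)
    assert close(klein(t, 0.7 + 2 * pi), klein(-t, 0.7))

print("gluing identifications verified")
```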



Bibliography

[1] J. J. O’Connor and E. F. Robertson, Johann Benedict Listing, MacTutor History of Mathematics, http://www-history.mcs.st-and.ac.uk/Biographies/Listing.html.
[2] S. Smale, A classification of immersions of the two-sphere, Trans. Amer. Math. Soc. 90 (1958), 281–290, DOI 10.2307/1993205. http://www.maths.ed.ac.uk/~aar/papers/smale5.pdf. MR0104227
[3] YouTube, Outside In, http://www.youtube.com/watch?v=wO61D9x6lNY.


QR Decomposition

Introduction

The QR decomposition is a phenomenally useful matrix factorization that was independently discovered by John G. F. Francis (1934– ) in 1959 [3, 4] and Vera Kublanovskaya (1920–2012) in 1961 [9]; see [7] for a detailed history. Suppose that A ∈ M_{m×n}(R) and m ≥ n; that is, suppose that A has at most as many columns as rows. Then we may factor A = QR, in which Q ∈ M_{m×n}(R) has orthonormal columns and R ∈ M_n(R) is upper triangular and has nonnegative diagonal entries. The QR algorithm is an iterative algorithm, based upon repeated QR decompositions, that is used to quickly and accurately compute eigenvalues. The standard approach, taught in most introductory linear algebra courses, is to compute the characteristic polynomial p_A(z) = det(zI − A) of A ∈ M_n and then find its roots. Due to its reliance on determinants, this method is terribly inefficient for large matrices. Moreover, there are no simple formulas to exactly compute the roots of a polynomial of degree five or more. According to mathematician-writer-journalist Barry Cipra (1952– ) [1]:

    Eigenvalues are arguably the most important numbers associated with matrices—and they can be the trickiest to compute. It’s relatively easy to transform a square matrix into a matrix that’s “almost” upper triangular, meaning one with a single extra set of nonzero entries just below the main diagonal.¹ But chipping away those final nonzeros, without launching an avalanche of error, is nontrivial. The QR algorithm is just the ticket. Based on the QR decomposition, which writes A as the product of an orthogonal matrix Q and an upper triangular matrix R, this approach iteratively changes A_i = QR into A_{i+1} = RQ, with a few bells and whistles for accelerating convergence to upper triangular form. By the mid-1960s, the QR algorithm had turned once-formidable eigenvalue problems into routine calculations.

The QR algorithm has rightly been hailed as one of the ten most important algorithms of the 20th century [1, 2]; see also the 1965 entry.

How does one compute a QR decomposition? We outline here the method of Householder reflections, named after Alston Scott Householder (1904–1993). See [5] for more details and corresponding results about complex matrices; see [6] for explicit algorithms and numerical considerations.

Figure 1. Action of a 3 × 3 Householder matrix $U_w$ on $\mathbb{R}^3$: the vector $x$ is reflected to $U_w x$ across the plane orthogonal to $w$.

Let $w \in \mathbb{R}^n$. The $n \times n$ Householder matrix
$$U_w = I - \frac{2ww^*}{\|w\|^2} \tag{1959.1}$$
reflects vectors in $\mathbb{R}^n$ across the $(n-1)$-dimensional hyperplane that is orthogonal to $w$; see Figure 1. Since $U_w$ preserves the norm of each vector that it acts on, it is an orthogonal matrix. Moreover, $U_w^{-1} = U_w$ since a reflection is self-inverse. Consequently, Householder matrices are well suited for numerical computation: they are simple to define (1959.1), numerically stable, and easy to invert.

Let $A = [\,a_1\ a_2\ \ldots\ a_n\,] \in M_{m \times n}(\mathbb{R})$, and suppose for the sake of simplicity that none of the columns of $A$ are zero. Find an orthogonal matrix2 $U_1 \in M_m(\mathbb{R})$ so that $U_1 a_1$ equals $\|a_1\|$ times the first standard basis vector in $\mathbb{R}^m$. Then
$$U_1 A = \begin{bmatrix} \|a_1\| & \star \\ 0 & A' \end{bmatrix}, \qquad A' \in M_{(m-1) \times (n-1)}(\mathbb{R}), \tag{1959.2}$$
in which $\star$ denotes entries that are of no interest to us. The same principle applies to the smaller matrix $A'$ now. Iterating this procedure $n$ times, one obtains a sequence of orthogonal matrices $U_1, U_2, \ldots, U_n \in M_m(\mathbb{R})$ so that
$$\underbrace{U_n \cdots U_2 U_1}_{U} A = \begin{bmatrix} R \\ 0_{(m-n) \times n} \end{bmatrix},$$
in which $U \in M_m(\mathbb{R})$ is orthogonal and $R \in M_n(\mathbb{R})$ is upper triangular and has nonnegative diagonal entries. Let $V = U^{\mathsf{T}}$ and partition $V = [\,Q\ Q'\,]$, in which $Q \in M_{m \times n}(\mathbb{R})$. Since $V$ is an orthogonal matrix, $Q$ has orthonormal columns and
$$A = V \begin{bmatrix} R \\ 0 \end{bmatrix} = [\,Q\ Q'\,] \begin{bmatrix} R \\ 0 \end{bmatrix} = QR + Q'0 = QR.$$

1 Such a matrix is called an upper Hessenberg matrix. It is possible to bring a square matrix into upper Hessenberg form through the use of Householder transformations; see [5].

2 One can always find a Householder matrix that takes a given vector to another given vector with the same norm. To improve numerical stability, it is useful to consider a slight generalization. Suppose that $x, y \in \mathbb{R}^n$ and $\|x\| = \|y\| \neq 0$. Let $\sigma = 1$ if $x \cdot y \leq 0$ and $\sigma = -1$ if $x \cdot y > 0$. Then $\sigma U_{y - \sigma x} \in M_n(\mathbb{R})$ is a real orthogonal matrix that maps $x$ to $y$; see [5].
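The construction above translates nearly line-for-line into code. The following is a minimal sketch in pure Python (no numerical library assumed): it applies each reflector $U_w = I - 2ww^{\mathsf{T}}/\|w\|^2$ to a working copy of $A$ and to an accumulated identity, then reads off the thin factors $Q$ and $R$. The function name `householder_qr` and the simple sign choice $w = x - \|x\|e_1$ are ours for illustration; production codes pick the sign as in footnote 2 for extra stability.

```python
import math

def householder_qr(A):
    """Thin QR of an m x n matrix (m >= n, list of rows) via Householder
    reflections; returns (Q, R) with Q m x n orthonormal, R n x n upper
    triangular with nonnegative diagonal."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    U = [[float(i == j) for j in range(m)] for i in range(m)]  # accumulates U_n...U_1
    for k in range(n):
        x = [R[i][k] for i in range(k, m)]
        normx = math.sqrt(sum(v * v for v in x))
        w = x[:]
        w[0] -= normx                       # w = x - ||x|| e_1 sends x to ||x|| e_1
        normw2 = sum(v * v for v in w)
        if normw2 == 0.0:
            continue                        # column already in the desired form

        def reflect(col):                   # apply U_w = I - 2 w w^T / ||w||^2
            c = 2.0 * sum(wi * vi for wi, vi in zip(w, col)) / normw2
            return [vi - c * wi for wi, vi in zip(w, col)]

        for M in (R, U):                    # update working matrix and accumulated U
            for j in range(len(M[0])):
                col = reflect([M[i][j] for i in range(k, m)])
                for i in range(k, m):
                    M[i][j] = col[i - k]
    Q = [[U[j][i] for j in range(n)] for i in range(m)]  # first n columns of V = U^T
    return Q, [row[:n] for row in R[:n]]

A = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
Q, R = householder_qr(A)
err = max(abs(sum(Q[i][k] * R[k][j] for k in range(2)) - A[i][j])
          for i in range(3) for j in range(2))
print(err < 1e-12)  # True: QR reproduces A to machine precision
```

On this 3 × 2 example the computed $R$ is upper triangular with nonnegative diagonal, and $QR$ recovers $A$ essentially exactly.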



Centennial Problem 1959
Proposed by Stephan R. Garcia, Pomona College.

Let $A = [\,a_1\ a_2\ \ldots\ a_n\,] \in M_n(\mathbb{R})$, in which $a_1, a_2, \ldots, a_n \in \mathbb{R}^n$. Use the QR decomposition to prove Hadamard's inequality
$$|\det A| \leq \|a_1\| \|a_2\| \cdots \|a_n\|. \tag{1959.3}$$
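Before proving the inequality, it can be reassuring to test it numerically. The sketch below is illustrative only (the determinant routine and the random test matrix are our own choices): it compares $|\det A|$ with the product of the column norms for a random $5 \times 5$ matrix.

```python
import math
import random

def det(M):
    """Determinant via Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n, d = len(A), 1.0
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(A[i][k]))
        if A[piv][k] == 0.0:
            return 0.0
        if piv != k:
            A[k], A[piv] = A[piv], A[k]
            d = -d                           # a row swap flips the sign
        d *= A[k][k]
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= f * A[k][j]
    return d

random.seed(1)
n = 5
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
col_norms = [math.sqrt(sum(A[i][j] ** 2 for i in range(n))) for j in range(n)]
print(abs(det(A)) <= math.prod(col_norms) + 1e-12)  # True, as (1959.3) demands
```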


1959: Comments

Gram–Schmidt in the real world. It is customary in elementary linear algebra courses to teach students how to orthogonalize a list of vectors with the Gram–Schmidt process. While there is some merit to this (for instance, the Gram–Schmidt process can be used to provide an easy proof that every finite-dimensional inner product space has an orthonormal basis), students should be warned that the Gram–Schmidt process is numerically unstable and hence unreliable in practice. The QR decomposition, because of its reliance on orthogonal matrices, is stable and yields much better results. If $A = [\,a_1\ a_2\ \ldots\ a_n\,] \in M_{m \times n}(\mathbb{R})$ has linearly independent columns (this implies that $m \geq n$), then the columns of the matrix $Q = [\,q_1\ q_2\ \ldots\ q_n\,] \in M_{m \times n}(\mathbb{R})$ from the QR decomposition are orthonormal and they have the property that
$$\operatorname{span}\{a_1, a_2, \ldots, a_r\} = \operatorname{span}\{q_1, q_2, \ldots, q_r\} \qquad \text{for } r = 1, 2, \ldots, n.$$

Hadamard matrices. Jacques Hadamard first proved his eponymous inequality in 1893 [8]; it is related to a fascinating open problem in combinatorics. If each entry of $A \in M_n(\mathbb{R})$ is $-1$ or $1$, then Hadamard's inequality (1959.3) tells us that $|\det A| \leq n^{n/2}$. A matrix for which equality holds is a Hadamard matrix of order $n$. It is possible to show that the order of a Hadamard matrix must be 1, 2, or a multiple of 4. Some Hadamard matrices of small order are
$$[1], \qquad \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix},$$
and

$$\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\
1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 \\
1 & 1 & -1 & -1 & 1 & 1 & -1 & -1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
1 & -1 & -1 & 1 & -1 & 1 & 1 & -1 \\
1 & -1 & 1 & -1 & -1 & 1 & -1 & 1 \\
1 & -1 & 1 & -1 & 1 & -1 & 1 & -1
\end{bmatrix}.$$
As these examples suggest, a Hadamard matrix must be a multiple of an orthogonal matrix; that is, it must have orthogonal rows and orthogonal columns. The famed



Hadamard conjecture asserts that a Hadamard matrix of order $4k$ exists for every positive integer $k$; the smallest permissible order for which no Hadamard matrix is presently known is 668.

A determinantal inequality. We conclude with a beautiful determinantal inequality for positive semidefinite matrices. Recall that $A \in M_n(\mathbb{R})$ is positive semidefinite if $A$ is symmetric and its eigenvalues are nonnegative (the eigenvalues of a real symmetric matrix are always real). This is equivalent to
$$A = B^{\mathsf{T}}B \tag{1959.4}$$


for some $B = [\,b_1\ b_2\ \ldots\ b_n\,] \in M_{m \times n}(\mathbb{R})$. This decomposition highlights one of the main applications of positive semidefinite matrices. Since $a_{ij} = b_i^{\mathsf{T}} b_j = b_i \cdot b_j$ for $1 \leq i, j \leq n$ in (1959.4), the entries of $A$ measure the correlations between the vectors $b_1, b_2, \ldots, b_n \in \mathbb{R}^m$. In this context, positive semidefinite matrices frequently arise in statistics. As a consequence of Hadamard's inequality,
$$|\det A| = |\det(B^{\mathsf{T}}B)| = |\det(B^{\mathsf{T}}) \det B| = |\det B|^2 \leq \|b_1\|^2 \|b_2\|^2 \cdots \|b_n\|^2 = a_{11} a_{22} \cdots a_{nn},$$
since $a_{ii} = b_i \cdot b_i = \|b_i\|^2 \geq 0$ for $i = 1, 2, \ldots, n$. Thus, the absolute value of the determinant of a positive semidefinite matrix is bounded above by the product of its diagonal entries. If $A$ is merely symmetric, but not positive semidefinite, then the preceding inequality fails. Consider
$$A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},$$
for which $|\det A| = 1$ and $a_{11} = a_{22} = 0$. For more information about positive semidefinite matrices and their properties, see [5].

Bibliography
[1] B. A. Cipra, The best of the 20th century: editors name top 10 algorithms, SIAM News 33 (2000).
[2] J. Dongarra and F. Sullivan, The top 10 algorithms, Comput. Sci. Eng. 2 (2000), 22–23.
[3] J. G. F. Francis, The QR transformation: a unitary analogue to the LR transformation. I, Comput. J. 4 (1961/1962), 265–271, DOI 10.1093/comjnl/4.3.265. MR0130111
[4] J. G. F. Francis, The QR transformation. II, Comput. J. 4 (1961/1962), 332–345, DOI 10.1093/comjnl/4.4.332. MR0137289
[5] S. R. Garcia and R. A. Horn, A Second Course in Linear Algebra, Cambridge Mathematical Textbooks, Cambridge University Press, 2017.
[6] G. H. Golub and C. F. Van Loan, Matrix Computations, 4th ed., Johns Hopkins Studies in the Mathematical Sciences, Johns Hopkins University Press, Baltimore, MD, 2013. MR3024913
[7] G. Golub and F. Uhlig, The QR algorithm: 50 years later its genesis by John Francis and Vera Kublanovskaya and subsequent developments, IMA J. Numer. Anal. 29 (2009), no. 3, 467–485, DOI 10.1093/imanum/drp012. MR2520155
[8] J. Hadamard, Résolution d'une question relative aux déterminants, Bulletin des Sciences Mathématiques 17 (1893), 240–246.
[9] V. N. Kublanovskaya, On some algorithms for the solution of the complete eigenvalue problem, USSR Computational Mathematics and Mathematical Physics 1 (1963), no. 3, 637–657.


The Unreasonable Effectiveness of Mathematics

Introduction

This year honors a groundbreaking, influential article by Eugene Wigner [12], the Nobel laureate in physics whose work in random matrix theory eventually led to astonishing connections between the seemingly diverse fields of number theory and nuclear physics; see the 1928 entry. In his article, Wigner discusses the use of mathematics in physics:1

A possible explanation of the physicist's use of mathematics to formulate his laws of nature is that he is a somewhat irresponsible person. As a result, when he finds a connection between two quantities which resembles a connection well-known from mathematics, he will jump at the conclusion that the connection is that discussed in mathematics simply because he does not know of any other similar connection. It is not the intention of the present discussion to refute the charge that the physicist is a somewhat irresponsible person. Perhaps he is. However, it is important to point out that the mathematical formulation of the physicist's often crude experience leads in an uncanny number of cases to an amazingly accurate description of a large class of phenomena. This shows that the mathematical language has more to commend it than being the only language which we can speak; it shows that it is, in a very real sense, the correct language.

Mathematics is so ubiquitous in physics that the American Journal of Physics asked, “Does any piece of mathematics exist for which there is no application whatsoever in physics?” To this, physicist Dwight E. Neuenschwander (1952– ) responded:

While constructing such a “useless” piece of mathematics would be the delight of a mathematical purist, it seems we physicists have always managed to foil this lofty goal. It seems that even the most esoteric mathematical inventions of the human mind are eventually used to model physical systems. Why that should be true is of course a deep and fascinating question. [9]

The catchphrase “unreasonable effectiveness” has spawned innumerable imitators and it is difficult to catalogue them all. Some of the most influential were discussed by economist K. Vela Velupillai (1947– ) [11]:

Eugene Wigner's Richard Courant Lecture in the Mathematical Sciences, delivered at New York University on 11 May 1959, was titled,

1 The repeated use of “his” and “he” to refer to a generic physicist is regrettable.



picturesquely and, perhaps, with intentional impishness The Unreasonable Effectiveness of Mathematics in the Natural Sciences [12]. Twenty years later, another distinguished scientist, Richard W. Hamming, gave an invited lecture to the Northern California Section of the Mathematical Association of America with the slightly truncated title The Unreasonable Effectiveness of Mathematics [5]. A decade or so later, Stefan Burr tried a different variant of Wigner’s title by organising a short course on The Unreasonable Effectiveness of Number Theory [2]. Another decade elapsed before Arthur Lesk, a distinguished molecular biologist at Cambridge, gave a lecture at the Isaac Newton Institute for Mathematical Sciences at Cambridge University where yet another twist to the Wigner theme was added: The Unreasonable Effectiveness of Mathematics in Molecular Biology [8].2

The words “unreasonable” and “effectiveness” are often slightly modified to fit the author’s point. For example, there is The Reasonable Ineffectiveness of Research in Mathematics Education [7]. In The Reasonable Effectiveness of Mathematics in Economics [3], Frank J. Fabozzi (1948– ) and Sergio M. Focardi tell us: In a nutshell, we believe that the reason that mathematics is only reasonably effective in economics is because we apply mathematics to study large engineered artefacts (i.e., economies or financial markets), that have been designed to allow a lot of freedom so as to encourage change and innovation. The level of unpredictability and control is clearly different when considering systems governed by immutable natural laws as opposed to artefacts constructed by humans. Some systems, such as economies or financial markets, are prone to crises. Mathematics does a reasonably good job in describing these systems. But the mathematics involved is not that of physics: It is the mathematics of learning and complexity.

Mathematics is often called the language of the universe. However, some dispute how far this universe extends beyond physics and astronomy and how much is actually needed to describe the world and make significant contributions; see the article [13] by biologist Edward Osborne Wilson (1929– ). Wigner’s article influenced even those who profoundly disagree with him. For example, Israel Gelfand, who worked both in pure mathematics (see the 1941 entry) and mathematical biology, said: Eugene Wigner wrote a famous essay on the unreasonable effectiveness of mathematics in natural sciences. He meant physics, of course. There is only one thing which is more unreasonable than the unreasonable effectiveness of mathematics in physics, and this is the unreasonable ineffectiveness of mathematics in biology.

The engineer Derek Abbott (1960– ) wrote the influential rebuttal The Reasonable Ineffectiveness of Mathematics [1], in which he writes:

Science is a modern form of alchemy that produces wealth by producing the understanding for enabling valuable products from base ingredients. Science is merely functional alchemy that has had a few incorrect assumptions fixed, but has in its arrogance replaced them

2 The punctuation and citation style has been slightly modified.



with more insidious ones. The real world of nature has the uncanny habit of surprising us; it has always proven to be a lot stranger than we give it credit for. Mathematics is a product of the imagination that sometimes works on simplified models of reality. Platonism is a viral form of philosophical reductionism that breaks apart holistic concepts into imaginary dualisms. . . . Mathematics is a human invention for describing patterns and regularities. It follows that mathematics is then a useful tool in describing regularities we see in the universe. The reality of the regularities and invariances, which we exploit, may be a little rubbery, but as long as they are sufficiently rigid on the scales of interest to humans, then it bestows a sense of order.

Certainly many mathematicians would disagree with Abbott's account!

Centennial Problem 1960
Proposed by Stanislav Molchanov and Harold Reiter, UNC Charlotte.

The following four problems illustrate Wigner's principle that a single mathematical idea often appears in several different areas.

Problem 1. The Catalan numbers are defined for integers $n \geq 0$ by
$$C_n = \frac{1}{n+1}\binom{2n}{n}.$$


The first several Catalan numbers are 1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796, 58786, 208012, . . . . Prove that $C_n$ is always an integer.

Problem 2. The probability density
$$p(x) = \begin{cases} \dfrac{1}{2\pi}\sqrt{4 - x^2} & \text{if } |x| \leq 2, \\[4pt] 0 & \text{otherwise,} \end{cases}$$
arises in Wigner's semicircle law, which he proposed for the description of the spectra of heavy atomic nuclei. Show that its moments are
$$\frac{1}{2\pi}\int_{-2}^{2} x^n \sqrt{4 - x^2}\, dx = \begin{cases} C_{n/2} & \text{if } n \text{ is even,} \\ 0 & \text{if } n \text{ is odd.} \end{cases}$$

Problem 3. A tree is a graph in which any two vertices are connected by exactly one path. An ordered tree is a rooted tree in which the children of each vertex are given a fixed left-to-right order. Show that $C_n$ is the number of nonisomorphic ordered trees with $n$ vertices; see Figure 1.

Problem 4. Suppose that we must multiply $n \geq 2$ symbols $a_1, a_2, \ldots, a_n$ using a binary but not necessarily associative operation $b(x, y)$. Consequently, we must keep track of order. We are interested in the number of structurally different ways we can combine the symbols, and not the number of different ways we can then input the $n$ objects into the possibilities. If we let $S_{n-1}$ denote the number of different structures we can use to multiply $n$ symbols using our binary operation $n - 1$ times, then $S_1 = 1$ since the only way to combine two symbols is $b(a_1, a_2)$; we do not count $b(a_2, a_1)$ since it is structurally the same as $b(a_1, a_2)$.



Figure 1. Two rooted trees on 23 vertices. The root vertices are highlighted in red. If we had to choose names for the trees, they would be Telperion the Silver and Laurelin the Golden.

Similarly, $S_2 = 2$ since we have only two structurally different approaches:
$$b(a_1, b(a_2, a_3)) \qquad \text{and} \qquad b(b(a_1, a_2), a_3).$$
A little more work shows that $S_3 = 5$:
$$b\bigl(a_1, b(b(a_2, a_3), a_4)\bigr), \qquad b\bigl(b(b(a_1, a_2), a_3), a_4\bigr), \qquad b\bigl(b(a_1, a_2), b(a_3, a_4)\bigr),$$
$$b\bigl(b(a_1, b(a_2, a_3)), a_4\bigr), \qquad \text{and} \qquad b\bigl(a_1, b(a_2, b(a_3, a_4))\bigr).$$
Show that $S_n = C_n$.

1960: Comments

Catalan numbers. There is a wealth of interesting facts known about the Catalan numbers. First of all, they are named after the French-Belgian mathematician Eugène Charles Catalan (1814–1894), who does not appear to be Catalonian. Nevertheless, the term “Catalonian” has been used by a few authors to refer to subjects related to the Catalan numbers [4, p. 254] (at least the authors think it a good idea and are not above flagrant self-reference). The Catalan numbers appear in many different places in mathematics; over fifty such occurrences are discussed in [10].

It turns out that $C_n$ is the number of ways to write $n$ left parentheses and $n$ right parentheses so that, as we move from left to right, we never see more right parentheses than left parentheses. We see that $C_1 = 1$ since the only possible arrangement is (). Similarly, $C_2 = 2$ since there are only two permissible configurations: ()() and (()). For $n = 3$, we have exactly five options:
((())), (()()), (())(), ()(()), ()()().
Thus, $C_3 = 5$. See the comments for the 2008 entry for the asymptotic rate of growth of the Catalan numbers. Another interesting interpretation of $C_n$ is that it is the number of “staircase walks” from $(0, 0)$ to $(n, n)$ that never rise above the main diagonal; that is, $j \leq k$ whenever $(j, k)$ is on our path. Such a path is called a Dyck path, in honor of Walther von Dyck (1856–1934); see Figure 2.
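The balanced-parentheses description makes the Catalan numbers easy to check by machine. A small sketch (function names ours) enumerates all balanced strings and compares the count with the closed form $C_n = \frac{1}{n+1}\binom{2n}{n}$:

```python
from math import comb

def catalan(n):
    """Closed form C_n = (1/(n+1)) * binom(2n, n); the division is exact."""
    return comb(2 * n, n) // (n + 1)

def balanced(n):
    """All strings of n '(' and n ')' that never close more than they have opened."""
    out = []
    def build(s, opened, closed):
        if closed == n:
            out.append(s)
        if opened < n:
            build(s + "(", opened + 1, closed)
        if closed < opened:
            build(s + ")", opened, closed + 1)
    build("", 0, 0)
    return out

print([catalan(n) for n in range(7)])  # [1, 1, 2, 5, 14, 42, 132]
print(len(balanced(3)))                # 5, the five strings listed above
```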



Figure 2. There are C4 = 14 Dyck paths of order 4.

Bibliography
[1] D. Abbott, The reasonable ineffectiveness of mathematics, Proceedings of the IEEE, Vol. 101, no. 10, October 2013.
[2] S. A. Burr (ed.), The unreasonable effectiveness of number theory, papers from the American Mathematical Society Short Course held in Orono, Maine, August 6–7, 1991, Proceedings of Symposia in Applied Mathematics, vol. 46, American Mathematical Society, Providence, RI, 1992. MR1195838
[3] S. M. Focardi and F. J. Fabozzi, The reasonable effectiveness of mathematics in economics, American Economist 1 (2010), no. 55, 19–30.
[4] S. R. Garcia and S. J. Miller, 100 Years of Math Milestones: The Pi Mu Epsilon Centennial Collection, American Mathematical Society, 2019.
[5] R. W. Hamming, The unreasonable effectiveness of mathematics, Amer. Math. Monthly 87 (1980), no. 2, 81–90, DOI 10.2307/2321982. MR559142
[6] A. Harvey, The reasonable effectiveness of mathematics in the physical sciences, Relativity and Gravitation 43 (2011), 3057–3064.
[7] J. Kilpatrick, The reasonable ineffectiveness of research in mathematics education, For the Learning of Mathematics 2 (1981), no. 2, 22–29.
[8] A. M. Lesk, The unreasonable effectiveness of mathematics in molecular biology, Math. Intelligencer 22 (2000), no. 2, 28–37, DOI 10.1007/BF03025372. MR1764266
[9] D. E. Neuenschwander, Does any piece of mathematics exist for which there is no application whatsoever in physics?, Amer. J. Phys. 63 (1996), 63.
[10] R. P. Stanley, Enumerative Combinatorics. Vol. 2, with a foreword by Gian-Carlo Rota and appendix 1 by Sergey Fomin, Cambridge Studies in Advanced Mathematics, vol. 62, Cambridge University Press, Cambridge, 1999. MR1676282
[11] K. V. Velupillai, The unreasonable ineffectiveness of mathematics in economics, Cambridge Journal of Economics 29 (2005), 849–872.
[12] E. P. Wigner, The unreasonable effectiveness of mathematics in the natural sciences [Comm. Pure Appl. Math. 13 (1960), 1–14; Zbl 102, 7], Mathematical Analysis of Physical Systems, Van Nostrand Reinhold, New York, 1985, pp. 1–14. https://www.dartmouth.edu/~matc/MathDrama/reading/Wigner.html. MR824292
[13] E. O. Wilson, Great Scientist ≠ Good at Math: E. O. Wilson shares a secret: Discoveries emerge from ideas, not number-crunching, Wall Street Journal (online). http://www.wsj.com/articles/SB10001424127887323611604578398943650327184.


Lorenz's Nonperiodic Flow

Introduction

There is a certain "continuity principle" that underlies much familiar mathematics: if one jiggles parameters a little bit, then the final answer should only change by a small amount. For example, the roots of a quadratic polynomial $ax^2 + bx + c$, in which $a \neq 0$, are given by
$$\frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$
As long as we avoid $a = 0$, the (possibly complex) roots vary continuously with the parameters $(a, b, c)$. Similarly, the area and perimeter of a polygon vary continuously with the placement of its vertices. Beginning with the work of Henri Poincaré on the orbits of planets, this general principle began to be questioned.

A milestone in our understanding of chaotic behavior is the work of Edward Lorenz (1917–2008). His seminal paper Deterministic Nonperiodic Flow, published in 1963 (but based on work started in 1961), introduced the notion of "sensitive dependence on initial conditions." This refers to when minute changes to initial conditions drastically affect long-term behavior. In an attempt to study the weather, Lorenz considered the deterministic system
$$\frac{dx}{dt} = \sigma(y - x), \qquad \frac{dy}{dt} = x(\rho - z) - y, \qquad \frac{dz}{dt} = xy - \beta z,$$
in which $x$ is proportional to the rate of convection, $y$ to the horizontal temperature variation, $z$ to the vertical temperature variation, and $\sigma$, $\rho$, and $\beta$ are parameters. He wanted to rerun some calculations from an intermediate point. When he fed the output from a previous run into the computer, the system behaved in a totally different manner than it had before. How could this occur in a deterministic system? Lorenz's printer only displayed three digits of the output, while the computer code worked internally with six. The resulting loss of precision changed the initial conditions slightly and permitted the two computations to make radically different long-term predictions.

Many people are familiar with the butterfly effect, a phrase which insinuates that the flap of a butterfly's wings may eventually cause (or prevent) the onset of a tornado hundreds of miles away. Long-term weather forecasting may be impossible since we can never know all the parameter values with perfect accuracy. The following problem shows that a tiny difference in the initial trajectory of a billiard ball can have qualitative effects on the long-time orbits of the initial points. Furthermore, if one imagines this rectangle is slightly compressed to have concave



sides, then a tiny difference in the initial slope has exponentially large effects on the long-time orbits of the initial points, a property of a dynamical system known as sensitive dependence on initial conditions.

Centennial Problem 1961
Proposed by Craig Corsi and Steven J. Miller, Williams College.

Imagine playing billiards, in which the billiards table is the unit square $[0,1] \times [0,1]$ in $\mathbb{R}^2$, and the ball is a point. You place the ball at $(0,0)$, the lower-left corner of the table, and strike the ball at some angle $\theta \in (0, \pi/2)$ relative to the positive $x$-axis. Assume that there is no friction and the boundary of the square is perfectly elastic. Then the ball will continue to bounce off the walls of the table forever. For instance, if $\theta = \pi/4$, then the ball will bounce back and forth between the lower-left and upper-right corners of the table. Let $x_\theta(t)$ represent the position of the ball at time $t$.

(a) For any $N \in \mathbb{N}$, show that if $\theta \neq \varphi$, then there exists a $t > N$ such that $|x_\theta(t) - x_\varphi(t)| > 1/2$.

(b) For any $\theta \in (0, \pi/2)$, show that either (i) the number of points on the edge of the billiards table hit by the ball is finite or (ii) any line segment contained in the boundary of the unit square, however small, is hit infinitely often by the ball; see Figure 1.

(c) Classify all angles for which (i) occurs in (b).
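One can experiment with the billiard numerically via the classical "unfolding" trick: the orbit in the unit square is a straight line in the plane, folded back into the square by reflections. The sketch below (the folding helper and the sampling scheme are our own choices, not part of the problem) tracks two trajectories with initial slopes 2.7 and e, as in Figure 1, and searches for a time at which their positions differ by more than 1/2, as part (a) predicts must happen.

```python
import math

def fold(u):
    """Fold a coordinate of the straight-line ('unfolded') motion back into [0, 1]."""
    u = u % 2.0
    return 2.0 - u if u > 1.0 else u

def position(theta, t):
    """Position of the ball at time t, launched from (0, 0) at angle theta, unit speed."""
    return (fold(t * math.cos(theta)), fold(t * math.sin(theta)))

def gap(theta, phi, t):
    """Distance between the two balls at time t."""
    return math.dist(position(theta, t), position(phi, t))

# The two slopes from Figure 1: 2.7 versus e = 2.71828...
theta, phi = math.atan(2.7), math.atan(math.e)
t_far = next((t / 10 for t in range(1, 40001) if gap(theta, phi, t / 10) > 0.5), None)
print(gap(theta, phi, 0.5) < 0.01, t_far is not None)  # close at first, far apart later
```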

Figure 1. Two billiard-ball trajectories starting at (0, 0) on a frictionless, perfectly elastic, square billiard table represented by the unit square [0, 1] × [0, 1]. (top) The slope of the initial trajectory is the rational number 27/10 = 2.7. (bottom) The slope of the initial trajectory is the irrational number e ≈ 2.71828. The small difference in initial conditions leads to profound differences in the eventual behavior of the system.
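Lorenz's truncation accident is easy to replay. The sketch below uses a fourth-order Runge–Kutta integrator with the standard parameter choices $\sigma = 10$, $\rho = 28$, $\beta = 8/3$; the step size is our choice, and rounding a state to three decimals is only a loose stand-in for the three-digit printout. It runs the system twice, once from a state on the attractor and once from that state rounded, and records how far apart the two runs drift.

```python
import math

def lorenz_step(s, dt, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One fourth-order Runge-Kutta step of the Lorenz system."""
    def f(s):
        x, y, z = s
        return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)
    def nudge(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = f(s)
    k2 = f(nudge(s, k1, dt / 2))
    k3 = f(nudge(s, k2, dt / 2))
    k4 = f(nudge(s, k3, dt))
    return tuple(si + dt / 6 * (p + 2 * q + 2 * r + w)
                 for si, p, q, r, w in zip(s, k1, k2, k3, k4))

# Settle onto the attractor, then restart once from the exact state and once
# from the state rounded to three decimals.
a = (1.0, 1.0, 1.0)
for _ in range(1000):
    a = lorenz_step(a, 0.01)
b = tuple(round(v, 3) for v in a)
worst = 0.0
for _ in range(3000):
    a, b = lorenz_step(a, 0.01), lorenz_step(b, 0.01)
    worst = max(worst, math.dist(a, b))
print(worst > 1.0)  # True: a rounding error below 0.001 grows to macroscopic size
```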



1961: Comments

Newton's method. Here is a particularly nice example of a chaotic system that sets the stage for our 1964 and 1978 entries. Newton's method is a powerful algorithm that constructs a sequence of real numbers that rapidly converges to a zero of a given polynomial. For example, $f(x) = x^2 - 3$ has the zeros $x = \pm\sqrt{3}$. Arithmetic tells us that $1 < \sqrt{3} < 2$ and we suspect that $\sqrt{3}$ lies closer to 2 than 1. Let $x_0 = 2$ be our initial guess for the numerical value of $\sqrt{3}$; it is not a particularly good guess, but this matters not because Newton's method is incredibly effective. Construct the tangent line to the graph of $f(x)$ at the point $(x_0, f(x_0))$; that is, $y - f(x_0) = f'(x_0)(x - x_0)$. We suspect that the point $x_1$ at which the tangent line intersects the $x$-axis should be a decent approximation to the zero $\sqrt{3}$ of $f$; see Figure 2. Set $(x, y) = (x_1, 0)$ in the preceding and obtain
$$x_1 = x_0 - \frac{f(x_0)}{f'(x_0)} = 2 - \frac{f(2)}{f'(2)} = \frac{7}{4} = 1.75.$$
Our succeeding approximations are generated by the recursion
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.$$

Figure 2. First step of Newton's method to compute a root of $x^2 - 3$. The initial guess $x_0 = 2$ provides the approximation $x_1 = 7/4 = 1.75$, which is already remarkably close to $\sqrt{3} = 1.73205\ldots$.
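The recursion can be run in exact rational arithmetic, so that no floating-point error obscures the iterates. A short sketch (helper name ours) for $f(x) = x^2 - 3$:

```python
from fractions import Fraction

def newton_sqrt3(x0, steps):
    """Exact iterates of x_{n+1} = x_n - f(x_n)/f'(x_n) for f(x) = x^2 - 3."""
    x = Fraction(x0)
    seq = [x]
    for _ in range(steps):
        x = x - (x * x - 3) / (2 * x)
        seq.append(x)
    return seq

for x in newton_sqrt3(2, 3):
    print(x, float(x))
# The iterates are 2, 7/4, 97/56, and 18817/10864.
```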



Start with the initial approximation $x_0 = 2$ and obtain
$$x_1 = \frac{7}{4} = 1.75, \qquad x_2 = \frac{97}{56} = 1.73214\ldots, \qquad x_3 = \frac{18817}{10864} = 1.732050810\ldots,$$
at which point we might as well stop, since $\sqrt{3} = 1.732050807\ldots$. Thus, only three iterations of Newton's method are required to get seven digits of accuracy; see the 1964 entry for more information about the computation of square roots and some results about the rate of convergence.

Newton fractals. In the preceding example, one can show that
$$\lim_{n \to \infty} x_n = \begin{cases} \sqrt{3} & \text{if } x_0 > 0, \\ -\sqrt{3} & \text{if } x_0 < 0, \end{cases}$$
while for $x_0 = 0$ the iteration is undefined, since $f'(0) = 0$ and the tangent line at $(0, f(0))$ is horizontal.

Figure 3. Newton fractal for $f(z) = z^6 - 1$.



In particular, $x_0 = 0$ is a poor initial choice, since the iteration breaks down there and so does not converge to a zero of $f$. Things become much more interesting if we use polynomials of higher degree and permit the use of complex numbers. Consider the complex polynomial $f(z) = z^6 - 1$, whose roots are the vertices of a regular hexagon inscribed in the unit circle $|z| = 1$; see the figure on p. 236. For almost every complex number $z$, the sequence obtained from Newton's method with initial seed $z$ converges to one of the six roots. But which one? Assign a color to each of the six roots of $f$. Now paint each initial seed $z$ according to which root the seed iterates to under Newton's method. The resulting image (see Figure 3) is a Newton fractal. Other polynomials yield similarly enchanting images; see Figure 4. For a wealth of information about chaos and fractals, see [3].
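The coloring procedure just described takes only a few lines of code. The sketch below (function names and the convergence tolerance are our own choices) iterates Newton's map $z \mapsto z - (z^6 - 1)/(6z^5)$ and reports which of the six roots a seed is drawn to; evaluating `basin` on a grid of seeds and coloring by the result paints a picture like Figure 3.

```python
import math

# The six roots of z^6 - 1: vertices of a regular hexagon on the unit circle.
ROOTS = [complex(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3))
         for k in range(6)]

def basin(z, iters=60, tol=1e-8):
    """Index of the root that Newton's method for z^6 - 1 reaches from seed z,
    or None if the iteration breaks down or has not converged."""
    for _ in range(iters):
        if z == 0:
            return None                    # f'(0) = 0: the tangent line is useless
        z = z - (z ** 6 - 1) / (6 * z ** 5)
    k = min(range(6), key=lambda j: abs(z - ROOTS[j]))
    return k if abs(z - ROOTS[k]) < tol else None

print(basin(1.1 + 0j), basin(-1.1 + 0j))   # 0 3  (the roots 1 and -1)
```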

Figure 4. Newton fractal for $f(z) = z^5 - 1$.



Bibliography
[1] E. N. Lorenz, Deterministic nonperiodic flow, Journal of Atmospheric Sciences 20 (1963), 130–141. http://eaps4.mit.edu/research/Lorenz/Deterministic_63.pdf
[2] E. N. Lorenz, How much better can weather prediction become?, Technology Review, Jul/Aug 1969, 39–49. http://eaps4.mit.edu/research/Lorenz/How Much Better Can Weather Prediction 1969.pdf
[3] H.-O. Peitgen, H. Jürgens, and D. Saupe, Chaos and Fractals: New Frontiers of Science, 2nd ed., with a foreword by Mitchell J. Feigenbaum, Springer-Verlag, New York, 2004. MR2031217


The Gale–Shapley Algorithm and the Stable Marriage Problem

Introduction

In a seminal 1962 paper [4], David Gale (1921–2008) and Lloyd Shapley (1923–2016) initiated the formal study of stable matchings. In 2012 the Nobel Prize in Economics was given to Shapley and Alvin Roth (1951– ) "for the theory of stable allocations and the practice of market design" (Gale had the misfortune of passing away in 2008, thus rendering him ineligible for a Nobel Prize). One of the most important applications of these ideas is to the National Resident Match Program (NRMP), which matches hospitals and medical students for residencies. In 1998, the NRMP changed the matching algorithm in response to concerns of fairness. Finding stable matchings that meet various fairness criteria remains challenging and depends upon a careful study of intricate relationships in posets imposed on multiple stable matchings.

Suppose that we have two groups of the same size: proposers and acceptors. Each proposer must be matched with an acceptor; see Figure 1. We collect the preferences for proposers and acceptors in two preference matrices, one for each of the groups. A matching is stable if no two parties prefer each other to their assigned partners. In a stable matching, no two parties have a reason to switch partners.

Figure 1. (top) Each proposer (gray) has one or more desired acceptors (red). (bottom) A compatible matching. This is a much harder problem once ranked preferences are involved. That is the context of the Gale–Shapley algorithm.

The Gale–Shapley algorithm is an efficient proposal algorithm that, given two preference matrices, finds stable matchings. These are often called stable marriages because one of the original applications provided by Gale and Shapley was the matching of n men to n women (although this makes one question who the consumer of such an algorithm would be). The worst-case complexity of the algorithm is $O(n^2)$, which means that the number of steps needed is at worst proportional to the square of the size of each group. Moreover, the Gale–Shapley algorithm always returns at least one stable matching, and at most two of them, no matter what set of preferences are given. Unlike our description of the powerful simplex method (see the 1947 entry), the Gale–Shapley algorithm is simple enough for us to explain in detail.

Suppose we have a group of n men and a group of n women who want to be matched.1 Let the men, in turn, propose to the women, each of whom either rejects the proposals she receives or breaks off a previous engagement if a better proposal comes along. Here is the Gale–Shapley algorithm.

(a) In the first round, each man proposes to the woman he prefers the most. Each woman considers all the proposals she receives. She provisionally accepts the proposal coming from the man she ranks highest among those who have proposed to her and rejects all the other proposals.

(b) Each unengaged man now proposes to the woman he prefers among the women he has not previously asked to marry him (once a woman rejects a man he never asks her again), regardless of whether or not she has provisionally accepted a proposal. Each woman considers all the proposals she receives and provisionally accepts the proposal coming from the man she ranks highest among those who have proposed to her. She rejects all the other proposals.
(c) We keep repeating step (b), with the unengaged men asking and the women provisionally accepting, until all men are provisionally engaged. At this point all the provisional engagements become permanent and we have obtained a matching between the men and women.

The proof that this algorithm always results in a stable matching is constructive. Once a woman provisionally accepts a proposal, she can only stay the same or trade up; she is never unmatched. If a man has been unsuccessful, then he proposes to someone new. Since there are the same number of men and women, there must be at least one available woman who has not received any offers and thus must accept his. Each man remains paired with a woman he prefers unless that woman receives a better offer, and every woman is given the option of choosing among the men that prefer her.

When the Gale–Shapley algorithm finds at most two stable matchings, it is because the matching resulting from having one group do the proposing may differ from the matching obtained when the other group does the proposing. If two distinct stable matchings can be returned by the algorithm, each is optimal for the

1 We tried unsuccessfully to rephrase the algorithm in a gender-neutral manner. It became increasingly difficult to understand because the words "its" and "their" could refer to either party.
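The three steps above are short enough to implement directly. Here is a minimal sketch of the proposer-optimal version; the dictionary format, function name, and the 3 × 3 preference data are made up for illustration.

```python
def gale_shapley(proposer_prefs, acceptor_prefs):
    """Proposer-optimal stable matching (one run of steps (a)-(c)).

    Both arguments map each person to a list of everyone on the other side,
    most preferred first."""
    # rank[a][p] = where acceptor a places proposer p (0 = best)
    rank = {a: {p: i for i, p in enumerate(lst)}
            for a, lst in acceptor_prefs.items()}
    free = list(proposer_prefs)          # proposers with no provisional partner
    next_choice = {p: 0 for p in proposer_prefs}
    engaged = {}                         # acceptor -> proposer, provisional
    while free:
        p = free.pop()
        a = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1              # p never proposes to a again
        if a not in engaged:
            engaged[a] = p               # first proposal: provisionally accept
        elif rank[a][p] < rank[a][engaged[a]]:
            free.append(engaged[a])      # a trades up; the jilted proposer is free
            engaged[a] = p
        else:
            free.append(p)               # rejected; p tries the next name on his list
    return {p: a for a, p in engaged.items()}

# A made-up 3 x 3 instance.
men = {"A": ["x", "y", "z"], "B": ["y", "x", "z"], "C": ["x", "z", "y"]}
women = {"x": ["B", "A", "C"], "y": ["A", "B", "C"], "z": ["A", "B", "C"]}
print(gale_shapley(men, women))  # {'A': 'x', 'B': 'y', 'C': 'z'}
```

In the resulting matching no man and woman prefer each other to their assigned partners, which is exactly the stability condition.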



group doing the proposing. For example, if the men propose, each man will fare at least as well as he would in the matching obtained by having the women propose.

Current attention is focused on what happens when there are many more than two stable matchings possible. In these cases the additional matchings must be found with other algorithms. Christine Cheng and her colleagues recently proved that a nice relationship holds for local and global aspects of the set of matchings, given that all the stable matchings for a particular problem instance have been found. This work involves the following two concepts.

Global Median Matching (GMM): Impose a partial ordering on the set of stable matchings according to the rule that one matching is better than another if every proposer (or symmetrically every acceptor) receives at least as good a partner in the former matching as in the latter matching. The resulting poset terminates at one end in the proposer-optimal matching and at the other end in the acceptor-optimal matching. A GMM matching is a matching that lies a median number of steps from these extreme matchings.

Local Median Matching (LMM): Consider for each proposer (and similarly for each acceptor) the ordered set of all the rankings of the partners it is assigned in all the stable matchings. An LMM is a matching that assigns all the people a partner that lies at the median of their ordered sets.

The surprising result is that not only do GMMs and LMMs exist, but there is always at least one GMM and LMM that are identical. Therefore, if one accepts these local and global measures of fairness as valid, both measures can be satisfied by a single stable matching.

Centennial Problem 1962
Proposed by Paul Kehle, Hobart and William Smith Colleges.

What is the problem? The problem is that in some cases, in addition to a coinciding GMM and LMM solution, other stable matchings are arguably fairer.
How can we characterize stable-matching problems according to whether the GMM/LMM matching is the fairest of them all? What other measure of fairness should we use to select a matching when the GMM/LMM matching leaves something to be desired? Consider the set of stable matchings in Figure 2. Which one is fairest, and why? How does your measure of fairness connect with the GMM and LMM measures?

1962: Comments

Kidney transplants. One of the strengths of the Gale–Shapley algorithm is its flexibility. If we can formulate a real-world problem in terms of assignments, then the algorithm may be applicable. For example, the Gale–Shapley algorithm can be used in situations as diverse as college admissions, scheduling tasks on processors, matching kidneys with patients, internet search engine auctions, speed dating, and pairing students with schools. For many problems in the world there are other algorithms that could run faster or yield better solutions, but it is good to know that a stable matching exists. Moreover, the algorithm often runs fast enough to resolve the problem. The great insight in many of these situations is that we can find solutions using market-like situations without money changing hands. For example, consider



Figure 2. This stable-matching instance for n = 8 has 16 stable matchings. They form a partially ordered set that reveals a hierarchy of matchings. A line between two matchings means that the matching with the larger Roman numeral is one in which each of the proposers has a partner it prefers at least as much as the partner it has in the matching with the smaller Roman numeral. This ordering is transitive; since XVI is better than XV and XV is better than XI, we see that XVI is better than XI even though no direct line is drawn between these two matchings. Note however that XV is not better than V, even though the average preference of the proposers in XV, 1.875, is much lower than the average in V, 2.625 (examine A's and G's preferences). This "better than" ordering is reversed from the acceptors' perspective: lines between matchings indicate that each of the acceptors has at least as good a partner in the matching denoted by the smaller of the two Roman numerals. Image courtesy of Paul Kehle.
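For readers who want to experiment with instances like the one in Figure 2, the proposal rounds described above can be sketched in a few lines of Python. This is a minimal illustration (the preference lists below are made up, and any complete rankings would do):

```python
# A minimal sketch of the Gale–Shapley proposal rounds: men propose,
# women provisionally accept, and a woman only ever "trades up".
def gale_shapley(men_prefs, women_prefs):
    """Return the proposer-optimal stable matching (men propose)."""
    # rank[w][m] = position of man m on woman w's list (lower is better)
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}    # next woman each man will ask
    engaged = {}                               # woman -> man (provisional)
    free = list(men_prefs)
    while free:
        m = free.pop()
        w = men_prefs[m][next_choice[m]]       # m's best woman not yet asked
        next_choice[m] += 1
        if w not in engaged:                   # w provisionally accepts
            engaged[w] = m
        elif rank[w][m] < rank[w][engaged[w]]: # w trades up; old partner freed
            free.append(engaged[w])
            engaged[w] = m
        else:                                  # w rejects; m remains free
            free.append(m)
    return {m: w for w, m in engaged.items()}

men = {"A": ["X", "Y", "Z"], "B": ["Y", "X", "Z"], "C": ["X", "Z", "Y"]}
women = {"X": ["B", "A", "C"], "Y": ["A", "B", "C"], "Z": ["A", "B", "C"]}
print(gale_shapley(men, women))
```

Since each man proposes to each woman at most once, the loop terminates, and the resulting matching is the proposer-optimal one regardless of the order in which free men are processed.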







Figure 3. An opportunity exists here for both Hober and Peter to receive kidneys from outside of their families. Hari's kidneys are incompatible with Hober; Petra's kidneys are incompatible with Peter. However, Hari can donate to Peter and Petra can donate to Hober.

kidney transplants. Initially most transplants came from deceased donors. However, it is possible for a living person to donate one of their kidneys. This greatly increases the available supply, but many people are understandably hesitant to donate one of their kidneys. Moreover, not just any kidney can go to any patient; there are compatibility issues that must be addressed. Imagine two families in which someone needs a kidney, say Hari and Hober in one, and Petra and Peter in another. Hober and Peter both need kidneys, but unfortunately Hari's kidney is incompatible with Hober. Similarly, Petra's is incompatible with Peter. However, Hari's would work in Peter and Petra's in Hober. This opens up the opportunity for a trade that helps both families; see Figure 3. Now Peter and Hober can declare: "Kidneys! I've got new kidneys! I don't like the colour."

Before the Gale–Shapley algorithm was applied in the early 2000s, only about twenty transplants per year were from living donors. Now thousands of such transplants have been performed successfully. For more information about this life-saving application of mathematics, see [6, 7, 9].

Algorithmic bias. Concerns about the ever-present use of algorithms in modern society have recently begun to surface. It would take us too far afield to explore the rapidly developing conversation about algorithmic bias here, so we content ourselves with a quote from Cathy O'Neil (1972– ), a mathematician whose experience in the finance sector left her with grave concerns about the application of supposedly "fair" algorithms.

The math-powered applications powering the data economy were based on choices made by fallible human beings.
Some of these choices were no doubt made with the best intentions. Nevertheless, many of these models encoded human prejudice, misunderstanding, and bias into the software systems that increasingly managed our lives. Like gods, these mathematical models were opaque, their workings invisible to all but the highest priests in their domain: mathematicians and computer



scientists. Their verdicts, even when wrong or harmful, were beyond dispute or appeal. And they tended to punish the poor and the oppressed in our society, while making the rich richer [8].

Bibliography

[1] C. T. Cheng, Understanding the generalized median stable matchings, Algorithmica 58 (2010), no. 1, 34–51, DOI 10.1007/s00453-009-9307-2. http://link.springer.com/article/10.1007%2Fs00453-009-9307-2. MR2658099
[2] C. T. Cheng and A. Lin, Stable roommates matchings, mirror posets, median graphs, and the local/global median phenomenon in stable matchings, SIAM J. Discrete Math. 25 (2011), no. 1, 72–94, DOI 10.1137/090750299. http://epubs.siam.org/doi/abs/10.1137/090750299. MR2765702
[3] C. Cheng, E. McDermid, and I. Suzuki, A unified approach to finding good stable matchings in the hospitals/residents setting, Theoret. Comput. Sci. 400 (2008), no. 1-3, 84–99, DOI 10.1016/j.tcs.2008.02.014. MR2424344
[4] D. Gale and L. S. Shapley, College Admissions and the Stability of Marriage, Amer. Math. Monthly 69 (1962), no. 1, 9–15, DOI 10.2307/2312726. http://www.econ.ucsb.edu/~tedb/Courses/Ec100C/galeshapley.pdf. MR1531503
[5] D. Gusfield and R. W. Irving, The stable marriage problem: structure and algorithms, Foundations of Computing Series, MIT Press, Cambridge, MA, 1989. MR1021242
[6] A. Hern, Trading kidneys, repugnant markets and stable marriages win the Nobel Prize in Economics, NewStatesman, October 15, 2012. http://www.newstatesman.com/blogs/economics/2012/10/trading-kidneys-repugnant-markets-and-stable-marriages-win-nobel-prize-econo.
[7] K. Luong, Matching theory: kidney allocation, Health Policy and Economics, UWOMJ 82, no. 1, Spring 2013. http://www.uwomj.com/wp-content/uploads/2013/10/v82no1_6.pdf.
[8] C. O'Neil, Weapons of math destruction: How big data increases inequality and threatens democracy, Crown, New York, 2016. MR3561130
[9] Reuters, Alvin Roth Transformed Kidney Donation System, Reuters, October 15, 2012. http://forward.com/news/breaking-news/164327/alvin-roth-transformed-kidney-donation-system/.


Continuum Hypothesis

Introduction

In our 1918 entry, we introduced Cantor's theory of cardinality and its shocking implication that there are multiple levels of infinity. Recall that A ≅ B means that there is a one-to-one and onto function f : A → B. For finite sets, A ≅ B if and only if A and B have the same number of elements. Cantor's brilliant insight was to extend this definition to infinite sets. His classic diagonal argument (see p. 29) reveals that no one-to-one correspondence between N and R exists; that is, N and R represent two different levels of infinity.

Since N is a subset of R, it is natural to consider what happens "in between" N and R. The continuum hypothesis (CH) asserts that if N ⊆ A ⊆ R, then either A ≅ N or A ≅ R; that is, there are no "intermediate infinities" between those of the natural numbers and the real numbers. Cantor believed the continuum hypothesis to be true and he spent years attempting to prove it, without success. David Hilbert, one of the greatest mathematicians of all time, placed it first on his list of twenty-three open questions presented to the International Congress of Mathematicians, held in Paris in 1900 (for more about Hilbert's problems, see the 1935, 1970, and 1980 entries). Hilbert opened his historic address with these words:

Who among us would not be happy to lift the veil behind which is hidden the future; to gaze at the coming developments of our science and at the secrets of its development in the centuries to come? What will be the ends toward which the spirit of future generations of mathematicians will tend? What methods, what new facts will the new century reveal in the vast and rich field of mathematical thought?

So is the continuum hypothesis true or false? To this day, nobody has been able to prove it. Neither has anyone been able to disprove it. Nevertheless, the problem has been resolved! How can this be?

In 1940, Kurt Gödel proved that CH cannot be disproved from the traditionally accepted axioms of set theory [5]. Specifically, he showed that CH cannot be disproved using the Zermelo–Fraenkel (ZF) axioms (see the 1929 entry) or the Zermelo–Fraenkel axioms augmented with the axiom of choice (AC). This extended axiom system is denoted ZFC; see the comments for the 1964 and 1969 entries. In 1963, Paul Cohen (1934–2007) introduced the powerful forcing technique and demonstrated that CH cannot be proved in ZFC [1, 2, 9]. Thus, the continuum hypothesis is neither provable nor disprovable from the standard axioms of set theory. Cohen won the prestigious Fields Medal in 1966 for this achievement.



See [9] for a remembrance of Paul Cohen; the second named author is one of his mathematical grandchildren. Of course, the results of Gödel and Cohen assume that ZFC is consistent. The issue of whether or not ZFC is consistent is another story altogether; see the 1929 entry.

To some extent, whether CH is "true" or "false" is a matter of opinion since it can neither be proved nor disproved in ZFC. One can add either CH or its negation to ZFC and obtain two different versions of mathematics, one in which CH is "true" and one in which CH is "false." Each is as valid as the other, although, as Gödel showed, neither system can prove its own consistency. This situation seems bizarre, although it becomes easier to understand if we study a similar occurrence in classical geometry; see the comments for this entry.

Centennial Problem 1963
Proposed by Steven J. Miller, Williams College.

Cardinality is a blunt instrument that ignores the topological properties of a set. For example, R and the Cantor set (see the 1917 entry) are equinumerous but "feel" totally different. One approach to distinguishing self-similar subsets of Rⁿ is the notion of fractal dimension. A square, which is a "two-dimensional object," consists of four scaled copies of itself, each of which has been reduced by a factor of two. It also consists of nine scaled copies of itself, each of which has been reduced by a factor of three. The relationship between these numbers is p = r^d, in which p is the number of pieces in the dissection, r is the reduction factor, and d = 2 is the "dimension" of the square. Something similar works for a cube, which we regard as a "three-dimensional object" because a similar equation holds with d = 3; see Figure 1.

(a) Explain why the "fractal dimension" of the Cantor set C is log₃ 2 ≈ 0.6309298. The Cantor set shows that there is a set of fractal dimension strictly between 0 and 1; see the notes for the construction of a set of fractal dimension strictly between 1 and 2.

Figure 1. A cube consists of (left) 8 copies of itself, each scaled down by a factor of 2; (center) 27 copies of itself, each scaled down by a factor of 3; (right) 64 copies of itself, each scaled down by a factor of 4.



(b) If you have two sets of positive fractal dimensions d_1 ≠ d_2, can you always construct a set whose fractal dimension is strictly between d_1 and d_2?

1963: Comments

Self-similarity. The Cantor set is self-similar: it is composed of two scaled copies of itself, each of which has been shrunk by a factor of three. The power-law relation between the number of pieces p, the reduction factor r, and the fractal dimension d is p = r^d; that is,

d = log p / log r.

At each stage of the Cantor set construction, the number of pieces is doubled and each is shrunk by a factor of 3. Thus, the fractal dimension of the Cantor set is

log 2 / log 3 ≈ 0.63093.

What about a fractal whose dimension is between 1 and 2? Take a solid equilateral triangle, subdivide it into four congruent equilateral triangles, and remove the central one. Iterate this process to obtain the Sierpiński triangle (Figure 2), named after Wacław Sierpiński (1882–1969). In particular, observe that the Sierpiński triangle resembles our diagram of the 3-adic integers; see the 1916 entry. This fractal is composed of three scaled copies of itself, each of which has been shrunk by a factor of two. Thus, its fractal dimension is

log 3 / log 2 ≈ 1.58496,

which is strictly between 1 and 2. For more information about fractals, see the 1961 and 1978 entries and [8].
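The similarity-dimension computation d = log p / log r is one line of arithmetic; a tiny helper (purely illustrative) makes the pattern for the examples above explicit:

```python
import math

# Self-similarity dimension: a set made of p copies of itself, each
# scaled down by a factor of r, has dimension d = log p / log r.
def similarity_dimension(pieces, reduction):
    return math.log(pieces) / math.log(reduction)

print(similarity_dimension(2, 3))  # Cantor set: ~0.63093
print(similarity_dimension(3, 2))  # Sierpinski triangle: ~1.58496
print(similarity_dimension(4, 2))  # square: 2, as expected
```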

Figure 2. Sierpiński triangle construction iterated eight times.





Figure 3. Illustration of Euclid's fifth postulate. Since 0 < α + β < π, one expects the red and blue lines to eventually intersect.

The parallel postulate. How does classical geometry relate to the independence of the continuum hypothesis? The story begins around 2,300 years ago, when Euclid of Alexandria (in modern Egypt) wrote the Elements, a monumental treatise on geometry and related topics. The Elements was an attempt to build geometry in a logical and rigorous manner from a few basic axioms. Although Euclid's book contains some mistakes and a few hidden assumptions, it is nonetheless a magnificent intellectual achievement. After defining everything from circles to isosceles triangles to rhomboids, Euclid presents his five postulates (axioms):

(a) A straight line segment can be drawn joining any two points.
(b) Any straight line segment can be extended indefinitely in a straight line.
(c) Given any straight line segment, a circle can be drawn having the segment as radius and one endpoint as center.
(d) All right angles are congruent.
(e) If two lines are drawn which intersect a third in such a way that the sum of the inner angles on one side is less than two right angles, then the two lines inevitably must intersect each other on that side if extended far enough.

The fifth postulate sticks out: it seems too complicated to accept as an axiom. Perhaps with sufficient work we can deduce it from the remaining axioms? Euclid himself must have been unsatisfied with his fifth postulate since he held off from using it until his twenty-ninth theorem (Proposition I.29). For over 2,000 years mathematicians tried unsuccessfully to prove that the fifth postulate followed from the other postulates and definitions. They all failed for a subtle reason: it is impossible to prove or disprove the fifth postulate, given only the truth of the other four! This is because we can produce two distinct versions of geometry, one in which the fifth postulate is true and another in which it is false. If you assume that Euclid’s fifth postulate is true, then your geometry is just plain-old plane geometry. If you assume that Euclid’s fifth postulate is false, then you are studying hyperbolic geometry, a type of curved geometry. The existence of curved geometries is not surprising to us in the 21st century, since we are used to hearing of relativity and “curved space-time.” However, this was once an extremely radical thought. Indeed, the philosopher




Figure 4. The parallel postulate fails in the Poincaré disk model of hyperbolic geometry. Given a line ℓ that does not contain p, there are infinitely many hyperbolic lines through p that are parallel to the given line.

Immanuel Kant (1724–1804) went so far as to say that "Euclidean geometry is the inevitable necessity of thought." The fifth postulate is often called the parallel postulate because it is equivalent to Playfair's axiom:

In a plane, given a line and a point not on it, at most one line parallel to the given line can be drawn through the point.

The Poincaré disk model of the hyperbolic plane is a geometry in which Euclid's first four postulates hold, but the parallel postulate fails; see Figure 4. The "points" in this geometry are elements of an open disk. The "lines" are arcs of circles that intersect the boundary circle orthogonally. This geometry satisfies the first four of Euclid's axioms, but not the fifth. See [6] for the whole story behind Euclidean and non-Euclidean geometry.

Bibliography

[1] P. Cohen, The independence of the continuum hypothesis, Proc. Nat. Acad. Sci. U.S.A. 50 (1963), 1143–1148, DOI 10.1073/pnas.50.6.1143. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC221287/. MR0157890
[2] P. J. Cohen, The independence of the continuum hypothesis. II, Proc. Nat. Acad. Sci. U.S.A. 51 (1964), 105–110, DOI 10.1073/pnas.51.1.105. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC300611/. MR0159745
[3] T. Y. Chow, A beginner's guide to forcing, Communicating Mathematics: A Conference in Honor of Joseph A. Gallian's 65th Birthday, Contemporary Mathematics 479, 25–40. http://arxiv.org/abs/0712.1320.
[4] L. Gillman, Two classical surprises concerning the axiom of choice and the continuum hypothesis, Amer. Math. Monthly 109 (2002), no. 6, 544–553, DOI 10.2307/2695444. http://www.maa.org/sites/default/files/pdf/upload_library/22/Ford/Gillman544-553.pdf. MR1908009



[5] K. Gödel, The Consistency of the Continuum Hypothesis, Annals of Mathematics Studies, no. 3, Princeton University Press, Princeton, N. J., 1940. MR0002514
[6] R. Hartshorne, Geometry: Euclid and beyond, Undergraduate Texts in Mathematics, Springer-Verlag, New York, 2000. MR1761093
[7] D. Hilbert, Über das Unendliche, Math. Ann. 95 (1926), 161–190. http://link.springer.com/article/10.1007%2FBF01206605. See also http://www.ams.org/journals/bull/1902-08-10/S0002-9904-1902-00923-3/S0002-9904-1902-00923-3.pdf.
[8] H.-O. Peitgen, H. Jürgens, and D. Saupe, Chaos and fractals: New frontiers of science, 2nd ed., with a foreword by Mitchell J. Feigenbaum, Springer-Verlag, New York, 2004. MR2031217
[9] P. Sarnak (ed.), Remembering Paul Cohen, Notices of the AMS 57 (2010), no. 7, 824–838.


Principles of Mathematical Analysis

Introduction

One of the most important contributions someone can make to mathematics is to encourage others to join and thrive in the field. Although there are many ways to do this, one way is through quality writing. A good textbook can circle the globe, edition after edition, reaching many generations. For example, Euclid's Elements remained in use for almost 2,000 years; see the notes for the 1963 entry.

One of the most prestigious prizes honoring such work is the Leroy P. Steele Prize for Mathematical Exposition. It was first given in 1993 to Walter Rudin (1921–2010) for his enormously influential books Principles of Mathematical Analysis [4] and Real and Complex Analysis [5]. These books have been used around the world for decades and have influenced countless mathematicians. They have survived into many editions. In fact, the reason this is the entry for 1964 and not 1953 is that this year marks the publication of the second edition of Principles. Many mathematicians profess their love for these books because of the challenging problems at the end of each chapter. On a personal note, the second named author remembers using the third edition of Principles as a sophomore at Yale. At the time he was on the fence between mathematics and physics. The joy of wrestling with Rudin's problems finally pushed him into the math camp.

Principles is such an omnipresent classic that one can hardly imagine the time, shortly after it was published, when it was just another new real analysis textbook. The original 1953 review from the Bulletin of the American Mathematical Society compared three contemporary real analysis books: Real Functions by Casper Goffman (1913–2006), H. P. Thielman's Theory of Functions of Real Variables, and Rudin's Principles of Mathematical Analysis; see Figure 1. The reviewer, M. E. Munroe, concluded:

Rudin's book is definitely the smoothest.
He lists his theorems in the most effective order for facilitating his arguments, and he invariably comes up with extremely neat proofs. [2]

Rudin’s books do have their detractors. His style, which was typical for the era, is terse. As generations of students have lamented, illustrations are notoriously absent from Principles. The first named author suspects that for each student turned on to mathematics by Principles’ style, another few are turned away. Perhaps the widespread use of Principles is one of the main reasons why real analysis is so frequently viewed as the “sink or swim” course by mathematics majors. As Herbert 275



Figure 1. Comparison of Rudin’s Principles of Real Analysis versus two competitors in the November 1953 issue of the Bulletin of the American Mathematical Society [2].

Wilf (1931–2012), who took undergraduate analysis at MIT under Rudin, said:

This course is famous for being our rite of passage. Our hazing ceremony. If you want to join the club, then here is the hurdle that you have to jump over. [6]

One Goodreads reviewer had the following humorous take: I have mixed feelings about this book. How to describe it. . . ok, let’s talk kung-fu movies. So there’s a standard trope in martial arts movies where the young apprentice shows up at the stoop of the Old Master and says, “teach me to fight”. And the Old Master decides that instead of doing the obvious thing and having our hapless padawan practice something reasonable like, you know, punching techniques, the Old Master tells the aspirant to do a series of incomprehensible and difficult tasks. Carrying the Old Master up and down the mountains. Knitting sweaters while hanging upside-down over hot coals. Doing the Old Master’s laundry. And so on. Usually, it’s never clear if the training is difficult because Sensei is trying to impart some kind of deeper wisdom or if he’s really just a resentful old jerk who takes pleasure in making young students suffer. Principles of Mathematical Analysis is the Old Master. It is completely uncompromising—no diagrams, the proofs are often opaque, the definitions unmotivated—and the book carries more than a whiff



of that sadistic strain in math education that sees formal rigor and a lack of justification as a kind of intellectual machismo. [3]

Centennial Problem 1964
Proposed by Steven J. Miller, Williams College.

All these years later, I still remember Problems 16, 17, and 18 from Chapter 3 of Principles. This was my first introduction to Newton's method, and I remember being amazed at being able to prove how rapidly convergence set in when computing square roots in Problem 16. Problem 17 involved a significantly slower method for finding square roots and Problem 18 was the generalization of Problem 16 to pth roots. I went to the office of my professor, Peter Jones (1952– ), to talk about these further. Although the problem below is somewhat standard, I've chosen to use it because of the impact these three problems had on me. I strongly urge any reader not familiar with these books to pick up a copy, read on, and try the exercises.

Exercise #16, Chapter 3 (third edition). Fix a positive number α. Choose x_1 > √α, and define x_2, x_3, x_4, . . . by the recursion formula

x_{n+1} = (1/2)(x_n + α/x_n).

(a) Prove that {x_n} decreases monotonically and that lim x_n = √α.

(b) Put ε_n = x_n − √α, and show that

ε_{n+1} = ε_n²/(2x_n) < ε_n²/(2√α),

so that, setting β = 2√α,

ε_{n+1} < β(ε_1/β)^{2^n}    (n = 1, 2, 3, . . .).

(c) This is a good algorithm for computing square roots, since the recursion formula is simple and the convergence is extremely rapid. For example, if α = 3 and x_1 = 2, show that ε_1/β < 1/10 and that therefore

ε_5 < 4 · 10^{−16},    ε_6 < 4 · 10^{−32}.
1964: Comments

The axiom of choice. We were so busy in the 1963 entry discussing the continuum hypothesis that we never had a chance to say anything substantial about the axiom of choice! That is a much more exciting topic than debating the merits of Rudin's Principles of Mathematical Analysis.

In the proof of the existence of Vitali sets and in the derivation of the Banach–Tarski paradox (see the 1924 entry), we had an equivalence relation on a set. We produced a new set by selecting one element from each equivalence class. This step implicitly appeals to the axiom of choice (AC):

Axiom of Choice. If {X_α}_{α∈I} is a nonempty collection of nonempty sets, then there is a function f : I → ∪_{α∈I} X_α such that f(α) ∈ X_α for all α ∈ I.



This axiom, stated above in familiar mathematical terminology (as opposed to purely symbolically), is not one of the axioms of Zermelo–Fraenkel set theory (ZF). The axiom system obtained by augmenting ZF with AC is abbreviated ZFC. The function f "chooses" one element f(α) ∈ X_α for each α ∈ I. This additional axiom of set theory is needed whenever one needs to make infinitely many choices without a definite procedure in place to do so. Think of each "choice" as a "step" in a proof. Finitely many choices can be made in a proof of finite length. If infinitely many choices must be made, then we need AC to accomplish this in "one step" unless there is a definite procedure to make the choices automatically. The axiom of choice is used implicitly in statements such as the following:

Suppose that X_1, X_2, . . . are nonempty sets. Let x_1, x_2, . . . be a sequence such that x_n ∈ X_n for n = 1, 2, . . ..

Without further knowledge about the sets X_n, the axiom of choice is required to assert that the sequence x_1, x_2, . . . exists. What do we mean by "further knowledge"? For example, AC is not required for the following statement:

Suppose that X_1, X_2, . . . are nonempty subsets of N. Let x_1, x_2, . . . be the sequence defined by x_n = min X_n for n = 1, 2, . . ..

Here we have used the fact that N is well-ordered: a nonempty subset of N contains a smallest element. This does not require the axiom of choice because we have provided a definite rule for producing each x_n.

Suppose that a caterpillar with infinitely many pairs of legs is getting dressed. It has infinitely many pairs of shoes and infinitely many pairs of socks. For each pair of legs, the caterpillar can put on the left shoe first, then the right. The caterpillar is unable to wear its socks without the axiom of choice, since infinitely many choices need to be made without the aid of a procedure for making the selection. Since the socks are indistinguishable, an arbitrary choice must be made for each pair.
For more about the axiom of choice, see the comments for the 1999 entry.

Cauchy functional equation. In 1905, Georg Hamel (1877–1954) used the axiom of choice to prove that not every solution f : R → R to the Cauchy functional equation f(x + y) = f(x) + f(y) is of the form f(x) = cx for some c ∈ R. Can you prove this?

A surprising isomorphism. Another cute application of the axiom of choice is the following: the groups (R, +) and (R², +) are isomorphic. That is, there is a bijection φ : R → R² such that φ(x + y) = φ(x) + φ(y) for all x, y ∈ R. Can you prove this?



Bibliography

[1] G. Hamel, Eine Basis aller Zahlen und die unstetigen Lösungen der Funktionalgleichung: f(x + y) = f(x) + f(y) (German), Math. Ann. 60 (1905), no. 3, 459–462, DOI 10.1007/BF01457624. MR1511317
[2] M. E. Munroe, Book Review: Real functions // Book Review: Principles of mathematical analysis // Book Review: Theory of functions of real variables, Bull. Amer. Math. Soc. 59 (1953), no. 6, 572–577, DOI 10.1090/S0002-9904-1953-09765-8. MR1565532
[3] M. Needham, Review of Principles of Mathematical Analysis, https://www.goodreads.com/review/show/1271096254?book_show_action=true&from_review_page=1.
[4] W. Rudin, Principles of mathematical analysis, McGraw-Hill Book Company, Inc., New York-Toronto-London, 1953. MR0055409
[5] W. Rudin, Real and complex analysis, McGraw-Hill Book Co., New York-Toronto, Ont.-London, 1966. MR0210528
[6] H. S. Wilf, Epsilon sandwiches, https://www.math.upenn.edu/~wilf/website/MAASpeech.


Fast Fourier Transform

Introduction

There are many important milestones in our efforts to find better and faster ways to solve problems. One of the most important is the fast Fourier transform (FFT), developed in 1965 by James William Cooley (1926–2016) and John Tukey (1915–2000). It is (unintentionally) based upon tools first developed by Carl Friedrich Gauss in 1805 to calculate the coefficients in a trigonometric expansion related to the trajectories of two asteroids. The FFT has had a tremendous impact upon the engineering community, particularly in the field of digital signal processing.

A discrete periodic function with period n can be thought of as a function whose domain is the cyclic group Z/nZ. Such functions arise naturally not only in abstract algebra and number theory, but also in many real-world applications. For example, a real- or complex-valued function on Z/nZ can be regarded as the discretization of a continuous, periodic function; see Figure 1. The discrete Fourier transform (DFT) of f : Z/nZ → C is the function f̂ : Z/nZ → C defined by

f̂(j) = (1/√n) Σ_{k=0}^{n−1} f(k) e^{−2πijk/n},

in which i² = −1. The DFT is used to analyze the strength of the "signal" f at various frequencies. The normalization 1/√n is not universal: 1 and 1/n are also used. Since periodic functions arise anytime there are waves or vibrations, the DFT is used to analyze everything from radio waves to earthquakes.

The FFT reduces the number of computations required to compute the discrete Fourier transform from O(n²) to O(n log n). Since n log n tends to infinity much more slowly than n², the FFT provides a huge savings when n is large. This illustrates that a problem which appears to require a certain amount of time or effort may be susceptible to a more clever, faster approach; see the 2002 entry for another striking example. Our problem for this year involves such a problem: how fast can one multiply two matrices?

We must be specific about how we measure the speed of an algorithm. Addition is somewhat faster than multiplication on a computer, so one often counts the number of multiplications required by an algorithm as a measure of its approximate runtime. In particular, one wants to know how the algorithm performs as the size of the input increases.
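The agreement between the O(n²) sum above and the O(n log n) FFT is easy to check in code. Here is a sketch assuming NumPy is available; note that np.fft.fft omits the 1/√n factor, so we rescale to match the normalization used here:

```python
import numpy as np

# Naive O(n^2) DFT with the 1/sqrt(n) normalization used in the text.
def dft(f):
    n = len(f)
    j = np.arange(n)
    W = np.exp(-2j * np.pi * np.outer(j, j) / n)  # W[j, k] = e^{-2*pi*i*j*k/n}
    return W @ f / np.sqrt(n)

rng = np.random.default_rng(0)
f = rng.standard_normal(1024) + 1j * rng.standard_normal(1024)

# NumPy's FFT computes the same sum in O(n log n); rescale to compare.
assert np.allclose(dft(f), np.fft.fft(f) / np.sqrt(len(f)))
print("naive DFT and FFT agree")
```

For n in the millions the naive quadratic sum is hopeless, while the FFT finishes in a fraction of a second; only the operation count differs, not the answer.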



(a) A periodic function with period 1.

(b) A discretization (with 100 sample points) over one period of the periodic function above.

Figure 1. To analyze a periodic function, one can sample the function over one period at n evenly spaced points. The resulting discretized function can be regarded as a function on Z/nZ.

Centennial Problem 1965
Proposed by Steven J. Miller, Williams College, and Bree Yeates, Emporia State University.

If A and B are n × n matrices, then there are n² entries in the product AB. Each entry apparently requires n multiplications and n − 1 additions to compute. Thus, computing AB can be done with n³ multiplications. Show that we can cleverly group terms and compute the four entries in the product of two 2 × 2 matrices with just seven multiplications (and 18 additions).

1965: Comments

Matrix multiplication. The method suggested by the centennial problem can be iterated to provide an algorithm for multiplying two n × n matrices with only O(n^{2.8074}) multiplications. The exponent log₂ 7 ≈ 2.8074, which improves upon the log₂ 8 = 3 provided by the naive algorithm, reflects the fact that only seven



Figure 2. Best known exponents for matrix multiplication (extrapolated from the image https://commons.wikimedia.org/wiki/File:Bound_on_matrix_multiplication_omega_over_time.svg which is in the public domain).
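One grouping that achieves seven multiplications (Strassen's identities, so a spoiler for the centennial problem; readers may wish to attempt it first) can be checked directly. The entries below are numbers, but the same identities apply when a, b, . . . , h are themselves matrix blocks, which is what makes the recursion possible:

```python
# Strassen's seven-multiplication scheme for 2x2 matrices.
def strassen_2x2(A, B):
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    # Reassemble the four entries of AB from the seven products.
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(strassen_2x2(A, B))  # agrees with the usual row-by-column product
```

Applying these identities recursively to 2 × 2 block matrices is exactly what yields the O(n^{log₂ 7}) algorithm discussed below.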

multiplications (instead of eight) are required with each iteration. This method of matrix multiplication is known as the Strassen algorithm, due to Volker Strassen (1936– ) [6]. Since its introduction in 1969, there have been many incremental improvements; see Figure 2. The current world record is an O(n^{2.3728639}) algorithm due to François Le Gall in 2014 [3]. For the most part these algorithms are only of theoretical interest since their numerical stability is inferior to that of the naive method. Moreover, the constants hidden by the Big-O notation can be prohibitively large. On the other hand, the Strassen algorithm can be used effectively over finite fields, in which numerical accuracy is irrelevant because the computations are performed exactly [7].

Fourier matrix. Let ζ = exp(2πi/n). The matrix representation of the n-point DFT with respect to the standard basis of Cⁿ is the complex conjugate of

             ⎡ 1   1          1           ···  1            ⎤
             ⎢ 1   ζ          ζ²          ···  ζ^{n−1}      ⎥
F_n = (1/√n) ⎢ 1   ζ²         ζ⁴          ···  ζ^{2(n−1)}   ⎥
             ⎢ ⋮   ⋮          ⋮           ⋱    ⋮            ⎥
             ⎣ 1   ζ^{n−1}    ζ^{2(n−1)}  ···  ζ^{(n−1)²}   ⎦
Table 1. The eigenvalue multiplicities of the n × n Fourier matrix F_n depend upon n (mod 4).

    n     |  +1    −1    i     −i
    4k    |  k+1   k     k     k−1
    4k+1  |  k+1   k     k     k
    4k+2  |  k+1   k+1   k     k
    4k+3  |  k+1   k+1   k+1   k

This is the Fourier matrix of order n. It is unitary, meaning that F_n^{−1} = F_n^*, in which F_n^* is the conjugate transpose of F_n. Some computations with finite geometric series confirm that F_n² is the permutation matrix that sends the basis vector indexed by j to the one indexed by −j (mod n), and hence F_n⁴ = I. Thus, the eigenvalues of F_n are among 1, −1, i, −i; see Table 1. Since the trace of a matrix is the sum of its eigenvalues, repeated according to multiplicity, the multiplicities of the eigenvalues of F_n can be deduced from the evaluation of the quadratic Gauss sum

\sum_{k=0}^{n-1} \zeta^{k^2} =
\begin{cases}
(1+i)\sqrt{n} & \text{if } n \equiv 0 \pmod{4},\\
\sqrt{n} & \text{if } n \equiv 1 \pmod{4},\\
0 & \text{if } n \equiv 2 \pmod{4},\\
i\sqrt{n} & \text{if } n \equiv 3 \pmod{4}.
\end{cases}

The preceding formula is not at all obvious! Although the magnitude of the sum can be found relatively easily, its argument is much harder to pin down. As Gauss confided to Wilhelm Olbers (1758–1840) in 1805 [4]:

  ...the determination of the sign, is exactly what has tortured me all the time. This shortcoming spoiled everything else that I found; and hardly a week passed during the last four years where I have not made this or that vain attempt to untie that knot—especially vigorously during recent times. But all this brooding and searching was in vain, sadly I had to put the pen down again. Finally, a few days ago, it has been achieved—but not by my cumbersome search, rather through God's good grace, I am tempted to say. As the lightning strikes the riddle was solved; I myself would be unable to point to a guiding thread between what I knew before, what I had used in my last attempts, and what made it work. Curiously enough the solution now appears to me to be easier than many other things that have not detained me as many days as this one has years, and surely no one to whom I will once explain the material will get an idea of the tight spot into which this problem had locked me for so long.
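These facts are easy to probe numerically. The sketch below is ours (NumPy; not from the original text): it builds F_n for n = 8, confirms unitarity and F_n⁴ = I, and checks the eigenvalue multiplicities of Table 1 along with the Gauss sum value for n ≡ 0 (mod 4).

```python
import numpy as np

def fourier_matrix(n):
    """The n x n matrix F_n with (j, k) entry zeta^(jk) / sqrt(n)."""
    zeta = np.exp(2j * np.pi / n)
    j = np.arange(n)
    return zeta ** np.outer(j, j) / np.sqrt(n)

n = 8  # n = 4k with k = 2
F = fourier_matrix(n)

# F_n is unitary and F_n^4 = I.
assert np.allclose(F @ F.conj().T, np.eye(n))
assert np.allclose(np.linalg.matrix_power(F, 4), np.eye(n))

# Eigenvalue multiplicities (Table 1, row n = 4k with k = 2):
# +1 occurs k+1 = 3 times, -1 occurs k = 2, i occurs k = 2, -i occurs k-1 = 1.
eigs = np.linalg.eigvals(F)
mult = [int(np.sum(np.isclose(eigs, w))) for w in (1, -1, 1j, -1j)]
assert mult == [3, 2, 2, 1]

# The quadratic Gauss sum for n = 8 = 0 (mod 4) equals (1 + i) sqrt(8).
gauss = sum(np.exp(2j * np.pi * k * k / n) for k in range(n))
assert np.isclose(gauss, (1 + 1j) * np.sqrt(n))
```

Note that the multiplicities sum to n and that the trace 3 − 2 + 2i − i = 1 + i matches (1 + i)√n / √n, exactly as the trace argument predicts.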

Horner’s method. The FFT and Strassen algorithm provide much more rapid methods for performing important computations than the naive approaches suggested by the definitions. Although we refrain from discussing the technical details of these algorithms, we can at least discuss a simpler algorithm for the rapid evaluation of polynomials. This hints at the sort of creative thinking and clever repackaging that is often required to “beat” the approach suggested by definitions.



Although it appears that evaluating f(x) = a_n x^n + a_{n−1} x^{n−1} + ⋯ + a_1 x + a_0 requires

1 + 2 + \cdots + n = \frac{n(n+1)}{2} = O(n^2)

multiplications and n additions, there is a faster approach. Horner's method, named after William George Horner (1786–1837), was known to Chinese mathematicians over 2,000 years ago. It computes f(x) as follows:

f(x) = a_0 + x(a_1 + x(a_2 + \cdots + x(a_{n-1} + a_n x))).

This requires only n multiplications and n additions, an order of magnitude savings over the naive method. We can sometimes do even better if the polynomial has a special form. For example, the evaluation of x^n can be done in at most 2 log₂ n steps; see the 1977 and 1996 entries, which concern applications of fast multiplication to encryption and the search for large prime numbers.

Bibliography
[1] C. Burrus, Fast Fourier Transforms. http://cnx.org/content/col10550/1.22/
[2] M. T. Heideman, D. H. Johnson, and C. S. Burrus, Gauss and the history of the fast Fourier transform, Arch. Hist. Exact Sci. 34 (1985), no. 3, 265–277, DOI 10.1007/BF00348431. MR815154
[3] F. Le Gall, Powers of tensors and fast matrix multiplication, ISSAC 2014—Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation, ACM, New York, 2014, pp. 296–303, DOI 10.1145/2608628.2608664. MR3239939
[4] S. J. Patterson, Gauss sums, The shaping of arithmetic after C. F. Gauss's Disquisitiones arithmeticae, Springer, Berlin, 2007, pp. 505–528, DOI 10.1007/978-3-540-34720-0_19. MR2284818
[5] J. M. Pollard, The fast Fourier transform in a finite field, Math. Comp. 25 (1971), 365–374, DOI 10.2307/2004932. http://www.ams.org/journals/mcom/1971-25-114/S0025-5718-1971-0301966-0/S0025-5718-1971-0301966-0.pdf. MR0301966
[6] V. Strassen, Gaussian elimination is not optimal, Numer. Math. 13 (1969), 354–356, DOI 10.1007/BF02165411. MR0248973
[7] Wikipedia, Matrix multiplication algorithm, https://en.wikipedia.org/wiki/Matrix_multiplication_algorithm.
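Both speed-ups from the 1965 entry fit comfortably in a few lines of code. The sketch below is ours; the seven-product grouping shown is one standard choice of Strassen's identities, not necessarily the grouping the proposers had in mind.

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 multiplications and 18 additions."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    p1 = a * (f - h)
    p2 = (a + b) * h
    p3 = (c + d) * e
    p4 = d * (g - e)
    p5 = (a + d) * (e + h)
    p6 = (b - d) * (g + h)
    p7 = (a - c) * (e + f)
    return [[p5 + p4 - p2 + p6, p1 + p2],
            [p3 + p4, p1 + p5 - p3 - p7]]

def horner(coeffs, x):
    """Evaluate a0 + a1*x + ... + an*x^n with only n multiplications.

    coeffs = [a0, a1, ..., an].
    """
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result

A, B = [[1, 2], [3, 4]], [[5, 6], [7, 8]]
naive = [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
assert strassen_2x2(A, B) == naive == [[19, 22], [43, 50]]

# f(x) = 1 + 2x + 3x^2 at x = 10: 1 + 20 + 300 = 321.
assert horner([1, 2, 3], 10) == 321
```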


Class Number One Problem

Introduction

A binary, integral quadratic form is a function Q : Z² → Z of the form

Q(x, y) = ax² + bxy + cy²,   (1966.1)

in which a, b, c are integers. We require that a and c be nonzero to avoid trivialities, and we often drop the adjectives "binary" and "integral" in what follows. Despite their simple appearance, quadratic forms have an incredibly rich structure. Carl Friedrich Gauss developed much of the theory of quadratic forms in his landmark book Disquisitiones Arithmeticae.

Two quadratic forms Q and Q′ are equivalent if Q′(x, y) = Q(αx + βy, γx + δy), in which α, β, γ, δ ∈ Z and αδ − βγ = ±1; that is, (αx + βy, γx + δy) is related to (x, y) by a 2 × 2 integer matrix whose determinant is 1 or −1. The discriminant of (1966.1) is D = b² − 4ac. One can show that equivalent forms share the same discriminant. We denote the number of equivalence classes of quadratic forms with discriminant D by h(D); this is the class number of D (see Table 1). For the sake of simplicity, we assume throughout the following that D < 0, in which case Q is positive definite: Q(x, y) > 0 for all (x, y) ∈ Z² with (x, y) ≠ (0, 0).

Gauss proved that h(D) is always finite and discovered that the set of equivalence classes of quadratic forms of discriminant D forms an abelian group of order h(D) under a complicated operation now known as "Gauss composition." This was illuminated by Fields Medalist Manjul Bhargava, who discovered many higher-order composition laws. In particular, the composition of quadratic forms can now be conveniently represented with so-called Bhargava cubes [2].

Gauss's legendary class number one problem asserts that D > 0 satisfies h(−D) = 1 if and only if

D ∈ {3, 4, 7, 8, 11, 12, 16, 19, 27, 28, 43, 67, 163}.

It is more convenient these days to treat things in terms of imaginary quadratic fields Q[√−D] instead of quadratic forms. Consequently, it suffices to consider only square-free D since removing a perfect-square divisor of D results in the same field. In this context, Q[√−D] is said to have class number one if its "ideal class group" is trivial. This occurs if and only if the ring of integers in Q[√−D] is a



Table 1. Class numbers for the first 100 imaginary quadratic fields Q[√−D]. The boldface entries correspond to Gauss's list (1966.2).

    D   h(−D)   D   h(−D)   D   h(−D)   D    h(−D)   D    h(−D)   D    h(−D)
    1   1       33  4       66  8       97   4       131  5       165  8
    2   1       34  4       67  1       101  14      133  4       166  10
    3   1       35  2       69  8       102  4       134  14      167  11
    5   2       37  2       70  4       103  5       137  8       170  12
    6   2       38  6       71  7       105  8       138  8       173  14
    7   1       39  4       73  4       106  6       139  3       174  12
    10  2       41  8       74  10      107  3       141  8       177  4
    11  1       42  4       77  8       109  6       142  4       178  8
    13  2       43  1       78  4       110  12      143  10      179  5
    14  4       46  4       79  5       111  8       145  8       181  10
    15  2       47  5       82  4       113  8       146  16      182  12
    17  4       51  2       83  3       114  8       149  14      183  8
    19  1       53  6       85  4       115  2       151  7       185  16
    21  4       55  4       86  10      118  6       154  8       186  12
    22  2       57  4       87  6       119  10      155  4       187  2
    23  3       58  2       89  12      122  10      157  6       190  4
    26  6       59  3       91  2       123  2       158  8       191  13
    29  6       61  6       93  4       127  5       159  10      193  4
    30  4       62  8       94  8       129  12      161  16      194  20
    31  3       65  8       95  8       130  4       163  1       195  4

unique factorization domain. An equivalent form of Gauss's conjecture claims that Q(√−D) with D > 0 has class number one if and only if

D ∈ {1, 2, 3, 7, 11, 19, 43, 67, 163}.   (1966.2)
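As a sanity check on Table 1, the class number of a negative discriminant can be computed by counting reduced forms. The sketch below is ours (not from the text); it uses the standard reduction conditions −a < b ≤ a ≤ c, with b ≥ 0 whenever a = c.

```python
def class_number(d):
    """Count reduced forms ax^2 + bxy + cy^2 of discriminant d < 0."""
    h = 0
    a = 1
    while 3 * a * a <= -d:  # reduction forces a <= sqrt(-d/3)
        for b in range(-a + 1, a + 1):
            if (b * b - d) % (4 * a) == 0:
                c = (b * b - d) // (4 * a)
                if c >= a and (b >= 0 or a != c):
                    h += 1
        a += 1
    return h

# Field discriminant of Q(sqrt(-D)): -D if D = 3 (mod 4), else -4D.
disc = lambda D: -D if D % 4 == 3 else -4 * D

assert class_number(disc(5)) == 2    # matches Table 1
assert class_number(disc(23)) == 3
assert all(class_number(disc(D)) == 1
           for D in (1, 2, 3, 7, 11, 19, 43, 67, 163))
```

The last assertion recovers Gauss's list (1966.2) for the discriminants in question; of course it proves nothing about larger D, which is exactly where Baker, Stark, and Heegner enter the story.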


In 1966, Alan Baker (1939–2018) and Harold Stark (1939– ) independently proved Gauss’s conjecture. Their methods were strikingly different: Baker used his theory of logarithmic forms [1], whereas Stark studied L-functions and certain Diophantine equations [6]. Stark’s approach was similar to an attempt of Kurt Heegner (1893–1965) from 1952 [4], which contained a gap that prevented its general acceptance for several years [7]. Baker was awarded the Fields Medal in 1970, in part for his work on the class number one problem; see the 1935 entry for information about Baker’s work on Hilbert’s seventh problem.

Centennial Problem 1966
Proposed by Kyle Pratt, Brigham Young University.

Show that the quadratic form Q(x, y) = x² + 7y² represents infinitely many primes. The first few primes of this form are 7, 11, 23, 29, 37, 43, 53, 67, 71, 79, 107, 109, 113, 127, 137, 149, 151, 163, 179.
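A brute-force check of this list (our sketch, not part of the problem) confirms the congruence pattern that the hint below exploits:

```python
from math import isqrt

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, isqrt(n) + 1))

def rep_x2_7y2(p):
    """Is p = x^2 + 7y^2 for some integers x, y?"""
    y = 0
    while 7 * y * y <= p:
        r = p - 7 * y * y
        if isqrt(r) ** 2 == r:
            return True
        y += 1
    return False

listed = [7, 11, 23, 29, 37, 43, 53, 67, 71, 79, 107, 109, 113,
          127, 137, 149, 151, 163, 179]
found = [p for p in range(2, 180) if is_prime(p) and rep_x2_7y2(p)]
assert found == listed

# For odd primes the criterion is p = 7 or p congruent to 1, 2, 4 (mod 7);
# note 1, 2, 4 are exactly the nonzero squares modulo 7.
assert all(p == 7 or p % 7 in (1, 2, 4) for p in found)
```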



Hint: For a prime p, show that p = x² + 7y² if and only if p = 7 or (−7/p) = 1, in which (·/p) denotes the Legendre symbol. Then use the fact that h(−28) = 1. See the notes for a complete solution.

1966: Comments

Primes of the form x² + dy². The study of primes of the form x² + dy² has a long and storied history [3]. Fermat showed that a prime p is of the form x² + y² if and only if p = 2 or p ≡ 1 (mod 4). Here is a short explanation. Since 2 = 1² + 1², it suffices to consider odd primes p. If p ≡ 1 (mod 4), then the method discussed in the comments for the 1923 entry implies that p is the sum of two squares. On the other hand, any square is congruent to 0 or 1 (mod 4). Thus, a sum of two squares cannot be congruent to 3 (mod 4). Similar criteria are available for many other small values of d:
• p = x² + 2y² iff p = 2 or p ≡ 1, 3 (mod 8).
• p = x² + 3y² iff p = 3 or p ≡ 1 (mod 3).
• p = x² + 5y² iff p ≡ 1, 9 (mod 20).
• p = x² + 6y² iff p ≡ 1, 7 (mod 24).
• p = x² + 7y² iff p = 7 or p ≡ 1, 2, 4 (mod 7) (for p odd).

Sums of squares. Let r(n) denote the number of decompositions of n as the sum of two squares. We count decompositions as distinct even if they differ only in sign or order. For instance, since

5 = (±2)² + (±1)² = (±1)² + (±2)²,

Figure 1. Plot of r(n) for 1 ≤ n ≤ 1,000,000. The maximum value of r(n) in this range belongs to n = 801,125 = 5³ · 13 · 17 · 29.



we say that r(5) = 8. If p is prime, then one can show that

r(p) =
\begin{cases}
4 & \text{if } p = 2,\\
8 & \text{if } p \equiv 1 \pmod{4},\\
0 & \text{if } p \equiv 3 \pmod{4}.
\end{cases}

More generally, if n = 2^a m₁ m₂, in which m₁ is a product of primes of the form 4k + 1 and m₂ is a product of primes of the form 4k + 3, then

r(n) =
\begin{cases}
0 & \text{if } m_2 \text{ is not a perfect square},\\
4d(m_1) & \text{if } m_2 \text{ is a perfect square}.
\end{cases}

Here d(m₁) denotes the number of divisors of m₁. Evidently, the behavior of r(n) is erratic; see Figure 1. On the other hand, the arithmetic mean

A_n = \frac{r(0) + r(1) + \cdots + r(n)}{n+1}

of r(n) does something remarkable:

\lim_{n \to \infty} A_n = \pi.   (1966.3)



Why does (1966.3) hold? First, observe that r(n) equals the number of points in Z² that lie on the circle x² + y² = n with center (0, 0) and radius √n. Consequently,

r(0) + r(1) + \cdots + r(n)   (1966.4)

is the number of points in Z² that lie in the disk

D_n = \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 \le n\}

of radius √n centered at the origin. Thus, (1966.3) says that for large n, the expression (1966.4) is approximately equal to the area πn of the disk D_n. For each point of D_n ∩ Z², we associate the square of area 1 of which it forms the lower left-hand corner.¹ The area of the region R_n that is formed by the union of these squares is (1966.4). In certain places R_n extends past the boundary of D_n, while in other places D_n extends beyond R_n. Since the squares have diagonal √2, it follows that the region R_n is contained in the disk of radius √n + √2 centered at the origin. On the other hand, the region R_n contains the disk of radius √n − √2 centered at the origin; see Figure 2. Consequently,

\pi(\sqrt{n} - \sqrt{2})^2 \le r(0) + r(1) + \cdots + r(n) \le \pi(\sqrt{n} + \sqrt{2})^2,

from which it follows that

\pi + \frac{\pi - 2\pi\sqrt{2n}}{n+1} \le A_n \le \pi + \frac{\pi + 2\pi\sqrt{2n}}{n+1}.

Take the limit as n → ∞ to obtain (1966.3). In fact, the preceding inequalities tell us that A_n = π + O(1/√n), so the convergence is relatively slow. For example, A_{1,000,000} = 3.141545858..., which is only accurate to four decimal places.

¹Any other corner, or even the center of the square, would work too. The important thing is to pick a convention and remain consistent.
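The counting argument above is easy to reproduce; the code below (our sketch) counts the lattice points in D_n directly and illustrates the slow convergence of A_n to π.

```python
from math import isqrt, pi

def r(n):
    """Number of pairs (x, y) in Z^2 with x^2 + y^2 = n."""
    total = 0
    for x in range(-isqrt(n), isqrt(n) + 1):
        rem = n - x * x
        y = isqrt(rem)
        if y * y == rem:
            total += 2 if y > 0 else 1  # count both signs of y
    return total

def lattice_points_in_disk(n):
    """r(0) + r(1) + ... + r(n): lattice points with x^2 + y^2 <= n."""
    return sum(2 * isqrt(n - x * x) + 1
               for x in range(-isqrt(n), isqrt(n) + 1))

assert r(5) == 8 and r(3) == 0
assert lattice_points_in_disk(5) == sum(r(k) for k in range(6)) == 21

n = 100_000
A_n = lattice_points_in_disk(n) / (n + 1)
assert abs(A_n - pi) < 0.03  # consistent with the O(1/sqrt(n)) error bound
```

Counting column-by-column makes the disk count an O(√n) computation, so n can be taken quite large despite r(n) itself being expensive to tabulate term by term.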



Figure 2. The region R_n is contained inside of the disk of radius √n + √2 that is centered at the origin. It contains the disk of radius √n − √2 that is centered at the origin.

Solution to the problem. Fix a prime p ≠ 7. It suffices to show that

p = x² + 7y²  if and only if  (−7/p) = 1.

Indeed, quadratic reciprocity ensures that (−7/p) = 1 if and only if p ≡ 1, 2, 4 (mod 7), and Dirichlet's theorem on primes in arithmetic progressions yields infinitely many primes congruent to 1, 2, or 4 modulo 7; see the 1913 entry.

Suppose that p = x² + 7y²; in particular, this implies that gcd(x, p) = gcd(y, p) = 1. Since x² = p − 7y², it follows that

x² ≡ −7y² (mod p),  and hence  (xy⁻¹)² ≡ −7 (mod p),

so that (−7/p) = 1.

Conversely, suppose that (−7/p) = 1. Then there is a b′ such that −7 ≡ (b′)² (mod p), which implies that −28 ≡ (2b′)² (mod 4p). In particular, observe that −28 is the discriminant of Q(x, y) = x² + 7y². Then

b = 2b′  and  c = \frac{b^2 + 28}{4p}

are integers. Since h(−28) = 1 and because the discriminant of the quadratic form Q′(x, y) = px² + bxy + cy² is −28, it follows that Q′ is equivalent to Q. Since Q′(1, 0) = p, it follows that Q also represents p; that is, p is of the form x² + 7y².

Bibliography
[1] A. Baker, Linear forms in the logarithms of algebraic numbers. IV, Mathematika 15 (1968), 204–216, DOI 10.1112/S0025579300002588. MR0258756
[2] M. Bhargava, Higher composition laws. I. A new view on Gauss composition, and quadratic generalizations, Ann. of Math. (2) 159 (2004), no. 1, 217–250, DOI 10.4007/annals.2004.159.217. MR2051392
[3] D. A. Cox, Primes of the form x² + ny²: Fermat, class field theory, and complex multiplication, 2nd ed., Pure and Applied Mathematics (Hoboken), John Wiley & Sons, Inc., Hoboken, NJ, 2013. MR3236783
[4] K. Heegner, Diophantische Analysis und Modulfunktionen (German), Math. Z. 56 (1952), 227–253, DOI 10.1007/BF01174749. MR0053135
[5] D. Shanks, On Gauss's class number problems, Math. Comp. 23 (1969), 151–163, DOI 10.2307/2005064. http://www.ams.org/journals/mcom/1969-23-105/S0025-5718-1969-0262204-1/S0025-5718-1969-0262204-1.pdf. MR0262204
[6] H. Stark, On complex quadratic fields with class number equal to one, Trans. Amer. Math. Soc. 122 (1966), 112–119, DOI 10.2307/1994504. http://www.ams.org/journals/tran/1966-122-01/S0002-9947-1966-0195845-4/S0002-9947-1966-0195845-4.pdf. MR0195845
[7] H. M. Stark, On the "gap" in a theorem of Heegner, J. Number Theory 1 (1969), 16–27, DOI 10.1016/0022-314X(69)90023-7. MR0241384
[8] H. M. Stark, A complete determination of the complex quadratic fields of class-number one, Michigan Math. J. 14 (1967), 1–27.
[9] H. M. Stark, The Gauss class-number problems, Analytic number theory, Clay Math. Proc., vol. 7, Amer. Math. Soc., Providence, RI, 2007, pp. 247–256. http://www.claymath.org/publications/Gauss_Dirichlet/stark.pdf. MR2362205


The Langlands Program

Introduction

A handwritten letter from Robert Langlands (1936– ) to André Weil begins modestly:

  While trying to formulate clearly the question I was asking you before Chern's talk I was led to two more general questions. Your opinion of these questions would be appreciated. I have not had a chance to think over these questions seriously and I would not ask them except as the continuation of a casual conversation. I hope you will treat them with the tolerance they require at this stage. After I have asked them I will comment briefly on their genesis. [5]

This 1967 letter was a tour de force, a manifesto that would shape the next half-century (and more) of number theory. The main characters in Langlands's drama are automorphic forms: functions on a topological space that are invariant under a discrete group of symmetries (the actual definition is much longer and more technical). There are two crucial supporting characters. First we have Galois representations; these are homomorphisms Gal(Q̄/Q) → GL_n(C), in which Q̄ is the algebraic closure of Q and GL_n(C) is the group of n × n invertible complex matrices. This Galois group is one of the richest objects in algebraic number theory and describing its representations is a complicated problem. We also have L-functions, of which the simplest example is the Riemann zeta function:

\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_{p \text{ prime}} \left(1 - \frac{1}{p^s}\right)^{-1}.   (1967.1)

The equality of the sum and the product is the famed Euler product formula; see the 1933 entry. Every L-function can be written as a product over primes in this way. They extend meromorphically to s ∈ C, with a certain symmetry with respect to s → 1 − s. Miraculously, L-functions encode all kinds of data, from the distribution of prime numbers to the number of points on algebraic varieties. The Langlands program, to put it roughly, aims to show that behind every Galois representation or L-function there is an automorphic form. For example, suppose that we have an L-function

L(s) = \prod_{p \text{ prime}} \left(1 - \frac{\alpha_p}{p^s}\right)^{-1} \left(1 - \frac{\beta_p}{p^s}\right)^{-1},

in which α_p, β_p ∈ C and α_p β_p = 1. This kind of L-function might come from an elliptic curve, a representation Gal(Q̄/Q) → GL₂(C), a modular form, or a more



mysterious Maass form. In the case of an elliptic curve (see the 1921 entry), α_p and β_p are functions of the number of points on the curve modulo p. Langlands conjectured that not only do these L-functions have automorphic forms behind them, but so too do the "symmetric power L-functions"

L_r(s) = \prod_{p \text{ prime}} \prod_{i=0}^{r} \left(1 - \frac{\alpha_p^i \beta_p^{r-i}}{p^s}\right)^{-1}.

Just the convergence of the symmetric powers implies two famous conjectures: the Ramanujan conjecture (all α_p, β_p are on the unit circle) and the Sato–Tate conjecture (they are equidistributed on the circle). The Langlands program encompasses a vast range of conjectures and theorems, more than one person could ever prove. For example, class field theory is the simplest case of the Langlands program. Andrew Wiles's proof of Fermat's last theorem? That is part of the next simplest case. There have been huge breakthroughs on the Langlands program since 1967, such as the proof of the so-called fundamental lemma by Ngô Bảo Châu (1972– ). He received the Fields Medal in 2010 for this result. However, we will almost certainly be working on the Langlands program for years to come.

Centennial Problem 1967
Proposed by Ian Whitehead, Macalester College.

In this problem we will show that Langlands's conjecture for symmetric power L-functions implies the Ramanujan conjecture. Consider one factor of the product of symmetric power L-functions L_0(s) L_2(s) L_4(s) ⋯ L_{2m}(s):

\prod_{r=0}^{m} \prod_{i=0}^{2r} (1 - \alpha^i \beta^{2r-i} x)^{-1},

in which we have substituted α, β for α_p, β_p, and x for p^{−s}. Assume that αβ = 1 and α + β ∈ R. Prove that this expands as a power series in x with positive real coefficients. (Hint: Take a logarithm first.) This fact, together with Langlands's conjecture, implies that the series converges for x < 1/p, regardless of m. Conclude that |α| = |β| = 1.

1967: Comments

It is impossible to do the Langlands program justice in a few short paragraphs and hence we make no attempt to do so. Instead we focus on a couple of tangential results that are of a more elementary nature.

Euclid's theorem revisited. We derived the Euler product formula (1967.1) in the notes to the 1933 entry. In the notes for the 1919 entry, we showed that

\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6},

an old result due to Leonhard Euler (see also the 1939 and 1973 entries). Put these two results together and obtain

\prod_{p \text{ prime}} \left(1 - \frac{1}{p^2}\right)^{-1} = \frac{\pi^2}{6}.

Since π²/6 is irrational, the preceding product must include infinitely many terms; that is, there are infinitely many primes. This provides another proof of Euclid's theorem. Armed with the finiteness of the irrationality measure of π²/6, one can modify this proof and obtain lower bounds on the prime counting function [2]. We should be more careful, however. The irrationality of π does not automatically imply that π² is irrational. For example, √2 is irrational (see the 1951 entry), but its square is an integer. Fortunately, π is transcendental (proved by Lindemann in 1882) and hence π² is irrational. Indeed, if π² were rational, then π would be algebraic since it is a root of x² − π², which is assumed to have rational coefficients.

The field of algebraic numbers. Fundamental to the Langlands program is Q̄, the algebraic closure of Q. One can show that Q̄ = A, the set of algebraic numbers that we encountered in the 1918 entry. Recall that an algebraic number is a complex number that is a zero of a polynomial with rational coefficients. Consequently, we are asserting that A is an algebraically closed field: any polynomial with coefficients in A has a root in A. It is not even clear that A is a field. For example, why are the sum and product of algebraic numbers algebraic? The proof that the algebraic numbers form a field is standard fare for abstract algebra texts. However, there is a concrete proof that uses only linear algebra. This has the added benefit of exposing students to the notion of tensor products, before they are introduced in a graduate algebra class as bifunctors on monoidal categories subject to certain coherence conditions; that is, as "abstract nonsense." It is much better to see concrete examples with matrices before diving into whatever Figure 1 entails!
Figure 1. The pentagon diagram from the formal definition of an abstract tensor product. If such a diagram was your first introduction to tensor products, then we feel sorry for you.



The Kronecker product of an m × n matrix A = [a_{ij}] and a p × q matrix B is the mp × nq matrix

A \otimes B =
\begin{bmatrix}
a_{11}B & a_{12}B & \cdots & a_{1n}B \\
a_{21}B & a_{22}B & \cdots & a_{2n}B \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1}B & a_{m2}B & \cdots & a_{mn}B
\end{bmatrix}.

For example, if

A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}  and  B = \begin{bmatrix} 5 & 6 \end{bmatrix},

then

A \otimes B = \begin{bmatrix} B & 2B \\ 3B & 4B \end{bmatrix} = \begin{bmatrix} 5 & 6 & 10 & 12 \\ 15 & 18 & 20 & 24 \end{bmatrix}.

This is a concrete instance of a tensor product. You should verify that the Kronecker product enjoys the following properties:
(a) (A ⊗ B)(C ⊗ D) = AC ⊗ BD,

(b) c(A ⊗ B) = (cA) ⊗ B = A ⊗ (cB),
(c) (A + B) ⊗ C = A ⊗ C + B ⊗ C,
(d) A ⊗ (B + C) = A ⊗ B + A ⊗ C,
(e) A ⊗ (B ⊗ C) = (A ⊗ B) ⊗ C,

in which A, B, C, D are matrices and c is a scalar [3, Sect. 3.6]. Eigenvalues and eigenvectors are particularly compatible with Kronecker products. From this stems our current interest in them. If Ax = λx and By = μy, then

(A ⊗ B)(x ⊗ y) = (Ax) ⊗ (By) = (λx) ⊗ (μy) = λμ(x ⊗ y)

and

[(A ⊗ I) + (I ⊗ B)](x ⊗ y) = (A ⊗ I)(x ⊗ y) + (I ⊗ B)(x ⊗ y) = Ax ⊗ y + x ⊗ By = λx ⊗ y + μx ⊗ y = (λ + μ)(x ⊗ y).

Thus, if λ and μ are eigenvalues of A and B, respectively, then λμ is an eigenvalue of A ⊗ B and λ + μ is an eigenvalue of A ⊗ I + I ⊗ B.

Let f(z) = zⁿ + c_{n−1}z^{n−1} + c_{n−2}z^{n−2} + ⋯ + c₁z + c₀ be a polynomial of degree at least two. The companion matrix of f is

C_f =
\begin{bmatrix}
0 & 0 & \cdots & 0 & -c_0 \\
1 & 0 & \cdots & 0 & -c_1 \\
0 & 1 & \cdots & 0 & -c_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & -c_{n-1}
\end{bmatrix}.

Induction and cofactor expansion along the top row of zI − C_f reveals that f is the characteristic polynomial of C_f [3, p. 200]. Consequently, a complex number is



algebraic if and only if it is an eigenvalue of a matrix with rational entries. This is all the equipment we need to show that the set A of algebraic numbers is a field. Since A is a subset of the field of complex numbers, it suffices to show that A is closed under addition, multiplication, and inversion. Let α, β ∈ A and suppose that p(α) = q(β) = 0, in which p and q are monic polynomials with rational coefficients. Then α, β are eigenvalues of the rational matrices C_p and C_q, respectively. This means that αβ is an eigenvalue of the rational matrix C_p ⊗ C_q, so αβ is algebraic. Similarly, α + β is an eigenvalue of the rational matrix C_p ⊗ I + I ⊗ C_q and hence α + β is algebraic. What about inversion? If α ≠ 0 is algebraic and p is a polynomial with rational coefficients such that p(α) = 0, then 1/α is a root of the rational polynomial z^{deg p} p(z^{−1}).

Bibliography
[1] D. Bump, Automorphic forms and representations, Cambridge Studies in Advanced Mathematics, vol. 55, Cambridge University Press, Cambridge, 1997. MR1431508
[2] D. Burt, S. Donow, S. J. Miller, M. Schiffman, and B. Wieland, Irrationality measure and lower bounds for π(X), Pi Mu Epsilon J. 14 (2017), no. 7, 421–429. https://arxiv.org/abs/0709.2184. MR3726946
[3] S. R. Garcia and R. A. Horn, A Second Course in Linear Algebra, Cambridge University Press, 2017.
[4] S. Gelbart, An elementary introduction to the Langlands program, Bull. Amer. Math. Soc. (N.S.) 10 (1984), no. 2, 177–219, DOI 10.1090/S0273-0979-1984-15237-6. http://www.ams.org/journals/bull/1984-10-02/S0273-0979-1984-15237-6/S0273-0979-1984-15237-6.pdf. MR733692
[5] R. Langlands, Letter to André Weil, Institute for Advanced Study. http://publications.ias.edu/rpl/paper/43.
[6] R. P. Langlands, Problems in the theory of automorphic forms, Lectures in modern analysis and applications, III, Lecture Notes in Math., Vol. 170, Springer, Berlin, 1970, pp. 18–61. MR0302614
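The companion-matrix proof above can be tested numerically with NumPy's `kron`. In the sketch below (ours, not the text's), the polynomials p(z) = z² − 2 and q(z) = z² − 3 are illustrative choices, so that √2 + √3 and √6 appear among the eigenvalues.

```python
import numpy as np

# Companion matrices of p(z) = z^2 - 2 and q(z) = z^2 - 3,
# whose eigenvalues are the algebraic numbers +-sqrt(2) and +-sqrt(3).
Cp = np.array([[0.0, 2.0], [1.0, 0.0]])
Cq = np.array([[0.0, 3.0], [1.0, 0.0]])
I2 = np.eye(2)

# sqrt(2) + sqrt(3) is an eigenvalue of Cp (x) I + I (x) Cq ...
sums = np.linalg.eigvals(np.kron(Cp, I2) + np.kron(I2, Cq))
assert np.any(np.isclose(sums, np.sqrt(2) + np.sqrt(3)))

# ... and sqrt(6) = sqrt(2) * sqrt(3) is an eigenvalue of Cp (x) Cq.
products = np.linalg.eigvals(np.kron(Cp, Cq))
assert np.any(np.isclose(products, np.sqrt(6)))
```

The 4 × 4 matrix C_p ⊗ I + I ⊗ C_q has rational entries, so its characteristic polynomial is a rational polynomial with √2 + √3 as a root, which is exactly the content of the field argument.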


Atiyah–Singer Index Theorem

Introduction

In 1968, Michael Atiyah (1929–2019) and Isadore Singer (1924– ) established what is now known as the Atiyah–Singer index theorem, a remarkable result that connects topology and analysis [2, 3]. In 2004, the Norwegian Academy of Science and Letters awarded the Abel Prize to Atiyah and Singer for this work (the inaugural award went to Jean-Pierre Serre, whom we met in our 1956 entry). The award citation proclaims:

  The Atiyah-Singer index theorem is one of the great landmarks of twentieth-century mathematics, influencing profoundly many of the most important later developments in topology, differential geometry and quantum field theory. Its authors, both jointly and individually, have been instrumental in repairing a rift between the worlds of pure mathematics and theoretical particle physics, initiating a cross-fertilization which has been one of the most exciting developments of the last decades.

  We describe the world by measuring quantities and forces that vary over time and space. The rules of nature are often expressed by formulas involving their rates of change, that is, differential equations. Such formulas may have an "index", the number of solutions of the formulas minus the number of restrictions which they impose on the values of the quantities being computed. The index theorem calculates this number in terms of the geometry of the surrounding space. [12]

It is also worth noting that Atiyah was knighted in 1983, in part for the index theorem. Singer, who is American, is not eligible for knighthood. Although the precise statement of the Atiyah–Singer index theorem is beyond the scope of this book, we can describe an elementary result that is of a similar spirit. The rank-nullity theorem from linear algebra states that if A is an m × n complex matrix, then

n = rank A + nullity A.

Similarly,

m = rank A* + nullity A*,

in which A* denotes the conjugate transpose of A. Since rank A = rank A*, it follows that

m − n = nullity A* − nullity A.




This relates the "topological" quantity m − n, the difference in dimensions of the underlying range and domain spaces, to an "analytic" quantity, the kernel dimensions of A and A*, which measures the "sizes" of the solution sets of Ax = 0 and A*x = 0, respectively. See the notes for a discussion of the Toeplitz index theorem, a deeper result in the same vein.

The Atiyah–Singer index theorem paved new paths connecting physical theories such as string theory with pure abstractions found in topology. Much of physics is concerned with differential equations and the index theorem answers some fundamental general questions about them. The overview provided on the occasion of the Abel Prize states:

  The Atiyah–Singer index theorem is a purely mathematical result. It tells us that a fundamental question in analysis, namely how many solutions there are to a system of differential equations, has a concrete answer in topology. This insight provides a short-cut to getting to know whether such solutions exist or not. The theorem is valuable, because it connects analysis and topology in a beautiful and insightful way. It is practical, because it explains how the manifold applications there are of mathematical analysis can make good use of the spatial, or topological, structure that underlies the problem at hand. [15]
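The finite-dimensional index identity above can be checked in a few lines; the sketch below (ours) uses a random 3 × 5 matrix and reads the rank off the singular values.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))  # m = 3, n = 5
m, n = A.shape

s = np.linalg.svd(A, compute_uv=False)
rank = int(np.sum(s > 1e-10))   # rank A = rank A* = number of nonzero singular values
nullity_A = n - rank            # dimension of the solution set of A x = 0
nullity_A_star = m - rank       # dimension of the solution set of A* x = 0

# The "index" m - n is a purely dimensional quantity, yet it equals a
# difference of kernel dimensions, the toy version of the index theorem.
assert m - n == nullity_A_star - nullity_A
```

For a generic 3 × 5 matrix the rank is 3, so the identity reads −2 = 0 − 2; the point is that the left side depends only on the shape of A, not on its entries.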

The Atiyah–Singer index theorem displays an unexpected connection between two seemingly unrelated branches of mathematics (see the 1985 entry for an even more remarkable story of a completely unexpected connection between disparate parts of mathematics). In a similar spirit, the following concrete but extremely difficult problem illustrates a surprising connection between the composition of polynomials and the classification of finite simple groups (see the 1992 and 2004 entries). Indeed, it is surprising that one of the deepest theorems in mathematics is necessary to solve the problem.

Centennial Problem 1968
Proposed by Stephan Ramon Garcia, Pomona College.

A polynomial f(x) ∈ C[x] is indecomposable if f(x) = u(v(x)) for u, v ∈ C[x] implies that u or v is a linear polynomial. Suppose that f, g are indecomposable polynomials and that f(x) − g(y) can be factored in C[x, y]. Prove that g(x) = f(ax + b) or deg f = deg g ∈ {7, 11, 13, 15, 21, 31}. Prove that each of these possibilities occurs. Hint: Use the classification of finite simple groups [7, 8] (see the comments for the 2004 entry).

1968: Comments

A temporal anomaly. Although the index theorem appears here in the 1968 entry, Atiyah was awarded a Fields Medal in 1966 because he

  [d]id joint work with Hirzebruch in K-theory; proved jointly with Singer the index theorem of elliptic operators on complex manifolds; worked in collaboration with Bott to prove a fixed point theorem related to the "Lefschetz formula."¹

¹The quote refers to Friedrich Hirzebruch (1927–2012), Raoul Bott (1923–2005), and Solomon Lefschetz (1884–1972).



How did this occur? The original announcement of the index theorem dates to 1963 [1] and the results had undergone many years of peer review and study by the community before the final papers [2, 3] appeared in print in 1968.

Toeplitz index theorem. A somewhat more elementary, although still highly nontrivial, index theorem is the Toeplitz index theorem. This requires a little bit of setup. Consider the Hardy space H², which consists of complex power series f(z) = \sum_{n=0}^{\infty} a_n z^n for which

\|f\| = \left( \sum_{n=0}^{\infty} |a_n|^2 \right)^{1/2}

is finite; see the 1949 entry. Each function f ∈ H² is analytic (see p. 151) on the open unit disk D and has a boundary function

f(\zeta) = \sum_{n=0}^{\infty} a_n \zeta^n = \lim_{r \to 1^-} \sum_{n=0}^{\infty} a_n (r\zeta)^n   (1968.2)

that exists for almost all ζ on the unit circle T [10]. For example,

f(z) = \sum_{n=0}^{\infty} \frac{z^n}{n+1}

belongs to H² (its norm is the square root of \sum_{n=1}^{\infty} 1/n^2 = \pi^2/6; see the 1919 entry), but (1968.2) diverges for ζ = 1 since it is the harmonic series. However, such points are the exception, rather than the rule: the radial limit (1968.2) exists generically. Suppose now that we have a suitable function² g : T → C that can be decomposed as a complex Fourier series

g(\zeta) = \sum_{n=-\infty}^{\infty} b_n \zeta^n.

Its Riesz projection is

Pg = \sum_{n=0}^{\infty} b_n \zeta^n;

that is, we remove the negatively indexed summands. If g is nice enough, then we can regard Pg as an element of H² by replacing ζ ∈ T with z ∈ D. For example,

P(2 cos(arg ζ)) = P(ζ + ζ⁻¹) = P(ζ) = z.

If φ : T → C is continuous, then the Toeplitz operator T_φ : H² → H² with symbol φ is defined by T_φ f = P(φf). Since φ is likely not analytic, its Fourier series will probably involve both positively and negatively indexed terms. To compute T_φ f, we multiply the Fourier series for φ and f term-by-term and then apply the Riesz projection, which removes any negatively indexed terms that result.

²The technical hypothesis here is that g belongs to L²(T), the space of complex-valued functions on T that are square-integrable with respect to Lebesgue measure.






Figure 1. Several curves and their winding numbers about a point. The Toeplitz index theorem relates this quantity to the index of a Toeplitz operator.

The analytic index of $T_{\varphi}$ is
$$\operatorname{ind} T_{\varphi} = \dim \ker T_{\varphi}^{*} - \dim \ker T_{\varphi},$$
in which $T_{\varphi}^{*} = T_{\overline{\varphi}}$ is the adjoint operator of $T_{\varphi}$. If $\varphi : \mathbb{T} \to \mathbb{C}$ is continuous and does not pass through $z$, then its winding number about $z$ is
$$\operatorname{ind}_{\varphi}(z) = \frac{1}{2\pi i} \int_{\varphi} \frac{d\zeta}{\zeta - z}.$$
Those who have not learned complex analysis might be surprised to learn that this quantity is an integer and that it counts the number of times that $\varphi$ encircles $z$; see Figure 1. The Toeplitz index theorem asserts that if $\varphi : \mathbb{T} \to \mathbb{C}$ does not pass through the origin, then
$$\operatorname{ind} T_{\varphi} = \operatorname{ind}_{\varphi}(0).$$
The preceding result relates the "analytic index" of a Toeplitz operator to its "topological index." This is one of the seminal results in the theory of $C^*$-algebras, a field that can largely be thought of as "noncommutative point-set topology." See [6] for a good introduction to the subject.

Risch algorithm. The year 1968 is also notable for the introduction of the Risch algorithm, developed by Robert Henry Risch (1939– ) [13, 14]. This algorithm determines whether a given function has an elementary antiderivative. If it has such an antiderivative, the Risch algorithm produces it. Calculus students worldwide depend on variants of the algorithm whenever they appeal to Wolfram Alpha to do their homework. Information about computer implementations of the Risch algorithm can be found in [5, 9].

What is an elementary function? We say that $f(x)$ is elementary if it can be obtained from the field of complex rational functions in $x$ by adjoining a finite number of nested exponentials, logarithms, and algebraic functions. The trigonometric and hyperbolic functions are elementary, as are their inverses. For example,
$$2\cos x = e^{ix} + e^{-ix}$$



by Euler's formula, so $\cos x$ is elementary. What about the inverse cosine? Write the preceding equation as
$$e^{2ix} - 2e^{ix}\cos x + 1 = 0$$
and use the quadratic formula to reveal
$$e^{ix} = \frac{2\cos x \pm \sqrt{4\cos^2 x - 4}}{2} = \cos x \pm \sqrt{\cos^2 x - 1}.$$
In what follows, we gloss over some technical issues, such as the precise definition of the complex logarithm. By convention, we select the plus sign in the preceding equation. Substitute $x = \cos^{-1} z$ and obtain
$$\cos^{-1} z = -i \log\big(z + \sqrt{z^2 - 1}\big).$$
This demonstrates that $\cos^{-1} z$ is an elementary function.

Some well-known functions that do not have elementary antiderivatives are $1/\log x$, which arises in the prime number theorem (see the 1919, 1933, and 1948 entries), $\cos(x^2)$ and $\sin(x^2)$, which arise in the Fresnel integrals from optics, and $e^{-x^2}$, which arises in the central limit theorem (see the 1922 entry). A particularly compelling example was found by Manuel Bronstein (1963–2005), who observed that
$$f(x) = \frac{x}{\sqrt{x^4 + 10x^2 - 96x - 71}} \tag{1968.3}$$
has the elementary antiderivative
$$F(x) = -\frac{1}{8} \ln\Big( \big(x^6 + 15x^4 - 80x^3 + 27x^2 - 528x + 781\big)\sqrt{x^4 + 10x^2 - 96x - 71} - \big(x^8 + 20x^6 - 128x^5 + 54x^4 - 1408x^3 + 3124x^2 + 10001\big) \Big) + C$$
but that substituting 72 in place of 71 in (1968.3) results in a function whose antiderivative is not elementary.

Bibliography
[1] M. F. Atiyah and I. M. Singer, The index of elliptic operators on compact manifolds, Bull. Amer. Math. Soc. 69 (1963), 422–433, DOI 10.1090/S0002-9904-1963-10957-X. MR0157392
[2] M. F. Atiyah and I. M. Singer, The index of elliptic operators. I, Ann. of Math. (2) 87 (1968), 484–530, DOI 10.2307/1970715. http://www.jstor.org/stable/1970715. MR0236950
[3] M. F. Atiyah and I. M. Singer, The index of elliptic operators. III, Ann. of Math. (2) 87 (1968), 546–604, DOI 10.2307/1970717. http://www.jstor.org/stable/1970717. MR0236952
[4] M. Bronstein, Integration of elementary functions, J. Symbolic Comput. 9 (1990), no. 2, 117–173, DOI 10.1016/S0747-7171(08)80027-2. MR1056841
[5] M.
Bronstein, Symbolic Integration Tutorial, http://www-sop.inria.fr/cafe/Manuel.Bronstein/publications/issac98.pdf.
[6] K. R. Davidson, C*-algebras by example, Fields Institute Monographs, vol. 6, American Mathematical Society, Providence, RI, 1996. MR1402012
[7] W. Feit, Some consequences of the classification of finite simple groups, The Santa Cruz Conference on Finite Groups (Univ. California, Santa Cruz, Calif., 1979), Proc. Sympos. Pure Math., vol. 37, Amer. Math. Soc., Providence, R.I., 1980, pp. 175–181. MR604576
[8] M. Fried, Exposition on an arithmetic-group theoretic connection via Riemann's existence theorem, The Santa Cruz Conference on Finite Groups (Univ. California, Santa Cruz, Calif., 1979), Proc. Sympos. Pure Math., vol. 37, Amer. Math. Soc., Providence, R.I., 1980, pp. 571–602. MR604636



[9] K. O. Geddes, S. R. Czapor, and G. Labahn, Algorithms for computer algebra, Kluwer Academic Publishers, Boston, MA, 1992. MR1256483
[10] J. Mashreghi, Representation theorems in Hardy spaces, London Mathematical Society Student Texts, vol. 74, Cambridge University Press, Cambridge, 2009. MR2500010
[11] R. B. Melrose, The Atiyah-Patodi-Singer index theorem, Research Notes in Mathematics, vol. 4, A K Peters, Ltd., Wellesley, MA, 1993. http://www.maths.ed.ac.uk/~aar/papers/melrose.pdf. MR1348401
[12] Norwegian Academy of Science and Letters, 2004 Abel Prize Citation, http://www.abelprize.no/c53865/binfil/download.php?tid=53806
[13] R. H. Risch, The problem of integration in finite terms, Trans. Amer. Math. Soc. 139 (1969), 167–189, DOI 10.2307/1995313. MR0237477
[14] R. H. Risch, The solution of the problem of integration in finite terms, Bull. Amer. Math. Soc. 76 (1970), 605–608, DOI 10.1090/S0002-9904-1970-12454-5. MR0269635
[15] J. Rognes, On the Atiyah–Singer index theorem, http://www.abelprize.no/c53865/binfil/download.php?tid=53804.
[16] R. P. Stanley, Enumerative combinatorics. Vol. 2, with a foreword by Gian-Carlo Rota and appendix 1 by Sergey Fomin, Cambridge Studies in Advanced Mathematics, vol. 62, Cambridge University Press, Cambridge, 1999. MR1676282


Erdős Numbers

Introduction

The most prolific mathematical researcher of the 20th century was Paul Erdős. He wrote over 1,500 articles with around 500 different coauthors. Mathematicians started to think of him as the center of the research collaboration world. In 1969 Casper Goffman wrote a whimsical article in which he described a measure of distance from Erdős in terms of mathematical collaborations [6]:
• Paul Erdős has Erdős number 0;
• a person who published a joint paper with Erdős has Erdős number 1;
• a person who published a paper with a person with Erdős number n but who does not qualify for a smaller Erdős number has Erdős number n + 1;
• a person with no such path to Erdős has Erdős number ∞.
Currently, over 11,000 people have Erdős number 2 and nearly every practicing mathematician has Erdős number 6 or less [9]. Most nonmathematicians have Erdős number ∞ (simply because most people have never coauthored a research article of any type), although there are many exceptions since researchers in physics, economics, computer science, and other fields can often be linked to Erdős in a finite number of steps.

From a mathematical point of view, Erdős numbers are distances in the grand "collaboration graph." The vertices of this graph are researchers and an edge is present between every pair of researchers who have published together. A small portion of this graph is depicted in Figure 1. The collaboration graph is just one example of a large social network; other examples include Facebook and Twitter. Research into the structure and dynamics of social networks has reached a feverish pace in the past several years [10]. Much of that work deals with how graphs can evolve randomly, a topic pioneered by Erdős and his collaborators decades ago [4].
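In graph terms, computing Erdős numbers is a single-source shortest-path problem, and breadth-first search solves it. The sketch below runs on a made-up toy edge list purely for illustration (real collaboration data is maintained at the Erdős Number Project [9]):

```python
from collections import deque

def collaboration_distances(edges, source="Erdos"):
    """Breadth-first search for distances from `source` in the
    collaboration graph; researchers not reachable from `source`
    have Erdos number infinity and are absent from the result."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Toy graph (invented edges for illustration only).
edges = [("Erdos", "A"), ("A", "B"), ("B", "C"), ("D", "E")]
d = collaboration_distances(edges)
print(d.get("C"))                  # -> 3
print(d.get("D", float("inf")))    # -> inf: no path to Erdos
```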
Erdős himself wrote a short paper in 1972 in which it is shown that the more restrictive collaboration graph, in which only two-author papers are considered, cannot be drawn in the plane without its edges crossing [2]. To be more specific, he attributed the observation to Andrzej Schinzel (1937– ):

I communicated this problem to Schinzel, who proved that G(M) [the restricted collaboration graph] is not planar by showing that G(M) contains a K(3, 3)—that is, a complete bipartite graph of 6 vertices (with three vertices of each color and the 9 edges connecting black to white in all possible ways). The white vertices are Chowla, Mahler, and Schinzel; the black ones are Davenport, Erdős, Lewis; the simple task of finding the 9 relevant papers can be left to the reader.

[Figure 1 depicts vertices: Christopher N. B. Hammond, Harold S. Shapiro, Mihai Putinar, Alex Kontorovich, William T. Ross, Aviezri Fraenkel, Curtis Cooper, David Sherman, Gary Weiss, Stephan Garcia, Hang Chen, Paul Erdős, Steven J. Miller, M. Ram Murty, Florian Luca, Zoltán Füredi, Frank Morgan.]

Figure 1. Partial collaboration graph illustrating some of the authors' links to Paul Erdős. Both authors have Erdős number 2. They also have multiple three-edge paths to Erdős. Prior to the publication of this book, the authors had collaboration distance 2. After the publication of this book they will be connected by the dashed edge and hence have collaboration distance 1.

Centennial Problem 1969
Proposed by Jerrold Grossman, Oakland University.

Suppose that in a group of at least three people, each pair has precisely one common friend. Prove that there is always someone who is everybody's friend, and describe the structure of this "friendship graph." Paul Erdős solved this problem in a paper with Alfréd Rényi (1921–1970) and Vera Sós (1930– ) in 1966 [5].

1969: Comments

Erdős–Bacon numbers. A much more selective group is formed by those who have a finite Erdős–Bacon number. Your Erdős–Bacon number is the sum of your Erdős number and your Bacon number. The Bacon number is similar to the Erdős number: just replace "Paul Erdős" with "Kevin Bacon" and "research papers" with "movie roles." If you have never appeared in a movie, then your Bacon number is infinite. Thus, it is hard to have a finite Erdős–Bacon number. The first named author's former senior thesis student, Vincent Selhorst-Jones, has one of the lowest Erdős–Bacon numbers (5) on record; see Figure 2. He appeared in American Sniper (2014) with Joel Lambert, who appeared in Patriots Day (2016) with Kevin Bacon. Thus, Vincent has Bacon number 2 (since he never appeared in a movie with Kevin Bacon himself, his Bacon number is not 1).

Figure 2. The actor and former mathematics major Vincent Selhorst-Jones has Erdős–Bacon number five. Photo courtesy of Vincent Selhorst-Jones.

As an undergraduate mathematics major at Pomona College, Vincent coauthored a paper [7] with the first named author, who has Erdős number 2; see Figure 1. Thus, Vincent has Erdős–Bacon number 5.

Wetzel's problem. In 1963, Paul Erdős provided a stunning solution to the following problem, first posed by John E. Wetzel (1932– ):

If $\{f_\alpha\}$ is a family of distinct analytic functions (on some fixed domain) such that for each $z$ the set of values $\{f_\alpha(z)\}$ is countable, is the family itself countable?

Erdős proved that an affirmative answer to Wetzel's problem is equivalent to the negation of the continuum hypothesis [3] (see [1] for a detailed exposition of Erdős's proof and the 1963 entry for more information about the continuum hypothesis). Taken together, Erdős's solution and Cohen's proof of the independence of the continuum hypothesis render Wetzel's problem undecidable in ZFC. Upon hearing of Erdős's solution, Wetzel wrote to his advisor, Halsey Royden (1928–1993), and said:

Erdős has showed that the answer to a question I asked in my dissertation is closely tied to the continuum hypothesis! So once again a natural analysis question has grown horns.

Erdős begins his paper with "[i]n the Ann Arbor Problem Book, Wetzel asked (under the date December, 1962) the following question. . . ." One minor quibble is that Wetzel says that "I have never visited the University of Michigan; I've never even been to Ann Arbor." Simple enough: we can consult the Ann Arbor Problem



Book to figure out what happened. However, Peter Duren (1935– ), who was a professor at the University of Michigan in 1962, tells us:

The Secretary of the Math Club acted as guardian of the book, and both locals and visitors were invited to look through it. Unfortunately, the book was lost during the Christmas break of 1962–63, on the streets of Chicago. The man then serving as Secretary of the Math Club had carried the book (or books) with him when he drove to Chicago and had left it in his car overnight. Someone broke into the car and set it on fire, and the Math Club book was lost (among other items, including the car). . . . What Paul Erdős called the Ann Arbor Problem Book must have been the Math Club book. But his reference can't be checked, since the original entries for December 1962 no longer exist.

Interestingly, the University of Illinois at Urbana-Champaign, where Wetzel was a professor for many years, has its own problem book that contains several remarks

Figure 3. Excerpt from the University of Illinois at Urbana-Champaign "Boneyard Book" (image in the public domain). Erdős's distinctive handwriting is evident in the fourth entry.



on Wetzel's problem by multiple authors and an entry written in Erdős's distinctive handwriting; see Figure 3. The history of Wetzel's problem is no less interesting than its solution; for details and references see [8].

Bibliography
[1] M. Aigner and G. M. Ziegler, Proofs from The Book, 3rd ed., including illustrations by Karl H. Hofmann, Springer-Verlag, Berlin, 2004. MR2014872
[2] P. Erdős, Mathematical Notes: On the Fundamental Problem of Mathematics, Amer. Math. Monthly 79 (1972), no. 2, 149–150, DOI 10.2307/2316535. MR1536622
[3] P. Erdős, An interpolation problem associated with the continuum hypothesis, Michigan Math. J. 11 (1964), 9–10. MR0168482
[4] P. Erdős and A. Rényi, On the evolution of random graphs (English, with Russian summary), Magyar Tud. Akad. Mat. Kutató Int. Közl. 5 (1960), 17–61. http://www.renyi.hu/~p_erdos/1961-15.pdf. MR0125031
[5] P. Erdős, A. Rényi, and V. T. Sós, On a problem of graph theory, Studia Sci. Math. Hungar. 1 (1966), 215–235. MR0223262
[6] C. Goffman, Mathematical Notes: And What Is Your Erdős Number?, Amer. Math. Monthly 76 (1969), no. 7, 791, DOI 10.2307/2317868. MR1535523
[7] S. R. Garcia, V. Selhorst-Jones, D. E. Poore, and N. Simon, Quotient sets and Diophantine equations, Amer. Math. Monthly 118 (2011), no. 8, 704–711, DOI 10.4169/amer.math.monthly.118.08.704. MR2843990
[8] S. R. Garcia and A. L. Shoemaker, Wetzel's problem, Paul Erdős, and the continuum hypothesis: a mathematical mystery, Notices Amer. Math. Soc. 62 (2015), no. 3, 243–247 (in Part II of the Erdős retrospective).
[9] J. W. Grossman, The Erdős Number Project, www.oakland.edu/enp.
[10] M. Newman, A.-L. Barabási, and D. J. Watts (eds.), The structure and dynamics of networks, Princeton Studies in Complexity, Princeton University Press, Princeton, NJ, 2006. MR2352222


Hilbert's Tenth Problem

Introduction

A Diophantine equation is an equation of the form
$$p(x_1, x_2, \ldots, x_n) = 0, \tag{1970.1}$$


in which $p$ is a polynomial with integer coefficients and only integer solutions are sought. Such equations have intrigued mathematicians from the dawn of the subject to the present day. Here are just a few well-known examples. An early example arises in the Pythagorean theorem, which asserts that $a^2 + b^2 = c^2$ for a right triangle with sides $a$ and $b$ and hypotenuse $c$. Since this relationship can be rewritten as $a^2 + b^2 - c^2 = 0$, it is of the form (1970.1). As evidence in favor of the old dictum that one should not trust a scarecrow whose certification comes from an unscrupulous degree mill, the theorem is apparently contradicted by the nonsense the scarecrow utters upon receiving his Th.D. (Doctor of Thinkology) diploma in The Wizard of Oz:

The sum of the square roots of any two sides of an isosceles triangle is equal to the square root of the remaining side. [8]

Fictional scarecrows are not alone in botching the Pythagorean theorem: Major League Baseball messed it up as well (see the comments for the 1971 entry). See the comments for this year for a proof of the theorem. Another famous Diophantine equation is the Fermat equation $x^n + y^n = z^n$, in which $n \ge 3$. Pierre de Fermat conjectured in 1637 that the equation has no solutions in positive integers. This is a fiendishly difficult problem that took over three centuries to solve (see the 1981 and 1995 entries for more about Fermat's last theorem). In 1900, the Second International Congress of Mathematicians was held in Paris. David Hilbert gave an influential keynote address that set out what he thought were the most important problems in mathematics [2]. This list has motivated and shaped the course of mathematical research ever since. The first, third, and seventh of Hilbert's problems are discussed in the 1963, 1980, and 1935 entries,



respectively. Hilbert’s tenth problem was: Given a Diophantine equation with any number of unknown quantities and with rational integral numerical coefficients: To devise a process according to which it can be determined in a finite number of operations whether the equation is solvable in rational integers.
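Matiyasevich's theorem (discussed below) does not forbid searching for solutions; a brute-force sweep can certify that a solution exists, it just cannot decide, for every Diophantine equation, that none exists. A minimal sketch of such a search (our own illustration, not from the text):

```python
def search(poly, bound):
    """Brute-force search for integer zeros of a three-variable
    polynomial with |x|, |y|, |z| <= bound. Searching can find
    solutions, but no procedure can decide solvability in general."""
    rng = range(-bound, bound + 1)
    return [(x, y, z) for x in rng for y in rng for z in rng
            if poly(x, y, z) == 0]

def pythagoras(a, b, c):
    # the Pythagorean equation a^2 + b^2 - c^2 = 0, in the form (1970.1)
    return a * a + b * b - c * c

sols = search(pythagoras, 5)
print((3, 4, 5) in sols)  # -> True
```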

In 1970 Yuri Matiyasevich (1947– ) completed a chain of ideas developed by many mathematicians, including Julia Robinson (1919–1985), Martin Davis (1928– ), and Hilary Putnam (1926–2016), that proved Hilbert's tenth problem is unsolvable [5]. That is, there does not exist an algorithm to determine whether an arbitrary Diophantine equation has an integer solution. We now turn to the other end of the spectrum: a problem for which a unique solution can be shown to exist and, moreover, for which there is a simple algorithm to find it. It involves the sequence of Fibonacci numbers
$$0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, \ldots, \tag{1970.2}$$


a key ingredient in Matiyasevich’s attack on Hilbert’s tenth problem. Edouard Zeckendorf (1901–1983) proved that every positive integer can be written uniquely as a sum of nonconsecutive Fibonacci numbers (we exclude 0 and omit the first 1 in (1970.2)). For example, 42 = 34 + 8, 221 = 144 + 55 + 21 + 1,


1,701 = 1,597 + 89 + 13 + 2. We use a greedy algorithm to obtain these decompositions. Given $n$, subtract the largest Fibonacci number at most $n$; call it $F_k$. If $n = F_k$, then we are done. Otherwise, subtract the largest Fibonacci number at most $n - F_k$. The number subtracted off at this step cannot be $F_{k-1}$: otherwise we could have subtracted $F_{k+1} = F_k + F_{k-1}$ in the first step, contradicting the maximality of $F_k$. We conclude that consecutive Fibonacci numbers are never used in the decomposition. It turns out that Zeckendorf's theorem provides another characterization of the Fibonacci numbers: they are the unique sequence of positive integers such that every natural number can be written uniquely as a sum of nonconsecutive terms. Without the restriction that the Fibonacci numbers involved are nonconsecutive, many more representations arise. For example, since 34 = 21 + 13 and 8 = 5 + 3, we may write 42 = 34 + 5 + 3 = 21 + 13 + 8 = 21 + 13 + 5 + 3, and so forth. Among all possible decompositions of a positive integer as a sum of Fibonacci numbers, one can show that none has fewer summands than the Zeckendorf decomposition; see the 1980 entry.
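The greedy algorithm just described takes only a few lines of code; this sketch (our own) reproduces the three decompositions above:

```python
def zeckendorf(n):
    """Greedy Zeckendorf decomposition: repeatedly subtract the largest
    Fibonacci number that is at most n (0 excluded, first 1 omitted)."""
    fibs = [1, 2]
    while fibs[-1] < n:
        fibs.append(fibs[-1] + fibs[-2])
    parts = []
    while n > 0:
        f = max(f for f in fibs if f <= n)  # largest Fibonacci number <= n
        parts.append(f)
        n -= f
    return parts

print(zeckendorf(42))    # -> [34, 8]
print(zeckendorf(221))   # -> [144, 55, 21, 1]
print(zeckendorf(1701))  # -> [1597, 89, 13, 2]
```

By the argument in the text, the summands produced are automatically nonconsecutive Fibonacci numbers.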



Centennial Problem 1970
Proposed by Steven J. Miller, Williams College.

Here is an outline for another proof of Zeckendorf's theorem. We phrase it as a cookie problem, though it is more commonly referred to as a stars and bars problem: how many ways are there to divide $C$ identical (indistinguishable) cookies among $P$ people? This is equivalent to counting the solutions of
$$x_1 + x_2 + \cdots + x_P = C, \qquad x_1, x_2, \ldots, x_P \ge 0.$$

Find an elementary proof that the number of solutions to the preceding is
$$\binom{C + P - 1}{P - 1}.$$
Can you use this to prove Zeckendorf's theorem?

1970: Comments

Statistical properties of Zeckendorf decompositions. The combinatorial interpretation suggested by the problem can be used not only to prove Zeckendorf's theorem, but also to obtain statistical results about Zeckendorf decompositions. For example, if we look at all integers in $[F_n, F_{n+1})$, then the number of summands in the Zeckendorf decomposition becomes normally distributed as $n \to \infty$ [4]. As a consequence, one can obtain the following curious results of Lekkerkerker [3]. The average number of summands used to represent integers in $[F_n, F_{n+1})$ is
$$\frac{5 - \sqrt{5}}{10}\, n - \frac{2}{5} = \frac{1}{1 + \varphi^2}\, n - \frac{2}{5}$$
and the variance in the number of summands is
$$\frac{\varphi}{5(\varphi + 2)}\, n - \frac{2}{25} = \frac{1}{5\sqrt{5}}\, n - \frac{2}{25},$$
in which
$$\varphi = \frac{1 + \sqrt{5}}{2} = 1.618\ldots$$
denotes the golden ratio. The appearance of the golden ratio is not surprising in light of Binet's formula; see the comments for the 2002 entry. For another angle on statistical properties of the Fibonacci numbers, see the 1938 entry.
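For small parameters, the cookie-problem count in the centennial problem can be checked by exhaustive enumeration; the following sanity check (a sketch, not a proof) compares brute-force counts with the claimed binomial coefficient:

```python
from itertools import product
from math import comb

def count_solutions(C, P):
    # Enumerate all P-tuples of nonnegative integers up to C and
    # count those summing to C.
    return sum(1 for xs in product(range(C + 1), repeat=P) if sum(xs) == C)

for C, P in [(5, 3), (7, 2), (4, 4)]:
    print(count_solutions(C, P), comb(C + P - 1, P - 1))  # the two counts agree
```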

A Fibonacci tiling. The Fibonacci identity
$$F_1^2 + F_2^2 + \cdots + F_n^2 = F_n F_{n+1} \tag{1970.3}$$
has an appealing geometric interpretation; see Figure 1. The left-hand side of (1970.3) can be interpreted as the sum of the areas of $n$ squares used to dissect a rectangle of size $F_n (F_n + F_{n-1}) = F_n F_{n+1}$, which yields the desired formula.
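The identity (1970.3) is also easy to confirm numerically; a quick sketch:

```python
def fib(n):
    # F_1 = F_2 = 1, F_3 = 2, ...
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a

# verify F_1^2 + ... + F_n^2 = F_n * F_{n+1} for the first few n
for n in range(1, 20):
    assert sum(fib(k) ** 2 for k in range(1, n + 1)) == fib(n) * fib(n + 1)
print("identity holds for n = 1, ..., 19")
```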



Figure 1. Squares of Fibonacci size tile the plane. In addition, this suggests the formula $F_0^2 + F_1^2 + \cdots + F_n^2 = F_n F_{n+1}$.

A proof of the Pythagorean theorem. If you are reading this book, chances are that you have taken a good number of sophisticated mathematics courses. However, a surprising number of mathematics majors cannot prove the Pythagorean theorem off the top of their heads! We shall remedy that here; see Figure 2 for an elegant "proof by picture."

Figure 2. Proof of the Pythagorean theorem. (a) The area of the large square equals that of the two small squares ($a^2 + b^2$) plus that of the four triangles. (b) The area of the large square equals that of the central square ($c^2$) plus that of the four triangles. The total area in (a) is $a^2 + b^2 + 4(\frac{1}{2}ab)$. The total area in (b) is $c^2 + 4(\frac{1}{2}ab)$. Equating these expressions yields $a^2 + b^2 = c^2$.

There are now hundreds of proofs known. Even a
US President James A. Garfield (1831–1881) got in on the action when he was a member of the US House of Representatives [1]. A readable account of the history behind the Pythagorean theorem is [7].

Bibliography
[1] J. A. Garfield, Pons Asinorum, The New England Journal of Education 3 (1876), no. 14, 161.
[2] D. Hilbert, Über das Unendliche (German), Math. Ann. 95 (1926), no. 1, 161–190, DOI 10.1007/BF01206605. http://www.ams.org/journals/bull/1902-08-10/S0002-9904-1902-00923-3/S0002-9904-1902-00923-3.pdf. MR1512272
[3] C. G. Lekkerkerker, Voorstelling van natuurlyke getallen door een som van getallen van Fibonacci, Simon Stevin 29 (1952), 190–195.
[4] M. Koloğlu, G. S. Kopp, S. J. Miller, and Y. Wang, On the number of summands in Zeckendorf decompositions, Fibonacci Quart. 49 (2011), no. 2, 116–130. MR2801798
[5] Ju. V. Matijasevič, The Diophantineness of enumerable sets (Russian), Dokl. Akad. Nauk SSSR 191 (1970), 279–282. MR0258744
[6] Y. Matijasevich, My collaboration with Julia Robinson, Math. Intelligencer 14 (1992), no. 4, 38–45, DOI 10.1007/BF03024472. MR1188142
[7] E. Maor, The Pythagorean theorem: A 4,000-year history, Princeton University Press, Princeton, NJ, 2007. MR2316578
[8] Scarecrow (from the Wizard of Oz), https://www.youtube.com/watch?v=uCOxU2rKLas.


Society for American Baseball Research Introduction The Society for American Baseball Research (SABR), founded in Cooperstown, New York, by Bob Davids (1926–2002) in 1971, has many objectives, one of which is to encourage and aid the application of mathematics and statistics to the analysis of baseball. The term sabermetrics, derived from the acronym SABR, refers to the statistical study of baseball (usually with the aim of improving a team’s performance). Sabermetricians have created an alphabet soup of acronyms to describe new metrics for measuring player performance (VORP, WAR, OPS, and so forth). Other sports have since followed baseball’s lead. For example, exotic acronyms such as TS%, PER, PPP, USG%, and APM are now bandied about on basketball websites. The current dominance of the NBA’s Golden State Warriors is often partly attributed to their wholehearted embrace of data analytics. It is important to know what to measure. For example, walks were originally viewed as errors by the pitcher and not a positive event by the batter. This led to an enormous undervaluation of walks, now remedied by the consideration of on-base percentage. Since the annual revenues in Major League Baseball (MLB) and other professional sports are measured in the billions, there is a lot at stake. A team that has a better understanding of which statistics truly matter can assemble a better team for less money. This can translate into World Series rings and increased revenue. Most teams now have sabermetricians helping with player selection and strategy. Moneyball [6], by Michael Lewis (1960– ), is an excellent popular account of how the Oakland A’s applied these principles and, with a relatively small budget, fielded competitive teams that routinely reached the playoffs. See also [4] for applications of mathematics in sports.

Centennial Problem 1971
Proposed by Steven J. Miller, Williams College.

Only seven times in MLB history has a team had four consecutive batters hit home runs: the Milwaukee Braves (1961), the Cleveland Indians (1963), the Minnesota Twins (1964), the Los Angeles Dodgers (2006), the Boston Red Sox (2007), the Chicago White Sox (2008), and the Arizona Diamondbacks (2010). Estimate the probability that some team performs this feat during the season. What is wrong with just raising the average home run frequency to the fourth power?
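For concreteness, here is the naive back-of-the-envelope estimate that the problem asks you to critique. Every input below is an assumed round figure, not actual MLB data, and the model treats plate appearances as independent coin flips, which is precisely the hypothesis the question challenges.

```python
# All inputs are rough assumed figures for illustration, not MLB data.
hr_per_pa = 0.03         # assumed league-average home runs per plate appearance
windows_per_game = 35    # assumed count of 4-consecutive-batter windows per team-game
team_games = 30 * 162    # team-games in a modern season (30 teams, 162 games each)

p_streak = hr_per_pa ** 4                          # naive chance of 4 straight homers
expected = p_streak * windows_per_game * team_games
print(round(expected, 3))  # -> 0.138 expected streaks per season in the naive model
```

One objection to raising the average rate to the fourth power: home run probabilities vary greatly across hitters, pitchers, and ballparks, and four consecutive batters share the same game conditions, so the events are positively correlated. By convexity, averaging $p^4$ over heterogeneous situations exceeds the fourth power of the average $p$, so the naive estimate tends to undercount.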



1971: Comments

Predicting unlikely events. The importance of the chosen problem extends far beyond baseball: how do we estimate the probability of an unlikely event? One approach is through simulation; see the 1946 entry on the Monte Carlo method. However, we need the ability to run many trials and gather a lot of data. For Monte Carlo-type methods to be useful in baseball, we would need to be able to simulate games accurately. Such programs do exist and they often use Markov chains; see the 1953 entry. Consult [2, 9] for an introduction to Markov chains and [1, 3, 8] for some applications to baseball. Another approach is to count how many situations have existed in which the desired event could have occurred and how many of these situations led to the outcome. Such an approach does an excellent job for events that occur frequently, such as hits or stolen bases, or even coming back to win after being down by four runs after six innings. It is much harder to apply this method if there are few occurrences. In a playoff matchup, two teams compete in a best-of-seven series; this means that the first team to win four games advances. Prior to 2004 (when the Boston Red Sox achieved the feat), no team in Major League Baseball had ever come back to win a series after trailing 3-0. However, such opportunities for an epic comeback had only arisen 24 times (as of January 1, 2018, a 3-0 deficit has occurred only 34 times). If each team has an equal chance of winning a game, then we should expect the team down 3-0 to complete the comeback one out of every sixteen times. Of course, it is too simplistic to think that each team has an equal chance: perhaps the team that is up 3-0 is just much better than the other team. In Figure 1, we plot the probability of having no teams, at most one team, and at most two teams come back to win a best-of-seven series after being down 3-0 if there are n teams in that situation.
There is an enormous difference if we drop the hypothesis that each team in a series is equally likely to win any given game. If we assume that the losing team has only a 40% chance of winning each game, then the number of teams expected to complete an epic comeback drops dramatically. To compute the probabilities in Figure 1, we first find the chance that one team comes back after being down 3-0 in games. Assuming they win each individual game with probability $p$, the chance they win the next four is just $p^4$. Thus, the probability they do not come back is $1 - p^4$. If there are $n$ teams that find themselves in a 3-0 hole, the probability that none come back is just $(1 - p^4)^n$, while exactly one team comes back with probability
$$\binom{n}{1} p^4 (1 - p^4)^{n-1}$$
and exactly two happens with probability
$$\binom{n}{2} (p^4)^2 (1 - p^4)^{n-2}.$$
To get the probabilities of at most 0, 1, or 2 teams winning a series we just sum the corresponding probabilities.
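The binomial computation just described fits in a few lines (a sketch; the value n = 34 matches the number of 3-0 deficits through 2017 mentioned earlier):

```python
from math import comb

def prob_at_most(k, n, p):
    """Chance that at most k of n teams trailing 3-0 come back, when
    the trailing team wins each game independently with probability p
    (so a comeback requires four straight wins, probability p**4)."""
    q = p ** 4
    return sum(comb(n, j) * q**j * (1 - q)**(n - j) for j in range(k + 1))

print(round(prob_at_most(0, 34, 0.5), 3))  # chance of zero comebacks in 34 tries
print(round(prob_at_most(0, 34, 0.4), 3))  # higher if the trailing team is weaker
print(round(prob_at_most(2, 34, 0.5), 3))  # at most two comebacks
```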


Figure 1. Probability (vertical axis) of having no teams (blue), at most one team (yellow), and at most two teams (green) come back to win a best-of-seven series after being down 3-0 when there have been n teams in that position (horizontal axis) and the trailing team has a probability p of winning each game. Panel (a): p = 0.5; panel (b): p = 0.4.

Which team wins? What is the probability that one team beats another? The goal is to obtain a formula that allows you to assess the contributions of your players to winning. Such knowledge can then be used to determine where you need to build. Is it more valuable to improve your offense or your pitching? How much should you pay for a hitter that is a little bit better than your current player? More generally, the answer is a result of general techniques that can be applied to a variety of problems. One of the most commonly used formulas is the Pythagorean won-loss formula, due to Bill James (1949– ), which dates back to the 1970s. To give a sense of its value, it is one of the few statistics often used in scoreboards or expanded scoreboards online (frequently denoted X-WL for expected won-loss). If RS denotes the average number of runs scored by a team per game, and RA the average number of runs they allow, James postulated that a good approximation to their winning percentage (number of wins divided by number of games) would be $\mathrm{RS}^2/(\mathrm{RS}^2 + \mathrm{RA}^2)$. The exponent 2 was chosen to simplify the computations and led to the name, since the sum of the squares in the denominator looks similar to the sum of squares in the Pythagorean theorem. Nowadays the 2 is replaced by a parameter $\gamma$, whose best fit value in baseball is close to, but a little less than, 2. In 2006, the second named author provided a theoretical justification for why this formula should be an excellent predictor. He used elementary probability theory to model the runs scored and allowed as being drawn from independent Weibull distributions; see [5, 7]. One of the great values of the Pythagorean expectation is that it allows a team to estimate the benefit it would receive from adding a hitter who generates 10 more runs versus signing a pitcher who allows 10 fewer.

The nonexistence of baseball. We claim that baseball does not exist.
To be more specific, we prove that the official rules of Major League Baseball specify an impossible geometric construction. According to the Major League Baseball 2017 Official Rules:

Home base shall be marked by a five-sided slab of whitened rubber. It shall be a 17-inch square with two of the corners removed so that one edge is 17 inches long, two adjacent sides are 8½ inches and the remaining two sides are 12 inches and set at an angle to make a point. It shall be set in the ground with the point at the intersection of the lines extending from home base to first base and to third base; with the 17-inch edge facing the pitcher's plate, and the two 12-inch edges coinciding with the first and third base lines. The top edges of home base shall be beveled and the base shall be fixed in the ground level with the ground surface.

Figure 2. The specifications for home plate require the existence of a right triangle with sides 12, 12, 17. This is prohibited by the Pythagorean theorem.

Since "the infield shall be a 90-foot square," it follows that home base contains a right triangle with hypotenuse 17 and side lengths 12, 12; see Figure 2. This contradicts the Pythagorean theorem since

12² + 12² = 288 ≠ 289 = 17².

Consequently, home base does not exist, from which it follows that baseball does not exist.
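The Pythagorean expectation discussed earlier in this entry is easy to experiment with numerically. Here is a minimal sketch (the function name and the sample run averages are ours, for illustration):

```python
def pythagorean_expectation(runs_scored, runs_allowed, gamma=2.0):
    """James's estimate of winning percentage: RS^gamma / (RS^gamma + RA^gamma)."""
    rs, ra = runs_scored**gamma, runs_allowed**gamma
    return rs / (rs + ra)

# A hypothetical team scoring 5.0 runs per game and allowing 4.0:
print(round(pythagorean_expectation(5.0, 4.0), 3))        # 0.61
# With an exponent a little less than 2, as the best-fit value suggests:
print(round(pythagorean_expectation(5.0, 4.0, gamma=1.83), 3))
```

Comparing the two printed values shows how modestly the prediction depends on the exact exponent, which is part of why the simple choice γ = 2 works so well in practice.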



Bibliography
[1] J. Beamer, Introducing Markov chains, The Hardball Times, November 26, 2007. https://www.fangraphs.com/tht/introducing-markov-chains/.
[2] E. Behrends, Introduction to Markov chains: With special emphasis on rapid mixing, Advanced Lectures in Mathematics, Friedr. Vieweg & Sohn, Braunschweig, 2000. MR1730905
[3] B. Bukiet, E. R. Harold, and J. L. Palacios, Markov Chain Approach to Baseball, Operations Research 45 (1997), 14–23. https://pubsonline.informs.org/doi/abs/10.1287/opre.45.1.14.
[4] J. A. Gallian (ed.), Mathematics and sports, The Dolciani Mathematical Expositions, vol. 43, Mathematical Association of America, Washington, DC, 2010. MR2766424
[5] S. J. Miller, T. Corcoran, J. Gossels, V. Luo, and J. Porfilio, Pythagoras at the bat, Social networks and the economics of sports, Springer, Cham, 2014, pp. 89–113, DOI 10.1007/978-3-319-08440-4_6. https://web.williams.edu/Mathematics/sjmiller/public_html/math/papers/MillerEtAl_Pythagoras.pdf. MR3307909
[6] M. Lewis, Moneyball: The Art of Winning an Unfair Game, W. W. Norton & Company, 2004.
[7] S. J. Miller, A derivation of James' Pythagorean projection, By The Numbers – The Newsletter of the SABR Statistical Analysis Committee 16 (February 2006), no. 1, 17–22 and Chance Magazine 20 (Winter 2007), no. 1, 40–48; expanded version available at https://web.williams.edu/Mathematics/sjmiller/public_html/math/papers/PythagWonLoss_Paper.pdf.
[8] M. D. Pankin, Baseball as a Markov Chain, http://www.pankin.com/markov/intro.htm.
[9] D. Stansbury, A Brief Introduction to Markov Chains, The Clever Machine: Topics in Computational Neuroscience & Machine Learning, September 24, 2012. https://theclevermachine.wordpress.com/2012/09/24/a-brief-introduction-to-markov-chains/.


Zaremba's Conjecture

Introduction

What is the best way to numerically integrate a function of several variables? One method is to compute the average value of the function over a large number of sample points. The 1946 entry described the Monte Carlo method, in which sample points are selected at random. However, it is often desirable to use a deterministic approach, that is, one that does not depend upon random choices. Suppose for the sake of simplicity that we wish to numerically integrate a real-valued smooth function of two variables over the unit square [0, 1]² in R². In 1971, Stanisław Zaremba (1903–1990) suggested using sample points

{ (n/q, np/q (mod 1)) : 1 ≤ n ≤ q },

in which gcd(p, q) = 1. In other words, he considered the orbit of (1/q, p/q) under repeated addition modulo 1; this may remind you of Figure 1 in the 1961 entry. Zaremba noticed that the quality of the sampling depends upon how small the partial quotients a_1, a_2, …, a_k are in the (finite) continued fraction expansion

p/q = a_0 + 1/(a_1 + 1/(a_2 + 1/(a_3 + ···)));

see Figure 1 (for more information about continued fractions, see [3, 5] and the 1931, 1934, and 1955 entries). There is no loss of generality in assuming that 1 ≤ p < q, in which case a_0 = 0 and we write

p/q = [a_1, a_2, …, a_k].

For a given q, can we select a p so that a_1, a_2, …, a_k are as small as possible? In 1972, Zaremba conjectured that this "height" of the partial quotients can be bounded by an absolute constant, for any choice of sample size q [6]. In particular, he conjectured that one can always select p so that max{a_1, a_2, …, a_k} ≤ 5; see Table 1. Zaremba's conjecture is our problem for this year.

Centennial Problem 1972
Proposed by Alex Kontorovich, Rutgers University.

For A > 0, let D_A be the set of all positive integers q for which there exists a p ∈ {1, 2, …, q} with gcd(p, q) = 1 such that the finite continued fraction expansion



(a) 1191/2383 = [2, 1191] yields a poor sampling of the unit square.

(b) 1678/2383 = [1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2] yields a good sampling of the unit square.

Figure 1. Zaremba noticed that if p/q = [a_1, a_2, …, a_k] has only small partial quotients a_1, a_2, …, a_k, then {(n/q, np/q) (mod 1) : 1 ≤ n ≤ q} provides a good sampling of the unit square. For the prime q = 2,383, we examine the samplings that arise from p = 1,191 and p = 1,678.

p/q = [a_1, a_2, …, a_k] has max{a_1, a_2, …, a_k} ≤ A. Prove that there exists an A > 1 so that D_A = N.

Bonus points. Prove that A = 5 suffices.

Extra bonus points. Prove that A = 2 suffices if a finite number of integers are allowed to be omitted.

1972: Comments

A continued fraction expansion for e. We follow [4, Sect. 3.8] and derive the beautiful continued fraction expansion¹

e = 2 + 2/(2 + 3/(3 + 4/(4 + 5/(5 + ···))))    (1972.1)

for Euler's constant e = 2.71828…. First, substitute x = −1 in the power series expansion

e^x = Σ_{n=0}^∞ x^n/n! = 1 + x + x²/2! + x³/3! + ···

and obtain

1/e = 1 − 1 + 1/2! − 1/3! + ···,

¹Since the numerators in (1972.1) are not all 1, this is not a "simple" continued fraction of the sort that we have been considering.
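Returning for a moment to Zaremba's conjecture: the partial quotients of p/q come straight from the Euclidean algorithm, so the expansions in Figure 1 and Table 1 are easy to verify by machine. A short sketch (the function names are ours):

```python
from math import gcd

def partial_quotients(p, q):
    """Continued fraction p/q = [a1, ..., ak] for 1 <= p < q, via the Euclidean algorithm."""
    quotients = []
    while p:
        quotients.append(q // p)
        p, q = q % p, p
    return quotients

def zaremba_witness(q, A=5):
    """Some p coprime to q whose partial quotients are all at most A, or None."""
    for p in range(1, q):
        if gcd(p, q) == 1 and max(partial_quotients(p, q)) <= A:
            return p
    return None

print(partial_quotients(1191, 2383))  # [2, 1191]: the poor sampling
print(partial_quotients(1678, 2383))  # [1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2]: the good one
print(all(zaremba_witness(q) is not None for q in range(2, 102)))  # True
```

The final line reproduces the evidence of Table 1: every q up to 101 admits a suitable p with partial quotients bounded by 5.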



Table 1. Evidence in favor of Zaremba's conjecture. For q = 2, 3, …, 101, we find a corresponding p so that none of the partial quotients in the continued fraction expansion of p/q = [a_1, a_2, …, a_k] exceed A = 5.

1/2 = [2]   1/3 = [3]   1/4 = [4]   1/5 = [5]   5/6 = [1,5]
2/7 = [3,2]   3/8 = [2,1,2]   2/9 = [4,2]   3/10 = [3,3]   2/11 = [5,2]
5/12 = [2,2,2]   3/13 = [4,3]   3/14 = [4,1,2]   4/15 = [3,1,3]   3/16 = [5,3]
3/17 = [5,1,2]   5/18 = [3,1,1,2]   4/19 = [4,1,3]   9/20 = [2,4,2]   4/21 = [5,4]
5/22 = [4,2,2]   4/23 = [5,1,3]   5/24 = [4,1,4]   7/25 = [3,1,1,3]   5/26 = [5,5]
5/27 = [5,2,2]   5/28 = [5,1,1,2]   5/29 = [5,1,4]   7/30 = [4,3,2]   7/31 = [4,2,3]
7/32 = [4,1,1,3]   7/33 = [4,1,2,2]   9/34 = [3,1,3,2]   6/35 = [5,1,5]   11/36 = [3,3,1,2]
7/37 = [5,3,2]   7/38 = [5,2,3]   7/39 = [5,1,1,3]   7/40 = [5,1,2,2]   9/41 = [4,1,1,4]
11/42 = [3,1,4,2]   8/43 = [5,2,1,2]   13/44 = [3,2,1,1,2]   8/45 = [5,1,1,1,2]   11/46 = [4,5,2]
9/47 = [5,4,2]   11/48 = [4,2,1,3]   9/49 = [5,2,4]   9/50 = [5,1,1,4]   11/51 = [4,1,1,1,3]
9/52 = [5,1,3,2]   10/53 = [5,3,3]   17/54 = [3,5,1,2]   12/55 = [4,1,1,2,2]   13/56 = [4,3,4]
10/57 = [5,1,2,3]   11/58 = [5,3,1,2]   11/59 = [5,2,1,3]   11/60 = [5,2,5]   11/61 = [5,1,1,5]
11/62 = [5,1,1,1,3]   11/63 = [5,1,2,1,2]   11/64 = [5,1,4,2]   12/65 = [5,2,2,2]   25/66 = [2,1,1,1,3,2]
12/67 = [5,1,1,2,2]   13/68 = [5,4,3]   13/69 = [5,3,4]   13/70 = [5,2,1,1,2]   15/71 = [4,1,2,1,3]
17/72 = [4,4,4]   13/73 = [5,1,1,1,1,2]   13/74 = [5,1,2,4]   13/75 = [5,1,3,3]   13/76 = [5,1,5,2]
16/77 = [4,1,4,3]   17/78 = [4,1,1,2,3]   14/79 = [5,1,1,1,4]   17/80 = [4,1,2,2,2]   14/81 = [5,1,3,1,2]
17/82 = [4,1,4,1,2]   16/83 = [5,5,3]   19/84 = [4,2,2,1,2]   16/85 = [5,3,5]   15/86 = [5,1,2,1,3]
16/87 = [5,2,3,2]   17/88 = [5,5,1,2]   16/89 = [5,1,1,3,2]   17/90 = [5,3,2,2]   16/91 = [5,1,2,5]
17/92 = [5,2,2,3]   16/93 = [5,1,4,3]   33/94 = [2,1,5,1,1,2]   17/95 = [5,1,1,2,3]   17/96 = [5,1,1,1,5]
17/97 = [5,1,2,2,2]   17/98 = [5,1,3,4]   17/99 = [5,1,4,1,2]   19/100 = [5,3,1,4]   18/101 = [5,1,1,1,1,3]

which can be rewritten as

1 − 1/e = 1/1 − 1/(1·2) + 1/(1·2·3) − 1/(1·2·3·4) + ···.    (1972.2)

This is a convergent series of the form

1/x_1 − 1/(x_1 x_2) + 1/(x_1 x_2 x_3) − 1/(x_1 x_2 x_3 x_4) + ···,





for which a remarkable algebraic manipulation is available. Observe that

1/x_1 − 1/(x_1 x_2) = 1/(x_1 + x_1/(x_2 − 1))    (1972.3)

and use this to obtain

1/x_1 − 1/(x_1 x_2) + 1/(x_1 x_2 x_3)
    = 1/x_1 − (x_3 − 1)/(x_1 x_2 x_3)
    = 1/x_1 − 1/(x_1 · x_2 x_3/(x_3 − 1))
    = 1/(x_1 + x_1/(x_2 x_3/(x_3 − 1) − 1))    by (1972.3)
    = 1/(x_1 + x_1/(x_2 − 1 + x_2/(x_3 − 1))).

Proceed by induction and get

1/x_1 − 1/(x_1 x_2) + 1/(x_1 x_2 x_3) − 1/(x_1 x_2 x_3 x_4) + ··· = 1/(x_1 + x_1/(x_2 − 1 + x_2/(x_3 − 1 + x_3/(x_4 − 1 + ···)))).

Apply this to (1972.2) with x_n = n and obtain

1 − 1/e = 1/(1 + 1/(1 + 2/(2 + 3/(3 + 4/(4 + ···))))).

Since

1 − 1/e = 1/(1 + 1/(e − 1)),

we see that

1 + 1/(e − 1) = 1 + 1/(1 + 2/(2 + 3/(3 + 4/(4 + ···)))),

from which (1972.1) follows.

Status of the problem. In 2011, Alex Kontorovich (1980– ) and Jean Bourgain almost proved Zaremba's conjecture [1]. To be more specific, they showed that

lim_{n→∞} |D_50 ∩ {1, 2, …, n}| / n = 1.

In other words, almost all natural numbers appear as the denominator of a finite continued fraction whose partial quotients are bounded by 50. In 2015, the same



result was established with D_5 in place of D_50 [2]. Thus, Zaremba's original conjecture with A = 5 is now known to be "almost" true in the sense that those natural numbers that do not belong to D_5 have density zero.

Bibliography
[1] J. Bourgain and A. Kontorovich, On Zaremba's conjecture, Ann. of Math. (2) 180 (2014), no. 1, 137–196, DOI 10.4007/annals.2014.180.1.3. https://arxiv.org/pdf/1107.3776. MR3194813
[2] S. Huang, An improvement to Zaremba's conjecture, Geom. Funct. Anal. 25 (2015), no. 3, 860–914, DOI 10.1007/s00039-015-0327-6. MR3361774
[3] S. J. Miller and R. Takloo-Bighash, An invitation to modern number theory, with a foreword by Peter Sarnak, Princeton University Press, Princeton, NJ, 2006. MR2208019
[4] D. Perkins, φ, π, e & i, MAA Press, 2017.
[5] A. J. van der Poorten, Notes on continued fractions and recurrence sequences, Number theory and cryptography (Sydney, 1989), London Math. Soc. Lecture Note Ser., vol. 154, Cambridge Univ. Press, Cambridge, 1990, pp. 86–97. MR1055401
[6] S. K. Zaremba, La méthode des "bons treillis" pour le calcul des intégrales multiples (French, with English summary), Applications of number theory to numerical analysis (Proc. Sympos., Univ. Montreal, Montreal, Que., 1971), Academic Press, New York, 1972, pp. 39–119. MR0343530


Transcendence of e

Introduction

Let α be a complex number. If there exists a polynomial p(x) of positive degree with integer coefficients such that p(α) = 0, then α is an algebraic number. If no such polynomial exists, then α is transcendental. Thus, all rational numbers are algebraic, as are √2, i = √−1, and

√(√5 + √3 + 1 + √2).

However, not every algebraic number can be written in terms of integers, rational operations, and root extractions. Students of Galois theory know that the Abel–Ruffini theorem says that there is no formula analogous to the quadratic formula that can provide the roots of every polynomial of degree five. For example, x⁵ − x − 1 has roots that are algebraic but not expressible in terms of radicals.

It is often difficult to prove that a specific number is transcendental, although we can quickly show that most real (or complex) numbers are transcendental. Georg Cantor proved that the set of algebraic numbers is countable (see the footnote concerning this on p. 31 in the 1918 entry). Since the set of real numbers is uncountable (see the 1918 and 1999 entries for proofs), it follows that real transcendental numbers exist and, moreover, that most real numbers are transcendental.

In 1844, Joseph Liouville proved a theorem that can be used to construct specific transcendental numbers. For example, he proved that Liouville's constant

λ = Σ_{n=1}^∞ 1/10^{n!} = 0.11000100000000000000000100000…

is transcendental; see the comments for the 1935 entry for complete details. This did not, however, shed any light on the status of the famous constants e and π. Charles Hermite (1822–1901) established the transcendence of Euler's constant (Figure 1)

e = 2.7182818284590452353602874713526624977572470936999…

in 1873, and Ferdinand von Lindemann (1852–1939) proved the transcendence of π in 1882. We provide a slick modern proof of e's transcendence in the notes below. Since transcendental numbers are irrational, it also establishes the irrationality of e. However, the following simple proof that e is irrational is too good to pass up. Let

I_n = ∫_0^1 x^n e^x dx,



Figure 1. Euler's constant e can be defined as the unique value e for which ∫_1^e dx/x = 1.

which is positive. Then use integration by parts and induction to show that there are integers a_n and b_n such that

I_n = a_n + b_n e,    n = 0, 1, 2, ….

Suppose toward a contradiction that e = p/q for some natural numbers p and q. Then

1/q ≤ (a_n q + b_n p)/q = a_n + b_n (p/q) = I_n,

since the numerator a_n q + b_n p is a positive integer. On the other hand,

I_n = ∫_0^1 x^n e^x dx ≤ e ∫_0^1 x^n dx = e/(n + 1) → 0.

This contradiction implies that e is irrational. See the 1935 and 1955 entries, along with the comments for the 1918, 1934, 1938, and 1967 entries, for more information about algebraic and transcendental numbers.
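The integer pairs (a_n, b_n) above come from a simple recurrence: integration by parts gives I_0 = e − 1 and I_n = e − n·I_{n−1}. A short sketch (the helper name is ours) generates them and checks a few values against a numerical estimate of the integral:

```python
import math

def coeffs(n):
    """Integers (a_n, b_n) with I_n = a_n + b_n*e, from I_0 = e - 1
    and the integration-by-parts recurrence I_n = e - n*I_{n-1}."""
    a, b = -1, 1                      # I_0 = -1 + 1*e
    for k in range(1, n + 1):
        a, b = -k * a, 1 - k * b      # I_k = e - k*I_{k-1}
    return a, b

# Compare with a midpoint Riemann sum for I_n = integral of x^n e^x on [0, 1].
N = 20_000
for n in range(6):
    a, b = coeffs(n)
    riemann = sum(((i + 0.5) / N) ** n * math.exp((i + 0.5) / N)
                  for i in range(N)) / N
    assert abs((a + b * math.e) - riemann) < 1e-6
print(coeffs(3))  # (6, -2), i.e., I_3 = 6 - 2e
```

Note how rapidly the integer coefficients grow with n, while I_n itself shrinks toward 0; that tension is exactly what drives the contradiction in the proof.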

Centennial Problem 1973
Proposed by Steven J. Miller, Williams College.

Prove that at least one of e + π and eπ is transcendental.¹

¹When a version of this entry was published in the Pi Mu Epsilon journal, the following problem was used: "Find a 1-to-1, increasing function f : [0, 1] → R such that f(x) is transcendental for all x." This problem has been moved to the 1955 entry.



1973: Comments

Transcendence of e. The following proof, which can be found in [2, Thm. 12.45], involves a small amount of complex analysis, or at least some familiarity with complex integration (see [4, Thm. 5.4.2] for another proof). If f(x) is a polynomial with deg f = m, then define

I(z) = ∫_0^z f(ζ) e^{z−ζ} dζ.    (1973.1)

Repeated integration by parts yields

I(z) = e^z Σ_{j=0}^m f^{(j)}(0) − Σ_{j=0}^m f^{(j)}(z).    (1973.2)

Let F(x) denote the polynomial obtained from f by replacing each coefficient of f with its absolute value. Since the inequality |e^{z−ζ}| ≤ e^{|z−ζ|} ≤ e^{|z|} holds for ζ = tz with t ∈ [0, 1], it follows from (1973.1) that

|I(z)| ≤ |z| e^{|z|} F(|z|).    (1973.3)

Suppose toward a contradiction that e is algebraic. Then there are integers q_0, q_1, …, q_n with q_0 ≠ 0 and gcd(q_0, q_1, …, q_n) = 1 so that

q_0 + q_1 e + q_2 e² + ··· + q_n e^n = 0.    (1973.4)

Let

f(x) = x^{p−1} (x − 1)^p ··· (x − n)^p,    (1973.5)

in which p is a large prime number. Let I(z) denote (1973.1) and let

J = q_0 I(0) + q_1 I(1) + ··· + q_n I(n).

Then (1973.2) and (1973.4) ensure that

J = −Σ_{j=0}^m Σ_{k=0}^n q_k f^{(j)}(k),

in which m = deg f = (n + 1)p − 1. The definition (1973.5) tells us that f^{(j)}(k) = 0 if j < p and k > 0, or if j < p − 1 and k = 0. Consequently p! divides f^{(j)}(k) for all j, k except for j = p − 1 and k = 0, in which case we have

f^{(p−1)}(0) = (p − 1)! (−1)^{np} (n!)^p.

It follows that f^{(p−1)}(0) is a nonzero integer that is divisible by (p − 1)! but not p! whenever p > n. Let p > max{n, |q_0|}, so that |J| ≥ (p − 1)!. Since F(k) ≤ (2n)^m, it follows from (1973.3) and (1973.5) that

|J| ≤ |q_1| e F(1) + ··· + |q_n| n e^n F(n) ≤ c^p,

in which c is a constant that is independent of p. Therefore,

(p − 1)! ≤ |J| ≤ c^p, and hence 1 ≤ c · c^{p−1}/(p − 1)! → 0

as p → ∞. This contradiction proves that e is transcendental.

Solution to the problem. The solution is simpler than one might suspect since it has nothing to do with special properties of e and π, or with any mysterious relationships between them. Given two transcendental numbers α and β, at least one of α + β and αβ is transcendental. Here is the explanation. Suppose that there are transcendental numbers α and β such that α + β and αβ are both algebraic. Since the sum and product of algebraic numbers are algebraic (see the notes for the 1967 entry), it follows that

(α + β)²    and    4αβ

are algebraic. Therefore, (α − β)² = (α + β)² − 4αβ is algebraic too. Since square roots of algebraic numbers are algebraic (prove it!), we deduce that α − β is algebraic and hence

α = ½((α + β) + (α − β))

is algebraic. This contradiction shows that at least one of α + β and αβ is transcendental. In the context of our problem, we know that at least one of e + π and eπ is transcendental. Possibly both of them are. As of 2019, we still do not know. The same goes for e/π, π − e, π^π, and e^e, although we do know that π + e^π and πe^π are both transcendental [5].

Bibliography
[1] E. B. Burger and R. Tubbs, Making transcendence transparent: An intuitive approach to classical transcendental number theory, Springer-Verlag, New York, 2004. MR2077395
[2] B. Fine, A. Gaglione, A. Moldenhauer, G. Rosenberger, and D. Spellman, Algebra and number theory: A selection of highlights, De Gruyter Textbook, De Gruyter, Berlin, 2017. MR3727130
[3] S. Lang, Algebra, 3rd ed., Springer-Verlag, 2002.
[4] S. J. Miller and R. Takloo-Bighash, An invitation to modern number theory, with a foreword by Peter Sarnak, Princeton University Press, Princeton, NJ, 2006. MR2208019
[5] Yu. V. Nesterenko, Modular functions and transcendence questions (Russian, with Russian summary), Mat. Sb. 187 (1996), no. 9, 65–96, DOI 10.1070/SM1996v187n09ABEH000158; English transl., Sb. Math. 187 (1996), no. 9, 1319–1348. MR1422383
[6] D. Richeson, The transcendence of e, Division by Zero blog, September 28, 2010. https://divisbyzero.com/2010/09/28/the-transcendence-of-e/.
[7] R. Schwartz, Transcendence of e, online notes adapted from Section 5.2 of Herstein's Topics in Algebra, http://www.math.brown.edu/~res/M154/e.pdf.


Rubik's Cube

Introduction

In 1974, Ernő Rubik (1944– ) invented the Magic Cube (as it was initially called in his native Hungary), a mechanical puzzle now known around the world as the Rubik's Cube [3]. It is easy to scramble the cube with just a few turns; figuring out how to restore the six faces takes much more work (one solution is presented in the comments below). Although the Rubik's Cube has

43,252,003,274,489,856,000 = 2²⁷ × 3¹⁴ × 5³ × 7² × 11

possible states, it can always be solved in 20 moves or less, a fact only established in 2010 [1]. At the first World Championships in 1982, winner Minh Thai (1966– ) of the United States won with a best time of 22.95 seconds. The current world record belongs to Feliks Zemdegs (1995– ) of Australia, who clocked in at 4.22 seconds [4]. The best average time over five solves is Zemdegs's astounding 5.80 seconds.

The mathematics of the Rubik's Cube is inherently noncommutative in nature: the order of operations matters. For example, fix an orientation of the cube, rotate the front face by 90° clockwise, then rotate the right face by 90° clockwise:

[Figure: a solved cube, after the turn F, and then after the turn R.]

Call these operations F and R, respectively. Now take a similarly oriented cube and perform these steps in the reverse order:

[Figure: a solved cube, after the turn R, and then after the turn F.]

Since we have obtained two different configurations, FR ≠ RF.

The Rubik's Cube group is the group generated by the symbols U, D, L, R, F, B (for Up, Down, Left, Right, Front, and Back, respectively) and their inverses,



subject to the natural relations imposed by the cube itself. For example,

U⁴ = D⁴ = L⁴ = R⁴ = F⁴ = B⁴ = I,

in which I denotes the identity element (that is, do nothing) of the Rubik's Cube group. This algebraically encapsulates the fact that turning any face of the cube four times returns the cube to its original state. Other relations are more subtle, such as (R²U²)⁶ = I and (R U² D⁻¹ B D⁻¹)^1260 = I.

Centennial Problem 1974
Proposed by Alan Chang, Princeton University.

(a) Start with a solved Rubik's Cube. Prove that every finite sequence of turns, if repeated enough times, will get you back to the solved state.

(b) Observe that each face of the Rubik's Cube has two "cuts" (in order to produce three layers). We say that a Rubik's Cube "has cuts at 1/3 and 2/3." If you want to turn a face of the cube, you must turn along one of these cuts. Similarly, a 4 × 4 × 4 cube has cuts at 1/4, 2/4, and 3/4. Suppose instead that we have a cube that has a cut at α for every α ∈ [0, 1]. Is it true that any finite sequence of moves, if repeated enough times, will get you back to a solved state?

Acknowledgements: This problem would not have been possible without the help of the second named author of this book, who suggested looking at an infinite variation of (a), a dinner discussion with a group of SMALL '13 REU students at Williams College, and Scott Sicong Zhang, who helped simplify the proof of (b).

1974: Comments

How to solve a cube. Before giving the solution to the centennial problem, we might as well provide a solution to the Rubik's Cube itself! Although the following method is far from the fastest, it is relatively simple and relies only on a small number of algorithms. The first named author of this book and economist (and Pomona alum) Xan Vongsathorn have coached dozens of students through their first solves with the method below. The second named author uses a similar approach (online tutorials of his are available at [2]). Speed cubers know dozens of additional algorithms and have different approaches to the cube entirely.

Figure 1. The six faces of a Rubik's Cube.

The Rubik's Cube has six faces, which we refer to as U, D, L, R, F, and B (for Up, Down, Left, Right, Front, and Back, respectively). Unlike the preceding



discussion, we do not attach these letters to particular colors or insist upon fixing a certain orientation of the cube. In Figure 1 we call the orange face F because we are holding the cube so that the orange face is in front of us. The green face is R because it is on our right. The letters U, D, L, R, F, B also describe turning a face 90° clockwise from the perspective of someone looking at the face head on. For example, U means "turn the U face clockwise (seen from above) by 90°" and D means "turn the D face clockwise (from the perspective of someone looking at the bottom of the cube)." We use F⁻¹ to refer to a counterclockwise quarter turn of the F face, and similarly for the other faces. An algorithm is a specific sequence of turns, such as F R F⁻¹ R⁻¹. This algorithm asks us to execute F, then R, then F⁻¹, then R⁻¹. The first step is to make a white cross on the U face:



Somewhere on the cube are four corners with a white sticker on them. You want to get them on the white (U ) face without messing up the white cross. The two other colors on the white corners will need to match their surroundings:



You can use these algorithms to move a corner into position.



F D F⁻¹        R⁻¹ D D R D R⁻¹ D⁻¹ R        R⁻¹ D⁻¹ R

If a white corner is in the top layer but not in the correct position, use one of the preceding algorithms to move it into the bottom layer. Then proceed as above.



Now flip the cube over so that the white side is on the D face. If possible, turn the U layer until you are in a position to apply one of these algorithms:

(U⁻¹ F⁻¹ U F)(U R U⁻¹ R⁻¹)        (U R U⁻¹ R⁻¹)(U⁻¹ F⁻¹ U F)


If the desired edge is not in the top layer, then it is in the middle layer. Use one of the algorithms above to swap the edge that is stuck in the middle layer with an edge from the upper layer. Now proceed as above. With these two algorithms, you can solve all four of the middle layer edge pieces.

We now want to make a yellow cross on the U face. If you already have a yellow cross, you can move on to the next phase. If not, apply the algorithm

F U R U⁻¹ R⁻¹ F⁻¹

one, two, or three times to make the cross.



Here are algorithms to swap two or three adjacent edge pieces:

(R U R⁻¹ U)(R U U R⁻¹) U        (R U R⁻¹ U)(R U U R⁻¹)

You might need to swap two opposite edges. In that case, apply one of the preceding algorithms and reevaluate the situation. Now we need to move the corners to the right locations. We will worry about their orientations later.



Use the algorithm

(U R U⁻¹ L⁻¹)(U R⁻¹ U⁻¹ L)


one or more times to permute the corners until they are in the correct locations. This is the last step! It is the most complicated. Every piece should be in the right location, but the yellow corners may not all have yellow stickers facing up.



The algorithm

(R⁻¹ D⁻¹ R D)(R⁻¹ D⁻¹ R D)


rotates the Up-Front-Right (UFR) corner counterclockwise. If you do it twice in a row, it will rotate the UFR corner counterclockwise twice, which is the same as rotating it clockwise. This algorithm has strange side effects on the bottom two layers, unless you do it three times in a row. Of course, if you do it three times in a row, your corner piece will rotate counterclockwise three times, ending up back where it started! The trick is to rotate the U face in between executions of the algorithm.

If you need to rotate the UFR corner, apply the algorithm until the corner is properly oriented, that is, yellow side up. This will require one repetition if it needs to be rotated counterclockwise, and two if clockwise. Then rotate U until another misoriented corner is in the UFR position, and repeat the algorithm until the new corner is properly oriented. Repeat this until all the yellow corners have been properly oriented. This step always requires that you repeat the algorithm some multiple of three times. If you are successful, the bottom two layers will be restored and the cube solved!

Solution to the centennial problem.

(a) Start with a solved cube and let M be a finite sequence of turns. There are only r = 2²⁷ · 3¹⁴ · 5³ · 7² · 11 possible states of the cube. The pigeonhole principle ensures that among the r + 1 states M, M², M³, …, M^{r+1} there are two that are identical. That is, there are distinct s > t so that M^s = M^t. Thus, M^{s−t} = I; that is, repeating M a total of s − t times returns the cube to the solved state.

(b) We cannot use the pigeonhole principle directly since there are infinitely many states. However, each finite sequence of turns involves only finitely many cuts. These cuts, along with their reflections, divide the cube into an n × n × n cube for some n. By a proof similar to (a), any finite sequence of turns on a solved n × n × n cube eventually returns to the solved state after sufficiently many iterations.
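The finite-order argument in (a) is at bottom a statement about permutations: every move sequence permutes the cube's finitely many stickers, and any permutation of a finite set has finite order, namely the least common multiple of its cycle lengths. A small illustration (not a model of the cube itself; the function name is ours):

```python
from math import lcm

def order(perm):
    """Order of a permutation of {0, ..., n-1}, given as a list i -> perm[i]:
    the least common multiple of its cycle lengths."""
    seen, lengths = set(), []
    for i in range(len(perm)):
        if i not in seen:
            length, j = 0, i
            while j not in seen:
                seen.add(j)
                j = perm[j]
                length += 1
            lengths.append(length)
    return lcm(*lengths)

# Cycles (0 1 2) and (3 4): repeating this permutation lcm(3, 2) = 6 times
# returns every element home, just as repeating a move sequence restores the cube.
print(order([1, 2, 0, 4, 3]))  # 6
```

In this language, the relation (R²U²)⁶ = I from earlier in the entry says precisely that the sticker permutation effected by R²U² has order 6.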
Bibliography
[1] Cube20, God's Number is 20, http://www.cube20.org/.
[2] S. J. Miller, Talks on solving the 2 × 2 × 2 and 3 × 3 × 3 cubes, https://youtu.be/PKZ7pxFyYu0 and https://youtu.be/FO1kOU-3Blw.
[3] Rubik's, Home of the Rubik's Cube, http://www.rubiks.com/.
[4] World Cube Association, https://www.worldcubeassociation.org/.


Szemerédi's Theorem

Introduction

An arithmetic progression is a finite sequence of integers, such as 4, 9, 14, 19, 24, whose consecutive terms differ by a fixed amount; see the 1913 entry. We say that a subset of the natural numbers is AP-rich if it contains arbitrarily long arithmetic progressions. For example, the set of even numbers is AP-rich. Is the set

A = {1, 2, 3, 5, 6, 7, 10, 11, 13, 14, 15, 17, 19, 21, 22, 23, 26, 29, …}    (1975.1)

of square-free natural numbers AP-rich? What about the set B = {1, 4, 9, 16, 25, 36, …} of perfect squares? Or the set C = {2, 3, 5, 7, 11, 13, 17, 19, 23, …} of prime numbers? Although each of these three sets is infinite, the ways in which they sit inside of the natural numbers are different. The square-free numbers appear omnipresent, whereas the perfect squares seem sparsely distributed. The prime numbers are somewhere in between. To capture this intuitive idea, we introduce the notion of natural density (or simply density):

d(S) = lim_{n→∞} |S ∩ {1, 2, …, n}| / n.    (1975.2)

For example, one can show that

d(A) = 6/π² = 0.607927…;

see the notes for the 1939 entry. Consequently, one might say that "A contains about 60.8% of the natural numbers." The perfect squares are much sparser, since

d(B) = lim_{n→∞} (⌊√n⌋ + 1)/n ≤ lim_{n→∞} 2√n/n = 0.

Similarly, the prime number theorem (see the 1919 and 1948 entries) ensures that

d(C) = lim_{n→∞} π(n)/n = lim_{n→∞} (n/log n)/n = lim_{n→∞} 1/log n = 0.

Natural density confirms, in a quantitative manner, that there are a lot more square-free natural numbers than perfect squares or primes. One can make such statements more precise by studying the asymptotic behavior of the quotient that appears in (1975.2). For the sets A, B, C above, the quotient is asymptotic to 6/π², 1/√n, and 1/log n, respectively.
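These densities are easy to approximate empirically. The following sketch (the function name is ours) counts square-free numbers and perfect squares up to a cutoff:

```python
from math import isqrt, pi

def is_squarefree(n):
    """True if no square of a prime (indeed, of any d >= 2) divides n."""
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return False
        d += 1
    return True

N = 10_000
density_A = sum(1 for n in range(1, N + 1) if is_squarefree(n)) / N
density_B = isqrt(N) / N  # proportion of perfect squares up to N

print(density_A)  # about 0.608, close to 6/pi^2
print(6 / pi**2)  # 0.6079...
print(density_B)  # 0.01, tending to 0 as N grows
```

Already at N = 10,000 the square-free proportion agrees with 6/π² to about three decimal places, while the proportion of squares is visibly collapsing toward zero.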





In practice, natural density is too restrictive. Indeed, there are subsets of the natural numbers for which the limit (1975.2) is undefined. Can you find an example? Of greater use is the notion of upper density

d̄(S) = lim sup_{n→∞} |S ∩ {1, 2, …, n}| / n,

which always exists. In 1953, Klaus Friedrich Roth, whom we met in our 1955 entry, proved that any subset of the natural numbers with positive upper density contains infinitely many three-term arithmetic progressions [13]. Paul Erdős (see the 1913 entry) and Pál Turán (1910–1976) had conjectured in 1936 that every subset of the naturals with positive upper density is AP-rich [7].¹ To be more precise, the original wording is less direct and they attribute much of the conjecture to George Szekeres (1911–2005):

More generally, he [Szekeres] has conjectured that, if we denote by r_l(N) the maximum number of integers less than or equal to N such that no l of them form an arithmetic progression, then, for any k, and any prime p,

r_p( ((p − 2)p^k + 1)/(p − 1) ) = (p − 1)^k.

An immediate and very interesting consequence of this conjecture would be that for every k there is an infinity of k combinations of primes forming an arithmetic progression.

A major step occurred in 1969 when Endre Szemerédi extended Roth's theorem to four-term arithmetic progressions [18]. In 1975, Szemerédi proved the Erdős–Turán conjecture in its entirety [19]. For this, and many other results, he received the Abel Prize in 2012. Armed with Szemerédi's theorem, we can assert that the set of square-free natural numbers is AP-rich. This is not at all obvious. However, even Szemerédi's theorem does not address whether the perfect squares or the prime numbers, which both have density zero, are AP-rich. Elementary arguments show that the set of perfect squares contains infinitely many arithmetic progressions of length three (see the 1913 entry). More sophisticated methods confirm that there is no arithmetic progression in the perfect squares of length four (see the 2004 entry). The Green–Tao theorem, for which Hillel Furstenberg's ergodic-theoretic proof of Szemerédi's theorem was a crucial ingredient [8], asserts that the primes are indeed AP-rich. See the 2004 entry for more about the Green–Tao theorem [12] and some of its extensions [14, 15].

We say that S ⊆ N is additively large if for some sequence of "intervals"

I_n = {a_n + 1, a_n + 2, …, a_n + ℓ_n} ⊆ N

with lengths |I_n| = ℓ_n → ∞, the following holds:

d̄(S; I_n) = lim sup_{n→∞} |S ∩ I_n| / |I_n| > 0.    (1975.3)

One can prove that Szemerédi's theorem implies the ostensibly stronger statement that every additively large set is AP-rich.

¹In Hungarian, their names are Erdős Pál and Turán Pál, respectively.



What about geometric progressions? A geometric progression is a finite sequence of integers, such as 6, 12, 24, 48, 96, so that the ratio of consecutive terms is constant. We say a subset of N is GP-rich if it contains arbitrarily long geometric progressions. Are sets with positive upper density GP-rich? No. The set of square-free numbers, which has density 6/π², provides a counterexample. It cannot contain a length-three geometric progression a, ar, ar² since the third term cannot be square free. Nowadays, it is customary to view Szemerédi's theorem as a density version of van der Waerden's theorem [16], a seminal result of Ramsey theory that implies that for any finite partition

N = C_1 ∪ C_2 ∪ ··· ∪ C_r,    (1975.4)

at least one of the C_i is AP-rich (see the 1930 entry). It is also true that one of the C_i is GP-rich: consider the restriction of the coloring (1975.4) to the set {2^n : n ∈ N} and then apply van der Waerden's theorem. If one hopes that a partition result from Ramsey theory should have a density version, we need a new notion of largeness that is geared towards the multiplicative structure of N. The additive semigroup of natural numbers (N, +) has the single generator 1, since

k = 1 + 1 + ··· + 1    (k times).
On the other hand, the multiplicative semigroup (N, ×) has infinitely many generators: they are the prime numbers. This distinction complicates matters.
For each j ∈ N, let N_n^{(j)} be an increasing sequence of natural numbers. Let a_n be a sequence in N, let p_1, p_2, . . . be an enumeration of the primes, and let

F_n = {a_n p_1^{i_1} p_2^{i_2} · · · p_n^{i_n} : 0 ≤ i_j ≤ N_n^{(j)}, 1 ≤ j ≤ n}.

For A ⊆ N, the upper multiplicative density with respect to the family F_n is

d_×(A; F_n) = lim sup_{n→∞} |A ∩ F_n| / |F_n|.

Observe that d_×(A; F_n) is invariant with respect to multiplication and division in the sense that

d_×(A; F_n) = d_×(kA; F_n) = d_×(A/k; F_n),

in which

kA = {ka : a ∈ A}   and   A/k = {b : kb ∈ A}.

The sets F_n are best viewed as multiplicative counterparts of the family of intervals that appear in (1975.3). We say that A ⊆ N is multiplicatively large if d_×(A; F_n) > 0 for some sequence F_n as defined above. Vitaly Bergelson (1950– ) proved that any multiplicatively large set is GP-rich [3]. This can be viewed as the multiplicative analogue of Szemerédi’s theorem. In light of van der Waerden’s theorem, for any finite partition (1975.4), at least one C_i is simultaneously AP-rich and GP-rich. It turns out, surprisingly, that the notion of multiplicative largeness admits a density version of this result: any multiplicatively large set is AP-rich [3].



Suppose that S is a syndetic set in (N, +), that is, a set with the property that finitely many of its shifts k + S = {k + s : s ∈ S} cover N. Equivalently, S is syndetic if it has bounded gaps in the sense that there exists a g ∈ N so that {a, a + 1, a + 2, . . . , a + g} ∩ S ≠ ∅ for all a ∈ N.

Centennial Problem 1975
Proposed by Vitaly Bergelson, The Ohio State University.
Are syndetic subsets of the natural numbers GP-rich?

1975: Comments

Divisibility chains. The set (1975.1) of square-free numbers, while free of geometric progressions, has a lot of multiplicative structure. Each prime number is square free, and the product of any finite number of distinct primes is square free. Thus, A contains p_1, p_1 p_2, p_1 p_2 p_3, . . . , for any sequence p_1, p_2, . . . of distinct primes. It turns out that any set of positive upper logarithmic density (a notion slightly stronger than that of upper density) contains an infinite divisibility chain, that is, a sequence x_1, x_2, . . . for which each term divides the next [6]. On the other hand, there is a set of positive upper density for which no element divides any other element [5].

Ramanujan’s constant. If someone told you that a particular computation had produced the number 262,537,412,640,768,743.999999999999, it would be reasonable to assume that the correct result is actually the integer 262,537,412,640,768,744 and that the string of twelve 9’s beyond the decimal point is the byproduct of roundoff error or some other inaccuracy introduced through numerical computation. In 1975, Martin Gardner (see the 1914 entry) played a famous April Fool’s joke on the mathematical community when he claimed in his Scientific American column that Srinivasa Ramanujan had conjectured that exp(π√163) was an integer [9] (Gardner fessed up about the joke in [10]). Although exp(π√163) is not exactly an integer, it is remarkably close since

e^{π√163} = 262,537,412,640,768,743.9999999999992500725971981856888 . . . .
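The run of nines is easy to verify with the standard library alone; the pi() helper below is the recipe from Python’s decimal module documentation, and the precision 50 is an arbitrary choice of ours:

```python
from decimal import Decimal, getcontext

def pi():
    """Compute pi to the current precision (recipe from the decimal docs)."""
    getcontext().prec += 2          # extra guard digits during the summation
    three = Decimal(3)
    lasts, t, s, n, na, d, da = 0, three, 3, 1, 0, 0, 24
    while s != lasts:
        lasts = s
        n, na = n + na, na + 8
        d, da = d + da, da + 32
        t = (t * n) / d
        s += t
    getcontext().prec -= 2
    return +s                       # unary plus rounds to the restored precision

getcontext().prec = 50
x = (pi() * Decimal(163).sqrt()).exp()   # e^(pi*sqrt(163))
print(x)   # begins 262537412640768743.999999999999250072...
```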

This amazing near-miss had already been noted in 1859 by Charles Hermite, whom we met in our 1973 entry. One should keep in mind that few readers in 1975 would have been able to detect this ruse. Personal computers did not yet exist and desktop calculators did not have the ability to deal with such large numbers or work with such great precision. On the other hand, the first named author just computed 1,000 digits of e^{π√163} on a late 2013 iMac in just 0.000038 seconds. One million digits only took 3.367 seconds. How far we have come!
The origin of this spectacular “almost integer” lies with the theory of the j-invariant; see the 1992 entry. If τ is a quadratic irrational number with positive imaginary part, then j(τ) is an algebraic integer (an algebraic number that is the root of a monic polynomial with integer coefficients) whose degree is the class number of the quadratic field Q(τ). Consequently, if Q(τ) has class number one, then j(τ) is an algebraic integer of degree 1, that is, an integer in the usual sense of the word. For τ = √−d with d square free (to ensure that Q(√−d) ≠ Q), this occurs if and only if d is a Heegner number. These are 1, 2, 3, 7, 11, 19, 43, 67, 163; see the 1966 entry. One can show that

e^{π√d} ≈ −j((1 + √−d)/2) + 744,

in which the first term on the right-hand side is an integer if d = 163. Other, less spectacular, near-integer identities hold for the largest remaining Heegner numbers:

e^{π√67} = 147,197,952,743.99999 . . . ,
e^{π√43} = 884,736,743.9997 . . . .

For an explanation of the mathematics behind “Ramanujan’s constant”, see [11].

Another Erdős–Turán conjecture? Erdős’s conjecture on arithmetic progressions (see the 1913 entry) states that

∑_{a∈A} 1/a diverges ⟹ A is AP-rich.   (1975.5)

The conjecture is often confusingly referred to as the Erdős–Turán conjecture, which more properly refers to the original conjecture proved by Szemerédi. There is also the Erdős–Turán conjecture on additive bases, which is something else entirely!

Status of the centennial problem. At present, it is not known whether any syndetic set contains a pair of the form {a, ar²}. See [1, 2] for discussion and some equivalent forms of this problem.

Bibliography
[1] M. Beiglböck, V. Bergelson, N. Hindman, and D. Strauss, Multiplicative structures in additively large sets, J. Combin. Theory Ser. A 113 (2006), no. 7, 1219–1242, DOI 10.1016/j.jcta.2005.11.003. http://www.sciencedirect.com/science/article/pii/S0097316505002141. MR2259058
[2] M. Beiglböck, V. Bergelson, N. Hindman, and D. Strauss, Some new results in multiplicative and additive Ramsey theory, Trans. Amer. Math. Soc. 360 (2008), no. 2, 819–847, DOI 10.1090/S0002-9947-07-04370-X. http://www.ams.org/journals/tran/2008-360-02/S0002-9947-07-04370-X/S0002-9947-07-04370-X.pdf. MR2346473
[3] V. Bergelson, Multiplicatively large sets and ergodic Ramsey theory, Probability in mathematics, Israel J. Math. 148 (2005), 23–40, DOI 10.1007/BF02775431. http://link.springer.com/article/10.1007%2FBF02775431. MR2191223
[4] V. Bergelson and A. Leibman, Polynomial extensions of van der Waerden’s and Szemerédi’s theorems, J. Amer. Math. Soc. 9 (1996), no. 3, 725–753, DOI 10.1090/S0894-0347-96-00194-4. MR1325795



[5] A. S. Besicovitch, On the density of certain sequences of integers, Math. Ann. 110 (1935), no. 1, 336–341, DOI 10.1007/BF01448032. MR1512943
[6] H. Davenport and P. Erdős, On sequences of positive integers, Acta Arith. 2 (1936), 147–151.
[7] P. Erdős and P. Turán, On Some Sequences of Integers, J. London Math. Soc. 11 (1936), 261–264. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=
[8] H. Furstenberg, Ergodic behavior of diagonal measures and a theorem of Szemerédi on arithmetic progressions, J. Analyse Math. 31 (1977), 204–256, DOI 10.1007/BF02813304. MR0498471
[9] M. Gardner, Mathematical Games: Six Sensational Discoveries that Somehow or Another Have Escaped Public Attention, Sci. Amer. 232 (1975), no. 4, 127–131.
[10] M. Gardner, Mathematical Games: On Tessellating the Plane with Convex Polygons, Sci. Amer. 232 (1975), no. 7, 12–117.
[11] B. J. Green, The Ramanujan Constant: An Essay on Elliptic Curves, Complex Multiplication and Modular Forms, http://people.maths.ox.ac.uk/greenbj/papers/ramanujanconstant.pdf.
[12] B. Green and T. Tao, The primes contain arbitrarily long arithmetic progressions, Ann. of Math. (2) 167 (2008), no. 2, 481–547, DOI 10.4007/annals.2008.167.481. MR2415379
[13] K. F. Roth, On certain sets of integers, J. London Math. Soc. 28 (1953), 104–109, DOI 10.1112/jlms/s1-28.1.104. MR0051853
[14] T. Tao and T. Ziegler, The primes contain arbitrarily long polynomial progressions, Acta Math. 201 (2008), no. 2, 213–305, DOI 10.1007/s11511-008-0032-5. MR2461509
[15] J. Pintz, Patterns of primes in arithmetic progressions, Number theory—Diophantine problems, uniform distribution and applications, Springer, Cham, 2017, pp. 369–379. MR3676411
[16] B. L. van der Waerden, Beweis einer Baudetschen Vermutung, Nieuw. Arch. Wisk. 15 (1927), 212–216.
[17] Wikipedia, https://en.wikipedia.org/wiki/Heegner_number.
[18] E. Szemerédi, On sets of integers containing no four elements in arithmetic progression, Acta Math. Acad. Sci. Hungar. 20 (1969), 89–104, DOI 10.1007/BF01894569. MR0245555
[19] E. Szemerédi, On sets of integers containing no k elements in arithmetic progression, Collection of articles in memory of Juriĭ Vladimirovič Linnik, Acta Arith. 27 (1975), 199–245, DOI 10.4064/aa-27-1-199-245. MR0369312


Four Color Theorem

Introduction

The four color theorem states that every planar map can be colored with four colors in such a way that no two adjacent countries share the same color; see Figure 1. However, we should be precise about what this means. First of all, each country must be connected. For example, the United States does not count because Alaska and Hawaii are not connected to the lower forty-eight states. Second, we do not consider countries that touch “at corners” to be adjacent. Thus, Arizona and Colorado do not share a border as far as we are concerned; neither do Utah and New Mexico. Finally, we prohibit countries with infinitely long boundaries since otherwise one can construct bizarre maps that require more than four colors [8]. The year 1976 marked the end of the long search for a (correct) proof of the four color theorem, which was initially conjectured in 1852 by Francis Guthrie (1831–1899). The conjecture was prompted by his attempt to color a map of English counties. Today most people know the theorem in the form “no more than four colors are needed to color a map.” Despite this common understanding of the theorem, cartographers claim that it does not matter since there is no reason to limit the number of colors used. Moreover, only three colors are needed for most

Figure 1. Four colorings of the counties in two US states: (a) Georgia; (b) Ohio.



maps that arise in practice. Despite its pragmatic insignificance, the four color theorem has great historical importance. To make the problem more precise, one converts statements about maps into statements about graphs. Assign each country a vertex. Place an edge between two vertices if and only if the two corresponding countries share a common border. This permits us to phrase the four color theorem in terms of graph theory: the vertices of any graph that can be drawn in the plane without edge crossings can be colored with at most four colors so that no two adjacent vertices share the same color. The four color theorem has the dubious honor of having been “proved” twice before 1976. Proofs by Alfred Kempe (1849–1922) in 1879 and by Peter Guthrie Tait (1831–1901) in 1880 each stood unchallenged for 11 years before fatal flaws were found. It is much easier to prove that five colors suffice [7]; see [9, Chapter 19] for details. It was not until 1976 that mathematicians again claimed to have a proof of the elusive theorem. Kenneth Appel (1932–2013) and Wolfgang Haken (1928– ) at the University of Illinois proved the four color theorem with computer assistance, through which they reduced the problem to 1,936 special cases, each of which was checked by computer. This was greeted with controversy by the mathematical community (see also the 1998 entry on the Kepler conjecture). Is a proof valid if it is so long and computationally intensive that no human can understand it in totality? Although the theorem has since been verified by the Coq interactive theorem prover [6], there are some who still find the prospect of computer-aided proofs unsettling. Perhaps a more elegant, humanly understandable proof of the four color theorem exists. Try to find it!
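The graph formulation lends itself to a small backtracking search. Here is a sketch (our own illustration, not the Appel–Haken method), using the Four Corners states from the introduction as a toy map:

```python
def four_color(adj):
    """Backtracking search for a proper coloring with colors 0..3.

    adj maps each vertex to the set of its neighbors (symmetric relation).
    Returns a dict vertex -> color, or None if no 4-coloring exists.
    """
    vertices = list(adj)
    colors = {}

    def place(i):
        if i == len(vertices):
            return True
        v = vertices[i]
        for c in range(4):
            if all(colors.get(u) != c for u in adj[v]):
                colors[v] = c
                if place(i + 1):
                    return True
                del colors[v]
        return False

    return colors if place(0) else None

# The Four Corners states: touching only at a corner does NOT count as
# adjacent, so AZ-CO and UT-NM are not edges.
adj = {
    "AZ": {"UT", "NM"},
    "UT": {"AZ", "CO"},
    "CO": {"UT", "NM"},
    "NM": {"AZ", "CO"},
}
coloring = four_color(adj)
print(coloring)
```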

Centennial Problem 1976
Proposed by Alexandra Jensen, Steven J. Miller, and Pamela Mishkin, Williams College.
We know that four colors suffice to color a planar map so that no two countries with a common border share the same color. What if we add the constraint that no color is used too often? For what p ∈ [25, 100] does a four coloring exist that uses each color for at most p% of the countries? The four color theorem says we may take p = 100 and the pigeonhole principle tells us we cannot have p < 25. What if we only require at most p% of each color when there are at most N regions?

1976: Comments

Heawood conjecture. The four color theorem tells us that we can color any planar map using at most four colors. What about map colorings on the torus, the Klein bottle (see the 1958 entry), or other surfaces? Percy J. Heawood (1861–1955), who spent most of his career attempting to prove the four color theorem and found the fatal flaw in Kempe’s 1879 proof, conjectured in 1890 that the minimum



Table 1. Computation of the Euler characteristics of the five Platonic solids. Here v denotes the number of vertices, e the number of edges, and f the number of faces of the solid. Since all five solids are homeomorphic, their Euler characteristics are equal.

S              v    e    f    χ
tetrahedron    4    6    4    2
cube           8   12    6    2
octahedron     6   12    8    2
dodecahedron  20   30   12    2
icosahedron   12   30   20    2

number of colors required to color any map on a two-dimensional surface S is

⌊(7 + √(49 − 24χ)) / 2⌋,   (1976.1)


in which χ denotes the Euler characteristic of S [7]. To compute χ, triangulate S and use the formula χ(S) = v − e + f, in which v denotes the number of vertices, e the number of edges, and f the number of faces in the triangulation. It turns out that any triangulation of S produces the same value; that is, the Euler characteristic is a topological invariant of S.1 For example, the five Platonic solids are all homeomorphic (see p. 22) to a sphere and all have χ = 2; see Figure 2 and Table 1. Substituting this into (1976.1) suggests that any map on a sphere can be colored with at most four colors. What is the status of the Heawood conjecture? Technically, it was disproved in 1934 when Philip Franklin (1898–1965) proved that any map on the Klein bottle (for which χ = 0) can be colored with only six colors, as opposed to the seven predicted by the conjecture [5]. This bound is tight since the Franklin graph (Figure 3) can be embedded on the surface of the Klein bottle and the resulting map cannot be colored with fewer than six colors. Morally speaking, however, the conjecture is true 100% of the time since Gerhard Ringel (1919–2008) and John W. T. Youngs (1910–1970) proved that it holds for all surfaces other than the Klein bottle [10]. For example, any map on the torus (which has χ = 0) can be colored with only seven colors, and this is minimal; see Figure 4.
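The Heawood bound (1976.1) is a one-liner to evaluate; a quick sketch:

```python
import math

def heawood_bound(chi):
    """Heawood's conjectured chromatic number for a surface of
    Euler characteristic chi: floor((7 + sqrt(49 - 24*chi)) / 2)."""
    return math.floor((7 + math.sqrt(49 - 24 * chi)) / 2)

print(heawood_bound(2))   # sphere: 4
print(heawood_bound(0))   # torus: 7 (sharp); the Klein bottle is the lone
                          # exception, where six colors actually suffice
```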

¹It is important to note that nonhomeomorphic surfaces may have the same Euler characteristic. For example, the torus and the Klein bottle both have Euler characteristic zero. They are not homeomorphic since, for example, they have different fundamental groups (Z² for the torus and ⟨a, b : ab = b⁻¹a⟩ for the Klein bottle). We refrain from further discussion since that would take us too far afield.



(a) Tetrahedron (4, 6, 4); (b) Cube (8, 12, 6); (c) Octahedron (6, 12, 8); (d) Dodecahedron (20, 30, 12); (e) Icosahedron (12, 30, 20).

Figure 2. The five Platonic solids along with (v, e, f ), in which v denotes the number of vertices, e the number of edges, and f the number of faces. The surface of each Platonic solid is homeomorphic to a two-dimensional sphere. Since the Euler characteristic of a surface is a topological invariant, v − e + f = 2 for all five surfaces. Readers who prefer the terminology d4, d6, d8, d10, d12, and d20, respectively, for these objects gain 100 experience points.



Figure 3. The Franklin graph can be embedded on the surface of the Klein bottle. The resulting map cannot be colored with fewer than six colors. Since Franklin proved that every map on a Klein bottle can be colored with at most six colors, this example shows that his bound is sharp.

Figure 4. The map at left can be wrapped onto the surface of a torus. This example shows that not every map on the torus can be colored with fewer than seven colors.

Bibliography [1] K. Appel and W. Haken, Every planar map is four colorable. I. Discharging, Illinois J. Math. 21 (1977), no. 3, 429–490. http://www.projecteuclid.org/euclid.ijm/1256049011. MR0543792 [2] K. Appel, W. Haken, and J. Koch, Every planar map is four colorable. II. Reducibility, Illinois J. Math. 21 (1977), no. 3, 491–567. http://projecteuclid.org/euclid.ijm/1256049012. MR0543793 [3] K. Appel and W. Haken, The solution of the four-color-map problem, Sci. Amer. 237 (1977), no. 4, 108–121, 152, DOI 10.1038/scientificamerican1077-108. MR0543796 [4] K. Appel and W. Haken, Every planar map is four colorable, with the collaboration of J. Koch, Contemporary Mathematics, vol. 98, American Mathematical Society, Providence, RI, 1989. MR1025335 [5] P. Franklin, A six color problem, J. Math. Phys. 13 (1934), 363–379.



[6] G. Gonthier, Formal proof—the four-color theorem, Notices Amer. Math. Soc. 55 (2008), no. 11, 1382–1393. http://www.ams.org/notices/200811/tx081101382p.pdf. MR2463991 [7] P. J. Heawood, Map-colour theorems, Quarterly Journal of Mathematics, Oxford 24 (1890), 332–338. [8] H. Hudson, Four colors do not suffice, Amer. Math. Monthly 110 (2003), no. 5, 417–423. [9] S. J. Miller, Mathematics of optimization: how to do things faster, Pure and Applied Undergraduate Texts, vol. 30, American Mathematical Society, Providence, RI, 2017. MR3729274 [10] G. Ringel and J. W. T. Youngs, Solution of the Heawood map-coloring problem, Proc. Nat. Acad. Sci. U.S.A. 60 (1968), 438–445, DOI 10.1073/pnas.60.2.438. MR0228378 [11] R. Thomas, An update on the four-color theorem, Notices Amer. Math. Soc. 45 (1998), no. 7, 848–859. http://www.ams.org/notices/199807/thomas.pdf. MR1633714 [12] Wikipedia, Four color theorem, http://en.wikipedia.org/wiki/Four_color_theorem.


RSA Encryption

Introduction

Alice and Bob wish to communicate without letting an eavesdropper, Eve, understand their conversation. Any information that they wish to exchange can be encoded with numbers (see the comments for the 1936 entry). Instead of sending one large number that represents an entire message, information is typically broken up into smaller blocks of fixed size. Thus, Alice and Bob want to securely send and receive nonnegative integers less than or equal to a fixed threshold while Eve is eavesdropping. Moreover, they need to do this without first exchanging a secret key for their code: otherwise Eve will know the key! The RSA cryptosystem, invented by Ronald Rivest (1947– ), Adi Shamir (1952– ), and Leonard Adleman (1945– ) in 1977 and, independently, by Clifford Cocks (1950– ) of the UK intelligence agency GCHQ (Government Communications Headquarters) in 1973, addresses this issue (Cocks’s work remained classified until 1997). Eve can listen to the entire RSA-encrypted communication and she will be unable to decipher it! Without algorithms such as RSA, modern e-commerce would be impossible: we can buy things online without meeting the seller in person to agree on a secret key for the transaction. To perform this amazing feat, Alice and Bob require some number theory.
To describe the RSA cryptosystem, we need Euler’s generalization of Fermat’s little theorem. Fermat’s little theorem tells us that a^{p−1} ≡ 1 (mod p) if p is prime and gcd(a, p) = 1; see the 2002 entry. Let φ(n) denote the number of j ∈ {1, 2, . . . , n} that are relatively prime to n. For example, φ(15) = 8 since there are eight numbers, namely 1, 2, 4, 7, 8, 11, 13, 14, in the specified range that are relatively prime to 15. The function φ is called the Euler totient function. It is multiplicative, in the sense that φ(mn) = φ(m)φ(n) if m and n are relatively prime. For example, φ(15) = φ(3)φ(5) = 2 · 4 = 8. Moreover, φ(p) = p − 1 whenever p is prime, since 1, 2, . . . , p − 1 are relatively prime to p. Euler’s theorem states that if gcd(a, n) = 1, then a^{φ(n)} ≡ 1 (mod n). We are now ready to state the RSA algorithm.



RSA algorithm.
• Alice secretly selects distinct large primes p and q. Their product n = pq is her enciphering modulus.
• Alice picks a public key (also called an encryption key) e. This is a positive integer such that gcd(e, φ(n)) = 1. She knows n = pq, so she can compute φ(n) = φ(p)φ(q) = (p − 1)(q − 1) and check if gcd(e, φ(n)) = 1 rapidly via the Euclidean algorithm.
• Alice’s private key (also called a decryption key) d is the inverse of e (mod φ(n)). Thus, de = jφ(n) + 1 for some integer j.
• Alice makes n and e known to the public. She does not disclose p, q, or d.
• To send the message M ∈ {1, 2, . . . , n} to Alice, Bob computes¹ E ≡ M^e (mod n). He sends E to Alice.
• Alice recovers M from E as follows:²

E^d ≡ (M^e)^d ≡ M^{de} ≡ M^{jφ(n)+1} ≡ (M^{φ(n)})^j M ≡ M (mod n).

Since n and e are publicly available, anyone can send messages to Alice. Only she can decrypt these messages because only she knows the private key d.
Here is an example. Alice selects secret primes p = 7,919 and q = 9,733. Then n = pq = 77,075,627 and φ(n) = (p − 1)(q − 1) = 77,057,976. Alice chooses e = 47 and checks that gcd(47, φ(n)) = 1. The multiplicative inverse of 47 (mod φ(n)) is d = 68,860,319. Bob wants to send the message M = 12,345 to Alice. He computes

E ≡ M^e = 12,345^{47} ≡ 18,269,972 (mod n)

and sends this to Alice, who receives it and computes

E^d = 18,269,972^{68,860,319} ≡ 12,345 (mod n).

Suppose that Eve wants to find M, knowing only E and Alice’s public information, n and e. She needs Alice’s private key d, so Eve must solve de ≡ 1 (mod φ(n)). To do this, Eve needs to know φ(n) = (p − 1)(q − 1). Since

(p − 1)(q − 1) = pq − p − q + 1 = n − (p + q) + 1,

knowing φ(n) is equivalent to knowing p + q. However, knowing p + q is equivalent to knowing p and q since the roots of

(x − p)(x − q) = x² − (p + q)x + pq = x² − (p + q)x + n,

namely p and q, can be found by the quadratic formula. Thus, finding φ(n) = (p − 1)(q − 1) is as hard as factoring n = pq.
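The worked example above can be replayed directly; Python’s three-argument pow performs fast modular exponentiation, and pow with exponent −1 (Python 3.8+) computes the modular inverse:

```python
import math

# Alice's secret primes from the worked example above.
p, q = 7919, 9733
n = p * q                    # enciphering modulus: 77,075,627
phi = (p - 1) * (q - 1)      # 77,057,976

e = 47                       # public key; requires gcd(e, phi) == 1
assert math.gcd(e, phi) == 1

d = pow(e, -1, phi)          # private key: inverse of e mod phi

M = 12345                    # Bob's message
E = pow(M, e, n)             # encrypt: E = M^e mod n (the text reports E = 18,269,972)
assert pow(E, d, n) == M     # decrypt: E^d mod n recovers M
print(n, d, E)
```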
The security of RSA is based upon the assumption that it is hard to factor large numbers (even though it is easy to multiply them). If a method for fast factorization were to be found, then RSA would cease to be secure. Peter Shor (1959– ) found such an algorithm for fast factorization, but it requires a quantum computer. Although quantum computers have so far only been able to factor relatively small

¹Although exponentiating M modulo n appears to be a daunting task, it can be done rapidly by repeated squaring and modular reduction; see the 2002 entry.
²One can prove that E^d ≡ M (mod n) even if gcd(M, n) ≠ 1.



numbers, the potential exists for them to one day factor RSA moduli. Other cryptographic systems, such as lattice-based methods, are believed to be more secure against quantum-computer attacks.

Centennial Problem 1977
Proposed by Steven J. Miller, Williams College.
Rivest, Shamir, and Adleman formed RSA Laboratories to market and further develop applications of the RSA cryptosystem, which was granted U.S. Patent 4,405,829. In 1991, the company announced fifty-four factoring challenges to encourage cryptographic research and to monitor the state of contemporary factoring algorithms and technology. Each challenge number is the product of two large primes. These RSA challenge numbers were generated by an isolated computer, with no access to the internet, whose hard drive was immediately destroyed. Thus, we can be certain that if someone presents a factorization of an RSA challenge number, there was no cheating involved. Cash prizes were offered, ranging from $1,000 to $200,000. The challenge was officially closed in 2007, although many people continue to try to factor the RSA numbers. As of 2017, the smallest unfactored RSA challenge number is RSA-230

17969491597941066732916128449573246156367561808012600070888
91883553172646034149093349337224786865075523085586419992922
18144366847228740520652579374956943483892631711525225256544
10980819170611742509702440718010364831638288518852689,

which has 230 digits. The largest of the challenge numbers is RSA-2048, which has 617 decimal digits (2048 bits). Without a major advance in quantum computation, RSA-2048 will probably never be factored. The smallest RSA challenge number is RSA-100

15226050279225333605356183781326374297180681149613806886579
08494580122963258952897654000350692006139.

This 100-digit number was factored less than a month after the challenge began. Find the factors yourself!

1977: Comments

Poor choices and Pollard’s p − 1 algorithm.
The security of RSA rests on the assumption that factoring n = pq is computationally infeasible. However, there are some choices of p and q that render n susceptible to certain factorization algorithms. Suppose that p − 1 has only small prime factors. For instance, the prime p = 614,657 is “large” but p − 1 = 614,656 = 2^8 · 7^4 has only “small” prime factors. In this situation, Pollard’s p − 1 algorithm might be able to factor n in a reasonable amount of time. In what follows, we do not require that n is a product of two distinct primes.
The starting point of Pollard’s algorithm is the observation that if p − 1 does not have any large prime factors, then (p − 1)|k! for some small k. For example, if p = 181, then p − 1 = 180 = 2^2 · 3^2 · 5 contains only small prime factors and p − 1 divides 6! = 720 = 180 · 4. On the other hand, if p = 179, then p − 1 = 178 = 2 · 89 has a relatively large prime factor. Because of this, p − 1 does not divide k! for k = 1, 2, . . . , 88, although it divides 89!.
Suppose that p is a prime factor of n and (p − 1)|k!. Then k! = (p − 1)r for some r ∈ N and Fermat’s little theorem yields

2^{k!} = 2^{(p−1)r} ≡ (2^{p−1})^r ≡ 1^r ≡ 1 (mod p),

so p|(2^{k!} − 1). Although other bases may be used, the base 2 is preferred in practice since exponentiation with base 2 is particularly amenable to computation. Let m_k ≡ 2^{k!} − 1 (mod n) with 1 ≤ m_k ≤ n. Since m_k and 2^{k!} − 1 differ by a multiple of n, we have

gcd(m_k, n) = gcd(2^{k!} − 1, n) ≥ p.

If n does not divide 2^{k!} − 1, then gcd(m_k, n) is a proper divisor of n. In the preceding, we insisted that m_k is the least positive residue of 2^{k!} − 1 modulo n since m_k = 0 implies that gcd(m_k, n) = n and hence we do not obtain a proper factor of n.
To implement Pollard’s algorithm, fix a threshold K and compute gcd(m_k, n) for k = 2, 3, . . . , K and hope that a proper divisor of n is found. Observe that

m_k ≡ 2^{k!} − 1 ≡ (2^{(k−1)!})^k − 1 ≡ (m_{k−1} + 1)^k − 1 (mod n),

so the m_k can be computed iteratively without computing k!. This shortcut is important, since the rapid growth of k! prevents the direct evaluation of m_k. Here is an example. If n = 26,016,619, then

2^{2!} ≡ 4 (mod n),                          m_2 = 3,            gcd(m_2, n) = 1,
2^{3!} ≡ 4^3 ≡ 64 (mod n),                   m_3 = 63,           gcd(m_3, n) = 1,
2^{4!} ≡ 64^4 ≡ 16,777,216 (mod n),          m_4 = 16,777,215,   gcd(m_4, n) = 1,
2^{5!} ≡ 16,777,216^5 ≡ 6,730,144 (mod n),   m_5 = 6,730,143,    gcd(m_5, n) = 1,
2^{6!} ≡ 6,730,144^6 ≡ 14,067,788 (mod n),   m_6 = 14,067,787,   gcd(m_6, n) = 1,
2^{7!} ≡ 14,067,788^7 ≡ 20,137,005 (mod n),  m_7 = 20,137,004,   gcd(m_7, n) = 5,419,

so 5,419|n. In fact, n = pq, in which p = 5,419 and q = 4,801 are prime. Neither

p − 1 = 5,418 = 2 · 3^2 · 7 · 43   nor   q − 1 = 4,800 = 2^6 · 3 · 5^2

divides 7! = 5,040. That is, the Pollard p − 1 method was successful before our initial analysis predicted that it should be. This is because 2^{k!} − 1 might be divisible by p by chance, as opposed to being divisible by p because k! is a multiple of p − 1. This is the case here, since 2^{7!} − 1 happens to be divisible by p.
If Alice is careful in selecting her primes p and q, she can prevent Eve from factoring her RSA modulus n = pq using Pollard’s p − 1 algorithm. Let p_0, q_0 be large primes. Then let p and q be even larger primes of the form

p = ip_0 + 1   and   q = jq_0 + 1.

Dirichlet’s theorem on primes in arithmetic progressions ensures that there are infinitely many such primes; see the comments for the 1913 entry. By construction,



p − 1 = ip_0 and q − 1 = jq_0 have the large prime factors p_0 and q_0, respectively. This prevents Eve from applying the Pollard p − 1 algorithm effectively.

Answer to the problem. The factorization of RSA-100 is

37975227936943673922808872755445627854565536638199 × 40094690950920881030683735292761468389214899724061.

This was found in 1991 by Mark Manasse (1958– ) and Arjen K. Lenstra (1956– ) [3].

Bibliography
[1] R. Rivest, A. Shamir, and L. Adleman, US Patent 4,405,829 (1977). http://www.google.com/patents/US4405829.
[2] R. L. Rivest, A. Shamir, and L. Adleman, A method for obtaining digital signatures and public-key cryptosystems, Comm. ACM 21 (1978), no. 2, 120–126, DOI 10.1145/359340.359342. https://people.csail.mit.edu/rivest/Rsapaper.pdf. MR700103
[3] RSA Laboratories, RSA Honor Roll, http://www.ontko.com/pub/rayo/primes/hr_rsa.txt
[4] Wikipedia, Pollard’s p − 1 algorithm, https://en.wikipedia.org/wiki/Pollard’s_p__1_algorithm.
[5] Wikipedia, RSA Factoring Challenge, http://en.wikipedia.org/wiki/RSA_Factoring_Challenge.
[6] Wikipedia, Shor’s Algorithm, http://en.wikipedia.org/wiki/Shor’s_algorithm.


Mandelbrot Set

Introduction

The Mandelbrot set is an example of a fractal, a mathematical object that possesses a great deal of self-similarity. It is constructed as follows. For each complex number c, form the sequence z_{n;c}, in which

z_{0;c} = c   and   z_{n+1;c} = z_{n;c}² + c.


The simplest pictures of the Mandelbrot set are obtained by coloring a point c black if the sequence defined above is bounded and white otherwise; see Figure 1. For finer detail, we can color points c whose sequences z_{n;c} appear unbounded based upon how many iterations are needed to exceed a fixed, large threshold; see Figure 2. One can zoom in on the Mandelbrot set and obtain a variety of beautiful and bewildering images; see Figure 3 and the links at [9]. One of the most important things to address with any iterative problem is the existence and classification of fixed points. If w is a fixed point of the map p(z) = z² + c, in which c is a constant, then p(w) = w; that is,

w² − w + c = 0.
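A minimal escape-time test of the kind used to draw such pictures can be sketched as follows (the iteration cap 100 and escape radius 2 are conventional choices, not from the text):

```python
def escape_time(c, max_iter=100):
    """Iterate z -> z^2 + c starting from z = c.

    Return the number of iterations before |z| exceeds 2 (after which the
    orbit escapes to infinity), or None if the orbit stays bounded for
    max_iter steps, i.e., c is presumed to lie in the Mandelbrot set.
    """
    z = c
    for k in range(max_iter):
        if abs(z) > 2:
            return k
        z = z * z + c
    return None

print(escape_time(0))    # None: the orbit 0, 0, ... is bounded
print(escape_time(-1))   # None: the orbit -1, 0, -1, 0, ... is bounded
print(escape_time(1))    # the orbit 1, 2, 5, 26, ... escapes quickly
```

Coloring each non-None value by k produces the banded pictures described above.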

Figure 1. The first visualization of the Mandelbrot set was produced in 1978 by Robert W. Brooks (1952–2002) and J. Peter Matelski [1]. Image public domain.



Figure 2. The Mandelbrot set.

This yields two fixed points (which coincide if c = 1/4):

w = (1 ± √(1 − 4c)) / 2.

The magnitude of

p′(w) = 1 ± √(1 − 4c)

determines the nature of the fixed point w. If |p′(w)| < 1, then w is an attracting fixed point and values that start out close to w will iterate toward w. If |p′(w)| > 1, then w is a repelling fixed point and values that start out close to w will iterate away from w. If |p′(w)| = 1, then the situation is more complicated and the argument of the complex number p′(w) comes into play.
What about polynomials of higher degree? If p is a polynomial of degree n, then p(w) = w means that w is a zero of the polynomial h(z) = p(z) − z, which has degree at most n. The fundamental theorem of algebra asserts that a polynomial of degree n has exactly n zeros, counted according to multiplicity, in the complex plane. Thus, p has at most n fixed points. What if the polynomial p is replaced with a slightly more exotic function?
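The attracting/repelling dichotomy is easy to observe numerically. A sketch with the sample value c = 0.2 (our choice, not from the text):

```python
import math

c = 0.2
# Fixed points of p(z) = z^2 + c are w = (1 ± sqrt(1 - 4c)) / 2, with
# multiplier p'(w) = 2w = 1 ± sqrt(1 - 4c).
w_minus = (1 - math.sqrt(1 - 4 * c)) / 2   # |p'(w)| < 1: attracting
w_plus = (1 + math.sqrt(1 - 4 * c)) / 2    # |p'(w)| > 1: repelling

z = 0.0
for _ in range(100):
    z = z * z + c          # the orbit is drawn toward the attracting fixed point

print(z, w_minus)          # the two agree to machine precision
```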



Figure 3. Several close-up images of the Mandelbrot set.

Centennial Problem 1978
Proposed by Stephan Ramon Garcia, Pomona College.
Let p(z) be a complex polynomial of degree n. How many fixed points can p(z) have? That is, how many roots can the equation p(z) = z have? At most n? Infinitely many? Or something in between?

1978: Comments

Space Invaders. A strong contender for this year’s topic was the video game Space Invaders¹ [10]. Created by Tomohiro Nishikado (1944– ) and released in 1978, this mega-blockbuster game revolutionized the industry. Interestingly, one of the defining features of the game was due to hardware limitations. In the game, alien ships are attacking the Earth. As more and more of them are destroyed, the remaining ships move faster and faster until the last few ships move at incredible speeds. This feature was due to a computational bottleneck. The fewer the number of ships that need to be drawn, the faster the computer could display them! Nishikado decided that he liked this and incorporated it into the game.

¹A common misconception is that the line “And the space he invades he gets by on you” from the 1981 Rush song Tom Sawyer is “And the space invaders get by on you.” Certainly, the second is the more amusing interpretation.

A continuous, nowhere-differentiable function. Self-similarity is a key ingredient in the construction of the blancmange function, a continuous, nowhere-differentiable function f : R → R; see Figure 4. Since the original construction is due to Teiji Takagi (1875–1960) [8], this function is also called the Takagi function. The first step is to prove that if f : R → R is differentiable at x, then

lim_{n→∞} (f(v_n) − f(u_n)) / (v_n − u_n) = f′(x)

whenever un , vn are sequences such that (a) un ≤ x < vn for all n ∈ N, (b) un < vn for all n ∈ N, and (c) limn→∞ (vn − un ) = 0. To do this, use the definition of the derivative as a limit of difference quotients and argue that it suffices to consider the case f  (x) = 0. Given x ∈ R, let g(x) denote the distance from x to the nearest integer. The graph of g(x) looks like a “sawtooth wave” with each “tooth” of height 1/2 and width 1; see Figure 5. Use the Weierstrass M -test to prove that the function f : R → R defined by ∞  g(2n x) f (x) = (1978.1) 2n n=0 is continuous and bounded. Since g(x) is periodic with period 1, it follows that g(2n x) is periodic with period 21n . If x is a dyadic rational number (that is, its denominator is a power of 2), then 2k x is an integer whenever k ≥ n and hence

Figure 4. Graph of the blancmange function on [0, 1]. This function is continuous, but nowhere differentiable.



Figure 5. Graphs of the summands g(x), 12 g(2x), 14 g(4x), 18 g(8x) for n = 1, 2, 3, 4. g(2k x) = 0 for all k ≥ n. Fix x ∈ R. For each n ∈ N, let un =

mn 2n


vn =

mn + 1 2n

be dyadic rational numbers that satisfy un ≤ x < vn


vn − un =

1 . 2n

By the preceding remarks, the series for f reduces to n−1  g(2k vn ) − g(2k un ) f (vn ) − f (un ) = . vn − un 2k vn − 2k un



However,

  2^k uₙ = 2^{k−n} 2ⁿ uₙ = 2^{k−n} mₙ  and  2^k vₙ = 2^{k−n}(mₙ + 1)

for some mₙ ∈ Z. Since 2^{k−n} ≤ 1/2 for k < n, this means that g is linear on the interval [2^k uₙ, 2^k vₙ]. Thus, each of the difference quotients on the right side of (1978.2) is ±1 (depending on whether mₙ is even or odd). In other words,

  (f(vₙ) − f(uₙ))/(vₙ − uₙ) = Σ_{k=0}^{n−1} ±1    (1978.3)

for some sequence of signs ±. Since the terms of a convergent series must tend to zero, it follows that (1978.3) does not tend to a finite limit as n → ∞. In light of (1978.1), we conclude that f′(x) does not exist.

Answer to the problem. Let p(z) and q(z) be polynomials with deg p = n, deg q = m, and m < n. What is the maximum number of zeros of h(z) = p(z) − \overline{q(z)} (the bar denotes complex conjugation)?



Terence Sheil-Small conjectured in 1992 that the sharp upper bound was n². This is indeed the case if m = n or m = n − 1, as his student A. S. Wilmshurst proved [11]. What if m < n − 1? Wilmshurst conjectured that if m = 1, that is, h(z) = p(z) − z̄, then the number of zeros of h is at most 3n − 2. This was proved in 2002 by Dmitry Khavinson (1956– ) and Grzegorz Świątek using techniques from complex dynamics [4]; see [3] for an elegant exposition of this result and an application to gravitational lensing (also see the 1915 entry). The sharpness of the upper bound 3n − 2 was proved in 2008 by Lukas Geyer [2].

Bibliography
[1] R. Brooks and J. P. Matelski, The dynamics of 2-generator subgroups of PSL(2, C), Riemann surfaces and related topics: Proceedings of the 1978 Stony Brook Conference (State Univ. New York, Stony Brook, N.Y., 1978), Ann. of Math. Stud., vol. 97, Princeton Univ. Press, Princeton, N.J., 1981, pp. 65–71. MR624805
[2] L. Geyer, Sharp bounds for the valence of certain harmonic polynomials, Proc. Amer. Math. Soc. 136 (2008), no. 2, 549–555, DOI 10.1090/S0002-9939-07-08946-0. MR2358495
[3] D. Khavinson and G. Neumann, From the fundamental theorem of algebra to astrophysics: a "harmonious" path, Notices Amer. Math. Soc. 55 (2008), no. 6, 666–675. MR2431564
[4] D. Khavinson and G. Świątek, On the number of zeros of certain harmonic polynomials, Proc. Amer. Math. Soc. 131 (2003), no. 2, 409–414, DOI 10.1090/S0002-9939-02-06476-6. MR1933331
[5] B. Mandelbrot, Fractal aspects of the iteration of z → λz(1 − z) for complex λ, z, Annals of the New York Academy of Sciences 357, 249–259.
[6] B. B. Mandelbrot, The fractal geometry of nature, W. H. Freeman and Co., San Francisco, Calif., 1982. MR665254
[7] Team Fresh, Last Lights On—Mandelbrot fractal zoom to 6.066e228 (2^760). http://vimeo.com/12185093.
[8] T. Takagi, A simple example of the continuous function without derivative, Proc. Phys.-Math. Soc. Japan 1 (1901), 176–177.
[9] Wikipedia, Mandelbrot set, http://en.wikipedia.org/wiki/Mandelbrot_set.
[10] Wikipedia, Space invaders, http://en.wikipedia.org/wiki/Space_Invaders.
[11] A. S. Wilmshurst, The valence of harmonic polynomials, Proc. Amer. Math. Soc. 126 (1998), no. 7, 2077–2081, DOI 10.1090/S0002-9939-98-04315-9. MR1443416


TeX

Introduction

This entry honors two fundamental contributions of computer science to mathematics. First, there is the creation of the TeX typesetting system by Donald Knuth (1938– ), which was released in 1978. Second, there are off-by-one errors (in which a loop intended to be executed n times is inadvertently executed n − 1 or n + 1 times), which is why this entry is listed under 1979. Purists will be happy to know, however, that Knuth was honored with the National Medal of Science in 1979.

Donald Knuth is perhaps best known for his monumental, encyclopedic, and stunningly readable series The Art of Computer Programming. Begun in 1962 while he was a graduate student at Caltech, the project continues to this day, with volume 4A published in 2011 and several remaining volumes in preparation. While preparing a second edition of Volume 2, Knuth was dismayed with the quality of the typesetting done by the publisher. He realized that digital typesetting can be boiled down to 0's and 1's: is a pixel black or white? Knuth saw this as a problem amenable to computer science and set out to design his own system.

Knuth estimated he could have his digital-typesetting system ready in six months. Instead, it was almost ten years before TeX was released. It was called version 3. The next version was 3.1, which was followed by version 3.14, and so forth. The current version is 3.14159265. This unusual numbering system suggests that later versions are only incrementally different from previous ones and that TeX has essentially stabilized.

TeX is used extensively in the publishing world. Almost every contemporary paper in mathematics and computer science is typeset using a system based on TeX, including this book! Most mathematicians use LaTeX, a document-preparation system written in TeX that includes many predefined commands that would be tedious to deal with in "raw" TeX.
For example, the LaTeX source

  \begin{equation}\label{eq:ZetaAgain1979}
  \zeta(s) \ = \ \sum_{n=1}^\infty \frac{1}{n^s}
  \end{equation}

produces (1979.1) below. The formula for the Riemann zeta function is enclosed in an equation environment with the label eq:ZetaAgain1979 attached in case we need to refer to it at some point. Although TeX has many features that distinguish it from other digital typesetting and publishing systems, we focus on only two points of interest here.


1979. TEX

First, TeX is a programming language. The user writes a program that describes both the content and layout of the document. The program is then interpreted by TeX and produces the desired output. This design choice means that TeX is extraordinarily flexible and customizable. The price is that TeX and related systems can be hard for beginners to pick up. Fortunately there are many templates available online. By looking at existing code and compiled documents you can learn over 90% of what you need fairly quickly, and then search the web or ask experts for the rest. For example, the second named author maintains TeX templates (for papers and presentations) at http://web.williams.edu/Mathematics/sjmiller/public_html/math/handouts/latex.htm. The website also has a link to a YouTube video that goes through writing simple articles with these templates.

The second point is that TeX uses sophisticated algorithms to lay out text. Consider the problem of breaking a paragraph into justified lines. Each line must begin at the left margin and end at the right margin, and there can be neither too much nor too little space between words. Line breaks are allowed only between words and, if necessary, inside a word at a known hyphenation point. How would you solve this problem?

The solution used in most digital typesetting systems and word processors is a greedy strategy. We consider the words of the paragraph one at a time, adding them to the current line. When the current line is full, it is added to the page, and we begin adding words to the next line. This approach is fast since it considers each word only once, but it can lead to unappealing results because it never changes its mind about lines that have already been added to the page. For example, the greedy algorithm may put vastly different amounts of space between words on different lines, which looks terrible.
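The greedy strategy just described can be sketched in a few lines of Python (an illustrative toy model with fixed-width characters and single spaces between words, not what any real typesetting system ships):

```python
def greedy_line_breaks(words, width):
    """Pack words left to right; start a new line as soon as the next
    word no longer fits. This is the fast 'never look back' strategy."""
    lines, current = [], ""
    for word in words:
        if not current:
            current = word                     # first word on a line always fits
        elif len(current) + 1 + len(word) <= width:
            current += " " + word              # word plus one separating space fits
        else:
            lines.append(current)              # line is full; commit it forever
            current = word
    if current:
        lines.append(current)
    return lines

print(greedy_line_breaks("the quick brown fox jumps over the lazy dog".split(), 10))
```

Note how the algorithm commits each line permanently: a later, better arrangement of the same words is never reconsidered.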

Centennial Problem 1979
Proposed by James Wilcox, University of Washington.

Instead of looking one word at a time, TeX tries hard to optimize things to produce a good-looking paragraph. To do so, it uses a notion called badness, which is computed using complex rules that are designed to penalize ugly layouts. For example, we wish to penalize paragraphs that contain lines with too much or too little space between words. Given a definition of badness, the problem is now to minimize badness over all possible sets of line breaks. A naive implementation of this approach would consider exponentially many choices, but it is possible to do better. Give a quadratic-time algorithm (in the number of lines) for finding the optimal set of line breaks. For a detailed discussion of TeX's line-breaking algorithm, see [4].
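To make "badness" concrete, here is one hypothetical scoring in Python. It only mimics the spirit of TeX's rule, which grows roughly like the cube of how far a line's spacing is from ideal; it is not TeX's actual formula:

```python
def line_badness(line_length, width):
    """Toy badness: zero for a perfectly full line, growing cubically
    with the fraction of the line left unfilled (hypothetical scoring)."""
    slack = width - line_length
    if slack < 0:
        return float("inf")  # overfull lines are forbidden outright
    return 100 * (slack / width) ** 3

def total_badness(lines, width):
    """Score a set of line breaks: sum over all lines but the last
    (the final line of a paragraph is allowed to be short)."""
    return sum(line_badness(len(line), width) for line in lines[:-1])
```

The problem above then amounts to minimizing such a total, efficiently, over all legal sets of break points.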

1979: Comments

Apéry's constant. The year 1979 also saw Roger Apéry's proof that ζ(3) is irrational. Here ζ denotes the Riemann zeta function, defined by

  ζ(s) = Σ_{n=1}^∞ 1/nˢ

for Re s > 1; see the 1928, 1933, 1939, 1942, 1945, 1967, and 1987 entries for more information. As a consequence of Apéry's result, some people refer to ζ(3) as Apéry's constant. It has long been known that ζ(k) is a rational multiple of π^k if k ≥ 2 is even (see the 1919 and 1945 entries; if we permit nonpositive values of k and consider the analytic continuation of ζ to C \ {1}, then ζ(0) = −1/2 and 0 = ζ(−2) = ζ(−4) = · · · ; see the 1942 entry). The values of ζ(k) for odd k ≥ 3 remain largely mysterious. To fifty decimal places,

  ζ(3) = 1.2020569031595942853997381615114499907649862923405 . . . .

Is this a rational multiple of π³? If so, the numbers involved must be enormous since otherwise an explicit formula for ζ(3) would have been found long ago. Lots of mathematicians have studied ζ(3). For example, Srinivasa Ramanujan discovered the curious representation

  ζ(3) = (7/180)π³ − 2 Σ_{k=1}^∞ 1/(k³(e^{2πk} − 1)).

Although we do not have a closed-form expression for ζ(3), at least we know that it is irrational. Moreover, ζ(k) is irrational for infinitely many odd k [2] and at least one of ζ(5), ζ(7), ζ(9), ζ(11) is irrational [9].

Proof of Apéry's theorem. The following argument of Stephen D. Miller (1974– ; no relation to the second named author) [7] simplifies the proof of Frits Beukers (1953– ) [3]. We begin with a few integrals:

  ∫₀¹∫₀¹ sᵃtᵃ/(1 − st) ds dt = Σ_{n=1}^∞ 1/(n + a)²,    (1979.2)

  ∫₀¹∫₀¹ sᵃtᵃ log(st)/(1 − st) ds dt = −2 Σ_{n=1}^∞ 1/(n + a)³,    (1979.3)

  ∫₀¹∫₀¹ sᵃtᵃ log(t)/(1 − st) ds dt = −Σ_{n=1}^∞ 1/(n + a)³.

If a ≠ b, then

  ∫₀¹∫₀¹ sᵃtᵇ/(1 − st) ds dt = [1/(b − a)] Σ_{n=1}^∞ (1/(n + a) − 1/(n + b)),    (1979.4)

  ∫₀¹∫₀¹ sᵃtᵇ log(s)/(1 − st) ds dt = [1/(a − b)] Σ_{n=1}^∞ 1/(n + a)² + [1/(a − b)²] Σ_{n=1}^∞ (1/(n + a) − 1/(n + b)),    (1979.5)

  ∫₀¹∫₀¹ sᵃtᵇ log(t)/(1 − st) ds dt = [1/(b − a)] Σ_{n=1}^∞ 1/(n + b)² + [1/(a − b)²] Σ_{n=1}^∞ (1/(n + b) − 1/(n + a)),

and

  ∫₀¹∫₀¹ sᵃtᵇ log(st)/(1 − st) ds dt = [1/(a − b)] Σ_{n=1}^∞ (1/(n + a)² − 1/(n + b)²).    (1979.6)

The formulas (1979.2) and (1979.4) follow by expanding

  1/(1 − st) = Σ_{n=0}^∞ sⁿtⁿ

and integrating. The others follow by differentiating (1979.2) and (1979.4) with respect to a and b. If p(s, t) is a polynomial of degree n with integral coefficients, then (1979.3) and (1979.6) imply

  ∫₀¹∫₀¹ p(s, t) log(st)/(1 − st) ds dt = (aₙ + bₙζ(3))/dₙ³,    (1979.7)

in which aₙ, bₙ, dₙ ∈ Z and dₙ = lcm{1, 2, . . . , n}. We claim that dₙ ≤ e^{1.01n} for sufficiently large n. Indeed,

  dₙ = ∏_{p≤n} p^{k(p)},  in which  k(p) = ⌊log n / log p⌋ ≤ log n / log p

is the highest power of p that divides a number at most n. The prime number theorem ensures that

  log dₙ = Σ_{p≤n} k(p) log p ≤ π(n) log n ∼ n,

which proves the claim. Consider

  Pₙ(s) = (1/n!) dⁿ/dsⁿ (s − s²)ⁿ,

which is a polynomial with integral coefficients. Since

  ∫₀¹ du/(1 − (1 − x)u) = −log(x)/(1 − x)    (1979.8)

and Pₙ(1 − s) = (−1)ⁿPₙ(s), it follows that

  ∫₀¹∫₀¹ Pₙ(s)Pₙ(t) log(st)/(1 − st) ds dt = −∫₀¹∫₀¹∫₀¹ Pₙ(s)Pₙ(t)/(1 − (1 − st)u) ds dt du    (1979.9)
    = (−1)^{n+1} ∫₀¹∫₀¹∫₀¹ Pₙ(s)Pₙ(t)/(1 − (1 − (1 − s)t)u) ds dt du.

We next claim that for s, t ∈ (0, 1) fixed,

  ∫₀¹ du/(1 − (1 − (1 − s)t)u) = ∫₀¹ du/((1 − (1 − u)s)(1 − (1 − t)u)).

Indeed, a partial fraction expansion implies that

  1/((1 − (1 − u)s)(1 − (1 − t)u)) = [1/(1 − (1 − s)t)] ( s/(1 − (1 − u)s) + (1 − t)/(1 − (1 − t)u) ),


and hence

  ∫₀¹ du/((1 − (1 − u)s)(1 − (1 − t)u)) = [1/(1 − (1 − s)t)] ( −s · (log(1 − s))/s + (1 − t) · (log t)/(t − 1) )
    = −log(t(1 − s))/(1 − (1 − s)t).
Use (1979.8) with x = (1 − s)t and observe that the two integrals are equal. The preceding argument and (1979.7) ensure that

  (−1)ⁿ ∫₀¹∫₀¹∫₀¹ Pₙ(s)Pₙ(t)/(1 − (1 − (1 − s)t)u) ds dt du
    = (−1)ⁿ ∫₀¹∫₀¹∫₀¹ Pₙ(s)Pₙ(t)/((1 − (1 − u)s)(1 − (1 − t)u)) ds dt du
    = ∫₀¹∫₀¹∫₀¹ Pₙ(s)Pₙ(t)/((1 − (1 − u)s)(1 − tu)) ds dt du

is of the form (aₙ + bₙζ(3))/dₙ³. Integrating by parts n times with respect to each of the variables s and t yields

  ∫₀¹∫₀¹∫₀¹ Pₙ(s)Pₙ(t)/((1 − (1 − u)s)(1 − tu)) ds dt du = ∫₀¹∫₀¹∫₀¹ (s − s²)ⁿ(t − t²)ⁿ(u − u²)ⁿ/((1 − (1 − u)s)(1 − tu))^{n+1} ds dt du.    (1979.10)

The nonnegative function

  f(s, t, u) = s(1 − s)t(1 − t)u(1 − u)/((1 − (1 − u)s)(1 − tu))

vanishes on the boundary of [0, 1] × [0, 1] × [0, 1] and attains its maximum at (s, t, u) = (2 − √2, √2 − 1, 1/2), where

  f(2 − √2, √2 − 1, 1/2) = 17 − 12√2.

Thus,

  (aₙ + bₙζ(3))/dₙ³ = O((17 − 12√2)ⁿ).

Since dₙ = O(e^{1.01n}) and e^{3.03}(17 − 12√2) ≈ 0.60927, it follows that |aₙ + bₙζ(3)| = O(0.61ⁿ); moreover, aₙ + bₙζ(3) ≠ 0 because of the positivity of the integrand in (1979.10). If ζ(3) = p/q were rational, then q(aₙ + bₙζ(3)) = qaₙ + pbₙ would be a nonzero integer satisfying 1 ≤ |qaₙ + pbₙ| = O(q · 0.61ⁿ), which fails for large n. This contradiction proves that ζ(3) is irrational.
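Neither computation is needed for the proof, but the numerical facts in this entry are easy to check in plain Python (double precision; the 1/(2N²) − 1/(2N³) term is a standard Euler–Maclaurin estimate for the tail of the truncated series defining ζ(3)):

```python
import math

# zeta(3) from the defining series, truncated at N terms, plus a tail
# correction: sum_{n > N} 1/n^3 is approximately 1/(2N^2) - 1/(2N^3).
N = 100_000
zeta3 = sum(1 / n**3 for n in range(N, 0, -1)) + 1 / (2 * N**2) - 1 / (2 * N**3)

# Ramanujan's representation; the exponential factor makes the series
# converge extremely quickly, so a handful of terms suffices.
ramanujan = 7 * math.pi**3 / 180 - 2 * sum(
    1 / (k**3 * (math.exp(2 * math.pi * k) - 1)) for k in range(1, 20)
)

print(zeta3)      # ≈ 1.2020569
print(ramanujan)  # ≈ 1.2020569
```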



Bibliography
[1] R. Apéry, Irrationalité de ζ(2) et ζ(3) (French), Luminy Conference on Arithmetic, Astérisque 61 (1979), 11–13. MR3363457
[2] K. Ball and T. Rivoal, Irrationalité d'une infinité de valeurs de la fonction zêta aux entiers impairs (French), Invent. Math. 146 (2001), no. 1, 193–207, DOI 10.1007/s002220100168. MR1859021
[3] F. Beukers, A note on the irrationality of ζ(2) and ζ(3), Pi: A Source Book, Springer-Verlag, 2000, 434–438. http://citeseerx.ist.psu.edu/viewdoc/download?doi=2222&rep=rep1&type=pdf.
[4] D. E. Knuth and M. F. Plass, Breaking paragraphs into lines, Software: Practice and Experience 11 (1981), no. 11.
[5] D. E. Knuth, The art of computer programming. Vol. 1, Fundamental algorithms, 3rd ed. [of MR0286317], Addison-Wesley, Reading, MA, 1997. MR3077152
[6] D. E. Knuth, The art of computer programming, http://www-cs-faculty.stanford.edu/~uno/taocp.html.
[7] S. D. Miller, An Easier Way to Show ζ(3) ∉ Q. http://sites.math.rutgers.edu/~sdmiller/simplerzeta3.pdf.
[8] The TeX User Group, History of TeX, http://www.tug.org/whatis.html.
[9] V. V. Zudilin, One of the numbers ζ(5), ζ(7), ζ(9), ζ(11) is irrational (Russian), Uspekhi Mat. Nauk 56 (2001), no. 4(340), 149–150, DOI 10.1070/RM2001v056n04ABEH000427; English transl., Russian Math. Surveys 56 (2001), no. 4, 774–776. MR1861452
[10] W. Zudilin, Apéry's theorem. Thirty years after, Int. J. Math. Comput. Sci. 4 (2009), no. 1, 9–19. https://arxiv.org/abs/math/0202159. MR2598496


Hilbert's Third Problem

Introduction

The Wallace–Bolyai–Gerwien theorem states that two rectilinear figures are equidecomposable if and only if they have the same area. For example, if a square and an equilateral triangle have the same area, then they can be dissected into a finite number of polygonal pieces so that one figure can be rearranged into the other; see Figure 1. The hypothesis that the original figures and the resulting pieces are rectilinear is necessary (see the 1924 entry on the Banach–Tarski paradox).

The history of the theorem is convoluted. According to the detailed history set forth in [3], the problem was posed in 1807 by the Scottish mathematician William Wallace (1768–1843); we are unsure whether he was related to his more famous namesake. John Lowry arrived at the first proof of what is now known as the Wallace–Bolyai–Gerwien theorem in 1814, although sadly his contribution now appears largely unheralded and we are unable to find out much information about him. The Hungarian mathematician Wolfgang Farkas Bolyai (1775–1856) independently proved the result in 1832, followed shortly thereafter by Paul Gerwien. Little is known about Gerwien, save that he was a "lieutenant in the Prussian 22nd Infantry Regiment and instructor in the Royal Prussian Cadet Corps in the early 1830s" and that he published two papers and an analytic geometry textbook in the 1830s [7]. As for Farkas Bolyai, he is well known for the stern warning to his son János Bolyai (1802–1860) about Euclid's parallel postulate (see the comments from the 1963 entry): "do not try the parallels in that way. . . . I have measured that bottomless night, and all the light and all the joy of my life went out there" [11].

Does the Wallace–Bolyai–Gerwien theorem have an analogue for solids in three dimensions? At the International Congress of Mathematicians in Paris in 1900, David Hilbert presented a host of problems to inspire and guide mathematicians in the new century [8] (see the 1935 and 1970 entries). His third problem concerns polyhedra, the analogues of polygons in three dimensions. Hilbert asked if two polyhedra with equal volumes can always be dissected into finitely many polyhedra so that one of the original polyhedra can be rearranged to form the other. The problem was quickly dispatched by Max Dehn (1878–1952) in 1901. He introduced a polyhedral invariant, the Dehn invariant, such that two polyhedra are equivalent under dissection if and only if they have the same volume and same Dehn invariant. Dehn proved that the cube and the tetrahedron have different Dehn invariants and hence they are not equidecomposable [6]. More turns out to be true. In 1980 (hence the topic for this year's entry), Hans Debrunner showed



that if a polyhedron tiles three-dimensional space, then its Dehn invariant is zero [5]. Since tetrahedra have nonzero Dehn invariants, they cannot tile R³. See [2, 9] and the references therein for a readable account of the method.

Figure 1. A square can be dissected and rearranged to form an equilateral triangle of equal area. Image public domain.

Dehn was not the first to solve Hilbert's third problem. That honor goes to Ludwik Antoni Birkenmajer (1855–1929), who solved it in 1882 for a math contest held by the Academy of Arts and Sciences of Kraków [3]. The competitors were challenged with the following:

  Given any two tetrahedra with equal volumes, subdivide one of them by means of planes, if it is possible, into the smallest possible number of pieces that can be rearranged so as to form the other tetrahedron. If this cannot be done at all or can be done only with certain restrictions, then prove the impossibility or specify precisely those restrictions.

This is Hilbert's third problem! Although it was judged at the time to be correct, Birkenmajer's solution was never published. It disappeared from history until it was recently rediscovered and reevaluated; a modern appraisal deems it valid [3].

Centennial Problem 1980
Proposed by Jeffrey Lagarias, University of Michigan.

Problems on packing and tiling go back to antiquity. In On the Heavens, Aristotle (384–322 BCE) asserted [1, Book 3, Sec. 8]:

  It is agreed that there are only three plane figures which can fill a space, the triangle, the square, and the hexagon, and only two solids, the pyramid [tetrahedron] and the cube.

However, regular tetrahedra do not completely fill space around a point. (The regular tetrahedra do not have to be congruent; similarity is enough, because the condition involves only a small neighborhood of the point in question.) This leads to the following problem, which is unsolved. How many nonoverlapping congruent regular tetrahedra can touch a point in R³?



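The solid-angle accounting behind this problem is easy to carry out numerically. A Python sketch (it uses the fact that a small sphere about a vertex of a regular tetrahedron meets it in a spherical triangle whose three angles all equal the dihedral angle arccos(1/3), so the vertex solid angle is the spherical excess 3 arccos(1/3) − π):

```python
import math

# Vertex solid angle of a regular tetrahedron: the spherical triangle cut
# out on a small sphere around a vertex has three angles arccos(1/3),
# so its area (the solid angle) is the spherical excess.
omega = 3 * math.acos(1 / 3) - math.pi  # ≈ 0.5513 steradians

full = 4 * math.pi                       # full solid angle around a point
print(full / omega)                      # ≈ 22.8, so at most 22 tetrahedra can fit
```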
One can show that 20 tetrahedra can touch at a point. This can be done in such a way that the 20 opposite faces of these tetrahedra (not touching the point) lie on the 20 faces of a regular icosahedron, whose centroid is the point at which the tetrahedra touch. We can get an upper bound on how many tetrahedra can touch by determining the solid angle subtended by a regular tetrahedron and dividing it into a full solid angle 4π ≈ 12.56 steradians. In this way, it is found that there is room for at most 22 tetrahedra to touch at a point. Is the answer 20, 21, or 22? No one knows. The answer is suspected to be 20. Can one even rule out 22? The problem can be turned into a two-dimensional problem by intersecting a neighborhood of the point in question with a small sphere. How many equilateral spherical triangles, with all angles equal to arccos(1/3) (about 71 degrees), can be packed on the surface of a unit sphere without overlap?

1980: Comments

Origin of the problem. The first written instance of the centennial problem appears to be the paper of Lagarias and Chuanming Zong [10, p. 1545], in which the number 20 is suggested as the correct answer. However, the problem seems to have circulated in the community for many years. Paul Sally (1933–2013) told Lagarias that he encountered the problem at Lincoln Labs in 1958.

A tiling problem. Here is a problem that has an elegant solution using invariants (an invariant is a quantity that is unchanged throughout a process). Is it possible to tile a chessboard with two opposite corners removed (Figure 2(b)) with 1 × 2 dominoes (Figure 2(a))? If we assign white the number −1 and black the number 1, then the sum of the values in any figure tiled by dominoes is zero. This is an invariant: this value is the same regardless of how many dominoes are used or where they are placed. Since the sum of the values in the modified chessboard is −2, no such tiling is possible. See the 2003 entry for a more difficult variation on this problem.

Figure 2. Is it possible to tile a chessboard that has two opposite corners removed with 1 × 2 dominoes? (a) A 1 × 2 domino. (b) A chessboard with two opposite corners removed.

Zombies and monovariants. Invariants permeate much of mathematics and science. In physics one sees this with conservation of energy, momentum, and angular momentum. A related and also useful concept is that of a monovariant, a quantity whose value can change in only one direction throughout a series of transformations. Suppose that the world is an n × n chessboard and that each square is occupied either by a zombie or a person. If a square is occupied by a zombie, then it remains occupied by a zombie in all subsequent rounds. If a square has at least two edges that border zombie-infested squares, then the person in that square becomes a zombie in the next round. If not, then the person remains as they were. We iterate this process round after round; see Figure 3 for a sample evolution.

Figure 3. The spread of an infection ((a) Round 0 through (d) Round 3). After the third round no more cells will be infected. Thus, seven people survive.

Figure 4. Two of the possibilities that can happen when a cell is infected: (a) the perimeter of the infected area remains unchanged; (b) the perimeter of the infected area decreases by four.

For a given configuration, will the zombie infection spread and infect everyone, or will some people survive? Some configurations lead to universal infestation. For example, if each square is occupied by a zombie, then the zombies have already won. As another example, a checkerboard pattern of zombies and people also leads to universal infestation. Both of these initial configurations have on the order of n² cells that are initially infected. Can we reduce this to approximately n initial zombies and still end with a complete takeover? Yes. One can show that infecting the main diagonal suffices and this requires only n zombies to start with.
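The diagonal claim is easy to confirm by direct simulation; here is a minimal Python sketch of the update rule described above:

```python
def spread(grid):
    """One round: a person becomes a zombie if at least two of its (up to
    four) edge-neighbors are zombies; zombies stay zombies forever."""
    n = len(grid)
    new = [row[:] for row in grid]
    for i in range(n):
        for j in range(n):
            if not grid[i][j]:
                infected = sum(grid[x][y]
                               for x, y in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                               if 0 <= x < n and 0 <= y < n)
                if infected >= 2:
                    new[i][j] = True
    return new

n = 8
grid = [[i == j for j in range(n)] for i in range(n)]  # zombies on the main diagonal
for _ in range(2 * n):  # more than enough rounds to stabilize
    grid = spread(grid)
print(all(all(row) for row in grid))  # the infection takes over the whole board
```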



Can the undead rule the world if there are only n − 1 zombies at the beginning? No! One way to see this is to introduce the following monovariant. At time t, let P(t) denote the perimeter of the infected area. One can show that P(t) is nonincreasing; two cases are shown in Figure 4. Since the perimeter of the n × n board is 4n and since the maximum possible perimeter of a configuration with n − 1 infected squares is 4(n − 1), it follows that at least one person will be safe if the zombie apocalypse commences with only n − 1 zombies. We leave it as an exercise to the reader to determine exactly how many people will be safe.

Zeckendorf revisited. For another example of a monovariant, we return to Zeckendorf's theorem (see the 1970 entry). By defining the right monovariant, one can show that among all decompositions of a natural number as a sum of Fibonacci numbers, none have fewer summands than the Zeckendorf decomposition. Given a decomposition of n as a sum of Fibonacci numbers (we use the convention here that F₁ = 1, F₂ = 2, and F_{k+1} = F_k + F_{k−1}), consider the sum of the indices of the terms that appear in the decomposition. If two adjacent summands F_k and F_{k−1} appear, we do not increase the index sum by replacing them with F_{k+1}. If F_k is used twice in the decomposition, then use the identity 2F_k = F_{k−2} + F_{k−1} + F_k = F_{k−2} + F_{k+1} to replace the two occurrences of F_k with F_{k−2} + F_{k+1}, which has a smaller index sum. These two processes can occur only finitely many times. When the procedure terminates, there can be no repeats or adjacencies in the decomposition. Thus, we have a Zeckendorf decomposition. See [4] for generalizations.

Bibliography
[1] Aristotle, On the Heavens, http://classics.mit.edu/Aristotle/heavens.html.
[2] M. Aigner and G. M. Ziegler, Proofs from The Book, 3rd ed., including illustrations by Karl H. Hofmann, Springer-Verlag, Berlin, 2004. MR2014872
[3] D. Ciesielska and K. Ciesielski, Equidecomposability of polyhedra: a solution of Hilbert's third problem in Kraków before ICM 1900, Math. Intelligencer 40 (2018), no. 2, 55–63, DOI 10.1007/s00283-017-9748-4. MR3814621
[4] K. Cordwell, M. Hlavacek, C. Huynh, S. J. Miller, C. Peterson, and Y. N. T. Vu, On summand minimality of generalized Zeckendorf decompositions, https://arxiv.org/abs/1608.08764.
[5] H. E. Debrunner, Über Zerlegungsgleichheit von Pflasterpolyedern mit Würfeln (German), Arch. Math. (Basel) 35 (1980), no. 6, 583–587 (1981), DOI 10.1007/BF01235384. MR604258
[6] M. Dehn, Ueber den Rauminhalt, Mathematische Annalen 55 (1901), no. 3, 465–478. http://gdz.sub.uni-goettingen.de/dms/load/img/?PID=GDZPPN002258633.
[7] G. N. Frederickson, Dissections: plane & fancy, Cambridge University Press, Cambridge, 1997. MR1735254
[8] D. Hilbert, Über das Unendliche, Math. Ann. 95 (1926), 161–190. http://link.springer.com/article/10.1007%2FBF01206605. See also http://www.ams.org/journals/bull/1902-08-10/S0002-9904-1902-00923-3/S0002-9904-1902-00923-3.pdf.
[9] L. A. Krasilnikova, Hilbert's Third Problem (A Story of Threes), MIT Admissions Blog, February 25, 2015, http://mitadmissions.org/blogs/entry/hilberts-third-problem-a-story-of-threes and http://sciencecow.mit.edu/me/hilberts_third_problem.pdf.
[10] J. C. Lagarias and C. Zong, Mysteries in packing regular tetrahedra, Notices Amer. Math. Soc. 59 (2012), no. 11, 1540–1549, DOI 10.1090/noti918. http://www.ams.org/notices/201211/rtx121101540p.pdf. MR3027108



[11] J. J. O'Connor and E. F. Robertson, Farkas Wolfgang Bolyai, MacTutor History of Mathematics, http://www-groups.dcs.st-and.ac.uk/history/Biographies/Bolyai_Farkas.html.
[12] Wikipedia, Dehn invariant, https://en.wikipedia.org/wiki/Dehn_invariant.
[13] Wikipedia, Hilbert's third problem, https://en.wikipedia.org/wiki/Hilbert%27s_third_problem.
[14] Wolfram MathWorld, Dissection, http://mathworld.wolfram.com/Dissection.html.


The Mason–Stothers Theorem

Introduction

In 1981, Walter Wilson Stothers (1946–2009) proved a remarkable theorem about polynomials [10], later discovered independently by Richard C. Mason [3]. Although broader generalizations exist, we state it here for polynomials over the complex numbers for the sake of simplicity. The Mason–Stothers theorem states that if a, b, c are relatively prime polynomials, not all of which are constant, and a + b = c, then

  max{deg a, deg b, deg c} ≤ deg rad(abc) − 1,    (1981.1)

in which rad f denotes the radical of f, that is, the product of the distinct irreducible factors of f. For example, rad(x³(x + 1)²) = x(x + 1). Since the field of complex numbers is algebraically closed,

  deg rad(abc) = number of distinct roots of abc.

What is the importance of the Mason–Stothers theorem? The centennial problem for this year is to prove the polynomial version of Fermat's last theorem! If that is not motivation enough, perhaps an integer analogue of the Mason–Stothers theorem will interest you. Why should such an analogue exist? As students of abstract algebra know, there are a great many similarities between integers and polynomials. For example, they both form rings and they both enjoy unique factorization into irreducibles. The integers have the primes as their basic building blocks and the polynomials have the linear polynomials as theirs. Here is a first attempt at an integer analogue of (1981.1). Suppose that a, b, c are relatively prime integers and a + b = c. A naive generalization of (1981.1) is

  max{|a|, |b|, |c|} ≤ rad(abc),    (1981.2)

in which rad(abc) denotes the product of the distinct prime factors of abc. For example, rad(200) = rad(2³ · 5²) = 2 · 5 = 10. Unfortunately, (1981.2) is false, even if we replace rad(abc) with K rad(abc) for some large K > 0 [1]. Let p ≥ 2K be a large prime and let

  a = 1,  b = 2^{p(p−1)} − 1,  and  c = 2^{p(p−1)}.

Then Euler's generalization of Fermat's little theorem ensures that p² divides b and hence the strengthened (1981.2) implies that

  c ≤ K rad(abc) ≤ 2bK/p < 2Kc/p ≤ c,




which is absurd. Consequently, an integral generalization of the Mason–Stothers theorem must be more subtle. The celebrated abc-conjecture states that for any > 0, there exists a constant κ so that if a, b, c are relatively prime integers and a + b = c, then  1+ max{|a|, |b|, |c|} ≤ κ rad(abc) . It was posed independently in 1985 by David Masser (1948– ) [4] and in 1988 by Joseph Oesterl´e (1954– ) [6]. Although the Mason–Stothers theorem has several short proofs (see the comments below for the proofs of Noah Snyder (1980– ) [9] and Joseph H. Silverman (1955– ) [8]), the abc-conjecture is much more stubborn. It is considered to be one of the most important open problems in number theory [1]. In 2012, Shinichi Mochizuki (1969– ), a respected mathematician previously best known for proving a conjecture of Alexander Grothendieck, released a series of four papers on “Inter-universal Teichm¨ uller theory,” totaling over 500 pages, which claim to contain a proof of the abc-conjecture [5]. As of 2019, there is no universal consensus about the validity of the proof. Centennial Problem 1981 Proposed by Jeffery Paul Wheeler, University of Pittsburgh. Use the Mason–Stothers theorem to prove a polynomial version of Fermat’s last theorem: if x, y, z are relatively prime polynomials, not all of which are constant, then xn + y n = z n has no solutions with n ≥ 3. 1981: Comments Snyder’s proof of the Mason–Stothers theorem. In 2000, Noah Snyder provided a simple proof of the Mason–Stothers theorem [9, 11]. Let a, b, c be relatively prime polynomials, not all of which are constant, and suppose that a+b+c = 0 (it is more convenient to work with this symmetric version instead of a + b = c). Then a, b, c are pairwise relatively prime since any polynomial that divides two of a, b, c divides the third. Since a + b + c = 0, the three Wronskians W (a, b) = ab − a b, 


W(b, c) = bc′ − b′c = b(−a′ − b′) − b′(−a − b) = ab′ − a′b,
W(c, a) = ca′ − c′a = (−a − b)a′ − (−a′ − b′)a = ab′ − a′b

are equal. Let W = W(a, b) = W(b, c) = W(c, a) denote their common value. We claim that W is not the zero polynomial. Without loss of generality, suppose that a′ ≠ 0; that is, a is not constant. If W = 0, then ab′ = a′b and hence a divides a′ since gcd(a, b) = 1. However, this contradicts the fact that deg a > deg a′. Thus, W is not identically zero.



The various formulas for W ensure that gcd(a, a′), gcd(b, b′), and gcd(c, c′) each divide W. Since these three polynomials are pairwise relatively prime, W is divisible by their product and hence

deg gcd(a, a′) + deg gcd(b, b′) + deg gcd(c, c′) ≤ deg W. (1981.4)


deg gcd(a, a′) ≥ deg a − (number of distinct roots of a),
deg gcd(b, b′) ≥ deg b − (number of distinct roots of b),   (1981.5)
deg gcd(c, c′) ≥ deg c − (number of distinct roots of c).



From (1981.3), we have deg W ≤ deg a + deg b − 1. Putting (1981.4) and (1981.5) together, simplifying, and using the relative primality of a, b, c yields

deg c ≤ (number of distinct roots of a) + (number of distinct roots of b) + (number of distinct roots of c) − 1
= (number of distinct roots of abc) − 1
= deg rad(abc) − 1.

By symmetry, the same argument provides identical bounds for deg a and deg b. This yields (1981.1), as desired.

Silverman's proof of the Mason–Stothers theorem. Joseph H. Silverman (1955– ) proved a more general result that, when distilled, provides an elegant proof of the Mason–Stothers theorem [8]. We follow the presentation in [1]. Suppose that π : C ∪ {∞} → C ∪ {∞} is a rational function; that is, π(t) = f(t)/g(t), in which f, g are polynomials. Since the Riemann sphere C ∪ {∞} has genus 0, the Riemann–Hurwitz formula from the theory of Riemann surfaces [2] tells us

2 deg π = 2 + Σ_{z ∈ C∪{∞}} (deg π − |π^{-1}(z)|), (1981.6)

in which deg π = max{deg f, deg g} and

π^{-1}(z) = {w ∈ C ∪ {∞} : π(w) = z} = {w ∈ C ∪ {∞} : f(w) − zg(w) = 0}.

In particular, |π^{-1}(z)| ≤ deg π, with equality unless f(w) − zg(w) has a double root. If a, b, c are relatively prime polynomials, not all of which are constant, and a + b = c, then let π = a/c. Every term on the right-hand side of (1981.6) is nonnegative and hence

2 deg π ≥ 2 + Σ_{z ∈ {0,1,∞}} (deg π − |π^{-1}(z)|).



Observe the following.
• If π(∞) ≠ 0, then π^{-1}(0) is the set of distinct roots of a.
• If π(∞) ≠ 1, then π^{-1}(1) is the set of distinct roots of b.
• If π(∞) ≠ ∞, then π^{-1}(∞) is the set of distinct roots of c.
Since ∞ belongs to at most one of these sets, we get

deg π ≤ (number of distinct roots of abc) − 1.

Since deg π = max{deg a, deg c} and b = c − a, we obtain

max{deg a, deg b, deg c} ≤ deg rad(abc) − 1.

This concludes the proof of the Mason–Stothers theorem.

Fermat's last theorem and the abc-conjecture. What does the abc-conjecture have to say about the Fermat equation

x^n + y^n = z^n, (1981.7)


in which x, y, z are natural numbers with gcd(x, y, z) = 1 and n ≥ 3? We follow the presentation in [1]. The case n = 3 was handled by Leonhard Euler in 1770, so we suppose that n ≥ 4. Let

a = x^n,  b = y^n,  c = z^n,

and observe that

rad(abc) = rad(x^n y^n z^n) = rad(xyz) ≤ xyz ≤ z^3.

For each ε > 0, the abc-conjecture provides a constant κ_ε so that

z^n = max{a, b, c} ≤ κ_ε rad(abc)^{1+ε} ≤ κ_ε (z^3)^{1+ε} = κ_ε z^{3+3ε}.

If ε < 1/3, then the exponent on the right-hand side is less than 4, so z is bounded and the preceding inequality has only finitely many solutions. Since x, y ≤ z, it follows that (1981.7) has only finitely many solutions. This result was proved for n ≥ 5 (without the abc-conjecture) by Gerd Faltings (1954– ) as a consequence of his proof of Mordell's conjecture (a curve of genus greater than 1 over the field of rational numbers has only finitely many rational points). Faltings earned the Fields Medal in 1986 for that result.

Solution to the problem. Suppose that x, y, z are relatively prime polynomials, not all of which are constant, such that x^n + y^n = z^n for some n ≥ 3. The Mason–Stothers theorem with a = x^n, b = y^n, and c = z^n ensures that

n deg x ≤ n max{deg x, deg y, deg z} = max{deg x^n, deg y^n, deg z^n} ≤ deg rad(x^n y^n z^n) − 1 = deg rad(xyz) − 1 ≤ deg x + deg y + deg z − 1,



and similarly for n deg y and n deg z. Therefore,

n(deg x + deg y + deg z) ≤ 3(deg x + deg y + deg z) − 3

and hence

n ≤ 3 − 3/(deg x + deg y + deg z) < 3.

Thus, the polynomial version of the Fermat equation has no solutions for n ≥ 3 in which not all the polynomials involved are constant.
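The exponent 1 + ε in the integer abc-conjecture is genuinely necessary: coprime triples a + b = c with c > rad(abc) exist, though they are sparse. A brute-force search illustrates this (an illustrative sketch; the helper names are ours, not from the text):

```python
from math import gcd

def rad(n):
    """Radical of n: the product of the distinct primes dividing n."""
    r, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            r *= p
            while n % p == 0:
                n //= p
        p += 1
    return r * (n if n > 1 else 1)

# Coprime triples a + b = c with c > rad(abc): these show the inequality
# c <= rad(abc) can fail, forcing the 1 + epsilon and the constant.
hits = []
for c in range(3, 200):
    for a in range(1, c // 2 + 1):
        b = c - a
        if gcd(a, b) == 1 and c > rad(a * b * c):
            hits.append((a, b, c))
print(hits[:3])  # → [(1, 8, 9), (5, 27, 32), (1, 48, 49)]
```

For instance, 1 + 8 = 9 with rad(1 · 8 · 9) = 6 < 9.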

Bibliography
[1] A. Granville and T. J. Tucker, It's as easy as abc, Notices Amer. Math. Soc. 49 (2002), no. 10, 1224–1231. http://www.ams.org/notices/200210/fea-granville.pdf. MR1930670
[2] G. A. Jones and D. Singerman, Complex functions: An algebraic and geometric viewpoint, Cambridge University Press, Cambridge, 1987. MR890746
[3] R. C. Mason, Diophantine equations over function fields, London Mathematical Society Lecture Note Series, vol. 96, Cambridge University Press, Cambridge, 1984. MR754559
[4] D. W. Masser, Open problems, Proceedings of the Symposium on Analytic Number Theory, Imperial College, London, 1985.
[5] S. Mochizuki, http://www.kurims.kyoto-u.ac.jp/~motizuki/top-english.html.
[6] J. Oesterlé, Nouvelles approches du "théorème" de Fermat (French), Astérisque 161-162 (1988), Exp. No. 694, 4, 165–186 (1989). Séminaire Bourbaki, Vol. 1987/88. MR992208
[7] P. Ribenboim, 13 Lectures on Fermat's Last Theorem, Springer, 1979.
[8] J. H. Silverman, The S-unit equation over function fields, Math. Proc. Cambridge Philos. Soc. 95 (1984), no. 1, 3–4, DOI 10.1017/S0305004100061235. MR727073
[9] N. Snyder, An alternate proof of Mason's theorem, Elem. Math. 55 (2000), no. 3, 93–94, DOI 10.1007/s000170050074. http://cr.yp.to/bib/2000/snyder.pdf. MR1781918
[10] W. W. Stothers, Polynomial identities and Hauptmoduln, Quart. J. Math. Oxford Ser. (2) 32 (1981), no. 127, 349–370, DOI 10.1093/qmath/32.3.349. http://qjmath.oxfordjournals.org/content/32/3/349.extract. MR625647
[11] Wikipedia, Mason–Stothers theorem, https://en.wikipedia.org/wiki/Mason-Stothers_theorem.


Two Envelopes Problem

Introduction
The debate about whether the natural numbers and the primes are built into the universe or whether they are human constructs has raged for centuries. In A Mathematician's Apology, G. H. Hardy (see the 1920, 1923, and 1940 entries) asserts:

317 is a prime, not because we think so, or because our minds are shaped in one way rather than another, but because it is, because mathematical reality is built that way.

We make no attempt to wade into these deep waters here: you are welcome to consider Figures 1 and 2 and draw your own conclusions. Despite his legendary aversion to applicable mathematics, Hardy helped to solidify some of the theoretical underpinnings of probability theory. Famed probabilist Persi Diaconis (1945– ) wrote [1]: Despite a true antipathy to the subject, Hardy contributed deeply to modern probability. His work with Ramanujan begat probabilistic number theory. His work on Tauberian theorems and divergent series has probabilistic proofs and interpretations. Finally, Hardy spaces are a central ingredient in stochastic calculus. . . . I want to argue that Hardy had no knowledge of probability theory and indeed had a genuine antipathy to the subject. To begin with, Hardy loved clear rigorous argument. At the time he worked, the mathematical underpinnings of probability were a vague mess. . . it was only in 1933 that Kolmogorov gave a measure theoretic interpretation of probability; a random variable was defined as a measurable function. Then one could see that early workers in probability; Bernoulli, Laplace, Gauss, Chebychev, Markov were doing mathematics after all.

The naive approach to probability is full of pitfalls and paradoxes. It took many years for the theory to be established on firm foundations. We have seen several examples of paradoxes throughout this book. Each one provides a valuable opportunity for further work: it means there is something incomplete or incompatible with our view of mathematics. We began with paradoxes related to the notion of infinity in the 1918 entry. Then we encountered the Banach–Tarski paradox in the 1924 entry, which challenged our understanding of what area and volume are. We continued with the liar's paradox and Russell's paradox in the 1929 entry and discussed related issues in set theory and logic. This is just the start! We have many other paradoxes in the entries before this, as well as a few more ahead (see the Monty Hall problem in the 1990 entry).



Figure 1. Six balls can be arranged to form a rectangular array, each side of which consists of two or more balls. That is, six is a composite number. The arrangements above are physical demonstrations of the factorizations 3 × 2 = 6 and 2 × 3 = 6. Are these statements about the physical universe itself or about a human construct? Would the result be the same if six physical balls were used on the other side of the galaxy? Would we suddenly find ourselves unable to arrange six balls in a rectangular array as above?

Figure 2. Seven balls cannot be arranged into a rectangular array, each side of which consists of two or more balls. That is, seven is a prime number. Is the primality of seven built into the universe? Is it a human construct? Is it conceivable that, in another time or place, seven balls could be placed in a rectangular array (each side of which consists of two or more balls)?

For this year, we look at the two envelopes problem. A player can choose between two closed, identically constructed envelopes. The envelopes are labeled A and B, respectively. Both envelopes contain money, although one of them contains twice as much as the other. The one that contains more money cannot be determined without opening the envelopes. You initially choose envelope A but do not open it. You are permitted to switch envelopes indefinitely until a final decision is made. Which envelope should you open? Where is the paradox?

Let us examine the expected value of switching the envelopes. Envelopes A and B each have a probability of 1/2 of containing the greater amount. Suppose that we choose A and let x > 0 denote the amount of money in the envelope. Should we switch? Since B has an equal chance of having either value, half the time it should contain half as much as A, namely x/2, and half the time it should contain twice as much, namely 2x. Thus, the expected



amount of money in B is

(1/2)(2x) + (1/2)(x/2) = (5/4)x. (1982.1)


Since this is larger than x, the amount in A, we should switch. Of course, the same argument applies to B as well. Therefore, we should continue to switch back and forth indefinitely since the expected return (5/4)^n x after n switches tends to infinity! In principle, this suggests that one can place $1 and $2 into two different envelopes, juggle them for several minutes, then open one of the envelopes with the expectation of receiving at least a billion dollars.

It is not entirely clear who first came up with the two envelopes problem. A variant of it appeared in a 1953 recreational mathematics book by Maurice Kraitchik (1882–1957), who considered a wager between two rich men who wished to determine whose necktie was more expensive. In the same year, Littlewood stated another variant and credited it to Schrödinger (see the 1925 entry). What is beyond a doubt is that the problem was popularized in 1982 by Scientific American writer and puzzle enthusiast Martin Gardner [3]; see the 1914 entry. The following problem was first proposed by Olle Häggström in 2013 [2]. It is similar to Newcomb's paradox, a variant of the two envelopes problem.
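A quick Monte Carlo experiment (an illustrative sketch; the function name is ours) already hints that something is wrong with the switching argument: empirically, switching confers no advantage.

```python
import random

# One round of the two envelopes game: one envelope holds x, the other 2x,
# and our initial pick is uniformly random.
def play_once(x=1.0):
    envelopes = [x, 2 * x]
    random.shuffle(envelopes)
    return envelopes[0], envelopes[1]  # (amount if we keep, amount if we switch)

random.seed(0)
trials = 200_000
keep = switch = 0.0
for _ in range(trials):
    a, b = play_once()
    keep += a
    switch += b

# Both strategies average (x + 2x)/2 = 1.5x, so switching gains nothing.
print(round(keep / trials, 2), round(switch / trials, 2))  # → 1.5 1.5
```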

Centennial Problem 1982
Proposed by Avery T. Carr, Emporia State University, and Steven J. Miller, Williams College.
An intelligent donor has prepared two boxes for you: a big one and a small one. The small one contains $1,000. The big one contains either $1,000,000 or nothing. You have a choice between accepting both boxes or just the big box. It seems obvious that you should accept both boxes, since that gives you $1,000 regardless of the content of the big box. However, the donor has tried to predict whether you will pick one box or two boxes. If the prediction is that you pick just the big box, then it contains $1,000,000. If the prediction is that you pick both boxes, then the big box is empty. The donor has performed the same experiment with many other people and has predicted correctly 90% of the time. What should you do?

1982: Comments
Resolution of the two envelopes problem. Unlike many of the other paradoxes that we have encountered, the issue for the two envelopes problem is easily highlighted and explained. Let x denote the amount in the lesser of the two envelopes. Then the total amount of money in the two envelopes is 3x = x + 2x and this cannot change. If A contains x dollars, then you gain x by switching. If A contains 2x dollars, then you lose x by switching. Therefore, the expected gain from switching is

(1/2)x + (1/2)(−x) = 0,



as expected. The issue with (1982.1) is that the terms 2x and x/2 are conditioned upon whether envelope B contains more or less money than A. Thus, a more complicated argument involving conditional probability is required to pursue that line of reasoning.

Bibliography
[1] P. Diaconis, G. H. Hardy and probability???, Bull. London Math. Soc. 34 (2002), no. 4, 385–402, DOI 10.1112/S002460930200111X. MR1897417
[2] O. Häggström, Paradoxes in Probability Theory (book review), Notices of the AMS 3 (2013), 329–331.
[3] M. Gardner, Aha! Gotcha: Paradoxes to Puzzle and Delight, W. H. Freeman & Co., 1982.
[4] Wikipedia, Two envelopes problem, http://en.wikipedia.org/wiki/Two_envelopes_problem.


Julia Robinson

Introduction
Julia Robinson was the first woman to become president of the American Mathematical Society (1983–1984). She shared her passion for mathematics with her sister and biographer, Constance Reid, who said about Robinson:

She herself, in the normal course of events, would never have considered recounting the story of her own life. As far as she was concerned, what she had done mathematically was all that was significant. [8]

"Significant" is a fitting word indeed when speaking about the magnitude of Robinson's mathematical accomplishments, especially regarding her contributions to the eventual resolution of Hilbert's tenth problem (see the 1970 entry). In the early years of the 20th century, David Hilbert proposed twenty-three problems that would shape mathematics for decades to come [3]. One of the underlying themes of the list was the question of decidability. Given a mathematical problem that falls into a certain class, is there a general algorithm that can solve every problem in the class? For his tenth problem, Hilbert considered the general solvability of Diophantine equations. These are equations of the form P(x_1, x_2, . . . , x_n) = 0, in which P is a polynomial with integer coefficients and the unknowns x_1, x_2, . . . , x_n are integers. Hilbert's tenth problem is the following: is there an algorithm that can determine the solvability of an arbitrary Diophantine equation?

Over the span of several decades, Robinson and her collaborators, Martin Davis and Hilary Putnam, proved that if there is at least one "Diophantine relation of exponential growth," then no such solvability algorithm exists. They later established that if a Diophantine equation P(a, b, c, x_1, x_2, . . . , x_m) = 0 has a solution in integers x_1, x_2, . . . , x_m if and only if a = b^c, then the necessary growth condition is met. Robinson later simplified this criterion to involve only two parameters a, b. In 1970, Yuri Matiyasevich discovered an equation that satisfied this criterion. This answered Hilbert's tenth problem in the negative (see the 1970 entry).

Decision problems are found in many areas of mathematics, such as combinatorics and graph theory. One particularly important problem is the traveling salesman problem: given a set of cities and the distances between them, find the shortest possible route that visits each city once and returns to the city of origin.
This is known to be an "NP-complete problem," which, without getting technical, means that no efficient algorithm for it is known, and that none exists unless P = NP.



A graph G is a set of vertices, represented by dots, that are connected by edges, represented by line segments that connect exactly two dots (see the 2006 entry). It is often fruitful to assign a weight to each edge. For example, these might represent the physical distance between two cities. The sum of the weighted edges of a path in G is the total weight of the path. In this context, the traveling salesman problem asks one to find the path of least weight that traverses all of the vertices once.
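Brute force makes the difficulty concrete: trying all (n − 1)! cycles works for tiny inputs but is hopeless as n grows. A sketch in Python with hypothetical distance data (the function name and data are ours, purely for illustration):

```python
from itertools import permutations

def tsp_brute_force(dist):
    """Minimum-weight Hamiltonian cycle by exhaustive search: O((n-1)!) tours."""
    n = len(dist)
    best = (float("inf"), None)
    for perm in permutations(range(1, n)):        # fix vertex 0 as the start
        tour = (0,) + perm
        cost = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if cost < best[0]:
            best = (cost, tour)
    return best

# Four cities with symmetric distances (hypothetical data).
dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
print(tsp_brute_force(dist))  # → (18, (0, 1, 3, 2))
```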

Centennial Problem 1983 Proposed by Avery T. Carr, Emporia State University. A cycle in a graph G is a closed path that traverses a set of vertices and passes through each exactly once. A Hamiltonian cycle of G is a cycle that contains every vertex of G; see Figure 1. Is there an efficient algorithm to find a Hamiltonian cycle (provided at least one exists) of minimum weight for an arbitrary, edge-weighted graph G? This is an equivalent abstract version of the traveling salesman problem. A positive solution would have notable implications in several industries, including logistics and computer engineering.

1983: Comments Diophantine sets and the prime numbers. We say that S ⊆ Nj is a Diophantine set if there is a Diophantine equation P (x1 , x2 , . . . , xj , y1 , y2 , . . . , yk ) = 0 so that (n1 , n2 , . . . , nj ) ∈ S if and only if there is an (m1 , m2 , . . . , mk ) ∈ Nk so that P (n1 , n2 , . . . , nj , m1 , m2 , . . . , mk ) = 0. For example, the set {1, 4, 9, 16, . . .} is Diophantine since we may let P (x, y) = y 2 − x. Indeed, x is a perfect square if and only if x = y 2 for some y ∈ N, that is, if and only if P (x, y) = 0. Similarly, the arithmetic progression {a + b, 2a + b, 3a + b, . . .} is Diophantine, as witnessed by the polynomial P (x, y) = ay + b − x. Is the set of prime numbers Diophantine? Surely the primes are random and unpredictable enough that they could never be encapsulated by a single polynomial, right? In 1976, James P. Jones, Daihachiro Sato, Hideo Wada, and Douglas Wien found such a polynomial. They write: Martin Davis, Yuri Matijasevich, Hilary Putnam and Julia Robinson have proven that every recursively enumerable set is Diophantine, and hence that the set of prime numbers is Diophantine. . . it follows that the set of prime numbers is representable by a polynomial formula. In this article such a prime representing polynomial will be exhibited in explicit form. [5]


Figure 1. Hamiltonian cycles (right) for the edge graphs of (a) the cube, (b) the octahedron, and (c) the dodecahedron.




The set of prime numbers, they show, is precisely the set of positive values assumed by the polynomial

P(a, b, c, . . . , x, y, z)
= (k + 2){1 − [wz + h + j − q]^2 − [(gk + 2g + k + 1)(h + j) + h − z]^2
− [16(k + 1)^3 (k + 2)(n + 1)^2 + 1 − f^2]^2 − [2n + p + q + z − e]^2
− [e^3 (e + 2)(a + 1)^2 + 1 − o^2]^2 − [(a^2 − 1)y^2 + 1 − x^2]^2
− [16r^2 y^4 (a^2 − 1) + 1 − u^2]^2 − [n + ℓ + v − y]^2
− [(a^2 − 1)ℓ^2 + 1 − m^2]^2 − [ai + k + 1 − ℓ − i]^2
− [((a + u^2 (u^2 − a))^2 − 1)(n + 4dy)^2 + 1 − (x + cu)^2]^2
− [p + ℓ(a − n − 1) + b(2an + 2a − n^2 − 2n − 2) − m]^2
− [q + y(a − p − 1) + s(2ap + 2a − p^2 − 2p − 2) − x]^2
− [z + pℓ(a − p) + t(2ap − p^2 − 1) − pm]^2},

which is of degree 25 and has 26 variables. An interesting corollary is that if p is a prime number, then there is a computation that confirms the primality of p that involves only 87 additions and multiplications. Indeed, one need only exhibit natural numbers a, b, c, d, . . . , x, y, z so that P(a, b, c, . . . , x, y, z) = p. Of course, finding such numbers is no easy task.

Mills's constant. The existence of a single polynomial that encodes the prime numbers is shocking. What about a simple formula that produces only prime numbers? In 1947, William H. Mills proved the existence of a constant A (called Mills's constant) so that ⌊A^(3^n)⌋ is prime for n = 0, 1, 2, . . .. Since there are actually uncountably many values of A with this property, the term "Mills's constant" is mildly inappropriate, especially since Mills himself did not specify a precise numerical value for A. Assuming the truth of the Riemann hypothesis, the smallest possible Mills's constant begins

1.3063778838630806904686144926026057129167845851567136 . . . .

It was calculated to 6,850 decimal places by Chris K. Caldwell and Yuanyou Cheng in 2005 [2]. It is unknown whether this constant is rational or irrational. The proof of Mills's result indicates how one might go about constructing such a constant A.
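The construction can be imitated numerically in miniature. Starting from the seed P_0 = 2 and repeatedly taking the smallest prime above P_n^3 produces a Mills-type constant for that seed (an illustrative sketch with trial-division primality testing and double-precision arithmetic; the helper names are ours):

```python
def is_prime(n):
    """Trial division; adequate for the modest numbers appearing here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True

def next_prime_above(n):
    k = n + 1
    while not is_prime(k):
        k += 1
    return k

# Take P_{n+1} to be the smallest prime above P_n^3; the assert confirms it
# lands below (P_n + 1)^3, as the prime-gap bound guarantees for large seeds.
P = [2]
for _ in range(3):
    q = next_prime_above(P[-1] ** 3)
    assert q < (P[-1] + 1) ** 3
    P.append(q)
print(P)  # → [2, 11, 1361, 2521008887]

# u_n = P_n^(3^-n) increases to a Mills-type constant A for this seed, and
# floor(A^(3^n)) recovers P_n (double precision suffices for n <= 2).
A = P[3] ** (3.0 ** -3)
assert [int(A ** 3 ** n) for n in range(3)] == P[:3]
```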
Unfortunately, one needs to know a lot about the distribution of the prime numbers to compute A, and hence Mills's result is not a practical method for producing primes. Here is the proof. Let p_n denote the nth prime number. Using knowledge about the rate of growth of the Riemann zeta function on the critical line 1/2 + it, Albert Ingham showed in 1937 that there is a constant K so that¹

p_{n+1} − p_n < K p_n^(5/8)

¹Roger C. Baker (1947– ), Glyn Harman (1956– ), and János Pintz (1950– ) showed that the exponent can be lowered from 5/8 = 0.625 to 0.525 [1]. Assuming the Riemann hypothesis, this can be reduced even further and the constant K made explicit.



for all n ∈ N [4]. We use this to show that if N ≥ K^8, then there is a prime p so that

N^3 < p < (N + 1)^3. (1983.1)

To see this, let p_n be the largest prime with p_n ≤ N^3. Then

N^3 < p_{n+1} < p_n + K p_n^(5/8) ≤ N^3 + K N^(15/8) ≤ N^3 + N^2 < (N + 1)^3 − 1,

as desired. If P_0 ≥ K^8 is prime, then (1983.1) permits us to find a sequence P_0, P_1, P_2, . . . of primes so that

P_n^3 < P_{n+1} < (P_n + 1)^3 − 1. (1983.2)

Define

u_n = P_n^(3^{-n})  and  v_n = (P_n + 1)^(3^{-n}).

Then perform a few computations based upon (1983.2) to verify that u_n < u_{n+1} < v_{n+1} < v_n for all n. In particular, the sequence u_n is increasing and bounded above by v_0 and is therefore convergent. Define

A = lim_{n→∞} u_n

and observe that u_n < A < v_n, and hence

P_n < A^(3^n) < P_n + 1,

for all n. Thus,

⌊A^(3^n)⌋ = P_n

is prime for n = 0, 1, 2, . . ..

Bibliography
[1] R. C. Baker, G. Harman, and J. Pintz, The difference between consecutive primes. II, Proc. London Math. Soc. (3) 83 (2001), no. 3, 532–562, DOI 10.1112/plms/83.3.532. MR1851081
[2] C. K. Caldwell and Y. Cheng, Determining Mills' constant and a note on Honaker's problem, J. Integer Seq. 8 (2005), no. 4, Article 05.4.1, 9. MR2165330
[3] D. Hilbert, Über das Unendliche, Math. Ann. 95 (1926), 161–190. http://link.springer.com/article/10.1007%2FBF01206605. See also http://www.ams.org/journals/bull/1902-08-10/S0002-9904-1902-00923-3/S0002-9904-1902-00923-3.pdf.
[4] A. E. Ingham, On the difference between consecutive primes, Quart. J. Math. Oxford 8 (1937), 255–266.
[5] J. P. Jones, D. Sato, H. Wada, and D. Wiens, Diophantine representation of the set of prime numbers, Amer. Math. Monthly 83 (1976), no. 6, 449–464, DOI 10.2307/2318339. https://www.jstor.org/stable/2318339. MR0414514
[6] Y. Matiyasevich, My Collaboration with Julia Robinson, http://logic.pdmi.ras.ru/~yumat/personaljournal/collaborationjulia/index.html.
[7] W. H. Mills, A prime-representing function, Bull. Amer. Math. Soc. 53 (1947), 604, DOI 10.1090/S0002-9904-1947-08849-2. MR0020593



[8] C. Reid, The autobiography of Julia Robinson, College Math. J. 17 (1986), no. 1, 3–21, DOI 10.2307/2686866. https://www.maa.org/sites/default/files/pdf/upload_library/22/Polya/07468342.di020720.02p00912.pdf. MR827630
[9] C. Reid, Being Julia Robinson's sister, Notices Amer. Math. Soc. 43 (1996), no. 12, 1486–1492. http://www.ams.org/notices/199612/reid.pdf. MR1416722
[10] Wikipedia, Hilbert's tenth problem, http://en.wikipedia.org/wiki/Hilbert's_tenth_problem.
[11] Wikipedia, Julia Robinson, http://en.wikipedia.org/wiki/Julia_Robinson.
[12] C. Wood, Julia Robinson and Hilbert's Tenth Problem (film review), Notices Amer. Math. Soc. (2008), 573–575. http://www.ams.org/notices/200805/tx080500573p.pdf.


1984

Introduction
The year is the title of this entry. The other entries of this work honor mathematicians or mathematical events; in a sense, this year honors math itself. Here 1984 refers to the classic dystopian novel, 1984, by George Orwell (1903–1950). Written thirty-five years prior to 1984, it describes a world at perpetual war in which the three major governments manipulate and control their populations. Some of the methods of control are centuries old, such as informants, constant surveillance, and fear. Others are either new or are given a clearer expression than before, such as Newspeak (the language of Oceania, designed to limit freedom of thought by restricting what can be discussed). One of the most famous passages of the novel involves the "equation" 2 + 2 = 5, which is false (unless one works modulo 1, as one does in certain Diophantine approximation problems; see the 1922, 1931, 1938, and 1972 entries). The protagonist, Winston Smith, is thinking about Big Brother, the rule of the party, and "alternative facts":

In the end the Party would announce that two and two made five, and you would have to believe it. It was inevitable that they should make that claim sooner or later: the logic of their position demanded it. Not merely the validity of experience, but the very existence of external reality, was tacitly denied by their philosophy. The heresy of heresies was common sense. And what was terrifying was not that they would kill you for thinking otherwise, but that they might be right. For, after all, how do we know that two and two make four? Or that the force of gravity works? Or that the past is unchangeable? If both the past and the external world exist only in the mind, and if the mind itself is controllable. . . what then?

A few paragraphs later, the chapter ends with Winston thinking: Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.

The Star Trek: The Next Generation episode Chain of Command, Part II features a striking homage to 1984 that mirrors the powerful scene in which Winston Smith is tortured by O'Brien (Orwell does not provide the character with a first name).¹ A Cardassian torturer shines four bright lights on a physically restrained Jean-Luc Picard, who is pressured most unpleasantly over many days to say that he sees five lights. Picard's captor gives up on obtaining useful information from Picard and focuses only on breaking his spirit. Fortunately, a diplomatic solution is found by the higher-ups and Picard's release is ordered, although not before he almost gives in. As he is ushered out of the torture chamber he shouts at his former captor, "There are four lights!"

In honor of Winston's quote about everything following from the freedom to say 2 + 2 = 4, this year's problem is the famous four fours puzzle: given four fours and the unlimited use of a finite set of mathematical operations, which natural numbers are constructible? For example,

44 − 44 = 0,  44/44 = 1,  4/4 + 4/4 = 2.

More creatively, we have 4! + 4! + 4/4 = 49. There are many variants of the puzzle. The following is from [6]:

Here's a brain teaser! Can you (with the help of your calculator, as needed) "build" all the whole numbers between 1 and 100 using only four 4's? Use only the + - × / ( ) . ∧ 2 = and 4 keys on your calculator; 4! = 4 × 3 × 2 × 1 is allowed, along with the repeating decimal .4 (.4444 . . .). (All the whole numbers up to 120 have been "built" with just four 4's—how many can you find?)
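The core of the puzzle, restricted to the four basic operations, can be explored exhaustively by computer. The sketch below (our own illustration; concatenation such as 44, factorials, powers, and decimals are omitted) enumerates every exact value buildable from exactly four 4's:

```python
from fractions import Fraction
from itertools import product

# vals[k] = the set of exact rational values buildable from exactly k fours
# using only +, -, *, and /.
vals = {1: {Fraction(4)}}
for k in range(2, 5):
    vals[k] = set()
    for i in range(1, k):
        for a, b in product(vals[i], vals[k - i]):
            vals[k].update({a + b, a - b, a * b})
            if b != 0:
                vals[k].add(a / b)

reachable = {int(v) for v in vals[4] if v.denominator == 1 and v >= 0}
print(sorted(reachable)[:10])  # → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Even this restricted version reaches every digit 0 through 9; the richer operation sets in the quoted puzzle extend the range much further.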

Centennial Problem 1984
Proposed by Steven J. Miller, Williams College.
To make the problem even more interesting, let us add a scoring component. Assign a cost of 1 unit to the four basic binary operations (addition, subtraction, multiplication, and division). Assign a cost of 2 units for exponentiation, factorials, and nth roots. Continue along these lines until you have assigned a value to all the operations you are allowed to use. Classify all numbers of cost at most C. Given some integer n, is there a bound on the minimal cost to represent it?

1984: Comments
So Long, and Thanks for All the Fish. The fourth book, So Long, and Thanks for All the Fish, in Douglas Adams's heralded "Hitchhiker's guide trilogy" was released in 1984. In the first book, The Hitchhiker's Guide to the Galaxy, the supercomputer Deep Thought (after a seven and a half million year long calculation) produced the "Answer to the Ultimate Question of Life, The Universe, and Everything": 42. This unhelpful response prompted the construction of a much more sophisticated computer that would find out what the Ultimate Question actually was.

¹This is an inversion of the "O'Brien must suffer" theme (as fans refer to it) in Star Trek: Deep Space Nine scripts. The character Miles O'Brien (2383– ), a noncommissioned everyman whom audience members could relate to, was often subjected to various physical and emotional tortures.



At the end of the second book, The Restaurant at the End of the Universe, after ten million additional years, the new computer (we will not spoil its identity) reveals that the Ultimate Question is, "What do you get if you multiply six by nine?" Unfortunately, an error was unintentionally introduced into the computation that rendered the result meaningless. Or did it? Most readers will agree that 6 × 9 = 54. However, it is true that 6 × 9 = 42 if the computations are carried out in base 13 since 54 = 4 · 13 + 2 · 1 = (42)_13. Adams has claimed that this was unintentional. On the other hand, the title "42" of the 2007 Doctor Who episode is an intentional simultaneous homage to Douglas Adams and the television show "24," along with a reference to the approximate running time of the episode (which proceeds in "real time").

Bieberbach conjecture. Ludwig Bieberbach (1886–1982) is best known for the conjecture that bears his name and for being a terrible person: he was a passionate Nazi who even wore a Nazi uniform while giving an examination [4]. Suppose that f(z) = Σ_{n=0}^∞ a_n z^n defines an analytic function on the open unit disk D in the complex plane. A sufficient condition for this is

lim sup_{n→∞} |a_n|^(1/n) ≤ 1,

which ensures that the radius of convergence of the power series that defines f is greater than or equal to 1. If f is one-to-one on D, what can be said about the growth of the Taylor coefficients a_n? Since f(0) = a_0, it makes sense to normalize f so that f(0) = 0; otherwise, a_0 could be any complex number. Then f′(0) = a_1 and there are two possibilities. If a_1 = 0, then f′(0) = 0 and basic complex analysis tells us that f is not one-to-one on any neighborhood of the origin. Thus, we may assume that a_1 ≠ 0. In this case, we may divide f by a_1 and, without loss of generality, we may as well assume that a_1 = 1:

f(z) = z + a_2 z^2 + a_3 z^3 + · · · .

Such a one-to-one analytic function is called a schlicht function. Suppose that f is schlicht. The Bieberbach conjecture, first posed in 1916, states that |a_n| ≤ n for n ≥ 2 and, moreover, that if equality is attained for some n, then

f(z) = z/(1 − αz)^2 = Σ_{n=1}^∞ n α^(n−1) z^n (1984.1)

for some constant α of absolute value one [1]. The function k(z) =

z/(1 − z)^2 = z + 2z^2 + 3z^3 + · · ·

is the Koebe function; see Figure 1. The extremal functions (1984.1) are just rotations of the Koebe function in the following sense: f(z) = α^(−1) k(αz).
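The expansion k(z) = z + 2z^2 + 3z^3 + · · ·, which shows that the Koebe function attains |a_n| = n for every n, can be confirmed by a quick series computation (an illustrative sketch):

```python
# Check k(z) = z/(1-z)^2 = z + 2z^2 + 3z^3 + ... via a Cauchy product:
# square the geometric series 1/(1-z), then shift by one power of z.
N = 8
geom = [1] * N                                 # 1/(1-z) = 1 + z + z^2 + ...
square = [sum(geom[i] * geom[n - i] for i in range(n + 1)) for n in range(N)]
koebe = [0] + square                           # multiply by z
print(koebe[1:N])  # → [1, 2, 3, 4, 5, 6, 7]: the coefficient of z^n is n
```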



Figure 1. The Koebe function, named after Paul Koebe (1882–1945), k(z) = z/(1 − z)^2 is a one-to-one map from D onto the complement of the half-line (−∞, −1/4] on the real axis.

The Bieberbach conjecture was finally proved in 1984 by Louis de Branges (1932– ) [2, 3]. Because of several unfortunate incidents in the past in which de Branges had claimed erroneous solutions to major open problems, his claims were greeted with some skepticism. He traveled to Leningrad in 1984, where several mathematicians worked through the proof and declared it sound. Overnight, de Branges went from being a pariah of sorts to a superstar. He was showered with various prizes, ranging from the Ostrowski Prize (1989) to the Steele Prize (1994).

Bibliography
[1] L. Bieberbach, Über die Koeffizienten derjenigen Potenzreihen, welche eine schlichte Abbildung des Einheitskreises vermitteln, Sitzungsber. Preuss. Akad. Wiss. Phys.-Math. Kl. (1916), 940–955.
[2] L. de Branges, A proof of the Bieberbach conjecture, Acta Math. 154 (1985), no. 1-2, 137–152, DOI 10.1007/BF02392821. http://link.springer.com/article/10.1007%2FBF02392821. MR772434
[3] L. de Branges, Underlying concepts in the proof of the Bieberbach conjecture, Proceedings of the International Congress of Mathematicians, Vol. 1, 2 (Berkeley, Calif., 1986), Amer. Math. Soc., Providence, RI, 1987, pp. 25–42. http://www.mathunion.org/ICM/ICM1986.1/Main/icm1986.1.0025.0042.ocr.pdf. MR934213
[4] J. J. O'Connor and E. F. Robertson, Ludwig Georg Elias Moses Bieberbach, MacTutor History of Mathematics, http://www-history.mcs.st-andrews.ac.uk/Biographies/Bieberbach.html.
[5] G. Orwell, Nineteen Eighty-Four: A novel, Secker & Warburg, 1949.
[6] Texas Instruments Incorporated, The Great International Math on Keys Book, Texas Instruments, 1976.


The Jones Polynomial

Introduction

A knot is an embedding of a circle in three-dimensional space. We consider here only tame knots; these are knots that can be physically realized with a string or rope that has a nonzero thickness. Given two knots, how do we determine if they are equivalent? That is, can we manipulate one of them into the other without cutting? For example, is the trefoil knot (Figure 1(b)) equivalent to the unknot (Figure 1(a))? Knot theorists try to solve these sorts of problems by associating a mathematical object (an invariant) to each knot in such a way that equivalent knots are assigned the same invariant. Thus, two knots with different invariants are truly different knots: neither can be manipulated into the other (on the other hand, two knots with the same invariant might turn out to be inequivalent). One desires knot invariants that are simple to compute and compare. So far, nobody has come up with a simple invariant that can distinguish between all nonequivalent knots. Although knots have been used since ancient times, their mathematical study began with Gauss's development of linking numbers in 1833. Although physicists

(a) The unknot.

(b) The trefoil knot.

Figure 1. The trefoil knot cannot be deformed into the unknot without passing through itself. The Jones polynomials of these two knots are $1$ and $-t^{-4} + t^{-3} + t^{-1}$, respectively. Since these polynomials are different, the two knots are not equivalent.



were interested in knots for a period in the 1800s, the modern study of knots only took off in the early 20th century. Max Dehn (see the 1980 entry), James Waddell Alexander (1888–1971), and Kurt Reidemeister (1893–1971) were early contributors. In particular, Alexander discovered the first knot polynomial [2]. The so-called Alexander polynomial of a knot is a Laurent polynomial (negative powers of the variable are permitted) with integer coefficients that is a knot invariant: two equivalent knots share the same Alexander polynomial. The year 1985 marked the publication of the explosive paper “A polynomial invariant for knots via von Neumann algebras” by Vaughan F. R. Jones [7]. This paper introduced the Jones polynomial of a knot, a Laurent polynomial invariant that is distinct from the Alexander polynomial and that could settle problems

Figure 2. Several knots and their Jones polynomials: $t^{-2} - t^{-1} + 1 - t + t^2$, $t^3 + t^5 - t^8$, $t^{-5} - 2t^{-4} + 2t^{-3} - 2t^{-2} + 2t^{-1} - 1 + t$, and $t^2 + t^4 - t^5 + t^6 - t^7$. Since their Jones polynomials are different, these knots are mutually nonequivalent.



that were impervious to previous methods; see Figure 2. It also exposed links between knot theory and physics that revitalized interest in the subject. Soon afterwards, a variety of other invariants, such as the HOMFLY polynomial [4], were discovered. Jones's work had ushered in a new age in knot theory. See [1, Ch. 6] for an introduction to the Alexander, Jones, and HOMFLY polynomials and how to compute them. The existence of the Jones polynomial was not the most surprising part of the paper [7]. The most stunning portion of the title is "von Neumann algebras," a highly technical and abstract branch of operator theory (think linear algebra in infinite-dimensional spaces with a hefty dose of analysis) with no initially apparent connections to low-dimensional topology at all. One could hardly have predicted deep links between two more seemingly disparate parts of mathematics! It is like claiming that techniques from the theory of large cardinals (transfinite numbers that are so large they require additional axioms beyond ZFC) could be used to solve open problems in biostatistics. Most mathematicians prior to Jones would have dismissed a connection between von Neumann algebras and knot theory as wild fantasy. For his amazing discovery, Jones was awarded the Fields Medal in 1990.

Centennial Problem 1985
Proposed by Chad Wiley, Emporia State University.

The Jones polynomial of the unknot is the constant polynomial 1. Are there any nontrivial knots which also have this property? Surprisingly, despite all the research that has been done, we still do not know the answer. A more accessible problem would be to show, perhaps using Rolfsen's tables [8], that a nontrivial knot with Jones polynomial 1 must have at least 11 crossings. In fact, it can be shown that such a knot would need to have at least 18 crossings; see [3] for details.

1985: Comments

Knot theory in other dimensions. It is only in three dimensions that knot theory is interesting.
In two dimensions, there is essentially only the unknot (the unit circle) since a knot cannot cross itself. In four dimensions and higher, there is too much freedom to manipulate knots. One can show that any knot in four dimensions can be untangled to obtain the unknot; see Figure 3 for an intuitive explanation of this phenomenon. Where do von Neumann algebras come in? Knot theorists rapidly built on the Jones polynomial and developed more direct constructions. They have since discovered many new invariants without the use of von Neumann algebras. Consequently, information about the technical details of Jones’s work is hard to come by in the knot theory literature. We attempt to sketch, in broad strokes, some of the details behind Jones’s discovery. We thank James Tener for his assistance in this endeavor. First of all, a von Neumann algebra is a collection of bounded linear operators on a (typically



Figure 3. We can represent the position of a knot in four dimensions with three spatial dimensions and one “color dimension.” Two portions of the knot that are different colors can slide past each other. Thus, even complicated knots like the Stevedore knot above can be unknotted in four dimensions.

infinite-dimensional) Hilbert space that satisfies several desirable algebraic and topological axioms. As a finite-dimensional analogue, one might think of the set of n × n complex matrices as a model, although do not get too comfortable with that analogy! A factor is a von Neumann algebra that is highly noncommutative, in the sense that its center (the set of all things that commute with everything in the algebra) consists only of multiples of the identity operator. In 1936, Francis Murray (1911–1996) and von Neumann introduced the theory of von Neumann algebras ("rings of operators") and showed that factors come in three basic flavors, called Type I, Type II, and Type III. Certain factors of Type II possess a well-behaved "trace" that behaves like the trace on the n × n matrices; these factors are called II$_1$ factors. A factor that is contained in another factor is, not surprisingly, a subfactor. Given a II$_1$ subfactor $M_{-1} \subseteq M_0$, Jones's basic construction produces a new factor $M_1$ that is generated by $M_0$ and an orthogonal projection $e_1$ [6]. One then iterates this procedure to get the Jones tower $M_{-1} \subseteq M_0 \subseteq M_1 \subseteq M_2 \subseteq \cdots$ and a sequence of orthogonal projections $e_j \in M_j$ such that $M_j$ is generated by $M_{j-1}$ and $e_j$. One then considers the finite-dimensional algebras $TL_n$ generated by $\{e_1, e_2, \ldots, e_n\}$. This algebra inherits a positive trace from the II$_1$ factors. A key property of this trace is the "Markov property": $\operatorname{tr}(x e_n) = \tau \operatorname{tr}(x)$ if $x \in TL_{n-1}$, in which τ is the reciprocal of the "index" of the subfactor. The projections satisfy



the Temperley–Lieb(–Jones) relations
• $e_j^2 = e_j$,
• $e_j e_k e_j = \tau e_j$ when $|j - k| = 1$, and
• $e_j e_k = e_k e_j$ when $|j - k| \geq 2$.
One cannot have an algebra with these relations and a positive trace unless
$$\tau^{-1} \in \{4 \cos^2(\pi/n) : n = 3, 4, \ldots\} \cup [4, \infty);$$
moreover, each of these values does occur. The fact that the spectrum of permissible index values has both a discrete and a continuous component is one of the most startling aspects of Jones's paper on subfactors [6]. The Temperley–Lieb(–Jones) relations are reminiscent of the relations that define the braid group:
$$B_n = \langle \sigma_1, \sigma_2, \ldots, \sigma_{n-1} : \sigma_i \sigma_j = \sigma_j \sigma_i \text{ if } |i - j| \geq 2,\ \sigma_i \sigma_{i+1} \sigma_i = \sigma_{i+1} \sigma_i \sigma_{i+1} \rangle.$$
It turns out that $TL_n$ contains a representation of the braid group $B_n$, with generators $(t + 1)e_j - 1$, where $\tau^{-1} = 2 + t + t^{-1}$. This formula gives a family of representations $\pi_t$ of $B_n$ (parametrized by $t \neq -1$) with special traces defined on them. However, for the rest of the story the positivity is not crucial, only the algebras $TL_n$ with their special trace, which can be defined for any complex number τ. At this point, Jones noticed the similarity between the Temperley–Lieb relations and the braid group relations and talked to Joan Birman (1927– ) about it [5]. Alexander's theorem says that any knot can be described as the closure of a braid. This process is not injective, but Markov's theorem says that two braids result in the same link if and only if one can be obtained from the other by two types of moves. The first move is conjugation in the braid group, so to obtain link invariants from representations of the braid group, one needs a trace on the representation π. Thus, $g \mapsto \operatorname{tr}(\pi(g))$ gives an invariant of the braid that is invariant under conjugation. The key to getting link invariants is to have a trace that is also invariant under the second Markov move. According to Jones's paper, he realized the connection between the second Markov move and the Markov property of the trace as a result of the conversations with Birman.
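For concreteness, projections obeying these relations do exist for suitable τ. The following sketch verifies them numerically with ad hoc 2 × 2 matrices of our own devising (an illustration only; these are not the projections arising from Jones's subfactor construction):

```python
# Toy 2x2 realization of the Temperley-Lieb relations in TL_3:
#   e1^2 = e1,  e2^2 = e2,  e1 e2 e1 = tau*e1,  e2 e1 e2 = tau*e2.
# The matrices are ad hoc, chosen so that the relations hold for the
# sample value tau = 0.25; they are purely illustrative.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def scale(c, a):
    return [[c * a[i][j] for j in range(2)] for i in range(2)]

def close(a, b, eps=1e-12):
    return all(abs(a[i][j] - b[i][j]) < eps
               for i in range(2) for j in range(2))

tau = 0.25
e1 = [[1.0, 0.0], [0.0, 0.0]]
e2 = [[tau, 1.0], [tau * (1 - tau), 1 - tau]]

assert close(matmul(e1, e1), e1)                          # e1 idempotent
assert close(matmul(e2, e2), e2)                          # e2 idempotent
assert close(matmul(matmul(e1, e2), e1), scale(tau, e1))  # e1 e2 e1 = tau e1
assert close(matmul(matmul(e2, e1), e2), scale(tau, e2))  # e2 e1 e2 = tau e2
```

(In TL₃ there is no pair of generators with $|j - k| \geq 2$, so only the first two kinds of relations appear.)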
Once one multiplies by an appropriate factor, the trace is indeed invariant under the second Markov move and hence gives an invariant of links. The normalizing factor uses the "writhe" of the link, which requires an orientation, so the Jones polynomial is actually an invariant of oriented links. One can then prove many of the properties of the Jones polynomial. For example, it is a Laurent polynomial in $t^{1/2}$. Since the Jones polynomial can distinguish between the trefoil knot and its mirror image, it was no mere variant of the Alexander polynomial; it was indeed something novel.

Quantum topology. We cannot resist sharing an enlightening exchange from MathOverflow [9]. One user asked, "Why should I care about the Jones polynomial?" A particularly erudite answer was provided by Jonny Evans, who responded:



same time, other manifold invariants (Donaldson invariants, instanton Floer homology) appeared that were also inspired by constructions in physics. These other manifold invariants had very definite topological consequences, solving huge open questions in 4-dimensional topology. Witten showed that all of these invariants (including the Jones polynomial) can be obtained formally by performing path integrals. For example, roughly speaking, the Jones polynomial can be obtained by looking at all possible connections A on a suitable bundle on your 3-manifold, taking the trace of the holonomy of A around your knot, multiplying by eiCS(A) where CS is the Chern–Simons invariant of A, and then integrating the result (over the infinite-dimensional space of all connections, which is a kind of path integral). Whether or not you care about physics, that is a pretty cool way to define an invariant. This led mathematicians to study topological quantum field theories, and the associated manifold invariants. In particular, Khovanov was led to discover his homological refinement of the Jones polynomial, which is extraordinarily useful as a knot invariant. . . .

See [10] for Witten’s down-to-earth exposition of the Jones polynomial and its relation to quantum mechanics. Bibliography [1] C. C. Adams, The knot book: An elementary introduction to the mathematical theory of knots, revised reprint of the 1994 original, American Mathematical Society, Providence, RI, 2004. MR2079925 [2] J. W. Alexander, Topological invariants of knots and links, Trans. Amer. Math. Soc. 30 (1928), no. 2, 275–306, DOI 10.2307/1989123. MR1501429 [3] O. T. Dasbach and S. Hougardy, Does the Jones polynomial detect unknottedness?, Experiment. Math. 6 (1997), no. 1, 51–56. http://www.or.uni-bonn.de/~hougardy/paper/ does_the.pdf. MR1464581 [4] P. Freyd, D. Yetter, J. Hoste, W. B. R. Lickorish, K. Millett, and A. Ocneanu, A new polynomial invariant of knots and links, Bull. Amer. Math. Soc. (N.S.) 12 (1985), no. 2, 239–246, DOI 10.1090/S0273-0979-1985-15361-3. MR776477 [5] A. Jackson and L. Traynor, Interview with Joan Birman, Notices Amer. Math. Soc. 54 (2007), no. 1, 20–29. http://www.ams.org/notices/200701/fea-birman.pdf. MR2275922 [6] V. F. R. Jones, Index for subfactors, Invent. Math. 72 (1983), no. 1, 1– 25, DOI 10.1007/BF01389127. http://link.springer.com/article/10.1007%2FBF01389127. MR696688 [7] V. F. R. Jones, A polynomial invariant for knots via von Neumann algebras, Bull. Amer. Math. Soc. (N.S.) 12 (1985), no. 1, 103–111, DOI 10.1090/S0273-0979-1985-15304-2. http:// www.ams.org/journals/bull/1985-12-01/S0273-0979-1985-15304-2/. MR766964 [8] The Knot Atlas, The Rolfsen Knot Table, http://katlas.org/wiki/The Rolfsen Knot Table [9] MathOverflow, Why should I care about the Jones polynomial?, https://mathoverflow.net/ questions/304486/why-should-i-care-about-the-jones-polynomial. [10] E. Witten, Jones polynomial, https://www.ias.edu/ideas/2011/witten-knots-quantumtheory.


Sudokus and Look and Say

Introduction

Long ago movie theaters had double features, at which you could see two films for the price of one. The Astor Theatre in Melbourne opened in 1936. It is one of the few places in the world where one can still catch a double feature. In honor of its 50th anniversary, we present a mathematical double feature: two "recreational" math topics for the price of one! Almost everyone has heard about Sudokus. Their rise to popularity began in 1986 with the puzzle company Nikoli in Japan. Since then, they have become so ubiquitous that they now share space with crossword puzzles in newspapers and airline magazines. One is presented with a partially filled 9 × 9 grid, which is subdivided into nine blocks of size 3 × 3. The goal is to fill in the empty boxes with digits in such a way that each row and column is a permutation of 1, 2, . . . , 9. Moreover, each block must contain each of 1, 2, . . . , 9 exactly once. Figure 1 is a good example; see Figure 2 in the comments for the solution. Sudokus involve a lot of terrific mathematics. The first natural question to ask is how many distinct Sudoku puzzles there are. For example, if we switch all 1's and 9's, we obtain a puzzle that looks different, but that is fundamentally the same. There are other transformations that can be performed: rotate the puzzle











Figure 1. A Sudoku challenge.



by 90 degrees, reflect across the diagonal, and so forth. Up to symmetries, there are 5,472,730,538 essentially different completed grids [8]. An implicit rule, observed by Sudoku puzzle creators, is that there must be exactly one solution to each puzzle. What is the minimal number of clues that must be given in order to uniquely determine how a Sudoku grid is filled? The answer, 17, was obtained in 2012 by Gary McGuire, Bastian Tugemann, and Gilles Civario, who wrote [5]:

The Sudoku minimum number of clues problem is the following question: what is the smallest number of clues that a Sudoku puzzle can have (and lead to a unique solution)? For several years it had been conjectured that the answer is 17. We have performed an exhaustive computer search for 16-clue Sudoku puzzles, and did not find any, thus proving that the answer is indeed 17. In this article we describe our method and the actual search. As a part of this project we developed a novel way for enumerating hitting sets. The hitting set problem is computationally hard; it is one of Richard Karp's 21 classic NP-complete problems. A standard backtracking algorithm for finding hitting sets would not be fast enough to search for a 16-clue Sudoku puzzle exhaustively, even at today's supercomputer speeds. To make an exhaustive search possible, we designed an algorithm that allowed us to efficiently enumerate hitting sets of a suitable size.
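For readers who want to experiment, here is a minimal backtracking solver (a sketch of ours; it has nothing to do with the hitting-set enumeration just described). The test instance is hypothetical, built by blanking cells of a standard valid grid rather than taken from Figure 1:

```python
# A minimal backtracking Sudoku solver.  A grid is a list of 81 digits,
# with 0 marking an empty cell.

def candidates(grid, pos):
    """Digits that can legally go in the given empty cell."""
    r, c = divmod(pos, 9)
    used = set(grid[9 * r:9 * r + 9])                      # row
    used.update(grid[c + 9 * i] for i in range(9))         # column
    br, bc = 3 * (r // 3), 3 * (c // 3)                    # 3x3 block corner
    used.update(grid[9 * (br + i) + bc + j]
                for i in range(3) for j in range(3))
    return [d for d in range(1, 10) if d not in used]

def solve(grid):
    """Fill empty cells in place; return the grid, or None if unsolvable."""
    if 0 not in grid:
        return grid
    pos = grid.index(0)
    for d in candidates(grid, pos):
        grid[pos] = d
        if solve(grid) is not None:
            return grid
    grid[pos] = 0                                          # backtrack
    return None

# Hypothetical test instance: a completed grid from a standard shifting
# pattern, with every other cell blanked out.
full = [(3 * (r % 3) + r // 3 + c) % 9 + 1
        for r in range(9) for c in range(9)]
puzzle = [0 if i % 2 == 0 else v for i, v in enumerate(full)]
solved = solve(puzzle)

digits = set(range(1, 10))
assert solved is not None
assert all(set(solved[9 * r:9 * r + 9]) == digits for r in range(9))
assert all({solved[c + 9 * i] for i in range(9)} == digits for c in range(9))
```

A solver like this finds *a* solution quickly but says nothing about uniqueness; checking uniqueness (as puzzle creators must) requires exhausting the search tree.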

Our example (Figure 1) has 30 clues and is therefore much simpler than the worst case scenario: a Sudoku with only 17 clues. For more information about Sudoku, see [4, 8, 10]. For our second feature, consider the famous see and say sequence (or look and say sequence) introduced by John Horton Conway (1937– ) in 1986. The first few terms are
$$1,\ 11,\ 21,\ 1211,\ 111221,\ 312211,\ 13112221,\ 1113213211. \tag{1986.1}$$
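The generating rule is easy to mechanize; this short sketch (our code, not the authors') reproduces the terms listed above:

```python
# Generate the look-and-say sequence: each term "reads off" the runs of
# the previous term, e.g. 1211 -> "one 1, one 2, two 1s" -> 111221.
from itertools import groupby

def next_term(s):
    return "".join(str(len(list(run))) + digit for digit, run in groupby(s))

def look_and_say(start="1", n=8):
    terms, s = [], start
    for _ in range(n):
        terms.append(s)
        s = next_term(s)
    return terms

print(look_and_say())
# ['1', '11', '21', '1211', '111221', '312211', '13112221', '1113213211']

# Starting from 1, only the digits 1, 2, and 3 ever appear.
assert all(set(t) <= set("123") for t in look_and_say(n=30))
```

The closing assertion checks, for thirty terms, the digit claim posed as an exercise below; it is evidence, not a proof.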


The pattern is not immediately obvious because we are used to looking for patterns that arise from mathematical processes. However, (1986.1) is generated linguistically. It is created by the process suggested by its name. The first number is “one 1”, so the second number is 11. The second number is “two 1’s”, so the third number is 21, and so on. Can you show that no digit other than 1, 2, or 3 appears in the sequence? Conway and his colleagues proved a number of remarkable facts about the sequence (1986.1). The following is from the abstract of a talk on the subject given by Alex Kontorovich at Columbia on March 23, 2004: He [Conway] found that the sequence decomposed into certain recurring strings. Categorizing these 92 strings and labeling them by the atoms of the periodic table (from Hydrogen to Uranium), Conway was able to prove that the asymptotic length of the sequence grows exponentially, where the growth factor (now known as Conway’s constant) is found by computing the largest eigenvalue of a 92 × 92 transition matrix. Even more remarkable is the Cosmological Theorem, which



states that regardless of the starting string, every Look and Say sequence will eventually decay into a compound of these 92 atoms, in a bounded number of steps. Conway writes that, although two independent proofs of the Cosmological Theorem were verified, they were lost in writing! It wasn’t until a decade later that Doron Zeilberger’s paper (coauthored with his computer, Shalosh B. Ekhad) gave a tangible proof of the theorem. We will discuss this weird and wonderful chemistry, and some philosophical consequences. The only prerequisite is basic linear algebra.

Many variants of Conway's sequence have been analyzed. Some use different starting numbers, others use binary, and still others count the total number of digits instead of the numbers of digits in blocks. See [1–3, 9] for more information about the "look and say" sequence and its variations.

Centennial Problem 1986
Proposed by Steven J. Miller and Samuel Tripp, Williams College.

Instead of a 9 × 9 Sudoku, one can consider $n^2 \times n^2$ Sudoku puzzles. How does the minimum number of clues required to impose a unique solution grow with n? Can you find any lower bounds? Any upper bounds? Let us now consider variants of the look and say sequence.
(a) What if we say things backwards? For example, instead of saying "two three" for 33 we say "three two"? There is no difference for 1, 22, or 333, but there is a difference for 33. If we start with 1, then the first few terms of the sequence are 1, 11, 12, 1121, 122111, 112213, 12221131. Each of these is the reverse of the corresponding term in the original sequence (1986.1). Does this pattern hold forever?
(b) What if whenever we have just one of a number, we just write that number? In this case, if we start with 1, then the sequence produced is 1, 1, 1, . . .. Starting with 11 yields the sequence 11, 21, 21, 21, . . .. Starting with 112 yields 112, 212, 212, 212, . . .. If we start with a finite string composed of 1's, 2's, and 3's, does the sequence produced eventually stabilize?
(c) Consider 3-digit substrings of the sequence (1986.1). Prove that 333 will never be found as a 3-digit substring of any term. Find three other such 3-digit substrings that never appear.
(d) Show that if d ≥ 4 does not occur in the first two terms of the sequence, then d never occurs.
There are of course many other problems you could study; see [7].

1986: Comments

An algorithm for Sudokus. The Sudoku-solving approach that we suggest below is not the fastest, but it connects to our 1947 entry on linear programming.
See [6] and the references therein for more information about algorithms and linear programming.



Let $X = [x_{i,j}]_{i,j=1}^{9}$ be the 9 × 9 matrix that represents the unique solution to the Sudoku puzzle. Some of the $x_{i,j}$ (hopefully at least 17 of them!) are given and we must find the rest. We have the conditions that in each row, each column, and each of the nine 3 × 3 blocks, each digit 1, 2, . . . , 9 appears exactly once. Linear programming can solve problems such as these, although the additional restriction that our entries are integers makes things more difficult. A lot of packages exist for solving binary integer programming, which requires that the variables only assume the values 0 or 1. We can modify our approach by choosing variables $x_{i,j,d}$ for $1 \le i, j, d \le 9$ such that
$$x_{i,j,d} = \begin{cases} 1 & \text{if } x_{i,j} = d, \\ 0 & \text{otherwise.} \end{cases}$$
How many constraints on the $x_{i,j,d}$ do we have? If I is the set of locations for which we are given initial values, then we have |I| conditions. However, this is dwarfed by what remains. Each row, column, and 3 × 3 block has exactly one of each of the nine digits. The fact that we require exactly one 5 in the first row yields the constraint
$$x_{1,1,5} + x_{1,2,5} + \cdots + x_{1,9,5} = 1.$$
This yields 81 constraints for the rows, 81 for the columns, and 81 for the blocks (some may be redundant or unnecessary due to the placement of the initial values). So we have about 240 constraints, give or take a few. However, we do not want the nonzero values to correspond to the same cell and hence we add 81 more constraints
$$\sum_{d=1}^{9} x_{i,j,d} = 1, \qquad 1 \le i, j \le 9,$$
which gives us around 320 constraints. We can reduce the number of constraints by instead replacing {1, 2, . . . , 9} with $S = \{1, 10, 100, \ldots, 10^8\}$. As before, the constraint
$$\sum_{d \in S} x_{i,j,d} = 1$$
ensures that exactly one of the $x_{i,j,d}$ is nonzero; we need 81 such constraints to make sure we choose exactly one element of S for each grid location. The constraint on the jth row is now
$$\sum_{d \in S} \sum_{i=1}^{9} d \cdot x_{i,j,d} = 111{,}111{,}111.$$
Our choice of S ensures that the only way a row can sum to 111,111,111 is to have exactly one element in the jth row that equals 1, exactly one that equals 10, and so forth. This reduces the number of constraints by a huge amount, leaving us around 100 constraints to contend with.
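The digit-sum trick can be checked by brute force. This quick verification (ours, not part of the original) confirms that among all multisets of nine values drawn from S, the only one summing to 111,111,111 uses each power of ten exactly once:

```python
# Brute-force check of the digit-sum trick: the only multiset of nine
# values from S = {1, 10, ..., 10^8} whose sum is 111,111,111 is S
# itself.  (With at most 9 copies of each power, there are no carries,
# so each digit of the sum counts the copies of that power.)
from itertools import combinations_with_replacement

S = [10 ** k for k in range(9)]
hits = [m for m in combinations_with_replacement(S, 9)
        if sum(m) == 111_111_111]

assert hits == [tuple(S)]
print(len(hits))  # 1
```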


2 6 4 3 1 7 9 5 8
3 5 9 2 8 6 7 4 1
8 1 7 4 5 9 3 2 6
7 4 1 6 2 3 8 9 5
6 8 2 7 9 5 1 3 4
9 3 5 1 4 8 6 7 2
1 7 8 5 3 2 4 6 9
4 2 3 9 6 1 5 8 7
5 9 6 8 7 4 2 1 3

Figure 2. The answer to the Sudoku challenge.

Bibliography
[1] J. H. Conway, The weird and wonderful chemistry of audioactive decay, Eureka 46 (1986), 5–18. http://graphics8.nytimes.com/packages/pdf/crossword/GENIUS AT PLAY Eureka Article.pdf.
[2] S. B. Ekhad and D. Zeilberger, Proof of Conway's lost cosmological theorem, Electron. Res. Announc. Amer. Math. Soc. 3 (1997), 78–82, DOI 10.1090/S1079-6762-97-00026-7. http://www.ams.org/journals/era/1997-03-11/S1079-6762-97-00026-7/S1079-6762-97-00026-7.pdf. MR1461977
[3] Ó. Martín, Look-and-say biochemistry: exponential RNA and multistranded DNA, Amer. Math. Monthly 113 (2006), no. 4, 289–307, DOI 10.2307/27641915. http://web.archive.org/web/20061224154744/http://www.uam.es/personal_pdi/ciencias/omartin/Biochem.PDF. MR2211756
[4] Math Explorer's Club, The Math Behind Sudoku: References, http://www.math.cornell.edu/~mec/Summer2009/Mahmood/References.html.
[5] G. McGuire, B. Tugemann, and G. Civario, There is no 16-clue Sudoku: solving the Sudoku minimum number of clues problem via hitting set enumeration, Exp. Math. 23 (2014), no. 2, 190–217, DOI 10.1080/10586458.2013.870056. http://arxiv.org/abs/1201.0749. MR3223774
[6] S. J. Miller, Mathematics of optimization: how to do things faster, Pure and Applied Undergraduate Texts, vol. 30, American Mathematical Society, Providence, RI, 2017. MR3729274
[7] C. Rivera, Puzzle 657: Look and say sequence, http://www.primepuzzles.net/puzzles/puzz_657.htm.
[8] E. Russell and F. Jarvis, There are 5472730538 essentially different Sudoku grids. . . and the Sudoku symmetry group, Mathematical Spectrum 39 (2006), 54–58.
[9] Wikipedia, Look and Say, http://en.wikipedia.org/wiki/Look-and-say_sequence.
[10] Wikipedia, Sudoku, http://en.wikipedia.org/wiki/Sudoku.


Primes, the Zeta Function, Randomness, and Physics

Introduction

In the 1942 entry, we saw that the Riemann zeta function
$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}$$
can be analytically continued from the half-plane Re s > 1 to $\mathbb{C} \setminus \{1\}$, with a simple pole at s = 1 and with zeros at the negative even integers −2, −4, . . .. The nontrivial zeros of the zeta function lie in the critical strip 0 < Re s < 1; the Riemann hypothesis asserts that these all lie on the vertical line Re s = 1/2 [3]. The Euler product formula (1933.3) suggests a profound relationship between the zeta function and the prime numbers. We suggested in the 1939 entry that the location of the nontrivial zeros determines the large-scale behavior of the primes. This profound link between the continuous (complex analysis) and discrete (prime numbers) has long fascinated mathematicians. The primes dance to the tune played by the zeros of an analytic function! The classical methods of analytic number theory have not yet produced a proof of the Riemann hypothesis. There is a general opinion among experts that a new approach is needed. One idea that has spurred a huge amount of research in the last several decades is the Hilbert–Pólya conjecture, which says that the Riemann hypothesis is true because there is an unbounded selfadjoint operator H on some Hilbert space so that the eigenvalues of
$$\tfrac{1}{2} I + iH$$
are the nontrivial zeros of the zeta function (the eigenvalues of a selfadjoint operator are real). Moreover, some expect that H is the Schrödinger operator (see the 1925 entry) corresponding to some quantum system. Although the conjecture first appeared in print in 1973 [8], it was originally proposed by George Pólya sometime during 1912–1914. Hilbert's role in the conjecture is less clear:

David Hilbert did not work in the central areas of analytic number theory, but his name has become known for the Hilbert–Pólya conjecture for reasons that are anecdotal. [14]

In the early 1980s, Andrew Odlyzko investigated the provenance of the conjecture. His correspondence with Pólya and Olga Taussky-Todd (1906–1995), who worked with Hilbert in Göttingen, makes an interesting read [10].



The Hilbert–Pólya conjecture suggests a connection with random matrix theory (see the 1928 entry). While random matrix theory began in the 1920s and was known to be relevant to nuclear physics since the 1950s, it was not until a fortuitous meeting at the Institute for Advanced Study in the early 1970s that connections to number theory emerged. The mathematical physicist Freeman Dyson was talking with Hugh Montgomery and inquired about his recent work. Montgomery was looking at the pair correlation of zeros of the Riemann zeta function, and when he showed the sine kernel answer he found, Dyson remarked that one sees similar behavior in nuclear physics and random matrix theory. Thus began numerous productive conversations between the two communities. The connection with random matrix theory suggests that we should approach the prime numbers from a probabilistic viewpoint (see the comments below for an explanation of the Cramér random model of the prime numbers). The sieve of Eratosthenes is an elementary method for producing every prime number up to a given threshold. For example, suppose that we wish to find all of the primes at most 100. First cross out every multiple of 2, other than 2 itself. Then cross out every multiple of 3, except for 3. The number 4 has already been crossed out, so we ignore it. We proceed to cross out every multiple of 5 and so forth. What remains when the procedure terminates is a list of the primes below 100: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97. Although the sieve is agonizingly slow and unsuitable for practical use, it does provide a deterministic method to produce all prime numbers in a given range. In order to understand the extent to which conjectures such as the Riemann hypothesis or the twin prime conjecture are outcomes of general sieving procedures, David Hawkins proposed a probabilistic version of the sieve of Eratosthenes [4, 5].
Start with the list of integers greater than 1, and perform an infinite series of passes through the list; each pass produces what is called a Hawkins prime. On the first pass, we identify 2 as the first element in the list. We call $p_1 = 2$ the first Hawkins prime. Then go through the remainder of the list and cross out each element with probability $1/p_1 = 1/2$. On the kth pass, we look for the first element in the list that is greater than $p_{k-1}$ and that has not been crossed out. We declare it to be $p_k$ and cross out every subsequent integer with probability $1/p_k$. For example, 3 might be crossed out on the first pass while 4 is not. In this case $p_2 = 4$ is the second Hawkins prime! The Hawkins primes do not share all of the properties of ordinary primes; they only provide a rough model for the large-scale distribution of the primes.

Centennial Problem 1987
Proposed by Andrew Odlyzko, University of Minnesota.

The Hawkins sieve can be simulated on a computer and studied. Many properties of primes, both proven and conjectured, can be shown to hold with probability one for Hawkins primes. Can one come up with other random sieves that can be analyzed rigorously and produce numbers that more closely resemble ordinary primes?
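The Hawkins sieve is straightforward to simulate. Here is a sketch (function names and the seed are our own choices) implementing the passes described above:

```python
# Simulate the Hawkins random sieve: each pass fixes the next "Hawkins
# prime" p (the smallest survivor) and then strikes every later
# surviving integer independently with probability 1/p.
import random

def hawkins_primes(limit, seed=None):
    rng = random.Random(seed)
    survivors = list(range(2, limit))
    primes = []
    while survivors:
        p = survivors.pop(0)        # smallest survivor is the next prime
        primes.append(p)
        survivors = [n for n in survivors if rng.random() >= 1 / p]
    return primes

primes = hawkins_primes(1000, seed=1987)
print(primes[:10])

# Structural properties: the first Hawkins prime is always 2, and the
# sequence is strictly increasing.
assert primes[0] == 2
assert all(a < b for a, b in zip(primes, primes[1:]))
```

Running this a few times with different seeds shows how the Hawkins primes fluctuate around the true primes while matching their rough density.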



1987: Comments

The explicit formula. In our previous entries on the Riemann zeta function and the Riemann hypothesis, we alluded to the connection between the location of the nontrivial zeros of the zeta function and the distribution of the prime numbers. We address that issue here. Let
$$\psi(x) = \sum_{p^k \le x} \log p = \sum_{p \le x} \log p + \sum_{k=2}^{\infty} \sum_{p^k \le x} \log p \tag{1987.1}$$
and note that the main term in ψ(x) comes from the first summand. Believe it or not, the log p appears throughout the preceding for convenience. The reason has to do with the Euler product representation of the zeta function and the log p terms that arise upon taking logarithmic derivatives. Indeed, we have
$$-\frac{\zeta'(s)}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s},$$
in which
$$\Lambda(n) = \begin{cases} \log p & \text{if } n = p^k \text{ for some prime } p, \\ 0 & \text{otherwise} \end{cases}$$
is the von Mangoldt function, named after Hans Carl Friedrich von Mangoldt (1854–1925). This computation is the first step in many delicate contour integrations; see [7, Rem. 2.3.21 & Ch. 3] for details. Let ρ denote a typical zero of ζ(s) in the critical strip. Then 0 < Re ρ < 1; the Riemann hypothesis is the statement that all such ρ have Re ρ = 1/2. If x is not a prime power, then a hefty dose of complex analysis yields an explicit formula that relates the sum (1987.1) over the prime numbers to a sum over the zeros of ζ(s):
$$\psi(x) = x - \sum_{\rho} \frac{x^{\rho}}{\rho} - \frac{\zeta'(0)}{\zeta(0)} - \frac{1}{2} \log\left(1 - \frac{1}{x^2}\right).$$
There is a small technicality here. If $\zeta(\rho) = 0$, then $\zeta(\overline{\rho}) = 0$. In order to have the preceding sum converge, we group the terms corresponding to ρ and $\overline{\rho}$ together. The Riemann hypothesis implies that
$$|\psi(x) - x| \le \frac{1}{8\pi} \sqrt{x} \, \log^2 x, \qquad x \ge 74. \tag{1987.2}$$
The square root comes from the assumption that $|x^{\rho}| = x^{1/2}$; the extra logarithm appears because of technical reasons. Through partial summation, one can use (1987.2) to conclude that there is a constant C such that
$$|\pi(x) - \operatorname{Li}(x)| \le C x^{1/2} \log x,$$
in which π(x) denotes the number of primes at most x and
$$\operatorname{Li}(x) = \int_2^x \frac{dt}{\log t} \sim \frac{x}{\log x}$$
denotes the offset logarithmic integral. These arguments can be reversed: if π(x) is sufficiently close to Li(x), then the Riemann hypothesis is true. One shows that the existence of a zero with real



part greater than 1/2 leads to a violation of the proposed bound on |π(x) − Li(x)| (if ζ(ρ) = 0, then ζ(1 − ρ) = 0; hence we may assume an exception to the Riemann hypothesis has real part greater than 1/2).

How big is the nth prime? We have seen that the zeros of the Riemann zeta function govern the large-scale distribution of the prime numbers. For example, the prime number theorem is a consequence of the fact that ζ(1 + it) ≠ 0 for all real t. This famous theorem asserts that
\[
\lim_{x\to\infty} \frac{\pi(x)}{x/\log x} = 1.
\]
Since π(p_n) = n, we substitute q = p_n, do a bit of calculus, and obtain
\begin{align*}
\lim_{n\to\infty} \frac{n\log n}{p_n}
&= \lim_{n\to\infty} \left(\frac{\pi(p_n)\log p_n}{p_n}\right)\left(\frac{\log n}{\log p_n}\right) \\
&= \lim_{n\to\infty} \frac{\log n}{\log p_n} \\
&= \lim_{q\to\infty} \frac{\log \pi(q)}{\log q} \\
&= \lim_{q\to\infty} \frac{\log\left(\frac{\pi(q)\log q}{q}\right) + \log q - \log\log q}{\log q} \\
&= \lim_{q\to\infty} \left(\frac{\log\left(\frac{\pi(q)\log q}{q}\right)}{\log q} + 1 - \frac{\log\log q}{\log q}\right) \\
&= 1.
\end{align*}
Thus, p_n is asymptotic to n log n. A more precise estimate is due to Michele Cipolla (1880–1947), who proved that
\[
n(\log n + \log\log n - 1) < p_n < n(\log n + \log\log n)
\]
for sufficiently large n [2]. In fact, he showed that
\[
p_n = n\left(\log n + \log\log n - 1 + \sum_{k=1}^{m} \frac{(-1)^{k+1} T_k(\log\log n)}{k \log^k n}\right) + O\!\left(\frac{n(\log\log n)^{m+1}}{\log^{m+1} n}\right),
\]
in which T_k is a monic polynomial of degree k with rational coefficients, the first of which are T_1(x) = x − 2 and T_2(x) = x² − 6x + 11.
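These truncations are easy to evaluate numerically. Below is a short Python sketch (the helper name is ours, and since sources differ on how the truncation parameter m is counted, we simply pass the number of T_k correction terms to keep):

```python
import math

def cipolla_estimate(n, corrections=1):
    """Truncation of Cipolla's expansion for the n-th prime p_n.
    corrections = 0 keeps n(log n + log log n - 1); corrections = 1
    adds the T_1 term; corrections = 2 adds the T_2 term as well."""
    L = math.log(n)
    LL = math.log(L)
    s = L + LL - 1
    if corrections >= 1:
        s += (LL - 2) / L                       # T_1(x) = x - 2
    if corrections >= 2:
        s -= (LL**2 - 6*LL + 11) / (2 * L**2)   # T_2(x) = x^2 - 6x + 11
    return n * s

# The ten millionth prime is 179,424,673.
print(round(cipolla_estimate(10**7, corrections=1)))  # 179464275
print(round(cipolla_estimate(10**7, corrections=2)))  # closer still
```

The first print reproduces the prediction 179,464,275 quoted below; keeping the T_2 correction as well lands within a few hundred of the true value.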

For example, Cipolla’s formula with m = 2 predicts that the ten millionth prime number is in the neighborhood of 179,464,275. The ten millionth prime number is 179,424,673. Not too shabby! The historical progression and the current state of the art on estimating p_n are discussed in [1].

The Cramér model. Based on the prime number theorem, Harald Cramér (1893–1985) proposed a simple probabilistic model of the prime numbers that often leads to decent predictions [9, 12] (see the comments for the 1975 entry for another application of heuristic reasoning to the primes). The prime number theorem tells
us that the number of primes at most x is asymptotic to x/log x or, somewhat more accurately, Li(x). Consequently, for fixed ε > 0 and large x we expect about
\[
\frac{x + \varepsilon x}{\log(x + \varepsilon x)} - \frac{x - \varepsilon x}{\log(x - \varepsilon x)} \sim \frac{2\varepsilon x}{\log x}
\]
primes in the interval [x − εx, x + εx]. Dividing by the length 2εx of the interval, it follows that the probability that a natural number in the vicinity of x is prime is roughly 1/log x. For n ≥ 2, let X_n be the random variable that is 1 with probability 1/log n and 0 otherwise. Since this is a heuristic argument that cannot yield rigorous results, we may be a little imprecise. For example, 1/log 2 > 1 and hence we should omit the prime 2 from our considerations. However, this does not come back to bite us and we ignore such issues. Let R_N = X_2 + X_3 + · · · + X_N denote the number of “random primes” at most N. What is the expected number E[R_N] of primes at most N? According to our model and the linearity of expectation, we have
\[
E[R_N] = \sum_{n=2}^{N} E[X_n] = \sum_{n=2}^{N} \frac{1}{\log n} \sim \operatorname{Li}(N) \sim \pi(N).
\]
This is not surprising, since we designed our model based on the hypothesis that there should be π(x) primes at most x: E[R_N] had better be asymptotic to π(N)! Since the random variables X_2, X_3, . . . are independent, the variance of their sum is the sum of their variances. Consequently,¹
\begin{align*}
\operatorname{Var}(R_N) = \sum_{n=2}^{N} \operatorname{Var}(X_n)
&= \sum_{n=2}^{N} \left( E[X_n^2] - E[X_n]^2 \right) \\
&= \sum_{n=2}^{N} \left( \frac{1}{\log n} - \frac{1}{\log^2 n} \right) \\
&\sim \sum_{n=2}^{N} \frac{1}{\log n} \sim \operatorname{Li}(N) \sim \frac{N}{\log N}.
\end{align*}
The standard deviation of R_N is therefore asymptotic to \(\sqrt{N/\log N}\). Thus, if the primes are “random” in the sense of the Cramér model, we should expect that π(x) behaves like Li(x), with an error on the order of x^{1/2} (ignoring constants and logarithmic factors). This is what the Riemann hypothesis predicts! Although the Cramér model does not always give exactly the right answer, it often does a decent job. It certainly beats having to prove the Riemann hypothesis.

¹We did not apply the central limit theorem because, while our random variables are independent, they are not identically distributed. One can use the Lyapunov central limit theorem in this context [13].
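The model’s prediction is easy to compare against an actual prime count; here is a small Python sketch (our own computation, not from the book):

```python
import math

def prime_count(n):
    """pi(n) via the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, int(n**0.5) + 1):
        if sieve[p]:
            sieve[p*p::p] = bytearray(len(sieve[p*p::p]))
    return sum(sieve)

N = 10**5
expected = sum(1 / math.log(n) for n in range(2, N + 1))  # E[R_N]
actual = prime_count(N)                                   # pi(N) = 9592
print(round(expected), actual)
```

The expectation lands within about half a percent of the truth, just as the model suggests it should.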



Bibliography

[1] C. Axler, New estimates for the n-th prime number, https://arxiv.org/pdf/1706.03651.pdf.
[2] M. Cipolla, La determinazione assintotica dell’nimo numero primo, Rend. Accad. Sci. Fis. Mat. Napoli 3 (1902), no. 8, 132–166.
[3] J. B. Conrey, The Riemann hypothesis, Notices Amer. Math. Soc. 50 (2003), no. 3, 341–353. http://www.ams.org/notices/200303/fea-conrey-web.pdf. MR1954010
[4] D. Hawkins, The random sieve, Math. Mag. 31 (1957/1958), 1–3, DOI 10.2307/3029322. MR0099321
[5] J. Lorch and G. Ökten, Primes and probability: the Hawkins random sieve, Math. Mag. 80 (2007), no. 2, 112–119, DOI 10.1080/0025570x.2007.11953464. http://www.cs.bsu.edu/homepages/jdlorch/mathmag116-123-lorch.pdf. MR2301878
[6] S. J. Miller, The probability lifesaver: All the tools you need to understand chance, Princeton Lifesaver Study Guide, Princeton University Press, Princeton, NJ, 2017. MR3585480
[7] S. J. Miller and R. Takloo-Bighash, An invitation to modern number theory, with a foreword by Peter Sarnak, Princeton University Press, Princeton, NJ, 2006. MR2208019
[8] H. L. Montgomery, The pair correlation of zeros of the zeta function, Analytic number theory (Proc. Sympos. Pure Math., Vol. XXIV, St. Louis Univ., St. Louis, Mo., 1972), Amer. Math. Soc., Providence, R.I., 1973, pp. 181–193. MR0337821
[9] H. L. Montgomery and K. Soundararajan, Beyond pair correlation, Paul Erdős and his mathematics, I (Budapest, 1999), Bolyai Soc. Math. Stud., vol. 11, János Bolyai Math. Soc., Budapest, 2002, pp. 507–514. MR1954710
[10] A. Odlyzko, Correspondence about the origins of the Hilbert–Polya Conjecture, http://www.dtc.umn.edu/~odlyzko/polya/index.html.
[11] A. M. Odlyzko, On the distribution of spacings between zeros of the zeta function, Math. Comp. 48 (1987), no. 177, 273–308, DOI 10.2307/2007890. http://www.ams.org/journals/mcom/1987-48-177/S0025-5718-1987-0866115-0/. MR866115
[12] T. Tao, 254A, Supplement 4: Probabilistic models and heuristics for the primes (optional), https://terrytao.wordpress.com/tag/cramers-random-model/.
[13] Wikipedia, Lyapunov CLT, https://en.wikipedia.org/wiki/Central_limit_theorem#Lyapunov_CLT.
[14] Wikipedia, Hilbert–Pólya conjecture, https://en.wikipedia.org/wiki/Hilbert-Polya_conjecture.


Mathematica

Introduction

On June 23, 1988, Mathematica 1.0 was launched. What is Mathematica?

Wolfram Mathematica (usually termed Mathematica) is a modern technical computing system spanning all areas of technical computing—including neural networks, machine learning, image processing, geometry, data science, visualizations, and others. The system is used in many technical, scientific, engineering, mathematical, and computing fields. It was conceived by Stephen Wolfram and is developed by Wolfram Research of Champaign, Illinois. [13]

That describes what Mathematica is now; it was not always so all-encompassing. It was originally focused, as its name implies, on mathematics and it still does math well. Many of the illustrations in this book were produced with Mathematica, as were many of the tables of data we have presented. For more on what Mathematica can do, check out the demonstrations page [11]. Mathematica, even from its beginnings, was capable of complex symbolic manipulations of the sort appreciated by calculus students everywhere. Indeed, the “computational knowledge engine” Wolfram Alpha, which is consulted by millions of calculus students every day, is based in part on Mathematica. For example, suppose that we wish to compute the partial fraction expansion of
\[
f(x) = \frac{1}{x^2(x-1)^3(x+1)}.
\]

This is the sort of grueling symbolic computation that is well suited for the computer. We enter

Apart[1/(x^2 (x - 1)^3 (x + 1))]

into Mathematica (perhaps using its more appealing, modern interface) and immediately receive the answer

1/(2 (-1+x)^3)-5/(4 (-1+x)^2)+17/(8 (-1+x))-1/x^2-2/x-1/(8 (1+x)).

There are, of course, more elegant ways to receive the output. However, this is what users in the late 1980s would have seen on their screens. A nifty feature of recent versions of Mathematica is the ability to get output in LaTeX (see the 1979 entry). A simple cut-and-paste from the Mathematica window provides the LaTeX
source for the answer
\[
f(x) = \frac{1}{2(x-1)^3} - \frac{5}{4(x-1)^2} + \frac{17}{8(x-1)} - \frac{1}{x^2} - \frac{2}{x} - \frac{1}{8(x+1)}.
\]
Integrals are also easily conquered (see the 1968 entry for information about the Risch algorithm for symbolic integration). The command

Integrate[Exp[-x^2], {x, -Infinity, Infinity}]

yields the answer Sqrt[Pi] and hence tells us that
\[
\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}.
\]
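The expansion can be sanity-checked without a computer algebra system by comparing both sides at a few points away from the poles; a quick Python check (our own, not from the book):

```python
def f(x):
    return 1 / (x**2 * (x - 1)**3 * (x + 1))

def partial_fractions(x):
    return (1 / (2*(x - 1)**3) - 5 / (4*(x - 1)**2) + 17 / (8*(x - 1))
            - 1 / x**2 - 2 / x - 1 / (8*(x + 1)))

# the two expressions agree at test points away from the poles x = -1, 0, 1
for x in (-0.5, 0.5, 2.0, 3.25, 10.0):
    assert abs(f(x) - partial_fractions(x)) < 1e-9 * max(1.0, abs(f(x)))
```

Numerical agreement at a handful of points is not a proof, of course, but it catches transcription errors instantly.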

Of course, Mathematica is not only the backbone of Wolfram Alpha and the hidden savior of calculus students everywhere. It has long been used for serious mathematical research. Both authors have used Mathematica computations in their own research, particularly in number theory, linear algebra, complex analysis, and statistics. Its flexibility is remarkable: It is often said that the release of Mathematica marked the beginning of modern technical computing. Ever since the 1960s individual packages had existed for specific numerical, algebraic, graphical and other tasks. But the visionary concept of Mathematica was to create once and for all a single system that could handle all the various aspects of technical computing in a coherent and unified way. The key intellectual advance that made this possible was the invention of a new kind of symbolic computer language that could for the first time manipulate the very wide range of objects involved in technical computing using only a fairly small number of basic primitives. [12]

It does not take much imagination to see how computing software could be useful in applied mathematics or statistics research. How can Mathematica be used in pure mathematics research? Suppose that we wanted to explore the prime numbers. A first step might be to examine the first 100 of them. The command Table[Prime[n], {n, 1, 100}] produces the output {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541} from which we can observe several patterns. Aside from the anomalous primes 2 and 5, each prime number appears to end in 1, 3, 7, or 9. This is explained easily enough: numbers that end in 0, 2, 4, 5, 6, 8 are divisible by 2 or by 5. Do the primes have a favorite, among the final digits 1, 3, 7, and 9? The command Table[Length[ Select[Prime[Range[1000000]], Mod[#, 10] == {1, 3, 7, 9}[[i]] &]], {i, 1, 4}]



produces

{249934, 250110, 250014, 249940}.

This tells us that among the first 1,000,000 primes, there are 249,934 that end in 1, 250,110 that end in 3, and so forth. The split looks remarkably even. Trying a few different bases would reveal similarly equitable splits. From this, it is a short step to conjecturing Dirichlet’s theorem on primes in arithmetic progressions (see the 1913 entry). A more detailed analysis could also reveal Chebyshev’s bias (there are usually slightly more primes of the form 4k + 3 than 4k + 1 up to a given threshold) [2, 7], or perhaps even the recent observation of Robert Lemke Oliver and Kannan Soundararajan (1973– ) that the primes have some definite thoughts about who they sit next to [3, 5]:

Lemke Oliver and Soundararajan saw that in the first billion primes, a 1 is followed by a 1 about 18% of the time, by a 3 or a 7 each 30% of the time, and by a 9 22% of the time. They found similar results when they started with primes that ended in 3, 7 or 9: variation, but with repeated last digits the least common. The bias persists but slowly decreases as numbers get larger. [4]
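Both experiments scale down to a laptop-friendly size. Here is a Python version over the first 10,000 primes (the counts and the repeated-digit tally are our own illustration, not figures from the book):

```python
def first_primes(count, limit=200_000):
    """Return the first `count` primes (limit must exceed the last one)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    primes = []
    for p in range(2, limit + 1):
        if sieve[p]:
            primes.append(p)
            if len(primes) == count:
                break
            sieve[p*p::p] = bytearray(len(sieve[p*p::p]))
    return primes

primes = first_primes(10_000)   # the 10,000th prime is 104,729
digits = [p % 10 for p in primes if p % 10 in (1, 3, 7, 9)]
counts = {d: digits.count(d) for d in (1, 3, 7, 9)}
# how often do consecutive primes share a last digit?
repeats = sum(a == b for a, b in zip(digits, digits[1:]))
print(counts, repeats / (len(digits) - 1))
```

The four counts come out nearly equal, while repeated last digits occur far less often than the 25% a uniform model would predict, in line with the Lemke Oliver–Soundararajan observation.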

Simply put, computational power can reveal hidden patterns in classical objects that could not be guessed at otherwise. This can lead to new conjectures about the observed behavior and eventually new theorems. See the comments for an example of this method of discovery. Computer algebra systems can be used to produce startling identities that can later be verified. This is similar to proofs by induction: you know the answer already and need only to justify it. As a great example, the Mathematica commands

Sum[Binomial[n, k]^2, {k, 0, n}]
Sum[k Binomial[n, k]^2, {k, 0, n}]
Sum[k^2 Binomial[n, k]^2, {k, 0, n}]
Sum[k^3 Binomial[n, k]^2, {k, 0, n}]

yield the outputs

Binomial[2 n, n]
1/2 n Binomial[2 n, n]
n^2 Binomial[-2 + 2 n, -1 + n]
1/2 n^2 (1 + n) Binomial[-2 + 2 n, -1 + n]

These are the identities
\begin{align*}
\sum_{k=0}^{n} \binom{n}{k}^2 &= \binom{2n}{n}, &
\sum_{k=0}^{n} k\binom{n}{k}^2 &= \frac{n}{2}\binom{2n}{n}, \\
\sum_{k=0}^{n} k^2\binom{n}{k}^2 &= n^2\binom{2n-2}{n-1}, &
\sum_{k=0}^{n} k^3\binom{n}{k}^2 &= \frac{n^2(n+1)}{2}\binom{2n-2}{n-1}.
\end{align*}



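Each identity can be confirmed for small n by brute force, independently of Mathematica; a minimal Python check (helper name ours):

```python
import math

def power_sum(n, r):
    """sum_{k=0}^{n} k^r * C(n,k)^2, with the convention 0^0 = 1."""
    return sum(k**r * math.comb(n, k)**2 for k in range(n + 1))

for n in range(1, 40):
    assert power_sum(n, 0) == math.comb(2*n, n)
    assert 2 * power_sum(n, 1) == n * math.comb(2*n, n)
    assert power_sum(n, 2) == n**2 * math.comb(2*n - 2, n - 1)
    assert 2 * power_sum(n, 3) == n**2 * (n + 1) * math.comb(2*n - 2, n - 1)
print("verified for n = 1, ..., 39")
```

Such a check is no substitute for a proof, but it is exactly the kind of evidence that makes one confident enough to attempt an inductive or combinatorial argument.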
Centennial Problem 1988
Proposed by Steven J. Miller, Williams College.

Can you prove the identities above by induction? Can you prove them combinatorially? Can you prove them using generating functions? Can you find closed forms for
\[
\sum_{k=0}^{n} k^4 \binom{n}{k}^2 \qquad\text{and}\qquad \sum_{k=0}^{n} k^5 \binom{n}{k}^2\,?
\]

1988: Comments

Wilf–Zeilberger algorithm. We have seen that computers can spit out novel identities. It would be better if they could provide humanly understandable proofs of those identities too. In 1990, Herbert Wilf and Doron Zeilberger (1950– ) came up with an algorithm to do just that [6, 8, 9].

Twin primes and their biases. Here is a true story about how a few Mathematica computations led to a new discovery. Recall that a primitive root modulo n is a generator of the multiplicative group (Z/nZ)×. For example, 2 is a primitive root modulo 5 since 2^1, 2^2, 2^3, 2^4 ≡ 2, 4, 3, 1 (mod 5), respectively. If p is prime, then a theorem of Gauss ensures that (Z/pZ)× has exactly φ(p − 1) primitive roots, in which φ denotes the Euler totient function (see the 1977 entry).

One day last year, in Professor Stephan Garcia’s Number Theory and Cryptography class, the lesson took a surprising turn. To make a point about the use of seemingly random patterns in cryptography, Garcia had just flashed onto the screen a chart of the first 100 [actually 20] prime numbers and all of their primitive roots. . . . [10]

Needless to say, the chart (Table 1) was produced by Mathematica. The command PrimitiveRootList[p] provides a list of the primitive roots of p. Looking at the chart, Elvis Kahoro ’20 noticed something interesting about pairs of primes known as “twins”—primes that differ by exactly two, such as 29 and 31 [apart from 3 and 5]. The smaller of the pair always seemed to have as many or more primitive roots than the larger of the two. He wondered if that was always true. “So I just asked what I thought was a random question,” Kahoro recalls. It was the kind of curious question he was known for asking all through his school years, sometimes with unfortunate results. “Some teachers would get mad at me for asking so many questions that led us off the topic,” he remembers.



But Garcia took the first-year student’s question seriously. And the next day, the professor called Kahoro to his office, where he’d been doing some number-crunching on his computer [with Mathematica]. “It turns out that Elvis’s conjecture is false, but in an astoundingly interesting way,” Garcia explains. “There are only two counterexamples below 10,000. And bigger number-crunching indicated that his conjecture seemed to be correct 98 percent of the time [see Figure 1].” Garcia and a frequent collaborator, Florian Luca, then found a theoretical explanation for the phenomenon, resulting in a paper titled “Primitive root bias for twin primes [1],” to be published in the journal Experimental Mathematics, with Kahoro listed as a co-author. “What I’ve taken away from this,” Kahoro says, “is never to be afraid to ask questions in class, because you never know where they’ll lead.” [10]

Table 1. Lists of primitive roots for the first 20 primes.

  p | primitive roots modulo p
  2 | 1
  3 | 2
  5 | 2, 3
  7 | 3, 5
 11 | 2, 6, 7, 8
 13 | 2, 6, 7, 11
 17 | 3, 5, 6, 7, 10, 11, 12, 14
 19 | 2, 3, 10, 13, 14, 15
 23 | 5, 7, 10, 11, 14, 15, 17, 19, 20, 21
 29 | 2, 3, 8, 10, 11, 14, 15, 18, 19, 21, 26, 27
 31 | 3, 11, 12, 13, 17, 21, 22, 24
 37 | 2, 5, 13, 15, 17, 18, 19, 20, 22, 24, 32, 35
 41 | 6, 7, 11, 12, 13, 15, 17, 19, 22, 24, 26, 28, 29, 30, 34, 35
 43 | 3, 5, 12, 18, 19, 20, 26, 28, 29, 30, 33, 34
 47 | 5, 10, 11, 13, 15, 19, 20, 22, 23, 26, 29, 30, 31, 33, 35, 38, 39, 40, 41, 43, 44, 45
 53 | 2, 3, 5, 8, 12, 14, 18, 19, 20, 21, 22, 26, 27, 31, 32, 33, 34, 35, 39, 41, 45, 48, 50, 51
 59 | 2, 6, 8, 10, 11, 13, 14, 18, 23, 24, 30, 31, 32, 33, 34, 37, 38, 39, 40, 42, 43, 44, 47, 50, 52, 54, 55, 56
 61 | 2, 6, 7, 10, 17, 18, 26, 30, 31, 35, 43, 44, 51, 54, 55, 59
 67 | 2, 7, 11, 12, 13, 18, 20, 28, 31, 32, 34, 41, 44, 46, 48, 50, 51, 57, 61, 63
 71 | 7, 11, 13, 21, 22, 28, 31, 33, 35, 42, 44, 47, 52, 53, 55, 56, 59, 61, 62, 63, 65, 67, 68, 69
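Kahoro’s conjecture can be explored in a few lines of Python as well, using Gauss’s theorem that a prime p has exactly φ(p − 1) primitive roots (the function names and the 10,000 cutoff are our own choices):

```python
def totient(n):
    """Euler's phi, via trial-division factorization."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result

def primes_below(n):
    sieve = bytearray([1]) * n
    sieve[0] = sieve[1] = 0
    for p in range(2, int(n**0.5) + 1):
        if sieve[p]:
            sieve[p*p::p] = bytearray(len(sieve[p*p::p]))
    return [p for p in range(n) if sieve[p]]

primes = set(primes_below(10_000))
twins = [(p, p + 2) for p in sorted(primes) if p + 2 in primes]
wins = sum(totient(p - 1) >= totient(q - 1) for p, q in twins)
print(wins, len(twins))  # the smaller twin "wins" for almost every pair
```

Counting φ(p − 1) against φ(p + 1) over the twin pairs below 10,000 shows the smaller twin winning the overwhelming majority of the time, exactly the bias described below.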



Figure 1. The horizontal axis denotes the number of twin primes. The vertical axis is the ratio of twin prime pairs (p, p + 2) for which p has more primitive roots than p + 2. The ratio hangs stubbornly near 98%.

For more information about twin primes, see the 1919 and 1923 entries.

Bibliography

[1] S. R. Garcia, F. Luca, and E. Kahoro, Primitive root bias for twin primes, Experimental Mathematics, in press. https://www.tandfonline.com/doi/full/10.1080/10586458.2017.1360809.
[2] A. Granville and G. Martin, Prime number races, Amer. Math. Monthly 113 (2006), no. 1, 1–33, DOI 10.2307/27641834. MR2202918
[3] E. Klarreich, Mathematicians Discover Prime Conspiracy, https://www.quantamagazine.org/mathematicians-discover-prime-conspiracy-20160313.
[4] E. Lamb, Peculiar pattern found in ‘random’ prime numbers: last digits of nearby primes have ‘anti-sameness’ bias, Nature (online), https://www.nature.com/news/peculiar-pattern-found-in-random-prime-numbers-1.19550.
[5] R. J. Lemke Oliver and K. Soundararajan, Unexpected biases in the distribution of consecutive primes, Proc. Natl. Acad. Sci. USA 113 (2016), no. 31, E4446–E4454, DOI 10.1073/pnas.1605366113. http://www.pnas.org/content/pnas/113/31/E4446.full.pdf. MR3624386
[6] P. Paule and M. Schorn, A Mathematica Version of Zeilberger’s Algorithm for Proving Binomial Coefficient Identities, J. Symbolic Computation 11 (1994), 1–25.
[7] M. Rubinstein and P. Sarnak, Chebyshev’s bias, Experiment. Math. 3 (1994), no. 3, 173–197. MR1329368
[8] H. S. Wilf, Computer programs from the book “A = B”, and related programs, https://www.math.upenn.edu/~wilf/progs.html.
[9] H. S. Wilf and D. Zeilberger, Towards computerized proofs of identities, Bull. Amer. Math. Soc. (N.S.) 23 (1990), no. 1, 77–83. https://projecteuclid.org/euclid.bams/1183555718.
[10] Staff writer, How to Advance Mathematics By Asking the Right Questions, Pomona College Magazine, Spring 2018, 20–21. http://magazine.pomona.edu/2018/spring/how-to-advance-mathematics-by-asking-the-right-questions/.
[11] Wolfram, Wolfram Demonstrations Project, http://demonstrations.wolfram.com/.
[12] Wolfram, The Mathematica Book (Mathematica 5 Documentation, 2003), http://reference.wolfram.com/legacy/v5/TheMathematicaBook/FrontMatter/0.2.1.html.
[13] Wikipedia, Wolfram Mathematica, https://en.wikipedia.org/wiki/Wolfram_Mathematica.


PROMYS

Introduction

In 1989, David Fried and Glenn H. Stevens (1953– ), graduates of Arnold Ross’s Secondary Science Training Program (see the 1957 entry), cofounded PROMYS (Program in Mathematics for Young Scientists). Since then, over 1,000 students have gone through the program. Currently about 80 high school students each year come to Boston University for six weeks of challenging mathematics. They are mentored by top graduate students and faculty drawn from all over the world. Programs like PROMYS play a key role in exciting students to pursue mathematics and teaching older students how to mentor, design classes, and develop research programs. In addition to standard classes and challenging problems, students participate in research and attend advanced lectures on topics ranging from “The Schoenflies Conjecture and Morse Theory” to “Statistical Inference and Modeling the Unseen: How Bayesian statistics powers Google’s voice search.”

The second named author spoke at PROMYS several times. In 2009, he gave a talk on heuristics and ballpark estimates. Informal argumentation is an important skill for aspiring mathematicians to develop. The centennial problem for this year concerns an application of heuristic reasoning to an old problem in number theory. The Fermat numbers are defined by
\[
F_n = 2^{2^n} + 1.
\]
The first several of these are
\[
F_0 = 3, \quad F_1 = 5, \quad F_2 = 17, \quad F_3 = 257, \quad F_4 = 65{,}537. \tag{1989.1}
\]


Notice a pattern? The first three are prime, and a little work shows that F3 and F4 are prime too. What about F5 = 4,294,967,297? Is it a Fermat prime as well? The Fermat numbers grow so rapidly that things soon get beyond the realm of computation. For example, F10 has 309 digits! Pierre de Fermat conjectured that each Fn is prime, although he was unable to prove this. What does heuristic reasoning suggest?

Centennial Problem 1989
Proposed by Steven J. Miller, Williams College.

Give a heuristic argument for or against the existence of infinitely many Fermat primes. Does your prediction agree or disagree with the numerical evidence?


1989. PROMYS

1989: Comments

Why the weird exponent? Some authors consider 2 = 2^0 + 1 a Fermat prime because of their preference for the formula 2^n + 1 [2]. However, this is not widely adhered to. Why is it that we search for primes of the form 2^{2^n} + 1 instead of 2^n + 1? We start with the identity
\[
x^n - 1 = (x - 1)(x^{n-1} + x^{n-2} + \cdots + x + 1),
\]
which can be confirmed by induction. Then replace x with x/y, multiply by y^n, and obtain
\[
x^n - y^n = (x - y)(x^{n-1} + x^{n-2}y + \cdots + xy^{n-2} + y^{n-1}).
\]
If n is odd, then we set y = −1 and get
\[
x^n + 1 = (x + 1)(x^{n-1} - x^{n-2} + \cdots - x + 1).
\]

Suppose that 2^m + 1 is prime, in which m = 2^k n and n is odd. If x = 2^{2^k}, then
\begin{align*}
2^m + 1 = 2^{2^k n} + 1 &= x^n + 1 \\
&= (x + 1)(x^{n-1} - x^{n-2} + \cdots - x + 1) \\
&= (2^{2^k} + 1)(2^{2^k(n-1)} - 2^{2^k(n-2)} + \cdots - 2^{2^k} + 1).
\end{align*}
The factor 2^{2^k} + 1 is definitely larger than 1 and it is smaller than 2^m + 1, unless n = 1 (in which case the second factor is 1). Thus, the exponent m must be a power of 2 in order for 2^m + 1 to be prime.

Fermat’s conjecture. The state of Fermat’s conjecture is so well known that we can hardly keep things a secret: Fermat was wrong. Moreover, the heuristic argument discussed below suggests that he was way off base. In 1732, Leonhard Euler disproved Fermat’s conjecture when he computed the prime factorization
\[
F_5 = 4{,}294{,}967{,}297 = 641 \times 6{,}700{,}417.
\]
Although this was an impressive computational feat at the time, a modern computer factors F_5 faster than the blink of an eye. The prime factorizations of the Fermat numbers seem to involve some large primes (this is partially explained by the Euler–Lucas theorem described at the end of the following section). For example, a few seconds on a desktop computer reveals the prime factorizations

F_6 = 274177 × 67280421310721,
F_7 = 59649589127497217 × 5704689200685129054721,
F_8 = 1238926361552897 × 93461639715357977769163558199606896584051237541638188580280321.

Prime factorizations are known for only a few more Fermat numbers. No Fermat primes besides the original five (1989.1) have been found.
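With exact integer arithmetic, the factorizations above, and the shape of the factors promised by the Euler–Lucas theorem mentioned parenthetically, are trivial to confirm; a short Python check (our own, not from the book):

```python
def fermat(n):
    """The n-th Fermat number 2^(2^n) + 1."""
    return 2**(2**n) + 1

# Euler's factorization of F_5 and the factorizations quoted above
assert 641 * 6700417 == fermat(5)
assert 274177 * 67280421310721 == fermat(6)
assert 59649589127497217 * 5704689200685129054721 == fermat(7)

# every prime factor of F_n has the form k * 2^(n+2) + 1 (Euler-Lucas)
for n, factor in [(5, 641), (5, 6700417), (6, 274177), (6, 67280421310721)]:
    assert (factor - 1) % 2**(n + 2) == 0
print("all checks pass")
```

For instance, 641 = 5 · 2^7 + 1, consistent with the Euler–Lucas constraint for F_5.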



Euclid’s theorem revisited. We can use the Fermat numbers to provide another proof of the infinitude of the primes [1, Ch. 1]. Begin by observing that
\begin{align*}
F_n - 2 &= (2^{2^n} + 1) - 2 = 2^{2^n} - 1 \\
&= (2^{2^{n-1}} - 1)(2^{2^{n-1}} + 1) \\
&= (2^{2^{n-1}} - 1)F_{n-1} \\
&= (2^{2^{n-2}} - 1)F_{n-2}F_{n-1} \\
&\;\;\vdots \\
&= F_0 F_1 \cdots F_{n-1}.
\end{align*}
In light of this, F_m divides F_n − 2 whenever m < n. Consequently, any common divisor of F_m and F_n divides
\[
F_n - \underbrace{F_0 F_1 \cdots F_{n-1}}_{\text{divisible by } F_m} = 2.
\]
Since Fermat numbers are odd, the preceding tells us that gcd(F_m, F_n) = 1. Thus, the Fermat numbers F_0, F_1, F_2, . . . are pairwise relatively prime and hence their prime factorizations yield infinitely many distinct primes. In fact, this proves that there are at least n primes at most 2^{2^n} + 1. The ordered list of prime factors of the Fermat numbers begins with

3, 5, 17, 257, 641, 65537, 114689, 274177, 319489, 974849, 2424833, 6700417, 13631489, 26017793, 45592577, 63766529, 167772161, 825753601, 1214251009, 6487031809, 70525124609, 190274191361, 646730219521, 2710954639361, 2748779069441, 4485296422913, 6597069766657,

according to [3]. How can we be sure of this? What if a large Fermat number is divisible by a small prime? A result of Euler, later improved by Édouard Lucas (1842–1891), asserts that every prime factor of F_n is of the form k · 2^{n+2} + 1. Thus, the size of the smallest prime factor of F_n tends to increase rapidly with n. For example, we can be sure that no Fermat number has a prime factor strictly between 257 and 641 since we have the prime factorizations of all F_n for n = 0, 1, 2, . . . , 11.

Constructible polygons. The ancient Greeks developed methods to construct regular (equilateral and equiangular) n-gons with straightedge and compass for any n ≥ 3 of the form 2^i 3^j 5^k, in which i ≥ 0 and j, k ∈ {0, 1}. Thus, they could construct regular n-gons for n = 3, 4, 5, 6, 8, 10, 12, 15, 16, . . . . Can all regular n-gons be constructed by straightedge and compass? This question vexed mathematicians for two thousand years. In 1796, at the age of nineteen, Carl Friedrich Gauss shocked the mathematical world when he proved that the regular 17-gon was constructible. Folklore holds that Gauss wanted the heptadecagon inscribed on his tombstone, although this was regrettably not carried out. Gauss provided the first new constructible regular n-gon since ancient times, a remarkable feat. Moreover, he also provided sufficient



conditions for the constructibility of a regular n-gon. The constructibility of the heptadecagon boils down to the fact that
\[
16\cos\frac{2\pi}{17} = -1 + \sqrt{17} + \sqrt{34 - 2\sqrt{17}} + 2\sqrt{17 + 3\sqrt{17} - \sqrt{170 + 38\sqrt{17}}}
\]
is expressible in terms of integers and square roots [7]. From a Cartesian perspective, straightedge and compass constructions involve finding the intersections of lines or circles with other lines or circles in R². Thus, one only considers systems of equations of degree one or two; this leads to expressions that involve rational numbers and nested square roots. In 1837, Pierre Wantzel (1814–1848) completed the proof of what is now known as the Gauss–Wantzel theorem. It states that for n ≥ 3, a regular n-gon is constructible with straightedge and compass if and only if n = 2^k p_1 p_2 · · · p_r, in which k ≥ 0 and p_1, p_2, . . . , p_r are distinct Fermat primes (either type of factor may be absent). Consequently, the regular 7- and 9-gons are nonconstructible, whereas the regular 10- and 17-gons are constructible; see Figure 1.

Heuristic argument. The prime number theorem asserts that the number of primes at most x is roughly x/log x. Thus, the density of primes at most x is about 1/log x. We therefore model the primes as a random process, in which the probability that a natural number n is prime is 1/log n (see the comments for the 1987 entry). Consider the random variable
\[
X_n = \begin{cases} 1 & \text{if } n \text{ is prime}, \\ 0 & \text{otherwise}. \end{cases}
\]
The expected number of Fermat primes is
\[
E[X_{F_0} + \cdots + X_{F_N}] = E[X_{F_0}] + \cdots + E[X_{F_N}]
\]
by the linearity of expectation. Since
\[
E[X_{F_n}] = \frac{1}{\log F_n} = \frac{1}{\log(2^{2^n} + 1)} \le \frac{1}{2^n \log 2},
\]
the expected number of Fermat primes is at most
\[
\frac{1}{\log 2} \sum_{n=0}^{\infty} \frac{1}{2^n} = \frac{2}{\log 2} \approx 2.88 < 3. \tag{1989.3}
\]
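The geometric bound is generous; summing the actual terms 1/log F_n gives an even smaller expected count. A quick Python computation (our own):

```python
import math

bound = 2 / math.log(2)  # the geometric-series bound, about 2.885
expected = sum(1 / math.log(2**(2**n) + 1) for n in range(12))
print(expected, bound)   # the truncated sum is about 2.24; the tail is negligible
```

Twelve terms suffice here because the remaining terms are dominated by a geometric tail smaller than 0.001.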


Thus, we expect that there are only finitely many Fermat primes. A more sophisticated argument comes to the same conclusion [2]. Our estimate is reasonably close to the presently observed number (five). What causes the discrepancy? First of all, this is a heuristic argument that proves nothing: our model could be completely wrong. However, well-composed heuristic arguments often do point us in the right direction (see the 1987 entry). A more likely culprit is the bias introduced by small primes. The largest contributions to the sum (1989.3) come from the smallest Fermat numbers. In this range, the large-scale predictions afforded by the prime number theorem are swamped by small-scale fluctuations. For example, the prime number theorem predicts that there are 2/log 2 ≈ 2.88 primes at most 2, which is absurd.
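Returning to the Gauss–Wantzel theorem above, the constructibility criterion is easy to mechanize using the five known Fermat primes; a Python sketch (names ours):

```python
KNOWN_FERMAT_PRIMES = (3, 5, 17, 257, 65537)  # the only Fermat primes known

def constructible(n):
    """Gauss-Wantzel: for n >= 3, the regular n-gon is constructible
    iff n = 2^k times a product of distinct Fermat primes."""
    while n % 2 == 0:
        n //= 2
    for p in KNOWN_FERMAT_PRIMES:
        if n % p == 0:
            n //= p
            if n % p == 0:      # a repeated Fermat prime is not allowed
                return False
    return n == 1

print([n for n in range(3, 21) if constructible(n)])
# [3, 4, 5, 6, 8, 10, 12, 15, 16, 17, 20]
```

The output reproduces the Greeks’ list together with Gauss’s 17-gon, and correctly rejects 7, 9, 11, 13, 14, 18, and 19.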


Figure 1. (a) n = 7 (nonconstructible); (b) n = 9 (nonconstructible); (c) n = 10 (constructible); (d) n = 17 (constructible). The regular n-gon with n ≥ 3 is constructible with straightedge and compass if and only if n = 2^k p_1 p_2 · · · p_r, in which k ≥ 0 and the p_1, p_2, . . . , p_r are distinct Fermat primes (either type of factor may be absent).

Bibliography

[1] M. Aigner and G. M. Ziegler, Proofs from The Book, 4th ed., Springer-Verlag, Berlin, 2010. MR2569612
[2] K. D. Boklan and J. H. Conway, Expect at most one billionth of a new Fermat prime!, Math. Intelligencer 39 (2017), no. 1, 3–5, DOI 10.1007/s00283-016-9644-3. https://arxiv.org/pdf/1605.01371.pdf. MR3620166
[3] The On-Line Encyclopedia of Integer Sequences, A023394 (Prime factors of Fermat numbers), http://oeis.org/A023394.
[4] S. J. Miller and R. Takloo-Bighash, An invitation to modern number theory, with a foreword by Peter Sarnak, Princeton University Press, Princeton, NJ, 2006. MR2208019
[5] PROMYS, PROMYS: Program in Mathematics for Young Scientists, http://www.promys.org/.
[6] Wikipedia, Constructible polygon, https://en.wikipedia.org/wiki/Constructible_polygon.
[7] Wikipedia, Heptadecagon, https://en.wikipedia.org/wiki/Heptadecagon.


The Monty Hall Problem

Introduction

Although it rose to national prominence after Marilyn vos Savant (1946– ) presented it in a 1990 Parade magazine column [3], the famed Monty Hall problem first appeared in 1975 when it was posed by Steve Selvin (1941– ) in The American Statistician [4]. His presentation also explains the origin of the problem’s name:

It is “Let’s Make a Deal”—a famous TV show starring Monte Hall.¹
Monte Hall: One of the three boxes labeled A, B, and C contains the keys to that new 1975 Lincoln Continental. The other two are empty. If you choose the box containing the keys, you win the car.
Contestant: Gasp!
Monte Hall: Select one of these boxes.
Contestant: I’ll take box B.
Monte Hall: Now box A and box C are on the table and here is box B (contestant grips box B tightly). It is possible the car keys are in that box! I’ll give you $100 for the box.
Contestant: No, thank you.
Monte Hall: How about $200?
Contestant: No!
Audience: No!!
Monte Hall: Remember that the probability of your box containing the keys to the car is 1/3 and the probability of your box being empty is 2/3. I’ll give you $500.
Audience: No!!
Contestant: No, I think I’ll keep this box.
Monte Hall: I’ll do you a favor and open one of the remaining boxes on the table (he opens box A). It’s empty! (Audience: applause). Now either box C or your box B contains the car keys. Since there are two boxes left, the probability of your box containing the keys is now 1/2. I’ll give you $1000 cash for your box.
WAIT!!!! Is Monte right? The contestant knows that at least one of the boxes on the table is empty. He now knows it was box A. Does this knowledge change his probability of having the box containing the keys from 1/3 to 1/2? One of the boxes on the table has to be empty. Has Monte done the contestant a favor by showing him which of the two boxes was empty? Is the probability of winning the car 1/2 or 1/3?

¹Monty Hall was the stage name of Monte Halparin (1921–2017). Although Selvin’s puzzle is universally referred to as the “Monty Hall problem,” it is interesting to note that Selvin (perhaps unintentionally) spelled the stage name “Monty” as “Monte,” which is the host’s actual first name.

Figure 1. The contestant is presented with three doors. Behind one of them is a valuable prize. The other two doors conceal nothing of value. The contestant selects a door, say 1. The host opens another door, say 2, and shows that it does not conceal the prize. Thus, the prize is either behind door 1 or 3. Is the contestant better off switching from door 1 to door 3?

In most contemporary formulations of the problem, the contestant chooses one of three doors. One door conceals a valuable prize. Behind the other two doors are goats, which are presumed to be worthless; see Figure 1. The host opens one of the other doors and reveals a goat. He gives the contestant the chance to switch to the remaining door. Should the contestant switch? How can switching doors possibly help? Each door initially has a 1/3 chance of holding the prize. After the host opens one of the doors, we know that one of the two remaining doors conceals the prize. Thus, the chance that either holds the prize is 1/2. Is this correct? See the comments below for the answer!

Our problem for this year, which also appeared in 1990, is due to philosopher Arnold Zuboff [7]. It is called the sleeping beauty problem and it is still the source of spirited arguments. What do you think the answer is?

Centennial Problem 1990
Proposed by Adam Elga, Princeton University.

Some researchers put you to sleep for two days. While sleeping, they briefly wake you up either once or twice, depending on the toss of a fair coin (heads once; tails twice). After each waking, they put you back to sleep with a drug that makes you forget that waking. When you are first awakened, to what degree ought you believe that the outcome of the coin toss is heads?

1990: Comments

Resolution of the Monty Hall problem. One good way to build intuition for the answer is to write a computer program and simulate millions of games.
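Such a simulation takes only a few lines in any language. The following sketch in Python (the function names are ours, not the book's) estimates the success rate of both strategies:

```python
import random

def play(switch, rng):
    """Play one game; without loss of generality the contestant starts with door 0."""
    prize = rng.randrange(3)
    choice = 0
    # The host opens a door that is neither the contestant's door nor the prize door
    # (the host's tie-break when both are available does not affect these win rates).
    opened = next(d for d in range(3) if d != choice and d != prize)
    if switch:
        choice = next(d for d in range(3) if d != choice and d != opened)
    return choice == prize

def win_rate(switch, trials=200_000, seed=1990):
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

print(win_rate(switch=True))   # close to 2/3
print(win_rate(switch=False))  # close to 1/3
```

Running it shows the switching strategy winning about twice as often as staying, which is the resolution discussed in the comments.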



Computational results can quickly provide evidence for or against a particular answer. Without loss of generality we may assume the contestant always chooses the first door. Here is an example of such a program in Mathematica (see the 1988 entry):

success = 0; (* initialize number of successes to 0 *)
For[n = 1, n

A modular function f satisfies f((aτ + b)/(cτ + d)) = f(τ) whenever Im τ > 0 and a, b, c, d are integers with ad − bc = 1, and it enjoys a Laurent series expansion of the form

f(q) = sum_{n=−m}^{∞} a(n) q^n,

in which q = e^{2πiτ}. One can show that every rational function of j is a modular function and, conversely, that every modular function is a rational function of j. In 1978, John McKay (1939– ) observed that the first few coefficients in the expansion

j(q) = q^{−1} + 744 + 196,884q + 21,493,760q^2 + 864,299,970q^3 + 20,245,856,256q^4 + 333,202,640,600q^5 + · · ·
are expressible as integral linear combinations of the dimensions r_n of the irreducible representations² of the monster group M. For example,

1 = r_1,
196,884 = r_1 + r_2,
21,493,760 = r_1 + r_2 + r_3,
864,299,970 = 2r_1 + 2r_2 + r_3 + r_4,
20,245,856,256 = 3r_1 + 3r_2 + r_3 + 2r_4 + r_5 = 2r_1 + 3r_2 + 2r_3 + r_4 + r_6,
333,202,640,600 = 5r_1 + 5r_2 + 2r_3 + 3r_4 + 2r_5 + r_7 = 4r_1 + 5r_2 + 3r_3 + 2r_4 + r_5 + r_6 + r_7,

and so forth [10]. The numbers involved are so large³ that one suspects that these identities cannot be mere coincidence. In 1979, John Horton Conway and Simon P. Norton (1952– ) coined the term "monstrous moonshine" to reflect both the monster group and the (apparent) improbability of such a connection. The conjectured connection between the j-invariant and the monster group led to the discovery of several analogous relationships between modular functions and group theory. Richard Borcherds (1959– ) proved the Conway–Norton conjectures in 1992 and earned a Fields Medal for this work. One of the main elements of his argument was the construction of a Z_2-graded Lie algebra on which M acts. As a result of his proof, the relationship between the two mathematical objects is now understood as follows: there is a vertex operator algebra called the moonshine module, first explicitly constructed by Igor Frenkel (1952– ), James Lepowsky (1944– ), and Arne Meurman (1956– ), that has M as its automorphism group and the j-invariant as its graded dimension function. The underlying similarity of the two seemingly unrelated topics comes from conformal field theory (field theory that is invariant under conformal transformations), a theory that is used in modeling statistical mechanics, string theory, and condensed matter physics.

Centennial Problem 1992
Proposed by Blake Mackall and Steven J. Miller, Williams College.
It is fascinating that the particular number

2^46 · 3^20 · 5^9 · 7^6 · 11^2 · 13^3 · 17 · 19 · 23 · 29 · 31 · 41 · 47 · 59 · 71

corresponds to the size of an interesting group. There are fifteen primes that appear in its factorization:

2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 41, 47, 59, 71.      (1992.1)

How many distinct products of powers of these fifteen numbers exist that yield a number within a factor of 100 of the Monster's size? What if we instead allow ourselves to use all primes at most 71?

²There are certain dimensions r_n for which one can find a homomorphism φ : M → GL_{r_n}(C) that cannot be "decomposed" in the sense that the only subspaces of C^{r_n} that are invariant under every φ(g) are {0} and C^{r_n} itself.
³The coefficients grow rapidly. One can use the circle method (see the 1923 entry) to show that the coefficient of q^n in the Laurent series expansion for j(q) is asymptotic to e^{4π√n}/(√2 · n^{3/4}).
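Both the monster's order and the McKay identities above can be checked mechanically. In the sketch below, the prime-power exponents come from the factorization in the text, while the dimensions r_1, …, r_7 are quoted from standard character-table references (they are not stated in this entry):

```python
# Exponents of the fifteen primes in the monster group's order (from the text).
PRIME_POWERS = {2: 46, 3: 20, 5: 9, 7: 6, 11: 2, 13: 3, 17: 1, 19: 1,
                23: 1, 29: 1, 31: 1, 41: 1, 47: 1, 59: 1, 71: 1}
# Dimensions r_1, ..., r_7 of the smallest irreducible representations of M
# (quoted from standard references, an assumption of this sketch).
R = [1, 196883, 21296876, 842609326, 18538750076, 19360062527, 293553734298]

order = 1
for p, e in PRIME_POWERS.items():
    order *= p ** e
print(order)  # a 54-digit number, roughly 8 x 10^53

def combo(coeffs):
    """Integral linear combination sum_i coeffs[i] * r_{i+1}."""
    return sum(c * d for c, d in zip(coeffs, R))

# McKay's observations about the coefficients of j:
assert combo([1, 1]) == 196884
assert combo([1, 1, 1]) == 21493760
assert combo([2, 2, 1, 1]) == 864299970
assert combo([3, 3, 1, 2, 1]) == combo([2, 3, 2, 1, 0, 1]) == 20245856256
assert combo([5, 5, 2, 3, 2, 0, 1]) == combo([4, 5, 3, 2, 1, 1, 1]) == 333202640600
```

Exact integer arithmetic makes such checks trivial in Python; the interesting question, of course, is why the identities hold at all.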



1992: Comments

Numbers with fixed prime factors. What can be said about the sequence 1 = n_1 < n_2 < · · · of natural numbers whose prime factors are among the list (1992.1)? The sequence begins promisingly enough:

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 39, 40, 41, 42, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55, 56, 57, 58, 59, 60, 62, 63, 64, 65, 66, 68, 69, 70, 71, 72, 75, 76, 77, 78, 80, 81, 82, 84, 85, 87, 88, 90, 91, 92, 93, 94, 95, 96, 98, 99, 100,

but skips a few numbers. Does it contain most of the natural numbers? Axel Thue (1863–1922) proved that if one starts with any finite set of primes, then lim_{i→∞} (n_{i+1} − n_i) = ∞; that is, the gaps between terms in the sequence tend to infinity [7]. In particular, the sequence above contains relatively few natural numbers in the big scheme of things. A more quantitative version is due to Robert Tijdeman (1943– ), who proved that there is a constant C, which depends only upon the initial (finite) list of primes, such that

n_{i+1} − n_i > n_i / (log n_i)^C

for n_i ≥ 3 [8].

Graham's number. Consider the following problem in Ramsey theory. An n-dimensional hypercube has 2^n vertices. For example, a two-dimensional hypercube is a square, which has four vertices. A three-dimensional hypercube is a cube, in the traditional sense, which has eight vertices, and so forth; see Figure 2. Connect each pair of vertices and obtain a complete graph on 2^n vertices. Now assign one of two colors to each edge. What is the smallest dimension d for which any such two-coloring contains a monochromatic, complete subgraph on four coplanar vertices (see Figure 3)? It is known that d ≥ 13. In unpublished work, Ronald Graham established an upper bound for d that makes the size of the monster group pale in comparison. This number, now known as Graham's number, was popularized in Martin Gardner's Scientific American column (see the 1914 entry) in November 1977 [5].
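The opening stretch of this sequence is easy to regenerate by trial division against the fifteen allowed primes (a sketch; the helper name is ours):

```python
# The fifteen primes dividing the order of the monster group.
ALLOWED = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 41, 47, 59, 71}

def only_allowed_factors(n):
    """True if every prime factor of n lies in ALLOWED (vacuously true for 1)."""
    for p in ALLOWED:
        while n % p == 0:
            n //= p
    return n == 1

seq = [n for n in range(1, 101) if only_allowed_factors(n)]
print(seq)  # the list displayed above; note 37, 43, 53, ... are skipped
```

The first omission, 37, occurs because 37 is prime but is not among the fifteen primes dividing the monster's order.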

(a) n = 2

(b) n = 3

(c) n = 4

Figure 2. The vertices of an n-dimensional hypercube (projected into two-dimensional Euclidean space).


(a) A 2-coloring of a three-dimensional cube.


(b) A monochromatic 4-vertex coplanar complete subgraph.

Figure 3. Illustration of the Ramsey theory problem from which Graham's number arises (image public domain).

We start with Donald Knuth's up-arrow notation for positive integers [11]; this is related to Ackermann's function from the 1926 entry. We can view multiplication as iterated addition: ab is b copies of a under addition. Along these lines, a ↑ b is defined to be a^b, that is, b copies of a under multiplication. We can define

a ↑↑ b = a ↑ (a ↑ (· · · ↑ a)),

with b − 1 up arrows on the right-hand side, so that

3 ↑↑ 2 = 3^3 = 27 and 3 ↑↑ 3 = 3^{3^3} = 3^{27} = 7,625,597,484,987,

and so forth. But why stop there? We can define

a ↑↑↑ b = a ↑↑ (a ↑↑ (· · · ↑↑ a)),

with b − 1 double up arrows, and so forth. Graham's number involves 64 layers of ever larger up-arrowing:

a ↑^n b = a^b if n = 1;  a ↑^n b = 1 if n ≥ 1 and b = 0;  a ↑^n b = a ↑^{n−1} (a ↑^n (b − 1)) otherwise.

As (1992.2) suggests, g_1 = 3 ↑↑↑↑ 3 is already outrageously huge. If we let g_n = 3 ↑^{g_{n−1}} 3 for n ≥ 2, then Graham's number is g_64. The following description gives a rough idea of its magnitude:

Graham's number is much larger than many other large numbers such as Skewes's number. . . it is so large that the observable universe is far too small to contain an ordinary digital representation of Graham's number, assuming that each digit occupies one Planck volume, possibly the smallest measurable space. But even the number of digits in this digital representation of Graham's number would itself be a



number so large that its digital representation cannot be represented in the observable universe. Nor even can the number of digits of that number—and so forth, for a number of times far exceeding the total number of Planck volumes in the observable universe. [9]
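The three-case recursion defining a ↑^n b translates directly into code. This sketch (the function name is ours) works only for tiny inputs, since the values explode almost immediately:

```python
def uparrow(a, n, b):
    """Knuth's a ↑^n b, following the three-case recursion in the text."""
    if n == 1:
        return a ** b
    if b == 0:
        return 1
    return uparrow(a, n - 1, uparrow(a, n, b - 1))

assert uparrow(3, 2, 2) == 27                  # 3↑↑2 = 3^3
assert uparrow(3, 2, 3) == 7_625_597_484_987   # 3↑↑3 = 3^27
assert uparrow(2, 3, 2) == 4                   # 2↑↑↑2 = 2↑↑2 = 2^2
```

Even 3 ↑↑ 4 = 3^{7,625,597,484,987} is far beyond what this (or any) program could evaluate, which gives a small taste of how hopeless writing down g_1, let alone g_64, really is.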

A video explanation, by Graham himself, is [6]. See the 1930 entry to learn more about Ramsey theory and the 1933 entry for information about Skewes’s number. Bibliography [1] R. E. Borcherds, Monstrous moonshine and monstrous Lie superalgebras, Invent. Math. 109 (1992), no. 2, 405–444, DOI 10.1007/BF01232032. MR1172696 [2] J. H. Conway and S. P. Norton, Monstrous moonshine, Bull. London Math. Soc. 11 (1979), no. 3, 308–339, DOI 10.1112/blms/11.3.308. http://blms.oxfordjournals.org/content/11/ 3/308.full.pdf+html. MR554399 [3] Dr. FermiGuy, Physics Questions People Ask Fermilab, http://www.fnal.gov/pub/science/ inquiring/questions/atoms.html. [4] T. Gannon, Monstrous moonshine: the first twenty-five years, Bull. London Math. Soc. 38 (2006), no. 1, 1–33, DOI 10.1112/S0024609305018217. http://arxiv.org/pdf/math/ 0402345v2.pdf. MR2201600 [5] M. Gardner, Mathematical games, Scientific American 237 (1977), November, 18–28. [6] R. Graham and B. Haran, How big is Graham’s number?, https://www.youtube.com/watch? v=GuigptwlVHo. [7] A. Thue, Selected mathematical papers, with an introduction by Carl Ludwig Siegel and a biography by Viggo Brun; edited by Trygve Nagell, Atle Selberg, Sigmund Selberg, and Knut Thalberg, Universitetsforlaget, Oslo, 1977. MR0460050 [8] R. Tijdeman, On integers with many small prime factors, Compositio Math. 26 (1973), 319–330. MR0325549 [9] Wikipedia, Graham’s number, https://en.wikipedia.org/wiki/Graham’s_number. [10] Wikipedia, Monstrous moonshine, http://en.wikipedia.org/wiki/Monstrous_moonshine. [11] Wikipedia, Knuth’s up-arrow notation, https://en.wikipedia.org/wiki/Knuth’s_uparrow_notation.


The 15-Theorem

Introduction

Lagrange's four-square theorem, proved by Joseph-Louis Lagrange in 1770, says that every positive integer is the sum of four perfect squares (in which zero is considered a square). For example, 1993 is expressible as a sum of four squares in many different ways, such as

1993 = 44^2 + 7^2 + 2^2 + 2^2 = 42^2 + 12^2 + 7^2 + 6^2 = 33^2 + 30^2 + 2^2 + 0^2 = 24^2 + 24^2 + 21^2 + 20^2 = 43^2 + 12^2 + 0^2 + 0^2 = 32^2 + 22^2 + 22^2 + 1^2.

Lagrange's theorem was refined in 1834 by Carl Gustav Jacob Jacobi (1804–1851), who proved what is now known as Jacobi's four-square theorem: the number r_4(n) of representations n = a^2 + b^2 + c^2 + d^2, in which a, b, c, d are integers, is

r_4(n) = 8 sum_{m|n} m if n is odd,  and  r_4(n) = 24 sum_{m|n, m odd} m if n is even.
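Jacobi's formula is easy to sanity-check by brute force for small n (a sketch; the function names are ours):

```python
from math import isqrt

def r4(n):
    """Count integer quadruples (a, b, c, d) with a^2 + b^2 + c^2 + d^2 = n."""
    k = isqrt(n)
    vals = range(-k, k + 1)
    return sum(1 for a in vals for b in vals for c in vals for d in vals
               if a*a + b*b + c*c + d*d == n)

def jacobi(n):
    """Jacobi's divisor-sum formula for r_4(n)."""
    divisors = [m for m in range(1, n + 1) if n % m == 0]
    if n % 2 == 1:
        return 8 * sum(divisors)
    return 24 * sum(m for m in divisors if m % 2 == 1)

for n in range(1, 30):
    assert r4(n) == jacobi(n)
print("Jacobi's formula verified for n < 30")
```

The brute-force count grows like n^2, so this direct check is only practical for small n; the divisor-sum side, of course, remains cheap.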


There are several ways to generalize Lagrange's four-square theorem. One might consider a different number of squares. We considered sums of two squares in the 1966 entry and refer the reader there for further details. In light of Lagrange's theorem, we focus on sums of three squares. Adrien-Marie Legendre (1752–1833) proved that a natural number n is the sum of three squares if and only if n is not of the form 4^i(8j + 7). Thus, any number not on the list

7, 15, 23, 28, 31, 39, 47, 55, 60, 63, 71, 79, 87, 92, 95, 103, 111, 112, 119, 124, 127, 135, 143, 151, 156, 159, 167, 175, 183, 188, 191, 199, 207, 215, 220, 223, 231, 239, 240, 247, 252, 255, 263, 271, 279, 284, 287, 295, 303, 311, 316, 319, 327, 335, 343, . . .

is the sum of three squares [7]. In fact, Gauss later proved that if n ≥ 5 is square free, then the number r_3(n) of representations n = x^2 + y^2 + z^2, in which x, y, z are integers, is

r_3(n) = 12h(−4n) if n ≡ 1, 2, 5, 6 (mod 8);  r_3(n) = 24h(−n) if n ≡ 3 (mod 8);  r_3(n) = 0 if n ≡ 7 (mod 8),

in which h(x) denotes the class number of x [10] (see the 1966 entry for more about class numbers). Instead of focusing only on sums of squares, we can learn even more by studying quadratic forms in several variables. We take our inspiration from the identity

x_1^2 + x_2^2 + x_3^2 + x_4^2 = x^T I x,      (1993.1)

in which x = [x_1 x_2 x_3 x_4]^T ∈ R^4 and I denotes the 4 × 4 identity matrix. Other quadratic forms arise if we replace I with a more general matrix A. For example, the most general binary quadratic form is

ax_1^2 + bx_1x_2 + cx_2^2 = [x_1 x_2] [ a  b/2 ; b/2  c ] [x_1 ; x_2].
We say that a quadratic form Q(x) = x^T A x is universal if Q represents every natural number; that is, for each n ∈ N, there is an integer vector x so that Q(x) = n. For example, Lagrange's four-square theorem asserts that the form (1993.1) is universal. On the other hand, x_1^2 + x_2^2 + x_3^2 is not universal because it does not represent 7. In 1993, John Horton Conway and William Schneeberger proved the 15-theorem. This remarkable result asserts that if

Q(x) = x^T A x      (1993.2)

is a quadratic form with positive definite, integral matrix A, then Q represents all positive integers if and only if it represents the numbers 1, 2, . . . , 15 [12]. In fact, one can replace this list with 1, 2, 3, 5, 6, 7, 10, 14, 15. The restriction that A has integer entries is nontrivial. For example, the quadratic form

x_1^2 + x_1x_2 + x_2^2 = [x_1 x_2] [ 1  1/2 ; 1/2  1 ] [x_1 ; x_2]      (1993.3)

has integer coefficients, but its corresponding matrix has noninteger off-diagonal entries. The original proof of the 15-theorem was not published, although Fields Medalist Manjul Bhargava gave a simpler proof in 2000 [1]. In 1916, Ramanujan provided a list of fifty-five "diagonal" quaternary forms (1993.2) that he claimed exhausts the positive-definite, universal forms in four variables [9].
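For a diagonal form, checking which small values are represented is a simple finite search (a sketch; the function name is ours). It confirms, for instance, that x_1^2 + x_2^2 + x_3^2 + x_4^2 hits everything up to 15, while x_1^2 + 2x_2^2 + 5x_3^2 + 5x_4^2, discussed below, misses 15:

```python
def represented(coeffs, limit):
    """Values in 0..limit taken by sum_i c_i * x_i^2 over integers x_i."""
    hits = {0}
    for c in coeffs:
        new = set()
        for s in hits:
            x = 0
            while s + c * x * x <= limit:
                new.add(s + c * x * x)
                x += 1
        hits = new
    return hits

assert set(range(1, 16)) <= represented((1, 1, 1, 1), 15)
assert 15 not in represented((1, 2, 5, 5), 15)
assert set(range(1, 15)) <= represented((1, 2, 5, 5), 14)
```

By the 15-theorem, the first assertion alone certifies that the sum of four squares is universal; the second shows that representing 1 through 14 is not enough.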



Figure 1. The nth triangular number T_n = n(n + 1)/2 is the number of balls in a triangular array whose base consists of n balls.

To be more specific, they correspond to diagonal matrices A with diagonal

(1, 1, 1, d), 1 ≤ d ≤ 7;
(1, 1, 2, d), 2 ≤ d ≤ 14;
(1, 1, 3, d), 3 ≤ d ≤ 6;
(1, 2, 2, d), 2 ≤ d ≤ 7;
(1, 2, 3, d), 3 ≤ d ≤ 10;
(1, 2, 4, d), 4 ≤ d ≤ 14;
(1, 2, 5, d), 5 ≤ d ≤ 10.

The 15-theorem can be used to show that Ramanujan was almost correct: his only mistake was the erroneous inclusion of the quadratic form x_1^2 + 2x_2^2 + 5x_3^2 + 5x_4^2, which omits the value 15. Thus, the tuple (1, 2, 5, 5) should not have been included in his list. Nevertheless, Ramanujan (as always) displayed remarkable foresight and was well ahead of his time.

Centennial Problem 1993
Proposed by Scott Duke Kominers, Harvard University.

(a) The nth triangular number is T_n = n(n + 1)/2; see Figure 1. Prove that every positive integer can be represented in the form T_p + T_q + T_r.
(b) Prove that every positive integer can be represented in the form pp̄ + 3qq̄ (the bar denoting conjugation), in which p = p_1 + p_2 √−2 (p_1, p_2 ∈ Z) and q = q_1 + q_2 √−2 (q_1, q_2 ∈ Z) are algebraic integers in the quadratic field Q(√−2).
(c) Characterize the set of natural numbers that are not of the form x^2 + y^2 + 10z^2.

1993: Comments

The 290-theorem. In 2008, Bhargava and Jonathan P. Hanke proved the 290-theorem, which asserts that a quadratic form (1993.2) with integer coefficients and positive definite matrix A is universal if and only if it assumes the values 1, 2, . . . , 290 [12]. As before, we can replace this list by something smaller:

1, 2, 3, 5, 6, 7, 10, 13, 14, 15, 17, 19, 21, 22, 23, 26, 29, 30, 31, 34, 35, 37, 42, 58, 93, 110, 145, 203, 290.

The 290-theorem handles quadratic forms, such as (1993.3), that the 15-theorem does not address.

Triangular numbers. Do not feel bad if you have trouble with part (a) of the centennial problem. It is a famous result of Gauss from 1796 and it is closely related to the difficult problem of representing an integer as a sum of three squares. Indeed, if n = T_p + T_q + T_r, then a little algebra shows that

8n + 3 = (2p + 1)^2 + (2q + 1)^2 + (2r + 1)^2.

There are many things that can be said about triangular (and more generally, polygonal) numbers [11]. The reader is invited to deduce the correct definition of an n-polygonal number; look at Figure 1 for inspiration. If you wish to check your answer, the generating function for the n-polygonal numbers is

G_n(x) = x((n − 3)x + 1)/(1 − x)^3.

Fermat's polygonal number theorem, stated by Fermat in 1638, asserts that for k = 3, 4, . . ., each natural number n is the sum of k k-polygonal numbers (as usual, zero summands are permitted). The case k = 3 is Gauss's theorem and the case k = 4 is Lagrange's four-square theorem. Fermat's theorem, for which he gave no proof, was finally proved by Cauchy in 1813 [14].

Ramanujan's ternary quadratic form. Ramanujan observed several curious properties of the quadratic form x^2 + y^2 + 10z^2 that appears in part (c) of the centennial problem [9]. The even numbers that are not represented by this quadratic form are of the form 4^j(16k + 6). Moreover, the odd numbers

3, 7, 21, 31, 33, 43, 67, 79, 87, 133, 217, 219, 223, 253, 307, 391, . . .

that are not represented by this form are not easily characterized [13]. Two other numbers, 679 and 2,719, were later added to Ramanujan's list.
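These odd exceptions can be recovered by exhaustive search below any fixed bound (a sketch; the helper name is ours):

```python
from math import isqrt

def ternary_values(limit):
    """All values of x^2 + y^2 + 10*z^2 up to limit, over nonnegative integers."""
    hits = set()
    for x in range(isqrt(limit) + 1):
        for y in range(isqrt(limit - x*x) + 1):
            for z in range(isqrt((limit - x*x - y*y) // 10) + 1):
                hits.add(x*x + y*y + 10*z*z)
    return hits

hits = ternary_values(500)
odd_exceptions = [n for n in range(1, 501, 2) if n not in hits]
print(odd_exceptions)  # Ramanujan's odd exceptions below 500
```

Pushing the bound past 679 and 2,719 picks up the two later additions; no further odd exceptions are expected, as the next paragraph explains.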
In 1997, Ken Ono and Kannan Soundararajan (1973– ) conjectured that the odd natural numbers that are not of the form x2 + y 2 + 10z 2 are 3, 7, 21, 31, 33, 43, 67, 79, 87, 133, 217, 219, 223, 253, 307, 391, 679, 2719. More importantly, they show that if the generalized Riemann hypothesis is true, then their conjecture holds [8, Thm. 3].



Positivity. In engineering applications one often encounters multivariable functions that one hopes are nonnegative. For example, is f(x, y) = 5x^2 + x^2y^2 + 6xy + 4y^2 nonnegative for all real x, y? Yes, because

f(x, y) = (2x + y)^2 + (xy)^2 + 2y^2 + (x + y)^2

happens to be a sum of squares. These sums-of-squares decompositions have recently found applications in the field of self-driving cars. Briefly, the idea is to encode the journey so that if a certain function is nonnegative, then there are no collisions [3]. If the function can be represented as a sum of squares of polynomials, then it is nonnegative and the path is a safe one. To be practical for such real-world problems, it is not enough to be able to compute that the polynomial has such a decomposition; we must be able to rapidly certify that such a decomposition exists. The main idea is to replace computationally slow semidefinite programming problems with a series of linked linear programming problems (see the 1947 entry).
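The sum-of-squares identity above is a polynomial identity, so it can be spot-checked exactly at integer points (a quick sketch):

```python
import random

def f(x, y):
    return 5*x*x + x*x*y*y + 6*x*y + 4*y*y

def sos(x, y):
    """The sum-of-squares decomposition of f."""
    return (2*x + y)**2 + (x*y)**2 + 2*y*y + (x + y)**2

rng = random.Random(1993)
for _ in range(1000):
    x, y = rng.randint(-50, 50), rng.randint(-50, 50)
    assert f(x, y) == sos(x, y)   # exact integer arithmetic, no rounding
    assert f(x, y) >= 0
print("decomposition verified at 1000 random integer points")
```

Agreement at enough points of a degree-bounded polynomial identity in fact proves it, though expanding the squares by hand is just as quick here.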

Bibliography [1] M. Bhargava, On the Conway-Schneeberger fifteen theorem, Quadratic forms and their applications (Dublin, 1999), Contemp. Math., vol. 272, Amer. Math. Soc., Providence, RI, 2000, pp. 27–37, DOI 10.1090/conm/272/04395. http://www.maths.ed.ac.uk/~aar/books/ dublin.pdf. MR1803359 [2] J. H. Conway, Universal quadratic forms and the fifteen theorem, Quadratic forms and their applications (Dublin, 1999), Contemp. Math., vol. 272, Amer. Math. Soc., Providence, RI, 2000, pp. 23–26, DOI 10.1090/conm/272/04394. http://www.maths.ed.ac.uk/~aar/books/ dublin.pdf. MR1803358 [3] K. Hartnett, A classical math problem gets pulled into self-driving cars, Quanta Magazine, May 23, 2018. https://www.quantamagazine.org/a-classical-math-problem-getspulled-into-the-modern-world-20180523/ [4] M.-H. Kim, Recent developments on universal forms, Algebraic and arithmetic theory of quadratic forms, Contemp. Math., vol. 344, Amer. Math. Soc., Providence, RI, 2004, pp. 215– 228, DOI 10.1090/conm/344/06218. MR2058677 [5] S. D. Kominers, On universal binary Hermitian forms, Integers 9 (2009), A02, 6, DOI 10.1515/INTEG.2009.002. http://www.emis.de/journals/INTEGERS/papers/j2/j2. pdf. MR2475630 [6] I. Niven, H. S. Zuckerman, and H. L. Montgomery, An Introduction to the Theory of Numbers, Wiley, 2008. [7] The On-Line Encyclopedia of Integer Sequences, A004215 (numbers that are the sum of 4 but no fewer nonzero squares), https://oeis.org/A004215. [8] K. Ono and K. Soundararajan, Ramanujan’s ternary quadratic form, Invent. Math. 130 (1997), no. 3, 415–454, DOI 10.1007/s002220050191. http://link.springer.com/article/ 10.1007%2Fs002220050191. MR1483991 [9] S. Ramanujan, On the expression of a number in the form ax2 + by 2 + cz 2 + du2 , Proc. Camb. Phil. Soc. 19 (1916), 11–21. [10] Wolfram MathWorld, Sum of squares function, http://mathworld.wolfram.com/ SumofSquaresFunction.html.



[11] Wolfram MathWorld, Polygonal number, http://mathworld.wolfram.com/PolygonalNumber. html. [12] Wikipedia, 15 and 290 theorems, https://en.wikipedia.org/wiki/15_and_290_theorems. [13] Wikipedia, Ramanujan’s ternary quadratic form, https://en.wikipedia.org/wiki/ Ramanujan’s_ternary_quadratic_form. [14] Wikipedia, Fermat polygonal number theorem, https://en.wikipedia.org/wiki/ Fermat_polygonal_number_theorem.


AIM

Introduction

In 1994 John Fry (1944– ), cofounder of the Fry's Electronics chain, funded the creation of AIM, the American Institute of Mathematics¹ [1]. AIM was located in Palo Alto, California, for many years before moving to its present location in San Jose. The institute's stated mission is:

To advance mathematical knowledge through collaboration, to broaden participation in the mathematical endeavor, and to increase the awareness of the contributions of the mathematical sciences to society.

Since 2002, AIM has been one of eight institutions that are part of the National Science Foundation’s Mathematical Sciences Institute Program [5]. The others are: • Institute for Advanced Study (IAS) in Princeton, NJ, • Institute for Computational and Experimental Research in Mathematics (ICERM) in Providence, RI, • Institute for Mathematics and its Applications (IMA) in Minneapolis, MN, • Institute for Pure and Applied Mathematics (IPAM) in Los Angeles, CA, • Mathematical Biosciences Institute (MBI) in Columbus, OH, • Mathematical Sciences Research Institute (MSRI) in Berkeley, CA, • Statistical and Applied Mathematical Sciences Institute (SAMSI) in Research Triangle Park, NC. These institutes bring together mathematicians and foster long-term collaborations. One of AIM’s most effective and popular methods for nurturing collaborative work is the SQuaREs program: The purpose of AIM’s research program called SQuaREs (Structured Quartet Research Ensembles) is to allow a dedicated group of four to six mathematicians to spend a week at AIM in San Jose, California, with the possibility of returning in following years. A SQuaRE could arise as a followup to an AIM workshop, or it could be a freestanding activity. AIM will provide both the research facilities and the financial support for each SQuaRE group.

There are so many good questions arising from work at AIM that it is hard to select just one. We have chosen an easily stated problem with a long and storied history. Moreover, it connects not only to Hilbert's tenth problem (see the 2005 entry) and Sage (see the 2005 entry), but it also forms a segue into Fermat's last theorem, the topic of our next entry. See https://aimath.org/news/congruentnumbers/ for more information.

¹Full disclosure: the first named author has served on the human resources board of AIM since 2008. Both authors have led workshops at AIM over the years.

Centennial Problem 1994
Proposed by Steven J. Miller, Williams College.

What positive integers n are the areas of a right triangle with rational sides? In other words, solve the system of equations

a^2 + b^2 = c^2,  (1/2)ab = n,      (1994.1)

in which a, b, c are rational and n is a positive integer.



1994: Comments

A trillion triangles. An n ≥ 1 for which (1994.1) has a rational solution (a, b, c) is a congruent number. The centennial problem above is the famed congruent number problem. The first few congruent numbers are

5, 6, 7, 13, 14, 15, 20, 21, 22, 23, 24, 28, 29, 30, 31, 34, 37, 38, 39, 41, 45, 46, 47, 52, 53, 54, 55, 56, 60, 61, 62, 63, 65, 69, 70, 71, 77, 78, 79, 80, 84, 85, 86, 87, 88, 92, 93, 94, 95, 96, 101, 102, 103, 109, 110, 111, 112, 116, 117, 118, 119, 120, 124, 125, 126

[6]. For example, 5 is the area of the right triangle with sides

(a, b, c) = (20/3, 3/2, 41/6).

Although early Islamic mathematicians identified the congruent numbers 5, 6, 14, 15, 21, 30, 34, 65, 70, 110, 154, 190, 210, 221, 231, 246, 290, 390, 429, 546, they missed many of the examples above [2]. It is not easy to determine whether a given number is congruent or not. The first congruent number omitted in the second list, 7, is congruent because it is the area of the right triangle with sides

(a, b, c) = (24/5, 35/12, 337/60).

Where does AIM come in? In 2009, a team of mathematicians supported by AIM succeeded in determining all of the congruent numbers up to one trillion [2]. Long story short: there are 3,148,379,694 of them in that range. An AIM press release declared [1]:

Mathematicians from North America, Europe, Australia, and South America have resolved the first one trillion cases of an ancient mathematics problem. The advance was made possible by a clever technique for multiplying large numbers. The numbers involved are so enormous that if their digits were written out by hand they would stretch to the moon and back. The biggest challenge was that these numbers could not even fit into the main memory of the available computers, so the researchers had to make extensive use of the computers' hard drives.
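Both example triangles can be verified exactly with rational arithmetic (a quick sketch; the function name is ours):

```python
from fractions import Fraction as F

def is_congruent_witness(a, b, c, n):
    """Check that legs a, b and hypotenuse c form a right triangle of area n."""
    return a*a + b*b == c*c and F(1, 2)*a*b == n

assert is_congruent_witness(F(20, 3), F(3, 2), F(41, 6), 5)
assert is_congruent_witness(F(24, 5), F(35, 12), F(337, 60), 7)
```

Exact fractions matter here; floating-point arithmetic would only verify the identities approximately.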



Two teams, each using different software and hardware, arrived at the same results (one group used Sage, the focus of our 2005 entry). A critical role was played by the fast Fourier transform (see the 1965 entry), which can be used to multiply two n-bit numbers in O(n log n log log n) time.

Congruent numbers and Pythagorean triples. Every right triangle with rational sides gives rise to infinitely many congruent numbers. For example, the (3, 4, 5)-triangle, which has area 6, gives rise to right triangles with side lengths (3k, 4k, 5k) and area 6k^2 for k = 1, 2, . . .. Are there infinitely many congruent numbers whose associated triangles are not similar? The substitution x = a/c and y = b/c provides a bijection between rational solutions (a, b, c) with c ≠ 0 to a^2 + b^2 = c^2 and rational solutions to

x^2 + y^2 = 1.      (1994.2)

The preceding equation has the solution (1, 0), from which we can construct all other rational solutions; see Figure 1. Consider the line through (1, 0) with slope t; that is, y = tx − t. Substitute this into (1994.2) and obtain

(1 + t^2)x^2 − 2t^2x = 1 − t^2.

The quadratic equation implies that x = 1, which leads to the known solution (1, 0) of (1994.2), or

x = (t^2 − 1)/(t^2 + 1),

which leads to

(x(t), y(t)) = ((t^2 − 1)/(t^2 + 1), −2t/(t^2 + 1)).

Figure 1. Parametrizing the rational solutions to x^2 + y^2 = 1.
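The parametrization can be checked with exact rational arithmetic, and clearing denominators with t = m/n recovers the Pythagorean triples and congruent numbers discussed next (a sketch; the names are ours):

```python
from fractions import Fraction as F

def point(t):
    """Rational point on x^2 + y^2 = 1 cut out by the line of slope t through (1, 0)."""
    return F(t*t - 1, t*t + 1), F(-2*t, t*t + 1)

for t in range(-5, 6):
    x, y = point(t)
    assert x*x + y*y == 1

def triple(m, n):
    """Pythagorean triple obtained by clearing denominators with t = m/n."""
    return (m*m - n*n, 2*m*n, m*m + n*n)

a, b, c = triple(2, 1)
assert (a, b, c) == (3, 4, 5) and a*a + b*b == c*c
assert 2*1*(2*2 - 1*1) == 6  # the associated congruent number mn(m^2 - n^2)
```

Rational t suffices here; integer t is used above only to keep the check simple.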



This is a rational solution to (1994.2) if and only if t is rational. If we set t = m/n, in which m, n are integers, and clear the resulting denominators, we obtain Pythagorean triples (m^2 − n^2, 2mn, m^2 + n^2) and congruent numbers mn(m^2 − n^2). In particular, if n = 1 and m = p is prime, we have a triple of the form (p^2 − 1, 2p, p^2 + 1) and associated congruent number p(p^2 − 1) = (p − 1)p(p + 1). Moreover, no two such triples are rational multiples of each other and hence we have a family of congruent numbers, no two of which are obtained from similar right triangles.

Congruent numbers and elliptic curves. There is a beautiful connection between congruent numbers and elliptic curves [3]. For n ≥ 1, the maps

(a, b, c) → (nb/(c − a), 2n^2/(c − a))  and  (x, y) → ((x^2 − n^2)/y, 2nx/y, (x^2 + n^2)/y)

provide bijections between the solution sets of (1994.1) and

y^2 = x^3 − n^2x,      (1994.3)

in which y ≠ 0.² Moreover, these maps send rational solutions to rational solutions. Thus, a positive rational number n is a congruent number if and only if the elliptic curve (1994.3) has a rational point with y ≠ 0. The AIM press release tells us:

In 1982 Jerrold Tunnell of Rutgers University made significant progress by exploiting the connection (first used by Heegner) between congruent numbers and elliptic curves, mathematical objects for which there is a well-established theory. He found a simple formula for determining whether or not a number is a congruent number. This allowed the first several thousand cases to be resolved very quickly. One issue is that the complete validity of his formula (therefore also the new computational result) depends on the truth of a particular case of one of the outstanding problems in mathematics known as the Birch and Swinnerton-Dyer conjecture. That conjecture is one of the seven Millennium Prize Problems posed by the Clay Math Institute with a prize of one million dollars. [1]

What is Tunnell's theorem? Let

A_n = {(x, y, z) ∈ Z^3 | n = 2x^2 + y^2 + 32z^2},
B_n = {(x, y, z) ∈ Z^3 | n = 2x^2 + y^2 + 8z^2},
C_n = {(x, y, z) ∈ Z^3 | n = 8x^2 + 2y^2 + 64z^2},
D_n = {(x, y, z) ∈ Z^3 | n = 8x^2 + 2y^2 + 16z^2}.


²The solutions (0, 0), (n, 0), and (−n, 0) to (1994.3) correspond to a = c, which is not attainable by a right triangle with sides (a, b, c).
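The sets A_n, B_n, C_n, D_n are cut out by positive definite forms, so each is finite and can be enumerated directly. The sketch below (the function name is ours) reproduces the counts for n = 41 and n = 43 quoted in the discussion that follows:

```python
from math import isqrt

def count(n, a, b, c):
    """Number of integer triples (x, y, z) with a*x^2 + b*y^2 + c*z^2 = n."""
    total = 0
    for x in range(-isqrt(n // a), isqrt(n // a) + 1):
        rx = n - a * x * x
        for y in range(-isqrt(rx // b), isqrt(rx // b) + 1):
            ry = rx - b * y * y
            z2, rem = divmod(ry, c)
            if rem == 0:
                z = isqrt(z2)
                if z * z == z2:
                    total += 2 if z > 0 else 1  # count both signs of z
    return total

assert count(41, 2, 1, 32) == 16   # |A_41|
assert count(41, 2, 1, 8) == 32    # |B_41|
assert count(43, 2, 1, 32) == 12   # |A_43|
assert count(43, 2, 1, 8) == 12    # |B_43|
```

Since every variable is bounded by a square root of n, the search space is tiny even for large n.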



If n is a congruent number, then 2|An | = |Bn | if n is odd and 2|Cn | = |Dn | if n is even. Moreover, if the Birch and Swinnerton-Dyer conjecture is true for curves of the form (1994.3), then n is a congruent number whenever the corresponding equality holds [7, 9]. How does this help? Since the quantities that define the sets An , Bn , Cn , Dn are nonnegative for all x, y, z ∈ Z, the cardinalities |An |, |Bn |, |Cn |, |Dn | can be found through an exhaustive search. For example, a short computation confirms that |A41 | = 16 and |B41 | = 32. Assuming the Birch and Swinnerton-Dyer conjecture, we conclude (rightly) that 41 is a congruent number. On the other hand, |A43 | = |B43 | = 12, so 43 is not a congruent number. Bibliography [1] AIM, A trillion triangles, https://aimath.org/news/congruentnumbers/. [2] R. Bradshaw, W. B. Hart, D. Harvey, G. Tornaria, and M. Watkins, Congruent number theta coefficients to 1012 , http://homepages.warwick.ac.uk/~masfaw/congruent.pdf. [3] K. Conrad, The congruent number problem, Harvard College Mathematical Review 2 (2008), no. 2, 58–73. [4] S. J. Miller, Extending the Pythagorean formula, talk online at http://youtu.be/ idIHcgapMG4 (slides at https://web.williams.edu/Mathematics/sjmiller/public_html/ math/talks/GeneralizingPythagoras.pdf). [5] National Science Foundation, Mathematical sciences institutes, https://mathinstitutes.org/ institutes/. [6] The On-Line Encyclopedia of Integer Sequences, A003273 (Congruent numbers: positive integers n for which there exists a right triangle having area n and rational sides), https:// oeis.org/A003273. [7] J. B. Tunnell, A classical Diophantine problem and modular forms of weight 3/2, Invent. Math. 72 (1983), no. 2, 323–334, DOI 10.1007/BF01389327. MR700775 [8] Wikipedia, Congruent number, https://en.wikipedia.org/wiki/Congruent_number. [9] Wikipedia, Tunnell’s theorem, https://en.wikipedia.org/wiki/Tunnell’s_theorem.


Fermat's Last Theorem

Introduction

In 1637, Pierre de Fermat wrote the following statement in the margin of his copy of Diophantus's Arithmetica (Figure 1):

Cubum autem in duos cubos, aut quadratoquadratum in duos quadratoquadratos & generaliter nullam in infinitum ultra quadratum potestatem in duos eiusdem nominis fas est dividere cuius rei demonstrationem mirabilem sane detexi. Hanc marginis exiguitas non caperet.

In English, this reads It is impossible to separate a cube into two cubes, or a fourth power into two fourth powers, or in general, any power higher than the second, into two like powers. I have discovered a truly marvelous proof of this, which this margin is too narrow to contain.

Although it appears unlikely that Fermat found a simple and correct proof,1 the conjecture became known as Fermat’s last theorem. In modern terminology it states that if n ≥ 3, then there are no solutions in natural numbers x, y, z to xn + y n = z n .


Although various special cases of Fermat's last theorem were handled over the years, a complete proof remained elusive (in contrast, Fermat's last theorem for polynomials is significantly easier; see the 1981 entry). Many mathematicians, great and small, chipped away, and some proved various special cases. The great David Hilbert excused himself by saying, "Before beginning I should have to put in three years of intensive study, and I haven't that much time to squander on a probable failure" (however, he must have squandered a little time on it, since he found a new proof in the case n = 4). The year 1995 saw the publication of papers by Andrew Wiles (1953– ) [12] and by Richard Taylor (1962– ) and Wiles [11] that finally put Fermat's last theorem to rest. The big announcement came in 1993 during a series of lectures delivered by Wiles at the Isaac Newton Institute in Cambridge. However, a serious issue was soon found that threatened to undermine his proof. He teamed up with Taylor, his former student, and they eventually succeeded in filling the gap. Their work built upon the foundations laid by several generations of mathematicians that connected the problem to the theory of elliptic curves (see the 1921 entry and the comments for the 1956 entry). While Fermat's result has held mathematicians' interest for centuries, the method of proof was at least as important as the final result, since it yielded many important results in active areas of research.

¹ Fermat proved the special case n = 4. If he had been in possession of a complete proof, this would not have been necessary. He probably never thought that people would obsess over a comment he made to himself in the margin of a book.

Figure 1. (left) Fermat found himself unable to write his proof in the space next to Problem II.8 of the 1621 edition of Diophantus's Arithmetica (this is not Fermat's copy). (right) The 1670 edition of Diophantus's Arithmetica, prepared by Clément-Samuel Fermat after the death of his father. The statement of Fermat's last theorem is near the bottom third of the page. Images in the public domain.

Where does one start on such a difficult and imposing problem? First observe that if (x, y, z) ∈ N³ is a solution to (1995.1), then

(x^{n/d})^d + (y^{n/d})^d = (z^{n/d})^d

whenever d divides n. Thus, we obtain solutions in natural numbers to the Fermat equation with exponent d. Since there are solutions to (1995.1) if n = 1 and n = 2, it suffices to show that there are no solutions if n = 4 or if n is an odd prime. This is a significant reduction! The case n = 3 was handled by Euler in 1770, although many independent proofs followed over the years. The case n = 5 was dispatched by Legendre and Dirichlet around 1825. Gabriel Lamé (1795–1870) settled the case n = 7 in 1839, followed shortly thereafter by a proof of Victor-Amédée Lebesgue² (1791–1875).

² Not to be confused with Henri Lebesgue (1875–1941) of measure and integration fame.



Lamé's proof made use of the clever identity

(x + y + z)^7 − (x^7 + y^7 + z^7) = 7(x + y)(x + z)(y + z)[(x² + y² + z² + xy + xz + yz)² + xyz(x + y + z)].

However, such ad hoc methods appeared unlikely to permit the conjecture to be proved for larger and larger odd prime exponents. A major breakthrough occurred in 1849 when Ernst Kummer (1810–1893) proved Fermat's last theorem for so-called "regular" primes. In brief, if p ≥ 3 is prime and ζ_p is a primitive pth root of unity, then the class number of the pth cyclotomic field Q(ζ_p) is a positive integer that measures the extent to which unique prime factorization fails in Z[ζ_p] (we encountered a similar notion in the 1966 entry in the context of imaginary quadratic fields). A prime p is regular if it does not divide the class number of Q(ζ_p). Kummer used Lamé's factorization

z^p − y^p = ∏_{j=0}^{p−1} (z − ζ_p^j y)

and studied the ideals generated by the z − ζ_p^j y in Z[ζ_p]. Kummer also found an elementary characterization of regular primes in terms of the Bernoulli numbers B_n. These are defined by B_0 = 1 and, for n ≥ 1,

∑_{k=0}^{n} \binom{n+1}{k} B_k = 0.   (1995.2)

One can show that B_n = 0 for odd n ≥ 3 and that

t/(e^t − 1) = ∑_{n=0}^{∞} B_n t^n/n!.   (1995.3)

The first few Bernoulli numbers of even index are

B_2 = 1/6,  B_4 = −1/30,  B_6 = 1/42,  B_8 = −1/30,  B_10 = 5/66,  B_12 = −691/2730.

These arise in the computation of the values of the Riemann zeta function at the even positive integers (comments for the 1945 entry):

ζ(2n) = (−1)^{n+1} (2π)^{2n} B_{2n} / (2(2n)!).

Although little is known about the Bernoulli numerators, an 1840 theorem of Karl von Staudt (1798–1867) and Thomas Clausen (1801–1885) tells us a lot about the denominators. They independently showed that

B_{2n} + ∑_{p prime, (p−1)|2n} 1/p ∈ Z,

and hence the denominator of B_{2n} in lowest terms divides ∏_{(p−1)|2n} p. Kummer proved that an odd prime p is regular if and only if p does not divide the numerator of B_n, written in lowest terms, for all even n ≤ p − 3. Although this does not solve Fermat's problem outright, it does permit the rapid verification of the conjecture for certain exponents. Indeed, B_n can be computed readily from either the recurrence (1995.2) or the generating function (1995.3). This permits



one to rapidly determine whether a given prime is regular or not. The first several regular primes are [7]

3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 41, 43, 47, 53, 61, 71, 73, 79, 83, 89, 97, 107, 109, 113, 127, 137, 139, 151, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 239, 241, 251, 269, 277, 281, 313, 317, 331, 337, 349, 359, 367, 373, 383, 397, 419, 431.

Kummer's theorem tells us that Fermat's last theorem is true for these exponents. Unfortunately, it is not known whether infinitely many regular primes exist, although Carl Ludwig Siegel (1896–1981) conjectured that infinitely many exist and, moreover, that they have density e^{−1/2} ≈ 0.60653 as a subset of the primes [8]. On the other hand, there are infinitely many irregular primes, that is, primes for which Kummer's approach to Fermat's last theorem is not applicable. This seems to have first been proved in 1915 by Johan Ludwig Jensen (1859–1925), although many authors cite the 1954 paper of Leonard Carlitz (1907–1999) [2].

A result as monumental as Fermat's last theorem deserves two problems. The first problem below was originally from the 1995 entry, while the second was from the 1949 entry (in the process of converting these entries to a book, we had the opportunity to move and combine some material).

Centennial Problem 1995
Proposed by the students in Frank Morgan's "The Big Questions" class at Williams College (Fall 2008) and Minh-Tam Trinh, Princeton University.

(a) The status of Fermat's last theorem for rational exponents is known [1]. What about real exponents? Are there positive integral solutions to x^r + y^r = z^r for r real? If yes, can you give a nice example?

(b) The following is called Kummer's congruence. If p is prime, h and k are positive even integers not divisible by p − 1, and h ≡ k (mod p − 1), then

B_h/h ≡ B_k/k (mod p).

Use Kummer's congruence and the Clausen–von Staudt theorem to show that if n is a product of irregular primes and 2n < B_2n, then there is an irregular prime p ∤ n. With more work, one can build on this and prove there are infinitely many irregular primes.

1995: Comments

Sophie Germain primes. A prime p is a Sophie Germain prime if 2p + 1 is prime. These are named after Sophie Germain (1776–1831), a remarkable mathematician, physicist, and philosopher. She proved that if p is such a prime, then the only natural number solutions to x^p + y^p = z^p have p|xyz [9]. See [5, Ch. 14] for a circle-method argument (see the 1923 entry) that suggests the number of Sophie Germain primes at most x is asymptotic to C_2 x/log² x, in which C_2 = 0.660161815... is the twin primes constant (1919.4).
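Kummer's criterion is easy to test by computer using the recurrence (1995.2). The following sketch, in exact rational arithmetic, is illustrative code (not from the original entry); it recovers the fact that the first primes missing from the list of regular primes above are 37, 59, and 67.

```python
# Bernoulli numbers via the recurrence (1995.2), in exact arithmetic,
# then Kummer's regularity criterion.
from fractions import Fraction
from math import comb

def bernoulli(N):
    """Return [B_0, ..., B_N] using sum_{k=0}^{n} C(n+1, k) B_k = 0 for n >= 1."""
    B = [Fraction(1)]
    for n in range(1, N + 1):
        B.append(-sum(comb(n + 1, k) * B[k] for k in range(n)) / (n + 1))
    return B

def is_regular(p, B):
    """Kummer: an odd prime p is regular iff p divides no numerator of B_n,
    written in lowest terms, for even n <= p - 3."""
    return all(B[n].numerator % p != 0 for n in range(2, p - 2, 2))

B = bernoulli(70)
assert B[2] == Fraction(1, 6) and B[12] == Fraction(-691, 2730)

odd_primes = [p for p in range(3, 71) if all(p % d for d in range(2, p))]
print([p for p in odd_primes if not is_regular(p, B)])  # -> [37, 59, 67]
```

Fraction keeps every B_n in lowest terms automatically, so the numerator test is exact; for example, 37 divides the numerator of B_32 = −7709321041217/510.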



Solution to (a). The Fermat equation with rational exponents is settled in [1, Thm. 1]: the equation x^{n/m} + y^{n/m} = z^{n/m}, in which m and n are relatively prime natural numbers with n > 2, has solutions in natural numbers x, y, z if and only if x = y = z, m is divisible by 6, and three different sixth roots of unity are used. Let f(t) = 4^t + 5^t − 6^t. Since

4² + 5² > 6²  and  4³ + 5³ < 6³,

f(2) > 0 and f(3) < 0. The intermediate value theorem³ provides r ∈ (2, 3) so that f(r) = 0; that is, 4^r + 5^r = 6^r. In fact, r ≈ 2.48794 and, moreover, this exponent is irrational in light of the theorem above. See [6] for more information about the Fermat equation with real exponents.

³ See the comments for the 1927 and 1944 entries for two more applications of this theorem.

Solution to (b). Choose a prime p that divides the numerator of B_{2n}/(2n), written in lowest terms. The Clausen–von Staudt theorem ensures that (p − 1) ∤ 2n, so the division algorithm gives 2n = (p − 1)q + r, in which 0 < r < p − 1 must be even. Kummer's congruence implies that

B_{2n}/(2n) ≡ B_r/r (mod p).

Thus, p divides the numerator of B_r/r when it is written in lowest terms. Therefore, p divides the numerator of B_r and hence p is irregular.

Bibliography
[1] C. D. Bennett, A. M. W. Glass, and G. J. Székely, Fermat's last theorem for rational exponents, Amer. Math. Monthly 111 (2004), no. 4, 322–329, DOI 10.2307/4145241. MR2057186
[2] L. Carlitz, Note on irregular primes, Proc. Amer. Math. Soc. 5 (1954), 329–331, DOI 10.2307/2032249. MR0061124
[3] K. Devlin, F. Gouvêa, and A. Granville, Fermat's last theorem, a theorem at last, FOCUS, August 1993, 3–5. http://www.dms.umontreal.ca/~andrew/PDF/FLTatlast.pdf.
[4] F. Q. Gouvêa, "A marvelous proof", Amer. Math. Monthly 101 (1994), no. 3, 203–222, DOI 10.2307/2975598. MR1264001
[5] S. J. Miller and R. Takloo-Bighash, An invitation to modern number theory, with a foreword by Peter Sarnak, Princeton University Press, Princeton, NJ, 2006. MR2208019
[6] F. Morgan, Fermat's last theorem for fractional and irrational exponents, College Math. J. 41 (2010), no. 3, 182–185, DOI 10.4169/074683410X488647. MR2656314
[7] Online Encyclopedia of Integer Sequences, A007703 (Regular primes), http://oeis.org/A007703.
[8] C. L. Siegel, Zu zwei Bemerkungen Kummers (German), Nachr. Akad. Wiss. Göttingen Math.-Phys. Kl. II 1964 (1964), 51–57. MR0163899
[9] A. van der Poorten, Notes on Fermat's last theorem, Canadian Mathematical Society Series of Monographs and Advanced Texts, A Wiley-Interscience Publication, John Wiley & Sons, Inc., New York, 1996. MR1373197
[10] S. Singh, Fermat's enigma: The epic quest to solve the world's greatest mathematical problem, with a foreword by John Lynch, Walker and Company, New York, 1997. MR1491363
[11] R. Taylor and A. Wiles, Ring-theoretic properties of certain Hecke algebras, Ann. of Math. (2) 141 (1995), no. 3, 553–572, DOI 10.2307/2118560. MR1333036
[12] A. Wiles, Modular elliptic curves and Fermat's last theorem, Ann. of Math. (2) 141 (1995), no. 3, 443–551, DOI 10.2307/2118559. MR1333035
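Returning to Solution (a): the exponent r with 4^r + 5^r = 6^r can be located numerically by bisection, exactly as the intermediate value theorem suggests. A quick sketch:

```python
# Bisection for the root of f(t) = 4^t + 5^t - 6^t on (2, 3),
# where the intermediate value theorem guarantees a zero.
def f(t):
    return 4.0**t + 5.0**t - 6.0**t

lo, hi = 2.0, 3.0              # f(2) > 0 > f(3)
for _ in range(60):
    mid = (lo + hi) / 2
    if f(mid) > 0:
        lo = mid
    else:
        hi = mid

print(round(lo, 3))  # -> 2.488
```

Sixty halvings shrink the bracket far below double-precision resolution, so lo agrees with r ≈ 2.48794 to machine accuracy.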


Great Internet Mersenne Prime Search (GIMPS)

Introduction

Cramér's probabilistic model of the primes (see the comments to the 1989 entry) predicts that a large natural number n has roughly a 1/log n chance of being prime. This heuristic suggests that the expected number of primes in a set A ⊆ N is

∑_{a ∈ A} 1/log a.

Are there infinitely many primes among the Mersenne numbers M_n = 2^n − 1? These are named after Marin Mersenne (1588–1648), who compiled a (somewhat inaccurate) list of primes M_n with n ≤ 257. Since

∑_{n=2}^{∞} 1/log M_n = ∑_{n=2}^{∞} 1/log(2^n − 1) > ∑_{n=2}^{∞} 1/log 2^n = (1/log 2) ∑_{n=2}^{∞} 1/n

diverges, there is cause for optimism. However, like Treebeard we must not be hasty. A similar computation suggests that there are infinitely many primes of the form 2^n, which is absurd. Some sort of adjustment must be made in order to fine-tune such predictions. We must first address the fact that not all n are treated equally by our sequence. If n = ab and 1 < a, b < n, then (1989.2) provides the factorization

2^n − 1 = 2^{ab} − 1 = (2^a)^b − 1 = (2^a − 1)((2^a)^{b−1} + (2^a)^{b−2} + · · · + 2^a + 1).

Thus, we may restrict our attention to M_p = 2^p − 1, in which p is prime. A prime of this form is called a Mersenne prime. If we update our heuristic argument to reflect this restriction, we obtain a sum over the primes

∑_p 1/log M_p > (1/log 2) ∑_p 1/p,

which diverges (a famous result of Euler; see the comments for the 1913 entry). Perhaps there are infinitely many Mersenne primes? The values of p ≤ 1,000 that produce Mersenne primes are

2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 607,

and there are currently (as of mid-2018) fifty known Mersenne primes. While the search continues, it remains an open problem whether the number of Mersenne primes is infinite. Even widely accepted conjectures, such as the Bateman–Horn



conjecture (see the comments for the 2005 entry), are not refined enough to handle the distribution of primes arising from nonpolynomial functions, such as 2^n − 1. For many years,

2^127 − 1 = 170,141,183,460,469,231,731,687,303,715,884,105,727

was the largest known prime. It was shown to be prime by Édouard Lucas (1842–1891) in 1876 and it will forever remain the largest prime ever found without the use of a computer. The status of M_67, however, remained in doubt until 1903. Mersenne claimed that it was prime, but Lucas proved that this is not the case. However, he was unable to produce any of its factors. The following curious anecdote concerns Frank Nelson Cole (1861–1926), after whom the prestigious Cole Prizes in algebra and number theory are named:

At a mathematical meeting in New York in 1903, F. N. Cole walked on to the platform and, without saying a single word, wrote two large numbers on the blackboard. He multiplied them out in longhand, and equated the result to 2^67 − 1. (Subsequently, in private, Cole said that those few minutes at the blackboard had cost him three years of Sundays.) So Mersenne was wrong about his ninth case: p = 67 does not yield a prime number. . . . [5]

Perhaps Cole would be disappointed to know that his factorization

M_67 = 147,573,952,589,676,412,927 = 193,707,721 × 761,838,257,287

can be found on a late-2013 desktop computer in less than 0.002 seconds! In 1996, George Woltman (1957– ) started the Great Internet Mersenne Prime Search (GIMPS) project.¹ This distributed computing project operates on thousands of participating computers around the world. Since its inception, every new Mersenne prime that has been discovered was discovered by GIMPS. As of mid-2018, the largest known prime is M_{77,232,917}, discovered in late 2017 by the GIMPS program (see below), which has 23,249,425 digits. Anyone with a computer can join in the effort to find new Mersenne primes and there is a monetary reward for doing so: the Electronic Frontier Foundation (EFF) offers prizes of

• $150,000 to the first individual or group who discovers a prime number with at least 100,000,000 decimal digits;
• $250,000 to the first individual or group who discovers a prime number with at least 1,000,000,000 decimal digits [4].

Why is so much focus placed on Mersenne primes? Special algorithms and the binary nature of computer architecture make Mersenne numbers particularly tractable. The factoring code works in three phases to determine whether M_p = 2^p − 1 is prime. First, one eliminates all possible small factors. This relies on the fact that any factor of a Mersenne number M_p must be of the form f = 2kp + 1 with f ≡ 1, 7 (mod 8). This eliminates about 95% of the potential factors. Once any potential small factors have been ruled out, GIMPS turns to the Lucas–Lehmer primality test (see the comments below).

¹ Not to be confused with the popular GNU Image Manipulation Program (GIMP), an open-source alternative to Photoshop.
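The search Cole carried out by hand can be replayed using the factor-shape fact just quoted: every candidate factor of M_67 has the form f = 2k · 67 + 1 with f ≡ 1 or 7 (mod 8). A brute-force sketch:

```python
# Recovering Cole's factorization of M_67 = 2^67 - 1 by trying only
# candidates of the form f = 2kp + 1 with f = 1 or 7 (mod 8).
p = 67
M = 2**p - 1

k = 0
while True:
    k += 1
    f = 2 * k * p + 1
    if f % 8 in (1, 7) and M % f == 0:
        break

print(f, M // f)  # -> 193707721 761838257287
```

The loop stops at k = 1,445,580 after roughly a million divisions, which a modern machine does in well under a second; this is exactly the speedup over naive trial division that the 95% elimination in the text describes.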



We have far too much to say about Mersenne primes to fit in one entry; see the comments for the 1997 entry for more information.

Centennial Problem 1996
Proposed by Steven J. Miller and Pamela Mishkin, Williams College.

This was a particularly hard year for which to choose a single problem. It was neck and neck between GIMPS and Google's PageRank algorithm, so here are two problems:

(a) See [1] for one of the earliest papers on PageRank. Fittingly, we leave it as an exercise to navigate the internet and find out more about PageRank.

(b) Find a new Mersenne prime. One approach to this problem is to visit https://www.mersenne.org/, download GIMPS, and let your computer search for Mersenne primes in the background.

1996: Comments

Lucas–Lehmer primality test. Trial division is impractical for determining whether a large number n is prime. It suffices to check for prime factors at most √n, since in any factorization n = ab, not both factors can be larger than √n. If n ≈ 10^500, then we would need to divide n by every prime at most 10^250. The prime number theorem tells us that there are approximately 1.74 × 10^247 such primes. How bad is this? To put this in perspective, there are about 10^82 atoms in the observable universe [10]. If each atom were a universe itself, each atom of which was actually a supercomputer capable of 10^20 divisions per second and running since the big bang (13.82 billion years ago), we would have only completed

10^82 × 10^82 × 10^20 × (13.82 × 10^9 × 365 × 24 × 60 × 60) ≈ 4.36 × 10^201

trial divisions. So how can we possibly know, with absolute certainty, that a given Mersenne number is truly prime? The Lucas–Lehmer primality test, developed by Lucas in 1876 and subsequently refined by Derrick Henry Lehmer (1905–1991), is an efficient way to test Mersenne numbers for primality. If p is an odd prime and

s_0 = 4  and  s_i = s_{i−1}² − 2 for i ≥ 1,

then M_p is prime if and only if s_{p−2} ≡ 0 (mod M_p).
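The test as just stated is only a few lines of code. A minimal sketch (not GIMPS's highly optimized implementation) that recovers the list of Mersenne prime exponents given earlier in this entry, up to 127:

```python
# Lucas-Lehmer test: for an odd prime p, M_p = 2^p - 1 is prime
# iff s_{p-2} = 0 (mod M_p), where s_0 = 4 and s_i = s_{i-1}^2 - 2.
def lucas_lehmer(p):
    M = 2**p - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % M       # reduce modulo M_p at every step
    return s == 0

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n**0.5) + 1))

exponents = [p for p in range(3, 130) if is_prime(p) and lucas_lehmer(p)]
print(exponents)  # -> [3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127]
```

Reducing modulo M_p at every step keeps the numbers from exploding; this is the "repeated squaring in modular arithmetic" point made in the next paragraph.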
Fortunately, repeated squaring can be performed rapidly in modular arithmetic, especially when powers of 2 are involved. The ability to exponentiate huge numbers quickly is one of the key reasons why the RSA cryptosystem is practical; see the 1977 entry and [3].

Mersenne almost primes. We know that n must be prime for M_n to be prime. For example, M_2, M_3, M_5, and M_7 are prime. However, M_11 = 2,047 = 23 · 89 is the product of exactly two primes. This leads to a natural question: how often is M_n the product of exactly two primes? More generally, for a fixed k, how often is M_n the product of exactly k primes? There has been some progress on these and related questions. For example, one can show that if M_n has exactly two distinct prime factors, then n = 4 or 6, or there is an odd prime p such that n = p



or n = p². Try to prove this; see [6] for a proof, as well as a characterization for three distinct prime factors.

The Sokal affair. The year 1996 marks the publication of the landmark paper Transgressing the Boundaries: Towards a Transformative Hermeneutics of Quantum Gravity by physicist Alan Sokal [8]. With such a lofty title, one might expect the article to have deep philosophical reflections about the potential unification of quantum mechanics and gravity, long considered a "holy grail" in physics. It contains nothing of the sort but is instead composed of rambling and largely nonsensical passages such as:

More recently, Lacan's topologie du sujet has been applied fruitfully to cinema criticism and to the psychoanalysis of AIDS. In mathematical terms, Lacan² is here pointing out that the first homology group of the sphere is trivial, while those of the other surfaces are profound; and this homology is linked with the connectedness or disconnectedness of the surface after one or more cuts. Furthermore, as Lacan suspected, there is an intimate connection between the external structure of the physical world and its inner psychological representation qua knot theory: this hypothesis has recently been confirmed by Witten's derivation of knot invariants (in particular the Jones polynomial) from three-dimensional Chern–Simons quantum field theory.

This load of fetid dingo's kidneys was published by Social Text, a leading journal in postmodern cultural studies.³ What was Sokal's motivation for this prank? He provides the following explanation on his website [9]:

For some years I've been troubled by an apparent decline in the standards of intellectual rigor in certain precincts of the American academic humanities. . . . So, to test the prevailing intellectual standards, I decided to try a modest (though admittedly uncontrolled) experiment: Would a leading North American journal of cultural studies. . . publish an article liberally salted with nonsense if (a) it sounded good and (b) it flattered the editors' ideological preconceptions? The answer, unfortunately, is yes. . . . Throughout the article, I employ scientific and mathematical concepts in ways that few scientists or mathematicians could possibly take seriously. . . . I assert that Lacan's psychoanalytic speculations have been confirmed by recent work in quantum field theory. Even nonscientist readers might well wonder what in heavens' [sic] name quantum field theory has to do with psychoanalysis; certainly my article gives no reasoned argument to support such a link. . . . In sum, I intentionally wrote the article so that any competent physicist or mathematician (or undergraduate physics or math major) would realize that it is a spoof. Evidently the editors of Social Text felt comfortable publishing an article on quantum physics without bothering to consult anyone knowledgeable in the subject.

² See the comments for the 1991 entry for more about Jacques Lacan.
³ Continuing the holy grail theme, one might say that the editors of the journal "chose poorly."



Bibliography
[1] S. Brin and L. Page, The anatomy of a large-scale hypertextual Web search engine, Computer Networks and ISDN Systems 30 (1998), 107–117. http://infolab.stanford.edu/~backrub/google.html.
[2] C. K. Caldwell, Mersenne Primes: History, Theorems and Lists, http://primes.utm.edu/mersenne/index.html.
[3] M. Cozzens and S. J. Miller, The mathematics of encryption: An elementary introduction, Mathematical World, vol. 29, American Mathematical Society, Providence, RI, 2013. MR3098499
[4] Electronic Frontier Foundation, EFF Cooperative Computing Awards, https://www.eff.org/awards/coop.
[5] N. Gridgeman, The search for perfect numbers, New Scientist 334 (1963), 86–88.
[6] A. Lemos and A. Cambraia Junior, On the number of prime factors of Mersenne numbers, http://arxiv.org/abs/1606.08690.
[7] GIMPS homepage, http://www.mersenne.org/.
[8] A. D. Sokal, Transgressing the Boundaries: Toward a Transformative Hermeneutics of Quantum Gravity, Social Text (1996), no. 46/47, 217–252.
[9] A. D. Sokal, A Physicist Experiments With Cultural Studies, http://www.physics.nyu.edu/faculty/sokal/lingua_franca_v4/lingua_franca_v4.html.
[10] Universe Today, How many atoms are there in the universe?, https://www.universetoday.com/36302/atoms-in-the-universe/.
[11] Wikipedia, Lucas–Lehmer primality test, https://en.wikipedia.org/wiki/Lucas-Lehmer_primality_test.
[12] Wikipedia, Mersenne prime, https://en.wikipedia.org/wiki/Mersenne_prime.
[13] Wikipedia, Sokal affair, https://en.wikipedia.org/wiki/Sokal_affair.


The Nobel Prize of Merton and Scholes

Introduction

In addition to applications in the physical sciences, mathematics plays a key role in many other fields, including economics and finance. While there is no true Nobel Prize in Economics, since 1968 the Royal Swedish Academy of Sciences has awarded the Bank of Sweden Prize in Economic Sciences in Memory of Alfred Nobel. This is widely regarded as the "Nobel Prize in Economics" by the general public. The award announcement from 1997 [2] begins:

Robert C. Merton [1944– ] and Myron S. Scholes [1941– ] have, in collaboration with the late Fischer Black [1938–1995], developed a pioneering formula for the valuation of stock options. Their methodology has paved the way for economic valuations in many areas. It has also generated new types of financial instruments and facilitated more efficient risk management in society.

Sadly, Black passed away before the announcement and did not receive the award since it is not given posthumously. They begin with a stochastic model

dS/S = μ dt + σ dW,

Figure 1. Brownian motion of a particle in one spatial dimension (vertical axis). The horizontal axis represents time.



in which S is the stock price at time t and W is a Wiener process, that is, Brownian motion (Figure 1). Intuitively, this says that the infinitesimal rate of return on S has expected value μ dt and variance σ² dt. From here, one makes a few reasonable assumptions, performs a number of manipulations, and deduces that

∂V/∂t + (1/2)σ²S² ∂²V/∂S² + rS ∂V/∂S − rV = 0,

in which V is the option price function. This is the famed Black–Scholes equation, which can be solved numerically when given suitable boundary conditions [1]. When one considers the trillions of dollars traded annually in the global economy, the impact and importance of such mathematics is clear. See the 1962 entry and the notes below for another connection between mathematics and Nobel Prize-winning economics applications. No note on applications of mathematics in finance would be complete without a mention of the dangers of using formulas in regimes in which they are not known to hold. Famed investor Warren Buffett (1930– ) said in 2008:

I believe the Black–Scholes formula, even though it is the standard for establishing the dollar liability for options, produces strange results when the long-term variety are being valued. . . . The Black–Scholes formula has approached the status of holy writ in finance . . . . If the formula is applied to extended time periods, however, it can produce absurd results. In fairness, Black and Scholes almost certainly understood this point well. But their devoted followers may be ignoring whatever caveats the two men attached when they first unveiled the formula.
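Under the standard boundary conditions for a European call option, the Black–Scholes equation has a well-known closed-form solution. A sketch follows; the parameter values at the bottom are purely illustrative.

```python
# European call and put prices from the Black-Scholes closed-form solution.
# S: spot price, K: strike, T: years to expiry, r: risk-free rate,
# sigma: volatility.  Phi is the standard normal CDF, via erf.
from math import log, sqrt, exp, erf

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def call_price(S, K, T, r, sigma):
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

def put_price(S, K, T, r, sigma):
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return K * exp(-r * T) * norm_cdf(-d2) - S * norm_cdf(-d1)

# Sanity check: put-call parity, C - P = S - K e^{-rT}.
S, K, T, r, sigma = 100.0, 105.0, 0.5, 0.02, 0.25
assert abs(call_price(S, K, T, r, sigma) - put_price(S, K, T, r, sigma)
           - (S - K * exp(-r * T))) < 1e-9
```

The put-call parity check is a useful internal consistency test: it holds exactly for the formulas, independent of the (hypothetical) parameter values chosen.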

For a description of the faulty mathematics and incorrect assumptions that helped instigate the "great recession," see [3].

Centennial Problem 1997
Proposed by Steven J. Miller, Williams College.

The density of a normal random variable with mean μ and variance σ² is

f_{μ,σ}(x) = (1/√(2πσ²)) e^{−(x−μ)²/(2σ²)}.

A key ingredient in applications of the Black–Scholes model is the corresponding cumulative distribution function¹

F_{μ,σ}(x) = ∫_{−∞}^{x} f_{μ,σ}(t) dt.

Unfortunately, there is no closed-form expression for this function. Find a rapidly convergent series expansion for F_{μ,σ}(x).

¹ This is related to the error function by F_{μ,σ}(x) = (1/2)[1 + erf((x − μ)/(σ√2))].

1997: Comments

Solution to the problem. For simplicity, we assume that μ = 0 and σ = 1. Hence we must compute

F(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−t²/2} dt;




see Figure 2. A natural approach is to use the series expansion for the exponential function:

F(x) = ∫_{−∞}^{x} (1/√(2π)) ∑_{n=0}^{∞} (−1)^n t^{2n}/(2^n n!) dt = ∑_{n=0}^{∞} (−1)^n/(√(2π) 2^n n!) ∫_{−∞}^{x} t^{2n} dt.

However, this interchange of integration and summation is not permissible, since each integral ∫_{−∞}^{x} t^{2n} dt is infinite! How do we work around this problem? The symmetry of f about the origin ensures that F(0) = 1/2. If we let

G(x) = ∫_{0}^{x} (1/√(2π)) e^{−t²/2} dt,

then

F(x) = 1/2 + G(x) if x ≥ 0,  and  F(x) = 1/2 − G(|x|) if x ≤ 0.

We can write this more compactly as

F(x) = 1/2 + sgn(x)G(|x|),

in which

sgn(x) = 1 if x > 0,  0 if x = 0,  −1 if x < 0,

is the sign function.

Figure 2. Graphs of f = f_{0,1} and F = F_{0,1}. The density f reaches its peak at f(0) = 1/√(2π) = 0.3989.... The cumulative distribution function F satisfies F(0) = 1/2, which reflects the symmetry of f about the origin.

Since each integrand that appears is nonnegative and has finite integral, the Fubini–Tonelli theorem implies that

G(x) = ∫_{0}^{x} (1/√(2π)) ∑_{n=0}^{∞} (−1)^n t^{2n}/(2^n n!) dt
     = ∑_{n=0}^{∞} (−1)^n/(√(2π) 2^n n!) ∫_{0}^{x} t^{2n} dt
     = ∑_{n=0}^{∞} (−1)^n x^{2n+1}/(√(2π) 2^n (2n+1) n!).

For any fixed x, the series converges rapidly due to the factorial in the denominator.
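The series just derived is easy to implement. A sketch, checked against the error-function expression mentioned in the footnote to the problem statement:

```python
# The series solution: F(x) = 1/2 + sgn(x) G(|x|), where
# G(y) = sum_{n>=0} (-1)^n y^(2n+1) / (sqrt(2 pi) 2^n (2n+1) n!).
from math import sqrt, pi, erf

def normal_cdf(x, terms=80):
    y = abs(x)
    total, power, factor = 0.0, y, 1.0    # power = y^(2n+1), factor = 2^n n!
    for n in range(terms):
        total += (-1)**n * power / (factor * (2 * n + 1))
        power *= y * y                    # y^(2n+1) -> y^(2n+3)
        factor *= 2 * (n + 1)             # 2^n n! -> 2^(n+1) (n+1)!
    G = total / sqrt(2 * pi)
    return 0.5 + G if x >= 0 else 0.5 - G

# Compare with the closed error-function form F(x) = (1/2)[1 + erf(x/sqrt(2))].
for x in [-3, -1, -0.5, 0, 0.5, 1, 3]:
    assert abs(normal_cdf(x) - 0.5 * (1 + erf(x / sqrt(2)))) < 1e-12
```

Updating the term iteratively avoids recomputing powers and factorials, and the factorial growth in the denominator makes 80 terms far more than enough for moderate x.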

The Leontief input-output model. A much simpler economic model that also won a Nobel Prize in Economics (1973) was developed by Wassily Leontief (1906–1999) around 1949. Consider an economy that consists of n sectors, S_1, S_2, . . . , S_n. Each sector interacts with the others in complicated ways. This can be quantified with a detailed economic study and assembled in a consumption matrix, an n × n matrix C = [c_1 c_2 . . . c_n] whose (i, j) entry C_{i,j} is the amount S_j consumes from S_i in order to produce one unit of output. The column vector c_j contains the demands of the jth sector required to produce one unit of output. In addition to the sectors S_1, S_2, . . . , S_n, suppose that there is another part of the economy, the open sector, that only consumes. It might represent consumer demand, government consumption, surplus production, exports, and so forth. Let d ∈ R^n be the final demand vector, which lists the amounts demanded from S_1, S_2, . . . , S_n by the open sector. Is there a production vector x ∈ R^n that lists the outputs x_1, x_2, . . . , x_n of sectors S_1, S_2, . . . , S_n for one year so that the amounts produced balance the total demand for that production? Since the sectors all interact with each other every step of the way, the relationship between the final demand and production targets is complicated:

amount produced = intermediate demand + final demand.

The intermediate demands upon each sector are given by

x_1 c_1 + x_2 c_2 + · · · + x_n c_n = Cx.

Thus,

x = Cx + d.

This is equivalent to the system of linear equations (I − C)x = d, which can be solved by any number of well-known numerical methods.² It may come as some surprise that Leontief was awarded the highest prize in economics for setting up the type of problem covered during the first day of an elementary linear algebra course (see the 1940 entry for a story in a similar vein). Although the underlying idea is embarrassingly simple, Leontief's application involved 500 sectors and an enormous amount of data collected from the U.S. Bureau of Labor Statistics. Back in 1949, the solution of a 500 × 500 system of linear equations required cutting-edge technology.

Perfect numbers. Let us be honest. We like number theory more than mathematical economics. There was too much to be said about Mersenne primes in our 1996 entry, so we have appropriated some space here to continue the discussion. The Pythagoreans regarded the number 6 as special because it equals the sum of its proper divisors: 1 + 2 + 3 = 6. The next largest numbers with this property are 28, 496, and 8,128, since

28 = 1 + 2 + 4 + 7 + 14,
496 = 1 + 2 + 4 + 8 + 16 + 31 + 62 + 124 + 248,
8,128 = 1 + 2 + 4 + 8 + 16 + 32 + 64 + 127 + 254 + 508 + 1,016 + 2,032 + 4,064.

One of the cornerstones of Pythagorean philosophy was the assignment of mystical qualities to numbers. They called numbers like 6, 28, 496, and 8,128 perfect numbers. Later thinkers like Augustine of Hippo (354–430) and Alcuin of York (ca. 735–804) celebrated the special nature of perfect numbers. In the City of God (Part XI, Chapter 30), Augustine writes:

These works are recorded to have been completed in six days (the same day being six times repeated), because six is a perfect number,—not because God required a protracted time, as if He could not at once create all things, which then should mark the course of time by the movements proper to them, but because the perfection of the works was signified by the number six.
For the number six is the first which is made up of its own parts, i.e., of its sixth, third, and half, which are respectively one, two, and three, and which make a total of six. . . . And, therefore, we must not despise the science of numbers, which, in many passages of holy Scripture, is found to be of eminent service to the careful interpreter.

The fact that it takes twenty-eight days for the moon to travel around the Earth was also seen by many early thinkers to confirm the importance of perfect numbers.

2 The QR-decomposition (see the 1959 entry) is particularly effective here. Write I − C = QR, in which Q is an orthogonal matrix and R is upper triangular. The given system QRx = d is equivalent to Rx = Q^T d, which has an upper-triangular coefficient matrix and hence can be solved via back substitution. This approach is more stable than Gaussian elimination, which is typically promoted in a first course on linear algebra.
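The linear-algebra step is easy to carry out in practice. Here is a small numerical sketch in Python, with a made-up three-sector consumption matrix (the numbers are purely illustrative, not Leontief's data), solving (I − C)x = d by the QR route described in the footnote:

```python
import math

# Hypothetical 3-sector consumption matrix C: entry C[i][j] is the amount of
# sector i's output consumed in producing one unit of sector j's output.
C = [[0.2, 0.3, 0.1],
     [0.1, 0.1, 0.4],
     [0.3, 0.2, 0.1]]
d = [50.0, 30.0, 20.0]  # external (consumer) demand
n = 3

# Form A = I - C.
A = [[(1.0 if i == j else 0.0) - C[i][j] for j in range(n)] for i in range(n)]

# Classical Gram-Schmidt QR: the columns of Q are the orthonormalized columns of A.
Q = [[0.0] * n for _ in range(n)]
R = [[0.0] * n for _ in range(n)]
for j in range(n):
    v = [A[i][j] for i in range(n)]
    for k in range(j):
        R[k][j] = sum(Q[i][k] * A[i][j] for i in range(n))
        v = [v[i] - R[k][j] * Q[i][k] for i in range(n)]
    R[j][j] = math.sqrt(sum(vi * vi for vi in v))
    for i in range(n):
        Q[i][j] = v[i] / R[j][j]

# Solve Rx = Q^T d by back substitution (R is upper triangular).
qtd = [sum(Q[i][k] * d[i] for i in range(n)) for k in range(n)]
x = [0.0] * n
for i in reversed(range(n)):
    x[i] = (qtd[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))) / R[i][i]

# Check: (I - C)x should reproduce the demand vector d.
residual = [sum(A[i][j] * x[j] for j in range(n)) - d[i] for i in range(n)]
assert all(abs(r) < 1e-9 for r in residual)
```

Because the column sums of this C are less than 1 (each sector is "productive"), the solution x comes out with all entries positive, as an economic interpretation requires.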



In Book IX (Proposition 36) of the Elements, Euclid proved that if 2^k − 1 is prime, then n = 2^{k−1}(2^k − 1) is a perfect number. For example,

2^2 − 1 = 3 is prime and 2^1(2^2 − 1) = 2 · 3 = 6 is perfect,
2^3 − 1 = 7 is prime and 2^2(2^3 − 1) = 4 · 7 = 28 is perfect,
2^5 − 1 = 31 is prime and 2^4(2^5 − 1) = 16 · 31 = 496 is perfect.
Over 2,000 years later, Euler proved the converse: every even perfect number is of Euclid’s form. Thus, an even number n is perfect if and only if n = 2^{k−1}(2^k − 1), in which 2^k − 1 is prime. What about odd perfect numbers? It is known that an odd perfect number must be larger than 10^{1,500} and that it must have at least ten distinct prime factors. As James Joseph Sylvester (1814–1897) noted:

. . . the existence of [an odd perfect number]—its escape, so to say, from the complex web of conditions which hem it in on all sides—would be little short of a miracle.
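Euclid's construction and Euler's converse are easy to check numerically. The following Python sketch recovers the four perfect numbers above directly from the Mersenne primes 3, 7, 31, and 127, and then verifies them against the definition:

```python
def proper_divisor_sum(n):
    # Sum of the divisors of n that are strictly less than n.
    return sum(d for d in range(1, n // 2 + 1) if n % d == 0)

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

# Euclid's construction: if 2^k - 1 is prime, then 2^(k-1) * (2^k - 1) is perfect.
euclid = [2 ** (k - 1) * (2 ** k - 1) for k in range(2, 8) if is_prime(2 ** k - 1)]
print(euclid)  # k = 2, 3, 5, 7 give [6, 28, 496, 8128]

# Direct check against the definition (by Euler, these are ALL even perfect numbers).
assert all(proper_divisor_sum(n) == n for n in euclid)
```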

Since Sylvester’s time, many more obscure restrictions upon odd perfect numbers have emerged. For example, they must be congruent to 1 (mod 12), 117 (mod 468), or 81 (mod 324) and cannot be divisible by 105 [5, 6]. Most mathematicians believe that odd perfect numbers do not exist, although we remain unable to prove it.

Bibliography
[1] J. Fogler, Options Pricing: Black–Scholes Model, https://www.investopedia.com/university/options-pricing/black-scholes-model.asp.
[2] The Royal Swedish Academy of Sciences, Press Release (October 14, 1997), http://www.nobelprize.org/nobel_prizes/economic-sciences/laureates/1997/press.html.
[3] F. Salmon, Recipe for disaster: the formula that killed Wall Street, Wired, February 23, 2009, https://www.wired.com/2009/02/wp-quant/.
[4] Wikipedia, Black–Scholes model, https://en.wikipedia.org/wiki/Black-Scholes_model.
[5] Wikipedia, Perfect number, https://en.wikipedia.org/wiki/Perfect_number.
[6] Wolfram MathWorld, Odd perfect number, http://mathworld.wolfram.com/OddPerfectNumber.html.


The Kepler Conjecture

Introduction

What is the densest way to pack spheres into n-dimensional space? In one dimension, each sphere is a line segment of length two and hence the densest packing consists of infinitely many line segments placed end to end. Thus, the packing density in one dimension is 1. In two dimensions the problem is somewhat harder. Here the “spheres” are disks of radius one. Joseph-Louis Lagrange (1736–1813) proved in 1773 that the hexagonal lattice packing (see Figure 1) is the densest possible lattice-based sphere packing in the plane. Its density is

π√3/6 ≈ 0.9069,

so about 90.7% of the plane is covered. Although Axel Thue had provided a flawed proof back in 1890, a complete proof that the hexagonal lattice packing is the densest of all possible packings, including irregular, non-lattice-based packings, came only in 1940, when it was established by László Fejes Tóth (1915–2005) [14].

Figure 1. (left) The densest sphere packing in two dimensions is the hexagonal lattice (honeycomb) packing. It covers approximately 90.7% of the plane. (right) The square lattice packing has density π/4 ≈ 0.7854, so only around 78.5% of the plane is covered.



In 1611, Johannes Kepler (1571–1630) conjectured that the densest packing of identical spheres in three-dimensional space has density

π/(3√2) ≈ 0.74048;  (1998.1)

that is, the spheres occupy about 74.05% of the available space. This is the famed Kepler conjecture. What made Kepler think of the number (1998.1)? There are two familiar sphere packings in three dimensions: the hexagonal close and cubic close packings; see Figure 2. Both of these packings have density equal to (1998.1) and it seems impossible to do better.1 Kepler was aware of the cubic close packing and conjectured that its density cannot be beaten [9, 10]. The hexagonal close packing was only identified as a different packing by William Barlow (1845–1934) in 1883 [2]. The problem was brought to Kepler’s attention by Thomas Harriot (ca. 1560–1621), who had been asked by Walter Raleigh (1554–1618) about the best way to stack cannonballs; see Figure 3. The problem was posed earlier (1611) than Fermat’s last theorem (1637) and was solved only shortly after it, so it stood as an open, active problem for a longer period of time. A proof of the conjecture was announced by Thomas C. Hales and his student Samuel P. Ferguson in 1998 (see [13, 15] for summaries of the key ideas). Although it required a large number of computer-assisted computations, the proof did not spark nearly the level of philosophical debate that the proof of the four color theorem did over two decades earlier (see the 1976 entry).

[T]he proof was a 300-page monster that took 12 reviewers four years to check for errors. Even when it was published in the journal Annals of Mathematics in 2005, the reviewers could say only that they were “99 per cent certain” the proof was correct. [1]
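These densities can be recovered from the corresponding fundamental cells. A short Python sketch (the cell choices are the standard ones: a hexagonal-lattice cell holding one unit disk, and a cubic cell of side 2√2 holding four unit spheres):

```python
import math

# Hexagonal packing of unit disks: the fundamental cell spanned by (2, 0) and
# (1, sqrt(3)) has area 2*sqrt(3) and contains exactly one disk of area pi.
hex_density = math.pi / (2 * math.sqrt(3))          # equals pi*sqrt(3)/6
print(round(hex_density, 4))                        # 0.9069

# Square lattice packing of unit disks: cell of area 4, one disk.
sq_density = math.pi / 4
print(round(sq_density, 4))                         # 0.7854

# Cubic close (face-centered cubic) packing of unit spheres: a cubic cell of
# side 2*sqrt(2) contains 4 spheres (8 corner octants plus 6 face halves).
fcc_density = 4 * (4 * math.pi / 3) / (2 * math.sqrt(2)) ** 3
print(round(fcc_density, 5))                        # 0.74048
assert abs(fcc_density - math.pi / (3 * math.sqrt(2))) < 1e-12
```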

Although the final paper was eventually published in a top peer-reviewed journal [4], the entire process prompted an important question. How does one referee an argument where a significant amount of the argument is the result of running tens of thousands of lines of code? To address this, Hales began a collaborative project in 2003 to create a formal proof verifiable through automated proof checking software. Called Project Flyspeck (the “F,” “P,” and “K” stand for a “Formal Proof of Kepler”), it was successfully completed in 2014:

So in 2003, Hales started the Flyspeck project, an effort to vindicate his2 proof through formal verification. His team used two formal proof software assistants called Isabelle and HOL Light, both of which are built on a small kernel of logic that has been intensely scrutinised for any errors—this provides a foundation which ensures the computer can check any series of logical statements to confirm they are true . . . the Flyspeck team announced they had finally translated the dense mathematics of Hales’s proof into computerised form, and verified that it is indeed correct.

1 There are uncountably many packings that do just as well: study the key difference between the two packings in Figure 2 and see if you can use it to build more packings of the same density.
2 Actually, the proof in the Flyspeck project involves a different local inequality based on later work of Christian Marchal [12]. In converting the proof ideas to formal form, Hales took advantage of this to get a local inequality that was cleaner and easier to prove by computer [11].


[Figure 2 panels: hexagonal close packing (above and front views); cubic close packing (above and front views).]

Figure 2. The cubic close and hexagonal close packings both have density π/(3√2) ≈ 0.74048. The difference between the two packings is in the relative orientation of every other layer. The spheres in the hexagonal packing lie directly above the spheres two layers below. The spheres in the cubic close packing do not: consider the relative orientation of the green and blue triangles suggested by the top and bottom layers.

“This technology cuts the mathematical referees out of the verification process,” says Hales. “Their opinion about the correctness of the proof no longer matters.” [1]

Centennial Problem 1998

Proposed by Jeffrey Lagarias, University of Michigan.

For 1 ≤ n ≤ 20, determine the minimal side length R(n) of a cube in which one can completely pack n unit-radius spheres. If you cannot get exact answers, determine upper and lower bounds.



(a) A cubic close packing of cannonballs at Fort Monroe in Hampton, Virginia, in 1861 (image public domain).

(b) Snowballs packed in hexagonal close (front) and cubic close packings (rear) (image public domain).

Figure 3. Packings of cannonballs and snowballs.

[Figure 4(a) marks the points (0, 0), (1/√2, 1/√2), and (1, 1): the center, the point where the diagonal toward the corner meets the unit circle, and the corner itself.]

(a) In three dimensions, the sphere occupies approximately 52.36% of the box that contains it.

(b) How does the distance from the corner of the cube to the nearest point of the sphere change as the dimension increases?

Figure 4. What proportion of an n-dimensional cube with side length 2 is taken up by the n-dimensional unit sphere?

1998: Comments

Cubes and spheres. What fraction of the n-dimensional cube (with sides of length 2) is taken up by the n-dimensional unit sphere? In two dimensions the area of the circle is π, giving a ratio of π/4 ≈ 0.785398, while in three dimensions the volume of the sphere is 4π/3, giving a ratio of π/6 ≈ 0.523599; see Figure 4(a). One can show that in n dimensions the sphere has volume

V_n = π^{n/2} / Γ(n/2 + 1),

in which

Γ(s) = ∫_0^∞ e^{−x} x^{s−1} dx,  Re s > 0,

is the gamma function. For positive integers n, we have Γ(n) = (n − 1)! and, for odd positive integers n,

Γ(n/2) = √π (n − 2)!! / 2^{(n−1)/2},

in which n!! denotes the product of every other term of the corresponding factorial. For example, 6!! = 6 · 4 · 2 and 7!! = 7 · 5 · 3 · 1. Using Stirling’s formula (see the comments for the 1934 entry)

n! ≈ n^n e^{−n} √(2πn),

it follows that the ratio

r(n) = (π^{n/2} / Γ(n/2 + 1)) / 2^n

of the volumes of the n-dimensional sphere and cube tends to zero rapidly; see Table 1. Thus, in higher dimensions the sphere occupies very little of the cube. How can this be? Our low-dimensional intuition misleads us in higher dimensions. For example, the point

(1/√n)(1, 1, . . . , 1) ∈ R^n

lies on the n-dimensional sphere. Its distance to the corner (1, 1, . . . , 1) of the n-dimensional cube is

√( Σ_{i=1}^n (1 − 1/√n)^2 ) = √n (1 − 1/√n) = √n − 1,

which tends to infinity! This unexpected behavior is not evident in Figure 4(b).

Table 1. The ratio of the volume of the n-dimensional sphere to the n-dimensional cube tends to zero rapidly as n tends to infinity.

n  | r(n)           | r(n) approx
1  | 1              | 1
2  | π/4            | 0.785398
3  | π/6            | 0.523599
4  | π^2/32         | 0.308425
5  | π^2/60         | 0.164493
6  | π^3/384        | 0.0807455
7  | π^3/840        | 0.0369122
8  | π^4/6144       | 0.0158543
9  | π^4/15120      | 0.0064424
10 | π^5/122880     | 0.00249039
11 | π^5/332640     | 0.000919973
12 | π^6/2949120    | 0.000325992
13 | π^6/8648640    | 0.000111161
14 | π^7/82575360   | 0.0000365762
15 | π^7/259459200  | 0.0000116407

Remark on the problem. One can show that R(1) = 2, and we think that R(2) = 2 + 2/√3. Then things rapidly get tricky. There are some n ≤ 20 for which the exact answer is unknown. Some records for 1 ≤ n ≤ 32 are in [3, 8].
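The volume ratio and the corner-distance computation above are easy to check numerically. A short Python sketch using math.gamma:

```python
import math

def r(n):
    # Ratio of the volume of the n-dimensional unit sphere,
    # pi^(n/2) / Gamma(n/2 + 1), to the volume 2^n of the cube of side 2.
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) / 2 ** n

# Reproduce a few entries of Table 1.
assert abs(r(1) - 1) < 1e-12
assert abs(r(2) - math.pi / 4) < 1e-12
assert abs(r(3) - math.pi / 6) < 1e-12
assert abs(r(6) - math.pi ** 3 / 384) < 1e-12
assert r(20) < 1e-7   # the ratio collapses quickly

# The point (1, ..., 1)/sqrt(n) lies on the unit sphere; its distance to the
# corner (1, ..., 1) of the cube is sqrt(n) - 1, which grows without bound.
def corner_distance(n):
    return math.sqrt(n * (1 - 1 / math.sqrt(n)) ** 2)

assert abs(corner_distance(100) - (math.sqrt(100) - 1)) < 1e-12  # distance 9
```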



Bibliography
[1] J. Aron, Proof confirmed of 400-year-old fruit-stacking problem, New Scientist (August 12, 2014), https://www.newscientist.com/article/dn26041-proof-confirmed-of-400-year-old-fruit-stacking-problem.
[2] W. Barlow, Probable nature of the internal symmetry of crystals, Nature 29 (1883), 186–188.
[3] Th. Gensane, Dense packings of equal spheres in a cube, Electron. J. Combin. 11 (2004), no. 1, Research Paper 33. http://www.combinatorics.org/ojs/index.php/eljc/article/view/v11i1r33/pdf. MR2056085
[4] T. C. Hales, A proof of the Kepler conjecture, Ann. of Math. (2) 162 (2005), no. 3, 1065–1185, DOI 10.4007/annals.2005.162.1065. http://annals.math.princeton.edu/2005/162-3/p01. MR2179728
[5] T. C. Hales, Historical overview of the Kepler conjecture, Discrete Comput. Geom. 36 (2006), no. 1, 5–20, DOI 10.1007/s00454-005-1210-2. http://link.springer.com/article/10.1007%2Fs00454-005-1210-2. MR2229657
[6] T. C. Hales and S. P. Ferguson, A formulation of the Kepler conjecture, Discrete Comput. Geom. 36 (2006), no. 1, 21–69, DOI 10.1007/s00454-005-1211-1. http://link.springer.com/article/10.1007%2Fs00454-005-1211-1. MR2229658
[7] T. C. Hales, J. Harrison, S. McLaughlin, T. Nipkow, S. Obua, and R. Zumkeller, A revision of the proof of the Kepler conjecture, Discrete Comput. Geom. 44 (2010), no. 1, 1–34, DOI 10.1007/s00454-009-9148-4. http://link.springer.com/article/10.1007%2Fs00454-009-9148-4. MR2639816
[8] A. Joós, On the packing of fourteen congruent spheres in a cube, Geom. Dedicata 140 (2009), 49–80, DOI 10.1007/s10711-008-9308-3. http://link.springer.com/article/10.1007%2Fs10711-008-9308-3. MR2504734
[9] C. Hardie, translation of J. Kepler’s Strena, seu de nive sexangula, Oxford University Press, 2014.
[10] J. Kepler, Strena, seu de nive sexangula, Francofurti ad Moenum apud Godefridum Tampach, 1611.
[11] J. C. Lagarias, Dense sphere packings: a blueprint for formal proofs [book review of MR3012355], Bull. Amer. Math. Soc. (N.S.) 53 (2016), no. 1, 159–166, DOI 10.1090/bull/1502. MR3443950
[12] C. Marchal, Study of the Kepler’s conjecture: the problem of the closest packing, Math. Z. 267 (2011), no. 3-4, 737–765, DOI 10.1007/s00209-009-0644-2. MR2776056
[13] J. Lagarias (ed.), The Kepler Conjecture: The Hales–Ferguson Proof, Springer-Verlag, 2011.
[14] L. Fejes Tóth, Über die dichteste Kugellagerung, Math. Z. 48 (1940), 676–684.
[15] S. J. Miller, Mathematics of optimization: how to do things faster, Pure and Applied Undergraduate Texts, vol. 30, American Mathematical Society, Providence, RI, 2017. MR3729274


Baire Category Theorem

Introduction

A seminal result in analysis, the Baire category theorem, was published by the French mathematician René-Louis Baire (1874–1932) in his 1899 doctoral thesis Sur les fonctions de variables réelles. In particular, it is the main ingredient in the proof of three fundamental theorems in functional analysis: the open mapping theorem, the closed graph theorem, and the uniform boundedness principle [3]. Because of its numerous applications and continued use in modern mathematics, its centennial merits special attention. A few definitions are necessary in order to state this important theorem. A subset A of a topological space (see the comments for the 1955 entry) is nowhere dense if its closure A− has empty interior, that is, if (A− )◦ = ∅. Figure 1 shows the closure and interior of a set in R2. A subset A of a topological space is of the first category if it can be written as the countable union of nowhere dense sets; otherwise A is of the second category. The classical version of the Baire category theorem says that a complete metric space is of the second category in itself [1–3]. Before proceeding, we should admit that Baire’s terminology is unenlightening and dated. To add to the confusion, it has nothing to do with category theory, an important branch of mathematics that originated in the latter half of the 20th century. A more modern statement of Baire’s theorem has two parts:

(a) In a complete metric space, the countable intersection of open dense sets is dense.
(b) A complete metric space is not the countable union of nowhere dense sets.

The theorem also applies to topological spaces that are homeomorphic (see the comments for the 1917 entry) to complete metric spaces.

Figure 1. A set A in R2 (left), its closure A− (middle), and its interior A◦ (right).



Figure 2. A fat Cantor set F obtained by removing the middle fifth of each successive interval. Like the standard middle-third Cantor set, F is uncountable, compact, and nowhere dense. However, F has Lebesgue measure 1/3.

What is the big deal about the Baire category theorem? As a warmup, here is a one-line proof that R is uncountable (see the 1918 entry). If R = {a_1, a_2, . . .}, then R = ⋃_{n=1}^∞ {a_n} is the countable union of nowhere dense sets, which contradicts (b) since R is complete. A similar argument shows that the Cantor set (see the comments for the 1917 entry) is uncountable. Since the Cantor set is compact, it is complete and hence it cannot be the countable union of singletons.

Here is another cute application. Let F be a fat Cantor set, that is, a Cantor-like set with positive Lebesgue measure; see Figure 2. Then R is not the countable union of translated copies of F. We cannot appeal to a measure-theoretic argument here: since F has positive measure, a countable union of translates of F may well have infinite Lebesgue measure. Baire’s theorem comes to the rescue. Like the standard Cantor set, F is nowhere dense. Thus, (b) tells us that R is not the union of countably many translates of F.

Why is the Baire category theorem so powerful? What is going on underneath the hood? The proof of Baire’s theorem hinges in a crucial manner upon the axiom of choice (see the comments below and in the 1964 entry). Our problem for this year is a typical application of the Baire category theorem to functional analysis. It may not be obvious how to apply the theorem to the following problem. Here is a hint: look at finite-dimensional subspaces!

Centennial Problem 1999

Proposed by Mihai Stoiciu, Williams College.

Let C[x] be the vector space of polynomials in one variable with complex coefficients and let ‖·‖ : C[x] → [0, ∞) be a norm on C[x]. Use the Baire category theorem to prove that C[x] is not complete with respect to the induced metric. That is, prove that (C[x], ‖·‖) is not a Banach space.

1999: Comments

Axiom of choice. The proof of the Baire category theorem, which can be found in most real analysis textbooks, involves the subtle use of the axiom of choice (AC). See the comments for the 1964 entry for a statement of the axiom and a few general comments. We are interested here in discussing a few equivalent formulations of AC. To continue our discussion, we require a few definitions.







Figure 3. A Hasse diagram illustrating the poset P({1, 2, 3}), ordered by ⊆.

A partial order on a set A is a relation ≤ on A that is
(a) (reflexive) a ≤ a,
(b) (antisymmetric) a ≤ b and b ≤ a imply a = b,
(c) (transitive) a ≤ b and b ≤ c imply a ≤ c.

A partially ordered set is called a poset. The symbols <, >, and ≥ are defined in terms of ≤ in the natural way. In a poset, two elements need not be comparable; that is, there may exist a, b ∈ A such that neither a ≤ b nor b ≤ a holds. A poset is totally ordered if for every a, b ∈ A, either a ≤ b or b ≤ a. A chain is a totally ordered subset of a poset (see the 1918 problem). A totally ordered poset is well-ordered if each nonempty subset of A has a smallest element with respect to ≤. The powerset P({1, 2, 3}), when endowed with the partial order ⊆, is a poset; see Figure 3. This poset has a unique largest element, {1, 2, 3}, and a unique smallest element, ∅. The elements {1} and {2} are not comparable; neither is greater than or equal to the other. The set

{∅, {1}, {1, 2}, {1, 2, 3}}

is a chain in P({1, 2, 3}) that is well-ordered. Many useful results in various branches of mathematics are known to be equivalent under the axioms of Zermelo–Fraenkel set theory:

(a) Axiom of choice. If {X_α}_{α∈I} is a nonempty collection of nonempty sets, then there is an f : I → ⋃_{α∈I} X_α such that f(α) ∈ X_α.
(b) Well-ordering principle. Every set can be well-ordered.1
(c) Cardinal comparability. If A, B are sets, then there is an injection f : A → B or an injection g : B → A.
(d) Zorn’s lemma. Every nonempty partially ordered set in which every chain has an upper bound contains at least one maximal element.

1 The order produced by the well-ordering principle need not correspond to any sort of natural order structure that A possesses. The axiom of choice implies that R can be well-ordered, but the order has no relation to the standard order on R.



(e) Hausdorff maximality principle. Every partially ordered set has a maximal totally ordered subset.
(f) The Cartesian product of nonempty sets is nonempty.
(g) Every vector space has a basis.
(h) Every poset has a maximal antichain.2
(i) Every connected graph has a spanning tree.3

The following common theorems require the axiom of choice or some weaker variant of it such as the axiom of countable choice (in which countably many arbitrary choices can always be made):

• A countable union of countable sets is countable.
• Every infinite set has a countable infinite subset.
• Every field has an algebraic closure.4
• Nielsen–Schreier theorem. Every subgroup of a free group is free.
• Baire category theorem. In a complete metric space, the countable intersection of open, dense sets is dense.

The bizarre results that follow from the axiom of choice, coupled with its intuitive and useful consequences, spur one to ask if AC is true or false. This is, in a precise sense, a question that cannot be answered: Gödel and Cohen proved that AC is independent of Zermelo–Fraenkel set theory. A set of axioms is consistent if there does not exist a statement S such that both S and its negation ¬S are provable from the axioms; that is, the axioms are not self-contradictory. Gödel’s second incompleteness theorem (see the 1929 entry) asserts that no “sufficiently complicated” axiomatic system, including Zermelo–Fraenkel set theory (ZF), can prove its own consistency. Outside of logic and set theory, few working mathematicians concern themselves with the consistency of ZF. Almost everyone believes that ZF is consistent, but Gödel’s theorem tells us that we cannot hope to prove its consistency without recourse to a more powerful axiom system; then we face the problem of proving that that system is consistent! Think of systems of axioms as “operating systems” for software. Most of modern mathematics “runs under” ZFC, the Zermelo–Fraenkel axioms augmented with the axiom of choice. ZFC is sufficient to “run” the software that most average “users” (mathematicians, statisticians, physicists, computer scientists, and so forth) need. It has not “crashed” (been proven inconsistent) yet, but no one knows if ZFC is “crash-proof” (consistent). There are other, more exotic operating systems out there, such as ZFC augmented by certain large cardinal axioms, but mostly these are for “power users” such as set theorists and logicians. The average user is content running on ZFC and rarely thinks about operating systems, if at all.

2 An antichain is a subset of a poset with the property that any two distinct elements in the subset are not comparable.
3 A spanning tree in a graph G is a connected subgraph that contains every vertex of G and which contains no cycles.
4 A field F is algebraically closed if every nonconstant polynomial with coefficients in F has a root in F. The standard example is the complex field C. That C is algebraically closed is the fundamental theorem of algebra.

Bibliography
[1] G. B. Folland, Real analysis: Modern techniques and their applications, 2nd ed., Pure and Applied Mathematics (New York), A Wiley-Interscience Publication, John Wiley & Sons, Inc., New York, 1999. MR1681462
[2] S. H. Jones, Applications of the Baire category theorem, Real Anal. Exchange 23 (1997/98), no. 2, 363–394. https://projecteuclid.org/euclid.rae/1337001353. MR1640007
[3] T. Tao, The Baire category theorem and its Banach space consequences, http://terrytao.wordpress.com/2009/02/01/245b-notes-9-the-baire-category-theorem-and-its-banachspace-consequences.


R

Introduction

This is another entry for which there were at least two good options. The Clay Millennium Problems were one natural candidate; we briefly discuss them in the comments below. This year’s winner is one of the most popular statistical programming languages and environments: R. R was created in 1993 by Ross Ihaka (1954– ) and Robert Gentleman (1959– ) at the University of Auckland, New Zealand. Its name is both a reference to the first names of its inventors and to the underlying S programming language that was developed at Bell Labs in the 1970s [13]. R, which is open source and freely available, is widely used in industry and academia to perform statistical computations. There are numerous developers and thousands of useful packages available online. Version 1.0.0 of R was released on February 29, 2000. This was the first version considered stable enough for general use [5]:

The release of a current major version indicates that we believe that R has reached a level of stability and maturity that makes it suitable for production use. Also, the release of 1.0.0 marks that the base language and the API for extension writers will remain stable for the foreseeable future. In addition we have taken the opportunity to tie up as many loose ends as we could.

In the comments to the 1953 entry, we saw how Andrey Markov developed Markov chains to analyze the writing of Alexander Pushkin. What about the creation of literature? A little probability theory ensures that an immortal monkey who pounds away randomly at a typewriter for all eternity will almost surely produce the complete works of William Shakespeare,1 along with the true version of his lost play Love’s Labour’s Won, along with many false versions.2 What about more sensible applications of mathematics to literature? For example, we might wish to determine if a certain passage was written by the purported author. Has an author’s style changed over time? All of these questions involve culling large sets of linguistic data, then parsing and analyzing it. Maciej Eder, Jan Rybicki, and Mike Kestemont created an R package to perform such analyses [4]. The motivating examples they consider range from a pseudonymously published work written by J. K. Rowling (1965– ) to the alleged original version of To Kill a Mockingbird by Harper Lee (1926–2016). Their paper

1 “Ford!” he said, “there’s an infinite number of monkeys outside who want to talk to us about this script for Hamlet they’ve worked out” [1].
2 Including the fanciful script of the 2007 Doctor Who episode The Shakespeare Code.



2000. R

Figure 1. Bootstrap Consensus Tree for Harper Lee, from [4].

is full of code and detailed textual analyses (see Figure 1) and gives a small glimpse of what one can do with R:

This software paper describes ‘Stylometry with R’ (stylo), a flexible R package for the high-level analysis of writing style in stylometry. Stylometry (computational stylistics) is concerned with the quantitative study of writing style, e.g. authorship verification, an application which has considerable potential in forensic contexts, as well as historical research. In this paper we introduce the possibilities of stylo for computational text analysis, via a number of dummy case studies from English and French literature. We demonstrate how the package is particularly useful in the exploratory statistical analysis of texts, e.g. with respect to authorial writing style. Because stylo provides an attractive graphical user interface for high-level exploratory analyses, it is especially suited for an audience of novices, without programming skills (e.g. from the Digital Humanities). More experienced users can benefit from our implementation of a series of standard pipelines for text processing, as well as a number of similarity metrics.
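We are not about to reimplement stylo here, but the core idea, comparing word-frequency profiles of texts, fits in a few lines. A toy Python sketch (the "texts" and "authors" below are invented for illustration):

```python
from collections import Counter
import math

def profile(text):
    # Relative word frequencies of a text.
    words = text.lower().split()
    total = len(words)
    return {w: c / total for w, c in Counter(words).items()}

def cosine(p, q):
    # Cosine similarity of two frequency profiles.
    dot = sum(p[w] * q.get(w, 0.0) for w in p)
    norm_p = math.sqrt(sum(x * x for x in p.values()))
    norm_q = math.sqrt(sum(x * x for x in q.values()))
    return dot / (norm_p * norm_q)

author_a1 = "the whale the sea the ship and the whale"
author_a2 = "the sea and the ship and the sea"
author_b = "colorless green ideas sleep furiously green ideas"

pa1, pa2, pb = map(profile, (author_a1, author_a2, author_b))

# Texts by the same "author" should score higher than texts by different ones.
assert cosine(pa1, pa2) > cosine(pa1, pb)
```

Real stylometric pipelines work with the most frequent function words across a large corpus and far more robust distance measures, but the comparison above captures the basic mechanism.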



Centennial Problem 2000

Proposed by Steven J. Miller, Williams College.

To be scientifically literate these days one must understand statistics and be able to write simple programs to cull and analyze data. Download R and analyze a real-world problem. For example, look at all batters in baseball with (a) bases empty and (b) just a runner on first and no outs. Are the batting averages in the two cases statistically different? To solve this problem you will need to find game data online and reconstruct the games to get the game state of each at bat.

2000: Comments

Monkey business. On the theme of monkey-generated literature, we cannot pass up the opportunity to recount the bizarre story of Pierre Brassau. In 1964, tabloid journalist Åke Axelsson had a four-year-old chimpanzee produce a series of paintings that were later exhibited in the Gallerie Christinae in Göteborg under the pretense that they were the work of “Pierre Brassau,” an unheralded French painter. One critic applauded the work: “Brassau paints with powerful strokes, but also with clear determination. His brush strokes twist with furious fastidiousness. Pierre is an artist who performs with the delicacy of a ballet dancer” [12]. Needless to say, many in the Swedish art world were not amused by the hoax.

Maximum amusement. On the theme of hoaxes (make sure to check out the comments for the 1996 entry), MIT students Jeremy Stribling, Maxwell Krohn, and Daniel Aguayo wrote SCIgen, “a program that generates random Computer Science research papers, including graphs, figures, and citations” [9]. It produced the now-infamous paper Rooter: A Methodology for the Typical Unification of Access Points and Redundancy [8], which opens with the immortal lines:
A theoretical grand challenge in theory is the important unification of virtual machines and real-time theory. To what extent can web browsers be constructed to achieve this purpose? Certainly, the usual methods for the emulation of Smalltalk that paved the way for the investigation of rasterization do not apply in this area. In the opinions of many, despite the fact that conventional wisdom states that this grand challenge is continuously answered by the study of access points, we believe that a different solution is necessary. It should be noted that Rooter runs in Ω(log log n). Certainly, the shortcoming of this type of solution, however, is that compilers and superpages are mostly incompatible. Despite the fact that similar methodologies visualize XML, we surmount this issue without synthesizing distributed archetypes.

This meaningless load of fetid dingo’s kidneys was accepted by the Ninth World Multiconference on Systemics, Cybernetics and Informatics (WMSCI 2005). What was the point of this exercise? The mischievous trio hoped to cause “maximum amusement” and “test whether such meaningless manuscripts could pass the screening procedure for conferences that, they feel, exist simply to make money” [2].



Curiously, the statistical generation and analysis of research papers has come full circle. In 2014, a study by computer scientist Cyril Labbé revealed that at least 120 nonsense papers generated by SCIgen had been published in conference proceedings between 2008 and 2013 [6]!

Clay Millennium Problems. The Clay Mathematics Institute was founded in 1998 by Landon T. Clay (1926–2017), a successful venture capitalist with a profound appreciation for mathematics. The institute is most well known for its proposal of the Millennium Prize Problems, which were announced on May 24, 2000, in a series of lectures at the Collège de France by Timothy Gowers, Michael Atiyah, and John Tate (1925– ). According to the institute [3]:

The Prizes were conceived to record some of the most difficult problems with which mathematicians were grappling at the turn of the second millennium; to elevate in the consciousness of the general public the fact that in mathematics, the frontier is still open and abounds in important unsolved problems; to emphasize the importance of working towards a solution of the deepest, most difficult problems; and to recognize achievement in mathematics of historical magnitude.

The millennium problems are a 21st-century analogue of David Hilbert’s celebrated list from 1900; see the 1935, 1963, 1970, 1980, and 1983 entries. There is a modern twist: a solution to a millennium problem earns the solver a million-dollar prize! Some of the millennium problems are old favorites. For example, the Riemann hypothesis, Hilbert’s eighth problem, is one of the seven Clay problems. Others are more modern and would have been inconceivable in Hilbert’s time. The P versus NP problem, for example, involves computational complexity theory, a field that blossomed only after the advent of the computer. Several of the problems are discussed elsewhere in this book; others we have not touched. We can hardly do better than to quote the original summaries provided by the Clay foundation [3]; we do so frequently below.

• Yang–Mills existence and mass gap. The problem requires a proof that any compact simple gauge group gives rise to a nontrivial quantum Yang–Mills theory on R^4 with a positive mass gap [14]. What does this mean?

Quantum Yang–Mills theory is now the foundation of most of elementary particle theory, and its predictions have been tested at many experimental laboratories, but its mathematical foundation is still unclear. The successful use of Yang–Mills theory to describe the strong interactions of elementary particles depends on a subtle quantum mechanical property called the “mass gap”: the quantum particles have positive masses, even though the classical waves travel at the speed of light. [3]

The existence of the mass gap has been confirmed by experimental physicists and computer simulations, although a mathematical explanation is lacking. • Riemann hypothesis. By now the reader is well versed on the Riemann hypothesis, one of the most stubborn problems in mathematics. It is the subject of our 1942, 1945, and 1987 entries. Also see the comments for the 1933 and 1948 entries.



Figure 2. Turbulent fluid flow at multiple scales. Photo by Steven Mathey under Creative Commons Attribution-Share Alike 4.0 International license. https://commons.wikimedia.org/ wiki/File:Self_Similar_Turbulence.png

• P versus NP problem. Let P denote the class of decision problems that can be solved in polynomial time (with respect to the length of the input) and let NP be the class of problems for which a proposed solution can be verified in polynomial time. Thus, P ⊆ NP. The million-dollar question is whether equality holds [11]. That is, does knowing how to quickly verify a solution to a problem automatically mean that a fast algorithm for solving that problem exists? For example, one multiplication verifies the correctness of an integer factorization. Does this imply that a deterministic, polynomial-time integer factorization algorithm exists?

• Navier–Stokes equation. This complicated system of partial differential equations with prescribed boundary conditions, named after Claude-Louis Navier (1785–1836) and George Gabriel Stokes (1819–1903), governs three-dimensional fluid flow. For example, the turbulent behavior of water and air seems to adhere to these equations; see Figure 2. Under reasonable mathematical hypotheses, do solutions exist? Are they unique? Or can solutions “break down” in finite time? How well does Navier–Stokes model physical reality?

• Hodge conjecture. This conjecture concerns how much of the topology of the solution set of a system of algebraic equations can be defined in terms of further algebraic equations. Since this is a tough one to describe with any degree of faithfulness, we refer the reader to [3, 10] for further information.

• Poincaré conjecture. Is every simply connected, closed, three-dimensional manifold homeomorphic to the three-dimensional sphere? This conjecture has a long and storied history and, by some accounts, it has resulted in two or three Fields Medals! See the 2003 entry for more details.

• Birch and Swinnerton-Dyer conjecture. What is the relationship between the number of points on an elliptic curve over finite fields of prime order and the rank of the group of rational points on the curve? See the comments for the 1921 entry for a detailed discussion of this conjecture.

Of the seven millennium problems, only the Poincaré conjecture has been resolved; see the 2003 entry.


2000. R

Bibliography
[1] D. Adams, The Hitchhiker’s Guide to the Galaxy, Pan Books, 1979.
[2] P. Ball, Computer conference welcomes gobbledegook paper, Nature.com, https://www.nature.com/articles/nature03653.
[3] Clay Mathematics Institute, The Millennium Prize Problems, http://www.claymath.org/millennium-problems/millennium-prize-problems.
[4] M. Eder, J. Rybicki, and M. Kestemont, Stylometry with R: A Package for Computational Text Analysis, The R Journal 8 (2016), no. 1, 107–121. https://journal.r-project.org/archive/2016/RJ-2016-007/RJ-2016-007.pdf.
[5] R Developer Page, Statistical analysis environment “R” version 1.0.0 is released, http://developer.r-project.org/R-release-1.0.0.txt.
[6] R. Van Noorden, Publishers withdraw more than 120 gibberish papers, Nature.com, https://www.nature.com/news/publishers-withdraw-more-than-120-gibberish-papers-1.14763.
[7] The R Project for Statistical Computing, http://www.r-project.org/.
[8] J. Stribling, D. Aguayo, and M. Krohn, Rooter: a methodology for the typical unification of access points and redundancy, https://pdos.csail.mit.edu/archive/scigen/rooter.pdf.
[9] J. Stribling, M. Krohn, and D. Aguayo, SCIgen—An Automatic CS Paper Generator, https://pdos.csail.mit.edu/archive/scigen/.
[10] Wikipedia, Hodge conjecture, https://en.wikipedia.org/wiki/Hodge_conjecture.
[11] Wikipedia, P versus NP problem, https://en.wikipedia.org/wiki/P_versus_NP_problem.
[12] Wikipedia, Pierre Brassau, https://en.wikipedia.org/wiki/Pierre_Brassau.
[13] Wikipedia, R (programming language), http://en.wikipedia.org/wiki/R_(programming_language).
[14] Wikipedia, Yang–Mills existence and mass gap, https://en.wikipedia.org/wiki/Yang–Mills_existence_and_mass_gap


Colin Hughes Founds Project Euler

Introduction

Project Euler, created by Colin Hughes in 2001, is an outstanding website that has provided countless hours of enjoyment to mathematicians, computer scientists, and other computationally minded people. It describes itself as follows [1]:

Project Euler is a series of challenging mathematical/computer programming problems that will require more than just mathematical insights to solve. Although mathematics will help you arrive at elegant and efficient methods, the use of a computer and programming skills will be required to solve most problems. The motivation for starting Project Euler, and its continuation, is to provide a platform for the inquiring mind to delve into unfamiliar areas and learn new concepts in a fun and recreational context.

For many of the problems, one can quickly come up with a program that will eventually find the solution. However, this does not mean that the program will run in a reasonable amount of time. As an extreme example of this phenomenon, consider chess. Since there are only finitely many possible board configurations, an analysis of chess can be reduced to a finite computation. Does the first player have a winning strategy? Can the second player always force the game to end in a draw? Unfortunately, the number of board configurations and possible moves is far too large for humans or their computers to analyze by brute force. The same is true in many of the Project Euler problems: although one can describe a brute-force approach, the naive approach simply takes too long to run. Project Euler problems illustrate several key points:

• Theory has a place in computational problems: a clever reformulation of the problem may prove more tractable than the original approach.
• Implementation is nontrivial: different programming languages and environments may be better suited to different tasks.
• Although brute force sometimes works, an elegant approach is often more illuminating.

We illustrate these principles with the following problem. Consider a large triangle and several possible triangulations; see Figure 1(a). Assign colors (red, green, or blue) to each vertex as follows: (a) the bottom left vertex of the original triangle is red, the bottom right is green, the top is blue;



(a) The initial triangle has vertices of three different colors.

(b) A refinement with one subtriangle with vertices of three different colors.

(c) A further refinement with three subtriangles with vertices of three different colors.

Figure 1. Sperner’s lemma ensures that each refinement contains an odd number of subtriangles that each have three vertices of distinct colors.

(b) any vertex on an outer edge of the original triangle has its color determined by the two vertices adjacent to it; (c) internal vertices may be colored red, green, or blue with no restrictions.

Does there exist a small triangle with red, green, and blue vertices? Given a fixed subdivision, we can check all possible labelings by brute force. However, this will not settle the general question since there are infinitely many possible subdivisions that must be considered. An elegant approach to the problem is to prove that the number of triangles with distinctly labeled vertices is odd; therefore at least one such triangle exists. This result, now known as Sperner’s lemma, was discovered by Emanuel Sperner (1905–1980) in 1928; see [2, 4, 5]. Surprisingly, it can be used to prove Brouwer’s fixed-point theorem, a seminal result in topology; see the 2009 entry.

Here is a sketch of the proof. First, label the colors 1, 2, 3. Let T_abc, in which a ≤ b ≤ c, denote the number of small triangles with vertices labeled a, b, and c in some order. We want to show that T_123 is positive. Let S_12 denote the number of 1–2 segments on the bottom of the original triangle. Then twice the number of 1–2 segments in the subdivision is T_123 + 2T_112 + 2T_122 + S_12. This is because a 1–2–3 triangle contains one 1–2 segment, while a 1–1–2 or 1–2–2 triangle generates two 1–2 segments; on the other hand, each interior 1–2 segment borders two subtriangles, while each 1–2 segment on the boundary (necessarily on the bottom edge) borders only one. Thus, the parity of T_123 is the same as that of S_12. We leave it as an exercise to show that the number of 1–2 segments on the bottom edge of the original triangle is odd, which proves the claim.

Centennial Problem 2001

Proposed by Steven J. Miller, Williams College.

There are now over 400 problems of various levels of difficulty on the Project Euler website [1]. To solve these problems quickly requires a deep understanding of both mathematics (which often has formulas to cut down on the computations)



and computer science (to efficiently code the problem). Form a group and see how many of these problems you can solve.

2001: Comments

Fibonacci fun. The twenty-fifth problem on the Project Euler website concerns the Fibonacci numbers, defined by

F_0 = 0,   F_1 = 1,   F_{n+1} = F_n + F_{n−1}.

It asks: What is the index of the first term in the Fibonacci sequence to contain 1,000 digits?

One can solve this by brute force. Here is a short Mathematica program to solve the problem by searching among the first 100,000 Fibonacci numbers:

For[n = 1, n <= 100000, n++,
  If[IntegerLength[Fibonacci[n]] >= 1000, Print[n]; Break[]]]

The computer provides the answer, n = 4,782, in a fraction of a second. However, this is not terribly satisfactory since it does not suggest a general method. We relied upon the “black box” command Fibonacci[] to do the work for us. What would happen if instead of 1,000 digits we insisted upon a billion? Do we really understand what is going on? One of the goals of the Project Euler problems is to show the interplay between theory and coding. Is there a more elegant approach to the problem above?

Binet’s formula is a beautiful closed-form expression for the nth Fibonacci number:

F_n = (1/√5) [ ((1 + √5)/2)^n − ((1 − √5)/2)^n ].    (2001.2)

Although it is named after Jacques Philippe Marie Binet (1786–1856), who found it in 1843, the formula was already known to Abraham de Moivre (1667–1754). This is a classic example of Stigler’s law of eponymy; see the comments for the 2010 entry. The comments below contain two derivations of Binet’s formula.

How does Binet’s formula help? First observe that

(1 + √5)/2 ≈ 1.61803398

is the golden ratio from classical geometry. Its algebraic conjugate,¹

(1 − √5)/2 ≈ −0.61803398,

has absolute value less than one. Therefore, (2001.2) ensures that

F_n ≈ (1/√5) ((1 + √5)/2)^n

¹The golden ratio is a root of z² − z − 1, which is irreducible over Z (and hence, by Gauss’s lemma, over Q). The other root of this polynomial is an algebraic conjugate of the golden ratio.
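For readers without Mathematica, the same brute-force search can be sketched in a few lines of Python (a hypothetical translation, not from the original text; Python’s arbitrary-precision integers play the role of Fibonacci[]):

```python
def first_fib_index_with_digits(k):
    """Index of the first Fibonacci number (F_0 = 0, F_1 = 1) with at least k digits."""
    a, b, n = 0, 1, 1          # a = F_0, b = F_1
    while len(str(b)) < k:
        a, b = b, a + b        # advance one step in the recurrence
        n += 1
    return n

print(first_fib_index_with_digits(1000))  # 4782, matching the Mathematica search
```

Even for k = 1,000 this runs in well under a second; for a billion digits the cost of big-integer arithmetic would start to bite, which is where Binet’s formula earns its keep.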



with an error that tends to zero exponentially fast. If we want the first index n such that F_n has k + 1 digits, then we solve

(1/√5) ((1 + √5)/2)^n ≈ 10^k   and deduce   n ≈ (k log 10 + log √5) / log((1 + √5)/2).

Let k = 999 and conclude that the first Fibonacci number with 1,000 digits has index approximately 4,781.86. Rounding up to 4,782 yields the answer. In addition to estimating the critical index, we can also check our claim with Binet’s formula:

F_4781 ≈ 6.613373228392440 × 10^998,   F_4782 ≈ 1.070066266382759 × 10^999.

We are correct! For more on the Fibonacci numbers, see the 1938, 1957, 1970, and 1980 entries.

Binet’s formula via linear algebra. Let

A = [1 1; 1 0]

and use induction to confirm that²

A^n = [F_{n+1} F_n; F_n F_{n−1}]    (2001.3)

for n = 1, 2, 3, . . .. The characteristic polynomial of A is p_A(z) = z² − z − 1, which has roots

φ = (1 + √5)/2   and   ψ = (1 − √5)/2.    (2001.4)

Eigenvectors that correspond to the eigenvalues (2001.4) are s_1 = [1 −ψ]^T and s_2 = [1 −φ]^T. This yields the diagonalization A = SDS^{−1}, in which S = [s_1 s_2] and D = diag(φ, ψ). Thus,

A^n = (SDS^{−1})^n = (SDS^{−1})(SDS^{−1}) · · · (SDS^{−1}) = SD^n S^{−1}

(n factors), and hence

[F_{n+1} F_n; F_n F_{n−1}] = −(1/√5) [1 1; −ψ −φ] [φ^n 0; 0 ψ^n] [−φ −1; ψ 1],

in which the three matrices on the right are S, D^n, and S^{−1}. Expanding the product gives

[F_{n+1} F_n; F_n F_{n−1}] = (1/√5) [∗ (φ^n − ψ^n); ∗ ∗],

in which ∗ denotes entries whose exact values are irrelevant. Compare the (1, 2) entries on both sides and obtain F_n = (φ^n − ψ^n)/√5, which is Binet’s formula.

²Here are some nice consequences of (2001.3). Take determinants in (2001.3) and obtain Simpson’s formula: F_{n+1}F_{n−1} − F_n² = (−1)^n. Compare the lower-left entries of A^{m+n} = A^m A^n and obtain F_{m+n} = F_{m−1}F_n + F_m F_{n+1}. Induction and the preceding formula can be used to prove that F_d | F_n whenever d | n. For example, F_5 = 5 divides F_10 = 55.
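The identities in the footnote are easy to spot-check numerically. Here is a hypothetical Python sketch (not from the original text), using an iterative Fibonacci routine:

```python
def fib(n):
    """F_n with F_0 = 0, F_1 = 1, computed iteratively."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

m, n = 6, 10
assert fib(n + 1) * fib(n - 1) - fib(n) ** 2 == (-1) ** n       # Simpson's formula
assert fib(m + n) == fib(m - 1) * fib(n) + fib(m) * fib(n + 1)  # from A^(m+n) = A^m A^n
assert fib(10) % fib(5) == 0                                    # F_5 = 5 divides F_10 = 55
print("all identities verified")
```

Such checks are no substitute for the inductive proofs, but they catch transcription errors instantly.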


Binet’s formula via calculus. Let

f(z) = Σ_{n=0}^∞ F_n z^n    (2001.6)

denote the generating function for the Fibonacci numbers. Then

f(z) = F_0 + F_1 z + Σ_{n=0}^∞ F_{n+2} z^{n+2}
     = z + Σ_{n=0}^∞ (F_{n+1} + F_n) z^{n+2}
     = z + z Σ_{n=0}^∞ F_{n+1} z^{n+1} + z² Σ_{n=0}^∞ F_n z^n
     = z + zf(z) + z²f(z),

and hence

f(z) = z / (1 − z − z²).    (2001.7)

The roots of the denominator are −φ and −ψ. A partial fraction expansion expresses (2001.7) as a linear combination of two geometric series. Some tedious calculations eventually yield

f(z) = Σ_{n=0}^∞ (1/√5)(φ^n − ψ^n) z^n.

Compare the coefficients in (2001.6) and the preceding to deduce Binet’s formula.

Bibliography
[1] Project Euler.net, https://projecteuler.net/.
[2] J. Franklin, Methods of mathematical economics: Linear and nonlinear programming, fixed-point theorems, Undergraduate Texts in Mathematics, Springer-Verlag, New York-Berlin, 1980. MR602694
[3] T. Koshy, Fibonacci and Lucas numbers with applications, Pure and Applied Mathematics (New York), Wiley-Interscience, New York, 2001. MR1855020
[4] S. J. Miller, Mathematics of optimization: how to do things faster, Pure and Applied Undergraduate Texts, vol. 30, American Mathematical Society, Providence, RI, 2017. MR3729274
[5] E. Sperner, Neuer Beweis für die Invarianz der Dimensionszahl und des Gebietes (German), Abh. Math. Sem. Univ. Hamburg 6 (1928), no. 1, 265–272, DOI 10.1007/BF02940617. MR3069504


PRIMES in P

Introduction

Given a large integer n, how quickly can one determine whether it is prime or composite? The naive method is to divide n by each prime 2, 3, 5, 7, . . .. If one reaches √n without finding a factor, then n is prime. However, if n has a few hundred digits, this approach can take longer than the age of the universe! A more efficient approach is based upon Fermat’s little theorem, which says that if p is prime and p does not divide a, then¹

a^(p−1) ≡ 1 (mod p).


First, select an integer a. The Euclidean algorithm rapidly computes gcd(a, n) without the need to factor either number.² If gcd(a, n) > 1, then n is composite. If gcd(a, n) = 1 and a^(n−1) ≢ 1 (mod n), then Fermat’s little theorem ensures that n is composite (although it does not provide a specific factor of n). If a^(n−1) ≡ 1 (mod n), then the test is inconclusive. In this case, repeat the test with another base a. This can be implemented rapidly on a computer since a^(n−1) need not be computed directly.

An example illustrates the approach. Suppose that we wish to determine whether n = 1763 is prime or composite. We first write the exponent n − 1 = 1762 in binary. Divide 1762 by the largest power of 2 that is at most it and repeat: 1762 = 1024 + 738, 738 = 512 + 226, 226 = 128 + 98, 98 = 64 + 34, 34 = 32 + 2. Thus,

1762 = 1024 + 512 + 128 + 64 + 32 + 2 = 2^10 + 2^9 + 2^7 + 2^6 + 2^5 + 2^1 = (11011100010)_2.

¹To prove Fermat’s little theorem, first show that a, 2a, 3a, . . . , (p − 1)a are distinct and nonzero modulo p. Then a, 2a, 3a, . . . , (p − 1)a are congruent modulo p to 1, 2, 3, . . . , (p − 1), in some order. Thus, a · 2a · 3a · · · (p − 1)a ≡ 1 · 2 · 3 · · · (p − 1) (mod p), and hence a^(p−1) (p − 1)! ≡ (p − 1)! (mod p). Since p does not divide (p − 1)!, we obtain a^(p−1) ≡ 1 (mod p).

²A theorem of Gabriel Lamé says that the number of steps in the Euclidean algorithm is at most five times the number of base-10 digits of min{a, n}.
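The greedy subtract-the-largest-power-of-2 procedure above is mechanical enough to automate. A hypothetical Python sketch (not from the original text):

```python
def greedy_binary(n):
    """Exponents in the binary expansion of n, found greedily as in the text."""
    exponents = []
    while n > 0:
        e = n.bit_length() - 1   # largest e with 2^e <= n
        exponents.append(e)
        n -= 1 << e              # subtract that power of 2 and repeat
    return exponents

print(greedy_binary(1762))  # [10, 9, 7, 6, 5, 1]: 1762 = 2^10 + 2^9 + 2^7 + 2^6 + 2^5 + 2^1
```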




Repeated squaring and reduction modulo 1763 provide

2^1 ≡ 2 (mod 1763),
2^2 ≡ 4 (mod 1763),
2^4 ≡ 16 (mod 1763),
2^8 ≡ 256 (mod 1763),
2^16 ≡ 305 (mod 1763),
2^32 ≡ 1349 (mod 1763),
2^64 ≡ 385 (mod 1763),
2^128 ≡ 133 (mod 1763),
2^256 ≡ 59 (mod 1763),
2^512 ≡ 1718 (mod 1763),
2^1024 ≡ 262 (mod 1763).

Reducing modulo 1763 at each step, we obtain

2^1762 ≡ 2^(2^10 + 2^9 + 2^7 + 2^6 + 2^5 + 2^1)
       ≡ 2^1024 · 2^512 · 2^128 · 2^64 · 2^32 · 2^2 (mod 1763)
       ≡ 262 · 1718 · 133 · 385 · 1349 · 4 (mod 1763)
       ≡ 262 · 1718 · 133 · 385 · 107 (mod 1763)
       ≡ 262 · 1718 · 133 · 646 (mod 1763)
       ≡ 262 · 1718 · 1294 (mod 1763)
       ≡ 262 · 1712 (mod 1763)
       ≡ 742 (mod 1763).

Since 2^1762 ≢ 1 (mod 1763), Fermat’s little theorem implies that 1763 is composite. There are several important points here.

• We have proved that n is composite without providing a factor of n (for those dying of curiosity: 1763 = 41 · 43).
• Judicious reduction modulo n means that our computations do not involve numbers that are significantly larger than n.
• The number of steps is proportional to log_2 n and not √n, as in the naive method.

Although the Fermat-based algorithm is fast, it is not always conclusive. For example, n = 341 = 11 · 31 and 2^340 ≡ 1 (mod 341). We say that 341 is a pseudoprime for the base 2. There are infinitely many such numbers; see the comments below. The first few are 341, 561, 645, 1105, 1387, 1729, 1905, 2047, 2465, 2701, 2821, 3277, 4033, 4369, 4371, 4681, 5461, 6601, 7957, 8321, 8481, 8911, 10261, 10585, 11305, 12801, 13741, 13747, 13981, 14491, 15709, 15841, 16705, 18705, 18721, 19951.



Fortunately, 3^340 ≡ 56 (mod 341) and hence 3 is a witness to the fact that 341 is composite; that is, 341 is not a pseudoprime for the base 3. By testing an integer n with several different bases, we can weed out more pseudoprimes. Unfortunately, there are composite numbers n that are pseudoprimes for all bases 2 ≤ k ≤ n − 1 with gcd(k, n) = 1.³ These Carmichael numbers always fool our Fermat-based primality test; see the 2010 entry.

Is there a polynomial-time algorithm that distinguishes primes and composites? By polynomial time we mean that there are constants A, B > 0 such that the number of elementary steps performed by the algorithm on the input n is at most A(log n)^B. The focus on log n is because the length of the decimal (or binary) representation of n is proportional to log n. There are algorithms that depend upon randomly selected parameters that can do the job. One example is the Miller–Rabin test, named after Gary Lee Miller and Michael Oser Rabin (1931– ). Let n > 2 and write n − 1 = 2^r m, in which m ≥ 1 is odd and r ≥ 0. If

b^m ≡ 1 (mod n)   or   b^(2^j m) ≡ −1 (mod n)   for some j ∈ {0, 1, 2, . . . , r − 1},
then n passes Miller’s test for the base b. If n fails the test for some base b, then it is composite. It can be shown that if n is an odd composite number, then n passes Miller’s test for at most (n − 1)/4 bases b with 1 ≤ b ≤ n − 1.⁴ This yields the Miller–Rabin probabilistic primality test: if n passes Miller’s test for k different bases, then the probability that n is composite is at most 1/4^k. For example, if n passes the test for k = 50 bases, then this probability is 1/4^50 ≈ 7.89 × 10^(−31). Although we are not 100% certain that n is a prime, our level of confidence is sufficient for most industrial applications. Sometimes speed is more important than absolute certainty. It is conceivable, although highly unlikely, that n is composite but that we continually pick from among the (n − 1)/4 “bad” bases. Thus, we cannot guarantee that the Miller–Rabin test will work in polynomial time. On the other hand, the Adleman–Huang test is a random procedure that is guaranteed to find a proof of primality for a prime input in polynomial time [1].

What we really want is a deterministic, polynomial-time algorithm that distinguishes primes and composites. Over the years there were some close calls, but it was not until an electrifying announcement from India in 2002 that we had an answer. Manindra Agrawal (1966– ) and his two undergraduate honors students Neeraj Kayal (1979– ) and Nitin Saxena (1981– ) gave a fairly simple deterministic, polynomial-time algorithm that distinguishes primes from composites. It involves a generalization of Fermat’s little theorem to the ring of polynomials over a finite field of prime order modulo an irreducible polynomial. We follow the description of the AKS primality test (named for Agrawal, Kayal, and Saxena) in [3], which also contains a number of worked examples. We first require some preliminaries. Recall that

(Z/nZ)× = { k ∈ {1, 2, . . . , n − 1} : gcd(k, n) = 1 }

³If gcd(k, n) > 1, then we already know that n is composite.
⁴If the generalized Riemann hypothesis is true, then for every composite integer n, there is a b < 2(log_2 n)² for which n fails Miller’s test for the base b.



is a group under multiplication modulo n. The order of x ∈ (Z/nZ)× is the smallest natural number k such that x^k ≡ 1 (mod n). For example, (Z/12Z)× = {1, 5, 7, 11}. Each element has order at most 2 since 1², 5², 7², 11² ≡ 1 (mod 12). For polynomials f(x), g(x), and m(x) with integer coefficients and deg m(x) ≥ 1, we say that

f(x) ≡ g(x) (mod m(x))

if and only if m(x) | (f(x) − g(x)), that is, if and only if there is a polynomial h(x) with integer coefficients such that h(x)m(x) = f(x) − g(x). For example,

3x² + 7x + 4 ≡ x² + 2x + 1 (mod (x + 1))

since (3x² + 7x + 4) − (x² + 2x + 1) = (2x + 3)(x + 1). The great insight of Agrawal–Kayal–Saxena was to combine regular and polynomial congruences. We say that

f(x) ≡ g(x) (mod n, m(x))

if there is an h(x) with f(x) − g(x) − h(x)m(x) ≡ 0 (mod n). Although we can describe the AKS primality test, showing that it runs in polynomial time would take us too far afield. See the original paper [2] or the exposition in [3].

AKS primality test. Let N > 1 be a positive integer.
1. Test if N is a perfect kth power for some k ≥ 2. If it is, then N is composite and stop. Else proceed to step 2.
2. Find the smallest prime r such that the order of N modulo r is greater than (log_2 N)².
3. If any of the numbers in {2, 3, . . . , r} share a common divisor other than 1 with N, then N is composite and stop. Else proceed to step 4.
4. If N ≤ r, then N is prime and stop. Else proceed to step 5.
5. For each positive integer a at most √φ(r) log_2 N, check if

(x + a)^N ≡ x^N + a (mod x^r − 1, N).

If there is an a for which the congruence fails, then N is composite; if the congruence holds for all such a at most √φ(r) log_2 N, then N is prime.

If the AKS primality test terminates in either step 1 or 3, then it produces a factor of N. This is done by applying the Euclidean algorithm in step 3 to r and N to find their greatest common divisor. If the program ends in step 5, then N is composite but we do not obtain a factor.
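The congruence in step 5 can be tested with straightforward polynomial arithmetic, representing a polynomial modulo x^r − 1 as a list of r coefficients whose exponents wrap around mod r. The following Python sketch is hypothetical (real AKS implementations are far more optimized) and assumes r ≥ 2:

```python
def aks_congruence(N, r, a):
    """True iff (x + a)^N == x^N + a modulo (x^r - 1, N)."""
    def mul(p, q):  # multiply in (Z/NZ)[x] / (x^r - 1): exponents wrap mod r
        out = [0] * r
        for i, pi in enumerate(p):
            if pi:
                for j, qj in enumerate(q):
                    out[(i + j) % r] = (out[(i + j) % r] + pi * qj) % N
        return out

    # compute (x + a)^N by square-and-multiply
    result = [1] + [0] * (r - 1)        # the constant polynomial 1
    base = [a % N, 1] + [0] * (r - 2)   # the polynomial x + a
    e = N
    while e:
        if e & 1:
            result = mul(result, base)
        base = mul(base, base)
        e >>= 1

    rhs = [0] * r                       # x^N + a reduced mod (x^r - 1, N)
    rhs[N % r] = 1
    rhs[0] = (rhs[0] + a) % N
    return result == rhs

print(aks_congruence(13, 5, 1))   # True: 13 is prime
print(aks_congruence(15, 7, 1))   # False: 15 is composite
```

For prime N the congruence always holds (the “freshman’s dream” plus Fermat’s little theorem applied to a), which is exactly why a failure certifies compositeness.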



Agrawal, Kayal, and Saxena were successful in de-randomizing the prime recognition problem. Here is another problem for which there is a random polynomial-time algorithm, yet for which we do not know if there is a deterministic polynomial-time algorithm.

Centennial Problem 2002

Proposed by Carl Pomerance, Dartmouth College.

An integer a is a quadratic nonresidue modulo p if x² ≡ a (mod p) has no solutions. Exactly half of the nonzero residues modulo p fit the bill. A candidate can be checked (in polynomial time) via Euler’s criterion or the reciprocity law for Jacobi symbols. Thus, randomly selecting nonzero residues a until you get a quadratic nonresidue should succeed in around two tries! A possible deterministic algorithm sequentially tries small a until a quadratic nonresidue is found. This works well for a large proportion of the primes. For example, one of −1, 2, 3, 5 is a quadratic nonresidue for an odd prime p unless p ≡ 1 or 49 (mod 120). It is believed that this procedure works in polynomial time, but this is only known under the extended Riemann hypothesis. Another possible strategy is to start with −1 and sequentially take square roots modulo p until a nonsquare is found. Unfortunately, we know no method to take modular square roots in deterministic polynomial time, unless one has an oracle that provides quadratic nonresidues! Is there a deterministic, polynomial-time algorithm to produce a quadratic nonresidue modulo an odd prime p?

2002: Comments

Infinitude of base-2 pseudoprimes. To demonstrate that there are infinitely many pseudoprimes for the base 2, it suffices to show that for each odd, base-2 pseudoprime, there is a larger odd one. We start our construction with n = 341. Let n be an odd pseudoprime for the base 2 and let M_n = 2^n − 1 denote the nth Mersenne number, which is known to be composite (see the 1996 entry). Because 2^(n−1) ≡ 1 (mod n), we have M_n − 1 = 2^n − 2 = 2(2^(n−1) − 1) = 2dn for some d.
Thus,

2^((M_n − 1)/2) − 1 = 2^(dn) − 1
  = (2^n − 1)(2^(n(d−1)) + 2^(n(d−2)) + · · · + 2^n + 1)
  = M_n (2^(n(d−1)) + 2^(n(d−2)) + · · · + 2^n + 1)
  ≡ 0 (mod M_n).

Since M_n > n is composite and

2^(M_n − 1) ≡ (2^((M_n − 1)/2))² ≡ 1² ≡ 1 (mod M_n),

we conclude that M_n is a pseudoprime for the base 2.
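The construction can be verified directly for n = 341 in a few lines; a hypothetical Python sketch (three-argument pow keeps the intermediate numbers manageable even though M_341 has over a hundred digits):

```python
n = 341
assert pow(2, n - 1, n) == 1     # 341 is a base-2 pseudoprime
M = 2 ** n - 1                   # the Mersenne number M_341
assert M % (2 ** 11 - 1) == 0    # 11 | 341, so M_11 = 2047 divides M_341: M is composite
assert pow(2, M - 1, M) == 1     # yet M passes the base-2 Fermat test
print("M_341 is a (much larger) base-2 pseudoprime")
```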



Although there are infinitely many pseudoprimes for the base 2, our method does not provide an efficient method for producing them. Indeed, M_341 = 2^341 − 1 is far larger than 561, the smallest pseudoprime for the base 2 after 341. The number 561 is also the first Carmichael number; see the 2010 entry.

Carl Pomerance alerted us to a simpler proof of the infinitude of base-2 pseudoprimes. We claim that if p ≥ 5 is prime, then n = (4^p − 1)/3 is a base-2 pseudoprime. First observe that 4^p ≡ 1 (mod 3), so n is indeed an integer. Moreover, (2^p + 1)/3 is an integer and hence n = (2^p − 1)((2^p + 1)/3) is composite. Fermat’s theorem ensures that

n ≡ (2^p − 1)(2^p + 1)3^(−1) ≡ (2 − 1)(2 + 1)3^(−1) ≡ 1 (mod p).

Since n − 1 is even and p is odd, we have (n − 1)/2 ≡ 0 (mod p) and hence 2^(n−1) − 1 = 4^((n−1)/2) − 1 is divisible by 4^p − 1 = 3n. Thus, 2^(n−1) ≡ 1 (mod n) and n is a base-2 pseudoprime.

Bibliography
[1] L. M. Adleman and M.-D. A. Huang, Primality testing and abelian varieties over finite fields, Lecture Notes in Mathematics, vol. 1512, Springer-Verlag, Berlin, 1992. MR1176511
[2] M. Agrawal, N. Kayal, and N. Saxena, PRIMES is in P, Ann. of Math. (2) 160 (2004), no. 2, 781–793, DOI 10.4007/annals.2004.160.781. http://www.cse.iitk.ac.in/users/manindra/algebra/primality_v6.pdf. MR2123939
[3] M. Cozzens and S. J. Miller, The mathematics of encryption: An elementary introduction, Mathematical World, vol. 29, American Mathematical Society, Providence, RI, 2013. MR3098499
[4] R. Crandall and C. Pomerance, Prime numbers: A computational perspective, 2nd ed., Springer, New York, 2005. MR2156291
[5] A. Granville, It is easy to determine whether a given integer is prime, Bull. Amer. Math. Soc. (N.S.) 42 (2005), no. 1, 3–38, DOI 10.1090/S0273-0979-04-01037-7. http://www.dms.umontreal.ca/~andrew/PDF/Bulletin04.pdf. MR2115065
[6] C. Pomerance, Primality testing: variations on a theme of Lucas, Congr. Numer. 201 (2010), 301–312. https://math.dartmouth.edu/~carlp/PDF/lucastalk.pdf. MR2598366


Poincaré Conjecture

Introduction

In 2003, Grigori Perelman, building upon seminal work of Richard S. Hamilton (1943– ), proved the Poincaré conjecture, one of the million-dollar Clay Millennium Problems (see the comments for the 2000 entry). The conjecture asserts that every smooth, compact, simply connected, closed 3-manifold is homeomorphic to the 3-sphere

{ (x, y, z, w) ∈ R^4 : x² + y² + z² + w² = 1 }.

Two manifolds are homeomorphic if there is a continuous bijection between them that has a continuous inverse. For example, a circle and the trefoil knot (see the 1985 entry) are homeomorphic 1-manifolds, even though they cannot be continuously deformed into each other when embedded in R³. On the other hand, the 2-sphere and the Euclidean plane are not homeomorphic: one is compact and the other is not.¹ A particularly down-to-earth explanation of the main difficulty behind the Poincaré conjecture was recalled by Gerry Myerson in 2012 [3]:

I once heard an expert “explain” the difficulty of the n = 3 case to a general audience by saying something like this: when n ≤ 2, there isn’t enough room for anything to go wrong, while for n ≥ 4, there’s enough room to fix anything that goes wrong; for n = 3, there’s enough room for something to go wrong, and. . . it’s not clear whether there’s enough room to fix things when they go wrong.

The cases n = 1 and n = 2 are classical and date back to the foundations of algebraic topology. Stephen Smale proved the conjecture for n ≥ 5 in 1961 and Michael Freedman (1951– ) proved it for n = 4 in 1982. Since both of them received Fields Medals for their work, one can claim that the Poincaré conjecture resulted in either two or three medals, depending upon how one accounts for the enigmatic Perelman (see the comments below).

Although the resolution of the Poincaré conjecture is hopelessly beyond the level of this book and the expertise of its authors, we can discuss its analogue for 2-manifolds. By a surface, we mean a smooth, connected, two-dimensional manifold. Think of this as a nice topological space that locally resembles R² and does not consist of multiple disjoint pieces. For example, a microscopic observer on a torus or Klein bottle will believe their local environment is flat and two dimensional, much as we perceive the ground around us as flat. A surface is closed if it is compact




(a) Sphere

(b) Torus

(c) Klein bottle

(d) Projective plane

Figure 1. Fundamental polygons for several 2-manifolds. More sides may be necessary for more complicated manifolds, such as a sphere with several handles attached.

and has no boundary. For example, the sphere is closed, but the Möbius strip is not since its boundary is a circle (see the 1958 entry). A closed surface can be diagrammed using a fundamental polygon, an even-sided polygon with some of its edges identified; see Figure 1 and the comments for the 1958 entry. A surface is simply connected if every loop on the surface can be contracted to a point without leaving the surface. For example, the sphere is simply connected and the torus is not; see Figure 2.

The analogue of the Poincaré conjecture for 2-manifolds asserts that every simply connected, closed surface is homeomorphic to the sphere. This is a consequence of the classification of surfaces from algebraic topology, which says that every closed surface is homeomorphic to one of the following: (a) the sphere, (b) the connected sum of tori, or (c) the connected sum of real projective planes; see the comments below for information about the connected sum of manifolds. Every surface in the first two classes is orientable; every surface in the third class is nonorientable. Of these, the only simply connected surface is the sphere; this implies the Poincaré conjecture for 2-manifolds.


(a) Every path on the sphere is contractible to a point. Thus, the sphere is simply connected.


(b) Neither of these two paths on the torus is contractible to a point.

Figure 2. The sphere is simply connected and the torus is not.

The problem for this year was originally posed by Frank Morgan of Williams College and it concerned 4-manifolds. However, he felt that the statement was too imprecise to be included here. Instead, we present a simple combinatorial problem with a visual twist that builds upon the comments to the 1980 entry. See below for the solution.

Centennial Problem 2003

Proposed by Stephan Ramon Garcia, Pomona College.

We saw in the 1980 entry that it is impossible to tile, with nonoverlapping 2 × 1 black-and-white dominoes, a chessboard that has two corners removed (while respecting the underlying black-and-white pattern). Is such a tiling possible if two squares of different colors are removed instead (see Figure 3)?

2003: Comments

Perelman’s Fields Medal. Contrary to popular belief, Perelman did not receive the prestigious Fields Medal for his resolution of the Poincaré conjecture. He declined the award and did not even attend the award ceremony:

In May 2006, a committee of nine mathematicians voted to award Perelman a Fields Medal for his work on the Poincaré conjecture. However, Perelman declined to accept the prize. Sir John Ball, president of the International Mathematical Union, approached Perelman in Saint Petersburg in June 2006 to persuade him to accept the prize. After 10 hours of attempted persuasion over two days, Ball gave up. Two weeks later, Perelman summed up the conversation as follows: “He proposed to me three alternatives: accept and come; accept and don’t come, and we will send you the medal later; third, I don’t accept the prize. From the very beginning, I told him I have chosen the third one. . . [the prize] was completely irrelevant for me. Everybody understood that if the proof is correct, then no other recognition is needed.” [9]

In 2010, Perelman also declined the million-dollar prize offered by the Clay foundation (see the comments for the 2000 entry).







Figure 3. Is it possible to tile, with 2 × 1 black-and-white dominoes, a chessboard that has two squares of different colors removed? What if both squares marked “A” are removed? What if both squares marked “B” are removed?

A monoid of manifolds. A monoid is an algebraic structure similar to a group, except that inverses need not exist. To be more specific, a monoid is a set that is closed under an associative binary operation for which an identity element exists. What is the relationship between monoids and surfaces?

Given two surfaces M and N, their connected sum M#N is the surface obtained by removing a disk from each of M and N and then gluing the resulting boundary circles together [10]. One can show that the homeomorphism class of the resulting surface is independent of the location of the excised disks. Let S denote the (two-dimensional) sphere, K the Klein bottle, T the torus, and P the (real) projective plane. The sphere is the identity element for the connected sum operation, in the sense that S#M = M for all surfaces M. This is because if we remove a disk from S, then the resulting surface can be deformed into a disk that takes the place of the disk removed from M.

What about the connected sum of a surface with a torus? In visual terms, M#T is “M with a handle attached.” What does attaching a Klein bottle to a surface mean? If M is orientable, then M#K can be regarded as “M with a handle whose ends are attached to opposite sides of M.” The projective plane P is not orientable, so the notion of “side” is meaningless. This is reflected in the algebraic relation P#T = P#K. One can also show that P#P = K. These computations can be summarized succinctly as follows. The monoid of homeomorphism classes of surfaces is the commutative monoid with



identity S that is generated by P and T, modulo the single relation

P#P#P = P#T.

This identity is called Dyck’s theorem, after Walther von Dyck. The connected sum operation is compatible with the Euler characteristic (see the comments for the 1976 entry) in the following sense:

χ(M#N) = χ(M) + χ(N) − 2.

Since χ(S) = 2, χ(P) = 1, and χ(T) = 0, it follows that

χ(T#T#···#T) = 2 − 2k   (k summands)

and

χ(P#P#···#P) = 2 − k   (k summands).
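These Euler characteristic computations can be checked mechanically. The following is a minimal sketch of ours (the function and dictionary names are not from the book), using only the additivity rule χ(M#N) = χ(M) + χ(N) − 2:

```python
# Sketch: Euler characteristics of connected sums of the four basic surfaces.
# S = sphere, T = torus, P = projective plane, K = Klein bottle.

CHI = {"S": 2, "T": 0, "P": 1, "K": 0}
ORIENTABLE = {"S": True, "T": True, "P": False, "K": False}

def connected_sum_chi(parts):
    """Euler characteristic of parts[0] # parts[1] # ... # parts[-1],
    computed via chi(M # N) = chi(M) + chi(N) - 2."""
    chi = 2  # chi of the sphere, the identity element for #
    for m in parts:
        chi = chi + CHI[m] - 2
    return chi

def is_orientable(parts):
    # A connected sum is orientable iff every summand is orientable.
    return all(ORIENTABLE[m] for m in parts)

# chi(T#...#T) = 2 - 2k and chi(P#...#P) = 2 - k:
assert connected_sum_chi(["T"] * 3) == 2 - 2 * 3
assert connected_sum_chi(["P"] * 5) == 2 - 5
# Dyck's theorem P#P#P = P#T is consistent with chi:
assert connected_sum_chi(["P", "P", "P"]) == connected_sum_chi(["P", "T"])
# The Klein bottle is P#P:
assert connected_sum_chi(["K"]) == connected_sum_chi(["P", "P"])
```

Note that χ alone cannot distinguish T from K; this is exactly why orientability is also needed in the classification.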

Putting this all together, we see that a closed surface is completely determined, up to homeomorphism, by its Euler characteristic and orientability. If a surface is nonorientable, then it is homeomorphic to a connected sum of projective planes. On the other hand, an orientable surface is homeomorphic either to a sphere or to a connected sum of tori. The number of summands, in both cases, can be discerned by computing the Euler characteristic of the given surface [10].

Solution to the problem. The elegant solution to our problem is due to Ralph E. Gomory (1929– ) [1]. Consider the path illustrated in Figure 4. Suppose that two squares of different colors are removed from the board. Then they are separated, in either direction along the path, by an even number of squares, half of which are black and half of which are white. Thus, the desired tiling exists and, moreover, Figure 4 suggests an algorithm to efficiently produce it.

Figure 4. The chessboard can be regarded as a cycle graph of length 64.
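Gomory’s argument is constructive. Below is a sketch of ours (not from the book) that threads one such Hamiltonian cycle through the board, deletes the two squares, and pairs off consecutive cells along the two remaining arcs; each arc has even length precisely because the removed squares have opposite colors.

```python
# Sketch of Gomory's cycle argument as a tiling algorithm.

def hamiltonian_cycle(n=8):
    """A closed rook path through every cell of an n x n board: row 0
    left-to-right, rows 1..n-1 serpentine over columns 1..n-1, then back
    up column 0."""
    cycle = [(0, c) for c in range(n)]
    for r in range(1, n):
        cols = range(n - 1, 0, -1) if r % 2 == 1 else range(1, n)
        cycle += [(r, c) for c in cols]
    cycle += [(r, 0) for r in range(n - 1, 0, -1)]
    return cycle

def tile_with_removed(sq1, sq2, n=8):
    """Domino tiling of the board minus sq1 and sq2 (opposite colors)."""
    assert (sum(sq1) + sum(sq2)) % 2 == 1, "squares must have opposite colors"
    cycle = hamiltonian_cycle(n)
    i, j = sorted((cycle.index(sq1), cycle.index(sq2)))
    # Removing the two cells splits the cycle into two arcs of even length.
    arcs = [cycle[i + 1:j], cycle[j + 1:] + cycle[:i]]
    dominoes = []
    for arc in arcs:
        for k in range(0, len(arc), 2):
            # Consecutive cells on the cycle are adjacent on the board.
            dominoes.append((arc[k], arc[k + 1]))
    return dominoes

tiling = tile_with_removed((0, 0), (3, 4))
print(len(tiling))  # 31 dominoes cover the remaining 62 squares
```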



What happens if we replace the standard 8 × 8 board with an 8 × 9 board? A 9 × 9 board? More generally, when can we tile an m × n board that has two squares of the same color removed?

Bibliography
[1] R. Honsberger, Mathematical Gems I, Mathematical Association of America, 1974.
[2] J. Milnor, Poincaré Conjecture, http://www.claymath.org/millennium-problems/poincare-conjecture.
[3] G. Myerson, Poincaré conjecture for n = 2 (answer), https://math.stackexchange.com/questions/103182/poincare-conjecture-for-n-2.
[4] S. Nasar and D. Gruber, Manifold Destiny: A legendary problem and the battle over who solved it, The New Yorker, https://www.newyorker.com/magazine/2006/08/28/manifold-destiny.
[5] G. Perelman, The entropy formula for the Ricci flow and its geometric applications, https://arxiv.org/abs/math/0211159.
[6] G. Perelman, Ricci flow with surgery on three-manifolds, https://arxiv.org/abs/math/0303109.
[7] G. Perelman, Finite extinction time for the solutions to the Ricci flow on certain three-manifolds, https://arxiv.org/abs/math/0307245.
[8] T. Tao, Perelman’s proof of the Poincaré conjecture: a nonlinear PDE perspective, https://arxiv.org/abs/math/0610903.
[9] Wikipedia, Grigori Perelman, https://en.wikipedia.org/wiki/Grigori_Perelman.
[10] Wikipedia, Surface (topology), https://en.wikipedia.org/wiki/Surface_(topology).


Primes in Arithmetic Progression

Introduction

2004 is another year that witnessed the announcement of two major results, each of which is worthy of a whole entry in its own right. One was the culmination of decades of work by dozens of mathematicians: the classification of finite simple groups. The other is the celebrated Green–Tao theorem, which guarantees the existence of arbitrarily long arithmetic progressions in the primes [8, 17]. Alas, we can choose only one to focus on. However, we do have a few words to say about finite simple groups; see the comments below.

What does the Green–Tao theorem say? It asserts that for any length ℓ, there is an initial prime p and a common difference k so that the length-ℓ arithmetic progression

p, p + k, p + 2k, . . . , p + (ℓ − 1)k

consists entirely of primes. Ben Green and Terence Tao proved this amazing result using a “relative” version of Szemerédi’s theorem (see the 1975 entry). Szemerédi’s theorem tells us that a subset of the natural numbers with positive upper density contains arbitrarily long arithmetic progressions. Unfortunately, the prime numbers have density zero and hence Szemerédi’s theorem does not immediately apply. Green and Tao proved a version of Szemerédi’s theorem that applies to sets of natural numbers that are pseudorandom in a certain technical sense. The final step of their proof is the construction of a pseudorandom subset of the natural numbers that contains the primes as a relatively dense subset. A recent overview of the theorem and its proof is [3].

Can the Green–Tao theorem be used to find arithmetic progressions in the primes? Yes and no. The proof provides numerical bounds that guarantee the existence of such an arithmetic progression in a certain range. However, the numbers produced are so astronomically large that they are well beyond the limit of modern computation. As of mid-2018, the longest known arithmetic progression in the primes has length twenty-six.
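The record-holding progression (given explicitly in the next paragraph) can be verified directly. Here is a sketch of ours, not the software used in the original search: a deterministic Miller–Rabin test whose witness set is known to be correct for all n < 3.3 × 10^24, far beyond the 17-digit terms involved.

```python
# Verify the known 26-term arithmetic progression of primes.

def is_prime(n):
    """Deterministic Miller-Rabin, valid for n < 3.3 * 10**24."""
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in bases:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

terms = [43142746595714191 + 5283234035979900 * k for k in range(26)]
print(all(is_prime(t) for t in terms))  # True
```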
The first such example,

43142746595714191 + 5283234035979900k,   k = 0, 1, 2, . . . , 25,

was discovered in 2010 by Benoît Perichon on a PlayStation 3 equipped with special software produced for the purpose [11, 18].

There are now many generalizations and extensions of the Green–Tao theorem [7, 9, 13–15]. We focus here on one of them that has a particularly nice visual appeal to it [13]. A Gaussian integer is a number of the form a + bi, in which a, b ∈ Z and i² = −1. The set of Gaussian integers forms a ring, denoted Z[i], under the usual operations inherited from the complex number system. A Gaussian prime is



(a) ρ = 50   (b) ρ = 100

Figure 1. Gaussian primes a + bi in the range |a|, |b| ≤ ρ.

a prime in the ring Z[i]. Thus, a nonunit z ∈ Z[i] is a Gaussian prime if, whenever z = xy with x, y ∈ Z[i],

x ∈ {1, −1, i, −i} or y ∈ {1, −1, i, −i}.
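Gaussian primality is easy to test in practice via the standard norm characterization (a + bi with a, b ≠ 0 is a Gaussian prime exactly when a² + b² is an ordinary prime, while ±p and ±pi are Gaussian primes exactly when the ordinary prime p satisfies p ≡ 3 (mod 4)). The following is a small sketch of ours, not from the book:

```python
# Test Gaussian primality of a + bi using the norm characterization.

def is_prime(n):
    """Ordinary primality by trial division (fine for small norms)."""
    if n < 2:
        return False
    k = 2
    while k * k <= n:
        if n % k == 0:
            return False
        k += 1
    return True

def is_gaussian_prime(a, b):
    if a == 0:
        return is_prime(abs(b)) and abs(b) % 4 == 3
    if b == 0:
        return is_prime(abs(a)) and abs(a) % 4 == 3
    return is_prime(a * a + b * b)

assert not is_gaussian_prime(2, 0)   # 2 = (1+i)(1-i)
assert is_gaussian_prime(3, 0)       # 3 is 3 mod 4
assert is_gaussian_prime(1, 1)       # norm 2 is prime
assert not is_gaussian_prime(0, 5)   # 5 = (2+i)(2-i)
```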

For example, 2 is not a Gaussian prime since 2 = (1 + i)(1 − i). One can show that a Gaussian integer is prime if and only if it is of the form ±p or ±pi, in which p ≡ 3 (mod 4) is prime in the usual sense, or if it is of the form a + bi, in which a² + b² is prime in the usual sense; see Figure 1. In 2005, Terence Tao showed that for any distinct v0, v1, . . . , vk−1 ∈ Z[i], there are infinitely many sets {a + rv0, a + rv1, . . . , a + rvk−1}, in which a ∈ Z[i] and r ∈ Z\{0}, all of whose elements are Gaussian primes.

The Green–Tao theorem, along with many other famous theorems and difficult conjectures, follows from the Bateman–Horn conjecture. See the comments for the 2005 entry for more information about this tantalizing conjecture.

Centennial Problem 2004
Proposed by Steven J. Miller, Williams College.

The Green–Tao theorem implies that for each natural number N, there is an even number 2m, in which m depends on N, such that there are at least N pairs of primes whose common difference is 2m. Prove this without appealing to the Green–Tao theorem.

2004: Comments

Four squares in arithmetic progression. The Green–Tao theorem addresses primes in arithmetic progressions. What about perfect squares? The comments to the 1913 entry show how to construct three squares in arithmetic progression. Mathematical folklore credits Fermat with the proof that there does



not exist an arithmetic progression of four perfect squares [6], although the observation is also attributed to Leonhard Euler in 1780 [4]. A proof using Fermat’s method of descent can be found in [16]. The more modern approach to the problem involves elliptic curves. The crux of the matter is that the rational quadruples (a, b, c, d) so that a², b², c², d² form an arithmetic progression can be parametrized by the rational points on the elliptic curve y² = (x − 1)(x − 2)(x + 2). One can show that the curve has only eight rational points, all of which give rise to trivial solutions to the original problem. Consequently, there is no nonconstant arithmetic progression of four rational squares. The details can be found in [4].
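The theorem is easy to probe empirically, even though the proof is not. A brute-force sketch of ours (no relation to the elliptic-curve argument) confirms that no four distinct positive integers below a given bound have squares in arithmetic progression:

```python
# Search for a < b with b^2 - a^2 = c^2 - b^2 = d^2 - c^2 for some squares
# c^2, d^2. Fermat's theorem predicts the search always comes up empty.

def four_square_progressions(N):
    squares = {k * k for k in range(1, 2 * N)}  # covers d^2 < 3*N^2
    found = []
    for a in range(1, N):
        for b in range(a + 1, N):
            step = b * b - a * a
            c2, d2 = b * b + step, b * b + 2 * step
            if c2 in squares and d2 in squares:
                found.append((a * a, b * b, c2, d2))
    return found

print(four_square_progressions(200))  # []
```

Dropping the fourth term shows why the result is sharp: three-term progressions such as 1, 25, 49 exist in abundance.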

Euclid’s theorem revisited. There is a lot that we do not understand about prime numbers. Even Euclid’s theorem (see p. 4) still holds some mystery. Let a1 = 2, the first prime. Then a1 + 1 = 3, which is also prime; set a2 = 3. Next, observe a1a2 + 1 = 7, which is another prime; set a3 = 7. In the next stage, we see that a1a2a3 + 1 = 43, another prime, which we denote by a4. Now things get interesting. Observe that a1a2a3a4 + 1 = 1,807 = 13 · 139; set a5 = 13. In general, let an be the smallest prime in the factorization of a1a2 · · · an−1 + 1 that is not among a1, a2, . . . , an−1. This yields the Euclid–Mullin sequence [5, 10]:

2, 3, 7, 43, 13, 53, 5, 6221671, 38709183810571, 139, 2801, 11, 17, 5471, 52662739, 23003, 30693651606209, 37, 1741, 1313797957, 887, 71, 7127, 109, 23, 97, 159227, 643679794963466223081509857, 103, 1079990819, 9539, 3143065813, 29, 3847, 89, 19, 577, 223, 139703, 457, 9649, 61, 4357, . . . .

Does this sequence contain every prime? Without a major breakthrough in our understanding of prime numbers, this question will likely remain unanswered for many years to come.
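The first several terms are easy to generate. Since a1a2 · · · an−1 + 1 is coprime to every earlier term, the “smallest new prime” in its factorization is simply its smallest prime factor, which the following sketch of ours exploits (later terms quickly become infeasible to factor this way):

```python
# Generate the opening terms of the Euclid-Mullin sequence.

def smallest_prime_factor(n):
    k = 2
    while k * k <= n:
        if n % k == 0:
            return k
        k += 1
    return n  # n itself is prime

def euclid_mullin(num_terms):
    seq, product = [], 1
    for _ in range(num_terms):
        p = smallest_prime_factor(product + 1)
        seq.append(p)
        product *= p
    return seq

print(euclid_mullin(8))  # [2, 3, 7, 43, 13, 53, 5, 6221671]
```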

Classification of finite simple groups. The year 2004 witnessed the completion of the classification of finite simple groups, a decades-long quest. A group is simple if it contains no normal subgroups other than itself and the trivial subgroup (see the 1992 entry for more background). Consequently, a simple group cannot be decomposed further using the quotient group construction. The finite simple groups are the “atoms” from which more complicated finite groups, “molecules” if you will, can be constructed. In contrast to atoms, which come in only a hundred or so varieties, there are infinitely many finite simple groups.



In 1972, Daniel Gorenstein (1923–1992) proposed a sixteen-step program to complete the classification, an odyssey first (unintentionally) begun by Évariste Galois (1811–1832) with his discovery of groups and of two families of finite simple groups. In 2004, Michael Aschbacher (1944– ) and Stephen D. Smith published a massive two-volume book, over a thousand pages in total, that handled the classification of “quasithin groups” [1, 2]. This was the only missing piece in the Gorenstein program and it finally completed the classification of finite simple groups.

The classification theorem states that every finite simple group is isomorphic to one of the following:
(a) a cyclic group of prime order,
(b) an alternating group An with n ≥ 5,
(c) a group of Lie type,
(d) one of the 26 sporadic groups.

There is a lot to unpack here and we can only sketch the details. The alternating group An is the subgroup of Sn, the group of permutations on n symbols, that consists of all even permutations. The groups An are simple if n ≥ 5 (the simplicity of these groups is closely related to the fact that there is no analogue of the quadratic formula for polynomial equations of degree five and higher; see the 1973 entry).

The technical definition of a “group of Lie type” would take us too far afield, so we content ourselves with some broad strokes. Here “Lie” refers to Sophus Lie (1842–1899), whose name is pronounced “Lee.” There are sixteen families of finite simple groups of Lie type, most of which were discovered long ago. Many can be realized as matrix groups over finite fields and several are closely related to exotic Lie algebras. For the sake of illustration, here is one such example. Start with the special linear group SLn(Fq) of all n × n matrices with determinant 1 and entries in the finite field Fq of q elements. The quotient of SLn(Fq) by the subgroup of nonzero multiples of the identity is the projective special linear group PSLn(Fq).
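The orders of these groups follow from standard counting formulas (our sketch below; the formulas are classical facts, not taken from this book): |GL_n(F_q)| = (q^n − 1)(q^n − q) · · · (q^n − q^{n−1}), |SL_n(F_q)| = |GL_n(F_q)|/(q − 1), and |PSL_n(F_q)| = |SL_n(F_q)|/gcd(n, q − 1).

```python
# Compute |PSL_n(F_q)| from the orders of GL_n and SL_n over F_q.
from math import gcd

def psl_order(n, q):
    gl = 1
    for i in range(n):
        gl *= q ** n - q ** i   # count ordered bases of F_q^n
    sl = gl // (q - 1)          # determinant map is onto F_q^*
    return sl // gcd(n, q - 1)  # quotient by scalar matrices of det 1

assert psl_order(2, 5) == 60    # PSL_2(F_5), isomorphic to A_5
assert psl_order(2, 7) == 168   # PSL_2(F_7)
assert psl_order(3, 2) == 168   # PSL_3(F_2), isomorphic to PSL_2(F_7)
```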
If n ≥ 2, then one can show that PSLn(Fq) is a finite simple group of Lie type, except in the two small cases PSL2(F2) and PSL2(F3).

Most interesting are the 26 sporadic groups. (There is another group, named after Jacques Tits (1930– ), that is occasionally regarded as the “27th sporadic group.” However, it is usually considered an unusual group of Lie type.) These are outliers that do not fit neatly into any classification scheme. The sporadic groups are divided into two broad classes: the pariahs and the happy family. The pariahs are not subquotients of the monster group M (see the 1992 entry); that is, a pariah cannot be obtained as a quotient group of some subgroup of M. These are the six vertices that do not have upward paths toward the monster group M in Figure 2. In contrast, all twenty members of the happy family are subquotients of M. They are divided into three “generations,” with the monster group being of the third generation.

No single human in 2004 could comprehend the proof of the classification theorem in its entirety (see the 1976 and 1998 entries for other instances of this phenomenon). It was spread over hundreds of journal articles, written by many dozens of authors, over the course of several decades. Moreover, the final piece of the puzzle was the two-volume book of Aschbacher and Smith, which weighs in at well



over 1,000 pages. A massive effort to compile a complete and largely self-contained proof of the classification theorem is well underway:

In 1981 the monumental project to classify all of the finite simple groups appeared to be nearing its conclusion. Danny Gorenstein had dubbed the project the “Thirty Years’ War” dating its inception from an address by Richard Brauer at the International Congress of Mathematicians in 1954. He and Richard Lyons agreed that it would be desirable to write a series of volumes that would contain the complete proof of this Classification Theorem, modulo a short and clearly specified list of background results. As the existing proof was scattered over hundreds of journal articles, some of which cited other articles that were never published, there was a consensus that this was indeed a worthwhile project. [12]

Figure 2. Table of sporadic groups and their subquotient relationships (groups that are maximal with respect to this relation are circled). The monster group M contains 20 of the sporadic groups as subquotients. Image by user Drschawrz, https://en.wikipedia.org/wiki/File:SporadicGroups.svg, used under the Creative Commons Attribution-Share Alike 3.0 Unported license.

The project is expected to be completed in 2023. Perhaps one day soon the entire proof will be verified by computer.

Solution to the problem. Although we could use the prime number theorem to solve the problem, a weaker result due to Chebyshev suffices. He proved that there are constants A ≈ 0.9212 and B ≈ 1.1055 so that

Ax/log x ≤ π(x) ≤ Bx/log x



for sufficiently large x, in which π(x) denotes the prime-counting function. Suppose that x is even and large enough for Chebyshev’s estimate to hold. Then the number of distinct pairs of primes (p, q) with 2 < p < q ≤ x is

(π(x) − 1)(π(x) − 2)/2 > π(x)²/3 ≥ A²x²/(3 log²x).

Since the number of possible even differences between primes at most x is bounded above by x/2, the average number of occurrences of each difference is at least

(A²x²/(3 log²x)) / (x/2) = 2A²x/(3 log²x),   (2004.1)

which tends to infinity. At least one of these differences occurs at least the average number of times. Given N, let x be an even number that is large enough to ensure that Chebyshev’s estimates are valid and that the right-hand side of (2004.1) is larger than N. Then there is a common difference 2m for which at least N pairs of primes (p, q) with q − p = 2m exist.

Bibliography
[1] M. Aschbacher and S. D. Smith, The classification of quasithin groups. I, Structure of strongly quasithin K-groups, Mathematical Surveys and Monographs, vol. 111, American Mathematical Society, Providence, RI, 2004. MR2097623
[2] M. Aschbacher and S. D. Smith, The classification of quasithin groups. II, Main theorems: the classification of simple QTKE-groups, Mathematical Surveys and Monographs, vol. 112, American Mathematical Society, Providence, RI, 2004. MR2097624
[3] D. Conlon, J. Fox, and Y. Zhao, The Green–Tao theorem: an exposition, EMS Surv. Math. Sci. 1 (2014), no. 2, 249–282, DOI 10.4171/EMSS/6. https://arxiv.org/abs/1403.2957. MR3285854
[4] K. Conrad, Arithmetic progressions of four squares, http://www.math.uconn.edu/~kconrad/blurbs/ugradnumthy/4squarearithprog.pdf.
[5] R. Crandall and C. Pomerance, Prime Numbers: A Computational Perspective, Springer-Verlag, New York, 2001. MR1821158
[6] L. E. Dickson, History of the Theory of Numbers. Vol. II, Diophantine Analysis, reprinted by AMS, 1992.
[7] J. Fox and Y. Zhao, A short proof of the multidimensional Szemerédi theorem in the primes, Amer. J. Math. 137 (2015), no. 4, 1139–1145, DOI 10.1353/ajm.2015.0028. MR3372317
[8] B. Green and T. Tao, The primes contain arbitrarily long arithmetic progressions, Ann. of Math. (2) 167 (2008), no. 2, 481–547, DOI 10.4007/annals.2008.167.481. http://arxiv.org/abs/math.NT/0404188. MR2415379
[9] B. Green and T. Tao, Linear equations in primes, Ann. of Math. (2) 171 (2010), no. 3, 1753–1850, DOI 10.4007/annals.2010.171.1753. MR2680398
[10] On-Line Encyclopedia of Integer Sequences, A000945 (Euclid–Mullin sequence: a(1) = 2, a(n + 1) is smallest prime factor of 1 + a(1)a(2) · · · a(n)), https://oeis.org/A000945.
[11] On-Line Encyclopedia of Integer Sequences, A204189 (Benoît Perichon’s 26 primes in arithmetic progression), https://oeis.org/A204189.
[12] R. Solomon, The classification of finite simple groups: a progress report, Notices Amer. Math. Soc. 65 (2018), no. 6, 646–651. https://www.ams.org/journals/notices/201806/rnoti-p646.pdf. MR3792856
[13] T. Tao, The Gaussian primes contain arbitrarily shaped constellations, J. Anal. Math. 99 (2006), 109–176, DOI 10.1007/BF02789444. https://arxiv.org/abs/math/0501314. MR2279549
[14] T. Tao and T. Ziegler, The primes contain arbitrarily long polynomial progressions, Acta Math. 201 (2008), no. 2, 213–305, DOI 10.1007/s11511-008-0032-5. MR2461509



[15] T. Tao and T. Ziegler, A multi-dimensional Szemerédi theorem for the primes via a correspondence principle, Israel J. Math. 207 (2015), no. 1, 203–228, DOI 10.1007/s11856-015-1157-9. MR3358045
[16] A. van der Poorten, Fermat’s Four Squares Theorem, https://arxiv.org/abs/0712.3850v1.
[17] Wikipedia, Green–Tao theorem, https://en.wikipedia.org/wiki/Green-Tao_theorem.
[18] Wikipedia, Primes in arithmetic progression, https://en.wikipedia.org/wiki/Primes_in_arithmetic_progression.


William Stein Developed Sage

Introduction

A lot of mathematical software, such as Mathematica (see the 1988 entry) and Maple, is closed source. This means that the actual nuts and bolts of the algorithms and implementations are hidden from the user. For example, the Mathematica command Fibonacci[n] almost instantly returns the nth Fibonacci number. But what is going on under the hood? Is the program using the definition of the Fibonacci numbers? Probably not; that would be painfully slow. Is it using something along the lines of Binet’s formula (see the comments for the 2001 entry)? Possibly. Perhaps Mathematica uses something altogether different and much more clever. We simply do not know because the source code is not publicly available.

Without publicly available source code, it is difficult for a researcher to verify that a program does exactly what it claims. Are the results accurate? Are the algorithms correctly implemented? With closed-source programs, one must simply trust that the programmers knew what they were doing and got things right.

In early 2005, William A. Stein (1974– ) released Sage (Software for Algebra and Geometry Experimentation) in response to these issues; see Figure 1. Although it is now called SageMath, the goal remains the same [4]:

The goal of the Sage project is to create a viable open source alternative to Magma, Maple, Mathematica, and MATLAB, which are all closed source. This means that people have choice—they at least have the option to use open source software for their math research and teaching in all the academic areas represented by those software. Providing such a choice entails both implementing all relevant algorithms in Sage (with competitive efficiency and correctness), and creating corresponding textbooks and documentation.
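To illustrate the Fibonacci question above: one plausible fast method (a sketch of ours; we do not know what Mathematica actually does) is the fast-doubling scheme, which computes F(n) in O(log n) multiplications using the identities F(2k) = F(k)(2F(k+1) − F(k)) and F(2k+1) = F(k)² + F(k+1)².

```python
# Fast-doubling Fibonacci: far faster than iterating the definition.

def fibonacci(n):
    def fib_pair(k):
        """Return the pair (F(k), F(k+1))."""
        if k == 0:
            return (0, 1)
        a, b = fib_pair(k // 2)        # a = F(m), b = F(m+1), m = k // 2
        c = a * (2 * b - a)            # F(2m)
        d = a * a + b * b              # F(2m+1)
        return (d, c + d) if k % 2 else (c, d)
    return fib_pair(n)[0]

print(fibonacci(10), fibonacci(100))
# 55 354224848179261915075
```

The point is exactly the one the text makes: with open source, anyone can read the code and check that the identities are implemented correctly.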

Figure 1. Sage logo. Image courtesy of Alex Clemesha and Harald Schilly.




Figure 2. Collection of four screenshots showing Sage in the following situations: a command-line terminal (text-only), jupyter notebook (interactive document), “sage cell” (an online service to run a block of code), and CoCalc (virtual online environment for computations, showing a “Sage Worksheet”). Image courtesy of Harald Schilly.

SageMath features a web-based interface that lets the user harness the power of dozens of open-source packages and perform computations across the spectrum of pure and applied mathematics; see Figure 2. Computations can be performed locally or remotely on a SageMath server.

Although SageMath is used by many mathematicians around the world, Stein faced enormous difficulty obtaining funding. Unlike commercially available software, SageMath does not bring in revenue and, in fact, it did not have a single full-time developer until 2016 [11]. Most of the software development was carried out by volunteers, mostly students and working mathematicians or computer scientists. In a 2018 interview, Stein said [4]:

My perspective with Sage has always been to try to make a tool that people could use to compute mathematical objects more easily, with minimal friction. They should not have to pay a lot of money, they should have full access to readable source code, and have many good code examples that definitely work.



Although Stein has stepped back a bit from development work on SageMath (he is now the CEO of SageMath, Inc. and focuses mostly on its cloud-computing platform, CoCalc), progress continues unabated [4]:

Sage development proceeds at a steady pace, with many Sage Days workshops in both the US and Europe; for example IMA [Institute for Mathematics and its Applications] in Minnesota is sponsoring many workshops this year and OpenDreamKit in Europe too! Most work on Sage is motivated by the needs of research mathematicians for their own work. Releases keep happening, and around 100 people contribute to each release.

Centennial Problem 2005 Proposed by Steven J. Miller, Williams College. Go to the SageMath homepage http://www.sagemath.org/index.html, download SageMath or sign up for an online account, and see what it can do!

2005: Comments

The Bateman–Horn conjecture. On the theme of numerical computation and hot on the heels of last year’s entry (the Green–Tao theorem), we embark upon one of the final running threads in this book: the Bateman–Horn conjecture. Like the Riemann hypothesis (see the 1942 and 1987 entries) and the abc-conjecture (see the 1981 entry), the Bateman–Horn conjecture has many far-reaching consequences and remains unproven. The material below, and much more, can be found in the recent expository article [1].

The conjecture stems from a 1962 summer undergraduate research project at the University of Illinois at Urbana-Champaign. Paul T. Bateman (1919–2012), an analytic number theorist who joined the university in 1950, sponsored the project and employed a promising young student, Roger A. Horn (1942– ). In 1963, they used the ILLIAC (Illinois Automatic Computer), the first computer built and owned by a US-based academic institution, to run some computations concerning the distribution of prime numbers. Needless to say, they did not use Sage, Mathematica, or any other software that the modern user might recognize. The programs were entered on paper tape and fed into the machine by a dedicated operator via a noisy mechanism. An attached printer could produce output at the modest rate of ten characters per second.

Among other computations, Bateman and Horn found the 776 primes p ≤ 113,000 for which p² + p + 1 is also prime. This computation, which took 400 minutes on the state-of-the-art ILLIAC, was performed on the first named author’s late-2013 iMac in a tenth of a second. How times have changed! These sorts of computations, along with previous conjectures of Bunyakovsky (1854), Dickson (1904), Landau (1912), Hardy and Littlewood (1923), and Schinzel (1958), pointed toward a grand conjecture about the asymptotic distribution of primes generated



by families of polynomials [2, 3]:

Bateman–Horn conjecture. Let f1, f2, . . . , fk ∈ Z[x] be distinct irreducible polynomials with positive leading coefficients and let

Q(f1, f2, . . . , fk; x) = #{n ≤ x : f1(n), f2(n), . . . , fk(n) are prime}.   (2005.1)

Suppose that f = f1f2 · · · fk does not vanish identically modulo any prime. Then

Q(f1, f2, . . . , fk; x) ~ (C(f1, f2, . . . , fk) / ∏_{i=1}^{k} deg fi) ∫_2^x dt/(log t)^k,   (2005.2)

in which

C(f1, f2, . . . , fk) = ∏_p (1 − 1/p)^{−k} (1 − ωf(p)/p)   (2005.3)

and ωf(p) is the number of solutions to f(x) ≡ 0 (mod p).

Consequences of the Bateman–Horn conjecture include the Green–Tao theorem, the prime number theorem, Dirichlet’s theorem on primes in arithmetic progressions, and the twin prime conjecture. It also explains Euler’s enigmatic “prime producing” polynomial and the mysterious Ulam spiral (see the comments for the 2006, 2007, and 2009 entries). We will return to the Bateman–Horn conjecture several times over the remaining entries and explain some of these exciting connections.

Why is this conjecture plausible? (See [1, Sect. 3] for a detailed heuristic derivation of the Bateman–Horn conjecture, based upon the Cramér random model of the primes; see also the comments for the 1987 entry.) First of all, the many hypotheses ensure that there is no simple “obstruction” that prevents the polynomials f1, f2, . . . , fk from simultaneously assuming prime values infinitely often. For example, x² − 1 = (x − 1)(x + 1) is reducible and hence factors nontrivially if x ≥ 3. Another obstacle is illustrated by x³ − x + 3, which is irreducible but always divisible by 3 since x³ − x + 3 ≡ x³ − x = x(x − 1)(x + 1) ≡ 0 (mod 3).

We expect that higher-degree polynomials assume prime values less frequently over a given range. This tendency manifests itself in the denominator ∏_{i=1}^{k} deg fi of (2005.2). The integral in (2005.2) is reminiscent of the logarithmic integral that we encountered in our study of the prime number theorem (see the 1933 entry). The power of the logarithm reflects the fact that additional polynomials drive down the frequency of arguments for which the polynomials simultaneously attain prime values. Finally, the Bateman–Horn constant (2005.3) that appears in (2005.2) is a correction factor that takes into account information about how the f1, f2, . . . , fk behave modulo each prime. The fact that the infinite product (2005.3) converges is not at all obvious. The proof is quite delicate and involves elements of both algebraic and analytic number theory; see [1, Sect. 5] for the details.
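The ILLIAC computation described earlier is now trivial to reproduce. The following is a sketch of ours using a deterministic Miller–Rabin test (valid for all n < 3.3 × 10^24, comfortably covering p² + p + 1 for p ≤ 113,000); the expected count of 776 is the figure reported in the text.

```python
# Count primes p <= 113000 with p^2 + p + 1 also prime.

def is_prime(n):
    """Deterministic Miller-Rabin, valid for n < 3.3 * 10**24."""
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in bases:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

count = sum(1 for p in range(2, 113001)
            if is_prime(p) and is_prime(p * p + p + 1))
print(count)  # 776, matching the 1963 ILLIAC run
```

What took 400 minutes in 1963 now runs in a moment on any laptop.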



Before moving on, we should say something about Roger A. Horn, a collaborator of the first named author on a recent linear algebra textbook [5]. The following passage is from [1, Sect. 5]: Horn is known best for his long and storied career in matrix analysis. Among his chief publications are the classic texts Matrix Analysis [7] and Topics in Matrix Analysis [8], both coauthored with Charles Johnson. Of his many papers, only two are on number theory; both of these date from the early 1960s and concern the Bateman–Horn conjecture [2, 3]. Consequently, many of his close colleagues are unaware of his connection to a famous conjecture in number theory.

Indeed, the first named author only became aware of Horn’s involvement in the conjecture because of his recent work on primitive roots for twin primes [6].

Bibliography
[1] S. L. Aletheia-Zomlefer, L. Fukshansky, and S. R. Garcia, The Bateman–Horn conjecture: heuristics, history, and applications, to appear in Expositiones Mathematicae, https://arxiv.org/abs/1807.08899.
[2] P. T. Bateman and R. A. Horn, A heuristic asymptotic formula concerning the distribution of prime numbers, Math. Comp. 16 (1962), 363–367, DOI 10.2307/2004056. MR0148632
[3] P. T. Bateman and R. A. Horn, Primes represented by irreducible polynomials in one variable, Proc. Sympos. Pure Math., Vol. VIII, Amer. Math. Soc., Providence, R.I., 1965, pp. 119–132. MR0176966
[4] A. Diaz-Lopez, William Stein interview, Notices Amer. Math. Soc. 65 (2018), no. 5, 540–543. MR3753815
[5] S. R. Garcia and R. A. Horn, A Second Course in Linear Algebra, Cambridge Mathematical Textbooks, Cambridge University Press, 2017.
[6] S. R. Garcia, E. Kahoro, and F. Luca, Primitive root biases for twin primes, Experimental Mathematics (in press), https://arxiv.org/abs/1705.02485.
[7] R. A. Horn and C. R. Johnson, Matrix Analysis, 2nd ed., Cambridge University Press, Cambridge, 2013. MR2978290
[8] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis, corrected reprint of the 1991 original, Cambridge University Press, Cambridge, 1994. MR1288752
[9] Sage, http://www.sagemath.org/.
[10] W. Stein, Mathematical software and me: a very personal recollection, http://sagemath.blogspot.com/2009/12/mathematical-software-and-me-very.html.
[11] Wikipedia, SageMath, https://en.wikipedia.org/wiki/SageMath.


The Strong Perfect Graph Theorem

Introduction

Let G be a graph. The chromatic number χ(G) of G is the smallest number of colors needed to paint the vertices of G so that no pair of adjacent vertices have the same color. The clique number ω(G) of G is the size of the largest induced complete subgraph in G, that is, the size of the largest subset of vertices of G, all of which are connected to each other. Since a complete graph Kn on n vertices satisfies χ(Kn) = n, it follows that χ(G) ≥ ω(G) for any graph; see Figure 1. In principle