A Decade of the Berkeley Math Circle: The American Experience [2] 0821846833, 9780821846834

Many mathematicians have been drawn to mathematics through their experience with math circles: extracurricular programs


English Pages 326 [372] Year 2008


Table of contents :
Dedications
Contents
Foreword
Introduction
Session 1. Geometric Re-Constructions. Part I
Session 2. Rubik’s Cube. Part II
Session 3. Knotty Mathematics
Session 4. Multiplicative Functions. Part I
Session 5. Introduction to Group Theory
Session 6. Monovariants. Part II
Session 7. Geometric Re-Constructions. Part II
Session 8. Complex Numbers. Part II
Session 9. Introduction to Inequalities. Part I
Session 10. Multiplicative Functions. Part II
Session 11. Monovariants. Part III
Session 12. Geometric Re-Constructions. Part III
Epilogue
Symbols and Notation
Abbreviations
Biographical Data
Bibliography
Credits
Index


Mathematical Circles Library

A Decade of the Berkeley Math Circle
The American Experience, Volume II
Zvezdelina Stankova, Tom Rike, Editors

MATHEMATICAL SCIENCES RESEARCH INSTITUTE
AMERICAN MATHEMATICAL SOCIETY


Advisory Board for the MSRI/Mathematical Circles Library: Titu Andreescu, David Auckly, Hélène Barcelo, Alissa S. Crans, Zuming Feng, Tony Gardiner, Kiran Kedlaya, Nikolaj N. Konstantinov, Silvio Levy, Andy Liu, Walter Mientka, Bjorn Poonen, Alexander Shen, Tatiana Shubin (Chair), Zvezdelina Stankova, Ravi Vakil, Ivan Yashchenko, Paul Zeitz, Joshua Zucker

2010 Mathematics Subject Classification. Primary 00–01, 00A07; Secondary 00A08.

This volume is published with the generous support of the Simons Foundation and Tom Leighton and Bonnie Berger Leighton.

For additional information and updates on this book, visit www.ams.org/bookpages/mcl-14

Library of Congress Cataloging-in-Publication Data A decade of the Berkeley Math Circle : the American experience / Zvezdelina Stankova, Tom Rike, editors. p. cm. — (MSRI mathematical circles library ; v. 1–) Includes bibliographical references and index. ISBN 978-0-8218-4683-4 (alk. paper) 1. Mathematics—Study and teaching (Middle school)—California—San Francisco Bay Area. 2. Mathematics—Study and teaching (Secondary)—California—San Francisco Bay Area. 3. Berkeley Math Circle. I. Stankova, Zvezdelina, 1969– II. Rike, Tom, 1943– III. Mathematical Sciences Research Institute (Berkeley, Calif.) QA13.5.C22S363 2008 510.7127946—dc22 2008030521

Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting for them, are permitted to make fair use of the material, such as to copy select pages for use in teaching or research. Permission is granted to quote brief passages from this publication in reviews, provided the customary acknowledgment of the source is given. Republication, systematic copying, or multiple reproduction of any material in this publication is permitted only under license from the American Mathematical Society. Permissions to reuse portions of AMS publication content are handled by Copyright Clearance Center’s RightsLink service. For more information, please visit: http://www.ams.org/rightslink. Send requests for translation rights and licensed reprints to [email protected]. Excluded from these provisions is material for which the author holds copyright. In such cases, requests for permission to reuse or reprint material should be addressed directly to the author(s). Copyright ownership is indicated on the copyright page, or on the lower right-hand corner of the first page of each article within proceedings volumes.

© 2015 by Zvezdelina Stankova and Thomas Rike
Printed in the United States of America.

∞ The paper used in this book is acid-free and falls within the guidelines established to ensure permanence and durability.

Visit the AMS home page at http://www.ams.org/
Visit the MSRI home page at http://www.msri.org/

To Zvezda’s husband, Dmitri, and to Tom’s wife, Peggy, for making this book possible, for their infinite patience and love over the course of two years of hard work, and . . . to the instructors of the Berkeley Math Circle for donating their time and effort over the last decade, for leading inspiring sessions full of mathematical challenges and wonders, and for sharing their passion for mathematics with the circlers. Zvezda Stankova and Tom Rike

Contents

Foreword
Introduction
  1. Top-Tier Math Circles
  2. Why, What, and for Whom?
  3. Notation and Technicalities
  4. The Art of Being a Mathematician and Problem Solving
  5. Acknowledgments

Session 1. Geometric Re-Constructions. Part I
  1. Experimenting and Conjecturing
  2. A Triangle Workout
  3. Walking Along an Optimal Path
  4. Walking Along an Integer Grid
  5. To Prove or to Take for Granted?
  6. Hints and Solutions to Selected Problems

Session 2. Rubik’s Cube. Part II
  1. What Is a Group?
  2. Permutation Groups and Group Isomorphisms
  3. Properties of Groups and Their Subgroups
  4. Even and Odd Worlds
  5. How Many Cube Positions Can Be Reached?
  6. Conclusions
  7. Hints and Solutions to Selected Problems

Session 3. Knotty Mathematics
  1. A Knot, or Not a Knot. That Is the Question.
  2. Reidemeister and Knot-Eating Machines
  3. Three Crayons Defeat an Army of Knots
  4. The Jones Polynomial
  5. Is This the End?
  6. Hints and Solutions to Selected Exercises

Session 4. Multiplicative Functions. Part I
  1. Infinite Raffle: the Initial Setup
  2. What are Multiplicative Functions?
  3. Sum-Functions
  4. Hints and Solutions to Selected Problems

Session 5. Introduction to Group Theory
  1. Puzzling It Out
  2. A Polynomial Prelude
  3. Action Groups
  4. General Groups
  5. Some More Examples of Groups
  6. Permutation (or Symmetric) Groups
  7. The 15-Puzzle Puzzled Out
  8. Hints and Solutions to Selected Problems

Session 6. Monovariants. Part II
  1. Numerical Monovariants
  2. Constructive Activities
  3. Not Getting There
  4. Conway’s Checkers
  5. Hints and Solutions to Selected Problems

Session 7. Geometric Re-Constructions. Part II
  1. Optimal and Infinite Challenges
  2. A Pythagorean Path for the Intermediate
  3. Physics and Math Combine Forces
  4. Ptolemy’s Lead into Trigonometry
  5. Hints and Solutions to Selected Problems

Session 8. Complex Numbers. Part II
  1. Warning, “Teaser,” and Strategy
  2. Conventions from the Past
  3. Complex Division
  4. The Triangle Inequality: No “Respect” for Addition?
  5. Integer Powers in C
  6. Roots in C
  7. Roots of Unity and Regular Polygons
  8. Geometric Promise Fulfilled
  9. Venturing Everywhere in the Plane
  10. Which are the “Closest” Lines
  11. Hints and Solutions to Selected Problems

Session 9. Introduction to Inequalities. Part I
  1. The Language of Inequalities
  2. Arithmetic Mean – Geometric Mean Inequality
  3. Power Mean Inequality
  4. The Land of the Convex
  5. Applications of Convexity to Inequalities
  6. Geometry Leftovers and a Mean Summary
  7. Hints and Solutions to Selected Problems

Session 10. Multiplicative Functions. Part II
  4. Dirichlet Product
  5. Möbius Inversion Formula
  6. The Euler Function φ(n)
  7. The Taming of the ShrewD φ
  8. Hints and Solutions to Selected Problems

Session 11. Monovariants. Part III
  1. The Balkan Roots Challenge
  2. Smoothing and Unsmoothing
  3. Rearranging Terms
  4. Convexity and Smoothing
  5. Random Fun with Smoothing
  6. Appendix on Limits and Endless Smoothing
  7. Hints and Solutions to Selected Problems

Session 12. Geometric Re-Constructions. Part III
  1. Farmer-and-Cow via Inequalities and Calculus
  2. Optimal Bridge Located!
  3. Infinitely Many Angles and Infinite Series
  4. Historical Detour: from Today back to Archimedes?
  5. Hints and Solutions to Selected Problems

Epilogue
  1. What Comes from Within
  2. The Culture of Circles
  3. Eastern European vs. USA Math Circles
  4. History and Power
  5. Does the U.S. Need Top-Tier Math Circles?

Symbols and Notation
Abbreviations
Biographical Data
Bibliography
Credits
Index

Foreword

When I came to the Mathematical Sciences Research Institute, MSRI, as Director in 1997, the Institute already had an extraordinary and distinguished history of research programs. Bill Thurston, my predecessor, felt strongly that MSRI had both an opportunity and an obligation to build on its research excellence a structure that would promote mathematics and its applications in other ways, among them public engagement and education, and he had started several programs for this purpose. I was very much in agreement with this point of view, as was the then Deputy Director, Hugo Rossi.

In 1998 a new opportunity appeared in the form of a postdoc: Zvezdelina Stankova, freshly at Berkeley after a PhD at Harvard. Zvezda (as we all learned to call her) came to our offices, telling us that the U.S. mathematical community was missing a major opportunity to encourage youngsters to love and be inspired by mathematics: Math Circles, a program long popular in the Eastern Bloc countries (and in particular in Bulgaria, Zvezda’s former home), was nearly unheard of here. Zvezda proposed that we get a math circle going in Berkeley, and perhaps try to spread the tradition.

This turned out to be a most rewarding project, and a large group was soon engaged. Aside from Zvezda, Paul Zeitz (Professor of Mathematics at the University of San Francisco and coach of the winning American team at a recent International Mathematics Olympiad), Tatiana Shubin (Professor of Mathematics at San Jose State University, who brought her passion and experience of math circles from the former Soviet Union), and Tom Davis (an applied mathematician at Silicon Graphics) joined forces to offer after-school math programs that were advanced, challenging, fun, and beautiful. We soon learned of others who were also passionate about developing math circles, including Mark Saul in New York and Bob and Ellen Kaplan in Boston.

It is one thing to start an individual math circle, another to start a national movement, but the latter was always something we hoped for. When Mark Saul was a Program Director at the Templeton Foundation, MSRI received a grant from the Foundation to help start the Math Circles Library, in partnership with the American Mathematical Society. This volume is the latest in that series, all of which can be found, for example, at library.msri.org/msri-mcl/index.html or ams.org/bookstore/mclseries.


We also began a National Math Circles Association (see mathcircles.org), an organization that now has more than 100 adherent Circles and makes small grants to those who would like to start one. The work has received further support from Tom Leighton and the Akamai Foundation, the Simons Foundation, the Moscow Center for Continuous Mathematics Education, the American Mathematical Society, and of course from MSRI’s donors and Trustees. To all of these we are most grateful!

And what do the Math Circles actually do? I can do no better than to quote the description by Robert Bryant, from his foreword to the first volume in the Math Circles Library, which appeared in 2008:

And the students came, first in Berkeley and San Jose, then in San Francisco, Stanford, Oakland, and Davis, with open minds and willing hearts to learn about mathematics of which they previously could not have dreamt. They worked on problem solving with faculty who had a depth of understanding and a love of mathematics beyond anything they had ever encountered. There were no attendance lists, no tests, no being forced to do anything. The beauty of the mathematics attracted them to want more and more. We have come to learn that many students, girls and boys, of different socio-economic and racial backgrounds thrive in math circles and come to love the experience. Both professors and students find in math circles a situation that is rare in classroom mathematics instruction. Math circles are voluntary, extra-curricular, after-school programs. The students who are there are much less likely to be motivated by the need to satisfy an academic requirement, prepare for a career, or enhance a resumé. They are, for the most part, there because they love mathematics. The teachers encounter students who are willing and hungry to learn, while the students encounter teachers with expertise and enthusiasm far beyond the usual classroom experience. Teachers and students look forward with anticipation to the next meeting.

The Math Circles Library in which this volume appears is just one facet of the support that MSRI provides to the National Association of Math Circles (NAMC); MSRI also supports the position of Director of the NAMC, arranges funding for mini-grants to begin new Circles, and helps organize and fund workshops that help train new Circle leaders. For all this see the NAMC website, www.mathcircles.org, where one can also find lists of Circles in different neighborhoods and additional resources. MSRI provides all this support because the Math Circles have proven such an effective way of sparking an enthusiasm for mathematics in young minds!

David Eisenbud
Director
Mathematical Sciences Research Institute

Introduction

“The Berkeley Math Circle was really critical in my development. It was the best method available not only to get a flow of mathematical ideas and problems to think about each week but also to meet other interested students and professional mathematicians from all over the Bay Area. You get stimulation from exchanging ideas with other people that you don’t get from reading books at home. I can also testify to the usefulness of studying mathematics even for students who don’t plan on doing it as a career. For someone who wants to go into, say, law, policy analysis, philosophy, economics, or computer science, the kind of logical, abstract thinking that mathematics develops is really the best preparation. I realize that the Circle is most interested in attracting students whose lifelong passion is for mathematics, but it also helps others along the way.”

Gabriel Carroll, BMC alumnus
Perfect IMO ’01 score
Four-time Putnam Fellow
Assistant Professor of Economics, Stanford

1. Top-Tier Math Circles¹

This book is based on material from a dozen of the 800 sessions of the Berkeley Math Circle (BMC), held over the past 16 years. BMC has been described as a top-tier math circle, calling for the following two definitions.

1.1. Math circles are weekly math programs that attract elementary, middle, and high school students to mathematics by exposing them to intriguing and intellectually stimulating topics, rarely encountered in classrooms. Math circles vary in their organization, styles of sessions, and goals. But they all have one thing in common: to inspire in students an understanding of and a lifelong love for mathematics.

¹ Based on contributions from Marc Whitlow and Mike Breen (BMC Parents), Zvezdelina Stankova (BMC Director), and Tatiana Shubin (SJMC Director).


1.2. Top-tier math circles prepare our best young minds for their future roles as mathematics leaders. Sessions are taught by accomplished mathematicians and explore advanced mathematical areas. They provide an educational opportunity for top pre-college mathematics students, not offered in any other setting in the U.S. education system. In addition to learning advanced mathematics topics, students are taught the technical writing skills needed to convey the solutions of complex problems.

As an example of a top-tier math circle, the Berkeley Math Circle is fashioned after the leading models in Eastern Europe, where math circles originated over a century ago. BMC itself started in the fall of 1998 with about 50 students, primarily in grades 7-12, and there was only one session per week that lasted 2 hours. Sixteen years later, the circle has expanded to about 300 students in grades 1-12, split into two major groups:
• BMC-Upper with 3 levels: BMC-Beginners for 5th-6th grades (1.5 hours per week); BMC-Intermediate for 7th-8th grades (2 hours per week); and BMC-Advanced for 9th-12th grades (2 hours per week). BMC-Upper is directed by Zvezdelina Stankova.
• BMC-Elementary with 2 levels: BMC-Elementary I for 1st-2nd grades (3 sections, 1 hour per week); and BMC-Elementary II for 3rd-4th grades (3 sections, 1 hour per week). BMC-Elementary is directed by Laura Givental.

This book series is based on sessions from BMC-Upper and from the original BMC, when there was only one group for all. To save space, “BMC” throughout this book will refer, for the most part, to materials, instructors, and students from “BMC-Upper.”

Like top-tier universities, BMC
• challenges students with beautiful, difficult mathematical theories,
• introduces them to powerful problem-solving techniques,
• constantly provokes deep thought, and
• inspires the creation of original ideas.

Topics covered at BMC include combinatorics, graph theory, linear algebra, geometric transformations, recursive sequences, series, set theory, group theory, number theory, elliptic curves, algebraic geometry, applications to computer science, natural sciences, economics, and many more. Each topic is taught by an expert in the field who has the ability to challenge the students and support them as they attempt to meet these challenges. All problems require students to come up with mathematical proofs. Proofs put forward by the students are not always the most eloquent. Only an accomplished mathematician can understand where a student might be heading in his/her proof and offer assistance through this challenge.²

² For examples of noteworthy past and present instructors who have brought their world expertise to BMC, see the Epilogue.


The sessions are fast-paced and intellectually demanding. It is hard to convey just how advanced this subject matter is without actually attending a session; but comparable levels can be found in advanced undergraduate and beginning graduate courses.

The Monthly Contests (MC) at BMC can also convey the depth of the material. These are take-home exams of four or five hard, thought-provoking problems, requiring independent research, split into two levels: MC-Beginners (up to grade 8) and MC-Advanced (up to grade 12). In the beginning years of BMC, the monthly contests were designed and graded by UCB faculty. However, for the last 14 years the MC have been designed and coordinated by current and former circlers. The MC develop not only advanced understanding, but also technical writing skills: the students must describe on paper, convincingly and without gaps, how they solved a problem. This is a fundamental skill and key to making intellectual property contributions; it is a unique feature of the top-tier math circles, not found in middle or high schools, where students are taught to meet state standards on questions that take less than a minute to answer. In contrast, monthly contest problems may take the best students hours or days of concentrated thought. Only a few participants are capable of solving all the problems; yet, through the attempt everyone learns about the real world of mathematical research.

1.3. The next generation of math leaders. The students of BMC come from a variety of socio-economic and ethnic backgrounds. The proportion of female to male students is approximately 2:3. This is an amazingly high ratio considering the trend of other high-level math programs, which are “male-dominated” or “male-only.” Excellent role models for the female students are provided by the female directors of the top-tier math circles in Berkeley [11], San Jose [71], Los Angeles [47], and (formerly of) Marin Math Circle [52]; but perhaps even more important to the students are the outstanding lectures given by dozens of female professors and graduate students.

Currently, BMC does not actively recruit participants. Students and their parents find out about the circles by word of mouth, from the Circle’s web site, http://mathcircle.berkeley.edu/, through local universities, and in publications. Due to an increased number of applicants, there is a semi-formal selection process based on several open essay-type questions along the lines of:
• Describe your mathematical background and experiences so far.
• Why do you want to join BMC? What do you expect from BMC?
• What is your favorite math problem that you can solve? State and solve the problem. Why is it your favorite?
• What is your favorite math problem that you cannot solve? State the problem and explain why you cannot solve it but why you would like to solve it.


Needless to say, BMC students are usually years ahead of their peers: they often complete most of high school mathematics by age 13 (8th grade), some take many college math major courses by the time they graduate from high school, and a few of the top circlers venture into graduate courses and serious mathematical research even before entering a university. The accomplishments of students who have benefited from BMC can be measured in many ways. For example, a number of these students have gone on to win International Math Olympiad medals and Putnam awards, and the majority have been admitted to top-tier universities. BMC and the other top-tier math circles not only produce highly accomplished students – they produce and train the next generation of leaders in mathematics.³

³ To learn about the need for top-tier math circles, we direct the reader to the Epilogue.

2. Why, What, and for Whom?

Running BMC for 16 years has taught us a lot about math education in the U.S. and has helped us to understand better our own childhood education and the origins of our passion for mathematics. To share this experience with you, the reader, is the purpose of this book:
• to present you with beautiful theories, problem-solving techniques, and mathematical insights;
• to provide you with an abundance of exercises and problems to work on and with ready materials for math circle sessions.

2.1. The middle or high school student who is interested in expanding his/her math horizons and going well beyond anything that the regular math classroom can offer, who is brave enough to tackle non-trivial math ideas and work on hard problems for hours, who loves challenges and is motivated to overcome them: this is the ideal reader of the book. Don’t confuse the above description with “top” or “brilliant” students: you will never know if you are talented in math unless you give it a try. And you may be pleasantly surprised by what you find out: that mathematics is a whole lot more than “adding fractions,” “algebraic manipulations,” or “endless quadratic equations” in homework assignments. You will discover that Calculus is not the “pinnacle” of mathematical knowledge (as thought by many): it is only one of many beginnings, part of the subject of real analysis. Indeed, other wonderful topics are awaiting you (cf. Fig. 1, p. xviii):
• multiplicative functions in number theory;
• knot theory in topology;
• Rubik’s Cube and groups in abstract algebra;
• interaction between geometry, trigonometry, physics, and Calculus;
• complex numbers arising from algebra and applied to geometry;
• game theory and inequalities attacked by monovariants; and
• plenty of proof methods and problem-solving techniques.


2.2. Prerequisites. To read the book comfortably, you do not need to have Calculus under your belt, except
• in the very last section of Session 12 on plane geometry, which discusses a series solution to a geometric question, or
• if you want to prove the cited theorems in Session 9 on inequalities.

However, familiarity with basic geometry and algebra concepts and theorems will definitely be helpful; e.g., lines, circles, triangles, rectangles, trapezoids, and quadrilaterals in general; similarity criteria for triangles and the Pythagorean Theorem; equal alternate interior angles for parallel lines and bisecting diagonals in a parallelogram; integers, divisibility and remainders; operations on fractions and real numbers, intervals and sets of numbers; and manipulations of algebraic expressions written with letters. In some sessions, functions will play a major role; hence having studied some basic (pre-calculus) examples will not hurt; e.g., linear and quadratic functions, polynomials, exponential and trigonometric functions, as well as their graphs.

The above concepts will be re-introduced via examples in the book. But if you feel that you need more solid background, we direct you to several wonderful books that should be part of any budding mathematician’s library:
• Geometry, Book 1 by Kiselev [32],
• Functions and Graphs [27], The Method of Coordinates [28], Sequences, Combinations, Limits [31], Algebra [30] and Trigonometry [29] by Gelfand, et al.,
• for the older reader, 103 Trigonometry Problems from the Training of the USA IMO Team by Andreescu and Feng [5].

2.3. The logical structure of the book series (volumes I and II) is outlined in Figure 2 on page xviii. A solid arrow indicates that a session requires its “predecessor” to be studied beforehand, while a dashed arrow indicates that the “predecessor” will be helpful but is not absolutely necessary. For example, in order to understand Rubik’s Cube II, one should first study Rubik’s Cube I; on the other hand, Rubik’s Cube I-II will make Group Theory I more concrete, but they are certainly not mandatory. Sessions that are bubbled in an ellipse can be attempted without any prerequisites, while sessions encompassed in a rectangle have at least one necessary predecessor. For example, Monovariants II calls for a prior study of Monovariants I, while Knot Theory can be attacked with little reference to other sessions. Sessions not enclosed in anything are from volume I. Finally, there is a group of sessions that pertain to general proof methods, PSTs, and theory that appear in most other places. These sessions are from volume I and are roughly grouped in the two nebulous “clouds”: Proofs I-II, PSTs, and Induction in one “cloud,” and Number Theory I and Combinatorics I in another “cloud.” Figure 2 captures some, but certainly not all, relations among the sessions and topics. The reader is welcome to search for and draw more arrows, as he/she goes through the book.

Figure 1. Main Areas in Volumes I-II

Figure 2. Logical Structure of the 24 Sessions in Volumes I-II

2.4. The middle or high school teacher who wishes to start a math circle in his/her school or teach a specially designed problem-solving class will find this book series invaluable. To start with, five sessions from volume I are a must for any math circle, as they provide techniques and a foundation for solid mathematical understanding; these are Combinatorics I, Number Theory I, Proofs I-II, and Induction. Five of the topics in volume II are introductory and independent of each other; e.g., Geometric Re-Constructions I, Knot Theory, Group Theory I, Multiplicative I, and Inequalities I. Towards the end, some of these contain harder material suitable for intermediate level and the second-to-third year of a math circle. Four other sessions obviously need to be introduced after studying their earlier counterparts; e.g., Geometric Re-Constructions II, Rubik’s Cube II, Monovariants II, and Complex Numbers II. The remaining three sessions are designed truly for the advanced reader: Multiplicative II, Monovariants III, and the last section of Geometric Re-Constructions III. Open questions or problems beyond the scope of the book are interspersed throughout the book and should be left to the die-hards.

Running a math circle, especially for a teacher, is a hard task. But it is possible. In the 1960’s, Tom Rike (an editor for this book and a veteran high school math teacher) was working on his master’s degree. While browsing in the library one day, he ran across The USSR Olympiad Problem Book [74]. It contained problems written for talented 7th–10th graders; yet, he could not solve any of these “elementary” problems. In his own words: “My abstract algebra had been too abstract, and I did not have the concrete examples that I needed. I never took a class in number theory because it sounded too elementary. I had developed the real number system starting from the Peano axioms, but I didn’t really understand the fundamentals of the natural numbers, prime numbers. This was an epiphany for me. I felt as though I had been challenged by some force outside me and did not know how to respond.”

For the next 30 years Tom studied olympiad problem solving, first on his own, then through workshops and math circles in the SF Bay Area. He ran his own math circle at Oakland High School and gave talks at just about all other circles around. Even though at times he was only “a few pages” ahead of the students, he kept on learning and teaching problem solving because working on math circles had come to be a large part of his life: “Although I have not attained my goal of becoming a true olympiad problem solver, the journey I have made in pursuit of this goal has been one of the most rewarding endeavors in my life.”

Hence, a word to the middle and high school teachers: keep on reading the book, despite moments of difficulty or confusion. For the motivated, persevering, and caring teacher, there will come a time when he/she will look back at the material here, smile, and effortlessly deliver it to the students at his/her own math circle. Truly gratifying.


2.5. Proofs in particular. That proofs are important goes without question in the mind of Galileo’s father: “It appears to me that those who rely simply on the weight of authority to prove any assertion, without searching out the arguments to support it, act absurdly. I wish to question freely and to answer freely without any sort of adulation. That well becomes any who are sincere in the search for truth.”



In volume I we learned a variety of proof methods: by contradiction, Pigeonhole Principle, and induction; by counterexample, example, or general argument; using invariants or monovariants, and others. All sessions in volume II call for rigorous proofs. Although it is possible to get the gist of the sessions without being familiar with proofs, reviewing first the sessions on Proofs and Induction in volume I will make it faster and easier to read and understand this book.

2.6. The parent of a middle or high school student is also among our intended audience; in fact, parents are probably the most important readers because without their support and enthusiasm, without them bringing and encouraging their children, there would hardly be any top-tier math circles in the U.S. Hence, if you are among those parents or if you are a parent new to the math circle movement, this book series will provide a very strong beginning for your child. And for you as well. As a parent, you can do three things with this book: give it to your child (but make sure that he/she has the necessary background – see the recommended basic books); learn from it and teach your child; or give it to his/her math teacher and encourage the founding of a school-based math circle. Whatever path you choose to follow, it will eventually benefit your child and possibly a larger group of classmates. In any case, enjoy the book!

3. Notation and Technicalities

“Philosophy is written in this grand book, the universe, which stands continually open to our gaze. But the book cannot be understood unless one first learns to comprehend the language and read the characters in which it is written. It is written in the language of mathematics, and its characters are triangles, circles, and other geometric figures without which it is humanly impossible to understand a single word of it; without these one is wandering in a dark labyrinth.”
Galileo

3.1. Marginalia. In addition to geometric “characters,” we will also use a number of other symbols from algebra and logic. Let us examine first the non-standard margin icons which appear throughout the book.

[Margin icons: Warm-up or brute force; Exercise; Problem; Open question or one that requires extra knowledge; Problem-solving technique; Warning; Contradiction; Basic Pigeonhole Principle; Generalized Pigeonhole Principle; Basis step; Inductive step; Strong basis step; Strong inductive step.]

The first four margin pictures refer to increasing difficulty of exercises and problems. Assigning such symbols is somewhat arbitrary since the same exercise could be easy for one person and could be a really hard problem for another; something may be beyond the knowledge of the reader early in the book, while later it may turn out to be a piece of cake. Thus, treat these symbols as a general guide to the difficulty of the material and make your own judgment after having attempted each problem. The problem-solving techniques, indicated by an eye, are ubiquitous throughout the book and will be discussed in the next section. The warning road sign, the high-voltage symbol, and the pigeons were introduced in Proofs I in volume I. The last four margin pictures refer to the steps of basic and strong mathematical induction, the basis for Session 6 in volume I.

3.2. Logic. Mathematical statements that are proven are referred to by standard names such as theorem, lemma, proposition, property, or corollary. Conjectures are statements that are believed to be true, but no proof for them has been supplied yet. As opposed to volume I, in this book we will not avoid the formal definition environment; likewise, theorems and such will often be phrased formally. All sessions have a section on Hints and Solutions to Selected Problems. There and throughout the text, you will see two symbols indicating the end of a solution. The standard square □ indicates the end of a complete solution or a proof with minor gaps, which are usually mentioned and the reader is expected to easily fill them in. The diamond ♦ is at the end of an incomplete solution, partial proof, sketch of a proof, hint, or any discussion requiring more work by the reader to reach a complete proof. The text uses standard mathematical words and expressions, such as “implies,” “therefore,” “if then,” “only if,” “if and only if,” letter notations for various sets of numbers, e.g., Z for the integers, and many others. Even though some are explained and illustrated via examples, the reader is expected to be familiar with basic logic notions and notation (cf. the list of Symbols and Notation on page 321). If you need to review or learn this material in depth, we refer you to the first chapter of Jacobs’ Geometry [43] on deductive reasoning. A complete list of Abbreviations can be found on page 325.

3.3. Labeling and future volumes. Subfigures within the same figure are implicitly labeled in alphabetical order. For example, Figure 4 on page 9 contains subfigures Figure 4a, 4b, 4c, and 4d, reading from left to right. Finally, about half of the sessions are parts of series of sessions, to be continued in Volume III of the book.

4. The Art of Being a Mathematician and Problem Solving

“Perhaps I can best describe my experience of doing mathematics in terms of a journey through a dark unexplored mansion. You enter the first room of the mansion and it’s completely dark. You stumble around bumping into the furniture, but gradually you learn where each piece of furniture is. Finally, after six months or so, you find the light switch, you turn it on, and suddenly it’s all illuminated. You can see exactly where you were.”
Sir Andrew John Wiles

There are no manuals on how to become a mathematician. This book will give you tips and will point to possible paths; but the “art of being a mathematician” can be mastered only through personal experience. With every problem solved and every new definition or theorem learned, you will move closer to this goal. The two most important skills that you will acquire along the way are
• to think creatively while still “obeying the rules” and
• to make connections between problems, ideas, and even theories.



4.1. Problem-solving techniques. Although all sessions in this book are based on basic knowledge from middle and high school and are, therefore, accessible to a wide range of ages and mathematical backgrounds, to do the exercises, you need to develop problem-solving techniques (PSTs). Session 1 on inversion in volume I introduced PSTs as part of a trilogy of mathematical knowledge: Concepts, Theorems, and PSTs; and throughout this book you will encounter about 100 PSTs. You will also need to learn how to fit together various mathematical parts in order to move forward in the solutions.

4.2. Muddying your hands. Do not expect each session to be a collection of clearly spelled out recipes leading to instantaneous solutions . . . . Nope! The book will encourage you to apply the newly acquired knowledge to problems and will guide you along the way but will rarely give you ready answers. “The best way to learn is to learn from your own mistakes,” said my advisor Joe Harris. A number of places in the book will present common problem-solving pitfalls, and alternative ways to solve the same problem. And so, it will be you, the reader, who has to commit to mastering the new math theories and techniques by
• “muddying your hands” in the problems,
• going back and reviewing necessary PSTs and theory, and
• persistently moving forward in the book.

Nothing good comes “for free”: you will have to work hard, always with a pencil and paper in hand. Keep in mind that the math world is huge: you’ll never know everything, but you’ll learn where to find things, how to connect and use them. The rewards will be substantial.

5. Acknowledgments

5.1. Institutional support and sponsors. The Berkeley Math Circle was made possible through the years with the unwavering support of:
• University of California at Berkeley Math Department, which hosts the Circle and its web site and has provided student assistants and secretarial support every year since 1999. Through faculty grants, Ivan Matić has been able to act as an associate director. The department chairs Cal Moore, Hugh Woodin, Ted Slaman, Alan Weinstein, and Arthur Ogus have always been encouraging and supportive, and several dozen UCB professors have delivered Circle sessions.
• Mathematical Sciences Research Institute, which from its inception has overseen the project, provided funds through various sponsors, and hosted Circle meetings and events. Special thanks to Deputy Directors Hugo Rossi, Joe Buhler, Michael Singer, Bob Megginson, and Hélène Barcelo, Directors David Eisenbud and Robert Bryant, and Associate Director David Auckly for their leadership, understanding, and help.

A number of sponsors have financially supported BMC over the years: Packard Foundation, Toyota Foundation, Clay Mathematics Institute, Mosse Foundation for Art and Education, Merriam-Webster Foundation for the Scripps National Spelling Bee; National Science Foundation and other grants from Professors Ravi Vakil (Stanford), Bjorn Poonen, Alexander Givental, and Martin Olsson (UC Berkeley), and generous private donors.

5.2. Parents and students. BMC parents have encouraged and driven their kids to the Circle for years, brought snacks during the breaks, organized Circle parties, attended meetings, and donated time, effort, and personal funds to the Circle. We are especially grateful to Marc Whitlow, Mike Breen, Jennifer O’Dorney, Yuki Ishikawa, Ian Brown, and Tony DeRose for their enthusiasm, leadership, and professional services provided so selflessly to the Circle.


A sequence of UC Berkeley student assistants have contributed to the smooth operation of the Circle by communicating with circlers, parents, instructors, and administrators and by re-designing and maintaining the web site. Joyce Yeung, Maksim Maydanskiy, Wycee de Vera, William Chen, David Wertheimer, Michael Pejic, Stephanie Tung, and Hojae Lee have been exceptionally professional and caring. Many thanks go to our monthly contest coordinators: Professors Alexander Givental and Bjorn Poonen, circlers Gabriel Carroll, Andrew Dudzik, Inna Zakharevich, Neil Herriot, Maksim Maydanskiy, Evan O’Dorney, Evan Chen, and former associate director Ivan Matić.

5.3. Professional support with the web site has been rendered on numerous occasions by Paulo de Souza, Dmitri Mironov, Steve Sizemore, and Igor Savine. Marsha Snow, Barbara Peavy, and Tom Brown have offered valuable secretarial support over the years. BMC owes its logo design to Archer Design, Inc. As one can see, many dozens of people have been involved in running the Berkeley Math Circle: it is a joint operation born of the love and care for our young generation of mathematicians. The most important people in this operation are undoubtedly the BMC instructors (over 100), who have delivered the 800 sessions during the last 16 years. We would like to thank all of them! Twelve instructors joined BMC in the beginning and most have stayed with us throughout the years: Ted Alper, Tom Davis, Dmitry Fuchs, Alexander Givental, Quan Lam, Bjorn Poonen, Tom Rike, Vera Serganova, Tatiana Shubin, Zvezdelina Stankova, Paul Zeitz, and Joshua Zucker.

5.4. Book support. Edward Dunne, our AMS editor, and his staff have been very helpful in resolving technical and other issues. Gabriel Carroll is responsible for drawing some of the cartoons in the book series, inspired by the earlier BMC sessions. All USAMO problems are used with permission from the American Mathematics Competitions (AMC), Lincoln, Nebraska [2]. A few pictures and references have been taken from Wikipedia at wikipedia.org/.

With gratitude,
Zvezdelina Stankova
Berkeley Math Circle Director

Session 1

Geometric Re-Constructions. Part I
Along Optimal Paths and Integer Grids

Zvezdelina Stankova

Sneak Preview. Volume I introduced us to three geometry topics: circle geometry, mass point geometry, and inversion in the plane. To different degrees they all assumed some familiarity with the theory and techniques of classical geometry. In this session, we will start filling in the missing geometric background, motivated by two tantalizing problems: adding up angles in a triplet of squares and finding the shortest path on a hot summer day. Both problems will be easy to understand but certainly not easy to conquer. In our search for solutions, we will intelligently conjecture by physical experimentation; boldly re-create by reflections or grid extensions; and convincingly prove by the criteria for congruence and similarity. Along the way, we will justify two well-known geometry theorems: about centers of parallelograms and triangles, and we will briefly dip into the history and logical foundations of geometry: Euclidean and hyperbolic. All employed techniques will be accessible to ages 10+. Yet, the originality of the approaches will be gratifying for anyone seeing these problems for the first time. In Part II we will continue exploring other, more advanced but perhaps not as innocently exciting, solutions to our problems and their extensions.

1. Experimenting and Conjecturing

The signature problems of this session are two of my favorite plane geometry problems. After decades of subsequent advanced math studies, they still remain crystal clear in my memory . . . to remind me of the wonder I experienced when I first saw them as a 5th-grader back in Bulgaria [12]. As our first step toward solving them, we will experiment and decide if our answers constitute a mathematical proof or not. It is absolutely necessary to bring aboard for this journey some graph paper, scissors, clear tape, a flexible but not stretchable cord, a pin, and of course, a pencil and a straightedge. Highly recommended are a compass and completely prohibited are calculators and other electronic equipment. We will depend only on basic tools and on our unlimited imagination.

1. RE-CONSTRUCTIONS. PART I

1.1. Cutting, taping, and guessing. Our first problem has an almost century-long history. One of its solutions presented here resembles a truly famous, almost mystical 2000-year old puzzle leading way back to Archimedes!¹

Problem 1. (Three Squares) Three identical squares with bases AM, MH, and HB are put next to each other to form a rectangle ABCD (cf. Fig. 1a). What is the sum of the angles ∠AMD + ∠AHD + ∠ABD?

[Figure 1. Experimenting on the three squares]



The problem is asking us to find something – an angle. In such situations, people would give you the answer they believe is correct and, more often than not, would think that they are done, without having actually proven anything! But the reader who has gotten this far in the book series knows that the solution should consist of at least two parts: (1) investigating and conjecturing, and (2) formally proving the conjecture. Alas, sometimes even just coming up with the correct answer is already a challenge. For example, when I encountered this problem as a 5th-grader, it wasn’t at all obvious to me what the sum of the three angles had to be . . . . So, how was I to start on a problem when it was unclear what I was supposed to be proving?

PST 1. If physical experimentation is not too difficult, then do it in order to discover some possible answers to a problem.

Since conjecturing does not require any proof, just about anything is allowed as “experimentation,” as long as you follow the rules of the problem (and don’t hurt anyone!). Figure 1b is more than suggestive:

Exercise 1. Draw the 3-squares problem on a graph paper, cut out the three angles, and tape them to each other to form a single angle sum: the three vertices will become one and some adjacent arms will coincide too. How large do you think this angle sum will be? Estimate it.

If Figure 1c were drawn to show this resulting angle sum, it would have given away the answer too easily. Now, of course, due to errors in the physical experimentation, no two final angle sums will be absolutely the same. Nevertheless, they will all look suspiciously close to a very well-known angle . . . .

¹ See the Historical Appendix in Part II for an explanation of this startling reference.

Every time I ask the BMC-Beginners to complete this experiment, it always produces the same emotional outcome: a number of students shout out that the sum-angle is about 89.9°, others are adamant that it is slightly obtuse; and yet, upon voting, the majority are convinced that “It has to be 90°!” And when I ask how they know, some students

Exercise 2. Pull out a protractor, measure, and add the three angles.

And hence a second physical experiment is performed, with its own error of measurement, despite how much one might rely on his/her own protractor. In fact, if you do it yourself, you will likely discover that only one of the three angles measures easily and nicely (which one?), and the other two angles yield seemingly random non-integer degrees . . . . As a result, this experiment might prove to be even less precise than the first one with the scissors! One thing, though, should be clear by now – if the problem has a nice answer suitable for a 5th-grade solution (albeit, from a Bulgarian geometry math circle book!), then that answer must be:

Conjecture 1. The three angles add up to a right angle.

As a middle school student, I knew three ways to prove this conjecture:²

Idea 1: A bold and truly brilliant solution that re-creates the “missing” half of the picture by an original extra construction. A bright 5th-grader will understand this solution, as it uses only very elementary technical tools such as congruent triangles and a couple of special plane figures that everyone knows. But it is unlikely (although not impossible) that the bright 5th-grader, or even the most seasoned problem-solver, will be able to come up with such an amazing solution out of nowhere. ♦

Idea 2: A 7th-grade solution using similar triangles and the Pythagorean Theorem, which only partially illuminates the reason behind the 90°-sum. ♦

Idea 3: A standard and boring but fast 8th-grade solution via trigonometry, which does not explain why the result really is what it is.

The first challenge has been served. You should try on your own to solve the problem in at least one way. The picture on the left contains color-coded hints for all three different ways. We will re-create the 5th-grade solution in this session and come back to the other solutions in Part II.

² . . . that is, until I saw the 54 proofs in [82]! Check out the History Appendix in Part II.
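For readers who would like a quick numerical sanity check of Conjecture 1 before hunting for a proof, here is a minimal Python sketch. It is not a proof, and it cheerfully breaks the no-calculator rule. It assumes unit squares, so that in the right triangles AMD, AHD, and ABD the leg AD has length 1 and the horizontal legs have lengths 1, 2, and 3; the function name `three_square_angles` is our own invention.

```python
import math

def three_square_angles():
    """Angles AMD, AHD, ABD in degrees, assuming unit squares.

    Each angle sits in a right triangle with vertical leg AD = 1
    and horizontal leg of length 1, 2, or 3.
    """
    return [math.degrees(math.atan2(1, k)) for k in (1, 2, 3)]

angles = three_square_angles()
total = sum(angles)
# Print each angle rounded to three decimals, then the sum.
print([round(a, 3) for a in angles], round(total, 6))
```

Only one of the three printed angles is a whole number of degrees, which matches the protractor experience described in Exercise 2, and the sum agrees with a right angle up to floating-point error.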


1.2. Pinning, stretching, and sliding. Here is another popular math problem from folklore, a favorite in math circles in Eastern Europe and around the world.

Problem 2. (Farmer & Cow) During a hot summer day, a farmer and a cow find themselves on the same side of a river. The farmer is 2 km from the river and the cow is 6 km from the river. If each of them walks straight to the river, they will be 4 km from each other. Unfortunately, the cow has a broken leg. The farmer must get to the river, dip his bucket there, and take the water to the cow. To which point ? on the riverbank should the farmer walk so that his total path is as short as possible?

If you draw several possible paths for the farmer, measure, and add, you will get an idea as to where the optimal place will be along the river. You may even want to organize all data in a neat table. There is, however, a simple physical experiment that can help you arrive at a conjecture faster:

Exercise 3. Take a flexible (but not stretchable) string or cord; pin one end at the farmer’s position; with your right-hand fingers loosely hold the other end at the cow’s position; and with a pencil (or your left-hand fingers) stretch the cord until it touches the river. Then start sliding the pencil along the river, accordingly loosening or tightening at the cow’s position to keep the cord in two straight segments. Which place along the river needed the least amount of cord?

Sort of an answer: As you move the pencil (or your left-hand fingers), you will discover a place X at the river, to the left and to the right of which you will need to loosen the cord in order for the pencil to stay along the river. If the farmer and the cow walked straight to the river to points A and B, respectively, then how long is AX? Since different pictures will be drawn with different scales, a more appropriate question might be to approximate the ratio AX : BX. ♦

[Figure: the farmer F, 2 km from the river at A; the cow C, 6 km from the river at B; and a point X on the riverbank.]

As with the 3-squares problem, upon performing this experiment, the BMC-Beginners split in their predictions; some claim that AX ≈ 0.9 km and for some it looks like AX ≈ 1.05 km, while the majority suspect that the exact answer must be a nice round number:

Conjecture 2. The farmer should go to a place X on the river so that AX = 1 km; i.e., AX : BX = 1 : 3.
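The “neat table” of measured paths and the string experiment of Exercise 3 can be imitated numerically. The sketch below is only an illustration under our own assumptions: the river is the x-axis, A = (0, 0), B = (4, 0), the farmer stands at (0, 2), the cow at (4, 6), and the helper names `path_length` and `best_point` are hypothetical. Like the physical experiment, it suggests a conjecture and proves nothing.

```python
import math

def path_length(x, farmer=2.0, cow=6.0, gap=4.0):
    """Length of the path farmer -> (x, 0) on the river -> cow, in km."""
    return math.hypot(x, farmer) + math.hypot(gap - x, cow)

def best_point(steps=4000, gap=4.0):
    """Try equally spaced points X between A and B; keep the shortest path."""
    candidates = [gap * i / steps for i in range(steps + 1)]
    return min(candidates, key=path_length)

x_best = best_point()
# AX for the best sampled point, and the corresponding total path length.
print(x_best, path_length(x_best))
```

With a step of 1 m along the bank, the best sampled point lands at AX = 1 km to within the grid spacing, in line with the majority vote of the BMC-Beginners.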

There are at least three ways to attack the problem:

Idea 4: A clever idea is to reduce the problem to a trivial but equivalent version by an extra construction that (again!) re-creates the “missing” half of the picture. A bright 5th-grader will be able to follow the logic of the solution, if she is familiar with basic geometry tools such as the Triangle Inequality and similar triangles, and experienced in manipulating fractions and solving simple linear equations. ♦

Idea 5: Take a “leap-of-faith” and apply a fundamental law of physics and its consequence that we observe every day.

Idea 6: The standard, technically-loaded calculus solution with derivatives or a proof with inequalities, which give us no better explanation of why the answer is what it is other than “This is how the calculations work out.”

The second challenge has been served. Incidentally, the picture of the sun looking into the mirror is an indirect and direct hint for two of the ideas.

1.3. The grand design. For the rest of this session we will build the necessary elementary geometry background and discuss the creative and nontrivial 5th-grade solutions to our two overarching problems. At the end, we will briefly look into the logical foundation of our plane geometry studies. In Part II, we will continue building sophisticated geometry and some technical trigonometry background in order to complete the remaining suggested solutions, generalize their methods to other more advanced problems, and finally, go out of our “comfort zone” and see beyond the 3-squares problem and possibly into the origins of trigonometry millennia ago. If you feel you are already fortified with enough plane geometry background and the two overarching problems are not challenging enough, you can skip to the historical section at the end of the session. However, be aware that the solutions we will discuss here are purely geometric (a.k.a. synthetic) and, arguably, these are the most beautiful solutions; they can be potentially created by bright middle schoolers with little technical background and open minds. And hence, they are worth experiencing.

2. A Triangle Workout

Triangles make up any polygonal shape: if you haven’t done this before, just cut any polygon that happens to be lying around along several of its non-intersecting diagonals, until you are left with only triangles. Triangles also appear on their own everywhere in geometry and in everyday life. We will definitely need them to solve all of our problems in this session!

Therefore, it is important to answer some fundamental questions about them. Even if you remember your geometry lessons from middle and high school, browse through this section to double-check if it identically “reflects” your knowledge about triangles.

2.1. To be or not to be alike? The two main questions here are:

Question 1. When are two triangles the “same,” i.e., all of their corresponding angles and sides are equal³ in size? This is formally known as congruent triangles and is denoted by the symbol ≅.

Question 2. When do two triangles look “alike,” i.e., their corresponding angles are equal in size and their sides are proportional? This is formally known as similar triangles and is denoted by the symbol ∼.

Suppose we want to show the congruence △ABC ≅ △A1B1C1 (cf. Fig. 2a). Do we need to verify all six conditions for sides and angles:
• a = a1, b = b1, c = c1; and α = α1, β = β1, γ = γ1?

[Figure 2. Congruent or Similar: △A1B1C1 ≅ △ABC ∼ △A2B2C2]

The same question goes for similar △ABC ∼ △A2B2C2 (cf. Fig. 2b). Will it be overkill to verify all five conditions for their sides and angles:
• a/a2 = b/b2 = c/c2, and α = α2, β = β2, γ = γ2?

PST 2. To show that two triangles are congruent or similar, use a criterion for congruence/similarity of triangles, which will require you to verify the minimum number of conditions: typically only three for congruence and two for similarity. These will be sufficient to imply the remaining conditions on sides and angles and will guarantee the congruence/similarity of the triangles.



Here is a table with 5 standard criteria for congruence of triangles and their counterparts for similarity. The way each criterion works is as follows: verify that the elements listed in the table under the criterion for one triangle are equal to the corresponding elements for the other triangle, and then all other corresponding elements of the two triangles will follow suit.

³ We shall be sloppy and say “equal” for sides and angles to mean that they have the same measure; this is formally referred to as congruent sides and congruent angles.

Congruence Criterion | Similarity Criterion
(SAS) Two sides and included angle: a = a1, b = b1, γ = γ1. | (RA) Ratio of two sides and included angle: a/b = a2/b2, γ = γ2.
(ASA) Two angles and the included side: α = α1, β = β1, c = c1. | (AA) Two angles: α = α2, β = β2.
(SSS) Three sides: a = a1, b = b1, c = c1. | (RR) Two ratios of two sides: a/b = a2/b2, b/c = b2/c2.
(SsA) Two sides and the angle opposite the longer side: a = a1, α = α1, b = b1. | (R A) Ratio of two sides and the angle opposite the longer side: a/b = a2/b2, α = α2.
(HL) The hypotenuse and a leg in a right triangle. | (H/L) Ratio of the hypotenuse and a leg in a right triangle.

Table 1. Congruence and similarity criteria for triangles⁴

Examples. ASA criterion (cf. Fig. 3a) asks us to check that, say,

AB ≟ A1B1, ∠ABC ≟ ∠A1B1C1, and ∠BAC ≟ ∠B1A1C1.

Similarly, according to RR, two ratios of sides in △ABC must be equal to two ratios of sides in △A2B2C2, e.g., BC/CA ≟ B2C2/C2A2 and CA/AB ≟ C2A2/A2B2 (cf. Fig. 3b). This can be written in an equivalent but more memorable way as follows:

AB/A2B2 ≟ BC/B2C2 ≟ CA/C2A2.
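The RR check just described is mechanical enough to sketch in a few lines of Python. This is only an illustration under assumptions of our own: side lengths are given as positive integers, corresponding sides are matched by sorting them by length, and the function name `similar_by_side_ratios` is hypothetical. Exact fractions are used so that no rounding creeps into the comparison of ratios.

```python
from fractions import Fraction as Fr

def similar_by_side_ratios(sides1, sides2):
    """RR criterion: compare two ratios of sides in each triangle.

    Sorting pairs up the shortest, middle, and longest sides, so the
    ratios compared are between corresponding sides.
    """
    a, b, c = sorted(sides1)
    a2, b2, c2 = sorted(sides2)
    return Fr(a, b) == Fr(a2, b2) and Fr(b, c) == Fr(b2, c2)

# A 3-4-5 triangle against its double, and against a 3-4-6 triangle.
print(similar_by_side_ratios((3, 4, 5), (6, 8, 10)))
print(similar_by_side_ratios((3, 4, 5), (3, 4, 6)))
```

Note that only two ratios are tested; as PST 2 promises, the third ratio and all three angle equalities then follow on their own.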

[Figure 3. ASA and RR criteria]

Exercise 4. For each criterion, draw a relevant picture of two triangles that are congruent (or similar), as in Figure 3, label their vertices, mark the sides or angles (or ratios) that are supposed to be equal, and write down (in letter notation) the conditions that are satisfied by the criterion.

2.2. Reaping the benefits of congruence and similarity is what one ordinarily does after establishing that triangles are congruent or similar: one concludes that the remaining sides, angles, or ratios of sides are equal and uses these facts in whatever way necessary. We formulated and applied this PST in Inversion I (vol. I). We now again demonstrate it, by walking through two famous basic theorems that were used but not proven in Inversion I.

⁴ The similarity criteria RA and RR are known by the names SAS∼ and SSS∼.


2.2.1. Center of a parallelogram. Let’s first agree on what a parallelogram is.

Definition 1. A quadrilateral with two pairs of parallel (opposite) sides is called a parallelogram.

Often a parallelogram is defined in a different way:

Definition 1′. A quadrilateral with two pairs of equal opposite sides is called a parallelogram.

Are the two definitions of a parallelogram equivalent? To answer this (indeed, to be able to prove anything about parallelograms!), we need the following fact from plane geometry:

Theorem 1. (Alternate Interior Angles) When a transversal⁵ intersects two parallel lines, the eight resulting angles are grouped into two quadruples of equal angles; in Figure 4a one such quadruple is α = α′ = β = β′, where the angles α and β are called alternate interior angles. Conversely, if alternate interior angles α and β formed by two lines and a transversal are equal (as in Fig. 4a), then the two lines are parallel.

[Figure 4. Parallel lines, Sides and Diagonals in a parallelogram]

Returning to the equivalence of the two definitions, let us show one direction. If AB||CD and BC||DA then the marked alternate interior angles on Figure 4b are equal: α = α′ and β = β′. Combining this with the common side AC enables us to apply ASA and conclude that △ACB ≅ △CAD. As a consequence of this congruence, we have AB = CD and BC = DA: the opposite sides of a parallelogram are indeed equal! Thus, Definition 1 implies Definition 1′. ♦

For the other direction, another congruence criterion is needed:

Exercise 5. If a quadrilateral has pairs of equal opposite sides, then any two opposite sides are parallel.

We are now ready to state the famous theorem that generated the discussion here about parallelograms. Its proof is left for the reader, especially since an almost explicit hint about it is definitely somewhere on this page.

Theorem 2. (Parallelogram’s Center) The diagonals of a parallelogram bisect each other (cf. Fig. 4c).

⁵ A line that intersects two lines in different points is called a transversal.
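Before proving Theorem 2 with congruent triangles, one can at least test it on an example in coordinates. The Python sketch below is a check on one sample parallelogram, not the synthetic proof the text has in mind. The vertex coordinates and the helper names `parallelogram` and `midpoint` are our own assumptions, and integer coordinates are assumed so that the midpoints come out as exact fractions.

```python
from fractions import Fraction as Fr

def parallelogram(a, b, d):
    """Vertices A, B, C, D, with C chosen so that AB || DC and AD || BC."""
    c = (b[0] + d[0] - a[0], b[1] + d[1] - a[1])
    return a, b, c, d

def midpoint(p, q):
    """Exact midpoint of segment PQ (integer coordinates assumed)."""
    return (Fr(p[0] + q[0], 2), Fr(p[1] + q[1], 2))

A, B, C, D = parallelogram((0, 0), (5, 1), (2, 4))
# If the diagonals bisect each other, these two midpoints coincide.
print(midpoint(A, C), midpoint(B, D))
```

The midpoints of diagonals AC and BD coincide for this sample, as the theorem predicts; the proof that this always happens is still left to the reader.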


2.2.2. Center of a triangle. A parallelogram has a unique undisputed center: the intersection of its diagonals. However, in general, a triangle has many different “centers,” depending on what you define its center to be.⁶ Below we locate one such center, using our similarity criteria multiple times.

Definition 2. In a triangle, a median is a segment connecting a vertex with the midpoint of the opposite side. The point where the three medians intersect is called the centroid of the triangle.

Does such a centroid always exist and can its position be described using only one median, not three? Do not peek at the figures below before you give your best try in the upcoming exercises!

Exercise 6. Draw two medians in a triangle, and experiment physically to estimate the ratio in which their intersection point divides each median.

Yes, your guess is right: the ratio is the same for both medians and it is the simplest one you would come up with by eyeballing the drawing.

[Figure 5. Centroid, Midsegment, and Medians]

Theorem 3. (Centroid) The three medians in a triangle intersect in one point, which divides each median in a ratio of 2 : 1 counted from the vertex of the triangle (cf. Fig. 5a).

Before we attack the Centroid Theorem, let us do something more basic that will help us prove it. In a triangle, the segment connecting the midpoints of two of its sides is called a midsegment of the triangle.

Exercise 7. How long is the midsegment in relation to the third side of the triangle? Do the two seem to be in special relation to one another?

Upon completing Exercise 7, you will likely formulate:

Theorem 4. (Midsegment) A midsegment is half the length and parallel to the third side of the triangle.

Proof: If A1 and B1 are the midpoints of sides BC and AC, respectively, then CB1 : CA = 1 : 2 = CA1 : CB. Since ∠C is common to both triangles, by RA we have △B1A1C ∼ △ABC (cf. Fig. 5b). Since the triangles are similar, we have B1A1 = ½ AB and α = α′ (as marked on the figure). Since α′ = β (as vertical angles), we conclude that alternate interior angles α and β are equal, and hence AB||B1A1. □

⁶ Orthocenter, circumcenter, incenter, or centroid, to name a few.


The Centroid Theorem is asking us to show that three segments – the medians – are concurrent, i.e., that they intersect in one point – the centroid of the triangle. To establish that several figures are concurrent, in Circle Geometry (vol. I) we utilized a general technique. A specific version of it will help us now get hold of the centroid M:

PST 3. To show that several segments intersect in the same point M, fix one segment XY and show that the remaining segments divide XY in the same ratio counted from X to Y. Furthermore, if you can find the exact ratio in which a second segment divides XY, you may be able to apply an analogous argument to prove that any other segment intersects XY in that same ratio.

Now the centroid’s existence and location is only one △-similarity away!

Proof of Centroid Theorem: If medians AA1 and BB1 intersect in point M, then △ABM ∼ △A1B1M, as strongly suggested in Figure 5c. Indeed, since B1A1||AB, the alternate interior angles formed by any transversal of these two lines are equal. In particular, γ = γ′ and δ = δ′ as marked on the figure, so that AA justifies the similarity of triangles. But we know that the ratio of the corresponding sides is AB : B1A1 = 2 : 1, from which the other ratios are also 2 : 1. Therefore, AM : MA1 = 2 : 1 = BM : MB1, as desired for the two medians AA1 and BB1.

“And history repeats itself!” Applying the same proof above to the medians AA1 and CC1, we conclude that they divide each other in ratio 2 : 1 counted from A and C. Because M divides AA1 in that same ratio 2 : 1, from PST 3 we conclude that the intersection point of AA1 and CC1 coincides with the intersection point M of AA1 and BB1. The final result is that all three medians intersect in point M, a.k.a. the centroid of △ABC. □

Now that we have seen the power of the congruence/similarity criteria, let’s turn to our main problems. Applying the criteria will be the easy part . . . . How do we create positions where we can usefully apply them?
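The Centroid Theorem can also be spot-checked in coordinates. In the Python sketch below, the sample triangle and the helper names `midpoint`, `point_on_median`, and `centroid_candidates` are our own assumptions, and coordinates are assumed to be given as exact fractions. On each median we mark the point that divides it 2 : 1 counted from the vertex and then compare the three marked points. This mirrors PST 3 on one example and is not a substitute for the proof above.

```python
from fractions import Fraction as Fr

def midpoint(p, q):
    """Midpoint of segment PQ (coordinates assumed to be Fractions)."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

def point_on_median(vertex, opposite_midpoint):
    """Point dividing the median in ratio 2 : 1 counted from the vertex."""
    return tuple(v + Fr(2, 3) * (m - v)
                 for v, m in zip(vertex, opposite_midpoint))

def centroid_candidates(a, b, c):
    """The 2 : 1 point on each of the three medians."""
    return [point_on_median(a, midpoint(b, c)),
            point_on_median(b, midpoint(c, a)),
            point_on_median(c, midpoint(a, b))]

A, B, C = (Fr(0), Fr(0)), (Fr(7), Fr(1)), (Fr(2), Fr(5))
points = centroid_candidates(A, B, C)
# If the theorem holds, all three entries are the same point.
print(points)
```

For this triangle the three marked points coincide, so the three medians do pass through a single point M that cuts each of them 2 : 1 from the vertex.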

3. Walking Along an Optimal Path

Let us start with our farmer and his cow. The way the problem is posed makes it hard to solve. Recall how we had to experiment in order to guess the optimal place on the river where the farmer should go, and still we were far from actually proving anything!



PST 4. Identify what makes the problem hard, eliminate it by reducing the situation to a simpler one, and see if the new problem is easily solved. Then connect your solution to the original problem.


3.1. Simplify and solve. What makes the farmer-and-cow problem hard? One expression in the statement of the problem, about which we did not think twice, is what got us into trouble . . . . Which one is it? How about . . . “on the same side of the river!” What if our two protagonists were on different sides of the river? Would you be able to solve the problem now in no time? Certainly! The shortest path would be the straight path from the farmer to the cow, going through the river! Here it is reasonable to pause: what if the river is wide? Does it make a difference to the farmer’s path? Sure it does, so . . . eliminate the width of the river! I can hear some readers objecting: “But you cannot! It is part of the problem.” Actually, it is not: the original problem was placed entirely on one side of the river and did not depend on the width of the river, or for that matter, on whether the river had any width at all. Hence, as any brave mathematician would do, we will draw the river as a line with no width: this will simplify our new problem and make it a better match for the old problem.


3.2. Relate back to the original problem. It doesn’t take much effort to see that reflecting the original farmer across the river to create a “phantom” farmer will turn one problem into the other. Since any path that the original farmer can take is mimicked by the phantom farmer, the shortest path of the original farmer must correspond to the shortest path of the phantom farmer: the straight one, as noted earlier. So, what is our answer? We must

3.3. Create an algorithm to show where the original farmer should go.
Step 1. Make the river into a line l (with no width).
Step 2. Reflect farmer F across line l to a point F′.
Step 3. Let X be the intersection point of segment F′C with line l.
Step 4. Tell the farmer to go to point X at the river, dip his bucket there, and then go to the cow.
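The four steps can be tried out in coordinates. The sketch below is only an illustration under our own assumptions: the river l is modeled as the x-axis, the farmer and the cow have integer coordinates strictly on the same side of it, and the function names are hypothetical. With the data of the problem (the farmer 2 km from the river, the cow 6 km from it, and 4 km between their nearest river points) we place F = (0, 2) and C = (4, 6).

```python
from fractions import Fraction

def reflect_across_river(point):
    """Step 2: reflect a point across the river, modeled as the x-axis (line l)."""
    x, y = point
    return (x, -y)

def watering_point(farmer, cow):
    """Steps 2-3: intersect segment F'C with the river and return the point X."""
    fx, fy = reflect_across_river(farmer)   # phantom farmer F'
    cx, cy = cow
    # Parameter t in [0, 1] at which the segment from F' to C crosses y = 0.
    t = Fraction(-fy, cy - fy)
    return (fx + t * (cx - fx), 0)

# Assumed coordinates: A = (0, 0) and B = (4, 0) on the river,
# so the farmer is F = (0, 2) and the cow is C = (4, 6).
X = watering_point((0, 2), (4, 6))
print(X)
```

The first coordinate of X is the distance AX along the river, which can be compared with the conjecture made during the cord experiment.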


3.4. Prove that your algorithm works. Our earlier informal argument led to creating the algorithm, but we still need to formally justify that it will yield the shortest possible path for the farmer. Indeed, suppose farmer F walks to any other point Y on the river. Why is this path F → Y → C longer than the path F → X → C suggested by our algorithm?


PST 5. To show that a broken path P1 is longer than another broken path P2, try laying out path P1 along two sides of a triangle ABC and path P2 along the third side (as on the figure to the right), and use the Triangle Inequality AC + CB > AB to conclude that P1 is longer than P2.

The triangle in question in our problem is created using the phantom farmer F′. Because of the reflection, FX = F′X and FY = F′Y (cf. Exercise 8), so that the original path P2 : F → X → C is as long as P2′ : F′ → X → C, and the new path P1 : F → Y → C is as long as P1′ : F′ → Y → C. All this boils down to applying the Triangle Inequality (I) to △F′YC:

length(P1) = FY + YC = F′Y + YC ≥ F′C = F′X + XC = FX + XC = length(P2),

where the inequality step is (I), with equality if and only if △F′YC degenerates into the segment F′C, i.e., Y = X and the farmer walks to the point X prescribed by our algorithm. In other words, the path P2 : F → X → C is indeed the shortest possible.

This discussion should have convinced even the most skeptical reader of the vast possibilities when working with something as simple as reflections:
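A numerical experiment can accompany the proof. In the following sketch (again with the river as the x-axis and with F = (0, 2), C = (4, 6), X = (1, 0), all of which are our assumed coordinates), we compare the path through X with the paths through many other river points Y, and with the straight segment F′C of the phantom farmer.

```python
import math

def path_length(farmer, river_x, cow):
    """Length of the broken path farmer -> (river_x, 0) -> cow."""
    y = (river_x, 0.0)
    return math.dist(farmer, y) + math.dist(y, cow)

F, C = (0.0, 2.0), (4.0, 6.0)          # assumed positions of farmer and cow
F_phantom = (0.0, -2.0)                # reflection of F across the river

best = path_length(F, 1.0, C)          # path through X = (1, 0)
phantom = math.dist(F_phantom, C)      # straight segment F'C

# Sample river points Y from x = -2 to x = 6 in steps of 0.01.
samples = [k / 100 for k in range(-200, 601)]
no_shorter_path = all(path_length(F, x, C) >= best - 1e-12 for x in samples)
print(best, phantom, no_shorter_path)
```

Sampling can only fail to find a counterexample; it is the Triangle Inequality argument above that rules one out for every Y.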



PST 6. One way to create new problems or reduce to simpler ones is to reflect across a line. Since any triangle (moreover, any figure!) retains its size and shape, we arrive at a twin to the original situation.

The beginner should confirm the above statements about reflection:

Exercise 8. Show that the measure of any segment and any angle is preserved under reflection. What can you say about triangles under reflection?

3.5. Reflect upon the result of your algorithm. Are we done with the Farmer-and-Cow problem? In some sense yes: we described a geometric algorithm, which leads step by step to the optimal path for the farmer, and we proved that this algorithm works. On second thought, though, did you notice that our solution did not depend at all on the given numerical data: 2 km, 4 km, and 6 km?! What was that all about? A further mystery is why we studied similar triangles in detail when we didn’t use them at all. Well, the Triangle Inequality (which we did use) can and will be proven in Part II as a consequence of the Pythagorean Theorem, which in turn will be proven via similar triangles. But more to the point, do you remember our experiment with the flexible cord? We made a specific conjecture about the location of point X along the river: AX : XB = 1 : 3, with A and B the points on the river nearest to the farmer and the cow, respectively. Similar triangles and a bit of algebra will be the “cure” here.


Proof of Conjecture 2: The figure on the right contains four triangles (count them!). However, only three of them are of interest to us; naturally, these are the ones similar to each other:
• △FAX ≅ △F′AX due to the reflection and SAS (how?); and
• △F′AX ∼ △CBX due to vertical and right angles and AA (how?).
The similarity △FAX ∼ △CBX prompts us to compare ratios of sides and to finally use the given numerical data: FA = 2, AB = 4, and CB = 6. If AX = x, then BX = AB − AX = 4 − x so that AX/AF = BX/BC and we can calculate:

x/2 = (4 − x)/6 ⇒ 6x = 2(4 − x) ⇒ 3x = 4 − x ⇒ 4x = 4 ⇒ x = 1.

As predicted, AX = 1 km and AX : BX = 1 : 3. ♦

3.6. The problem-solving structure that persisted throughout our solution could be applied whenever the question asks us to locate certain geometric objects, be they optimal paths, special points, or other:



Figure 6. Problem-solving structure (experiment, construct, prove, reflect upon)

(1) Experiment physically (or abstractly) to come up with a conjecture about the possible location(s) of the object.
(2) Construct an algorithm (typically, of geometric steps) that leads to the object/location in question.
(3) Prove that your algorithm works, i.e., that it indeed yields the desired object/location.
(4) Reflect upon (pun intended!) the result of your algorithm and try to produce an alternative (perhaps, algebraic) description of the object/location. Look for insights from the complete picture.

In construction problems, the first step is often replaced by a “discussion,” during which one assumes that the object has been found and reasons what must be true about its location. The last step too could take the shape of an “analysis”: here one investigates the number of solutions depending on the original configuration. For example, any locations of the farmer and the cow will yield a unique optimal place X at the river . . . unless we allow them to be at the river, in which case there are infinitely many solutions X (why?).


4. Walking Along an Integer Grid

Let us now turn our attention to the three-squares problem (cf. Fig. 7a). Recall our conjecture that α + β + γ = 90°.



4.1. Fitting the conjecture into the picture. Our first experiment – cutting and pasting – led to a right angle made out of non-overlapping α, β, and γ, as Figure 1b suggested. But where in our original picture will this right angle fit well? One possibility is the right ∠ABC: since it already contains γ = ∠ABD, we “just” have to show that the remaining ∠DBC can be split into α and β.

4.2. Grid hopping. Consider the integer grid made out of unit squares, just like the three squares in our problem. The points where the grid lines intersect will be called grid points. To split ∠DBC as desired, we need to draw at least one extra arm inside this angle. The brilliant idea of this solution is to:

PST 7. Restrict to the grid: connect only grid points, and consider only triangles whose vertices are grid points (a.k.a. grid △’s).

The original problem already has 9 grid triangles! Did you find them?


Figure 7. Tiling of the integer grid

Now, we want to re-locate angle β inside ∠DBC, and β participates in the grid △AHD. So cut out this triangle, move it, flip it, and rotate it as needed, until H coincides with B, side HA goes vertically up from B, and ∠AHD = β happily fits inside our right ∠ABB1 (cf. Fig. 7b). In other words, we have constructed a new grid △BB1H1 ≅ △AHD. To ease the solution and make the construction more “balanced,” draw yet a third copy of △AHD: the grid △DH1A1 as in Figure 7b. This completes our picture to a rectangle ABB1A1, twice the size of the original 3-squares drawing. As the reader may have noticed, we have labeled by X1 the reflection [footnote 7] of any grid point X across line CD.

Footnote 7: Although the three original squares are reflected across line CD, it is worth noting that our augmented picture is not entirely symmetric with respect to line CD (why not?).


4.3. Special triangles to the rescue! It remains to show that ∠DBH1 = α. Since α = 45° from the right isosceles △AMD, ideally we would find another right isosceles grid triangle one of whose angles is ∠DBH1 . . . . Not that we have much of a choice:

Exercise 9. Show that △DBH1 is right isosceles and hence ∠DBH1 = α.

Proof: That BH1 = DH1 is immediate, as they are hypotenuses of our two congruent grid triangles, △BB1H1 ≅ △H1A1D. From these same right triangles, ∠DH1A1 = β and ∠BH1B1 = 90° − β. Therefore, in the 180° ∠A1H1B1 (depicted white in Fig. 7b), two angles add up to 90° and hence the remaining third angle must be 90°. Namely, ∠DH1B is right. The desired conclusion now follows: △DBH1 is right isosceles, and thus ∠DBH1 = 45°, so 45° + β + γ = 90°. Therefore α + β + γ = 90°.

4.4. Triangular tiling. What really happened in our solution? We devised a tiling of a 3 × 2 grid rectangle via five grid-triangles:
(1) One tile in the shape of an obtuse triangle contained γ.
(2-4) Three tiles were congruent right triangles with legs 1 and 2; they brought β into our argument.
(5) And finally, a big central tile was a right isosceles triangle, which provided the 45°-angle equal to α and completed the tiling.
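The central tile of this tiling can be double-checked in coordinates. The sketch below assumes one convenient placement of the picture (A at the origin, AB along the x-axis, unit squares, so that D = (0, 1), B = (3, 0), and H1 = (2, 2)); the coordinates and helper names are our own, introduced only for this check. It tests that △DBH1 has a right angle at H1 and two equal sides, and then adds up the three angles numerically.

```python
import math

# Assumed grid coordinates for the augmented picture (Fig. 7b).
D, B, H1 = (0, 1), (3, 0), (2, 2)

def vec(p, q):
    """Vector from point p to point q."""
    return (q[0] - p[0], q[1] - p[1])

def dot(u, v):
    """Dot product of two plane vectors."""
    return u[0] * v[0] + u[1] * v[1]

h1d, h1b = vec(H1, D), vec(H1, B)
is_right = dot(h1d, h1b) == 0                  # perpendicular sides at H1
is_isosceles = dot(h1d, h1d) == dot(h1b, h1b)  # DH1 and BH1 have equal length

# alpha, beta, gamma see a unit segment across 1, 2, and 3 unit squares.
alpha, beta, gamma = math.atan2(1, 1), math.atan2(1, 2), math.atan2(1, 3)
total_degrees = math.degrees(alpha + beta + gamma)
print(is_right, is_isosceles, total_degrees)
```

The floating-point sum only agrees with 90° up to rounding; the tiling argument is what establishes the equality exactly.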

At first glance, nothing spectacular . . . . Still, when I saw this five-tile construction as a 5th-grader, it seemed to me surprising that anyone could come up with this drawing in the first place, and even miraculous that the tiling could be so conveniently used to solve the 3-squares problem! By the way, this is not the only grid tiling that can be used for our problem: as mentioned earlier, there are several dozen geometric proofs in [82], involving a variety of grid constructions, one of which will occur in the 7th-grade solution in Part II. And as you might have guessed by now, the 5th-grade solution we just re-created is my favorite: it requires the least amount of technical background but (perhaps, because of that) the most imaginative thinking of all. Its main steps listed below match closely the general problem-solving scheme that we outlined in Section 3.6:

Figure 8. Problem-solving steps in the 3-Squares solution (experiment: cut & paste; construct: extend grid; prove; reflect upon: tiling of grid with triangles)


5. To Prove or to Take for Granted?

5.1. Full disclosure. Before we move to the more advanced solutions to our two overarching problems in Part II, let us briefly discuss the logical foundation of what has transpired so far.

5.1.1. We proved (or assigned to the reader to prove) several statements:
(1) Equivalence of parallelogram definitions.
(2) Center of Parallelogram Theorem.
(3) Midsegment Theorem.
(4) Centroid Theorem.
(5) Reflection across a line preserves sizes of segments and angles.

5.1.2. We listed but did not prove ten criteria for triangles:
(6-10) Congruence: SAS, ASA, SSS, SsA, HL.
(11-15) Similarity: RA, AA, RR, R A, H/L.

5.1.3. We used the following facts within our discussion:
(16) The Triangle Inequality (to be proven in Part II).
(17) Parallel lines imply equal alternate interior angles and vice versa.
(18) The sum of three angles in a triangle is 180°.
(19) The base angles in a right isosceles triangle are equal to 45°.
(20) Vertical angles are equal. [Footnote 8]

5.2. Euclid and Hilbert must agree. It is important to understand that some statements are theorems, i.e., they can be proven based on other true statements in our plane geometry theory or other areas of mathematics. On the other hand, certain statements cannot be proven as they are assumed to be true without proof; such statements are called axioms. Euclidean geometry, what you know from school as (plane) geometry, is based on a carefully chosen set of axioms. It took two millennia for mankind to agree on which statements should be “axioms” and which could be proven from them and hence called “theorems”. How the geometric axiomatic system evolved over time from Euclid to Hilbert to the present, and how new geometries (such as hyperbolic and elliptic) sprouted from the deep analysis of the logical foundations of geometry, is a fascinating story to study, worth another session on its own, if not a whole semester college course [33]. We will explore here enough history to identify which of our statements have been accepted as axioms; which could, potentially, replace traditionally assumed axioms; and which should be proven as theorems; . . . and what happens when a basic pillar of our geometric intuition is around no more.

Footnote 8: And it is quite possible that we have missed something to list here, which is fine because the diligent reader can find it and add it to the list.


5.3. Axioms, reveal yourselves! In 1899, the German mathematician David Hilbert (1862-1943) proposed in his Grundlagen der Geometrie (Foundations of Geometry, [40]) an axiomatic system of plane and solid geometry. It consists of three primitive terms (point, line, and plane), six relations of betweenness, containment, and congruence, and 20 (originally 21) axioms.

5.3.1. Can all criteria for triangles be proven? According to Hilbert’s axiomatic system, part of the SAS criterion for congruence is an axiom:

Hilbert’s Congruence Axiom: If two sides and the angle between them in one triangle are congruent to the corresponding elements in another triangle, then the remaining corresponding angles are also congruent.

In other words, the axiom does not conclude that the triangles are congruent! It can be shown then, using Hilbert’s axioms, that the remaining sides are congruent so that the triangles are congruent. Thus, the SAS criterion is partly an axiom and partly a theorem. All other congruence and similarity criteria can be deduced from SAS, and hence they are theorems.

5.3.2. Why do parallel lines create congruent angles? Statements (17)-(18) about alternate interior angles and angle-sum of a triangle are implied by the most controversial geometry axiom in history, proposed by the Greek mathematician Euclid of Alexandria (325-265 BCE):

Euclid’s Fifth Postulate: If two lines l and m have a transversal t so that the sum of the interior angles on one side of t is less than two right angles (e.g., α + β < 180° as in Fig. 9a), then l and m intersect on that side of t.


Figure 9. Euclid’s, Hyperbolic, and Parallel Axioms

Since Euclid, many a famous mathematician tried to prove the Fifth Postulate from Euclid’s other axioms [footnote 9], and some even published “proofs” . . . only for flaws to be eventually found in the arguments. Nevertheless, with each such attempt mankind moved closer to a non-Euclidean geometry. Finally, around 1830, the Russian Nikolai Lobachevsky (1792-1856), the Hungarian János Bolyai (1802-1860), and the German Carl Friedrich Gauss (1777-1855) independently arrived at hyperbolic geometry, where all axioms of Euclidean geometry hold, except for the Fifth Postulate.

Footnote 9: Euclid’s axiomatic system, proposed in his Elements [24], consists of 23 definitions, 5 undefined concepts, and 5 axioms, a.k.a. postulates.


To complete the story, in 1868 the Italian Eugenio Beltrami (1835-1899) provided models of this geometry, thereby proving its consistency and validity. For example, in the so-called Beltrami-Klein model (cf. Fig. 9b), the “points” are all points inside a fixed circle k; the “lines” are the chords in k (excluding their endpoints on k); and two “lines” intersect when the corresponding chords intersect in an ordinary point inside k. Of course, a lot more needs to be defined and technical details “ironed out” to show that all axioms of hyperbolic geometry are satisfied in this model. But contrary to what the Fifth Postulate implies in Euclidean geometry, a striking feature persists in any of these (equivalent) hyperbolic models: there are infinitely many parallels to a given line through a given point (three of those are drawn as dashed lines in Figure 9b). This situation completely defies our intuitive understanding of how (Euclidean) geometry works! Still, the models of hyperbolic geometry live within our usual Euclidean space and the theory behind them provides useful insights (such as inversion in the plane) to study and elegantly prove some phenomena that we observe in math and in life. 5.3.3. Can the Fifth Postulate be replaced? Even though Euclid’s Elements is historically, perhaps, the most influential math book, over the millennia it became evident that there were gaps in Euclid’s original axiomatic system and that it had to be revised. This is what Hilbert completed at the end of the 19th century. He chose an equivalent form of the Fifth Postulate: Parallel Axiom: There is at most one parallel to a given line l through a given point P (cf. Fig. 9c).

 The existence of such a parallel can be proven by a specific construction (say, with two right angles), and hence it is not a necessary part of the axiom.

Going back to our little logic discussion, curiously enough, statement (18) about the 180°-sum in a triangle can also replace the Fifth Postulate, but at the price of an additional continuity axiom attributed to the Greek mathematician, scientist, and engineer Archimedes of Syracuse (287-212 BCE):

Archimedes’ Axiom: Given two segments AB and CD, we can put together enough copies of AB to construct a segment larger than CD.

You may recognize that this property is also valid for real (positive) numbers instead of segments. It would take us too far afield to show that statement (18) and Archimedes’ Axiom together imply the Fifth Postulate. But here is a little logic exercise about (18) that you may have encountered in school, yet likely not in such depth.

Exercise 10. Prove that the angles in a triangle add up to 180°. You may assume the Fifth Postulate and Theorem 1 about alternate interior angles.


Beginning of a Proof: The figure in Exercise 10 is probably what you created if you tackled the problem before. The (dashed) parallel m should be situated exactly as shown: outside △ABC. But what if it were inside? As preposterous as this suggestion may seem, it must be logically eliminated in a rigorous proof. To our rescue comes the so-called Crossbar Theorem (cf. Fig. 10a, [37]), according to which a ray through vertex C of △ABC and inside ∠ACB must necessarily intersect segment AB. But then this ray cannot be part of a “parallel” line to AB, a contradiction! Thus, indeed, line m is outside △ABC and the picture above correctly depicts the situation so that you can use it to complete the proof of the 180°-angle sum in △ABC. ♦

5.4. A fair game in congruences. As you can see, there is a lot that can be discussed and learned from studying the logical foundation of geometry and exploring the implications among various theorems and axioms. It is not the point of this section to make the reader go through the somewhat grueling process of proving all non-axioms within the 20 statements listed earlier (not to be confused with Hilbert’s 20 axioms). Several questions of why, what, and how one thing implies another are, though, in order.


Figure 10. Crossbar Theorem, “SsA,” and HL



Exercise 11. Consider the SsA criterion. (a) Why does it require the angle to be opposite the longer side? (b) Isn’t the HL congruence criterion a special case of it?

Partial Solution: (a) The question essentially asks if we can drop the condition that the equal angles are opposite the longer sides of the triangle. Does SsA work when the equal angles are opposite the smaller sides? In other words, can we strengthen the SsA criterion? Recall the following PST which was discussed and used in volume I:

PST 8. One way to disprove a statement is to provide a counterexample, i.e., a situation where the hypothesis is satisfied but the conclusion fails.

In the case of SsA, draw △ABC with an obtuse ∠ACB = γ. On ray BC locate a point C1, different from C and such that AC1 = AC (cf. Fig. 10b). Why does C1 exist? What can you say about △ABC and △ABC1? ♦

Turning to the second question above,

PST 9. To show that a statement S1 is a special case of a statement S, verify that in S1 all conditions of S and something extra are satisfied.


Hint: (b) Properties of right triangles are obviously involved (cf. Fig. 10c). Assuming the Pythagorean Theorem, can you deduce from it the famous fact about right triangles that is necessary to answer the question? ♦

After answering affirmatively the last question, in effect we are left with four criteria for congruence. Our last question introduces a slightly esoteric, fifth criterion, which you should try to prove from the other criteria:

Exercise 12. (SASum) Show that two triangles are congruent if one side in one triangle, an angle adjacent to that side, and the sum of the other two sides are correspondingly equal to the same elements in the second triangle.

Hint: An extra construction is called for. Can you align the two sides whose sum is known, without moving the third side or the given angle? Which criterion implies that the base angles of an isosceles triangle are equal? How is this relevant here? ♦

5.5. Historical and modern perspectives. In preparation for Part II of this session, the reader has several options:
(B+) (Beginners-and-up) Work through Kiselev’s Geometry, vol. I [32]. This will be a great way to commence your geometry studies!
(I+) (Intermediate-and-up) Think about other solutions to our two overarching problems, along the lines of Ideas 2, 3, 5, and 6.
(A) (Advanced) Look deeper into the history of Euclidean geometry. Study Hilbert’s axiomatic system or any modern equivalent of it.
(A+) (Super-advanced) Study hyperbolic geometry: as an axiomatic system, its models and applications.

6. Hints and Solutions to Selected Problems

Exercise 1. The angles make what looks like a right angle (cf. Fig. 11a).

Figure 11. Right angle, SsA, and H/L

Exercise 2. The protractor shows the angles as α = 45°, β ≈ 26.5°, and γ ≈ 18.5°, which do add up to about 90°.

Exercise 3. The place X on the river that requires the minimum amount of cord is about three times closer to A than to B.


Exercise 4. Figures 11b-c represent SsA and H/L, respectively.



Exercise 5. If AB = CD and BC = DA, we can also throw in the common side BD and conclude by SSS that △ABD ≅ △CDB (cf. Fig. 12a). Reaping the benefits of the congruence, we have ∠ABD = ∠CDB: alternate interior angles! Therefore, AB || DC. A similar argument shows that AD || BC.

Theorem 2. Figure 4d gives it away! We already know that opposite sides are equal, e.g., AB = CD, and that the alternate interior angles are also equal for the pairs of parallel sides. If E is the intersection of the diagonals, by ASA we have △ABE ≅ △CDE. From here, AE = CE and BE = DE, i.e., the diagonals bisect each other in a parallelogram.


Figure 12. Def. 2 ⇒ Def. 1 and Same sizes under reflection

Exercise 8. (a) Let segment AB go to segment A1B1 under reflection across line l (cf. Fig. 12b). If AA1 and BB1 intersect l in X and Y, respectively, the reflection means that AA1 ⊥ l, AX = A1X, BB1 ⊥ l, and BY = B1Y. To show that AB = A1B1, we draw parallels to l through A and A1, which intersect line BB1 in C and C1, respectively. Using alternate interior angles, show that in quadrilateral XYCA all angles are right, and hence XYCA is a rectangle. Similarly, XYC1A1 is a rectangle, which is congruent to XYCA because of equal sides. Consequently, AC = A1C1, BC = BY − CY = B1Y − C1Y = B1C1, and ∠ACB = ∠A1C1B1 = 90°. By SAS, △ABC ≅ △A1B1C1 and hence AB = A1B1. Did we miss a special case? ♦

(b) Since each side of a triangle under reflection across l will go to an equal segment (cf. Fig. 12c), by SSS we conclude that triangles are sent to congruent triangles under reflection.

(c) Consider ∠CAB where B and C are some points on the two arms of the angle, thereby forming △ABC. By part (b), reflection across line l must send △ABC to a congruent △A1B1C1 (cf. Fig. 12d), which implies that ∠CAB goes to an equal ∠C1A1B1 too.

Conjecture 2. From the reflection across l, we have FA = F′A and ∠FAX = ∠F′AX = 90°. Since AX is common to both triangles, the required congruence △FAX ≅ △F′AX follows from SAS. Further, ∠F′AX = ∠CBX = 90° and ∠AXF′ = ∠BXC (as vertical angles), so that AA implies the required similarity △F′AX ∼ △CBX.


Exercise 10. Continuing the solution that was started right after the statement of Exercise 10 and referring to the picture there, by alternate interior angles we have α = α′ and β = β′. From the straight angle about point C we have α′ + γ + β′ = 180° so that α + γ + β = 180°, which is the desired sum of the angles in △ABC.

Exercise 11. (a) Since 180° − γ = δ is acute, there is an isosceles △ACC1 with AC = AC1 and base angles ∠ACC1 = ∠AC1C = δ. Note that C1 will be on line BC with C between C1 and B (why?), as shown in Figure 13a. Now, △ABC and △ABC1 share side AB, and have other equal sides: AC = AC1, across which lie equal angles: ∠ABC = ∠ABC1 = β. However, the two triangles are definitely not congruent, since one of them is contained in the other, namely, △ABC is strictly inside △ABC1! This counterexample to a “strengthened SsA” criterion originated from having the (equal) angle β lie across the smaller side AC of △ABC. In conclusion, we cannot strengthen SsA: we must have the equal angles opposite the longer sides of both triangles when applying SsA.

(b) The Pythagorean Theorem says that c² = a² + b² for any right triangle with lengths c, a, and b of the hypotenuse and the legs. Algebraically, this implies that c² > a² and c² > b², i.e., c > a and c > b. Thus, we arrive at the well-known fact that the hypotenuse is the longest side of a right triangle. The three triangle elements in the HL criterion are a leg, the hypotenuse, and the right angle, which is opposite the longest side of the triangle. But this is precisely the SsA criterion for right triangles! Thus, HL is indeed a special case of SsA.
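The counterexample of Exercise 11(a) is easy to produce with actual numbers. In the sketch below, the coordinates are our own choice (any obtuse ∠ACB would do): B, C, and C1 lie on the x-axis with C between B and C1, and A sits on the perpendicular bisector of CC1 so that AC = AC1.

```python
import math

# Assumed sample configuration for Figure 13a.
A, B, C, C1 = (5.0, 2.0), (0.0, 0.0), (4.0, 0.0), (6.0, 0.0)

def angle_at(vertex, p, q):
    """Angle p-vertex-q in degrees, computed from the dot product."""
    u = (p[0] - vertex[0], p[1] - vertex[1])
    v = (q[0] - vertex[0], q[1] - vertex[1])
    cosine = (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))

gamma = angle_at(C, A, B)                    # should be obtuse
same_side = math.isclose(math.dist(A, C), math.dist(A, C1))
same_angle = math.isclose(angle_at(B, A, C), angle_at(B, A, C1))
third_sides = (math.dist(B, C), math.dist(B, C1))
print(gamma, same_side, same_angle, third_sides)
```

Triangles ABC and ABC1 share AB, have AC = AC1, and have the same angle at B opposite the shorter of the two sides, yet their third sides BC and BC1 differ, so they cannot be congruent.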


Figure 13. Constructing extra isosceles triangles

Exercise 12. In △ABC and △A1B1C1 (cf. Fig. 13b), let AB = A1B1, ∠BAC = ∠B1A1C1 = α, and AC + BC = A1C1 + B1C1. The extra construction hinted at in the text is to extend side AC of △ABC beyond C to point D so that CD = CB, and analogously for △A1B1C1 to obtain point D1. This was done so as to arrive at two obviously congruent triangles: △ABD ≅ △A1B1D1 by SAS, where the second pair of equal sides are the sum-sides AD = A1D1. The congruence yields four more equal angles: ∠ADB = δ = ∠A1D1B1 along with the other δ-angles from the two isosceles triangles, △BCD ≅ △B1C1D1. Subtracting, we obtain yet another pair of equal angles: ∠ABC = ∠ABD − ∠CBD = μ − δ = ∠A1B1D1 − ∠C1B1D1 = ∠A1B1C1. Since AB = A1B1 and α is the same in both original triangles, ASA kicks in to complete the proof of the desired congruence: △ABC ≅ △A1B1C1.
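As a cross-check of the SASum criterion by a different, purely algebraic route (not the synthetic proof above), one can assume the Law of Cosines, which is not used in this session. With c = AB, b = AC, and s = AC + BC, substituting BC = s − b into BC² = c² + b² − 2cb·cos α gives s² − c² = 2b(s − c·cos α), so b is determined uniquely by the three given elements. The function names below are hypothetical.

```python
import math

def side_ac(ab, alpha, total):
    """Length AC from AB, the angle alpha at A (radians), and total = AC + BC.

    Assumes the Law of Cosines and total > ab (the Triangle Inequality).
    """
    return (total ** 2 - ab ** 2) / (2 * (total - ab * math.cos(alpha)))

def triangle_sides(ab, alpha, total):
    """All three sides (AB, AC, BC) of the triangle fixed by the SASum data."""
    ac = side_ac(ab, alpha, total)
    return ab, ac, total - ac

# Sample data: AB = 4, a right angle at A, and AC + BC = 8.
sides = triangle_sides(4.0, math.pi / 2, 8.0)
print(sides)
```

Since the given data leave only one possible value for AC (and hence for BC), any two triangles sharing these data have equal sides, in agreement with Exercise 12.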

Session 2

Rubik’s Cube. Part II

Tom Davis

Sneak Preview. In Part I of this session, we encoded the moves on the Rubik’s Cube via permutations. Understanding the mathematics of these face-twisting permutations is indeed equivalent to a complete understanding of Rubik’s Cube. Fortunately, permutations form a most famous and well-studied example of what is known in mathematics as a group. We begin this Part II with a super-fast introduction to group theory, discussing very basic groups together with examples based on Rubik’s Cube. The session culminates in calculating the total number of positions that can be reached from a solved cube. Although more complex, this feat resembles the 15-puzzle in Session 5, where one can plunge into a detailed study of group theory. Naturally, these two sessions reinforce the same abstract concepts from somewhat different angles, and each of them is self-contained and can be tackled independently. Part III will reward the patient reader: our newly-developed group-theoretic tools will be used to find methods for efficiently solving jumbled cubes.

1. What Is a Group?

1.1. Formal definition. Whether small or “impossibly” large, a group is an abstract mathematical object that can be defined in terms of a few simple axioms and about which theorems can be proven. For example, the set of permutations of Rubik’s Cube that we studied in Part I provides one example of a group. Unfortunately, this Rubik’s Cube group is large and fairly complex. Indeed, as we shall prove in Section 5:

Problem 1. The Rubik’s Cube group R has 8! · 12! · 2^10 · 3^7 = 43,252,003,274,489,856,000 members, one corresponding to each position reachable from a solved cube!

To begin our study of group theory, as is always the case in mathematics, it is a good idea to begin looking at basic groups with only a few members instead of trying to tackle the Rubik’s Cube group as our first example. But first, let’s settle on


Definition 1. A group G consists of a set of objects and a binary operation ∗ on those objects satisfying the following four conditions:
(1) The operation ∗ is closed, i.e., if g and h are any two elements of the group G then the object g ∗ h is also in G.
(2) The operation ∗ is associative, i.e., if f, g, and h are any three elements of G, then (f ∗ g) ∗ h = f ∗ (g ∗ h).
(3) There is an identity element e in G, i.e., there exists an e ∈ G such that for every element g ∈ G, e ∗ g = g ∗ e = g. (Often, in groups where the operation is like multiplication, the symbol “1” is used in place of e.)
(4) Every element in G has an inverse relative to the operation ∗, i.e., for every g ∈ G, there exists an element g^(−1) ∈ G such that g ∗ g^(−1) = g^(−1) ∗ g = e.

For those who desire the absolute minimum in conditions, see footnote 1. Most familiar mathematical systems involve commutative operations, but this is not necessarily the case in group theory. In other words, there may exist elements g and h of G such that g ∗ h ≠ h ∗ g, making the group non-commutative. Notice also that the definition above does not require that a group be finite. In this session we will consider mostly finite groups, although, as in the case of R, those finite groups may be quite large.

Since there is only one operation ∗, we often omit it and write gh in place of g ∗ h. In the case of multiplication of permutations we already do this: (1 2) combined with (1 3 4) is written (1 2)(1 3 4). Similarly, we can define g^2 = gg = g ∗ g, g^3 = ggg = g ∗ g ∗ g and so on, g^0 = e, and g^(−n) = (g^(−1))^n. Because of associativity, these are all well-defined and they obey the usual laws of exponents, such as g^(m+n) = g^m g^n, and this is true for any integers m and n, be they positive, negative, or zero.

1.2. Famous infinite groups. You are probably already familiar with a few finite groups, but most of the best-known examples are infinite:
• The integers as the group elements under addition. [Footnote 2]
• The rational numbers under addition.
• The rational numbers with 0 omitted under multiplication.
• The real numbers or complex numbers under addition.
• The real or complex numbers omitting 0 under multiplication.

In fact, there is a slightly simpler and equivalent definition of a group: only a right identity and a right inverse are required (or a left identity and a left inverse). In other words, if there is an e such that g ∗ e = g for all g ∈ G and for every g ∈ G there exists a g −1 such that g ∗ g −1 = e then you can show that e ∗ g = g and that g −1 ∗ g = e. This can be done by evaluating the expression g −1 ∗ g ∗ g −1 ∗ (g −1 )−1 in two different ways using the associative property, yielding that the left and right identities are the same and that the left and right inverses of any element are also the same. 2 The term “under addition” simply means that the group operation is addition.


Check your understanding of the definitions so far:



Exercise 1. Verify that the examples above do form groups according to the formal definition. Identify which element in these groups is the identity (signified by e in the formal definition), and then check that all the other group properties listed in Definition 1 hold.

In fact, all of the above sets are infinite and commutative groups. A group that is commutative is sometimes called an abelian group.

On the other hand, the non-negative integers {0, 1, 2, 3, . . .} under addition do not form a group – there is an identity (0), but there are no inverses for any positive numbers. We can’t include zero in the groups of rational, real, or complex numbers under multiplication since it has no inverse.

The so-called trivial group consists of the single element e, and satisfies e ∗ e = e. Since every group must contain the identity element, this is the smallest possible group.

1.3. Groups from number theory. If you know about modular arithmetic (cf. the Number Theory I session, vol. I), show that:

Exercise 2. The n elements 0, 1, . . . , n − 1 form a (finite abelian) group under addition modulo n.



Exercise 3. If p is prime, then multiplication modulo p forms a group containing p − 1 elements: 1, 2, . . . , p − 1.

If p is not a prime then the operation of multiplication modulo p does not form a group. For example, if p = 6 there is no inverse for 2: 2 ∗ 1 = 2, 2 ∗ 2 = 4, 2 ∗ 3 = 0, 2 ∗ 4 = 2, and 2 ∗ 5 = 4. It is also not a group since 2 ∗ 3 = 0 and 0 is not in the set {1, 2, 3, 4, 5}, so in this case the operation is not even closed! (Remember that in this example the “∗” represents multiplication modulo 6.) Worse, when two numbers, neither of which is zero, multiply to yield zero, then the system is said to have zero divisors; this immediately prevents it from being a group (why?). In fact,



Exercise 4. When a modular system under multiplication has no zero divisors it forms a group. This occurs precisely when the modulus n is a prime number. If n is not prime, there will be zero divisors, and hence no group under multiplication.
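For small examples, the four conditions of Definition 1 can also be checked by brute force. The following is only a sketch of such a check, not a proof technique: the helper names `is_group`, `add_mod`, and `mul_mod` are made up for this illustration, and the approach simply tries every pair and triple of elements, so it is practical only for very small sets.

```python
from itertools import product

def is_group(elements, op):
    """Brute-force check of the four conditions in Definition 1 on a finite set."""
    elements = list(elements)
    # (1) Closure: every product must land back in the set.
    if any(op(g, h) not in elements for g, h in product(elements, repeat=2)):
        return False
    # (2) Associativity: compare both bracketings for every triple.
    if any(op(op(f, g), h) != op(f, op(g, h))
           for f, g, h in product(elements, repeat=3)):
        return False
    # (3) Identity: look for an element e with e*g = g*e = g for all g.
    identities = [e for e in elements
                  if all(op(e, g) == g and op(g, e) == g for g in elements)]
    if not identities:
        return False
    e = identities[0]
    # (4) Inverses: every g needs some h with g*h = h*g = e.
    return all(any(op(g, h) == e and op(h, g) == e for h in elements)
               for g in elements)

def add_mod(n):
    """Addition modulo n (hypothetical helper for this sketch)."""
    return lambda a, b: (a + b) % n

def mul_mod(n):
    """Multiplication modulo n (hypothetical helper for this sketch)."""
    return lambda a, b: (a * b) % n

print(is_group(range(6), add_mod(6)))      # True: {0,...,5} under addition mod 6
print(is_group(range(1, 7), mul_mod(7)))   # True: {1,...,6} under multiplication mod 7
print(is_group(range(1, 6), mul_mod(6)))   # False: 2 * 3 = 0 leaves the set
```

The last line reproduces the failure described above for the modulus 6: closure already breaks because 2 ∗ 3 = 0 is not in {1, 2, 3, 4, 5}.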

In the group based on addition modulo n, if you begin with the element 1, you can get to any element in the group by successive additions of that element. In the group modulo 5, we obtain: 1 = 1, 2 = 1 + 1, 3 = 1 + 1 + 1, 4 = 1 + 1 + 1 + 1 and 0 = 1 + 1 + 1 + 1 + 1. The same idea holds for any n. In this case we say that the group is generated by a single element (e.g., 1), and such groups are called cyclic groups, since successive additions simply cycle through all the group elements. The element that generates the group in this way is called a generator.


A cyclic group may have more than one generator. For example, in the same group corresponding to addition modulo 5, the element 3 is also a generator: 1 = 3 + 3, 2 = 3 + 3 + 3 + 3, 3 = 3, 4 = 3 + 3 + 3 and 0 = 3 + 3 + 3 + 3 + 3.

Exercise 5. Does the group above corresponding to addition modulo 5 have any other generators? What are all the generators of the group corresponding to addition modulo 6? How about multiplication modulo 7?
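A search for generators is easy to automate, which is handy for checking your answers. The sketch below is one possible way to do it and deliberately uses two different examples (addition modulo 8 and multiplication modulo 5) so as not to spoil Exercise 5. The function names `powers_of` and `generators` are invented for this illustration.

```python
def powers_of(g, op):
    """List g, g*g, g*g*g, ... stopping as soon as an element repeats."""
    seen, x = [], g
    while x not in seen:
        seen.append(x)
        x = op(x, g)
    return seen

def generators(elements, op):
    """Elements whose repeated combination with themselves reaches the whole group."""
    elements = list(elements)
    return [g for g in elements if set(powers_of(g, op)) == set(elements)]

def add_mod_8(a, b):
    return (a + b) % 8

def mul_mod_5(a, b):
    return (a * b) % 5

print(generators(range(8), add_mod_8))      # [1, 3, 5, 7]
print(generators(range(1, 5), mul_mod_5))   # [2, 3]
```

Replacing the modulus and the range lets you test your own conjectures about which elements generate a given cyclic group.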

1.4. Groups from geometry. For any particular geometric object, the symmetry operations on that object form a group. A symmetry operation is a movement after which the object looks the same (as if nothing happened to it and it didn’t move!). For example,

Exercise 6. How many symmetry operations are there on an ellipse that is not a circle? Describe the group of these symmetries.

Solution: There are 4 symmetry operations on an ellipse whose width and height are different:

e: Leave it unchanged.
a: Rotate it 180° about its center.
b: Reflect it across its short axis.
c: Reflect it across its long axis.

Figure 1. Four Symmetry Operations on an Ellipse

∗ e a b c
e e a b c
a a e c b
b b c e a
c c b a e

The group operation consists of making one movement followed by making a second movement. Clearly e is the identity, and each of the operations is its own inverse. We can write down the group operation ∗ on any pair of elements of the ellipse symmetries in the 4 × 4 table above.
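The table for the ellipse can be double-checked with coordinates. The sketch below assumes (this is a modeling choice, not something stated in the text) that the ellipse is centered at the origin with its long axis along the x-axis, so that each symmetry sends a point (x, y) to (±x, ±y) and can be recorded as a pair of signs. The names `SYMMETRIES` and `compose` are illustrative.

```python
# Each symmetry sends (x, y) to (sx * x, sy * y); store the pair (sx, sy).
# Assumption: long axis along the x-axis, short axis along the y-axis.
SYMMETRIES = {
    "e": (1, 1),     # leave it unchanged
    "a": (-1, -1),   # rotate 180 degrees about the center
    "b": (-1, 1),    # reflect across the short axis (the y-axis)
    "c": (1, -1),    # reflect across the long axis (the x-axis)
}
NAMES = {signs: name for name, signs in SYMMETRIES.items()}

def compose(first, second):
    """Do `first`, then `second`; the sign pairs simply multiply."""
    (sx1, sy1), (sx2, sy2) = SYMMETRIES[first], SYMMETRIES[second]
    return NAMES[(sx1 * sx2, sy1 * sy2)]

# Print the 4 x 4 table, one row per symmetry.
for g in SYMMETRIES:
    print(g, " ".join(compose(g, h) for h in SYMMETRIES))
```

The printed rows agree with the table above, and because multiplying signs does not depend on the order, the sketch also makes it plain why this particular group is abelian.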

Exercise 7. How many symmetry operations are there on an equilateral triangle? On a square?

Answers: The group of symmetries of an equilateral triangle consists of six elements. You can leave it unchanged, rotate it by 120° or 240°, and you can reflect it across any of the lines through the center and a vertex. In the same way, the group of symmetries of a square consists of eight elements: the four rotations (including a rotation of 0° which is the identity) and four reflections through lines passing through the center containing either the diagonals or the perpendiculars to the edges.


In general, a regular n-gon has a group of 2n symmetries and these are called the dihedral groups.

Exercise 8. Try to make a multiplication table for the 6-element group of symmetries of the equilateral triangle and for the 8-element group of symmetries of the square.

Unlike the group of symmetries of the ellipse, the groups in Exercise 8 are not abelian, so a ∗ b is not necessarily the same as b ∗ a. To keep track of which is which, make the column correspond to the first element and the row correspond to the second.³ In other words, a ∗ b is in column a, row b, and b ∗ a is in column b, row a.

 Exercise 9. Is the group of symmetries for the circle finite? Abelian?

Answer: A circle has an infinite number of symmetries. It can be rotated about its center by any angle θ such that 0° ≤ θ < 360° or it can be reflected across any line passing through its center. The group is not abelian: rotations and reflections do not commute in general.

2. Permutation Groups and Group Isomorphisms

2.1. Moves and twists vs. the group operation on R. The most important class of examples for us (since we’re supposed to be fixated on Rubik’s Cube as we read this) comes from certain sets of permutations which also form groups. Since a permutation is just a rearrangement of objects, the group operation is simply the concatenation of two such rearrangements.⁴ In other words, if g is one rearrangement and h is another, then the rearrangement that results from taking the set of objects and applying g to it, and then applying h to the rearranged objects, is what is meant by g ∗ h.

To avoid a possible misunderstanding, when we speak about the Rubik’s Cube group, the group members are move sequences and the group operation is the act of doing one sequence followed by another sequence. At first it’s easy to get confused if you think of rotating the front face as a group operation. The term “move sequence” above is not exactly right either – move sequences that have the same final result are considered to be the same. For an easy example, F and $F^5$ are the same group element.

Definition 2. The Rubik’s Cube group R is the set of all possible permutations of the facelets achievable by means of a finite number of twists of the cube faces. To combine two of these permutations, we simply apply one set of twists after the other.

This, of course, is a huge group.

³ Compare with the Group Theory I session, where rows and columns are reversed.
⁴ Warning: This session uses the notation of multiplication from left to right, i.e., gh means apply first g and then h. This is in contrast with the right-to-left notation in the Group Theory session, where gh in “action groups” means apply first h and then g.


2.2. Permutation after permutation. In any permutation group the identity is the permutation that leaves all the objects in place. The inverse of a permutation is the permutation that exactly undoes it. To multiply two permutations together, just pick each element from the set of objects being permuted and trace it through.

For example, if the set of objects that are to be permuted consists of the six objects {1, 2, 3, 4, 5, 6} and we wish to multiply together (1 2 4)(3 6) and (5 1 2)(4 3), we can begin by seeing what happens to the object in box 1 under the influence of the two operations (cf. Fig. 2 for a visual display of this product). The first operation moves it to box 2 and the second moves the object in box 2 to box 5. Thus, the combination moves the object in box 1 to box 5. Therefore, we can begin to write out the product as follows:

(1 2 4)(3 6) ∗ (5 1 2)(4 3) = (1 5 . . .

We have written “. . .” at the end since we don’t know where the object in box 5 goes yet. Let’s trace 5 through the two permutations. The first does not move 5 and the second moves 5 to 1, so (1 5) is a complete cycle in the product. Here’s what we have, so far:

(1 2 4)(3 6) ∗ (5 1 2)(4 3) = (1 5) . . .

[Figure 2 shows the contents of boxes 1 through 6 at each stage: 1 2 3 4 5 6 at the start, 4 1 6 2 5 3 after (1 2 4)(3 6), and 5 4 2 6 1 3 after (5 1 2)(4 3).]

Figure 2. Multiplying (1 2 4)(3 6) ∗ (5 1 2)(4 3)

We still need to determine the fates of the other objects. So far, we haven’t looked at 2, so let’s begin with that. The first permutation takes it to 4 and the second takes 4 to 3 so we’ve got this:

(1 2 4)(3 6) ∗ (5 1 2)(4 3) = (1 5)(2 3 . . .

Doing the same thing again and again, we find that the pair of permutations takes 3 to 6, that it takes 6 to 4, and finally, that it takes 4 back to 2. This accounts for all of the objects in the set, so the final product of the two permutations is given by:

(1 2 4)(3 6) ∗ (5 1 2)(4 3) = (1 5)(2 3 6 4).
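The tracing procedure just described is mechanical enough to hand to a computer. The following sketch follows this session’s left-to-right convention (apply the first permutation, then the second). The function names `cycles_to_map`, `multiply`, and `map_to_cycles` are invented for this illustration, and cycles are written as Python tuples.

```python
def cycles_to_map(cycles, n):
    """Turn disjoint cycles, e.g. [(1, 2, 4), (3, 6)], into a dict on {1, ..., n}."""
    mapping = {i: i for i in range(1, n + 1)}
    for cycle in cycles:
        for pos, item in enumerate(cycle):
            # Each entry of a cycle moves to the next entry, wrapping around.
            mapping[item] = cycle[(pos + 1) % len(cycle)]
    return mapping

def multiply(first, second):
    """Left-to-right product: trace each object through `first`, then `second`."""
    return {i: second[first[i]] for i in first}

def map_to_cycles(mapping):
    """Write a permutation as disjoint cycles, omitting objects that stay put."""
    seen, cycles = set(), []
    for start in sorted(mapping):
        cycle, item = [], start
        while item not in seen:
            seen.add(item)
            cycle.append(item)
            item = mapping[item]
        if len(cycle) > 1:
            cycles.append(tuple(cycle))
    return cycles

g = cycles_to_map([(1, 2, 4), (3, 6)], 6)
h = cycles_to_map([(5, 1, 2), (4, 3)], 6)
print(map_to_cycles(multiply(g, h)))   # [(1, 5), (2, 3, 6, 4)]
```

The same three functions can be used to check other products you work out by hand; just remember that `multiply(g, h)` means “g first, then h”.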


From now on we’ll omit the “∗” operator and simply place the permutations to be multiplied next to each other. Exercise 10. Verify the following product of permutations of {1, 2, . . . , 9}: (1 2 3)(4 5)(6 7 8 9)(2 5 6)(4 1)(3 7) = (1 5)(2 7 8 9)(3 4 6). Practice multiplying together other pairs of permutations.



Exercise 11. If a permutation is expressed in cycle notation where each of the permuted objects appears in a single cycle, show that the inverse of that permutation can be obtained by reversing the order of the elements in each cycle. For example, $\big((1\,4)(3\,5\,2)\big)^{-1} = (4\,1)(3\,2\,5)$, where (4 1) = (1 4).

As we noticed when we looked at permutations of the facelets of Rubik’s Cube, the order makes a difference: (1 2)(1 3) ≠ (1 3)(1 2) since (1 2)(1 3) = (1 2 3) and (1 3)(1 2) = (1 3 2). And indeed, here the object 1 is shared by both cycles, preventing them from commuting with each other (why?).

2.3. The multiplication table revisited. Let’s look in detail at a particular group – the group of all permutations of the three objects {1, 2, 3}. We know that there are n! ways to rearrange n items since we can choose the final position of the first in n ways, leaving n − 1 ways to choose the final position of the second, n − 2 for the third, and so on. The product, n · (n − 1) · (n − 2) · · · 3 · 2 · 1 = n! is thus the total number of permutations. For three items this means there are 3! = 3 · 2 · 1 = 6 permutations: (1), (1 2), (1 3), (2 3), (1 2 3), and (1 3 2).

Table 1 is the group “multiplication table” for these six elements. Since, as we noted above, the group multiplication is not necessarily commutative, the table is to be interpreted such that the first permutation in a product is chosen from the row on the top and the second from the column on the left. At the intersection of the row and column determined by these choices is the product of the permutations. For example, to find the permutation product of (1 2) by (1 3) choose the item in the second column and third row: (1 2 3).

∗        | (1)      (1 2)    (1 3)    (2 3)    (1 2 3)  (1 3 2)
(1)      | (1)      (1 2)    (1 3)    (2 3)    (1 2 3)  (1 3 2)
(1 2)    | (1 2)    (1)      (1 3 2)  (1 2 3)  (2 3)    (1 3)
(1 3)    | (1 3)    (1 2 3)  (1)      (1 3 2)  (1 2)    (2 3)
(2 3)    | (2 3)    (1 3 2)  (1 2 3)  (1)      (1 3)    (1 2)
(1 2 3)  | (1 2 3)  (1 3)    (2 3)    (1 2)    (1 3 2)  (1)
(1 3 2)  | (1 3 2)  (2 3)    (1 2)    (1 3)    (1)      (1 2 3)

Table 1. Multiplication of permutations of 3 objects
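Individual entries of Table 1 can be regenerated by enumerating all 3! arrangements. The sketch below is one way to do so, under the assumption that a permutation is stored as a tuple `p` in which object i moves to `p[i - 1]`. The names `cycle_name`, `then`, and `by_name` are hypothetical helpers for this illustration.

```python
from itertools import permutations

def cycle_name(p):
    """Cycle notation for a tuple p, where object i moves to p[i - 1]."""
    seen, parts = set(), []
    for start in range(1, len(p) + 1):
        cycle, item = [], start
        while item not in seen:
            seen.add(item)
            cycle.append(item)
            item = p[item - 1]
        if len(cycle) > 1:
            parts.append("(" + " ".join(map(str, cycle)) + ")")
    return "".join(parts) or "(1)"   # the identity is written (1), as in Table 1

def then(first, second):
    """Apply `first`, then `second` (the left-to-right convention of this session)."""
    return tuple(second[first[i] - 1] for i in range(len(first)))

S3 = list(permutations((1, 2, 3)))
by_name = {cycle_name(p): p for p in S3}

print(len(S3))                                               # 6
print(cycle_name(then(by_name["(1 2)"], by_name["(1 3)"])))  # (1 2 3)
print(cycle_name(then(by_name["(1 3)"], by_name["(1 2)"])))  # (1 3 2)
```

The last two lines reproduce the entry discussed above and its mirror image, confirming that the order of the factors matters.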


2.4. To be or not to be isomorphic? If we make a similar table of the symmetries of an equilateral triangle ABC (cf. Exer. 8) with A, B, and C listed counterclockwise, whose elements are 1, rotate 120° (r1), rotate 240° (r2), and flip across an axis through A, B or C (fA, fB, fC), then we will obtain the group multiplication Table 2:

∗   | 1   fA  fB  fC  r1  r2
1   | 1   fA  fB  fC  r1  r2
fA  | fA  1   r2  r1  fC  fB
fB  | fB  r1  1   r2  fA  fC
fC  | fC  r2  r1  1   fB  fA
r1  | r1  fB  fC  fA  r2  1
r2  | r2  fC  fA  fB  1   r1

Table 2. Multiplication of symmetries of an equilateral triangle

If you look carefully at Tables 1 and 2, you can see that they are, in a sense, the same – the only difference is the names used for the group elements. If you substitute 1 for (1), fA for (1 2), fB for (1 3), fC for (2 3), r1 for (1 2 3), and r2 for (1 3 2), the two tables are identical, so in a sense, the two groups are the same. When two groups differ only in the names used for their elements, we call them isomorphic.

In fact, it is easy to see why this is the case. The symmetries of ABC just move the letters labeling the vertices around to new locations and the six symmetries of the triangle can arrange them in any possible way, so in a sense, the triangle symmetries rearrange A, B, and C and the permutation group rearranges the objects 1, 2, and 3. Thus when you read in a group theory textbook that:

Problem 2. There are exactly two groups of order 6.

. . . what this means is that every group having 6 elements, with an appropriate relabeling of the members of the group, will be like (be isomorphic to) one of those two groups. Now, the groups in Tables 1 and 2 are the same; the other group of order 6 corresponds to addition modulo 6 (cf. Exer. 2).

Definition 3. The group that contains all the permutations of three objects is called the symmetric group on three objects. In general, the group consisting of all the permutations on n objects is the symmetric group on n objects. Since there are n! permutations of n objects, n! is the size of the symmetric group on n objects.

Exercise 12. Find an isomorphism between the group corresponding to addition modulo 6 and multiplication modulo 7. Is there more than one isomorphism, i.e., more than one way to make the multiplication tables identical?


Exercise 13. Show that the group consisting of the 4 symmetries of an ellipse with different length axes (described in Exercise 6) is not isomorphic to the group corresponding to addition modulo 4.

2.5. Part of the whole may be all you need. A permutation group does not have to include all possible permutations of the objects. If we consider the Rubik group R as a permutation group on the cubies, there is obviously no permutation that moves an edge cubie to a corner cubie and vice-versa. The group consisting of the complete set of permutations of three objects shown in Table 1 contains various proper subsets that also form groups using the same operation, but limited to that subset: {1}, {1, (1 2)}, {1, (1 3)}, {1, (2 3)}, and {1, (1 2 3), (1 3 2)}.

Definition 4. The subsets of groups that are themselves groups under the same operation are called subgroups.

For example, the above subsets are recognizable as the trivial subgroup and the four subgroups generated each by a reflection or a rotation of the equilateral triangle. The group in which we are most interested here, the Rubik’s Cube group R, is itself a subgroup of the group of all permutations of 48 items. We will examine the properties of subgroups in the following section.

3. Properties of Groups and Their Subgroups

3.1. Basics in group theory. This session is not meant to be a complete course in group theory, so we’ll list below a few of the important definitions and some properties satisfied by all groups. Proofs can be found in any introduction to group theory or abstract algebra textbook (cf. Gallian [26]). From time to time, we will give a name of a group or group property with no further explanation because we do not need it to help us solve Rubik’s Cube. If you are interested, you can look up that name or property in a group theory textbook to learn more.

Theorem 1. Let G be a group.
(a) The identity is unique and every element of G has a unique inverse.
(b) The order of an element g ∈ G is the smallest positive integer n such that $g^n = e$. If no such n exists, the order is said to be infinite. In a finite group every element has a finite order.
(c) The order of a group is the number of elements in it. If g ∈ G then the order of g divides the order of G.
(d) If H is a subgroup of G then the order of H divides the order of G.

Although parts (a)-(b) are doable (by contradiction or the Pigeonhole Principle), parts (c)-(d) are hard and are beyond the scope of this session. Still, the exercises below concern properties of subgroups (in special but important cases) that can be verified with no advanced group theory background.


Exercise 14. If H and K are both subgroups of the same group G, then H ∩ K is also a subgroup of G. In other words, the intersection of any two subgroups of G satisfies all the group properties from Definition 1. Using as an example the symmetric group on three objects displayed in Table 1, the order of (1 2) is 2, the order of (1 2 3) is 3, and both 2 and 3 divide 6, the order of the group. The proper subgroups of the symmetric group listed in Section 2.5 have orders 1, 2, and 3 – again, all are divisors of 6, as they must be. Any pair of subgroups in that list only have the identity element in common, so clearly the intersection of any two of them is also a group, although in these cases it is the trivial group.
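The observations in the paragraph above can be confirmed by listing every subgroup of the symmetric group on three objects. The following is a brute-force sketch rather than a proof. It relies on the standard fact that a nonempty finite subset closed under the operation is automatically a subgroup, and the names `then` and `is_subgroup` are invented for this illustration.

```python
from itertools import combinations, permutations

def then(first, second):
    """Apply `first`, then `second`; object i moves to p[i - 1] under p."""
    return tuple(second[first[i] - 1] for i in range(len(first)))

def is_subgroup(subset):
    """A nonempty finite subset closed under the operation is a subgroup."""
    return bool(subset) and all(then(g, h) in subset for g in subset for h in subset)

G = list(permutations((1, 2, 3)))

# Try every nonempty subset of the 6 elements (only 63 candidates).
subgroups = [frozenset(c)
             for size in range(1, len(G) + 1)
             for c in combinations(G, size)
             if is_subgroup(frozenset(c))]

print(sorted(len(H) for H in subgroups))                              # [1, 2, 2, 2, 3, 6]
print(all(len(G) % len(H) == 0 for H in subgroups))                   # True
print(all(is_subgroup(H & K) for H in subgroups for K in subgroups))  # True
```

The first line recovers the five proper subgroups listed in Section 2.5 together with the whole group; the second and third lines check Theorem 1(d) and Exercise 14 in this one small case.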



Exercise 15. Consider the symmetric group G on 4 objects: the group of order 4! = 24 that consists of all the permutations of 4 objects. Let H be the subset of G made of all permutations that leave the element 1 fixed (but with no further restrictions), and let K be the subset of permutations that leave 2 fixed. List the elements of H, K, and their intersection H ∩ K, and verify that all three subsets are indeed subgroups of G.

Answers: For the three subsets, we have:
H = {(1), (2 3), (2 4), (3 4), (2 3 4), (2 4 3)},
K = {(1), (1 3), (1 4), (3 4), (1 3 4), (1 4 3)},
H ∩ K = {(1), (3 4)},
illustrating that the intersection of two subgroups is also a subgroup (and in this case, it is the set of all permutations that leave both 1 and 2 fixed). ♦

It is easy to see why Theorem 1(c) is true for the full symmetric groups:

Problem 3. If G is the symmetric group on several objects, then the order of any permutation in G has to divide the order of G.

Sketch: As we saw in Part I, we can write down any particular permutation as a set of (disjoint) cycles, and the order of that permutation is simply the least common multiple (lcm) of the cycle lengths (why?). Since there are n elements that are moved by the permutations, the longest cycle can have length at most n, so all the cycle lengths are thus n or less. But the order of the group is n!, and clearly the lcm of a set of numbers no greater than n will divide n! (why?). ♦

3.2. A few proper subgroups of the Rubik group. Since the center cubies always remain in the same position relative to the others, we will always consider the cube to be oriented in a specific way (say, with the white face up and the green face on the left). We consider to be moves only those operations that twist a face relative to the others, so rotating the entire cube as a unit is not a move we will consider. With a real cube, it is sometimes interesting to think about “slice moves” where, say, the top and bottom face are left in position and the center slice between them is turned (cf. the “slice subgroup” in Problem 4, p. 34), but this is equivalent to a combination of a clockwise rotation of one face together with a counterclockwise rotation of the face opposite, so a slice move does not really introduce anything new. In its total glory, a jumbled Rubik’s Cube is difficult to unjumble, especially when you are a beginner.



PST 10. A common method to study complex situations is to look first at simpler cases and learn as much as you can about them before tackling the harder problem. One way to simplify Rubik’s Cube is to consider only a subset of moves as being allowable and to learn to solve cubes that were jumbled with only those moves. If you do this, you are effectively reducing the number of allowable permutations, but you will still be studying a subgroup of the full Rubik group. 3.2.1. Rubik program to the rescue! Let’s consider a few subgroups of R, which you may wish to investigate yourself using the Rubik program: http://www.geometer.org/rubik/. Figure 3a shows what the Rubik window looks like after pressing the “Fcw” (meaning “F clockwise”) button twice, beginning with a solved cube. The cube can be returned to a solved state by pressing the “Reset Cube” button.

Figure 3. Rubik window and Macro gizmo (FF, LL)

The Rubik program contains a “macro gizmo” to make this easier. Figure 3b shows the gizmo with two macros defined: one that does the F operation twice and one that does the L operation twice. To perform the FF macro, simply click on the button marked “FF”. The help file for the Rubik program describes how to define macros and include them in the macro gizmo. If you’d like to investigate the positions achievable by a limited set of moves, define each of the moves as a macro and put all of them in the macro gizmo. Then make moves from an initialized cube using only macro gizmo entries. In fact, if you place the macro gizmo on top of the control panel of Rubik, you will not press any other buttons by accident. If you restrict your moves to any of these subgroups, the cube will be easier to solve.

3.2.2. Examples you can do in practice. The list below is a tiny subset of the total number of subgroups of the whole group, but these are “practical” examples: you can experiment with a real cube making only the moves in the indicated subgroups. Explore and describe, as much as you can, features of these subgroups, e.g., try to calculate the order of the subgroup, to decide whether it is abelian or not, cyclic or not, whether it looks like another group you know, etc. Do not look at the commentaries after the exercises until you have thought about the subgroups for a while. (In Part III, we will examine in detail more general but less practical subgroups of R.)

Exercise 16. (Single face subgroup) In this subgroup of R, you are only allowed to move a single face.
Hint: This group is not very interesting, since there are only 4 achievable positions including “solved,” but it still is a proper subgroup of R. ♦



Exercise 17. (Two opposite faces subgroup) In this subgroup of R, you are allowed to move only two opposite faces.
Hint: This is also a fairly trivial group since twists of two opposite faces are independent. Still, it has 16 elements and is an example of what is known as a direct product group. Beware: if you are allowed to turn two adjacent faces, the subgroup is enormous: it contains 73,483,200 members, the calculation of which is beyond the scope of this session. ♦

Exercise 18. (F-L half-turn subgroup) In this subgroup of R, you are allowed to move either the front face or the left face by half-turns.

Solution: In Figure 4 we see all 12 cube positions in the subgroup generated by FF and LL. Since applying FF or LL twice in a row brings us to the previous position, the 11 positions different from the solved position are: FF, FFLL, FFLLFF, FFLLFFLL, . . . , $(\mathrm{FFLL})^5\,\mathrm{FF}$, arranged in that order in the figure. The final position in the lower-right corner of the figure will return to the solved position with one more application of LL.

Problem 4. (The slice subgroup) In this subgroup of R, you can only move the center slices (cf. the figure to the right). The subgroup can be further restricted by requiring that one, two, or three of those slices must make half-turns only.
Answers: The full slice group contains 768 members. If one of the slices must be a half-turn, there are 192 members. If two are half-turns, there are 32 group members, and if all three moves must be half-turns, there are only 8 members. Can you justify all these numbers? ♦


Figure 4. F-L half-turn subgroup of R

4. Even and Odd Worlds

Not every rearrangement of the Rubik’s Cube is possible. In the upcoming Group Theory I session, we will learn that the 15-puzzle “prohibits” exactly half of the possible arrangements of its squares: these are the so-called odd permutations, which are unattainable (unless you cheat, break the puzzle apart, and put it back together switching two tiles). For the Rubik’s Cube, a larger fraction of rearrangements is impossible, some due to a similar parity argument. To prepare ourselves for it, we study even and odd permutations from scratch in this section.

4.1. Parity of permutations. We will now show that all permutations can be divided into two sets – those with even and odd parity. Just as in the case of addition of whole numbers, multiplying two permutations with even parity or two with odd parity will result in a permutation of even parity. If one of the two has odd parity and the other even parity, the result will be odd. To start with, notice the following:

(1 2) = (1 2)
(1 2)(1 3) = (1 2 3)
(1 2)(1 3)(1 4) = (1 2 3 4)
(1 2)(1 3)(1 4)(1 5) = (1 2 3 4 5)
(1 2)(1 3)(1 4)(1 5)(1 6) = (1 2 3 4 5 6),


and it is not hard to prove that the pattern continues. This shows that any n-cycle can be expressed as a product of 2-cycles (n − 1 of them, following the pattern). If n is even, there are an odd number of 2-cycles and vice-versa. Since every permutation can be expressed as a set of disjoint cycles, this means that every permutation can be expressed as a product of 2-cycles. For example:

(1 4 2)(3 5 6 7)(9 8) = (1 4)(1 2)(3 5)(3 6)(3 7)(9 8).

Obviously, there are an infinite number of ways to express any particular permutation as a product of 2-cycles:

$(1\,2\,3) = (1\,2)(1\,3) = (1\,2)(1\,3)(1\,2)(1\,2) = (1\,2)(1\,3)(1\,2)^4 = \cdots$.

But it turns out that there is one big restriction in such representations:

Theorem 2. For any given permutation, the number of 2-cycles necessary to represent it is either always even or always odd.

For this reason, we can say that

Definition 5. A permutation is either even or odd, depending on whether its representation requires an even or an odd number of 2-cycles.
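The pattern above gives a practical recipe for computing parity: split the permutation into disjoint cycles and count n − 1 two-cycles for each n-cycle. The sketch below implements that recipe and then checks, over all 24 permutations of four objects, that parities combine like even and odd numbers under addition. It assumes a permutation is stored as a tuple `p` with object i moving to `p[i - 1]`; the names `parity` and `then` are illustrative. It is numerical evidence consistent with Theorem 2, not a substitute for its proof.

```python
from itertools import permutations

def parity(p):
    """Return 0 for even, 1 for odd, counting n - 1 two-cycles per n-cycle."""
    seen, transpositions = set(), 0
    for start in range(1, len(p) + 1):
        length, item = 0, start
        while item not in seen:
            seen.add(item)
            length += 1
            item = p[item - 1]
        if length:                      # a new cycle of this length was found
            transpositions += length - 1
    return transpositions % 2

def then(first, second):
    """Apply `first`, then `second`."""
    return tuple(second[first[i] - 1] for i in range(len(first)))

S4 = list(permutations((1, 2, 3, 4)))

print(sum(1 for p in S4 if parity(p) == 0))   # 12 even permutations out of 24
print(all(parity(then(g, h)) == (parity(g) + parity(h)) % 2
          for g in S4 for h in S4))           # True
```

Exactly half of the permutations come out even, and the second line confirms the even/odd multiplication rule stated at the start of this section for every pair in this group.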

Theorem 2 is not too hard to prove, as long as one is willing to allow some polynomial algebra to sneak into our discussion.⁵

Proof: Consider a permutation of the set {1, 2, . . . , n} that moves 1 to $x_1$, 2 to $x_2$, 3 to $x_3$, and so on. All the $x_i$’s are different, and they represent exactly the numbers from 1 to n in some order. Now construct the product:

(1)  $\prod_{1 \le j < i \le n} (x_i - x_j) = (x_2 - x_1)(x_3 - x_1) \cdots (x_n - x_1)(x_3 - x_2) \cdots (x_n - x_{n-1}).$

[…] for the identity permutation, where $x_i = i$, whenever $i > j$ we have $x_i - x_j = i - j > 0$, and all the terms in the product are positive, making the product positive.

Now, if we multiply any permutation by a 2-cycle, this should change it from even to odd or vice-versa. Correspondingly, we’d like to see that multiplying by a 2-cycle will flip the sign of the product. The following technique will work for any 2-cycle, but let’s just look at multiplication of some permutation ρ by the 2-cycle (1 2): ρ (1 2). This 2-cycle exchanges 1 and 2, so in the product, every $x_1$ becomes an $x_2$ and vice-versa. Let’s write the original product in the following form:

\[
\begin{aligned}
\prod_{1 \le j < i \le n} (x_i - x_j) = {} & (x_2 - x_1)(x_3 - x_1)(x_4 - x_1) \cdots (x_n - x_1) \\
& (x_3 - x_2)(x_4 - x_2) \cdots (x_n - x_2) \\
& (x_4 - x_3) \cdots (x_n - x_3) \\
& \ddots \quad \vdots
\end{aligned}
\]

[…] > 3 – but it is a stronger invariant, as demonstrated next.


3. KNOTTY MATHEMATICS

Exercise 16. Compute τ for the trefoil, the figure 8 knot, and the so-called square knot shown in Figure 9. Conclude that these are all different knots!

The last exercise shows that τ is a more refined invariant than the simple Yes/No of tricolorability: it can distinguish between the trefoil and the square knot, even though both are tricolorable.

Problem 3. Compute τ for various knots from the knot table on page 68. Do you notice a pattern? Can you explain why you see that pattern? (This will be treated in more detail with linear algebra in Section 3.5.)

3.4. Tricolorings and connected sums. Just as we can build any natural number from its prime divisors, we can try to create more complex knots from simpler knots. For this, we will do a bit of “surgery” on the simpler knots in order to join them together, sort of like Siamese twins.

The connected sum $K_1 \# K_2$ of two knots $K_1$ and $K_2$ is formed by erasing a little piece of a strand from each knot and then connecting the loose strands together. The example in Figure 9 takes the right-handed and the left-handed trefoils and forms their connected sum, known as the square knot.

Figure 9. The square knot is the connected sum of two trefoils (panels: $K_1$, $K_2$, $K_1 \# K_2$)

For instance, it is easy to see that a knot K doesn’t change if you connect it with the unknot U; but that K acquires an extra ring around one of its strands if you connect K with the Hopf link H (why?).

Exercise 17. If $K_1$ and $K_2$ are tricolorable, is $K_1 \# K_2$ tricolorable?

Taking connected sums is a good operation on knots as it relates features of the resulting knot to those of its building blocks. One such feature is τ.

Problem 4. Find a formula that relates $\tau(K_1)$, $\tau(K_2)$, and $\tau(K_1 \# K_2)$.

PST 20. It is always a smart idea to check your formulas against some examples that you can work out directly or using other methods.

Problem 5. Consider your formula for $\tau(K_1 \# K_2)$.
(a) Verify it when one of the knots $K_i$ is U or H.
(b) What does it say about $K_1$ and $K_2$ if $K_1 \# K_2$ is tricolorable?
(c) Use it to find τ of a linear chain of n rings (cf. Fig. 10).
(d) Is it useful in finding τ of a necklace of n rings? How about the Brunnian link with n rings from Exercise 4? Calculate τ if you can.

3. THREE CRAYONS DEFEAT AN ARMY OF KNOTS


Figure 10. Chain and Necklace of rings

3.5. Tricolorings and linear algebra over F3. We will now use tools from linear algebra to systematize our study of tricolorings. To this end, we will need to assume knowledge of a few things. You can skip ahead to Section 4 on the Jones polynomial if you don’t know about matrices, systems of linear equations, or adding and multiplying modulo 3.

The set of numbers {0, 1, 2} is a perfectly good place for doing arithmetic:⁵ it is called the field F3. This just means that you can add, subtract, multiply, and divide in F3 subject to all the usual rules, e.g., distributive law, associative law, etc. However, each time you get a number a ∈ F3, you divide a by 3 and replace a by its remainder 0, 1, or 2. (In a fancy language, you reduce a mod 3.) For example, 5 = 2 and 7 = 1, 5 + 7 = 12 = 0, 5 − 7 = −2 = 1, 5 · 7 = 35 = 2, and 7 ÷ 5 = 2. In practice, arithmetic mod 3 boils down to 4 simple tables. The entry in row a and column b is a + b, a − b, a · b, or a ÷ b, respectively; the last table has no column 0 because we cannot divide by 0.

 + | 0 1 2       − | 0 1 2       · | 0 1 2       ÷ | 1 2
---+-------     ---+-------     ---+-------     ---+-----
 0 | 0 1 2       0 | 0 2 1       0 | 0 0 0       0 | 0 0
 1 | 1 2 0       1 | 1 0 2       1 | 0 1 2       1 | 1 2
 2 | 2 0 1       2 | 2 1 0       2 | 0 2 1       2 | 2 1
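The arithmetic rules above are easy to try out on a computer. The following is a minimal sketch, not part of the original session: the helper names `add`, `sub`, `mul`, and `div` are chosen here for illustration, and division is carried out by multiplying with the inverse mod 3.

```python
P = 3  # we work in the field F3 = {0, 1, 2}

def add(a, b):
    """a + b reduced mod 3."""
    return (a + b) % P

def sub(a, b):
    """a - b reduced mod 3."""
    return (a - b) % P

def mul(a, b):
    """a * b reduced mod 3."""
    return (a * b) % P

def div(a, b):
    """a / b in F3; b must not be 0 mod 3."""
    if b % P == 0:
        raise ZeroDivisionError("division by 0 in F3")
    # pow(b, -1, P) is the multiplicative inverse of b modulo P (Python 3.8+)
    return (a * pow(b, -1, P)) % P

# The four examples from the text: 5 + 7, 5 - 7, 5 * 7, and 7 / 5.
print(add(5, 7), sub(5, 7), mul(5, 7), div(7, 5))  # prints: 0 1 2 2
```

Looping `a` and `b` over `range(3)` with these helpers reproduces the four tables entry by entry.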

Moving on, you might have learned about matrices and linear algebra working over Q or R (i.e., using rational or real numbers); but in fact you can do linear algebra over any field, including F3. You can do row operations, find inverse matrices, and solve systems of equations in just the same way. All of the theorems generalize word for word over F3. To get warmed up, do the following couple of computations with linear algebra over F3.

Exercise 18. Write down the coefficient matrix for the system of equations
    2x + y = 0
    x + y + z = 1.
Then write down the augmented matrix and do row operations to find all solutions to this system over F3. How many are there? Why?

Exercise 19. Consider the matrix

\[
A = \begin{pmatrix}
1 & 2 & 0 & 0 & 1 & 1 \\
0 & 1 & 2 & 2 & 0 & 0 \\
0 & 0 & 1 & 0 & 2 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}
\]

How many solutions does the system of equations A x = 0 have over F3?

Exercise 20. Let A be a square matrix over F3. Describe the relationship between the number of solutions to A x = 0 and the number of zero rows in the echelon form of A.

⁵ Review Number Theory I in Volume I or Group Theory I in Volume II.


Solution: If A is a square matrix over F3 with k rows of zeros in its echelon form, then the homogeneous system of equations A x = 0 has 3^k solutions. Indeed, since A is square, each row of zeros in the echelon form of A corresponds to a free variable. After assigning arbitrary values in F3 to all free variables, we can uniquely solve for the remaining (leading) variables. For each of the k free variables, we have three choices (0, 1, or 2 in F3); so there are 3^k possible k-tuples of the free variables, and hence 3^k overall solutions to our system.

3.6. And on to the knots! Let D be a diagram of a link with m crossings and label the strands in the diagram s1, s2, s3, . . . , sn. Most of the time m = n, but sometimes not! Why not? Essentially, the only counterexample is the unknot drawn with 0 crossings: yet, it will still have 1 strand! For the combinatorially inclined, here is one little exercise on counting, which can be skipped without harm.

Exercise 21. Prove that n = m + u where u is the number of D’s unknots drawn with no self-crossings and unlinked to the rest of the diagram.
Hint: To every crossing associate the strands that go under it; and to every strand associate the crossings (if any) under which it goes. ♦

Instead of using 3 colors to label the strands, let’s use the numbers 0, 1, and 2. Then a tricoloring of D is an assignment of one of the numbers 0, 1, or 2 to each strand sk such that at each crossing either all 3 numbers are present or only 1 number is present. Let’s denote the “color” of sk by xk, so that xk ∈ {0, 1, 2}. Thus a “coloring” of D will be a list x1, x2, x3, . . . , xn of “colors” for the strands. But not just any list . . . .

Exercise 22. How many strands meet at a single crossing? Examples?

Solution: There could be 1, 2, or 3 distinct strands meeting at a single crossing. Examples are provided by the unknot twisted once or twice (cf. Fig. 8), or thrice; but you can easily go with the Hopf link and the trefoil for the 2- and 3-strand crossings.

The variety of possibilities at a single crossing is inconvenient. Instead, imagine an ant sitting on the diagram D in the vicinity of our crossing C. The ant will observe three distinct pieces of strands at C and will not know if they are “glued” within the same strands somewhere far away (as in the picture to the right). For the remainder of this section over F3, we will take the ant’s viewpoint: the local coloring of a crossing will consist of the three numbers assigned to the pieces of strands that make up the crossing. Thus, the conditions on a tricoloring say that at each crossing there must be only 1 number repeated three times, or there must be all 3 numbers written in the order determined by our original strand sequence {s1, s2, . . . , sn}.


Exercise 23. In our new language,
(a) List all possible combinations of local colorings of a single crossing. For each of your combinations, compute the sum of the three elements. What do you notice?
(b) Suppose that strands si, sj, and sk (possibly listed with repetitions) meet at a crossing. Based on your observation above, write down an equation that their colors xi, xj, and xk must satisfy.
Answer (b): xi + xj + xk ≡ 0 (mod 3).



As we consistently work mod 3, let’s write xi + xj + xk = 0 with the understanding that this equation lives in F3. Denote by T(D) the set of tricolorings of a knot diagram D. Then by definition, τ(D) is the size of T(D). From Exercise 23 we can describe T(D) as:

\[
T(D) = \left\{ (x_1, x_2, \dots, x_n) \in \mathbb{F}_3^n \;:\; x_i + x_j + x_k = 0 \text{ if strands } s_i,\ s_j, \text{ and } s_k \text{ meet at a crossing} \right\}.
\]

Since there is one equation of the form xi + xj + xk = 0 for each crossing, the conditions on the list (x1, x2, . . . , xn) are a set of m equations in n unknowns. Ah ha! Finding τ(D), that is, calculating the number of allowable sequences (x1, x2, . . . , xn), is a linear algebra problem over F3! Let’s apply this idea.

Problem 6. Let 7_7 be the knot depicted below. Label its strands with the numbers 1 through 7. Find τ(7_7) by completing the following steps.
(a) Write down the equations that must be true in order to have a tricoloring of this knot.
(b) Write down the coefficient matrix A of the resulting system of 7 equations and 7 unknowns.
(c) Do row operations on A to obtain its reduced echelon form B.

The 7_7 knot

Let x = (x1, x2, . . . , x7). Then a tricoloring is a solution x to the matrix equation A x = 0, or, equivalently, to B x = 0. Recalling that you are working over F3, and keeping in mind monochromatic (a.k.a. trivial) colorings:
(d) Decide if this knot is tricolorable.
(e) Even better, count the number τ(7_7) of tricolorings of this knot!

Have fun by playing with this awesome linear algebra tool in the following:

Exercise 24. Apply this linear algebra procedure to all knots whose τ you already know and compare your answers, e.g.,
(a) our six famous knots and links in Figure 4;
(b) the unknot twisted by n consecutive R1-moves in Figure 8;
(c) the knots with odd and even names on page 59;
(d) the square knot; the linear chain and the necklace with n rings each as in Problem 5;
(e) the knots from the knot table in Problem 3 on page 68.
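For readers who would like to let a computer do the row operations, here is a minimal sketch of the procedure, under stated assumptions. The function names `rank_mod3` and `tau` are invented for this illustration, and the crossing triples for the trefoil and the Hopf link are read off their standard diagrams (our own labeling). Each crossing is given by the labels of the three pieces of strand the ant sees there, so a repeated label simply contributes twice to the equation.

```python
def rank_mod3(rows):
    """Rank of a matrix over F3, by Gaussian elimination on a copy."""
    m = [[x % 3 for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    rank = 0
    for col in range(ncols):
        # look for a row at or below position `rank` with a nonzero entry here
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no leading entry in this column: a free variable
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, 3)           # inverse of the pivot mod 3
        m[rank] = [(inv * x) % 3 for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                f = m[r][col]
                m[r] = [(x - f * y) % 3 for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank

def tau(n_strands, crossings):
    """Number of solutions of the crossing equations x_i + x_j + x_k = 0 over F3.

    `crossings` is a list of triples of 1-based strand labels.
    """
    rows = []
    for triple in crossings:
        row = [0] * n_strands
        for s in triple:
            row[s - 1] += 1      # a strand seen twice at a crossing counts twice
        rows.append(row)
    free_variables = n_strands - rank_mod3(rows)
    return 3 ** free_variables

print(tau(3, [(1, 2, 3)] * 3))          # trefoil: all three strands at each crossing
print(tau(2, [(1, 2, 2), (2, 1, 1)]))   # Hopf link: over-strand plus two pieces of the other ring
print(tau(1, []))                       # unknot drawn with no crossings
```

The count is 3 raised to the number of free variables, so the sketch also works when the number of crossings differs from the number of strands.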


4. The Jones Polynomial

4.1. Revolutionizing knot theory. The Jones polynomial is another example of a link invariant; but instead of being a number (like the crossing number) or a simple Yes/No (like tricolorability), it is a polynomial. Actually, it’s not quite a polynomial, since it can have half-integer and negative exponents as well, but it is commonly referred to as a polynomial. It was discovered in 1984 by UC Berkeley Professor Vaughan Jones, and it revolutionized the world of knot theory! Suddenly longstanding conjectures were easy to prove and a whole host of generalizations were invented. In 1990, Jones won the Fields Medal for his work.⁶ Recall that being an invariant means that:
• If two knots or links are equivalent, then their Jones polynomials are equal. In other words, if two knots or links have different Jones polynomials, then they are not the same object!
• The Jones polynomial is not perfect, though. It can happen that two different knots or links have the same Jones polynomial.

4.2. Orienting links. A key idea in defining the Jones polynomial is the notion of an orientation on a link: this is just a choice of direction for each of the link’s components. Here are the two possible oriented Hopf links:

They turn out to be inequivalent as oriented links, since it is impossible to transform one into the other (via legal link moves) while still keeping the assigned orientations. But to explain why this is impossible would take us too far afield and is not necessary for our purposes. The top orientation, called the positively-oriented Hopf link, will be denoted by H, while the bottom orientation by H−. To get a feeling for oriented links, play around with the following warm-up questions:



Exercise 25. How many orientations does a knot have? Display all possible orientations of the unknot drawn with 0, 1, or 2 twists (cf. Fig. 8), and decide which are equivalent. Give all orientations for the trefoil, the Whitehead link, and the Borromean rings, and think about which of them are equivalent.

⁶ Sir Vaughan Frederick Randal Jones was born in 1952 in New Zealand. In 1979 he completed his doctoral studies at the University of Geneva, under the Swiss topologist André Haefliger. The next year, Jones moved to the United States, and after teaching for several years at the University of California at Los Angeles and the University of Pennsylvania, he received a permanent position at the University of California at Berkeley. In 1984, while working in the theory of von Neumann algebras (an area in analysis motivated by group representations, operator theory, ergodic theory, and quantum mechanics), Jones discovered the link invariant known now as the Jones polynomial, which unexpectedly had vast applications in knot theory and re-energized the study of low-dimensional topology. In 2002, Jones received the Distinguished Companionship of the New Zealand Order of Merit, which was renamed Knight Companion in 2009.


4.3. What is the Jones polynomial? There are several choices for how to define the Jones polynomial. For our purposes, the easiest way is through the so-called skein relation, which relates the Jones polynomials of certain triplets of (oriented) links. The diagrams of these links L+, L−, and L0 are identical except at one specific crossing (cf. Fig. 11), where L+ has an overcrossing, L− has an undercrossing, and L0 has no crossing.

Figure 11. Links in the skein relation (L+, L−, and L0)

Definition 1. The Jones polynomial VL is defined for all oriented links L by the following three properties.⁷
• VU(t) = 1, where U is the unknot.
• VL is an invariant of links.
• VL satisfies the skein relation: for any triplet of oriented links L+, L−, and L0 as described above (cf. Fig. 11),

\[
\frac{1}{t}\, V_{L_+} - t\, V_{L_-} = \left( \sqrt{t} - \frac{1}{\sqrt{t}} \right) V_{L_0}.
\]

The skein relation looks complicated, but it helps us relate the Jones polynomials of knots that differ at one crossing.



PST 21. If you can find three links whose diagrams are identical except at one specific crossing, where they differ as in Figure 11, then you can use the skein relation to relate their Jones polynomials. If you know two of the Jones polynomials, the skein relation will allow you to solve for the third!

We will do an example shortly; but first let us mention an open problem:

Question 1. (Open) If knot K has Jones polynomial 1, is K equivalent to the unknot? Equivalently, is there a nontrivial knot with Jones polynomial 1?

This is such a simple question; yet we still don’t know the answer! Perhaps you can enlighten us someday.

4.4. Building up the trefoil via the skein relation. We will go through a series of examples to build up to computing the Jones polynomial of the trefoil. We already know that VU(t) = 1. To see how the skein relation works in practice, let us move to the next simplest case:

Exercise 26. Find the Jones polynomial of the unlink with two components.

⁷ The reader will notice that if we adopt Definition 1, we must prove that the Jones polynomial actually exists and is unique for every link! We don’t have space for this here; however, the advanced reader is encouraged to look up the proof in [44].


Solution: Let U stand for the (oriented) unknot, and U2 for the (oriented) unlink with two components. By changing the uncrossing of U2 to an over-crossing and an under-crossing, we can relate U2 to two copies of the unknot:

Figure 12. Skein relation for the unlink U2 (L+ = U, L− = U, L0 = U2)

If VU and VU2 are the Jones polynomials for the unknot and our unlink, respectively, the skein relation yields:

\[
\frac{1}{t}\, V_U - t\, V_U = \left( \sqrt{t} - \frac{1}{\sqrt{t}} \right) V_{U_2}
\quad \overset{V_U = 1}{\Longrightarrow} \quad
\frac{1}{t} - t = \left( \sqrt{t} - \frac{1}{\sqrt{t}} \right) V_{U_2}.
\]

Using the formula a^2 − b^2 = (a − b)(a + b), we can now solve for VU2:

\[
V_{U_2}(t) = \frac{\frac{1}{t} - t}{\sqrt{t} - \frac{1}{\sqrt{t}}}
= \frac{\left( \frac{1}{\sqrt{t}} - \sqrt{t} \right)\left( \frac{1}{\sqrt{t}} + \sqrt{t} \right)}{\sqrt{t} - \frac{1}{\sqrt{t}}}
= -\sqrt{t} - \frac{1}{\sqrt{t}}.
\]



We just found the Jones polynomial of the unlink with two components! Now, if T denotes the right-handed trefoil (cf. Fig. 2a), how do we find VT?

PST 22. To calculate VL, reason backwards: relate the link L to other (presumably simpler) links via the skein relation. Keep applying the skein relation to those new links, until you end up with links whose Jones polynomials are already known to you.



Exercise 27. Draw pictures that relate the right-handed trefoil T to other well-known links, one of which is the positive Hopf link H (cf. p. 64).

Now we are stuck with the Hopf link! Get over this obstacle:

Exercise 28. Relate the (positive) Hopf link H to well-known links. Write down what the skein relation says in your diagram and solve for VH.

Partial Solution: The diagram relates H = L+ to the unlink U2 = L− and the unknot U = L0; notice that the only change occurs in the top crossing. After “skeining,” the final answer comes to VH(t) = −√t − t^2 √t. ♦

Figure 13. Skein relation for the Hopf link (L+ = H, L− = U2, L0 = U)

We are now ready to put together everything and attack the trefoil again.


Exercise 29. Using your findings so far, calculate the Jones polynomial VT of the right-handed trefoil T.

The careful reader might have noticed that we skipped one simple link: the negatively-oriented Hopf link H−. Its Jones polynomial turns out to be VH− = −t^{−1/2} − t^{−5/2} (check it!), which differs from VH! Have we made a mistake? It is important to understand that the Jones polynomial is an invariant of oriented links. Orientation does not affect the Jones polynomial of a knot (why?); but for a general link, you may get different Jones polynomials, depending on the link’s orientations. With this in mind,

Exercise 30. Find the Jones polynomials of the figure 8 knot, the Whitehead link, the Borromean rings, and the square knot.

The more adventurous reader can also try out the complicated links in Figure 3. The idea is to be systematic in your calculations: track down which links need to be dealt with and keep a record of the Jones polynomials already found. For those skilled in induction and algebraic operations on polynomials, here are a couple of challenges in true math-Olympiad style.

Problem 7. Find the Jones polynomials of the unlink with n components, the linear chain with n components, and the knots n_1 from page 59.

4.5. Mirror, mirror. Imagine taking a knot and switching all the crossings. Doing this to a knot creates the knot’s mirror image.⁸ Below you see the figure 8 knot and its mirror image. Are these two knots equivalent? For starters, we should look at their Jones polynomials.



Exercise 31. The Jones polynomial of the figure 8 knot, as you should have shown earlier, is V_{4_1} = t^2 − t + 1 − t^{−1} + t^{−2}. Compute the Jones polynomial of its mirror image to obtain the same result! Are the two knots equivalent?

The figure 8 and its mirror image

Hmm . . . the Jones polynomial can’t tell the difference between these two knots. They might be the same, but they might not! In fact, there are special words to reflect both possibilities. A knot is amphichiral if it is equivalent to its mirror image; it is chiral otherwise.

Problem 8. Make a figure 8 knot and its mirror image out of rope. Play with the ropes to try to see if the figure 8 knot is chiral or amphichiral. If you think it is amphichiral, then prove it using Reidemeister moves! What if not?

Wait a minute! Shouldn’t we try this on a simpler example?

 Exercise 32. Is the trefoil chiral or amphichiral?

⁸ Think about where the name comes from! Nope, despite appearances, it’s not a reflection across a vertical line! Where is the “mirror”?


Partial Solution: As you must have found earlier, the right-handed trefoil has VT = t + t^3 − t^4. However, its mirror image, the left-handed trefoil T̄, has VT̄ = t^{−1} + t^{−3} − t^{−4}. Since the Jones polynomial is an invariant of knots, we conclude that the right-handed and left-handed trefoils are different knots, and hence the trefoil is chiral. ♦

Implicitly, we have assumed that the mirror image of a link does not depend on the particular diagram we use . . . . But we haven’t proven this!

Exercise 33. Let D2 be obtained from a link diagram D1 via a Reidemeister move. Prove that the mirror images of D1 and D2 represent the same link.

Exercise 34. Below are some pictures of knots, their mirror images, and their Jones polynomials. What do you notice? Make a conjecture! Then give a criterion on the Jones polynomial that implies that a knot is chiral.

\begin{align*}
V_{6_1} &= t^2 - t + 2 - \frac{2}{t} + \frac{1}{t^2} - \frac{1}{t^3} + \frac{1}{t^4}, \\
V_{\overline{6_1}} &= \frac{1}{t^2} - \frac{1}{t} + 2 - 2t + t^2 - t^3 + t^4, \\
V_{8_{16}} &= -t^2 + 3t - 4 + \frac{6}{t} - \frac{6}{t^2} + \frac{6}{t^3} - \frac{5}{t^4} + \frac{3}{t^5} - \frac{1}{t^6}, \\
V_{\overline{8_{16}}} &= -\frac{1}{t^2} + \frac{3}{t} - 4 + 6t - 6t^2 + 6t^3 - 5t^4 + 3t^5 - t^6, \\
V_{9_{10}} &= -t^{11} + t^{10} - 3t^9 + 5t^8 - 5t^7 + 6t^6 - 5t^5 + 4t^4 - 2t^3 + t^2, \\
V_{\overline{9_{10}}} &= -\frac{1}{t^{11}} + \frac{1}{t^{10}} - \frac{3}{t^9} + \frac{5}{t^8} - \frac{5}{t^7} + \frac{6}{t^6} - \frac{5}{t^5} + \frac{4}{t^4} - \frac{2}{t^3} + \frac{1}{t^2}.
\end{align*}

Figure 14. Chiral or amphichiral?

Here are a few of the many intriguing and fundamental properties of the Jones polynomial, some demonstrated by Jones himself in 1985.

Theorem 2. For any knot K, its mirror image K̄, and links L, L1, and L2:
(a) VK̄(t) = VK(t^{−1}) and VK(e^{2πi/3}) = 1, where e^{2πi/3} = cos(2π/3) + i sin(2π/3).
(b) (d/dt) VK(1) = 0, where d/dt is the derivative with respect to t.
(c) VL(1) = (−2)^{p−1}, where p is the number of components of L. Moreover, if p is odd, then VL(t) is a polynomial with integer powers; if p is even, then VL(t) is t^{1/2} times such a polynomial.
(d) VL1#L2(t) = VL1(t) · VL2(t).

We challenge the advanced reader to prove or find proofs of these facts.
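Before attempting proofs, it can be reassuring to test parts (a), (b), and (c) of Theorem 2 numerically on polynomials already quoted in this session. The sketch below is only a spot check under our own conventions: a Laurent polynomial is stored as a dictionary from integer exponents of t to coefficients, and the helper names `evaluate`, `derivative`, and `mirror` are invented for this illustration.

```python
import cmath

# Jones polynomials quoted in this session, as {exponent of t: coefficient}
V_TREFOIL = {1: 1, 3: 1, 4: -1}                   # t + t^3 - t^4
V_FIGURE8 = {2: 1, 1: -1, 0: 1, -1: -1, -2: 1}    # t^2 - t + 1 - t^-1 + t^-2

def evaluate(poly, t):
    """Value of the Laurent polynomial at a nonzero number t."""
    return sum(c * t ** e for e, c in poly.items())

def derivative(poly):
    """Term-by-term derivative with respect to t."""
    return {e - 1: c * e for e, c in poly.items() if e != 0}

def mirror(poly):
    """Substitute t -> 1/t, as in Theorem 2(a)."""
    return {-e: c for e, c in poly.items()}

omega = cmath.exp(2j * cmath.pi / 3)   # e^{2 pi i / 3}

for name, v in [("trefoil", V_TREFOIL), ("figure 8", V_FIGURE8)]:
    print(name,
          abs(evaluate(v, omega) - 1) < 1e-9,   # (a): V_K(e^{2 pi i/3}) = 1
          evaluate(derivative(v), 1) == 0,      # (b): derivative vanishes at t = 1
          evaluate(v, 1) == 1,                  # (c) with p = 1 component
          mirror(v) == v)                       # is V_K(t) equal to V_K(1/t)?
```

The last column is `False` for the trefoil and `True` for the figure 8 knot, matching the discussion of Exercises 31 and 32: the Jones polynomial detects that the trefoil is chiral but cannot decide the question for the figure 8 knot.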


4.6. Mysticism, art, and mathematics. Bumping into knot and link celebrities is a daily occurrence for everyone, whether we realize it or not. For example, the trefoil is often the centerpiece of beautiful jewelry:

Figure 15. Trefoil ring and “Russian” wedding ring



The very popular “Russian” wedding ring (a.k.a. Cartier trinity ring) is simply a link of 3 unknots. As opposed to the Borromean rings, removing one of the unknots does not cause the rest to fall apart but leaves a Hopf link. A feature making it so convenient to wear can be appreciated when the 3 pieces are aligned to put them on a finger: they glide on top of each other, causing the whole ring to smoothly move across the finger! Try this on a homemade link: it’s worth it to see the gliding in action. Is it possible to create a “super-Russian” wedding ring of 4 pieces with similar properties?

The Celtic knot (or The Emblem of Divine Inscrutability), rumored to contain all the wisdom of King Solomon, appears in an array of artistic versions. It is actually not a knot but a link of 2 unknots intertwined twice and is known in mathematics as the 4-crossing link.

Figure 16. King Solomon’s knot and the IMO logo

There is no way we can omit our last example: the logo of the International Mathematical Olympiad itself is our old friend, the Whitehead link; but instead of being non-trivially tricolored (which it cannot be!), the link is 5-colored in honor of another even more famous logo. Can you guess which one?



Exercise 35. Calculate the Jones polynomials of the Celtic knot and of the “Russian” wedding ring, and compare them, correspondingly, to those of
(a) the Hopf link; the unlink with 2 rings; and the Whitehead link;
(b) the Borromean rings; the linear chain; and the unlink with 3 rings.

You can find on the internet a multitude of intricate knots in a variety of situations. If you have loads of time on your hands, our final exercise will offer you exhaustive practice in calculating and analysing Jones polynomials.

Exercise 36. Verify the properties of Jones polynomials in Theorem 2 in all cases of knots and links in this session, including the celebrity ones.


5. Is This the End?

Definitely not! If you would like to learn more about knots, you should have a look at Justin Roberts’ “Knots Notes,” available on his website [68]. Another great resource, as we mentioned earlier, is “The Knot Book” by Colin Adams [1]. Of course, you should also take a peek at the more recent article “The Jones Polynomial” by the master himself, Sir Vaughan Jones, on his own website [44].

We mentioned in passing some funny-sounding, yet rigorous knot terminology. If you are curious about a quandle, it is a knot invariant, discussed in the accessibly written “Knot Quandle” by then-undergraduate Elenoir Birrell [10]. For using flypes – a different type of knot transformation – to prove “The Tait Flyping Conjecture,” we direct you to two papers of Menasco and Thistlethwaite [54, 55]. Finally, a writhe – a property of a positively-oriented link – fails to be a knot invariant, as demonstrated by Hoste et al. in “The First 1,701,936 Knots” [42].

Regarding the open Question 1 on page 65, check out “Links with trivial Jones polynomial” [81] and “Infinite families of links with trivial Jones polynomial” [23] by Thistlethwaite et al. Even the basic notion of Reidemeister moves enters into modern research nowadays; for some upper bounds on “The number of Reidemeister moves needed for unknotting” we direct you to Hass and Lagarias’ paper [38].

Many of the images in this session were created using Robert Scharein’s KnotPlot software at http://knotplot.com, which you should absolutely download and play with! It allows you to load knots from a library up to 10 crossings, see them in 3-D, compute polynomial invariants, sketch new knots, and much, much more! The author would also like to thank Henning Hohnhold for the idea to include the Alexander the Great story.

6. Hints and Solutions to Selected Exercises

Exercise 1+. In Figure 2, the upper loop of the knot on the right has two twists. Just untwist it to get the trefoil.

Figure 17. Right-handedness and Unknotting 2-crossings knots

Why is the trefoil T in Figure 2a called right-handed? Orient T by tracing it in one of the two possible directions (as in Fig. 17a). Then every crossing of T is right-handed: if you grasp the over-strand in your right hand with the thumb pointing in its direction, then your other fingers point in the direction of the under-strand; thus, each crossing is of type L+ (cf. skein relation in Fig. 11). This remains true for the right-handed trefoil T regardless of its orientation, and it is false for the left-handed trefoil.

Exercise 3. A knot diagram with 1 or 2 crossings inevitably results in the unknot, as demonstrated by the untwisting in Figure 8. Trying something “different” with 2 crossings as in Figure 17b doesn’t help: just pull the strand that is draped over the other to eliminate the crossings and get the unknot. ♦

Exercise 4. Take n − 1 unknots and arrange them in a line so that each one overlaps slightly with the one before and the one after. Add in the final link by weaving through these, going over and under, over and under, and then fuse the ends of this final link together. A case of a Brunnian link with 4 components is displayed in Figure 3b; one actually has to stare at it for a while to realize that our construction recipe is not followed to the letter. For another construction using “rubberbands” check [57]. To see a Brunnian link with 5 components go to YouTube at www.youtube.com/watch?v=vshcgnSUtyI and watch it fall apart in slow motion when one link is cut. In fact, for each n the infinitely many Brunnian links with n components were classified in 1954 by John Milnor via what is now called Milnor invariants [56]. ♦

Exercise 5. Use two R1 moves or one R2 move.



Exercise 6. Whichever crossing you choose to change, the Hopf link will become an unlink, the trefoil and the figure 8 knot will turn into the unknot, and one of the Borromean rings will peel off, forcing the remaining two rings into a Hopf link. In this respect, the Whitehead link is more interesting: changing any of its 4 “outside” crossings results in the Hopf link; but changing its central crossing breaks it into an unlink! You should check that the transformations described here (except for crossing changes, of course!) can be expressed as sequences of Reidemeister moves. ♦

Exercise 9. Show that any diagram with three crossings represents either the trefoil or the unknot. ♦

Exercise 10. The solution to Exercise 6 actually tells us that the unknotting number is 2 for the Borromean rings, 0 for the unknot (of course!), and 1 for all other knots and links in Figure 4. ♦

Exercise 11. The links in Figure 4 will be ordered if you insert the 2-strand Hopf link between the 1-strand unknot and the 3-strand trefoil. Note that the number of strands in these links equals the number of crossings, except for the unknot. Do you know why? Check out Exercise 21. ♦

Problem 1. Tri-color the diagram D. An R1 move can be performed only on a monochromatic crossing, after which the crossing is eliminated but its color is preserved (cf. Fig. 18a). For R2, the over-crossing strand is all one color, while the three under-crossing strands can be colored in two ways (cf. Fig. 18b-c). In either case, after pulling apart the strands by move R2, the diagram remains tricolorable: indeed, all strands “exiting” the picture preserve their colors, thereby allowing for the rest of the (unseen) diagram to remain tricolored as before. Note that we are only allowed to change the color of strands that lie entirely inside our picture.

Figure 18. Moves R1 and R2 and tricolorability

The same idea governs tricolorability when applying move R3. The first picture in Figure 19 has five “exiting” strands (in black) and one “non-exiting” strand (in green). There are five ways to tri-color this diagram segment: two cases with monochromatic (blue) central crossing and three cases with tricolored central crossing. Check that after move R3, all “exiting” strands have preserved their colors, while the “non-exiting” central strand may preserve its color (as in column 2) or may change its color (as in column 3).

Figure 19. Move R3 and tricolorability

Exercise 13. No: Hopf link, Figure 8, Whitehead link, Borromean rings. Yes: trefoil, 7_4 knot. The picture has 5 trefoils, including the hairdo! ♦

Exercises 14-15. These knots are tricolorable if and only if the number of crossings is divisible by 3. Think about why! ♦

Problem 2. Use the solution to Problem 1.



Exercise 16. τ (Trefoil) = 9; τ (Figure 8) = 3; τ (Square) = 27.



Problem 3. τ(6_1) = 9 = τ(6̄_1); τ(8_16) = 3 = τ(8̄_16); τ(9_10) = 9 = τ(9̄_10), where a bar denotes the mirror image. ♦

Exercise 17. Yes, K1 #K2 is tricolorable. Let α1 and α2 be the strands in K1 and K2, respectively, on which the “surgery” will be performed. If α1 and α2 have different colors, permute the colors on K2 to make α2’s color match α1’s color. Then perform the surgery and extend that common color onto the pieces connecting K1 and K2 within K1 #K2.


Problem 4. τ(K1 #K2) = (1/3) τ(K1) τ(K2). The factor of 1/3 is explained within the solution of Exercise 17. In order for the colorings of K1 and K2 to match for a coloring of K1 #K2, the strands α1 and α2 must be colored the same, i.e., every coloring of K1 can be matched with exactly a third of the colorings of K2. ♦

Problem 5. (a) Connecting the unknot U to a knot K1 doesn’t change K1, i.e., K1 #U = K1. However, connecting the Hopf link H to K1 loops an extra ring around the strand α1 of K1 (α1 was described in the previous solution); thus, in any coloring of K1 #H, the color of α1 forces the same color on the extra ring (why?), implying τ(K1 #H) = τ(K1). Since τ(U) = 3 = τ(H), we can now verify the formula from Problem 4:
• τ(K1 #U) = τ(K1) = (1/3) τ(K1) τ(U);
• τ(K1 #H) = τ(K1) = (1/3) τ(K1) τ(H).

(b) If τ(K1 #K2) > 3, then τ(K1) τ(K2) > 9, i.e., τ(K1) > 3 or τ(K2) > 3; so one of K1 or K2 is tricolorable. This is a converse to Exercise 17. To summarize, K1 #K2 is tricolorable iff one of K1 or K2 is tricolorable.

(c) According to part (a), a linear chain of n rings, Ln, can be constructed by consecutively summing n − 1 Hopf links H to the unknot U = L1; moreover, each repetition of this operation preserves the number of tricolorings. Hence, τ(Ln) = τ(L1) = 3, and Ln is not tricolorable (which can be directly verified by trying and failing to tricolor a linear chain).

(d) If Nn is the necklace of n rings, then Nn = Nn #U; but this is not useful in calculating τ(Nn). Moreover, Nn cannot be viewed as a non-trivial sum of other links. To see this, make two cuts in Nn (in the same ring or in two different rings) and reconnect the 4 ends in an attempt to decompose into a connected sum and reconstruct links L1 and L2 with L1 #L2 = Nn. The result will depend on the choices you make. Check out all possibilities and conclude that we cannot effectively use the formula for τ from Problem 4. We have to calculate τ(Nn) by brute force!

Tricoloring one of Nn’s crossings forces a unique tricoloring on the adjacent crossings, which in turn forces a unique tricoloring on their adjacent crossings, and so on and so forth. In order to successfully complete the tricoloring of the whole necklace, the tricoloring of the final crossing will have to be matched as coming from both directions along Nn. You can easily verify that this happens only when n is divisible by 3. In such a case, the initial tricoloring of a crossing uniquely determines the whole tricoloring of Nn. As there are exactly 3! = 6 possible tricolorings (plus 3 monochromatic colorings) of a crossing, this makes a total of 9 colorings of Nn. The final answer is τ(Nn) = 9 if 3 divides n, and τ(Nn) = 3 otherwise.

Attempting to tricolor the Borromean rings will quickly lead to a contradiction. However, deciding if a general Brunnian link with n components is tricolorable or counting its tricolorings, will require deeper investigation (as suggested by the comments to Exercise 4). ♦
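In the spirit of PST 20, the formula from Problem 4 can also be checked against values that were found independently. The short computation below is only a sanity check: it uses τ(Trefoil) = 9 and τ(Square) = 27 from Exercise 16, the description of the square knot as a connected sum of two trefoils, and τ(H) = 3; writing L_n = L_{n−1} # H for the chain is our shorthand for the construction in part (c).

```latex
\begin{align*}
\tau(\text{Square}) &= \tfrac{1}{3}\,\tau(\text{Trefoil})\,\tau(\text{Trefoil})
                     = \tfrac{1}{3}\cdot 9\cdot 9 = 27, \\
\tau(L_n) &= \tfrac{1}{3}\,\tau(L_{n-1})\,\tau(H)
           = \tfrac{1}{3}\cdot 3\cdot 3 = 3
           \quad\text{whenever } \tau(L_{n-1}) = 3 .
\end{align*}
```

Both lines agree with the answers obtained directly, which is reassuring but of course not a proof of the formula.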

Exercise 18.

\[
\left(\begin{array}{ccc|c} 2 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \end{array}\right)
\sim
\left(\begin{array}{ccc|c} 1 & 0 & -1 & -1 \\ 1 & 1 & 1 & 1 \end{array}\right)
\sim
\left(\begin{array}{ccc|c} 1 & 0 & -1 & -1 \\ 0 & 1 & 2 & 2 \end{array}\right).
\]

Hence z is a non-leading (free) variable, x = z − 1, and y = 2 − 2z = 2 + z (as −2 = 1 in F3). There are 3 choices for z ∈ F3, each of which completely determines x and y. Thus, overall there are 3 solutions.

Exercise 19. The 3 leading 1s in the non-zero rows determine the 3 leading variables x1, x2, and x3. Each of the non-leading variables, x4, x5, and x6, can be 0, 1, or 2. Thus, overall there are 3^3 = 27 solutions.

Exercise 23. (a) 0, 0, 0; 1, 1, 1; 2, 2, 2; 0, 1, 2; 2, 0, 1; 1, 2, 0; 0, 2, 1; 1, 0, 2; 2, 1, 0: 3 mono-colorings and 6 tricolorings. The sum is always divisible by 3.

Problem 6. (a)-(c) Below is the linear system. Of course, you might have written down the equations in another order or labeled the strands differently. This will make your matrix A look slightly different: your rows or columns might be shuffled.

\[
\begin{aligned}
x_1 + x_2 + x_7 &= 0 \\
x_1 + x_2 + x_3 &= 0 \\
x_3 + x_4 + x_7 &= 0 \\
x_4 + x_5 + x_6 &= 0 \\
x_1 + x_4 + x_5 &= 0 \\
x_3 + x_5 + x_6 &= 0 \\
x_2 + x_6 + x_7 &= 0
\end{aligned}
\]

\[
\Rightarrow \quad
A = \begin{pmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 \\
1 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 & 1
\end{pmatrix}
\sim
B = \begin{pmatrix}
1 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 1 & 0 & 2 & 2 & 0 & 1 \\
0 & 0 & 1 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 2 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}
\]

Performing row operations on A, we bring it to the echelon form B (a few more row operations would clear the entries above the leading 1s and give the reduced form); your matrix B must also have 5 leading 1s in the top 5 rows, and 2 zero rows at the bottom.

(d) In general, the solutions to the system of equations form the null space of A. We can always make a monochromatic coloring (e.g., set xi = 2 for all i); this coloring would certainly be a solution to the system of equations; but it won’t be a nontrivial tricoloring. The set of monochromatic colorings in T(D) is the span of the vector with all 1s and is, therefore, 1-dimensional. Any other solution vector will constitute a nontrivial tricoloring, raising the dimension of the null space of A to 2 or larger. Equivalently, the echelon form of A must have more than 1 row of zeros. Since our B has 2 rows of zeros, our knot 7_7 is tricolorable!

(e) Since B has 2 free variables, with 3 choices for each, there are 9 tricolorings of 7_7, i.e., τ(7_7) = 9. More generally,

Lemma 1. τ(L) = 3^{dim Null(A)} for a link L with matrix A.
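As PST 20 suggests, the count τ(7_7) = 9 can be double-checked by a completely different method: simply try all 3^7 = 2187 assignments of colors and keep those that satisfy every crossing equation. The sketch below does this by brute force; the name `count_colorings` is invented here, and the crossing triples are copied from the seven equations in the solution above.

```python
from itertools import product

# One triple of strand labels per crossing of 7_7, read off the equations
# x1+x2+x7 = 0, x1+x2+x3 = 0, ..., x2+x6+x7 = 0 in the solution above.
CROSSINGS_7_7 = [(1, 2, 7), (1, 2, 3), (3, 4, 7), (4, 5, 6),
                 (1, 4, 5), (3, 5, 6), (2, 6, 7)]

def count_colorings(n_strands, crossings):
    """Count lists (x1, ..., xn) in F3^n with x_i + x_j + x_k = 0 at every crossing."""
    count = 0
    for x in product(range(3), repeat=n_strands):
        if all((x[i - 1] + x[j - 1] + x[k - 1]) % 3 == 0 for i, j, k in crossings):
            count += 1
    return count

print(count_colorings(7, CROSSINGS_7_7))  # prints: 9
```

The 9 colorings found this way include the 3 monochromatic ones, so 6 of them are nontrivial tricolorings, in agreement with parts (d) and (e).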



Exercise 25. A priori, a knot has 2 possible orientations: by following it all around in one direction, or in the other. Revolving the standard diagrams of the unknot (with as many twists as you wish), the trefoil, and the figure eight about a vertical line transforms one orientation to the other, and hence each of these knots has exactly 1 orientation.


A link $L$ with $k$ components has a priori $2^k$ possible orientations (why?), some of which may be equivalent. As mentioned earlier, the Hopf link has only 2 (not 4) non-equivalent⁹ orientations. Revolving the Whitehead link $W$ about a vertical or horizontal line shows that it has only 1 orientation (instead of 4). For the Borromean rings, rotations by 120° and a revolution about a vertical line reduce the $2^3 = 8$ initial orientations to only 2: with all rings oriented the same way, or with one ring oriented opposite to the others. Are these orientations really non-equivalent? ♦

Exercise 27. By changing one crossing of the right-hand trefoil $T = L_+$, we obtain the unknot $U = L_-$ and the Hopf link $H = L_0$ (cf. Fig. 20).

[Figure 20. Trefoil in the skein relation: $L_+ = T$, $L_- = U$, $L_0 = H$.]

Exercise 29. From Exercises 27–28 we have:



\[
\frac{1}{t}V_T - tV_U = \Big(\sqrt{t} - \frac{1}{\sqrt{t}}\Big)V_H = \Big(\sqrt{t} - \frac{1}{\sqrt{t}}\Big)\sqrt{t}\,(-1 - t^2) = (t - 1)(-1 - t^2).
\]

Since $V_U = 1$, this simplifies to $V_T = t(t^2 + 1 - t^3) = t + t^3 - t^4$.



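Exercise 30. For a link $L$, the table below lists triplets $(L_+, L_-, L_0)$ entering in a skein relation with $L$. Here $X \sqcup Y$ is the disjoint union of $X$ and $Y$.

| Link $L$ | $L_+$ | $L_-$ | $L_0$ | Jones polynomial $V_L$ |
|---|---|---|---|---|
| two unknots $U_2$ | $U$ | $U$ | $U_2$ | $-t^{1/2} - t^{-1/2}$ |
| positive Hopf link $H$ | $H$ | $U_2$ | $U$ | $-t^{1/2} - t^{5/2}$ |
| right-hand trefoil $T$ | $T$ | $U$ | $H$ | $t + t^3 - t^4$ |
| Figure 8, $4_1$ | $U$ | $4_1$ | $H$ | $t^2 - t + 1 - t^{-1} + t^{-2}$ |
| Whitehead link $W$ | $H$ | $W$ | $U$ | $t^{-3/2}(-1 + t - 2t^2 + t^3 - 2t^4 + t^5)$ |
| Borromean rings $B$ | $B$ | $H \sqcup U$ | $W$ | $-t^3 + 3t^2 - 2t + 4 - 2t^{-1} + 3t^{-2} - t^{-3}$ |
| Square knot $S$ | $S$ | $T$ | $T \# H$ | $(t + t^3 - t^4)(t^{-1} + t^{-3} - t^{-4})$ |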

For a knot $K$, changing the orientation on one strand in a local crossing picture forces us to change the orientation of the other strand (by tracing around the knot); thus, it preserves the crossing types $L_+$, $L_-$, and $L_0$, and does not affect $V_K$. For a link $L$, though, $V_L$ may depend on the orientation of $L$: you can easily see this for the negative Hopf link $H^-$. Why doesn’t it matter for the unlink $U_2$? We leave it to the reader to decipher which orientations we have used for the Whitehead link $W$ and the Borromean rings $B$ above and if $V_W$ and $V_B$ are affected by our choices.

9 The positive and negative Hopf links $H$ and $H^-$ are inequivalent (the fact that they have different Jones polynomials proves this). The next exercise will help you calculate their Jones polynomials. Ditto for the Borromean rings.


While the first five examples in the table can be handled one by one in the listed order, for $B$ and $S$ we need to know the Jones polynomials of $H \sqcup U$ and $T \# H$, which must be computed separately. To get $V_{H \sqcup U}$, note that $U_2 = U \sqcup U$. Using the “skein” triplet $(L_+ = U,\ L_- = U,\ L_0 = U_2)$, we found earlier that $V_{U_2} = -(1/\sqrt{t} + \sqrt{t})V_U$. Our calculation generalizes to any disjoint union $L \sqcup U$. Indeed, the corresponding skein relation is represented by $(L_+ = L,\ L_- = L,\ L_0 = L \sqcup U)$ (why?), and

\[
\frac{1}{t}V_L - tV_L = \Big(\sqrt{t} - \frac{1}{\sqrt{t}}\Big)V_{L \sqcup U}.
\]

Algebra manipulations similar to those in the text for $V_{U_2}$ yield

\[
V_{L \sqcup U} = \frac{\frac{1}{t} - t}{\sqrt{t} - \frac{1}{\sqrt{t}}}\,V_L
= \frac{\big(\frac{1}{\sqrt{t}} + \sqrt{t}\big)\big(\frac{1}{\sqrt{t}} - \sqrt{t}\big)}{\sqrt{t} - \frac{1}{\sqrt{t}}}\,V_L
= -\Big(\frac{1}{\sqrt{t}} + \sqrt{t}\Big)V_L.
\]

This allows for a painless calculation of $V_{H \sqcup U}$ (and $V_B$) and also establishes

Lemma 2. $V_{L \sqcup U} = -\big(\frac{1}{\sqrt{t}} + \sqrt{t}\big)V_L$ for any link $L$.

Finally, $V_S$ for the square knot $S$ can be yanked out through the skein relation with $(S, T, T \# H)$. A faster approach would be to apply Theorem 2 (cf. p. 68), using that $S$ is the connected sum of $T$ and its mirror image $\overline{T}$:

\[
V_S = V_{T \# \overline{T}}(t) \overset{\text{Thm. 2(d)}}{=} V_T(t) \cdot V_{\overline{T}}(t) \overset{\text{Thm. 2(a)}}{=} V_T(t) \cdot V_T(t^{-1}).
\]



Problem 7. We can view the unlink $U_n$ with $n$ components as the disjoint union $U_n = U_{n-1} \sqcup U$. According to Lemma 2 from the previous solution, disjointly adding an unknot $U$ to a link $L$ multiplies $V_L$ by $\big(-\frac{1}{\sqrt{t}} - \sqrt{t}\big)$. Starting with the unknot $U_1 = U$ and applying this procedure $n - 1$ times results in $U_n$. Since $V_{U_1} = V_U = 1$, we arrive at

\[
V_{U_n} = \Big(-\sqrt{t} - \frac{1}{\sqrt{t}}\Big)^{n-1} V_{U_1} = (-1)^{n-1}\Big(\sqrt{t} + \frac{1}{\sqrt{t}}\Big)^{n-1}. \qquad \square
\]

Draw the diagram of the linear chain $L_n$ of $n$ rings as in Figure 23, ignoring orientation. Now orient counter-clockwise all rings of $L_n$: this makes all crossings positively-oriented. Skeining on an end crossing, $L_+ = L_n$, $L_- = L_{n-1} \sqcup U$, and $L_0 = L_{n-1}$. Substituting $V_{L_{n-1} \sqcup U} = V_{L_{n-1}}\big(-\frac{1}{\sqrt{t}} - \sqrt{t}\big)$ yields $V_{L_n} = V_{L_{n-1}}\big(-\sqrt{t} - t^2\sqrt{t}\big)$. But we already knew this! Recall that joining the Hopf link $H$ to any link $L$ loops an extra ring around a component of $L$, so that we can view $L_n$ inductively as $L_n = L_{n-1} \# H$, and consequently the Jones polynomials multiply: $V_{L_n} = V_{L_{n-1}} \cdot V_H = V_{L_{n-1}}\big(-\sqrt{t} - t^2\sqrt{t}\big)$, where $L_1 = U$. Thus,

\[
V_{L_n} = \big(-\sqrt{t} - t^2\sqrt{t}\big)^{n-1} V_{L_1} = (-1)^{n-1}\big(t^{1/2} + t^{5/2}\big)^{n-1}. \qquad \square
\]

Finding $V_{n_1}$ for the knots $n_1$ from page 59 requires a more intricate inductive reasoning with skein triplets. To completely understand this solution will require familiarity with recursive sequences.


Let $T_n$ denote the link with 2 components that twist $n$ times around each other. Consider first the case for $n$ odd. Skeining on any crossing, check that $L_+ = n_1$, $L_- = (n-2)_1$, and $L_0 = T_{n-1}$. In turn, skeining on $T_{n-1}$ yields $L_+ = T_{n-1}$, $L_- = T_{n-3}$, and $L_0 = (n-2)_1$. For simplicity, write $a_n = V_{n_1}$ and $b_n = V_{T_n}$. Therefore,

\[
\frac{1}{t}a_n - ta_{n-2} = (t^{1/2} - t^{-1/2})\,b_{n-1}; \qquad
\frac{1}{t}b_{n-1} - tb_{n-3} = (t^{1/2} - t^{-1/2})\,a_{n-2}.
\]

Solve for $b_{n-1}$ from the first equation and then shift down the indices in the result to obtain an expression for $b_{n-3}$ too. Substitute these findings into the second equation to eliminate all $b_k$’s and derive a “symmetric” recursive relation involving the $a_k$’s alone:

\[
a_n - (t^3 + t)a_{n-2} + t^4 a_{n-4} = 0 \;\Rightarrow\; a_n - t^3 a_{n-2} = t\,(a_{n-2} - t^3 a_{n-4}).
\]

The last representation rolls down to the lowest possible index $n = 5$:

\[
a_n - t^3 a_{n-2} = t^{(n-3)/2}(a_3 - t^3 a_1) \;\Rightarrow\; a_n = t^3 a_{n-2} + t^{(n-3)/2}(t - t^4),
\]

where $a_3 = V_T = t + t^3 - t^4$ and $a_1 = V_U = 1$. Rolling down the last equation to the lowest possible index $n = 3$ results in a direct formula for the $a_n$’s:

\[
(1) \qquad V_{n_1} = a_n = t^{\frac{n-1}{2}}\Big[\,t^{n-1} + (1 + t^2 + t^4 + \cdots + t^{n-3})(1 - t^3)\Big].
\]

Using a geometric series, we can rewrite (1) in a closed form as

\[
(2) \qquad V_{n_1} = a_n = \frac{t^{n+2} - t^{n+1} - t^3 + 1}{1 - t^2}\; t^{\frac{n-1}{2}}.
\]
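For readers who like to experiment, the recursion and the direct formula (1) can be compared by machine. The following is a minimal sketch under stated assumptions: polynomials in $t$ are stored as dictionaries mapping exponents to coefficients, and the function names (`poly_add`, `poly_shift`, `jones_by_recursion`, `jones_by_formula`) are hypothetical helpers introduced only for this illustration. It covers the case of odd $n$ only.

```python
def poly_add(p, q):
    """Add two polynomials stored as {exponent: coefficient} dictionaries."""
    result = dict(p)
    for exponent, coeff in q.items():
        result[exponent] = result.get(exponent, 0) + coeff
        if result[exponent] == 0:
            del result[exponent]  # drop cancelled terms so equal polynomials compare equal
    return result


def poly_shift(p, k):
    """Multiply a polynomial by t**k."""
    return {exponent + k: coeff for exponent, coeff in p.items()}


def jones_by_recursion(n):
    """a_n for odd n via a_n = t^3 a_{n-2} + t^{(n-3)/2} (t - t^4), with a_1 = 1."""
    a = {0: 1}  # a_1 = V_U = 1
    for m in range(3, n + 1, 2):
        a = poly_add(poly_shift(a, 3), poly_shift({1: 1, 4: -1}, (m - 3) // 2))
    return a


def jones_by_formula(n):
    """Formula (1): t^{(n-1)/2} [ t^{n-1} + (1 + t^2 + ... + t^{n-3})(1 - t^3) ]."""
    bracket = {n - 1: 1}
    for k in range(0, n - 2, 2):  # k = 0, 2, ..., n - 3
        bracket = poly_add(bracket, {k: 1, k + 3: -1})  # t^k (1 - t^3)
    return poly_shift(bracket, (n - 1) // 2)


for n in (3, 5, 7):
    print(n, sorted(jones_by_recursion(n).items(), reverse=True))
```

Each printed line lists the pairs (exponent, coefficient) of $V_{n_1}$, highest power first, so the output can be compared against hand calculations.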

The compact formula (2) is cumbersome to work with, as it requires long division. Since $n$ is odd, it is evident from the direct formula (1) that $V_{n_1}$ is an ordinary polynomial with positive integer powers of $t$ and coefficients $\pm 1$. For example, one can check that $V_{5_1} = -t^7 + t^6 - t^5 + t^4 + t^2$. ♦

We leave the case of $n_1$ with $n$ even to the reader. Note that the bottom crossing in all such knots is special: changing it unravels the whole $n_1$ into the unknot $U$. The final answer is: $V_{n_1} = (t^3 + t - t^{5-n} + t^{2-n})/(t + 1)$. ♦

Problem 8. $4_1$ is amphichiral: it takes 8 Reidemeister moves to show it. ♦

Exercise 33. Let $P_1 \xrightarrow{RI} P_2$ be a Reidemeister move, where $P_1$ and $P_2$ are the parts of the diagrams $D_1$ and $D_2$ affected by the move (as on p. 53). It suffices to show that $\overline{P_1} \xrightarrow{RI} \overline{P_2}$ for the mirror images $\overline{P_1}$ and $\overline{P_2}$ of $P_1$ and $P_2$. ♦

Exercise 34. For a knot $K$ and its mirror image $\overline{K}$, $V_{\overline{K}}(t) = V_K(t^{-1})$ (cf. Theorem 2(a)). Indeed, if links $L_+$, $L_-$, and $L_0$ satisfy the skein relation, then $\overline{L_+}$, $\overline{L_-}$, and $\overline{L_0}$ also satisfy the skein relation, but with $\overline{L_+}$ and $\overline{L_-}$ playing opposite roles. Thus

\[
(3) \qquad \frac{1}{t}V_{L_+}(t) - tV_{L_-}(t) = \Big(\sqrt{t} - \frac{1}{\sqrt{t}}\Big)V_{L_0}(t)
\]
\[
(4) \qquad \Rightarrow\; \frac{1}{t}V_{\overline{L_-}}(t) - tV_{\overline{L_+}}(t) = \Big(\sqrt{t} - \frac{1}{\sqrt{t}}\Big)V_{\overline{L_0}}(t)
\]


Switching $t \to t^{-1}$ in (4) and multiplying by $-1$ yields the mirror image of (3):

\[
(5) \qquad \frac{1}{t}V_{\overline{L_+}}(t^{-1}) - tV_{\overline{L_-}}(t^{-1}) = \Big(\sqrt{t} - \frac{1}{\sqrt{t}}\Big)V_{\overline{L_0}}(t^{-1}).
\]

Thus, if you know that the desired statement is true for the Jones polynomials of two of $L_+$, $L_-$, and $L_0$, you can deduce the statement for the third mirror image too. For instance, if you know that $V_{\overline{L_+}}(t^{-1}) = V_{L_+}(t)$ and $V_{\overline{L_-}}(t^{-1}) = V_{L_-}(t)$, then the LHS’s of (3) and (5) are identical, forcing their RHS’s to be identical too, i.e., $V_{\overline{L_0}}(t^{-1}) = V_{L_0}(t)$. □

Exercise 35. Orient the Celtic knot $C$ as in Figure 21a. From here, the Jones polynomial is $V_C = -t^{-9/2} - t^{-5/2} + t^{-3/2} - t^{-1/2}$. ♦

[Figure 21. Celtic knot $C$ in the skein relation: $L_- = C$, $L_+ = H^-$, $L_0 = U$.]

As predicted by Theorem 2(c), $V_C$, $V_{U_2}$, $V_H$, and $V_W$ contain $\sqrt{t}$, while $V_T$, $V_{4_1}$, $V_B$, and $V_S$ have only integer powers of $t$ (why?).

[Figure 22. Russian “wedding” knot $R$ in the skein relation: $L_+ = R$, $L_- = L_3$, $L_0 = U_2$.]

Orienting the Russian “wedding” ring $R$ as in Figure 22a, and skeining on its bottom crossing yields $\frac{1}{t}V_R - tV_{L_3} = \big(\sqrt{t} - \frac{1}{\sqrt{t}}\big)V_{U_2}$. Skeining on $L_3$ (cf. Fig. 23) yields $\frac{1}{t}V_{H \sqcup U} - tV_{L_3} = \big(\sqrt{t} - \frac{1}{\sqrt{t}}\big)V_H$. Eliminating the common term $tV_{L_3}$ from the equations and applying our formula $V_{H \sqcup U} = -(\sqrt{t} + 1/\sqrt{t})V_H$ results in $V_R = t^4 + t^2 + 2$. There is no surprise that all powers are integer here (why?). But is there an explanation for why the exponents are all non-negative, i.e., that $V_R$ is an ordinary polynomial? ♦

[Figure 23. Oriented 3-ring linear chain in the skein relation: $L_- = L_3$, $L_+ = H \sqcup U$, $L_0 = H$.]

Session 4

Multiplicative Functions. Part I
The Infinite-Raffle Challenge
Zvezdelina Stankova

Sneak Preview. To enter Multiplicative Land, we’ll have to get tickets from an infinite raffle. While walking through villages of relatively prime numbers and fields of perfect squares, while examining prime decompositions of castles and crossing geometric series rivers, we will be constantly searching for ways to win this raffle game. To this end, we will make friends with the two-faced duke, the function ε, and the princes of divisors, τ and σ; we will meet their sum-function relatives $S_ε$, $S_τ$, and $S_σ$, and realize just how contagious multiplicativity is! Invoking the strength of induction, we will eventually emerge victorious with a winning raffle ticket, only to discover that even deeper challenges await us in this Multiplicative Land of Dirichlet, Möbius, and Euler, in Part II.

A beginner with some basic knowledge from Number Theory I will be well-equipped to follow our journey. The advanced reader can study the summarizing Figure 1, hop quickly to the olympiad-hurdle Problem 8, and upon clearing it, plunge directly into Part II, the intermediate-level continuation.

1. Infinite Raffle: the Initial Setup

Suppose we buy several tickets from an infinite raffle, that is, a lottery with infinitely many tickets. Each ticket has some natural number written on it. We have a favorite number in mind, say, 2009, and we would like to get a ticket with that number on it. But will there necessarily be a ticket with 2009 on it? Of course, it depends on which particular numbers are written and how they are distributed among the raffle tickets. Here is one interesting way of doing just that.

Problem 1. (∞-Raffle) There are infinitely many tickets, each with one natural number on it. For any $n \in \mathbb{N}$ the number of tickets on which divisors of $n$ are written is exactly $n$. For example, the divisors of 6, {1, 2, 3, 6}, are written in some variation on 6 tickets, and no other ticket has these numbers written on it. Prove that any $n \in \mathbb{N}$ is written on at least one ticket.


According to Problem 1, our number¹ 2009 will indeed appear on some ticket. But why is that so and how can we prove it?

1.1. Initial exploration. Let’s mess a bit with some initial data to get a feeling for ∞-Raffle. Try to solve the first cases for n = 1, 2, 3, 4 on your own before reading the ensuing discussion below.

• The easiest number to be tackled is obviously n = 1: it has to be written on exactly 1 ticket since {1} constitutes all divisors of 1.
• The next number is n = 2: its divisors {1, 2} must be written on a total of 2 tickets; we just found out that 1 is written on exactly 1 ticket, so that 2 has no choice but to appear on the remaining 1 ticket and on no more tickets.
• We apply the same analysis for n = 3: its divisors {1, 3} must be written on a total of 3 tickets; as 1 is already known to occupy exactly 1 ticket, 3 must appear on exactly 2 tickets.
• For n = 4 the situation is marginally more exciting: the divisors {1, 2, 4} must be written on a total of 4 tickets; knowing that each of 1 and 2 is written on a unique ticket, 4 must appear on the remaining 2 tickets.

The reader has probably gathered by now that,

PST 23. In order to solve ∞-Raffle, i.e., to prove that every number appears on at least 1 ticket, we must do something more: we need to introduce a “stronger” object, a function R(n) that counts the exact number of tickets on which n appears.

The function R (for “Raffle”) suggested by PST 23 is the main player in the solution to the ∞-Raffle Problem. We already know its first few values: R(1) = 1, R(2) = 1, R(3) = 2, and R(4) = 2. To find out on how many tickets 5 is written, we just calculate R(5): the divisors of 5 are {1, 5}, written on a total of 5 tickets, so that

R(1) + R(5) = 5 ⇒ 1 + R(5) = 5 ⇒ R(5) = 4.

Similarly, the divisors {1, 2, 3, 6} of 6 produce an equation for the total number 6 of tickets on which they appear:

R(1) + R(2) + R(3) + R(6) = 6 ⇒ 1 + 1 + 2 + R(6) = 6 ⇒ R(6) = 2.

Exercise 1. Continue with the above calculations up to n = 10 to find out that R(7) = 6, R(8) = 4, R(9) = 6, and R(10) = 4. The impatient reader should keep on calculating R(n) until at least n = 20 to see if a pattern for the function R pops up.

1 From now on, “numbers” and “divisors” will refer to natural numbers and divisors, until we lift this restriction in Part II.
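The hand calculations above follow one mechanical rule, so they are easy to automate. Here is a small sketch of that rule and nothing more; the function name `raffle_counts` is an assumption made for this illustration. It uses only the condition of Problem 1: the values R(d) over all divisors d of n add up to n.

```python
def raffle_counts(limit):
    """Return a list R with R[n] = number of tickets showing n, for 1 <= n <= limit.

    Only the rule of Problem 1 is used: the values R(d), taken over all
    divisors d of n, must add up to n.
    """
    R = [0] * (limit + 1)  # index 0 is unused
    for n in range(1, limit + 1):
        # Tickets already taken by the proper divisors of n.
        taken = sum(R[d] for d in range(1, n) if n % d == 0)
        R[n] = n - taken
    return R


R = raffle_counts(20)
for n in range(1, 21):
    print(n, R[n])
```

Running it up to n = 20, as Exercise 1 suggests, is a convenient way to hunt for a pattern before reading on.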


1.2. Brute force bows to general abstract theory. Using the above method, it should be clear that one can determine R(n) for any n, as long as all R(k) for smaller k are already calculated. Although this gives one way of proving that our favorite number 2009 will appear on some ticket (just grind out all numbers R(1), R(2), . . . , R(2009)), these close-to-insane calculations are definitely not the way intended by the authors of Problem 1: for one, calculating R(2009) alone will not prove that every number is written at least once on the tickets. In light of our new function R(n), the ∞-Raffle Problem can be paraphrased to say:

Problem 1′. (∞-Raffle) Show that R(n) ≥ 1 for every $n \in \mathbb{N}$.

This is far from a simple task. Interestingly, trying to prove just the inequality (≥ 1) is much harder than trying to find the exact values R(n) and compare them to 1. It is also true that, in order to conquer our problem, we will require much more sophisticated methods than brute-force calculations. So, here is the plan: for the remainder of the session, we will



PST 24. Step back and look at the ∞-Raffle Problem from different angles, discover and formalize properties of R(n) along with a bunch of its sibling functions, develop a new theory of multiplicative functions to explain all of the arising phenomena, and ultimately produce an exact formula for R(n).

At every stage of creating this new theory, we will reconsider the ∞-Raffle Problem, check how it relates to our new discoveries, and describe the progress we have made on it up to that moment.

[Figure 1. ∞-Raffle within the larger picture. Concepts shown: multiplicative functions M, sum-functions $S_f$, ∞-Raffle, Dirichlet product, M’s group structure, Dirichlet series, Riemann zeta-function ζ, Möbius function μ, Euler function φ, Möbius inversion.]

In mathematics, the overarching PST 24 is referred to as “abstracting properties” and is used to create new theories that encompass broad collections of objects with common properties. A prime example of this approach is the very topic of abstract algebra, a mandatory upper-division course for every math major. The word “abstract” should not, however, deceive you: even abstract theories can have abundant applications in practice. For instance, the Rubik’s Cube sessions in this book series present a particular application of abstract algebra to a concrete problem. Ditto, producing an


exact formula for the ∞-Raffle function R(n) at the end of this session will come as a direct consequence of the abstract theory of multiplicative functions. The real beauty of the abstraction approach of PST 24 is that, in the context of these two multiplicative sessions, it will • lead us to a new and deeper understanding of numbers, functions, and relations between them, and • empower us to conquer numerous other difficult problems that we could not have solved before. Figure 1 illustrates the richness of the land M of multiplicative functions. The - area summarizes the current session, while the six-concept area marked by  will be developed in the intermediate-level Part II. Both sessions contain (different) solutions to the ∞-Raffle puzzle. Part II will venture into more advanced areas such as M’s group structure and the Dirichlet series, and touch upon the famous Riemann zeta-function ζ(s). An historical overview at the end will link six great mathematicians who have contributed to the topic of multiplicative functions and its various extensions.

2. What are Multiplicative Functions?

2.1. Basic definitions. The first thing to notice about the ∞-Raffle function R(n) is that it essentially differs from commonly used functions such as $g(x) = x^2$: the variable x in g(x) is a real number ($x \in \mathbb{R}$); in contrast, the variable n in R(n) is just a natural number ($n \in \mathbb{N}$). Thus, R has the restricted domain of $\mathbb{N}$. Such functions have a special name:

Definition 1. A function $f : \mathbb{N} \to \mathbb{C}$ is called arithmetic.

Here $\mathbb{C}$ is the set of complex numbers. If you don’t feel comfortable with $\mathbb{C}$, for now you can safely replace it with the set of integers $\mathbb{Z}$. For instance, R(n) is arithmetic because $R : \mathbb{N} \to \mathbb{Z}$. The important thing to remember about arithmetic functions is that their inputs can only be natural numbers.

Let A denote the set of all arithmetic functions. This is a rather large set involving all sorts of functions. In these sessions we will concentrate on a special subset M of A comprised of all multiplicative functions. Why M is so special can be explained by the fact that it is usually easier to calculate explicit formulas for multiplicative functions and not so easy for arbitrary arithmetic functions.²

Definition 2. An arithmetic function $f : \mathbb{N} \to \mathbb{C}$ is multiplicative if for any relatively prime $m, n \in \mathbb{N}$:

\[
(1) \qquad f(mn) = f(m)f(n).
\]

2 For the advanced reader, M is special on a deeper level partly because it is closed under the Dirichlet product in A, as we will discover in the Part II continuation.


Recall that m and n are relatively prime if they share no common divisor other than 1. For example, 9 and 20 are relatively prime, but 9 and 6 are not. Thus, any multiplicative function must satisfy f(180) = f(9)f(20) and f(1) = f(1)f(1) but not necessarily f(54) = f(9)f(6) (why?). While it is obvious that the name “multiplicative” is inspired by equation (1), it is not immediately clear why relative primeness should be involved at all.

2.2. Trivial examples. Looking at Definition 2, our first impulse is to construct very simple examples of multiplicative functions f that always satisfy (1), regardless of whether m and n are relatively prime or not. One such obvious example is f(n) = n for all n (check it!). It is called the identity function since it returns the same output as the input n; thus, id(n) = n for all $n \in \mathbb{N}$. Another such trivial example is $f(n) = n^2$, as $f(mn) = (mn)^2 = m^2 n^2 = f(m)f(n)$ for all $m, n \in \mathbb{N}$. For that matter, any power function $f(n) = n^a$ (for a fixed $a \in \mathbb{R}$) is also multiplicative (why?). Such functions will be called strongly³ multiplicative for the obvious reason that they satisfy (1) for all pairs of numbers (m, n).

Definition 3. An arithmetic function $f : \mathbb{N} \to \mathbb{C}$ is strongly multiplicative if f(mn) = f(m)f(n) for any $m, n \in \mathbb{N}$.

Exercise 2. How about constant functions: are any of them multiplicative?

Solution: If f(n) = c is a constant multiplicative function, then f(1) = f(1 · 1) = f(1)f(1), so that $c = c^2$ ⇒ c(c − 1) = 0 ⇒ c = 0 or 1. It is easy to check that the constant functions f(n) = 1 and f(n) = 0 satisfy Definition 3, making them strongly multiplicative. □

The constant function 1 is so important in our upcoming analysis that we give it the special name ι; thus,

ι(n) = 1 for all $n \in \mathbb{N}$.

As a bonus, we learn from the above solution that for any multiplicative function f we must have f(1) = 1 or f(1) = 0 (as these are the only numbers making the equation f(1) = f(1) · f(1) work). Further, if f(1) = 0 then f(n) = f(n · 1) = f(n) · f(1) = 0 for all $n \in \mathbb{N}$, and the whole function is 0, denoted by f = O. From this we see that all interesting cases of multiplicative functions have f(1) = 1.

3 In literature, also referred to as totally or completely multiplicative.


2.3. Semi-trivial examples. If we want to merge the two constant functions ι and O into a single “hybrid” multiplicative function, we could define

\[
ε(n) = \begin{cases} 1 & \text{if } n = 1;\\ 0 & \text{if } n \ge 2. \end{cases}
\]

Check a couple of easy cases in order to

Exercise 3. Verify that ε(n) is a strongly multiplicative function.

But why would anyone want to consider such a two-value multiplicative function? This will become transparent later in Part II where we define Dirichlet product on the set of arithmetic functions A. While we are still on the topic of strong multiplicativity, try the following preparatory problem:

Exercise 4. Describe all strongly multiplicative functions. Among them, are there any functions (other than ε(n)) that attain only two distinct values?

Even though the definition of strong multiplicativity does not involve (at least on the surface) relative primeness, the solution to Exercise 4 heavily depends on the notion of the prime decomposition for any $n \in \mathbb{N}$:

\[
(2) \qquad n = p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r},
\]

where $p_1, p_2, \ldots, p_r$ are the distinct prime divisors of n and $a_1, a_2, \ldots, a_r$ are the corresponding positive exponents.

Partial Solution to Exercise 4: Let f be strongly multiplicative. Definition 3 then allows us to split f(n) along any divisors of n. For example, for a prime power $p^a$ we can split as follows:

\[
f(p^a) = f(\underbrace{p \cdot p \cdots p}_{a}) = \underbrace{f(p) \cdot f(p) \cdots f(p)}_{a} = (f(p))^a.
\]

More generally, we can split f(n) along the prime decomposition of n:

\[
\Rightarrow\; f(n) = (f(p_1))^{a_1} (f(p_2))^{a_2} \cdots (f(p_r))^{a_r}.
\]

Thus, to completely know f we need to know only the values f(p) for any prime p. These values can be arbitrarily assigned, as long as f(1) = 0 or 1 (why?). Of course, if we set f(1) = 0, we end up with the constant function O.

We summarize: any strongly multiplicative function f is either the 0-function O, or it is constructed in the following way. For any prime number $p_i$ we arbitrarily choose a (complex) number $b_i$, set $f(p_i) = b_i$, and expand along the prime decomposition of n; that is, we define

\[
f(n) = f(p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r}) := b_1^{a_1} b_2^{a_2} \cdots b_r^{a_r}.
\]

For instance, if we set $f(p_i) = p_i^2$ for all primes $p_i$, we get back the square function $f(n) = p_1^{2a_1} p_2^{2a_2} \cdots p_r^{2a_r} = n^2$. If we set all $f(p_i) = 1$, we get back the constant function ι(n) = 1. And finally, if we set all $f(p_i) = 0$ but insist on f(1) = 1, we get back the hybrid function ε(n). Needless to add, all of these functions are strongly multiplicative, as observed earlier.
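The recipe in this summary translates almost word for word into code. The sketch below is one possible rendering, restricted to integer values at the primes for simplicity; the names `prime_factorization` and `strongly_multiplicative` are hypothetical helpers, not notation from the text. A function is specified by its values at primes and then extended along the prime decomposition, with f(1) = 1 coming from the empty product.

```python
def prime_factorization(n):
    """Return {prime: exponent} for a natural number n (an empty dict for n = 1)."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:  # whatever is left over is a single prime factor
        factors[n] = factors.get(n, 0) + 1
    return factors


def strongly_multiplicative(value_at_prime):
    """Build f with f(p) = value_at_prime(p), extended along prime decompositions."""
    def f(n):
        result = 1  # empty product, so f(1) = 1
        for p, a in prime_factorization(n).items():
            result *= value_at_prime(p) ** a
        return result
    return f


square = strongly_multiplicative(lambda p: p * p)  # f(p) = p^2 gives n^2
iota = strongly_multiplicative(lambda p: 1)        # f(p) = 1 gives the constant 1
epsilon = strongly_multiplicative(lambda p: 0)     # f(p) = 0 gives the hybrid function

print([square(n) for n in range(1, 11)])
print([iota(n) for n in range(1, 11)])
print([epsilon(n) for n in range(1, 11)])
```

Any other assignment of values at the primes, say p ↦ p + 1, yields yet another strongly multiplicative function in the same way.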


2.4. “Truly” multiplicative examples. As promised in the introduction, we pause here to reassess the situation with the ∞-Raffle function R(n).

Question 1. Is R(n) strongly multiplicative?

Answer: Recall the initial data we found, R(1) = R(2) = 1, R(3) = R(4) = R(6) = 2, and R(5) = 4. For strong multiplicativity, we need to have R(4) = R(2) · R(2), but this is false as 2 ≠ 1 · 1. We have finally come across a function which is not strongly multiplicative! □

Question 2. Is R(n) at least multiplicative?

From the above values of R(n) it looks like R(n) still has a shot at being multiplicative. For instance, since 2 and 3 are relatively prime, we must have R(6) = R(2) · R(3), which happens to be true: 2 = 1 · 2. In fact, a main goal of Part I and II will be to prove that R(n) is multiplicative. Alas, we are not equipped to do that yet, so be patient until we develop our theory up to the necessary level.

As we leave R(n) in peace for now, can we think of other examples of “truly” multiplicative functions, that is, functions that satisfy equation (1) for relatively prime pairs (m, n) yet not for all pairs (m, n)? Such examples are hard to come up with unless you have already seen some before.

Problem 2. Let $n \in \mathbb{N}$. Define functions $τ, σ, π : \mathbb{N} \to \mathbb{N}$ as follows:
(a) τ(n) = the number of all divisors of n;
(b) σ(n) = the sum of all divisors of n;
(c) π(n) = the product of all divisors of n.
Prove that τ and σ are multiplicative functions, while π is not.

To make sure we are on the same page, here are the values of the three functions for n = 6: τ(6) = 4, σ(6) = 1 + 2 + 3 + 6 = 12, and π(6) = 1 · 2 · 3 · 6 = 36. I can think of at least two different ways of doing each part of Problem 2. Hence, you should first try really hard on your own before peeking at the solution below.

In view of what we want to prove (or disprove), what is the first nontrivial case to confirm (or deny) that a function is multiplicative? The smallest non-trivial relatively prime numbers are 2 and 3. Therefore, it makes sense to check if f(2)f(3) = f(6) for each of our functions:

• τ(2)τ(3) = 2 · 2 = 4 = τ(6): checks out!
• σ(2)σ(3) = (1 + 2)(1 + 3) = 12 = σ(6): checks out!
• π(2)π(3) = 2 · 3 = 6 ≠ 36 = π(6): aha! π fails multiplicativity at the very first possible non-trivial instance!

Part (c) is done. □

Before we devise a formal solution to parts (a)–(b), let’s do something slightly “illegal”: let’s


 PST 25. “Prove” that τ and σ are multiplicative via a specific example. The chosen example must be representative enough to illustrate the involved ideas and PSTs so as to allow us later to generalize our solution to all cases.

Some initial trials lead us to choose the simple (but general enough) case of relatively prime m = 5 and n = 6. The divisors of 5, 6, and 5 · 6 = 30 are {1, 5}, {1, 2, 3, 6}, and {1, 2, 3, 6, 5, 10, 15, 30}, respectively. The key question is: how can we obtain the divisors of 30 by using only the divisors of 5 and 6? After staring at the data for a while, the reader is likely to notice that the divisors of 30 are all the pairwise products of the divisors of 5 and 6:

(3)   {1, 2, 3, 6, 5, 10, 15, 30} = {1·1, 1·2, 1·3, 1·6, 5·1, 5·2, 5·3, 5·6}.

For starters, this means that the desired multiplicative relation among the number of divisors of 5, 6, and 30 is satisfied: τ(30) = τ(5)τ(6) (8 = 2·4). Moreover, we can calculate the divisor-sum σ(30) in two ways, using the usual distributivity property:

\[
\begin{aligned}
σ(5)σ(6) &= (1 + 5)(1 + 2 + 3 + 6)\\
&\overset{\text{distr.}}{=} 1·1 + 1·2 + 1·3 + 1·6 + 5·1 + 5·2 + 5·3 + 5·6\\
&= 1 + 2 + 3 + 6 + 5 + 10 + 15 + 30 = σ(30).
\end{aligned}
\]

There is no obstruction to generalizing the above “proof-by-example” to all cases, as long as the idea of pairwise products in (3) holds for any relatively prime m and n. This is a well-known fact from number theory:

Lemma 1. For any numbers m and n, the divisors of mn are all pairwise products of divisors of m and n. If m and n are relatively prime, then all such products are distinct. In particular, the number of divisors of mn is the product of the numbers of divisors of m and of n: τ(mn) = τ(m)τ(n).

We leave the reader to come up with a rigorous proof of Lemma 1 (cf. Hints section). Note that Lemma 1 shows de facto that τ is multiplicative. ♦
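Lemma 1 is easy to probe experimentally before attempting a proof. The sketch below is only a numerical sanity check, with `divisors` and `pairwise_products` as illustrative helper names chosen for this example. It lists all pairwise products of divisors, keeping repeats, so that both claims of the lemma can be observed: the products always cover the divisors of mn, and they are distinct when m and n are relatively prime.

```python
from math import gcd


def divisors(n):
    """All natural divisors of n, in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def pairwise_products(m, n):
    """All products c * d with c | m and d | n, as a list so that repeats stay visible."""
    return [c * d for c in divisors(m) for d in divisors(n)]


# The relatively prime example from the text: m = 5, n = 6.
print(sorted(pairwise_products(5, 6)), divisors(30))

# A pair that is not relatively prime: the same products appear more than once.
print(sorted(pairwise_products(2, 2)), divisors(4))
```

In the second example the product 2 occurs twice (as 1·2 and 2·1), which is why the count τ(mn) = τ(m)τ(n) can fail without relative primeness.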

It remains only to prove that σ is multiplicative, which we do by formalizing the earlier calculations for σ(30). For the remainder of Part I and in Part II, unless otherwise stated, let $\{c_1, c_2, \ldots, c_s\}$ and $\{d_1, d_2, \ldots, d_r\}$ denote the divisors of m and of n, respectively. Thus, τ(m) = s, τ(n) = r, $σ(m) = c_1 + c_2 + \cdots + c_s$, and $σ(n) = d_1 + d_2 + \cdots + d_r$.

Solution to Problem 2(b): For relatively prime m and n, Lemma 1 has established that the divisors of mn are the pairwise products $\{c_i d_j\}$ (where 1 ≤ i ≤ s and 1 ≤ j ≤ r) and that these products are all distinct. Then by definition, σ(mn) is the sum of all products $c_i d_j$:

\[
\begin{aligned}
(4) \qquad σ(mn) &= c_1 d_1 + c_1 d_2 + \cdots + c_i d_j + \cdots + c_s d_r\\
(5) \qquad &\overset{\text{distr.}}{=} (c_1 + c_2 + \cdots + c_s)(d_1 + d_2 + \cdots + d_r) = σ(m)σ(n).
\end{aligned}
\]

Therefore, σ is multiplicative. □
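A finite computer check is of course no substitute for the proofs of Problem 2, but it is a quick way to gain confidence in the statements. The sketch below is written under that caveat; the names `tau`, `sigma`, `pi_div`, and `is_multiplicative_up_to` are chosen for this illustration only. It tests equation (1) on all relatively prime pairs up to a bound.

```python
from math import gcd, prod


def divisors(n):
    """All natural divisors of n, in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def tau(n):
    """Number of divisors of n."""
    return len(divisors(n))


def sigma(n):
    """Sum of the divisors of n."""
    return sum(divisors(n))


def pi_div(n):
    """Product of the divisors of n (the function called pi in the text)."""
    return prod(divisors(n))


def is_multiplicative_up_to(f, bound):
    """Check f(mn) = f(m) f(n) for all relatively prime m, n <= bound."""
    return all(
        f(m * n) == f(m) * f(n)
        for m in range(1, bound + 1)
        for n in range(1, bound + 1)
        if gcd(m, n) == 1
    )


for name, f in (("tau", tau), ("sigma", sigma), ("pi", pi_div)):
    print(name, is_multiplicative_up_to(f, 30))
```

The bound 30 is arbitrary; a passing check only says that no counterexample exists in that range.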




It is worth verifying that neither τ nor σ is strongly multiplicative, as they miserably fail the very first non-trivial case: τ(4) = 3 ≠ 2 · 2 = τ(2)τ(2) and σ(4) = 7 ≠ 3 · 3 = σ(2)σ(2). If you fast-forward to Figure 2 (p. 95), you will see that τ and σ are correspondingly placed in the middle “ring” of the diagram, meaning that they belong to M but not to S, the set of strongly multiplicative functions.

2.5. A taste of prime power. At this point, multiplicativity may still seem like a nice abstraction of no practical value. How deceiving! Once you know that a function is multiplicative, you can do wonders with it:

PST 26. You can reduce the question of finding a direct formula for a multiplicative function f(n) to finding such a formula only in the case when n is a prime power $p^a$. Namely, you can split f(n) along the prime powers $p_i^{a_i}$ from the prime decomposition of n:

\[
(6) \qquad f(n) = f(p_1^{a_1}) f(p_2^{a_2}) \cdots f(p_r^{a_r}),
\]

and now look for a direct formula just in the prime-power case $f(p^a)$.

Equation (6) follows from the multiplicativity of f and the fact that $p_i^{a_i}$ and $p_j^{a_j}$ are relatively prime for distinct primes $p_i$ and $p_j$. We cannot split f(n) any further since the divisors of a prime power $p^a$, excluding 1, are not relatively prime. Thus, (6) is the finest splitting which applies to all multiplicative functions.

[Margin figure: for f ∈ M, f(10!) splits into $f(2^8)$, $f(3^4)$, $f(5^2)$, and $f(7^1)$.]

Using the prime-power reduction in PST 26,

Exercise 5. Split f(10!) into as fine a product as possible for any f ∈ M.
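To split f(10!) as in Exercise 5 one needs the prime decomposition of 10!. The sketch below does not factor the number 3628800 directly; instead it swaps in Legendre's formula, a standard fact not used in the text, which gives the exponent of a prime p in n! as the sum of ⌊n/p⌋, ⌊n/p²⌋, ⌊n/p³⌋, and so on. The function names are hypothetical.

```python
from math import factorial


def primes_up_to(n):
    """Primes p <= n, found by trial division (fine for small n)."""
    return [p for p in range(2, n + 1) if all(p % q for q in range(2, int(p ** 0.5) + 1))]


def exponent_in_factorial(n, p):
    """Exponent of the prime p in n!, via Legendre's formula."""
    exponent, power = 0, p
    while power <= n:
        exponent += n // power  # multiples of p, p^2, p^3, ... among 1..n
        power *= p
    return exponent


def factorial_decomposition(n):
    """Prime decomposition of n! as a dictionary {prime: exponent}."""
    return {p: exponent_in_factorial(n, p) for p in primes_up_to(n)}


decomposition = factorial_decomposition(10)
print(decomposition)

# Sanity check: multiplying the prime powers back together recovers 10!.
product = 1
for p, a in decomposition.items():
    product *= p ** a
print(product == factorial(10))
```

Each entry p: a of the dictionary corresponds to one factor $f(p^a)$ in the splitting (6) of f(10!).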



Problem 3. Derive the following representations of τ and σ:

\[
\text{(a)}\;\; τ(n) = \prod_{i=1}^{r} (a_i + 1); \qquad \text{(b)}\;\; σ(n) = \prod_{i=1}^{r} \frac{p_i^{a_i+1} - 1}{p_i - 1}.
\]

The notation in this problem calls for a short detour. By now, we have carefully avoided the symbols $\sum$ and $\prod$, but it is high time that we stop beating about the bush and re-introduce them, as they will substantially shorten our presentation and clarify calculations. The notation $\sum_{i=1}^{r}$ simply means “take the sum of all terms indexed by i = 1, 2, . . . , r.” For instance, we could have written the initial definition of σ in two equivalent ways:

\[
σ(n) = \sum_{i=1}^{r} d_i = \sum_{d \mid n} d,
\]

where the notation d|n stands for “d divides n.” The first summation is read as “add all $d_1, d_2, \ldots, d_r$”; while the second summation: “add all d’s for


which d|n,” or in other words, “add all divisors d of n.” Likewise, equations (4)–(5) on σ’s multiplicativity can be succinctly rewritten as:

\[
σ(m)σ(n) = \Big(\sum_{i=1}^{s} c_i\Big)\Big(\sum_{j=1}^{r} d_j\Big) \overset{\text{distr.}}{=} \sum_{i,j} c_i d_j \overset{\text{Lem. 1}}{=} σ(mn).
\]

Notice the double-index “i, j” in the last summation: when bounds for i and j are not explicitly written, it is assumed that i and j run over all possibilities. Using the divisor notation, we can rewrite the above calculation in yet another way that may at first look confusing; but ultimately, it is most advantageous for multiplicative functions:

\[
σ(m)σ(n) = \Big(\sum_{c \mid m} c\Big)\Big(\sum_{d \mid n} d\Big) \overset{\text{distr.}}{=} \sum_{c \mid m,\ d \mid n} cd \overset{\text{Lem. 1}}{=} \sum_{e \mid mn} e = σ(mn).
\]

The notation $\prod_{i=1}^{r}$ is analogous: just take the product of all terms indexed by i. Thus, the function π can be written as $π(n) = \prod_{i=1}^{r} d_i = \prod_{d \mid n} d$. Conversely, the desired formula for τ(n) in Problem 3 can be expanded as $τ(n) = (a_1 + 1)(a_2 + 1) \cdots (a_r + 1)$.

With this said, we can go back to proving our formulas for τ and σ.

Solution to Problem 3: The prime-power reduction of PST 26 teaches us that, to find general formulas for τ and σ, we need only to find formulas for $τ(p^a)$ and $σ(p^a)$. Since the divisors of $p^a$ are $\{1, p^1, p^2, \ldots, p^a\}$, a total of a + 1 divisors, by definition,

\[
(7) \qquad τ(p^a) = a + 1 \quad\text{and}\quad σ(p^a) = 1 + p + p^2 + \cdots + p^a = \frac{p^{a+1} - 1}{p - 1}.
\]

The last formula is known as the sum of a finite geometric series with first term 1, ratio p, and total number of terms a + 1.⁴ At any rate, by PST 26 we can patch the prime-power pieces from (7) into general formulas:

• $τ(n) = τ(p_1^{a_1}) τ(p_2^{a_2}) \cdots τ(p_r^{a_r}) = (a_1 + 1)(a_2 + 1) \cdots (a_r + 1)$;
• $σ(n) = σ(p_1^{a_1}) σ(p_2^{a_2}) \cdots σ(p_r^{a_r}) = \dfrac{p_1^{a_1+1} - 1}{p_1 - 1} \cdot \dfrac{p_2^{a_2+1} - 1}{p_2 - 1} \cdots \dfrac{p_r^{a_r+1} - 1}{p_r - 1}$. □

An excellent illustration of these findings is provided by our number n = 2009. Its prime decomposition $2009 = 7^2 \cdot 41$ instantaneously yields
• τ(2009) = (2 + 1)(1 + 1) = 6, and
• $\sigma(2009) = \dfrac{7^3 - 1}{7 - 1} \cdot \dfrac{41^2 - 1}{41 - 1} = 57 \cdot 42 = 2394.$
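The two values for 2009 can be double-checked mechanically. The following is a minimal Python sketch, not part of the original session; the helper name `divisors` is our own, and trial division is used only because n is small.

```python
def divisors(n):
    """Return all positive divisors of n by trial division (fine for small n)."""
    return [d for d in range(1, n + 1) if n % d == 0]

n = 2009
divs = divisors(n)

# Brute-force values straight from the definitions of tau and sigma.
tau_brute = len(divs)
sigma_brute = sum(divs)

# Values from the closed formulas, using 2009 = 7^2 * 41.
tau_formula = (2 + 1) * (1 + 1)
sigma_formula = ((7**3 - 1) // (7 - 1)) * ((41**2 - 1) // (41 - 1))

print(divs)                        # the six divisors of 2009
print(tau_brute, tau_formula)      # both counts should agree
print(sigma_brute, sigma_formula)  # both sums should agree
```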

⁴ We encountered the geometric series formula in the Stomp session in Volume I. To prove it, multiply both sides of (7) by p − 1 to clear the denominator, expand the resulting product on the LHS, and cancel just about everything in sight, arriving at the RHS:

$$(1 + p + p^2 + \cdots + p^a)(p - 1) = (p + p^2 + p^3 + \cdots + p^a + p^{a+1}) - (1 + p + p^2 + p^3 + \cdots + p^a) = p^{a+1} - 1.$$


If you don’t believe this, calculate by brute force the number and the sum of all divisors of 2009 and compare answers.

2.6. Non-multiplicative example. Earlier, we found that the product π(n) of all divisors of n is not a multiplicative function, and hence, the prime-power reduction formula (6) is powerless here. This certainly does not prevent us from finding a nice compact formula for π(n):

Problem 4. Prove that π is given by $\pi(n) = n^{\frac{1}{2}\tau(n)}$.

Note how deftly this formula links the two functions τ and π.

Hint: The proof departs from the theme of multiplicative functions, so we leave it up to the discretion of the reader. The only hint we slip in here is to study the legendary way Gauss proved (as a child) the summation formula for the arithmetic series

$$1 + 2 + 3 + \cdots + n = \frac{n(n + 1)}{2},$$

and to replace appropriately addition with multiplication. ♦

2.7. Is the ∞-Raffle Problem doable after all? It’s time to pause and think what this all means for the function R(n). If we eventually do manage to prove that R(n) is multiplicative, PST 26 will empower us to find a direct formula for it. For this to work, we will need to

Problem 5. Find a formula for $R(p^a)$ for any prime power $p^a$.

Solution: Let’s check R(n) for the first few powers of p.
• We already know that R(1) = 1.
• The divisors of p are {1, p}. Thus, R(1) + R(p) = p ⇒ R(p) = p − 1.
• The divisors of $p^2$ are $\{1, p, p^2\}$. Thus, $R(1) + R(p) + R(p^2) = p^2$ ⇒ $R(p^2) = p^2 - (p - 1) - 1 = p^2 - p$.
• For $p^3$ we similarly calculate $R(1) + R(p) + R(p^2) + R(p^3) = p^3$ ⇒ $R(p^3) = p^3 - (p^2 - p) - (p - 1) - 1 = p^3 - p^2$.

A pattern surfaces: $R(p^a) = p^a - p^{a-1}$ for a ≥ 1. Having been through the induction session in Volume I, we could immediately attack our conjecture by induction on a, but this will be overkill! Here is a shortcut: For any a ≥ 1, the divisors of $p^a$ are $\{1, p, \ldots, p^{a-1}, p^a\}$. By ∞-Raffle:

$$\underbrace{R(1) + R(p) + \cdots + R(p^{a-1})}_{p^{a-1}} + R(p^a) = p^a.$$

The first a terms correspond to the divisors of $p^{a-1}$, so by ∞-Raffle again, they add up to $p^{a-1}$ (this is emphasized by the underbrace). Solving for the last term $R(p^a)$ poses no difficulty: $R(p^a) = p^a - p^{a-1}$.
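The pattern for prime powers can also be tested numerically. The sketch below is ours, not the session's: it assumes only the ∞-Raffle property used above (the values R(d) over all divisors d of n add up to n), and the function name `raffle` is hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def raffle(n):
    """R(n), defined implicitly by: the sum of R(d) over all divisors d of n is n."""
    # Solve the defining equation for the last term R(n).
    return n - sum(raffle(d) for d in range(1, n) if n % d == 0)

# Compare with the conjectured pattern R(p^a) = p^a - p^(a-1) for a >= 1.
for p in (2, 3, 5, 7):
    for a in range(1, 6):
        assert raffle(p**a) == p**a - p**(a - 1)

print([raffle(2**a) for a in range(6)])
```

A finite check like this is of course no proof; it only confirms that the shortcut argument and the recursion agree on the cases tried.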


We have experienced one of the most fundamental, popular, and effective techniques in solving problems with numbers and sequences:

PST 27. Check the first few cases of a problem to search for a pattern. After you find a pattern, prove it directly, by induction, or another method.

We are now ready to derive a direct formula for the general case R(n):

$$R(n) \overset{\text{mult?}}{=} R(p_1^{a_1})R(p_2^{a_2}) \cdots R(p_r^{a_r}) \tag{8}$$

$$= (p_1^{a_1} - p_1^{a_1-1})(p_2^{a_2} - p_2^{a_2-1}) \cdots (p_r^{a_r} - p_r^{a_r-1}). \tag{9}$$

Since each factor $(p^a - p^{a-1}) \geq 1$ (why?), we conclude that the whole product R(n) ≥ 1. Hence, every number n appears on at least 1 ticket! Are we done? Far from done! We still need to

Problem 1′. (∞-Raffle) Prove that the function R(n) is multiplicative.

We challenge the reader to tackle the multiplicativity of R by brute force. Meanwhile, we will take a deeper, more elegant approach in Section 3 to explain the multiplicativity of R(n). . . without a single specific calculation!

2.8. Warming up to τ, σ, and π. To check your understanding of the concepts and theory so far, do the exercises below. For the most part, they require clever manipulation of the definitions and formulas for τ, σ, and π. After a reasonable amount of effort, you may glance at the solutions we have listed and try to reconstruct them in your own way.



Exercise 6. Find all n ∈ N such that (a) τ(n) = 403; (b) σ(n) = 381; (c) π(n) = 5832; (d) $\pi(n) = 3^{30} 5^{40}$.

Solution (a): The key observation is that 403 is the product of two primes 13 and 31. Correspondingly, if n had three or more distinct prime divisors, i.e., $n = p_1^{a_1} p_2^{a_2} p_3^{a_3} k$ for some k ∈ N, then $\tau(n) = (a_1 + 1)(a_2 + 1)(a_3 + 1)\tau(k) = 13 \cdot 31$. But each factor $a_i + 1 \geq 2$ and, therefore, it yields a prime divisor of τ(n); yet, 13 · 31 has only two prime divisors, a contradiction. We conclude that n has at most two distinct prime divisors; i.e., $n = p_1^{a_1} p_2^{a_2}$ or $n = p^a$. The formula for τ then yields $\tau(n) = (a_1 + 1)(a_2 + 1) = 13 \cdot 31$ or $\tau(n) = a + 1 = 13 \cdot 31$, from which $a_1 = 12$, $a_2 = 30$, and $a = 13 \cdot 31 - 1$. The answer is $n = p_1^{12} p_2^{30}$ or $n = p^{13 \cdot 31 - 1}$ for primes $p_1$, $p_2$, and p, with $p_1 \neq p_2$.

At the heart of this solution stands a powerful idea:

PST 28. Via properties of prime decompositions (e.g., 13 · 31), bound the number of distinct prime divisors of n (e.g., n has at most two prime divisors), and investigate each case within your newly-found bound.


We leave the reader to figure out part (b) with somewhat similar techniques, and we move to the different part (c).

Solution (c): If $\pi(n) = 2^3 3^6$ (= 5832), the definition of π(n) implies that n has exactly two prime divisors: $p_1 = 2$ and $p_2 = 3$ (why?), from which $n = 2^a 3^b$. From Problem 4 for π, we have $\pi(2^a 3^b) = (2^a 3^b)^{\frac{1}{2}\tau(n)}$, so that

$$(2^a 3^b)^{(a+1)(b+1)/2} = 2^{a(a+1)(b+1)/2}\, 3^{(a+1)b(b+1)/2} = 2^3 3^6.$$

Equating the exponents of the involved prime powers of 2 and 3, we arrive at a system of two equations:

$$a(a + 1)(b + 1) = 6 \quad\text{and}\quad (a + 1)b(b + 1) = 12.$$

There are many ways to continue from here. A slick way is to divide the two equations, resulting in a/b = 1/2, i.e., b = 2a. Substituting in the first equation yields a(a + 1)(2a + 1) = 6, which (by trial and error) has only one natural root a = 1 (why?), and hence b = 2. The final answer is then $n = 2 \cdot 3^2 = 18$. Checking: $\pi(18) = 1 \cdot 2 \cdot 3 \cdot 6 \cdot 9 \cdot 18 = 2^3 3^6$.

For the next exercise, recall the notation gcd(m, n), which stands for the greatest common divisor of m and n. Recall also that for each prime p, the gcd picks up the smaller of the two prime powers $p^a$ in m and $p^b$ in n. PST 28 applies with full force here too.
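The rule that the gcd picks up the smaller of the two prime powers can be illustrated with a short Python sketch. The helper names `factorize` and `gcd_from_exponents` are our own, and the sample pair (72, 108) is chosen so as not to give away the exercise that follows.

```python
from math import gcd

def factorize(n):
    """Prime decomposition of n as a dict {prime: exponent}, by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:  # whatever is left is a prime factor
        factors[n] = factors.get(n, 0) + 1
    return factors

def gcd_from_exponents(m, n):
    """gcd(m, n) built prime by prime from the smaller of the two exponents."""
    fm, fn = factorize(m), factorize(n)
    result = 1
    for p in fm.keys() & fn.keys():  # only primes dividing both m and n
        result *= p ** min(fm[p], fn[p])
    return result

# 72 = 2^3 * 3^2 and 108 = 2^2 * 3^3, so the gcd should be 2^2 * 3^2.
print(factorize(72), factorize(108))
print(gcd_from_exponents(72, 108), gcd(72, 108))
```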



Exercise 7. Find all m and n such that gcd(m, n) = 18, τ (m) = 21, and τ (n) = 10.

2.9. “Fields” of perfect squares. Our last problem in this section is both theoretically important and interesting on its own. It’s centered on the concept of a perfect square, that is, the square of an integer; for instance, 36 is a perfect square, but 24 is not.

Problem 6. Show that τ(n) is odd iff n is a perfect square, and that σ(n) is odd iff n is a perfect square or twice a perfect square.

For some reason, every time I give a session on multiplicative functions, someone from the audience expresses a doubt regarding these equivalences. To dispel all such doubts, let’s list a few supporting examples:
• $36 = 2^2 3^2$ is a perfect square and τ(36) = 3 · 3 is odd, while $24 = 2^3 \cdot 3$ is not a perfect square and τ(24) = 4 · 2 is even.
• σ(36) = 7 · 13 and σ(2 · 36) = 15 · 13 are odd, while σ(24) = 15 · 4 is even.

Proof: The product $\tau(n) = (a_1 + 1)(a_2 + 1) \cdots (a_r + 1)$ is odd iff all factors $(a_i + 1)$ are odd themselves, i.e., all $a_i$’s are even. In turn, this means that $a_i = 2b_i$ for some numbers $b_i$, and the prime decomposition of n is

$$n = p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r} = p_1^{2b_1} p_2^{2b_2} \cdots p_r^{2b_r} = (p_1^{b_1} p_2^{b_2} \cdots p_r^{b_r})^2 = k^2,$$

which is a perfect square.




Analogously, σ(n) is odd iff all of its factors $\sigma(p_i^{a_i}) = \dfrac{p_i^{a_i+1} - 1}{p_i - 1}$ are odd themselves. It is better to forget here about this compact geometric series formula and work with the original definition of $\sigma(p^a)$ for prime powers $p^a$: $\sigma(p^a) = 1 + p + p^2 + \cdots + p^a$. If p = 2, this sum is always odd. But if p > 2, i.e., if p is an odd prime, the sum consists of a + 1 odd summands; thus, the total sum is odd iff it has an odd number of odd summands, i.e., if a + 1 is odd, i.e., if a is even! Summarizing, the exponent of 2 can be either even or odd, but all other exponents of primes $p_i$ must be even:

$$n = 2^a p_2^{2b_2} p_3^{2b_3} \cdots p_r^{2b_r} = 2^a (p_2^{b_2} p_3^{b_3} \cdots p_r^{b_r})^2 = 2^a k^2.$$

If a is even, then n is a perfect square (why?). Otherwise, a = 2b + 1, and $n = 2(2^b k)^2$ is twice a perfect square.
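For readers who still harbor doubts, both equivalences of Problem 6 can be checked by brute force over a range of n. This is a minimal Python sketch with helper names of our own choosing; it complements the proof rather than replacing it.

```python
from math import isqrt

def divisors(n):
    """All positive divisors of n by trial division."""
    return [d for d in range(1, n + 1) if n % d == 0]

def is_square(n):
    """True if n is a perfect square."""
    r = isqrt(n)
    return r * r == n

def tau_is_odd(n):
    return len(divisors(n)) % 2 == 1

def sigma_is_odd(n):
    return sum(divisors(n)) % 2 == 1

for n in range(1, 1001):
    twice_square = n % 2 == 0 and is_square(n // 2)
    # tau(n) is odd exactly when n is a perfect square.
    assert tau_is_odd(n) == is_square(n)
    # sigma(n) is odd exactly when n is a perfect square or twice one.
    assert sigma_is_odd(n) == (is_square(n) or twice_square)

print("Problem 6 holds for all n up to 1000")
```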

3. Sum-Functions

3.1. Creating new functions. To any arithmetic function f we now associate another arithmetic function $S_f$ in the following manner:

Definition 4. For any f : N → C define the sum-function $S_f$ of f by

$$S_f(n) = f(d_1) + f(d_2) + \cdots + f(d_r) = \sum_{d \mid n} f(d), \tag{10}$$

i.e., the sum-function $S_f$ adds up all values of f along the divisors of n. Suppose we want to evaluate the sum-function of the constant function ι:

$$S_\iota(n) = \iota(d_1) + \iota(d_2) + \cdots + \iota(d_r) = \underbrace{1 + 1 + \cdots + 1}_{r} = r = \tau(n). \tag{11}$$

Thus, $S_\iota = \tau$, our old friend τ counting the number of divisors.

Exercise 8. What are the sum-functions of ε(n), id(n), and R(n)?

Partial solution: It is fairly straightforward to arrive at $S_\varepsilon = \iota$ and $S_{\mathrm{id}} = \sigma$. As for the sum-function of the ∞-Raffle function R, we have:

$$S_R(n) \overset{\text{def}}{=} \sum_{d \mid n} R(d) \overset{\infty\text{-Raffle}}{=} n \overset{\text{def}}{=} \mathrm{id}(n) \ \Rightarrow\ S_R = \mathrm{id}.$$

We realize that ∞-Raffle can (yet again!) be paraphrased:

Problem 1″. (∞-Raffle) Let R be an arithmetic function whose sum-function is the identity, i.e., let $S_R = \mathrm{id}$. Prove that R(n) ≥ 1 for all n.
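The sum-function construction translates almost literally into code. The sketch below is an illustration under our own naming (`sum_function`, `iota`, `epsilon`, `identity`, `raffle` are not from the text); R is computed from its defining ∞-Raffle property, and ε is taken to be 1 at n = 1 and 0 elsewhere, as in the Hints.

```python
from functools import lru_cache

def divisors(n):
    """All positive divisors of n by trial division."""
    return [d for d in range(1, n + 1) if n % d == 0]

def sum_function(f):
    """Return S_f, the function n -> sum of f(d) over all divisors d of n."""
    def S_f(n):
        return sum(f(d) for d in divisors(n))
    return S_f

iota = lambda n: 1                     # constant function
epsilon = lambda n: 1 if n == 1 else 0
identity = lambda n: n
tau = lambda n: len(divisors(n))
sigma = lambda n: sum(divisors(n))

@lru_cache(maxsize=None)
def raffle(n):
    """R(n), defined by: the sum of R(d) over all divisors d of n equals n."""
    return n - sum(raffle(d) for d in divisors(n) if d < n)

for n in range(1, 201):
    assert sum_function(iota)(n) == tau(n)         # S_iota = tau
    assert sum_function(epsilon)(n) == iota(n)     # S_epsilon = iota
    assert sum_function(identity)(n) == sigma(n)   # S_id = sigma
    assert sum_function(raffle)(n) == identity(n)  # S_R = id
    assert raffle(n) >= 1                          # the claim of the paraphrased problem

print([raffle(n) for n in range(1, 11)])
```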


3.2. Multiplicativity is contagious! Viewing ∞-Raffle in the above way inevitably brings up the question: What is the relationship between a function f and its sum-function $S_f$? Do they share common properties? Knowing f or $S_f$, can we easily calculate the other? The examples above, $S_\iota = \tau$, $S_\varepsilon = \iota$, $S_{\mathrm{id}} = \sigma$, and $S_R = \mathrm{id}$, suggest that:

Theorem 1. If f is multiplicative then its sum-function $S_f$ is multiplicative.

Proof: Let f be multiplicative, and let m and n be relatively prime. Then any divisors $c_i$ of m and $d_j$ of n are also relatively prime and, from Lemma 1, the pairwise products $\{c_i d_j\}$ are distinct and comprise all divisors of mn. We can now verify the definition of multiplicativity for $S_f$:

$$S_f(m) \cdot S_f(n) \overset{\text{def}}{=} \sum_{i=1}^{s} f(c_i) \sum_{j=1}^{r} f(d_j) \overset{\text{distr.}}{=} \sum_{i,j} f(c_i) f(d_j) \overset{f\text{–mult}}{=} \sum_{i,j} f(c_i d_j) \overset{\text{Lem.1}}{=} \sum_{e \mid mn} f(e) \overset{\text{def}}{=} S_f(mn).$$

Hence, $S_f(m) S_f(n) = S_f(mn)$ and the sum-function $S_f$ is multiplicative.

Theorem 1 confirms that all examples of sum-functions we found earlier were multiplicative by no coincidence; starting with the multiplicative ι, ε, and id, we had to arrive at some multiplicative sum-functions, which turned out to be τ, ι, and σ. Furthermore, using sum-functions, we can now create many more examples of multiplicative functions.

Exercise 9. For any a ∈ R, let $\sigma_a(n) = \sum_{d \mid n} d^a$. Find a formula for $\sigma_a$.

Note that $\sigma_a$ generalizes τ and σ: $\sigma_0 = \tau$ and $\sigma_1 = \sigma$. The solution below illustrates a typical exercise on multiplicativity of sum-functions.

Solution: As we observed before, all power functions $n^a$ are (strongly) multiplicative. Exercise 9 defines $\sigma_a$ as the sum-function of $n^a$, i.e., $\sigma_a = S_{n^a}$. By Theorem 1, $S_{n^a}$ is multiplicative. Hence, we split $\sigma_a(n) = \prod_{i=1}^{r} \sigma_a(p_i^{a_i})$ and then calculate a prime-power piece $\sigma_a(p^b)$:

$$\sigma_a(p^b) = \sum_{d \mid p^b} d^a = \sum_{j=0}^{b} (p^j)^a = 1 + p^a + (p^2)^a + \cdots + (p^b)^a = 1 + p^a + (p^a)^2 + \cdots + (p^a)^b = \frac{(p^a)^{b+1} - 1}{p^a - 1}\cdot$$

The last equality featured a geometric series with initial term 1, ratio $p^a$, and b + 1 terms. Multiplying together all $\sigma_a(p_i^{a_i})$ yields:

$$\sigma_a(n) = \prod_{i=1}^{r} \frac{p_i^{a(a_i+1)} - 1}{p_i^a - 1}\cdot$$

Note that this formula works for all real a except for a = 0 (why?).
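The closed formula for $\sigma_a$ can be compared against the definition for small positive integer exponents a. The Python sketch below is ours; it restricts a to positive integers so that all arithmetic stays exact, and the function names are hypothetical.

```python
def factorize(n):
    """Prime decomposition of n as a dict {prime: exponent}, by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def sigma_a_brute(n, a):
    """sigma_a(n) from the definition: sum of d^a over all divisors d of n."""
    return sum(d**a for d in range(1, n + 1) if n % d == 0)

def sigma_a_formula(n, a):
    """sigma_a(n) from the product formula; assumes a is a positive integer."""
    result = 1
    for p, e in factorize(n).items():
        # Geometric series with ratio p^a and e + 1 terms; the division is exact.
        result *= (p ** (a * (e + 1)) - 1) // (p**a - 1)
    return result

for a in (1, 2, 3):
    for n in range(1, 201):
        assert sigma_a_brute(n, a) == sigma_a_formula(n, a)

print(sigma_a_brute(12, 2), sigma_a_formula(12, 2))
```

For a = 0 the denominator $p^a - 1$ vanishes, which is exactly the exceptional case flagged in the text.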




As in the above solution, calculations with multiplicative sum-functions often employ the following consequence of Theorem 1:

Corollary 1. The sum-function of a multiplicative function f is given by

$$S_f(n) = \prod_{i=1}^{r} \Big( f(1) + f(p_i) + f(p_i^2) + \cdots + f(p_i^{a_i}) \Big).$$
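Corollary 1 can be read as a recipe: factor n, sum f along each prime-power chain, and multiply. A minimal Python sketch of this recipe follows (function names are our own); it also shows, on one non-multiplicative example of our choosing, that the product formula is not to be trusted outside M.

```python
def factorize(n):
    """Prime decomposition of n as a dict {prime: exponent}, by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def sum_function_direct(f, n):
    """S_f(n) from the definition: add f(d) over all divisors d of n."""
    return sum(f(d) for d in range(1, n + 1) if n % d == 0)

def sum_function_by_corollary(f, n):
    """S_f(n) via Corollary 1; valid when f is multiplicative."""
    result = 1
    for p, e in factorize(n).items():
        result *= sum(f(p**k) for k in range(e + 1))
    return result

tau = lambda n: sum(1 for d in range(1, n + 1) if n % d == 0)  # multiplicative
square = lambda n: n * n                                       # strongly multiplicative
shifted = lambda n: n + 1                                      # NOT multiplicative

for n in range(1, 301):
    assert sum_function_direct(tau, n) == sum_function_by_corollary(tau, n)
    assert sum_function_direct(square, n) == sum_function_by_corollary(square, n)

# For a non-multiplicative f the two computations can disagree.
print(sum_function_direct(shifted, 6), sum_function_by_corollary(shifted, 6))
```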

Try this formula on several more sum-functions:

Problem 7. Find formulas for $S_\tau$ and $S_\sigma$. How about $S_{\ln}$ and $S_\pi$?

Partial Solution: Corollary 1 can be applied to the multiplicative sum-functions $S_\tau$ and $S_\sigma$, but it cannot help us in the remaining two non-multiplicative cases. To find $S_{\ln}$, two basic properties of the logarithmic function come to the rescue: ln(a) + ln(b) = ln(ab) and $\ln(a^c) = c \ln a$. Hence,

$$S_{\ln}(n) = \sum_{d \mid n} \ln d = \ln\Big(\prod_{d \mid n} d\Big) = \ln(\pi(n)) = \ln\Big(n^{\frac{1}{2}\tau(n)}\Big) = \tfrac{1}{2}\tau(n) \ln n.$$

As for $S_\pi$, try your luck, patience, and ingenuity!
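The identity $S_{\ln}(n) = \frac{1}{2}\tau(n)\ln n$ can be checked numerically up to floating-point rounding. The sketch below is a hedged illustration with names of our own (`S_ln`, `closed_form`); the tolerance values are an arbitrary choice.

```python
import math

def divisors(n):
    """All positive divisors of n by trial division."""
    return [d for d in range(1, n + 1) if n % d == 0]

def S_ln(n):
    """Sum-function of ln: add ln(d) over all divisors d of n."""
    return sum(math.log(d) for d in divisors(n))

def closed_form(n):
    """The claimed value (1/2) * tau(n) * ln(n)."""
    return 0.5 * len(divisors(n)) * math.log(n)

for n in range(1, 501):
    # Floating-point comparison, so allow a small tolerance.
    assert math.isclose(S_ln(n), closed_form(n), rel_tol=1e-9, abs_tol=1e-9)

print(S_ln(2009), closed_form(2009))
```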



If you would like to create even more diverse multiplicative functions, consider the following easy-to-prove statement:



Lemma 2. If $f_1, f_2, \ldots, f_k$ are multiplicative functions, then the usual function product $f_1 f_2 \cdots f_k$ is also a multiplicative function.

Thus, for instance, $\tau^{2009}$, $S_{\tau^{2009}}$, and even $S_{S_{\tau^{2009}}}$ are all multiplicative.

3.3. The AMS-inclusion and movement in Multiplicative land. The converse of Theorem 1 is also true:

Theorem 2. If the sum-function $S_f$ is multiplicative then the original function f is multiplicative too.

The proof is, alas, harder and somewhat technical, involving the method of strong induction. The beginner is advised to accept this converse statement on a first reading. But the intermediate reader skilled with induction should attempt to find a proof and then consult with the Hints section.⁵

As alluded to earlier, Figure 2 depicts the strict inclusions⁶ between the three sets of functions we’ve encountered: A ⊃ M ⊃ S. As you track down functions and their sum-functions in this diagram, think about what exactly Theorems 1–2 imply about them; namely, that a function f and its sum-function $S_f$ are either simultaneously inside M, or simultaneously outside M. For instance, O, ι, ε, id, $n^a$, τ, σ, R, and their sum-functions are all inside M; on the other hand, ln and π are in the outer ring A\M of non-multiplicative functions, and so are their sum-functions $S_{\ln}$ and $S_\pi$.

⁵ Using closure of Dirichlet product on the set M of multiplicative functions, we shall provide in Part II another, direct and slick proof of Theorem 2.
⁶ Half-jokingly named “the AMS-inclusion”: check out the publisher of this book!


Figure 2. The AMS-inclusion: A ⊃ M ⊃ S. (Diagram of three nested regions S, M, A containing the functions O, ι, ε, id, $n^a$, R, Λ, μ, φ, τ, σ, ln, π and their sum-functions, with the labels $S_\iota = \tau$, $S_\varepsilon = \iota$, $S_{\mathrm{id}} = \sigma$, $S_R = \mathrm{id}$, $S_\mu = \varepsilon$, $S_\Lambda = \ln$, $S_{\ln} = \frac{1}{2}\tau \ln$, $S_\tau$, $S_\sigma$, $S_\varphi$, $S_\pi$.)

Interestingly enough, S is not closed under the taking of sum-functions; i.e., you can start with a strongly multiplicative function (like ι and id) but end up with a non-strongly multiplicative sum-function (like $S_\iota = \tau$ and $S_{\mathrm{id}} = \sigma$ in M\S). There is a good reason for this “movement” within M in and out of S, which will be explained via the Dirichlet product in Part II. Finally, the functions μ (Möbius), φ (Euler), and Λ (von Mangoldt) must remain a mystery until we learn a whole lot about them in Parts II–III.

3.4. First victory over ∞-Raffle. Theorem 2 is important to us also for a personal reason. The sum-function of R is the multiplicative identity function: $S_R = \mathrm{id}$; so by Theorem 2 we conclude that R is also multiplicative. This, in effect, solves the second reformulation of ∞-Raffle (Problem 1″) and yields formula (9) for R.

One could stop right here: after all, we have solved our ∞-Raffle Problem. But if you are curious to see the story of sum-functions placed within a much larger context and to arrive at an even niftier solution to ∞-Raffle, plow on into Part II. For those who have found the discussion so far too elementary, sharpen your olympiad problem-solving skills with the following delightful

Problem 8. Let f(n) : N → N be multiplicative and strictly increasing.⁷ If f(2) = 2, then prove that f(n) = n for all n ∈ N.

In other words, if we are looking for multiplicative and strictly increasing functions N → N, and if we fix the first two values as f(1) = 1 and f(2) = 2, the function id will be the only one that fits! The proof doesn’t require any of the fancy techniques we develop later in Part II: indeed, a bit of induction and several linked inequalities is “all” it takes to nail down this problem. Yet, who will succeed? See you at the end of the Hints section for the “showdown.”

⁷ Strictly increasing means f(x) < f(y) for any x < y in the domain of f.


4. Hints and Solutions to Selected Problems

Exercise 1. Continuing by brute-force,
• R(7) = 7 − R(1) = 7 − 1 = 6;
• R(8) = 8 − R(1) − R(2) − R(4) = 8 − 1 − 1 − 2 = 4;
• R(9) = 9 − R(1) − R(3) = 9 − 1 − 2 = 6;
• R(10) = 10 − R(1) − R(2) − R(5) = 10 − 1 − 1 − 4 = 4.

The next values are: R(11) = 10, R(12) = 4, R(13) = 12, R(14) = 6, R(15) = 8, R(16) = 8, R(17) = 16, R(18) = 6, R(19) = 18, R(20) = 8. ♦

Exercise 3. Let m, n ∈ N. All cases can be grouped into two categories. If one of n or m is 1, say, n = 1, then mn = m and ε(m)ε(n) = ε(m) · 1 = ε(m) = ε(mn). If both m and n are > 1, then mn > 1 and ε(m)ε(n) = 0 · 0 = 0 = ε(mn). In all cases, ε(m)ε(n) = ε(mn), i.e., ε is strongly multiplicative.

Exercise 4. For a two-value strongly multiplicative f, we must have f(1) = 1 (otherwise f = O, which is no good!). The second value of f must come from a prime p: f(p) = b for some b ≠ 1 (why?). But then $f(p^2) = (f(p))^2 = b^2$ so that $b^2$ is another value of f. In order to have only two values of f, $b^2 = b$ or $b^2 = 1$. As b ≠ 1, this yields only two possibilities: b = 0 or b = −1. Hence, there are two types of such functions $f_1$ and $f_2$, obtained as follows. Set $f_1(1) = 1 = f_2(1)$ and choose a non-empty set of primes P. For any prime p, define

$$f_1(p) := \begin{cases} 0 & \text{if } p \in P \\ 1 & \text{otherwise;} \end{cases} \qquad\text{and}\qquad f_2(p) := \begin{cases} -1 & \text{if } p \in P \\ 1 & \text{otherwise.} \end{cases}$$

Extend $f_1$ and $f_2$ in a strongly multiplicative fashion: $f_i(p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r}) := f_i^{a_1}(p_1) f_i^{a_2}(p_2) \cdots f_i^{a_r}(p_r)$ for i = 1, 2. Then $f_1$ attains only the values 1 and 0, while $f_2$ only the values 1 and −1. Thus, the function ε is one of infinitely many examples of two-value strongly multiplicative functions. ♦

Lemma 1. What are the divisors of mn? Obviously, if c|m and d|n, then the product follows suit: cd|mn; so all pairwise products $c_i d_j$ are divisors of mn. Conversely, if e is a divisor of mn, using the prime decompositions of m and n we can write e as a product of a divisor of m and a divisor of n (why?): $e = c_i d_j$ for some $c_i$ and $d_j$.

The key point here is that all products $c_i d_j$ are distinct. It may not be immediately obvious that this follows from the relative primeness of m and n, but watch! If $c_i d_j = c_k d_l$ for some divisors $c_i$, $c_k$ of m and $d_j$, $d_l$ of n, then $c_i$ divides the RHS, i.e., $c_i \mid c_k d_l$. However, $c_i$ and $d_l$ are relatively prime (as divisors of the relatively prime m and n); thus, $c_i \mid c_k$. Turning the tables around, we can equally show that $c_k \mid c_i$, so that $c_i = c_k$ and, consequently, $d_j = d_l$. Thus, two products $c_i d_j$ and $c_k d_l$ are equal only if they are comprised of identical divisors of m and n.

Therefore, the products $c_i d_j$ are distinct and they comprise all divisors of mn. There are sr such products, i.e., τ(mn) = sr = τ(m)τ(n).
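Lemma 1 is easy to watch in action. The Python sketch below (helper names are ours) lists the pairwise products of divisors for a relatively prime pair, and for contrast for a pair that is not relatively prime, where the products repeat.

```python
from math import gcd

def divisors(n):
    """All positive divisors of n, in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]

def pairwise_products(m, n):
    """Sorted list of all products c*d with c a divisor of m and d a divisor of n."""
    return sorted(c * d for c in divisors(m) for d in divisors(n))

# Relatively prime m and n: the products are distinct and are exactly the divisors of mn.
for m in range(1, 40):
    for n in range(1, 40):
        if gcd(m, n) == 1:
            assert pairwise_products(m, n) == divisors(m * n)

print(pairwise_products(4, 9))  # gcd(4, 9) = 1: matches divisors(36)
print(pairwise_products(2, 2))  # gcd(2, 2) = 2: the product 2 appears twice
```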


Problem 4. The (most likely) way by which Gauss added up 1 + 2 + · · · + 100 in his primary school math class was to pair up terms in the front with terms in the back, each pair giving the same total sum of 101, i.e., 1 + 100 = 2 + 99 = 3 + 98 = · · · = 101. The same idea can be applied to the product of all divisors of n. If $\{1 = d_1, d_2, d_3, \ldots, d_r = n\}$ are the divisors of n arranged in ascending order, note that $n = d_1 d_r = d_2 d_{r-1} = d_3 d_{r-2}$, and so on. The reason this works out so nicely is because if d is a divisor of n, then $\frac{n}{d}$ is also a divisor of n, so that $d \cdot \frac{n}{d} = n$. Formally, $\{\frac{n}{d_1}, \frac{n}{d_2}, \ldots, \frac{n}{d_r}\}$ are also the divisors of n, but arranged in descending order. We see that π(n) can be calculated in two different ways, and we multiply the two corresponding expressions below:

$$\pi(n) = d_1 d_2 \cdots d_r \quad\text{and}\quad \pi(n) = \frac{n}{d_1} \cdot \frac{n}{d_2} \cdots \frac{n}{d_r} \ \Rightarrow\ \pi^2(n) = \Big(d_1 \cdot \frac{n}{d_1}\Big)\Big(d_2 \cdot \frac{n}{d_2}\Big) \cdots \Big(d_r \cdot \frac{n}{d_r}\Big) = n^r.$$

As r = τ(n) is the number of divisors of n, we arrive at $\pi(n) = n^{\tau(n)/2}$.
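The pairing argument can be confirmed with exact integer arithmetic by checking the squared identity $\pi^2(n) = n^{\tau(n)}$, which avoids fractional exponents. This is a minimal sketch with function names of our own; `math.prod` requires Python 3.8 or later.

```python
from math import prod

def divisors(n):
    """All positive divisors of n, in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]

def divisor_product(n):
    """pi(n): the product of all divisors of n."""
    return prod(divisors(n))

for n in range(1, 501):
    divs = divisors(n)
    # Pairing d with n/d: the reversed list of n/d is again the list of divisors.
    assert [n // d for d in reversed(divs)] == divs
    # Squared form of pi(n) = n^(tau(n)/2), exact in integers.
    assert divisor_product(n) ** 2 == n ** len(divs)

print(divisor_product(18))
```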



Exercise 6(b). The prime decomposition of 381 is 3 · 127. Since σ is multiplicative, we can apply the prime-power splitting to it:

$$\sigma(n) = \sigma(p_1^{a_1})\sigma(p_2^{a_2}) \cdots \sigma(p_r^{a_r}) = 3 \cdot 127.$$

Each factor $\sigma(p^a) = 1 + p + \cdots + p^a$ yields a non-trivial divisor of σ(n) = 3 · 127. Hence, there can be at most two prime divisors $p_1$ and $p_2$ of n. (Why? Compare with PST 28.)

Case 1. If $n = p^a q^b$ for distinct primes p and q, then $\sigma(p^a) = 3$ and $\sigma(q^b) = 127$. The first equation has only one solution: 1 + 2 = 3, i.e., $p^a = 2$. You can “brute-force” the solutions to the second equation, but there is a finer way to proceed. From $q^{b+1} - 1 = 127(q - 1)$ (how did we get this?) we can reduce modulo q to −1 ≡ −127 (mod q), i.e., $q \mid 126 = 2 \cdot 3^2 \cdot 7$. But q ≠ 2 (we already established p = 2), so that q = 3 or q = 7. Check that $3^{b+1} - 1 = 127 \cdot 2$ and $7^{b+1} - 1 = 127 \cdot 6$ do not yield acceptable solutions for b. Therefore, this case does not work in our problem.

Case 2. If $n = p^a$ is a prime power, then $\sigma(p^a) = 381$, which means $p^{a+1} - 1 = 381(p - 1)$. Again, reducing modulo p results in $p \mid 380 = 2^2 \cdot 5 \cdot 19$. Check that p = 2 and p = 5 do not yield any solutions, but p = 19 works: $19^3 - 1 = 18 \cdot 381$. The final (and only) answer is $n = 19^2 = 361$. ♦
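Since σ(n) ≥ n for every n, any solution of σ(n) = 381 must satisfy n ≤ 381, so the case analysis can be cross-checked by an exhaustive search. The Python sketch below is ours and simply scans that finite range.

```python
def sigma(n):
    """Sum of all positive divisors of n, by trial division."""
    return sum(d for d in range(1, n + 1) if n % d == 0)

# sigma(n) >= n, so only n <= 381 can possibly satisfy sigma(n) = 381.
solutions = [n for n in range(1, 382) if sigma(n) == 381]
print(solutions)

# The arithmetic fact used in Case 2.
assert 19**3 - 1 == 18 * 381
```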

Exercise 6(d). As in part (c), set $n = 3^a 5^b$ and obtain a system of two equations. Dividing them, deduce that $b = \frac{4}{3}a$. Substituting into one equation, arrive at a(a + 1)(4a + 3) = 180. The LHS increases as a increases, so the solution a = 3 is the only one (why?). The final answer is $n = 3^3 5^4$. ♦

Exercise 7. Among other things, gcd(m, n) = 18 implies that both 2 and 3 divide m and n (why?). On the other hand, τ(m) = 21 = 3 · 7 is a product of two primes, just like in Exercise 6(a) where τ(n) = 13 · 31. By a similar analysis, conclude that $m = 2^2 3^6$ or $m = 2^6 3^2$. Ditto, since τ(n) = 10 = 2 · 5 is also a product of two primes, $n = 2^1 3^4$ or $n = 2^4 3^1$ (why?).


Finally, observe that not both m and n are divisible by 4 or by 27 (why? $\gcd(m, n) = 18 = 2 \cdot 3^2$). This leaves only one possibility for the pair (m, n): $m = 2^6 3^2$ and $n = 2^1 3^4$. ♦

Exercise 8. By definition of sum-functions:
• $S_\varepsilon(n) = \sum_{d \mid n} \varepsilon(d) \overset{\text{def}}{=} \varepsilon(1) + 0 + 0 + \cdots + 0 = 1 = \iota(n)$;
• $S_{\mathrm{id}}(n) = \sum_{d \mid n} \mathrm{id}(d) = \sum_{d \mid n} d \overset{\text{def}}{=} \sigma(n)$.

Corollary 1. There is really nothing to prove here. Since f is multiplicative, we know that $S_f$ is multiplicative; so we split $S_f$ into prime-power components and write the definition of each component as a sum-function:

$$S_f(n) = \prod_{i=1}^{r} S_f(p_i^{a_i}) = \prod_{i=1}^{r} \Big( f(1) + f(p_i) + f(p_i^2) + \cdots + f(p_i^{a_i}) \Big).$$

Problem 7. Since τ and σ are multiplicative, their sum-functions are also multiplicative; so it suffices to find formulas only at prime powers:

$$S_\tau(p^a) = \sum_{d \mid p^a} \tau(d) = \sum_{i=0}^{a} \tau(p^i) = \sum_{i=0}^{a} (i + 1) = \frac{(a+2)(a+1)}{2}; \quad\text{and}$$

$$S_\sigma(p^a) = \sum_{d \mid p^a} \sigma(d) = \sum_{i=0}^{a} \sigma(p^i) = \sum_{i=0}^{a} \frac{p^{i+1} - 1}{p - 1} = \frac{1}{p-1}\Big(\sum_{i=1}^{a+1} p^i - (a + 1)\Big) = \frac{p\,\frac{p^{a+1} - 1}{p - 1} - (a+1)}{p - 1} = \frac{p^{a+2} - p - (p-1)(a+1)}{(p-1)^2} = \frac{p^{a+2} - p(a+2) + (a+1)}{(p-1)^2}\cdot$$

Along the way, we used the formulas for the sum of the arithmetic series $\sum_{i=0}^{a} (i + 1)$ and for the sum of the geometric series $\sum_{i=1}^{a+1} p^i$. Applying (6), we piece together all prime-power parts into general formulas for $S_\tau$ and $S_\sigma$:

$$S_\tau(n) = \prod_{i=1}^{r} \frac{(a_i+2)(a_i+1)}{2} = \prod_{i=1}^{r} \binom{a_i+2}{2} \quad\text{and}\quad S_\sigma(n) = \prod_{i=1}^{r} \frac{p_i^{a_i+2} - p_i(a_i+2) + (a_i+1)}{(p_i - 1)^2}\cdot$$

The final version of the formula for $S_\tau$ employs the notation for the binomial coefficient $\binom{a+2}{2} = \frac{(a+2)(a+1)}{2}\cdot$

As we indicated in the text, working with $S_\pi$ is much harder, since π is not multiplicative. We can’t use (6) for $S_\pi(n)$; even finding a closed formula for a prime-power piece is already problematic:

$$S_\pi(p^a) = \sum_{i=0}^{a} \pi(p^i) \overset{\text{Prob.4}}{=} \sum_{i=0}^{a} (p^i)^{\tau(p^i)/2} = \sum_{i=0}^{a} p^{i(i+1)/2} = \sum_{i=0}^{a} p^{\binom{i+1}{2}}.$$
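The two prime-power formulas for $S_\tau$ and $S_\sigma$ derived above lend themselves to a direct check. In the Python sketch below (names are ours), the $S_\sigma$ identity is tested after clearing the denominator $(p-1)^2$, so that everything stays in exact integers.

```python
def divisors(n):
    """All positive divisors of n by trial division."""
    return [d for d in range(1, n + 1) if n % d == 0]

def tau(n):
    return len(divisors(n))

def sigma(n):
    return sum(divisors(n))

def sum_function(f, n):
    """S_f(n): add f(d) over all divisors d of n."""
    return sum(f(d) for d in divisors(n))

for p in (2, 3, 5, 7):
    for a in range(0, 5):
        n = p**a
        # S_tau(p^a) = (a+2)(a+1)/2, a binomial coefficient.
        assert sum_function(tau, n) == (a + 2) * (a + 1) // 2
        # S_sigma(p^a) * (p-1)^2 = p^(a+2) - p(a+2) + (a+1).
        assert sum_function(sigma, n) * (p - 1) ** 2 == p ** (a + 2) - p * (a + 2) + (a + 1)

print(sum_function(tau, 8), sum_function(sigma, 8))
```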

Lemma 2. To save chalk, we will prove the lemma only for two multiplicative functions $f_1$ and $f_2$; but this will actually suffice to prove the statement for any number of such functions (why?). Let m and n be relatively prime. To show multiplicativity of $f_1 \cdot f_2$, we calculate as follows:

$$(f_1 \cdot f_2)(mn) \overset{\text{def}\,\cdot}{=} f_1(mn) \cdot f_2(mn) \overset{\text{mult}}{=} f_1(m) f_1(n) f_2(m) f_2(n) = \big(f_1(m) f_2(m)\big)\big(f_1(n) f_2(n)\big) = (f_1 \cdot f_2)(m)\,(f_1 \cdot f_2)(n).$$

Therefore, $f_1 \cdot f_2$ is also multiplicative. ♦

For instance, in the text, we claimed that $S_{S_{\tau^{2009}}}$ is multiplicative. This is true because τ is multiplicative, and so is its power $\tau^{2009}$ by the newly-proven Lemma 2, and so is its sum-function $S_{\tau^{2009}}$, and in turn, so is its sum-function $S_{S_{\tau^{2009}}}$.

Theorem 2. Let $n_1$ and $n_2$ be relatively prime numbers such that $n = n_1 n_2$. We will prove by induction on n that $f(n_1 n_2) = f(n_1) f(n_2)$. The statement is trivial for n = 1: as $n_1 = n_2 = 1$, we need only to verify f(1) = f(1)f(1). By definition, $S_f(1) = f(1)$; since $S_f(1) = 1$ or 0 ($S_f$ is multiplicative!), we conclude that f(1) = 1 or 0, so that f(1) = f(1)f(1). Assume now that the statement is true for all $d = d_1 d_2 < n$, i.e., that $f(d_1 d_2) = f(d_1) f(d_2)$. Then for our $n_1 n_2 = n$ we calculate twice:

• $S_f(n_1 n_2)$

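$\overset{\text{Lem.1}}{=} \sum_{d_i \mid n_i} f(d_1 d_2) = f(n_1 n_2) + \sum_{d_i \mid n_i,\ d_1 d_2 < n} f(d_1 d_2) \overset{\text{IH}}{=} f(n_1 n_2) + \sum_{d_i \mid n_i,\ d_1 d_2 < n} f(d_1) f(d_2)$;

• $S_f(n_1) S_f(n_2) \overset{\text{def}}{=} \sum_{d_1 \mid n_1} f(d_1) \sum_{d_2 \mid n_2} f(d_2) = f(n_1) f(n_2) + \sum_{d_i \mid n_i,\ d_1 d_2 < n} f(d_1) f(d_2)$.

1, then f (n) = n for all n ∈ N. Solution: Replacing ( 1, it follows that f (2) = 2. We are back to our previous problem! Thus, f (n) = n for all n ∈ N.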

Session 5

Introduction to Group Theory

based on Tatiana Shubin’s session

Sneak Preview. Having played with Rubik’s Cube and taken it apart to see what is inside, it is now time to look under the hood and penetrate more deeply into what its true structure is. The building blocks are groups. Stubborn polynomials, symmetric elephants, and socks that beg to be put on, taken off, and permuted between your feet are all part of the story, directed by Galois. You will escape never-ending cycles in a complex world, only to stroll along in Permuterland and, ultimately, seek bi-polar paths in 15-Puzzleland.

1. Puzzling It Out

The well-known 15-puzzle consists of a shallow box filled with 16 squares in a 4 × 4 array (cf. Fig. 1a–d from left to right). The bottom right corner square is removed, and the other squares are labeled 1 through 15 as in Figure 1a. Using the empty spot, we can slide the squares around without lifting them up.

(a)              (b)              (c)              (d)
 1  2  3  4       4  3  2  1      10  9  8  7       8 14 11  3
 5  6  7  8       5  6  7  8      11  2  1  6      12  2 15  9
 9 10 11 12      12 11 10  9      12  3  4  5       6  4 13  1
13 14 15         13 14 15         13 14 15          7 10  5

Figure 1. Achievable or not?

Problem 1. (McCoy, [53]) Starting from the initial position in Figure 1a, which 15-puzzle positions in Figures 1b–d can be achieved and why?

Understandably, a novice may ask: “What does this puzzle have to do with serious mathematics?” “Ah, . . . wrong question!” an advanced math circler will say. “Just about any interesting (or uninteresting) puzzle is somehow related to mathematics.” The puzzle is frequently a disguise for an actual problem from group theory. In fact, by the end of this session you will have seen such a variety of examples of groups, that (whether you wanted to or not) you will start seeing groups everywhere around you!


For instance, just like the Rubik’s Cube, the 15-puzzle is solvable via a special type of permutations that form a subgroup, fortified with the idea of a closed path . . . . What does this mean? As vague as this hint may be, it is the only one you will get for now on Problem 1. Did you try it? Any luck in transforming Figure 1a into others? Some positions will be achievable while others will stubbornly remain out of your reach! Is it possible to rigorously prove that the stubborn positions are indeed unreachable, regardless of how long you play with the puzzle and regardless of whatever complicated sequences of moves you invent? If you are stuck, hang around with us for a systematic introduction to the objects, theorems, tools, and basic applications of group theory. At the end, we will get back to the 15-puzzle and, hopefully, by then you will not find it nearly as difficult as it may now look. On the other hand, if you already know the fundamentals of group theory, skim over the examples spread throughout this session, and jump to the challenging problems in the last section. The 15-puzzle will be waiting for you there.

2. A Polynomial Prelude¹

2.1. The promise of the quartics. When we think of algebra, the first thing that comes to mind is the study of polynomial equations and their solutions. And duly so – for a long, long time algebra essentially had been that very study. Of course, any linear or quadratic equation can easily be solved, and there is evidence suggesting that Babylonians as early as 1800 BCE already knew general procedures for dealing with both types of equations. Cubic equations proved to be much trickier – the first description of a general way to solve them appeared in Ars Magna, published in 1545 by Gerolamo Cardano.² Soon after, Cardano’s pupil Lodovico Ferrari invented a nice reduction procedure to conquer quartic (4th degree) equations by constructing an associated cubic equation, solving it, and then using its roots to find a solution to the original quartic equation. This method seemed to promise that a similar approach could be used to solve higher degree equations – just keep constructing auxiliary lower degree equations and solving them. Unfortunately, this did not work – so much so that all attempts to find a general method for solving even quintic (5th degree) equations failed.

¹ If any words in this section are unfamiliar to you, don’t worry: just read on for the fun of it. After all, the history of mathematics is full of duels, drama, and enlightenment.
² Recall the discussion of $x^3 = 15x + 4$ in Complex Numbers I, Volume I. The method was actually found independently by Scipione del Ferro and Niccolò Tartaglia, but revealed by Cardano in Ars Magna, apparently, against Tartaglia’s wishes.


2.2. Shifting focus to a new big picture. Mathematicians were really perplexed. But, of course, they kept working. Instead of direct attacks, they turned their attention to the relationships between the roots of a given equation. This eventually led to the discovery of the marvelous world of symmetry and, ultimately, to the idea of groups and other algebraic structures. A whole new field of mathematics called abstract algebra was created. So what about higher degree equations? It was by means of abstract algebra that the question was finally settled in the first half of the 19th century – it was proved that, in general, a polynomial equation of degree five and above cannot be solved in radicals; i.e., there is no way to get a solution formula which uses only the algebraic operations of addition, subtraction, multiplication, division, and root extraction.³ Meanwhile, the notion of a group has become one of the most important notions in mathematics. At the same time, it is also very widely used in applications. For example, apart from the study of algebraic equations, finite groups are indispensable in fields as distinct as crystallography and coding theory, just to name two.

³ Who proved it? The 22-year-old Norwegian Niels Henrik Abel, in 1824.

3. Action Groups

One way to think of groups is as follows.

Definition 1. A nonempty collection of actions that can be performed one after another is called a group if every action has a counteraction also included in this collection, and the result of performing any two of these actions in a row is also included in the collection.

3.1. A group for every soldier. Let us start with a very simple but illustrative example.

Exercise 1. (Sosinski, [76]) The “Turning Soldier” group consists of four actions:
• s = stand still;
• r = turn right;
• l = turn left;
• b = turn around 180°.
Why is this collection T = {s, r, l, b} a group?

In order to see what happens when various actions are performed one after another, it is convenient to construct a table, called the multiplication table of the group. We label (in some order) each row and each column of the table by the elements of the group and we place in the matrix cell (i, j) the element that is equal to the product of the elements labeling the ith row and jth column of the table.


5. GROUP THEORY

Now just stop for a second and see whether what you have just read makes any sense to you. You certainly should be perplexed by certain words! In particular, what exactly is meant by the product of actions? Actions are not numbers, so how do we multiply them? When we deal with an action group, we can combine a pair of actions by performing one of them and then following with the other one; and – just for convenience! – we say that we have multiplied these two actions. We are really interested only in the final result of these actions, and not in the particular way by which that result has been achieved. So if the soldier turns 180° around and then turns right, the result is the same as if he simply turned left to begin with. (Can you see it?) Thus we say that the product of actions b and r equals l, and we write rb = l. Observe the order in which we list the actions b and r: from right to left!

Let’s go back to the multiplication table for the turning soldiers. If the 2nd row is labeled by r and the 3rd column is labeled b, then we place l in cell (2, 3). Can you fill in the entire table?

     ·  |  s   r   b   l
    ----+----------------
     s  |
     r  |          l
     b  |
     l  |

Notice that s = “doing nothing” is a very special action. Every group must have such an element. Why? Anyone would agree that the counteraction of turning left, l, is turning right, r; if we perform these two actions one after the other, we get rl = s. By our rules for a group, we must therefore include in T the “do nothing” turn s. Can you think of other reasons why s should be in T? Check out the first row and column corresponding to s: they are also very distinguishable!
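The table can also be filled in mechanically. The following is a minimal sketch, not part of the original text: it assumes that each turn is recorded as a number of clockwise quarter turns (s = 0, r = 1, b = 2, l = 3), and the names QUARTER_TURNS, product, and multiplication_table are hypothetical helpers introduced only for this illustration.

```python
# Assumption of this sketch: a turn is a number of clockwise quarter turns.
QUARTER_TURNS = {"s": 0, "r": 1, "b": 2, "l": 3}
NAMES = {turns: name for name, turns in QUARTER_TURNS.items()}


def product(x, y):
    """Return the action xy: perform y first, then x (right-to-left)."""
    total = (QUARTER_TURNS[y] + QUARTER_TURNS[x]) % 4
    return NAMES[total]


def multiplication_table(order="srbl"):
    """Cell (row, col) holds the product of the row label and the column label."""
    return {row: {col: product(row, col) for col in order} for row in order}


table = multiplication_table()
print("· | " + " ".join("srbl"))
for row in "srbl":
    print(row + " | " + " ".join(table[row][col] for col in "srbl"))
```

With this encoding, product("r", "b") reproduces the entry l in cell (2, 3), and product("r", "l") gives s, in agreement with the text.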



3.2. A group for every sock. While one sock is not enough for your two feet, it is enough (precisely because of this) to make for an interesting group.

Exercise 2. (Sosinski, [77]) The “One Sock” group S consists of the actions:
• n = do nothing;
• c = take the sock off and put it on the other foot;
• i = take it off, turn it inside out, then put it on the same foot again;
• t = take it off, turn it inside out, then put it on the other foot.
Show that S is indeed a group.



Here is one question over which you may (and should) want to ponder: Is the One Sock group any different from the Turning Soldier group? Each consists of four actions. Still, can we view every turn of a soldier as a sock move? We will explore such questions soon; but for now start thinking about this.

PST 32. A classical way to distinguish between the Turning Soldier and the One Sock groups is to find the counteraction of each sock’s move and of each soldier’s turn and compare the two situations.
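The comparison suggested by PST 32 can be carried out by brute force. This is only a sketch under stated assumptions: soldier turns are encoded as quarter turns modulo 4, and a sock move is encoded as a pair (foot switched?, turned inside out?) combined by exclusive-or. Both encodings and all function names are illustrative choices, not taken from the text.

```python
def counteractions(elements, op, identity):
    """Map each element x to the element y with xy = yx = identity."""
    return {
        x: next(y for y in elements
                if op(x, y) == identity and op(y, x) == identity)
        for x in elements
    }


# Turning Soldier group: quarter turns modulo 4 (assumed encoding).
soldier = {"s": 0, "r": 1, "b": 2, "l": 3}
soldier_name = {value: key for key, value in soldier.items()}


def soldier_op(x, y):
    return soldier_name[(soldier[x] + soldier[y]) % 4]


# One Sock group: (foot switched?, inside out?) pairs (assumed encoding).
sock = {"n": (0, 0), "c": (1, 0), "i": (0, 1), "t": (1, 1)}
sock_name = {value: key for key, value in sock.items()}


def sock_op(x, y):
    (foot_x, side_x), (foot_y, side_y) = sock[x], sock[y]
    return sock_name[(foot_x ^ foot_y, side_x ^ side_y)]


soldier_inverses = counteractions("srbl", soldier_op, "s")
sock_inverses = counteractions("ncit", sock_op, "n")
print(soldier_inverses)
print(sock_inverses)
```

Every sock move comes out as its own counteraction, while r and l counteract each other among the soldier's turns.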


3.3. A group for every figure. The next example is much more interesting (and important). While numbers measure size, groups measure symmetry. Symmetry is the property of an object to remain unchanged while undergoing changes. More precisely, a symmetry is a motion that maps a figure onto itself. For instance, any non-trivial motion you perform on the elephant-in-profile E1 – a translation, rotation, reflection, or glide reflection⁴ – will produce another figure (congruent to E1). By contrast, the full-face elephant E2 will go to itself under a reflection s about a vertical line. We conclude that elephant E1 has only the trivial symmetry i (the “fix everything” motion), while elephant E2 has a second symmetry – the reflection s.

In general, for every geometric figure F, the collection of symmetries of F forms a group (why?) called the symmetry group of F and denoted by S(F). The structure and size of this group tells us how much symmetry the figure possesses. Thus, S(E1) = {i} is a single-element group, while S(E2) = {i, s} is a group of 2 motions. Let’s move now to larger symmetry groups.

Exercise 3. Let Δ denote an equilateral triangle. Describe (geometrically) the elements of the symmetry group S(Δ). How many are there?

Answer: S(Δ) consists of 6 actions: 3 rotations with respect to the center (including the 0°-rotation), and 3 flips (reflections) across Δ’s altitudes. ♦

By the way, if you want that large of a symmetry group for an elephant, you will need at least 3 elephants (cf. Fig. 2). Why?

Figure 2. D3 = S(Δ): symmetries of the equilateral triangle

The symmetry group S(Δ) is usually denoted by D3 and is called the 3rd dihedral group. In general, Dn, the nth dihedral group, is the group of symmetries of a regular n-gon. You may know what’s coming up now:



Exercise 4. Find the number of elements in D4 and compare with D3. Establish a pattern and check it on Dn for any n ≥ 1. As a bonus, what is the maximal number of symmetries you can produce using only two full-face elephants, and does the resulting symmetry group match any Dn?

Partial Answer: By playing with the square, you will quickly discover its 8 symmetries and conjecture that Dn has 2n elements for all n. This will almost always be true. Two E2-elephants can fill in the “gap”. ♦

⁴ Such transformations of the plane are known as Euclidean motions (cf. Hints section).
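The conjecture |Dn| = 2n can be checked by machine for small n ≥ 3. The sketch below rests on an assumption not made in the text: each symmetry of a regular n-gon is recorded by what it does to the vertices, labeled 0, 1, . . . , n − 1, and the whole group is built from one rotation and one flip. The helper names are hypothetical.

```python
def compose(p, q):
    """Return the vertex permutation 'q first, then p'."""
    return tuple(p[q[i]] for i in range(len(q)))


def generated_group(generators):
    """Collect all products of the generators (enough for a finite group)."""
    identity = tuple(range(len(generators[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for h in generators:
            gh = compose(g, h)
            if gh not in group:
                group.add(gh)
                frontier.append(gh)
    return group


def dihedral_group(n):
    """Symmetries of a regular n-gon (n >= 3) as permutations of its vertices."""
    rotation = tuple((i + 1) % n for i in range(n))   # turn by one vertex
    flip = tuple((-i) % n for i in range(n))          # reflection fixing vertex 0
    return generated_group([rotation, flip])


orders = {n: len(dihedral_group(n)) for n in range(3, 9)}
print(orders)
```

The sketch deliberately starts at n = 3: for n = 1 and n = 2 the vertices alone no longer tell all symmetries apart, which is one way to see why the small cases need separate care.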


The curious reader, of course, will ask if the remaining numbers missed by the orders of Dn can be obtained as orders of symmetry groups of plane figures. We challenge the reader to affirmatively answer this question:

Problem 2. For any odd n ≥ 1, find a plane figure F with exactly n symmetries, i.e., such that the number of elements in S(F) is n.

3.4. Size is not everything. Nevertheless, the number of elements of a group G is its most important characteristic. It is called the order of the group G and is denoted by |G|. For instance, you must have discovered above that |Dn| = 2n for n ≥ 3. If G is a finite set, |G| is a positive integer; otherwise, we say that G is of infinite order, or more simply, is infinite.

Clearly, if two groups have different orders then they are not the same group. How about if the orders match? Are the groups the same? We already encountered this situation in our first two examples: the Turning Soldier group T = {s, r, l, b} and the One Sock group S = {n, c, i, t} both have order 4. Following PST 32, you must have noticed that the counteraction of each sock’s move is itself, while this is not always true for the soldier’s turns: r ≠ l, yet r and l counteract each other in T. Thus, T ≠ S.

In the next example, we will take this question to a new level: literally, to a new dimension. We will count symmetries in space.

Problem 3. (Armstrong, [6]) Consider three solids:
(1) a (right) pyramid whose base is a regular polygon with 12 sides;
(2) a regular hexagonal plate (a hexagonal prism);
(3) a regular tetrahedron.
For simplicity, consider only rotational symmetries⁵ of these solids. For each solid, these symmetries form a group (why?) G1, G2, or G3, respectively. Show that each of these groups has order 12; yet, they are all different.

Figure 3. Rotational symmetries of solids

Partial Solution: Locate first the rotational axes and decide how many rotations about these axes will send the solids to themselves (cf. Fig. 3).
(1) G1 has 12 rotations about the vertical axis (including the identity).
(2) G2 has 6 rotations about the vertical axis (including the identity); 1 rotation about each of 3 axes through the midpoints of opposite vertical edges; and 1 rotation about each of 3 axes through the centers of opposite rectangular side faces.
(3) G3 has the identity; 2 rotations about each of the 4 axes through a vertex and the center of the opposite face; and 1 rotation about each of the 3 axes through the midpoints of opposite edges.
Thus, |G1| = |G2| = |G3| = 12. But clearly, the symmetries of these solids are distinctly different. One such striking difference is the fact that one single rotation, when repeated, generates all rotations of the pyramid (which rotation is that?); but there is no such single rotation of the prism or the tetrahedron.⁶ There are other differences as well. To name just one more, for the pyramid there is only one (non-trivial) rotation which counteracts itself (which one?), i.e., combined once with itself it equals the identity. For the prism, there are more such rotations (how many?); and for the tetrahedron, the number is still different (what is it?). These essential differences imply that the Gi’s are all distinct groups. ♦

⁵ As opposed to rotations in the plane (which happen about single points), rotations in 3D-space are performed about lines, called axes of rotation.

PST 33. To establish that groups are not the same, find a suitable property that is satisfied by a different number of objects from each group. Along with group order, you may want to count, for example, the number of elements that counteract themselves, or those that generate the groups (if any).
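The counts suggested in PST 33 can be tabulated by machine. The sketch below uses stand-ins that are assumptions of this illustration rather than constructions from the text: the pyramid's rotations are modeled as rotations of a 12-gon, the prism's rotations as the vertex permutations of a hexagon produced by one rotation and one flip, and the tetrahedron's rotations as the even permutations of its 4 vertices. For each group it reports the order, the number of non-trivial elements that counteract themselves, and the largest order of an element.

```python
def compose(p, q):
    """Return the permutation 'q first, then p' (tuples of images of 0..n-1)."""
    return tuple(p[q[i]] for i in range(len(q)))


def generated_group(generators):
    """Collect all products of the generators (enough for a finite group)."""
    identity = tuple(range(len(generators[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for h in generators:
            gh = compose(g, h)
            if gh not in group:
                group.add(gh)
                frontier.append(gh)
    return group


def element_order(p):
    """Smallest k >= 1 such that p repeated k times is the identity."""
    identity = tuple(range(len(p)))
    k, power = 1, p
    while power != identity:
        power = compose(p, power)
        k += 1
    return k


def profile(group):
    """(group order, number of self-inverse non-identity elements, largest element order)."""
    orders = [element_order(g) for g in group]
    return (len(group), orders.count(2), max(orders))


# Stand-ins (assumptions of this sketch) for G1, G2, G3:
pyramid = generated_group([tuple((i + 1) % 12 for i in range(12))])
prism = generated_group([tuple((i + 1) % 6 for i in range(6)),
                         tuple((-i) % 6 for i in range(6))])
tetrahedron = generated_group([(1, 2, 0, 3), (1, 0, 3, 2)])

for name, group in [("G1", pyramid), ("G2", prism), ("G3", tetrahedron)]:
    print(name, profile(group))
```

A group of order 12 has a generating element exactly when its largest element order is 12, so the third number in each profile also answers the question about single generating rotations.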

3.5. A group within a group. Problem 3 was based on the fact that the rotational symmetries of the solids in Figure 3 form smaller groups Gi inside the full symmetry groups. A similar phenomenon can be observed in the simpler case of the group D4: we may notice that some actions in this group form a group by themselves. We call such a subset a subgroup.

Problem 4. Find all subgroups of D4, the symmetry group of the square.

Partial Solution: One of these subgroups contains 4 elements; it consists of all rotational symmetries of a square. Let us call it R4 = {r0, r1, r2, r3}, where rj is a (90j)°-rotation. Note that r1^2 = r2, r1^3 = r3, and r1^4 = r0; i.e., r1 generates R4. With this, the multiplication table for R4 can be filled in no time.

     ·  | r0  r1  r2  r3
    ----+----------------
     r0 |
     r1 |          r3
     r2 |
     r3 |

Adding to R4 a reflectional symmetry of the square inevitably yields all of D4 (why?). ♦

PST 34. To construct a subgroup of a group, start by including the identity. For each new element g you add, include all repetitions of g, all products of g with the current members of your subgroup, and these products’ counteractions. When you are done, the multiplication table will tell you if you have indeed created a subgroup.

⁶ Put geometrically, the pyramid has only 1 axis of rotation, while the prism and the tetrahedron have 7 each (1 + 3 + 3 and 4 + 3, respectively). This allows the pyramid’s group G1 to have a generating rotation but makes the same impossible for G2 and G3. Why?
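For a group as small as D4, the search in Problem 4 can also be done exhaustively. The following sketch assumes that the symmetries of the square are recorded as permutations of its vertices 0, 1, 2, 3, and it relies on the fact that a finite subset containing the identity and closed under products is automatically a subgroup. The names ROTATION, FLIP, and is_subgroup are hypothetical.

```python
from itertools import combinations


def compose(p, q):
    """Return the vertex permutation 'q first, then p'."""
    return tuple(p[q[i]] for i in range(len(q)))


IDENTITY = (0, 1, 2, 3)
ROTATION = (1, 2, 3, 0)   # 90-degree rotation: vertex i goes to i + 1
FLIP = (0, 3, 2, 1)       # reflection across the diagonal through vertices 0 and 2


def generated_group(generators):
    """Collect all products of the generators (enough for a finite group)."""
    group, frontier = {IDENTITY}, [IDENTITY]
    while frontier:
        g = frontier.pop()
        for h in generators:
            gh = compose(g, h)
            if gh not in group:
                group.add(gh)
                frontier.append(gh)
    return group


D4 = generated_group([ROTATION, FLIP])


def is_subgroup(subset):
    """Finite, contains the identity, closed under products."""
    return IDENTITY in subset and all(
        compose(a, b) in subset for a in subset for b in subset)


others = sorted(D4 - {IDENTITY})
subgroups = [
    {IDENTITY, *extra}
    for size in range(len(others) + 1)
    for extra in combinations(others, size)
    if is_subgroup({IDENTITY, *extra})
]
sizes = sorted(len(h) for h in subgroups)
print(len(subgroups), sizes)
```

Under these assumptions the search reports subgroups of sizes 1, 2, 4, and 8 only, and R4 is one of those of size 4.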


3.6. Twin groups. Comparing the tables for R4 and T (pp. 106, 109), we can see that they differ only by the letters used to denote the elements. After a suitable renaming (e.g., s → r0, r → r1, b → r2, l → r3) one table will become exactly the same as the other. Therefore, these groups are indistinguishable from an algebraic point of view. We call them isomorphic groups and denote this by R4 ≅ T.

Exercise 5. Are the groups R4 and S isomorphic? Why or why not?

So far, we have found only two non-isomorphic groups of order 4: T and S.

Exercise 6. Verify that the symmetry group of a (non-square) rectangle also has order 4 and decide if it is isomorphic to T or S.

Problem 5. (Advanced) Is there another group of order 4 non-isomorphic to T and to S? How about a fourth group of order 12 that is non-isomorphic to G1, G2, and G3?

Note that all differences between the 3D-solids (or, for that matter, between the planar figures we encountered), must be related to how their symmetries combine. In each case, the group of symmetries has a certain algebraic structure. Group theory studies this structure.
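The "suitable renaming" behind an isomorphism can be searched for by brute force when the groups are tiny. This is a hedged sketch, not the book's method: it tries every one-to-one renaming of the elements and checks whether it turns one multiplication table into the other. The encodings of T, R4, and S below (quarter turns modulo 4, rotation indices modulo 4, and pairs combined by exclusive-or) are assumptions of the illustration.

```python
from itertools import permutations


def is_isomorphic(elems_g, op_g, elems_h, op_h):
    """Search for a renaming f with f(xy) = f(x)f(y) for all x, y."""
    if len(elems_g) != len(elems_h):
        return False
    for image in permutations(elems_h):
        f = dict(zip(elems_g, image))
        if all(f[op_g(x, y)] == op_h(f[x], f[y])
               for x in elems_g for y in elems_g):
            return True
    return False


# T: soldier turns as quarter turns modulo 4 (assumed encoding).
T = ["s", "r", "b", "l"]


def t_op(x, y):
    return T[(T.index(x) + T.index(y)) % 4]


# R4: the rotation rj is stored as the index j.
R4 = [0, 1, 2, 3]


def r4_op(i, j):
    return (i + j) % 4


# S: sock moves as (foot switched?, inside out?) pairs (assumed encoding).
S = [(0, 0), (1, 0), (0, 1), (1, 1)]


def s_op(x, y):
    return (x[0] ^ y[0], x[1] ^ y[1])


print(is_isomorphic(T, t_op, R4, r4_op))
print(is_isomorphic(R4, r4_op, S, s_op))
```

Trying all renamings takes n! steps for a group of order n, so this approach is only sensible for very small groups; comparing counts as in PST 33 scales much better.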

4. General Groups

An abundance of groups naturally arise as “action groups”: we devoted a good amount of time studying them. However, some questions about these groups can be better and more easily answered if we momentarily forget about their origins and extract from them only their group essence. And hence, the general (a.k.a. “abstract”) definition of a group:

Definition 2. A group is a nonempty set G together with a binary operation⁷ ∗ on G with the following properties:
(i) a ∗ (b ∗ c) = (a ∗ b) ∗ c for all a, b, c ∈ G (i.e., ∗ is associative).
(ii) There is an identity element e ∈ G, i.e., a ∗ e = e ∗ a = a for all a ∈ G.
(iii) For each a ∈ G, there is an inverse element a^{-1} ∈ G, i.e., a ∗ a^{-1} = a^{-1} ∗ a = e.

4.1. Gated communities. Implicit in the above group definition is that the set G is closed under the operation, namely that a ∗ b ∈ G for all a, b ∈ G. It’s worth spending a few moments thinking about the notion of being closed.

Exercise 7. Recall the set T of soldier’s turns, where the (binary) operation is that of performing actions one after another (a.k.a. composing these actions). Is T closed under this operation? What if instead of the entire set T we consider its various subsets? Which of them are closed?

⁷ A binary operation ∗ on G takes two inputs a, b ∈ G and yields one output a ∗ b ∈ G.
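For a finite set, closure and the three properties of Definition 2 can be tested directly. The function is_group below is a hypothetical helper written for this illustration; it simply tries all pairs and triples, so it is meant for small examples only.

```python
from itertools import product


def is_group(elements, op):
    """Check closure and properties (i)-(iii) of Definition 2 by brute force."""
    elements = list(elements)
    # Closure: a * b must stay inside the set.
    if any(op(a, b) not in elements for a, b in product(elements, repeat=2)):
        return False
    # (i) Associativity.
    if any(op(a, op(b, c)) != op(op(a, b), c)
           for a, b, c in product(elements, repeat=3)):
        return False
    # (ii) Identity element (an empty set has none, so it fails here).
    identities = [e for e in elements
                  if all(op(a, e) == a and op(e, a) == a for a in elements)]
    if not identities:
        return False
    e = identities[0]
    # (iii) Every element needs an inverse.
    return all(any(op(a, b) == e and op(b, a) == e for b in elements)
               for a in elements)


print(is_group(range(5), lambda a, b: (a + b) % 5))   # addition modulo 5
print(is_group([0, 1], lambda a, b: a * b))           # 0 has no inverse
print(is_group([1, -1], lambda a, b: a * b))
print(is_group(range(5), lambda a, b: (a - b) % 5))   # not associative
```

The second example shows that being closed is not enough: {0, 1} is closed under multiplication and has the identity 1, yet it fails property (iii).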


Solution: No matter what sequence of turns the soldier performs, in total, he will still have made one of the four allowed turns. Hence T is closed under the operation. (It better be, since you showed earlier that T is a group!) The only other two closed subsets of T are {s} (s is the identity element) and {s, b} (because b^2 = s). Trying to include r forces inclusion of b = r^2, l = r^3, and s = r^4, i.e., of the whole set T; and similarly for l.

Exercise 8. Let U = {0, 1}. Is U closed under multiplication? How about under addition? Can you add one real number to U so that the new set would be still closed under multiplication? More than one number?

Hint: The answer to the last question depends on whether we are allowed to add to U infinitely or finitely many numbers. The former case is simple: just add all real numbers and you cannot go wrong because of the closure of R itself! Adding finitely many numbers, however, requires deeper reasoning. If you add a to U, then you must also add all powers a^n; therefore, there must be only finitely many such distinct powers, i.e., a^n = a^m for some distinct n, m ≥ 1 (why?). For which real a can this happen? ♦

4.2. One too many. It is natural to ask if the objects in Definition 2(ii)-(iii) of a group are unique:

Exercise 9. Can a group have two different identity elements? Can an element of a group have two different inverse elements?

Hint: Both questions will be answered negatively in Multiplicative Functions II. Yet, these are such fundamental facts that they deserve to be proven again. What is the product e1 ∗ e2 for two identity elements e1 and e2? What is the triple product a1 ∗ a ∗ a2 for two inverse elements a1 and a2 of a? Answer each question in two ways and compare your answers. ♦

4.3. A billion or abelian? We denote a group G with operation ∗ by (G, ∗). If in addition to properties (i), (ii), and (iii), (G, ∗) satisfies a ∗ b = b ∗ a for all a, b ∈ G, then G is said to be commutative, or abelian.

The result of performing two actions one after the other usually depends on which is done first and which second. For example, reflecting elephant E1 across a vertical line and then rotating it by 90° clockwise is different from first rotating and then reflecting E1. Indeed, starting with the dark elephant in Figure 4, we eventually arrive at two differently positioned white elephants,⁸ showing that the rotation r1 and the reflection v do not commute with each other. (Try this on your own, and with other figures too!) This is to say that D4 is not abelian.

Figure 4. r1 v ≠ v r1 in D4

⁸ Ignore the translations of E1 in Figure 4: they are done so that we can see the differently positioned elephants, instead of stampeding all of them into each other.


All the same, here is a property that will make a group abelian.⁹

 Problem 6. Show that if a ∗ a = e for all a ∈ G then G is abelian.

Hint: For any two a, b ∈ G, start with (a ∗ b)^2 = e, expand this, and solve for b ∗ a (which will appear in the middle of your expression). ♦

The hypothesis of the problem may be interpreted to say that every element is its own inverse, or that every action is its own counteraction! This was the case in the One Sock group S; now we automatically know that S is abelian, without having to check the commutativity condition for all pairs of sock moves! Problem 6 is a classic in the group theory folklore: it relates the local self-inverse property of individual elements of G (a ∗ a = e) to the global property of G being abelian (a ∗ b = b ∗ a).

5. Some More Examples of Groups

5.1. Total “recall.” Here are some initial examples¹⁰ of groups, with which you have worked ever since you started adding and multiplying numbers. Whether you have realized that the sets below could be treated as groups is an altogether different situation, to be “remedied” right now.

Exercise 10. By using Definition 2, show that the following are groups:
(a) (R, +); R is the set of all real numbers, and + is ordinary addition.
(b) (Z, +); Z is the set of all integers, and + is ordinary addition.
(c) (Zn, +); Zn = {0, 1, 2, . . . , n − 1}, and + is addition modulo n.¹¹
(d) (R∗, ·); R∗ = R − {0} is the set of all non-zero real numbers, and the operation is ordinary multiplication.

5.2. An ocean of symmetries. The next example generalizes our new friends, the dihedral groups Dn. Now, each Dn is the group of symmetries of a regular n-gon, and as such it is finite. In order to obtain an infinite group D∞, we need a figure with infinitely many symmetries. There are several natural choices here. One is the limiting figure of regular n-gons when n becomes large: this is the circle C, with its infinitely many rotations and reflections. Another choice is the real line R with its infinitely many reflections and . . . translations. The example below picks yet a third object.



Exercise 11. Think of Z as the set of all dots marking integers on the real number line (cf. Fig. 4b). Let t be the translation to the right through one unit, and let s be reflection in the origin. We set
D∞ = {e, t, t^{-1}, t^2, t^{-2}, . . . , s, ts, t^{-1}s, t^2 s, t^{-2}s, . . . },
where the operation is composition of transformations. Show that D∞ is the group of symmetries of Z; and that D∞ has some properties similar to those of Dn: s^2 = e and s t^k = t^{-k} s; but unlike Dn, t^k ≠ e in D∞ for any k ≠ 0.

⁹ Q: How many commutative groups are there? A: A billion (“Abelian”).
¹⁰ These and other similar examples also appear in Multiplicative Functions III.
¹¹ Review operations modulo n from Number Theory I; e.g., 5 + 9 ≡ 3 (mod 11).
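The relations in Exercise 11 can be spot-checked numerically. In this sketch, every symmetry of Z is assumed to have the form x → εx + c with ε = ±1 and c an integer, and is stored as the pair (ε, c); this encoding, and the helper names compose, inverse, and power, are choices made for the illustration only. A finite check is not a proof, but it can catch a wrong sign.

```python
E = (1, 0)    # the identity x -> x
T = (1, 1)    # t: x -> x + 1
S = (-1, 0)   # s: x -> -x


def compose(f, g):
    """Return fg: apply g first, then f, for maps x -> eps * x + c."""
    (eps_f, c_f), (eps_g, c_g) = f, g
    return (eps_f * eps_g, eps_f * c_g + c_f)


def inverse(f):
    """Solve y = eps * x + c for x."""
    eps, c = f
    return (eps, -eps * c)


def power(f, k):
    """f composed with itself k times; negative k uses the inverse."""
    base = f if k >= 0 else inverse(f)
    result = E
    for _ in range(abs(k)):
        result = compose(base, result)
    return result


print(compose(S, S) == E)
print(all(compose(S, power(T, k)) == compose(power(T, -k), S)
          for k in range(-10, 11)))
print(all(power(T, k) != E for k in range(-10, 11) if k != 0))
```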


The comparison between S(Z) and Dn justifies the name infinite dihedral group D∞ for S(Z). Still, why wouldn’t S(R) or S(C) work as well as S(Z)?

Figure 4. C-symmetries, Z-symmetries, and cyclic C5

Problem 7. (Advanced) Describe the symmetry groups S(R) of the real line and S(C) of the circle. Are they isomorphic to some other well-known groups? How do they compare to D∞?

To answer these questions fully will require semidirect products of groups and, hence, take us beyond the intended level of this session. The advanced explorer may want to check his/her work against the Hints section.

5.3. Complex world. We can think of the circle C as the set of all complex numbers¹² with magnitude 1: C = {z ∈ ℂ | |z| = 1}, or equivalently, C is the unit circle in the ℂ-plane. For starters,

Exercise 12. Show that ℂ* = ℂ − {0} is a group under ordinary multiplication of complex numbers, and that (C, ·) is a subgroup of (ℂ*, ·).

The fact that (C, ·) is an infinite group does not prevent it from having finite subgroups. Indeed, let n ≥ 1 be an integer, and denote by Cn the set of all roots of the polynomial equation of degree n, z^n − 1 = 0, i.e., Cn = {z ∈ ℂ | z^n = 1}. For example, C2 = {1, −1} and C4 = {1, i, −1, −i}. It is no surprise that all these roots land on the unit circle C: the equation z^n = 1 implies that |z| = 1 and hence z ∈ C. This is illustrated by Figure 4c, depicting the relative positions of the 5 elements of C5 along C. Moreover,

Exercise 13. Show that Cn is a subgroup of (C, ·).

We can actually list all elements of Cn (via de Moivre’s formula for the nth root of z):
Cn = {1, ζn, ζn^2, ζn^3, . . . , ζn^{n−1}},
where ζn = cos(2π/n) + i sin(2π/n) is a primitive nth root of unity, that is, a root whose powers yield all other roots of the equation z^n = 1. We can observe this phenomenon in the above examples:
• as (−1)^2 = 1, the primitive root in C2 is ζ2 = −1;
• in C4 we have i^1 = i, i^2 = −1, i^3 = −i and i^4 = 1, making i a primitive root; but so is (−i) (why?);
• it turns out that there are four primitive roots in C5: any non-identity element generates all of C5 (check it!).
Such situations are so important that there is a special name for them.

Definition 3. If a group G has a generator a then G is called a cyclic group.

¹² To get comfortable with this example, read first Complex Numbers I-II. In particular, the magnitude |z| is the distance from point z = a + bi to the origin; |z^{-1}| = 1/|z| and |z1 z2| = |z1| · |z2| for any z, z1, z2 ∈ ℂ. Exercises 12–13 are solved (under disguise) in these sessions. Primitive roots of unity and de Moivre’s formula appear there too.
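The three bullet points above can be checked with a few lines of code. This is a sketch under an assumption worth stating: since ζn^a · ζn^b = ζn^{a+b}, the element ζn^k is tracked by its exponent k modulo n, while the floating-point roots are used only to confirm numerically that they lie on the unit circle. The function names are hypothetical.

```python
import cmath
from math import gcd, isclose, pi


def roots_of_unity(n):
    """The n complex solutions of z**n = 1, listed as powers of zeta_n."""
    zeta = cmath.exp(2j * pi / n)
    return [zeta ** k for k in range(n)]


def primitive_exponents(n):
    """Exponents k for which the powers of zeta_n**k run through all of Cn."""
    return [k for k in range(n)
            if len({(k * j) % n for j in range(n)}) == n]


for n in (2, 4, 5):
    on_circle = all(isclose(abs(z), 1.0) for z in roots_of_unity(n))
    print(n, on_circle, primitive_exponents(n))

# For comparison: the exponents below 12 that share no factor with 12.
coprime_to_12 = [k for k in range(1, 12) if gcd(k, 12) == 1]
print(primitive_exponents(12), coprime_to_12)
```

For n = 4 the exponents 1 and 3 correspond to i and −i, and for n = 5 every non-zero exponent appears, matching the bullets.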

Thus, Cn is a cyclic group; but so are several other groups we have encountered. While some readers are searching for these cyclic examples, we will pause developing the theory to finish an earlier story.

5.4. Dramatic conclusion to the search for polynomial solutions. We managed to completely solve the equation z^n − 1 = 0 in complex numbers and to describe its group of solutions as the finite cyclic group Cn of order n. Of course, z^n − 1 = 0 is a very special and simple equation. In order to fully understand when and why a general polynomial equation can or cannot be solved in radicals, you need to learn a very beautiful part of abstract algebra called Galois theory (cf. Stewart’s [80]). The theory is named after a French mathematician, Évariste Galois, who died (after a duel) at the age of 20 but who had managed to make fundamental mathematical discoveries and to create a whole new branch of mathematics. Certain of his impending death, the night before the duel Galois outlined his mathematical ideas in a famous letter to his closest friend. Nowadays, Galois Theory is a major part of mathematics programs all over the world: it is incorporated into the upper-division abstract algebra sequence or it constitutes a separate advanced college course. By the way, Galois was the first to use the word “group” in our present sense.

Portrait: Galois (1811–1832)

5.5. Back to the cyclic world. For a group (G, ∗), we often refer to the group operation ∗ as “multiplication”, omit the symbol ∗, and write ab for a ∗ b. If a ∈ G, we denote the product of n copies of a by a^n, and the product of n copies of a^{-1} by a^{-n} (of course, n ∈ N). We also set a^0 = e. With this convention, a group G is cyclic if everything in G is a power of a single element a. In other words, G = {a^n | n ∈ Z}, also denoted as G = ⟨a⟩. Thus, the cyclic group Cn with generator ζn can be written as Cn = ⟨ζn⟩.

Exercise 14. Is the group G1 of rotational symmetries of the pyramid in Problem 3 isomorphic to a cyclic group? Why not G2 or G3?

A group may have an additive operation, e.g., (R, +). In such a case, inverses a^{-1} are written as −a; powers a^n become sums na, e.g., 3a = a + a + a and (−2)a = (−a) + (−a); and a^0 = e is simply 0a = 0. Thus, a group (G, +) is cyclic if G = {na | n ∈ Z}, also written as G = ⟨a⟩. While (R, +) is not cyclic (why?), other familiar groups are.

Exercise 15. Show that (Zn, +) is a cyclic group with n elements. Then prove that any two cyclic groups of order n are isomorphic. Conclude that (Zn, +) and (Cn, ·) are isomorphic, written as (Zn, +) ≅ (Cn, ·).

Hint: Relabeling a generator of (Zn, +) as a generator of (Cn, ·) will make their multiplication tables identical. ♦

As for an infinite cyclic group, we have seen one: the group of integers (Z, +) with its two generators 1 and −1 (explain!), i.e., (Z, +) = ⟨1⟩ = ⟨−1⟩. And this is essentially all that can be seen: any other infinite cyclic group is isomorphic to (Z, +) (Why? Compare with Exercise 15). For instance,

Exercise 16. Let TZ be the set of translations of Z. Show that TZ is an infinite cyclic subgroup of S(Z). Conclude that TZ ≅ (Z, +).

We agreed above that a^{-n} = (a^{-1})^n for every positive integer n (this is simply the meaning of our notation). In order to explore if and how a generates the whole group, we need to be able to manipulate all powers of a:

Exercise 17. Is it true that (a^{-1})^n = (a^n)^{-1} for any integer n? Why?

5.6. Will the court, please, come to order! It is true that every element a ∈ G generates a cyclic subgroup ⟨a⟩ of G. Moreover, if G is finite, there must be a positive n such that a^n = e; otherwise, a will generate an infinite cyclic subgroup ⟨a⟩ of G! In general,

Definition 4. If G is a group and a ∈ G, the smallest positive integer n for which a^n = e is called the order of a and denoted by o(a). If such n does not exist, we say that a has infinite order and write o(a) = ∞.

Here is a bunch of examples. Check them all out on your own!
• In C4 the order o(i) = 4 while o(−1) = 2. However, in C5, all elements (except for the identity 1, of course) have order 5 so that each generates the whole group C5 (cf. Fig. 4c).
• Moving to additive notation, o(3) = 2 in Z6 because 3 + 3 = 0; but o(3) = 4 in Z4 because 3 + 3 + 3 + 3 = 12 ≡ 0 (mod 4) and no smaller sum would yield the identity 0; still yet, o(3) = ∞ in (Z, +) (why?).
• Finally, for the reflection s and the generating rotation r in the dihedral group Dn we have o(s) = 2 while o(r) = n.

Problem 8. Let a and b be elements of a group G, and let o(a) = k.
(a) What is o(a^{-1})? How about o(a^m) for any m ∈ Z?
(b) Prove that H = {e, a, a^2, a^3, . . . , a^{k−1}} is a subgroup of G, previously denoted by ⟨a⟩. Deduce that the order of ⟨a⟩ is o(a).
(c) If o(ab) = n, prove that o(ba) = n too.


Partial solutions: (a) If o(a^m) = s then a^{ms} = e, i.e., o(a) = k divides ms (why?). The smallest s for which this happens is s = k/gcd(k, m) (why?). In particular, for m = −1 we have s = k and o(a^{-1}) = o(a). ♦

(b) The product of any two powers a^i and a^j in H is also a power in H: if i + j ≥ k, simply subtract k to land a^{i+j} = a^{i+j−k} in H. Since a^j and a^{k−j} are inverses of each other, H satisfies the definition of a group. ♦

(c) For concreteness, suppose o(ab) = 3. Then e = ababab. Multiplying on the left by a^{-1} and on the right by a yields a^{-1}ea = bababa, i.e., e = (ba)^3. How does this imply that 3 is also the order of ba? ♦
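The formula s = k/gcd(k, m) from part (a) can be tested in the additive group (Zk, +), where the element a = 1 has order k and the "power" a^m is simply m modulo k. The sketch below is illustrative only; additive_order is a hypothetical helper that finds the order by repeated addition, exactly as in the Z6 and Z4 examples earlier.

```python
from math import gcd


def additive_order(x, n):
    """Smallest positive s with s*x ≡ 0 (mod n), found by repeated addition."""
    s, total = 1, x % n
    while total != 0:
        total = (total + x) % n
        s += 1
    return s


# The examples from the text: o(3) = 2 in Z6 and o(3) = 4 in Z4.
print(additive_order(3, 6), additive_order(3, 4))

# Compare the brute-force order of m in Zk with k / gcd(k, m).
formula_holds = all(
    additive_order(m, k) == k // gcd(k, m)
    for k in range(1, 31)
    for m in range(-30, 31)
)
print(formula_holds)
```

Agreement on these finitely many cases is evidence, not a proof; the two "why?" questions in the partial solution still need an argument.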



Exercise 18. Sometimes “circular reasoning” is useful.
(a) If G is cyclic, show that it is abelian.
(b) If G is cyclic of order n, show that it has an element of order n.
(c) Show that Dn is non-abelian and hence non-cyclic, but it contains a cyclic subgroup of order n.

Hint: (c) Consider the set of rotational symmetries of a regular n-gon. ♦

5.7. A never-ending cycle? Can an infinite group have elements of finite order? Not only is the answer yes, but you have worked many times in the “extreme” scenario:



Problem 9. Give at least two different examples of (infinite) groups that contain elements of order n for every n ≥ 1.

Hint: Two possible answers are among the groups on pp. 112–113.



We constructed the cyclic groups Cn as examples of finite subgroups of the circle C. Are these all finite subgroups of ℂ*? The ingredients for the solution to our final problem below are spread all over this section.

Problem 10. (Intermediate) Find all finite subgroups of (ℂ*, ·).

6. Permutation (or Symmetric) Groups

Permutation groups are the substance of Rubik’s Cube I-II. Indeed, their complexity is what makes the Rubik’s Cube such a tantalizing and challenging puzzle. Even though permutations provide “just” examples of groups, they are so fundamental for the development of group theory that it is worthwhile reviewing them again here and doing all associated exercises. If you feel strongly prepared for the topic, tackle on your own the 15-puzzle in Problem 1 and rejoin us later for the official “showdown” via permutations.

6.1. The word permutation has at least five mathematical synonyms.

Definition 5. Let A be a set of n elements. A permutation α of A is a rearrangement of the elements of A. In other words, α is a 1-to-1 function from A onto A, a.k.a. a 1-to-1 correspondence or a bijection of A.


For example, let n = 5, and let us denote the elements of A by numbers, e.g., A = {1, 2, 3, 4, 5}. It is convenient to represent a permutation α by a table with two rows as follows:

    α = ( 1 2 3 4 5 )
        ( 4 3 5 1 2 ),

where α(1) = 4, α(2) = 3, α(3) = 5, α(4) = 1, α(5) = 2. Thus, α sends 1 to position 4, 2 to position 3, and so on. From the viewpoint of group theory, the first thing to notice is that the product of two permutations is a permutation as well (why?). For instance,

    if β = ( 1 2 3 4 5 )  then  αβ = ( 1 2 3 4 5 ) ( 1 2 3 4 5 ) = ( 1 2 3 4 5 )
           ( 2 3 1 5 4 )             ( 4 3 5 1 2 ) ( 2 3 1 5 4 )   ( 3 5 4 2 1 ).

It is no surprise that in the above calculation we applied first β and then α but wrote αβ in the standard right-to-left notation. To be concrete, (αβ)(1) = α(β(1)) = α(2) = 3, (αβ)(3) = α(β(3)) = α(1) = 4, and so on.
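The product αβ can be recomputed mechanically. In this minimal sketch a permutation of {1, . . . , n} is stored as the bottom row of its two-row table; that storage format and the function name compose are assumptions made for the illustration.

```python
def compose(alpha, beta):
    """Return αβ: apply β first, then α. Entry i - 1 of a tuple is the image of i."""
    return tuple(alpha[beta[i] - 1] for i in range(len(beta)))


alpha = (4, 3, 5, 1, 2)   # bottom row of the table for α
beta = (2, 3, 1, 5, 4)    # bottom row of the table for β

alpha_beta = compose(alpha, beta)
beta_alpha = compose(beta, alpha)
print(alpha_beta)
print(beta_alpha)
```

Reversing the order of the factors gives a different bottom row, so α and β do not commute.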

As A is a subset of N, the permutations of A can be also viewed as symmetries of A: they map A onto itself, just as the reflection across a vertical line maps the full-face elephant E2 onto itself. As shown in Rubik’s Cube II, the set of all permutations of A = {1, 2, . . . , n} forms a group, called the symmetric group on n elements and denoted by Sn. Let us review some basic facts that make Sn a group.
• To obtain the inverse of a permutation, simply return all elements of A to their original positions. For example, check that

    α^{-1} = ( 1 2 3 4 5 )   and   αα^{-1} = α^{-1}α = e = ( 1 2 3 4 5 )
             ( 4 5 2 1 3 )                                 ( 1 2 3 4 5 ).

• The identity permutation e above is, of course, the do-nothing permutation e(i) = i for all i.
• Finally, α(βγ) = (αβ)γ for any α, β, γ ∈ Sn because the composition of any functions is an associative operation.

6.2. Law & Order in Permuterland. In a high court of the kingdom of Permuterland,¹³ there are three judges for every trial. In the grand tradition of algebra, let’s call them A, B, and C. They file in at the beginning of any trial and sit at the table in the order ABC. But when the eccentric king of Permuterland, who attends all trials, yells “Promenade 1,” B and C change places; and when he yells “Promenade 2,” A and B change places; and when he yells “Promenade 3,” A goes to where C was sitting, B goes to where A was sitting, and C goes to where B was sitting.

¹³ The example of Permuterland was introduced in 1973 by Roy Dubish in his Groups (Topics For Mathematics Clubs, [22]). It is interesting to realize that 40 years ago, group theory was considered a suitable topic for budding pre-college mathematicians. The publisher of the book is the National Council of Teachers of Mathematics.


Now, in a hectic mood one day, the king yells “Promenade 1” and, two minutes later, yells “Promenade 2.” To his royal amazement, the king realizes that the judges are now seated exactly as they would be if instead he had just yelled “Promenade 3.” The next day he decides to try this procedure again – but with a slight variation: now he yells “Promenade 2” first and then yells “Promenade 1” – and he is amazed to find out that the result is not the same as Promenade 3; indeed, the result is what he has been calling Promenade 4.

Figure 5. Permuterland

Exercise 19. Of course, you recognize that the “Promenades” are simply elements of some Sk. What is k and how does the example of Permuterland show that this Sk is non-abelian?

Solution: The judges comprise the set A = {A, B, C}. If pj denotes Promenade j, then the King’s favorite promenades p1, p2, p3, and p4 are permutations of A, i.e., they are elements of the symmetric group S3. The King’s observation p1 p2 = p3 ≠ p4 = p2 p1 shows that S3 is non-abelian.

Exercise 20. Write all promenades in Figure 5 in the standard 2-row notation. Which promenades are missing? What is the order of S3? S4? Sn?

Partial Solution: Promenade 5 that switches A and C and the do-nothing Promenade 6 are missing in Figure 5. The order of S3 is thus 6. One does not need to know anything about groups to calculate the order of Sn: in Combinatorics I, we imagined each permutation as a row of n empty slots, to be filled with the numbers 1 through n in some order; this helped us arrive at |Sn| = n!. In particular, |S3| = 3! = 6 and |S4| = 4! = 24. ♦

Exercise 21. Find the order of every element of S3.

Solution: We can view the six promenades in Permuterland according to the number of judges they move. Three promenades switch two judges: p1 = (B ↔ C), p2 = (A ↔ B), and p5 = (C ↔ A). Two promenades rotate the three judges around, in one or the other direction: p3 = (A ← C ← B ← A) and p4 = (A → C → B → A); and promenade p6 does not move anyone. From here, o(p6) = 1, o(p1) = o(p2) = o(p5) = 2, and o(p3) = o(p4) = 3.

Note that p1, p2, and p5 are self-inverses, but p3 = p4⁻¹. Also, none of the individual elements’ orders equals the order 6 of S3; still all of them are divisors of 6. Is this a coincidence? The answer will come up later.

6.3. Cycling again. The standard 2-row notation

$$\pi = \begin{pmatrix} 1 & 2 & \dots & n \\ \pi(1) & \pi(2) & \dots & \pi(n) \end{pmatrix}$$

for any π ∈ Sn is not the only possible way to denote permutations. Let’s use it in the practice exercises below and think if there is a “better” option.

6. PERMUTATION (OR SYMMETRIC) GROUPS


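Exercise 22. Perform the indicated operations:

(a) πρ and ρπ where $\pi = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix}$ and $\rho = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix}$;

(b) $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 3 & 4 & 5 & 1 & 2 & 6 & 7 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 7 & 6 & 2 & 4 & 1 & 3 & 5 \end{pmatrix}$, $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 2 & 3 & 4 & 6 & 1 & 5 \end{pmatrix}^{-1}$, and $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 1 & 4 & 8 & 5 & 7 & 6 & 2 & 3 \end{pmatrix}^{3}$;

(c) $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 \\ 10 & 2 & 9 & 8 & 12 & 3 & 4 & 1 & 11 & 5 & 7 & 6 \end{pmatrix}^{-2}$.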



Notice that the permutation π in (a) has the effect of moving the elements i around in a cycle. Thus, we call it a cycle of length 3 and we write it as π = (1 3 2). This is just another, more convenient, notation for the same permutation. We think of (1 3 2) as representing the following mapping: 1 → 3 → 2 → 1, and we drop the spaces if only one-digit numbers appear. Clearly, (132) = (321) = (213). A cycle of length r is called an r-cycle. A 2-cycle is also called a transposition since it transposes two elements. Which promenades in Permuterland are transpositions and which are 3-cycles?

Exercise 23. Calculate (1356)², (1356)³, and (1356)⁴. What is o((1356))?

It takes 12 “one-hour” rotations for a clock to come back to its original position. Analogously for the r-cycles:

 Problem 11. Prove that an r-cycle is of order r.

Exercise 24. Calculate (1342)(123) and (1534269)−1 . Hint: Remember to apply permutations from right to left! E.g., (1342)(123) sends 2 → 3, then 3 → 4; so overall 2 → 4.♦
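The right-to-left rule in the hint is easy to get backwards, so here is a minimal sketch of it in code. The helper names `cycle_to_map` and `compose` are hypothetical, chosen for this illustration; permutations are stored as dictionaries on {1, ..., n}.

```python
def cycle_to_map(cycle, n):
    """Turn a cycle such as (1, 3, 4, 2) into a dictionary on {1, ..., n}."""
    mapping = {i: i for i in range(1, n + 1)}
    for position, element in enumerate(cycle):
        mapping[element] = cycle[(position + 1) % len(cycle)]
    return mapping

def compose(left, right):
    """Product left*right: apply `right` first, then `left`."""
    return {i: left[right[i]] for i in right}

a = cycle_to_map((1, 3, 4, 2), 4)
b = cycle_to_map((1, 2, 3), 4)

right_to_left = compose(a, b)   # (1342)(123): first (123), then (1342)
other_order = compose(b, a)     # (123)(1342): first (1342), then (123)

print(right_to_left[2])              # where 2 ends up, as in the hint
print(right_to_left == other_order)  # do the two orders agree?
```

The first printed value reproduces the hint (2 → 3 → 4), and the comparison shows that reading the same two cycles in the other order gives a different permutation.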

[Figure: a clock face with the hours 1 through 12, labeled Z12.]

Not all permutations are cycles (obviously!) . . . but it is true that they can all be written as products of one or more cycles. For starters,

Exercise 25. Write permutation $\varphi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 3 & 2 & 5 & 6 & 7 & 8 & 1 & 4 \end{pmatrix}$ as a product of cycles.

Now generalize this result to any permutation, on your way to a formula for the order of permutations. Problem 12. Prove the following statements. (a) Every permutation can be expressed as a product of disjoint cycles, i.e., cycles which have no common elements. (b) Every permutation can be expressed as a product of transpositions. (c) The order of the product of disjoint cycles is the least common multiple (lcm) of the lengths of these cycles. “Proof” by Example: You should have found out in Exercise 25 that φ = (1357)(468) is the product of two disjoint cycles. Disjoint cycles are great because they commute: it doesn’t matter which way you write them, you will get the same result; e.g., (1357)(468) = (468)(1357). Thus, powers


of the permutation can be computed by taking individual powers of each cycle; e.g., φ⁴ = (468)⁴(1357)⁴ = (468) (why?). In order to eliminate the 3-cycle (468) you need a power φᵏ where k is a multiple of 3; similarly, k must be a multiple of 4 in order to eliminate the 4-cycle (1357) (why?). This leads to the inevitable conclusion that raising φ only to multiples of 12 will make it the identity. Therefore, o(φ) = 12 = lcm(3, 4). ♦

Further, (1357) = (17)(15)(13) and (468) = (48)(46), so that one way to represent φ as a product of transpositions is φ = (17)(15)(13)(48)(46). ♦
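As a sanity check on the lcm rule, the sketch below splits a permutation into disjoint cycles and compares the lcm of the cycle lengths with the order found by brute-force repeated multiplication. The function names are hypothetical; φ is entered from its 2-row form in Exercise 25.

```python
from math import gcd

def disjoint_cycles(perm):
    """Split a permutation (dict i -> perm[i]) into disjoint cycles,
    omitting fixed points."""
    seen, cycles = set(), []
    for start in sorted(perm):
        cycle, current = [], start
        while current not in seen:
            seen.add(current)
            cycle.append(current)
            current = perm[current]
        if len(cycle) > 1:
            cycles.append(tuple(cycle))
    return cycles

def order_by_lcm(perm):
    """Order computed as the lcm of the disjoint cycle lengths."""
    result = 1
    for cycle in disjoint_cycles(perm):
        result = result * len(cycle) // gcd(result, len(cycle))
    return result

def order_by_powers(perm):
    """Order computed naively: multiply by perm until the identity appears."""
    identity = {i: i for i in perm}
    power, k = dict(perm), 1
    while power != identity:
        power = {i: perm[power[i]] for i in perm}
        k += 1
    return k

phi = dict(zip(range(1, 9), (3, 2, 5, 6, 7, 8, 1, 4)))
print(disjoint_cycles(phi), order_by_lcm(phi), order_by_powers(phi))
```

Both computations agree for φ; trying other permutations is a quick way to build confidence in Problem 12(c) before proving it.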

6.4. Permutations are born unequal! Can you guess why we didn’t use the transpositions to calculate o(φ)? The key reason is that transpositions cannot always be made disjoint; hence, they may not commute and, in general, are not convenient in calculating the order of the permutation. Nevertheless, representing a permutation as a product of transpositions plays a crucial role in solving the 15-puzzle, so don’t discard transpositions yet!

Definition 6. A permutation is said to be even if it can be expressed as the product of an even number of transpositions. A permutation is odd if it can be expressed as the product of an odd number of transpositions.

For example, (123) and (12)(2543) are even since (123) = (13)(12) and (12)(2543) = (12)(23)(24)(25); while (1234) = (14)(13)(12) is odd, and so is φ above. The identity permutation is even: (1) = (12)(12).

An “annoying” question should pop up in your mind: Isn’t it possible to write the same permutation in two different ways, once as a product of an even number of transpositions, and once as a product of an odd number of transpositions? If yes, this would completely obviate the meaning of the above definition! We urgently need to resolve this question.

Problem 13. Prove the following facts about even and odd permutations.
(a) The identity permutation is not odd.
(b) Every permutation in Sn is either even or odd but not both.
(c) An r-cycle is even if and only if r is odd.

Part (a) may initially strike you as strange: why care so much about the specific case of the identity not being odd? If (b) is proven, wouldn’t it subsume (a)? Still, stating (a) separately is no mistake.

 PST 35. To prove a property for all permutations, first prove it for the

special case of the identity e and then reduce the general case to the case for e. Here is how this idea applies specifically to our problem.

“Proof” by Example: (b) If some α ∈ Sn were both even and odd, that would force e itself to be odd! Indeed, suppose that (hypothetically!) α = t1 t2 = q1 q2 q3 for some transpositions ti and qj. Since every transposition is its own inverse, ti² = e for all i and (t1 t2)⁻¹ = t2 t1 (Why? Check it!). We can now eliminate the LHS by pre-multiplying everything by t2 t1:


(t2 t1)(t1 t2) = (t2 t1)(q1 q2 q3) ⇒ e = t2 t1 q1 q2 q3.

This represents e as a product of 5 transpositions, which contradicts part (a), where e is shown not to be odd. Of course, you should repeat this proof with arbitrary even and odd numbers of ti’s and qj’s, respectively. ♦

This was the easy part. The hard part will come up when you try to show that e cannot be odd, as desired in (a).

Skeleton of a proof: (a) By contradiction, start with e = t1 t2 · · · t2n+1 being a shortest representation of e as a product of an odd number of transpositions. WLOG, suppose that 1 does appear in this representation, and locate the first 1 from right to left, in some tk = (1a). By considering 4 different possibilities for the previous transposition tk−1, show that you can rewrite tk−1 tk in an equivalent form t′k−1 t′k where 1 appears only in the left transposition t′k−1. Conclude that you can consecutively move 1 to the left¹⁴ until it appears only in the leftmost transposition t1 = (1a). This is impossible, as e = (1a)t2 t3 · · · t2n+1 is supposed to fix everything, yet it actually moves 1 to a! Thus, 1 cannot appear in the representation of e, which is yet another contradiction (why?). ♦

Notice that in the above proof, we used a minimality principle:

 PST 36. Choosing to work with a shortest representation of e wrt a certain property gives you grounds for contradicting later this same shortest length.

In particular, in Problem 13b, if tk−1 = (1a), then tk−1 tk = (1a)(1a) = e reduces the length of the representation by 2. While keeping the number of transpositions odd, this blatantly contradicts the assumption of minimality. Now that the hard work is done, and we are certain that each permutation is either even or odd (but not both!), we have a simple algorithm:

PST 37. To find the parity of any π ∈ Sn, write π in some way as a product of transpositions. The parity of the number of these transpositions will be equal to the parity of π.

Proof: (c) In part (a) earlier, when moving the 1 to the left, you probably used an equality like (1b)(1a) = (1a)(ab), where both sides equal the 3-cycle (1ab). A similar representation holds true for any r-cycle:

$$(a_1 a_2 \dots a_r) = \underbrace{(a_1 a_2)(a_2 a_3) \cdots (a_{r-2} a_{r-1})(a_{r-1} a_r)}_{r-1 \text{ transpositions}}.$$

From here, the parity of an r-cycle matches the parity of r − 1.
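PST 37 and part (c) together give a parity test that never writes out the transpositions: split the permutation into cycles and count r − 1 for each r-cycle. A minimal sketch follows; the names `parity` and `r_cycle` are hypothetical helpers for this illustration.

```python
def parity(perm):
    """Return 'even' or 'odd' for a permutation given as a dict,
    counting r - 1 transpositions for every r-cycle."""
    seen, transpositions = set(), 0
    for start in perm:
        length, current = 0, start
        while current not in seen:
            seen.add(current)
            current = perm[current]
            length += 1
        if length:                      # a new cycle of this length was traced
            transpositions += length - 1
    return "even" if transpositions % 2 == 0 else "odd"

def r_cycle(r):
    """The r-cycle (1 2 ... r) as a dictionary."""
    return {i: i % r + 1 for i in range(1, r + 1)}

phi = dict(zip(range(1, 9), (3, 2, 5, 6, 7, 8, 1, 4)))   # (1357)(468)

for r in range(2, 8):
    print(r, parity(r_cycle(r)))
print("phi:", parity(phi))
```

The loop prints alternating parities for r = 2, 3, 4, ..., matching part (c), and φ = (1357)(468) comes out odd, as claimed after Definition 6.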



In fact, there are a number of other ways to represent a cycle as a product of transpositions. Can you think of several more?

14 The technical details of this move are included in the Hints section. A very different proof of parts (a) and (b) was already featured in Rubik II.


Let us denote the set of all even permutations of Sn by An.

Exercise 26. Show that the product of two permutations is even iff they are of the same parity, and that a permutation and its inverse have the same parity. Conclude that An is a subgroup of Sn, called the alternating group on n elements.

[Figure: An drawn inside Sn, with e and (132) in An, and (12) and (1432) outside it.]

6.5. Permutations in space. The relationship between An and Sn can explain a geometric phenomenon which we encountered earlier. Recall the group of all rotations G3 of the regular tetrahedron T in Problem 3. Since T has 4 vertices and every rotation permutes them (why?), it is obvious that G3 must be a subgroup of S4. But G3 is not the entire S4:

Problem 14. Explain why G3 is the set of all even permutations A4. And while you are at it, what is the group of all symmetries of the tetrahedron?

Almost a solution: There are two types of axes of rotation for the tetrahedron T (cf. Fig. 3c), each of which defines a different type of even permutation of the vertices: a product of two disjoint transpositions (ab)(cd), or a product of two non-disjoint transpositions (ab)(bc) (a.k.a., a 3-cycle). As for the full group S(T), you can use brute force to list all 24 symmetries of T (cf. Fig. 7a). Or you can be “sneaky” and use the fact that there is no group G properly sitting between A4 and S4 (no G with A4 ⊊ G ⊊ S4). (The proof will appear via Lagrange’s Theorem in Group Theory II.) ♦

[Figure 7. More symmetries: the tetrahedron T and the cube C.]

The cube, on the other hand, is a different matter altogether:

Problem 15. Prove that the group of rotations of the cube is the entire S4.

Veiled hint: Can you think of the 4 things in the cube C which are being permuted by every rotation of the cube and whose group of permutations “coincides” with the group G4 of rotations of the cube? ♦

Problem 16. (Advanced) What is the group of all symmetries of the cube?

Hints through answer: The full S(C) turns out to be twice as big as S4, but certainly not as big as S8. Why? Which permutations of the cube’s vertices cannot be obtained by symmetries of C? In fact, S(C) is the direct product of S4 with the cyclic group C2 = {1, −1}, i.e., S(C) = S4 × C2. Now, the identity element 1 of C2 is easy to interpret geometrically (how?), but what is the geometric meaning of the other element −1 of C2? It cannot be a rotation of the cube (as all of those are already included in S4); but it is definitely a symmetry of order 2 (why?), so what is it? ♦


7. The 15-Puzzle Puzzled Out 7.1. Double tasking. Whoever seriously attempts Problem 1 realizes that

 PST 38. Solving the 15-puzzle consists of two distinct parts:

construction and elimination. The construction part amounts to an algorithm that demonstrates which positions of the puzzle are attainable. The elimination part is essentially a proof (by methods beyond the trial-and-error of the physical game) that the remaining positions cannot be attained. With our group theory knowledge, we are fully equipped to do the elimination part. We start with some preliminary work.

 PST 39. The first step in applying a theory to a problem is to interpret the problem in the setup of the theory.

In particular, you need to find out the group represented by the 15-puzzle, making sure that the puzzle movements match the group operation.

7.2. Is the 15-puzzle really a 15-puzzle? It should be clear by now that the game has something to do with permutations. At a first glance, these permutations are elements of S15. Indeed, let us agree to read the numbers line by line from left to right, starting at the top left corner and ending with the empty cell in the bottom right corner.

Exercise 27. Interpret the arrangements in Figure 1 as permutations in S15.

Some answers: Figure 1a depicts the identity (1), Figure 1b the product of 4 transpositions (1,4)(2,3)(9,12)(10,11), and Figure 1c the 12-cycle

$$\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 \\ 10 & 9 & 8 & 7 & 11 & 2 & 1 & 6 & 12 & 3 & 4 & 5 & 13 & 14 & 15 \end{pmatrix} = (1, 10, 3, 8, 6, 2, 9, 12, 5, 11, 4, 7).$$

Write the permutation in Figure 1d also in cyclic notation.



Yet, the moves of the empty cell cannot be interpreted as permutations in S15 ! The empty cell interferes, rendering intermediate arrangements that are something different and cannot be encoded just by looking at S15 . In order to apply our group theory knowledge, we want to be able to think of all arrangements of the 15-puzzle (regardless of where the empty cell is) as some permutations.

 PST 40. The key idea is to close our eyes and imagine that the number 16 is written in the empty cell. Then each move of the puzzle is simply a transposition of 16 with an adjacent number, and each arrangement of the 15-puzzle is a permutation in S16 . This allows us to think of S15 as a part of S16 : for every α ∈ S15 insert 16 at the end of α, i.e., set α(16) = 16; thus, S15 becomes the subset of S16 consisting of all permutations that fix 16 (cf. Fig. 5a). Moreover,

Exercise 28. Prove that, viewed as above, S15 is a subgroup of S16.

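[Figure 5, panels (a) to (e): (a) the subgroups S15 and A16 inside S16, with the arrangements e, α1, ..., α8 marked; (b) the path P drawn on e; (c), (d), (e) the arrangements α1, α2, and α8:

    α1               α2               α8
    1  2  3  4       1  2  3  4       1  2  3  4
    5  6  7  8       5  6  7  8       5 10  6  8
    9 10 11 16       9 10 16 11       9 14  7 11
   13 14 15 12      13 14 15 12      13 15 12 16  ]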

Figure 5. Even promenades in Puzzleland

7.3. Strolling along in Puzzleland. As you have certainly noticed, all arrangements in Problem 1 have “16” in their last position (bottom right corner), and as such they are elements of our S15 inside S16. To get from the identity arrangement e ∈ S15 to another such arrangement α ∈ S15, the 16-cell must leave its position, trace a path along the puzzle, and in the end return to its initial position, the bottom right corner. Figure 5b displays one such path P for 16: P = ULULDDRR, where U = up, L = left, etc. Figures 5c-e show the results α1 = (16,12) and α2 = (16,11)(16,12) after one and two moves along P, as well as the final result α8 after 16 completes P.

Problem 17. Write down α8. Is it odd or even? What if you choose another closed¹⁵ path for 16: what will the parity of the final permutation be? Why?

Solution: As P has 8 steps, α8 is even: α8 = (16, 12)(16, 15)(16, 14)(16, 10)(16, 6)(16, 7)(16, 11)(16, 12). An arbitrary closed path P′ will still return 16 to its original position. Hence, P′ has as many up as down steps, and as many left as right steps. In short, P′ has some even length 2k, forcing the final permutation α2k to be a product of 2k transpositions and, therefore, necessarily even.
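The walk of the 16-cell along P can be replayed step by step. The sketch below is an illustration under stated assumptions: the board is stored as a flat list read row by row (the reading order agreed on in 7.2), and `slide` and `is_even` are hypothetical helper names. `slide` also records which number 16 trades places with at each step, i.e., the transpositions (16, x) in the order they are performed.

```python
def slide(board, path):
    """Move the 16-cell along `path` (letters U, D, L, R) on a 4x4 board
    stored row by row; return the final board and the numbers met."""
    board = list(board)
    step = {"U": -4, "D": 4, "L": -1, "R": 1}
    met = []
    for move in path:
        here = board.index(16)
        row, col = divmod(here, 4)
        if (move, row) in {("U", 0), ("D", 3)} or (move, col) in {("L", 0), ("R", 3)}:
            raise ValueError("the path leaves the board")
        there = here + step[move]
        met.append(board[there])
        board[here], board[there] = board[there], board[here]
    return board, met

def is_even(board):
    """Parity via cycles: n minus the number of cycles (fixed points
    included) is the number of transpositions needed."""
    seen, cycles = set(), 0
    for start in range(1, len(board) + 1):
        if start not in seen:
            cycles += 1
            current = start
            while current not in seen:
                seen.add(current)
                current = board[current - 1]
    return (len(board) - cycles) % 2 == 0

alpha8, met = slide(range(1, 17), "ULULDDRR")
for row in range(4):
    print(alpha8[4 * row: 4 * row + 4])
print(met, is_even(alpha8))
```

Reading the recorded list from right to left gives the product of transpositions in the solution above, and shorter prefixes of P (for example the single step "U") produce odd arrangements, in line with the zigzag in and out of A16.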

The diagram in Figure 5a illustrates what is happening along the 8-step path P. Notice that S15 and the alternating group A16 are both subgroups of S16; their intersection S15 ∩ A16 (in white) is simply A15, the set of even permutations in S15. The path P starts at the identity e in this intersection. Taking one step along P multiplies e by one transposition and hence lands on some odd α1 ∉ A16. The next step produces an even α2 ∈ A16; the third step yields again an odd α3 ∉ A16, and so on. The path zigzags, going in and out of A16, until it finally lands on the even α8 ∈ S15 ∩ A16 = A15.

What is the conclusion? If you want to start from the identity permutation e and end with the empty cell still in the bottom right corner, you must stroll along an even-length (closed) path, which will terminate in the intersection S15 ∩ A16 and hence your final result will always be an even permutation! In other words,

Theorem 1. Odd permutations defy the 15-puzzle: only even permutations can be obtained in the 15-puzzle.

15 A closed path means that it starts and ends at the same place.


In particular, you will never be able to reach the 12-cycle in Figure 1c (why?). On the other hand, the even permutation in Figure 1b still has a chance of being reached! We will determine whether this is so in the next subsection. How about the permutation in Figure 1d? 7.4. Playing the puzzle. Now it remains to show that any even permutation can be obtained via the 15-puzzle. The best way to do this is . . . to play the puzzle. But not randomly! Here is a vastly simplifying idea:



PST 41. Instead of starting with the identity e and finding your way to any even arrangement α of the 15-puzzle, reverse the process – start with α and try to reach e.16 You do not need 16 anymore; so use the empty cell instead. Problem 18. Here is the beginning of one possible algorithm to convert any even permutation α to e. Think about each step and how to perform it.



(1) Move 1 to the top left position. Without displacing 1, move 2 on its right; now, without shifting 1 or 2, move 3 to 2’s right.
(2) Move 4 to 3’s right (this may temporarily displace 1, 2, and 3). By now you have arranged the first row into 1, 2, 3, 4.
(3) Using the same algorithm (without touching the first row), you can arrange the second row into 5, 6, 7, 8.
(4) With some more care (without touching the first two rows), you can rearrange the third row into 9, 10, 11, 12.
(5) Push the empty cell to the rightmost position on the fourth row.

Call the resulting permutation β, i.e., α → β. What can β be? So far, we know that β has the numbers 1 through 12 in their correct positions. Since we started with an even α, Theorem 1 ensures that β must be even too. But there are only 3 ways to rearrange the remaining numbers {13, 14, 15} in β and still be even: the 3-cycles β1 = (13,14,15) and β2 = (13,15,14), or the identity e itself. If you manage to convert β1 → e, then applying the same algorithm to β2 will convert it to β1 (why? β2² = β1), so you will again reach e after another application of your algorithm: β2 → β1 → e.

What is left is probably the hardest conversion you can make in the 15-puzzle: it looks simple, but it captures the true spirit of the puzzle. Prove that

Theorem 2. In the 15-puzzle it is possible to convert the 3-cycle β1 = (13,14,15) to the identity arrangement e. Conclude that all even permutations are reachable.
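A computer search does not replace the proof of Theorem 2, but it can confirm that a conversion β1 → e exists and give you something to study. The sketch below makes a simplifying assumption that is not in the text: the top two rows are frozen, so the empty cell (written 0) wanders only in the bottom two rows, a 2×4 board small enough for breadth-first search. The function names are hypothetical.

```python
from collections import deque

def solve_bottom_rows(start, goal):
    """Breadth-first search on the bottom two rows (a 2x4 board stored
    row by row, 0 = empty cell). Returns a shortest string of moves of
    the empty cell, or None if `goal` cannot be reached."""
    moves = {"U": -4, "D": 4, "L": -1, "R": 1}
    queue, seen = deque([(start, "")]), {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        blank = state.index(0)
        row, col = divmod(blank, 4)
        for name, delta in moves.items():
            if (name, row) in {("U", 0), ("D", 1)} or (name, col) in {("L", 0), ("R", 3)}:
                continue                      # this move would leave the 2x4 board
            cells = list(state)
            cells[blank], cells[blank + delta] = cells[blank + delta], cells[blank]
            nxt = tuple(cells)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + name))
    return None

identity = (9, 10, 11, 12, 13, 14, 15, 0)
beta1 = (9, 10, 11, 12, 14, 15, 13, 0)        # bottom row 14 15 13
swapped = (9, 10, 11, 12, 14, 13, 15, 0)      # an odd arrangement, for contrast

print(solve_bottom_rows(beta1, identity))
print(solve_bottom_rows(swapped, identity))
```

The search returns a move sequence for β1 and nothing for the odd arrangement, consistent with Theorems 1 and 2; turning the found sequence into an argument you can explain by hand is still the real exercise.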

[Figure: the arrangement β1 (bottom row 14 15 13) is to be converted into the identity arrangement e (bottom row 13 14 15); in both arrangements the first three rows read 1 2 3 4, 5 6 7 8, and 9 10 11 12.]

16 You can think of this as if you are going from your house to an unknown place and then back. Which way will be easier to cover? Probably from the unknown place back home, because you are likely to recognize more and more familiar scenes and road markers as you approach your house. Traveling to a familiar place will give you an advantage to take alternative routes or find out with greater ease where you are.


8. Hints and Solutions to Selected Problems

Exercise 1. The multiplication table for the Turning Soldier group T is shown in Figure 6a. The product of any two actions is also one of our four actions: s, r, b, and l. Further, every action has a counteraction in this table; to see this fast, observe that for every row labeled by action x there is some column labeled by action y such that the row and the column intersect in the “do nothing” action s: xy = s. Thus, rl = lr = bb = ss = s. In particular, the counteraction of l is r, the counteraction of b is b itself, etc.: every element has a counteraction in T! By definition of a group, we have established that the Turning Soldier group T is a group indeed!

   (a)  · | s  r  b  l        (b)  · | n  c  i  t
        s | s  r  b  l             n | n  c  i  t
        r | r  b  l  s             c | c  n  t  i
        b | b  l  s  r             i | i  t  n  c
        l | l  s  r  b             t | t  i  c  n

Figure 6. Tables for the Turning Soldier and the One Sock groups

Note that b² = r⁴ = l⁴ = s (why?), which gives more reasons to include the “do nothing” action s in T. In fact, s has the very special property that if we multiply it by something else, we will get that something else: sx = xs = x for any action x ∈ T. This property is clearly demonstrated by the row and by the column labeled by s, as they mimic precisely the labeling row and labeling column in the table. ♦

Exercise 2. The table for the One Sock group S is given in Figure 6b. The interesting observation here is that every action is its own counteraction: x² = n for any x ∈ S. We observe this along the diagonal of the table, which is filled only with the “do nothing” action n. However, we didn’t observe a similar phenomenon in the Turning Soldier group T: r and l were counteractions of each other, but certainly not of themselves! This already makes the situation very suspicious: the two groups T and S must be different somehow, despite the fact that each has 4 elements. ♦

Exercise 3. Let i, r1, and r2 denote the rotations of Δ by 0°, 120° counterclockwise, and 120° clockwise, and let s1, s2, and s3 denote the three reflections of Δ across its altitudes (as in Fig. 2). The partial multiplication table for S(Δ) is displayed in Figure 8c. For example, s1 r1 = s3 and s1 s2 = r2; but multiplying the other way around yields r1 s1 = s2 and s2 s1 = r1. To see how to get these results fast, label Δ as ABC and track down where the vertices go under the rotations and the reflections (cf. Fig. 7). For practice, the beginner should complete the whole table for S(Δ).

[Figure 7. In S(Δ) = D3: s1 r1 = s3 and s1 s2 = r2. The panels track the vertices A, B, C of Δ under these compositions.]

The more advanced reader will realize that it is not necessary to go through the grueling calculations of finding the exact multiplication table for every group: a more general argument is usually much faster and more elegant. In our situation with S(Δ): think about why the composition of two symmetries of Δ is again a symmetry of Δ, and why a symmetry always has a counteraction, i.e., a “reverse” symmetry that undoes it. For example, the counteraction of r1 is r2, and of s1 is s1 itself. ♦

   (a)  · | i       (b)  · | i  s       (c)  ·  | i   r1  r2  s1  s2  s3
        i | i            i | i  s            i  | i   r1  r2  s1  s2  s3
                         s | s  i            r1 | r1  r2  i   s2  s3  s1
                                             r2 | r2  i   r1  s3  s1  s2
                                             s1 | s1  s3  s2  i   r2  r1
                                             s2 | s2  s1  s3  r1  i   r2
                                             s3 | s3  s2  s1  r2  r1  i

Figure 8. Tables for symmetry groups S(E1), S(E2), and S(Δ)

If you are still unsure which “symmetries” of our figures we are allowed to consider in this session, check out the footnote on page 107: the allowable symmetries are called Euclidean motions. These are motions (bijections) of the plane that preserve distances, also known as rigid motions or isometries: imagine your figure made of cardboard and you want to transform the figure onto itself without bending, twisting, pinching, or doing other horrible stuff to the cardboard. Thus, a symmetry of a plane figure is not just any bijection of the figure onto itself: it is a rigid motion. For example, switching the vertices A and B of a square ABCD while leaving the other two vertices C and D fixed is not part of a symmetry of the square (why?).

Be aware that in some sources “rigid” motions exclude orientation-changing motions like reflections (a reflection changes a clockwise orientation ABCD of the square to a counterclockwise orientation of the vertices, i.e., ADCB). However, we will consider reflections as part of our symmetry groups in this session.

Finally, a reflection across a line combined with a translation along this line is what is called a glide reflection. For any plane figure, its symmetry group will be generated by and will consist of the four types of plane transformations mentioned in the text: rotations, reflections, translations, and glide reflections. This is a fact that needs a proof, and we leave it to the more experienced reader to provide such a proof.

Exercise 4. For n ≥ 3, Dn has 2n elements: n rotations and n reflections. The pattern breaks for n = 1 and n = 2. Of course, we may never think


of a point or a segment as a “regular” 1-gon or 2-gon; but if we do, we will find out that D1 = {i} = S(E1 ) and D2 = {i, s} = S(E2 ) (cf. Fig. 8a–b). Thus, the sequence of Dn ’s sizes is {1, 2, 6, 8, 10, . . .}, and it misses the even number 4. One way to achieve a symmetry group of size 4 is to put two E2 elephants on top of each other or, equivalently, to consider the symmetry group of a (non-square) rectangle (cf. Fig. 9a). 
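The counts in Exercise 4 can be confirmed by brute force for small figures. The sketch below rests on an assumption that is standard but not proved in the text: for a polygon, a symmetry is determined by what it does to the vertices, and a permutation of the vertices comes from a rigid motion exactly when it preserves all pairwise distances. The helper names and the sample rectangle are choices made for this illustration.

```python
from itertools import permutations
from math import cos, sin, pi

def count_symmetries(points):
    """Count the permutations of the vertices that preserve every
    pairwise distance (squared distances are rounded to absorb
    floating-point noise)."""
    n = len(points)
    dist = [[round((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2, 9) for q in points]
            for p in points]
    return sum(
        all(dist[perm[i]][perm[j]] == dist[i][j]
            for i in range(n) for j in range(i + 1, n))
        for perm in permutations(range(n))
    )

def regular_polygon(n):
    """Vertices of a regular n-gon inscribed in the unit circle."""
    return [(cos(2 * pi * k / n), sin(2 * pi * k / n)) for k in range(n)]

rectangle = [(0, 0), (2, 0), (2, 1), (0, 1)]   # a non-square rectangle

print([count_symmetries(regular_polygon(n)) for n in range(3, 7)])
print(count_symmetries(rectangle))
```

The regular polygons give the sizes 6, 8, 10, 12 of D3 through D6, and the rectangle fills the gap at 4.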

[Figure 9. S(rectangle) and |S(F)| = 3: (a) the rectangle; (b) two equilateral triangles with common center O, one rotated by the angle α.]

Problem 2. In Figure 9b, two equilateral triangles share the same center O and can be obtained from each other by a rotation and a rescaling; the rotation is about O at some angle α ≠ kπ/3 (k ∈ Z), e.g., α = 45°, while the rescaling has some ratio r ≠ 1, e.g., r = 1.5. It is easy to see that the union of these two triangles has only 3 (rotational) symmetries, written as |S(F)| = 3. Generalize this example to |S(F)| = n for any n ≥ 1. ♦

Problem 3. The hardest question to answer here is why G1, G2, and G3 are actually groups. You can show this by brute force for each of the groups (e.g., compute their multiplication tables). The true explanation, however, is that the composition of any two rotations in space is again a rotation in space, the proof of which can be done with linear algebra methods (e.g., multiplying the so-called orthogonal matrices) and is beyond the scope of this session.

Now, having accepted that we are indeed dealing with groups G1, G2, and G3, you can find plenty of reasons for these groups to be different. The text suggests that G1 has a generating rotation; if rj is the rotation about the vertical axis of the pyramid by (30j)° clockwise, then applying r1 repeatedly j times will yield rotation rj for all j; so r1 certainly generates all of G1. However, r5, r7, and r11 also generate G1: either check this by brute-force examination of all their repeated applications or, if you are more advanced, use slick reasoning from number theory to conclude that the generating rotations are precisely those rj’s for which j is relatively prime to 12, i.e., j = 1, 5, 7, or 11. On the other hand, if a solid has more than 1 rotational axis, there is no hope for it to have a generating rotation: indeed, every rotation can generate at most some other rotations about its own axis, but certainly not about another rotational axis! Thus, G2 and G3 lack single generators!
The text asks us also to pay attention to non-trivial rotations that counteract themselves: such a rotation can only be by 180◦ about the corresponding axis (why?). Each of the rotational axes of our solids has such a special rotation. Therefore, the number of “self-counteracting” rotations for each solid is the number of axes for that solid: 1, 7, and 6, respectively. ♦
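The “brute-force examination” of generators of G1 mentioned in the solution of Problem 3 takes only a few lines. In this sketch a rotation rj by (30j)° is represented by the integer j modulo 12, so repeating rj amounts to repeated addition of j; the function name is a hypothetical choice.

```python
from math import gcd

def generated_rotations(j, n=12):
    """All rotations (as multiples of 360/n degrees) obtained by
    applying r_j over and over."""
    reached, angle = set(), 0
    while True:
        angle = (angle + j) % n
        if angle in reached:
            break
        reached.add(angle)
    return reached

generators = [j for j in range(1, 12) if len(generated_rotations(j)) == 12]
coprime = [j for j in range(1, 12) if gcd(j, 12) == 1]
print(generators, coprime)
```

The two lists coincide, matching the number-theoretic description: rj generates G1 exactly when j is relatively prime to 12.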


Problem 4. If we add a reflectional symmetry s to R4 (i.e., s is a reflection across one of the two diagonals or across one of the two midsegments of the square), then the products r0 s, r1 s, r2 s, r3 s must also be in our subgroup of D4. These products are obviously 4 different symmetries: all first apply s to the square, but then each continues with a different rotation rj of the square. In addition, each reflection of the square switches the labeling of the vertices of the square from clockwise to counterclockwise orientation (check it!); yet any rotation of the square preserves the orientation of this labeling (check it!). Hence, each product rj s first changes the orientation of the labeling (via s) and then preserves this new orientation (via rj); so overall, rj s changes the orientation of the vertices’ labeling and, thus, must be one of the reflections of the square (cf. Fig. 10a).

(a) The table of D4 splits into four blocks: products of two rotations (the block R4) and products of two reflections are rotations, while products of a rotation and a reflection, in either order, are reflections.

(b) Partial multiplication table of D4 (entries marked “.” are left blank in the figure):

        ·  | r0  r1  r2  r3  s1  s2  s3  s4
        r0 | r0  r1  r2  r3  s1  s2  s3  s4
        r1 | r1  r2  r3  r0  .   .   .   .
        r2 | r2  r3  r0  r1  s2  s1  s4  s3
        r3 | r3  r0  r1  r2  .   .   .   .
        s1 | s1  .   s2  .   r0  r2  .   .
        s2 | s2  .   s1  .   r2  r0  .   .
        s3 | s3  .   s4  .   .   .   r0  r2
        s4 | s4  .   s3  .   .   .   r2  r0

Figure 10. Rotations vs. reflections in D4



Therefore, r0 s, r1 s, r2 s, and r3 s are the four distinct reflections of the square, and our subgroup R4 ∪ {r0 s, r1 s, r2 s, r3 s} of D4 has expanded to include all 8 elements of D4. To paraphrase, there is no subgroup of D4 strictly between the rotational subgroup R4 and D4 itself.

It is clear that {r0} (called the trivial or the identity subgroup) is the only subgroup of D4 of size 1; and that the subgroups of size 2 consist of the identity r0 plus a self-counteracting symmetry, i.e., these are {r0, r2} and {r0, sj} for any reflection sj. The previous argument shows that once a subgroup K contains r1 and some reflection sj, then K contains everything, i.e., K = D4. A similar argument can be applied to r3 and any reflection sj, since r3 generates the rotational subgroup R4, just as r1 does. Thus, the rotations in any other subgroup of D4 are at most r2 and r0 (of course).

Now, if you complete the full multiplication table for D4, you will notice that r2 is a very special element: it commutes with everything in D4, i.e., r2 x = x r2 for all x ∈ D4 (cf. Fig. 10b). In particular, if a subgroup K contains r2 and some reflection sj, then r2 sj = sj r2 = sk for some other reflection sk. Since r2² = sk² = sj² = r0, the identity, with some more work, one can manipulate the above equalities to also obtain that r2 sk = sk r2 = sj and sj sk = sk sj = r2. In other words, {r0, r2, sj, sk} already forms a subgroup of D4 of order 4. There are two such subgroups of D4; the pairs {sj, sk}


corresponding to these subgroups are the two reflections {s1 , s2 } across the midsegments of the square, or the two reflections {s3 , s4 } across the diagonals of the square. Any other pair of reflections in your subgroup will multiply to the rotations r1 or r3 (why?), resulting in the whole group D4 (why?). This exhausts all possibilities for subgroups of D4 . In Group Theory II you will learn of more powerful techniques for tracking and classifying subgroups K of a given group G. In particular, |K| divides |G|, which explains why the group D4 of 8 elements ended up having subgroups only of orders 1, 2, 4, and 8, all of which are divisors of 8. ♦ Exercise 5. As was shown in the text, R4 and T are isomorphic; but T and S are not isomorphic (we came up with different number of self-counteractions in each of them). It follows that R4 and S cannot be isomorphic either (why?).  Exercise 6. Every symmetry of the rectangle is its own counteraction (cf. Fig. 9a). Thus, S(rectangle) cannot be isomorphic to T . However, it is isomorphic to S via any relabeling of the four rectangle’s symmetries to the four sock actions that sends the identity symmetry r0 to the “do-nothing” action n. Explain why any such relabeling will work, and count how many i relabellings, called isomorphisms, there are between S(rectangle) and S. ♦ Problem 5. It is slightly “illegal” to ask this question yet, as we haven’t defined groups in general! This does not prevent the reader from glancing ahead at Definition 2 and drawing some conclusions. For example, it is true (and not too hard to verify) that any group G = {e, a, b, c} of order 4 is isomorphic to T or S. Indeed, check that the product of any two non-identity elements of G must equal the identity element e or the third non-identity element, e.g., ac = e or ac = b. (Why ac = a and ac = c?) (a) If ac = e, then a and c are inverses to each other, leaving b to be its own inverse: b2 = e (why?). 
Then the row of a prohibits ab from being equal to a or e (cf. Fig. 11a), and it can’t be b anyways (Why not?), so that the only choice left is ab = c. This in turn leaves only one possibility for a^2 in the row of a: a^2 = b. Using the fact that any row and any column of G’s table contains all elements of G (without repetitions or omissions, why?), you can easily fill in the rest of the table and establish that it is identical to that of T.

(a)  · | e a b c        (b)  · | e a b c
     --+--------             --+--------
     e | e a b c             e | e a b c
     a | a . . e             a | a . c b
     b | b . e .             b | b c . a
     c | c e . .             c | c b a .

Figure 11. If |G| = 4 then G ≅ T or G ≅ S
(A dot marks an entry that is not yet filled in.)

8. HINTS AND SOLUTIONS TO SELECTED PROBLEMS


(b) Similarly, if any other product xy = e for some x, y ≠ e, we end up with G ≅ T. Thus, WLOG, assume that xy = z for any of the three non-identity elements x, y, z in G. This almost completes the table for G (cf. Fig. 11b), leaving only to plug in identity elements along the diagonal: a^2 = b^2 = c^2 = e. Without doubt, G ≅ S. ♦

Thus, there is no “third” group of order 4. The question of a “fourth” group of order 12 is much more involved and requires techniques beyond the current scope of the session. There are, in fact, five non-isomorphic groups of order 12; and, for those familiar with the notation, they are: Z12, Z6 × Z2, D6, A4, Z3 ⋊ Z4. The reader will learn about some of these groups as we move through this part I and part II of Group Theory. ♦

Exercise 8. U is closed under multiplication but not under addition: 1 + 1 = 2 ∉ U. If we want U^+ = {0, 1, a} to be closed under multiplication for a real number a, then a^2 ∈ U^+. But a^2 ≠ 0 and a^2 ≠ a (as a ≠ 0, 1), so we are left with a^2 = 1 and forced to conclude a = −1. Indeed, the set U^+ = {0, 1, −1} is closed under multiplication! If we allow the addition of finitely many numbers to U, for the resulting set U^{++} to be closed under multiplication we must have at least all powers of a in U^{++} for any a ∈ U^{++}. But there are infinitely many such powers of a: a^1, a^2, a^3, ...! By the Pigeonhole Principle, two such powers must coincide, i.e., a^n = a^m for some n > m > 0. From here a^{n−m} = 1 (why?), i.e., a is a root of an equation x^k = 1 for some k ≥ 1. The only real numbers satisfying such equations are ±1. Hence, our previous set U^+ = {0, +1, −1} is the only option for a finite real extension of U that is closed under multiplication. If you allow complex numbers to be added to U, the possibilities become numerous, as each equation x^k = 1 has k distinct complex solutions. This will be discussed in more detail in relation to the cyclic subgroups Cn of the non-zero complex numbers C∗ = C − {0} (cf. Exer. 13).
Exercise 9. Following the hint in the text, the product of two identity elements e1 and e2 can be viewed differently, depending on whether we choose to apply the definition of an identity element to e1 or to e2 : e1 = e1 ∗e2 = e2 . From here e1 = e2 , i.e., any two (and hence all) identity elements are equal. Similarly, if a1 and a2 are inverses of a, the triple product a1 ∗ a ∗ a2 can be calculated two ways, using the associativity property of the operation: (a1 ∗ a) ∗ a2 = e ∗ a2 = a2 and a1 ∗ (a ∗ a2 ) = a1 ∗ e = a1 . From here,  a2 = a1 ∗ a ∗ a2 = a1 , i.e., any two inverses of a are equal. Problem 6. Following the hint, for any a, b ∈ G we expand e = (a ∗ b)2 : e = (a ∗ b) ∗ (a ∗ b) = a ∗ (b ∗ a) ∗ b. Multiply both sides by a on the left and by b on the right: a ∗ e ∗ b = a ∗ (a ∗ (b ∗ a) ∗ b) ∗ b. As a ∗ a = e and b ∗ b = e, the RHS simplifies: a ∗ b = (a ∗ a) ∗ (b ∗ a) ∗ (b ∗ b) = e ∗ (b ∗ a) ∗ e = b ∗ a.  Thus, a ∗ b = b ∗ a and the group is abelian.


Exercise 10. Verifying that (R, +) and (Z, +) are groups should be no problem. To make sure everyone is on the same page, note that the number 0 is the identity element of both groups (why?), and inverses are obtained by the usual negation of a number: a^{−1} = −a (why?) for any a ∈ R or Z. The case of the group (Zn, +) requires some facts from Number Theory I. For instance, to establish that the operation + is well-defined in Zn (what is this and why are we concerned about it here?), we need the fundamental lemma that adding congruences modulo n is a valid operation in Zn: if a ≡ b (mod n) and c ≡ d (mod n) then a + c ≡ b + d (mod n). Again, 0 will serve as the identity element and −a = n − a will be the inverse of a (mod n). Regarding the group (R∗, ·) it is important to understand that removing the number 0 from R is necessary, as 0 has no multiplicative inverse (i.e., no reciprocal). The identity element in (R∗, ·) is the number 1 this time! ♦

Exercise 11. It is conventional in mathematics to define t^0 as the identity element in any group; in particular, t^0 = e in D∞. Note that by the listing of the elements of D∞ it is clear that D∞ is generated by two elements: starting with t and s, we keep adding to D∞ all powers of t and of s, and all the resulting products of such powers. The list for D∞ contains only the products in the form t^k s^m; but what about something like s^2 t^3 s^{−5} t^{−2}? As D∞ is supposed to be a group, this product must be in D∞! Is it? To start with, s^2 = e as s is a reflection. Thus, s^{−1} = s and we can simplify s^2 t^3 s^{−5} t^{−2} to t^3 s t^{−2}. To get this into the desired form t^k s^m we need a rule which somehow moves every s through every t from left to right: this is precisely what the problem is asking us to prove, i.e., s t^k = t^{−k} s. Indeed, if x is any integer, here is how the two sides of the proposed equality act on x:

s t^k (x) = s(t^k (x)) = s(x + k) = −(x + k) = −x − k,
t^{−k} s(x) = t^{−k}(−x) = (−x) − k = −x − k.

We conclude that s t^k (x) = t^{−k} s(x) for all x ∈ Z, i.e., the transformations themselves are equal: s t^k = t^{−k} s. In particular, t^3 s t^{−2} = t^3 (s t^{−2}) = t^3 (t^2 s) = (t^3 t^2) s = t^5 s. In practice, this rule boils down to pushing every s to the right and representing any product in D∞ in the form t^k s^m for k ∈ Z and m = 0, 1. In conclusion, D∞ is closed under the group operation of composition. Associativity is automatic because any composition of actions (or functions) is associative: (f ◦ (g ◦ h))(x) = ((f ◦ g) ◦ h)(x) = f(g(h(x))) for all x. And inverses are easy to find too: (t^k s^m)^{−1} = s^m t^{−k} ∈ D∞ (why?). Thus, D∞ is a group.

More interestingly, D∞ is the group of all symmetries of the integer line Z. Indeed, a (rigid) symmetry φ of Z must preserve adjacency among integers; in particular, φ(0) and φ(1) must be adjacent integers; thus, if φ(0) = k, then φ(1) = k + 1 or φ(1) = k − 1. In the first case, we are forced to declare φ(2) = k + 2, φ(−1) = k − 1 and so on, i.e., φ = t^k is translation


by k units. In the second case, φ can be realized as the composition of a reflection and then a translation: φ = t^k s (why?). In either case, the symmetry φ belongs to D∞. As D∞ consists only of rigid symmetries of Z, we conclude that the two groups are identical: D∞ = S(Z).

Problem 7. Let s be the reflection across a vertical line through the origin in all symmetry groups, r_α a rotation of the circle C by α degrees clockwise about the origin, and t_q a translation by q units of the real line R. Check the following basic relations for all angles α and β and all real numbers q and u: s^2 = e, s r_α = r_{−α} s, s t_q = t_{−q} s, r_α r_β = r_{α+β}, and t_q t_u = t_{q+u}. To understand the answers below, the reader needs to be familiar with the semidirect product ⋊ of groups, according to which the above relations give:
• Dn ≅ Zn ⋊ Z2 ≅ S(Cn); D∞ ≅ Z ⋊ Z2 ≅ S(Z);
• S(R) ≅ R ⋊ Z2; S(C) ≅ R/Z ⋊ Z2.
Here Cn is the set of vertices of a regular n-gon under complex multiplication, which is a cyclic group of order n (cf. Exer. 13). The quotient R/Z (another standard construct of abstract algebra) can be thought of as the interval [0, 1] with its endpoints identified, which can be easily visualized as the unit circle C. ♦

Exercise 12. That (C∗, ·) is a group follows in much the same way as you showed that (R∗, ·) is a group: C∗ is closed under complex multiplication (if you multiply two non-zero complex numbers, you will get a non-zero complex number), the operation is associative, the number 1 is in C∗ and acts as the identity element there, and any z ∈ C∗ has an inverse in C∗:

\[ z^{-1} = \frac{1}{z} = \frac{1\cdot\bar{z}}{z\cdot\bar{z}} = \frac{1}{z\bar{z}}\,\bar{z} = \frac{1}{|z|^{2}}\,\bar{z} \in \mathbb{C}^{*}. \]

Recall here that z̄ is the complex conjugate of z, and that z z̄ = |z|^2 is always a positive real number for z ≠ 0. To show that the unit circle C is a subgroup of C∗, note that 1 ∈ C and that C is closed under multiplication and taking reciprocals: if z, w ∈ C, then |z| = |w| = 1 so that |zw| = |z| · |w| = 1 and |1/z| = 1/|z| = 1, i.e., zw and z^{−1} are both in C.
Exercise 13. If z1 and z2 are two roots of the equation z^n = 1, then their product is too, as well as their reciprocals: (z1 z2)^n = z1^n z2^n = 1 · 1 = 1, and (1/z1)^n = 1/(z1^n) = 1/1 = 1. Hence, Cn is closed under multiplication and taking inverses. Noting that 1 is always a root of the equation completes the proof that Cn is a subgroup of C, and hence of C∗ too.

Exercise 14. We already discussed earlier that G1 is generated by its rotation r1, and thus G1 = ⟨r1⟩ is cyclic of order 12. Sending any power r1^k ↦ ζ_n^k ∈ Cn, with n = 12, defines an isomorphism from G1 to Cn. As G2 and G3 have no generators, they are not cyclic, and hence not isomorphic to Cn. ♦


Exercise 15. (Zn, +) is generated by 1, as any k ∈ Zn can be written as k = k · 1 = 1 + 1 + · · · + 1 (mod n), and hence (Zn, +) is cyclic with n elements. For further challenge, find all generators m of (Zn, +): they will be precisely the numbers m that are relatively prime to n, i.e., gcd(m, n) = 1 (why?). For an isomorphism φ : (Zn, +) ≅ (Cn, ·) send a generator to a generator, and follow with the powers: k · 1 ↦ ζ_n^k for all k = 1, 2, ..., n. ♦

Exercise 16. From Problem 7 on D∞, we know that the set of translations of Z is described by TZ = {e, t, t^{−1}, t^2, t^{−2}, ..., t^k, t^{−k}, ...} = ⟨t⟩, where t is the translation by 1 to the right. Thus, TZ is cyclic of infinite order, and as such, it is isomorphic to any cyclic group of infinite order, e.g., Z = ⟨1⟩; indeed, show that the map φ : TZ → Z defined by φ(t^k) = k for all k ∈ Z is an isomorphism. ♦

Exercise 17. By definition of the inverse of b = a^n in a group G, we need only to verify that b^{−1} b = e = b b^{−1}, i.e., that (a^{−1})^n a^n = e = a^n (a^{−1})^n. Do this using associativity of the operation and work from inside out. For example, for n = 3 we can calculate a^3 (a^{−1})^3 as follows, replacing the innermost product a a^{−1} by e at each step:

(aaa)(a^{−1} a^{−1} a^{−1}) = a(a(a a^{−1})a^{−1})a^{−1} = a(a a^{−1})a^{−1} = a a^{−1} = e.

Problem 8. (a) Note that whenever a^n = e then (a^{−1})^n = (a^n)^{−1} = e^{−1} = e, and vice versa: if (a^{−1})^n = e then a^n = ((a^{−1})^n)^{−1} = e^{−1} = e. Thus, the same powers of a and a^{−1} equal e, i.e., the orders of these elements are the same (why?). In particular, in the present case, o(a^{−1}) = k. For o(a^m) we need the following key lemma:

Lemma 1. If a^n = e for some n ≠ 0, then a has finite order k that divides n.

Proof: If n < 0, then −n > 0 and by the previous exercise we have a^{−n} = (a^n)^{−1} = e^{−1} = e. Thus, WLOG we may assume n > 0 so that some positive power of a equals e: a^n = e. But then there is a smallest positive power of a which equals e, i.e., let k > 0 be the smallest integer such that a^k = e. By definition, k = o(a). We want to show that k | n. To this end, divide n by k: n = kq + r for some quotient and remainder q, r ∈ Z such that 0 ≤ r < k. We can now calculate a^n in two different ways: a^n = a^{kq+r} ⇒ e = (a^k)^q · a^r = e^q a^r = a^r ⇒ a^r = e. If r ≠ 0, this will produce a positive integer r smaller than k with a^r = e, a contradiction. Thus, r = 0 and n = kq, i.e., k | n.

Back to Problem 8(a). Moving on to o(a^m) for any m ∈ Z, we already know that this order is finite because (a^m)^k = (a^k)^m = e^m = e. So set o(a^m) = s, i.e., e = (a^m)^s = a^{ms}. By Lemma 1 applied to a^{ms} = e, we conclude that o(a) = k divides ms. Here m and k are given to us, and we are trying to find the smallest s > 0 for which this happens. If gcd(k, m) = d and k = d k1, m = d m1 for some relatively prime k1, m1 ∈ Z,


then k | ms iff (d k1) | (d m1 s) iff k1 | (m1 s) iff k1 | s (why?). So, the smallest s is k1: s = k1 = k/d = k/gcd(k, m). Conversely, if s is given by this formula, then

\[ (a^{m})^{s} = a^{ms} = a^{m\cdot\frac{k}{\gcd(k,m)}} = a^{d m_{1}\cdot\frac{k}{d}} = a^{m_{1}k} = (a^{k})^{m_{1}} = e^{m_{1}} = e. \]

Therefore, o(a^m) = k/gcd(k, m).
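The formula o(a^m) = k/gcd(k, m) can be sanity-checked by brute force in the additive group (Zn, +), where the generator a = 1 has order k = n and the "power" a^m is simply m (mod n). The following is a minimal sketch under those assumptions; the function names are illustrative and do not come from the text.

```python
from math import gcd


def additive_order(m: int, n: int) -> int:
    """Smallest s > 0 with s*m ≡ 0 (mod n), found by direct search."""
    s = 1
    while (s * m) % n != 0:
        s += 1
    return s


def order_by_formula(m: int, n: int) -> int:
    """Order predicted by Problem 8(a) with k = n: k / gcd(k, m)."""
    return n // gcd(n, m)


# Compare the two computations for every element of Z_n, n = 1..29.
for n in range(1, 30):
    for m in range(n):
        assert additive_order(m, n) == order_by_formula(m, n)

print(additive_order(4, 12), order_by_formula(4, 12))
```

A finite check like this is no substitute for the proof above; it only confirms that the formula agrees with a direct search on small cases.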

Exercise 18. (a) If G is cyclic, then all its elements are powers of the same a ∈ G. Hence, the product of any two elements b, c ∈ G can be calculated by these powers: bc = a^i a^j = a^{i+j} = a^{j+i} = a^j a^i = cb. Thus, G is abelian because the addition among the integers i and j is also abelian.

In (b), let G = ⟨a⟩. If o(a) = k, by Problem 8(b), H = {e, a, a^2, ..., a^{k−1}} is already a (cyclic) subgroup of G, with k elements. But any power a^j of a equals some element of H! Indeed, if we divide j by k, i.e., j = kq + r with quotient q and remainder r (0 ≤ r < k), then a^j = a^{kq+r} = (a^k)^q a^r = e^q a^r = a^r ∈ H. Thus, the whole group G equals H, and their orders must be the same: n = |G| = |H| = k, i.e., o(a) = n = |G|.

(c) In our previous notation for Dn, let rk be the rotation of a regular n-gon A0 A1 A2 ... A_{n−1} which takes vertex A0 to Ak, for k = 0, 1, 2, ..., n − 1. Then (r1)^k = rk for all k, i.e., the rotation r1 generates the rotational subgroup Rn = {r0, r1, ..., r_{n−1}} of Dn. In particular, Rn = ⟨r1⟩ is a cyclic subgroup of order n. Now, if s1 is the reflection of the n-gon across the perpendicular bisector of A0 A1, then check that r1 s1 ≠ s1 r1. Indeed, while r1 and s1 both send vertex A0 to A1, the two compositions r1 s1 and s1 r1 act overall differently on A0: r1 s1 (A0) = r1 (A1) = A2 ≠ s1 r1 (A0) = s1 (A1) = A0. Therefore, Dn is non-abelian. By part (a), Dn can’t be cyclic either.
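The computation in part (c) can be replayed on vertex indices. The sketch below assumes the vertices A0, ..., A_{n−1} are labeled by 0, ..., n − 1, models r1 as i ↦ i + 1 (mod n) and s1 as i ↦ 1 − i (mod n), and uses the text's right-to-left convention for products; the helper names are illustrative.

```python
def make_rotation(n):
    """r1: sends vertex A_i to A_{i+1} (indices mod n)."""
    return lambda i: (i + 1) % n


def make_reflection(n):
    """s1: reflection across the perpendicular bisector of A_0 A_1,
    which swaps A_0 with A_1 and, in general, sends A_i to A_{1-i}."""
    return lambda i: (1 - i) % n


def compose(f, g):
    """The product fg in the text's convention: apply g first, then f."""
    return lambda i: f(g(i))


def images(f, n):
    """List the images of vertices 0, 1, ..., n-1 under f."""
    return [f(i) for i in range(n)]


n = 5
r1, s1 = make_rotation(n), make_reflection(n)

# r1 s1 sends A_0 to A_2, while s1 r1 sends A_0 back to A_0.
print(images(compose(r1, s1), n))
print(images(compose(s1, r1), n))
```

Any n ≥ 3 may be substituted for 5; the two products already differ on vertex 0.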



Problem 9. Following the hint, let’s examine the examples in Exercise 10. The elements of (R, +) all have infinite orders (why?), except for the identity element 0, whose order is 1. The subgroup (Z, +) follows suit (why?) and doesn’t produce anything interesting in terms of orders of elements. The group (Zn, +) is finite, hence it can’t have elements of orders larger than n (why?); in fact, every j = 1, 2, ..., n ∈ Zn has order k = n/gcd(j, n) ≤ n. (This needs a proof!) The only elements x of (R∗, ·) with finite orders are those for which x^n = 1 for some positive n; but the only real numbers satisfying such equations have absolute value 1 (why?), i.e., x = 1 or x = −1, with orders 1 and 2, respectively. No luck here either! Moving to D∞ and using Exercise 11, check that every translation t^m with m ≠ 0 has infinite order, while every product t^m s is a reflection and hence has order 2. Thus, the only finite orders of elements in D∞ are 1 and 2. Dn is a finite group of order 2n; so it won’t have elements of order greater than 2n; in fact, check that the largest order of an element in Dn is n, attained by the rotations rj with gcd(j, n) = 1. The symmetries of the real line R behave in much the same way as D∞ (cf. solution to Problem 7); so again no luck here.


Finally, let’s examine the symmetries S(C) of the unit circle C. Any rotation r_{2π/n} has order n (why?), and hence S(C) is an infinite group containing elements of any finite order n, for n ∈ N. Note that S(C) also has elements of infinite order; for example, any rotation r_{aπ} where a is an irrational number will never compose several times with itself to give the identity rotation (why?). For instance, r_{√2 π} and r_{π^2} fall into this category.

The final example is the unit circle C itself, or the larger group in which it is contained: the non-zero complex numbers under multiplication, (C∗, ·). Since any cyclic group Cn = ⟨ζn⟩ is contained in C, and since o(ζn) = n, our infinite groups C and C∗ have elements of any order n ∈ N.

Problem 10. Let G be a finite subgroup in (C∗, ·). Then the same discussion we had in the hints about U^{++} in Exercise 8 applies to G too! Indeed, any a ∈ G has infinitely many powers {a^j}, all of which are inside the finite group G. By PHP, it follows that two of those powers must coincide, i.e., a^n = a^m for some n > m, from which a^{n−m} = 1 for n − m > 0, and hence a has some finite order k (why?). In other words, a^k = 1 in C, which means: (a) a has a finite order in G, and (b) the modulus of a is 1 and hence a lies on the unit circle C. Therefore, G ⊂ C. Starting from 1 ∈ G ∩ C, let’s walk along C counterclockwise. Since G is finite, after 1 ∈ G, there will be a first element g of G which we will hit along our walk. Let the angle of g with the real axis be α, i.e., g = cos α + i sin α. We claim that g generates all of G. Indeed, let h ∈ G, h ≠ 1, and h ≠ g. Then the angle of h is larger than α, i.e., h = cos β + i sin β with 0 < α < β < 2π. Keep subtracting α from β until you hit a negative angle for the first time: say, β − (l + 1)α < 0 but γ = β − lα ≥ 0, so that γ − α < 0. This means that b = cos γ + i sin γ, and in group terminology, b = h · g^{−l} ∈ G. Thus, b ∈ G has angle γ such that 0 ≤ γ < α.
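By the minimality of α this is impossible, unless γ = 0, i.e., β = lα, b = 1, and hence h = g^l. So all elements of G are in the cyclic subgroup generated by g. This certainly means that G equals its own subgroup, i.e., G = ⟨g⟩. As we showed in (a) above with g in place of a, g must have some finite order q. We conclude that G = ⟨g⟩ is the cyclic group Cq = ⟨ζq⟩, which we encountered earlier. Thus, all finite subgroups of (C∗, ·) are precisely the cyclic subgroups Cn for any n ∈ N.

Exercise 22.
\[ \text{(a)}\ \pi\rho = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix};\quad \rho\pi = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix};\qquad \text{(b)}\ \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 7 & 6 & 4 & 1 & 3 & 5 & 2 \end{pmatrix}; \]
\[ \text{(c)}\ \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 5 & 1 & 2 & 3 & 6 & 4 \end{pmatrix};\quad \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 1 & 7 & 8 & 2 & 4 & 6 & 5 & 3 \end{pmatrix};\quad \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 \\ 4 & 2 & 12 & 11 & 1 & 5 & 9 & 7 & 6 & 8 & 3 & 10 \end{pmatrix}. \]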

Exercise 23. (1356)2 = (15)(36), (1356)3 = (1653), (1356)4 = (1) = e. Hence, o((1356)) = 4. 
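The powers in Exercise 23 can be reproduced mechanically by composing the cycle with itself. This is a small sketch that stores a permutation of {1, ..., 6} as a Python dictionary; the representation and the helper names are assumptions made for illustration, not notation from the text.

```python
def cycle_to_map(cycle, n):
    """Permutation of {1, ..., n} given by a single cycle, as a dict i -> image of i."""
    perm = {i: i for i in range(1, n + 1)}
    for idx, a in enumerate(cycle):
        perm[a] = cycle[(idx + 1) % len(cycle)]
    return perm


def compose(p, q):
    """The product pq: apply q first, then p."""
    return {i: p[q[i]] for i in q}


def power(p, k):
    """p composed with itself k times (k >= 0)."""
    result = {i: i for i in p}
    for _ in range(k):
        result = compose(p, result)
    return result


alpha = cycle_to_map((1, 3, 5, 6), 6)
identity = {i: i for i in range(1, 7)}

print(power(alpha, 2))              # swaps 1 with 5 and 3 with 6: (15)(36)
print(power(alpha, 3))              # 1 -> 6 -> 5 -> 3 -> 1: (1653)
print(power(alpha, 4) == identity)  # True, so o((1356)) = 4
```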


Problem 11. If the r-cycle is α = (a1 a2 · · · ar), then α^k ≠ e for any k = 1, 2, ..., r − 1. Indeed, α^k sends a1 ↦ a_{k+1} ≠ a1. However, α^r = e as every element will move r slots to the right, i.e., it will come back to its original position. Thus, o(α) = r.

Exercise 25. φ = (1357)(2)(468) = (1357)(468).



Problem 12. (a) The statement is obvious for n = 1: the only permutation around is (1). Given a permutation α ∈ Sn, take some a1 ∈ {1, 2, ..., n} and track down where it goes under α; let α(a1) = a2, α(a2) = a3, α(a3) = a4 and so on, i.e., in general α(a_k) = a_{k+1}. Keep on going, until you come back to a1, i.e., α(aj) = a1 for some j ≤ n. This will always happen. Indeed, the sequence {a1, a2, a3, ..., an, a_{n+1}} consists of n + 1 numbers while we have only n numbers to work with! By PHP, two elements of the sequence must be the same! Let ai = aj, with i < j, be the first two elements that are equal. If i > 1, then the permutation α has hit the value ai = aj twice: indeed, a_{i−1} ≠ a_{j−1} (why?) but α(a_{i−1}) = ai = aj = α(a_{j−1}). This contradicts the bijectivity of α! Thus, the only possibility is for the first repetition in the sequence to involve a1: a1 = aj where 2 ≤ j ≤ n + 1. Thus, α contains the (j − 1)-cycle (a1, a2, ..., a_{j−1}). The remaining k = n − (j − 1) numbers in {1, 2, ..., n} must be permuted amongst themselves by α (again because of bijectivity of α); we can think of this permutation β as an element of Sk for some k < n. Thus, α = (a1, a2, ..., a_{j−1})β. By strong induction on n, we can write β as a product of disjoint cycles, which implies that α can be written as such a product too.

(b) By (a), we can split any permutation α as a product of several disjoint cycles αi: α = α1 α2 · · · αr. It remains to represent any cycle as a product of (not necessarily disjoint) transpositions. The text suggests one way to do this via a specific example. Here is a general formula for a cycle αi:

αi = (a1 a2 a3 ... aj) = (a1 aj)(a1 a_{j−1})(a1 a_{j−2}) · · · (a1 a4)(a1 a3)(a1 a2).

Indeed, track down where each of the elements of αi goes in the RHS, being careful to apply the transpositions in the correct order, from right to left! For example, in the RHS we send a1 → a2, a2 → a1 → a3, ..., a_{j−1} → a1 → aj, and aj → a1, which is precisely what we want to happen.
(c) Let α = α1 α2 · · · αr be a product of disjoint cycles. As any two disjoint cycles commute: αi αj = αj αi, we can easily compute any power of α by rearranging and combining together the same cycles: α^k = α1^k α2^k · · · αr^k. In order for this whole product to be e, we must “kill” each power αj^k. Let the length of αj be lj. By Problem 11, we know that o(αj) = lj, so that the first power that kills αj is its length: αj^{lj} = e. Moreover, within the proof of Problem 8, we showed that a^k = e iff o(a) divides the exponent k. Thus, αj^k = e iff lj | k. As this applies to all cycles αj, we conclude that α^k = e iff k is divisible by all lengths l1, l2, ..., lr. The smallest k that makes this happen is their least common multiple: k = lcm(l1, l2, ..., lr).
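Parts (a) and (c) together suggest a simple procedure: trace each element until its cycle closes, then take the least common multiple of the cycle lengths. The sketch below follows that procedure and compares the result with a brute-force order computation, using the permutation φ = (1357)(468) of Exercise 25 viewed in S8. The dictionary representation and the function names are assumptions made for illustration.

```python
from functools import reduce
from math import gcd


def disjoint_cycles(perm):
    """Split a permutation (dict i -> image of i) into disjoint cycles by tracing
    each not-yet-visited element until it returns to its starting point."""
    seen, cycles = set(), []
    for start in sorted(perm):
        if start in seen:
            continue
        cycle, nxt = [start], perm[start]
        seen.add(start)
        while nxt != start:
            cycle.append(nxt)
            seen.add(nxt)
            nxt = perm[nxt]
        cycles.append(tuple(cycle))
    return cycles


def order_via_lcm(perm):
    """Order of perm as the lcm of its cycle lengths (Problem 12(c))."""
    lengths = (len(c) for c in disjoint_cycles(perm))
    return reduce(lambda a, b: a * b // gcd(a, b), lengths, 1)


def order_brute_force(perm):
    """Smallest k > 0 such that perm composed with itself k times is the identity."""
    identity = {i: i for i in perm}
    current, k = dict(perm), 1
    while current != identity:
        current = {i: perm[current[i]] for i in perm}
        k += 1
    return k


# phi = (1357)(2)(468) from Exercise 25, as a permutation of {1, ..., 8}.
phi = {1: 3, 3: 5, 5: 7, 7: 1, 2: 2, 4: 6, 6: 8, 8: 4}
print(disjoint_cycles(phi))
print(order_via_lcm(phi), order_brute_force(phi))
```

For φ the cycle lengths are 4, 1, and 3, so both computations give the order lcm(4, 1, 3) = 12.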


Problem 13. (a) The arising 4 cases depend on whether, how much, and how exactly t_k = (1a) overlaps with the previous transposition t_{k−1}:
• Complete overlap: t_{k−1} = t_k = (1a) so that t_{k−1} t_k = (1a)(1a) = e and we can erase t_{k−1} t_k from the product, thereby reducing the number of transpositions by 2 and contradicting the minimality of this odd-length representation of e.
• No overlap: t_{k−1} = (bc) for some b and c different from 1 and a. Then t_k and t_{k−1} commute: t_{k−1} t_k = (bc)(1a) = (1a)(bc).
• Partial overlap 1: t_{k−1} = (1b) for some b different from a and 1. Then t_{k−1} t_k = (1b)(1a) = (1ab) = (1a)(ab).
• Partial overlap 2: t_{k−1} = (ba) for some b different from a and 1. Then t_{k−1} t_k = (ba)(1a) = (1ba) = (1b)(ba).
While the first case is impossible, in the last three cases we managed to move 1 to the left in the total product. ♦

Exercise 26. Write any two even permutations α and β as products of transpositions: α = t_1 t_2 · · · t_{2m} and β = q_1 q_2 · · · q_{2k}. Then αβ can be written as the product of 2m + 2k transpositions, i.e., αβ is an even permutation. Furthermore, the inverse is α^{−1} = (t_1 t_2 · · · t_{2m})^{−1} = t_{2m}^{−1} t_{2m−1}^{−1} · · · t_2^{−1} t_1^{−1} = t_{2m} t_{2m−1} · · · t_2 t_1, i.e., α^{−1} is also an even permutation. We have shown that An, the set of even permutations, is closed under taking products and inverses. It definitely contains the identity e, and its operation is associative (as it is so in Sn). Hence An is a subgroup of Sn. Using similar arguments, the reader can practice showing that the product of two odd permutations is even, while the product of an odd and an even permutation is odd, and that the inverse of an odd permutation is odd. ♦

Problem 14. Any symmetry of the tetrahedron is a permutation on its 4 vertices. The rotational symmetries correspond to the 12 elements of A4: e, the three (disjoint) products (ab)(cd), and the eight 3-cycles (abc). A new (reflectional) symmetry adds to A4 a new, odd permutation (ab). Thus, 13 ≤ |S(T)| ≤ 24 = |S4|.
But as the text alludes, the order of a subgroup divides the order of the group, and the only number between 13 and 24 dividing 24 is 24 itself, i.e., S(T) = S4. ♦

Problem 15. The cube has 6 pairs of opposite edges, through the midpoints of which passes a rotational axis with only 1 non-trivial rotation by 180°. The cube has 3 pairs of opposite faces, through the centers of which passes a rotational axis with 3 non-trivial rotations (by 90°, 180° and 270°). Finally, there are 4 pairs of opposite vertices (forming the 4 diagonals), through which passes a rotational axis with 2 non-trivial rotations (by 120° and 240°). With the identity this makes a total of 1 + 6·1 + 3·3 + 4·2 = 24 rotational symmetries of the cube, i.e., |G4| = 24. Can we identify G4 with a well-known group?


The cube has 8 vertices. As the hint in the text suggests, we are looking for 4 objects that are permuted; such are the 4 pairs of diagonally opposite vertices. Indeed, the 4 diagonals of the cube are permuted amongst themselves by any symmetry of the cube (why?). If l is the line through the midpoints of any two diagonals di and dj, then l is perpendicular to the other two diagonals dk and dm (can you see it?). Hence, the rotation about l by 180° will switch di and dj but fix dk and dm, thereby inducing the transposition (di dj) ∈ S4. As any permutation is a product of transpositions, these transpositions (di dj) generate the whole S4 of permutations of {d1, d2, d3, d4}. Thus, S4 ⊆ G4. But |S4| = 4! = 24 = |G4|, so that G4 = S4.

Problem 16. Conversely, if we know where the diagonals go under a symmetry of the cube, we know where the whole cube goes! The subtlety here is to realize that each diagonal di corresponds to a pair of vertices (ai, bi), which can be fixed or which can switch with each other under a cube’s symmetry; in either case, the diagonal di goes to itself, so we don’t see a difference between such symmetries from the viewpoint of our S4 in Problem 15. So, we might still be short a few symmetries of the cube! Indeed, a central symmetry σ through the center O of the cube (i.e., a dilation through O by a ratio of −1) will switch any two opposite vertices, thereby fixing all 4 diagonals. Hence, σ is a (non-rotational) symmetry of the cube such that o(σ) = 2. The reader should check that σ commutes with all rotations in G4 and that any reflection of the cube can be uniquely written as a product of σ and a rotational symmetry. For instance, the reflection depicted in Figure 7b equals σρ = ρσ, where ρ is a 180°-rotation about the line connecting the centers of the front and the back faces of the cube. In summary, S(C) ≅ S4 × C2. ♦

Exercise 27. (d) (1, 8, 9, 6, 2, 14, 10, 4, 3, 11, 13, 7, 15, 5, 12).



Exercise 28. If α, β ∈ S15, then they both fix the number 16, so that their composition αβ, as well as their inverses α^{−1} and β^{−1}, will also fix 16. As e fixes 16, S15 can, indeed, be thought of as the subgroup of S16 consisting of all permutations that fix 16.

Question on p. 125. The permutation in Figure 1d is the 12-cycle from Exercise 27(d) above, and hence it is an odd permutation. By Theorem 1, it is unreachable.

Theorem 2. The 3-cycle (13, 14, 15) in the 15-puzzle can be converted into e in many ways by permuting only the bottom two rows. Here is one way (cf. Fig. 12), proposed by Alison Mirin, an alumna of Mills College, who took the first course in Problem Solving in Mathematics that was based on volume I of the present book.


(1) Rotate the bottom two rows clockwise by one place.
(2) Rotate the bottom middle 2 × 2 square clockwise by one place.
(3) Rotate the bottom left 2 × 2 square counter-clockwise by one place.
(4) Move the square with 9 in it (to the left), and then move the square with 10 in it (up).
(5) Rotate the bottom two rows counter-clockwise by one place.

The last permutation is the desired identity permutation e.

[Figure 12. Permuting the 3-cycle (13, 14, 15) to the identity e]

Session 6

Monovariants. Part II
Jumping Fleas and Conway’s Checkers

based on Gabriel Carroll’s session

Sneak Preview. In Part I of this session, we learned about monovariants: things that can change but only in one direction. Plunging deeper into the topic, here we shall track “right-wing” fleas and other “extremists,” zero-in on everything in sight, enforce affirmative action by re-distributing favors and peacefully settle scores among feuding knights, learn to efficiently edit our essay assignments, enjoy organized sleep-shifts . . . at school, and discover unseen barriers for migrating checkers. Through recreational problems and constructive activities, we shall focus on two major uses of monovariants: showing that an iterative process must end, and showing that some state is unreachable from some other state. Part III shall present another, more technical application of monovariants to inequalities. In addition to Monovariants I, the reader should review from volume I some basics from Combinatorics, Number Theory, Proofs, Induction, and, of course, the Invariants session, especially the “Escape of the Clones” problem.

1. Numerical Monovariants

A large portion of the examples of monovariants in Part I might be called “numerical” – they were full of numbers, e.g., in the mansion problem we talked about the number of people in each room, or the numbers of men and women, separately, in each room, etc. When you have a large collection of numbers and want to form a single monovariant from them, there are some standard recipes to try:

PST 42. Look at the sum of all the numbers, or their product; look at the largest, or the smallest; or more generally, apply some function to each of them (such as squaring), and then add or multiply them together.

With some luck, you have created a monovariant: a feature in the problem that either always increases or always decreases! Let’s see how this works in practice.

6. MONOVARIANTS. PART II

1.1. Sum-monovariants are among the most common monovariants.



Problem 1. (Russia ’61) A rectangular m × n array of real numbers is given. Whenever the sum of the numbers in any row or column is negative, we may switch the signs of all the numbers in that row or column, from negative to positive or vice versa. Prove that if we repeat this operation enough times, eventually all the row and column sums will be non-negative.

Before looking at the official solution below, try a simple example with a 2 × 2 array, as in Figure 1. Think about how many flips of the row or column signs you need to perform before being unable to continue. What happens if you change the numbers in your table? Do you always get 4 positive numbers in the end, or could some be negative?

−2 −3     +2 −3     −2 +3     +2 +3     +2 −3     −2 +3     +2 +3
−1 −4  →  +1 −4  →  +1 −4  →  −1 −4  →  −1 +4  →  −1 +4  →  +1 +4

Figure 1. Switching signs in rows and columns

Solution: If the sum of the numbers in a row or column is x < 0, then after we switch their signs, their sum will be −x > x. Since all the other numbers in the table stay the same, we see that the sum of all the numbers in the table has increased. (In the example in Figure 1, check that the sum indeed increases: −10 → −4 → −2 → 0 → 2 → 4 → 10.) So this is our monovariant – just the sum of all mn numbers. Since each position in the array can take on only two different values (the original number, with a + or − sign), the whole array can take on at most 2^{mn} possible states. This is a finite number of choices, so the sum of all the numbers cannot keep increasing forever. Thus, eventually it must stop increasing, which means we cannot perform the sign-switching operation any more, and hence all the row and column sums will be non-negative.
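The argument in the solution can be watched in action with a short simulation. The sketch below is only one possible strategy, chosen for illustration: it always flips the first row with a negative sum and, if there is none, the first column with a negative sum. It records the total after every flip so that the monovariant can be inspected. The function name and the flipping order are assumptions, not part of the problem.

```python
def flip_until_stable(grid):
    """Repeatedly switch the signs of a row or column whose sum is negative.

    Returns the final grid and the list of total sums observed after each step.
    """
    grid = [row[:] for row in grid]          # work on a copy
    m, n = len(grid), len(grid[0])
    totals = [sum(map(sum, grid))]
    while True:
        # Look for a row with negative sum first, then for a column.
        row = next((i for i in range(m) if sum(grid[i]) < 0), None)
        if row is not None:
            grid[row] = [-x for x in grid[row]]
        else:
            col = next(
                (j for j in range(n) if sum(grid[i][j] for i in range(m)) < 0),
                None,
            )
            if col is None:
                return grid, totals          # every row and column sum is >= 0
            for i in range(m):
                grid[i][col] = -grid[i][col]
        totals.append(sum(map(sum, grid)))


final, totals = flip_until_stable([[-2, -3], [-1, -4]])
print(final)
print(totals)
```

With this particular order of flips, the starting array of Figure 1 stabilizes after two steps rather than six, which illustrates that the length of the process depends on the choices made along the way.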

PST 43. The key idea was to count the number of all possible states of the array and deduce that some feature of the table we come up with, such as the sum of all entries, will have only finitely many values. If our feature is a monovariant, then it must reach an extreme value and stop changing.

If you are wondering how we counted 2^{mn} possible states: we multiplied the number of possibilities (2) for each cell of the array, e.g., for a 2 × 2 array there are at most 2^4 = 16 possible states. This is an independent-choice “menu”-type problem that you encountered in Combinatorics I (vol. I). Potentially, a function could run through all of its possible values before reaching its maximum. Could the process in Problem 1 take all the 2^{mn} − 1 sign-switching steps before terminating? For an m × n array, what would be the longest number of steps in this process? The sum in Figure 1 takes only 6 (not 15) steps to stop: Can we devise a 2 × 2 example that will take longer to terminate? This discussion should remind the reader of the Appendix to the Monovariants I session, where an analogous question was answered for the mansion problem. To make it even more challenging, the advanced reader can ask and attempt to answer the same questions about maximizing the length of the process whenever appropriate in the forthcoming problems.

Only hints to the next few problems will be offered here, but you can check out the solutions at the end if you get stuck. Recall that the greatest common divisor of two integers, gcd(a, b), is the largest integer that divides both a and b, while the least common multiple, lcm(a, b), is the smallest (positive) integer that is divisible by both a and b.

Problem 2. (St. Petersburg ’96) Several positive integers are written on a blackboard. One can erase any two distinct numbers and write their gcd and lcm instead. Prove that eventually the numbers will stop changing.

As usual, let’s try out some examples to see what is happening:
• {2, 3, 5, 15} → {1, 6, 5, 15} → {1, 1, 30, 15}, or differently:
• {2, 3, 5, 15} → {1, 3, 5, 30} → {1, 1, 15, 30}.

Hints: If d = gcd(a, b), then a = dk and b = dm for some relatively prime positive integers k and m, and l = lcm(a, b) = dkm (why?). Think of how the sum changes from a + b to d + l: some algebra will be necessary to establish whether you have an increasing or decreasing monovariant. Finally, what is the largest number you can possibly write down? ♦

1.2. “Extremal” monovariants. By now, for your monovariants you should be automatically trying first the sum of your numbers. However,

 PST 44. Sometimes it is easier to track the largest or the smallest number present, and to set it aside if it doesn’t change under the operation.

Here are a few more related puzzles for you to think about. Their solutions may also require concepts like prime powers p^k.

Problem 3. In the setup of Problem 2:
(a) Is the product of all numbers a monovariant? Is it helpful at all?
(b) You will inevitably see a pattern among the numbers when the process terminates. What is this pattern? Does it always persist?
(c) We choose the order of pairs on which to perform the operation. Presumably, the length of the process and the final resulting set of numbers depend on our choice, or do they?

Hints: When does the process terminate? What pairs of numbers are kept the same under the operation? If the largest (or smallest) possible number is written on the board, will that number change afterwards? Can you put it aside and apply the problem to the remaining numbers? ♦
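For readers who like to experiment before attacking Problems 2 and 3, here is a small sketch of the blackboard process. The function name `run_gcd_lcm` and the rule "always pick the first pair that would change" are assumptions made for illustration; any other choice of pairs is equally legitimate, and comparing different choices is part of Problem 3(c).

```python
from itertools import combinations
from math import gcd, prod


def run_gcd_lcm(numbers):
    """Replace pairs (a, b) by (gcd, lcm) until no pair changes.

    Returns the list of board states (each sorted) seen along the way.
    """
    nums = list(numbers)
    history = [sorted(nums)]
    while True:
        for i, j in combinations(range(len(nums)), 2):
            a, b = nums[i], nums[j]
            d = gcd(a, b)
            l = a * b // d
            if {a, b} != {d, l}:      # the operation really changes this pair
                nums[i], nums[j] = d, l
                break
        else:
            return history            # no pair changes any more
        history.append(sorted(nums))


for state in run_gcd_lcm([2, 3, 5, 15]):
    print(state, "sum =", sum(state), "product =", prod(state))
```

Watching the printed sums and products across several starting boards is a quick way to form conjectures for the hints above.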


Problem 4. (Cofman, [16]) Place four non-negative integers a, b, c, and d around a circle. For every two consecutive numbers, take their absolute difference and write that difference between them; then erase the four original numbers. Thus, after one step, the four new numbers will be |a − b|, |b − c|, |c − d|, |d − a|. Iterate this process. Is it true that the process always eventually leads to a circle with four 0s? Generalize your result by replacing 4 with any power of 2 greater than one.

Solution: The first part of our solution works for any M numbers (not just 4 or 2^m). For starters, notice that at every step the largest number L around the circle can never increase: it will be replaced with some difference |L − c| = L − c ≤ L, while no other (smaller) value can go beyond L. Thus, the largest value is a decreasing monovariant, and it will eventually stabilize at some value a ≥ 0. Of course, if a = 0, we are done (why?).

Lemma 1. For any M numbers, if the largest number around the circle stabilizes at some a > 0, eventually each of the numbers will be either a or 0.

Solution: The only way to get the value a from now on is, at each step, to have the consecutive pair {a, 0} or {0, a} appearing somewhere on the circle. But the only ways to produce {a, 0} are to have consecutively {a, 0, 0} or {0, a, a}, and to produce {0, a} – to have {0, 0, a} or {a, a, 0}. We see that from now on, at each step, we must have at least one of the sequences {a, 0, 0}, {0, a, a}, {0, 0, a}, or {a, a, 0}. This begs for an inductive argument.

Suppose we have shown that for some n ≥ 2 it is always necessary (from now on at each step) to have some sequence A_n = {a_1, a_2, . . . , a_n} where all a_i’s are 0 or a, and at least one of them is a. (The sequence itself is not fixed, so it can be one of 2^n − 1 types – the exact number of sequences is irrelevant.) But to produce sequence A_n, we must have from now on, at each step, a sequence A_{n+1} of (n + 1) 0s and a’s, with at least one a. Indeed, start from some a_i = a in A_n. To have such an a, as we saw above, we need to have either {a, 0} or {0, a} in the corresponding place in A_{n+1}. One can easily see that each of these cases uniquely determines the rest of sequence A_{n+1}, populating it only with 0s and a’s. This completes the induction step.

But what happens when the length n of the required sequence A_n exceeds M? It simply means that we have wrapped the sequence around the whole circle, and from now on the only values on the circle will be a’s and 0s.

Figure 2. Cycles for triangles and hexagons

The remainder of the problem does not work in general. For instance, take the 3 numbers {0, a, a} on the gray triangle in Figure 2a; the next step will be the white triangle with exactly the same label {0, a, a}! Neither will we get to the zero-configuration from the hexagon {0, a, a, 0, a, a} (cf. Fig. 2b), whose label also goes to a rotated version of itself under the operation. The reader may want to think of other counterexamples. Below we move to the 2^m case, which always works.

Lemma 2. If you start with M = 2^m numbers, all 0s and a’s, then after 2^m iterations, you will be left with only 0s.

Partial proof: As an illustration, take the square with labels {0, a, a, a} in Figure 3. After 4 iterations (follow the numbers outside the squares), this turns into {0, 0, 0, 0}. To see why, forget about the operation |b − c|: it is too hard to track what happens under it. As everything is divisible by a, we can factor out a and, de facto, assume that a = 1. Now, let’s add up any two adjacent numbers (written inside the squares) and work modulo 2, i.e., think of 0 for even and 1 for odd. The results will be really the same as before: at each step, 1 + 1 = 2 = 0 = |1 − 1|, 1 + 0 = 1 = |1 − 0|, and 0 + 0 = 0 = |0 − 0|. The net effect of these additions is shown in Figure 3, where the final label is {10, 12, 14, 12} = {0, 0, 0, 0} (mod 2).
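The behavior described for triangles, hexagons, and squares is easy to check by machine. This is a rough sketch with a = 1; the helper names `ducci_step` and `steps_to_zero` are assumptions chosen for illustration.

```python
from itertools import product


def ducci_step(circle):
    """One step: replace each number by |it - its clockwise neighbor|."""
    n = len(circle)
    return [abs(circle[i] - circle[(i + 1) % n]) for i in range(n)]


def steps_to_zero(circle, max_steps=1000):
    """Steps needed to reach all zeros, or None if a state repeats first."""
    seen = set()
    state = list(circle)
    for step in range(max_steps + 1):
        if not any(state):
            return step
        if tuple(state) in seen:
            return None               # stuck in a cycle avoiding all zeros
        seen.add(tuple(state))
        state = ducci_step(state)
    return None


print(steps_to_zero([0, 1, 1, 1]))        # 4
print(steps_to_zero([0, 1, 1]))           # None (triangle cycles)
print(steps_to_zero([0, 1, 1, 0, 1, 1]))  # None (hexagon cycles)

# Every 0/1 circle of length 8 = 2^3 dies out within 8 steps.
print(all(steps_to_zero(c) <= 8 for c in product([0, 1], repeat=8)))  # True
```

The last line is a brute-force check of Lemma 2 for m = 3 only; it is evidence, not a proof.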

Figure 3. Zeroing-in on the circle for 2^2 numbers

In order to show that this works for any initial labels {a_1, a_2, a_3, a_4}, track the additions for the next 4 iterations. Thus, for instance, in place of a_1 in the last square we will have 6a_1 + 4a_2 + 4a_4 + 2a_3 = 0 (mod 2). Analogous formulas will imply that all numbers around the last square are 0. For instance, a_1 = 0, a_2 = 1, a_3 = 1, a_4 = 1 give 6 · 0 + 4 · 1 + 4 · 1 + 2 · 1 = 10. To extend this argument to any 2^m numbers, it is necessary to recognize the coefficients 6, 4, 4, and 2 in the final formula as certain binomial coefficients, generalize them, and show that they are all even. ♦

1.3. Looking for more than just a monovariant. Here’s a clever but really difficult problem. It will require using an invariant, a feature that does not change (which shouldn’t be hard to find), together with a monovariant.

Problem 5. (IMO ’86, [21]) An integer is written at each vertex of a regular pentagon so that the sum of all five numbers is positive. If three consecutive vertices are assigned the numbers x, y, z with y < 0, then the following operation is allowed: the numbers x, y, z are replaced by x + y, −y, z + y, respectively. Such an operation is repeated as long as at least one of the five numbers is negative. Determine whether the procedure necessarily comes to an end in a finite number of steps.
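The operation in Problem 5 is easy to experiment with. Below is a minimal sketch; the names `pentagon_step` and `consecutive_sums`, the rule "operate on the first negative vertex", and the starting numbers are all assumptions made for illustration. It prints each state together with the absolute values of all sums of consecutive vertices.

```python
def pentagon_step(v):
    """Apply the operation at the first negative vertex.

    Returns the new list, or None if no vertex is negative.
    """
    for i, y in enumerate(v):
        if y < 0:
            w = list(v)
            w[i - 1] += y            # x -> x + y (index -1 wraps around)
            w[i] = -y                # y -> -y
            w[(i + 1) % 5] += y      # z -> z + y
            return w
    return None


def consecutive_sums(v):
    """Sums of 1 to 4 consecutive vertices, plus the full sum counted once."""
    sums = [sum(v[(s + k) % 5] for k in range(length))
            for length in range(1, 5) for s in range(5)]
    return sums + [sum(v)]


v = [2, -3, 1, -1, 2]            # hypothetical start with positive total 1
for step in range(20):           # capped experiment; no claim about termination
    print(step, v, sorted(abs(s) for s in consecutive_sums(v)))
    nxt = pentagon_step(v)
    if nxt is None:
        break
    v = nxt
```

Note that the total of the five numbers never changes under the operation, so any interesting monovariant has to be built from something else.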

Figure 4. Changing consecutive sums

Hint: Try the following numerical experiment. Put some numbers at the vertices, e.g., as in Figure 4a. Then write down all of the possible sums of 1 or more consecutive numbers around the pentagon. (There are 21 such sums. Why?) Perform the operation in the problem several times, and at each step, again write down all 21 sums. How do the sums change when the operation is performed? In particular, what about their absolute values? ♦

If you attempt to solve this last problem, you will realize that the appropriate monovariant can sometimes be tricky to construct. In practice, though, when a problem requires a monovariant, it’s usually not too hard to come up with it. The simple formulas mentioned at the beginning of this section are fairly general-purpose. The further problems in this session should help provide you with inspiration for the rough times when the usual recipes come up short.

1.4. Monovariants and Sequences. We’ve been looking so far at descriptions of processes repeated over time. Of course, time, strictly speaking, is not a mathematical concept; it’s a conceptual convenience. When we talk about the state of some system changing over time, that is really shorthand for a sequence of states, with a certain relationship between each state and the next. For example, Problem 4 could be rewritten as describing a sequence s_0, s_1, s_2, . . ., where each s_i is a quadruple of numbers (a_i, b_i, c_i, d_i). The relationship between s_i and s_{i+1} is given by s_{i+1} = (|a_i − b_i|, |b_i − c_i|, |c_i − d_i|, |d_i − a_i|). (Or, alternatively, we could think of the problem as describing four sequences of integers, a_i, b_i, c_i, d_i, living in interrelated harmony.) Using this equivalence between changes over time and sequences, we can translate an earlier PST:

PST 45. To show that some sequence of numbers (or other things) is eventually constant, try using a monovariant.

For example, the monovariant might be some function of the nth term of the sequence, which would change in some predictable way as n increases. Or perhaps we need to look not just at one term at a time, but at two or more successive terms. This is confusing to explain without an example, so here’s an example to make the discussion concrete.


Problem 6. (USAMO ’93, [41]) Let a and b be two odd positive integers. Define a sequence by putting f_1 = a, f_2 = b, and letting f_n for n ≥ 3 be the greatest odd divisor of f_{n−1} + f_{n−2}. Prove that f_n becomes constant for n sufficiently large, and determine the eventual value as a function of a and b.

As an example, if a = 11 and b = 23, then f_3 = 17, f_4 = 5, f_5 = 11, f_6 = 1, f_7 = 3, f_8 = 1, f_9 = 1, and the sequence stabilizes at 1. But how do we tie monovariants into the problem?
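The example values can be reproduced with a few lines of code, which also makes it easy to try other odd starting pairs. This is only a sketch; the function names `greatest_odd_divisor` and `odd_sequence` are illustrative assumptions.

```python
def greatest_odd_divisor(n):
    """Strip all factors of 2 from the positive integer n."""
    while n % 2 == 0:
        n //= 2
    return n


def odd_sequence(a, b, length):
    """First `length` terms of f_1 = a, f_2 = b, f_n = god(f_{n-1} + f_{n-2})."""
    f = [a, b]
    while len(f) < length:
        f.append(greatest_odd_divisor(f[-1] + f[-2]))
    return f


print(odd_sequence(11, 23, 10))  # [11, 23, 17, 5, 11, 1, 3, 1, 1, 1]
```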

 PST 46. The key is to turn a divisor relationship into an inequality.

For example, if c divides d, then c ≤ d. Thus, for a sequence (of positive terms) in which f_n is a divisor of f_{n−1} for each n, in particular we know that f_n ≤ f_{n−1}, i.e., that the sequence is decreasing. The situation, however, isn’t quite that simple in our problem – the example shows we don’t necessarily have f_n ≤ f_{n−1} for all n – but the truth is not too much more complicated.

Proof of stabilization: Notice that for each n, f_{n−2} + f_{n−1} is even, which means that f_n, as an odd divisor of f_{n−2} + f_{n−1}, is no larger than (f_{n−2} + f_{n−1})/2, the average of the two preceding terms. Therefore,

(1)   f_n ≤ (f_{n−2} + f_{n−1})/2 ≤ max{f_{n−2}, f_{n−1}}.

This inequality is not “well-balanced” for our purposes: on the LHS we have one term f_n, and on the RHS we have some “max” function. To make it symmetric, this “max” function should appear on the LHS too. But clearly

(2)   f_{n−1} ≤ max{f_{n−2}, f_{n−1}},

so that the LHS’s of the last two inequalities (1)-(2) together imply max{f_{n−1}, f_n} ≤ max{f_{n−2}, f_{n−1}}, the desired balanced inequality we need. We read it as follows: if, for each n, we consider the larger of the two numbers f_{n−1}, f_n, this max never increases when n increases. Aha! A monovariant!

Hence, max{f_{n−1}, f_n} can only decrease. Since it is a positive integer, it can’t keep decreasing forever, so it must eventually become constant at some value c. Let n be large enough that the monovariant has stopped decreasing, i.e., max{f_{n−1}, f_n} = c from n on. We can assume f_n = c (otherwise, use max{f_n, f_{n+1}} = c and just replace n by n + 1). Then we claim f_{n+1} = c also. For contradiction, suppose f_{n+1} < c; then max{f_{n+1}, f_{n+2}} = c implies f_{n+2} = c. By definition of f_{n+2}, this means that c is a divisor of f_n + f_{n+1}. But f_n + f_{n+1} lies strictly between c and 2c (why?), so it cannot be divisible by c. This contradiction proves the claim: f_{n+1} = c after all. We have completed an inductive argument that shows all subsequent terms of the sequence are also equal to c, i.e., the sequence is eventually constant.

Once this is out of the way, the second part of the problem is not difficult. In our example that started with f_1 = 11, f_2 = 23, the stabilizing constant was c = 1. At the same time, the original two terms were relatively prime, as well as any pair of consecutive terms. If we multiply both f_1 and f_2 by 5, then every subsequent term would be multiplied by 5, modifying the constant c to 5. This gives us the idea of how to proceed:



Exercise 1. In the setup of Problem 6 prove that gcd(f_{n−1}, f_n) is an invariant. Conclude that the constant value at which f_n stabilizes is gcd(a, b).

Hint: Start by proving that f_{n+1} is divisible by gcd(f_{n−1}, f_n), and that f_{n−1} is divisible by gcd(f_n, f_{n+1}). ♦

Problem 7. (USAMO ’97, [41]) Let p_1, p_2, p_3, . . . be the prime numbers listed in increasing order, and let x_0 be a real number between 0 and 1. For each positive integer k, define x_k = 0 if x_{k−1} = 0, and x_k = {p_k/x_{k−1}} otherwise, where {x} = x − ⌊x⌋ denotes¹ the fractional part of x. Find all x_0 satisfying 0 < x_0 < 1 for which the sequence x_0, x_1, x_2, . . . eventually becomes 0.

As an example, if x_0 = 3/5, then x_1 = {p_1/x_0} = {2/(3/5)} = {10/3} = 10/3 − ⌊10/3⌋ = 10/3 − 3 = 1/3, x_2 = {p_2/x_1} = {3/(1/3)} = {9} = 0, and the sequence stabilizes. If x_0 = 1/√2, the calculation below is a lot more involved and requires some algebra skills. (If you have a hard time following it, you can skip it for now, as it won’t affect the gist of the solution.) Thus,

x_1 = {p_1/x_0} = {2√2} = 2√2 − ⌊2√2⌋ = 2√2 − 2 = 2(√2 − 1),

x_2 = {p_2/x_1} = {3/(2(√2 − 1))} = {3(√2 + 1)/(2(√2 − 1)(√2 + 1))} = {3(√2 + 1)/(2(2 − 1))} = {3(√2 + 1)/2}
    = 3(√2 + 1)/2 − ⌊3(√2 + 1)/2⌋ = 3(√2 + 1)/2 − 3 = 3/√2 − 3/2, and so on.

Hint: What looks evident in the second example is that we won’t be able to get rid of √2, which will prevent the sequence from stabilizing. The reason is that if x_0 is irrational,² then all x_k are going to be irrational (why?). Now suppose x_0 is rational, as in our first example. The sequence itself may not be monovariant! However, do you notice something about the denominators of x_0, x_1, x_2, . . .? Employ the PST below and see if you can locate your monovariant. ♦
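For rational starting values, the sequence in Problem 7 can be computed exactly. The sketch below uses Python's `fractions.Fraction`; the function name `prime_fraction_sequence` and the hard-coded list of the first ten primes are assumptions made for illustration.

```python
from fractions import Fraction

PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]   # p_1, ..., p_10


def prime_fraction_sequence(x0, primes=PRIMES):
    """Return [x_0, x_1, ...] with x_k = frac(p_k / x_{k-1}), or 0 once 0 is hit."""
    xs = [Fraction(x0)]
    for p in primes:
        x = xs[-1]
        if x == 0:
            xs.append(Fraction(0))
        else:
            q = p / x                                   # exact rational quotient
            xs.append(q - q.numerator // q.denominator)  # subtract the floor
    return xs


for x in prime_fraction_sequence(Fraction(3, 5))[:4]:
    print(x, "denominator:", x.denominator)
```

Trying other starting fractions and reading off the printed denominators is one way to follow the hint.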



PST 47. For a sequence of rational numbers, investigate the two sequences that it naturally generates: its numerators and its denominators. Depending on the problem, it may or may not be advantageous to reduce the fractions so as to redefine the two sequences.

Problem 8. (USAMO ’07, [25]) Let n be a positive integer. Define a sequence by setting a_1 = n and, for each k > 1, letting a_k be the unique integer in the range 0 ≤ a_k ≤ k − 1 for which a_1 + a_2 + · · · + a_k is divisible by k. For instance, when n = 9 the sequence obtained is 9, 1, 2, 0, 3, 3, 3, . . .. Prove that for any n the sequence a_1, a_2, a_3, . . . eventually becomes constant.

¹ ⌊x⌋ is the largest integer ≤ x, e.g., ⌊10/3⌋ = ⌊3.333. . .⌋ = 3 and ⌊√2⌋ = ⌊1.4 . . .⌋ = 1.
² Irrational means a real number that is not of the form a/b for any integers a and b.


Why does such a sequence exist and why is it unique? Once a_1, a_2, . . . , a_{k−1} are determined, there is exactly one choice for a_k to adjust the previous sum a_1 + a_2 + · · · + a_{k−1} to be divisible by k. For instance, if we already know the first terms 9, 1, 2, 0, to get a_5 ∈ [0, 4], we compensate for the remainder 2 (mod 5) of 9 + 1 + 2 + 0 = 12 by adding the only possibility 3 = a_5.

Hint: Since a_1 + a_2 + · · · + a_k is divisible by k, this suggests looking at the quotient b_k = (a_1 + a_2 + · · · + a_k)/k. Check the sequence b_1, b_2, b_3, . . . : you may want to first show that it stabilizes. ♦
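The construction of a_k and the quotients b_k from the hint can be computed directly. This is a small sketch; the function names `remainder_sequence` and `quotients` are illustrative assumptions.

```python
def remainder_sequence(n, length):
    """a_1 = n; a_k in [0, k-1] makes a_1 + ... + a_k divisible by k."""
    a, total = [n], n
    for k in range(2, length + 1):
        a_k = (-total) % k           # the unique residue that fixes the sum
        a.append(a_k)
        total += a_k
    return a


def quotients(a):
    """b_k = (a_1 + ... + a_k) / k for each k."""
    total, b = 0, []
    for k, a_k in enumerate(a, start=1):
        total += a_k
        b.append(total // k)
    return b


a = remainder_sequence(9, 7)
print(a)             # [9, 1, 2, 0, 3, 3, 3]
print(quotients(a))  # [9, 5, 4, 3, 3, 3, 3]
```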

2. Constructive Activities

Imagine that you’re editing a piece of writing – maybe you’re a student turning in a paper for a class, or maybe you’re a professional writer working on your next book. If you’re like most writers, you don’t simply sit down and instantly crank out a perfect piece of work. You start with a first draft, and you know that there are lots of flaws and mistakes and weak points in the writing. So you go through and fix them. Sometimes fixing a problem requires making major changes to the paper, and in the process you create new problems and mistakes. But you do it, because you know that on the whole it’s an improvement, and the new problems can be fixed in turn. You keep revising and revising, and eventually the result will be satisfactory.

Now, what does all that have to do with mathematics? It’s an example of monovariants in action: the paper keeps getting better each time a problem is fixed. In real life this iterative process is tedious and time-consuming, but mathematics is great, because all you have to do is describe the process and verify that it is possible. To strip away the allegory and get to the point:

PST 48. To construct an object that meets certain conditions, create an object that doesn’t meet the conditions, and then fix it until it does. Use a monovariant to show that it will eventually be completely fixed.

2.1. Connecting the dots. The vagueness above is probably making your eyes glaze, so let’s save the day with a specific example [59], where we will progressively eradicate all “errors” in the solution.

Problem 9. (Kvant ’94) Given are n grey and n black points in the plane, no three collinear (i.e., no three on the same line). Show that we can draw n nonintersecting segments connecting the black points to the grey points.

Figure 5. Switching connections and Creating more intersections


How to go about this? If you try to give an explicit description of how to pair up the points, it’s not clear where to start. So, using PST 48, start instead by simply connecting the points randomly. Now whenever two segments cross, we can uncross them by changing the way the points are connected, as shown in Figure 5a. We can just repeat this process, until all the intersection points have gone away. Or can we? We need to show that the process will eventually end. We can’t simply say that the number of intersection points will decrease, because we’d be lying. As shown in Figure 5b, switching the segments could create new intersection points and even increase their total number!

2.1.1. A geometric monovariant. We need to find some other monovariant to guarantee that the process won’t get caught in a loop. Well, you might notice that each time two segments are uncrossed, two long segments are replaced by two shorter segments. Let’s make this precise. The four endpoints of the segments form a convex quadrilateral, and uncrossing the lines just means replacing the two diagonals by two opposite sides.

Exercise 2. Check that, if ABCD is a convex quadrilateral, then AC + BD > AD + BC.

Hint: If AC ∩ BD = {E}, use the Triangle Inequality on △ADE and △BCE.

[Figure: convex quadrilateral ABCD, shown with its diagonals AC and BD meeting at E, and with the sides AD and BC.]

It immediately follows that the sum of the lengths of all n segments decreases every time we perform the uncrossing operation.

2.1.2. The clean write-up. Now all the parts are in place to be put together.



Solution to Problem 9: First pair up the grey and the black points arbitrarily, and connect each grey point to the corresponding black point. This may create some intersection points. Now iterate the following operation:
• Whenever a segment AC crosses a segment BD, with A, B grey and C, D black, replace segments AC and BD by AD and BC.

For AC and BD to intersect, ABCD must be a convex quadrilateral (why?). So by Exercise 2, the sum of the lengths of all n line segments must decrease each time we perform this uncrossing operation. But there are only a finite number of ways to pair up the grey points with the black points.³ So if we perform the operation repeatedly, the process must eventually end. By assumption, we perform the uncrossing above whenever two segments cross. So when the process stops, there must be no more crossings, which means that we have paired up the grey points with the black points using n nonintersecting line segments – just as the problem requires.

³ Recall the matchmaking Exercise 8 from Combinatorics I (vol. I). There, 10 men and 10 women could marry off in 10! heterosexual couples. Similarly, the number of possible pairings between the black and grey points is n!.
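The uncrossing procedure from the solution to Problem 9 can be sketched in code. Everything named here (`orient`, `cross`, `uncross`, `random_general_points`, the grid size, the seed) is an assumption made for illustration; integer coordinates are used so that the crossing test is exact.

```python
import itertools
import math
import random


def orient(p, q, r):
    """Twice the signed area of triangle pqr (zero iff collinear)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def cross(a, c, b, d):
    """True if segments ac and bd intersect (points in general position)."""
    return (orient(a, c, b) * orient(a, c, d) < 0 and
            orient(b, d, a) * orient(b, d, c) < 0)


def random_general_points(count, rng):
    """Random integer points, resampled until no three are collinear."""
    while True:
        pts = [(rng.randrange(1000), rng.randrange(1000)) for _ in range(count)]
        if all(orient(p, q, r) != 0
               for p, q, r in itertools.combinations(pts, 3)):
            return pts


def uncross(grey, black, match):
    """match[i] is the black index joined to grey[i]; uncross until done."""
    match = list(match)

    def total():
        return sum(math.dist(grey[i], black[match[i]]) for i in range(len(grey)))

    lengths = [total()]
    while True:
        for i, j in itertools.combinations(range(len(grey)), 2):
            if cross(grey[i], black[match[i]], grey[j], black[match[j]]):
                match[i], match[j] = match[j], match[i]   # AC, BD -> AD, BC
                lengths.append(total())
                break
        else:
            return match, lengths                          # no crossings left


rng = random.Random(1)
pts = random_general_points(12, rng)
grey, black = pts[:6], pts[6:]
match, lengths = uncross(grey, black, range(6))
print(match)
print([round(x, 1) for x in lengths])   # total length after each uncrossing
```

The printed totals shrink from one uncrossing to the next, mirroring the monovariant in the write-up.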


Notice that this is an example of a problem where no operation is supplied. Instead, solving the problem requires coming up with both the monovariant (total segment length) and the operation (uncrossing of segments) that makes it monovary.

2.1.3. Extremes again! Writing out this solution doesn’t necessarily require us to describe fixing the configuration as a process. For example, we could also write the solution as follows:

Alternative solution to Problem 9: Among all n! ways of pairing the grey points with the black points, consider the pairing that makes the sum of the lengths of the segments as small as possible. We claim that, with this pairing, the segments never cross each other. Indeed, if there is a crossing, then we can re-pair the points involved as in Figure 5a to make the total length of the n segments shorter (by Exercise 2). But we assumed that the original pairing made this total length as small as possible, so this is a contradiction.

The idea of picking a pairing that minimizes the total segment length is a famous technique called the Extreme Principle:

PST 49. Define a feature (e.g., sum of lengths) and select an object having an extreme value of that feature (e.g., minimal sum). Then argue that, due to the extreme value, some operation is not possible (e.g., uncrossing), and hence conclude that the object in question possesses some other property (e.g., no intersection points).

Speaking of which, our alternative solution relied on just one minimal-length pairing; but it did not preclude the existence of other such pairings, nor did it outlaw good non-minimal pairings:

Exercise 3. Could there be a configuration of grey and black points with more than one minimal-length pairing of the segments? How about having a correct (non-intersecting) pairing of the points that is not the shortest?

Another idea that anchored both solutions was the finiteness of all possible correct (and, in fact, incorrect) pairings. We saw this idea earlier in PST 43, where we counted the total number of states in an array. Here, just knowing that there are finitely many possible pairings made our, a priori continuous, length monovariant into a discrete monovariant (having only finitely many values). Because of this, we could conclude that the monovariant eventually stabilized, perhaps, not necessarily at its minimal value.

2.1.4. Pros and cons. The solution to Problem 9 can be written either way: by a “self-correcting” process or via the Extreme Principle. Arguably the second way is in some sense more appealing, because it explicitly identifies the pairing (or one of the pairings) that works. But both solutions require the same key idea – the same operation and the same monovariant.


Thinking about the problem in terms of a process and a monovariant can be more helpful for you as the solver, trying to come up with a solution. It can also be more helpful for someone trying to actually implement the solution. While the Extreme Principle is a great theoretical tool. . . going back to the writing example, if your algorithm for writing a 10-page essay is to consider all possible 10-page sequences of letters, spaces, and punctuation marks, and choose the one that best fits the demands of the assignment, you’re going to have to ask your teacher for one heck of an extension!

2.2. Friendship/Enmity Relationships. Let’s practice our newly-learned “correcting” or “editing” technique of starting from a random arrangement and gradually “improving” it. Below we assume that friendship is symmetric, i.e., if P is Q’s friend, then Q is P’s friend.

Problem 10. You are the host of a party, with some number of guests. Some of the guests are friends with each other. You have n kinds of party favors, each in unlimited supply.⁴ Prove that you can give each guest one of the favors so that the following condition is satisfied for each person P: at most 1/n of P’s friends have the same kind of favor as P.

Hint: Consider an arbitrary assignment of party favors to people. If something is wrong for a person P, what operation could you perform to improve the situation? What will your monovariant be? Think of the purpose of the problem: to limit the number of friends with same favors. ♦

Now, let’s talk about enemies, as seen in this Moscow ’64 contest problem:

Problem 11. (Dirac’s Theorem, [48]) King Arthur summoned 2n knights to his court. Each knight has at most n − 1 enemies among the other knights present. Prove that the knights can sit at the Round Table so that no two enemies sit next to each other. (The relation of enmity is symmetric.)

Hint: The operation needed here is a little tricky: take some arc of the table and reverse the order of the knights sitting in that arc. If some two enemies are sitting next to each other, there is always a way of performing this operation to decrease the number of pairs of adjacent enemies. ♦

2.3. Shedding the disguise. If you are familiar with the language of graph theory, you will probably notice that both of these last two problems are really theorems of graph theory, recast in anthropomorphic form. Briefly,
(1) A graph consists of vertices (dots), some of which are connected by edges (the segments between the dots).
(2) Neighbors are two vertices connected by an edge.
(3) A Hamiltonian cycle is a path that tours the graph along the edges, visiting each vertex exactly once and coming back where it started.

⁴ Paul Zeitz called this the “affirmative action coloring problem” in the case of n = 2.


“Friends” often end up being translated as “neighbors” on a graph, “enemies” are not connected by an edge, and colors could be properties assigned to vertices or edges. In this language, Problem 10 can be conventionally expressed using the coloring metaphor : for each n, the vertices of any finite graph can be colored in n colors so that, for each vertex v, at most 1/n of its neighbors (a.k.a. friends) are the same color as v. Problem 11 says that in a graph with 2n vertices, where each vertex has at least n neighbors (a.k.a. friends), there exists a Hamiltonian cycle, i.e., a closed path “around the table” that visits each knight once and goes from friend to friend.
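In the coloring language, the "fix it until it works" approach to Problem 10 can be sketched as follows. Be warned that this commits to one possible operation (recolor an unhappy vertex with the color least common among its neighbors) and one possible monovariant (the number of edges joining same-colored vertices); both are assumptions of this sketch, since the text leaves them to the reader. The function name `assign_favors` and the sample graph are hypothetical.

```python
from collections import Counter


def assign_favors(friends, n):
    """friends maps each person to the set of their friends (symmetric).

    Returns a dict person -> favor in range(n) such that at most 1/n of
    each person's friends share that person's favor.
    """
    favor = {p: 0 for p in friends}          # arbitrary start: all favor 0
    changed = True
    while changed:
        changed = False
        for p, fs in friends.items():
            counts = Counter(favor[q] for q in fs)
            if counts[favor[p]] * n > len(fs):           # condition fails at p
                # Switch p to the favor least common among p's friends; this
                # strictly lowers the number of same-favor friend pairs.
                favor[p] = min(range(n), key=lambda c: counts[c])
                changed = True
    return favor


# Hypothetical friendship graph on five guests.
friends = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2, 4}, 4: {3}}
print(assign_favors(friends, 2))
```

The loop must stop because each recoloring lowers a non-negative integer, which is the same termination argument used throughout this session.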

3. Not Getting There

So far we have used monovariants to study how a repeated process must eventually reach a certain type of state. But there is another, perhaps more self-evident use of monovariants, namely to show how a process cannot reach a certain final state from a certain initial state. The idea is simple: if a certain monovariant can only increase, there is no way to get from one state to another state where the monovariant’s value is smaller, and likewise if the monovariant can only decrease. If you recall our china-shop example from Part I (vol. I), you know that there is no way to reassemble a plate from a bunch of pieces by repeatedly dropping them on the floor, because this operation can only increase the total number of pieces, whereas having a single plate at the end requires decreasing the number of pieces. This idea can be presented as a PST, because that’s what it is:

PST 50. Suppose you have a system on which certain operations are performed repeatedly. To show that the system can never reach some state from some other state, try using a monovariant.

This use of monovariants is a natural generalization of invariants. Indeed, you can think of an invariant as a special type of monovariant – one that can never increase and can never decrease. But there are some problems of this sort where coming up with an appropriate invariant would be difficult or awkward, and a monovariant does the job easily.

3.1. Flea-ing in a straight line. Here’s one example:

Problem 12. (IMO ’00, adapted, [21]) Let n ≥ 2 be a positive integer, and let λ be a positive number less than 1/(n − 1). Suppose there are n fleas on a horizontal line.⁵ Whenever two fleas are at points A and B on the line, with A to the left of B, the flea at A may jump to the point C on the line to the right of B with BC/AB = λ. Show that there exists some initial position of the n fleas and some point M on the line such that it is not possible for all of the fleas to get to the right of M.

⁵ The problem does not preclude two or more fleas crowding at the same point.


3.1.1. Mono-search. The natural way to begin such problems is to coordinatize the line and identify the fleas with their respective coordinates. Thus, the rule BC/AB = λ (on the picture: |AB| = 1 and |BC| = λ) translates into coordinates as C − B = λ(B − A). So, a move consists of taking two fleas A and B, with A < B, and replacing A by C = B + λ(B − A) = (1 + λ)B − λA. This is called a linear function of the two coordinates A and B.

PST 51. If the operation is given by a linear function, it is reasonable to expect that the monovariant will also be given by some sort of linear function.

In the case of the fleas, that would be α_1 P_1 + α_2 P_2 + · · · + α_n P_n where the α_i’s are constants and the P_i’s are the positions of the fleas arranged from left to right on the line.

Notice that if the problem is correct, then actually we can’t even get the rightmost flea past M; otherwise we could jump the other fleas over it. So this suggests the rightmost flea, and its coefficient α_n, are special. Let’s hope, for simplicity, that all the other coefficients are equal. Finally, again because of linearity, we can divide everything by α_n to adjust the coefficient of P_n to 1. So, the conjectured monovariant function is P_n + α(P_1 + P_2 + · · · + P_{n−1}).

 PST 52. Adjust the coefficients of the linear function according to the specifics of the problem. If some variables are special (such as the largest coordinate), while the other variables play a symmetric role in the problem, then let the former have different coefficients, and the latter the same coefficients. Typically in such problems you will be able to rescale all coefficients so as to make one of them equal to 1.

To figure out α, let’s just think of the simplest case of two fleas P_1 < P_2. After P_1 jumps over P_2, P_1 is replaced by P_2 + λ(P_2 − P_1), which is also the rightmost point. Correspondingly, our function changes as P_2 + αP_1 → (P_2 + λ(P_2 − P_1)) + αP_2, and the net effect is RHS − LHS = (λ + α)(P_2 − P_1). We want this to be a decreasing monovariant, so the last quantity must be non-positive. As P_2 − P_1 > 0, we need only α ≤ −λ.

PST 53. When making a choice and in the absence of further restrictions, try the simplest possible option. For example, turn a non-strict inequality α ≤ −λ into an equality α = −λ.

Without hesitation then let’s choose α = −λ, which does not depend on the specific configuration of the fleas (excellent!). Our proposed monovariant is thus P_n − λ(P_1 + P_2 + · · · + P_{n−1}), called the value of the configuration. If there is a tie for the rightmost position, then choose one of the rightmost fleas for P_n and treat the other(s) as “regular” non-extreme fleas.


A formal proof is in order, but it will feel like a technical calculation compared to the hard (and creative!) work done above.

Exercise 4. If Pn is the rightmost flea, show that Pn − λ (P1 + P2 + · · · + Pn−1 ) can never increase when a flea jumps.

Hint: Consider two cases, depending on whether the rightmost flea does or does not change after the jump. Calculate the net effect of the jump on the value of the configuration, factor, and show that it is ≤ 0. ♦

3.1.2. Place-search. We still haven’t solved our problem. For all we know, our monovariant could decrease forever if the fleas could go on jumping forever, unless they all eventually land in the same location. (When does this happen?) However, the problem asks for something different: a place on the line over which the fleas cannot jump, i.e., for an impossible final configuration. It seems we again need to do some detective work.

Now consider any configuration V of the n fleas, with some value ν. According to PST 50, we should find a configuration W of the fleas that is unreachable from V; more precisely, whose value is larger than our value ν, so that the decreasing monovariant will prevent us from reaching it. If ω is the rightmost position in W, then the value of W is ω − λ (sum of fleas ≤ ω). How can we make sure this is larger than ν? By the Sandwich technique from the Induction session (vol. I), we squeeze in an obvious intermediate quantity:

\[ \omega - \lambda\,(\text{sum of fleas} \le \omega) \;\ge\; \omega - \lambda(n-1)\,\omega \;=\; \omega\,\bigl(1 - \lambda(n-1)\bigr) \;\overset{?}{>}\; \nu. \tag{3} \]

This may look intimidating, but all we did was replace each of the other fleas with ω, in order to decrease the overall value. To resolve the “?” before reaching ν at the end, let μ = 1 − (n − 1)λ. We finally see here why the condition λ < 1/(n−1) was required in the problem: to make our μ positive! Thus, we need ωμ > ν, i.e., any ω > ν/μ will do. The inequalities in (3) will be satisfied regardless of where the remaining fleas are, as long as they are to the left of ω. Our search over, a formal proof needs to recap the above points.
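The two claims above (the value never increases, and no flea gets past ν/μ) can be sanity-checked by a small random simulation. The sketch below is only an illustration under stated assumptions: the function names `value` and `simulate`, the starting positions 0, 1, 2, 3, and the choice λ = 1/4 are all hypothetical, and exact fractions are used so that rounding cannot blur the comparison.

```python
from fractions import Fraction
import random


def value(positions, lam):
    """Value of a configuration: rightmost flea minus lam times the sum of the others."""
    rightmost = max(positions)
    return rightmost - lam * (sum(positions) - rightmost)


def simulate(positions, lam, steps=200, seed=0):
    """Perform random legal jumps; record the value and the rightmost position after each."""
    rng = random.Random(seed)
    n = len(positions)
    mu = 1 - (n - 1) * lam
    assert lam > 0 and mu > 0          # the condition lam < 1/(n-1) from the problem
    bound = value(positions, lam) / mu  # the threshold nu/mu from the text
    positions = list(positions)
    values = [value(positions, lam)]
    rightmosts = [max(positions)]
    for _ in range(steps):
        # a flea at a may jump over a flea at b only if a is strictly left of b
        pairs = [(i, j) for i in range(n) for j in range(n)
                 if positions[i] < positions[j]]
        if not pairs:
            break
        i, j = rng.choice(pairs)
        a, b = positions[i], positions[j]
        positions[i] = b + lam * (b - a)
        values.append(value(positions, lam))
        rightmosts.append(max(positions))
    return values, rightmosts, bound


values, rightmosts, bound = simulate([Fraction(k) for k in range(4)], Fraction(1, 4))
print(bound, max(rightmosts))
```

With these assumed inputs, ν = 9/4 and μ = 1/4, so the threshold is ν/μ = 9. A simulation is of course no substitute for Exercises 4 and 5; it only shows the inequalities holding along one random sequence of jumps.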



Exercise 5. From any configuration with value ν, show that it is not possible for any one of the fleas to get to a position M > ν/μ, where μ = 1 − (n − 1)λ. Hint: Use inequalities (3), our monovariant, and a contradiction.



We have just proven that for any initial configuration, there is some point over which none of the fleas can ever jump. This is a stronger statement than the problem required, so we are certainly done. In fact, the statement of the original problem is slightly misleading in that it asks us to search for a whole configuration of fleas, along with an unreachable position. In reality, any flea configuration works, and any M > ν/μ is beyond the fleas’ reach.


6. MONOVARIANTS. PART II

3.2. Sleeping in an organized fashion. Here is another example of the same technique, in a less “numerical” setting.

Problem 13. 100 students are sitting at a 10 × 10 array of desks in a boring class. Each student is either asleep or awake. When at least two of a student’s immediate neighbors (vertically or horizontally) are asleep, the student may fall asleep; when at least two neighbors are awake, the student may wake up. At the end of class, there are ten students awake, and no two of them are sitting adjacent to each other. Prove that there were at least ten students awake at the beginning of the class.

If you play around a bit, you’ll quickly notice that the number of students awake at any given time is not a monovariant! For example, Figure 6 displays the bottom left corner of the grid; W stands for “awake” and S for “asleep”. The cell that is about to change its state is shaded, and the border of its neighborhood of adjacent students is thicker. Check that the number of awake students fluctuates up and down! Indeed, this problem requires being a little cleverer.

W W S      W W S      W W S      W W S
S S W  →   S S W  →   S W W  →   W W W
W S S      S S S      S S S      S S S

Figure 6. Number of awakes is not a monovariant!

 

The trick is similar to the idea behind Problem 10 about favors among friends. At each step, a student assimilates to the state of a majority of his or her neighbors, and hence the total amount of mismatch between students sitting next to each other never increases. That is, the number of pairs of adjacent students, with one asleep and one awake, always decreases or stays the same. This solves the problem, because we can bound the number of such pairs at the beginning and at the end of class. Well, almost – the count doesn’t quite work out correctly, because while most students can belong to up to 4 such pairs, those on the edges of the grid have only 2 or 3 neighbors. So, at the end with 10 awake students, we could have anything from 26 mismatched pairs (with 4 corner and 6 edge W ’s) to 40 mismatched pairs (with all 10 W ’s inside the grid). This is not enough information to conclude for sure that there were 10 awake students in the beginning. Hypothetically, we could have started with 36 mismatched pairs created by 9 awake non-adjacent students inside the grid, but ended with all W ’s having “migrated” to the border, e.g., 10 edge W ’s for a total of 30 mismatched pairs and 10 awake non-adjacent students. As predicted, the monovariant decreased (from 36 to 30), but we didn’t solve the problem!
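The counts quoted above can be reproduced with a few lines of code. The sketch below is a hedged illustration: the helper name `mismatched_pairs` and the two sample placements of awake students are assumptions chosen for the demonstration, and cells are addressed as (row, column) with indices 0 to 9.

```python
def mismatched_pairs(awake, size=10):
    """Count adjacent pairs (one awake, one asleep) inside a size x size grid."""
    count = 0
    for r in range(size):
        for c in range(size):
            # look only right and down, so each adjacent pair is counted once
            for dr, dc in ((0, 1), (1, 0)):
                r2, c2 = r + dr, c + dc
                if r2 < size and c2 < size:
                    if ((r, c) in awake) != ((r2, c2) in awake):
                        count += 1
    return count


# 4 corner W's and 6 edge W's, no two adjacent
border_case = {(0, 0), (0, 9), (9, 0), (9, 9),
               (0, 2), (0, 4), (0, 6), (2, 0), (4, 0), (6, 0)}
# 10 W's strictly inside the grid, no two adjacent
interior_case = {(1, 1), (1, 3), (1, 5), (1, 7), (3, 1),
                 (3, 3), (3, 5), (3, 7), (5, 1), (5, 3)}

print(mismatched_pairs(border_case), mismatched_pairs(interior_case))
```

Each corner W contributes 2 pairs, each edge W contributes 3, and each interior W contributes 4, which is why the same number of awake students can produce different counts.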


PST 54. When everything works perfectly inside a region and is slightly off along the border, you might remedy this inelegance by embedding the given region in a bigger region and imposing some trivial conditions outside the region in order to extend the problem there.

You may have seen this technique in another setting before. In Pascal’s Triangle, the defining rule “add two adjacent numbers to get the number directly below them” does not work for the 1s at the ends of the rows (why?). However, if you place 0s all around Pascal’s Triangle in the same triangular grid covering the whole plane, the rule will work for all numbers in the plane, except for the 1 at the top of the triangle.

In our present predicament, we embed the grid in a bigger grid. The “trivial” condition that will be imposed outside our original grid will be perpetual sleepiness.

Solution to Problem 13: Imagine the 10 × 10 grid as part of an infinitely large grid of students, with all students outside the small grid being always asleep. (If the idea of that many sleeping students scares you, it also works to imagine it as the central square in a 12 × 12 grid.) We need to be careful to check if anything changed along the border of the 10 × 10 grid:
• The students outside the 10 × 10 grid will continue to be perpetually asleep, as they neighbor at least 3 other such sleepers at all times.
• If a student along the border of the 10 × 10 grid could wake up before due to 2 (or 3) adjacent W ’s, then adding 1 (or 2) adjacent sleepers to that student will still allow the student to wake up (cf. the figure on the right).
• If a W on the border of the 10 × 10 grid had 2 adjacent W ’s (and at most 1 adjacent S) before, then adding S’s outside the grid will allow this W to switch to S, something that was not allowed in the original grid. Hence, our “extended” problem allows more possibilities for change of states, in particular, for falling asleep.

[Figure: border configurations of W ’s with sleepers S added outside the 10 × 10 grid.]

Nevertheless, we will show that the conclusion of the problem works in all new scenarios too. Now consider the number of pairs of (horizontally or vertically) adjacent students such that one is asleep and the other is awake. Each time a student falls asleep or wakes up following the rules described, at most 2 such pairs are created and at least 2 disappear, so the total number of such pairs can never increase: this is our monovariant. At the end of class, there are 10 students awake, and each has 4 sleeping neighbors, making 40 asleep-awake pairs. So at the beginning of the class, there must have also been at least 40 such pairs. Since each awake student can belong to at most 4 pairs (one for each neighbor), there must have been at least 10 students awake at the beginning of the class. □
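The monovariant of the extended problem can be watched in action with a short simulation. This is a minimal sketch under assumptions: it uses the 12 × 12 embedding mentioned in the solution, a random starting state, and hypothetical helper names (`neighbors`, `mismatches`, `simulate`); at each step a random student of the inner 10 × 10 block changes state whenever the rules allow it.

```python
import random

SIZE = 12  # the 10 x 10 class sits in the centre of a 12 x 12 grid of desks


def neighbors(r, c):
    """Horizontally and vertically adjacent cells that lie inside the 12 x 12 grid."""
    return [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= r + dr < SIZE and 0 <= c + dc < SIZE]


def mismatches(awake):
    """Number of adjacent asleep-awake pairs; 'nb > (r, c)' counts each pair once."""
    return sum(1 for r in range(SIZE) for c in range(SIZE)
               for nb in neighbors(r, c)
               if nb > (r, c) and ((r, c) in awake) != (nb in awake))


def simulate(steps=2000, seed=1):
    rng = random.Random(seed)
    inner = [(r, c) for r in range(1, 11) for c in range(1, 11)]
    awake = {cell for cell in inner if rng.random() < 0.5}  # outer ring stays asleep
    history = [mismatches(awake)]
    for _ in range(steps):
        cell = rng.choice(inner)
        nbs = neighbors(*cell)
        n_awake = sum(nb in awake for nb in nbs)
        n_asleep = len(nbs) - n_awake
        if cell in awake and n_asleep >= 2:
            awake.discard(cell)      # may fall asleep
        elif cell not in awake and n_awake >= 2:
            awake.add(cell)          # may wake up
        history.append(mismatches(awake))
    return awake, history


awake, history = simulate()
print(history[0], history[-1])
```

Every inner student has exactly 4 neighbors in the embedded grid, so a change of state trades at least 2 mismatched pairs for at most 2 new ones, and the recorded history never goes up.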


4. Conway’s Checkers

This session will close with one last, fairly complex example. It might get a little tiresome to go through so many disconnected examples of monovariants, but that’s no excuse for stopping here, because if you haven’t seen this problem, your life is not complete. The problem is credited to the great recreational mathematician John Horton Conway and is often called Conway’s Checkers or Conway’s Soldiers [9].

4.1. The setup is similar in spirit to the Escape of the Clones from the Invariants session (vol. I), played on a grid infinite to the right and up. Every cell had at most one clone. A clone sprouted one clone in the cell to the right and another in the cell directly above, and then disappeared. Given a “prison” fence enclosing some (or all) clones, the task was to free the clones from the prison. Let’s see how Conway’s problem differs.

Problem 14. (Conway’s Checkers) Imagine that you have an infinite square grid, with a particular horizontal line of the grid designated. You play the following game:
(a) First, you may initially place checkers in the squares below the line – as many as you want, but no more than one checker per square.
(b) Then, you may take a checker and jump it over a checker that is adjacent to it – in any of the four directions – into the square immediately beyond, if that space is vacant. In the process, you remove the checker that has been jumped over (cf. Fig. 7).
(c) You may continue jumping checkers, as long as there are two checkers adjacent to each other somewhere.
The goal is to get some checker to be as far above the designated line as possible. What is the highest row that can be reached?

Figure 7. Checker-jumping legal and illegal moves

Figure 7a–b display legal moves in all four directions and their results, and Figure 7c warns against illegal moves: no jumping over several checkers or into an occupied square, and no diagonal jumping!

4.2. Initial victory. Check that you can get a checker to the first row above the designated line, by simply starting with two checkers stacked just below the line and then jumping upward. With four checkers and a series of three jumps, you can get a checker to the second row above the line (cf. Fig. 8).
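The jumping rule is easy to encode, which makes it possible to replay small cases mechanically. The following sketch is illustrative only: the `jump` helper, the (column, row) coordinates with rows ≤ 0 below the designated line, and the particular four-checker start are assumptions, and the start shown is one configuration that works, not necessarily the one drawn in Figure 8.

```python
LEGAL_DIRECTIONS = {(1, 0), (-1, 0), (0, 1), (0, -1)}  # right, left, up, down


def jump(checkers, start, direction):
    """Jump the checker at `start` over its neighbor in `direction`; return the new set."""
    if direction not in LEGAL_DIRECTIONS:
        raise ValueError("only horizontal or vertical jumps are legal")
    (col, row), (dc, dr) = start, direction
    over = (col + dc, row + dr)
    land = (col + 2 * dc, row + 2 * dr)
    if start not in checkers or over not in checkers or land in checkers:
        raise ValueError("illegal jump")
    # the jumped-over checker is removed, the jumping checker moves two squares
    return (checkers - {start, over}) | {land}


# Four checkers, all in rows <= 0, i.e., below the designated line.
board = frozenset({(0, -1), (0, 0), (1, 0), (2, 0)})
board = jump(board, (0, -1), (0, 1))   # up: lands on (0, 1), removes (0, 0)
board = jump(board, (2, 0), (-1, 0))   # left: lands on (0, 0), removes (1, 0)
board = jump(board, (0, 0), (0, 1))    # up: lands on (0, 2), removes (0, 1)
print(sorted(board))
```

After the three jumps a single checker remains, sitting in the second row above the line. The same helper can be reused to test candidate configurations for higher rows.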


Exercise 6. Find a way to get to the third row above the line. Then try to get to the fourth row. Hint: Figure 8c contains the initial configuration for reaching row 1, only moved up a row. In general, suppose you have reached row k from some initial configuration Fk . Shift Fk one row up, and try to reach it from some new configuration Fk+1 . If you are successful, then your previous transformation of Fk will result in a checker in row k + 1. There could be other ways. ♦

Figure 8. Getting to the second row

Here is the general principle that may have helped you so far:

 PST 55. Re-use inductively your solution for a previous case inside your solution for the next case.

The situation looks hopeful! Pushing on:

Exercise 7. Try to find a way to get a checker up to the fifth row. Become frustrated.

As the last exercise foreshadowed, and fairly remarkably, it’s impossible to get a checker more than four rows up, no matter how many checkers you place in the first stage of the game. Can we prove this?

4.3. Is there an invariant? First, let’s try using an invariant, along the lines of the Escape of the Clones. Can we assign a number to each square, so that the sum of the numbers of squares with checkers in them stays constant at each step?

[Figure: three successive squares labeled a, b, c, shown once for each jump direction.]

Figure 9. Invariant in both directions

Consider three successive squares (cf. Fig. 9). Suppose we want to write the numbers a, b, c in them so that the sum of the occupied squares stays constant. One legal move is to jump a checker from the a square to the c square, removing the checker on b in the process. For the sum to be invariant under this move, we must have a + b = c. Similarly for a jump in the opposite direction, we must have c + b = a. But these two equations add to give b = 0. This argument shows that the number written in every square must be zero! That doesn’t give us a very useful invariant.


4.4. Modifying an invariant. Maybe we ask too much from the invariant?

PST 56. If imposing too many conditions leads to a trivial solution, try relaxing or dropping some of these conditions. In the case of Conway’s checkers, instead of having an invariant in all four directions (moving to the right, left, up, and down), try to make the sum invariant only under jumps in certain directions, e.g., only right and up.

First, let’s focus on a single row. Let’s choose numbers to write in the squares so that the sum stays invariant when we jump to the right. Just as powers of 1/2 were the natural choice to use in the Escape of the Clones, so are powers x^n of some unknown x in order here. Three successive squares with x^n, x^{n+1}, x^{n+2}, written from left to right, will accommodate our desired invariant only if x^n + x^{n+1} = x^{n+2}. Dividing by x^n leads to a famous quadratic equation: x^2 − x − 1 = 0, and the quadratic formula gives x1 = (1 + √5)/2 ≈ 1.618 > 1 and x2 = (1 − √5)/2 ≈ −0.618. The larger root x1 is denoted by φ and is known as the golden ratio.

To make it easier to express things concretely, coordinatize the squares in a row with integers (increasing as we move to the right). If we write φ^n in square n, we have ensured that the sum of the numbers of the squares containing checkers stays invariant under jumps to the right: φ^n + φ^{n+1} = φ^{n+2}. On the other hand, a jump to the left consists of replacing φ^{n+2} + φ^{n+1} by φ^n, thus always decreasing the sum (why?).

Now, what about jumps up and down, trying to be invariant only up? For any column, an analogous discussion leads us to assign powers φ^n, with n increasing as we move up the column. However, this column intersects our previously discussed row, and there is already some power φ^m assigned to the square in the intersection. We need to shift all powers in our column up or down in order to match this φ^m. There is a simple algebraic way of reconciling the numbers in all rows and columns: we assign vertical as well as horizontal coordinates to the squares, and then write φ^{m+n} in square (m, n). This multiplies all numbers in our column by the same φ^m, and all numbers in our row by the same φ^n, without changing the properties that we would like: the sum of the numbers in the checkers’ squares will
• stay the same under rightward and upward jumps, and
• decrease under leftward and downward jumps.

4.5. Symmetry gets rid of infinities. Even though we now have turned the sum into a monovariant, this doesn’t quite work to solve the problem. We want to show that the fifth row is unreachable by arguing the sum of the original checkers is not large enough. But with the numbering scheme just described, there exist squares with arbitrarily large numbers, even below the designated line: since φ > 1, we have a real problem with the positive powers φ^m. In particular, we need the sum in a single row to be finite, but what we have now is this:


Lemma 3. For any row, the assignment {φ^m} for any integer m yields an infinite sum for half of the row and a finite sum for the other half:
(a) 1 + φ + φ^2 + φ^3 + · · · + φ^m + · · · = ∞.
(b) 1 + φ^{−1} + φ^{−2} + φ^{−3} + · · · + φ^{−m} + · · · = 1/(1 − φ^{−1}) = φ^2.

Proof: Part (a) is self-evident, as all numbers there are ≥ 1. The sum in part (b) is a geometric series a + ar + ar^2 + · · · + ar^m + · · · , where every next term is the previous multiplied by the ratio r. Provided r is small, namely, −1 < r < 1, the sum adds up to a/(1 − r) (cf. the Invariants session, vol. I). In our case, a = 1 and r = φ^{−1}, and 1/(1 − φ^{−1}) = φ/(φ − 1) = φ^2 because (φ − 1)φ = φ^2 − φ = 1 and hence 1/(φ − 1) = φ. □

The ∞ in part (a) is worrisome, showing that our argument will never come together. We need to get rid of the large powers φ^m, while at the same time ensure that what happens when jumping left (the sum decreases) will also happen when jumping right! The way to do this is to:

 PST 57. Choose a central object and symmetrize the rest with respect to it. Specifically, for our checkers choose a central column, have the numbers be highest in that column, and decrease as you go away from it along any row.

To put it more directly, choose the “central” column to be the one with m-coordinate 0, and as before assign to it all powers {φ^n}. Row n intersects this central column in φ^n, so decrease the powers of φ as you move away from φ^n along row n, either to the right or the left: . . . , φ^{n−2}, φ^{n−1}, φ^n, φ^{n−1}, φ^{n−2}, . . . . This boils down to replacing φ^m → φ^{−|m|} in our previous formula and arriving at the pretty V -shape pattern shown below.

φ^7    φ^8    φ^9    φ^10   φ^9    φ^8    φ^7
φ^6    φ^7    φ^8    φ^9    φ^8    φ^7    φ^6
φ^5    φ^6    φ^7    φ^8    φ^7    φ^6    φ^5
φ^4    φ^5    φ^6    φ^7    φ^6    φ^5    φ^4
φ^3    φ^4    φ^5    φ^6    φ^5    φ^4    φ^3
φ^2    φ^3    φ^4    φ^5    φ^4    φ^3    φ^2
φ      φ^2    φ^3    φ^4    φ^3    φ^2    φ
1      φ      φ^2    φ^3    φ^2    φ      1
φ^−1   1      φ      φ^2    φ      1      φ^−1
φ^−2   φ^−1   1      φ      1      φ^−1   φ^−2
φ^−3   φ^−2   φ^−1   1      φ^−1   φ^−2   φ^−3

Exercise 8. Suppose the number φ^{−|m|+n} is assigned to square (m, n), for all m and n. Check that whenever a jump is made, the sum of the numbers in squares occupied by checkers will either stay the same or decrease.

Hint: More precisely, the sum of the occupied squares will
• stay the same if you jump up or towards the central column;
• decrease if you jump down, over, or away from the central column. ♦

Now that we have modified our monovariant, let’s see if we have resolved our previous difficulty of having infinite sums.

Exercise 9. Using the formula for the sum of a geometric series, check that the sum of all the numbers on or below row 0 is exactly φ^5.
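Before proving the claims of Exercises 8 and 9, it can be reassuring to test them numerically. The sketch below is a floating-point sanity check, not a proof: the helper names, the search radius of 6, and the truncation of the infinite sum to 200 rows and 200 columns on each side are all assumptions made for illustration.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # the golden ratio


def weight(m, n):
    """Number written in square (m, n): phi to the power n - |m|."""
    return PHI ** (n - abs(m))


def jump_change(start, direction):
    """Change in the total when a checker jumps from `start` two squares in `direction`."""
    (m, n), (dm, dn) = start, direction
    over = (m + dm, n + dn)
    land = (m + 2 * dm, n + 2 * dn)
    return weight(*land) - weight(*start) - weight(*over)


def max_jump_change(radius=6):
    """Largest change over all jumps starting within `radius` of the origin."""
    directions = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return max(jump_change((m, n), d)
               for m in range(-radius, radius + 1)
               for n in range(-radius, radius + 1)
               for d in directions)


def sum_on_or_below_row0(depth=200, width=200):
    """Truncated sum of the weights of all squares in rows 0, -1, ..., -depth."""
    return math.fsum(weight(m, n)
                     for n in range(-depth, 1)
                     for m in range(-width, width + 1))


print(max_jump_change(), sum_on_or_below_row0(), PHI ** 5)
```

Up to rounding, the largest change produced by any jump is 0 (attained by jumps up or towards the central column), and the truncated sum agrees with φ^5 ≈ 11.09 because the discarded terms are smaller than φ^{−200}.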


Note that, if you add up all the numbers in the grid, you will inevitably get ∞; indeed, any one column alone will yield an infinite sum (why?). But as it will turn out, for our solution we do not need to add up all numbers and we will not do that.

4.6. What’s stopping us from reaching the 5th row? Now we have all the pieces in place to explain this “mystery”.

Solution to Problem 14: In Exercise 6, we saw that it was possible to get checkers as high as the 4th row above the designated line. We claim that it is not possible to get a checker to the 5th row, which establishes the answer: the 4th row is the highest possible.

Suppose, for a contradiction, that it is possible to get a checker to the 5th row. Coordinatize the grid so that the row just below the designated line is row 0 and the row just above it is row 1, and so that the alleged 5th-row checker lands in column 0, i.e., in square (0, 5). As before, for each m, n, let the number φ^{−|m|+n} be written in the square in column m and row n. Now consider the sum of all the numbers written in squares containing a checker. Initially, checkers exist only in the squares below the line, so their sum is at most φ^5 according to Exercise 9. Furthermore, Exercise 8 showed that our sum will either stay the same or decrease with each jump, and so it will always be ≤ φ^5. But we assumed that we can eventually get a checker to the square (0, 5), which has the number φ^5 written in it, making the sum ≥ φ^5. This means that the sum must have started from φ^5 and ended also at φ^5, i.e., it stayed constant throughout the whole game! So the sum is, after all, an invariant?

Wait a minute! To have an initial sum of φ^5 we must have started with all checkers that are below the designated line. To have a final sum of φ^5 concentrated in one square, (0, 5), means that we ended the game with exactly one checker: having more checkers in the end will bump up the sum beyond φ^5. So, we converted an initial configuration with infinitely many checkers into a single checker? But that would take an infinite number of jumps, while the game is finite: we said we reached the 5th row, which ended the game! This is our contradiction: after all, ∞ is not finite. □

4.7. Monologue on the monovariant. Our numbering of the squares went through several stages before reaching its final shape:

(1) a “universal” invariant that ended up in 0s all over and made us give up on the invariant idea and ask for less.

(2) a “universal” monovariant that mimicked the Escape of the Clones solution, i.e., had φ^{m+n} in square (m, n), for all m and n. But since our grid is infinite in all directions (not just two, as in the Clones problem), this led to a disaster: we had arbitrarily large numbers φ^n in a monovariant that was decreasing, thereby cutting off our chance of reaching a contradiction.


(3) a “modified” monovariant that removed these arbitrarily large powers from each row by placing φ^{−|m|+n} in square (m, n), for all m and n. Now every individual row had a finite sum, and moreover, all rows in half of the grid added up to a finite number (φ^5). This made it possible to squeeze out the desired contradiction.

But in (3) we avoided some parts of the grid, for otherwise we would be stuck with ∞ as soon as we tried to add up a single column! There was also the somewhat mysterious sum of φ^5 that appeared twice in our solution and the lonely line of symmetry: the central column. Without messing up our monovariant solution, here is one pretty (but not necessary) final re-shaping:

(4) a “doubly-symmetric” monovariant that puts 1 in the position (0, 5); φ^{−1} in the four adjacent positions (above, below, right and left), forming a square; φ^{−2} in the 8 positions directly outside this square, forming a larger square; φ^{−3} in the 12 positions directly outside those, and so on. The whole plane will be covered this way with powers φ^n for n ≤ 0 (cf. Fig. 10).

[Figure: diamond-shaped layers of squares around the central 1, labeled ρ, ρ^2, ρ^3, ρ^4, ρ^5 going outward.]

Figure 10. Alternative numbering, ρ = φ^{−1}

Exercise 10. With the alternative numbering in (4), show that the sum of occupied squares is a monovariant as follows: moving towards the row or the column of 1 keeps it constant, while moving away from or over the row or the column of 1 decreases it. Finally, show that the sum below the designated row is 1, while the total sum of the grid is still finite.

Obviously, our previous solution will carry over with no blips because the drastic changes happened only to rows above (0, 5), while the rows up to the fifth row were only rescaled by φ^{−5} . . . and that is the only region where, as it turned out, any checkers moved.


5. Hints and Solutions to Selected Problems

Problem 2. Look at the sum of all the numbers: it increases in the example, and in fact it always increases. Indeed, using the notation from the hint, any two numbers a and b among the given will be replaced by d and l = dkm, so the sum will go from a + b = d(k + m) to d + l = d(1 + km). To see why 1 + km ≥ k + m, move everything to the LHS and factor: 1 − k + km − m = (k − 1)(m − 1) ≥ 0. Because k, m ≥ 1, equality occurs iff k = 1 or m = 1, which boils down to a = d or b = d, and a|b or b|a. Thus, the sum will remain the same iff one of the numbers divides the other (and in this case, the two numbers will not change either), and the sum will strictly increase otherwise.

Note that the operation does not change the overall lcm of all numbers. This is so because lcm(a, b) = l = lcm(d, l). Thus, the largest number we can possibly write down is L = lcm(a1 , a2 , . . . , an ), the lcm of all given numbers a1 , a2 , . . . , an . This puts a (very rough) upper bound on the total sum: at most nL, and as the sum increases by steps of 1 or more, it eventually must stop changing. By our argument above, this means that for any two of the given numbers, one divides the other, and hence the numbers themselves stop changing at that point. □

Problem 3. In part (a), we have ab = d^2 km = dl, i.e., the total product Π = a1 a2 · · · an is an invariant. This could have been used instead of the lcm above to show that the process will terminate (how?). It also explains what really happens in parts (b)-(c): the process reshuffles and recombines the factors of the given numbers, without dropping or creating new factors.

From our solution to Problem 2 we know that the process terminates when, for any two numbers, one divides the other. Arrange the final numbers ci in increasing order. Then we must have a chain of divisibilities: c1 |c2 |c3 | · · · |cn−1 |cn . There are further restrictions. If p is a prime that divides some ai , let p^{α1} ≤ p^{α2} ≤ · · · ≤ p^{αn} be all prime powers that divide the original numbers. It is easy to see that these prime powers cannot combine or split into two different numbers during the process (why?). Hence, they will end up dividing the resulting final numbers on the board, i.e., p^{αi} |ci for all i = 1, 2, . . . , n. Thus, the original prime powers are only reshuffled to make the final numbers, thereby completely determining the final outcome of the process, regardless of the order in which we conduct the process.

An interesting twist is that the length of the process does depend on our choices. In addition to the 2-step example on page 143, we can complete the process in a longer way:
• {2, 3, 5, 15} → {1, 3, 10, 15} → {1, 3, 5, 30} → {1, 1, 15, 30}. □

Problem 4. For the case of M numbers on the circle, to track the contributions of a single label a1 to all labels, set a1 = 1 and all other ai = 0, and perform the addition process. You will quickly see a famous pattern, encoded in Pascal’s Triangle:


1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
1 8 28 56 70 56 28 8 1

The 1 at the top stands for a1 = 1. The second row shows the contribution of this 1 after one iteration: the labels will be {1, 1} and 0s elsewhere. The third row shows the labels after two iterations: {1, 2, 1} and 0s elsewhere; 2 is in the spot of the original a1 . The fifth row shows {1, 4, 6, 4, 1} = {a4 , a5 , a1 , a2 , a3 }, unless we start with only 4 numbers: then the two 1s come together in the spot of a3 , yielding labels {6, 4, 2, 4} = {a1 , a2 , a3 , a4 }, i.e., 0s (mod 2) everywhere. Analogously, the total final contribution of any number in any other slot after 4 iterations will also be {0, 0, 0, 0}. We conclude that for 2^2 numbers 2^2 iterations will zero-in everything.

For 2^3 = 8 numbers around the circle, the 9th row displays only evens, except for the end 1s, which will come together to add up on the spot of a5 (opposite to a1 ), yielding 2 ≡ 0 (mod 2) and showing that 8 iterations will zero-in everything. To complete the most general case of 2^m numbers, show that all binomial coefficients \binom{2^m}{j} are even for j ≠ 0, 2^m. ♦

Problem 5. Let x, y, z, t and q be the five numbers at some point, with y < 0. Let S = x + y + z + t + q be the total sum. It is clear that S is invariant under the operation: (x + y) − y + (z + y) + t + q = x + y + z + t + q = S. Following the hint, we look at the set of all consecutive sums of 1, 2, 3, and 4 numbers. To make sure that we do not overcount, we will begin with a vertex and list all four sums that start with it, going clockwise; and then we will list what happens to that sum under the operation.

First, let’s do this on the concrete example in Figure 4. Starting from the top vertex, the 20 sums are (we skip the total sum as it is an invariant):
• {5, 3, 6, 6, −2, 1, 1, 0, 3, 3, 2, 7, 0, −1, 4, 2, −1, 4, 2, 5}
• {3, 5, 6, 6, 2, 3, 3, 2, 1, 1, 0, 3, 0, −1, 2, 4, −1, 2, 4, 5}
• {2, 4, 5, 4, 2, 3, 2, 3, 1, 0, 1, 3, −1, 0, 2, 4, 1, 3, 5, 6}
• {2, 4, 4, 5, 2, 2, 3, 3, 0, 1, 1, 3, 1, 1, 3, 5, 0, 2, 4, 4}

Try to match the 20 numbers from one list to the next. If you complete the matching, you will see that at every step two numbers refuse to match anything; for example, in the first transition, we are left with −2 → 2 and 7 → 3.

Now, let’s do this in general, using variables. Starting the sums with x:
• x → x + y, x + y → (x + y) − y = x, i.e., x ↔ x + y (switch);
• x + y + z and x + y + z + t (go to themselves).


“Miraculously,” the set of sums starting with x does not change as a whole! Similarly for the sums starting with t or q:
• q + x ↔ q + x + y, t + q + x ↔ t + q + x + y (switch).
• t, q, t + q, q + x + y + z (go to themselves);
Finally, starting with y or z:
• y + z ↔ z; y + z + t ↔ z + t; y + z + t + q ↔ z + t + q (switch);
• y → −y and z + q + t + x → z + q + t + x + 2y.

So the only change to the set of sums occurs when y → −y and S − y → S + y: these are the two sums without matching partners in our example! Taking absolute values, we have no change for |y| = | − y|, but |S − y| = S + (−y) > |S + y| (why?). (In our example, |S − y| transitions from 7 to 3, then from 6 to 4, and again from 6 to 4, dropping down every time by −2y.) Consequently, the sum of all absolute values of the 21 possible sums goes down by an integer value. This is our monovariant! Since this sum is always positive (why?), it must stop decreasing after a while, i.e., the process must terminate. □

Try repeating the same argument for 3, 4, and 6 numbers.

Exercise 1. When n is large enough that fn has reached its eventual constant value c (odd), then fn = fn+1 = c implies (fn + fn+1 )/2 = c, so indeed fn+2 = c. At the same time, gcd(fn , fn+1 ) = gcd(c, c) = c for all n from now on. But what happens before the sequence stabilizes?

Following the hint, for any n, let d = gcd(fn−1 , fn ). Thus, d |fn−1 and d |fn , and hence d also divides the sum fn−1 + fn . Since both fn−1 and fn are odd, then d must be odd too; this means that we can divide the sum by 2 without affecting d, i.e., d |(fn−1 + fn )/2. But fn+1 is the largest odd divisor of this average, so d |fn+1 . Ordinarily, the gcd of a subset of numbers {fn−1 , fn } is greater than or equal to the gcd of the whole set {fn−1 , fn , fn+1 }. However, in our case, adding fn+1 to {fn−1 , fn } does not decrease the gcd (as fn+1 is already divisible by that gcd), so gcd(fn−1 , fn ) = gcd(fn−1 , fn , fn+1 ) = d. We are half done.

Now, let e = gcd(fn , fn+1 ), and hence e |fn and e |fn+1 . By definition, fn+1 |(fn−1 + fn )/2, therefore, e |(fn−1 + fn ). But e already divides the summand fn ; hence e must divide the other summand fn−1 , i.e., gcd(fn , fn+1 ) | fn−1 . Analogously as above, gcd(fn , fn+1 ) = gcd(fn−1 , fn , fn+1 ) = e. Combining the two conclusions, gcd(fn−1 , fn ) = gcd(fn , fn+1 ) for any n, i.e., the gcd of two consecutive numbers is an invariant.

To finish the exercise, on the one hand gcd(f1 , f2 ) = gcd(fn , fn+1 ) for any n, and on the other hand c = gcd(fn , fn+1 ) for n large enough (when {fn } stabilizes). So the constant value of the sequence is gcd(a, b). □

Problem 7. Recall (from Exercise 4 in Proofs I, volume I, about arithmetic operations on irrational and rational numbers) that the difference, sum, product, or ratio of an irrational and a (non-zero) rational number is irrational (proven by contradiction). Now, if x0 is irrational (x0 ∈ I), then by induction, all xk ∈ I. Indeed, if some xk−1 ∈ I, then pk /xk−1 ∈ I (as pk is


a prime and hence rational). As the floor function ⌊x⌋ outputs only integers (hence rational), the fractional part {x} = x − ⌊x⌋ of a number preserves the rationality/irrationality of x. Putting these together, xk = {pk /xk−1 } is also irrational. Hence the sequence will never reach 0.

On the contrary, if x0 is rational (x0 ∈ Q), the sequence will reach 0. Indeed, similarly as above, all xk ∈ Q (xk ≥ 0), so we can write them as xk = ak /bk for some relatively prime positive integers ak and bk , unless xk becomes 0. We will show that the denominators {bk } decrease in the process. Since 0 < x0 < 1, a0 < b0 , then x1 = {p1 /x0 } = {p1 /(a0 /b0 )} = {p1 b0 /a0 } = a1 /b1 < 1. The fractional part {x} changes only the numerator, but not the denominator of the rational x (why?). Therefore, b1 = a0 , or b1 < a0 if there is some reduction of the fraction a1 /b1 . In either case, b1 ≤ a0 < b0 , so the denominator of x1 is smaller than that of x0 . To push this argument through induction (which we leave to the reader), you will also need to use a1 < b1 , which follows from {x} < 1 for all x.

But the sequence of (positive) denominators {bk } cannot decrease forever. Hence, the process terminates, implying that some xk = 0. ♦

Problem 8. The averages bk for the given example are 9, 5, 4, 3, 3, 3, . . .. While the original sequence may not be decreasing, the sequence of averages seems to be decreasing. Let’s prove it. A standard way to proceed is to set up the inequality bk+1 ≤? bk and work backward until you get something like ak+1 ≤? bk (do it!), but the latter is not easy to prove either. One thing that becomes evident in these calculations is that there must be some relationship between the three involved quantities bk+1 , bk , and ak+1 . Let’s find it:

\[ b_{k+1} = \frac{a_1 + a_2 + \cdots + a_k + a_{k+1}}{k+1} = \frac{k\,b_k + a_{k+1}}{k+1} = b_k \cdot \frac{k}{k+1} + \frac{a_{k+1}}{k+1}. \]
This relationship is actually true for any sequence $\{a_n\}$ with average sequence $\{b_n\}$. Now we are on the right track: since $k/(k+1) < 1$ and $a_{k+1} < k+1$, then $b_{k+1} < b_k + 1$. As both sides are integers, we can be more precise: $b_{k+1} \le b_k$, which is what we were after! The sequence $\{b_k\}$ is non-increasing and consists of positive integers, so it must stabilize. But as soon as $b_{k+1} = b_k$, our inequalities above turn into equalities, showing $a_{k+1} = b_k$. Thus, the original sequence $\{a_n\}$ stabilizes at the same value.

Exercise 3. The triangles on the right are equilateral. The fourth point of the first configuration is the center of the triangle and in the second configuration it is a point below the center. In each case, the two possible non-intersecting pairings are marked in solid or dashed segments. In the first configuration both pairings are minimal (they have the same length). In the second configuration the solid pairing is longer than the other, hence it is non-minimal, but still a correct pairing.

Problem 10. Starting from an arbitrary assignment of party favors to people, design the following operation. Whenever some person P violates the condition, notice that there is some favor that at most 1/n of P’s friends have, so reassign P to this favor. For example, in Figure 11a, n = 5 and P’s friends are split into 5 groups according to their favors F1, . . . , F5. Originally, P has favor F2, so he is connected to each friend in the group with F2. However, this group is larger than 1/5 of P’s friends. This means that another group, namely, the group with F5, is smaller than 1/5 of P’s friends, so we change P’s favor to F5 and connect him to everyone in that group. Check that each time the operation is performed, the number of pairs of friends who have the same favor decreases. This monovariant cannot decrease forever, and eventually the desired situation will be achieved.
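The reassignment operation of Problem 10 can be tried out on a small example. What follows is only a minimal sketch, not part of the original solution: the function names, the dictionary representation of friendships, and the rule of always moving the first offender to a least popular favor among his friends are assumptions made for illustration.

```python
from collections import Counter

def same_favor_pairs(friends, favor):
    """Count friend pairs {u, v} holding the same favor (the monovariant)."""
    return sum(1 for u in friends for v in friends[u]
               if u < v and favor[u] == favor[v])

def violates(person, friends, favor, n_favors):
    """True if more than 1/n of the person's friends share the person's favor."""
    same = sum(1 for f in friends[person] if favor[f] == favor[person])
    return same * n_favors > len(friends[person])

def rebalance(friends, favor, n_favors):
    """Repeat the reassignment until nobody violates the condition.

    Returns the final assignment and the recorded values of the monovariant.
    """
    favor = dict(favor)
    history = [same_favor_pairs(friends, favor)]
    while True:
        offenders = [p for p in friends if violates(p, friends, favor, n_favors)]
        if not offenders:
            return favor, history
        p = offenders[0]
        counts = Counter(favor[f] for f in friends[p])
        # Some favor is held by at most 1/n of p's friends; take a least popular one.
        favor[p] = min(range(n_favors), key=lambda k: counts[k])
        history.append(same_favor_pairs(friends, favor))

# Hypothetical example: four mutual friends, two favors, everyone starts with favor 0.
friends = {p: [q for q in range(4) if q != p] for p in range(4)}
initial = {p: 0 for p in range(4)}
final, history = rebalance(friends, initial, n_favors=2)
```

In this sketch `history` records the number of same-favor friend pairs after every reassignment, so the claim that the monovariant strictly decreases can be inspected directly.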

Figure 11. Favors and Knights

Problem 11. Seat the knights in any random order around the table (and make sure they don’t kill each other while you are rearranging them!) Suppose two enemies are sitting next to each other. Call them $A$ and $B$, going clockwise (CW) around the table (cf. Fig. 11b). Starting from $A$ and going counterclockwise (CCW), let $n$ of $A$’s friends be $A_1, A_2, \ldots, A_n$. For any $A_m$, let $B_m$ be his CW neighbor. (Some $B_m$’s will coincide with $A_{m-1}$’s or $B_1 = A$, but that won’t affect our solution.) Among the $n$ knights $B_1, B_2, \ldots, B_n$, one must be a friend of $B$, say, $B_k$ ($\ne A$). We have the CW arrangement: $A, B, \ldots, A_k, B_k, \ldots$, with friends $\{A, A_k\}$, friends $\{B, B_k\}$, and adjacent enemies $\{A, B\}$. The arc of the table that we will switch goes CCW from $A$ to $B_k$ (cf. the dashed arcs in Figures 11b–c). The switch creates two new adjacent friendly pairs $\{A, A_k\}$ and $\{B, B_k\}$, while splitting the original adjacent enemy pair $\{A, B\}$ and not affecting other adjacent pairs (except for, possibly, flipping their order of appearance around the table). As a result, we have reduced by 1 or 2 the number of adjacent enemy pairs. This is our monovariant, which will keep decreasing until it hits 0 and only friends sit next to each other.

Exercise 4. Suppose that the flea at $A$ jumps over the flea at $B$ to $B + \lambda(B - A)$. Prior to the jump, suppose the rightmost of the $n$ fleas is at position $C$ (which may be equal to $B$). There are two possibilities.


Case 1. If $C$ is still the rightmost position after the jump (cf. Fig. 12a), then the only change in the value happens with the flea at $A$: $-\lambda A \to -\lambda\big(B + \lambda(B - A)\big)$. This is a net change of
$$-\lambda\big(B + \lambda(B - A) - A\big) = -\lambda(1 + \lambda)(B - A) < 0,$$
so the value of the configuration has decreased.

Figure 12. Preservation/Change of “leadership” among the fleas

Case 2. If the flea has jumped farther right than $C$ (cf. Fig. 12b), then the new value is $\big(B + \lambda(B - A)\big) - \lambda\big(C + B + (\text{other terms})\big)$, and the net change is
$$\big(B + \lambda(B - A) - \lambda C\big) - (C - \lambda A) = (1 + \lambda)(B - C) \le 0.$$
(We know that $B \le C$ because $C$ was originally the rightmost flea.) So the value may stay the same, but it still cannot increase.

Exercise 5. Suppose otherwise. Then the rightmost flea is at some position ω > ν/μ and all the other fleas are at positions ≤ ω, which means the value of the configuration is:

ω − λ(sum of fleas ≤ ω) > ν. (3)

So getting any flea to a position > ω requires the value of the configuration to increase, and we have shown in Exercise 4 that this can never happen.
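The two cases of Exercise 4 can also be checked by brute force. The sketch below assumes, following the formulas in Cases 1 and 2, that the value of a configuration is the rightmost position minus λ times the sum of all the other positions; the function names, the choice λ = 1/2, and the random starting positions are illustrative assumptions only. Exact fractions are used so that no rounding can blur the comparison.

```python
import random
from fractions import Fraction

def value(positions, lam):
    """Rightmost position minus lam times the sum of all the other positions."""
    rightmost = max(positions)
    return rightmost - lam * (sum(positions) - rightmost)

def jump(positions, i, j, lam):
    """Flea i (at A) jumps over flea j (at B > A) and lands at B + lam*(B - A)."""
    a, b = positions[i], positions[j]
    if not a < b:
        raise ValueError("the jumping flea must start to the left of the other one")
    new_positions = list(positions)
    new_positions[i] = b + lam * (b - a)
    return new_positions

def simulate(n_fleas=5, steps=200, lam=Fraction(1, 2), seed=0):
    """Perform random legal jumps and record the value after each one."""
    rng = random.Random(seed)
    positions = [Fraction(x) for x in rng.sample(range(100), n_fleas)]
    values = [value(positions, lam)]
    for _ in range(steps):
        i, j = rng.sample(range(n_fleas), 2)
        if positions[i] == positions[j]:
            continue  # two fleas on the same spot: no jump is possible
        if positions[i] > positions[j]:
            i, j = j, i
        positions = jump(positions, i, j, lam)
        values.append(value(positions, lam))
    return values

values = simulate()
```

Every entry of `values` should be no larger than the one before it, which is exactly the monovariant property that Exercise 5 relies on.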

Figure 13. Reaching the third row in an inductive way



Exercise 6. Figure 13 uses the inductive idea: it starts with 10 checkers, reduces them to the 4-checker configuration from Figure 8 shifted up a row, and ends in one checker in row 3. A faster way to reach row 3 is to start with 8 checkers as in Figure 14; some steps require you to find the correct order to follow the arrows.  To reach row 4, Figure 15 builds upon our previous 8-checker configuration. However, there is a solution starting with only 20 checkers. 


Figure 14. Reaching the third row in an efficient way

Figure 15. Reaching the fourth row in an inductive way

Exercise 8. Jumping up or toward the central column (whether left or right) replaces $\varphi^k + \varphi^{k+1} \to \varphi^{k+2}$; but the two sides are equal since $1 + \varphi = \varphi^2$. Jumping down or away from the central column replaces $\varphi^{k+2} + \varphi^{k+1} \to \varphi^k$; this is a decrease as $\varphi^{k+1} > \varphi^k$. Finally, jumping over the central column replaces $\varphi^k + \varphi^{k+1} \to \varphi^k$, which is also an obvious decrease.

Exercise 9. Row 0 $= \{\ldots, \varphi^{-3}, \varphi^{-2}, \varphi^{-1}, 1, \varphi^{-1}, \varphi^{-2}, \varphi^{-3}, \ldots\}$. The right half of row 0, starting with 1, is a geometric series that adds up to $\varphi^2$ (cf. Lemma 3(b)). The left half of the row is the same series minus the term 1, i.e., $\varphi^2 - 1$. Adding the two halves we get $2\varphi^2 - 1 = 2\varphi^2 - (\varphi^2 - \varphi) = \varphi^2 + \varphi = \varphi^3$. But row $n < 0$ (below row 0) is just row 0 with everything multiplied by $\varphi^n$, hence the sum for row $n$ is $\varphi^3 \varphi^n$. Now we add up all rows for $n = 0, -1, -2, -3, \ldots$ and factor the repeating $\varphi^3$, and discover yet another geometric series that we have already encountered in Lemma 3(b):
$$\varphi^3\left(1 + \varphi^{-1} + \varphi^{-2} + \varphi^{-3} + \cdots\right) = \varphi^3 \cdot \varphi^2 = \varphi^5.$$
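The two geometric series in Exercise 9 can be confirmed numerically. The following sketch truncates each infinite sum after 200 terms, which is an assumption of convenience: the neglected terms are far below floating-point precision. The function names are hypothetical.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # the golden ratio, satisfying 1 + PHI == PHI**2

def row_sum(n, columns=200):
    """Approximate sum of row n <= 0: PHI**n times the sum of PHI**(-|j|) over columns j."""
    row0 = 1 + 2 * sum(PHI ** (-j) for j in range(1, columns))
    return PHI ** n * row0

def total_sum(rows=200, columns=200):
    """Approximate sum over rows 0, -1, -2, ... of the whole lower half-plane."""
    return sum(row_sum(-r, columns) for r in range(rows))

row0_matches = math.isclose(row_sum(0), PHI ** 3)
total_matches = math.isclose(total_sum(), PHI ** 5)
```

Both comparisons use `math.isclose`, since the truncated sums agree with the claimed powers of φ only up to rounding.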



Exercise 10. Compared to before, everything on and under row 5 has been divided by $\varphi^5$. Hence the sum underneath the designated line did the same thing: $\varphi^5/\varphi^5 = 1$. The sum up to row 5 (inclusive) is $\varphi^5$, so the total sum is twice that minus row 5’s sum: $2\varphi^5 - \varphi^3 = \varphi^5 + (\varphi^5 - \varphi^3) = \varphi^5 + \varphi^4 = \varphi^6$. ♦

Session 7

Geometric Re-Constructions. Part II
Bits of Geometry, Physics & Trigonometry
Zvezdelina Stankova

Sneak Preview. In this session, we will explore intermediate-level solutions to our main challenges from Part I: the Farmer-and-Cow and the Three-Squares problems. Some basic facts about inscribed angles from Circle Geometry (vol. I) and a couple of similarity criteria for triangles from Part I will come in handy. We will also need to review and learn more sophisticated geometry facts and techniques. For example, we will re-discover and re-prove the famous theorems of Pythagoras and Ptolemy, and through the Farmer-and-Cow problem we will link physics and everyday phenomena to mathematical theory. We will also experience a speed-of-light introduction to trigonometry, use our previous geometric knowledge to prove a famous trigonometric formula, and in turn apply this formula to solving the Three-Squares problem in yet a different but super-fast way. Two challenges on optimal bridges and the infinitely many angles will be posed right away for the most advanced. However, their solutions will be postponed until Part III, where a creative plane geometry idea will be inspired by an inequalities approach and further Calculus techniques will provide alternative ways of tackling a whole range of such problems.

1. Optimal and Infinite Challenges

If you feel you are already fortified with enough plane geometry background and the two main problems from Part I are not challenging enough, you can tackle their two cousins below and meet us later in the session to compare notes on their difficulty and variety of approaches. Calculus solutions are allowed; but the ultimate challenge in these problems, of course, is to discover beautiful purely geometric solutions that can be potentially created by bright middle schoolers with little technical background and open minds. Do such solutions exist? Part III will partially answer this question.


Problem 1. (Optimal Bridge) Two villages are situated on opposite banks, not necessarily across from each other. The river is of constant width. The farmer’s market is always held in the same village. The other village wants to build a bridge across the river (and perpendicular to the banks of the river) so that the total trip to the farmer’s market is as short as possible. Where should the bridge be built and why?

Hint: This problem reminds us of the Farmer-and-Cow situation, where we reflected the farmer across the river to a phantom farmer and asked the latter to walk straight through the river and toward the cow. Alas, we justified there that the width of the river could be safely assumed to be zero without changing the problem. Yet, in our Optimal-Bridge problem the width of the river plays an essential role and cannot be ignored. Is reflection again the “magic” transformation that will reduce the problem to a trivial one, or is there another, more appropriate “action” in the plane? ♦

The second challenge for the die-hards is an infinite extension of our previous Three-Squares puzzle:

Problem 2. (ℵ₀-Squares)¹ Glue to each other infinitely many identical squares with bases AA₁, A₁A₂, A₂A₃, A₃A₄, A₄A₅, and so on, to form an infinite row (cf. Fig. 1). If D is the top left corner of the first square, right above A, what is the sum ∠AA₁D + ∠AA₂D + ∠AA₃D + ∠AA₄D + · · · ?

Figure 1. α₁ + α₂ + α₃ + α₄ + α₅ + · · · = ?

Ideas: The discussion about the original Three-Squares problem in Part I concluded with finding the sum of the first three angles: α₁ + α₂ + α₃ = 90°. Is there a similar geometric construction for the ℵ₀-Squares puzzle, i.e., can you usefully tile (part of) the integer grid into grid-triangles? Or could you apply some more advanced techniques instead? In the latter case, try first to solve the Three-Squares problem with trigonometry as a preparatory step for this infinite version. Or, perhaps, you know how to employ the so-called Taylor expansion of a suitable function for the infinitely many squares? Whatever you decide, experimenting by summing some of the angles and estimating the total can be illuminating. Starting with α₄, you may need to add up more than a dozen angles before you realize that this problem is very different in nature from its Three-Squares predecessor. ♦

¹ ℵ₀ is a shortcut for “infinitely many.”
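The experiment suggested in the Ideas paragraph, summing some of the angles and watching the total, takes only a few lines. This is a sketch under one stated assumption: the squares have side 1, so that in the right triangle AAₘD the legs are AD = 1 and AAₘ = m. The function names are hypothetical, and the printed totals are left for the reader to interpret.

```python
import math

def alpha_deg(m):
    """Angle AA_mD in degrees, from the right triangle with legs AD = 1 and AA_m = m."""
    return math.degrees(math.atan2(1, m))

def partial_sum_deg(count):
    """Sum of the first `count` angles alpha_1 + ... + alpha_count, in degrees."""
    return sum(alpha_deg(m) for m in range(1, count + 1))

# Print a few partial sums to see how the total behaves as more squares are added.
for count in (3, 10, 100, 1000, 10000):
    print(count, round(partial_sum_deg(count), 2))
```

The first printed line should reproduce the Three-Squares total from Part I, which serves as a sanity check on the setup.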


2. A Pythagorean Path for the Intermediate

2.1. Similarity rules again. Let’s review the problem in Part I that we solved via an auxiliary geometric construction and congruent triangles:

Problem 3. (Three Squares) Three identical squares with bases AM, MH, and HB are put next to each other to form a rectangle ABCD (cf. Fig. 2a). Prove that ∠AMD + ∠AHD + ∠ABD = 90°.

Figure 2. Three-Squares problem and Similarity of triangles



The auxiliary geometric construction was the hard and the brilliant part of our first solution. It is unlikely that one would come up with the exact same construction. A natural task would be to find a solution that does not depend on auxiliary segments. It turns out that there is such a solution; but to compensate for the lack of auxiliary segments, we will need to replace the simpler congruences by similarities of triangles.

2.1.1. Angle discussion in reverse. Since α = 45°, we just need β + γ = 45°. Do we see a 45° angle that is already split into β and γ? (Remember: no additional drawings!) The only plausible location is the clustering of angles at vertex D. Do we recognize some angles there? Indeed, ∠CDM is 45° and it happens to contain β (cf. Fig. 2b) because ∠CDH = ∠AHD as alternate interior angles for AB || CD and the transversal HD. The solution to the Three-Squares problem will be complete if we can show that ∠MDH = γ. But there isn’t a pair of (drawn) parallel lines to imply it! What other tools can be used to compare ∠MDH and ∠HBD = γ? Do these angles, perhaps, participate in some congruent or similar triangles? Two natural candidates are △MDH and △MBD: since ∠HMD = δ is shared by them, if our two angles were indeed equal, then by AA the triangles would be similar! This is a good place to pause and re-think what happened just now.



PST 58. When reasoning backward you will often reach an important fact that must be true (in order for the original problem to work out): try to prove this fact without using any unjustified assumptions from the “backward” discussion.

Applying PST 58 may well be the turning point in the analysis of the problem, where your solution starts “moving forward”.

2.1.2. Moving forward . . . and back again. How do we show that △MDH ∼ △MBD without assuming ∠MDH = γ? We still have ∠HMD = δ shared by the two triangles, but we do not know anything about other pairs of angles in these non-congruent triangles. Our only chance is to use ratios of sides through, say, the RAR criterion for similarity. With this in mind, is it true that the sides adjacent to δ in △MDH and △MBD form equal ratios, i.e.,
$$\frac{MH}{MD} \overset{?}{=} \frac{MD}{MB} \;\Leftrightarrow\; MH \cdot MB \overset{?}{=} MD^2 \;\Leftrightarrow\; 1 \cdot 2 \overset{?}{=} MD^2 \;\Leftrightarrow\; MD \overset{?}{=} \sqrt{2}\,? \tag{1}$$
We again reasoned backward! But we finally seem to have reached something that can be proved independent of the discussion so far.

2.1.3. Ending with a Pythagorean certainty. I can almost hear the reader objecting to the last question in (1): “It is a well-known fact! MD is the diagonal of a unit square. The Pythagorean Theorem for isosceles right △AMD implies MD² = DA² + AM² = 1² + 1² = 2, so MD = √2. Done!” ♦

Not so fast! First, do we know how to prove the Pythagorean Theorem? And even if we do, our reasoning back and forth is not quite written in the form of a traditional proof.

Exercise 1. Assuming the Pythagorean Theorem, track down and extract the forward argument from the above discussion, and write a short formal solution to Problem 3 with similar triangles.
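Before writing the formal solution, the key facts of the backward discussion can be sanity-checked in coordinates. This is a sketch under the assumption that the squares have side 1 and are placed with A at the origin; the helper names are hypothetical, and a numerical check is of course not a proof.

```python
import math

# Unit squares on bases AM, MH, HB; D sits directly above A.
A, M, H, B, D = (0, 0), (1, 0), (2, 0), (3, 0), (0, 1)

def dist(p, q):
    """Euclidean distance between points p and q."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def angle_deg(p, vertex, q):
    """Angle p-vertex-q in degrees, between 0 and 180."""
    ax, ay = p[0] - vertex[0], p[1] - vertex[1]
    bx, by = q[0] - vertex[0], q[1] - vertex[1]
    return math.degrees(math.atan2(abs(ax * by - ay * bx), ax * bx + ay * by))

# The ratios of the sides adjacent to the shared angle at M, as in (1).
ratio_small = dist(M, H) / dist(M, D)
ratio_large = dist(M, D) / dist(M, B)

# The angle sum claimed in Problem 3.
angle_sum = angle_deg(A, M, D) + angle_deg(A, H, D) + angle_deg(A, B, D)
```

If the two ratios agree, the similarity of triangles MDH and MBD forces angle MDH to equal angle HBD, which can be compared with `angle_deg(M, D, H)` and `angle_deg(H, B, D)`.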

2.1.4. Restricting ourselves may be advantageous. Do you recall what got us looking for similar triangles in the first place? Was it perhaps the lack of other geometric options?



PST 59. When searching for a second solution, eliminate the methods from the first solution, in order to restrict your attention to what other techniques and ideas are available and suitable in your situation.

For example, in the 5th-grade solution from Part I extra constructions were encouraged, albeit restricted only to the integer grid. On the other hand, in the second solution we disallowed any extra drawings at all! As restrictive as this may have seemed at the time, it worked to our advantage: it reduced the number of possible triangles and, even more drastically, the number of possible pairs of similar triangles, making it easier to find the “right” pair: △MDH ∼ △MBD.

Exercise 2. If you have extra time on your hands, count for fun the number of families of similar triangles that appear in the original Figure 2a. Here, a family is a collection of triangles any two of which are similar to each other, and two triangles from different families are not similar. Be aware that congruent triangles are also counted as similar!


2.2. The Pythagorean Theorem is arguably the most widely-known theorem in geometry. (Actually, it is a hybrid between geometry and algebra.) It was mentioned and used several times in the discussion of our main problems, so a proof of it is due. Incidentally, among the hundreds of explanations of the theorem, an elementary and straightforward one is based on a similarity criterion introduced in Part I.

Theorem 1. (Pythagorean Theorem (PT)) In a right triangle, the squares of the legs add up to the square of the hypotenuse; in other words, AC² + CB² = AB² as in Figure 3a.

Hint: Drop the altitude CH to the hypotenuse AB. Note that the foot of CH will be on segment AB (why?) and, hence, it will split AB into two parts, AH and HB. Using two pairs of similar triangles (how many similar triangles do you see?), express each of AH and HB in terms of the sides of △ABC and then sum your results. ♦

Figure 3. Pythagorean Theorem and Special cases
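The splitting of the hypotenuse described in the hint can be explored numerically before it is proved. The sketch below places a right triangle in coordinates (an assumption of convenience, with hypothetical names) and measures the two parts AH and HB. Note that the distance formula used here already encodes PT, so this is only an experiment to guess the pattern, not a proof.

```python
import math

def altitude_foot_split(a, b):
    """For a right triangle with legs a = CB and b = AC (right angle at C),
    return (AH, HB, AB), where H is the foot of the altitude from C."""
    # Place C at the origin, A on the x-axis, and B on the y-axis.
    ax, ay = float(b), 0.0
    bx, by = 0.0, float(a)
    c = math.hypot(bx - ax, by - ay)
    # Project C = (0, 0) onto line AB: H = A + t * (B - A).
    t = ((0 - ax) * (bx - ax) + (0 - ay) * (by - ay)) / c ** 2
    hx, hy = ax + t * (bx - ax), ay + t * (by - ay)
    ah = math.hypot(hx - ax, hy - ay)
    hb = math.hypot(bx - hx, by - hy)
    return ah, hb, c

# Hypothetical example: legs CB = 4 and AC = 3.
ah, hb, c = altitude_foot_split(4, 3)
```

Comparing `ah * c` and `hb * c` with the squares of the legs for several triangles suggests which expressions the similar triangles should produce.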



Exercise 3. (Baby Pythagorean Consequences) Using PT, show that
(a) In a right triangle, the hypotenuse is the largest side.
(b) In a right isosceles triangle with legs 1, the hypotenuse is √2.
(c) In a 30°-60°-90° triangle, the three sides are in ratios 1 : √3 : 2.

Hint: Parts (a)-(b) have been done before (where?), with the “premature” assumption of PT. In part (c) draw a segment through the vertex of the right angle to split the original triangle into two smaller triangles, one of which is equilateral (cf. Fig. 3b). Describe the other small triangle. How does this imply that the hypotenuse is twice the side of the equilateral triangle? ♦

We have some unfinished business from the Farmer-and-Cow discussion in Part I. We concluded there that the farmer’s shortest route is through point X on the river such that AX = 1 km and BX = 3 km. The other (given) distances are FA = 2 km and CB = 6 km (cf. Fig. 3c).

Exercise 4. Calculate the length of the shortest route of the farmer.

Another PT consequence (used earlier) has a more demanding proof:

Exercise 5. (Triangle Inequality) In a triangle the sum of any two sides is longer than the third side.

Hint: Drop the altitude to that third side, split into cases depending on where the foot of this altitude lands, and use a baby consequence of PT. ♦


3. Physics and Math Combine Forces

3.1. Through the looking-glass. In Part I we mentioned a law of physics that we observe every day and which could be used to find an alternative solution to the Farmer-and-Cow problem. If you recall the picture-hint there (the sun looking at itself in the mirror), it should not be surprising that we were referring to the following well-known laws:

Laws of Reflection. If the reflecting surface is very smooth, the reflection of light obeys the rules:
(1) The incident ray, the reflected ray, and the normal to the reflection surface at the point of the incidence lie in the same plane.
(2) The angles which the two rays make with the normal are equal.
(3) The two rays are on the opposite sides of the normal.

To illustrate, on the right is the sunlight reflecting off a river.² Everything is in a half-plane with respect to the river, the normal is the dashed perpendicular to the river, and the doubly-marked angles are equal: ρ = ρ′. Subtracting each from 90° yields α = α′, which is a rephrase of law (2): the angles made by the river and the incoming ray and by the river and the reflected ray are equal.

[Figure: sunlight reflecting off the river YZ at point X; the rays make angles ρ and ρ′ with the dashed normal and angles α and α′ with the river.]

Since we are trying to connect these “laws of nature” to our Farmer-and-Cow problem, it will be silly to expect that numerical data (such as the specific distances from the farmer and the cow to the river) are relevant in this discussion. With this understanding, let’s generalize the original problem by keeping only its features that are essential:



Problem 4. (Generalized Farmer & Cow) A farmer and a cow are on the same side of a river. The farmer must get to the river, dip his bucket there, and take the water to his cow. To which point at the river should the farmer walk so that his total path is as short as possible?

Hint: The solution from Part I applies equally well to this generalized version: reflect the farmer across the river to obtain three similar triangles (review page 13). Then the optimal path of the farmer must have made two equal angles with the river, namely, α = α′ as marked above (why?). ♦

² Caution: “Reflection” may mean different things, depending on the context. In the Laws of Reflection, the sunlight is reflecting off the river. Mathematically speaking, this is different from reflecting across the river, which the farmer did in order to get to the phantom farmer on the other side of the river. The two usages are related, of course: the sunlight’s reflection off the river is the same as its reflection across the normal (why?).
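The reflection idea in the hint to Problem 4 translates directly into a small computation. The sketch below assumes the river is the x-axis, with the farmer at height `farmer_dist` above the origin and the cow at height `cow_dist` above the point `separation` further along the bank; these names, the sample distances, and the brute-force comparison are illustrative assumptions rather than part of the original solution.

```python
import math

def optimal_point(farmer_dist, cow_dist, separation):
    """x-coordinate where the segment from the phantom farmer (0, -farmer_dist)
    to the cow (separation, cow_dist) crosses the river (the x-axis)."""
    return separation * farmer_dist / (farmer_dist + cow_dist)

def path_length(x, farmer_dist, cow_dist, separation):
    """Length of the route farmer -> river point (x, 0) -> cow."""
    return math.hypot(x, farmer_dist) + math.hypot(separation - x, cow_dist)

def brute_force_point(farmer_dist, cow_dist, separation, steps=6000):
    """Try many river points between the two feet and keep the best one."""
    xs = [separation * i / steps for i in range(steps + 1)]
    return min(xs, key=lambda x: path_length(x, farmer_dist, cow_dist, separation))

# Hypothetical data: farmer 1 km and cow 2 km from the river, 6 km apart along it.
x_star = optimal_point(1, 2, 6)
x_brute = brute_force_point(1, 2, 6)
angle_farmer = math.degrees(math.atan2(1, x_star))
angle_cow = math.degrees(math.atan2(2, 6 - x_star))
```

With these sample numbers the two angles with the river come out equal at the optimal point, and the brute-force search lands on essentially the same spot.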


Comparing the two pictures on the previous page leads to the inevitable conclusion that the farmer must follow the same path as the sunlight, except on a horizontal instead of the vertical or slanted plane along which the sunlight travels. To make this into a formal argument, a small hurdle about uniqueness must be overcome:

Exercise 6. Show that there is exactly one point X on the river YZ such that if we connect it to the farmer F and the cow C we will make two equal angles with the river: ∠FXY = ∠CXZ.

Proof: If X is such a point, let F′ be the phantom farmer and let FF′ intersect the river at A. By SAS, △F′XA ≅ △FXA, so ∠F′XA = ∠FXA. Hence ∠F′XA = ∠CXZ, forcing X to be on F′C, i.e., X is the intersection of F′C and the river. But this is precisely how we constructed the original optimal point X on the river! Thus, there is exactly one such point.

To truly understand the uniqueness of point X, try the following:

Exercise 7. If line FC and the river are not parallel, their intersection X′ will produce two angles at the river, ∠FX′Y and ∠CX′Z. Are these angles equal? Are X and X′ different? Is there a situation when X and X′ coincide? Why does this not contradict the uniqueness of X in Exercise 6?

3.2. What have we proven? If we assume the Laws of Reflection, then our Farmer-and-Cow solution implies a well-known fact about the sunlight:

Exercise 8. Among all paths from one point to another that bounce off a mirror, show that the sunlight will take the shortest distance possible.

Proof: If the sunlight starts at point F, reflects off one mirror and passes through point C, then the two angles that the sunlight’s path makes with the mirror are equal by the Laws of Reflection. Now put everything on a horizontal plane and let a farmer start at F and walk to the mirror and then to the cow. From the general Problem 4 and the uniqueness in Exercise 6, we know that the shortest path of the farmer goes through the unique point X on the mirror where the path makes equal angles with the mirror. In other words, the path of the farmer and the path of the sunlight are identical. Since this is the shortest path for the farmer, it will be the shortest path for the sunlight too.

So, what happened here? Simply put, our solution to the Farmer-and-Cow problem implied a “law of nature”: the sunlight travels the shortest route possible even if it has to reflect along the way! And conversely, if we assume this “law of nature” about the sunlight, then the shortest route for the farmer will make two equal angles with the river. It depends on what you assume as an “axiom” and what you decide to prove from it as a “theorem”.


3.3. More mirrors. It is natural to explore what happens if there is more than one smooth surface off which the sunlight has to reflect. As a preparatory version, try the special case with two “mirrors” where the path is closed, i.e., it starts and ends at the same place:



Exercise 9. (Optimal Game) Two trees grow in a yard fenced in the shape of an acute or right angle. Children play the following game: starting from one of the trees, they run to one side of the fence, then to the other tree, then to the other side of the fence, and finally return to the first tree. Help them do this as fast as possible. You may assume that the fence extends as far as necessary so that the children cannot go out of the courtyard.

You should also think about:

Exercise 10. Why were obtuse angles eliminated in Exercise 9? What may go wrong then and how should the solution be modified to work here too?

And now, for the final generalization:

Exercise 11. If the sunlight
(a) starts at point A,
(b) bounces off from a sequence of mirrors M₁, M₂, . . . , Mₙ, and
(c) ends at point B,
show that the sunlight has taken the shortest possible route among all routes with the three properties (a), (b), and (c).

[Figure: the sunlight’s path from A to B via mirrors M₁, M₂, M₃, M₄, together with a dashed alternative path.]

The picture above shows two paths: the sunlight’s path from A to B that reflects through mirrors M1 , M2 , M3 , and M4 , and an alternative (dashed) path that bounces off from the same sequence of mirrors. Note that the two paths happen to pass through the same point on mirror M3 . Still, Exercise 11 claims that the sunlight’s path will be the shorter of the two. Hint: Resolve the sunlight path into an equally long straight line path while showing that the alternate path is a broken line path. ♦

4. Ptolemy’s Lead into Trigonometry

4.1. Ptolemy’s Theorem can be used as a springboard to a standard trigonometric formula needed in the promised 8th-grade solution to the Three-Squares problem. If you recall, Ptolemy’s Theorem succumbed to the method of inversion in volume I. Among its numerous proofs, the one discussed here stands out as a powerful application of inscribed angles and similar triangles, glued together by an auxiliary geometric construction.


Theorem 2. (Ptolemy’s Theorem (PtT)) For an inscribed quadrilateral, the sum of the products of opposite sides equals the product of the diagonals; in other words, AB · DC + AD · BC = AC · BD as in Figure 4a.

Figure 4. Proof of Ptolemy’s Theorem

Proof: The key idea is to split a diagonal, say, BD into two parts BM and MD, so that ∠BAM = ∠CAD (= α as in Fig. 4b). Since inscribed angles ∠ABM and ∠ACD intercept the same arc AD, they are equal.³ From here, △BAM ∼ △CAD by the AA similarity criterion. We can picture this by rotating △CAD about vertex A until side AC aligns with ray AB, and then rescaling △CAD to the size of △BAM. The angle of rotation is ∠CAB. A second rotation about A but through ∠MAB (as in Fig. 4c), followed by a rescaling, will move △DAM onto △CAB: why are these triangles also similar? Check out the pairs of equal angles denoted by γ and δ.

Now we use ratios of sides from the above two similarities to express the parts BM and MD of diagonal BD in terms of quadrilateral ABCD:
$$\frac{BM}{BA} = \frac{CD}{CA}, \quad \frac{DM}{DA} = \frac{CB}{CA} \;\Rightarrow\; BD = BM + MD = \frac{BA \cdot CD}{CA} + \frac{DA \cdot CB}{CA}.$$
Clearing the common denominator CA yields the desired equality of PtT.

Did you notice that the same problem-solving idea occurred in the proofs to both the Pythagorean and Ptolemy’s Theorems? The hypotenuse or a diagonal was split into two parts, whether by the foot of an altitude or by an extra point we created. In both situations, similar triangles played a crucial role in the geometric construction and the ensuing algebraic calculations.

PST 60. In proving an algebraic equality involving segments, try to split one of the segments XY into two parts, XZ and ZY, by an auxiliary geometric construction using, perhaps, similar triangles. Then express each of XZ and ZY in terms of the given segments, sum the results for XZ and ZY to get XY, and finally simplify to obtain the desired equality.
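Ptolemy’s Theorem is easy to test on concrete inscribed quadrilaterals. The following sketch places A, B, C, D on the unit circle at increasing angles, which keeps them in cyclic order; the helper names and the sample angles are assumptions made for illustration.

```python
import math

def point_on_circle(angle_deg, radius=1.0):
    """Point of the circle of the given radius at the given polar angle."""
    t = math.radians(angle_deg)
    return (radius * math.cos(t), radius * math.sin(t))

def dist(p, q):
    """Euclidean distance between points p and q."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def ptolemy_sides(angles_deg):
    """Given four increasing angles (positions of A, B, C, D around a circle),
    return the pair (AB*DC + AD*BC, AC*BD)."""
    A, B, C, D = (point_on_circle(a) for a in angles_deg)
    products_of_sides = dist(A, B) * dist(D, C) + dist(A, D) * dist(B, C)
    product_of_diagonals = dist(A, C) * dist(B, D)
    return products_of_sides, product_of_diagonals

lhs, rhs = ptolemy_sides([10, 80, 200, 310])
```

For a square, obtained with the angles 0, 90, 180, 270, both sides reduce to 4, which is the Pythagorean Theorem for the diagonal in disguise.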

4.2. Lightspeed entry into trigonometry. The technical tool needed in the 8th-grade solution to the Three-Squares problem is a famous trigonometric formula that expresses the tangent of a sum of two angles, tan(α + β), in terms of its building blocks, tan α and tan β.

³ To review some facts about angles in a circle, see Circle Geometry, vol. I.


Let us first review the four basic trigonometric functions:

Definition 1. In △ABC with ∠B = 90° and ∠A = α, we define the following ratios of sides as new functions of angle α, called sine, cosine, tangent, and cotangent:
$$\sin\alpha = \frac{CB}{CA}, \quad \cos\alpha = \frac{BA}{CA}, \quad \tan\alpha = \frac{CB}{AB}, \quad\text{and}\quad \cot\alpha = \frac{AB}{CB}.$$

If you are seeing these functions for the first time, you should:

Exercise 12. Verify that sin x and cos x, as well as tan x and cot x, swap their values for angles α and γ = ∠C; that tan x and cot x are reciprocals of each other and can be expressed as ratios of sin x and cos x; and that a trigonometric version of the PT is satisfied for any right triangle:
(a) sin α = cos γ, cos α = sin γ, tan α = cot γ, and cot α = tan γ;
(b) $\tan\alpha = \frac{1}{\cot\alpha}$ and $\tan\gamma = \frac{1}{\cot\gamma}$; $\tan\alpha = \frac{\sin\alpha}{\cos\alpha}$ and $\cot\alpha = \frac{\cos\alpha}{\sin\alpha}$;
(c) sin²α + cos²α = 1 and sin²γ + cos²γ = 1.

Using the Baby Pythagorean Consequences from Exercise 3,

Exercise 13. Calculate the values of the four trigonometric functions at the following famous angles: 0°, 30°, 45°, 60°, and 90°. Did you plug 90° into tan x or 0° into cot x? Why or why not?

Partial solution: Since $\tan\alpha = \frac{CB}{AB}$, α = 0° means that CB = 0 and tan 0° = 0. When α = 30°, △ACB is 30°-60°-90° and $\tan 30^\circ = \frac{CB}{AB} = \frac{1}{\sqrt{3}}$ (cf. Exer. 3c). But tan 60° is the reciprocal of tan 30°, i.e., $\tan 60^\circ = \frac{AB}{CB} = \sqrt{3}$. For α = 45° we obtain a right isosceles triangle, so $\tan 45^\circ = \frac{CB}{AB} = 1$. Finally, tan 90° is undefined: we cannot have a right △ABC with two right angles; equivalently, we cannot divide by 0 as $\frac{AC}{AB} = \frac{AC}{0}$. ♦
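The tangent values from the partial solution can be compared with a calculator’s answers. This sketch uses Python’s standard `math` module; the dictionary `claimed` simply restates the values derived above, and the names are hypothetical. One caveat is built into the last line: in floating point, π/2 is not represented exactly, so the computer returns an enormous number for tan 90° instead of reporting that the value is undefined.

```python
import math

# Tangent values taken from the partial solution to Exercise 13.
claimed = {0: 0.0, 30: 1 / math.sqrt(3), 45: 1.0, 60: math.sqrt(3)}

def tan_deg(angle_deg):
    """Tangent of an angle given in degrees."""
    return math.tan(math.radians(angle_deg))

# Compare each claimed value with the library's value, up to rounding.
checks = {angle: math.isclose(tan_deg(angle), expected, abs_tol=1e-12)
          for angle, expected in claimed.items()}

# Not an error, just a very large number, because math.pi / 2 is only approximate.
huge = tan_deg(90)
```

The large value of `huge` is the numerical shadow of the vertical displacement growing without bound as the angle approaches 90°.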

If we make the hypotenuse AC = 1, we can use the unit circle k centered at A to visualize the basic trigonometric functions. To this end, let A be the center of the coordinate system in the plane, and place B along the positive x-axis. Then C will lie on the circle while B will be inside it (why?). Since sin α = CB and cos α = AB (why?), the sine and cosine functions will simply measure the vertical and horizontal displacement of point C, i.e., its y- and x-coordinate, respectively. Similar triangles can also help us geometrically interpret the tangent function as a single segment (not a ratio of segments).

Exercise 14. Let the x-axis intersect the unit circle k in point D. Draw a perpendicular l to the x-axis through point D and extend ray AC until it meets l at point E. Show that tan α = DE.


Proof: By the AA similarity criterion, △ABC ∼ △ADE since they are both right and share angle α. Calculating corresponding ratios and taking into account that the radius of the circle is 1, we obtain:
$$\tan\alpha = \frac{CB}{AB} = \frac{ED}{AD} = \frac{ED}{1} = ED.$$

In other words, the tangent function measures the vertical displacement of point E on line l. Line l “happens” to be the tangent line to the circle k at point D. This is no coincidence! So, if the name of tan x was a mystery before, it should not be anymore. For practice,

Exercise 15. Find a horizontal line m along which the cotangent function can be similarly interpreted as the length of a single segment.

Exercise 16. When α moves from 0° to 90°, show that sin x and tan x strictly increase, while cos x and cot x strictly decrease. (This means, for example, that sin x < sin y and cot x > cot y for acute angles x < y.)

Hint: Use the unit circle for the values of sin x and cos x, or lines l and m from Exercises 14-15 for the values of tan x and cot x. ♦

Yet a third way to think about a trigonometric function is via its graph. When drawing graphs of trigonometric functions, on the x-axis we ordinarily use linear units called radians (instead of degrees, which are angular units). For example, 0° corresponds to 0 radians, 90° to $\frac{\pi}{2}$ radians, 180° to π radians, etc. More generally, z° corresponds to $\frac{\pi z}{180}$ radians: this is the length of the arc on the unit circle k that is encompassed by a central z°-angle. Thus, the length of the smaller (dotted) arc CD on the unit circle k on page 180 measures angle ∠BAC = α in radians. Keep this in mind when drawing the graphs below and use radian measure along the x-axis.

Exercise 17. Put together all findings so far to sketch the graphs of sin x and cos x on the interval $[0, \frac{\pi}{2}]$, tan x on $[0, \frac{\pi}{2})$, and cot x on $(0, \frac{\pi}{2}]$.

Partial solution: From Exercise 14, we know that the tangent function is measured on line l tangent to the unit circle at point D: tan α = DE. When α = 0, side AC of ∠BAC coincides with the other (horizontal) side AB, causing E = D and tan 0 = 0. As α increases (still staying acute), side AC starts moving counterclockwise from the horizontal position on the x-axis towards the vertical position on the y-axis, lifting in the process point E higher and higher on line l and making tan α increase.

[Figure: graph of tan x on $[0, \frac{\pi}{2})$ with vertical asymptote ν at $x = \frac{\pi}{2}$; ticks at $\frac{\pi}{4}$ and $\frac{\pi}{2}$ on the x-axis and at 1 on the y-axis.]

In fact, any positive value of tan α can be obtained this way (why?). Thus, the range of tan x is [0, ∞) for 0 ≤ x < π/2. Furthermore, the graph of tan x has a vertical asymptote ν at $x = \frac{\pi}{2}$ (not to be confused with the previously discussed tangent line l). Visually, we observe that the graph of tan x gets closer and closer to the line ν as x approaches $\frac{\pi}{2}$. ♦
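The degree-to-radian rule and the monotonic behavior of tan x on acute angles can be spot-checked numerically. The following is only a sketch: the helper names `degrees_to_radians` and `is_strictly_increasing` are made up for this illustration, and sampling whole degrees is evidence for, not a proof of, the claim in Exercise 16.

```python
import math


def degrees_to_radians(z):
    """Length of the unit-circle arc cut off by a central angle of z degrees."""
    return math.pi * z / 180


def is_strictly_increasing(values):
    """True if every value is smaller than the one that follows it."""
    return all(a < b for a, b in zip(values, values[1:]))


# Sample the acute angles 0°, 1°, ..., 89°, measured in radians.
angles = [degrees_to_radians(z) for z in range(90)]
tangents = [math.tan(x) for x in angles]

print("tan increases on the samples:", is_strictly_increasing(tangents))
print("tan 45° is approximately:", tangents[45])
print("tan 89° is already large:", tangents[89])
```

The last printed value hints at the vertical asymptote ν: the sampled tangents grow without bound as the angle approaches 90°.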


The sine and cosine functions can be defined for any angles, not just for acute angles like α and γ above, while the tangent and cotangent functions can be extended with care to almost all angles, avoiding division by cos x or sin x when they are zero. We will not do this here, but the reader interested in having a more complete understanding of trigonometry should consult a basic text on trigonometry and then justify in the Hints section the corresponding (dashed) extensions of the graphs from Exercise 17.

4.3. Deep in Trigland. To prepare for the promised formula for tan(α + β), we need to first address its predecessors: analogous versions for sine and cosine. If you are familiar with these formulas, skip to Section 4.4 for the trigonometric solution to the Three-Squares problem. Otherwise, hold on tight, for we will pass through some rough trigonometric terrain.

Theorem 3. For any angles α and β:
(a) sin(α + β) = sin α cos β + cos α sin β;
(b) sin(α − β) = sin α cos β − cos α sin β;
(c) cos(α + β) = cos α cos β − sin α sin β.

 

For our purposes it suffices to consider only the case when α + β is acute. We leave it to the reader to extend the proofs to any other cases.

PST 61. To prove a trigonometric formula that involves angles α, β, and their sum α + β, try to incorporate two smaller right triangles with angles α and β, respectively, into a larger right triangle one of whose angles α + β is made from gluing angles α and β together, as in the picture on the right.

[Figure: right triangles with angles α and β at O glued along the common side OC = 1; labeled points O, A, B, C, D.]

Proof: (a) Following PST 61, glue two right triangles OAC and OBC along their hypotenuse OC to form quadrilateral OACB with right angles at A and B. This makes ∠AOB = α + β (< 90°), which is assumed to be acute for the duration of our proof. Extend AC and OB to form another, larger right △OAD. To simplify calculations, let OC = 1. Then from △OAC and △OBC, the RHS of (a) can be written as:

$$\sin\alpha\cos\beta + \cos\alpha\sin\beta = AC\cdot OB + OA\cdot BC. \tag{2}$$

On the other hand, quadrilateral OACB is cyclic because the two opposite angles at A and B sum to 180°. (In fact, the diameter of circle k circumscribed about OACB is OC.) Applying Ptolemy’s Theorem, we can rewrite the RHS of (2) as OC · AB = 1 · AB = AB. To finish the proof, we need to show sin(α + β) = AB. In right △OAD, $\sin(\alpha+\beta) = \sin\angle AOD = \frac{DA}{DO}$.

[Figure: cyclic quadrilateral OACB inscribed in circle k with diameter OC = 1, with AC and OB extended to meet at D.]


From circle k, ∠BOC = ∠BAC = β, since they are inscribed angles in k intercepting arc BC. Because of the common angle ∠ADO, the AA similarity criterion implies △OCD ∼ △ABD, and hence

$$\frac{DA}{DO} = \frac{AB}{OC} = \frac{AB}{1} = AB.$$

This establishes that RHS = LHS in (a).

[Figure: the same configuration on circle k, with the two equal angles β and the angle δ at D marked.]

In part (b), we can modify the ideas encountered just now to accommodate the required difference (instead of sum) of two angles. We can also restrict the solution to the case when α > β so that α − β > 0 and we can use our basic definitions of the sine and cosine functions.

Hint: (b) Assuming α > β, geometrically “subtract” β from α: start with right △OAC and right △OBC such that ∠AOC = α and ∠BOC = β (as in the figure on the right); glue them along hypotenuse OC so that angle β is inside angle α, and hence ∠AOB = α − β. Extend rays OA and CB until they intersect at D. There are more clues in the figure. ♦

[Figure: circle k with diameter OC = 1 through A and B, angle β inside angle α at O, and rays OA and CB meeting at D.]

Proof: (c) Using our introductory Exercise 12(a), we can switch back and forth between sines and cosines by applying cos x = sin(90° − x) and sin x = cos(90° − x). Again assuming that 0 < α, 0 < β, and α + β < 90°, we can reduce part (c) to the previous part (b):

$$\begin{aligned}
\cos(\alpha+\beta) &\overset{\text{Ex. 12(a)}}{=} \sin\big(90^\circ-(\alpha+\beta)\big) = \sin\big((90^\circ-\alpha)-\beta\big) \\
&\overset{\text{Thm 3(b)}}{=} \sin(90^\circ-\alpha)\cos\beta - \cos(90^\circ-\alpha)\sin\beta \\
&\overset{\text{Ex. 12(a)}}{=} \cos\alpha\cos\beta - \sin\alpha\sin\beta.
\end{aligned}$$
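The three identities of Theorem 3 can be sanity-checked numerically on acute angles. This is a sketch only: the function name `addition_formula_errors` and the tolerance are illustrative choices, and a floating-point check is no substitute for the geometric proofs.

```python
import math


def addition_formula_errors(alpha, beta):
    """Return |LHS - RHS| for parts (a), (b), (c) of Theorem 3 (angles in radians)."""
    s, c = math.sin, math.cos
    err_a = abs(s(alpha + beta) - (s(alpha) * c(beta) + c(alpha) * s(beta)))
    err_b = abs(s(alpha - beta) - (s(alpha) * c(beta) - c(alpha) * s(beta)))
    err_c = abs(c(alpha + beta) - (c(alpha) * c(beta) - s(alpha) * s(beta)))
    return err_a, err_b, err_c


# Try all whole-degree pairs with 0 < beta < alpha and alpha + beta < 90 degrees.
worst = 0.0
for a_deg in range(1, 90):
    for b_deg in range(1, a_deg):
        if a_deg + b_deg < 90:
            errors = addition_formula_errors(math.radians(a_deg), math.radians(b_deg))
            worst = max(worst, *errors)

print("largest discrepancy found:", worst)
```

The discrepancy stays at the level of floating-point rounding, as one would expect if the identities hold exactly.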



After all this hard work, the final formula for the tangent of a sum will feel anticlimactic. We just have to be careful not to divide by 0 so as to have well-defined tangents. Hence the conditions below:



Corollary 1. If cos α ≠ 0, cos β ≠ 0, and cos(α + β) ≠ 0, then

$$\tan(\alpha+\beta) = \frac{\tan\alpha + \tan\beta}{1 - \tan\alpha\tan\beta}.$$

Hint: This is more of an exercise on fractions than anything else. Use the fact that tan x is the ratio of sin x and cos x, expand sin(α + β) and cos(α + β), and practice your algebraic skills! ♦

For a complete trigonometric picture,

Exercise 18. Devise and prove analogous formulas for cos(α − β), tan(α − β), cot(α + β), and cot(α − β).


4.4. Trigonometric gratification. Believe it or not, we are ready for the shortest, yet long-overdue trigonometric solution to the Three-Squares problem. Recall from Figure 2a on page 173 that we must show $\alpha + \beta + \gamma \overset{?}{=} 90^\circ$. Since α = 45°, this boils down to $\beta + \gamma \overset{?}{=} 45^\circ$.

4.4.1. Tangents rule! In view of what we studied in the previous subsection, it is reasonable to try to calculate some trigonometric function of β + γ. Why not the tangent function?

Exercise 19. Calculate tan(β + γ).

Solution: Since △AHD and △ABD are right triangles, we have $\tan\beta = \frac{AD}{AH} = \frac{1}{2}$ and $\tan\gamma = \frac{AD}{AB} = \frac{1}{3}$. Substituting into the formula for the tangent of a sum, we obtain

$$\tan(\beta+\gamma) = \frac{\tan\beta + \tan\gamma}{1 - \tan\beta\tan\gamma} = \frac{\frac{1}{2} + \frac{1}{3}}{1 - \frac{1}{2}\cdot\frac{1}{3}} = \frac{\frac{3+2}{2\cdot 3}}{1 - \frac{1}{6}} = \frac{\frac{5}{6}}{\frac{5}{6}} = 1.$$
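The fraction arithmetic in this solution can be replayed exactly with rational numbers. In the sketch below, the function name `tan_of_sum` is an illustrative choice; it simply evaluates the formula of Corollary 1 and refuses to divide by zero.

```python
from fractions import Fraction


def tan_of_sum(tan_a, tan_b):
    """Tangent of a sum, computed from the two tangents via Corollary 1."""
    denominator = 1 - tan_a * tan_b
    if denominator == 0:
        # This happens exactly when cos(a + b) = 0, where the tangent is undefined.
        raise ZeroDivisionError("tan(a + b) is undefined")
    return (tan_a + tan_b) / denominator


tan_beta = Fraction(1, 2)   # AD / AH
tan_gamma = Fraction(1, 3)  # AD / AB
print(tan_of_sum(tan_beta, tan_gamma))
```

Working with `Fraction` instead of floating-point numbers keeps the result exact, so the equality with tan 45° is not blurred by rounding.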



However, we already know that tan 45° = 1. Thus, tan(β + γ) = tan 45°. Does this imply that the two angles are equal? Indeed, as we demonstrated earlier, tan x strictly increases for acute angles. Well, 45° is acute. How about the other angle β + γ? In order to show that β + γ < 90°, verify that:

Exercise 20. Both β and γ are < 45°.

Proof: In right △AHD, β + ∠ADH = 90°. But since ∠ADH contains ∠ADM = 45°, it follows that ∠ADH > 45° and β = 90° − ∠ADH < 45°. You can use a similar argument for γ and △ABD. Alternatively, α is an exterior angle for △MHD and for △MBD, and hence α = 45° is larger than the remote interior angles β and γ in these two triangles.

To wrap things up, since tan x is strictly increasing for acute angles, we can’t have tan x = tan y for two different acute angles. But β + γ and 45° are both acute and tan(β + γ) = 1 = tan 45°. So the two angles must be equal! Overall, α + (β + γ) = 45° + 45° = 90° and we are truly done.

4.4.2. More trig-routes? As we went through the above solution, the reader should have questioned our choice of the tangent function: couldn’t we have done as well with other trigonometric functions? The answer is Yes, but you need to complete the earlier exercises about the basic properties of sin x, cos x, and cot x, before you can:

Exercise 21. Produce three more solutions to the Three-Squares problem.

Solution with cosine: In Figure 2a, $DH = \sqrt{5}$ and $DB = \sqrt{10}$ from right △DAH and right △DAB. Thus, cos(β + γ) can be calculated as

$$\cos\beta\cos\gamma - \sin\beta\sin\gamma = \frac{2}{\sqrt{5}}\cdot\frac{3}{\sqrt{10}} - \frac{1}{\sqrt{5}}\cdot\frac{1}{\sqrt{10}} = \frac{5}{\sqrt{5}\sqrt{10}} = \frac{1}{\sqrt{2}} = \cos 45^\circ.$$

But cos x strictly decreases for acute angles, and, as above, β + γ < 90°; so cos(β + γ) = cos 45° means β + γ = 45°. Thus, again α + β + γ = 90°.


5. Hints and Solutions to Selected Problems

Exercise 1. Assuming the Pythagorean Theorem, from right isosceles △MAD we have $MD = \sqrt{1^2 + 1^2} = \sqrt{2}$, and a sequence of equalities follows:

$$MD = \sqrt{2} \;\Rightarrow\; MD^2 = 1\cdot 2 = MH\cdot MB \;\Rightarrow\; \frac{MH}{MD} = \frac{MD}{MB}.$$

Since ∠HMD = δ is shared by △MDH and △MBD, by RAR criterion the two triangles are similar. This in turn implies ∠MDH = ∠MBD = γ. Summarizing, β + γ = ∠HDC + ∠MDH = ∠MDC = 45° and, finally, α + β + γ = 45° + 45° = 90°.

Exercise 2. With a risk of having missed something, we’ll say that there are 8 types of non-similar triangles that appear in Figure 2a: 3 of them are right triangles with a second angle α, β, or γ, and 5 are obtuse triangles, whose pairs of acute angles are {α, γ}, {γ, β}, {β, α}, {γ, β − γ}, or {β − γ, α + γ}. ♦

Theorem 1 (PT). If the foot H of altitude CH were not on hypotenuse AB, say, B is between A and H (cf. Fig. 5a), then ∠ABC would be an exterior angle for △BHC and, as such, ∠ABC > ∠AHC = 90°. But we can’t possibly have an obtuse ∠ABC in the right △ABC! This contradiction explains why H must be on hypotenuse AB and, hence, it must split AB into two segments AH and HB. (Why can’t H = A or H = B?) The three resulting triangles (back in Fig. 3a) are similar by the AA criterion: △AHC ∼ △ACB ∼ △CHB: they are right and the two smaller triangles share angles α or β with the big triangle. From these similarities,

$$\frac{AH}{AC} = \frac{AC}{AB} \text{ and } \frac{BH}{BC} = \frac{BC}{BA} \;\Rightarrow\; AH\cdot AB = AC^2 \text{ and } BH\cdot AB = BC^2.$$

Adding up, $(AH + BH)\,AB = AC^2 + BC^2$, i.e., $AB^2 = AC^2 + BC^2$.

Exercise 3. (c) Following the hint, ∠ACO = 60° means that ∠BCO = 30°, and hence △OBC is isosceles with OC = OB. Since △AOC is equilateral, OC = OA. Thus, O is the circumcenter of △ABC and AB = 2OC = 2AC. From PT, $BC = \sqrt{AB^2 - AC^2} = \sqrt{4AC^2 - AC^2} = \sqrt{3}\,AC$. We conclude that the desired ratios are satisfied; namely, $AC : CB : BA = 1 : \sqrt{3} : 2$.

Exercise 4. The shortest route goes along hypotenuses FX and XC:

$$FX + XC = \sqrt{2^2 + 1^2} + \sqrt{3^2 + 6^2} = \sqrt{5} + \sqrt{45} = 4\sqrt{5}.$$



Exercise 5. Following the hint, to show that AC + BC > AB in △ABC, drop the altitude CH to side AB. There are three cases to consider.

Figure 5. Triangle Inequality and Questioning uniqueness


(a) If H is outside segment AB (cf. Fig. 5a), WLOG let B be between A and H. Using right △AHC and part (a) of Baby Pythagorean we have AC > AH > AB and hence AC + BC > AB.

(b) If H = B (cf. Fig. 5b), then ∠B is right in △ABC. Part (a) of Baby Pythagorean implies AC > AB and hence AC + BC > AC > AB.

(c) If H is between A and B (cf. Fig. 5c), part (a) of Baby Pythagorean for right △AHC and right △BHC implies AC > AH and BC > BH. Adding the inequalities, we have AC + BC > AH + BH = AB.

Exercise 7. The angles ∠FX′Y and ∠CX′Z are supplementary (cf. Fig. 5d) and, in general, not equal to each other, causing X′ ≠ X. It is only when they are both right angles, i.e., FC ⊥ river (cf. Fig. 5e), that X = X′.

Exercise 9. The game consists of two independent subgames: to get from tree T₁ to fence side BA and then to tree T₂, and to get from tree T₂ to fence side BC and then to tree T₁. Each part is a copy of the Farmer-and-Cow problem. Thus, we can reflect T₁ across line BA to point R₁ and reflect T₂ across line BC to point R₂, and let R₁T₂ intersect line BA in X₁ and R₂T₁ intersect line BC in X₂. How does ∠ABC ≤ 90° imply that X₁ is on ray BA and X₂ is on ray BC? The shortest path will be T₁ → X₁ → T₂ → X₂ → T₁. ♦

[Figure: garden corner B with fence sides BA and BC, trees T₁ and T₂, their reflections R₁ and R₂, and the points X₁ and X₂.]

Exercise 10. The above solution will work for an obtuse angle too as long as X₁ is on ray BA and X₂ is on ray BC. But what if, say, X₁ is not on ray BA: this happens if R₁T₂ intersects line BA outside ray BA?! We cannot possibly take the sub-route T₁ → X₁ → T₂ because we will exit the garden! As it turns out, the shortest path from T₁ to T₂ via wall BA will go through the corner B of the garden: T₁ → B → T₂. To see this, let D be any point on wall BA. Then sub-routes T₁ → D → T₂ and T₁ → B → T₂ are the same length as, respectively, sub-routes R₁ → D → T₂ and R₁ → B → T₂. In other words, we can consider our sub-routes to start from R₁ and end at T₂. But then sub-route R₁ → B → T₂ is inside △R₁DT₂ while sub-route R₁ → D → T₂ goes along the sides of the triangle. To compare these two sub-routes, extend R₁B until it intersects side DT₂ in point E. By the △-Inequality for △BET₂ and for △R₁DE:

R₁B + BT₂ < R₁B + (BE + ET₂) = R₁E + ET₂ < (R₁D + DE) + ET₂ = R₁D + DT₂.

Thus, the sub-route going through the corner B is shorter than any other sub-route in this situation. ♦


Exercise 11. Reflect the initial point A across mirror M₁ to point A₁ and let the sunlight and the alternative (dashed) route start at A₁ instead. More precisely, replace the initial line segments AS₁ and AQ₁ of the two routes by segments A₁S₁ and A₁Q₁ of, correspondingly, equal lengths. Note that the sunlight will now continue straight from A₁ through S₁ to mirror M₂, while the dashed route will, in general, follow a broken line from A₁ through Q₁ to mirror M₂.

[Figure: mirrors M₁ and M₂, points A and B, the reflection A₁ of A, the sunlight route through S₁ and S₂, and the dashed alternative route through Q₁ and Q₂.]

To summarize, moving the starting point of the routes from A to A₁ did not change the total length of each route. But now we can forget mirror M₁ and reduce the problem to one fewer mirror. Continuing this way, we can gradually straighten out the sunlight’s route. In the end, after the last mirror has been eliminated, both routes will start at some point Aₙ and both will end at B, but the sunlight’s route will be a straight segment while the alternative route will still be a broken line, unless it originally coincided (everywhere!) with the sunlight’s route. ♦

Exercise 12. Since $\sin\gamma = \frac{AB}{AC}$, $\cos\gamma = \frac{BC}{AC}$, $\tan\gamma = \frac{AB}{BC}$, and $\cot\gamma = \frac{BC}{AB}$, the identities in parts (a)-(b) can be directly verified. For part (c), the Pythagorean Theorem for △ABC says that $AB^2 + BC^2 = AC^2$. Dividing everything by $AC^2$ yields:

$$\frac{AB^2}{AC^2} + \frac{BC^2}{AC^2} = 1 \;\Rightarrow\; \cos^2\alpha + \sin^2\alpha = \sin^2\gamma + \cos^2\gamma = 1.$$



Exercise 13. Check that sin 0° = cos 90° = 0, $\sin 30^\circ = \cos 60^\circ = \frac{1}{2}$, $\sin 60^\circ = \cos 30^\circ = \frac{\sqrt{3}}{2}$, sin 90° = cos 0° = 1. Furthermore, cot 0° is not defined, $\cot 30^\circ = \sqrt{3}$, $\cot 60^\circ = \frac{1}{\sqrt{3}}$, and cot 90° = 0. ♦

Exercise 15. Using the figure on page 180, let m be the line through G(0, 1) tangent to circle k and intersecting ray AC at H. Note that the measure of ∠CAG is (90° − α). By Exercises 12 and 14, cot α = tan(90° − α) = tan ∠CAG = GH, i.e., the cotangent function is measured along line m.

[Figure: unit circle k centered at A with the horizontal tangent line m through G, meeting ray AC at H; cot α = GH.]

Exercise 16. As the second side of ∠BAC rises from 0° to 90°, point C also rises along the unit circle k and hence its y-coordinate sin α increases. At the same time, B moves closer to the center A of the unit circle; i.e., its x-coordinate cos α decreases. As we saw in Exercise 15, cot α = GH, which will decrease since H will move towards point G. ♦


Exercise 17. The graphs of sin x and cos x on [0, π], of tan x on $(-\frac{\pi}{2}, \frac{\pi}{2}) \cup (\frac{\pi}{2}, \frac{3\pi}{2})$, and of cot x on (0, π) ∪ (π, 2π) are sketched below. To justify the dashed parts of the graphs, the functions need to be defined on the corresponding intervals; for these definitions, use the unit circle for sin x and cos x, and use the tangent and cotangent lines l and m for tan x and cot x. ♦

Figure 6. Graphs of sin x, cos x, tan x and cot x

Theorem 3. (b) Again we have a cyclic quadrilateral OCBA (why?). WLOG, let the diameter OC of k be 1. Then from right triangles OAC and OBC we can write the RHS of (b) as:

$$\sin\alpha\cos\beta - \cos\alpha\sin\beta = AC\cdot OB - OA\cdot BC \overset{\text{PtT}}{=} OC\cdot AB = AB.$$

From right △OBD, $\sin(\alpha-\beta) = \frac{DB}{DO}$. From the cyclic OCBA, ∠ABD = α (why?), so △ABD ∼ △COD by the AA criterion. Thus, $\frac{DB}{DO} = \frac{AB}{CO} = AB$. Everything matches: sin(α − β) = AB and formula (b) follows.

Corollary 1. Using tan x = sin x / cos x and Theorem 3, we calculate:

$$\tan(\alpha+\beta) = \frac{\sin(\alpha+\beta)}{\cos(\alpha+\beta)} = \frac{\sin\alpha\cos\beta + \cos\alpha\sin\beta}{\cos\alpha\cos\beta - \sin\alpha\sin\beta} = \frac{\frac{\sin\alpha}{\cos\alpha} + \frac{\sin\beta}{\cos\beta}}{1 - \frac{\sin\alpha\sin\beta}{\cos\alpha\cos\beta}},$$

where the last step was division of both numerator and denominator by cos α cos β. This introduced the tangent function everywhere and we can now arrive at the desired expression $\frac{\tan\alpha + \tan\beta}{1 - \tan\alpha\tan\beta}$.

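Exercise 18. The four formulas are:

• $\cos(\alpha-\beta) = \cos\alpha\cos\beta + \sin\alpha\sin\beta$;
• $\tan(\alpha-\beta) = \dfrac{\tan\alpha - \tan\beta}{1 + \tan\alpha\tan\beta}$;
• $\cot(\alpha+\beta) = \dfrac{\cot\alpha\cot\beta - 1}{\cot\alpha + \cot\beta}$;
• $\cot(\alpha-\beta) = \dfrac{\cot\alpha\cot\beta + 1}{\cot\beta - \cot\alpha}$.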

For the cosine formula, if you assume that 0 < β < α < 90◦ , you can mimic the proof of Theorem 3(c). Once you prove this formula, the remaining formulas are just algebraic exercises with fractions: turn the tangents and cotangents into fractions of sines and cosines and algebraically manipulate the expressions on the RHS and LHS of the desired formulas to verify that they are equal. ♦

Session 8

Complex Numbers. Part II

Zvezdelina Stankova

Sneak Preview. The discussion of basic operations on complex numbers from Part I will continue here with ratios, integer powers, and roots in C. Along the way, we will stumble upon a stunning resemblance between powers in C and mollusk shells and will become skilled with de Moivre’s Formula by applying its “offspring,” the roots of unity, to geometric problems with regular polygons. In particular, we shall discover the connection between C and the Triangle Inequality by showing that modulus lacks “respect” for addition, solve the introductory nonagon Problem 1 from Part I, and expand toward a fundamental question in statistics by minimizing sums of squared distances . . . . All with complex numbers.

1. Warning, “Teaser,” and Strategy

Although Sections 3–7 are a must for everyone, Sections 8–9 are rather non-trivial: the applications of complex numbers to geometry will require sophistication and determination from the reader to follow through the arguments and absorb all the ideas.

A prime example of this is Problem 7, which will come up in Section 10. Informally, if A₀A₁ . . . Aₙ₋₁ is a regular polygon, which line l in the plane is the “closest” to its vertices? More precisely, if we take the distances from each vertex Aᵢ to l (denoted by dashed segments in Fig. 1), square them, and add them, which line l will yield the minimal such sum?

Figure 1. “Closest” line?

After skimming through Sections 3–7, the novice reader may decide to wait for Part IV (in a future volume), devoted entirely to solving Olympiad-style problems via complex numbers. Part III would then be an option for the intermediate reader. The advanced reader, on the other hand, is encouraged to “stick with it” and try Problem 7 on his/her own while we diligently move towards its solution at the end of the present Part II.


8. ROOTS OF UNITY IN GEOMETRY

2. Conventions from the Past

Recall from Part I that a complex number z is written in Cartesian form as z = (x, y) = x + iy, and in polar form as z = (|z|, θ) = |z|(cos θ + i sin θ).

Here x = Re(z) and y = Im(z) are the real and the imaginary parts of z, while |z| and θ are the modulus and argument of z. Note that both forms can be written as ordered pairs (a, b); yet these pairs mean different things depending on which form they represent. If not otherwise specified, in this session ordered pairs of real numbers will stand for polar notation of complex numbers. Further, we will frequently use the polar form of multiplication in C, so it is worth reviewing it: for any z, w ∈ C and any angles θ, μ ∈ R we have

$$(|z|, \theta)\cdot(|w|, \mu) = (|z||w|,\ \theta + \mu). \tag{1}$$

In other words, as shown in Part I, complex multiplication multiplies the moduli and adds the arguments.

Finally, the polar form of conjugation is $\bar{z} = \overline{(|z|, \theta)} = (|z|, -\theta)$, i.e., conjugation preserves the modulus and negates the argument. As in Part I, wherever possible throughout this session we will strive to provide both algebraic and geometric arguments.
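Rule (1) and the polar form of conjugation are easy to try out with Python's standard `cmath` module, whose `polar` and `rect` functions convert between Cartesian and polar form. The sketch below uses two arbitrarily chosen numbers; the helper name `multiply_polar` is made up for this illustration.

```python
import cmath
import math


def multiply_polar(p, q):
    """Multiply two complex numbers given as (modulus, argument) pairs, as in (1)."""
    return (p[0] * q[0], p[1] + q[1])


z = 1 + 1j        # polar form (sqrt(2), pi/4)
w = 2j            # polar form (2, pi/2)

# Multiply in polar form, then convert the result back to Cartesian form.
modulus, argument = multiply_polar(cmath.polar(z), cmath.polar(w))
product_from_polar = cmath.rect(modulus, argument)

print("direct product:     ", z * w)
print("product via polar:  ", product_from_polar)
print("polar form of z:    ", cmath.polar(z))
print("polar form of conj: ", cmath.polar(z.conjugate()))
```

Up to rounding, the two products agree, and the conjugate has the same modulus as z with the opposite argument.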

3. Complex Division

We saw in Part I how to add, subtract, and multiply two complex numbers z and w. In R, we can also divide two numbers x and y, as long as y ≠ 0. Can we do this in C too? In other words, can we rewrite the ratio z/w as some complex number q such that qw = z?

3.1. Conjugation to the rescue! Let’s study first a special product:

Exercise 1. Prove that $w\bar{w} = |w|^2$ for any w ∈ C, i.e., multiplying a complex number by its conjugate produces a real number: the square of its modulus.

Figure 2. Multiplying by the conjugate

One solution: There is no mystery about the origins of the equation $w\bar{w} = |w|^2$. Indeed, if we view it in polar coordinates, the product $w\bar{w}$ simply “kills” the angle θ and lands us on the real axis (cf. Fig. 2a):

$$w\bar{w} = (|w|, \theta)(|w|, -\theta) = (|w||w|,\ \theta - \theta) = (|w|^2, 0) = |w|^2 \in \mathbb{R}.$$


Exercise 1 leads us to one of the oldest “tricks” with complex numbers:

PST 62. If you want to get rid of i in the denominator of a fraction z/w, multiply both top and bottom by the conjugate $\bar{w}$ of the denominator:

$$\frac{z}{w} = \frac{z\bar{w}}{w\bar{w}} = \frac{z\bar{w}}{|w|^2} = \left(\frac{1}{|w|^2}\right) z\bar{w}. \tag{2}$$

Note that the last expression in (2) is a well-defined complex number: the product $z\bar{w}$ is rescaled by the real number $1/|w|^2$. Thus, we do know how to divide two complex numbers: simply apply PST 62. Let’s try it.



Exercise 2. Find real numbers x and y such that x + yi equals the fraction
(a) $\dfrac{1}{5-4i}$; (b) $\dfrac{2+3i}{5-4i}$; (c) $\dfrac{7-v}{7+v}$ if v = 2 + 3i; (d) $\dfrac{a+bi}{c+di}$ for a, b, c, d ∈ R.

Solution for part (a): The desired fraction is nothing but the reciprocal of w = 5 − 4i. Applying (2) with z = 1 we obtain a formula for $w^{-1}$:

$$\frac{1}{w} = \frac{\bar{w}}{|w|^2} = \frac{5+4i}{25+16} = \frac{5}{41} + \frac{4}{41}i. \tag{3}$$

Whoever diligently completes part (d) will frown at the resulting complicated formula for the ratio z/w: this happened because the question was phrased in Cartesian coordinates. Can we interpret division in C via polar coordinates in a more easily remembered and natural way? Prove the following corollary in two ways: algebraically and geometrically, and convince yourselves that it makes sense for any non-zero denominator w.

Corollary 1. If z = (|z|, θ) and w = (|w|, μ), then

$$\frac{z}{w} = \left(\frac{|z|}{|w|},\ \theta - \mu\right), \quad\text{and}\quad \frac{1}{w} = \left(\frac{1}{|w|},\ -\mu\right).$$

In other words, division in C divides the moduli and subtracts the arguments of the numerator and denominator.
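PST 62 translates directly into a small routine. The sketch below represents a complex number as a pair of integers (real part, imaginary part) and follows (2) step by step: multiply by the conjugate of the denominator, then rescale by the real number 1/|w|². The names `conjugate`, `multiply`, and `divide` are illustrative choices, and exact fractions are used so that the answer to part (a) appears without rounding.

```python
from fractions import Fraction


def conjugate(w):
    """Conjugate of w = (c, d), i.e., c - di."""
    return (w[0], -w[1])


def multiply(z, w):
    """Product of z = (a, b) and w = (c, d) in Cartesian form."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)


def divide(z, w):
    """Quotient z / w following PST 62, returned as a pair of exact fractions."""
    modulus_squared = w[0] * w[0] + w[1] * w[1]   # w times its conjugate
    if modulus_squared == 0:
        raise ZeroDivisionError("the denominator w must be non-zero")
    top = multiply(z, conjugate(w))
    return (Fraction(top[0], modulus_squared), Fraction(top[1], modulus_squared))


# Part (a) of Exercise 2: the reciprocal of 5 - 4i.
print(divide((1, 0), (5, -4)))
```

Multiplying the result back by the denominator is a quick way to confirm any quotient obtained this way.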

3.2. Division in C is respected. Just like in Part I, it is time to understand how the operations of modulus and conjugation interact with complex division. That they respect division should come as no surprise. To see this, do the following exercise in two ways, using Cartesian or polar forms.

Exercise 3. For z, w ∈ C with w ≠ 0, prove that $\left|\dfrac{z}{w}\right| = \dfrac{|z|}{|w|}$ and $\overline{\left(\dfrac{z}{w}\right)} = \dfrac{\bar{z}}{\bar{w}}$.

It is often advantageous to phrase mathematical statements without reference to a particular notation: this way, they can be better understood, readily remembered, and more easily used in various situations. We have done just that on a number of occasions. For example, word reformulation of the equation $w\bar{w} = |w|^2$ can be applied to regular polygons placed on the unit circle as the pentagon in Figure 2b: the product of pairs of conjugate vertices always lands on the vertex corresponding to 1. For the pentagon, this means A₁ · A₄ = A₀ = 1 = A₂ · A₃. As another example of word reformulation, let’s rephrase Exercise 3.


Corollary 2. In C, the modulus of a ratio equals the ratio of the moduli, and the conjugate of a ratio equals the ratio of the conjugates. Note how the word order changes in the sentences. Good mathematicians use well-phrased statements. Contrary to what the general public is likely to imagine mathematics to be–they might see countless blackboards covered with complicated formulas and “big” numbers–the simple truth is that the more involved and advanced the math topic is, the more letters, words, and language permeate it to express the complexity of concepts and abstractness of ideas, and to allow for applications of these ideas to numerous other areas of mathematics, sciences, and everyday life. Therefore, learning to verbalize our math statements well in words is a skill worth acquiring as early as possible. As we go along, we shall paraphrase “in English” some important conclusions; but it is an ongoing task for the reader to perform this constantly, as we did in Corollary 2.

4. The Triangle Inequality: No “Respect” for Addition?

4.1. Modulus and addition. We have seen that conjugation respects all four standard operations on C: addition, subtraction, multiplication, and division.¹ We have also observed that the modulus preserves multiplication and division. How about modulus and addition? Let’s experiment.

¹ In Abstract Algebra this identifies conjugation as an automorphism of C as a field. Fancy!

Exercise 4. For z = 3 + 4i and w given below, compare the modulus of their sum with the sum of their moduli. Which is larger, |z + w| or |z| + |w|? How about |z − w| and |z| − |w|? Are they ever equal?
(a) w = 5 − 12i; (b) w = 6 + 8i; (c) $w = 1 + \frac{4}{3}i$.

We have finally come to operations in C that are not respected: the modulus does not respect addition or subtraction on C. Instead, we have:

Theorem 1. (Triangle Inequality) For any complex numbers z and w it is true that |z + w| ≤ |z| + |w|. Equality is attained iff z and w are nonnegative multiples of each other: z = kw or w = kz for some real k ≥ 0.

Proof: Geometrically, the situation becomes familiar if we label points P = z, Q = w, and T = z + w, as in Figure 3a. We realize that the sides of △OPT are given by |OP| = |z|, |PT| = |OQ| = |w|, and |OT| = |z + w|. Since the shortest route from O to T is the straight segment between them, we see that |OT| ≤ |OP| + |PT|; i.e., |z + w| ≤ |z| + |w|. Equality will be attained iff △OPT degenerates to a segment OT with point P between O and T (cf. Fig. 3b); i.e., z and w are positive multiples of each other or one of them is 0 (why?).


Figure 3. △-Inequality, △-Equality, and Subtraction version

All right, but how does one prove this geometric version of the Triangle Inequality? Check out Geometry II.

4.2. Modulus and subtraction. It is worth pointing out that there is a subtraction version of the Triangle Inequality:

Corollary 3. |z − w| ≥ |z| − |w| for any z, w ∈ C.

Partial Proof: Corollary 3 is equivalent to |z − w| + |w| ≥ |z| (cf. Fig. 3c), which is true by the ordinary Triangle Inequality applied to (z − w) and w:

$$|z - w| + |w| \ge |(z - w) + w| = |z|. \tag{4}$$

For which z and w is equality obtained in Corollary 3?

Note that by reworking our subtraction inequality to look exactly like the previously proved Triangle Inequality, we applied a reduction PST that appeared in various forms in Volume I and in Group Theory I.
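Theorem 1 and Corollary 3 can be probed experimentally on randomly chosen complex numbers. The sketch below is only an experiment, not a proof: the function names, the sample size, the fixed random seed, and the small tolerance for rounding are all arbitrary choices made for this illustration.

```python
import random


def addition_gap(z, w):
    """|z| + |w| - |z + w|, which Theorem 1 says is never negative."""
    return abs(z) + abs(w) - abs(z + w)


def subtraction_gap(z, w):
    """|z - w| - (|z| - |w|), which Corollary 3 says is never negative."""
    return abs(z - w) - (abs(z) - abs(w))


def random_complex(rng, bound=10.0):
    return complex(rng.uniform(-bound, bound), rng.uniform(-bound, bound))


rng = random.Random(0)  # fixed seed so the experiment is repeatable
pairs = [(random_complex(rng), random_complex(rng)) for _ in range(1000)]

# Allow a tiny negative value to absorb floating-point rounding.
all_hold = all(
    addition_gap(z, w) >= -1e-12 and subtraction_gap(z, w) >= -1e-12
    for z, w in pairs
)
print("both inequalities hold on all samples:", all_hold)

# The equality case of Theorem 1: w is a nonnegative real multiple of z.
z = 1 + 2j
print("gap when w = 3z:", addition_gap(z, 3 * z))
```

For a generic pair the gap is strictly positive; it closes (up to rounding) only when the triangle with vertices 0, z, and z + w degenerates.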

5. Integer Powers in C

We introduce and discuss integer powers in C via several problems.

Problem 1. Calculate $(\sqrt{3} + i)^{2004}$ and $(1 - i)^{2004}$.

Multiplying out the 2004 terms $(\sqrt{3} + i)$ seems ludicrous. Even if the reader is familiar with the Binomial Theorem,² it is still unclear how to simplify the final answer. We need another method that quickly yields powers of complex numbers. The difficulty here arises from the “wrong” viewpoint in the formulation of Problem 1: it is again phrased in Cartesian coordinates, while polar ones are a lot more insightful.

² Check Combinatorics I in volume I.

5.1. de Moivre’s formula saves the day!

Theorem 2. (de Moivre) In polar coordinates: if z = (r, θ) then $z^n = (r^n, n\theta)$ for any n ∈ Z. In Cartesian coordinates: if z = |z|(cos θ + i sin θ) then $z^n = |z|^n(\cos n\theta + i\sin n\theta)$.

There isn’t much to prove here when n ≥ 0: de Moivre’s Theorem is an n-repeated application of complex multiplication of the same number z. When n < 0 a small calculation is necessary. Using Corollary 1 for n = −5:

$$z^n = z^{-5} = (z^5)^{-1} = (r^5, 5\theta)^{-1} = (r^{-5}, -5\theta) = (r^n, n\theta).$$

In either case, to apply de Moivre’s Theorem, we need to know the angle θ.

PST 63. Let z = a + bi in Cartesian coordinates. To find the argument θ of z,
• factor out |z|, as in $z = |z|\left(\frac{a}{|z|} + i\,\frac{b}{|z|}\right)$, and
• check which angle θ fits the bill: $\cos\theta = \frac{a}{|z|}$ and $\sin\theta = \frac{b}{|z|}$.

Solution to Problem 1: Factor $|z| = \sqrt{3 + 1} = 2$ from $z = \sqrt{3} + i$:

$$z = 2\left(\tfrac{\sqrt{3}}{2} + \tfrac{1}{2}i\right) = 2\left(\cos\tfrac{\pi}{6} + i\sin\tfrac{\pi}{6}\right).$$

de Moivre’s Theorem then yields:

$$z^{2004} = 2^{2004}\left(\cos\tfrac{2004}{6}\pi + i\sin\tfrac{2004}{6}\pi\right) = 2^{2004}(\cos 334\pi + i\sin 334\pi) = 2^{2004}.$$

The reader should repeat this calculation for $(1 - i)^{2004}$.
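The bookkeeping in de Moivre’s Theorem can be mirrored exactly by storing the argument as a rational multiple of π. In the sketch below, the function name `de_moivre` and the convention of passing θ/π as a fraction are assumptions made for this illustration; the returned argument is reduced modulo 2π, i.e., modulo 2 in units of π.

```python
from fractions import Fraction


def de_moivre(modulus, arg_over_pi, n):
    """Polar form of z**n for z = (modulus, pi * arg_over_pi) and an integer n.

    Returns (modulus**n, new_arg_over_pi) with the argument reduced to [0, 2).
    """
    new_modulus = modulus ** n
    new_arg_over_pi = (n * arg_over_pi) % 2   # n*theta reduced modulo 2*pi
    return new_modulus, new_arg_over_pi


# z = sqrt(3) + i has modulus 2 and argument pi/6.
r, t = de_moivre(Fraction(2), Fraction(1, 6), 2004)
print("argument of z**2004 in units of pi:", t)
print("modulus equals 2**2004:", r == 2 ** 2004)

# A negative power, as in the n = -5 computation above.
print(de_moivre(Fraction(2), Fraction(1, 6), -5))
```

An argument of 0 means the power lies on the positive real axis, so it equals its modulus. Passing the modulus as a `Fraction` keeps negative powers exact as well.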



5.2. Hopscotch on mollusks. In Figure 4, locate the dot corresponding to the initial first power $z^1 = \sqrt{3} + i$. The white dots are the next several consecutive powers $z^2$, $z^3$, $z^4$, etc. Let us connect these white dots with a smooth curve (in solid black), which we call the power curve for z and denote by $P_z$. The resemblance of $P_z$ to a mollusk shell is inescapable!

Figure 4. $P_z$ for $z = \sqrt{3} + i$

If we also draw the “mollusk-type” power curve $P_w$ for w = 1 − i, which white dots on $P_w$ and $P_z$ would be the first to coincide? In other words:

Problem 2. Find the smallest positive integers m and n satisfying the equality $(\sqrt{3} + i)^m = (1 - i)^n$.

Solution: Recycling the idea in the solution to Problem 1, we obtain:

$$(\sqrt{3} + i)^m = z^m = 2^m\left(\cos\tfrac{\pi}{6}m + i\sin\tfrac{\pi}{6}m\right). \tag{5}$$

Similarly, $|w| = \sqrt{2}$, $w = \sqrt{2}\left(\tfrac{1}{\sqrt{2}} - i\,\tfrac{1}{\sqrt{2}}\right) = \sqrt{2}\left(\cos(-\tfrac{\pi}{4}) + i\sin(-\tfrac{\pi}{4})\right)$, and

$$(1 - i)^n = w^n = (\sqrt{2})^n\left(\cos(-\tfrac{\pi}{4}n) + i\sin(-\tfrac{\pi}{4}n)\right). \tag{6}$$

Before launching into complex calculations, consider the following “obvious”

PST 64. If you want z = w, write in polar form z = (|z|, θ) and w = (|w|, μ) and equate the moduli and arguments: |z| = |w| and θ ≡ μ (mod 2π).

Recall from Number Theory I (vol. I) that “mod 2π” simply means that θ and μ differ by a multiple of 2π. So, we apply PST 64 to $z^m$ and $w^n$. For starters, the moduli must be equal, i.e., $2^m = (\sqrt{2})^n$, and hence n = 2m. Excellent: n must be even, which simplifies (6) to

$$(1 - i)^n = w^{2m} = 2^m\left(\cos(-\tfrac{\pi}{2}m) + i\sin(-\tfrac{\pi}{2}m)\right). \tag{7}$$


Now we “equate” the arguments in (5) and (7): $\frac{\pi}{6}m = -\frac{\pi}{2}m + 2k\pi$, from which we conclude m = 3k, for some k ∈ Z. We remember at this point that we were looking for the smallest positive m and n, i.e., m = 3 and n = 2m = 6. The reader is encouraged (as always!) to check the answers by plugging them into (5) and (6); you should get $z^3 = w^6 = 8i$ (cf. Fig. 4).

5.3. Landing on the axes. In the next problem, we seek out all z ∈ C such that the fourth white dot on their power curve $P_z$ lands on the real or the imaginary axis.

Problem 3. If z = a + bi ∈ C, find relations between a and b such that (a) $z^4$ is real; (b) $z^4$ is purely imaginary.

Hint: The problem is again stated in the “wrong” coordinates. Instead, write z = (r, θ) in polar form and use de Moivre’s formula: $z^4 = (r^4, 4\theta)$. Note that the modulus of z is irrelevant in our question, since landing on a specific axis is determined entirely by the angle θ. For example, in part (b), in order for $z^4$ to be purely imaginary, 4θ must “line up” with the positive or the negative imaginary axis, which yields two possibilities: 4θ ≡ ±π/2 (mod 2π). These two possibilities are contained in the single congruence relation 4θ ≡ π/2 (mod π) (why?). ♦
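Returning to Problem 2, the answer m = 3, n = 6 can be double-checked by brute force: list the first few powers of z = √3 + i and w = 1 − i and look for the first coincidence. The sketch below relies on floating-point arithmetic, so equality is tested only up to a small relative tolerance; the function name `first_common_power` and the search limit are arbitrary choices for this illustration.

```python
import cmath
import math


def first_common_power(z, w, limit=50):
    """Smallest m (and then smallest n) with z**m approximately equal to w**n."""
    for m in range(1, limit + 1):
        for n in range(1, limit + 1):
            if cmath.isclose(z ** m, w ** n, rel_tol=1e-9):
                return m, n
    return None  # no coincidence found within the search limit


z = complex(math.sqrt(3), 1)
w = 1 - 1j

print("first coincidence (m, n):", first_common_power(z, w))
print("z**3 is approximately:", z ** 3)
print("w**6 is approximately:", w ** 6)
```

The search is a numerical experiment rather than a proof, but it agrees with the argument via moduli and arguments given in the solution.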



5.4. Extending a mollusk spiral. Once we know how to find positive integer powers of complex numbers, we can fill in the white dots $z^n$ (for $n \in \mathbb{N}$) on the power curve for any $z \in \mathbb{C}$. Note that $P_z$ will start at $z$ and either spiral away from the origin if $|z| > 1$, or toward the origin if $|z| < 1$, or move along the unit circle if $|z| = 1$. (Draw a few cases for various $z$ to get a feeling for the three situations.) Yet, inspecting a mollusk carefully, it seems that the spiral does both things: it starts at the “origin”, and it spirals away forever (if the mollusk lives and grows forever).

Exercise 5. In Figure 4, which powers of $z = \sqrt{3} + i$ will extend the spiral $P_z$ from the initial dot $z$ toward the origin $O$?

Solution: The intended extension of $P_z$ is depicted by a dotted curve in Figure 4. The black dots on this curve are the negative integer powers of $z$: $z^{-1}, z^{-2}, z^{-3}$, etc. Indeed, for any integer $n > 0$,
\[ z^{-n} = \left(\tfrac{1}{|z|^n}, -n\theta\right) = \left(\tfrac{1}{2^n}, -n\theta\right), \quad\text{where } \theta = \tfrac{\pi}{6}. \]
Thus, the modulus $1/2^n$ becomes smaller and approaches 0 as $n$ increases, thereby pulling the complex numbers $z^{-n}$ toward the origin. Further, the angle $-n\theta$ rotates clockwise, making a spiral revolving toward $O$.

We can informally summarize this section as follows. For all non-zero $z \in \mathbb{C}$ the integer powers $z^n$ comprise the “skeleton” for the power curve $P_z$, to give the impression that $P_z$ “starts” at the origin and spirals away forever, or that it is the unit circle for $|z| = 1$.


6. Roots in C

Yet, there are plenty of empty spots on $P_z$ between two consecutive powers of $z$ in Figure 4. With what can we fill these spots? Since the angles associated to such complex numbers are non-integer multiples of $\theta$, considering the roots $\sqrt[n]{z}$ is reasonable. But can we take roots in C?

6.1. de Moivre again.

Corollary 4. (Root Formula) A non-zero complex $z = |z|(\cos\theta + i\sin\theta)$ has exactly $n$ complex $n$th roots $w_0, w_1, \ldots, w_{n-1}$, given by the formula
\[ w_k = \sqrt[n]{|z|}\left(\cos\frac{\theta + 2\pi k}{n} + i\sin\frac{\theta + 2\pi k}{n}\right) \quad\text{for } k = 0, 1, \ldots, n-1. \]

“Proof” via example: At first glance, this is a rather forbidding formula, so let’s take it apart. For concreteness, let $n = 5$. We want to find all 5th roots $w$ of $z$: $w^5 = z$ (cf. Fig. 5a). In other words, if $w = (r, \mu)$ and $z = (|z|, \theta)$, de Moivre’s formula yields
\[ w^5 = (r^5, 5\mu) \overset{?}{=} (|z|, \theta) = z. \]

Figure 5. 5th Roots of $z = \sqrt{3} + i$ and Regular pentagon in C

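For the moduli we must have $r^5 = |z|$, which accounts for the term $r = \sqrt[5]{|z|}$ in the Root Formula. Further, equating arguments of $w^5$ and $z$, we obtain $5\mu \equiv \theta \pmod{2\pi}$, i.e., $5\mu = \theta + 2k\pi$ for some integer $k$. Therefore $\mu = \tfrac{\theta}{5} + \tfrac{2k}{5}\pi$. Although, in principle, we can plug in any integer $k$, only five distinct angles $\mu$ will be formed up to a multiple of $2\pi$ (why?):
\[ \mu = \tfrac{\theta}{5},\quad \tfrac{\theta}{5} + \tfrac{2}{5}\pi,\quad \tfrac{\theta}{5} + \tfrac{4}{5}\pi,\quad \tfrac{\theta}{5} + \tfrac{6}{5}\pi,\quad \tfrac{\theta}{5} + \tfrac{8}{5}\pi \quad (\text{for } k = 0, 1, 2, 3, 4). \]
For instance, $k = 2007$ will land us on the third possibility:
\[ \tfrac{\theta}{5} + \tfrac{2\cdot 2007}{5}\pi = \tfrac{\theta}{5} + \left(802 + \tfrac{4}{5}\right)\pi \equiv \tfrac{\theta}{5} + \tfrac{4}{5}\pi \pmod{2\pi}. \]
This explains why there are exactly five roots $\sqrt[5]{z}$, given by $k = 0, 1, 2, 3, 4$.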
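The Root Formula can also be checked numerically. The following is a minimal Python sketch, not part of the original session: the function name `nth_roots` is a hypothetical helper, and the only assumption is that the standard `cmath.polar` and `cmath.rect` functions convert between Cartesian and polar form. It lists the five 5th roots of $z = \sqrt{3} + i$ and confirms that $k = 2007$ lands on the same root as $k = 2$.

```python
import cmath
import math

def nth_roots(z, n):
    """Return the n complex n-th roots of a non-zero z via the Root Formula."""
    if z == 0 or n < 1:
        raise ValueError("need z != 0 and n >= 1")
    modulus, theta = cmath.polar(z)        # z = (|z|, theta)
    root_modulus = modulus ** (1.0 / n)    # n-th root of |z|
    return [cmath.rect(root_modulus, (theta + 2 * math.pi * k) / n)
            for k in range(n)]

z = complex(math.sqrt(3), 1)               # z = sqrt(3) + i = (2, pi/6)
roots = nth_roots(z, 5)
for k, w in enumerate(roots):
    # The last column is |w^5 - z| and should be close to 0.
    print(k, w, abs(w ** 5 - z))

# Plugging in k = 2007 gives the same point as k = 2 (the "third possibility").
modulus, theta = cmath.polar(z)
w_2007 = cmath.rect(modulus ** 0.2, (theta + 2 * math.pi * 2007) / 5)
print(abs(w_2007 - roots[2]))
```

Changing the 5 to another positive integer repeats the experiment for a general $n$.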


Now you should repeat this whole reasoning for a general $n \in \mathbb{N}$, in the place of 5. To really understand the Root Formula, do the following:

Exercise 6. Let $z = (4, \tfrac{2}{3}\pi)$.
(a) Use polar coordinates to show that $z$ has exactly two square roots $\sqrt{z}$. First reason geometrically, and then use the Root Formula. Draw a picture.
(b) Repeat the exercise for the cube roots $\sqrt[3]{z}$, showing $z$ has exactly three cube roots.

6.2. The provocation: choosing your favorite root of z. Taking integer powers of $z$ is a “one-way” street in the sense that there is only one answer for $z^n$. Thus, for instance, filling in the white dots on the power curve $P_z$ for $z = \sqrt{3} + i$ was straightforward: plot the unique point $z^n$.

However, if we want to fill in $P_z$ with a dot corresponding to a root $\sqrt[5]{z}$, we have to make a choice among the five possible such roots $w_0, w_1, \ldots, w_4$. Note that all $w_k$’s have the same modulus $\sqrt[5]{|z|}$, i.e., they lie on the same circle centered at $O$ (drawn dashed in Fig. 5a). Hence only one of the $w_k$’s should land on the spiral $P_z$. Figure 5a seems to indicate that for $z = \sqrt{3} + i$, it is
\[ w_0 = \sqrt[5]{|z|}\left(\cos\tfrac{\theta}{5} + i\sin\tfrac{\theta}{5}\right) \]
that lands on $P_z$. But is this true for any complex $z$? Besides, the angle $\theta$ in the polar form for $z$ was arbitrarily chosen up to a multiple of $2\pi$. If we change $\theta$ to $\theta + 2\pi$ in the expression for $w_0$, we will end up with the formula for
\[ w_1 = \sqrt[5]{|z|}\left(\cos\tfrac{\theta + 2\pi}{5} + i\sin\tfrac{\theta + 2\pi}{5}\right). \]
So, should $w_1$ also lie on $P_z$? What is going on? We can clearly see that only one of the roots $w_i$ can land on $P_z$.

The answer is hiding where we are not looking for it: we haven’t really defined the power curve $P_z$, other than saying it is a “smooth curve passing through all integer powers of $z$”. But maybe there are several such smooth curves, one of which passes through $w_0$, and another through $w_1$?! We shall resolve this question in Part III and extend the discussion to any powers $z^v$, whether $v$ is real or complex. For the time being, check your understanding by solving the following:

Exercise 7. Consider the equations $w^6 = z$ for $z = 1$, $-64$, $i$ and $64i$.
(a) Find all complex solutions $w$ of these four equations, and draw pictures.
(b) In each case, can you visually select “the one” solution $w$ which lies on the corresponding power curve $P_z$? How are you sure that your choice is correct?


7. Roots of Unity and Regular Polygons

7.1. A definition is in order. The frequently encountered equation $z^n = 1$ (or equivalently, $z^n - 1 = 0$) is so important that its $n$ distinct C-roots have been named the $n$th roots of unity. By the Root Formula, they are given by:
\[ \omega_k = \cos\tfrac{2\pi k}{n} + i\sin\tfrac{2\pi k}{n} \quad\text{for } k = 0, 1, \ldots, n-1. \]
In other words, all $(\omega_k)^n = 1$. We denote the root $\omega_1 = \cos\tfrac{2\pi}{n} + i\sin\tfrac{2\pi}{n}$ by $\omega$ and refer to it as a primitive $n$th root of unity.³ The powers of $\omega$ yield the other roots of unity: $\omega^k = \omega_k$ for all $k$, and hence the name primitive root.

We conclude that the original polynomial $z^n - 1$ factors as:
\[ (z - \omega_0)(z - \omega_1)\cdots(z - \omega_{n-1}) = (z - 1)(z - \omega)(z - \omega^2)\cdots(z - \omega^{n-1}). \tag{8} \]

Exercise 8. Verify that $\overline{\omega} = \omega^{-1} = \omega_{n-1}$. Conclude that the other roots of unity also pair up under conjugation: $\overline{\omega_k} = \omega_{n-k}$ for $k = 0, 1, \ldots, n-1$ (with indices read mod $n$, so that $\omega_n = \omega_0$).

7.2. Choosing the best coordinate system. The roots of unity are not only algebraic objects – roots of a polynomial – they are also geometric objects: the vertices of a regular $n$-gon inscribed in the unit circle (cf. Fig. 5b). Can we use this to our advantage in geometry problems? Let us start with a relatively straightforward situation.

Exercise 9. Let $A_0A_1A_2A_3A_4$ be a regular pentagon. Find a C–coordinate system in which the five vertices $A_k$ are easily encoded as complex numbers.

Solution: “Obviously,” we should place all vertices of the pentagon on the unit circle. This forces us to choose the origin $O$ as the center of the pentagon (cf. Fig. 5b). For ease of calculations, we can further place the number “1” to coincide with vertex $A_0$, and we can even choose the positive imaginary axis in such a way that the vertices $A_0, A_1, \ldots, A_4$ are arranged counterclockwise along the unit circle. Since the five central angles $\angle A_0OA_1, \angle A_1OA_2, \ldots, \angle A_4OA_0$ are all equal to $2\pi/5$, one can get from any vertex to the next via multiplication by the primitive 5th root $\omega$: $A_0 = 1$, $A_1 = \omega$, $A_2 = \omega^2$, $A_3 = \omega^3$, and $A_4 = \omega^4$. In other words, the five vertices correspond to the five 5th roots of unity.

The reader should now be able to generalize this situation to any regular $n$-gon and draw the following problem-solving conclusion:

 PST 65. In a specific geometry problem, pick the most convenient point and unit of length for the origin and for the radius of the unit circle; in plain language, you can place the points 0 and 1 wherever in the plane you (wisely) wish, thereby fixing the real axis. If need be, you can further also pick one of two possibilities for the positive direction of the imaginary axis.

3 Note the subtle difference between the Greek letter ω (omega), which we reserve for the roots of unity, and the Latin letter w, which stands for an arbitrary complex number.


7.3. What if a C–system is fixed? For example, if the C–coordinates are already chosen to fit well other objects in the plane, how do we determine if a given n-gon is regular?



Exercise 10. Prove that $A_0A_1\ldots A_{n-1}$ is a regular $n$-gon iff for some $v, z \in \mathbb{C}$, $v \neq 0$, and all $k = 0, 1, \ldots, n-1$, we have
(a) $A_k = v\omega^k + z$ (polygon oriented counterclockwise);
(b) $A_k = v\omega^{-k} + z$ (polygon oriented clockwise).

Figure 6. Regular polygon, oriented clockwise

Sketch: Figure 6 demonstrates how to transform the fifth roots of unity $\omega_k$ to the vertices $A_k$ of an arbitrary regular pentagon, oriented clockwise:
• Reflect the (rightmost) polygon in Figure 6d across the real axis to reverse the order of its vertices, as in the polygon in Figure 6c.
• Rotate the latter about the origin to arrive at the dashed polygon in Figure 6b.
• Rescale the latter to land on polygon $B_0B_1\ldots B_4$, also in Figure 6b.
• Translate the latter to obtain polygon $A_0A_1\ldots A_4$ in Figure 6a.

We can schematically arrange the transformations as follows:
\[ \text{poly 5} \xleftarrow{\;+z\;} \text{poly 4} \xleftarrow{\;*r\;} \text{poly 3} \xleftarrow{\;*u\;} \text{poly 2} \xleftarrow{\;\overline{w}\;} \text{poly 1}, \]
where $u$ is unit ($|u| = 1$), $r$ is real, $z$ is complex, and the rightmost arrow is the reflection $w \mapsto \overline{w}$. Note that multiplying by the complex $v = ru$ represents directly the move $\text{poly 4} \xleftarrow{\;*v\;} \text{poly 2}$. Putting everything together, the total transformation sends $\omega^k \mapsto v\overline{\omega^k} + z = v\omega^{-k} + z = A_k$. ♦

Often, you will see the equations in Exercise 10 paraphrased as follows.



Exercise 11. Show that $A_0A_1\ldots A_{n-1}$ is a regular $n$-gon oriented counterclockwise iff
\[ \frac{A_{k+2} - A_{k+1}}{A_{k+1} - A_k} = \omega \quad\text{for } k = 0, 1, \ldots, n-3. \]
In particular, show that $A_0A_1A_2$ is equilateral iff $\nu_0A_0 + \nu_1A_1 + \nu_2A_2 = 0$ where $\nu_0, \nu_1, \nu_2$ are the third roots of unity (in some order).
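One direction of Exercises 10 and 11 is easy to test numerically. The following Python sketch is only an illustration under stated assumptions: the helper names `regular_polygon` and `consecutive_ratios` are hypothetical, and the values of $v$ and $z$ are arbitrary choices. It builds a counterclockwise regular pentagon $A_k = v\omega^k + z$ and checks that each ratio of consecutive side vectors equals $\omega$.

```python
import cmath
import math

def regular_polygon(n, v, z):
    """Vertices A_k = v * omega^k + z of a counterclockwise regular n-gon."""
    omega = cmath.exp(2j * math.pi / n)
    return [v * omega ** k + z for k in range(n)]

def consecutive_ratios(vertices):
    """Ratios (A_{k+2} - A_{k+1}) / (A_{k+1} - A_k) for k = 0, ..., n-3."""
    return [(vertices[k + 2] - vertices[k + 1]) / (vertices[k + 1] - vertices[k])
            for k in range(len(vertices) - 2)]

n = 5
omega = cmath.exp(2j * math.pi / n)
pentagon = regular_polygon(n, v=2 - 1j, z=3 + 4j)   # arbitrary v != 0 and z
ratios = consecutive_ratios(pentagon)
print(all(abs(ratio - omega) < 1e-12 for ratio in ratios))
```

A numerical check like this is no substitute for the proof, but it is a quick way to catch a sign or index slip before writing one.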


8. Geometric Promise Fulfilled

8.1. Products of distances. We are finally ready to take on the introductory nonagon Problem 1 from Part I and solve it with complex numbers. Here follows a paraphrase that utilizes some of the new terminology and techniques we learned in this session so far.

Problem 4. The vertices of a regular 9-gon inscribed in the unit circle are $A_k = \omega^k$ for $k = 0, 1, \ldots, 8$, where $A_0 = 1$ and $A_1 = \omega = (1, 2\pi/9)$, a primitive 9th root of unity. Prove that the product of segment lengths from $A_0$ to the other eight vertices is 9, i.e., that $\prod_{k=1}^{8} |A_0A_k| = 9$ (cf. Fig. 7a).

Figure 7. Problem 4 and Generalizations

Solution: Since length is represented by modulus in C, i.e., $|A_0A_k| = |1 - \omega^k|$ for all $k$, the desired equality becomes
\[ |1 - \omega||1 - \omega^2||1 - \omega^3|\cdots|1 - \omega^8| \overset{?}{=} 9 \tag{9} \]
\[ \Leftrightarrow\ |(1 - \omega)(1 - \omega^2)(1 - \omega^3)\cdots(1 - \omega^8)| \overset{?}{=} 9. \tag{10} \]

Equation (10) was obtained using the fact that modulus respects multiplication. But this looks suspiciously like the factorization of $z^9 - 1$ in (8):
\[ z^9 - 1 = (z - 1)(z - \omega)(z - \omega^2)\cdots(z - \omega^8). \tag{11} \]
If we plug $z = 1$ into (11), we will get $0 = 0$, which isn’t helpful. The reason for the zeros is the factor of $(z - 1)$ on both sides, so we divide by it:
\[ \frac{z^9 - 1}{z - 1} = (z - \omega)(z - \omega^2)\cdots(z - \omega^8). \tag{12} \]
Plugging $z = 1$ in the RHS is OK now, but we must get rid of $(z - 1)$ in the denominator on the LHS. Let’s recall the following useful factorization:

Lemma 1. $z^n - 1 = (z - 1)(z^{n-1} + z^{n-2} + \cdots + z + 1)$ for $z \in \mathbb{C}$ and $n \in \mathbb{N}$.

“Proof” of Lemma 1 via Example: To make things crystal clear for the novice, let $n = 5$. Brute force works great here – use the distributive property to expand the RHS and cancel just about everything in sight:
\[ (z - 1)(z^4 + z^3 + z^2 + z + 1) = z^5 + z^4 + z^3 + z^2 + z - z^4 - z^3 - z^2 - z - 1 = z^5 - 1. \]
♦


Applying Lemma 1 for $n = 9$, the LHS of (12) becomes
\[ \frac{z^9 - 1}{z - 1} = \frac{(z - 1)(z^8 + z^7 + \cdots + z + 1)}{(z - 1)} = z^8 + z^7 + \cdots + z + 1 \]
\[ \Rightarrow\ z^8 + z^7 + \cdots + z + 1 = (z - \omega)(z - \omega^2)\cdots(z - \omega^8). \tag{13} \]

We are finally free to plug $z = 1$ into (13): $9 = (1 - \omega)(1 - \omega^2)\cdots(1 - \omega^8)$, and noting that $|9| = 9$ we get the desired equality of distances in (10).

The generalization of this solution to any regular $n$-gon is straightforward and it can be summarized algebraically by

Corollary 5. If $\omega$ is an $n$th primitive root of unity, then
\[ z^{n-1} + z^{n-2} + \cdots + z + 1 = (z - \omega)(z - \omega^2)\cdots(z - \omega^{n-1}) \]
as polynomials in $z$. In particular, $(1 - \omega)(1 - \omega^2)\cdots(1 - \omega^{n-1}) = n$.

8.2. Getting extra mileage. The reader should be at least a bit curious about the need for factoring out and cancelling $(z - 1)$ on both sides of (11): this was caused exclusively by our determination to plug in $z = 1$. What if we plug any other complex number $z$ into (11)? As long as $z \neq \omega^k$ (for any integer $k$), we will get a non-trivial equality. The question is: which of these equalities will correspond to an elegant geometric formula?

Problem 5. Let $A_0A_1\ldots A_{n-1}$ be a regular polygon inscribed in a circle of radius $r$ and center $O$, and let $P$ be a point on ray $\overrightarrow{OA_0}$ beyond $A_0$. Prove that the product of distances from $P$ to the vertices of the polygon is $|OP|^n - r^n$.

Proof: A nonagon version of this problem is presented in Figure 7b. Again, we fix the origin $O$ at the center of the polygon, and let $A_0$ lie on the positive real axis. Because of the given radius $r$, we slightly adjust by making $A_0 = r$, and hence $A_k = r\omega^k$ for $k = 0, 1, \ldots, n-1$. Since $P$ also lies on the positive real axis, it is advantageous to write $P$ in a similar way: $p = rq$ for some real $q > 0$. The desired product is calculated by:
\[ \prod_{k=0}^{n-1} |PA_k| = \prod_{k=0}^{n-1} |rq - r\omega^k| = r^n \prod_{k=0}^{n-1} |q - \omega^k| = r^n |(q - 1)(q - \omega)(q - \omega^2)\cdots(q - \omega^{n-1})| = r^n |q^n - 1|. \]
The last equality was obtained from (8) for $z = q$. We can put $r$ back inside the modulus: $r^n|q^n - 1| = |(rq)^n - r^n| = |p^n - r^n|$. Now, if $P$ were an arbitrary point, we would stop here since there would be nothing to simplify. But $P$ lies on the ray $\overrightarrow{OA_0}$, outside of the circle. Hence, $p \in \mathbb{R}$ and $p > r$. Thus, $p^n - r^n$ is also a positive real number, which therefore equals its modulus:
\[ \prod_{k=0}^{n-1} |PA_k| = |p^n - r^n| = p^n - r^n = |OP|^n - r^n. \]

Along the way we established that for an arbitrary point $P$, the corresponding product (illustrated in Fig. 7c) is given by $\prod_{k=0}^{n-1} |PA_k| = |p^n - r^n|$.
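The product formulas of Problems 4 and 5 lend themselves to a quick numerical experiment. The sketch below is an illustration only: `distance_product` is a hypothetical helper name, and the sample radius and points are arbitrary. It places the polygon as in the proof, with $A_k = r\omega^k$, and compares the measured products with 9, with $|OP|^n - r^n$ for $P$ on the ray, and with $|p^n - r^n|$ for an arbitrary $P$.

```python
import cmath
import math

def distance_product(n, r, p):
    """Product of distances from the point p to the vertices A_k = r * omega^k."""
    omega = cmath.exp(2j * math.pi / n)
    product = 1.0
    for k in range(n):
        product *= abs(p - r * omega ** k)
    return product

n = 9
omega = cmath.exp(2j * math.pi / n)

# Problem 4: product of the eight chords from A_0 = 1 in the unit circle.
chords = 1.0
for k in range(1, n):
    chords *= abs(1 - omega ** k)
print(chords)                                   # should be close to 9

# Problem 5: P on the positive real axis beyond A_0 (here r = 1.5, |OP| = 2.5).
r, p_on_ray = 1.5, 2.5
print(distance_product(n, r, p_on_ray), p_on_ray ** n - r ** n)

# Arbitrary P in the plane: the product equals |p^n - r^n|.
p_any = 0.7 + 1.1j
print(distance_product(n, r, p_any), abs(p_any ** n - r ** n))
```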


9. Venturing Everywhere in the Plane

9.1. Sums versus products. Now, why should the product of the above segment lengths be any more interesting than, say, their sum? If you try to calculate $|PA_0| + |PA_1| + \cdots + |PA_{n-1}|$, you will find out that, due to convoluted square roots, this sum is harder to control and simplify than the product. For some people the more obvious and more important question would be to investigate the sum of squares, $|PA_0|^2 + |PA_1|^2 + \cdots + |PA_{n-1}|^2$; for instance, such people
• may have studied a bit of statistics and are therefore always tempted to minimize sums of squares of distances;
• are geometry fans of Pythagorean-like problems and would like to generalize the Pythagorean Theorem;
• have understood Part I of complex numbers well enough to realize that the modulus $|z|$ is much harder to manipulate since it involves a square root, while the square $|z|^2 = z\overline{z} = a^2 + b^2$ is susceptible to more than one method of slick calculation.

9.2. Restricting point P to the circumcircle allows us to calculate the sum of the squares for this special placement of $P$ (cf. Fig. 8a).

Problem 6. If $A_0A_1\ldots A_{n-1}$ is a regular $n$-gon and $P$ lies on its circumscribed circle, prove that $|PA_1|^2 + |PA_2|^2 + \cdots + |PA_n|^2$ is constant (here $A_n = A_0$).

Proof:⁴ The C–coordinatization from Problem 5 also works well here. We set $A_k = r\omega^k$ for $k = 0, 1, \ldots, n$, where $r$ is the circumradius. Then $P = p = rq$ with $|q| = 1$ (why?), and $|PA_k| = |rq - r\omega^k| = r|q - \omega^k|$. Then
\[ \sum_{k=0}^{n-1} |PA_k|^2 = \sum_{k=0}^{n-1} r^2|q - \omega^k|^2 = r^2\sum_{k=0}^{n-1} (q - \omega^k)\overline{(q - \omega^k)} = r^2\sum_{k=0}^{n-1} (q - \omega^k)(\overline{q} - \overline{\omega}^k) \]
\[ = r^2\sum_{k=0}^{n-1} \left(q\overline{q} + (\omega\overline{\omega})^k - q\overline{\omega}^k - \overline{q}\omega^k\right) = r^2\sum_{k=0}^{n-1} 2 - r^2 q\sum_{k=0}^{n-1}\overline{\omega}^k - r^2\overline{q}\sum_{k=0}^{n-1}\omega^k \]
\[ = 2nr^2 - r^2 q\sum_{k=0}^{n-1}\overline{\omega}^k - r^2\overline{q}\sum_{k=0}^{n-1}\omega^k. \tag{14} \]

Along the way, we used that $q\overline{q} = 1 = \omega\overline{\omega}$, since both $q$ and $\omega$ lie on the unit circle. A little bit of “cheating” is in order now. Recall that we are supposed to show that the above sum $\sum |PA_k|^2$ is independent of the position of $P$ on the circumscribed circle; i.e., $n$ and $r$ are OK, but the variable $q$ should be eliminated from the last expression. To this end, it would be very convenient if $r^2q$ and $r^2\overline{q}$ are each multiplied by 0 in (14). This leads us naturally to the next lemma.

Lemma 2. The sum of all $n$th roots of unity is 0: $\sum_{k=0}^{n-1} \omega^k = 0$. Geometrically, the center of a regular $n$-gon inscribed in the unit circle is the origin.

4 Algebraic manipulations of complex numbers are required for this solution.

Figure 8. Sums of squares

We present four different proofs of this fact, each proof using a different PST. Even though four proofs are an “over-overkill” for the task at hand, one never knows which idea will end up being useful in a later situation.

Proof 1 (Equating): By (8), $z^n - 1 = (z - \omega_0)(z - \omega_1)\cdots(z - \omega_{n-1})$. This is not just one equation, but several. Indeed, if we multiply out the RHS and regroup around the powers of $z$ we will obtain
\[ z^n - 1 = z^n - (\omega_0 + \omega_1 + \cdots + \omega_{n-1})z^{n-1} + \cdots + (-1)^n(\omega_0\omega_1\cdots\omega_{n-1}). \]
We can equate the coefficients⁵ on both sides for any power of $z$. But there is no term with $z^{n-1}$ on the LHS! Therefore, equating its coefficients on both sides yields $0 = \omega_0 + \omega_1 + \cdots + \omega_{n-1}$.

Proof 2 (Series): Why not use Lemma 1? Substituting the $n$th primitive root of unity $\omega$ for $z$, we arrive at
\[ 1 + \omega + \omega^2 + \cdots + \omega^{n-1} = \frac{\omega^n - 1}{\omega - 1} = \frac{1 - 1}{\omega - 1} = 0. \]
Along the way, we realized that Lemma 1 is equivalent to the well-known and useful formula for a geometric series, which made its appearance in the Stomp session in volume I:
\[ a + az + az^2 + \cdots + az^{n-1} = a\,\frac{z^n - 1}{z - 1} \quad\text{for any } a, z \in \mathbb{C},\ z \neq 1. \tag{15} \]

Proof 3 (Invariants): If $S = \omega_0 + \omega_1 + \cdots + \omega_{n-1}$, multiplying each vertex $\omega_k$ by the primitive root of unity $\omega$ simply rotates $\omega_k$ to the next vertex $\omega_{k+1}$ (where $\omega_n = \omega_0 = 1$). Overall, the set of vertices remains the same:
\[ \{\omega\omega_0, \omega\omega_1, \ldots, \omega\omega_{n-1}\} = \{\omega_1, \omega_2, \ldots, \omega_{n-1}, \omega_0\} = \{\omega_0, \omega_1, \ldots, \omega_{n-1}\}, \]
and thus the sums in these two sets are equal:
\[ S = \omega_0 + \omega_1 + \cdots + \omega_{n-1} = \omega\omega_0 + \omega\omega_1 + \cdots + \omega\omega_{n-1} = \omega S \ \Rightarrow\ S - \omega S = 0 \ \Rightarrow\ S(1 - \omega) = 0 \ \Rightarrow\ S = 0, \]
where the last step uses $\omega \neq 1$.

5 Equating these coefficients would yield $n$ relations between the roots and the coefficients of the given polynomial, which are a special case of Viète’s formulas. For instance, equating the free terms yields $-1 = (-1)^n(\omega_0\omega_1\cdots\omega_{n-1})$, i.e., $\omega_0\omega_1\cdots\omega_{n-1} = (-1)^{n-1}$.


Proof 4 (Centroid): There’s got to be a geometric proof! Recall that adding complex numbers is essentially the same as adding vectors emanating from the origin. For those of you who know a bit about vectors: in the case of our regular polygon, adding all $\overrightarrow{OA_k}$ results in $\overrightarrow{OO} = \vec{0}$ because the center of mass (the centroid) coincides with the center $O$ of the polygon. ♦

Completion of Problem 6: We can now conclude that the sum of the $n$th roots of unity is 0, i.e., $\sum_{k=0}^{n-1} \omega^k = 0$; taking conjugates, $\sum_{k=0}^{n-1} \overline{\omega}^k = 0$ as well. The sum in (14) is then equal to
\[ \sum_{k=0}^{n-1} |PA_k|^2 = 2nr^2 - r^2q\cdot 0 - r^2\overline{q}\cdot 0 = 2nr^2, \]
which certainly does not depend on the specific $P$ and is thus constant.



9.3. Letting P wander off in the plane. Naturally, we should question the necessity of placing P on the circumcircle of the n-gon, and we should attempt to generalize Problem 6 to any point P in the plane.



Exercise 12. Given a regular $n$-gon $A_0A_1\ldots A_{n-1}$, calculate the sum $|PA_1|^2 + |PA_2|^2 + \cdots + |PA_n|^2$ and determine for which $P$ it is minimal.

Sketch: The proof of Problem 6 goes through here with only one small change. We can write again $P = p = rq$ where $r$ is the circumradius, but $|q|$ is no longer required to be 1. We adjust the calculation accordingly: $q\overline{q} = |q|^2$ and $\omega\overline{\omega} = 1$, so that
\[ \sum_{k=0}^{n-1} |PA_k|^2 = r^2\sum_{k=0}^{n-1} \left(q\overline{q} + (\omega\overline{\omega})^k - q\overline{\omega}^k - \overline{q}\omega^k\right) = r^2|q|^2n + r^2n - r^2q\cdot 0 - r^2\overline{q}\cdot 0 = n(|rq|^2 + r^2) = n(|p|^2 + r^2). \]

We conclude that for any circle $K$ centered at $O$ of radius $|p|$, the given sum depends only on $|p|$ and is therefore constant along $K$. Figure 8b displays four examples of such circles $K$, along each of which the sum of squares remains constant. As the circle $K$ shrinks, the sum also decreases, and its minimal value of $nr^2$ is obtained when $P$ coincides with $O$: $|PA_1|^2 + |PA_2|^2 + \cdots + |PA_n|^2 \geq nr^2$, with equality iff $P = O$.
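The formula $n(|p|^2 + r^2)$ can be sanity-checked with a few lines of Python. This is a sketch under stated assumptions: `sum_of_squares` is a hypothetical helper name, the polygon is placed with $A_k = r\omega^k$ as in the sketch above, and the sample points are arbitrary.

```python
import cmath
import math

def sum_of_squares(n, r, p):
    """Sum of |P A_k|^2 over the vertices A_k = r * omega^k of a regular n-gon."""
    omega = cmath.exp(2j * math.pi / n)
    return sum(abs(p - r * omega ** k) ** 2 for k in range(n))

n, r = 7, 2.0

# P on the circumcircle (Problem 6): the sum should be 2 * n * r^2 = 56.
p_on_circle = r * cmath.exp(0.4j)
print(sum_of_squares(n, r, p_on_circle), 2 * n * r ** 2)

# P anywhere in the plane (Exercise 12): the sum should be n * (|p|^2 + r^2).
p_any = 0.3 - 1.7j
print(sum_of_squares(n, r, p_any), n * (abs(p_any) ** 2 + r ** 2))

# P at the center O gives the minimal value n * r^2.
print(sum_of_squares(n, r, 0), n * r ** 2)
```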



9.4. Summary of PSTs. It is important to record the various ideas which we used in the proof of Lemma 2, since these ideas are ubiquitous.

PST 66. (Partial Viète’s Formulas) Given an equality of polynomials $f(z) = g(z)$, equating the coefficients on both sides for each power $z^k$ yields a relation. In particular, for a polynomial $f(z)$ of degree $n$ with leading coefficient 1 whose roots are $z_1, z_2, \ldots, z_n$ (counted with multiplicities), i.e.,
\[ f(z) = z^n + a_{n-1}z^{n-1} + \cdots + a_1z + a_0 = (z - z_1)(z - z_2)\cdots(z - z_n), \]
the sum of the roots is minus the coefficient of $z^{n-1}$: $z_1 + z_2 + \cdots + z_n = -a_{n-1}$, and the product is $\pm$ the constant term: $z_1z_2\cdots z_n = (-1)^na_0$.




PST 67. (Geometric Series) When calculating a sum $S$, try to identify it with some well-known type of sum. In particular, if each term is the previous term times the same number $z \neq 1$, we can use formula (15) for the so-called geometric series with ratio $z$ and initial term $a$.

PST 68. (Invariant under Multiplication) Given a set of several numbers $S = \{z_1, z_2, \ldots, z_n\}$, suppose that for some complex number $c \neq 1$ multiplication by $c$ rearranges the elements of $S$, i.e., as sets
\[ S = \{z_1, z_2, \ldots, z_n\} = \{cz_1, cz_2, \ldots, cz_n\}. \]
Then the sum $z_1 + z_2 + \cdots + z_n$ is 0. The product $z_1z_2\cdots z_n$ is also 0 if in addition $c$ is not an $n$th root of unity (why?).

10. Which are the “Closest” Lines

In Section 9 we explored a problem involving distances from a point $P$ to the vertices of a regular polygon. Now let’s replace the point $P$ by a line $l$ and ask the same question. We arrive at the problem posed in the beginning of the session, which we reformulate below:

Problem 7. If $A_0A_1\ldots A_{n-1}$ is a regular $n$-gon and $l$ is a line, let $d_k$ be the distance from vertex $A_k$ to $l$ for $k = 0, 1, \ldots, n-1$. Consider the sum of squares of all such distances: $S = d_0^2 + d_1^2 + \cdots + d_{n-1}^2$. For which line(s) $l$ is $S$ minimal and what is this minimal value?

Figure 1 depicted a regular pentagon and a line $l$ and showed the five distances $d_k$ via the perpendiculars from the vertices $A_k$ to $l$. We could proceed as before and set up our C-system so that the $A_k$’s line up with the $n$th roots of unity. The line $l$ will have some equation in this C-system, and we will have to find a formula for the distances from each $A_k$ to $l$. Even though this is possible and not hard (the reader should try it), let us go along a different route which will introduce another problem-solving idea.



10.1. Fixing, unfixing, and adjusting. Contrary to how the problem is phrased, let us first fix the line $l$ as the most “convenient” line in the plane, and then adjust the rest of the C-system to fit $l$. Given a number $z = x + yi \in \mathbb{C}$, two lines seem the most “convenient” for finding the distance $d_z$ from $z$ to $l$:
• If $l$ is the real axis, then $d_z = |y| = |\mathrm{Im}(z)|$.
• If $l$ is the imaginary axis, then $d_z = |x| = |\mathrm{Re}(z)|$.
(Only the squares $d_z^2$ will be used below, so the absolute values will not matter.) Verify the following algebraic formulas which express $x$ and $y$ in terms of $z$.

Lemma 3. For any $z \in \mathbb{C}$, $\mathrm{Re}(z) = x = \dfrac{z + \overline{z}}{2}$ and $\mathrm{Im}(z) = y = \dfrac{z - \overline{z}}{2i}$.

Lemma 3 suggests that it will be slightly easier if we fix $l$ to be the imaginary axis, and then move the polygon to fit this situation. Our final Figure 9d depicts just that: the imaginary axis is rotated and shifted to coincide with line $l$. Meanwhile, the pentagon is also rotated and shifted without changing its relative position to $l$. The vertices of the pentagon may no longer be the 5th roots of unity.

Figure 9. Distances from a line to a pentagon

Instead of jumping directly from Figure 1 to Figure 9d and dealing with all of the complicated calculations at the same time, we shall solve the problem in stages and cope with each computational obstacle as it arises.

10.2. If l happens to be the imaginary axis, then the polygon (as depicted in Fig. 9a) does have its vertices as roots of unity: $A_k = \omega_k$ so that $\omega_k\overline{\omega_k} = 1$ for all $k$. The distances are easily computed by Lemma 3:
\[ d_k^2 = \tfrac{1}{4}(\omega_k + \overline{\omega_k})^2 = \tfrac{1}{4}\left(\omega_k^2 + 2\omega_k\overline{\omega_k} + \overline{\omega_k}^2\right) = \tfrac{1}{2} + \tfrac{1}{4}\left(\omega_k^2 + \overline{\omega_k}^2\right) \]
\[ \Rightarrow\ S = \sum_{k=0}^{n-1} d_k^2 = \tfrac{1}{2}n + \tfrac{1}{4}\sum_{k=0}^{n-1}\left(\omega_k^2 + \overline{\omega_k}^2\right), \tag{16} \]

where we have added all expressions $d_k^2$ for $k = 0, 1, \ldots, n-1$. Recall that the sum of all $n$th roots of unity is 0. We need the sum of their squares:

Lemma 4. For $n \geq 3$, $\sum_{k=0}^{n-1} \omega_k^2 = 0$ and, more generally, $\sum_{k=0}^{n-1} \omega_k^m = 0$ for any integer $m$ not divisible by $n$.
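A short experiment shows when the power sums of Lemma 4 vanish. The Python sketch below is illustrative only: `power_sum` is a hypothetical helper name. For $n = 5$ it prints $\sum_k \omega_k^m$ for a range of exponents $m$; the sum is (numerically) 0 except when $m$ is a multiple of $n$, where every term equals 1 and the sum is $n$.

```python
import cmath
import math

def power_sum(n, m):
    """Sum of the m-th powers of the n-th roots of unity omega_k = omega^k."""
    omega = cmath.exp(2j * math.pi / n)
    return sum(omega ** (k * m) for k in range(n))

n = 5
for m in range(-6, 11):
    # Rounded modulus of the sum: 0 unless n divides m, in which case it is n.
    print(m, round(abs(power_sum(n, m)), 9))
```

The geometric series hint explains both outcomes: the ratio is $\omega^m$, and formula (15) applies exactly when $\omega^m \neq 1$.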

Hint: The necessity for the geometric series approach should be evident here: write out the sums and decide what your ratio and initial term are. ♦

Plugging the result of Lemma 4 into (16) yields the desired sum: $S = \tfrac{1}{2}n$. (Why did the sum of conjugates $\overline{\omega_k}^2$ in (16) disappear too?)

10.3. If l passes through the center O of the polygon, this yields another special case (as depicted in Fig. 9b). The vertices of the polygon may no longer be the original roots of unity, but they are still on the unit circle. If the polygon has been rotated by angle $\theta$ from its original position, its vertices $A_k$ have been correspondingly multiplied by some unit number $u = (1, \theta)$, i.e., $A_k = u\omega_k$ for all $k$, and $u\overline{u} = 1$. The distances $d_k$ are computed similarly by Lemma 3:
\[ d_k^2 = \tfrac{1}{4}(u\omega_k + \overline{u}\,\overline{\omega_k})^2 = \tfrac{1}{4}\left(u^2\omega_k^2 + 2u\overline{u}\,\omega_k\overline{\omega_k} + \overline{u}^2\overline{\omega_k}^2\right) = \tfrac{1}{2} + \tfrac{1}{4}\left(u^2\omega_k^2 + \overline{u}^2\overline{\omega_k}^2\right). \]


The only difference from the previous case is the factors $u^2$ and $\overline{u}^2$, which are “stuck” in front of the squares $\omega_k^2$ and $\overline{\omega_k}^2$. However, since $u^2$ and $\overline{u}^2$ are constants, after summing up everything over $k$, they will factor in front of the sums $\sum \omega_k^2 = 0$ and $\sum \overline{\omega_k}^2 = 0$, yielding:
\[ S = \sum_{k=0}^{n-1} d_k^2 = \tfrac{1}{2}n + \tfrac{1}{4}\Big(u^2\sum_{k=0}^{n-1}\omega_k^2 + \overline{u}^2\sum_{k=0}^{n-1}\overline{\omega_k}^2\Big) = \tfrac{1}{2}n + \tfrac{1}{4}\left(u^2\cdot 0 + \overline{u}^2\cdot 0\right) = \tfrac{1}{2}n. \]
We conclude that as long as the line $l$ passes through the center of the polygon, the angle of rotation $\theta$ about the origin will not matter and the sum will remain the same.

10.4. If l is parallel to the original imaginary axis, the situation is depicted in Figure 9c and is our last special case. It corresponds to translating the polygon horizontally by some (real) number $t$: the vertices $A_k$ may no longer be on the unit circle. Instead, $A_k = \omega_k + t$, with $t = \overline{t}$. Again we calculate the distances by Lemma 3:
\[ d_k^2 = \tfrac{1}{4}\left((\omega_k + t) + \overline{(\omega_k + t)}\right)^2 = \tfrac{1}{4}\left((\omega_k + \overline{\omega_k}) + 2t\right)^2 = \tfrac{1}{4}(\omega_k + \overline{\omega_k})^2 + t(\omega_k + \overline{\omega_k}) + t^2. \]
The first summand $\tfrac{1}{4}(\omega_k + \overline{\omega_k})^2$ appeared in the case of Subsection 10.2 and contributed $\tfrac{1}{2}n$ to the sum $S$. The second term $t(\omega_k + \overline{\omega_k})$ will yield 0 in $S$ (why?), and the last term $t^2$ will contribute $t^2n$. Overall, $S = (\tfrac{1}{2} + t^2)n$.

The term $t^2 \geq 0$ matches our intuition: as the line $l$ recedes from the polygonal center, the sum of the squared distances grows.

10.5. If l is an arbitrary line, in order for $l$ to line up with the imaginary axis (as in Fig. 9d), our polygon needs to be rotated first (as in Fig. 9b) and then to be translated horizontally (as in Fig. 9c). The vertices will be given by $A_k = u\omega_k + t$ where $|u| = 1$ and $t \in \mathbb{R}$. One final time we calculate by Lemma 3:
\[ d_k^2 = \tfrac{1}{4}\left((u\omega_k + t) + \overline{(u\omega_k + t)}\right)^2 = \tfrac{1}{4}\left((u\omega_k + \overline{u}\,\overline{\omega_k}) + 2t\right)^2 = \tfrac{1}{4}(u\omega_k + \overline{u}\,\overline{\omega_k})^2 + t(u\omega_k + \overline{u}\,\overline{\omega_k}) + t^2. \]
Sum over $k$ and combine the previous cases (fill in the details!) to get
\[ S = \tfrac{1}{2}n + t(u\cdot 0 + \overline{u}\cdot 0) + t^2n = (\tfrac{1}{2} + t^2)n. \]



10.6. The “closest lines”: conclusions and a look ahead. The actual final answer should be adjusted to reflect the fact that the circumradius of our polygon may be some (real) $r \neq 1$: if $A_k = ru\omega_k + t$, then $S = (\tfrac{1}{2}r^2 + t^2)n$. Note that the real number $t$ measures the distance from the line $l$ to the center of the polygon. Thus, the sum is minimal when $t = 0$, i.e., the “closest” lines $l$ pass through the center $O$ and yield a minimal sum $S = \tfrac{1}{2}r^2n$.
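The final formula $S = (\tfrac{1}{2}r^2 + t^2)n$ can be tested for a polygon and a line in general position. The sketch below is an illustration under assumptions that are not in the original text: the line is described by a point `a` on it and a direction angle `phi`, the distance from a point $P$ to the line is computed as $|\mathrm{Im}((P - a)\overline{d})|$ with $d = (1, \varphi)$, and the helper names are hypothetical.

```python
import cmath
import math

def signed_distance(point, a, phi):
    """Signed distance from point to the line through a with direction angle phi."""
    d = cmath.exp(1j * phi)                    # unit vector along the line
    return ((point - a) * d.conjugate()).imag

def squared_distance_sum(n, r, center, rotation, a, phi):
    """Sum of squared distances from the vertices of a regular n-gon to the line."""
    omega = cmath.exp(2j * math.pi / n)
    u = cmath.exp(1j * rotation)               # unit number rotating the polygon
    vertices = [center + r * u * omega ** k for k in range(n)]
    return sum(signed_distance(vertex, a, phi) ** 2 for vertex in vertices)

n, r = 5, 2.0
center, rotation = 1 + 1j, 0.3                 # arbitrary placement of the pentagon
a, phi = -2 + 0.5j, 1.1                        # arbitrary line

t = abs(signed_distance(center, a, phi))       # distance from the center to the line
S = squared_distance_sum(n, r, center, rotation, a, phi)
print(S, n * (0.5 * r ** 2 + t ** 2))          # the two numbers should agree

# A line through the center (t = 0) gives the minimal value n * r^2 / 2.
S_min = squared_distance_sum(n, r, center, rotation, center, phi)
print(S_min, n * r ** 2 / 2)
```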


It is curious that the answers for this Problem 7 and for the previous Problem 6 (as generalized in Exercise 12) are in some sense identical: the minimal sums are obtained when the line $l$ and the point $P$ are incident with $O$: $l$ passes through $O$ or $P = O$. Is there a deeper reason, beyond our calculations, for such a “coincidence” of answers?

To nudge you in one possible direction, here is a related problem in 3 dimensions. It is suggested by one of the giants of contemporary mathematics, the Russian Vladimir Arnol’d, in his Trivium Mathematique [7], a collection of 100 problems that he expects every well educated mathematician should be able to solve.

Problem 8. Given a cube, let $l$ be a line through its center. Consider the sum of the squares of distances from each of its vertices to $l$. For which such line $l$ is this sum minimal? How about replacing the cube by other Platonic solids?⁶

For further discussion of C and more similar examples check out the books by Hahn, Needham, Schwerdtfeger, and Yaglom [35],[60],[72],[86], and look for the next two sessions on complex numbers. Part III will round up the theoretical discussion of C by applying the Fundamental Theorem of Algebra to real polynomials, while Part IV will apply C-techniques to solving, as promised, Olympiad-type geometry problems from around the world.

6 The Platonic solids, also referred to as regular polyhedra, are convex polyhedra whose faces are congruent convex regular polygons. There are exactly five such solids: the tetrahedron, cube, octahedron, dodecahedron, and icosahedron. Check out the website mathworld.wolfram.com/PlatonicSolid.html.

11. Hints and Solutions to Selected Problems

Exercise 1. In Cartesian form $w = a + ib$, so that $w\overline{w} = (a + bi)(a - bi) = a^2 - (bi)^2 = a^2 + b^2 = |w|^2$. Along the way, we remember the basic definition of $i$: $i^2 = -1$.

Exercise 2(c). Since $\overline{7 + v} \neq 7 - v$, be careful when conjugating the denominator: $\overline{7 + v} = \overline{7 + 2 + 3i} = 9 - 3i$. The final answer is $\tfrac{6}{15} - \tfrac{7}{15}i$. ♦

Exercise 2(d). By formula (2), we arrive at the unsightly expression:
\[ \frac{a + bi}{c + di} = \frac{(a + bi)(c - di)}{(c + di)(c - di)} = \frac{(ac + bd) + i(bc - ad)}{c^2 + d^2} = \frac{ac + bd}{c^2 + d^2} + \frac{bc - ad}{c^2 + d^2}\,i. \]
♦

Corollary 1. Instead of the tedious task of dividing (and possibly using the above uninspiring formula), let’s “cheat”: we already know that division in C is well-defined for $w \neq 0$ and yields a unique answer. (Do we know this? Explain why.) In other words, we know that $z/w$ is some complex $q$ such that $qw = z$. But Corollary 1 gives in polar form one candidate for $q$. So it remains only to show that this candidate behaves as a true ratio should. We multiply it by $w$ in hopes of getting $z$:
\[ \left(\tfrac{|z|}{|w|}, \theta - \mu\right)\cdot(|w|, \mu) = \left(\tfrac{|z|}{|w|}\cdot|w|, (\theta - \mu) + \mu\right) = (|z|, \theta) = z. \]
Along the way, we used (1) from Section 2 in order to multiply in polar form. We conclude that $\left(\tfrac{|z|}{|w|}, \theta - \mu\right)$ indeed equals the ratio $z/w$.

Exercise 3. The identities follow directly from Corollary 1 in polar form; e.g.,
\[ \overline{\left(\frac{z}{w}\right)} = \overline{\left(\tfrac{|z|}{|w|}, \theta - \mu\right)} = \left(\tfrac{|z|}{|w|}, -\theta + \mu\right) = \left(\tfrac{|z|}{|w|}, -\theta - (-\mu)\right) = \frac{(|z|, -\theta)}{(|w|, -\mu)} = \frac{\overline{z}}{\overline{w}}. \]

Justify all steps. Why did we do the “strange” double-negation “−(−μ)”? ♦

Exercise 4. In part (a), |z + w| “loses” by a lot to |z| + |w|:

\[ |z+w| = |8-8i| = 8|1-i| = 8\sqrt{2} = \sqrt{128} < \sqrt{324} = 18 = 5+13 = |z|+|w|. \]

At the same time, |z − w| “wins” by a lot to |z| − |w|:

\[ |z-w| = |-2+16i| = 2\sqrt{65} > -8 = 5-13 = |z|-|w|. \]

This trend continues elsewhere, except that for w = 6 + 8i we have a tie: |z + w| = |9 + 12i| = 15 = 5 + 10 = |z| + |w|, and for $w = 1 + \tfrac{4}{3}i$ there are two ties: |z + w| = |z| + |w| and |z − w| = |z| − |w|. Note that all ties arise when the numbers are rescales of each other: w = 2z in (b)’s tie, and z = 3w in (c)’s ties. ♦

Corollary 3. By the ordinary Triangle Inequality applied to (z − w) and w, equality is obtained in (4) iff (z − w) = cw for some real c ≥ 0, or w = 0. Translating, z = (1 + c)w = dw for some real d ≥ 1, or w = 0. No wonder in Exercise 4 we got the only tie of the form |z − w| = |z| − |w| in part (c) when z = 3w.

Problem 1.

\[
(1-i)^{2004}
= \left[\sqrt{2}\left(\tfrac{1}{\sqrt{2}} - i\tfrac{1}{\sqrt{2}}\right)\right]^{2004}
= \left[\sqrt{2}\left(\cos\left(-\tfrac{\pi}{4}\right) + i\sin\left(-\tfrac{\pi}{4}\right)\right)\right]^{2004}
= (\sqrt{2})^{2004}\left(\cos\left(-\tfrac{2004\pi}{4}\right) + i\sin\left(-\tfrac{2004\pi}{4}\right)\right)
= 2^{1002}\left(\cos(-501\pi) + i\sin(-501\pi)\right).
\]

Since cos(−501π) = −1 and sin(−501π) = 0, the final result is $-2^{1002}$.

Problem 3(b). $\theta = \tfrac{\pi}{8} + \tfrac{k\pi}{4}$, and hence $\tfrac{b}{a} = \tan\left(\tfrac{\pi}{8} + \tfrac{k\pi}{4}\right)$ for 0 ≤ k ≤ 7. ♦

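Exercise 6(a). If $w = \sqrt{\left(4, \tfrac{2\pi}{3}\right)}$, then $w = \left(\sqrt{4},\ \tfrac{1}{2}\left(\tfrac{2\pi}{3} + 2k\pi\right)\right) = \left(2,\ \tfrac{\pi}{3} + k\pi\right)$ for k = 0, 1. Check that these two solutions work and draw relevant pictures. ♦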
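The value obtained in Problem 1 above can be double-checked with exact integer arithmetic. The following is only a sketch: it stores a complex number a + bi with integer parts as a pair (a, b), and the helper names `gauss_mul` and `gauss_pow` are made up for this illustration.

```python
def gauss_mul(p, q):
    """Multiply p = a + bi and q = c + di, each stored as an integer pair."""
    a, b = p
    c, d = q
    return (a * c - b * d, a * d + b * c)


def gauss_pow(z, n):
    """Raise z to the power n >= 0 by repeated squaring (exact integers)."""
    result = (1, 0)
    base = z
    while n > 0:
        if n & 1:
            result = gauss_mul(result, base)
        base = gauss_mul(base, base)
        n >>= 1
    return result


real_part, imag_part = gauss_pow((1, -1), 2004)
# Compare with the polar-form answer: real part -2^1002, imaginary part 0.
print(real_part == -(2 ** 1002), imag_part == 0)
```

Because Python integers have unlimited precision, no rounding is involved, unlike a floating-point evaluation of the polar form.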

Exercise 11. For the first part, assuming that the polygon is regular, we can simply plug the formulas $A_k = v\omega^k + z$ into the given fractions and verify that we’ll get ω:

\[
\frac{A_{k+2}-A_{k+1}}{A_{k+1}-A_k}
= \frac{(v\omega^{k+2}+z)-(v\omega^{k+1}+z)}{(v\omega^{k+1}+z)-(v\omega^{k}+z)}
= \frac{\omega^{k+1}(\omega-1)}{\omega^{k}(\omega-1)} = \omega.
\]


8. ROOTS OF UNITY IN GEOMETRY

We are not done yet! Conversely, assuming that

\[ \frac{A_{k+2}-A_{k+1}}{A_{k+1}-A_k} = \omega \quad\text{for } k = 0, 1, \ldots, n-3, \]

we must show that the polygon is regular. For the reader not versed in indices and sequences, this may seem like a formidable task. To make things more accessible, let’s assume n = 5 for the time being. We have three equations:

\[ \frac{A_2-A_1}{A_1-A_0} = \omega, \qquad \frac{A_3-A_2}{A_2-A_1} = \omega, \qquad \frac{A_4-A_3}{A_3-A_2} = \omega. \tag{17} \]

Figure 10. Building a Regular Polygon from Equations (17)

We must show that $A_0A_1A_2A_3A_4$ is regular. Instead of doing this algebraically, let’s try a geometric argument. Rewrite the first equation as

\[ A_2 - A_1 = (-\omega)(A_0 - A_1). \tag{18} \]

Since ω corresponds to angle $\tfrac{2\pi}{n}$ on the unit circle, −ω corresponds to $-\left(\pi - \tfrac{2\pi}{n}\right) = -\tfrac{n-2}{n}\pi = -\tfrac{3}{5}\pi$ (cf. Fig. 10a). Equation (18) then means that segment $A_0A_1$ goes to segment $A_2A_1$ via rotation about $A_1$ at angle $\tfrac{3}{5}\pi$ clockwise, i.e., sides $A_0A_1$ and $A_2A_1$ have the same length, and $\angle A_0A_1A_2 = \tfrac{3}{5}\pi$ when traversed clockwise. Excellent: all of the interior angles of a regular pentagon are equal to $\tfrac{3}{5}\pi$ (why?). Reasoning in a similar fashion with the other two equations in (17), we can build the pentagon $A_0A_1A_2A_3A_4$ in three stages, as shown in Figure 10. The reader should explain why in Figure 10c we have arrived at a regular polygon.

Session 9

Introduction to Inequalities. Part I
Arithmetic, Geometric, and Power Means

based on Bjorn Poonen’s session

Sneak Preview. When your teacher calculates the average of your exam scores, she usually adds up all scores and divides by the number of exams. But what if instead she multiplies the n scores and takes the nth root of that, or adds up the squares of the scores, divides by n, and takes the square root of that? If all exam scores are equal, these three ways will yield the same average; but if even two exam scores differ, the results will all be different. Which method yields the highest average? And what if your teacher weights the exams unequally? This session will answer these and more questions from the realm of inequalities. Some problems will invoke geometry, combinatorics, or calculus, and can be skimmed on a first reading. Part II will tackle other fundamental inequalities.

1. The Language of Inequalities

“For a real number t, one has $t^2 \ge 0$, with equality if and only if t = 0.”

What does a statement like this mean? It seems it can mean only one thing: that “the squares of real numbers are non-negative.” Actually, it is saying three things. The main part of the statement is that
(1) if t is a real number, then $t^2 \ge 0$.
But the last phrase “with equality if and only if t = 0” adds two more things:
(2) if t = 0, then $t^2 = 0$; and
(3) if $t^2 = 0$, then t = 0. (Thus if t ≠ 0, then $t^2 > 0$.)

This simple example shows that there is an interplay between the language of equalities and the language of inequalities, and that often statements of inequalities may be saying “more” than what can be seen on the surface. As we go through this session, we will introduce further terminology related to inequalities and pay close attention to the specific language used so as to interpret and use it correctly.


9. INEQUALITIES I

2. Arithmetic Mean – Geometric Mean Inequality

2.1. Gardening With Baby AM-GM. The most basic arithmetic mean–geometric mean inequality involves only two variables:

Lemma 1. (Baby AM-GM) If x and y are non-negative real numbers, then $\frac{x+y}{2} \ge \sqrt{xy}$, with equality if and only if x = y.

Again, the last phrase “with equality. . . ” means two things:
(1) if x = y ≥ 0, then $\frac{x+y}{2} = \sqrt{xy}$ (obvious; check it!); and conversely,
(2) if $\frac{x+y}{2} = \sqrt{xy}$ for some x, y ≥ 0, then x = y.

Figure 1a depicts the relative positions of the two means when 0 < x < y. The name “AM” obviously comes from applying arithmetic operations to x and y to obtain the arithmetic mean $\frac{x+y}{2}$. The name “GM” might come from the geometric problem of constructing a square with the same area as a given rectangle (cf. Fig. 1b), or it might come from its geometric solution in Figure 1c, where two segments of lengths x and y make the diameter AB of a circle and the length of the (dashed) perpendicular CD to that diameter turns out to be the geometric mean $\sqrt{xy}$. The baby AM-GM inequality itself can be visualized using the shaded right triangle OCD, where the hypotenuse is “AM” (also equal to the radius), while the (dashed) leg is “GM”. The proofs of these facts can be found in the plane geometry interlude at the end of the session.

Figure 1. Arithmetic and geometric means when 0 < x < y

Hint: Using some algebra instead, deduce the baby AM-GM from the inequality in the opening quotation of Section 1 by letting $t = \sqrt{x} - \sqrt{y}$. For x ≠ y, explain why $\frac{x+y}{2} > \sqrt{xy}$, an example of a strict inequality. ♦

2.2. Which is the perfect garden? Inequalities such as the AM-GM inequality often give a quick way to solve optimization problems.

PST 69. Knowing if and when an inequality becomes an equality is usually the key to finding extreme values, e.g., baby AM-GM roughly implies that x = y (> 0) makes the sum x + y minimal and the product xy maximal.

Problem 1. A rectangular garden is to be constructed using 20 meters of fence for three of the sides, and using an existing long wall for the fourth side. What is the maximum possible area that can be enclosed?


Solution: Let x be the length of the side along the wall (cf. Fig. 2), and let y be the length of each side adjacent to this side (in meters). We must find when the area A = xy is maximal for positive x and y such that the fence length x + 2y is at most 20. In formal language, we must maximize xy subject to the constraints x, y > 0 and x + 2y ≤ 20.

Figure 2. Two gardens along a wall

 

Since we see the sum x + 2y, we apply AM-GM to x and 2y (both > 0):

\[ \sqrt{x\cdot 2y}\ \overset{\text{AM-GM}}{\le}\ \frac{x+2y}{2}\ \le\ \frac{20}{2} = 10. \tag{1} \]

Squaring both sides of $\sqrt{2xy} \le 10$ and dividing by 2 yields xy ≤ 50. To finish the problem, we must show that xy = 50 is possible (if not, the maximum area would be < 50). To obtain xy = 50 in the end, we must have an equality at each step of (1). This happens when x = 2y (by the equality criterion in AM-GM) and x + 2y = 20. Solving this system yields x = 10 and y = 5, so these are the only allowable values of x and y that might make the equality xy = 50 hold, and they do. We conclude that the maximum possible area is 50 square meters, and this is attained if and only if the rectangle is 10 meters by 5 meters, with the long side against the wall.

2.3. PSTs everywhere! In our garden problem, we used several PSTs: some pertained to solving optimization problems in general, and others to applying specific techniques when dealing with inequalities and AM-GM.

PST 70. In real-life problems asking to find an extreme value of something (such as the area of a garden), follow the general scheme:
(a) Assign variables to the unknown quantities (e.g., length x and width y) and note any natural restrictions on them (x, y > 0), as well as constraints given by the problem (e.g., x + 2y ≤ 20).
(b) Use these variables to construct the function you need to optimize (e.g., the area function f(x, y) = xy).
(c) Translate the problem into a formal mathematical statement of optimizing the value of the function subject to the constraints (e.g., maximize f(x, y) = xy subject to x, y > 0 and x + 2y ≤ 20).
(d) Solve the mathematical problem in (c) using whatever methods are necessary (e.g., AM-GM, calculus techniques, monovariants, etc.).
(e) Translate your answer back into the original problem, ensuring that it works (e.g., x = 10 and y = 5 satisfy x, y > 0 and x + 2y ≤ 20).
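The scheme in PST 70 can be traced numerically on the garden problem. The sketch below is an illustration only, not part of the solution: it assumes that all of the fence is used (x + 2y equals the fence length), scans a grid of widths y with exact fractions, and reports the best garden found. The function name `best_garden` and the grid size are choices made for this example.

```python
from fractions import Fraction


def best_garden(fence=20, steps=200):
    """Scan widths y on a grid, use all the fence (x = fence - 2y), keep the largest area.

    Returns a tuple (area, x, y) of exact fractions.
    """
    best = (Fraction(0), Fraction(0), Fraction(0))
    for k in range(1, steps):
        y = Fraction(fence * k, 2 * steps)   # 0 < y < fence / 2
        x = fence - 2 * y                    # steps (a)-(c): constraint x + 2y = fence
        area = x * y                         # the function being maximized
        if area > best[0]:
            best = (area, x, y)
    return best


area, x, y = best_garden()
print(area, x, y)
# Step (e): compare with the AM-GM bound xy <= fence^2 / 8 and the criterion x = 2y.
print(area <= Fraction(20 ** 2, 8), x == 2 * y)
```

A grid search can only suggest the optimum; the AM-GM argument is what proves that no other rectangle does better.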

PST 71. When applying AM-GM to solve an optimization problem, these ideas might come in handy:
(a) A sum with positive terms can be bounded below by applying AM-GM, provided you use its summands as the variables in AM-GM (e.g., $x + y \ge 2\sqrt{xy}$, but also $x + 2y \ge 2\sqrt{x(2y)}$).
(b) A product with positive terms can be bounded above by applying AM-GM, provided you use its factors as the variables in AM-GM (e.g., $xy \le \left(\frac{x+y}{2}\right)^2$, but also $x(2y) \le \left(\frac{x+2y}{2}\right)^2$).
(c) After setting up a chain of inequalities, the optimal value is usually obtained when the beginning and the end are equal, forcing all inequalities in between to become equalities. In particular, the variables used in AM-GM will be equal (e.g., x = y or x = 2y).
(d) In the end, check that the values resulting from equality in AM-GM do yield equalities everywhere else in your chain of inequalities; otherwise, you will need to modify your solution.

To make sure you understand the above PSTs, verify that:

Exercise 1. In the setting of Problem 1, if the fence length is changed to something other than 20 m, the previous solution will essentially go through, still yielding an optimal garden with length twice as long as its width.

However, some modifications are in order to address the next exercise.

Exercise 2. You have bought enough flowers to plant a rectangular area of 50 m² along the wall, but the fence is very expensive. Find the dimensions of the garden that will cost you least in terms of the fence. What if, instead of flowers, you have already purchased a fence of length 20 m: which rectangular garden will have the largest area, assuming the fence goes all the way around the garden (no wall)?

2.4. More variables, more challenge for baby AM-GM! The garden problem can also be solved using calculus, which gives another approach to many inequality problems. Instead, we did it in detail with the baby AM-GM because the same reasoning can be used in more complicated problems. Try the two exercises below: despite their “multivariable” appearance, clever applications of baby AM-GM for two variables at a time are all you need!

Exercise 3. Prove that for any a, b, c > 0, (a + b)(b + c)(c + a) ≥ 8abc, and determine when equality holds.

Exercise 4. Prove that $n! < \left(\frac{n+1}{2}\right)^n$ for all integers n > 1.

Hints: In both exercises, baby AM-GM applies nicely to pairs of numbers. In the former exercise there isn’t much of a choice for the pairs, whereas in the latter exercise you have to be careful to pair up the “right” numbers according to their sum. ♦
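Before hunting for the right pairs, it can be reassuring to test both claims on small cases. The following sketch does only that: it evaluates both sides exactly for small positive integers, which is evidence and not a proof. The helper names `ex3_gap` and `ex4_holds` are invented for this illustration.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def ex3_gap(a, b, c):
    """Left side minus right side of (a+b)(b+c)(c+a) >= 8abc."""
    return (a + b) * (b + c) * (c + a) - 8 * a * b * c


def ex4_holds(n):
    """Check n! < ((n+1)/2)^n with exact rational arithmetic."""
    return factorial(n) < Fraction(n + 1, 2) ** n


# Exercise 3 on all triples of integers from 1 to 7.
print(all(ex3_gap(a, b, c) >= 0 for a, b, c in product(range(1, 8), repeat=3)))
# Exercise 4 for n = 2, ..., 50.
print(all(ex4_holds(n) for n in range(2, 51)))
```

Printing the triples where `ex3_gap` returns 0 is a quick way to form a conjecture about the equality case before proving it.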


2.5. Need more strength. Some problems with more variables cannot be conquered by a repetitive application of baby AM-GM. And hence, we formulate the general version of AM-GM for any number of variables:

Theorem 1. (AM-GM) If $x_1, x_2, \ldots, x_n \ge 0$, then

\[ \frac{x_1 + x_2 + \cdots + x_n}{n} \ge \sqrt[n]{x_1 x_2 \cdots x_n}, \]

with equality if and only if $x_1 = x_2 = \cdots = x_n$.

Theorem 1 and other fundamental inequalities of n variables will be proven later in Monovariants III. In this session, we assume them and show how to use them in problems, along with other PSTs. To start off,

PST 72. Often the key to using AM-GM is to identify a sum in the problem and to make it the numerator of an arithmetic mean.

You will need to manipulate algebraically both sides of the following inequalities before you can identify which sum to plug into AM-GM:

Exercise 5. Prove $2\sqrt{x} \ge 3 - \frac{1}{x}$ for x > 0.

Exercise 6. Prove that if a ≥ b ≥ 0 and n ≥ 1 is an integer, then $a^n - b^n \ge n(a-b)(ab)^{(n-1)/2}$.

Hints: Write $2\sqrt{x}$ as $\sqrt{x} + \sqrt{x}$, or factor a − b out of the LHS. Then apply AM-GM for 3 or for n variables. If equality is attainable, find out when. ♦

Exercise 7. Let E be the ellipsoid $\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$ for some a, b, c > 0. Find, in terms of a, b, and c, the volume of the largest rectangular box that can fit inside E, with faces parallel to the coordinate planes (cf. Fig. 3a).

Figure 3. Box in ellipsoid and Rectangle in ellipse

Hint: x = y = z is not necessarily useful, but some rescales of these variables are. Try also the two-dimensional version of the problem asking for a rectangle of largest area inside an ellipse (cf. Fig. 3b). ♦

PST 73. If an inequality becomes an equality for certain values $a_i$ of the variables $x_i$, it is natural to apply the AM-GM inequality precisely to the corresponding rescaled quantities $x_i/a_i$ that can equal each other.


Problem 2. Let $g = \sqrt[n]{a_1 a_2 \cdots a_n}$ be the geometric mean of the numbers $a_1, a_2, \ldots, a_n > 0$. Prove that

\[ (1 + a_1)(1 + a_2) \cdots (1 + a_n) \ge (1 + g)^n. \]

Hint: Following PST 73, equality holds if $a_1 = a_2 = \cdots = a_n$, so one should apply AM-GM only to terms that are equal when $a_1 = a_2 = \cdots = a_n$. This suggests expanding both sides of the inequality to be proved and grouping terms according to total degree. Some combinatorics will be needed here. ♦

3. Power Mean Inequality

3.1. Are there any other means? The arithmetic and geometric means are certainly not the only ways to assign an “average” to several numbers.

Definition 1. Fix $x_1, x_2, \ldots, x_n \ge 0$. For r ≠ 0, the rth power mean $P_r$ of $x_1, x_2, \ldots, x_n$ is the rth root of the average of the rth powers of the $x_i$’s:

\[ P_r = \left(\frac{x_1^r + x_2^r + \cdots + x_n^r}{n}\right)^{1/r}. \]

To avoid inverting 0s, assume r > 0 if some $x_i$ is 0.

Even though the formula yields nonsense if r = 0, there is a natural way to define $P_0$ too: simply let it be the geometric mean,¹ i.e., $P_0 = \sqrt[n]{x_1 x_2 \cdots x_n}$. At the other extreme, what happens when r is very large? If one of the $x_i$’s, say $x_m$, is larger than all the others, then $x_m^r$ will be much larger than the rth powers of the others, so much larger that $P_r \approx x_m$. Hence, we define:

\[ P_\infty = \max\{x_1, \ldots, x_n\}, \quad\text{and similarly,}\quad P_{-\infty} = \min\{x_1, \ldots, x_n\}. \]

Below are three famous examples of power means:

\[ P_1 = \frac{x_1 + \cdots + x_n}{n}, \qquad P_2 = \sqrt{\frac{x_1^2 + \cdots + x_n^2}{n}}, \qquad\text{and}\qquad P_{-1} = \frac{n}{\frac{1}{x_1} + \cdots + \frac{1}{x_n}}. \]

Here $P_1$ is just the arithmetic mean, $P_2$ is sometimes called the root mean square, and $P_{-1}$ (defined only for $x_1, \ldots, x_n > 0$) is the harmonic mean (HM).
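Definition 1, together with the conventions for $P_0$, $P_\infty$ and $P_{-\infty}$, translates almost line by line into code. The function below is a minimal floating-point sketch; the name `power_mean` and the choice to raise an error for negative r when some $x_i$ is 0 are assumptions of this illustration, not part of the text.

```python
import math


def power_mean(xs, r):
    """r-th power mean of non-negative numbers xs (floating point)."""
    n = len(xs)
    if r == math.inf:
        return max(xs)                      # convention for P_infinity
    if r == -math.inf:
        return min(xs)                      # convention for P_(-infinity)
    if r == 0:
        if min(xs) == 0:
            return 0.0                      # geometric mean with a zero factor
        return math.exp(sum(math.log(x) for x in xs) / n)  # geometric mean
    if r < 0 and min(xs) == 0:
        raise ValueError("negative r requires all x_i > 0")
    return (sum(x ** r for x in xs) / n) ** (1 / r)


xs = [1, 2, 4]
for r in (-math.inf, -1, 0, 1, 2, math.inf):
    print(r, power_mean(xs, r))
```

For the sample list the printed means come out in non-decreasing order as r grows, which is the pattern the Power Mean Inequality of the next subsection makes precise.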

3.2. What is the relation between all power means? Briefly, the larger the power, the larger the mean:

Theorem 2. (Power Mean Inequality) Let $x_1, x_2, \ldots, x_n \ge 0$. Suppose that r > s (and s ≥ 0 if some $x_i$ is 0). Then $P_r \ge P_s$, with equality if and only if $x_1 = x_2 = \cdots = x_n$.

The power mean inequality (PM) holds even if r = ∞ or s = −∞, provided that we use the definitions of $P_\infty$ and $P_{-\infty}$ above, and the convention that ∞ > r > −∞ for all numbers r. Here are three important special cases of the PM inequality, including our previous AM-GM:

Corollary 1. (AM-GM-HM Inequalities) $P_1 \ge P_0 \ge P_{-1}$.

If you are seeing these inequalities for the first time, you should write them out so as to recognize them more easily in practice.

Exercise 8. Prove that the sum of the legs of a right triangle never exceeds $\sqrt{2}$ times the hypotenuse of the triangle.

Hint: Although PM or AM-GM ($P_1 \ge P_0$) are powerful tools in these exercises, can you get by with just the fact that squares are non-negative? ♦

¹ The definitions of $P_0$, $P_\infty$, and $P_{-\infty}$ are explained in Section 3.3.



Exercise 9. Among all planes passing through a fixed point (a, b, c) with a, b, c > 0 and meeting the positive parts of the three coordinate axes, find the one such that the tetrahedron bounded by it and the coordinate planes has minimal volume.

Hint: For r, s, t > 0 what is the equation of the plane through (r, 0, 0), (0, s, 0), and (0, 0, t)? Try also the two-dimensional version of the problem. ♦

3.3. Limits justify our choices. The discussion below explains the definitions for the power means $P_0$, $P_\infty$, and $P_{-\infty}$. If you do not know limits well, you can skip this on a first reading, without hurting your understanding of inequalities. The die-hards can still find the necessary background material in a real analysis [69] or an advanced calculus textbook.

Let’s start with $P_0$. The reason for the convention $P_0 = \sqrt[n]{x_1 x_2 \cdots x_n}$ is that when r is very small but nonzero the value of $P_r$ is very close to the geometric mean, and it can be made as close as desired by taking r sufficiently close to 0. In the language of limits,

Lemma 2. $\lim_{r \to 0} P_r = \sqrt[n]{x_1 x_2 \cdots x_n}$.

Another way of saying this is that the only choice for $P_0$ that makes $P_r$ depend continuously on r is the geometric mean: $\lim_{r \to 0} P_r = P_0$.

Hint: l’Hôpital’s Rule and properties of ln x will be needed in the proof. ♦

Let us now explain why we defined $P_\infty$ as we did. Let $x_m$ be the largest of the $x_i$’s. Then $0 \le x_i \le x_m$ for all i. Hence, for r > 0,

\[ \frac{x_m^r}{n} \le \frac{x_1^r + \cdots + x_m^r + \cdots + x_n^r}{n} \le \frac{n x_m^r}{n} = x_m^r, \quad\text{so}\quad \frac{x_m}{\sqrt[r]{n}} \le P_r \le x_m. \]

But $\lim_{r \to \infty} \sqrt[r]{n} = \lim_{r \to \infty} n^{1/r} = n^0 = 1$, so by the Sandwich (Squeeze) Theorem

\[ \lim_{r \to \infty} P_r = x_m = \max\{x_1, \ldots, x_n\}. \]

This motivates the definition $P_\infty = \max\{x_1, \ldots, x_n\}$. See if you can modify this proof to explain the choice for $P_{-\infty}$:

Lemma 3. $\lim_{r \to -\infty} P_r = \min\{x_1, x_2, \ldots, x_n\}$.
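The three limits can be observed numerically. The sketch below is an illustration under stated assumptions: all $x_i$ are positive, r ≠ 0, and the mean is rescaled by the largest (or smallest) entry before taking powers, a standard trick to avoid floating-point overflow for large |r|. The function name `power_mean` is chosen for this example.

```python
import math


def power_mean(xs, r):
    """P_r for positive xs and r != 0, rescaled to avoid overflow."""
    scale = max(xs) if r > 0 else min(xs)
    # (x / scale) ** r lies in [0, 1] for every x, so nothing overflows.
    return scale * (sum((x / scale) ** r for x in xs) / len(xs)) ** (1 / r)


xs = [1.0, 2.0, 4.0]
geometric_mean = math.prod(xs) ** (1 / len(xs))

print(power_mean(xs, 1e-6), geometric_mean)   # r near 0: close to the geometric mean
print(power_mean(xs, 1000.0), max(xs))        # large r: close to the maximum
print(power_mean(xs, -1000.0), min(xs))       # very negative r: close to the minimum

# The sandwich x_m / n^(1/r) <= P_r <= x_m from the text, at r = 1000:
lower = max(xs) / len(xs) ** (1 / 1000.0)
print(lower, power_mean(xs, 1000.0), max(xs))
```

Rescaling does not change the value of $P_r$, since the factor `scale` pulled out of the sum is multiplied back after the root is taken.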


4. The Land of the Convex

The power mean $P_r$ provided a large generalization of AM and GM. Still, we don’t know yet why the infinitely many inequalities among the $P_r$’s are true! The notion of convexity will allow us to further generalize the power means and explain all inequalities encountered so far in one fell swoop.

4.1. What is a convex function? Briefly, a function f is convex if for every two points A and B on the graph of f, the line segment AB lies above the part of the graph between A and B (cf. Fig. 4a). More formally,

Definition 2. (Geometric convexity) A function f(x) is convex if for any real numbers a and b with a < b, each point D = (c, d) on the line segment joining A = (a, f(a)) and B = (b, f(b)) lies above or at the point C = (c, f(c)) on the graph of f with the same x-coordinate as D (cf. Fig. 4a).

Figure 4. Graphs of $x^2$, $x^3$, and sin x

The x-value c is a fraction λ of the way from a to b for some λ ∈ [0, 1], i.e., c − a = λ(b − a), and hence c = (1 − λ)a + λb. This yields the height of point C on the graph: $f(c) = f\big((1 - \lambda)a + \lambda b\big)$. At the same time, the height of point D on the line segment AB is (1 − λ)f(a) + λf(b); indeed, this is the same (linear) combination of the heights f(a) and f(b) of A and B, coming from the right trapezoid “abBA” (for a proof, see the plane geometry interlude). The condition for convexity, that the height of C is at most the height of D, can be expressed algebraically as follows:

Definition 2′. (Algebraic convexity) A function f(x) is convex if

\[ f\big((1 - \lambda)a + \lambda b\big) \le (1 - \lambda)f(a) + \lambda f(b) \tag{2} \]

whenever a < b and λ ∈ [0, 1].

Those who know what a convex set in geometry is can interpret the condition as saying that the set S = {(x, y) : y ≥ f(x)} of points above the graph of f is a convex set, i.e., the segment connecting any two points in S is entirely in S. Loosely speaking, this will hold if the graph of f curves in the shape of a smile instead of a frown. For example, the function $f(x) = x^2$ is convex (cf. Fig. 4a), and so is $f(x) = x^n$ for any positive even integer n.


One can also speak of a function f(x) being convex on an interval I. This means that the condition (2) above holds at least when a, b ∈ I (and a < b and λ ∈ [0, 1]). In Figure 4b-c, for instance, one can observe that $f(x) = x^3$ is convex on [0, ∞), and that f(x) = sin x is convex on [−π, 0]. Finally, one says that a function f(x) on an interval I is strictly convex if

\[ f\big((1 - \lambda)a + \lambda b\big) < (1 - \lambda)f(a) + \lambda f(b) \]

whenever a, b ∈ I and a < b and λ ∈ (0, 1). In other words, the line segment connecting two points on the graph of f should lie entirely above the graph of f, except where it touches at its endpoints. Thus, for example, while a linear function ax + b and a quadratic function $ax^2 + bx + c$ (with a > 0) are both convex everywhere, only the quadratic one is strictly convex (why?).

4.2. The “convex hall” of fame. For convenience, here is a brief list of some frequently encountered convex functions:
• $x^{2k}$ on all of R;
• $-x^r$ on [0, ∞), if r ∈ [0, 1];
• −ln x on (0, ∞);
• −cos x on [−π/2, π/2];
• $e^x$ on all of R;
• $x^r$ on [0, ∞), if r ≥ 1;
• $x^r$ on (0, ∞), if r ≤ 0;
• −sin x on [0, π];
• tan x on [0, π/2);
• $\frac{r}{s+x}$ on (−s, ∞), if r > 0.

In these, k represents a positive integer, r, s represent real constants, and x is the variable. In fact, all of these are strictly convex on the interval given, except for $x^r$ and $-x^r$ when r is 0 or 1.

Exercise 10. Draw the graphs of the functions above and explain why they are convex on the given intervals by verifying visually the geometric definition of convexity.

To make more convex functions out of already known convex functions, we can perform certain arithmetic operations:

Lemma 4. Show that a sum of convex functions is convex, and that adding a constant or linear function to a function does not affect convexity.

Hint: Verify the algebraic definition of convexity.
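Alongside drawing the graphs, the algebraic definition (2) can be spot-checked on a computer. The sketch below samples points a < b and weights λ and looks for a violation of (2); finding none on a finite grid is only evidence of convexity, not a proof, while a single violation does disprove it. The function name `find_convexity_violation`, the grid sizes, and the tolerance are all choices made for this illustration.

```python
import math


def find_convexity_violation(f, lo, hi, samples=40, tol=1e-9):
    """Search a grid for (a, b, lam) with f((1-lam)a + lam*b) > (1-lam)f(a) + lam*f(b).

    Returns the first violating triple found, or None if the grid shows none.
    """
    points = [lo + (hi - lo) * k / samples for k in range(samples + 1)]
    weights = [j / 10 for j in range(11)]
    for a in points:
        for b in points:
            if a >= b:
                continue
            for lam in weights:
                c = (1 - lam) * a + lam * b
                if f(c) > (1 - lam) * f(a) + lam * f(b) + tol:
                    return (a, b, lam)
    return None


# Entries from the list above: no violation is expected on the stated intervals.
print(find_convexity_violation(lambda x: -math.log(x), 0.1, 5.0))
print(find_convexity_violation(lambda x: -math.sin(x), 0.0, math.pi))
print(find_convexity_violation(math.tan, 0.0, 1.5))
# sin x itself is not convex on [0, pi], so a violating triple is returned.
print(find_convexity_violation(math.sin, 0.0, math.pi))
```

The small tolerance `tol` keeps rounding errors at the endpoints λ = 0 and λ = 1 from being reported as violations.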



4.3. Convexity fast-track for calculus aficionados. If you know about continuity and derivatives and want to rigorously prove that a function is convex, then instead of just guessing it from the graph or trying to verify the algebraic definition of convexity (which can be quite hard and time-consuming), it is often easier to use one of the criteria below. The first criterion says that for a continuous function (roughly, a function whose graph you can draw without lifting your pencil), it is enough to verify the definition of convexity only for the midpoint c of every interval [a, b]:


Theorem 3. (Continuity and Midpoints) Let f(x) be a continuous function on an interval I. Then f(x) is convex if and only if for all a, b ∈ I:

\[ \frac{f(a) + f(b)}{2} \ge f\left(\frac{a+b}{2}\right). \]

Also, f(x) is strictly convex if and only if the inequality is strict for all a ≠ b.

Convexity can be also expressed in terms of the derivative $f'(x)$, which measures the rate of change of f(x):

Theorem 4. (First Derivative Test) Let f(x) be a differentiable function on an interval I. Then f(x) is convex if and only if $f'(x)$ is increasing on the interior of I.

Often, it is hard to determine directly if a function increases or decreases: this is more easily verified by determining where the derivative is positive or negative. More precisely, if g(x) is a function whose derivative satisfies $g'(x) > 0$, then g(x) is increasing. Applying this with g(x) defined to be $f'(x)$ results in the second derivative $f''(x)$ and leads to another useful test:

Theorem 5. (Second Derivative Test) Let f(x) be a twice differentiable function on an interval I. Then f(x) is convex if and only if $f''(x) \ge 0$ for all x ∈ I. Also, f(x) is strictly convex if and only if $f''(x) \ge 0$ for all x ∈ I and there is no subinterval J ⊂ I of positive length on which $f''$ is zero.

Use each of the criteria above to produce three different solutions to:

Exercise 11. Find out (with proof) on which intervals $x^2$, $x^3$, and sin x are convex. For an extra challenge, prove that each of the functions in Exercise 10 is convex (or strictly convex) on the indicated intervals.

5. Applications of Convexity to Inequalities 5.1. Convexity and endpoints. Convexity is frequently used to prove inequalities that would have been too hard to tackle before. A seemingly obvious but powerful principle is in action here:



Theorem 6. (Maximum Principle for Convex Functions) A convex function f (x) on an interval [a, b] is maximized at x = a or x = b (or both). Proof: First suppose that f (b) ≥ f (a). Given c in [a, b], let λ ∈ [0, 1] be such that c = (1 − λ)a + λb. Then the algebraic definition of convexity implies that f (c) ≤ (1 − λ)f (a) + λf (b) ≤ (1 − λ)f (b) + λf (b) = f (b), so f attains a maximum at b. The case f (a) ≥ f (b) is analogous.  Thus, as long as f (x) is convex on [a, b], its maximum is attained at an endpoint of the interval.


Problem 3. (USAMO ’80) Prove that for a, b, c ∈ [0, 1],

\[ \frac{a}{b+c+1} + \frac{b}{c+a+1} + \frac{c}{a+b+1} + (1-a)(1-b)(1-c) \le 1. \]

How can we apply the Maximum Principle and plug into the LHS the end values of the interval [0, 1], when our treatment of convex functions concentrated only on functions of a single variable, while the inequality above has three variables?! There is a standard technique that can help.

PST 74. For a function $f(x_1, x_2, \ldots, x_n)$ in several variables, fix all but one variable, e.g., pretend that $x_2, \ldots, x_n$ are constants. Viewing the function as having only one variable $x_1$ will allow you to apply your knowledge of single-variable functions, for example, to their convexity.
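Applied to the left-hand side of Problem 3, PST 74 suggests freezing b and c and watching how the expression behaves as a function of a alone. The sketch below does this on a grid with exact fractions; the names `F` and `argmax_in_a` and the grid size are assumptions of this illustration, and a grid scan is of course no substitute for the convexity argument.

```python
from fractions import Fraction


def F(a, b, c):
    """Left-hand side of the inequality in Problem 3."""
    return (a / (b + c + 1) + b / (c + a + 1) + c / (a + b + 1)
            + (1 - a) * (1 - b) * (1 - c))


def argmax_in_a(b, c, steps=20):
    """Fix b and c, scan a = 0, 1/steps, ..., 1, and return a maximizing grid value."""
    grid = [Fraction(k, steps) for k in range(steps + 1)]
    return max(grid, key=lambda a: F(a, b, c))


# For each frozen pair (b, c), see where the one-variable function of a peaks.
for b in (Fraction(0), Fraction(1, 3), Fraction(1)):
    for c in (Fraction(0), Fraction(1, 2), Fraction(1)):
        print(b, c, argmax_in_a(b, c))
```

In every case printed, the maximizing value of a is an endpoint of [0, 1], which is what the Maximum Principle predicts for a convex function of one variable.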

Solution: Let F(a, b, c) denote the LHS. If we fix b and c in [0, 1], the resulting function of a is convex on [0, 1], because it is a sum of functions of the type $f(a) = \frac{r}{s+a}$ and linear functions. In detail, $f_1(a) = \frac{a}{b+c+1}$ and $f_4(a) = (1-a)(1-b)(1-c)$ are the linear functions, while $f_2(a) = \frac{b}{c+a+1}$ and $f_3(a) = \frac{c}{a+b+1}$ are convex on their respective domains (−1 − c, ∞) and (−1 − b, ∞), and the latter intervals include [0, 1] since b, c ≥ 0. Therefore, the whole sum F(a, b, c) is maximized when a = 0 or a = 1; i.e., we will not decrease F(a, b, c) by replacing a by 0 or 1. Similarly we will not decrease F(a, b, c) by replacing each of b and c by 0 or 1. Hence the maximum value of F(a, b, c) will occur at one of the $2^3 = 8$ cases when a, b, and c are 0s or 1s. But F(a, b, c) = 1 at these eight points (why? check it!), so F(a, b, c) ≤ 1 whenever 0 ≤ a, b, c ≤ 1.

5.2. Jensen’s inequality is one of the most widely-applied inequalities with convex functions.

Theorem 7. (Jensen’s Inequality (JI)) Let f be a convex function on an interval I. If $x_1, x_2, \ldots, x_n \in I$, then

\[ \frac{f(x_1) + f(x_2) + \cdots + f(x_n)}{n} \ge f\left(\frac{x_1 + x_2 + \cdots + x_n}{n}\right). \]

If moreover f is strictly convex, then equality holds iff $x_1 = x_2 = \cdots = x_n$.

JI resembles the Continuity-and-Midpoints convexity criterion but for n variables, and it can be proven by induction on n. Furthermore,

Lemma 5. JI implies AM-GM ($P_1 \ge P_0$) and PM ($P_r \ge P_s$) for r ≥ s > 0 and positive numbers $x_1, x_2, \ldots, x_n$.

Hint: Apply JI with f(x) = −ln x or $g(x) = x^{r/s}$.

Exercise 12. Prove that $x^x \ge \left(\frac{x+1}{2}\right)^{x+1}$ for x > 0.

Hint: Apply JI or just the definition of convexity to x ln x on (0, ∞).
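The first claim of Lemma 5 above (JI implies AM-GM) can be illustrated numerically: for f(x) = −ln x, the difference between the two sides of Jensen’s Inequality equals ln(AM/GM), so JI says precisely that AM ≥ GM. The sketch below computes this difference for one sample list; the names `jensen_gap` and `neg_log` are invented for this illustration.

```python
import math


def jensen_gap(f, xs):
    """Average of f(x_i) minus f(average of x_i); non-negative when f is convex."""
    n = len(xs)
    return sum(f(x) for x in xs) / n - f(sum(xs) / n)


def neg_log(x):
    return -math.log(x)


xs = [1.0, 2.0, 4.0]
gap = jensen_gap(neg_log, xs)
am = sum(xs) / len(xs)
gm = math.prod(xs) ** (1 / len(xs))

# exp(gap) should agree with AM / GM up to rounding, and gap >= 0 means AM >= GM.
print(gap, math.exp(gap), am / gm)
```

The same helper can be reused with other convex functions from the list in Section 4.2 to experiment with the power mean case of the lemma.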




Exercise 13. Show that among all convex n-gons inscribed in a fixed circle the regular n-gons have the largest perimeter.

Hint: A bit of trigonometry and a careful choice of the variables and the function are necessary here. Apply Jensen’s Inequality with f(x) = −sin x and with $x_i$ as suggested by the diagram to the right. ♦

5.3. Hardy-Littlewood-Pólya (HLP) inequality. Next we have an inequality that is so general that it includes almost all of the other inequalities we have discussed so far as special cases:

Theorem 8. (HLP Majorization Inequality) Let f be a convex function on an interval I, and let $a_1, \ldots, a_n, b_1, \ldots, b_n \in I$. Suppose that the sequence $a_1, \ldots, a_n$ majorizes $b_1, \ldots, b_n$; that is, $a_1 \ge \cdots \ge a_n$, $b_1 \ge \cdots \ge b_n$, and

\[
\begin{aligned}
a_1 &\ge b_1,\\
a_1 + a_2 &\ge b_1 + b_2,\\
&\ \ \vdots\\
a_1 + a_2 + \cdots + a_{n-1} &\ge b_1 + b_2 + \cdots + b_{n-1},\\
a_1 + a_2 + \cdots + a_{n-1} + a_n &= b_1 + b_2 + \cdots + b_{n-1} + b_n.
\end{aligned}
\]

(Note the equality in the final equation.) Then

\[ f(a_1) + \cdots + f(a_n) \ge f(b_1) + \cdots + f(b_n). \]

If in addition f is strictly convex on I, then equality holds iff $a_i = b_i$ for all i.

Exercise 14. Suppose that $0 \le \theta_1, \ldots, \theta_n \le \pi/2$ and $\theta_1 + \cdots + \theta_n = 2\pi$. Prove that

\[ 4 \le \sin(\theta_1) + \cdots + \sin(\theta_n) \le n \sin(2\pi/n). \]

Hint: If f(x) = −sin x, then apply JI or HLP, as needed.
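The majorization condition in Theorem 8 is easy to get wrong by hand, so a small checker can help when experimenting. The sketch below is an illustration under assumptions: the function names `majorizes` and `hlp_gap` are made up here, the inputs are sorted inside the function for convenience (the theorem states them already sorted), and exact integers or fractions should be used so that the final equality of sums is tested reliably.

```python
def majorizes(a, b):
    """Return True if the sequence a majorizes the sequence b (Theorem 8's condition)."""
    a = sorted(a, reverse=True)
    b = sorted(b, reverse=True)
    if len(a) != len(b) or sum(a) != sum(b):
        return False                     # the total sums must be equal
    partial_a = partial_b = 0
    for x, y in zip(a[:-1], b[:-1]):     # compare the first n-1 partial sums
        partial_a += x
        partial_b += y
        if partial_a < partial_b:
            return False
    return True


def hlp_gap(f, a, b):
    """f(a_1) + ... + f(a_n) minus f(b_1) + ... + f(b_n)."""
    return sum(f(x) for x in a) - sum(f(y) for y in b)


def square(x):
    return x * x


print(majorizes([5, 1, 0], [2, 2, 2]), hlp_gap(square, [5, 1, 0], [2, 2, 2]))
# Two sequences with equal sums need not be comparable in either direction:
print(majorizes([3, 3, 0], [4, 1, 1]), majorizes([4, 1, 1], [3, 3, 0]))
```

The second line of output shows that majorization is only a partial order: neither of the two sequences majorizes the other, so Theorem 8 says nothing about that pair.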



Lemma 6. Show that Jensen’s Inequality is a special case of HLP.

Hint: Let one of the sequences in HLP be constant.

5.4. Inequalities with weights. Many of the inequalities we have looked at so far have versions in which the terms in a mean can be weighted unequally. The algebraic definition of convexity itself unequally weights the two x-values a and b with non-negative weights $\lambda_1 = 1 - \lambda$ and $\lambda_2 = \lambda$ so that $\lambda_1 + \lambda_2 = 1$ and $\lambda_1 f(a) + \lambda_2 f(b) \ge f(\lambda_1 a + \lambda_2 b)$. Let’s see how this works for other inequalities with more variables and weights.

Theorem 9. (Weighted AM-GM) If $x_1, \ldots, x_n > 0$, $\lambda_1, \ldots, \lambda_n \ge 0$, and $\lambda_1 + \cdots + \lambda_n = 1$, then

\[ \lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_n x_n \ge x_1^{\lambda_1} x_2^{\lambda_2} \cdots x_n^{\lambda_n}, \]

with equality iff all the $x_i$ with $\lambda_i \ne 0$ are equal.


Definition 3. Fix $x_1, \ldots, x_n > 0$ and weights $\lambda_1, \ldots, \lambda_n \ge 0$ such that $\lambda_1 + \cdots + \lambda_n = 1$. For any r ≠ 0, define the rth weighted power mean by

\[ P_r := \left(\lambda_1 x_1^r + \lambda_2 x_2^r + \cdots + \lambda_n x_n^r\right)^{1/r}. \]

Also let $P_0$ be the weighted geometric mean

\[ P_0 := x_1^{\lambda_1} x_2^{\lambda_2} \cdots x_n^{\lambda_n}. \]

Theorem 10. (Weighted PM) The weighted power means $P_r$ increase as r increases. Moreover, if the $x_i$’s with $\lambda_i \ne 0$ are not all equal, then $P_r$ is a strictly increasing function of r.

Theorem 11. (Weighted JI) Let f be a convex function on an interval I. If $x_1, \ldots, x_n \in I$ and $\lambda_1, \ldots, \lambda_n \ge 0$ with $\lambda_1 + \cdots + \lambda_n = 1$, then

\[ \lambda_1 f(x_1) + \lambda_2 f(x_2) + \cdots + \lambda_n f(x_n) \ge f(\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_n x_n). \]

If f is strictly convex, then equality holds iff all the $x_i$ with $\lambda_i \ne 0$ are equal.

It is not surprising that there is a relation between the various inequalities and their weighted versions:

Lemma 7. The weighted AM-GM, weighted PM, and weighted JI contain as special cases their ordinary versions AM-GM, PM, and JI, respectively.

And conversely,

Lemma 8. When the weights are all rational numbers, the ordinary AM-GM implies the weighted AM-GM, and similarly, the weighted PM and the weighted JI follow from their unweighted versions.

Hint: Apply AM-GM to a list in which some of the numbers are repeated. The same proof works for the weighted PM and weighted JI. ♦
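The repetition idea in the hint can be made concrete. In the sketch below, rational weights with common denominator d are turned into a list in which each $x_i$ is repeated $\lambda_i d$ times; the ordinary means of the long list then coincide with the weighted means of the original numbers. The helper name `repeated_list` and the sample numbers are assumptions of this illustration, and `math.lcm` requires Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm, prod


def repeated_list(xs, weights):
    """Repeat each x_i exactly (weight_i * d) times, d = common denominator of the weights."""
    d = lcm(*(w.denominator for w in weights))
    out = []
    for x, w in zip(xs, weights):
        out.extend([x] * int(w * d))
    return out


xs = [1, 4, 16]
weights = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
long_list = repeated_list(xs, weights)
print(long_list)

# Ordinary AM of the long list versus the weighted AM of xs (both exact).
ordinary_am = Fraction(sum(long_list), len(long_list))
weighted_am = sum(w * x for w, x in zip(weights, xs))
print(ordinary_am, weighted_am)

# Ordinary GM of the long list versus the weighted GM of xs (floating point).
ordinary_gm = prod(long_list) ** (1 / len(long_list))
weighted_gm = prod(x ** float(w) for x, w in zip(xs, weights))
print(ordinary_gm, weighted_gm)
```

Since the two arithmetic means agree and the two geometric means agree, ordinary AM-GM for the long list is exactly weighted AM-GM for the original numbers, which is the content of Lemma 8 in this example.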

 

Exercise 15. Given a, b, c, p, q, r > 0 with p + q + r = 1, prove

\[ a + b + c \ge a^p b^q c^r + a^r b^p c^q + a^q b^r c^p. \]

Hint: Apply the weighted AM-GM three times.

Exercise 16. Prove that if a, b, c are sides of a triangle, then

\[ (a + b - c)^a (b + c - a)^b (c + a - b)^c \le a^a b^b c^c. \]

Hint: Why is a triangle mentioned? Divide both sides by the RHS, take some root of both sides, and apply the weighted AM-GM. ♦

6. Geometry Leftovers and a Mean Summary 6.1. Plane geometry interlude. Several problems in this session called for some knowledge of plane or analytic geometry, or trigonometry. In particular, three plane geometry facts appeared prominently in the theory part of the session and deserve to be proven. The first one pertains to the name geometric mean of two variables.


Exercise 17. In Figure 1b, two segments form the diameter AB of circle k: AD = x and DB = y. A perpendicular is erected at point D to AB until it √ hits the circle k in point C. Prove that CD = xy. Proof: Since ∠ACB is an inscribed angle in circle k overlooking diameter AB, we have ∠ACB = 90◦ (cf. Circle Geometry session, vol. I), and all three triangles ADC, CDB, and ACB are right. They are also similar because they have one more equal angle; e.g., ∠BAC is shared among two of them, etc. In particular, the ratios of the two smaller triangles’ sides are the same. √  Hence, AD/CD = CD/BD, from which xy = CD 2 and CD = xy. This geometric construction explains the name geometric mean of x and y. Using it, it is possible to:

 Exercise 18. Interpret and prove geometrically the baby AM-GM.

Proof: The midpoint O of AB is the center of k, and the radius of k, being half of the diameter, is the arithmetic mean (x + y)/2 = OA = OB = OC. So both the AM and GM of x and y appear in right triangle ODC as the hypotenuse OC and the leg CD, respectively (cf. Fig. 1b). The geometric inequality OC ≥ CD says that $(x + y)/2 \ge \sqrt{xy}$, which is the baby AM-GM inequality. Equality is attained if and only if ODC degenerates into a segment OC, which happens exactly when D = O, i.e., when x = y (cf. Fig. 5a).

[Figure 5. Equality in baby AM-GM and Trapezoids in convexity]

A third geometric fact sneaked into the discussion of convexity in Figure 4a, which is redrawn as Figure 5b.

Exercise 19. Given trapezoid $A_1B_1BA$, let point $D_1$ be on side $A_1B_1$ and point D on side AB so that $A_1A$, $D_1D$, and $B_1B$ are parallel. If $A_1D_1 = \lambda A_1B_1$ for some λ ∈ (0, 1), then show that $DD_1 = (1 - \lambda) A_1A + \lambda B_1B$.

Proof: WLOG, assume AA1 ≤ BB1 . Draw a line through A1 parallel to AB that intersects DD1 in E and BB1 in F . Then A1 A = ED = F B, and D1 E/B1 F = A1 D1 /A1 B1 = λ. (Why? Think of similar triangles.) We can now calculate the length of the segment in question: DD1 = D1 E + ED = λB1 F + λED + (1 − λ)ED = λ(B1 F + F B) + (1 − λ)A1 A = λB1 B + (1 − λ)A1 A.



6.2. A diagram of the major inequalities for means introduced in this session appears below. The arrows show implications between different inequalities, e.g., the bottom-most arrow indicates that the Hardy-Littlewood-Pólya Inequality implies Jensen's Inequality. The dashed arrows refer to implications being shown here for the case of rational weights only. Using the so-called smoothing technique, we will prove some of these inequalities in Monovariants III. We will see other fundamental inequalities and further sophisticated applications of inequalities to olympiad-style problems in the upcoming Inequalities II.

[Diagram: implication arrows among Baby AM-GM, Weighted Baby AM-GM, AM-GM, Weighted AM-GM, PM, Weighted PM, JI, Weighted JI, and HLP.]

6.3. Acknowledgments and sources for more inequalities. Some of the problems here were drawn from notes from the U.S. training session for the International Mathematics Olympiad. Others are from [73]. Many of the inequalities themselves can be found in the book [36], which contains a very thorough treatment of the topic.

7. Hints and Solutions to Selected Problems

Lemma 1. Reasoning backward, we square, simplify, and move to the RHS:

$$\sqrt{xy} \overset{?}{\le} \frac{x+y}{2} \;\Leftrightarrow\; xy \overset{?}{\le} \left(\frac{x+y}{2}\right)^2 \;\Leftrightarrow\; 4xy \overset{?}{\le} x^2 + 2xy + y^2 \;\Leftrightarrow\; 0 \overset{?}{\le} x^2 - 2xy + y^2.$$

The latter is recognized as $0 \le (x - y)^2$, which is always true, with equality iff x = y. If you prefer to avoid squaring both sides, multiply the proposed inequality instead by 2, pull to the RHS and rewrite as a square:

$$2\sqrt{xy} \overset{?}{\le} x + y \;\Leftrightarrow\; 0 \overset{?}{\le} (\sqrt{x})^2 - 2\sqrt{x}\sqrt{y} + (\sqrt{y})^2 \;\Leftrightarrow\; 0 \overset{?}{\le} (\sqrt{x} - \sqrt{y})^2.$$

Plugging $t = \sqrt{x} - \sqrt{y}$ into the quotation in the beginning of the session, we obtain the true inequality $t^2 \ge 0$, with equality iff $\sqrt{x} = \sqrt{y}$, i.e., x = y.

Exercise 1. If L replaces 20 m, then the system of equations is x + 2y = L and x = 2y, leading to y = L/4, x = L/2, and largest area $xy = L^2/8$. ♦

Exercise 2. For the first question, the area xy = 50 is fixed and we want to minimize the fence length x + 2y. By baby AM-GM:

$$\frac{x + 2y}{2} \overset{\text{AM-GM}}{\ge} \sqrt{x \cdot 2y} = \sqrt{2 \cdot 50} = \sqrt{100} = 10,$$

where equality is obtained iff x = 2y. Plugging into the fixed area, we obtain $2y^2 = 50$, i.e., y = 5 and x = 10, yielding a minimal fence of 20 m again!


This is not a coincidence, since this exercise and the original problem are two sides of the same optimization situation (why?).

To answer the other question, we have the fence length fixed at 20 = 2x + 2y, i.e., x + y = 10. To maximize the area, we again apply AM-GM, but this time to variables x and y:

$$\sqrt{x \cdot y} \overset{\text{AM-GM}}{\le} \frac{x+y}{2} = \frac{10}{2} = 5,$$

with equality iff x = y = 5. Thus, the square has maximal area of 25 m² among all rectangles of 20 m perimeter. Again, any perimeter length will yield a square as the optimal figure in this type of a problem.

Exercise 3. Following the hint, we apply AM-GM once to each sum on the LHS and then multiply the three resulting inequalities:

$$a + b \ge 2\sqrt{ab}, \quad b + c \ge 2\sqrt{bc}, \quad c + a \ge 2\sqrt{ca}, \quad\text{and}$$
$$(a + b)(b + c)(c + a) \ge 2\sqrt{ab} \cdot 2\sqrt{bc} \cdot 2\sqrt{ca} = 8\sqrt{a^2b^2c^2} = 8abc,$$

where equality is obtained iff equalities are obtained in each of the original three applications of AM-GM, i.e., a = b = c.

Exercise 4. We pair up the numbers from {1, 2, . . . , n} so that each pair adds up to n + 1: (1, n), (2, n − 1), . . . , (n − 1, 2), (n, 1). Note that each number appears twice, and if n is odd then the middle number (n + 1)/2 is paired up with itself. We now apply AM-GM to each such pair:

$$\frac{1+n}{2} \ge \sqrt{1 \cdot n}, \quad \frac{2+(n-1)}{2} \ge \sqrt{2(n-1)}, \quad \dots, \quad \frac{(n-1)+2}{2} \ge \sqrt{(n-1) \cdot 2}, \quad \frac{n+1}{2} \ge \sqrt{n \cdot 1}.$$

Multiplying now all these n inequalities yields $\left(\frac{n+1}{2}\right)^n \ge \sqrt{n! \, n!} = n!$. Equality can never be obtained for n > 1 since the very first application of AM-GM yields a strict inequality: $\frac{1+n}{2} > \sqrt{n}$.

Exercise 5. Pulling all variable expressions to the LHS, and dividing by 3, we turn the inequality into its equivalent version $(2\sqrt{x} + \frac{1}{x})/3 \overset{?}{\ge} 1$. The 3 in the denominator suggests using AM-GM for 3 variables, but we have only 2 summands in the numerator! Hence the text suggests to split $2\sqrt{x}$:

$$\frac{2\sqrt{x} + \frac{1}{x}}{3} = \frac{\sqrt{x} + \sqrt{x} + \frac{1}{x}}{3} \overset{\text{AM-GM}}{\ge} \sqrt[3]{\sqrt{x}\,\sqrt{x}\,\frac{1}{x}} = \sqrt[3]{x \cdot \frac{1}{x}} = \sqrt[3]{1} = 1,$$

with equality iff $\sqrt{x} = \frac{1}{x}$, i.e., $x\sqrt{x} = 1$, $x^3 = 1$, and x = 1.

Exercise 6. We see the product of many ab's on the RHS; if we divide everything by n, we also see a denominator of n on the LHS; still, we do not see a sum on the LHS! But there is a common factor of (a − b) on both sides:

$$a^n - b^n = (a - b)(a^{n-1} + a^{n-2}b + \cdots + a^{n-1-k}b^k + \cdots + ab^{n-2} + b^{n-1}) \overset{?}{\ge} n(a - b)(ab)^{\frac{n-1}{2}}.$$

If a = b, then both sides are 0 and we are done. If a > b, we divide by n(a − b) without changing the direction of the inequality:

$$\frac{a^{n-1} + a^{n-2}b + \cdots + a^{n-1-k}b^k + \cdots + ab^{n-2} + b^{n-1}}{n} \overset{?}{\ge} (ab)^{\frac{n-1}{2}}.$$


We can now apply AM-GM to the n summands $a^{n-1-k}b^k$ on the LHS:

$$\text{LHS} \overset{\text{AM-GM}}{\ge} \sqrt[n]{(a^{n-1})(a^{n-2}b^1) \cdots (a^{n-1-k}b^k) \cdots (a^1b^{n-2})(b^{n-1})}.$$

How many a's are under the radical? Adding up the exponents of a, we get

$$(n-1) + (n-2) + \cdots + 1 = \frac{(n-1+1)(n-1)}{2} = \frac{n(n-1)}{2}.$$

By symmetry, the exponents of b add up to the same number. Therefore taking the nth root in the RHS leads to

$$\text{LHS} \ge \sqrt[n]{a^{\frac{n(n-1)}{2}} b^{\frac{n(n-1)}{2}}} = a^{\frac{n-1}{2}} b^{\frac{n-1}{2}} = (ab)^{\frac{n-1}{2}}.$$

Incidentally, when a > b, the terms $a^{n-1}$ and $b^{n-1}$ in the application of AM-GM are not equal, so the inequality is strict.

Exercise 7. Let (x, y, z) be the corner of the box such that x, y, z > 0. Then $\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$ and the box's volume is (2x)(2y)(2z) = 8xyz (why?). If we blindly apply AM-GM to the product xyz, we end up with $\sqrt[3]{xyz} \le \frac{x+y+z}{3}$, and we do not have information about the last average. We need to involve $x^2$, $y^2$, and $z^2$, and so instead we try

$$\sqrt[3]{x^2y^2z^2} \overset{\text{AM-GM}}{\le} \frac{x^2 + y^2 + z^2}{3}.$$

But again we are out of luck: we need the constants a, b, and c to appear on the RHS! Following the hint, we realize that in the defining equation for the ellipsoid, equality is obtained if $\frac{x}{a} = \frac{y}{b} = \frac{z}{c} = \frac{1}{\sqrt{3}}$, and since the given sum involves the terms $\frac{x^2}{a^2}$, $\frac{y^2}{b^2}$, and $\frac{z^2}{c^2}$, we apply AM-GM to these and obtain

$$\sqrt[3]{\left(\frac{x^2}{a^2}\right)\left(\frac{y^2}{b^2}\right)\left(\frac{z^2}{c^2}\right)} \overset{\text{AM-GM}}{\le} \frac{\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2}}{3} = \frac{1}{3}.$$

By cubing, taking the square root, and clearing a denominator, we solve for the product xyz, and we find that $xyz \le \frac{abc}{3\sqrt{3}}$. If $x = \frac{a}{\sqrt{3}}$, $y = \frac{b}{\sqrt{3}}$, and $z = \frac{c}{\sqrt{3}}$, then equality holds everywhere, so the maximum volume is $\frac{8abc}{3\sqrt{3}}$.

In a similar vein, the largest area of a rectangle inscribed in an ellipse $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$ with sides parallel to the axes is 2ab; the corner of this optimal rectangle in the first quadrant is $(x = \frac{a}{\sqrt{2}}, y = \frac{b}{\sqrt{2}})$. ♦

Problem 2. Following the hint, we multiply out and expand the LHS. It consists of $2^n$ products of the form $a_{i_1} a_{i_2} \cdots a_{i_k}$, corresponding to the $2^n$ subsets $\{i_1, i_2, \dots, i_k\}$ of the indices {1, 2, . . . , n}, each such subset indicating which $a_i$'s have been chosen and which have been replaced by 1s when multiplying out. For every k = 0, 1, . . . , n we apply AM-GM to the sum of all such $\binom{n}{k}$ products that have exactly k terms:

$$(3) \qquad \sum_{\{i_1, i_2, \dots, i_k\}} a_{i_1} a_{i_2} \cdots a_{i_k} \overset{\text{AM-GM}}{\ge} \binom{n}{k} \sqrt[\binom{n}{k}]{\prod_{\{i_1, i_2, \dots, i_k\}} a_{i_1} a_{i_2} \cdots a_{i_k}}.$$

If the notation is too intimidating, then use n = 3 to see the pattern. Each $a_i$ appears in exactly $\binom{n-1}{k-1}$ of these products: after removing $a_i$, this is the number of ways to choose the other (k − 1) elements from the (n − 1) leftover numbers $a_j$. Thus, each $a_i$ is raised to the power

$$\frac{\binom{n-1}{k-1}}{\binom{n}{k}} = \frac{(n-1)!}{(k-1)!(n-k)!} \cdot \frac{k!(n-k)!}{n!} = \frac{k}{n}.$$

Hence, the RHS of (3) equals $\binom{n}{k} (a_1 a_2 \cdots a_n)^{k/n} = \binom{n}{k} g^k$. For the whole sum, we run this argument for k = 0, 1, . . . , n and recover on the LHS the original product that was multiplied out:

$$(1 + a_1)(1 + a_2) \cdots (1 + a_n) \ge \sum_{k=0}^{n} \binom{n}{k} g^k = (1 + g)^n.$$

The last is a famous combinatorial identity called the Binomial Theorem.

Equality is achieved only if equalities are obtained everywhere in the applications of AM-GM in (3). In particular, for k = 1 we must have the singleton products equal among themselves, i.e., $a_1 = a_2 = \cdots = a_n$ (= g), and these do produce an overall equality.

The Binomial Theorem can be avoided by using the so-called symmetric functions and some abstract manipulation of the LHS, or by smoothing algorithms (to be done in Inequalities II and Monovariants III, respectively).

Exercise 8. If ABC has a right angle at B, then by the Pythagorean Theorem, $AC^2 = AB^2 + BC^2$. On the other hand, by PM we have

$$\frac{AB + BC}{2} = P_1 \le P_2 = \sqrt{\frac{AB^2 + BC^2}{2}} = \sqrt{\frac{AC^2}{2}}, \quad\text{so}\quad AB + BC \le \sqrt{2}\,AC.$$

Equality is obtained iff AB = BC, i.e., ABC is right isosceles.



Exercise 9. The equation of the plane through (r, 0, 0), (0, s, 0), and (0, 0, t) is $\frac{x}{r} + \frac{y}{s} + \frac{z}{t} = 1$. The plane passes through point (a, b, c), so $\frac{a}{r} + \frac{b}{s} + \frac{c}{t} = 1$. The volume of the tetrahedron is rst/6. The inequality $P_0 \ge P_{-1}$ (GM-HM) for $\frac{r}{a}$, $\frac{s}{b}$, and $\frac{t}{c}$ implies

$$\sqrt[3]{\frac{r}{a} \cdot \frac{s}{b} \cdot \frac{t}{c}} = P_0 \ge P_{-1} = \frac{3}{\frac{a}{r} + \frac{b}{s} + \frac{c}{t}} = \frac{3}{1} = 3,$$

so rst ≥ 27abc, with equality if and only if $\frac{a}{r} = \frac{b}{s} = \frac{c}{t} = \frac{1}{3}$, i.e., r = 3a, s = 3b, and t = 3c. Since the bound is from below, the minimal volume is thus 9abc/2.

The two-dimensional version of the problem asks for the triangle of smallest area bounded by the positive parts of the x- and y-axes and a line passing through a fixed point (a, b) with a, b > 0. Analogously, the smallest area is 2ab, attained when the line has x-intercept 2a and y-intercept 2b. ♦

Lemma 2. Let $y(r) = \frac{x_1^r + \cdots + x_n^r}{n}$. Then $\lim_{r \to 0} y(r) = \frac{n}{n} = 1$, and

$$y'(r) = \frac{x_1^r \ln x_1 + \cdots + x_n^r \ln x_n}{n}, \quad\text{so}\quad \lim_{r \to 0} y'(r) = \frac{\ln x_1 + \cdots + \ln x_n}{n} = \ln (x_1 \cdots x_n)^{1/n}.$$

If the limits exist,

$$P_0 = \lim_{r \to 0} P_r = \lim_{r \to 0} \big(y(r)\big)^{1/r} = \lim_{r \to 0} e^{\frac{\ln y(r)}{r}} = e^{\lim_{r \to 0} \frac{\ln y(r)}{r}}.$$


The new problem $\lim_{r \to 0} \frac{\ln y(r)}{r}$ is solved by l'Hôpital's Rule (l'H), since both top and bottom go to 0 when r → 0. By continuity, $\lim_{r \to 0} \ln y(r) = \ln 1 = 0$, so

$$\lim_{r \to 0} \frac{\ln y(r)}{r} \overset{\text{l'H}}{=} \lim_{r \to 0} \frac{(\ln y(r))'}{(r)'} = \lim_{r \to 0} \frac{y'(r)}{y(r)} = \frac{\lim_{r \to 0} y'(r)}{\lim_{r \to 0} y(r)} = \ln (x_1 x_2 \cdots x_n)^{1/n}.$$

Backtracking, we get $P_0 = e^{\ln (x_1 x_2 \cdots x_n)^{1/n}} = \sqrt[n]{x_1 x_2 \cdots x_n}$.
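The limit in Lemma 2 can also be observed numerically. The following is only a sketch under assumptions: the function name `power_mean` and the sample data are hypothetical, and the means are unweighted as in the lemma.

```python
import math

def power_mean(xs, r):
    """Unweighted power mean P_r of positive numbers xs.

    For r = 0 the geometric mean is returned, matching the definition of P_0.
    """
    n = len(xs)
    if r == 0:
        return math.exp(sum(math.log(x) for x in xs) / n)
    return (sum(x ** r for x in xs) / n) ** (1.0 / r)

xs = [1.0, 2.0, 4.0]            # hypothetical sample data
geometric = power_mean(xs, 0)

# P_r should approach the geometric mean as r approaches 0 from either side.
for r in (1.0, 0.1, 0.01, 0.001, -0.001, -0.01, -0.1, -1.0):
    print(f"r = {r:7.3f}   P_r = {power_mean(xs, r):.6f}")
print(f"P_0 = {geometric:.6f}")
```

The printed values should decrease as r decreases and squeeze toward $P_0$ near r = 0, in line with the PM inequality.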

Lemma 3. If $x_k = \min\{x_1, \dots, x_n\}$ and r < 0, then

$$x_i^r = \frac{1}{x_i^{-r}} \le \frac{1}{x_k^{-r}} = x_k^r.$$

After replacing $x_m$ by $x_k$, the rest of the solution goes exactly as in the text for $P_\infty$, with the observation that $\lim_{r \to -\infty} \sqrt[r]{n} = \lim_{r \to -\infty} n^{1/r} = n^0 = 1$.

Exercise 10. In addition to $x^2$ and $x^3$ in Figure 4, below are graphs of examples of the listed types of functions (cf. Fig. 6). Some convex parts are drawn in solid lines. ♦

[Figure 6. Graphs of convex examples. Panels: $\tan x$, $-\cos x$, $x^{-1}$, $-x^{1/3}$, $e^x$, $-\sin x$, $\frac{3}{5+x}$, $-\log x$]

Lemma 4. If f(x) and g(x) are convex functions on [a, b], then by the algebraic definition of convexity, for any λ ∈ [0, 1]:

$$f\big((1-\lambda)a + \lambda b\big) \le (1-\lambda)f(a) + \lambda f(b) \quad\text{and}\quad g\big((1-\lambda)a + \lambda b\big) \le (1-\lambda)g(a) + \lambda g(b).$$

Adding these two inequalities yields

$$f\big((1-\lambda)a + \lambda b\big) + g\big((1-\lambda)a + \lambda b\big) \le (1-\lambda)\big(f(a) + g(a)\big) + \lambda\big(f(b) + g(b)\big),$$

which is the algebraic definition of convexity on [a, b] for the function f + g. A constant or linear function g(x) is automatically convex, so adding it to a convex function preserves convexity.

Exercise 11. Since all three functions $x^2$, $x^3$, and sin x are continuous on R, we can apply to them the Continuity-and-Midpoint (CM) criterion. To start, for a, b ≥ 0 we apply special cases of the PM inequality to $x^2$ and $x^3$:

$$\frac{a^2 + b^2}{2} \overset{(P_2 \ge P_1)}{\ge} \left(\frac{a+b}{2}\right)^2 \quad\text{and}\quad \frac{a^3 + b^3}{2} \overset{(P_3 \ge P_1)}{\ge} \left(\frac{a+b}{2}\right)^3,$$

and hence, by the CM criterion, $x^2$ and $x^3$ are convex on [0, ∞). As for showing that $x^2$ is convex on all of R, what would happen to the inequality if you replace a and b by ±a and ±b? ♦


Using some trigonometry, we get

$$\frac{\sin a + \sin b}{2} = \sin\left(\frac{a+b}{2}\right) \cos\left(\frac{a-b}{2}\right).$$

How does the RHS compare with $\sin\left(\frac{a+b}{2}\right)$? Although $\left|\cos\left(\frac{a-b}{2}\right)\right| \le 1$, we have to be careful with signs, or we will get the inequality in the wrong direction! If −π ≤ a, b ≤ 0, then $\sin\left(\frac{a+b}{2}\right) \le 0$ and $1 \ge \cos\left(\frac{a-b}{2}\right) \ge 0$ (why?), so convexity of sin x on [−π, 0] follows from the inequality

$$\frac{\sin a + \sin b}{2} = \sin\left(\frac{a+b}{2}\right) \cos\left(\frac{a-b}{2}\right) \ge \sin\left(\frac{a+b}{2}\right).$$

To apply the First Derivative Test, we find out where derivatives increase:
• $(x^2)' = 2x$ increases for all x, so $x^2$ is convex on R;
• $(x^3)' = 3x^2$ increases for all x ≥ 0, so $x^3$ is convex on [0, ∞);
• $(\sin x)' = \cos x$ increases for −π ≤ x ≤ 0, so sin x is convex there.

For the reader interested in applying the First Derivative Test to the functions in Exercise 10, here are the corresponding derivatives:
• $(x^r)' = rx^{r-1}$, $(-x^r)' = -rx^{r-1}$, $\left(\frac{r}{s+x}\right)' = \frac{-r}{(s+x)^2}$, $(-\ln x)' = -\frac{1}{x}$,
• $(-\sin x)' = -\cos x$, $(-\cos x)' = \sin x$, $(\tan x)' = \frac{1}{\cos^2 x}$, $(e^x)' = e^x$.

Finally we check convexity with the Second Derivative Test:
• $(x^2)'' = (2x)' = 2 > 0$, so $x^2$ is convex on R;
• $(x^3)'' = (3x^2)' = 6x > 0$ for all x > 0, so $x^3$ is convex on [0, ∞);
• $(\sin x)'' = (\cos x)' = -\sin x \ge 0$ on [−π, 0], so sin x is convex there.

Lemma 5. Apply JI to the convex function f(x) = − ln x on (0, ∞):

$$-\frac{\ln x_1 + \cdots + \ln x_n}{n} \ge -\ln\left(\frac{x_1 + \cdots + x_n}{n}\right) \quad\text{implies}\quad \frac{1}{n} \ln(x_1 \cdots x_n) \le \ln\left(\frac{x_1 + \cdots + x_n}{n}\right),$$

so $\ln \sqrt[n]{x_1 \cdots x_n} \le \ln\left(\frac{x_1 + \cdots + x_n}{n}\right)$, or $P_0 = \sqrt[n]{x_1 \cdots x_n} \le \frac{x_1 + \cdots + x_n}{n} = P_1$, with equality iff $x_1 = x_2 = \cdots = x_n$.

For $P_r \ge P_s$ with r ≥ s > 0, we apply JI to $x_1^s, x_2^s, \dots, x_n^s > 0$ and the function $g(x) = x^{r/s}$, which is convex on [0, ∞) because $\frac{r}{s} \ge 1$:

$$\frac{(x_1^s)^{r/s} + \cdots + (x_n^s)^{r/s}}{n} \overset{\text{JI}}{\ge} \left(\frac{x_1^s + \cdots + x_n^s}{n}\right)^{r/s}.$$

Taking rth roots on both sides gives

$$P_r = \left(\frac{x_1^r + \cdots + x_n^r}{n}\right)^{1/r} \ge \left(\frac{x_1^s + \cdots + x_n^s}{n}\right)^{1/s} = P_s.$$



Exercise 12. To discover the needed convex function, reason backward:

$$x^x \overset{?}{\ge} \left(\frac{x+1}{2}\right)^{x+1} \iff \ln x^x \overset{?}{\ge} \ln\left(\frac{x+1}{2}\right)^{x+1} \iff x \ln x \overset{?}{\ge} (x+1) \ln\left(\frac{x+1}{2}\right).$$

The function f(x) = x ln x participates on both sides of the last inequality, so we check if f(x) is strictly convex for x > 0. This is true by the First Derivative Test since $f'(x) = \ln x + 1$ is strictly increasing for x > 0. By the definition of convexity of f(x) = x ln x on [x, 1] (or [1, x]) and t = 1/2,

$$\frac{f(x) + f(1)}{2} \ge f\left(\frac{x+1}{2}\right), \quad\text{so}\quad \frac{x \ln x + 1 \ln 1}{2} \ge \frac{x+1}{2} \ln\left(\frac{x+1}{2}\right),$$

which simplifies to the desired inequality, with equality iff x = 1.

Exercise 13. Let $x_i$ be as in the hint, so $x_i \in (0, \pi)$ and $x_1 + \cdots + x_n = \pi$. Then the length of the ith side of the n-gon is $2R \sin x_i$, where R is the radius of the circle. Thus, we need to maximize the sum of all $\sin x_i$. But f(x) = − sin x is convex on [0, π], hence by JI:

$$(4) \qquad -\frac{\sin x_1 + \cdots + \sin x_n}{n} \ge -\sin\left(\frac{x_1 + \cdots + x_n}{n}\right) = -\sin\frac{\pi}{n}.$$

Therefore, $\sin x_1 + \cdots + \sin x_n \le n \sin\frac{\pi}{n}$, with equality iff $x_1 = \cdots = x_n$, i.e., the polygon is regular.

Exercise 14. Modifying slightly the solution in Exercise 13, we immediately obtain the second inequality: $\sin\theta_1 + \cdots + \sin\theta_n \le n \sin(2\pi/n)$, with equality iff all $\theta_i$'s are equal. For the first inequality, note that there are at least four angles $\theta_1$, $\theta_2$, $\theta_3$, and $\theta_4$, or else their sum would be at most 3π/2 and not 2π. We arrange the angles $\theta_i$ in decreasing order and apply HLP to the convex function f(x) = − sin x on [0, π/2] and to the sequences $a_1 = \cdots = a_4 = \frac{\pi}{2}$, $a_5 = \cdots = a_n = 0$, and $b_i = \theta_i$ for all i. (Check that $\{a_n\}$ majorizes $\{b_n\}$.) Thus,

$$-4 \sin\frac{\pi}{2} - 0 - \cdots - 0 \ge -\sin\theta_1 - \cdots - \sin\theta_n \;\Rightarrow\; 4 \le \sin\theta_1 + \cdots + \sin\theta_n,$$

with equality iff four $\theta_i$'s are right angles and the rest are 0s.
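Both bounds of Exercise 14 can be spot-checked on random data. The sketch below assumes, as the solution suggests, that the angles satisfy $0 \le \theta_i \le \pi/2$ and $\theta_1 + \cdots + \theta_n = 2\pi$; the function name and the rejection-sampling scheme are hypothetical choices made for illustration only.

```python
import math
import random

def sine_sum_bounds_hold(n, trials=200, seed=0):
    """Check 4 <= sum(sin theta_i) <= n*sin(2*pi/n) on random admissible angles.

    Admissible (assumed): 0 <= theta_i <= pi/2 and the angles add up to 2*pi.
    Requires n >= 5, since for n = 4 the only admissible choice is four right angles.
    """
    rng = random.Random(seed)
    upper = n * math.sin(2 * math.pi / n)
    accepted = 0
    while accepted < trials:
        thetas = [rng.uniform(0, math.pi / 2) for _ in range(n - 1)]
        last = 2 * math.pi - sum(thetas)     # forces the total to be 2*pi
        if not 0 <= last <= math.pi / 2:
            continue                         # reject samples outside the constraints
        thetas.append(last)
        total = sum(math.sin(t) for t in thetas)
        if not (4 - 1e-12 <= total <= upper + 1e-12):
            return False
        accepted += 1
    return True

print(sine_sum_bounds_hold(6))
```

A random test like this cannot replace the HLP and JI arguments; it only guards against a wrong direction of an inequality.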

Lemma 6. Arrange the numbers in JI in decreasing order: $x_1 \ge \cdots \ge x_n$, and let c be their average $(x_1 + \cdots + x_n)/n$. Note that the average of $x_1, x_2, \dots, x_k$ decreases when we include the next $x_{k+1}$. Indeed,

$$\frac{x_1 + \cdots + x_k}{k} \overset{?}{\ge} \frac{x_1 + \cdots + x_k + x_{k+1}}{k+1} \;\Leftrightarrow\; (k+1)(x_1 + \cdots + x_k) \overset{?}{\ge} k(x_1 + \cdots + x_k + x_{k+1}) \;\Leftrightarrow\; x_1 + \cdots + x_k \overset{?}{\ge} k x_{k+1},$$

and the latter is certainly true since each number on the LHS is $\ge x_{k+1}$. In particular, the smallest average is the total average c, so $x_1 + \cdots + x_k \ge kc$. This means that the sequence $x_1, \dots, x_n$ majorizes the constant sequence c, . . . , c. Applying HLP to the convex function f(x), we arrive at JI:

$$f(x_1) + f(x_2) + \cdots + f(x_n) \ge f(c) + f(c) + \cdots + f(c) = n f(c).$$

Hence

$$\frac{f(x_1) + f(x_2) + \cdots + f(x_n)}{n} \ge f\left(\frac{x_1 + x_2 + \cdots + x_n}{n}\right).$$

Lemma 7. Apply the weighted inequalities with $\lambda_1 = \cdots = \lambda_n = \frac{1}{n}$, in which case $\lambda_1 + \cdots + \lambda_n = 1$ as needed. ♦


Lemma 8. If the weights $\lambda_i$ are rational numbers, then, using the lcm of their denominators, we can assume that all of them have the same denominator q: $\lambda_i = \frac{p_i}{q}$ with q a positive integer and the $p_i$'s non-negative integers. Since $\lambda_1 + \lambda_2 + \cdots + \lambda_n = 1$, we have $p_1 + p_2 + \cdots + p_n = q$. Now we construct a list in which each $x_i$ is repeated $p_i$ times, for a total of q variables, i.e.,

$$\{\underbrace{x_1, \dots, x_1}_{p_1}, \underbrace{x_2, \dots, x_2}_{p_2}, \dots, \underbrace{x_n, \dots, x_n}_{p_n}\},$$

and apply the ordinary AM-GM:

$$\lambda_1 x_1 + \cdots + \lambda_n x_n = \frac{p_1 x_1 + \cdots + p_n x_n}{q} \ge \sqrt[q]{x_1^{p_1} \cdots x_n^{p_n}} = x_1^{p_1/q} \cdots x_n^{p_n/q}.$$

The last is $x_1^{\lambda_1} \cdots x_n^{\lambda_n}$, which is the RHS of the weighted AM-GM inequality for non-negative rational weights.

Similarly, JI implies weighted JI, using the same list of repeated variables:

$$\lambda_1 f(x_1) + \cdots + \lambda_n f(x_n) = \frac{p_1 f(x_1) + \cdots + p_n f(x_n)}{q} \overset{\text{JI}}{\ge} f\left(\frac{p_1 x_1 + \cdots + p_n x_n}{q}\right).$$

The last is the desired RHS $f(\lambda_1 x_1 + \cdots + \lambda_n x_n)$ of the weighted JI.

PM too implies weighted PM, with the same list of variables, r ≥ s > 0:

$$P_r = (\lambda_1 x_1^r + \cdots + \lambda_n x_n^r)^{1/r} = \left(\frac{p_1 x_1^r + \cdots + p_n x_n^r}{q}\right)^{1/r} \overset{\text{PM}}{\ge} \left(\frac{p_1 x_1^s + \cdots + p_n x_n^s}{q}\right)^{1/s} = P_s.$$

We leave it to the reader to revise this to include the weighted $P_0$ too.
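The repeated-list construction used for Lemma 8 can be tried on concrete numbers. This is only an illustrative sketch: the helper name `repeated_list` and the sample values and weights are hypothetical, and `math.lcm` assumes Python 3.9 or later.

```python
from fractions import Fraction
from math import isclose, lcm, prod

def repeated_list(xs, weights):
    """Repeat each x_i exactly p_i times, where weights[i] = p_i / q.

    The weights are assumed to be non-negative Fractions adding up to 1;
    q is the lcm of their denominators.
    """
    q = lcm(*(w.denominator for w in weights))
    counts = [int(w * q) for w in weights]   # the integers p_i
    assert sum(counts) == q                  # because the weights add up to 1
    return [x for x, p in zip(xs, counts) for _ in range(p)], q

xs = [2.0, 3.0, 7.0]                                          # sample values
weights = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]    # sample weights

repeated, q = repeated_list(xs, weights)

# Ordinary means of the repeated list ...
ordinary_am = sum(repeated) / q
ordinary_gm = prod(repeated) ** (1.0 / q)
# ... against the weighted means of the original list.
weighted_am = sum(float(w) * x for w, x in zip(weights, xs))
weighted_gm = prod(x ** float(w) for w, x in zip(weights, xs))

print(q, repeated)
print(ordinary_am, weighted_am)
print(ordinary_gm, weighted_gm)
```

Up to rounding, the two arithmetic means agree and the two geometric means agree, which is exactly why the ordinary AM-GM on the repeated list gives the weighted AM-GM.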



Exercise 15. The powers p, q, and r play here the role of the weights $\lambda_i$, but we need to change their order to match all three products on the RHS. Hence, we apply weighted AM-GM three times to the variables {a, b, c} with the weight arrangements (p, q, r), (r, p, q), and (q, r, p):

$$pa + qb + rc \ge a^p b^q c^r, \quad ra + pb + qc \ge a^r b^p c^q, \quad qa + rb + pc \ge a^q b^r c^p.$$

Adding these up and using that p + q + r = 1 yields the desired inequality.

Exercise 16. The triangle inequality ensures that a + b − c > 0, etc. We divide by $a^a b^b c^c$ and take the (a + b + c)th root on both sides:

$$\left(\frac{a+b-c}{a}\right)^a \left(\frac{b+c-a}{b}\right)^b \left(\frac{c+a-b}{c}\right)^c \overset{?}{\le} 1 \;\Leftrightarrow\; \left(\frac{a+b-c}{a}\right)^{\frac{a}{a+b+c}} \left(\frac{b+c-a}{b}\right)^{\frac{b}{a+b+c}} \left(\frac{c+a-b}{c}\right)^{\frac{c}{a+b+c}} \overset{?}{\le} 1.$$

The three exponents add up to 1 and are positive. Hence they can serve as weights $\lambda_i$, making the LHS the weighted GM of $\frac{a+b-c}{a}$, $\frac{b+c-a}{b}$, and $\frac{c+a-b}{c}$, which is less than or equal to the weighted AM, i.e.,

$$\text{LHS} \le \frac{a}{a+b+c} \cdot \frac{a+b-c}{a} + \frac{b}{a+b+c} \cdot \frac{b+c-a}{b} + \frac{c}{a+b+c} \cdot \frac{c+a-b}{c} = \frac{a+b-c}{a+b+c} + \frac{b+c-a}{a+b+c} + \frac{c+a-b}{a+b+c} = \frac{a+b+c}{a+b+c} = 1.$$

Session 10

Multiplicative Functions. Part II

Dirichlet Product and Möbius Inversion

Zvezdelina Stankova

Sneak Preview. This session is a direct continuation of Multiplicative Functions Part I; even the numbering of sections and statements here follows suit. Sum-functions will be generalized via the Dirichlet product $\ast$; arithmetic functions will be inverted via the Möbius function μ; and upon discovery of the Euler function φ, the ∞-Raffle Problem will (yet again!) be conquered in a most elegant way. Occasionally, basic operations on remainders (cf. congruence modulo n in Number Theory I) and knowledge of binomial coefficients and counting techniques (cf. Combinatorics I) will aid our studies. The advanced reader familiar with the theory should ensure that he/she can solve all problems on μ and φ in Subsection 5.7 and Section 7 before moving on to the group structure of M, Dirichlet series, and the Riemann zeta-function in Part III.

4. Dirichlet Product

4.1. Redefining multiplication. The sum-function $S_f(n) := \sum_{d|n} f(d)$, defined in Session 4, is a special case of a much broader notion. Just as you have learned (some time ago) how to multiply numbers, we will learn here how to multiply functions. Certainly, you can do it in the usual way: (f · g)(n) = f(n)g(n), i.e., simply multiply the corresponding values of f and g. For example, id · ε = ε and f · ι = f for any f ∈ A (check it!). But, as will soon become clear, this function product does not capture the basic properties of multiplicative functions in which we are interested, and it certainly does not generalize sum-functions. A different product on the set A of arithmetic functions is called for.

Definition 5. Let f and g be two arithmetic functions. We define their Dirichlet product (a.k.a. Dirichlet convolution) by

$$(12) \qquad (f \ast g)(n) = \sum_{d_1 d_2 = n} f(d_1)\, g(d_2) = \sum_{d|n} f(d)\, g\left(\frac{n}{d}\right).$$


10. DIRICHLET, MÖBIUS, AND EULER

In other words, the sum is taken over all pairs of divisors $(d_1, d_2)$ that multiply to n: $d_1 d_2 = n$. Solving for the divisor $d_2 = n/d_1$ yields the second equivalent summation in (12). For instance,

$$(f \ast g)(6) = \sum_{d|6} f(d)\, g\left(\frac{6}{d}\right) = f(1)g(6) + f(2)g(3) + f(3)g(2) + f(6)g(1).$$

The ordinary function product is now transformed into something entirely different, as demonstrated by the next exercise.

Exercise 10. Calculate $f \ast \varepsilon$ and $f \ast \iota$ for any f ∈ A.

Solution: As ε is 0 except for ε(1) = 1, the Dirichlet products with ε are easy to calculate:

$$(f \ast \varepsilon)(n) = \sum_{d_1 d_2 = n} f(d_1)\, \varepsilon(d_2) = 0 + \cdots + 0 + f(n) \cdot \varepsilon(1) = f(n).$$

Therefore, $f \ast \varepsilon = f$; in particular, $\mathrm{id} \ast \varepsilon = \mathrm{id}$. Next,

$$(f \ast \iota)(n) = \sum_{d|n} f(d)\, \iota\left(\frac{n}{d}\right) = \sum_{d|n} f(d) \cdot 1 = \sum_{d|n} f(d) = S_f(n).$$

Along the way, we have discovered that sum-functions are, not surprisingly, a particular instance of the D-product:¹

Property 1. D-multiplying by ι produces the sum-function:

$$(13) \qquad f \ast \iota = S_f \quad\text{for any } f \in A.$$
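Definition 5 and the two calculations of Exercise 10 translate directly into a few lines of code. The sketch below is illustrative only: the names `divisors`, `dirichlet`, and `sum_function` are hypothetical, arithmetic functions are modeled as ordinary Python functions on positive integers, and it is assumed (as in Part I) that id(n) = n and ι(n) = 1 for all n.

```python
def divisors(n):
    """All positive divisors of n, by brute force."""
    return [d for d in range(1, n + 1) if n % d == 0]

def dirichlet(f, g):
    """D-product of f and g: n -> sum over d | n of f(d) * g(n // d)."""
    return lambda n: sum(f(d) * g(n // d) for d in divisors(n))

def sum_function(f):
    """Sum-function S_f: n -> sum over d | n of f(d)."""
    return lambda n: sum(f(d) for d in divisors(n))

identity = lambda n: n                    # id(n) = n (assumed from Part I)
iota = lambda n: 1                        # iota(n) = 1 (assumed from Part I)
epsilon = lambda n: 1 if n == 1 else 0    # epsilon is 0 except epsilon(1) = 1

f = lambda n: n * n + 1                   # an arbitrary test function

# Exercise 10: f * epsilon = f and f * iota = S_f.
print(all(dirichlet(f, epsilon)(n) == f(n) for n in range(1, 100)))
print(all(dirichlet(f, iota)(n) == sum_function(f)(n) for n in range(1, 100)))
```

Passing functions in and returning a new function mirrors the point of the definition: the D-product of two arithmetic functions is again an arithmetic function.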

Recall the reformulation of ∞-Raffle in Problem 1. It described R as an arithmetic function whose sum-function is id: $S_R = \mathrm{id}$. By Property 1, we can now rewrite the sum-function $S_R$ as the D-product $R \ast \iota = \mathrm{id}$ and have yet another reformulation:

Problem 1 (∞-Raffle). An arithmetic function R satisfies $R \ast \iota = \mathrm{id}$. Solve for R and prove that R(n) ≥ 1 for all n ∈ N.

Can we really solve for R from here? We'll answer this affirmatively in a bit.

4.2. D-product and number-product are alike. Just as multiplication of numbers produces a number (5 · 7 = 35 ∈ N), so does D-multiplication start with two arithmetic functions f and g and produce an arithmetic function $f \ast g$. Formally, we say that $\ast$ is a binary operation² on A, i.e., $\ast : A \times A \to A$ sending the pair $(f, g) \mapsto f \ast g$. The first properties of number-multiplication that you have probably used are commutativity and associativity: mn = nm and (mn)k = m(nk) for all m, n, k ∈ N. The same properties hold true if we extend multiplication to rational, real, or even complex numbers. Likewise,

Property 2. D-product is commutative and associative: $f \ast g = g \ast f$ and $(f \ast g) \ast h = f \ast (g \ast h)$ for all f, g, h ∈ A.

Partial Solution: Commutativity of $\ast$ is automatic from the symmetry of $\ast$ in Definition 5: switching the places of f and g results in the same D-product. That $\ast$ is also associative follows from a convenient way of rewriting Definition 5 for the triple product $(f \ast g) \ast h$:

$$(14) \qquad \big((f \ast g) \ast h\big)(n) = \sum_{d_1 d_2 d_3 = n} f(d_1)\, g(d_2)\, h(d_3),$$

where the sum is taken over all triples $(d_1, d_2, d_3)$ of divisors of n that multiply to n (prove this!). As the RHS of (14) is symmetric with respect to f, g, and h, it also equals $\big(f \ast (g \ast h)\big)(n)$. Thus, we can write: $f \ast g \ast h = (f \ast g) \ast h = f \ast (g \ast h)$.

¹ We will abbreviate "Dirichlet" to "D-" in various expressions from now on.
² We met with binary operations in Complex Numbers I.

4.3. Multiplicative identity. Suppose your little sister asks you: “What is the number 1?” How would you describe 1 to identify it uniquely among all other numbers? Answering “The number 1 signifies one object.” is a circular definition. Saying “1 is such that 1 + 1 = 2.” is no good either: you are defining 1 via another number 2; besides, I prohibit you from using in your description any operation other than multiplication . . . . Well, here is what you will “learn” about 1 in any abstract algebra course: 1 is the unique number such that multiplying any number by 1 gives that number, i.e., n · 1 = 1 · n = n for all n ∈ N. Again, this works equally well in the sets of rational, real, or complex numbers too.

Moving to the set A of arithmetic functions with product $\ast$, the question is: what function plays the role of “1” and deserves to be called the multiplicative identity of A? Our calculations in Exercise 10 point to the answer:

Property 3. With respect to the D-product, the multiplicative identity in A is the two-valued function ε, i.e., $f \ast \varepsilon = \varepsilon \ast f = f$ for all f ∈ A.

Note that any multiplicative identity (if it exists) is unique: this is well known in abstract algebra. Indeed, in our context, if ε′ is another multiplicative identity in A, then $\varepsilon' = \varepsilon' \ast \varepsilon = \varepsilon$ (why?), implying uniqueness of ε.

At this point it is worth comparing the D-product with the usual product of functions. The ordinary product is commutative and associative: f · g = g · f and (f · g) · h = f · (g · h), but the multiplicative identity with respect to it is the function ι: as observed earlier, f · ι = ι · f = f for any f ∈ A. With respect to the D-product, ι is not the multiplicative identity, but has the nice property of transforming each function f into its sum-function $S_f$: $\iota \ast f = f \ast \iota = S_f$. The two types of function products $\ast$ and · have started to diverge, and they will continue to do so as we study our next notion.


4.4. Multiplicative inverses. Our inquisitive little sister is bothering us again: “What is the number $\frac{1}{3}$?” You could reply: “$\frac{1}{3}$ is 1 divided by 3.” But we haven't defined division yet! “$\frac{1}{3}$ is the reciprocal of 3.” Likewise, what is a reciprocal? Correction: “$\frac{1}{3}$ is that number which, added 3 times to itself, gives 1.” Yet, addition must not be used either in this description. . . . A last attempt finally does it: “$\frac{1}{3}$ is the number which, multiplied by 3, gives 1.” That's right! We can define now the reciprocal $n^{-1}$ (a.k.a. multiplicative inverse) of any number n as the solution x to the equation n · x = x · n = 1.³ We make an analogous definition in the set of arithmetic functions:

Definition 6. The Dirichlet inverse $f^{-1}$ of f ∈ A is an arithmetic function whose D-product with f is the multiplicative identity ε: $f \ast f^{-1} = f^{-1} \ast f = \varepsilon$.

Uniqueness of multiplicative inverses is another well-known fact from abstract algebra. In our context, if $\hat{f}^{-1}$ is another D-inverse of f, the trick is to calculate a specific triple D-product (in the middle below):

$$\hat{f}^{-1} = \hat{f}^{-1} \ast \varepsilon = \hat{f}^{-1} \ast (f \ast f^{-1}) \overset{\text{assoc.}}{=} (\hat{f}^{-1} \ast f) \ast f^{-1} = \varepsilon \ast f^{-1} = f^{-1},$$

from which $\hat{f}^{-1} = f^{-1}$ and the D-inverse of f is unique.

As for existence, unfortunately, not all arithmetic functions have D-inverses in A. This shouldn't be surprising since not all numbers have reciprocals either: how about 0? The lemma below addresses this issue, but its proof is deferred until Part III, as it is not crucial for understanding the material before then.

Lemma 3. An arithmetic function f has a D-inverse iff f(1) ≠ 0.⁴

Because ε is the multiplicative identity in A, it is its own D-inverse: $\varepsilon^{-1} = \varepsilon$ (why?). Calculating D-inverses in general is far from trivial even for strongly multiplicative functions, as we will see in Problem 10.

4.5. ∞-Raffle challenge. Ultimately, we want to solve for the function R from $R \ast \iota = \mathrm{id}$. If we were dealing with (rational) numbers instead, this wouldn't have been a problem. To solve x a = b for x, we would multiply both sides by the reciprocal of a and arrive at $x = b\,a^{-1}$. As long as we can calculate the D-inverse $\iota^{-1}$, we can apply the same logic to our equation for R:

$$R \ast \iota = \mathrm{id} \;\overset{\ast\,\iota^{-1}}{\Leftrightarrow}\; (R \ast \iota) \ast \iota^{-1} = \mathrm{id} \ast \iota^{-1} \;\overset{\text{assoc.}}{\Leftrightarrow}\; R \ast (\iota \ast \iota^{-1}) = \mathrm{id} \ast \iota^{-1} \;\Leftrightarrow\; R \ast \varepsilon = \mathrm{id} \ast \iota^{-1} \;\Leftrightarrow\; R = \mathrm{id} \ast \iota^{-1}.$$

Our work in the next section is cut out for us: we need to find a formula for $\iota^{-1}$, which, incidentally, will help us understand much better the relationship between any function and its sum-function.

³ As reciprocals do not exist within the set of natural numbers (except for $1^{-1} = 1$), we need to enlarge N to include at least the positive rational numbers. From now on, “numbers” will refer to natural, rational, real, or complex numbers, as needed.
⁴ For the group theory fans: Lemma 3 implies that A is not a group under $\ast$. However, we'll see that its subset $A^*$ of all arithmetic functions f with f(1) ≠ 0 is a group.
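Even before a closed formula for $\iota^{-1}$ is found, its values can be computed one at a time from the defining equation $f \ast f^{-1} = \varepsilon$, and then $R = \mathrm{id} \ast \iota^{-1}$ can be tabulated. The sketch below is an illustration under assumptions, not the method of the text: the names are hypothetical, id(n) = n and ι(n) = 1 are assumed as in Part I, and the recursion simply isolates the term f(1)·g(n) in the sum for (f ∗ g)(n).

```python
from fractions import Fraction
from functools import lru_cache

def divisors(n):
    """All positive divisors of n, by brute force."""
    return [d for d in range(1, n + 1) if n % d == 0]

def dirichlet(f, g):
    """D-product of f and g: n -> sum over d | n of f(d) * g(n // d)."""
    return lambda n: sum(f(d) * g(n // d) for d in divisors(n))

def dirichlet_inverse(f):
    """D-inverse of f, computed recursively from f * g = epsilon (needs f(1) != 0)."""
    if f(1) == 0:
        raise ValueError("no D-inverse when f(1) = 0")

    @lru_cache(maxsize=None)
    def g(n):
        if n == 1:
            return Fraction(1, f(1))
        # (f * g)(n) = 0 for n > 1; move every term except f(1) * g(n) across.
        rest = sum(f(n // d) * g(d) for d in divisors(n) if d < n)
        return -rest / f(1)

    return g

identity = lambda n: n     # id(n) = n (assumed from Part I)
iota = lambda n: 1         # iota(n) = 1 (assumed from Part I)

iota_inv = dirichlet_inverse(iota)
R = dirichlet(identity, iota_inv)    # R = id * iota^{-1}
S_R = dirichlet(R, iota)             # the sum-function of R

print([int(iota_inv(n)) for n in (1, 2, 4, 8, 6, 30)])   # sample values of iota^{-1}
print([int(R(n)) for n in range(1, 11)])                 # values R(1), ..., R(10)
print(all(S_R(n) == n and R(n) >= 1 for n in range(1, 200)))
```

Such a computation proves nothing for all n, but it gives data against which a guessed formula for $\iota^{-1}$ can be compared.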


5. Möbius Inversion Formula

5.1. Ad-hocking $\iota^{-1}$. For simplicity, denote the D-inverse of ι by g ∈ A, i.e., $g \ast \iota = \varepsilon$. But $g \ast \iota$ is the sum-function $S_g$; hence, $S_g = \varepsilon$. Thus,

$$(15) \qquad g(1) = \varepsilon(1) = 1 \quad\text{and}\quad \sum_{d|n} g(d) = \varepsilon(n) = 0 \;\text{ for } n \ge 2.$$

To find an explicit formula for g, we will use the ad-hoc⁵ approach of PST 27, guess the formula, and then prove it rigorously. In the following, we have done some initial calculations for g(n). Try to come up with these calculations on your own and then compare your answers with those in the table.

n | $S_g(n) = 0$ for n ≥ 2 | g(n)
p | $g(1) + g(p) = 0$ | −1
$p^2$ | $g(1) + g(p) + g(p^2) = 0$ | 0
$p^3$ | $g(p^3) = 0$; why? | 0
$p_1 p_2$ | $g(1) + g(p_1) + g(p_2) + g(p_1 p_2) = 0$ | 1
$p_1 p_2 p_3$ | $g(1) + \sum_{i=1}^{3} g(p_i) + \sum_{i<j} g(p_i p_j) + g(p_1 p_2 p_3) = 0$ | −1

… $\mu(d) = \mu(1) + \sum_{k=1}^{r} \sum_{1 \le i_1 \cdots} \mu(p_{i_1} \cdots p_{i_k})$. …

… 1 where all $p_i$'s are odd, and we can apply the formula for φ:

$$2^n = \varphi(p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r}) = p_1^{a_1 - 1} p_2^{a_2 - 1} \cdots p_r^{a_r - 1} (p_1 - 1)(p_2 - 1) \cdots (p_r - 1).$$

Therefore, all $a_i = 1$ so that $2^n = (p_1 - 1)(p_2 - 1) \cdots (p_r - 1)$. This forces each $p_i$ to be a Fermat prime: $p_i = 2^{2^{k_i}} + 1$ for some $k_i \ge 0$ (why?). The problem is reduced to finding all sets of Fermat primes $\{p_i\}$ such that

$$(26) \qquad p_1 p_2 \cdots p_r + 1 = 2^{n+1} = 2(p_1 - 1)(p_2 - 1) \cdots (p_r - 1);$$

or, equivalently,

$$(27) \qquad \left(2^{2^{k_1}} + 1\right)\left(2^{2^{k_2}} + 1\right) \cdots \left(2^{2^{k_r}} + 1\right) + 1 = 2 \cdot 2^{2^{k_1}} \cdot 2^{2^{k_2}} \cdots 2^{2^{k_r}}.$$

At this point, the problem changes direction and demands some skills with modulo calculations (cf. Number Theory I).

PST 80. Order the primes p1 < p2 < · · · < pr . To find out the smallest k Fermat prime p1 = 22 1 + 1 appearing, check both sides of the equation using a suitable modulus. After substituting this smallest p1 , find out the next p2 again by using a suitable modulus. Continue until you find all p1 , p2 ,. . . , pr and reach a contradiction for pr+1 . For a beginner, it may be tricky to figure out these suitable moduli: k k they are the powers 22 i . Indeed, the smallest among them, 22 1 , is suitable k for calculating p1 because p1 ≡ 1 (mod 22 1 ) and because all other appearing k Fermat primes are also ≡ 1 (mod 22 1 ) (why?). Equation (27) “miraculously” k k simplifies to 1 · 1 · · · 1 + 1 ≡ 2 · 0 · 0 · · · 0 (mod 22 1 ), i.e., 2 ≡ 0 (mod 22 1 ).

10. DIRICHLET, MÖBIUS, AND EULER

The latter simply means that $2^{2^{k_1}} \mid 2$, i.e., $k_1 = 0$ and $p_1 = 3$ is the smallest Fermat prime appearing in (27). Substituting $p_1 = 3$, we obtain:

$$3\left(2^{2^{k_2}}+1\right)\cdots\left(2^{2^{k_r}}+1\right) + 1 = 2^2\cdot 2^{2^{k_2}}\cdots 2^{2^{k_r}}. \tag{28}$$

As suggested earlier, in order to find $k_2$ and $p_2$, we reduce equation (28) modulo $2^{2^{k_2}}$. We leave the reader to follow the directive of PST 80 and finish the problem in this fashion. Along the way, the product of consecutive Fermat numbers will keep displaying the following pretty pattern:

Lemma 6. $F_{k+1} = F_0F_1F_2\cdots F_k + 2$ for all $k \ge 0$.

For instance, $F_4 = 65{,}537 = 3\cdot 5\cdot 17\cdot 257 + 2 = F_0F_1F_2F_3 + 2$. The lemma would provide serious shortcuts in your solution to $\varphi(2^{n+1}-1) = 2^n$. The final answers are $n = 2^s - 1$ for $s = 0, 1, \ldots, 5$. ♦

Problem 16. Solve the equation $\varphi(\varphi(n)) = 2^{13}3^3$.

Hint: This problem definitely resembles Exercise 17, where a correct calculation would have yielded $\varphi(\varphi(10!)) = 2^{13}3^3$. Aha, we know one solution to our current problem: $n = 10! = 2^8 3^4 5^2 7$. Alas, there are plenty more! A few that come to mind are $n = 2^{12}3^5$, $2^9 3^5 5^2$, $2^9 3^3\cdot 11\cdot 13$, $2^3\cdot 7\cdot 11\cdot 13\cdot 17\cdot 19\cdot 37$, … As you try to organize the array of all possibilities for n, you will be forced to generalize the Fermat primes to primes of the form $2^k3^m + 1$ for some $k, m \ge 0$. To classify the latter must be even harder than to classify the “regular” Fermat primes $2^{2^k}+1$. Luckily, in Problem 16 there is a natural bound on the cases that need to be checked (what upper bound? cf. PST 78), which allows for computer attacks on the solution. Is there a way to extract “human sense” from such a computer solution? Is there a unified approach to resolve all equations of the form $\varphi(\varphi(n)) = 2^a3^b$ where $a, b \ge 0$? ♦
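The claims around Lemma 6 lend themselves to a quick machine check. The following is a minimal sketch, not part of the original solution: the helper names `phi`, `fermat`, `lemma6_holds` and `totient_power_solutions` are made up for this illustration, and φ is computed by plain trial division, which is adequate only for the small inputs used here.

```python
def phi(n: int) -> int:
    """Euler's totient, computed by trial-division factorization."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p  # multiply result by (1 - 1/p)
        p += 1
    if m > 1:  # one prime factor larger than sqrt(n) may remain
        result -= result // m
    return result


def fermat(k: int) -> int:
    """The k-th Fermat number F_k = 2^(2^k) + 1."""
    return 2 ** (2 ** k) + 1


def lemma6_holds(k: int) -> bool:
    """Check F_{k+1} = F_0 F_1 ... F_k + 2 for one value of k."""
    product = 1
    for j in range(k + 1):
        product *= fermat(j)
    return fermat(k + 1) == product + 2


def totient_power_solutions(limit: int) -> list:
    """All n < limit with phi(2^(n+1) - 1) = 2^n."""
    return [n for n in range(limit) if phi(2 ** (n + 1) - 1) == 2 ** n]
```

Calling `totient_power_solutions(32)` searches only n < 32, so it can confirm the six listed answers but says nothing about larger n; that part still needs the modular argument of PST 80.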

8. Hints and Solutions to Selected Problems

Property 2. Rewrite Definition 5 for the triple product $(f * g) * h$ as follows:

$$\big((f*g)*h\big)(n) = \sum_{dd_3=n}(f*g)(d)\,h(d_3) = \sum_{dd_3=n}\Big(\sum_{d_1d_2=d} f(d_1)\,g(d_2)\Big)h(d_3) = \sum_{d_1d_2d_3=n} f(d_1)\,g(d_2)\,h(d_3),$$

which fills the gap in showing associativity of $*$ in the text.



Exercise 11. Let m and n be relatively prime. If one of them is not square-free, then mn is not square-free either; thus, both sides of $\mu(m)\mu(n) \overset{?}{=} \mu(mn)$ are 0 by μ’s definition and equality holds. If both m and n are square-free, their prime decompositions $m = q_1q_2\cdots q_s$ and $n = p_1p_2\cdots p_r$ multiply to a square-free prime decomposition: $mn = q_1q_2\cdots q_s\,p_1p_2\cdots p_r$. Thus, $\mu(m)\mu(n) = (-1)^s(-1)^r = (-1)^{s+r} = \mu(mn)$, and μ is multiplicative.


Exercise 12. (a) The initial purpose of discovering the function μ was to find the D-inverse of ι, i.e., $\mu * \iota = \varepsilon$ and $S_\mu = \varepsilon$.

(b) The D-product $\mu * \mu$ is multiplicative since $\mu \in M$, which reduces the calculation to prime powers: $(\mu*\mu)(p^a) = \sum_{a_1+a_2=a}\mu(p^{a_1})\mu(p^{a_2})$. If $a \ge 3$, then we always have $a_1 \ge 2$ or $a_2 \ge 2$, so every term contains a zero μ-value and the overall answer is 0. The remaining cases (with $a \le 2$) are:

• $(\mu*\mu)(p) = \mu(1)\mu(p) + \mu(p)\mu(1) = -2$;

• $(\mu*\mu)(p^2) = \mu(1)\mu(p^2) + \mu(p)\mu(p) + \mu(p^2)\mu(1) = 1$.

Multiplying all answers for the prime powers $p^a$ produces the final formula:

$$(\mu^2)(n) = \begin{cases} (-2)^k & \text{if } n \text{ is cube-free};\\ 0 & \text{otherwise},\end{cases}$$

where k counts the primes $p_i$ such that $p_i \mid n$ but $p_i^2 \nmid n$. Thus, $S_{\mu^2} = \mu$.
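As a sanity check on Exercise 12(b), here is a small sketch that compares the D-product μ ∗ μ, computed directly from the definition, with the closed formula above. All function names (`divisors`, `mobius`, `dirichlet`, `summation`, `mu_star_mu_formula`) are hypothetical helpers introduced only for this illustration.

```python
def divisors(n):
    """All positive divisors of n (brute force, fine for small n)."""
    return [d for d in range(1, n + 1) if n % d == 0]


def mobius(n):
    """Moebius function via trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:  # p^2 divided the original number
                return 0
            result = -result
        p += 1
    if n > 1:  # one remaining prime factor
        result = -result
    return result


def dirichlet(f, g):
    """D-product: (f * g)(n) = sum of f(d) g(n/d) over d | n."""
    return lambda n: sum(f(d) * g(n // d) for d in divisors(n))


def summation(f):
    """Sum-function S_f(n) = sum of f(d) over d | n."""
    return lambda n: sum(f(d) for d in divisors(n))


def mu_star_mu_formula(n):
    """(-2)^k if n is cube-free, where k counts primes dividing n exactly once."""
    k, p, m = 0, 2, n
    while m > 1:
        if m % p == 0:
            exponent = 0
            while m % p == 0:
                m //= p
                exponent += 1
            if exponent >= 3:
                return 0
            if exponent == 1:
                k += 1
        p += 1
    return (-2) ** k
```

For example, `dirichlet(mobius, mobius)(12)` and `mu_star_mu_formula(12)` should agree, since 12 = 2²·3 is cube-free with exactly one prime (3) of exponent 1.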



Exercise 13. Each relation $f \to S_f$ was verified somewhere in the text. ♦

Exercise 14. The explicit hint in the text leaves almost nothing to do:

$$S_{\mu f}(p^a) = \sum_{j=0}^{a}\mu(p^j)f(p^j) = \mu(1)f(1) + \mu(p)f(p) + 0 = f(1) - f(p);$$

$$(\mu * f)(p^a) = \sum_{j=0}^{a}\mu(p^j)f(p^{a-j}) = \mu(1)f(p^a) + \mu(p)f(p^{a-1}) + 0 = f(p^a) - f(p^{a-1}).$$

Multiplying these prime-power pieces gives the desired formulas.

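Exercise 15. (a)–(b) The sums are $S_{\mu\tau}$ and $S_{\mu\sigma}$, which are multiplicative. If r counts the distinct prime divisors $p_i$ of n, by Exercise 14:

$$S_{\mu\tau}(n) = \prod_{i=1}^{r}(\tau(1) - \tau(p_i)) = \prod_{i=1}^{r}(1 - 2) = (-1)^r, \text{ and}$$

$$S_{\mu\sigma}(n) = \prod_{i=1}^{r}(\sigma(1) - \sigma(p_i)) = \prod_{i=1}^{r}(1 - (1 + p_i)) = \prod_{i=1}^{r}(-p_i) = (-1)^r\prod_{i=1}^{r}p_i.$$

(d)–(f) The sums are $\mu * \tau = \iota$, $\mu * \sigma = \mathrm{id}$, and $\mu * \mathrm{id} = R$, which are the Möbius inversion of $\tau = S_\iota$ (cf. (11)), $S_{\mathrm{id}} = \sigma$, and $S_R = \mathrm{id}$, respectively.

Problem 9. (a) Using $\ln ab = \ln a + \ln b$ and $\ln p^a = a\ln p$, we calculate

$$S_\Lambda(n) = \sum_{d\mid n}\Lambda(d) = \sum_i\ \sum_{p_i^{\alpha_i}\mid n}\Lambda(p_i^{\alpha_i}) + 0 = \sum_i\ \sum_{p_i^{\alpha_i}\mid n}\ln p_i = \sum_{p_i\mid n}a_i\ln p_i$$

$$= \sum_{p_i\mid n}\ln p_i^{a_i} = \ln(p_1^{a_1}p_2^{a_2}\cdots p_r^{a_r}) = \ln n,$$

where each inner sum runs over the exponents $1 \le \alpha_i \le a_i$. Hence, $S_\Lambda = \ln$, and by Möbius inversion, $\mu * \ln = \Lambda$.

(b) Using $\ln(a/b) = \ln a - \ln b$, we convert part (a) into part (b):

$$\sum_{d\mid n}\mu(d)\ln\frac{n}{d} = \sum_{d\mid n}\mu(d)(\ln n - \ln d) = \sum_{d\mid n}\mu(d)\ln n - \sum_{d\mid n}\mu(d)\ln d \overset{*}{=} 0 - S_{\mu\ln}(n).$$

Justify (∗)! Hence, by (a), $S_{\mu\ln} = -\mu * \ln = -\Lambda$, i.e., $\sum_{d\mid n}\mu(d)\ln d = -\Lambda(n)$.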
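Both identities of Problem 9(a)–(b) can be spot-checked numerically. The sketch below assumes the usual definition of Λ (equal to ln p on prime powers pᵃ with a ≥ 1, and 0 elsewhere); the names `mangoldt`, `sum_mangoldt` and `sum_mu_log` are invented for this example, and floating-point logarithms are compared with a tolerance.

```python
import math


def divisors(n):
    """All positive divisors of n (brute force)."""
    return [d for d in range(1, n + 1) if n % d == 0]


def mobius(n):
    """Moebius function via trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result


def mangoldt(n):
    """Lambda(n) = ln p if n = p^a with a >= 1, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            # n was a pure power of p exactly when nothing is left over
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)  # no divisor up to sqrt(n): n itself is prime


def sum_mangoldt(n):
    """S_Lambda(n): sum of Lambda(d) over d | n; should equal ln n."""
    return sum(mangoldt(d) for d in divisors(n))


def sum_mu_log(n):
    """Sum of mu(d) ln d over d | n; should equal -Lambda(n)."""
    return sum(mobius(d) * math.log(d) for d in divisors(n))
```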




(c) The sum is $S_{\mu\pi} = (\mu\pi) * \iota$, which is probably its most compact form. It is worth noting that, since μ “kills” any non-square-free d’s, the sum depends only on the distinct primes $p_1, p_2, \ldots, p_r$ dividing n; i.e., increasing the exponents of prime powers does not change the value of the overall sum (why?). Thus, $S_{\mu\pi}(n) = S_{\mu\pi}(p_1p_2\cdots p_r)$, and the latter can be expanded as “a sum of sums” into a symmetric polynomial in the $p_i$’s. ♦

(d) The sum is the D-product $\mu * \pi$, which (in contrast to the previous sum) does depend on the specific exponents of the prime powers dividing n. Although a “closed form” seems to be out of the question, here is an interesting observation shifting the emphasis of the problem in a different direction:

Problem 17. (Evan O’Dorney) Show that $\sum_{d\mid n}\mu(d)\,\pi\!\left(\frac{n}{d}\right) > 0$ for all $n \in \mathbb{N}$.

Solution: Let $n = p_1^{a_1}p_2^{a_2}\cdots p_r^{a_r} > 1$. For any subset S of $\{p_1, \ldots, p_r\}$ define $\pi(S) = \prod_{p_i\in S}p_i$ and $\pi(S+1) = \prod_{p_i\in S}(p_i+1)$ as the products of all elements of S, or of the shifts up by 1 of all $p_i \in S$. As $\mu(d) = 0$ for non-square-free d’s, the only surviving terms $\mu(d)\pi(n/d)$ correspond to $d = \pi(S)$ for any such subset S. If |S| is the number of elements of S, the LHS of the desired inequality can be written as

$$\sum_{S}(-1)^{|S|}\left(\frac{n}{\pi(S)}\right)^{\tau(n/\pi(S))/2},$$

one term for each such subset S. The “hero term” (i.e., the biggest term) is when S is empty¹³ (why?); it equals $n^{\tau(n)/2}$. The biggest “enemy term” (i.e., the most negative term) must occur for some singleton set (why?). WLOG $S = \{p_1\}$, so that this term (with the negative sign dropped) is $(n/p_1)^{\tau(n/p_1)/2}$. Taking into account that $n \ge 2$, all $a_i \ge 1$, and $2^{r-2} \ge r-1$ for $r \ge 1$ (why?), we bound the hero term from below as follows:

$$n^{\tau(n)/2} = n^{(a_1+1)(a_2+1)\cdots(a_r+1)/2} = n^{(a_2+1)\cdots(a_r+1)/2}\cdot n^{a_1(a_2+1)\cdots(a_r+1)/2}$$

$$> 2^{2^{r-2}}\left(\frac{n}{p_1}\right)^{a_1(a_2+1)\cdots(a_r+1)/2} \ge 2^{r-1}\left(\frac{n}{p_1}\right)^{\tau(n/p_1)/2}.$$

In other words, the hero term is at least $2^{r-1}$ times as large as any enemy term. But each enemy term corresponds to a subset S with an odd number of elements, while each positive term corresponds to a subset S with an even number of elements. A well-known combinatorial problem says the following:

Exercise 22. The number of subsets of a set with r elements is $2^r$. For $r \ge 1$, half of these subsets have an odd number of elements, and the other half have an even number of elements.

Thus, there are exactly $2^{r-1}$ enemy terms, whose sum is already dominated by the single hero term. We conclude that the whole sum $(\mu * \pi)(n) > 0$ for all $n > 1$. ♦

13 Note that for $S = \emptyset$, the empty product $\prod_{p_i\in\emptyset}$ is defined to be 1.
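Problem 17 can also be probed by brute force before (or after) reading the proof. In this sketch π(m) is taken to be the product of all divisors of m, which is how the solution’s hero term n^{τ(n)/2} reads; that interpretation, and the names `product_of_divisors` and `odorney_sum`, are assumptions of this illustration.

```python
from math import prod


def divisors(n):
    """All positive divisors of n (brute force)."""
    return [d for d in range(1, n + 1) if n % d == 0]


def mobius(n):
    """Moebius function via trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result


def product_of_divisors(m):
    """pi(m): the product of all divisors of m, equal to m^(tau(m)/2)."""
    return prod(divisors(m))


def odorney_sum(n):
    """The D-product (mu * pi)(n) = sum of mu(d) pi(n/d) over d | n."""
    return sum(mobius(d) * product_of_divisors(n // d) for d in divisors(n))
```

For n = 6 the four surviving terms are π(6) − π(3) − π(2) + π(1) = 36 − 3 − 2 + 1, with the hero term 36 dominating the two enemy terms.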


Problem 10. The statement is a giveaway: it asks us to find $f^{-1}$ and show that $f^{-1} \in M$. Instead, we go the other way around: we calculate only the prime-power cases $f^{-1}(p^a)$, multiply them together for a multiplicative function, and prove that the resulting formula indeed gives the D-inverse $f^{-1}$. (Compare with PST 96 on order-switching in Geometry III on page 294.)

Initial checks for $f^{-1}$ are based on the definition $f^{-1} * f = \varepsilon$. For n = 1, $f^{-1}(1)f(1) = 1$; as $f \in M$ with $f(1) \ne 0$, we have $f(1) = 1$ so that $f^{-1}(1) = 1$ too. For n = p, a prime,

$$(f^{-1} * f)(p) = f^{-1}(1)f(p) + f^{-1}(p)f(1) = \varepsilon(p) = 0 \;\Rightarrow\; f^{-1}(p) = -f(p).$$

However, for $n = p^2$:

$$(f^{-1} * f)(p^2) = f^{-1}(1)f(p^2) + f^{-1}(p)f(p) + f^{-1}(p^2)f(1) = \varepsilon(p^2) = 0$$

$$\Rightarrow\; f^2(p) - f^2(p) + f^{-1}(p^2) = 0 \;\Rightarrow\; f^{-1}(p^2) = 0.$$

Along the way, we used that f is strongly multiplicative in $f(p^2) = f^2(p)$. Continuing, you will discover that $f^{-1}(p^a) = 0$ when $a \ge 2$. The conjectured multiplicativity of $f^{-1}$ means then that $f^{-1}(n) = 0$ if n is not square-free, which, of course, reminds us of μ. If you think about it for a moment, you will come up with the compact formula $f^{-1} \overset{?}{=} \mu f$.

Now we have to prove that this conjectured formula works. That μf is multiplicative follows directly from Lemma 2: both μ and f are in M, and hence so is their ordinary product μf. We are left to show that μf is indeed the D-inverse of f, i.e., why is $(\mu f) * f = \varepsilon$? As a D-product of two multiplicative functions μf and f, the entire LHS is also multiplicative. So, to calculate it, we split as usual into prime powers:

$$\big((\mu f) * f\big)(p^a) = \sum_{j=0}^{a}(\mu f)(p^j)f(p^{a-j}) = \sum_{j=0}^{a}\mu(p^j)f(p^j)f(p^{a-j})$$

$$= \mu(1)f(1)f(p^a) + \mu(p)f(p)f(p^{a-1}) + 0 \overset{f\in S}{=} f^a(p) - f(p)f^{a-1}(p) = 0.$$

Multiplying all such pieces yields $\big((\mu f) * f\big)(n) = 0$, except for n = 1: then $\big((\mu f) * f\big)(1) = \mu(1)f(1)f(1) = 1$. Thus, $(\mu f) * f = \varepsilon$, and by uniqueness of D-inverses (cf. p. 236) $f^{-1} = \mu f$ for any $f \in S^*$.

Since $\varepsilon * \varepsilon = \varepsilon$, then $\varepsilon^{-1} = \varepsilon$; and we already know that $\iota^{-1} = \mu$. Our formula checks for both cases: $\mu\varepsilon = \varepsilon$ (why?) and $\mu\iota = \mu$ (ι is the multiplicative identity wrt “·”), so that $\varepsilon^{-1} = \mu\varepsilon$ and $\iota^{-1} = \mu\iota$. As for $\mathrm{id}^{-1} = \mu\cdot\mathrm{id}$ ($\mathrm{id} \in S$!), this is the first time we see a formula for $\mathrm{id}^{-1}$. ♦

It is interesting to generalize our formula $f^{-1} = \mu f$ for $f \in S^*$. In analogy with $S^*$, define $M^* := M - \{O\}$.

Exercise 23. Show that $(\mu f) * g = g\cdot\prod_{p\mid n}\left(1 - \frac{f}{g}(p)\right)$ for $f \in M^*$, $g \in S^*$.

Plugging $f = g \in S^*$ quickly yields $(\mu f) * g = \varepsilon$, i.e., $\mu f = f^{-1}$.
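The formula f⁻¹ = μf can be tested on concrete strongly multiplicative functions. The sketch below uses made-up helper names (`dirichlet`, `pointwise`, `epsilon`, `is_d_inverse`) and also shows, on τ, that the formula need not hold for a function that is multiplicative but not strongly multiplicative.

```python
def divisors(n):
    """All positive divisors of n (brute force)."""
    return [d for d in range(1, n + 1) if n % d == 0]


def mobius(n):
    """Moebius function via trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result


def dirichlet(f, g):
    """D-product: (f * g)(n) = sum of f(d) g(n/d) over d | n."""
    return lambda n: sum(f(d) * g(n // d) for d in divisors(n))


def pointwise(f, g):
    """Ordinary product (f g)(n) = f(n) g(n)."""
    return lambda n: f(n) * g(n)


def epsilon(n):
    """D-identity: 1 at n = 1 and 0 elsewhere."""
    return 1 if n == 1 else 0


def tau(n):
    """Number of divisors: multiplicative, but not strongly multiplicative."""
    return len(divisors(n))


def is_d_inverse(candidate, f, limit):
    """Check candidate * f = epsilon for all 1 <= n < limit."""
    product = dirichlet(candidate, f)
    return all(product(n) == epsilon(n) for n in range(1, limit))
```

With `identity = lambda n: n`, the call `is_d_inverse(pointwise(mobius, identity), identity, 200)` tests id⁻¹ = μ·id, while the same call with `tau` fails already at n = 4.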




Exercise 17. The cube-diagram on page 87 splits 10! into prime powers:

$$\varphi(\varphi(10!)) = \varphi(\varphi(2^8 3^4 5^2 7^1)) = \varphi(\varphi(2^8)\varphi(3^4)\varphi(5^2)\varphi(7)) = \varphi((2^8-2^7)(3^4-3^3)(5^2-5)(7-1))$$

$$= \varphi(2^{11}3^4 5^1) = \varphi(2^{11})\varphi(3^4)\varphi(5) = (2^{11}-2^{10})(3^4-3^3)\cdot 4 = 2^{13}3^3.$$

Exercise 18. (a) As $x, y \ge 1$, we can calculate φ as follows: $\varphi(2^x5^y) = \varphi(2^x)\varphi(5^y) = (2^x - 2^{x-1})(5^y - 5^{y-1}) = 2^{x-1}5^{y-1}\cdot 4 = 2^{x+1}5^{y-1}$. Since $80 = 2^4 5^1$, we obtain x = 3 and y = 2.

(b) If φ(n) = 12, then $\prod_i p_i^{a_i-1}(p_i - 1) = 12$. Hence, each $(p_i - 1) \mid 12$. The primes with this property are $p_i$ = 2, 3, 5, 7, and 13. In addition, $p_i^{a_i-1}$ also must divide $12 = 2^2 3^1$; thus, either $a_i = 1$, or $p_i = 2$ with $a_i \le 3$, or $p_i = 3$ with $a_i \le 2$. Overall, $n = 2^{a_1}3^{a_2}5^{a_3}7^{a_4}13^{a_5}$ with $a_1 \le 3$, $a_2 \le 2$, and $a_3, a_4, a_5 \le 1$. These are 4·3·2·2·2 = 96 cases: too many for brute-force!



PST 81. To reduce the number of cases to be checked, study whether a specific prime participates in n, starting with the largest prime!

Case 1. If 13 participates ($a_5 = 1$), then n = 13k where gcd(13, k) = 1. Thus, φ(n) = φ(13)φ(k) = 12φ(k) = 12 and φ(k) = 1. Check that only k = 1, 2 work here, so that n = 13 or 2·13.

Case 2. Similarly, if 7 participates, reduce to φ(k) = 2, which is satisfied only by k = 3, 4, 6 (why?), so that n = 3·7, $2^2$·7, or 2·3·7.

Case 3. If 5 participates, reduce to φ(k) = 3, which never works! (why?).

Case 4. If none of 13, 7, or 5 participates, $n = 2^{a_1}3^{a_2}$. But $\varphi(2^{a_1}) = 2^{a_1-1}$ and $\varphi(3^{a_2}) = 3^{a_2-1}\cdot 2$, neither of which ever equals 12 because of missing factors of 3 or 2. Thus, both 2 and 3 must participate in n with $a_1 = 2, 3$ and $a_2 = 2$ (why?). Check that $n = 2^2 3^2$ works, but $n = 2^3 3^2$ doesn’t. ♦

(c) Our first approach follows the techniques from the text. If φ(n) = n/2, then 2 | n, so that $n = 2^a k$ with k odd and $a \ge 1$. Substituting and canceling yields φ(k) = 1, which we just saw has the only solutions k = 1, 2; as k is odd, k = 1. The final answer is $n = 2^a$ with $a \ge 1$. ♦

In a second approach, we apply the formula for φ and clear denominators:

$$n\prod_i\left(1 - \frac{1}{p_i}\right) = \frac{n}{2} \;\Rightarrow\; 2(p_1 - 1)(p_2 - 1)\cdots(p_r - 1) = p_1p_2\cdots p_r.$$

As 2 | LHS, $p_1 = 2$ on the RHS. Canceling results in $(p_2 - 1)\cdots(p_r - 1) = p_2\cdots p_r$, which doesn’t have solutions. Indeed, any further prime $p_2$ would be odd; hence, $p_2 - 1$ would be even, making the LHS even; but all primes on the RHS are odd, which makes the RHS odd! Therefore, the only participating prime is $p_1 = 2$, and $n = 2^a$ with $a \ge 1$.
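The 96 candidates of Exercise 18(b) are too many to check by hand, but trivial for a machine, which gives an independent check on the case analysis of PST 81. This is a minimal sketch; the names `phi`, `candidates` and `solutions` are chosen for the illustration only.

```python
from itertools import product


def phi(n):
    """Euler's totient, computed by trial-division factorization."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result


# n = 2^a1 * 3^a2 * 5^a3 * 7^a4 * 13^a5 with a1 <= 3, a2 <= 2, a3, a4, a5 <= 1
candidates = [
    2 ** a1 * 3 ** a2 * 5 ** a3 * 7 ** a4 * 13 ** a5
    for a1, a2, a3, a4, a5 in product(range(4), range(3), range(2), range(2), range(2))
]

# keep only the candidates that actually satisfy phi(n) = 12
solutions = sorted(n for n in candidates if phi(n) == 12)
```

The resulting list should contain exactly the six values found in Cases 1, 2 and 4: 13, 2·13, 3·7, 2²·7, 2·3·7 and 2²·3².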


(d) For the second approach started in the text, consider again (23) and PST 78. The RHS is divisible by 2, possibly by 4 (if $p_1 = 2$), but never by 8 (why?). Correspondingly, the same must be true for the LHS. Each factor $p_i - 1$ is even, except for possibly $p_1 - 1 = 1$ (if $p_1 = 2$). We conclude that there are at most two odd primes $p_2$ and $p_3$, and possibly one even prime $p_1 = 2$. Further, since 3 | LHS, then $p_2 = 3$. So, the only possibilities are

$$3\,\widehat{(2-1)}\,(3-1)\,\widehat{(p_3-1)} = 2\cdot\widehat{2}\cdot 3\cdot\widehat{p_3} \;\Rightarrow\; \widehat{(p_3-1)} = \widehat{2}\cdot\widehat{p_3},$$

where anything under a hat could be missing. But as $p_3 - 1$ and $p_3$ are relatively prime, if $p_3$ occurs then $(p_3 - 1) \mid \widehat{2}$, i.e., $p_3 = 3$, contradicting $p_2 = 3$. We are left with $1 = \widehat{2}$, which simply means that $p_1 = 2$ does not participate either. This leads to the final answer $n = 3^a$ with $a \ge 1$.

Alternatively, in (23), 3 | LHS hence 3 | RHS; so WLOG $p_1 = 3$. Dividing through by 6, $(p_2 - 1)\cdots(p_r - 1) = p_2\cdots p_r$. Thus LHS < RHS, unless there are no primes $p_2, p_3, \ldots, p_r$, i.e., r = 1 and n is a power of 3.

Exercise 19. All sums represent multiplicative functions. The only unclear case may be in (d): $S_{\mu/\varphi}$. As in Lemma 2, the ordinary ratio f/g of two multiplicative functions is also multiplicative, provided we can actually divide by g. We don’t have a problem with division by φ since φ(n) ≥ 1 for all n (as incidentally, the ∞-Raffle puzzle also required). The end results are:

(a) $S_{\mu\varphi}(n) = \prod_i(2 - p_i)$;

(b) $(\mu * \varphi)(n) = n\prod_i\left(1 - \frac{1}{p_i}\right)^2 = \frac{\varphi^2(n)}{n}$ when every exponent $a_i \ge 2$ (a prime with $a_i = 1$ contributes the factor $p_i - 2$ instead, since $(\mu*\varphi)(p) = \varphi(p) - \varphi(1) = p - 2$);

(c) $S_{\mu^2\varphi^2}(n) = \prod_i\big((p_i - 1)^2 + 1\big)$;

(d) $S_{\mu/\varphi}(n) = \prod_i\frac{p_i - 2}{p_i - 1}$.





Exercise 20. Since n and $n^k$ have identical prime divisors $\{p_i\}$, we obtain:

$$\varphi(n^k) = n^k\prod_i\left(1 - \frac{1}{p_i}\right) = n^{k-1}\left(n\prod_i\left(1 - \frac{1}{p_i}\right)\right) = n^{k-1}\varphi(n).$$

Proof of (24). The text points to the prime decompositions of gcd(m, n) = (m, n) and lcm(m, n) = [m, n]. Let $n = \prod_{i=1}^{r}p_i^{a_i}$ and $m = \prod_{i=1}^{r}p_i^{b_i}$ for the same primes $p_i$, with $a_i \ge 0$ and $b_i \ge 0$. Then

$$(m, n) = p_1^{\min\{a_1,b_1\}}p_2^{\min\{a_2,b_2\}}\cdots p_r^{\min\{a_r,b_r\}}, \text{ and} \tag{29}$$

$$[m, n] = p_1^{\max\{a_1,b_1\}}p_2^{\max\{a_2,b_2\}}\cdots p_r^{\max\{a_r,b_r\}}, \tag{30}$$

because the gcd picks the smaller of the two prime powers $p_i^{a_i}$ and $p_i^{b_i}$, while the lcm picks the larger. For example, for $4 = 2^2 3^0$ and $6 = 2^1 3^1$, we obtain $(4, 6) = 2^1 3^0 = 2$ and $[4, 6] = 2^2 3^1 = 12$. ♦

Now suppose that $a_i \le b_i$ for some i. Then $\min\{a_i, b_i\} = a_i$ and $\max\{a_i, b_i\} = b_i$, so that the product $p_i^{\min\{a_i,b_i\}}\,p_i^{\max\{a_i,b_i\}} = p_i^{a_i+b_i}$. The same result holds true if $a_i \ge b_i$. Thus, multiplying expressions for (m, n) and [m, n] in (29)–(30) yields:

$$\prod_i p_i^{\min\{a_i,b_i\}}\ \prod_i p_i^{\max\{a_i,b_i\}} = \prod_{i=1}^{r}p_i^{a_i+b_i} = \prod_{i=1}^{r}p_i^{a_i}\ \prod_{i=1}^{r}p_i^{b_i} = mn.$$




Exercise 21. (a) Since $f \in M$, we can split everything along the corresponding prime powers and use (24). To establish the equality, it will be, therefore, sufficient to compare only the resulting prime-power pieces:

$$f(p_i^{b_i})\,f(p_i^{a_i}) \overset{?}{=} f\big(p_i^{\min\{a_i,b_i\}}\big)\,f\big(p_i^{\max\{a_i,b_i\}}\big). \tag{31}$$

This is true, as seen before: either $\min\{a_i, b_i\} = a_i$ and $\max\{a_i, b_i\} = b_i$, or the other way around. Multiplying (31) for all i gives the desired relation

$$f(m)f(n) = f\big((m, n)\big)\,f\big([m, n]\big).$$

(b) We apply the formula for φ, keeping in mind that the prime divisors of mn and of [m, n] comprise the same set $\{p_1, p_2, \ldots, p_r\}$ (why?):

$$(m, n)\,\varphi([m, n]) \overset{\varphi}{=} (m, n)[m, n]\prod_i\left(1 - \frac{1}{p_i}\right) \overset{(24)}{=} mn\prod_i\left(1 - \frac{1}{p_i}\right) \overset{\varphi}{=} \varphi(mn).$$
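Formulas (29)–(30), relation (24), and both parts of Exercise 21 (with f = φ) can be checked together by building the gcd and lcm directly from prime exponents. A sketch under that approach follows; `factorize`, `gcd_lcm` and `phi` are names made up for this example.

```python
from math import prod


def factorize(n):
    """Prime factorization of n as a dict {prime: exponent}."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def gcd_lcm(m, n):
    """(m, n) and [m, n] from min/max of prime exponents, as in (29)-(30)."""
    fm, fn = factorize(m), factorize(n)
    primes = set(fm) | set(fn)
    g = prod(p ** min(fm.get(p, 0), fn.get(p, 0)) for p in primes)
    l = prod(p ** max(fm.get(p, 0), fn.get(p, 0)) for p in primes)
    return g, l


def phi(n):
    """Euler's totient from the factorization: product of p^(a-1) (p - 1)."""
    return prod(p ** (a - 1) * (p - 1) for p, a in factorize(n).items())
```

For instance, `gcd_lcm(4, 6)` reproduces the example (4, 6) = 2 and [4, 6] = 12, and then φ(24) = 8 = 2·φ(12) illustrates part (b).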

Trivial examples that satisfy (b) in place of φ are O, ε, and id; but none of ι, μ, τ , or σ works. In fact,



Exercise 24. For any $f \in M$ show that Exercises 20 and 21(b), with f in place of φ, are equivalent:

$$f(n^k) = n^{k-1}f(n)\ \ \forall n, k \in \mathbb{N} \iff f(mn) = (m, n)\,f([m, n])\ \ \forall m, n \in \mathbb{N}.$$

Hint: Split along prime powers and rewrite for any prime p as follows:

$$\frac{f(p^k)}{p^k} = \frac{f(p)}{p}\ \ \forall k \in \mathbb{N} \ \overset{?}{\iff}\ \frac{f(p^{a+b})}{p^{a+b}} = \frac{f(p^b)}{p^b}\ \ \forall a, b \in \mathbb{N},\ a \le b.$$

You can further simplify by substituting g(n) = f(n)/n for all $n \in \mathbb{N}$.



Problem 13. To use Gauss’s approach, we split all numbers from 1 to n into pairs {t, n − t}. Note that t is relatively prime to n iff n − t is relatively prime to n; thus, either both numbers t and n − t participate in the sum η(n), or neither of them does. More good news: each pair adds up to n. Finally, to avoid overcounting because of the pairing, we divide by 2 and skip writing that each sum runs over all t’s such that (t, n) = 1 and 1 ≤ t ≤ n:

$$\eta(n) = \frac{1}{2}\sum\big(t + (n - t)\big) = \frac{1}{2}\sum n = \frac{1}{2}\,n\sum 1 \overset{\text{def. }\varphi}{=} \frac{1}{2}\,n\varphi(n).$$
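The pairing argument is easy to test directly from the definitions. In this sketch `eta` and `phi` are computed by brute force over all 1 ≤ t ≤ n with gcd(t, n) = 1; both names are illustrative. The check starts at n = 2, because for n = 1 the pair {t, n − t} = {1, 0} leaves the range 1 ≤ t ≤ n.

```python
from math import gcd


def eta(n):
    """Sum of all t with 1 <= t <= n and gcd(t, n) = 1."""
    return sum(t for t in range(1, n + 1) if gcd(t, n) == 1)


def phi(n):
    """Number of t with 1 <= t <= n and gcd(t, n) = 1."""
    return sum(1 for t in range(1, n + 1) if gcd(t, n) == 1)


def pairing_formula_holds(n):
    """Check 2 * eta(n) = n * phi(n), avoiding fractions."""
    return 2 * eta(n) == n * phi(n)
```

For n = 10 the participating numbers are 1, 3, 7, 9, pairing up as {1, 9} and {3, 7}, so η(10) = 20 = ½·10·φ(10).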



For the alternative combinatorial approach suggested in the text, let’s calculate a few preliminary sums. If not indicated, all t’s run from 1 to n; $\{p_1, p_2, \ldots, p_r\}$ are, as usual, the prime factors of n, and d is a divisor of n. The basic “Gauss” sum-formula is $\sum_{t=1}^{n}t = \frac{n(n+1)}{2}$. If we restrict the sum to multiples t of d, we can write t = dq for q = 1, 2, …, n/d and add up:

$$\sum_{d\mid t}t = \sum_{q=1}^{n/d}dq = d\sum_{q=1}^{n/d}q \overset{\text{Gauss}}{=} d\cdot\frac{\frac{n}{d}\left(\frac{n}{d}+1\right)}{2}.$$

If, say, $d = p_ip_jp_k$, recall that …

… $> 0$. This precise smoothing situation is so common that we phrase it as:



Lemma 1. (Smoothing) If f (x) is a convex function on interval I, and A < B < C < D are numbers in I such that the middle two are equidistant from the end ones, i.e., B − A = D − C, then f (B) + f (C) ≤ f (A) + f (D). Moreover, if f (x) is strictly convex, then the inequality above is strict. Concisely put, bringing the inputs closer together (as {A, D} → {B, C} in Fig. 2c) decreases the sum of the outputs of a convex function. For the novice, it will be a worthwhile experience to attack this Smoothing Lemma about four points by the definition of convexity, which relates only three points at a time.
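Before proving Lemma 1, it may help to see it in numbers. The sketch below evaluates the gap f(A) + f(D) − f(B) − f(C) exactly with rational arithmetic; the function name `smoothing_gap` and the chosen test functions are assumptions of this illustration, not part of the text.

```python
from fractions import Fraction


def smoothing_gap(f, A, B, C, D):
    """Return f(A) + f(D) - f(B) - f(C) for A < B < C < D with B - A = D - C.

    Lemma 1 predicts a non-negative value for convex f,
    and a positive one for strictly convex f.
    """
    if not (A < B < C < D) or B - A != D - C:
        raise ValueError("need A < B < C < D with B - A = D - C")
    return f(A) + f(D) - f(B) - f(C)


def reciprocal(x):
    """f(x) = 1/x, strictly convex on (0, infinity)."""
    return 1 / x


def square(x):
    """f(x) = x^2, strictly convex everywhere."""
    return x * x
```

For example, with A, B, C, D = 1, 2, 4, 5 (as `Fraction`s) the reciprocal gives 1 + 1/5 − 1/2 − 1/4 = 9/20 > 0: bringing the inputs closer together decreased the sum of the outputs.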



PST 86. If you are given a statement $P_1$ about k objects, but you are trying to prove some other statement $P_2$ about n objects (where n > k), select all or several suitable k-element subsets out of the n objects, apply $P_1$ to each subset, and then bring together your results in $P_2$ by adding, multiplying, or performing some other such symmetric operation.

Hint for Smoothing Lemma 1: It turns out that of the four possible triplets of points from {A, B, C, D} only two will suffice (hinted by Fig. 2c): you just have to choose them symmetrically, apply the definition of convexity to each triplet, and then add up your inequalities. The geometry-oriented reader may want to find a fast trapezoidal explanation. ♦

Going back to AM-HM, we apply Lemma 1 to the convex $f(x) = \frac{1}{x}$ on (0, ∞) and the numbers {x − b, x, x + a − b, x + a} (possibly x > x + a − b):

$$f(x - b) + f(x + a) > f(x) + f(x + a - b) \;\Rightarrow\; \frac{1}{x-b} + \frac{1}{x+a} > \frac{1}{x} + \frac{1}{x+a-b}.$$

Since the LHS (AM) remains constant, and from the above the RHS (HM) increases (why?), our monovariant argument for AM-HM works out.

4.1.2. Convex look at AM-GM. To locate the convex function behind the proof of AM-GM requires some manipulation of the inequality. The key smoothing step was the same as in the AM-HM proof: to replace $x_1 = x + a$ and $x_2 = x - b$ by x + a − b and the average x of the given n non-negative numbers. This kept the sum $x_1 + x_2$ constant, but how do we explain that it also increased the product $x_1x_2$?

270

11. MONOVARIANTS. PART III

Products are not conveniently recognized as convex functions. Still, we can turn them into sums by applying, say, ln x: $\ln(xy) = \ln(x) + \ln(y)$ and $\ln x^{1/n} = \frac{1}{n}\ln x$. The effect of this on both sides of AM-GM is:

$$\ln\frac{x_1 + x_2 + \cdots + x_n}{n} \overset{?}{\ge} \ln\sqrt[n]{x_1x_2\cdots x_n} = \frac{\ln x_1 + \ln x_2 + \cdots + \ln x_n}{n}.$$

But this is the opposite to the inequality that we would expect for a convex function, and not surprisingly, ln x is not convex: it is commonly called concave, the opposite of convex. To reverse the inequality, the convex function to be used here is f(x) = −ln x:

$$-\ln\frac{x_1 + x_2 + \cdots + x_n}{n} \overset{?}{\le} \frac{-\ln x_1 - \ln x_2 - \cdots - \ln x_n}{n}.$$

Again by Lemma 1, −ln(x + a) − ln(x − b) > −ln(x) − ln(x + a − b), from which the monovariant argument for AM-GM can be completed.

4.2. Smoothing Jensen’s inequality. Smoothing or unsmoothing famous inequalities is a treat for problem-solvers. Inequalities I discussed the relations between ubiquitous means such as AM, GM, HM, and more generally, power means and their weighted versions. Our next goal is to prove two fundamental inequalities: Jensen’s inequality (ordinary and weighted) and Hardy-Littlewood-Pólya’s inequality. These two inequalities imply the rest of the standard inequalities among power means. To start off, Jensen’s inequality can be recognized as generalizing the Midpoint Rule from two numbers x and y to any n numbers:



Problem 4. (Jensen’s Inequality (JI)) Let f be a strictly convex function on some interval I. If $x_1, x_2, \ldots, x_n$ are any numbers in I, prove that

$$f\left(\frac{x_1 + x_2 + \cdots + x_n}{n}\right) \le \frac{f(x_1) + f(x_2) + \cdots + f(x_n)}{n},$$

with equality if and only if all $x_i$’s are equal.

Hint: Let x be the average of $x_1, x_2, \ldots, x_n$. Use the same method as in the proof of AM-GM: if the $x_i$’s are not all equal to x, then change two of them so that one becomes x but the average of all stays the same. Lemma 1 will show that, meanwhile, the sum in the RHS has decreased. ♦

Exercise 7. (AM-GM again!) Modify the previous proof of AM-GM with another convex function in place of the logarithmic f(x) = −ln x; for example, the exponential $2^x$ or, frankly, $b^x$ for any constant b > 1. Further shorten these proofs by applying Jensen’s inequality.

Problem 5. (Weighted JI) Let f be a strictly convex function on some interval I. Prove that if $x_1, x_2, \ldots, x_n$ are any numbers in I and $\lambda_1, \lambda_2, \ldots, \lambda_n$ are any non-negative numbers whose sum is 1, then

$$f(\lambda_1x_1 + \lambda_2x_2 + \cdots + \lambda_nx_n) \le \lambda_1f(x_1) + \lambda_2f(x_2) + \cdots + \lambda_nf(x_n),$$

with equality if and only if all $x_i$’s with non-zero weights $\lambda_i$ are equal.

4. CONVEXITY AND SMOOTHING

271

The reason for the sum of the weights $\lambda_i$ to be 1 is two-fold. On the one hand, it forces the weighted average $\tilde{x} = \lambda_1x_1 + \lambda_2x_2 + \cdots + \lambda_nx_n$ to be between the smallest and the largest of the $x_i$’s (why?) and, therefore, on the interval I where f is defined and convex. On the other hand, something that might be self-evident: now there is no need to divide by n when taking the weighted average of the $x_i$’s because the $\lambda_i$’s have taken care of this (how?). Although we warned in Exercise 2 against a (possibly endless) smoothing argument that replaced each of two variables by their pairwise average, let’s see how this can be turned to our advantage:

Proof of weighted JI: Take two variables that are not equal, say, $x_1 \ne x_2$, and replace each by their weighted average $\tilde{x} = \frac{\lambda_1}{\lambda_1+\lambda_2}x_1 + \frac{\lambda_2}{\lambda_1+\lambda_2}x_2$. Here we divided the original weights $\lambda_1$ and $\lambda_2$ by $(\lambda_1 + \lambda_2)$ in order to make the new weights of $x_1$ and $x_2$ sum to 1. From the definition of a convex function on $[x_1, x_2]$ (or $[x_2, x_1]$):

$$\frac{\lambda_1}{\lambda_1+\lambda_2}f(x_1) + \frac{\lambda_2}{\lambda_1+\lambda_2}f(x_2) \ge f\left(\frac{\lambda_1}{\lambda_1+\lambda_2}x_1 + \frac{\lambda_2}{\lambda_1+\lambda_2}x_2\right) = f(\tilde{x})$$

$$\overset{\times(\lambda_1+\lambda_2)}{\Rightarrow}\ \lambda_1f(x_1) + \lambda_2f(x_2) \ge (\lambda_1 + \lambda_2)f(\tilde{x}) = \lambda_1f(\tilde{x}) + \lambda_2f(\tilde{x}),$$

which shows that the RHS of the weighted JI decreased. Meanwhile, the weighted average of all numbers did not change:

$$(\lambda_1x_1 + \lambda_2x_2) + \lambda_3x_3 + \cdots + \lambda_nx_n = (\lambda_1\tilde{x} + \lambda_2\tilde{x}) + \lambda_3x_3 + \cdots + \lambda_nx_n,$$

so the LHS stayed constant. This provides the intended smoothing argument, alas, possibly never ending! However, something more happened: not only have we replaced each $x_1$ and $x_2$ by $\tilde{x}$, but we can actually combine these two variables into one variable $\tilde{x}$, with weight $\tilde{\lambda} = \lambda_1 + \lambda_2$, and prove instead the inequality:

$$f(\tilde{\lambda}\tilde{x} + \lambda_3x_3 + \cdots + \lambda_nx_n) \overset{?}{\le} \tilde{\lambda}f(\tilde{x}) + \lambda_3f(x_3) + \cdots + \lambda_nf(x_n),$$

where $\tilde{\lambda} + \lambda_3 + \cdots + \lambda_n = 1$. So, there are actually one invariant and two monovariants anchoring this proof:

• the constant LHS and the decreasing RHS;

• the total number of variables (and not the number of variables equal to the average), which decreases by 1 at every step.

At the end, we are left with only one variable or with all variables equal to each other, both of which cases are trivially true. Backtracking, equality is obtained iff all original variables with non-zero weights are equal (why?).

In Inequalities I we showed that the weighted JI implies the weighted versions of AM-GM, AM-HM, and other inequalities among means. With our new understanding of convex functions and smoothing techniques, the reader may want to redo these proofs here. For “extra credit,”

 Problem 6. Invent other problems that can be solved by (weighted) JI.
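The merging step in the proof of the weighted JI can be traced on a concrete example. The sketch below is only an illustration under stated assumptions: it always merges the first two variables (whether or not they are equal), requires strictly positive weights so that λ₁ + λ₂ ≠ 0, and uses exact rationals; the names `merge_first_two`, `jensen_sides` and `smoothing_trace` are made up here.

```python
from fractions import Fraction


def merge_first_two(xs, ws):
    """Combine x1, x2 into their weighted average, carrying weight w1 + w2."""
    lam = ws[0] + ws[1]  # assumed positive
    x_tilde = (ws[0] * xs[0] + ws[1] * xs[1]) / lam
    return [x_tilde] + xs[2:], [lam] + ws[2:]


def jensen_sides(f, xs, ws):
    """Return (LHS, RHS) of the weighted Jensen inequality."""
    lhs = f(sum(w * x for w, x in zip(ws, xs)))
    rhs = sum(w * f(x) for w, x in zip(ws, xs))
    return lhs, rhs


def smoothing_trace(f, xs, ws):
    """Record (LHS, RHS) before and after every merge, down to one variable."""
    trace = [jensen_sides(f, xs, ws)]
    while len(xs) > 1:
        xs, ws = merge_first_two(xs, ws)
        trace.append(jensen_sides(f, xs, ws))
    return trace
```

Along the trace the LHS should stay constant (the invariant), the RHS should never increase, and the number of variables should drop by one per step until LHS = RHS with a single variable left.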


4.3. Hardy-Littlewood-Pólya’s inequality (HLP) (or Karamata’s inequality) will require more than just straightforward smoothing. First recall how we defined majorization of sequences in Inequalities I:

Definition 1. The n-tuple $\{x_1, x_2, \ldots, x_n\}$ is said to majorize the n-tuple $\{y_1, y_2, \ldots, y_n\}$ if $x_1 \ge x_2 \ge \cdots \ge x_n$, $y_1 \ge y_2 \ge \cdots \ge y_n$ and

(7)
$x_1 \ge y_1$;
$x_1 + x_2 \ge y_1 + y_2$;
⋮
$x_1 + \cdots + x_{n-1} \ge y_1 + \cdots + y_{n-1}$;
$x_1 + \cdots + x_{n-1} + x_n = y_1 + \cdots + y_{n-1} + y_n$.

The majorizing conditions (7) appear commonly in the theory of inequalities, as well as the theory of partitions and elsewhere. Problem 7. (HLP) Suppose that {x1 , . . . , xn } majorizes {y1 , . . . , yn } and f is a convex function on an interval I containing the xi ’s and yi ’s. Then f (x1 ) + f (x2 ) + · · · + f (xn ) ≥ f (y1 ) + f (y2 ) + · · · + f (yn ). If f is strictly convex on I, then equality is attained iff xi = yi for all i. The proof of HLP is tricky, because we have to choose the steps so that the constraints (7) of majorization never get violated. We proceed in stages. 4.3.1. Happy endings that could happen. In two cases the HLP situation will be majorly simplified: Happy ending 1. All xi are equal. Then the last equality of (7) yields: nx1 ≤ ny1 , i.e., x1 ≤ y1 . Combining with the first inequality x1 ≥ y1 , we conclude x1 = y1 . Cancelling both x1 and y1 from all inequalities, we remain in the same situation but for only n − 1 numbers. Continuing inductively, we arrive at xi = yi for all i, and then HLP follows trivially.  Happy ending 2. An inequality in (7) is an equality: x1 + · · · + xk = y1 + · · · + yk for some k ≤ n − 1. We can then restrict the problem to the first k inequalities and variables. Moreover, canceling x1 + · · · + xk and y1 + · · · + yk from both sides of the remaining inequalities, we again arrive at the HLP problem but only for the sequences {xk+1 , . . . , xn } and {yk+1 , . . . , yn }. Applying induction on the number of variables, we conclude that HLP works for the first k variables, and also for the last n − k variables: f (x1 ) + f (x2 ) + · · · + f (xk ) ≥ f (y1 ) + f (y2 ) + · · · + f (yk ), f (xk+1 ) + f (xk+2 ) + · · · + f (xn ) ≥ f (yk+1 ) + f (yk+2 ) + · · · + f (yn ). Summing, we obtain the HLP inequality for all n variables.
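The majorization conditions (7) translate directly into a short check, which makes it easy to experiment with HLP on small examples. In this sketch the function `majorizes` sorts both tuples in non-increasing order first (an added convenience; Definition 1 assumes they are already sorted), and `hlp_gap` is a hypothetical helper returning the difference of the two sides of HLP.

```python
from math import prod


def majorizes(xs, ys):
    """Check conditions (7): prefix sums of xs dominate those of ys, totals agree."""
    xs, ys = sorted(xs, reverse=True), sorted(ys, reverse=True)
    if len(xs) != len(ys) or sum(xs) != sum(ys):
        return False
    prefix_x = prefix_y = 0
    for x, y in zip(xs[:-1], ys[:-1]):  # the first n - 1 inequalities
        prefix_x += x
        prefix_y += y
        if prefix_x < prefix_y:
            return False
    return True


def hlp_gap(f, xs, ys):
    """f(x1) + ... + f(xn) - (f(y1) + ... + f(yn)); HLP predicts >= 0 for convex f."""
    return sum(f(x) for x in xs) - sum(f(y) for y in ys)
```

For example, {5, 3, 1} majorizes {4, 3, 2} (5 ≥ 4, 8 ≥ 7, 9 = 9), and for the convex f(x) = x² the gap is 35 − 29 = 6 ≥ 0, while the products satisfy 15 ≤ 24.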




4.3.2. How to get to a happy ending? We will come up with an operation which, at every step, will: (1) increase the number of initial $x_i$’s that are all equal to each other, $x_1 = \cdots = x_k$; or failing this, (2) lead us to the Happy ending 2. The operation, of course, will help us prove the HLP inequality: we will ensure that the sum of the $f(x_i)$’s (the LHS) decreases due to convexity, while the $y_i$’s (and hence the RHS) remain the same. Start by letting k be the first index for which $x_1 = \cdots = x_{k-1} > x_k$. Let $x = (x_1 + \cdots + x_k)/k$ be the average of the first k $x_i$’s. We will perform one of two smoothings, being careful to preserve the majorization inequalities (7).

Smoothing 1. If we can replace each of $x_1, \ldots, x_k$ by x (cf. Fig. 3a) so that $\{x_1, \ldots, x_n\}$ still majorizes $\{y_1, \ldots, y_n\}$, then we do so. This equalizes the first k numbers: $x_1 = \cdots = x_k$, for which Jensen’s inequality implies $\sum_{i=1}^{k}f(x_i) \ge kf(x)$, i.e., the LHS of the desired HLP has decreased.

Figure 3. Smoothing 1 and Smoothing 2 of HLP

Smoothing 2. If Smoothing 1 disturbs some inequality in (7), apply



Lemma 2. For some positive value $a < x_1 - x$ we can shift down $x_1, \ldots, x_{k-1}$ by a to $x_1 - a, \ldots, x_{k-1} - a$, and compensate by shifting up $x_k$ to $x_k + (k-1)a$ (cf. Fig. 3b), so that one of the majorization inequalities (7) becomes an equality while the other inequalities are preserved.

Proof: Since $x_1 + \cdots + x_{k-1} + x_k$ is not changed, the inequalities after the k-th in (7) are not affected by Smoothing 2. If one of the first k inequalities were an equality from the get-go, then we are already at Happy ending 2. If not, let $a_i = \frac{1}{i}(\mathrm{LHS}_i - \mathrm{RHS}_i) > 0$ be the difference of the two sides of the i-th inequality, divided by i. Note that $a_i$ is exactly as much as we would decrease each $x_j$ in the i-th inequality in order to make it into an equality:

$$x_1 + \cdots + x_i > y_1 + \cdots + y_i \;\Rightarrow\; (x_1 - a_i) + \cdots + (x_i - a_i) = y_1 + \cdots + y_i.$$

Set $a = a_m$ to be $\min\{a_1, a_2, \ldots, a_{k-1}\}$ and use it to perform Smoothing 2, i.e., decrease each $x_1, \ldots, x_{k-1}$ by a, and increase $x_k$ by (k − 1)a. This will make the m-th inequality into an equality, while preserving the rest of (7). Since Smoothing 1 ($x_1 \to x$) would have violated something in (7) but Smoothing 2 ($x_1 \to x_1 - a$) does not, we must have $x_1 - x > a$ (why?). We can now arrange the players in Smoothing 2 in increasing order (cf. Fig. 3b):

$$x_k < x_k + (k-1)a \overset{(*)}{<} x < x_1 - a < x_1,$$

(∗) being true, or else all k new numbers would be > their average x!


What happened to the LHS? Did it decrease under Smoothing 2, i.e.,

$$\sum_{i=1}^{k}f(x_i) = (k-1)f(x_1) + f(x_k) \overset{?}{\ge} (k-1)f(x_1 - a) + f(x_k + (k-1)a)?$$

By making appropriate replacements, you can identify this inequality with the following generalization of the Smoothing Lemma:



Lemma 3. (Multi-smoothing) If f (x) is a convex function on interval I, and A < B < C < D are numbers in I such that C is l times closer to D than B is to A for some l ∈ N, i.e., B − A = l(D − C), then f (A) + lf (D) ≥ f (B) + lf (C). Moreover, if f (x) is strictly convex, then the inequality above is strict. Hint: You could repeatedly apply the Smoothing Lemma, or you could devise a fast-track geometry argument with trapezoids. In the case of l = 4, the drawing on the right has already set up the ground for both solutions. ♦


4.3.3. Weaving inductively the proof of HLP. We now have all pieces to put together: we know where we would like to end, and we know how to get there. Naturally, we use induction on n. The HLP inequality is trivially true for n = 1, since then x1 = y1 . Suppose HLP is true for n − 1 variables. For n variables, we keep applying our Smoothing 1 or 2 operations until we make all variables equal to each other, or until one of the majorization (7) inequalities (other than the nth one) becomes an equality. These two situations were addressed before, and each ends happily.  A “bifurcation” phenomenon persisted throughout our solution of HLP: there were two happy endings and two smoothing procedures to get there.



PST 87. While it is often possible to construct a smoothing operation leading eventually to all variables being equal, some inequalities call for alternative smoothing operations that lead to other favorable outcomes. In such problems, you have to simultaneously take into account two or more scenarios throughout the induction (or smoothing) process. Now that we have proven HLP, Exercise 8. Can you recognize the Smoothing and the Multi-smoothing Lemmas as special cases of HLP? For practice, explain why HLP implies the following:



Corollary 1. (HLP for Products) If {x1 , . . . , xn } majorizes {y1 , . . . , yn } on interval I and all xi ’s and yj ’s are positive, then x1 x2 · · · xn ≤ y1 y2 · · · yn . Equality is attained if and only if xi = yi for all i.


5. Random Fun with Smoothing

Here are two beautiful Olympiad problems that will challenge us to combine old monovariant ideas in creative new ways. We will only discuss how to link the problems to what we have already learned, and leave it to the reader to “smooth out” (pun intended) all arguments into complete solutions.

5.1. Alternating sums are featured in the Balkan Olympiad Problem 1 that started our session: if $n \ge 2$ and $0 < a_1 < \cdots < a_{2n+1}$, then
\[ \sqrt[n]{a_1} - \sqrt[n]{a_2} + \sqrt[n]{a_3} - \cdots + \sqrt[n]{a_{2n-1}} - \sqrt[n]{a_{2n}} + \sqrt[n]{a_{2n+1}} \overset{?}{<} \sqrt[n]{a_1 - a_2 + a_3 - \cdots + a_{2n-1} - a_{2n} + a_{2n+1}}. \]
Constructing a clever smoothing (or unsmoothing) procedure for a convex (or a concave) function will finally unravel the inequality for us. Asking the right questions is half of the smoothing! So, let’s start.

5.1.1. What function should we use? The only function around is $\sqrt[n]{x}$ for $x > 0$. Even if we apply induction on n and manage to decrease the number of variables, it is unlikely that we will succeed in reducing to the previous root function $\sqrt[n-1]{x}$. So, we have to live with the same function $f(x) = \sqrt[n]{x}$ throughout the whole solution. To make matters slightly more annoying, $\sqrt[n]{x}$ is concave (why?), which can be resolved easily: all of our known inequalities will be applied with the opposite signs.

5.1.2. What is our monovariant: what feature of the inequality is it feasible to preserve? Not the sum, evidently, but the alternating sum of the variables. Hence, we could try to keep the RHS constant, while increasing the LHS. For the same reason, useful smoothing cannot possibly collect terms together towards an average, because preserving the sum seems irrelevant here.

5.1.3. Can we reduce the number of variables?
If we combine two consecutive terms into one, i.e., $-a_{2n} + a_{2n+1} \to -a_{2n}$, the number of terms will change from odd to even, and the nature of the problem will change.⁴ Instead, we could combine three consecutive terms into one: something like $a_{2n-1} - a_{2n} + a_{2n+1} \to a$, and thereby involve the four variables $a_{2n-1}$, $a_{2n}$, $a$, $a_{2n+1}$, along with their nth roots in our smoothing argument. Note that $a_{2n-1} < a < a_{2n+1}$ (why?), but we don’t know (and won’t care) which of $a$ and $a_{2n}$ is larger. What we do care about is that $a$ and $a_{2n}$ are symmetrically placed in the interval $[a_{2n-1}, a_{2n+1}]$ (why?). Which obvious tool comes to mind? ♦

[Figure: the points $a_{2n-1}$, $a_{2n}$, $a$, $a_{2n+1}$ under the graph of $\sqrt[n]{x}$.]

⁴ Besides, the inequality doesn’t make sense with an even number of variables; e.g., $\sqrt{1} - \sqrt{2} \overset{?}{<} \sqrt{1 - 2} = \sqrt{-1} = i$. . . . Nope, complex numbers cannot be compared like that!
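Before hunting for the smoothing argument, it can be reassuring to test the alternating-sum inequality numerically. The following Python sketch is only an experiment under our own assumptions: the function name `alternating_gap`, the random ranges, and the sample sizes are hypothetical, and passing such a test proves nothing.

```python
import random

def alternating_gap(a, n):
    """RHS minus LHS of the alternating-root inequality for an increasing
    list a of 2n+1 positive numbers; positive when the inequality holds."""
    signs = [(-1) ** i for i in range(len(a))]      # +, -, +, ..., +
    lhs = sum(s * t ** (1.0 / n) for s, t in zip(signs, a))
    rhs = sum(s * t for s, t in zip(signs, a)) ** (1.0 / n)
    return rhs - lhs

# A fixed example with n = 2 and five increasing numbers.
fixed_gap = alternating_gap([1.0, 2.0, 3.0, 4.0, 5.0], 2)

# A small random experiment with n = 3 and seven increasing numbers.
random.seed(0)
n = 3
gaps = []
for _ in range(200):
    a = sorted(random.uniform(0.1, 100.0) for _ in range(2 * n + 1))
    gaps.append(alternating_gap(a, n))
print(fixed_gap, min(gaps))
```

The alternating sum under the root on the RHS is always positive for increasing inputs, so the fractional power is well defined.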


11. MONOVARIANTS. PART III

5.2. Concentration monovariant revisited! Our last Olympiad problem will bring us full circle to the very first serious monovariant we constructed way back in Part I. Does something below look familiar?

Problem 8. (USAMO ’99, [4]) Let $n > 3$, and let $a_1, a_2, \ldots, a_n$ be real numbers such that $a_1 + a_2 + \cdots + a_n \ge n$ and $a_1^2 + a_2^2 + \cdots + a_n^2 \ge n^2$. Prove that at least one of the numbers $a_1, a_2, \ldots, a_n$ is $\ge 2$.

5.2.1. Easy come, easy go! Before we plunge into a deep discussion, we should discard the possibility of a trivial solution. At first glance, the solution rests on inequalities about the sum of squares $\sum a_i^2$, along with the assumption that all $a_i < 2$:
\[ n^2 \le a_1^2 + a_2^2 + \cdots + a_n^2 < 4n, \]
from which $n < 4$, a contradiction. . . . or is it? We forgot that $a_i^2 < 4$ would be implied if the $a_i > 0$, but our $a_i$’s could be any real numbers! The inequality $P_1 \le P_2$ between the arithmetic mean and the root mean square also requires $a_i \ge 0$ and yields the unhelpful $1 < 2$. There goes any chance of a trivial solution.

5.2.2. “Rewriting” history. Our Problem 8 closely resembles the signature problem of Monovariants I about the mansion. Let’s take a fresh look at it.

Problem 9. (Mansion) P people reside in the rooms of an n-room mansion. Each minute a person walks from one room into another with at least as many people in it. Prove that eventually everyone will be in one room.

Proof: The mansion problem came with an explicit operation – the movement of people between rooms. But it did not have a readily available monovariant. In our solution, we created the latter as the “concentration of the mansion,” namely, the sum of the squares $a_1^2 + a_2^2 + \cdots + a_n^2$ where $a_k$ was the number of people in room k. Was the given operation smoothing with respect to our monovariant? To the contrary: it was unsmoothing!
To see this, picture the action happening along the graph of the convex $f(x) = x^2$ for $x > 0$: whenever $a_i \le a_j$ a person could move from room i to room j and $(a_i, a_j) \to (a_i - 1, a_j + 1)$. This kept the sum of all $a_k$’s constant, while it pulled them apart. So by Smoothing Lemma 1 applied in reverse:
\[ f(a_i) + f(a_j) < f(a_i - 1) + f(a_j + 1) \;\Rightarrow\; a_i^2 + a_j^2 < (a_i - 1)^2 + (a_j + 1)^2, \]
i.e., the monovariant increased at each step. To complete the argument: each $a_i$ could assume only finitely many values (why?), and hence so could the monovariant $\sum a_i^2$, which prevented it from increasing forever.

[Figure: the points $a_i - 1$, $a_i$, $a_j$, $a_j + 1$ under the graph of $x^2$.]

5.2.3. Learning from the past. To attack our current problem, let’s try to change the numbers $a_1, a_2, \ldots, a_n$ in a systematic way to reach the extreme case, in the spirit of smoothing. In order to preserve the given inequality $a_1^2 + a_2^2 + \cdots + a_n^2 \ge n^2$, we need each step to increase the sum of squares, not decrease it. This is where the mansion problem comes in; it suggests that we should try an unsmoothing operation. There are minor differences between the two problems, all resolvable:
• Before, the $a_i$’s were integers $\ge 0$. Now they are any real numbers.
• Before, we could only make changes of $(-1, +1)$ to pairs $a_i < a_j$. Now the unsmoothing change $(-a, +a)$ is allowed for any $a > 0$.
• Before, the monovariant eventually came to a full stop, simply because it had only finitely many possible values. But now, we have continuous variables $a_k$ and, therefore, infinitely many values for $\sum a_i^2$. How do we make the monovariant stop changing?

We need another, discrete monovariant to put the brakes on the process. Recall the goal of the problem: to show that some $a_i \ge 2$. Assuming to the contrary that all $a_i < 2$, create an unsmoothing operation that increases at each step the number of $a_i$’s $\ge 2$. ♦

In both the mansion and USAMO ’99 problems the sum of squares acted as a “concentration” monovariant. From our discussion of convex functions, we know that we don’t have to use squares. In the latter problem the monovariant is, at least hypothetically, in danger of continuing to increase forever, so we helped it with an auxiliary monovariant. These ideas are general enough to be written out as PSTs.



PST 88. If you have a collection of numbers xi whose sum stays constant, and need a monovariant that increases when the numbers become more “spread out,” try using the sum of their squares. More generally, you can try using the sum of f (xi ) where f is any strictly convex function.
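To watch the “concentration” monovariant of the mansion problem at work, one can simulate the moves. The sketch below is a hypothetical Python experiment: the starting populations, the random choice of moves, and the helper names `concentration` and `mansion_step` are our own assumptions, not part of the original solution.

```python
import random

def concentration(rooms):
    """Sum of squares of the room populations (the monovariant)."""
    return sum(a * a for a in rooms)

def mansion_step(rooms, rng):
    """Move one person from a nonempty room i to a different room j with
    rooms[j] >= rooms[i]; return False if no such move exists."""
    moves = [(i, j)
             for i, ai in enumerate(rooms) if ai > 0
             for j, aj in enumerate(rooms) if j != i and aj >= ai]
    if not moves:
        return False
    i, j = rng.choice(moves)
    rooms[i] -= 1
    rooms[j] += 1
    return True

rng = random.Random(1)
rooms = [3, 1, 4, 1, 5]              # 14 people in 5 rooms
history = [concentration(rooms)]
while mansion_step(rooms, rng):
    history.append(concentration(rooms))
print(rooms, history)
```

Every legal move raises the sum of squares by at least 2, and the sum of squares cannot exceed the square of the total population, so the loop must stop; it stops only when a single room is occupied.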



PST 89. If the monovariant is continuous (or can take on infinitely many values), create another, discrete monovariant (e.g., the number of variables with some specific property) that will cause the smoothing process to end. For fun and to understand better the technique of smoothing: Exercise 9. Redo the problems from Monovariants I about gender balance and hybrid mansions, leaping frogs along collinear lilies or in a circular swamp, and simultaneous switches. (The images below should help bring on a flashback.) Identify explicitly the convex functions and the smoothing operations used in the solutions. Create more exercises of the same type.


6. Appendix on Limits and Endless Smoothing

We avoided using limits so far in the Monovariants sessions, relying on discrete arguments. As promised, we shall prove here several technical statements for the reader advanced in limits and continuous functions.

6.1. We need a GPS device! Smoothing by pairwise averaging did not work in Exercise 3(b). We started there with three numbers, $\{a, a, b\}$. At each step we replaced two of the unequal numbers with their average and left the third number unchanged:
\[ \{a, a, b\} \to \left\{a, \frac{a+b}{2}, \frac{a+b}{2}\right\} \to \left\{\frac{3a+b}{4}, \frac{3a+b}{4}, \frac{a+b}{2}\right\} \to \cdots \qquad (8) \]
Unfortunately, this smoothing process never ends because it keeps producing the same type of configuration $\{c, c, d\}$ with $c \ne d$. Still, performing a few steps of the process leads to the inevitable observation that the three numbers are crowding closer and closer around their average $x = (2a + b)/3$ (cf. Fig. 4a). Can we make this precise?

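[Figure 4. Approaching x and Decreasing $d_k$. Panel (a): the points $a$, $\frac{a+b}{2}$, $\frac{3a+b}{4}$, $x$, $b$ on a line, with the distance $d_k$ marked. Panel (b): the points A, C, X, B on a line, labeled $a_k$, $x$, $b_k$.]

Exercise 10. Let $\{a_k, b_k, c_k\}$ be the three numbers after performing k steps of the pairwise averaging in (8). Then
\[ \lim_{k\to\infty} a_k = \lim_{k\to\infty} b_k = \lim_{k\to\infty} c_k = x. \]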

There are many ways to prove this, but perhaps the most easily generalized way rests on a standard “sandwich” idea:



PST 90. To see why several sequences converge to the same limit x, set $d_k$ to be the maximal distance between x and all the numbers after the kth step. Show that $\lim_{k\to\infty} d_k = 0$ to force all sequences to converge to x.
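The quantity $d_k$ from PST 90 can be tabulated for the process (8) with exact rational arithmetic. This is a minimal sketch, assuming integer starting values a and b; the function name `averaging_distances` is hypothetical, and the code tracks the configuration {c, c, d} rather than three separate sequences.

```python
from fractions import Fraction

def averaging_distances(a, b, steps):
    """Run the process (8) from {a, a, b} (a, b integers) and return the list
    d_0, ..., d_steps, where d_k is the largest distance from x = (2a + b)/3."""
    x = Fraction(2 * a + b, 3)
    c, d = Fraction(a), Fraction(b)          # current configuration {c, c, d}
    dists = [max(abs(c - x), abs(d - x))]
    for _ in range(steps):
        # Average one copy of c with d: {c, c, d} -> {c, (c+d)/2, (c+d)/2},
        # which is again of the form {c', c', d'} with c' = (c+d)/2, d' = c.
        c, d = (c + d) / 2, c
        dists.append(max(abs(c - x), abs(d - x)))
    return dists

ds = averaging_distances(0, 3, 10)
print([str(t) for t in ds])
```

In this particular run the distances shrink at every step, in line with the estimate $d_{k+2} \le d_k/2$ used in the solution to Exercise 10.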

Solution to Exercise 10: The process averages only pairs of numbers that come from opposite sides of x (why?), i.e., if $a_k < x < b_k$, then $a_k$ and $b_k$ each go to $(a_k + b_k)/2$. Suppose $a_k$ is further away from x than $b_k$, i.e., $d_k = |x - a_k|$. Using the notation in Figure 4b, the simple geometric argument $CX < CB = \frac{1}{2}AB < AX = d_k$ shows that $a_k$ shortened its distance to x by a factor of at least 2. Applying the averaging once more will bring in the third number at least twice as close to x as it was before. To summarize, $d_k$ decreases at each step and gets at least halved every other step, i.e., $d_{k+2} \le d_k/2$. This results in $\lim_{k\to\infty} d_k = 0$, and by PST 90 all three sequences $a_k$, $b_k$, and $c_k$ converge to x.

With four or more numbers, however, there are choices for the order of pairs to average, and if we are not careful, our numbers may not converge to the same place! Using a distance monovariant again,


Exercise 11. Devise an algorithm for pairwise averaging of $x_1, x_2, \ldots, x_n$ that forces them to approach their arithmetic average $x = \left(\sum_i x_i\right)/n$. We will refer to such an algorithm as a good pairwise smoothing, or for short, a GPS directing all numbers $x_1, x_2, \ldots, x_n$ towards their average x.

6.2. Limits tame inequalities. To see how this is useful in working with inequalities, suppose you want to prove something as general as:
\[ \text{LHS} = F(x_1, x_2, \ldots, x_n) \le G(x_1, x_2, \ldots, x_n) = \text{RHS} \qquad (9) \]
for two functions F and G, continuous for all $x_i$ in some interval I. Suppose further that under pairwise averaging of the inputs $(x_i, x_j) \to \left(\frac{x_i + x_j}{2}, \frac{x_i + x_j}{2}\right)$, F increases and G decreases. Using a GPS, we have $\lim_{k\to\infty} x_i = x$ for all i, where the steps of the GPS are indexed by k.⁵ By continuity of F and G and properties of limits, we are left to prove only the middle inequality below:
\[ \text{LHS} \le \lim_{k\to\infty} F(x_1, \ldots, x_n) = F(x, \ldots, x) \overset{?}{\le} G(x, \ldots, x) = \lim_{k\to\infty} G(x_1, \ldots, x_n) \le \text{RHS}. \]

We have gained a powerful insight:



PST 91. If pairwise smoothing does not disturb the inequality between two continuous functions, i.e., it lifts the smaller side up and lowers the larger side down, then all you need to show is that the inequality is true when all variables are equal. This simplifies the proof of some inequalities we have encountered so far. If you are bothered by the continuity condition, rest assured that: Lemma 4. Any composition of the four arithmetic operations ±, ×, ÷, the algebraic operations of raising to a power, and any of the standard continuous functions such as exponential, logarithmic, or trigonometric functions, is continuous on any interval where this composition is well-defined. 6.3. Infinite pairwise smoothing is useful, after all! With this sea of continuous functions, we can attack a number of inequalities. Keep in mind that any convex function on interval I is necessarily continuous on I (why?).

Exercise 12. Using pairwise smoothing, prove Jensen’s inequality and the inequalities between the arithmetic mean $P_1$ and any other power mean $P_r$.

Partial Proof: Since $P_r = \sqrt[r]{\frac{x_1^r + \cdots + x_n^r}{n}}$ is continuous for $x_i > 0$ and $P_r = P_1$ for equal inputs, when $r > 1$ the proof of $P_1 \le P_r$ boils down to showing that $P_r$ strictly decreases under pairwise smoothing for $x \ne y$:
\[ x^r + y^r \overset{?}{>} 2\left(\frac{x+y}{2}\right)^r \iff \sqrt[r]{\frac{x^r + y^r}{2}} \overset{?}{>} \frac{x+y}{2}\,\cdot \]

⁵ Strictly speaking, each $x_i$ should also be indexed by k since $x_i$ changes with the steps.


The latter is the original inequality $P_1 < P_r$ but only for two variables, which is a substantial reduction brought about by the pairwise smoothing. Setting $f(x) = x^r$ and $I = (0, \infty)$, we can rewrite the last inequality as:
\[ f\left(\frac{x+y}{2}\right) \overset{?}{<} \frac{f(x) + f(y)}{2} \quad \text{for all } x \ne y \text{ in } I. \qquad (10) \]
“Surprise!” This is the Midpoint Rule for the strictly convex f(x) on I!



6.4. Why is the Midpoint Rule (MR) true? Taking (10) as given, we need to show the definition of a strictly convex function, i.e.,
\[ f(\lambda x + (1 - \lambda)y) \overset{?}{<} \lambda f(x) + (1 - \lambda) f(y) \quad \text{for any } \lambda \in (0, 1). \qquad (11) \]
Starting with x and y, we have to get to $\tilde{x} = \lambda x + (1 - \lambda)y$, their weighted average, by using only ordinary averages of pairs so that we can apply MR at each step. Another (weighted) GPS construction is in order here:



Lemma 5. There is a sequence $\{a_n\}$ inside interval $(x, y)$ that converges to $\tilde{x}$ so that each $a_n$ is midway between some previous $a_i$’s, including possibly x and y. If $a_n = \lambda_n x + (1 - \lambda_n)y$ for some $\lambda_n \in (0, 1)$ then $\lim_{n\to\infty} \lambda_n = \lambda$.

By induction on n, show that inequality (11) works for these an ’s:

Lemma 6. $f(a_n) < \lambda_n f(x) + (1 - \lambda_n) f(y)$ for all $n \ge 1$.

To complete the proof of the Midpoint Rule, we apply limits to both sides of the inequality in Lemma 6:
\[ \lim_{n\to\infty} f(a_n) \le \left(\lim_{n\to\infty} \lambda_n\right) f(x) + \lim_{n\to\infty} (1 - \lambda_n) f(y) \;\Rightarrow\; f(\tilde{x}) \overset{!}{\le} \lambda f(x) + (1 - \lambda) f(y), \]
which is almost what we wanted, but since the limits on both sides of the inequality could be equal, we lost along the way the strict inequality! However, we did prove (11) as a non-strict “≤” inequality for all $\lambda \in (0, 1)$, and thereby for all weighted averages of x and y. Geometrically, this means that the graph of f on $(x, y)$ is below or on the segment XY. (In the figure, for any $z \in I$ we denote by Z the point on the graph of f over z.)

[Figure: the graph of f with the points $X$, $\tilde{X}$, $X'$, $Y$ above $x$, $\tilde{x}$, $x'$, $y$.]

If (11) were an equality for some $\lambda \in (0, 1)$, then the point $\tilde{X}$ (above the weighted average $\tilde{x}$) must lie on segment XY. Because of the Midpoint Rule hypothesis, $\lambda \ne 1/2$ (why?), i.e., $\tilde{x}$ is not the midpoint between x and y; so say, $\tilde{x}$ is closer to x than to y. Then $\tilde{x}$ is midway between x and another $x' \in (x, y)$. Again, from the strict inequality in the Midpoint Rule we know that $2f(\tilde{x}) < f(x) + f(x')$. Geometrically, this puts point $\tilde{X}$ below $XX'$, i.e., $X'$ is strictly above line $X\tilde{X} = XY$! This contradicts our previous conclusion! Thus, all inequalities (11) are strict, f(x) is strictly convex on I, and the Midpoint Rule is indeed true.


6.5. Final Quiz. Recall Problem 2 from Inequalities I: if $a_1, a_2, \ldots, a_n$ are positive numbers and $g = \sqrt[n]{a_1 a_2 \cdots a_n}$ is their geometric mean, then
\[ (1 + a_1)(1 + a_2) \cdots (1 + a_n) \ge (1 + g)^n. \qquad (12) \]
The solution in Inequalities I relied on a fair amount of combinatorics and some algebra, in addition to clever multiple applications of AM-GM to a variety of terms. It is reasonable now to attempt to prove this inequality using smoothing, but since the sum of the $a_i$’s does not appear in the problem, at least not in an obvious way, can smoothing be performed at all? Expanding by multiplying out the LHS will “create” the desired sum $\sum a_i$; unfortunately, various products of the $a_i$’s will also pop up on that side of the intended inequality, messing up any smoothing attempts that fix the sum!



Problem 10. Challenge your understanding of the techniques developed in this session by finding a quick and elegant proof to inequality (12) via: (a) a finite smoothing argument; (b) an infinite smoothing argument.

7. Hints and Solutions to Selected Problems

Exercise 1. Suppose we want to run smoothing with the same idea as in the proof of AM-GM: fix the sum of the $x_i$’s and increase the other side. Thus, we replace a pair of numbers $x + a$ and $x - b$ not equal to the average x by x and $x + a - b$. We know that their sum remains constant and their product $(x + a)(x - b)$ increases to $x(x + a - b)$. But we also need to figure out what happens to the sum of their reciprocals in the denominator of the RHS:
\[ \frac{1}{x+a} + \frac{1}{x-b} = \frac{(x+a) + (x-b)}{(x+a)(x-b)} > \frac{x + (x+a-b)}{x(x+a-b)} = \frac{1}{x} + \frac{1}{x+a-b}\,\cdot \qquad (13) \]
Adding the other unchanged $\frac{1}{x_i}$’s and reciprocating (13) flips the sign of the inequality and forces the RHS to increase under our operation. Just as in the proof of AM-GM, the number of variables $x_i$ equal to the average x increases at each step, i.e., after at most n repetitions all variables will be equal to x and equality will be then obtained. Hence, the original AM-HM inequality must have been true, with equality iff all variables are equal.

It is possible to also deduce AM-HM from two applications of AM-GM. Rewrite equivalently the desired AM-HM by pulling everything to the LHS:
\[ \frac{x_1 + x_2 + \cdots + x_n}{n} \cdot \frac{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}{n} \overset{?}{\ge} 1. \]
Now, AM-GM applied separately to $x_1, x_2, \ldots, x_n$ and to $\frac{1}{x_1}, \frac{1}{x_2}, \ldots, \frac{1}{x_n}$ yields two geometric means that cancel each other:
\[ \frac{x_1 + \cdots + x_n}{n} \cdot \frac{\frac{1}{x_1} + \cdots + \frac{1}{x_n}}{n} \overset{\text{AM-GM}}{\ge} \sqrt[n]{x_1 \cdots x_n} \cdot \sqrt[n]{\frac{1}{x_1} \cdots \frac{1}{x_n}} = 1. \]
Equality is obtained iff AM-GM yields equalities, i.e., for $x_1 = \cdots = x_n$.


Exercise 2. The sum remains constant and the product increases under the operation of replacing each of two different numbers $x_i$ and $x_j$ by $\frac{x_i + x_j}{2}$:
\[ x_i + x_j = \frac{x_i + x_j}{2} + \frac{x_i + x_j}{2} \quad \text{and} \quad x_i x_j \overset{\text{AM-GM}}{<} \left(\frac{x_i + x_j}{2}\right)^2. \]

1 numbers will be equal to each other after n−1 steps. More precisely, if n − k of the numbers are already equal to the average x, with k ≥ 1, then it will take at most k − 1 steps for the process to end in equality, fixing the LHS at each step and increasing the RHS. We leave it to the reader to finish the formal proof by induction on k, taking into account that every time the operation reduces the numbers not equal to x. ♦

Exercise 5. Just reverse the monovariant step from the proof of RI: for n = 2, if $z_1 = y_2$ and $z_2 = y_1$, do nothing; if $z_1 = y_1$ and $z_2 = y_2$, then switch the $z_i$’s to decrease the RHS. Now apply an analogous inductive argument as in the proof of RI. ♦

Exercise 6. The Midpoint Rule boils down to 2-variable AM-HM/AM-GM:
• $\dfrac{1}{\frac{x+y}{2}} \overset{?}{<} \dfrac{\frac{1}{x} + \frac{1}{y}}{2} \iff \dfrac{2}{\frac{1}{x} + \frac{1}{y}} \overset{?}{<} \dfrac{x+y}{2}$.
• $-\ln \dfrac{x+y}{2} \overset{?}{<} -\dfrac{\ln xy}{2} \iff \dfrac{x+y}{2} \overset{?}{>} \sqrt{xy}$.



Smoothing Lemma 1: Since B is between A and D, it can be viewed as a weighted average of A and D for some λ ∈ (0, 1) (cf. Fig. 2c), i.e., B = λA+(1−λ)D. Since B −A = D −C, for symmetry reasons C is also the weighted average of A and D with the reverse weights: C = (1 − λ)A + λD.


By the convexity definition applied twice to these two triplets of points: f (B) ≤ λf (A) + (1 − λ)f (D), f (C) ≤ (1 − λ)f (A) + λf (D). Adding and canceling on the RHS yields: f (B) + f (C) ≤ f (A) + f (D).



Problem 4. For $x_i < x < x_j$, smoothing as usual $(x_i, x_j) \to (x, x_i + x_j - x)$ results in decreasing the RHS according to Smoothing Lemma 1:
\[ f(x_i) + f(x_j) = f(A) + f(D) > f(B) + f(C) = f(x) + f(x_i + x_j - x). \]
When all variables are equal, JI is trivially true.

Exercise 7. To show that $2^x$ is convex, apply the Midpoint Rule: $2^{(x+y)/2} = \sqrt{2^x 2^y} \overset{?}{<} (2^x + 2^y)/2$, which is AM-GM for the positive numbers $2^x$ and $2^y$. Alternatively, if you know derivatives, $(2^x)' = (\ln 2)\, 2^x$ is an increasing function, so by the First Derivative Test (cf. Inequalities I) $2^x$ is convex for all x. Applying JI to the convex function $f(x) = 2^x$ and any numbers $y_1, \ldots, y_n$:
\[ 2^{\frac{y_1 + \cdots + y_n}{n}} \overset{\text{JI}}{\le} \frac{2^{y_1} + \cdots + 2^{y_n}}{n} \iff \sqrt[n]{2^{y_1} \cdots 2^{y_n}} \le \frac{2^{y_1} + \cdots + 2^{y_n}}{n}\,\cdot \qquad (14) \]
Now, any $x_i > 0$ can be uniquely written as $x_i = 2^{y_i}$: simply set $y_i = \log_2(x_i)$. Substituting in (14) yields the AM-GM for $x_1, \ldots, x_n$. Equality is obtained iff all $y_i$’s are equal, i.e., all $x_i$’s are equal.

Lemma 3. The figure on page 274 shows points $A = B_0, B_1, \ldots, B_l = B$ that divide AB into l equal parts. Applying repeatedly the Smoothing Lemma to the quadruple of points $\{B_{i-1}, B_i, C, D\}$ for $1 \le i \le l$, we obtain l inequalities $f(B_{i-1}) + f(D) > f(B_i) + f(C)$. Adding them up and canceling all intermediate $f(B_i)$ for $i = 1, \ldots, l - 1$, but not $f(D)$ and $f(C)$ which appear l times, we arrive at the desired $f(A) + l f(D) \ge f(B) + l f(C)$.

For the geometric argument, in trapezoid $ADD'A'$ segments $BB'$ and $CC'$ are parallel to $AA'$ and $DD'$. From $AB/CD = l$ and the similar shaded right triangles (with hypotenuses along $A'D'$ and one leg horizontal): $l\,C'D' = A'B'$, from which $l(CC' - DD') = AA' - BB'$ (why?). Rearranging this, $AA' + l\,DD' = BB' + l\,CC'$. Using that the graph of f(x) between $A'$ and $D'$ lies underneath the segment $A'D'$, we have $BB' > BB''$ and $CC' > CC''$, which yields again $f(A) + l f(D) \ge f(B) + l f(C)$. ♦

Exercise 8. The Smoothing Lemma is the case with two inequalities in HLP: $x_1 \ge y_1$ and $x_1 + x_2 = y_1 + y_2$, where $x_1, x_2, y_1, y_2$ are D, A, C, B, respectively. The Multi-smoothing Lemma is the following case of l inequalities in HLP: $D \ge C$, $D + D \ge C + C$, . . . , $lD \ge lC$, $lD + A = lC + B$.

Corollary 1. Apply $-\ln(x)$ to both sides of the intended inequality and split the products to sums: $\sum_{i=1}^{n} (-\ln(x_i)) \ge \sum_{i=1}^{n} (-\ln(y_i))$, which is HLP for the convex $f(x) = -\ln x$ on $(0, \infty)$.


Problem 1. To show that $\sqrt[n]{x}$ is concave, use the Midpoint Rule:
\[ \sqrt[n]{\frac{x+y}{2}} \overset{?}{>} \frac{\sqrt[n]{x} + \sqrt[n]{y}}{2} \iff \frac{x+y}{2} \overset{?}{>} \left(\frac{x^{1/n} + y^{1/n}}{2}\right)^{n} \iff P_1 \overset{?}{>} P_{\frac{1}{n}}, \]
which is the power mean inequality for two variables with $r = \frac{1}{n} < 1$. Alternatively, using the Second Derivative Test (cf. Inequalities I), we calculate $(\sqrt[n]{x})'' = \frac{1}{n}\left(\frac{1}{n} - 1\right) x^{\frac{1}{n} - 2} < 0$ for $x > 0$, so $\sqrt[n]{x}$ is concave there. Thus, all inequalities work with the opposite signs.

Below we will show that Problem 1 is true for any function $\sqrt[m]{x}$ instead of $\sqrt[n]{x}$, where $m > 1$ is fixed. Indeed, by the Smoothing Lemma for the concave $\sqrt[m]{x}$ and $a_{2n-1}$, $a_{2n}$, $a = a_{2n+1} - a_{2n} + a_{2n-1}$, and $a_{2n+1}$:
\[ \sqrt[m]{a_{2n+1}} + \sqrt[m]{a_{2n-1}} < \sqrt[m]{a_{2n}} + \sqrt[m]{a_{2n+1} - a_{2n} + a_{2n-1}} \]
\[ \Rightarrow\; \sqrt[m]{a_{2n+1}} - \sqrt[m]{a_{2n}} + \sqrt[m]{a_{2n-1}} < \sqrt[m]{a_{2n+1} - a_{2n} + a_{2n-1}}. \qquad (15) \]
In the key step, replace $a_{2n} \to a_{2n-1}$ and $a_{2n+1} \to a = a_{2n+1} - a_{2n} + a_{2n-1}$. This increases the LHS and fixes the RHS. Cancelling out the resulting two $a_{2n-1}$ terms, we end up with two fewer radicals and n becomes n − 1. By induction on n (for function $\sqrt[m]{x}$), we only need the base case for n = 2:
\[ \sqrt[m]{a_1} - \sqrt[m]{a_2} + \sqrt[m]{a_3} \overset{?}{<} \sqrt[m]{a_1 - a_2 + a_3}. \]
But this is the general inequality (15) that we already proved earlier with n = 1. Finally, note that our original problem is just the special case of the above for the function $\sqrt[n]{x}$ (when m = n) and 2n + 1 variables.

Problem 8. Suppose that all $a_i < 2$. Iterate the following operation as long as possible: take two numbers $a_i$ and $a_j$ both less than 2, and replace them by 2 and $a_i + a_j - 2$. Now $\sum a_i$ stays the same, but what happens to $\sum a_i^2$? The operation is unsmoothing: $a_i + a_j - 2 < a_i, a_j < 2$, so the middle numbers $a_i$ and $a_j$ are pulled apart to $a_i + a_j - 2$ and 2. As $f(x) = x^2$ is convex, by Lemma 1 the sum of the $a_j^2$’s acts as a “concentration” monovariant and goes up:
\[ f(a_i) + f(a_j) < f(a_i + a_j - 2) + f(2). \]
[Figure: the points $a_i + a_j - 2$, $a_j$, $a_i$, 2 under the graph of $x^2$.]

Both given inequalities are preserved by the operation; however, one more of the $a_i$’s became 2. As we cannot repeat this operation forever, eventually exactly one $a_j$ is $< 2$ and the rest all 2’s. The two inequalities then read:
• $a_j + 2(n - 1) \ge n \;\Rightarrow\; a_j \ge -(n - 2) \;\Rightarrow\; a_j \not< -(n - 2)$;
• $a_j^2 + 4(n - 1) \ge n^2 \;\Rightarrow\; a_j^2 \ge (n - 2)^2 \;\Rightarrow\; |a_j| \ge |n - 2|$.

From here $a_j \ge n - 2 \ge 4 - 2 = 2$, a contradiction with $a_j < 2$. Hence, indeed, one of the original $a_i$’s must be $\ge 2$.
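The unsmoothing operation in the solution of Problem 8 is easy to trace by machine. The following Python sketch is only an illustration: the function name `unsmooth_to_twos` and the sample list are hypothetical, and the sample does not satisfy the hypotheses of the problem; it merely shows which quantities the operation fixes and which it increases.

```python
def unsmooth_to_twos(a):
    """Repeatedly replace two entries a_i, a_j < 2 by 2 and a_i + a_j - 2.
    Returns the final list, in which at most one entry is below 2."""
    a = list(a)
    while True:
        small = [k for k, v in enumerate(a) if v < 2]
        if len(small) < 2:
            return a
        i, j = small[0], small[1]
        # The sum a_i + a_j is preserved; the pair is pulled apart.
        a[i], a[j] = 2, a[i] + a[j] - 2

start = [1, 0, -3, 1, 1]
final = unsmooth_to_twos(start)
print(final, sum(start), sum(final),
      sum(v * v for v in start), sum(v * v for v in final))
```

The count of entries below 2 drops by exactly one per step (this is the discrete auxiliary monovariant), so the loop runs at most n − 1 times.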



Exercise 11. At every step simply average the number that is furthest away from x and any other number on the other side of x. After n (really, $\lceil \frac{n}{2} \rceil$) such steps the maximum distance from x will be at least halved. ♦
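The hint for Exercise 11 translates almost word for word into code. Below is a minimal Python sketch of one possible GPS, using exact fractions so that “on the other side of x” is decided without rounding. The function name `gps`, the tie-breaking (first farthest number, first number on the other side), and the sample inputs are our own assumptions.

```python
from fractions import Fraction

def gps(xs, steps):
    """Run at most `steps` rounds of: average the number farthest from the
    mean with some number strictly on the other side of the mean."""
    xs = [Fraction(v) for v in xs]
    mean = sum(xs) / len(xs)
    for _ in range(steps):
        far = max(range(len(xs)), key=lambda k: abs(xs[k] - mean))
        if xs[far] == mean:          # everything already equals the mean
            break
        # Such a partner exists because the deviations from the mean sum to 0.
        other = next(k for k, v in enumerate(xs)
                     if (v - mean) * (xs[far] - mean) < 0)
        xs[far] = xs[other] = (xs[far] + xs[other]) / 2
    return xs

print(gps([0, 0, 0, 12], 10))
print([float(v) for v in gps([1, 2, 7, 10], 40)])
```

Each step fixes the sum, and within any n consecutive steps every number that was farther than half the current maximum distance gets pulled within that half, which is the weaker of the two bounds mentioned in the hint.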


Lemma 5. We modify the GPS algorithm to approach the weighted average $\tilde{x}$ of x and y. Set $d = y - x$, $a_{-1} = x$, and $a_0 = y$. Since x and y are on opposite sides of $\tilde{x}$, their average $a_1 = (x + y)/2$ is at most $d/2$ from $\tilde{x}$. Continue inductively: at the nth step, average $a_{n-1}$ and the $a_i$ on the opposite side of $\tilde{x}$ that is closest to $\tilde{x}$, resulting in $a_n = (a_{n-1} + a_i)/2$ at most $d/2^n$ from $\tilde{x}$. Writing $a_n = \lambda_n x + (1 - \lambda_n)y$ for some $\lambda_n \in (0, 1)$, we have
\[ 0 = \lim_{n\to\infty} (a_n - \tilde{x}) = \lim_{n\to\infty} (\lambda_n - \lambda)(x - y) \;\Rightarrow\; \lim_{n\to\infty} \lambda_n = \lambda. \]
Incidentally, this GPS proof shows that any real number is the limit of some sequence of rational numbers $\left\{\frac{b_n}{2^n}\right\}$ whose denominators are just powers of 2.
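The construction in Lemma 5 is essentially a bisection, and it can be run exactly with rational inputs. The sketch below assumes x < y and a rational weight λ in (0, 1); the function name `midpoint_sequence` and the sample values are hypothetical.

```python
from fractions import Fraction

def midpoint_sequence(x, y, lam, steps):
    """Return (target, [a_1, a_2, ...]) where target = lam*x + (1-lam)*y and
    each a_n is the midpoint of a_{n-1} and the closest earlier term lying
    strictly on the other side of the target."""
    target = lam * x + (1 - lam) * y
    seq = [x, y]                                  # a_{-1} = x, a_0 = y
    for _ in range(steps):
        last = seq[-1]
        if last == target:                        # landed exactly on the target
            break
        opposite = [a for a in seq if (a - target) * (last - target) < 0]
        partner = min(opposite, key=lambda a: abs(a - target))
        seq.append((last + partner) / 2)
    return target, seq[2:]

target, terms = midpoint_sequence(Fraction(0), Fraction(1), Fraction(1, 3), 12)
print(target, [str(a) for a in terms])
```

With x = 0 and y = 1 the terms are fractions whose denominators are powers of 2, which illustrates the closing remark of the lemma for the number 2/3.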

Lemma 6. For $a_{-1} = x$ and $a_0 = y$, the inequality in Lemma 6 becomes trivially an equality (why?). Assume that Lemma 6 is true for all $k < n$. Since $a_n = \frac{a_{n-1} + a_i}{2}$ for some $i < n$, add the IH inequalities for $a_{n-1}$ and $a_i$:
\[ f(a_n) = f\left(\frac{a_{n-1} + a_i}{2}\right) \overset{\text{MR}}{<} \frac{f(a_{n-1}) + f(a_i)}{2} \]

F A+AC. The Pythagorean Theorem may be helpful. ♦

Because of Exercise 1, we can now safely assume that the shortest path of the farmer must pass through a point X on segment AB (cf. Fig. 1b). Hence we can introduce the non-negative unknowns x = AX and y = BX such that x + y = AB. Moving on to the next step:

PST 93. Describe the quantity in question via some function of the unknowns (and the knowns).

A specific case of this was done in Exercise 4 in Part II. To generalize, the farmer’s route is made of hypotenuses FX and XC in $\triangle FAX$ and $\triangle CBX$:
\[ f(x, y) = \sqrt{a^2 + x^2} + \sqrt{b^2 + y^2} \quad \text{where } x, y \ge 0 \text{ with } x + y = AB. \]

1.2. Famous inequality in disguise. The reader may wonder: why do we not substitute y = AB − x to get rid of one variable in f (x, y)? We will do this later in our Calculus solution; but it is more convenient to proceed in a slightly different fashion in our first approach with inequalities below.



PST 94. Set up the problem as an inequality $A \overset{?}{\ge} B$, where A is the quantity in question and B is a conjectured minimal¹ value of A.

¹ Analogously, setting up the opposite inequality $A \le B$ will work for establishing the maximal value for A.

1. FARMER-AND-COW VIA INEQUALITIES AND CALCULUS


1.2.1. Conjecturing by “cheating”. While the function f(x, y) will undoubtedly serve as the LHS A of our inequality, it is not at all obvious what the RHS B should be! We can take advantage of our prior experience with this problem and make an intelligent “guess”: we can conjecture that the length B of the shortest path will be equal to $F'C$, where $F'$ is the reflection of the farmer F across the river (cf. Fig. 1c). So, what is B?

Exercise 2. Calculate the length of $F'C$ in terms of a, b, x, and y.

Solution: If $AF'EB$ is a rectangle (cf. Fig. 1c), using the right $\triangle F'EC$ shortens our calculations: $F'C = \sqrt{F'E^2 + EC^2} = \sqrt{(x + y)^2 + (a + b)^2}$.

We can now algebraically re-formulate the Farmer-and-Cow problem:

Problem 1. (Inequality Version) For any $x, y, a, b \ge 0$ prove that
\[ \sqrt{a^2 + x^2} + \sqrt{b^2 + y^2} \overset{?}{\ge} \sqrt{(a + b)^2 + (x + y)^2}. \qquad (1) \]

1.2.2. Nothing new under the sun. Inequality (1) is a symmetrically phrased, elegant inequality, so it should not be a surprise that it is well-known. Indeed, it is a special case of a famous and much more general inequality.

Theorem 1. (Minkowski’s Inequality) If all $a_i, b_j \ge 0$ and $r \ge 1$, then
\[ \sqrt[r]{a_1^r + \cdots + a_n^r} + \sqrt[r]{b_1^r + \cdots + b_n^r} \ge \sqrt[r]{(a_1 + b_1)^r + \cdots + (a_n + b_n)^r}. \]
If $0 < r < 1$, then the inequality is reversed.

We shall not attempt to prove Minkowski’s Inequality here (it will be discussed in its full generality in Inequalities II, vol. III). We can, though, prove our special inequality (1) practically with “bare hands”.

1.2.3. Proof by reasoning backwards. We will rewrite (1) in a sequence of simpler but equivalent ways. Since both sides are non-negative, we can square them without changing the direction of the inequality:
\[ (a^2 + x^2) + 2\sqrt{(a^2 + x^2)(b^2 + y^2)} + (b^2 + y^2) \overset{?}{\ge} (a + b)^2 + (x + y)^2. \]
The RHS expands to $a^2 + b^2 + 2ab + x^2 + 2xy + y^2$, causing four squares to cancel from each side and yielding other equivalent versions of the inequality:
\[ 2\sqrt{(a^2 + x^2)(b^2 + y^2)} \overset{?}{\ge} 2ab + 2xy \iff (a^2 + x^2)(b^2 + y^2) \overset{?}{\ge} (ab + xy)^2, \]
where we divided by 2 and squared once again. We now expand:
\[ a^2b^2 + x^2y^2 + a^2y^2 + x^2b^2 \overset{?}{\ge} (ab)^2 + 2(ab)(xy) + (xy)^2, \]
cancel $a^2b^2 + x^2y^2 = (ab)^2 + (xy)^2$, pull everything to the LHS, and rearrange a bit to recognize the well-known formula $c^2 + d^2 - 2cd = (c - d)^2$:
\[ a^2y^2 + x^2b^2 - 2(ay)(xb) \overset{?}{\ge} 0 \iff (ay - xb)^2 \overset{?}{\ge} 0. \]
The last inequality is certainly true for all a, b, x, and y.
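The algebra above can be double-checked on concrete numbers. This Python sketch is a spot check under our own choice of sample values; the function names `farmer_gap` and `squared_form` are hypothetical.

```python
import math

def farmer_gap(a, b, x, y):
    """LHS minus RHS of inequality (1); it should never be negative."""
    return math.hypot(a, x) + math.hypot(b, y) - math.hypot(a + b, x + y)

def squared_form(a, b, x, y):
    """(a^2 + x^2)(b^2 + y^2) - (ab + xy)^2, which the proof rewrites
    as the perfect square (ay - xb)^2."""
    return (a * a + x * x) * (b * b + y * y) - (a * b + x * y) ** 2

print(squared_form(3, 4, 5, 6), (3 * 6 - 5 * 4) ** 2)   # integer check of the identity
print(farmer_gap(3, 4, 5, 6))                           # a generic case
print(farmer_gap(1, 2, 3, 6))                           # here ay = xb
```

The last call uses values with ay = xb, where the two sides of (1) should agree up to floating-point rounding.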




12. RE-CONSTRUCTIONS. PART III

1.2.4. Don’t forget the equality! Our proof so far verified that all paths of the farmer are at least $\sqrt{(a + b)^2 + (x + y)^2}$. As per Exercise 2, the latter is the length of $F'C$ where $F'$ is the reflected, “phantom” farmer. But we must ask: is the corresponding path $F \to X \to C$ (where X is the intersection of $F'C$ with the river) the unique shortest path for the real farmer F?

PST 95. To complete the proof of an inequality $A \ge B$, investigate when equality is obtained. In other words, find a condition (algebraic, geometric, or other) on the involved letters that makes the two sides equal.

The task of solving $A \overset{?}{=} B$ has been trivialized by the last step of our proof: inequality (1) is equivalent to $(ay - xb)^2 \ge 0$. Equality is obviously obtained exactly when $ay = xb$. Furthermore, substituting $y = AB - x$ and solving $a(AB - x) = xb$ for x yields the only possible value $x = a \cdot AB/(a + b)$, as long as we can divide by $a + b$. Therefore, if at least one of a or b is nonzero, there is a unique place on the riverbank such that the corresponding path $F \to X \to C$ is a shortest path for the farmer; namely, this is the point X between A and B with $AX = a \cdot AB/(a + b)$. If $a = b = 0$, then (1) is always trivially satisfied (check it!). In reality, this corresponds to the situation when both the farmer and the cow are at the riverbank: the farmer can dip his bucket into the river anywhere along his way to the cow; i.e., $AX = x$ can be any number between 0 and AB.

1.2.5. Why be restricted to the plane? Rewriting the condition for equality in (1) in the form $x/a = y/b$ makes it reasonable to expect that the general Minkowski’s Inequality will become equality if and only if all ratios $a_i/b_i$ are equal.² The reader curious about more general versions of the Farmer-and-Cow problem can formulate and solve the problem in space (a flying farmer and a flying cow?) and even venture into four or more dimensions.

1.3. A Calculus drill. Only the reader versed in Calculus techniques should read this solution, since we will use but not justify those techniques.

1.3.1. Translation from life to Calculus has essentially been done. Recall our function f(x, y) that measures the length of the farmer’s path for $x, y \ge 0$ such that $x + y = AB$. Setting $AB = c$, we can substitute $y = c - x$ in order to reduce to a one-variable function F(x) for $0 \le x \le c$:
\[ F(x) = f(x, c - x) = \sqrt{a^2 + x^2} + \sqrt{b^2 + (c - x)^2}. \]
We are asked to find the minimum value of F(x) on the interval [0, c].

1.3.2. The critical points of F(x) are places where optimum values could potentially occur. To locate them, we take the derivative $F'(x)$ and set it to equal 0. The derivative itself is calculated by applying the Chain Rule twice:
\[ F'(x) = \frac{x}{\sqrt{a^2 + x^2}} - \frac{c - x}{\sqrt{b^2 + (c - x)^2}} = 0 \iff x\sqrt{b^2 + (c - x)^2} = (c - x)\sqrt{a^2 + x^2}. \]

² Technically, to avoid division by 0, we must stipulate that when some $b_i = 0$ then $a_i = 0$, or rewrite the conditions for equality as $a_i b_j = a_j b_i$ for all i, j.

The last manipulation was simply clearing the denominators. Now note that both sides of the equality are non-negative since x ∈ [0, c]. Squaring and multiplying through leads to:

$$x^2\left(b^2 + (c - x)^2\right) = (c - x)^2\left(a^2 + x^2\right) \iff x^2 b^2 + x^2 (c - x)^2 = (c - x)^2 a^2 + (c - x)^2 x^2.$$

An obvious cancellation leaves us with x²b² = (c − x)²a². Again, since all involved quantities are non-negative, taking the square root on both sides reduces this to xb = (c − x)a, or xb = ya as previously discovered. If both a = 0 and b = 0 (the farmer and the cow are both on the riverbank), the function F(x) is the constant x + (c − x) = c = AB, so it doesn’t matter where the farmer dips his bucket between A and B: he will end up walking straight to the cow and covering the same (minimal) distance AB. If at least one of a and b is not 0 then a + b > 0, and we can solve xb = (c − x)a for x to get x0 = ac/(a + b). This is where the only critical point of F(x) occurs and where F(x) has a potential minimum or maximum.

1.3.3. Optimum realized. To check what really happens at x0, we investigate how the sign of F′(x) changes as x moves through x0. Thus, instead of the equality F′(x) = 0, we try to solve the inequality $F'(x) \overset{?}{>} 0$.

Exercise 3. Check that all of the algebraic manipulations of the equalities above in Subsection 1.3.2 can be redone as inequalities: start with $F'(x) \overset{?}{>} 0$ for 0 ≤ x ≤ c, replace everywhere “=” by “$\overset{?}{>}$”, and show that eventually you will arrive at the following equivalent inequalities:

$$xb \overset{?}{>} (c - x)a \iff x(a + b) \overset{?}{>} ca \iff x \overset{?}{>} \frac{ac}{a + b} = x_0.$$

To summarize, F′(x) > 0 if x > x0 and F′(x) < 0 if x < x0 (cf. Fig. 2a). This means that the original function F(x) decreases before x0 and increases after x0, i.e., F(x0) is the global minimum of F(x) on [0, c].

1.3.4. No more “guessing”. We can now find the minimum of our function:

Exercise 4. Calculate F(x0) and simplify it as much as possible.

Answer: After some non-taxing algebraic manipulations, one arrives at

$$F(x_0) = F\left(\frac{ac}{a + b}\right) = \sqrt{(a + b)^2 + c^2}.$$ ♦

If we recall that c = AB = x + y, the expression for F(x0) should not be surprising: F(x0) is precisely the “mysterious” RHS of the inequality A ≥ B in our previous approach. We conclude that

$$F(x) \ge F(x_0) = \sqrt{(a + b)^2 + c^2}, \quad \text{with equality iff } x = x_0 = \frac{ac}{a + b}.$$
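As a quick numerical sanity check of the Calculus solution, here is a minimal sketch that compares the closed form with a brute-force search for the data a = 2, b = 6, c = 4. The function names `path_length` and `critical_point` and the 1001-point sampling grid are illustrative choices of ours, not part of the original text.

```python
import math

def path_length(x, a, b, c):
    """F(x) = sqrt(a^2 + x^2) + sqrt(b^2 + (c - x)^2)."""
    return math.hypot(a, x) + math.hypot(b, c - x)

def critical_point(a, b, c):
    """x0 = a*c/(a + b); only defined when a + b > 0."""
    if a + b == 0:
        raise ValueError("a = b = 0: every x in [0, c] is optimal")
    return a * c / (a + b)

a, b, c = 2.0, 6.0, 4.0
x0 = critical_point(a, b, c)
closed_form = math.hypot(a + b, c)  # sqrt((a + b)^2 + c^2)

# Brute force over 1001 equally spaced points of [0, c] (assumed grid).
grid_min = min(path_length(c * k / 1000, a, b, c) for k in range(1001))

print(x0, path_length(x0, a, b, c), closed_form, grid_min)
```

The grid search can only confirm the minimum up to the sampling resolution; the sign analysis of F′(x) is what actually proves it.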


1.3.5. The big versus the really big picture. Our investigation of the derivative F′(x) can be used to show that the local behavior of F(x) on the interval [0, c] extends to a global behavior on (−∞, ∞). More precisely,

Exercise 5. Using the sign of F′(x) again, show that F(x) decreases for all x < 0 and increases for all x > c.

[Figure 2. Graphs of F(x) = √(2² + x²) + √(6² + (4 − x)²): (a) the sign of F′(x) on either side of x0 in [0, c]; (b) F(x) on [−2, 6]; (c) F(x) on [−100, 100].]

The expected shape of the graph of F(x) is confirmed by Figure 2b in the original case of the problem with a = 2, b = 6, and c = 4, on the interval [−2, 6]. The graph basically looks like a smile,³ with the bottom of the smile at x0 = ac/(a + b) = 1. However, as we enlarge the interval to, say, [−100, 100] (cf. Fig. 2c), the graph of the function starts resembling a wedge: it “straightens out” into two lines as x moves further away from x0 = 1. If you are familiar with the necessary Calculus techniques,

Exercise 6. Show that $F(x) = \sqrt{2^2 + x^2} + \sqrt{6^2 + (4 - x)^2}$ has two slant asymptotes: y = 2x − 4 when x → ∞ and y = 4 − 2x when x → −∞.

Alternatively said, F(x) ≈ |x| + |4 − x| when |x| is large. Thus, the length of the farmer’s path changes approximately linearly when he approaches the river at places X very far from the cow.
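The "straightening out" can be observed numerically. The sketch below measures how far F(x) sits above each candidate asymptote for growing |x|. The helper name `asymptote_gaps` and the sample points are assumptions made for illustration; the check suggests, but does not prove, that both gaps shrink to 0.

```python
import math

def F(x):
    """Path length for a = 2, b = 6, c = 4, extended to all real x."""
    return math.hypot(2, x) + math.hypot(6, 4 - x)

def asymptote_gaps(x):
    """For x > 0, return (F(x) - (2x - 4), F(-x) - (4 - 2(-x)))."""
    right_gap = F(x) - (2 * x - 4)        # distance above y = 2x - 4 at +x
    left_gap = F(-x) - (4 - 2 * (-x))     # distance above y = 4 - 2x at -x
    return right_gap, left_gap

for x in (10.0, 100.0, 1000.0, 10000.0):
    print(x, asymptote_gaps(x))
```

Both gaps stay positive, so the graph approaches each line from above.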

2. Optimal Bridge Located!

Let us apply the techniques we have developed so far to:

Problem 2. (Optimal Bridge) Two villages are situated on opposite banks of a river, not necessarily across from each other. The river has constant width. The farmers’ market is always held in the same village. The other village wants to build a bridge across the river (and perpendicular to the banks of the river) so that the total trip to the farmers’ market is as short as possible. Where should the bridge be built and why?

Footnote 3. The “smile” refers to the formal term convex: to show that F(x) is convex on (−∞, ∞), you could verify that the second derivative F″(x) > 0 for all x.

2.1. Inequalities: something old. The memory from applying inequalities should be still fresh in our minds. So, let’s first attempt the inequalities approach in the Optimal-Bridge problem.

2.1.1. Appropriate labeling. For clarity, let’s draw a rectangle whose sides are parallel or perpendicular to the river and two of whose diagonally opposite vertices are our villages V1 and V2 (cf. the picture below). Let V1A = a and V2B = b be the distances from V1 and V2 to the river. If X1 and X2 are the beginning and the end of the bridge, then X1X2 = d is the fixed width of the river. Our unknowns are the distances from A and B to the respective ends of the bridge: AX1 = x and BX2 = y. Note that the height of our rectangle is the fixed a + d + b, while its base is some c, also fixed.

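[Picture: the rectangle with the villages V1 and V2 at diagonally opposite vertices, the points A and B on the riverbanks, the bridge X1X2 of length d, and the segments labeled a, b, x, y, and c.]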

2.1.2. Restricting the possibilities. As ridiculous as it may seem, the bridge could be built outside our rectangle. We know how to proceed:

Exercise 7. Eliminate the cases when the bridge is outside the rectangle by showing that the resulting routes are not the shortest possible.

Sketch: If the bridge X1X2 is to the left of our rectangle, then V1 → A → A′ → V2 (with A′ directly across the river from A), the (dashed) route on the picture going straight up from village V1 to the river, will be shorter than the route V1 → X1 → X2 → V2 through the bridge X1X2. ♦

[Picture: the bridge X1X2 built to the left of the rectangle, with the dashed route from V1 straight up through A.]

2.1.3. We’ve done this before! We have justified that the optimal bridge must be inside our rectangle, and hence our unknowns x and y are non-negative and add up to x + y = c. Further, the total length of the route from V1 to V2 is V1X1 + X1X2 + X2V2, which can be expressed as the following function:

$$f(x, y) = \sqrt{a^2 + x^2} + d + \sqrt{b^2 + y^2} \quad \text{for } a, b, x, y \ge 0.$$

By now the reader has, no doubt, seen the connection with the special case of Minkowski’s inequality, proven in the Farmer-and-Cow situation:

$$\sqrt{a^2 + x^2} + \sqrt{b^2 + y^2} \ge \sqrt{(a + b)^2 + (x + y)^2} = \sqrt{(a + b)^2 + c^2}.$$

Therefore, the length of the shortest route is (again!) $\sqrt{(a + b)^2 + c^2} + d$, attained iff ay = bx.

2.2. The “magic” transformation: something new. If neither of the villages is on the riverbank, we found that the shortest route is obtained when x/a = y/b. In turn, this implies that two right triangles are similar: △V1AX1 ∼ △V2BX2. Is there a geometric explanation of this phenomenon?

Unfortunately, both the width d of the river and the fact that the villages are on opposite sides of the river make our previous idea of reflecting across the river useless here. Below we uncover another transformation that will elegantly explain the situation and lead to a purely geometric solution.

2.2.1. Rearranging parts for a better understanding. As we observed, every route consists of three parts: walking from V1 to the bridge, walking across the bridge, and then walking to V2. While the first and the third parts depend on where the bridge is built, the middle part is kind of a “constant”:
• it always goes in the same direction; e.g., we can assume (as in our figures) that walking across the bridge is in the north direction; and
• it has a fixed length of d.



PST 96. If some quantity (whether algebraic or geometric) consists of several parts, try swapping some of these parts: this may give you an advantageous angle by viewing the quantity in a different, easier way.

In the bridge situation, why not first walk the “constant” middle part of the route and then follow it by the other two parts of the route? To this end, we ignore temporarily the river: this will enable us to arbitrarily build “bridges” on land and walk on water in any direction without a bridge. Thus,
(a) First walk north from V1 to point Y for a distance of d.
(b) Then walk straight from Y to village V2.

In effect, this swapped the first segment V1X1 of the route with the second, bridge-part X1X2. To recover our original route, we need to swap back these two parts! There is one convenient place to break the walk YV2 in order to build the bridge:
(c) Let YV2 intersect the riverbank of V2 in point X2.
(d) Build the bridge at X2, going back to point X1 across the river.
(e) Let the villagers take the route V1 → X1 → X2 → V2.

[Picture: the translated point Y at distance d north of V1, the segment YV2 meeting the far riverbank at X2, the optimal bridge X1X2, and another bridge Z1Z2 for comparison.]
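Steps (a)-(e) translate directly into coordinates. The following is a minimal sketch under an assumed frame of our own choosing: V1 = (0, 0), the near bank is the line y = a, the far bank is y = a + d, and V2 = (c, a + d + b). The function name `optimal_bridge` and this coordinate frame are illustrative and do not appear in the original text.

```python
import math

def optimal_bridge(a, b, c, d):
    """Follow steps (a)-(e) in the assumed frame V1 = (0, 0), V2 = (c, a + d + b).

    Returns (X1, X2, route_length). Requires a + b > 0; when a = b = 0
    the segment Y V2 runs along the far bank and X2 is not unique.
    """
    if a + b == 0:
        raise ValueError("both villages are on the riverbanks")
    y_point = (0.0, d)                    # (a) walk north from V1 by d
    v2 = (c, a + d + b)                   # (b) head straight for V2
    t = a / (a + b)                       # (c) parameter where Y V2 meets y = a + d
    x2 = (y_point[0] + t * (v2[0] - y_point[0]), a + d)
    x1 = (x2[0], a)                       # (d) bridge goes straight back across
    length = (math.hypot(x1[0], x1[1]) + d
              + math.hypot(v2[0] - x2[0], v2[1] - x2[1]))   # (e)
    return x1, x2, length

x1, x2, length = optimal_bridge(2.0, 6.0, 4.0, 3.0)
print(x1, x2, length, math.hypot(2.0 + 6.0, 4.0) + 3.0)
```

For these sample values the construction places the bridge at AX1 = ac/(a + b) = 1, and the route length agrees with the value √((a + b)² + c²) + d obtained from the inequality approach.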

2.2.2. Wait! Is this the most optimal route? A proof is in order here.



Exercise 8. Take another route V1 → Z1 → Z2 → V2 (going over an actual bridge Z1 Z2 , of course!) and show that it is longer than the route V1 → X1 → X2 → V2 proposed by the algorithm (a)-(e) above. Hint: Using two parallelograms and the Triangle Inequality, re-direct everything through Y without changing the overall length of the routes. ♦



2.2.3. What is the “magic” transformation? If you think about what happened above, for each route V1 → Z1 → Z2 → V2 we found another route V1 → Y → Z2 → V2 of equal length (not necessarily through a bridge) that always started with segment V1 Y ; i.e., the useful transformation turned out to be a translation V1 → Y from village V1 to the north by distance d.


The idea of the translation in the plane can also explain the aftermath of our previous inequality solution, where by algebraic calculations we discovered that the shortest route occurs if x/a = y/b.

Exercise 9. Justify geometrically that △V1AX1 ∼ △V2BX2 for the optimal bridge X1X2.

Solution: According to our algorithm (a)-(e), in the shortest route V1 → X1 → X2 → V2 the first and third segments are parallel: by construction, V1X1 || YV2 and X2V2 lies on YV2, so V1X1 || X2V2. But the two riverbanks are also parallel to each other. Thus, angles ∠V1X1A and ∠V2X2B are formed by two pairs of parallel sides and therefore they are equal (why?). Since △V1AX1 and △V2BX2 are right triangles, the AA criterion implies the desired similarity △V1AX1 ∼ △V2BX2.

[Picture: the optimal route V1 → X1 → X2 → V2 together with the points A, B, and Y.]

To wrap up the discussion, from equal ratios in △V1AX1 and △V2BX2 we have AX1/V1A = BX2/V2B, i.e., x/a = y/b. In other words, the inequalities and the translation solutions yield the same optimal bridge.

2.2.4. Is the optimal bridge always unique? So far we worked only with the case when none of the villages was directly at the river. This caused the existence of △YZ2V2 and a unique optimal bridge. To complete the picture,

Exercise 10. Investigate the special cases when one or both of the villages are on their corresponding riverbanks. How many optimal bridges are there and how do we locate them?

2.3. Why only two villages? If you want to test everything you’ve learned so far in Parts I-III about solving optimization problems, bump up the number of villages to three, change the river to a railroad track, and try to come up with a variety of approaches (purely geometric, inequalities, and Calculus – anything counts!) to the following challenge problem:

Problem 3. (Optimal Station) There are three villages near a railroad track: one is situated right by the track, and the other two are built symmetrically on opposite sides of the track. Where should the villages build a joint train station so that the total commute from the three villages to the station is the shortest possible?

Hint: Let V3 be the village at the railroad track and V1 and V2 the other two villages. Two different situations occur depending on how ∠V1V3V2 compares to 120°. ♦

3. Infinitely Many Angles and Infinite Series

The next challenge will require both trigonometry and advanced Calculus techniques. Read on only if you are fluent in both.

Problem 4. (ℵ0–Squares) Glue to each other infinitely many identical squares with bases AA1, A1A2, A2A3, A3A4, A4A5, and so on, to form an infinite row (cf. Fig. 3). If D is the top left corner of the first square, right above A, what is the sum ∠AA1D + ∠AA2D + ∠AA3D + ∠AA4D + ···?

[Figure 3. α1 + α2 + α3 + α4 + α5 + ··· = ?]

3.1. Finite or infinite? Problem 4 asks us to find the sum of all angles αi. From the Three-Squares problem, we know that α1 + α2 + α3 = 90°. So, let’s concentrate on finding the sum of the rest of the αi’s. To this end, define the partial sum sn to be sn = α4 + α5 + ··· + αn for any n ≥ 4. From right △DAAn, tan αn = 1/n. Luckily, the formula from Part II for tangent of a sum will link recursively all values of tan sn:

$$\tan s_n = \tan(s_{n-1} + \alpha_n) = \frac{\tan s_{n-1} + \tan \alpha_n}{1 - \tan s_{n-1} \tan \alpha_n} = \frac{\tan s_{n-1} + \frac{1}{n}}{1 - \tan s_{n-1} \cdot \frac{1}{n}}.$$

Thus, tan α4 = tan s4 = 1/4 and tan s5 = (1/4 + 1/5)/(1 − 1/4 · 1/5) = 9/19.

Exercise 11. Starting with tan s5 and rounding to the nearest tenth, calculate the first dozen terms of {tan sn}. Is the sequence increasing?

Solution: The values of tan s5 through tan s16 are approximately: 0.5, 0.7, 0.9, 1.2, 1.5, 1.9, 2.4, 3.1, 4.2, 6.0, 10.1, and 27.9. Since the tangent function is increasing on [0°, 90°), it is no surprise that the sequence seems to be increasing. . . . But the very next term will make us stop in our tracks: tan s17 = −43.6 < 0. To cause the tangent to be negative, we must have gone over the right angle, i.e., s17 = α4 + ··· + α17 > 90°. Thus, the sequence {tan sn} is not increasing.
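The recursion for tan sn is easy to run by machine. The sketch below uses exact rational arithmetic so that no rounding error creeps in. The function name `tan_partial_sums` and the cutoff n = 82 are our own illustrative choices.

```python
from fractions import Fraction

def tan_partial_sums(last_n):
    """Exact values of tan(s_n) for n = 4, ..., last_n, where
    s_n = alpha_4 + ... + alpha_n and tan(alpha_n) = 1/n."""
    tans = {4: Fraction(1, 4)}
    for n in range(5, last_n + 1):
        prev = tans[n - 1]
        step = Fraction(1, n)
        # Tangent-of-a-sum formula applied to s_{n-1} + alpha_n.
        tans[n] = (prev + step) / (1 - prev * step)
    return tans

tans = tan_partial_sums(82)
rounded = [round(float(tans[n]), 1) for n in range(5, 18)]
# Indices n at which tan(s_n) changes sign relative to tan(s_{n-1}).
sign_changes = [n for n in range(5, 83) if (tans[n] > 0) != (tans[n - 1] > 0)]

print(tans[5])
print(rounded)
print(sign_changes)
```

A sign change from positive to negative signals that sn has just passed 90°, and a change from negative back to positive signals that it has passed 180°.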



PST 97. To find out if a sum is finite or infinite, investigate the first partial sums and make a conjecture in order to know what type of proof to expect, because the techniques in the finite vs. infinite case will be different. With this in mind, we keep on investigating the sequence {tan sn }. To go over another 90◦ , i.e., to turn the tangent positive again, check that you will need to wait much longer: tan s81 ≈ −0.01 and tan s82 ≈ 0.002. So far, (α1 + α2 + α3 ) + (α4 + · · · + α17 ) + (α18 + · · · + α82 ) > 3 · 90◦ = 270◦ .


Given the evidence, there is no reason to expect that the sum of the infinitely many angles αn will be finite! We are compelled to make the following

Conjecture 1. The sum α1 + α2 + ··· + αn + ··· is unbounded.

The conjectured infinite sum will necessitate a completely different approach compared to that in the Three-Squares problem. Before we dedicate the rest of the section to proving the conjecture, it is interesting to ponder over an analogous, “semi-finite” version of the Three-Squares problem:

Problem 5. We know that α1 + α2 + α3 = 90°. Is there a place beyond α3 where the sum of the angles up to αn is an exact multiple of 90°, i.e., are there natural numbers n, k ≥ 4 for which α1 + α2 + α3 + α4 + ··· + αn = 90°·k?

3.2. Inverse trigonometry. As stated, Conjecture 1 is difficult to prove because we can easily calculate tan αn = 1/n, but we are trying to find the sum of the inputs αn, not of the outputs 1/n. We are looking at the “wrong” function: tan x! Instead, we should be looking at its inverse arctan y, which is defined for all reals. In particular, arctan(1/n) = αn and we can rewrite:

Conjecture 1′. arctan 1 + arctan(1/2) + ··· + arctan(1/n) + ··· = ∞.

This is a Calculus problem about series, and a sequence of Calculus exercises will help us justify that the series diverges.

3.3. Bounding from below will serve as our first step.



PST 98. Let the terms an of a sequence be given by some function f(x). To show that the an’s add up to ∞ (the sum has no upper bound), find a lower bound for f(x), i.e., another function g(x) such that f(x) ≥ g(x), and show instead that the corresponding terms bn given by g(x) add up to ∞.

In our case, an = f(1/n) with f(x) = arctan x, and the bn’s should be given as bn = g(1/n). If you haven’t worked before with Taylor series, the choice we will make here for the lower bound for arctan x will seem to come out of nowhere. As we shall see later, it is not a guess at all.

Exercise 12. The function arctan x for 0 ≤ x ≤ 1 is bounded from below by a cubic polynomial g(x); namely, arctan x ≥ x − x³/3 for all x ∈ [0, 1].

We shall first go through a less technical proof that avoids Taylor series and relies on analysis with derivatives to minimize a function.

Proof 1: Pull everything to the LHS to form a new function h(x) = arctan x − x + x³/3. To show that h(x) ≥ 0 on [0, 1], calculate and simplify the derivative: h′(x) = 1/(1 + x²) − 1 + x² = x⁴/(1 + x²), and note that it is always non-negative! Hence h(x) increases for all x. In particular, for x ≥ 0 we have h(x) ≥ h(0) = 0. Unraveling, arctan x − x + x³/3 ≥ 0, i.e., arctan x ≥ x − x³/3 for x ≥ 0.
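A numerical spot check of the lower bound is a useful companion to the proof. The sketch below samples [0, 1] on an assumed grid of 1001 points and reports the smallest gap between arctan x and the cubic. The name `cubic_lower_bound` and the grid are illustrative choices, and a finite sample can of course only support the inequality, not prove it.

```python
import math

def cubic_lower_bound(x):
    """g(x) = x - x^3/3, the claimed lower bound for arctan x on [0, 1]."""
    return x - x ** 3 / 3

xs = [k / 1000 for k in range(1001)]          # sample points in [0, 1]
smallest_gap = min(math.atan(x) - cubic_lower_bound(x) for x in xs)
gap_at_one = math.atan(1.0) - cubic_lower_bound(1.0)   # pi/4 - 2/3

print(smallest_gap, gap_at_one)
```

The gap vanishes only at x = 0 and is largest at the right endpoint, in line with h(x) being increasing.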



As the picture shows, arctan x is sandwiched between the polynomials x − x³/3 and x. We proved above that arctan x ≥ x − x³/3 for x ≥ 0. For practice, using the derivative techniques above,

Exercise 13. Show that x ≥ arctan x for x ≥ 0. What happens among the functions x, arctan x, and x − x³/3 when x ≤ 0?

[Picture: the graphs of x, arctan x, and x − x³/3.]

3.4. The price to pay for demystifying the cubic polynomial x − x³/3 is using Taylor expansions. It is a standard exercise in Calculus to derive the Taylor expansion of arctan x centered at x = 0 and find the interval where it converges to arctan x. We will discuss this calculation only in the Hints section, and leave it to the advanced reader to investigate the topic in a Calculus textbook. The result needed for our purposes is:

Exercise 14. For any x ∈ [−1, 1],

$$\arctan x = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots.$$

The RHS looks like a polynomial of “infinite” degree, but we need only the degree-3 polynomial x − x³/3 made of its first terms! Why does dropping the higher powers of x yield the desired inequality arctan x ≥ x − x³/3 for x ≥ 0?



PST 99. Given an equality between a function and an infinite series (such as in Exer. 14), group the unwanted terms in the RHS and show that each group is positive (or each group is negative, as needed). Then drop all such grouped terms to produce an inequality in the desired direction.

Equipped with PST 99, we can justify again the lower bound for arctan x.

Proof 2 of Exercise 12: Restricting the Taylor expansion of arctan x to 0 < x ≤ 1, note that the absolute values of the terms decrease as n grows:

$$\frac{x^{2n+1}}{2n+1} \overset{?}{\ge} \frac{x^{2n+3}}{2n+3} \iff \frac{2n+3}{2n+1} \overset{?}{\ge} x^2 \iff 1 + \frac{2}{2n+1} \overset{?}{\ge} x^2,$$

and the last is certainly true because 1 ≥ x². Leaving alone the first two terms x − x³/3, we can therefore group the remaining (unwanted) terms into pairs with non-negative differences when x ∈ [0, 1]:

$$\left(\frac{x^5}{5} - \frac{x^7}{7}\right) + \left(\frac{x^9}{9} - \frac{x^{11}}{11}\right) + \cdots + \left(\frac{x^{2n+1}}{2n+1} - \frac{x^{2n+3}}{2n+3}\right) + \cdots \ge 0.$$

As a result, arctan x ≥ x − x³/3 for x ∈ [0, 1].



3.5. Classic infinite and finite sums. Recall that we wanted to show that all arctan(1/n) add up to ∞. From Exercise 12, we know that their sum will be at least the corresponding sum of values of x − x³/3; namely,

$$\sum_{n=1}^{\infty} \arctan\frac{1}{n} \ge \sum_{n=1}^{\infty} \left(\frac{1}{n} - \frac{1}{3n^3}\right). \tag{2}$$

So by PST 98 we need to verify that the RHS of (2) adds up to ∞. Part of this RHS sum, known as the harmonic series, will be infinite:

Exercise 15. Show that the reciprocals of all natural numbers add up to ∞:

$$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} + \cdots = \infty.$$

On the other hand, the rest of the RHS of (2) is known to be finite:

Exercise 16. Show that the sum of the reciprocals of all cubes of natural numbers is bounded from above, i.e., for some number B:

$$1 + \frac{1}{2^3} + \frac{1}{3^3} + \cdots + \frac{1}{n^3} + \cdots < B.$$

Note that Exercise 16 is not asking us to find the exact value of the sum of all 1/n³. This value is denoted by ζ(3), after the famous Riemann zeta-function ζ(z). It turns out that $\zeta(2) = \sum 1/n^2 = \pi^2/6$; but no closed formula is known for ζ(3)! Hence, our task here is only to show that ζ(3) is finite, or what is equivalent here, that it is bounded from above by a number B.

There are many ways to do Exercises 15-16; for example, by integrals. We will go instead along paths accessible without Calculus knowledge but requiring advanced PSTs for sequences and some non-trivial thinking.

3.5.1. Doubling the index adds another half. To start off, for any n ≥ 1 let an = 1 + 1/2 + 1/3 + ··· + 1/n. The an’s are called the partial sums of the harmonic series. Since 1/n > 0, the sequence of partial sums {an} is increasing.



PST 100. To show that an increasing sequence $\{a_n\}$ goes to ∞, it is enough to show that a subsequence $\{a_{n_k}\}$ of it goes to ∞.

The choice of a convenient subsequence $\{a_{n_k}\}$ depends on the specific example. For our harmonic series, something inventive needs to be done.

Solution to Exercise 15: The slick approach is to consider the subsequence $\{a_{2^k}\}$ made of every $(2^k)$th term. Now, every next term $a_{2^{k+1}}$ is a sum of twice as many fractions as the previous term $a_{2^k}$. How will this increase the value of $a_{2^k}$? Check the beginning:

$$a_{2^0} = a_1 = 1, \qquad a_{2^1} = a_2 = 1 + \tfrac{1}{2} = 1\tfrac{1}{2},$$

$$a_{2^2} = a_4 = 1 + \tfrac{1}{2} + \left(\tfrac{1}{3} + \tfrac{1}{4}\right) > 1 + \tfrac{1}{2} + \left(\tfrac{1}{4} + \tfrac{1}{4}\right) = 2,$$

$$a_{2^3} = a_8 = a_4 + \left(\tfrac{1}{5} + \tfrac{1}{6} + \tfrac{1}{7} + \tfrac{1}{8}\right) > 2 + \left(\tfrac{1}{8} + \tfrac{1}{8} + \tfrac{1}{8} + \tfrac{1}{8}\right) = 2 + 4 \cdot \tfrac{1}{8} = 2\tfrac{1}{2}.$$

A pattern emerges: when we double the index from $2^k$ to $2^{k+1}$, the terms $a_{2^k}$ increase by at least a half, which is the brilliant idea in this approach:

$$a_{2^{k+1}} = a_{2^k} + \frac{1}{2^k + 1} + \frac{1}{2^k + 2} + \cdots + \frac{1}{2^{k+1}} \ge a_{2^k} + 2^k \cdot \frac{1}{2^{k+1}} = a_{2^k} + \frac{1}{2},$$

with strict inequality as soon as k ≥ 1. Using induction, one can formally show that $a_{2^k} \ge 1 + \frac{k}{2}$ for all k ≥ 1. But the new, smaller sequence $\{1 + \frac{k}{2}\}$ obviously goes to ∞, pushing the larger sequence $\{a_{2^k}\}$ to go to ∞. Retracing our steps, by PST 100 we conclude that the original (increasing) sequence $\{a_n\}$ is also forced to go to ∞.
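The doubling pattern can be tabulated exactly. The sketch below computes the harmonic partial sums at the indices 2^k with rational arithmetic and compares each with the bound 1 + k/2. The function name `harmonic` and the range k = 0, ..., 10 are illustrative choices of ours.

```python
from fractions import Fraction

def harmonic(n):
    """Exact partial sum 1 + 1/2 + ... + 1/n of the harmonic series."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

table = []
for k in range(0, 11):
    a = harmonic(2 ** k)
    bound = 1 + Fraction(k, 2)
    table.append((k, 2 ** k, a, bound))
    print(k, 2 ** k, float(a), float(bound), a >= bound)
```

The partial sums grow without bound, but very slowly: it takes about twice as many terms to gain each further half.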


3.5.2. Telescoping for convergence. Turning now to the partial sums of the series $\sum \frac{1}{n^3}$, $b_n = 1 + \frac{1}{2^3} + \frac{1}{3^3} + \cdots + \frac{1}{n^3}$, we must change tactics because $\sum \frac{1}{n^3}$ and the harmonic series $\sum \frac{1}{n}$ behave in opposite ways!

PST 101. To prove that a sequence $\{b_n\}$ is bounded from above, find another sequence $\{c_n\}$ greater than it and bounded from above. Symbolically, if $b_n \le c_n$ and $c_n \le B$ for all n, then $b_n \le B$ for all n.

Solution to Exercise 16: Confirm the following chain of events:

$$\frac{1}{n^3} < \frac{1}{n^2} < \frac{1}{n(n-1)} \overset{(*)}{=} \frac{1}{n-1} - \frac{1}{n} \quad \text{for any } n > 1,$$

where in (∗) we split the fraction as a difference of two simpler fractions. If you remember the telescoping method from Induction (vol. I), you will recognize that our solution is about to employ this method:

$$b_n = 1 + \frac{1}{2^3} + \frac{1}{3^3} + \cdots + \frac{1}{(n-1)^3} + \frac{1}{n^3} \le 1 + \left(\frac{1}{1} - \frac{1}{2}\right) + \left(\frac{1}{2} - \frac{1}{3}\right) + \cdots + \left(\frac{1}{n-2} - \frac{1}{n-1}\right) + \left(\frac{1}{n-1} - \frac{1}{n}\right).$$

Almost all intermediate terms cancel, leaving only three surviving fractions:

$$b_n \le 1 + \frac{1}{1} - \frac{1}{n} = 2 - \frac{1}{n} < 2.$$

Thus, an upper bound for all $b_n$’s is B = 2, and ultimately, $\sum_{n=1}^{\infty} \frac{1}{n^3} \le 2$.

To show that all fractions $\frac{1}{n^3}$ actually add up to something, a strong Real Analysis theorem needs to be invoked (cf. the Hints section).

3.6. Concluding arguments. Recall inequality (2) from Subsection 3.5, which provided a lower bound for our desired sum of arc-tangents: $\sum_{n=1}^{\infty} \arctan\frac{1}{n} \ge \sum_{n=1}^{\infty} \left(\frac{1}{n} - \frac{1}{3n^3}\right)$. If we stop the sum on the RHS at some n and regroup the terms, the partial sums $a_n$ and $b_n$ discussed above will spring up:

$$\left(\frac{1}{1} - \frac{1}{3 \cdot 1^3}\right) + \left(\frac{1}{2} - \frac{1}{3 \cdot 2^3}\right) + \cdots + \left(\frac{1}{n} - \frac{1}{3 \cdot n^3}\right) = \left(\frac{1}{1} + \frac{1}{2} + \cdots + \frac{1}{n}\right) - \frac{1}{3}\left(\frac{1}{1^3} + \frac{1}{2^3} + \cdots + \frac{1}{n^3}\right) = a_n - \frac{1}{3} b_n > a_n - \frac{2}{3},$$

where in the last inequality we used $b_n < 2$. Since $\{a_n\}$ goes to ∞, then $\{a_n - \frac{2}{3}\}$ also goes to ∞, making the whole RHS of (2) also go to ∞. This in turn pushes the larger sum $\sum \arctan\frac{1}{n}$ on the LHS of (2) to go to ∞. Translating back to our original Problem 4, $\arctan\frac{1}{n} = \alpha_n$ and the infinitely many angles in Figure 3 do add up to ∞: α1 + α2 + ··· + αn + ··· = ∞. This completes the proof of Conjecture 1.
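The chain of bounds in this argument can be checked on finite partial sums. The sketch below computes a_n and b_n exactly and compares the partial sum of arc-tangents (in floating point) with the lower bounds a_n − b_n/3 and a_n − 2/3. The function name `partial_sums` and the sample values of n are assumptions made for illustration.

```python
import math
from fractions import Fraction

def partial_sums(n):
    """Return (a_n, b_n): exact partial sums of sum 1/k and sum 1/k^3."""
    a = sum(Fraction(1, k) for k in range(1, n + 1))
    b = sum(Fraction(1, k ** 3) for k in range(1, n + 1))
    return a, b

rows = []
for n in (1, 10, 100, 1000):
    a_n, b_n = partial_sums(n)
    arctan_sum = sum(math.atan(1 / k) for k in range(1, n + 1))
    telescoping_ok = b_n <= 2 - Fraction(1, n)      # b_n <= 2 - 1/n
    lower = a_n - b_n / 3                           # regrouped RHS of (2)
    rows.append((n, telescoping_ok, arctan_sum, float(lower), float(a_n) - 2 / 3))
    print(rows[-1])
```

In every row the angle sum (in radians) stays above both lower bounds, and all three columns keep growing with n.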


4. Historical Detour: from Today back to Archimedes?

We managed to conquer the Infinitely Many Squares problem using anything but geometry! Yet, its predecessor, the Three-Squares problem, yielded to a variety of geometric ideas. In fact, 54 proofs of it that use only elementary geometry can be found in Charles Trigg’s article [82] from 1971 in the Journal of Recreational Mathematics. Our own investigation of the Three-Squares problem prominently included the brilliant 5th-grade solution in Part I, based on the specific tiling of the 2 × 3 grid-rectangle shown in Figure 4a.

[Figure 4. Tilings in the Three-Squares and Stomachion]

For someone who has followed the recent great discoveries of ancient mathematical works, this discussion may have triggered a memory of other tilings: Figure 4b represents one possible solution to the famous Stomachion, a 14-piece puzzle attributed to Archimedes.⁴ The task is to take the pieces out and then reassemble them back into the square shape. At first glance, the pieces are so distinct that it seems just a few configurations are possible; but our intuition is very far from the truth! It was only in 2003 that William Cutler, via a computer program, proved that there are 17,152 possibilities. Discarding those that can be obtained from each other by rotations and reflections, he showed that the number of truly different ways to arrange the puzzle is exactly 536 [18]. And there is more amazing combinatorics related to the problem! For example, as pointed out by Fan Chung and Ron Graham [14], there are 3 pairs of pieces such that no matter how we rearrange the 14 original pieces, these 6 pieces will line up within each pair next to each other exactly as shown by the shaded figures in Figure 4c (and as one can check too in Figure 4b). In other words, after gluing the pieces within these pairs, we are left to play with only 11 pieces.

Footnote 4. The Stomachion is a 950 AD copy of a work of Archimedes by a Byzantine scribe. It is also the last paper in the Palimpsest, a collection of several manuscripts that were scraped, washed, and reused in the 13th century for a Christian liturgical book. Having a fascinating history on its own of being discovered, re-discovered, and lost in the 19th and 20th centuries, the Palimpsest finally became available again to the public after it was purchased by an anonymous bidder in 1998 for over $2,000,000. This led to a decade of scholarly research that heavily relied on technological advances, making the original papers in the Palimpsest readable and overturning century-held beliefs.


Back to Archimedes, it is not completely clear what his ultimate goal was in working on the puzzle. Unfortunately, only the beginning of the Stomachion is preserved in the manuscript, and it is hard to judge where the text was actually leading. Alexander Givental from UC Berkeley conjectured that if the whole of the paper were recovered it would show that Archimedes was solving the problem of comparing angles of triangles on a grid lattice, thus discovering the basics of trigonometry. We may never know whether this conjecture is true or not. But we certainly did use trigonometry in the 8th -grade solution to the Three-Squares problem, and the general idea of tilings on the grid lattice appeared in both the Stomachion and in the 5th -grade solution to the Three-Squares problem. Since geometry (and, for that matter, combinatorics) was entirely absent in our approach to the Infinitely Many Squares problem, a gap is begging to be filled by the most curious, persistent, and advanced readers: Problem 6. (Super Challenge) Find a purely geometric argument, perhaps along the lines of tiling up the grid lattice, to prove that the sum of all angles αn in the Infinitely Many Squares problem is ∞. Do you think Archimedes would have been able to come up with your solution?

5. Hints and Solutions to Selected Problems

Exercise 1. From right △FAY we have FY > FA. By the Pythagorean Theorem for right △CBY and right △CBA, and from YB > AB (A is between Y and B), we have $YC = \sqrt{CB^2 + YB^2} > \sqrt{CB^2 + AB^2} = AC$. Adding the two inequalities verifies that the path through Y is longer than the path through A: FY + YC > FA + AC.

Exercise 5. The text formula for F′(x) still works when x < 0 or x > c:

$$F'(x) = \frac{x}{\sqrt{a^2 + x^2}} - \frac{c - x}{\sqrt{b^2 + (c - x)^2}}.$$

In case x < 0, the first fraction is negative while the second fraction is positive (why?), making the overall difference negative: F′(x) < 0 for x < 0. This implies that F(x) decreases when x < 0. Argue similarly to show that F′(x) > 0 for x > c. ♦

Exercise 6. More generally, for any function $g(x) = \sqrt{A^2 + (x - B)^2}$ we will show that g(x) ≈ |x − B| when |x| is large. Indeed, rationalizing the “numerator” of the difference g(x) − |x − B|, we obtain:

$$\frac{g(x) - |x - B|}{1} \cdot \frac{g(x) + |x - B|}{g(x) + |x - B|} = \frac{g^2(x) - (x - B)^2}{g(x) + |x - B|} = \frac{A^2}{g(x) + |x - B|}.$$

Since both g(x) and |x − B| go to ∞ when x → ±∞ (why?), the denominator goes to ∞, forcing the whole fraction to converge to 0. In other words, when |x| is large we have g(x) − |x − B| ≈ 0, i.e., g(x) ≈ |x − B|.

Applying this to the two square root functions appearing in our F(x), we have $\sqrt{2^2 + x^2} \approx |x|$ and $\sqrt{6^2 + (4 - x)^2} \approx |4 - x|$, so that F(x) ≈ |x| + |4 − x| when |x| is large. Thus, when x → ∞, F(x) ≈ x + (x − 4) = 2x − 4, and when x → −∞, F(x) ≈ −x + (4 − x) = 4 − 2x.

Exercise 7. The middle parts of the routes are equal: X1X2 = AA′ = d. From right triangles △X2BV2 and △V1AX1, we have V1A […] 3/4 > 2/3 = 1 − 1/3.

Exercise 16. To show that the fractions 1/n³ add up to something (i.e., the partial sums bn converge to a limit, called the sum of $\sum \frac{1}{n^3}$), we enlist the following theorem from Real Analysis:

Theorem 2. (Monotone Bounded Theorem (MBT)) If a sequence is monotone (i.e., either only increasing or only decreasing) and bounded (from above and from below), then it converges to a number, called its limit.

In our case, the partial sums bn of $\sum \frac{1}{n^3}$ increase because we keep adding positive fractions. The bn’s are also bounded from below by, say, 0, and from above by 2 (as shown in the text). Thus, by MBT, {bn} converges to some (finite) limit L = ζ(3).

[Figure 5. Optimal Station when ∠V1V2V3 is < 120° or ≥ 120°]

Problem 3. In Figure 5 and in this solution, the village at the railroad track is labeled V2, and V1 and V3 are the two villages placed symmetrically about the track. The optimal station will be located at a point T along the railroad, inside △V1V2V3, and such that the three angles between arms TV1, TV2, and TV3 are as equal to each other as possible. In case ∠V1V2V3 < 120°, T will be the unique such point with ∠V1TV3 = 120°, making the three angles all equal to 120° (cf. Fig. 5a). If ∠V1V2V3 ≥ 120°, then T will coincide with village V2 (cf. Fig. 5b). Can you find a geometric way to justify the answer? ♦
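The stated answer can be tested numerically in coordinates. The sketch below assumes a frame of our own choosing: the track is the x-axis, the track village V2 sits at the origin, and the symmetric villages are V1 = (p, q) and V3 = (p, −q) with p ≥ 0 and q > 0. Under this assumption the 120° condition gives T = (p − q/√3, 0) whenever that value is positive, and T = V2 otherwise. The names `station_position` and `total_commute` and the brute-force grid are hypothetical helpers for illustration.

```python
import math

def station_position(p, q):
    """Abscissa t of the station T = (t, 0) in the assumed frame
    V2 = (0, 0), V1 = (p, q), V3 = (p, -q), with p >= 0 and q > 0."""
    return max(0.0, p - q / math.sqrt(3))

def total_commute(t, p, q):
    """Sum of the distances from V1, V2, V3 to the point (t, 0)."""
    return abs(t) + 2 * math.hypot(p - t, q)

for p, q in ((3.0, 1.0), (1.0, 3.0)):
    t = station_position(p, q)
    apex = math.degrees(2 * math.atan2(q, p))        # angle V1 V2 V3
    # Brute force over stations (k/1000, 0) with -1 <= k/1000 <= 4.
    brute = min(total_commute(k / 1000, p, q) for k in range(-1000, 4001))
    print(p, q, round(apex, 1), t, total_commute(t, p, q), brute)
```

The first sample has ∠V1V2V3 below 120° and yields a station strictly between V2 and the foot of the two symmetric villages; the second has the angle above 120° and sends the station to V2 itself.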

Epilogue 1. What Comes from Within It is the 1980s. A sunny 5th grade classroom in Bulgaria. The math teacher opens the class register, calls two girls to the board, and gives each a problem. Soon enough, one of the girls writes a correct solution, receives an A, and goes back to her seat. The other girl is stuck; she tries one approach, then another; but the boat and ship in her problem go up and down the river and refuse to meet in simple mathematical equations . . . . Meanwhile, the other students “tame” the vessels in their notebooks, and the teacher moves on with the new lesson. The girl remains at the board for the rest of the period, her tears making it even harder to think about the problem. The bell rings. The teacher beckons the girl and asks: “You know what grade you deserve, yes?” A nod. “Well, I will not give it to you if you explain the correct solution to me by the next math class.” The still sobbing girl goes home in a miserable mood, yet with a big hope. Her father (a shipbuilding engineer, speaking of coincidences) helps her derive a system of two linear equations in two variables. From here on the solution is easy, and so the girl explains it to the teacher the next day. Having avoided the poor grade, she doesn’t stop there: “May I come to your math circle?” she inquires. Three months later, to her classmates’, parents’, and her own amazement, that girl wins the local Math Olympiad with a perfect score. Her fate is sealed right then: math will be her future. Sure enough, she will continue for years with her ballet, piano, and guitar lessons; she will attend a poetry circle and compete at science and literature olympiads; but her passion . . . her passion will always be for math problem solving. 
Later that year she would devise her own way of conquering the last row of the Rubik’s Cube (having learned to solve the first two at her math circle); in a couple of years she would represent Bulgaria at the International Mathematical Olympiads (IMO); then go on to a math major at Bryn Mawr and a doctorate at Harvard; train the USA math team for the IMOs . . . and come full circle by founding the Berkeley Math Circle in 1998.


That girl is me – not angry at my middle school math teacher for putting me on the spot in front of the whole class, rather, grateful to her for giving me a second chance, for seeing the seed of talent in me, for accepting me and nurturing my mathematical curiosity at her math circle, and for propelling me forward with the belief that “what comes from within will take you far.”

2. The Culture of Circles

2.1. All you need is love. There is more than one way to fall in love with mathematics. Many Eastern European mathematicians have come along the path of math circles, where they have learned for the first time that the world of math is larger than one could imagine, more interesting, and more diverse. The math circle culture is ingrained in the societies in these countries. During the communist era, established mathematicians and pre-college teachers considered it their duty to expose the younger generation to the wonders of mathematics. And so they teamed together to found and run math circles. In my hometown of Rousse ( ), for example, the math circles used to meet twice a week in the afternoons or after dinner for 1.5-hour sessions. The elementary/middle school math circle started in 3rd grade and included about 25 kids of the same age from my school. The high school math circle started in 8th grade, held its sessions at the local science youth center, and involved about 15 students from several schools, about half of whom made the circle’s core and competed at local, national, and international olympiads. Concurrently, there were identically organized circles at all grade levels 3–11. The material covered ranged from basic algebra and geometry to advanced olympiad problem solving, to lower- and upper-division college topics.

2.2. Worthy of a circle. Mathematics was not the only subject “worthy of a circle”. Starting in late middle or early high school, there were circles in chemistry, physics, and biology; in English, poetry, and literature. I participated in just about all of them at one time or another. I tried many fields because the opportunities were there for me to explore. The math circles were only part of a large net of pre-college circles created to draw children and discover their talents.
It was no more prestigious or “cool” to attend a soccer club or take music lessons than to be a member of, say, a high school physics circle. In fact, parents knew how important the advanced knowledge gained in circles would be for their children’s future and hence enthusiastically supported circle participation.

2.3. And they said higher math wasn’t practical? It was to my advantage to attend math circles in particular. The type of thinking and specific knowledge I mastered there helped me win science olympiads, e.g., devise systems of equations to balance chemical elements or solve a quadratic equation in physics in 7th grade. I was heavily courted by my high school teacher to participate in biology olympiads, for they often involved combinatorial gene-counting or probability theory: a piece of cake for math circlers.


Even composing poetry and critiquing literature apparently benefited from my “math-set” of mind. My favorite story here (which, incidentally, landed in the Philadelphia Inquirer in the early 1990s) goes back to the mandatory two-semester freshman English course at Bryn Mawr. As the only non-native speaker of English in my class, I put a tremendous effort into the weekly essay assignments, practically sleeping with the dictionary under my pillow every weekend before the homework was due. Still awaiting my final grade in January, I was sitting one day on the floor in my dormitory and assembling my spring schedule. The phone rang, and, to my surprise, my English instructor spoke at the other end of the line: “Are you a math major, by any chance?” I answered affirmatively and steeled myself for the worst. The instructor exclaimed: “It figures! You write so clearly and in such a structured way, yet your personality shows through your words! Even though I disagree with half of the arguments in your final essay [‘How to Read the Beatles’] you wrote them so convincingly, like a true mathematician . . . . I exempt you from the second semester of English. You should take a higher-level course: I can teach you nothing more in writing in this course.”

I didn’t end up taking another English course (probably a mistake on my part); but needless to say, my math (and literature) circle training was responsible for the above remarkable exemption.

3. Eastern European vs. USA Math Circles

3.1. He loves me; he loves me not! On the larger scale, the math circles shaped my future by drawing me like a powerful magnet to the world of mathematical problem solving. Since that 5th grade dramatic experience, I knew within me that no subject but math would complete me, and no profession other than one in mathematics would be satisfactory to me. It is because I loved math at school that I went to the math circle to get more of it. Unfortunately, students in the U.S. by and large do not like their math classes. And let us not deceive ourselves: generally, the talented middle and high school students are bored by the low-level math, the relentless repetition, and the lack of advanced ideas or challenging problems. And it is because they don’t like math in school that they come to the Berkeley Math Circle. Ironic, isn’t it?

3.2. Frequently asked questions. Here are some more differences between Eastern European and U.S. math circles. Keep in mind that not all U.S. circles follow the BMC model, and neither are my hometown math circles (HMC) identical twins of the other Eastern European math circles.

3.2.1. Age of circlers. While in HMC all students were about the same age, U.S. math circles may incorporate students of a variety of ages; e.g., BMC ordinarily engages students in two or three different grades, but sometimes ranging from 4th to 12th grade, all sitting and learning in the same room.


3.2.2. Logistics. HMC met twice a week for 1.5 (or more) hours. The HMC were numerous and organized in such a way that students ordinarily could go there and get home without parents’ assistance. U.S. math circles, due to transportation issues and conflict with other established school and out-of-school activities (e.g., volleyball team, music lessons, chorus, etc.), may meet only once a week for 2-hour sessions. The large area covered by the one BMC (from Sacramento to San Jose, from Palo Alto to Orinda and Danville) calls for parents to drive their kids across the long distances and forces the evening BMC time (6–8 pm) during the week, or alternative weekend sessions whose timing presents other obstacles to families and organizers.

3.2.3. Home base. While HMC were based either at a school or at a local math/science center, their U.S. counterparts are usually university-based. A sufficient number of teachers in Eastern Europe were qualified to lead math circles on their own, with some occasional support of materials and instructors from a nearby university. Alas, this is not the case in the U.S.

3.2.4. Topics in HMC were organized in modules, providing continuity and gradual increase of difficulty and depth of the material. This was possible mostly because the students had very similar math background, level of knowledge, and mathematical maturity and because circlers attended all sessions: transportation issues did not exist and other activities were deprioritized by the math circles. In the U.S., the circlers may vary from beginners to seasoned members of the national USA math team, and hence single powerful sessions incorporating the various levels and backgrounds are more practical than long sequences of linked sessions. Besides, the sparsity of U.S.
math circles and competing activities (which multiply as the student gets older) means regular weekly attendance is not always possible; hence missing one session should not preclude understanding the following one. For the BMC-advanced group, the sessions are usually singletons, with occasional series of 2 sessions. For the BMC-intermediate group, the sessions are often in a series of 2, while for the BMC-beginners group a single instructor undertakes a module of 3–4 thematically arranged sessions. (The BMC-elementary groups have the same instructor throughout the whole year, and topics tend to last for a month or two of sessions.) The younger the students, the more continuity in topics and instructors is provided at BMC.

3.2.5. Session leaders in HMC were only one or two teachers who organized the specific math circle. Occasionally we had guest speakers from the local university, and once in a while we were visited by professors from Sofia University or the National Youth Science/Math Center who trained the Bulgarian national team. In contrast with HMC, each BMC instructor leads an average of 2 sessions per year, accounting for approximately 50 instructors at the BMC-Upper every year. They are mathematicians from nearby universities and colleges, some specially trained high school teachers, some professionals working in related fields, and even some alumni and current advanced circlers.


3.2.6. Popularity. Everyone in Eastern Europe knew about the math circles; children and parents alike were well aware of the opportunity to enroll and of the possibilities which successful participation might open in the students’ future. What portion of the U.S. population has an inkling that math circles exist? Negligible. What status do math circles have in U.S. society and its educational system? Unclear. Can they compare in popularity to membership of a high school football or debate team? No, they can’t.

Figure 1. Football or Math Circle?

3.2.7. Government support. The overall organization and funding in the socialist model math circle was entirely secured by the state; a math circle was an extracurricular activity roughly equivalent to one course each semester and was thus correspondingly compensated by the Ministry of Education. By contrast, SF Bay Area math circles, for instance, are partially funded (if at all) by private sources; the remaining “funds” come from volunteers’ donated time, effort, professionalism, and enthusiasm.

Undoubtedly, the reader has more questions, and the comparison list can go on and on. But this Epilogue is not intended as an exhaustive study of the math circle phenomenon. For more details on U.S. Math Circles, see Sam Vandervelde’s “Circle in a Box” [84].

3.3. Get to the point. One way to resolve most of the problems associated with math circles in the U.S. is . . . (OK, start dreaming!) . . . to have a math circle at every college and university.

(1) The professor organizing and running the math circle will receive a one- or two-course release from the math department, depending on the frequency, length, and intensity of the circle sessions. This will compensate for the huge effort involved in directing a math circle and will hopefully encourage more mathematicians to get involved in educating the talented youth of the U.S.
(2) The math circle can be formally organized as a math course and, thus, be open also to undergraduates.
(3) Undergraduate and graduate students, as well as interested postdocs and tenured faculty, can be vertically integrated in this model.
(4) A modest semester fee for non-university participants (pre-college students and teachers) will provide honoraria to the session leaders.
(5) The math department can provide secretarial and computing support and office supplies, as well as a work-study student assistant and web administrator.

The math circle will be an invaluable math program offered to the local community and can be viewed as part of the math department’s outreach activities. This network model will resolve transportation problems at least for the urban and suburban areas (i.e., areas with an institution of higher education), will mobilize previously disinterested math faculty, and will give some tangible and formal recognition to the work of math circle leaders. An NSF VIGRE grant for the University of Utah ensured the above model for their math circle, led by Peter Trapa and Dan Ciubotaru [83]. Other university-based circles approaching this model were founded at San Jose State University [71], University of California at Davis [19], Stanford University [78], University of California at Los Angeles [47], and others. Below we’ll examine more closely the model of the Berkeley Math Circle [11].

4. History and Power

Despite the shortfalls of U.S. math circles’ set-up, don’t get me wrong: I founded and ran one such circle for a decade and plan on doing so for at least another decade. If I had to describe the Berkeley Math Circle in one phrase, it would simply be a “high-power version of my hometown math circle”. But let’s start from the beginning.

4.1. To marvel and to be appalled. By my last year of graduate studies at Harvard, I had taught enough math courses to question the quality and depth of pre-college math education in the U.S. The few strong (very strong!) undergraduates never took calculus or linear algebra (apparently having taken them at some university while in high school) but jumped directly to upper-division courses like real analysis, abstract algebra, or number theory, to name a few. The cream of the crop, former USAMO winners and IMO medalists, even ventured into graduate courses like algebraic geometry or topology, or Lie algebras (why not?). Each and every such top student had beaten his/her own path out of the jungle of U.S. secondary math education by hiring tutors, by escaping to a nearby university, or, if extremely talented in problem solving, by qualifying for the 30-student, one-month Mathematical Olympiad Summer Program (MOSP), in preparation for the IMOs. As I marveled at the super-advanced math knowledge and skills those relatively rare students had acquired through very special personal circumstances, I was appalled at the general math level of the remaining huge bulk of undergraduates. We are talking here about problems in dealing with fractions and simple algebraic manipulations, with which, I am sure, a 6th grader in Bulgaria would have felt perfectly comfortable!


4.2. The missing link. In addition to the outrageous discrepancy between the “top” and the “generic” math student, the link between secondary and college math education – the math circles – was nonexistent as a system. It seemed to me there was no statewide system in the U.S. to meet the needs of talented math students, to discover and train them, to inspire them to continue on with advanced mathematics. And so, I decided it was high time to get acquainted, first-hand, with secondary education in the U.S.: I enrolled in the Massachusetts teacher certification program. The two high schools for my practicum, Newton North and Chelmsford, offered me an interesting mixture of classes from almost remedial algebra to a problem-solving course of my design. I saw the mathematical potential in a number of students, the desire to go beyond the regular school curriculum. But I realized too that the math teachers were overburdened with courses, never-ending administrative chores and extracurricular activities; the additional load of running math circles (assuming some teachers were qualified and willing) was inconceivable unless the school supported the enterprise financially and administratively. And these were two of the good and prosperous schools in the Boston area.

4.3. The chicken or the egg. I didn’t have time to think about the situation in the bad schools, as I graduated from Harvard and moved in 1997 to Berkeley to take up a postdoctoral position at the Mathematical Sciences Research Institute (MSRI). It wasn’t a month into my new job when I got an e-mail from Hugo Rossi (then the Deputy Director of MSRI) asking MSRI members for suggestions on possible outreach activities to the community.
About 10 minutes later, Hugo and I were in agreement that a regional Math Olympiad for pre-college students would be the right thing to do: an Olympiad different from the numerous fast-type calculational contests, an Olympiad consisting of a few hard essay-proof problems for several hours, in the true fashion of Eastern Europe. I met Paul Zeitz (University of San Francisco) a week later, and definite plans to start the Bay Area Mathematical Olympiad (BAMO) were set in motion. To publicize the plan, in the late fall of 1997 MSRI asked me to give a talk to an audience of 400 people at a bi-annual public event. Sandwiched between two spectacular lectures on the mathematics behind “Brain Waves” and “Toy Story”, was my modest presentation “The High School Olympiads - Excitement, Talent, and Determination” (cf. MSRI streaming video [79]). Years afterward, people still remember it by a single picture: that of a chicken and an egg.


The idea was that BAMO would get its participants mainly through newly founded school-based math circles around the SF Bay Area and would serve as an annual focal event for their activities. The Olympiad and the math circles would complete and strengthen each other and would be founded at the same time: neither would exist without the other. The mathematical community would support the math circles with materials and occasional session leaders; but the circles would be run by teachers at their schools. In the audience were Tom Davis (Silicon Graphics), Tom Rike (Oakland High School), Quan Lam (UC Berkeley President’s Office), Brian Conrey (Director of the American Institute of Mathematics in Palo Alto (AIM)), and Donald Knuth (Stanford), who all expressed desire to help with the new circle and Olympiad movement. MSRI and AIM then launched a series of events with local teachers and the media to publicize BAMO and to encourage the start-up of many math circles. Alexander Givental and Bjorn Poonen (UC Berkeley), John McCuan (MSRI), Dmitry Fuchs (UC Davis), Tatiana Shubin (SJSU), Joshua Zucker (then at Henry Gunn High School), and others were attracted through these events and pledged their support.

4.4. The “temporary” is the most permanent. One of these public events stands out in my mind as the conception of the Berkeley Math Circle. It was half a year later, in April 1998. Thirty or so local teachers had gathered at MSRI to learn about BAMO and to experience a math circle mock-session. Everyone was elated after the presentations; people were talking excitedly. But when a poll was taken of how many teachers were interested in starting a math circle at their own school, guess what? There was not a single hand up in the air! This was a wake-up call for all of us . . . more precisely, a bucket of icy water on my hot head. I remember sitting in my chair and puzzling over it: “What shall we do? BAMO can’t survive without math circles . . . .
But the teachers are obviously not ready to undertake the enterprise on their own. Is this the end of it?” Still reeling with the thought, I started circulating among my colleague-professors asking if they were willing to deliver several sessions a year at a temporary math circle, to serve as an example to teachers, so that they would learn how it is done and would then start their own math circles. I got affirmative answers from seven and undertook the task of organizing a 1–2 year trial math circle in Berkeley. I must have been out of my mind, not realizing at the time what an enormous responsibility, both academic and administrative, I was willingly adding to my full-time job. But that’s what a new baby requires: sacrifice and effort and devotion. I had more than enough of each, as I was carrying a lifelong gratitude for my own childhood math circles and wanted to convey the wonders of mathematics to the young generation of the United States, my new home. What I didn’t know was that this project was far from temporary: that it would go on year after year, until we would now be celebrating 15 years of BMC and the present book series would be our new baby.


There must have been more “crazy” people in the SF Bay Area at that time. A twin to BMC was born: the San Jose Math Circle [71] came into existence the same week as BMC, mid-September 1998, under the tender care and never-ending enthusiasm of Tatiana Shubin and Tom Davis, and is still operational. For a few years Tom Rike, Joshua Zucker, and John Howe led their own school-based circles in Oakland, Henry Gunn, and Presentation High Schools, respectively. Sam Vandervelde had a circle for two years at Stanford [78] (now led by parents). With MSRI’s guidance and support, Paul Zeitz and Brandy Wiegers launched a different type of math circle in San Francisco [70] and Oakland [62]. Sharon Madison opened the Sudbury Math Circle (Canada) as a chapter of BMC, and Olga Radko also fashioned the LA Math Circle [47] after BMC. The SF Bay Area network has expanded now to a number of math circles across the U.S.: very few school-based and not nearly as many as needed, but certainly way more than a decade ago.

4.5. Mapping out the future. Zooming back in on the Berkeley Math Circle, the services it offers begin with the weekly sessions and the monthly contests, but certainly do not end there. BMC has become a center for communications between students, parents, instructors, teachers, educators, and university administrators, where the circlers’ present and future mathematical education is mapped out. This kind of mentoring is possible only in the presence of both “sides”: high quality instructors and students. The more than 50 BMC instructors per year range from teachers and students to university faculty and real world tycoons. Among them are mathematicians: Alexander Givental, Alexandre Chorin, Bernd Sturmfels, Bjorn Poonen, Dmitry Fuchs, Elwyn Berlekamp, Federico Ardila, Joe Buhler, Kiran Kedlaya, Olga Holtz, Ravi Vakil, Robin Hartshorne, Serge Lang, Vera Serganova, and many more.
Some famous alumni have also contributed sessions to the circle: Gabriel Carroll, Maksim Maydanskiy, Inna Zakharevich, Neil Herriot, Andrew Dudzik, Austin Shapiro, Oaz Nir, and Evan O’Dorney, all of whom have chosen career paths in or related to mathematics. The accomplishments of the BMCers are stellar. For example, half of the BAMO grand prizes and brilliancy awards have been captured by the BMCers, including the only brilliancy award won by a girl, Hoan Ngo (Oakland High School), and the only BAMO-8 grand prize won by a girl, Laura Pierson (then a 6th grader at Oakland’s Hillcrest School), as well as a dozen gold and silver medals at the IMOs and a dozen USAMO wins. In 2007, Evan O’Dorney, as an 8th grader, scored perfectly at BAMO and won the National Spelling Bee, meeting and enchanting the then-President Bush; the next year he scored highest at the USAMO and received the Clay Olympiad Scholar Award [15] for one of his solutions; he went on to earn the second highest score in the world at the IMO ’10 in Kazakhstan and received a congratulatory call from President Obama, meeting him a year later when in Washington to be awarded the first place prize at the Intel Science Talent Search in 2011. Several multiple-time Putnam Fellows¹ are also among our students. But most importantly, original mathematical research has been conducted by several circlers, including Gabriel Carroll, Tiankai Liu, Maksim Maydanskiy, Evan O’Dorney, and others.

5. Does the U.S. Need Top-Tier Math Circles?²

“I wish to state in no uncertain terms how important programs for our talented young people are to the future of this country. The best place to develop the highest end mathematical talent is in groups where young people can feed off each others’ excitement, guided by the best minds in the field. The model of top-tier math circles has been honed over decades in other countries. An American version has been in place for a decade and has shown measurable and almost unbelievable results. Now is the time to make these programs a permanent feature of our educational landscape. The community is ready to assist in any way possible. Universities are happy to provide facilities. Professors are happy to volunteer their time. Parents are happy to spend countless hours. And the reason we do this is that when you see these kids catch fire, it takes your breath away.”
Ravi Vakil
Four-time Putnam Fellow
Professor of Mathematics
Stanford University

5.1. Early birds. Creative people start at a very young age to think “outside-of-the-box” and to make significant contributions to the world. Some notable examples are Bill Gates, who at age 20 dropped out of Harvard to run Microsoft full-time; Steve Jobs, who founded Apple at age 21; and, more recently, Mark Zuckerberg, who created Facebook, a social graph platform, at age 19. The best young minds in the U.S. deserve our support. The Top-Tier Math Circles are venues for such support: they nurture individuals who are capable of significant accomplishments by giving them advanced training in problem-solving tools that are found in no other U.S. educational institution.

As another example, a month before Evan O’Dorney [50] qualified for his first IMO in Spain ’08, the 9th grader was exempted from his final in a linear algebra class at UC Berkeley. The reason: he solved an open problem posed in an article by Professor William Kahan [45]; more precisely, Evan found out how small one can make the Cayley transform of a real orthogonal matrix by reversing the signs on selected columns.

¹ The William Lowell Putnam Mathematical Competition [64] is the premier Mathematical Olympiad for college students in the world. A Putnam Fellow is among the top 5 scorers.
² Excerpts from [85].


“BMC has taught me a number of useful mathematical concepts and theories and exposed me to challenging problems. Writing problems for the Monthly Contests provided an outlet for my creative mind. BMC also introduced me to the top local, national, and international mathematical contests. The mentorship I receive through BMC is invaluable.”
Evan O’Dorney, BMC alumnus
Junior at Harvard University
Two gold, two silver IMO medals
1st prize, Intel Science Talent Search ’11
National Spelling Bee Champion ’07

BMCer Gabriel Carroll was a high school junior when he took time off from IMO participation to work at the Research Science Institute at MIT. Without any prior experience in algebraic topology, he studied the link between posets and geometric figures, and his paper “Homology of Narrow Posets” [63] won the third place prize at the Intel Science Talent Search ’01. Gabriel went on to win two gold and one silver medals at the IMOs, achieving one of only four perfect scores at IMO ’01. He conquered the Putnam four times, two of those four while still in high school. A quote by him appears in the beginning of the Introduction.

After winning a BAMO grand prize and the Regents’ and Chancellor’s Scholarship to UC Berkeley, BMC alumnus Maksim Maydanskiy attended two top undergraduate research programs: the Penn State REU and the REU in Duluth, Minnesota. His first project was inspired by Monsky’s Theorem on triangulations of the square and resulted in the paper “Triangles Gone Wild” [46]. His Duluth work “The Incidence Coloring Conjecture for Graphs of Maximum Degree 3” [51] extended the previously known result that all Hamiltonian cubic graphs have an incidence 5-coloring to all cubic graphs.

“The impact of the math circle program on my personal mathematical development is hard to overestimate. It was, and continues to be, the single most vibrant source of mathematical activity for high school students in the Bay Area. The lectures introduced me to many areas of mathematics, a number of which came up again in my later studies. The opportunity to meet a variety of people from fellow students to professors, the college campus setting, the overall atmosphere – all of that made BMC unique. The program helped me to shape my plans for undergraduate education. It was an experience no other sources could provide. The program has a great effect on mathematical youth in the Bay Area.
It provides an interaction media and stimulating environment, both encouraging further involvement from students already interested in mathematics and promoting mathematics to a wider audience.”
Maksim Maydanskiy, BMC alumnus
BAMO ’00 grand prize
Ph.D. in mathematics, MIT
Institut de Mathématiques de Jussieu, Paris


5.2. The ultimate measure: more testimonials. An important contribution that top-tier math circles make is to challenge the exceptional students and by doing so to keep them interested in science and mathematics.

“The math circle was so crucial to my education and interest in math; I can hardly imagine studying math at Harvard if it weren’t for it.”
Tiankai Liu, SJMC and BMC alumnus
Three-time IMO gold medalist
Two-time Putnam Fellow
Ph.D. student in mathematics, MIT

Over and over again, our circlers write about the impact of the program on their understanding of mathematics and their future; about a “different side of math” which they can acquire at the math circle but not at school; about “mind-bending” and “constantly challenging” sessions; about “gaining confidence” and finding a place where they “feel accepted”. Starting with a senior at BMC, we will move to quotes from younger and younger circlers.

Evan Chen from Fremont, who teamed up with Evan O’Dorney for the last three years to coordinate the Monthly Contest, was a USA IMO ’13 Candidate and a USAJMO ’10, ’11 Winner. He received perfect scores at BAMO ’12 and Asian-Pacific MO ’13 and was selected to participate at the Research Science Institute in the summer of 2013 at MIT.

“The Berkeley Math Circle was an unparalleled educational opportunity for me, both as a student and instructor. The lectures burrowed into countless different areas of mathematics, most of which I otherwise would not have seen until much later, and many which I would likely have not seen at all. The opportunity to plan and deliver my own sessions and to teach students proof-writing through the monthly contests has also been an invaluable pedagogical experience (and lots of fun!).”
Evan Chen, BMCer, 12th grader

The moment Laura Pierson from Oakland walked into BMC as a 5th grader, it was obvious that she was special beyond any regular measures. As a 6th grader, she made history: she won the BAMO-8 Grand Prize in 2012 with a perfect score and conquered USAJMO ’12, thereby becoming the youngest to have been invited to MOSP. She went on to win silver medals on the U.S. (high school!) teams at the European and China Girls Math Olympiads in 2013 and 2012, respectively. She astounded her professors at UCB when, as a seventh grader, she received the top scores in multi-hundred student Calculus II and the upper-division Linear Algebra courses. She was accepted to College Preparatory School in Oakland, skipping 8th grade.

“BMC has opened up a whole new world for me. It sparked my passion for math and introduced me to whole new areas of math I had no idea existed. I’ve also gotten to meet so many amazing people who share my passions and who I can connect with and learn from. In many ways it’s been a really life-changing experience.”
Laura Pierson, BMCer, 9th grader


Nico Brown from Mill Valley is the kind of kid about whom you have no doubt: he “breathes” mathematics just as he breathes air. “Precocious” does not come even close to describing the mature interest in pure mathematics which Nico spontaneously exudes. He has 13 accepted sequences on the Online Encyclopedia of Integer Sequences, a database peer-reviewed by mathematicians. A multiple winner of the Monthly Contest and the Winner in the Individual Countdown Round of the Berkeley Mini-Math Tournament ’13, Nico expresses his passion most prominently through his work at mathnik.com on “original mathematics and proof writing, particularly in number theory.” “Most weeks start on Monday mornings, but mine start on Tuesday nights with the Berkeley Math Circle. It’s the highlight of my week for a couple of reasons. Reason #1: The math, of course, but math I wouldn’t see otherwise, such as the chromatic number of the plane or matrices, brought in by people who love math like me. Reason #2: I’ve met two of my best friends at BMC. For kids who love math, it’s rare to meet others who feel the same; so combining math with friendship is why I keep coming back. BMC also stands for ‘Best Math Community’.” Nico Brown, BMCer, 6th grader

Vincent Pisani from Castro Valley has been in BMC for three years and, as one of the youngest participants, has bravely taken any and all tests offered at the circle, including AMC8, AMC10, and BAMO. Given that he was awarded the Johns Hopkins 2012 High Honors, it may come as an anticlimax to know that he also received credit for the California High School Algebra requirement based on the results of a test taken as a 4th grader. A programmer and iPad App developer, Vincent is an accomplished trumpet player. “I really enjoy going to the Berkeley Math Circle. Each week has a new topic, so I get to learn about a huge variety of mathematical topics, unlike school. I have also met several great friends who also enjoy math, including a professor from USF. I get together with them often to share and work on math. BMC feeds my appetite for learning about math, and I think it is worth driving all the way to Berkeley each Tuesday.” Vincent Pisani, BMCer, 6th grader

Arav Karighattam from Davis joined the circle two years ago and won over everyone with his smile and irrepressible enthusiasm for math. He received the BAMO Young Student Achievement Award in ’12 and ’13, was one of the top students in the Junior High category of the mathleague.org California State Championships in ’12 and ’13, qualified for AIME in ’13 (as a 4th grader) and in ’14, and continues to amaze his UC Davis professors in upper-division courses such as Combinatorics, Euclidean Geometry, Number Theory, and Real Analysis. He has also won music and poetry competitions, including the Composers Today California State Contest in ’13 and the ‘Voices of Lincoln’ Young Poet Contest in ’11, ’12, and ’13.

“There are many things I love about the Berkeley Math Circle. First, I like the range of advanced topics taught at each session. Second, I enjoy all the open problems presented at the circle during certain lectures (e.g., which permutations are Wilf-equivalent?). That is why I don’t like to miss a single session of BMC, rain or shine. It is an extraordinary experience.” Arav Karighattam, BMCer, 5th grader

Espen Slettnes is a third grader at BMC, who rapidly moved from the BMC-Elementary to the BMC-Intermediate group in only two years and received, not surprisingly, the 2012 High Honors Award from Johns Hopkins University’s Center for Talented Youth and Math Kangaroo’s 2013 5th place in California and 10th place nationwide. He is also a Young Scholar at Davidson Institute for Talent Development and was selected to participate at the Epsilon Camp for exceptionally gifted young children in 2013 and 2014. “I am 8 years old, and I love math. BMC is an important part of my math education, because it is one of the only places I get to work on real math that I don’t get to do in school. The lectures introduce me to many different math topics and help me dive deeper into topics I already know. I also love participating in the BMC monthly contests, which exercise my mind and help me improve my skills in writing mathematical proofs. I am very glad to be part of BMC.” Espen Slettnes, BMCer, 3rd grader

5.3. The gathering storm. There are a number of studies of the deteriorating situation in U.S. math and science education and its impact on the scientific and technological presence of the U.S. in the world. To describe just how critical the situation is, we refer below to three such reports. “The United States is losing its edge in innovation and is watching the erosion of its capacity to create new scientific and technological breakthroughs. Increased global competition, lackluster performance in mathematics and science education, and a lack of national focus on renewing its science and technology infrastructure have created a new economic and technological vulnerability as serious as any military or terrorist threat.” A Commitment to America’s Future, 2005 [13]

The National Academy of Sciences has also called to our attention the need for the U.S. to raise its capabilities in mathematics, science, and engineering, in a report “Rising Above the Gathering Storm: Energizing and Employing America for a Brighter Economic Future” [58]. According to it:
• The U.S. has long depended on foreign-born and -trained mathematicians, engineers and scientists to help maintain its intellectual lead.
• The global competition for these talented individuals has greatly intensified in recent years and will continue to do so, as the rest of the world increases its technical capabilities and living standards.
• To remain competitive, the U.S. needs to devote considerably more effort and resources to foster excellence in mathematics, science and engineering.


The majority of talented individuals in these fields recruited by U.S. universities and technology companies are from China, Europe, India, and the former Soviet Union. A 2006 report on Science, Technology, Engineering, and Mathematics Education (STEM, [61]) brought forward related troubling trends and numbers:
• In 2004, China graduated approximately 500,000 engineers; India graduated 200,000 engineers; and the U.S. graduated 70,000 engineers. On the other hand, South Korea graduates as many engineers as the U.S. even though it has only one sixth of the U.S. population.
• More than half of all engineering doctorates awarded in the U.S. go to foreign-born students. In 2003, 25% of all college-educated workers and 40% of all doctorate holders were foreign-born. Over half of the doctorate holders in several fields who resided in the U.S. were foreign-born: computer science; electrical, civil, and mechanical engineering.
• From 1994 to 2004, there was a steady increase in the percentage of U.S. patents granted with a foreign origin, including foreign-owned companies and foreign inventors. In one decade this number increased from 18% in 1994 to 48% in 2004!

What do these foreign countries do differently from the U.S.? There are many differences and each country is unique. India and China value technical education as a path to prosperity; admission to technical schools there is based on rank in national exams. In the former Soviet Union and Eastern Europe, mathematically talented individuals are identified very early and are provided with the resources needed to reach their full potential.

5.4. Raising the ceiling. What can be done in the U.S.? Hung-Hsi Wu, Professor of Mathematics at UC Berkeley, has been involved in the education of U.S. mathematics teachers for the last decade. He was on the Task Group on Teachers in the National Mathematics Advisory Panel appointed by President Bush and is currently serving on the National Research Council Panel on the Study of Teacher Preparation Programs. According to Professor Wu, a main purpose of both panels is to address the crisis in teacher quality among math teachers so as to ensure the production of a large enough pool of mathematically literate students to fill our technological needs. However, to ensure that we also produce first-rate scientists and mathematicians, a different kind of approach would be necessary:

“This is where the Math Circles come in. It is programs like the Math Circles that can provide the needed guidance and stimulation for the cream of the crop of this pool. While the work done by the abovementioned panels is designed to raise the floor to make our nation competitive in the global market, what the Math Circles do is to raise the ceiling in order to maintain our worldwide leadership position in science and technology. At a time of need in our nation’s mathematics education, the work done in top-tier math circles such as the Berkeley Math Circle and the San Jose Math Circle is of vital importance.”
Hung-Hsi Wu
Professor of Mathematics
University of California at Berkeley

While it is unlikely that math circles will have a large impact on the value system of the American public, the top-tier math circles in the U.S. do play a significant role in meeting the challenges described above by preparing our best young minds for their future role as mathematics, science, and technology leaders. With your help, we can establish a dense network of math circles across the U.S.

With hope,
Zvezdelina Stankova
Berkeley Math Circle Director
Berkeley, March 17, 2014

Symbols and Notation

Set and Logic Notation

N            set of natural numbers
Z            set of integer numbers
Q            set of rational numbers
I            set of irrational numbers
R            set of real numbers
C            set of complex numbers
∞            infinity or infinitely many
(a, b)       open interval: all x ∈ R such that a < x < b
[a, b]       closed interval: all x ∈ R such that a ≤ x ≤ b
(a, ∞)       semi-infinite interval: all x ∈ R such that x > a
(−∞, b)      semi-infinite interval: all x ∈ R such that x < b
(−∞, ∞)      infinite interval: all real numbers, R
∈; ∉         is an element of; is not an element of
(; (         passing; not passing through
⊂; ⊄         is a subset of; is not a subset of
⊃; ⊅         contains; does not contain
⊊            is contained in but is not equal to
⊋            contains but is not equal to
A ∩ B        intersection of set A and set B
A ∪ B        union of set A and set B
A ⊔ B        disjoint union of set A and set B
A \ B        set A but without the elements of set B
A × B        all pairs (a, b) of elements a in A and b in B
Ā            the complement of set A
|A|          number of elements in set A
Σ(A)         sum of elements in set A
Π(A)         product of elements in set A
⇒            implies, only if
⇐            if, is implied by
⇔, iff       if and only if
□            end of proof
♦            end of hint or partial solution
?            questionable proof


Geometry Notation

:                        divide or take the ratio of segments
α, β, γ, δ               alpha, beta, gamma, or delta: letters from the Greek alphabet
aA                       mass point (a, A)
I(A, r)                  inversion with center A and radius r
I(A)                     inversion with center A and unspecified radius
[ABC]                    area of triangle ABC
AB                       segment AB or its length depending on context
|AB|                     distance from A to B; used if AB is ambiguous
$\overset{\frown}{AB}$   arc AB
$\overrightarrow{AB}$    ray AB
∠ABC                     angle ABC
△ABC                     triangle ABC
I                        Triangle Inequality
⊥                        is perpendicular to
∥                        is parallel to
≅                        geometric congruence
∼                        geometric similarity
∠A = ∠B                  congruence of angles, written also as ∠A ≅ ∠B

Group Theory Notation

R                  Rubik’s Cube group
B, F, U, D, L, R   quarter-turn clockwise twist about the back, front, up, down, left, and right faces of the Rubik’s Cube
e, id              identity element (e.g., in a group)
g^{-1}             inverse of an element in a group or reciprocal of a number
o(a)               order of element a of a group
D_n                the nth dihedral group: the group of symmetries of a regular n-gon
S_n                the symmetric group of permutations on n objects (usually)
A_n                the alternating group of even permutations on n objects
Z_n                the group of remainders modulo n under addition
Q*, R*, C*         same sets but without 0; all groups under multiplication
G_1 × G_2          direct product of groups G_1 and G_2
G_1 ⋊ G_2          semidirect product of groups G_1 and G_2

Complex Numbers Notation

C              the unit circle (as viewed in the C-plane)
i              imaginary unit, √−1
Re{z}          real part of complex number z
Im{z}          imaginary part of complex number z
z̄              conjugate of complex number z
(|z|, θ)       polar form of z with modulus |z| and argument θ
ζ_n, ω_1       primitive nth root of unity
ζ_n^k, ω_k     kth power of a primitive nth root of unity
C_n            the group of all nth roots of unity under multiplication
P_z            smooth curve through all integer powers z^n of z
ζ(s)           Riemann zeta-function


Number Theory Notation

⌊x⌋ = [x]        floor of x or integer part of x: greatest integer ≤ x
⌈x⌉              ceiling of x: least integer ≥ x
{x}              fractional part of x: x − [x]
min{a, b}        minimum of a and b
max{a, b}        maximum of a and b
a | b (a ∤ b)    a divides b without remainder (a does not divide b)
a ≡ b (mod c)    a is congruent to b modulo c
gcd(a, b)        greatest common divisor of a and b
lcm(a, b)        least common multiple of a and b
R(n)             ∞-Raffle function
id(n)            the identity function: id(n) = n for all n ∈ N
ι(n)             the constant function 1: ι(n) = 1 for all n ∈ N
O(n)             the zero-function: O(n) = 0 for all n ∈ N
ε(n)             a two-value function: ε(1) = 1 and 0 elsewhere
φ(n)             Euler function
μ(n)             Möbius function
Λ(n)             von Mangoldt function
τ(n)             number of the divisors of n
σ(n)             sum of the divisors of n
π(n)             product of the divisors of n
A                set of arithmetic functions
M                set of multiplicative functions
S                set of strongly multiplicative functions
S_f              sum-function of the function f
f ∗ g            Dirichlet convolution of functions f and g
x̂_i              x_i is missing from the product of the other x_j’s
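
As a rough illustration of a few of the arithmetic functions listed above, the following Python sketch computes τ(n), σ(n), and a Dirichlet convolution by brute force over divisors. The function names (divisors, tau, sigma, dirichlet, iota) are chosen for this sketch only, and the convolution is assumed to follow the standard definition (f ∗ g)(n) = Σ f(d) g(n/d), summed over the divisors d of n.

```python
def divisors(n):
    """Return the positive divisors of n in increasing order (brute force)."""
    return [d for d in range(1, n + 1) if n % d == 0]

def tau(n):
    """Number of the divisors of n."""
    return len(divisors(n))

def sigma(n):
    """Sum of the divisors of n."""
    return sum(divisors(n))

def dirichlet(f, g):
    """Dirichlet convolution (assumed standard form): sum of f(d) * g(n // d) over d | n."""
    return lambda n: sum(f(d) * g(n // d) for d in divisors(n))

def iota(n):
    """The constant function 1."""
    return 1

# Convolving the constant function 1 with itself counts divisors;
# convolving the map n -> n with the constant function 1 sums them.
count_divisors = dirichlet(iota, iota)
sum_divisors = dirichlet(lambda n: n, iota)
print(count_divisors(12) == tau(12), sum_divisors(12) == sigma(12))
```

The brute-force divisor search is only meant for small n; it trades speed for a direct match with the definitions.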

Combinatorics Notation

n!                 n factorial, 1 · 2 · 3 · · · (n − 1) · n
P(n, k)            number of permutations of n objects taken k at a time
$\binom{n}{k}$     binomial coefficient n choose k, n!/(k!(n − k)!)
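
The two counting formulas above can be checked numerically with a short Python sketch. The helper names permutations_count and binomial are hypothetical, and the formula P(n, k) = n!/(n − k)! is the standard one, assumed here since the list does not spell it out.

```python
import math

def permutations_count(n, k):
    """P(n, k) = n! / (n - k)!: ordered selections of k objects out of n."""
    return math.factorial(n) // math.factorial(n - k)

def binomial(n, k):
    """n choose k = n! / (k! (n - k)!): unordered selections of k objects out of n."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# Each unordered selection of k objects can be ordered in k! ways,
# so P(n, k) = (n choose k) * k!.
n, k = 5, 2
print(permutations_count(n, k), binomial(n, k))
print(permutations_count(n, k) == binomial(n, k) * math.factorial(k))
```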

Knot Theory Notation

U              unknot
T              (right-hand) trefoil
4_1            figure 8
H              Hopf link
W              Whitehead link
B              Borromean rings
S              Square knot
R1, R2, R3     Reidemeister moves on links
τ(L)           the number of tricolorings of a link L
K_1 # K_2      connected sum of two knots
L̄              mirror image of link L
V_L            Jones polynomial of a link L


Linear Algebra Notation

x            vector x
3-D          three-dimensional
Null(A)      null space (or kernel) of a matrix A
dim V        dimension of space V

Functions, Means, and Calculus Notation

≈                    approximately
→, ↦                 goes to (under a function or a process)
e                    base of natural log, ≈ 2.71828
π                    ratio of circumference to diameter of a circle, ≈ 3.14159
φ; φ̄                 golden ratio, (1 + √5)/2; its conjugate, (1 − √5)/2
e^x                  natural exponential function
ln x                 natural logarithmic function, log_e(x)
sin x                sine function
cos x                cosine function
tan x                tangent function
cot x                cotangent function
arctan x             inverse of the tangent function
|x|                  modulus, or absolute value of, x
|x − y|              distance between numbers x and y
√x                   square root of x
ⁿ√x                  nth root of x
P_r                  rth power mean
P_1                  arithmetic mean
P_0                  geometric mean
P_{-1}               harmonic mean
P_2                  root mean square
P_∞                  max{x_1, . . . , x_n}
P_{-∞}               min{x_1, . . . , x_n}
x̃                    weighted average
lim_{x→a} f(x)       limit of function f(x) as x goes to a
lim_{n→∞} x_n        limit of sequence x_n
f′(x), f″(x)         first and second derivatives of f(x)
∫ f(x) dx            integral (or antiderivative) of f(x)
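
The family of power means listed above can be sketched in Python as follows. The function name power_mean is hypothetical, the inputs are assumed to be positive numbers, and the formula P_r = ((x_1^r + · · · + x_n^r)/n)^(1/r), with the cases r = 0 and r = ±∞ read as limits, is the standard definition, assumed here because the list gives only the names.

```python
import math

def power_mean(xs, r):
    """r-th power mean of positive numbers xs, with the usual limiting cases."""
    n = len(xs)
    if r == 0:                # limit as r -> 0: geometric mean
        return math.exp(sum(math.log(x) for x in xs) / n)
    if r == math.inf:         # limit as r -> +infinity: maximum
        return max(xs)
    if r == -math.inf:        # limit as r -> -infinity: minimum
        return min(xs)
    return (sum(x ** r for x in xs) / n) ** (1 / r)

# Means of 1, 2, 4 in increasing order of r:
# minimum, harmonic, geometric, arithmetic, root mean square, maximum.
xs = [1, 2, 4]
means = [power_mean(xs, r) for r in (-math.inf, -1, 0, 1, 2, math.inf)]
print(means == sorted(means))
```

The final line checks on one example that the means do not decrease as r grows, which is the content of the Power Mean inequality.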

Abbreviations

AA         Angle-Angle Criterion for similarity of triangles
ASA        Angle-Side-Angle Criterion for congruence of triangles
AHSME      American High School Mathematics Examination
AIM        American Institute of Mathematics
AIME       American Invitational Mathematics Examination
AM         Arithmetic Mean
AMC        American Mathematics Competition
AMS        American Mathematical Society
ARML       American Regional Mathematics League
AWM        Association for Women in Mathematics
BAMM       Bay Area Mathematics Meet
BAMO       Bay Area Mathematical Olympiad
BMC        Berkeley Math Circle
CM         Continuity and Midpoint Criterion
Cor        Corollary
CTY        Center for Talented Youth at Johns Hopkins University
Def        Definition
gcd        Greatest Common Divisor
GM         Geometric Mean
GPHP       Generalized Pigeonhole Principle
HM         Harmonic Mean
HMC        Hometown Math Circles
HL         Hypotenuse-Leg Criterion for congruence of right triangles
H/L        Hypotenuse-Leg Criterion for similarity of right triangles
HLP        Hardy-Littlewood-Pólya Inequality
iff        If and only if
IH, IHs    Inductive Hypothesis, Strong Inductive Hypothesis
IMO        International Mathematical Olympiad
JI         Jensen’s Inequality
l’H        l’Hôpital’s Rule
LA         Los Angeles
lcm        Least Common Multiple
Lem        Lemma
LHS        Left-Hand Side
MAA        Mathematical Association of America
MASS       Mathematics Advanced Study Semesters
MC         Monthly Contest at the Berkeley Math Circle
MI, MIs    Mathematical Induction, Strong Form of Mathematical Induction
MIT        Massachusetts Institute of Technology
MOSP       Mathematical Olympiad Summer Program
MSRI       Mathematical Sciences Research Institute
Mult       Multiplicative
NSF        National Science Foundation
NYCML      New York City Mathematical League
OEIS       Online Encyclopedia of Integer Sequences
PM         Power Mean
PHP        Pigeonhole Principle
Prop       Proposition
PST        Problem Solving Technique
RA         Ratio-Angle Criterion for similarity of triangles
R A        Ratio-Opposite-Angle Criterion for similarity of triangles
REU        Research Experience for Undergraduates
RHS        Right-Hand Side
RI         Rearrangement Inequality
RR         Ratio-Ratio Criterion for similarity of triangles
SAS        Side-Angle-Side Criterion for congruence of triangles
SF         San Francisco
SJMC       San Jose Math Circle
SJSU       San Jose State University
SOS        Sum of Squares
SsA        Side-Side-Angle Criterion for congruence of triangles
SSS        Side-Side-Side Criterion for congruence of triangles
Thm        Theorem
TLC        Tangent Line-Chord
USAMO      USA Mathematical Olympiad
USAJMO     USA Junior Mathematical Olympiad
USSR       Union of Soviet Socialist Republics
VIGRE      Vertical Integration of Research and Education
WLOG       Without Loss of Generality
wrt        With Respect To

Biographical Data

Bjorn Poonen is the Claude Shannon Professor of Mathematics at MIT. He received AB and PhD degrees from Harvard and Berkeley, respectively, and held positions at MSRI, Princeton, and Berkeley before moving to MIT in 2008. He was involved with the Berkeley Math Circle from its creation in 1998 until 2008; he first led a session on inequalities there in 2001. Poonen’s research focuses mainly on number theory and algebraic geometry; in particular, he is interested in the rational number solutions to equations. Poonen is the founding managing editor of Algebra & Number Theory. He is a fellow of the American Academy of Arts and Sciences and of the American Mathematical Society. He has received the Guggenheim, Packard, Rosenbaum, and Sloan fellowships, as well as a Miller Professorship, and the Chauvenet Prize (in 2011). Earlier, he was a four-time Putnam Competition winner, an International Mathematical Olympiad silver medalist, and the only perfect scorer out of 385,000 participants in the 1985 American High School Mathematics Exam. Thirteen mathematicians have completed a PhD thesis under his guidance.

Gabriel Carroll was a student at Oakland Technical High School when he attended the Berkeley Math Circle for three years. He won three consecutive BAMO grand prizes and three ARML top individual prizes, received two gold medals and one silver IMO medal (including a perfect score at IMO ’01 in Washington, D.C.), and was among the five top-ranked Putnam scorers in each year from 2000 to 2003, becoming one of only seven four-time Putnam Fellows. Gabriel co-coordinated the BMC Monthly Contest for two years. While still a circler, he presented a number of topics to the more advanced BMC students; one of these sessions became the basis for Monovariants in this book series. After a stint teaching English in Hunan province in China, Gabriel proceeded to put his mathematics background to use studying theoretical economics. He completed his PhD at MIT and a post-doc at Microsoft Research, and is now an assistant professor in the economics department at Stanford. He continues to write problems for contests such as BAMO, USAMO, and IMO. Gabriel’s recent activities outside academia include poking at piano keyboards, making ceramics, playing Go, and eating unexpected vegetables.


Maia Averett is on the faculty at Mills College. She completed her PhD in 2008 at UC San Diego where she was a UC Regents Dissertation Year Fellow. Her area of mathematical specialty is the wobbly world of topology. She started off as a homotopy theorist, but lately her research has been in the fascinating new area of topological data analysis, a field that applies the abstract machinery of algebraic topology to point cloud data to gain insight about topics ranging from breast cancer to basketball. Since finding mathematics as her passion came relatively late in college, she has made mathematical outreach to young people a central objective in her career. She has created and conducted math circle sessions since 2008 for both the Berkeley and the Marin Math Circles. Maia also has a special interest in fostering women in mathematics. She has been engaged in the Expanding Your Horizons program at UC San Diego and at Mills. She founded student chapters of the Association for Women in Mathematics (AWM) at UC San Diego while in graduate school and later at Mills, where the chapter goes by the name of The Möbius Band in honor of her love of topology. Events organized by The Möbius Band regularly attract upwards of 30 people, quite a feat at a school like Mills, which has only 950 undergraduates. Maia has also taken an active role in the AWM on a national level, serving on and chairing the student chapters committee and creating chapter meet-ups at national math meetings. When she’s not teaching, researching, programming, outreaching, or otherwise mathematically engaged, Maia enjoys cooking Thai food, circuit-bending children’s toys, and hiking in the Oakland hills with her dog.

Tom Davis competed on the Caltech Putnam team as an undergraduate and earned his PhD in probability and partial differential equations at Stanford. He was a founder of Silicon Graphics and a Principal Scientist there. For fifteen years he has been a freelance mathematician, pursuing his passion: to work on challenging problems with talented students in the setting of math circles. Tom has been involved in the Berkeley Math Circle from its inception. He co-founded and co-directs the San Jose Math Circle [71] and regularly leads sessions at all SF Bay Area math circles. He has also co-organized Teachers’ Math Circles at AIM and MSRI. According to Tom, calculus is not enough to do computer graphics: “People who are interested in making computer-generated dinosaurs for ‘Jurassic Park’ or a liquid metal man in ‘Terminator II’ or who want to have Forrest Gump shake hands with Richard Nixon had better have a solid grounding in advanced calculus and in differential and projective geometry.” Tom’s web site [20] contains free dynamic geometry software, the Rubik’s Cube software discussed in his articles, and an extensive collection of math circle talks. Tom is also an avid fan of endurance athletics. He has completed three ironman-distance triathlons and one ultramarathon, but hopes that he finally has the good sense not to do another.


Tom Rike graduated from San Francisco State College in 1968. After spending the next two years taking graduate courses and getting a teaching credential, he taught six years at Westlake Junior High School. In 1974, he went back to school in the evenings and received his M.S. in mathematics from Holy Names College in 1976. Moving to Oakland High School, he taught until he retired in 2003 and now volunteers there three days a week. Tom served as the high school liaison for BMC and BAMO from the beginning until 2009. He has been fascinated by giants of mathematical thought such as Archimedes, Euler, and Gauss and has shared their works at BMC sessions on a number of occasions: using the arbelos from the Book of Lemmas by Archimedes; showing Euler’s solution to the Basel Problem; and demonstrating Gauss’s proof from Disquisitiones Arithmeticae that a 17-gon is constructible. It is Archimedes’ lever, on which he “balanced” his Mass Point article in Volume I. For a number of years, Tom ran a math circle at Oakland High. Tom has been deeply involved as a coach and is now director of the East Bay Mathletes, a monthly competition among local high schools, which began in 1978. In 1983, he helped found in Oakland a middle school mathematics competition, All-Star Mathletes, which takes place four times a year. Among his interests outside mathematics are the San Jose Sharks, the SF Giants, and opera. He has been playing Go and studying the Japanese language with devotion for over 45 years.

Tatiana Shubin went as a high school student to the Special Mathematics and Physics Boarding School of the Academy of Sciences in Novosibirsk. She did her undergraduate work in the USSR at the Kazakh and Moscow State Universities. In 1983 she received her PhD in Mathematics at UC Santa Barbara, and after a couple of years at UC Davis she joined the Mathematics Department of San Jose State University in 1985. At the outset, Tatiana was a strong proponent of math circles. She has often said that she owes her life to math circles and is now paying back her debt of gratitude by making math circles available to others. In 1998, Tatiana co-founded the San Jose Math Circle [71] and the highly successful Bay Area Math Adventures (BAMA) talks. The latter have been preserved in two volumes published by MAA: Mathematical Adventures for Students and Amateurs, and Expeditions in Mathematics, co-edited by Tatiana, David Hayes, and Gerald Alexanderson [39, 75]. Tatiana has also contributed sessions to BMC since 2001, with emphasis on group theory and a variety of hybrid geometry topics. She has been a co-founder and a leadership team member of the Math Teachers’ Circle Network since 2006, and a co-founder and a member of the Executive Committee of a Special Interest Group of MAA on Math Circles for Students and Teachers since 2009. She translated and edited several books published by the AMS in the MSRI Mathematical Circles Library. She is also the founder and a co-director of the Navajo Nation Math Circles project, aimed at launching and supporting mathematically rich experiences such as math circles and math summer camps for children and teachers in the Navajo Nation. Tatiana’s outstanding teaching was recognized in 2006 with the Distinguished College or University Teaching of Mathematics Award of the MAA’s Northern California, Nevada, and Hawaii Section. With all this, Tatiana still finds opportunities for her favorite pastime, rock hounding, resulting in an expansive rock collection at her home.

Zvezdelina Stankova was drawn into the world of mathematics when, as a 5th grader, she joined the math circle at her school in Bulgaria and three months later won the Regional Math Olympiad. She represented her home country at two IMOs, earning silver medals. Some of her articles in this book series are inspired by the lectures she heard during the training of the Bulgarian IMO team. As a freshwoman at Sofia University, Zvezda won a competition to study in the U.S. and completed her undergraduate degree at Bryn Mawr College in 1993. She did her first math research in enumerative combinatorics at two summer REU’s in Duluth, Minnesota. The resulting papers contributed to her Alice T. Schafer Prize for Excellence in Mathematics by an Undergraduate Woman, awarded by the Association for Women in Mathematics. In 1997, Zvezda received a PhD from Harvard University, with a thesis on moduli spaces of curves, in the field of algebraic geometry. Meanwhile, she earned a high school teaching certificate in the state of Massachusetts and later in California. As a postdoctoral fellow at MSRI and UC Berkeley in 1997–1999, Zvezda co-founded BAMO [8] and started BMC [11]. She trained the USA national team for the IMOs for six years, including the memorable year 2001 when three of the six team members were BMCers, and USA tied with Russia for a second overall place in the world. Since 1999, she has been at Mills College. Her current research interests include classification of restricted patterns in the area of enumerative and algebraic combinatorics. Zvezda’s inspiring style and passion to teach have been recognized by the MAA: in 2004 she was selected as a recipient of the first Henry L. Alder Award for Distinguished Teaching by a Beginning College or University Mathematics Faculty Member. In 2011 MAA awarded her the highest math teaching award in the United States, the Deborah and Franklin Tepper Haimo Award for Distinguished College or University Teaching of Mathematics. 
Zvezda was featured in the Salutes Program of the ABC 7 News in spring 2011. In 2012, she was listed in The Princeton Review’s “300 Best Professors.” Zvezda’s most enduring passion remains working at BMC with young students motivated to discover new mathematical wonders. She spends a lot of time with her girl and boy, studying foreign languages with them and playing the piano, and teaching them mathematics the “Bulgarian” way.

Bibliography 1. C. Adams, The Knot Book: An Elementary Introduction to the Mathematical Theory of Knots, Amer. Math. Soc., 2004. 2. American Mathematics Competitions, www.maa.org/math-competitions. 3. T. Andreescu and Z. Feng, Mathematical Olympiads 1998-1999, Math. Assoc. of America, 2000. , Mathematical Olympiads 1999-2000, Math. Assoc. of America, 2002. 4. , 103 Trigonometry Problems from the Training of the USA IMO Team, 5. Birkhäuser, 2005. 6. M. Armstrong, Groups and Symmetry, Springer, 1987. 7. V. Arnol’d,Trivium Mathematique,Translated by C.J. Shaddock, hans.math.upenn. edu/Arnold/Arnold-Trivium-1991.pdf, 1991. 8. Bay Area Mathematical Olympiad, www.bamo.org/. 9. E. Berlekamp, J.H. Conway, and R. Guy, Winning Ways for Your Mathematical Plays, Vol. 4: Solitaire Army, A K Peters/CRC Press; 2nd edition, 2004. 10. E. Birrell, The Knot Quandle, www.thehcmr.org/issue1_2/knot_quandle.pdf, Fall 2007. 11. Berkeley Math Circle (BMC), mathcircle.berkeley.edu/. 12. S. Budurov and D. Serafimov, Mathematical Olympiads, part II, State Publishing Company “Narodna Prosveta”, 1985. 13. Business-Higher Education Forum, A Commitment to America’s Future, www.bhef.com/publications/commitment-americas-future-respondingcrisis-mathematics-and-science-education, 2005. 14. F. Chung and R. Graham, A Tour of Archimedes’ Stomachion, www.math.ucsd.edu/ ∼fan/stomach/tour/stomach.html, 1993. 15. Clay Olympiad Scholar Award and USAMO winners, www.maa.org/news/ usamo-winners-celebrated-nations-capital. 16. J. Cofman, What to Solve? Problems and Suggestions for Young Mathematicians, Oxford University Press, 1990. 17. L. Cohen and G. Ehrlich, The Structure of the Real Number System, D. Van Nostrand, 1963. 18. B. Cutler, Stomachion, mathworld.wolfram.com/Stomachion.html, Nov. 2003. 19. Davis Math Circle, davismathcircle.wordpress.com. 20. T. Davis, www.geometer.org/. 21. D. Djukić, V. Janković, I. Matić, and N. 
Petrović, The IMO Compendium:1959-2004, Problem Books in Mathematics, Springer, 2006. 22. Roy Dubish, Groups (Topics for Mathematics Clubs), National Council of Teachers of Mathematics, 1973. 23. S. Eliahou, L. Kauffman, and M. Thistlethwaite, Infinite families of links with trivial Jones polynomial, Topology 42 (2003), no. 1, 155–69. 331

332

BIBLIOGRAPHY

24. Euclid, Euclid’s Elements, Green Lion Press, 2003.
25. Zuming Feng and Yi Sun, USA and International Mathematical Olympiads 2007-2008, Math. Assoc. of America, 2008.
26. J. Gallian, Contemporary Abstract Algebra, 7th ed., Brooks Cole, 2009.
27. I. Gelfand, Functions and Graphs, Dover, 2002.
28. I. Gelfand, E. Glagoleva, and A. Kirillov, The Method of Coordinates, Dover, 2011.
29. I. Gelfand and M. Saul, Trigonometry, Birkhäuser, 2013.
30. I. Gelfand and A. Shen, Algebra, Birkhäuser, 2013.
31. S. Gelfand, M. Gerver, A. Kirillov, and N. Konstantinov, Sequences, Combinations, Limits, Dover, 2002.
32. A. Givental, Kiselev’s Geometry: Book I. Planimetry, Sumizdat, 2006.
33. M. Greenberg, Euclidean and Non-Euclidean Geometry, W. H. Freeman & Company, 1997.
34. H. Guerber, The Story of the Greeks, www.mainlesson.com/display.php?author=guerber&book=greeks&story=knot, 1923.
35. L. Hahn, Complex Numbers & Geometry, Math. Assoc. of America, 1994.
36. G. Hardy, J. Littlewood, and G. Pólya, Inequalities, Cambridge Univ. Press, 1988.
37. Robin Hartshorne, Geometry: Euclid and Beyond, Springer, 2000.
38. J. Hass and J. Lagarias, The Number of Reidemeister Moves Needed for Unknotting, Journal of Amer. Math. Soc. 14 (2001), 399–428.
39. D. Hayes and T. Shubin (eds.), Mathematical Adventures for Students and Amateurs, Spectrum Series, Math. Assoc. of America, 2004.
40. D. Hilbert, Foundations of Geometry, Open Court, 1990.
41. H. Hoà, United States of America Mathematical Olympiad (USAMO), www.math-olympiad.com/america-mathematical-olympiad.htm, 2007.
42. J. Hoste, M. Thistlethwaite, and J. Weeks, The First 1,701,936 Knots, Math. Intell. 20 (1998), 33–48.
43. H. Jacobs, Geometry, third ed., W. H. Freeman and Company, 2003.
44. V. Jones, The Jones Polynomial, http://math.berkeley.edu/~vfr/jones.pdf, August 2005.
45. W. Kahan, Is There a Small Skew Cayley Transform with Zero Diagonal?, Linear Algebra and Its Applications (2006), 335–341.
46. J. Kantor and M. Maydanskiy, Triangles Gone Wild, MASS selecta (2003), 277–288.
47. Los Angeles Math Circle, www.math.ucla.edu/~radko/circles/.
48. D. Leites, 60-odd YEARS of Moscow Mathematical Olympiads, andrej.fizika.org/ostalo/gimnazija/math/ruske_olimpijade/11a-olym-1.pdf, 1997.
49. C. Livingston, Knot Theory, Carus Monograph, vol. 24, Mathematical Association of America, 1993.
50. MAA Online, Evan O’Dorney: Spelling Champ and Math Whiz, www.maa.org/news/060408odorney.html.
51. M. Maydanskiy, The Incidence Coloring Conjecture for Graphs of Maximum Degree 3, Discrete Mathematics 292 (2005), 131–141.
52. Marin Math Circle, www.marinmathcircle.org.
53. N. McCoy, Introduction to Modern Algebra, 4th ed., Allyn and Bacon, Inc., 1987.
54. W. Menasco and M. Thistlethwaite, The Tait Flyping Conjecture, Bull. Amer. Math. Soc. 25 (1991), 403–12.
55. W. Menasco and M. Thistlethwaite, The Classification of Alternating Links, Ann. Math 138 (1993), 113–73.
56. J. Milnor, Link Groups, Ann. Math 59 (1954), no. 2, 177–195.
57. S. Morrison and D. Bar-Natan, “Rubberband” Brunnian Links, http://katlas.math.toronto.edu/wiki/%22Rubberband%22_Brunnian_Links, May 2009.
58. National Academy of Sciences et al., Rising Above the Gathering Storm: Energizing and Employing America for a Brighter Economic Future, books.nap.edu/catalog.php?record_id=11463, 2007.


59. Nauka, Kvant, kvant.mccme.ru, 1994.
60. T. Needham, Visual Complex Analysis, Clarendon Press, 1997.
61. Northern Illinois University, Illinois Status Report on Science, Technology, Engineering, and Mathematics Education, www.keepingillinoiscompetitive.niu.edu/ilstem/pdfs/STEM_ed_report.pdf, 2006.
62. Oakland/East Bay Math Circle, oebmc.mathcircles.org/.
63. I. Peterson, Prized Geometric Logic, mathforum.org/library/view/18454.html, 2001.
64. William Lowell Putnam Mathematical Competition, math.scu.edu/putnam/.
65. K. Reidemeister, Elementare Begründung der Knotentheorie, Abh. Math. Sem. Univ. Hamburg 5 (1926), 24–32.
66. K. Reidemeister, Knoten und Gruppen, Abh. Math. Sem. Univ. Hamburg 5 (1926), 7–23.
67. K. Reidemeister, Knotentheorie, Chelsea, 1948.
68. J. Roberts, Knot Knotes, math.ucsd.edu/~justin/Papers/knotes.pdf, 1999.
69. W. Rudin, Principles of Mathematical Analysis, 3rd ed., McGraw-Hill, 1976.
70. San Francisco Math Circle, www.sfmathcircle.org/index.html.
71. San Jose Math Circle, www.sanjosemathcircle.org/.
72. H. Schwerdtfeger, Geometry of Complex Numbers, Dover, 1979.
73. D. Shklarsky, N. Chentzov, and I. Yaglom, The USSR Olympiad Problem Book, Dover, 1993.
74. D. Shklarsky, N. Chentzov, and I. Yaglom, The U.S.S.R. Olympiad Problem Book, Dover, 2013.
75. T. Shubin, D. Hayes, and G. Alexanderson (eds.), Expeditions in Mathematics, Spectrum Series, Math. Assoc. of America, 2011.
76. A. Sosinski, Marching Orders, Quantum 2 (1991), no. 2, 8–11.
77. A. Sosinski, Finite Groups (in Russian), Kvant (1996), no. 6.
78. Stanford Math Circle, www.stanfordmathcircle.org/.
79. Z. Stankova, The High School Olympiads: Excitement, Talent, and Determination, www.msri.org/publications/ln/msri/1997/bamo/sf/1/index.html, 1997.
80. I. Stewart, Galois Theory, 3rd ed., Chapman & Hall/CRC Mathematics, 2003.
81. M. Thistlethwaite, Links with trivial Jones polynomial, Journal of Knot Theory Ramifications 10 (2001), no. 4, 641–3.
82. C. Trigg, A Three-Square Geometry Problem, Journal of Recreational Mathematics 4 (1971), 90–99.
83. Utah Math Circle, www.math.utah.edu/mathcircle/.
84. S. Vandervelde, Circle in a Box, MSRI Math Circle Library, Vol. 2, AMS and MSRI, 2009.
85. M. Whitlow, M. Breen, Z. Stankova, and T. Shubin, Sustainable Funding of Top Tier Math Circles, Proposal, 2007.
86. I. Yaglom, Complex Numbers in Geometry, Academic Press, 1968.
87. B. Youse, The Number System, Dickenson, 1965.

Credits

The American Mathematical Society gratefully acknowledges these institutions and individuals for granting the following permissions:

Business-Higher Education Forum
The quotation “A Commitment to America’s Future: Responding to the Crisis in Mathematics and Science Education” in the Epilogue, www.bhef.com/sites/g/files/g829556/f/report_2005_commitment_to_americas_future_0.pdf, Business-Higher Education Forum, 2005.

The Mathematical Association of America, American Mathematics Competitions
Problems from USAMO ’80, USAMO ’93, USAMO ’97, USAMO ’99, USAMO ’07 used with permission.

Alexander the Great, The Story of the Greeks, by Helene A. Guerber [34] on page 51.

The International Mathematical Olympiad (IMO) logo

Robert Scharein’s KnotPlot software at knotplot.com

A few references and the public domain portrait of Galois have been taken from Wikipedia at www.wikipedia.org/


Index

n-factorial, 247
AMS-inclusion, 94
Rubik program, 33
Macro gizmo, 33
“convex hall” of fame, 219
15-puzzle, 103, 139
Abel, Niels Henrik, 105
abstract algebra, xvi, 31, 48, 81, 105, 114, 234, 310
AIM, 312, 328
Alexander the Great, 49
Alexanderson, Gerald, 329
algebra, xx, 104, 212, 306, 330
algebraic topology, 315
algebraic geometry, xiv, 310, 327, 330
algebraic number theory, 53
algebraic structure, 110
algorithm, 11, 63, 125, 140, 271, 294
  GPS, 278, 285
  Rubik’s Cube, 45, 48
  smoothing, 264
  unsmoothing, 265
Alper, Ted, xxiv
altitude, 126, 175, 185
  foot of, 175
AMC, xxiv, 317
American Academy of Arts and Sciences, 327
AMS, 327
angles, 2, 172
  acute, 178, 182, 184
  alternate interior, xvii, 8–10, 16, 18, 21, 173
  central, 198
  congruent, 17
  equal, 176, 177, 304
  exterior, 184
  inscribed, 171, 178, 224
  obtuse, 19, 178
  remote interior, 184
  right, 14, 20, 178, 296
  straight, 22

  supplementary, 186
  vertical, 9, 16, 21
arc, 181, 183
Archer Design Inc., xxiv
Archimedes, 2, 18, 301, 329
  Archimedes’ Axiom, 18
  Stomachion, 301
Ardila, Federico, 313
argument, 190, 194
ARML, 327
array, 142
Ars Magna, 104
art, xxii, 51, 68
asymptote, 181, 292
Auckly, David, xxiii
automorphism, 192
average, 167, 211
  ordinary, 280
  weighted, 271, 280, 282
Averett, Maia, 328
AWM, 328
axes in the C-plane, 205
axiom, 16, 23, 177
baby AM-GM, 225
Balkan Math Olympiad, 263, 275
BAMO, 311–313, 315–317, 329
Barcelo, Hélène, xxiii
Bay Area Math Adventures, 329
Bay Area math circles, xix, 309, 312, 313, 328
Beatles, 307
Beltrami, Eugenio, 18
  Beltrami-Klein model, 18
Berkeley Math Circle, xiv
  BMC-Elementary, xiv
  BMC-Upper, xiv
Berkeley Math Circle (BMC), 305, 307, 310, 312, 313, 330
Berkeley Mini-Math Tournament, 317
Berlekamp, Elwyn, 313

bijection, 241
binomial coefficients, 98, 165, 238
Binomial Theorem, 193, 228, 238
biology, 306
BMC, xxiii
Bolyai, János, 17
Boston, 311
bound, 156, 248, 297
bounded, 214
Breen, Mike, xiiin, xxiii
Brown, Ian, xxiii
Brown, Nico, 317
Brown, Tom, xxiv
Bryant, Robert, xxiii
Bryn Mawr College, 305, 307, 330
Buhler, Joe, xxiii, 313
Bulgaria, 1, 305, 310, 330
Bush, George W., 314
calculus, xvi, xvii, 5, 213, 290, 296, 316
  advanced, 217, 310, 328
Caltech, 328
Cardano, Gerolamo, 104
Carroll, Gabriel, xiii, xxiv, 313, 315, 327
Cartesian form, 190
Cayley transform, 314
central object, 161
central symmetry, 139
centroid, 9, 10, 16, 204
Chain Rule, 290
chemistry, 306
Chen, Evan, xxiv, 316
Chen, William, xxiv
China, 319
China Girls Math Olympiad, 316
chord, 18
Chorin, Alexandre, 313
Chu, Timothy, 102
Chung, Fan, 301
circle, xvii, xx, 27
  diameter of, 188, 224
  symmetries of, 27
  unit, 113, 133, 136, 180, 198, 202
circumcenter, 9
circumcircle, 202, 204
Ciubotaru, Dan, 310
Clay Mathematics Institute, xxiii
Clay Olympiad Scholar Award, 313
coding theory, 105
collinear, 149
combinatorial gene-counting, 306
combinatorics, xiv, 118, 150, 238, 302, 317, 330

complement, 245
computer graphics, 328
computer science, xiii, xiv, 319
congruences modulo n, 132
conjecture, xxi, 2, 4, 14, 59, 68, 289, 296
conjugate, 133, 191, 192
conjugation, 48, 190
Conrey, Brian, 312
constraints, 213
convexity, 218
Conway’s checkers, 158
Conway, John Horton, 158
corollary, xxi
counterexample, 19, 62
criteria
  AA, 10, 13, 173, 179, 181, 183, 295
  ASA, 7, 8, 21
  congruence, 7, 16, 19, 20, 173
  H/L, 21
  HL, 19, 22
  R’A, 7
  RA, 9
  RR, 7
  SAS, 13, 17, 21
  similarity, 7, 171, 173, 178
  SsA, 19, 21, 22
  SSS, 21
critical point, 290
Crossbar Theorem, 19
crystallography, 105
CTY, 318
cube, 138, 139
Cutler, William, 301
Davidson Institute, 318
Davis Math Circle, 310
Davis, Tom, xxiv, 312, 313, 328
de Moivre’s formula, 113, 193
de Souza, Paulo, xxiv
de Vera, Wycee, xxiv
deductive reasoning, xxii
definition, xxi, 16, 23
del Ferro, Scipione, 104
denominator, 148, 167, 191, 226, 246
derivative, 5, 68, 287, 297
DeRose, Tony, xxiii
differential geometry, 53, 328
dimensions, 290
dinosaur, 328
Dirac’s Theorem, 152
Dirichlet
  inverse, 236
  product, 81, 233, 234, 247

  series, 82
distance, 200
divisible, xvii
division in C, 191
divisors, 32, 79, 80, 86, 130, 147, 243
  odd, 250
  of zero, 25
  prime, 90, 97, 239
domain, 82, 288
Dubish, Roy, 117
Dudzik, Andrew, xxiv, 313
Duluth REU, 315, 330
Dunne, Edward, xxiv
Eastern Europe, xiv, 306, 307, 311, 319
economics, xiii, xiv, 327
Eisenbud, David, xxiii
ellipse, 26, 31, 47, 215
ellipsoid, 215
elliptic curves, xiv
embed, 157
endurance athletics, 328
engineering, 318
English, 306, 307
Epsilon Camp, 318
equality, 99, 211, 280, 290
equation, 97
  cubic, 104
  linear, 5, 104, 305
  of a plane, 228
  polynomial, 104, 114
  quadratic, 104, 160, 306
  quartic, 104
  quintic, 104
  system, 74, 306
ergodic theory, 64
Escape of the Clones, 158
Euclid, 16, 17
  Fifth Postulate, 17
Euclidean motions, 107, 127
  isometries, 127
  rigid motions, 127
  rigid symmetries, 133
Euler, 251, 329
  function, 81, 243, 245, 247, 250
  Theorem, 243
Europe, 319
European Girls Math Olympiad, 316
example, xx, 34
existence, 149
Expanding Your Horizons, 328
experiment, 2, 4, 9, 12, 14, 34
extra construction, 3, 5, 173, 178

Extreme Principle, 151
extreme value, 213
family of similar triangles, 174
Fermat prime, 250, 251, 261
Ferrari, Lodovico, 104
field, 61
Fields Medal, 64
figure eight, 52
First Derivative Test, 220, 230, 231, 283
football, 309
Foundation
  Merriam-Webster, xxiii
  Mosse, xxiii
  National Science, xxiii
  Packard, xxiii
  Toyota, xxiii
foundation, xix
  of geometry, 1, 5, 16, 19
fractional part, 148
fractions, xvii, 5, 46, 188, 191, 299, 302
Fuchs, Dmitry, xxiv, 312, 313
function, xvii, 80, 141, 288, 289
  arctan x, 297, 299, 300, 303
  arithmetic, 82, 92
  concave, 270
  constant, 83, 229
  continuous, 217, 219, 268, 278
  convex, 218, 268, 283
  differentiable, 220
  exponential, xvii, 278
  increasing, 303
  linear, xvii, 154, 229
  logarithmic, 270, 278
  max, 147
  multiplicative, 79, 81, 82, 93, 233
  odd, 303
  power, 83
  quadratic, xvii
  Riemann zeta, 81, 82, 299
  square root, 303
  strictly convex, 269, 277
  strictly increasing, 95, 102
  strongly multiplicative, 83, 87, 255
  sum-function, 92–95, 233, 239, 245
  symmetric, 228
  tangent, 296
  trigonometric, xvii, 278
Fundamental Theorem of Algebra, 208
Galileo, xx
Gallian, Joe, 31
Galois theory, 114

Galois, Évariste, 103, 114
game theory, xvi
Gates, Bill, 314
Gauss, Carl Friedrich, 17, 89, 329
gcd, 91, 135, 143, 166, 244
geometry, 26, 306, 330
  analytic, 223
  basic tools, 5
  circle, 1, 171
  classical, 1
  elliptic, 16
  Euclidean, 1, 16, 20, 317
  hyperbolic, 1, 16, 17, 20
  inversion in the plane, 1, 18, 178
  mass point, 1
  non-Euclidean, 17
  plane, xvi, 1, 5, 8, 16, 171, 212, 218, 223, 290
  projective, 328
  space, 290
  synthetic solution, 5, 171, 287, 302
Givental, Alexander, xxiii, xxiv, 302, 312, 313
glide reflection, 127
Graham, Ron, 301
graph theory, xiv, 152
graphs, 315
  cubic, 315
  edges of, 152
  Hamiltonian, 152, 315
  of convex functions, 229
  of functions, xvii, 181, 292
  of trigonometric functions, 188
  vertices of, 152
grid, 14, 302
group, 23, 103
  n-cycle, 35
  r-cycle, 119, 121, 137
  2-cycle, 35, 119
  3-cycle, 139
  abelian, 25, 111, 131, 135
  action, 105
  alternating, on n objects, 38, 122
  commutative, 25
  cube symmetries, 122
  cycle, 29
  cyclic, 25, 47, 48, 114, 115, 133–135
  cyclic subgroup, 136
  definition, 23, 105, 110
  dihedral, 27, 107, 113
  direct product, 34, 48
  disjoint cycles, 32, 36, 119, 137
  examples, 24

  existence of inverses, 46, 47
  generator, 25, 26, 46, 47, 114, 134
  identity, 24, 110, 117, 124, 130–132
  intersection of subgroups, 32
  inverse, 24, 110, 117
  isomorphic, 30, 130
  isomorphism, 27, 134
  multiplication table, 29
  of permutations, 27
  of Rubik’s Cube, 27
  of symmetries, 26, 107, 117, 127, 132
  order of, 31, 32, 108
  order of an element, 31, 32, 115
  order of Rubik’s Cube, 39
  properties, 31
  representations, 64
  semidirect product, 113, 133
  simple, 38
  single face subgroup, 34
  slice moves, 32
  slice subgroup, 34
  structure, 81, 82
  subgroup, 31, 34, 47, 48, 104, 109, 123
  subgroups of Rubik’s Cube, 32
  symmetric, on n objects, 29, 30
  symmetries in space, 108
  theory, xiv, 23, 31, 38, 103, 330
  theory, combinatorial, 53
  transposition, 119, 137
  trivial, 25, 129
Gump, Forrest, 328
Harris, Joe, xxiii
Hartshorne, Robin, 313
Harvard, 305, 310, 314, 327, 330
Hayes, David, 329
Herriot, Neil, xxiv, 313
hexagonal plate, 108
High School
  Chelmsford, 311
  College Preparatory, 316
  Henry Gunn, 312, 313
  Newton North, 311
  Oakland, xix, 312, 313, 329
  Oakland Technical, 327
  Presentation, 313
  Westlake Junior, 329
high-voltage symbol, xxi
Hilbert, David, 16, 17
  Congruence Axiom, 17
  Parallel Axiom, 18
Holtz, Olga, 313

Holy Names College, 329
homotopy, 328
Howe, John, 313
hypotenuse, 175, 185, 288
IMO, xiii, xvi, 101, 225, 305, 310, 313–316, 327, 330
  logo, 68
incenter, 9
incidence coloring, 315
Inclusion-Exclusion Principle, 250
India, 319
induction, xx, 67, 99, 144, 266, 282, 300
  strong, 94, 101, 137
inequalities, xvi, 81, 100, 147, 186, 211, 263, 288, 289, 302, 327
  AM-GM, 215, 226, 227, 232, 264, 266, 269, 270, 285
  AM-GM-HM, 217
  AM-HM, 228, 265, 269, 281
  baby AM-GM, 212
  chain of, 214
  diagram of, 225
  HLP, 222, 231, 271
  Jensen’s, 230–232, 270, 278, 283
  Jensen’s, JI, 221
  Karamata’s, 271
  Minkowski’s, 287, 289, 292
  PM, 216, 229, 232
  Rearrangement, 266, 282
  weighted, 231
  weighted AM-GM, 222, 232
  weighted Jensen’s, 223, 232, 270
  weighted PM, 223, 232
infinite raffle, 79
input, 82, 83, 297
integer powers in C, 195
integral, 299, 304
  antiderivative, 304
Intel Science Talent Search, 315
interval, xvii, 219
invariant, 145, 148, 164, 203, 271
  under multiplication, 205
Ishikawa, Yuki, xxiii
Japanese, 329
Jobs, Steve, 314
Johns Hopkins, 317, 318
Jones, Vaughan, 64
jumping fleas, 153
Jurassic Park, 328
Jussieu, Paris, 315
Kahan, William, 314

Karighattam, Arav, 317, 318
Kazakh State University, 329
Kazakhstan, 329
Kedlaya, Kiran, 313
King Arthur, 152
King Solomon, 68
Kiselev, A. P., 20
knot, 49
  7₄ knot, 57
  n₁, 59
  amphichiral, 67, 68, 77
  Celtic, 68, 78
  chiral, 67, 68
  connected sum, 60
  crossing, 51
  crossing number, 55, 58
  diagram, 51, 54
  equivalent, 50, 53
  figure eight, 57, 58, 60
  flype transformation, 70
  fundamental group, 53
  Gordian, 49
  invariant, 50, 54, 58
  KnotPlot, 70
  mirror image, 67, 68, 77
  not equivalent, 55
  polynomial invariant, 70
  quandle invariant, 70
  square, 60
  strand, 53, 56
  string, 51
  surgery, 72
  theory, xvi, 50, 53
  two-colorability, 58
  unknotting number, 56, 58, 71
Knuth, Donald, 312
l’Hôpital’s Rule, 217, 229
Lam, Quan, xxiv, 312
Lang, Serge, 313
law, xiii
  associative, 61, 234
  commutative, 234
  distributive, 61, 86
lcm, 32, 119, 137, 143, 164, 249
Lee, Hojae, xxiv
lemma, xxi
Lie algebras, 310
limit, 217, 266, 278, 304
linear algebra, xiv, 61, 63, 128, 310, 314, 316
  real orthogonal matrix, 314
lines, xvii, 205

  concurrent, 10
  parallel, 8, 16, 17, 173, 207, 295, 303
  perpendicular, 176
link, 51
  4-crossing, 68
  Borromean rings, 52, 54, 58
  Brunnian, 52, 60, 71
  component, 51
  Hopf, 52, 54, 58, 62, 64, 66, 67
  invariant, 58, 59, 64
  linear chain of rings, 60
  local coloring, 62
  necklace of rings, 60
  number of components, 55
  orientation, 64, 67
  wedding/trinity ring, 68, 78
  Whitehead, 52, 54, 58, 68, 71
literature, 306
Liu, Tiankai, 314, 316
Lobachevsky, Nikolai, 17
logic, xx, 53
Los Angeles Math Circle, xv, 310, 313
Möbius
  function, 237, 239
  inverse, 239, 241
  inversion, 81, 82, 241, 247, 253
  relation, 239
Möbius Band, 328
MAA, 330
Madison, Sharon, 313
mansion problem, 276
margin pictures, xxi
Marin Math Circle, xv, 328
mass point, 329
math, 68, 176
math circle, xiii, 309, 319, 328
Math Kangaroo, 318
Math Teachers’ Circle Network, 329
mathematical research, xv, xvi, 314, 315, 330
mathematics education, 319
Mathletes, 329
Matić, Ivan, xxiii, xxiv
matrix, 61
  augmented, 61
  coefficient, 61, 63
  echelon form, 61, 63, 74
  inverse, 61
  null space, 74
  row operations, 61
maximize, 213
Maximum Principle, 220, 221

Maydanskiy, Maksim, xxiv, 313, 315
McCuan, John, 312
mean
  arithmetic, 212, 215, 224, 276, 278
  geometric, 212, 216, 224, 281
  harmonic, 216
  power, 216, 278
  root mean square, 216, 276
median, 9
Megginson, Bob, xxiii
Microsoft Research, 327
midpoint, 220
Midpoint Rule, 220, 229, 268, 280, 283
midsegment, 9, 16, 130
Mills College, 139, 328, 330
Milnor invariant, 71
Minimality Principle, 121
minimize, 204, 267
Mirin, Alison, 139
Mironov, Dmitri, xxiv
MIT, 316, 327
modular arithmetic, 25, 61, 260
moduli spaces of curves, 330
modulo n
  addition, 46, 112
  multiplication, 46, 47
modulus, 136, 190, 192, 193
mono-coloring, 74
monochromatic, 63, 71, 74
Monotone Bounded Theorem, 304
monovariant, 213, 266, 271, 275, 327
  concentration, 276
  continuous, 151, 277
  decreasing, 144
  discrete, 151, 277
  distance, 278
  doubly-symmetric, 163
  extremal, 143
  geometric, 150
  numerical, 141
  operation, 151
  sum-monovariant, 142
  with sequences, 146
Monsky’s Theorem, 315
Monthly Contest, xv, xxiv, 315, 327
Moscow State University, 329
MOSP, 310
MSRI, xxiii, 311–313, 328, 330
Multi-smoothing Lemma, 274, 283
multiplication table, 106, 126, 129
multiplicative identity, 234, 255
multiplicative inverse, 236
music, 306, 317

mysticism, 68
National Academy of Sciences, 318
natural sciences, xiv
Navajo Nation Math Circles, 329
Ngo, Hoan, 313
Nir, Oaz, 313
Nixon, Richard, 328
normal (line), 176
Novosibirsk, 329
number
  complex, xvi, 24, 46, 82, 113, 131, 190
  composite, 99
  integer, xvii, 24, 46, 82, 112
  irrational, 148, 166
  natural, xix, 24, 79, 82, 297
  non-negative, 212, 288
  prime, 25
  purely imaginary, 195
  rational, 24, 46, 61, 148, 166, 223, 285
  real, xvii, 18, 24, 46, 61, 111, 112, 195
number theory, xiv, xvi, 25, 86, 132, 310, 317, 327, 330
numerator, 148, 215
O’Dorney, Evan, xxiv, 101, 254, 313
O’Dorney, Jennifer, xxiii
Oakland/East Bay Math Circles, 313
Obama, Barack, 314
OEIS, 317
Olsson, Martin, xxiii
operation in a group
  associative, 24, 46, 131, 132
  binary, 24, 110
  closed, 24, 110
  commutative, 24
  symmetry, 26
operator theory, 64
optimization, 212, 214, 287, 288, 303
  global behavior, 292
  global extremum, 291
  local behavior, 292
  minimal, 287, 290
  potential extremum, 291
ordered pairs, 190
orthocenter, 9
orthogonal matrices, 128
output, 83, 297
outreach activities, 310, 311
Palimpsest, 301
parallelogram, xvii, 8, 16, 21, 294, 303
  center of, 8, 16

  diagonals of, xvii, 8, 21
parity
  of cubie’s turn, 39
  of edges, 40
partial differential equations, 328
Pascal’s Triangle, 164, 238
path
  broken, 12, 178, 187
  closed, 104, 124, 178
  of sunlight, 177, 187
  optimal, 10
  straight, 178
Paulos, John Allen, 39
Peano axioms, xix
Peavy, Barbara, xxiv
Pejic, Michael, xxiv
Penn State REU, 315
perfect square, 91
permutations, 23, 28, 104, 116, 137
  2-cycle, 35
  3-cycle, 43
  even, 35, 120, 124, 125, 138
  group, 27
  identity, 37
  multiplying, 28
  odd, 35, 120, 124, 138, 139
perpendicular bisector, 135
philosophy, xiii, xx, 53
PHP, xx, xxi, 31, 131, 136, 137
physics, xvi, 171, 176, 306
  law of, 5, 176
  laws of reflection, 176, 177
piano, 305, 330
Pierson, Laura, 313, 316
Pisani, Vincent, 317
Platonic solids, 208
poetry, 306, 317
polar form, 190
policy analysis, xiii
polygon, 5
  convex, 222
  regular, 189, 201, 202, 204, 210, 222
  regular n-gon, 27, 112, 135
  regular nanogon, 200
  regular pentagon, 196, 205
polynomial, xvii, 103, 203, 299
  algebra, 36
  Jones, 64, 65, 68
  real, 208
  symmetric, 254
Poonen, Bjorn, xxiii, xxiv, 312, 313, 327
poset, 315

possible states, 142
power curve, 194
powers, 161
pre-calculus, xvii
prime, xix, 92, 148, 256
  decomposition, 84, 248
  decomposition, square-free, 252
  power, 87, 97, 253
  prime-power reduction, 87
primitive root of unity, 113, 198, 201
probability, 306, 328
problem solving techniques, xxi
  abstract and develop a theory, 81
  introduce stronger object, 80
  proof via example, 86
  reduce to prime powers, 87
problem-solving techniques, xxii
programming, 317, 328
proof, xiv, xvi, xx, xxi, 288
property, xxi
proposition, xxi
protractor, 3, 20
Ptolemy’s Theorem, 179, 182
Putnam, xiii, 101, 314, 316, 327, 328
pyramid, 108
Pythagorean Theorem, xvii, 3, 20, 22, 174, 175, 185, 187, 202, 288, 302
  baby Pythagorean, 175, 180, 186
quadrilateral, xvii
  convex, 150
  cyclic, 182, 188
  diagonals of, 150, 179
  inscribed, 179
quantum mechanics, 64
quotient, 135
radians, 181
radicals, 105
Radko, Olga, 313
ratio, 4, 9, 10, 161, 174, 185, 190, 209
  golden, 160
rationalizing, 302
ray, 19, 176, 186
real analysis, xvi, 217, 304, 310, 317
real number system, xix
reciprocal, 133, 236
rectangle, 21, 110, 128, 130, 215, 292
reflection, 11, 16, 26, 46, 126, 129, 132, 133, 135, 172, 176, 186, 199, 289
regular polyhedra, 208
Reidemeister
  change-of-crossing move, 54

  moves, 53, 54, 58, 59, 67, 68, 71, 77
  Theorem, 53, 54
Reidemeister, Kurt, 53
relatively prime, 46, 83, 96, 98, 99, 134, 167, 243, 244, 259
remainders, xvii, 135, 259
  system of, 244
rescaling, 199
Research Science Institute, 315, 316
restricted patterns, 330
reverse weights, 282
revolution, 75
Rike, Tom, xix, xxiv, 312, 313, 329
roots
  formula, 196
  in C, 196
  of unity, 198, 202, 205
Rossi, Hugo, xxiii, 311
rotation, 26, 30, 33, 40, 41, 46, 122, 126, 129, 135, 179, 199
Rousse, 306
Rubik’s Cube, xvi, 23, 81, 103, 116, 328
Rubik’s Cube group, 27
San Francisco Math Circle, 313
San Francisco State College, 329
San Jose Math Circle, xv, 313, 328, 329
San Jose State University, 312, 329
Savine, Igor, xxiv
science, 51, 305, 306, 316, 318, 319
Scripps Spelling Bee, xxiii, 313
Second Derivative Test, 220, 230, 284
segment, 19, 185, 192, 218, 283, 295
self-correcting process, 151
sequence, 144
  constant, 222
  convergent, 278, 280, 300, 304
  increasing, 101, 296, 299
  majorizes, 222, 231, 271
  monotone, 304
  of averages, 167
  of moves, 104
  of transformations, 57
  recursive, xiv, 76
  stabilizes, 167
  subsequence, 299
Serganova, Vera, xxiv, 313
series, xiv, 203
  arithmetic, 89
  geometric, 88, 161, 170, 205, 304
  harmonic, 299
  Taylor, 287, 297

set, 205
  convex, 218
  of numbers, xvii
  subset of moves, 33
  subsets, 227, 269
  theory, xiv
Shapiro, Austin, 313
Shubin, Tatiana, xiiin, xxiv, 313, 329
Silicon Graphics, 328
Singer, Michael, xxiii
Sizemore, Steve, xxiv
skein relation, 65, 75, 78
Slettnes, Espen, 318
smooth power curve, 197
smoothing, 225, 264
  endless, 265
Smoothing Lemma, 269, 274, 283
Snow, Marsha, xxiv
soccer, 306
Sofia University, 330
South Korea, 319
Soviet Union, USSR, xix, 319, 329
span of vectors, 74
Special Interest Group of MAA on Math Circles, 329
square, 26, 46, 130, 315
Squeeze Theorem, 217
stabilize, 144, 147
Stanford, xiii, xxiii, 310, 314, 327, 328
Stanford Math Circle, 313
Stankova, Zvezdelina, xiiin, xxiv, 330
Sturmfels, Bernd, 313
subfigures, labeling of, xxii
subtraction in C, 193
Sudbury Math Circle, 313
sum
  bounded from above, 299
  finite, 297, 299
  infinite, 297, 299
  of squares, 202
  partial, 296, 299, 304
system of equations, 61
  homogeneous, 62
Tartaglia, Niccolò, 104
Taylor expansion, 172, 299
technical writing skills, xv
technology, 319
telescoping, 300
Terminator II, 328
tetrahedron, 108, 122, 138
theorem, xxi, 16, 23, 177

top-tier math circle, xiv, xx, 314, 320
topology, xvi, 310, 328
  algebraic, 328
  combinatorial, 53
  data analysis, 328
  low-dimensional, 64
transformation, xiv, 172, 199, 287, 292
translation, 115, 132, 134, 135, 199, 294
transpositions, 267
transversal, 10, 17, 173
Trapa, Peter, 310
trapezoid, xvii, 218, 224, 269, 283
trefoil, 52, 56–58, 60, 62
  left-handed, 60, 71
  right-handed, 51, 60, 66, 71
  via skein relation, 65
Triangle Inequality, 5, 12, 16, 175, 192, 209, 294
triangles, xvii, xx, 315
  center of, 9
  congruent, 3, 6
  equilateral, 26, 30, 46, 107, 128, 175, 185
  isosceles, 20, 22, 185
  obtuse, 15
  right, 15, 20, 182, 295
  right isosceles, 15, 16
  similar, 3, 5, 6, 13, 224
  symmetries of, 26
triangulation, 315
tricoloring, 56, 62, 74
  not tricolorable, 57
  number of, 59
  set of, 63
  tricolorability, 58, 72
  tricolorable, 56, 59, 60
  trivial, 56
Trigg, Charles, 301
trigonometry, xvi, 3, 171, 179, 223, 302
  application of, 184
  cosine, 180, 182
  cotangent, 180, 181
  inverse, 297
  sine, 180, 182
  tangent, 180, 181, 184
Tung, Stephanie, xxiv
Turkey, 49
UC Berkeley, xxiii, 64, 302, 312, 314, 315, 319, 327, 330
UC Davis, 312, 329
UC San Diego, 328

UC Santa Barbara, 329
uniqueness, 31, 47, 111, 149, 177, 234, 290, 295, 303
University of San Francisco, 311
unknot, 52, 56, 58, 62
unsmoothing, 276
US, xx, 307, 308, 310, 313, 318
USAMO, xxiv, 313, 315
Utah Math Circle, 310
Vakil, Ravi, xxiii, 313, 314
Vandervelde, Sam, 313
variable, 62, 74
Viète’s Formulas, 203, 204
Vladimir Arnol’d, 208
von Mangoldt function, 242
von Neumann algebras, 64
warning road sign, xxi
Washington, D.C., 327
weight, 222
Wertheimer, David, xxiv
Whitlow, Marc, xiiin, xxiii
Wiegers, Brandy, 313
Wikipedia, xxiv
Wiles, Andrew, xxii, 39
Wu, Hung-Hsi, 319
Yeung, Joyce, xxiv
Zakharevich, Inna, xxiv, 313
Zeitz, Paul, xxiv, 311, 313
Zucker, Joshua, xxiv, 312, 313
Zuckerberg, Mark, 314


Photo courtesy of Rudolph Chung

Many mathematicians have been drawn to mathematics through their experience with math circles. The Berkeley Math Circle (BMC) started in 1998 as one of the very first math circles in the U.S. Over the last decade and a half, 100 instructors (university professors, business tycoons, high school teachers, and more) have shared their passion for mathematics by delivering over 800 BMC sessions on the UC Berkeley campus every week during the school year.

This second volume of the book series is based on a dozen of these sessions, encompassing a variety of enticing and stimulating mathematical topics, some new and some continuing from Volume I:
• from dismantling Rubik’s Cube and randomly putting it back together to solving it with the power of group theory;
• from raising knot-eating machines and letting Alexander the Great cut the Gordian Knot to breaking through knot theory via the Jones polynomial;
• from entering a seemingly hopeless infinite raffle to becoming friendly with multiplicative functions in the land of Dirichlet, Möbius, and Euler;
• from leading an army of jumping fleas in an old problem from the International Mathematical Olympiads to improving our own essay-writing strategies;
• from searching for optimal paths on a hot summer day to questioning whether Archimedes was on his way to discovering trigonometry 2000 years ago.

Do some of these scenarios sound bizarre, having never before been associated with mathematics? Mathematicians love having fun while doing serious mathematics, and that love is what this book intends to share with the reader. Whether at a beginner, an intermediate, or an advanced level, anyone can find a place here to be provoked to think deeply and to be inspired to create.
In the interest of fostering a greater awareness and appreciation of mathematics and its connections to other disciplines and everyday life, MSRI and the AMS are publishing books in the Mathematical Circles Library series as a service to young people, their parents and teachers, and the mathematics profession.

For additional information and updates on this book, visit www.ams.org/bookpages/mcl-14

AMS on the Web www.ams.org MCL/14