Applications of Diophantine Approximation to Integral Points and Transcendence 9781108348096

This introduction to the theory of Diophantine approximation pays special regard to Schmidt's subspace theorem and


English Pages 209 Year 2018


Diophantine approximation


Equidistribution and Counting Under Equilibrium States in Negative Curvature and Trees: Applications to Non-Archimedean Diophantine Approximation 9783030183141, 9783030183158, 3030183149


Nevanlinna Theory and Its Relation to Diophantine Approximation 2021007603, 9789811233500, 9789811233517, 9789811233524


Sums of Reciprocals of Fractional Parts and Multiplicative Diophantine Approximation [1 ed.] 9781470456603, 9781470440954


Integral points on algebraic varieties. An introduction to Diophantine geometry [1st ed.] 978-981-10-2648-5, 9811026483, 978-93-80250-83-0


Systems, Approximation, Singular Integral Operators, and Related Topics: International Workshop on Operator Theory and Applications, Iwota 2000 9783034895347, 3034895348

This book is devoted to some topical problems and applications of operator theory and its interplay with modern complex


Convexity, Extension of Linear Operators, Approximation and Applications 1527585042, 9781527585041

This book emphasizes some basic results in functional and classical analysis, including Hahn-Banach-type theorems, the M


Diophantine equations and power integral bases theory [2 ed.] 9783030238643, 9783030238650


Diophantine Approximation and Dirichlet Series [2 ed.] 9789811593505, 9789811593512, 9789386279828

The second edition of the book includes a new chapter on the study of composition operators on the Hardy space and their


Applications of Diophantine Approximation to Integral Points and Transcendence
9781108348096

Authors: Pietro Corvaja, Umberto Zannier

Category: Mathematics

Table of contents:
Contents
Preface
Notation and Conventions
Introduction
1.1 The Origins
1.2 From Thue to Roth
1.3 Exercises
1.4 Notes
2.1 From Roth to Schmidt
2.2 The S-Unit Equation
2.3 S-Unit Points on Algebraic Varieties
2.4 Norm-Form Equations
2.5 Exercises
2.6 Notes
3.1 General Notions on Integral Points
3.2 The Chevalley–Weil Theorem
3.3 Integral Points on Curves: Siegel’s Theorem
3.4 Another Approach to Siegel’s Theorem
3.5 Varieties of Higher Dimension
3.6 Quadratic-Integral Points on Curves
3.7 Rational Points
3.8 The Hilbert Irreducibility Theorem
3.9 Constructing Integral Points on Certain Surfaces
3.10 Exercises
3.11 Notes
4.1 Linear Recurrences
4.2 Zeros of Recurrences
4.3 Quotients of Recurrences and gcd Estimates
4.4 Applications of gcd Estimates
4.5 Further Diophantine Problems with Recurrences
4.6 Fractional Parts of Powers
4.7 Markov Numbers
4.8 Exercises
4.9 Notes
5.1 Transcendence of Lacunary Series
5.2 Complexity of Algebraic Numbers
References
Index


CAMBRIDGE TRACTS IN MATHEMATICS

General Editors: B. Bollobás, W. Fulton, F. Kirwan, P. Sarnak, B. Simon, B. Totaro

212 Applications of Diophantine Approximation to Integral Points and Transcendence

A complete list of books in the series can be found at www.cambridge.org/mathematics. Recent titles include the following:

178. Analysis in Positive Characteristic. By A. N. Kochubei
179. Dynamics of Linear Operators. By F. Bayart and É. Matheron
180. Synthetic Geometry of Manifolds. By A. Kock
181. Totally Positive Matrices. By A. Pinkus
182. Nonlinear Markov Processes and Kinetic Equations. By V. N. Kolokoltsov
183. Period Domains over Finite and p-adic Fields. By J.-F. Dat, S. Orlik, and M. Rapoport
184. Algebraic Theories. By J. Adámek, J. Rosický, and E. M. Vitale
185. Rigidity in Higher Rank Abelian Group Actions I: Introduction and Cocycle Problem. By A. Katok and V. Niţică
186. Dimensions, Embeddings, and Attractors. By J. C. Robinson
187. Convexity: An Analytic Viewpoint. By B. Simon
188. Modern Approaches to the Invariant Subspace Problem. By I. Chalendar and J. R. Partington
189. Nonlinear Perron–Frobenius Theory. By B. Lemmens and R. Nussbaum
190. Jordan Structures in Geometry and Analysis. By C.-H. Chu
191. Malliavin Calculus for Lévy Processes and Infinite-Dimensional Brownian Motion. By H. Osswald
192. Normal Approximations with Malliavin Calculus. By I. Nourdin and G. Peccati
193. Distribution Modulo One and Diophantine Approximation. By Y. Bugeaud
194. Mathematics of Two-Dimensional Turbulence. By S. Kuksin and A. Shirikyan
195. A Universal Construction for Groups Acting Freely on Real Trees. By I. Chiswell and T. Müller
196. The Theory of Hardy’s Z-Function. By A. Ivić
197. Induced Representations of Locally Compact Groups. By E. Kaniuth and K. F. Taylor
198. Topics in Critical Point Theory. By K. Perera and M. Schechter
199. Combinatorics of Minuscule Representations. By R. M. Green
200. Singularities of the Minimal Model Program. By J. Kollár
201. Coherence in Three-Dimensional Category Theory. By N. Gurski
202. Canonical Ramsey Theory on Polish Spaces. By V. Kanovei, M. Sabok, and J. Zapletal
203. A Primer on the Dirichlet Space. By O. El-Fallah, K. Kellay, J. Mashreghi, and T. Ransford
204. Group Cohomology and Algebraic Cycles. By B. Totaro
205. Ridge Functions. By A. Pinkus
206. Probability on Real Lie Algebras. By U. Franz and N. Privault
207. Auxiliary Polynomials in Number Theory. By D. Masser
208. Representations of Elementary Abelian p-Groups and Vector Bundles. By D. J. Benson
209. Non-homogeneous Random Walks. By M. Menshikov, S. Popov, and A. Wade
210. Fourier Integrals in Classical Analysis (Second Edition). By C. D. Sogge
211. Eigenvalues, Multiplicities and Graphs. By C. R. Johnson and C. M. Saiago
212. Applications of Diophantine Approximation to Integral Points and Transcendence. By P. Corvaja and U. Zannier

Applications of Diophantine Approximation to Integral Points and Transcendence

PIETRO CORVAJA
Università degli Studi di Udine, Italy

UMBERTO ZANNIER
Scuola Normale Superiore, Pisa

University Printing House, Cambridge CB2 8BS, United Kingdom One Liberty Plaza, 20th Floor, New York, NY 10006, USA 477 Williamstown Road, Port Melbourne, VIC 3207, Australia 314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India 79 Anson Road, #06–04/06, Singapore 079906 Cambridge University Press is part of the University of Cambridge. It furthers the University’s mission by disseminating knowledge in the pursuit of education, learning, and research at the highest international levels of excellence. www.cambridge.org Information on this title: www.cambridge.org/9781108424943 DOI: 10.1017/9781108348096 © Pietro Corvaja and Umberto Zannier 2018 This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press. First published 2018 Printed in the United Kingdom by Clays, St Ives plc A catalogue record for this publication is available from the British Library. ISBN 978-1-108-42494-3 Hardback Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

Contents

Preface
Notation and Conventions
Introduction

1 Diophantine Approximation and Diophantine Equations
1.1 The Origins
1.2 From Thue to Roth
1.3 Exercises
1.4 Notes

2 Schmidt’s Subspace Theorem and S-Unit Equations
2.1 From Roth to Schmidt
2.2 The S-Unit Equation
2.3 S-Unit Points on Algebraic Varieties
2.4 Norm-Form Equations
2.5 Exercises
2.6 Notes

3 Integral Points on Curves and Other Varieties
3.1 General Notions on Integral Points
3.2 The Chevalley–Weil Theorem
3.3 Integral Points on Curves: Siegel’s Theorem
3.4 Another Approach to Siegel’s Theorem
3.5 Varieties of Higher Dimension
3.6 Quadratic-Integral Points on Curves
3.7 Rational Points
3.8 The Hilbert Irreducibility Theorem
3.9 Constructing Integral Points on Certain Surfaces
3.10 Exercises
3.11 Notes

4 Diophantine Equations with Linear Recurrences
4.1 Linear Recurrences
4.2 Zeros of Recurrences
4.3 Quotients of Recurrences and gcd Estimates
4.4 Applications of gcd Estimates
4.5 Further Diophantine Problems with Recurrences
4.6 Fractional Parts of Powers
4.7 Markov Numbers
4.8 Exercises
4.9 Notes

5 Some Applications of the Subspace Theorem in Transcendental Number Theory
5.1 Transcendence of Lacunary Series
5.2 Complexity of Algebraic Numbers

References
Index

Preface

The present work originates from a short course (14 hours) given by the second author at the University of Pisa in October 2002. It was addressed to graduate students who did not necessarily have a specific background. Notes were taken and collected in a short volume [Z5], which is now out of print. About ten years later, the first author gave another short course on similar topics at the Mathematical Science Institute of Chennai, India. The notes have recently been published in [Co2]. In the meantime, several new results had been obtained, and it seemed natural to add some material to the first volume so as to make it more complete. The present authors had worked on several of the applications presented in the old notes, so they decided to write this new edition jointly. Writing an entirely new volume seemed difficult and much more time-consuming. Therefore we decided to keep much of the former version of the second author’s book [Z5], with just some additions. This choice also meant leaving out highly interesting results obtained by other authors.

As with the former notes, the present work does not require any particular prerequisites. Indeed, certain basic notions will be recalled, so the general level may be considered fairly elementary. The style is somewhere between a survey and a detailed account. The last two chapters in particular contain more recent material. Roughly speaking, the contents concern certain applications of Diophantine approximation to Diophantine equations. The whole field is, however, far too vast for a (short) course, or even for a general survey. Therefore we have concentrated on a few topics involving the celebrated subspace theorem of W. M. Schmidt. However, the (difficult) proof of this theorem will not be discussed. Still less will we discuss the quantitative versions by J.-H. Evertse, H.-P. Schlickewei, and Schmidt, or the geometric formulations due to Faltings and Wüstholz and to Evertse and Ferretti. Even within these limitations, we have not always given complete details.

The five chapters contain several exercises. These appear both in the course of the main text and in a separate section near the end of each chapter. Those in the latter category often contain hints at solutions, and they sometimes convey known results that are not included in full for the sake of brevity. A ∗ is attached to somewhat more involved exercises. As far as the proofs of the theorems are concerned, we have basically followed the original arguments, but naturally we have sometimes introduced (more or less minor) variations. Also, some statements appear here for the first time in the literature, especially concerning concrete examples and applications.

Notation and Conventions

The letters $\mathbb N, \mathbb Z, \mathbb Q, \mathbb R, \mathbb C$ will have their usual meanings, and $\overline{\mathbb Q}$ will denote an algebraic closure of $\mathbb Q$. Usually (but not always) the letter $k$ will denote a number field, with ring of integers $\mathcal O = \mathcal O_k$. Further related notation will be introduced or recalled in Section 1.2.2. If $P \in k[X_1, \ldots, X_n]$ and $\sigma$ is an isomorphism of $k$ into some field, $P^\sigma$ will denote the polynomial obtained by applying $\sigma$ to the coefficients of $P$. For a group $G$, the set $\{g^d : g \in G\}$ will be denoted by $[d]G$. By $\mathbb G_m^n$ we shall denote the $n$th power of the multiplicative algebraic group $\mathbb G_m$, as recalled in Section 2.3. For a commutative ring $R$, we shall denote by $R^*$ the (multiplicative) group of invertible elements of $R$.

The symbols $\mathbb A^n$ and $\mathbb P^n$ will denote affine and projective $n$-dimensional space, respectively. The point of $\mathbb P^n$ with homogeneous coordinates $x_0, x_1, \ldots, x_n$ will be denoted by $(x_0 : x_1 : \cdots : x_n)$. For an algebraic variety $V$ embedded in some affine or projective space, $V(L)$ will denote the set of points of $V$ with coordinates in the field $L$ (or ring, or set, if $V$ is affine). We have sometimes used the terms “point of $V$” and “vector of $V$” interchangeably. By “$V/k$” we shall mean that $V$ is defined over the field $k$, i.e., defined by a system of equations with coefficients in $k$. In that case, $k(V)$ will denote the function field of $V$ over $k$. If $V$ is affine, $k[V]$ will denote its coordinate ring over $k$. (Further terminology from algebraic geometry will also be standard, following, for example, [H].)

Usually, $\mathbf X$ will denote a vector of variables $(X_1, \ldots, X_n)$, while $\mathbf x$ will represent suitable specializations of $\mathbf X$. For a vector $\mathbf a = (a_1, \ldots, a_n) \in \mathbb Z^n$, we shall put $\mathbf X^{\mathbf a} := X_1^{a_1} \cdots X_n^{a_n}$.

The symbols $O$ and $\ll$ will have their usual meanings. For real functions $f, g$ of certain variables, expressions like “$f = O(g)$” and “$f \ll g$” will mean that $|f| \le C \cdot |g|$ for the relevant values of the variables, which will normally be clear from the context. Here the implied constant $C$ is a positive number depending only on certain basic data. These data too will normally be clear from the context. If not, notation like “$f \ll_\varepsilon g$” will mean that $C$ may also depend on the parameter $\varepsilon$. By $f \asymp g$ we mean both $f \ll g$ and $f \gg g$.

Concerning the list of references: whenever the content of certain original papers has been treated exhaustively in some book, we have often cited only the book. The aim is twofold: to direct the reader toward a more ample source, and to avoid expanding the already rather long list. For the same reason, we have occasionally omitted some specific relevant reference, provided that it appears in some other item that has been cited.

Introduction

Diophantine approximation may be roughly described as the branch of number theory concerned with approximations by rational numbers; or rather, this was its original motivation. Such questions have attracted continued attention, and this is undoubtedly due in large part to their relevance for another, more ancient, topic: the theory of Diophantine equations. These are equations whose solutions have to be found in integers or rationals, possibly in a finite extension of $\mathbb Q$. The connections between the two subjects had already been observed by Lagrange and Legendre, and they were explicitly pointed out by the Norwegian A. Thue. In 1909 Thue proved a finiteness theorem for Diophantine equations which, for the first time, covered whole families of equations of arbitrarily large degree. At that time such equations could be treated only occasionally, and merely with ad hoc methods, albeit ingenious ones. Thue’s theorem relied solely on a result limiting the accuracy of rational approximations to algebraic numbers. (An earlier result had been obtained by Liouville, but it was too weak for applications to equations.) Thue’s method was extended and refined by such authors as C. L. Siegel, A. O. Gelfond, and F. Dyson. In 1955 K. F. Roth proved a best-possible result in this direction. However, other related questions remained open, such as simultaneous approximation to several numbers, and for these Roth’s techniques gave only partial answers. Around 1970 W. M. Schmidt combined the known methods with new ideas and resolved the whole subject, proving a multidimensional version of Roth’s result, which became known as the subspace theorem. Schmidt himself discovered remarkable applications to Diophantine equations in several variables generalizing those considered by Thue. Later, H.-P. Schlickewei extended the theorem to cover number fields and several absolute values. These versions soon suggested new applications, for instance to the so-called $S$-unit equations (which had already appeared in Siegel’s work). More recently, still further applications have been found to Diophantine equations with recurrence sequences of semi-exponential type, and also to the problem of integral points on varieties.

The present book will cover some of these results. In Chapter 1 we shall briefly review a few classical facts, from Pell’s equation to Thue’s and Roth’s theorems. We shall also recall some modern versions with several absolute values (after Ridout, Mahler, and Lang) and some applications. In Chapter 2 we shall state a few versions of the subspace theorem, due to Schmidt and Schlickewei. We shall then apply these to the equation $x_1 + \cdots + x_n = 1$ in $S$-units $x_i$ and, more generally, to $S$-unit points on algebraic varieties. Finally, as an application, we shall present a fairly simple proof of one of Schmidt’s theorems on norm-form equations. Chapter 3 will be devoted to integral points on algebraic curves and on certain varieties of higher dimension. After some definitions and examples, we shall sketch a modern version of Siegel’s original proof of his celebrated theorem. Then we shall present a new argument depending on the subspace theorem. Here we shall also mention how this method may be extended to cover the case of certain surfaces (and, more generally, of varieties) with sufficiently many components at infinity. As an application, we treat the question of quadratic-integral points on algebraic curves. In this chapter we also consider the Hilbert property for the set of rational points on an algebraic variety, which originates from Hilbert’s irreducibility theorem, and we compare it with the Chevalley–Weil theorem. Chapter 4 will concern linear recurrence sequences.
After surveying a number of basic facts and the classical results on zeros, we shall concentrate on two problems. The first is the so-called quotient problem, which concerns the integrality of the values $u_n/v_n$. The second is the $d$th-root problem, which concerns the equations $y^d = u_n$. A related question treated in this chapter concerns estimates for the greatest common divisor of pairs of numbers of the form $a^n - 1$, $b^n - 1$. We shall present several applications of these estimates to seemingly unrelated fields. Finally, the last chapter contains applications to transcendental number theory of Diophantine estimates arising from the subspace theorem.

1 Diophantine Approximation and Diophantine Equations

1.1 The Origins

As mentioned in the Introduction, Diophantine approximation stems from the study of good rational approximations to a given real number. The term “Diophantine” comes from the mathematician Diophantus of Alexandria (about 250 AD). He wrote a treatise on mathematical problems corresponding to equations in which solutions in integers or rational numbers were required.¹ Naturally, every real number admits rational approximations with arbitrarily small error. However, the really “good” ones are those whose accuracy is high compared with the complexity of the rational fraction. In other words, we try to approach our number by means of “simple” rational fractions, that is, ones with a “small” denominator (or numerator). The point is that, once the target has been specified, not all denominators are equally effective. For instance, with the denominator 100 we can approximate $\sqrt 2$ at best with an accuracy of about 1/250 (with the fraction 141/100), while the denominator 70 yields an accuracy better than 1/13,000 (with the fraction 99/70).

These questions go back to ancient times. As remarked by Tijdeman (see his paper in [EE]), the inequalities $223/71 < \pi < 22/7$ obtained by Archimedes may be considered primordial results in this direction. Apart from the great intrinsic interest of this topic, however, here we want to emphasize its applications to the theory of Diophantine equations. These are equations to be solved in integers (of $\mathbb Z$) or rational numbers (in $\mathbb Q$ or, more generally, in a number field). Conversely, Diophantine equations have often been a source of motivation for Diophantine approximation. We shall briefly review a few fundamental steps of this interplay, and later focus in more detail on certain aspects (see also Tijdeman’s paper mentioned above).

¹ This treatise consisted of several books, of which only a part has survived to our time.
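The two approximations to $\sqrt 2$ quoted above can be checked numerically. For a fixed denominator $q$, the best numerator is simply the nearest integer to $q\sqrt 2$. The following Python sketch does this; the helper name `best_error` is ours and purely illustrative.

```python
import math

def best_error(xi: float, q: int) -> tuple[int, float]:
    """Best numerator p for the denominator q, and the error |xi - p/q|."""
    p = round(q * xi)            # the nearest integer to q*xi minimizes |q*xi - p|
    return p, abs(xi - p / q)

root2 = math.sqrt(2)
for q in (100, 70):
    p, err = best_error(root2, q)
    print(f"{p}/{q}: error {err:.3e}, i.e. about 1/{1 / err:.0f}")
```

The larger denominator 100 gives an error roughly 58 times worse than 70 does, which is exactly the phenomenon described in the text.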


1.1.1 Linear Equations

The simplest Diophantine equations, the linear ones, were considered by Euclid, who in practice answered all the most natural questions about them. We start with the simplest case: a line through the origin, with equation $aX = bY$. Here $a, b$ can be supposed to be coprime integers. By the uniqueness of factorization in the ring $\mathbb Z$ of integers, all the integral points are of the form $(x, y) = (nb, na)$ for $n \in \mathbb Z$.

Our second example is a line with equation $aY - bX = 1$, where $a, b > 0$ are integers. It is particularly illustrative, and the general theory of linear equations boils down to this case. Euclid’s algorithm shows that there exist integer solutions if and only if $a$ and $b$ are coprime. This simple equation already embodies a principle of Diophantine approximation. Indeed, for an integer solution $(p, q)$ with $q > 0$ we have

$$\frac{a}{b} - \frac{p}{q} = \frac{1}{qb}. \tag{1.1}$$

Hence the fraction $p/q$ is remarkably close to $a/b$. In fact, let $p', q' > 0$ be any integers with $p'/q' \neq a/b$. Then the difference $(a/b) - (p'/q')$ has the form $d/(bq')$, where $d = q'a - p'b$ is a non-zero integer. Therefore $|d| \ge 1$, whence $|(a/b) - (p'/q')| \ge 1/(q'b)$. This shows that the integral point $(p, q)$ on our line produces a rational approximation $p/q$ to the (rational) number $a/b$ which is in a sense optimal. Its accuracy is better than that of any other fraction $p'/q'$ whose denominator $q'$ is $< q$ (with the obvious possible exception of the trivial approximation $p'/q' = a/b$). This argument may be reversed: the search for good rational approximations to $a/b$ leads to solutions of the above Diophantine equation. Indeed, an algorithm for finding solutions to (1.1) comes from the continued fraction for $a/b$. We briefly review the fundamental facts about it.

Remark 1.1 (Euclid’s algorithm and continued fractions) We recall these facts briefly and without proofs. We start with Euclid’s algorithm for solving $ax + by = \gcd(a, b)$ in integers, for given integers $a, b$. Assuming $b > 0$, we divide $a$ by $b$, obtaining $a = q_1 b + r_1$ with $0 \le r_1 < b$. If $r_1 > 0$ we continue with $b = q_2 r_1 + r_2$, $0 \le r_2 < r_1$, and so on: $r_i = q_{i+2} r_{i+1} + r_{i+2}$, $0 \le r_{i+2} < r_{i+1}$. We stop when we obtain a zero remainder, which will certainly happen sooner or later. It is easy to check that the last non-zero remainder is $\gcd(a, b)$. Using the equations in reverse order, we then easily obtain the sought solution. (The same algorithm works in $k[X]$, for any field $k$.) This algorithm can be rephrased in terms of the continued fraction expansion of the (positive) rational number $a/b$, in the sense that we may write

$$\frac{a}{b} = a_1 + \frac{r_1}{b} = a_1 + \cfrac{1}{a_2 + r_2/r_1} = \cdots = a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots + \cfrac{1}{a_m}}}.$$
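The procedure of Remark 1.1 can be sketched in Python as follows; the function names are ours and purely illustrative. `cf_rational` runs Euclid’s algorithm and records the quotients $a_1, \ldots, a_m$ of $a/b$. `solve_line` then takes the penultimate convergent $p_{m-1}/q_{m-1}$ and uses the identity $a q_{m-1} - b p_{m-1} = (-1)^m$. This identity is the case $p_m/q_m = a/b$ of the determinant relation for convergents recalled later in this section. For odd $m$, the sign is corrected by shifting the solution by $(a, b)$.

```python
from math import gcd

def cf_rational(a: int, b: int) -> list[int]:
    """Partial quotients [a_1, ..., a_m] of a/b, read off from Euclid's algorithm (b > 0)."""
    quotients = []
    while b:
        q, r = divmod(a, b)      # a = q*b + r with 0 <= r < b
        quotients.append(q)
        a, b = b, r              # continue with the pair (b, r), as in Remark 1.1
    return quotients

def solve_line(a: int, b: int) -> tuple[int, int]:
    """Return (X, Y) with a*Y - b*X = 1 and Y > 0, for coprime a, b > 0."""
    if gcd(a, b) != 1:
        raise ValueError("a and b must be coprime")
    quotients = cf_rational(a, b)
    m = len(quotients)
    # Convergents via p_k = a_k p_{k-1} + p_{k-2} (same for q), starting from
    # (p_{-1}, q_{-1}) = (0, 1) and (p_0, q_0) = (1, 0).
    p_prev, q_prev, p, q = 0, 1, 1, 0
    for a_k in quotients[:-1]:                    # stop at the penultimate term
        p_prev, q_prev, p, q = p, q, a_k * p + p_prev, a_k * q + q_prev
    # Now (p, q) = (p_{m-1}, q_{m-1}) and a*q - b*p = (-1)**m.
    if m % 2 == 0:
        return p, q
    return a - p, b - q                           # shift by (a, b) to flip the sign

print(cf_rational(5, 3), solve_line(5, 3))       # [1, 1, 2] (3, 2)
```

For $a/b = 5/3 = [1, 1, 2]$, the penultimate convergent is $2/1$. Since $m = 3$ is odd, the shift gives $(X, Y) = (3, 2)$, and indeed $5 \cdot 2 - 3 \cdot 3 = 1$.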

This expansion is essentially unique. The only exception is that, if $a_m > 1$, we may replace the last partial quotient $a_m$ by the two partial quotients $a_m - 1, 1$, since $a_m = (a_m - 1) + \frac{1}{1}$. A solution to Euclid’s equation is obtained by computing the truncated continued fraction at the penultimate term.

This algorithm works for any real number $\xi$ in the following way. We start by writing $\xi = a_1 + \theta_1$, where $a_1 = [\xi]$ is the integral part and $0 \le \theta_1 < 1$. If $\theta_1 \neq 0$ (which is certainly the case if $\xi$ is irrational), we write $\theta_1 = 1/\xi_1$ with $\xi_1 > 1$. We then continue with $\xi_1 = a_2 + \theta_2$, where $0 \le \theta_2 < 1$. If $\xi$ is rational, the procedure ends after finitely many steps and amounts to Euclid’s algorithm, as illustrated above. If $\xi$ is irrational, the procedure continues indefinitely and we write

$$\xi = a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cfrac{1}{\ddots}}} = a_1 + \frac{1}{a_2 +}\,\frac{1}{a_3 +}\cdots = [a_1, a_2, \ldots],$$

where the last two expressions are the customary abbreviations. The integers $a_i$ are called partial quotients, a terminology motivated by the link with Euclid’s algorithm. They are all strictly positive, with the possible exception of the first one. This equality means that the finite truncations of such an infinite continued fraction converge to $\xi$, as can be proved. Actually much more is true. Define $p_m/q_m = [a_1, a_2, \ldots, a_m]$, with $q_m > 0$, as the reduced expression for the truncated continued fraction; it is called the $m$th convergent to $\xi$. Then we have

$$\left|\xi - \frac{p_m}{q_m}\right| < \frac{1}{q_m q_{m+1}} \le \frac{1}{a_{m+1} q_m^2}. \tag{1.2}$$

This may be rewritten as $|q_m \xi - p_m| < 1/(a_{m+1} q_m)$. The approximations are “the best” in the following sense: for every integer $q < q_{m+1}$ and every $p$ we have $|q_m \xi - p_m| \le |q\xi - p|$, with equality only for $q = q_m$, $p = p_m$. (In particular, $|\xi - p_m/q_m| < |\xi - p/q|$ for all integers $p$ and $0 < q < q_m$.) The last property essentially holds also for rational $\xi$. On putting $p_0 = 1$, $q_0 = 0$, the sequences $p_m$ and $q_m$ satisfy the recurrences

$$p_{m+2} = a_{m+2} p_{m+1} + p_m, \qquad q_{m+2} = a_{m+2} q_{m+1} + q_m,$$

which are sometimes expressed in the rather convenient matrix form

$$\begin{pmatrix} p_m & p_{m-1} \\ q_m & q_{m-1} \end{pmatrix} = \begin{pmatrix} a_1 & 1 \\ 1 & 0 \end{pmatrix} \cdots \begin{pmatrix} a_m & 1 \\ 1 & 0 \end{pmatrix}.$$
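The recurrences and the estimate (1.2) are easy to check numerically. The Python sketch below computes partial quotients exactly with `fractions.Fraction`; the helper names are ours. It builds the convergents from the recurrence with $(p_0, q_0) = (1, 0)$ and the auxiliary starting values $(p_{-1}, q_{-1}) = (0, 1)$. These starting values are our own assumption, chosen so that $p_1 = a_1$ and $q_1 = 1$. The script then verifies three things: the determinant relation $p_n q_{n+1} - p_{n+1} q_n = (-1)^n$ obtained from the matrix form, both inequalities in (1.2), and the matrix product itself. As input we take the exact rational value of the floating-point number `math.sqrt(2)`. Its first ten partial quotients coincide with those of $\sqrt 2 = [1, 2, 2, \ldots]$, so the convergents are those of $\sqrt 2$ itself.

```python
import math
from fractions import Fraction

def partial_quotients(xi: Fraction, count: int) -> list[int]:
    """First `count` partial quotients of xi (fewer if the expansion of xi ends)."""
    quotients = []
    for _ in range(count):
        a = math.floor(xi)          # a_k = [xi_k], the integral part
        quotients.append(a)
        theta = xi - a              # 0 <= theta_k < 1
        if theta == 0:              # rational xi: the expansion has ended
            break
        xi = 1 / theta              # xi_k = 1/theta_k > 1
    return quotients

def convergents(quotients: list[int]) -> list[tuple[int, int]]:
    """Convergents (p_m, q_m), m = 1, 2, ..., from p_{m+1} = a_{m+1} p_m + p_{m-1}."""
    p_prev, q_prev, p, q = 0, 1, 1, 0          # (p_{-1}, q_{-1}) and (p_0, q_0)
    result = []
    for a in quotients:
        p_prev, q_prev, p, q = p, q, a * p + p_prev, a * q + q_prev
        result.append((p, q))
    return result

xi = Fraction(math.sqrt(2))                    # exact rational copy of the float
a = partial_quotients(xi, 10)
conv = convergents(a)
for m in range(len(conv) - 1):                 # conv[m] is (p_{m+1}, q_{m+1})
    (p, q), (p1, q1) = conv[m], conv[m + 1]
    assert p * q1 - p1 * q == (-1) ** (m + 1)  # determinant relation
    # both inequalities of (1.2); a[m + 1] is the partial quotient a_{m+2}
    assert abs(xi - Fraction(p, q)) < Fraction(1, q * q1) <= Fraction(1, a[m + 1] * q * q)

# Matrix form: the product of the matrices [[a_k, 1], [1, 0]].
M = [[1, 0], [0, 1]]
for a_k in a:
    M = [[M[0][0] * a_k + M[0][1], M[0][0]], [M[1][0] * a_k + M[1][1], M[1][0]]]
assert M == [[conv[-1][0], conv[-2][0]], [conv[-1][1], conv[-2][1]]]
print(conv[5])                                 # (99, 70)
```

The sixth convergent is the fraction 99/70 mentioned at the beginning of this section.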


By induction, or by taking determinants, these yield $p_n q_{n+1} - p_{n+1} q_n = (-1)^n$. Note that, if we view $a_1, a_2, \ldots$ as independent variables, the above formula provides infinitely many polynomial parametrizations of $\mathrm{SL}_2$ with integral coefficients. As we have remarked, the continued fraction is effectively computable for every given rational number. For quadratic irrationals, it has been known since Lagrange and Galois that the continued fraction is pre-periodic (and conversely), and that the anti-period and period are effectively computable. On the other hand, very little is known for more general classes of numbers, with a few exceptions. For instance, for no algebraic number of degree $> 2$ do we know whether the partial quotients are bounded (one would conjecture that they are not). Only for a “few” transcendental numbers do we have explicit formulae, for instance $e = [2, 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, \ldots]$. We refer to [C1], [L2], and [S1] for the basic theory and proofs of the stated facts.

Exercise 1.2 Prove that the different parametrizations of $\mathrm{SL}_2$ described above cannot be obtained from one another by polynomial composition.

Exercise 1.3 Prove that, for coprime $a, b$, Euclid’s algorithm leads to an integral solution $(m, n)$ of $aX + bY = 1$ after at most $c \cdot \log \min(|a|, |b|) + 1$ steps, for some constant $c$. (Also, find a “best-possible” constant and show that it is attained with consecutive Fibonacci numbers.)

Exercise 1.4 Prove that, if $a, b$ are coprime positive integers, then for all sufficiently large integers $r$ there exists a solution of $aX + bY = r$ in non-negative integers. (Also, prove that the largest $r$ for which there are no such solutions is $(a - 1)(b - 1) - 1$.)

Exercise 1.5 Compute the anti-period and period of the continued fraction for $\sqrt 7$.

Exercise 1.6 Let $A$ be an $r \times n$ matrix with entries in $\mathbb Z$ and let $v \in \mathbb Z^r$. Prove that the equation $Ax = v$ has a solution $x \in \mathbb Z^n$ if and only if the congruence $Ax \equiv v \pmod m$ has a solution for all positive integers $m$. (Hint: the image $A(\mathbb Z^n)$ is a subgroup of $\mathbb Z^r$. Use the theorem of elementary divisors to find a basis $b_i$ of $\mathbb Z^r$ such that some integral multiples $\delta_i b_i$ generate the subgroup . . . ).
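The claim that the anti-period and period of a quadratic irrational are effectively computable can be illustrated in the simplest case, $\xi = \sqrt\Delta$ with $\Delta$ a positive non-square integer. The Python sketch below uses the classical integer recurrence for the complete quotients $\xi_k = (m_k + \sqrt\Delta)/d_k$. This is a standard method that is not described in the text above, so treat it as our own choice. For $\sqrt\Delta$ the anti-period is just $a_1 = [\sqrt\Delta]$, and the period ends at the first partial quotient equal to $2a_1$. We illustrate it on $\sqrt{13}$ rather than $\sqrt 7$, leaving Exercise 1.5 to the reader.

```python
from math import isqrt

def sqrt_cf(delta: int) -> tuple[int, list[int]]:
    """Return (a_1, period) with sqrt(delta) = [a_1, period, period, ...].

    Works only with integers: the complete quotients are kept in the form
    xi_k = (m_k + sqrt(delta)) / d_k, so no floating point is involved.
    """
    a1 = isqrt(delta)
    if a1 * a1 == delta:
        raise ValueError("delta must not be a perfect square")
    m, d, a = 0, 1, a1
    period = []
    while a != 2 * a1:                # for sqrt(delta) the period ends with 2*a_1
        m = d * a - m
        d = (delta - m * m) // d      # this division is always exact
        a = (a1 + m) // d             # a_k = [xi_k]
        period.append(a)
    return a1, period

print(sqrt_cf(13))                    # (3, [1, 1, 1, 1, 6])
```

Because every step uses integer arithmetic only, the result is exact for arbitrarily large $\Delta$. A floating-point expansion, by contrast, would eventually lose accuracy.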


1.1.2 Binary Quadratic Equations

Let us now consider quadratic Diophantine equations, which historically represented the next step after the linear case. Again, let us concentrate on the case of two variables, supposed to take integer values. Our problem then corresponds to the search for integral points on an affine conic, which can be assumed irreducible (otherwise we fall back to the case of lines). If the conic is an ellipse, the integral points naturally form a finite set, by compactness.² If the conic is a parabola, then easy linear substitutions (with integral coefficients, together with their inverses) put its equation in the shape

$$dY = aX^2 + bX + c, \qquad a, b, c, d \in \mathbb Z, \quad ad \neq 0, \tag{1.3}$$

and the search for integral points reduces to the solution of the congruence $aX^2 + bX + c \equiv 0 \pmod d$. We are left with the hyperbola, the most interesting case. It turns out (as observed by Lagrange and Gauss) that the whole theory depends on the equation

$$X^2 - \Delta Y^2 = 1, \tag{1.4}$$

where $\Delta$ is a positive integer, assumed not to be a perfect square. (Otherwise the factorization $X^2 - \Delta Y^2 = (X + \sqrt\Delta Y)(X - \sqrt\Delta Y)$ shows that the only integral solutions are $(\pm 1, 0)$.) This equation can be traced back to ancient times.³ It was explicitly proposed in the seventeenth century by P. Fermat, the famous judge who was a great mathematician as a hobby. However, Euler erroneously attributed it to J. Pell, and even today the name Pell’s equation is commonly used. Lagrange was the first to prove (for this proof see Remark 1.10(ii) below) that, if $\Delta$ is a positive integer and not a perfect square, the equation always admits non-trivial integral solutions. These are solutions $(p, q) \in \mathbb Z^2$ such that $q \neq 0$. Observe that such a solution generates infinitely many others: for any integer $n \in \mathbb Z$, put $p_n \pm q_n\sqrt\Delta = (p \pm q\sqrt\Delta)^n$, or, equivalently,

$$p_n = \frac{(p + q\sqrt\Delta)^n + (p - q\sqrt\Delta)^n}{2}, \qquad q_n = \frac{(p + q\sqrt\Delta)^n - (p - q\sqrt\Delta)^n}{2\sqrt\Delta}.$$

One may check that the $(p_n, q_n)$ are pairwise distinct integral points satisfying $p_n^2 - \Delta q_n^2 = 1$, i.e., lying on the hyperbola defined by Pell’s equation.

Lagrange’s result is quite remarkable, for several reasons. For instance, it easily yields the structure of the group of invertible elements of norm 1 in the quadratic ring $\mathbb Z[\sqrt\Delta]$. This group is isomorphic to $\mathbb Z/(2) \oplus \mathbb Z$ (a special case of a result of Dirichlet), and the generator $0 \oplus 1$ is obtained directly from the “minimal” non-trivial solution of Pell’s equation. Moreover, as alluded to above, a solution of (1.4) is also relevant to the treatment of general quadratic equations, such as $X^2 - \Delta Y^2 = c$.

² This is, however, no longer true over an arbitrary number field; in fact, over a suitable quadratic field, affine ellipses and hyperbolas become isomorphic curves.
³ For instance, it appears in Indian mathematics of the seventh century – see [W].

From our point of view, the equation is linked with the “good” rational approximations to the irrational number $\sqrt\Delta$. Indeed, for a solution $(p, q)$ in positive integers, it is easily verified that

$$\left|\sqrt\Delta - \frac{p}{q}\right| \le \frac{1}{2\sqrt\Delta} \cdot \frac{1}{q^2}. \tag{1.5}$$

Even if we ignore the factor $(2\sqrt\Delta)^{-1} < 1$, the right-hand side is dominated by $q^{-2}$. On the other hand, a random choice of the denominator $q$, followed by the optimal choice of $p$, would yield an accuracy comparable to $q^{-1}$ for the approximation to $\sqrt\Delta$. In particular, the error coming from a solution of Pell’s equation is negligible compared with the error that may arise from a fraction with a “generic” denominator of similar magnitude.

The exponent “2” of $q$ on the right-hand side of (1.5) is not peculiar to the numbers $\sqrt\Delta$. Actually, it comes from the double freedom in choosing $p$ and $q$, and in fact every irrational number admits infinitely many rational approximations of such accuracy. This will be an easy consequence of the following well-known lemma, which is as simple as it is useful and elegant.

Theorem 1.7 (Dirichlet’s lemma) Let $\xi \in \mathbb R$ and let $Q$ be a positive integer. Then there exist $p, q \in \mathbb Z$ such that $(p, q) = 1$ and $0 < q \le Q$, $|q\xi - p|$ […]
|{rξ } − {sξ }| = |(s − r)ξ − ([sξ ] − [rξ ])|, and, on putting p = [sξ ]−[rξ ], q = s−r, we obtain the desired conclusion. Remark 1.8 A slightly simpler argument is sometimes presented: it considers Q intervals [n/Q, (n + 1)/Q), and only the second case. This yields the weaker estimate in which the right-hand side is replaced by 1/Q, an almost equally useful result. Corollary 1.9 Let ξ ∈ R \ Q. Then there exist infinitely many p, q ∈ Z, q > 0, such that (p, q) = 1 and |qξ − p| < q−1 .

(1.7)

Proof In fact, it suffices to apply the previous result, choosing successively Q = 1, 2, .... The fractions p/q yielded in turn by the conclusion certainly satisfy the inequality of the corollary, since q ≤ Q and hence $|\xi - (p/q)| < (qQ)^{-1} \le q^{-2}$. Moreover, these rational fractions p/q form an infinite set, since for Q → ∞ their sequence converges to ξ, which is irrational.

Remark 1.10 (i) The above discussion on the integer points on a line shows that the corollary is false for $\xi \in \mathbb{Q}$.
(ii) In the special case $\xi = \sqrt{\Delta}$, the existence of non-trivial solutions of Pell’s equation (1.4) yields, through (1.5), another proof of the corollary, strengthened in fact by a factor $1/(2\sqrt{\Delta})$. Conversely, applying the corollary to $\xi = \sqrt{\Delta}$ easily shows the existence of infinitely many solutions of at least one equation of the type $X^2 - \Delta Y^2 = m$ (where $|m| \le 2\sqrt{\Delta} + 1$). Looking then at pairs of positive solutions $(p, q) \neq (p^*, q^*)$, distinct but congruent modulo m, one finds (see Exercise 1.23 below) non-trivial solutions of Pell’s equation, given by $m^{-1}(pp^* - \Delta qq^*,\ pq^* - p^*q)$.
(iii) It is easily shown (see Exercise 1.15 below) that for almost all real numbers ξ (in the sense of Lebesgue measure) the exponent −1 in Corollary 1.9 is the best possible, i.e. the approximations $|\xi - (p/q)| < q^{-2-\varepsilon}$ are finite in number as soon as we fix ε > 0 (see [C1, Chapter VII]). Intuitively, this result appears natural: for integers q having N (decimal) digits, such an approximation yields roughly (2 + ε)N digits of ξ, whereas in choosing p, q we have only 2N digits at our disposal; the approximation would thus yield a gain of information, which is possible only rarely. (For more precise results, due e.g. to Khintchine, see [C1], [S3].)
(iv) An efficient algorithm to find the optimal rational approximations comes from the expansion of ξ as a continued fraction; we have sketched this procedure in Remark 1.1 above (see also [C1], [O], [S2]). This procedure coincides with Euclid’s algorithm for $\xi \in \mathbb{Q}$ and, for $\xi = \sqrt{\Delta}$, also leads to the solutions of Pell’s equation.

For later reference, we give a multi-dimensional analogue of Dirichlet’s lemma.

Theorem 1.11 Let $\xi_1, \dots, \xi_r$ be real numbers and let Q be a given positive integer; then there exist a positive integer $q \le Q^r$ and integers $p_1, \dots, p_r$ such that $|q\xi_i - p_i| < Q^{-1}$ for $i = 1, \dots, r$.

Note that for r = 1 we almost recover the previous lemma.

Sketch of proof Consider the $Q^r + 1$ points $(\{t\xi_1\}, \dots, \{t\xi_r\})$ in the unit cube, for $0 \le t \le Q^r$. Subdividing the unit cube into $Q^r$ small cubes of side 1/Q yields two points within the same small cube, corresponding to two different integers $0 \le t_1 < t_2 \le Q^r$. Taking their difference and putting $q = t_2 - t_1$, $p_i = [t_2\xi_i] - [t_1\xi_i]$, we obtain the desired inequality.

Exercise 1.12 Let $a_1 < a_2 < \cdots$ be the sequence of integers of the form $2^r 3^s$, arranged in increasing order. Prove that the ratio $a_{n+1}/a_n$ tends to 1 as n → ∞.

Exercise 1.13 Let $\xi \in \mathbb{R}$. Suppose that w > 0 is such that for every integer Q ≥ 1 there exist integers p, q with |p|, |q| ≤ Q and $0 < |q\xi - p| \le Q^{-w}$. Prove that w ≤ 1. (Hint: fix a large Q and find coprime p, q with the said property. Then define X ≥ Q by $|q\xi - p| = X^{-w}$. Choose now t, u with the property for [2X] in place of Q. Finally, eliminate ξ to estimate |pu − qt|.) Actually the argument proves that in Dirichlet’s lemma we cannot replace the term (Q + 1) by c(Q + 1) for any c > 2.

Exercise 1.14 Prove that there exists $\xi \in \mathbb{R}$ such that, for every real number w, there are infinitely many pairs (p, q) of positive integers with $0 < |q\xi - p| < q^{-w}$. (Compare this case with the previous exercise. Hint: define ξ by a series of rational numbers, with suitably rapid convergence.)
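The pigeonhole argument in the sketch of proof of Theorem 1.11 translates directly into a search. The following Python sketch is our own illustration, not taken from the text: the function name `simultaneous_dirichlet` is hypothetical, and floating-point fractional parts stand in for exact ones, so it is only meant for moderate values of t.

```python
import math


def simultaneous_dirichlet(xis, Q):
    """Pigeonhole search behind Theorem 1.11 (illustrative sketch).

    Returns (q, ps) with 0 < q <= Q**r and |q*xi_i - p_i| < 1/Q for all i,
    where r = len(xis). Fractional parts are computed in floating point.
    """
    r = len(xis)
    seen = {}  # small cube index -> first t whose point landed in that cube
    for t in range(Q ** r + 1):  # Q**r + 1 points ...
        point = [t * xi - math.floor(t * xi) for xi in xis]
        # ... distributed among Q**r cubes of side 1/Q (index clamped against rounding).
        cube = tuple(min(int(x * Q), Q - 1) for x in point)
        if cube in seen:
            t1, t2 = seen[cube], t
            q = t2 - t1
            ps = [math.floor(t2 * xi) - math.floor(t1 * xi) for xi in xis]
            return q, ps
        seen[cube] = t
    raise AssertionError("unreachable: the pigeonhole principle forces a collision")


xis = [math.sqrt(2), math.sqrt(3)]
q, ps = simultaneous_dirichlet(xis, 10)
print(q, ps, [abs(q * xi - p) for xi, p in zip(xis, ps)])
```

With r = 1 the same function gives an approximation in the spirit of Dirichlet’s lemma, with the bound 1/Q of Remark 1.8 instead of 1/(Q + 1).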
Exercise 1.15 Prove that the set of real numbers ξ for which there exist a number μ > 1 and infinitely many integers p, q such that $|q\xi - p| < q^{-\mu}$ has Lebesgue measure zero.

Remark 1.16 Approximations in function fields. As we have pointed out, the “exponent” 2 attributed to $q^{-1}$ in the approximations $|\xi - (p/q)| \le q^{-2}$ comes from the double freedom in choosing p, q. One may see this principle even more clearly by looking at a function field version of the Dirichlet lemma

1.1 The Origins


and of this corollary. For this, let ξ(t) be a power series in k[[t]] (where k is a field) and look at “approximations” of ξ by rational functions $p(t)/q(t) \in k(t)$ with respect to the topology of k[[t]]: namely, we want p(t)/q(t) to have a Taylor series at the origin which coincides with ξ(t) up to a “large” order. If $p, q \in k[t]$ are restricted to have degree ≤ n (which is like bounding p, q in the Dirichlet lemma), we have 2n + 2 free coefficients. Imposing the vanishing of the first N coefficients of $q(t)\xi(t) - p(t)$ gives a homogeneous linear system which can be solved non-trivially as soon as 2n + 2 > N. Thus we can achieve that deg p, deg q ≤ n and $\mathrm{ord}_{t=0}(q\xi - p) > 2n$. This shows why the “2” appears. To construct an even closer analogy with the numerical case, let us write $q(t) = t^n q^*(1/t)$, $p(t) = t^n p^*(1/t)$, where $p^*, q^*$ are also polynomials of degree ≤ n (so that $p^*(1/t)$, $q^*(1/t)$ are “large” at t = 0). Then $\mathrm{ord}_{t=0}\big(\xi - p^*(1/t)/q^*(1/t)\big) > n + \deg q^* \ge 2\deg q^*$, while $\mathrm{ord}_{t=0}(q^*(1/t)) = -\deg q^* \ge -n$.

Remark 1.17 Good approximations are rare. For a real ξ and a positive integer y let us put $\mu(y) = \mu(\xi, y) := \min_{x \in \mathbb{Z}} |x - \xi y|$. We have noticed that μ(y) ≤ 1/2, and this cannot be improved if ξ = n + 1/2, with $n \in \mathbb{Z}$, for every odd y. Also, for every $\xi \notin \mathbb{Z}$ it is easy to see that μ(y) ≥ 1/3 for infinitely many y. To go further, fix an irrational ξ and a positive ε < 1/2. One may prove (see Exercise 1.18 or, for example, [C1]) that the density in [1, T] of the set of y such that μ(y) ≤ ε tends to 2ε as T → ∞. All of this shows in particular that the approximations as in the corollary to Dirichlet’s theorem are very rare. Actually, one can prove that the number of corresponding denominators up to T is ≪ log T; see Exercise 1.19.

Exercise 1.18 Let ξ be irrational and let 0 < ε < 1. Prove that, for T → ∞, the number of positive integers q ≤ T such that the fractional part {qξ} ≤ ε is ∼ εT. (Hint: use Dirichlet’s lemma with Q = T to approximate ξ very well with a rational number and argue with residue classes modulo the denominator. This equidistribution principle may also be proved and sharpened by Weyl’s method involving Fourier series; see [C1].)

Exercise 1.19 Prove that the number of positive integers q ≤ T such that {qξ} ≤ 1/q is ≪ log T. (Hint: consider the difference of approximations p/q, p′/q′ with q < q′, and observe that |pq′ − p′q| ≤ 2q′/q. Then fix p/q and vary p′/q′ among a few other approximations.)

Remark 1.20 Irrationality criterion. When ξ = a/b is rational, Corollary 1.9 does not hold. On the contrary, there exists c = c(ξ) > 0 such that every other rational p/q ≠ a/b (b, q > 0) satisfies |ξ − p/q| ≥ c/q. In fact, |ξ − p/q| = |aq − bp|/(bq) ≥ 1/(bq), and we can take c = 1/b.
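The density statement of Exercise 1.18 and the logarithmic count of Exercise 1.19 can be observed numerically. The sketch below is our own illustration, and the helper name `count_small_fractional_parts` is hypothetical. For ξ = √2 it counts the q ≤ T with {qξ} below a given threshold. It uses floating-point fractional parts, which are adequate for T of the order of 10^5, and it proves nothing.

```python
import math


def count_small_fractional_parts(xi, T, threshold):
    """Number of integers 1 <= q <= T with {q*xi} <= threshold(q)."""
    count = 0
    for q in range(1, T + 1):
        frac = q * xi - math.floor(q * xi)  # fractional part {q*xi}
        if frac <= threshold(q):
            count += 1
    return count


xi, T, eps = math.sqrt(2), 100_000, 0.1
dense = count_small_fractional_parts(xi, T, lambda q: eps)   # Exercise 1.18: about eps*T
rare = count_small_fractional_parts(xi, T, lambda q: 1 / q)  # Exercise 1.19: << log T
print(dense / T, rare, math.log(T))
```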


Therefore, to prove that a given number ξ is irrational it suffices to find, for every ε > 0, a rational fraction p/q ≠ ξ such that $|\xi - (p/q)| \le \varepsilon q^{-1}$. This principle, for instance, leads quickly to a proof of the irrationality of e = 2.7182.... Assuming $e \in \mathbb{Q}$, for every p/q ≠ e we would have $q|e - p/q| > c$ for some constant c > 0. Now let n be an integer such that 1/n < c and consider the fraction $\sum_{i=0}^{n} 1/i! = p/q \neq e$, where q = n!. We have

$$c < q\left|e - \frac{p}{q}\right| = n!\sum_{j>0}\frac{1}{(n+j)!} < \frac{n!}{(n+1)!}\sum_{k\ge 0}\frac{1}{(n+1)^k} = \frac{1}{n+1}\cdot\frac{1}{1 - \frac{1}{n+1}} = \frac{1}{n},$$

which is a contradiction.

Exercise 1.21 Prove that √2 is irrational by constructing good rational approximations to it. For this, consider e.g. the equality $a_n - b_n\sqrt{2} := (1 - \sqrt{2})^n$, $a_n, b_n \in \mathbb{Z}$.

Exercise 1.22 Prove that e² is irrational. (Hint: write $e^2 = a/b$, then $be - ae^{-1} = 0$.)
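The key estimate in the irrationality proof for e can be checked with exact rational arithmetic. The sketch below is our own, and the function name `scaled_tail` is hypothetical. It truncates the tail $n!\sum_{j>0} 1/(n+j)!$ after 40 terms and confirms that the truncation lies strictly between 1/(n+1) and 1/n. Since the truncation is smaller than the full tail, the lower bound carries over to the full tail. The upper bound 1/n for the full tail is the geometric-series estimate above.

```python
from fractions import Fraction
from math import factorial


def scaled_tail(n, terms=40):
    """Truncation of q*(e - p/q) = n! * sum_{j>0} 1/(n+j)!, where q = n!."""
    return sum(Fraction(factorial(n), factorial(n + j)) for j in range(1, terms + 1))


for n in range(1, 16):
    t = scaled_tail(n)
    # The j = 1 term alone equals 1/(n+1); the geometric series bounds the whole tail by 1/n.
    assert Fraction(1, n + 1) < t < Fraction(1, n)
    print(n, float(t))
```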

Other Quadratic Equations

In the following exercises we rapidly see how Pell’s equation leads to a general effective analysis of quadratic affine Diophantine equations.

Exercise 1.23 (Existence of solutions to Pell’s equation) Show that the procedure outlined in Remark 1.10(ii) in fact leads to non-trivial solutions of Pell’s equation. Observe further that the underlying trick is motivated by the following fact: if A is an integral domain and a, b, m ∈ A are non-zero and such that a, b | m and a ≡ b (mod m), then a = ub with $u \in A^*$. (Hint: observe that a ≡ b (mod b), so b | a, and conversely.)

Exercise 1.24 (Effectivity for solutions of Pell’s equation) Show that the procedure outlined in Remark 1.10(ii) leads to an upper bound of at most $\Delta^{\Delta}$ for the minimal (non-trivial) solution of Pell’s equation. Better bounds, of the shape $\Delta^{c\sqrt{\Delta}}$, may be obtained via continued fractions or Dirichlet’s class number formula.

Exercise 1.25 (Structure of solutions of Pell’s equation) Let (a, b) be a solution of Pell’s equation $X^2 - \Delta Y^2 = 1$ such that a > b > 0 and a is minimal with these constraints. Prove that if (x, y) is any other solution, then there exists an integer $m \in \mathbb{Z}$ such that $x + y\sqrt{\Delta} = \pm(a + b\sqrt{\Delta})^m$. (Hint: use the map $(x, y) \mapsto x + y\sqrt{\Delta}$ to show that the solutions form a group. Then use the map $(x, y) \mapsto \log|x + y\sqrt{\Delta}|$ to ℝ and prove that the image is cyclic.)
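Remark 1.10(iv) and Exercises 1.24 and 1.25 suggest a concrete procedure. Run through the continued-fraction convergents of √Δ until one satisfies Pell’s equation, then take powers as in Exercise 1.25. The Python sketch below uses the standard textbook recurrence for the partial quotients of √Δ, which the text does not spell out, and the function names are hypothetical. It also relies, without proof here, on the classical fact that the first convergent with p² − Δq² = 1 is the minimal solution.

```python
from math import isqrt


def sqrt_convergents(D):
    """Yield the continued-fraction convergents (p, q) of sqrt(D), D not a perfect square."""
    a0 = isqrt(D)
    m, d, a = 0, 1, a0
    p_prev, p = 1, a0
    q_prev, q = 0, 1
    yield p, q
    while True:
        # Standard recurrence for the partial quotients of a quadratic surd.
        m = d * a - m
        d = (D - m * m) // d
        a = (a0 + m) // d
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        yield p, q


def minimal_pell(D):
    """Minimal positive solution of X^2 - D Y^2 = 1, read off the convergents."""
    for p, q in sqrt_convergents(D):
        if p * p - D * q * q == 1:
            return p, q


def pell_solutions(D, count):
    """(x_n, y_n) with x_n + y_n sqrt(D) = (a + b sqrt(D))^n for n = 1..count."""
    a, b = minimal_pell(D)
    x, y = 1, 0
    result = []
    for _ in range(count):
        x, y = a * x + D * b * y, a * y + b * x  # multiply by a + b sqrt(D)
        result.append((x, y))
    return result


print(minimal_pell(61))
print(pell_solutions(2, 4))
```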


Exercise 1.26 (The equation $X^2 - \Delta Y^2 = m$) Prove that the map defined in the previous hints extends to solutions of this equation, and that, if (x, y) is a solution, then, for some integer n, $(x + y\sqrt{\Delta})(a + b\sqrt{\Delta})^n$ also yields a solution, with coordinates bounded explicitly in terms of a, b, Δ and m. Deduce that the general integer solution can be derived from finitely many solutions by means of exponential formulae.

Exercise 1.27 Let Δ be a positive integer, not a perfect square. Associate with every solution (a, b) of the corresponding Pell equation $X^2 - \Delta Y^2 = 1$ the matrix

$$\begin{pmatrix} a & \Delta b \\ b & a \end{pmatrix}.$$

Let G be the set of such matrices. Prove that every element of G other than ±I has infinite order. Deduce from Lagrange’s theorem on Pell’s equation that G is infinite and, moreover, that for every integer m > 1 the subgroup of matrices g ∈ G such that g ≡ I (mod m) is infinite.

Exercise 1.28 (The general hyperbola) Let Q(X, Y) = 0 be a quadratic equation with integer coefficients, representing an affine hyperbola. By means of linear transformations with integer coefficients we may put it in the form $U^2 - \Delta V^2 = m$, where the integer solutions of the former equation correspond bijectively to integer solutions of the latter, which are restricted, however, by a certain congruence on U, V (with respect to a modulus depending only on Q). Use the result of the previous exercise to give an effective algorithm to solve the equation. Prove that, whenever Δ is not a square, either the equation admits no solution or it admits infinitely many solutions.

Exercise 1.29 Prove that neither of the equations $X^2 - 82Y^2 = \pm 2$ has an integer solution, but that the congruence $X^2 - 82Y^2 \equiv 2 \pmod{M}$ is solvable for every integer M. (Hint: to minimize the number of computations, note the solution $9^2 - 82 \cdot 1^2 = -1$ of the negative Pell equation, and use this solution in place of the minimal (a, b) indicated in Exercise 1.25.)

Exercise 1.30 Let $Q(X, Y) \in \mathbb{Q}[X, Y]$ be an indefinite binary quadratic form. Prove that the corresponding orthogonal group over ℚ is infinite. Use Pell’s equation to show that, if Q has a non-square (positive) discriminant, then the orthogonal group over ℤ is also infinite (compare this case with Exercise 1.27).

Exercise 1.31 Let now $Q(X, Y) \in \mathbb{C}[X, Y]$ be a homogeneous form of degree > 2. Prove that if it has at least three pairwise linearly independent linear factors, then its group of automorphisms (over ℂ) is finite. For instance, this is the case whenever $Q(X, Y) = X^n - \Delta Y^n$, for every complex Δ ≠ 0 and integer n ≥ 3.
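Exercise 1.27 can be explored numerically. Since g has determinant a² − Δb² = 1, its reduction modulo m lies in the finite group of invertible 2 × 2 matrices over ℤ/mℤ. Hence some power g^k is ≡ I (mod m). All powers of g^k then lie in the congruence subgroup, and by the first part of the exercise they are pairwise distinct. The sketch below (function names hypothetical) finds the least such k and checks that g^k again comes from a Pell solution.

```python
def mat_mul(A, B):
    """Product of 2x2 integer matrices written as ((a, b), (c, d))."""
    return (
        (A[0][0] * B[0][0] + A[0][1] * B[1][0], A[0][0] * B[0][1] + A[0][1] * B[1][1]),
        (A[1][0] * B[0][0] + A[1][1] * B[1][0], A[1][0] * B[0][1] + A[1][1] * B[1][1]),
    )


def first_power_congruent_to_identity(a, b, D, m):
    """Least k >= 1 with g^k ≡ I (mod m) for g = [[a, D*b], [b, a]], together with g^k."""
    g = ((a, D * b), (b, a))
    power, k = g, 1
    # Terminates because g is invertible modulo m and that group is finite.
    while not (power[0][0] % m == 1 % m and power[1][1] % m == 1 % m
               and power[0][1] % m == 0 and power[1][0] % m == 0):
        power = mat_mul(power, g)
        k += 1
    return k, power


k, h = first_power_congruent_to_identity(3, 2, 2, 7)  # g comes from 3^2 - 2*2^2 = 1
x, y = h[0][0], h[1][0]
print(k, (x, y), x * x - 2 * y * y)
```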


Remark 1.32 (Integral points and rational points) In this brief treatment (and also in what follows) we consider mainly affine Diophantine equations, i.e. ones corresponding to algebraic varieties in affine spaces. In turn, this is linked to the search for integral points. Of course, very important problems also arise in the search for rational points, which correspond to varieties in projective spaces. The methods for investigating these solutions are usually more difficult and will not be discussed in this book except at a very superficial level. (See [BoG] for a proof of Mordell’s conjecture, a celebrated theorem of Faltings.) For quadratic equations there is a local–global principle (also frequently called the Hasse principle) for such points; namely, solvability may be tested using congruences to all moduli (which actually reduces to testing finitely many moduli). Such a principle does not hold for integral points, as shown by Exercise 1.29.
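Exercise 1.29, and with it the failure of the local–global principle for integral points, can be checked by machine. This rests on one extra observation of ours that is not in the text. Multiplying by powers of the unit 9 + √82 (of norm −1), and changing sign if needed, any solution of X² − 82Y² = ±2 can be moved to one with √2 ≤ x + y√82 < √2(9 + √82). The conjugate then has absolute value at most √2, which forces |y| ≤ 1. So the search over small y below amounts to a complete proof of insolvability. The congruence check, on the other hand, only samples finitely many moduli M. The function names are hypothetical.

```python
from math import isqrt


def is_square(n):
    return n >= 0 and isqrt(n) ** 2 == n


def solutions_with_small_y(D, N, y_max):
    """All (x, y) with x, y >= 0, y <= y_max and x^2 - D*y^2 = N."""
    return [(isqrt(N + D * y * y), y)
            for y in range(y_max + 1) if is_square(N + D * y * y)]


def congruence_solvable(D, N, M):
    """Whether x^2 - D*y^2 ≡ N (mod M) has a solution (brute force over residues)."""
    squares = {x * x % M for x in range(M)}
    return any((N + D * y * y) % M in squares for y in range(M))


print(solutions_with_small_y(82, 2, 50), solutions_with_small_y(82, -2, 50))
print(solutions_with_small_y(82, -1, 5))  # contains the unit solution (9, 1)
print(all(congruence_solvable(82, 2, M) for M in range(1, 301)))
```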

1.2 From Thue to Roth

Note that Remark 1.10(iii) states that the majority of real numbers cannot be approximated by rational fractions much better than predicted by the corollary, and Remark 1.10(i) points out that things are even worse for rational numbers. It is usually extraordinarily difficult to establish whether a number given a priori falls among the “exceptional” ones admitting infinitely many approximations appreciably better than predicted by the corollary; this problem has been approached (not to say solved) in only a few cases. As we shall recall, the algebraic numbers form a set which is “not exceptional” in this sense. In this direction, Liouville, back in 1844, established the following theorem.

Theorem 1.33 (Liouville 1844) Let ξ be an algebraic number of degree d. There exists a number c = c(ξ) > 0 such that for any integers p, q with q > 0, either $|\xi - p/q| \ge c q^{-d}$ or ξ = p/q.

Proof The proof is easy. Let $f(X) = a_0X^d + a_1X^{d-1} + \cdots + a_d$, $a_j \in \mathbb{Z}$, be the minimal polynomial of ξ over ℤ. If p/q ≠ ξ, the number f(p/q) is a non-zero rational whose denominator divides $q^d$; hence $|f(p/q)| \ge q^{-d}$. On the other hand, the mean-value theorem yields $|f(p/q)| = |f(\xi) - f(p/q)| \le c' |\xi - (p/q)|$, where $c' := \sup_{|t-\xi| \le |(p/q)-\xi|} |f'(t)|$. Since we may assume $|\xi - p/q| \le 1$ (otherwise the claim holds with any c ≤ 1), c′ is bounded by $\sup_{|t-\xi| \le 1}|f'(t)|$, which does not depend on p, q; the sought result follows.

Liouville applied the contrapositive statement to construct transcendental numbers (see Exercise 1.50 below). Observe also that the proof yields a computable value for c(ξ) (which will not be the case for the improvements we shall meet). Plainly, the theorem is essentially best possible for the rationals (d = 1) and also, by Corollary 1.9, for the quadratic irrationals (d = 2). For the algebraic numbers of degree d ≥ 3, A. Thue, around 1910, was the first to obtain a (very significant) strengthening. He established the following result.

Theorem 1.34 Given any real number ε > 0, the inequality

$$\left|\xi - \frac{p}{q}\right| > q^{-\left(\frac{d}{2} + 1 + \varepsilon\right)} \qquad (1.8)$$

holds for all but finitely many integers p, q. For d ≥ 3 we have 1 + (d/2) < d, whence the conclusion improves on Liouville’s theorem. (We shall note that any improvement on Liouville’s exponent, small as it may seem, is extremely significant for applications to Diophantine equations.) In what follows we shall briefly sketch Thue’s substantially elementary, but rather subtle, method; in particular, we shall see that it is not effective (contrary to Theorem 1.33), in the sense that it does not allow one to compute the possible exceptional approximations.⁴ For the moment, let us observe how it can be applied to certain Diophantine equations, which, by the way, constituted Thue’s main motivation. For this, let us consider, as in the proof of Theorem 1.33, the minimal polynomial f(X) of ξ, and define the homogeneous form of degree d by $\tilde f(X, Y) := Y^d f(X/Y)$. Consider now Thue’s equation

$$\tilde f(p, q) = m, \qquad (1.9)$$

where m ≠ 0 is an integer and where we look for the solutions in integers p, q. On writing the polynomial in factored form $f(X) = a_0(X - \xi_1)\cdots(X - \xi_d)$, where $\xi_1, \dots, \xi_d$ are the conjugates of ξ, (1.9) takes the form (for q ≠ 0)

$$|a_0|\left|\frac{p}{q} - \xi_1\right| \cdots \left|\frac{p}{q} - \xi_d\right| = \frac{|m|}{|q|^d}.$$

Now, for |q| → ∞ the right-hand side converges to 0, whence p/q approaches precisely one of the (distinct!) numbers $\xi_i$. In particular, for large q all but one of the factors on the left are bounded from below by a positive number c that is independent of p, q. Hence, for the remaining factor (depending on p, q), say the ith one, we have

$$\left|\xi_i - \frac{p}{q}\right| \le \frac{|m|}{|a_0|\,c^{d-1}}\,|q|^{-d}.$$

However, for d ≥ 3 this inequality has at most finitely many solutions in fractions p/q, by Thue’s theorem (with ε = 1/4, say) applied to the algebraic number $\xi_i$; therefore (1.9), too, has at most finitely many integer solutions. So, for instance, while Pell’s equation (1.4) has infinitely many integer solutions, the apparently analogous equation $X^3 - \Delta Y^3 = 1$ has at most finitely many solutions. Observe that Liouville’s simple result cannot yield any such finiteness conclusion, since its exponent d is exactly the one appearing in the last inequality. This theorem of Thue had a strong impact, since until that time Diophantine equations had been treated individually, with ingenious but ad hoc methods, which were not suited for broad generalization. Also, it is worth remarking that even a single particular instance of the result is often highly non-trivial.

Remark 1.35 We saw in Exercise 1.31 that a homogeneous form $\tilde f(X, Y)$ of degree d ≥ 3 like the one appearing in Thue’s theorem has only finitely many automorphisms. Hence there is no simple general way of producing infinitely many solutions to Thue’s equation starting from a single one, unlike what happens for Pell’s equation. Thue’s theorem actually proves that, whenever such a simple reason does not hold, the corresponding Diophantine equation $\tilde f(x, y) = m$ has only finitely many integral solutions. Siegel’s theorem on integral points on general curves (see Chapter 3) provides a natural generalization: whenever a smooth curve admits only finitely many automorphisms, its set of integral points is finite.

⁴ It is, however, possible with this method to bound their number effectively.
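Theorem 1.33 can be made concrete for ξ = ∛2, a choice of ours, with minimal polynomial X³ − 2. Take p nearest to qξ. Then |ξ − p/q| ≤ 1/(2q), and the mean-value bound of the proof gives |f′(t)| ≤ 3(ξ + 1/2)² < 10. So Liouville’s argument yields q³|ξ − p/q| > 1/10. The sketch below (function name hypothetical) computes these quantities with 60-digit decimal arithmetic. The exponent 3 here is exactly the one Liouville cannot improve. This is why Thue’s sharper exponent is needed to conclude that $X^3 - 2Y^3 = m$ has only finitely many solutions.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60
XI = Decimal(2) ** (Decimal(1) / Decimal(3))  # real cube root of 2, to about 60 digits


def liouville_quantity(q):
    """q^3 * |xi - p/q| with p the integer nearest to q*xi (xi = 2^(1/3))."""
    p = int((XI * q).to_integral_value())
    assert p ** 3 != 2 * q ** 3  # f(p/q) != 0, hence |p^3 - 2 q^3| >= 1
    return q ** 3 * abs(XI - Decimal(p) / q)


worst = min(liouville_quantity(q) for q in range(1, 5001))
print(worst)  # Liouville's argument guarantees a value above 1/10
```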

1.2.1 On Thue’s Proof

As we remarked, Thue’s proof was elementary, but extremely ingenious. We give a brief sketch of its main points, starting with the following principle.

Gap principle. The better the rational approximations to a given number, the rarer they are. To quantify this assertion, let p/q, r/s (q, s > 0) be distinct rational approximations to ξ such that

$$\left|\xi - \frac{p}{q}\right| \le q^{-\mu}, \qquad \left|\xi - \frac{r}{s}\right| \le s^{-\nu},$$

for certain μ, ν with μ > 2, ν > 1. Suppose also that $q^\mu \le s^\nu$. Then we have $|(p/q) - (r/s)| \le q^{-\mu} + s^{-\nu} \le 2q^{-\mu}$. But (p/q) − (r/s) is a non-zero rational number whose denominator divides qs; its absolute value is therefore $\ge (qs)^{-1}$. Combining these facts, we deduce that $s \ge q^{\mu-1}/2$, which in fact says that s cannot be too near to q, i.e. there is a gap between these denominators.⁵

Applying the gap principle. Let us see how Thue took advantage of this principle. Starting with a hypothetical excellent approximation a/b to the algebraic number ξ (one so good as to contradict Thue’s inequality (1.8)), we construct (with a method to be described) a whole sequence $\{r_n/s_n\}_{n\in\mathbb{N}}$ of reasonably good approximations, i.e. ones such that

$$\left|\xi - \frac{r_n}{s_n}\right| \le s_n^{-\nu},$$

for a certain ν > 1. Moreover, we require that the sequence $s_n$ is not too sparse, that is, without too large gaps. This property ensures that, given Q > 0, we shall be able to find an n such that $s_n \approx Q$. Let then p/q be another excellent approximation to ξ, such that $|\xi - (p/q)| \le q^{-\mu}$, where μ = 1 + (d/2) + ε; then put $Q = q^{\mu/\nu}$ and find n so that $s_n \approx Q$, so that $s_n^\nu \approx q^\mu$. By the gap principle we find that (if $r_n/s_n \neq p/q$) $s_n \gg q^{\mu-1}$, whence $q^{\mu/\nu} \gg q^{\mu-1}$. But this implies (taking q larger and larger) that μ − 1 ≤ μ/ν. To obtain the sought contradiction it will then suffice to carry out the construction so that ν > μ/(μ − 1) = 1 + 2/(d + 2ε), and this is what Thue could do. We may reformulate this (rough) argument by saying that, since the approximations $r_n/s_n$ are rather good and without large gaps, there is no space for other “excellent” approximations.

⁵ Of course, we are assuming a priori that $s \ge q^{\mu/\nu}$; but this inequality becomes weaker (for large q) than what we have found, if μ − 1 > μ/ν.

Construction of the auxiliary approximations. Let us now illustrate the construction of the sequence $(r_n, s_n)$. The algebraic equation for ξ yields polynomials $R_n(X), S_n(X) \in \mathbb{Z}[X]$, of degree ≤ n, with not too large coefficients (i.e. $O(C^n)$), and such that $R_n(X) - \xi S_n(X)$ has a zero of high order (≈ cn) at X = ξ. The existence of such polynomials can be proved indirectly with a simple principle from linear algebra. From these properties one can deduce that, since a/b is very near to ξ (we have in fact $|(a/b) - \xi| < b^{-\mu}$ by assumption), $|R_n(a/b) - \xi S_n(a/b)|$ is very small; defining then the integers $r_n, s_n$ so that $r_n/s_n = R_n(a/b)/S_n(a/b)$ yields what is needed.⁶ A technical difficulty of a deep conceptual nature arose in ensuring the crucial fact that $r_n/s_n \neq p/q$. Thue overcame this obstacle by differentiating the polynomials $R_n, S_n$ several times; he thus obtained independent polynomials with similar properties, eventually satisfying the required condition. Naturally, to extract a proof it would be necessary to quantify everything; but this is not particularly difficult, once the general strategy has been laid down. (See [Z6] for the complete details of this argument; see also [SilT], Chapter V, for a detailed account of a proof of a similar nature.)

Observe that the argument does not produce the possible exceptions to Thue’s inequality. In fact, the starting approximation a/b, which is crucial for the construction of $r_n, s_n$, is purely hypothetical. What the argument really shows is that another suitable approximation p/q cannot exist. In other words, Thue in substance proved that two excellent approximations would be inconsistent. Therefore the theorem was ineffective, and this also entailed the following conclusion about Diophantine equations: the integer solutions of (1.9), even though they are finite in number, could not be found with this method.⁷

Thue’s method was revised and sharpened from 1921 to 1947 by such authors as C. L. Siegel, A. O. Gelfond, and F. Dyson, who replaced Thue’s exponent 1 + (d/2) with $2\sqrt{d}$ and with $\sqrt{2d}$ (see e.g. [G] or [Mor] for proofs). Observe that, in view of Corollary 1.9, the exponent cannot be replaced with anything < 2. Let us roughly see what these improvements depended on. The pair of polynomials $R_n(X), S_n(X)$ appearing in the above description of Thue’s technique corresponds to a single polynomial in two variables, $R_n(X) - YS_n(X)$, vanishing at (ξ, ξ) together with many derivatives with respect to X.
The later authors used, more generally, polynomials P(X, Y) of arbitrary degrees n, m in X, Y, with “many” partial derivatives $(\partial^a/\partial X^a)(\partial^b/\partial Y^b)P(X, Y)$ vanishing at (ξ, ξ). The final conclusion was drawn by considering the number |P(a/b, p/q)|, provided that this was non-zero. On the one hand, as a non-zero rational number whose denominator divides $b^n q^m$, it must be $\ge 1/(b^n q^m)$. On the other hand, the closer a/b and p/q are to ξ, the smaller |P(a/b, p/q)| will be, since (ξ, ξ) is a zero of high order for P(X, Y). A comparison between such estimates led to the sought conclusions. (See [Z6] for a more complete discussion.)

⁶ As observed in [Bo5], here Thue follows Hermite’s principle that “functional approximations” produce numerical ones upon specializing.
⁷ An effective method for Thue’s equations was found by A. Baker around 1970; see [B]. Later on, E. Bombieri [Bo1] proposed an alternative effective approach, which is more in line with Thue’s techniques.
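The gap principle of Section 1.2.1 can be watched at work on an “exceptional” number of the kind in Exercise 1.14. Our choice, not the text’s, is ξ = Σ_{j≥1} 10^{−j!}. Its partial sums r_k/s_k have s_k = 10^{k!} and satisfy |ξ − r_k/s_k| ≤ s_k^{−3} for k ≥ 3. As a stand-in for ξ we use its partial sum up to j = 7, which is an exact fraction; this substitution is an assumption of the sketch. With exact arithmetic, the code checks the two estimates of the gap principle and its conclusion s ≥ q^{μ−1}/2 for consecutive approximations, with μ = ν = 3. For algebraic ξ, Thue’s argument shows that no such run of excellent approximations can occur.

```python
from fractions import Fraction
from math import factorial


def partial_sum(k):
    """r_k/s_k = sum_{j=1}^{k} 10^(-j!), an exact fraction with denominator 10^(k!)."""
    return sum(Fraction(1, 10 ** factorial(j)) for j in range(1, k + 1))


XI = partial_sum(7)  # stand-in for the infinite sum; the neglected tail is below 10^(-40319)
MU = NU = 3

for k in range(3, 6):
    pq, rs = partial_sum(k), partial_sum(k + 1)
    q, s = pq.denominator, rs.denominator
    # Both fractions are excellent approximations in the sense of the gap principle.
    assert abs(XI - pq) <= Fraction(1, q ** MU) and abs(XI - rs) <= Fraction(1, s ** NU)
    assert q ** MU <= s ** NU
    # Lower bound (denominator divides q*s) versus upper bound 2 q^(-mu).
    assert Fraction(1, q * s) <= abs(pq - rs) <= 2 * Fraction(1, q ** MU)
    # Conclusion of the gap principle: s >= q^(mu-1)/2.
    assert 2 * s >= q ** (MU - 1)
    print(k, len(str(q)) - 1, len(str(s)) - 1)  # the exponents k! and (k+1)!
```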


Siegel and his student Schneider had also suggested (see [S1]) that an extension of the method to polynomials in arbitrarily many variables could yield the best-possible exponent 2. A difficulty which appeared formidable was to guarantee that the analogue of the number P(a/b, p/q) would be non-zero.⁸ This obstacle was finally overcome by K. F. Roth in 1955 [R] (see also [BoG], [C1], [S2], [Tij2]). He thereby proved that for all algebraic numbers Corollary 1.9 gives the best-possible exponent, i.e. for no algebraic ξ can the exponent 2 be replaced therein by a larger number. We explicitly state this result in equivalent homogeneous form in the following theorem.

Theorem 1.36 (Roth 1955) If ξ is algebraic and ε > 0, the integer pairs (p, q) such that $|q(q\xi - p)| < q^{-\varepsilon}$ correspond to at most finitely many ratios p/q.

Exercise 1.37 Deduce from this statement the following finiteness theorem for Diophantine equations. If $f, g \in \mathbb{Q}[X, Y]$, with f homogeneous without multiple factors, and if g ≠ 0 has degree < deg f − 2, the equation f(X, Y) = g(X, Y) has at most finitely many integral solutions. (Hint: follow the above deductions from Thue’s theorem. We shall see in Chapter 3 an even more general result in this direction: Siegel’s theorem on integral points on curves.)

1.2.2 Review of Valuations on Number Fields, Heights

In what follows we shall need to some extent the theory of valuations and heights on number fields. For the reader’s convenience, we briefly recall here, without proofs, a few fundamental definitions and results in this direction. For complete treatments and proofs we refer e.g. to the books [BoG], [L1], [L2], [Z6].

Places, product formula. Let k be a number field of degree d over ℚ. Associated with k is a set $M_k$ of places, i.e. equivalence classes of valuations $|\cdot|_v$. We say that the place v is trivial on x ∈ k if $|x|_v = 1$; for any $x \in k^*$ there exist only finitely many $v \in M_k$ that are not trivial on x (i.e. such that $|x|_v \neq 1$), and with a suitable normalization the so-called product formula holds, i.e.

$$\prod_{v \in M_k} |x|_v = 1 \qquad \text{for all } x \in k^*. \qquad (1.10)$$

This is the analogue of the fact that a non-constant rational function on a (smooth projective) curve has as many zeros as poles: actually there is a completely analogous theory of valuations for function fields of transcendence degree one, in which the product formula reads as above. In the case k = ℚ the places are the usual (Archimedean) one and the p-adic ones, the latter in bijective correspondence with the prime numbers. (If p is a prime and $x \in \mathbb{Q}^*$, we may write uniquely $x = p^m y$, where $m \in \mathbb{Z}$ and p is coprime with both the numerator and the denominator of y. We then put $|x|_p := p^{-m}$.) For any place of ℚ we obtain a corresponding topology and completion; the completion with respect to the usual place is ℝ, while we denote by $\mathbb{Q}_p$ the completion associated with the p-adic place. In the general case, the places are constructed in terms of those of ℚ. The so-called infinite (or Archimedean) places (which form a set denoted $M_{k,\infty}$) correspond to the embeddings of k in ℂ, up to complex conjugation. On the other hand, the finite places induce some p-adic place on ℚ, and correspond one-to-one to the non-zero prime ideals in the ring $O = O_k$ of algebraic integers in k; the associated absolute values are then ultrametric (that is, they satisfy |x + y| ≤ max(|x|, |y|)).

Normalization. We shall normalize the absolute values depending on k, as follows. Suppose that the place v lies above p (i.e. that v restricted to ℚ is the p-adic value, or, equivalently, that the ideal associated with v divides p). Then the corresponding completion $k_v$ is a finite extension of $\mathbb{Q}_p$ of degree $d_v := [k_v : \mathbb{Q}_p]$, and we put $|p|_v = p^{-d_v/d}$; similarly for infinite v.

S-integers, S-units. For a finite set $S \subset M_k$ containing at least the Archimedean places, we define the ring of S-integers in k:

$$O_S = O_{k,S} = \{x \in k : |x|_v \le 1 \text{ for all } v \notin S\}.$$

Note that, when $S = M_{k,\infty}$, this ring coincides with $O_k$. More generally, $O_S$ consists of those elements of k generating a fractional ideal whose denominator contains only primes from S. Using the fact that the group of classes of ideals modulo principal ones is finite, it is not difficult to see that, if k is given and S is large enough to contain representatives for all classes, then $O_S$ is a unique factorization domain. This property is often useful. We also define $O_S^*$ as the group of S-units in k, i.e. the invertible elements in $O_S$. It consists of those elements of k generating a fractional ideal whose numerator and denominator contain only primes from S. A famous result by Dirichlet states that $O_S^*$ is the direct product of a finite group (the group of roots of unity in k) by a group isomorphic to $\mathbb{Z}^{\#S-1}$. (Pell’s equation substantially corresponds to units of real quadratic fields; its non-trivial solvability is a special case of this theorem of Dirichlet.)

⁸ As pointed out above, Thue had already met a similar difficulty.


Heights. For a point $P = (x_0 : x_1 : \cdots : x_n) \in \mathbb{P}^n(k)$ we define the Weil height and the logarithmic height of P by

$$H(P) = \prod_{v \in M_k} \max(|x_0|_v, \dots, |x_n|_v) \qquad \text{and} \qquad h(P) = \log H(P),$$

respectively. For α ∈ k, we define the Weil height by $H(\alpha) := H(1 : \alpha) \ge 1$, and $h(\alpha) = \log H(\alpha)$. The product formula guarantees that the definition is independent of the projective coordinates for P; moreover, the chosen normalization of the absolute values depends on the field k, but ensures that the height is independent of it (provided, of course, that k contains the coordinates in question). For coprime integers p, q (q ≠ 0), we have H(p/q) = max(|p|, |q|). In general, we have properties like the following (exercise): (i) $H(\alpha^m) = H(\alpha)^{|m|}$ for $m \in \mathbb{Q}$, (ii) $H(\alpha\beta) \le H(\alpha)H(\beta)$, (iii) $H(\alpha + \beta) \le 2H(\alpha)H(\beta)$, (iv) $H(\alpha^\sigma) = H(\alpha)$ for $\sigma \in \mathrm{Gal}(\overline{\mathbb{Q}}/\mathbb{Q})$. Another important property (with an easy proof) is Northcott’s theorem: there exist at most finitely many algebraic numbers of bounded height and degree.

Exercise 1.38 Deduce from Northcott’s theorem Kronecker’s theorem: the only algebraic numbers of height 1 are 0 and the roots of unity.

If we work on a projective algebraic variety X/ℚ we can define a height by restriction as soon as we have an embedding of X in some projective space $\mathbb{P}^n$. Now, such embeddings are associated with very ample divisors D on X, and one then speaks of a height with respect to D, denoted $h_D$. Of course, this is not entirely well defined, because it depends on the system of functions which define the embedding. But any two such systems are linearly related, so this height is well defined up to a bounded summand “O(1)”. In general, for any divisor A, one can define a height $h_A$ by expressing $A = D_2 - D_1$ as the difference of very ample divisors $D_1, D_2$ and setting $h_A(x) := h_{D_2}(x) - h_{D_1}(x) + O(1)$. That this definition really works of course requires some proof. Also, it turns out that certain functorial properties hold, linking heights on different varieties related by morphisms. This fundamental approach, as well as most of the above definitions and properties, is originally due to A. Weil.
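For k = ℚ (so d = 1 and no renormalization is needed), the product formula (1.10) and the identity H(p/q) = max(|p|, |q|) can be checked directly. The sketch below computes p-adic absolute values exactly with `fractions.Fraction` and evaluates H(x) = H(1 : x) as the product over all places v of max(1, |x|_v). The helper names are hypothetical, and factoring is done by trial division, so it is only suitable for small numbers.

```python
from fractions import Fraction


def prime_divisors(n):
    """Set of primes dividing the positive integer n (trial division)."""
    primes, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            primes.add(d)
            n //= d
        d += 1
    if n > 1:
        primes.add(n)
    return primes


def p_adic_abs(x, p):
    """|x|_p = p^(-m), where x = p^m * y and p divides neither numerator nor denominator of y."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    m, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        m += 1
    while den % p == 0:
        den //= p
        m -= 1
    return Fraction(1, p ** m) if m >= 0 else Fraction(p ** -m)


def nontrivial_primes(x):
    """The finitely many primes p with |x|_p != 1, for a non-zero rational x."""
    x = Fraction(x)
    return prime_divisors(abs(x.numerator)) | prime_divisors(x.denominator)


def product_over_places(x):
    """|x|_infinity times the product of all |x|_p; equals 1 by the product formula."""
    x = Fraction(x)
    result = abs(x)
    for p in nontrivial_primes(x):
        result *= p_adic_abs(x, p)
    return result


def weil_height(x):
    """H(x) = H(1 : x) = product over all places v of max(1, |x|_v)."""
    x = Fraction(x)
    if x == 0:
        return Fraction(1)
    height = max(Fraction(1), abs(x))
    for p in nontrivial_primes(x):
        height *= max(Fraction(1), p_adic_abs(x, p))
    return height


x = Fraction(-360, 77)
print(product_over_places(x), weil_height(x))
```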


See for instance [BoG], [HiSi], [L2], [Se1] for all of this and for extensive further accounts of the theory of heights.

1.2.3 The Theorems of Mahler, Ridout, Lang

A few years after Roth’s proof, an interesting generalization was obtained by D. Ridout, a student of K. Mahler. Mahler had already considered rational approximations with respect to several places, taken into account simultaneously (see e.g. [S1]). For instance, in approximating an algebraic number ξ, one can consider only the fractions p/q such that q is, say, a power of 2; this corresponds to approximating ξ in the classical absolute value and ∞ in the 2-adic one. In this direction, Ridout established a theorem which we recall only in a special case (later on we shall state much more general conclusions). Fix a finite set $S \subset M_{\mathbb{Q}}$ containing the infinite place and consider the rationals in $O_{\mathbb{Q},S}$, i.e. those whose denominator contains only primes in S. We have the following theorem.

Theorem (Ridout’s theorem) If ξ is algebraic and ε > 0, the set of $p/q \in O_{\mathbb{Q},S}$ (p, q ∈ ℤ) such that $|\xi - (p/q)| \le q^{-1-\varepsilon}$ is finite.

So we see that the above restriction on the denominator q allows a strengthening of Roth’s result. Here is a curious consequence of this fact. Let us consider the decimal expansion $\xi = 0.c_1c_2\cdots$ of an irrational algebraic number ξ ∈ (0, 1). We ask the following question: how long can a sequence of consecutive zero digits be? Namely, if we have $c_{m+1} = c_{m+2} = \cdots = c_{m+l} = 0$, how large can l = l(m) be with respect to m? Probabilistic arguments suggest that l(m) should be unbounded; on the other hand, Ridout’s result implies that l(m)/m → 0. In fact, considering the truncated expansion $0.c_1\cdots c_m = N_m/10^m$, where $N_m$ is an integer, we have $|\xi - N_m/10^m| \le 10^{-m-l(m)+1}$. But the rational number $N_m/10^m$ lies in $O_S$, where S = {∞, 2, 5}; we thus obtain that, for any fixed ε > 0, we have $|\xi - N_m/10^m| \ge 10^{-m(1+\varepsilon)}$, apart from finitely many exceptions (depending on ε). It follows that l(m) ≤ εm + 1 for large enough m, whence the assertion.
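The quantities in this consequence of Ridout’s theorem can be computed for ξ = √2 − 1, our choice, whose digits are those of √2 after the decimal point. The sketch below (function names hypothetical) obtains the digits exactly via `math.isqrt` and then computes the run length l(m) for each m. Such a computation of course proves nothing about the limit l(m)/m → 0; it only shows what is being measured.

```python
from math import isqrt


def sqrt2_fractional_digits(n):
    """First n digits of sqrt(2) after the decimal point (exact: floor(sqrt(2) * 10^n))."""
    return str(isqrt(2 * 10 ** (2 * n)))[1:]  # drop the leading integer digit '1'


def zero_run(digits, m):
    """l(m): number of consecutive zero digits c_{m+1}, c_{m+2}, ... (string index m)."""
    length = 0
    while m + length < len(digits) and digits[m + length] == "0":
        length += 1
    return length


digits = sqrt2_fractional_digits(2000)
runs = [zero_run(digits, m) for m in range(1, 1901)]
print(max(runs), max(l / m for m, l in zip(range(1, 1901), runs)))
```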
Mahler’s and Ridout’s results are special cases of a general formulation of Roth’s theorem, due to S. Lang (see [L2], Chapter 7), which we are going to state. (Such a result is in turn a special case of the subspace theorem, in the form of Theorem 2.2, which we shall meet in the next chapter.) Let us first introduce a little further notation.

1.2 From Thue to Roth


Let k be a number field and let $S \subset M_k$ be a finite set of places, containing all the Archimedean ones. For each place v ∈ S, let us normalize the corresponding absolute value $|\cdot|_v$ as in Section 1.2.2, and let us choose arbitrarily an extension of it (which always exists) to the algebraic closure $\overline{\mathbb{Q}}$. (Observe that, therefore, if α is algebraic but does not lie in k, the absolute value $|\alpha|_v$ need not coincide with the normalization with respect to k(α).) With these conventions, we have the following theorem.

Theorem 1.39 (Generalized Roth’s theorem – Lang 1962) For v ∈ S, let $\alpha_v$ be algebraic over k and let ε > 0. Then there exist at most finitely many numbers β ∈ k such that

$$\prod_{v\in S}\min(1, |\alpha_v - \beta|_v) \le H(\beta)^{-2-\varepsilon}.$$
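As a sanity check on the shape of Theorem 1.39, the following sketch (our own example, under the stated assumptions) evaluates the product on the left for $k = \mathbb{Q}$, $S = \{\infty, 2, 5\}$, $\alpha_\infty = \sqrt{2} - 1$ and $\alpha_2 = \alpha_5 = \infty$, using the convention $|\infty - \beta|_v = |1/\beta|_v$ of Remark 1.40, and taking for β the decimal truncations $N_m/10^m$ of Section 1.2.3. It returns the exponent $\log\big(\prod_v \min(1, |\alpha_v - \beta|_v)\big)/\log H(\beta)$. For these β one checks by hand that it is always below −2 (since $|\xi - N_m/10^m| < 10^{-m}$ and the reduced denominator is at most $10^m$), while the theorem forbids it from dropping below $-2-\varepsilon$ infinitely often. The true ξ is replaced by a much longer truncation, an approximation we assume harmless at this precision.

```python
from fractions import Fraction
from math import isqrt, log

def abs_p(x: Fraction, p: int) -> Fraction:
    """Normalized p-adic absolute value on Q: |x|_p = p^(-ord_p x), and |0|_p = 0."""
    if x == 0:
        return Fraction(0)
    e, num, den = 0, abs(x.numerator), x.denominator
    while num % p == 0:
        num //= p
        e += 1
    while den % p == 0:
        den //= p
        e -= 1
    return Fraction(1, p ** e) if e >= 0 else Fraction(p ** -e)

def height(beta: Fraction) -> int:
    """H(beta) = max(|a|, |b|) for beta = a/b in lowest terms (the height on Q)."""
    return max(abs(beta.numerator), beta.denominator)

def truncation(m: int) -> Fraction:
    """N_m / 10^m: the decimal expansion of sqrt(2) - 1 cut after m digits."""
    return Fraction(isqrt(2 * 10 ** (2 * m)) - 10 ** m, 10 ** m)

def lang_exponent(m: int) -> float:
    """log(prod_{v in S} min(1, |alpha_v - beta|_v)) / log H(beta) for S = {inf, 2, 5},
    alpha_inf = sqrt(2) - 1, alpha_2 = alpha_5 = infinity, beta = N_m / 10^m."""
    xi = truncation(3 * m + 10)                      # high-precision stand-in for sqrt(2) - 1
    beta = truncation(m)
    prod = min(Fraction(1), abs(xi - beta))          # Archimedean place
    for p in (2, 5):
        prod *= min(Fraction(1), abs_p(1 / beta, p)) # |inf - beta|_p := |1/beta|_p
    # logs of numerator and denominator separately, to avoid float underflow
    return (log(prod.numerator) - log(prod.denominator)) / log(height(beta))

print([round(lang_exponent(m), 3) for m in (20, 30, 40)])
```

Here the finite places contribute exactly $1/\text{denominator}(\beta) = H(\beta)^{-1}$, which is how the restriction on the denominator in Ridout’s theorem is encoded in Lang’s product.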

See [BoG], [L2] for complete proofs.

Remark 1.40 In the theorem one can even assume that $\alpha_v \in \mathbb{P}^1(\overline{\mathbb{Q}})$, using the rule $|\infty - \beta|_v = |1/\beta|_v$. This more general version follows from the stated one after replacing the $\alpha_v$ and β with their transforms by an (invertible) homography (of the form $x \mapsto (ax + b)/(cx + d)$). The result thus considers approximations by an element β ∈ k, simultaneously to the algebraic “targets” $\alpha_v$, with respect to the places v ∈ S; the accuracy is measured by a product, namely by taking a geometric mean.

Exercise 1.41 Recover the previous version of Roth’s theorem from this last one, with the choices $k = \mathbb{Q}$ and S consisting of the single infinite place. Also, obtain the above-stated theorem by Ridout. (It is convenient – as in Remark 1.40 – to look at the approximations to 1/ξ with respect to the infinite place and to 0 with respect to the remaining places in S.)

To illustrate the strength of this result, we shall derive from it a general version of a theorem of Mahler on Thue’s equation (1.8). We shall prove the following.

Theorem 1.42 (Mahler) Let $\tilde f \in k[X, Y]$ be homogeneous, of degree d ≥ 3, without multiple factors, let $m \in k^*$, and let $S \subset M_k$ be a finite set. Then there exist at most finitely many pairs $(a, b) \in \mathcal{O}_S^2$ such that $\tilde f(a, b) = m$.

Proof Observe that we may enlarge k and factor $\tilde f$ in the form $\tilde f(X, Y) = \prod_{i=1}^d (r_iX - s_iY)$, where the points $(r_i : s_i) \in \mathbb{P}^1(k)$ are pairwise distinct and where, by a linear change of variables, we may suppose that $r_is_i \ne 0$. We can also enlarge S and suppose that it contains all the infinite places. In what follows $c_1, c_2, \ldots$ will denote positive numbers depending only on $\tilde f$, k, S, and m (not on a, b).


Diophantine Approximation and Diophantine Equations

For a pair (a, b) as in the statement, we may suppose that b ≠ 0. Now let v ∈ S and observe that the equation gives
$$\prod_{i=1}^d \left|\frac{s_i}{r_i} - \frac{a}{b}\right|_v = |m\pi|_v\,|b|_v^{-d}, \qquad \text{where } \pi := \prod_{i=1}^d r_i^{-1}.$$
Suppose, to fix our ideas, that $|b|_v$ is large; then the product on the left is small. On the other hand, at most one factor may be small, since the $r_i/s_i$ are distinct by assumption. From this consideration it easily follows (Exercise) that, if v ∈ S is such that $|b|_v > 1$, we have
$$\min_{1\le i\le d}\min\left(1, \left|\frac{s_i}{r_i} - \frac{a}{b}\right|_v\right) \le c_1|b|_v^{-d}.$$
In any case, we plainly have $\min(1, |s_i/r_i - a/b|_v) \le 1$, whence
$$\min_{1\le i\le d}\min\left(1, \left|\frac{s_i}{r_i} - \frac{a}{b}\right|_v\right) \le c_2\max(1, |b|_v)^{-d}.$$
Now let $\alpha_v$ be one among the $s_i/r_i$ that attain the minimum in the inequality corresponding to v. Observe that $\alpha_v$ may depend on a, b; however, the number of possible choices for varying v ∈ S is bounded by $c_3$. Hence, for our purposes we may focus on the pairs (a, b) which, for every v ∈ S, correspond to a fixed choice for $\alpha_v$. Putting β = a/b and taking the product over v ∈ S of the above inequalities, we then obtain
$$\prod_{v\in S}\min(1, |\alpha_v - \beta|_v) \le c_2^{\#S}\prod_{v\in S}\max(1, |b|_v)^{-d}.$$

To be able to apply Theorem 1.39 we need only compare the right-hand side with H(β) = H(a : b). The equation $\prod_{i=1}^d |a - (s_i/r_i)b|_v = |m\pi|_v$ immediately gives $|a|_v \le c_4|b|_v + c_5$. In turn, we find that $H(a : b) \le \prod_{v\in S}\max(|a|_v, |b|_v) \le c_6\prod_{v\in S}\max(1, |b|_v)$. Then, the last displayed inequality implies that
$$\prod_{v\in S}\min(1, |\alpha_v - \beta|_v) \le c_7H(\beta)^{-d}.$$

Finally, Northcott’s theorem (see Section 1.2.2) implies that there are only finitely many β ∈ k* with $H(\beta) \le c_7^2$. For the remaining ones we have $c_7 < H(\beta)^{1/2}$, hence, since d ≥ 3,
$$\prod_{v\in S}\min(1, |\alpha_v - \beta|_v) \le H(\beta)^{-d+(1/2)} \le H(\beta)^{-5/2}$$

and Theorem 1.39 (with ε = 1/2) finally applies, concluding the proof.

Corollary 1.43 For $\tilde f$ as in the last theorem, the pairs $a, b \in \mathcal{O}_S$ such that $\tilde f(a, b) \in \mathcal{O}_S^*$ correspond to at most finitely many ratios a/b.

Proof Recall from Section 1.2.2 that $\mathcal{O}_S^*$ is finitely generated (the easiest half of Dirichlet’s above-mentioned theorem), and therefore $\mathcal{O}_S^*/[d]\mathcal{O}_S^*$ is finite. Hence, for a pair $(a, b) \in \mathcal{O}_S^2$ such that $\tilde f(a, b) \in \mathcal{O}_S^*$ we may write $\tilde f(a, b) = m\delta^d$, where $\delta = \delta(a, b) \in \mathcal{O}_S^*$ and where m = m(a, b) lies in a finite set; so, for our purposes we may assume that m is fixed. Then, since $\tilde f$ is homogeneous of degree d, we obtain a solution x = a/δ, y = b/δ to the equation $\tilde f(x, y) = m$. Since $a, b \in \mathcal{O}_S$ and $\delta \in \mathcal{O}_S^*$, we see that $x, y \in \mathcal{O}_S$. Now, Theorem 1.42 implies that x, y assume finitely many values at most, and the same then holds for x/y = a/b, proving the claim.
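Theorem 1.42 is ineffective, so a computer search can only list the solutions in a given box, never certify that there are no others. As an illustration, take (our choice) $k = \mathbb{Q}$, S = {∞}, $\tilde f(X, Y) = X^3 - 2Y^3$ (irreducible, hence without multiple factors) and m = 1. The sketch below searches $|b| \le$ `bound` and recovers the two solutions (1, 0) and (−1, −1), which are known to be the only ones by a classical result of Delone and Nagell, quoted here without proof.

```python
def thue_solutions(m: int, bound: int) -> list[tuple[int, int]]:
    """All integer pairs (a, b) with |b| <= bound and a^3 - 2 b^3 = m (brute-force search)."""
    sols = []
    for b in range(-bound, bound + 1):
        target = m + 2 * b ** 3                  # we need a^3 == target
        guess = round(abs(target) ** (1 / 3))
        if target < 0:
            guess = -guess
        for a in (guess - 1, guess, guess + 1):  # absorb floating-point rounding
            if a ** 3 == target:
                sols.append((a, b))
    return sols

print(thue_solutions(1, 2000))
```

Only the final test `a ** 3 == target` decides membership, in exact integer arithmetic; the floating-point cube root merely proposes candidates.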

1.3 Exercises

Exercise 1.44 Let k be a number field, $S \subset M_k$ be a finite set, $\mathcal{O}_S := \mathcal{O}_{k,S}$. Let us consider the S-unit equation (which was introduced by Siegel in the study of hyperelliptic Diophantine equations):
$$x + y = 1, \qquad x, y \in \mathcal{O}_S^*. \tag{1.11}$$

(i) Prove by a direct method that (1.11) has only finitely many solutions when (a) $k = \mathbb{Q}$, S = {∞, 2, 3} (the situation reduces to the equation $2^a - 3^b = \pm 1$ for a, b ∈ ℕ) and (b) $[k : \mathbb{Q}] = 2$ and S consists of the infinite places. (ii) Deduce from Theorem 1.42 that in any case Equation (1.11) has only finitely many solutions (a theorem due originally to Siegel for $S = M_\infty$). (Hint: use the finiteness of the quotient group $\mathcal{O}_S^*/[h]\mathcal{O}_S^*$, for any positive integer h.) (iii) Prove the same result directly from Theorem 1.39 and Remark 1.40. (Hint: for a solution (x, y), put β = x and $\alpha_v = 0, 1, \infty$ according to whether $|x|_v < 1/2$, $|y|_v < 1/2$, or $|x|_v > 2$, respectively.) (iv) Deduce Mahler’s theorem: if $p_1, \ldots, p_t, q_1, \ldots, q_u, r_1, \ldots, r_v$ are given pairwise distinct prime numbers, the equation $p_1^{a_1}\cdots p_t^{a_t} + q_1^{b_1}\cdots q_u^{b_u} = r_1^{c_1}\cdots r_v^{c_v}$ has only finitely many solutions in integers $a_i, b_j, c_l$. (v) Prove that there are infinitely many solutions of x + y = 1 with x, y units in $\overline{\mathbb{Q}}$, i.e. invertible elements in the ring of algebraic integers, not restricted to a fixed number field. (Hint: find irreducible monic polynomials f ∈ ℤ[X] such that f(0) = f(1) = 1. The result may be seen as an extremely special case of Rumely’s local–global principle, which, roughly speaking, asserts that an algebraic system always has algebraic integer solutions provided that it has integral solutions locally at every place.)

Exercise 1.45 Conversely, deduce Theorem 1.42 from the result at point (i) of the previous exercise. (Hint: in the above notation, we may assume that $r_i, s_i, m \in \mathcal{O}_S^*$. The factorization of $\tilde f(X, Y)$ shows that, for the solutions $(a, b) \in \mathcal{O}_S^2$, the factors $r_ia - s_ib$ lie in $\mathcal{O}_S^*$. Eliminating a, b then leads to (1.11).)
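For Exercise 1.44(i)(a) a finite search is easy to write, though by itself it proves nothing. The sketch below (function names ours) lists the solutions of x + y = 1 with $x = \pm 2^a3^b$, $|a|, |b| \le$ `max_exp`, and y also a {2, 3}-unit. It finds 21 solutions, and enlarging the exponent range produces no new ones; this agrees with the classical fact that the solutions of $2^a - 3^b = \pm 1$ come only from 1 + 1 = 2, 1 + 2 = 3, 1 + 3 = 4 and 1 + 8 = 9, each identity a + b = c giving the solutions (a/c, b/c), (c/a, −b/a), (−a/b, c/b) and their swaps.

```python
from fractions import Fraction
from itertools import product

def is_23_unit(x: Fraction) -> bool:
    """True if x is a nonzero rational whose numerator and denominator involve only 2 and 3."""
    if x == 0:
        return False
    for n in (abs(x.numerator), x.denominator):
        for p in (2, 3):
            while n % p == 0:
                n //= p
        if n != 1:
            return False
    return True

def s_unit_solutions(max_exp: int) -> set[tuple[Fraction, Fraction]]:
    """Pairs (x, y) with x + y = 1, x = +-2^a 3^b (|a|, |b| <= max_exp), y a {2,3}-unit."""
    sols = set()
    exps = range(-max_exp, max_exp + 1)
    for sign, a, b in product((1, -1), exps, exps):
        x = sign * Fraction(2) ** a * Fraction(3) ** b
        y = 1 - x
        if is_23_unit(y):
            sols.add((x, y))
    return sols

sols = s_unit_solutions(4)
print(len(sols), sorted(sols))
```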


Exercise 1.46 Let $a_1, a_2, b_1, b_2 \in k^*$ satisfy $\delta := a_1b_2 - a_2b_1 \ne 0$. Show (following Siegel) that the system
$$a_1x^2 + b_1 = y^2, \qquad a_2x^2 + b_2 = z^2, \qquad x, y, z \in \mathcal{O}_S$$
(it defines the S-integer points on a curve of genus 1) has at most finitely many solutions. (Hint: upon enlarging k and S we may assume that $a_1, a_2$ are squares in k and that $a_1, a_2, b_1, b_2, \delta \in \mathcal{O}_S^*$. From $y^2 - a_1x^2 = b_1$, we then obtain that $u := y + \sqrt{a_1}\,x \in \mathcal{O}_S^*$; similarly, from $z^2 - a_2x^2 = b_2$ and $a_1z^2 - a_2y^2 = \delta$, we get that both $v := z + \sqrt{a_2}\,x$ and $w := \sqrt{a_2}\,y - \sqrt{a_1}\,z$ lie in $\mathcal{O}_S^*$. But $\sqrt{a_2}\,u - \sqrt{a_1}\,v = w$, reducing to (1.11).)

Exercise 1.47 Show by some direct argument, independently of the deep theorems stated above, that, if a polynomial $f(X) \in \mathbb{Z}[X]$ assumes square values for every value X = n ∈ ℤ, then f is a square in $\mathbb{Z}[X]$. (See [PS], Problem 114.) This is a particular case of the so-called Hilbert irreducibility theorem (see Chapter 3).

Exercise 1.48 (This exercise needs a few more facts from algebraic number theory.) The result stated in the previous exercise is extremely weak compared with the following theorem of Siegel (see e.g. [Mor], p. 264): if a polynomial $f(X) \in k[X]$ has at least three simple roots, the equation $Y^2 = f(X)$ has at most a finite number of solutions $(p, q) \in \mathcal{O}_S^2$. Prove this statement as a corollary of the result in Exercise 1.46. (Hint: factoring f yields an equation $q^2 = c\prod_{i=1}^d (p - \xi_i)$, where one can suppose that the $\xi_i$ are distinct and d ≥ 3. Using unique factorization into ideals in $\mathcal{O}_S$, finiteness of the class number, and Dirichlet’s description of $\mathcal{O}_S^*$, one obtains equations $p - \xi_i = r_is_i^2$ for i = 1, 2, 3, where $r_i, s_i \in k$ and the $r_i$ have only finitely many possibilities as p varies. Eliminating p from two pairs of such equations leads to a couple of equations as in Exercise 1.46, which concludes the argument.) An analogous result holds for superelliptic equations $Y^m = f(X)$; state a best-possible conclusion in this direction. (See also Chapter 3.)
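In the same spirit, one can search for integer points on a curve covered by Siegel’s theorem of Exercise 1.48. With the illustrative choice (ours) $f(X) = X^3 - 2$, which has three simple roots, $k = \mathbb{Q}$ and S = {∞}, the search below finds only the points (3, ±5) in the given range; that these are the only integer points is a classical fact, which no finite search can confirm.

```python
from math import isqrt

def integer_points(f, x_range) -> list[tuple[int, int]]:
    """Integer points (x, y) with y^2 = f(x) for x in x_range (brute-force search)."""
    pts = []
    for x in x_range:
        v = f(x)
        if v >= 0:
            r = isqrt(v)
            if r * r == v:
                pts.extend({(x, r), (x, -r)})  # a set, so y = 0 is listed once
    return pts

print(integer_points(lambda x: x ** 3 - 2, range(-50, 10001)))
```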
Exercise 1.49 Prove the following theorem (of Pólya and Siegel): if f ∈ ℤ[X] has at least two distinct roots, then for n ∈ ℕ the greatest prime factor of f(n) tends to infinity as n → ∞. (Hint: if for infinitely many integers n the prime factors of f(n) all lie in a certain finite set, and f(ξ) = 0, then n − ξ is an S-unit, for a suitable number field k and a finite set $S \subset M_k$. Now, use the results in Exercise 1.44 above to conclude. For three distinct roots, even more directly, one can use Corollary 1.43 above.) See also [Se1], p. 105.

Exercise 1.50 Following Liouville, use Theorem 1.33 to prove that the number $\sum_{j=0}^\infty 2^{-j!}$ is transcendental. Also, use Roth’s theorem, Theorem 1.36, to show the same for $\sum_{j=0}^\infty 2^{-3^j}$. Finally, use Ridout’s result (or Theorem 1.39) to prove the transcendence of $\sum_{j=0}^\infty 2^{-2^j}$ and $\sum_{j=0}^\infty 2^{-F_j}$, where $\{F_j\}$ is the Fibonacci sequence.
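The exponents relevant to Exercise 1.50 can be watched numerically. For $\xi = \sum_j 2^{-e(j)}$ with e strictly increasing and the partial sum p/q up to j = n, one has $q = 2^{e(n)}$ and $|\xi - p/q| \approx 2^{-e(n+1)}$, so the approximation exponent is about e(n+1)/e(n). The sketch below computes it exactly with fractions, replacing the infinite tail by its first two terms (an approximation we assume negligible). One sees exponents growing without bound for j! (Liouville suffices), close to 3 for $3^j$ (above Roth’s threshold 2), and close to 2 and to 1.618 for $2^j$ and for the Fibonacci exponents; the last two do not exceed Roth’s threshold but, q being a power of 2, do exceed Ridout’s threshold 1 for S = {∞, 2}. The shift to $F_{j+2}$, making the exponents strictly increasing, is our choice.

```python
from fractions import Fraction
from math import factorial, log2

def approximation_exponent(e, n: int, tail_terms: int = 2) -> float:
    """For xi = sum_j 2^(-e(j)), return log(1/|xi - p/q|) / log(q), p/q = n-th partial sum.

    The infinite tail is replaced by its first `tail_terms` terms (a numerical stand-in)."""
    head = sum(Fraction(1, 2 ** e(j)) for j in range(n + 1))
    tail = sum(Fraction(1, 2 ** e(j)) for j in range(n + 1, n + 1 + tail_terms))
    q = head.denominator
    return (log2(tail.denominator) - log2(tail.numerator)) / log2(q)

def fib_shifted(j: int) -> int:
    """F_{j+2} = 1, 2, 3, 5, 8, ...: Fibonacci numbers shifted so that they strictly increase."""
    a, b = 1, 2
    for _ in range(j):
        a, b = b, a + b
    return a

for name, e in [("j!", factorial), ("3^j", lambda j: 3 ** j),
                ("2^j", lambda j: 2 ** j), ("F_{j+2}", fib_shifted)]:
    print(name, [round(approximation_exponent(e, n), 3) for n in range(2, 7)])
```

Logarithms are taken of the exact numerator and denominator (Python’s `math.log2` accepts arbitrarily large integers), so no floating-point underflow occurs even for tails like $2^{-5040}$.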

Exercise 1.51 Let a ∈ ℤ; prove that $5^n + 2^n + a$ can be a square only for finitely many n ∈ ℕ. (Hint: use Theorem 1.39 to bound from below the distance of $\sqrt{5}^{\,n}$ to an integer. That $\sqrt{5} > 2$ is crucial with this approach; see [DeZ]. Also, see [CZ1] and Chapter 4 of this book for much more general conclusions.)
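A brute-force look at Exercise 1.51 (the function name and the sample value a = 2 are ours): the search lists the exponents n ≤ `n_max` for which $5^n + 2^n + a$ is a perfect square. For a = 2 the output includes n = 0 and n = 1 (4 = 2² and 9 = 3²); the exercise asserts that, for each a, only finitely many such n exist, which of course no search can establish.

```python
from math import isqrt

def square_exponents(a: int, n_max: int) -> list[int]:
    """Exponents 0 <= n <= n_max for which 5^n + 2^n + a is a perfect square."""
    hits = []
    for n in range(n_max + 1):
        v = 5 ** n + 2 ** n + a
        if v >= 0 and isqrt(v) ** 2 == v:
            hits.append(n)
    return hits

print(square_exponents(2, 200))
```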

1.4 Notes

As has already been remarked, the above use of Thue’s or Roth’s theorems renders the corresponding proofs ineffective; namely, the relevant solutions are shown to be finite in number, but no algorithm is provided to find them. An alternative method to treat Thue’s equation (in almost complete generality) was proposed by Skolem (see [BS], Chapter IV); it was based on the theory of p-adic analytic functions, but was also ineffective. (See Section 4.2 below for an example.) Around 1970, A. Baker obtained certain explicit lower bounds for non-zero quantities of the form $|\alpha_0 + \alpha_1\log\beta_1 + \cdots + \alpha_n\log\beta_n|$, for algebraic numbers $\alpha_i, \beta_j$ (see e.g. [B]). Now, due to Dirichlet’s result about the structure of $\mathcal{O}_S^*$, many classical Diophantine equations could be translated into exponential or logarithmic Diophantine (in)equalities, to which Baker’s estimates could be applied. This yielded some effective proofs, in particular of the theorem stated in Section 1.2.3; namely, the solutions of (1.11) may be computed for any given k, S. Consequently, the same holds for the corollaries illustrated in the above exercises. A new effective approach to these questions, closer to Thue’s original one, was found later by Bombieri (see [Bo1], [Bo5], and also [BoC] and [BoG]). More recently, yet another completely different approach has appeared in works by Murty and Pasten and by von Känel. These approaches work only over the rationals and make use of deep results in the circle of ideas introduced by Frey in the context of Fermat’s last theorem (which was finally proved by Wiles). Baker’s effective proof of Thue’s result on Diophantine equations yields an effective improvement of Liouville’s result for the Diophantine approximation to algebraic numbers of degree ≥ 3 (see [B]); it is rather striking that this implication goes in the direction opposite to Thue’s.
We have already recalled Liouville’s application to transcendental numbers; Exercise 1.50 above provides other instances, with Roth’s and Ridout’s theorems. Baker’s results constitute a further deep example of the interplay among Diophantine approximation, Diophantine equations, and transcendence theory. For a detailed discussion of the development of these ideas in the last half-century, see [Wa1].

Insofar as Equation (1.11) is concerned, we shall soon present a generalization of it to several variables. However, this generalization is not known to be capable of effective treatment. In contrast, explicit estimates for the number of solutions of the Diophantine equations and inequalities discussed above have been obtained by several authors. In particular, this holds for the exceptions in Roth’s theorem (see [S3], Chapter II); for Thue’s equation essentially best-possible estimates are due to Bombieri and Schmidt (see [S3], Chapter III); and for Equation (1.11) a rather uniform bound was found by J.-H. Evertse (see [S3], Chapter IV). In 1996 F. Beukers and H.-P. Schlickewei [BeS] proved that the number of solutions of x + y = 1, for (x, y) in a multiplicative group $G \subset (\mathbb{C}^*)^2$ of finite rank r, is bounded by $256^{r+1}$; remarkably, this upper bound depends only on the rank! See [BoG] and [Z6] for presentations of this proof. For this, and for an ample discussion of Equation (1.11), see also [Bo2], [EG].

All of these topics have been developed also by replacing the number field k with a function field of some algebraic variety (over some “constant field”). The simplest instances are provided by the affine line $\mathbb{A}^1$ (over a field k), whose affine ring k[t] and function field k(t) share many properties with ℤ and ℚ, respectively. In general, one has product formulae and there is a theory of heights (see [BoG], [L2], [Se1]). One may also define the analogues of $\mathcal{O}_S$, $\mathcal{O}_S^*$ and consider Diophantine equations and approximations. Actually, it happens that certain rather relevant geometric questions may be interpreted as Diophantine questions over function fields.
This setting has often been a source of inspiration for formulating conjectures over number fields and for testing them. In fact, for function fields the corresponding conclusions have often proved much easier to derive, due mainly to the existence of non-trivial derivations. (Alternatively, sometimes one may reduce a situation to the number-field case by specialization.) For instance, a fairly simple (but ingenious) proof of an effective form of Roth’s theorem is known for function fields [Wan]. See also [Mas] for some aspects of the theory and Exercise 2.11 of the next chapter for a few examples.

2 Schmidt’s Subspace Theorem and S-Unit Equations

2.1 From Roth to Schmidt

As remarked in the previous chapter, Roth’s theorem gave a final answer to the problem of the best-possible exponent for the rational approximations to a given algebraic number, while its generalizations, such as Theorem 1.39, extended this to the approximations by numbers in a given number field k. However, certain related natural questions remained open. One of them, raised by Siegel and studied later by Wirsing and others, concerned the approximations by algebraic numbers of given degree (but not necessarily in the same number field). Another question concerned simultaneous approximations to several algebraic numbers, by means of rationals with the same denominator. After having obtained some partial results earlier, in 1970 W. M. Schmidt answered all of these questions, showing the following: if $\alpha_1, \ldots, \alpha_n$ are algebraic numbers such that $1, \alpha_1, \ldots, \alpha_n$ are linearly independent over $\mathbb{Q}$, then, for every ε > 0, there exist at most finitely many integers $q, p_1, \ldots, p_n$ such that $q^{1+\varepsilon}\prod_{i=1}^n |q\alpha_i - p_i| < 1$. This conclusion immediately implies the finiteness of the integers $q, p_1, \ldots, p_n$ such that $|\alpha_i - p_i/q| < q^{-1-(1/n)-\varepsilon}$ for i = 1, …, n, which is an essentially best-possible extension of Roth’s theorem (i.e., the case n = 1). To see that the exponent is the best-possible one, use Theorem 1.11, in the same way as Dirichlet’s theorem is used to prove the optimality of Roth’s theorem. Moreover, by means of certain transference theorems from the geometry of numbers, Schmidt was also able to give a final answer to the “exponent” problem for the approximations of bounded degree.

Schmidt’s arguments followed only in part those of Roth; in fact, it proved necessary to introduce several substantial new ideas. However, as remarked above, we have in mind here the applications of these results to Diophantine equations, and thus we will not pause on Schmidt’s proof. Rather, we shall describe more general formulations of the theorem, which will prove quite convenient for applications. One of them, which gave the name to the whole series of results of this type, was obtained by Schmidt himself in 1972; here it is.

Theorem 2.1 (Subspace theorem I (Schmidt, 1972)) Let $L_1, \ldots, L_n$ be linearly independent linear forms in $X_1, \ldots, X_n$ with algebraic coefficients. For given ε > 0, there exist proper linear subspaces $T_1, \ldots, T_m \subset \mathbb{Q}^n$ whose union contains the set of $x = (x_1, \ldots, x_n) \in \mathbb{Z}^n$ such that
$$|L_1(x)\cdots L_n(x)| \le \max_i(|x_i|)^{-\varepsilon}. \tag{2.1}$$
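The optimality of the exponent 1 + 1/n in Schmidt’s simultaneous approximation result rests on a pigeonhole argument of Dirichlet: for every integer Q ≥ 1 there is an integer q with $1 \le q \le Q^n$ and $\max_i \|q\alpha_i\| < 1/Q$, where ‖·‖ denotes the distance to the nearest integer, whence $|\alpha_i - p_i/q| < q^{-1-1/n}$ for suitable $p_i$. The sketch below (our illustration, with $\alpha = (\sqrt{2}, \sqrt{3})$) finds the smallest such q by brute force; double-precision arithmetic is adequate for these small q.

```python
from math import sqrt
from typing import Optional

def dist_to_int(t: float) -> float:
    """||t||: distance from t to the nearest integer."""
    return abs(t - round(t))

def dirichlet_q(alphas: list[float], Q: int) -> Optional[int]:
    """Smallest q in [1, Q^n] with max_i ||q alpha_i|| < 1/Q (Dirichlet: such q exists)."""
    n = len(alphas)
    for q in range(1, Q ** n + 1):
        if max(dist_to_int(q * a) for a in alphas) < 1 / Q:
            return q
    return None

alphas = [sqrt(2), sqrt(3)]  # 1, sqrt(2), sqrt(3) are linearly independent over Q
for Q in (5, 10, 30):
    q = dirichlet_q(alphas, Q)
    quality = q ** (1 / len(alphas)) * max(dist_to_int(q * a) for a in alphas)
    print(Q, q, round(quality, 3))  # quality < 1 means |alpha_i - p_i/q| < q^(-1-1/n)
```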

In practice, the theorem states that the integral vectors in $\mathbb{Z}^n$, with the possible exception of those lying on a certain finite union of subspaces, cannot lie “too near” to the subspaces of $\overline{\mathbb{Q}}^n$ defined by the forms $L_i$; this closeness is measured, roughly speaking, by taking the product of the distances, i.e., in geometric mean. A complete proof, together with the deduction of several corollaries, can be found in [S2]. See also [BoG] for complete proofs of the following, more general, versions, or [S3], Chapter V, and in addition [E1] and [B] for shorter proofs of the previously stated result by Schmidt.

Observe that on multiplying the right-hand side of (2.1) by a fixed number c > 0 we obtain an equivalent statement. In fact, there are only finitely many vectors $x \in \mathbb{Z}^n$ such that $\max(|x_i|) \le c^{2/\varepsilon}$; on the other hand, for the remaining vectors we have $c\max(|x_i|)^{-\varepsilon} \le \max(|x_i|)^{-\varepsilon/2}$, and we reduce the situation to the inequality (2.1) after halving ε. This remark also holds for Roth’s theorem and for the next statements.

Roth’s theorem, Theorem 1.36, immediately follows from Theorem 2.1 on putting $L_1(X_1, X_2) = X_1 - \xi X_2$, $L_2(X_1, X_2) = X_2$ for algebraic ξ. In fact, if p, q are integers with $|q|\,|q\xi - p| < |q|^{-\varepsilon}$, we have $|L_1(p, q)L_2(p, q)| < |q|^{-\varepsilon}$. Moreover, since p/q approaches ξ, it is clear that $\max(|p|, |q|) \le c|q|$ for a suitable constant c = c(ξ). Therefore $|L_1(p, q)L_2(p, q)| < c^{\varepsilon}\max(|p|, |q|)^{-\varepsilon}$ and Theorem 2.1 applies (as in the previous remark); we conclude that the pairs (p, q) all lie in a finite union of lines through the origin, whence the corresponding rationals p/q are finite in number, as wanted. In a similar way, we recover the previously stated results of Schmidt.

In 1977 H. P. Schlickewei obtained a generalization where the vector solutions x of the relevant inequality had coordinates in an arbitrary number field, and where several absolute values appeared, as in the generalized Roth’s theorem, Theorem 1.39. Among various possible formulations in this direction, all substantially equivalent, we start with the following one, which appears as Theorem 1.6 in [E1]. (One may easily check (Exercise) that it contains Theorem 1.39.) In what follows, let k be a number field and let $S \subset M_k$ be a finite set of places, containing the Archimedean ones; we suppose that the corresponding valuations are normalized with respect to k, as in Section 1.2.2, and that they are extended in some way to $\overline{\mathbb{Q}}$ (which is always possible). For a vector $x = (x_1, \ldots, x_n) \in k^n$, put $|x|_v := \max_{1\le i\le n}|x_i|_v$. We then have the following.

Theorem 2.2 (Subspace theorem II (Schlickewei)) For v ∈ S let $L_{iv}$, i = 1, …, n, be linearly independent linear forms in n variables, with coefficients in $\overline{\mathbb{Q}}$, and let ε > 0. Then the solutions $x \in k^n$ to the inequality
$$\prod_{v\in S}\prod_{i=1}^n \frac{|L_{iv}(x)|_v}{|x|_v} \le H(x)^{-n-\varepsilon} \tag{2.2}$$

all lie in a certain finite union of proper linear subspaces of $k^n$.

We pause to make a few comments.

(i) This statement too may be interpreted by saying that the points in $k^n$ cannot be “in geometric mean” too close to the spaces defined by the forms $L_{iv}$. Here the mean is considered simultaneously with respect to all the places in S.

(ii) Recall that, for $x \in k^n \setminus \{0\}$, the (projective) height H(x) depends only on the point defined by x in $\mathbb{P}^{n-1}$; the same holds for each factor on the left of (2.2). Therefore the vectors x in question may also be thought of as projective points.

(iii) A seemingly less general version is Theorem 1.D in [S3], where the linear forms are supposed to have coefficients in k. Actually, by a suitable application of that result to a Galois extension of k containing the coefficients of all the $L_{iv}$, it is not difficult to derive Theorem 2.2 in general.

(iv) Note that in this statement the coordinates of x are not necessarily S-integers. If they are, however (which is the case in many applications), we recover an “affine” formulation, nearer to Theorem 2.1. In fact, suppose that $x \in \mathcal{O}_{k,S}^n$, so $|x|_v \le 1$ for $v \notin S$. It follows that $H(x) = \prod_{v\in M_k}|x|_v \le \prod_{v\in S}|x|_v$. In particular, $\prod_{v\in S}\prod_{i=1}^n |x|_v \ge H(x)^n$. Using this fact on the right-hand side of (2.2) we obtain at once the following.

Theorem 2.3 (Subspace theorem III) For v ∈ S let $L_{iv}$, i = 1, …, n, be linearly independent linear forms in n variables, with coefficients in $\overline{\mathbb{Q}}$, and let ε > 0. Then the solutions $x \in \mathcal{O}_{k,S}^n$ to the inequality
$$\prod_{v\in S}\prod_{i=1}^n |L_{iv}(x)|_v \le H(x)^{-\varepsilon} \tag{2.3}$$
all lie in a certain finite union of proper linear subspaces of $k^n$.

In fact, the above argument proves that (2.3) implies (2.2) for $x \in \mathcal{O}_{k,S}^n$. (Naturally, in this affine version the vector x can no longer be interpreted as a projective point.) A generalization where the number of linear forms depends on v is due to P. Vojta; one can rapidly deduce it from Theorem 2.3, as in [E1], Theorem 1.8.
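For $k = \mathbb{Q}$ the quantities appearing in comment (iv) can be computed exactly. The sketch below (helper names ours) evaluates the projective height $H(x) = \prod_v \max_i |x_i|_v$ over all places of $\mathbb{Q}$, using that $\max_i |x_i|_v = 1$ at every prime not dividing some numerator or denominator of a coordinate, together with the partial product $\prod_{v\in S}|x|_v$. On examples one sees that H is invariant under scaling, equals $\max_i |a_i|$ for a primitive integer vector a, and satisfies $\prod_{v\in S}|x|_v \ge H(x)$ for S-integral x, as used above. Trial division limits it to small inputs.

```python
from fractions import Fraction
from math import prod

def prime_divisors(n: int) -> set[int]:
    """Primes dividing the positive integer n (trial division; meant for small inputs)."""
    ps, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            ps.add(d)
            n //= d
        d += 1
    if n > 1:
        ps.add(n)
    return ps

def abs_v(x: Fraction, v) -> Fraction:
    """|x|_v for v = "inf" (usual absolute value) or v = p prime (normalized: |p|_p = 1/p)."""
    if v == "inf" or x == 0:
        return abs(x)
    e, num, den = 0, abs(x.numerator), x.denominator
    while num % v == 0:
        num //= v
        e += 1
    while den % v == 0:
        den //= v
        e -= 1
    return Fraction(1, v ** e) if e >= 0 else Fraction(v ** -e)

def places(xs: list[Fraction]) -> list:
    """inf and every prime occurring in some coordinate; elsewhere max_i |x_i|_v = 1."""
    ps: set[int] = set()
    for x in xs:
        if x != 0:
            ps |= prime_divisors(abs(x.numerator)) | prime_divisors(x.denominator)
    return ["inf"] + sorted(ps)

def height(xs) -> Fraction:
    """Projective height H(x) = product over all places v of max_i |x_i|_v."""
    xs = [Fraction(x) for x in xs]
    return prod(max(abs_v(x, v) for x in xs) for v in places(xs))

def s_product(xs, S) -> Fraction:
    """prod_{v in S} |x|_v, which bounds H(x) from above when x is S-integral."""
    xs = [Fraction(x) for x in xs]
    return prod(max(abs_v(x, v) for x in xs) for v in S)

print(height([2, 4, 6]), height([Fraction(1, 2), Fraction(1, 3)]))
print(s_product([5, 10], ("inf", 2, 3)), height([5, 10]))
```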

In the following sections we shall illustrate a few applications, which nowadays are regarded as being classical, of these results.

2.2 The S-Unit Equation

Let k be a number field, let $S \subset M_k$ be a finite set containing all of the infinite places, and let $\mathcal{O}_S^* = \mathcal{O}_{k,S}^*$ be the group of S-units in k (see Section 1.2.2). In Section 1.2.3, we considered the equation x + y = 1, Equation (1.11), to be solved in $(\mathcal{O}_S^*)^2$. We deduced (in practice from Roth’s theorem, in the form of Theorem 1.39) that it has at most a finite number of solutions. This result was shown in turn to admit several applications to Diophantine equations. Here we shall consider the more general equation in S-units
$$x_1 + \cdots + x_n = 1, \qquad x_1, \ldots, x_n \in \mathcal{O}_S^*. \tag{2.4}$$

Observe that, for n ≥ 3, (2.4) may well admit an infinity of solutions; for instance, if n = 3, we may set $x_1 = 1$, $x_2 = t = -x_3$, where $t \in \mathcal{O}_S^*$ is arbitrary. However, here $x_2 + x_3 = 0$, so these solutions are in a sense special. We shall see that this is the essential phenomenon explaining the existence of infinitely many solutions. In general, we shall say that a solution of (2.4) is non-degenerate if no subsum of the left-hand side vanishes, namely $\sum_{i\in I} x_i \ne 0$ for every non-empty $I \subset \{1, \ldots, n\}$. With this definition we have the following result (see also [BoG], [E1], [S3], [Vo1], [Z6]).

Theorem 2.4 (Evertse, van der Poorten, Schlickewei) Equation (2.4) has at most a finite number of non-degenerate solutions.
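Non-degeneracy is a finite check. The helper below (our own; it is exponential in n, which is harmless for small n) lists the vanishing subsums of a candidate solution. For instance, members of the infinite family (1, t, −t) mentioned above are flagged as degenerate, while (1/2, 1/3, 1/6), a solution of (2.4) for $k = \mathbb{Q}$ and S = {∞, 2, 3}, is not.

```python
from fractions import Fraction
from itertools import combinations

def vanishing_subsums(xs) -> list[tuple[int, ...]]:
    """All non-empty index sets I (as sorted tuples) with sum_{i in I} x_i = 0."""
    n = len(xs)
    return [I for r in range(1, n + 1) for I in combinations(range(n), r)
            if sum(xs[i] for i in I) == 0]

def is_non_degenerate(xs) -> bool:
    """True if no subsum of x_1 + ... + x_n vanishes (the definition preceding Theorem 2.4)."""
    return not vanishing_subsums(xs)

t = Fraction(2) ** 10
print(vanishing_subsums([1, t, -t]))
print(is_non_degenerate([Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]))
```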

Before we get to the proof (which will use Theorem 2.3), we pause to make a few remarks.


Remark 2.5 (i) For given $a_1, \ldots, a_n \in k^*$, the more general equation $a_1x_1 + \cdots + a_nx_n = 1$, $x_i \in \mathcal{O}_S^*$, reduces to (2.4). In fact, it suffices to enlarge S so that $a_i \in \mathcal{O}_S^*$ for i = 1, …, n.

(ii) The homogeneous version of (2.4), i.e. the equation $x_0 + x_1 + \cdots + x_n = 0$, with $x_i \in \mathcal{O}_S^*$, also reduces to (2.4), on dividing by $-x_0$. Naturally the conclusion will now be that there exist at most finitely many solutions which are non-degenerate and non-proportional.

(iii) Let $G \subset \overline{\mathbb{Q}}^*$ be a finitely generated subgroup. Then there certainly exist a number field k and a set S as above such that $G \subset \mathcal{O}_S^*$ (it suffices that $\mathcal{O}_S^*$ contains a finite set of generators for G). Therefore, the equation $x_1 + \cdots + x_n = 1$, $x_i \in G$, has only a finite number of non-degenerate solutions. Using specialization arguments (but other methods are possible as well) it may be deduced from Theorem 2.4 (a non-trivial Exercise) that the same result holds for an arbitrary finitely generated group $G \subset \mathbb{C}^*$.¹

(iv) An easy and useful corollary is that there exists a finite set Φ = Φ(n, k, S) such that every solution of (2.4) has at least one coordinate in Φ. (Note that this holds also for degenerate solutions!) In the homogeneous version, of course, the conclusion will be that the ratio of two suitable coordinates lies in Φ. To justify this assertion, given any solution x of (2.4), delete from the left-hand side a maximal vanishing subsum; in this way we shall obtain an equation $\sum_{i\in I_x} x_i = 1$, with no new vanishing subsums. We may now partition the solutions into a finite number of classes, according to the set $I_x$; so, arguing separately with each class, we may assume for our purpose that $I_x = I$ is fixed for all solutions in question. Now, the vector $x' := (x_i)_{i\in I}$ represents a non-degenerate solution of the equation $\sum_{i\in I} x_i = 1$. But then, Theorem 2.4 applied to this equation implies that x' has finitely many possibilities; the assertion follows.

Proof We shall argue by induction on n, the case n = 1 being obvious. Let then n > 1 and suppose by contradiction the existence of an infinite set Σ of non-degenerate solutions of (2.4). For a given solution x ∈ Σ and for v ∈ S, let $j_v$ be an index such that $|x_{j_v}|_v = |x|_v := \max_{1\le i\le n}|x_i|_v$. This $j_v$ depends on x, but, replacing Σ with an infinite subset if necessary, we may suppose that, for all v ∈ S, $j_v$ is constant for all solutions in Σ.
We shall apply Theorem 2.3, on defining $L_{iv} = X_i$ for $i \ne j_v$ and $L_{j_vv} = X_1 + \cdots + X_n$ (note that for each v these linear forms are indeed independent).

¹ Consider for instance the case $G \subset \mathbb{Q}(t)^*$, for transcendental t; it will prove convenient to specialize t so that independent generators for G remain independent – see Exercise 2.12 below.

Since $x_1 + \cdots + x_n = 1$ for every solution x and since $|x_{j_v}|_v = |x|_v$, we obtain
$$\prod_{i=1}^n |L_{iv}(x)|_v = \prod_{i\ne j_v}|x_i|_v = \prod_{i=1}^n |x_i|_v \cdot |x|_v^{-1}.$$

On the other hand, the $x_i$ are S-units, i.e. $|x_i|_v = 1$ for $v \notin S$, whence $\prod_{v\in S}|x_i|_v = \prod_{v\in M_k}|x_i|_v = 1$, the last equality being the product formula. Hence
$$\prod_{v\in S}\prod_{i=1}^n |L_{iv}(x)|_v = \prod_{i=1}^n\prod_{v\in S}|x_i|_v \cdot \prod_{v\in S}|x|_v^{-1} = \prod_{v\in S}|x|_v^{-1}.$$

Finally, $H(x) = \prod_{v\in M_k}|x|_v \le \prod_{v\in S}|x|_v$ (since the $x_i$ are in particular S-integers), and the last equality implies
$$\prod_{v\in S}\prod_{i=1}^n |L_{iv}(x)|_v \le H(x)^{-1}.$$

We are thus in a position to apply Theorem 2.3 (with ε = 1), and deduce that the solutions x ∈ Σ all lie in a certain finite union of proper subspaces of $k^n$. Upon once again replacing Σ with an infinite subset, we may further assume that for all solutions in Σ the same equation $a_1x_1 + \cdots + a_nx_n = 0$ holds, where the $a_i \in k$ are not all zero, say $a_n \ne 0$. Using this equation to substitute for $x_n$ in (2.4) and putting $b_i = 1 - (a_i/a_n)$, we find $b_1x_1 + \cdots + b_{n-1}x_{n-1} = 1$. Let now $I \subset \{1, \ldots, n-1\}$ be the (non-empty) set of indices such that $b_i \ne 0$. Then
$$\sum_{i\in I} b_ix_i = 1.$$

We can now enlarge S and suppose that $b_i \in \mathcal{O}_S^*$ for i ∈ I. Moreover, we can omit from the left-hand side a maximal vanishing subsum. Such a subsum will depend on x, but, upon once again replacing Σ with an infinite subset, we may assume that the subsum is the same for all solutions in question. On replacing I with a (possibly smaller) subset, we can then suppose (similarly to remark (iv) to the theorem) that no subsum of the left-hand side of the last displayed equation vanishes. Then, by induction (observe that #I < n), the $b_ix_i$, i ∈ I, may take at most finitely many values. In particular, upon once more replacing Σ with a suitable infinite subset, we may assume that, for all the solutions x ∈ Σ, some coordinate, say $x_n$, takes a constant value c.² The proof can now be rapidly concluded as follows. Equation (2.4) for these solutions becomes $x_1 + \cdots + x_{n-1} = 1 - c$. We cannot have c = 1, since our solutions are non-degenerate by assumption. We can then enlarge S so that $1 - c \in \mathcal{O}_S^*$. Now, put $y_i := x_i/(1 - c)$; then $y_i \in \mathcal{O}_S^*$ and moreover $y_1 + \cdots + y_{n-1} = 1$. Also, as x varies in the infinite set Σ, the vector $y := (y_1, \ldots, y_{n-1})$ takes infinitely many values as well (since $x_n = c$ is constant, the map x ↦ y is injective). Therefore the inductive assumption implies that the solution y of $y_1 + \cdots + y_{n-1} = 1$ is degenerate for all but finitely many y; but then the corresponding solutions x would also be degenerate, a contradiction which concludes the argument.

² Observe that this conclusion is just remark (iv) above; however, the remark was previously deduced from the theorem that we are now proving.

Theorem 2.4 can be given a quantitative formulation; namely, given a sequence of non-degenerate tuples of S-units $P^{(i)} = (x_1^{(i)}, \ldots, x_n^{(i)}) \in (\mathcal{O}_S^*)^n$, not only is it the case that their sum cannot be infinitely often equal to a given number (say 1), but also its height must tend to infinity. More precisely, we have the following.

Theorem 2.6 Let k, S, $\mathcal{O}_S$, $\mathcal{O}_S^*$ denote as before a number field, a finite set of places containing the Archimedean ones, and the corresponding ring of S-integers and group of S-units. Let n ≥ 2 be an integer and ε > 0 a real number. The inequality
$$\max_{v\in S}|x_1 + \cdots + x_n|_v < \max(H(x_1), \ldots, H(x_n))^{1-\varepsilon}$$
has only finitely many non-degenerate solutions $(x_1, \ldots, x_n) \in (\mathcal{O}_S^*)^n$. Again, by non-degenerate we mean that no subsum vanishes.

The above statement admits the following geometrical interpretation. Consider the hypersurface $x_1 + \cdots + x_n = 0$ in $\mathbb{G}_m^n$. Theorem 2.6 provides a lower bound for the distance from an integral point $(x_1, \ldots, x_n) \in \mathbb{G}_m^n(\mathcal{O}_S) = (\mathcal{O}_S^*)^n$ to that hypersurface, while Theorem 2.4 (properly reformulated, by homogenizing Equation (2.4)) just states that the mentioned hypersurface cannot contain infinitely many integral points.
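The computation at the heart of the proof of Theorem 2.4 can be checked on examples. For $k = \mathbb{Q}$ and S = {∞, 2, 3} (our choice), the sketch below builds, for each v ∈ S, the forms $L_{iv} = X_i$ for $i \ne j_v$ and $L_{j_vv} = X_1 + \cdots + X_n$, and verifies on two S-unit solutions of (2.4) that $\prod_{v\in S}\prod_i |L_{iv}(x)|_v = \prod_{v\in S}|x|_v^{-1}$. When several indices attain the maximum, the code picks the first one; this is allowed, since any index with $|x_{j_v}|_v = |x|_v$ works.

```python
from fractions import Fraction
from math import prod

S = ("inf", 2, 3)  # assumed example: k = Q, S = {infinity, 2, 3}

def abs_v(x: Fraction, v) -> Fraction:
    """|x|_v for v = "inf" (usual absolute value) or v = p prime (normalized: |p|_p = 1/p)."""
    if v == "inf" or x == 0:
        return abs(x)
    e, num, den = 0, abs(x.numerator), x.denominator
    while num % v == 0:
        num //= v
        e += 1
    while den % v == 0:
        den //= v
        e -= 1
    return Fraction(1, v ** e) if e >= 0 else Fraction(v ** -e)

def norm_v(xs, v) -> Fraction:
    """|x|_v = max_i |x_i|_v."""
    return max(abs_v(x, v) for x in xs)

def double_product(xs) -> Fraction:
    """prod_{v in S} prod_i |L_iv(x)|_v, with L_iv = X_i for i != j_v and
    L_{j_v v} = X_1 + ... + X_n, as in the proof of Theorem 2.4."""
    total = Fraction(1)
    for v in S:
        values = [abs_v(x, v) for x in xs]
        j = values.index(max(values))          # an index j_v with |x_{j_v}|_v = |x|_v
        others = prod(values[i] for i in range(len(xs)) if i != j)
        total *= others * abs_v(sum(xs), v)    # last factor: the form X_1 + ... + X_n at x
    return total

x1 = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # 1/2 + 1/3 + 1/6 = 1
x2 = [Fraction(9, 8), Fraction(-1, 8)]                 # 9/8 - 1/8 = 1
for xs in (x1, x2):
    print(double_product(xs), 1 / prod(norm_v(xs, v) for v in S))
```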

2.3 S-Unit Points on Algebraic Varieties Following in part [E1] (but see also [E4]), we shall now prove a generalization of Theorem 2.4. As a preliminary to the statement, we briefly recall a few definitions. We shall denote by Gnm the nth power of the multiplicative algebraic group Gm ; this group is just the affine variety A1 \ {0}, endowed with the multiplicative group law; namely, for a (commutative) ring R, Gnm (R) just denotes the multiplicative group (R∗ )n (that is, we multiply coordinatewise). We have a simple example of an algebraic group, which, roughly speaking, is an algebraic variety with a group law expressed by regular (rational)

36

Schmidt’s Subspace Theorem and S-Unit Equations

functions. An algebraic subgroup is simply a subvariety which inherits the group law, namely a subgroup which is closed in the Zariski topology. In what follows, we shall for simplicity often identify an algebraic group $H$ with the set $H(\bar{\mathbb{Q}})$ of its points with algebraic coordinates. In 1984, M. Laurent proved a conjecture of Lang on the distribution of points in the intersection of a finitely generated subgroup of $\mathbb{G}_m^n$ with a given subvariety. We formulate (a special case of) the result as follows.

Theorem 2.7 (Laurent) Let $G$ be a finitely generated subgroup of $(\bar{\mathbb{Q}}^*)^n$ and let $\Sigma$ be any subset of $G$. Then the Zariski closure of $\Sigma$ in $\mathbb{G}_m^n$ is a finite union of translates of algebraic subgroups of $\mathbb{G}_m^n$.

Before giving the proof, we pause to illustrate the statement. Roughly speaking, it asserts that the algebraic relations satisfied by the coordinates of all the points in $\Sigma$ are not genuinely “additive”; they can always be reduced to the “multiplicative” type and described in finite terms. More precisely, the minimal algebraic subvariety of $\mathbb{G}_m^n$ containing $\Sigma$ is of a rather special type: it is a finite union of translates of algebraic subgroups. It is not difficult to classify and describe such subgroups (or translates). Each of them is defined by a finite number of equations of the form $X^a = 1$ (or $X^a = \lambda$ for translates). Here we have abbreviated $X^a := X_1^{a_1} \cdots X_n^{a_n}$, with $a = (a_1, \ldots, a_n) \in \mathbb{Z}^n$ (see e.g. [Bo2], [BoG], or [Z6]). Alternatively, each algebraic subgroup may be parametrized by the formulae $X_i = \zeta_i T_1^{b_{i1}} \cdots T_r^{b_{ir}}$. Here $(\zeta_1, \ldots, \zeta_n)$ runs through a finite group of vectors of roots of unity, the parameters $T_i$ vary freely in $\mathbb{G}_m$, and the $b_{ij}$ are suitable integers. (In particular, the group $G$ above must not be confused with an algebraic subgroup, which can be finitely generated only if it is finite.) It is rather easy to construct examples showing that each algebraic subgroup or translate can contain a Zariski-dense subset of a finitely generated group $G$. For instance, let the parameters $T_i$ in the formula above vary in a finitely generated group $G_1 \subset \bar{\mathbb{Q}}^*$ (e.g. $O_S^*$), and let $G$ be the image of $G_1^r$ under the map $(T_1, \ldots, T_r) \mapsto (\zeta_i T_1^{b_{i1}} \cdots T_r^{b_{ir}})_{1 \le i \le n}$. In this case, whenever $G_1$ is infinite, the Zariski closure of $G$ is the whole algebraic group parametrized by the above map.
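The parametrized description can be checked mechanically. The sketch below uses a hypothetical example with all $\zeta_i = 1$, $n = 3$, $r = 2$, and $G_1$ the group generated by 2 and 3, truncated to a finite window. It maps $(T_1, T_2) \mapsto (T_1T_2, T_1, T_2)$ and verifies that every image point satisfies $X^a = 1$ for $a = (1, -1, -1)$. The reason is that $X^a$ evaluates to $\prod_j T_j^{\sum_i a_i b_{ij}}$. It is therefore identically 1 exactly when the integer vector $a$ is orthogonal to every column of the exponent matrix $(b_{ij})$. The helper names are illustrative only.

```python
from fractions import Fraction
from itertools import product


def monomial(point, exponents):
    """Evaluate X^a = X_1^{a_1} ... X_n^{a_n} at a point of G_m^n (exact rationals)."""
    value = Fraction(1)
    for x, a in zip(point, exponents):
        value *= Fraction(x) ** a
    return value


def parametrize(params, b):
    """Image of (T_1, ..., T_r) under X_i = T_1^{b_i1} ... T_r^{b_ir} (all zeta_i = 1)."""
    return tuple(monomial(params, row) for row in b)


def is_character_trivial(a, b):
    """True when sum_i a_i * b_ij = 0 for every column j, i.e. X^a == 1 on the image."""
    columns = range(len(b[0]))
    return all(sum(a_i * row[j] for a_i, row in zip(a, b)) == 0 for j in columns)


# Hypothetical example: X_1 = T_1 T_2, X_2 = T_1, X_3 = T_2 inside G_m^3.
B = [(1, 1), (1, 0), (0, 1)]
A = (1, -1, -1)

# A finite window of G_1 = <2, 3> in Q^*.
G1 = [Fraction(2) ** i * Fraction(3) ** j for i, j in product(range(-2, 3), repeat=2)]
points = [parametrize((t1, t2), B) for t1 in G1 for t2 in G1]

print(is_character_trivial(A, B))                 # the exponent test
print(all(monomial(p, A) == 1 for p in points))   # the relation on actual points
```

Both lines print `True`. Replacing `A` by a vector not orthogonal to the columns of `B` makes the exponent test fail.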
We remark that the theorem is often stated by taking $\Sigma := V \cap G$, for some given (irreducible) algebraic variety $V \subset \mathbb{G}_m^n$. When $V$ is not a translate of an algebraic subgroup, we obtain a non-trivial conclusion (confining $\Sigma$ to a proper subvariety). Such a formulation is apparently more special (in fact, it is quite easy to check its equivalence with the above one), and is motivated by some applications. For instance, when $G = (O_{k,S}^*)^n$ and $\Sigma = V \cap G$, the theorem describes the points in $V$ whose coordinates are S-units in $k$. (In the language of Chapter 3 below, these are simply the S-integral points over $k$ on the subvariety $V$ of $\mathbb{G}_m^n$.) In this view, it is not difficult to recover Theorem 2.4 by taking for $V$ the linear hypersurface $X_1 + \cdots + X_n = 1$; see Exercise 2.14 below. It is worth observing that it is possible to “parametrize” the subgroups underlying the families of maximal algebraic translates entirely contained in a given variety $V$ (see e.g. [BoZ]). For example, suppose $V$ turns out not to contain any algebraic translate of positive dimension, which is the case for a “general” $V$. Then Theorem 2.7 implies that $V \cap G$ is finite. However, it is at present not known how to compute this finite set in general.

Proof of Theorem 2.7 Let $V$ be an irreducible component of the Zariski closure of $\Sigma$. We need to prove that $V$ is a translate of an algebraic subgroup of $\mathbb{G}_m^n$. Let $f_1 = \cdots = f_r = 0$ be a defining system of polynomial equations for $V$. We normalize such a system in the following way. Let $f$ be one of the $f_i$ having more than two terms, and suppose two monomials $X^a$ and $X^b$ appear in $f(X)$ whose ratio is a constant $\lambda$ on the whole of $V$. We substitute $\lambda X^b$ for $X^a$ in $f(X)$, obtaining another polynomial $f^*(X)$ with fewer terms than $f$. We then replace the equation $f(X) = 0$ with the pair of equations $X^a - \lambda X^b = f^*(X) = 0$. Both equations continue to hold on the whole of $V$, and the new system again defines $V$. Let us now iterate the procedure. Each time, the total number of terms in the non-binomial equations of the system decreases. Eventually we therefore end up with a finite (perhaps empty) set of binomial equations, together with $s$ further equations $\tilde f_1 = \tilde f_2 = \cdots = \tilde f_s = 0$ (possibly $s = 0$). These are all valid on the whole of $V$, and for each $i$ no ratio of distinct monomials appearing in $\tilde f_i$ is constant on $V$. Now, a first possibility occurs when $s = 0$.
In this case $V$ is completely defined by binomial equations $X^{a_i} = \lambda_i X^{b_i}$. Therefore $V$ is either empty or a translate of the algebraic subgroup defined by the equations $X^{a_i} = X^{b_i}$, concluding the argument. There remains the case when there exists a non-trivial equation $\tilde f(X) = 0$, valid on $V$, such that no ratio of distinct monomials appearing in it is constant on $V$; we proceed to derive a contradiction. Let us write

$$\tilde f(X) = \sum_{i=1}^{h} c_i X^{a_i},$$

where the $c_i \in \bar{\mathbb{Q}}^*$ and where, for all distinct $i, j \in \{1, \ldots, h\}$, $X^{a_i - a_j}$ is not constant on $V$. Let then $k$, $S$ be a number field and a finite set of places, so large that $G \subset (O_S^*)^n$ (it suffices to argue with a finite set of generators of $G$). Upon enlarging $k$, $S$ we can also assume that all of the $c_i$ lie in $O_S^*$. Now let $g \in V \cap \Sigma$; then $\tilde f(g) = 0$, i.e.

$$\sum_{i=1}^{h} c_i g^{a_i} = 0.$$

Since $\Sigma \subset G \subset (O_S^*)^n$ and $c_i \in O_S^*$, all the terms $c_i g^{a_i}$ are S-units in $k$. We may therefore apply Theorem 2.4 (in the homogeneous version, as in remark (ii)). In particular, remark (iv) to that theorem ensures the existence of a finite set $\Phi \subset O_S^*$ with the following property: for every $g \in V \cap \Sigma$, the ratio of some pair of the type $c_i g^{a_i}$, $c_j g^{a_j}$, $i \neq j$, lies in $\Phi$. We may then partition $V \cap \Sigma$ into a finite number of subsets $G_l$, $l = 1, \ldots, L$, as follows. For each $l$ there exist distinct indices $i = i_l$, $j = j_l$ and an element $\varphi = \varphi_l \in \Phi$ such that $c_i g^{a_i} = \varphi c_j g^{a_j}$ for all $g \in G_l$. This equation says that $g$ lies in the algebraic translate defined by $c_i X^{a_i} = \varphi c_j X^{a_j}$. On the other hand, such a translate cannot contain $V$, since by construction the monomials $X^{a_i}$, $X^{a_j}$ do not have a constant ratio on $V$. Hence the equation for the translate defines a proper subvariety $V_l$ of $V$. We have thus proved that $V \cap \Sigma = \bigcup_{l=1}^{L} (V_l \cap \Sigma)$. Since $V$ is irreducible, we have $\dim V_l < \dim V$, so $\bigcup_{l=1}^{L} V_l$ is a proper subvariety of $V$. But this contradicts the fact that $V \cap \Sigma$ is Zariski-dense in $V$, which finally proves the theorem.

We conclude this section with a corollary, which was proved by Lang as early as 1966 (see [L2]).

Corollary 2.8 (Lang 1966) Let $f \in k[X, Y]$, $f \neq 0$. Suppose that $G \subset (k^*)^2$ is a finitely generated subgroup and that there exist infinitely many pairs $(u, v) \in G$ with $f(u, v) = 0$. Then $f$ has a factor of type $aX^mY^n + b$ or of type $aX^m + bY^n$.

Proof We can deduce this result from Theorem 2.7, on letting $V$ be the curve defined by $f = 0$ in $\mathbb{G}_m^2$ and $\Sigma := V \cap G$. The conclusion implies that, if $\Sigma$ is infinite, then $V$ contains at least one translate of some algebraic subgroup of $\mathbb{G}_m^2$ of positive dimension. Since $\dim V = 1$, such a translate must coincide with a component of $V$, and thus corresponds to some irreducible factor of $f$. The shape of that factor may now be derived at once from the structure of the algebraic subgroups of $\mathbb{G}_m^2$ described above.
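Corollary 2.8 can be illustrated numerically with a toy experiment that is not part of the proof. In the sketch below, $G = H \times H$, where $H = \{\pm 2^e\}$ is the subgroup of $\mathbb{Q}^*$ generated by $-1$ and $2$, truncated to a finite window. The helper names and the two test polynomials are hypothetical choices. For $f = X - 2Y^3$, which is of the shape $aX^m + bY^n$, the number of zeros in the window keeps growing with the window. For $f = X + Y - 1$, which has no factor of either shape, the count stays at 3.

```python
from fractions import Fraction


def window(bound):
    """Elements +-2**e, |e| <= bound, of the subgroup of Q^* generated by -1 and 2."""
    elems = []
    for e in range(-bound, bound + 1):
        elems.append(Fraction(2) ** e)
        elems.append(-(Fraction(2) ** e))
    return elems


def count_zeros(f, bound):
    """Number of pairs (u, v) in window(bound) x window(bound) with f(u, v) == 0."""
    elems = window(bound)
    return sum(1 for u in elems for v in elems if f(u, v) == 0)


def binomial(x, y):
    return x - 2 * y ** 3          # factor of type aX^m + bY^n


def line(x, y):
    return x + y - 1               # no binomial factor


for bound in (4, 8, 16):
    print(bound, count_zeros(binomial, bound), count_zeros(line, bound))
```

For the windows 4, 8 and 16 the loop prints the counts 6, 12 and 22 for the binomial, and 3 each time for the line. The three zeros of the line are $(2, -1)$, $(-1, 2)$ and $(1/2, 1/2)$.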

2.4 Norm-Form Equations

Let $\omega_1, \ldots, \omega_n$ be algebraic numbers generating over $\mathbb{Q}$ the number field $k = \mathbb{Q}(\omega_1, \ldots, \omega_n)$, of degree $d = [k : \mathbb{Q}]$. Let $K$ be the normal closure of $k$ over $\mathbb{Q}$, with Galois group $G = \mathrm{Gal}(K/\mathbb{Q})$. Put $H := \mathrm{Gal}(K/k)$ and let $R$ be a system of representatives for $G/H$. Let us consider the linear form $L(X) = \sum_{i=1}^{n} \omega_i X_i$ and the norm

$$N(X) := N_{\mathbb{Q}}^{k}(L(X)) = \prod_{\sigma \in R} L^{\sigma}(X) \in \mathbb{Q}[X].$$

(Observe that $N(X)$ is homogeneous of degree $\#R = d$.) An equation of type $N(X) = c$, for some $c \in \mathbb{Q}^*$, is called a “norm-form” equation, and we are interested in its integral solutions $X = x \in \mathbb{Z}^n$. A linear change of variables with rational coefficients shows that we need only consider the case when $\omega_1, \ldots, \omega_n$ are linearly independent over $\mathbb{Q}$. This assumption will hold throughout.

A norm-form equation may well have infinitely many integral solutions. Suppose for instance that $\sum_{i=1}^{n} \mathbb{Q}\omega_i = k$. Then it is easily seen that the module $L(\mathbb{Z}^n) = \sum_{i=1}^{n} \mathbb{Z}\omega_i$ contains $\mu O_k$ for a suitable $\mu \in k^*$. It thus contains $\mu O_k^*$, and it follows immediately that the equation $N(X) = N_{\mathbb{Q}}^{k}(\mu)$ has infinitely many solutions as soon as $O_k^*$ is infinite (which is the case if $k$ is neither $\mathbb{Q}$ nor imaginary quadratic). A simple instance comes from Pell's equation (1.4). In contrast, suppose $\sum_{i=1}^{n} \mathbb{Q}\omega_i$ is strictly contained in $k$, which amounts to the condition $n < d$. Then the number $\omega_1 x_1 + \cdots + \omega_n x_n$, for $x_i \in \mathbb{Z}$, is rather special in $k$, and we expect strong restrictions on the integer solutions of $N(X) = c$. As an example, consider the Thue equations met in Chapter 1, i.e. equations of type $\tilde f(X, Y) = c$, where $\tilde f$ is a form of degree $d \geq 3$, irreducible over $\mathbb{Q}$. It is clear that, up to a constant factor, $\tilde f$ is a norm-form with $n = 2$: $\tilde f = a N_{\mathbb{Q}}^{k}(X - \xi Y)$, where $k = \mathbb{Q}(\xi)$ and $\xi$ is a root of $\tilde f(X, 1)$. Moreover, since $d \geq 3$, we are in the case $n < d$; Thue's theorem then says that there are at most finitely many integral solutions.

Observe, however, that the condition $n < d$ does not by itself suffice to ensure finiteness (if $n > 2$). An easy example occurs for $n = 3$, putting $L(X, Y, Z) = X + \sqrt{2}Y + \sqrt{3}Z$ and $k = \mathbb{Q}(\sqrt{2}, \sqrt{3})$, so $d = 4$. The equation $N_{\mathbb{Q}}^{k}(L(X)) = 1$ has (at least) three infinite families of integral solutions. They are obtained with $Z = 0$ (we find the equation $(X^2 - 2Y^2)^2 = 1$), with $Y = 0$ (equation $(X^2 - 3Z^2)^2 = 1$), and with $X = 0$ (equation $(2Y^2 - 3Z^2)^2 = 1$). The substance here is that we have three Pell equations³ relative to the subfields $\mathbb{Q}(\sqrt{2})$, $\mathbb{Q}(\sqrt{3})$, $\mathbb{Q}(\sqrt{6}) \subset k$, all of which have infinitely many integer solutions. The phenomenon is explained by the fact that $L(\mathbb{Z}^3)$ contains three rank-two submodules generating quadratic fields over $\mathbb{Q}$. In other words, the condition $n < d$ is no longer satisfied if we restrict our consideration to these submodules. In this situation (but we omit formal definitions) one speaks of a degenerate modulus (see [BS], Chapter IV, Section 6.4, or [S2], Chapter VII).

³ The third one may be rewritten as $((2Y + 3Z)^2 - 6(Y + Z)^2)^2 = 1$.

It was a longstanding conjecture that this situation characterizes the existence of infinitely many integral solutions. Before Schmidt, the only results in this direction, Thue's equation apart, had been obtained by Skolem and by Chabauty, with methods relying on p-adic analysis (see [BS], Chapter IV); however, their conclusions concerned only the case $n = 3$. The whole subject was settled in 1972 by Schmidt, who proved in particular the conjecture just sketched (see [S2]), using his subspace theorem, Theorem 2.1. Here we shall present a fairly simple proof of the conjecture, relying on Theorem 2.4. (We stress that Schmidt originally did not have such a result at his disposal; it allows some simplifications.)

Theorem 2.9 (Schmidt 1972) Suppose that for some $c \in k^*$ the equation $N_{\mathbb{Q}}^{k}(L(X)) = c$ has infinitely many integer solutions. Then there exist $\lambda \in k^*$ and a subfield $k' \subset k$ such that $O_{k'}^*$ is infinite and $\lambda k' \subset L(\mathbb{Q}^n)$ (possibly $k' = k$, i.e. $n = d$).

Proof We shall argue by induction on $d$ (the case $d = 1$ being clear and in fact empty), assuming that the equation $N_{\mathbb{Q}}^{k}(L(x)) = c$ has infinitely many solutions $x \in \mathbb{Z}^n$. Suppose first that $n = d$; then we claim that the conclusion holds with $k' = k$, $\lambda = 1$. To start with, we have $L(\mathbb{Q}^n) = k$, by the linear independence of the $\omega_i$ over $\mathbb{Q}$. On the other hand, that $O_k^*$ must be infinite is a standard fact. It suffices to associate with a solution $x \in \mathbb{Z}^n$ the fractional ideal $I(x) := O_k L(x) \subset k$. By virtue of the equation $N_{\mathbb{Q}}^{k}(L(x)) = c$, this ideal has only finitely many possibilities (note that, if $\delta$ is a common denominator for the $\omega_i$, then $\delta I(x)$ divides $\delta^d c$). Hence $I(x)$ is constant for infinitely many solutions. Thus, for any pair $x, x'$ of such solutions, $L(x)/L(x')$ is a unit in $O_k^*$. But $x \mapsto L(x)$ is injective, and the claim follows.
Suppose now that $n < d$ and let $V$ be a minimal subspace of $\mathbb{Q}^n$ containing infinitely many integral solutions. If $r = \dim V$, there exists an injective linear map $\varphi : \mathbb{Q}^r \to V$ such that $\varphi(\mathbb{Z}^r)$ contains infinitely many solutions (an easy exercise). Define $\Lambda = L \circ \varphi$, so $\Lambda$ is a linear form in $r$ variables with coefficients in $k$. Write $\Lambda = \beta_1 Y_1 + \cdots + \beta_r Y_r$. The $\beta_i$ are linearly independent over $\mathbb{Q}$, as follows from the independence of the $\omega_i$ and the injectivity of $\varphi$. Since $\varphi$ is defined over $\mathbb{Q}$, we have $\Lambda^\sigma = L^\sigma \circ \varphi$. So, by virtue of the construction of $\varphi$, the equation

$$\prod_{\sigma \in R} \Lambda^{\sigma}(Y) = c \qquad (2.5)$$

has infinitely many integral solutions $Y = y \in \mathbb{Z}^r \cap \varphi^{-1}(\mathbb{Z}^n)$.

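Now, since $d > n \geq r$, the $d$ forms $\Lambda^\sigma$, $\sigma \in R$, in $r$ variables must be linearly dependent. Hence an identity

$$\sum_{\sigma \in R^*} \alpha_\sigma \Lambda^{\sigma}(Y) = 0 \qquad (2.6)$$

holds, where $\alpha_\sigma \in K^*$ and $R^*$ is a suitable non-empty subset of $R$. Let then $S \subset M_K$ be large enough to ensure that the $\beta_i$, $c$ and the $\alpha_\sigma$ ($\sigma \in R^*$) are all S-units in $K$. Then Equation (2.5) implies in particular that $\Lambda^\sigma(y) \in O_{K,S}^*$ for every solution $y \in \mathbb{Z}^r$ and every $\sigma \in R$. Moreover, from (2.6) we obtain

$$\sum_{\sigma \in R^*} \alpha_\sigma \Lambda^{\sigma}(y) = 0,$$

so we may apply Theorem 2.4 (in the homogeneous version of remark (ii)). From remark (iv) to that theorem, we deduce the following: there exist distinct elements $\sigma, \tau \in R^*$, an element $\xi \in K^*$, and infinitely many solutions $y$ in our set, such that $\Lambda^\sigma(y) = \xi \Lambda^\tau(y)$. Suppose the linear forms $\Lambda^\sigma(Y)$ and $\xi \Lambda^\tau(Y)$ were distinct. Then the solutions in question would lie in a proper linear subspace $W$ of $\mathbb{Q}^r$. But then infinitely many integral solutions of $N(x) = c$ would lie in $\varphi(W)$, contradicting our minimality assumption (note that $\dim \varphi(W) < r$). Therefore $\Lambda^\sigma(Y) = \xi \Lambda^\tau(Y)$ identically. Putting $g = \tau^{-1}\sigma$ and $\xi' = \tau^{-1}(\xi)$, we find $\Lambda^g(Y) = \xi' \Lambda(Y)$. In particular, comparing the coefficients of $Y_1$, we get $\xi' = \beta_1^g/\beta_1$, whence the linear form $\Omega := \Lambda/\beta_1$ is invariant under $g$. Since $\sigma$ and $\tau$ are distinct representatives of $G/H$, we have $g \notin H$. So the coefficients of $\Omega$, being fixed by both $H$ and $g$, lie in a proper subfield $k_1 \subsetneq k$. Also, we have

$$N_{\mathbb{Q}}^{k}(\Lambda(y)) = N_{\mathbb{Q}}^{k}(\beta_1)\, N_{\mathbb{Q}}^{k_1}(\Omega(y))^{[k : k_1]}.$$

In view of (2.5) we then deduce that there exists $c_1 \in \mathbb{Q}^*$ such that, for infinitely many of the solutions $y$ in question, we have

$$N_{\mathbb{Q}}^{k_1}(\Omega(y)) = c_1. \qquad (2.7)$$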

Then, taking into account that $[k_1 : \mathbb{Q}] < d$, we may apply the inductive assumption, with $k_1$ in place of $k$, $\Omega$ in place of $L$, and $c_1$ in place of $c$. The corresponding conclusion states that there exist $\lambda \in k_1^*$ and a subfield $k' \subset k_1$, with $O_{k'}^*$ infinite, such that $\lambda k' \subset \Omega(\mathbb{Q}^r)$. On the other hand, $\Omega(\mathbb{Q}^r) = \beta_1^{-1}\Lambda(\mathbb{Q}^r) \subset \beta_1^{-1}L(\mathbb{Q}^n)$. Therefore $\beta_1 \lambda k' \subset L(\mathbb{Q}^n)$. So the conclusion holds for $L$ as well, with $\beta_1\lambda$ in place of $\lambda$ (and with the same field $k'$), proving the theorem.
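The example $L = X + \sqrt{2}Y + \sqrt{3}Z$ discussed before Theorem 2.9 can be verified with integer arithmetic. Multiplying the four conjugates $X \pm \sqrt{2}Y \pm \sqrt{3}Z$ in pairs gives $N(X, Y, Z) = (X^2 + 2Y^2 - 3Z^2)^2 - 8X^2Y^2$. On the three coordinate planes this reduces to $(X^2-2Y^2)^2$, $(X^2-3Z^2)^2$ and $(2Y^2-3Z^2)^2$. The Python sketch below generates each of the three families by repeatedly multiplying by a unit of the relevant quadratic field, namely $1 + \sqrt{2}$, $2 + \sqrt{3}$ and $5 + 2\sqrt{6}$. It then checks that the norm stays equal to 1. The helper names are ours, chosen for illustration, and the floating-point product serves only as an independent sanity check of the expanded formula.

```python
import math


def norm_form(x, y, z):
    """N(x, y, z) = prod of the four conjugates x +- sqrt2*y +- sqrt3*z, in integer form."""
    s = x * x + 2 * y * y - 3 * z * z
    return s * s - 8 * x * x * y * y


def numeric_norm(x, y, z):
    """Floating-point product of the four conjugates, as an independent sanity check."""
    r2, r3 = math.sqrt(2), math.sqrt(3)
    return math.prod(x + e2 * r2 * y + e3 * r3 * z for e2 in (1, -1) for e3 in (1, -1))


def orbit(start, step, count):
    """First `count` iterates of the integer linear map `step`, starting at `start`."""
    out, v = [], start
    for _ in range(count):
        out.append(v)
        v = step(*v)
    return out


# Z = 0: multiply x + y*sqrt2 by the unit 1 + sqrt2.
fam_z = [(x, y, 0) for x, y in orbit((1, 0), lambda x, y: (x + 2 * y, x + y), 8)]
# Y = 0: multiply x + z*sqrt3 by the unit 2 + sqrt3.
fam_y = [(x, 0, z) for x, z in orbit((1, 0), lambda x, z: (2 * x + 3 * z, x + 2 * z), 8)]
# X = 0: multiply y*sqrt2 + z*sqrt3 by the unit 5 + 2*sqrt6.
fam_x = [(0, y, z) for y, z in orbit((1, 1), lambda y, z: (5 * y + 6 * z, 4 * y + 5 * z), 8)]

for family in (fam_z, fam_y, fam_x):
    print(family[:4], all(norm_form(*t) == 1 for t in family))
```

In the third family the first steps are $(Y, Z) = (1, 1), (11, 9), (109, 89)$, giving $2Y^2 - 3Z^2 = -1$ each time, whose square is 1.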


2.5 Exercises

Exercise 2.10 Let $\xi$ be an irrational algebraic number and let $\varepsilon > 0$. Show that there exist only finitely many pairs $p, n \in \mathbb{N}$ such that $|\xi - p/(2^n + 1)| < 2^{-(1+\varepsilon)n}$. Hence the shape “$2^n + 1$” for the denominator of an approximation yields the same exponent as the shape “$2^n$” considered by Ridout; however, Ridout's theorem seems not to be sufficient for this deduction. (Hint: apply Theorem 2.2.) The result appears in a much more general form as [CZ1], Lemma 2; see also [TrZ] for an application to a certain transcendence proof.

Exercise 2.11 Let $a, b, c \in \mathbb{C}[t]$ be coprime polynomials, not all constant, such that $a + b + c = 0$. (i) Prove the Mason–Stothers theorem (see [Mas], [Sto]): the number $\max(\deg a, \deg b, \deg c) + 1$ does not exceed the number of distinct zeros of $abc$. (Hint: start by factoring $a, b, c$ in $\mathbb{C}[t]$. Then differentiate the equation, obtain $c'a - ca' = cb' - c'b$, and compare degrees after a “divisibility” argument.) This result, which admits the hinted simple proof, allows a generalization of Theorem 2.4 to $\mathbb{C}(t)$ in the case $n = 2$. (ii) Obtain in that case an estimate for the degrees of the solutions of $x + y = 1$ in a subgroup $G \subset \mathbb{Q}(t)^*$ of finite rank $r$. (iii) Use (i) to prove a version of “Fermat's last theorem” for polynomials. Arguments similar to the suggested one actually apply in any number of variables (see [BrMa], [Z1]). See also [Z6] for an extension of the equation $x + y = 1$ to more general equations $f(x, y) = 0$. A “numerical” version of the above statement, known as “the abc conjecture”, is due to D. Masser and J. Oesterlé. It has spectacular consequences in many central topics of number theory (see e.g. the surveys [Go], [S3], and [Vo1]).

Exercise 2.12 Let $r_1(t), \ldots, r_h(t) \in \mathbb{Q}(t)^*$ be multiplicatively independent⁴ rational functions. Show that there exist rationals $t_0 \in \mathbb{Q}$ such that the $r_i(t_0)$ are all defined, non-zero, and multiplicatively independent. Generalize the result to rational functions on an algebraic curve. (Stronger conclusions appear in [BoMaZ] and in previous papers by Masser.)

Exercise 2.13 Let $p_{ij}$, $i = 1, \ldots, m$, $j = 1, \ldots, n$, be fixed pairwise distinct prime numbers. Prove that the equation

$$p_{11}^{a_{11}} \cdots p_{1n}^{a_{1n}} \pm \cdots \pm p_{m1}^{a_{m1}} \cdots p_{mn}^{a_{mn}} = 0$$

has at most finitely many solutions in integers $a_{ij} \in \mathbb{Z}$.

⁴ That is, $r_1^{a_1} \cdots r_h^{a_h} \neq 1$ for all integers $a_i$ not all zero.
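Exercise 2.11(i) lends itself to a quick experiment. The sketch below assumes polynomials with rational coefficients, stored as coefficient lists from degree 0 upward; all helper names are ours. It counts the distinct complex zeros of a polynomial $p$ as $\deg p - \deg \gcd(p, p')$, which is valid in characteristic zero. It then compares the result with $\max(\deg a, \deg b, \deg c) + 1$. This is a check on examples, not a proof.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients; coefficients are listed from degree 0 upward."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p


def deg(p):
    return len(trim(p)) - 1          # the zero polynomial gets degree -1


def mul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def derivative(p):
    return trim([i * c for i, c in enumerate(p)][1:])


def remainder(p, q):
    """Remainder of p modulo a non-zero q, by long division over Q."""
    r, q = trim(p), trim(q)
    while r and len(r) >= len(q):
        factor = r[-1] / q[-1]
        shift = len(r) - len(q)
        for i, c in enumerate(q):
            r[shift + i] -= factor * c
        r = trim(r)
    return r


def poly_gcd(p, q):
    """Euclid's algorithm in Q[t]; the result is defined up to a non-zero constant."""
    p, q = trim(p), trim(q)
    while q:
        p, q = q, remainder(p, q)
    return p


def distinct_zeros(p):
    """Number of distinct complex zeros of p: deg p - deg gcd(p, p')."""
    return deg(p) - deg(poly_gcd(p, derivative(p)))


def mason_stothers_holds(a, b, c):
    return max(deg(a), deg(b), deg(c)) + 1 <= distinct_zeros(mul(mul(a, b), c))


# a = t^3, b = 1 - t^3, c = -1 (so a + b + c = 0): 4 <= 4, equality holds.
print(mason_stothers_holds([0, 0, 0, 1], [1, 0, 0, -1], [-1]))
# a = (t+1)^2, b = -(t-1)^2, c = -4t: 3 <= 3, equality again.
print(mason_stothers_holds([1, 2, 1], [-1, 2, -1], [0, -4]))
```

Both examples attain equality, which shows that the bound in (i) cannot be improved in general.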


Exercise 2.14 Let $V$ be the subvariety of $\mathbb{G}_m^n$ defined by the equation $X_1 + \cdots + X_n = 1$. Describe by equations the translates of algebraic subgroups of $\mathbb{G}_m^n$ that are entirely contained in $V$. Apply the result to recover Theorem 2.4 as a special case of Theorem 2.7.

Exercise 2.15 Let $p, q, n$ be positive integers and let $\varepsilon > 0$. Prove that, if $n$ is large enough, the inequality $q^2 |(3/2)^n - p/q| > \exp(-\varepsilon n)$ holds. (Hint: set $x_1 = q3^n$, $x_2 = -p2^n$ and apply Theorem 1.39, as in the proof of Theorem 2.4.) Deduce Pourchet's result (answering a question of Mendès France: see [MF]): the length of Euclid's algorithm for $3^n : 2^n$ tends to infinity as $n \to \infty$. (Hint: use the properties of continued fractions to show that the partial quotients of $3^n/2^n$ are “small” compared with $2^n$.) See Exercise 4.38 for a generalization.

Exercise 2.16 Let $\alpha_1, \ldots, \alpha_s, c_1, \ldots, c_s$ be non-zero algebraic numbers such that $\alpha_i/\alpha_j$ is not a root of unity for $i \neq j$. Prove that $c_1\alpha_1^n + \cdots + c_s\alpha_s^n$ can vanish for at most finitely many integers $n$. (Hint: use Theorem 2.4 for suitable $k$, $S$.) This result is known as (a special case of) the Skolem–Mahler–Lech theorem. An elegant and substantially elementary proof may be obtained by viewing the functions $n \mapsto \alpha^n$ as p-adic analytic functions (see Chapter 4 and [vdP1]). The suggested approach using Theorem 2.4 is less elementary. However, it yields superior quantitative conclusions (see e.g. [ESS]) as well as sharper results that the p-adic method misses. For instance, let $c_i \in \mathbb{Q}^*$, let the $\alpha_i \in \mathbb{N}$ be distinct and not all divisible by a certain prime $p$, and let $\varepsilon > 0$. Show that then $\mathrm{ord}_p(\sum_{i=1}^{s} c_i\alpha_i^n) \leq \varepsilon n$ for large enough $n$. (This was shown by Evertse in much greater generality; see also [CZ1], Lemma 1.) Also, find an example where $\limsup_n \mathrm{ord}_p(\sum_{i=1}^{s} c_i\alpha_i^n) = \infty$.
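The quantity in Exercise 2.15 is easy to tabulate. The sketch below runs Euclid's algorithm on $3^n$ and $2^n$ and records the partial quotients of the continued fraction of $3^n/2^n$; the function names are ours. It prints the number of steps, the largest partial quotient after the integer part, and $2^n$ for comparison. Such a table only illustrates Pourchet's result. The lengths fluctuate: for instance $81/16 = [5; 16]$ has length 2 while $27/8 = [3; 2, 1, 2]$ has length 4. Proving that they tend to infinity is exactly what the exercise asks.

```python
def partial_quotients(a, b):
    """Partial quotients of the continued fraction of a/b, via Euclid's algorithm."""
    quotients = []
    while b:
        q, r = divmod(a, b)
        quotients.append(q)
        a, b = b, r
    return quotients


def euclid_length(n):
    """Number of division steps in Euclid's algorithm for 3**n : 2**n."""
    return len(partial_quotients(3 ** n, 2 ** n))


for n in range(1, 31):
    pq = partial_quotients(3 ** n, 2 ** n)
    # n, length of the expansion, largest non-leading partial quotient, 2**n
    print(n, len(pq), max(pq[1:], default=0), 2 ** n)
```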

Exercise 2.17 Let $A/k$ be an algebraic group defined over the field $k$ and let $G$ be a subgroup of $A(k)$. Prove that the Zariski closure of $G$ (in $A$) is an algebraic subgroup of $A$. (This easy fact proves Theorem 2.7 in the special case when $\Sigma = G$.)

Exercise 2.18 Show that Theorem 2.9 admits a converse, and that in concrete cases one can effectively check whether its conclusion is verified. (Show that, for a given subfield $k' \subset k$, the $\lambda \in k$ such that $\lambda k' \subset L(\mathbb{Q}^n)$ form a $\mathbb{Q}$-vector space, which can be computed.) So there is an algorithm to decide whether the equation $N(X) = c$ has infinitely many integer solutions for some $c$. On the other hand, we do not know of any algorithm which answers the same question for a prescribed $c$.

2.6 Notes

A version of the subspace theorem for several places (over $\mathbb{Q}$), and also a special case of Theorem 2.4, were obtained by E. Dubois and G. Rhin [DR] independently of Schlickewei. The subspace theorem, in its various formulations, has been quantified by several authors, including Schmidt himself, Schlickewei, and Evertse. It turns out that the number of exceptional subspaces appearing in the conclusion may be explicitly bounded.⁵ Some results appear in [S3], while subsequent, more uniform, estimates have been obtained, e.g., in [ES]. A new, more general, geometric formulation of the theorem has been given by G. Faltings and G. Wüstholz [FaWu]. In this version the “approximant” points are restricted to an algebraic subvariety of $\mathbb{P}^n$; this information sometimes allows one to weaken the required inequality (2.2) or (2.3). The proof in [FaWu] is also new compared with Schmidt's original argument. It uses the celebrated product theorem devised by Faltings (for this, see e.g. [Fa] or the paper by van der Put in [EE]). The Faltings–Wüstholz result has been quantified by Evertse and Ferretti in [EF1]. This paper also shows how to deduce the version in [FaWu] from the original Schmidt–Schlickewei formulation, after a suitable change of coordinates and ambient space. This idea also appears, in a different form, in [CZ9], Theorem 3, and in [EF2], where an explicit version of a “subspace theorem” with polynomials of arbitrary degrees in place of linear forms is proposed. The work by Wirsing [Wi] mentioned above concerns the approximation of a fixed algebraic number by a sequence of algebraic numbers of bounded degree. For a real algebraic number $\alpha$, a real $\varepsilon > 0$, and a positive integer $\delta$, he proved the following: for all but finitely many algebraic numbers $\beta$ with $[\mathbb{Q}(\beta) : \mathbb{Q}] \leq \delta$,

$$|\alpha - \beta| > H_{\mathbb{Q}(\beta)}(\beta)^{-2\delta - \varepsilon}.$$

Here $H_{\mathbb{Q}(\beta)}$ is the height normalized with respect to $\mathbb{Q}(\beta)$, while the absolute value $|\cdot|$ is normalized with respect to $\mathbb{Q}$.

⁵ However, as with Roth's theorem, one does not know how to compute the equations for the subspaces.


A deep generalization of Wirsing's theorem to linear forms in several variables was recently obtained by A. Levin [Lev3], after earlier work by M. Ru and J. Wong [RuW]. A particular case of Levin's theorem from [Lev3] gives (in the notation of the subspace theorem, Theorem 2.2) the lower bound

$$\log \prod_{\nu \in S} \prod_{i=0}^{n} \frac{|L_{i,\nu}(x)|_\nu}{|x|_\nu} > -\left(\frac{\delta n - 1}{2\delta n - 3}\,(\delta n)^2 + \varepsilon\right)\log H(x)$$

for points $x = (x_0 : \cdots : x_n) \in \mathbb{P}^n(\bar{k})$ with $[k(x_0, \ldots, x_n) : k] \leq \delta$. Here, for each $\nu \in S$, $L_{0,\nu}, \ldots, L_{n,\nu}$ are, as in Theorem 2.2, independent linear forms defined over the number field $k$. A version of the subspace theorem with “moving targets” is the object of the work [RuV] by Min Ru and P. Vojta. Deep general conjectures about possible extensions of the subspace theorem are due to P. Vojta (see [Vo1], [Be1], and [L3]). He also discovered a profound and surprising analogy between this context and Nevanlinna's theory of the distribution of values of meromorphic maps $\mathbb{C} \to \mathbb{C}^n$ (see [Vo1], [L3], [Vo2]). (Special cases had also been noted by Osgood and Reyssat.) For the subspace theorem, the Nevanlinna analogue is Cartan's second main theorem, which was proved by H. Cartan in the 1930s. The extension to polynomials of arbitrary degree instead of linear forms (Cartan's conjecture) was obtained by Min Ru (see [Ru1], [Ru2]). Ru built on the work done in the arithmetic setting by Evertse and Ferretti and by the present authors (see [EF1], [EF2], [CZ9]). The quantitative conclusions in [ES] have been applied in [ESS] to estimate the number of non-degenerate solutions of Equation (2.4), leading to a quantification of Theorem 2.4 (see [E4]). In particular, the authors prove that the number of non-degenerate solutions is bounded only in terms of the rank of the group $G$ (so, for instance, a field of definition for $G$ is immaterial). Leaving aside the explicit shape of the estimates, this deep result extends what had been found in [BeS] in the case $n = 2$.
(See the notes to the previous chapter, and see [BoG] and [Z6] for versions of such a proof.) Theorem 2.4 has been generalized by Evertse to the case when the variables are “almost” S-units, in the sense that the contribution to the height coming from places outside $S$ is small. Theorem 2.7 has been proved by Laurent in a more general form, where one considers the points in the division group of $G$, namely the group $\{g \in \mathbb{G}_m^n : \exists h > 0,\ g^h \in G\}$. This version requires no new arithmetical tools, but rather arguments from Kummer theory (i.e., the Galois theory of radical abelian extensions). The case of curves had previously been treated by Liardet (see [L2]). Also, as remarked above, the case of an arbitrary field $k \subset \mathbb{C}$ reduces to the algebraic case, for example by specialization.

Laurent's theorem solves part of a conjecture by Lang, who considered either multiplicative algebraic groups $\mathbb{G}_m^n$ or abelian varieties. The latter part has been solved by Faltings (1989; see [EE]), after Laurent's result. An almost immediate corollary is the extremely deep (former) “Mordell conjecture”, which Faltings had already proved with different methods as early as 1982: an algebraic curve of genus $\geq 2$ has at most finitely many rational points. See [L3] for a survey of the original proof, and [Bo4], [BoG], or [HiSi] for a more recent and more elementary proof due to Bombieri, simplifying a method of Vojta. In some sense, Faltings' theorem mentioned above can be viewed as a “compact analogue” of Theorem 2.4. It asserts that the Zariski closure of the set of rational points on an algebraic subvariety of an abelian variety is a finite union of translates of abelian subvarieties. So, if the algebraic variety in question does not contain translates of positive-dimensional algebraic subgroups, its rational points are finite in number. Theorem 2.6 also admits a compact version, likewise proved by Faltings. It consists of bounding from below the distance from a rational point to a hypersurface in an abelian variety (see Theorem 2 in [Fa]). An effective version of Corollary 2.8 is implicit in [Bilu] (and a version appears in [BoG]). The proofs use the above-mentioned results of Baker on linear forms in logarithms of algebraic numbers. However, effective versions of the general case of Theorem 2.4 are not known at present. The norm-form equations are special cases of equations $L_1 \cdots L_r = m$, where $m$ is constant and the $L_i$ are linear forms.
Schmidt also treated the case when the constant $m$ is replaced with a polynomial of “small” degree (see [S2]). In general (under suitable necessary conditions), the subspace theorem implies that the integral solutions all lie in a certain finite union of hyperplanes. However, one cannot in general prove finiteness, and a result such as Theorem 2.9 strongly depends on the fact that the linear forms are conjugate (over $\mathbb{Q}$). See also [Gy] for a survey of results on related equations and for an effective analysis, whenever possible. The equations $L_1 \cdots L_r = m$ are generalized in [CZ9] to equations of the shape $f_1(X) \cdots f_r(X) = g(X)$, where the $f_i$, $g$ are polynomials and $g$ has small degree. It is shown for instance (Theorem 1) that the integral solutions then all lie in some subvariety of $\mathbb{A}^n$ of dimension $\leq n - 2$. This holds if such polynomials satisfy certain natural geometric conditions and if $\sum \deg f_i > n \max \deg f_i + \deg g$. (See also [FaWu] for the case of constant $g$, and Section 3.5.)

3 Integral Points on Curves and Other Varieties

3.1 General Notions on Integral Points

The classical object of the theory of Diophantine equations is the description of the integral, or rational, solutions of a system of algebraic equations;¹ in geometric language, one is concerned with integral or rational points on algebraic varieties. Distinguishing between integral and rational may appear somewhat artificial, since any problem about rationals may in any case be formulated in terms of integers. Geometrically, however, this classification is natural. A rational point $(p_1/q, \ldots, p_n/q)$ ($p_i, q \in \mathbb{Z}$) corresponds not quite to the vector $(p_1, \ldots, p_n, q)$, but rather to the point $(p_1 : p_2 : \cdots : p_n : q)$ in projective $n$-dimensional space. In other words, integral and rational points correspond respectively to affine and projective varieties. Here we shall be mainly concerned with integral points (hence with affine varieties). This problem is sometimes more accessible than that of rational points,² but is nonetheless usually very deep.

Let us now formalize our problem, and simultaneously generalize it (as done by Mahler). Together with the usual integers in $\mathbb{Z}$, we consider the set (actually a ring) of S-integers $O_S = O_{k,S}$, where $k$ is a number field and $S \subset M_k$ is a finite set containing all the infinite places. (This generalization is natural and convenient; for instance, it often eliminates from the statements certain conditions which depend on the normalization of the equations rather than on intrinsic properties of the varieties.) Let $V/k$ be an affine algebraic variety defined over $k$. To start with, we shall assume throughout that $V$ is absolutely irreducible. This is an innocuous restriction for our purposes, as the following exercise shows.

¹ However, other types of equations (e.g. exponential) have been considered as well. They are sometimes auxiliary to the algebraic ones, or related to them; see, for example, Chapter 1.
² An exception occurs e.g. with the theory of quadratic equations, or for varieties which admit “many” rational points.

Exercise 3.1 Suppose $V$ is an algebraic variety defined over $k$, but reducible over an extension of $k$. Prove that (i) $V$ is reducible over an algebraic extension of $k$; (ii) the points in $V(k)$ are not Zariski-dense in $V$. (Hint: for (i), observe that the points in $V(\bar{k})$ are Zariski-dense in $V$. For (ii), observe that $V(k)$ is contained in an intersection of distinct components of $V$ over $\bar{k}$.)

A first definition of integral points. If $V$ is embedded as a closed algebraic set in an affine space $\mathbb{A}^m$, we define the set $V(O_S)$ of S-integral points in $V$ by

$$V(O_S) = \{(x_1, \ldots, x_m) \in V \subset \mathbb{A}^m : x_i \in O_S,\ i = 1, \ldots, m\}.$$

So they are the points where all the coordinate functions $x_i$ take S-integer values. Sometimes, however, it is convenient to deal with affine varieties without an embedding.

Second definition of integral points. Let us then consider the algebra $k[V]$ of regular functions on $V$ defined over $k$. We now say (following [Se1] or [Vo1]) that a set $\Sigma \subset V(k)$ is quasi-S-integral if the following holds: for every $\varphi \in k[V]$ there exists $a = a_\varphi \in k^*$ such that $a\varphi(P) \in O_S$ for every $P \in \Sigma$. (We say simply quasi-integral if $S$ has been fixed once and for all.) In practice, one considers all regular functions on $V$, not just the coordinates for a given embedding. One also ignores a possible denominator that depends on the function but not on the points in $\Sigma$. Since $V$ is affine, the algebra $k[V]$ is finitely generated. It is then clear that in the definition it suffices to take into account only the $\varphi$ in some finite set of generators. Observe also that every finite set $\Sigma \subset V(k)$ is quasi-S-integral, so the definition is meaningful only for infinite sets.
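For $k = \mathbb{Q}$ and $S = \{\infty, p_1, \ldots, p_s\}$, the ring $O_S$ is $\mathbb{Z}[1/(p_1 \cdots p_s)]$, so the first definition becomes a denominator test. The sketch below checks membership in $V(O_S)$ coordinate by coordinate. The helper names are hypothetical, and the curve $xy = 1$, a copy of $\mathbb{G}_m$ in $\mathbb{A}^2$, is an illustrative choice. On this curve the S-integral points are exactly the pairs $(u, u^{-1})$ with $u$ an S-unit, in agreement with the remark in Section 2.3 that S-unit points on subvarieties of $\mathbb{G}_m^n$ are S-integral points.

```python
from fractions import Fraction


def is_s_integer(x, s_primes):
    """True if the rational x lies in Z[1/p : p in s_primes], i.e. O_S for S = {inf} + s_primes."""
    d = Fraction(x).denominator
    for p in s_primes:
        while d % p == 0:
            d //= p
    return d == 1


def is_s_integral_point(point, s_primes):
    """First definition: every affine coordinate of the point is an S-integer."""
    return all(is_s_integer(c, s_primes) for c in point)


def on_curve(point):
    """The curve xy = 1 in A^2, a model of G_m."""
    x, y = point
    return Fraction(x) * Fraction(y) == 1


for pt in [(2, Fraction(1, 2)), (12, Fraction(1, 12)), (Fraction(3, 5), Fraction(5, 3))]:
    print(pt, on_curve(pt), is_s_integral_point(pt, [2, 3]))
```

All three sample points lie on the curve. For $S = \{\infty, 2, 3\}$, only the first two are S-integral, since $5$ is not an S-unit.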
Moreover, if $\Sigma$ is quasi-S-integral, we may choose affine coordinates for $V$ so that the points in $\Sigma$ are S-integral in the previous sense. There is therefore not much difference between the two notions, and in what follows we shall tacitly mix them when there is no risk of confusion.

Integral points with respect to a divisor. As remarked above, the second definition of integrality may be convenient when $V$ is given but not embedded. Let us see how to formulate the above conditions in terms of divisors.


For us a divisor on a projective variety will be a Weil divisor, which by definition is a finite formal linear combination of irreducible hypersurfaces. It is called effective if all its coefficients are non-negative. We say that a divisor is very ample if the variety admits a projective embedding in which the given divisor becomes a hyperplane section. A divisor is ample if a suitable positive multiple of it is very ample. Let us suppose in this discussion that $V$ is given as a Zariski-open subset of a projective variety $\tilde V$. We can always reduce to the case where $\tilde V \setminus V$ has pure codimension one, after blowing up higher-codimension components at infinity. Now the divisor at infinity $D$ is relevant. It is defined as the sum of the irreducible components of $\tilde V \setminus V$, and is thus an effective divisor (defined over $k$). The algebra $k[V]$ consists of the functions in $k(\tilde V) = k(V)$ which are regular outside the support $|D|$. Conversely, suppose we start with a projective variety $\tilde V$ and an ample (effective) divisor $D$ on $\tilde V$. We may then embed $\tilde V \setminus D$ in affine space and define the integral points relative to $D$ (or the integral points of the variety $\tilde V \setminus D$) as those which are integral for this embedding. Of course, these points are again defined only up to a denominator; with this proviso, they are well defined in terms of the data $\tilde V$, $D$.

Third definition of integral points. Finally, again with these last data, there is another definition, which is often the most useful. It does not use affine embeddings or regular functions, but only a projective embedding $\tilde V \subset \mathbb{P}^n$. Let $x = (x_0 : \cdots : x_n)$ be a point in $\tilde V(k)$; we may assume that $S$ is large enough that $O_S$ is a unique factorization domain.³ Then the projective coordinates $x_i$ of the point may be chosen as coprime elements of $O_S$, uniquely up to multiplication by a unit in $O_S^*$.
In this case, if $v$ is a place of $k$ outside $S$, the reduction of $x$ modulo $v$ is well defined as a projective point over the residue field. We then say that $x$ is integral relative to $D$ if, for all places $v \notin S$, the reduction of $x$ modulo $v$ does not lie in the reduction of $D$. This amounts to requiring that, for each such $v$, at least one of the equations defining $D$ is not satisfied by $x$ modulo $v$. It is not difficult to check how this third definition relates to the previous ones. Note that this last definition also applies to quasi-projective varieties which are not necessarily affine, in the sense that we do not need the effective divisor $D$ to be ample. For instance, on taking $D = 0$, it applies to projective varieties: in that case, integral points coincide with rational ones.

³ We could actually dispense with this proviso by working separately in each local ring of $\mathcal O_S$, which is a discrete valuation ring and hence a unique factorization domain.

3.1 General Notions on Integral Points

Example (The punctured projective plane) Let us consider a point $Q \in \mathbb P^2(k)$; we shall describe the integral points on the complement $\mathbb P^2 \setminus \{Q\}$. Note that $Q$ is not a divisor on $\mathbb P^2$. Choose coordinates such that $Q = (0:0:1)$ and suppose for simplicity that the relevant ring of $S$-integers is a principal ideal domain. A rational point $P = (x:y:z) \in \mathbb P^2(k)$ can be represented with coprime coordinates $x, y, z \in \mathcal O_S$; moreover, if $P \neq Q$, then $(x, y) \neq (0, 0)$. Then $P$ is $S$-integral with respect to $Q$ if and only if there is no place $v$ outside $S$ (equivalently: no prime ideal $v$ of $\mathcal O_S$) such that $x \equiv y \equiv 0 \pmod v$. We can recover the same notion by working on the complement of a divisor in another surface, obtained from $\mathbb P^2$ by blowing up the point $Q$. Let $\tilde V \subset \mathbb P^2 \times \mathbb P^1$ be the hypersurface
$$\tilde V = \{((x:y:z), (\xi:\eta)) \mid \xi y = \eta x\},$$
and let $D \subset \tilde V$ be the curve defined by $x = y = 0$. Then every point $P = (x:y:z) \in \mathbb P^2(k)$ with $P \neq Q$ defines a point $P' := ((x:y:z), (x:y)) \in \tilde V \setminus D$. Asking that $P$ be $S$-integral with respect to $Q$ amounts to asking that $P'$ be $S$-integral with respect to the divisor $D$.

For our purposes here, the given definitions will turn out to be equivalent (in the sense that the choice does not affect the truth of the various stated results). Note that, if $\phi \in k(V)$ is integral over $k[V]$, then every quasi-integral set relative to $k[V]$ is also quasi-integral relative to $k[V][\phi]$ (Exercise: use that $\mathcal O_S$ is integrally closed). In other words, in place of $k[V]$ we may consider its integral closure in $k(V)$; it is well known that, since $V$ is affine, this (possibly larger) ring is the algebra of regular functions of a normal affine variety $V'$ (endowed with a regular birational map $\pi : V' \to V$). This fact allows us to assume without loss of generality that $V$ is normal.
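The criterion in the punctured-plane example is easy to test mechanically. The following Python sketch does so for $k = \mathbb Q$ only, assuming (hypothetically, for illustration) that $S$ consists of the archimedean place together with a finite list of primes `s_primes`, so that $\mathcal O_S = \mathbb Z[1/p : p \in S]$ is a principal ideal domain; the function names are ours. It normalizes the point to coprime integer coordinates and checks that no prime outside $S$ divides both $x$ and $y$, which is the same as asking that the reduction of the point avoid $Q$ (equivalently, that $P'$ avoid the curve $D$ on the blown-up surface).

```python
from fractions import Fraction
from math import gcd


def primitive_coordinates(point):
    """Scale a rational point (x : y : z) of P^2 to coprime integer coordinates."""
    fracs = [Fraction(c) for c in point]
    if all(c == 0 for c in fracs):
        raise ValueError("(0 : 0 : 0) is not a point of P^2")
    lcm = 1
    for c in fracs:  # clear all denominators
        lcm = lcm * c.denominator // gcd(lcm, c.denominator)
    ints = [int(c * lcm) for c in fracs]
    g = 0
    for c in ints:  # divide by the gcd of the coordinates
        g = gcd(g, c)
    return tuple(c // g for c in ints)


def prime_factors(n):
    """Set of primes dividing the non-zero integer n (naive trial division)."""
    n, p, primes = abs(n), 2, set()
    while p * p <= n:
        while n % p == 0:
            primes.add(p)
            n //= p
        p += 1
    if n > 1:
        primes.add(n)
    return primes


def is_integral_wrt_Q(point, s_primes=()):
    """Third-definition test for P^2 minus Q = (0 : 0 : 1), with k = Q.

    Hypothetical setup: S = {infinity} plus the primes in s_primes. With coprime
    integer coordinates, the point reduces to Q modulo p exactly when p divides
    both x and y, so P is S-integral w.r.t. Q iff every such p lies in S.
    """
    x, y, _z = primitive_coordinates(point)
    if x == 0 and y == 0:
        return False  # this is Q itself, not a point of the complement
    return prime_factors(gcd(x, y)) <= set(s_primes)


print(is_integral_wrt_Q((1, 2, 5)))                 # True: gcd(1, 2) = 1
print(is_integral_wrt_Q((2, 4, 1)))                 # False: reduces to Q mod 2
print(is_integral_wrt_Q((2, 4, 1), s_primes=(2,)))  # True once 2 is in S
```

Note that a point such as $(3:0:7)$ is not integral for $\mathcal O_S = \mathbb Z$: modulo 3 it reduces to $(0:0:1) = Q$.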
To proceed, let $\pi : V \to V'$ be any regular map between affine varieties $V, V'$ defined over $k$. Then one may easily check that a quasi-$S$-integral set on $V$ is sent by $\pi$ to a quasi-$S$-integral set on $V'$. (When $\pi$ is finite and unramified, there is also a kind of converse property, which will be treated in the next section.)

Examples Let us illustrate the above notions with a few further simple examples, in which $\tilde V = \mathbb P^1$ is the projective line. The divisor at infinity takes the shape $\sum_{i=1}^r Q_i$ for distinct points $Q_i$. (For this divisor to be defined over $k$, it is necessary and sufficient that the set of the $Q_i$ be stable under $\mathrm{Gal}(\bar k/k)$, i.e. that it be a union of complete sets of conjugates over $k$; it may happen that the $Q_i$ are not individually defined over $k$.)

If $r = 0$, then $V = \mathbb P^1$ is projective and, as we mentioned, every set of rational points is quasi-integral. Let now $r = 1$; the point $Q_1$ will be defined over $k$, and we may suppose it is $\infty := (0:1)$, so $V = \mathbb A^1$, the affine line. Its affine algebra over $k$ is $k[t]$, where $t$ is the standard coordinate function on $\mathbb A^1$. The set of $S$-integral points is now identified with $\mathcal O_S$.

If $r = 2$ and both $Q_1, Q_2$ are defined over $k$, we may assume they are $0, \infty$, so now $V$ is identified with $\mathbb A^1 \setminus \{0\} = \mathbb G_m$. Its algebra $k[V]$ is $k[t, t^{-1}]$, where again $t$ is the coordinate on $\mathbb A^1$. By means of the generators $t, t^{-1}$, $V$ is embedded in $\mathbb A^2$ as the hyperbola $XY = 1$, and the $S$-integral points correspond to the $S$-units $\mathcal O_S^*$; similarly for quasi-integral sets. In this language, the $S$-integral points on $\mathbb G_m^n$ are just those with $S$-unit coordinates, and Theorem 2.7 (for $G = (\mathcal O_S^*)^n$ and $\Sigma = V \cap G$) simply describes the $S$-integral points on a subvariety $V \subset \mathbb G_m^n$.

If, on the other hand, $r = 2$ but $Q_1, Q_2$ are not both defined over $k$, they will be conjugate over a quadratic extension of $k$. If $t$ is the coordinate on $\mathbb A^1$, then $t(Q_1), t(Q_2)$ will be conjugate quadratic numbers over $k$, say the roots of the equation $f(X) = X^2 + aX + b = 0$, where $a, b \in k$. The algebra $k[V]$ is now easily seen to be $k[1/f(t), t/f(t)]$. On putting $u = 1/f(t)$, $v = t/f(t)$, we find $t = v/u$, and we may then embed $V$ as the conic $u^2 f(v/u) = u$, namely $v^2 + auv + bu^2 = u$. Over $\mathbb Q$, an example occurs with Pell's equation: putting e.g. $a = 0$, $b = -2$, $x = 1 + 4u$, $y = 2v$, the conic $v^2 - 2u^2 = u$ becomes $x^2 - 2y^2 = 1$. If $k$ and $S$ are large enough, the set of $S$-integral points is always infinite (see also Exercise 3.65 below); on the other hand, for certain choices of $k$, $S$, this set may be finite (for instance, for $k = \mathbb Q$, $\mathcal O_S = \mathbb Z$ and $f(X) = X^2 + 1$, the conic $v^2 + u^2 = u$ has only the two integral points $(0,0)$ and $(1,0)$).

Finally, let $r = 3$ and say that the $Q_i$ are all defined over $k$. Since $\mathrm{Aut}(\mathbb P^1)$ is 3-transitive, we may assume that $Q_1, Q_2, Q_3$ are $0, 1, \infty$. If $t$ is as above, we have $k[V] = k[t, 1/t, 1/(t-1)]$. Using this presentation, we may embed $V$ in 3-space by the system $XY = (X-1)Z = 1$.
The $S$-integral points correspond to the $S$-integers $x$ such that $1/x$ and $1/(x-1)$ are also $S$-integers. Namely, $x$ and $u := 1 - x$ must both be $S$-units. This gives the $S$-unit equation $x + u = 1$, which has only finitely many solutions (according to Section 1.3 or Theorem 2.4). Therefore, in this embedding $V$ has only finitely many $S$-integral points, no matter what $k$, $S$ are (and the same argument shows that this happens for quasi-$S$-integral points and for every embedding).

Exercise 3.2 In this exercise we use the notion of integrality with respect to the third definition, often omitting any reference to the set of places $S$.
(i) Let $V = \mathbb P^n$, $D := \{x_0 = 0\}$. Prove that the integral points with respect to $D$, relative to $k = \mathbb Q$ (and $\mathcal O_S = \mathbb Z$), are the usual integral points of $\mathbb A^n$.
(ii) Let $V = \mathbb P^1$, $D = \{0, \infty\}$. Prove that the integral points with respect to $D$ correspond to the units $\mathcal O_S^*$; they are the points in $\mathbb G_m(\mathcal O_S)$.

(iii) Let $V = \mathbb P^1$, $D = \{0, 1, \infty\}$. Prove that the integral points correspond to the solutions of $x + y = 1$ with $x, y \in \mathcal O_S^*$.
(iv) Let $V = \mathbb P^n$, $D = L + L_0 + \cdots + L_n$, where $L_i : x_i = 0$ and $L : x_0 + \cdots + x_n = 0$. Prove that the integral points correspond to the solutions of $x_0 + \cdots + x_n = 1$ in $S$-units $x_i$.
(v) Let $V$ be an algebraic curve in $\mathbb A^2$ containing the origin $(0,0)$. Prove that the integral points on $V \setminus \{(0,0)\}$ correspond to those integral points $(x, y) \in V$ such that $x, y \in \mathcal O_S$ are coprime.
Some of the verifications in this exercise have already been done; others will be done below, using one of the above definitions, but the interested reader should try all the definitions given above.
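For $k = \mathbb Q$ and $S = \{\infty, 2, 3\}$, i.e. $\mathcal O_S = \mathbb Z[1/6]$, the integral points of $\mathbb P^1 \setminus \{0, 1, \infty\}$ (the case $r = 3$ above, and Exercise 3.2(iii)) are the solutions of $x + y = 1$ in $S$-units. The sketch below is our own illustration, with hypothetical function names: it simply searches a bounded window of exponents. It can list solutions, but it cannot prove that the list is complete; finiteness is exactly what Section 1.3 and Theorem 2.4 provide, and no bound on the exponents is claimed here.

```python
from fractions import Fraction
from itertools import product


def s_units_window(primes, bound):
    """All S-units +-prod p**e_p of Z[1/primes] with every |e_p| <= bound."""
    units = set()
    for exps in product(range(-bound, bound + 1), repeat=len(primes)):
        u = Fraction(1)
        for p, e in zip(primes, exps):
            u *= Fraction(p) ** e
        units.update((u, -u))
    return units


def s_unit_solutions(primes, bound):
    """Solutions (x, 1 - x) of x + y = 1 with x, y in the window above, sorted by x."""
    units = s_units_window(primes, bound)
    return sorted((x, 1 - x) for x in units if (1 - x) in units)


for x, y in s_unit_solutions((2, 3), 3):
    print(f"{x} + {y} = 1")
```

With $S = \{\infty, 2\}$ alone, the search returns only $(-1, 2)$, $(1/2, 1/2)$ and $(2, -1)$; enlarging $S$ to include 3 produces many more solutions, such as $9 + (-8) = 1$.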

3.2 The Chevalley–Weil Theorem

Suppose we are given two algebraic varieties $V, W$ defined over a number field $k$ and a morphism $\pi : W \to V$, also defined over $k$. Then $\pi$ sends rational points to rational points, i.e. it induces a map $W(k) \to V(k)$. If $\mathcal O_S$ is a ring of $S$-integers in $k$, it is not necessarily true that the whole of $W(\mathcal O_S)$ is mapped into $V(\mathcal O_S)$, but this becomes true after enlarging $S$ by a finite set of places. This last fact is easily seen in the affine case, the only case of interest to us: the morphism $\pi : W \to V$ can be expressed by polynomials with coefficients in $k$, and only finitely many primes occur in the denominators of these coefficients.

Clearly, the converse is not true in general: if the image of a point $P \in W$ is a rational point of $V$, $P$ need not be rational. Whenever $\pi : W \to V$ is a finite map, say of degree $n \ge 1$, the preimage of each rational point of $V$ consists of algebraic points in $W(\bar k)$ of degree at most $n$. One expects in general that the degree will be exactly $n$, so that the fiber is irreducible over $k$. Hilbert's irreducibility theorem, which will be the content of Section 3.8, asserts that this holds, whenever $V$ is a rational variety, for a "dense" set of rational points of $V$. Moreover, the field of definition of the points in the fiber depends, in general, on the chosen point of $V(k)$. Under a further hypothesis of a geometric nature, however, one can prove that there exists a finite extension of $k$ containing (the coordinates of) every point in the preimage of $V(k)$. This extra hypothesis is the absence of ramification for the map $\pi$; the assertion that this hypothesis yields the above property is known as the Chevalley–Weil theorem. We give two essentially equivalent versions of this theorem; the first is more standard.

Theorem 3.3 Let $k$ be a number field, with $\mathcal O_S \subset k$ a ring of $S$-integers. Let $V, W$ be two quasi-projective algebraic varieties defined over $k$, and let $\pi : W \to V$ be a finite morphism, also defined over $k$. Suppose that $\pi$ is unramified.⁴ Then there exist a number field $k'$ containing $k$ and a finite set $S'$ of places of $k'$ containing all those lying over $S$, such that the following holds: for every $S$-integral point $p \in V(\mathcal O_S)$ and every point $q \in W(\bar k)$ with $\pi(q) = p$, we have $q \in W(\mathcal O_{S'})$.

The next version keeps control of the field $k$, at the cost of introducing further algebraic varieties.

Theorem 3.4 Under the above hypotheses on $V, W, \pi$, there exist a finite set of places $S'$ of $k$ containing $S$ and finitely many algebraic varieties $W_1, \ldots, W_m$, all defined over $k$, endowed with morphisms $\pi_i : W_i \to V$ such that

• $V(\mathcal O_S) \subset \bigcup_{i=1}^m \pi_i(W_i(\mathcal O_{S'}))$;
• there exist isomorphisms $\psi_i : W_i \to W$, defined over $\bar k$, with $\pi \circ \psi_i = \pi_i$.

Remark 3.5 The Chevalley–Weil theorem is rather elementary, but it is often a most useful tool. It is an arithmetic analogue of the lifting of maps in homotopy theory. It also exemplifies the general principle (see also [Bo3]) that a functional property (here, that the function field extension $k(\tilde C)/k(C)$ is unramified) is reflected, by specialization, in a numerical property (here, that the extensions $k(\pi^{-1}(P))/k$ are almost unramified). The result may also be used in the proof of the weak Mordell–Weil theorem (see the remarks after Examples 3.6 and 3.8). See also Exercise 3.73 below for another application.

Below we shall sketch two proofs of Theorem 3.3 and a deduction of Theorem 3.4 from Theorem 3.3; first, we analyse some basic and instructive examples.

Example 3.6 Consider the affine curve $V = W = \mathbb G_m$ and the unramified map $\pi : x \mapsto x^n$, where $n \ge 2$. As we noticed, the integral points on $\mathbb G_m$ are the $S$-units, which form a finitely generated abelian group $\mathcal O_S^*$. The quotient of $\mathcal O_S^*$ by the subgroup of $n$th powers is then a finite group; let $\{\xi_1, \ldots, \xi_m\} \subset \mathcal O_S^*$ be a set of representatives for this quotient. Then each $S$-unit $u \in \mathcal O_S^*$ can be written in the form $u = \xi_j v^n$, for some index $j \in \{1, \ldots, m\}$ and some $S$-unit $v \in \mathcal O_S^*$. On letting $k'$ be the number field generated over $k$ by the $n$th roots of $\xi_1, \ldots, \xi_m$, and defining $S'$ as the set of places of $k'$ lying over those of $S$, we obtain $\pi^{-1}(V(\mathcal O_S)) \subset W(\mathcal O_{S'})$, thus confirming Theorem 3.3. To obtain the conclusion of Theorem 3.4, we define the varieties $W_j$, for $j = 1, \ldots, m$, to be again $\mathbb G_m$, but with the morphism $\pi_j : \mathbb G_m \to \mathbb G_m$ defined by $\pi_j(x) = \xi_j x^n$. Clearly, the points of the form $\xi_j v^n$ in $\mathcal O_S^*$ have an $S$-integral preimage under $\pi_j$, namely $v$. So $\mathbb G_m(\mathcal O_S) = \mathcal O_S^*$ is covered by the images $\pi_j(\mathbb G_m(\mathcal O_S))$, as predicted by Theorem 3.4.

A completely analogous situation arises when the algebraic group $\mathbb G_m$ is replaced by an elliptic curve $E$ and the map $\pi : E \to E$ is multiplication by $n$; in that case the Chevalley–Weil theorem applies to rational points and leads to the so-called weak Mordell–Weil theorem, stating that the quotient group $E(k)/nE(k)$ is finite. See also Example 3.8 below for a concrete application to an isogeny $E' \to E$, where $E'$ is isogenous but not isomorphic to $E$.

⁴ This is equivalent to saying that the corresponding holomorphic map $W(\mathbb C) \to V(\mathbb C)$ is a topological cover.

The next example shows that the hypothesis that the morphism is unramified cannot be omitted.

Example 3.7 Let now $V = W$ be the affine line $\mathbb A^1$ over the number field $k$. As before, $\pi : W \to V$ is the morphism $x \mapsto x^n$ raising to the $n$th power (for some $n \ge 2$), which now ramifies at the origin. The integral points of $V$ (and of $W$) are simply the $S$-integers in $\mathcal O_S$. Adjoining the $n$th roots of all the $S$-integers produces an extension of $k$ of infinite degree (already the field generated by the $n$th roots of the rational primes has infinite degree). Hence the conclusion of Theorem 3.3 does not hold in this case, and it is evident that the conclusion of Theorem 3.4 fails as well. Note that, if we remove the origin, which is the only ramified point, and consider only the integral points of $\mathbb A^1$ which are also integral with respect to the origin, we are back to the case of Example 3.6.

Let us examine an example involving rational points.

Example 3.8 Consider the smooth complete cubic curve $V$ defined in the projective plane by the homogeneous equation $ZY^2 = X(X - Z)(X + 6Z)$. The field of definition is taken to be the rational field $\mathbb Q$, and the ring of $S$-integers will be the usual ring of integers $\mathbb Z$.
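The arithmetic claims made in this example can be checked numerically on the multiples of the point $(2:4:1)$. The Python sketch below is our own illustration, using the standard chord-and-tangent formulas for $y^2 = x^3 + a_2x^2 + a_4x$ (here $a_2 = 5$, $a_4 = -6$). It computes $kP$ for $k = 1, \ldots, 8$ and verifies the two facts derived in the example: the denominator $b$ of $x = a/b$ is a square, and every prime other than 2 and 3 divides $a$ to an even power. It also shows that $2P = (25/16, -165/64)$ has non-integral coordinates; by the Nagell–Lutz theorem (an additional fact, not used in the text), torsion points on this integral model have integer coordinates, so $P$ has infinite order.

```python
from fractions import Fraction
from math import isqrt

A2, A4 = 5, -6  # affine model y^2 = x(x - 1)(x + 6) = x^3 + 5x^2 - 6x


def on_curve(P):
    x, y = P
    return y * y == x**3 + A2 * x**2 + A4 * x


def add(P, Q):
    """Chord-and-tangent addition; None stands for the point at infinity (the origin)."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and y1 == -y2:
        return None  # P + Q is the origin (also covers doubling a point with y = 0)
    if P == Q:
        lam = (3 * x1**2 + 2 * A2 * x1 + A4) / (2 * y1)  # tangent slope
    else:
        lam = (y2 - y1) / (x2 - x1)  # chord slope
    x3 = lam**2 - A2 - x1 - x2
    return (x3, lam * (x1 - x3) - y1)


def is_square(n):
    return n >= 0 and isqrt(n) ** 2 == n


def descent_check(x):
    """For x = a/b in lowest terms: b is a square, and |a| is a square once all
    factors 2 and 3 are removed (the two facts derived in Example 3.8)."""
    a, b = x.numerator, x.denominator
    m = abs(a)
    for p in (2, 3):
        while m and m % p == 0:
            m //= p
    return is_square(b) and is_square(m)


P = (Fraction(2), Fraction(4))  # the affine point (2, 4), i.e. (2 : 4 : 1)
multiples = [P]                  # multiples[k - 1] = kP
for _ in range(7):
    multiples.append(add(multiples[-1], P))

for k, (x, y) in enumerate(multiples, start=1):
    assert on_curve((x, y)) and descent_check(x)
    print(k, x)
```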
Now, the integral points coincide with the rational ones, since the variety in question is complete. The set of such points is infinite, since the rational point $(2:4:1)$ has infinite order for the group law having the point at infinity as origin. In affine coordinates, the equation becomes $y^2 = x(x-1)(x+6)$. Let us take a rational point $(x, y)$ (written in the affine model), where $x = a/b$ with $a, b$ coprime integers and $b > 0$. From the equation of the curve we obtain that the rational number
$$\frac{a}{b}\left(\frac{a}{b} - 1\right)\left(\frac{a}{b} + 6\right) = \frac{a(a-b)(a+6b)}{b^3}$$
must be a square in $\mathbb Q$. Clearly, the above fraction is reduced, so both the denominator $b^3$ and the numerator $a(a-b)(a+6b)$ must be squares. Now, since $a, b$ are coprime, $a$ and $a - b$ are also coprime; as for $a$ and $a + 6b$, we see immediately that either they are coprime or their greatest common divisor is 2, 3, or 6. So every prime dividing $a$, with the possible exception of 2 and 3, appears in the factorization of $a$ with even multiplicity. Then $a$ is a square in the number field $\mathbb Q(i, \sqrt 2, \sqrt 3)$, and so is $x = a/b$ (recall that $b$ is already a square in $\mathbb Q$, since $b^3$ is a square). Let now $W$ be the smooth projective model of the affine algebraic curve defined in $\mathbb A^3$ by the system

$$W : \begin{cases} y^2 = x(x-1)(x+6) \\ u^2 = x, \end{cases}$$
which is naturally endowed with a projection $\pi : W \to V$, corresponding to the field extension $\mathbb Q(V)(\sqrt x)/\mathbb Q(V)$. The above argument shows that each rational point $P \in V(\mathbb Q)$ has a preimage in $W(\bar{\mathbb Q})$ which is defined over the number field $\mathbb Q(i, \sqrt 2, \sqrt 3)$, so the conclusion of Theorem 3.3 is verified.

Let us now see how to construct the curves $W_j$ and the maps $\pi_j : W_j \to V$ as in Theorem 3.4. The three rational numbers $-1, 2, 3$ generate a multiplicative group of order 8 modulo rational squares; let $\{\varepsilon_1, \ldots, \varepsilon_8\}$ be a set of representatives for the quotient group. For $j = 1, \ldots, 8$, define the curve $W_j$ to be the smooth projective model of the affine curve given by the equations
$$W_j : \begin{cases} y^2 = x(x-1)(x+6) \\ u^2 = \varepsilon_j x. \end{cases}$$
We call it a twisted form of the curve $W$ defined above; it turns out to be isomorphic to $W$ over the field of algebraic numbers, but in general not over $\mathbb Q$ (which is a field of definition for each $W_j$ and for $W$). We also define $\pi_j : W_j \to V$ as before, by sending $(u, x, y) \mapsto (x, y)$. What we proved about the arithmetic of the rational points on $V$ can be rephrased by saying that for each rational point $P \in V(\mathbb Q)$ there exists an index $j \in \{1, \ldots, 8\}$ such that $\pi_j^{-1}(P)$ consists of rational points of $W_j(\mathbb Q)$. This is the conclusion of Theorem 3.4.

To finish the discussion of this example, let us check that the hypothesis of the Chevalley–Weil theorem is satisfied, namely that the covering map $\pi : W \to V$ between the two complex curves in question is unramified. Since the field extension $\mathbb C(W)/\pi^*(\mathbb C(V)) = \mathbb C(V)(\sqrt x)/\mathbb C(V)$ is obtained by adjoining the square root of the rational function $x$, ramification can arise only over the zeros and poles of $x$. Proving that in fact there is no ramification amounts to showing that $x \in \mathbb C(V)$ is locally a square everywhere, i.e. that all its poles and zeros have even multiplicity. Now, the only pole of $x$ is the point at infinity $(0:1:0)$, which has multiplicity two, while its only zero is the point $(0:0:1)$, which is a double zero.

In this example $W, V$ are elliptic curves, hence in particular algebraic groups, and the unramified map $\pi : W \to V$ is an isogeny. The argument just seen is at the basis of the proof that, for every elliptic curve $E$ over a number field $k$, the group $E(k)/2E(k)$ is finite (the weak Mordell–Weil theorem); from this fact, the full Mordell–Weil theorem, i.e. the finite generation of $E(k)$, follows by height considerations.

Let us now sketch a proof of the Chevalley–Weil theorem in its first form, Theorem 3.3; we shall then formally deduce the second form from the first.

Proofs of Theorem 3.3. As promised, we sketch two different proofs. In both arguments, the strategy consists of proving first that the fields of definition of the preimages of $S$-integral points of $V$ ramify only over a finite set of primes, independent of the chosen point. Since the degrees of these fields of definition are $\le \deg \pi$, we deduce that only finitely many fields can occur (this is Hermite's theorem); their compositum is then still a number field, and it contains all the coordinates of all the preimages of the $S$-integral points of $V$, which proves the assertion.

First Proof of Theorem 3.3 We now give the details.
In this first proof we suppose for simplicity that $V, W$ are affine irreducible varieties (this will be the most important case for future applications; moreover, the general case can be formally deduced from this particular one). The morphism $\pi : W \to V$ corresponds to an integral extension of $k$-algebras $k[W]/k[V]$; as explained above, after adding a suitable finite set to $S$, this can be obtained by scalar extension from an extension of $\mathcal O_S$-algebras. Let us suppose that we have carried out this operation on the finite set $S$, so that we have an integral extension $\mathcal O_S[W]/\mathcal O_S[V]$. Now, the fact that the morphism $\pi$ is unramified can be read algebraically as follows: take a basis $g_1, \ldots, g_n$, $n = \deg \pi$, of the vector space $k(W)$ over the field $k(V)$, such that $g_i \in k[W]$ for all $i = 1, \ldots, n$, and consider the $n \times n$ matrices
$$T(g_1, \ldots, g_n) := (\sigma_j(g_i))_{1 \le i, j \le n},$$
where $\sigma_1, \ldots, \sigma_n$ denote all the embeddings of $k(W)$ into a fixed algebraic closure of $k(V)$ that leave the subfield $k(V)$ pointwise fixed. Note that $(\det T)^2$ belongs to $k[V]$. The ideal generated by all such determinants, for varying bases, is the unit ideal precisely when $\pi$ is unramified. Since $k[V]$ is a Noetherian ring, under this condition there exist finitely many choices of $n$-tuples $(g_1, \ldots, g_n)$ as above such that the corresponding determinants generate the unit ideal of $k[V]$. Hence, after enlarging $S$ once again to a finite set $S' \supset S$, we obtain that the constant function 1 also belongs to the corresponding ideal of the ring $\mathcal O_{S'}[V]$, i.e. the ideal generated by the determinants $(\det T)^2$, where $T$ is now constructed from $n$-tuples $(g_1, \ldots, g_n) \in \mathcal O_{S'}[W]^n$.

Take now an $S'$-integral point $P \in V(\mathcal O_{S'})$ and a point $Q \in W(\bar k)$ lying above $P$: $\pi(Q) = P$. Denote by $k(Q)$ the field generated over $k$ by the coordinates of $Q$. On choosing all the possible $n$-tuples $(g_1, \ldots, g_n) \in \mathcal O_{S'}[W]^n$ which are linearly independent over $k(V)$, the values $g_1(Q), \ldots, g_n(Q)$ generate the field $k(Q)$. Since the corresponding determinants $\det(T(g_1, \ldots, g_n))^2$ generate the unit ideal in $\mathcal O_{S'}[V]$, the extension $k(Q)/k$ can ramify only over the places of $S'$. Hence all the fields $k(Q)$, as $P$ varies in $V(\mathcal O_{S'})$ and $Q$ in $\pi^{-1}(P)$, are unramified outside $S'$ and of course have degree $\le \deg \pi$; in view of Hermite's theorem mentioned above, this concludes the proof.

The following example shows that at some places the corresponding field extension can indeed ramify, although the original morphism between algebraic varieties in characteristic zero is unramified. Consider again the example $W = V = \mathbb G_m$, with $\pi(x) := x^2$, and put $\mathcal O_S = \mathbb Z[1/3]$. The $S$-integral points are of the form $\pm 3^n$, $n \in \mathbb Z$, and the field generated by their preimages is $\mathbb Q(i, \sqrt 3)$. Here the prime 2 ramifies. Geometrically, this corresponds to the fact that the morphism $x \mapsto x^2$ is not separable in characteristic 2; in particular it is ramified there.

Second proof of Theorem 3.3 In this second argument we suppose for simplicity that $V, W$ are projective, so we shall be interested in rational points.
Let then $\pi : W \to V$ be a finite unramified morphism defined over $k$. As in the first proof, the crucial point consists of proving that the extensions $k(Q)/k$, for $Q \in W(\bar k)$ with $\pi(Q) \in V(k)$, are unramified outside a finite set depending only on the map $\pi : W \to V$, not on $Q$. Let us consider the Galois closure $X \to V$ of the cover $\pi : W \to V$, which is still unramified; the conclusion of Theorem 3.3 for the cover $X \to V$ implies the same conclusion for the original cover $W \to V$. Hence we can suppose that $\pi : W \to V$ is Galois, with Galois group $G$ (so that $|G| = \deg \pi$). The action of $G$ on $W$ might be defined only over a finite extension of $k$, but again this would create no problem in our proof, so we shall suppose that $k$ is a number field over which $V, W, \pi$ and the action of $G$ are all defined. The fact that the morphism $\pi$ is unramified can be stated by saying that, for each $g \in G$ with $g \neq 1$, the subvariety of $W$ where $g(x) = x$ is empty. On reducing modulo a prime (or valuation) $\nu$ of $k$, we obtain that the same remains true for the corresponding varieties over the residue fields, up to finitely many exceptions: $G$ acts freely on the reduction of $W$ modulo $\nu$. Let $S$ be the finite set of primes responsible for such exceptions.

Let $P \in V(k)$ be a rational point and $Q \in W(\bar k)$ be in the preimage $\pi^{-1}(P)$ of $P$. Now let $\nu$ be a prime (valuation) of $k$ outside $S$. We want to prove that the extension $k(Q)/k(P)$ is unramified at $\nu$. For this purpose, we let $\Gamma$ be the Galois group of the Galois closure of $k(Q)/k(P)$, and observe that for each $\gamma \in \Gamma$ there exists $g \in G$ with $\gamma(Q) = g(Q)$ (indeed $\gamma(Q)$ lies in the fiber over $P$, on which $G$ acts transitively). Suppose by contradiction that the extension is ramified, and let $\gamma \in \Gamma$ be an element of the inertia group such that $\gamma(Q) \neq Q$ but $\gamma(Q) \equiv Q$ modulo $\nu$. Letting $g \in G$ be as above (i.e. coinciding with $\gamma$ on $Q$), we obtain that $g \neq 1$ (because $g(Q) \neq Q$), but $g(Q) \equiv Q$ modulo $\nu$, contradicting the fact that the reduction of $g$ modulo $\nu$ has no fixed point. This contradiction proves that the extension $k(Q)/k(P)$ is unramified at each place $\nu \notin S$. The rest of the proof runs as before.

Sketch of deduction of Theorem 3.4 from Theorem 3.3 Suppose we have two varieties $V, W$, defined over a number field $k$, and an (unramified) morphism $\pi : W \to V$ such that, for a number field $k'$ extending $k$, the inclusion $V(k) \subset \pi(W(k'))$ holds. The construction of the varieties $W_j$ appearing in Theorem 3.4 makes use of the so-called restriction-of-scalars functor, whose construction we now recall (see also [Se3], Section 3.2). Given a field extension $k'/k$ and a variety $W$ over $k'$, one can construct another variety $\tilde W := \mathrm{Res}_{k'/k}(W)$ as follows: put $\tilde W = \prod_\sigma W^\sigma$, where $\sigma$ runs over all the $k$-embeddings $k' \to \bar k$ and $W^\sigma$ is the twist of $W$ by $\sigma$. For each Galois automorphism $\tau \in \mathrm{Gal}(\bar k/k)$ there is a natural isomorphism between $\tilde W^\tau$ and $\tilde W$, so $\tilde W$ comes from a $k$-variety by extension of scalars. There is a natural set-theoretic identification $W(k') \cong \tilde W(k)$.
In the case of interest to us, $W$ will be defined over $k$, so $\tilde W$ will be isomorphic to $W^{[k':k]}$ over $\bar k$ (while, by construction, it is defined over $k$). However, the action of $\mathrm{Gal}(\bar k/k)$ on the factors will not be trivial, so $\tilde W$ will not, in general, be isomorphic over $k$ to $W^{[k':k]}$. Also, $W$ embeds diagonally into $\tilde W$; let us denote by $\Delta$ the image of this embedding. In the identification of $W(k')$ with $\tilde W(k)$, the subset $W(k)$ is identified with $\Delta(k)$.

Let us now come back to the situation of Theorem 3.3, with two $k$-varieties $V, W$ and a morphism $\pi : W \to V$. This morphism induces a corresponding morphism $\tilde\pi : \tilde W \to \tilde V$, defined over $k$, where $\tilde V, \tilde W$ are obtained from $V$ and $W$ by restriction of scalars from $k'$ to $k$, as explained above. Also $V$ embeds diagonally into $\tilde V$, and we denote by $\Delta_V$ the image of $V$ inside $\tilde V$. Let $\tilde X := \tilde\pi^{-1}(\Delta_V) \subset \tilde W$. It is a variety defined over $k$. In view of the identification $W(k') \cong \tilde W(k)$ and of the inclusion $V(k) \subset \pi(W(k'))$, which can also be written as $V(k) = \pi(W(k') \cap \pi^{-1}(V(k)))$, we obtain that $\Delta_V(k) = \tilde\pi(\tilde X(k))$. Recall that $\Delta_V$ is isomorphic to $V$ and that the restriction $\tilde\pi|_{\tilde X} : \tilde X \to \Delta_V$ is a finite map. This concludes the construction of the varieties $W_i$ appearing in Theorem 3.4: they are just the irreducible components of $\tilde X$. It remains to prove that each such component is geometrically isomorphic to $W$. This follows from the fact that, over an algebraic closure $\bar k$ of $k$, the varieties $\tilde V$ (resp. $\tilde W$) are isomorphic to $V^{[k':k]}$ (resp. $W^{[k':k]}$).
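Example 3.6 is completely explicit when $k = \mathbb Q$: for $\mathcal O_S = \mathbb Z[1/p_1, \ldots, 1/p_s]$ the group $\mathcal O_S^*$ is $\{\pm 1\} \times p_1^{\mathbb Z} \times \cdots \times p_s^{\mathbb Z}$, and reducing exponents modulo $n$ yields representatives $\xi_j$. The following sketch (function names hypothetical) writes an $S$-unit as $u = \xi_j v^n$. For $S = \{\infty, 2, 3\}$ and $n = 2$ the representatives are $\pm 1, \pm 2, \pm 3, \pm 6$, so $k' = \mathbb Q(i, \sqrt 2, \sqrt 3)$, the same field met in Example 3.8; for $\mathcal O_S = \mathbb Z[1/3]$ they are $\pm 1, \pm 3$, giving the field $\mathbb Q(i, \sqrt 3)$ of the example following the first proof of Theorem 3.3.

```python
from fractions import Fraction
from itertools import product


def split_s_unit(u, primes, n=2):
    """Write an S-unit u of O_S = Z[1/primes] as u = xi * v**n (Example 3.6 with k = Q),
    where xi lies in the fixed finite set returned by representatives(primes, n)."""
    u = Fraction(u)
    if u == 0:
        raise ValueError("0 is not an S-unit")
    sign = -1 if u < 0 else 1
    num, den = abs(u.numerator), u.denominator
    # For even n the sign cannot be absorbed into v**n, so it goes into xi;
    # for odd n we have -1 = (-1)**n, and the sign goes into v.
    xi, v = (Fraction(sign), Fraction(1)) if n % 2 == 0 else (Fraction(1), Fraction(sign))
    for p in primes:
        e = 0
        while num % p == 0:
            num //= p
            e += 1
        while den % p == 0:
            den //= p
            e -= 1
        r = e % n  # 0 <= r < n: the part of the exponent kept in xi
        xi *= Fraction(p) ** r
        v *= Fraction(p) ** ((e - r) // n)
    if num != 1 or den != 1:
        raise ValueError("u has a prime factor outside S")
    return xi, v


def representatives(primes, n=2):
    """Representatives of O_S^* / (O_S^*)^n for k = Q, in the normal form used above."""
    signs = (1, -1) if n % 2 == 0 else (1,)
    reps = []
    for s in signs:
        for rs in product(range(n), repeat=len(primes)):
            xi = Fraction(s)
            for p, r in zip(primes, rs):
                xi *= Fraction(p) ** r
            reps.append(xi)
    return reps


reps = representatives((2, 3))  # the 8 classes +-1, +-2, +-3, +-6
for u in (Fraction(-72), Fraction(1, 12), Fraction(27, 32)):
    xi, v = split_s_unit(u, (2, 3))
    assert xi in reps and xi * v**2 == u
    print(u, "=", xi, "*", v, "^2")
```

Adjoining the $n$th roots of the finitely many values in `representatives` is exactly the passage from $k$ to $k'$ in Example 3.6.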

3.3 Integral Points on Curves: Siegel's Theorem

In Chapter 1 we recalled a few simple and classical results on integral points on affine lines or conics; we have seen that their set may sometimes be infinite. (As remarked above, this is always the case if $k$, $S$ are large enough.) Then we have seen how the results obtained by Thue, Roth, Mahler and Ridout in Diophantine approximation imply the finiteness of integral points on other families of curves, e.g. those defined by Thue's equation $f(X, Y) = c$, where $f$ is a form of degree $\ge 3$ without multiple factors and $c$ is a non-zero constant. Such equations, even though in a sense they are fairly general (for instance, their degree is unbounded), represent curves of a rather special type. It is of course natural to ask what happens for an arbitrary curve. This problem was solved by Siegel in 1929, in a way that may be considered complete if one disregards the question of effectivity, which has not yet been settled in general. In Mahler's version for $S$-integers, Siegel's theorem may be stated as follows.

Theorem 3.9 (Siegel 1929 for $S = M_\infty$, Mahler for genus 1, any $S$) Let $C$ be an affine irreducible algebraic curve. Suppose that $C(\mathcal O_S)$ is infinite; then $C$ has genus zero and at most two points at infinity.

Let us pause to appreciate this remarkable result; for instance, it easily implies (see Exercise 3.67 below) the Thue–Mahler theorem, Theorem 1.42. The most concrete case (roughly equivalent to the general one) occurs with a plane curve, defined by an irreducible equation $f(X, Y) = 0$. The theorem implies that, if there are infinitely many pairs $(p, q) \in \mathcal O_S^2$ with $f(p, q) = 0$, then the curve has genus zero and at most two asymptotic directions (over $\mathbb C$).

We recall that the genus of an (irreducible) curve is a natural number; it can be defined in several ways (see e.g. [L4]). We have, for instance, a topological definition. In fact, it may be shown that the set $C(\mathbb C)$ of complex points of a curve $C$ is, apart from a finite number of singularities, homeomorphic to a sphere with $g$ handles from which a finite number of points has been removed; this integer $g$ is precisely the genus. Hence, for a curve of genus zero, the set $C(\mathbb C)$ is, up to a finite set, homeomorphic to the Riemann sphere $S^2$. Algebraically, the curves of genus zero are those which may be parametrized, i.e. those that are birational to $\mathbb P^1$ (over $\mathbb C$). This amounts to the existence of a rational map $\phi : \mathbb P^1 \to C$ which is bijective up to a finite set of exceptional points. In the case of a plane curve as above, this is in turn equivalent (Lüroth's theorem, see [Sch1]) to the existence of rational functions $r(t), s(t) \in \mathbb C(t)$, not both constant, such that $f(r(t), s(t)) = 0$ identically. Starting from this parametrization of the complex points, it is possible to parametrize the integral points as well; however, this cannot always be done with rational functions!⁵ Significant examples occur with the line, the parabola and the hyperbola, which were considered in Chapter 1. In these three cases we have respectively one, one and two points at infinity. Correspondingly, the parametrization takes a polynomial or an exponential shape. We have here simple examples of how the geometry of the affine curve affects the distribution of integral points: with two points at infinity the integral points are much sparser than with a single point at infinity, while three points at infinity already imply finiteness (irrespective of the genus), by Siegel's theorem.

It is important to note that the genus is computable in a systematic algebraic way, starting from a defining system of equations for the curve; hence the condition in the conclusion of the theorem is easy to check, and this can be done independently of arithmetical concepts. Further, we remark that the theorem is best possible, in the sense that a kind of converse is true (see Exercise 3.64 below): if the curve $C/k$ is non-singular and satisfies the conclusion, then $C(\mathcal O_{k,S})$ is infinite for suitably "large" $k$, $S$. We may thus say that the structure of the complex points of a curve determines the existence of infinitely many integral points. It is also worth noticing that the non-singularity assumption is not restrictive; in fact, we have already observed that for the analysis of integral points one can work with normal varieties, which for curves amounts to non-singularity. It is, however, possible that in a non-singular model the number of points at infinity increases; in this case the result becomes even stronger. Unfortunately, the theorem is still ineffective (except for certain special cases). We now provide some alternative formulations of Siegel's theorem.
Further, we remark that the theorem is a best-possible result, in the sense that a kind of converse is true (see Exercise 3.64 below); namely, if the curve C/k is non-singular and satisfies the conclusion, then C(Ok,S ) is infinite for suitably “large” k, S. We may thus say that the structure of the complex points of a curve determines the existence of an infinity of integral points. It is also worth noticing that the non-singularity assumption is not restrictive; in fact, we have already observed that for the analysis of integral points one can work with normal varieties, which for curves amounts to non-singularity. It is, however, possible that in a non-singular model the number of points at infinity increases; in this case the result becomes even stronger. It is unfortunately still ineffective (except for certain special cases). We now provide some alternative formulations of Siegel’s theorem. 5

⁵ In contrast, this is the case for the set of rational points, provided that a non-singular rational point exists.
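The exponential shape of the parametrization for the hyperbola can be seen on Pell's equation $x^2 - 2y^2 = 1$, which appeared in Section 3.1 as the conic $v^2 - 2u^2 = u$ via $x = 1 + 4u$, $y = 2v$. The sketch below relies on the standard fact that the positive solutions are given by $x + y\sqrt 2 = (3 + 2\sqrt 2)^m$, $m \ge 1$; it lists them with exact integer arithmetic and keeps those with $x \equiv 1 \pmod 4$, which yield integral points $(u, v)$ on the conic. The number of digits grows roughly linearly in $m$, so the integral points are exponentially sparse, in contrast with the polynomial parametrization available for the line and the parabola.

```python
def pell_solutions(count):
    """First `count` positive solutions of x^2 - 2y^2 = 1, read off from
    x + y*sqrt(2) = (3 + 2*sqrt(2))**m, m = 1, 2, ... (exact integer arithmetic)."""
    sols, x, y = [], 3, 2
    for _ in range(count):
        sols.append((x, y))
        x, y = 3 * x + 4 * y, 2 * x + 3 * y  # multiply by 3 + 2*sqrt(2)
    return sols


def conic_points(count):
    """Integral points (u, v) on the conic v^2 - 2u^2 = u, obtained through the
    substitution x = 1 + 4u, y = 2v from the Pell solutions above."""
    pts = []
    for x, y in pell_solutions(count):
        if (x - 1) % 4 == 0 and y % 2 == 0:
            pts.append(((x - 1) // 4, y // 2))
    return pts


for m, (x, y) in enumerate(pell_solutions(6), start=1):
    print(m, x, y, len(str(x)))  # digit count grows roughly linearly in m
print(conic_points(6))
```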

It was remarked by Lang ([L2], Chapter 8, Theorem 2.4) that Siegel's theorem can be rephrased as follows.

Theorem 3.10 (Siegel's theorem – alternative version) Let $C$ be a smooth complete curve over a number field $k$. Let $\phi \in k(C)$ be a non-constant function, and $\mathcal O_S \subset k$ a ring of $S$-integers. The set of rational points $p \in C(k)$ such that $\phi(p) \in \mathcal O_S$ is finite, unless $C$ is rational and $\phi$ has at most two poles.

Here we give an improvement, as follows.

Theorem 3.11 (Generalized Siegel's theorem) Let $\pi : X \to C$ be a finite morphism between smooth projective curves, defined over a number field $k$. Let $\Sigma$ be a finite non-empty set of points in $C(\bar k)$. Let $\mathcal O_S \subset k$ be a ring of $S$-integers. Finally, let $\phi_1, \ldots, \phi_h \in k(X)$ be rational functions on $X$ such that the union of their poles is sent by $\pi$ precisely onto $\Sigma$. Suppose that there are infinitely many algebraic points $p \in X(\bar k)$ such that $\pi(p) \in C(k)$ and such that $\phi_i(p)$ is an algebraic $S$-integer for each $i = 1, \ldots, h$. Then $C$ has genus $g = 0$ and $|\Sigma| \le 2$.

Remark If $\Sigma$ is empty, by using Faltings' theorem (formerly the Mordell conjecture) one can obtain the conclusion that the genus of $C$ is $\le 1$.

In the proof we make use of the following tool from Galois theory. Given a Galois cover $\pi : X \to C$ of algebraic curves over a number field $k$, with $\Gamma = \mathrm{Gal}(X/C)$, let $p \in X(\bar k)$ be an algebraic point of $X$ outside the ramification locus of $\pi$ which is sent to a rational point $\pi(p) \in C(k)$. Then the fiber $\pi^{-1}(\pi(p))$ is a union of orbits for the Galois group $\mathrm{Gal}(\bar k/k)$. The decomposition group of $p$ is the subgroup $\Delta \subset \Gamma$ stabilizing the $\mathrm{Gal}(\bar k/k)$-orbit of $p$. One can prove that this group is isomorphic to $\mathrm{Gal}(k(p)/k)$.⁶

Proof We reduce to an ordinary case of Siegel's theorem, which is recovered on taking $X = C$. We may and shall assume that $\Sigma$ is non-empty. First, we may suppose that the cover is Galois, with group denoted by $G$. For $p$ in our infinite set, denoted by $R$, we let $\Delta_p$ be the decomposition group at $p$; since $G$ has only finitely many subgroups, on passing to an infinite subset we may assume that $\Delta_p = \Delta$ is the same for all $p \in R$.
We let $Y$ be the smooth curve corresponding to $\Delta$, i.e., $Y = X/\Delta$. Note that the natural map $X \to Y$ sends every $p \in R$ to a rational point of $Y$. Hence we may replace $C$ with $Y$ and assume at the outset that $Y = C$ and $\Delta = G$. We may find a function $\psi \in k(X)$ in the algebra generated by the $\varphi_i$ such that the set $T$ of poles of $\psi$ projects surjectively onto $\Sigma$ through $\pi$. Let $\psi^m + a_1\psi^{m-1} + \cdots + a_m = 0$ be the minimal equation satisfied by $\psi$ over $k(C)$. Note that the $a_j$ are rational functions on $C$, not all constant, and such

⁶ If $p$ is ramified, the same holds after taking the quotient by the inertia group.

3.3 Integral Points on Curves: Siegel’s Theorem


that the set of their poles is precisely $\Sigma$. In fact, these coefficients are symmetric functions of the conjugates of $\psi$ over $k(C)$, so their poles lie in $\Sigma$. Now, for $p \in R$ the polynomial $Z^m + a_1(p)Z^{m-1} + \cdots + a_m(p)$ has coefficients in $k$ and is irreducible over $k$, because the decomposition group is $G$. Since one of its roots is $\psi(p)$, which is an algebraic $S$-integer, all the roots, being conjugate to $\psi(p)$, have the same property. Hence all the coefficients $a_i(p)$, which are elementary symmetric functions of these roots and lie in $k$, are in $\mathcal{O}_S$. The result now follows immediately from Siegel’s theorem.

3.3.1 A Sketch of Siegel’s Argument, with Modern Tools

We give in this section an overview, in modern language, of Siegel’s original proof published in [Sie] (there is nowadays also an English translation, [FZ], of Siegel’s paper, containing a discussion of Siegel’s proof and its developments from a modern viewpoint). Most modern proofs in the literature (see [HiSi], [L2], [Se1]) follow in substance Siegel’s original argument, with a technical simplification due to the use of Roth’s theorem (Siegel had at his disposal only a weaker version thereof). One also uses the embedding of a curve in its Jacobian variety $J$, the structure of $J(k)$, and the behaviour of the height on $J(\mathbb{Q})$. Without discussing any of these concepts in detail, we shall now briefly survey the essentials of this proof. In the next section we shall present a new proof based on the subspace theorem, which entirely avoids recourse to Jacobians and their arithmetic.

The case of genus $g = 0$ with three (or more) points at infinity was considered at the end of Section 3.1: now $\tilde C$ is birational to $\mathbb{P}^1$ and we have seen that removing three points leads to the $S$-unit equation $x + y = 1$, $x, y \in \mathcal{O}_S^*$, which was treated in the previous chapter (e.g., by means of Roth’s generalized theorem). Let us then suppose that the genus $g$ is positive, considering for simplicity the case $k = \mathbb{Q}$, $\mathcal{O}_S = \mathbb{Z}$. Assume, for a contradiction, that we are given an infinite sequence of pairwise distinct integral points $P_i \in C(\mathbb{Z})$. Going to an infinite subsequence, we may assume that $P_i$ converges (in the usual absolute value) to some point at infinity $Q$ (which will have algebraic projective coordinates). Let $\tilde C$ be a projective completion of our curve; locally near any point, we may define a distance function (e.g., by means of affine coordinates that are regular near that point).
Consider now the distance $d(P_i, Q)$; for geometrical reasons it will be $\ll |P_i|^{-\delta}$, where $\delta > 0$ depends only on the distance function and where $|P|$ is the maximum absolute value of the coordinates. On the other hand, the $P_i$ are integral points, whence $H(P_i) \le |P_i|$ and $d(P_i, Q) \ll H(P_i)^{-\delta}$.


However, $Q$ is an algebraic point and the $P_i$ are in particular rational points. Then, if $\varepsilon > 0$, we have $d(P_i, Q) \gg H(P_i)^{-2-\varepsilon}$ by Roth’s theorem (applied to the coordinates). If $\delta$ happens to be $> 2$, the above inequalities are compatible only for finitely many points (it suffices to choose $\varepsilon < \delta - 2$), concluding the argument.⁷

Siegel was able to reduce the general case to this approach by embedding the curve in its Jacobian variety; we recall that this is an abelian variety (of dimension $g$), namely an irreducible projective algebraic variety endowed with an algebraic group law (which is shown to be necessarily commutative). Concerning this, here we say only a little more on the special, though very important, case when the curve is a plane non-singular cubic; we then have $g = 1$ and the Jacobian may be identified with the curve itself. The group law may be explained geometrically: with a pair of points $P, P'$ on the curve we associate a third one, $P * P'$, i.e. the remaining intersection of the cubic with the line through $P$ and $P'$ (or with the tangent at $P$ if $P = P'$). Having selected a point $O$ on the curve (it will be the identity element), we set $P + P' := O * (P * P')$. This is the famous group law on an elliptic curve defined by a plane cubic. The procedure sketched here, which was apparently first observed by Newton, is often called the chord and tangent process. (See [Sil1] for this case of genus 1 and, for example, [BoG] or [HiSi] for some general theory of abelian varieties, especially from the arithmetical viewpoint.) For the Jacobian variety (into which the curve may be embedded) and its group $J(k)$ of $k$-rational points, where $k$ is a number field, we have the celebrated Mordell–Weil theorem (see [BoG], [HiSi], [L2], [Se1]): $J(k)$ is a finitely generated abelian group.
In particular (according to the “weak Mordell–Weil theorem”), for any positive integer $m$, the quotient group $J(k)/mJ(k)$ is finite (see the remarks following Example 3.8 for the one-dimensional case). Siegel applied this theory to the above context. The points $P_i$ lie in particular in $J(\mathbb{Q})$. By applying the weak Mordell–Weil theorem we may write, going to an infinite subsequence of the points, $P_i = mP_i' + R$, where $P_i', R \in J(\mathbb{Q})$ and $R$ is fixed for the whole subsequence. Since $J(\mathbb{C})$ is compact, on going further to an infinite subsequence we may assume that the $P_i'$ converge to a point $Q' \in J$. Since $P_i \to Q$, we see that $Q = mQ' + R$, and it follows that $Q'$ is an algebraic point (multiplication by $m$ being a finite morphism defined over $\mathbb{Q}$). Let us now apply Roth’s theorem as above, but replacing $C$ with $J$ and $P_i, Q$ with $P_i', Q'$. We obtain (on choosing $\varepsilon = 1$) $d(P_i', Q') \gg H(P_i')^{-3}$.

⁷ This is actually the proof in the case of Thue’s equations $f(X, Y) = c$ (where $f$ is a form without multiple factors), provided we take as distance function at the point at infinity $(\alpha : 1 : 0)$ the quantity $|(X/Y) - \alpha|$; here we use projective coordinates $(X : Y : Z)$ and $\alpha$ is a root of $f(t, 1)$.
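A quick numerical check of the remark on Thue’s equations (a sketch of our own: the cubic form $f(X, Y) = X^3 - 2Y^3$ and the sampling box are arbitrary choices, not from the text). Since $x^3 - 2y^3 = (x - \alpha y)(x^2 + \alpha xy + \alpha^2 y^2)$ with $\alpha = \sqrt[3]{2}$, and the second factor is at least $\alpha^2 y^2$ when $x \ge 0$, every integral point with $y \ge 1$ satisfies $|x/y - \alpha| \le |c|/(\alpha^2 y^3)$, where $c = x^3 - 2y^3$. On a fixed Thue curve this is the bound $d \ll |P|^{-\delta}$ with $\delta = 3 > 2$.

```python
# Sketch: check |x/y - alpha| * y**3 <= |x**3 - 2*y**3| / alpha**2 on a sample box.
ALPHA = 2 ** (1 / 3)  # real cube root of 2, a root of f(t, 1) = t**3 - 2


def thue_ratio(x: int, y: int) -> float:
    """Return |x/y - alpha| * y**3 / |c| with c = x**3 - 2*y**3 (x >= 0, y >= 1)."""
    c = x ** 3 - 2 * y ** 3  # never zero, because alpha is irrational
    return abs(x / y - ALPHA) * y ** 3 / abs(c)


# All integral points (x, y) in a box, whatever value c they take.
ratios = [thue_ratio(x, y) for y in range(1, 300) for x in range(0, 3 * y)]
bound = 1 / ALPHA ** 2
print(f"max ratio {max(ratios):.6f} <= 1/alpha^2 = {bound:.6f}")
```

Combined with Roth’s lower bound $|x/y - \alpha| \gg y^{-2-\varepsilon}$, this leaves only finitely many integral points on each curve $x^3 - 2y^3 = c$.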


To conclude, we still need two crucial observations. The first, more elementary, one is that $P \mapsto mP$ is a covering map, whence locally it “almost” preserves the distance (that is, up to a constant factor); the same may be proved to hold for the translation $P \mapsto P + R$ on the Jacobian, whence

$$ d(P_i, Q) \gg d(P_i', Q'). \qquad (3.1) $$

The second, more sophisticated, fact concerns the behaviour of the height on $J(k)$ with respect to an endomorphism. In the case of the multiplication-by-$m$ map followed by a translation by $R$ (where $m$ and $R$ are fixed), one can prove that (for a suitable projective embedding of $J$) $H(mP + R) \gg H(P)^{m^2/2}$ (where the exponent $m^2/2$ could be replaced with $m^2(1 - \varepsilon)$, a sharpening which is immaterial here). Hence $H(P_i) = H(mP_i' + R) \gg H(P_i')^{m^2/2}$.

Remark 3.12 A simpler but illustrative analogue appears in the case of $\mathbb{G}_m$ in place of $J$: let in fact $P \in \mathbb{Q}^* = \mathbb{G}_m(\mathbb{Q})$ be “near” to 1, say. Then, for fixed $m$, the distance $|P^m - 1| \ll |P - 1|$; but $H(P^m) = H(P)^m$ may be much bigger than $H(P)$.

Therefore, while the map $P \mapsto mP + R$ does not strongly deform the distances, it appreciably changes the heights. Siegel took advantage of this phenomenon: by (3.1) and by applying Roth’s theorem to the $P_i', Q'$, one finds as before $d(P_i, Q) \gg H(P_i')^{-3}$. Using now the transformation inequality for the height, which gives $H(P_i') \ll H(P_i)^{2/m^2}$, one obtains $d(P_i, Q) \gg H(P_i)^{-6/m^2}$. We see that, for large $m$, this substantially strengthens the direct consequence of Roth’s theorem for the $P_i, Q$. In particular, on choosing $m$ with $m^2 > 6/\delta$ (for instance $m > 6/\delta$), we deduce that $H(P_i)$ is bounded (recall that $d(P_i, Q) \ll H(P_i)^{-\delta}$), a contradiction which concludes the argument.
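Both phenomena can be seen on toy examples. The sketch below is our own (the point $P = 101/100$, the curve $y^2 = x^3 - 2$ with base point $(3, 5)$, and all helper names are illustrative choices). For $\mathbb{G}_m$, the ratio $(P^m - 1)/(P - 1) = 1 + P + \cdots + P^{m-1}$ stays between $m$ and $mP^{m-1}$, while $H(P^m) = H(P)^m$. For a plane cubic in Weierstrass form, taking $O$ to be the point at infinity, the tangent step of the chord and tangent process gives the doubling formula below, and the logarithm of the height of the $x$-coordinate grows roughly quadratically along $P, 2P, 4P$.

```python
from fractions import Fraction


def height(q: Fraction) -> int:
    """Naive height of a rational number: max(|numerator|, denominator)."""
    return max(abs(q.numerator), q.denominator)


# (1) The G_m analogue of Remark 3.12: P close to 1, m fixed.
P, m = Fraction(101, 100), 5
ratio = (P ** m - 1) / (P - 1)  # equals 1 + P + ... + P**(m - 1)
print("distance ratio:", float(ratio))
print("heights:", height(P ** m), "=", height(P) ** m)

# (2) Doubling on E: y^2 = x^3 + A*x + B, identity O at infinity (Weierstrass form).
A, B = 0, -2


def on_curve(pt) -> bool:
    x, y = pt
    return y * y == x ** 3 + A * x + B


def double(pt):
    """Tangent step of the chord-and-tangent law: returns 2*pt (requires y != 0)."""
    x, y = pt
    lam = (3 * x * x + A) / (2 * y)  # slope of the tangent line at pt
    x3 = lam * lam - 2 * x           # x-coordinate of the third intersection
    return x3, lam * (x - x3) - y    # reflecting gives 2*pt = O * (pt * pt)


points = [(Fraction(3), Fraction(5))]  # P = (3, 5) lies on y^2 = x^3 - 2
for _ in range(2):
    points.append(double(points[-1]))  # 2P, then 4P
print("x-heights of P, 2P, 4P:", [height(x) for x, _ in points])
```

The distance ratio stays close to $m$, while the heights of the $x$-coordinates jump from 3 to 129 to a ten-digit number.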

3.4 Another Approach to Siegel’s Theorem

As anticipated above, we shall now describe another proof of Siegel’s theorem, proposed in [CZ4], in which the subspace theorem replaces Roth’s theorem. This proof advantageously avoids any recourse to the arithmetic of the Jacobian (and even to its existence). Beyond this methodological point, we shall later point out other advantages of the method: it leads to quantitative conclusions which are often superior to those coming from the classical argument; also, it may sometimes be applied to affine varieties of dimension $> 1$ whose divisor at infinity is sufficiently reducible.


We saw in Section 3.3 that Siegel’s theorem may be proved without appealing to the Jacobian if the number $\delta$ defined there, which depends on the chosen metric, is $> 2$. This happens only for certain special curves, for example those defined by Thue’s equations. The new principle is to change the embedding of the curve, in order to get an advantageous induced metric; in practice, this amounts to the existence of (many) linear spaces with high-order contact with the curve at a point at infinity. To this end, it proves necessary to increase freely the dimension of the ambient space; it is here that Roth’s theorem no longer suffices, and a multi-dimensional extension of it, namely the subspace theorem, becomes necessary.

Let then $C$ be as in Theorem 3.9, let $\tilde C$ be its projective completion, and let $\tilde C \setminus C = \{Q_1, \dots, Q_r\}$ be the set of points at infinity ($Q_i \ne Q_j$ for $i \ne j$). The construction alluded to above in fact succeeds only if $r \ge 3$, so we start with this case; we shall see later how to deduce the general one by means of a rather classical principle, which involves going to an unramified cover of $C$.

Theorem 3.13

If r ≥ 3 then C has only a finite number of S-integral points.

Proof We have already observed that $\tilde C$ may be assumed to be non-singular; upon enlarging $k$ we can also assume that all the $Q_i$ are defined over $k$. For a positive integer $N$, to be specified in what follows, let us consider the vector space $V = V_N$ over $k$, made up of rational functions in $k(C)$ having poles at most at the $Q_i$ (and hence regular on $C$) with orders $\le N$; namely

$$ V = V_N = \{\varphi \in k(C) : \mathrm{div}(\varphi) \ge -N(Q_1 + \cdots + Q_r)\}. $$

We recall a weak version of the Riemann–Roch theorem (see [L4] or [Se2]), amenable to an easy proof, which states that for all $N > 0$

$$ d = d_N := \dim_k V_N \ge Nr - c, $$

where $c$ (which might be taken equal to $g - 1$) depends only on $C$. Let $N$ be so large that $d \ge 2N + 2$ (this is possible since $r \ge 3$: it suffices that $N(r - 2) \ge c + 2$), and let $\{\varphi_1, \dots, \varphi_d\}$ be a basis for $V$. Let now $\{P_n\}$ be an infinite sequence of distinct $S$-integral points. Then, since the $\varphi_i$ are regular on $C$, on multiplying them by a suitable non-zero integer if necessary, we shall have $\varphi_i(P_n) \in \mathcal{O}_S$ for $i = 1, \dots, d$ and for all $n \in \mathbb{N}$.

Now, since $\tilde C$ is projective, $\tilde C(k_v)$ is compact for the $v$-adic topology. Therefore, on going to an infinite subsequence of the points, we can assume that, for all $v \in S$, $P_n$ converges $v$-adically to a point $P_v \in \tilde C(k_v)$. We now write $S = S' \cup S''$, where $S'$ is the set of places in $S$ such that $P_v \in \{Q_1, \dots, Q_r\}$ and where $S'' = S \setminus S'$. Observe at once that for $v \in S''$ the values $|\varphi_i(P_n)|_v$ are uniformly bounded, since $P_v$ then lies in $C(k_v)$ and the functions $\varphi_i$ are regular on $C$.
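The choice of $N$ is elementary bookkeeping. The following sketch (the function name is ours) returns the least $N$ for which the weak Riemann–Roch bound $Nr - c$ already forces $d \ge 2N + 2$, i.e. the least $N$ with $N(r - 2) \ge c + 2$. It assumes $c \ge -1$, as is the case for $c = g - 1$; then no $N$ works when $r \le 2$, which is why Theorem 3.13 needs three points at infinity.

```python
def smallest_N(r: int, c: int) -> int:
    """Least N >= 1 with N*r - c >= 2*N + 2 (r points at infinity, constant c >= -1)."""
    if r <= 2:
        # For c >= -1 we have N*r - c <= 2*N + 1 < 2*N + 2 for every N >= 1.
        raise ValueError("at least three points at infinity are needed")
    N = 1
    while N * r - c < 2 * N + 2:  # equivalently N*(r - 2) < c + 2
        N += 1
    return N


# Genus 3 (c = g - 1 = 2) with three or four points at infinity:
print(smallest_N(3, 2), smallest_N(4, 2))
```

Each extra point at infinity lowers the required $N$, since the bound grows like $N(r - 2)$ above the threshold $2N + 2$.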


Fix now $v \in S'$ and consider, for $j \ge 1$, the subspace of $V$ defined by

$$ W_j = W_{j,v} = \{\varphi \in V : \mathrm{ord}_{P_v}\varphi \ge j - 1 - N\}. $$

We have $V = W_1 \supset W_2 \supset \cdots$ and $\dim(W_j/W_{j+1}) \le 1$, since increasing the order at $P_v$ by 1 corresponds to the vanishing of a single coefficient in the local Laurent series at $P_v$. In particular, $\dim W_j \ge d - j + 1$. We can now pick a basis for $W_d \ne 0$ and complete it successively to bases for $W_{d-1}, W_{d-2}, \dots, W_1$, obtaining vectors $w_d, w_{d-1}, \dots, w_1$. We shall have $w_j \in W_j$, since $\dim W_j \ge d - j + 1$. Expressing these vectors in terms of $\varphi_1, \dots, \varphi_d$, we shall obtain independent linear forms $L_{dv}, \dots, L_{1v}$ in $\varphi_1, \dots, \varphi_d$, defined over $k$ (since now $v \in S'$ and $P_v \in \tilde C(k)$), and such that

$$ \mathrm{ord}_{P_v} L_{jv} \ge j - 1 - N, \qquad j = 1, \dots, d. $$
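The passage from $\varphi_1, \dots, \varphi_d$ to the forms $L_{jv}$ is ordinary Gaussian elimination on Laurent coefficients at $P_v$. The sketch below assumes, hypothetically, that we are handed the expansions of the $\varphi_i$ in the local parameter, with exact rational coefficients and truncated to finitely many exponents starting at $-N$. Processing the columns by increasing exponent produces forms with pairwise distinct orders, which is a basis adapted to the filtration $W_1 \supset W_2 \supset \cdots$. It is an illustration only: the orders are read off a truncation, so the truncation must be long enough to separate the $d$ functions.

```python
from fractions import Fraction


def adapted_forms(coeffs, N):
    """coeffs[i][k] is the coefficient of t**(k - N) in the expansion of phi_i.

    Returns a list of (order, combo): combo gives L as a combination of the phi_i,
    and order is ord_{P_v} L as seen in the truncation; orders strictly increase.
    """
    d, width = len(coeffs), len(coeffs[0])
    # Augment each row with a unit vector to remember the linear combination.
    rows = [[Fraction(a) for a in row] + [Fraction(int(i == j)) for j in range(d)]
            for i, row in enumerate(coeffs)]
    forms = []
    for col in range(width):
        idx = next((i for i, row in enumerate(rows) if row[col] != 0), None)
        if idx is None:
            continue  # no remaining function has a term t**(col - N)
        pivot = rows.pop(idx)
        for row in rows:  # clear this column in all remaining rows
            factor = row[col] / pivot[col]
            for k in range(len(row)):
                row[k] -= factor * pivot[k]
        forms.append((col - N, pivot[width:]))
    if rows:
        raise ValueError("truncation too short (or the functions are dependent)")
    return forms


# Toy data with N = 2: phi_1 = t^-2 + t^-1, phi_2 = t^-2 + 1, phi_3 = t^-2.
N = 2
forms = adapted_forms([[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 0]], N)
for j, (order, combo) in enumerate(forms, start=1):
    print(f"L_{j}: order {order} >= {j - 1 - N}, combo {[str(a) for a in combo]}")
```

In the toy data the forms come out as $\varphi_1$, $\varphi_2 - \varphi_1$ and $\varphi_3 - \varphi_2$, of orders $-2, -1, 0$, matching the bound $j - 1 - N$.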

We now define such forms also for $v \in S''$, putting $L_{jv} = \varphi_j$ for $j = 1, \dots, d$. For $v \in S'$, let us choose a local parameter $t_v \in k(C)$ at $P_v$; in other words, $\mathrm{ord}_{P_v} t_v = 1$, so the last displayed formula yields, for $n \to \infty$,

$$ |L_{jv}(P_n)|_v \ll |t_v(P_n)|_v^{\,j-1-N}, \qquad j = 1, \dots, d. $$

Moreover, $|L_{jv}(P_n)|_v \ll 1$ for $v \in S''$ (since the $|\varphi_j(P_n)|_v$ are then bounded). Hence (observe that $\sum_{j=1}^{d}(j - 1 - N) = (d/2)(d - 2N - 1)$)

$$ \prod_{v \in S}\prod_{j=1}^{d} |L_{jv}(P_n)|_v \ll \prod_{v \in S'} |t_v(P_n)|_v^{(d/2)(d-2N-1)}. $$
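The exponent bookkeeping in this step, and in the comparison with the height that follows, can be checked mechanically. In the sketch below (our own helper), `total` is $\sum_{j=1}^{d}(j - 1 - N)$ and `exponent` is $d(d - 2N - 1)/(2N)$, the exponent obtained by combining the last display with a height bound of the shape $H \ll (\prod_{v \in S'}|t_v(P_n)|_v)^{-N}$. For $d \ge 2N + 2$ it is at least $d/(2N)$, leaving room for $\varepsilon = d/(4N)$.

```python
from fractions import Fraction


def bookkeeping(d: int, N: int):
    """Return (sum of the orders j - 1 - N, closed form, final height exponent)."""
    total = sum(j - 1 - N for j in range(1, d + 1))
    closed = Fraction(d, 2) * (d - 2 * N - 1)
    exponent = closed / N  # (d/2)(d - 2N - 1) divided by N
    return total, closed, exponent


for N in range(1, 30):
    for d in range(2 * N + 2, 2 * N + 40):
        total, closed, exponent = bookkeeping(d, N)
        assert total == closed                  # the identity used in the proof
        assert exponent >= Fraction(d, 2 * N)   # since d - 2N - 1 >= 1
        assert exponent > Fraction(d, 4 * N)    # room for epsilon = d/(4N)
print("bookkeeping consistent")
```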

On the other hand, the values $\varphi_j(P_n)$ are $S$-integers, so $\max_j |\varphi_j(P_n)|_v \le 1$ for $v \notin S$; moreover, we deduce as above that $\max_j |\varphi_j(P_n)|_v \ll |t_v(P_n)|_v^{-N}$ for $v \in S'$ and (as we have already noticed) $\max_j |\varphi_j(P_n)|_v \ll 1$ for $v \in S''$. Then the height $H(\varphi_1(P_n) : \cdots : \varphi_d(P_n))$ is $\ll (\prod_{v \in S'} |t_v(P_n)|_v)^{-N}$; by comparison with the above we find (recall also that $d \ge 2N + 2$)

$$ \prod_{v \in S}\prod_{j=1}^{d} |L_{jv}(P_n)|_v \ll H(\varphi_1(P_n) : \cdots : \varphi_d(P_n))^{-\frac{d(d-2N-1)}{2N}} \le H(\varphi_1(P_n) : \cdots : \varphi_d(P_n))^{-\frac{d}{2N}}. $$

We have $d \ge 2N + 2 \ge 2$, so the $\varphi_j$ are not all proportional. Hence $H(\varphi_1(P_n) : \cdots : \varphi_d(P_n)) \to \infty$ as $n \to \infty$, for otherwise the ratios $\varphi_j(P_n)/\varphi_1(P_n)$ would all lie in a finite set independent of $n$, and the same would happen for the $P_n$. We may then apply the subspace theorem, Theorem 2.3 (e.g. with $\varepsilon = d/(4N) > 0$), and conclude that all the ($S$-integer) points $(\varphi_1(P_n), \dots, \varphi_d(P_n))$, $n \in \mathbb{N}$,


lie in a certain finite union of subspaces of $k^d$. Now $\varphi_1, \dots, \varphi_d$ are linearly independent functions on $\tilde C$, so a non-trivial linear form in them vanishes at only finitely many points of $\tilde C$; again we conclude that the $P_n$ all lie in a finite set independent of $n$, concluding the proof of Theorem 3.13.

Proof of Siegel’s Theorem 3.9 To deduce the general case of Siegel’s theorem from Theorem 3.13, we can assume that $\tilde C$ has positive genus. Then its fundamental group $\pi_1(\tilde C)$ is not trivial and there exists a topological covering space $\pi : \tilde C' \to \tilde C$ of finite degree $\ge 3$. It is a well-known (but rather deep) fact that the cover may be given the structure of a cover of algebraic curves (see [Fo]), and an easy specialization argument shows that it may be assumed to be defined over $\bar{\mathbb{Q}}$. (Alternatively, one may construct $\tilde C'$ by embedding $\tilde C$ in its Jacobian and considering the inverse image of an isogeny.) Let now $C' = \pi^{-1}(C)$. Then $C'$ is affine, $\#(\tilde C' \setminus C') \ge \deg \pi \ge 3$, and the restriction $\pi|_{C'} : C' \to C$ is an unramified cover. At this point we apply the Chevalley–Weil theorem, in the form given in Theorem 3.3. Suppose now, for a contradiction, that $C$ contains infinitely many $S$-integral points. By applying the Chevalley–Weil theorem to the unramified cover $\pi : \tilde C' \to \tilde C$ and the points $P \in C(\mathcal{O}_S)$, we obtain that for all $P \in C(\mathcal{O}_S)$ the points in $\pi^{-1}(P)$ are quasi-$S'$-integral on $C'$, for a finite set $S'$ of places of a suitable fixed number field $k'$. It suffices now to apply Theorem 3.13 to $C'$, the number field $k'$, and the set of places $S'$ to obtain a contradiction.

Remark 3.14 (i) In 1982 Faltings proved the Mordell conjecture: a curve of genus $\ge 2$ has at most finitely many rational points. Hence, for genus $\ge 2$ we have a much stronger result than Siegel’s theorem. Faltings’ original proof was very sophisticated; later, Masser and Wüstholz obtained a crucial intermediate result (which is also important in its own right) as a corollary of their studies on the transcendence of values of abelian functions (see e.g. [L3], Part IV, and [Mass]).
Subsequently, Vojta [Vo2] found another completely different approach, building on a principle that had been discovered by Mumford in the 1960s. Vojta’s proof was substantially simplified by Bombieri, who brought to light important analogies with the Thue–Siegel–Roth method (see [Bo4] and also [HiSi], [BoG] for a complete description). Of course, curves of genus 1 may well admit an infinity of rational points (recall, for example, the geometric process of Section 3.2 above), but only a finite number of integral points. Thus Faltings’ theorem does not completely cover Siegel’s. However, it is not difficult to deduce Siegel’s theorem from Faltings’ even for genus ≤ 1, by going to a suitable cover of genus ≥ 2 and using the Chevalley–Weil theorem, more or less as above.


(ii) Using quantitative versions of the subspace theorem (for instance the one due to Evertse in [E2]), the present method for the proof of Theorem 3.13 leads to estimates for the number of integral points which seem to be out of reach of the classical approach. For example, one can prove that, if $C$ has at least three points at infinity and is defined in $\mathbb{A}^m$ by equations of degree $\le d$ and height $\le H$, then the number of its $S$-integral points of height $\ge H^c$ is bounded by $c^{\#S}$, where $c$ depends only on $m, d$. In particular, for fixed $C$ we have that $\#C(\mathcal{O}_k)$ is bounded in terms only of the degree $[k : \mathbb{Q}]$. See [CZ5] for this result and certain corollaries of it.

The following sequence of exercises aims at classifying curves with infinitely many integral points over $\mathbb{Z}$.

Exercise 3.15 Let $C$ be an affine (possibly singular) rational curve, defined over a number field $k$, with two points at infinity $A, B$. Prove that $A$ and $B$ are either rational or conjugate quadratic over $k$.

Exercise 3.16 In the notation of the previous exercise, suppose that $k = \mathbb{Q}$ and that $A, B$ are rational. Using the Riemann–Roch formula (or elementary linear algebra considerations), prove that there exist non-constant rational functions $f_A, f_B$, defined over $\mathbb{Q}$, such that the only pole of $f_A$ is $A$ and the only pole of $f_B$ is $B$. Observing that at each integral point $P \in C(\mathbb{Z})$ the values $f_A(P), f_B(P)$ must be rational with uniformly bounded denominators, deduce that $C(\mathbb{Z})$ is finite. (Hint: suppose, for a contradiction, that $C(\mathbb{Z})$ is infinite; extract two sequences, one converging to $A$ and the other to $B$, their union forming the whole of $C(\mathbb{Z})$. Note that $f_B$ takes only finitely many values on the first sequence and $f_A$ only finitely many values on the second one. This argument is called Runge’s method.)

Exercise 3.17 Suppose now that $C$ is defined over $\mathbb{Q}$ and has irrational quadratic points at infinity. Suppose, moreover, that their field of definition is imaginary quadratic. Deduce that $C(\mathbb{R})$ is compact and that $C(\mathbb{Z})$ is again finite.
This is another instance of Runge’s method.

Exercise 3.18 Now let $C$ be a (possibly singular) rational curve with two real-quadratic points at infinity $A, B$. Apply the Riemann–Roch formula to the divisor $A + B$ to deduce the existence of two linearly independent functions $f, g \in \mathbb{Q}(C)$ of degree 2 on $C$, with poles only at $A$ and $B$. Apply the Riemann–Roch formula to the divisor $2A + 2B$ to deduce that the six functions $1, f, g, fg, f^2, g^2$ are linearly dependent. Deduce the existence of a degree-one map $C \to X$, where $X \subset \mathbb{A}^2$ is a hyperbola.
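A worked instance may help with Exercise 3.18 (our own choice of curve, not from the text): take $C = \mathbb{P}^1 \setminus \{\sqrt 2, -\sqrt 2\}$ with coordinate $t$, so that $A, B = \pm\sqrt 2$. The functions $f = 1/(t^2 - 2)$ and $g = t/(t^2 - 2)$ have degree 2 and poles only at $A, B$, and one checks by hand that $g^2 = f + 2f^2$, a dependence among $1, f, g, fg, f^2, g^2$. Since $t = g/f$, the map $t \mapsto (f, g)$ has degree one onto the conic $G^2 - 2F^2 - F = 0$, i.e. $G^2 - 2(F + 1/4)^2 = -1/8$, a hyperbola. The sketch verifies this in exact rational arithmetic on sample points.

```python
from fractions import Fraction


def f(t: Fraction) -> Fraction:
    return 1 / (t * t - 2)  # double zero at infinity, poles at t = +-sqrt(2)


def g(t: Fraction) -> Fraction:
    return t / (t * t - 2)  # zeros at 0 and infinity, poles at t = +-sqrt(2)


def hyperbola(F: Fraction, G: Fraction) -> Fraction:
    """Left-hand side of the relation G^2 - 2F^2 - F = 0."""
    return G * G - 2 * F * F - F


samples = [Fraction(k, 7) for k in range(-30, 31)]  # t*t - 2 never vanishes on Q
on_hyperbola = all(hyperbola(f(t), g(t)) == 0 for t in samples)
degree_one = all(g(t) / f(t) == t for t in samples)  # t is recovered as g/f
print(on_hyperbola, degree_one)
```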


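Exercise 3.19 Let $b \ge 2$ be an integer and let $f(X) \in \mathbb{Z}[X]$ be a polynomial. Prove that if $f(X)$ has at least two distinct complex roots (in particular, $f$ has degree $\ge 2$ and is not of the form $a(X - c)^e$), then the Diophantine equation $f(m) = b^n$ has only finitely many solutions $(m, n)$.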
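A brute-force search illustrates the role of the hypothesis on the roots of $f$. This is a sketch: it proves nothing about finiteness, and the polynomials and search range are our own choices. For $f(X) = (X + 1)^2$, which has a single repeated root, and $b = 4$, every $m = \pm 2^n - 1$ is a solution, whereas $f(X) = X(X + 1)$ with $b = 2$ yields only $m = -2$ and $m = 1$.

```python
def power_exponent(v: int, b: int):
    """Return n >= 0 with v == b**n, or None if v is not a power of b."""
    if v <= 0:
        return None
    n = 0
    while v % b == 0:
        v //= b
        n += 1
    return n if v == 1 else None


def solutions(f, b: int, bound: int):
    """All (m, n) with |m| <= bound and f(m) == b**n, by exhaustive search."""
    found = []
    for m in range(-bound, bound + 1):
        n = power_exponent(f(m), b)
        if n is not None:
            found.append((m, n))
    return found


two_roots = solutions(lambda m: m * (m + 1), 2, 1000)  # f = X(X + 1)
one_root = solutions(lambda m: (m + 1) ** 2, 4, 1000)  # f = (X + 1)^2
print(two_roots)      # only m = -2 and m = 1 in this range
print(len(one_root))  # m = 2**n - 1 and m = -2**n - 1 keep appearing
```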

3.5 Varieties of Higher Dimension

For varieties of dimension $> 1$ our knowledge of integral or rational points is more fragmentary. However, for a subvariety $V$ of an abelian variety $A$, Faltings (see [Fa], [EE]) has proved in particular that the Zariski closure of $V(k)$ consists of a finite union of translates of abelian subvarieties of $A$ (Lang’s conjecture) and that ([Fa], Corollary 6.2 to Theorem 2), if $D$ is an ample divisor in $A$, then $A \setminus D$ has only finitely many integral points. These are very deep results, which Faltings obtained by means of an extension of Vojta’s method [Vo2] for Mordell’s conjecture (see Remark 3.14 (i) above). For integral points we also have Laurent’s theorem (Theorem 2.7) and Schmidt’s theorem (Theorem 2.9), and a result of Vojta, which we shall state in a moment, can be reduced to Laurent’s theorem. Furthermore, Vojta [Vo4] has extended Faltings’ theorems to semiabelian varieties, which are algebraic groups, namely extensions of abelian varieties by $\mathbb{G}_m^n$ (see Section 3.5.1). In particular, his deep conclusions combine Faltings’ results with Laurent’s. In general, Laurent’s, Faltings’, and Vojta’s results may be applied to varieties that may be embedded (or at least admit non-trivial maps) in some group $\mathbb{G}_m^n$ or (like curves of positive genus) in an abelian variety or, more generally, in a semiabelian variety. However, this is not always the case in dimension $> 1$. A few other results have recently been established in [CZ9] and [CZ4], by means of an extension of the method employed for Theorem 3.13; we shall recall later a statement in this direction.

Let us now state one of Vojta’s results. Let $\tilde V$ be a projective non-singular variety over $k$. The group $\mathrm{Div}(\tilde V)$ of divisors on $\tilde V$ has two relevant quotients: the Picard group $\mathrm{Pic}(\tilde V)$ and the Néron–Severi group $\mathrm{NS}(\tilde V)$, defined respectively by means of linear and algebraic equivalence (see, for example, [H]).
For instance, these groups are equal when $\tilde V = \mathbb{P}^n$; $\mathrm{NS}(\mathbb{P}^n)$ is generated by the algebraic equivalence class of a hyperplane, and is thus isomorphic to $\mathbb{Z}$, the class of a hypersurface being represented by its degree. Another interesting case is $\tilde V = \mathbb{P}^1 \times \mathbb{P}^1$; here the NS group is isomorphic to $\mathbb{Z} \times \mathbb{Z}$, through the bidegree. In general, there is a surjective map $\mathrm{Pic}(\tilde V) \to \mathrm{NS}(\tilde V)$, whose kernel is denoted $\mathrm{Pic}^0(\tilde V)$. It is known that this has the structure of an abelian variety (the


Jacobian in the case of curves), while $\mathrm{NS}(\tilde V)$ is finitely generated. Vojta has proved the following theorem.

Theorem 3.20 (Vojta 1983) Let $\rho$ be the rank of $\mathrm{NS}(\tilde V)$ and let $D$ be the sum of at least $\dim \tilde V + \rho + 1$ distinct irreducible divisors. Suppose that $\mathrm{Pic}^0(\tilde V) = 0$. Then no set of quasi-$S$-integral points on $V := \tilde V \setminus D$ is Zariski-dense in $V$.

The last conclusion implies that, if $V$ is embedded in some affine space, there exists a proper subvariety of $V$ containing all the integral points. This theorem is a special case of [Vo1], Theorem 2.4.1, where it is not assumed that $\mathrm{Pic}^0(\tilde V) = 0$; this hypothesis does not hold, for instance, for curves of positive genus, but it is often true in higher dimensions (see [Vo1], p. 23). Without this assumption, Vojta proved the same conclusion, but assuming that the number of components of $D$ is $\ge \dim V + \rho + 1 + r$, where $r$ is the rank of $\mathrm{Pic}^0(\tilde V)(k)$; now, if $\mathrm{Pic}^0 \ne 0$, this rank grows with the ground field $k$, so one does not obtain a “geometric” conclusion. Subsequently, in [Vo4], Corollary 0.3, Vojta, by applying his much deeper results on subvarieties of semiabelian varieties, was able to remove the arithmetic assumption about the rank of $\mathrm{Pic}^0(\tilde V)(k)$, at the cost of strengthening the assumptions on the components of the divisor $D$, which are now supposed to be ample. (Further applications of the theorems of [Vo4] have been given in [NW1].)

Theorem 3.20 implies once more that the curve “$\mathbb{P}^1$ minus three points” has at most finitely many integral points. More generally, it implies that upon removing from $\mathbb{P}^n$ at least $n + 2$ divisors the integral points are not Zariski-dense (but there may be infinitely many of them; see Exercise 3.68); in fact, as recalled above, $\mathrm{NS}(\mathbb{P}^n) \cong \mathbb{Z}$ has rank 1, while $\mathrm{Pic}^0(\mathbb{P}^n)$ vanishes.

Proof of Theorem 3.20 On enlarging $k$ we may assume that $D$ is the sum of distinct divisors $D_1, \dots, D_s$ defined over $k$, with $s \ge \dim V + \rho + 1$.
Since the classes of $D_1, \dots, D_s$ lie in $\mathrm{NS}(\tilde V)$, a group of rank $\rho$, they satisfy at least $s - \rho \ge \dim V + 1$ independent integral relations; thus there exist independent relations

$$ a_{i1}D_1 + \cdots + a_{is}D_s \approx 0, \qquad i = 1, \dots, n := \dim V + 1, $$

where $\approx$ denotes algebraic equivalence; on the other hand, $\mathrm{Pic}^0(\tilde V)$ vanishes, whence algebraic and linear equivalence coincide. Therefore there exist rational functions $f_i \in k(\tilde V)$ such that $\mathrm{div}(f_i) = a_{i1}D_1 + \cdots + a_{is}D_s$. In particular, the $f_i$ and the $1/f_i$ have zeros and poles contained in $|D|$, and are thus regular on $V$. Hence they assume $S$-integer values (up to a constant factor) on any set $\Sigma \subset V(k)$ of quasi-$S$-integral points. Therefore, up to a constant factor, the $f_i$ assume $S$-unit values on $\Sigma$. In particular, the values $f_i(P)$, for $P \in \Sigma$, lie in a finitely generated group $G \subset k^*$, independent of $P$. Consider now the rational map $\varphi = (f_1, \dots, f_n) : V \to \mathbb{G}_m^n$ (its image lies in


fact in $\mathbb{G}_m^n$ since the $f_i$ have no zero or pole on $V$). Let $W$ be the Zariski closure of $\varphi(V)$; it is an irreducible variety; moreover, it is a general fact that $\varphi(V)$ contains a non-empty set which is Zariski-open in $W$, and so $\dim W \le \dim V = n - 1$, whence $W$ is properly contained in $\mathbb{G}_m^n$. Observe that $W$ contains $\varphi(\Sigma)$, which in turn is contained in the finitely generated group $G^n$. By Theorem 2.7, $\varphi(\Sigma)$ is contained in a finite union of algebraic translates (translates of algebraic subgroups) in $\mathbb{G}_m^n$, which is entirely contained in $W$. Suppose that $W$ itself is an algebraic translate. In particular, since $W \ne \mathbb{G}_m^n$, we would then have an equation $X_1^{b_1}\cdots X_n^{b_n} = \lambda$ valid on the whole of $W$, where the $X_i$ are coordinates on $\mathbb{G}_m^n$, the $b_i$ are integers not all zero, and $\lambda$ is a non-zero constant. Then the function $f_1^{b_1}\cdots f_n^{b_n}$ would be constant on $V$, and hence on $\tilde V$. Therefore its divisor $\sum_i b_i(a_{i1}D_1 + \cdots + a_{is}D_s)$ would be zero; in turn, this would entail the dependence of the linear forms $a_{i1}Y_1 + \cdots + a_{is}Y_s$, $i = 1, \dots, n$, which would constitute a contradiction. Therefore $W$ is not an algebraic translate. Since $W$ is irreducible, each translate in the above finite union is then a proper closed subset of $W$, so $\varphi(\Sigma)$ is not Zariski-dense in $W$, whence $\Sigma$ is not Zariski-dense in $V$. We have shown that no quasi-$S$-integral set of $k$-points may be Zariski-dense in $V$, i.e. the sought conclusion.

It will be noticed that the proof substantially boils down to Theorem 2.7, and in turn to the $S$-unit equation. A result for non-singular surfaces which sometimes goes beyond this principle has been obtained in [CZ7]. We give below the statement of the main theorem in [CZ7] and some of its corollaries.

Let $\tilde X/k$ be an irreducible projective non-singular surface. One can define an intersection product $D.D' \in \mathbb{Z}$ for divisors $D, D'$ on $\tilde X$. (See [H]; one has $D.D' = \#(D \cap D')$ if $D, D'$ are effective, reduced, and have only transversal intersections.) We recall that a divisor $D$ on a surface $\tilde X$ is said to be nef (meaning numerically effective) if, for every curve $C \subset \tilde X$, $D.C \ge 0$. A divisor $D$ on a surface $\tilde X$ is said to be big if $\dim H^0(\tilde X, \mathcal{O}(nD)) \gg n^2$.

We have the following theorem.

Theorem 3.21 Let $\tilde X$ be as above and let $X \subset \tilde X$ be an affine open subset. Assume that $\tilde X \setminus X = D_1 \cup \cdots \cup D_r$, where the $D_i$ are distinct irreducible divisors, no three of them sharing a common point. Suppose there exist positive integers $p_1, \dots, p_r$ such that the divisor $D := p_1D_1 + \cdots + p_rD_r$ is big and nef. Suppose also that the following holds: letting, for each $i = 1, \dots, r$, $\xi_i$ be the minimal positive real root⁸ of the equation

$$ (D - \xi D_i)^2 = D_i^2\,\xi^2 - 2(D.D_i)\,\xi + D^2 = 0, \qquad (3.2) $$

we have

$$ 2\xi_i D^2 > (D.D_i)\,\xi_i^2 + 3p_iD^2. \qquad (3.3) $$

⁸ As a consequence of the Hodge index theorem, the roots are real.


Then the integral points on $\tilde X \setminus (D_1 \cup \cdots \cup D_r)$ are not Zariski-dense.

The conditions expressed by inequality (3.3) might seem a little cumbersome; however, they depend only on geometric data. The corollaries below, stated as Theorems 3.22, 3.23, 3.26, and 3.27, give concrete applications. We notice at once that some condition, besides the number of curves “at infinity”, is needed in order to ensure the degeneracy of integral points on an open surface: in fact, the affine plane $\mathbb{A}^2$, where integral points are clearly Zariski-dense, can be embedded into a complete surface $\tilde X$ so that the complement $\tilde X \setminus \mathbb{A}^2$ consists of the union of an arbitrary number of irreducible curves. It suffices to first embed $\mathbb{A}^2 \hookrightarrow \mathbb{P}^2$ in the standard way, and then blow up some points at infinity. The new divisors arising as exceptional curves will have negative self-intersection. Hence it makes sense to ask for a condition on the intersection matrix associated with the set $\{D_1, \dots, D_r\}$, not just on its cardinality $r$. In any case, for the result to apply, the divisor at infinity must split into several components.

We do not give the full proof of Theorem 3.21 here, but we now explain the main ideas. The pattern of the proof resembles that for Theorem 3.13; namely, by means of a Riemann–Roch theorem, one embeds the surface in a space of large dimension, and then constructs regular functions on $X$ vanishing to a large order along prescribed divisors at infinity. Finally, the subspace theorem is applied to conclude. The main difference with the case of curves lies in the estimates for the codimensions of the subspaces $W_{j,v}$ (compare with the proof of Theorem 3.13). We give a short explanation, showing the role of the intersection products appearing in the statement of the theorem, as follows. Supposing for simplicity all weights $p_1, \dots, p_r$ to be equal to 1, consider again the divisor $ND = N(D_1 + \cdots + D_r)$ and the associated space of functions $V_N = \{\varphi \in k(\tilde X) : \mathrm{div}(\varphi) \ge -ND\}$. As in the proof of Siegel’s theorem for curves, we need to construct a filtration $W_j = W_{j,v}$ in $V_N$, defined as in the one-dimensional case:

$$ W_{j,v} = \{\varphi \in V_N : \mathrm{div}(\varphi) \ge -ND + jD_v\}, $$

where $v \in S$ and $D_v$ is one of the $D_i$. Here, however, the codimension of $W_{j,v}$ in $V_N$ cannot be bounded just by $j$. In some sense, when working with linear systems on a surface, it is more demanding to impose a vanishing condition on a curve of high degree than on one of lower degree, while this distinction does not arise for points on a curve (all points are algebraically equivalent). In general, we have an estimate of the kind

$$ \dim W_{j,v}/W_{j+1,v} \le (ND - jD_v).D_v + 1, $$


unless $W_{j,v}$ is zero-dimensional. With this proviso, the proof follows the same pattern as that of Theorem 3.13. We suppose that we have a sequence of $S$-integral points $P_1, P_2, \dots$ on the surface $X$ and want to prove that there exists a curve containing infinitely many of them. This will suffice to prove that the set $X(\mathcal{O}_S)$ is not Zariski-dense. We can extract a subsequence, still denoted by $n \mapsto P_n$ for simplicity of notation, such that for each place $v \in S$ it converges $v$-adically to a point $P_v \in \tilde X(k_v)$, where $k_v$ is the $v$-adic completion of $k$. Such a point $P_v$ can lie in $X(k_v)$ or at infinity, i.e. on a divisor $D_v \in \{D_1, \dots, D_r\}$. Since the height $H(P_n)$ tends to infinity, some point $P_v$ must lie at infinity. For each place $v$ such that the limit lies at infinity, say on $D_v$, we consider the above filtration $(W_{j,v})_{j = 1, 2, \dots}$ and take a basis $\varphi_{1,v}, \dots, \varphi_{d,v}$ ($d = d_N = \dim V_N$) of $V_N$ containing a basis of $W_{j,v}$ for each index $j$. For the other places $v \in S$, just take any basis $\varphi_1, \dots, \varphi_d$ of $V_N$. We then estimate the double product

$$ \prod_{v \in S}\prod_{i=1}^{d} |\varphi_{i,v}(P_n)|_v $$

and compare it with the height of the point $(\varphi_1(P_n), \dots, \varphi_d(P_n))$. The inequality of the subspace theorem will be satisfied whenever the following holds: for every divisor $D_v$,

$$ \sum_{i=1}^{d} \mathrm{ord}_{D_v}(\varphi_{i,v}) > 0. \qquad (3.4) $$

In that case, the conclusion of the subspace theorem provides a linear form in $\varphi_1, \dots, \varphi_d$ vanishing on infinitely many points of the sequence $P_1, P_2, \dots$; geometrically, this means that a curve on $X$ contains infinitely many points of the sequence. It turns out that one can construct rational functions $\varphi_{j,v}$ satisfying inequality (3.4) whenever inequality (3.3) of the theorem holds.

Remarks. As for Siegel’s theorem, the result is ineffective, in the sense that it does not enable one to find all the integral points on a given surface, even when the theorem asserts that they are finite in number. Moreover, unlike what happens in dimension one (Siegel’s theorem), by this method one cannot even bound the number of integral points. On the other hand, it should be clear from the pattern of the proof that, whenever one can prove the degeneracy of integral points, one can also bound the degree of the curves on the surface (possibly) containing infinitely many integral points. Then, by Siegel’s theorem for curves, such curves are parametrized by $\mathbb{A}^1$ or by $\mathbb{G}_m$. It turns out that it is possible to find all the curves

3.5 Varieties of Higher Dimension


on a given affine surface of given degree and parametrized by $\mathbb{A}^1$ or by $\mathbb{G}_m$, so it is possible to determine all infinite families of integral points. It is also possible to bound such families independently of the ground number field and ring of S-integers. However, the “exceptional” isolated integral points, which depend on the number field k, the set of places S, and the given equations for the surface X, cannot be determined. We now present the promised corollaries of Theorem 3.21, starting with the remark that Siegel’s theorem can be deduced from Theorem 3.21. Deduction of Siegel’s theorem We first show how we can deduce Siegel’s theorem for curves, in the form of Theorem 3.13, from case (b) of Theorem 3.22: consider a smooth curve C with h ≥ 3 points at infinity $Q_1, Q_2, \dots, Q_h$. Put $\tilde X = \tilde C \times \tilde C$, where as usual $\tilde C$ is the smooth completion of C. Consider the 2h divisors $D_i := \{Q_i\} \times \tilde C$, $i = 1, \dots, h$, and $D_{h+i} := \tilde C \times \{Q_i\}$. Clearly, $C \times C = \tilde X \setminus (D_1 \cup \cdots \cup D_{2h})$. Now we have $D_i^2 = 0$, $D_i \cdot D_j = 0$ for $1 \le i \le j \le h$ and for $h < i \le j \le 2h$, and $D_i \cdot D_j = 1$ in the other cases. Take all the weights $p_i$ equal to 1, so that $D = D_1 + \cdots + D_{2h}$ satisfies $D \cdot D_i = h$ for all $i = 1, \dots, 2h$ and $D^2 = 2h^2$. Equation (3.2) gives $\xi_i = h$ for all i. Then inequality (3.3) reads $2h \cdot 2h^2 > h \cdot h^2 + 3 \cdot 2h^2$, i.e. h > 2, which is precisely the hypothesis of Siegel’s theorem, Theorem 3.13. The next corollary appeared already in [CZ7]. Theorem 3.22 Let $\tilde X$ be as above and let $X \subset \tilde X$ be an affine open subset. Assume that $\tilde X \setminus X = D_1 \cup \cdots \cup D_r$, where the $D_i$ are distinct irreducible divisors, with no three of them sharing a common point. Assume also that there exist positive integers $p_1, \dots, p_r, c$, with either (a) r ≥ 4 and $p_i p_j (D_i \cdot D_j) = c$ for all i, j; or (b) r ≥ 5 and $D_i^2 = 0$, $p_i p_j (D_i \cdot D_j) = c$ for $i \neq j$. Then the S-integral points are not Zariski-dense in X.
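The intersection-number bookkeeping in the deduction of Siegel’s theorem above can be checked mechanically. Theorem 3.21 itself is not reproduced in this excerpt, so the following Python sketch assumes the forms of (3.2) and (3.3) implied by the worked computations in this section: $\xi_i$ is the minimal positive root of $D_i^2\xi^2 - 2(D\cdot D_i)\xi + D^2 = 0$, and (3.3) reads $2D^2\xi_i > (D\cdot D_i)\xi_i^2 + 3p_iD^2$. The helper names are hypothetical. Given an intersection matrix $(D_i\cdot D_j)$ and weights $p_i$, it computes $D\cdot D_i$, $D^2$ and $\xi_i$, and tests (3.3) for every i.

```python
import math


def min_positive_root(a2, a1, a0):
    """Smallest positive real root of a2*x^2 + a1*x + a0 = 0, or None if there is none."""
    if a2 == 0:  # linear case, e.g. when D_i^2 = 0
        if a1 == 0:
            return None
        x = -a0 / a1
        return x if x > 0 else None
    disc = a1 * a1 - 4 * a2 * a0  # exact for integer inputs
    if disc < 0:
        return None
    s = math.sqrt(disc)
    positive = [x for x in ((-a1 - s) / (2 * a2), (-a1 + s) / (2 * a2)) if x > 0]
    return min(positive) if positive else None


def condition_33(M, p):
    """Check 2 D^2 xi_i > (D.D_i) xi_i^2 + 3 p_i D^2 for every i, with D = sum_i p_i D_i.

    M is the intersection matrix (D_i . D_j), p the list of positive weights.
    """
    r = len(p)
    DDi = [sum(p[j] * M[i][j] for j in range(r)) for i in range(r)]  # D . D_i
    D2 = sum(p[i] * DDi[i] for i in range(r))                        # D^2
    for i in range(r):
        # xi_i: minimal positive root of D_i^2 x^2 - 2 (D.D_i) x + D^2 = 0
        xi = min_positive_root(M[i][i], -2 * DDi[i], D2)
        if xi is None or not (2 * D2 * xi > DDi[i] * xi**2 + 3 * p[i] * D2):
            return False
    return True


def siegel_matrix(h):
    """Intersection matrix of the 2h divisors {Q_i} x C and C x {Q_i} on C x C."""
    return [[0 if (i < h) == (j < h) else 1 for j in range(2 * h)] for i in range(2 * h)]


def lines_matrix(r):
    """Intersection matrix of r lines in general position in P^2 (all entries 1)."""
    return [[1] * r for _ in range(r)]


for h in range(1, 6):
    print(f"Siegel, h = {h}:", condition_33(siegel_matrix(h), [1] * (2 * h)))
for r in (3, 4, 5):
    print(f"{r} lines:", condition_33(lines_matrix(r), [1] * r))
```

For the curve case it reports failure for h = 1, 2 and success from h = 3 on, matching h > 2; for r lines in general position in $\mathbb{P}^2$ (all intersection numbers 1, i.e. case (a) with c = 1) it reproduces r ≥ 4, in line with the three-lines example discussed after Corollary 3.23. Floating point is adequate here only because all quantities are small integers; a careful version would use exact arithmetic.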
We observe that each of the conditions (a) and (b) implies, via the Riemann–Roch theorem, that $D = \sum_j p_j D_j$ is big. Also, D is clearly nef. Note that the assumption for part (a) holds if the $D_i$ have algebraically equivalent positive multiples. Actually, one may also prove the converse. In turn, this shows that part (a) follows also from [Vo4], Corollary 0.3, which relies, however, on much more difficult techniques. In any case, a sharpening of part (a) of the above result has been obtained, still as a corollary of Theorem 3.21, by A. Levin [Lev1] and P. Autissier (unpublished, but see [Bilu1]), as follows. Corollary 3.23 (Levin, Autissier) Let $\tilde X$ be a smooth projective surface, and


Integral Points on Curves and Other Varieties

let $D_1, \dots, D_r$, r ≥ 4, be irreducible ample divisors on $\tilde X$, with no three of them intersecting. Then the integral points on $\tilde X \setminus (D_1 \cup \cdots \cup D_r)$ are not Zariski-dense. The example of three lines in general position in $\mathbb{P}^2$, whose complement is isomorphic to $\mathbb{G}_m^2$, proves that the condition r ≥ 4 is optimal. The proof of Corollary 3.23 is obtained by showing that, under the hypothesis that $D_1, \dots, D_r$ are ample, one can always find positive weights $p_1, \dots, p_r$ such that the inequality (3.3) is satisfied; roughly speaking, one chooses weights $p_i$ such that the divisors $p_i D_i$ are “almost” numerically equivalent, so that we are practically reduced to the situation of part (a) of Theorem 3.22. For this reason, we start by deducing that part of Theorem 3.22 from the main theorem, Theorem 3.21. Proof of case (a) of Theorem 3.22 Suppose then that for some positive integers (weights) $a_1, \dots, a_r$ and a positive integer c we have, for all $1 \le i, j \le r$, $a_i a_j (D_i \cdot D_j) = c$. Then, on setting $D := \sum_{i=1}^{r} a_i D_i$, we have $D \cdot D_i = rc/a_i$ for all $i = 1, \dots, r$ and $D^2 = cr^2$. We then write Equation (3.2) defining $\xi_i$ in Theorem 3.21 as
$$\frac{c}{a_i^2}\,\xi^2 - 2\,\frac{rc}{a_i}\,\xi + r^2 c = 0.$$
It has a double solution, $\xi = \xi_i = a_i r$. On substituting the values of $\xi_i$, $D^2$, $D \cdot D_i$ into the inequality (3.3) we re-write that inequality as
$$2r^3 c a_i > r^3 c a_i + 3 a_i r^2 c,$$
which, independently of i, is equivalent to r > 3, i.e. r ≥ 4 since r is an integer. This proves case (a) of Theorem 3.22. The idea of Levin and Autissier consists of reducing “up to ε” to case (a) of Theorem 3.22, by proving that, whenever the divisors $D_i$ are ample, it is possible to choose the weights in such a way that condition (a) of Theorem 3.22 is “almost” satisfied. We follow Bilu’s presentation [Bilu1] of the unpublished paper of Autissier; for a different but almost equivalent presentation, see [Lev1]. We start with an elementary linear algebra lemma. Lemma 3.24 Let $M = (m_{i,j})_{1 \le i,j \le r}$ be a real symmetric matrix with positive entries. Consider the associated linear forms $L_i : \mathbb{R}^r \to \mathbb{R}$ (for $i = 1, \dots, r$) with $L_i(x_1, \dots, x_r) = m_{i,1} x_1 + \cdots + m_{i,r} x_r$ and the quadratic form $q : \mathbb{R}^r \to \mathbb{R}$ defined by $q(v) = v^{\mathsf T} M v$. Then for every


ε > 0 there exists a vector $v = (p_1, \dots, p_r) \in \mathbb{Z}^r$, with positive integers $p_1, \dots, p_r$, such that, for all $i = 1, \dots, r$,
$$(1 - \varepsilon)\, q(v) < r\, p_i\, L_i(v) < (1 + \varepsilon)\, q(v). \tag{3.5}$$
Proof. We note that, for any $v = (x_1, \dots, x_r) \in \mathbb{R}^r$,
$$q(v) = x_1 L_1(v) + \cdots + x_r L_r(v).$$

So $v \in \{1, 2, \dots\}^r$ would certainly be a solution of (3.5) if $x_i L_i(v)$ is (positive and) independent of i. We first find such a vector with positive real entries. For this goal, consider the (r − 1)-dimensional simplex
$$\Delta := \{(x_1, \dots, x_r) : x_1 + \cdots + x_r = 1,\ 0 \le x_i \le 1\ (i = 1, \dots, r)\} \subset \mathbb{R}^r.$$
Consider the continuous map $\Delta \to \Delta$ defined by
$$v = (x_1, \dots, x_r) \mapsto \Big(\sum_{i=1}^{r} L_i(v)^{-1}\Big)^{-1} \cdot \big(L_1(v)^{-1}, \dots, L_r(v)^{-1}\big).$$

By Brouwer’s fixed-point theorem there exists a point $v = (a_1, \dots, a_r) \in \Delta$ which is sent to itself. This means that the values $a_i L_i(v)$ are equal for all $i = 1, \dots, r$. Also, the coordinates $a_1, \dots, a_r$ are all strictly positive, since otherwise we would have $L_j(v) = 0$ for all $j = 1, \dots, r$ (while in fact $L_j(v) > 0$ for all j). Now, given ε > 0 as in the lemma, we can replace $(a_1, \dots, a_r)$ by a suitable rational approximation so that inequality (3.5) holds, and on clearing denominators we obtain the sought integral point. Proof of Corollary 3.23 We can reduce to the case r = 4, so we suppose that we have exactly four ample divisors $D_1, \dots, D_4$. We apply the above lemma by taking for M the intersection matrix $(D_i \cdot D_j)_{i,j}$, whose entries are positive because the $D_i$ are ample. Owing to the homogeneity of the inequality (3.5), we can find, for every ε > 0, four positive integers $p_1, \dots, p_4$ such that, on putting $D = \sum_{i=1}^{4} p_i D_i$,
$$(1 - \varepsilon) D^2 < 4 p_i (D \cdot D_i) < (1 + \varepsilon) D^2 \qquad (i = 1, \dots, 4).$$
In order to apply the main theorem, Theorem 3.21, we need to calculate the relevant terms $D^2$, $D \cdot D_i$ and $\xi_i$, for $i = 1, \dots, 4$. Let us put
$$c := \frac{D^2}{16} \qquad \text{and} \qquad p_i D_i \cdot p_j D_j = c + \delta_{i,j}, \tag{3.6}$$


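so that
$$\sum_{i,j} \delta_{i,j} = 0.$$
We obtain from (3.6)
$$D_i^2 = \frac{c + \delta_{i,i}}{p_i^2} \qquad \text{and} \qquad D \cdot D_i = \frac{1}{p_i} \sum_{j=1}^{4} (c + \delta_{i,j}) = \frac{4c}{p_i} + \frac{\delta_i}{p_i},$$
where $\delta_i = \sum_j \delta_{i,j}$. The inequality (3.5) of the lemma gives $(1 - \varepsilon) 16c < 16c + 4\delta_i < (1 + \varepsilon) 16c$, i.e. $|\delta_i| < 4c\varepsilon$. Then the equation $(D - \xi D_i)^2 = 0$ defining ξ becomes
$$(c + \delta_{i,i}) \left(\frac{\xi}{p_i}\right)^2 - 2(4c + \delta_i)\, \frac{\xi}{p_i} + 16c = 0, \tag{3.7}$$
while the inequality (3.3) coming from Theorem 3.21, which we want to be satisfied, reads
$$(4c + \delta_i) \left(\frac{\xi}{p_i}\right)^2 - 32c\, \frac{\xi}{p_i} + 48c < 0,$$
which can be re-written as
$$\left(\frac{\xi}{p_i} - 2\right)\left(\frac{\xi}{p_i} - 6\right) + \frac{\delta_i\, \xi^2}{4c\, p_i^2} < 0. \tag{3.8}$$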

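Recalling that $|\delta_i| < 4c\varepsilon$, so that $\delta_i$ tends to zero for ε → 0, and observing that ξ, defined as the minimal positive root of (3.7), depends continuously on $\delta_{i,i}$ and $\delta_i$, we have just to verify that the inequality (3.8) is satisfied whenever ε = 0, i.e. $\delta_i = 0$, in which case (3.8) reads $2 < \xi/p_i < 6$. This amounts to checking that at the point $\xi/p_i = 2$ the left-hand side in (3.7) is positive, which is the case since $c + \delta_{i,i} = p_i^2 D_i^2 > 0$; the upper bound is automatic, since for $\delta_i = 0$ the minimal root of (3.7) equals $4/(1 + \sqrt{-\delta_{i,i}/c}) \le 4$ (the discriminant $-64c\,\delta_{i,i}$ being non-negative by the Hodge index theorem). Another application of Theorem 3.21 concerns divisibility problems. In one dimension, one can deduce from Siegel’s theorem the following fact. Theorem 3.25 (Corollary to Siegel’s theorem) Let f(X), g(X) be two polynomials in $\mathcal{O}_S[X]$, without a non-trivial common factor in k[X]. If there exist infinitely many S-integers $x \in \mathcal{O}_S$ such that $f(x) \mid g(x)$ in $\mathcal{O}_S$, then f(X) has at most one complex root.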
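Theorem 3.25 can be watched at work in the simplest ring with infinitely many units. The Python sketch below takes k = Q and S consisting of the archimedean place and the prime 2, so that $\mathcal{O}_S = \mathbb{Z}[1/2]$ and $\mathcal{O}_S^* = \{\pm 2^j\}$; this choice, the box bounds `M` and `K`, and all function names are illustrative assumptions. It lists the $x = m/2^k$ in a finite box with $f(x) \mid g(x)$ in $\mathbb{Z}[1/2]$. A finite search proves nothing, but it displays the dichotomy: with g = 1, f(X) = X (one root) gives a set that keeps growing with the box, while f(X) = X(X − 1) (two roots) gives only x ∈ {2, −1, 1/2}.

```python
from fractions import Fraction


def in_Z_half(q):
    """True iff the rational q lies in Z[1/2], i.e. its reduced denominator is a power of 2."""
    d = q.denominator  # always positive for Fraction
    return (d & (d - 1)) == 0


def divides(a, b):
    """a | b in Z[1/2] (for a, b in Z[1/2]): a != 0 and b / a lies in Z[1/2]."""
    return a != 0 and in_Z_half(Fraction(b) / a)


def divisibility_solutions(f, g, M=64, K=6):
    """All x = m / 2^k with |m| <= M and 0 <= k <= K such that f(x) | g(x) in Z[1/2]."""
    found = set()
    for k in range(K + 1):
        for m in range(-M, M + 1):
            x = Fraction(m, 2**k)
            if divides(f(x), g(x)):
                found.add(x)
    return found


one_root = divisibility_solutions(lambda x: x, lambda x: 1)
two_roots = divisibility_solutions(lambda x: x * (x - 1), lambda x: 1)
print(len(one_root))  # 26: the units ±2^j with -6 <= j <= 6
print(sorted(two_roots))
```

Enlarging `M` and `K` makes the first count grow without bound, whereas the second set cannot change: $x(x-1)$ is a unit exactly when x and x − 1 both are, and the only such pairs in $\mathbb{Z}[1/2]$ are (2, 1), (−1, −2) and (1/2, −1/2).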


The deduction from Siegel’s theorem runs as follows. Consider the algebraic curve C defined by the equation $y f(x) = g(x)$. Its integral points correspond to the S-integers x such that $f(x) \mid g(x)$. Note that C has one point at infinity (in a smooth model) for every zero of the polynomial f(X). Moreover, it has at least one other point at infinity, corresponding to the poles of the x-function. According to Siegel’s theorem, Theorem 3.13, if C has infinitely many integral points it can have at most two points at infinity, hence f(X) can have at most one (complex) root. We could also rephrase the statement above by saying that the rational function $\varphi(X) = g(X)/f(X)$ can take infinitely many integral values at integral points in a ring of S-integers only when it has at most one pole in the affine line. Also, Siegel’s theorem in the particular case of curves of genus zero is equivalent to Theorem 3.25. We can also consider the problem for integral values at rational points: then an application of Thue’s theorem (over arbitrary number fields) yields the conclusion that a rational function $\varphi(X) \in k(X)$ taking integral values at infinitely many rational points can have at most two poles. We now give yet another equivalent formulation of Theorem 3.25, which, as explained, boils down to Siegel’s theorem in the specific case of rational curves: Given two non-constant coprime polynomials $f_1(X), f_2(X) \in \mathcal{O}_S[X]$, and two polynomials $g_1(X), g_2(X)$ such that for i = 1, 2 $f_i(X)$ does not divide $g_i(X)$ in k[X], there exist only finitely many $\alpha \in \mathcal{O}_S$ such that $f_i(\alpha)$ divides $g_i(\alpha)$ in the ring $\mathcal{O}_S$ for i = 1, 2. We now show the equivalence of these two statements. Given two coprime polynomials $f(X), g(X) \in \mathcal{O}_S[X]$, suppose that f(X) has at least two distinct (complex) zeros. Up to enlarging the ring $\mathcal{O}_S$ we can suppose that f(X) decomposes as $f(X) = f_1(X) \cdot f_2(X)$ in $\mathcal{O}_S[X]$, with $f_1(X), f_2(X)$ coprime.
Now, put g1 (X) = g2 (X) = g(X) and observe that, for x ∈ OS , whenever f (x)|g(x), we will have the two divisibilities f1 (x)|g1 (x) and f2 (x)|g2 (x); then the above statement implies Theorem 3.25. Suppose now that Theorem 3.25 holds and let f1 (X), f2 (X), g1 (X), g2 (X) be as in the second statement. Let us assume, as we may, that f1 (X), g1 (X) and f2 (X), g2 (X) are coprime. If the conclusion of the second statement does not hold, then, by applying Theorem 3.25 twice, i.e. to the pairs ( f1 (X), g1 (X)) and ( f2 (X), g2 (X)), we obtain that both f1 (X) and f2 (X) have just one (complex) root. Up to a change of variable, involving if necessary an enlargement of the ring OS , we


can suppose $f_1(X) = \lambda X^a$, $f_2(X) = \mu (X - 1)^b$, for $\lambda, \mu \in \mathcal{O}_S \setminus \{0\}$ and positive integers a, b. But, since $X^a$ and $g_1(X)$ are coprime, the divisibility $\alpha^a \mid g_1(\alpha)$ implies that α is a unit (after possibly a finite enlargement of $\mathcal{O}_S$ independently of α). The same is true of α − 1, so both α and α − 1 are units. Then put $f(X) = X(X - 1)$, $g(X) = 1$ and apply Theorem 3.25 again. It is very natural to look for two-dimensional generalizations. The S-unit theorem, Theorem 2.4, for Equation (2.4) in the particular case n = 3 can be restated as follows: The set of points $(\alpha, \beta) \in \mathcal{O}_S^2$ such that $\alpha \mid 1$, $\beta \mid 1$ and $(1 - \alpha - \beta) \mid 1$ in the ring $\mathcal{O}_S$ is not Zariski-dense in the plane. In other words, we defined three polynomials $f_1(X,Y) = X$, $f_2(X,Y) = Y$, $f_3(X,Y) = 1 - X - Y$ and three more polynomials $g_i(X,Y)$, for i = 1, 2, 3, which in this case are all equal to the constant polynomial 1, and looked for the solutions to the divisibility problem $f_i(\alpha, \beta) \mid g_i(\alpha, \beta)$. In general, this amounts to solving in S-integers $(x, y, z_1, z_2, z_3) \in \mathcal{O}_S^5$ the system of equations $z_i f_i(x, y) = g_i(x, y)$, i = 1, 2, 3. It is easy to see that this system defines a (rational) surface. Hence the divisibility problem becomes a problem on the distribution of integral points on a rational surface. Then, an improvement on the S-unit theorem is represented by the following theorem. Theorem 3.26 Let $f_1(X,Y), f_2(X,Y), f_3(X,Y), g_1(X,Y), g_2(X,Y), g_3(X,Y) \in k[X,Y]$ be degree-one polynomials such that $f_1, f_2, f_3$ are linearly independent. Suppose also that no three of the six polynomials share a common zero on the plane. Then the set of pairs $(x, y) \in \mathcal{O}_S^2$ such that $f_i(x, y) \mid g_i(x, y)$ in $\mathcal{O}_S$ for i = 1, 2, 3 is not Zariski-dense in the plane. This result corresponds to the degeneration of the integral points on a simply connected smooth surface. This surface is obtained by blowing up the projective plane so as to regularize the rational functions $g_i(X,Y)/f_i(X,Y)$, for i = 1, 2, 3.
The divisor at infinity consists of the pre-image of the line at infinity (giving the integrality condition on (x, y)) and the strict transforms of the zero-divisors of the polynomials $f_i(X,Y)$ (giving the integrality condition on the values $g_i(x, y)/f_i(x, y)$). Proof of Theorem 3.26. To prove Theorem 3.26 we once again apply Theorem 3.21. On the blown-up surface the divisors at infinity to consider are the pull-back of a line and the strict transforms of three more lines, the four lines being in general position. Let $D_1$ be the first divisor and $D_2, D_3, D_4$ the other three. Then $D_i \cdot D_j = 1$ for $i \neq j$, $D_1^2 = 1$, and $D_i^2 = 0$ for i = 2, 3, 4. Now put $D = pD_1 + D_2 + D_3 + D_4$, where the positive real weight p will be chosen later.


Then $D \cdot D_1 = p + 3$ and $D \cdot D_i = p + 2$ for i = 2, 3, 4, so $D^2 = p^2 + 6p + 6$. The real numbers ξ defined in (3.2) turn out to be $\xi_1 = p + 3 - \sqrt{3}$ and $\xi_2 = \xi_3 = \xi_4 = (p^2 + 6p + 6)/(2p + 4)$. Inequality (3.3) for $\xi_1$ and $\xi_i$ (i = 2, 3, 4) then reads (after some simplifications)
$$(3 - 2p)(p^2 + 6p + 6) > 6(p + 3 - \sqrt{3}) \qquad \text{and} \qquad p^2 + 2p - 2 > 0,$$
which admit common solutions, forming a right-neighborhood of $p = \sqrt{3} - 1$. Taking a rational solution p = a/b, with a, b positive integers, we can then put $p_1 = a$, $p_2 = p_3 = p_4 = b$, thereby obtaining integral weights satisfying the hypotheses of Theorem 3.21.
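The admissible window for p can be located numerically. The Python sketch below evaluates the two simplified inequalities above on a grid (the grid bounds, step count and helper names are illustrative choices, not from the text) and then turns the rational choice p = 4/5, which is our own pick, into the integral weights $(p_1, p_2, p_3, p_4) = (4, 5, 5, 5)$. It also shows that the naive choice p = 1 (all weights equal) fails the first inequality, since $(3 - 2) \cdot 13 = 13 < 6(4 - \sqrt{3}) \approx 13.61$.

```python
import math
from fractions import Fraction

SQRT3 = math.sqrt(3)


def gap_D1(p):
    """(3 - 2p)(p^2 + 6p + 6) - 6(p + 3 - sqrt 3): must be > 0 (condition for D_1)."""
    return (3 - 2 * p) * (p * p + 6 * p + 6) - 6 * (p + 3 - SQRT3)


def gap_Di(p):
    """p^2 + 2p - 2: must be > 0 (condition for D_2, D_3, D_4)."""
    return p * p + 2 * p - 2


def admissible(p):
    return gap_D1(p) > 0 and gap_Di(p) > 0


def admissible_window(lo=0.5, hi=1.5, steps=100_000):
    """Smallest and largest grid points of [lo, hi] at which both inequalities hold."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    good = [p for p in grid if admissible(p)]
    return (min(good), max(good)) if good else None


window = admissible_window()
print("admissible p roughly in", window, "; sqrt(3) - 1 =", SQRT3 - 1)

p = Fraction(4, 5)  # a rational point of the window (our choice)
weights = (p.numerator, p.denominator, p.denominator, p.denominator)
print("p =", p, "admissible:", admissible(float(p)), "weights:", weights)
print("p = 1 admissible:", admissible(1.0))
```

The window comes out as roughly (0.732, 0.974): its left end is $\sqrt{3} - 1$, as stated in the text, while its right end is where the condition for $D_1$ stops holding. Any rational p inside it gives admissible integral weights.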

The significance of Theorem 3.26 lies also in the fact that on simply connected varieties one can never apply the above-mentioned results of Faltings and Vojta, which concern subvarieties of semi-abelian varieties (see also our discussion in the notes). We present a final application of Theorem 3.21 on integral points on surfaces. We have already treated in Chapter 2 the S-unit equation
$$au + bv = c, \tag{3.9}$$
where $a, b, c \in \mathcal{O}_S$ are non-zero, to be solved in units $u, v \in \mathcal{O}_S^*$. Its solutions correspond to integral points on the complement of $\{0, 1, \infty\}$ in $\mathbb{P}^1$. We consider now a parametric version: given three non-zero polynomials $a(T), b(T), c(T) \in \mathcal{O}_S[T]$, consider the equation
$$a(t)u + b(t)v = c(t), \tag{3.10}$$

to be solved in triples $(t, u, v) \in \mathcal{O}_S \times \mathcal{O}_S^* \times \mathcal{O}_S^*$. Its solutions correspond now to the integral points on a surface. We can prove the following theorem. Theorem 3.27 Equation (3.10) has only finitely many solutions in $\mathcal{O}_S \times \mathcal{O}_S^* \times \mathcal{O}_S^*$ if at least one of the following conditions is satisfied: (i) a(T), b(T), c(T) have the same degree and are linearly independent; (ii) $\deg a(T) + \deg b(T) = \deg c(T) > 2$ and no two of the three polynomials share a common root. Theorem 3.27 in case (i) has been proved in [CZ12], [CZ15] and boils down to the distribution of integral points on a so-called Hirzebruch surface. Let us see the link. After homogenizing, equation (3.10) becomes
$$\tilde a(t_1, t_2)\, u + \tilde b(t_1, t_2)\, v = \tilde c(t_1, t_2)\, w,$$


where now $(t_1 : t_2)$ are homogeneous coordinates in $\mathbb{P}^1$ and $(u : v : w)$ are homogeneous coordinates in $\mathbb{P}^2$, while $\tilde a(T_1, T_2), \tilde b(T_1, T_2), \tilde c(T_1, T_2) \in \mathcal{O}_S[T_1, T_2]$ are the homogeneous forms (of the same degree) associated with the polynomials a(T), b(T), c(T). We then obtain the equation of a hypersurface $\tilde X$ inside $\mathbb{P}^1 \times \mathbb{P}^2$. Note that the projection $\tilde X \to \mathbb{P}^1$ on the first factor gives $\tilde X$ the structure of a $\mathbb{P}^1$-bundle over $\mathbb{P}^1$. The solutions to (3.10) correspond to the integral points on $\tilde X$ with respect to the divisor $T_2 U V W = 0$. In the case (ii), the result was obtained by Levin [Lev2], working with another completion, namely $\mathbb{P}^1 \times \mathbb{P}^1$. It will be shown in the exercises that condition (i) or condition (ii) cannot be completely removed. Yet another result generalizes Thue’s theorem to certain varieties of dimension > 1. Let $f_1, \dots, f_r, g \in k[X_1, \dots, X_n]$ and let $\bar f_1, \dots, \bar f_r, \bar g$ be the corresponding homogeneous forms in $k[X_0, \dots, X_n]$. Define V as the hypersurface $f_1 \cdots f_r = g$. Then we have the following theorem. Theorem 3.28 ([CZ9]) Suppose that the set of common zeros (in $\mathbb{P}^n$) of $X_0 \bar g$ and any n − 1 among the forms $\bar f_i$ is finite, and that no n among the $\bar f_i$ have a common zero at ∞. Then, if the inequality $\sum_{i=1}^{r} \deg f_i > n \max_i (\deg f_i) + \deg g$ holds, the set $V \cap \mathcal{O}_S^n$ is not Zariski-dense in V. As in Thue’s theorem, the finiteness (or degeneracy) statement for the solutions to a Diophantine equation follows from a Diophantine inequality, namely a lower bound for the absolute value of $f_1(x_1, \dots, x_n) \cdots f_r(x_1, \dots, x_n)$ at integral points $(x_1, \dots, x_n) \in \mathcal{O}_S^n$ which prevents satisfaction of the equation $f_1(x_1, \dots, x_n) \cdots f_r(x_1, \dots, x_n) = g(x_1, \dots, x_n)$. Such an inequality was established in [CZ9] and generalized by Evertse and Ferretti in [EF2]. The most general result is the following theorem of Evertse and Ferretti.
Theorem 3.29 ([EF2]) Let X be a projective subvariety of $\mathbb{P}^n$ defined over k and let, for each $v \in S$, $f_0^{(v)}, \dots, f_n^{(v)}$ be homogeneous polynomials in $k[X_0, \dots, X_n]$ without (complex) common roots in $X(\mathbb{C})$. Let ε > 0 be a positive real number. Then the solutions $x = (x_0, \dots, x_n) \in X(k)$ to the inequality
$$\prod_{v \in S} \prod_{i=0}^{n} \frac{|f_i^{(v)}(x)|_v^{1/\deg f_i^{(v)}}}{\|x\|_v} < H(x)^{-n-1-\varepsilon}$$
are not Zariski-dense in X. Moreover, the Zariski closure of the solutions is contained in a finite union of hypersurfaces whose degree can be bounded explicitly. See [FaWu], [CZ9], [EF2], and [EF1] for the proof of this and similar


statements. Here we just remark that the conditions on the common zeros are “generically” true.

3.5.1 The Faltings–Vojta Theorem

We present without proof a deep and difficult result on integral points of subvarieties of semi-abelian varieties, and show a curious application. A semi-abelian variety is an irreducible algebraic group A which can be realized as an extension of an abelian variety by a linear torus; in other words, it is the middle term of an exact sequence
$$\{0\} \to \mathbb{G}_m^r \to A \to A_0 \to \{0\},$$
where $A_0$ is an abelian variety. If A is defined over the ring of S-integers $\mathcal{O}_S$ of a number field k, its group of S-integral points $A(\mathcal{O}_S)$ is finitely generated. This fact follows formally from the combined application of the Mordell–Weil theorem to the abelian variety $A_0$ and Dirichlet’s unit theorem to the torus $\mathbb{G}_m^r$ (recalling that $\mathbb{G}_m(\mathcal{O}_S) = \mathcal{O}_S^*$ is the group of S-units). The following theorem was proved by Vojta in [Vo4], after previous work by Faltings [Fa]. Theorem 3.30 Let $X \subset A$ be an irreducible algebraic subvariety of a semi-abelian variety A, defined over a number field k, which is not a translate of a semi-abelian subvariety. Then, for every ring of S-integers $\mathcal{O}_S$, the set of integral points $X(\mathcal{O}_S)$ is not Zariski-dense in X. Remarks. (1) In the case $A_0 = \{0\}$, i.e. $A = \mathbb{G}_m^r$, we obtain the S-unit equation theorem (Theorem 2.4). As for that theorem, a possible reformulation of the above theorem is the following: given a semi-abelian variety A, for each algebraic subvariety $X \subset A$, the set $X(\mathcal{O}_S)$ is contained in the union of finitely many translates of algebraic subgroups contained in X. Another formulation reads as follows: For each set of S-integral points on a semi-abelian variety, its Zariski closure is a finite union of translates of algebraic subgroups. (Compare this with Theorem 2.7.) (2) In the compact case r = 0, $A = A_0$, as treated by Faltings, one obtains once again the solution of Mordell’s conjecture: starting from an algebraic curve C of genus ≥ 2, take for A its Jacobian.
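Since A is projective, $A(\mathcal{O}_S) = A(k)$, and one deduces from Theorem 3.30, applied to the image of C in A (which, C having genus ≥ 2, is not a translate of an abelian subvariety), the finiteness of the set C(k).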


(3) Theorem 3.30 could be stated without mentioning at all either integrality or rationality. Starting from a finitely generated subgroup $\Gamma \subset A(\mathbb{C})$ and an algebraic subvariety $X \subset A$, from Theorem 3.30 it follows that the intersection $X \cap \Gamma$ is the union of finitely many translates of subgroups of Γ. Theorem 3.30 can be applied to deduce the degeneracy of integral points on varieties X admitting a morphism $X \to A$ whose image is not a translate of an algebraic subgroup. Whenever X is projective, the image will be contained in (a translate of) the kernel of the map $A \to A_0$, so one reduces the situation to sending X to an abelian variety $A_0$. Such maps factor through the so-called Albanese variety of X, which can be analytically described (in the smooth case) by integrating the regular 1-forms on X as follows. On letting $\omega_1, \dots, \omega_g$ be a basis for the vector space of holomorphic 1-forms on X (which are automatically closed), and choosing a point $x_0 \in X$, one considers the map
$$X \ni x \mapsto \left(\int_{x_0}^{x} \omega_1, \dots, \int_{x_0}^{x} \omega_g\right) \in \mathbb{C}^g / \Lambda,$$
where $\Lambda \subset \mathbb{C}^g$ is the lattice obtained by integrating $(\omega_1, \dots, \omega_g)$ over the closed loops in X. It turns out that the quotient $\mathbb{C}^g/\Lambda$ is isomorphic to an abelian variety, called the Albanese variety of X; it is defined over the same field of definition as X. In the case of curves, we obtain Abel’s construction of the Jacobian. As mentioned, the morphisms $X \to A$ to an abelian variety A (actually to any algebraic group) factor through the Albanese of X.

Whenever X is not projective, one can define a substitute for the Albanese variety, called the quasi-Albanese variety of X, which is a semi-abelian variety and has the same universal property of factoring all morphisms $X \to A$, for every semi-abelian variety A. An analytic construction of the quasi-Albanese variety, for a smooth quasi-projective variety, follows the same lines as in the compact case, but now also the so-called logarithmic 1-forms should be considered. These are defined as follows. Suppose $X = \tilde X \setminus D$, where D is a divisor with normal crossing singularities. We say that a holomorphic 1-form ω on X has logarithmic singularities along D if, at any point $p \in D$ where D can be locally defined by an equation of the form $f_1 \cdots f_h = 0$, ω is locally of the form $\omega = a_1\, df_1/f_1 + \cdots + a_h\, df_h/f_h + (\text{regular form})$; here $a_1, \dots, a_h$ are holomorphic functions in a neighborhood of p in $\tilde X$. It turns out that the logarithmic 1-forms are also closed. By integrating these forms as in the compact case, one defines the quasi-Albanese variety of X and a morphism from X to this semi-abelian variety. We stress that, whenever X admits non-constant maps to a semi-abelian


variety A, X admits closed holomorphic 1-forms, by taking the pull-back of the invariant forms on A; in particular, X cannot be simply connected. Moreover, if such a map $X \to A$ does not send X into a translate of a proper algebraic subgroup of A, the dimension of the vector space of logarithmic 1-forms on X is at least the dimension of A. A class of varieties to which Theorem 3.30 can be applied has been described by J. Noguchi and J. Winkelmann in [NW1]. We give an idea of the underlying principles. Whenever from a complete variety $\tilde X$ we eliminate two linearly equivalent divisors $D_1, D_2$, the resulting quasi-projective variety admits a never-vanishing non-constant regular function f, i.e. a morphism $f : \tilde X \setminus (D_1 \cup D_2) \to \mathbb{G}_m$ to the multiplicative group. The 1-form $df/f$ has logarithmic singularities around $D = D_1 + D_2$. Whenever $D_1, D_2$ are algebraically equivalent, but not linearly equivalent, such a form can again be constructed, although it will not be of the form $df/f$. Using this principle, in [NW1] a criterion for producing maps from a quasi-projective variety to semi-abelian varieties is provided; this criterion involves the number of divisors at infinity compared with the rank of the Néron–Severi group. We now show a curious arithmetic application of Theorem 3.30. Let us first introduce some notation. Let an elliptic curve over Q be defined by a Weierstrass equation
$$y^2 = x^3 + ax + b, \tag{3.11}$$
where a, b ∈ Z are integers with $4a^3 + 27b^2 \neq 0$. For a rational solution $P = (x, y) \in \mathbb{Q}^2$ of the above equation, one can write the rational numbers x, y in a unique way as
$$x = \frac{u}{d^2}, \qquad y = \frac{v}{d^3},$$
for integers u, v, d with d > 0 and u, v coprime to d. Denote by d(P) the positive number d appearing in the above formulae. (Of course, d(P) is defined only on the affine part of the curve, i.e. not at the point at infinity.) The primes dividing d(P) are precisely the primes modulo which the point reduces to the (unique) point at infinity of the completion of the curve defined by the above equation. A consequence of Vojta’s conjecture (see the notes at the end of this chapter) is the following. Conjecture Let $E_1, E_2$ be two elliptic curves given by Weierstrass equations. Suppose there are infinitely many pairs $(P_1, P_2) \in E_1(\mathbb{Q}) \times E_2(\mathbb{Q})$ such that
$$d(P_1) = d(P_2). \tag{3.12}$$


Then $E_1$ is isomorphic to $E_2$ over Q and for all but finitely many such pairs $P_1 = \pm P_2$. The pairs $(P_1, P_2)$ satisfying the above equation correspond to integral points on a certain variety which we now construct. Let $\tilde X$ be the blow-up of the surface $E_1 \times E_2$ above the point $(O_1, O_2)$, where $O_i$, for i = 1, 2, is the point at infinity of the curve $E_i$. Let $D_1 \subset \tilde X$ (resp. $D_2$) be the strict transform of $\{O_1\} \times E_2$ (resp. $E_1 \times \{O_2\}$). Then a rational point $(P_1, P_2) \in (E_1 \times E_2)(\mathbb{Q})$ with $d(P_1) = d(P_2)$ provides a rational point on $X := \tilde X \setminus (D_1 \cup D_2)$ which is integral with respect to $D_1 + D_2$. Now, a simple calculation, which we omit, enables us to show that X satisfies the hypotheses of Vojta’s conjecture (see Section 3.11 below), so one expects the degeneracy of the integral points. Let us show now that, admitting such degeneracy, the only possibility for infinitude is that $E_1$ is isomorphic to $E_2$ and all but finitely many solutions satisfy $P_1 = \pm P_2$. We justify this by proving the following lemma, which classifies the infinite algebraic families of solutions to Equation (3.12). Lemma 3.31 Let $C \subset X$ be a curve with infinitely many integral points. Then either C is the exceptional divisor on X, or $E_1 \cong E_2$ and C is the pull-back either of the diagonal of $E_1 \times E_2 = E_1^2$ or of the curve defined by $P_1 = -P_2$. Proof The quasi-projective surface X contains, as closed subvarieties, both complete and non-complete (i.e. affine) curves. Recall that the integral points on a complete curve coincide with the rational points. By Siegel’s theorem, if C is affine with infinitely many integral points, it must be rational. Now, X contains only one rational curve, which is the exceptional divisor of the blow-up. (The exceptional divisor on $\tilde X$ is isomorphic to $\mathbb{P}^1$, while its affine part on X is isomorphic to $\mathbb{P}^1 \setminus \{2 \text{ points}\}$.) Let then $C \subset X$ be a complete algebraic curve with infinitely many rational points. Then by Faltings’s theorem it must have genus 1.
Now, if the elliptic curves $E_1, E_2$ are not isogenous, the only genus-one curves on $E_1 \times E_2$ are the vertical and the horizontal ones, i.e. those of the form $\{P\} \times E_2$ or $E_1 \times \{P\}$. However, the corresponding pull-backs on X turn out to have points at infinity, so they cannot produce infinite families of solutions. It remains to consider the case of isogenous elliptic curves $E_1, E_2$. We are interested in maps from a third elliptic curve E to $E_1 \times E_2$ whose images induce complete curves on X. Such maps necessarily are of the form
$$E \ni P \mapsto F(P) = (\Phi_1(P) + Q_1,\ \Phi_2(P) + Q_2) \in E_1 \times E_2, \tag{3.13}$$


where, for i = 1, 2, $\Phi_i : E \to E_i$ is an isogeny and $Q_i$ a rational point on $E_i$. The condition that the pull-back of the curve F(E) on $\tilde X$ omits the divisor $D_1 + D_2$ amounts to the following equality of sets: $\{P \in E \mid Q_1 + \Phi_1(P) = O_1\} = \{P \in E \mid Q_2 + \Phi_2(P) = O_2\}$. Now, the above sets are cosets for the finite groups $\ker \Phi_1$, $\ker \Phi_2$ respectively; their equality implies in particular the equality of $\ker \Phi_1$ and $\ker \Phi_2$, hence $E_1 = E/\ker \Phi_1$ is isomorphic to $E_2 = E/\ker \Phi_2$. We then reduce to the case $E_1 = E_2 = E$ and $\Phi_1, \Phi_2$ are isomorphisms. It now follows from Equation (3.13) that the infinite families of solutions come from translates of a subgroup in $E_1 \times E_2 = E \times E$ of the form $\{(P, Q) \mid Q = \Psi(P)\}$ for some automorphism Ψ of E. But the only translates giving rise to a complete curve on X are those constituting the subgroups themselves. Hence the curve in question will be an algebraic subgroup parametrized as $E \ni P \mapsto (P, \Psi(P)) \in E \times E$ for an automorphism $\Psi : E \to E$ fixing the origin. Now, if E has no complex multiplication, the subgroups of that type are exclusively those defined by $P_2 = P_1$ (the diagonal) or $P_2 = -P_1$, and we are done. In the case of complex multiplication, one should consider also those defined by the equation $P_2 = \Psi(P_1)$, where Ψ is a complex automorphism; however, these subgroups contain only finitely many rational points, finishing the proof. It follows from the above lemma that the infinite algebraic families of solutions to (3.12) occur only if $E_1 \cong E_2$ and are given by the pairs with $P_1 = \pm P_2$ (since the exceptional curve on X gives rise to no solution to the original problem). Unfortunately, we are not able at present to prove the degeneracy of the integral points on X, so the conjecture above is not settled. However, we can prove unconditionally the following weaker result. Theorem 3.32 Let $E_1, E_2$ be two elliptic curves given by Weierstrass equations. Suppose there are infinitely many pairs $(P_1, P_2) \in E_1(\mathbb{Q}) \times E_2(\mathbb{Q})$ such that
$$d(P_1) = d(P_2) \qquad \text{and} \qquad d(2P_1) = d(2P_2). \tag{3.14}$$

Then E1 is isomorphic to E2 over Q and for all but finitely many such pairs P1 = ±P2 . In the above formula, the point 2P1 (resp. 2P2 ) is the double of P1 (resp. P2 ) according to the group law on E1 (resp. E2 ) defined by taking for the origin the point at infinity O1 (resp. O2 ). Proof Observe at once that the second condition d(2P1 ) = d(2P2 ) amounts to requiring that the point P1 reduces to some point of 2-torsion modulo some


power of a prime if and only if $P_2$ reduces to a point of 2-torsion. Also, after eliminating a finite number of primes if necessary, one can suppose that the four points of 2-torsion on each elliptic curve are pairwise non-congruent modulo any prime. As in the attempted proof of the conjecture, we blow up the surface $E_1 \times E_2$. In this case, letting $Q_i^1, Q_i^2, Q_i^3$, for i = 1, 2, be the points of exact order 2 on $E_i$ and, as before, $O_i$ the origin of $E_i$, we blow up the surface $E_1 \times E_2$ over the ten points $(O_1, O_2)$, $(Q_1^h, Q_2^k)$, with h, k = 1, 2, 3. Define $\tilde Y$ as the corresponding surface and $D \subset \tilde Y$ as the pull-back of the strict transform of the eight divisors $\{O_1\} \times E_2$, $\{Q_1^j\} \times E_2$, $E_1 \times \{O_2\}$, $E_1 \times \{Q_2^j\}$, for j = 1, 2, 3. Then put $Y = \tilde Y \setminus D$. Note that we have natural morphisms $\tilde Y \to \tilde X$ and $Y \to X$, where the surfaces X, $\tilde X$ have been constructed above. Letting κ be a field of definition for each torsion point of order 2, and taking a finite set of places S containing all places modulo which two distinct points of 2-torsion on any of the curves might be congruent, we obtain that the solutions to Equation (3.14) give rise to integral points on Y. Now, we apply Theorem 3.30 to deduce that the integral points on Y are not Zariski-dense. To this end, consider the rational function $y_1/y_2 \in \kappa(E_1 \times E_2)$, where, for i = 1, 2, $y_i$ is the y-function on $E_i$, relative to the Weierstrass equation. The divisor of $y_1/y_2$ turns out to be
$$\operatorname{div}\left(\frac{y_1}{y_2}\right) = -3\, \{O_1\} \times E_2 + \sum_{j=1}^{3} \{Q_1^j\} \times E_2 + 3\, E_1 \times \{O_2\} - \sum_{j=1}^{3} E_1 \times \{Q_2^j\}.$$
Hence, viewing $y_1/y_2$ as a rational function on $\tilde Y$, it has neither zeros nor poles at the exceptional divisors (since each of the blown-up points was an indeterminacy point for $y_1/y_2$). Also, its zeros and poles are contained in D. It then follows that $y_1/y_2$ induces a regular never-vanishing function on Y, i.e. a morphism $Y \to \mathbb{G}_m$. Using also the projection $Y \to E_1 \times E_2$, we obtain a morphism (actually injective) $Y \to E_1 \times E_2 \times \mathbb{G}_m$ to a semi-abelian variety.
Now, the facts that the last component (the morphism to Gm) is non-constant and that the projection Y → E1 × E2 is surjective, together with the classification of the algebraic subgroups of E1 × E2 × Gm, easily imply that the image of Y cannot be contained in any translate of a proper algebraic subgroup. Then the integral points on the image of Y in E1 × E2 × Gm are degenerate, and so are the integral points on Y. In order to conclude the proof, it suffices to notice that the possible infinite algebraic families of integral points on Y also give infinite families of integral points on X, and these have been classified in the previous lemma.

Exercise 3.33 Prove the toric analogue of the conjecture considered in this section: given a ring of S-integers OS, all but finitely many pairs (u, v) ∈ OS∗ × OS∗ such that (u − 1)/(v − 1) ∈ OS∗ satisfy u = v or u = v⁻¹.
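A brute-force search can illustrate the exercise, although of course it proves nothing. The sketch below assumes k = Q and S = {2, 3, ∞}, so that OS = Z[1/6] and OS∗ consists of the numbers ±2^a 3^b; the helper names are hypothetical. Inside a finite box of exponents it lists the pairs (u, v) with (u − 1)/(v − 1) ∈ OS∗ other than the trivial ones u = v and u = v⁻¹.

```python
from fractions import Fraction
from itertools import product

# Assumption for illustration: k = Q, S = {2, 3, infinity}, O_S = Z[1/6].
S_PRIMES = (2, 3)


def is_s_unit(q: Fraction) -> bool:
    """True if q is nonzero and its numerator and denominator
    are divisible only by the primes in S_PRIMES."""
    if q == 0:
        return False
    for n in (abs(q.numerator), q.denominator):
        for p in S_PRIMES:
            while n % p == 0:
                n //= p
        if n != 1:
            return False
    return True


def s_units(bound: int) -> list[Fraction]:
    """All S-units +-2^a * 3^b with |a|, |b| <= bound."""
    exps = range(-bound, bound + 1)
    return [sign * Fraction(2) ** a * Fraction(3) ** b
            for a, b, sign in product(exps, exps, (1, -1))]


def exceptional_pairs(bound: int) -> list[tuple[Fraction, Fraction]]:
    """Pairs (u, v) of S-units in the box, v != 1, such that (u - 1)/(v - 1)
    is an S-unit, excluding the trivial cases u = v and u = 1/v."""
    units = s_units(bound)
    found = []
    for u, v in product(units, repeat=2):
        if v == 1 or u == v or u == 1 / v:
            continue
        if is_s_unit((u - 1) / (v - 1)):
            found.append((u, v))
    return found
```

For instance, (u, v) = (3, −1) appears, since (3 − 1)/(−1 − 1) = −1. Running `exceptional_pairs` for increasing bounds lists the exceptions in larger and larger boxes; no finite search replaces the finiteness proof asked for in the exercise.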

3.6 Quadratic-Integral Points on Curves

In this section, we give another application of Theorem 3.21 on integral points on surfaces. Namely, we shall consider Diophantine equations defining algebraic curves, to be solved in algebraic integers of degree two over a fixed number field. These solutions define integral points (rational over a fixed number field) on a suitable surface. Consider a polynomial equation f(X,Y) = 0, to be solved in k or in Ok. If f has degree d in Y, it is reasonable to expect that, for x ∈ k, the solutions of f(x,Y) = 0 have degree d over k. (That this happens for "most" x ∈ k is the content of Hilbert's irreducibility theorem, treated in Section 3.8.) This leads to the problem of studying the "points of degree ≤ d", e.g. on a curve C: by this we mean the points of C whose coordinates lie in some extension of degree ≤ d of the ground field k. D. Abramovich and J. Harris have investigated this question, reducing it to the location of rational points on a certain subvariety of the Jacobian of C (see the paper by van der Geer in [EE]). Their results are sometimes complete. This happens e.g. when d = 2 (we then call the points in question "quadratic" over k): they prove that there are infinitely many of them, for some k, if and only if there exists a rational map of degree ≤ 2 from C to P1 or to an elliptic curve. (For instance, in the first case the curve is birational to a plane curve f(X,Y) = 0 with degY f ≤ 2.) The case of quadratic-integral points (i.e. points with algebraic integer coordinates in a quadratic extension) can be reduced to the quadratic-rational case only in part: the above condition of course remains necessary for the existence of infinitely many of them, but is no longer sufficient. Let us now briefly see how Theorem 3.22 may be applied to this question. The starting point follows the idea of the authors cited above.
Let P be a quadratic-integral point (over k) on the affine curve C/k, and let P′ denote the conjugate point over k. The pair (P, P′) then lies on the surface C × C. Consider now the symmetric product C⁽²⁾ of C with itself. It is defined as the quotient of C × C by the involution (P, Q) ↦ (Q, P) (see [Se2], p. 53), and there is a natural projection map π : C × C → C⁽²⁾. Now, the conjugate (over k) of the point (P, P′) equals (P′, P); but these points have the same image in C⁽²⁾, whence π(P, P′) is a rational point of C⁽²⁾; one also checks that it is in fact integral, since P is integral. The idea is now to apply Theorem 3.21 to the surface C⁽²⁾.⁹ (At this point the above authors apply Faltings' theorems mentioned in the foregoing.) This is sometimes possible, and leads to the following result ([CZ7], Cor. 1).

Theorem 3.34 Let C̃ be a projective non-singular curve and let C = C̃ \ {Q1, . . . , Qr} be an open affine subset, for distinct Qi ∈ C̃(k). Then
(i) if r ≥ 5, C contains only finitely many quadratic-integral points (over k);
(ii) if r ≥ 4, there exist finitely many rational maps ψ : C̃ → P1 of degree 2 such that all but a finite number of the quadratic-integral points on C (over k) are sent to P1(k) by at least one of the maps in question.

One may check (see [CZ7]) that these conclusions are best possible. Elegant examples are provided by simultaneous Pell equations, like Y² = 2X² + 1, Z² = 3X² + 1. Such systems represent affine curves of genus 1 with four points at infinity. Siegel's theorem therefore implies the finiteness of the usual integral points (see also Exercise 1.46). On the contrary, however, there are infinitely many quadratic-integral points (over Q): in fact, one can solve the first equation in Z and then define z = √(3x² + 1), thereby obtaining a first infinite family; the corresponding function ψ is now represented by the projection (X, Y, Z) ↦ (X, Y). Similarly, we may solve the second equation in Z, thereby obtaining another family, and a third family comes from solving the equation 3Y² − 2Z² = 1 in Z and defining x = √((y² − 1)/2).
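The three families can be generated explicitly. The sketch below (all helper names hypothetical) produces integer solutions of the relevant Pell-type equations by repeated multiplication by a unit: 3 + 2√2 for Y² = 2X² + 1, 2 + √3 for Z² = 3X² + 1, and 5 + 2√6 acting on Y√3 + Z√2 for 3Y² − 2Z² = 1, starting from (Y, Z) = (1, 1). To avoid irrational arithmetic, each point is recorded through the squares (x², y², z²) of its coordinates.

```python
from math import isqrt


def is_square(n: int) -> bool:
    """True if the integer n is a perfect square."""
    return n >= 0 and isqrt(n) ** 2 == n


def pell(d: int, x1: int, y1: int, count: int) -> list[tuple[int, int]]:
    """First `count` solutions (x, y) of y^2 = d*x^2 + 1, obtained by
    repeatedly multiplying y + x*sqrt(d) by y1 + x1*sqrt(d)."""
    x, y = 0, 1
    out = []
    for _ in range(count):
        out.append((x, y))
        x, y = x * y1 + y * x1, y * y1 + d * x * x1
    return out


def third_family(count: int) -> list[tuple[int, int]]:
    """First `count` solutions (y, z) of 3y^2 - 2z^2 = 1, obtained by
    repeatedly multiplying y*sqrt(3) + z*sqrt(2) by 5 + 2*sqrt(6)."""
    y, z = 1, 1
    out = []
    for _ in range(count):
        out.append((y, z))
        y, z = 5 * y + 4 * z, 6 * y + 5 * z
    return out


def quadratic_points(count: int) -> list[tuple[int, int, int]]:
    """Squared coordinates (x^2, y^2, z^2) of points of the curve
    y^2 = 2x^2 + 1, z^2 = 3x^2 + 1 taken from the three families."""
    pts = []
    for x, y in pell(2, 2, 3, count):   # family 1: z = sqrt(3x^2 + 1)
        pts.append((x * x, y * y, 3 * x * x + 1))
    for x, z in pell(3, 1, 2, count):   # family 2: y = sqrt(2x^2 + 1)
        pts.append((x * x, 2 * x * x + 1, z * z))
    for y, z in third_family(count):    # family 3: x = sqrt((y^2 - 1)/2)
        pts.append(((y * y - 1) // 2, y * y, z * z))
    return pts
```

For example, the triple (4, 9, 13) from the first family is the point (2, 3, √13), defined over Q(√13); for the first few members of each family (apart from the trivial point with x = 0) one coordinate is irrational, in accordance with the finiteness of the usual integral points.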
The proof of Theorem 3.34 yields for these curves the more precise result that no other infinite families exist (see [CZ7], Addendum to Corollary 1). In particular, while three points at infinity ensure the finiteness of the usual integral points, we may need five points at infinity in the quadratic case. We conclude this section by remarking that an alternative, sometimes superior, approach to rational points of (any!) bounded degree has been found by Vojta [Vo3]. This seems to yield remarkable conclusions also for integral points, leading, for example, to a different proof of part of Theorem 3.34. By combining "Vojta's inequality" from [Vo3] with Faltings' and Vojta's results on degeneracy of rational and integral points on subvarieties of semiabelian varieties, via techniques introduced by Noguchi and Winkelmann [NW1], Levin recently proved the following very general result [Lev4].

⁹ Theorem 3.20 cannot be applied if C has positive genus, for Pic⁰(C⁽²⁾) ≠ 0 in this case; however, [Vo4], Corollary 0.3, can be applied as well.

Theorem 3.35 Let C be a smooth affine curve defined over a number field k. Let C̃ be a smooth projective completion of C and let {P1, . . . , Pq} = C̃ \ C. Let d be a positive integer. The following statements are equivalent.
(1) There exist a finite extension L of k and a finite set of places S of L such that the set of S-integral points of C(L̄) of degree ≤ d over L is infinite.
(2) There exists a morphism φ : C̃ → P1 defined over k̄ with deg φ ≤ d such that φ({P1, . . . , Pq}) ⊂ {0, ∞}.

Note that, although Theorem 3.35 provides a necessary and sufficient condition for the existence of an infinite set of points of degree ≤ d, it leaves open the question of classifying them whenever there are infinitely many such points. For instance, it is not clear when one can conclude that they are all preimages of rational points under finitely many maps of degree ≤ d. This problem, however, is settled in the quadratic case for curves with at least four points at infinity, thanks to Theorem 3.34. A natural and significant case, treated by F. Veneziano in his thesis, arises from the already-mentioned simultaneous Pell equations, i.e. systems of the form

$$\begin{cases} y^2 = ax^2 + c\\ z^2 = bx^2 + d \end{cases} \qquad (3.15)$$
where a, b, c, d are rational integers with a > 0, b > 0, cd ≠ 0 and ad − bc ≠ 0. This system defines a smooth genus-one curve with four points at infinity. By Siegel's theorem, it has only finitely many integral solutions. Note that, in contrast, for some choices of a, b, c, d, each single equation of the system can have infinitely many solutions in Z × Z. Veneziano considered in [Ve] the solutions in quadratic S-integers, where OS is a ring of S-integers in a fixed number field k. Note that, whenever (x, y) ∈ OS × OS is a solution to the first equation, (x, y, √(bx² + d)) is a quadratic solution to the system, where √(bx² + d) denotes a square root of bx² + d in an (at most) quadratic extension of k. Since, up to enlarging the ring OS if necessary, the first equation has infinitely many solutions, we obtain an infinite family of quadratic integral solutions of the system (3.15). Another infinite family arises from the S-integral solutions to the second equation, and a third family is obtained by eliminating x from the two equations and solving in S-integers the equation by² − az² = bc − ad. In [Ve], Veneziano proved the finiteness of the quadratic integral solutions outside these three families. More precisely, his Theorem 2 reads as follows.

Theorem 3.36 The set of quadratic integral solutions to the system (3.15) consists of
• the three infinite families described above;
• a finite set of cardinality ≤ $2^{2835\,\sharp(S)+3}$;
• a finite and effectively computable set of cardinality ≤ $3 \cdot 2^{1121(\sharp(S)+h-1)+1}$, where h is the class number of the ring OS.
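The third family in Theorem 3.36 can be checked by eliminating x² from the two equations of (3.15). Conversely, a solution (y, z) of the eliminated equation determines x only up to sign, through a square root, which is where the quadratic extension enters. In the sketch below we assume, as the text allows by enlarging S, that a is a unit in OS, so that x is integral over OS.

```latex
\begin{aligned}
b\,y^2 - a\,z^2 &= b(ax^2 + c) - a(bx^2 + d) = bc - ad,\\
x^2 = \frac{y^2 - c}{a}
&\;\Longrightarrow\;
z^2 = \frac{b\,y^2 - (bc - ad)}{a} = b\,x^2 + d .
\end{aligned}
```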

3.7 Rational Points

We present without proofs some general results about rational points on algebraic varieties. Namely, we are interested in the Zariski density of rational points on an algebraic variety, possibly after a finite extension of the ground field. In this situation, it is not restrictive to consider only projective varieties, and even only smooth ones, since the problems are invariant under birational isomorphism. We recall the celebrated theorem of Faltings, proved in 1983.

Theorem 3.37 Let C be a projective curve of (geometric) genus ≥ 2, defined over a number field k. Then C(k) is finite.

Note that each smooth projective curve of genus zero becomes isomorphic, over a suitable (quadratic) extension of the ground field, to the projective line, on which the rational points are dense. Also, a genus-one (smooth projective) curve becomes isomorphic, over a suitable finite extension of its field of definition, to an elliptic curve of positive rank, so, again, its rational points form a dense set. In other words, Faltings' theorem, Theorem 3.37, provides a full classification of the algebraic curves having infinitely many rational points (over a suitable extension of the number field of definition). We can express the distinction between the class of curves with infinitely many rational points (over a suitable number field) and the class of curves with only finitely many such points (over any number field) as follows. Let C be a smooth projective curve defined over a number field k. Then one of the following mutually exclusive properties holds.
• C is a homogeneous space under the action of an algebraic group, and there exists a finite extension L/k such that C(L) is Zariski-dense.
• C can be embedded in an abelian variety in such a way that its image is not an abelian subvariety, and for every finite extension L/k the set C(L) of L-rational points is finite.

Even in higher dimensions it is quite easy to prove that, for a homogeneous space for an algebraic group defined over a number field, the rational points are Zariski-dense (possibly after a finite extension of the ground field). Concerning varieties which can be embedded into abelian ones, another theorem of Faltings (already mentioned in Section 3.5.1), proved in [Fa], reads as follows. Theorem 3.38 Let A be an abelian variety, and let X ⊂ A be an irreducible algebraic subvariety, both defined over a number field k. Suppose that X(k) is Zariski-dense in X. Then X is a translate of an abelian subvariety. In another, but equivalent, formulation, the Zariski closure of the set X(k) is a finite union of translates of abelian subvarieties. This seems to be the only general result on degeneracy of rational points in higher dimensions over arbitrary number fields. However, in higher dimensions, unlike what happens for curves, it is not true that every algebraic variety is either a homogeneous space for an algebraic group or embeds into an abelian variety. For instance, a smooth hypersurface of a projective space Pn, with n ≥ 3, of degree ≥ n + 2, admits no non-trivial action by algebraic groups of positive dimension, and does not admit any non-constant map to any abelian variety. For such algebraic varieties, nothing is known about the density of rational points: it is conjectured, after Lang and Vojta, that their rational points should not be dense, but not a single example has been established. A general conjecture asserts that, for an algebraic variety X defined over a number field, a necessary condition for having a Zariski-dense set of rational points is that X is covered by images of non-constant rational maps G ⇢ X, where G varies in a (possibly infinite) set of algebraic groups. This is the case, for instance, for elliptic surfaces, i.e. surfaces admitting a fibration in elliptic curves.
Apart from trivial cases, namely products of an elliptic curve with a curve of genus ≤ 1, such surfaces are not homogeneous spaces for algebraic groups. However, if the elliptic fibration admits a non-torsion section, its rational points will be Zariski-dense. Note that such surfaces admit (rational) endomorphisms of degree > 1, and hence an infinite semigroup of rational endomorphisms. This is also the case for the Kummer surfaces arising as quotients of abelian varieties: namely, they are birationally defined as the quotient A/{±I}, where A is an abelian surface. The isogenies of A (e.g. multiplication by integers) define endomorphisms of the quotient surface. To the best of our knowledge, all known examples of algebraic varieties with a Zariski-dense set of rational points are provided by varieties endowed with an infinite semigroup of rational endomorphisms. We now show some applications of Faltings' theorem, Theorem 3.38. Let X be a smooth projective variety. Suppose that X does not embed into an abelian variety, but admits a dominant rational map to a variety Y that embeds into an abelian one, without being an abelian variety itself. Then the rational points on Y will not be Zariski-dense, by Faltings' theorem, so the same will be true of the rational points of X. To investigate the geometric properties of a variety X under which we can apply Faltings' theorem via this construction, we recall that the Albanese variety of X is an abelian variety AX endowed with a morphism ι : X → AX satisfying the universal property that, for every abelian variety B and morphism ψ : X → B, there exists a unique morphism φ : AX → B such that φ ∘ ι = ψ. Both the variety AX and the canonical morphism ι : X → AX can be defined analytically, by integration of regular 1-forms on X. The irregularity q(X) is the dimension of the Albanese variety AX. The above method applies whenever the irregularity is > 0 and the Albanese map is not surjective. This is the case if, for instance, q(X) > dim X. A deep problem would be to weaken this last condition, and prove degeneracy whenever q(X) = dim X and the Albanese map has degree > 1 (so that X is not itself an abelian variety). This seems, however, to require essentially new techniques. Note that, whenever q(X) > 0, the variety X is not simply connected (since it admits a vector space of dimension q(X) of holomorphic 1-forms, which by the compactness of X(C) cannot be exact).

Rational points of degree d > 1
As in the case of integral points, the distribution of rational points of a given degree d > 1 on a curve reduces to the study of rational points on higher-dimensional varieties, via symmetric products.
Namely, given a (smooth, projective) curve C defined over a number field k, define X := C⁽ᵈ⁾ to be the quotient of the d-dimensional variety Cᵈ by the (finite) symmetric group S_d acting by permutation of coordinates. Each point P ∈ C(k̄) of degree d over k defines an unordered d-tuple made of P together with all its Galois conjugates. Viewed in X, this tuple corresponds to a k-rational point; hence, one can try to apply known results on the distribution of rational points on X to deduce something about algebraic points of degree d on C. Note that, if C has genus zero, the corresponding variety X is isomorphic (possibly after a finite extension of k) to the d-dimensional projective space Pᵈ.
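For C = P1 the isomorphism X ≅ Pᵈ can be made concrete, at least on the affine part: an unordered d-tuple (α1, . . . , αd) corresponds to the coefficients of ∏(X − αi), i.e. to its elementary symmetric functions. When the tuple is a full set of Galois conjugates, these coefficients are rational, which is exactly the statement that the tuple gives a k-rational point of X. The floating-point sketch below (the helper `poly_from_roots` is hypothetical) checks this for the conjugates of ∛2, a point of degree 3 on P1 over Q; points at infinity would require homogeneous (binary-form) coordinates, which the sketch ignores.

```python
import cmath


def poly_from_roots(roots: list[complex]) -> list[complex]:
    """Coefficients (highest degree first) of the monic polynomial
    prod (X - r) over r in roots. The result is symmetric in the roots,
    hence a well-defined function of the unordered tuple."""
    coeffs = [1 + 0j]
    for r in roots:
        # multiply the current polynomial by (X - r)
        new = coeffs + [0j]
        for i in range(1, len(new)):
            new[i] -= r * coeffs[i - 1]
        coeffs = new
    return coeffs


# The three complex conjugates of the real cube root of 2.
cube_roots = [cmath.rect(2 ** (1 / 3), 2 * cmath.pi * j / 3) for j in range(3)]
coords = poly_from_roots(cube_roots)
print([round(c.real, 9) for c in coords])
```

Up to rounding, the printed coordinates are those of X³ − 2, i.e. the rational point (1 : 0 : 0 : −2) of P³, and reordering `cube_roots` does not change them.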

The picture is clear in this case: there are infinitely many points of degree d on C ≅ P1, and they define a Zariski-dense set on X ≅ Pᵈ. If, on the contrary, C has positive genus g, consider the Jacobian variety J of C; it is an abelian variety of dimension g. There is a natural morphism X → J defined as follows: take a rational point P ∈ C(k) and associate with any unordered d-tuple (P1, . . . , Pd) in X the class of the divisor P1 + · · · + Pd − dP; it is a divisor of degree zero on C, hence defines a point on J. Denote this map by π : X → J. Whenever π(X) is a proper subvariety of J (which happens whenever g > d), one can apply Faltings' theorem, Theorem 3.38, to this variety and deduce the degeneracy of rational points on X, provided π(X) is not itself a translate of an abelian subvariety. Using essentially this idea, Abramovich and Harris [AbH] proved in 1997 the following result.

Theorem 3.39 Let C be a smooth projective curve defined over a number field k, and let d = 2 or 3. The following statements are equivalent:
• there exist a number field L ⊃ k and infinitely many algebraic points on C of degree d over L;
• there exists a morphism C → Y of degree d, where Y is a curve of genus ≤ 1.

3.8 The Hilbert Irreducibility Theorem

In this section we apply Siegel's finiteness result on integral points to derive a sharpening of the celebrated Hilbert irreducibility theorem (henceforth abbreviated to HIT). Here is a basic version of the original statement proved by Hilbert in 1892 (see [Hilb]).

Theorem 3.40 (Hilbert irreducibility theorem) Let F(X,Y) ∈ Z[X,Y] be a polynomial, of degree ≥ 1 in Y, which is irreducible in the ring Q[X,Y]. Then there exist infinitely many integers n ∈ Z such that the specialized polynomial F(n,Y) ∈ Z[Y] is irreducible in the ring Q[Y].

In the case when degY F ≥ 2, the only interesting one, we obtain as a corollary that, under the above hypothesis on the polynomial F(X,Y) ∈ Z[X,Y], for infinitely many n ∈ Z the specialized polynomial F(n,Y) has no rational roots. We now want to examine the link between this statement and Theorem 3.40: in one direction, as we observed, the last statement is a corollary of the HIT. Our aim is to show that in fact the reverse implication also holds. On letting V ⊂ A2 be the affine curve defined by the equation F(X,Y) = 0, and x : V → A1 be the projection on the X-coordinate, the above conclusion reads as follows: the set A1(Q) is not contained in the image x(V(Q)). Note that the assumption that F(X,Y) is irreducible and degY F ≥ 2 implies that the x-map V → A1 admits no rational section. This formulation will be the starting point for the generalization treated here, which we now state formally.

Theorem 3.41 Let k be a number field, let V be an affine (possibly reducible) algebraic curve, and let π : V → A1 be a dominant map, defined over k. If π admits no rational section, then A1(k) ⊄ π(V(k)). More precisely, the set A1(k) \ π(V(k)) is infinite.

Let us inspect more deeply the link between Theorem 3.41 and the HIT. Suppose an irreducible polynomial F(X,Y) ∈ k[X,Y], of degree ≥ 2 in Y, is given. For almost all specializations X = a ∈ k, the specialized polynomial F(a,Y) ∈ k[Y] will have degree equal to degY F. If this degree is 2 or 3, having a k-rational root is equivalent to being reducible in k[Y]. Hence the conclusion of the above theorem, applied to the curve V given by the equation F(X,Y) = 0 and π equal to the projection on the X-coordinate, gives the conclusion of the HIT, namely the existence of a specialization a ∈ k leaving the specialized polynomial irreducible. It is easy to deduce also the existence of infinitely many such specializations, still by applying Theorem 3.41 (see below). If, however, degY F ≥ 4, a further argument is needed to deduce the full HIT from the last statement. Suppose first, for the sake of example, that degY F = 4. Then let W ⊂ A2 be the curve defined by F(X,Y) = 0, endowed with the projection x : W → A1; and let V = W⁽²⁾ be the symmetric square of W over A1, namely V := {(p1, p2) ∈ W² | x(p1) = x(p2)}/∼, where ∼ is the relation induced by identifying (p1, p2) with (p2, p1).
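Pairs of points are needed when degY F = 4 because a quartic can be reducible without having any root in k. As a hypothetical illustration (not taken from the text), take F(X,Y) = Y⁴ + X, which is irreducible in Q[X,Y] since it is linear in X. The specialization a = 4 gives Y⁴ + 4 = (Y² + 2Y + 2)(Y² − 2Y + 2), so the fiber of W over a = 4 has no rational point, while the fiber of W⁽²⁾ does (the pair of roots of Y² + 2Y + 2). The sketch checks both facts with the rational root test and a polynomial product; the helper names are our own.

```python
from fractions import Fraction


def poly_mul(p: list[int], q: list[int]) -> list[int]:
    """Product of polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def divisors(n: int) -> list[int]:
    """Positive divisors of a nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]


def rational_roots(p: list[int]) -> set[Fraction]:
    """Rational roots of an integer polynomial (lowest degree first),
    by the rational root test; assumes p[0] != 0 and p[-1] != 0."""
    roots = set()
    for num in divisors(p[0]):
        for den in divisors(p[-1]):
            for r in (Fraction(num, den), Fraction(-num, den)):
                if sum(c * r ** k for k, c in enumerate(p)) == 0:
                    roots.add(r)
    return roots


quartic = [4, 0, 0, 0, 1]                          # Y^4 + 4
print(rational_roots(quartic))                     # set(): no rational root
print(poly_mul([2, 2, 1], [2, -2, 1]) == quartic)  # True: it still factors
```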
Thus the curve W⁽²⁾ consists of unordered pairs of points of W having the same x-coordinate. The x-coordinate is then well defined on V, and gives a dominant morphism V → A1 of degree 10 (the number of unordered pairs, repetitions allowed, of the four points in a generic fiber). Note that V is a reducible curve, containing an irreducible component isomorphic to W (the image of the diagonal in W × W). The splitting of V depends on the Galois group of the equation F(X,Y) = 0 over k(X). The rational points on V correspond to the algebraic points of W which either are rational or have x-coordinate in k and y-coordinate quadratic over k. Hence, V admits a rational point in the fiber π⁻¹(a) over the rational point a ∈ A1(k) if and only if the equation F(a, y) = 0 has a rational or a quadratic solution, which happens if and only if F(a,Y) is reducible in k[Y]. Hence Theorem 3.41, applied to the curve V constructed above, implies the existence of a rational specialization a ∈ k such that the polynomial F(a,Y) is irreducible in k[Y]. The condition that π admits no section is ensured by the hypothesis that F(X,Y) is irreducible, so that every solution y(x) of the equation F(x, y(x)) = 0 must be an algebraic function of degree 4. It is clear that the above method can be pursued further: starting from an irreducible polynomial F(X,Y) ∈ k[X,Y] of degree degY F = d > 1, one first defines the curve W : F(X,Y) = 0, then constructs the curves W⁽¹⁾ = W, W⁽²⁾, . . . , W⁽ᵐ⁾, with m = ⌊d/2⌋, by taking symmetric fiber products of W with itself, and finally puts V = W⁽¹⁾ ∪ · · · ∪ W⁽ᵐ⁾. We then obtain that for at least one specialization a ∈ k = A1(k) the fiber π⁻¹(a) in V contains no rational point, so that the polynomial F(a,Y) has no factor of degree ≤ d/2; this implies that it is irreducible. See especially [Se1], [Se3] for a similar viewpoint, in greater detail. We now deduce from Theorem 3.41 that the required specializations form an infinite set. In fact, suppose by contradiction that they form a finite set {a1, . . . , am} ⊂ A1(k). Consider the hyperelliptic curve W′ of equation Y² = (X − a1) · · · (X − am). We "add" W′ to V and extend the morphism π to V′ := V ∪ W′, still by sending W′ ∋ (x, y) ↦ x. Then each point ai, i = 1, . . . , m, has a rational pre-image in V′(k), so that every a ∈ A1(k) lies in π(V′(k)). Since π still admits no rational section on V′, applying Theorem 3.41 to V′ gives a contradiction, as wanted. We can present the above discussion from another viewpoint, using the concept of a decomposition group in Galois theory, which we have already used in connection with Siegel's theorem.
Given a finite morphism V → A1, which for simplicity we suppose to be Galois over k, consider the proper subgroups Δ1, . . . , Δr of the Galois group Gal(V/A1). On letting Vi = V/Δi be the intermediate covers of the line, each rational point p ∈ A1(k) whose fiber in V is not irreducible lifts to a k-rational point on at least one of the covers Vi → A1. We now give a proof of Theorem 3.41 using Siegel's theorem on curves.

Proof of Theorem 3.41. The fact that π admits no rational section amounts to the fact that π has degree > 1 when restricted to any irreducible component of V on which it is non-constant. Of course, those components of V on which π is constant give rise to a single rational point in their image. Note also that, if a component W of V is not defined over k, then it can have only finitely many rational points (because rational points also lie on its conjugate components; see Exercise 3.1). Then it suffices to prove the result for the curve obtained from V by removing the union of all the components not defined over k and of the components on which π is constant. Hence we can, and shall, suppose that each geometrically irreducible component of V is defined over k and that on each such component the map π is dominant. We further reduce to the case where π is a finite map. It suffices, for each component W of V, to replace the ring k[W] by the integral closure of k[X] in k(W); this operation amounts to replacing W by an algebraic curve W′, still endowed with a projection to A1, birationally isomorphic to W; more precisely, there will be a birational isomorphism W ⇢ W′ compatible with the projections to A1. The result for the new curve V′, obtained as the union of the curves W′, will imply our result for V, since V′ "differs" from V by a finite set. We can then suppose that π : V → A1 is a finite map. Hence we can find a ring of S-integers OS such that, for each point a ∈ A1(OS) = OS, every rational point in the fiber π⁻¹(a) lies in V(OS). Now, if each irreducible component of V has positive genus, then Siegel's theorem applies and we deduce that V(OS) is finite, so we are done. Otherwise, we can argue as follows. We consider a polynomial map p : A1 → A1 such that, for each component W of V, the corresponding fiber product C_W := W ×_{A1} A1 = {(w, t) ∈ W × A1 | π(w) = p(t)} is irreducible and of positive genus. We can, for instance, choose p(t) = t³ − c, where c ∈ OS is such that −c (the critical value of p) is not a branch point of π. (Alternatively, for the components W of genus zero, forgetting integrality, we reduce to a rational map ϕ : P1 → P1 of degree > 1. Taking into account that the height of the image satisfies h(ϕ(t)) = deg(ϕ) h(t) + O(1) for t ∈ P1(Q̄), with the O(1) term independent of t, a counting argument enables us to conclude that "most" rational (or integral) points do not lie in the image.)
Then, letting C be the union of the curves C_W, for W ranging over the irreducible components of V, Siegel's theorem yields that C(OS) is finite. On the other hand, each point in the infinite set p(OS) having a rational pre-image in V also has a rational (hence S-integral) pre-image in C. Hence this can happen only for finitely many a ∈ p(OS), and the theorem is proved. Remark It should be noted that the HIT is a much shallower result than Siegel's theorem, and also admits a number of elementary proofs. Siegel's theorem actually yields a best-possible estimate for the distribution in Z of the exceptional specializations, namely those which produce a reducible polynomial. We give here a precise statement (see [Se1], [Sch1]).

Theorem 3.42 Let P(X,Y) ∈ Z[X,Y] be an irreducible polynomial. The number of integers n ∈ Z with 0 < n ≤ H such that P(n,Y) is reducible in Q[Y] is O(H^{1/2}) as H → ∞. Proof The problem is reduced to the following one: given an algebraic curve C (possibly reducible) defined over Q and a finite map π : C → A1 without any rational section, count the points n ∈ A1(Z) with 0 < n ≤ H such that π⁻¹(n) contains at least one rational point. The components C′ of C of positive genus contribute only a finite set, since C′(Z) is finite (it is here that we first use Siegel's theorem). A component C′ of genus 0 can contribute infinitely many points only if it has at most two points at infinity, again by Siegel's theorem. If C′ has two points at infinity, it is parametrized by a hyperbola (see Exercise 3.18); in that case, either C′(Z) is finite, or the hyperbola has quadratic irrational points at infinity and its integral points are obtained by solving a Pell equation (see Exercise 1.28). Then the sequence of its integral points grows exponentially, and as a consequence the number of points p ∈ C′(Z) of height H(p) ≤ H is O(log H); this gives a similar bound for the points π(p) ∈ Z of absolute value ≤ H. Finally, if C′ is a component with just one point at infinity, then it is parametrized by A1. We then obtain a morphism A1 → C′ → A1, where the first arrow has degree one and the second one is π. Their composition is a morphism A1 → A1, expressed by a polynomial of degree ≥ 2 (otherwise π would admit a rational section), so the image of the integral points of C′ (which are all obtained, up to finitely many, from integral points of A1, at least after suitably normalizing the first morphism) forms a sequence which grows at least quadratically, whence our estimate. The example P(X,Y) = X − Y² shows that the estimate of Theorem 3.42 is optimal.
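For the extremal example P(X,Y) = X − Y², the specialized polynomial P(n,Y) = n − Y² has degree 2, so it is reducible over Q exactly when it has a rational root, i.e. when n is a perfect square. The exceptional set in (0, H] therefore has ⌊√H⌋ elements. A minimal sketch of this count (the function name is our own):

```python
from math import isqrt


def exceptional_specializations(H: int) -> list[int]:
    """Integers 0 < n <= H for which n - Y^2 is reducible in Q[Y],
    i.e. for which n is a perfect square."""
    return [n for n in range(1, H + 1) if isqrt(n) ** 2 == n]


for H in (10**2, 10**4, 10**6):
    count = len(exceptional_specializations(H))
    print(H, count, count / H ** 0.5)   # the ratio is exactly 1 for these H
```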
Remark It is easy to derive from the HIT a "universal version" predicting the existence of a sequence of integers a_n, n = 1, 2, . . ., such that the specialization x ↦ a_n preserves the irreducibility of each given irreducible polynomial F(X,Y) for all n > n0(F). One may also give explicit examples of such sequences, and actually of rather "dense" sets, as in [DeZ], [Bilu2]. As we shall see in the next chapter, the subspace theorem allows us to prove that certain simple exponential sequences like 2ⁿ + 3ⁿ are universal in this sense. The HIT has been generalized in various directions (see [BoG], [L2], [Sch1], [Se1], [Z2] and also the beginning of Section 3.6). In particular, it admits a higher-dimensional generalization as follows. Theorem 3.43 Let X be a (possibly reducible) algebraic variety defined over a number field k, of dimension n, and let π : X → Aⁿ be a dominant morphism admitting no rational sections. Then the set Aⁿ(k) is not contained in π(X(k)). Proof We choose a line l ⊂ Aⁿ, defined over k, such that the intersection of π⁻¹(l) with each irreducible component of X is irreducible; this can be achieved via Bertini's theorem. Then apply Theorem 3.41. Yet another version, with the projective space Pⁿ in place of Aⁿ, is possible. More generally, we can work with a dominant morphism V1 → V2, where dim V1 = dim V2 and V2 is a k-rational irreducible variety. Note that Aⁿ and Pⁿ are simply connected, so each finite map from an irreducible variety X to Aⁿ or to Pⁿ must ramify somewhere, unless it is birational, in which case it admits a rational section. If we remove some hypersurfaces from Pⁿ, then the complement, which is still a rational variety, can admit non-trivial unramified covers. Then one could apply the Chevalley–Weil theorem, Theorem 3.4, to such a cover and, up to replacing it if necessary by a union of twists of it, deduce the opposite conclusion, albeit only for integral points. Examples are provided, for instance, by the isogenies Gₘⁿ → Gₘⁿ. As has just been remarked, the HIT is in a sense a converse to the Chevalley–Weil theorem discussed in the previous section. In fact, for unramified finite maps W → V, Theorem 3.4 states that, up to replacing W by a reducible variety W′ of the same dimension, one obtains a map W′ → V not admitting rational sections and surjective on the rational points. We have already seen a couple of examples in the previous paragraph. Below, we shall present another one, which comes from compact surfaces. A natural question is whether all counter-examples to Hilbert irreducibility, for varieties with a Zariski-dense set of rational (or integral) points, come from unramified covers. It is tempting to formulate the following conjecture.
Conjecture Let π : W → V be a finite morphism of smooth quasi-projective algebraic varieties defined over a number field k, with V irreducible. Let OS ⊂ k be a ring of S-integers and suppose that V (OS ) is Zariski-dense. Suppose also that on each irreducible component W ⊂ W of W the finite map πW : W → V ramifies. Then π (W (k)) does not contain V (OS ). This statement clearly contains the HIT, which is just the particular case when V = A1 . More generally, from the above conjecture one can deduce the following further conjecture. Conjecture Let V be a smooth simply connected irreducible quasi-projective algebraic variety, defined over a number field k. Let π : W → V be a finite

3.8 The Hilbert Irreducibility Theorem

101

morphism defined over $k$, of degree $> 1$ on each component of $W$. Let $\mathcal{O}_S \subset k$ be a ring of $S$-integers and suppose that $V(\mathcal{O}_S)$ is Zariski-dense. Then $\pi(W(k))$ does not contain $V(\mathcal{O}_S)$.

These conjectural extensions of the HIT have been proved for linear algebraic groups by Colliot-Thélène and Sansuc [C-TS] (see also stronger statements in [FeZ] for linear tori, [Co1] for linear algebraic groups, and [Z7] for products of elliptic curves).

We now present an example of a complete surface, with a Zariski-dense set of integral points, admitting an unramified cover of degree two; according to the Chevalley–Weil theorem, such a surface does not satisfy the conclusion of the HIT (and we shall prove explicitly that all its rational points lift to points of the cover defined over $\mathbb{Q}(i)$). Of course, it would be easy to produce examples coming from products of elliptic curves, and more generally from abelian surfaces. Even more generally, whenever a surface $X$ is endowed with a dominant rational map $X \dashrightarrow A$, where $A$ is an abelian variety (of dimension one or two), $X$ admits non-trivial (abelian) covers, so it does not have the Hilbert property over any number field. In several cases $X(k)$ is Zariski-dense for some number field $k$, e.g. whenever the surface $X$ is birational to an abelian surface, or when the abelian variety $A$ is an elliptic curve and the fibers of $X \dashrightarrow A$ are rational curves. Our example is of a different kind: it is a so-called Enriques surface, admitting a degree-two cover by a K3 surface.

Example/Theorem The smooth surface $E$ defined in $\mathbb{P}^7$ by the system of equations

E :

⎧ ⎪ x02 + x0 x1 = x0 x2 + x32 ⎪ ⎪ ⎪ ⎪ x42 = x0t3 ⎪ ⎪ ⎨ t32 = x1 x2 ⎪ t22 = x1 x3 ⎪ ⎪ ⎪ ⎪ t12 = x2 x3 ⎪ ⎪ ⎩ t2t3 = x1t1

(3.16)

is birationally equivalent to the (singular) normal hypersurface $E'$ defined in $\mathbb{P}^3$ by the equation

$$E':\quad x_1y_0^4 + x_0y_1^4 = x_0^3x_1^2 + x_0^2x_1^3. \tag{3.17}$$

The rational points (over Q) on these surfaces are Zariski-dense. The surface


Integral Points on Curves and Other Varieties

$X$ in $\mathbb{P}^4$ defined by the system

$$
X:\quad
\begin{cases}
x_1y_0^4 + x_0y_1^4 = x_0^3x_1^2 + x_0^2x_1^3\\
t^2 = x_0x_1
\end{cases}
\tag{3.18}
$$

is birationally equivalent to a K3 surface. The map $X \to E'$ sending $(x_0 : x_1 : y_0 : y_1 : t) \mapsto (x_0 : x_1 : y_0 : y_1)$ is an unramified cover of degree two.

We start by verifying the last assertion. Note that the function field of $X$ is obtained from that of $E'$ by adjoining the square root of the rational function $x_1/x_0$. Hence the map $X \to E'$ is a degree-two cover. Let us prove that $X$ is irreducible, which amounts to saying that the function $x_1/x_0$ is not a square in the function field $\mathbb{Q}(E')$ (and not even in $\mathbb{C}(E')$). In fact, dividing (3.17) by $x_0^5$, this field can be identified with $\mathbb{C}(u_1, u_2)\bigl(\sqrt[4]{u_1^2 + u_1^3 - u_2^4u_1}\bigr)$, where $u_1 = x_1/x_0$, $u_2 = y_0/x_0$ and $y_1/x_0 = \sqrt[4]{u_1^2 + u_1^3 - u_2^4u_1}$. Clearly, the function $u_1 = x_1/x_0$ is not a square in such a field, hence the (geometric) irreducibility of the variety $X$ follows.

We now verify that, for every discrete valuation $\nu$ of any field $k$ and for every $k$-rational point $(x_0 : x_1 : y_0 : y_1) \in E'(k)$ with $x_0x_1 \neq 0$, we have $\nu(x_0x_1) \equiv 0 \pmod 2$. Let $a_0 = \nu(x_0)$, $a_1 = \nu(x_1)$; if $b := a_0 + a_1$ were odd, then the four terms appearing in (3.17), i.e. $x_1y_0^4$, $x_0y_1^4$, $x_0^3x_1^2$, $x_0^2x_1^3$, would have $\nu$-valuations congruent modulo 4 to $a_1$, $a_0$, $2b + a_0$, $2b + a_1$, respectively. These values are pairwise distinct modulo 4, so the minimum of the $\nu$-valuations of the non-zero terms would be attained only once, which is impossible since the terms satisfy the linear relation (3.17). Thus $\nu(x_0x_1)$ is even. This proves both the geometric fact that the cover $X \to E'$ is unramified, which, algebraically, amounts to the fact that the rational function $x_1/x_0$ on $E'$ is locally a square everywhere, and the arithmetic fact that the rational points of $E'(\mathbb{Q})$ lift to points of $X(\mathbb{Q}(i))$. In fact, each prime divisor of the rational number $x_1/x_0$ appears with even multiplicity, so $x_1/x_0$ (or equivalently $x_1x_0$) is a square up to its sign, and hence is a square in the imaginary quadratic field $\mathbb{Q}(i)$. We have thus explicitly verified the conclusion of the Chevalley–Weil theorem in this case.

In order to prove the assertion about the density of rational points, we state without full proof the following fact. Let $F$ be the K3 surface defined in $\mathbb{P}^3$ by the equation

$$X^4 + Y^4 = Z^4 + W^4. \tag{3.19}$$

There is a degree-two map $F \to X$. The rational points on $F$ are Zariski-dense. We just give the (abstract) construction of the map: consider the automorphism $\sigma$ of $F$ sending $(X : Y : Z : W) \mapsto (X : iY : -iZ : -W)$.


Clearly, $\sigma$ has order four. It has no fixed points on $F$, but $\sigma^2$ has eight fixed points, so the quotient is singular. The quotient $F/\sigma^2$ turns out to be isomorphic to $X$, while $F/\sigma$ is isomorphic to $E'$. Blowing up the eight fixed points of $\sigma^2$ on $F$ produces another surface on which $\sigma$ acts naturally: the corresponding quotient by $\sigma$ is the smooth Enriques surface $E$.

The fact that $F(\mathbb{Q})$ is Zariski-dense can be proved in the following way. The surface $F$ contains the line $r : X - Z = 0 = Y - W$; the pencil of planes containing $r$ defines an elliptic fibration on $F$, since each plane of this pencil intersects $F$ in $r$ plus a plane cubic curve (which is smooth in general). On taking two more lines $s_1, s_2$ on $F$, disjoint from $r$ and defined over $\mathbb{Q}$, one obtains two sections, since each plane of the pencil meets each of these lines in exactly one point. On taking one of the sections as the origin, the second section turns out to have infinite order. The details can be found in the paper [S-D].

We end this section by stating without proof the following theorem.

Theorem 3.44 The Fermat quartic surface $F$ defined by Equation (3.19) has the Hilbert property over $\mathbb{Q}$. Namely, for every (possibly reducible) algebraic variety $W$ and every generically finite morphism $\pi : W \to F$ without rational sections, the set $F(\mathbb{Q})$ is not contained in $\pi(W(\mathbb{Q}))$.

The proof, which was provided by the authors of the present book, appears in [CZ18]; it makes essential use of the presence of two elliptic fibrations of positive rank on $F$. In recent work of J. Demeio [Dem], this result has been extended to a vast class of surfaces admitting two independent elliptic fibrations of positive rank.
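The parity argument for (3.17) and the statements about $\sigma$ lend themselves to a quick numerical sanity check. The Python sketch below is only an illustration under stated assumptions: the search boxes (`bound`) and the finite candidate set of points with coordinates in $\{0, \pm 1, \pm i\}$ are arbitrary choices, and a finite search proves nothing by itself. It checks that (a) the residues used in the valuation argument are pairwise distinct modulo 4, (b) every integer point found on $E'$ with $x_0x_1 \neq 0$ has $|x_0x_1|$ a perfect square, and (c) $\sigma$ preserves $F$, has order four, and $\sigma^2$ fixes exactly eight points of $F$ while $\sigma$ fixes none. For (c) the candidate set is in fact exhaustive, because the fixed loci of the diagonal maps $\sigma$ and $\sigma^2$ are coordinate points and two coordinate lines.

```python
from itertools import product
from math import isqrt


def is_square(n: int) -> bool:
    return n >= 0 and isqrt(n) ** 2 == n


def fourth_root(n: int):
    """Return r >= 0 with r**4 == n, or None if n is not a fourth power."""
    if n < 0:
        return None
    r = isqrt(isqrt(n))  # floor of the real fourth root of n
    return r if r ** 4 == n else None


def residues_distinct(bound: int = 20) -> bool:
    """(a) If b = a0 + a1 is odd, then a1, a0, 2b + a0, 2b + a1 are distinct mod 4."""
    for a0 in range(-bound, bound + 1):
        for a1 in range(-bound, bound + 1):
            b = a0 + a1
            if b % 2 == 1:
                if len({a1 % 4, a0 % 4, (2 * b + a0) % 4, (2 * b + a1) % 4}) != 4:
                    return False
    return True


def points_on_E_prime(bound: int = 12):
    """(b) Integer points (x0, x1, y0, y1) of x1*y0^4 + x0*y1^4 = x0^3*x1^2 + x0^2*x1^3
    with x0*x1 != 0, |x0|, |x1| <= bound, 0 <= y0 <= bound, solved for y1 >= 0."""
    pts = []
    for x0, x1 in product(range(-bound, bound + 1), repeat=2):
        if x0 == 0 or x1 == 0:
            continue
        for y0 in range(bound + 1):
            num = x0**2 * x1**2 * (x0 + x1) - x1 * y0**4  # must equal x0 * y1^4
            if num % x0 == 0:
                y1 = fourth_root(num // x0)
                if y1 is not None:
                    pts.append((x0, x1, y0, y1))
    return pts


def p4(z):
    return (z * z) * (z * z)


def sigma(p):
    X, Y, Z, W = p
    return (X, 1j * Y, -1j * Z, -W)


def same_point(p, q):
    """Projective equality: all 2x2 minors vanish (exact for small Gaussian integers)."""
    return all(p[i] * q[j] == p[j] * q[i] for i in range(4) for j in range(4))


def on_F(p):
    X, Y, Z, W = p
    return p4(X) + p4(Y) == p4(Z) + p4(W)


# (c) Candidate points of P^3 with coordinates in {0, ±1, ±i}, normalised so that
# the first non-zero coordinate equals 1 (one representative per point).
UNITS = (0, 1, -1, 1j, -1j)
CANDIDATES = [p for p in product(UNITS, repeat=4)
              if any(c != 0 for c in p) and next(c for c in p if c != 0) == 1]
ON_F = [p for p in CANDIDATES if on_F(p)]
FIXED_SIGMA = [p for p in ON_F if same_point(sigma(p), p)]
FIXED_SIGMA2 = [p for p in ON_F if same_point(sigma(sigma(p)), p)]
```

In this search box the points found include $(1 : 1 : 1 : 1)$ and the whole family $(1 : -1 : t : t)$; in every case $|x_0x_1|$ is a square, as the valuation argument predicts.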

3.8.1 A Hilbert Property for Fibrations

As remarked, the HIT leads to statements like Theorem 3.43, where for a generically finite morphism $\pi : X \to \mathbb{A}^n$ without rational sections it is proved that the image $\pi(X(k))$ of the rational points cannot cover the set $\mathbb{A}^n(k) = k^n$ of rational points of the target. One may ask what happens if we take dominant maps $X \to Y$, not necessarily of finite degree. As we shall briefly illustrate below, this question has been implicitly studied in the literature in some interesting special cases. More recently, it was brought to our attention by F. Balestrieri. Further important motivation comes from logic: in fact, the so-called Diophantine sets may be interpreted as the subsets of the set of integral or rational points in $\mathbb{A}^n$ consisting of the points that admit an integral (or rational) pre-image under a dominant morphism from an algebraic variety.


A first natural example to consider is that of a morphism $\pi : S \to \mathbb{P}^1$ from a surface $S$ to the line; we may also assume that the generic fiber is again a line. By Tsen's theorem, or by Castelnuovo's criterion (see e.g. [Beau]), these surfaces are birational to $\mathbb{P}^1 \times \mathbb{P}^1$ over the complex field, and the projection $\pi : S \to \mathbb{P}^1$ admits a section $\theta : \mathbb{P}^1 \to S$ (defined over $\mathbb{C}$). Suppose now that $S$ and $\pi$ are defined over the rationals (or over a given number field $k$). If the section $\theta$ is also defined over $\mathbb{Q}$, clearly $\mathbb{P}^1(\mathbb{Q}) = \pi(S(\mathbb{Q}))$. (If we only assume that $\pi$ is a rational map, we can still conclude that $\pi(S(\mathbb{Q}))$ contains all but finitely many rational points of $\mathbb{P}^1$.) The following question arises: if $\pi(S(\mathbb{Q}))$ is cofinite in $\mathbb{P}^1(\mathbb{Q})$, is it necessarily true that the projection $\pi : S \to \mathbb{P}^1$ admits a section defined over $\mathbb{Q}$? This problem has an affirmative answer (actually in a strong sense), provided by theorems of H. Davenport, D. J. Lewis, and A. Schinzel, which can be viewed as a local-to-global principle, as we now explain.

First of all, the surface $S$ can be viewed as a curve over the function field $\mathbb{Q}(t)$ of $\mathbb{P}^1$. By specializing $t$ to a rational value $t_0 \in \mathbb{Q}$, we obtain a curve over $\mathbb{Q}$. The assumption that $\pi(S(\mathbb{Q}))$ is cofinite in $\mathbb{P}^1(\mathbb{Q})$ means that the specialized curve almost always admits a rational point; the conclusion, i.e. the existence of a section, can be phrased as the existence of a point defined over $\mathbb{Q}(t)$. Hence the statement may be read as: local solvability everywhere implies global solvability. Let us further translate this kind of statement. By assumption, the generic fiber of $\pi$ is rational, so $S$, viewed as a curve over $\mathbb{Q}(t)$, is a curve of genus zero. By a theorem of Hilbert and Hurwitz, such curves always admit a birational model isomorphic to a conic over the ground field. After diagonalizing the quadratic form appearing in the homogeneous equation of the conic, we obtain an equation of the form

$$A(t)X^2 + B(t)Y^2 = Z^2, \tag{3.20}$$

for polynomials $A(t), B(t) \in \mathbb{Q}[t]$. The aforementioned theorem of Davenport, Lewis, and Schinzel reads as follows.

Theorem 3.45 Let $A(t), B(t) \in \mathbb{Q}[t]$ be polynomials and suppose that every arithmetic progression contains an integer $n$ such that the equation $A(n)x^2 + B(n)y^2 = z^2$ admits a solution $(x : y : z) \in \mathbb{P}^2(\mathbb{Q})$. Then Equation (3.20) admits a solution $(x(t) : y(t) : z(t)) \in \mathbb{P}^2(\mathbb{Q}(t))$.

This result has been extended, and related to specializations of Brauer groups, by authors including Colliot-Thélène, Fadeev, and Serre.


For a survey on this theorem and its connections with norm equations, see [Z3].

The situation changes radically if we replace the base $\mathbb{P}^1$ with an elliptic curve: namely, there exist algebraic surfaces $S$ fibered over an elliptic curve $E$, with rational fibers, such that the fiber over each rational point of $E$ contains rational points, but nevertheless the fibration $S \to E$ admits no section defined over $\mathbb{Q}$. Again, this phenomenon is linked with the Chevalley–Weil theorem, with weak approximation, and with the existence of unramified coverings of elliptic curves. We do not pause to consider these aspects, and just give the following concrete example.

Theorem 3.46 Let $E$ be the elliptic curve defined in Weierstrass form as

$$E : y^2 = x^3 + 5x.$$

Then
(1) the set $E(\mathbb{Q})$ of rational points on $E$ is infinite;
(2) for every rational point $(x, y) \in E(\mathbb{Q})$, the equation

$$u^2 + v^2 = xw^2 \tag{3.21}$$

admits a rational solution $(u : v : w) \in \mathbb{P}^2(\mathbb{Q})$;
(3) the above equation admits no solution $(u : v : w) \in \mathbb{P}^2(\mathbb{Q}(E))$ in the function field of $E$ over $\mathbb{Q}$.

Note that, as predicted by Tsen's theorem, there are sections defined over a finite extension of $\mathbb{Q}$, actually already over $\mathbb{Q}(i)$, namely $(u : v : w) = (x + 1 : i(1 - x) : 2)$; indeed $(x + 1)^2 - (1 - x)^2 = 4x$.

Proof Assertion (1) follows from the fact that the point $T = (1/4, 9/8)$ has infinite order for the group law on $E$, after taking the point at infinity as the origin (for instance because, by the Nagell–Lutz theorem, torsion points on this model have integral coordinates, while $T$ does not). To prove assertion (2), write a rational point $(x, y)$ as $(a/d^2, b/d^3)$ with $\gcd(a, d) = \gcd(b, d) = 1$. Also $a \geq 0$, since $x(x^2 + 5) = y^2 \geq 0$. From the equation it follows that $a(a^2 + 5d^4) = b^2$ is a square; since $\gcd(a, a^2 + 5d^4)$ divides 5, this implies that either $a$ or $5a$ is a square. In either case, $x$ is a sum of two squares in $\mathbb{Q}$, because $5 = 1^2 + 2^2$.¹⁰ We now prove that there exist no non-trivial function solutions to (3.21) with $u, v, w \in \mathbb{Q}(E)$. We may suppose $w = 1$ and write $x = u^2 + v^2 = z\bar z$, where $z = u + iv$ and we extend the conjugation to the function field $\mathbb{Q}(i)(E)$ by imposing that it acts trivially on $\mathbb{Q}(E)$. It follows that $(z) + (\bar z) = (x) = 2[A] - 2[O]$, where $A = (0, 0)$ is the rational point of order two.

¹⁰ This calculation is related to the Chevalley–Weil theorem, via the fact that the field extension $\mathbb{Q}(E)(\sqrt{x})/\mathbb{Q}(E)$ is unramified.


It is evident that this implies the following shape for the divisor of $z$:

$$(z) = \sum_P m_P\bigl([\bar P] - [P]\bigr) + [A] - [O],$$

where the sum is extended over representatives of conjugate pairs of non-real complex points of $E$, and the $m_P \geq 0$ are integers, almost all zero. Let us define

$$Q = \sum_P m_P P,$$

where the summation is now done in the group $E$. Since $z$ is defined over $\mathbb{Q}(i)$, also the divisor $\sum_P m_P[P]$ is defined over $\mathbb{Q}(i)$, and therefore $Q \in E(\mathbb{Q}(i))$. Since the divisor of a function sums to $O$ in the group law and $A = -A$, we also have $\bar Q - Q = A$. Since $\bar Q + Q \in E(\mathbb{Q})$, we find $2Q = (\bar Q + Q) - A \in E(\mathbb{Q})$. Note that a system of representatives for $E(\mathbb{Q})$ modulo $2E(\mathbb{Q})$ is given by the four points $O$, $A$, $T$, and $A - T = (20, 90)$. Then we can write $2Q = R + 2S$ for some $R \in \{O, T, A, A - T\}$ and $S \in E(\mathbb{Q})$, and, replacing $Q$ with $Q - S$ (which does not affect the relation $\bar Q - Q = A$), we may assume that $S = O$. To conclude the argument it suffices to check that none of the representatives $T$, $A$, $A - T$ is divisible by 2 in the group $E(\mathbb{Q}(i))$, and to observe that for $R = O$ the point $Q$ would have order at most two; but the points of order two of $E$ other than $A$ have $x = \pm\sqrt{-5} \notin \mathbb{Q}(i)$, so $Q \in \{O, A\}$ and $\bar Q - Q = O \neq A$. We leave to the reader the easy verification concerning $T$, $A$, $A - T$.

To continue with other examples, consider now the following situation: $A$ is an abelian variety (defined, say, over $\mathbb{Q}$) and $f \in \mathbb{Q}(A)$ is a non-constant rational function. One can view $f$ as a dominant rational map $A \dashrightarrow \mathbb{P}^1$. It admits no rational sections, since every rational map $\mathbb{P}^1 \dashrightarrow A$ is necessarily constant. We therefore expect that it is not surjective on rational points. As a consequence of Faltings' theorem, Theorem 3.38, we prove the following assertion.

Theorem 3.47 Let $f : A \dashrightarrow \mathbb{P}^1$ be any non-constant rational map from an abelian variety to the line. There exist infinitely many rational points $p \in \mathbb{P}^1(\mathbb{Q})$ having an empty pre-image (in the domain of $f$).

Proof Let $E$ be any elliptic curve, with infinitely many rational points, that is not isogenous to any abelian subvariety of $A$, and let $g : E \to \mathbb{P}^1$ be any non-constant morphism (we can take e.g. the $x$-coordinate function in a Weierstrass model of $E$). Let us define the variety $X \subset A \times E$ as (the closure of) the variety defined by the equation $f(p) = g(q)$ for $(p, q) \in A \times E$. The variety $X$ is a hypersurface of the abelian variety $A \times E$, and does not contain any translate of an abelian subvariety: in fact, all such translates are either of the form $(p + B) \times \{q\}$, for an abelian subvariety $B$ of $A$ and a point $(p, q) \in A \times E$, or of the form $(p + B) \times E$ for some $p \in A$.


To verify that none of these translates (if of positive dimension) can be contained in $X$, just fix a point in one of the two factors and move the other one. Then, by Faltings' theorem, $X(\mathbb{Q})$ is finite, and so is its projection on $\mathbb{P}^1$. It follows that only finitely many points of the infinite set $g(E(\mathbb{Q}))$ have a rational pre-image under $f$.

Similar considerations may be brought to bear in the context of integral points. An instance occurs in $\mathbb{G}_m^n$, where again we have a finitely generated group of integral points, similar to the Mordell–Weil group. Other instances occur in the theory of Pell equations with parameters. Consider an equation of the form

$$x^2 - A(t)y^2 = 1, \tag{3.22}$$

where $A(t) \in \mathbb{Z}[t]$ is a polynomial that is not a square in $\mathbb{C}[t]$. The above equation defines a surface $S$, fibered over the line via $S \ni (x, y, t) \mapsto t \in \mathbb{A}^1$, whose generic fiber is a hyperbola (i.e. a $\mathbb{Q}$-form of the torus $\mathbb{G}_m$). For infinitely many specializations $t \mapsto t_0 \in \mathbb{Z}$, the integer $A(t_0)$ will be positive and not a square in $\mathbb{Z}$, so the specialized equation has infinitely many integral solutions. However, in general the fibration $S \to \mathbb{A}^1$ admits no section $\mathbb{A}^1 \to S$ other than the "constant" sections $t \mapsto (\pm 1, 0, t)$.

An elliptic analogue of the above situation appears with the Fermat quartic surface defined by Equation (3.19). As remarked, $F$ is endowed with an elliptic fibration $F \to \mathbb{P}^1$ (actually more than one) admitting sections of infinite order. However, not all the rational points on $F$ belong to the image of one of these sections.

In the case of tori, a stronger quantitative result such as Theorem 3.48 below can be derived from recent work of Levin [Lev5] (see Theorem 4.8 below).

Theorem 3.48 Let $k$ be a number field, and let $\varphi \in k(x_1, \ldots, x_n)$ be a rational function. Let $\Gamma \subset \mathbb{G}_m^n(k)$ be a finitely generated group. The number of elements $\alpha \in \varphi(\Gamma)$ of height $\leq T$ is $O((\log T)^{\delta})$, for some positive number $\delta$ depending only on the rank of $\Gamma$.

Proof sketch. We may assume that $\Gamma = (\mathcal{O}_S^*)^n$ and that $\mathcal{O}_S$ is principal. Let us write $\varphi(x_1, \ldots, x_n)$ as a quotient $\varphi = f/g$ of two coprime polynomials in $\mathcal{O}_S[x_1, \ldots, x_n]$. For each $\gamma \in \Gamma$ we can write $f(\gamma) = u_\gamma \cdot a_\gamma$, and similarly $g(\gamma) = v_\gamma \cdot b_\gamma$, where $u_\gamma, v_\gamma$ are $S$-units and $a_\gamma, b_\gamma$ are $S$-integers not divisible by any prime in $S$. We have $T \geq H(\varphi(\gamma)) \geq H(u_\gamma/v_\gamma) \cdot H(a_\gamma/b_\gamma)$. The number of $S$-units of height $\leq T$ is bounded as in the statement. This concludes the proof if $f, g$ are both monomials. According to Theorem 4.8, the remaining factor $H(a_\gamma/b_\gamma)$ is at least $\max(H(a_\gamma), H(b_\gamma))$, up to a gcd which is bounded by $H(\gamma)^{\varepsilon}$. By invoking results of Evertse that have already been mentioned above (and were also obtained by means of the subspace theorem), we have $\max(H(a_\gamma), H(b_\gamma)) \geq H(\gamma)^{c_1}$ for a positive number $c_1$. Hence the values of $\varphi$ on $\Gamma$ of height $\leq T$ are attained at points $\gamma \in \Gamma$ of height $\leq T^{c_2}$ for some positive number $c_2$. As has already been remarked, the number of such elements $\gamma$ grows at most like $(\log T)^{\delta}$ for some $\delta > 0$.

Exercise 3.49 Deduce from the HIT that, for every number field $k$ and every integer $d > 1$, the quotient group $k^*/k^{*d}$ is infinite. (Here $k^{*d}$ denotes the subgroup of $d$th powers in $k^*$.) In the case $k = \mathbb{Q}$, this is an easily seen consequence of the infinitude of primes.

Exercise 3.50 A field $K$ is said to be Hilbertian if the assertion of Theorem 3.41 holds with $k$ replaced by $K$. Use the previous exercise to show that the $p$-adic fields $\mathbb{Q}_p$ are not Hilbertian. Prove that the function fields $k(t)$, with $k$ any field, are Hilbertian.

Exercise 3.51 Deduce from the original version of the HIT that, if all the values at integral points of a polynomial in one variable with integral coefficients are perfect $d$th powers for a fixed number $d > 1$, then such a polynomial is a perfect $d$th power in the ring of polynomials. This result could also be strengthened to a finiteness statement by Siegel's theorem. Also, it can be strengthened further, even in an effective way, using the theory of lower bounds for linear forms in logarithms, by allowing a variable exponent $d$. Namely, given a polynomial $p(X) \in \mathbb{Z}[X]$, if the equation $p(n) = y^d$ has infinitely many solutions $(n, d) \in \mathbb{Z}^2$ with $d > 1$, then either $d = 2$ for all but finitely many solutions and $p$ has at most two (complex) roots of odd order, or for some integer $\delta > 1$, $p$ has all but one root of order divisible by $\delta$. Also, in the latter case, for all but finitely many solutions $(n, d)$, $d$ must divide $\delta$.
This result is due to Schinzel and Tijdeman, see [SchT].

Exercise 3.52 Consider the (elliptic) curve of affine equation

$$y^2 = x(x - 1)(x + 6).$$

It has infinitely many rational points, since the point $(2, 4)$ has infinite order (after taking the point at infinity as the origin). Prove that the square roots of the first coordinates $x$ of its rational points generate a field of finite degree over $\mathbb{Q}$. Prove that this is not the case for the cube roots.
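The explicit arithmetic in the proof of Theorem 3.46, and the claim about the point $(2, 4)$ in Exercise 3.52, can be checked with exact rational arithmetic. The sketch below is an illustration of our own, not taken from the references: it implements the chord-and-tangent law for $y^2 = x^3 + a_2x^2 + a_4x + a_6$ (curves are encoded as the hypothetical tuple `(a2, a4, a6)`) and relies on the generalized Nagell–Lutz theorem, by which torsion points on a Weierstrass model with integer coefficients have integer coordinates, so a point or multiple with non-integral coordinates certifies infinite order. It also writes the $x$-coordinates of a few multiples of $T$ as sums of two rational squares, following the proof of assertion (2).

```python
from fractions import Fraction as Fr
from math import isqrt

# A curve y^2 = x^3 + a2*x^2 + a4*x + a6 is the tuple (a2, a4, a6);
# points are pairs of Fractions, and None stands for the origin O at infinity.


def on_curve(c, P):
    if P is None:
        return True
    a2, a4, a6 = c
    x, y = P
    return y * y == x**3 + a2 * x**2 + a4 * x + a6


def add(c, P, Q):
    """Chord-and-tangent addition on the curve c."""
    if P is None:
        return Q
    if Q is None:
        return P
    a2, a4, _ = c
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and y1 == -y2:
        return None  # P + Q = O (also covers doubling a point of order two)
    if x1 == x2:
        lam = (3 * x1**2 + 2 * a2 * x1 + a4) / (2 * y1)  # tangent slope
    else:
        lam = (y2 - y1) / (x2 - x1)  # chord slope
    x3 = lam**2 - a2 - x1 - x2
    return (x3, -(y1 + lam * (x3 - x1)))


def neg(P):
    return None if P is None else (P[0], -P[1])


def mul(c, n, P):
    R = None
    for _ in range(n):
        R = add(c, R, P)
    return R


def is_square(n):
    return n >= 0 and isqrt(n) ** 2 == n


def as_two_squares(x):
    """For x = a/d^2 with a or 5a a square (proof of (2)), return (u, v), u^2 + v^2 = x."""
    a, d2 = x.numerator, x.denominator
    d = isqrt(d2)
    assert d * d == d2, "x-coordinates of rational points have square denominators"
    if is_square(a):
        return Fr(isqrt(a), d), Fr(0)
    assert is_square(5 * a)
    c = isqrt(5 * a) // 5  # 5a = (5c)^2, so a = 5c^2, and 5 = 1^2 + 2^2
    return Fr(c, d), Fr(2 * c, d)


E = (0, 5, 0)             # y^2 = x^3 + 5x
T = (Fr(1, 4), Fr(9, 8))
A = (Fr(0), Fr(0))        # the rational point of order two
E2 = (5, -6, 0)           # y^2 = x(x - 1)(x + 6) = x^3 + 5x^2 - 6x
P = (Fr(2), Fr(4))
```

For example, `add(E, A, neg(T))` returns the point $(20, 90)$, and doubling $(2, 4)$ on the curve of Exercise 3.52 gives $x = 25/16$, which is not an integer, so $(2, 4)$ has infinite order.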


3.9 Constructing Integral Points on Certain Surfaces

In this short section, we study certain quasi-projective surfaces which do admit a Zariski-dense set of integral points, thus providing a converse to the degeneracy results presented in the last two sections. We start by considering the open surfaces obtained as complements of a curve in the projective plane: according to Vojta's conjecture (see the notes at the end of this chapter), when a (possibly reducible) curve $C \subset \mathbb{P}^2$ has only normal crossing singularities¹¹ (if any) and the degree of $C$ is at least four, the set of integral points on the complement $\mathbb{P}^2 \setminus C$ should be degenerate (i.e. contained in finitely many curves). This fact is proved only when $C$ has at least four components; it is a special case of Theorem 3.23, but it can also be proved just by applying the $S$-unit equation theorem, observing that for each pair of components $D, D'$ of $C$ there exists a rational function vanishing only on $D$ and regular outside $D'$; the values of such a function at the integral points of $\mathbb{P}^2 \setminus C$ are $S$-units; hence one obtains several $S$-unit-valued functions which must be dependent, thus yielding an $S$-unit equation. In contrast, when $C$ is irreducible, such a method cannot work; as we remarked, Theorem 3.21 does not apply either, so the problem remains open.

We now show that the condition on the degree of $C$ is optimal, so that whenever $\deg C \leq 3$ there exists (for a suitable ring of $S$-integers) an infinite set of integral points on $\mathbb{P}^2 \setminus C$. Whenever $C$ is composed of lines, the resulting open surface becomes isomorphic, after a suitable extension of the field of definition, to $\mathbb{A}^2$ (the complement of one line), to $\mathbb{A}^1 \times \mathbb{G}_m$ (the complement of two lines) or to $\mathbb{G}_m^2$ (the complement of three lines in general position). In each case we know that the set of integral points is Zariski-dense. This fact can be explained by the presence of an algebraic group of automorphisms acting transitively on the resulting surface, so that the orbit of a single integral point is already Zariski-dense. We leave to the reader (Exercises 3.56 and 3.57) the proof that the complement of a smooth conic and the complement of the union of a conic and a line contain a Zariski-dense set of $S$-integral points (for a suitable ring of $S$-integers). Also, the complement of a singular cubic, with either a nodal or a cuspidal singularity, contains a Zariski-dense set of integral points (Exercise 3.58). The crucial case is that of the complement of a smooth (hence irreducible) cubic; it was solved by F. Beukers, who proved the following theorem.

¹¹ That is, in a neighborhood of each singular point, $C$ is analytically isomorphic to the curve of equation $xy = 0$.


Theorem 3.53 Let $E \subset \mathbb{P}^2$ be a smooth plane cubic defined over a number field $k$. There exists a ring of $S$-integers $\mathcal{O}_S \subset k$ such that $(\mathbb{P}^2 \setminus E)(\mathcal{O}_S)$ is Zariski-dense.

Actually, Beukers also gives a precise description of the conditions on the set of places $S$ and the number field $k$ under which the set of integral points is Zariski-dense. We give just an idea of the proof, referring to [Be1] or to Section 5.3 of [Co2] for the details.

Sketch of the proof. We follow the presentation in [Co2]. Consider a flexus $O$ of $E$ (which can be supposed to be rational) and the associated involution $\iota$ fixing that flexus (in the standard Weierstrass model, the involution associated with the flexus at infinity is the symmetry $(x, y) \mapsto (x, -y)$). The argument is based on the following geometric fact: for every $P \in E$ there exists a unique conic $C_P$, symmetric with respect to $\iota$, intersecting $E$ only at $P$ and $\iota(P)$ (necessarily with multiplicity three). This fact is proved by dimension counting. Then the points on $C_P$ integral with respect to $E$ are the points of a rational curve integral with respect to two points at infinity; if the ring of $S$-integers has non-trivial units and $C_P$ has an integral point (with respect to $P, \iota(P)$), then $C_P$ has infinitely many integral points. In order to produce infinitely many conics having at least one integral point, we start from an integral point $Q$ on the line $L$ tangent to $E$ at $O$ (since $L \cap E = \{O\}$, it suffices that $Q$ be integral with respect to $O$); one verifies that there exists a unique conic of the form $C_P$ passing through $Q$; $Q$ will be an integral point on that conic, since it is integral with respect to $E$, so $C_P$ contains infinitely many integral points. Then the open surface $\mathbb{P}^2 \setminus E$ contains infinitely many integral points on infinitely many of the curves $C_P$, in particular a Zariski-dense set.
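In the standard Weierstrass model $y^2 = x^3 + ax + b$, with $O$ at infinity and $\iota(x, y) = (x, -y)$, the conics $C_P$ can be written down explicitly. By our own computation (a worked derivation, not quoted from [Be1] or [Co2]), an $\iota$-invariant conic has no $y$ or $xy$ term, and requiring its intersection with $E$ to be cut out by $(x - x_P)^3$ forces

$$C_P : y^2 = 3x_Px^2 - (3x_P^2 - a)x + (x_P^3 + b),$$

since then $(x^3 + ax + b) - \bigl(3x_Px^2 - (3x_P^2 - a)x + x_P^3 + b\bigr) = (x - x_P)^3$. At infinity $C_P$ meets the tangent line $z = 0$ in $(1 : \pm s : 0)$ with $s^2 = 3x_P$, so a given point $(1 : s : 0)$ of $L$ determines $x_P = s^2/3$. The Python sketch checks this identity with exact arithmetic and then illustrates the unit mechanism on the simplest conic with two conjugate points at infinity, $x^2 - 2y^2 = 1$, where multiplying by powers of the unit $3 + 2\sqrt{2}$ turns one integral point into infinitely many. The sample values of $a$, $b$, $x_P$ and the Pell example are arbitrary choices.

```python
from fractions import Fraction as Fr


def beukers_conic(a, b, xP):
    """Coefficients (c2, c1, c0) of the iota-invariant conic y^2 = c2*x^2 + c1*x + c0
    meeting y^2 = x^3 + a*x + b only above x = xP (derivation sketched above)."""
    return 3 * xP, -(3 * xP**2 - a), xP**3 + b


def check_conic_identity(a, b, xP, samples=range(-5, 6)):
    """Check (x^3 + a x + b) - (c2 x^2 + c1 x + c0) == (x - xP)^3 at 11 points,
    which forces equality of these polynomials of degree <= 3."""
    c2, c1, c0 = beukers_conic(a, b, xP)
    return all(
        (x**3 + a * x + b) - (c2 * x**2 + c1 * x + c0) == (x - xP) ** 3
        for x in map(Fr, samples)
    )


def xP_through_tangent_point(s):
    """C_P passes through (1 : s : 0) on the line z = 0 iff s^2 = 3*xP."""
    return Fr(s) ** 2 / 3


def pell_orbit(D, x1, y1, count):
    """Solutions x_n + y_n*sqrt(D) = (x1 + y1*sqrt(D))**n for n = 1..count."""
    sols, x, y = [], 1, 0
    for _ in range(count):
        x, y = x * x1 + D * y * y1, x * y1 + y * x1
        sols.append((x, y))
    return sols
```

For instance, `pell_orbit(2, 3, 2, 3)` returns `[(3, 2), (17, 12), (99, 70)]`, each satisfying $x^2 - 2y^2 = 1$.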
Note that the surface $\mathbb{P}^2 \setminus E$ is not a homogeneous space under the action of any algebraic group, so the density of the integral points cannot be obtained as in the easier examples described above.

We note that the cubic curve $E \subset \mathbb{P}^2$ is a divisor in the anti-canonical class of $\mathbb{P}^2$, which is an ample class. This is an instance of a so-called del Pezzo surface provided with a smooth divisor in its anti-canonical class. The potential density of integral points in the complement of such a divisor can be proved in general, as shown by B. Hassett and Yu. Tschinkel in [HT].

Theorem 3.54 Let $\tilde X$ be a smooth del Pezzo surface and $D$ a smooth divisor such that $-D$ is a canonical divisor, both defined over a number field $k$. Suppose that $\tilde X(k)$ is Zariski-dense. Then there exists a ring of $S$-integers of $k$ such that the set of $S$-integral points on $X = \tilde X \setminus D$ is Zariski-dense.

The proof is inspired by Beukers' argument, and again exploits a suitable family of conics on the surface (i.e. it reduces the problem to solving a parametric family of Pell equations). As an example, one deduces the potential density of integral points on smooth affine hypersurfaces of degree three in $\mathbb{A}^3$.

Another interesting example of an open surface with a Zariski-dense set of integral points comes from the symmetric square of an elliptic curve. Let us describe this example. Let $E/k$ be an elliptic curve with origin $O$ and let $\tilde X$ be the symmetric square of $E$. Then we have canonical maps $E^2 \to \tilde X \to E$, where the second arrow sends $\{P, Q\} \mapsto P + Q \in E$. Since the fibers of this map are isomorphic to $E/\{\pm 1\} \cong \mathbb{P}^1$, $\tilde X \to E$ has the structure of a $\mathbb{P}^1$-bundle over $E$. (It is a general fact in the theory of ruled surfaces that all $\mathbb{P}^1$-bundles over a curve can be obtained by projectivizing a rank-two vector bundle.) On letting $X$ be the quasi-projective open set obtained by removing the image of $\{O\} \times E$ (which equals the image of $E \times \{O\}$) in $\tilde X$, we obtain that the restricted morphism $X \to E$ gives $X$ the structure of an $\mathbb{A}^1$-bundle over $E$ (it is neither a vector bundle nor a principal $\mathbb{G}_a$-bundle). This $X$ is actually an affine variety, because it is the quotient of the affine surface $(E \setminus \{O\})^2$ by a finite group (of order 2). The $k$-rational points on $\tilde X$ correspond to the $k$-quadratic points on $E$. Now, every degree-two map $E \to \mathbb{P}^1$ (defined over $k$) produces infinitely many quadratic points, just as pre-images of rational points of $\mathbb{P}^1$. Similarly, every quadratic point $P \in E(\bar k)$ which is integral on $E \setminus \{O\}$ gives rise to an integral point in $X(k)$. However, for a fixed map of this kind, the rational points in the symmetric square obtained in this way all lie on a curve (depending on the map).
So, to obtain in this way a Zariski-dense set we must consider infinitely many such maps. We shall prove that, on varying the map, we may indeed achieve a Zariski-dense set of rational points, and, moreover, we may ensure that they are integral for $X$. In particular, we shall prove the following theorem.

Theorem 3.55 Suppose that $E(k)$ is infinite. Then (in a suitable integral model) the integral points in the symmetric square of $E \setminus \{O\}$ are Zariski-dense.


Concerning the model of $X$, we may start for instance from a Weierstrass affine model of $E \setminus \{O\}$, $y^2 = x^3 + ax + b$, so that every specialization $x \mapsto x_0 \in k$ provides a pair of conjugate quadratic points, which are integral if $x_0$ is integral. These points are sent to the origin $O$ of $E$ by the canonical map $X \to E$, hence they do not form a Zariski-dense set in $X$.

Proof of the theorem. According to the last remarks, there are infinitely many integral points in the fiber over $O$ of the canonical projection $X \to E$. We want to prove that there are infinitely many integral points in the fiber over any point of $2E(k)$; since $2E(k)$ is infinite, this yields a Zariski-dense set. Suppose we have enlarged $S$ so that $E$ has good reduction outside $S$ (and the above Weierstrass equation has $S$-integral coefficients). Let then $P = (x_P, y_P) \in E(k)$ (again with respect to the Weierstrass equation). We look for a "trivial" quadratic point $Q = (u, \sqrt{u^3 + au + b})$, $u \in k$, such that $P + Q$ is integral. (Note that this entails that the point $\{P + Q, (P + Q)'\} \in \tilde X$, where the prime denotes the quadratic conjugate, is integral in $X(k)$ and lies in the fiber above $2P$, since $(P + Q)' = P - Q$.) Note that $P + Q$ is not integral at some place $\wp$ if and only if $P \equiv -Q$ modulo $\wp$. So, we want this not to happen at any place outside $S$. We can suppose (up to enlarging $S$ once again) that the ring $\mathcal{O}_S$ is a unique factorization domain and write $x_P = \alpha/\beta$ for coprime $S$-integers $\alpha, \beta$. Then we just choose $u = \gamma/\delta$, where $\gamma, \delta$ are $S$-integers such that $\delta\alpha - \beta\gamma = 1$: then $x_P$ and $u$ are distinct modulo every place outside $S$, so $P \not\equiv -Q$, and we are done. Since the solutions $(\gamma, \delta)$ of this equation form an infinite family, we obtain infinitely many integral points in each such fiber. It can be checked that the proof works in particular over $\mathbb{Z}$, without enlarging $S$.

In the next three exercises, further classes of affine surfaces will be shown to possess a Zariski-dense set of integral points.

Exercise 3.56 Let $C \subset \mathbb{P}^2$ be a smooth conic over the field of complex numbers. Prove that the group of projective transformations leaving $C$ invariant acts transitively on the complement.
Deduce that, when $C$ is defined over a number field $k$, there exists a ring of $S$-integers in $k$ such that the $S$-integral points on $\mathbb{P}^2 \setminus C$ are Zariski-dense.

Exercise 3.57 Let $C \subset \mathbb{P}^2$ be a smooth conic and $L \subset \mathbb{P}^2$ a line not tangent to $C$. Let $P \in C \cap L$ be an intersection point. Let $M$ be the line tangent to $C$ at $P$ and let $X := \mathbb{P}^2 \setminus (C \cup L \cup M)$. Let $\Lambda \cong \mathbb{P}^1$ be the pencil of lines through $P$ and consider the map $f : X \to \Lambda \setminus \{L, M\}$ sending a point $Q$ to the line joining $Q$ to $P$. Prove that the fibers of $f$ are isomorphic to $\mathbb{G}_m$ and note that $\Lambda \setminus \{L, M\}$ is also isomorphic to $\mathbb{G}_m$. Deduce that $X(\mathcal{O}_S)$ is Zariski-dense for a suitable ring of $S$-integers $\mathcal{O}_S$.

Exercise 3.58 Using the same idea as in the previous exercise, prove that the complement of a nodal cubic curve in the plane admits a Zariski-dense set of integral points. The same is true of the complement of a cuspidal cubic curve, and the proof is similar.
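The integrality step in the proof of Theorem 3.55 can be tested numerically over $\mathbb{Z}$. In the sketch below (an illustration of our own, using the curve $y^2 = x^3 + 5x$ of Theorem 3.46 and its points $T = (1/4, 9/8)$ and $(20, 90)$ as arbitrary test cases), $x_P = \alpha/\beta$, and the pairs $(\gamma, \delta)$ with $\delta\alpha - \beta\gamma = 1$ are produced from a modular inverse. For $Q = (u, \sqrt{D})$ with $u = \gamma/\delta$ and $D = u^3 + au + b$, the addition law gives $x(P + Q) = R - 2y_P\sqrt{D}/(x_P - u)^2$ with $R = \bigl((x_P + u)(x_Pu + a) + 2b\bigr)/(x_P - u)^2$. The code checks that the trace $2R$ and the norm $R^2 - 4y_P^2D/(x_P - u)^4$ are rational integers, which means $x(P+Q)$ and its conjugate $x(P-Q)$ are algebraic integers. The helper names are hypothetical.

```python
from fractions import Fraction as Fr


def bezout_family(alpha, beta, count):
    """Pairs (gamma, delta) with delta*alpha - beta*gamma == 1 and delta != 0.
    Assumes beta > 0 and gcd(alpha, beta) == 1, as for a reduced fraction."""
    delta0 = pow(alpha, -1, beta) if beta > 1 else 0  # delta0*alpha = 1 mod beta
    gamma0 = (delta0 * alpha - 1) // beta              # exact division
    pairs, k = [], 0
    while len(pairs) < count:
        # shifting by (k*alpha, k*beta) keeps delta*alpha - beta*gamma == 1
        gamma, delta = gamma0 + k * alpha, delta0 + k * beta
        if delta != 0:
            pairs.append((gamma, delta))
        k += 1
    return pairs


def trace_norm_of_sum(a, b, P, u):
    """Trace and norm of x(P + Q), Q = (u, sqrt(u^3 + a*u + b)), on y^2 = x^3 + a*x + b."""
    xP, yP = P
    D = u**3 + a * u + b
    dx2 = (xP - u) ** 2
    R = ((xP + u) * (xP * u + a) + 2 * b) / dx2
    return 2 * R, R**2 - 4 * yP**2 * D / dx2**2


def integral_sums(a, b, P, count=4):
    """List of (u, trace, norm) for the first `count` choices u = gamma/delta."""
    xP = P[0]
    out = []
    for gamma, delta in bezout_family(xP.numerator, xP.denominator, count):
        u = Fr(gamma, delta)
        tr, nm = trace_norm_of_sum(a, b, P, u)
        out.append((u, tr, nm))
    return out
```

For $P = T$ the first choice is $u = 0$, which gives the rational point $Q = A$ and $x(T + A) = 20$; the next choice $u = 1/5$ gives $x(P + Q) = 909 - 36\sqrt{630}$, with trace 1818 and norm 9801.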

3.10 Exercises

Exercise 3.59 Let $f(X), g(X) \in \mathbb{Z}[X]$ be non-constant coprime polynomials. Show that there exist only finitely many integers $n \in \mathbb{Z}$ such that $f(n) \mid g(n)$ in the ring of integers. Hence, in the case $\mathcal{O}_S = \mathbb{Z}$, the aforementioned finiteness result on divisibility between values of polynomials can be strengthened. This is an instance of Runge's theorem (see [BoG] for a general statement) and holds also over the ring of integers of imaginary quadratic fields.

Exercise 3.60 Consider the polynomials $f_1(X, Y) = X$, $f_2(X, Y) = Y$, $f_3(X, Y) = 1 + X - Y$, $g_1(X, Y) = Y$, $g_2(X, Y) = X$, $g_3(X, Y) = 1$. Show that, if the group of units $\mathcal{O}_S^*$ is infinite, there exists a Zariski-dense set of integral points $(x, y) \in \mathcal{O}_S^2 = \mathbb{A}^2(\mathcal{O}_S)$ such that $f_i(x, y) \mid g_i(x, y)$ for $i = 1, 2, 3$. Deduce that the condition that no three of the six polynomials $f_i, g_j$ share a common zero cannot be eliminated in Theorem 3.26.

Exercise 3.61 Deduce Thue's original theorem from Theorem 3.29.

Exercise 3.62 Prove that, given a ring of $S$-integers with infinitely many units, the equation $u - tv = 1 - t$ has infinitely many solutions $(t, u, v) \in \mathcal{O}_S \times \mathcal{O}_S^* \times \mathcal{O}_S^*$. Show also that they form a Zariski-dense subset of the surface defined by the above equation. Conclude that the condition on the degrees of $a(t), b(t), c(t)$ appearing in the assumptions of Theorem 3.27 cannot be omitted.

Exercise 3.63 Let $C$ be a smooth projective conic, and let $L_1, L_2$ be two non-tangent lines intersecting at a point of the conic, all defined over a number field $k$. Let $X$ be the complement of $C \cup L_1 \cup L_2$ in the projective plane. Prove that there exists a ring of $S$-integers $\mathcal{O}_S \subset k$ such that $X(\mathcal{O}_S)$ is Zariski-dense in $X$. (Hint: use the previous exercise.)


Exercise 3.64 Prove the following (partial) converse to Siegel's theorem: Let $\tilde C/\mathbb{Q}$ be a projective non-singular curve of genus zero and let $C$ be an affine open subset such that $r := \#(\tilde C \setminus C) \leq 2$. Then there exist a number field $k$ and a finite set $S \subset M_k$ such that $C(\mathcal{O}_{k,S})$ is infinite. (Hint: show that, over a suitable number field $k$, $\tilde C$ is isomorphic to $\mathbb{P}^1$ and $k[C] \cong k[t]$ or $k[C] \cong k[t, 1/t]$, according to whether $r = 1$ or $r = 2$.)

Exercise 3.65 In the notation of Exercise 3.64, show that, if $r = 1$ and $C(\mathbb{Z})$ is infinite, then there exist positive constants $c, \alpha$ such that, asymptotically,

$$\#\{P \in C(\mathbb{Z}) : H(P) \leq X\} \sim cX^{\alpha} \quad \text{for } X \to \infty.$$

Show also that, if $r = 2$, then $\#\{P \in C(\mathbb{Z}) : H(P) \le X\} \ll \log X$. (Hint: obtain polynomial and exponential parametrizations for the integral points. For the second part, it will help to show that the points at infinity are defined over a quadratic field over $\mathbb{Q}$; this will relate the question to the units in a quadratic field, bringing to bear a Pell equation.)

Exercise 3.66 Let $f \in \mathbb{Z}[X, Y]$ be such that $f(X, p(X)) \neq 0$ for every $p \in \mathbb{Q}[X]$. Let $\Sigma$ be the set of integers $n \in \mathbb{N}$ such that $f(n, Y) = 0$ has at least one integer solution. Prove that $\mathbb{N} \setminus \Sigma$ is infinite and that actually (Siegel, Fried) $\#\{n \in \Sigma : n \le X\} \ll \sqrt{X}$, where the “exponent” 1/2 is the best possible one. (Hint: apply Siegel’s theorem to the components of the curve $f(X, Y) = 0$. By making use of Exercise 3.65, in the notation of that exercise, one can assume that $r = 1$.) See [Se1], Section 9.7, and Exercise 1.48 for the case $f(X, Y) = Y^2 - f(X)$. See also [Sch1], pp. 309–310, for references and [DTZ], [Sch1], Theorem 51, p. 321, for related results.

Exercise 3.67 (i) Let $f \in k[X, Y]$ be homogeneous without multiple factors. Prove that $f(X, Y) + c$ is irreducible for all $c \in k^*$. (Hint: consider the polynomial in $T$ over $k(X, Y)$ given by $T^d(f(X/T, Y/T) + c)$, where $d = \deg f$.) (ii) Deduce Thue’s theorem from Siegel’s theorem. (Siegel’s two finiteness criteria, i.e. on the genus and on the points at infinity, suffice.)
Note. Schinzel [Sch2] has applied Siegel’s theorem to prove that, if $k = \mathbb{Q}$ and $d \ge 3$, then $f(X, Y) + g(X, Y) = 0$ has only finitely many integer solutions provided $f$ is not a power of a linear or quadratic factor (up to a constant) and $\deg g < d$.

Exercise 3.68 (i) Let $D$ be the sum of $r$ hyperplanes in $\mathbb{P}_n$ in general position (i.e. any $n + 1$ of them have an empty intersection). Prove that, if $r = n + 1$, then, for suitable $k, S$, there exist sets of quasi-S-integral points in $V := \mathbb{P}_n \setminus D$ which are Zariski-dense in $V$. (ii) Find examples with $r = n + 2$ with infinite sets of quasi-S-integral points. (Now they cannot be Zariski-dense, in view of Theorem 3.20.)

Exercise 3.69 Prove that the number $\dim V + \rho + 1$ in Theorem 3.20 cannot be lowered without supplementary assumptions. (Consider e.g. $V = \mathbb{P}_1 \times \mathbb{P}_1$. The same example shows that the inequality $r \ge 4$ in part (a) of Theorem 3.22 is not by itself sufficient to conclude the argument.)

Exercise 3.70 (i) Prove that the symmetric product $\mathbb{P}_1^{(2)}$ is isomorphic to $\mathbb{P}_2$ (but $\mathbb{P}_1^2$ is not). (Consider the map from $\mathbb{P}_1 \times \mathbb{P}_1$ to $\mathbb{P}_2$ defined by $(u : v) \times (u' : v') \mapsto (uu' : uv' + u'v : vv')$.) (ii) Use (i) and Theorem 3.20 to prove Theorem 3.34 in the case $\tilde C = \mathbb{P}_1$.

Exercise 3.71 Let $f \in k[X_0, \dots, X_n]$ be homogeneous, defining a divisor $D$ in $\mathbb{P}_n$. Let $\Sigma$ be a quasi-S-integral set for $\mathbb{P}_n \setminus D$. Prove that one can find a finite set $S' \supset S$ and, for all $P \in \Sigma$, a vector $x(P) \in \mathcal{O}_{S'}^{n+1}$ representing $P$ in $\mathbb{P}_n$, such that $f(x(P)) \in \mathcal{O}_{S'}^*$. (Hint: choose a finite $S' \supset S$ so that $\mathcal{O}_{S'}$ is a principal ideal domain, then choose good coordinates for $P \in \Sigma$ and observe that the $X_i^d / f(X)$ are regular outside $D$.)

Exercise 3.72 Let $V$ be a “general” hypersurface of degree $d$ in $\mathbb{P}_{n+1}$, not containing a given point $Q$, and let $D \subset \mathbb{P}_n$ be the branch locus of the projection of $V$ from $Q$. Prove that the S-integral points of $\mathbb{P}_n \setminus D$ are contained in a subvariety of dimension $\le \max(0, n + 2 - d)$. (Hint: assuming $Q = (0 : \cdots : 0 : 1)$, express $V$ by $F = 0$, where $F$ is monic in $X_{n+1}$. Then view $D$ as a discriminant hypersurface $f = 0$ and use Exercise 3.71; now use Hermite’s theorem on the finiteness of number fields with bounded degree and discriminant in $\mathcal{O}_S^*$ to conclude that all the inverse images of the integral points lie in a given number field; this is similar to the use of the Chevalley–Weil theorem. Finally, factor $f$ as a product of differences of “roots” $\rho_i - \rho_j$ and apply Theorem 2.4, with $n = 2$, to the identities $(\rho_i - \rho_l) = (\rho_i - \rho_j) + (\rho_j - \rho_l)$.)

Exercise 3.73 With the notation being as usual, let $f \in k[X, Y]$ be an absolutely irreducible polynomial, monic in $Y$, and suppose that there are infinitely many $(x, y) \in \mathcal{O}_S^* \times \mathcal{O}_S$ such that $f(x, y) = 0$. Prove that there exist a positive integer $m$ and a polynomial $p \in \bar{k}[T, U]$ such that $f(X^m, p(X, X^{-1})) = 0$.
(Hint: if the set of zeros and poles of $x$ and poles of $y$, as functions on the curve $f = 0$, altogether has at least three points, then we have finiteness by Siegel’s theorem. Hence we may suppose that there are at most two points in this set, and then exactly two, and that the curve has genus zero. If $t$ is a function with a simple pole and a simple zero at those points, then $x$ must be a power of $t$ and $y$ a polynomial in $t, t^{-1}$; the conclusion follows.
Alternatively, we may view the curve in question as embedded in $\mathbb{G}_m \times \mathbb{A}_1$ and having infinitely many S-integral points. The set of points at infinity with respect to its completion in $\mathbb{P}_1 \times \mathbb{P}_1$ contains the fiber above $0, \infty$ with respect to the first projection. On changing $X$ into $X^m$ for a suitable $m$, each component of the resulting curve is unramified above $0, \infty$ on the first projection. Since at least one of the components has infinitely many integral points, there can be at most two points at infinity. But this implies that the degree of the corresponding projection is 1, whence the conclusion. The argument also shows that we can take $p \in k[T, U]$.)
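As a numerical illustration of the case $r = 2$ of Exercise 3.65 (not a solution of it), here is a minimal Python sketch for the conic $x^2 - 2y^2 = 1$, a genus-zero curve whose two points at infinity are defined over $\mathbb{Q}(\sqrt{2})$, as in the hint. We assume, for illustration only, the height $H(P) = \max(|x|, |y|)$ and count only solutions with $x, y \ge 0$. The sketch generates the solutions from powers of the unit $3 + 2\sqrt{2}$ (an exponential parametrization) and compares them with a brute-force search; the counts grow roughly like $\log X / \log(3 + 2\sqrt{2})$.

```python
import math
from math import isqrt

UNIT = 3 + 2 * math.sqrt(2)  # fundamental unit of Z[sqrt(2)], of norm 1

def pell_solutions(bound):
    """Solutions (x, y), x, y >= 0, of x^2 - 2y^2 = 1 with max(x, y) <= bound.

    They are the coefficients of (3 + 2*sqrt(2))^n, n = 0, 1, 2, ...,
    i.e. an exponential parametrization of these integral points."""
    sols = []
    x, y = 1, 0
    while max(x, y) <= bound:
        sols.append((x, y))
        # multiply x + y*sqrt(2) by 3 + 2*sqrt(2)
        x, y = 3 * x + 4 * y, 2 * x + 3 * y
    return sols

def brute_force(bound):
    """The same solutions, found by testing every y in [0, bound]."""
    sols = []
    for y in range(bound + 1):
        x2 = 1 + 2 * y * y
        x = isqrt(x2)
        if x * x == x2 and x <= bound:
            sols.append((x, y))
    return sols

for k in range(1, 5):
    bound = 10 ** k
    sols = pell_solutions(bound)
    assert sols == brute_force(bound)
    print(bound, len(sols), round(math.log(bound) / math.log(UNIT), 2))
```

The number of points up to height $X$ increases by one each time $X$ is multiplied by about $3 + 2\sqrt{2}$, which is the logarithmic growth the exercise asks for.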

3.11 Notes

In the language of arithmetic varieties (see e.g. [L3], Part VII) the integral points are those which do not meet infinity; namely, the absence of denominators implies that the reduction modulo $p$ is never infinite, for any prime $p$. This interpretation is essentially the same as the one given in Section 3.1 above in terms of reduction. No effective version of Siegel’s theorem is known at present, except in some special cases, when Baker’s theory applies. For instance, one can compute the integral points when the curve has genus 0 or 1, and also in the superelliptic case, i.e. when a defining equation has the shape $Y^m = f(X)$ (see [B], Chapter IV, or [Se1]). One may also reduce to Baker’s theory the case of a Galois cover of the affine line; this may be reformulated by saying that, given a polynomial $f \in k[X, Y]$, one can compute (or parametrize, as the case may be) the integers $a \in \mathcal{O}_S$ such that $f(a, Y)$ has all of its solutions¹² in $k$. This was found by Yu. Bilu (see [Bilu], also for further effective criteria) and (later) by Dvornicich and Zannier (see e.g. [Z2]). For solutions in rational integers, the effective Runge theorem is sometimes available (see [Bo3], [GS]). However, it is always possible to decide whether a given curve has infinitely many integral points (by Siegel’s theorem this boils down to the case of genus 0). Alternative proofs of Siegel’s theorem have been provided by A. Robinson and P. Roquette [RoRo], using the language of non-standard analysis, and by C. Gasbarri [Ga], using a generalization of Dyson’s method in Diophantine approximation.

¹² Rather than merely one solution.

Schinzel [Sch1], p. 50, and Bilu and Tichy [BiT] have applied Siegel’s theorem to the classification of “separated-variables” equations $G(X) = H(Y)$ with infinitely many integral solutions. In the striking analogy pointed out by Vojta in [Vo1] (see also [Vo5]), Siegel’s theorem parallels the fact that there exists a non-constant holomorphic map from $\mathbb{C}$ to an affine curve $C$ only if $C$ has genus 0 and at most two points at infinity; this is an extension of the celebrated little Picard theorem that a non-constant meromorphic function on $\mathbb{C}$ assumes all but at most two values (including $\infty$; see [Fo], p. 213). With this in mind, note also that, when $C$ is the affine line (one point at infinity), the polynomial functions are holomorphic maps from $\mathbb{C}$ to $C$; correspondingly, the integral points on $C$ admit a polynomial parametrization. Similarly, when $C$ is the affine line deprived of the origin (two points at infinity), the exponential functions (but not the polynomials) are holomorphic maps from $\mathbb{C}$ to $C$; and now the integral points on $C$ admit exponential parametrizations. This analogy can be pursued, at least conjecturally, to higher dimensions. It is conjectured that, given a quasi-projective variety $V$ defined over a number field, the following statements are equivalent.
(1) There exists a holomorphic map $\mathbb{C} \to V(\mathbb{C})$ with Zariski-dense image.
(2) There exist a number field $k$ containing the field of definition of $V$ and a ring of S-integers $\mathcal{O}_S \subset k$ such that $V(\mathcal{O}_S)$ is Zariski-dense.
This has been proved for curves, by comparing the theorems of Siegel and Faltings with the aforementioned little Picard theorem. Broad general conjectures on integral points, involving suitable geometric invariants of the relevant varieties, are due to Lang and Vojta; see [L3], [Vo1], [Vo4], [BoG], [HiSi], and [Co2]. Their analogues for holomorphic maps are due to Green and Griffiths. A particular case of Vojta’s conjecture, containing all the results presented in this chapter, reads as follows.
Vojta’s conjecture Let $V$ be a smooth quasi-projective variety defined over a number field $k$, and let $\tilde V$ be a smooth projective completion of $V$ such that $D := \tilde V \setminus V$ is a (reduced) normal crossing divisor. Let $K$ be a canonical divisor for $\tilde V$. If $D + K$ is big¹³ then no set of S-integral points is Zariski-dense. Again, this is settled in dimension one by the theorems of Siegel and Faltings.

¹³ A divisor $A$ on a complete variety $\tilde V$ is said to be big if $h^0(\tilde V, \mathcal{O}(nA)) \gg n^{\dim \tilde V}$; equivalently, a multiple of $A$ is linearly equivalent to the sum of an ample and an effective divisor.

The requirement that $D$ has normal crossing singularities (if any) cannot be omitted: see, for instance, the example arising in Exercise 3.63. Results on the “density” of integral points (somewhat like in Exercise 3.65 above) are due e.g. to Schmidt [S6] (who treats hypersurfaces), to Bombieri and Pila [BoP] (also for non-algebraic curves) and to D. R. Heath-Brown [H-B]; the estimates are remarkably uniform with respect to the coefficients of the relevant equations. Recent results concerning the Markov surface (see also Section 4.7) have been obtained by A. Ghosh and P. Sarnak in [GhS]. A. Gamburd, M. Magee, and R. Ronan considered similar hypersurfaces in higher dimensions in [GMR]. These density results can be viewed as limit cases of Vojta’s conjecture. As already remarked, Laurent’s Theorem 2.7 clarifies the structure of integral points on subvarieties of $\mathbb{G}_m^n$; little is known already for subvarieties of $\mathbb{A}_1 \times \mathbb{G}_m^n$: see Exercise 3.73 above for the case $n = 1$, Chapter 4 below for results in (rather) special cases (e.g. Theorem 4.18), and the notes to Chapter 4 for a relevant conjecture. The conjecture presented in Section 3.5.1 can be viewed as the arithmetic analogue of the “unicity theorem” of [CoNo], asserting that, given an abelian variety $A$ and an ample divisor $D \subset A$, the pair $(A, D)$ can be recovered from the set $f^{-1}(D)$, where $f : \mathbb{C} \to A$ is any holomorphic map with Zariski-dense image.

4 Diophantine Equations with Linear Recurrences

4.1 Linear Recurrences

Linear recurrences have an ancient tradition in number theory. Their prototype is the Fibonacci sequence (defined by $F_0 = 0$, $F_1 = 1$ and, for $n \ge 2$, $F_n = F_{n-1} + F_{n-2}$), but the polynomials and the exponential functions on $\mathbb{N}$ also fall within this realm. Questions like “When is $F_n$ a square?” have been asked for a long time (see e.g. [Mor]). While this question was answered long ago, and can be treated e.g. via Siegel’s theorem on curves, it was only recently that Y. Bugeaud, M. Mignotte, and S. Siksek [BMS] managed to determine all the perfect powers among the Fibonacci numbers: they proved that 0, 1, 8, and 144 are the only ones. In this chapter we shall investigate this kind of problem for more general linear recurrences. Let us start by recalling a few fundamental definitions and algebraic facts (see [vdP1], and also [ShT] or [S4] for an ample overview). A sequence $\{f(n)\}_{n \in \mathbb{N}}$ of complex numbers is called a linear recurrence (or sometimes just a recurrence) if there exist $a_0, \dots, a_{r-1} \in \mathbb{C}$ ($r \ge 1$), with $a_0 \neq 0$, such that
$$f(n + r) = a_0 f(n) + a_1 f(n + 1) + \cdots + a_{r-1} f(n + r - 1) \qquad \text{for all } n \in \mathbb{N}.$$

The minimum integer $r$ with this property is called the order of the recurrence. Let us introduce the generating function, i.e., the formal power series
$$F(X) = \sum_{n=0}^{\infty} f(n) X^n.$$
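For the Fibonacci sequence ($r = 2$, $a_0 = a_1 = 1$) the rationality of $F(X)$ can be checked on a truncation. The minimal Python sketch below (helper names are ours) multiplies the first $N$ coefficients of $F(X)$ by $1 - X - X^2$; every coefficient of the product below $X^N$ is exact, and they reduce to the polynomial $X$, so that $F(X) = X/(1 - X - X^2)$.

```python
def fibonacci(count):
    """First `count` Fibonacci numbers F_0, F_1, ..."""
    f = [0, 1]
    while len(f) < count:
        f.append(f[-1] + f[-2])
    return f[:count]

def multiply_truncated(p, q, count):
    """Coefficients of p(X) * q(X) up to X^(count - 1)."""
    out = [0] * count
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j < count:
                out[i + j] += a * b
    return out

N = 30
F = fibonacci(N)
denominator = [1, -1, -1]  # 1 - a_1 X - a_0 X^2 with a_0 = a_1 = 1
product = multiply_truncated(denominator, F, N)
print(product[:4])  # [0, 1, 0, 0]: the product is the polynomial X
```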

One verifies at once that, for $n \ge 0$, the coefficient of $X^{n+r}$ in the product $(1 - a_{r-1}X - \cdots - a_0 X^r) F(X)$ is $f(n + r) - a_{r-1} f(n + r - 1) - \cdots - a_0 f(n)$, which vanishes; hence the product is a polynomial (of degree $\le r - 1$) and $F(X)$ is a rational function (vanishing at $\infty$). Conversely, the Laurent coefficients of the expansion (at 0) of a rational function coincide with a recurrence from a certain point onwards. On writing
$$1 - a_{r-1}X - \cdots - a_0 X^r = \prod_{i=1}^{s} (1 - \rho_i X)^{m_i},$$
where $\rho_1, \dots, \rho_s$ are the distinct roots of the polynomial $X^r - a_{r-1}X^{r-1} - \cdots - a_0$, the partial fraction decomposition of $F(X)$ immediately shows (Exercise) that there exists an expression, which is essentially unique, of the type
$$f(n) = \sum_{i=1}^{s} c_i(n) \rho_i^n \qquad \forall n \in \mathbb{N}, \tag{4.1}$$

where the $c_i \in \mathbb{C}[X]$ are non-zero polynomials and the $\rho_i \in \mathbb{C}^*$ are distinct. Conversely, the right-hand side of (4.1) defines a recurrence sequence. The $\rho_i$ are called the roots of the recurrence; they are roots of the polynomial $X^r - a_{r-1}X^{r-1} - \cdots - a_0$. The right-hand side of (4.1) is also called an exponential polynomial.¹ The recurrence is said to be simple if all the $c_i(n)$ are constant (the exponential polynomial is then called a power sum), and non-degenerate if no ratio of distinct roots is a root of unity. (We agree that the zero recurrence is degenerate.) In general, we shall say (differently from other authors) that a recurrence $f$ is defined over $k$ if $c_i \in k[X]$ and $\rho_i \in k^*$ in (4.1). Note that for $f$ to be defined over $k$ it is not sufficient that the values $f(n)$ lie in $k$; if this is the case, however, $f$ is defined over a finite extension of $k$ (e.g. the “roots” of the Fibonacci sequence are $(1 \pm \sqrt{5})/2$). In what follows we shall mainly deal with recurrences defined over $\bar{\mathbb{Q}}$. Anyway, many results may be reduced to this case by specialization. Namely, the field generated by the roots $\rho_i$ and by the coefficients of the $c_i$ is a finitely generated extension of $\mathbb{Q}$, in practice the function field of a certain algebraic variety defined over $\mathbb{Q}$. An algebraic point on this variety defines a specialization of the roots and the coefficients, producing a recurrence over $\bar{\mathbb{Q}}$. For a simple recurrence $f$ defined over a number field $k$, (4.1) shows that all the values $f(n)$ are expressible as sums of a bounded number of S-units, for a suitable finite set $S \subset M_k$: it suffices that the $c_i$ and $\rho_j$ all lie in $\mathcal{O}_S^*$. This observation already shows why Diophantine approximation, in particular the results of Chapter 2, may be relevant in studying recurrences.

¹ However, an exponential polynomial is often thought of as a function on the whole of $\mathbb{C}$.
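For the Fibonacci sequence, (4.1) is Binet’s formula: $s = 2$, $\rho_{1,2} = (1 \pm \sqrt{5})/2$ and constant coefficients $c_1 = -c_2 = 1/\sqrt{5}$, so the recurrence is simple, and it is non-degenerate since $\rho_1/\rho_2 = -\rho_1^2$ is not a root of unity. The minimal Python sketch below compares the exponential polynomial, evaluated in floating point, with the integer recurrence; floating-point rounding limits such a check to moderate $n$, which is why it stops at $n = 40$.

```python
import math

SQRT5 = math.sqrt(5)
RHO1 = (1 + SQRT5) / 2  # the roots of X^2 - X - 1
RHO2 = (1 - SQRT5) / 2

def fib_binet(n):
    """Right-hand side of (4.1) for Fibonacci: c_1 = 1/sqrt(5), c_2 = -1/sqrt(5)."""
    return (RHO1 ** n - RHO2 ** n) / SQRT5

def fib_iter(n):
    """F_n computed directly from F_n = F_{n-1} + F_{n-2}."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# the values are integers although the roots and coefficients are irrational
for n in range(40):
    assert round(fib_binet(n)) == fib_iter(n)
print(fib_iter(12), round(fib_binet(12)))  # 144 144
```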


From the above-recalled uniqueness of the expression (4.1) we deduce that an exponential polynomial vanishes for all $n \in \mathbb{N}$ only if its defining expression is empty. Below, we shall see a much more precise result on the structure of the zeros, i.e., of the integers $n \in \mathbb{N}$ such that $f(n) = 0$ (see also Section 2.5, Exercise 2.16). Recurrences have the important property that, if $f(n)$ is a recurrence defined over $k$ and if $q, r \in \mathbb{Z}$, then $f(qn + r)$ is of the same type (as a function of $n$). This simple fact, which immediately follows from (4.1), often proves to be quite useful. The expression (4.1) also shows that the recurrences defined over $k$ form a ring. We shall establish in a moment a simple and useful result on its structure, anticipating a few remarks (see e.g. [vdP1] for a more complete theory). For a recurrence $\{f(n)\}$ as in (4.1), the group generated in $\mathbb{C}^*$ by its roots is relevant. Conversely, given a subgroup $G \subset k^*$, we may focus on the ring $R_k[G]$ of the recurrences defined over $k$ all of whose roots lie in $G$. That $R_k[G]$ is in fact a ring is again clear from (4.1). To study it better, let us consider a few simplifications and normalizations. To start with, if we deal with a finite number of recurrences, we can often suppose that $G$ is finitely generated. Given this assumption, we may further easily reduce to the case in which $G$ is torsion-free. In fact, let $q \ge 1$ be the (finite!) order of the torsion subgroup of $G$. If $\{f(n)\} \in R_k[G]$, let us consider the $q$ recurrences $f_r(n) := f(qn + r)$, for $r = 0, \dots, q - 1$; using (4.1) once more, we see that each $f_r \in R_k[[q]G]$, where $[q]G = \{g^q : g \in G\}$. Also, it is clear that the recurrence $f$ is completely reconstructed from our knowledge of all the $f_r$. Further, for a given $r$, the map $f \mapsto f_r$ is a ring homomorphism from $R_k[G]$ to $R_k[[q]G]$. One can even verify (see Exercise 4.27 below) that by combining all these $q$ homomorphisms one obtains an isomorphism $R_k[G] \cong (R_k[[q]G])^q$.
In conclusion, it is sufficient for many purposes to study $R_k[[q]G]$; the advantage is that $[q]G$ is torsion-free, as in the following easy exercise.

Exercise 4.1 Let $G$ be an abelian group with finite torsion subgroup of order $q$. Prove that $[q]G$ is torsion-free.

In the present context, for torsion-free groups the following holds.

Proposition 4.2 Let $G \subset k^*$ be torsion-free, of finite rank $t \ge 1$. The ring $R_k[G]$ is isomorphic to $k[X, T_1, \dots, T_t, T_1^{-1}, \dots, T_t^{-1}]$, where the isomorphism is induced by $X \mapsto \{n\}$, $T_i \mapsto \{g_i^n\}$, and $g_1, \dots, g_t$ are (any) given independent generators for $G$. In particular, $R_k[G]$ is a unique factorization domain.

The proof is easy. First, it is clear that the map in the statement is a surjective ring homomorphism. That it is injective immediately follows from the fact that the functions on $\mathbb{N}$ given by $n \mapsto n$ and $n \mapsto g_i^n$ are algebraically independent; in turn, the verification of this claim (Exercise) reduces at once to the multiplicative independence of the $g_i$ and to the above-mentioned fact that a non-zero exponential polynomial cannot vanish for all $n \in \mathbb{N}$. (See also [Rum], paper II.)

Remark One may relate $R_k[G]$ to the group algebra $k[\mathbb{Z} \oplus G]$. However, the present notation should not be confused with that for group algebras.

This proposition transfers many algebraic verifications on recurrences to the case of (Laurent) polynomials; for instance, one can speak of “coprime recurrences”; also, the quotient of two recurrences is again a recurrence if and only if there is divisibility between the corresponding polynomials, and similarly for a recurrence that is a perfect power of another recurrence. Note also that to obtain the stated isomorphism we are free to choose a basis $g_1, \dots, g_t$ for $G$, and often some special choice may lead to a simplification. Finally, observe that a non-zero recurrence having roots in a torsion-free group is automatically non-degenerate. Linear recurrences arise naturally in many situations: we have already mentioned Taylor expansions of rational functions; a second occurrence of linear recurrences is the iteration of endomorphisms of finite-dimensional vector spaces or, in other words, the powering of square matrices. Given a $d \times d$ matrix $A = (a_{i,j})_{i,j}$ with entries in any field, its powers $A^n = (a_{i,j}(n))_{i,j}$ are expressed by $d^2$ sequences $n \mapsto a_{i,j}(n)$. It turns out that each of these sequences is eventually recurrent; actually, if $\det A \neq 0$, these sequences are all recurrent and the minimal recurrence relation they satisfy is associated with a divisor of the characteristic polynomial of $A$.
In fact, on letting $p_A(T) := \det(T \cdot I - A) \in k[T]$ be the characteristic polynomial, from the fact that $p_A(A) = 0$ it follows that, for all $n \ge 0$,
$$A^{n+d} = a_1 A^{n+d-1} + \cdots + a_d A^n,$$
where $a_1 = \operatorname{Tr}(A), \dots, a_d = \pm\det(A)$ are the invariants of $A$. This means precisely that the above relation holds for every entry of $A^n, A^{n+1}, \dots, A^{n+d}$. Linear recurrences also appear as counting functions for the number of rational points on an algebraic variety over a finite field. Denoting by $\mathbb{F}_q$ the finite field with $q$ elements, let $X/\mathbb{F}_q$ be an algebraic variety defined over $\mathbb{F}_q$. For each natural number $n$, we can consider the finite set $X(\mathbb{F}_{q^n})$ of points of $X$ with coordinates in $\mathbb{F}_{q^n}$. We then obtain the integer sequence $n \mapsto \#X(\mathbb{F}_{q^n})$. It was proved by Dwork, after preliminary work by Hasse and Weil, that such sequences are always linear recurrences. In terms of generating functions, this is often expressed as the rationality of the so-called zeta function attached to the algebraic variety $X$:
$$Z(t) = \exp\Big(\sum_{n \ge 1} \frac{\#X(\mathbb{F}_{q^n})}{n}\, t^n\Big). \tag{4.2}$$
(On taking the logarithmic derivative, one can see that the rationality of $Z(t)$ implies the rationality of the generating function $\sum_{n \ge 1} \#X(\mathbb{F}_{q^n})\, t^n$.) In the case of a smooth projective curve of genus $g$, the recurrence has order $2g + 2$. The last two examples of occurrences of linear recurrences in apparently distant fields of mathematics are actually naturally related, as shown by Weil and Grothendieck. The link is provided by viewing the rational points over $\mathbb{F}_{q^n}$ on a variety $X$ defined over $\mathbb{F}_q$ as those points in $X(\bar{\mathbb{F}}_q)$ which are fixed by the $n$th iterate of the Frobenius endomorphism. Since every endomorphism of a (projective) algebraic variety acts linearly on the (finite-dimensional) cohomology spaces, Weil’s idea was to apply a substitute of the Lefschetz trace formula in this context. This trace formula was shown to hold in a cohomology theory introduced by Grothendieck. In Section 4.4 we shall apply some general arithmetic results on linear recurrences to the particular recurrences arising from algebraic varieties (especially algebraic groups) over finite fields.
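To see the recurrence of order $2g + 2$ in the simplest non-trivial case, take $g = 1$. For an elliptic curve $E/\mathbb{F}_p$ it is classical (Hasse, Weil) that $\#E(\mathbb{F}_{p^n}) = p^n + 1 - (\alpha^n + \beta^n)$, where $\alpha + \beta = a := p + 1 - \#E(\mathbb{F}_p)$ and $\alpha\beta = p$; the roots of the recurrence are $1, p, \alpha, \beta$. The minimal Python sketch below, for the (arbitrarily chosen) curve $y^2 = x^3 + x + 1$ over $\mathbb{F}_5$, counts points by brute force over $\mathbb{F}_5$ and over $\mathbb{F}_{25}$, the latter realized as $\mathbb{F}_5[t]/(t^2 - 2)$ since 2 is not a square mod 5, and compares the second count with the value predicted from the first.

```python
P = 5
C = 2  # t^2 = C; C is not a square mod P, so F_P[t]/(t^2 - C) is the field F_{P^2}

def add(a, b):
    """Sum of a0 + a1*t and b0 + b1*t in F_P[t]/(t^2 - C)."""
    return ((a[0] + b[0]) % P, (a[1] + b[1]) % P)

def mul(a, b):
    """Product of a0 + a1*t and b0 + b1*t, using t^2 = C."""
    a0, a1 = a
    b0, b1 = b
    return ((a0 * b0 + C * a1 * b1) % P, (a0 * b1 + a1 * b0) % P)

ONE = (1, 0)

def count_points(field):
    """Projective points of y^2 = x^3 + x + 1: affine solutions plus the point at infinity."""
    affine = 0
    for x in field:
        rhs = add(add(mul(mul(x, x), x), x), ONE)
        affine += sum(1 for y in field if mul(y, y) == rhs)
    return affine + 1

F_P = [(u, 0) for u in range(P)]                      # F_5 inside F_25
F_P2 = [(u, v) for u in range(P) for v in range(P)]   # all of F_25

n1 = count_points(F_P)
a = P + 1 - n1  # alpha + beta, where alpha * beta = P

def power_sum(n):
    """s_n = alpha^n + beta^n via the recurrence s_n = a*s_{n-1} - P*s_{n-2}."""
    if n == 0:
        return 2
    s_prev, s = 2, a
    for _ in range(n - 1):
        s_prev, s = s, a * s - P * s_prev
    return s

def predicted(n):
    """Predicted number of points over F_{P^n}."""
    return P ** n + 1 - power_sum(n)

n2 = count_points(F_P2)
print(n1, n2, predicted(2))  # 9 27 27
```

Brute force over $\mathbb{F}_{p^n}$ quickly becomes expensive, whereas the recurrence gives all the counts from the single number $\#E(\mathbb{F}_p)$.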

4.2 Zeros of Recurrences

The problem of describing the $n \in \mathbb{N}$ such that $f(n) = 0$ is classical. Simple examples like $f(n) = 1 + (-1)^n$ show that this set may be infinite if $f$ is degenerate, even if $f \neq 0$. The non-degenerate case is, in contrast, far from obvious in general; one can reduce to this case by the process illustrated in Section 4.1, that is, by partitioning $\mathbb{N}$ into a finite number of suitable arithmetic progressions. When $f$ is defined over $\mathbb{R}$ and has positive roots, a simple inductive method relying on Rolle’s theorem even shows (Exercise 4.35) that the number of (real) zeros of $f$ is bounded by its order (see also [GL], p. 221, Lemma 1); in general, however, this approach does not apply. A simple case occurs when in (4.1) there exist an absolute value $\nu$ on $k$ and a unique root which is maximal for $\nu$. One then speaks of a dominant root (for $\nu$); this assumption substantially simplifies many problems on recurrences. In this case, if the dominant root is, say, $\rho_1$, we find at once that $|f(n)|_\nu \gg |\rho_1|_\nu^n \cdot n^{-d}$, proving the finiteness of the zeros and much more. Without any of these assumptions, in Exercise 2.16(i) we recalled, as a “hint,” a well-known approach to this problem, relying on Theorem 2.4, for simple and non-degenerate recurrences defined over a number field (one can achieve this last assumption by elementary means; see [S4], Sections 9 and 10, or [BoMuZ]); one uses the fact, already observed, that the algebraic numbers $\rho_i^n$ are S-units, for a suitable finite set $S$ independent of $n$. With a little more effort, the same method applies to non-simple recurrences as well, since the polynomial growth of the coefficients does not greatly affect the estimates involved in the application of the subspace theorem; we leave the details as an exercise. This approach is the best suited for the problem, since it yields both (a) the estimate $|f(n)|_\nu \gg_\varepsilon |\rho_1|_\nu^n \exp(-\varepsilon n)$, for any $\varepsilon > 0$ (even if $\rho_1$ is not dominant), and (b) uniform quantitative conclusions which seem to lie outside the range of other methods. For details and much more on this approach see the paper [E3] and the recent book [EG] by Evertse and Győry. We shall not pause on these points, but instead illustrate a p-adic method, which is substantially elementary (being in a way an extension of the aforementioned method for the reals), leading to the following elegant result, which has been proved at various levels of generality by several authors.

Theorem 4.3 (Skolem, Mahler, and Lech, [vdP1], [S4], [Z6]) The set of zeros of a recurrence $f$ is the union of a finite set with a finite union of arithmetic progressions. If $f$ is non-degenerate, it is a finite set.

Sketch of proof. Let $p$ be a prime number; it is easy to see that, since $\mathbb{Q}_p$ has infinite transcendence degree over $\mathbb{Q}$, it is possible to embed a field of definition $k$ for $f$ (given by (4.1)) in a finite extension $L_p$ of $\mathbb{Q}_p$ (see [Se2], p. 61).
Moreover, if $p$ is large enough, this may be done with the additional property that all the roots $\rho_i$ are p-adic units. (If we assume that the recurrence is defined over a number field $k$, it suffices to embed $k$ in the completion $k_v$, for a place $v \in M_k$ such that $|\rho_i|_v = 1$ for all $i$.) Now, let $q$ be the cardinality of the residue field of $L_p$. Put $Q = q^2(q - 1)$ and $\lambda_i = \rho_i^Q$. Then $\lambda_i \equiv 1 \pmod{p^2}$. For $\lambda \equiv 1 \pmod{p^2}$, one can consider the logarithm $\log \lambda := \sum_{j=1}^{\infty} (-1)^{j-1} (\lambda - 1)^j / j$, the series being convergent, since $|\lambda - 1|_p < p^{-1}$. For $x \in L_p$, let $E(x) = \exp(x \log \lambda) = \sum_{j=0}^{\infty} (x \log \lambda)^j / j!$. This is an analytic function of the p-adic variable $x$ for $|x|_p \le 1$; moreover, $E(n) = \lambda^n$ for $n \in \mathbb{N}$. (See e.g. [DGS] or [L1], Chapter IX, for these elementary facts.)
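A minimal numerical illustration of this step, in the simplest setting $L_p = \mathbb{Q}_5$ (so $q = 5$ and $Q = q^2(q - 1) = 100$) with the 5-adic unit $\rho = 2$; both choices are ours. The Python sketch checks that $\lambda = \rho^Q \equiv 1 \pmod{p^2}$, that $\lambda^{p^j} \equiv 1 \pmod{p^{j+2}}$, and hence that $\lambda^n \bmod p^k$ depends only on $n \bmod p^{k-2}$. This p-adic continuity of $n \mapsto \lambda^n$ is what the analytic interpolation $E(x)$ makes precise.

```python
P = 5                # the prime p; we work in L_p = Q_5, whose residue field has q = 5 elements
q = P
Q = q * q * (q - 1)  # Q = q^2 (q - 1) = 100
rho = 2              # a 5-adic unit (not divisible by 5)
lam = rho ** Q       # lambda = rho^Q, an ordinary (large) integer

# lambda ≡ 1 (mod p^2)
assert lam % P ** 2 == 1

# lambda^(p^j) ≡ 1 (mod p^(j+2)), i.e. |lambda^(p^j) - 1|_p <= p^-(j+2)
for j in range(6):
    assert pow(lam, P ** j, P ** (j + 2)) == 1

# consequence: n -> lambda^n mod p^k depends only on n mod p^(k-2)
k = 6
for n in range(50):
    assert pow(lam, n, P ** k) == pow(lam, n + P ** (k - 2), P ** k)
print("checks passed for p =", P, "and Q =", Q)
```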


Using this in (4.1), with $\lambda = \lambda_i$, $i = 1, \dots, s$, we see that, for every fixed integer $R \in \{0, 1, \dots, Q - 1\}$, the function
$$f_R(n) := f(nQ + R) = \sum_{i=1}^{s} c_i(nQ + R)\, \rho_i^R \lambda_i^n$$
is the restriction to $\mathbb{N}$ of a p-adic function which is analytic for $|x|_p \le 1$. Suppose now that $f_R$ has infinitely many zeros in $\mathbb{N}$. Since the set of p-adic integers in $L_p$ is compact and contains $\mathbb{N}$, these zeros have some limit point. But then $f_R$ must be identically zero (there is an easy proof, as in the classical case). Therefore, some of the functions $f_R$ will be identically zero (corresponding to the vanishing of $f$ on the whole respective progressions $\{Qn + R\}_{n \in \mathbb{N}}$), while each of the remaining functions $f_R$ will have a finite number of zeros. This proves the first part of the theorem. Suppose now that the recurrence $f \neq 0$ vanishes on the whole progression $\{Qn + R\}_{n \in \mathbb{N}}$, namely that $f_R$ is zero. Define $\Sigma = \{\rho_1^Q, \dots, \rho_s^Q\}$; we have
$$f(Qn + R) = \sum_{\sigma \in \Sigma} \Big( \sum_{i :\, \rho_i^Q = \sigma} c_i(Qn + R)\, \rho_i^R \Big) \sigma^n =: \sum_{\sigma \in \Sigma} C_\sigma(n)\, \sigma^n = 0 \qquad \forall n \in \mathbb{N}.$$

We must have $C_\sigma(n) = 0$ identically, for every $\sigma \in \Sigma$. Since, however, the $c_i(n)$ are non-zero, there will appear at least two terms $c_i(Qn + R)\rho_i^R$ corresponding to each $\sigma$; in particular, there exist $i \neq j$ such that $\rho_i^Q = \rho_j^Q$, whence $\rho_i/\rho_j$ is a root of unity and $f$ is degenerate. The proof just sketched yields an estimate for the number of zeros and progressions in terms of (4.1). Many papers have been devoted to the problem of an optimal estimate (see [Be2], [ShT]). For instance, Schmidt, confirming a well-known longstanding conjecture, has shown [S5] that the number of zeros and vanishing arithmetic progressions is bounded only in terms of the order of the recurrence, independently of the coefficients and roots. The Skolem–Mahler–Lech theorem admits a natural formulation and generalization in the language of (commutative) algebraic groups. Using the already-mentioned relation between linear recurrence sequences and powers of square matrices, one can reformulate Theorem 4.3 as follows.

Theorem 4.4 Let $g \in \mathrm{GL}_d(\mathbb{C})$ be a non-singular square matrix, and let $X \subset \mathrm{GL}_d$ be an algebraic variety. Let $\Gamma \subset \mathrm{GL}_d(\mathbb{C})$ be the cyclic group generated by $g$ and let $\bar\Gamma \subset \mathrm{GL}_d$ be its Zariski closure. If the intersection $\Gamma \cap X$ is infinite, then $X$ contains an irreducible component of $\bar\Gamma$.

We leave it to the reader (see Exercises 4.28–4.31) to prove the equivalence between the above statement and Theorem 4.3. As shown in [Z6], the same statement holds over an arbitrary algebraic group $G$ instead of the linear group $\mathrm{GL}_d$. The main point in its proof is the same as in the Skolem–Mahler–Lech theorem: working p-adically one can identify locally $G(\mathbb{Q}_p)$ with a neighborhood of the origin in its tangent space at the identity element. Then the fact that $\Gamma \cap X$ is infinite amounts to the infinitude of the intersection of the “logarithms” of powers of $g$, which form an additive subgroup of a ball in $\mathbb{Q}_p^{\dim G}$, with a p-adic analytic subvariety of $\mathbb{Q}_p^{\dim G}$. The details are left as exercises (see Exercises 4.31–4.33).

An application. We conclude this section with a nice application, due to Skolem (see [BS], also for a rather more general analysis), to the finiteness of integer solutions of a cubic Thue equation $X^3 - dY^3 = c$ ($d, c \in \mathbb{Z}$). We sketch the argument. For an integer solution $p, q \in \mathbb{Z}$, we have $N_{k/\mathbb{Q}}(p - \delta q) = c$, where $\delta$, supposed here to be irrational, is a real cube root of $d$ and $k = \mathbb{Q}(\delta)$. By elementary arguments from algebraic number theory we conclude that $p - q\delta = \zeta\varphi$, where $\varphi$ lies in a finite set independent of $p, q$ and $\zeta$ is a unit in $\mathcal{O}_k^*$ (observe that the ideal $(p - \delta q)\mathcal{O}_k$ has only finitely many possibilities). Now, by Dirichlet’s unit theorem recalled in Section 1.2.2, $\mathcal{O}_k^*$ is a group of rank 1 (there are two places in $M_{k,\infty}$), whence one can write $\zeta = g^n\theta$, for a fixed generator $g$ of the free part of $\mathcal{O}_k^*$, an integer $n$ depending on $p, q$, and a root of unity $\theta = \theta(p, q)$ lying in a finite set (actually, $\theta \in \{\pm 1\}$). Upon conjugating the resulting equation we obtain the three equations $p - q\delta^{(i)} = g_i^n \theta_i \varphi_i$, $i = 1, 2, 3$, whence, eliminating $p, q$, we finally obtain an equation $a_1 g_1^n + a_2 g_2^n + a_3 g_3^n = 0$, where the $a_i$ have finitely many (non-zero) possibilities. Now, if the Thue equation had infinitely many solutions, we could pass to an infinite subset of them and assume that $a_1, a_2, a_3$ are fixed. Then the recurrence $f(n) = a_1 g_1^n + a_2 g_2^n + a_3 g_3^n$ would have infinitely many zeros. However, it is easy to verify that it is non-degenerate, which contradicts Theorem 4.3.
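The elimination of $p, q$ can be made explicit. Writing $x_i := p - q\delta^{(i)}$ (a label we introduce here), the three conjugates satisfy a linear relation whose coefficients do not depend on $p, q$, namely the classical identity behind Siegel’s approach to Thue equations. In the derivation below, $(i, j, l)$ runs over the cyclic permutations of $(1, 2, 3)$; the coefficients $a_i$ are non-zero because the conjugates $\delta^{(i)}$ are distinct and $\theta_i \varphi_i \neq 0$.

```latex
\begin{aligned}
\sum_{(i,j,l)} \bigl(\delta^{(j)} - \delta^{(l)}\bigr)\, x_i
  &= p \sum_{(i,j,l)} \bigl(\delta^{(j)} - \delta^{(l)}\bigr)
   - q \sum_{(i,j,l)} \delta^{(i)} \bigl(\delta^{(j)} - \delta^{(l)}\bigr)
   = p \cdot 0 - q \cdot 0 = 0, \\
\text{hence, with } x_i = g_i^n \theta_i \varphi_i:\qquad
  a_1 g_1^n + a_2 g_2^n + a_3 g_3^n &= 0, \qquad
  a_i := \bigl(\delta^{(j)} - \delta^{(l)}\bigr)\, \theta_i \varphi_i \neq 0 .
\end{aligned}
```

Since $\theta_i$ and $\varphi_i$ range over finite sets, the $a_i$ take only finitely many values, as used in the argument above.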

4.3 Quotients of Recurrences and gcd Estimates

For recurrences $f, g$, the quotient $f(n)/g(n)$ is not a recurrence in general; its generating function is often called the Hadamard quotient of the respective generating functions. A necessary condition for it to be a recurrence (or, better, for it to coincide with a recurrence when $g(n) \neq 0$) is of course that all the values $f(n)/g(n)$, $n \in \mathbb{N}$, lie in a finitely generated ring (where we require that $f(n) = 0$ whenever $g(n) = 0$). It was apparently Pisot (see [vdP1]) who conjectured the converse implication, whereas it was van der Poorten [vdP2] who obtained a general proof, after an incomplete argument had been presented by Pourchet [Po]. See also the detailed exposition in Rumely’s paper [Rum], describing among other things a method of specialization to reduce to the case when $f, g$ are defined over a number field. If the recurrence $g$ admits a dominant root (see Section 4.2) and if we assume $f(n), g(n), f(n)/g(n) \in \mathbb{Z}$ for all $n \in \mathbb{N}$, an elementary approach is possible (see Exercise 4.36 below for an instance). The general case is much more delicate, and the ingenious proof by Pourchet and van der Poorten relies on an intricate auxiliary construction and certain p-adic estimates. However, even this method leaves open the natural question of what can be said when $f(n)/g(n)$ lies in $\mathbb{Z}$, or in a prescribed finitely generated ring $R$, merely for infinitely many $n \in \mathbb{N}$. The specialization argument works also with this weaker hypothesis but, when $f, g$ are defined over $\mathbb{Q}$, it is crucial to assume that all the values (not merely infinitely many of them) lie in $R$ in order for that approach to work. The problem corresponds to the finiteness of the number of solutions to (semi-linear) Diophantine equations of the shape $f(n) = m g(n)$, where $f, g$ are recurrences and $m \in \mathcal{O}_{k,S}$. An answer comes from [CZ1], Theorem 1, for the situation when $f, g$ are simple non-degenerate recurrences defined over $\mathbb{Q}$. In this case, with the aid of the subspace theorem it is established that if $f(n)/g(n) \in \mathbb{Z}$ for infinitely many $n \in \mathbb{N}$, then $f/g$ is a recurrence. The restriction to the non-degenerate case is immaterial, as we have pointed out in Section 4.1; and the method of [CZ1] often works even over $\bar{\mathbb{Q}}$. In the general case, however, it is crucial for that method that $g$ admits a dominant root (which is automatically the case when the roots lie in $\mathbb{Q}$). The assumption about the dominant root is finally eliminated in [CZ3]. In particular, the following result was proved.

Theorem 4.5 ([CZ3], Theorem 1) Let $\mathcal{O}_S$ be a ring of S-integers, and let $f, g$ be linear recurrences with values in $\mathcal{O}_S$.
If $f(n)/g(n) \in \mathcal{O}_S$ for infinitely many $n \in \mathbb{N}$, then there exist a non-zero polynomial $P(n)$ and positive integers $q, r$ such that both $P(n) f(qn + r)/g(qn + r)$ and $g(qn + r)/P(n)$ are recurrences.

When $g$ admits a dominant root, the idea of the method is to approximate $1/g$ by a power sum via an expansion in geometric series. For instance,
$$\frac{1}{3^n + 5^n + 1} = \frac{1}{5^n} \sum_{j=0}^{\infty} \left( \frac{-3^n - 1}{5^n} \right)^{j}.$$
On truncating the series we obtain the aforementioned approximation by sums of units, to which the subspace theorem is applied. When there is no dominant root, such an expansion is not possible. However, one can obtain an expansion by isolating all the terms of maximal absolute value. The difficulty is that the expansion so obtained is not made of S-units. On multiplying by monomials in the dominant roots of $g$ we can, however, obtain several linear combinations of units to which the subspace theorem can be applied. Often (e.g. when $g$ is simple) one can take $P = 1$, but this is not generally the case, as shown by examples like $f(n) = 2^n$, $g(n) = n^d$ and $f(n) = 2^n - 2$, $g(n) = n$; in the latter case, $f(n)/g(n)$ is an integer whenever $n$ is a prime (by Fermat’s little theorem), hence for a fairly dense set in $\mathbb{N}$. In [CZ3], Appendix, a density conclusion is shown in this direction, which very easily yields another kind of sharpening of van der Poorten’s theorem. The conclusion about the progression $\{qn + r\}$ cannot in general be improved by choosing the modulus $q = 1$ (look e.g. at the case $f(n) = 2^n + 1$, $g(n) = 2^n + (-1)^n$). However, $q$ may be chosen as in Section 4.1, namely such that the roots of $f(qn + r)$, $g(qn + r)$ generate a torsion-free group (see [CZ3], Theorem 2). Roughly speaking, these results say that a divisibility relation between infinitely many pairs of values $f(n), g(n)$ may always be explained by algebraic identities (except for a polynomial factor). In other words, if we do not have divisibility in the ring of recurrences, then there is no divisibility between the values, with at most a finite number of exceptions. Note that Proposition 4.2 allows one to check divisibility in the appropriate ring for any given $f, g$. Actually, the method of [CZ1] (or [CZ3]) yields, more precisely, a nontrivial bound for the cancellation in the quotient $f(n)/g(n)$, i.e., for $\gcd(f(n), g(n))$. In some cases, like $(a^n - 1)/(b^n - 1)$, it is possible to get an almost best-possible conclusion in this direction: in the authors’ joint paper [BuCZ] with Y. Bugeaud the following theorem is proved.

Theorem 4.6 Let $a, b > 1$ be multiplicatively independent integers and let $\varepsilon > 0$ be a real number.
Then, for all but finitely many $n \in \mathbb{N}$,
$$\gcd(a^n - 1, b^n - 1) < \exp(\varepsilon n).$$

See Exercise 4.37 below for a polynomial analogue of this theorem. Note that, for fixed $a, b$ and varying $n$, the numbers $a^n$ and $b^n$ are $S$-units in $\mathbb{Q}$, for a fixed finite set of places $S$. It is then tempting to see what can be said about greatest common divisors of pairs of numbers of the form $(u - 1, v - 1)$, for units $u, v$ in a fixed group of $S$-units $\mathcal{O}_S^*$. In order to work over arbitrary number fields, we give the following (natural) definition of the greatest common divisor in a ring of $S$-integers: given two $S$-integers $\alpha, \beta$, not both zero, we define
$$\log\gcd_S(\alpha, \beta) = -\sum_{\nu \notin S}\min\bigl(0, \log\max(|\alpha|_\nu, |\beta|_\nu)\bigr),$$
where the sum runs over the places $\nu$ of $k$ outside $S$. In the case $\mathcal{O}_S = \mathbb{Z}$, we obtain the usual notion of (logarithmic) greatest common divisor. With this convention, Proposition 2 in [CZ10] reads as follows.

Theorem 4.7 Let $k$ be a number field, let $S \subset M_k$ be a finite set of places, and let $\varepsilon > 0$. All but finitely many solutions $(u, v) \in (\mathcal{O}_S^*)^2$ to the inequality
$$\log\gcd(u - 1, v - 1) > \varepsilon\max(h(u), h(v))$$
lie in a finite union of proper algebraic subgroups of $\mathbb{G}_m^2$ defined by an equation $u^a = v^b$, with $(a, b) \in \mathbb{Z}^2 \setminus \{0\}$ and $\max(|a|, |b|) \le \varepsilon^{-1}$.

In particular, if we restrict the discussion to multiplicatively independent pairs of $S$-units, we have the same estimate $\log\gcd(u - 1, v - 1) = o(\max(h(u), h(v)))$ as for the case $(u, v) = (a^n, b^n)$ in Theorem 4.6. For simplicity, we shall give the proof only in the case $u, v \in \mathbb{Z}$.

Proof of Theorem 4.7. Let $\Sigma$ denote the set of solutions $(u, v)$ of the above inequality. By symmetry we can assume that $|v| \ge |u|$ for $(u, v) \in \Sigma$. Let $d(u, v) > 0$ be the reduced denominator of the fraction $(u - 1)/(v - 1)$; since $\gcd(u - 1, v - 1) > |v|^{\varepsilon}$, we have $d(u, v) \le 2|v|^{1-\varepsilon}$ for $(u, v) \in \Sigma$. Write, for an integer $j \ge 1$ and integers $c_j(u, v) \in \mathbb{Z}$,
$$z_j = z_j(u, v) = u^{j-1}\,\frac{u - 1}{v - 1} = \frac{c_j(u, v)}{d(u, v)}.$$
Then fix an integer $h \ge 0$ (to be specified later) and observe the approximation
$$\frac{1}{v - 1} = \frac{1}{v(1 - v^{-1})} = \frac{1}{v}\sum_{r=0}^{\infty}\frac{1}{v^r} = \sum_{r=1}^{h}\frac{1}{v^r} + O\bigl(v^{-(h+1)}\bigr).$$

For an integer $j \ge 1$ we obtain, on multiplying by $u^{j-1}(u - 1)$,
$$z_j + \sum_{s=1}^{h}\frac{u^{j-1}}{v^s} - \sum_{r=1}^{h}\frac{u^j}{v^r} = O\bigl(|u|^j|v|^{-(h+1)}\bigr). \tag{4.3}$$

We shall apply the subspace theorem, Theorem 2.3, on viewing the left-hand side as a “small” linear form in the variables $z_j$, $u^{j-1}/v^s$, $u^j/v^r$. We shall consider $k$ such linear forms, where $k > 0$ is a fixed, sufficiently large integer, to be chosen later. Put $n = hk + h + k$; it will prove convenient to denote $n$-dimensional vectors by writing
$$X = (X_1, \ldots, X_n) = (W_1, \ldots, W_k, Y_{01}, \ldots, Y_{0h}, \ldots, Y_{k1}, \ldots, Y_{kh}).$$

With this notation, let us choose linear forms over $\mathbb{Q}$ as follows. For $i = 1, \ldots, k$, let us set
$$L_{i\infty}(X) = W_i + Y_{i-1,1} + \cdots + Y_{i-1,h} - Y_{i1} - \cdots - Y_{ih},$$
while, for $(i, \nu) \notin \{(1, \infty), \ldots, (k, \infty)\}$, we put $L_{i\nu}(X) = X_i$. Observe that for every $\nu \in S$ the linear forms $L_{1\nu}, \ldots, L_{n\nu}$ are in fact linearly independent. Further, for a pair $(u, v) \in \Sigma$, define $x = x(u, v) = (x_1, \ldots, x_n)$ by
$$x = d(u, v)\,v^h\bigl(z_1, \ldots, z_k, v^{-1}, \ldots, v^{-h}, uv^{-1}, \ldots, uv^{-h}, \ldots, u^k v^{-1}, \ldots, u^k v^{-h}\bigr).$$
Note that in fact $x \in \mathbb{Z}^n$.

In order to apply Theorem 2.3, we shall estimate the double product $\prod_{\nu \in S}\prod_{i=1}^{n}|L_{i\nu}(x)|_\nu$. Observe at once that for $i > k$ we have $\prod_{\nu \in S}|L_{i\nu}(x)|_\nu \le d(u, v)$: in fact, for $i > k$, $L_{i\nu}(x)$ equals the coordinate $x_i$, which has the form $d(u, v)t_i$, where $t_i = t_i(u, v)$ is an $S$-unit. The assertion then follows from the product formula $\prod_{\nu \in S}|t_i|_\nu = 1$ and from $\prod_{\nu \in S}|d(u, v)|_\nu \le |d(u, v)|_\infty = d(u, v)$. Therefore
$$\prod_{\nu \in S}\prod_{i=1}^{n}|L_{i\nu}(x)|_\nu \le d(u, v)^{n-k}\prod_{\nu \in S}\prod_{i=1}^{k}|L_{i\nu}(x)|_\nu = d(u, v)^{n-k}\prod_{i=1}^{k}|L_{i\infty}(x)|\prod_{p \in S}\prod_{i=1}^{k}|x_i|_p. \tag{4.4}$$

Moreover, for $i \le k$ we have $x_i = d(u, v)v^h z_i = c_i(u, v)v^h$, whence $\prod_{p \in S}|x_i|_p \le |v|^{-h}$. Further, from (4.3) it follows that $|L_{i\infty}(x)| = O(d(u, v)|u|^i|v|^{-1})$, again for $i \le k$. On plugging these estimates into (4.4) we obtain
$$\prod_{\nu \in S}\prod_{i=1}^{n}|L_{i\nu}(x)|_\nu = O\bigl(d(u, v)^{n-k}|v|^{-hk}\,d(u, v)^k|u|^{k^2}|v|^{-k}\bigr) = O\bigl(d(u, v)^n|u|^{k^2}|v|^{-hk-k}\bigr). \tag{4.5}$$

Recall that $d(u, v) \le 2|v|^{1-\varepsilon}$, whence, after a few calculations with (4.5) (using $n - hk - k = h$), we find the estimate
$$\prod_{\nu \in S}\prod_{i=1}^{n}|L_{i\nu}(x)|_\nu = O\bigl(|u|^{k^2}|v|^{h}|v|^{-\varepsilon n}\bigr).$$
(Note that the implied constants depend only on $S, h, k$, not on the integers $u, v$ in question.) Choose now once and for all the integer $k$ so that $\varepsilon k > 2$. With such a choice we have $\varepsilon n > 2h$, whence $|v|^{\varepsilon n - h} > |v|^h$. Therefore we obtain $\prod_{\nu \in S}\prod_{i=1}^{n}|L_{i\nu}(x)|_\nu = O(|u|^{k^2}|v|^{-h})$. Finally, let us choose the integer $h$ so that $h > 1 + k^2$, giving (since $|u| \le |v|$)
$$\prod_{\nu \in S}\prod_{i=1}^{n}|L_{i\nu}(x)|_\nu = O(|v|^{-1}). \tag{4.6}$$

On the other hand, since $d(u, v) \le 2|v|$, we see immediately that $\max|x_i| \le 2|v|^{h+k+1}$. From (4.6) we then deduce that, if $|v|$ is large enough,
$$\prod_{\nu \in S}\prod_{i=1}^{n}|L_{i\nu}(x)|_\nu < (\max|x_i|)^{-\frac{1}{h+k+2}}.$$
From Theorem 2.3 we now obtain that the vectors $x$ in question all lie in a certain finite union of proper subspaces $\Lambda_1, \ldots, \Lambda_m$ of $\mathbb{Q}^n$. Hence, it will suffice to prove the conclusion for the pairs corresponding to a fixed $\Lambda_l$, say with equation
$$\zeta_1 W_1 + \cdots + \zeta_k W_k + \sum_{i,j}\alpha_{ij}Y_{ij} = 0,$$
where $(i, j)$ runs through $\{0, \ldots, k\} \times \{1, \ldots, h\}$ and the coefficients are rational numbers, not all zero. On substituting from the definition of $x$, we find the equation
$$\zeta_1\frac{u - 1}{v - 1} + \cdots + \zeta_k\frac{u^{k-1}(u - 1)}{v - 1} + \sum_{i,j}\alpha_{ij}\frac{u^i}{v^j} = 0, \tag{4.7}$$
which is valid for all the pairs $(u, v)$ in question. Now let $C$ be the curve defined in $\mathbb{G}_m^2$ by the equation
$$\zeta_1\frac{X - 1}{Y - 1} + \cdots + \zeta_k\frac{X^{k-1}(X - 1)}{Y - 1} + \sum_{i,j}\alpha_{ij}\frac{X^i}{Y^j} = 0.$$
We may write the left-hand side in the form $f(X)/(Y - 1) + g(X, Y)/Y^h$. Such a rational function cannot vanish identically, for otherwise $Y - 1$ would divide the polynomial $f(X)$, yielding $f = 0$, and then $g = 0$; hence, all the coefficients $\zeta_i, \alpha_{ij}$ would vanish, which is a contradiction. Thus the equation does represent a curve $C$ in $\mathbb{G}_m^2$, containing, by (4.7), all our pairs $(u, v)$. Such pairs lie in the finitely generated subgroup $(\mathcal{O}_S^*)^2 \subset \mathbb{G}_m^2(\mathbb{Q})$. Therefore, by Theorem 2.7, they lie in a certain finite union of translates of algebraic subgroups of $\mathbb{G}_m^2$, which is entirely contained in $C$, and hence distinct from $\mathbb{G}_m^2$. To obtain the sought conclusion on the structure of $\Sigma$, it will now suffice to prove that if an algebraic translate contains infinitely many pairs in $\Sigma$, then it is an algebraic subgroup. Now, such a translate will be given by an equation $X^a Y^b = \lambda$, for a certain $\lambda \in \mathbb{Q}^*$ and for certain integers $a, b$; hence, for infinitely many pairs $(u, v) \in \Sigma$ we shall have $u^a v^b = \lambda$. Now, $u \equiv v \equiv 1 \pmod{\gcd(u - 1, v - 1)}$, whence $\lambda \equiv 1 \pmod{\gcd(u - 1, v - 1)}$. Since $\gcd(u - 1, v - 1) \ge \max(|u|, |v|)^{\varepsilon}$ for these infinitely many pairs, these moduli are unbounded, so we must have $\lambda = 1$, i.e. the translate is in fact a subgroup, as required. It is, on the other hand, clear that if $u, v$ are multiplicatively independent, the pair $(u, v)$ does not lie in any proper algebraic subgroup of $\mathbb{G}_m^2$; this fact immediately implies the last part of the conclusion as well.
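The error term in (4.3) can be made completely explicit. Since $1/(v - 1) = \sum_{r=1}^{h} v^{-r} + v^{-h}/(v - 1)$ exactly, the left-hand side of (4.3) equals $u^{j-1}(u - 1)/(v^h(v - 1))$, and consequently $L_{i\infty}(x) = d(u, v)z_i = c_i(u, v)$, which is indeed $O(d(u, v)|u|^i|v|^{-1})$. The Python sketch below checks these identities in exact rational arithmetic; the helper names and the sample pair $u = 3$, $v = 16$ (with $k = 2$, $h = 3$) are illustrative assumptions, not part of the proof.

```python
from fractions import Fraction


def lhs_43(u: int, v: int, j: int, h: int) -> Fraction:
    """Left-hand side of (4.3): z_j + sum_s u^(j-1)/v^s - sum_r u^j/v^r."""
    z_j = Fraction(u ** (j - 1) * (u - 1), v - 1)
    truncated = sum(Fraction(u ** (j - 1) - u ** j, v ** r) for r in range(1, h + 1))
    return z_j + truncated


def closed_form_43(u: int, v: int, j: int, h: int) -> Fraction:
    """Exact value of the error term: u^(j-1)(u-1) / (v^h (v-1))."""
    return Fraction(u ** (j - 1) * (u - 1), v ** h * (v - 1))


def x_vector(u: int, v: int, k: int, h: int) -> list[int]:
    """x = d(u,v) v^h (z_1..z_k, v^-1..v^-h, u v^-1..u v^-h, ..., u^k v^-1..u^k v^-h)."""
    d = Fraction(u - 1, v - 1).denominator  # reduced denominator d(u, v) > 0
    scale = d * v ** h
    coords = [scale * Fraction(u ** (j - 1) * (u - 1), v - 1) for j in range(1, k + 1)]
    # Y_{i,s} coordinates, ordered Y_01..Y_0h, Y_11..Y_1h, ..., Y_k1..Y_kh
    coords += [scale * Fraction(u ** i, v ** s) for i in range(k + 1) for s in range(1, h + 1)]
    assert all(c.denominator == 1 for c in coords)  # x lies in Z^n
    return [int(c) for c in coords]


def linear_form_inf(x: list[int], i: int, k: int, h: int) -> int:
    """L_{i,inf}(X) = W_i + Y_{i-1,1} + ... + Y_{i-1,h} - Y_{i,1} - ... - Y_{i,h}."""
    def y(a: int, s: int) -> int:
        return x[k + a * h + (s - 1)]
    return (x[i - 1]
            + sum(y(i - 1, s) for s in range(1, h + 1))
            - sum(y(i, s) for s in range(1, h + 1)))
```

For $u = 3$, $v = 16$ one has $d(u, v) = 15$, and the linear forms evaluate to $c_1 = 2$ and $c_2 = 6$: large coordinates of size about $|v|^{h+k}$ cancel down to numbers of size $d(u, v)|u|^i/|v|$.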

Several applications of Theorem 4.7 are shown in the next section. Very recently, A. Levin [Lev5] extended this result to higher dimensions by proving the following theorem, where $\gcd_S(a, b)$, for $S$-integers $a, b$, denotes the part of their gcd containing no prime in $S$.

Theorem 4.8 Let $n$ be a positive integer, $k$ a number field and $S$ a finite set of places as in Theorem 4.7. Let $f(X_1, \ldots, X_n), g(X_1, \ldots, X_n) \in \mathcal{O}_S[X_1, \ldots, X_n]$ be coprime polynomials. For each $\varepsilon > 0$ there exists a finite union $Z = Z_{f,g,\varepsilon}$ of proper translates of algebraic subgroups of $\mathbb{G}_m^n$ such that
$$\log\gcd_S\bigl(f(u_1, \ldots, u_n), g(u_1, \ldots, u_n)\bigr) < \varepsilon\max\{h(u_1), \ldots, h(u_n)\}$$
for all $(u_1, \ldots, u_n) \in (\mathcal{O}_S^*)^n$ outside $Z$. The $\gcd_S$ may be replaced by the usual gcd whenever the polynomials do not both vanish at the origin.

In particular, this leads to a more general version, also in quantitative form, of Theorem 4.5, for ratios of sums of $S$-units. For instance, given two linear recurrent sequences $F, G$ with values in a ring of $S$-integers $\mathcal{O}_S$, one can deduce from the theorem above that, if $\log\gcd(F(n), G(n)) > \varepsilon n$ for infinitely many $n \in \mathbb{N}$, then there exist positive integers $a, r$ such that the two linear recurrent sequences $n \mapsto F(a + rn)$ and $n \mapsto G(a + rn)$ have a non-trivial common divisor in the ring of linear recurrent sequences. The particular case of the sequences $F(n) = a^n - 1$, $G(n) = b^n - 1$ coincides with Theorem 4.6.

We sketch here a simplification of Levin’s argument for the proof of Theorem 4.8. For simplicity and for comparison with the previous proof, suppose we are in the setting of Theorem 4.7, where $n = 2$ and $f(X_1, X_2) = X_1 - 1$, $g(X_1, X_2) = X_2 - 1$. Also, as in the proof of Theorem 4.7, we consider only the case $u, v \in \mathbb{Z}$. The proof again makes use of the subspace theorem. The main point in the construction of the linear forms is the following elementary algebraic lemma.
Lemma 4.9 For a finite set $X \subset \mathbb{Z}^2$ denote by $V_X \subset \mathbb{Q}[x, x^{-1}, y, y^{-1}]$ the vector space of Laurent polynomials with support in $X$ vanishing at $(1, 1)$. Let $\nu$ be a (real-valued) valuation of the ring $k[x, x^{-1}, y, y^{-1}]$ which is trivial on $k$. There exists a basis $\{f_1, \ldots, f_d\}$ of $V_X$ such that
$$\sum_{h=1}^{d}\nu(f_h) \ge \sum_{(i,j) \in X}\nu(x^i y^j) - \max\{\nu(x^i y^j) : (i, j) \in X\}.$$

Proof The idea amounts to choosing a basis made of binomials. Let $x^a y^b$ be the monomial on which $\nu$ attains its maximal value among the monomials with support in $X$. For each $(i, j) \in X$ the binomial $x^i y^j - x^a y^b$ vanishes at $(1, 1)$. The basis $f_1, \ldots, f_d$ is formed by these binomials, for $(i, j) \in X \setminus \{(a, b)\}$. Clearly, $\nu(x^i y^j - x^a y^b) \ge \min(\nu(x^i y^j), \nu(x^a y^b)) = \nu(x^i y^j)$, so the left-hand side of the above formula is bounded from below by $\sum_{(i,j) \ne (a,b)}\nu(x^i y^j)$, which coincides with the right-hand side.

Let $N$ be a positive integer (which will tend to infinity at the end of our proof, as happened in the proof of Siegel’s theorem given in Section 3.4). Set $X(N) = \{0, \ldots, N\} \times \{0, \ldots, N\}$ (many other choices for $X(N)$ are possible; see our final remarks). Then $V_{X(N)} \subset \mathbb{Q}[x, y]$ is the vector space of polynomials of partial degrees $\le N$ vanishing at the point $(1, 1)$, so that its dimension $d$ satisfies
$$d = d_N = (N + 1)^2 - 1 = N^2 + O(N). \tag{4.8}$$

For each solution $(u, v) \in (\mathcal{O}_S^*)^2 \cap \mathbb{Z}^2$ to the inequality $\gcd(u - 1, v - 1) > \max(|u|, |v|)^{\varepsilon}$, put $D = D(u, v) = \gcd(u - 1, v - 1)$. Let $\{\varphi_1, \ldots, \varphi_d\}$ be any basis of $V = V_{X(N)} \cap \mathbb{Z}[x, y]$ and set
$$x = \left(\frac{\varphi_1(u, v)}{D}, \ldots, \frac{\varphi_d(u, v)}{D}\right) \in \mathbb{Z}^d;$$
the coordinates are integers because each $\varphi_h$ vanishes at $(1, 1)$, so that $\varphi_h(u, v)$ lies in the ideal generated by $u - 1$ and $v - 1$. For each prime $p \in S$, let $\nu_p$ be the valuation of the function field $\mathbb{Q}(x, y)$ which satisfies
$$\nu_p(x^i y^j) = -(i\log|u|_p + j\log|v|_p).$$
Let $\nu_\infty$ be the valuation satisfying
$$\nu_\infty(x^i y^j) = -(i\log|u| + j\log|v|).$$
For each place $\nu \in S$ (where we identify places of $S$ with the corresponding functional valuations as defined above), let $(f_{1\nu}, \ldots, f_{d\nu})$ be the basis, made of binomials, coming from the lemma. We can express the polynomials $f_{h\nu}$, for $h = 1, \ldots, d$, $\nu \in S$, as values at $(\varphi_1, \ldots, \varphi_d)$ of linear forms $L_{1\nu}, \ldots, L_{d\nu}$ with rational coefficients. The double product
$$\prod_{\nu \in S}\prod_{i=1}^{d}|L_{i\nu}(x)|_\nu$$
is then estimated as follows. Letting $(a_\nu, b_\nu) \in X(N)$ be a point $(i, j)$ where $\nu(x^i y^j)$ attains the maximum, we have for $p$-adic places
$$\prod_{i=1}^{d}|L_{i\nu}(x)|_\nu \le C_\nu\,\frac{1}{|u^{a_\nu}v^{b_\nu}|_\nu}\cdot\prod_{(i,j) \in X(N)}|u^i v^j|_\nu,$$
for some constants $C_\nu$ independent of $u, v$. At the Archimedean place we have
$$\prod_{i=1}^{d}|L_{i\infty}(x)| \le C_\infty\,\frac{1}{|u^{a_\infty}v^{b_\infty}|\,D^d}\cdot\prod_{(i,j) \in X(N)}|u^i v^j|.$$

By the product formula and the fact that $u, v$ are $S$-units, for each $(i, j) \in X(N)$ the product $\prod_{\nu \in S}|u^i v^j|_\nu$ equals 1. Hence the double product above satisfies
$$\prod_{\nu \in S}\prod_{i=1}^{d}|L_{i\nu}(x)|_\nu \le \prod_{\nu \in S}C_\nu\max\bigl(1, |u^{a_\nu}v^{b_\nu}|_\nu^{-1}\bigr)\cdot D^{-d} \le C\cdot\max(|u|, |v|)^{2N}\cdot D^{-d}.$$
Now, for $N \to \infty$, the height of $x$ is $O(|u|^N|v|^N)$, while $d = N^2 + O(N)$ and $D > \max(|u|, |v|)^{\varepsilon}$; so for each positive $\varepsilon$ there exist an $N = N(\varepsilon)$ and a positive $\delta$ such that the above double product satisfies
$$\prod_{\nu \in S}\prod_{i=1}^{d}|L_{i\nu}(x)|_\nu \le H(x)^{-\delta}.$$
By application of the subspace theorem in the form of Theorem 2.3, we obtain that infinitely many points $x$ satisfy a fixed linear relation; the rest of the proof runs as in the preceding case.
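The construction in the proof of Lemma 4.9 is completely explicit, so it can be sketched in a few lines. The Python code below is an illustration under stated assumptions: it models a valuation only through its values on monomials, $\nu(x^i y^j) = -(i\log|u|_\nu + j\log|v|_\nu)$, as in the sketch above; the helper names and the sample values $|u| = 3$, $|v| = 4$ (Archimedean) and $|3|_2 = 1$, $|4|_2 = 1/4$ (2-adic, i.e. the hypothetical pair $u = 3$, $v = 4$) are ours. It picks the maximizing monomial $x^a y^b$, forms the binomials $x^i y^j - x^a y^b$, and checks that their lower bounds $\nu(x^i y^j)$ add up to the right-hand side of the lemma.

```python
import math
from itertools import product


def monomial_valuation(abs_u: float, abs_v: float):
    """nu(x^i y^j) = -(i log|u|_nu + j log|v|_nu), as in the proof sketch."""
    lu, lv = math.log(abs_u), math.log(abs_v)
    return lambda i, j: -(i * lu + j * lv)


def grid(N: int) -> list[tuple[int, int]]:
    """X(N) = {0, ..., N} x {0, ..., N}."""
    return list(product(range(N + 1), repeat=2))


def binomial_basis(X, nu):
    """Basis of V_X from the proof of Lemma 4.9: x^i y^j - x^a y^b for (i, j) != (a, b),
    where x^a y^b maximizes nu on X. A binomial is stored as {exponent: coefficient}."""
    apex = max(X, key=lambda ij: nu(*ij))  # first maximizer in iteration order
    basis = [{ij: 1, apex: -1} for ij in X if ij != apex]
    return apex, basis


def lemma_bound(X, nu, basis):
    """(sum over the basis of the lower bounds nu(x^i y^j), right-hand side of Lemma 4.9)."""
    lower = sum(min(nu(*m) for m in f) for f in basis)
    rhs = sum(nu(*ij) for ij in X) - max(nu(*ij) for ij in X)
    return lower, rhs
```

At the Archimedean place with $|u|, |v| > 1$ the maximizing monomial is the constant 1, so the basis consists of the binomials $x^i y^j - 1$; at the 2-adic place for $v = 4$ it moves to the top row $j = N$. The dimension count $d_N = (N + 1)^2 - 1$ of (4.8) is just the number of binomials produced.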

4.4 Applications of gcd Estimates

A first immediate corollary of Theorem 4.7 is a sharp form of a conjecture of Győry, Sárközy, and Stewart, proved in [CZ6].

Corollary 4.10 Let $a > b > c > 0$ be integers. Then for $a \to \infty$ the greatest prime factor of $(ab + 1)(ac + 1)$ tends to infinity.

The original conjecture predicted the same conclusion for $(ab + 1)(ac + 1)(bc + 1)$. For the proof, suppose by contradiction that, for infinitely many triples as in the statement, all the prime factors of $(ab + 1)(ac + 1)$ lie in a certain finite set $S$ independent of $a, b, c$. Set $u = ac + 1$, $v = ab + 1$, so $u < v$ are $S$-units such that $\gcd(u - 1, v - 1)$ is a multiple of $a$, and hence $\ge v^{1/2}$ (since $v = ab + 1 < a^2$). Then this set $\Sigma$ of pairs $(u, v)$ satisfies the assumptions of Theorem 4.7 with $\varepsilon = 1/2$; there exists therefore a non-trivial equation $u^m = v^n$ satisfied by infinitely many of the pairs in question. Since $u < v$ are positive integers, we may assume that $m > n$ are positive coprime integers; then the equation implies $u = t^n$, $v = t^m$ for some integer $t = t(u, v)$. Now, the polynomials $(X^m - 1)/(X - 1)$ and $(X^n - 1)/(X - 1)$ are coprime, and it follows at once (see Exercise 4.37 below) that the gcd of the integers $(t^m - 1)/(t - 1)$ and $(t^n - 1)/(t - 1)$ is bounded in terms of $m$ only.² This implies $\gcd(u - 1, v - 1) \ll |t - 1|$. Hence $|v|^{1/2} \ll |t| = |v|^{1/m}$, whence $\max(m, n) \le 2$, yielding $v = u^2$. Now, $ab + 1 = v = u^2 \ge (a + 1)^2 > a^2 + 1$, hence $b > a$, which is a contradiction.

² One can actually prove that in this case the gcd is always 1.

Lower bounds for orders of matrices. For the next application, we start with the following observation: given two positive integers $a, b$, bounding from above the gcd of $a^n - 1$, $b^n - 1$ in terms of the exponent $n$ amounts to bounding from below the multiplicative order modulo $N$ of the matrix
$$\begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}$$
as a function of $N$ (which must run over the integers coprime with $ab$). In general, letting $A$ be a $d \times d$ square matrix with integral entries and $N > 1$ an integer coprime with $\det(A)$, we denote by $\mathrm{ord}_N(A)$ the minimal positive integer $n$ such that $A^n$ is congruent to the identity modulo $N$, i.e. the order of the reduction of $A$ in $\mathrm{GL}_d(\mathbb{Z}/N\mathbb{Z})$. Then Theorem 4.6 can be rephrased as follows: given a diagonal $2 \times 2$ matrix
$$A := \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix},$$
the order modulo $N$ of $A$ satisfies $\lim_{N \to \infty}\mathrm{ord}_N(A)/\log N = \infty$, unless $a, b$ are multiplicatively dependent. Note that in the latter case the matrix $A$ belongs to a one-dimensional algebraic subgroup of $\mathrm{GL}_2$.

A generalization of the above statement has been provided for arbitrary square matrices with integral entries. As observed by Z. Rudnick, this problem is motivated by the dynamics of toral automorphisms. In order to formulate this result, we notice that, given a matrix $A \in \mathrm{GL}_d(\mathbb{C})$, the Zariski closure of the cyclic group generated by $A$ is a commutative algebraic group containing a cyclic group as a dense subset; the connected component of the identity in such an algebraic group is isomorphic to a product $\mathbb{G}_a^e \times \mathbb{G}_m^f$, where $e \in \{0, 1\}$ and $e + f \le d$. Let us call this connected component $G_A$. The exponent $e$ vanishes precisely if $A$ is diagonalizable, while $f$ represents the rank of the multiplicative group generated by the eigenvalues of $A$.
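For small $N$ the order $\mathrm{ord}_N(A)$ can be computed by brute force. The Python sketch below is an illustration, not an efficient algorithm, and the function names and sample matrices are hypothetical: `ord_mod` multiplies by $A$ modulo $N$ until the identity appears, and `gcd_and_order` checks the observation above that, for $N = \gcd(a^n - 1, b^n - 1) > 1$, the order of $\mathrm{diag}(a, b)$ modulo $N$ divides $n$.

```python
from math import gcd


def mat_mul_mod(A, B, N):
    """Product of two square integer matrices, reduced modulo N."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) % N for j in range(n)]
            for i in range(n)]


def ord_mod(A, N, limit=10**6):
    """ord_N(A): least n >= 1 with A^n = I (mod N); assumes N > 1 and gcd(det A, N) = 1."""
    d = len(A)
    identity = [[1 if i == j else 0 for j in range(d)] for i in range(d)]
    power = [[entry % N for entry in row] for row in A]  # A^1 mod N
    for n in range(1, limit + 1):
        if power == identity:
            return n
        power = mat_mul_mod(power, A, N)
    raise ValueError("order exceeds the search limit")


def gcd_and_order(a, b, n):
    """N = gcd(a^n - 1, b^n - 1) and, if N > 1, ord_N(diag(a, b)), which divides n."""
    N = gcd(a ** n - 1, b ** n - 1)
    return N, (ord_mod([[a, 0], [0, b]], N) if N > 1 else None)
```

For instance, $\gcd(2^6 - 1, 3^6 - 1) = 7$ and $\mathrm{diag}(2, 3)$ has order 6 modulo 7. The dependent pair $\mathrm{diag}(2, 4)$ has order only $n \approx \log N/\log 2$ modulo $N = 2^n - 1$, while the unipotent matrix with rows $(1, 1)$, $(0, 1)$ has order exactly $N$ modulo $N$.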

The main theorem from the joint paper [CRZ] with Z. Rudnick can be formulated as follows.

Theorem 4.11 Let $A$ be a $d \times d$ matrix with integral coefficients and nonvanishing determinant. The following are equivalent:
(i) $\liminf_{N \to \infty}\mathrm{ord}_N(A)/\log N < \infty$;
(ii) there exists a power $A^h$ of $A$, for some $h \ge 1$, which either
(iia) is conjugate to a matrix with diagonal blocks which are $2 \times 2$ matrices $T \in \mathrm{SL}_2(\mathbb{Z})$ or the identity matrix and, moreover, the eigenvalues of these matrix blocks are all powers of a single unit in a real quadratic field; or
(iib) is conjugate to a diagonal matrix whose eigenvalues are all powers of a single positive integer;
(iii) $G_A \cong \mathbb{G}_m$ or $G_A = \{1\}$.

We give a sketch of the proof of the crucial implication (i) ⇒ (ii). First of all, we can reformulate condition (i) by writing

(i′) $\displaystyle\limsup_{n \to \infty}\frac{\log\gcd(A^n - I)_{i,j}}{n} > 0$,

i.e., there exists $\varepsilon > 0$ such that, for infinitely many integers $n > 0$,
$$\log\gcd(A^n - I)_{i,j} > \varepsilon n. \tag{4.9}$$
Here, for a matrix $B$, $\gcd B_{i,j}$ denotes the gcd of the entries $B_{i,j}$, $(i, j) \in \{1, \ldots, d\} \times \{1, \ldots, d\}$.

As observed, in the case of a $2 \times 2$ diagonal matrix with eigenvalues $a, b$, condition (i′) amounts to $\log\gcd(a^n - 1, b^n - 1) > \varepsilon n$, which by Theorem 4.6 implies that $a, b$ are multiplicatively dependent, and hence (iib) and (iii). Consider next the case when $A$ is a $d \times d$ matrix diagonalizable over the field of algebraic numbers; we can then find a matrix $P \in \mathrm{GL}_d(k)$, for some number field $k$, such that $D := PAP^{-1}$ is diagonal. Now, if for some integers $n \ge 1$, $N > 1$ we have $A^n \equiv I \pmod N$, we can write $A^n = I + NC$ for some $d \times d$ matrix $C$ with integral coefficients, and then
$$D^n - I = P(A^n - I)P^{-1} = NPCP^{-1}.$$
Let $d \in \mathbb{Z}$, $d > 0$, be an integer such that $dP^{-1}$ has algebraic integral coefficients; rescaling $P$ if necessary, we may also assume that $P$ has algebraic integral coefficients, so that every entry of $d(D^n - I) = N \cdot PC(dP^{-1})$ is divisible by $N$ in the ring of algebraic integers. Then from (4.9) it follows that $\log\gcd\bigl(d \cdot (D^n - I)_{i,j}\bigr)_{i,j} > \varepsilon n$ and, looking at the diagonal terms, we shall have that for infinitely many integers $n > 0$ and all pairs of eigenvalues $\alpha_i, \alpha_j$ of $D$ (hence of $A$),
$$\log\gcd(\alpha_i^n - 1, \alpha_j^n - 1) > \varepsilon n - \log d > \frac{\varepsilon}{2}\,n.$$

Upon applying Theorem 4.7 with $(u, v) = (\alpha_i^n, \alpha_j^n)$ we obtain that $\alpha_i$ and $\alpha_j$ must be multiplicatively dependent, and this must hold for all pairs $1 \le i \le j \le d$. From this, (ii) follows easily. On the other hand, if $A$ is not diagonalizable, then a conjugate of it has a block of the form
$$\begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix},$$
and it is clear that its order modulo $N$ is divisible by $N$ and hence is bounded from below by $N$; this implies a much stronger inequality than in the general case.

Application to elliptic curves over finite fields. A natural and interesting application of Theorem 4.11 (or of Theorem 4.7, on which Theorem 4.11 is based) was found by Luca and Shparlinski in [LS]. It provides a lower bound for the exponent of the group of points of an elliptic curve over a finite field. Let $E$ be an elliptic curve defined over a finite field $\mathbb{F}_q$. For every integer $n \ge 1$, the set $E(\mathbb{F}_{q^n})$ of points of $E$ defined over $\mathbb{F}_{q^n}$ has the structure of a finite group. As we said, its order is given by a linear recurrent sequence, namely
$$\#E(\mathbb{F}_{q^n}) = q^n + 1 - \alpha^n - \bar{\alpha}^n$$
for complex algebraic numbers $\alpha, \bar{\alpha}$ of absolute value $\sqrt{q}$. The group $E(\mathbb{F}_{q^n})$ can be decomposed as the product
$$E(\mathbb{F}_{q^n}) \cong (\mathbb{Z}/m(q^n)\mathbb{Z}) \times (\mathbb{Z}/l(q^n)\mathbb{Z}),$$
with $1 \le m(q^n) \le l(q^n)$ and $m(q^n) \mid l(q^n)$. Also, the integer $l(q^n)$ is the exponent of the group $E(\mathbb{F}_{q^n})$. The main theorem of [LS] reads as follows.

Theorem 4.12 Suppose $E/\mathbb{F}_q$ is not supersingular. Then for every $\varepsilon > 0$ there are only finitely many integers $n > 1$ such that the exponent $l(q^n)$ of $E(\mathbb{F}_{q^n})$ is $< q^{n(1-\varepsilon)}$.

We can also express this theorem by saying that $E(\mathbb{F}_{q^n})$ tends to be “almost” cyclic for $n \to \infty$. We show the link between Theorem 4.12 and Theorem 4.11 by following the argument presented by C. Magagna in [Mag] (the original argument by Luca and Shparlinski is slightly different). Consider the action of the Frobenius endomorphism $F$ of $E$ on the Tate module $T_r(E)$, where $r$ is any prime not dividing $q$.
Since $T_r(E)$ is a free module of rank 2 over the ring $\mathbb{Z}_r$, the endomorphism $F$ is represented by a $2 \times 2$ matrix with $r$-adic coefficients. Actually, one can choose a basis in which this matrix has integral coefficients. Now, if $E(\mathbb{F}_{q^n}) \cong (\mathbb{Z}/m(q^n)\mathbb{Z}) \times (\mathbb{Z}/l(q^n)\mathbb{Z})$ with $m(q^n) \mid l(q^n)$, then the whole $m(q^n)$-torsion subgroup $E[m(q^n)]$ is contained in $E(\mathbb{F}_{q^n})$, and hence $F^n$ acts trivially on $E[m(q^n)]$. Since the action on $E[m(q^n)]$ is compatible with that on $T_r(E)$ for all primes $r \mid m(q^n)$, it follows that $F^n$ is congruent to the identity modulo $m(q^n)$, so the order of $F$ (or of its matrix) modulo $m(q^n)$ is at most $n$. By Theorem 4.11, this gives $n/\log m(q^n) \to \infty$, so $m(q^n) < q^{\varepsilon n}$ for all large $n$. Now, since $m(q^n) \cdot l(q^n) = q^n + O(q^{n/2})$ (Hasse’s theorem, equivalent to $|\alpha| = q^{1/2}$), we obtain $l(q^n) > q^{n(1-\varepsilon)}$.

Another application concerns the comparison of the cardinalities $\#E(\mathbb{F}_{q^n})$ and $\#E'(\mathbb{F}_{q^n})$ for two distinct elliptic curves $E, E'$ defined over the finite field $\mathbb{F}_q$. It is known that the two cardinalities coincide for every $n \ge 1$ if and only if the two curves are isogenous over $\mathbb{F}_q$. When this does not happen, it is natural to expect that $\gcd(\#E(\mathbb{F}_{q^n}), \#E'(\mathbb{F}_{q^n}))$ is small compared with the cardinalities of $E(\mathbb{F}_{q^n})$ and $E'(\mathbb{F}_{q^n})$, which both grow asymptotically like $q^n$. Indeed, a result of Magagna, whose proof again uses Theorem 4.7, reads as follows.

Theorem 4.13 Let $E, E'$ be two non-supersingular elliptic curves defined over a finite field $\mathbb{F}_q$. If $E$ and $E'$ are not isogenous, then, for every $\varepsilon > 0$,
$$\gcd(\#E(\mathbb{F}_{q^n}), \#E'(\mathbb{F}_{q^n})) \ll_\varepsilon \exp(\varepsilon n).$$

Bogomolov, Korotiaev, and Tschinkel used a similar method, relying on gcd estimates for Frobenius eigenvalues, to prove in [BKT] a kind of group-theoretic analogue of Torelli’s theorem over finite fields. Namely, they proved an isogeny criterion for two abelian varieties over a finite field based on the existence of “many” homomorphisms between the abstract groups of rational points on such abelian varieties.

Zeta functions for dynamical systems. An application of gcd estimates to dynamical systems has been found by R. Miles and appears in [Mi1], [Mi2]. For an endomorphism $T$ of a topological space $X$, M. Artin and B. Mazur defined the zeta function
$$Z(t) = \exp\left(\sum_{n=1}^{\infty}\frac{|\mathrm{Fix}(T^n)|}{n}\,t^n\right). \tag{4.10}$$
Here $\mathrm{Fix}(T^n)$ denotes the set of fixed points of the $n$th iterate of $T$, so the above formula makes sense only if such sets are finite for all $n$.
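For the Frobenius endomorphism $F$ of an elliptic curve $E/\mathbb{F}_q$, the fixed points of $F^n$ on $E(\overline{\mathbb{F}}_q)$ are exactly the points of $E(\mathbb{F}_{q^n})$, so (4.10) is built from the linear recurrence $\#E(\mathbb{F}_{q^n}) = q^n + 1 - \alpha^n - \bar{\alpha}^n$ discussed above. The Python sketch below assumes only that $\alpha, \bar{\alpha}$ are the roots of $T^2 - aT + q$, with $a = q + 1 - \#E(\mathbb{F}_q)$; the values $q = 5$, $a = 2$ are hypothetical parameters, not data for a specific curve, and the helper names are ours. It generates the counts from the recurrence $s_n = a s_{n-1} - q s_{n-2}$ for $s_n = \alpha^n + \bar{\alpha}^n$, and checks coefficient by coefficient that the resulting Artin–Mazur series equals the rational function $(1 - at + qt^2)/((1 - t)(1 - qt))$.

```python
from fractions import Fraction


def frobenius_power_sums(a: int, q: int, n_max: int) -> list[int]:
    """s_n = alpha^n + conj(alpha)^n for n = 0..n_max, where alpha, conj(alpha)
    are the roots of T^2 - a T + q; they satisfy s_n = a s_{n-1} - q s_{n-2}."""
    s = [2, a]
    for _ in range(2, n_max + 1):
        s.append(a * s[-1] - q * s[-2])
    return s[: n_max + 1]


def point_counts(a: int, q: int, n_max: int) -> list[int]:
    """#E(F_{q^n}) = q^n + 1 - s_n for n = 1..n_max."""
    s = frobenius_power_sums(a, q, n_max)
    return [q ** n + 1 - s[n] for n in range(1, n_max + 1)]


def zeta_coefficients(counts: list[int]) -> list[Fraction]:
    """Coefficients e_n of exp(sum_n N_n t^n / n), from n e_n = sum_k N_k e_{n-k}."""
    e = [Fraction(1)]
    for n in range(1, len(counts) + 1):
        e.append(sum(counts[k - 1] * e[n - k] for k in range(1, n + 1)) / n)
    return e


def rational_zeta_coefficients(a: int, q: int, n_max: int) -> list[int]:
    """Coefficients of (1 - a t + q t^2) / ((1 - t)(1 - q t)) up to t^n_max."""
    c = [sum(q ** i for i in range(n + 1)) for n in range(n_max + 1)]  # 1/((1-t)(1-qt))
    return [c[n] - a * (c[n - 1] if n >= 1 else 0) + q * (c[n - 2] if n >= 2 else 0)
            for n in range(n_max + 1)]
```

With these parameters the counts begin $4, 32, 148, \ldots$; the same `point_counts` helper, applied to two different traces $a \ne a'$, can be used to tabulate the gcd bounded in Theorem 4.13.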
In the case in which the topological space $X$ is replaced by the set $X(\overline{\mathbb{F}}_q)$, for an algebraic variety $X$ defined over a finite field $\mathbb{F}_q$, and $T$ is the Frobenius endomorphism, we again obtain the zeta function defined by formula (4.2) above. One can further generalize the setting to the case of several commuting endomorphisms of a topological space. In [Mi1], [Mi2], the author considered a continuous $\mathbb{Z}^d$-action by automorphisms on a compact connected abelian group $X$. Denote by $\alpha^n$, for $n \in \mathbb{Z}^d$, the corresponding automorphism of $X$ (so $\alpha : n \mapsto \alpha^n$ is the homomorphism $\mathbb{Z}^d \to \mathrm{Aut}(X)$ defining the action). For each finite-index subgroup $L \subset \mathbb{Z}^d$, define $F(L)$ to be the cardinality of the set of points $x \in X$ fixed by all the $\alpha^n$ for $n \in L$. Then the corresponding zeta function is
$$Z_\alpha(t) = \exp\left(\sum_{L \subset \mathbb{Z}^d}\frac{F(L)}{[\mathbb{Z}^d : L]}\,t^{[\mathbb{Z}^d : L]}\right).$$
Again, we obtain (4.10) for a single automorphism (i.e. when $d = 1$). As before, the above formal power series makes sense only if $F(L)$ is finite for every subgroup $L \subset \mathbb{Z}^d$ of finite index. Its radius of convergence is $e^{-g(\alpha)}$, where
$$g(\alpha) = \limsup_{[\mathbb{Z}^d : L] \to \infty}\frac{\log F(L)}{[\mathbb{Z}^d : L]}. \tag{4.11}$$

(Compare the above expression with the term $\liminf_{N \to \infty}\mathrm{ord}_N(A)/\log N$ appearing in (i) of Theorem 4.11.) The quantity $g(\alpha)$ is called the upper growth rate of periodic points of $\alpha$. Before stating the next result, which is formulated in the language of dynamical systems, we introduce a definition. Let $X$ be a compact topological group; then $X$ is provided with a left-invariant Haar measure $\lambda_X$, normalized so that $\lambda_X(X) = 1$. We say that an action $\alpha : \mathbb{Z}^d \times X \to X$ of the group $\mathbb{Z}^d$ on $X$ is mixing if, for every pair $U, V$ of non-empty open subsets of $X$,
$$\lim_{n \to \infty}\lambda_X(\alpha^n(U) \cap V) = \lambda_X(U)\cdot\lambda_X(V).$$

The main result in [Mi2], which confirms a conjecture of Lind, reads as follows.

Theorem 4.14 Suppose $X$ is a compact connected abelian group of finite topological dimension and $\alpha$ is a mixing $\mathbb{Z}^d$-action by continuous automorphisms of $X$. If $d \ge 2$, then $g(\alpha) = 0$ and the unit circle is a natural boundary for the zeta function $Z_\alpha$.

We note that the vanishing of (4.11) is the exact analogue of the divergence of $\liminf_{N \to \infty}\mathrm{ord}_N(A)/\log N$, which holds, by Theorem 4.11, under the hypothesis that condition (ii) of Theorem 4.11 does not hold. The latter condition is the formal analogue of the mixing hypothesis appearing in Theorem 4.14. Not surprisingly, the proofs of Theorems 4.11 and 4.14 eventually make use of the same tool, i.e. the gcd estimates of Theorem 4.7.

We give a concrete example of the link between Theorem 4.14 and the gcd estimates, following Section 2 of [Mi2]. Suppose $X$ is the Pontryagin dual of the discrete group $\mathbb{Z}[1/6]$: it is the group of all characters $\chi : \mathbb{Z}[1/6] \to S^1$, where $S^1$ is the complex unit circle. The morphism $X \ni \chi \mapsto \chi(1) \in S^1$ gives $X$ the structure of a group extension of $S^1$ by the $p$-adic group $\mathbb{Z}_2 \times \mathbb{Z}_3$. Take now $d = 2$ and consider the action $\alpha : \mathbb{Z}^2 \to \mathrm{Aut}(X)$ associated with the multiplication by 2 and by 3 on $\mathbb{Z}[1/6]$; explicitly, for a vector $(a, b) \in \mathbb{Z}^2$ and a character $\chi \in X$, the image $\alpha((a, b), \chi)$ is the character sending $\mathbb{Z}[1/6] \ni x \mapsto \chi(2^a 3^b x) \in S^1$. Under the projection $X \to S^1$ such an action corresponds to the squaring and cubing maps on $S^1$: the vector $(a, b)$ acts on $S^1$ by sending $z \mapsto z^{2^a 3^b}$.

Now, finite-index subgroups $L$ of $\mathbb{Z}^2$ can be generated by vectors of the form $(a, 0)$, $(b, c)$, with $a > 0$, $c > 0$, $0 \le b < a$, where the product $ac$ is the index of $L = L(a, b, c)$. The number of fixed points for $L(a, b, c)$ turns out to be
$$F(L(a, b, c)) = |\{x \in X : 2^a x = x,\ 2^b 3^c x = x\}| = |\ker(x \mapsto (2^a - 1)x) \cap \ker(x \mapsto (2^b 3^c - 1)x)|.$$
Then $F(L(a, b, c))$ is the index of the ideal $(2^a - 1, 2^b 3^c - 1)\mathbb{Z}[1/6]$ in $\mathbb{Z}[1/6]$, so $F(L(a, b, c)) = \gcd(2^a - 1, 2^b 3^c - 1)$. As has already been remarked, the index $[\mathbb{Z}^2 : L(a, b, c)]$ equals $ac$. An application of Theorem 4.7 immediately gives $\log F(L)/[\mathbb{Z}^2 : L] \to 0$ in this case.

Divisibility problems. We end this section by reconsidering the quotients of linear recurrences occurring in the previous section, as well as the divisibility problems of Section 3.5. We proved (Theorem 4.5) that a quotient of the form $f(n)/g(n)$ cannot be integral for infinitely many $n$ (apart from “trivial” cases), where $f(n) = \sum_i a_i\alpha_i^n$, $g(n) = \sum_j b_j\beta_j^n$ are (finite) power sums. Since, for fixed $a_i, \alpha_i$, the terms $a_i\alpha_i^n$ are all $S$-units for a fixed $S$, we can try to generalize the problem to quotients of sums of units. The first non-trivial case is represented by $f = u - 1$, $g = v - 1$, where $u, v$ are $S$-units, and has already been considered. It can happen, of course, that the quotient $(u - 1)/(v - 1)$ is integral, but in view of Theorem 4.7, this integrality forces the height of $u$ to be much larger than the height of $v$. We can state this fact formally as follows.

Theorem 4.15 Let $\mathcal{O}_S$ be a ring of $S$-integers. Consider the set $\Sigma \subset \mathcal{O}_S^* \times \mathcal{O}_S^*$ of pairs of multiplicatively independent $S$-units $u, v$ such that $(u - 1)/(v - 1) \in \mathcal{O}_S$.
Then
$$\lim_{(u, v) \in \Sigma}\frac{h(u)}{h(v)} = \infty. \tag{4.12}$$
Note that one cannot expect finiteness in general; consider the ratio $(2^m - 1)/(3^n - 1)$. Here $2^m = u$, $3^n = v$ are $S$-units in the ring $\mathbb{Z}[1/6]$. For each fixed integer $n > 0$ one can write $3^n - 1$ as $2^h(2p + 1)$ for non-negative integers $h, p$ and choose for $m$ the order of 2 modulo $2p + 1$; then the ratio $(2^m - 1)/(3^n - 1)$ will be an $S$-integer in $\mathbb{Z}[1/2] \subset \mathbb{Z}[1/6]$. So the divisibility problem $(u - 1)/(v - 1) \in \mathcal{O}_S$ can have infinitely many multiplicatively independent solutions. What Theorem 4.15 asserts in the case just analyzed is that the order of 2 modulo the odd part of $3^n - 1$ tends to infinity faster than $n$.
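The construction of $m$ from $n$ just described is easy to carry out for small $n$. The Python sketch below (hypothetical helper names; a plain search for the multiplicative order, adequate only for small moduli) computes the odd part $2p + 1$ of $3^n - 1$ and the order $m$ of 2 modulo it, so that $2p + 1$ divides $2^m - 1$ and $(2^m - 1)/(3^n - 1) \in \mathbb{Z}[1/2]$.

```python
def odd_part(x: int) -> int:
    """Largest odd divisor of a positive integer x."""
    while x % 2 == 0:
        x //= 2
    return x


def multiplicative_order(base: int, modulus: int) -> int:
    """Least m >= 1 with base^m = 1 (mod modulus); assumes gcd(base, modulus) = 1.
    For modulus 1 every integer is congruent to 1, so the answer is 1."""
    m, power = 1, base % modulus
    while power != 1 % modulus:
        power = power * base % modulus
        m += 1
    return m


def exponent_for(n: int) -> int:
    """The m of the text: the order of 2 modulo the odd part 2p + 1 of 3^n - 1."""
    return multiplicative_order(2, odd_part(3 ** n - 1))
```

For $n = 1, \ldots, 6$ this gives $m = 1, 1, 12, 4, 110, 12$; the quotients $h(u)/h(v) = m\log 2/(n\log 3)$ fluctuate for such small $n$, and a finite table can of course neither confirm nor refute the limit (4.12).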

Let us consider the following generalization. Given two polynomials $f(X, Y), g(X, Y) \in \mathcal{O}_S[X, Y]$, describe the pairs of $S$-units $(u, v) \in \mathcal{O}_S^* \times \mathcal{O}_S^*$ satisfying
$$\frac{f(u, v)}{g(u, v)} \in \mathcal{O}_S.$$
We expect that the height $h(g(u, v))$ of $g(u, v)$ tends to infinity with $\max(h(u), h(v))$ in such a way that $\liminf h(g(u, v))/\max(h(u), h(v)) > 0$; we leave it as an exercise for the reader to classify the polynomials $g(X, Y)$ for which this does not happen. Then, if $g(u, v)$ divides $f(u, v)$, we can deduce that $\log\gcd(f(u, v), g(u, v)) \gg \max(h(u), h(v))$. By taking resultants of $f(X, Y), g(X, Y)$, first with respect to $X$ and then with respect to $Y$, we obtain that for two non-zero polynomials $\varphi(U), \psi(V)$ we have $\log\gcd(\varphi(u), \psi(v)) \gg \max(h(u), h(v))$. On factoring $\varphi(U), \psi(V)$ over a suitable extension, this leads to an inequality of the form $\log\gcd(u - \alpha, v - \beta) \gg \max(h(u), h(v))$; upon absorbing $\alpha$ and $\beta$ into $u$ and $v$, respectively (after enlarging $S$), this would lead to a large $\gcd(u - 1, v - 1)$, contradicting Theorem 4.7. Hence, apart from trivial cases which can be effectively determined, there are only finitely many pairs $(u, v)$ in a given finitely generated group such that $f(u, v)/g(u, v)$ is an $S$-integer. The details of the proof and the classification of exceptional pairs of polynomials are provided in [CZ10].

Let us consider a very special but relevant case: the integrality of the quotient $(au + bv + 1)/(u + v + 1)$, where $a, b$ are fixed non-zero $S$-integers. On putting $y = (au + bv + 1)/(u + v + 1)$, this amounts to solving the Diophantine equation
$$y(u + v + 1) = au + bv + 1$$
in $y \in \mathcal{O}_S$, $u, v \in \mathcal{O}_S^*$, whose homogeneous form is $yu + yv + yw = auw + bvw + w^2$. It is a smooth quadric in $\mathbb{P}^3$, and our integrality conditions correspond to removing the three-component divisor $uvw = 0$. Note that all components meet at the point $(y : u : v : w) = (1 : 0 : 0 : 0)$, so the normal crossing condition appearing in Vojta’s conjecture is not satisfied.
After blowing up this intersection point, we reduce to a surface with four components at infinity, to which Theorem 3.21 can be applied. So in this case we have two different proofs at our disposal, both giving unconditional finiteness whenever $(a, b) \ne (1, 1)$.
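Returning to the $\times 2$, $\times 3$ example earlier in this section: there the fixed-point counts are explicit gcds, so the growth rate (4.11) can be tabulated for small indices. The Python sketch below (hypothetical helper names) evaluates $F(L(a, b, c)) = \gcd(2^a - 1, 2^b 3^c - 1)$ and the maximum of $\log F(L)/[\mathbb{Z}^2 : L]$ over all subgroups $L(a, b, c)$ of a given index $ac$. Such a table can only illustrate, never prove, the limit $\log F(L)/[\mathbb{Z}^2 : L] \to 0$ obtained from Theorem 4.7.

```python
import math


def fixed_points(a: int, b: int, c: int) -> int:
    """F(L(a, b, c)) = gcd(2^a - 1, 2^b 3^c - 1) for the x2, x3 action on the dual of Z[1/6]."""
    return math.gcd(2 ** a - 1, 2 ** b * 3 ** c - 1)


def max_growth_ratio(index: int) -> float:
    """Maximum of log F(L) / [Z^2 : L] over the subgroups L(a, b, c) of the given
    index a*c, with a > 0, c > 0 and 0 <= b < a."""
    best = 0.0
    for a in range(1, index + 1):
        if index % a:
            continue
        c = index // a
        for b in range(a):
            best = max(best, math.log(fixed_points(a, b, c)) / index)
    return best
```

For index 4, for example, the only subgroup with a non-trivial fixed-point count is $L(4, 1, 1)$, with $F = \gcd(15, 5) = 5$, so the maximum ratio is $\log 5/4$.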

4.5 Further Diophantine Problems with Recurrences

Taking into account the expression (4.1) for a general linear recurrent sequence, the Diophantine equations with recurrences turn out to be special cases of Diophantine equations of polynomial-exponential type, in an arbitrary number of variables. Naturally, among these equations we find the purely polynomial ones, which are more classical yet far from being fully understood; it could therefore seem futile to study a still more general case. The point is that, on the one hand, exponential equations sometimes occur naturally in the study of the polynomial ones and, on the other hand, the rapidly growing exponential terms sometimes simplify things. We have already met a few examples; e.g., the $S$-unit equation appears to be of purely exponential type if we express the variables in the form $g_1^{a_1}\cdots g_r^{a_r}$, for fixed generators $g_i$ of $\mathcal{O}_S^*$. The subspace theorem has proved to be an extremely efficient tool in this context.

A theorem of remarkable generality for polynomial-exponential equations is due to Laurent; however, its statement needs certain definitions, so we omit it from this brief account, referring the reader instead to [Lau] or [S4]. In particular, such a result yields a complete classification of the equations $f(x) = g(y)$ with infinitely many integer solutions, where $f, g$ are linear recurrences, each having at least one root which is not a root of unity. Note, however, that this apparently innocuous restriction in fact excludes a fairly natural class of equations, like those of the type $y^d = f(x)$ (and the conclusions on linear equations of Section 4.3 are also not covered by this theorem of Laurent).
Insofar as the equations yd = f (x), for a recurrence f and a fixed d ∈ N, are concerned, it was again Pisot who formulated a conjecture, called the dth-root conjecture, analogous to the one concerning the Hadamard quotient; namely If all the values f (n), n ∈ N, are perfect dth powers in a fixed finitely generated field, then f is the dth power of some recurrence. Partial results were obtained by several authors, including Pisot himself, Lovasz, Bézivin, Perelli and Zannier, and Rumely and van der Poorten; the last pair of authors reduced the problem by specialization to the number-field case. The conjecture was finally proved in [Z4], with methods relying on congruences.3 As for the “Hadamard quotients,” it would be desirable to draw conclusions under assumptions valid for an infinity of values rather than for all values. In this direction, apart from simple recurrences with at most two roots (see [ShT]), the known results always involve a dominant root. 3

³ See also [Z7] for more general results.


When $d$ is large with respect to certain parameters related to the recurrence $f$, Baker's estimates for linear forms in logarithms may sometimes be applied; in those cases one may obtain strong finiteness theorems, even for variable $d$, which are, moreover, effective (see [ShSt]). For unrestricted (but fixed) $d$ (e.g. for $d = 2$), the first finiteness results valid for any number of roots were obtained in [CZ1], in the general case of simple recurrences defined over $\mathbb{Q}$. That paper actually considers arbitrary algebraic equations $F(y, f(n)) = 0$, where $F$ is a polynomial and $f$ is a simple recurrence over $\mathbb{Q}$. It is also observed there that the same arguments often go through under the sole assumption that $f$ has a dominant root. A result in this direction appears as Theorem 2 in [CZ4a]. Here we shall limit ourselves to an example embodying much of the content of the methods concerned. We shall treat the problem of classifying the perfect $d$th powers which may be written as sums of a bounded number of S-units, under the assumption that some term in the sum is dominant. Finally, we shall apply the conclusion to simple recurrences with a dominant root.

An important tool in dealing with results of this kind is an approximation result generalizing the Roth–Ridout theorem. It is obtained as an application of the subspace theorem, and will also be used in the applications to transcendental number theory in the last chapter. Here is its statement, which appeared implicitly in [CZ1] and in greater generality in [CZ4a]:

Theorem 4.16 Let $n \mapsto f(n)$ be a non-constant linear recurrence sequence with integral roots $> 1$ and rational coefficients. Let $\alpha$ be a non-zero real algebraic number and let $\varepsilon > 0$. There exist only finitely many rational numbers of the form $m/f(n)$, $m \in \mathbb{Z}\setminus\{0\}$, $n \in \mathbb{N}$, satisfying
$$\left|\alpha - \frac{m}{f(n)}\right| < |m|^{-(1+\varepsilon)}. \qquad (4.13)$$

Remark 4.17 The case $f(n) = b^n$, with $b > 1$ a fixed integer, falls under Ridout's theorem: in that case the special form of the denominator ensures that the rational approximation $m/b^n$ tends to infinity with respect to the $p$-adic absolute values for $p$ dividing $b$. Even the simple case $f(n) = 2^n + 1$ is not covered by such an interpretation; however, if we pass to higher dimensions we can argue as follows. We associate with the rational number $m/(2^n + 1)$ the rational point $x = (x_0 : x_1 : x_2) = (m : 2^n : 1) \in \mathbb{P}^2(\mathbb{Q})$; the special form of this point can be translated by saying that the 2-adic linear form $X_1$ and the Archimedean linear form $\alpha X_1 + \alpha X_2 - X_0$ take “small” values at $x$. This last fact can be exploited via the two-dimensional subspace theorem. This is the main idea of the proof of Theorem 4.16.
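As a numerical illustration of Remark 4.17 (not part of the proof), the Python sketch below takes the sample values $\alpha = \sqrt2$ and $f(n) = 2^n + 1$, which are our own choices, picks $m$ as the nearest integer to $\sqrt2(2^n + 1)$, and checks at the point $x = (m, 2^n, 1)$ that the 2-adic absolute value of the coordinate $x_1 = 2^n$ is $2^{-n}$, that the Archimedean quantity $|\sqrt2(x_1 + x_2) - x_0|$ stays below $1/2$ (tiny compared with the coordinates, which have size about $2^n$), and that the two sizes of the S-unit $2^n$ multiply to 1, as the product formula predicts. The helper names `abs_p`, `nearest_sqrt2_multiple` and `remark_point` are hypothetical.

```python
from fractions import Fraction
from math import isqrt


def abs_p(x: Fraction, p: int) -> Fraction:
    """p-adic absolute value |x|_p = p**(-v_p(x)), normalized so that |p|_p = 1/p."""
    if x == 0:
        return Fraction(0)
    num, den, v = x.numerator, x.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return Fraction(1, p ** v) if v >= 0 else Fraction(p ** (-v))


def nearest_sqrt2_multiple(f: int) -> int:
    """Nearest integer to sqrt(2) * f for f > 0, using exact integer arithmetic."""
    m = isqrt(2 * f * f)  # floor(sqrt(2) * f)
    # sqrt(2) * f >= m + 1/2  <=>  8 f^2 >= (2m + 1)^2
    return m + 1 if 8 * f * f >= (2 * m + 1) ** 2 else m


def remark_point(n: int):
    """The point x = (m, 2^n, 1) attached to the approximation m / (2^n + 1) of sqrt(2)."""
    return nearest_sqrt2_multiple(2 ** n + 1), 2 ** n, 1


for n in (5, 10, 20):
    m, x1, x2 = remark_point(n)
    two_adic = abs_p(Fraction(x1), 2)            # |x1|_2 = 2^(-n): small 2-adically
    archimedean = abs(2 ** 0.5 * (x1 + x2) - m)  # |sqrt(2)(x1 + x2) - x0|: at most 1/2
    assert x1 * two_adic == 1                    # product formula for the S-unit 2^n
    print(n, m, two_adic, round(archimedean, 6))
```

The exact integer rounding avoids any floating-point doubt about which integer is nearest; the float is used only for the printed Archimedean size.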


Diophantine Equations with Linear Recurrences

Notice that the numerator $|m|$ can be replaced by the denominator $|f(n)|$ on the right-hand side of (4.13), leading to an equivalent statement. As for Ridout's theorem, Theorem 4.16 admits an improvement to “exponent $\varepsilon$” whenever both the numerator and the denominator are restricted to being values of fixed linear recurrences. Intermediate cases are possible: for instance, one can consider rational approximations of the form $mb^k/f(n)$, for fixed $b$ and varying $m, k, n$; in that case the conclusion would be exactly as stated, i.e. the right-hand side in (4.13) would remain $|m|^{-(1+\varepsilon)}$.

Proof of Theorem 4.16. Suppose then that $f\colon \mathbb{N} \to \mathbb{Q}$ is a power sum with integral roots $> 1$, given by an expression of the form
$$f(n) = a_1 b_1^n + \cdots + a_k b_k^n. \qquad (4.14)$$

Here $a_1, \ldots, a_k$ are non-zero integers, whereas $1 < b_1 < b_2 < \cdots < b_k$ are distinct integers. Let $\alpha \in \mathbb{R}$ be a non-zero real algebraic number. We first note that $|f(n)|$ tends to infinity exponentially with $n$; more precisely, $f(n)/b_k^n \to a_k \neq 0$, so in particular $f(n) \neq 0$ for all sufficiently large $n$. For each such $n$ write
$$\left|\alpha - \frac{m}{f(n)}\right| = \frac{1}{|f(n)|}\,\bigl|a_1\alpha b_1^n + \cdots + a_k\alpha b_k^n - m\bigr|.$$
Put $x = (x_0, \ldots, x_k) = (m, b_1^n, \ldots, b_k^n)$. We shall apply the subspace theorem (e.g. in its third version, namely Theorem 2.3) in the following way. Let $S$ be the set of places of $\mathbb{Q}$ formed by the Archimedean place and those corresponding to the prime divisors of $b_1 \cdots b_k$. Define the linear forms in the $k + 1$ variables $(X_0, \ldots, X_k)$ as follows: for each prime $p$ dividing $b_1 \cdots b_k$ put
$$L_{i,p}(X_0, \ldots, X_k) = X_i, \qquad i = 0, \ldots, k;$$
and for the Archimedean place $|\cdot|_\infty$ put
$$L_{0,\infty}(X_0, \ldots, X_k) = a_1\alpha X_1 + \cdots + a_k\alpha X_k - X_0, \qquad L_{i,\infty}(X_0, \ldots, X_k) = X_i, \quad i > 0.$$

The double product $\prod_{i=0}^k \prod_{\nu\in S} |L_{i,\nu}(x_0, \ldots, x_k)|_\nu$ appearing in Theorem 2.3 can be easily estimated in the present case: upon writing $x = (x_0, \ldots, x_k) = (m, b_1^n, \ldots, b_k^n)$ and $L_0 := L_{0,\infty}$, we have
$$\prod_{i=0}^k \prod_{\nu\in S} |L_{i,\nu}(x)|_\nu = |L_0(x)| \cdot \prod_{p \mid b_1\cdots b_k} |m|_p \cdot \prod_{i=1}^k \prod_{\nu\in S} |x_i|_\nu.$$

The second factor above is $\le 1$ (as $m$ is an integer), while the third factor is exactly equal to 1 since the $x_i$, $i > 0$, are all S-units. Then the double product is bounded as
$$\prod_{i=0}^k \prod_{\nu\in S} |L_{i,\nu}(x)|_\nu \le |L_0(x)|.$$

Now, Theorem 2.3 provides the lower bound $H(x)^{-\varepsilon}$ for the double product above, apart from the case of possible “trivial families.” We leave to the reader the verification that in fact such infinite families of exceptions cannot occur in this case. Hence the bound
$$|L_0(x)| = |\alpha a_1 b_1^n + \cdots + \alpha a_k b_k^n - m| > H(m, b_1^n, \ldots, b_k^n)^{-\varepsilon}$$
is valid for all but finitely many pairs $(m, n) \in \mathbb{Z}\times\mathbb{N}$. From this lower bound the inequality (4.13) follows immediately (for the relevant pairs $|m| \asymp |f(n)| \asymp b_k^n$, so that $H(x) \le \max(|m|, b_k^n) \ll |m|$; it then suffices to apply the bound with a smaller positive value of $\varepsilon$).

Theorem 4.16 admits some applications to finiteness results for Diophantine equations, for instance to perfect powers in linear recurrence sequences, as shown in [CZ1]; a generalization of such results is provided below in Theorem 4.18. Further applications to transcendental number theory will be given in the next chapter.

Sums of units which are perfect powers. Diophantine equations like $y^2 = b^n + 1$, for fixed $b$, can easily be dealt with via Siegel's theorem for curves (see Exercise 3.19). However, the similar equation $y^2 = 1 + 2^n + 3^n$ apparently is not amenable to treatment by Siegel's method. Again, we shall prove finiteness for the set of solutions to this equation, and similar ones, but once again the proof will involve higher-dimensional constructions. A more general problem consists of investigating perfect powers which are sums of units, or of “almost units” in a sense which will be made precise. As usual, one expects finiteness results, apart from possible trivial families, arising e.g. from identities like $(\alpha + \beta b^m)^2 = \alpha^2 + 2\alpha\beta b^m + \beta^2 b^{2m}$, which enable one to construct infinitely many perfect squares which are sums of three units in a fixed ring of S-integers.

We shall treat the problem over a general number field. The arithmetical problems are naturally formulated in the geometric setting of integral points on subvarieties of linear tori. Let $k$ be a number field, let $S \subset M_k$ be a finite set containing $M_{k,\infty}$, and fix $\nu \in S$. Also, let $X_1, \ldots, X_n$ be the standard coordinates on $\mathbb{G}_m^n$. We then have the following theorem.

Theorem 4.18 Let $d \in \mathbb{N}$ and $\delta > 0$. Let $\Sigma \subset \mathbb{G}_m^n$ be a set of vectors $x = (x_1, \ldots, x_n) \in (\mathcal{O}_S^*)^n$ such that


(i) $|x_1|_\nu \ge \bigl(\max_{j\ge 2}|x_j|_\nu\bigr) H(x)^\delta$;
(ii) there exists $z = z_x \in k^*$ with $x_1 + \cdots + x_n = z^d$.
Then $\Sigma$ is contained in a finite union of translates $uH \subset \mathbb{G}_m^n$, $u \in (\mathcal{O}_S^*)^n$, where $H$ is an algebraic subgroup of $\mathbb{G}_m^n$, such that $X_1 + \cdots + X_n$ (as a function on $uH$) is the product of a monomial $cX_1$ by a $d$th power in $k[uH]$ (here $k[uH]$ denotes the ring of regular functions on $uH$).

Condition (i) on the “dominant term” is probably not needed (for a similar conclusion), but removing it seems a very difficult problem. Note that the conclusion is rather restrictive on the relevant translates $uH$, and admits a partial converse. In fact, suppose for example that in the ring $k[H]$ a relation of the form $X_1 + \cdots + X_n = X_1 P(X_1, \ldots, X_n)^d$ holds. Then, if $y \in H(\mathcal{O}_S)$, all the vectors $x := y^d$ satisfy at least assumption (ii), and they are Zariski-dense in $H$ if $S$ is large enough. For given $\delta, d$, the method of proof allows one to find all the finitely many relevant subgroups $H$, namely those such that the solutions to (ii) are Zariski-dense in $uH$, for suitable $k, S, u$. However, as often happens with these questions, one does not know, given $k$ and $S$, how to decide whether for a given $H$ (say $H = \{\mathrm{id}\}$) there exists an admissible $u$.

Proof of Theorem 4.18. Since $\mathcal{O}_S^*$ is finitely generated, the quotient $\mathcal{O}_S^*/[d]\mathcal{O}_S^*$ is finite. In proving the theorem we may then suppose that for $x \in \Sigma$ the class of $x_1$ modulo $[d]\mathcal{O}_S^*$ is fixed, say $x_1 = \xi t^d$, where $\xi \in \mathcal{O}_S^*$ is fixed and $t = t_x \in \mathcal{O}_S^*$. Put $\pi(x) := (\xi/x_1)x$. Then the set $\pi(\Sigma)$ also satisfies the assumptions. Suppose for a moment that the conclusion is true for this set $\pi(\Sigma)$ of vectors, and let $uH$ be a translate as in the conclusion. We have an equation $X_1 + \cdots + X_n = cX_1P(X)^d$ in $k[uH]$. Now, the substitution $X_i \mapsto X_i\xi/X_1$, performed on a set of defining equations for $uH$, yields equations for a new translate $u'H' = \pi^{-1}(uH)$.
Note that the union of the $u'H'$ contains $\Sigma$ (if the union of the $uH$ contains $\pi(\Sigma)$); also, the equation $X_1 + \cdots + X_n = cX_1P(\pi(X))^d$ holds on $u'H'$. Summing up, we can argue just with $\pi(\Sigma)$ in place of $\Sigma$; namely, we may suppose that $x_1 = \xi$ is fixed for $x \in \Sigma$. Write $z^d = x_1 + \cdots + x_n = \xi(1 + \rho)$, where $\rho := (x_2 + \cdots + x_n)/\xi$ and $z = z_x \in k^*$ (and $x \in \Sigma$). Since $z^d \in \mathcal{O}_S$ and $\mathcal{O}_S$ is integrally closed, in fact $z \in \mathcal{O}_S$. Define $k'$ as the field obtained by adjoining to $k$ the $d$th roots of $\xi$, and extend the place $\nu$ to $k'$ (preserving, however, the normalization with respect to $k$). Now, assumption (i) entails $|\rho|_\nu \le nH(x)^{-\delta}$. For all but finitely many $x \in \Sigma$, which we tacitly disregard, we can then expand $(1 + \rho)^{1/d}$ by the binomial theorem, in a series which is absolutely convergent in the $\nu$-adic topology (see [DGS]). In particular, we obtain, for a suitable $d$th root $\eta \in k'$ of $\xi$,
$$z = \eta\sum_{j=0}^{\infty}\binom{1/d}{j}\rho^j$$
$\nu$-adically. On truncating this series after $R$ terms, where $R$ is a fixed integer $> 3/\delta$, using (i) and expanding the powers of $\rho = (x_2 + \cdots + x_n)/\xi$, we get
$$\Bigl|z - \eta\sum_{j=1}^{N} c_j\mu_j\Bigr|_\nu \ll H(x)^{-R\delta} \le H(x)^{-3}, \qquad (4.15)$$

where $N = N(R)$, the $c_j$ are fixed (independently of $x$), and the $\mu_j$ are monomials in the quantities $x_i$ ($i \ge 2$), of degree $\le R$. In particular, the $\mu_j$ are S-units of height $\le H(x)^R$. We shall apply the subspace theorem in the form of Theorem 2.3 (in the present notation) with the following data. The number of variables will be $1 + N$, to start with. As for the linear forms (in $Y_0, \ldots, Y_N$), let us put $L_{0\nu}(Y) = Y_0 - \sum_{j=1}^N \eta c_j Y_j$ and, for $(i, v) \ne (0, \nu)$, $L_{iv}(Y) = Y_i$. Note that for all $v \in S$ the forms $L_{iv}$ are indeed linearly independent. We are going to evaluate these forms at the vectors $y = (y_0, \ldots, y_N) := (z, \mu_1, \ldots, \mu_N)$ (associated with the vectors $x \in \Sigma$). To verify the assumptions, we have to estimate the double product $\prod_{v\in S}\prod_{i=0}^N |L_{iv}(y)|_v$. Since $y_1, \ldots, y_N$ are S-units, we have $\prod_{v\in S}|y_i|_v = 1$ (for $i > 0$) by the product formula, and therefore

$$\prod_{v\in S}\prod_{i=0}^N |L_{iv}(y)|_v = |L_{0\nu}(y)|_\nu \prod_{v\ne\nu,\; v\in S}|z|_v.$$

Moreover, $z \in \mathcal{O}_S$, whence $\prod_{v\ne\nu,\,v\in S}|z|_v \le |z|_\nu^{-1}\prod_{v\in M_k}\max(1, |z|_v) \le |z|_\nu^{-1}H(z)$. Since $|z|_\nu^{-1} \le H(z^{-1}) = H(z)$, using (4.15) we obtain

$$\prod_{v\in S}\prod_{i=0}^N |L_{iv}(y)|_v \ll H(x)^{-3}H(z)^2.$$

Finally, $H(z) = H(\xi + x_2 + \cdots + x_n)^{1/d} \ll_\xi H(\xi : x_2 : \cdots : x_n) = H(x)$, whence
$$\prod_{v\in S}\prod_{i=0}^N |L_{iv}(y)|_v \ll H(x)^{-1}.$$

On the other hand, $H(y) \le H(x)^{Rn+1}$; so we may indeed apply the subspace theorem, Theorem 2.3 (with some $\varepsilon < (Rn + 1)^{-1}$), concluding that the vectors $y$ in question lie in a finite union of proper linear subspaces of $k^{N+1}$. Therefore, taking into account a single subspace each time, we may assume that we have an equation
$$\alpha_0 z = \sum_{i=1}^N \alpha_i\mu_i, \qquad (4.16)$$

where the $\alpha_i$ are fixed elements of $k$, not all zero. Suppose first that $\alpha_0 = 0$. Then we may express one of the $\mu_j$ as a fixed linear combination of the remaining ones. On substituting in the left-hand side of (4.15) we find an analogous inequality, which, however, involves only $N - 1$ of the $N$ terms $\mu_j$; we will then be able to repeat the whole procedure which led from (4.15) to (4.16). Since the number of $\mu_j$ that appear decreases each time, this iteration will stop after at most $N$ steps. In conclusion, replacing $N$ with a smaller number if necessary, we can assume that $\alpha_0 \ne 0$ in (4.16); then, on dividing by $\alpha_0$, we can assume that $\alpha_0 = 1$. On substituting into the equation $z^d = \xi + x_2 + \cdots + x_n$ we find
$$\Bigl(\sum_{i=1}^N \alpha_i\mu_i\Bigr)^d = \xi + x_2 + \cdots + x_n.$$

Recall that the $\mu_i$ represent certain monomials $M_i$ in $X_2, \ldots, X_n$, evaluated at $X_j = x_j$, and recall that $x_1 = \xi$ for the vectors in question. Then the last equation (rewritten to agree with the sought conclusion) says that the point $x$ lies in the variety $V \subset \mathbb{G}_m^n$ defined by the equations
$$\xi^{-1}X_1\Bigl(\sum_{i=1}^N \alpha_i M_i\Bigr)^d = X_1 + X_2 + \cdots + X_n, \qquad X_1 = \xi. \qquad (4.17)$$

Note that the proof leads to at most finitely many such varieties. Now, by Theorem 2.7, the Zariski closure of $V(\mathcal{O}_S^*)$ is a finite union of certain algebraic translates entirely contained in $V$; in view of the left-hand side of the first equation in (4.17), these translates therefore satisfy the conclusion of Theorem 4.18. To complete the proof, we need only observe that $\Sigma$ is contained in the union of the (finitely many) sets $V(\mathcal{O}_S^*)$.

Corollary 4.19 Let $\Sigma$ be as in Theorem 4.18. Then the Zariski closure of $\Sigma$ in $\mathbb{G}_m^n$ is a finite union of algebraic translates with the property of Theorem 4.18.

Proof Since $\Sigma$ is contained in the finitely generated group $(\mathcal{O}_S^*)^n$, by Theorem 2.7 the Zariski closure of $\Sigma$ is a finite union of algebraic translates, each of which we may assume to be irreducible. By Theorem 4.18, each such translate is then contained in an algebraic translate with the property in question. Now, plainly that property is shared by any subtranslate (in fact, by any subvariety), concluding the proof.
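The equation $y^2 = 1 + 2^n + 3^n$, mentioned above as a case not reachable by Siegel's method, can at least be explored by machine. The Python sketch below (the helper names `is_perfect_power` and `exponents_with_power_value` are ours) searches a finite range of exponents using exact integer $d$th roots; it proves nothing beyond that range, but it is consistent with the finiteness that Corollary 4.19 yields for this sum in the Example that follows.

```python
def is_perfect_power(value: int, d: int) -> bool:
    """True if value == y**d for some integer y >= 0 (value >= 0, d >= 2)."""
    # Integer d-th root by binary search, avoiding floating-point error.
    lo, hi = 0, 1 << (value.bit_length() // d + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** d <= value:
            lo = mid
        else:
            hi = mid - 1
    return lo ** d == value


def exponents_with_power_value(d: int, n_max: int) -> list:
    """All n with 0 <= n < n_max such that 1 + 2^n + 3^n is a perfect d-th power."""
    return [n for n in range(n_max) if is_perfect_power(1 + 2 ** n + 3 ** n, d)]


print(exponents_with_power_value(2, 300))  # [3], since 1 + 8 + 27 = 36 = 6^2
```

For squares the search returns only $n = 3$. For this particular sum one can check by hand that congruences modulo 4, 9, 16 and 13 exclude every other $n$, in line with the remark that congruence considerations sometimes suffice for power sums of this shape.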


Example As a simple instance, define $\Sigma$ as the set of vectors $(3^n, 2^n, 1) \in \mathbb{G}_m^3$ ($n \in \mathbb{N}$) such that $3^n + 2^n + 1$ is a perfect square. To this set we may apply the corollary, on taking $\nu$ as the usual Archimedean absolute value and $\delta = \log(3/2)/\log 3$ (indeed $H(x) = 3^n$, and $2^n \cdot 3^{n\delta} = 3^n$ for this value of $\delta$, so condition (i) holds). If $\Sigma$ were infinite, its Zariski closure would be the algebraic coset $\mathbb{G}_m^2 \times \{1\}$. Its algebra equals the ring $k[X_1^{\pm1}, X_2^{\pm1}]$ (for independent variables $X_1, X_2$) and the function $X_1 + X_2 + X_3$ restricted to it equals $X_1 + X_2 + 1$, which is not a monomial times a square. Hence Corollary 4.19 (or Theorem 4.18) implies the finiteness of $\Sigma$. (Compare this with Exercise 1.51.)⁴

By the same reasoning, Corollary 4.19 proves that, for each $d \ge 2$, the sequence $3^n + 2^n + 1$ is a perfect $d$th power only for finitely many $n$. (As we have mentioned, for a sufficiently large exponent $d$ one can prove, via Baker's theory of linear forms in logarithms, that the value of the sequence is never a perfect $d$th power; hence that sequence contains only finitely many perfect powers.)

A natural problem arises: can we prove the finiteness of perfect $d$th-power values for an exponential sequence in two variables, such as $1 + 2^m + 3^n$? The problem in applying Theorem 4.18 lies in the verification of condition (i), which is not satisfied whenever $2^m$ is of the same order of magnitude as $3^n$, that is, when $m\log 2/(n\log 3)$ tends to 1. While in some cases one can find ad hoc methods to treat the problem, as D. Leitner did in [Lei] for the perfect squares in the sequence $1 + 2^m + 3^n$, the general question is still open. We do, however, have particular results, which cover for instance the case of numbers of the form $1 + 2^m + 2^n$; we present one such result here as an example.

Theorem 4.20 Let $b > 1$ be an integer and $a_1, a_2, a_3$ non-zero integers. For each $d \ge 3$, the Diophantine equation $y^d = a_1 + a_2 b^m + a_3 b^n$ has only finitely many solutions $(m, n) \in \mathbb{N}^2$, $y \in k$.
If the equation $y^2 = a_1 + a_2 b^m + a_3 b^n$ has infinitely many solutions, then there exist $\alpha, \beta \in \mathbb{Z}$ such that for infinitely many of them $y = \alpha + \beta b^h$, for some $h \in \mathbb{N}$.

This result would follow immediately from Theorem 4.18 except possibly for a sequence of solutions for which $m/n$ tends to 1. In that case, however, dividing by, say, $b^m$, we obtain the sequence $a_1 b^{-m} + a_2 + a_3 b^{n-m}$, where now the first term is “dominant” with respect to any $p$-adic absolute value for $p$ dividing $b$. Hence condition (i) in Theorem 4.18 is satisfied (for $(x_1, x_2, x_3) = (b^{-m}, 1, b^{n-m})$) for the $p$-adic absolute value $|\cdot|_p$ and any $\delta < 1$.

⁴ Sometimes congruence considerations may lead to the same conclusion for similar power sums (see, e.g., Exercise 4.48 below and the notes at the end of this chapter); however, this will never be the case if we interpret “perfect square” as meaning a perfect square in an arbitrary (though fixed) number field.

In a similar way, we can obtain a more general result by considering the perfect $d$th powers in a simple recurrence sequence. The following result generalizes [CZ1], Corollary 1. (See also Theorem 2 of [CZ1], and Theorem 2 of [CZ4a], for more general algebraic equations.)

Theorem 4.21 Let $f(n)$ be a simple recurrence defined over a number field $k$, with roots $\rho_1, \ldots, \rho_s \in k^*$. Suppose that for a place $\nu \in M_k$ we have $|\rho_1|_\nu > \max_{i=2}^{s}|\rho_i|_\nu$ and that there exist infinitely many $n \in \mathbb{N}$ such that the equation $z^d = f(n)$ has a solution $z \in k$. Then there exist positive integers $Q, R$ and a recurrence $g(n)$ (defined over $k$) such that $g(n)^d = f(Qn + R)$ for all $n \in \mathbb{N}$.

Proof Put $f(n) = c_1\rho_1^n + \cdots + c_s\rho_s^n$, $c_i \in k^*$. On subdividing $\mathbb{N}$ into a finite number of arithmetic progressions (with the same modulus), we can assume, as in Section 4.1, that the group $G \subset k^*$ generated by the roots $\rho_i$ is torsion-free, and that $f$ is non-degenerate (and hence non-zero). Choose a finite subset $S \subset M_k$ large enough to contain $\nu$ and the infinite places of $k$, and such that $c_i, \rho_i \in \mathcal{O}_S^*$ for $i = 1, \ldots, s$. Let $A \subset \mathbb{N}$ be the infinite set of $n \in \mathbb{N}$ such that $f(n)$ is a perfect $d$th power in $k$. Moreover, define $\Sigma := \{x_n := (c_1\rho_1^n, \ldots, c_s\rho_s^n) : n \in A\}$. Let $\lambda = \min_{i\ge 2}|\rho_1/\rho_i|_\nu > 1$. Then, if $x_n = (x_1, \ldots, x_s) \in \Sigma$ and $i \ge 2$, we have $|x_1/x_i|_\nu = |(c_1\rho_1^n)/(c_i\rho_i^n)|_\nu \gg \lambda^n$. On the other hand, $H(x_n) \ll C^n$ for some $C$, whence for a certain $\delta > 0$ we have $|x_1/x_i|_\nu \ge H(x_n)^\delta$ for all large enough $n$. Then the set $\Sigma$ (apart from a finite subset) satisfies the assumptions of Theorem 4.18 (on putting $s$ in place of $n$). In particular, $\Sigma$ is contained in a finite union of algebraic translates as in that theorem. Let $uH$ be one of those translates, having an infinite intersection $\Sigma'$ with $\Sigma$. We therefore have an identity
$$X_1 + \cdots + X_s = cX_1 Q(X_1, \ldots, X_s)^d \qquad (4.18)$$

that is valid on the whole of $uH$, where $Q \in k[X_1^{\pm1}, \ldots, X_s^{\pm1}]$. In particular, the identity holds for $X = x_n \in \Sigma'$. On setting $B = \{n \in \mathbb{N} : x_n \in \Sigma'\}$ we obtain, on substituting into (4.18),
$$f(n) = cc_1\rho_1^n Q(c_1\rho_1^n, \ldots, c_s\rho_s^n)^d, \qquad \forall n \in B. \qquad (4.19)$$

Let $h(n)$ be the recurrence on the right-hand side. Then both $f$ and $h$ have roots in $G$, which is torsion-free; hence the recurrence $f(n) - h(n)$ is either non-degenerate or zero. This recurrence vanishes on $B$; since $B$ is infinite, the first case cannot occur (by Theorem 4.3), and therefore $f(n) = h(n)$ identically. Now choose $r \in B$ such that $f(r) \ne 0$ (which is possible since $f$ is non-degenerate and $B$ is infinite). Since $f(r)$ is a perfect $d$th power (as $r \in B \subset A$) and $f(r) = h(r)$, it follows that $cc_1\rho_1^r$ must be a perfect $d$th power in $k$, say $cc_1\rho_1^r = \eta^d$ with $\eta \in k^*$, whence, on replacing $n$ with $dn + r$ in the identity $f(n) = h(n)$, we get
$$f(dn + r) = \bigl(\eta\rho_1^n Q(c_1\rho_1^{dn+r}, \ldots, c_s\rho_s^{dn+r})\bigr)^d, \qquad \forall n \in \mathbb{N},$$
proving the result, with $g(n) := \eta\rho_1^n Q(c_1\rho_1^{dn+r}, \ldots, c_s\rho_s^{dn+r})$.

Like many of the previous statements, this theorem too (which admits an obvious converse) says that an infinity of “special” values (now, those which are perfect $d$th powers) may always be explained by some algebraic identity. Note, by the way, that in concrete cases the existence of such an identity may easily be checked: as in Section 4.1, one first reduces to a recurrence with roots in a torsion-free group, then applies Proposition 4.2, reducing the verification to the easy case of polynomials. For instance, in the above example of the equation $y^2 = 3^n + 2^n + 1$, the finiteness of the set of solutions now follows from the fact that $3^r X_1^q + 2^r X_2^q + 1$ is not a square in $k[X_1^{\pm1}, X_2^{\pm1}]$, no matter what values the positive integers $r, q$ take.

We owe to A. Pethő the following remark. Take a simple recurrence $f$ with a dominant root and another root. Then, firstly, one can apply the results in [ShSt] to show that, for a certain computable $d_f$, the equation $y^d = f(n)$ has only finitely many solutions in integers $d > d_f$, $n$ and $y \in k$. Secondly, one can apply Theorem 4.21 for each $d \le d_f$ to obtain a complete description of the solutions, for variable $d \ge 2$.

Observe also that all the conclusions on recurrences that we have met so far imply that, if the corresponding property of $f(n)$ holds for infinitely many $n \in \mathbb{N}$, then it holds for all the $n$ in a suitable arithmetic progression, somewhat similarly to Theorem 4.3.

A related but more difficult problem is that of proving the finiteness of perfect powers with a fixed number of non-zero digits. For instance, consider the equation
$$y^d = 2^l + 2^m + 2^n + 1 \qquad (4.20)$$

to be solved in positive integers $l, m, n, d, y$ with $l > m > n > 0$, $d \ge 2$. If the ratios $l/m$, $l/n$ (and so also $m/n$) are fixed, the right-hand side above is a linear recurrence sequence in $n$; so, for a fixed exponent $d \ge 2$, the finiteness of the solutions follows from our Corollary 4.19. However, if we do not fix the ratios $l/m$, $l/n$, then, even for fixed $d$, the above equation is not amenable to any application of Theorem 4.18. Note that, again for fixed $d$, the finiteness of the set of solutions would follow from Vojta's conjecture: indeed, consider the threefold $V \subset \mathbb{A}^4$ defined by the equation $y^d = x_1^d + x_2^d + x_3^d + 1$. Every solution to (4.20) provides a point $(x_1, x_2, x_3, y) = (2^{l/d}, 2^{m/d}, 2^{n/d}, y) \in V(\mathcal{O}_S)$, where $\mathcal{O}_S$ is the ring of S-integers $\mathcal{O}_S = \mathbb{Z}[2^{-1/d}]$. Moreover, such a point is integral with respect to the divisor $x_1x_2x_3 = 0$, as well as to the divisor at infinity. For $d \ge 2$ this suffices to apply Vojta's conjecture, which predicts the degeneracy of the solutions; from this, it would be easy to deduce finiteness. (Note that, whenever $d \ge 5$, Vojta's conjecture predicts the degeneracy of the set of solutions $(y, x_1, x_2, x_3) \in \mathcal{O}_S^4$ to the equation $y^d = x_1^d + x_2^d + x_3^d + 1$ even without assuming that $x_1, x_2, x_3$ are units.) Although for such varieties the general version of Vojta's conjecture (i.e., for arbitrary number fields and rings of integers) is still unproved, Equation (4.20) has been proved to have only finitely many solutions by combining different techniques, which would divert us too far from our main purpose if they were to be explained here. The works of Bennett, Bugeaud, and Mignotte in [BBM] and of the authors of the present book in [CZ10] led to the following theorem.

Theorem 4.22 Equation (4.20) has only finitely many solutions $(l, m, n, d, y) \in \mathbb{N}^5$ with $l > m > n > 0$, $d > 1$. For all solutions, $d \le 5$.

Equations with several linear recurrences. We shall briefly consider equations of the form
$$f_1(n_1) + \cdots + f_k(n_k) = 0, \qquad (4.21)$$

where $k \ge 2$ is a fixed integer and $f_1, \ldots, f_k\colon \mathbb{N} \to \mathbb{C}$ are given linear recurrent sequences, to be solved in $(n_1, \ldots, n_k) \in \mathbb{N}^k$. Actually many of the equations treated so far belong to this class: for instance, taking into account that the sequence of perfect squares is itself linear recurrent, the equation $y^2 = 1 + 2^n + 3^n$, which by Corollary 4.19 has only finitely many solutions, is of the above form with $k = 2$. Whenever $k = 2$ or the recurrences are purely exponential, one might conjecture that the set of solutions to (4.21) consists of the union of a finite set together with a finite (possibly empty) union of infinite families of the form
$$\mathbf{n}(m_1, \ldots, m_h) := (p_1(m_1, \ldots, m_h), \ldots, p_k(m_1, \ldots, m_h)), \qquad (4.22)$$
where $h \ge 1$ is an integer and each $p_j$ is of polynomial-exponential type, i.e., of the form $\sum_{i=1}^{d} q_i(m_1, \ldots, m_h)\,\alpha_{i,1}^{m_1}\cdots\alpha_{i,h}^{m_h}$. For instance, the S-unit equation Theorem 2.4 gives this conclusion for the equation $a_1\alpha_1^{n_1} + \cdots + a_k\alpha_k^{n_k} = 0$, where $a_1, \alpha_1, \ldots, a_k, \alpha_k$ are fixed non-zero (algebraic)⁵ numbers. However, for general equations of the form (4.21) there are cases in which such a parametrization of infinite families does not hold, as discovered by V. Losert [Los]. We give here Losert's example for the simple-looking equation
$$2^l + m2^m = n2^n, \qquad (4.23)$$

which fits into the form (4.21) with $k = 3$. An infinite two-dimensional family of solutions, parametrized by the pairs $(u, v) \in \mathbb{N}^2$, is provided by putting
$$n = 2^u\,\frac{2^{v2^u} - 1}{2^{2^u} - 1}, \qquad m = n - 2^u, \qquad l = m + u + v2^u. \qquad (4.24)$$
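Formula (4.24) can be checked mechanically. The Python sketch below (the names `losert_family` and `satisfies_4_23` are ours, and it assumes $u, v \ge 1$) generates $(l, m, n)$ from $(u, v)$ and verifies (4.23) with exact integer arithmetic. Since $n$ grows doubly exponentially, it checks the equivalent identity obtained by dividing both sides of (4.23) by $2^m$, and checks (4.23) itself only for small parameters.

```python
def losert_family(u: int, v: int):
    """(l, m, n) given by (4.24), assuming integers u >= 1 and v >= 1."""
    q = 2 ** (2 ** u)                     # q = 2^(2^u)
    n = 2 ** u * (q ** v - 1) // (q - 1)  # exact division: (q - 1) divides (q^v - 1)
    m = n - 2 ** u
    l = m + u + v * 2 ** u
    return l, m, n


def satisfies_4_23(l: int, m: int, n: int) -> bool:
    """Check 2^l + m 2^m = n 2^n after dividing both sides by 2^m (needs l, n >= m >= 0)."""
    return 2 ** (l - m) + m == n * 2 ** (n - m)


print(losert_family(1, 2))  # (13, 8, 10): 2**13 + 8 * 2**8 == 10 * 2**10 == 10240
print(all(satisfies_4_23(*losert_family(u, v)) for u in range(1, 6) for v in range(1, 6)))  # True
```

The division by $2^m$ also shows why the family works: with $n - m = 2^u$, Equation (4.23) becomes $2^{l-m} = n(2^{2^u} - 1) + 2^u$, and the choice of $n$ in (4.24) makes the right-hand side equal to $2^{u + v2^u}$.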

Notice that, whenever $v$ is fixed, say $v = 2$, one obtains doubly exponential sequences, e.g.
$$(l, m, n) = \bigl(2^{u+2^u} + u + 2\cdot 2^u,\; 2^{u+2^u},\; 2^u(2^{2^u} + 1)\bigr).$$
Losert provides all the infinite families of solutions to (4.23), proving in particular that the above family cannot be included in any family of the form (4.22). Other examples of this kind have recently been provided by H. Derksen and D. Masser in [DerMas]: the equation
$$2^l + (2^m\sqrt2 - m) + (2^n\sqrt3 - n\sqrt2) - \sqrt3\,j = 0$$
admits the triply exponential infinite family of solutions
$$(l, m, n, j) = \bigl(u,\; 2^u,\; 2^{2^u},\; 2^{2^{2^u}}\bigr).$$
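Why the Derksen–Masser family works can be seen by regrouping the equation along $1, \sqrt2, \sqrt3$, which are linearly independent over $\mathbb{Q}$ (a standard fact, not stated in the text). For integer unknowns the equation is therefore equivalent to three separate conditions:

```latex
\[
  2^l + (2^m\sqrt{2} - m) + (2^n\sqrt{3} - n\sqrt{2}) - \sqrt{3}\,j
  = (2^l - m) + (2^m - n)\sqrt{2} + (2^n - j)\sqrt{3} = 0
  \iff m = 2^l,\quad n = 2^m,\quad j = 2^n .
\]
```

Taking $l = u$ gives $m = 2^u$, $n = 2^{2^u}$ and $j = 2^{2^{2^u}}$, which is exactly the triply exponential family displayed above.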

⁵ As has already been remarked, the condition of algebraicity can be removed by a specialization argument.

4.6 Fractional Parts of Powers

Some classes of linear recurrence sequences arise as integral or fractional parts of powers of algebraic numbers. For instance, letting $\alpha := (1 + \sqrt5)/2$ be the golden ratio, the nearest integer $u_n$ to $\alpha^n$ (for $n \ge 2$) is expressed as $u_n = F_{n-1} + F_{n+1}$, where $n \mapsto F_n$ is the Fibonacci sequence. In particular, it satisfies Fibonacci's recurrence: $u_{n+2} = u_{n+1} + u_n$. This is due to the fact that $\alpha$ is a quadratic algebraic integer whose Galois conjugate is $-\alpha^{-1}$, which lies in the open interval $(-1, 1)$; then the sum $\alpha^n + (-\alpha)^{-n}$ is an integer for all $n \ge 0$, and the second addend tends to zero (exponentially), so the sum represents the integer nearest to $\alpha^n$ as soon as $\alpha^{-n} < 1/2$, i.e. for $n \ge 2$. Then, for all $n \ge 2$, $\|\alpha^n\| = \min_{k\in\mathbb{Z}}|\alpha^n - k| = |\alpha|^{-n}$, and this sequence is also a linear recurrent sequence. This example is a typical instance of the general situation of Pisot numbers.

Definition A real algebraic integer is called a Pisot number if all its other algebraic conjugates have complex absolute value $< 1$.

Clearly, given a Pisot number $\alpha$, for all large $n$ the trace $\mathrm{Tr}_{\mathbb{Q}(\alpha)/\mathbb{Q}}(\alpha^n)$ will be the nearest integer to $\alpha^n$, and the difference $\alpha^n - \mathrm{Tr}_{\mathbb{Q}(\alpha)/\mathbb{Q}}(\alpha^n)$ tends to zero exponentially as $n \to \infty$.

Notation Recall from Chapter 1 that, for every real number $\alpha$, we denote by $\|\alpha\|$ the distance from $\alpha$ to the lattice $\mathbb{Z} \subset \mathbb{R}$ of integers. Letting as usual $[\alpha]$ denote the integral part of $\alpha$, and $\{\alpha\} = \alpha - [\alpha]$ the fractional part of $\alpha$, we have $\|\alpha\| = \min(\{\alpha\}, 1 - \{\alpha\})$.

With this notation, for a Pisot number $\alpha$, letting $l$ be any real number strictly larger than the maximal absolute value of the conjugates of $\alpha$ distinct from $\alpha$ itself, we have $\|\alpha^n\| < l^n$ for all large $n$. Note that we can take $l < 1$. We also note that the sequence $n \mapsto \|\alpha^n\|$ is linear recurrent (at least for large $n$). It is natural to ask for which real numbers $\alpha$ the quantity $\|\alpha^n\|$ can tend to zero exponentially, at least along an infinite sequence of exponents $n$. In 1957, Mahler proved (see [Mah]) the following result.

Theorem 4.23 (Mahler's theorem) Let $\alpha > 1$ be a rational number which is not an integer. If $0 < l < 1$, then $\|\alpha^n\| > l^n$ except for a finite set of integers $n$ depending on $\alpha$ and $l$.

Proof Mahler's theorem is a consequence of Ridout's theorem (which is a special case of Theorem 1.39), as we now show. Let us write $\alpha = a/b$, for coprime positive integers $a, b$; we must have $a > b > 0$, since $\alpha > 1$. Letting $u_n \in \mathbb{Z}$ be the nearest integer to $\alpha^n$, we have
$$\|\alpha^n\| = \left|\frac{a^n}{b^n} - u_n\right| = \frac{a^n}{b^n}\left|1 - \frac{u_n b^n}{a^n}\right|.$$
We now apply the generalized Roth theorem (Theorem 1.39) with $k = \mathbb{Q}$, where $S$ is the set containing the Archimedean absolute value and the $p$-adic absolute values for $p \mid ab$. For $\nu = \infty$, let $\alpha_\nu = 1$; for $\nu$ dividing $a$, take $\alpha_\nu = \infty$, and for $\nu$ dividing $b$, put $\alpha_\nu = 0$. Then, if $\|\alpha^n\| \le l^n$, the product appearing in the generalized Roth theorem can be bounded as
$$\prod_{\nu\in S}\left|\alpha_\nu - \frac{u_n b^n}{a^n}\right|_\nu \le l^n a^{-n} b^n \cdot a^{-n} b^{-n} = l^n a^{-2n}.$$
On the other hand, the Roth–Ridout theorem, Theorem 1.39, provides the lower bound $\max(a^n, u_n b^n)^{-2-\varepsilon}$, apart from finitely many exceptions depending on $\varepsilon, a, b$. Then, by applying the Roth–Ridout theorem with any $\varepsilon < -\log l/\log a$, and observing that $a^n \le \max(a^n, u_n b^n) \le a^n + b^n$ (and that $b < a$), we obtain the desired finiteness.

In the same paper, Mahler asked for which algebraic numbers one can prove the analogous conclusion, remarking that the golden ratio is a counterexample (as are all Pisot numbers, as we saw). A full characterization of the algebraic numbers for which the conclusion of Mahler's theorem still holds is provided by the following result, which was proved in [CZ8].

Theorem 4.24 Let $\alpha > 1$ be a real algebraic number and let $0 < l < 1$. Suppose that $\|\alpha^n\| < l^n$ for infinitely many natural numbers $n$. Then there exists a positive integer $d$ such that $\alpha^d$ is a Pisot number. In particular, $\alpha$ is an algebraic integer.

From this theorem, whose proof will be sketched below (for full details, see [CZ8], Theorem 1), one can deduce the following.

Corollary 4.25 Let $\alpha > 1$ be a real algebraic number and let $0 < l < 1$. The solutions $n \in \mathbb{N}$ to the inequality $\|\alpha^n\| < l^n$ form the union of finitely many arithmetic progressions.

A generalization to the fractional parts of powers of the terms of an arbitrary linear recurrent sequence has recently been obtained by A. Kulkarni, N. Mavraki, and K. Nguyen.

The idea of the proof of Theorem 4.24 is to apply the higher-dimensional subspace theorem, instead of the Roth–Ridout theorem, exploiting the Galois conjugates of $\alpha^n$ to produce further “small” linear forms. Here is the construction, following [CZ8].
Let $K$ be the Galois closure over $\mathbb{Q}$ of the number field $\mathbb{Q}(\alpha)$. Let $d = [\mathbb{Q}(\alpha):\mathbb{Q}]$ be the degree of $\alpha$ and let $\{\sigma_1, \ldots, \sigma_d\}$ be a set of representatives for the left cosets of the subgroup $\mathrm{Gal}(K/\mathbb{Q}(\alpha))$ in $\mathrm{Gal}(K/\mathbb{Q})$. Then the restrictions of $\sigma_1, \ldots, \sigma_d$ to $\mathbb{Q}(\alpha)$ give all the embeddings of $\mathbb{Q}(\alpha)$ into $K$. Each automorphism $\rho \in \mathrm{Gal}(K/\mathbb{Q})$ defines an Archimedean absolute value $|\cdot|_\rho$ by
$$|x|_\rho := |\rho^{-1}(x)|^{\delta/[K:\mathbb{Q}]}, \qquad (4.25)$$

where $\delta = 1$ if $K$ is a real field, whereas $\delta = 2$ if $K$ is a non-real field (recall that $K$ is Galois over $\mathbb{Q}$). Now, letting $u_n$ be the nearest integer to $\alpha^n$, we have, for all $n \in \mathbb{N}$ and all automorphisms $\rho \in \mathrm{Gal}(K/\mathbb{Q})$,
$$\|\alpha^n\|^{\delta/[K:\mathbb{Q}]} = |\alpha^n - u_n|^{\delta/[K:\mathbb{Q}]} = |\rho(\alpha^n) - u_n|_\rho.$$

For each Archimedean place $\nu \in M_{K,\infty}$, let $\rho_\nu$ be an element of the Galois group inducing that place via (4.25). For $i = 1, \ldots, d$, let $S_i \subset M_{K,\infty}$ be the set of Archimedean places $\nu$ for which $\rho_\nu$ coincides with $\sigma_i$ on $\mathbb{Q}(\alpha)$. We thus obtain a partition of the set of Archimedean places. Then, by the above equality and the fact that $\delta|M_{K,\infty}| = [K:\mathbb{Q}]$, we obtain
$$\prod_{i=1}^d \prod_{\nu\in S_i} |\sigma_i(\alpha)^n - u_n|_\nu = \|\alpha^n\|. \qquad (4.26)$$

We now let $S \subset M_K$ be the minimal set of places containing the Archimedean ones and such that $\alpha$ is an S-unit. Define, for each place $\nu \in S$, $d + 1$ linearly independent linear forms in the $d + 1$ variables $x_0, \ldots, x_d$ as follows. For an Archimedean $\nu \in S_i$, put $L_{\nu,i}(x_0, \ldots, x_d) = x_0 - x_i$ and, for $\nu \in S \setminus M_{K,\infty}$ or $j \ne i$, put $L_{\nu,j} = x_j$. Put also $x = (x_0, \ldots, x_d) = (u_n, \sigma_1(\alpha^n), \ldots, \sigma_d(\alpha^n)) \in K^{d+1}$. Suppose now that for an infinite set of positive integers $n$ we have $\|\alpha^n\| < l^n$. Then Equation (4.26) easily implies that the double product satisfies
$$\prod_{\nu\in S}\prod_{j=0}^d \frac{|L_{\nu,j}(x)|_\nu}{\|x\|_\nu} < H(x)^{-d-1} l^n.$$

The subspace theorem now implies the existence of a linear relation between $u_n$ and the conjugates $\sigma_1(\alpha^n), \dots, \sigma_d(\alpha^n)$. This relation holds for infinitely many solutions $n$ of the inequality $\|\alpha^n\| < l^n$. This in turn is shown to imply one of two things:

- such powers of $\alpha$ must lie in a smaller subfield; or
- $u_n$ is a linear combination of the conjugates of $\alpha^n$.

The latter case holds only if $\alpha$ is a Pisot number. From the former case we obtain the same conclusion for some power $\alpha^h$ of $\alpha$.
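
The dichotomy behind Mahler’s theorem and Theorem 4.24 can be observed numerically. The Python sketch below is only an illustration, not part of any proof, and its function names are ours.

- It computes $\|(a/b)^n\|$ exactly, using rational arithmetic.
- It computes $\|\varphi^n\|$ for the golden ratio $\varphi = (1 + \sqrt{5})/2$, using high-precision decimals.

Since $\varphi^n + \psi^n$ is an integer (a Lucas number), where $\psi = (1 - \sqrt{5})/2 = -\varphi^{-1}$, one has $\|\varphi^n\| = \varphi^{-n}$ for $n \ge 2$. Hence $\|\varphi^n\| < l^n$ for every $l$ with $\varphi^{-1} < l < 1$ and all $n \ge 2$. For a rational non-integer $a/b > 1$, Mahler’s theorem instead allows only finitely many such $n$.

```python
import math
from decimal import Decimal, getcontext
from fractions import Fraction

getcontext().prec = 100  # working precision (significant digits) for the decimal computations


def dist_to_nearest_int(x):
    """||x||: distance from x (a Fraction or a Decimal) to the nearest integer."""
    frac = x - math.floor(x)
    return min(frac, 1 - frac)


def norm_rational_power(a, b, n):
    """Exact value of ||(a/b)^n||, computed with rational arithmetic."""
    return dist_to_nearest_int(Fraction(a, b) ** n)


PHI = (1 + Decimal(5).sqrt()) / 2  # the golden ratio, a Pisot number


def norm_golden_power(n):
    """||phi^n|| to roughly 100 significant digits (adequate here for n below ~60)."""
    return dist_to_nearest_int(PHI ** n)
```

For instance, `norm_rational_power(3, 2, 3)` returns `Fraction(3, 8)`, since $(3/2)^3 = 3 + 3/8$. For $2 \le n < 60$, `norm_golden_power(n)` agrees with `PHI ** -n` to far more than 50 digits.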

4.7 Markov Numbers

In this section, we show an application of Theorem 4.18 and Corollary 4.19 outside the theory of linear recurrences. It concerns the greatest prime factors of Markov numbers, which we now introduce. The results and open problems presented here can be related to the distribution of integral points on algebraic surfaces, especially in the context of Vojta’s conjecture, as explained in the appendix to this section. Markov triples are defined as the solutions $(x, y, z)$, with $x, y, z$ positive integers, of “Markov’s equation”

$$x^2 + y^2 + z^2 = 3xyz. \tag{4.27}$$

(The coefficient 3 is essentially the only relevant one. Any coefficient other than 1 and 3 yields no solutions. For each solution of $x^2 + y^2 + z^2 = xyz$, the integers $x, y, z$ must all be divisible by 3, so after dividing all terms by 9 one reduces to the above equation.)

We call any positive integer $x$ that appears in a Markov triple a Markov number. We call a pair $(x, y)$ a Markov pair if, for some integer $z$, the triple $(x, y, z)$ is a Markov triple (i.e. satisfies (4.27)). Markov numbers appear in several different contexts. Their arithmetic properties have been investigated in depth from several viewpoints, including the local-to-global principle, weak approximation, and strong approximation. Recent results in the last direction are the object of the work [BGS] by J. Bourgain, A. Gamburd, and P. Sarnak.

One can view Equation (4.27) as defining an affine algebraic surface. This surface admits a group of automorphisms generated by the permutations of $x, y, z$ and the substitution $(x, y, z) \mapsto (x, y, 3xy - z)$. This substitution derives from viewing (4.27) as a quadratic equation in $z$ depending on the parameters $x, y$. The orbit of the integral point $(1, 1, 1)$ is infinite, and also Zariski-dense. It includes all the points with positive integral coordinates.

A question about the arithmetic nature of Markov numbers is the following: does the greatest prime factor of a Markov number tend to infinity? If not, there would exist infinitely many Markov numbers that are $S$-units for a fixed finite set of places $S$. We do not know the answer to this question. However, one can prove (see Exercise 4.51) that in a suitable ring of integers (e.g. in the ring of Gaussian integers $\mathbb{Z}[i]$) there are indeed infinitely many ‘Markov numbers’ which are $S$-units. Note that this problem corresponds to describing the $S$-integral points on the complement of the divisor $x = 0$ on the surface defined by (4.27). We remark that Vojta’s conjecture does not exclude a Zariski-dense set of solutions. In an appendix to this section, we shall study more thoroughly the geometry of the surface defined by Equation (4.27), especially in connection with Vojta’s conjecture.

A related problem consists of considering the greatest prime factor of the product $xy$ for a Markov pair $(x, y)$. In other words, we ask whether there are infinitely many solutions of Markov’s equation where $x, y$ are both integral $S$-units (and positive integers). In that case, we can give a complete answer as follows; this constitutes Theorem 1 of [CZ14].

Theorem 4.26 The greatest prime factor of $xy$, for $(x, y, z)$ a solution of (4.27) in positive integers, tends to infinity.

On viewing Equation (4.27) again as a quadratic equation in $z$, we can solve it if and only if the discriminant $9x^2y^2 - 4(x^2 + y^2)$ is a perfect square. Hence we are led to equations of the form

$$t^2 = a x^2 y^2 + b x^2 + c y^2 \tag{4.28}$$

to be solved in rational integers $x, y, t$, where $x, y$ are moreover $S$-units (here $a, b, c$ are fixed non-zero integers). Geometrically, the problem amounts to the distribution of integral points on the complement of the divisor $xy = 0$ on the affine surface defined by (4.27), or by (4.28). Note also the (unique) singularity of (4.27) at the point $(0, 0, 0)$. There, two irreducible components of the divisor at infinity (which now includes the curve $xy = 0$) meet.

Consider a sequence $(x, y, t) \in \mathbb{Z}^3$ of integral solutions of (4.28) with $x, y$ $S$-units. Suppose the ratio $\log|x| / \log|y|$ remains bounded from above and from below. Then a direct application of Theorem 4.18, with $n = 3$, $x_1 = a x^2 y^2$, $x_2 = b x^2$, $x_3 = c y^2$, and $d = 2$, yields the degeneracy of the solutions $(x, y)$. This means that all but finitely many of them satisfy one of finitely many multiplicative dependence relations modulo constants. In geometric terms, they belong to one of the easily described special algebraic curves lying on the surface defined by (4.28) and parametrized by $\mathbb{G}_m$. In the case coming from the original Markov equation (i.e. with $a = 9$, $b = c = -4$), such curves cannot contain integral points, so we derive unconditional finiteness, i.e. Theorem 4.26.

If, on the contrary, we have an infinite sequence of solutions with, say, $\log|y| / \log|x| \to \infty$, then we argue differently. We take a prime divisor $p$ of $y$ such that $\log|y| / \log(|y|_p^{-1})$ is bounded. Such a prime can be found after extracting a suitable subsequence from the sequence of solutions. We then observe that $t^2 - bx^2 = y^2(ax^2 + c)$ is highly divisible by $p$. Applying the generalized Roth theorem gives a lower bound for the approximation $|t/x - \sqrt{b}|_p$, and hence the finiteness of the ratios $t/x$. Then we conclude as before.

Alternatively, in this second case we can divide all the terms in (4.28) by $y^2$ and then apply Theorem 4.18 again, since now the constant term $c$ is “dominant.” Note that condition (i) of Theorem 4.18 is satisfied, either for the infinite absolute value of $\mathbb{Q}$ or for a $p$-adic absolute value (after dividing all terms in (4.28) by $y^2$). This is because we are supposing that $x, y, z$ are rational integers, not just $S$-integers in some number field. So the maximum of their absolute values is automatically comparable (even equal) to the height of the vector $(x, y, z)$.

It is still unknown whether Equation (4.28) can have a Zariski-dense set of $S$-integral solutions with $x, y$ $S$-units in an arbitrary ring of $S$-integers. A negative answer is expected, and again this would follow from Vojta’s conjecture (see Exercise 4.49).
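
The orbit description above lends itself to a quick computation. The Python sketch below is an illustration only, and its function names are our own choice. It does the following:

- Starting from $(1, 1, 1)$, it applies the substitution $(x, y, z) \mapsto (x, y, 3xy - z)$ and its analogues in the other coordinates.
- It lists the Markov triples whose largest entry is at most a given bound. This relies on the fact, recalled above, that the orbit of $(1, 1, 1)$ contains all positive solutions, and on the classical observation that every such triple is reached through triples with smaller largest entry.
- It checks (4.27) and the discriminant condition that leads to (4.28).

It proves nothing about Theorem 4.26 itself.

```python
from math import isqrt


def markov_triples(bound):
    """Sorted Markov triples (x <= y <= z) with z <= bound.

    Starting from (1, 1, 1), replace one coordinate by the other root of (4.27),
    viewed as a quadratic in that coordinate: x -> 3yz - x, y -> 3xz - y, z -> 3xy - z.
    """
    root = (1, 1, 1)
    seen = {root}
    stack = [root]
    while stack:
        x, y, z = stack.pop()
        for t in ((3 * y * z - x, y, z), (x, 3 * x * z - y, z), (x, y, 3 * x * y - z)):
            t = tuple(sorted(t))
            if t[0] > 0 and t[2] <= bound and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def is_square(n):
    return n >= 0 and isqrt(n) ** 2 == n


def check_markov(bound):
    """Check (4.27) and that 9x^2y^2 - 4(x^2 + y^2) is a square for every Markov pair."""
    triples = markov_triples(bound)
    for x, y, z in triples:
        assert x * x + y * y + z * z == 3 * x * y * z
        for p, q in ((x, y), (x, z), (y, z)):
            assert is_square(9 * p * p * q * q - 4 * (p * p + q * q))
    return len(triples)
```

For example, `markov_triples(100)` yields seven triples. Their entries run through the Markov numbers 1, 2, 5, 13, 29, 34, 89.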

4.7.1 Appendix: On the Geometry of Markov’s Surface

Consider the surface $\mathcal{M} \subset \mathbb{A}^3$ defined by Equation (4.27). As remarked at the beginning of Section 4.7, this surface admits a group of automorphisms. The group is generated by the permutations of $x, y, z$ and the substitution $\sigma_z : (x, y, z) \mapsto (x, y, 3xy - z)$. This substitution is the Galois involution associated with the degree-two map $\mathcal{M} \to \mathbb{A}^2$ given by the projection $(x, y, z) \mapsto (x, y)$. Of course, by conjugating $\sigma_z$ by the above-mentioned permutations of the coordinates, one obtains two more involutions $\sigma_x, \sigma_y$. They are defined by $\sigma_x(x, y, z) = (3yz - x, y, z)$ and $\sigma_y(x, y, z) = (x, 3xz - y, z)$. We note that $\sigma_x, \sigma_y, \sigma_z$ are well defined as automorphisms of the whole space $\mathbb{A}^3$, and remain of order two also when viewed in $\mathbb{A}^3$.

The surface $\mathcal{M}$ is singular at the origin and smooth elsewhere. Consider its natural compactification $\overline{\mathcal{M}}$ in $\mathbb{P}^3$. In homogeneous coordinates $(X : Y : Z : W)$, its equation becomes

$$\overline{\mathcal{M}} : \quad W(X^2 + Y^2 + Z^2) = 3XYZ, \tag{4.29}$$

and $\overline{\mathcal{M}}$ is also smooth except at the single point $(0 : 0 : 0 : 1)$. Note that the divisor at infinity $W = 0$ consists of three lines. Projecting from the singular point to a plane (for instance the hyperplane at infinity) provides a birational map $\overline{\mathcal{M}} \dashrightarrow \mathbb{P}^2$. Its only indeterminacy point is the singular point.

We now describe its inverse map and its minimal regularization, thus providing a smooth model of $\overline{\mathcal{M}}$. It is well known that smooth cubic surfaces are isomorphic to the projective plane blown up at six points in general position. By this condition, we mean that no three of them lie on a line and they do not all lie on a (smooth) conic. In the converse direction, we shall prove that there is a configuration of six points on a smooth conic such that the corresponding blown-up plane $\hat{P}$ is endowed with a birational morphism $\hat{P} \to \overline{\mathcal{M}}$. The mentioned conic (or, more precisely, its strict transform, which is a $(-2)$-curve on $\hat{P}$) is contracted to the singular point $(0 : 0 : 0 : 1)$ of $\overline{\mathcal{M}}$.

Let us describe the construction of $\hat{P}$ and the projection to $\overline{\mathcal{M}}$. Take three independent linear forms in three variables $u_1, u_2, u_3$, say $L_i(u_1, u_2, u_3) = u_i$ for $i = 1, 2, 3$. Consider the (non-degenerate) quadratic form $Q := L_1^2 + L_2^2 + L_3^2$. For each $i = 1, 2, 3$, the line of equation $L_i = 0$ intersects the smooth conic of equation $Q = 0$ at two distinct points $P_{i,1}, P_{i,2}$. We thus obtain six points $P_{i,j}$, $(i, j) \in \{1, 2, 3\} \times \{1, 2\}$, lying on the smooth conic $Q = 0$. The four cubic forms $Q L_1$, $Q L_2$, $Q L_3$, $3 L_1 L_2 L_3$ are linearly independent. They generate the vector space of cubic forms vanishing at the six mentioned points. This four-tuple of forms defines a rational map $\mathbb{P}^2 \dashrightarrow \mathbb{P}^3$, namely

$$u = (u_1 : u_2 : u_3) \mapsto (Q L_1(u) : Q L_2(u) : Q L_3(u) : 3 L_1 L_2 L_3(u)) = (X : Y : Z : W),$$

whose indeterminacy locus is the set of the six points $\{P_{i,j} : i = 1, 2, 3;\ j = 1, 2\}$. The conic $Q = 0$ is contracted to the single point $(0 : 0 : 0 : 1)$. By construction, the image of $\mathbb{P}^2$ satisfies the Markov equation (4.29). Indeed, $X^2 + Y^2 + Z^2 = Q^2 (L_1^2 + L_2^2 + L_3^2) = Q^3$, so both sides of (4.29) equal $3 Q^3 L_1 L_2 L_3$. After blowing up the six points $P_{i,j}$, one obtains a smooth surface $\hat{P}$. On it, the rational map defined above extends to a morphism $\hat{P} \to \overline{\mathcal{M}}$.
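
The identity behind “by construction” can also be tested mechanically. In the sketch below, which is our own check with hypothetical helper names, `markov_param` evaluates the four cubic forms $QL_1, QL_2, QL_3, 3L_1L_2L_3$ at a point, and `on_cubic` tests Equation (4.29). Complex inputs are used only to exhibit points of the conic $Q = 0$. The small Gaussian values involved are represented exactly by Python’s floating-point complex numbers.

```python
import random


def markov_param(u1, u2, u3):
    """Image of (u1 : u2 : u3) under u -> (Q L1 : Q L2 : Q L3 : 3 L1 L2 L3),
    with L_i = u_i and Q = u1^2 + u2^2 + u3^2 (homogeneous coordinates, unnormalized)."""
    q = u1 * u1 + u2 * u2 + u3 * u3
    return (q * u1, q * u2, q * u3, 3 * u1 * u2 * u3)


def on_cubic(X, Y, Z, W):
    """Equation (4.29): W (X^2 + Y^2 + Z^2) = 3 X Y Z."""
    return W * (X * X + Y * Y + Z * Z) == 3 * X * Y * Z


def check_random(samples=1000, seed=0):
    """Check (4.29) on the images of random integer points of the plane."""
    rng = random.Random(seed)
    for _ in range(samples):
        u = [rng.randint(-50, 50) for _ in range(3)]
        if not on_cubic(*markov_param(*u)):
            return False
    return True


# (3 : 4 : 5i) lies on Q = 0 but on no line L_i = 0; its image has X = Y = Z = 0,
# i.e. it is the singular point (0 : 0 : 0 : 1).
print(markov_param(3, 4, 5j))
# (0 : 1 : i) is one of the base points P_{1,j}: all four forms vanish there.
print(markov_param(0, 1, 1j))
```
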
The divisor at infinity $\overline{\mathcal{M}} \setminus \mathcal{M} = \{W = 0\} \cap \overline{\mathcal{M}}$ consists, as we said, of three lines. These lines correspond to the strict transforms in $\hat{P}$ of the lines of equation $L_i = 0$. To see that only the strict transforms are sent to infinity, note that the points of the conic $Q = 0$ are sent to the singular point of $\overline{\mathcal{M}}$, which does not lie at infinity. The six blown-up points correspond to six lines contained in $\overline{\mathcal{M}}$ and passing through the singular point. These are the lines with affine equations $x = 0 = y \pm iz$, $y = 0 = z \pm ix$, $z = 0 = x \pm iy$.

We shall now describe the geometry of $\hat{P}$, in particular its Picard group. This lets us read Vojta’s conditions on the singular surface $\overline{\mathcal{M}}$ in terms of the smooth surface $\hat{P}$. For $i = 1, 2, 3$, let $H_i$ denote the strict transform of the line $L_i = 0$. For $(i, j) \in \{1, 2, 3\} \times \{1, 2\}$, let $E_{i,j}$ denote the exceptional divisor over $P_{i,j}$ produced by the blow-up. The Picard group of $\hat{P}$ is free of rank seven, generated, e.g., by $H_1, H_2, H_3$ and the $E_{i,j}$. These are subject to the three relations $H_i + E_{i,1} + E_{i,2} \sim H_j + E_{j,1} + E_{j,2}$ for all $i, j$ (note that just two of them are independent).

The canonical divisor of $\hat{P}$ is in the class of the divisor $K := -(H_1 + H_2 + H_3)$. Hence, on adding to $K$ the divisor at infinity, we obtain the zero divisor. In more sophisticated terms, $\mathcal{M}$, or better its desingularization, has a trivial log-canonical bundle. The surface $\mathcal{M}$ (or its desingularization) can then be viewed as a logarithmic analogue of K3 surfaces. In this respect, let us mention that Silverman [Sil3], and later Baragar [Bar], studied the arithmetic of certain families of (compact) K3 surfaces. These admit discrete groups of automorphisms produced from involutions coming from degree-two covers of the plane. It is then natural to expect that removing one hyperplane section, whose sum with the divisor at infinity has normal crossing singularities, leads to degeneracy of integral points.

Let us see what happens on removing the plane $x = 0$, which corresponds to imposing that $x$ be an $S$-unit. In that case the plane passes through the singular point $(0, 0, 0)$ (or $(0 : 0 : 0 : 1)$). The corresponding divisor on $\hat{P}$ is the strict transform $C$ of the conic $Q = 0$, together with the exceptional divisors $E_{1,1} + E_{1,2}$. Finally, the sum of the (chosen) canonical divisor with the new divisor at infinity is

$$K + H_1 + H_2 + H_3 + C + E_{1,1} + E_{1,2} = C + E_{1,1} + E_{1,2}.$$

Since $C^2 = -2$ and $E_{1,j}^2 = -1$, while $C \cdot E_{1,j} = 1$ and $E_{1,1} \cdot E_{1,2} = 0$, we obtain $(C + E_{1,1} + E_{1,2})^2 = 0$. Hence the log-canonical divisor of the new surface fails to be big. Then Vojta’s conjecture does not apply. In fact, we can prove that, e.g., over the ring $\mathbb{Z}[i, 1/2]$ the integral points are indeed Zariski-dense.
In contrast, suppose we ask that two coordinates be units, say $x$ and $y$. Then we are also removing the exceptional divisors $E_{2,1} + E_{2,2}$. So the log-canonical divisor will be in the class of $C + E_{1,1} + E_{1,2} + E_{2,1} + E_{2,2}$, which has self-intersection 2. Being effective with positive self-intersection, it is big. Then Vojta’s conjecture predicts the degeneracy of integral points over an arbitrary ring of $S$-integers. Our result (Theorem 4.26) confirms this conjecture, although only for the usual ring of rational integers (or for the integers in an imaginary quadratic field).

Finally, we note that removing the conic $Q = 0$ from $\hat{P}$ (and asking for integrality with respect to that divisor) corresponds, in $\overline{\mathcal{M}}$, to removing the singular point $(0, 0, 0)$. Arithmetically, this amounts to imposing that $x, y, z$ be coprime. If we work over the usual ring $\mathbb{Z}$ of rational integers, it is easy to see that all integral solutions $(x, y, z) \in \mathbb{Z}^3$ of (4.27) automatically satisfy this further integrality condition, apart from the origin itself.
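
The intersection numbers used in this discussion can be double-checked with a few lines of code. The sketch below assumes the standard description of the Picard group of the plane blown up at six points:

- the basis is $\ell$ (the pull-back of a line) together with the $E_{i,j}$;
- $\ell^2 = 1$, $\ell \cdot E_{i,j} = 0$ and $E_{i,j}^2 = -1$;
- distinct exceptional curves are disjoint.

In this basis, $C \sim 2\ell - \sum E_{i,j}$, $H_i \sim \ell - E_{i,1} - E_{i,2}$, and $K \sim -3\ell + \sum E_{i,j}$. The helper names are ours.

```python
# Divisor classes on P-hat in the basis (l, E11, E12, E21, E22, E31, E32), where l is
# the pull-back of a line and E_ij is the exceptional curve over the point P_ij.
EXC = ["E11", "E12", "E21", "E22", "E31", "E32"]


def div_class(l=0, **coeffs):
    """The class l*[line] + sum of coeffs[name]*E_name, as a 7-tuple."""
    return (l,) + tuple(coeffs.get(name, 0) for name in EXC)


def add(*classes):
    return tuple(map(sum, zip(*classes)))


def dot(d1, d2):
    """Intersection pairing: l.l = 1, l.E_ij = 0, E_ij.E_kl = -1 if equal, else 0."""
    return d1[0] * d2[0] - sum(a * b for a, b in zip(d1[1:], d2[1:]))


E = {name: div_class(**{name: 1}) for name in EXC}
C = div_class(2, **{name: -1 for name in EXC})                             # conic Q = 0
H = [div_class(1, **{f"E{i}1": -1, f"E{i}2": -1}) for i in (1, 2, 3)]      # lines L_i = 0
K = div_class(-3, **{name: 1 for name in EXC})                             # canonical class

D_x = add(C, E["E11"], E["E12"])       # new boundary when x is required to be an S-unit
D_xy = add(D_x, E["E21"], E["E22"])    # ... when both x and y are S-units

assert add(K, *H) == div_class(0)      # K + H1 + H2 + H3 ~ 0
assert dot(C, C) == -2 and dot(C, E["E11"]) == 1 and dot(E["E11"], E["E12"]) == 0
assert dot(D_x, D_x) == 0              # (C + E11 + E12)^2 = 0
assert dot(D_xy, D_xy) == 2            # (C + E11 + E12 + E21 + E22)^2 = 2
```
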

Automorphisms. The three automorphisms $\sigma_x, \sigma_y, \sigma_z$ of $\mathcal{M}$ defined above induce birational automorphisms of $\hat{P}$. First of all, note that they extend to birational automorphisms of the projective space $\mathbb{P}^3$ by putting

$$\bar{\sigma}_x(X : Y : Z : W) = (3YZ - XW : YW : ZW : W^2),$$

and similarly for $\sigma_y, \sigma_z$. They are undefined at the three lines at infinity in $\mathbb{P}^3$. We now project from $(0 : 0 : 0 : 1)$ onto the plane at infinity, on which we use the $u$-coordinates. Using the equation of $\overline{\mathcal{M}}$, which gives $3YZ - XW = W(Y^2 + Z^2)/X$, we obtain for $\sigma_x$ the formula

$$\hat{\sigma}_x(u_1 : u_2 : u_3) = (u_2^2 + u_3^2 : u_1 u_2 : u_1 u_3),$$

and analogously for $\sigma_y, \sigma_z$. This map is easily seen to be an involution. The indeterminacy locus of $\hat{\sigma}_x, \hat{\sigma}_y, \hat{\sigma}_z$ on the plane consists of the six points $P_{i,j}$ and the three points $(0 : 0 : 1)$, $(0 : 1 : 0)$, $(1 : 0 : 0)$. The first six are blown up in $\hat{P}$, so the maps are regularized there. At the other three the indeterminacy remains. Note, however, that they belong to the divisor at infinity, since they are precisely the points $H_1 \cap H_2$, $H_2 \cap H_3$, $H_3 \cap H_1$.

Finally, the automorphisms of the affine surface $\mathcal{M}$ can be defined on the complete surface $\hat{\hat{P}}$. This surface is obtained by blowing up nine points of $\mathbb{P}^2$, or three more points of $\hat{P}$. It is worth remarking that the self-intersection of the canonical divisor on this surface vanishes. This shows once again the link with the surfaces studied by Silverman and Baragar. We conclude by recalling that in the seventies M. H. El’-Huti studied the automorphisms of a larger class of cubic surfaces, including the Markov surface. One of his results asserts that the full group of automorphisms of $\mathcal{M}$ is generated by $\sigma_x, \sigma_y, \sigma_z$.
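
The formula for $\hat{\sigma}_x$ and its compatibility with $\sigma_x$ can be verified with exact rational arithmetic. The sketch below uses helper names of our own. It maps a point $u$ of the plane, with $u_1u_2u_3 \ne 0$ and $Q(u) \ne 0$, to the affine point $(X/W, Y/W, Z/W)$ of $\mathcal{M}$ given by the parametrization above. It then checks three things:

- the point satisfies (4.27);
- $\sigma_x$ on $\mathcal{M}$ corresponds to $\hat{\sigma}_x$ on the plane;
- $\hat{\sigma}_x \circ \hat{\sigma}_x$ is the identity as a map of $\mathbb{P}^2$. Indeed, $\hat{\sigma}_x(\hat{\sigma}_x(u)) = u_1(u_2^2 + u_3^2) \cdot u$.

```python
import random
from fractions import Fraction


def to_affine(u1, u2, u3):
    """Affine point (X/W, Y/W, Z/W) of the Markov surface attached to (u1 : u2 : u3)
    by the parametrization in the text; assumes u1*u2*u3 != 0 and Q(u) != 0."""
    q = u1 * u1 + u2 * u2 + u3 * u3
    w = 3 * u1 * u2 * u3
    return (Fraction(q * u1, w), Fraction(q * u2, w), Fraction(q * u3, w))


def sigma_x(x, y, z):
    """Affine involution (x, y, z) -> (3yz - x, y, z)."""
    return (3 * y * z - x, y, z)


def sigma_x_hat(u1, u2, u3):
    """Induced plane map (u1 : u2 : u3) -> (u2^2 + u3^2 : u1 u2 : u1 u3)."""
    return (u2 * u2 + u3 * u3, u1 * u2, u1 * u3)


def same_point(a, b):
    """Equality of the projective points a, b of P^2 (both nonzero): all 2x2 minors vanish."""
    return all(a[i] * b[j] == a[j] * b[i] for i in range(3) for j in range(3))


def check_sigma_x(samples=200, seed=1):
    rng = random.Random(seed)
    for _ in range(samples):
        u = tuple(rng.randint(1, 30) for _ in range(3))  # positive entries, so Q(u) != 0
        x, y, z = to_affine(*u)
        assert x * x + y * y + z * z == 3 * x * y * z            # the point lies on (4.27)
        assert sigma_x(x, y, z) == to_affine(*sigma_x_hat(*u))   # sigma_x matches sigma_x_hat
        assert same_point(sigma_x_hat(*sigma_x_hat(*u)), u)      # sigma_x_hat is an involution
    return True
```
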

4.8 Exercises

Exercise 4.27 In the notation of Section 4.1, prove that $R_k[G] \cong R_k[[q]G]^q$, where the isomorphism is induced by $\{f(n)\} \mapsto (\{f(qn + r)\})_{r=0}^{q-1}$. (Hint: only the surjectivity might not be evident; for this, consider, for example, a generating function.)

Exercise 4.28 Let $g \in \mathrm{GL}_d(\mathbb{C})$ be an invertible matrix. Let $X \subset \mathrm{GL}_d$ be an algebraic variety, defined by polynomial equations $f_1(x) = \dots = f_k(x) = 0$. Here $x = (x_{i,j})_{1 \le i,j \le d}$ can be viewed as a point in the affine space $\mathbb{A}^{d^2}$. Suppose that $g^n \in X$ for infinitely many $n \in \mathbb{N}$. By applying the Skolem–Mahler–Lech theorem $k$ times, prove that there is an infinite arithmetic progression $n \mapsto qn + r$ such that $f_l(g^n) = 0$ for each index $l = 1, \dots, k$ and each $n$ in this progression. Hence $g^n \in X$ for all $n$ in an arithmetic progression.
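
Here is a small numerical illustration of Exercise 4.27 (not a solution of it). Take for $f$ the Fibonacci sequence $F(n)$, whose roots $\varphi, \psi$ satisfy $\varphi + \psi = 1$ and $\varphi\psi = -1$. The subsequence $n \mapsto F(qn + r)$ is again a linear recurrence, with roots $\varphi^q, \psi^q$. Since $\varphi^q + \psi^q = L_q$ (the $q$-th Lucas number) and $\varphi^q\psi^q = (-1)^q$, it satisfies $g(n+2) = L_q\, g(n+1) - (-1)^q g(n)$. The sketch below checks this for small $q$ and all residues $r$. The function names are ours.

```python
def fibonacci(n):
    """F(0) = 0, F(1) = 1, F(k+1) = F(k) + F(k-1)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def lucas(n):
    """L(0) = 2, L(1) = 1, L(k+1) = L(k) + L(k-1); equals phi^n + psi^n."""
    a, b = 2, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def subsequence_satisfies(q, r, terms=40):
    """Check that g(n) = F(qn + r) obeys g(n+2) = L_q g(n+1) - (-1)^q g(n),
    the recurrence whose characteristic roots are phi^q and psi^q."""
    g = [fibonacci(q * n + r) for n in range(terms)]
    s, p = lucas(q), (-1) ** q  # phi^q + psi^q and phi^q * psi^q
    return all(g[n + 2] == s * g[n + 1] - p * g[n] for n in range(terms - 2))
```

For $q = 2$ this recovers the familiar recurrence $g(n+2) = 3g(n+1) - g(n)$ for both $F(2n)$ and $F(2n+1)$.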

Exercise 4.29 Let $g \in \mathrm{GL}_d(\mathbb{C})$ be an invertible matrix. Let $\Gamma \subset \mathrm{GL}_d(\mathbb{C})$ be the cyclic semigroup generated by $g$. Prove that the Zariski closure $\bar{\Gamma}$ of $\Gamma$ is a commutative algebraic subgroup of $\mathrm{GL}_d$. Prove that it is connected if and only if the multiplicative group generated by the eigenvalues of $g$ is torsion-free.

Exercise 4.30 Let $g$ be an invertible $d \times d$ matrix. Fix two integers $q > 0$ and $r$ with $0 \le r < q$. Consider the Zariski closure $Y$ in $\mathrm{GL}_d$ of the set $\{g^{qn+r} : n \in \mathbb{N}\}$. Let $\bar{\Gamma}$ be the Zariski closure of the cyclic semigroup $\Gamma$ generated by $g$. Prove that $Y$ is a coset in $\bar{\Gamma}$ of the connected component $\bar{\Gamma}^0$ of $\bar{\Gamma}$ containing the identity.

Exercise 4.31 Using the three previous exercises, prove the equivalence between Theorem 4.3 and Theorem 4.4.

The next series of exercises is aimed at providing the generalization of Theorem 4.4 to arbitrary algebraic groups.

Exercise 4.32 Let $G$ be an algebraic group defined over a field of characteristic zero. Let $g \in G$ be one of its elements and let $X \subset G$ be an algebraic subvariety of $G$. Prove that there exist infinitely many primes $p$ for which there is an embedding of a field of definition for $G$, $g$, and $X$ into the $p$-adic field $\mathbb{Q}_p$. From now on, we suppose that every object is defined over $\mathbb{Q}_p$.

Exercise 4.33 In the notation of the above exercise, suppose that $g^n \in X$ for infinitely many $n$. Prove that, for every $p$-adic neighborhood $U \subset G$ of the origin in $G$, there exist an element $h \in G$ and an arithmetic progression $n \mapsto qn + r$ such that $g^n \in h \cdot U$ for every $n$ in this progression. (Here $h \cdot U$ denotes the translate of $U$ by the element $h$.)

Exercise 4.34 In the notation of the two previous exercises, suppose that $g^n \in X$ for infinitely many exponents $n \in \mathbb{N}$. On replacing $G$ by the Zariski closure of the group generated by $g$, we reduce to a commutative algebraic group $G$. Choose a neighborhood $U$ of the origin in $G$ and a ball $A \subset \mathbb{Q}_p^{\dim G}$ containing the origin in the vector space $\mathbb{Q}_p^{\dim G}$. Choose them so that the $p$-adic exponential and logarithmic maps are bijective on them. Using the previous exercise, prove that there is a $p$-adic analytic subvariety $Y \subset A$ in $\mathbb{Q}_p^{\dim G}$ with the following property: it contains the logarithms of elements of the form $h^{-1} g^n$, for a fixed $h$ and infinitely many exponents $n \in \mathbb{N}$. Deduce that $Y$ contains a translate of a $\mathbb{Z}_p$-submodule of $A$.

Exercise 4.35 Let $f : \mathbb{N} \to \mathbb{R}$ be a linear recurrent sequence of order $r \ge 1$ defined over the reals.⁶ Prove that $f(n)$ vanishes for at most $r - 1$ values of $n \in \mathbb{N}$. One can generalize this result to exponential polynomials, i.e. functions $f : \mathbb{R} \to \mathbb{R}$ of a real variable expressed by a formula of the form (4.1).

⁶ Remember that this means that not only its values but also its roots are real.

Exercise 4.36 Let $a, b$ be integers $> 1$ such that $b^n - 1$ is divisible by $a^n - 1$ for all $n \in \mathbb{N}$. Prove, in a way that is independent of the theorems in this chapter, that $b$ must then be a power of $a$. (This corollary of van der Poorten’s theorem admits an elementary proof. Hint: set $z(n) = (b^n - 1)/(a^n - 1) \in \mathbb{Z}$. On truncating the expansion $\sum_{k=1}^{\infty} a^{-kn}$ of $1/(a^n - 1)$, one finds a recurrence $d(n)$ over $\mathbb{Q}$ such that $r(n) := z(n) - d(n) \to 0$ as $n \to \infty$. Suppose $d(n)$ satisfies the relation $c_0 d(n) + \dots + c_k d(n + k) = 0$, with $c_i \in \mathbb{Z}$. Then $c_0 z(n) + \dots + c_k z(n + k) \to 0$; but the $z(n)$ are integers, whence ... Another argument comes from algebraic number theory. There we seek prime numbers $p$ and integers $n = (p - 1)/h$ such that $a^n \equiv 1 \pmod{p}$ and $b^n \not\equiv 1 \pmod{p}$; in this approach, Chebotarev’s theorem will be helpful.)

Exercise 4.37 Let $f, g \in \mathbb{Z}[X]$ be coprime polynomials. Prove that $\gcd(f(n), g(n))$ is bounded independently of $n \in \mathbb{Z}$. On the other hand, given any $a, b \in \mathbb{N}$, $\gcd(a^n - 1, b^n - 1)$ is not bounded as $n$ varies. (However, if $a^n - 1, b^n - 1$ are “coprime” recurrences, Theorem 4.6 implies a sub-exponential estimate.) How large can this gcd be for special values of $n$? (See [BuCZ], Remarks.)

Exercise 4.38 (A. Pethö) Prove that the recurrence defined by $f(0) = f(1) = 0$, $f(2) = 1$, $f(n + 3) = f(n + 2) + f(n + 1) + f(n)$ admits a dominant root. Prove also that, in contrast, the recurrence $g(n) = f(-n)$, $n \in \mathbb{N}$, is nondegenerate and does not admit a dominant root, whichever place is chosen.

Exercise 4.39 For the recurrence $f$ of Exercise 4.38 above, prove that $f(n)$ cannot be a perfect square for infinitely many $n \in \mathbb{N}$. (On the other hand, the same conclusion is not known to hold for $g(n) := f(-n)$, though it appears to be extremely likely.)

Exercise 4.40 Let $a_1, \dots, a_s$ be positive integers such that $a_2/a_1, \dots, a_s/a_1$ are multiplicatively independent ($s \ge 2$), and let $c_1, \dots, c_s$ be non-zero integers. Prove that, for a fixed integer $d \ge 2$, the equation $\sum_{i=1}^{s} c_i a_i^n = z^d$ has only a finite number of solutions $(n, z) \in \mathbb{N} \times \mathbb{Z}$. (Hint: apply Corollary 4.19 and Proposition 4.2. See [CZ1], Corollary 3, for a more general conclusion. Using [ShSt], Theorem 3, one can even assume that $d \ge 2$ is variable!)

Exercise 4.41 Classify the algebraic translates $uH \subset \mathbb{G}_m^3$ such that $X_1 + X_2 + X_3$, as a function on $uH$, is the product of a $d$th power in $k[uH]$ and a monomial.
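
The two behaviours contrasted in Exercise 4.37 are easy to observe numerically. The sketch below is an illustration, not a proof, and its helper names are ours.

- For coprime $f, g \in \mathbb{Z}[X]$, the values $\gcd(f(n), g(n))$ all divide a fixed nonzero integer. For $f = X^2 + 1$ and $g = X + 1$, this is the resultant $2$.
- For every prime $p \nmid ab$, Fermat’s little theorem gives $p \mid \gcd(a^{p-1} - 1, b^{p-1} - 1)$, so these gcds are unbounded.

```python
from math import gcd


def exp_gcd(a, b, n):
    """gcd(a^n - 1, b^n - 1)."""
    return gcd(a ** n - 1, b ** n - 1)


def value_gcds(f, g, ns):
    """The set of values gcd(f(n), g(n)) for n in ns (f, g integer-valued callables)."""
    return {gcd(f(n), g(n)) for n in ns}
```

For instance, `exp_gcd(2, 3, 12)` equals $455 = 5 \cdot 7 \cdot 13$. By contrast, `value_gcds(lambda n: n * n + 1, lambda n: n + 1, range(-100, 101))` is the set `{1, 2}`.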

Exercise 4.42 (i) Classify the infinite families of perfect squares whose decimal expansion contains at most three non-zero digits (see [CZ2]). Prove also that, if $d > 2$, all such perfect $d$th powers have the shape $10^a b$, where $b$ lies in a finite set (depending on $d$). Analogues with four or more digits are not known, except for the case of four binary digits (Theorem 4.22). (ii) Prove that the equation $y^2 = 1 + 6^m + 10^n$ has only a finite number of integer solutions $(y, m, n)$. (Hint: in both (i) and (ii), use Theorem 4.18 with suitable valuations $\nu$. See [CZ2] for generalizations.)

Exercise 4.43 For integers $a > b > c \ge 1$, put $t = ab + 1$, $u = ac + 1$, $v = bc + 1$. Let $S$ be a finite set of places of $\mathbb{Q}$ and let $\Theta$ be the set of triples $(a, b, c)$ such that $t, u, v \in \mathcal{O}_S^*$. Observe that $(t - 1)(u - 1)(v - 1) = (abc)^2$ is a perfect square. Expand the left-hand side and apply Corollary 4.19 to deduce the following: for fixed $\varepsilon > 0$, there exist only finitely many triples in $\Theta$ such that $b > a^{\varepsilon}$.⁷

⁷ This approach yields an alternative proof of the conjecture by Györy, Sarkozy, and Stewart mentioned after Theorem 4.7. In fact, Stewart and Tijdeman [StT] had already proved the conclusion for triples such that $\log c / \log a \to 0$. See [CZ6], Remark 2.

Exercise 4.44 Find the integer solutions of $y^2 = 5^n + 2^n + 7$. (Theorem 1.39 already suffices for finiteness, as in Exercise 1.51; however, a congruence mod 8 will now prove to be more effective.)

Exercise 4.45 For a given $\varepsilon > 0$, let $\Sigma_\varepsilon$ be the set of pairs $(u, v)$ with the property of Theorem 4.7. Describe the positive-dimensional components of the Zariski closure of $\Sigma_\varepsilon$.

Exercise 4.46 Let $a, u, v$ be multiplicatively independent non-zero integers. For $m < n \in \mathbb{N}$, define $d(m, n)$ as the greatest common factor of $a^m - u$ and $a^n - v$ coprime with $auv$. Prove that $\log|d(m, n)| \ll_{a,u,v} \sqrt{n}$. (This setting allows a much more effective and elementary proof than that of Theorem 4.7. Hint: using Theorem 1.7, or its proof, find integers $r, s$, not both zero, such that $|rm + sn| \le 2\sqrt{n}$ and $|r|, |s| \le 2\sqrt{n}$. Then consider $a^{rm+sn} - u^r v^s$.)

Exercise∗ 4.47 Obtain the following analogue of the result of Pourchet.

(i) If $a, b \in \mathbb{N}$ are multiplicatively independent, the length of Euclid’s algorithm for $(b^n - 1) : (a^n - 1)$ tends to infinity as $n \to \infty$.

Hint for (i): set $z(n) := (b^n - 1)/(a^n - 1)$.

- First prove the following. If $\varepsilon > 0$ is “small” and $q \le \exp(\varepsilon n)$, then $|z(n) - p/q| \gg \exp(-c\varepsilon n)$ for some $c = c(a, b)$ independent of $n$. For this, use (4.3), with $u = b^n$, $v = a^n$, $j = 1$, similarly to Theorem 4.7, but more simply.
- Then use the estimate $|z(n) - p_r/q_r| \le (q_{r+1} q_r)^{-1}$ for the convergents to $z(n)$. The previous claim gives inductively $q_r = q_r(n) \ll \exp(\varepsilon_r n)$, where the $\varepsilon_r$ are, for fixed $r$, as small as wanted.
- If the continued fraction has fixed length $R$, then $0 = |z(n) - p_R/q_R| \gg \exp(-c\varepsilon_R n)$, which amounts to a contradiction.

See also [Scr], [BL], [CZ11], and [CZ8].

(ii) What happens if $a, b$ are multiplicatively dependent?

Exercise 4.48 (i) Let $D$ be the divisor of degree 4 in $\mathbb{P}^2$ defined as the sum of the lines $X_0 = 0$, $X_1 = 0$ and of the conic $X_2^2 = X_0^2 + X_0 X_1$. Show that, for a suitable presentation of $V := \mathbb{P}^2 \setminus D$, the $S$-integral points on $V$ correspond to the $S$-units $u, v$ such that $1 + u + v$ is a perfect square.

(ii) Consider the Diophantine equation $y^2 = 1 + 2^m + 3^n$. Show (use e.g. Theorem 2.7) that an infinite set of integral solutions would yield a Zariski-dense set of $S$-integral points on the variety $V$ in (i), for an $S$ containing 2 and 3. This would contradict the conjecture by Lang and Vojta discussed in Section 3.11 (see also [BoG], or [HiSi], p. 486, or [Co2], Section 1.2). Compare this also with Exercise 3.68, where $D$ is a sum of lines. Using Theorem 4.18, show that any possible infinite sequence of solutions would have the ratio $m/n$ converging to $\log 3 / \log 2$. (Recent work of D. Leitner [Lei] completely solved the above equation, proving in particular the finiteness of the solutions. However, the problem remains open over number fields, i.e. when $y$ is allowed to lie in any fixed number field.)

Exercise 4.49 Let $a, b, c$ be three non-zero elements in a number field. Consider the equation $t^2 = auv + bu + cv$, to be solved in $S$-units $u, v$ and $S$-integers $t$. Its homogeneous form becomes $t^2 = auv + buw + cvw$ and defines a smooth quadric $X \subset \mathbb{P}^3$. The solutions of our equation correspond to integral points on the complement in $X$ of the divisor $uvw = 0$. Note that Theorem 4.18 does provide degeneracy of solutions when $u, v$ are also supposed to be rational integers. More generally, it does so under mild conditions on the existence of a dominant absolute value.
Exercise 4.50 Show that Theorem 4.26 would also follow from the (still open) case of Vojta’s conjecture for the complement of a conic and two lines in the plane.

Exercise 4.51 Consider the Markov equation $x^2 + y^2 + z^2 = 3xyz$. The aim of this exercise is to prove that the equation admits a Zariski-dense set of solutions in a certain ring of $S$-integers, where one coordinate is an $S$-unit. Consider, for any value of $n \in \mathbb{N}$, the solution $(x, y, z) = (2^n, i2^n, 0) \in \mathbb{Z}[i] \times \mathbb{Z}[i] \times \mathbb{Z}[i]$. By iterating the automorphism $(x, y, z) \mapsto (x, 9x^2y - y - 3xz, 3xy - z)$, find infinitely many solutions of the form $(2^n, y, z)$ with $y, z \in \mathbb{Z}[i]$. By varying $n$, deduce that the solutions so produced form a Zariski-dense set on the Markov surface.
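
The construction of Exercise 4.51 can be tried out for small $n$. The sketch below uses a minimal exact Gaussian-integer class of our own (`GaussInt`, not a library type). Python’s built-in `complex` is floating-point and would lose precision as the coordinates grow. The sketch iterates the given automorphism, which is $\sigma_z$ followed by $\sigma_y$, starting from $(2^n, i2^n, 0)$. It checks that each point lies on the Markov surface and keeps first coordinate $2^n$. It does not, of course, prove Zariski-density.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GaussInt:
    """Exact Gaussian integer re + im*i (a minimal helper written for this check)."""
    re: int
    im: int = 0

    @staticmethod
    def lift(v):
        return v if isinstance(v, GaussInt) else GaussInt(v)

    def __add__(self, other):
        o = GaussInt.lift(other)
        return GaussInt(self.re + o.re, self.im + o.im)

    __radd__ = __add__

    def __neg__(self):
        return GaussInt(-self.re, -self.im)

    def __sub__(self, other):
        return self + (-GaussInt.lift(other))

    def __rsub__(self, other):
        return GaussInt.lift(other) - self

    def __mul__(self, other):
        o = GaussInt.lift(other)
        return GaussInt(self.re * o.re - self.im * o.im, self.re * o.im + self.im * o.re)

    __rmul__ = __mul__


def on_markov(x, y, z):
    return x * x + y * y + z * z == 3 * x * y * z


def step(x, y, z):
    """(x, y, z) -> (x, 9x^2 y - y - 3xz, 3xy - z): sigma_z followed by sigma_y."""
    return x, 9 * x * x * y - y - 3 * x * z, 3 * x * y - z


def orbit(n, steps):
    """Iterate `step` starting from the solution (2^n, i 2^n, 0)."""
    point = (GaussInt(2 ** n), GaussInt(0, 2 ** n), GaussInt(0))
    points = [point]
    for _ in range(steps):
        point = step(*point)
        points.append(point)
    return points
```

For $n = 0$, the first three $y$-coordinates produced are $i$, $8i$, $55i$.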

4.9 Notes

A fairly complete theory of the structure of the ring of exponential polynomials (viewed as functions on $\mathbb{C}$) is due to J. F. Ritt; see [vdP1], [S4]. No effective proof of Theorem 4.3 is known at present,⁸ whichever approach is chosen, i.e. either through Schmidt’s theorems or through $p$-adic analysis. However, the latter method sometimes leads to the enumeration of all zeros. It uses certain estimates for the number of zeros of the relevant $p$-adic analytic functions, obtained, e.g., through Newton’s polygon, as in [DGS]. See [C2], pp. 52–53, for an example related to the Diophantine equation $2^n = x^2 + 7$, and also [Mor], Chapter 23, for several other instances. The problem of an optimal estimate for the number of zeros remains, again, not yet completely solved, though some significant advances have been achieved; see for instance [Be2], [S5]. In the case of transcendental (or real) roots, one may often use elementary methods to obtain results nearer to the conjecturally best-possible ones (see [S4], Sections 9 and 10, and [Z8]).

⁸ Cerlienco, Mignotte, and Piras have even suggested that an algorithm in this direction might not exist.

Concerning the ratios $f(n)/g(n)$ of linear recurrences, as mentioned before, a substantial generalization follows from very recent work of Levin [Lev5]. Concerning again these ratios, their continued-fraction expansion has been studied in a number of papers, e.g. [CZ11], which contains the result of Exercise 4.47. Similar investigations concern the length of the period of the continued-fraction expansion of quadratic irrationals like $\sqrt{f(n)}$, when $f(n)$ is a recurrence (see [BL] and [Scr]). A further result in this direction, answering a question by Mendès-France, appears in the paper by the authors [CZ11]. It is proved there that the length of the period of the continued fraction of $\alpha^n$ tends to infinity with $n$, provided $\alpha$ lies in a real quadratic field, is not a unit, and $\alpha^2$ is not rational.

A particular case of Theorem 4.18 appeared already in [CZ1]. The method sometimes works for general algebraic equations of the type $F(Y, X_1, \dots, X_n) = 0$, to be solved in $S$-units $x_1, \dots, x_n$ and $S$-integers $y$. When there is a dominant term, one can often use a Puiseux expansion for $Y$, replacing the binomial expansion used in the above proof (see [CZ4a]). For $n = 1$ this applies to equations $f(ab^m, y) = 0$ (or $f(u, y) = 0$ in $S$-units $u$). This leads back to results by Dèbes [De] on Hilbert’s irreducibility theorem. (This is related to Exercise 3.73, but there is no need for Siegel’s theorem with the present approach.) Sometimes Theorem 4.18 may be applied separately (and simultaneously) to each of several absolute values. See Exercise 4.42 above and [CZ2] for the equation $f(a^m, y) = b^n$, with suitable assumptions on the polynomial $f$ and on the integers $a, b$. The context of Theorem 4.18 essentially concerns integral points on a hypersurface in $\mathbb{A}^1 \times \mathbb{G}_m^n$. For these, little seems to be known in general, and Theorem 4.18 represents just a first step. We have the following conjecture, which again can be derived from Vojta’s conjecture.

Conjecture Let $V$ be an irreducible subvariety of $\mathbb{A}^1 \times \mathbb{G}_m^n$ with a Zariski-dense set of $S$-integral points, such that the projection $\pi : V \to \mathbb{G}_m^n$ is finite above its image. Then $\pi(V)$ is an algebraic translate $uH$, and there exist an isogeny $\sigma : \mathbb{G}_m^h \to H$ and a morphism $\tau : \mathbb{G}_m^h \to V$ such that $u\sigma = \pi \circ \tau$.

Note that $\mathbb{A}^1$ is the underlying algebraic variety of the algebraic group $\mathbb{G}_a$. So the above conjecture constitutes a first attempt at describing integral points on hypersurfaces of commutative algebraic groups which are not semi-abelian varieties. In our case, the algebraic group is $\mathbb{G}_a \times \mathbb{G}_m^n$. The difficult point is the existence of $\sigma, \tau$. In Theorem 4.18 this corresponds to the algebraic identity holding on $uH$, which appears in the conclusion. In the case $\dim V = 1$ the conjecture is true. This can be seen either by the method of proof of Theorem 4.18 or by Siegel’s theorem on integral points (see Exercise 3.73).
Note that the above conjecture implies the degeneracy of perfect squares (or higher powers) of the form $1 + u + v$, where $u, v$ are units. For instance, it implies the finiteness of perfect squares of the form $a^m + b^n + 1$, for fixed $a > b > 1$. A complex-analytic analogue of the above conjecture has been proved by Noguchi, Winkelmann, and Yamanoi, actually in much greater generality. The function-field analogue is settled, for $\dim V = 2$, in [CZ17]. As alluded to above, it looks difficult to prove Theorem 4.18 without condition (i). Even the finiteness of the integral solutions of simple equations like $y^2 = 1 + a^m + b^n$, for fixed integers $a, b$, seems to be beyond the reach of the known results and techniques. Compare this with Exercise 4.42(ii) above, where the fact that 6 and 10 are not coprime is crucial, and with Exercise 4.48 above.

In some special cases, e.g. for the equation $y^2 = 1 + 2^m + 3^n$, ad hoc methods based on congruences can provide a finiteness statement over $\mathbb{Q}$, but not over arbitrary number fields (see D. Leitner’s paper [Lei]). In [CZ1], the results on recurrences proved therein are applied to the construction of “universal Hilbert sets,” a concept related to Hilbert’s irreducibility theorem (Theorem 3.40). These are infinite sets $H$ of integers with the following property: given any irreducible polynomial $f \in \mathbb{Q}[X, Y]$, the polynomial $f(h, Y)$ is irreducible in $\mathbb{Q}[Y]$ for all but finitely many $h \in H$. The mere existence of such sets is an easy consequence of Hilbert’s theorem. Still, it may be of interest to produce “explicit” examples described by relevant number-theoretical functions, as was done, for example, by Sprindzuk, Bilu, M. Yasumoto, and others. In [CZ1], Corollary 3, it is proved, for instance, that (see also Exercise 4.39(ii))

If $f(n) = \sum_{i=1}^{s} c_i a_i^n$, for positive integers $a_1, \dots, a_s$ ($s \ge 2$) multiplicatively independent and non-zero integers $c_1, \dots, c_s$, then $f(\mathbb{N})$ is a universal Hilbert set.

The case $f(n) = 2^n + 3^n$ was a conjecture by Yasumoto; see also [DeZ] for previous partial results in this direction.

As for the theory treated in the previous chapter, most of the Diophantine results presented in this chapter have a complex-analytic analogue. Consider for instance Theorem 4.5, concerning the divisibility of values of two linear recurrences. To simplify, suppose the two recurrences are simple. They are defined by $n \mapsto f(n) := a_1\alpha_1^n + \dots + a_h\alpha_h^n$ and $n \mapsto g(n) := b_1\beta_1^n + \dots + b_k\beta_k^n$, for non-zero complex numbers $a_1, \dots, a_h, \alpha_1, \dots, \alpha_h$ and $b_1, \dots, b_k, \beta_1, \dots, \beta_k$. We can associate two one-parameter groups $\varphi : \mathbb{C} \to \mathbb{G}_m^h$ and $\psi : \mathbb{C} \to \mathbb{G}_m^k$, defined as $\varphi(z) = (\alpha_1^z, \dots, \alpha_h^z)$ and $\psi(z) = (\beta_1^z, \dots, \beta_k^z)$. These depend on some determinations of the logarithms of the complex numbers $\alpha_i, \beta_j$.
We can then consider the set of points $z \in \mathbb{C}$ at which the first one-parameter group $\varphi$ (resp. the second one-parameter group $\psi$) intersects the divisor $a_1x_1 + \cdots + a_hx_h = 0$ in the variety $\mathbb{G}_m^h$ (resp. the divisor $b_1y_1 + \cdots + b_ky_k = 0$ in $\mathbb{G}_m^k$). We expect that the two sets do not have “large” intersections, except in “special” cases that should be easy to describe. This turns out to be true. It is proved in [CoNo] in greater generality: the holomorphic maps $\varphi, \psi$ are not required to be group homomorphisms, and the target algebraic group can be any semi-abelian variety. Note that in the arithmetic case one can only exclude the integrality of the ratio $f(n)/g(n)$ for large $n$. This amounts to saying that, for all large $n$, either $g(n)$ has a prime factor not dividing $f(n)$, or the prime factors of $g(n)$ all divide $f(n)$ but one of them appears in the factorization of $g(n)$ with a higher


Diophantine Equations with Linear Recurrences

multiplicity than in the factorization of $f(n)$. In the complex-analytic case we have the analogue of the first conclusion, thanks to the control of the “ramification term.”

A complex-analytic companion can also be provided to Theorem 4.15. Let $f, g : \mathbb{C} \to \mathbb{C}^*$ be two never-vanishing holomorphic functions such that $(f-1)/(g-1)$ is entire. J. Noguchi and the first author of the present book proved in [CoNo] that either $f, g$ are multiplicatively dependent or the ratio $T_f(r)/T_g(r)$ of their characteristic functions tends to infinity. (Here, by characteristic functions we mean those defined in Nevanlinna theory, which are the complex-analytic analogue of the logarithmic Weil height of algebraic numbers; see [BoG] or [Vo5].) The proof of this result appeals to a previous theorem of Noguchi, Winkelmann, and Yamanoi [NWY], which is the exact analogue of the gcd estimates of Theorem 4.7. A compact analogue, which holds for abelian varieties, has also been proved by K. Yamanoi (see Chapter 6 of [NW2]). In the case of function fields, an analogue of the gcd estimates of Theorem 4.7 has been proved (see [CZ12]). It has found applications to the proof of Vojta’s conjecture over function fields for certain surfaces.

Diophantine equations of mixed polynomial-exponential type can be viewed geometrically as intersections of complex-analytic subgroups of a linear commutative algebraic group with algebraic varieties. Consider for instance Losert’s equation, Equation (4.23): $2^l + m2^m = n2^n$. Take the group $\mathbb{G}_a^2 \times \mathbb{G}_m^3$ with coordinates $((x_1, x_2), (u_1, u_2, u_3))$, and consider the analytic subgroup $H$ defined by the two equations $u_1 = 2^{x_1}$ and $u_2 = 2^{x_2}$. Define the algebraic variety $X \subset \mathbb{G}_a^2 \times \mathbb{G}_m^3$ by the equation $u_3 + x_1u_1 = x_2u_2$. Then each solution $(l, m, n)$ of the above equation gives rise to the integral point $((m, n), (2^m, 2^n, 2^l)) \in H \cap X$.
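The dictionary between solutions and points of $H \cap X$ can be checked mechanically. The following Python sketch searches a small box for solutions of $2^l + m2^m = n2^n$ with $l, m, n \ge 1$. It then verifies that each solution gives a point satisfying both $u_i = 2^{x_i}$ and $u_3 + x_1u_1 = x_2u_2$. The function names are ours, and the search is only a finite check. It does not reproduce the family of solutions (4.24).

```python
def losert_solutions(bound):
    """All (l, m, n) with 1 <= l, m, n <= bound and 2^l + m*2^m == n*2^n (brute force)."""
    return [(l, m, n)
            for m in range(1, bound + 1)
            for n in range(1, bound + 1)
            for l in range(1, bound + 1)
            if 2**l + m * 2**m == n * 2**n]

def point_on_X(l, m, n):
    """The point ((x1, x2), (u1, u2, u3)) = ((m, n), (2^m, 2^n, 2^l)) attached to a solution."""
    return (m, n), (2**m, 2**n, 2**l)

def lies_on_X(point):
    (x1, x2), (u1, u2, u3) = point
    return u3 + x1 * u1 == x2 * u2          # defining equation of X

def lies_on_H(point):
    (x1, x2), (u1, u2, _) = point
    return u1 == 2**x1 and u2 == 2**x2      # H is cut out by u_i = 2^{x_i}

if __name__ == "__main__":
    for sol in losert_solutions(30):
        p = point_on_X(*sol)
        print(sol, p, lies_on_X(p) and lies_on_H(p))
```

Among others, the search returns $(l, m, n) = (4, 2, 3)$, since $16 + 2\cdot 4 = 24 = 3\cdot 8$.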
The two-parameter family of solutions provided in (4.24) can easily be adapted to define a dominant holomorphic map $\mathbb{C}^2 \to H \cap X$.

Concerning fractional parts of powers, a recent preprint of A. Kulkarni, N. Mavraki, and K. Nguyen ([KMN]) considers expressions like $\sum_{i=1}^h q_i\alpha_i^n$. Here $\alpha_1, \ldots, \alpha_h$ are fixed algebraic numbers and the height of the $q_i$ is infinitesimal with respect to $n$. In this context they prove an analogue of Theorem 4.24. In particular, they classify the linear recurrent sequences $n \mapsto f(n)$ of algebraic numbers for which there exists a real $\theta$ with $0 < \theta < 1$ such that $\|f(n)\| < \theta^n$ infinitely often. Here $\|f(n)\|$ denotes the distance from $f(n)$ to the nearest integer.


The gcd estimates for pairs of the form $(u-1, v-1)$, with $u, v$ being $S$-units, admit a conjectural elliptic analogue. In the notation of the conjecture of Section 3.5.1, let $E_1, E_2$ be two non-isogenous elliptic curves defined by Weierstrass equations over the integers, and denote by $d(P)$ the denominator function as in Section 3.5.1. J. Silverman, inspired by the aforementioned gcd estimates for $S$-units, conjectured in [Sil2] the following: for $(P_1, P_2) \in E_1(\mathbb{Q}) \times E_2(\mathbb{Q})$ and every $\varepsilon > 0$,
$$\log\gcd(d(P_1), d(P_2)) \le \varepsilon\max(h(P_1), h(P_2)) + O_{E_1,E_2,\varepsilon}(1),$$
where $h(P_1)$ (resp. $h(P_2)$) denotes the logarithmic Tate height on $E_1$ (resp. $E_2$). From this conjectural estimate one could deduce the conjecture proposed in Section 3.5.1 on the finiteness of the set of pairs $(P, Q)$ with $d(P) = d(Q)$. These gcd estimates also admit an analogue in positive characteristic, as shown in [CZ16a]. This leads to a new proof, and sometimes to improvements, of Weil’s bound for the number of rational points on a curve over a finite field. See also [BGS] for some applications of these new bounds to the distribution of Markov numbers.

5 Some Applications of the Subspace Theorem in Transcendental Number Theory

In this final chapter, we shall be interested in proving that certain complex numbers are transcendental. As for other results in this book, the main tool will be the subspace theorem.

5.1 Transcendence of Lacunary Series

Every property of algebraic numbers automatically provides a transcendence criterion: the (complex) numbers which do not satisfy this property are transcendental. In this chapter, we consider Diophantine-approximation properties of algebraic numbers, and use them to deduce transcendence criteria. The first instance of this procedure appeared with Liouville in 1844 (see Theorem 1.33 and Exercise 1.50 from Chapter 1), who gave the first proof of the existence of transcendental numbers. His inequality implies, for example, the transcendency of numbers like
$$\sum_{n=0}^{\infty} 10^{-n!}.$$
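Liouville’s mechanism for this example can be verified with exact rational arithmetic. Put $q_N = 10^{N!}$ and $p_N/q_N = \sum_{n=0}^N 10^{-n!}$. The tail $\sum_{n>N}10^{-n!}$ lies between $10^{-(N+1)!}$ and $2\cdot 10^{-(N+1)!}$ (an easy geometric-series bound), which is smaller than $q_N^{-N}$. Since $N$ is arbitrary, the exponent of approximation is unbounded. The Python sketch below checks these inequalities for small $N$; the helper names are ours.

```python
from fractions import Fraction
from math import factorial

def partial_sum(N):
    """p_N / q_N = sum_{n=0}^{N} 10^(-n!), as an exact fraction."""
    return sum(Fraction(1, 10**factorial(n)) for n in range(N + 1))

def check_liouville(N, extra_terms=2):
    q = 10**factorial(N)                          # q_N = 10^(N!)
    approx = partial_sum(N)
    assert (approx * q).denominator == 1          # the denominator of p_N/q_N divides q_N
    # A longer truncation is a lower bound for alpha (the omitted terms are positive).
    tail_lower = partial_sum(N + extra_terms) - approx
    tail_upper = Fraction(2, 10**factorial(N + 1))  # geometric-series bound for the full tail
    assert Fraction(1, 10**factorial(N + 1)) <= tail_lower < tail_upper
    # Liouville-type inequality: |alpha - p_N/q_N| < q_N^(-N).
    return tail_upper < Fraction(1, q**N)

if __name__ == "__main__":
    print([check_liouville(N) for N in range(1, 5)])
```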

More generally, by Liouville’s theorem, Theorem 1.33, one can prove the transcendency of a certain uncountable set of real numbers, nowadays called Liouville numbers. These are the real numbers $\alpha \in \mathbb{R}$ with the property that, for every real number $\mu > 0$, the inequality
$$0 < \left|\alpha - \frac{p}{q}\right| < q^{-\mu}$$
admits infinitely many solutions in rational numbers $p/q$. This set has Lebesgue measure zero (see Exercise 1.15). In some sense it contains only numbers which are constructed ad hoc to be transcendental: for instance, the number $\pi$,


Napier’s constant $e$, and the logarithms of integers $> 1$, which are well known to be transcendental, do not satisfy Liouville’s transcendence criterion.¹ Replacing Liouville’s theorem by Roth’s theorem (Theorem 1.36) leads to a more efficient transcendence criterion of a similar nature. For instance, it enables one to prove the transcendency of $\sum_{n=1}^\infty 2^{-3^n}$, as we now show. Put $\alpha = \sum_{n=1}^\infty 2^{-3^n}$. For $n = 1, 2, \ldots$, put $p_n/q_n = \sum_{i=1}^n 2^{-3^i}$, where the fraction is meant to be in reduced terms, so that $q_n = 2^{3^n}$. Then $0 < \alpha - p_n/q_n < 2\cdot 2^{-3^{n+1}} = 2q_n^{-3}$. On the other hand, if $\alpha$ were algebraic, Roth’s theorem would give $|\alpha - p_n/q_n| > q_n^{-2-\varepsilon}$ for every $\varepsilon > 0$ and all large $n$ (depending on $\varepsilon$). Choosing, for example, $\varepsilon = 1/2$, we obtain a contradiction if we assume that $\alpha$ is algebraic. Similarly, by Ridout’s theorem, one can prove the transcendence of the number $\sum_{n=1}^\infty 2^{-F_n}$, where $F_n$ is the $n$th Fibonacci number (see Exercise 3.70). However, even Ridout’s theorem or its generalization provided by Theorem 1.39 would not suffice to treat infinite series like
$$\sum_{n=0}^{\infty} \beta^{2^n},$$
whenever $0 < \beta < 1$ is an irrational (real) algebraic number. Mahler proved the transcendence of these numbers by using the functional equation $f(x^2) = f(x) - x$, satisfied by the analytic function $f(x) = \sum_{n\ge0} x^{2^n}$. Mahler’s method has been much developed in recent years: see, for example, the book by K. Nishioka [Nish]. By using the subspace theorem, we can treat that case too. In fact, we obtain the following more general result, which appears as Corollary 3 in [CZ4a].
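Two computations in the paragraph above are easy to test with exact arithmetic. The Python sketch below checks the following; the helper names are ours.

- The partial sums $p_n/q_n$ of $\sum_{n\ge1}2^{-3^n}$ have reduced denominator $q_n = 2^{3^n}$ and approximate $\alpha$ to within $2q_n^{-3}$, well beyond Roth’s exponent $2+\varepsilon$.
- Mahler’s functional equation $f(x^2) = f(x) - x$ holds exactly for truncations of $f(x) = \sum_{n\ge0}x^{2^n}$, once one extra term is kept on the right-hand side.

This only illustrates the inequalities. The transcendence conclusion comes from Roth’s theorem.

```python
from fractions import Fraction

def lacunary_partial(n):
    """p_n / q_n = sum_{i=1}^{n} 2^(-3^i) as an exact (automatically reduced) fraction."""
    return sum(Fraction(1, 2**(3**i)) for i in range(1, n + 1))

def roth_gap(n, extra_terms=2):
    approx = lacunary_partial(n)
    q = approx.denominator
    assert q == 2**(3**n)                        # reduced denominator is 2^(3^n)
    # Truncating further gives a lower bound for alpha - p_n/q_n (omitted terms are positive).
    tail = lacunary_partial(n + extra_terms) - approx
    assert 0 < tail < Fraction(2, q**3)          # the full tail is < 2 * 2^(-3^(n+1)) = 2 q^(-3)
    return tail

def mahler_truncation(x, terms):
    """f_K(x) = sum_{n=0}^{K-1} x^(2^n), a truncation of Mahler's function."""
    return sum(x**(2**n) for n in range(terms))

def check_functional_equation(x, terms):
    # f_K(x^2) equals f_{K+1}(x) - x exactly.
    return mahler_truncation(x * x, terms) == mahler_truncation(x, terms + 1) - x

if __name__ == "__main__":
    print([roth_gap(n) > 0 for n in range(1, 4)])
    print(check_functional_equation(Fraction(1, 3), 6))
```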

¹ While for the number $e$, whose continued-fraction expansion is known, it is easy to prove that it is not a Liouville number, proving this fact for $\pi$ or for the logarithms of integers is more difficult. It can be done, for example, via Baker’s theory of linear forms in logarithms; see for instance [B].


Some Applications of the Subspace Theorem

Theorem 5.1 Let $m_1 < m_2 < \cdots$ be a sequence of positive integers satisfying
$$\sup_N \limsup_{n\to\infty} \frac{m_{n+N}}{m_n} = \infty. \qquad (5.1)$$

Let $a_1, a_2, \ldots$ be a sequence of positive real algebraic numbers satisfying $h(a_n) = o(m_n)$. Then the real function defined in $(0, 1)$ by the series $\sum_{i=1}^\infty a_i x^{m_i}$ takes transcendental values at all algebraic points in $(0, 1)$.

For Mahler’s example, we can take for the exponents the sequence $m_i := 2^i$. Since $m_{n+N}/m_n = 2^N$ for all $n$, the relation (5.1) in Theorem 5.1 is satisfied. If we take all the coefficients $a_n$ to be 1, the condition on the heights is clearly satisfied too. We then obtain, as a particular case, the transcendency of $\sum_{i=1}^\infty x^{2^i}$ at any algebraic point $x \in (0, 1)$. Note that the condition (5.1) certainly holds whenever $m_{n+1} > cm_n$ for a fixed $c > 1$ and all large $n$. Note also that we do not require any functional equation for the analytic function $\sum_{i=1}^\infty a_i x^{m_i}$ involved. An analogue of Theorem 5.1 also holds in the ultrametric setting. There the $a_i$ belong to a $p$-adic field $\mathbb{Q}_p$, are algebraic, and satisfy the same growth condition as above for the height. The series $\sum_i a_i x^{m_i}$ then converges in the open unit ball of $\mathbb{Q}_p$, and its values are transcendental at algebraic points $x \ne 0$.

We sketch the proof of Theorem 5.1. Take an algebraic $\gamma \in (0, 1)$ and let $\alpha = \sum_{i=1}^\infty a_i\gamma^{m_i}$. Fix for the moment a number $N > 1$, which will be taken sufficiently large at the end of the proof. For each natural number $n = 1, 2, \ldots$, let

$$\alpha_n := \alpha - \sum_{i=1}^n a_i\gamma^{m_i}.$$

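Then we have the identity $\alpha_n - \sum_{i=n+1}^{n+N} a_i\gamma^{m_i} = \sum_{i=n+N+1}^\infty a_i\gamma^{m_i}$, from which we derive the inequality
$$0 < \alpha_n - \sum_{i=n+1}^{n+N} a_i\gamma^{m_i} \le \sum_{i=n+N+1}^{\infty} a_i\gamma^{m_i} \le c_1\cdot\gamma^{m_{n+N+1}}, \qquad (5.2)$$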

for some constant $c_1 > 0$. Now, the logarithmic height of $\alpha_n$ can be bounded as
$$h(\alpha_n) \le h(\alpha) + h\Bigl(\sum_{i=1}^n a_i\gamma^{m_i}\Bigr) + \log 2 \le c_2\cdot(m_n + \log n).$$

Note that the remainder term, i.e. the right-hand-side term in (5.2), is bounded by $H(\gamma^{m_{n+N}})^{-\varepsilon}$ for some fixed positive $\varepsilon$. Also, by our assumption (5.1) in Theorem 5.1, it is bounded from above by $H(\alpha_n)^{-\delta}$, for every $\delta > 0$,


provided we take $N$ suitable and large enough. Finally, taking $N$ suitable and $\varepsilon$ small enough so that $|\gamma^{m_{n+N+1}}| < H(\gamma^{m_{n+N}})^{-\varepsilon}$, we obtain
$$\bigl|\alpha_n - a_{n+1}\gamma^{m_{n+1}} - \cdots - a_{n+N}\gamma^{m_{n+N}}\bigr| < H(\alpha_n)^{-1-\delta}\cdot\bigl(H(\gamma^{m_{n+1}})\cdots H(\gamma^{m_{n+N}})\bigr)^{-\delta}.$$
We then have “small values” of a linear form in the $N + 1$ terms $\alpha_n, a_{n+1}\gamma^{m_{n+1}}, \ldots, a_{n+N}\gamma^{m_{n+N}}$. The first of these is an $S$-integer, over a fixed ring of $S$-integers, while the others are “almost” $S$-units. By almost $S$-units we mean that a dominant part of their height is due to an $S$-unit. In this case the algebraic numbers are of the form $a_i\gamma^{m_i}$, and for a suitable finite set $S$ independent of the index $i$ the factor $\gamma^{m_i}$ is an $S$-unit. Its (logarithmic) height is $h(\gamma^{m_i}) = m_ih(\gamma)$; note that $h(\gamma) > 0$ since $|\gamma| < 1$ in the ordinary absolute value. Since $h(a_i) = o(m_i)$, we have $h(a_i) = o(h(\gamma^{m_i}))$. To this situation we can apply the subspace theorem, in a manner similar to what was done in the proof of Theorem 4.18. In our situation, all coefficients $a_i$ as well as the algebraic number $\gamma$ are positive reals, so we cannot have vanishing subsums. Hence the inequality of the subspace theorem holds unconditionally, not merely outside a finite union of hyperplanes. We then obtain the sought conclusion, i.e. that the $\alpha_n$ (and hence $\alpha$) cannot be algebraic. For a more general result on lower bounds for linear forms with one $S$-integer and $N$ “almost” $S$-units, see Theorem 4 in [CZ4a].

Let us note that Mahler’s theorem, asserting the transcendency of the numbers $\sum_{n=1}^\infty \beta^{2^n}$ where $0 < \beta < 1$ is an algebraic number, is obtained as a very particular case of Theorem 5.1.² By applying the same technique based on the subspace theorem, we also recover another result generalizing Mahler’s theorem.

Theorem 5.2 Let $f(z) \in \mathbb{C}((z))$ be a Laurent series with complex algebraic coefficients and positive radius of convergence. Let $q$ be a complex algebraic number, $0 < |q| < 1$.
Let $\mathcal{O}_S$ be a ring of $S$-integers in a number field. If $f$ is not a Laurent polynomial, and $f(q^n) \in \mathcal{O}_S$ for all integers $n$ in an infinite sequence $\mathcal{N}$, then
$$\lim_{n\in\mathcal{N}} \frac{h(f(q^n))}{n} = \infty.$$
Let us see why Theorem 5.2 implies Mahler’s theorem. Take for $f(z)$ the Fredholm series $f(z) = \sum_{n\ge0} z^{2^n}$, which satisfies the functional equation $f(z^2) = f(z) - z$. Suppose $f(q)$ is algebraic for some algebraic number $q$ with $0 < |q| < 1$. Then the functional equation shows that, for all $n \ge 1$, $f(q^{2^n})$ is an $S$-integer in a fixed ring of $S$-integers $\mathcal{O}_S$. Again from the functional equation, it is easy to see that $h(f(q^{2^n}))/2^n$ is bounded. Since $f(z)$ is not a Laurent polynomial, Theorem 5.2 gives a contradiction whenever $q$ is algebraic.

² As remarked by Waldschmidt [Wa2], the transcendency of the number $\sum_n 2^{-2^n}$ goes back to a 1916 work by Kempner.
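Two facts drive this deduction. First, $f(q^{2^n})$ differs from $f(q)$ by the polynomial value $d_n = \sum_{k<n}q^{2^k}$. Second, the height of $d_n$ grows at most like $2^n$. Both can be illustrated for a rational $q$. The Python sketch below uses the arbitrary value $q = 2/5$, and the helper names are ours. The identity is checked only numerically (floating point), while the heights are computed exactly.

```python
from fractions import Fraction
import math

def correction(q, n):
    """d_n = sum_{k=0}^{n-1} q^(2^k); iterating f(z^2) = f(z) - z gives f(q^(2^n)) = f(q) - d_n."""
    return sum(q**(2**k) for k in range(n))

def log_height(x):
    """Logarithmic height of a rational x = a/b in lowest terms: log max(|a|, b)."""
    return math.log(max(abs(x.numerator), x.denominator))

def mahler_f(z, tol=1e-18):
    """Floating-point value of f(z) = sum_{n>=0} z^(2^n) for 0 <= z < 1 (series truncated at tol)."""
    total, power = 0.0, z
    while abs(power) > tol:
        total += power
        power *= power                # z^(2^(n+1)) = (z^(2^n))^2
    return total

if __name__ == "__main__":
    q = Fraction(2, 5)
    for n in range(1, 8):
        d = correction(q, n)
        lhs = mahler_f(float(q) ** (2**n))
        rhs = mahler_f(float(q)) - float(d)
        assert math.isclose(lhs, rhs, abs_tol=1e-12)
        print(n, round(log_height(d) / 2**n, 4))
```

Here the printed ratio equals $\tfrac12\log 5 \approx 0.8047$ for every $n$. The reduced denominator of $d_n$ is $5^{2^{n-1}}$, and its numerator is smaller than its denominator.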

5.2 Complexity of Algebraic Numbers

In the previous section we exploited the lacunarity of certain power series to deduce the transcendence of their sums. We now turn to another feature, namely the regularity, or, viewed from the opposite side, the complexity, of some expressions defining a real number in terms of integers. For instance, rational numbers have a decimal expansion which is ultimately periodic (possibly even finite), and this is the most regular possible expression. In some sense, we expect that algebraic numbers admit either an expansion so regular as to ensure rationality, or a very complex one, in a sense which will be clarified later.

Let us go back again to our original example, namely $\sum_{n=1}^\infty 2^{-2^n}$. Every real number $\alpha$ in the interval $(0, 1)$ has a binary expansion $\alpha = \sum_{i=1}^\infty a_i2^{-i}$, where $a_1, a_2, \ldots$ is a sequence with values in $\{0, 1\}$. The expansion is unique, unless $\alpha$ belongs to the ring $\mathbb{Z}[1/2]$, in which case it admits exactly two such expansions. Consider $\alpha := \sum_{n=1}^\infty 2^{-2^n}$, which can equivalently be written as $\alpha = \sum_{i=1}^\infty a_i2^{-i}$, where $i \mapsto a_i$ is the characteristic function of the set $\{2^n : n \ge 1\}$. A peculiarity of its expansion is the appearance of long repetitions of zeros. A natural precise formulation of this property, i.e. the presence of long repetitions of zeros, has been considered by B. Adamczewski, Y. Bugeaud, and F. Luca in [ABL]. It makes use of the following definition.

Definition Given a finite set (an alphabet) $\mathcal{A}$, we say that a sequence $u_1, u_2, \ldots$ in $\mathcal{A}$ has long repetitions if there exists a positive $\varepsilon$ such that, for infinitely many $N \in \mathbb{N}$, the word $u_1u_2\cdots u_N$ has two disjoint equal subwords of length $\ge \varepsilon N$.

(In what follows we shall use the notation $l(A)$ for the length of a finite word $A = u_1u_2\cdots u_l$.) Clearly, the sequence which equals 1 on the numbers of the form $2^i$, and 0 elsewhere, has long repetitions, namely long repetitions of zeros.
Also, periodic or ultimately periodic sequences have long repetitions. The main theorem of [ABL] can be stated as follows. Theorem 5.3 Assume that for some integer b ≥ 2, the b-ary expansion of the real number α ∈ (0, 1) has long repetitions. Then α is either rational or transcendental.
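The definition of long repetitions is easy to test on finite prefixes. The brute-force Python sketch below computes, for a prefix $u_1\cdots u_N$, the largest length of a block occurring twice without overlap. It then applies this to the binary digits of $\sum_{n\ge1}2^{-2^n}$. The names are ours, and the cubic running time makes it suitable only for short prefixes. A finite computation can only suggest, never prove, that an infinite sequence has long repetitions.

```python
def longest_disjoint_repeat(word):
    """Largest l(B) such that the block B occurs twice in `word` without overlap
    (i.e. word = ...B...B...); brute force, roughly cubic in len(word)."""
    n = len(word)
    for length in range(n // 2, 0, -1):
        first_seen = {}
        for start in range(n - length + 1):
            block = word[start:start + length]
            # Compare with the earliest occurrence, which is the best candidate for disjointness.
            if block in first_seen and start - first_seen[block] >= length:
                return length
            first_seen.setdefault(block, start)
    return 0

def has_repetition_ratio(word, eps):
    """True if the prefix of length N has two disjoint equal subwords of length >= eps * N."""
    return longest_disjoint_repeat(word) >= eps * len(word)

def powers_of_two_indicator(N):
    """Binary digits a_1 ... a_N of sum_{n>=1} 2^(-2^n): a_i = 1 iff i = 2^n with n >= 1."""
    return "".join("1" if i >= 2 and i & (i - 1) == 0 else "0" for i in range(1, N + 1))

if __name__ == "__main__":
    for N in (32, 64, 128):
        u = powers_of_two_indicator(N)
        print(N, longest_disjoint_repeat(u), has_repetition_ratio(u, 0.2))
```

For $N = 2^k$ the zeros between $2^{k-1}$ and $2^k$ alone give two disjoint blocks of length about $N/4$, which is why the ratio $\varepsilon = 0.2$ works here.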


This result is clearly related to the idea behind Exercise 1.50, and to the discussion at the beginning of this chapter. Roth’s or Ridout’s theorem can be applied whenever there are sufficiently long repetitions of zeros, whereas in Theorem 5.3 any kind of long repetition is allowed. In Theorem 5.1, too, we require long repetitions of zeros. There, however, we can replace a $b$-ary expansion, where $b$ is a natural number, with an arbitrary power series with an algebraic base. We can also allow an infinite alphabet, provided the height condition in the statement holds. In the proof of Theorem 5.3 we shall need a refinement of Theorem 4.16, which we state here.

Theorem 5.4 Let $n \mapsto f(n)$ be a power sum with positive integral roots and let $b \ge 1$ be an integer. For every real algebraic number $\alpha$ and every $\varepsilon > 0$ there exist only finitely many rational numbers of the form $m/(b^kf(n))$ such that
$$\left|\alpha - \frac{m}{b^kf(n)}\right| < |m|^{-1-\varepsilon}.$$

Remark 5.5 The case $b = 1$ coincides with Theorem 4.16. The same conclusion holds for approximations of the form $mb^k/f(n)$. Again, this coincides with Ridout’s approximation theorem whenever $f(n)$ is a geometric progression. We omit the proof of Theorem 5.4, since it is entirely analogous to that of Theorem 4.16.

Proof of Theorem 5.3. We follow [ABL] and especially Bilu’s survey [Bilu1], from which we adopt the notation. Let us write the $b$-ary expansion of $\alpha$ as

$$\alpha = u_1b^{-1} + u_2b^{-2} + u_3b^{-3} + \cdots, \qquad (5.3)$$

where the digits ui belong to the finite alphabet {0, 1, . . . , b − 1}. By assumption, there exists a positive real number ε and infinitely many natural numbers N such that the word u1 · · · uN can be written as ABCB, where B has length at least ε N (and the words A,C might even be empty). Fix one such natural number N. Let ξ be the rational number whose b-ary expansion is the eventually periodic word ABCBCBC · · · . Then

$$\xi = \frac{M}{b^r(b^s - 1)},$$

where $M$ is an integer, $r$ is the length of $A$, and $s$ is the period, i.e. the length of $BC$. The main point of the proof is that $\xi$ is a good rational approximation to $\alpha$, as will be checked presently, and also that $\xi$ is of a special type, in the sense that it is a rational


number whose denominator is the product of a power $b^r$ of a fixed number $b$ and the value of a linear recurrence sequence ($s \mapsto b^s - 1$). We are then in a situation covered by Theorem 5.4. It provides for the distance $|\alpha - \xi|$ the same lower bound as Ridout’s theorem, i.e., as if the denominators of the approximants were powers of $b$. Let us now verify that $\xi$ is indeed a good rational approximation to $\alpha$. The word $ABCB$ is a common prefix of the expansions of both $\alpha$ and $\xi$, and its length is $\ge r + s + \varepsilon N$. Hence
$$|\alpha - \xi| \le b^{-r-s-\varepsilon N} \le b^{-(1+\varepsilon)(r+s)}.$$
The last inequality follows from the fact that $N$ is the length of $ABCB$, so $N > r + s$. Since the height of $\xi$ is at most $b^{r+s}$, the above inequality gives $|\alpha - \xi| \le H(\xi)^{-(1+\varepsilon)}$. We can then apply Theorem 5.4 (with, say, $\varepsilon/2$ in place of $\varepsilon$) and conclude that this inequality holds only for finitely many rational numbers $\xi$ of the form $M/(b^r(b^s - 1))$ as above. Therefore infinitely many of the numbers $\xi$ constructed above (from the corresponding numbers $N$) coincide. Since $|\alpha - \xi| \le b^{-N}$ and $N$ can be taken arbitrarily large, this can happen only when $\alpha$ is rational, equal to one of its approximations $\xi$. This concludes the proof of Theorem 5.3.

This line of investigation has been developed by Adamczewski and Bugeaud, who considered the longstanding question of the complexity of algebraic numbers. Among the simplest real numbers to describe are those admitting a finite or periodic $b$-ary expansion, with respect to some integer base $b \ge 2$; these are well known to be the rational ones. In general, we define the complexity of a $b$-ary expansion as follows.

Definition Let $\mathcal{A}$ be a finite alphabet and let $U = (u_1, u_2, u_3, \ldots)$ be an infinite sequence of letters from $\mathcal{A}$. For every positive integer $n$, let $\rho_U(n)$ be the number of distinct words of length $n$ occurring as blocks of consecutive letters of $U$:

$$\rho_U(n) = \bigl|\{u_ku_{k+1}\cdots u_{k+n-1} : k = 1, 2, \ldots\}\bigr|.$$
Clearly $1 \le \rho_U(n) \le |\mathcal{A}|^n$. The function $\rho_U : \mathbb{N} \to \mathbb{N}$ is called the complexity function, or simply the complexity, of the sequence $U$. Clearly, eventually periodic sequences have bounded complexity. A classical theorem of Morse and Hedlund (see [BK], Theorem 1.1) shows that a much weaker condition on the $\rho$-function already forces periodicity.
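On a finite prefix one can only count the blocks that actually occur there. This gives a lower bound for $\rho_U(n)$, which equals $\rho_U(n)$ once the prefix is long enough to contain all blocks of length $n$. The Python sketch below does this; the names are ours. As test words it uses a periodic word and the Fibonacci word $0100101001001\cdots$. The latter is the fixed point of the substitution $0 \mapsto 01$, $1 \mapsto 0$, a standard example of a word of complexity $n + 1$.

```python
def complexity(prefix, n):
    """Number of distinct length-n blocks u_k ... u_{k+n-1} occurring in the finite prefix
    (a lower bound for rho_U(n), exact once the prefix contains every block of length n)."""
    return len({prefix[k:k + n] for k in range(len(prefix) - n + 1)})

def fibonacci_word(length):
    """Prefix of the Fibonacci word, the fixed point of the substitution 0 -> 01, 1 -> 0."""
    word = "0"
    while len(word) < length:
        word = "".join("01" if c == "0" else "0" for c in word)
    return word[:length]

if __name__ == "__main__":
    periodic = "012" * 200
    fib = fibonacci_word(2000)
    for n in range(1, 11):
        print(n, complexity(periodic, n), complexity(fib, n))
```

The periodic word gives the constant value 3, in line with Theorem 5.6 below. The Fibonacci word gives exactly $n + 1$.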


Theorem 5.6 (Morse and Hedlund) Let $U = (u_1u_2\cdots)$ be an infinite word in a finite alphabet. The following statements are equivalent: (1) $U$ is eventually periodic; (2) $\rho_U = O(1)$; (3) there exists a number $n \in \mathbb{N}$ such that $\rho_U(n) = n$.

Now let $\alpha \in (0, 1)$ be a real number and let $b \ge 2$ be an integer. We can write the $b$-ary expansion of $\alpha$ as in (5.3). The theorem of Adamczewski and Bugeaud proved in [AB2] reads as follows.

Theorem 5.7 Let $\alpha \in (0, 1)$ be an irrational (real) algebraic number and let $b \ge 2$ be an integer. The complexity $\rho_U$ of the $b$-ary expansion $U = u_1, u_2, \ldots$ of $\alpha$ satisfies
$$\lim_{n\to\infty}\frac{\rho_U(n)}{n} = \infty.$$
This result greatly improves a previous one by S. Ferenczi and C. Mauduit [FM], asserting that $\rho_U(n) - n \to \infty$. On the other hand, a conjecture originating from the work of Émile Borel predicts that irrational algebraic numbers are normal. This means that every word appears with the correct frequency in their $b$-ary expansion, which implies that $\rho_U(n) = b^n$ for all $n$. This seems, however, to be far beyond the capability of the known methods. Theorem 5.7 can be reduced to Theorem 5.3 via the following (elementary but ingenious) combinatorial lemma, whose proof is borrowed from [Bilu1].

Lemma 5.8 If the complexity function³ of an infinite sequence $U$ in a finite alphabet satisfies $\liminf_{n\to\infty}\rho_U(n)/n < \infty$, then $U$ has long repetitions.

Proof By hypothesis there exists a number $c$, which can be taken to be an integer, such that $\rho_U(n) < cn$ for infinitely many $n$. For such an $n$, take $N = (c+1)n$ and consider the prefix $u_1u_2\cdots u_N$ of $U$. This prefix contains $cn + 1$ occurrences of words of length $n$, while there are fewer than $cn$ distinct such words. Hence two of these occurrences must be equal. If they are disjoint, we have found a long repetition, in the sense of our definition, with $\varepsilon = 1/(c+1)$. Suppose instead that these words are not disjoint. Then $u_1u_2\cdots u_N$ contains a subword of the form $W = ABC$, where $A, B, C$ are non-empty words and $AB = BC$ have length $n$.
From $AB = BC$ we obtain $AAB = ABC = W$. If $l(A) > n/2$, we have found a repetition, namely $A$. The ratio $l(A)/N$ is $> (n/2)/((c+1)n) = 1/(2(c+1))$, so this is a long repetition for $\varepsilon = 1/(2(c+1))$. If, on the other hand, $l(A) \le n/2$, so that $l(AA) \le n$, then from $ABC = AAB$ and the fact that $l(AB) = n$ it follows that $AA$ is a prefix of $AB$. More generally, putting $k = [n/l(A)] + 1$, we obtain that $A\cdots A$ ($k$ times) is a prefix of $W$; note that $k \ge 3$. In particular, there are two equal disjoint words, both of the form $A\cdots A$ ($[k/2]$ times), of length $[k/2]\,l(A) \ge k\,l(A)/3 > n/3$. So we have found a long repetition with $\varepsilon = 1/(3(c+1))$, and the lemma is proved.

³ See the definition above.

Another function measuring the complexity of infinite words has recently been introduced by Y. Bugeaud and D. H. Kim.

Definition

For an infinite word $U = u_1u_2\cdots$, set
$$r_U(n) = \min\{m \ge 1 : \exists\, i \text{ with } 1 \le i \le m - n \text{ such that } u_i\cdots u_{i+n-1} = u_{m-n+1}\cdots u_m\}.$$

In other words, $r_U(n)$ is the length of the shortest prefix of the word $U$ containing two occurrences of the same word of length $n$. Note that these occurrences can overlap. It is not difficult to see that $r_U(n) \le \rho_U(n) + n$. A criterion for periodicity in terms of the $r$-function reads as follows (see [BK], Theorem 2.3).

Theorem 5.9 Let $U = (u_1u_2\cdots)$ be an infinite word. The following conditions are equivalent: (1) $U$ is eventually periodic; (2) $r_U(n) - n = O(1)$; (3) for all large integers $n$, $r_U(n) \le 2n$.

In view of Theorems 5.9 and 5.6, it is natural to consider sequences for which the $\rho$-function equals $n + 1$. These are in some sense the least complex words after the eventually periodic ones. They are characterized by the following theorem/definition (see [BK], Definition 1.2 and Theorem 2.4) and are called Sturmian words.

Theorem 5.10 Let $U$ be an infinite word. The following conditions are equivalent: (1) $\rho_U(n) = n + 1$ for all $n \ge 1$; (2) $r_U(n) \le 2n + 1$ for all $n \ge 1$, with equality for infinitely many $n$.

Let $b \ge 2$ be an integer. The real numbers whose $b$-ary expansion is a Sturmian word (in the finite alphabet $\{0, \ldots, b-1\}$) turn out to be transcendental, in view of Theorem 5.7 and the above characterization of Sturmian words. The following result of Bugeaud and Kim provides a lower bound for their irrationality measure. Recall that the irrationality measure $\mu(\alpha)$ of a real number $\alpha$ is the supremum of the real numbers $\nu$ such that the inequality $|\alpha - p/q| < q^{-\nu}$ admits infinitely many solutions in rational numbers $p/q$, where $p, q$ are coprime integers and $q > 0$.
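For experiments, $r_U(n)$ is even easier to compute than $\rho_U(n)$: one scans the prefix once and stops at the first repeated block. The Python sketch below does this; the names are ours. It compares the result with the bound $r_U(n) \le \rho_U(n) + n$ on the Fibonacci word. There, in accordance with Theorem 5.10, one observes $r_U(n) \le 2n + 1$.

```python
def r_function(word, n):
    """r_U(n) read off a finite prefix: the length m of the shortest prefix u_1 ... u_m
    in which some block of length n occurs twice (overlaps allowed).
    Returns None if the given prefix is too short to exhibit a repetition."""
    seen = set()
    for end in range(n, len(word) + 1):        # end = m, current block u_{m-n+1} ... u_m
        block = word[end - n:end]
        if block in seen:
            return end
        seen.add(block)
    return None

def complexity(word, n):
    """Number of distinct length-n blocks occurring in the prefix (a lower bound for rho_U(n))."""
    return len({word[k:k + n] for k in range(len(word) - n + 1)})

def fibonacci_word(length):
    """Prefix of the Fibonacci word, fixed point of 0 -> 01, 1 -> 0 (a Sturmian word)."""
    word = "0"
    while len(word) < length:
        word = "".join("01" if c == "0" else "0" for c in word)
    return word[:length]

if __name__ == "__main__":
    u = fibonacci_word(5000)
    for n in range(1, 16):
        print(n, r_function(u, n), complexity(u, n) + n, 2 * n + 1)
```

For a periodic word such as `"0011" * 100`, the same function returns $n + 4$ for every $n \ge 4$, as Theorem 5.9 predicts.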


Theorem 5.11 Let $b \ge 2$ be an integer. Let $U = u_1u_2\cdots$ be a Sturmian word in the alphabet $\{0, \ldots, b-1\}$. Then
$$\mu\Bigl(\sum_{i=1}^\infty \frac{u_i}{b^i}\Bigr) \ge \frac{5}{3} + \frac{4\sqrt{10}}{15} = 2.5099\ldots.$$
Note that, in view of Roth’s theorem, the above estimate provides another proof of the transcendency of such numbers. A strengthening of Theorem 5.7 can be stated in terms of the $r$-function, and reads as follows.

Theorem 5.12 Let $\alpha$ be an irrational real algebraic number, and let $b \ge 2$ be an integer. The $b$-ary expansion $U = u_1u_2\cdots$ of $\alpha$ satisfies
$$\lim_{n\to\infty}\frac{r_U(n)}{n} = \infty.$$
We now draw some interesting consequences from Theorem 5.7, which are also due to Adamczewski and Bugeaud. These consequences will be formulated in the language of finite automata. Roughly speaking, we shall deduce the following from Theorem 5.7 (resp. Theorems 5.16 and 5.17). The $b$-ary expansion of an irrational algebraic number (resp. the continued-fraction expansion of an algebraic number of degree $\ge 3$) cannot be computed by a finite automaton, i.e., it is not automatic.
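The numerical value of the constant in Theorem 5.11 can be checked in two lines. The check also shows why this constant yields transcendence: by Roth’s theorem, an irrational algebraic number has irrationality measure exactly 2.

```python
import math

# Constant in Theorem 5.11: 5/3 + 4*sqrt(10)/15.
bound = 5 / 3 + 4 * math.sqrt(10) / 15
print(f"{bound:.4f}")   # -> 2.5099
# Roth: an irrational algebraic number has mu = 2, so any lower bound > 2 rules it out.
assert bound > 2
```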

We follow once more Bilu’s survey [Bilu1] for the definition of finite automata (but for simplicity we restrict our discussion to the case where the input alphabet coincides with the output alphabet). Definition

A finite automaton consists of the following elements:

• a finite alphabet $\mathcal{A}$;
• the set of states $Q$, a finite set with two or more elements, with one marked element (the initial state);
• the transition map $Q \times \mathcal{A} \to Q$;
• the output map $Q \to \mathcal{A}$.

An example of a finite automaton with three states $X, Y, Z$ is represented in Figure 5.1, redrawn from [Bilu1]. The input (and the output) alphabet is $\mathcal{A} = \{0, 1\}$. The set of states is $Q = \{X, Y, Z\}$. The transition map is depicted in Figure 5.1, and the initial state is, say, $X$. As for the output map, choose for instance the one sending
$$X \mapsto 1, \quad Y \mapsto 0, \quad Z \mapsto 1.$$

[Figure 5.1: diagram of the three-state automaton, with states X, Y, Z and transitions labelled by the digits 0 and 1.]

Suppose now we have as input a word like 10011. Starting from the right,

the digit 1 tells us to move from $X$ to $Z$, while the second 1 tells us to stay in $Z$. The first 0 moves us from $Z$ to $X$, and the second 0 makes us move to $Y$. Finally, the last 1, since we are in $Y$, sends us to $X$, so the final output is the image of $X$ under the output map, i.e. 0.

Let us fix a natural number $b \ge 2$ and consider the sequence of the $b$-ary expansions of the natural numbers $0, 1, 2, \ldots$ (for instance, for $b = 2$ we obtain $0, 1, 10, 11, 100, \ldots$). We then obtain a sequence $u_1, u_2, \ldots$ of words in the alphabet $\mathcal{A} = \{0, 1, \ldots, b-1\}$. Now let us take a finite automaton with alphabet $\mathcal{A} = \{0, 1, \ldots, b-1\}$ and feed it, one after the other, the words $u_1, u_2, \ldots$ constructed above. We obtain as output a sequence of letters of $\mathcal{A}$, i.e., an infinite word in the alphabet $\mathcal{A}$. We call such a sequence an automatic sequence. An automatic number is a number whose $b$-ary expansion is an automatic sequence.

Example 5.13 For the automaton described in Figure 5.1, taking $b = 2$, the corresponding automatic sequence is the infinite word $01010111\ldots$. In fact, 0 gives as output 0, since it sends $X \to Y$ and the output corresponding to $Y$ is 0. Similarly, 1 gives 1, since $X$ is sent to $Z$, and $Z$ gives the output 1. Next, 10 gives 0, 11 gives 1, and 100 gives 0, while 101, 111, and 1000 all give as output 1.

Example 5.14 We now show that the characteristic sequence of the set of powers of 2 is automatic. We follow Waldschmidt’s construction in [Wa2]. Take $b = 2$, $\mathcal{A} = \{0, 1\}$, and $Q = \{X, Y, Z\}$ as before, where $X$ is the initial state, and define the transition map illustrated in Figure 5.2. Finally, the output map is set to send $X \mapsto 0$, $Y \mapsto 1$, $Z \mapsto 0$.

[Figure 5.2: transition diagram of the automaton of Example 5.14, with states X, Y, Z and transitions labelled by the digits 0 and 1.]

We consider again the sequence of non-negative integers in base 2: $0, 1, 10, 11, 100, 101, \ldots$. Starting from $X \in Q$, we obtain the sequence of final states $X, Y, Y, Z, Y, Z, Z, \ldots$. According to the output map defined above, the resulting sequence is $011010001000000010\cdots$. It has a 1 at place $2^n$, for $n = 0, 1, \ldots$ (places being numbered from 0), and 0 elsewhere. Note that the asymptotic proportion of 1s vanishes, and the complexity function is clearly $O(n)$. The latter fact is actually common to all automatic sequences (see, for example, [AllSh]): for every automatic sequence $U$, $\rho_U(n) = O(n)$. It then follows from Theorem 5.7 that we have the corollary below.

Corollary 5.15

An irrational automatic number is transcendental.

One then obtains once again the transcendency of the number $\sum_{n\ge0}2^{-2^n}$ (the theorem of Kempner and Mahler mentioned above).
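The automaton of Example 5.14 is small enough to simulate directly. Figure 5.2 is not reproduced here, so the transition table below is our reconstruction. In it, $X$ means “only 0s read so far,” $Y$ means “exactly one 1 read,” and $Z$ is an absorbing “two or more 1s” state. This table reproduces the state sequence $X, Y, Y, Z, Y, Z, Z$ and the output word stated above. Digits are fed from the right, as in the walk through Figure 5.1; for this particular automaton the reading direction does not matter. All names are ours.

```python
# Transition table (state, digit) -> state: a reconstruction consistent with the text.
TRANSITIONS = {
    ("X", "0"): "X", ("X", "1"): "Y",
    ("Y", "0"): "Y", ("Y", "1"): "Z",
    ("Z", "0"): "Z", ("Z", "1"): "Z",
}
OUTPUT = {"X": "0", "Y": "1", "Z": "0"}   # output map of Example 5.14
INITIAL = "X"

def final_state(digits):
    """Run the automaton on a base-2 expansion, reading digits from the right."""
    state = INITIAL
    for d in reversed(digits):
        state = TRANSITIONS[(state, d)]
    return state

def automatic_word(length):
    """Output letters for the inputs 0, 1, 10, 11, 100, ... (binary expansions of 0, 1, 2, ...)."""
    return "".join(OUTPUT[final_state(format(k, "b"))] for k in range(length))

def complexity(word, n):
    """Number of distinct length-n blocks occurring in `word`."""
    return len({word[k:k + n] for k in range(len(word) - n + 1)})

if __name__ == "__main__":
    print([final_state(format(k, "b")) for k in range(7)])  # ['X', 'Y', 'Y', 'Z', 'Y', 'Z', 'Z']
    print(automatic_word(18))                                # 011010001000000010
    w = automatic_word(4096)
    print([complexity(w, n) for n in (1, 2, 4, 8, 16, 32)])  # at most 2n + 1 each
```

For this word the linear bound can be seen by hand. A block starting at a position $s \ge n$ contains at most one power of 2, so $\rho(n) \le n + (n + 1) = 2n + 1$.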

Continued fractions. Another possible representation of real numbers employs continued fractions, as mentioned in Chapter 1. Finite continued fractions correspond to rational numbers, while periodic (or eventually periodic) ones correspond to irrational quadratic numbers. Again, a natural problem arises concerning the “complexity” of the representation of an algebraic number which is neither rational nor quadratic. To the best of our knowledge, no algebraic irrational of degree higher than 2 has a known continued-fraction expansion, in the sense that its sequence of partial quotients can be described in


any simple way. On the basis of numerical evidence and probabilistic considerations, it is conjectured that such sequences should be unbounded; however, at present we seem to be still very far from a proof of this fact. The following result, due to Bugeaud [Bug], implies in particular that algebraic numbers cannot have “too simple” a continued-fraction expansion.

Theorem 5.16 Let the continued-fraction expansion of the real number $\alpha \in (0, 1)$ be

$$\alpha = [0, a_1, a_2, \ldots],$$
where $a_1, a_2, \ldots$ is an infinite sequence which is not eventually periodic and admits long repetitions. Suppose, moreover, that the sequence $(q_n^{1/n})$ is bounded. Then $\alpha$ is transcendental.

We have used notation drawn from Chapter 1: if $\alpha = [a_0, a_1, a_2, \ldots]$, then $p_n/q_n = [a_0, a_1, \ldots, a_n]$ is the reduced form of the $n$th convergent. Note that the condition $q_n^{1/n} = O(1)$ is satisfied almost everywhere, in the sense of Lebesgue measure. It also certainly holds if the sequence of partial quotients $a_n$ is bounded, so Theorem 5.16 applies in that case. If we drop the hypothesis that the sequence $(q_n^{1/n})$ is bounded, a transcendence conclusion can still be reached, albeit under stronger conditions on the repetitions appearing in the sequence $a_n$. The corresponding result, due to Adamczewski and Bugeaud [AB2], reads as follows.

Theorem 5.17 Let $\alpha = [0, a_1, a_2, \ldots]$ be, as before, the continued-fraction expansion of the real number $\alpha \in (0, 1)$, and suppose that the sequence $a_1, a_2, \ldots$ is not eventually periodic. Suppose that for infinitely many $n \in \mathbb{N}$ the sequence $a_1, a_2, \ldots$ begins with a block of the form $B_nB_n$, where the length of $B_n$ tends to infinity. Then $\alpha$ is transcendental.

Unlike in Theorem 5.16, no condition on the growth of the sequence $q_n$ appears in the statement; however, the requirement on the presence of long repetitions is stronger here.

Proof Suppose, for a contradiction, that the real number $\alpha = [0, a_1, a_2, \ldots]$ above is algebraic. Let $\mathcal{N} \subset \mathbb{N}$ be the infinite set of positive integers $n$ for which the sequence $a_1, \ldots, a_n, a_1, \ldots, a_n$ coincides with $a_1, \ldots, a_{2n}$. For each $n \in \mathcal{N}$ define the real quadratic number $\alpha_n$ to be

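$$\alpha_n := [0, \overline{a_1, \ldots, a_n}],$$
the purely periodic continued fraction with period $a_1, \ldots, a_n$. Then $\alpha_n$ satisfies the quadratic equation $P_n(\alpha_n) = 0$, for the polynomial $P_n(X) := q_{n-1}X^2 + (q_n - p_{n-1})X - p_n$.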


(Here, as above, we denote by $p_n/q_n$, $n = 1, 2, \ldots$, the sequence of convergents to $\alpha$.) By the mean value theorem, we have the inequality
$$|P_n(\alpha)| = |P_n(\alpha) - P_n(\alpha_n)| \ll q_n\cdot|\alpha - \alpha_n| \ll q_nq_{2n}^{-2},$$
since the first $2n$ partial quotients are the same for $\alpha$ and $\alpha_n$. Since $q_{2n} \ge q_n^2$, the above inequality gives
$$|P_n(\alpha)| \ll q_n^{-3} \ll H(P_n)^{-3}, \qquad (5.4)$$

where $H(P_n)$ denotes the height of the polynomial $P_n$. Consider now the three linear forms with coefficients in $\mathbb{Q}(\alpha)$,
$$L_1(X_1, X_2, X_3) := \alpha^2X_2 + \alpha X_1 + X_3, \quad L_2(X_1, X_2, X_3) := X_2, \quad L_3(X_1, X_2, X_3) := X_3,$$
and the integral point $\mathbf{x} = (x_1, x_2, x_3) := (q_n - p_{n-1}, q_{n-1}, -p_n)$, so that $L_1(\mathbf{x}) = P_n(\alpha)$. By (5.4), the product $|L_1(\mathbf{x})L_2(\mathbf{x})L_3(\mathbf{x})|$ is $\ll H(P_n)^{-1} = H(\mathbf{x})^{-1}$. An application of the subspace theorem then yields a nontrivial linear dependence relation among $q_n - p_{n-1}$, $p_n$, $q_{n-1}$ holding for infinitely many $n \in \mathcal{N}$. Use this relation to write $p_n$ or $q_{n-1}$ in terms of the remaining two, and substitute the result into the linear form $L_1(\mathbf{x})$. One obtains a linear form in two variables taking values that are too small, contradicting Roth’s theorem.

Also in the context of continued fractions we have an analogue of Theorem 5.7, providing a lower bound for the complexity function of the sequence of partial quotients of an algebraic number. In the paper [Bug] Bugeaud proved the following theorem.

Theorem 5.18 Let $\alpha = [a_0, a_1, \ldots]$ be the continued-fraction expansion of an algebraic number of degree at least 3. Then the complexity function $\rho_a$ of the sequence $a = (a_0, a_1, \ldots)$ satisfies
$$\lim_{n\to\infty}\frac{\rho_a(n)}{n} = \infty.$$

The weaker result asserting only that $\rho_a(n) - n \to \infty$ appeared as Theorem 4 in [AB1]. As remarked in [BK], the same conclusion holds for the function $r_U(n)/n$ instead of $\rho_U(n)/n$. We refer the reader to [AB1] and [Bug] for the proofs of the last three theorems, which once again involve the subspace theorem as the crucial Diophantine tool.
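The complexity function in Theorem 5.18 counts the distinct blocks of $n$ consecutive terms in the sequence $(a_0, a_1, \dots)$. The short Python sketch below is an illustration with hypothetical helper names. It counts such blocks in a finite prefix. This can only under-count $\rho_a(n)$, and it gives $\rho_a(n)$ exactly once the prefix contains every block of length $n$. The sketch contrasts two example sequences of our own choosing. The first is the Fibonacci sequence $1, 2, 1, 1, 2, \dots$ (the fixed point of $1 \mapsto 12$, $2 \mapsto 1$), which is Sturmian and so has complexity $n + 1$. The second is a periodic sequence, whose complexity is bounded. For $(0, 1, 2, 1, 1, 2, \dots)$ the complexity is then at most $n + 2$, so $\rho(n)/n$ stays bounded. By Theorem 5.18, $[0, 1, 2, 1, 1, 2, \dots]$ therefore cannot be algebraic of degree at least 3.

```python
def complexity(word, n):
    """Number of distinct blocks of n consecutive letters occurring in a finite word.

    On a prefix of an infinite sequence this is a lower bound for rho(n), with
    equality once the prefix contains every factor of length n.
    """
    return len({tuple(word[i:i + n]) for i in range(len(word) - n + 1)})


def fibonacci_word(length):
    """Prefix of the fixed point of 1 -> 12, 2 -> 1 (a hypothetical example sequence)."""
    w = [1]
    while len(w) < length:
        w = [x for s in w for x in ((1, 2) if s == 1 else (1,))]
    return w[:length]


fib = fibonacci_word(2000)
periodic = [1, 2, 3] * 700

for n in (1, 2, 5, 10, 20):
    # Fibonacci word: n + 1 distinct blocks (Sturmian); periodic word: 3 blocks.
    print(n, complexity(fib, n), complexity(periodic, n))
```

A prefix of length 2000 is far longer than needed for blocks of length at most 20 in either example.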


Here, we shall present the proof of yet another result obtained by the same authors, which concerns palindromic continued fractions. Recall that a word is said to be palindromic if it coincides with its mirror image: $u_1 \cdots u_l$ is palindromic if $u_1 u_2 \cdots u_l = u_l u_{l-1} \cdots u_1$. In general, if $A = u_1 \cdots u_l$, let us denote by $\bar{A}$ the reversed word $\bar{A} = u_l \cdots u_1$. Hence a word is palindromic if and only if it is of the form $A\bar{A}$ or of the form $Au\bar{A}$, where $A$ is an arbitrary word and $u$ an arbitrary letter. Theorem 1 from [AB3] reads as follows.

Theorem 5.19 Let $U = u_1 u_2 \cdots$ be an infinite sequence of positive integers, not ultimately periodic. Suppose that the word $U$ admits infinitely many palindromic prefixes. Then the real number $\alpha := [0, u_1, u_2, \dots]$ is transcendental.

The proof uses the following classical fact.

Lemma 5.20 Let $\alpha = [0, u_1, u_2, \dots]$ be the continued-fraction expansion of a real number, with convergents $p_l/q_l$. Then, for every $l \ge 2$,
$$\frac{q_{l-1}}{q_l} = [0, u_l, u_{l-1}, \dots, u_1]. \qquad (5.5)$$

This lemma is well known and easy: it suffices to argue by induction using the recurrence formula $q_{n+1} = u_{n+1} q_n + q_{n-1}$, which gives $q_{n+1}/q_n = u_{n+1} + q_{n-1}/q_n$.

Proof of Theorem 5.19. Let $\alpha = [0, u_1, u_2, \dots]$ and suppose, by contradiction, that $\alpha$ is algebraic. For $n \ge 1$ put $U_n := u_1 u_2 \cdots u_n$. Let $\mathcal{N}$ be the set of natural numbers $n$ for which the palindromic word $u_1 \cdots u_{n-1} u_n u_{n-1} \cdots u_1 = U_{n-1} u_n \bar{U}_{n-1}$ or the palindromic word $u_1 \cdots u_{n-1} u_n u_n u_{n-1} \cdots u_1 = U_n \bar{U}_n$ is a prefix of $u_1 u_2 \cdots$. We are supposing that this set $\mathcal{N}$ is infinite. For simplicity, we suppose that for infinitely many $n$ the palindrome is of the second form $U_n \bar{U}_n$, hence of even length $2n$. Let $p_l/q_l = [0, u_1, \dots, u_l]$ be the sequence of convergents to $\alpha$. For all $n \in \mathcal{N}$ we have
$$\frac{p_{2n}}{q_{2n}} = [0, U_n \bar{U}_n].$$
Since $U_n \bar{U}_n$ is a palindrome, the above lemma gives $p_{2n}/q_{2n} = q_{2n-1}/q_{2n}$, whence, both fractions being in lowest terms, $q_{2n-1} = p_{2n}$. Then we can write
$$\left|\alpha^2 - \frac{p_{2n-1}}{q_{2n}}\right| = \left|\alpha^2 - \frac{p_{2n}}{q_{2n}} \cdot \frac{p_{2n-1}}{q_{2n-1}}\right| \le \left|\alpha - \frac{p_{2n}}{q_{2n}}\right| \cdot \left|\alpha + \frac{p_{2n-1}}{q_{2n-1}}\right| + \alpha \left|\frac{p_{2n}}{q_{2n}} - \frac{p_{2n-1}}{q_{2n-1}}\right|.$$
Recalling that
$$\left|\alpha - \frac{p_{2n}}{q_{2n}}\right| < \frac{1}{q_{2n}^2}, \qquad \left|\alpha - \frac{p_{2n-1}}{q_{2n-1}}\right| < \frac{1}{q_{2n-1}^2},$$


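and that $0 < \alpha < 1$, $|p_{2n} q_{2n-1} - p_{2n-1} q_{2n}| = 1$, and $q_{2n} \le (u_{2n} + 1) q_{2n-1} = (u_1 + 1) q_{2n-1}$ (as $u_{2n} = u_1$), we can deduce from the above inequality that
$$\left|\alpha^2 - \frac{p_{2n-1}}{q_{2n}}\right| \le 2\left|\alpha - \frac{p_{2n}}{q_{2n}}\right| + \frac{1}{q_{2n} q_{2n-1}} \le \frac{u_1 + 3}{q_{2n}^2}.$$
We then have at our disposal two good rational approximations to $\alpha$ (resp. $\alpha^2$), namely $p_{2n}/q_{2n}$ (resp. $p_{2n-1}/q_{2n}$), having the same denominator. They satisfy
$$\max\left(\left|\alpha - \frac{p_{2n}}{q_{2n}}\right|, \left|\alpha^2 - \frac{p_{2n-1}}{q_{2n}}\right|\right) \le \frac{u_1 + 3}{q_{2n}^2}.$$
We can then apply the subspace theorem to the linear forms in three variables
$$L_1(X_1, X_2, X_3) := \alpha X_3 - X_1, \quad L_2(X_1, X_2, X_3) := \alpha^2 X_3 - X_2, \quad L_3(X_1, X_2, X_3) := X_3,$$
and to the integral points $\mathbf{x} = (x_1, x_2, x_3) = (p_{2n}, p_{2n-1}, q_{2n})$, $n \in \mathcal{N}$. Since $H(\mathbf{x}) = q_{2n}$, the product of the three linear forms at $\mathbf{x}$ is $\ll H(\mathbf{x})^{-1}$. So for infinitely many $n \in \mathcal{N}$ the points $(p_{2n}, p_{2n-1}, q_{2n})$ satisfy a fixed nontrivial linear relation $a p_{2n} + b p_{2n-1} + c q_{2n} = 0$ with $(a, b, c) \in \mathbb{Z}^3 \setminus \{0\}$. Dividing by $q_{2n}$ and letting $n \to \infty$ along these indices gives $a\alpha + b\alpha^2 + c = 0$, so the three numbers $1, \alpha, \alpha^2$ are linearly dependent over $\mathbb{Q}$. This can hold only if $\alpha$ is rational or quadratic over $\mathbb{Q}$. That is excluded by our hypothesis that $U$ is not ultimately periodic, and the proof is complete.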
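Lemma 5.20 and the first steps of the proof above can be tested with exact rational arithmetic. The Python sketch below is only an illustration, with hypothetical helper names. It checks (5.5) on an arbitrary word. For a palindromic prefix $U_n \bar{U}_n$ it also confirms $q_{2n-1} = p_{2n}$ and the estimate $q_{2n}^2 |\alpha^2 - p_{2n-1}/q_{2n}| \le u_1 + 3$, whose constant comes from $2/q_{2n}^2 + (u_1+1)/q_{2n}^2$. As an assumption of the sketch, $\alpha$ is replaced by a rational number whose expansion begins with $U_n \bar{U}_n$ followed by an arbitrary tail. This is all the estimate uses.

```python
from fractions import Fraction


def convergents(a):
    """p[k], q[k] with p[k]/q[k] = [0, a_1, ..., a_k], via p_k = a_k p_{k-1} + p_{k-2}
    and q_k = a_k q_{k-1} + q_{k-2}, from (p_{-1}, q_{-1}) = (1, 0), (p_0, q_0) = (0, 1)."""
    p, q = [1, 0], [0, 1]
    for ak in a:
        p.append(ak * p[-1] + p[-2])
        q.append(ak * q[-1] + q[-2])
    return p[1:], q[1:]


def cf_value(a):
    """Exact value of [0, a_1, ..., a_k] as a Fraction."""
    p, q = convergents(a)
    return Fraction(p[-1], q[-1])


def check_mirror(u):
    """Lemma 5.20: q_{l-1}/q_l == [0, u_l, ..., u_1] for every l >= 2."""
    _, q = convergents(u)
    return all(Fraction(q[l - 1], q[l]) == cf_value(u[:l][::-1])
               for l in range(2, len(u) + 1))


def palindrome_step(U, tail):
    """With alpha = [0, U, reversed(U), tail] and n = len(U), return
    (q_{2n-1} == p_{2n}, q_{2n}^2 |alpha^2 - p_{2n-1}/q_{2n}|, u_1 + 3)."""
    n = len(U)
    word = U + U[::-1] + tail
    p, q = convergents(word)
    alpha = cf_value(word)
    scaled_error = q[2 * n] ** 2 * abs(alpha ** 2 - Fraction(p[2 * n - 1], q[2 * n]))
    return q[2 * n - 1] == p[2 * n], scaled_error, U[0] + 3


print(check_mirror([3, 1, 4, 1, 5, 9, 2, 6]))          # True
same, err, bound = palindrome_step([2, 1, 3, 1], tail=[5, 1, 1, 7])
print(same, err < bound)                                # True True
```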

References

[AbH] D. Abramovich, J. Harris. Abelian varieties and curves in $W_d(C)$. Compositio Math. 78 (2) (1991), 227–238.
[AB1] B. Adamczewski, Y. Bugeaud. On the complexity of algebraic numbers. II. Continued fractions. Acta Math. 195 (2005), 1–20.
[AB2] B. Adamczewski, Y. Bugeaud. On the complexity of algebraic numbers. I. Expansions in integer bases. Ann. Math. 165 (2) (2007), 547–565.
[AB3] B. Adamczewski, Y. Bugeaud. Palindromic continued fractions. Annales Inst. Fourier 57 (2007), 1557–1574.
[ABL] B. Adamczewski, Y. Bugeaud, F. Luca. Sur la complexité des nombres algébriques. C.R. Math. Acad. Sci. Paris 339 (2004), 11–14.
[AllSh] J.-P. Allouche, J. Shallit. Automatic Sequences. Theory, Applications, Generalizations. Cambridge University Press, 2003.
[AZ] F. Amoroso, U. Zannier (eds.). Diophantine approximation. In Proceedings of the C.I.M.E. Conference, Cetraro (Italy), 2000. Lecture Notes in Mathematics 1819. Springer, 2000.
[Aut1] P. Autissier. Géométrie, points entiers et courbes entières. Annales Sci. E.N.S. 42 (2009), 221–239.
[Aut2] P. Autissier. Sur la non-densité des points entiers. Duke Math. J. 158 (2011), 13–27.
[B] A. Baker. Transcendental Number Theory. Cambridge University Press, 1976.
[Bar] A. Baragar. Rational points on K3 surfaces in $\mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1$. Math. Ann. 305 (3) (1996), 541–558.
[Beau] A. Beauville. Surfaces algébriques complexes. Société Mathématique de France, 1978. English translation: Complex Algebraic Surfaces. London Mathematical Society Student Texts. Cambridge University Press, 1996.
[BBM] M. Bennett, Y. Bugeaud, M. Mignotte. Perfect powers with few binary digits and related Diophantine problems. Ann. Scuola Norm. Super. Pisa Cl. Sci. (5) 12 (4) (2013), 941–953.
[Be1] F. Beukers. Diophantine equations and approximation. In [EE] (2004).
[Be2] F. Beukers. The zero multiplicity of ternary recurrences. Compositio Math. 77 (1991), 165–177.
[BeS] F. Beukers, H. P. Schlickewei. The equation $x + y = 1$ in finitely generated groups. Acta Arith. 78 (1996), 189–199.


[Bilu] Yu. Bilu. Effective analysis of integral points on algebraic curves. Israel J. Math. 90 (1995), 235–252.
[Bilu1] Yu. Bilu. The many faces of the Subspace Theorem [after Adamczewski, Bugeaud, Corvaja, Zannier, . . . ]. In Séminaire Bourbaki, 2006/2007. Astérisque No. 317 (2008), Exp. No. 967.
[Bilu2] Yu. Bilu. A note on universal Hilbert sets. J. Reine Angew. Math. 479 (1996), 195–203.
[BiT] Yu. Bilu, R. F. Tichy. The Diophantine equation $f(x) = g(y)$. Acta Arith. 95 (2000), 261–288.
[BKT] F. Bogomolov, M. Korotiaev, Y. Tschinkel. A Torelli theorem for curves over finite fields. Pure Appl. Math. Quarterly 6 (1) (2010), 245–294.
[Bo1] E. Bombieri. Effective Diophantine approximation on $\mathbb{G}_m$. Ann. Scuola Norm. Super. Pisa Cl. Sci. (4) 20 (1993), 61–69.
[Bo2] E. Bombieri. Subvarieties of linear tori and the unit equation. A survey. In Analytic Number Theory, Y. Motohashi (ed.). London Mathematical Society Lecture Notes 247. Cambridge University Press, 1997.
[Bo3] E. Bombieri. On Weil’s “Théorème de décomposition”. Amer. J. Math. 105 (1983), 295–308.
[Bo4] E. Bombieri. The Mordell conjecture revisited. Ann. Scuola Norm. Super. Pisa Cl. Sci. 17 (1990), 615–640.
[Bo5] E. Bombieri. Forty years of effective results in Diophantine theory. In [Wu] (2004).
[BoC] E. Bombieri, P. B. Cohen. An elementary approach to effective Diophantine approximation on $\mathbb{G}_m$. Preprint, 2002.
[BoG] E. Bombieri, W. Gubler. Heights in Diophantine Geometry. New Mathematical Monographs 4. Cambridge University Press, 2006.
[BoMaZ] E. Bombieri, D. Masser, U. Zannier. Intersecting a curve with algebraic subgroups of multiplicative groups. Int. Math. Research Notices 20 (1999), 1119–1140.
[BoMuZ] E. Bombieri, J. Müller, U. Zannier. Equations in one variable over function fields. Acta Arith. 99 (2001), 27–39.
[BoP] E. Bombieri, J. Pila. The number of integral points on arcs and ovals. Duke Math. J. 59 (1989), 337–357.
[BoZ] E. Bombieri, U. Zannier. Algebraic points on subvarieties of $\mathbb{G}_m^n$. Int. Math. Research Notices 7 (1995), 333–347.
[BS] Z. I. Borevitch, I. R. Shafarevitch. Théorie des nombres. Gauthier-Villars, 1967.
[BGS] J. Bourgain, A. Gamburd, P. Sarnak. Markov surfaces and strong approximation I. Preprint, 2016.
[BrMa] D. Brownawell, D. Masser. Vanishing sums in function fields. Math. Proc. Camb. Phil. Soc. 100 (1986), 427–434.
[Bug] Y. Bugeaud. Automatic continued fractions are transcendental or quadratic. Ann. École Norm. Sup. 46 (6) (2013), 1005–1022.
[BuCZ] Y. Bugeaud, P. Corvaja, U. Zannier. An upper bound for the G.C.D. of $a^n - 1$ and $b^n - 1$. Math. Z. 243 (2003), 79–84.
[BK] Y. Bugeaud, Dong Han Kim. A new complexity function, repetitions in Sturmian words and irrationality exponents of Sturmian numbers. Preprint, arXiv:1510.00279v2 [math.NT], 7 Jan 2017.


[BL] Y. Bugeaud, F. Luca. On the period of the continued fraction expansion of $\sqrt{2^{2n+1} + 1}$. Indag. Math. 16 (2015), 21–35.
[BMS] Y. Bugeaud, M. Mignotte, S. Siksek. Classical and modular approaches to exponential Diophantine equations I: Fibonacci and Lucas perfect powers. Ann. Math. 163 (2006), 969–1018.
[C1] J. W. S. Cassels. An Introduction to Diophantine Approximation. Cambridge University Press, 1957.
[C2] J. W. S. Cassels. Rational Quadratic Forms. Academic Press, 1978.
[C-TS] J.-L. Colliot-Thélène, J.-J. Sansuc. Principal homogeneous spaces under flasque tori: applications. J. Algebra 106 (1987), 148–205.
[Co1] P. Corvaja. Rational fixed points for linear group actions. Ann. Scuola Norm. Super. Pisa Cl. Sci. 5 (4) (2007), 561–597.
[Co2] P. Corvaja. Integral Points on Algebraic Varieties: An Introduction to Diophantine Geometry. Hindustan Book Agency, 2016.
[CZ1] P. Corvaja, U. Zannier. Diophantine equations with power sums and universal Hilbert sets. Indag. Math. (N.S.) 9 (3) (1998), 317–332.
[CZ2] P. Corvaja, U. Zannier. On the Diophantine equation $f(a^m, y) = b^n$. Acta Arith. 94 (1) (2000), 25–40.
[CZ3] P. Corvaja, U. Zannier. Finiteness of integral values for the ratio of two linear recurrences. Invent. Math. 149 (2002), 431–451.
[CZ4] P. Corvaja, U. Zannier. A subspace theorem approach to integral points on curves. C.R. Acad. Sci. Paris Série I 334 (2002), 267–271.
[CZ4a] P. Corvaja, U. Zannier. Some new applications of the subspace theorem. Compositio Math. 131 (3) (2002), 319–340.
[CZ5] P. Corvaja, U. Zannier. On the number of integral points on algebraic curves. J. Reine Angew. Math. 565 (2003), 27–42.
[CZ6] P. Corvaja, U. Zannier. On the greatest prime factor of $(ab + 1)(ac + 1)$. Proc. Amer. Math. Soc. 131 (2003), 1705–1709.
[CZ7] P. Corvaja, U. Zannier. On integral points on surfaces. Ann. Math. 160 (2004), 705–726.
[CZ8] P. Corvaja, U. Zannier. On the rational approximation to the powers of an algebraic number: solution of two problems of Mahler and Mendès France. Acta Math. 193 (2004), 175–191.
[CZ9] P. Corvaja, U. Zannier. On a general Thue’s equation. Amer. J. Math. 126 (2004), 1033–1055; Addendum, ibid. 128 (2006), 1057–1066.
[CZ10] P. Corvaja, U. Zannier. A lower bound for the height of a rational function at S-unit points. Monats. Math. 144 (2004), 203–224.
[CZ11] P. Corvaja, U. Zannier. On the length of the continued fraction for values of quotients of power sums. J. Théorie Nombres Bordeaux 17 (2005), 737–747.
[CZ12] P. Corvaja, U. Zannier. On integral points on certain surfaces. Int. Math. Research Notices (2006), 1–20.
[CZ13] P. Corvaja, U. Zannier. Some cases of Vojta’s conjecture for integral points over function fields. J. Alg. Geom. 17 (2008), 295–333. Addendum in Asian J. Math. 14 (2010), 581–584.
[CZ14] P. Corvaja, U. Zannier. On the greatest prime factor of Markov pairs. Rendiconti Sem. Mat. Univ. Padova 116 (2006), 253–260.


[CZ15] P. Corvaja, U. Zannier. Integral points, divisibility between values of polynomials and entire curves on algebraic surfaces. Adv. Math. 225 (2010), 1095–1118.
[CZ16] P. Corvaja, U. Zannier. Finiteness of odd perfect powers with four nonzero binary digits. Ann. Inst. Fourier (Grenoble) 63 (2) (2013), 715–731.
[CZ16a] P. Corvaja, U. Zannier. Greatest common divisors of $u - 1$, $v - 1$ in positive characteristic and rational points on curves over finite fields. J. Eur. Math. Soc. 15 (2013), 1927–1942.
[CZ17] P. Corvaja, U. Zannier. Algebraic hyperbolicity of ramified covers of $\mathbb{G}_m^2$ (and integral points on affine subsets of $\mathbb{P}^2$). J. Diff. Geometry 93 (2013), 355–377.
[CZ18] P. Corvaja, U. Zannier. On the Hilbert property and the fundamental group of algebraic varieties. Math. Z. 286 (2017), 579–602.
[CoNo] P. Corvaja, J. Noguchi. A new unicity theorem and Erdős’ problem for polarized semi-abelian varieties. Math. Annalen 353 (2012), 439–464.
[CRZ] P. Corvaja, Z. Rudnick, U. Zannier. A lower bound for periods of matrices. Commun. Math. Phys. 252 (2004), 535–541.
[De] P. Dèbes. On the irreducibility of the polynomials $P(t^m, Y)$. J. Number Theory 42 (1992), 141–157.
[DeZ] P. Dèbes, U. Zannier. Universal Hilbert subsets. Math. Proc. Camb. Phil. Soc. 124 (1998), 127–134.
[Dem] J. Demeio. Non-rational varieties with the Hilbert property. Preprint, 2017.
[Der] H. Derksen. A Skolem–Mahler–Lech theorem in positive characteristic and finite automata. Invent. Math. 168 (2007), 67–108.
[DerMas] H. Derksen, D. Masser. Linear equations over multiplicative groups, recurrences and mixing, II. Indag. Math. 26 (2015), 113–136.
[DGS] B. Dwork, G. Gerotto, F. Sullivan. An Introduction to G-functions. Princeton University Press, 1994.
[DR] E. Dubois, G. Rhin. Sur la majoration de formes linéaires à coefficients algébriques réels et p-adiques. C.R. Acad. Sci. Paris 282 (1976), 1211–1214.
[DTZ] R. Dvornicich, S. P. Tung, U. Zannier. On polynomials taking small values at integral arguments, II. Acta Arith. 106 (2) (2003), 115–121.
[EE] B. Edixhoven, J.-H. Evertse (eds.). Diophantine Approximation and Abelian Varieties. Lecture Notes in Mathematics 1566. Springer, 1993.
[E1] J.-H. Evertse. The subspace theorem of W. M. Schmidt. In [EE] (2004).
[E2] J.-H. Evertse. An improvement of the quantitative subspace theorem. Compositio Math. 101 (1996), 225–311.
[E3] J.-H. Evertse. On sums of S-units and linear recurrences. Compositio Math. 53 (1984), 225–244.
[E4] J.-H. Evertse. Points on subvarieties of tori. In [Wu] (2004).
[EF1] J.-H. Evertse, R. Ferretti. Diophantine inequalities on projective varieties. Int. Math. Research Notices 25 (2002), 1295–1330.
[EF2] J.-H. Evertse, R. G. Ferretti. A generalization of the subspace theorem with polynomials of higher degree. In Diophantine Approximation. Developments in Mathematics 16. Springer, 2008.
[EG] J.-H. Evertse, K. Győry. Unit Equations in Diophantine Number Theory. Cambridge University Press, 2015.
[ES] J.-H. Evertse, H. P. Schlickewei. A quantitative version of the absolute subspace theorem. J. Reine Angew. Math. 548 (2002), 21–127.


[ESS] J.-H. Evertse, H. P. Schlickewei, W. M. Schmidt. Linear equations in variables which lie in a multiplicative group. Ann. Math. 155 (2002), 807–836.
[Fa] G. Faltings. Diophantine approximation on abelian varieties. Ann. Math. 133 (1991), 549–576.
[FaWu] G. Faltings, G. Wüstholz. Diophantine approximations on projective spaces. Invent. Math. 116 (1994), 109–138.
[FM] S. Ferenczi, C. Mauduit. Transcendence of numbers with a low complexity expansion. J. Number Theory 67 (1997), 146–161.
[FeZ] A. Ferretti, U. Zannier. Equations in the Hadamard ring of rational functions. Ann. Scuola Norm. Sup. Pisa Cl. Sci. 6 (2007), 457–475.
[Fo] O. Forster. Lectures on Riemann Surfaces. Graduate Texts in Mathematics 81. Springer, 1981.
[FZ] C. Fuchs, U. Zannier. On Some Applications of Diophantine Approximations (a translation of Carl Ludwig Siegel’s Über einige Anwendungen diophantischer Approximationen). Edizioni della Normale, 2014.
[GMR] A. Gamburd, M. Magee, R. Ronan. An asymptotic for integer points on Markoff–Hurwitz surfaces. arXiv:1603.06267 [math.NT] (2017).
[Ga] C. Gasbarri. Dyson’s theorem for curves. J. Number Theory 129 (1) (2009), 36–58.
[Ge] A. O. Gelfond. Transcendental and Algebraic Numbers. Dover, 1970.
[GL] A. O. Gelfond, Y. Linnik. Méthodes élémentaires dans la théorie analytique des nombres. Gauthier-Villars, 1965.
[GhS] A. Ghosh, P. Sarnak. Integral points on Markoff type cubic surfaces. Preprint, 2017.
[Go] D. Goldfeld. Modular forms, elliptic curves and the ABC-conjecture. In [Wu] (2004).
[GS] A. Grytzuk, A. Schinzel. On Runge’s theorem. Coll. Math. Soc. J. Bolyai 60 (1992), 329–356.
[Gy] K. Győry. Solving Diophantine equations by Baker’s theory. In [Wu] (2004).
[H] R. Hartshorne. Algebraic Geometry. Graduate Texts in Mathematics 52. Springer, 1977.
[HT] B. Hassett, Yu. Tschinkel. Density of integral points on algebraic varieties. In Rational Points on Algebraic Varieties. Progress in Mathematics 199. Birkhäuser, 2001, pp. 169–197.
[H-B] D. R. Heath-Brown. The density of rational points on curves and surfaces. Ann. Math. 155 (2002), 553–595.
[Hilb] D. Hilbert. Über die Irreduzibilität ganzer rationaler Funktionen mit ganzzahligen Koeffizienten. J. Reine Angew. Math. 110 (1892), 104–129.
[HiSi] M. Hindry, J. H. Silverman. Diophantine Geometry. Springer, 2000.
[KMN] A. Kulkarni, N. M. Mavraki, K. D. Nguyen. Algebraic approximations to linear combinations of powers: an extension of results by Mahler and Corvaja–Zannier. Preprint, 2017.
[L1] S. Lang. Algebraic Number Theory. Addison Wesley, 1970.
[L2] S. Lang. Fundamentals of Diophantine Geometry. Springer, 1983.
[L3] S. Lang. Number Theory III. Encyclopedia of Mathematical Sciences 60. Springer, 1991.


[L4] S. Lang. Introduction to Algebraic and Abelian Functions. Graduate Texts in Mathematics 89. Springer, 1982.
[Lau] M. Laurent. Équations exponentielles polynômes et suites récurrentes linéaires. Astérisque 147–148 (1987), 121–139; II, J. Number Theory 31 (1989), 24–53.
[Lei] D. Leitner. Two exponential Diophantine equations. J. Théorie Nombres Bordeaux 23 (2) (2011), 479–487.
[Lev1] A. Levin. Generalizations of Siegel’s and Picard’s theorems. Ann. Math. 170 (2) (2009), 609–655.
[Lev2] A. Levin. One-parameter families of unit equations. Math. Res. Lett. 13 (5–6) (2006), 935–945.
[Lev3] A. Levin. On the Schmidt subspace theorem for algebraic points. Duke Math. J. 163 (15) (2014), 2841–2885.
[Lev4] A. Levin. Integral points of bounded degree on affine curves. Compos. Math. 152 (4) (2016), 754–768.
[Lev5] A. Levin. Greatest common divisors and Vojta’s conjecture for blowups of algebraic tori (2017), to appear in Invent. Math.
[Los] V. Losert. The set of solutions of some equation for linear recurrence sequences. In [SST] (2004).
[LS] F. Luca, I. Shparlinski. On the exponent of the group of points on elliptic curves in extension fields. Int. Math. Research Notices 2005, 1391–1409.
[Mag] C. Magagna. A lower bound for the r-order of a matrix modulo N. Monats. Math. 153 (2008), 59–81.
[Mah] K. Mahler. On the fractional parts of the powers of rational numbers II. Mathematika 4 (1957), 122–124.
[Mas] R. C. Mason. Diophantine Equations over Function Fields. LMS Lecture Notes 96. Cambridge University Press, 1985.
[Mass] D. W. Masser. Heights, transcendence and linear independence on commutative group varieties. In [AZ] (2004).
[MF] M. Mendès France. Sur les fractions continues limitées. Acta Arith. 23 (1973), 207–215.
[Mi1] R. Miles. Synchronization points and associated dynamical invariants. Trans. Amer. Math. Soc. 365 (2013), 5503–5524.
[Mi2] R. Miles. A natural boundary for the dynamical zeta function for commuting group automorphisms. Proc. Amer. Math. Soc. 143 (2015), 2927–2933.
[Mor] J. L. Mordell. Diophantine Equations. Academic Press, 1969.
[Mum] D. Mumford. A remark on Mordell’s conjecture. Amer. J. Math. 87 (1965), 1007–1016.
[Nish] K. Nishioka. Mahler Functions and Transcendence. Lecture Notes in Mathematics 1631. Springer, 1996.
[NW1] J. Noguchi, J. Winkelmann. Holomorphic curves and integral points off divisors. Math. Z. 239 (2002), 593–610.
[NW2] J. Noguchi, J. Winkelmann. Nevanlinna Theory in Several Complex Variables and Diophantine Approximation. Springer, 2014.
[NWY] J. Noguchi, J. Winkelmann, K. Yamanoi. Degeneracy of holomorphic curves into algebraic varieties. J. Math. Pures Appl. 88 (2007), 293–306.
[O] C. D. Olds. Continued Fractions. Random House, 1963.
[PS] G. Pólya, G. Szegő. Problems and Theorems in Analysis II. Springer, 1976.


[vdP1] A. J. van der Poorten. Some facts that should be better known, especially about rational functions. In Number Theory and Applications. Kluwer Academic, 1989, pp. 497–528.
[vdP2] A. J. van der Poorten. Solution de la conjecture de Pisot sur le quotient de Hadamard de deux fractions rationnelles. C.R. Acad. Sci. Paris Série I 306 (1988), 97–102.
[Po] Y. Pourchet. Solution du problème arithmétique du quotient de Hadamard de deux fractions rationnelles. C.R. Acad. Sci. Paris Série A 288 (1979), 1055–1057.
[RoRo] A. Robinson, P. Roquette. On the finiteness theorems of Siegel and Mahler concerning Diophantine equations. J. Number Theory 7 (1975), 121–176.
[R] K. F. Roth. Rational approximations to algebraic numbers. Mathematika 2 (1955), 1–20.
[Ri] D. Ridout. The p-adic generalization of the Thue–Siegel–Roth theorem. Mathematika 5 (1958), 40–48.
[Ru1] Min Ru. A defect relation for holomorphic curves intersecting hypersurfaces. Amer. J. Math. 126 (1) (2004), 215–226.
[Ru2] Min Ru. Holomorphic curves into algebraic varieties. Ann. Math. (2) 169 (1) (2009), 255–267.
[RuV] Min Ru, P. Vojta. Schmidt’s subspace theorem with moving targets. Invent. Math. 127 (1) (1997), 51–65.
[RuW] Min Ru, J. T. Y. Wong. Diophantine approximation with algebraic points of bounded degree. J. Number Theory 81 (1) (2000), 110–119.
[Rum] R. Rumely. Note on van der Poorten’s proof of the Hadamard quotient theorem I, II. In Séminaire de Théorie des nombres de Paris 1986–87. Progress in Mathematics 75. Birkhäuser, 1988, pp. 349–409.
[Sch1] A. Schinzel. Polynomials with Special Regard to Reducibility. Encyclopedia of Mathematics and Its Applications 77. Cambridge University Press, 2000.
[Sch2] A. Schinzel. An improvement of Runge’s theorem on Diophantine equations. Comm. Pontif. Acad. Soc. 20 (1968), 9.
[SchT] A. Schinzel, R. Tijdeman. On the equation $y^m = P(x)$. Acta Arith. 31 (2) (1976), 199–204.
[SST] H.-P. Schlickewei, K. Schmidt, R. F. Tichy (eds.). Diophantine Approximation. Developments in Mathematics 16. Springer, 2008.
[S1] W. M. Schmidt. Approximation to algebraic numbers. L’Ens. Math. 17 (1971), 187–253.
[S2] W. M. Schmidt. Diophantine Approximation. Lecture Notes in Mathematics 785. Springer, 1980.
[S3] W. M. Schmidt. Diophantine Approximations and Diophantine Equations. Lecture Notes in Mathematics 1467. Springer, 1991.
[S4] W. M. Schmidt. Linear recurrence sequences and polynomial–exponential equations. In [AZ] (2004).
[S5] W. M. Schmidt. The zero multiplicity of linear recurrence sequences. Acta Math. 182 (1999), 243–282.
[S6] W. M. Schmidt. Integer points on hypersurfaces. Monats. Math. 102 (1986), 27–58.
[Schn] Th. Schneider. Über die Approximation algebraischer Zahlen. J. Reine Angew. Math. 175 (1936), 182–192.


[Scr] A. Scremin. On the period of the continued fraction for values of the square root of a power sum. Acta Arith. 123 (2006), 297–312.
[Se1] J.-P. Serre. Lectures on the Mordell–Weil Theorem. Vieweg, 1990.
[Se2] J.-P. Serre. Algebraic Groups and Class Fields. Graduate Texts in Mathematics 117. Springer, 1988.
[Se3] J.-P. Serre. Topics in Galois Theory. Jones and Bartlett, 1992.
[ShSt] T. N. Shorey, C. L. Stewart. Pure powers in recurrence sequences and some related Diophantine equations. J. Number Theory 27 (1987), 324–352.
[ShT] T. N. Shorey, R. Tijdeman. Exponential Diophantine Equations. Cambridge University Press, 1986.
[Sie] C. L. Siegel. Über einige Anwendungen diophantischer Approximationen. Abh. Preuß. Akad. Wissen. Phys.-math. Klasse (1929). Reprinted in Ges. Abh. Bd. I, 209–266. Springer, 1966. English translation in [FZ] (2004).
[Sil1] J. Silverman. The Arithmetic of Elliptic Curves. Graduate Texts in Mathematics 106. Springer, 1986.
[Sil2] J. Silverman. Generalized greatest common divisors, divisibility sequences, and Vojta’s conjecture for blowups. Monats. Math. 145 (2005), 333–350.
[Sil3] J. Silverman. Rational points on K3 surfaces: a new canonical height. Invent. Math. 105 (1991), 347–373.
[SilT] J. Silverman, J. Tate. Rational Points on Elliptic Curves. Springer, 1992.
[StT] C. L. Stewart, R. Tijdeman. On the greatest prime factor of $(ab + 1)(ac + 1)(bc + 1)$. Acta Arith. 79 (1997), 93–101.
[Sto] W. W. Stothers. Polynomial identities and Hauptmodulen. Quart. J. Math. Oxford 32 (1981), 349–370.
[S-D] H. P. F. Swinnerton-Dyer. $A^4 + B^4 = C^4 + D^4$ revisited. J. London Math. Soc. 43 (1968), 149–151.
[Tij1] R. Tijdeman. Diophantine approximation and its applications. In [EE] (2004).
[Tij2] R. Tijdeman. Roth’s theorem. In [EE] (2004).
[TrZ] G. Troi, U. Zannier. Note on the density constant in the distribution of self numbers II. Boll. U.M.I. 8 (2B) (1999), 397–399.
[Ve] F. Veneziano. Quadratic integral solutions to double Pell equations. Rend. Sem. Mat. Univ. Padova 126 (2011), 47–61.
[Vo1] P. Vojta. Diophantine Approximations and Value Distribution Theory. Lecture Notes in Mathematics 1239. Springer, 1987.
[Vo2] P. Vojta. Siegel’s theorem in the compact case. Ann. Math. 133 (1991), 509–548.
[Vo3] P. Vojta. A generalization of theorems of Faltings and Thue–Siegel–Roth–Wirsing. J. Amer. Math. Soc. 5 (1992), 763–804.
[Vo4] P. Vojta. Integral points on subvarieties of semiabelian varieties, I. Invent. Math. 126 (1996), 133–181.
[Vo5] P. Vojta. Diophantine approximation and Nevanlinna theory. In Arithmetic Geometry, P. Corvaja, C. Gasbarri (eds.). Lecture Notes in Mathematics 2009. Springer, 2011, pp. 111–224.
[Wa1] M. Waldschmidt. Un demi-siècle de transcendance. In Development of Mathematics 1950–2000. Birkhäuser, 2000, pp. 1121–1186.
[Wa2] M. Waldschmidt. Words and transcendence. In Analytic Number Theory. Cambridge University Press, 2009, pp. 449–470.


[Wan] J. Tzu-Yueh Wang. An effective Roth’s theorem for function fields. Rocky Mountain J. Math. 26 (1996), 1225–1234.
[W] A. Weil. Number Theory: An Approach through History from Hammurapi to Legendre. Birkhäuser, 1983.
[Wi] E. A. Wirsing. On approximations of algebraic numbers by algebraic numbers of bounded degree. In 1969 Number Theory Institute, Stony Brook. Proceedings of Symposia in Pure Mathematics XX. American Mathematical Society, 1971, pp. 213–247.
[Wu] G. Wüstholz (ed.). A Panorama of Number Theory; or The View from Baker’s Garden. Cambridge University Press, 2002.
[Z1] U. Zannier. Some remarks on the S-unit equation in function fields. Acta Arith. LXIV (1993), 87–98.
[Z2] U. Zannier. Fields containing values of algebraic functions and related questions. In Number Theory 1993–94, S. David (ed.). Cambridge University Press, 1996, pp. 199–213.
[Z3] U. Zannier. A local–global principle for norms from cyclic extensions of $\mathbb{Q}(t)$ (a direct, constructive and quantitative approach). L’Enseignement Math. 45 (1999), 357–377.
[Z4] U. Zannier. A proof of Pisot’s $d$th root conjecture. Ann. Math. 151 (2000), 375–383.
[Z5] U. Zannier. Some Applications of Diophantine Approximation to Diophantine Equations (with Special Emphasis on the Schmidt Subspace Theorem). Forum Editrice, 2003.
[Z6] U. Zannier. Lecture Notes on Diophantine Analysis (with an Appendix by F. Amoroso). Edizioni della Normale, 2008.
[Z7] U. Zannier. Hilbert irreducibility above algebraic groups. Duke Math. J. 153 (2010), 397–425.
[Z8] U. Zannier. On the integer solutions of exponential equations in function fields. Ann. Inst. Fourier (Grenoble) 54 (4) (2004), 849–874.

Index

abc conjecture, 42 Abramovich, D., 95 Albanese variety, 84 quasi-, 84 Algebraic group, 35, 64, 125 Alphabet, 176 Automatic number, 182 Automatic sequence, 182 Baker, A., 18, 116 Beukers, F., 28 Big divisor, 72, 117 Bilu, Yu., 116 Bombieri, E., 18, 28 Cartan, H., 45 Cartan’s conjecture, 45 Complexity function, 178 Degenerate modulus, 40 Dirichlet, J. L., 19 Dirichlet’s lemma, 10 Divisor big, 72, 117 nef, 72 Divisor at infinity, 50 dth-root conjecture, 142 Dyson, F., 18 Equation norm-form, 39 Pell’s, 9, 12 S-unit, 32 Thue, 23 Equidistribution principle, 11 Euclid algorithm, 4 Evertse, J.-H., 28, 32 Exponential polynomial, 120

Faltings, G., 14, 70 Fibonacci sequence, 119, 153 Finite automata, 181 Gelfond, A. O., 18 Generating function, 119 Genus, 60 Harris, J., 95 Hasse principle, 14 Heath-Brown, D. R., 118 Height, 19 Hermite, C., 18 Integral points, 14, 37, 49 Jacobian variety, 64 Kronecker, L., 21 Lang’s conjecture, 70 Laurent, M., 70 Linear recurrence, 119 Liouville, J., 14 Liouville numbers, 172 Logarithmic singularities, 84 Losert’s equation, 170 Mahler’s theorem, 154 Markov number, 157 Markov triples, 157 Markov’s equation, 157 Markov’s surface, 159 Mordell conjecture, 14 Multiplicative algebraic group, 35 Nef divisor, 72 Non-degenerate solution, 32, 33 Normal number, 179 Northcott’s theorem, 21 Order of a recurrence, 119

Palindromic, 186 Pell’s equation, 9, 12 Pila, J., 118 Pisot number, 154 Pisot, C., 126 Place, 19 Pourchet, Y., 126 Power sum, 120 Product formula, 19 Quasi-S-integral, 49 Quasi-integral, 49 Recurrence linear, 119, 120 non-degenerate, 120 order, 119 roots of, 120 simple, 120 Repetition long, 176 Ridout’s theorem, 22 Ritt, J. F., 167 Roth, K. F., 14, 19 Rumely, R., 25 Runge’s method, 69 S-integer, 20 S-unit, 20 equation, 25, 32 Schlickewei, H.-P., 28, 30, 32 Schmidt, W. M., 29 Schmidt’s subspace theorem, 30 Siegel, C. L., 19, 26, 60 Skolem–Mahler–Lech theorem, 43 Sturmian word, 180

Theorem Chevalley–Weil, 53, 115 Hilbert’s irreducibility theorem, 95 Lang, 22 Laurent, 36 Lüroth, 61 Mahler, 22 Mordell–Weil, 64 Northcott, 21 Ridout, 22 Roth, 22 Roth generalized, 23 Schmidt, 29 Siegel, 60 Siegel, generalized, 62 Skolem–Mahler–Lech, 125 subspace I, 30 subspace II, 31 subspace III, 31 Thue, 22 weak Mordell–Weil, 55, 64 Thue, A., 14, 19 Tijdeman, R., 3 Transcendental, 26 Twisted form of a curve, 56 Universal Hilbert set, 169 Upper growth rate of periodic points, 139 Valuation, 19 van der Poorten, A. J., 32 Vojta, P., 32, 70 Vojta’s conjecture, 117, 141, 152, 158, 161, 166, 168 Weil, A., 21