Nonstandard Methods and Applications in Mathematics (ISBN 1316755762, 9781316755761)


English. 259 pages. Year: 2017.



Nonstandard Methods and Applications in Mathematics

Since their inception, the Perspectives in Logic and Lecture Notes in Logic series have published seminal works by leading logicians. Many of the original books in the series have been unavailable for years, but they are now in print once again. This volume, the 25th publication in the Lecture Notes in Logic series, grew from a conference on Nonstandard Methods and Applications in Mathematics held in Pisa, Italy, from 12–16 June 2002. It contains ten peer-reviewed papers that aim to provide something more timely than a textbook, but less ephemeral than a conventional proceedings. Nonstandard analysis is one of the great achievements of modern applied mathematical logic. These articles consider the foundations of the subject, as well as its applications to pure and applied mathematics, and mathematics education.

Nigel J. Cutland is a Professor of Mathematics at the University of York, where he researches logic and foundations of mathematics, stochastic analysis, and nonstandard analysis and its applications. Mauro Di Nasso is a professor at the University of Pisa, where he researches model theory and nonstandard analysis. David A. Ross is a professor at the University of Hawaii, Manoa, where he researches nonstandard analysis and probability theory.

LECTURE NOTES IN LOGIC

A Publication of The Association for Symbolic Logic

This series serves researchers, teachers, and students in the field of symbolic logic, broadly interpreted. The aim of the series is to bring publications to the logic community with the least possible delay and to provide rapid dissemination of the latest research. Scientific quality is the overriding criterion by which submissions are evaluated.

Editorial Board
Jeremy Avigad, Department of Philosophy, Carnegie Mellon University
Zoe Chatzidakis, DMA, Ecole Normale Supérieure, Paris
Peter Cholak, Managing Editor, Department of Mathematics, University of Notre Dame, Indiana
Volker Halbach, New College, University of Oxford
H. Dugald Macpherson, School of Mathematics, University of Leeds
Slawomir Solecki, Department of Mathematics, University of Illinois at Urbana–Champaign
Thomas Wilke, Institut für Informatik, Christian-Albrechts-Universität zu Kiel

More information, including a list of the books in the series, can be found at http://www.aslonline.org/books-lnl.html

LECTURE NOTES IN LOGIC 25

Nonstandard Methods and Applications in Mathematics

Edited by

NIGEL J. CUTLAND University of York

MAURO DI NASSO University of Pisa

DAVID A. ROSS University of Hawaii, Manoa

Association for Symbolic Logic

University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
4843/24, 2nd Floor, Ansari Road, Daryaganj, Delhi – 110002, India
79 Anson Road, #06–04/06, Singapore 079906

Cambridge University Press is part of the University of Cambridge. It furthers the University’s mission by disseminating knowledge in the pursuit of education, learning, and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9781316755761
10.1017/9781316755761

First edition © 2006 Association for Symbolic Logic under license to A.K. Peters, Ltd.
This edition © 2016 Association for Symbolic Logic under license to Cambridge University Press.

Association for Symbolic Logic
Richard A. Shore, Publisher
Department of Mathematics, Cornell University, Ithaca, NY 14853
http://www.aslonline.org

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

A catalogue record for this publication is available from the British Library.

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party Internet Web sites referred to in this publication and does not guarantee that any content on such Web sites is, or will remain, accurate or appropriate.

TABLE OF CONTENTS

Preface

FOUNDATIONS
Vieri Benci, Mauro Di Nasso, and Marco Forti: The eightfold path to nonstandard analysis
Sergio Fajardo and H. Jerome Keisler: Neoclosed forcing
Karel Hrbacek: Nonstandard objects in set theory

PURE MATHEMATICS
Peter A. Loeb: The microscopic behavior of measurable functions
David A. Ross: Nonstandard measure constructions — solutions and problems
Renling Jin: Inverse problem for upper asymptotic density II
Angus Macintyre: Nonstandard analysis and cohomology

APPLIED MATHEMATICS
Nigel J. Cutland: Loeb space methods for stochastic Navier-Stokes equations
Manfred P. H. Wolff: Discrete approximation of compact operators and approximation of their spectra

TEACHING
Richard O’Donovan and John Kimber: Nonstandard analysis at pre-university level: Naive magnitude analysis

PREFACE

Nonstandard analysis is one of the great achievements of modern applied mathematical logic. In addition to the important philosophical achievement of providing a sound mathematical basis for using infinitesimals in analysis, the methodology is now well established as a tool for both research and teaching, and has become a fruitful field of investigation in its own right. It has been used to discover and prove significant new standard theorems in such diverse areas as probability theory and stochastic analysis, functional analysis, fluid mechanics, dynamical systems and control theory, and recently there have been some striking and unexpected applications to additive number theory.

A conference on Nonstandard Methods and Applications in Mathematics (NS2002) was held in Pisa, Italy, from June 12–16, 2002. This was originally planned as a special section in the very successful first joint meeting of the American Mathematical Society and the Unione Matematica Italiana. In order to accommodate the large number of mathematicians interested in the field, a satellite conference, hosted by the Università di Pisa and held at the Domus Galilaeana, was added during the days preceding the main AMS/UMI meeting. A complete list of the registered participants appears later in this preface.

This volume is a byproduct of NS2002. Not a proceedings per se, it is a collection of peer-reviewed papers solicited from some of the participants with the aim of providing something more timely than a textbook, but less ephemeral than a conventional proceedings. To that end, the volume contains both survey papers on topics for which other surveys are either dated or nonexistent, and research articles on applications too recent to have received attention in older volumes. One of the included papers, on an infinitesimal approach to calculus, deserves special mention.
The use of infinitesimals in the teaching of calculus is of course not at all new, though they began to disappear from textbooks late in the 19th century due to concerns about their theoretical underpinnings. (Even today most instructors use infinitesimals in teaching applications, such as volumes of rotation, as they are more natural and compelling than Riemann sums in this context.) Any foundational concerns were of course completely dispelled by Abraham Robinson’s work, and at least two calculus textbooks and several introductory analysis texts using infinitesimals have since appeared. By beginning the course with some basic rules for working in an extension of the real number system, such books make it possible to offer completely correct proofs to beginning students, proofs which better encapsulate mathematical intuition than do more conventional arguments.

A few months prior to NS2002, the organizers learned that an infinitesimal approach to calculus was being adopted by some high school teachers in Geneva, Switzerland. This was the first attempt we had heard of to use a modern infinitesimal approach at the high school level. Curious about the effort, which appeared to be independent of (and different from) the earlier approaches of Keisler et al., we asked one of the course’s designers, Richard O’Donovan, to come to our meeting and report on their work. The paper here, “Nonstandard analysis at pre-university level: Naive magnitude analysis” by O’Donovan and his colleague John Kimber, describes their approach.

We are grateful to the Istituto Nazionale di Alta Matematica, Gruppo Nazionale per le Strutture Algebriche, Geometriche e le loro Applicazioni (INdAM-GNSAGA), and to the University of Pisa Interdepartmental Center for the Study of Complex Systems (CISSC), for the financial support which made NS2002 possible. We are also grateful to the Domus Galilaeana of Pisa for hosting part of the congress. Thanks also to the ASL, in particular to C. Ward Henson and to Steffen Lempp, for their assistance at all stages of producing this volume.

The program comprised a total of thirty-three talks, including the following invited lectures:
N.J. Cutland (Hull, UK): Nonstandard techniques in stochastic fluid dynamics
H. Osswald (München, Germany): Malliavin calculus on Banach space valued continuous functions
F. Diener (Nice, France): Nonstandard tree model for financial mathematics: Beyond the continuous Black-Scholes approximation for vanilla and barrier options
M. Wolff (Tübingen, Germany): Discrete approximation of spaces and operators
V. Benci (Pisa, Italia): Numerosities of labelled sets: A new way of counting
T. Nakamura (Tsuda, Japan): Construction of a path-space measure for the Ornstein-Uhlenbeck process by infinitesimal random walks
A. Macintyre (London, UK): Ultraproducts of cohomology theories
J.L. Bell (Western Ontario, Canada): Real lines in smooth infinitesimal analysis
H.J. Keisler (Wisconsin, USA): Products of Loeb spaces
K. Hrbacek (CUNY, USA): Nonstandard set theory
P.A. Loeb (Illinois, USA): Base operators in analysis and a generalization of monads
R. Jin (Charleston, USA): Nonstandard analysis and density problems: Introduction and recent developments
D.A. Ross (Hawaii, USA): Nonstandard measure constructions: Examples and problems
S. Albeverio (Bonn, Germany): No title

The following is a complete list of registered participants in NS2002: Eva Aigner, Petr Andreyev, David Ballard, Stefano Baratella, John Bell, Vieri Benci, Eric Benoit, Alessandro Berarducci, Josef Berger, Ouahiba Cherikh, Nigel Cutland, Francine Diener, Mauro Di Nasso, Antonino Drago, Ruggero Ferro, Marco Forti, Eberhard Gerlach, Guido Gherardi, Paolo Giordano, Karel Hrbacek, Chris Impens, Renling Jin, Vladimir Kanovei, Jerome Keisler, Giacomo Lenzi, Steven Leth, Peter Loeb, Angus Macintyre, Natalia Martins, Vladimir Molchanov, Mojtaba Moniri, Toru Nakamura, Vitor Neves, Siu-Ah Ng, Richard O’Donovan, Horst Osswald, Yves Peraire, Hans Ploss, Emiliano Rago, Giuseppe Randazzo, Hermann Render, Sergio Rodrigues, David Ross, Peter Schuster, Joao Teixeira, Hans Vernaeve, Guy Wallet, Manfred Wolff, Beate Zimmer.

The organizers note with sadness the death in May 2004 of our friend and colleague David Ballard. David was an invited participant in NS2002, and his work in the foundations of nonstandard set theory was intriguing and highly original.

The Editors
Nigel J. Cutland, Hull
Mauro Di Nasso, Pisa
David A. Ross, Honolulu

FOUNDATIONS

THE EIGHTFOLD PATH TO NONSTANDARD ANALYSIS

VIERI BENCI, MARCO FORTI, AND MAURO DI NASSO

Abstract. This paper consists of a quick introduction to the “hyper-methods” of nonstandard analysis, and of a review of eight different approaches to the subject, which have been recently elaborated by the authors.

Those who follow the noble Eightfold Path are freed from the suffering and are led ultimately to Enlightenment. (Gautama Buddha)

Introduction. Since the original works [39, 40] by Abraham Robinson, many different presentations of the methods of nonstandard analysis have been proposed over the last forty years. The task of combining rigorous theoretical foundations with an easily accessible exposition in a satisfactory manner soon proved very difficult to accomplish. The first pioneering work in this direction was W.A.J. Luxemburg’s lecture notes [36]. Based on a direct use of the ultrapower construction, those notes were very popular in the “nonstandard” community in the sixties. Robinson himself also contributed to this simplification by reformulating his initial type-theoretic approach in a more familiar set-theoretic framework. Precisely, in his joint work with E. Zakon [42], he introduced the superstructure approach, by now the most widely used foundational framework. To the authors’ knowledge, the first relevant contribution aimed at making the “hyper-methods” available even at a freshman level is Keisler’s book [33], which is a college textbook for a first course of elementary calculus. There, the principles of nonstandard analysis are presented axiomatically in a nice and elementary form (see the accompanying book [32] for the foundational aspects). Among the more recent works, there are the “gentle” introduction by W.C. Henson [26], R. Goldblatt’s lectures on the hyperreals [25], and K.D. Stroyan’s textbook [44].

2000 Mathematics Subject Classification. 26E35 Nonstandard analysis; 03E65 Other hypotheses and axioms.

During the preparation of this paper the authors were supported by MIUR PRIN grants “Metodi variazionali e topologici nello studio di fenomeni non lineari” and “Metodi logici nello studio di strutture geometriche, topologiche e insiemistiche”.

Nonstandard Methods and Applications in Mathematics. Edited by N. J. Cutland, M. Di Nasso, and D. A. Ross. Lecture Notes in Logic, 25. © 2006, Association for Symbolic Logic


Recently the authors investigated several different frameworks in algebra, topology, and set theory that turn out to incorporate, explicitly or implicitly, the “hyper-methods”. These approaches show that nonstandard extensions arise naturally in several quite different contexts of mathematics. An interesting phenomenon is that some of those approaches lead in a straightforward manner to ultrafilter properties that are independent of the axioms of Zermelo-Fraenkel set theory ZFC.

Contents. This article is divided into two parts. The first part consists of an introduction to the hyper-methods of nonstandard analysis, while the second one is an overview of eight different approaches to the subject recently elaborated by the authors. Most proofs are omitted, but precise references are given where the interested reader can find all details.

Part I contains two sections. Section 1, the longest, is a soft introduction to the basics of nonstandard analysis, and will be used as a reference for the remaining sections of this article. The three fundamental “hyper-tools” are presented, namely the star-map, the transfer principle, and the saturation property, and several examples are given to illustrate their use in practice. The material is intentionally presented in an elementary (and sometimes semi-formal) manner, so that it may also serve as a quick presentation of nonstandard analysis for newcomers. Section 2 is focused on the connections between the hyper-extensions of nonstandard analysis and ultrapowers. In particular, a useful characterization of the models of hyper-methods is presented in purely algebraic terms, by means of limit ultrapowers.

Each of the eight Sections 3–10 in Part II presents a different possible “path” to nonstandard analysis.
The resulting eight approaches, although not strictly equivalent to each other, are all suitable for practice, in that each of them explicitly or implicitly incorporates the fundamental “hyper-tools” introduced in Section 1. Section 3 is about a modified version of the so-called superstructure approach, where a single superstructure is considered both as the standard and the nonstandard universe (see [3]). In Section 4, we present the purely algebraic approach introduced in [6, 7], which is based on the existence of a “special” ring homomorphism. Starting from such a homomorphism, we define in a direct manner a superstructure model of the hyper-methods, as defined in Section 3. In Section 5, we present the axiomatic theory ∗ZFC of [17], which can be seen as an extension of the superstructure approach to the full generality of set theory. Section 6 is dedicated to the so-called Alpha Theory, an axiomatic presentation that postulates five elementary properties for an “ideal” (infinite) natural number α (see [4]). These axioms suffice for defining a star-map on the universal class of all mathematical objects. Section 7 deals with topological extensions, a sort of “topological completions” of a given set X, introduced and studied in [9, 18]. These structures
are topological spaces ∗X where any function f : X → X has a continuous ∗-extension ∗f : ∗X → ∗X, and where the ∗-extension ∗A of a subset A ⊆ X is simply its closure in ∗X. Hyper-extensions of nonstandard analysis, endowed with a natural topology, are characterized as those topological extensions that satisfy two simple additional properties. Moreover, several important features of nonstandard extensions, such as the enlarging and saturation properties, can be naturally described in this topological framework. Section 8, following [24], further simplifies the topological approach of the preceding section. By assuming that the ∗-extensions of unary functions satisfy three simple “preservation properties” of a purely functional nature, one obtains all possible hyper-extensions of nonstandard analysis. Section 9 deals with natural ring structures that can be given to suitable subspaces of βZ, the Stone-Čech compactification of the integers Z (see [19]). Such rings turn out to be sets of hyperintegers with special properties that are independent of ZFC. In the final Section 10, we consider a new way of counting that has been proposed in [5] and which maintains the ancient principle that “the whole is larger than its parts”. This counting procedure is suitable for those countable sets whose elements are “labelled” by natural numbers. We postulate that this procedure satisfies three natural “axioms of compatibility” with respect to inclusion, disjoint union, and Cartesian product. As a consequence, sums and products of numerosities can be defined, and the resulting semi-ring of numerosities becomes a special set of hypernatural numbers, whose existence is independent of ZFC.

Disclaimer. The approaches presented here have by no means been chosen because they are better than others, or because they provide an exhaustive picture of this field of research.
This article simply surveys the authors’ contributions to the subject over the last decade. In particular, throughout the paper we stick to the so-called external viewpoint of nonstandard methods, based on the existence of a star-map ∗ providing a hyper-extension ∗A for each standard object A. This is to be contrasted with the internal approach of Nelson’s IST [37], and other related nonstandard set theories where the standard predicate st is used in place of the star-map (cf. e.g. the recent book [30]; see also Hrbacek’s article in this volume). Extensive treatments of nonstandard analysis based on the internal approach are given e.g. in the books [21, 22, 38].
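The counting procedure of Section 10 is described above only through its three compatibility axioms. As a rough finite-stage picture, which is an assumption of this sketch and not the construction of [5], one can attach to a labelled set A its counting function n ↦ #{a ∈ A : label(a) ≤ n}, finite whenever each label is used only finitely often. The hypothetical Python helpers below (`LabelledSet`, `disjoint_union`, `cartesian_product`) check on examples that these counting functions add under disjoint union, multiply under Cartesian product (assuming a pair is labelled by the larger of its two labels), and that a proper subset such as the even numbers is strictly smaller from some stage on. Turning counting functions into numerosities needs more than this finite picture, so the sketch says nothing about the hypernatural values themselves.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LabelledSet:
    """Hypothetical finite-stage view of a labelled set.

    stage(n) returns the finitely many elements whose label is <= n
    (finitely many because each label is assumed to be used finitely often).
    """
    stage: Callable[[int], List]

    def count(self, n: int) -> int:
        # Counting function: how many elements carry a label <= n.
        return len(self.stage(n))


def disjoint_union(A: LabelledSet, B: LabelledSet) -> LabelledSet:
    # Tag elements with 0/1 so the union is disjoint; labels are inherited.
    return LabelledSet(lambda n: [(0, a) for a in A.stage(n)] + [(1, b) for b in B.stage(n)])


def cartesian_product(A: LabelledSet, B: LabelledSet) -> LabelledSet:
    # Assumption: label(a, b) = max(label(a), label(b)), so the pairs with
    # label <= n are exactly stage_A(n) x stage_B(n).
    return LabelledSet(lambda n: [(a, b) for a in A.stage(n) for b in B.stage(n)])


# N labelled by the identity, and the even numbers with the inherited labels.
naturals = LabelledSet(lambda n: list(range(n + 1)))
evens = LabelledSet(lambda n: list(range(0, n + 1, 2)))

if __name__ == "__main__":
    print(naturals.count(10), evens.count(10))            # 11 6
    print(disjoint_union(naturals, evens).count(10))       # 17 = 11 + 6
    print(cartesian_product(naturals, evens).count(10))    # 66 = 11 * 6
    # "The whole is larger than its parts": strictly smaller from stage 1 on.
    print(all(evens.count(n) < naturals.count(n) for n in range(1, 100)))  # True
```

At stage 0 both sets contain only the number 0, so the strict inequality only appears from stage 1 onward, which is why the check starts there.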

Part I – The “Hyper-methods”

§1. What are the “hyper-methods”? Roughly, nonstandard analysis consists of two fundamental tools: the star-map ∗ and the transfer principle. In most applications, a third fundamental tool is also considered, namely the saturation property.


There are several different frameworks where the methods of nonstandard analysis (the “hyper-methods”) can be presented. The goal of this section is to introduce the basic notions in such a way that their formulations do not depend on the specific approach that one is adopting. Of course, there is a price we have to pay to reach this generality. Sometimes, the definitions as given here are not entirely formalized (at least from the point of view of a logician). However, we are confident that they are sufficiently clear and unambiguous that some “practitioners” may already find them suitable. To reassure the suspicious reader, we anticipate that each of the eight Sections 3–10 presents a specific approach where all notions presented here are given rigorous foundations. Besides the fundamental tools and definitions, this section also contains the definition of internal element, sketches of proofs of the first consequences of the definitions, as well as a number of relevant examples. It is not a complete introduction (e.g. overspill and hyperfinite sets are not treated), but it may be used as a first reading for beginners interested in nonstandard analysis.

1.1. The basic definitions. In order to correctly formulate the fundamental tools of hyper-methods, we need the following definition.

Definition 1.1. A universe U is a nonempty collection of “mathematical objects” that is closed under subsets (i.e. a ⊆ A ∈ U ⇒ a ∈ U) and closed under the basic mathematical operations. Precisely, whenever A, B ∈ U, we require that also the union A ∪ B, the intersection A ∩ B, the set-difference A \ B, the ordered pair (A, B), the Cartesian product A × B, the powerset P(A) = {a | a ⊆ A}, and the function-set B^A = {f | f : A → B} all belong to U.¹ A universe U is also assumed to contain (copies of) all sets of numbers N, Z, Q, R, C ∈ U, and to be transitive, i.e. members of members of U belong to U (in formulæ: a ∈ A ∈ U ⇒ a ∈ U).
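To make Definition 1.1 concrete on a very small scale, here is a minimal Python sketch of the ZFC encodings recalled in footnote 2 (Kuratowski pairs, von Neumann naturals), restricted to hereditarily finite sets modelled as nested `frozenset`s. The function names are illustrative only. A genuine universe must also contain N, R and so on, so this toy only shows what ordered pairs, powersets, function-sets and transitivity look like once everything is reduced to sets.

```python
from itertools import combinations, product


def von_neumann(n: int) -> frozenset:
    """Von Neumann natural: 0 = ∅ and n + 1 = n ∪ {n}, so n = {0, ..., n-1}."""
    s: frozenset = frozenset()
    for _ in range(n):
        s = s | frozenset({s})
    return s


def kuratowski_pair(a, b) -> frozenset:
    """Ordered pair (a, b) encoded as {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def powerset(A: frozenset) -> frozenset:
    """P(A) = {a | a ⊆ A}, as a frozenset of frozensets."""
    items = list(A)
    return frozenset(
        frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)
    )


def function_set(A: frozenset, B: frozenset) -> frozenset:
    """B^A: every f : A -> B, each encoded by its graph {(a, f(a)) | a ∈ A}."""
    domain = list(A)
    return frozenset(
        frozenset(kuratowski_pair(a, b) for a, b in zip(domain, values))
        for values in product(list(B), repeat=len(domain))
    )


def is_transitive(U: frozenset) -> bool:
    """Transitivity: a ∈ A ∈ U implies a ∈ U, i.e. every member is a subset."""
    return all(isinstance(x, frozenset) and x <= U for x in U)


if __name__ == "__main__":
    three = von_neumann(3)
    print(len(three), len(powerset(three)))                      # 3 8
    print(len(function_set(von_neumann(2), three)))              # 9 = 3 ** 2
    print(is_transitive(three), is_transitive(powerset(three)))  # True True
```

Note that the powerset of a transitive set is again transitive, which is one small instance of why closure under P fits well with the transitivity requirement.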
The notion of “mathematical object” includes all objects used in the ordinary practice of mathematics, namely: numbers, sets, functions, relations, ordered tuples, Cartesian products, etc. It is well known that all these notions can be defined as sets and formalized in the foundational framework of Zermelo-Fraenkel axiomatic set theory ZFC.² For the sake of simplicity, here we consider them as primitive concepts not necessarily reduced to sets.

¹ Clearly, here we implicitly assume that A and B are sets; otherwise these operations do not make sense. The only exception is the ordered pair, which makes sense for all mathematical objects A and B.

² E.g. in ZFC, an ordered pair (a, b) is defined as the Kuratowski pair {{a}, {a, b}}; an n-tuple is inductively defined by (a₁, …, aₙ, aₙ₊₁) = ((a₁, …, aₙ), aₙ₊₁); an n-place relation R on A is identified with the set R ⊆ Aⁿ of n-tuples that satisfy it; a function f : A → B is identified with its graph {(a, b) ∈ A × B | b = f(a)}; and so forth. As for numbers, complex numbers C = R × R/≈ are defined as equivalence classes of ordered pairs of real numbers, and the real numbers R are defined as equivalence classes of suitable sets of rational numbers (namely, Dedekind cuts or Cauchy sequences). The rational numbers Q are a suitable quotient Z × Z/≈, and the integers Z are in turn a suitable quotient N × N/≈. The natural numbers of ZFC are defined as the set ω of von Neumann naturals: 0 = ∅ and n + 1 = n ∪ {n} (so that each natural number n = {0, 1, …, n − 1} is identified with the set of its predecessors). We remark that these definitions are almost compulsory in order to obtain a set-theoretic reductionist foundation, but they are certainly not needed in the ordinary development of analysis.

Hyper-Tool # 1: STAR-MAP. The star-map is a function ∗ : U → V between two universes that associates to each object A ∈ U its hyper-extension (or nonstandard extension) ∗A ∈ V. It is assumed that ∗n = n for all natural numbers n ∈ N, and that the properness condition ∗N ≠ N holds. It is customary to call standard any object A ∈ U in the domain of the star-map, and nonstandard any object B ∈ V in the codomain. The adjective standard is also often used in the literature for hyper-extensions ∗A ∈ V. We remark right away that one could directly consider a single universe U = V. Doing so, the traditional distinction between standard and nonstandard objects is overcome.³ We point out that in all approaches that have appeared in the literature, the standard universe is taken to be large enough to include all mathematical objects under consideration.

³ This matter will be discussed in Section 3 (see Definition 3.3) and Section 5.

We are now ready to introduce the second powerful tool of nonstandard methods. It states that the star-map preserves a large class of properties.

Hyper-Tool # 2: TRANSFER PRINCIPLE. Let P(a₁, …, aₙ) be a property of the standard objects a₁, …, aₙ expressed as an “elementary sentence”. Then P(a₁, …, aₙ) is true if and only if the same sentence is true about the corresponding hyper-extensions ∗a₁, …, ∗aₙ. That is:

P(a₁, …, aₙ) ⇔ P(∗a₁, …, ∗aₙ)

The transfer principle (also known as the Leibniz principle) is given a rigorous formulation by using the formalism of mathematical logic and, in particular, by appealing to the notion of bounded quantifier formula in the first-order language of set theory. Here we only give a semi-formal definition, and refer the reader to §4.4 of [12] for a rigorous treatment.

Definition 1.2. We say that a property P(x₁, …, xₙ) of the objects x₁, …, xₙ is expressed as an elementary sentence if the following two conditions are fulfilled:
(1) Besides the usual logic connectives (“not”, “and”, “or”, “if … then”, “if and only if”) and quantifiers (“there exists”, “for all”), only the basic notions of function, value of a function at a given point, relation,
domain, codomain, ordered n-tuple, i-th component of an ordered tuple, and membership ∈, are involved.
(2) The scopes of all universal quantifiers ∀ (“for all”) and existential quantifiers ∃ (“there exists”) are “bounded” by some set. A quantifier is bounded when it occurs in the form “for every x ∈ X” or “there exists y ∈ Y”, for some specified sets X, Y.

Thus, in order to correctly apply the transfer principle, one has to stick to the following rule.

Rule of thumb. Whenever considering quantifiers “∀x …” or “∃y …”, we must always specify the range of the variables, i.e. we must specify sets X and Y and reformulate: “∀x ∈ X …” and “∃y ∈ Y …”. In particular, all quantifications on subsets, “∀x ⊆ X …” or “∃x ⊆ X …”, must be reformulated in the form “∀x ∈ P(X) …” and “∃x ∈ P(X) …” respectively, where P(X) is the powerset of X. Similarly, all quantifications on functions f : A → B must be bounded by B^A, the set of all functions from A to B.

We are now ready to give the FUNDAMENTAL DEFINITION: A model of hyper-methods (or a model of nonstandard analysis) is a triple ⟨∗; U; V⟩ where ∗ : U → V is a star-map satisfying the transfer principle.

1.2. Some applications of transfer. We now show a few simple applications of the transfer principle, aimed at clarifying the crucial notion of elementary sentence.

Example 1.3. By condition (1) of Definition 1.2, the following are all elementary sentences: “f is a function with domain A and codomain B”; “b is the value taken by f at the point a”; “R is an n-place relation on A”; “C is the Cartesian product of A and B”. Thus by transfer, we get that “∗f : ∗A → ∗B is a function with domain ∗A and codomain ∗B”; “∗b = ∗f(∗a) is the value taken by ∗f at the point ∗a”, i.e. ∗(f(a)) = ∗f(∗a); “∗R is an n-place relation on ∗A”; and “∗C = ∗A × ∗B is the Cartesian product of ∗A and ∗B”.

Example 1.4.
The inclusion and all basic operations on sets are preserved under the star-map, with the only relevant exceptions of the powerset and the function-set (see Example 1.9 below). In fact the properties “A ⊆ B”, “C = A ∪ B”, “C = A ∩ B”, and “C = A \ B” can all be formulated as elementary sentences. For instance, “A ⊆ B” means that “∀x ∈ A. x ∈ B”, etc. By transfer we obtain that “∗A ⊆ ∗B”, “∗C = ∗A ∪ ∗B”, “∗C = ∗A ∩ ∗B”, and “∗C = ∗A \ ∗B”.

Example 1.5. Let f : A → B be any given standard function. Then the images f(A′) = {f(a) | a ∈ A′} of subsets A′ ⊆ A, and the preimages f⁻¹(B′) = {a ∈ A | f(a) ∈ B′} of subsets B′ ⊆ B, are both preserved under the star-map, i.e. ∗(f(A′)) = ∗f(∗A′) and ∗(f⁻¹(B′)) = (∗f)⁻¹(∗B′). In particular, ∗Range(f) = Range(∗f), and so f is onto if and only if ∗f is. It is also easily shown that f is 1-1 if and only if ∗f is. Two more relevant properties are the following: ∗{a ∈ A | f(a) = g(a)} = {α ∈ ∗A | ∗f(α) = ∗g(α)}, and ∗Graph(f) = Graph(∗f). All these properties are proved by direct applications of the transfer principle. E.g. the last equality is proved by transferring the elementary sentence: “x ∈ Graph(f) if and only if there exist a ∈ A and b ∈ B such that b = f(a) and x = (a, b)”.

Example 1.6. Let A be a nonempty standard set, and consider the property: “< is a linear ordering on A”. Notice first that < is a binary relation, hence ∗< is a binary relation on ∗A. By definition, < is a linear ordering if and only if it satisfies the following three properties, which are expressed by means of bounded quantifiers:
∀x ∈ A ¬(x < x)
∀x, y, z ∈ A (x < y and y < z) ⇒ x < z
∀x, y ∈ A (x < y or y < x or x = y)
Then we can apply the transfer principle and obtain that “∗< is a linear ordering on ∗A”.

Example 1.7. It directly follows from condition (1) of Definition 1.2 that the hyper-extension of an n-tuple of standard objects A = (a₁, …, aₙ) is ∗A = (∗a₁, …, ∗aₙ). Similarly, if A = {a₁, …, aₙ} is a finite set of standard objects, then its star-extension is ∗A = {∗a₁, …, ∗aₙ}. This is proved by applying transfer to the following elementary sentence:
“a₁ ∈ A and … and aₙ ∈ A and for all x ∈ A, x = a₁ or … or x = aₙ”.
Notice that for every standard set A, {∗a | a ∈ A} ⊆ ∗A (apply transfer to all sentences “a ∈ A”). In the last example we have seen that the inclusion is actually an equality when A is finite. But this is never the case when A is infinite, as a consequence of the properness condition ∗N ≠ N.

Proposition 1.8. Let A be an infinite standard set.
Then the inclusion {∗a | a ∈ A} ⊂ ∗A is proper. Proof. Fix a standard map f : A → N which is onto. Then ∗f : ∗A → ∗N is onto as well. Now assume by contradiction that all elements in ∗A are of the form ∗a for some a ∈ A. Then: ∗

N = {∗f(∗a) | a ∈ A} = {∗(f(a)) | a ∈ A} = {∗n | n ∈ N} = N,

against the properness condition ∗N = N.



10

VIERI BENCI, MARCO FORTI, AND MAURO DI NASSO

Example 1.9. Let A and B be any standard sets. By transferring the sentences “∀x ∈ P(A), ∀y ∈ x, y ∈ A” and “∀f ∈ B^A, f is a function with domain A and codomain B”, it is proved that ∗P(A) ⊆ P(∗A) and ∗(B^A) ⊆ (∗B)^{∗A}, respectively. Arguing similarly as in Example 1.7, one easily shows that these inclusions are equalities whenever both A and B are finite. In the infinite case, the inclusions are proper (cf. Proposition 1.25).

1.3. The basic sets of hypernumbers. Let us now concentrate on the hyper-extensions of sets of numbers.

Definition 1.10. The elements of ∗N, ∗Z, ∗Q, ∗R and ∗C are called hypernatural, hyperinteger, hyperrational, hyperreal, and hypercomplex numbers, respectively.

Besides natural numbers, for convenience it is also customary to assume that ∗z = z for all numbers z. In this case, we have the inclusions N ⊂ ∗N, Z ⊂ ∗Z, Q ⊂ ∗Q, R ⊂ ∗R, and C ⊂ ∗C (the inclusions are proper by Proposition 1.8). Whenever confusion is unlikely, some asterisks will be omitted. For instance, we shall use the same symbols + and · to denote both the sum and product operations on N, Z, Q, R, C and the corresponding operations defined on the hyper-extensions ∗N, ∗Z, ∗Q, ∗R, ∗C. Similarly for the ordering ≤. In the next proposition we itemize the first properties of hypernumbers, all obtained as straightforward applications of the transfer principle.

Proposition 1.11.
1. ∗Z is a commutative ring, ∗Q is an ordered field, ∗R is a real-closed field, and ∗C is an algebraically closed field;⁴
2. Every non-zero hypernatural number ν ∈ ∗N has a successor ν + 1 and a predecessor ν − 1;⁵
3. (N, ≤) is an initial segment of (∗N, ≤), i.e. if ν ∈ ∗N \ N, then ν > n for all n ∈ N;
4. For every positive ξ ∈ ∗R there exists a unique ν ∈ ∗N such that ν ≤ ξ < ν + 1. In particular, ∗N is unbounded in ∗R;
5. The hyperrational numbers ∗Q, as well as the hyperirrational numbers ∗(R \ Q) = ∗R \ ∗Q, are dense in ∗R;⁶
6. Let X be any of the sets N, Z, Q or R, and consider the open interval (a, b) = {x ∈ X | a < x < b} determined by numbers a < b in X. Then the hyper-extension ∗(a, b) = {ξ ∈ ∗X | a < ξ < b}. Similar equalities also hold for intervals of the form [a, b), (a, b], [a, b], (−∞, b] and [a, +∞).

4 Recall that an ordered field is real-closed if every positive element is a square, and every polynomial of odd degree has a root. A field is algebraically closed if all non-constant polynomials have a root.
5 We say that ζ′ is the successor of ζ (or ζ is the predecessor of ζ′) if ζ < ζ′ and there exist no elements η such that ζ < η < ζ′.
6 I.e., for all ξ < ζ in ∗R, there exist x ∈ ∗Q and y ∈ ∗R \ ∗Q such that ξ < x, y < ζ.

As a consequence of property (3) above, the elements of ∗N \ N are called infinite. More generally:

Definition 1.12. A hyperreal number ξ ∈ ∗R is infinite if either ξ > ν or ξ < −ν for some ν ∈ ∗N \ N. Otherwise we say that ξ is finite. We call infinitesimal those hyperreal numbers ε ∈ ∗R such that −r < ε < r for all positive reals r ∈ R. In this case we write ε ∼ 0.

The following properties are easily seen:⁷ ε ≠ 0 is infinitesimal if and only if its reciprocal 1/ε is infinite; if ξ and ζ are finite, then also ξ + ζ and ξ · ζ are finite; if ε, δ ∼ 0, then also ε + δ ∼ 0; if ε ∼ 0 and ξ is finite, then ε · ξ ∼ 0; if ξ is infinite and ζ is not infinitesimal, then ξ · ζ is infinite; if ε ≠ 0 is infinitesimal but ξ is not infinitesimal, then ξ/ε is infinite; if ξ is infinite and ζ is finite, then ζ/ξ ∼ 0; etc.

Infinitesimal and infinite numbers can be seen as formalizations of the intuitive ideas of “small” number and “large” number, respectively. The idea of “closeness” can also be formalized, as follows.

Definition 1.13. The hyperreal numbers ξ and ζ are infinitely close if ξ − ζ is infinitesimal. In this case, we write ξ ∼ ζ.

Clearly, ∼ is an equivalence relation. The completeness of the real numbers R yields the following result.

Theorem 1.14 (Standard part). For every finite ξ ∈ ∗R, there exists a unique real number r ∈ R (called the “standard part” of ξ) such that ξ ∼ r.

Proof. Since ξ is finite, the set {a ∈ R | a ≤ ξ} is nonempty and bounded above in R, and its least upper bound r = sup{a ∈ R | a ≤ ξ} has the desired property. Uniqueness holds because two distinct reals are never infinitely close.

The next interesting result shows that, in a way, the hyperrationals already “incorporate” the real numbers (see e.g. [45, Thm. 4.4.4] and [14, Ch.II, Thm. 2]).

Theorem 1.15. Let ∗Q_b be the ring of finite hyperrationals, and let I be the maximal ideal of its infinitesimals. Then R and ∗Q_b/I are isomorphic as ordered fields.

1.4. Correctly applying the transfer principle.
From the examples presented so far, one might (wrongly) guess that applying the transfer principle merely consists in putting asterisks ∗ all over the place. This is not so because, as we already pointed out, only elementary sentences can be transferred. We now give three relevant examples aimed at clarifying this matter. 7 In fact, they hold in any non-archimedean field (the archimedean property is defined in Example 1.18).


Example 1.16. Recall the well-ordering property of N: “Every nonempty subset of N has a least element”. By applying the transfer principle to this formulation, we would get that “Every nonempty subset of ∗N has a least element”. But this is clearly false (e.g. the collection ∗N \ N of infinite hypernaturals has no least element, because if ν is infinite, then ν − 1 is infinite as well). We reached a wrong conclusion because we transferred a sentence which is not elementary (the universal quantifier is not bounded). However, we can easily overcome this problem by reformulating the well-ordering property as the following elementary sentence: “Every nonempty element of P(N) has a least element”, where P(N) is the powerset of N. (Notice that the property “X has a least element” is elementary, because it means: “there exists x ∈ X such that for all y ∈ X, x ≤ y”.) We can now correctly apply transfer and get: “Every nonempty element of ∗P(N) has a least element”, where it is intended that the ordering on ∗N is the hyper-extension of the ordering on N. The crucial point here is that ∗P(N) is properly included in P(∗N) (see Proposition 1.25 below).

Example 1.17. Recall the completeness property of real numbers: “Every nonempty subset of R which is bounded above has a l.u.b.”. As in the previous example, if we directly apply transfer to this formulation, we reach a false conclusion, namely: “Every nonempty subset of ∗R which is bounded above has a l.u.b.” (e.g. the set of infinitesimals is bounded above but has no least upper bound). Again, the problem is that the sentence above is not elementary, because it contains a quantification over subsets. To fix the problem, we simply have to consider the powerset P(R) and reformulate: “Every nonempty element of P(R) which is bounded above has a l.u.b.”. Thus, by the transfer principle, we have a least upper bound for each nonempty upper-bounded element of ∗P(R) (which is a proper subset of P(∗R), see Proposition 1.25 below).
As suggested by the last examples, restricting to elementary sentences is not a limitation, because virtually all mathematical properties can be equivalently rephrased in elementary terms. Another delicate aspect that needs some caution is the possibility of misreading a transferred sentence, once all asterisks ∗ have been put in the right place. A relevant example is given by the archimedean property.

Example 1.18. The archimedean property of real numbers can be expressed in this elementary form: “For all positive x ∈ R, there exists n ∈ N such that n · x > 1”. By transfer, we obtain: “For all positive ξ ∈ ∗R, there exists ν ∈ ∗N such that ν · ξ > 1”. Notice that this sentence does not express the archimedean property of ∗R, because the element ν could be an infinite hypernatural.
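Footnote 7 remarks that the arithmetic of infinitesimals works in any non-archimedean field, and Example 1.18 shows how the archimedean property fails for ∗R. Since ∗R itself cannot be produced by a program, the following is only a minimal sketch on a much smaller stand-in, assumed purely for illustration: finite Laurent polynomials Σ c_k ε^k with rational coefficients, ordered by the sign of the lowest power of ε that occurs. This makes ε a positive infinitesimal and 1/ε an infinite element. All function names are hypothetical, and the "standard part" here is a rational number playing the role of r in Theorem 1.14.

```python
from fractions import Fraction

# Toy non-archimedean ordered ring (NOT *R): finite Laurent polynomials
#   x = sum_k c_k * eps**k  with rational c_k, stored as a dict {k: c_k}.
# Assumed ordering: x > 0 iff the coefficient of the lowest power of eps
# occurring in x is positive, so eps is a positive infinitesimal.


def norm(x):
    """Drop zero coefficients and coerce the rest to Fraction."""
    return {k: Fraction(c) for k, c in x.items() if c != 0}


def add(x, y):
    out = dict(norm(x))
    for k, c in norm(y).items():
        out[k] = out.get(k, 0) + c
    return norm(out)


def neg(x):
    return {k: -c for k, c in norm(x).items()}


def mul(x, y):
    out = {}
    for i, a in norm(x).items():
        for j, b in norm(y).items():
            out[i + j] = out.get(i + j, 0) + a * b
    return norm(out)


def is_positive(x):
    x = norm(x)
    return bool(x) and x[min(x)] > 0


def less(x, y):
    return is_positive(add(y, neg(x)))


def is_finite(x):
    """Bounded in absolute value by a rational: no negative powers of eps."""
    return all(k >= 0 for k in norm(x))


def is_infinitesimal(x):
    """-r < x < r for every rational r > 0: only positive powers of eps."""
    return all(k > 0 for k in norm(x))


def standard_part(x):
    """The unique rational r such that x - r is infinitesimal."""
    if not is_finite(x):
        raise ValueError("only finite elements have a standard part")
    return norm(x).get(0, Fraction(0))


EPS = {1: 1}        # a positive infinitesimal
INV_EPS = {-1: 1}   # its reciprocal 1/eps, an infinite element
ONE = {0: 1}

# The archimedean property fails: n * eps < 1 for every standard n tried.
assert all(less(mul({0: n}, EPS), ONE) for n in range(1, 1000))
assert is_infinitesimal(EPS) and not is_finite(INV_EPS)
assert mul(EPS, INV_EPS) == ONE

xi = {0: 3, 1: 2, 2: -5}   # 3 + 2 eps - 5 eps^2
print(standard_part(xi))   # 3
```

In this toy ring the standard part is simply the constant coefficient, which makes the content of Theorem 1.14 visible: a finite element differs from exactly one rational by an infinitesimal.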


Clearly, the hyperreal field ∗R is not archimedean (in fact, an ordered field is non-archimedean if and only if it contains non-zero infinitesimals). In particular, R and ∗R are not isomorphic. We remark that this phenomenon of non-isomorphic mathematical structures that cannot be distinguished by any elementary sentence is indeed the very essence of nonstandard analysis (and, more generally, of model theory, a branch of mathematical logic).

1.5. Internal elements. We now introduce a fundamental class of objects in nonstandard analysis.

Definition 1.19. An internal object is any element ξ ∈ ∗X belonging to some hyper-extension ∗X. An element ξ ∈ V of the nonstandard universe is external if it is not internal.

Notice that all hyper-extensions ∗X are internal, because e.g. ∗X ∈ ∗Y, where Y = {X} is the singleton of X. We remark that in most foundational approaches proposed in the literature, the collection of internal objects is assumed to be transitive, i.e. if b ∈ B and B is internal, then b is internal as well.8 The following useful theorem is a straightforward consequence of the transfer principle and of the definition of internal object (see e.g. [12, Prop. 4.4.14]).

Theorem 1.20 (Internal Definition Principle). If P(x, x1, . . . , xn) is an elementary formula and B, B1, . . . , Bn are internal objects, then the set {x ∈ B | P(x, B1, . . . , Bn)} is also internal.

By direct applications of this principle, the following is proved.

Proposition 1.21.
1. The collection I of internal sets is closed under union, intersection, set-difference, finite sets and tuples, finite Cartesian products, and under images and preimages of internal functions;
2. For every standard set A, ∗P(A) = P(∗A) ∩ I is the set of all internal subsets of ∗A;
3. For all standard sets A and B, ∗(B^A) = (∗B)^{∗A} ∩ I is the set of all internal functions from ∗A to ∗B;
4.
If C, D ∈ I are internal, then P(C) ∩ I (the set of all internal subsets of C) and D^C ∩ I (the set of all internal functions from C to D) are internal.

The notion of internal set is useful to correctly apply the transfer principle. In fact, any quantification over subsets or functions can be transferred to a quantification over internal subsets or internal functions, respectively. For instance, let us go back to Examples 1.16 and 1.17. The well-ordering of N is transferred to: “Every nonempty internal subset of ∗N has a least element”. The completeness of R transfers to: “Every nonempty internal subset of ∗R that is bounded above has a l.u.b.”. 8 The matter of transitivity of the class of internal sets gives rise to interesting considerations in the foundations of nonstandard set theories (cf. Hrbacek’s remarks in Subsection 3.3 of [29]).


Another example is the following.

Example 1.22. The well-ordering property of N implies that: “There is no strictly decreasing function f : N → N”. Then, by transfer, “There is no internal strictly decreasing function g : ∗N → ∗N”.9

In general, we can state the following

Rule of thumb. Properties about subsets or about functions of standard objects transfer to the corresponding properties about internal subsets or internal functions, respectively.

We can use the above considerations to prove that certain objects are external.

Example 1.23. The set ∗N \ N of the infinite hypernatural numbers is external, because it has no least element. Also N is external, otherwise the set-difference ∗N \ N would be internal.10 The set of infinitesimal hyperreal numbers is another external collection, because it is bounded above but has no least upper bound.

An easy example of an external function is the following.

Example 1.24. Let g : ∗N → ∗N be the function such that g(n) = n if n ∈ N, and g(ν) = 0 if ν ∈ ∗N \ N. Then g is external, otherwise its range N would be internal.

As a consequence of Proposition 1.21, the above Examples 1.23 and 1.24 show that ∗P(N) ≠ P(∗N) and ∗(N^N) ≠ (∗N)^{∗N}. More generally, we have

Proposition 1.25.
1. Every infinite internal set has at least the size of the continuum, hence it cannot be countable. In particular, for every infinite standard set A, the inclusion ∗P(A) ⊂ P(∗A) is proper;
2. If the standard set A is infinite and B contains at least two elements, then the inclusion ∗(B^A) ⊂ (∗B)^{∗A} is proper.

We warn the reader that getting familiar with the distinction between internal and external objects is probably the hardest step in learning nonstandard analysis.

1.6. The saturation principle.
The star-map and the transfer principle suffice to develop the basics of nonstandard analysis, but for more advanced applications a third tool is also necessary, namely:

Countable Saturation Principle: Suppose {B_n}_{n∈N} ⊆ ∗A is a countable family of internal sets with the “finite intersection property”. Then the intersection ⋂_{n∈N} B_n is nonempty.

9 We remark that there are models of hyper-methods where (external) strictly decreasing functions g : ∗N → ∗N exist. 10 Here N ⊂ ∗N is seen as an element of the nonstandard universe.
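The saturation principle hinges on the finite intersection property, which asks that every finite subfamily, not merely every pair, have nonempty intersection. The following is a minimal brute-force checker for small finite families of finite sets, written only to make the definition concrete; the function name is hypothetical. For a finite family the property is trivially equivalent to the whole intersection being nonempty, so saturation only has content for infinite families: for instance the tails {m ∈ N | m ≥ n} have the property but empty intersection in N, while the intersection of their hyper-extensions contains every infinite hypernatural.

```python
from itertools import combinations


def has_fip(family):
    """Finite intersection property for a finite family of finite sets:
    every nonempty subfamily B_1, ..., B_n has nonempty intersection."""
    family = [frozenset(B) for B in family]
    for n in range(1, len(family) + 1):
        for sub in combinations(family, n):
            if not frozenset.intersection(*sub):
                return False
    return True


# Pairwise intersections are nonempty, yet the whole family has empty intersection.
triangle = [{1, 2}, {2, 3}, {3, 1}]
print(all(set(a) & set(b) for a, b in combinations(triangle, 2)))  # True
print(has_fip(triangle))                                           # False

# Tails {n, ..., N-1} of a truncated copy of N do have the property.
N = 8
tails = [set(range(n, N)) for n in range(N)]
print(has_fip(tails))  # True
```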


Recall that a family of sets B has the finite intersection property if ⋂_{i=1}^{n} B_i ≠ ∅ for all choices of finitely many B_1, . . . , B_n ∈ B. In several contexts, stronger saturation principles are considered, where families of larger size are also allowed. Precisely, let κ be a given uncountable cardinal.

Fundamental Tool # 3: κ-SATURATION PROPERTY. Suppose B ⊆ ∗A is a family of internal subsets of some hyper-extension ∗A, and suppose |B| < κ. If B has the “finite intersection property”, then ⋂B ≠ ∅.

In this terminology, countable saturation is ℵ1-saturation. The next example illustrates a relevant use of saturation.

Example 1.26. Let (X, τ) be a Hausdorff topological space with character κ, hence each point x ∈ X has a base of neighborhoods N_x of size at most κ. Clearly, the family of internal sets B_x = {∗I | I ∈ N_x} has the finite intersection property. If we assume κ⁺-saturation,11 the intersection μ(x) = ⋂_{I∈N_x} ∗I ≠ ∅. In the literature, μ(x) is called the monad of x. Notice that μ(x) ∩ μ(y) = ∅ whenever x ≠ y, since X is Hausdorff.

Monads are the basic ingredient in applying the hyper-methods to topology, starting with the following characterizations (see e.g. [35, Ch.III]):
• A ⊆ X is open if and only if for every x ∈ A, μ(x) ⊆ ∗A;
• C ⊆ X is closed if and only if for every x ∉ C, μ(x) ∩ ∗C = ∅;
• K ⊆ X is compact if and only if ∗K ⊆ ⋃_{x∈K} μ(x).

Sometimes in the literature, the following weakened version of saturation is considered, where only families of hyper-extensions are allowed.

Definition 1.27 (κ-enlarging property). Suppose F ⊆ P(A) is a family of subsets of some standard set A, and suppose that |F| < κ. If F has the “finite intersection property”, then ⋂_{F∈F} ∗F ≠ ∅.12

We remark that the κ⁺-enlarging property suffices to prove the nonstandard characterizations of open, closed and compact sets.

§2. Ultrapowers and hyper-extensions. In this section we deal with the connections between ultrapowers and the hyper-extensions of nonstandard analysis. In particular, we will see that, up to isomorphism, hyper-extensions are precisely suitable subsets of ultrapowers, namely the proper limit ultrapowers. This characterization theorem will be used in Part II of this article to show that the given definitions actually yield models of the hyper-methods.

11 κ⁺ denotes the successor cardinal of κ. Thus |B| < κ⁺ is the same as |B| ≤ κ.
12 We remark that the enlarging property is strictly weaker than saturation, in the sense that there are models of the hyper-methods where the κ-enlarging property holds but κ-saturation fails.


2.1. Ultrafilters and ultrapowers. Recall that a filter F on a set I is a nonempty family of subsets of I that is closed under intersections and supersets, i.e.
• If A, B ∈ F then A ∩ B ∈ F;
• If A ∈ F and B ⊇ A, then also B ∈ F.
A typical example of a filter on a set I is the Fréchet filter Fr of cofinite subsets:
Fr = {A ⊆ I | I \ A is finite}.

Definition 2.1. An ultrafilter U on I is a filter that satisfies the additional property: A ∉ U ⇔ I \ A ∈ U.

It is easily shown that ultrafilters on I are those non-trivial filters which are maximal with respect to inclusion.13 As a consequence of the definition, if a finite union A1 ∪ · · · ∪ An ∈ U belongs to an ultrafilter, then at least one of the Ai ∈ U. First examples are the principal ultrafilters U_i = {A ⊆ I | i ∈ A}, where i is a fixed element of I. Notice that an ultrafilter is non-principal if and only if it contains no finite sets (hence, if and only if it includes the Fréchet filter). The existence of non-principal ultrafilters on infinite sets is proved by a straightforward application of Zorn’s lemma.

Given an ultrafilter U on the set I, consider the following equivalence relation ≡_U on functions with domain I:
f ≡_U g ⇐⇒ {i ∈ I | f(i) = g(i)} ∈ U.
The ultrapower of a set X modulo U is the quotient set
X^I_U = {[f]_U | f : I → X},
where we denoted by [f]_U = {g ∈ X^I | f ≡_U g} the equivalence class of f. When the ultrafilter U is clear from the context, we simply write [f]. X is canonically embedded into its ultrapower X^I_U by means of the diagonal map d : x ↦ [c_x], where c_x : I → X is the constant function with value x.

The ultrapower construction is commonly used to obtain models of hyper-methods. Indeed, models of hyper-methods are fully characterized by means of the generalized notion of limit ultrapower (see Theorem 2.10 below). Ultrafilters naturally arise in hyper-extensions.

Definition 2.2. Let X be any standard set, and let α ∈ ∗X. The ultrafilter generated by α is the following family of subsets of X:
U_α = {A ⊆ X | α ∈ ∗A}.
It is readily verified that U_α is actually an ultrafilter on X. Moreover, U_α is non-principal if and only if α ≠ ∗x for all x ∈ X.

13 By the trivial filter on I we mean the collection P(I) of all subsets of I.
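On a finite index set every ultrafilter is principal, and the ultrapower modulo a principal U_i simply reads off the i-th coordinate, so X^I_U collapses onto X. The following is a brute-force sketch, assuming a three-element index set I = {0, 1, 2} and functions f : I → X given as tuples; it checks Definition 2.1 directly, enumerates all ultrafilters on I, and tests the relation ≡_U. The helper names are hypothetical. Non-principal ultrafilters require an infinite I and an appeal to Zorn’s lemma, so they cannot appear in a computation like this one.

```python
from itertools import combinations


def powerset(s):
    """All subsets of a finite set, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_filter(F, I):
    """Nonempty family of subsets of I closed under intersections and supersets."""
    if not F:
        return False
    subsets = powerset(I)
    for A in F:
        if any((A & B) not in F for B in F):
            return False
        if any(A <= B and B not in F for B in subsets):
            return False
    return True


def is_ultrafilter(U, I):
    """Definition 2.1: a filter such that  A not in U  <=>  I \\ A in U."""
    I = frozenset(I)
    return is_filter(U, I) and all((A not in U) == ((I - A) in U) for A in powerset(I))


def principal(I, i):
    """The principal ultrafilter U_i = {A ⊆ I | i ∈ A}."""
    return frozenset(A for A in powerset(I) if i in A)


def equivalent_mod(f, g, U):
    """f ≡_U g iff {i | f(i) = g(i)} ∈ U, with f, g tuples indexed by {0, ..., n-1}."""
    return frozenset(i for i in range(len(f)) if f[i] == g[i]) in U


I = frozenset({0, 1, 2})
subsets = powerset(I)
# Brute force over all 2**8 families of subsets of I.
ultras = [frozenset(fam) for r in range(len(subsets) + 1)
          for fam in combinations(subsets, r) if is_ultrafilter(frozenset(fam), I)]
print(len(ultras))                                                 # 3
print(all(any(U == principal(I, i) for i in I) for U in ultras))   # True

U1 = principal(I, 1)
print(equivalent_mod((5, 7, 9), (0, 7, 0), U1))  # True: they agree at index 1
print(equivalent_mod((5, 7, 9), (5, 0, 9), U1))  # False
```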


2.2. Complete structures. In order to formulate the next results, we need the following notion.

Definition 2.3. An X-complete structure is a system
A(X) = ⟨X_A ; {F_A | F : X^n → X} ; {R_A | R ⊆ X^n}⟩
that consists of a superset X_A of X, of an n-place function F_A : (X_A)^n → X_A for each F : X^n → X, and of an n-place relation R_A ⊆ (X_A)^n for each R ⊆ X^n.

Ultrapowers and hyper-extensions of X provide natural examples of X-complete structures.

Example 2.4. A crucial feature of ultrapowers of a given set X is that all functions F : X^n → X and all relations R ⊆ X^n can be naturally extended to functions F̂ : (X^I_U)^n → X^I_U and relations R̂ ⊆ (X^I_U)^n, respectively. Precisely, we set:
F̂([f1], . . . , [fn]) = [g] ⇔ {i ∈ I | F(f1(i), . . . , fn(i)) = g(i)} ∈ U;
R̂([f1], . . . , [fn]) ⇔ {i ∈ I | R(f1(i), . . . , fn(i))} ∈ U.
(The above definitions are well-posed as a consequence of the properties of filters.) If we identify every x ∈ X with its diagonal image d(x) ∈ X^I_U, then the ultrapower X^I_U becomes an X-complete structure:
𝐗^I_U = ⟨X^I_U ; {F̂ | F : X^n → X} ; {R̂ | R ⊆ X^n}⟩.

Example 2.5. Let ⟨∗ ; U ; V⟩ be a model of hyper-methods, and take any X ∈ U. If every x ∈ X is identified with ∗x ∈ ∗X, then
∗𝐗 = ⟨∗X ; {∗F | F : X^n → X} ; {∗R | R ⊆ X^n}⟩

is an X-complete structure, called the hyper-structure induced by ⟨∗ ; U ; V⟩. Another important example is the following.

Example 2.6. Let ⟨∗ ; U ; V⟩ be a model of hyper-methods, take X ∈ U and α ∈ ∗X. Define the subspace generated by α in ∗X as X_α = {∗f(α) | f : X → X}. Notice that, if F : X^n → X is any n-place function, and ∗f_i(α) ∈ X_α for i = 1, . . . , n, then also the image ∗F(∗f1(α), . . . , ∗fn(α)) = ∗g(α) ∈ X_α, where g is the function defined by x ↦ F(f1(x), . . . , fn(x)). Thus, by restricting the structure ∗𝐗 of Example 2.5 above, we obtain an X-complete structure
𝐗_α = ⟨X_α ; {∗F↾X_α^n | F : X^n → X} ; {∗R ∩ X_α^n | R ⊆ X^n}⟩.
The natural notion of isomorphism for X-complete structures is the following:


Definition 2.7. Let A(X) and B(X) be X-complete structures. A bijection Θ : X_A → X_B is an isomorphism of X-complete structures if for every F : X^n → X, for every R ⊆ X^n, and for every x1, . . . , xn ∈ X_A, the following hold:
Θ(F_A(x1, . . . , xn)) = F_B(Θ(x1), . . . , Θ(xn)) and (x1, . . . , xn) ∈ R_A ⇔ (Θ(x1), . . . , Θ(xn)) ∈ R_B.
In this case, we say that A(X) and B(X) are completely isomorphic.

A relevant example of isomorphic complete structures is given by the next proposition, whose proof is straightforward from the examples above.

Proposition 2.8. Let ⟨∗ ; U ; V⟩ be a model of hyper-methods, let X ∈ U, and pick α ∈ ∗X. Let X_α and U_α be the subspace and the ultrafilter generated by α, respectively. Then the map Θ : X_α → X^X_{U_α} defined by Θ(∗f(α)) = [f]_{U_α} is an isomorphism between the X-complete structures 𝐗_α and 𝐗^X_{U_α}.

2.3. The characterization theorem. Hyper-extensions have an algebraic characterization as suitable subsets of ultrapowers. To this end, we recall the following generalization of ultrapowers.

Definition 2.9. Let I be a set, U an ultrafilter on I and F a filter on the product I × I. For every set X, the limit ultrapower X^I_U|F is the subset of the ultrapower X^I_U that consists of all equivalence classes [f] of functions f : I → X that are “piecewise constant” with respect to F, i.e. such that {(i, i′) ∈ I × I | f(i) = f(i′)} ∈ F.14 We say that the triple (I, U, F) is proper when the diagonal embedding d : N → N^I_U|F is not onto.15

Notice that, when F = P(I × I) is the trivial filter, then X^I_U|F = X^I_U. Thus limit ultrapowers generalize ultrapowers. Like ultrapowers, limit ultrapowers also provide complete structures, according to Example 2.4 above. The following characterization holds (cf. Theorem 3.4).

Theorem 2.10 (Keisler’s characterization). Let X be an infinite set, and let
𝐗 = ⟨X̃ ; {F̃ | F : X^n → X} ; {R̃ | R ⊆ X^n}⟩
be an X-complete structure. Then the following are equivalent:
1. 𝐗 = ∗𝐗 is induced by a model of hyper-methods ⟨∗ ; U ; V⟩;
2. 𝐗 is isomorphic to some limit ultrapower 𝐗^I_U|F where (I, U, F) is proper;
3. A is properly included in Ã for all infinite A ⊆ X, and the transfer principle holds: if φ is an elementary formula involving functions F1, . . . , Fm and relations R1, . . . , Rk, then, for all x1, . . . , xn ∈ X,
φ(x1, . . . , xn, F1, . . . , Fm, R1, . . . , Rk) ⇔ φ(x1, . . . , xn, F̃1, . . . , F̃m, R̃1, . . . , R̃k).

14 Limit ultrapowers were introduced in the early sixties by H.J. Keisler [31].
15 Equivalently, when the diagonal embedding d : A → A^I_U|F is not onto for any infinite A.


The above result was proved by H.J. Keisler in the context of superstructures, as an application of his characterization theorem of complete extensions as (isomorphic copies of) limit ultrapowers (see [12, Thms. 6.4.10 and 6.4.17]). An alternative proof of this result, based on the subspaces X_α and the ultrafilters U_α generated by α, can be reconstructed from arguments in [18, 24], and will appear in full detail in [8].

Part II – The Eightfold Path

§3. The superstructure approach. The approach most commonly adopted by practitioners of nonstandard methods is the so-called superstructure approach. It was first elaborated by A. Robinson jointly with E. Zakon in [42]. For a detailed exposition of this approach, we refer to Section 4.4 of [12], where all the proofs omitted here can be found.

By the axioms of Zermelo-Fraenkel set theory ZFC, all existing “objects” are sets. As already pointed out in Footnote 2, numbers, ordered tuples, sets, Cartesian products, relations, functions, as well as virtually all mathematical objects, can in fact be coded as sets. Following the common practice with superstructures, here we adopt as a foundational framework the (slightly) modified version of ZFC that also allows the existence of “atoms”. (By atoms we mean objects that can be elements of sets but are not sets themselves, and are “empty” with respect to ∈.) This is consistent with everyday practice, where one never considers, say, π or Napier’s constant e as sets.

3.1. The definitions. The basic notion is the following.

Definition 3.1. Let X be a set of atoms. The superstructure over X is the increasing union V(X) = ⋃_{n∈N} V_n(X), where V_0(X) = X and, by induction, the (n + 1)th stage V_{n+1}(X) = V_n(X) ∪ P(V_n(X)) adds all subsets of the nth stage. It is assumed that (a copy of) the natural numbers N ⊆ X.16

Notice that superstructures are suitable to formalize the notion of universe of Definition 1.1. Suppose we want to investigate some mathematical object Z. Then everything needed in the study of Z belongs to any superstructure V(X), provided X includes (a copy of) Z. E.g. in real analysis, the real functions, the usual spaces of functions and functionals, the norms, as well as the involved topologies, are all elements of V(R).
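The inductive clause of Definition 3.1 can be run on a toy example. The following is a minimal sketch, assuming a finite set of atoms modelled as Python strings (so an atom is never treated as a set) and sets modelled as frozensets; it deliberately ignores the requirement N ⊆ X, which makes every genuine stage infinite. The function names are hypothetical.

```python
from itertools import combinations


def powerset(s):
    """All subsets of a finite set, as a frozenset of frozensets."""
    items = list(s)
    return frozenset(frozenset(c) for r in range(len(items) + 1)
                     for c in combinations(items, r))


def superstructure_stages(atoms, n):
    """[V_0(X), ..., V_n(X)] with V_0(X) = X and V_{k+1}(X) = V_k(X) ∪ P(V_k(X)).

    Atoms are modelled as strings and sets as frozensets, so an atom can be
    an element of a set but is never itself treated as a set."""
    stages = [frozenset(atoms)]
    for _ in range(n):
        V = stages[-1]
        stages.append(V | powerset(V))
    return stages


stages = superstructure_stages({"a"}, 3)
print([len(V) for V in stages])                            # [1, 3, 9, 513]
print(all(stages[k] <= stages[k + 1] for k in range(3)))   # True
```

Since every set in V_k(X) is already a subset of V_{k−1}(X) ⊆ V_k(X), each new stage is the disjoint union of the atoms and P(V_k(X)), so |V_{k+1}(X)| = |X| + 2^{|V_k(X)|}; this accounts for the sizes 1, 3, 9, 513 above.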
The point is that superstructures are closed under all the usual mathematical constructions. Namely, if A, B ∈ V(X) are sets, then the union A ∪ B, the intersection A ∩ B, the set-difference A \ B, the ordered pair (A, B), the Cartesian product A × B, the set B^A of all functions from A to B, any n-place relation R on A, the powerset P(A), etc., all belong to V(X).

16 We remark that superstructures V(X) over sets of atoms can also be implemented in the “pure” set theory ZFC. This can be done by taking X as a set of nonempty sets x that “behave” as atoms with respect to V(X), i.e. such that x ∩ V(X) = ∅ (see [12, §4.4], where such X are called base sets).

The following definition is the one most commonly adopted by practitioners of nonstandard analysis.

Definition 3.2 ([12, §4.4]). A superstructure model of (nonstandard or) hyper-methods is a triple ⟨∗ ; V(X) ; V(Y)⟩ where:
1. V(X) and V(Y) are superstructures;
2. ∗X = Y;
3. ∗n = n for all n ∈ N, and N is properly included in ∗N;
4. ∗ : V(X) → V(Y) satisfies the transfer principle.

We propose here a modified version of the above definition, where a single superstructure V(X) is considered instead of two.17

Definition 3.3. We say that a triple ⟨∗ ; V(X) ; V(X)⟩ is a single superstructure model of (nonstandard or) hyper-methods if:
1. V(X) is a superstructure;
2. ∗X = X;
3. ∗n = n for all n ∈ N, and N is properly included in ∗N;
4. ∗ : V(X) → V(X) satisfies the transfer principle.

One advantage of this definition is that the traditional distinction between standard and nonstandard objects is overcome. Each object under consideration is in fact standard, and one can consider its hyper-extension. For instance, in this context, one could take the set of hyper-hypernatural numbers ∗∗N, the set of hyper-infinitesimals, and so forth. Moreover, all possible hyper-extensions are obtained in some single superstructure model, as shown by the following theorem. (The proof is obtained by suitably modifying the construction in [3]. See also [8].)

Theorem 3.4. Let X be a set, and let ⟨⋆ ; U ; V⟩ be any model of hyper-methods with X ∈ U. Then there exists a single superstructure model of hyper-methods ⟨∗ ; V(X′) ; V(X′)⟩ with X ⊆ X′ and such that the two X-complete structures ⋆𝐗 and ∗𝐗 are isomorphic.

3.2.
A characterization theorem. The following result shows that the transfer principle can be equivalently reformulated as a closure property of the star-map under basic set operations. It can be used as an alternative definition that may be more appealing to those mathematicians who are not familiar with the notion of elementary sentence. A proof can be obtained by adapting the arguments used to prove Theorem 3.2 of [42]. 17 This idea was first pursued by V. Benci in [3].


Theorem 3.5. A map ∗ : V(X) → V(Y) between superstructures satisfies the transfer principle if and only if the following finite list of properties is satisfied for all A, B ∈ V(X):18
1. ∗{A, B} = {∗A, ∗B};
2. ∗(A ∪ B) = ∗A ∪ ∗B;
3. ∗(A ∩ B) = ∗A ∩ ∗B;
4. ∗(A \ B) = ∗A \ ∗B;
5. ∗(A × B) = ∗A × ∗B;
6. ∗(⋃A) = ⋃∗A, i.e. ∗{x | ∃y ∈ A. x ∈ y} = {ξ | ∃η ∈ ∗A. ξ ∈ η};
7. ∗{(x, x) | x ∈ A} = {(ξ, ξ) | ξ ∈ ∗A};
8. ∗{(x, y) | x ∈ y ∈ A} = {(ξ, η) | ξ ∈ η ∈ ∗A};
9. ∗{x | ∃y. (x, y) ∈ A} = {ξ | ∃η. (ξ, η) ∈ ∗A};
10. ∗{y | ∃x. (x, y) ∈ A} = {η | ∃ξ. (ξ, η) ∈ ∗A};
11. ∗{(x, y) | (y, x) ∈ A} = {(ξ, η) | (η, ξ) ∈ ∗A};
12. ∗{(x, y, z) | (x, z, y) ∈ A} = {(ξ, η, ζ) | (ξ, ζ, η) ∈ ∗A}.

§4. The algebraic approach. We think there is a very simple “path” to nonstandard analysis, which is suitable for students who know the basics of elementary algebra. It is an algebraic approach based on the existence of a “special” homomorphism of algebras. Precisely:

Definition 4.1. The map J : R̂^N → R̂ is a hyper-homomorphism19 if the following conditions are satisfied:
1. R̂ is a superfield of the real numbers R;
2. J : R̂^N → R̂ is a surjective homomorphism of R-algebras, where R̂^N is the ring of sequences φ : N → R̂, with operations defined pointwise;
3. The kernel of J is non-principal.

4.1. The star-map. We now sketch how to obtain a model of hyper-methods out of a hyper-homomorphism J : R̂^N → R̂. For convenience, without loss of generality we assume that R̂ is a set of atoms. Let V(R̂) = ⋃_{k∈N} V_k(R̂) be the superstructure over R̂, and consider the family of sequences F = ⋃_{k∈N} (V_{k+1}(R̂) \ V_k(R̂))^N. Inductively extend the map J to a map Ĵ : F ∪ R̂^N → V(R̂) as follows:
Ĵ(φ) = J(φ) if φ : N → R̂;
Ĵ(φ) = {Ĵ(ψ) | ∀n. ψ(n) ∈ φ(n)} if φ : N → (V_{k+1}(R̂) \ V_k(R̂)).
Let c_A ∈ F ∪ R̂^N denote the constant sequence with value A ∈ V(R̂). Define the map ∗ : V(R̂) → V(R̂) by setting ∗A = Ĵ(c_A) for every A ∈ V(R̂). 18 This list could easily be reduced (some of the itemized properties can be derived from the others).
However, for the sake of completeness and clarity, we decided to include all basic operations. Clearly, with the exception of item 1, A and B are assumed to be sets. 19 This notion of hyper-homomorphism is different from that given in [6].
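To see what the three conditions of Definition 4.1 ask for, it may help to look at the simplest surjective algebra homomorphism on sequences, which condition 3 rules out. The following is a minimal sketch, assuming rational-valued sequences (so that equality is exact) modelled as Python callables with pointwise operations, and the hypothetical map J(φ) = φ(n0) that evaluates at a fixed index n0. It checks the homomorphism laws on sample sequences, reproduces the computation ∗x = J(c_x) = x · J(c_1) = x used below, and shows that the kernel is the principal ideal generated by the indicator χ of N \ {n0}. A genuine hyper-homomorphism is not something one can write down concretely in this way (Section 4.2 builds one by transfinite induction).

```python
from fractions import Fraction

# Sequences phi : N -> Q modelled as Python callables; the ring operations are
# pointwise, as in Definition 4.1 (rationals stand in for reals for exactness).


def add(phi, psi):
    return lambda n: phi(n) + psi(n)


def mul(phi, psi):
    return lambda n: phi(n) * psi(n)


def scale(x, phi):
    return lambda n: x * phi(n)


def const(x):
    """The constant sequence c_x."""
    return lambda n: Fraction(x)


def evaluation(n0):
    """Hypothetical J(phi) = phi(n0): a surjective algebra homomorphism."""
    return lambda phi: phi(n0)


n0 = 3
J = evaluation(n0)


def phi(n):
    return Fraction(n * n - 1)


def psi(n):
    return Fraction(1, n + 1)


# Homomorphism laws, checked on these sample sequences.
assert J(add(phi, psi)) == J(phi) + J(psi)
assert J(mul(phi, psi)) == J(phi) * J(psi)
assert J(scale(Fraction(7), phi)) == 7 * J(phi)
# *x = J(c_x) = x * J(c_1) = x, as in the text.
assert J(const(5)) == 5 * J(const(1)) == 5


# The kernel {phi | phi(n0) = 0} is the principal ideal generated by the
# indicator chi of N \ {n0}: every kernel element k satisfies k = k * chi.
def chi(n):
    return Fraction(0 if n == n0 else 1)


def k(n):
    return Fraction(n - n0)   # J(k) = 0, so k lies in the kernel


assert J(k) == 0
assert all(mul(k, chi)(n) == k(n) for n in range(50))
print("evaluation at", n0, "is a homomorphism with principal kernel")
```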


Notice that, for any ξ ∈ R̂, we have ∗ξ = Ĵ(c_ξ) = J(c_ξ) ∈ R̂. In particular, if x ∈ R, then ∗x = J(c_x) = J(x · c_1) = x · J(c_1) = x · 1 = x. Moreover, for every set A ∈ V(R̂), we have that ∗A = {Ĵ(φ) | φ ∈ A^N}. A suitable modification of arguments in [6] proves that the map ∗ satisfies the transfer principle, as well as the other properties of Definition 3.3:

Theorem 4.2. The triple ⟨∗ ; V(R̂) ; V(R̂)⟩ is a single superstructure model of hyper-methods that satisfies the countable saturation property.

More details can be found in [8].

4.2. Construction of a hyper-homomorphism. We define by transfinite induction an increasing κ-sequence of fields ⟨R_α | α < κ⟩ and an increasing κ-sequence of maps ⟨J_α | α < κ⟩ such that, for all α < κ, J_α : (R_α)^N → R_{α+1} is a surjective homomorphism of R-algebras. If the length κ of the chains has uncountable cofinality, e.g. if κ = ω₁, then

(⋃_{α<κ} R_α)^N = ⋃_{α<κ} (R_α)^N.

0} and X⁻ = ∗X \ X⁺. Intuitively, ν⁺ (respectively ν⁻) should be the measure E ↦ ◦∗ν(∗E ∩ X⁺) (respectively, E ↦ −◦∗ν(∗E ∩ X⁻)). Indeed, ν⁺ and ν⁻ are positive and ν = ν⁺ − ν⁻. Unfortunately, there is no a priori reason to believe that ν⁺ and ν⁻ are countably additive, or even measures. (What Loeb did show for this case, and comparable results for the other decompositions, is that if a Jordan decomposition exists, then this construction gives that decomposition.)

Now, suppose we knew Theorem 2.1 to be true. We could take f to be the characteristic function of X⁺; then ν⁺ is E ↦ st ∗∫_{∗E} f d∗ν for this f, and the theorem would guarantee both finiteness and countable additivity of ν⁺.

Similarly, suppose that μ, ν are finite positive measures on (X, 𝒜). Let Π be as above, put X⊥ = {p ∈ Π : ∗μ(p) = 0 and ∗ν(p) > 0}, let f be the characteristic function of X⊥, and g = 1 − f. If ν1 is the measure E ↦ st ∗∫_{∗E} f d∗ν and ν2 is the measure E ↦ st ∗∫_{∗E} g d∗ν, then it is easy to verify that ν1, ν2 form a Lebesgue decomposition: ν = ν1 + ν2, ν2 is absolutely continuous with respect to μ, and ν1 is singular with respect to μ.

Theorem 2.1 evidently makes it possible to convert some nonstandard representations of measures into nonstandard existence theorems for measures. How might the theorem be proved? Suppose first that ν is a positive measure, and f is bounded by M. Let (E_n) be a disjoint sequence of measurable sets, E = ⋃_n E_n, and E^m = ⋃_{n>m} E_n. Then

$$ {}^{*}\!\!\int_{^{*}E} f\,d{}^{*}\nu = {}^{*}\!\!\int_{^{*}E_1} f\,d{}^{*}\nu + {}^{*}\!\!\int_{^{*}E_2} f\,d{}^{*}\nu + \cdots + {}^{*}\!\!\int_{^{*}E_m} f\,d{}^{*}\nu + {}^{*}\!\!\int_{^{*}E^m} f\,d{}^{*}\nu, $$

where

$$ \left| {}^{*}\!\!\int_{^{*}E^m} f\,d{}^{*}\nu \right| \le {}^{*}\!\!\int_{^{*}E^m} |f|\,d{}^{*}\nu \le {}^{*}\!\!\int_{^{*}E^m} M\,d{}^{*}\nu = M\,\nu(E^m) $$

132

DAVID A. ROSS

which → 0 as m → ∞. This proves countable additivity; finiteness is similar, and of course ∗∫_{∗∅} f d∗μ = 0, proving Theorem 2.1 in this case.

The inequalities in the above argument do not necessarily hold for a general signed measure. Suppose, however, that we know independently that the total variation measure |μ| exists. Clearly |∗∫_{∗A} f d∗μ| ≤ ∗∫_{∗A} |f| d∗|μ|, which, combined with the above argument, completes the proof. In the next subsection I give a new nonstandard proof for the existence of the Hahn decomposition, which of course gives the Jordan decomposition and the total variation, and so completes the proof of Theorem 2.1.

2.1. Nonstandard proof of Hahn decomposition. Let (X, 𝒜, μ) be a signed measure space. In this section I show how to obtain the Hahn decomposition {A, A′} of (X, 𝒜, μ). The advantage of this nonstandard proof is that it follows Loeb's natural representation given above (and exploits our prior knowledge that ∗A must differ from the internal set X⁺ by an internal null set). The proof takes the form of a sequence of propositions. First, a standard fact about signed measures.

Proposition 2.2. μ is continuous from above; that is, if E₀ ⊇ E₁ ⊇ E₂ ⊇ ⋯ is a sequence of measurable sets with E = ⋂_n E_n, then lim_{n→∞} μ(E_n) = μ(E).

Proof. Put A_n = E_n \ E_{n+1}. Then for any n ∈ ℕ, E_n = E ∪ ⋃_{m≥n} A_m, and the latter union is disjoint, so μ(E_n) = μ(E) + Σ_{m≥n} μ(A_m). Since μ(E₀) is finite, Σ_{m≥0} μ(A_m) must exist, so Σ_{m≥n} μ(A_m) → 0 as n → ∞; it follows that μ(E_n) → μ(E). □

Now, let Π, X⁺, X⁻ be as in Section 2 above. The next two propositions show that ∗μ on X⁺ is (externally) finite, by assuming the negation and extracting a standard set of infinite measure.

Proposition 2.3. Suppose A ∈ 𝒜 and ∗μ(∗A ∩ X⁺) is (externally) infinite. Then for some B ⊆ A, μ(B) < μ(A) − 1 and ∗μ(∗B ∩ X⁺) is infinite.

Proof. There are two cases:
1. For some C ⊆ A with μ(C) > 1, ∗μ(∗(A \ C) ∩ X⁺) is infinite. Then put B = A \ C, and note μ(B) = μ(A) − μ(C) < μ(A) − 1.
2. Otherwise, for every measurable C ⊆ A with μ(C) > 1, ∗μ(∗(A \ C) ∩ X⁺) is finite. It follows that if n ∈ ℕ⁺ and C₁, C₂, …, C_n are measurable subsets of A with μ(C_i) > 1 for each i, then ∗μ(∗(A \ (C₁ ∩ ⋯ ∩ C_n)) ∩ X⁺) is finite. This means that ∗μ(∗(C₁ ∩ ⋯ ∩ C_n) ∩ X⁺) is infinite, and by transfer there is a D ⊆ C₁ ∩ ⋯ ∩ C_n with μ(D) > n. Inductively define a nested decreasing sequence C_n by: C₁ ⊆ A with μ(C₁) > 1, which exists by transfer since ∗μ(∗A ∩ X⁺) > 1; and C_{n+1} ⊆ C_n = C₁ ∩ ⋯ ∩ C_n with μ(C_{n+1}) > n + 1. Then, by Proposition 2.2, μ(⋂_n C_n) = ∞, which cannot happen since μ is finite. □

Proposition 2.4. ∗μ(X⁺) is finite.

NONSTANDARD MEASURE CONSTRUCTIONS

133

(Note that this means that ∗μ(X⁻) is finite as well.)

Proof. Otherwise inductively let A₀ = X and A_{n+1} ⊆ A_n with μ(A_{n+1}) […] (X) − 1/2^n. […] Put A = ⋂_n ⋃_{m≥n} A_m.

If E ⊆ A_n, then μ(E) ≥ ∗μ(∗E ∩ X⁻) ≥ ∗μ(∗A_n ∩ X⁻) > −1/2^n. Then if E ⊆ ⋃_{m≥n} A_m,

$$ \mu(E) \ge \mu(E\cap A_n) + \sum_{m\ge n}\mu\bigl(E\cap(A_{m+1}\setminus A_m)\bigr) > -\frac{1}{2^n} - \sum_{m\ge n}\frac{1}{2^{m+1}} = -\frac{1}{2^{n-1}}. $$

It follows that if E ⊆ A then μ(E) ≥ 0. Now, suppose that E ⊆ A′_m; then μ(E) ≤ ∗μ(∗E ∩ X⁺) ≤ ∗μ(X⁺ \ ∗A_m) < 1/2^m. It follows that if E ⊆ ⋂_{m≥n} A′_m then μ(E) ≤ 0, and finally if E ⊆ A′ = ⋃_n ⋂_{m≥n} A′_m then μ(E) = lim_n μ(E ∩ ⋂_{m≥n} A′_m) ≤ 0.

§3. S-measures, conditional expectation, and the Radon-Nikodym theorem. Internal partitions can also be used to represent the derivatives of measures. Suppose that ν and μ are finite measures, and ν ≪ μ. Let Ã and Π be as in Section 2, and define f : ∗X → ∗ℝ by f(x) = ∗ν(p)/∗μ(p) if x ∈ p ∈ Π and ∗μ(p) > 0, and f(x) = 0 otherwise. Then f behaves very much like the Radon-Nikodym derivative dν/dμ; in particular, if A ∈ 𝒜 then

$$ \nu(A) = {}^*\nu({}^*A) = \sum\{{}^*\nu(p) : p\in\Pi,\ p\subseteq{}^*A\} = \sum\{{}^*\nu(p) : p\in\Pi,\ p\subseteq{}^*A,\ {}^*\mu(p)>0\} = \sum\Bigl\{{}^*\!\!\int_p f\,d{}^*\mu : p\in\Pi,\ p\subseteq{}^*A,\ {}^*\mu(p)>0\Bigr\} = {}^*\!\!\int_{{}^*A} f\,d{}^*\mu. $$

(In fact, for any A ∈ Ã, ∗ν(A) = ∗∫_A f d∗μ.) Transformation of this representation into a proof of the Radon-Nikodym Theorem (Theorem 3.2, below) has proved surprisingly difficult. Nonstandard proofs for that theorem to date have followed a different approach (such as Luxemburg's proof [16] using a result of Riesz).

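The identity ν(A) = ∗∫_{∗A} f d∗μ is, at bottom, a bookkeeping fact about partitions, so it can be checked on a finite set, where "hyperfinite" becomes "finite" and no standard parts are needed. The Python sketch below is only such a finite illustration under that assumption (the helpers `cell_ratio_density` and `integral`, and the toy weights, are hypothetical): it builds f = ν(p)/μ(p) cell by cell and confirms that the identity holds for unions of cells, but can fail for a set that cuts across a cell.

```python
from fractions import Fraction as F

def cell_ratio_density(partition, nu, mu):
    """Finite analogue of f: on each cell p, f = nu(p)/mu(p) if mu(p) > 0, else 0.

    partition: list of cells (lists of points); nu, mu: dict point -> weight.
    """
    f = {}
    for cell in partition:
        mu_p = sum((mu[x] for x in cell), F(0))
        nu_p = sum((nu[x] for x in cell), F(0))
        value = nu_p / mu_p if mu_p > 0 else F(0)
        for x in cell:
            f[x] = value
    return f

def integral(f, mu, A):
    """Integral of f over A with respect to the point-weighted measure mu."""
    return sum((f[x] * mu[x] for x in A), F(0))

# Toy data on X = {0, ..., 5}: nu vanishes on the mu-null cell {4, 5}, so nu << mu.
partition = [[0, 1], [2, 3], [4, 5]]
mu = {0: F(1, 4), 1: F(1, 4), 2: F(1, 4), 3: F(1, 4), 4: F(0), 5: F(0)}
nu = {0: F(3, 5), 1: F(0), 2: F(1, 10), 3: F(3, 10), 4: F(0), 5: F(0)}
f = cell_ratio_density(partition, nu, mu)

print(integral(f, mu, [0, 1, 2, 3]), sum(nu[x] for x in [0, 1, 2, 3]))  # 1 1
print(integral(f, mu, [0]), nu[0])  # 3/10 3/5, since {0} is not a union of cells
```

The second line mirrors the remark that the exact identity is claimed only for sets in Ã, that is, for unions of cells of Π.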

In this section I give a natural nonstandard proof of the Radon-Nikodym Theorem. It exploits a construction that C. Ward Henson and Frank Wattenberg first developed for a proof of Egoroff's Theorem ([6]).

3.1. The S-algebra. Suppose 𝒜 is an algebra on X. There are two natural σ-algebras on ∗X: the Loeb algebra 𝒜_L, defined as the smallest (normally external) σ-algebra containing ∗𝒜, and the algebra 𝒜_S, defined as the smallest σ-algebra extending {∗A : A ∈ 𝒜}. 𝒜_S will be no larger, and often will be smaller, than 𝒜_L. If μ is a measure on (X, 𝒜), then denote by μ_L the Loeb measure on (∗X, 𝒜_L), obtained from ∗μ in the usual way. μ_L is of course also defined on 𝒜_S. Henson and Wattenberg [6] proved the following extraordinary, though quite simple, result connecting 𝒜-measurability and 𝒜_S-measurability.

Theorem 3.1. Suppose G : ∗X → ℝ is 𝒜_S-measurable, and g : X → ℝ is defined by g(x) = G(x). Then
1. g is 𝒜-measurable;
2. μ_L({x ∈ ∗X : ∗g(x) ≉ G(x)}) = 0.

Note in particular that if E ∈ 𝒜_S, then S(E) = X ∩ E (the set of standard elements of E) is measurable: take G to be the characteristic function of E; then g is the characteristic function of S(E). Moreover,

$$ \mu(S(E)) = \mu_L({}^*S(E)) = \int {}^*g\,d\mu_L = \int G\,d\mu_L = \mu_L(E). $$

3.2. Radon-Nikodym. Recall the statement of the Radon-Nikodym Theorem:

Theorem 3.2. Suppose μ, ν are σ-finite measures on a measurable space (X, 𝒜), and ν ≪ μ. Then there is a g ∈ L¹(X, 𝒜, μ) such that for any A ∈ 𝒜, ν(A) = ∫_A g dμ. Such a g is called the Radon-Nikodym derivative of ν with respect to μ, and is denoted by dν/dμ.

To prove this theorem, note first that we may assume that μ and ν are finite: if {A_n}_n (respectively {B_n}_n) is a countable measurable partition of X with μ(A_n) < ∞ (respectively ν(B_n) < ∞), then μ and ν are both finite on any A_n ∩ B_m, and if g_{mn} is dν/dμ on this set, then Σ_{m,n} g_{mn} is evidently the Radon-Nikodym derivative of ν with respect to μ for the whole space. In fact, by renormalizing we may suppose that μ is a probability measure. Now consider the internal function f as defined at the beginning of Section 3. We proceed as follows:
1. Show f is finite μ_L-almost everywhere; it follows that °f is defined a.e., and is 𝒜_L-measurable.
2. Show that f is S-integrable (see [20]); this ensures that ∫_{∗A} °f dμ_L ≈ ∗∫_{∗A} f d∗μ = ν(A) for A ∈ 𝒜.


3. Put G = E[°f | 𝒜_S], the conditional expectation of °f with respect to 𝒜_S. This exists because °f is in L¹(μ_L) by S-integrability. Note that some proofs for the existence of the conditional expectation rely on the Radon-Nikodym derivative; to avoid circularity, one must appeal to a proof that does not. A proof using only completeness of L² can be found in [21]. I offer a different proof below in Subsection 3.3 (where I also review the definition of E).
4. Let g be the restriction of G to X; by Theorem 3.1, g is 𝒜-measurable.
5. Show that g is in fact the Radon-Nikodym derivative.

For n ∈ ∗ℕ denote by [f < n] the internal set {x ∈ ∗X : f(x) < n}, and put [f < ∞] = {x ∈ ∗X : f(x) finite} = ⋃_{n∈ℕ} [f < n]. To prove (1), suppose instead that μ_L([f < ∞]) < 1 − r, with r > 0 standard; then ∗μ([f < n]) < 1 − r for each n ∈ ℕ, and by overspill ∗μ([f < H]) < 1 − r for some infinite H. Then

$$ \infty > \nu(X) = {}^*\!\!\int f\,d{}^*\mu \ge {}^*\!\!\int_{[f\ge H]} f\,d{}^*\mu \ge H\,{}^*\mu([f\ge H]) > rH, $$

a contradiction since r is standard and positive.

For (2), recall ([20]) that it suffices to prove that for any infinite H, ∗∫_{[f>H]} f d∗μ ≈ 0. Let Π_H = {p ∈ Π : p ⊆ [f > H]} = {p ∈ Π : ∗ν(p) > H∗μ(p) > 0}; then

$$ {}^*\!\!\int_{[f>H]} f\,d{}^*\mu = \sum_{p\in\Pi_H} {}^*\!\!\int_p f\,d{}^*\mu = \sum_{p\in\Pi_H} {}^*\!\!\int_p \frac{{}^*\nu(p)}{{}^*\mu(p)}\,d{}^*\mu = {}^*\nu([f>H]). $$

Suppose now (for a contradiction) that ∗ν([f > H]) > r > 0 for some standard r. Note that ∗μ([f > H]) ≈ 0, since H∗μ([f > H]) ≤ ∗ν([f > H]) ≤ ν(X). By transfer, for any n ∈ ℕ there is a set B_n ∈ 𝒜 with ν(B_n) > r and μ(B_n) < 2^{−n}. Put B^n = ⋃_{m≥n} B_m. Then μ(B^n) < Σ_{m≥n} 2^{−m} = 2^{−n+1} and ν(B^n) > r, so μ(⋂_n B^n) = 0 but ν(⋂_n B^n) ≥ r, contradicting ν ≪ μ.

It remains to show that the g we have constructed works. Let A ∈ 𝒜; then

$$ \int_A g\,d\mu = \int_{{}^*A} {}^\circ({}^*g)\,d\mu_L = \int_{{}^*A} G\,d\mu_L = \int_{{}^*A} E[{}^\circ f\mid\mathcal A_S]\,d\mu_L = \int_{{}^*A} {}^\circ f\,d\mu_L \ \ (\text{since } {}^*A\in\mathcal A_S) \approx {}^*\!\!\int_{{}^*A} f\,d{}^*\mu \ \ (\text{by S-integrability}) = {}^*\nu({}^*A) = \nu(A), $$

as desired.

3.3. Conditional expectation.

Theorem 3.3. Suppose (X, 𝒜, μ) is a probability space, that ℬ ⊆ 𝒜 is another σ-algebra, and that f ∈ L¹(X, 𝒜, μ). There is a ℬ-measurable function g : X → ℝ such that for any B ∈ ℬ, ∫_B f dμ = ∫_B g dμ.

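When ℬ is generated by a finite partition of a finite space, the function g of Theorem 3.3 can be written down directly: on each block of positive measure it is the μ-average of f over that block. The Python sketch below illustrates only this elementary special case (the helper names and the toy data are hypothetical); it is not the supremum construction used in the proof, which is what the general case requires.

```python
from fractions import Fraction as F

def conditional_expectation(blocks, f, mu):
    """E[f | B] when B is generated by a finite partition `blocks` of a finite space.

    On each block b with mu(b) > 0 the value is the mu-average of f over b;
    on mu-null blocks the value is arbitrary (0 here), since E[f | B] is only
    determined almost everywhere.
    """
    g = {}
    for b in blocks:
        mass = sum((mu[x] for x in b), F(0))
        avg = sum((f[x] * mu[x] for x in b), F(0)) / mass if mass > 0 else F(0)
        for x in b:
            g[x] = avg
    return g

def integral(h, mu, B):
    """Integral of h over B with respect to the point-weighted measure mu."""
    return sum((h[x] * mu[x] for x in B), F(0))

mu = {x: F(1, 4) for x in range(4)}   # uniform probability on {0, 1, 2, 3}
f = {0: 1, 1: 3, 2: 2, 3: 2}
blocks = [[0, 1], [2, 3]]
g = conditional_expectation(blocks, f, mu)

# Defining property: the integrals of f and g agree on every B in the generated algebra.
for B in ([], [0, 1], [2, 3], [0, 1, 2, 3]):
    assert integral(f, mu, B) == integral(g, mu, B)
print(g)  # {0: Fraction(2, 1), 1: Fraction(2, 1), 2: Fraction(2, 1), 3: Fraction(2, 1)}
```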

The function g is called the conditional expectation of f on ℬ, and is denoted by E[f|ℬ]. For completeness, I now give a proof of Theorem 3.3 not using the Radon-Nikodym Theorem.

Suppose first that f ≥ 0. Put 𝒢 = {g ∈ L¹(X, ℬ, μ) : ∀B ∈ ℬ ∫_B g dμ ≤ ∫_B f dμ}. Note that if g₁, g₂ ∈ 𝒢 then max{g₁, g₂, 0} ∈ 𝒢. Let r = sup{∫_X g dμ : g ∈ 𝒢}, and for n ∈ ℕ let g_n ∈ 𝒢 with ∫_X g_n dμ > r − 1/2^n; we may assume that 0 ≤ g₁ ≤ g₂ ≤ ⋯. Put g = sup_n g_n. Note that if B ∈ ℬ, then ∫_B g dμ = sup_n ∫_B g_n dμ ≤ ∫_B f dμ, so g ∈ 𝒢. Moreover, ∫_X g dμ = r. It remains to show that for any B ∈ ℬ, ∫_B f dμ = ∫_B g dμ. Suppose not; then there is a B ∈ ℬ and δ > 0 with ∫_B f dμ − ∫_B g dμ = δ. Then

$$ \int_X f\,d\mu - \int_X g\,d\mu = \Bigl(\int_B f\,d\mu - \int_B g\,d\mu\Bigr) + \Bigl(\int_{B'} f\,d\mu - \int_{B'} g\,d\mu\Bigr) \ge \delta + 0 > 0. $$

It follows that for some ε > 0, ∫_X (f − (ε + g)) dμ > 0, that is, ∫_X g dμ + εμ(X) < ∫_X f dμ. Let λ be the signed measure E ↦ ∫_E (f − (ε + g)) dμ on ℬ, and let {A, A′} be the Hahn decomposition of λ, with A the positive set. Put g′ = g + εI_A, where I_A is the characteristic function of A. Note that (i) ∫_X g′ dμ = ∫_X g dμ + εμ(A) > r (μ(A) > 0 because λ(A) ≥ λ(X) > 0), and (ii) for every B ∈ ℬ,

$$ \int_B g'\,d\mu = \int_{B\cap A} g'\,d\mu + \int_{B\cap A'} g'\,d\mu = \int_{B\cap A} (g+\varepsilon)\,d\mu + \int_{B\cap A'} g\,d\mu \le \int_{B\cap A} (g+\varepsilon)\,d\mu + \int_{B\cap A'} f\,d\mu \le \int_{B\cap A} f\,d\mu + \int_{B\cap A'} f\,d\mu = \int_B f\,d\mu, $$

which means that g′ ∈ 𝒢. However, (i) and (ii) cannot both be true, so in fact ∫_B f dμ = ∫_B g dμ. This proves the theorem for nonnegative f. The general result follows by taking E[f|ℬ] = E[f⁺|ℬ] − E[f⁻|ℬ], where f⁺ = max{f, 0} and f⁻ = f⁺ − f.

§4. Lyapunov theorem. In this section we consider the well-known theorem of Lyapunov, which asserts that the range of a finite, atomless, finite-dimensional vector measure is convex.

Theorem 4.1. Let μ₁, …, μ_n be finite atomless measures on (X, ℬ). Then {(μ₁E, …, μ_nE) : E ∈ ℬ} is a convex subset of ℝⁿ. Equivalently, ∀r ∈ (0, 1) ∃E ∈ ℬ ∀i ≤ n μ_i(E) = rμ_i(X).

(Recall that μ is atomless provided that for any B ∈ ℬ with μ(B) > 0, there is an A ∈ ℬ with A ⊆ B and 0 < μ(A) < μ(B).) There are many standard proofs, though the simplest is certainly [18].


Note that the theorem fails for countable families of measures. For example, suppose X = [0, 1] with the usual Borel algebra, {(r_n, s_n)}_n is an enumeration of all subintervals of [0, 1] with rational endpoints, and μ_n is the restriction of Lebesgue measure λ to (r_n, s_n) (but defined on all of X). Then there is no E such that μ_n(E) = (s_n − r_n)/2 for all n; otherwise, since the Borel algebra is generated by the intervals (r_n, s_n), such an E would satisfy λ(E ∩ F) = λ(F)/2 for every Borel set F, and in particular

λ(E) = λ(E ∩ E) = λ(E)/2,

so λ(E) = 0, contradicting λ(E) = λ(E ∩ (0, 1)) = 1/2.

The interest of Theorem 4.1 for nonstandard analysis is twofold. First, in the nonstandard setting there are straightforward analogues of the result for infinite families. For example, if {μ_n}_{n∈I} is an internal, hyperfinite, infinite family of ∗atomless ∗measures, then by transfer Theorem 4.1 holds for {μ_n}_{n∈I}, and therefore for the external infinite family {(μ_n)_L}_{n∈I}. If instead {(μ_n)_L}_{n∈I} is an external family of finite atomless Loeb measures with a small index set (that is, the nonstandard model is card(I)⁺-saturated), then a straightforward saturation argument shows that Lyapunov's Theorem holds for this family as well.

Of more fundamental interest is that the Lyapunov result would seem to be a 'test case' for the representation of standard measures by nonstandard ones. In the context of internal measures defined on a hyperfinite set, Peter Loeb [13] has proved the following discrete Lyapunov Theorem.

Theorem 4.2 (Loeb, 1973). Let μ₁, …, μ_n be S-finite internal measures on (Ω, ∗P(Ω)), where Ω is ∗finite and μ_i({ω}) ≈ 0 for all ω ∈ Ω. Then ∀r ∈ (0, 1) ∃E ∈ ∗P(Ω) ∀i ≤ n μ_i(E) ≈ rμ_i(Ω).

The proof is a quick and clever application of a deep and difficult classical theorem of E. Steinitz from 1913. The clear application of this result would be a nonstandard proof for Theorem 4.1 along the following lines:
1. By considering the functions dμ_i/d(μ₁ + μ₂ + ⋯ + μ_n), it suffices to consider measures on X = [0, 1]ⁿ.
2. If Ω is a hyperfinite set S-dense in [0, 1]ⁿ, then each μ_i can be obtained as the image under the standard part map of a Loeb measure μ_{iL} on Ω.
3. By Theorem 4.2, ∀r ∈ (0, 1) ∃Ẽ ∈ ∗P(Ω) ∀i ≤ n μ_i(Ẽ) ≈ rμ_i(Ω).
4. 'Push' Ẽ down to a set E ⊆ X.

Only the last step in this list is not straightforward; unfortunately, there seems to be no good way to do this, and a nonstandard proof for Theorem 4.1 from Theorem 4.2 remains elusive. In particular, for general Ẽ the two obvious pushdowns, namely °Ẽ and X ∩ Ẽ, might be either X or ∅ or almost anything in between. By the results of Henson and Wattenberg discussed above, it would suffice to somehow ensure that Ẽ ∈ 𝒜_S, but neither the construction in [13] nor the one below suggests how this can be done.


The remainder of this section is devoted to a new direct proof for Theorem 4.2. The underlying idea of this proof has recently been adapted by the author to an easy standard proof for Theorem 4.1 (see [18]), but a direct proof of the standard result from Theorem 4.2 still seems to me to be of interest.

Proof. Induct on (standard) n.

(n = 1) Write Ω = {ω₁, …, ω_H}, and let N be ∗-least such that μ₁({ω₁, …, ω_N}) ≥ rμ₁(Ω). Put E = {ω₁, …, ω_N}. Then |μ₁(E) − rμ₁(Ω)| ≤ μ₁({ω_N}) ≈ 0.

(n ⇒ n + 1) First, suppose (for simplicity) that μ_{n+1,L} ≪ Σ_{i≤n} μ_{iL}. For any finite k ∈ ℕ there is (by induction) an internal partition Ω₁, …, Ω_k of Ω such that

$$ \forall i\le n,\ j\le k \qquad \mu_{iL}(\Omega_j) = \frac{1}{k}\,\mu_{iL}(\Omega). $$

Then for any I ⊆ {1, …, k} and i ≤ n,

$$ \left| \frac{\mu_i\bigl(\bigcup_{j\in I}\Omega_j\bigr)}{\mu_i(\Omega)} - \frac{\#I}{k} \right| < \frac{1}{k^2}. \tag{$*$} $$

By overspill, for some infinite k ∈ ∗ℕ and ∗-partition Ω₁, …, Ω_k of Ω, (∗) holds for every internal I ⊆ {1, …, k} and i ≤ n. Note that μ_{n+1,L}(Ω_j) = 0 for every j ≤ k. (Otherwise μ_{n+1,L} would not be absolutely continuous with respect to Σ_{i≤n} μ_{iL}, since by (∗) μ_{iL}(Ω_j) = 0 for every i ≤ n.) Without loss of generality,

$$ \mu_{n+1}(\Omega_1) \le \mu_{n+1}(\Omega_2) \le \cdots \le \mu_{n+1}(\Omega_k). \tag{1} $$

Let R ≤ k be such that R/k ≈ r. Consider the sets

Λ₁ = Ω₁ ∪ ⋯ ∪ Ω_R
Λ₂ = Ω₁ ∪ ⋯ ∪ Ω_{R−1} ∪ Ω_k
Λ₃ = Ω₁ ∪ ⋯ ∪ Ω_{R−2} ∪ Ω_{k−1} ∪ Ω_k
⋮
Λ_{R+1} = Ω_{k−R+1} ∪ ⋯ ∪ Ω_k.

By assumption (1), μ_{n+1}(Λ₁) ≤ μ_{n+1}(Λ₂) ≤ ⋯ ≤ μ_{n+1}(Λ_{R+1}). Moreover,

$$ \mu_{n+1}(\Lambda_1) \le \frac{R}{k}\,\mu_{n+1}(\Omega) \le \mu_{n+1}(\Lambda_{R+1}). $$

Consecutive sets Λ_j and Λ_{j+1} differ by exchanging a single Ω_i, and each μ_{n+1}(Ω_i) ≈ 0, so for some j ≤ R + 1, μ_{n+1}(Λ_j) ≈ (R/k)μ_{n+1}(Ω) ≈ rμ_{n+1}(Ω). Each Λ_j is a union of exactly R of the sets Ω_i, so by (∗) also μ_i(Λ_j) ≈ rμ_i(Ω) for i ≤ n; hence E = Λ_j will work.

It remains to consider the case where μ_{n+1,L} is not absolutely continuous with respect to Σ_{i≤n} μ_{iL}. In this case there is (by the Lebesgue decomposition; see Section 2) a set B such that μ_{n+1,L} ≪ Σ_{i≤n} μ_{iL} on B and Σ_{i≤n} μ_{iL}(Ω \ B) = 0. We may take B internal. Apply the above argument to the restriction of μ_{n+1} to B to find an internal set E₀ ⊆ Ω with μ_i(E₀) ≈ rμ_i(Ω) for i ≤ n and μ_{n+1}(E₀ ∩ B) ≈ rμ_{n+1}(B). Apply the n = 1 case to find an internal E₁ ⊆ Ω \ B such that μ_{n+1}(E₁) ≈ rμ_{n+1}(Ω \ B). Then E := (E₀ ∩ B) ∪ E₁ works. □
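The two combinatorial steps of this proof, the ∗-least N in the case n = 1 and the monotone chain Λ₁, …, Λ_{R+1}, can be imitated with finitely many exact rational weights. In the Python sketch below (function names hypothetical), `least_prefix` returns the least N whose prefix sum reaches r times the total, with overshoot at most the last weight added, and `window_masses` lists the masses of the sets Λ_j for pieces already sorted by mass as in (1). It does not construct the partition Ω₁, …, Ω_k, which in the proof comes from the induction hypothesis and overspill.

```python
from fractions import Fraction as F
from itertools import accumulate

def least_prefix(weights, r):
    """Least N with weights[0] + ... + weights[N-1] >= r * total (the n = 1 step).

    The overshoot is at most weights[N-1], the finite analogue of mu_1({omega_N}) ~ 0.
    """
    target = r * sum(weights)
    for N, s in enumerate(accumulate(weights), start=1):
        if s >= target:
            return N
    return len(weights)  # not reached for nonempty nonnegative weights and 0 <= r <= 1

def window_masses(m, R):
    """Masses of Lambda_1, ..., Lambda_{R+1}, where m lists piece masses in nondecreasing order.

    Lambda_j is the union of the first R - j + 1 pieces and the last j - 1 pieces,
    so every Lambda_j uses exactly R pieces (this needs R <= len(m)).
    """
    k = len(m)
    return [sum(m[:R - j + 1]) + sum(m[k - j + 1:]) for j in range(1, R + 2)]

w = [F(1, 10)] * 10
N = least_prefix(w, F(1, 3))
print(N, sum(w[:N]) - F(1, 3))     # 4 1/15

m = list(range(1, 11))             # sorted masses of ten pieces, total 55
masses = window_masses(m, 3)
print(masses)                      # [6, 13, 20, 27]
target = F(3, 10) * sum(m)
j = next(i for i, s in enumerate(masses, start=1) if s >= target)
print(j, masses[j - 1] - target)   # 3 7/2
```

The chain is nondecreasing and each step changes the mass by at most the largest piece, which is the finite shadow of the claim that some Λ_j has μ_{n+1}(Λ_j) ≈ rμ_{n+1}(Ω).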


§5. Vitali-Hahn-Saks, characterizations of σ-additivity, and category. One often obtains a nonstandard measure as a limiting element of a sequence of approximations. In this case, the limit measure is automatically σ-additive.

Theorem 5.1. Let μ_n be finite measures on (X, ℬ), and suppose the function

$$ \mu(E) := \operatorname{st}\mu_H({}^*E) \qquad (E\in\mathcal B) $$

does not depend on the choice of H ∈ ∗ℕ \ ℕ. Then μ is σ-additive.

In fact, this is just a restatement of a standard result:

Theorem 5.2 (Vitali-Hahn-Saks). Let μ_n be σ-additive finite measures on (X, ℬ), and suppose μ(E) = lim_{n→∞} μ_n(E) exists for all E ∈ ℬ. Then μ is a (σ-additive) measure.

Note that μ is obviously a finitely-additive measure; the problem is to show that it is in fact σ-additive. Standard proofs for this result generally take two forms. One is as a consequence of the Baire category theorem. I discuss this kind of proof in Section 5.4 below, since a general form of this argument seems to invite a nonstandard treatment. The other usual proof is a so-called "gliding hump" argument. The nonstandard proof starting in Section 5.1 is essentially such an argument; the advantage of the nonstandard formulation is that the problem becomes one of explicitly locating mass at infinity, and it leads naturally to some new nonstandard characterizations of σ-additivity.

However, I first consider a natural, illuminating, but false Loeb-measure proof for Theorem 5.2. Suppose (for a contradiction) that μ is not countably additive; there is then a nested decreasing sequence of sets A₀ ⊇ A₁ ⊇ A₂ ⊇ ⋯ with ⋂_n A_n = ∅ but μ(A_n) ↛ 0. This means that r = μ_L(A_∞) > 0, where A_∞ = ⋂_{n∈ℕ} ∗A_n. In fact, (μ_n)_L(A_∞) = r for every infinite n. On the other hand, for each standard n, (μ_n)_L(A_∞) = 0 since μ_n is countably additive. (This would seem to contradict overspill, but of course does not, since (μ_n)_L and A_∞ are external.) Put ν = Σ_n 2^{−n}μ_n, which one easily verifies is countably additive. For any E, if ν(E) = 0 then μ_n(E) = 0 for all n, so μ(E) = 0. In other words, μ ≪ ν. To obtain the desired contradiction, it would suffice to infer from μ ≪ ν that μ_L ≪ ν_L. However, at this point the proof fails, because this natural inference fails. In fact, it holds only if we already know that μ is countably additive:

Theorem 5.3. Suppose that (X, ℬ, ν) is a finite measure space, that μ is a finite, finitely-additive measure on (X, ℬ), and that for any E ∈ ℬ, if ν(E) = 0 then μ(E) = 0. The following are equivalent: (1) μ is countably additive; (2) μ_L ≪ ν_L.


Proof. (1 → 2) Suppose for a contradiction that ν_L(E) = 0 but μ_L(E) = r > 0 for some E in the Loeb algebra ℬ_L. For every n ∈ ℕ there exists a set E_n ∈ ∗ℬ with E ⊆ E_n and ∗ν(E_n) < 1/2^n; of course, ∗μ(E_n) > r/2. By transfer, there is an A_n ∈ ℬ with ν(A_n) < 1/2^n and μ(A_n) > r/2. Put A = ⋂_{n∈ℕ} ⋃_{m>n} A_m. Since ν(⋃_{m>n} A_m) ≤ Σ_{m>n} 1/2^m = 1/2^n, ν(A) = 0. However, μ(⋃_{m>n} A_m) > r/2 for every n, so by (1) μ(A) ≥ r/2 > 0, contradicting the hypothesis that ν(E) = 0 implies μ(E) = 0.

(2 → 1) Let A_n ∈ ℬ be a decreasing sequence with ⋂_n A_n = ∅. Then ν_L(∗A_n) = ν(A_n) → 0 as n → ∞ by countable additivity of ν, so ν_L(⋂_{n∈ℕ} ∗A_n) = 0. By (2), μ_L(⋂_{n∈ℕ} ∗A_n) = 0, so μ(A_n) = μ_L(∗A_n) → 0 as n → ∞ by countable additivity of μ_L. This proves (1). □

5.1. Point masses on ℕ. I now give a criterion for countable additivity for measures (starting with point masses) on ℕ, and use it to give a proof of Theorem 5.2 on that space. The theory of finitely-additive, {0, 1}-valued measures (i.e., ultrafilters) on (ℕ, P(ℕ)) is well understood both standardly and nonstandardly. Nevertheless, a brief reminder will be useful for later arguments.

Proposition 5.4. Let μ be a finitely-additive measure on (ℕ, P(ℕ)) taking only the values 0 and 1. For some H ∈ ∗ℕ, for every (standard) A ⊆ ℕ, μ(A) = 1 if and only if H ∈ ∗A.

Proof. Let 𝒜 = {∗A : A ⊆ ℕ, μ(A) = 1}. If ∗A, ∗B ∈ 𝒜 then by finite additivity μ(A ∩ B) = μ(A) + μ(B) − μ(A ∪ B) = 1 + 1 − 1 = 1; it follows that 𝒜 has the finite intersection property, and by saturation we can take H ∈ ⋂𝒜. If A ⊆ ℕ then μ(A) = 1 ⟺ ∗A ∈ 𝒜 ⟺ H ∈ ∗A, as desired. □

If μ, H are as in the last proposition, write μ = μ_H. Note that in general the choice of H is not unique; however:

Theorem 5.5. Let μ be a finitely-additive measure on (ℕ, P(ℕ)) taking only the values 0 and 1. Then the following are equivalent:
1. For some (standard) H ∈ ℕ, μ = μ_H.
2. For a unique H ∈ ∗ℕ, μ = μ_H.
3. μ is σ-additive.
4. μ_L(∗ℕ \ ℕ) = 0.

Proof. (1 → 2) H standard means {H} = ∗{H} and ∗ℕ \ {H} = ∗(ℕ \ {H}), so μ({H}) = 1 and ∗μ(∗ℕ \ {H}) = μ(ℕ \ {H}) = 0. However, if J ≠ H then J ∈ ∗ℕ \ {H}, so μ ≠ μ_J.

(2 → 1) Suppose (2) holds, and (for a contradiction) H ∈ ∗ℕ \ ℕ. Let 𝒜 be as in the proof of Proposition 5.4; it suffices to show that ⋂𝒜 contains an element other than H, i.e. that 𝒜 ∪ {∗ℕ \ {H}} has the finite intersection property. Since 𝒜 is already closed under finite intersections, it suffices to show that any element of 𝒜 contains an element other than H. However, if ∗A ∈ 𝒜 then it follows from H ∈ ∗ℕ \ ℕ and transfer that A is infinite, so ∗A ≠ {H}.

(1 → 4) If μ = μ_H, H finite, then {H} = ∗{H}, so μ_L(∗{H}) = μ({H}) = 1 and μ_L(∗ℕ \ ℕ) ≤ μ_L(∗ℕ \ {H}) = 0.

(4 → 3) Assume (4), and suppose A_n ⊆ ℕ is a nested sequence of sets decreasing to ∅. Note that for any A ⊆ ℕ, ℕ ∩ ∗A = A. It follows that

μ(A_n) = μ_L(∗A_n) = μ_L(ℕ ∩ ∗A_n) (by (4)) = μ_L(A_n),

which tends to 0 as n → ∞ since Loeb measure is countably additive.

(3 → 1) Suppose that μ ≠ μ_H for each standard H; then μ({H}) = 0 for each H ∈ ℕ. By countable additivity, μ(ℕ) = 0, a contradiction. □

The Vitali-Hahn-Saks Theorem for {0, 1}-valued measures on ℕ is now straightforward:

Theorem 5.6. Let μ_n be σ-additive measures on (ℕ, P(ℕ)) taking only the values 0 and 1. Suppose μ(E) = lim_{n→∞} μ_n(E) exists for all E ⊆ ℕ. Then μ is a σ-additive measure.

Proof. If μ is not σ-additive then μ = μ_H for some H ∈ ∗ℕ \ ℕ. Let A = {N ∈ ℕ : ∃n ∈ ℕ μ_n = μ_N}; for every n there is an N ∈ A with μ_n = μ_N. Thus μ_n(A) = 1 for every n, so μ(A) = 1. As H ∈ ∗A and H is infinite, A is infinite, so A can be partitioned into two disjoint infinite sets C and D. H is in either ∗C or ∗D but not both; for definiteness, assume H ∈ ∗D. Then μ(C) = 0; but infinitely many of the μ_n are of the form μ_N with N ∈ C, so μ(C) = lim_{n→∞} μ_n(C) = 1, a contradiction. □

5.2. Measures on ℕ. I now generalize the above to arbitrary finite measures (not just point masses) on ℕ. The analogue of Proposition 5.4 is the following.

Proposition 5.7. Let μ be a finite, finitely-additive measure on (ℕ, P(ℕ)). For some p : ∗ℕ → ∗[0, ∞), μ(A) = ∗Σ{p(a) : a ∈ ∗A} for every (standard) A ⊆ ℕ.

Proof. Let Ã ⊆ ∗P(ℕ) be hyperfinite with ∗A ∈ Ã for all A ⊆ ℕ, and let Π be the corresponding ∗-partition of ∗ℕ. If n ∈ ∗ℕ, let π(n) be the element of Π containing n, and let p(n) = ∗μ(π(n)) if n is the least element of π(n), otherwise p(n) = 0. Note that for n standard, π(n) = {n} and p(n) = μ({n}). If A ⊆ ℕ then

$$ \mu(A) = \sum_{\pi\in\Pi,\ \pi\subseteq{}^*A} {}^*\mu(\pi) = \sum_{n\in{}^*\mathbb N,\ \pi(n)\subseteq{}^*A} p(n) = {}^*\!\!\sum_{n\in{}^*A} p(n), $$

as desired. □

If μ, p are as in the last proposition, write μ = μ_p. By abuse of notation, we will write μ_p instead of μ_{∗p} when p : ℕ → [0, ∞) is standard. Note that in general the choice of p is not unique; however:

Theorem 5.8. Let μ be a finite, finitely-additive measure on (ℕ, P(ℕ)). Then the following are equivalent:
1. For some (standard) p : ℕ → [0, ∞), μ = μ_p.
2. μ is σ-additive.
3. μ_L(∗ℕ \ ℕ) = 0.


Before proving the theorem it is worth observing that this is closely related to the Radon-Nikodym Theorem. In particular, in the implication (2 → 1) the function p is simply the Radon-Nikodym derivative of μ with respect to counting measure on ℕ.

Proof. Note first that if p is standard and A ∈ P(ℕ), then ∗Σ{∗p(a) : a ∈ ∗A} = Σ{p(a) : a ∈ A} by transfer.

(1 → 3) If μ = μ_p, p standard, then

$$ \mu_L({}^*\mathbb N) = {}^\circ\bigl({}^*\mu({}^*\mathbb N)\bigr) = \mu(\mathbb N) = \sum_{n\in\mathbb N} p(n) = \sum_{n\in\mathbb N}\mu_L(\{n\}) = \mu_L(\mathbb N) $$

(the last equality since μ_L is σ-additive), so μ_L(∗ℕ \ ℕ) = 0.

(3 → 2) Assume (3), and suppose A_n ⊆ ℕ is a nested sequence of sets decreasing to ∅. Note that for any A ⊆ ℕ, ℕ ∩ ∗A = A. It follows that

μ(A_n) = μ_L(∗A_n) = μ_L(ℕ ∩ ∗A_n) (by (3)) = μ_L(A_n),

which tends to 0 as n → ∞ since Loeb measure is countably additive.

(2 → 1) Put p(n) = μ({n}) for n ∈ ℕ. If A ⊆ ℕ then μ(A) = Σ_{n∈A} μ({n}) (by σ-additivity) = Σ_{n∈A} p(n) = ∗Σ_{n∈∗A} ∗p(n). □

The proof of the Vitali-Hahn-Saks Theorem for measures on ℕ is now similar to that for Theorem 5.6. The idea is to find for each n a 'piece' of the measure space that acts like the support, but also travels off to infinity as n → ∞, and use this as a substitute for the supports H from Theorem 5.6.

Theorem 5.9. Let μ_n be σ-additive probability measures on (ℕ, P(ℕ)). Suppose μ(E) = lim_{n→∞} μ_n(E) exists for all E ⊆ ℕ. Then μ is a σ-additive measure.

Proof. Suppose (for a contradiction) that μ is not σ-additive; then ε =

μ_L(∗ℕ \ ℕ) > 0. For each n ∈ ℕ there is (by σ-additivity of μ_n) a least a_n with μ_n[0, a_n] > 1 − ε, and a least b_n ≥ a_n with μ_n[a_n, b_n] > 3ε/4. Put A = {a_n : n ∈ ℕ} and I_n = [a_n, b_n]. Note that A is unbounded; for, if A ⊆ [0, N] for a standard N, then (as μ(ℕ) = 1)

$$ \mu([0,N]) = \lim_{n\to\infty}\mu_n([0,N]) \ge \lim_{n\to\infty}\mu_n([0,a_n]) > 1-\varepsilon = \mu_L(\mathbb N) \ge \mu_L([0,N]) = \mu([0,N]). $$

By passing if necessary to a subsequence, we may assume that a₁ < b₁ < a₂ < b₂ < a₃ < b₃ < ⋯. Put

C = ⋃_n I_{2n},  D = ⋃_n I_{2n+1},  C_∞ = ∗C \ ℕ = ⋃_{n∈∗ℕ\ℕ} I_{2n},  D_∞ = ∗D \ ℕ = ⋃_{n∈∗ℕ\ℕ} I_{2n+1}.

It suffices to show that μ_L(C_∞), μ_L(D_∞) ≥ 3ε/4, since C_∞ and D_∞ are disjoint and ε = μ_L(∗ℕ \ ℕ) ≥ μ_L(C_∞ ∪ D_∞). For N ∈ ℕ put C^N = ⋃_{n∈ℕ, n>N} I_{2n}, and note that C_∞ = ⋂_{N∈ℕ} ∗C^N. Then

$$ \mu_L({}^*C^N) = \mu(C^N) = \lim_{n\to\infty}\mu_n(C^N) = \lim_{n\to\infty,\,n>N}\mu_{2n}(C^N) \ge \lim_{n\to\infty,\,n>N}\mu_{2n}(I_{2n}) \ge \frac{3\varepsilon}{4}, $$

and by countable additivity of μ_L, μ_L(C_∞) ≥ 3ε/4. The proof for D_∞ is similar. □
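The mechanism exploited in this proof, mass escaping to infinity in "humps" that are then split into alternate blocks C and D, can be seen in a toy example that is not from the text: μ_n = (1 − ε)δ₀ + εδ_n. The Python sketch below (function names hypothetical) evaluates μ_n on C = positive even numbers and D = positive odd numbers. The values on C alternate between 0 and ε, so lim μ_n(C) does not exist, which is exactly the kind of failure the hypothesis of Theorem 5.9 rules out.

```python
from fractions import Fraction as F

def mu_n(n, E, eps):
    """Toy probability measure (1 - eps)*delta_0 + eps*delta_n evaluated on E.

    E is a membership predicate on the natural numbers; the mass eps sits at n,
    so it 'escapes to infinity' as n grows.
    """
    return (1 - eps) * (1 if E(0) else 0) + eps * (1 if E(n) else 0)

def in_C(x):
    """Alternate humps: positive even numbers (playing the role of C)."""
    return x > 0 and x % 2 == 0

def in_D(x):
    """The complementary humps: positive odd numbers (playing the role of D)."""
    return x > 0 and x % 2 == 1

eps = F(1, 4)
values = [mu_n(n, in_C, eps) for n in range(1, 7)]
print([str(v) for v in values])  # ['0', '1/4', '0', '1/4', '0', '1/4']
```

Together C and D always carry the escaping mass ε, but neither carries it in the limit, mirroring the lower bounds μ_L(C_∞), μ_L(D_∞) ≥ 3ε/4 in the argument above.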


5.3. General measures. The Vitali-Hahn-Saks Theorem for general finite measures (Theorem 5.2) is an immediate consequence of Theorem 5.9. Let μ_n be σ-additive finite measures on (X, ℬ), and suppose μ(E) = lim_{n→∞} μ_n(E) exists for all E ∈ ℬ. Since μ_n(X) → μ(X), we may if necessary divide each μ_n by μ_n(X), and μ by μ(X), in order to assume that each μ_n is a probability measure. Suppose (for a contradiction) that μ is not countably additive; then there is a nested decreasing sequence of sets X = A₀ ⊇ A₁ ⊇ A₂ ⊇ ⋯ with ⋂_n A_n = ∅ but μ(A_n) ↛ 0. For n, k ∈ ℕ let p_k(n) = μ_k(A_n \ A_{n+1}) and p(n) = μ(A_n \ A_{n+1}); it is easy to see that the corresponding measures E ↦ μ_k(⋃_{n∈E}(A_n \ A_{n+1})) and E ↦ μ(⋃_{n∈E}(A_n \ A_{n+1})) on (ℕ, P(ℕ)) satisfy the hypotheses, but not the conclusion, of Theorem 5.9. This completes the proof.

Finally, it should be noted that Theorem 5.8 can be applied to general measures, by considering countable measurable partitions, as in the above proof. If (X, ℬ, μ) is a finitely-additive measure space and f : X → ℕ is measurable, define a measure μ_f on (ℕ, P(ℕ)) by μ_f(A) = μ(f⁻¹[A]). The proof of the following is straightforward and standard, so is omitted.

Proposition 5.10. Suppose (X, ℬ, μ) is a finitely-additive measure space. The following are equivalent: (i) μ is σ-additive; (ii) for every measurable f : X → ℕ, μ_f is σ-additive.

5.4. Category. Another way to prove Theorem 5.2 is by a Baire category argument. More generally, one can prove:

Theorem 5.11. Let X and Y be complete metric spaces, f_n : X → Y continuous functions, and suppose f(x) := lim_{n→∞} f_n(x) exists for all x ∈ X. Then for some x ∈ X, f is continuous at x.

The proof is a straightforward application of the Baire category theorem. To see that this implies Theorem 5.2, let μ_n be σ-additive finite measures on (X, ℬ), and suppose μ(E) = lim_{n→∞} μ_n(E) exists for all E ∈ ℬ. Let ν be the measure Σ_n μ_n/2^n; then ℬ (modulo ν-null sets) can be completely metrized by d(A, B) = ν(A △ B). Since μ_n ≪ ν, each μ_n is continuous on ℬ, so that by Theorem 5.11 μ is continuous at a point; that is, for some A ∈ ℬ, if A₁ ⊇ A₂ ⊇ A₃ ⊇ A₄ ⊇ ⋯ and A = ⋂_n A_n, then inf_n μ(A_n) = μ(A). Now, if B₁ ⊇ B₂ ⊇ B₃ ⊇ B₄ ⊇ ⋯ and ⋂_n B_n = ∅, then consideration of A_n = A ∪ B_n shows inf_n μ(B_n) = 0; that is, μ is σ-additive.²

There has been some interesting, but largely unsuccessful, effort in the past to use nonstandard analysis to better understand category arguments; see, for example, [2]. This seems to be an area of mathematics that deserves more nonstandard consideration.

§6. Problems. I conclude with some open problems connected to the above results.

² The author first saw this nice proof in a Usenet post by Ronald Bruck.


6.1. Countable additivity. Theorems 2.1, 5.1, and 5.3 give conditions sufficient (and sometimes necessary) for a measure to be countably additive. In these cases there are other measures to which the constructed one is related. A natural question is whether there is an intrinsic property of a nonstandardly-constructed measure which guarantees countable additivity. For example, suppose (as in Section 1.1) that (X, ℬ) is a measurable space, Ω is a hyperfinite sample from X, and μ(A) = °(|∗A ∩ Ω|/|Ω|) for every A ∈ ℬ. Karel Hrbacek has shown ([7]) that the following natural condition is sufficient for μ to be countably additive: for every sequence {A_n}_{n∈ℕ} ⊆ ℬ and infinite H ∈ ∗ℕ, if X ∩ A_H = ∅ then |∗A_H ∩ Ω|/|Ω| ≈ 0. The condition is not, however, necessary; in fact, Hrbacek has shown that the condition is false for atomless measures and sufficiently saturated nonstandard models. For example, let X = [0, 1], let Ω be any sample such that μ is Lebesgue measure, and let {A_n}_n be an enumeration of the finite unions of subintervals of X with rational endpoints. For any finite E ⊆ X and standard ε > 0 there is an A_n such that (1) A_n ∩ E = ∅ and (2) |∗A_n ∩ Ω|/|Ω| > 1 − ε (just take an A_n satisfying (1) and μ(A_n) > 1 − ε). By saturation there is an A_H with |∗A_H ∩ Ω|/|Ω| ≈ 1 and x ∉ A_H for every standard x ∈ X. So the question of finding a useful intrinsic characterization for countable additivity of μ is still open. ("Useful" here means that it should lead to new, essentially nonstandard proofs for theorems such as Theorem 5.2.)

6.2. Lyapunov's theorem. There are several natural questions related to Lyapunov's theorem:
1. Find a way to infer Lyapunov's Theorem directly from Loeb's discrete version.
2. More generally, find an essentially nonstandard proof of Lyapunov's Theorem.
3. Obtain a deeper standard result from the combination of Steinitz's Theorem and the nonstandard machinery.
The goal of such problems would not necessarily be to find a simpler proof for Lyapunov's theorem than the simplest standard proofs, as the latter are not likely to be bested, but rather to further explore and understand the power of the nonstandard machinery.

6.3. Vitali-Hahn-Saks. As with Lyapunov's theorem, the Vitali-Hahn-Saks theorem (and related discussion) leads to some interesting questions for nonstandard methods:
1. Find a more direct nonstandard proof of Theorem 5.2. The first proof sketched in Section 5 would qualify if it were correct. The proof given in the subsequent sections, while motivated by the essentially


nonstandard proof for Theorem 5.6, is not completely satisfying in this regard, as it is conceptually little different from standard gliding hump arguments. A similar problem is:
2. Find an essentially nonstandard proof of Theorem 5.11. A nonstandard proof of this, especially one that manages to avoid standard category arguments, could be a significant step forward in the nonstandard understanding of category arguments.

Another consequence of Theorem 5.11 is the Banach-Steinhaus (or uniform boundedness) theorem for families of linear operators:

Theorem 6.1 (Banach-Steinhaus). Let {T_n}_{n∈ℕ} be a family of bounded linear operators on a Banach space X, and suppose that for every x ∈ X, {T_n(x)}_{n∈ℕ} is bounded. Then sup{‖T_n(x)‖ : n ∈ ℕ, x ∈ X, ‖x‖ ≤ 1} < ∞.

A variant on the last two problems would be:
3. Find an essentially nonstandard proof of the Banach-Steinhaus Theorem.

REFERENCES

[1] A.R. Bernstein and F. Wattenberg, Nonstandard measure theory, International symposium on applications of model theory to algebra, analysis, and probability, Holt, Rinehart and Winston, New York, 1969, pp. 171–185.
[2] A.R. Bernstein and F. Wattenberg, Cardinality-dependent properties of topological spaces, Victoria symposium on nonstandard analysis (Univ. Victoria, Victoria, B.C., 1972), Lecture Notes in Math., vol. 369, Springer, Berlin, 1974, pp. 50–59.
[3] Sergio Fajardo and H. Jerome Keisler, Model theory of stochastic processes, Lecture Notes in Logic, vol. 14, Association for Symbolic Logic, Urbana, IL, 2002.
[4] D.H. Fremlin, Compact measure spaces, Mathematika, vol. 46 (1999), no. 2, pp. 331–336.
[5] C.W. Henson, Følner condition ⟹ amenability, Transactions of the American Mathematical Society, vol. 172 (1972), pp. 437–446.
[6] C.W. Henson and F. Wattenberg, Egoroff's theorem and the distribution of standard points in a nonstandard model, Proceedings of the American Mathematical Society, vol. 81 (1981), no. 3, pp. 455–461.
[7] K. Hrbacek, Countably additive measures in nonstandard analysis, handwritten manuscript, 1995.
[8] Renling Jin and H. Jerome Keisler, Maharam spectra of Loeb spaces, The Journal of Symbolic Logic, vol. 65 (2000), no. 2, pp. 550–566.
[9] Teturo Kamae, A simple proof of the ergodic theorem using nonstandard analysis, Israel Journal of Mathematics, vol. 42 (1982), no. 4, pp. 284–290.
[10] H. Jerome Keisler and Yeneng Sun, Loeb measures and Borel algebras, Reuniting the antipodes: constructive and nonstandard views of the continuum (Venice, 1999), Synthese Lib., vol. 306, Kluwer Acad. Publ., Dordrecht, 2001, pp. 111–117.
[11] Peter A. Loeb, A nonstandard representation of measurable spaces and L∞, Bulletin of the American Mathematical Society, vol. 77 (1971), pp. 540–544.
[12] Peter A. Loeb, A non-standard representation of measurable spaces, L∞, and L∗∞, Contributions to non-standard analysis (Sympos., Oberwolfach, 1970), Studies in Logic and Found. Math., vol. 69, North-Holland, Amsterdam, 1972, pp. 65–80.
[13] Peter A. Loeb, A combinatorial analog of Lyapunov's theorem for infinitesimally generated atomic vector measures, Proceedings of the American Mathematical Society, vol. 39 (1973), pp. 585–586.


DAVID A. ROSS

[14] Peter A. Loeb, Conversion from nonstandard to standard measure spaces and applications in probability theory, Transactions of the American Mathematical Society, vol. 211 (1975), pp. 113–122.
[15] W.A.J. Luxemburg, A general theory of monads, Applications of model theory to algebra, analysis, and probability (Internat. Sympos., Pasadena, Calif., 1967), Holt, Rinehart and Winston, New York, 1969, pp. 18–86.
[16] W.A.J. Luxemburg, On some concurrent binary relations occurring in analysis, Contributions to non-standard analysis (Sympos., Oberwolfach, 1970), Studies in Logic and Found. Math., vol. 69, North-Holland, Amsterdam, 1972, pp. 85–100.
[17] Werner Rinkewitz, Compact measures and measurable weak sections, Mathematica Scandinavica, vol. 91 (2002), no. 1, pp. 150–160.
[18] David A. Ross, An elementary proof of Lyapunov's theorem, American Mathematical Monthly, accepted for publication.
[19] David A. Ross, Compact measures have Loeb preimages, Proceedings of the American Mathematical Society, vol. 115 (1992), no. 2, pp. 365–370.
[20] David A. Ross, Loeb measure and probability, Nonstandard analysis (Edinburgh, 1996), NATO Adv. Sci. Inst. Ser. C Math. Phys. Sci., vol. 493, Kluwer Acad. Publ., Dordrecht, 1997, pp. 91–120.
[21] David Williams, Probability with martingales, Cambridge Mathematical Textbooks, Cambridge University Press, Cambridge, 1991.

DEPARTMENT OF MATHEMATICS
UNIVERSITY OF HAWAII
HONOLULU, HI 96822, USA

URL: http://www.math.hawaii.edu/~ross
URL: http://www.infinitesimals.org

INVERSE PROBLEM FOR UPPER ASYMPTOTIC DENSITY II

RENLING JIN

Abstract. Inverse problems study the structure of a set $A$ when $A+A$ is "small". In this article, the structure of an infinite set $A$ of natural numbers is described when $A+A$ has the least possible upper asymptotic density and $A$ contains two consecutive numbers. For example, if the upper asymptotic density $\alpha$ of $A$ is between 0 and $\frac12$, the upper asymptotic density of $A+A$ is less than or equal to $\frac32\alpha$, and $A$ contains two consecutive numbers, then either $A$ is a large subset of the union of two arithmetic sequences with the same common difference $k=\frac2\alpha$, or, for any increasing sequence $h_n$ of positive integers such that the relative density of $A$ in $[0,h_n]$ approaches $\alpha$, the set $A\cap[0,h_n]$ can be partitioned into two parts $A\cap[0,c_n]$ and $A\cap[b_n,h_n]$ such that $c_n/h_n$ approaches 0, i.e. the cardinality of $A\cap[0,c_n]$ is relatively very small, and $(h_n-b_n)/h_n$ approaches $\alpha$, i.e. the cardinality of $A\cap[b_n,h_n]$ is relatively the same as the cardinality of the interval $[b_n,h_n]$.

§1. Introduction. Let $\mathbb N$ be the set of all natural numbers, including 0. $A$ and $B$ will always denote sets of natural numbers, and $a, b, c, h, i, j, k, m, n, x, y, z$ will always denote natural numbers. For any $m$, $n$ and $A$, we write $[m,n]$ for the set $\{k\in\mathbb N : m\le k\le n\}$, $A[m,n]$ for the set $A\cap[m,n]$, and $A(m,n)$ for the number of elements in $A[m,n]$. Sometimes we write $A(1,n)$ as $A(n)$. For any $A$ and $B$ we write $A\pm B$ for the set $\{a\pm b : a\in A \text{ and } b\in B\}$, and $2A$ for $A+A$. For a set $A$ and a number $b$ we write $A\pm b$ for $A\pm\{b\}$. In this paper we often use $(2A)(m,n)$ for $|\{x\in 2A : m\le x\le n\}|$ and $2A(m,n)$ for 2 times $A(m,n)$. We write a.p. as an abbreviation for arithmetic progression. The upper asymptotic density of a set $A$ is defined by
$$\bar d(A)=\limsup_{n\to\infty}\frac{A(n)}{n}.$$
The main results of this paper describe the structural properties of $A$ when $A$ contains two consecutive numbers and
$$\bar d(2A)=\inf\{\bar d(2B) : B \text{ contains two consecutive numbers and } \bar d(B)\ge\bar d(A)\}.$$
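To make the counting notation concrete, here is a minimal Python sketch for finite sets. The helper names `count` and `sumset` are our own illustrations, not notation from the paper. The example shows the difference between $(2A)(m,n)$, which counts elements of the sumset, and $2A(m,n)$, which doubles a count; recall also that $A(n)$ abbreviates $A(1,n)$, i.e. `count(A, 1, n)`.

```python
def count(A, m, n):
    """A(m, n): the number of elements of A in the interval [m, n]."""
    return sum(1 for a in A if m <= a <= n)

def sumset(A, B):
    """A + B = {a + b : a in A and b in B}; 2A is sumset(A, A)."""
    return {a + b for a in A for b in B}

A = {0, 1, 5, 6}
two_A = sumset(A, A)            # {0, 1, 2, 5, 6, 7, 10, 11, 12}
print(count(two_A, 0, 6))       # (2A)(0, 6), elements of 2A in [0, 6]: 5
print(2 * count(A, 0, 6))       # 2A(0, 6), twice A(0, 6): 8
```

The two quantities differ in general, which is why the paper keeps the parentheses in $(2A)(m,n)$.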

2000 Mathematics Subject Classification. Primary 11B05, 11B13, 11U10, 03H15.
Key words and phrases. Upper asymptotic density, inverse problem, nonstandard analysis.
The author was supported in part by the NSF grant DMS–#0070407.

Nonstandard Methods and Applications in Mathematics
Edited by N. J. Cutland, M. Di Nasso, and D. A. Ross
Lecture Notes in Logic, 25
© 2006, Association for Symbolic Logic


To motivate the main result, we quote a few sentences from the preface of the book [9]: “The classical problems in additive number theory are direct problems, in which we start with a set A of integers and proceed to describe the h-fold sumset hA, that is, the set of all sums of h elements of A. In an inverse problem, we begin with the sumset hA and try to deduce information about the underlying set A. In the last few years, there has been remarkable progress in the study of inverse problems for finite sets in additive number theory. There are important inverse theorems due to Freiman, Kneser, Plünnecke, Vosper, and others. In particular, Ruzsa recently discovered a new method to prove a generalization of Freiman's theorem.” Although the results in this paper are not directly related to Freiman's Theorem mentioned above, they share the same pattern: if $2A$ is small, then $A$ must have some structure. In fact, the idea of an inverse problem also occurs in some theorems involving densities. The theorems about Shnirel'man pairs and Mann's pairs in [4] deduce information about the Shnirel'man densities of $A$ and of $B$ when the Shnirel'man density of $A+B$ is small. Kneser's Theorem (cf. [3, 1]) deduces information about $A+B$, which gives information about $A$ and $B$, when the lower asymptotic density of $A+B$ is small. In [7], inverse problems for upper asymptotic density are considered: we describe the structural properties of $A$ when the upper asymptotic density of $2A+\{0,1\}$ is small. However, adding $\{0,1\}$ to $2A$ seems to be a non-traditional condition, and the result would be more interesting if this condition could be replaced by one more natural to number theorists. Why do we need to add $\{0,1\}$ to $2A$ in the first place in [7]? Let $\alpha=\bar d(A)$ and $a_0=\min A$. Using Lemma 1.5, one can prove the following. If $\alpha\le\frac12$ and $\gcd(A-a_0)$, the greatest common divisor of all numbers in $A-a_0$, is one, then $\bar d(2A)\ge\frac32\alpha$.
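If $\alpha>\frac12$, then $\bar d(2A)\ge\frac{1+\alpha}{2}$. Note that the two inequalities above are optimal: there exist two kinds of sets which witness that equality holds.

Example 1.1. For any real number $0\le\alpha\le 1$, let
$$A=\bigcup_{n=1}^{\infty}\left[\left\lfloor(1-\alpha)2^{2^n}\right\rfloor,\ 2^{2^n}\right].$$
Then $\bar d(A)=\alpha$, and
$$\bar d(2A)=\bar d(2A+\{0,1\})=\begin{cases}\frac32\alpha & \text{if } \alpha\le\frac12,\\[2pt] \frac{1+\alpha}{2} & \text{if } \alpha\ge\frac12.\end{cases}$$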
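Example 1.1 can be checked numerically at the checkpoints $x=N_n$ and $x=2N_n$, where $N_n=2^{2^n}$. The sketch below rests on our own assumptions: it stores $A$ as a list of integer intervals (so the huge blocks never have to be enumerated), forms $2A$ as all pairwise interval sums, and counts the union. It only evaluates the ratios at these checkpoints and does not compute the lim sup itself; the function names are hypothetical.

```python
import math
from fractions import Fraction

def example_1_1_blocks(alpha, n_max):
    """Blocks [floor((1 - alpha) * N_n), N_n] with N_n = 2**(2**n), n = 1..n_max."""
    blocks = []
    for n in range(1, n_max + 1):
        N = 2 ** (2 ** n)
        blocks.append((math.floor((1 - alpha) * N), N))
    return blocks

def interval_sumset(intervals):
    """A + A when A is a union of integer intervals [lo, hi]."""
    return [(a + c, b + d) for (a, b) in intervals for (c, d) in intervals]

def count_up_to(intervals, x):
    """Number of integers in [0, x] covered by a union of integer intervals."""
    total, cur_lo, cur_hi = 0, None, None
    for lo, hi in sorted((lo, min(hi, x)) for lo, hi in intervals if lo <= x):
        if cur_hi is not None and lo <= cur_hi + 1:
            cur_hi = max(cur_hi, hi)          # overlapping or adjacent: merge
        else:
            if cur_hi is not None:
                total += cur_hi - cur_lo + 1  # flush the finished run
            cur_lo, cur_hi = lo, hi
    if cur_hi is not None:
        total += cur_hi - cur_lo + 1
    return total

if __name__ == "__main__":
    for alpha in (Fraction(1, 3), Fraction(3, 4)):
        A = example_1_1_blocks(alpha, 5)
        N = A[-1][1]
        dens_A = Fraction(count_up_to(A, N), N + 1)
        dens_2A = Fraction(count_up_to(interval_sumset(A), 2 * N), 2 * N + 1)
        print(alpha, float(dens_A), float(dens_2A))
```

For $\alpha=\frac13$ the second ratio should be close to $\frac32\alpha=\frac12$, and for $\alpha=\frac34$ close to $\frac{1+\alpha}{2}=\frac78$, because the earlier blocks are negligible compared with $N_5=2^{32}$.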

Example 1.2. Let $k,m,n\in\mathbb N$ be such that $k\ge 4$ and $2m$, $2n$, $m+n$ are pairwise incongruent modulo $k$. Let $A=\{m+ik : i\in\mathbb N\}\cup\{n+ik : i\in\mathbb N\}$. Then $\bar d(A)=\frac2k=\alpha\le\frac12$ and $\bar d(2A)=\frac3k=\frac32\alpha$. One can choose $k$, $m$, $n$ such that $\gcd(A-a_0)=1$. Note that $\bar d(2A+\{0,1\})\ge 2\alpha$.
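A quick finite check of Example 1.2 with our own choice $k=5$, $m=0$, $n=1$: then $2m=0$, $2n=2$, $m+n=1$ are pairwise incongruent modulo 5, and $A$ contains the consecutive numbers 0 and 1, so $\gcd(A-a_0)=1$. Truncating at 1999 (so that $[0,1999]$ contains exactly 400 numbers in each residue class) makes the ratios exact. This is only a sanity check of the stated densities; the helper names are hypothetical.

```python
from fractions import Fraction

def two_progressions(k, m, n, limit):
    """A = {m + ik : i in N} ∪ {n + ik : i in N}, truncated to [0, limit]."""
    return {x for x in range(limit + 1)
            if (x >= m and (x - m) % k == 0) or (x >= n and (x - n) % k == 0)}

def sumset_upto(X, Y, limit):
    """Elements of X + Y that are at most limit."""
    return {x + y for x in X for y in Y if x + y <= limit}

def ratio(S, limit):
    """|S ∩ [0, limit]| / (limit + 1) as an exact fraction."""
    return Fraction(sum(1 for s in S if s <= limit), limit + 1)

k, m, n, limit = 5, 0, 1, 1999      # 2m = 0, 2n = 2, m + n = 1 are distinct mod 5
A = two_progressions(k, m, n, limit)
two_A = sumset_upto(A, A, limit)
print(ratio(A, limit), ratio(two_A, limit), ratio(sumset_upto(two_A, {0, 1}, limit), limit))
# → 2/5 3/5 4/5
```

The three printed values are $\alpha=\frac25$, $\frac32\alpha=\frac35$, and $2\alpha=\frac45$, matching the example.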


We believe that if $A$ is a set with positive upper asymptotic density such that $\gcd(A-a_0)=1$ and the upper asymptotic density of $2A$ reaches its smallest possible value, then $A$ should be similar to the set in Example 1.1 or to the set in Example 1.2. If one requires that $\bar d(2A+\{0,1\})=\frac32\alpha$ when $\bar d(A)=\alpha\le\frac12$, then $A$ cannot be similar to the set in Example 1.2. Hence one needs only to show that $A$ is similar to the set in Example 1.1, as was done in [7]. This greatly simplifies the proof. Besides, adding $\{0,1\}$ to the set $2A$ makes it possible to apply Besicovitch's Theorem [3, page 6] in the proof of [7, Lemma 2.1]. Without adding $\{0,1\}$, one must not only consider the possibility that $A$ is similar to the set in Example 1.2, but also find a new proof that bypasses Besicovitch's Theorem.

Of course, a new condition must then be added. The ideal condition would be $\gcd(A-a_0)=1$. However, so far we are unable to derive the conclusion of Part II of Theorem 1.3 under this condition. Instead, we add a condition slightly stronger than $\gcd(A-a_0)=1$, namely that the set $A$ contains two consecutive numbers, and we leave the case $\gcd(A-a_0)=1$ as an unsolved question near the end of this paper. Next, we state the main theorem of this paper.

Theorem 1.3. Let $A$ be a set of natural numbers with $\bar d(A)=\alpha>0$.

Part I: Assume $\alpha>\frac12$. Then $\bar d(2A)=\frac{1+\alpha}{2}$ implies that for any increasing sequence $\langle h_n : n\in\mathbb N\rangle$ with $\lim_{n\to\infty}\frac{A(0,h_n)}{h_n+1}=\alpha$, one has
$$\lim_{n\to\infty}\frac{(2A)(0,h_n)}{h_n+1}=\alpha.$$

Part II: Assume $\alpha<\frac12$ and $A$ contains two consecutive numbers. Then $\bar d(2A)=\frac32\alpha$ implies that either
(a) there exist $k$ and $c$ such that $\alpha=\frac2k$ and $A\subseteq\{c+ik : i\in\mathbb N\}\cup\{c+1+ik : i\in\mathbb N\}$, or
(b) for any increasing sequence $\langle h_n : n\in\mathbb N\rangle$ with $\lim_{n\to\infty}\frac{A(0,h_n)}{h_n+1}=\alpha$, there exist two sequences $0\le c_n\le b_n\le h_n$ such that
$$\lim_{n\to\infty}\frac{A(b_n,h_n)}{h_n-b_n+1}=1,\qquad\lim_{n\to\infty}\frac{c_n}{h_n}=0,$$
and $[c_n,b_n]\cap A=\emptyset$ for every $n\in\mathbb N$.

Part III: Assume $\alpha=\frac12$ and $A$ contains two consecutive numbers. Then $\bar d(2A)=\frac32\alpha$ implies that either
(a) there exists $c\in\{0,1,2,3\}$ such that $A\subseteq\{c+4i : i\in\mathbb N\}\cup\{c+1+4i : i\in\mathbb N\}$


or (b) for any increasing sequence $\langle h_n : n\in\mathbb N\rangle$ with $\lim_{n\to\infty}\frac{A(0,h_n)}{h_n+1}=\alpha$, one has
$$\lim_{n\to\infty}\frac{(2A)(0,h_n)}{h_n+1}=\alpha.$$

Remark 1.4.
(1) If $\bar d(A)>\frac12$, then $A$ automatically contains two consecutive numbers.
(2) The proof of Part I of Theorem 1.3 is easy and will be omitted. See [7, (2) of the remarks in Section 4] for an explanation.
(3) Given a set $A\subseteq[0,n]$, the set $2A$ is a subset of $[0,2n]$. The structural property of $A$ in Part I of Theorem 1.3 says that $(2A)(0,h_n)$ is close to $A(0,h_n)$ relative to $h_n$. Hence the growth of $(2A)(0,2h_n)$ occurs mostly in $[h_n,2h_n]$.
(4) In Part I and Part III(b) of Theorem 1.3, one cannot expect $A$ to have a structural property similar to the one in (b) of Part II. See [7, (1) of the remarks in Section 4] for an example.

In the next section, we prove several lemmas needed for the proof of Theorem 1.3. In the third section, the proof of Theorem 1.3 is presented and a corollary is given. Techniques from nonstandard analysis are used in both sections. Although these nonstandard techniques may well be avoidable, we strongly believe that they significantly shorten the proofs. Besides, nonstandard analysis is one of my favorite subjects: it gives me a handy tool and offers me better insight. Since two of Freiman's theorems (cf. [9, Theorem 1.15 and Theorem 1.16, page 28] or [2, Proposition 1.1]) will be cited frequently, we state them as lemmas in this section. The proofs are entirely standard.

Lemma 1.5 (G. Freiman). Let $A=\{a_0,a_1,\dots,a_{k-1}\}$ be such that $0=a_0<a_1<\dots<a_{k-1}=n$ and $\gcd(A)=1$. If $k\le\frac{n+3}{2}$, then $(2A)(0,2n)\ge 3k-3$. If $k\ge\frac{n+3}{2}$, then $(2A)(0,2n)\ge k+n$.

Lemma 1.6 (G. Freiman). Let $A\subseteq\mathbb N$ be such that $|A|=k>2$. If $|2A|=2k-1+b\le 3k-4$, then $A$ is a subset of an a.p. of length $k+b\le 2k-3$.

§2. Lemmas. As mentioned in §1, techniques from nonstandard analysis are needed in §2 and §3. One advantage of nonstandard methods is that an asymptotic argument in the standard world, such as one about upper asymptotic density, can be translated into a $^*$finite argument in a nonstandard world: instead of dealing with a sequence of intervals, we can deal with a single interval of $^*$finite length. For basic knowledge of nonstandard analysis, the reader is referred to [8], [5], or [6]. Throughout this paper we work within a fixed $\aleph_1$-saturated nonstandard universe $^*V$. For each set $A$, write $^*A$ for the nonstandard version of $A$ in $^*V$. For example,

${}^*\mathbb N$ is the set of all natural numbers in $^*V$, and if $A$ is the set of all even numbers in $\mathbb N$, then $^*A$ is the set of all even numbers in ${}^*\mathbb N$. Unless we specify that $A$, $B$ are sets of standard natural numbers, $A$ and $B$ are always assumed to be internal sets of (standard and nonstandard) natural numbers. From now on, $a, b, c, h, i, j, k, m, n, x, y, z$ can take values in ${}^*\mathbb N$. The integers in ${}^*\mathbb N\setminus\mathbb N$ are called hyperfinite integers. The letters $H$, $K$ and $N$ are used exclusively for hyperfinite integers. The Greek letters $\alpha$, $\beta$, $\gamma$, $\delta$, and $\varepsilon$ are reserved exclusively for standard real numbers.

We introduce here some useful notation for comparisons. For any real numbers $r, s$ in $^*V$: by $r\approx s$ we mean that $r-s$ is infinitesimal; by $r\lnapprox s$ ($r\gnapprox s$) we mean that $r<s$ ($r>s$) and $r\not\approx s$; and by $r\lessapprox s$ ($r\gtrapprox s$) we mean that $r<s$ ($r>s$) or $r\approx s$. Given a hyperfinite integer $H$ and two real numbers $r, s$: by $r\sim_H s$ we mean that $\frac{s-r}{H}\approx 0$; by $r\prec_H s$ ($r\succ_H s$) we mean that $r<s$ ($r>s$) and $r\not\sim_H s$; and by $r\precsim_H s$ ($r\succsim_H s$) we mean that $r\prec_H s$ ($r\succ_H s$) or $r\sim_H s$. We say that $a$ is insignificant with respect to $H$ if $a\sim_H 0$. In most cases $H$ is clear from context, so the subscript $H$ will be dropped for convenience.

Note that when $H$ is given, the relations $\lnapprox$, $\gnapprox$, $\approx$, etc. can be expressed in terms of $\prec$, $\succ$, $\sim$, etc., and vice versa. For example, $\frac aH\lessapprox\frac bH$ iff $a\precsim b$. We use $\precsim$ more often than $\lessapprox$ because fractions can then be avoided. When using $\sim$, $\prec$, $\precsim$, etc., insignificant quantities can often be neglected. For example, instead of $A(0,H)\sim\alpha(H+1)$ we can use the equivalent form $A(0,H)\sim\alpha H$. For another example, when $a\le b\le c$, we often write $A(a,c)\sim A(a,b)+A(b,c)$ instead of $A(a,c)=A(a,b)+A(b+1,c)$. For a real number $r\in{}^*\mathbb R$ bounded by a standard real number, let $\mathrm{st}(r)$, the standard part of $r$, be the unique standard real number $\alpha$ such that $r\approx\alpha$.

A set $T$ of three integers is called a crowded triple if $T=\{a,a+1,a+2\}$, $T=\{a,a+1,a+3\}$, or $T=\{a,a+2,a+3\}$ for some integer $a$.
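Lemma 1.5 is quoted without proof. As a sanity check (not a proof), the Python sketch below verifies its inequality by brute force for every set $A$ with $\min A=0$, $\max A=n\le 10$ and $\gcd(A)=1$. It uses that $(2A)(0,2n)=|2A|$ because $2A\subseteq[0,2n]$; the function names are hypothetical helpers, not part of the paper.

```python
from functools import reduce
from itertools import combinations
from math import gcd

def sumset(A):
    """2A = A + A for a finite set A of integers."""
    return {a + b for a in A for b in A}

def check_freiman_3k3(max_n):
    """Brute-force check of Lemma 1.5 for every A with min A = 0, max A = n,
    n <= max_n, and gcd(A) = 1.  Returns the number of sets tested and raises
    AssertionError with the offending set if the inequality ever fails."""
    tested = 0
    for n in range(1, max_n + 1):
        inner = range(1, n)                 # candidates for a_1, ..., a_{k-2}
        for r in range(n):                  # r = k - 2 interior elements
            for middle in combinations(inner, r):
                A = {0, n, *middle}
                if reduce(gcd, A) != 1:
                    continue
                k = len(A)
                size = len(sumset(A))       # equals (2A)(0, 2n) since 2A ⊆ [0, 2n]
                bound = 3 * k - 3 if 2 * k <= n + 3 else k + n
                assert size >= bound, (sorted(A), size, bound)
                tested += 1
    return tested

if __name__ == "__main__":
    print(check_freiman_3k3(10), "sets checked")
```

At $k=\frac{n+3}{2}$ the two bounds $3k-3$ and $k+n$ coincide, so the single `bound` expression covers both cases of the lemma.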
The next lemma shows how upper asymptotic density can be translated into a nonstandard version.

Lemma 2.1. Let $A\subseteq\mathbb N$ and let $0\le\alpha\le 1$. Then $\bar d(A)\ge\alpha$ if and only if there is a hyperfinite integer $H$ such that ${}^*A(0,H)\succsim\alpha H$.

Proof. Left to the reader. The lemma is an easy consequence of the transfer property in nonstandard analysis. ⊣

Lemma 2.2. Suppose $A\subseteq[0,H]$ with $A(0,H)\succ 0$. Then there exists an $a\in[0,H]$ such that $A(0,a)\sim 0$ and $A(a,b)\succ 0$ for any $a\prec b\le H$.

Proof. Let $S=\{\mathrm{st}(\frac xH) : x\in[0,H] \text{ and } A(0,x)\sim 0\}$. Then $S$ is a subset of the standard unit interval containing 0. Let $\beta$ be the least upper bound of $S$ and let $a\in[0,H]$ be such that $\mathrm{st}(\frac aH)=\beta$. Note that $x\sim y$ iff $\mathrm{st}(\frac xH)=\mathrm{st}(\frac yH)$.

Claim 2.2.1. $A(0,a)\sim 0$.


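Proof of Claim 2.2.1. Suppose $A(0,a)\succ 0$. Then there is a standard real number $\varepsilon>0$ such that $\frac{A(0,a)}{H}\approx\varepsilon$. Choose $c\prec a$ such that $\frac{a-c}{H}<\frac\varepsilon2$. Then for any $c\le x\le a$,
$$\frac{A(0,x)}{H}\ge\frac{A(0,a)}{H}-\frac{a-x}{H}\gtrapprox\frac\varepsilon2.$$
So $\mathrm{st}(\frac xH)$ is not in $S$, which contradicts the fact that $\beta$ is the least upper bound of $S$. ⊣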



Claim 2.2.2. For any $a\prec b\le H$, $A(a,b)\succ 0$.

Proof of Claim 2.2.2. If there is a $b\succ a$ with $A(a,b)\sim 0$, then $A(0,b)\sim A(0,a)+A(a,b)\sim 0$. So $\mathrm{st}(\frac bH)$ is in $S$ and is greater than $\beta$. This contradicts the fact that $\beta$ is the least upper bound of $S$. ⊣ ⊣

Lemma 2.3. Let $0\le\alpha\le 1$ and let $A\subseteq[0,H]$ be such that $A(0,H)\precsim\alpha H$. Let $0\prec a\le H$ be such that $A(0,b)\succ\alpha b$ for every $0\prec b\prec a$. Then there is a $c\succsim a$ such that $A(0,c)\sim\alpha c$ and $A(0,b)\succ\alpha b$ for every $0\prec b\prec c$.

Proof. Let $S=\{\mathrm{st}(\frac xH) : x\in[0,H] \text{ and } A(0,x)\precsim\alpha x\}$ and let $\beta$ be the greatest lower bound of $S$. Let $c\in[0,H]$ be such that $\mathrm{st}(\frac cH)=\beta$. Using an argument similar to the proof of Lemma 2.2, one can verify that $c$ is the desired number. ⊣

Lemma 2.4. Suppose $A\subseteq[0,H]$ does not contain any crowded triple. Then
(1) $A(0,H)\precsim\frac12 H$,
(2) $|A+\{a,a+1\}|\succsim\frac32|A|$,
(3) if $T$ is a crowded triple, then $|A+T|\succsim 2|A|$.

Proof. Let $A_1=\{x\in A : x+1\notin A \text{ and } x-1\notin A\}$ and $A_2=A\setminus A_1$. Note that $A_1$ contains all "isolated" numbers in $A$. Since $A$ does not contain any crowded triple, $A_2$ consists of the paired numbers in $A$, and any two different pairs in $A$ are at least two units apart.
(1) If $x\in A_1$, let $f(x)=x+1$. If $\{x,x+1\}\subseteq A_2$, let $f(x)=x+2$ and $f(x+1)=x+3$. It is easy to see that $f : A\to[0,H+2]\setminus A$ is a one-to-one internal function. Hence $A(0,H)\precsim\frac12 H$.
(2) It suffices to show $|A+\{0,1\}|\succsim\frac32|A|$. Clearly $|A_1+\{0,1\}|=2|A_1|$ and $|A_2+\{0,1\}|=\frac32|A_2|$. Since $A$ does not contain any crowded triple, $(A_1+\{0,1\})\cap(A_2+\{0,1\})=\emptyset$. Hence
$$|A+\{0,1\}|=|A_1+\{0,1\}|+|A_2+\{0,1\}|=2|A_1|+\tfrac32|A_2|\ge\tfrac32|A|.$$
(3) Without loss of generality, we assume that $T=\{0,1,2\}$ or $T=\{0,1,3\}$. Let $f$ be the function from $A$ to $[0,H+2]\setminus A$ defined in (1). Then $A_1\cup f[A_1]\subseteq A_1+T$ and $A_2\cup f[A_2]\subseteq A_2+T$. Hence
$$|A+T|\ge|A_1\cup f[A_1]|+|A_2\cup f[A_2]|=2|A_1|+2|A_2|=2|A|.$$
⊣
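The proof of Lemma 2.4 is purely combinatorial, so its finite content can be checked exhaustively for a small standard $H$. In the sketch below (helper names hypothetical) the nonstandard relations are replaced by the exact inequalities the proof actually establishes: $2|A|\le H+3$ from the injection $f : A\to[0,H+2]\setminus A$, $|A+\{0,1\}|\ge\frac32|A|$, and $|A+T|\ge 2|A|$ for each of the three crowded-triple shapes. This is an illustration under those assumptions, not a substitute for the proof.

```python
from itertools import product

CROWDED_SHAPES = ((0, 1, 2), (0, 1, 3), (0, 2, 3))

def has_crowded_triple(A):
    """True if A contains {a, a+1, a+2}, {a, a+1, a+3} or {a, a+2, a+3} for some a."""
    return any(all(a + t in A for t in shape) for a in A for shape in CROWDED_SHAPES)

def sumset(A, B):
    """A + B = {a + b : a in A, b in B}."""
    return {a + b for a in A for b in B}

def check_lemma_2_4(H):
    """Exhaustive finite check of Lemma 2.4 (1)-(3) over all A ⊆ [0, H]
    containing no crowded triple; returns how many such sets were tested."""
    tested = 0
    for bits in product((0, 1), repeat=H + 1):
        A = {x for x, bit in enumerate(bits) if bit}
        if has_crowded_triple(A):
            continue
        assert 2 * len(A) <= H + 3, A                           # (1)
        assert 2 * len(sumset(A, {0, 1})) >= 3 * len(A), A      # (2), no fractions
        for shape in CROWDED_SHAPES:                            # (3)
            assert len(sumset(A, set(shape))) >= 2 * len(A), (A, shape)
        tested += 1
    return tested

if __name__ == "__main__":
    print(check_lemma_2_4(12), "crowded-triple-free subsets of [0, 12] checked")
```

All three shapes are tested in (3), although by reflection symmetry the proof only needs $\{0,1,2\}$ and $\{0,1,3\}$.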




Lemma 2.5. Let $A\subseteq[0,H]$ and $0\le x_1\prec x_2\le H$ be such that
(1) $(2A)(2x_1,2x_2)\succ 3A(x_1,x_2)$,
(2) if $0\prec x_1$, then for some $x\sim x_1$, $A[0,x]$ is not a subset of an a.p. of difference $d\ge 2$ and $A(0,x)\precsim\frac12(x+1)$,
(3) if $x_2\prec H$, then for some $x\sim x_2$, $A[x,H]$ is not a subset of an a.p. of difference $d\ge 2$ and $A(x,H)\precsim\frac12(H-x+1)$.
Then $(2A)(0,2H)\succ 3A(0,H)$.

Proof. By Lemma 1.6, one has $(2A)(0,2x_1)\succsim 3A(0,x_1)$ and $(2A)(2x_2,2H)\succsim 3A(x_2,H)$. Hence
$$(2A)(0,2H)\sim(2A)(0,2x_1)+(2A)(2x_1,2x_2)+(2A)(2x_2,2H)\succ 3A(0,x_1)+3A(x_1,x_2)+3A(x_2,H)\sim 3A(0,H).$$
⊣



The next three lemmas are the main ingredients of the proof of Theorem 1.3.

Lemma 2.6. Let $A\subseteq[0,H]$ be such that
(1) $A\cap\mathbb N$ contains two consecutive numbers,
(2) $A(0,H)\prec\frac12 H$ and $A(0,x)\precsim\frac12 x$ for any $0\prec x\le H$,
(3) $(2A)(0,2H)\sim 3A(0,H)$,
(4) there exists a $k_0\prec H$ such that $A(k,H)\succ\frac12(H-k)$ for every $k_0\prec k\prec H$.
Then there exist $0\le c\prec b\prec H$ such that $c\sim 0$, $[c,b]\cap A=\emptyset$, and $A(b,H)\sim H-b$.

Proof. By (1) we can, without loss of generality, assume $0,1\in A$. By Lemma 2.3, one can choose $k_0$ such that $A(k_0,H)\sim\frac12(H-k_0)$. By (2), one has $k_0\succ 0$.

Claim 2.6.1. $(2A)(k_0+H,2H)\sim H-k_0\sim 2A(k_0,H)$.

Proof of Claim 2.6.1. Let $k_0+H\prec x\prec 2H$. By (4) applied to $k=x-H$, we have $A(x-H,H)\succ\frac12(2H-x)$. Since $A[x-H,H]\subseteq[x-H,H]$, $x-A[x-H,H]\subseteq[x-H,H]$, and the interval $[x-H,H]$ has only $2H-x+1$ elements, it follows that $A[x-H,H]\cap(x-A[x-H,H])\neq\emptyset$. This shows $x\in A[x-H,H]+A[x-H,H]\subseteq 2A$.


Hence $2A$ contains all $x$ with $k_0+H\prec x\prec 2H$. This shows $(2A)(k_0+H,2H)\sim H-k_0\sim 2A(k_0,H)$. ⊣
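The counting step in Claim 2.6.1 is a pigeonhole argument: if $S\subseteq[L,R]$ contains more than half of the points of $[L,R]$, then $L+R\in S+S$, because $S$ and $(L+R)-S$ both lie in $[L,R]$. In the claim, $S=A[x-H,H]$, $L=x-H$ and $R=H$, so $L+R=x$. A minimal finite Python sketch of this step, with a hypothetical function name:

```python
import random

def midpoint_representation(S, L, R):
    """Return (s, t) with s, t in S and s + t == L + R.

    Hypothesis (checked): S is a set of integers inside [L, R] with
    |S| > (R - L + 1) / 2.  Then S and (L + R) - S both sit inside [L, R],
    so their sizes add up to more than R - L + 1 and they must intersect.
    """
    if not all(L <= s <= R for s in S) or 2 * len(S) <= R - L + 1:
        raise ValueError("need S inside [L, R] containing more than half of its points")
    x = L + R
    for s in sorted(S):
        if x - s in S:
            return s, x - s
    raise AssertionError("unreachable: the counting argument guarantees a hit")

if __name__ == "__main__":
    rng = random.Random(0)
    L, R = 10, 30
    S = set(rng.sample(range(L, R + 1), (R - L + 1) // 2 + 1))
    print(sorted(S), midpoint_representation(S, L, R))
```

The threshold is sharp: the even numbers in $[0,4]$ form a set of size exactly $\frac{5+1}{2}$, yet a set of size 2 such as $\{0,1\}\subseteq[0,4]$ already fails to represent $4$.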



The proof of the lemma is now divided into two cases.

Case 2.6.1. $A(0,k_0)\sim 0$.

Claim 2.6.2. $(2A)(H,k_0+H)\sim 0$.

Proof of Claim 2.6.2. If $(2A)(H,k_0+H)\succ 0$, then
$$\begin{aligned}(2A)(0,2H)&\sim(2A)(0,H)+(2A)(H,k_0+H)+(2A)(k_0+H,2H)\\&\succsim A(0,H)+(2A)(H,k_0+H)+2A(k_0,H)\\&\sim 3A(0,H)+(2A)(H,k_0+H)\\&\succ 3A(0,H),\end{aligned}$$
which contradicts (3). ⊣

Claim 2.6.3. Let $c=\max A[0,k_0]$ and $b=\min A[k_0,H]$. Then $c\sim 0$ and $b\sim\frac{H+k_0}{2}$. Hence the conclusion of Lemma 2.6 holds.

Proof of Claim 2.6.3. The proof is similar to the proof of [7, Claim 1.3.5]. It is clear that $b\precsim\frac{H+k_0}{2}$, because otherwise
$$A(k_0,H)=A(b,H)\precsim H-b\prec\tfrac12(H-k_0),$$
which contradicts the choice of $k_0$. For notational convenience, in the rest of this claim we deal instead with $B=H-A\subseteq[0,H]$, the reverse of $A$. Let $m_0=H-k_0$, let $d=H-b$, and let $a=H-c$. By Claim 2.6.2, $(2B)(m_0,H)\sim 0$. Choose two standard natural numbers $p<q$ such that $\gcd(p,q)=1$ and
$$m_0+\frac{2m_0}{p}+2<H.$$
For each $i=0,1,\dots,2p$, let $x_i=\left[\frac{m_0}{2p}i\right]$. Then
$$B[x_{p-i},x_{p-i+1}]+B[x_{p+i},x_{p+i+1}]\subseteq(2B)[m_0-2,H].$$
Hence for each $i=0,1,\dots,p-1$, $B(x_{p-i},x_{p-i+1})\succ 0$ implies $B[x_{p+i},x_{p+i+1}]=\emptyset$, and $B(x_{p+i},x_{p+i+1})\succ 0$ implies $B[x_{p-i},x_{p-i+1}]=\emptyset$. Note that when $i=0$, one has $B(x_p,x_{p+1})\sim 0$. Since $B(0,m_0)\sim\frac12 m_0$, exactly half of the $i$'s in $\{1,2,\dots,p-1\}\cup\{p+1,p+2,\dots,2p-1\}$ satisfy $B(x_i,x_{i+1})\sim x_{i+1}-x_i$, and for the rest of the $i$'s, $B[x_i,x_{i+1}]=\emptyset$. Since $B(x_p,x_{p+1})\sim 0$, we get $B(x_0,x_1)\sim x_1-x_0$. By the same procedure, one can define $y_j=\left[\frac{m_0}{2q}j\right]$ for $j=0,1,\dots,2q$ so that exactly half of the $j$'s in $\{1,2,\dots,q-1\}\cup\{q+1,q+2,\dots,2q-1\}$


satisfy $B(y_j,y_{j+1})\sim y_{j+1}-y_j$, and for the rest of the $j$'s, $B[y_j,y_{j+1}]=\emptyset$. Since $B(y_q,y_{q+1})\sim 0$, we get $B(y_0,y_1)\sim y_1-y_0$. Since $p$ and $q$ are relatively prime, for any $i\in\{1,2,\dots,p-1\}$ and $j\in\{1,2,\dots,q-1\}$ one has $x_i\not\sim y_j$. Hence there is no $i\in\{1,2,\dots,p-1\}$ such that $B(x_{i-1},x_i)\not\sim B(x_i,x_{i+1})$, because otherwise one could take a $j\in\{1,2,\dots,q-1\}$ such that $x_i\in[y_j,y_{j+1}]$, which would make $B(y_j,y_{j+1})\sim 0$ and $B(y_j,y_{j+1})\sim y_{j+1}-y_j$ at the same time. Since $B(x_0,x_1)\sim x_1-x_0$, we get $B(0,x_p)\sim x_p$ and $B[x_{p+1},m_0]=\emptyset$. So $d<x_{p+1}$. Since $p$ and $q$ can be chosen arbitrarily large in $\mathbb N$, we have $d\sim\frac{m_0}{2}$. If $m_0\le a\prec H$, then
$$\begin{aligned}(2B)(0,2H)&\succsim(2B)(0,m_0)+(2B)(a,H)+(2B)(H,2H)\\&\succsim 2B(0,m_0)+|B[0,H-a]+a|+B(0,H)\\&\succsim 3B(0,H)+B(0,H-a)\\&\succ 3B(0,H),\end{aligned}$$
which contradicts (3). So $a\sim H$. It is easy to see that $c=H-a$ and $b=H-d$ are numbers satisfying the conclusion of the lemma. ⊣

Case 2.6.2. $A(0,k_0)\succ 0$. By Lemma 2.2, there is a $0\prec k\precsim k_0$ such that $A(k,k_0)\sim 0$ and $A(k',k)\succ 0$ for any $0\le k'\prec k$. Without loss of generality, we assume that $k\in A$. Choose $k'\in A$ such that $k_0+k-H\prec k'\prec k$. Then
$$\begin{aligned}(2A)(0,2H)&\succsim(2A)(0,2k)+(2A)(k_0+k,H+k')+(2A)(H+k',H+k)+(2A)(H+k_0,2H)\\&\succsim 3A(0,k)+|A[k_0+k-k',H]+k'|+|A[H+k'-k,H]+k|+2A(k_0,H)\\&\succ 3A(0,k)+\tfrac12(H-k_0-k+k')+\tfrac12(k-k')+2A(k_0,H)\\&\sim 3A(0,k_0)+\tfrac12(H-k_0)+2A(k_0,H)\\&\sim 3A(0,H),\end{aligned}$$

which contradicts (3). ⊣

Lemma 2.7. Let $A\subseteq[0,H]$ and $\alpha\le\frac12$ be such that
(1) $0,1\in A$,
(2) $A(0,H)\sim\alpha H$,
(3) for any $0\prec x\le H$, $A(0,x)\precsim\alpha x$,
(4) for any $0\prec x\le 2H$, $(2A)(0,x)\precsim\frac32\alpha x$,


(5) there is no $d\in\mathbb N$ such that $\alpha=\frac2d$ and $A\subseteq\{nd : 0\le n\le\frac Hd\}\cup\{1+nd : 0\le n\le\frac Hd\}$.

Then $A$ contains a crowded triple.

Proof. Suppose that $A$ satisfies (1)–(5) and $A$ does not contain a crowded triple. We derive a contradiction. Let $P=\{x\in A : x+1\in A \text{ or } x-1\in A\}$ be the set of all paired numbers in $A$.

Claim 2.7.1. $|A\setminus P|\sim 0$.

Proof of Claim 2.7.1. Let $Q=A\setminus P$. Since $A$ contains no crowded triple (in particular, no three consecutive numbers), we have
$$|Q+\{0,1\}|=2|Q|,\qquad|P+\{0,1\}|=\tfrac32|P|,$$
and $(Q+\{0,1\})\cap(P+\{0,1\})=\emptyset$. Hence
$$(2A)(0,H)\succsim|Q+\{0,1\}|+|P+\{0,1\}|=2|Q|+\tfrac32|P|=\tfrac32A(0,H)+\tfrac12|Q|.$$
By (2) and (4), one has $|Q|\sim 0$. ⊣



By Claim 2.7.1, one can assume that $A$ contains no isolated points, i.e. $A=P$. Since $A$ contains no crowded triple, $|x-y|\neq 2$ for any $x,y\in A$, i.e. every gap of $A$ has length at least 2. Note also that for any $0\le x\prec y\le H$, we have $A(x,y)\precsim\frac12(y-x)$, because every subinterval of length 4 contains at most two elements of $A$. Let $A_0$ consist of all even numbers in $A$ and $A_1$ of all odd numbers in $A$. Since $A$ consists only of pairs of consecutive numbers, $|A_0|=|A_1|$. Let $A_0^{+}=\{x\in A_0 : x+1\in A_1\}$ and let $A_0^{-}=A_0\setminus A_0^{+}$. Note that $0\in A_0^{+}$.

Claim 2.7.2. For any $0\precsim x\prec y\precsim H$, if $A(x,y)\succ 0$, then $A[x,y]\cap A_0^{+}\neq\emptyset$.

Proof of Claim 2.7.2. Suppose the claim is not true and let $a=\max A_0^{+}[0,x]$.


Without loss of generality, one can assume $x=\min A_0^{-}[a,H]$ and $y\in A_0^{-}$. Let
$$B=A_0^{-}[x,y]\cup(A_0^{-}[x,y]+y).$$
Then
$$B+\{-2,-1,0\}\subseteq 2A\qquad\text{and}\qquad|B+\{-2,-1,0\}|=3|B|.$$
For each $z\in A_0^{-}[x,y]$, if $a+z\in B$, then let $f(z)=a+1+z$, and if $a+z\notin B$, then let $f(z)=a+z-1$. Since $a+1\in A$ and $z-1\in A$,
$$f\big[A_0^{-}[x,y]\big]\subseteq(2A)\setminus(B+\{-2,-1,0\}).$$
Hence
$$(2A)(2a,2y)\succsim|B+\{-2,-1,0\}|+\big|f[A_0^{-}[x,y]]\big|=3|B|+A_0^{-}(x,y)\sim 7A_0^{-}(x,y)\sim 3.5A(x,y)\sim 3.5A(a,y)\succ 3A(a,y).$$
By Lemma 2.5, $(2A)(0,2H)\succ 3A(0,H)$, which contradicts (4). ⊣

By Claim 2.7.2, $\max A_0^{+}\sim H$. Note that the existence of $a=\max A_0^{+}[0,x]$ depends on the fact that $A_0^{+}[0,x]\neq\emptyset$ (because $0\in A_0^{+}$). However, a parallel proof can be given if $A_0^{+}[y,H]\neq\emptyset$ instead of $A_0^{+}[0,x]\neq\emptyset$, letting $a=\min A_0^{+}[y,H]$. So the proof needs only the condition $A_0^{+}\neq\emptyset$ instead of $0\in A_0^{+}$. By the same argument, one can show that if $A_0^{-}\neq\emptyset$, then $A(x,y)\succ 0$ implies $A_0^{-}[x,y]\neq\emptyset$. So $\max A_0^{-}\sim H$, and if $b=\min A_0^{-}$, then $A(0,b-1)\sim 0$.

Claim 2.7.3. If $A_0^{-}\neq\emptyset$, then $A_0^{-}(0,H)\sim A_0^{+}(0,H)$.

Proof of Claim 2.7.3. Let $c=\min A_0^{-}$ and $b=\max A_0^{+}$. Let
$$B^{+}=A_0^{+}\cup(A_0^{+}+b)\qquad\text{and}\qquad B^{-}=A_0^{-}\cup(A_0^{-}+b).$$
Let $B=B^{+}\cup B^{-}$. Then $B^{+}+\{0,1,2\}\subseteq 2A$, $B^{-}+\{-1,0,1\}\subseteq 2A$,
$$|B^{+}+\{0,1,2\}|=3|B^{+}|,\qquad\text{and}\qquad|B^{-}+\{-1,0,1\}|=3|B^{-}|.$$
Let
$$C=(B^{+}+\{0,1,2\})\cup(B^{-}+\{-1,0,1\}).$$
Then $|C|\sim 3|A|$.


Suppose $z\in A_0^{+}$ and $c+z\notin B^{-}$. If $c+z\notin C$, let $f(z)=z+c$. If $c+z\in B^{+}$, let $f(z)=z+c-1$. If $z+c\in B^{+}+2$, let $f(z)=z+1+c$. It is easy to see that
$$f\big[\{z\in A_0^{+} : z+c\notin B^{-}\}\big]\cap C=\emptyset.$$
So by (4),
$$\big|\{z\in A_0^{+} : z+c\notin B^{-}\}\big|\sim 0.$$
Hence
$$|A_0^{+}|\precsim B^{-}(c,b)+B^{-}(b,b+c)\sim|A_0^{-}|.$$
By a similar argument, one can show
$$\big|\{z\in A_0^{-} : z+c\notin B^{+}+2\}\big|\sim 0.$$
Hence one has $|A_0^{-}|\precsim|A_0^{+}|$. ⊣



Claim 2.7.4. If $A_0^{-}\neq\emptyset$, then $A_0^{+}$ is a subset of an a.p. of length $\sim|A_0^{+}|$ with difference $d$, and $A_0^{-}$ is a subset of an a.p. of length $\sim|A_0^{-}|$ with the same difference $d$.

Proof of Claim 2.7.4. We use the notation from the previous claim. If $z\in(A_0^{-}+A_0^{-})\setminus(B^{+}+2)$, then either $z\notin C$, or $z-1\notin C$, or $z-2\notin C$. Since $|C|\sim 3|A|$, by (4) we get
$$\big|(A_0^{-}+A_0^{-})\setminus(B^{+}+2)\big|\sim 0.$$
So
$$|A_0^{-}+A_0^{-}|\precsim|B^{+}|\precsim 2|A_0^{+}|\sim 2|A_0^{-}|.$$
By Lemma 1.6, $A_0^{-}$ is a subset of an a.p. of length $\sim|A_0^{-}|$ with some difference $d'$. By a similar argument, one can show
$$\big|(A_0^{+}+A_0^{+})\setminus B^{+}\big|\sim 0.$$
Hence
$$|A_0^{+}+A_0^{+}|\precsim 2|A_0^{+}|,$$
and this implies that $A_0^{+}$ is a subset of an a.p. of length $\sim|A_0^{+}|$ with a difference $d\in\mathbb N$. Clearly $d\,A_0^{+}(0,H)\sim H$, since $0\in A_0^{+}$ and $\max A_0^{+}\sim H$. Since $A_0^{+}(0,x)\succ 0$ for any $x\succ 0$ (because $A_0^{+}$ is almost an a.p.: it is a subset of an a.p. of length $\sim|A_0^{+}|$), we get $\min A_0^{-}\sim 0$ by the comments after Claim 2.7.2. Hence $d'A_0^{-}(0,H)\sim H$. Together with Claim 2.7.3, this shows $d=d'$. ⊣

Claim 2.7.5. $A_0^{-}=\emptyset$.


Proof of Claim 2.7.5. Assume the claim is not true. By Claim 2.7.4, $A_0^{+}\subseteq\{dn : 0\le n\le H/d\}$ and $A_0^{-}\subseteq\{b+dn : 0\le n\le H/d\}$ for some $0<b<d$. It is easy to see that $\frac1d=\frac\alpha4$ because
$$|A_0^{+}|\sim|A_0^{-}|\sim\tfrac12|A_0|\sim\tfrac14|A|.$$
Since there are $z\in(A_0^{-}+A_0^{-})\cap(B^{+}+2)$, we have $2b\equiv 2\pmod d$. Since $A_0$ is a set of even numbers, $d$ is even. Hence $b\equiv 1\pmod{\frac d2}$. This shows
$$A_0^{-}-1\subseteq\left\{\tfrac d2n : 0\le n\le\tfrac{2H}{d}\right\}.$$
Hence
$$(A_0^{-}-1)\cup A_0^{+}\subseteq\left\{\tfrac d2n : 0\le n\le\tfrac{2H}{d}\right\}\qquad\text{and}\qquad A_0^{-}\cup(A_0^{+}+1)\subseteq\left\{1+\tfrac d2n : 0\le n\le\tfrac{2H}{d}\right\}.$$
Now we have
$$A=(A_0^{-}+\{-1,0\})\cup(A_0^{+}+\{0,1\})\subseteq\left\{\tfrac d2n : 0\le n\le\tfrac{2H}{d}\right\}\cup\left\{1+\tfrac d2n : 0\le n\le\tfrac{2H}{d}\right\}$$
and $\alpha=\frac{2}{d/2}$, which contradicts (5) with $d$ replaced by $d/2$. ⊣

Now we can assume that $A_0=A_0^{+}$ and derive the contradiction. Let $b=\max A_0$ and let $B=A_0\cup(A_0+b)$. Then $B+\{0,1,2\}\subseteq 2A$ and
$$|B+\{0,1,2\}|=3|B|\sim 6|A_0|=3A(0,H).$$
For each $x\in A_0+A_0$, if $x\notin B$, then $x+1\notin B+\{0,1,2\}$. Hence
$$|A_0+A_0|\precsim|B|=2|A_0|.$$


By Lemma 1.6, $A_0$ is a subset of an a.p. of length $\sim|A_0|$ with some difference $d$. Clearly,
$$A\subseteq\left\{dn : 0\le n\le\tfrac Hd\right\}\cup\left\{1+dn : 0\le n\le\tfrac Hd\right\}$$
and $\alpha=\frac2d$, which contradicts (5). ⊣

The next lemma is a weak version of Kneser's theorem in nonstandard analysis. To state it, some notation needs to be introduced. An infinite initial segment $U$ of ${}^*\mathbb N$ is a cut if $\mathbb N\subseteq U$ and $U+U\subseteq U$. A cut is usually an external set. For example, $\mathbb N$ is a cut. The set
$$U_H=\bigcap_{n\in\mathbb N}\big[0,[H/n]\big]$$
is a cut. In the next lemma and in the next section we write $U=U_H$ when $H$ is clearly given. Suppose $U$ is a cut such that $U\subseteq D\subseteq{}^*\mathbb N$. Given a function $f : D\to{}^*\mathbb R$ (not necessarily internal) bounded by a standard real number, the lower $U$-density of $f$ is defined as follows:
$$\underline d_U(f)=\sup\big\{\inf\{\mathrm{st}(f(n)) : n\in U\setminus[0,m]\} : m\in U\big\}.$$
A set $C\subseteq{}^*\mathbb N$ is called $U$-internal if the set $C[0,m]$ is internal for any $m\in U$. Note that if $A\subseteq[0,H]$ is internal, then $A$ is $U$-internal. Given a set $A\subseteq[0,H]$, let $f_A(x)=\frac{A(0,x)}{x+1}$ for any $x\in[0,H]$. The lower $U$-density of $A$ is defined as $\underline d_U(A)=\underline d_U(f_A)$. For any $x\in{}^*\mathbb N\setminus U$, define also $\underline d_{x+U}(A)=\underline d_U((A-x)\cap{}^*\mathbb N)$ and $\underline d_{x-U}(A)=\underline d_U((x-A)\cap{}^*\mathbb N)$.

Remark 2.8.
(1) For any $A\subseteq\mathbb N$, $\underline d(A)=\underline d_U(A)$ with $U=\mathbb N$, where $\underline d(A)=\liminf_{n\to\infty}\frac{A(n)}{n}$.
(2) $\underline d_{x+U}(A)$ is a forward lower $U$-density of $A$ from $x$, and $\underline d_{x-U}(A)$ is a backward lower $U$-density of $A$ from $x$.
(3) It is easy to check that for any $a\in U$, $\underline d_U(A+a)=\underline d_U(A)$ and $\underline d_U(A\setminus[0,a])=\underline d_U(A)$.
(4) Let $U\subseteq[0,H]$ be a cut and $A\subseteq[0,H]$. If $\underline d_U(A)>\beta$, then by the overspill principle, there are $x\in U$ and $y\in[0,H]\setminus U$ such that $\frac{A(0,z)}{z+1}>\beta$ for any $x\le z\le y$.


Another definition needed is called the e-transform (cf. [9, page 42]). It is also called -transformation (cf. [3, page 58]). Let $A,B\subseteq{}^*\mathbb N$ and $a\in A$. An $e_a$-transform of $(A,B)$ is a pair $(A',B')=e_a(A,B)$ such that
$$A'=A\cup(B+a)\qquad\text{and}\qquad B'=B\cap(A-a).$$
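The $e_a$-transform is itself finitary, so its bookkeeping can be illustrated on ordinary finite sets. The sketch below (function names hypothetical) implements $A'=A\cup(B+a)$ and $B'=B\cap(A-a)$ and randomly tests facts (1) and (2) listed next, together with the identity $|A'|+|B'|=|A|+|B|$; reading that identity as a finite counterpart of fact (3) is our own gloss, since the paper states (3) only for lower $U$-densities. For instance, with $A=B=\{0,1,3\}$ (a crowded triple) and $a=1$ one gets $A'=\{0,1,2,3,4\}$ and $B'=\{0\}$, so a single transform already produces four consecutive numbers, as used in the proof of Lemma 2.9.

```python
import random

def e_transform(A, B, a):
    """Finite e_a-transform: return (A ∪ (B + a), B ∩ (A - a))."""
    A_new = A | {b + a for b in B}
    B_new = {b for b in B if b + a in A}       # b ∈ A - a  <=>  b + a ∈ A
    return A_new, B_new

def sumset(X, Y):
    """X + Y = {x + y : x in X, y in Y}."""
    return {x + y for x in X for y in Y}

def check_e_transform(trials=500, size=60, seed=1):
    """Randomly test facts (1), (2) and the size identity; returns trials run."""
    rng = random.Random(seed)
    for _ in range(trials):
        A = set(rng.sample(range(size), rng.randint(1, size // 2)))
        B = set(rng.sample(range(size), rng.randint(1, size // 2)))
        a = rng.choice(sorted(A))
        A2, B2 = e_transform(A, B, a)
        assert A <= A2 and B2 <= B                     # fact (1)
        assert sumset(A2, B2) <= sumset(A, B)          # fact (2)
        assert len(A2) + len(B2) == len(A) + len(B)    # size bookkeeping
    return trials

if __name__ == "__main__":
    print(e_transform({0, 1, 3}, {0, 1, 3}, 1))
    print(check_e_transform(), "random trials passed")
```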

The important facts about the $e_a$-transform are the following:
(1) $A'\supseteq A$ and $B'\subseteq B$.
(2) $A'+B'\subseteq A+B$. Hence $\underline d_U(A'+B')\le\underline d_U(A+B)$ when $a\in U$.
(3) $\underline d_U(f_{A'}+f_{B'})=\underline d_U(f_A+f_B)$ when $a\in U$.

The essential idea of the proof of the next lemma can be found in [3, page 61]. The detailed proof is included here because a nonstandard setting is involved.

Lemma 2.9. Let $U$ be a cut and let $A\subseteq{}^*\mathbb N$ be such that $0<\underline d_U(A)=\alpha\le\frac12$ and $A\cap U$ contains a crowded triple. Then $\underline d_U(2A)\ge\frac85\alpha$.

Proof. Let $T\subseteq A\cap U$ be the crowded triple. Without loss of generality, one can assume $0=\min T$. By performing a sequence of $e_a$-transforms with $a\in U$ on $(A,A)$, one can produce a pair $(A',B')$ such that $A'$ contains four consecutive numbers $[a,a+3]$ and $[a,a+3]+B'\subseteq A'$. (Because $A$ contains a crowded triple $T$, after one $e$-transform $A'$ contains four consecutive numbers $[a,a+3]$. After four additional $e$-transforms, $A'$ contains $B'+[a,a+3]$.) It now suffices to show
$$\underline d_U(A'+B')\ge\tfrac85\alpha.$$
For simplicity we assume $[0,3]\subseteq A'$ and $[0,3]+B'\subseteq A'$. Note that $B'\subseteq A'$. Choose an arbitrary $0<\gamma<2\alpha\le 1$. We want to show
$$\underline d_U(2A)\ge\underline d_U(A'+B')\ge\tfrac45\gamma.$$
Since $\underline d_U(f_{A'}+f_{B'})=\underline d_U(f_A+f_A)=2\underline d_U(A)=2\alpha>\gamma$, there exists an $x_0\in U$ such that for any $x\in U\setminus[0,x_0-1]$,
$$\frac{A'(0,x)+B'(0,x)}{x+1}>\gamma.$$
Let $x_0$ be the least such number. Then $A'(0,x_0-1)+B'(0,x_0-1)\le\gamma x_0$ if $x_0>0$. This shows
$$A'(x_0,x_0+x)+B'(x_0,x_0+x)>\gamma(x+1)$$


for any $x\ge 0$. It is easy to see that $x_0\in A'\cup B'\subseteq A'$. One can assume $\underline d_U(B')>0$, because otherwise
$$\underline d_U(A'+B')\ge\underline d_U(A')=\underline d_U(f_{A'}+f_{B'})>\gamma>\tfrac45\gamma.$$
Let
$$x_1=\min\big(B'\cap U\setminus[0,x_0-1]\big),\qquad\bar A=(A'-x_0)\cap[0,H],\qquad\text{and}\qquad\bar B=(B'-x_1)\cap[0,H].$$
It is easy to check that $0\in\bar A\cap\bar B$ and $\underline d_U(\bar A+\bar B)=\underline d_U(A'+B')$.

Claim 2.9.1. For any $x\in U$, $1+\bar A(x)+\bar B(x)>\frac45\gamma(x+1)$.

Proof of Claim 2.9.1. The proof is divided into four cases.

Case 2.9.1.1. $x\ge\frac5\gamma-1$. Then $\gamma(x+1)\ge 5$, and hence $5\gamma(x+1)-5\ge 4\gamma(x+1)$. So
$$\gamma(x+1)-1\ge\tfrac45\gamma(x+1).$$
Since
$$\bar A(x)=A'(x_0,x_0+x)-1$$
and
$$\bar B(x)=B'(x_1+1,x_1+x)\ge B'(x_0,x_1+x)-1\ge B'(x_0,x_0+x)-1,$$
we get
$$1+\bar A(x)+\bar B(x)\ge A'(x_0,x_0+x)+B'(x_0,x_0+x)-1>\gamma(x+1)-1\ge\tfrac45\gamma(x+1).$$

Case 2.9.1.2. $x<x_1-x_0$. Since $B'(x_0,x_0+x)=0$,
$$1+\bar A(x)+\bar B(x)\ge A'(x_0,x_0+x)=A'(x_0,x_0+x)+B'(x_0,x_0+x)>\gamma(x+1)>\tfrac45\gamma(x+1).$$


Case 2.9.1.3. $x_1-x_0\le x\le x_1-x_0+3$. Since $x_1+[0,3]=[x_1,x_1+3]\subseteq A'$,
$$\begin{aligned}1+\bar A(x)+\bar B(x)&\ge A'(x_0,x_1-1)+\big(x-(x_1-x_0)\big)+B'(x_0,x_1-1)+1\\&>\gamma(x_1-x_0)+\big(x-(x_1-x_0)\big)+1\\&>\gamma(x_1-x_0)+\gamma\big(x-(x_1-x_0)+1\big)\\&=\gamma(x+1)>\tfrac45\gamma(x+1).\end{aligned}$$

Case 2.9.1.4. $x_1-x_0+3\le x<\frac5\gamma-1$. Then $\gamma(x+1)<5$, which implies $\frac45\gamma(x+1)<4$. Hence
$$1+\bar A(x)+\bar B(x)\ge A'(x_1,x_1+3)=4>\tfrac45\gamma(x+1).$$

By the four cases above, for any $x\in U$,
$$1+\bar A(x)+\bar B(x)>\tfrac45\gamma(x+1).$$
By van der Corput's Theorem (cf. [3, Theorem 9, page 22]), one has
$$1+(\bar A+\bar B)(x)>\tfrac45\gamma(x+1).$$
This shows
$$\underline d_U(2A)\ge\underline d_U(\bar A+\bar B)\ge\tfrac45\gamma.$$
Since $\gamma<2\alpha$ is arbitrary, $\underline d_U(2A)\ge\frac85\alpha$. ⊣



§3. Proofs of the main theorem. In order to use nonstandard techniques, we first translate Theorem 1.3 into a nonstandard equivalent. We translate Part II of Theorem 1.3 into Theorem 3.1 and Part III of Theorem 1.3 into Theorem 3.2. We then prove Theorem 3.1 and Theorem 3.2.

Theorem 3.1. Let $A \subseteq [0, H]$ and $0 < \alpha < \frac12$ be such that
(1) $A \cap \mathbb{N}$ contains two consecutive numbers,
(2) $A(0, H) \sim \alpha H$,
(3) for any $0 \prec x \le H$, $A(0, x) \preceq \alpha x$,
(4) for any $0 \prec x \le 2H$, $(2A)(0, x) \preceq \frac32\alpha x$.
Then
(a) either there are $a, d \in \mathbb{N}$ such that $\alpha = \frac2d$ and
$$A \subseteq \left\{a + dn : 0 \le n \le \frac Hd\right\} \cup \left\{a + 1 + dn : 0 \le n \le \frac Hd\right\}$$


(b) or there are $0 \le c \le b \le H$ such that $c \sim 0$, $[c, b] \cap A = \emptyset$, and $A(b, H) \sim H - b$.

Proof of Part II of Theorem 1.3 from Theorem 3.1. Let $A \subseteq \mathbb{N}$ contain two consecutive numbers, and let $0 < \alpha < \frac12$ be such that $\overline{d}(A) = \alpha$ and $\overline{d}(2A) = \frac32\alpha$. Take any increasing sequence $h_n \in \mathbb{N}$ such that $\lim_{n\to\infty} \frac{A(h_n)}{h_n} = \alpha$. Let $K$ be a hyperfinite integer, let $H = h_K$, and let $B = {}^*A \cap [0, H]$. We now check that (1)–(4) of Theorem 3.1 are true for $B$ in place of $A$. (1) is trivially true. (2) is true because $\frac{A(h_n)}{h_n} \to \alpha$ implies $\frac{B(0, h_K)}{h_K} \approx \alpha$. (3) is true because otherwise $\overline{d}(A) > \alpha$ by Lemma 2.1. (4) is true because otherwise $\overline{d}(2A) > \frac32\alpha$ by Lemma 2.1. By Theorem 3.1, either (a) or (b) of Theorem 3.1 is true for $B$. If (a) of Theorem 3.1 is true, then there are $a, d \in \mathbb{N}$ such that $\alpha = \frac2d$ and
$${}^*A \cap [0, H] \subseteq \left\{a + dn : 0 \le n \le \frac Hd\right\} \cup \left\{a + 1 + dn : 0 \le n \le \frac Hd\right\}.$$
So clearly (a) of Part II of Theorem 1.3 is true for $A$. If (b) of Theorem 3.1 is true, then by the underspill principle there are $0 \le c_n \le b_n \le h_n$ such that $\lim_{n\to\infty} \frac{c_n}{h_n} = 0$, $[c_n, b_n] \cap A = \emptyset$, and $\lim_{n\to\infty} \frac{A(b_n, h_n)}{h_n - b_n + 1} = 1$.

Theorem 3.2. Let $A \subseteq [0, H]$ be such that
(1) $A \cap \mathbb{N}$ contains two consecutive numbers,
(2) $A(0, H) \sim \frac12 H$,
(3) for any $0 \prec x \le H$, $A(0, x) \preceq \frac12 x$,
(4) for any $0 \prec x \le 2H$, $(2A)(0, x) \preceq \frac34 x$.
Then either
(a) for some $a < 4$,
$$A \subseteq \left\{a + 4n : 0 \le n \le \frac H4\right\} \cup \left\{a + 1 + 4n : 0 \le n \le \frac H4\right\},$$
or
(b) $(2A)(0, H) \sim A(0, H)$.

Proof of Part III of Theorem 1.3 from Theorem 3.2. Let $A \subseteq \mathbb{N}$ contain two consecutive numbers and satisfy $\overline{d}(A) = \frac12$ and $\overline{d}(2A) = \frac34$. Take any increasing sequence $h_n \in \mathbb{N}$ such that $\lim_{n\to\infty} \frac{A(h_n)}{h_n} = \frac12$. Let $K$ be a hyperfinite integer, let $H = h_K$, and let $B = {}^*A \cap [0, H]$. By the same idea as in the proof above, we can check that (1)–(4) of Theorem 3.2 are true for $B$ in place of $A$. Now it is easy to see that (a) of Theorem 3.2 for $B$ implies (a) of Part III of Theorem 1.3 for $A$, and (b) of Theorem 3.2 for $B$ implies (b) of Part III of Theorem 1.3 for $A$.

Now we are ready to prove Theorem 3.1 and Theorem 3.2.
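Alternative (a) of Theorems 3.1 and 3.2 describes the extremal shape. Here $A$ lies inside the union of the two progressions $\{a + dn\}$ and $\{a + 1 + dn\}$, which share the difference $d$, so that $\alpha = 2/d$. Such a set meets only the residues $a, a + 1$ modulo $d$, so its sumset meets only $2a, 2a + 1, 2a + 2$ and has density $3/d = \frac32 \cdot \frac2d$. The Python sketch below is a finite sanity check of that arithmetic and is not part of the proof. The helper names are hypothetical.

```python
def two_progressions(a, d, n_max):
    """{a + d*n : 0 <= n <= n_max} together with {a + 1 + d*n : 0 <= n <= n_max}."""
    first = {a + d * n for n in range(n_max + 1)}
    second = {a + 1 + d * n for n in range(n_max + 1)}
    return first | second


def sumset(X, Y):
    """The sumset X + Y = {x + y : x in X, y in Y}."""
    return {x + y for x in X for y in Y}


def density_up_to(S, N):
    """Relative density |S ∩ [0, N]| / (N + 1)."""
    return sum(1 for s in S if 0 <= s <= N) / (N + 1)


d = 5                                   # so alpha = 2/d = 0.4
A = two_progressions(0, d, 400)         # A lies in [0, 2001]
print(round(density_up_to(A, 2000), 4))               # close to 2/d
print(round(density_up_to(sumset(A, A), 2000), 4))    # close to 3/d
```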


Proof of Theorem 3.1. Suppose $0 < \alpha < \frac12$ and $A \subseteq [0, H]$ satisfies (1)–(4) of Theorem 3.1. Suppose also that $A$ does not satisfy (a) of Theorem 3.1. We want to show that $A$ satisfies (b) of Theorem 3.1. To do so, we show that $(2A)(0, 2H) \succ 3A(0, H)$ unless $A$ satisfies (b) of Theorem 3.1. Without loss of generality, we assume that $0, 1 \in A$. By Lemma 2.7, $A$ contains a crowded triple. Let $a = \max\{x : x \text{ is the least element of a crowded triple in } A\}$.

Claim 3.1.1. $a \succ 0$.

Proof of Claim 3.1.1. Let $T \subseteq A$ be the crowded triple with $a = \min T$. Suppose $a \sim 0$. Then $A[a + 3, H]$ does not contain any crowded triple. By (3) of Lemma 2.4, $|A[a + 3, H] + T| \succeq 2A(a + 3, H)$. Hence
$$(2A)(0, H) \succeq (2A)(2a + 3, H + a + 3) \succeq |A[a + 3, H] + T| \succeq 2A(a + 3, H) \sim 2A(0, H) \sim 2\alpha H.$$
This contradicts (4) of this theorem.

Claim 3.1.2. $A(0, a) \succ 0$.

Proof of Claim 3.1.2. Suppose $A(0, a) \sim 0$. Hence $A(0, H) \sim A(a, H)$. If $2a < H$, then
$$\begin{aligned}(2A)(0, H + a) &\sim (2A)(0, H) + (2A)(H, H + a)\\ &\succeq |A[0, H] + \{0, 1\}| + |A[H - a, H] + T|\\ &\succeq \tfrac32 A(0, H) + 2A(H - a, H) \succeq \tfrac32\alpha H + 2\alpha a \succ \tfrac32\alpha(H + a),\end{aligned}$$
which contradicts (4). If $2a \ge H$, then
$$\begin{aligned}(2A)(0, H + a) &\sim (2A)(0, H) + (2A)(H, H + a)\\ &\succeq |A[a + 3, H] + \{0, 1\}| + |A[H - a, H] + T|\\ &\succeq \tfrac32 A(a + 3, H) + |A[a + 3, H] + T|\\ &\succeq \tfrac32 A(0, H) + 2A(a, H) \sim \tfrac32 A(0, H) + 2A(0, H)\\ &\succeq \tfrac32\alpha H + 2\alpha H \succ \tfrac32\alpha(H + a),\end{aligned}$$
which again contradicts (4).



Claim 3.1.3. If $a \sim H$, then $A$ satisfies (b) of Theorem 3.1.

Proof of Claim 3.1.3. There are two cases: the first is when $d_{H-U}(A) > \frac12$, and the second is when $d_{H-U}(A) \le \frac12$. We show that only the first case can occur, and that in the first case $A$ satisfies (b) of this theorem.

Case 3.1.3.1. $d_{H-U}(A) > \frac12$. Then there exists a $K \prec H$ such that $A(x, H) \succ \frac12(H - x)$ for any $K \le x \prec H$. Hence by Lemma 2.6, $A$ satisfies (b).

Case 3.1.3.2. $d_{H-U}(A) \le \frac12$. By (2) and (3), $d_{H-U}(A) = \beta > 0$. Note that $A[x, H]$ contains a crowded triple for some $x \sim H$. By Lemma 2.9, one has $d_{2H-U}(2A) \ge \frac85\beta$. Let $1 < \gamma < \frac{16}{15}$. Then there is a $K \prec H$ in $A$ such that $A(K, H) \le \gamma\beta(H - K)$ and
$$(2A)(2K, 2H) \ge \tfrac85\beta \cdot 2(H - K).$$
Hence, since $\frac{16}{5\gamma} > 3$,
$$(2A)(2K, 2H) \ge \frac{8}{5\gamma}\, 2A(K, H) \succ 3A(K, H).$$
Note that $A(0, K) \preceq \frac12 K$ by (3) of this theorem. Now Lemma 2.5 implies $(2A)(0, 2H) \succ 3A(0, H)$, which contradicts (4).

Now the only remaining case is $a \prec H$ with $A(0, a) \succ 0$. The next claim shows that this case does not occur.

Claim 3.1.4. If $a \prec H$ and $A(0, a) \succ 0$, then $(2A)(0, 2H) \succ 3A(0, H)$, which contradicts (4).

The proof is divided into four cases. Assume $a \prec H$ and $A(0, a) \succ 0$. Since there is no crowded triple in $A[a + 1, H]$, one has $A(a, H) \preceq \frac12(H - a)$ by (1) of Lemma 2.4.


Case 3.1.4.1. $0 < d_{a-U}(A) \le \frac12$. By the same argument as in Case 3.1.3.2, one can find an $x \prec a$ in $A$ such that $(2A)(2x, 2(a + 3)) \succ 3A(x, a + 3)$. Hence by Lemma 2.5, $(2A)(0, 2H) \succ 3A(0, H)$.

Case 3.1.4.2. $d_{a-U}(A) > \frac12$. If $(2A)(0, 2(a + 3)) \succ 3A(0, a + 3)$, then by Lemma 2.5 one has $(2A)(0, 2H) \succ 3A(0, H)$. So we can assume that $(2A)(0, 2(a + 3)) \sim 3A(0, a + 3)$. It is now easy to check that $A[0, a + 3]$ satisfies (1)–(4) of Lemma 2.6 in place of $A$. (1) is trivially true. (2) is true because of (3) of this theorem. (3) is true by the assumption $(2A)(0, 2(a + 3)) \sim 3A(0, a + 3)$. (4) is true because of the assumption of this case. Hence by Lemma 2.6, there are $0 \le c \le b \le a + 3$ such that $c \sim 0$, $[c, b] \cap A = \emptyset$, and $A(b, a + 3) \sim a + 3 - b$. Note that $2b \succ a$ because
$$\tfrac12 a \succ \alpha a \succeq A(0, a + 3) \sim A(b, a + 3) \sim a + 3 - b.$$
By Lemma 2.2, there is a $k \in [a + 3, H]$ such that $A(a + 3, k) \sim 0$ and $A(k, x) \succ 0$ for any $k \prec x \le H$. One can choose $k \in A$.

Subcase 3.1.4.2.1. $k \sim a$. Note that $a, a + 1 \in A[a, H]$. Then
$$\begin{aligned}(2A)(0, 2H) &\sim (2A)(0, b) + (2A)(b, a) + (2A)(a, 2b) + (2A)(2b, 2a) + (2A)(2a, 2H)\\ &\succeq A(0, b) + A(b, a) + A\bigl(a, \min\{2b, H\}\bigr) + 2A(b, a) + 3A(a, H)\\ &\sim 3A(0, H) + A\bigl(a, \min\{2b, H\}\bigr) \succ 3A(0, H).\end{aligned}$$

Subcase 3.1.4.2.2. $k \succ a$. Let $T \subseteq A$ be the crowded triple with $a = \min T$. Note that
$$\bigl|(T \cup A[k, H]) + (T \cup A[k, H])\bigr| \succeq 3\,\bigl|T \cup A[k, H]\bigr|$$
and
$$\bigl((T \cup A[k, H]) + (T \cup A[k, H])\bigr) \cap [2a + 7, a + k - 1] = \emptyset.$$
Then
$$(2A)(a + k, 2H) \succeq \bigl|(T \cup A[k, H]) + (T \cup A[k, H])\bigr| - |T + T| \succeq 3\,\bigl|T \cup A[k, H]\bigr| \sim 3A(k, H).$$


Let $b \prec x \prec a$ be such that $a - x \le k - a - 7$. Then
$$\begin{aligned}(2A)(0, 2H) &\succeq (2A)(b, a) + (2A)(2b, 2a) + (2A)(k + x, k + a) + (2A)(k + a, 2H)\\ &\succeq A(b, a) + 2A(b, a) + A(x, a) + 3A(k, H)\\ &\sim 3A(0, H) + A(x, a) \succ 3A(0, H).\end{aligned}$$
Because of Cases 3.1.4.1 and 3.1.4.2, we assume from now on that $d_{a-U}(A) = 0$.

Case 3.1.4.3. $d_{a+U}(A) = \beta > 0$. Since $A[a + 3, H]$ does not contain any crowded triple, (1) of Lemma 2.4 gives $\beta \le \frac12$.

Subcase 3.1.4.3.1. For any $x \sim a$, $A[x, H]$ is not a subset of an a.p. of difference $d > 1$. By the overspill principle, there exists a $K$ with $a \prec K \prec H$ such that $A[K, H]$ is not a subset of an a.p. of difference $d > 1$. By Lemma 2.9, $d_{2a+U}(2A) \ge \frac85\beta$. Hence, similarly to Case 3.1.4.2, there exists a $k$ with $a \prec k \le K$ such that $(2A)(2a, 2k) \succ 3A(a, k)$. This shows $(2A)(0, 2H) \succ 3A(0, H)$ by Lemma 2.5.

Subcase 3.1.4.3.2. There is an $x \sim a$ such that $A[x, H]$ is a subset of an a.p. of difference $d > 1$. Then either
$$\bigl(A[x, H] + A[x, H]\bigr) \cap \bigl(A[x, H] + a'\bigr) = \emptyset \quad\text{or}\quad \bigl(A[x, H] + A[x, H]\bigr) \cap \bigl(A[x, H] + a' + 1\bigr) = \emptyset$$
for any two consecutive numbers $a'$ and $a' + 1$. Since $d_{a+U}(A) > 0$, one can assume $x \in A$. If $(2A)(2x, 2H) \succ 2A(x, H)$, then
$$(2A)(2a, 2H) \succeq A(x, H) + (2A)(2x, 2H) \succ 3A(x, H) \sim 3A(a, H)$$
because $T$ contains two consecutive numbers. Hence $(2A)(0, 2H) \succ 3A(0, H)$. So one can now assume that $(2A)(2x, 2H) \preceq 2A(x, H)$. By Lemma 1.6, $A[x, H]$ is a subset of an a.p. of difference $d$ and length $\sim A(x, H)$. This implies $A(x, H) \sim \frac1d(H - x)$. If there is a $y \prec a$ with $y \in A$ such that $A(y, a) \sim 0$, then
$$\begin{aligned}(2A)(0, 2H) &\sim (2A)(0, 2y) + (2A)(2y, 2a) + (2A)(2a, 2H)\\ &\succeq 3A(0, y) + \bigl|A[y, \min\{2a - y, H\}] + y\bigr| + 3A(a, H) \succ 3A(0, H),\end{aligned}$$
because $2a - y \succ a$, and $\frac1d(a - y) \succ 0$ when $2a - y \le H$. Otherwise, choose a $y \prec a$ such that $y \in A$, $a - y < H - a$, and $A(y, a) \prec \frac{1}{4d}(a - y)$. Such a $y$ exists because $d_{a-U}(A) = 0$. Then $A(a, 2a - y) \sim \frac1d(a - y)$ and
$$\begin{aligned}(2A)(0, 2H) &\sim (2A)(0, 2y) + (2A)(2y, 2a) + (2A)(2a, 2H)\\ &\succeq 3A(0, y) + |A[y, 2a - y] + y| + 3A(a, H)\\ &\succeq 3A(0, y) + A(a, 2a - y) + 3A(a, H)\\ &\sim 3A(0, y) + \tfrac1d(a - y) + 3A(a, H)\\ &\succ 3A(0, y) + 4A(y, a) + 3A(a, H) \succ 3A(0, H).\end{aligned}$$

Now the only remaining case is the following.

Case 3.1.4.4. $d_{a+U}(A) = 0$ and $d_{a-U}(A) = 0$. This case is divided into four subcases.

Subcase 3.1.4.4.1. There is a $c \succ a$ such that $A(a, c) \sim 0$. By Lemma 2.2, $c$ can be chosen so that $A(c, x) \succ 0$ for any $c \prec x \le H$. Without loss of generality, we can choose $c \in A$. By Lemma 2.2, there is a $b \preceq a$ in $A$ such that $A(b, a) \sim 0$ and $A(x, b) \succ 0$ for any $0 \le x \prec b$. Now one has
$$\begin{aligned}(2A)(0, 2H) &\sim (2A)(0, 2b) + (2A)(2b, b + c) + (2A)(a + c, 2H)\\ &\succeq 3A(0, b) + |A[2b - c, b] + c| + \bigl|(T \cup A[c, H]) + (T \cup A[c, H])\bigr|\\ &\succeq 3A(0, b) + A(2b - c, b) + 3A(c, H) \succ 3A(0, H).\end{aligned}$$

Subcase 3.1.4.4.2. There is a $b \prec a$ with $A(b, a) \sim 0$. Again $b$ can be chosen so that $A(x, b) \succ 0$ for any $0 \le x \prec b$. The proof is similar to the one above. Assume $b \in A$. Then
$$\begin{aligned}(2A)(0, 2H) &\sim (2A)(0, 2b) + (2A)(2b, 2a) + (2A)(2a, 2H)\\ &\succeq 3A(0, b) + \bigl|A[\max\{2b - a, 0\}, a] + a\bigr| + 3A(a, H)\\ &\succeq 3A(0, H) + A\bigl(\max\{2b - a, 0\}, b\bigr) \succ 3A(0, H).\end{aligned}$$
So we can now assume that $A(x, a) \succ 0$ and $A(a, y) \succ 0$ for any $x \prec a \prec y$.

Subcase 3.1.4.4.3. For any $c \sim a$, the set $A[c, H]$ is not a subset of an a.p. of difference $d > 1$. By the overspill principle, one can choose a $c \succ a$ small enough such that $2c - a < H$ and $A[c, H]$ is not a subset of an a.p. of difference $d > 1$.


Furthermore, one can require that $A(a, c) \preceq A(c, 2c - a)$. This can be done by the following steps. Suppose the original $c$ does not work. First, let $0 < \epsilon < \frac{A(a, c)}{c - a}$. Let $a \prec x \le c$ be the smallest number such that $\frac{A(a, x)}{x - a} > \epsilon$ (such an $x$ exists because $d_{a+U}(A) = 0$). Then redefine $c = \bigl[\frac{x + a}{2}\bigr]$. It is easy to see that the new $c$ is the number we want. Hence
$$\begin{aligned}(2A)(0, 2H) &\sim (2A)(0, 2a) + (2A)(2a, 2c) + (2A)(2c, 2H)\\ &\succeq 3A(0, a) + |A[a, 2c - a] + T| + 3A(c, H)\\ &\succeq 3A(0, a) + 2A(a, 2c - a) + 3A(c, H)\\ &\succeq 3A(0, a) + 4A(a, c) + 3A(c, H) \succ 3A(0, H).\end{aligned}$$

Subcase 3.1.4.4.4. There is a $c \sim a$ such that $A[c, H]$ is a subset of an a.p. of difference $d > 1$. One can assume $c \in A$, because $A(a, y) \succ 0$ for any $a \prec y \le H$. If $(2A)(2c, 2H) \succ 2A(c, H)$, then $(2A)(2a, 2H) \succ 3A(c, H) \sim 3A(a, H)$ because $T$ contains two consecutive numbers. Hence $(2A)(0, 2H) \succ 3A(0, H)$. Therefore, one can assume $(2A)(2c, 2H) \sim 2A(c, H)$. By Lemma 1.6, $A[c, H]$ is a subset of an a.p. of length $\sim A(c, H)$ with difference
$$d \approx \frac{H - c}{A(c, H)}.$$
This implies that $A(c, x) \sim \frac1d(x - c)$ for any $c \prec x \le H$. Hence $d_{a+U}(A) = \frac1d$, which contradicts $d_{a+U}(A) = 0$ as assumed in this case.

The theorem now follows from the above four claims.

Proof of Theorem 3.2. If $\alpha < \frac12$ in the proof of Theorem 3.1 is replaced by $\alpha = \frac12$, everything still holds except the proof of Lemma 2.6, which is used in Cases 3.1.3.1 and 3.1.4.2. In Lemma 2.6, the inequality $\alpha < \frac12$ is used to guarantee that $k_0 \succ 0$. When $\alpha = \frac12$, Case 2.6.1 cannot occur. Under Case 2.6.2, one can still derive a contradiction using the same proof. Hence $k_0 \sim 0$ must be true. If $k_0 \sim 0$, then it is easy to see that $(2A)(H, 2H) \sim H$. Hence $(2A)(0, 2H) \sim \frac32 H$ implies $(2A)(0, H) \sim \frac12 H$. So $(2A)(0, H) \sim A(0, H)$. This shows that in Claim 3.1.3, if $\alpha = \frac12$, then the conclusion needs to be changed to $(2A)(0, H) \sim A(0, H)$, which is (b) of Theorem 3.2.


In Case 3.1.4.2, one needs to derive a contradiction when $\alpha = \frac12$. In fact, if $\alpha = \frac12$, then the condition of Claim 3.1.4 cannot occur. Let $a = \max\{x \in A : x \text{ is the least element of a crowded triple in } A\}$ and let $T \subseteq A$ be the crowded triple with $a = \min T$. If $0 \prec a \prec H$, then $A[a + 3, H]$ contains no crowded triple. Hence
$$A(a + 3, H) \preceq \tfrac12(H - a - 3),$$
which implies $A(0, a) \sim \frac12 a$, and $|A[a + 3, H] + T| \succeq 2A(a + 3, H)$. So
$$\begin{aligned}(2A)(0, H + a) &\succeq (2A)(0, 2a) + (2A)(2a, a + H) \succeq 3A(0, a) + 2A(a, H)\\ &\sim 2A(0, H) + A(0, a) \succeq H + \tfrac12 a = \tfrac34(H + a) + \tfrac14(H - a) \succ \tfrac34(H + a).\end{aligned}$$
This contradicts (4) of Theorem 3.2.

Next we present a corollary to Theorem 1.3. Let
$$Q = \left\{\frac2k : k \ge 4\right\}.$$



Corollary 3.3. Suppose $A$ contains two consecutive numbers and $\overline{d}(A) = \alpha < 1$. If $\alpha \notin Q$ and
$$\overline{d}(2A) = \min\bigl\{\overline{d}(2B) : B \text{ contains two consecutive numbers and } \overline{d}(B) \ge \alpha\bigr\},$$
then $\underline{d}(A) = 0$.

Proof. Since $\alpha \notin Q$, $A$ cannot have the structure described in the conclusion of Part II(a) or Part III(a) of Theorem 1.3. If $A$ has the structure described in the conclusion of Part II(b) of Theorem 1.3, then it is easy to see that $\lim_{n\to\infty} \frac{A(b_n)}{b_n} \le \lim_{n\to\infty} \frac{c_n}{b_n} = 0$. Hence $\underline{d}(A) = 0$. Suppose that $A$ has the structure described in the conclusion of Part I or Part III(b), and suppose $\underline{d}(A) = \beta > 0$. Then $d_U({}^*A) = \beta'$, where $U = U_H$ and $H$ is the hyperfinite integer chosen in Theorem 3.1. If $\beta' > \frac12$, then there is $0 \prec a \prec H$ such that ${}^*A(0, x) \succ \frac12 x$ for any $x$ with $0 \prec x \le a$. Hence, by the Pigeonhole principle, $(2\,{}^*A)(0, a) \sim a$. This implies $\overline{d}(2A) = 1$, which contradicts the fact that $\overline{d}(2A) = \frac{1 + \alpha}{2} < 1$.
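The Pigeonhole step above presumably rests on the following finite fact: if $A \subseteq [0, x]$ has more than $\frac{x+1}{2}$ elements, then $x \in A + A$. Applying this to every $x$ with $0 \prec x \le a$ gives $(2\,{}^*A)(0, a) \sim a$. The Python sketch below verifies the finite fact exhaustively for small $x$ (called `n` in the code; the helper name is hypothetical). It only illustrates the step and does not replace the argument.

```python
from itertools import combinations


def dense_sets_hit_n(n):
    """Check, over every A ⊆ [0, n] with |A| > (n + 1) / 2, that n ∈ A + A.

    Pigeonhole: if n is even and n/2 ∈ A, then n/2 + n/2 = n. Otherwise A lies in
    the floor((n + 1) / 2) two-element pairs {i, n - i}, but has more elements than
    there are pairs, so it contains some i together with n - i.
    """
    smallest = (n + 1) // 2 + 1          # least size with 2|A| > n + 1
    for size in range(smallest, n + 2):
        for A in combinations(range(n + 1), size):
            members = set(A)
            if not any(n - x in members for x in members):
                return False             # this A would be a counterexample
    return True


print(all(dense_sets_hit_n(n) for n in range(13)))
```

The bound is sharp. For odd $n$, the set $\{0, \dots, \frac{n-1}{2}\}$ has exactly $\frac{n+1}{2}$ elements, and its largest pairwise sum is $n - 1$.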


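If $0 < \beta' \le \frac12$, then by Lemma 2.9 one has $d_U(2\,{}^*A) \ge \frac85\beta'$. Hence there is an $a$ with $0 \prec a \le H$ such that ${}^*A(0, a) \le \frac{11}{10}\beta' a$ and $(2\,{}^*A)(0, a) \ge \frac{15}{10}\beta' a$. This contradicts the condition $(2\,{}^*A)(0, H) \sim {}^*A(0, H)$.

If $\alpha \in Q$, then the corollary is no longer true. The following is an example.

Example 3.4. Let $\alpha \in Q$, say $\alpha = \frac2k$. Note that $\alpha \le \frac12$. Fix any $0 \le \beta \le \alpha$. Let $B = \{kn : n \in \mathbb{N}\} \cup \{1 + kn : n \in \mathbb{N}\}$. Then $\overline{d}(B) = \alpha$. If $\beta = 0$, then let
$$A = B \cap \bigcup_{n \in \mathbb{N}} \bigl(2^{2^{2n}}, 2^{2^{2n+1}}\bigr),$$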

and if $0 < \beta \le \alpha$, then let A=B

**  n + n + 22 , 22 . α

n∈N

It is easy to check that $A$ contains two consecutive numbers, $\overline{d}(A) = \alpha$, $\overline{d}(2A) = \frac32\alpha$, and $\underline{d}(A) = \beta$.

§4. An unsolved case. We end this paper by asking the following question. A less vigorous version of the question is also asked in [7].

Question 4.1. Is it true that if $\overline{d}(A) = \alpha < \frac12$ and $\gcd(A - \min A) = 1$, then $\overline{d}(2A) = \frac32\alpha$ implies that either
(1) there exist $k, c, c'$ such that $\alpha = \frac2k$ and $A \subseteq \{c + ik : i \in \mathbb{N}\} \cup \{c' + ik : i \in \mathbb{N}\}$, or
(2) for any increasing sequence $\langle h_n : n \in \mathbb{N}\rangle$ with $\lim_{n\to\infty} \frac{A(0, h_n)}{h_n + 1} = \alpha$, there exist two sequences $0 \le c_n \le b_n \le h_n$ such that
$$\lim_{n\to\infty} \frac{c_n}{h_n} = 0, \qquad \lim_{n\to\infty} \frac{A(b_n, h_n)}{h_n - b_n + 1} = 1,$$
and $[c_n, b_n] \cap A = \emptyset$ for every $n \in \mathbb{N}$?

We conjecture that the answer is "yes".

REFERENCES

[1] Yuri Bilu, Addition of sets of integers of positive density, The Journal of Number Theory, vol. 64 (1997), no. 2, pp. 233–275.
[2] Yuri Bilu, Structure of sets with small sumset, Astérisque, vol. 258 (1999), pp. 77–108.
[3] Heini Halberstam and Klaus Friedrich Roth, Sequences, Oxford University Press, 1966.


[4] Pál Hegedűs, Gábor Piroska, and Imre Z. Ruzsa, On the Schnirelmann density of sumsets, Publicationes Mathematicae Debrecen, vol. 53 (1998), no. 3–4, pp. 333–345.
[5] C. Ward Henson, Foundations of nonstandard analysis: a gentle introduction to nonstandard extension, Nonstandard analysis: Theory and applications (Nigel J. Cutland, C. Ward Henson, and Leif Arkeryd, editors), Kluwer Academic Publishers, 1997.
[6] Renling Jin, Nonstandard methods for upper Banach density problems, The Journal of Number Theory, vol. 91 (2001), pp. 20–38.
[7] Renling Jin, Inverse problem for upper asymptotic density, The Transactions of the American Mathematical Society, vol. 355 (2003), no. 1, pp. 57–78.
[8] Tom Lindstrøm, An invitation to nonstandard analysis, Nonstandard analysis and its applications (Nigel J. Cutland, editor), Cambridge University Press, 1988.
[9] Melvyn B. Nathanson, Additive number theory: inverse problems and the geometry of sumsets, Springer, 1996.

DEPARTMENT OF MATHEMATICS
COLLEGE OF CHARLESTON
CHARLESTON, SC 29424, USA


NONSTANDARD ANALYSIS AND COHOMOLOGY

ANGUS MACINTYRE

Abstract. The article outlines developing issues in the model theory of cohomology as used in algebraic geometry. It considers nonstandard cohomology theories, and the delicate issues of formalism and uniformity needed to get nontrivial results. One important general point of method is that for this kind of nonstandard analysis one does better in a category-theoretic, rather than a set-theoretic, framework. Particular emphasis is placed on category-theoretic versions of the ultraproduct construction.

§1. Introduction. This short article outlines developments and issues connecting model theory and cohomological aspects of algebraic geometry. The literature is all recent and comes from two sources. On the one hand, there is my paper [23], rather technical and deliberately avoiding foundational issues, and the companion piece [28] by Schoutens, more concerned with bounds in advanced commutative algebra. There is significant overlap, especially in connection with intersection multiplicities. [23] led to Tomašić's paper [29], which uses model theory to construct a Weil Cohomology Theory out of more elementary torsion situations. On the other hand, there is an independent development, from outside of model theory, by Brünjes and Serpé [6, 7], developing nonstandard versions of cohomology on foundations of nonstandard category theory. They, too, construct a Weil Cohomology Theory, and also obtain suggestive results on ℓ-adic étale cohomology. More recently, there is the remarkable paper of Beke [1]. In this article, I provide a guide to the above papers, and some suggestions on how to develop the ideas therein. The main methodological point is that one does better if one does one's nonstandard analysis in a category-theoretic version rather than in the set-theoretic enlargement style.

§2. Foundational issues. 2.1. Geometry and model theory. Algebraic geometry meets model theory first in the setting of algebraically closed fields, and the structure of definable sets therein. Only first-order logic is involved, and for a long way into the development of the model theory of fields one does not encounter, or feel the need for, nonstandard integers, infinitesimals, the counting/measure analogies of nonstandard analysis, etc.

Nonstandard Methods and Applications in Mathematics. Edited by N. J. Cutland, M. Di Nasso, and D. A. Ross. Lecture Notes in Logic, 25. © 2006, Association for Symbolic Logic.

As one approaches diophantine geometry, the situation changes. Indeed, the first serious attempt to relate model theory to diophantine geometry, by Robinson and Roquette [27], did use nonstandard methods, and had a number of suggestive ideas. But the paper has had negligible influence lately, as compared with that of the standard, but very sophisticated, first-order model theory applied to diophantine geometry by Hrushovski [16] and others. In [17] Hrushovski has a very useful metaphor concerning interactions between model theory and number theory/geometry. It is conspicuous that one sees almost no topos theory in contemporary model theory, and indeed very little of the sheaf-theoretic formalism of modern geometry. I suspect that most model theorists are aware that they will have to use the sheaf formalism if they are to keep in touch with geometry. Yet most model theorists are at best indifferent to topos theory, and uninterested in its semantics. Despite this, it is encouraging to see nontrivial cohomology emerging in o-minimality [4], and the beginnings of a model theory (with little cohomology) of complex or rigid analytic spaces.

2.2. Ultraproducts. I prefer to think of enlargements as given by ultraproducts. For one thing, the constructions are more concrete, and, for another, they are richly functorial. In addition, the concrete interpretation via ultraproducts should normally remove any confusions about the internal/external distinction. Another virtue of the ultraproduct is that it can be defined categorically. Then, for certain categories of structures (see below), one can show the existence of ultraproducts in this categorical sense, even when the conventional set-theoretic ultraproduct takes us out of the category. Important examples are the categories of profinite groups, or normed spaces, in both cases suitably enriched.
The need for this enrichment in most interesting cases is perhaps the main point of this paper, and is analogous to the enrichments made by Bishop (for example) in constructive mathematics [5, 22].

2.3. Sheaf interpretation. One geometric aspect of enlargements often neglected is that all enlargements live together in a sheaf setting. I find it natural to think of the ultraproduct construction as taking a family $\{M_i : i \in I\}$ of conventional Tarskian structures and extending it naturally to a family over a space $\hat I$ extending $I$. $\hat I$ will be $\mathrm{Spec}(R)$, the space of prime ideals of the Boolean ring $R$ corresponding to the Boolean algebra $\mathrm{Powerset}(I)$. Thus a basic open set of $\hat I$ is of the form $U_A$ for $A$ an element of $R$, where $U_A = \{\wp : A \notin \wp\}$. $\mathrm{Spec}(R)$ is well known to be homeomorphic to $\beta I$, the Stone-Čech compactification of $I$, that is, the space of ultrafilters on $I$, via the map sending an ideal $\wp$ to $I \setminus \wp$.
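When the index set is finite, the identification of $\mathrm{Spec}(R)$ with the space of ultrafilters can be checked by brute force. The Python sketch below uses a hypothetical function name. It encodes subsets of $I = \{0, \dots, n - 1\}$ as bitmasks, takes symmetric difference as ring addition and intersection as ring multiplication, and lists every prime ideal of $R = \mathrm{Powerset}(I)$. For finite $I$, one expects exactly the $n$ ideals $\{A : i \notin A\}$, whose complements are the principal ultrafilters $\{A : i \in A\}$. The nonprincipal points of $\beta I$ that matter for ultraproducts only appear for infinite $I$, and they cannot be enumerated this way.

```python
def prime_ideals_of_powerset(n):
    """Brute-force list of the prime ideals of the Boolean ring Powerset({0, ..., n-1}).

    A subset of I is a bitmask; ring addition is symmetric difference (^) and ring
    multiplication is intersection (&). Only practical for very small n.
    """
    elems = range(1 << n)          # all subsets of I
    full = (1 << n) - 1            # the whole set I, the unit of the ring
    primes = []
    for family in range(1 << (1 << n)):                     # every family of subsets of I
        P = {a for a in elems if (family >> a) & 1}
        if 0 not in P or full in P:                         # contains 0, and is proper
            continue
        if any((a ^ b) not in P for a in P for b in P):      # closed under addition
            continue
        if any((a & r) not in P for a in P for r in elems):  # absorbs multiplication
            continue
        if any((a & b) in P and a not in P and b not in P
               for a in elems for b in elems):               # primality
            continue
        primes.append(frozenset(P))
    return primes


n = 3
for P in prime_ideals_of_powerset(n):
    ultrafilter = set(range(1 << n)) - P                     # the complement of the ideal
    members = [{i for i in range(n) if (A >> i) & 1} for A in ultrafilter]
    print(sorted(P), "-> ultrafilter principal at", set.intersection(*members))
```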


Since one is talking about geometry, it is not mere self-indulgence to formulate the ultraproduct construction in terms of sheaves, as follows. One starts from a family of Tarskian structures $\{M_i : i \in I\}$ (we will use very little about the Tarskian category in what follows), and makes a presheaf $F$ on $\hat I$ out of it. The crucial definition is given on the basic open sets:
$$F(U_A) = \prod_{i \in A} M_i,$$
with the natural restriction maps on such products. It is clear that $F$ naturally extends to a presheaf, and indeed a sheaf, on $\mathrm{Spec}(R)$. What is of most importance to us is the identification of the stalk of $F$ at a point $\wp$ of $\mathrm{Spec}(R)$. It is well known that, when the $M_i$ are conventional Tarskian models, the stalk, which we write as $M_\wp$, is canonically identifiable as the ultraproduct of the $M_i$ with respect to the ultrafilter $I \setminus \wp$. The principal ultrafilters correspond to the principal prime ideals $I \setminus \{i\}$.

2.4. Continuity of satisfaction. The ultraproduct, in the Tarskian category, satisfies Łoś's Theorem [3], which has a natural interpretation in terms of continuity of satisfaction. The elements of the stalk $M_\wp$ are identified as germs of functions in the standard and obvious way. To avoid excessive notation, we write germs as if they were functions. For a formula $\varphi(v_1, \dots, v_n)$ and partial functions $f_1, \dots, f_n$, define $\mathrm{val}(\varphi, f_1, \dots, f_n)$ as
$$\{\, i \in I : M_i \models \varphi(f_1(i), \dots, f_n(i)) \,\}.$$
Then
$$M_\wp \models \varphi(f_1, \dots, f_n) \iff \mathrm{val}(\varphi, f_1, \dots, f_n) \notin \wp.$$

Since the right-hand side of the equivalence is an open condition on $\wp$, one deduces continuity of satisfaction, in the sense that if a formula holds at $M_\wp$ for the germs at $\wp$ of functions $(f_1, \dots, f_n)$, then it holds for the germs of these functions in all stalks over points of $\mathrm{Spec}(R)$ in an open neighbourhood of $\wp$.

2.5. Topos-theoretic aspects. A very much more general version of the ultraproduct construction is to be found in the literature on topoi. Remarkably, it seems to be little known to model theorists. One begins with the notion of a filter on a left exact functor $L$ between two topoi $E$ and $F$, and the subnotion of ultrafilter. To the data of such a filter one associates a new topos, the filterpower of $E$ modulo the filter, and a canonical projection functor from $E$ to the filterpower. One proves that the projection functor is logical, i.e. preserves finite limits, exponentials, and the subobject classifier.


A particularly important case generalizes the Tarskian ultraproduct and the Robinson enlargement. That is the case when $I$ is an object of $F$ with global support, $E$ is the slice topos $F/I$, and the left exact functor from $E$ to $F$ is the canonical "product of $I$-indexed families". For the category of sets one gets from an ultrafilter on $I$ an ultrafilter on the above left exact functor, and one readily identifies the elements of the filterpower as the ultraproducts of the "$I$-indexed families". This is not the place to enter into further details. There are excellent accounts in many places, for example Bell's [2] or Johnstone's [20]. The Beth-Kripke-Joyal semantics subsumes the semantics implicit in Łoś's Theorem. I remain puzzled at the apparent indifference of model theorists and nonstandard analysts to this beautiful machinery.

2.6. Sorting. From the beginnings of the model theory of ultraproducts it has been obvious that Łoś's Theorem, and all the functoriality, is true for structures in many-sorted logic. It is noteworthy that one did not exploit this until much later, and, indeed, model theory was presented in an inflexible one-sorted way until quite recently. In the special case of ultraproducts (or its special subcase of enlargements) the advantages of the many-sorted formalism are conspicuous. For example, one way to demystify the internal/external distinction is to have one sort for the structure and one for its powerset, and to stress that in taking ultraproducts one does not take products across sorts. Thus the set sort in the ultraproduct is not the powerset of the object sort, and that's all there is to say. One need not even interpret the set sort as a subset of the powerset. (At least, I cannot think of any arguments where it would be crucial to make this interpretation.) It seems to me much more natural to regard the ultraproducts/enlargements as living in a topos, not necessarily that of sets.
It is of course convenient to be able to construe topoi as local (sorted) set theories, as Bell does (loc. cit.). My main point here is that for certain rather basic categories in mathematics the naive Tarskian ultraproduct takes one out of the category, but there is an enrichment of the category, equivalent to the original, given in a sorted formulation, closed under ultraproduct in the categorical sense, and thereby providing a natural definition of ultraproduct for the original situation. I will now discuss, sketchily, some such cases, leaving to the interested reader the (presumably) routine task of laying foundations of this kind of model theory.

2.7. Nonstandard hull. Suppose we want to define the ultraproduct of normed linear spaces over a field K (the reals, or complexes, or p-adics). The Tarskian ultraproduct will have a norm taking values in an ultrapower of K, rather than in K, and will certainly lose any completeness the original spaces might have had. The standard solution [14] is to consider a quotient of a substructure of the Tarskian ultraproduct. Namely, one considers the elements of Z-bounded norm, modulo the elements of Z-infinitesimal norm.


Let me just sketch how this is really an ultraproduct in a categorical sense, when normed spaces are construed as structures for a many-sorted formalism. Let $M$ be a normed space. One construes this as sorted by sorts $B_r$, as $r$ runs through the non-negative reals. $B_r$ is to be the sort of elements of norm $\le r$. The module operations (addition and scalar multiplication) must also be sorted. Addition has to be replaced by a family of maps $+_{r,s}$ from $B_r \times B_s$ to $B_{r+s}$, and similarly for all the scalar multiplications. We should have a constant $0$ for the zero of the module, and perhaps also transition maps $j_{r,s}$ from $B_r$ to $B_s$ for $r \le s$. Now consider the category of normed spaces with contraction maps. Such maps send each $B$-sort to itself, and are maps for the many-sorted formalism above. Note that the category has products, where $\prod M_i$ consists of the elements of the set-theoretic product whose supnorm is finite, and we take this supnorm as norm. The projections are obviously contractions. Moreover, the category has the direct limits of products required for the categorical definition of ultraproducts, and this is just the quotient of the conventional direct limit by the infinitesimals. That is, the nonstandard hull is exactly the categorical ultraproduct. The exact form of Łoś's Theorem in this setting has been worked out by Henson and associates.

2.8. Profinite groups. A similar construction arises for profinite groups [9]. The Tarskian ultraproduct of profinite groups need not be profinite, but a many-sorted version is closed under ultraproduct. Here is a brief sketch of the basic idea. There is a sort for each positive integer $n$ and each group (structure) on $\{1, \dots, n\}$. Call these the group sorts $H$. To a profinite group $G$ we associate in sort $H$ all continuous epimorphisms to $H$. The other crucial part of the formalism is the data of unary maps (f) from sort $H_1$ to sort $H_2$, for every epimorphism $f$ from $H_1$ to $H_2$.
The interpretation of this map is composition with f. We can recover G up to isomorphism from the many-sorted structure as a projective limit (see [8] for details of a related interpretation). The essential point is that one can find a set of sentences in the sorted formalism characterizing exactly the structures arising from profinite groups. Then the work of [9] really shows that the categorical definition applied to the sorted structures gives the notion of co-ultraproduct of profinite groups used in [9]. Moreover, it shows that for the category of profinite groups with epimorphisms there is a categorical definition of co-ultraproduct (and this is a natural notion for Galois-theoretic purposes, as [9] makes clear). In this situation one was led to the co-ultraproduct by a desire to have a model theory of Galois groups dual to the model theory of the category of fields with regular embeddings. One of course knew from the outset that the classical ultraproduct did not commute with the operation of taking absolute Galois groups.
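The sorted presentation of a normed space from 2.7 can be modelled directly, with each element carrying the radius of its sort. The Python sketch below is illustrative only. It takes $\mathbb{R}^2$ with the Euclidean norm as an assumed example space, and the names `Elem`, `in_sort`, `add`, `smul` and `transition` are hypothetical. Addition is typed as $+_{r,s}\colon B_r \times B_s \to B_{r+s}$, which the triangle inequality makes well defined. Scalar multiplication by $c$ is typed here as $B_r \to B_{|c| r}$, which is one natural reading of "similarly for all the scalar multiplications".

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Elem:
    """An element of the sort B_r: a vector v recorded with a radius r >= ||v||."""
    r: float
    v: tuple


def norm(v):
    """Euclidean norm, used here only as a concrete example of a norm."""
    return math.sqrt(sum(x * x for x in v))


def in_sort(r, v, tol=1e-9):
    """Place v in the sort B_r, refusing if ||v|| > r (up to a rounding tolerance)."""
    if r < 0 or norm(v) > r + tol:
        raise ValueError(f"vector of norm {norm(v):.3f} does not lie in B_{r}")
    return Elem(r, tuple(v))


def add(x, y):
    """+_{r,s}: B_r x B_s -> B_{r+s}; the triangle inequality guarantees the result fits."""
    return in_sort(x.r + y.r, [a + b for a, b in zip(x.v, y.v)])


def smul(c, x):
    """Scalar multiplication by c, typed as B_r -> B_{|c| r}."""
    return in_sort(abs(c) * x.r, [c * a for a in x.v])


def transition(x, s):
    """j_{r,s}: B_r -> B_s for r <= s; the vector is unchanged, only the sort grows."""
    if x.r > s:
        raise ValueError("transition maps only go from B_r to B_s with r <= s")
    return Elem(s, x.v)


ZERO = in_sort(0.0, (0.0, 0.0))        # the constant 0, which lies in B_0

x = in_sort(1.0, (0.6, 0.8))
y = in_sort(2.0, (0.0, 2.0))
print(add(x, y).r, smul(-2, x).r, transition(x, 5.0).r)
```

In this picture, a contraction in the sense of 2.7 would be a map that sends each sort $B_r$ into itself. The sketch leaves such maps out.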


For future reference, note that the category of finite Galois extensions of fields is intimately connected to the étale site over the spectrum of a field. We will later see an extensive generalization of the preceding in the model theory of étale sites over varieties over algebraically closed fields. The sorting required will be significantly more complicated.

2.9. Polynomial rings. An example more fundamental for our present exposition concerns the polynomial ring construction in n variables over a field. It is well known that this construction does not commute with the Tarskian ultrapower. Despite this, one has succeeded in getting significant understanding of nonstandard polynomials (of infinite degree), and this has led to a number of fine results about bounds in the theory of polynomial ideals [30]. Here, however, I take a different, less ambitious, point of view. The polynomial ring $K[x_1, \dots, x_n]$ is sorted as follows. One has sorts $D_k$ for the set of polynomials of degree $\le k$, and, as above, transition maps identifying $D_k$ inside $D_l$ for $k \le l$. We have group operations $+_k$ on $D_k$, and multiplications $\times_{k,l}$ from $D_k \times D_l$ to $D_{k+l}$. It is obvious that there are some simple first-order axioms characterizing polynomial rings over fields in this formalism. Note that there is no global ring sort. Clearly this construction commutes with ultraproduct, and is again a categorical ultraproduct. The interpretation of such important results as the first-order nature of primality ([30]) is that the property of $F_1, \dots, F_k$ in $D_l$ to generate a prime ideal is definable by a simple sorted formula. I want to stress that I applaud the efforts to give more than ritual sense to "infinite polynomials", but all worthwhile such efforts are difficult. It is noteworthy that in [6, 7] there are many appeals to nonstandard entities of this kind, in situations where one has no nontrivial information about them.
Reliance on such entities is certainly not an optimal situation, as real understanding requires two perspectives on the nonstandard [17]. Having established that the notion of prime for k-tuples in $D_l$ is elementary, one has in some sense got a sorted hold on Spec of the polynomial ring. Spec is sorted by k, l as above, and this commutes with ultraproduct. From here one can go on to give a sorting of affine, or projective, varieties over a field. Thence one goes, in the style of [23], to a sorting of the free abelian group generated by the subvarieties of a variety. Noting, too, that dimension is an elementary notion in the elementary theory of ideals over an algebraically closed field (and the field sort can be identified with the polynomials of degree 0), we get the result that the divisor group of varieties of sort $\sigma$ commutes with the sorted ultraproduct. All this is fairly trivial and obtained without ever understanding cycles of nonstandard complexity (although of course any progress in that direction would be welcome).

2.10. Intersection theory. One proceeds from here to intersection theory as in [23], and rather quickly the matter becomes much more complex. The graded abelian group of cycles over a variety X is, as just remarked,

180

ANGUS MACINTYRE

first-order in a suitably sorted formalism. It is usually written $Z^*(X)$ or $Z_*(X)$, depending on whether it is graded by codimension or by dimension. The crucial structure for intersection theory is a quotient of the above by an equivalence relation E, for which there are three standard choices:
• rational equivalence,
• algebraic equivalence,
• numerical equivalence.
Kleiman [21] gives an account of these that is attractive from a model-theoretic point of view. A fundamental issue is the first-order nature of these relations on the sorted structure $Z^*$. What one needs to know is whether or not these notions, for cycles in particular sorts $\sigma, \tau$, can be defined using quantification over a finite set of sorts depending only on $\sigma, \tau$. Some answers or suggestive observations are known. For example, Hrushovski [15] has adapted a famous counterexample of Mumford [26] to show that rational equivalence is not definable in this sense, for cycles over projective nonsingular varieties. It seems likely that the same is true for algebraic equivalence, but I believe this is not known. For numerical equivalence, I made the observation, using ultraproducts, that it is elementary in this sense, provided a suitable version of the Grothendieck Standard Conjectures holds [21]. One may of course press on and try to understand nonstandard rational or algebraic equivalence, but nothing significant has been found. Numerical equivalence is probably more basic nowadays, because of its motivic significance [18]. It may thus seem that one faces insuperable difficulties in showing that the Chow ring, obtained via rational equivalence, behaves well with respect to ultraproduct or enlargement. But in fact the situation is not chaotic. What is needed is an unwinding, in the style advocated by Kreisel [22], of Fulton's development [11] of intersection theory. A more detailed outline is given in [23] (and a more systematic account is desirable). Here are the essential points.
The category of projective nonsingular varieties (over an algebraically closed field L) has a natural sorting both on its objects and on its morphisms. The sorting on objects is more or less the one already indicated for Spec of polynomial rings. Of course one has to take account of covers by affine varieties (e.g. in the style of Weil), but this is routine. More subtle is the sorting of morphisms. These are given by tuples of polynomials subject to various consistency conditions. Note that we have to make composition sorted, i.e. provide a family of sorted compositions. Once all this is done, the enriched category is preserved under ultraproducts. As usual, enlargement provides artefacts such as nonstandard varieties and morphisms, but to my knowledge nothing nontrivial is known about them.

Classically one has some functoriality for the Chow ring obtained by dividing the cycle group by an adequate equivalence relation. Thus to a morphism f : V → W in the category of varieties one associates functorially two maps $f_*$ and $f^*$ of cycle structures, respectively covariant and contravariant [21]. Now even the functor has to be sorted, and considerable care is needed to get round, for example, the Mumford obstruction. The map $f_*$ is a morphism (shifting grading) of graded groups, and $f^*$ a morphism of graded rings. In the sorted setting, things happen as follows. Suppose f : V → W is a sorted morphism. For $f_*$ we have to produce, uniformly in f, for each cycle C over V a cycle $f_*(C)$ over W, together with constructible evidence that this is well-defined modulo the equivalence relation. What exactly is needed here? The relation of rational equivalence is definable in the following sense: it is a countable disjunction, indexed over various sorts, of first-order sorted relations of a special form. The essential idea is that C is equivalent to D if and only if, for some sort and some cycle E of that sort, a definable relation depending only on the sorts holds between C, D and E. What the unwinding must provide, and does provide, is a definable rule which gives, for all C, D and E and definable relations as above, a corresponding sort and definable relation (both of bounded complexity) and a cycle $E^*$ witnessing that $f_*(C)$ is equivalent to $f_*(D)$. This requires detailed examination of the proofs in [11]. An exactly similar result holds for $f^*$. In both cases one shows that the process constructively respects composition. One also shows that the process respects addition, constructibly; that is, one shows explicitly that $f_*(C) + f_*(D)$ is equivalent to $f_*(C + D)$. The next, and crucial, item is the product structure on $Z^*$. This is done explicitly in [23], and again gives a map which is constructibly well-defined modulo the specified equivalence.
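Schematically, and only as a hedged paraphrase of the syntactic contrast drawn in this section, the relations have the following shapes. The notation is not the author's: $Z_s$ denotes the cycles of a sort $s$, $\varphi_s$ is a hypothetical first-order sorted formula of the special form described above, and in the second line $E$ ranges over cycles of complementary dimension. Algebraic equivalence has the same disjunctive shape as the first line.

```latex
\begin{aligned}
C \sim_{\mathrm{rat}} D \;&\iff\; \bigvee_{s} \; \exists E \in Z_{s} \; \varphi_{s}(C, D, E), \\
C \equiv_{\mathrm{num}} D \;&\iff\; \bigwedge_{t} \; \forall E \in Z_{t} \; \bigl(\deg(C \cdot E) = \deg(D \cdot E)\bigr).
\end{aligned}
```

Neither infinite disjunction nor infinite conjunction is, by itself, a single first-order formula in the sorted language; the question raised here is whether, for cycles of given sorts, it can be cut down to finitely many disjuncts or conjuncts.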
One shows, constructibly, that $f^*$ preserves the intersection product, and so gives a morphism of graded rings. Having done all this, one gets a version of the functor $Z^*$ which is preserved under ultraproducts. But what are the consequences of the fact that rational equivalence is not first-order? The Mumford result shows that rational equivalence is not preserved under ultraproduct. It is, of course, preserved under the diagonal map associated to the ultrapower, and the diagonal map is suitably elementary. It seems not to be known whether or not algebraic equivalence is first-order in the sorted sense. As for numerical equivalence, what we know is that it is first-order, in the sorted sense, if a version of the Grothendieck Standard Conjectures is true [23] for all Weil Cohomology Theories. Under those assumptions, the Chow ring modulo numerical equivalence does
commute with (sorted) ultraproducts. Indeed (though this is not yet published) these assumptions give very strong uniformities (or preservation results) for Künneth components, and for the Hodge and Lefschetz parts of the Standard Conjectures. Moreover, results such as Milne's [25] on special cases of Hodge-Lefschetz should yield unconditional uniformities via ultraproducts. It is not clear to me whether there are converse relations, namely that if numerical equivalence is elementary then it coincides with homological equivalence. There is also the issue of relating all this to Jannsen's work [18]. I want to stress that there is a conceptual problem here. My unwinding argument really does show that the Chow ring commutes with ultraproduct, so what exactly is the difference between rational and numerical equivalence from the logical-geometric point of view? Syntactically there is an obvious difference: the former is a disjunction of elementary conditions, the latter a conjunction (and algebraic equivalence is a disjunction).

2.11. Formalism. That one should work in a sorted formalism is clear. Beyond that, it is not so clear what kind of definability should be emphasized. The most delicate structures in the preceding are sorted, and their basic algebraic structure is defined in a sorted way, for example in the case of multiplication in polynomial rings or the ring operations in intersection theory. There is a useful foundational exercise waiting for a student: to set up definitions at just the right level of generality for the intended geometric interactions. It is clear that the kind of structure one encounters in this area is a limit of structures definable in various sorts, with the transition maps between the sorts definable, and the operations in general partial on each sort in the limit, but with a “grading” analogous to that seen for polynomial multiplication or the intersection product.

§3. Ultraproducts and category theory.

3.1. General model theory of categories.
In my work [23] I had to consider ultraproducts of Weil Cohomology Theories and, more generally, ultraproducts of functors. I did this in a cursory fashion, as the basics are quite easy. It is all the more welcome that others have subsequently been more systematic about the matter [6, 7]. I now review the development, beginning with the most general considerations. There are many ways to formalize the notion of category in a first-order way, all so obvious as not to need spelling out. More striking is the fact that the notion of topos has such a formalization [20], a fact seemingly of no interest to most model-theorists. One of course begins abstractly, perhaps as in [6, 7], and in effect defines ultraproduct in a Tarskian way for categories and functors. The authors of [6, 7] work in a one-sorted formalism, which to me is not very natural. I prefer to have a sort for objects and a sort for morphisms, a partial composition, and operations giving the domain and codomain of morphisms. To me it seems a false
economy to take objects as idempotents, etc. In any case it is obvious that we have closure under the usual Łoś ultraproduct, and preservation of such notions as epi, mono, iso, and indeed all finite limits. Moreover, the notion of ultraproduct of functors is clear, yielding a new functor. Of course, the naive functor category between categories is not preserved in general, as it is obviously a second-order notion. There is, however, a sorted notion which is preserved, analogous to the fake power sets. I remark that, unlike [6, 7], I disregard set-theoretic issues (smallness, universes, etc.), in complete confidence that I can, if need be, employ basic metamathematical hygiene to cope with any irritations. Continuing, one sees that the notions of sieve, additive category, abelian category, additive and exact functors, injective and projective objects of an abelian category, and derived functor are preserved. The notion of having enough injectives is preserved too. One then goes on to show that the notion of triangulated category is first-order. Up to this point sorting is scarcely relevant (there is none in [6, 7]).

3.2. Derived categories. Serious sorting is first needed when we come to the notion of the derived category of an abelian category. For that we first have to pass via the category Kom(A) of complexes in an abelian category A. An object of Kom(A) is a $\mathbb{Z}$-indexed sequence of morphisms from A such that the composition of any two successive morphisms is 0. A morphism in Kom(A) is a sequence of morphisms between the two lines, with the obvious commutation condition. To get the right notion of ultraproduct we obviously do not take the global Tarski ultraproduct, but rather the sorted version, where the Tarski ultraproduct is taken at each stage of the grading. It is then obvious that ultraproduct commutes with Kom and that, as usual, the ultraproduct is categorically a limit of products.
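A toy version of the degreewise (sorted) view of Kom(A) may help. Here A is taken, purely as an assumption for illustration, to be finite-dimensional vector spaces over $\mathbb{F}_p$ with p prime, so that a bounded complex concentrated in degrees $0, \ldots, N$ is a list of matrices, $d^n$ having shape $\dim C^{n+1} \times \dim C^n$. The vanishing of successive composites is checked one degree at a time, and $\dim H^n = \dim C^n - \operatorname{rank} d^n - \operatorname{rank} d^{n-1}$. The function names are hypothetical; `pow(x, -1, p)` needs Python 3.8 or later.

```python
def rank_mod_p(M, p):
    """Rank over F_p (p prime) of a matrix given as a list of rows."""
    rows = [[x % p for x in row] for row in M]
    ncols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse of the pivot
        rows[rank] = [x * inv % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(a - factor * b) % p for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def composite_is_zero(B, A, p):
    """Check B @ A == 0 mod p, for A: C^n -> C^{n+1} and B: C^{n+1} -> C^{n+2}."""
    ncols = len(A[0]) if A else 0
    return all(sum(B[i][t] * A[t][j] for t in range(len(A))) % p == 0
               for i in range(len(B)) for j in range(ncols))


def is_complex(d, p):
    """d[n] is the matrix of d^n; check d^{n+1} d^n = 0, one degree at a time."""
    return all(composite_is_zero(d[n + 1], d[n], p) for n in range(len(d) - 1))


def cohomology_dims(dims, d, p):
    """dim H^n = dim C^n - rank d^n - rank d^{n-1} for a complex in degrees 0..N."""
    ranks = [rank_mod_p(M, p) for M in d]
    result = []
    for n, c in enumerate(dims):
        rank_out = ranks[n] if n < len(d) else 0   # d^n: C^n -> C^{n+1}
        rank_in = ranks[n - 1] if n >= 1 else 0    # d^{n-1}: C^{n-1} -> C^n
        result.append(c - rank_out - rank_in)
    return result
```

For instance, the simplicial cochain complex of a triangle (a circle), with coboundary matrix `[[-1, 1, 0], [0, -1, 1], [-1, 0, 1]]` from vertices to edges, gives `cohomology_dims([3, 3], [D0], 5) == [1, 1]`.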
Something that has not been worked out, and needs to be, is a notion of formula appropriate for categories of complexes. For now, we get by with ad hoc observations as to what is preserved by ultraproduct. If someone can make sense of the unrestricted ultraproduct, this is bound to be valuable, but I know of no such success. From Kom(A) one passes to the homotopy category of complexes K(A), where one identifies morphisms which are homotopic [13]. Now, the relation of being homotopic is given by a countable conjunction of first-order sorted formulas, and one readily gets a functor from $K(A)_\wp$ to $K(A_\wp)$. Note that there is no obvious functor going the other way (though I have not checked exactly what pathology occurs). The matter is lightly touched on in [6, 7]. It is completely obvious in the sorted formulation that the cohomology functors $H^n$ are each first-order definable, and so preserved under ultraproduct. From this one sees that the notion of a morphism of complexes being a quasi-isomorphism in Kom(A) is preserved under ultraproduct, in the sense that Łoś's Theorem holds for it and its complement, in each sort.
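The homotopy relation can likewise be checked degree by degree. The sketch below, again over $\mathbb{F}_p$ and with hypothetical names, tests whether a proposed family $h^n : C^n \to D^{n-1}$ satisfies $f^n - g^n = d_D^{n-1} h^n + h^{n+1} d_C^n$ in every degree; each degree contributes one finite condition, which is the sense in which the relation is a conjunction over sorts. It assumes f and g are already chain maps and all spaces are nonzero-dimensional, and it only verifies a given h rather than searching for one.

```python
def mat_mul(A, B, p):
    """A @ B mod p for nonempty matrices given as lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) % p
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_add(A, B, p):
    """A + B mod p, entrywise."""
    return [[(a + b) % p for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def is_chain_homotopy(f, g, h, dC, dD, p):
    """Check f^n - g^n = d_D^{n-1} h^n + h^{n+1} d_C^n for n = 0..N.

    f, g: lists of matrices f[n]: C^n -> D^n (assumed to be chain maps).
    h: dict n -> matrix h^n: C^n -> D^{n-1}; missing degrees count as zero.
    dC[n]: C^n -> C^{n+1} and dD[n]: D^n -> D^{n+1} for n = 0..N-1.
    """
    N = len(f) - 1
    for n in range(N + 1):
        rows, cols = len(f[n]), len(f[n][0])
        total = [[0] * cols for _ in range(rows)]
        if n >= 1 and n in h:        # the term d_D^{n-1} h^n
            total = mat_add(total, mat_mul(dD[n - 1], h[n], p), p)
        if n < N and n + 1 in h:     # the term h^{n+1} d_C^n
            total = mat_add(total, mat_mul(h[n + 1], dC[n], p), p)
        diff = [[(a - b) % p for a, b in zip(ra, rb)] for ra, rb in zip(f[n], g[n])]
        if diff != total:            # one finite condition per degree
            return False
    return True
```

With C = D the two-term complex $\mathbb{F}_p \to \mathbb{F}_p$ whose differential is the identity, the identity chain map is homotopic to zero via $h^1 = 1$, so that complex is contractible.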

As carefully pointed out in [13], one gets the derived category by localizing K(A), rather than Kom(A), at the class of quasi-isomorphisms. Paying attention to the explicit definitions given in [13], one gets a functor from $D(A)_\wp$ to $D(A_\wp)$. This is slightly more general than the observation in [6, 7]. In optimal situations one can work entirely in one of the bounded subcategories, coming from bounded complexes, or complexes bounded above, or below (see [13]). For each of these, one certainly does not want to take the unrestricted ultraproduct. In fact the natural things to consider are families in which $H^n$ is nonzero only in a bounded range depending only on the family. In such situations it is obvious that everything commutes with ultraproduct. More interestingly, as [6, 7] remark, there is a canonical fully faithful functor from $D^b(A_\wp)$ to $D(A)_\wp$. The existence of this is immediate from Łoś. A similar result, with a similar proof, holds for $D^+$. We mention in passing the last section of [6, 7], on fibred categories. Everything in that section is quite straightforward from Łoś.

3.3. More structured categories. The preceding was about abstract categories A, to which we then assign more structured categories, e.g. of complexes, and then get the sophisticated notion of derived category. The main point is that, with suitable attention to formulation, one shows that the derived category functor commutes with ultraproduct. Much more attention to detail is needed when one comes to the derived category of sheaves on the étale site. Many of the most important categories, as far as logic and geometry are concerned, have a more delicate structure, which we must spell out in order to define ultraproducts. We made this clear already in the case of the category of projective nonsingular varieties, where there is a sorting of objects and morphisms. My paper on Weil Cohomology [23] depends on such considerations.
The paper [7] addresses the issue of enlarging étale cohomology, something I had touched on in lectures at MSRI in 1998. As its authors remark, étale cohomology has good properties only for torsion sheaves (or even only for restricted subclasses), and the l-adic theory is not a derived functor. They proceed with the general technology of enlargements, and rarely get (or need) any alternative representation of the structures thus created. For example, they begin with schemes of finite type over a noetherian scheme, and take ultraproducts of everything. As they say, the structures obtained are not schemes, “but can be thought of as some kinds of limits of schemes”. It seems likely that one can do better than this, by restricting the sorts of the schemes (of finite type), or by more systematic use of the perspectives on schemes given in [10]. One should note that a very special case of all this is Galois theory, whose model theory was naturally analyzed in [9] without bringing in the general nonsense of nonstandard analysis. What should be done is to sort the étale site over X; then there is no real difficulty in understanding abelian presheaves, abelian sheaves, the
fibrations given on page 4 of [7], etc. Note of course that the coverings of the site must also be sorted. When I say that we understand sheaves (as opposed to presheaves), it is precisely because the covers are finite, so that the sheaf condition is given by a conjunction of elementary conditions. Without this, one would have problems. No doubt such problems will have to be faced in due course, but for now it seems unnecessary to aim for such generality. I prefer to work with the sorted versions of schemes, which are closed under ultraproducts. It is then completely clear that the notion of abelian presheaf is closed under ultraproduct. Since we deal only with finite (sorted) covers, the notion of sheaf is also closed under ultraproduct. The discussion in [24] is very convenient if one wishes to establish the preservation results. Thus ultraproduct will take abelian categories to abelian categories, functorially but not in general faithfully. On general nonsense grounds we will have a functor from $\mathrm{Sheaves}(X)_\wp$ to $\mathrm{Sheaves}(X_\wp)$, with the sorting of étale morphisms. Closer inspection shows that it is faithful (this is the same kind of observation as underlies [9]) and exact. We simply follow the development of [6, 7]. The most important point is that the sheafification functor behaves well under ultraproduct in this special situation. Since the matter is important both practically and theoretically, let me elaborate a bit; a convenient reference is [24]. One is interested (for now) mainly in varieties X, or more generally quasicompact schemes, over algebraically closed fields. The fact that allows one to deal directly (rather than by disappearing into the obscurity of enlargements) with the étale site over X, and its sheaf theory, is that every object U of the étale site over X is itself quasicompact, and so admits a finite description fitting nicely into a sorted formalism.
Moreover, every étale covering has a finite subcovering, and thus again we may sort the coverings of the site. Essentially as Milne remarks in [24, 7.20], one is therefore able to check that a presheaf is a sheaf by considering finite coverings, and in our language this becomes a family of elementary conditions. Still, a little precision is needed. We know how to sort algebraic sets, and in particular projective varieties. What about quasicompact schemes and their étale morphisms? A quasicompact scheme is just a finite union of affine schemes, i.e. of finitely many $\mathrm{Spec}(R_i)$. Since the rings involved can be complicated, this is not per se much help. But recall that étale maps are quasi-finite and of finite type, so that the schemes in the étale site over a variety X are finite unions of $\mathrm{Spec}(R_i)$ where the $R_i$ are finitely presented. Via this, one sorts the elements of the étale site. To sort the morphisms one proceeds similarly. I am not going to present the details here, though they will be needed in any systematic treatment of the model theory of étale sites. The point to be made now is that if $\{X_i : i \in I\}$ is a (sorted) family of varieties
over algebraically closed fields, then not only is the ultraproduct of the family well-defined, but so is the ultraproduct of their étale sites, and it is naturally the étale site of the ultraproduct. This understood, one can pass to ultraproducts of presheaves and sheaves on these sites. Two points seem most important to me:
1. The elementary nature of the sheafification functor in this situation (though presumably not in general).
2. The operation of extension by zero is in this situation elementary, allowing one to get an elementary treatment of cohomology with compact support [24].
The former is of course used systematically in defining the abelian structure of the category of étale sheaves on a scheme, and we will exploit this to obtain an elementary version of that abelian structure for the étale site over an affine or projective variety. Having this, one can then use the general results (e.g. from [6]) on the elementary nature of the derived category of an abelian category to get elementary versions of étale cohomology. In particular, one will get preservation theorems for the étale cohomology groups, and uniform bounds on dimensions.

3.4. Pseudofinite coefficients and standard part for l-adic cohomology. The ultraproduct of coefficients is what has so far led to the most interesting contribution of model theory to this area. Thus in [29] one goes from finite coefficients to pseudofinite coefficients, and in particular gets a Weil theory out of non-Weil theories. The authors of [7] devote some time to this, and to the equally interesting issue of standard part maps for the l-adic cohomology theories. Let me summarize their very nice results. Their objective is to understand standard l-adic étale cohomology (which is not a derived functor) by means of specialization from cohomology which is a derived functor. The procedure is to go from an abelian category A successively to:
1. Proj(A), the abelian category of projective systems;
2. shift;
3. null systems, l-adic systems;
4. Artin-Rees systems, and Artin-Rees l-adic systems;
5. the Mittag-Leffler property.
Then one considers the effect of ultraproduct (or, as [6, 7] say, enlargement). The sorted meaning of projective system is clear, and it is preserved by ultraproduct. However, [7] wish to go further, by looking at nonstandard aspects of projective systems, i.e. systems over the ultraproduct of the integers. The meaning of this is quite clear. In particular, the elements of an ultraproduct of A are not merely abelian groups but even modules over an ultraproduct of $\mathbb{Z}$. The main game to be played is to operate modulo $l^h$ in l-adic systems, for h nonstandard; [7, 5.14] gives the existence of faithful functors, reflecting
isomorphisms, both to $A/l^hA$ and to Artin-Rees systems over the ultraproduct. This yields, first, an interpretation of the l-adic groups of the classical theory in terms of the derived category interpretation for a quotient by a nonstandard power. In particular, using the deep results of Gabber [12], it even provides a canonical isomorphism for almost all l (where “almost all” depends only on the family from which our varieties are taken). This gives us an interpretation of standard l-adic étale cohomology in terms of a standard part of a derived functor cohomology for nonstandard coefficients, a satisfying result. The authors of [7] enunciate a general metamathematical principle (3.6), and verify it for most of the basic ingredients of Weil Cohomology. En route they make a number of fundamentally important remarks, essentially about the elementary nature of various homological constructions, including the cup product. Note, however, that their discussion of the cycle map is less detailed than the one I gave in [23]. It is done in the naive enlargement setting, without any attempt to sort out the intelligible cycles. Let us summarize what is now known. One can readily define ultraproducts of categories and functors, and then of abelian (or merely additive) categories. One does not have a general notion (beyond the general nonsense one) of ultraproduct of sites, sheaves, etc. However, for a large class of schemes of interest, one has natural sortings, so that one can define both the ultraproduct of the schemes and the ultraproduct of the étale sites, and establish the natural commutativity. One may also define triangulated and derived categories, and their ultraproducts. One can show that étale cohomology, in the kind of sorting appropriate to affine or projective varieties, commutes with ultraproduct (also in the compact support situation). With more work, one may obtain l-adic étale cohomology by a “standard part” map from a derived functor cohomology of ultraproducts.
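The projective systems in this section can be illustrated, very modestly, by the system $(\mathbb{Z}/l^n\mathbb{Z})_{n \ge 1}$ with reduction maps as transitions. The hypothetical helpers below compute the residues of $1/a$ modulo $l^n$ for a prime to l and check compatibility with the transition maps; a compatible family is a finite truncation of an element of $\mathbb{Z}_l$. Working modulo $l^h$ for nonstandard h, as in [7], is of course beyond any such finite computation; this is only a sketch of the standard side.

```python
def residues_of_inverse(a, l, depth):
    """Residues of 1/a modulo l**n for n = 1..depth (a prime to l); list index 0 is n = 1."""
    return [pow(a, -1, l**n) for n in range(1, depth + 1)]


def is_compatible(xs, l):
    """Check that each residue mod l^{n+1} reduces to the previous one mod l^n."""
    return all(xs[n + 1] % l**(n + 1) == xs[n] for n in range(len(xs) - 1))
```

With a = 1 - l the residues are the partial sums $1 + l + \cdots + l^{n-1}$ of a geometric series; for l = 5 the first two are 1 and 6.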
More work is needed to extract uniformities from these results, and to look at uniformities across l too. In particular, if one assumes the Standard Conjectures, even only for the l-adic étale cohomologies, what uniformities are there beyond the familiar “independence of l” conjectures on the characteristic polynomials?

3.5. Final observations about sorting on the étale site. I conclude this paper by sketching two results. Neither is deep, and each is subject to slightly unpleasant restrictions, but they are probably essential for any useful model theory of étale cohomology. They concern mainly the elementary nature of the notion “sheaf associated to a presheaf”. Once one has that, one will be able to get, in an elementary formulation, the abelian structure for the category of sheaves of abelian groups on the étale sites. Moreover, one will then have the elementary model theory of constant sheaves, constructible sheaves,
etc. The elementary formalism for the compact support situation is covered too. So what is involved? Suppose F is a presheaf of abelian groups for the étale site on a scheme X of suitable finite type. We want to define the associated sheaf G in a uniform model-theoretic way. Let U be an object of the étale site over X, that is, an étale morphism U → X. We have to define G(U) first of all. For this one has to refer to all finite coverings of U for the étale topology, and one will get a useful model theory only if one pays attention to the sorting of these coverings. For each covering, and each compatible system of sections for that covering, one considers the elementary question whether there is exactly one element of F(U) inducing that system. Of course, the complexity of the elementary question depends (inter alia) on the size of the covering. The intuitive way (and the best for our purposes) is, as Mumford [26] says, first to identify things which have the same restrictions, and then add in all things which can be added together. Thus, the first step is to collapse F(U) modulo the equivalence relation of agreeing on some cover. This relation is obviously a countable union of elementary conditions. Note that by this collapse we obtain another presheaf, a little more sheaf-like than F, in that it passes the uniqueness part of the sheaf test, but not necessarily the existence part. Let us note too that this part of the construction commutes with ultraproducts, under the necessary sorting constraints. Now to the second part; we may as well assume that we have passed the first part of the sheaf test. So now we define G(U) as the group of consistent systems of sections over coverings, again modulo the equivalence relation of agreeing on a common refinement. This relation is clearly a countable union of elementary conditions.
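As a toy analogue only, consider finite open covers of a finite set, used here as a hypothetical stand-in for the étale site, with intersections playing the role of fibre products. The common refinement of two finite covers is then given uniformly by the pairwise intersections, and its size depends only on the sizes of the two covers. The function names are illustrative; empty pieces are kept so that covers of sizes m and n always yield one of size mn.

```python
def common_refinement(C, D):
    """All intersections U & V with U in C, V in D (size len(C) * len(D))."""
    return [U & V for U in C for V in D]


def refines(R, C):
    """Every member of R lies inside some member of C."""
    return all(any(W <= U for U in C) for W in R)


def covers(R, X):
    """The members of R have union X."""
    return frozenset().union(*R) == X
```

For X = {a, b, c} with covers {ab, bc} and {a, bc}, the refinement has four members and refines both covers.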
Note the very convenient point that the common refinement involved is elementary (without this there could be serious anxiety about preservation under ultraproduct). Milne [24] gives several ways of defining the sheaf associated to a presheaf. One, in Aside 7.19, is particularly convenient in our setting. To a presheaf F on a site one assigns another presheaf $F^+$, as a limit over coverings, and then shows that $F^{++}$ is the sheaf associated to F. The formula involved, for a presheaf P, is

$$P^+(U) = \varinjlim \operatorname{Ker}\Big(\prod_i P(U_i) \rightrightarrows \prod_{i,j} P(U_i \times_X U_j)\Big),$$

the direct limit being taken over coverings $(U_i \to U)$ of U.
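The kernel in this formula, and the two halves of the sheaf test discussed above, can be made concrete for a single finite cover of a finite discrete space, again used only as a toy stand-in for the étale site, with intersections in place of fibre products. In the sketch below (all names hypothetical), a section over U is a function on the points of U; `compatible` is the set of families in $\prod_i P(U_i)$ agreeing on overlaps; uniqueness asks that distinct global sections have distinct restrictions; existence asks that every compatible family comes from a global section.

```python
from itertools import product


def restrict(section, V):
    """Restriction P(U) -> P(V) for V inside U; a section is a tuple of (point, value) pairs."""
    return tuple((x, v) for (x, v) in section if x in V)


def all_functions(U, values=(0, 1)):
    """Presheaf of all functions U -> values (this one is a sheaf)."""
    pts = sorted(U)
    return [tuple(zip(pts, vs)) for vs in product(values, repeat=len(pts))]


def constant_functions(U, values=(0, 1)):
    """Presheaf of constant functions U -> values (not a sheaf)."""
    if not U:
        return [()]  # exactly one section over the empty set
    return [tuple((x, v) for x in sorted(U)) for v in values]


def sheaf_test(sections, U, cover):
    """Return (uniqueness, existence) for the sheaf condition on one finite cover of U."""
    n = len(cover)
    # Kernel of prod_i P(U_i) => prod_{i,j} P(U_i & U_j): families agreeing on overlaps.
    compatible = {
        fam
        for fam in product(*(sections(Ui) for Ui in cover))
        if all(
            restrict(fam[i], cover[i] & cover[j]) == restrict(fam[j], cover[i] & cover[j])
            for i in range(n)
            for j in range(n)
        )
    }
    # Image of P(U) in prod_i P(U_i).
    images = [tuple(restrict(s, Ui) for Ui in cover) for s in sections(U)]
    uniqueness = len(set(images)) == len(images)
    existence = compatible <= set(images)
    return uniqueness, existence
```

On the cover of a two-point discrete space by its points, the presheaf of constant functions passes the uniqueness part but fails existence, whereas the presheaf of all functions passes both.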

In what model theory is the above construction definable? Certainly one has to work in a sorted category with sorted coverings, i.e. a sorted site (like the étale sites we have considered). But there are further, and more interesting, constraints. The value $F^+(U)$ is the limit, in the category of abelian groups, of a family of equalizers indexed by coverings of U. Now, the coverings are sorted, and one first looks at the equalizers arising from coverings of a given sort. These are simply first-order definable, uniformly across a given sort. In
effect one has a definable function of two variables, U and the covering C. By a remark made earlier, in the étale setting any two sorted coverings have a common refinement, uniformly definable from them, and of a sort depending only on the sorts of the original coverings. Thus $F^+(U)$ is a sorted limit of abelian groups, much as the Chow ring is a sorted limit of fragments of groups. This now leads to the notion (which I leave to others to formulate precisely) of a sorted presheaf of abelian groups on a sorted site. So far, we are interpreting Milne's construction as leading, in an elementary way, from presheaves to sorted presheaves. But in fact it goes in an elementary way from sorted presheaves to sorted presheaves. This is crucial for the model theory, as we want the notion of associated sheaf to be first-order, and so preserved under ultraproducts, at least in the geometric situations which presently concern us. So suppose F is a sorted presheaf, and look at Milne's formula for $F^+(U)$. The individual terms in the limit involve coverings $U_i$ ($i \in I$) and the values $F(U_i)$. Since F itself is to be a sorted presheaf, $F(U_i)$ will be defined as a sorted group, where the sorts involved in the definition depend only on the sort of $U_i$, and similarly for the sorting of the group operation. Now notice that in Milne's formula

$$P^+(U) = \varinjlim \operatorname{Ker}\Big(\prod_i P(U_i) \rightrightarrows \prod_{i,j} P(U_i \times_X U_j)\Big),$$

in the special case of the étale site the covers are finite, and it is readily checked that $F^+(U)$ is given as a sorted presheaf. In a systematic treatment of this kind of model theory, full details will have to be given, with a definition of sorted presheaf that is as general as possible. But the basic idea is clear.

3.6. Beke's work on ultraproducts. Beke's paper [1] introduces new considerations into this kind of model theory. He deals with the cohomology of groups and the standard complex, where the length of representations of cocycles is both interesting and a source of complications. He confronts the natural problem of isoperimetric inequalities on the cohomology. This is extremely suggestive, in view of the nonstandard analysis of groups initiated by van den Dries and Wilkie [31]. The sorting of the cohomology by length is also relevant. One can expect much more model-theoretic work in this direction. Jardine's work [19] is also relevant.

REFERENCES

[1] T. Beke, Isoperimetric inequalities and the Friedlander-Milnor conjecture, Preprint, Michigan, 2002.
[2] J. L. Bell, Toposes and local set theories, an introduction, Oxford Logic Guides, vol. 14, O.U.P., 1988.
[3] J. L. Bell and A. B. Slomson, Models and ultraproducts: An introduction, North-Holland, 1971.
[4] A. Berarducci and M. Otero, o-minimal fundamental group, homology and manifolds, Journal of the London Mathematical Society. Second Series, vol. 65 (2002), no. 2, pp. 257–270.
[5] E. Bishop and D. Bridges, Constructive analysis, Grundlehren der Mathematischen Wissenschaften, vol. 279, Springer-Verlag, Berlin, 1985.
[6] L. Brünjes and C. Serpé, Enlargements of categories, Preprint Series SFB 478, Number 286, Münster, 2003.
[7] L. Brünjes and C. Serpé, Nonstandard étale cohomology, Preprint Series SFB 478, Number 288, Münster, 2003.
[8] Z. Chatzidakis, Model theory of profinite groups having the Iwasawa property, Illinois Journal of Mathematics, vol. 42 (1998), no. 1, pp. 70–96.
[9] G. Cherlin, L. van den Dries, and A. Macintyre, Decidability and undecidability theorems for PAC-fields, American Mathematical Society. Bulletin. New Series, vol. 4 (1981), no. 1, pp. 101–104.
[10] M. Demazure and P. Gabriel, Groupes algébriques, North Holland, 1970.
[11] W. Fulton, Intersection theory, Springer, 1988.
[12] O. Gabber, Sur la torsion dans la cohomologie l-adique d'une variété, Comptes Rendus de l'Académie des Sciences. Série I. Mathématique, vol. 297 (1983), no. 3, pp. 179–182.
[13] S. Gelfand and Y. Manin, Homological algebra, Springer, 1999.
[14] W. Henson and L. C. Moore, Nonstandard analysis and the theory of Banach spaces, Nonstandard analysis: recent developments, Springer Lecture Notes in Mathematics, vol. 983, Springer, 1983, pp. 27–112.
[15] E. Hrushovski, Personal communication.
[16] E. Hrushovski, The Manin-Mumford conjecture and the model theory of difference fields, Annals of Pure and Applied Logic, vol. 112 (2001), no. 1, pp. 43–115.
[17] E. Hrushovski, Geometric model theory, Proceedings of the International Congress of Mathematicians, vol. I (Berlin, 1998), Doc. Math. 1998, Extra Vol. I, pp. 281–302.
[18] U. Jannsen, Motives, numerical equivalence, and semi-simplicity, Inventiones Mathematicae, vol. 107 (1992), no. 3, pp. 447–452.
[19] J. F. Jardine, Ultraproducts and the discrete cohomology of algebraic groups, Algebraic K-theory, Fields Institute Communications, vol. 16, A.M.S., 1997, pp. 111–129.
[20] P. Johnstone, Topos theory, Academic Press, 1977.
[21] S. L. Kleiman, The standard conjectures, Motives (U. Jannsen et al., editors), Proceedings of Symposia in Pure Mathematics, vol. 55, American Mathematical Society, 1994, pp. 3–19.
[22] G. Kreisel and A. Macintyre, Constructive logic versus algebraization 1, The L. E. J. Brouwer centenary symposium (Noordwijkerhout, 1981), Stud. Logic Found. Math., vol. 110, North-Holland, Amsterdam, 1982, pp. 217–260.
[23] A. Macintyre, Weil cohomology and model theory, Connections between model theory and algebraic and analytic geometry, Quad. Mat., vol. 6, Aracne, Rome, 2000, pp. 179–199.
[24] J. S. Milne, Lectures on étale cohomology, Michigan, 1998, available at www.math.lsa.umich.edu/~jmilne.
[25] J. S. Milne, Polarizations and Grothendieck's standard conjectures, Annals of Mathematics, vol. 155 (2002), pp. 599–610.
[26] D. Mumford, The red book of varieties and schemes, Springer Lecture Notes in Mathematics, vol. 1358, Springer, 1988.
[27] A. Robinson and P. Roquette, On the finiteness theorem of Siegel and Mahler concerning Diophantine equations, Journal of Number Theory, vol. 7 (1975), pp. 121–176.
[28] H. Schoutens, Uniform bounds in algebraic geometry and commutative algebra, Connections between model theory and algebraic and analytic geometry, Quad. Mat., vol. 6, Aracne, Rome, 2000, pp. 44–93.


[29] I. Tomašić, A new Weil cohomology theory, preprint, Leeds, 2003.
[30] L. van den Dries and K. Schmidt, Bounds in the theory of polynomial ideals over fields. A nonstandard approach, Inventiones Mathematicae, vol. 76 (1984), pp. 77–91.
[31] L. van den Dries and A. J. Wilkie, Gromov's theorem on groups of polynomial growth and elementary logic, Journal of Algebra, vol. 89 (1984), pp. 349–374.

SCHOOL OF MATHEMATICS AND STATISTICS
QUEEN MARY, UNIVERSITY OF LONDON
MILE END RD.
LONDON E1 4NS, ENGLAND

E-mail: [email protected]

APPLIED MATHEMATICS

LOEB SPACE METHODS FOR STOCHASTIC NAVIER-STOKES EQUATIONS

NIGEL J. CUTLAND

Abstract. We survey the use of Loeb space methods in the study of the stochastic Navier-Stokes equations, with particular emphasis on recent results concerning the existence of attractors.

Introduction. The Navier-Stokes system of equations is a pair of partial differential equations describing the flow of an incompressible fluid in a region of space, in practice a subdomain of $\mathbb{R}^2$ or $\mathbb{R}^3$. These equations have a very long history, and even now there are fundamental open problems concerning the basic issues of existence and uniqueness of solutions; these figure in the list of Millennium Prize Problems, with a one-million-dollar reward for the solution of any one of them [15].

The setup we consider here is that of a bounded domain $D \subset \mathbb{R}^d$ (mainly $d = 2, 3$) in which there is a fluid in motion, with the velocity of the fluid at each point $x \in D$ given by a function $u(x) \in \mathbb{R}^d$.

Nonstandard techniques have been useful in the study of the Navier-Stokes equations in two main ways. First, the classical formulation of the equations casts them in a Hilbert space setting, where they may be viewed as a single infinite-dimensional equation describing the time evolution of the entire function $u = u(\cdot)$. Here the natural nonstandard approach is to work with an infinite but hyperfinite-dimensional equation that has the Navier-Stokes equations as its standard part in some sense, and then to use the transfer of the theory of (finite-dimensional) ODEs. Second, there is considerable interest in studying stochastic versions of the Navier-Stokes equations, and here one can draw on the nonstandard theory of stochastic differential equations (SDEs) initiated by Keisler [25]. Although Keisler's theory deals with finite-dimensional equations, it extends naturally to infinite-dimensional equations, especially when these are tackled by means of the hyperfinite-dimensional representation indicated above. Loeb spaces play an important role, since it appears that the underlying probability spaces needed to construct solutions to the stochastic Navier-Stokes equations must have a certain richness, which we know is guaranteed by a Loeb space.
Nonstandard Methods and Applications in Mathematics
Edited by N. J. Cutland, M. Di Nasso, and D. A. Ross
Lecture Notes in Logic, 25
© 2006, Association for Symbolic Logic


In this survey our aim is to provide for non-specialists (in fluid dynamics) an account of some of the applications of Loeb space methods to the Navier-Stokes equations, with particular emphasis on the stochastic version of these equations. The main topics we discuss are the existence problem, statistical solutions and, finally, some recent results on attractors for the stochastic Navier-Stokes equations. We assume a good working knowledge of nonstandard analysis, and in particular of Loeb measure and integration theory.

§1. The stochastic Navier-Stokes equations. A general version of the stochastic Navier-Stokes (sNS) equations in a bounded domain $D \subset \mathbb{R}^d$ (mainly $d = 2, 3$) takes the form

$$
\begin{aligned}
du &= \big[\nu\Delta u - \langle u, \nabla\rangle u + f(t,u) - \nabla p\big]\,dt + g(t,u)\,dw_t,\\
\operatorname{div} u &= 0.
\end{aligned}
\tag{1}
$$

Here $u(t, x, \omega)$ is the (random) velocity of an incompressible viscous fluid at the location $x \in D$ at time $t$, so that we have $u : [0,\infty) \times D \times \Omega \to \mathbb{R}^d$, where $\Omega$ is the domain of an underlying probability space. The operators $\Delta$ and $\nabla$ refer to the space variables $(x_i)_{1 \le i \le d}$.¹ In this paper the initial condition $u(0,\cdot,\cdot) = u_0$ is prescribed (and may be random); the boundary condition is either $u(t, x, \omega) = 0$ for $x \in \partial D$ or, occasionally when $d = 2$, periodic in the space variables.

The first term in the equation models the effect of the viscosity $\nu$, and the second models the interaction of the fluid particles. The function $f$ represents the effect of external forces on the flow of the fluid, while $p$ is the pressure. The final term introduces a general form of additional noisy external forces driven by a Wiener process $w$. In both $f$ and $g$ the dependence on $u$ is on the whole velocity field $u(t, \cdot, \omega)$. The second equation is the incompressibility condition.

1.1. Mathematical formulation. The usual (classical) Hilbert space setting for the precise mathematical formulation of the equations (1) is as follows.
The idea is to think of the time evolution of the entire velocity field $u : D \to \mathbb{R}^d$; that is, we are dealing with the time evolution of a function, so it is important to specify the space in which this function should live. In fact there are a number of possible spaces, all subspaces of $L^2(D, \mathbb{R}^d)$, and much of the theory of the Navier-Stokes equations is concerned with the question of where (i.e. in which function spaces) solutions exist. Many of the spaces of functions that arise are Hilbert spaces. The most fundamental of these, denoted by $H$, is the closure of the set $\{u \in C_0^\infty(D, \mathbb{R}^d) : \operatorname{div} u = 0\}$ in the $L^2$ norm $|u| = (u, u)^{1/2}$, where

$$(u, v) = \sum_{j=1}^{d} \int_D u_j(x)\, v_j(x)\, dx.$$

¹ For those not too familiar with this subject matter, the meaning of $\langle u, \nabla\rangle u$ is as follows: for each fixed $(t, \omega)$ we have $u(t, \cdot, \omega) : D \to \mathbb{R}^d$, and for any $u(t, \cdot, \omega) = y : D \to \mathbb{R}^d$ we have $\langle y, \nabla\rangle y : D \to \mathbb{R}^d$ given by

$$\langle y, \nabla\rangle y(x) = \sum_{i=1}^{d} y_i(x)\, \frac{\partial y}{\partial x_i}(x) \qquad \text{for } x = (x_i)_{1 \le i \le d} \in D.$$

Note that the norm of $H$ does not involve any derivatives. A smaller space that requires its members to have more regularity (involving first derivatives) is the space $V$, which is the closure of $\{u \in C_0^\infty(D, \mathbb{R}^d) : \operatorname{div} u = 0\}$ in the stronger norm $|u| + \|u\|$, where $\|u\| = ((u, u))^{1/2}$ and

$$((u, v)) = \sum_{j=1}^{d} \left( \frac{\partial u}{\partial x_j}, \frac{\partial v}{\partial x_j} \right).$$

$H$ and $V$ are Hilbert spaces with scalar products $(\cdot,\cdot)$ and $((\cdot,\cdot))$ respectively, and $|\cdot| \le c\|\cdot\|$ for some constant $c$. A weak solution to the Navier-Stokes equations is one where the velocity field $u$ belongs to $H$ for all time, and a strong solution lives in $V$ for all time. (See Definition 1.1 below for a more precise statement.) By $A$ we denote the self-adjoint extension of the projection of $-\Delta$ in $H$; $A$ has an orthonormal basis $\{e_k\}$ of eigenfunctions with corresponding eigenvalues $\lambda_k$, where $\lambda_k > 0$ and $\lambda_k \uparrow \infty$. For $u \in H$ we write $u_k = (u, e_k)$, and we write $\mathrm{Pr}_m$ for the projection of $H$ onto the subspace $H_m$ spanned by $\{e_1, \dots, e_m\}$. Since each $e_k \in V$, we have $H_m \subset V$. The trilinear form $b$ defined by

$$b(u, v, w) = \sum_{i,j=1}^{d} \int_D u_j(x)\, \frac{\partial v_i}{\partial x_j}(x)\, w_i(x)\, dx = (\langle u, \nabla\rangle v, w)$$

(whenever the integrals make sense) has the well-known and crucial property $b(u, v, w) = -b(u, w, v)$, so that $b(u, v, v) = 0$. In this framework, the stochastic Navier-Stokes equations (1) may be formulated as a stochastic differential equation in $H$ as follows:

$$du = \big[-\nu A u - B(u) + f(t,u)\big]\,dt + g(t,u)\,dw_t, \tag{2}$$

where $B(u) = b(u, u, \cdot)$.² This is initially regarded as an equation in $V'$ (the dual of $V$), although it turns out that the solution lives in $H$ (and in fact in $V$ for almost all times). Compared to (1), note that the pressure has disappeared, because $\nabla p = 0$ in $V'$ (using $\operatorname{div} v = 0$ for $v \in V$ and an integration by parts).

² That is, $B(u) \in V'$ is given by $B(u)(v) = b(u, u, v) \in \mathbb{R}$ for any $v \in V$.
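The spectral description of $A$ can be made concrete in a scalar, one-dimensional analogue. This is an illustrative assumption only: the Stokes operator itself acts on divergence-free vector fields. On $D = (0, \pi)$ with Dirichlet conditions, $-d^2/dx^2$ has orthonormal eigenfunctions $e_k(x) = \sqrt{2/\pi}\,\sin kx$ and eigenvalues $\lambda_k = k^2$, so that $|u|^2 = \sum_k u_k^2$, $\|u\|^2 = \sum_k \lambda_k u_k^2$, and $|\cdot| \le c\|\cdot\|$ holds with $c = \lambda_1^{-1/2}$. The sketch below checks this for the hypothetical test function $u(x) = x(\pi - x)$, whose coefficients $u_k = \sqrt{2/\pi}\,4/k^3$ for odd $k$ (and $0$ for even $k$) follow from two integrations by parts.

```python
import math

def coefficient(k: int) -> float:
    """u_k = (u, e_k) for u(x) = x(pi - x) and e_k(x) = sqrt(2/pi) sin(kx)."""
    if k % 2 == 0:
        return 0.0
    return math.sqrt(2.0 / math.pi) * 4.0 / k**3

def truncated_norms(m: int) -> tuple[float, float]:
    """Return (|Pr_m u|^2, ||Pr_m u||^2) from the first m modes, with lambda_k = k^2."""
    h_norm_sq = sum(coefficient(k) ** 2 for k in range(1, m + 1))
    v_norm_sq = sum(k**2 * coefficient(k) ** 2 for k in range(1, m + 1))
    return h_norm_sq, v_norm_sq

exact_h = math.pi**5 / 30   # |u|^2  = integral of u(x)^2 over (0, pi)
exact_v = math.pi**3 / 3    # ||u||^2 = integral of u'(x)^2 over (0, pi)
lambda_1 = 1.0

h_sq, v_sq = truncated_norms(199)
print(f"|Pr_m u|^2   = {h_sq:.8f}  exact {exact_h:.8f}")
print(f"||Pr_m u||^2 = {v_sq:.8f}  exact {exact_v:.8f}")
# Poincare-type inequality |u| <= c ||u|| with c = lambda_1 ** -0.5
print(math.sqrt(exact_h) <= math.sqrt(exact_v) / math.sqrt(lambda_1))
```

The partial sums increase to the exact norms as $m$ grows, which is the scalar counterpart of $|\mathrm{Pr}_m u| \uparrow |u|$; the Poincaré constant is nearly sharp for this particular $u$ because its energy sits almost entirely in the first mode.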


Note also that the second equation $\operatorname{div} u = 0$ in (1) is automatically fulfilled through the condition imposed in the definition of $H$. The equation (2) is really an integral equation, with the first integral being the Bochner integral and the second an extension of the Itô integral to Hilbert spaces, due to Ichikawa [24]. The noise is given by a Wiener process $w : [0,\infty) \times \Omega \to H$ with trace class covariance, and so the noise coefficient $g$ takes values in $L(H, H)$. It is assumed that $g : [0,\infty) \times V \to L(H, H)$, while $f : [0,\infty) \times V \to V'$. (The restriction to $V$ in the domains is sufficient because we will have the solution in $V$ for almost all times.)

1.2. Definition of solutions to the stochastic Navier-Stokes equations. The following makes precise what is meant by a solution to the stochastic Navier-Stokes equations as formulated above. In fact there is a range of solution concepts of varying strength, each of which is appropriate in certain circumstances.

Definition 1.1. Suppose that $u_0 \in H$ and $f, g$ as above are given, together with a probability space $\Omega$ carrying an $H$-valued Wiener process $w$. A weak solution of the stochastic Navier-Stokes equations is a stochastic process $u : [0,\infty) \times \Omega \to H$ such that for a.a. $\omega$

(i) $u \in L^2(0,T; V) \cap L^\infty(0,T; H) \cap C(0,T; H_{\mathrm{weak}})$ for all $T < \infty$;

(ii) for all $t \ge 0$

$$u(t) = u_0 + \int_0^t \big[-\nu A u(s) - B(u(s)) + f(s, u(s))\big]\,ds + \int_0^t g(s, u(s))\,dw_s. \tag{3}$$

A strong solution³ has in addition that for a.a. $\omega$

$$\sup_{t \le T} \|u(t)\|^2 + \int_0^T |A u(t)|^2\,dt < \infty$$

for all $T$. In the above, $H_{\mathrm{weak}}$ means $H$ with the weak topology. The notion of solution for the deterministic case is given by taking $g = 0$ and removing the random parameter $\omega$ throughout, so a weak solution is a single function $u \in L^2(0,T; V) \cap L^\infty(0,T; H) \cap C(0,T; H_{\mathrm{weak}})$ for all $T$.

³ Sometimes the stronger property $E\big(\sup_{t \le T}\|u(t)\|^2 + \int_0^T |Au(t)|^2\,dt\big) < \infty$ for all $T$ is specified for a strong solution; we prefer to call this strictly strong.


The classical approach to the solution of the Navier-Stokes equations (deterministic or stochastic) is to begin with an approximate version in the finite-dimensional space $H_n$ for each $n$, the so-called Galerkin approximation, which can be solved easily using standard techniques from ODEs (or SDEs in the stochastic case) to give Galerkin approximate solutions $u^n(t)$. The hard part is then to find some way to pass to the limit to obtain a solution to the Navier-Stokes equations. First, some specialized compactness theorems are required to show that there is a subsequence of $(u^n(t))_{n \in \mathbb{N}}$ that converges in an appropriate sense to a limit $u(t)$, say. Then it is necessary to show that this limit $u(t)$ actually is a solution. The difficulties are compounded in the case of the stochastic equations, especially in dimension $d \ge 3$, because it seems necessary to work with a probability space that is bigger than the Wiener space on which each of the Galerkin approximations can be solved. In the standard approaches some ad hoc methods are employed to enlarge the probability space sufficiently, whereas in the approach described below, by taking a Loeb space at the outset we know (by the universal property of Loeb spaces) that this should be sufficient.

The stochastic Navier-Stokes equation (2) was first discussed by Bensoussan and Temam [5], who were able to show existence of solutions for $d \le 3$, but only with additive noise, that is, with $g$ = identity. It was first solved in full generality (for $d \le 4$ and placing only natural growth conditions on $f, g$) in [7], using Loeb space methods. The difficulties with the more general multiplicative noise in (2) are connected with the (possible) nonuniqueness of solutions and the need for a large probability space to support the randomness. Further complications derive from the infinite-dimensional nature of the equations.
The essence of the approach in [7] is to overcome these difficulties by using a hyperfinite-dimensional space to represent the infinite dimensionality, and a canonical Loeb space to provide the probabilistic richness. Since the publication of [7] a number of alternative (standard) proofs of existence for the stochastic equations (1) have appeared, building on earlier solutions to less general forms of the stochastic input.⁴ There is now considerable interest in more delicate issues such as the existence of a stochastic flow and of attractors for the sNS equations. The Loeb space methods developed in [7] have continued to prove powerful in this field, in combination with the well-developed techniques of "classical" infinite-dimensional stochastic analysis. The purpose of this paper is to survey what has been achieved, with particular emphasis on recent work with Marek Capiński and Jerry Keisler on attractors for the sNS equations in dimensions $d = 2, 3$. First, to set the scene, we remind the reader of the elementary nonstandard approach to the solution of the deterministic Navier-Stokes equations in [8], which forms the basis for all of the later developments that we discuss.

⁴ After the seminal paper of Bensoussan and Temam [5], later contributions to the additive noise case (that is, with $g$ = constant) were made by Viot [31] and Vishik and Fursikov [32]. For multiplicative noise, just prior to the appearance of [7] (though published later) Brzeźniak, Capiński and Flandoli [6] obtained solutions for $d = 2$ with a special form of multiplicative noise, and only for small initial conditions; around the same time Bensoussan [3] established general existence for $d = 2$.

§2. Solution of the deterministic Navier-Stokes equations. In this section we set $g = 0$ and assume that we are given an element $u_0 \in H$ (the initial condition) and a forcing term $f : [0,\infty) \times V \to V'$ that is continuous and has linear growth in the second variable. Fix an infinite hypernatural number $N$, let $U_0 = \mathrm{Pr}_N\,{}^*u_0 \in H_N$, and consider the following system of $N$ equations for the internal function $U : {}^*[0,\infty) \to H_N$. In vector form,

$$\dot U(\tau) = -\nu\,{}^*\!A\,U(\tau) - B_N(U(\tau)) + F(\tau, U(\tau)), \tag{4}$$

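with initial condition $U(0) = U_0$, where $F(\tau, V) = \mathrm{Pr}_N\,{}^*f(\tau, V)$ and $B_N(V) = \mathrm{Pr}_N\,{}^*b(V, V, \cdot)$. Putting $U(\tau) = \sum_{k=1}^{N} U_k(\tau) E_k \in H_N$ (where $E_k = {}^*e_k$ and $F_k = (F, E_k)$), the system (4) reads

$$\dot U_k(\tau) = -\nu\lambda_k U_k(\tau) - {}^*b(U(\tau), U(\tau), E_k) + F_k(\tau, U(\tau)) \tag{5}$$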

for $k = 1, \dots, N$. This is simply the Galerkin approximation to the Navier-Stokes equations in dimension $N$. Transfer of standard results (the theory of ODEs in $\mathbb{R}^n$) shows immediately that there is a nonstandard solution.⁵ Elementary calculus on $H_N$ (which is, after all, isomorphic to ${}^*\mathbb{R}^N$) gives the following energy equation (using $(B(V), V) = 0$ and $(AV, V) = \|V\|^2$):

$$\frac{1}{2}\frac{d}{d\tau}|U(\tau)|^2 = -\nu\|U(\tau)\|^2 + \big(F(\tau, U(\tau)), U(\tau)\big). \tag{6}$$
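The cancellation $(B(V), V) = 0$ behind (6) is purely algebraic, and a minimal sketch can confirm it in a toy finite-dimensional system. Everything specific below is an illustrative assumption rather than the Navier-Stokes Galerkin system itself: the hypothetical coefficients `c[i][j][k]` stand in for $b(E_i, E_j, E_k)$ and are random numbers made antisymmetric in $(j, k)$ to mimic $b(u, v, w) = -b(u, w, v)$; we take $\lambda_k = k^2$, $\nu = 0.1$ and a constant force. The code evaluates the right-hand side of (5) at a random $U$ and checks that $(U, \dot U) = -\nu\|U\|^2 + (F, U)$ with $\|U\|^2 = \sum_k \lambda_k U_k^2$.

```python
import random

def antisymmetric_coefficients(n: int, seed: int = 0) -> list[list[list[float]]]:
    """Hypothetical c[i][j][k] ~ b(E_i, E_j, E_k), antisymmetric in (j, k)."""
    rng = random.Random(seed)
    c = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(j + 1, n):
                x = rng.uniform(-1.0, 1.0)
                c[i][j][k] = x
                c[i][k][j] = -x  # b(u, v, w) = -b(u, w, v); c[i][j][j] stays 0
    return c

def galerkin_rhs(u, c, lam, force, nu):
    """Toy version of (5): -nu*lam_k*U_k - b(U, U, E_k) + F_k, with b(U, U, E_k) = sum c[i][j][k] U_i U_j."""
    n = len(u)
    rhs = []
    for k in range(n):
        b_k = sum(c[i][j][k] * u[i] * u[j] for i in range(n) for j in range(n))
        rhs.append(-nu * lam[k] * u[k] - b_k + force[k])
    return rhs

n, nu = 6, 0.1
lam = [float((k + 1) ** 2) for k in range(n)]   # assumed eigenvalues lambda_k = k^2
force = [1.0] + [0.0] * (n - 1)                  # assumed constant force F
rng = random.Random(1)
u = [rng.gauss(0.0, 1.0) for _ in range(n)]
c = antisymmetric_coefficients(n)

rhs = galerkin_rhs(u, c, lam, force, nu)
lhs = sum(uk * dk for uk, dk in zip(u, rhs))     # (1/2) d/dtau |U|^2 = (U, dU/dtau)
v_norm_sq = sum(l * uk * uk for l, uk in zip(lam, u))
energy_rhs = -nu * v_norm_sq + sum(f * uk for f, uk in zip(force, u))
print(abs(lhs - energy_rhs) < 1e-10)
```

Because the identity holds at every point $U$ and for every dimension $n$, nothing in it depends on $n$ being finite in the standard sense, which is what allows (6) to be obtained on $H_N$ by transfer.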

⁴ (continued) Some three years later, alternative proofs of existence for the general equations (2) in higher dimensions began to appear, beginning with the papers of Capiński and Gątarek [14] and followed by Bensoussan [4]. The latter paper, curiously, has exactly the same title and appears in the same journal as [7], but makes no reference to the earlier results in [7], even though it is likely that, as an editor of the journal at the time, the author knew of their existence. Around this time Flandoli and Gątarek [20] proved existence of solutions to a number of different formulations of the general sNS equations (1) for $d \le 4$, as well as stationary solutions in each case.

⁵ This will be unique if $f$ satisfies a Lipschitz condition; alternatively, if it were needed, uniqueness of solution to the Galerkin equation on $H_N$ could be achieved by an infinitesimal modification of $F$ so that it is ${}^*$Lipschitz.


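Thus

$$\frac{d}{d\tau}|U(\tau)|^2 + 2\nu\|U(\tau)\|^2 \le 2\,|F(\tau, U(\tau))|_{V'}\,\|U(\tau)\| \le \nu\|U(\tau)\|^2 + \frac{1}{\nu}\,|F(\tau, U(\tau))|_{V'}^2, \tag{7}$$

where $|\cdot|_{V'}$ is the norm of the dual space $V'$ and the second inequality is Young's inequality $2ab \le \nu a^2 + b^2/\nu$. Integrating with respect to $\tau$ and using the growth condition on $f$ (and hence $F$) allows an application of Gronwall's Lemma to give

$$\sup_{\tau \in [0,T]} |U(\tau)|^2 + \int_0^T \|U(\tau)\|^2\,d\tau < \infty \tag{8}$$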

for finite $T$. For finite times $\tau$, this means that $|U(\tau)|$ is finite and so $U(\tau)$ is weakly nearstandard. Moreover, $U(\tau)$ is weakly S-continuous (since each component $U_k(\tau)$ with $k$ finite is S-continuous for finite $\tau$). This allows the definition of a weakly continuous standard function $u : [0,\infty) \to H$ as follows: $u(t) = {}^\circ U(t) = {}^\circ U(\tau)$ for any $\tau \approx t$, where ${}^\circ U$ denotes the weak standard part of $U$. In terms of coordinates, $u_k(t) = {}^\circ U_k(t)$ for all finite $k$; in particular $u(0) = u_0$. Now we have

Theorem 2.1. The function $u(t)$ defined above is a weak solution to the (deterministic) Navier-Stokes equations with

$$\sup_{s \le t} |u(s)|^2 + \int_0^t \|u(s)\|^2\,ds < \infty \tag{9}$$

for all $t$.

Proof (Sketch). The inequality (9) follows immediately from (8) using the inequalities $|{}^\circ V| \le {}^\circ|V|$, $\|{}^\circ V\| \le {}^\circ\|V\|$ and Loeb integration theory. Thus $u \in L^2(0,T; V) \cap L^\infty(0,T; H) \cap C(0,T; H_{\mathrm{weak}})$ as required. To see that it provides a solution, it is sufficient to show that

$$(u(t), v) = (u_0, v) + \int_0^t \big[-\nu((u(s), v)) - b(u(s), u(s), v) + (f(s, u(s)), v)\big]\,ds \tag{10}$$

for all $t$ and $v \in V$. For this it is enough to consider $v = e_k$ for $k = 1, 2, 3, \dots$;

that is,

$$u_k(t) = (u_0, e_k) + \int_0^t \big[-\nu\lambda_k u_k(s) - b(u(s), u(s), e_k) + f_k(s, u(s))\big]\,ds,$$

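where $f_k = (f, e_k)$. For this it is an elementary application of Loeb theory (after checking that all the relevant integrands are S-integrable) to show that, writing $L$ for the Loeb measure induced by Lebesgue measure on ${}^*[0,t]$,

$$\int_0^t \lambda_k U_k(\tau)\,d\tau \approx \int_0^t \lambda_k u_k({}^\circ\tau)\,dL(\tau) = \int_0^t \lambda_k u_k(s)\,ds = \int_0^t ((u(s), e_k))\,ds$$

and

$$\int_0^t {}^*b(U(\tau), U(\tau), E_k)\,d\tau \approx \int_0^t {}^\circ\,{}^*b(U(\tau), U(\tau), E_k)\,dL(\tau) = \int_0^t b(u({}^\circ\tau), u({}^\circ\tau), e_k)\,dL(\tau) = \int_0^t b(u(s), u(s), e_k)\,ds,$$

using the continuity properties of $b$; and finally

$$\int_0^t F_k(\tau, U(\tau))\,d\tau \approx \int_0^t {}^\circ F_k(\tau, U(\tau))\,dL(\tau) = \int_0^t f_k(s, u(s))\,ds,$$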

using the continuity property of $f$. Putting all this together, using (5), shows that $u$ is indeed a solution to the Navier-Stokes equations. □

Remark. It is shown in [11, Section 4.3] that all weak solutions to the Navier-Stokes equations can be obtained by this method if we allow an infinitesimal perturbation of the force term $F$ in (4).

2.1. Uniqueness. In the case $d = 2$, provided that $f$ is locally Lipschitz in $u$, it can be shown that the solution constructed above is unique; the technique is almost identical to the standard proof of uniqueness in dimension two. When $d = 3$, again assuming that $f$ is Lipschitz, the solution $U = u^N$ to the Galerkin approximation is unique, but note that this does not imply uniqueness of the solution to the Navier-Stokes equations (this would solve the Millennium problem). Possible sources of non-uniqueness (i.e. constructions that might give a different solution) are:
(a) make an infinitesimal perturbation of $U(0)$;
(b) make an infinitesimal perturbation of $F$;
(c) choose a different infinite value for $N$.
Any of these modifications in the construction might produce different solutions. The book [11, Section 4.3] contains a fuller discussion of the information about the uniqueness problem that can be gleaned from the nonstandard method of solution.
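Returning to the a priori estimate that drives the whole construction: integrating (7) gives $|U(\tau)|^2 + \nu\int_0^\tau \|U\|^2 \le |U(0)|^2 + \nu^{-1}\int_0^\tau |F|_{V'}^2$, which already yields (8) when $F$ is bounded on $[0,T]$ (the linear growth condition in the text is what calls for Gronwall's Lemma). A minimal numerical sketch, assuming a toy Galerkin-type system with hypothetical antisymmetric coefficients in place of $b$, $\lambda_k = k^2$, a constant force with $|F|_{V'}^2 = \sum_k F_k^2/\lambda_k$, and classical Runge-Kutta time stepping, checks both halves of the bound along a computed trajectory. All parameters are illustrative assumptions.

```python
import random

def make_coefficients(n, seed=0):
    """Hypothetical c[i][j][k] standing in for b(E_i, E_j, E_k), antisymmetric in (j, k)."""
    rng = random.Random(seed)
    c = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(j + 1, n):
                c[i][j][k] = rng.uniform(-1.0, 1.0)
                c[i][k][j] = -c[i][j][k]
    return c

def rhs(u, c, lam, force, nu):
    """Toy version of (5): dU_k/dtau = -nu lam_k U_k - b(U, U, E_k) + F_k."""
    n = len(u)
    return [-nu * lam[k] * u[k]
            - sum(c[i][j][k] * u[i] * u[j] for i in range(n) for j in range(n))
            + force[k]
            for k in range(n)]

def rk4_step(u, dt, *args):
    """One classical Runge-Kutta step for dU/dtau = rhs(U, *args)."""
    k1 = rhs(u, *args)
    k2 = rhs([x + 0.5 * dt * y for x, y in zip(u, k1)], *args)
    k3 = rhs([x + 0.5 * dt * y for x, y in zip(u, k2)], *args)
    k4 = rhs([x + dt * y for x, y in zip(u, k3)], *args)
    return [x + dt * (a1 + 2 * a2 + 2 * a3 + a4) / 6.0
            for x, a1, a2, a3, a4 in zip(u, k1, k2, k3, k4)]

n, nu, T, dt = 4, 0.5, 5.0, 0.005
lam = [float((k + 1) ** 2) for k in range(n)]   # assumed eigenvalues lambda_k = k^2
force = [1.0, 0.0, 0.0, 0.0]                     # assumed constant force F
u0 = [2.0, -1.0, 0.5, 1.0]                       # hypothetical initial condition U(0)
coeffs = make_coefficients(n)

def h_sq(v):  # |V|^2
    return sum(x * x for x in v)

def v_sq(v):  # ||V||^2 = sum_k lambda_k V_k^2
    return sum(l * x * x for l, x in zip(lam, v))

u, sup_h, integral_v = list(u0), h_sq(u0), 0.0
for _ in range(int(round(T / dt))):
    new = rk4_step(u, dt, coeffs, lam, force, nu)
    integral_v += 0.5 * dt * (v_sq(u) + v_sq(new))   # trapezoid rule for int ||U||^2
    u = new
    sup_h = max(sup_h, h_sq(u))

dual_sq = sum(f * f / l for f, l in zip(force, lam))  # |F|_{V'}^2
bound = h_sq(u0) + T * dual_sq / nu                     # integrated form of (7)
print(f"sup|U|^2 = {sup_h:.4f}, nu*int||U||^2 = {nu * integral_v:.4f}, bound = {bound:.4f}")
```

The bound does not involve the nonlinear coefficients at all, which is the finite-dimensional shadow of the fact that (8) holds uniformly in $N$.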


§3. Solution of the stochastic Navier-Stokes equations. Let us now return to the general stochastic Navier-Stokes equations

$$du = \big[-\nu Au - B(u) + f(t,u)\big]\,dt + g(t,u)\,dw_t \tag{11}$$

in the Hilbert space setting, for which we seek solutions in the sense of Definition 1.1. We impose appropriate growth and continuity conditions on the coefficients $f, g$. Attempts to solve these equations using the standard Galerkin approximation technique encounter not only the difficulties discussed at the beginning of Section 1.2 but new ones on account of the stochastic terms. It is straightforward to construct a sequence of solutions $u^n$ to the Galerkin approximations to equation (11), each of which is a stochastic process in $H_n$ carried by some probability space $\Omega_n$. The problems arise when looking for a convergent subsequence, and where to find it. It turns out⁶ to be necessary to construct a new, richer probability space $\Omega$ to accommodate a process $u$ that is, in the appropriate sense, a limit of a subsequence of the processes $u^n$ and their spaces $\Omega_n$. Then it is necessary to show that $u$ is a solution to (11) for some Wiener process $w$ on $\Omega$.

This difficult procedure is circumvented by the use of a Loeb space $\Omega$, which can be given (or constructed) in advance, carrying a prescribed Wiener process $w$ that is the standard part ${}^\circ W$ of an internal ${}^*$Wiener process $W$ on $H_N$. The richness of the Loeb space means that all constructions can be carried out without leaving this space. The solution lives here and uses the prescribed Wiener process $w$. In this sense the Loeb space solutions are stronger than those that were subsequently constructed by standard limiting arguments.

The pattern of the solution is the same as for the deterministic case: first solve the Galerkin approximation on $H_N$, and then take standard parts, after showing that the internal solution $U_N$ is nearstandard in an appropriate sense. The solution then lives on the same space $\Omega$ and is driven by the Wiener process that was given in advance. There are (whatever the approach) a number of additional technicalities to take into account before the equations can be solved.
⁶ The first proof of existence of solutions to the general stochastic Navier-Stokes equations (11) was given in [7] using the Loeb space methods described here. Later it was discovered how to construct the limit of the Galerkin approximations using, among other techniques, the Skorohod embedding theorem to construct a richer space.

To make sense of the stochastic term (i.e. to be able to define the Ichikawa integral), the Wiener process $w$ must have covariance $Q$, a nuclear (or trace class) operator on $H$. The Galerkin approximations living on $H_n$ will then have a stochastic term driven by a Wiener process $w^n$ with $n \times n$ covariance matrix $Q_n = \mathrm{Pr}_n Q\, \mathrm{Pr}_n$. Moving to the internal space $H_N$ for infinite $N$, the Galerkin approximation to (11) is the following internal $N$-dimensional SDE for a stochastic process $U(\tau, \omega) \in H_N$:

$$dU(\tau) = \big[-\nu\,{}^*\!A U(\tau) - B(U(\tau)) + F(\tau, U(\tau))\big]\,d\tau + G(\tau, U(\tau))\,dW(\tau), \tag{12}$$

$$U(0) = \mathrm{Pr}_N\,{}^*u_0, \tag{13}$$

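where $F$, $G$ are given by $F(\tau, U) = \mathrm{Pr}_N\,{}^*f(\tau, U)$ and $G(\tau, U)V = \mathrm{Pr}_N\,{}^*g(\tau, U)V$. Let $\Omega_0 = (\Omega, \mathcal{A}, (\mathcal{A}_\tau)_{\tau \ge 0}, \Pi)$ be an internal filtered space carrying an internal Wiener process $W$ on $H_N$ with covariance $Q_N$.⁷ The growth and continuity conditions imposed on $f$ and $g$ ensure (using the transfer of the standard theory of SDEs) that (12) has an internal solution $U(\tau, \omega)$ for all $\tau \in {}^*(0,\infty)$ on $\Omega_0$, and $U$ is adapted to $(\mathcal{A}_\tau)_{\tau \ge 0}$. The transfer of Itô's formula gives

$$
\begin{aligned}
|U(\tau)|^2 + 2\nu\int_0^\tau \|U(\sigma)\|^2\,d\sigma
&= |U(0)|^2 + 2\int_0^\tau \big(F(\sigma, U(\sigma)), U(\sigma)\big)\,d\sigma\\
&\quad + \int_0^\tau \operatorname{tr}\big(G(\sigma, U(\sigma))\,Q_N\,G(\sigma, U(\sigma))^T\big)\,d\sigma\\
&\quad + 2\int_0^\tau \big(U(\sigma), G(\sigma, U(\sigma))\,dW(\sigma)\big),
\end{aligned}
\tag{14}
$$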

which corresponds to the equation (6). From this, a rather technical argument involving the Burkholder-Davis-Gundy inequalities is used to establish the following counterpart of (8):

$$E\Big(\sup_{\tau \le T}|U(\tau)|^2 + \int_0^T \|U(\tau)\|^2\,d\tau\Big) < \infty \tag{15}$$

for all finite $T$. The expectation here is with respect to the internal probability $\Pi$ on $\Omega_0$. From this we have, almost surely with respect to the corresponding Loeb probability,

$$\sup_{\tau \le T}|U(\tau)|^2 + \int_0^T \|U(\tau)\|^2\,d\tau < \infty \tag{16}$$

for all finite $T$. S-continuity properties of internal stochastic integrals show that, in addition, almost every path $U(\cdot, \omega)$ is weakly S-continuous for finite $\tau$.

⁷ For example, we may take the canonical process on the space of ${}^*$continuous paths in $H_N$.
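The role of the trace term $\operatorname{tr}(G Q_N G^T)$ in (14) can be seen numerically in the simplest possible case. The sketch below rests on simplifying assumptions that are not in the text: the nonlinear term is dropped (a linear, Stokes-type system), $N = 3$, $\lambda_k = k^2$, $G = I$ (additive noise), a diagonal trace-class covariance $Q_N = \operatorname{diag}(1/k^2)$, and a constant force. It simulates (12) with the Euler–Maruyama scheme and compares the Monte Carlo estimate of $E|U(T)|^2$ with the exact value $\sum_k (m_k(T)^2 + s_k(T)^2)$ for these independent Ornstein-Uhlenbeck modes, where the variances $s_k^2$ are generated entirely by the Itô trace term.

```python
import math
import random

NU, T = 0.1, 1.0
LAM = [1.0, 4.0, 9.0]          # assumed eigenvalues lambda_k = k^2, N = 3
Q = [1.0 / l for l in LAM]     # assumed trace-class covariance Q_N = diag(1/k^2)
FORCE = [0.5, 0.0, 0.0]        # assumed constant force F
U0 = [1.0, 0.5, 0.25]          # hypothetical initial condition

def simulate_energy(n_paths=2000, n_steps=100, seed=42):
    """Euler-Maruyama estimate of E|U(T)|^2 for dU = (-NU*A*U + F) dtau + dW (B dropped, G = I)."""
    rng = random.Random(seed)
    dt = T / n_steps
    noise_sd = [math.sqrt(q * dt) for q in Q]   # W_k increments have variance q_k * dt
    total = 0.0
    for _ in range(n_paths):
        u = list(U0)
        for _ in range(n_steps):
            for k in range(len(u)):
                u[k] += (-NU * LAM[k] * u[k] + FORCE[k]) * dt + rng.gauss(0.0, noise_sd[k])
        total += sum(x * x for x in u)
    return total / n_paths

def exact_energy():
    """E|U(T)|^2 = sum_k (mean_k^2 + var_k) for the independent linear modes."""
    total = 0.0
    for lam, q, f, x0 in zip(LAM, Q, FORCE, U0):
        a = NU * lam
        mean = f / a + (x0 - f / a) * math.exp(-a * T)
        var = q * (1.0 - math.exp(-2.0 * a * T)) / (2.0 * a)   # produced by the trace term
        total += mean * mean + var
    return total

estimate, exact = simulate_energy(), exact_energy()
print(f"Monte Carlo {estimate:.3f}  exact {exact:.3f}")
```

Dropping the variance terms from `exact_energy` would undershoot the simulated value by roughly $\sum_k s_k(T)^2 \approx 1.13$ here, which is the contribution the Itô correction in (14) accounts for.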


This allows the construction of a standard process $u(t, \omega)$ on the Loeb space as in the deterministic case: for a.a. $\omega$, $u(t, \omega) = {}^\circ U(t, \omega) = {}^\circ U(\tau, \omega)$ for any $\tau \approx t$.

Theorem 3.1. The process $u(t, \omega)$ is a solution to the stochastic Navier-Stokes equations (11).

Proof (Sketch). It is clear that this process has the property 1.1(i) required of a solution. To see that it satisfies the integral form (3) of equation (11), proceed just as in the deterministic case for the terms in the deterministic (Bochner) integral. For the stochastic term it is necessary to establish that for a.a. $\omega$

$${}^\circ\!\int_0^T G(\tau, U(\tau, \omega))\,dW(\tau, \omega) = \int_0^T g(t, u(t, \omega))\,dw_t$$

for all finite $T$. This is achieved by extending to the Ichikawa integral the theory, initiated by Anderson and developed by Hoover, Perkins and Lindstrøm [23, 27], that relates internal stochastic integrals to standard stochastic integrals. □

An easy extension of the above method provides a solution to the stochastic Navier-Stokes equations when the initial condition is random (and independent of the Wiener process $w$).

3.1. Stochastic flow. It was shown in [9] (see also [11, Section 6.3]) that in two dimensions with periodic boundary conditions a stochastic flow of solutions to the stochastic Navier-Stokes equations can be constructed using the above methods, provided that the noise $g$ is homogeneous in time, linear in $u$ and orthogonal to the velocity, i.e. $(g(u), u) = 0$. A stochastic flow of solutions is a single measurable function $\varphi : [0,\infty) \times H \times \Omega \to H$ such that $\varphi(\cdot, \cdot, \omega)$ is continuous for a.a. $\omega$, and for each fixed $u_0 \in H$ the process $u(t, \omega) = \varphi(t, u_0, \omega)$ is a solution of the stochastic Navier-Stokes equations (11) with initial condition $u(0) = u_0$. This will be discussed further in Section 5.4 below.

§4. Statistical solutions to the Navier-Stokes equations. We begin by recalling the notion of a statistical solution in a general setting, before discussing the way in which Loeb measures help in constructing statistical solutions for the Navier-Stokes equations.


4.1. The Foias equation. Consider the following abstract evolution equation taking place in a normed space $H$:

$$\frac{d}{dt}u(t) = F(t, u(t)), \qquad t > 0. \tag{17}$$

Suppose that for each initial value $u_0 \in H$ there is a unique solution $u(t)$ with $u(0) = u_0$. Denote this solution by $u(t) = S(t, u_0)$ to emphasize the dependence on the initial value. Suppose now that the initial value is a random variable $u_0 : \Omega \to H$. This random variable induces a probability measure $\mu_0$ on $H$ by

$$\mu_0(A) = P(u_0 \in A),$$

where $P$ is the given probability on $\Omega$. The function $S(t, u_0(\omega))$ is then a stochastic process with initial distribution $\mu_0$. The probability distributions $\mu_t$ of the random variables $S(t, \cdot)$ are the measures on $H$ given by

$$\mu_t(A) = P\big(S(t, u_0(\cdot)) \in A\big) = \mu_0\big(\{v \in H : S(t, v) \in A\}\big).$$

In other words, $(\mu_t)_{t \ge 0}$ is the family of time-evolving measures on $H$ induced by the equation (17) from the given initial measure $\mu_0$. The idea behind statistical solutions to (17) is to find an equation that describes the time evolution of $\mu_t$. It is sufficient to characterize the evolution of the integrals

$$\int_H \varphi(u)\,d\mu_t(u) \tag{18}$$

for a sufficiently broad class of test functions $\varphi$. Computing the time derivative of (18) heuristically, assuming that $\varphi' \in H$ and drawing on equation (17), gives:

$$
\begin{aligned}
\frac{d}{dt}\int_H \varphi(u)\,d\mu_t(u)
&= \frac{d}{dt}\int_H \varphi(S(t,v))\,d\mu_0(v) && \text{(19) by the definition of } \mu_t,\\
&= \int_H \big(\varphi'(S(t,v)), F(t, S(t,v))\big)_H\,d\mu_0(v) && \text{(20) from (17)},\\
&= \int_H \big(\varphi'(u), F(t,u)\big)_H\,d\mu_t(u) && \text{(21) by definition of } \mu_t \text{ again}.
\end{aligned}
$$

After integrating from $0$ to $t$ we obtain the so-called Foias equation corresponding to the original evolution equation (17):

$$\int_H \varphi(u)\,d\mu_t(u) - \int_H \varphi(u)\,d\mu_0(u) = \int_0^t \int_H \big(\varphi'(u), F(s,u)\big)_H\,d\mu_s(u)\,ds. \tag{22}$$

Definition 4.1. Any solution $(\mu_t)_{t \ge 0}$ to the Foias equation is called a statistical solution to the equation (17).
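For a one-dimensional example in which $S(t, v)$ is explicit, the Foias equation (22) can be checked directly on an empirical initial measure. In the sketch below the logistic equation $\dot u = u(1-u)$ on $H = \mathbb{R}$, the test function $\varphi(u) = u^2$ and the uniform initial samples are all illustrative assumptions. The right-hand side of (22) is evaluated with composite Simpson's rule in $s$, and both sides agree to quadrature accuracy, mirroring steps (19)–(21) sample by sample.

```python
import math
import random

def flow(t: float, v: float) -> float:
    """Exact solution S(t, v) of du/dt = u(1 - u) with u(0) = v."""
    e = math.exp(t)
    return v * e / (1.0 - v + v * e)

def vector_field(u: float) -> float:
    return u * (1.0 - u)

def phi(u: float) -> float:       # illustrative test function
    return u * u

def dphi(u: float) -> float:      # its derivative phi'
    return 2.0 * u

def foias_sides(samples, t, n_sub=200):
    """Return (LHS, RHS) of (22) for the empirical initial measure of `samples`."""
    lhs = sum(phi(flow(t, v)) - phi(v) for v in samples) / len(samples)

    def integrand(s):  # int phi'(u) F(u) d mu_s(u), mu_s = push-forward of mu_0 by S(s, .)
        return sum(dphi(flow(s, v)) * vector_field(flow(s, v)) for v in samples) / len(samples)

    h = t / n_sub  # composite Simpson's rule; n_sub must be even
    total = integrand(0.0) + integrand(t)
    for i in range(1, n_sub):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return lhs, total * h / 3.0

rng = random.Random(0)
samples = [rng.uniform(0.05, 0.95) for _ in range(500)]
lhs, rhs = foias_sides(samples, t=2.0)
print(f"LHS {lhs:.8f}  RHS {rhs:.8f}")
```

Note that the function `foias_sides` never needs uniqueness explicitly; it is hidden in the use of the flow `S(t, v)`, which is exactly the point made next about the Navier-Stokes case.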


Note that the above derivation of the Foias equation requires that (17) have the uniqueness property; otherwise the function $S$ does not exist. However, $S$ does not occur in the Foias equation, so as an abstract equation for the time evolution of a family of measures, (22) makes sense even when the underlying equation does not have a unique solution. In this case there is then an unanswered question about existence of a solution to the Foias equation. This is the crucial point that was observed by Foias, and it applies to the Navier-Stokes equations in three dimensions, since it is not known whether there is a unique solution. Thus the above derivation may not make sense; however, the end result, the corresponding Foias equation, is perfectly well defined and it makes sense to look for statistical solutions. The Foias equation for the Navier-Stokes equations takes the form

$$
\begin{aligned}
\int_H \varphi(u)\,d\mu_t(u) = \int_H \varphi(u)\,d\mu(u)
+ \int_0^t \int_H \big[&-\nu((u, \varphi'(u))) - b(u, u, \varphi'(u))\\
&+ (f(s,u), \varphi'(u))\big]\,d\mu_s(u)\,ds
\end{aligned}
\tag{23}
$$

for a family of suitable test functions $\varphi$. There is a corresponding notion of statistical solution for the stochastic Navier-Stokes equations, which we will discuss later (see Section 4.6).

4.2. Construction of statistical solutions using Loeb measures. Unlike the Navier-Stokes equations themselves, the hyperfinite Galerkin approximation (4) does have unique solutions (provided we adjust the function $F$ infinitesimally if necessary to make it ${}^*$Lipschitz). This allows the Foias derivation to be made precise in the space $H_N$, and we can exploit that to get a very easy construction of a statistical solution to the deterministic Navier-Stokes equations using Loeb measures, which should be compared with Foias' original construction, which takes more than twenty pages. Here is an outline of this procedure.

Step 1. For fixed $V \in H_N$ with $|V|$ finite, solve the Galerkin equation (4) in $H_N$ with initial condition $U(0) = V$, and follow the rest of the proof of Theorem 2.1 to see that it provides a solution $u(t)$ to the Navier-Stokes equations with initial condition $u(0) = {}^\circ V$. Write $u(t) = S_t V$ for this solution, to indicate its dependence on $V$.

Step 2. Let $M = {}^*\mu_0 \circ \mathrm{Pr}_N^{-1}$, which is an internal probability measure on $H_N$, and consider the corresponding Loeb measure $M_L$.

Step 3. Define a family of measures $\mu_t$ on $H$, for $t \ge 0$, by

$$\mu_t(A) = M_L\big(\{V \in H_N : S_t V \in A\}\big).$$
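Step 3 defines $\mu_t$ as a push-forward: the mass $\mu_t$ gives to $A$ is the initial mass of the set of initial conditions whose trajectories land in $A$ at time $t$. A finite-dimensional sketch of this bookkeeping, under assumptions not in the text (a linear two-mode flow $S_t V = (e^{-\nu\lambda_1 t}V_1, e^{-\nu\lambda_2 t}V_2)$, i.e. a Galerkin system without the nonlinear term; a Gaussian initial measure; and the half-space $A = \{u_1 \ge a\}$), compares the sample estimate of $\mu_t(A)$ with the exact value obtained by pulling $A$ back to time $0$.

```python
import math
import random

NU, LAM = 0.5, (1.0, 4.0)        # assumed viscosity and eigenvalues
MEAN, SD = (1.0, 0.0), 0.5       # assumed Gaussian initial measure N(MEAN, SD^2 I)

def flow(t, v):
    """S_t V for the linear two-mode system dU_k/dt = -NU * LAM_k * U_k."""
    return tuple(x * math.exp(-NU * lam * t) for x, lam in zip(v, LAM))

def normal_cdf(x, mean, sd):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def mu_t_estimate(t, a, n=20000, seed=0):
    """Fraction of initial samples V with S_t V in A = {u_1 >= a}."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        v = (rng.gauss(MEAN[0], SD), rng.gauss(MEAN[1], SD))
        hits += flow(t, v)[0] >= a
    return hits / n

def mu_t_exact(t, a):
    """mu_0 of the pre-image {V : V_1 >= a * exp(NU * LAM_1 * t)}."""
    threshold = a * math.exp(NU * LAM[0] * t)
    return 1.0 - normal_cdf(threshold, MEAN[0], SD)

est, exact = mu_t_estimate(1.0, 0.5), mu_t_exact(1.0, 0.5)
print(f"sample estimate {est:.4f}  exact {exact:.4f}")
```

In the nonstandard construction the sampling is replaced by the Loeb measure $M_L$ on $H_N$, and the flow by the hyperfinite Galerkin solution followed by the standard part.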


Step 4. Carry out Foias’ heuristic argument (now rigorously) in HN , using the fact that we have uniqueness of solutions in HN , and combine with Loeb space techniques, to show that t is a statistical solution to the Navier-Stokes equations. A minor variation on this approach would be to write U ( ) = T V to denote the internal solution to the Galerkin approximation on HN and let M be the corresponding internal family of measures induced on HN by means of T with M0 = M . Foias’ argument shows that (M ) ≥0 is an internal statistical solution (i.e. solves the Foias equation corresponding to the Galerkin equation on HN ). Now define

$$\mu_t(A) = (M_t)_L \circ \mathrm{st}^{-1}(A) \quad \text{for Borel } A \subseteq H;$$
it follows quite easily that this is a statistical solution.

4.3. Construction of statistical solutions using nonstandard densities. In finite dimensions a measure can often be described by means of a density, and an evolving family of measures $\mu_t$ is then given by an evolving density depending on $t$. The Foias equation for the family $(\mu_t)$ then becomes an equation for the evolving density, in fact a PDE. Measures on infinite dimensional spaces such as $H$ cannot be described by densities, because there is no Lebesgue measure to serve as a reference measure. However, the hyperfinite space $H_N$ carries nonstandard Lebesgue measure, since $H_N \cong {}^*\mathbb{R}^N$, so we can introduce nonstandard densities there and carry out the above procedure for the Foias equation corresponding to the Navier-Stokes equations. This provides an alternative Loeb space technique for constructing statistical solutions, which brings a real advantage when one considers the stochastic Navier-Stokes equations, as we shall see. Here is an outline.

4.4. Nonstandard densities.

Definition 4.2. An internal function $\Phi : H_N \to {}^*\mathbb{R}$ is a nonstandard density of the probability measure $\mu$ on the Hilbert space $H$ if $\Phi$ is non-negative and ${}^*$integrable with respect to ${}^*$Lebesgue measure on ${}^*\mathbb{R}^N$, with $\int_{H_N} \Phi(U)\,dU = 1$, and
$$\mu(B) = M_L\big(\mathrm{st}^{-1}(B) \cap H_N\big),$$
where $M_L$ is the Loeb measure corresponding to the internal measure $M$ on $H_N$ given by
$$M(A) = \int_A \Phi(U)\,dU.$$
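Steps 1 to 3 of Section 4.2 amount to pushing the initial measure forward along a uniquely solvable flow: $\mu_t(A) = M_L(\{V : S_t V \in A\})$. The sketch below illustrates only that push-forward formula, in a deliberately finite, standard setting. A hypothetical linear ODE on $\mathbb{R}^2$ stands in for the Galerkin equation (4), and a finite weighted point cloud stands in for the internal measure $M$; all names and data (`LAM`, `F`, the point cloud) are assumptions for illustration, not part of the construction in the text.

```python
import math

# Hypothetical stand-in for the Galerkin equation (4): dU/dt = -LAM*U + F on R^2.
# The right-hand side is Lipschitz, so solutions are unique (as needed in Step 1).
LAM = (1.0, 3.0)   # assumed dissipation rate for each coordinate
F = (2.0, -3.0)    # assumed constant forcing

def solve(t, v):
    """Exact solution map S_t V of dU/dt = -LAM*U + F with U(0) = V."""
    return tuple(math.exp(-lam * t) * vk + (1.0 - math.exp(-lam * t)) * fk / lam
                 for lam, vk, fk in zip(LAM, v, F))

# Finite weighted point cloud standing in for M: (point, weight), weights sum to 1.
M = [((0.0, 0.0), 0.25), ((4.0, 1.0), 0.25), ((-2.0, 3.0), 0.5)]

def mu(t, indicator):
    """Push-forward mu_t(A) = M({V : S_t V in A}); A is given by its indicator."""
    return sum(w for v, w in M if indicator(solve(t, v)))

def ball(center, radius):
    """Indicator of the open Euclidean ball of the given radius about center."""
    return lambda u: math.dist(u, center) < radius

EQUILIBRIUM = tuple(fk / lam for lam, fk in zip(LAM, F))  # the fixed point (2.0, -1.0)
```

For instance, `mu(t, ball(EQUILIBRIUM, 0.1))` grows from 0 towards 1 as `t` increases. Only the points move; each weight stays attached to its initial point, which is why uniqueness of solutions matters: each $V$ must determine a single $S_t V$.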

To see that nonstandard densities exist for Borel probabilities, let $N(X, C)$ denote the normal density on ${}^*\mathbb{R}^N$ with mean $X$ and covariance $C$; we have:

LOEB SPACE METHODS FOR STOCHASTIC NAVIER-STOKES EQUATIONS


Proposition 4.3. The function
$$\Phi(U) = \int_{{}^*H} N\big(U - \mathrm{Pr}_N v,\ \varepsilon^2 I\big)\, d\,{}^*\mu(v)$$
is a nonstandard density of $\mu$. Moreover,
$$\int |U|^2\, dM(U) \le \int |u|^2\, d\mu(u) + N\varepsilon^2.$$
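The construction in Proposition 4.3 smooths $\mu$ by adding a small Gaussian to its projection. The sketch below checks both claims numerically in a toy standard setting, under stated assumptions: $\mu$ is a hypothetical three-atom measure on $\mathbb{R}^3$ standing in for $H$, $\mathrm{Pr}_N$ keeps the first $N = 2$ coordinates, and $\varepsilon = 0.3$ is a fixed standard number (in the proposition $N$ is hyperfinite). For a discrete $\mu$ the integral over ${}^*H$ becomes a finite weighted sum. In this setting the second moment of $M$ equals $\int |\mathrm{Pr}_N v|^2\, d\mu(v) + N\varepsilon^2$, which is at most the stated bound.

```python
import math

# Hypothetical finite data: a three-atom probability measure mu on R^3 (standing in for H).
ATOMS = [(1.0, 0.5, 2.0), (-0.5, 1.5, -1.0), (0.0, -1.0, 0.5)]
WEIGHTS = [0.5, 0.3, 0.2]
N = 2       # Pr_N keeps the first N coordinates
EPS = 0.3   # standard stand-in for epsilon

def normal_density(x, y, mx, my, eps):
    """Density of N((mx, my), eps^2 I) on R^2."""
    r2 = (x - mx) ** 2 + (y - my) ** 2
    return math.exp(-r2 / (2 * eps ** 2)) / (2 * math.pi * eps ** 2)

def phi(x, y):
    """Phi(U) = sum_i w_i N(U - Pr_N v_i, eps^2 I): the d mu(v) integral as a finite sum."""
    return sum(w * normal_density(x, y, v[0], v[1], EPS) for v, w in zip(ATOMS, WEIGHTS))

def moments(half_width=5.0, n=400):
    """Midpoint-rule approximations of int Phi dU and int |U|^2 Phi dU over a square."""
    h = 2 * half_width / n
    mass = second = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * h
        for j in range(n):
            y = -half_width + (j + 0.5) * h
            p = phi(x, y) * h * h
            mass += p
            second += (x * x + y * y) * p
    return mass, second

mass, second_moment = moments()
# Right-hand side of the proposition: int |u|^2 d mu(u) + N * eps^2.
bound = sum(w * sum(c * c for c in v) for v, w in zip(ATOMS, WEIGHTS)) + N * EPS ** 2
```

The gap between `second_moment` and `bound` comes entirely from the coordinates discarded by $\mathrm{Pr}_N$.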

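4.5. The density equation. We next derive, heuristically, an equation for the densities of solutions to the Foias equation (23). Suppose that $\Phi(t, U)$ is a density of $\mu_t$, so that for a test functional $\vartheta$ we have
$$\int \vartheta(u)\, d\mu_t(u) \approx \int {}^*\vartheta(U)\, \Phi(t, U)\, dU.$$
Then it is natural to rewrite the Foias equation, replacing $\vartheta'$ by the vector $\big(\partial\,{}^*\vartheta / \partial U_k\big)_{k \le N}$, $d\mu_t$ by $\Phi\,dU$, and so on. After integration by parts, and dropping the (arbitrary) test functional ${}^*\vartheta$ from both sides, we obtain
$$\frac{\partial}{\partial \tau}\Phi(\tau, U) + \sum_{k=1}^{N} \frac{\partial}{\partial U_k}\Big[\big(-\lambda_k U_k - {}^*b(U, U, E_k) + F_k(\tau, U)\big)\,\Phi(\tau, U)\Big] = 0, \tag{24}$$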

which is now a nonstandard equation with $\tau \in {}^*[0, T]$ and $U \in H_N$; we call it the density equation (for the Navier-Stokes equations). It is in fact a hyperfinite version of the Liouville equation. The next result shows how its solution provides statistical solutions.

Theorem 4.4. Let $\mu$ be a Borel probability measure on $H$ satisfying $\int |u|^2\, d\mu(u) < \infty$. Let $\Phi_0$ be a nonstandard density of $\mu$ with $\int |U|^2\, \Phi_0(U)\, dU < \infty$. Let $\Phi(\tau, U)$ be the solution to the density equation (24) with initial function $\Phi_0$. Then the internal measures $M_\tau$ determined by $\Phi(\tau)$ are nearstandardly concentrated, and the standard family of measures given by
$$\mu_t = (M_t)_L \circ \mathrm{st}^{-1}, \qquad t \in [0, T],$$
is a statistical solution of the Navier-Stokes equations with initial measure $\mu$.

Proof (Sketch). The density equation can be solved by the method of characteristics. By reversing the derivation of the density equation, it can be seen that the measures $M_\tau$ solve the Foias equation on $H_N$, and it follows as before that the family $\mu_t$ is a statistical solution.
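The two ingredients of the proof sketch, solving by characteristics and recovering the push-forward of the initial measure, can be seen on a toy one-dimensional Liouville equation. The sketch below assumes a scalar linear drift $b(x) = -\lambda x + f$ in place of the Galerkin drift in (24) and a standard normal initial density; the constants `LAM` and `F` are illustrative assumptions. It solves $\partial_t\Phi + \partial_x(b\Phi) = 0$ by transporting mass along characteristics. The tests check three things: total mass stays 1, the PDE residual is small, and $\int \vartheta\,\Phi(t, \cdot)\,dx$ agrees with $\int \vartheta(X_t(x_0))\,\Phi_0(x_0)\,dx_0$.

```python
import math

LAM, F = 2.0, 1.0  # assumed scalar drift b(x) = -LAM * x + F

def flow(t, x0):
    """Characteristic X_t(x0): the solution of dx/dt = -LAM*x + F with x(0) = x0."""
    a = math.exp(-LAM * t)
    return a * x0 + (1.0 - a) * F / LAM

def inverse_flow(t, x):
    """The starting point x0 whose characteristic reaches x at time t."""
    a = math.exp(-LAM * t)
    return (x - (1.0 - a) * F / LAM) / a

def phi0(x):
    """Assumed initial density: standard normal."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def phi(t, x):
    """Solution of dPhi/dt + d/dx(b * Phi) = 0 by characteristics.

    Mass is transported along X_t, so Phi(t, x) = Phi0(x0) * |dx0/dx| with
    x0 = X_t^{-1}(x) and dx0/dx = exp(LAM * t)."""
    return phi0(inverse_flow(t, x)) * math.exp(LAM * t)

def integrate(g, lo=-10.0, hi=10.0, n=20000):
    """Midpoint rule on [lo, hi]; the densities used here are negligible outside."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

def pde_residual(t, x, d=1e-4):
    """Central-difference value of dPhi/dt + d/dx((-LAM*x + F) * Phi) at (t, x)."""
    dphi_dt = (phi(t + d, x) - phi(t - d, x)) / (2 * d)
    flux = lambda y: (-LAM * y + F) * phi(t, y)
    dflux_dx = (flux(x + d) - flux(x - d)) / (2 * d)
    return dphi_dt + dflux_dx
```

Reading the last identity backwards is the finite-dimensional analogue of "reversing the derivation": the measures with density $\Phi(t, \cdot)$ are exactly the images of the initial measure under the flow.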


4.6. Statistical solutions for stochastic Navier-Stokes equations. The heuristic argument for the Foias equation for the Navier-Stokes equations can be carried through for the stochastic Navier-Stokes equations, resulting in an extra term as compared with equation (23):
$$\int_H \vartheta(u)\,d\mu_t(u) - \int_H \vartheta(u)\,d\mu_0(u) = \int_0^t \int_H \Big( -((u, \vartheta'(u))) - b(u, u, \vartheta'(u)) + (f(s, u), \vartheta'(u)) + \tfrac{1}{2}\operatorname{tr}\big(Q\, g^T(s, u)\, \vartheta''(u)\, g(s, u)\big) \Big)\, d\mu_s(u)\, ds \tag{25}$$
(here $g^T$ denotes the adjoint of $g$). Once we have a solution $u(t, \omega)$ to the stochastic Navier-Stokes equations (11) with random initial condition $u(0, \omega)$ distributed according to the initial measure $\mu_0$, it is easy to see that a statistical solution is given by

$$\mu_t(A) = P\big(u(t, \omega) \in A\big).$$
Note, however, that (25) is a deterministic equation for the evolution of a family of probability measures: there is no underlying probability space or stochastic process, even implicitly. Thus we might hope to find a statistical solution without having to solve (11), and even without any stochastic calculus or the need to develop stochastic integration in $H$. This can indeed be achieved by extending the method of nonstandard densities. The density equation for the stochastic Navier-Stokes equations is the same as in the deterministic case (24) except for an additional second order term on the left:
$$-\sum_{i,j=1}^{N} \frac{\partial^2 (\sigma_{ij}\,\Phi)}{\partial U_i\, \partial U_j},$$
where $\sigma(\tau, U) = \tfrac{1}{2}\, G(\tau, U)\, Q_N\, G^T(\tau, U)$. Although it is a little more complicated, the construction of a statistical solution to the stochastic Navier-Stokes equations by first solving the density equation proceeds along the same lines as in the deterministic case; see [10] for details.

§5. Attractors for stochastic Navier-Stokes equations. The general notion of attractor is concerned with the asymptotic behaviour of trajectories of semigroups of operators. Suppose that we have a topological space $X$ and a semigroup $(S(t))_{t \ge 0}$ of operators $S(t) : X \to X$, that is, 1) $S(0) = \mathrm{Id}_X$ and 2) $S(t + s) = S(t) \circ S(s)$. A very general notion of attractor for $S$ is as follows. A set $A \subseteq X$ is an attractor for $S$ if


1) it is invariant, that is, $S(t)A = A$ for all $t$; 2) there is an open neighbourhood $U$ of $A$, called the basin of attraction, such that for all $x \in U$, $S(t)x \to A$ (in the sense that for each open neighbourhood $V$ of $A$, $S(t)x \in V$ for $t$ sufficiently large). If 2) holds with $U = X$, then $A$ is a global attractor. Condition 1) is trivially fulfilled for $A = \emptyset$, while 2) holds for $A = X$, so the interest lies in having both conditions fulfilled simultaneously. Variations of this definition are possible and occur in the literature; for example, requirement 2) can be replaced by the stronger condition that $A$ attracts sets from a certain class $\mathcal{B}$, i.e. for all $B \in \mathcal{B}$ and every open neighbourhood $V$ of $A$, $S(t)B \subseteq V$ for sufficiently large $t$. An example of such a class is the family $\mathcal{B}$ of all bounded sets (assuming now that $X$ is a metric space). For a differential equation with a unique solution for each initial condition, a semigroup $S(t)$ is defined on the phase space by letting $S(t)x$ be the value at time $t$ of the solution with initial value $x \in X$. To get the semigroup property it is necessary that the equation be homogeneous, that is, the coefficients must be independent of time. Here we are concerned with existence of attractors for the Navier-Stokes equations, particularly versions that have a stochastic ingredient. For the deterministic Navier-Stokes equations, the existence of a global attractor in dimension 2 goes back to the work of Ladyzhenskaya [26] and of Foias and Temam; for a full exposition see Chapter III (Sec. 2) of Temam's book [30]. The restriction to dimension 2 is because it is only in that case that the semigroup $S(t)$ is known to exist for the Navier-Stokes equations; in higher dimensions the question of uniqueness is still open. The new difficulties encountered when seeking attractors for the stochastic equations are twofold. First, there is a problem with the very definition of attractors for stochastic equations, since the noise is not homogeneous in time (see the discussion below).
Second, for the stochastic Navier-Stokes equations there is the issue of existence of solutions to the equations themselves, particularly existence of the stochastic equivalent of a semigroup of solutions. In view of these additional difficulties, several different formulations of the notion of an attractor for a system of stochastic differential equations have been developed, for example measure attractors (see [12, 28]) or the notion of stochastic attractor developed by Crauel and Flandoli [16]. A third approach is to extend the idea of Sell [29] that was used for the deterministic Navier-Stokes equations in 3 dimensions to overcome the problem of nonuniqueness. We will explain each of these, and how Loeb space methods can assist, below. In each case, to avoid unnecessary additional complications, the drift and noise coefficients $f, g$ in (2) are taken to be time-independent, so the equations considered are
$$du = [-Au - B(u) + f(u)]\,dt + g(u)\,dw_t. \tag{26}$$
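The recipe $\mu_t(A) = P(u(t, \omega) \in A)$ from Section 4.6 can be illustrated on the simplest scalar relative of (26). The sketch below assumes the hypothetical linear equation $du = -\lambda u\,dt + \sigma\,dw_t$ (no $B$ term, constant $g$), simulates it with the Euler-Maruyama scheme from a random initial condition with law $\mu_0 = N(m_0, v_0)$, and compares the empirical value of $\mu_1((-\infty, 0])$ with the Gaussian law that solves the corresponding one-dimensional density (Fokker-Planck) equation: mean $m_0 e^{-\lambda t}$ and variance $v_0 e^{-2\lambda t} + \sigma^2(1 - e^{-2\lambda t})/(2\lambda)$. The $\sigma^2$ part of that variance is what the second order term of the density equation contributes. All constants are assumptions chosen for illustration.

```python
import math
import random

LAM, SIGMA = 1.0, 1.0   # assumed drift and noise coefficients
M0, V0 = 1.0, 0.25      # assumed initial measure mu_0 = N(M0, V0)

def simulate_endpoints(t=1.0, steps=50, paths=10000, seed=0):
    """Euler-Maruyama samples of u(t) for du = -LAM*u dt + SIGMA dw, u(0) ~ mu_0."""
    rng = random.Random(seed)
    dt = t / steps
    sqrt_dt = math.sqrt(dt)
    endpoints = []
    for _ in range(paths):
        u = rng.gauss(M0, math.sqrt(V0))
        for _ in range(steps):
            u += -LAM * u * dt + SIGMA * sqrt_dt * rng.gauss(0.0, 1.0)
        endpoints.append(u)
    return endpoints

def exact_law(t):
    """Mean and variance of mu_t (the Gaussian solution of the density equation)."""
    a = math.exp(-LAM * t)
    return M0 * a, V0 * a * a + SIGMA ** 2 * (1 - a * a) / (2 * LAM)

def normal_cdf(x, mean, var):
    """Distribution function of N(mean, var) at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

samples = simulate_endpoints()
mc_prob = sum(1 for u in samples if u <= 0.0) / len(samples)  # estimate of mu_1((-inf, 0])
mean, var = exact_law(1.0)
exact_prob = normal_cdf(0.0, mean, var)
```

The two numbers agree only up to Monte Carlo and time-discretisation error; the point of the density method in the text is that $\mu_t$ can be obtained without simulating paths at all.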


5.1. Nonstandard attractors and standard attractors. Before discussing attractors for the Navier-Stokes equations, we outline a general nonstandard approach to attractors that underlies all of the applications to the Navier-Stokes equations below. In the general setting described earlier, suppose now that $X$ is a Banach space with norm $|\cdot|$ and that there is a norm-bounded set $E$ that is an absorbing set, defined as follows.

Definition 5.1. The set $E$ is an absorbing set if for any bounded set $B$ there is $t_0 > 0$ such that $S_t B \subseteq E$ whenever $t \ge t_0$.

If there is a bounded absorbing set $E$, then a simple way to construct an attractor is as follows. First note that ${}^*E$ is S-absorbing, meaning that for any finitely bounded set $B \subset {}^*X$ there is a finite time $\tau_0$ such that ${}^*S_\tau B \subseteq {}^*E$ for all $\tau \ge \tau_0$. This follows by transfer of the absorbing property of $E$. Now write $T_\tau$ for ${}^*S_\tau$ and define the set $C$ by
$$C = \bigcup_{\tau \text{ infinite}} T_\tau\,{}^*E = \bigcap_{n \in \mathbb{N}} \bigcup_{\tau \ge n} T_\tau\,{}^*E$$
(the equality of the two expressions follows by an application of $\aleph_1$-saturation). Then it follows easily that $C$ is a global S-attractor, by which we mean that $C$ has the three properties noted in the following theorem.

Theorem 5.2. (1) $C$ is a countable intersection of internal sets; in fact
$$C = \bigcap_{n \in \mathbb{N}} C_n, \quad \text{where } C_n = \bigcup_{\tau \ge n} T_\tau\,{}^*E.$$
(2) $T_\tau C = C$ for all finite $\tau$.
(3) For each $n \in \mathbb{N}$ and finitely bounded set $B \subset {}^*X$ there is $t_0 \in [0, \infty)$ such that $T_\tau B \subseteq C_n$ for all $\tau \ge t_0$.

The set $C$ is bounded and hence weakly nearstandard in ${}^*X$, so we may take the standard part in the weak topology,
$$A = \text{w-st}\,C = \{\text{w-st}\,x : x \in C\},$$
and this is a weakly compact subset of $X$. It is now quite straightforward to see that $A$ is a global attractor for the semigroup $S_t$, given certain natural continuity assumptions. The proof draws on the fact that $C$ is an S-attractor.
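A standard, one-dimensional picture of this construction may help. The sketch below assumes the hypothetical ODE $x' = x - x^3$ on $X = \mathbb{R}$, for which $E = [-2, 2]$ is a bounded absorbing set. Because a scalar flow preserves order, $S_t E$ is the interval $[S_t(-2), S_t(2)] = [-r(t), r(t)]$, and $r(t)$ has a closed form because $y = x^2$ solves the logistic equation $y' = 2y - 2y^2$. The intervals shrink as $t$ grows, so $\bigcap_n \bigcup_{t \ge n} S_t E = [-1, 1]$, the global attractor. In the nonstandard version one would expect $T_\tau\,{}^*E = [-r(\tau), r(\tau)]$ with $r(\tau) \approx 1$ for infinite $\tau$, whose standard part is again $[-1, 1]$. The RK4 integrator is only used as an independent check of the closed form.

```python
import math

def vector_field(x):
    """Right-hand side of the assumed ODE x' = x - x**3."""
    return x - x ** 3

def flow(t, x0, steps=2000):
    """Classical RK4 approximation of S_t x0 for x' = x - x**3."""
    h, x = t / steps, x0
    for _ in range(steps):
        k1 = vector_field(x)
        k2 = vector_field(x + h * k1 / 2)
        k3 = vector_field(x + h * k2 / 2)
        k4 = vector_field(x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x

def radius(t, x0=2.0):
    """Closed form of S_t x0 for x0 > 0: x**2 solves the logistic equation y' = 2y - 2y**2."""
    return (1.0 + (1.0 / x0 ** 2 - 1.0) * math.exp(-2.0 * t)) ** -0.5

# S_n E = [-radius(n), radius(n)] for E = [-2, 2]; these intervals are nested.
radii = [radius(n) for n in range(21)]
```

The equilibria $\pm 1$ and $0$ are fixed by the flow, so $[-1, 1]$ is invariant, while every bounded set is eventually squeezed into any neighbourhood of it.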


This idea for constructing attractors is adapted in various ways to give the applications to the Navier-Stokes equations that are described in the following sections.

5.2. Measure attractors. For general time-homogeneous stochastic Navier-Stokes equations of the form (26) with random initial condition, one approach to attractors is to study the time evolution of the initial measure, that is, the probability distribution of the initial value. In other words, this is to consider attractors for statistical solutions to the stochastic Navier-Stokes equations. Such attractors are called measure attractors. This approach is currently applicable only for $d = 2$, since it requires the equation (26) to have a unique solution. Thus it is also assumed that $f, g$ satisfy an appropriate Lipschitz condition, to ensure that for each initial condition $u \in H$ there is a unique solution $u(t) = v(t, u)$ with $u(0) = u$ (so $v(0, u) = u$). A semigroup $S_t$ is defined on $M_1(H)$, the set of Borel probability measures on $H$, by putting $S_t\mu = \mu_t$, where
$$\int_H \vartheta(u)\, d\mu_t(u) = \int_H \mathbb{E}\,\vartheta(v(t, u))\, d\mu(u)$$
for all bounded weakly continuous functions $\vartheta : H \to \mathbb{R}$. An attractor for the dynamical system $(M_1(H), S_t)$ is called a measure attractor. The existence of measure attractors for the stochastic Navier-Stokes equations was first investigated by Schmalfuß [28]. The paper [12] with Capiński establishes existence of a measure attractor for (26) under quite general conditions:

Theorem 5.3. Suppose that $f, g$ are Lipschitz and satisfy an appropriate growth condition.⁸ Then there is a measure attractor $A \subset M_1(H)$ for the stochastic Navier-Stokes equations (26). That is,
(a) $A$ is weakly compact;
(b) $S_t A = A$ for all $t$;
(c) for each open set $O \supseteq A$ and each $r > 0$, $S_t B_r \subseteq O$ for all sufficiently large $t$, where $B_r = \{\mu \in M_1(H) : \int |u|^2\, d\mu(u) \le r\}$.
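For a linear scalar equation the semigroup $S_t$ on measures can be written down explicitly, which makes the objects in Theorem 5.3 concrete. The sketch below assumes the hypothetical Ornstein-Uhlenbeck equation $du = -\lambda u\,dt + \sigma\,dw$ (not the Navier-Stokes system) and restricts attention to Gaussian laws $N(m, v)$, which $S_t$ maps to Gaussian laws. The single invariant law $N(0, \sigma^2/(2\lambda))$ then plays the role of the measure attractor, and laws sampled from $B_r$ are drawn towards it. The distance used is a crude distance between parameters, chosen only for illustration; it is not a metric on $M_1(H)$ used in the text.

```python
import math

LAM, SIGMA = 1.0, 1.0              # assumed coefficients of du = -LAM*u dt + SIGMA dw
V_STAR = SIGMA ** 2 / (2.0 * LAM)  # variance of the invariant law N(0, V_STAR)

def S(t, law):
    """Semigroup S_t acting on a Gaussian law N(m, v), encoded as the pair (m, v)."""
    m, v = law
    a = math.exp(-LAM * t)
    return (m * a, v * a * a + V_STAR * (1.0 - a * a))

def ball_B_r(r, n=20):
    """Grid sample of the Gaussian laws in B_r = {mu : int |u|^2 d mu <= r}.

    For N(m, v) the second moment is m**2 + v."""
    laws = []
    for i in range(n + 1):
        v = r * i / n
        m_max = math.sqrt(max(r - v, 0.0))
        laws.extend([(m_max, v), (-m_max, v), (0.0, v)])
    return laws

def distance_to_attractor(law):
    """Crude parameter distance from N(m, v) to N(0, V_STAR) (illustrative only)."""
    m, v = law
    return abs(m) + abs(v - V_STAR)
```

As the text notes, $(M_1(H), S_t)$ is a deterministic dynamical system: `S` involves no randomness at all, even though it is induced by a stochastic equation.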

The methods in [12] do not make essential use of Loeb spaces, although at some points they can be employed to assist the construction of the attractor, which is very much along the lines outlined above. Note that the dynamical system $(M_1(H), S_t)$ is actually deterministic in this formulation, unlike the approach to stochastic attractors that we discuss next.

⁸ For example, a sufficient condition is that $|f(u)|_{-1} \le c + \kappa_1\|u\|$ and $|g(u)|_{H,H} \le c + \kappa_2\|u\|$ for some $\kappa_1, \kappa_2 > 0$ with $2\kappa_1 + \kappa_2^2 \cdot \operatorname{tr} Q < 2$, where $Q$ is the covariance of the $H$-valued Wiener process $w$.
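The cocycle identity and the pullback construction of Definition 5.4 in Section 5.3 can be previewed on a discrete-time random dynamical system. In the sketch below (all data hypothetical) time is integer-valued, the noise $\omega$ is a fixed two-sided sequence $(\omega_k)$, the shift is $(\theta_s\omega)_k = \omega_{k+s}$, and one step of the flow is the affine map $x \mapsto ax + \omega_k$ with $|a| < 1$. Then $\varphi(t, x, \theta_{-t}\omega)$ converges, as $t \to \infty$, to $A(\omega) = \sum_{i \ge 1} a^{i-1}\omega_{-i}$ for every starting point $x$, and $\varphi(t, A(\omega), \omega) = A(\theta_t\omega)$. This toy system is solved path by path, so it sidesteps exactly the difficulty discussed in Sections 5.3 and 5.4, namely constructing a perfect cocycle for a truly stochastic system.

```python
import random

A_COEF = 0.5  # assumed contraction factor a, |a| < 1

def make_noise(seed=1, K=200):
    """A fixed two-sided noise path omega_k for k = -K..K (hypothetical data)."""
    rng = random.Random(seed)
    return {k: rng.gauss(0.0, 1.0) for k in range(-K, K + 1)}

def shift(s, omega):
    """theta_s omega, defined by (theta_s omega)_k = omega_{k+s}."""
    return {k - s: w for k, w in omega.items()}

def phi(n, x, omega):
    """Cocycle phi(n, x, omega): apply x -> a*x + omega_k for k = 0, ..., n-1."""
    for k in range(n):
        x = A_COEF * x + omega[k]
    return x

def pullback_attractor(omega, depth=100):
    """A(omega) = sum_{i>=1} a**(i-1) * omega_{-i}, truncated after `depth` terms."""
    return sum(A_COEF ** (i - 1) * omega[-i] for i in range(1, depth + 1))
```

Here the "attractor" is a single random point $A(\omega)$; it attracts trajectories started in the far past ($\theta_{-t}\omega$), not forward trajectories, which keep fluctuating with the noise.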


5.3. Stochastic attractors. For a stochastic system such as (26), the idea of a stochastic attractor developed by Crauel and Flandoli [16] takes into account the fact that at all times new noise is introduced into the evolution of each path of any solution to (26). They use the notion of a cocycle, a generalization of the semigroup idea to non-time-homogeneous situations, and it is also necessary to work with a stochastic flow of solutions to the equation. A stochastic attractor is then defined to be a random set $A(\omega)$ that, at time 0, attracts trajectories "starting at $-\infty$" (compared with the usual idea of an attractor being a set "at time $\infty$" that attracts trajectories starting at time 0). This idea is spelled out below, and involves the introduction of a one-parameter group $\theta_t : \Omega \to \Omega$ of measure-preserving maps, which should be thought of as a shift of the noise to the left by $t$.

Making this precise, suppose that $\varphi$ is a stochastic flow of solutions to (26). That is, $\varphi$ is a measurable function $\varphi : [0, \infty) \times H \times \Omega \to H$ such that $\varphi(\cdot, \cdot, \omega)$ is continuous for a.a. $\omega$, and for each fixed initial condition $u_0$ the process $u(t, \omega) = \varphi(t, u_0, \omega)$ is a solution to (26) with $u(0, \omega) = u_0$. The notion of a semigroup in the usual definition of a deterministic attractor, along with the notion of an attractor itself, is now replaced by the following.

Definition 5.4. (i) The flow $\varphi$ is a crude cocycle if for each $s \in \mathbb{R}^+$ there is a full set $\Omega_s$ such that for all $\omega \in \Omega_s$
$$\varphi(s + t, x, \omega) = \varphi(t, \varphi(s, x, \omega), \theta_s\omega)$$
holds for each $x \in H$ and $t \in \mathbb{R}^+$.
(ii) A cocycle is perfect if $\Omega_s$ does not depend on $s$.
(iii) Given a perfect cocycle $\varphi$, a global stochastic attractor is a random compact subset $A(\omega)$ of $H$ such that for almost all $\omega$
$$\varphi(t, A(\omega), \omega) = A(\theta_t\omega), \quad t \ge 0,$$
$$\lim_{t \to \infty} \operatorname{dist}\big(\varphi(t, B, \theta_{-t}\omega), A(\omega)\big) = 0$$
for each bounded set $B \subset H$.

Note that the existence of a perfect cocycle is necessary for the possibility of having a stochastic attractor. Constructing a perfect cocycle is difficult for infinite dimensional systems, particularly for those that are truly stochastic (as compared with random dynamical systems, in which paths may be treated individually).

5.4. Existence of a stochastic attractor for the Navier-Stokes equations. A stochastic attractor was constructed for the stochastic Navier-Stokes equations with $d = 2$ by Crauel and Flandoli [16], but the version of (26) that was considered reduced to a random equation that could be solved pathwise, giving essentially a pathwise construction of the random attractor $A(\omega)$. When


seeking to construct a stochastic attractor for the general system (26), the nonstandard framework makes it particularly easy to consider $-\infty$, and this was achieved in [13] for the Navier-Stokes equations in 2 dimensions with a special form of noise (as in Section 3.1 above). This is the first example of a stochastic attractor for a truly stochastic⁹ version of the Navier-Stokes equations, and it used Loeb space methods, seemingly in an essential way. In the following, for simplicity, the Wiener process is taken to be one dimensional.

Theorem 5.5 (Capiński and Cutland [13]). (a) Suppose that $(g(u) - g(v), u - v) = 0$¹⁰ and $(g(u), u) = 0$. With appropriate Lipschitz and growth conditions on $f, g$, there is an adapted Loeb space carrying a stochastic flow of solutions to the system (26) that is a perfect cocycle, and there is a stochastic attractor $A(\omega)$ (compact in the strong topology of $H$) for this system.
(b) If $g$ has the additional property that $((g(v), v)) = 0$ for $v \in V$, the stochastic attractor is bounded and weakly compact in $V$.

⁹ That is, with feedback of the noise through the stochastic integral.
¹⁰ For example, $g(u) = \langle h, \nabla u\rangle$ for some $h \in H$.

The proof of this result is quite long and complicated, so here we only sketch the basic ideas. As the underlying nonstandard probability space we take $\Omega = {}^*C_0(\mathbb{R})$ with the Wiener measure, and the canonical one-dimensional Wiener process $W(\tau, \omega) = \omega(\tau)$ defined for all time $\tau \in {}^*(-\infty, \infty)$. On $\Omega$ take the group of measure-preserving shifts $\Theta_s$ given by $(\Theta_s\omega)(\tau) = \omega(\tau + s) - \omega(s)$. On this space consider the Galerkin approximation
$$dU(\tau) = \big[-{}^*A\,U(\tau) - B(U(\tau)) + F(U(\tau))\big]\,d\tau + G(U(\tau))\,dW(\tau), \qquad U(s) = V, \tag{27}$$
for any $s \in {}^*\mathbb{Q}$ and $\tau \ge s$. Using the transfer of standard results concerning flows and cocycles for finite dimensional SDEs [2], construct a superflow $\Psi(\tau, s, V, \omega)$ of solutions to (27) for all $s \in {}^*\mathbb{Q}$, all $\tau \ge s$ and all $V \in H_N$. This has the property that for each $s$ the process $U(\tau, \omega) = \Psi(\tau, s, V, \omega)$ is a solution to (27) on ${}^*[s, \infty)$ with initial condition $U(s) = V$. In addition it has the "glueing together" property
$$\Psi(\tau, s, V, \omega) = \Psi(\tau, r, \Psi(r, s, V, \omega), \omega) \tag{28}$$
for all $r \ge s \in {}^*\mathbb{Q}$, all $\tau \ge r$, all $V \in H_N$ and all $\omega \in \Omega_1$, where $\Omega_1$ is a ${}^*$full subset of $\Omega$. This connects with the cocycle property as follows. If we write $\Phi(\tau, V, \omega) = \Psi(\tau, 0, V, \omega)$ for the solutions to (27) starting at 0, then
$$\Psi(\tau, s, V, \omega) = \Phi(\tau - s, V, \Theta_s\omega)$$
for all $s \in {}^*\mathbb{Q}$, all $\tau \ge s$ and all $V \in H_N$, for all $\omega \in \Omega_1$, and (28) shows that $\Phi$ is a crude ${}^*$cocycle. The next lemma is the key to the whole construction, since it allows the taking of standard parts.

Lemma 5.6. There is a Loeb full subset of $\Omega_1$ such that
(i) for all $\omega \in \Omega_1$ the superflow $\Psi(\tau, s, V, \omega)$ is S-continuous in $(\tau, s, V)$ for finite $\tau$ and $s$ and (strongly) nearstandard $V \in H_N$;
(ii) $\Omega_1$ is invariant under $\Theta_t$ for all finite $t \in {}^*\mathbb{Q}$.

The proof of this lemma is quite delicate and requires a special adaptation of the Lindstrøm-Kolmogorov Continuity Theorem in [1], along with some new strong regularity properties of the solutions to (27). The proof of Theorem 5.5 continues by showing that there is a fixed finite radius $\rho$ such that $B(\rho)$, the ball of radius $\rho$ in $H_N$, absorbs almost all of the paths of the superflow $\Psi$, uniformly in the starting time $s$ and a given bounded set of initial conditions, $B(r)$ say. This is where the special form of the noise is used, since it gives a deterministic equation for the evolution of the energy $|U(\tau)|^2$ of solutions to (27). Then, following the general pattern outlined above (Section 5.1), define a stochastic S-attractor $C(\omega) \subseteq H_N$ by

C () = Ψ(0, −s, B(), ) 0 0.

Then for any $r \ge 0$ the process $v = S_r u$ is defined by $v(t, \omega) = u(r + t, \theta_r\omega)$. It is clear that $S_r$ is a semigroup, and if $u$ is adapted so is $S_t u$. Suppose now that $X$ is closed under $S_t$. Then a process attractor for the class $X$ can be defined. In the following, if $u$ is a stochastic process then $\mathrm{Law}(u)$ is defined to be the probability law (on path space) of the coupled process $(u, w)$.

Definition 5.8. (a) A set of laws $\mathcal{A} \subset \mathrm{Law}(X)$ is a Law-attractor if
(i) (Invariance) $\hat S_t\mathcal{A} = \mathcal{A}$ for all $t \ge 0$, where $\hat S_t$ is the mapping of laws induced by the semigroup $S_t$;
(ii) (Attraction) for any open set $O \supset \mathcal{A}$ and bounded $Z \subset \mathrm{Law}(X)$, $\hat S_t Z \subseteq O$ eventually (i.e. this holds for all $t \ge t_0(O, Z)$);
(iii) (Compactness) $\mathcal{A}$ is compact.¹²
(b) A (process) attractor for the semiflow $S_t$ on $X$ is a set of processes $A \subseteq X$ such that
(i) $\mathrm{Law}(A)$ is a Law-attractor (in particular $\mathrm{Law}(A)$ is compact and so $A$ is bounded);
(ii) (Invariance) $S_t A = A$ for all $t \ge 0$;

¹² In the metric $d_1(\nu_1, \nu_2) = d_0(\nu_1, \nu_2) + |\mathbb{E}_{\bar\nu_1}(|u|^2) - \mathbb{E}_{\bar\nu_2}(|u|^2)|$, where $d_0$ is the Prohorov metric and $\bar\nu_i$ ($i = 1, 2$) is the projection of $\nu_i$ onto the first coordinate, that is, path space for the solutions of (26).


(iii) (Attraction) for any bounded set $Z \subset X$ and compact set $K$,
$$\lim_{t \to \infty} d(S_t Z, K) \ge d(A, K);$$
(iv) $A$ is closed.¹³

The theory of neometric spaces [19] allows a stronger definition of attractor, as follows [17], where $M = \{\xi : [0, \infty) \to H : |\xi| < \infty\}$ with $|\xi| = \big(\int_0^\infty |\xi(t)|^2 \exp(-t)\,dt\big)^{1/2}$, and $\mathbf{M} = L^2(\Omega, M)$.

Definition 5.9. A neo-attractor for $S$ on $X$ in $\mathbf{M}$ is a set $A \subseteq X$ such that:
(a) $A$ is neocompact in $\mathbf{M}$;
(b) $S_t A = A$ for all $t \in [0, \infty)$;
(c) for each bounded set $Z \subseteq X$ and neo-open set $O \supseteq A$ in $\mathbf{M}$, $S_t Z \subseteq O$ eventually (that is, there exists $r \in [0, \infty)$ such that $S_t Z \subseteq O$ for all $t \in [r, \infty)$).

In [17] the following is established.

Theorem 5.10. A neo-attractor is a process attractor.

Remarks on these definitions. (1) Since existence results for the stochastic Navier-Stokes equations require a rather large probability space, it is to be expected that any space carrying a whole class of solutions $X$ as above will be too big to allow an attractor $A \subset X$ that is compact in the usual sense. Hence a weakening of the usual notion of attractor is necessary for any hope of existence of a process attractor. (2) The attraction property 5.8(b)(iii) is equivalent to the following:
$$S_t Z \subseteq O \tag{29}$$
eventually, for any bounded $Z$ and any open $O \supset A$ of the form $O = L^2(\Omega, M) \setminus K^{\le\varepsilon}$, with $K$ compact. Property 5.8(b)(i) means that in addition (29) holds eventually for any open set $O$ of the form $O = \mathrm{Law}^{-1}(O')$, where $O'$ is an open set of laws with $\mathrm{Law}(A) \subseteq O'$. The usual attraction property for attractors, namely that $S_t Z \subseteq O$ eventually for any bounded $Z$ and any open $O \supset A$, is probably too much to expect. However, a neo-attractor seems to be a very natural weakening of the notion of attractor, given the theory of neometric spaces. The weakening given by Definition 5.8 is somewhat ad hoc and is the best possible without the neo machinery. Note that the particular open sets $L^2(\Omega, M) \setminus K^{\le\varepsilon}$ or $\mathrm{Law}^{-1}(O')$ as above are neo-open. We can now state the main theorem of [17], drawing on [18].

Theorem 5.11. There is a Loeb space $\Omega$ (which carries solutions to the stochastic Navier-Stokes equations (26) in dimension 3 for all $L^2$ $\mathcal{F}_0$-measurable initial conditions) with a neo-attractor $A$ for the class of solutions $X$ described below.

¹³ Here and in (iii) the topology is the $L^2$ norm topology on processes in $H$ given by $|u|^2 = \int_0^\infty |u(t)|^2 \exp(-t)\,dt$.
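The weighted path norm $|\xi|^2 = \int_0^\infty |\xi(t)|^2 e^{-t}\,dt$ used for $M$ (and in footnote 13) admits paths that grow, as long as $|\xi(t)|^2$ grows more slowly than $e^{t}$. The sketch below, for scalar-valued paths (taking $H = \mathbb{R}$ is an assumption made only for illustration), approximates this norm by the midpoint rule on a truncated interval $[0, T]$; for $\xi(t) = e^{at}$ with $a < 1/2$ the exact value is $(1 - 2a)^{-1/2}$, which is used as a check.

```python
import math

def weighted_norm(path, T=60.0, n=60000):
    """Midpoint-rule approximation of (int_0^T |path(t)|^2 exp(-t) dt) ** 0.5.

    Truncating at T drops a tail that is small when |path(t)|^2 exp(-t) decays."""
    h = T / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += path(t) ** 2 * math.exp(-t)
    return math.sqrt(total * h)

def exact_norm_exponential(a):
    """|xi| for xi(t) = exp(a * t); finite only for a < 1/2."""
    return (1.0 - 2.0 * a) ** -0.5
```

The exponential weight is what makes the norm finite for such growing paths; for $a \ge 1/2$ the integral diverges and the path does not belong to $M$.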


The class of solutions in the following definition depends on the constants $k_1, k_2, k_3, \alpha, \beta, a, b$. In the proof of Theorem 5.11 in [18] an explicit choice of these is identified that ensures that $X \ne \emptyset$. The condition (X5) is the only one that needs explanation; see the remarks below.

Definition 5.12. (i) Denote by $X$ the class of adapted stochastic processes $u : [0, \infty) \times \Omega \to H$ (for space dimension $d = 3$) with the following properties.
(X1) For a.a. $\omega$ the path $u(\cdot, \omega)$ belongs to the following spaces:
$$L^\infty_{\mathrm{loc}}(0, \infty; H) \cap L^2_{\mathrm{loc}}[0, \infty; H) \cap L^2_{\mathrm{loc}}(0, \infty; V) \cap C(0, \infty; H_{\mathrm{weak}}).$$
(X2) For all $t_1 \ge t_0 > 0$,
$$u(t_1) = u(t_0) + \int_{t_0}^{t_1} [-Au(t) - B(u(t)) + f(u(t))]\,dt + \int_{t_0}^{t_1} g(u(t))\,dw_t.$$
(X3) For a.a. $t_0 > 0$ and all $t_1 \ge t_0$,
$$\mathbb{E}(|u(t_1)|^2) \le \mathbb{E}(|u(t_0)|^2)\exp(-k_1(t_1 - t_0)) + k_2.$$
(X4) For a.a. $t_0 > 0$ and all $t_1 \ge t_0$,
$$\mathbb{E}\Big(\sup_{t_0 \le s \le t_1} |u(s)|^2 + \int_{t_0}^{t_1} \|u(s)\|^2\,ds\Big) \le \alpha\,\mathbb{E}(|u(t_0)|^2) + \beta(t_1 - t_0).$$
(X5) For a.a. $t_0 > 0$, all $t_1 \ge t_0$ and all $n \ge 1$,
$$\mathbb{E}(\varphi_n(u(t_1))) \le \mathbb{E}(\varphi_n(u(t_0)))\exp(-k_3(t_1 - t_0)) + n^{-1/2}\big(a + b\,\mathbb{E}(|u(t_0)|^2)\big).$$
(X6) $\mathbb{E}\int_0^1 |u(t)|^2\,dt < \infty$.
(ii) Denote by $X_k$ the set of $u \in X$ with
(X6$_k$) $\mathbb{E}\int_0^1 |u(t)|^2\,dt \le k$.

Remarks. 1. The above conditions tell us nothing about $u(t, \omega)$ at $t = 0$, and there may be a singularity there. In this sense the class $X$ is a class of generalized weak solutions to the stochastic Navier-Stokes equations (cf. [29, p. 12]).
2. It follows from (X6) that $\mathbb{E}(|u(t)|^2) < \infty$ for a.a. $t \in (0, 1)$. Thus, from (X3) we see that $\mathbb{E}(|u(t)|^2)$ is bounded on $[\frac{1}{n}, \infty)$ for all $n$.
3. In condition (X5), the function $\varphi_n(u)$ is an explicit smooth approximation to the function $|u|^2 I_{\{|u| \ge n\}}$. The inequalities (X5) follow heuristically from the equation (26) as a particular instance of the Foias equation corresponding to (26). The choice of the functions $\varphi_n$ makes (X5) a kind of uniform integrability condition for the random variables $|u(t, \omega)|^2$ for $t \in [t_0, \infty)$.


The proof of Theorem 5.11 proceeds as follows. First show that $X \ne \emptyset$ by the construction of solutions to the stochastic Navier-Stokes equations outlined in Section 3. The heuristic argument for the inequalities (X5) can be made precise for the approximate solution $U$ living in $H_N$, and it is this that gives (X5) for the solution $u = {}^\circ U$. The other properties in the definition of $X$ follow naturally. Next it is necessary to define a set $\mathcal{X}$ of internal ("nonstandard") approximate solutions to (26) that is bigger than the set of Galerkin approximations on $H_N$: $\mathcal{X}$ includes processes $U$ for which the equality (X2) is replaced by an infinitesimal approximation. The set $\mathcal{X}$ has a natural decomposition as
$$\mathcal{X} = \bigcup_k \mathcal{X}_k,$$
where $\mathcal{X}_k$ corresponds to the set $X_k$. Then it is shown that $X$ is precisely the set of processes $u$ such that $u = {}^\circ U$ for some process $U \in \mathcal{X}$ that is nearstandard as a process; in symbols, $X = {}^\circ(\mathcal{X} \cap \mathrm{NS})$ and similarly $X_k = {}^\circ(\mathcal{X}_k \cap \mathrm{NS})$. Finally, after defining a semigroup operation $T_\tau$ on $\mathcal{X}$ corresponding to $S_t$, the set
$$C = \bigcap_{n \in \mathbb{N}} T_n \mathcal{X}_k$$
is defined for a certain $k$ (for which $\mathcal{X}_k$ is absorbing). It is easily proved that $T_\tau C = C$ for finite times $\tau$ and that $C$ attracts bounded sets in $\mathcal{X}$, so $C$ is an S-attractor. The key observations are now the following:
• $C$ is non-empty (this follows from $\aleph_1$-saturation).
• $C \subset \mathrm{NS}$ even though each $T_n \mathcal{X}_k \not\subseteq \mathrm{NS}$. This follows because of a "uniform integrability" in the definition of $\mathcal{X}_k$ that is similar to the property (X5) in the definition of $X$.
• The set $C$ is a $\Pi^0_1$-set (that is, a countable intersection of internal sets).
• In consequence the set $A = {}^\circ C$ is nonempty and in fact neocompact.
The other properties required for $A$ to be an attractor follow from the corresponding properties of the nonstandard attractor $C$. In the final part of [18] the class $\bar X$ of two-sided solutions to (26) is discussed. It is shown that $\bar X \ne \emptyset$, and the attractor $A$ is simply the restriction of solutions in $\bar X$ to the nonnegative time interval $[0, \infty)$.

REFERENCES

[1] S. Albeverio, J.-E. Fenstad, R. Høegh-Krohn, and T. Lindstrøm, Nonstandard methods in stochastic analysis and mathematical physics, Academic Press, New York, 1986.


[2] L. Arnold and M. Scheutzow, Perfect cocycles through stochastic differential equations, Probability Theory and Related Fields, vol. 101 (1995), pp. 65–88.
[3] A. Bensoussan, A model of stochastic differential equation in Hilbert space applicable to Navier-Stokes equation in dimension 2, Stochastic analysis, liber amicorum for Moshe Zakai (E. Mayer-Wolf, E. Merzbach, and A. Schwartz, editors), Academic Press, 1991, pp. 51–73.
[4] A. Bensoussan, Stochastic Navier-Stokes equations, Acta Applicandae Mathematicae, vol. 38 (1995), pp. 267–304.
[5] A. Bensoussan and R. Temam, Equations stochastiques du type Navier-Stokes, Journal of Functional Analysis, vol. 13 (1973), pp. 195–222.
[6] Z. Brzeźniak, M. Capiński, and F. Flandoli, Stochastic Navier-Stokes equations with multiplicative noise, Stochastic Analysis and Applications, vol. 10 (1992), no. 5, pp. 523–532.
[7] M. Capiński and N. J. Cutland, Stochastic Navier-Stokes equations, Acta Applicandae Mathematicae, vol. 25 (1991), pp. 59–85.
[8] M. Capiński and N. J. Cutland, A simple proof of existence of weak and statistical solutions of Navier-Stokes equations, Proceedings of the Royal Society. London. Series A, vol. 436 (1992), pp. 1–11.
[9] M. Capiński and N. J. Cutland, Navier-Stokes equations with multiplicative noise, Nonlinearity, vol. 6 (1993), pp. 71–77.
[10] M. Capiński and N. J. Cutland, Statistical solutions of stochastic Navier-Stokes equations, Indiana University Mathematics Journal, vol. 43 (1994), pp. 927–940.
[11] M. Capiński and N. J. Cutland, Nonstandard methods for stochastic fluid mechanics, World Scientific, Singapore, London, 1995.
[12] M. Capiński and N. J. Cutland, Measure attractors for stochastic Navier-Stokes equations, Electronic Journal of Probability, vol. 3 (1998), Paper 8, pp. 1–15.
[13] M. Capiński and N. J. Cutland, Existence of global stochastic flow and attractors for Navier-Stokes equations, Probability Theory and Related Fields, vol. 115 (1999), pp. 121–151.
[14] M. Capiński and D. Gątarek, Stochastic equations in Hilbert space with application to Navier-Stokes equation in any dimension, Journal of Functional Analysis, vol. 126 (1994), pp. 26–35.
[15] The Clay Institute, Millennium prize problems, see the website http://www.claymath.org/Millennium Prize Problems.
[16] H. Crauel and F. Flandoli, Attractors for random dynamical systems, Probability Theory and Related Fields, vol. 100 (1994), pp. 365–393.
[17] N. J. Cutland and H. J. Keisler, Attractors and neo-attractors for 3D stochastic Navier-Stokes equations, to appear in Stochastics and Dynamics.
[18] N. J. Cutland and H. J. Keisler, Global attractors for 3-dimensional stochastic Navier-Stokes equations, Journal of Dynamics and Differential Equations, vol. 16 (2004), pp. 205–266.
[19] S. Fajardo and H. J. Keisler, Neometric spaces, Advances in Mathematics, vol. 118 (1996), pp. 134–175.
[20] F. Flandoli and D. Gątarek, Martingale and stationary solutions for stochastic Navier-Stokes equations, Probability Theory and Related Fields, vol. 102 (1995), pp. 367–391.
[21] F. Flandoli and B. Schmalfuß, Random attractors for the 3D stochastic Navier-Stokes equation with multiplicative white noise, Stochastics and Stochastics Reports, vol. 59 (1996), pp. 21–45.
[22] F. Flandoli and B. Schmalfuß, Weak solutions and attractors for three-dimensional Navier-Stokes equations with nonregular force, Journal of Dynamics and Differential Equations, vol. 11 (1999), pp. 355–398.
[23] D. N. Hoover and E. Perkins, Nonstandard constructions of the stochastic integral and applications to stochastic differential equations I, II, Transactions of the American Mathematical Society, vol. 275 (1983), pp. 1–58.
[24] A. Ichikawa, Stability of semilinear stochastic evolution equations, Journal of Mathematical Analysis and Applications, vol. 90 (1982), pp. 12–44.


[25] H. J. Keisler, An infinitesimal approach to stochastic analysis, Memoirs of the American Mathematical Society, vol. 297, 1984.
[26] O. A. Ladyzhenskaya, A dynamical system generated by the Navier-Stokes equations, Journal of Soviet Mathematics, vol. 3 (1975), pp. 458–479.
[27] T. L. Lindstrøm, Hyperfinite stochastic integration I, II, III, Mathematica Scandinavica, vol. 46 (1980), pp. 265–333.
[28] B. Schmalfuß, Measure attractors of the stochastic Navier-Stokes equation, Bremen Report 258, University of Bremen, 1991.
[29] G. R. Sell, Global attractors for the three-dimensional Navier-Stokes equations, Journal of Dynamics and Differential Equations, vol. 8 (1996), pp. 1–33.
[30] R. Temam, Infinite-dimensional dynamical systems in mechanics and physics, Springer-Verlag, New York, 1988; 2nd edition 1997.
[31] M. Viot, Solutions faibles d'équations aux dérivées partielles non linéaires, Thesis, Université Paris VI, 1976.
[32] M. I. Vishik and A. V. Fursikov, Mathematical problems of statistical hydromechanics, Kluwer, Dordrecht-London, 1988.

DEPARTMENT OF MATHEMATICS
UNIVERSITY OF YORK
YORK YO10 5DD, UK
and
DEPARTMENT OF MATHEMATICS
UNIVERSITY OF SWAZILAND
PRIVATE BAG 4, KWALUSENI
SWAZILAND


DISCRETE APPROXIMATION OF COMPACT OPERATORS AND APPROXIMATION OF THEIR SPECTRA

MANFRED P. H. WOLFF

Abstract. We consider the discrete approximation of compact operators. We find a necessary and sufficient condition for an operator $A$ to be compact in terms of the sequence $(A_n)$ of operators $A_n$ approximating $A$ discretely. Moreover, we analyze how the spectrum of $A$ is approximated by the spectra of the $A_n$ and by their $\varepsilon$-pseudospectra, respectively.

§1. Introduction. In recent years the notion of discrete approximation has become more and more fruitful, because it describes best what we do when we approximate operators numerically (see [6] for an excellent monograph on this subject). Discrete approximation is a generalization of strong convergence of operators, and it means the following. Let $E$ and $F_n$ ($n \in \mathbb{N}$) be Banach spaces. Let $E_1$ be a dense subspace of $E$, and for each $n$ let $P_n : E_1 \to F_n$ be an arbitrary linear operator. The sequence $(E, E_1, (F_n), (P_n))_n$ is called a discrete approximation scheme if $\lim_{n\to\infty} \|P_n u\|_n = \|u\|$ holds for all $u \in E_1$. Notice that, in contrast to [6], we do not require the $P_n$ to be continuous, an advantage which widens the field of applications; cf. Example 1.1 below. A sequence $(u_n)_n \in \prod_{k \in \mathbb{N}} F_k$ converges discretely to $u \in E_1$ if
$$\lim_{n\to\infty} \|P_n u - u_n\|_n = 0.$$
Whenever this happens we write $u = \text{d-}\lim_{n\to\infty} u_n$. Let $E_0 \subset E_1$ be another dense subspace of $E$, and let $A : E_0 \to E_1$ be a linear operator. A sequence $(A_n)_n$ of densely defined linear operators $A_n$ on $F_n$ approximates $A$ discretely on $E_0$ if $\text{d-}\lim_{n\to\infty} A_n P_n u = Au$ holds for all $u \in E_0$. In other words, the following assertion holds:
$$\lim_{n\to\infty} \|A_n P_n u - P_n A u\|_n = 0 \quad \forall u \in E_0.$$

2000 Mathematics Subject Classification. 47A58, 47A10, 47B05.
Key words and phrases. Discrete approximation, compact operator, spectrum of a compact operator.
Nonstandard Methods and Applications in Mathematics, edited by N. J. Cutland, M. Di Nasso, and D. A. Ross. Lecture Notes in Logic, 25. © 2006, Association for Symbolic Logic


In this case we write $A = \text{d-}\lim_{n\to\infty} A_n$. Strong convergence is a special case of this notion (set $E = E_0 = E_1 = F_n$ and $P_n = I$ for all n).

Example 1.1. Let $E = L^2([0,1])$, $E_1 = \{f \in E : f \text{ is continuous}\}$, $F_n = \mathbb C^n$ with the scalar product $(x|y) = \frac1n\sum_{k=1}^{n}\bar x_k y_k$, $P_nf = (f(\frac1n), \dots, f(\frac nn))$, $E_0 = \{f \in E_1 : f' \in E_1,\ f(0) = f(1)\}$, and finally let $Af = f'$ with boundary condition $f(0) = f(1)$. For $A_nx = n(x_2 - x_1, \dots, x_n - x_{n-1}, x_1 - x_n)$ the sequence $(A_n)$ approximates A discretely. Notice that no $P_n$ is continuous with respect to the $L^2$-norm.
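Example 1.1 can be checked numerically. The following is a minimal sketch, assuming the test function $f(x) = \sin(2\pi x)$, which satisfies the boundary condition $f(0) = f(1)$; the helper names `P`, `A_n` and `discrete_norm` are ours, not the paper's. It checks that $\|P_nf\|_n$ is close to $\|f\|_{L^2} = 1/\sqrt2$ and that $\|A_nP_nf - P_nf'\|_n$ shrinks as n grows.

```python
import math

def discrete_norm(x):
    """Norm induced by (x|y) = (1/n) * sum conj(x_k) y_k (real vectors here)."""
    n = len(x)
    return math.sqrt(sum(v * v for v in x) / n)

def P(f, n):
    """Sampling operator P_n f = (f(1/n), ..., f(n/n))."""
    return [f(k / n) for k in range(1, n + 1)]

def A_n(x):
    """Periodic forward difference n(x_2 - x_1, ..., x_n - x_{n-1}, x_1 - x_n)."""
    n = len(x)
    return [n * (x[(k + 1) % n] - x[k]) for k in range(n)]

def approximation_error(f, df, n):
    """||A_n P_n f - P_n A f||_n, where A f = f'."""
    lhs = A_n(P(f, n))
    rhs = P(df, n)
    return discrete_norm([a - b for a, b in zip(lhs, rhs)])

f = lambda t: math.sin(2 * math.pi * t)                 # f(0) = f(1)
df = lambda t: 2 * math.pi * math.cos(2 * math.pi * t)  # A f = f'

for n in (10, 100, 1000):
    print(n, discrete_norm(P(f, n)), approximation_error(f, df, n))
```

For this particular f the error decreases roughly like 1/n, as one expects from a forward difference; this rate is an observation about the example, not a claim made in the text.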

Let now $V({}^*X)$ be a polysaturated nonstandard extension of the full superstructure $V(X)$ over an infinite set X such that $\mathbb N$, E, $F_n$, etc., are elements of $V(X)$. If G is an internal normed linear space over ${}^*\mathbb K$ ($\mathbb K = \mathbb R$ or $\mathbb K = \mathbb C$) in $V({}^*X)$, then its finite part $\{x \in G : \|x\| \in \mathrm{Fin}({}^*\mathbb R)\}$ is denoted by $\mathrm{Fin}(G)$. Here $\mathrm{Fin}({}^*\mathbb R) = \{\xi \in {}^*\mathbb R : \exists n \in \mathbb N\ [|\xi| \le n]\}$. $\mathrm{Fin}(G)$ is an external vector space over $\mathbb K$, and $G_0 = \{x \in G : \|x\| \approx 0\}$ is a subspace such that $\hat G = \mathrm{Fin}(G)/G_0$, equipped with the norm $\|\hat x\| = {}^\circ\|x\|$, is a Banach space, the nonstandard hull of G. Moreover, let T be a $\mathbb K$-linear operator from a subspace $G_1 \subset \mathrm{Fin}(G)$ to another internal normed linear space H. If $T(G_1 \cap G_0) \subset H_0$, then $\hat T(\hat x) = (Tx)^\wedge$ defines a unique linear operator $\hat T$ from $\hat G_1 := \{\hat x : x \in G_1\}$ into $\hat H$. These notions enable us to borrow the following result from [9].

Proposition 1.2. Let $(E, E_1, (F_n), (P_n))$ be a given approximation scheme. Then the following assertions hold:
(i) For $N \approx \infty$ the operator $\tilde P_N := \widehat{P_N}|_{E_1}$ is well-defined and embeds $E_1$ isometrically into $\hat F_N$. Its unique extension to an isometry from E into $\hat F_N$ will also be denoted by $\tilde P_N$.
(ii) Let A be a bounded everywhere defined operator on E and let $E_0 \subset E_1$ be a dense subspace with $A(E_0) \subset E_1$. Assume that $(A_n)$ approximates $A|_{E_0}$ discretely, and moreover assume that $(\|A_n\|_n)$ is bounded. Let $N \approx \infty$. Then $\tilde P_N A = \hat A_N \tilde P_N$.

§2. Discrete approximation of compact operators. Let $(E, E_1, (F_n), (P_n))$ be a discrete approximation scheme and let the operator A be discretely approximated by the sequence $(A_n)$ of operators $A_n$. In the following we denote the spectrum of an operator A by $\sigma(A)$. In this section we give a necessary and sufficient condition for A to be compact in terms of the approximating sequence $(A_n)$. Moreover, we show in which manner the spectrum of A is approximated by the spectra of the $A_n$ (or, better said, by the ε-pseudospectra of the $A_n$; the usefulness of this notion, which traces back to H. Landau [4], is well established, as is demonstrated e.g. in the papers [7, 8]).

Definition 2.1 (see [9]). A sequence $(x_n)$, $x_n \in F_n$, is called discretely compact if for every ε > 0 there exists a finite set $Y(\varepsilon) \subset E_1$ depending on ε such that $\limsup_{n}\operatorname{dist}(x_n, P_n(Y(\varepsilon))) < \varepsilon$.

In case E is separable, this condition is equivalent to the following one (to be found in [6]): to each subsequence $(u_{n_k})$ there exist a further subsequence $(u_{n_{k_j}})$ and $u \in E$ satisfying $\lim_j \|P_{n_{k_j}}u - u_{n_{k_j}}\| = 0$. As in classical analysis, discrete compactness is characterized as follows:

Proposition 2.2 (cf. [9, Lemma 3.3]). Let $(u_n)_n$ be a discrete sequence. The following assertions are equivalent:
(i) $(u_n)_n$ is discretely compact;
(ii) to every $N \approx \infty$ there exists $u \in E$ such that $\tilde P_N u = \hat u_N$.

Proof. (i) ⇒ (ii): Let $N \approx \infty$ be arbitrary. To $\varepsilon = 2^{-r}$ there exists a (standard) finite set $Y_r$ such that $\limsup_n \operatorname{dist}(u_n, P_n(Y_r)) < 2^{-r}$. This in turn implies that there exists $y_r \in Y_r$ satisfying $\|u_N - P_Ny_r\| < 2^{-r}$. Obviously the sequence $(y_r)$ is a Cauchy sequence in $E_1$, hence convergent to some $u \in E$. Since $\tilde P_N$ is an isometry, the assertion follows.
(ii) ⇒ (i): Notice that (ii) implies that $(\|u_n\|_n)_n$ is bounded, by q, say. Suppose that (i) does not hold. Then there exists $\varepsilon_0 > 0$ such that for every standard finite set $Y \subset \{y \in E_1 : \|y\| < 2q\}$ and for every $n \in \mathbb N$ the standard set
$$A_{Y,n} = \{N \in \mathbb N : N \ge n,\ \|P_Ny - u_N\| > \varepsilon_0/2 \text{ for all } y \in Y\}$$
is infinite. Moreover, $Y \subset Y'$ and $n < n'$ imply $A_{Y',n'} \subset A_{Y,n}$. Therefore, by polysaturation, $A := \bigcap_{Y,n}{}^*A_{Y,n} \ne \emptyset$. Take N in A. Then $N \approx \infty$ and $\|P_Ny - u_N\| > \varepsilon_0/2$ for all standard $y \in E_1$ satisfying $\|y\| < 2q$. Let $u \in E$ fulfill $\tilde P_N u = \hat u_N$. Then $\|u\| \le q$. Since $E_1$ is dense in E, there exists $y \in E_1$ such that $\|u - y\| < \varepsilon_0/3$ and $\|y\| < 2q$. Since $\tilde P_N$ is an isometry on E, we obtain
$$\varepsilon_0/2 \le \|\tilde P_N y - \hat u_N\| \le \|\tilde P_N y - \tilde P_N u\| + \|\tilde P_N u - \hat u_N\| < \varepsilon_0/3,$$
a contradiction. ∎



Definition 2.3. A subset $X \subset \prod_n F_n$ is called uniformly d-compact if for every ε > 0 there exists a finite subset $Y \subset E_1$ such that
$$\limsup_{n\to\infty}\operatorname{dist}(x_n, P_n(Y)) < \varepsilon$$
holds for all $(x_n)_n \in X$.


Example 2.4. We take the setting from Example 1.1. For every n let $A_n$ be the lower triangular matrix
$$A_n = \frac1n\begin{pmatrix} 1 & 0 & \cdots & 0\\ 1 & 1 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 1 & 1 & \cdots & 1\end{pmatrix}.$$
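As an exact check that these matrices approximate the Volterra operator $Vf(x) = \int_0^x f(t)\,dt$ discretely, here is a minimal sketch assuming the test function $f(t) = 2t$, so that $Vf(t) = t^2$ (the names are ours). With `Fraction` the difference $(A_nP_nf)_k - (P_nVf)_k = k/n^2$ comes out exactly, so the largest componentwise error is $1/n$, and hence also $\|A_nP_nf - P_nVf\|_n \le 1/n$.

```python
from fractions import Fraction

def P(f, n):
    """Sampling operator P_n f = (f(1/n), ..., f(n/n)), in exact arithmetic."""
    return [f(Fraction(k, n)) for k in range(1, n + 1)]

def A_n(x):
    """(1/n) times the lower triangular all-ones matrix: scaled prefix sums."""
    n = len(x)
    out, running = [], Fraction(0)
    for v in x:
        running += v
        out.append(running / n)
    return out

def max_error(n):
    """max_k |(A_n P_n f)_k - (P_n V f)_k| for f(t) = 2t and V f(t) = t**2."""
    lhs = A_n(P(lambda t: 2 * t, n))
    rhs = P(lambda t: t * t, n)
    return max(abs(a - b) for a, b in zip(lhs, rhs))

for n in (10, 100):
    print(n, max_error(n))
```

Since $A_n$ is triangular with diagonal entries $1/n$, its only eigenvalue is $1/n$, which tends to $0$, the only point in the spectrum of the Volterra operator.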

For these matrices the set $\{(A_nP_nf)_n : \|f\| = 1,\ f \in E_1\}$ is uniformly d-compact, as follows from the next theorem, since $(A_n)$ approximates the Volterra operator $f \mapsto (x \mapsto \int_0^x f(t)\,dt)$.

Theorem 2.5. Let the linear operator $A : E_1 \to E$ be approximated by the sequence $(A_n)$ of operators $A_n : D(A_n) \subset F_n \to F_n$. Then A is compact if and only if the set $X = \{(A_nP_nx)_n : \|x\| = 1,\ x \in E_1\}$ is uniformly d-compact.

Proof. Let A be compact and let ε > 0 be given. Then there exists a finite set $Y \subset E_1$ (recall that $E_1$ is dense in E) such that $\operatorname{dist}(Ax, Y) < \varepsilon/2$ holds for all $x \in E_1$ of norm 1. Let $N \approx \infty$ be given. Then $A_NP_Nx \approx P_NAx$ holds for all standard $x \in E_1$. Moreover, $\operatorname{dist}(P_NAx, P_N(Y)) < 2\varepsilon/3$ holds, since Y is a standard finite set, Ax is standard and $P_N|_{E_1}$ is almost isometric. Hence we obtain $\operatorname{dist}(A_NP_Nx, P_N(Y)) < \varepsilon$, whence the assertion follows. Now suppose that X is uniformly d-compact and let ε > 0 be given. Then to ε/2 there exists a finite set Y such that $\limsup_n \operatorname{dist}(A_nP_nx, P_n(Y)) < \varepsilon/2$ for all $x \in E_1$ of norm 1. But this implies $\operatorname{dist}(A_NP_Nx, P_N(Y)) \le \varepsilon/2$ for all $N \approx \infty$. Since $A_NP_Nx \approx P_NAx$ and $P_N|_{E_1}$ is almost isometric, $\operatorname{dist}(Ax, Y) < \varepsilon$ holds for all $x \in E_1$ of norm 1, which implies that A is compact. ∎

Remark. In [9] we introduced the following notion: a sequence $(A_n)$ of operators $A_n$, densely defined on $F_n$, is called d-compact if $(A_nx_n)$ is d-compact for all bounded sequences $(x_n)_n$. This notion is equivalent to “uniform compactness” as introduced in [1] (which is the same as the notion of “quasicompactness” introduced later in [3, p. 236]): a sequence $(A_n)$ of uniformly bounded operators is called quasicompact provided $\hat A_N(\hat F_N) \subset \tilde P_N(E)$ for all $N \approx \infty$. In standard terms this property reads
$$\forall \varepsilon > 0\ \exists Y \text{ finite}\ \exists n \in \mathbb N\ \forall m \ge n\ \Big[\sup_{x \in B_m(0,1)}\operatorname{dist}(A_mx, P_m(Y)) < \varepsilon\Big],$$
where $B_m(0,1)$ denotes the unit ball in $F_m$. P. Anselone's notion of collective compactness (see his book [2]) is a special case of d-compactness. Suppose $(A_n)$ approximates discretely the bounded operator A. Then d-compactness implies that the set $X = \{(A_nP_nx)_n : \|x\| = 1,\ x \in E_1\}$ above is uniformly d-compact, but the converse is not true, as the following example shows.


Example 2.6. Let E be $\ell^2(\mathbb N)$ with the canonical basis $(e_k)_{k\ge1}$, and let $E_n$ be $\mathbb C^{2n}$ with the usual scalar product and the canonical basis $(e_{k,n})_{1\le k\le 2n}$. We set
$$P_n(e_k) = \begin{cases} e_{k,n}, & 1 \le k \le 2n,\\ 0, & \text{otherwise.}\end{cases}$$

Moreover, we define A by $Ae_k = 2^{-k}e_k$ ($k \ge 1$) and continuous linear extension. Finally, let $A_n$ be given by
$$A_ne_{k,n} = \begin{cases} 2^{-k}e_{k,n}, & 1 \le k \le n,\\ e_{k,n}, & \text{otherwise.}\end{cases}$$
Then A is compact and $A = \text{d-}\lim A_n$ holds, so $\{(A_nP_nx)_n : x \in E,\ \|x\| = 1\}$ is uniformly d-compact; but $\hat A_N(\hat E_N) \not\subset \tilde P_N(E)$, hence $(A_n)$ is not quasicompact. Notice that even $\sup_{\|z\|=1}\inf\{\|y\| : P_ny = z\} = 1$ holds for all n, a rather strong condition which is sometimes required in addition; see [5, 3]. Notice also that $1 \in \sigma(A_n)$ for all n, and therefore $\operatorname{dist}(\sigma(A_n), \sigma(A))$ does not tend to 0, in contrast to the case of uniform convergence.

In [9] we introduced the notion of the approximate ε-spectrum of an operator A, defined by
$$\sigma_{\varepsilon,a}(A) = \{\lambda \in \mathbb C : \inf_{\|x\|=1}\|(\lambda - A)x\| \le \varepsilon\},$$
which is part of the so-called ε-pseudospectrum $\sigma_\varepsilon(A)$ (see e.g. [4, 7, 8]). Let us point out explicitly that $\sigma_{\varepsilon,a}(A) = \sigma_\varepsilon(A)$ holds for finite-dimensional operators as well as for normal operators on Hilbert spaces. In [9] we proved the following result:

Theorem 2.7 ([9, Theorem 2.2]). Let $(E, E_1, (F_n), (P_n))$ be a given approximation scheme. Moreover, let the sequence $(A_n, D(A_n))$ of densely defined linear operators on the Banach spaces $F_n$ approximate discretely the closed densely defined linear operator $(A, D(A))$. Then for every pair (…, ε) of real numbers 0 ≤ … $\bigcap_{n=1}^{\infty}\bigcup_{k\ge n}$ …
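Example 2.6 can be made concrete because every $A_n$ is diagonal, so only the diagonal entries are needed. A small sketch with helper names of our own; $\sigma(A) = \{0\} \cup \{2^{-k}\}$ is truncated to finitely many points, which is harmless here because $\sigma(A) \subset [0, 1/2]$. For a diagonal (hence normal) matrix, $\inf_{\|x\|=1}\|(\lambda - A_n)x\| = \min_k |\lambda - d_k|$, so $\sigma_{\varepsilon,a}(A_n)$ is the union of the closed ε-discs around the diagonal entries $d_k$.

```python
def spectrum_A_n(n):
    """Diagonal entries (= eigenvalues) of A_n acting on C^(2n)."""
    return [2.0 ** -k if k <= n else 1.0 for k in range(1, 2 * n + 1)]

def spectrum_A_truncated(K):
    """0 together with the eigenvalues 2^-k, k = 1..K, of A (a finite truncation)."""
    return [0.0] + [2.0 ** -k for k in range(1, K + 1)]

def dist_point_to_set(lam, S):
    return min(abs(lam - s) for s in S)

def one_sided_dist(S, T):
    """sup over s in S of dist(s, T): how far S sticks out of T."""
    return max(dist_point_to_set(s, T) for s in S)

def in_approx_eps_spectrum(lam, diagonal, eps):
    """For a diagonal matrix, inf_{||x||=1} ||(lam - A)x|| = min_k |lam - d_k|."""
    return dist_point_to_set(lam, diagonal) <= eps

for n in (1, 5, 20):
    print(n, one_sided_dist(spectrum_A_n(n), spectrum_A_truncated(2 * n)))
```

The eigenvalue 1 of $A_n$ comes from the basis vectors $e_{k,n}$ with $n < k \le 2n$, on which $A_n$ acts as the identity; it keeps $\sigma(A_n)$ at distance $1/2$ from $\sigma(A)$ for every n.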

We are now interested in the opposite problem: when is the set
$$D := \bigcap_{\varepsilon>0}\bigcap_{n=1}^{\infty}\bigcup_{k\ge n}\sigma_{\varepsilon,a}(A_k)$$
contained in $\sigma_a(A)$? Obviously $\lambda \in D$ if and only if there exists a sequence $(x_n)$ of normalized vectors $x_n$ satisfying
$$\liminf_{n\to\infty}\|(\lambda - A_n)x_n\| = 0. \qquad (1)$$
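For a matrix, the infimum in the definition of $\sigma_{\varepsilon,a}$ is the smallest singular value of $\lambda - A$, so condition (1) asks for normalized approximate eigenvectors. The sketch below, restricted for simplicity to 2×2 complex matrices (so that the singular values have a closed form; the function names are ours), contrasts a normal matrix, where this quantity equals $\operatorname{dist}(\lambda, \sigma(A))$, with the non-normal Jordan block $J = \begin{pmatrix}0&1\\0&0\end{pmatrix}$, whose approximate ε-spectrum is much larger than $\sigma(J) = \{0\}$.

```python
import math

def smallest_singular_value_2x2(a, b, c, d):
    """Smallest singular value of the complex 2x2 matrix [[a, b], [c, d]].

    The squared singular values are the roots of s^2 - t*s + q, where
    t = ||M||_F^2 and q = |det M|^2.  The smaller root is written as
    2q / (t + sqrt(t^2 - 4q)) to avoid cancellation when it is tiny.
    """
    t = abs(a) ** 2 + abs(b) ** 2 + abs(c) ** 2 + abs(d) ** 2
    if t == 0:
        return 0.0
    q = abs(a * d - b * c) ** 2
    disc = math.sqrt(max(t * t - 4 * q, 0.0))
    return math.sqrt(2 * q / (t + disc))

def residual_jordan(lam):
    """inf over ||x|| = 1 of ||(lam - J)x|| for J = [[0, 1], [0, 0]]."""
    return smallest_singular_value_2x2(lam, -1, 0, lam)

def residual_diagonal(lam, d1, d2):
    """The same quantity for the normal matrix diag(d1, d2)."""
    return smallest_singular_value_2x2(lam - d1, 0, 0, lam - d2)

for lam in (0.1, 0.01):
    r = residual_jordan(lam)
    # dist(lam, sigma(J)) = |lam|; the printed ratio is dist * ||(lam - J)^{-1}||
    print(lam, r, abs(lam) / r)
print(residual_diagonal(0.1, 0.0, 0.5))  # equals dist(0.1, {0, 0.5}) = 0.1 up to rounding
```

Thus $\lambda = 0.1$ already lies in $\sigma_{0.01,a}(J)$ although its distance to $\sigma(J)$ is $0.1$. The printed ratio $\operatorname{dist}(\lambda, \sigma(J))\,\|(\lambda - J)^{-1}\|$ grows like $1/|\lambda|$, which is why a bound of the form $\|(\lambda - A_n)^{-1}\| \le M\,\operatorname{dist}(\lambda, \sigma(A_n))^{-1}$ holds for normal matrices (with $M = 1$) but can fail for non-normal ones.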

The subset of all λ in D for which there exists a discretely compact sequence $(x_n)$ of normalized vectors satisfying (1) is denoted by $\sigma_{dc}((A_n))$. Let $(A_n)_n$ approximate discretely the compact operator A, and let $\lambda \ne 0$ be an eigenvalue of A. Then $\lambda \in \sigma_{dc}((A_n))$. Indeed, let x be a normalized eigenvector for λ. Then the sequence $(x_n)_n = (P_nx)_n$ is d-compact and $\lim_{n\to\infty}\|(\lambda - A_n)x_n\| = 0$, since $\text{d-}\lim A_nx_n = Ax$ holds. The following proposition is the main tool for our second theorem, which generalizes results of [1] as well as [9, Prop. 3.6]. Notice that in the following theorem the $A_n$ are only required to be defined on a dense subspace $D(A_n)$, so the boundedness of each $A_n$ does not imply uniform boundedness, which must be imposed as an extra condition.

Proposition 2.8. Let $(A_n)$ be a uniformly bounded sequence of bounded operators approximating discretely the compact operator A. Then $\sigma_{dc}((A_n)) \subset \sigma(A) \subset \sigma_{dc}((A_n)) \cup \{0\}$.

Proof. Let $\lambda \in \sigma_{dc}((A_n))$ be given. Let $(x_n)$ be a sequence of normalized vectors $x_n$ which is d-compact and satisfies (1). Then there exists $N \approx \infty$ such that $\lambda x_N \approx A_Nx_N$. Moreover, there exists $y \in E$ satisfying $x_N \approx P_Ny$. So we obtain $P_NAy \approx A_NP_Ny \approx A_Nx_N \approx \lambda x_N \approx \lambda P_Ny$. Since $P_N|_E$ is almost isometric and $\|x_N\| = 1$, we conclude $Ay \approx \lambda y$, and since A, λ and y are standard, we obtain $\|y\| = 1$ and $Ay = \lambda y$; hence $\lambda \in \sigma(A)$. The remainder follows from the considerations preceding the proposition. ∎

Theorem 2.9. Assume that the sequence $(A_n)_n$ is d-compact. Then
$$\bigcap_{\varepsilon>0}\bigcap_{n}\bigcup_{k\ge n}\sigma_{\varepsilon,a}(A_k)\setminus\{0\} \;\subset\; \sigma(A) \;\subset\; \bigcap_{\varepsilon>0}\bigcap_{n}\bigcup_{k\ge n}\sigma_{\varepsilon,a}(A_k)\cup\{0\}.$$

Proof. Notice that a d-compact sequence of operators is obviously uniformly bounded. We show that $D := \bigcap_{\varepsilon>0}\bigcap_n\bigcup_{k\ge n}\sigma_{\varepsilon,a}(A_k)$ is contained in $\sigma_{dc}((A_n)) \cup \{0\}$. Let $0 \ne \lambda \in D$ be arbitrary. Then there exists a sequence $(x_n)$ with $\|x_n\| = 1$ such that (1) holds. Therefore there exists $N \approx \infty$ such that $\lambda x_N \approx A_Nx_N$. Since $(A_n)_n$ is d-compact by hypothesis, there is $y \in E$ such that $A_Nx_N \approx P_Ny$ holds. In particular $x_N \approx P_N(\lambda^{-1}y)$. Setting $z = \lambda^{-1}y$ we obtain $\|z\| \approx 1$ and
$$\lambda P_Nz = P_Ny \approx \lambda x_N \approx A_Nx_N \overset{(2)}{\approx} A_NP_Nz \approx P_NAz,$$
where (2) holds since $(A_n)_n$ is uniformly bounded. Since $P_N|_{E_1}$ is almost isometric, we obtain $\lambda \in \sigma(A)$ (cf. the proof of the proposition above). Since A is compact, $\lambda \in \sigma_{dc}((A_n))$. So the assertion follows from the previous proposition. ∎

Remark. The set $D = \bigcap_{\varepsilon>0}\bigcap_n\bigcup_{k\ge n}\sigma_{\varepsilon,a}(A_k)$ looks rather complicated, but equation (1) above characterizes it rather simply. The additional condition in part (b) of the following corollary is satisfied, e.g., for normal operators on Hilbert spaces. Moreover, let us point out that if the $F_n$ are all finite-dimensional, or the $A_n$ are all normal operators on Hilbert spaces $F_n$, then $\sigma_a(A_n) = \sigma(A_n)$ holds.

Corollary 2.10. Under the hypotheses of the theorem the following assertions hold:
(a) $\lim_{n\to\infty}\operatorname{dist}(\sigma_a(A_n), \sigma(A)) = 0$.
(b) Assume in addition that there is a constant $M \ge 0$ such that $\|(\lambda - A_n)^{-1}\| \le M(\operatorname{dist}(\lambda, \sigma(A_n)))^{-1}$ holds for all n and for all λ in the resolvent set $\rho(A_n)$ of $A_n$. Then
$$\lim_{n\to\infty} d_H(\sigma_a(A_n), \sigma(A)) = 0,$$
where $d_H$ denotes the Hausdorff distance between compact sets in the complex plane.

Proof. The first assertion follows from $\sigma_a(A_n) \subset \sigma_{\varepsilon,a}(A_n)$ and the theorem. The second assertion follows from the first one and Theorem 2.3 in [9]. ∎

REFERENCES

[1] S. Albeverio, E. I. Gordon, and A. Yu. Khrennikov, Finite dimensional approximations of operators in the Hilbert spaces of functions on locally compact abelian groups, Acta Applicandae Mathematicae, vol. 64 (2000), pp. 33–73.
[2] P. M. Anselone, Collectively compact operator approximation theory, Prentice-Hall, Englewood Cliffs, 1971.
[3] E. I. Gordon, A. G. Kusraev, and S. S. Kutateladze, Infinitesimal analysis, Kluwer Acad. Publ., Dordrecht, 2002.
[4] H. Landau, On Szegő's eigenvalue distribution theory and non-Hermitian kernels, Journal d'Analyse Mathématique, vol. 28 (1975), pp. 335–357.
[5] F. Räbiger and M. P. H. Wolff, On the approximation of positive operators and the behaviour of the spectra of the approximants, Integral Equations and Operator Theory, vol. 28 (1997), pp. 72–86.
[6] H. J. Reinhardt, Analysis of approximation methods for differential and integral equations, Springer Verlag, Berlin, Heidelberg, New York, 1985.
[7] L. N. Trefethen, Pseudospectra of matrices, Numerical analysis, Proceedings of the 14th Dundee conference 1991 (D. F. Griffith and G. A. Watson, editors), Pitman, London, 1992, pp. 234–264.


[8] L. N. Trefethen, Pseudospectra of linear operators, SIAM Review, vol. 39 (1997), pp. 383–406.
[9] M. P. H. Wolff, Discrete approximation of unbounded operators and approximation of their spectra, Journal of Approximation Theory, vol. 113 (2001), pp. 229–244.

MATHEMATISCHES INSTITUT
UNIVERSITÄT TÜBINGEN
AUF DER MORGENSTELLE 10
D-72072 TÜBINGEN, GERMANY

TEACHING

NONSTANDARD ANALYSIS AT PRE-UNIVERSITY LEVEL: NAIVE MAGNITUDE ANALYSIS

RICHARD O’DONOVAN AND JOHN KIMBER

Abstract. Nonstandard analysis can be introduced in a pre-university mathematics course. Examples are given of how an intuition of infinitesimals leads to a mathematical concept. Using infinitesimals, students acquire the fundamental idea of a differential which in turn introduces derivatives and integrals. At this level, compared to epsilon-delta analysis, the nonstandard analysis approach allows for greater simplicity.

Introduction. In Collège André-Chavanne in Geneva, Switzerland, nonstandard analysis is used at a pre-university level.¹ For the mathematician, it is interesting to observe how words encountered in everyday language, such as “smaller than”, “infinitely close” or “infinitely large”, lead to a more refined and rigorous description of fundamental ideas. Here is a presentation of how infinitesimals are used to introduce the basic concepts of analysis: differentials, derivatives and integrals. Comments are given about the students’ response. Workshop exercises are used throughout the course; they are its driving force, providing motivation and serving to illustrate and introduce new fields of investigation. Theory emerges from the discussions that follow the exercises introducing a concept. This type of pedagogical approach works well with nonstandard analysis. Some of these exercises are given as examples. At this pre-university level ultrafilters are not mentioned, but neither are Dedekind cuts in an epsilon-delta analysis course. At this stage, “analysis”, whatever the approach, could be called “naive analysis”. Nonstandard analysis at this level emphasises simplicity.

At the AMS-UMI meeting in Pisa (June 2002) Richard O’Donovan received many helpful remarks from Jerome Keisler, Peter Loeb, Nigel Cutland, David Ross, David Ballard, Mauro Di Nasso, John Bell and Renling Jin. Many of their suggestions have been included in this article.
¹ For a short description of the mathematics curriculum in Geneva see Appendix A.
Nonstandard Methods and Applications in Mathematics, edited by N. J. Cutland, M. Di Nasso, and D. A. Ross. Lecture Notes in Logic, 25. © 2006, Association for Symbolic Logic.


§1. Complexity. A difficulty systematically encountered by students is the order in which quantifiers are used. It often takes them some time to realise that ∀x∃y is not the same as ∃y∀x. An interesting link between logic and pedagogy is illustrated by the prenex form of a formula. In [4] the authors define the prenex rank of a formula as the number of alternations of quantifiers, with rank 0 when no alternation occurs. The higher the rank, the more difficult it is for students to grasp the meaning and, the authors add, “A formula of prenex rank 4 would make any mathematician look twice.” Considering the prenex rank as a measure of conceptual difficulty, analysis with infinitesimals rates better than analysis with epsilons and deltas.

Example. The formula “$\forall\varepsilon > 0\ \exists\delta > 0\ \forall x \in A\ (|x - x_0| < \delta \Rightarrow |f(x) - f(x_0)| < \varepsilon)$”, which defines continuity at $x_0 \in A$, is of prenex rank 2, whereas the formula “$\forall x \in A\ (x \simeq x_0 \Rightarrow f(x) \simeq f(x_0))$” is of prenex rank zero.

Other rankings are conceivable. One counts the alternations of the variables, for example “which of ε or δ comes first?”, “which of $|x - x_0|$ or $|f(x) - f(x_0)|$ comes first?” These alternations cause many errors. Another ranking compares the number of signs per theorem and per proof. The more sophisticated the theorem, the greater the difference in number of signs between standard and nonstandard proofs, again to the advantage of nonstandard analysis. This could be a good reason to use nonstandard analysis at pre-university level.

§2. Making infinitesimals appear. The following exercises are introduced in class with no prior theory or definitions. The students, therefore, do not necessarily know what kind of answer is expected of them. Finding a correct answer is not the main goal of this type of exercise. Its purpose is to start a discussion in which students first rely on their intuition, then gradually, through a question-and-answer process, refine their ideas and eventually feel the need for formalisation.
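Taking the definition of §1 literally, the prenex rank of a formula already in prenex form can be read off its quantifier prefix. A minimal sketch, assuming the prefix is given as a string containing the symbols ∀ and ∃ (the function name is ours); it counts alternations, so a block of like quantifiers counts once and rank 0 means no alternation:

```python
def prenex_rank(prefix):
    """Number of alternations between ∀ and ∃ in a quantifier prefix.

    Characters other than ∀ and ∃ (variable names, spaces) are ignored.
    """
    quantifiers = [c for c in prefix if c in "∀∃"]
    return sum(1 for a, b in zip(quantifiers, quantifiers[1:]) if a != b)

continuity_eps_delta = "∀ε ∃δ ∀x"    # prefix of the epsilon-delta definition
continuity_infinitesimal = "∀x"      # prefix of the infinitesimal definition
print(prenex_rank(continuity_eps_delta), prenex_rank(continuity_infinitesimal))
```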
Exercise 1: Hold up your pen. Now drop your pen. At first, the speed is zero (there is no speed), then … What is the first non-zero speed?

Students never mention the idea that there is no first non-zero speed.² They do, however, attempt to write down the concept of a first non-zero speed symbolically. One common claim is $0.\bar{0}1$. To the students, it “feels” smaller than any positive real number.
² The nonexistence of a first non-zero speed (even in ${}^*\mathbb R$) is discussed at a later stage, when a working knowledge of infinitesimals has been acquired.


But what then would $(0.\bar{0}1)^2$ be? Sometimes the answer $0.\bar{0}\bar{0}1$ is suggested by students. It is quite interesting to notice that this is a purely symbolic answer. Of course, $\sqrt{0.\bar{0}1}$ will prove to be an insoluble problem for them; thus $0.\bar{0}1$ cannot be kept as a decent notation for an infinitesimal, though it is certainly not a stupid suggestion. Another common suggestion is $10^{-\infty}$. The use of the infinity symbol is not surprising: it does appear as part of everyday culture, which does not mean that the mathematical meaning of the symbol is understood. That 1, 2, 3, etc. goes on indefinitely seems an obvious fact to most students, probably since primary school, when children are challenged with the question: “how far can you count?” For some of them, indefinitely means infinity, whereas for others the limit is when you die … The concept of infinitesimals, on the other hand, does not appear with the same ease and requires more thinking. The following exercise helps students make the link between infinitely small and infinitely large.

Exercise 2: Take 1/1; 1/2; 1/3; 1/4; 1/5 … and go on forever. Where does it go to? Because it must get somewhere … (just an informal restatement of the compactness theorem).

No student suggests that it would reach zero. Even though the students’ first intuition is that counting can go on to infinity, infinity is not thought of as a number but rather as an unfinished process. On the contrary, the infinitely small, situated between numbers (which, for them, are already there), once accepted, are considered quite naturally to be numbers. This in turn allows the infinitely large to be accepted as numbers. The following exercise gives an immediate perception of what can be done with these new number-like objects. At this stage, students have no idea of what a derivative is.
Exercise 3: Physicists have found that a falling object satisfies $d(t) = \frac12 gt^2$, where $d(t)$ is the vertical distance after t seconds and $g \approx 10\ [\mathrm{m/s^2}]$. How far has the object fallen in 3 seconds? What is its speed after 3 seconds?

Of course the average speed $15\ \mathrm{m\cdot s^{-1}}$ is suggested as a first answer; however, on second thought, students realise that after 1 second the object has only fallen 5 m, thus the average speed will not do. As a next step, some students think of calculating the average speed over the last instant, which they denote by x because, as they say, “that is the name we give to unknown values.” The computation follows the rule that the average speed between $t_0$ and $t_1$ is $\frac{d(t_1) - d(t_0)}{t_1 - t_0}$, where $t_1 = 3$ and $t_0 = 3 - x$. The result is $(30 - 5x)\ \mathrm{m\cdot s^{-1}}$.
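The students’ algebra can be checked with exact rational arithmetic. A minimal sketch, assuming $g = 10\ \mathrm{m/s^2}$ as in the exercise (the function names are ours). Since $\frac{d(3) - d(3 - x)}{x} = 30 - 5x$ holds for every $x \ne 0$, `Fraction` reproduces it exactly for any positive x, however small; what it cannot do is take x infinitesimal, which is precisely the step supplied by the class discussion.

```python
from fractions import Fraction

G = 10  # m/s^2, the rounded value of g used in Exercise 3

def d(t):
    """Vertical distance (in m) fallen after t seconds: d(t) = g t^2 / 2."""
    return Fraction(G, 2) * t * t

def average_speed(t0, t1):
    """Average speed (in m/s) between the instants t0 and t1."""
    return (d(t1) - d(t0)) / (t1 - t0)

print(average_speed(Fraction(0), Fraction(3)))  # 15, the students' first answer
for x in (Fraction(1), Fraction(1, 10), Fraction(1, 1000)):
    # average speed over the "last instant" [3 - x, 3]; equals 30 - 5x
    print(x, average_speed(3 - x, Fraction(3)))
```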


A discussion takes place in class about the following statement: if an instant x is smaller than any measurable duration, then a speed of $(30 - 5x)\ \mathrm{m\cdot s^{-1}}$ is not measurably different from $30\ \mathrm{m\cdot s^{-1}}$. Everything is here: a derivative, an infinitesimal, infinite closeness and even the concept of the standard part. A colleague, attending a class to become acquainted with teaching nonstandard analysis, observed that nothing the students do is contrary to their intuition. This reinforces the idea that mathematics need not stand out as that part of philosophy where intuition is systematically proven to be wrong. It is not an absolute necessity that students should “hate the subject”.

§3. Definitions and proofs. The question of whether to consider zero as an infinitesimal was discussed more than once by the authors. Although zero is considered an infinitesimal in many articles, this approach seems less appealing from a pedagogical point of view. The following definition is now used throughout the course.

Definition: An infinitesimal is a number which is less in modulus than any positive real number, yet not zero.

This approach appears to be safer and closer to intuition. Zero is a real number, infinitesimals are “new” numbers, zero is not new, therefore zero is not an infinitesimal! In other words, $0 \notin IS$ (where IS is the set of infinitesimals, $IS^+$ the set of positive infinitesimals and $IS^-$ the set of negative infinitesimals), and whenever $\varepsilon \in IS$ it is possible to divide by ε. There is a distinct advantage in allowing division by all infinitesimals, bearing in mind that division by zero is precisely one of the main pitfalls of analysis. Another pedagogical difficulty arises when considering variables and constants: $\varepsilon \in IS$ is usually an undetermined infinitesimal, intrinsically a variable, and this collides with the fact that zero is a uniquely defined number.
By stressing “infinitesimal or zero” whenever necessary, the teacher emphasises this dual aspect. Two fundamental ideas related to the above discussion, “getting infinitely close to” and “reaching”, are confusing for students. This can be seen in most examples of convergence. A function represented by a straight line is not easily considered as converging to the line, precisely because it reaches it. The idea of closeness is formalised using the following concept:

Definition: $a \simeq b \Leftrightarrow a - b \in IS$ or $a - b = 0$.

The ≃ sign is read “equivalent to”. It merges into one new concept infinite closeness and equality (or, in a dynamic view, approaching and reaching), which otherwise appear in their own right as distinct concepts.


As the students have already used “infinitely smaller” and “infinitely larger”, it is necessary to formalise the definitions.

Definition: If $\frac{a}{b} \in IS$, then a is infinitely smaller than b, which leads to the converse: b is infinitely larger than a.

Note that $\frac{a}{b} \in IS \Rightarrow \frac{a}{b} \ne 0 \Rightarrow a \ne 0$ (by definition of infinitesimals), so it cannot be said that 0 is infinitely smaller than 1, or that 1 is infinitely larger than 0, which would imply that 1 could be zero multiplied by an infinitely large number. Once again, everything is set up so that students are not even remotely tempted to divide by zero. If a is infinitely smaller than b, then a is said to be “invisible” at the scale of b. “The visible part of a” is used as an informal statement for the standard part of a. “The visible part of a at the ε-scale” is what is left after discarding all the additive terms infinitely smaller than ε, i.e. it is what would be the standard part after a division by ε. Cancellation is traditionally done by crossing out; invisibility is indicated by square brackets:

$2 + 4\varepsilon + [\varepsilon^2]$ (the bracketed term is invisible at the ε-scale)

$2 + [4\varepsilon + \varepsilon^2]$ (the bracketed terms are invisible at the standard scale)

The definition of a standard part is introduced later. The words “visible” and “invisible” appear to be fully operational metaphors enabling the students to start working more formally on the following exercises: Exercise 4: Prove:

$\forall x \in \mathbb R^*\ \forall \varepsilon \in IS:\ x\cdot\varepsilon < 1$

Hint: Otherwise, assume $x\cdot\varepsilon = b$ where $b \in \mathbb R$, and show that ε would be in $\mathbb R$.

Exercise 5: Prove that if a number is infinitely larger than some real number, then it is infinitely larger than any real number.

The infinitely large numbers (IL) are defined to be all numbers infinitely larger than some real number.

§4. Standard part. The ${}^*\mathbb R$ symbol is used for hyperreals, but aside from that, the left-star notation is never used. Theorems are about real functions, thus inputs are either real numbers or hyperreals explicitly written in the form $x + \varepsilon$ (with $x \in \mathbb R$ and $\varepsilon \simeq 0$). The syllabus does not require the study of the difference between continuity and uniform continuity, so “$x, y \in {}^*A$” (as in [1]) is not a required statement. The definition of the standard part relies on the existence and uniqueness of a real number equivalent to a given hyperreal. At this pre-university level, finite hyperreals are defined as numbers of the form $a + \varepsilon$ for $a \in \mathbb R$ and $\varepsilon \simeq 0$, thus the existence of a real number equivalent to a given finite hyperreal is a consequence of this definition; its uniqueness is probably the students’ first acquaintance with uniqueness theorems. The notation $\mathrm{st}(a + dx) = a$ is used.

§5. Magnitudes and scales.

Exercise 6: Draw a straight line scaled from −4 to 4 with 2 squares per unit. On this line, plot ε, $-2 + \varepsilon$, $1 + 10^5\cdot\varepsilon$ [for $\varepsilon \in IS^+$]. Redraw this line after magnification by a factor of $\frac1\varepsilon$ centred on 0 [for the same $\varepsilon \in IS^+$]. On the second drawing, what is 2 squares to the left of 0? What is 4 squares to the right of 0? Plot 3 on this line. Plot $\varepsilon^2$ on this line. How many squares are there between 0 and 1? Plot $\sqrt\varepsilon$ on the line.

At this point most students are hesitant, because infinitesimals are considered to be invisible and the objective here is to make these invisibles visible! The students are asked to look at the Alps 80 kilometres away and visualise adding a snowball to the top of the highest mountain.³ Would the difference be visible from where they stand? Would the difference be visible through a sufficiently powerful telescope? They easily accept that what can be seen depends on the scale of observation. When attempting to draw $\varepsilon^2$ and $\sqrt\varepsilon$, students realise that there are different magnitudes of infinitesimals, formalised by the following:

Definition: If $\frac{a}{b}$ is finite but neither infinitesimal nor zero, then a and b are of the same magnitude.

This definition, together with those of infinitely smaller and infinitely larger, leads to ε-equivalence:

Definition: For $\varepsilon \in IS$, two numbers a and b are ε-equivalent if their difference is infinitely smaller than ε or if they are equal.
This is written $a \simeq_\varepsilon b$. The subscript is mentioned explicitly for a given scale. Thus the scale of dx would be characterised by dx-equivalence: $2 + dx \simeq_{dx} 2 + dx + dx^2$. The symbol “≃” without a subscript refers to equivalence at the standard scale.

³ Mont Blanc, 4807 m.
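The magnitude definitions can be tried out in a toy model. The sketch below is our own assumption and not the hyperreals: a number is a finite sum of terms $c\cdot\varepsilon^q$ for one fixed $\varepsilon \in IS^+$ with rational exponents q, stored as a dict `{q: c}`. In this model the magnitude of a nonzero number is its lowest exponent, so “same magnitude”, “infinitely smaller” and ε-equivalence reduce to comparing exponents. The function names are hypothetical.

```python
from fractions import Fraction

def order(p):
    """Lowest exponent with a nonzero coefficient: the magnitude of p != 0.

    A number is a dict {exponent: coefficient}, standing for the finite sum
    of coefficient * eps**exponent for one fixed eps in IS+.
    """
    exps = [q for q, c in p.items() if c != 0]
    if not exps:
        raise ValueError("zero has no magnitude")
    return min(exps)

def sub(p, r):
    """Difference p - r, with zero coefficients dropped."""
    out = dict(p)
    for q, c in r.items():
        out[q] = out.get(q, 0) - c
    return {q: c for q, c in out.items() if c != 0}

def same_magnitude(a, b):
    return order(a) == order(b)         # a/b finite, not infinitesimal

def infinitely_smaller(a, b):
    return order(a) > order(b)          # a/b infinitesimal

def eps_equivalent(a, b):
    """Equal, or a - b infinitely smaller than eps (whose order is 1)."""
    d = sub(a, b)
    return not d or order(d) > 1

eps, eps_sq, sqrt_eps = {1: 1}, {2: 1}, {Fraction(1, 2): 1}
print(infinitely_smaller(eps_sq, eps), infinitely_smaller(eps, sqrt_eps))  # True True
print(eps_equivalent({0: 2, 1: 1}, {0: 2, 1: 1, 2: 1}))  # 2 + eps vs 2 + eps + eps^2: True
```

The last line mirrors the dx-equivalence example $2 + dx \simeq_{dx} 2 + dx + dx^2$, with ε in the role of dx.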


Now the students are ready for two dimensions.

§6. The differential and the derivative. The next step for the students is to understand the differential, which in this approach is a key concept.

Exercise 7: Let $f : x \mapsto x^2$, $g : x \mapsto x^3$ and $h : x \mapsto |x|$ be real functions and $\varepsilon \in IS^+$. Zoom in on f, g and h at a magnification factor of $\frac1\varepsilon$, centred on (1; 1). Zoom in on f, g and h at a magnification factor of $\frac1\varepsilon$, centred on (0; 0).

Before this lesson, the only observation that the students could have made was that these functions are equal at some points, in the sense that $f(1) = g(1) = h(1)$ (and likewise at 0). They now realise that something more can be observed.⁴ If an interval of size dx can be drawn, then an interval of size $dx^2$ can neither be seen nor drawn at the dx-scale. The functions f and g appear to be almost straight lines in the interval. The modulus function does not share this newly observed characteristic. The next step in class is formalisation. As the differential is considered fundamental, it is defined before the derivative. Δf is defined as $f(x + dx) - f(x)$, as usual in nonstandard analysis. If, through a dx-zoom on $(x_0; y_0)$, the plot is infinitely close to a straight line, then the function is said to be locally linear (no steps, no “angular” singularities). If the function is locally linear, then dy is defined as the visible part of Δy at the dx-scale (otherwise, dy is not defined at $x_0$). The derivative $y' = dy/dx$ can be described as the slope of the line which is visible at the dx-scale. It is the average slope of the curve on an infinitesimal interval. The fact that dx and dy can be drawn is the fundamental reason to define dx and dy as basic objects. Then $y' = \mathrm{st}(\Delta y/dx)$ is shown to be an equivalent definition of the derivative. These concepts are introduced with polynomials and rational functions, thus the expansion of $f(x + dx)$ is simple. For $y = x^3$ students compute $\Delta y = 3x^2\cdot dx + 3x\cdot dx^2 + dx^3$, hence $dy = 3x^2\cdot dx$ and $dy/dx = 3x^2$.
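The expansion of Δy can be reproduced with exact integer arithmetic by treating dx as a formal variable: for a polynomial f, $f(x + dx)$ is a polynomial in dx whose coefficients follow from the binomial theorem. A minimal sketch, assuming f is given by its list of coefficients (the function names are ours). The visible part at the dx-scale is the $dx^1$ term, so its coefficient is $dy/dx$; it is also $\mathrm{st}(\Delta y/dx)$, because dividing by dx shifts every coefficient down one place.

```python
from math import comb

def expand_at(coeffs, x):
    """Coefficients of f(x + dx) as a polynomial in dx.

    coeffs[k] is the coefficient of t**k in f(t).  The coefficient of dx**j
    in f(x + dx) is sum over k >= j of coeffs[k] * C(k, j) * x**(k - j).
    """
    n = len(coeffs)
    return [sum(coeffs[k] * comb(k, j) * x ** (k - j) for k in range(j, n))
            for j in range(n)]

def delta(coeffs, x):
    """Coefficients of Δy = f(x + dx) - f(x): the dx**0 term is removed."""
    c = expand_at(coeffs, x)
    return [0] + c[1:]

def derivative(coeffs, x):
    """dy is the dx**1 term of Δy, so dy/dx is its coefficient."""
    return delta(coeffs, x)[1]

cube = [0, 0, 0, 1]          # y = x**3
print(delta(cube, 2))        # [0, 12, 6, 1]: Δy = 12 dx + 6 dx^2 + dx^3
print(derivative(cube, 2))   # 12 = 3 * 2**2
```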
With the equivalent definition, $\mathrm{st}(\Delta y/dx) = \mathrm{st}((3x^2\cdot dx + 3x\cdot dx^2 + dx^3)/dx) = \mathrm{st}(3x^2 + 3x\cdot dx + dx^2) = 3x^2$ (for $x \in \mathbb R$). Figure 1 is fundamental, and students must be able to draw it for any function, placing the various numbers relative to a given dx-scale. Though simple, this sketch contains all the fundamental ideas of analysis with infinitesimals. It is also fundamentally different from what is rigorously acceptable in ε-δ theory.

⁴ “Zoom in” now appears to be used both for microscope close-ups and for telescope views, probably due to the widespread use of video cameras, special effects and computer programs.
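Discarding every term invisible at the dx-scale can also be built into the arithmetic itself. The sketch below uses dual numbers $a + b\cdot dx$ with the formal rule $dx\cdot dx = 0$, the device behind forward-mode automatic differentiation. It is a formal stand-in, not the hyperreals: here dx is nilpotent and cannot be divided by, unlike an infinitesimal in IS. Reading off the coefficient b gives $y'$, and because $f(x + dx)$ is computed by composing the arithmetic, the chain rule $d(u \circ v) = u'\cdot v'\cdot dx$ comes for free. The class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    """a + b*dx with the rule dx*dx = 0 (terms invisible at the dx-scale dropped)."""
    a: float
    b: float = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a + other.a, self.b + other.b)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a + b dx)(c + e dx) = ac + (ae + bc) dx + be dx^2, and dx^2 -> 0
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

    __rmul__ = __mul__

def differential_coefficient(f, x):
    """Return y' at x, read off from f(x + dx) = f(x) + y' dx."""
    return f(Dual(x, 1.0)).b

v = lambda x: x * x + 1      # v(x) = x^2 + 1
u = lambda w: w * w * w      # u(w) = w^3
print(differential_coefficient(lambda x: u(v(x)), 1.0))  # 24.0 = u'(v(1)) * v'(1) = 12 * 2
```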


f(x + dx) H Y H difference between Δf(x) and df(x)  invisible at this scale     Δf(x)    f(x)   dx x x + dx Figure 1. A locally linear function. The pedagogically important point is that the slope of a function is the slope of a line through two points (infinitely close). The tangent line is otherwise usually defined as the limit of secants. Students often have a problem with this definition since, when the two points meet, the tangent can twirl around the point and it no longer qualifies as a “tangent”. The metaphor is that of a line resting on two supports. As the tangent is about to be defined, it disappears. This anomaly is avoided with nonstandard analysis: with two points to rest on, the tangent remains firmly anchored. These ideas are summarised in two basic equations: (1)

dy = y  · dx

and the microscope equation given by Stroyan [2]. (2)

f(x + dx) = f(x) + df(x) + · dx

with ) 0

provided that the function is differentiable, of course. Theorems about derivatives are generally simple and their proofs a direct consequence of the concept of differential. For example, the proof concerning the derivative of the composition of functions can be obtained by the systematic use of (1). There is no need to add and subtract convenient terms (or other white-rabbit-out-of-a-hat performance). Students can then discover the proof on their own. It is sufficient to assume that u and v are differentiable, that u depends on v and v depends on x, therefore d (u ◦ v) = u  · dv = u  · v  · dx Dividing by dx, which is not zero since the independent variable is chosen in IS yields the well known result. This works even if dv = 0, unlike the frequently du dv used dx = du dv · dx . This elegant proof often encourages other teachers to consider nonstandard analysis as a worthwhile approach. Proofs also often use Thompson’s simple approach [3]. Once the theorems about derivatives have been proven, many exercises follow the same path as in

NONSTANDARD ANALYSIS AT PRE-UNIVERSITY LEVEL


any other analysis course. There is, however, a major change in the syllabus: integration can be introduced shortly after the derivative.

§7. Integrals. The concept of integral is introduced in class by considering areas under curves. The students are made aware of the fact that the function describing an area is assumed to exist. The differential of this function yields the concept of antiderivative. The area is then shown to be a sum of rectangles, where areas of rectangles are assumed to be well defined. This confirms the existence of an area function, which in turn proves the fundamental theorem of calculus.

Exercise 8: Let $f: x \mapsto x^2$. Assume that A(x) is the function which describes the area under the curve of f between 0 and x. Sketch the curve of f and indicate where ΔA(x) would be. Zoom in at (x; f(x)). Calculate ΔA(x) and dA(x) and show where these are on the drawing.

Students are asked to generalise to any continuous function, which leads to Figure 2. It follows that

$$\Delta A(x) \simeq_{dx} f(x)\cdot dx + \frac{\Delta f(x)\cdot dx}{2}.$$

Since f is assumed to be continuous, Δf(x) is infinitesimal, so $\frac{\Delta f(x)\cdot dx}{2}$ is invisible at the dx scale; thus

$$dA(x) = f(x)\cdot dx \quad\text{and therefore}\quad A'(x) = f(x).$$

This is probably one of the easiest ways to show students that the area under a curve is given by the inverse operation of finding the derivative: the antiderivative. In many ε-δ courses, Figure 2 is used as a metaphor to illustrate the meaning of the fundamental theorem.

Exercise 9: Partition the interval [a; b] into pieces dx long (dx ∈ IS). Put all these pieces together again (add all the dx). What is the result? Apply Figure 2 to each of these pieces. Add all these pieces together again. What is the result?
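For Exercise 8 with f(x) = x², the area function is A(x) = x³/3, so ΔA(x) can be expanded exactly in powers of dx. A minimal sketch, assuming quantities are stored as coefficient lists in powers of dx with standard rational coefficients (the helper names are hypothetical): it shows that ΔA(x) and dA(x) = f(x)·dx differ only by terms in dx² and dx³, which become infinitesimal after division by dx, and that ΔA(x) differs from f(x)·dx + Δf(x)·dx/2 by exactly −dx³/6.

```python
from fractions import Fraction


def poly_add(p, q):
    """Add two polynomials in dx given as coefficient lists [c0, c1, c2, ...]."""
    n = max(len(p), len(q))
    p = list(p) + [Fraction(0)] * (n - len(p))
    q = list(q) + [Fraction(0)] * (n - len(q))
    return [a + b for a, b in zip(p, q)]


def poly_mul(p, q):
    """Multiply two polynomials in dx given as coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def negate(p):
    return [-c for c in p]


def delta_area(x):
    """Delta A(x) = A(x + dx) - A(x) for A(x) = x**3 / 3 (area under x**2 from 0 to x)."""
    x = Fraction(x)
    cube = [Fraction(1)]
    for _ in range(3):
        cube = poly_mul(cube, [x, Fraction(1)])  # builds (x + dx)**3
    return poly_add([c / 3 for c in cube], [-(x ** 3) / 3])


def trapezoid(x):
    """f(x)*dx + Delta f(x)*dx/2 for f(x) = x**2, where Delta f(x) = 2x*dx + dx**2."""
    x = Fraction(x)
    f_dx = [Fraction(0), x ** 2]
    delta_f = [Fraction(0), 2 * x, Fraction(1)]
    return poly_add(f_dx, poly_mul(delta_f, [Fraction(0), Fraction(1, 2)]))


if __name__ == "__main__":
    x = 2
    dA = [Fraction(0), Fraction(x) ** 2]                 # dA(x) = f(x)*dx
    print(delta_area(x))                                 # x^2*dx + x*dx^2 + dx^3/3
    print(poly_add(delta_area(x), negate(dA)))           # only dx^2 and dx^3 terms remain
    print(poly_add(delta_area(x), negate(trapezoid(x)))) # only -dx^3/6 remains
```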



[Figure 2. Area under the curve defined by f. Labels: f, a, x, b, dx, f(x), f(x + dx), Δf(x), A(x), ΔA(x).]

The “Σ” sign indicates sums with an integer increment. It is reinterpreted here by indicating the step in the subscript: $\sum_{x=a \text{ by steps of } dx}^{b}$.
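The notation “sum from x = a to b by steps of dx” can be read literally as a loop over x = a, a + dx, …, b. A minimal sketch, assuming a finite rational dx as a stand-in for the infinitesimal one, with a hypothetical helper `sum_by_steps` and the example f(x) = x², whose area function A(x) = x³/3 is taken as known. It answers both parts of Exercise 9 on [1; 4]: the pieces dx add up to b − a, and the slices ΔA(x) telescope to A(b) − A(a).

```python
from fractions import Fraction


def sum_by_steps(g, start, stop, dx):
    """Hypothetical helper: sum g(x) for x = start, start + dx, ..., stop (inclusive).

    A finite rational dx stands in for the infinitesimal one.
    """
    start, stop, dx = Fraction(start), Fraction(stop), Fraction(dx)
    steps = (stop - start) / dx
    if steps < 0 or steps.denominator != 1:
        raise ValueError("stop - start must be a nonnegative whole multiple of dx")
    return sum((g(start + k * dx) for k in range(int(steps) + 1)), Fraction(0))


def area_function(x):
    """A(x) = x**3 / 3, the area under f(x) = x**2 between 0 and x."""
    return Fraction(x) ** 3 / 3


if __name__ == "__main__":
    a, b, dx = Fraction(1), Fraction(4), Fraction(1, 100)
    # First part: putting all the pieces dx back together gives b - a.
    print(sum_by_steps(lambda x: dx, a, b - dx, dx))  # 3
    # Second part: the slices Delta A(x) telescope to A(b) - A(a).
    print(sum_by_steps(lambda x: area_function(x + dx) - area_function(x), a, b - dx, dx))  # 21
```

The last slice starts at b − dx, so the loop stops there; this is the endpoint detail the text examines next.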

The transfer theorem is not used to show that sums can be extended to a hyperfinite number of terms. Here also, the extension of the concept of sum follows what intuition expects it to be. The following discussion ensues. On an interval [a; b] on which the function is increasing, the area is greater than $\sum_{x=a}^{b} f(x)\cdot dx$ and less than $\sum_{x=a}^{b} f(x+dx)\cdot dx$.

A close study of the endpoints through a dx-zoom shows that the end slice goes from b − dx to b, and these sums are in fact $\sum_{x=a}^{b-dx} f(x)\cdot dx$ and $\sum_{x=a}^{b-dx} f(x+dx)\cdot dx$. Students can then transform the latter into $\sum_{x=a+dx}^{b} f(x)\cdot dx$. The difference between the two sums is

$$\sum_{x=a+dx}^{b} f(x)\cdot dx - \sum_{x=a}^{b-dx} f(x)\cdot dx = f(b)\cdot dx - f(a)\cdot dx = \big(f(b) - f(a)\big)\cdot dx.$$

Since f(b) − f(a) is a standard number, the difference between the two sums is a standard number multiplied by an infinitesimal, hence an infinitesimal or zero. The same holds for closed intervals on which the function is decreasing. So the area is squeezed between two infinitely close numbers; therefore $A \simeq \sum_{x=a}^{b} f(x)\cdot dx$. (Adding an extra slice, so that the endpoints are a and b, only adds an infinitesimal quantity.) This number is not necessarily standard. Therefore the area is redefined to be the standard number:

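$$A = \mathrm{st}\Big(\sum_{x=a}^{b} f(x)\cdot dx\Big) \overset{\text{by definition}}{=} \int_a^b f(x)\cdot dx.$$

The integral is thus the standard part of the sum. As $f(x)\cdot dx = dA(x)$, then

$$A = \int_a^b dA(x).$$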
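The squeeze argument can be checked with exact rational arithmetic, using a finite dx = (b − a)/n as a stand-in for an infinitesimal one (an assumption of this sketch; the helper names are hypothetical). For f(x) = x² on [0; 1], the lower sum is exactly 1/3 − dx/2 + dx²/6, so for an infinitesimal dx its standard part is 1/3 = ∫₀¹ x²·dx, and the gap between the two sums is always (f(b) − f(a))·dx.

```python
from fractions import Fraction


def lower_sum(f, a, b, n):
    """Sum of f(x)*dx for x = a, a + dx, ..., b - dx, with dx = (b - a)/n."""
    a, b = Fraction(a), Fraction(b)
    dx = (b - a) / n
    return sum((f(a + k * dx) * dx for k in range(n)), Fraction(0))


def upper_sum(f, a, b, n):
    """Sum of f(x)*dx for x = a + dx, ..., b, i.e. f(x + dx)*dx summed from a to b - dx."""
    a, b = Fraction(a), Fraction(b)
    dx = (b - a) / n
    return sum((f(a + k * dx) * dx for k in range(1, n + 1)), Fraction(0))


def square(x):
    return x * x


if __name__ == "__main__":
    for n in (10, 100, 1000):
        dx = Fraction(1, n)
        lo, hi = lower_sum(square, 0, 1, n), upper_sum(square, 0, 1, n)
        assert hi - lo == (square(1) - square(0)) * dx      # the gap (f(b) - f(a))*dx
        assert lo == Fraction(1, 3) - dx / 2 + dx ** 2 / 6  # differs from 1/3 by multiples of dx
        print(n, float(lo), float(hi))
```

The exact identity for the gap holds for any f and any n, because the two sums share every term except f(a)·dx and f(b)·dx.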

As in Greek Antiquity, “the sum of the parts is equal to the whole”, and not “the whole is the limit of the sum of the limits of its parts”. As a student put it: “Seeing the area as a sum of slices does not enable us to compute the value. It shows what is happening. To compute the value, we need to study the variation of the area.” The next exercise is difficult to solve in ε-δ analysis.

Exercise 10: Given that the gravitational force between two masses is $F = k\,\frac{m_1\cdot m_2}{d^2}$ (where d is the distance between the two masses and k a constant), what is the force between objects A and B in the following situation? A is a sphere considered as a point (for simplicity), B is a bar.

[Figure: bar B, 18 kg, 6 m long; point A, 6 kg, at 3 m from the bar.]
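A sketch of the slicing argument for Exercise 10, assuming from the figure that A is a 6 kg point on the axis of the bar, 3 m from its near end, and that the 18 kg bar of length 6 m is uniform (3 kg per metre). A slice of length dx at distance x from A then contributes k·6·3·dx/x² = 18k·dx/x², and summing over 3 ≤ x ≤ 9 gives F = 18k(1/3 − 1/9) = 4k. The code works in units of k and uses a finite dx as a stand-in for an infinitesimal one; `force_sum` is a hypothetical helper. Since 1/x² is decreasing, left-endpoint sums overestimate and right-endpoint sums underestimate, so they bracket 4.

```python
from fractions import Fraction


def force_sum(n, right=False, near=3, length=6, bar_mass=18, point_mass=6):
    """Sum the slice forces point_mass * (density * dx) / x**2, in units of k.

    The bar occupies near <= x <= near + length; it is cut into n slices of
    width dx, each evaluated at its left (default) or right endpoint.
    """
    dx = Fraction(length, n)
    density = Fraction(bar_mass, length)  # kg per metre
    total = Fraction(0)
    for i in range(n):
        x = near + (i + 1 if right else i) * dx  # distance from A to the slice
        total += point_mass * density * dx / x ** 2
    return total


if __name__ == "__main__":
    exact = Fraction(4)  # 18 * (1/3 - 1/9), in units of k
    for n in (6, 60, 600):
        left, right = force_sum(n), force_sum(n, right=True)
        print(n, float(right), float(left))
        assert right < exact < left
```

The gap between the two sums is exactly 18·dx·(1/3² − 1/9²), a standard number times dx, which is the same squeeze used above for areas.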

Students answer this by finding the mass of an infinitesimal piece of the bar B of length dx. Finding how many pieces of length dx there are is not easy for the students. They tend to answer something like  ∈ IL. They usually need to be guided, so that they give an answer with respect to dx. Hence the corresponding infinitesimal force depends on the distance x from A: $\Delta F(x) = \frac{18k\cdot dx}{x^2}$. The students can check that it satisfies the conditions to be written as dF(x). They then calculate the sum of all the forces: $\int dF = F$. For these students, dx is not “a mere symbol to point out which letter is the independent variable.” $f(x)\cdot dx$ really is a product, and $\int_a^b f(x)\cdot dx$ really is a sum (the standard part of a sum).

[Figure 3. Trigonometry zoom. Labels: dθ, d sin(θ), −d cos(θ).]

Exercise 11: Prove that if $\forall x \in [a; b]\ f'(x) = 0$ then $\forall x \in [a; b]\ df(x) = 0$. Calculate the variation of f between a and u ∈ [a; b]. Hence deduce that if $\forall \xi \in [a; b]\ f'(\xi) = 0$ then f is constant on [a; b]. This leads to:

$$f(x) - f(a) = \int_a^x df(t) = \int_a^x 0 = 0,$$

thus f(x) = f(a). An immediate consequence is the uniqueness of the antiderivative up to an additive constant.

§8. More complicated functions. After polynomials, rational and root functions, the class studies trigonometric, exponential and logarithmic functions. This can remain simple. For instance, the hard way to compute the derivative of the sine function is to expand sin(a + dx). Alternatively, in Figure 3, d cos(θ) is negative because the horizontal variation is negative. In each quadrant there is always one and only one of the two variations which is adjacent, therefore negative. The trigonometric definition yields:

$$\cos(\theta) = \frac{\text{adjacent}}{\text{hypotenuse}} = \frac{d\sin(\theta)}{d\theta}.$$

The higher-level students also study differential equations. Though solving the equations may remain difficult, the underlying concept is, in this approach,



simple, and several previous exercises already used it. No abstract “differential form” is needed. These exercises give a flavour of the content of the course and of how students respond. Some more general aspects remain to be discussed.

§9. Terminology. The problem with the title “nonstandard analysis” is that it seems to imply that the course will not be “normal”. Then follows the obvious question: “What is standard analysis?”, a title nowhere to be found. “Infinitesimal Calculus” or “Differential Calculus” sound old-fashioned, and “Leibnizian Analysis” (though a deserved homage) roots the subject a few centuries back and would be unfair to Newton. Nonstandard analysis is one of the rare chapters of 20th-century mathematics that can be taught at this level. In this context, “Magnitude Analysis” conveys the idea that a function is analysed by changing the scale of observation and allowing the use of infinitesimals as well as infinitely large numbers. It also follows the French “Analyse en ordres de grandeurs” by F. Diener and R. Lutz. We merely suggest this title as a synonym. Another suggestion arises from the intuitive feeling that there are different “layers” rather than magnitudes; thus “Strata analysis” could be used.

§10. What next? Students will choose to study law, medicine, languages or literature; some may read physics or mathematics. The majority will never really study mathematics, so for them the question of how their knowledge fits in with mainstream ε-δ analysis is irrelevant. These students will hopefully remember that they did, once upon a time, study concepts related to the idea of infinity, and recall that it was not counter-intuitive. Those who choose to study science will take some introductory course in analysis starting from scratch, covering all the concepts necessary to grasp the subject. For these students, will the ε-δ approach be seen as:
• an enrichment?
• an unnecessary complication?
If the analysis course at university turned out to be given as nonstandard analysis, we would see mathematicians who had not studied with epsilons and deltas, or at least not as the basic approach. How would these mathematicians tackle problems? What would their insight into problems be? Would they come up with novel solutions and new problems? There is a heated debate about whether nonstandard analysis should be introduced at pre-university level. It has been shown herein that it is possible, emphasising that in mathematics, simplicity rhymes with beauty.

Appendix A. The school. The school system in Geneva encompasses 13 years of basic academic education before university. During six years of primary school, children learn to count and do arithmetic along with some basic geometry. The three-year middle school includes the study of simple algebraic manipulations, linear equations, polynomials and rational functions. By the end of middle school, students are 14 or 15 years old and compulsory education comes to an end. Students interested in further academic education can go to “Collège”, grades permitting. In Collège, mathematics is a compulsory subject. During the first two years, courses include trigonometric relations, solving quadratics and linear systems with more than one unknown. In the third and fourth years the emphasis is on analysis; linear algebra, probability and statistics amount to no more than half the teaching time. Mathematics is one of ten to twelve subjects, all of which are tested in final exams.

REFERENCES

[1] Robert Goldblatt, Lectures on the hyperreals, Springer, 1998.
[2] K. D. Stroyan, Mathematical background: Foundations of infinitesimal calculus, Academic Press, 1997, www.math.uiowa.edu/%7Estroyan/backgndctlc.html.
[3] Silvanus P. Thompson and Martin Gardner, Calculus made easy, MacMillan, 1998.
[4] William Weiss and Cherie D’Mello, Fundamentals of model theory, Dep. of Mathematics, University of Toronto, 1997.

COLLÈGE ANDRÉ-CHAVANNE, 14 AV. TREMBLEY, 1209 GENEVA, SWITZERLAND

Correspondence address: Richard O’Donovan, 3 place des Eaux-Vives, 1207 Geneva, Switzerland
URL: http://hypo.ge.ch:8080/chavanne/Enseignement/Disciplines/Math