Foundations of Modern Probability [2nd ed.] 978-0-387-95313-7;978-1-4757-4015-8



English. Pages XX, 638 [655]. Year 2002.


Table of contents :
Front Matter ....Pages i-N1
Measure Theory — Basic Notions (Olav Kallenberg)....Pages 1-22
Measure Theory — Key Results (Olav Kallenberg)....Pages 23-44
Processes, Distributions, and Independence (Olav Kallenberg)....Pages 45-61
Random Sequences, Series, and Averages (Olav Kallenberg)....Pages 62-82
Characteristic Functions and Classical Limit Theorems (Olav Kallenberg)....Pages 83-102
Conditioning and Disintegration (Olav Kallenberg)....Pages 103-118
Martingales and Optional Times (Olav Kallenberg)....Pages 119-139
Markov Processes and Discrete-Time Chains (Olav Kallenberg)....Pages 140-158
Random Walks and Renewal Theory (Olav Kallenberg)....Pages 159-177
Stationary Processes and Ergodic Theory (Olav Kallenberg)....Pages 178-201
Special Notions of Symmetry and Invariance (Olav Kallenberg)....Pages 202-223
Poisson and Pure Jump-Type Markov Processes (Olav Kallenberg)....Pages 224-248
Gaussian Processes and Brownian Motion (Olav Kallenberg)....Pages 249-269
Skorohod Embedding and Invariance Principles (Olav Kallenberg)....Pages 270-284
Independent Increments and Infinite Divisibility (Olav Kallenberg)....Pages 285-306
Convergence of Random Processes, Measures, and Sets (Olav Kallenberg)....Pages 307-328
Stochastic Integrals and Quadratic Variation (Olav Kallenberg)....Pages 329-349
Continuous Martingales and Brownian Motion (Olav Kallenberg)....Pages 350-366
Feller Processes and Semigroups (Olav Kallenberg)....Pages 367-389
Ergodic Properties of Markov Processes (Olav Kallenberg)....Pages 390-411
Stochastic Differential Equations and Martingale Problems (Olav Kallenberg)....Pages 412-427
Local Time, Excursions, and Additive Functionals (Olav Kallenberg)....Pages 428-449
One-Dimensional SDEs and Diffusions (Olav Kallenberg)....Pages 450-469
Connections with PDEs and Potential Theory (Olav Kallenberg)....Pages 470-489
Predictability, Compensation, and Excessive Functions (Olav Kallenberg)....Pages 490-514
Semimartingales and General Stochastic Integration (Olav Kallenberg)....Pages 515-536
Large Deviations (Olav Kallenberg)....Pages 537-560
Back Matter ....Pages 561-638


Probability and its Applications
A Series of the Applied Probability Trust
Editors: J. Gani, C.C. Heyde, T.G. Kurtz

Springer Science+Business Media, LLC

Probability and its Applications
Anderson: Continuous-Time Markov Chains.
Azencott/Dacunha-Castelle: Series of Irregular Observations.
Bass: Diffusions and Elliptic Operators.
Bass: Probabilistic Techniques in Analysis.
Choi: ARMA Model Identification.
de la Peña/Giné: Decoupling: From Dependence to Independence.
Galambos/Simonelli: Bonferroni-type Inequalities with Applications.
Gani (Editor): The Craft of Probabilistic Modelling.
Grandell: Aspects of Risk Theory.
Gut: Stopped Random Walks.
Guyon: Random Fields on a Network.
Kallenberg: Foundations of Modern Probability, Second Edition.
Last/Brandt: Marked Point Processes on the Real Line.
Leadbetter/Lindgren/Rootzén: Extremes and Related Properties of Random Sequences and Processes.
Nualart: The Malliavin Calculus and Related Topics.
Rachev/Rüschendorf: Mass Transportation Problems. Volume I: Theory.
Rachev/Rüschendorf: Mass Transportation Problems. Volume II: Applications.
Resnick: Extreme Values, Regular Variation and Point Processes.
Shedler: Regeneration and Networks of Queues.
Thorisson: Coupling, Stationarity, and Regeneration.
Todorovic: An Introduction to Stochastic Processes and Their Applications.

Olav Kallenberg

Foundations of Modern Probability
Second Edition


Springer

Olav Kallenberg
Department of Mathematics
Auburn University
Auburn, AL 36849
USA

Series Editors
J. Gani
Stochastic Analysis Group, CMA
Australian National University
Canberra, ACT 0200
Australia

C.C. Heyde Stochastic Analysis Group, CMA Australian National University Canberra, ACT 0200 Australia

T.G. Kurtz Department of Mathematics University of Wisconsin 480 Lincoln Drive Madison, WI 53706 USA

Mathematics Subject Classification (2000): 60-01

Library of Congress Cataloging-in-Publication Data
Kallenberg, Olav.
Foundations of modern probability / Olav Kallenberg. 2nd ed.
p. cm. (Probability and its applications)
Includes bibliographical references and index.
ISBN 978-1-4419-2949-5    ISBN 978-1-4757-4015-8 (eBook)
DOI 10.1007/978-1-4757-4015-8
1. Probabilities. I. Title. II. Springer series in statistics. Probability and its applications.
QA273.K285 2001
519.2-dc21    2001032816

Printed on acid-free paper.

© 2002 by Springer Science+Business Media New York
Originally published by Applied Probability Trust in 2002.
Softcover reprint of the hardcover 2nd edition 2002

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.

Production managed by Allan Abrams; manufacturing supervised by Jerome Basma.
Photocomposed pages prepared by the Bartlett Press.

9 8 7 6 5 4 3 2

Praise for the First Edition

"It is truly surprising how much material the author has managed to cover in the book. ... More advanced readers are likely to regard the book as an ideal reference. Indeed, the monograph has the potential to become a (possibly even 'the') major reference book on large parts of probability theory for the next decade or more." -M. Scheutzow (Berlin)

"I am often asked by mathematicians ... for literature on 'a broad introduction to modern stochastics.' ... Due to this book, my task in answering is made easier. This is it! A concise, broad overview of the main results and techniques. ... From the table of contents it is difficult to believe that behind all these topics a streamlined, readable text is at all possible. It is: Convince yourself. I have no doubt that this text will become a classic. Its main feature of keeping the whole area of probability together and presenting a general overview is a real success. Scores of students ... and indeed researchers will be most grateful!" -P.A.L. Embrechts (Zürich)

"The theory of probability has grown exponentially during the second half of the twentieth century, and the idea of writing a single volume that could serve as a general reference ... seems almost foolhardy. Yet this is precisely what Professor Kallenberg has attempted ... and he has accomplished it brilliantly. ... With regard to his primary goal, the author has been more successful than I would have imagined possible. It is astonishing that a single volume of just over five hundred pages could contain so much material presented with complete rigor, and still be at least formally self-contained. ... As a general reference for a good deal of modern probability theory [the book] is outstanding. It should have a place in the library of every probabilist. Professor Kallenberg set himself a very difficult task, and he should be congratulated for carrying it out so well." -R.K. Getoor (La Jolla, California)

"This is a superbly written, high-level introduction to contemporary probability theory. In it, the advanced mathematics student will find basic information, presented in a uniform terminology and notation, essential to gaining access to much present-day research. ... I congratulate Professor Kallenberg on a noteworthy achievement." -M.F. Neuts (Tucson, Arizona)

"This is a very modern, very ambitious, and very well-written book. The scope is greater than I would have thought possible in a book of this length. This is made possible by the extremely efficient treatment, particularly the proofs. ... [Kallenberg] has succeeded in his mammoth task beyond all reasonable expectations. I think this book is destined to become a modern classic." -N.H. Bingham (London)

"Kallenberg has ably achieved [his] goal and presents all the important results and techniques that every probabilist should know. ... We do not doubt that the book ... will be widely used as material for advanced postgraduate courses and seminars on various topics in probability." -jste, European Math. Soc. Newsletter

"This is a very well written book. ... Much effort must have been put into simplifying and streamlining proofs, and the results are quite impressive. ... I would highly recommend [the book] to anybody who wants a good concise reference text on several very important parts of modern probability theory. For a mathematical sciences library, such a book is a must." -K. Borovkov (Melbourne)

"[This] is an unusual book about a wide range of probability and stochastic processes, written by a single excellent mathematician. ... The graduate student will definitely enjoy reading it, and for the researcher it will become a useful reference book and necessary tool for his or her work." -T. Mikosch (Groningen)

"The author has succeeded in writing a text containing, in the spirit of Loève's Probability Theory, all the essential results that any probabilist needs to know. Like Loève's classic, this book will become a standard source of study and reference for students and researchers in probability theory." -R. Kiesel (London)

"Kallenberg's present book would have to qualify as the assimilation of probability par excellence. It is a great edifice of material, clearly and ingeniously presented, without any nonmathematical distractions. Readers wishing to venture into it may do so with confidence that they are in very capable hands." -F.B. Knight (Urbana, Illinois)

"The presentation of the material is characterized by a surprising clarity and precision. The author's overview of the various subfields of probability theory and his detailed knowledge are impressive. Through an activity over many years as a researcher, academic teacher, and editor, he has acquired a deep competence in many areas. Wherever one reads, all chapters are carefully worked through and brought into streamlined form. One can imagine what an enormous effort it has cost the author to reach this final state, though no signs of this are visible. His goal, as set forth in the preface, of giving clear and economical proofs of the included theorems has been achieved admirably. ... I can't recall that in recent times I have held in my hands a mathematics book so thoroughly worked through." -H. Rost (Heidelberg)

Preface to the Second Edition

For this new edition the entire text has been carefully revised, and some portions are totally rewritten. More importantly, I have inserted more than a hundred pages of new material, in chapters on general measure and ergodic theory, the asymptotics of Markov processes, and large deviations. The expanded size has made it possible to give a self-contained treatment of the underlying measure theory and to include topics like multivariate and ratio ergodic theorems, shift coupling, Palm distributions, entropy and information, Harris recurrence, invariant measures, strong and weak ergodicity, Strassen's law of the iterated logarithm, and the basic large deviation results of Cramér, Sanov, Schilder, and Freidlin and Ventzel.

Unfortunately, the body of knowledge in probability theory keeps growing at an ever increasing rate, and I am painfully aware that I will never catch up in my efforts to survey the entire subject. Many areas are still totally beyond reach, and a comprehensive treatment of the more recent developments would require another volume or two. I am asking for the reader's patience and understanding.

Many colleagues have pointed out errors or provided helpful information. I am especially grateful for some valuable comments from Wlodzimierz Kuperberg, Michael Scheutzow, Josef Teichmann, and Hermann Thorisson. Some of the new material was presented in our probability seminar at Auburn, where I benefited from stimulating discussions with Bill Hudson, Ming Liao, Lisa Peterson, and Hussain Talibi. My greatest thanks are due, as always, to my wife Jinsoo, whose constant love and support have sustained and inspired me throughout many months of hard work.

Olav Kallenberg

March 2001

Preface to the First Edition

Some thirty years ago it was still possible, as Loève so ably demonstrated, to write a single book in probability theory containing practically everything worth knowing in the subject. The subsequent development has been explosive, and today a corresponding comprehensive coverage would require a whole library. Researchers and graduate students alike seem compelled to a rather extreme degree of specialization. As a result, the subject is threatened by disintegration into dozens or hundreds of subfields.

At the same time the interaction between the areas is livelier than ever, and there is a steadily growing core of key results and techniques that every probabilist needs to know, if only to read the literature in his or her own field. Thus, it seems essential that we all have at least a general overview of the whole area, and we should do what we can to keep the subject together. The present volume is an earnest attempt in that direction.

My original aim was to write a book about "everything." Various space and time constraints forced me to accept more modest and realistic goals for the project. Thus, "foundations" had to be understood in the narrower sense of the early 1970s, and there was no room for some of the more recent developments. I especially regret the omission of topics such as large deviations, Gibbs and Palm measures, interacting particle systems, stochastic differential geometry, Malliavin calculus, SPDEs, measure-valued diffusions, and branching and superprocesses. Clearly plenty of fundamental and intriguing material remains for a possible second volume.

Even with my more limited, revised ambitions, I had to be extremely selective in the choice of material. More importantly, it was necessary to look for the most economical approach to every result I did decide to include. In the latter respect, I was surprised to see how much could actually be done to simplify and streamline proofs, often handed down through generations of textbook writers. My general preference has been for results conveying some new idea or relationship, whereas many propositions of a more technical nature have been omitted. In the same vein, I have avoided technical or computational proofs that give little insight into the proven results. This conforms with my conviction that the logical structure is what matters most in mathematics, even when applications are the ultimate goal.

Though the book is primarily intended as a general reference, it should also be useful for graduate and seminar courses on different levels, ranging from elementary to advanced. Thus, a first-year graduate course in measure-theoretic probability could be based on the first ten or so chapters, while the rest of the book will readily provide material for more advanced courses on various topics. Though the treatment is formally self-contained, as far as measure theory and probability are concerned, the text is intended for a rather sophisticated reader with at least some rudimentary knowledge of subjects like topology, functional analysis, and complex variables.


My exposition is based on experiences from the numerous graduate and seminar courses I have been privileged to teach in Sweden and in the United States, ever since I was a graduate student myself. Over the years I have developed a personal approach to almost every topic, and even experts might find something of interest. Thus, many proofs may be new, and every chapter contains results that are not available in the standard textbook literature. It is my sincere hope that the book will convey some of the excitement I still feel for the subject, which is without a doubt (even apart from its utter usefulness) one of the richest and most beautiful areas of modern mathematics.

Notes and Acknowledgments: My first thanks are due to my numerous Swedish teachers, and especially to Peter Jagers, whose 1971 seminar opened my eyes to modern probability. The idea of this book was raised a few years later when the analysts at Gothenburg asked me to give a short lecture course on "probability for mathematicians." Although I objected to the title, the lectures were promptly delivered, and I became convinced of the project's feasibility. For many years afterward I had a faithful and enthusiastic audience in numerous courses on stochastic calculus, SDEs, and Markov processes. I am grateful for that learning opportunity and for the feedback and encouragement I received from colleagues and graduate students.

Inevitably I have benefited immensely from the heritage of countless authors, many of whom are not even listed in the bibliography. I have further been fortunate to know many prominent probabilists of our time, who have often inspired me through their scholarship and personal example. Two people, Klaus Matthes and Gopi Kallianpur, stand out as particularly important influences in connection with my numerous visits to Berlin and Chapel Hill, respectively. The great Kai Lai Chung, my mentor and friend from recent years, offered penetrating comments on all aspects of the work: linguistic, historical, and mathematical. My colleague Ming Liao, always a stimulating partner for discussions, was kind enough to check my material on potential theory. Early versions of the manuscript were tested on several groups of graduate students, and Kamesh Casukhela, Davorin Dujmovic, and Hussain Talibi in particular were helpful in spotting misprints. Ulrich Albrecht and Ed Slaminka offered generous help with software problems. I am further grateful to John Kimmel, Karina Mikhli, and the Springer production team for their patience with my last-minute revisions and their truly professional handling of the project.

My greatest thanks go to my family, who are my constant source of happiness and inspiration. Without their love, encouragement, and understanding, this work would not have been possible.

Olav Kallenberg

May 1991

Contents

Preface to the Second Edition vii
Preface to the First Edition ix

1. Measure Theory — Basic Notions 1
   Measurable sets and functions; measures and integration; monotone and dominated convergence; transformation of integrals; product measures and Fubini's theorem; Lp-spaces and projection; approximation; measure spaces and kernels

2. Measure Theory — Key Results 23
   Outer measures and extension; Lebesgue and Lebesgue-Stieltjes measures; Jordan-Hahn and Lebesgue decompositions; Radon-Nikodym theorem; Lebesgue's differentiation theorem; functions of finite variation; Riesz' representation theorem; Haar and invariant measures

3. Processes, Distributions, and Independence 45
   Random elements and processes; distributions and expectation; independence; zero-one laws; Borel-Cantelli lemma; Bernoulli sequences and existence; moments and continuity of paths

4. Random Sequences, Series, and Averages 62
   Convergence in probability and in Lp; uniform integrability and tightness; convergence in distribution; convergence of random series; strong laws of large numbers; Portmanteau theorem; continuous mapping and approximation; coupling and measurability

5. Characteristic Functions and Classical Limit Theorems 83
   Uniqueness and continuity theorem; Poisson convergence; positive and symmetric terms; Lindeberg's condition; general Gaussian convergence; weak laws of large numbers; domain of Gaussian attraction; vague and weak compactness

6. Conditioning and Disintegration 103
   Conditional expectations and probabilities; regular conditional distributions; disintegration; conditional independence; transfer and coupling; existence of sequences and processes; extension through conditioning

7. Martingales and Optional Times 119
   Filtrations and optional times; random time-change; martingale property; optional stopping and sampling; maximum and upcrossing inequalities; martingale convergence, regularity, and closure; limits of conditional expectations; regularization of submartingales

8. Markov Processes and Discrete-Time Chains 140
   Markov property and transition kernels; finite-dimensional distributions and existence; space and time homogeneity; strong Markov property and excursions; invariant distributions and stationarity; recurrence and transience; ergodic behavior of irreducible chains; mean recurrence times

9. Random Walks and Renewal Theory 159
   Recurrence and transience; dependence on dimension; general recurrence criteria; symmetry and duality; Wiener-Hopf factorization; ladder time and height distribution; stationary renewal process; renewal theorem

10. Stationary Processes and Ergodic Theory 178
   Stationarity, invariance, and ergodicity; discrete- and continuous-time ergodic theorems; moment and maximum inequalities; multivariate ergodic theorems; sample intensity of a random measure; subadditivity and products of random matrices; conditioning and ergodic decomposition; shift coupling and the invariant σ-field

11. Special Notions of Symmetry and Invariance 202
   Palm distributions and inversion formulas; stationarity and cycle stationarity; local hitting and conditioning; ergodic properties of Palm measures; exchangeable sequences and processes; strong stationarity and predictable sampling; ballot theorems; entropy and information

12. Poisson and Pure Jump-Type Markov Processes 224
   Random measures and point processes; Cox processes, randomization, and thinning; mixed Poisson and binomial processes; independence and symmetry criteria; Markov transition and rate kernels; embedded Markov chains and explosion; compound and pseudo-Poisson processes; ergodic behavior of irreducible chains

13. Gaussian Processes and Brownian Motion 249
   Symmetries of Gaussian distribution; existence and path properties of Brownian motion; strong Markov and reflection properties; arcsine and uniform laws; law of the iterated logarithm; Wiener integrals and isonormal Gaussian processes; multiple Wiener-Itô integrals; chaos expansion of Brownian functionals

14. Skorohod Embedding and Invariance Principles 270
   Embedding of random variables; approximation of random walks; functional central limit theorem; laws of the iterated logarithm; arcsine laws; approximation of renewal processes; empirical distribution functions; embedding and approximation of martingales

15. Independent Increments and Infinite Divisibility 285
   Regularity and integral representation; Lévy processes and subordinators; stable processes and first-passage times; infinitely divisible distributions; characteristics and convergence criteria; approximation of Lévy processes and random walks; limit theorems for null arrays; convergence of extremes

16. Convergence of Random Processes, Measures, and Sets 307
   Relative compactness and tightness; uniform topology on C(K, S); Skorohod's J1-topology; equicontinuity and tightness; convergence of random measures; superposition and thinning; exchangeable sequences and processes; simple point processes and random closed sets

17. Stochastic Integrals and Quadratic Variation 329
   Continuous local martingales and semimartingales; quadratic variation and covariation; existence and basic properties of the integral; integration by parts and Itô's formula; Fisk-Stratonovich integral; approximation and uniqueness; random time-change; dependence on parameter

18. Continuous Martingales and Brownian Motion 350
   Real and complex exponential martingales; martingale characterization of Brownian motion; random time-change of martingales; integral representation of martingales; iterated and multiple integrals; change of measure and Girsanov's theorem; Cameron-Martin theorem; Wald's identity and Novikov's condition

19. Feller Processes and Semigroups 367
   Semigroups, resolvents, and generators; closure and core; Hille-Yosida theorem; existence and regularization; strong Markov property; characteristic operator; diffusions and elliptic operators; convergence and approximation

20. Ergodic Properties of Markov Processes 390
   Transition and contraction operators; ratio ergodic theorem; space-time invariance and tail triviality; mixing and convergence in total variation; Harris recurrence and transience; existence and uniqueness of invariant measure; distributional and pathwise limits

21. Stochastic Differential Equations and Martingale Problems 412
   Linear equations and Ornstein-Uhlenbeck processes; strong existence, uniqueness, and nonexplosion criteria; weak solutions and local martingale problems; well-posedness and measurability; pathwise uniqueness and functional solution; weak existence and continuity; transformation of SDEs; strong Markov and Feller properties

22. Local Time, Excursions, and Additive Functionals 428
   Tanaka's formula and semimartingale local time; occupation density, continuity and approximation; regenerative sets and processes; excursion local time and Poisson process; Ray-Knight theorem; excessive functions and additive functionals; local time at a regular point; additive functionals of Brownian motion

23. One-Dimensional SDEs and Diffusions 450
   Weak existence and uniqueness; pathwise uniqueness and comparison; scale function and speed measure; time-change representation; boundary classification; entrance boundaries and Feller properties; ratio ergodic theorem; recurrence and ergodicity

24. Connections with PDEs and Potential Theory 470
   Backward equation and Feynman-Kac formula; uniqueness for SDEs from existence for PDEs; harmonic functions and Dirichlet's problem; Green functions as occupation densities; sweeping and equilibrium problems; dependence on conductor and domain; time reversal; capacities and random sets

25. Predictability, Compensation, and Excessive Functions 490
   Accessible and predictable times; natural and predictable processes; Doob-Meyer decomposition; quasi-left-continuity; compensation of random measures; excessive and superharmonic functions; additive functionals as compensators; Riesz decomposition

26. Semimartingales and General Stochastic Integration 515
   Predictable covariation and L2-integral; semimartingale integral and covariation; general substitution rule; Doléans' exponential and change of measure; norm and exponential inequalities; martingale integral; decomposition of semimartingales; quasi-martingales and stochastic integrators

27. Large Deviations 537
   Legendre-Fenchel transform; Cramér's and Schilder's theorems; large-deviation principle and rate function; functional form of the LDP; continuous mapping and extension; perturbation of dynamical systems; empirical processes and entropy; Strassen's law of the iterated logarithm

Appendices 561
A1. Advanced Measure Theory 561
   Polish and Borel spaces; measurable inverses; projection and sections
A2. Some Special Spaces 562
   Function spaces; measure spaces; spaces of closed sets; measure-valued functions; projective limits

Historical and Bibliographical Notes 569
Bibliography 596
Symbol Index 621
Author Index 623
Subject Index 629

Words of Wisdom and Folly

• "A mathematician who argues from probabilities in geometry is not worth an ace" - Socrates (on the demands of rigor in mathematics)

• "[We will travel a road] full of interest of its own. It familiarizes us with the measurement of variability, and with curious laws of chance that apply to a vast diversity of social subjects" - Francis Galton (on the wondrous world of probability)

• "God doesn't play dice" [i.e., there is no randomness in the universe] - Albert Einstein (on quantum mechanics and causality)

• "It might be possible to prove certain theorems, but they would not be of any interest, since, in practice, one could not verify whether the assumptions are fulfilled" - Émile Borel (on why bothering with probability)

• "[The stated result] is a special case of a very general theorem [the strong Markov property]. The measure [theoretic] ideas involved are somewhat glossed over in the proof, in order to avoid complexities out of keeping with the rest of this paper" - Joseph L. Doob (on why bothering with generality or mathematical rigor)

• "Probability theory [has two hands]: On the right is the rigorous [technical work]; the left hand ... reduces problems to gambling situations, coin-tossing, motions of a physical particle" - Leo Breiman (on probabilistic thinking)

• "There are good taste and bad taste in mathematics just as in music, literature, or cuisine, and one who dabbles in it must stand judged thereby" - Kai Lai Chung (on the art of writing mathematics)

• "The traveler often has the choice between climbing a peak or using a cable car" - William Feller (on the art of reading mathematics)

• "A Catalogue Aria of triumphs is of less benefit [to the student] than an indication of the techniques by which such results are achieved" - David Williams (on seduction and the art of discovery)

• "One needs [for stochastic integration] a six months course [to cover only] the definitions. What is there to do?" - Paul-André Meyer (on the dilemma of modern math education)

• "There were very many [bones] in the open valley; and lo, they were very dry. And [God] said unto me, 'Son of man, can these bones live?' And I answered, 'O Lord, thou knowest.'" - Ezekiel 37:2-3 (on the ultimate reward of hard studies, as quoted by Chris Rogers and David Williams)

Chapter 1

Measure Theory — Basic Notions

Measurable sets and functions; measures and integration; monotone and dominated convergence; transformation of integrals; product measures and Fubini's theorem; Lp-spaces and projection; approximation; measure spaces and kernels

Modern probability theory is technically a branch of measure theory, and any systematic exposition of the subject must begin with some basic measure-theoretic facts. In this chapter and its sequel we have collected some basic ideas and results from measure theory that will be useful throughout this book. Though most of the quoted propositions may be found in any textbook in real analysis, our emphasis is often somewhat different and has been chosen to suit our special needs. Many readers may prefer to omit these chapters on their first encounter and return for reference when the need arises.

To fix our notation, we begin with some elementary notions from set theory. For subsets A, Ak, B, ... of some abstract space Ω, recall the definitions of union A ∪ B or ⋃k Ak, intersection A ∩ B or ⋂k Ak, complement Aᶜ, and difference A \ B = A ∩ Bᶜ. The latter is said to be proper if A ⊃ B. The symmetric difference of A and B is given by A △ B = (A \ B) ∪ (B \ A). Among basic set relations, we note in particular the distributive laws

    B ∩ ⋃k Ak = ⋃k (B ∩ Ak),    B ∪ ⋂k Ak = ⋂k (B ∪ Ak),

and de Morgan's laws

    (⋃k Ak)ᶜ = ⋂k Akᶜ,    (⋂k Ak)ᶜ = ⋃k Akᶜ,

valid for arbitrary (not necessarily countable) unions and intersections. The latter formulas allow us to convert any relation involving unions (intersections) into the dual formula for intersections (unions).

We define a σ-algebra or σ-field in Ω as a nonempty collection A of subsets of Ω that is closed under countable unions and intersections as well as under complementation. (For a field, closure is required only under finite set operations.) Thus, if A, A1, A2, ... ∈ A, then also Aᶜ, ⋃k Ak, and ⋂k Ak lie in A. In particular, the whole space Ω and the empty set ∅ belong to every σ-field. In any space Ω there is a smallest σ-field {∅, Ω} and a largest one, 2^Ω, the class of all subsets of Ω. Note that any σ-field A is closed under monotone limits. Thus, if A1, A2, ... ∈ A with An ↑ A or An ↓ A,


then also A ∈ 𝒜. A measurable space is a pair (Ω, 𝒜), where Ω is a space and 𝒜 is a σ-field in Ω.

For any class of σ-fields in Ω, the intersection (but usually not the union) is again a σ-field. If 𝒞 is an arbitrary class of subsets of Ω, there is a smallest σ-field in Ω containing 𝒞, denoted by σ(𝒞) and called the σ-field generated or induced by 𝒞. Note that σ(𝒞) can be obtained as the intersection of all σ-fields in Ω that contain 𝒞. We endow a metric or topological space S with its Borel σ-field ℬ(S), generated by the topology (the class of open subsets) in S, unless a σ-field is otherwise specified. The elements of ℬ(S) are called Borel sets. In the case of the real line ℝ, we often write ℬ instead of ℬ(ℝ).

More primitive classes than σ-fields often arise in applications. A class 𝒞 of subsets of some space Ω is called a π-system if it is closed under finite intersections, so that A, B ∈ 𝒞 implies A ∩ B ∈ 𝒞. Furthermore, a class 𝒟 is a λ-system if it contains Ω and is closed under proper differences and increasing limits. Thus, we require that Ω ∈ 𝒟, that A, B ∈ 𝒟 with A ⊃ B implies A \ B ∈ 𝒟, and that A1, A2, ... ∈ 𝒟 with An ↑ A implies A ∈ 𝒟.

The following monotone-class theorem is often useful to extend an established property or relation from a class 𝒞 to the generated σ-field σ(𝒞). An application of this result is referred to as a monotone-class argument.

Theorem 1.1 (monotone classes, Sierpinski) Let 𝒞 be a π-system and 𝒟 a λ-system in some space Ω such that 𝒞 ⊂ 𝒟. Then σ(𝒞) ⊂ 𝒟.

Proof: We may clearly assume that 𝒟 = λ(𝒞), the smallest λ-system containing 𝒞. It suffices to show that 𝒟 is a π-system, since it is then a σ-field containing 𝒞 and therefore contains the smallest σ-field σ(𝒞) with this property. Thus, we need to show that A ∩ B ∈ 𝒟 whenever A, B ∈ 𝒟. The relation A ∩ B ∈ 𝒟 is certainly true when A, B ∈ 𝒞, since 𝒞 is a π-system contained in 𝒟. We proceed by extension in two steps. First we fix any B ∈ 𝒞 and define 𝒜_B = {A ⊂ Ω; A ∩ B ∈ 𝒟}. Then 𝒜_B is a λ-system containing 𝒞, and so it contains the smallest λ-system 𝒟 with this property. This shows that A ∩ B ∈ 𝒟 for any A ∈ 𝒟 and B ∈ 𝒞. Next we fix any A ∈ 𝒟 and define ℬ_A = {B ⊂ Ω; A ∩ B ∈ 𝒟}. As before, we note that even ℬ_A contains 𝒟, which yields the desired property. □
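The elementary set identities recorded above (the distributive and de Morgan laws) underlie manipulations like those in the preceding proof, and they can be spot-checked on finite examples with Python's built-in sets. This is only an illustrative sketch; the particular sets below are arbitrary choices, not part of the text.

```python
# Checking the distributive and de Morgan laws on finite sets.
A = [{1, 2}, {2, 3}, {3, 4}]          # a finite family A_1, A_2, A_3
B = {2, 4}
omega = {1, 2, 3, 4, 5}               # ambient space for complements

union = set().union(*A)               # the union of the A_k
inter = set(omega).intersection(*A)   # the intersection of the A_k

# distributive laws
assert B & union == set().union(*[B & Ak for Ak in A])
assert B | inter == set(omega).intersection(*[B | Ak for Ak in A])

# de Morgan's laws (complements relative to omega)
assert omega - union == set(omega).intersection(*[omega - Ak for Ak in A])
assert omega - inter == set().union(*[omega - Ak for Ak in A])
```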

For any family of spaces Ωt, t ∈ T, we define the Cartesian product X_{t∈T} Ωt as the class of all collections (ωt; t ∈ T), where ωt ∈ Ωt for all t. When T = {1, ..., n} or T = ℕ = {1, 2, ...}, we often write the product space as Ω1 × ⋯ × Ωn or Ω1 × Ω2 × ⋯, respectively; if Ωt = Ω for all t, we use the notation Ω^T, Ω^n, or Ω^∞. In case of topological spaces Ωt, we endow Xt Ωt with the product topology unless a topology is otherwise specified. Now assume that each space Ωt is equipped with a σ-field 𝒜t. In Xt Ωt we may then introduce the product σ-field ⊗t 𝒜t, generated by all one-dimensional cylinder sets At × X_{s≠t} Ωs, where t ∈ T and At ∈ 𝒜t. (Note the analogy with the definition of product topologies.) As before, we write 𝒜1 ⊗ ⋯ ⊗ 𝒜n, 𝒜1 ⊗ 𝒜2 ⊗ ⋯, 𝒜^T, 𝒜^n, or 𝒜^∞ in the appropriate special cases.


Lemma 1.2 (product and Borel σ-fields) If S1, S2, ... are separable metric spaces, then ℬ(S1 × S2 × ⋯) = ℬ(S1) ⊗ ℬ(S2) ⊗ ⋯.

Thus, for countable products of separable metric spaces, the product and Borel σ-fields agree. In particular, ℬ(ℝ^d) = (ℬ(ℝ))^d = ℬ^d, the σ-field generated by all rectangular boxes I1 × ⋯ × Id, where I1, ..., Id are arbitrary real intervals. This special case can also be proved directly.

Proof: The assertion may be written as σ(𝒞1) = σ(𝒞2), and it suffices to show that 𝒞1 ⊂ σ(𝒞2) and 𝒞2 ⊂ σ(𝒞1). For 𝒞2 we may choose the class of all cylinder sets Gk × X_{n≠k} Sn with k ∈ ℕ and Gk open in Sk. Those sets generate the product topology in S = Xn Sn, and so they belong to ℬ(S). Conversely, we note that S = Xn Sn is again separable. Thus, for any topological base 𝒞 in S, the open subsets of S are countable unions of sets in 𝒞. In particular, we may choose 𝒞 to consist of all finite intersections of cylinder sets Gk × X_{n≠k} Sn as above. It remains to note that the latter sets lie in ⊗n ℬ(Sn). □

Every point mapping f between two spaces S and T induces a set mapping f⁻¹ in the opposite direction, that is, from 2^T to 2^S, given by

    f⁻¹B = {s ∈ S; f(s) ∈ B},    B ⊂ T.

Note that f⁻¹ preserves the basic set operations in the sense that, for any subsets B and Bk of T,

    f⁻¹Bᶜ = (f⁻¹B)ᶜ,    f⁻¹⋃k Bk = ⋃k f⁻¹Bk,    f⁻¹⋂k Bk = ⋂k f⁻¹Bk.    (1)

The next result shows that f⁻¹ also preserves σ-fields, in both directions. For convenience, we write f⁻¹𝒞 = {f⁻¹B; B ∈ 𝒞} for any class 𝒞 of subsets of T.

Lemma 1.3 (induced σ-fields) Let f be a mapping between two measurable spaces (S, 𝒮) and (T, 𝒯). Then

(i) 𝒮′ = f⁻¹𝒯 is a σ-field in S;
(ii) 𝒯′ = {B ⊂ T; f⁻¹B ∈ 𝒮} is a σ-field in T.

Proof: (i) Let A, A1, A2, ... ∈ 𝒮′. Then there exist some sets B, B1, B2, ... ∈ 𝒯 with A = f⁻¹B and An = f⁻¹Bn for each n. Since 𝒯 is a σ-field, the sets Bᶜ, ⋃n Bn, and ⋂n Bn all belong to 𝒯, and by (1) we get

    Aᶜ = (f⁻¹B)ᶜ = f⁻¹Bᶜ ∈ f⁻¹𝒯 = 𝒮′,
    ⋃n An = ⋃n f⁻¹Bn = f⁻¹⋃n Bn ∈ f⁻¹𝒯 = 𝒮′,
    ⋂n An = ⋂n f⁻¹Bn = f⁻¹⋂n Bn ∈ f⁻¹𝒯 = 𝒮′.


(ii) Let B, B1, B2, ... ∈ 𝒯′, so that f⁻¹B, f⁻¹B1, f⁻¹B2, ... ∈ 𝒮. Using (1) and the fact that 𝒮 is a σ-field, we get

    f⁻¹Bᶜ = (f⁻¹B)ᶜ ∈ 𝒮,
    f⁻¹⋃n Bn = ⋃n f⁻¹Bn ∈ 𝒮,
    f⁻¹⋂n Bn = ⋂n f⁻¹Bn ∈ 𝒮,

which shows that Bᶜ, ⋃n Bn, and ⋂n Bn all lie in 𝒯′. □
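The identities (1) on which Lemma 1.3 rests can likewise be verified mechanically for a concrete map. In the sketch below, the map f(s) = s² and the finite sets are illustrative assumptions, not part of the text.

```python
# Checking the preimage identities (1) for a concrete map.
def preimage(f, domain, B):
    """f^{-1}B = {s in domain; f(s) in B}."""
    return {s for s in domain if f(s) in B}

S = {-2, -1, 0, 1, 2}
T = {0, 1, 2, 3, 4}
f = lambda s: s * s                   # f: S -> T, f(s) = s^2

B1, B2 = {0, 1}, {1, 4}
# f^{-1}(B1 u B2) = f^{-1}B1 u f^{-1}B2
assert preimage(f, S, B1 | B2) == preimage(f, S, B1) | preimage(f, S, B2)
# f^{-1}(B1 n B2) = f^{-1}B1 n f^{-1}B2
assert preimage(f, S, B1 & B2) == preimage(f, S, B1) & preimage(f, S, B2)
# f^{-1}(B^c) = (f^{-1}B)^c, complements relative to T and S
assert preimage(f, S, T - B1) == S - preimage(f, S, B1)
```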

Given two measurable spaces (S, 𝒮) and (T, 𝒯), a mapping f: S → T is said to be 𝒮/𝒯-measurable or simply measurable if f⁻¹𝒯 ⊂ 𝒮, that is, if f⁻¹B ∈ 𝒮 for every B ∈ 𝒯. (Note the analogy with the definition of continuity in terms of topologies on S and T.) By the next result, it is enough to verify the defining condition for a generating subclass.

Lemma 1.4 (measurable functions) Consider a mapping f between two measurable spaces (S, 𝒮) and (T, 𝒯), and let 𝒞 ⊂ 2^T with σ(𝒞) = 𝒯. Then f is 𝒮/𝒯-measurable iff f⁻¹𝒞 ⊂ 𝒮.

Proof: Let 𝒯′ = {B ⊂ T; f⁻¹B ∈ 𝒮}. Then 𝒞 ⊂ 𝒯′ by hypothesis, and 𝒯′ is a σ-field by Lemma 1.3 (ii). Hence,

    𝒯′ = σ(𝒯′) ⊃ σ(𝒞) = 𝒯,

which shows that f⁻¹B ∈ 𝒮 for all B ∈ 𝒯. □

Lemma 1.5 (continuity and measurability) Let f be a continuous mapping between two topological spaces S and T with Borel σ-fields 𝒮 and 𝒯. Then f is 𝒮/𝒯-measurable.

Proof: Let 𝒮′ and 𝒯′ denote the classes of open sets in S and T. Since f is continuous and 𝒮 = σ(𝒮′), we have

    f⁻¹𝒯′ ⊂ 𝒮′ ⊂ 𝒮.

By Lemma 1.4 it follows that f is 𝒮/σ(𝒯′)-measurable. It remains to note that σ(𝒯′) = 𝒯. □

We insert a result about subspace topologies and σ-fields that will be needed in Chapter 16. Given a class 𝒞 of subsets of S and a set A ⊂ S, we define A ∩ 𝒞 = {A ∩ C; C ∈ 𝒞}.

Lemma 1.6 (subspaces) Fix a metric space (S, ρ) with topology 𝒯 and Borel σ-field 𝒮, and let A ⊂ S. Then (A, ρ) has topology 𝒯_A = A ∩ 𝒯 and Borel σ-field 𝒮_A = A ∩ 𝒮.

Proof: The natural embedding f_A: A → S is continuous and hence measurable, and so A ∩ 𝒯 = f_A⁻¹𝒯 ⊂ 𝒯_A and A ∩ 𝒮 = f_A⁻¹𝒮 ⊂ 𝒮_A. Conversely, given any B ∈ 𝒯_A, we define G = (B ∪ Aᶜ)°, where the complement and interior are with respect to S, and note that B = A ∩ G. Hence, 𝒯_A ⊂ A ∩ 𝒯, and therefore

    𝒮_A = σ(𝒯_A) ⊂ σ(A ∩ 𝒯) ⊂ σ(A ∩ 𝒮) = A ∩ 𝒮,

where the operation σ(·) refers to the subspace A. □

As with continuity, we note that even measurability is preserved by composition.

Lemma 1.7 (composition) Fix three measurable spaces (S, 𝒮), (T, 𝒯), and (U, 𝒰), and consider some measurable mappings f: S → T and g: T → U. Then the composition h = g ∘ f: S → U is again measurable.

Proof: Let C ∈ 𝒰, and note that B = g⁻¹C ∈ 𝒯 since g is measurable. Noting that (g ∘ f)⁻¹ = f⁻¹ ∘ g⁻¹ and using the fact that even f is measurable, we get

    h⁻¹C = f⁻¹g⁻¹C = f⁻¹B ∈ 𝒮. □

To state the next result, we note that any collection of functions ft: Ω → St, t ∈ T, defines a mapping f = (ft) from Ω to Xt St, given by

    f(ω) = (ft(ω); t ∈ T),    ω ∈ Ω.    (2)

It is often useful to relate the measurability of f to that of the coordinate mappings ft.

Lemma 1.8 (collections of functions) Consider any set of functions ft: Ω → St, t ∈ T, where (Ω, 𝒜) and (St, 𝒮t), t ∈ T, are measurable spaces, and define f = (ft): Ω → Xt St. Then f is 𝒜/⊗t 𝒮t-measurable iff ft is 𝒜/𝒮t-measurable for every t ∈ T.

Proof: We may use Lemma 1.4, with 𝒞 equal to the class of cylinder sets At × X_{s≠t} Ss for arbitrary t ∈ T and At ∈ 𝒮t. □

Changing our perspective, assume the ft in (2) to be mappings into some measurable spaces (St, 𝒮t). In Ω we may then introduce the generated or induced σ-field σ(f) = σ{ft; t ∈ T}, defined as the smallest σ-field in Ω that makes all the ft measurable. In other words, σ(f) is the intersection of all σ-fields 𝒜 in Ω such that ft is 𝒜/𝒮t-measurable for every t ∈ T. In this notation, the functions ft are clearly measurable with respect to a σ-field 𝒜 in Ω iff σ(f) ⊂ 𝒜. It is further useful to note that σ(f) agrees with the σ-field in Ω generated by the collection {ft⁻¹𝒮t; t ∈ T}.

For functions on or into a Euclidean space ℝ^d, measurability is understood to be with respect to the Borel σ-field ℬ^d. Thus, a real-valued function f on some measurable space (Ω, 𝒜) is measurable iff {ω; f(ω) ≤ x} ∈ 𝒜 for all x ∈ ℝ. The same convention applies to functions into the extended real line ℝ̄ = [−∞, ∞] or the extended half-line ℝ̄₊ = [0, ∞], regarded as compactifications of ℝ and ℝ₊ = [0, ∞), respectively. Note that ℬ(ℝ̄) = σ{ℬ, ±∞} and ℬ(ℝ̄₊) = σ{ℬ(ℝ₊), ∞}.

For any set A ⊂ Ω, we define the associated indicator function 1_A: Ω → ℝ to be equal to 1 on A and to 0 on Aᶜ. (The term characteristic function has a different meaning in probability theory.) For sets A = {ω; f(ω) ∈ B}, it is often convenient to write 1{·} instead of 1_{·}. Assuming 𝒜 to be a σ-field in Ω, we note that 1_A is 𝒜-measurable iff A ∈ 𝒜.

Linear combinations of indicator functions are called simple functions. Thus, a general simple function f: Ω → ℝ has the form

    f = c1 1_{A1} + ⋯ + cn 1_{An},

where n ∈ ℤ₊ = {0, 1, ...}, c1, ..., cn ∈ ℝ, and A1, ..., An ⊂ Ω. Here we may clearly take c1, ..., cn to be the distinct nonzero values attained by f and define Ak = f⁻¹{ck}, k = 1, ..., n. With this choice of representation, we note that f is measurable with respect to a given σ-field 𝒜 in Ω iff A1, ..., An ∈ 𝒜.

We proceed to show that the class of measurable functions is closed under the basic finite or countable operations occurring in analysis.

Lemma 1.9 (bounds and limits) Let f1, f2, ... be measurable functions from some measurable space (Ω, 𝒜) into ℝ̄. Then sup_n fn, inf_n fn, limsup_n fn, and liminf_n fn are again measurable.

Proof: To see that sup_n fn is measurable, write

    {ω; sup_n fn(ω) ≤ t} = ⋂n {ω; fn(ω) ≤ t} = ⋂n fn⁻¹[−∞, t] ∈ 𝒜,

and use Lemma 1.4. The measurability of the other three functions follows easily if we write inf_n fn = −sup_n (−fn) and note that

    limsup_n fn = inf_n sup_{k≥n} fk,    liminf_n fn = sup_n inf_{k≥n} fk. □

Since fn → f iff limsup_n fn = liminf_n fn = f, it follows easily that both the set of convergence and the possible limit are measurable. The next result gives an extension to functions with values in more general spaces.

Lemma 1.10 (convergence and limits) Let f1, f2, ... be measurable functions from a measurable space (Ω, 𝒜) into some metric space (S, ρ). Then

(i) {ω; fn(ω) converges} ∈ 𝒜 when S is complete;
(ii) fn → f on Ω implies that f is measurable.

Proof: (i) Since S is complete, the convergence of fn is equivalent to the Cauchy convergence

    lim_n sup_{m≥n} ρ(fm, fn) = 0.

Here the left-hand side is measurable by Lemmas 1.5 and 1.9.

(ii) If fn → f, we have g ∘ fn → g ∘ f for any continuous function g: S → ℝ, and so g ∘ f is measurable by Lemmas 1.5 and 1.9. Fixing any open set G ⊂ S, we may choose some continuous functions g1, g2, ...: S → ℝ₊ with gn ↑ 1_G and conclude from Lemma 1.9 that 1_G ∘ f is measurable. Thus, f⁻¹G ∈ 𝒜 for all G, and so f is measurable by Lemma 1.4. □


Many results in measure theory are proved by a simple approximation, based on the following observation.

Lemma 1.11 (approximation) For any measurable function f: (Ω, 𝒜) → ℝ̄₊, there exist some simple measurable functions f1, f2, ...: Ω → ℝ₊ with 0 ≤ fn ↑ f.

Proof: We may define

    fn(ω) = 2⁻ⁿ[2ⁿ f(ω)] ∧ n,    ω ∈ Ω, n ∈ ℕ,

where [x] denotes the integer part of x. □

To illustrate the method, we may use the last lemma to prove the measurability of the basic arithmetic operations.

Lemma 1.12 (elementary operations) Fix any measurable functions f, g: (Ω, 𝒜) → ℝ and constants a, b ∈ ℝ. Then af + bg and fg are again measurable, and so is f/g when g ≠ 0 on Ω.

Proof: By Lemma 1.11 applied to f± = (±f) ∨ 0 and g± = (±g) ∨ 0, we may approximate by simple measurable functions fn → f and gn → g. Here afn + bgn and fn gn are again simple measurable functions. Since they converge to af + bg and fg, respectively, even the latter functions are measurable by Lemma 1.9. The same argument applies to the ratio f/g, provided we choose gn ≠ 0.

An alternative argument is to write af + bg, fg, or f/g as a composition ψ ∘ φ, where φ = (f, g): Ω → ℝ², and ψ(x, y) is defined as ax + by, xy, or x/y, respectively. The desired measurability then follows by Lemmas 1.2, 1.5, and 1.8. In the case of ratios, we may use the continuity of the mapping (x, y) ↦ x/y on ℝ × (ℝ \ {0}). □
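The approximating functions of Lemma 1.11 are explicit enough to compute. The following sketch, with an arbitrary sample function, checks that fn = 2⁻ⁿ[2ⁿf] ∧ n is nonnegative, nondecreasing in n, and converges to f pointwise; the function and sample space are illustrative choices.

```python
# The approximating sequence of Lemma 1.11:
# f_n(w) = min(floor(2^n * f(w)) / 2^n, n), for f >= 0.
import math

def approx(f, n):
    """Return the n-th simple approximation of a function f >= 0."""
    return lambda w: min(math.floor((2 ** n) * f(w)) / (2 ** n), n)

f = lambda w: w * 0.37                # an arbitrary nonnegative example
omega = range(10)

for n in range(1, 12):
    fn, fn1 = approx(f, n), approx(f, n + 1)
    for w in omega:
        assert 0 <= fn(w) <= fn1(w) <= f(w)   # 0 <= f_n <= f_{n+1} <= f

# f_n(w) -> f(w): the error is at most 2^{-n} once n >= f(w)
assert all(f(w) - approx(f, 20)(w) <= 2 ** -20 for w in omega)
```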

For many statements in measure theory and probability, it is convenient first to give a proof for the real line and then to extend the result to more general spaces. In this context, it is useful to identify pairs of measurable spaces S and T that are Borel isomorphic, in the sense that there exists a bijection f: S → T such that both f and f⁻¹ are measurable. A space S that is Borel isomorphic to a Borel subset of [0, 1] is called a Borel space. In particular, any Polish space endowed with its Borel σ-field is known to be Borel (cf. Theorem A1.2). (Recall that a topological space is said to be Polish if it admits a separable and complete metrization.)

The next result gives a useful functional representation of measurable functions. Given any two functions f and g on the same space Ω, we say that f is g-measurable if the induced σ-fields are related by σ(f) ⊂ σ(g).

Lemma 1.13 (functional representation, Doob) Fix two measurable functions f and g from a space Ω into some measurable spaces (S, 𝒮) and (T, 𝒯), where the former is Borel. Then f is g-measurable iff there exists some measurable mapping h: T → S with f = h ∘ g.

Proof: Since S is Borel, we may assume that S ∈ ℬ([0, 1]). By a suitable modification of h, we may further reduce to the case when S = [0, 1]. If f = 1_A with a g-measurable A ⊂ Ω, then by Lemma 1.3 there exists some set B ∈ 𝒯 with A = g⁻¹B. In this case f = 1_A = 1_B ∘ g, and we may choose h = 1_B. The result extends by linearity to any simple g-measurable function f. In the general case, there exist by Lemma 1.11 some simple g-measurable functions f1, f2, ... with 0 ≤ fn ↑ f, and we may choose associated 𝒯-measurable functions h1, h2, ...: T → [0, 1] with fn = hn ∘ g. Then h = sup_n hn is again 𝒯-measurable by Lemma 1.9, and we note that

    h ∘ g = sup_n (hn ∘ g) = sup_n fn = f. □

Given any measurable space (Ω, 𝒜), a function μ: 𝒜 → ℝ̄₊ is said to be countably additive if

    μ ⋃_{k≥1} Ak = Σ_{k≥1} μAk,    A1, A2, ... ∈ 𝒜 disjoint.    (3)

A measure on (Ω, 𝒜) is defined as a function μ: 𝒜 → ℝ̄₊ with μ∅ = 0 and satisfying (3). A triple (Ω, 𝒜, μ) as above, where μ is a measure, is called a measure space. From (3) we note that any measure is finitely additive and nondecreasing. This implies in turn the countable subadditivity

    μ ⋃_{k≥1} Ak ≤ Σ_{k≥1} μAk,    A1, A2, ... ∈ 𝒜.
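For a measure with countable support, countable additivity, subadditivity, and continuity along increasing sets can all be illustrated numerically. The geometric weights below, truncated so that every sum is finite, are an arbitrary choice made for the sketch.

```python
# A measure with countable support: mu{k} = 2^{-k}, k >= 1, truncated at
# k = 50 so that everything is a finite sum (an illustrative sketch).
def mu(A):
    return sum(2.0 ** -k for k in A if 1 <= k <= 50)

evens = set(range(2, 51, 2))
odds = set(range(1, 51, 2))

# additivity on disjoint sets
assert abs(mu(evens | odds) - (mu(evens) + mu(odds))) < 1e-12
# subadditivity on overlapping sets
assert mu(evens | {2, 4}) <= mu(evens) + mu({2, 4})
# monotone continuity: mu(A_n) increases to mu(A) for A_n = {1,...,n}
vals = [mu(set(range(1, n + 1))) for n in range(1, 51)]
assert all(x <= y for x, y in zip(vals, vals[1:]))
assert abs(vals[-1] - mu(set(range(1, 51)))) < 1e-12
```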

We note the following basic continuity properties.

Lemma 1.14 (continuity) Let μ be a measure on (Ω, 𝒜), and assume that A1, A2, ... ∈ 𝒜. Then

(i) An ↑ A implies μAn ↑ μA;
(ii) An ↓ A with μA1 < ∞ implies μAn ↓ μA.

Proof: For (i) we may apply (3) to the differences Dn = An \ A_{n−1} with A0 = ∅. To get (ii), apply (i) to the sets Bn = A1 \ An. □

The simplest measures on a measurable space (Ω, 𝒜) are the unit masses or Dirac measures δx, x ∈ Ω, given by δxA = 1_A(x). For any countable set A = {x1, x2, ...}, we may form the associated counting measure μ = Σn δ_{xn}. More generally, we may form countable linear combinations of arbitrary measures on Ω, as follows.

Proposition 1.15 (series of measures) For any measures μ1, μ2, ... on (Ω, 𝒜) and constants c1, c2, ... ≥ 0, the sum μ = Σn cn μn is again a measure.

Proof: We need the fact that, for any array of constants cij ≥ 0, i, j ∈ ℕ,

    Σi Σj cij = Σj Σi cij.    (4)

This is trivially true for finite sums. In general, let m, n ∈ ℕ and write

    Σi Σj cij ≥ Σ_{i≤m} Σ_{j≤n} cij = Σ_{j≤n} Σ_{i≤m} cij.

Letting m → ∞ and then n → ∞, we obtain (4) with the inequality ≥. The same argument yields the reverse relation, and the equality follows.

Now consider any disjoint sets A1, A2, ... ∈ 𝒜. Using (4) and the countable additivity of each μn, we get

    μ ⋃k Ak = Σn Σk cn μn Ak = Σk Σn cn μn Ak = Σk μAk. □

The last result may be restated in terms of monotone sequences.

Corollary 1.16 (monotone limits) Let μ1, μ2, ... be measures on a measurable space (Ω, 𝒜) such that either μn ↑ μ, or μn ↓ μ with μ1 bounded. Then μ is again a measure on (Ω, 𝒜).

Proof: In the increasing case, we may apply Proposition 1.15 to the sum μ = Σn (μn − μ_{n−1}), where μ0 = 0. For decreasing sequences, the previous case applies to the increasing measures μ1 − μn. □

For any measure μ on (Ω, 𝒜) and set B ∈ 𝒜, the function ν: A ↦ μ(A ∩ B) is again a measure on (Ω, 𝒜), called the restriction of μ to B. Given any countable partition of Ω into disjoint sets A1, A2, ... ∈ 𝒜, we note that μ = Σn μn, where μn denotes the restriction of μ to An. The measure μ is said to be σ-finite if the partition can be chosen such that μAn < ∞ for all n. In that case the restrictions μn are clearly bounded.

A measure μ on some topological space S with Borel σ-field 𝒮 is said to be locally finite if every point s ∈ S has a neighborhood where μ is finite. A locally finite measure on a σ-compact space is clearly σ-finite.

It is often useful to identify simple measure-determining classes 𝒞 ⊂ 𝒮, such that a measure on 𝒮 is uniquely determined by its values on 𝒞. For locally finite measures on a Euclidean space ℝ^d, we may take 𝒞 = ℐ^d, the class of all bounded rectangles.

Lemma 1.17 (uniqueness) Let μ and ν be bounded measures on some measurable space (Ω, 𝒜), and let 𝒞 be a π-system in Ω such that Ω ∈ 𝒞 and σ(𝒞) = 𝒜. Then μ = ν iff μA = νA for all A ∈ 𝒞.

Proof: Assuming μ = ν on 𝒞, let 𝒟 denote the class of sets A ∈ 𝒜 with μA = νA. Using the condition Ω ∈ 𝒞, the finite additivity of μ and ν, and Lemma 1.14, we see that 𝒟 is a λ-system. Moreover, 𝒞 ⊂ 𝒟 by hypothesis. Hence, Theorem 1.1 yields 𝒟 ⊃ σ(𝒞) = 𝒜, which means that μ = ν. The converse assertion is obvious. □

For any measure μ on a topological space S, the support supp μ is defined as the smallest closed set F ⊂ S with μFᶜ = 0. If |supp μ| ≤ 1, then μ is said to be degenerate, and we note that μ = c δs for some s ∈ S and c ≥ 0. More generally, a measure μ is said to have an atom at s ∈ S if {s} ∈ 𝒮 and μ{s} > 0. For any locally finite measure μ on some σ-compact metric space S, the set A = {s ∈ S; μ{s} > 0} is clearly measurable, and we may define the atomic and diffuse components μa and μd of μ as the restrictions of μ to A and its complement. We further say that μ is diffuse if μa = 0 and purely atomic if μd = 0.

In the important special case when μ is locally finite and integer valued, the set A above is clearly locally finite and hence closed. By Lemma 1.14 we further have supp μ ⊂ A, and so μ is purely atomic. Hence, in this case μ = Σ_{s∈A} cs δs for some integers cs. In particular, μ is said to be simple if cs = 1 for all s ∈ A. In that case, μ agrees with the counting measure on its support A.

Any measurable mapping f between two measurable spaces (S, 𝒮) and (T, 𝒯) induces a mapping of measures on S into measures on T. More precisely, given any measure μ on (S, 𝒮), we may define a measure μ ∘ f⁻¹ on (T, 𝒯) by

    (μ ∘ f⁻¹)B = μ(f⁻¹B) = μ{s ∈ S; f(s) ∈ B},    B ∈ 𝒯.
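For a measure with finite support, the image measure μ ∘ f⁻¹ amounts to pushing point masses forward along f; the dictionaries and map below are illustrative choices for a small sketch.

```python
# The image measure mu o f^{-1} for a measure with finite support,
# represented as a dict {point: mass}.
def pushforward(mu, f):
    """Return mu o f^{-1}: the mass of s is moved to f(s)."""
    out = {}
    for s, m in mu.items():
        out[f(s)] = out.get(f(s), 0) + m
    return out

mu = {-1: 2, 0: 3, 1: 5}
nu = pushforward(mu, lambda s: s * s)   # f(s) = s^2

assert nu == {1: 7, 0: 3}               # masses of {-1} and {1} merge at 1
assert sum(nu.values()) == sum(mu.values())   # total mass is preserved
```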

The countable additivity of μ ∘ f⁻¹ follows from that of μ together with the fact that f⁻¹ preserves unions and intersections.

Our next aim is to define the integral

    μf = ∫ f dμ = ∫ f(ω) μ(dω)

of a real-valued, measurable function f on some measure space (Ω, 𝒜, μ). First assume that f is simple and nonnegative, hence of the form c1 1_{A1} + ⋯ + cn 1_{An} for some n ∈ ℤ₊, A1, ..., An ∈ 𝒜, and c1, ..., cn ∈ ℝ₊, and define

    μf = c1 μA1 + ⋯ + cn μAn.

(Throughout measure theory we are following the convention 0 · ∞ = 0.) Using the finite additivity of μ, it is easy to verify that μf is independent of the choice of representation of f. It is further clear that the mapping f ↦ μf is linear and nondecreasing, in the sense that

    μ(af + bg) = a μf + b μg,    a, b ≥ 0,
    f ≤ g  ⇒  μf ≤ μg.

To extend the integral to any nonnegative measurable function f, we may choose as in Lemma 1.11 some simple measurable functions f1, f2, ... with 0 ≤ fn ↑ f, and define μf = lim_n μfn. The following result shows that the limit is independent of the choice of approximating sequence (fn).

Lemma 1.18 (consistency) Fix any measurable function f ≥ 0 on some measure space (Ω, 𝒜, μ), and let f1, f2, ... and g be simple measurable functions satisfying 0 ≤ fn ↑ f and 0 ≤ g ≤ f. Then lim_n μfn ≥ μg.

Proof: By the linearity of μ, it is enough to consider the case when g = 1_A for some A ∈ 𝒜. Then fix any ε > 0, and define

    An = {ω ∈ A; fn(ω) ≥ 1 − ε},    n ∈ ℕ.

Here An ↑ A, and so

    lim_n μfn ≥ (1 − ε) lim_n μAn = (1 − ε) μA = (1 − ε) μg.

It remains to let ε → 0. □

The linearity and monotonicity extend immediately to arbitrary f ≥ 0, since if fn ↑ f and gn ↑ g, then afn + bgn ↑ af + bg, and if also f ≤ g, then fn ≤ (fn ∨ gn) ↑ g. We are now ready to prove the basic continuity property of the integral.

Theorem 1.19 (monotone convergence, Levi) Let f, f1, f2, ... be measurable functions on (Ω, 𝒜, μ) with 0 ≤ fn ↑ f. Then μfn ↑ μf.

Proof: For each n we may choose some simple measurable functions gnk with 0 ≤ gnk ↑ fn as k → ∞. The functions hnk = g1k ∨ ⋯ ∨ gnk have the same properties and are further nondecreasing in both indices. Hence,

    f ≥ lim_k hkk ≥ lim_k hnk = fn ↑ f,

and so 0 ≤ hkk ↑ f. Using the definition and monotonicity of the integral, we obtain

    μf = lim_k μhkk ≤ lim_k μfk ≤ μf. □
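Monotone convergence can be watched in action for a finite measure: with fn the dyadic approximations of Lemma 1.11, the integrals μfn increase to μf. The weights and the function √ω below are illustrative choices, not part of the text.

```python
# Monotone convergence for a finite measure on {0,...,9}.
import math

points = list(range(10))
mass = {w: 1.0 / (w + 1) for w in points}     # a finite measure
f = lambda w: math.sqrt(w)                    # a nonnegative function

def integral(g):
    return sum(g(w) * mass[w] for w in points)

def f_n(n):
    # the approximating simple functions of Lemma 1.11
    return lambda w: min(math.floor((2 ** n) * f(w)) / (2 ** n), n)

ints = [integral(f_n(n)) for n in range(1, 25)]
assert all(x <= y + 1e-12 for x, y in zip(ints, ints[1:]))  # nondecreasing
assert abs(ints[-1] - integral(f)) < 1e-5                   # converges up to mu(f)
```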

Monotone convergence leads to the following key inequality.

Lemma 1.20 (Fatou) For any measurable functions f1, f2, ... ≥ 0 on (Ω, 𝒜, μ), we have

    liminf_n μfn ≥ μ liminf_n fn.

Proof: Since fm ≥ inf_{k≥n} fk for all m ≥ n, we have

    inf_{k≥n} μfk ≥ μ inf_{k≥n} fk,    n ∈ ℕ.

Letting n → ∞, we get by Theorem 1.19

    liminf_k μfk ≥ lim_n μ inf_{k≥n} fk = μ liminf_k fk. □

A measurable function f on (Ω, 𝒜, μ) is said to be integrable if μ|f| < ∞. In that case f may be written as the difference of two nonnegative, integrable functions g and h (e.g., as f₊ − f₋, where f± = (±f) ∨ 0), and we may define μf = μg − μh. It is easy to check that the extended integral is independent of the choice of representation f = g − h and that μf satisfies the basic linearity and monotonicity properties (the former with arbitrary real coefficients). We are now ready to state the basic condition that allows us to take limits under the integral sign. For gn ≡ g the result reduces to Lebesgue's dominated convergence theorem, a key result in analysis.


Theorem 1.21 (dominated convergence, Lebesgue) Let f, f1, f2, ... and g, g1, g2, ... be measurable functions on (Ω, 𝒜, μ) with |fn| ≤ gn for all n, and such that fn → f, gn → g, and μgn → μg < ∞. Then μfn → μf.

Proof: Applying Fatou's lemma to the functions gn ± fn ≥ 0, we get

    μg + liminf_n (±μfn) = liminf_n μ(gn ± fn) ≥ μ(g ± f) = μg ± μf.

Subtracting μg < ∞ from each side gives

    μf ≤ liminf_n μfn ≤ limsup_n μfn ≤ μf. □

The next result shows how integrals are transformed by measurable mappings.

Lemma 1.22 (substitution) Consider a measure space (Ω, 𝒜, μ), a measurable space (S, 𝒮), and two measurable mappings f: Ω → S and g: S → ℝ. Then

    μ(g ∘ f) = (μ ∘ f⁻¹)g    (5)

whenever either side exists. (In other words, if one side exists, then so does the other and the two are equal.)

Proof: If g is an indicator function, then (5) reduces to the definition of μ ∘ f⁻¹. From here on we may extend by linearity and monotone convergence to any measurable function g ≥ 0. For general g it follows that μ|g ∘ f| = (μ ∘ f⁻¹)|g|, and so the integrals in (5) exist at the same time. When they do, we get (5) by taking differences on both sides. □

Turning to the other basic transformation of measures and integrals, fix any measurable function f ≥ 0 on some measure space (Ω, 𝒜, μ), and define a function f · μ on 𝒜 by

    (f · μ)A = μ(1_A f) = ∫_A f dμ,    A ∈ 𝒜,

where the last relation defines the integral over a set A. It is easy to check that ν = f · μ is again a measure on (Ω, 𝒜). Here f is referred to as the μ-density of ν. The corresponding transformation rule is as follows.

Lemma 1.23 (chain rule) For any measure space (Ω, 𝒜, μ) and measurable functions f: Ω → ℝ₊ and g: Ω → ℝ, we have

    μ(fg) = (f · μ)g

whenever either side exists.

Proof: As in the last proof, we may begin with the case when g is an indicator function and then extend in steps to the general case. □

Given a measure space (Ω, 𝒜, μ), a set A ∈ 𝒜 is said to be μ-null or simply null if μA = 0. A relation between functions on Ω is said to hold almost everywhere with respect to μ (abbreviated as a.e. μ or μ-a.e.) if it holds for all ω ∈ Ω outside some μ-null set. The following frequently used result explains the relevance of null sets.

Lemma 1.24 (null sets and functions) For any measurable function f ≥ 0 on some measure space (Ω, 𝒜, μ), we have μf = 0 iff f = 0 a.e. μ.

Proof: The statement is obvious when f is simple. In the general case, we may choose some simple measurable functions fn with 0 ≤ fn ↑ f, and note that f = 0 a.e. iff fn = 0 a.e. for every n, that is, iff μfn = 0 for all n. Here the latter integrals converge to μf, and so the last condition is equivalent to μf = 0. □

The last result shows that two integrals agree when the integrands are a.e. equal. We may then allow integrands that are undefined on some μ-null set. It is also clear that the conclusions of Theorems 1.19 and 1.21 remain valid if the hypotheses are only fulfilled outside some null set. In the other direction, we note that if two σ-finite measures μ and ν are related by ν = f · μ for some density f, then the latter is μ-a.e. unique, which justifies the notation f = dν/dμ. It is further clear that any μ-null set is also a null set for ν. For measures μ and ν with the latter property, we say that ν is absolutely continuous with respect to μ and write ν ≪ μ. The other extreme case is when μ and ν are mutually singular or orthogonal (written as μ ⊥ ν), in the sense that μA = 0 and νAᶜ = 0 for some set A ∈ 𝒜.

Given a measure space (Ω, 𝒜, μ) and a σ-field ℱ ⊂ 𝒜, we define the μ-completion of ℱ in 𝒜 as the σ-field ℱ^μ = σ(ℱ, 𝒩_μ), where 𝒩_μ denotes the class of all subsets of arbitrary μ-null sets in 𝒜. The description of ℱ^μ can be made more explicit, as follows.

Lemma 1.25 (completion) Consider a measure space (Ω, 𝒜, μ), a σ-field ℱ ⊂ 𝒜, and a Borel space (S, 𝒮). Then a function f: Ω → S is ℱ^μ-measurable iff there exists some ℱ-measurable function g satisfying f = g a.e. μ.

Proof: Beginning with indicator functions, let 𝒬 be the class of subsets A ⊂ Ω such that A △ B ∈ 𝒩_μ for some B ∈ ℱ. Then A \ B and B \ A are again in 𝒩_μ, which implies 𝒬 ⊂ ℱ^μ. Conversely, ℱ^μ ⊂ 𝒬, since both ℱ and 𝒩_μ are trivially contained in 𝒬. Combining the two relations gives 𝒬 = ℱ^μ, which shows that A ∈ ℱ^μ iff 1_A = 1_B a.e. for some B ∈ ℱ.

In the general case, we may clearly assume that S = [0, 1]. For any ℱ^μ-measurable function f, we may then choose some simple ℱ^μ-measurable functions fn such that 0 ≤ fn ↑ f. By the result for indicator functions, we may next choose some simple ℱ-measurable functions gn such that fn = gn a.e. for each n. Since a countable union of null sets is again a null set, the function g = limsup_n gn has the desired property. □

Any measure μ on (Ω, 𝒜) has a unique extension to the σ-field 𝒜^μ. Indeed, for any A ∈ 𝒜^μ there exist by Lemma 1.25 some sets A± ∈ 𝒜 with A₋ ⊂ A ⊂ A₊ and μ(A₊ \ A₋) = 0, and any extension must satisfy μA = μA±. With this choice, it is easy to check that μ remains a measure on 𝒜^μ.

Our next aims are to construct product measures and to establish the basic condition for changing the order of integration. This requires a preliminary technical lemma.

Lemma 1.26 (sections) Fix two measurable spaces (S, 𝒮) and (T, 𝒯), a measurable function f: S × T → ℝ₊, and a σ-finite measure μ on S. Then

(i) f(s, t) is 𝒮-measurable in s ∈ S for each t ∈ T;
(ii) ∫ f(s, t) μ(ds) is 𝒯-measurable in t ∈ T.

Proof: We may assume that μ is bounded. Both statements are obvious when f = 1_A with A = B × C for some B ∈ 𝒮 and C ∈ 𝒯, and they extend by a monotone-class argument to the indicator functions of arbitrary sets in 𝒮 ⊗ 𝒯. The general case follows by linearity and monotone convergence. □

We are now ready to state the main result involving product measures, commonly referred to as Fubini's theorem.

Theorem 1.27 (product measures and iterated integrals, Lebesgue, Fubini, Tonelli) For any σ-finite measure spaces (S, 𝒮, μ) and (T, 𝒯, ν), there exists a unique measure μ ⊗ ν on (S × T, 𝒮 ⊗ 𝒯) satisfying

    (μ ⊗ ν)(B × C) = μB · νC,    B ∈ 𝒮, C ∈ 𝒯.    (6)

Furthermore, for any measurable function f: S × T → ℝ₊,

    (μ ⊗ ν)f = ∫ μ(ds) ∫ f(s, t) ν(dt) = ∫ ν(dt) ∫ f(s, t) μ(ds).    (7)

The last relation remains valid for any measurable function f: S × T → ℝ with (μ ⊗ ν)|f| < ∞.

Note that the iterated integrals in (7) are well defined by Lemma 1.26, although the inner integrals νf(s, ·) and μf(·, t) may fail to exist on some null sets in S and T, respectively.

Proof: By Lemma 1.26 we may define

    (μ ⊗ ν)A = ∫ μ(ds) ∫ 1_A(s, t) ν(dt),    A ∈ 𝒮 ⊗ 𝒯,    (8)

which is clearly a measure on S × T satisfying (6). By a monotone-class argument there can be at most one such measure. In particular, (8) remains true with the order of integration reversed, which proves (7) for indicator functions f. The formula extends by linearity and monotone convergence to arbitrary measurable functions f ≥ 0.

In the general case, we note that (7) holds with f replaced by |f|. If (μ ⊗ ν)|f| < ∞, it follows that N_S = {s ∈ S; ν|f(s, ·)| = ∞} is a μ-null set in S, whereas N_T = {t ∈ T; μ|f(·, t)| = ∞} is a ν-null set in T. By Lemma 1.24 we may redefine f(s, t) to be zero when s ∈ N_S or t ∈ N_T. Then (7) follows for f by subtraction of the formulas for f₊ and f₋. □
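For finite spaces, the two iterated integrals in (7) are finite double sums, and the rectangle formula (6) can be checked directly. The weights below are an arbitrary example, not part of the text.

```python
# Fubini for finite spaces: both iterated sums give the integral of f
# against the product of two weighted counting measures.
S, T = [0, 1, 2], [0, 1]
mu = {0: 0.5, 1: 1.0, 2: 1.5}        # measure on S
nu = {0: 2.0, 1: 3.0}                # measure on T
f = lambda s, t: (s + 1) * (t + 2)   # a nonnegative function on S x T

left = sum(mu[s] * sum(f(s, t) * nu[t] for t in T) for s in S)
right = sum(nu[t] * sum(f(s, t) * mu[s] for s in S) for t in T)
assert abs(left - right) < 1e-12     # the order of integration is immaterial

# (mu x nu)(B x C) = mu(B) * nu(C) on rectangles
B, C = {0, 2}, {1}
prod_mass = sum(mu[s] * nu[t] for s in B for t in C)
assert abs(prod_mass - sum(mu[s] for s in B) * sum(nu[t] for t in C)) < 1e-12
```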


The measure p, ® v in Theorem 1.27 is called the product measure of p, and v. Iterating the construction in finitely many steps, we obtain product measures J-t1 ® · · · ® J-tn = ®k J-tk satisfying higher-dimensional versions of (7). If /-tk = p, for all k, we often write the product as p,Q?m or p,n. By a measurable group we mean a group G endowed with a er-field g such that the group operations in G are g-measurable. If /-t1, •.• , J-tn are er-finite measures on G, we may define the convolution /-t1 * ·· · * J-tn as the image of the product measure /-t1 ® ... ® /-tn on cn under the iterated group operation (xb ... , Xn) f-t x1 · · · Xn· The convolution is said tobe associative if (J-t1 * J-t2) * /-t3 = P,1 * (J-t2 * J-t3) whenever both P,1 * P,2 and J-t2 * /-t3 are er-finite and commutative if P,1 * P,2 = P,2 * /-t1· A measure p, on G is said tobe right or left invariant if p, o T;; 1 = p, for all g E G, where T 9 denotes the right or left shift x f-t xg or x f-t gx. When G is Abelian, the shift is called a translation. We may also consider spaces of the form G x S, in which case translations are defined to be mappings of the form T9 : (x, s) f-t (x + g, s). Lemma 1.28 (convolution) The convolution of er-finite measures on a measurable group (G, g) is associative, and for Abelian G it is also commutative. In the latter case,

(μ ∗ ν)B = ∫ μ(B − s) ν(ds) = ∫ ν(B − s) μ(ds),   B ∈ G.

If μ = f · λ and ν = g · λ for some invariant measure λ, then μ ∗ ν has the λ-density

(f ∗ g)(s) = ∫ f(s − t) g(t) λ(dt) = ∫ f(t) g(s − t) λ(dt),   s ∈ G.

Proof: Use Fubini's theorem. □
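For concreteness, the density formula can be checked numerically when G = ℝ under addition and λ is Lebesgue measure. The sketch below (an illustration, not part of the text; NumPy assumed) discretizes two integrable densities on a grid and verifies that the two integrals defining f ∗ g agree, i.e. that the convolution is commutative, and that the convolution of two probability densities is again a probability density.

```python
import numpy as np

# Discretize Lebesgue measure on [-10, 10) by a grid of spacing dx.
dx = 0.01
x = np.arange(-10, 10, dx)

# Two probability densities on the Abelian group (R, +).
f = np.where(x >= 0, np.exp(-x), 0.0)          # Exp(1) density
g = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)     # standard normal density

# (f*g)(s) = int f(s-t) g(t) dt, approximated by np.convolve on the grid.
fg = np.convolve(f, g, mode="same") * dx
gf = np.convolve(g, f, mode="same") * dx

# Commutativity: the two convolutions agree up to floating-point error.
err = np.max(np.abs(fg - gf))

# Total mass is preserved (up to truncation of the tails and grid error).
total_mass = np.sum(fg) * dx
```

The same grid computation also illustrates the case μ = ν of the convolution of general σ-finite measures with densities.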

Given a measure space (Ω, A, μ) and a p > 0, we write L^p = L^p(Ω, A, μ) for the class of all measurable functions f: Ω → ℝ with

‖f‖_p = (μ|f|^p)^{1/p} < ∞.

Lemma 1.29 (Hölder and Minkowski inequalities) For any measurable functions f and g on some measure space (Ω, A, μ), we have

(i) ‖fg‖_r ≤ ‖f‖_p ‖g‖_q for all p, q, r > 0 with p^{-1} + q^{-1} = r^{-1};

(ii) ‖f + g‖_p^{p∧1} ≤ ‖f‖_p^{p∧1} + ‖g‖_p^{p∧1} for all p > 0.

Proof: (i) It is clearly enough to take r = 1 and ‖f‖_p = ‖g‖_q = 1. The relation p^{-1} + q^{-1} = 1 implies (p − 1)(q − 1) = 1, and so the equations y = x^{p−1} and x = y^{q−1} are equivalent for x, y ≥ 0. By calculus,

xy ≤ ∫_0^x s^{p−1} ds + ∫_0^y t^{q−1} dt = x^p/p + y^q/q,   x, y ≥ 0,

and so

‖fg‖_1 = μ|fg| ≤ p^{-1} μ|f|^p + q^{-1} μ|g|^q = p^{-1} + q^{-1} = 1.


Foundations of Modern Probability

(ii) The relation holds for p ≤ 1 by the concavity of x^p on ℝ_+. For p > 1, we get by (i) with q = p/(p − 1) and r = 1

‖f + g‖_p^p ≤ ∫ |f| |f + g|^{p−1} dμ + ∫ |g| |f + g|^{p−1} dμ
            ≤ ‖f‖_p ‖f + g‖_p^{p−1} + ‖g‖_p ‖f + g‖_p^{p−1}. □
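Both inequalities are easy to sanity-check numerically on a finite measure space, where μ is a vector of positive weights. The following sketch (illustrative code, not from the text; NumPy assumed) verifies (i) for a pair of exponents with general r, and (ii) in both regimes p ≥ 1 and p < 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# A finite measure space: 50 points carrying positive weights (the measure mu).
w = rng.uniform(0.1, 2.0, size=50)
f = rng.normal(size=50)
g = rng.normal(size=50)

def norm(h, p):
    # ||h||_p = (mu |h|^p)^(1/p) on the weighted discrete space
    return np.sum(w * np.abs(h) ** p) ** (1.0 / p)

# Hoelder (i): ||fg||_r <= ||f||_p ||g||_q whenever 1/p + 1/q = 1/r.
p, q = 4.0, 4.0
r = 1.0 / (1.0 / p + 1.0 / q)           # here r = 2
holder_ok = norm(f * g, r) <= norm(f, p) * norm(g, q) + 1e-12

# Minkowski (ii): triangle inequality for p >= 1 ...
mink1 = norm(f + g, 2.0) <= norm(f, 2.0) + norm(g, 2.0) + 1e-12

# ... and p-th-power subadditivity for 0 < p < 1.
ps = 0.5
mink2 = norm(f + g, ps) ** ps <= norm(f, ps) ** ps + norm(g, ps) ** ps + 1e-12
```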

The inequality in (ii) is often needed in the following extended form.

Corollary 1.30 (extended Minkowski inequality) Let μ, ν, and f be such as in Theorem 1.27, and assume that μf(t) = ∫ f(s, t) μ(ds) exists for t ∈ T a.e. ν. Write ‖f‖_p(s) = (ν|f(s, ·)|^p)^{1/p}. Then

‖μf‖_p ≤ μ‖f‖_p,   p ≥ 1.

Proof: Since |μf| ≤ μ|f|, we may assume that f ≥ 0, and we may also assume that ‖μf‖_p ∈ (0, ∞). For p > 1, we get by Fubini's theorem and Hölder's inequality

‖μf‖_p^p = ν(μf)^p = ν(μf · (μf)^{p−1}) = μν(f(μf)^{p−1})
         ≤ μ(‖f‖_p ‖(μf)^{p−1}‖_q) = μ‖f‖_p · ‖μf‖_p^{p−1},

and it remains to divide by ‖μf‖_p^{p−1}. The proof for p = 1 is similar but simpler. □

In particular, Lemma 1.29 shows that ‖·‖_p becomes a norm for p ≥ 1 if we identify functions that agree a.e. For any p > 0 and f, f_1, f_2, … ∈ L^p, we write f_n → f in L^p if ‖f_n − f‖_p → 0, and say that (f_n) is Cauchy in L^p if ‖f_m − f_n‖_p → 0 as m, n → ∞.

Lemma 1.31 (completeness) Let (f_n) be a Cauchy sequence in L^p, where p > 0. Then ‖f_n − f‖_p → 0 for some f ∈ L^p.

Proof: Choose a subsequence (n_k) ⊂ ℕ with Σ_k ‖f_{n_{k+1}} − f_{n_k}‖_p^{p∧1} < ∞. By Lemma 1.29 and monotone convergence we get ‖Σ_k |f_{n_{k+1}} − f_{n_k}|‖_p^{p∧1} < ∞, and so Σ_k |f_{n_{k+1}} − f_{n_k}| < ∞ a.e. Hence, (f_{n_k}) is a.e. Cauchy in ℝ, and so Lemma 1.10 yields f_{n_k} → f a.e. for some measurable function f. By Fatou's lemma,

‖f − f_n‖_p ≤ liminf_{k→∞} ‖f_{n_k} − f_n‖_p ≤ sup_{m≥n} ‖f_m − f_n‖_p → 0,   n → ∞,

which shows that f_n → f in L^p. □

The next result gives a useful criterion for convergence in L^p.

Lemma 1.32 (L^p-convergence) For any p > 0, let f, f_1, f_2, … ∈ L^p with f_n → f a.e. Then f_n → f in L^p iff ‖f_n‖_p → ‖f‖_p.

Proof: If f_n → f in L^p, we get by Lemma 1.29

| ‖f_n‖_p^{p∧1} − ‖f‖_p^{p∧1} | ≤ ‖f_n − f‖_p^{p∧1} → 0,

and so ‖f_n‖_p → ‖f‖_p. Now assume instead the latter condition, and define

g_n = 2^p (|f_n|^p + |f|^p),   g = 2^{p+1} |f|^p.

Then g_n → g a.e. and μg_n → μg < ∞ by hypotheses. Since also |g_n| ≥ |f_n − f|^p → 0 a.e., Theorem 1.21 yields ‖f_n − f‖_p^p = μ|f_n − f|^p → 0. □

Taking p = q = 2 and r = 1 in Lemma 1.29 (i), we get the Cauchy-Buniakovsky or Schwarz inequality

‖fg‖_1 ≤ ‖f‖_2 ‖g‖_2.

In particular, we note that, for any f, g ∈ L^2, the inner product (f, g) = μ(fg) exists and satisfies |(f, g)| ≤ ‖f‖_2 ‖g‖_2. From the obvious bilinearity of the inner product, we get the parallelogram identity

‖f + g‖^2 + ‖f − g‖^2 = 2‖f‖^2 + 2‖g‖^2.   (9)

Two functions f, g ∈ L^2 are said to be orthogonal (written as f ⊥ g) if (f, g) = 0. Orthogonality between two subsets A, B ⊂ L^2 means that f ⊥ g for all f ∈ A and g ∈ B. A subspace M ⊂ L^2 is said to be linear if af + bg ∈ M for any f, g ∈ M and a, b ∈ ℝ, and closed if f ∈ M whenever f is the L^2-limit of a sequence in M.

Theorem 1.33 (orthogonal projection) Let M be a closed linear subspace of L^2. Then any function f ∈ L^2 has an a.e. unique decomposition f = g + h with g ∈ M and h ⊥ M.

Proof: Fix any f ∈ L^2, and define d = inf{‖f − g‖; g ∈ M}. Choose g_1, g_2, … ∈ M with ‖f − g_n‖ → d. Using the linearity of M, the definition of d, and (9), we get as m, n → ∞

4d^2 + ‖g_m − g_n‖^2 ≤ ‖2f − g_m − g_n‖^2 + ‖g_m − g_n‖^2
                     = 2‖f − g_m‖^2 + 2‖f − g_n‖^2 → 4d^2.

Thus, ‖g_m − g_n‖ → 0, and so the sequence (g_n) is Cauchy in L^2. By Lemma 1.31 it converges toward some g ∈ L^2, and since M is closed we have g ∈ M. Noting that h = f − g has norm d, we get for any l ∈ M and t ∈ ℝ

d^2 ≤ ‖h + tl‖^2 = d^2 + 2t(h, l) + t^2 ‖l‖^2,

which implies (h, l) = 0. Hence, h ⊥ M, as required.

To prove the uniqueness, let g′ + h′ be another decomposition with the stated properties. Then g − g′ ∈ M and also g − g′ = h′ − h ⊥ M, so g − g′ ⊥ g − g′, which implies ‖g − g′‖^2 = (g − g′, g − g′) = 0, and hence g = g′ a.e. □

We proceed with a basic approximation property of sets.
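Before turning to sets, note that Theorem 1.33 is fully explicit when μ is counting measure on finitely many points, so that L^2 is ℝ^n with the usual inner product and a closed linear subspace M is the span of finitely many vectors. The decomposition f = g + h is then computed by least squares; a sketch (illustrative, not from the text; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)

# L^2 over counting measure on n points is R^n with the usual inner product.
n = 20
M_basis = rng.normal(size=(n, 3))   # M = span of three vectors, a closed subspace
f = rng.normal(size=n)

# g = projection of f onto M, computed by least squares; h = f - g.
coef, *_ = np.linalg.lstsq(M_basis, f, rcond=None)
g = M_basis @ coef
h = f - g

# h is orthogonal to M: (h, b) = 0 for every basis vector b of M.
orth = np.max(np.abs(M_basis.T @ h))

# The decomposition recovers f exactly.
recon = np.max(np.abs(g + h - f))
```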


Lemma 1.34 (regularity) Let μ be a bounded measure on some metric space S with Borel σ-field S. Then

μB = sup_{F ⊂ B} μF = inf_{G ⊃ B} μG,   B ∈ S,

with F and G restricted to the classes of closed and open subsets of S, respectively.

Proof: For any open set G there exist some closed sets F_n ↑ G, and by Lemma 1.14 we get μF_n ↑ μG. This proves the statement for B belonging to the π-system G of all open sets. Letting D denote the class of all sets B with the stated property, we further note that D is a λ-system. Hence, Theorem 1.1 shows that D ⊃ σ(G) = S. □

The last result leads to a basic approximation property for functions.

Lemma 1.35 (approximation) Given a metric space S with Borel σ-field S, a bounded measure μ on (S, S), and a constant p > 0, the set of bounded, continuous functions on S is dense in L^p(S, S, μ). Thus, for any f ∈ L^p there exist some bounded, continuous functions f_1, f_2, …: S → ℝ with ‖f_n − f‖_p → 0.

Proof: If f = 1_A with A ⊂ S open, we may choose some continuous functions f_n with 0 ≤ f_n ↑ f, and then ‖f_n − f‖_p → 0 by dominated convergence. By Lemma 1.34 the result remains true for arbitrary A ∈ S. The further extension to simple measurable functions is immediate. For general f ∈ L^p we may choose some simple measurable functions f_n → f with |f_n| ≤ |f|. Since |f_n − f|^p ≤ 2^{p+1} |f|^p, we get ‖f_n − f‖_p → 0 by dominated convergence. □
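The first step of the proof is easy to visualize on S = [0, 1] with Lebesgue measure: the indicator of an interval is approximated in L^p by the continuous functions f_n(x) = (1 − n · dist(x, A))_+. A numerical sketch (illustrative, not from the text; NumPy assumed):

```python
import numpy as np

# Approximate f = 1_{[0.2, 0.5]} in L^1(lambda) on [0,1] by the continuous
# functions f_n(x) = max(0, 1 - n * dist(x, [0.2, 0.5])).
grid = np.linspace(0.0, 1.0, 100001)
dx = grid[1] - grid[0]
f = ((grid >= 0.2) & (grid <= 0.5)).astype(float)

def dist_to_interval(x, a, b):
    # distance from x to the interval [a, b]
    return np.maximum(np.maximum(a - x, x - b), 0.0)

def lp_error(n, p=1.0):
    fn = np.maximum(0.0, 1.0 - n * dist_to_interval(grid, 0.2, 0.5))
    return (np.sum(np.abs(fn - f) ** p) * dx) ** (1.0 / p)

errors = [lp_error(n) for n in (10, 100, 1000)]
# The L^1 error is the area of the two triangular ramps, roughly 1/n.
```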

The next result shows how the pointwise convergence of a sequence of measurable functions is almost uniform.

Lemma 1.36 (near uniformity, Egorov) Let f, f_1, f_2, … be measurable functions on some finite measure space (Ω, A, μ) such that f_n → f on Ω. Then for any ε > 0 there exists some A ∈ A with μA^c < ε such that f_n → f uniformly on A.

Proof: Define

A_{m,n} = ⋂_{k≥n} {x ∈ Ω; |f_k(x) − f(x)| < m^{-1}},   m, n ∈ ℕ.

As n → ∞ for fixed m, we have A_{m,n} ↑ Ω and hence μA_{m,n}^c → 0. Given any ε > 0, we may then choose n_1, n_2, … ∈ ℕ so large that μA_{m,n_m}^c < ε 2^{-m} for all m. Letting A = ⋂_m A_{m,n_m}, we get

μA^c ≤ μ ⋃_m A_{m,n_m}^c < ε Σ_m 2^{-m} = ε,

and we note that f_n → f uniformly on A. □
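A standard illustration of Egorov's lemma (not from the text) is f_n(x) = x^n on [0, 1) with Lebesgue measure: f_n → 0 pointwise but not uniformly, while removing a set of measure ε restores uniformity. A minimal numerical sketch, with NumPy assumed:

```python
import numpy as np

# f_n(x) = x^n -> 0 for every x in [0, 1), but sup over [0, 1) stays 1.
# Removing (1 - eps, 1), of Lebesgue measure eps, makes convergence uniform:
# sup_{x <= 1 - eps} x^n = (1 - eps)^n -> 0.
eps = 0.01
x = np.linspace(0.0, 1.0 - eps, 10001)   # the set A of Egorov's lemma

sup_on_A = [float(np.max(x ** n)) for n in (1, 10, 100, 1000)]
# The suprema (1 - eps)^n decrease geometrically to 0.
```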


The last two results may be combined to show that every measurable function is almost continuous.

Lemma 1.37 (near continuity, Lusin) Let f be a measurable function on some compact metric space S with Borel σ-field S and a bounded measure μ. Then there exist some continuous functions f_1, f_2, … on S such that μ{x; f_n(x) ≠ f(x)} → 0.

Proof: We may clearly assume that f is bounded. By Lemma 1.35 we may choose some continuous functions g_1, g_2, … on S such that μ|g_k − f| ≤ 2^{-k}. By Fubini's theorem, we get

μ Σ_k |g_k − f| = Σ_k μ|g_k − f| ≤ Σ_k 2^{-k} < ∞,

and so Σ_k |g_k − f| < ∞ a.e., which implies g_k → f a.e. By Lemma 1.36, we may next choose A_1, A_2, … ∈ S with μA_n^c → 0 such that the convergence is uniform on each A_n. Since each g_k is uniformly continuous on S, we conclude that f is uniformly continuous on each A_n. By Tietze's extension theorem, the restriction f|_{A_n} then admits a continuous extension f_n to S. □

For any measurable space (S, S), we may introduce the class M(S) of σ-finite measures on S. The set M(S) becomes a measurable space in its own right when endowed with the σ-field induced by the mappings π_B: μ ↦ μB, B ∈ S. Note in particular that the class P(S) of probability measures on S is a measurable subset of M(S). In the next two lemmas we state some less obvious measurability properties, which will be needed in subsequent chapters.

Lemma 1.38 (measurability of products) For any measurable spaces (S, S) and (T, T), the mapping (μ, ν) ↦ μ ⊗ ν is measurable from P(S) × P(T) to P(S × T).

Proof: Note that (μ ⊗ ν)A is measurable whenever A = B × C with B ∈ S and C ∈ T, and extend by a monotone class argument. □

In the context of separable metric spaces S, we assume the measures μ ∈ M(S) to be locally finite, in the sense that μB < ∞ for any bounded Borel set B.

Lemma 1.39 (diffuse and atomic parts) For any separable metric space S,

(i) the set D ⊂ M(S) of degenerate measures on S is measurable;

(ii) the diffuse and purely atomic components μ_d and μ_a are measurable functions of μ ∈ M(S).


Proof: (i) Choose a countable topological base B_1, B_2, … in S, and define I = {(i, j); B_i ∩ B_j = ∅}. Then, clearly,

D = {μ ∈ M(S); Σ_{(i,j)∈I} (μB_i)(μB_j) = 0}.

(ii) Choose a nested sequence of countable partitions B_n of S into Borel sets of diameter less than n^{-1}. For any ε > 0 and n ∈ ℕ we introduce the sets

U_ε^n = ⋃{B ∈ B_n; μB ≥ ε},   U_ε = {s ∈ S; μ{s} ≥ ε},   U = {s ∈ S; μ{s} > 0}.

It is easily seen that U_ε^n ↓ U_ε as n → ∞ and U_ε ↑ U as ε → 0. By dominated convergence, the restrictions μ_ε^n = μ(U_ε^n ∩ ·) and μ_ε = μ(U_ε ∩ ·) satisfy locally μ_ε^n ↓ μ_ε and μ_ε ↑ μ_a. Since μ_ε^n is clearly a measurable function of μ, the asserted measurability of μ_a and μ_d now follows by Lemma 1.10. □

Given two measurable spaces (S, S) and (T, T), a mapping μ: S × T → ℝ̄_+ is called a (probability) kernel from S to T if the function μ_s B = μ(s, B) is S-measurable in s ∈ S for fixed B ∈ T and a (probability) measure in B ∈ T for fixed s ∈ S. Any kernel μ determines an associated operator that maps suitable functions f: T → ℝ into their integrals μf(s) = ∫ μ(s, dt) f(t). Kernels play an important role in probability theory, where they may appear in the guises of random measures, conditional distributions, Markov transition functions, and potentials. The following characterizations of the kernel property are often useful. For simplicity we restrict our attention to probability kernels.

Lemma 1.40 (kernels) Fix two measurable spaces (S, S) and (T, T), a π-system C with σ(C) = T, and a family μ = {μ_s; s ∈ S} of probability measures on T. Then these conditions are equivalent:

(i) μ is a probability kernel from S to T;

(ii) μ is a measurable mapping from S to P(T);

(iii) s ↦ μ_s B is a measurable mapping from S to [0, 1] for every B ∈ C.

Proof: Since π_B: μ ↦ μB is measurable on P(T) for every B ∈ T, condition (ii) implies (iii) by Lemma 1.7. Furthermore, (iii) implies (i) by a straightforward application of Theorem 1.1. Finally, under (i) we have μ^{-1} π_B^{-1} [0, x] ∈ S for all B ∈ T and x ≥ 0, and (ii) follows by Lemma 1.4. □

Let us now introduce a third measurable space (U, U), and consider two kernels μ and ν, one from S to T and the other from S × T to U. Imitating the construction of product measures, we may attempt to combine μ and ν into a kernel μ ⊗ ν from S to T × U given by

(μ ⊗ ν)(s, B) = ∫ μ(s, dt) ∫ ν(s, t, du) 1_B(t, u),   B ∈ T ⊗ U.

The following lemma justifies the formula and provides some further useful information.


Lemma 1.41 (kernels and functions) Fix three measurable spaces (S, S), (T, T), and (U, U). Let μ and ν be probability kernels from S to T and from S × T to U, respectively, and consider two measurable functions f: S × T → ℝ_+ and g: S × T → U. Then

(i) μ_s f(s, ·) is a measurable function of s ∈ S;

(ii) μ_s ∘ (g(s, ·))^{-1} is a kernel from S to U;

(iii) μ ⊗ ν is a kernel from S to T × U.

Proof: Assertion (i) is obvious when f is the indicator function of a set A = B × C with B ∈ S and C ∈ T. From here on, we may extend to general A ∈ S ⊗ T by a monotone class argument and then to arbitrary f by linearity and monotone convergence. The statements in (ii) and (iii) are easy consequences. □

For any measurable function f ≥ 0 on T × U, we get as in Theorem 1.27

(μ ⊗ ν)_s f = ∫ μ(s, dt) ∫ ν(s, t, du) f(t, u),   s ∈ S,

or simply (μ ⊗ ν)f = μ(νf). By iteration we may combine any kernels μ_k from S_0 × ··· × S_{k−1} to S_k, k = 1, …, n, into a kernel μ_1 ⊗ ··· ⊗ μ_n from S_0 to S_1 × ··· × S_n, given by

(μ_1 ⊗ ··· ⊗ μ_n)f = μ_1(μ_2(··· (μ_n f) ···))

for any measurable function f ≥ 0 on S_1 × ··· × S_n. In applications we may often encounter kernels μ_k from S_{k−1} to S_k, k = 1, …, n, in which case the composition μ_1 ··· μ_n is defined as a kernel from S_0 to S_n, given for measurable B ⊂ S_n by

(μ_1 ··· μ_n)_s B = (μ_1 ⊗ ··· ⊗ μ_n)_s (S_1 × ··· × S_{n−1} × B)
                 = ∫ μ_1(s, ds_1) ∫ μ_2(s_1, ds_2) ··· ∫ μ_{n−1}(s_{n−2}, ds_{n−1}) μ_n(s_{n−1}, B).
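When all the spaces S_k are finite, a probability kernel is just a row-stochastic matrix and the composition formula above reduces to matrix multiplication (the discrete Chapman-Kolmogorov relation). A small sketch, not from the text, with NumPy assumed:

```python
import numpy as np

# On finite spaces, a probability kernel from S to T is a row-stochastic
# matrix: mu[s, t] = mu(s, {t}).  Kernel composition is matrix multiplication.
rng = np.random.default_rng(2)

def random_kernel(m, n):
    k = rng.uniform(size=(m, n))
    return k / k.sum(axis=1, keepdims=True)   # each row is a probability measure

mu1 = random_kernel(4, 5)   # kernel from S0 (4 points) to S1 (5 points)
mu2 = random_kernel(5, 3)   # kernel from S1 to S2 (3 points)

# (mu1 mu2)_s B = sum_t mu1(s, {t}) mu2(t, B)
comp = mu1 @ mu2

# The composition is again a kernel from S0 to S2: rows sum to one.
row_sums = comp.sum(axis=1)
```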

Exercises

1. Prove the triangle inequality μ(A △ C) ≤ μ(A △ B) + μ(B △ C). (Hint: Note that 1_{A△B} = |1_A − 1_B|.)

2. Show that Lemma 1.9 is false for uncountable index sets. (Hint: Show that every measurable set depends on countably many coordinates.)

3. For any space S, let μA denote the cardinality of the set A ⊂ S. Show that μ is a measure on (S, 2^S).

4. Let K be the class of compact subsets of some metric space S, and let μ be a bounded measure such that inf_{K∈K} μK^c = 0. Show for any B ∈ B(S) that μB = sup{μK; K ∈ K, K ⊂ B}.

5. Show that any absolutely convergent series can be written as an integral with respect to counting measure on ℕ. State series versions of Fatou's lemma and the dominated convergence theorem, and give direct elementary proofs.

6. Give an example of integrable functions f, f_1, f_2, … on some probability space (Ω, A, μ) such that f_n → f but μf_n does not converge to μf.

7. Fix two σ-finite measures μ and ν on some measurable space (Ω, F) with sub-σ-field G. Show that if μ ≪ ν holds on F, it is also true on G. Further show by an example that the converse may fail.

8. Fix two measurable spaces (S, S) and (T, T), a measurable function f: S → T, and a measure μ on S with image ν = μ ∘ f^{-1}. Show that f remains measurable w.r.t. the completions S^μ and T^ν.

9. Fix a measure space (S, S, μ) and a σ-field T ⊂ S, let S^μ denote the μ-completion of S, and let T^μ be the σ-field generated by T and the μ-null sets of S^μ. Show that A ∈ T^μ iff there exist some B ∈ T and N ∈ S^μ with A △ B ⊂ N and μN = 0. Also, show by an example that T^μ may be strictly greater than the μ-completion of T.

10. State Fubini's theorem for the case where μ is any σ-finite measure and ν is the counting measure on ℕ. Give a direct proof of this result.

11. Let f_1, f_2, … be μ-integrable functions on some measurable space S such that g = Σ_k f_k exists a.e., and put g_n = Σ_{k≤n} f_k. Compare with the result of the preceding exercise.

12. Extend Theorem 1.27 to the product of n measures.

13. Let λ denote Lebesgue measure on ℝ_+, and fix any p > 0. Show that the class of step functions with bounded support and finitely many jumps is dense in L^p(λ). Generalize to ℝ_+^d.

14. Let M ⊃ N be closed linear subspaces of L^2. Show that if f ∈ L^2 has projections g onto M and h onto N, then g has projection h onto N.

15. Let M be a closed linear subspace of L^2, and let f, g ∈ L^2 with M-projections f̂ and ĝ. Show that (f̂, g) = (f, ĝ) = (f̂, ĝ).

16. Let μ_1, μ_2, … be kernels between two measurable spaces S and T. Show that the function μ = Σ_n μ_n is again a kernel.

17. Fix a function f between two measurable spaces S and T, and define μ(s, B) = 1_B ∘ f(s). Show that μ is a kernel iff f is measurable.

18. Show that if μ ≪ ν and νf = 0 with f ≥ 0, then also μf = 0. (Hint: Use Lemma 1.24.)

19. For any σ-finite measures μ_1 ≪ μ_2 and ν_1 ≪ ν_2, show that μ_1 ⊗ ν_1 ≪ μ_2 ⊗ ν_2. (Hint: Use Fubini's theorem and Lemma 1.24.)

Chapter 2

Measure Theory - Key Results

Outer measures and extension; Lebesgue and Lebesgue-Stieltjes measures; Jordan-Hahn and Lebesgue decompositions; Radon-Nikodym theorem; Lebesgue's differentiation theorem; functions of finite variation; Riesz' representation theorem; Haar and invariant measures

We continue our introduction to measure theory with a detailed discussion of some basic results of the subject, all of special relevance to probability theory. Again the hurried or impatient reader may skip to the next chapter and return for reference when need arises.

Most important, by far, of the quoted results is the existence of Lebesgue measure, which lies at the heart of most probabilistic constructions, often via a use of the Daniell-Kolmogorov theorem of Chapter 6. A similar role is played by the construction of Haar and other invariant measures, which ensures the existence of uniform distributions or homogeneous Poisson processes on spheres and other manifolds. Other key results include Riesz' representation theorem, which will enable us in Chapter 19 to construct Markov processes with a given generator, via the resolvents and the associated semigroup of transition operators. We may also mention the Radon-Nikodym theorem, of relevance to the theory of conditioning in Chapter 6, Lebesgue's differentiation theorem, instrumental for proving the general ballot theorem in Chapter 11, and various results on functions of bounded variation, important for the theory of predictable processes and general semimartingales in Chapters 25 and 26.

We begin with an ingenious technical result that will play a crucial role for our construction of Lebesgue measure in Theorem 2.2 and for the proof of Riesz' representation Theorem 2.22. By an outer measure on a space Ω we mean a nondecreasing and countably subadditive set function μ: 2^Ω → ℝ̄_+ with μ∅ = 0. Given an outer measure μ on Ω, we say that a set A ⊂ Ω is μ-measurable if

μE = μ(E ∩ A) + μ(E ∩ A^c),   E ⊂ Ω.   (1)

Note that the inequality ≤ holds automatically by subadditivity. The following result gives the basic construction of measures from outer measures.


Theorem 2.1 (restriction of outer measure, Carathéodory) Let μ be an outer measure on Ω, and write A for the class of μ-measurable sets. Then A is a σ-field, and the restriction of μ to A is a measure.

Proof: Since μ∅ = 0, we have for any set E ⊂ Ω

μ(E ∩ Ω) + μ(E ∩ ∅) = μE + μ∅ = μE,

which shows that Ω ∈ A. Also note that trivially A ∈ A implies A^c ∈ A. Next assume that A, B ∈ A. Using (1) for A and B together with the subadditivity of μ, we get for any E ⊂ Ω

μE = μ(E ∩ A) + μ(E ∩ A^c)
   = μ(E ∩ A ∩ B) + μ(E ∩ A ∩ B^c) + μ(E ∩ A^c)
   ≥ μ(E ∩ (A ∩ B)) + μ(E ∩ (A ∩ B)^c),

which shows that even A ∩ B ∈ A. It follows easily that A is a field. If A, B ∈ A are disjoint, we also get by (1) for any E ⊂ Ω

μ(E ∩ (A ∪ B)) = μ(E ∩ (A ∪ B) ∩ A) + μ(E ∩ (A ∪ B) ∩ A^c)
              = μ(E ∩ A) + μ(E ∩ B).   (2)

Finally, consider any disjoint sets A_1, A_2, … ∈ A, and put U_n = ⋃_{k≤n} A_k and U = ⋃_k A_k. Using (2) repeatedly, together with the monotonicity and countable subadditivity of μ, we get for any E ⊂ Ω

μE = μ(E ∩ U_n) + μ(E ∩ U_n^c)
   ≥ Σ_{k≤n} μ(E ∩ A_k) + μ(E ∩ U^c)
   → Σ_k μ(E ∩ A_k) + μ(E ∩ U^c)
   ≥ μ(E ∩ U) + μ(E ∩ U^c),

which shows that U ∈ A. Thus, A is a σ-field. Taking E = U in the last calculation also gives μU = Σ_k μA_k, so the restriction of μ to A is countably additive and hence a measure. □
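A minimal illustration of condition (1) (a hypothetical toy example, not from the text): on a two-point space, the outer measure assigning mass 1 to every nonempty set is nondecreasing and subadditive, yet the singletons fail the Carathéodory condition, so here A consists only of ∅ and Ω.

```python
from itertools import chain, combinations

# A tiny outer measure on Omega = {1, 2}: mu*(empty) = 0, and mu*E = 1 otherwise.
omega = frozenset({1, 2})

def mu_star(E):
    return 0.0 if not E else 1.0

def caratheodory_measurable(A):
    # Check condition (1) against every subset E of Omega.
    subsets = chain.from_iterable(combinations(omega, r) for r in range(3))
    return all(
        abs(mu_star(E) - (mu_star(E & A) + mu_star(E - A))) < 1e-12
        for E in (frozenset(t) for t in subsets)
    )

trivial_ok = caratheodory_measurable(frozenset()) and caratheodory_measurable(omega)
# For A = {1} and E = Omega: mu*E = 1 but mu*(E & A) + mu*(E - A) = 2.
singleton_fails = not caratheodory_measurable(frozenset({1}))
```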

We are now ready to introduce Lebesgue measure λ on ℝ. The length of an interval I ⊂ ℝ is denoted by |I|.

Theorem 2.2 (Lebesgue measure, Borel) There exists a unique measure λ on (ℝ, B) such that λI = |I| for every interval I ⊂ ℝ.

As a first step in the proof, we show that the length |I| of intervals I ⊂ ℝ admits an extension to an outer measure on ℝ. Then define

λA = inf Σ_k |I_k|,   A ⊂ ℝ,   (4)


where the infimum extends over all countable covers of A by open intervals I_1, I_2, …. We show that (4) provides the desired extension.

Lemma 2.3 (outer Lebesgue measure) The function λ in (4) is an outer measure on ℝ. Moreover, λI = |I| for every interval I.

Proof: The set function λ is clearly nonnegative and nondecreasing with λ∅ = 0. To prove the countable subadditivity, let A_1, A_2, … ⊂ ℝ be arbitrary. For any ε > 0 and n ∈ ℕ, we may choose some open intervals I_{n1}, I_{n2}, … such that

A_n ⊂ ⋃_k I_{nk},   Σ_k |I_{nk}| ≤ λA_n + ε 2^{-n},   n ∈ ℕ.

Then ⋃_n A_n ⊂ ⋃_n ⋃_k I_{nk}, and so

λ ⋃_n A_n ≤ Σ_n Σ_k |I_{nk}| ≤ Σ_n λA_n + ε,

and the desired relation follows as we let ε → 0.

To prove the second assertion, we may assume that I = [a, b] for some finite numbers a < b. Since I ⊂ (a − ε, b + ε) for every ε > 0, we get λI ≤ |I| + 2ε, and so λI ≤ |I|. To obtain the reverse relation, we need to prove that if I ⊂ ⋃_k I_k for some open intervals I_1, I_2, …, then |I| ≤ Σ_k |I_k|. By the Heine-Borel theorem, I remains covered by finitely many intervals I_1, …, I_n, and it suffices to show that |I| ≤ Σ_{k≤n} |I_k|, which follows by an elementary induction on n. □

Lemma 2.4 (measurability of intervals) Every interval I = (−∞, a] with a ∈ ℝ is λ-measurable.

Proof: Fix any E ⊂ ℝ. For any ε > 0, we may cover E by some open intervals I_1, I_2, … such that λE ≥ Σ_n |I_n| − ε. Writing I = (−∞, a]


and using the subadditivity of λ and Lemma 2.3, we get

λE + ε ≥ Σ_n |I_n| = Σ_n |I_n ∩ I| + Σ_n |I_n ∩ I^c|
       ≥ Σ_n λ(I_n ∩ I) + Σ_n λ(I_n ∩ I^c)
       ≥ λ(E ∩ I) + λ(E ∩ I^c).

Since ε was arbitrary, it follows that I is λ-measurable. □

Proof of Theorem 2.2: Define λ as in (4). Then Lemma 2.3 shows that λ is an outer measure such that λI = |I| for every interval I. Furthermore, Theorem 2.1 shows that λ is a measure on the σ-field A of all λ-measurable sets. Finally, Lemma 2.4 shows that A contains all intervals (−∞, a] with a ∈ ℝ. Since the latter sets generate the Borel σ-field B, we have B ⊂ A.

To prove the uniqueness, consider any measure μ with the stated properties, and put I_n = [−n, n] for n ∈ ℕ. Using Lemma 1.17 with C equal to the set of intervals, we see that

λ(B ∩ I_n) = μ(B ∩ I_n),   B ∈ B, n ∈ ℕ.

Letting n → ∞ and using Lemma 1.14, we get λB = μB for all B ∈ B, as required. □

Before proceeding to a more detailed study of Lebesgue measure, we state an abstract extension theorem that can be proved by essentially the same arguments. Here a nonempty class I of subsets of a space Ω is called a semiring if for any I, J ∈ I we have I ∩ J ∈ I and the set I ∩ J^c can be written as a union of finitely many disjoint sets I_1, …, I_n ∈ I.

Theorem 2.5 (extension, Carathéodory) Let μ be a finitely additive and countably subadditive set function on a semiring I such that μ∅ = 0. Then μ extends to a measure on σ(I).

Proof: Define a set function μ* on 2^Ω by

μ*A = inf Σ_k μI_k,   A ⊂ Ω,

where the infimum extends over all covers of A by sets I_1, I_2, … ∈ I. Let μ*A = ∞ when no such cover exists. Proceeding as in the proof of Lemma 2.3, we see that μ* is an outer measure on Ω. To check that μ* extends μ, fix any I ∈ I, and consider an arbitrary cover I_1, I_2, … ∈ I of I. Using both the subadditivity and the finite additivity of μ, we get

μ*I ≤ μI ≤ Σ_k μ(I ∩ I_k) ≤ Σ_k μI_k,

which implies μ*I = μI. By Theorem 2.1, it remains to show that every set I ∈ I is μ*-measurable. Then let A ⊂ Ω be covered by some sets I_1, I_2, … ∈ I with μ*A ≥ Σ_k μI_k − ε, and proceed as in the proof of Lemma 2.4, noting that I_n ∩ I^c is a finite disjoint union of some sets I_{nj} ∈ I, and therefore μ(I_n ∩ I^c) = Σ_j μI_{nj} by the finite additivity of μ. □


Using Theorem 1.27, we may construct the product measure λ^d = λ ⊗ ··· ⊗ λ on ℝ^d for every d ∈ ℕ. We call λ^d the d-dimensional Lebesgue measure. Note that λ^d generalizes the ordinary notion of area (when d = 2) or volume (when d ≥ 3). The following result shows that λ^d is invariant under arbitrary translations (or shifts) and rotations. We shall also see that the shift invariance characterizes λ^d up to a constant factor.

Theorem 2.6 (invariance of Lebesgue measure) Fix any measurable space (S, S) and a measure μ on ℝ^d × S with σ-finite projection ν = μ((0, 1]^d × ·) onto S. Then μ is invariant under shifts in ℝ^d iff μ = λ^d ⊗ ν, in which case μ remains invariant under arbitrary rigid motions of ℝ^d.

Proof: First assume that μ is invariant under shifts in ℝ^d. Let I denote the class of intervals I = (a, b] with rational endpoints, and note that for any I_1, …, I_d ∈ I and C ∈ S with νC < ∞,

μ(I_1 × ··· × I_d × C) = |I_1| ··· |I_d| νC = (λ^d ⊗ ν)(I_1 × ··· × I_d × C).

For fixed I_2, …, I_d and C, the relation extends by monotonicity to arbitrary intervals I_1 and then, by the uniqueness in Theorem 2.2, to any B_1 ∈ B. Proceeding recursively in d steps, we get for arbitrary B_1, …, B_d ∈ B

μ(B_1 × ··· × B_d × C) = (λ^d ⊗ ν)(B_1 × ··· × B_d × C),

and so μ = λ^d ⊗ ν by the uniqueness in Theorem 1.27.

Conversely, let μ = λ^d ⊗ ν. For any h = (h_1, …, h_d) ∈ ℝ^d, we define the shift operator T_h: ℝ^d → ℝ^d by T_h x = x + h for all x ∈ ℝ^d. For any intervals I_1, …, I_d and sets C ∈ S, we have

μ(I_1 × ··· × I_d × C) = |I_1| ··· |I_d| νC = μ ∘ T_h^{-1}(I_1 × ··· × I_d × C),

where T_h(x, s) = (x + h, s). As before, it follows that μ = μ ∘ T_h^{-1}.

It remains to show that μ is invariant under arbitrary orthogonal transformations P on ℝ^d. Then note that, for any x, h ∈ ℝ^d,

Px + h = P(x + P^{-1}h) = P(x + h′) = P T_{h′} x,

where h′ = P^{-1}h. Since μ is shift-invariant, we obtain

μ ∘ P^{-1} ∘ T_h^{-1} = μ ∘ T_{h′}^{-1} ∘ P^{-1} = μ ∘ P^{-1},

where P(x, s) = (Px, s). Thus, even μ ∘ P^{-1} is shift-invariant and hence of the form λ^d ⊗ ν′. Writing B for the unit ball in ℝ^d, we get for any C ∈ S

λ^d B · ν′C = μ ∘ P^{-1}(B × C) = μ(P^{-1}B × C) = μ(B × C) = λ^d B · νC.

Dividing by λ^d B yields ν′C = νC. Hence, ν′ = ν, and so μ ∘ P^{-1} = μ. □
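The rotation invariance in Theorem 2.6 can be checked numerically for d = 2 by counting grid points (an illustrative sketch, not from the text; NumPy assumed): the image of R = [0, 1] × [0, 2] under a rotation P should again have area λ²R = 2.

```python
import numpy as np

theta = np.pi / 6                         # rotation P by 30 degrees
c, s = np.cos(theta), np.sin(theta)

h = 0.002                                 # grid spacing; each cell has area h*h
xs = np.arange(-1.2, 1.0, h)
ys = np.arange(-0.1, 2.4, h)
X, Y = np.meshgrid(xs, ys)

# A point lies in P(R), R = [0,1] x [0,2], iff its preimage P^{-1}(x,y) lies in R.
U = c * X + s * Y                         # first coordinate of P^{-1}(X, Y)
V = -s * X + c * Y                        # second coordinate
inside = (U >= 0) & (U <= 1) & (V >= 0) & (V <= 2)

area_rotated = inside.sum() * h * h       # grid estimate of lambda^2(P(R))
```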


We proceed to show that integrable functions on ℝ^d are continuous in a specified average sense.

Lemma 2.7 (mean continuity) Let f be a measurable function on ℝ^d with λ^d|f| < ∞. Then

lim_{h→0} ∫ |f(x + h) − f(x)| dx = 0.

Proof: By Lemma 1.35 and a simple truncation, we may choose some continuous functions f_1, f_2, … with bounded supports such that λ^d|f_n − f| → 0. By the triangle inequality, we get for n ∈ ℕ and h ∈ ℝ^d

∫ |f(x + h) − f(x)| dx ≤ ∫ |f_n(x + h) − f_n(x)| dx + 2 λ^d|f_n − f|.

Since the f_n are bounded, the right-hand side tends to 0 by dominated convergence as h → 0 and then n → ∞. □

By a bounded signed measure on a measurable space (Ω, A) we mean a bounded function ν: A → ℝ such that ν ⋃_n A_n = Σ_n νA_n for any disjoint sets A_1, A_2, … ∈ A, where the series converges absolutely. We say that two measures μ and ν on (Ω, A) are (mutually) singular or orthogonal and write μ ⊥ ν if there exists some set A ∈ A with μA = νA^c = 0. Note that this A may not be unique. The following result gives the basic decomposition of a signed measure into positive components.

Theorem 2.8 (Hahn decomposition) Any bounded signed measure ν can be written uniquely as a difference of two bounded, nonnegative, and mutually singular measures ν_+ and ν_-.

Proof: Put c = sup{νA; A ∈ A}, and note that, if A, A′ ∈ A with νA ≥ c − ε and νA′ ≥ c − ε′, then

ν(A ∪ A′) = νA + νA′ − ν(A ∩ A′) ≥ (c − ε) + (c − ε′) − c = c − ε − ε′.

Choosing A_1, A_2, … ∈ A with νA_n ≥ c − 2^{-n}, we get by iteration and countable additivity

ν ⋃_{k>n} A_k ≥ c − Σ_{k>n} 2^{-k} = c − 2^{-n},   n ∈ ℕ.

Define A_+ = ⋂_n ⋃_{k>n} A_k and A_- = A_+^c. Using the countable additivity again, we get νA_+ = c. Hence, for sets B ∈ A,

νB = νA_+ − ν(A_+ \ B) ≥ 0,   B ⊂ A_+,
νB = ν(A_+ ∪ B) − νA_+ ≤ 0,   B ⊂ A_-.

We may then define some measures ν_+ and ν_- by

ν_+ B = ν(B ∩ A_+),   ν_- B = −ν(B ∩ A_-),   B ∈ A.

To prove the uniqueness, assume also that ν = μ_+ − μ_- for some positive measures μ_+ ⊥ μ_-. Choose a set B_+ ∈ A with μ_- B_+ = μ_+ B_+^c = 0. Then ν is both positive and negative on the sets A_+ \ B_+ and B_+ \ A_+, and therefore ν = 0 on A_+ △ B_+. Hence, for any C ∈ A,

μ_+ C = μ_+(B_+ ∩ C) = ν(B_+ ∩ C) = ν(A_+ ∩ C) = ν_+ C,

which shows that μ_+ = ν_+. Then also

μ_- = μ_+ − ν = ν_+ − ν = ν_-. □

The last result can be used to construct the maximum μ ∨ ν and minimum μ ∧ ν of two σ-finite measures μ and ν.

Corollary 2.9 (maximum and minimum) For any σ-finite measures μ and ν on a common measurable space, there exists a largest measure μ ∧ ν bounded by μ and ν and a smallest measure μ ∨ ν bounding μ and ν. Furthermore,

μ − μ∧ν ⊥ ν − μ∧ν,   μ∧ν + μ∨ν = μ + ν.

Proof: We may assume that μ and ν are bounded. Letting ρ_+ − ρ_- be the Hahn decomposition of μ − ν, we put

μ ∧ ν = μ − ρ_+,   μ ∨ ν = μ + ρ_-. □
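On a finite space the Hahn decomposition is completely explicit: A_+ simply collects the points of positive mass. A toy sketch (illustrative, not from the text):

```python
# A bounded signed measure on a finite space Omega is a vector of (possibly
# negative) point masses.  The Hahn decomposition takes A_+ = {nu > 0}.
nu = {"a": 1.5, "b": -0.25, "c": 0.0, "d": -2.0, "e": 0.75}

A_plus = {w for w, m in nu.items() if m > 0}
nu_plus = {w: (m if w in A_plus else 0.0) for w, m in nu.items()}
nu_minus = {w: (-m if w not in A_plus else 0.0) for w, m in nu.items()}

# nu = nu_plus - nu_minus, with both parts nonnegative and mutually singular.
check = all(abs(nu[w] - (nu_plus[w] - nu_minus[w])) < 1e-12 for w in nu)
singular = all(nu_plus[w] == 0.0 or nu_minus[w] == 0.0 for w in nu)
```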

For any two measures μ and ν on (Ω, A), we say that ν is absolutely continuous with respect to μ and write ν ≪ μ if μA = 0 implies νA = 0 for all A ∈ A. The following result gives a fundamental decomposition of a measure into an absolutely continuous and a singular component; at the same time, it provides a basic representation of the former part.

Theorem 2.10 (Lebesgue decomposition, Radon-Nikodym theorem) For any σ-finite measures μ and ν on Ω, there exist some unique measures ν_a ≪ μ and ν_s ⊥ μ such that ν = ν_a + ν_s. Furthermore, ν_a = f · μ for some μ-a.e. unique measurable function f ≥ 0 on Ω.

Two lemmas will be needed for the proof.

Lemma 2.11 (closure) Fix two measures μ and ν on Ω and some measurable functions f_1, f_2, … ≥ 0 on Ω with f_n · μ ≤ ν. Then even f · μ ≤ ν, where f = sup_n f_n.

Proof: First assume that f · μ ≤ ν and g · μ ≤ ν, and put h = f ∨ g. Writing A = {f ≥ g}, we get

h · μ = 1_A h · μ + 1_{A^c} h · μ = 1_A f · μ + 1_{A^c} g · μ ≤ 1_A · ν + 1_{A^c} · ν = ν.

Thus, we may assume that f_n ↑ f. But then ν ≥ f_n · μ ↑ f · μ by monotone convergence, and so f · μ ≤ ν. □

Lemma 2.12 (partial density) Let μ and ν be bounded measures on Ω that are not mutually singular. Then there exists a measurable function f ≥ 0 on Ω such that μf > 0 and f · μ ≤ ν.

Proof: For each n ∈ ℕ we introduce the signed measure χ_n = ν − n^{-1} μ. By Theorem 2.8 we may choose some A_n^+ ∈ A with complement A_n^- such


that ±χ_n ≥ 0 on A_n^±. Since the χ_n are nondecreasing, we may assume that A_1^+ ⊂ A_2^+ ⊂ ···. Writing A = ⋃_n A_n^+ and noting that A^c = ⋂_n A_n^- ⊂ A_n^-, we obtain

νA^c ≤ νA_n^- = χ_n A_n^- + n^{-1} μA_n^- ≤ n^{-1} μΩ → 0,

and so νA^c = 0. Since μ and ν are not mutually singular, we get μA > 0. Furthermore, A_n^+ ↑ A implies μA_n^+ ↑ μA > 0, and we may choose n so large that μA_n^+ > 0. Putting f = n^{-1} 1_{A_n^+}, we obtain μf = n^{-1} μA_n^+ > 0 and

f · μ = n^{-1} 1_{A_n^+} · μ = 1_{A_n^+} · ν − 1_{A_n^+} · χ_n ≤ ν. □

Proof of Theorem 2.10: We may assume that μ and ν are bounded. Let C denote the class of measurable functions f ≥ 0 on Ω with f · μ ≤ ν, and define c = sup{μf; f ∈ C}. Choose f_1, f_2, … ∈ C with μf_n → c. Then f = sup_n f_n ∈ C by Lemma 2.11, and μf = c by monotone convergence. Define ν_a = f · μ and ν_s = ν − ν_a, and note that ν_a ≪ μ. If ν_s were not singular to μ, then by Lemma 2.12 there would exist a measurable function g ≥ 0 with μg > 0 and g · μ ≤ ν_s. But then f + g ∈ C with μ(f + g) > c, which contradicts the definition of c. Thus, ν_s ⊥ μ.

To prove the uniqueness of ν_a and ν_s, assume that also ν = ν_a′ + ν_s′ for some measures ν_a′ ≪ μ and ν_s′ ⊥ μ. Choose A, B ∈ A with ν_s A = μA^c = ν_s′ B = μB^c = 0. Then clearly

ν_s(A ∩ B) = ν_s′(A ∩ B) = ν_a(A^c ∪ B^c) = ν_a′(A^c ∪ B^c) = 0,

and so

ν_a = 1_{A∩B} · ν_a = 1_{A∩B} · ν = 1_{A∩B} · ν_a′ = ν_a′,
ν_s = ν − ν_a = ν − ν_a′ = ν_s′.

To see that f is a.e. unique, assume that also ν_a = g · μ for some measurable function g ≥ 0. Writing h = f − g and noting that h · μ = 0, we get

μ|h| = ∫_{h>0} h dμ − ∫_{h<0} h dμ = 0,

and so h = 0 a.e. μ, which means that f = g a.e. □

Proposition 2.14 (Lebesgue-Stieltjes measures) For any nondecreasing, right-continuous function F on ℝ, there exists a unique measure μ on (ℝ, B) such that

μ(a, b] = F(b) − F(a),   a ≤ b.   (5)

Proof: Define g(t) = inf{x ∈ ℝ; F(x) ≥ t}, and put μ = λ ∘ g^{-1}, where λ denotes Lebesgue measure on ℝ. Noting that g(t) ≤ x iff t ≤ F(x), we get for any a < b

μ(a, b] = λ{t; g(t) ∈ (a, b]} = λ(F(a), F(b)] = F(b) − F(a).

Thus, the restriction of μ to ℝ satisfies (5). The uniqueness of μ may be proved in the same way as for λ in Theorem 2.2. □

We now specialize Theorem 2.10 to the case when μ equals Lebesgue measure and ν is a locally finite measure on ℝ, defined as in Proposition 2.14 in terms of some nondecreasing, right-continuous function F. The Lebesgue decomposition and Radon-Nikodym property may be expressed in terms of F as

F = F_a + F_s = ∫f + F_s,   (6)

where F_a and F_s correspond to the absolutely continuous and singular components of ν, respectively, and we assume that F_a(0) = 0. Here ∫f denotes the function x ↦ ∫_0^x f(t) dt, where the Lebesgue density f is a locally integrable function on ℝ. The following result extends the fundamental theorem of calculus for Riemann integrals of continuously differentiable functions: the fact that differentiation and integration are mutually inverse operations.

Theorem 2.15 (differentiation, Lebesgue) Any nondecreasing and right-

continuous function F = F' = f.

Jf

+ Fs is differentiable a. e. with derivative

Thus, the two parts of the fundamental theorem generalize to (J !)' = f a.e. and F' =Fa. In other words, the density of an integral can still be

J

32

Foundations of Modern Probability

recovered a.e. through differentiation, whereas integration of a derivative yields only the absolutely continuous component of the underlying function. In particular, Fis absolutely continuous iff J F' = F- F(O) and singular iff F' = 0 a.e. The last result extends trivially to any difference F = F+ - F_ between two nondecreasing, right-continuous functions F+ and F _. However, it fails for more general functions, already because the derivative may not exist. For example, the paths of Brownian motion introduced in Chapter 13 are a.s. nowhere differentiable. Two lemmas will be helpful for the proof of the last theorem. Lemma 2.16 (interval selection) Let I be a class of open intervals with union G. If >..G < oo, there exist some disjoint sets h, ... Jn E I with

Lk lhl 2:: >..G/4.

Proof: Choose a compact set K ⊂ G with λK ≥ 3λG/4. By compactness, we may cover K by finitely many intervals J₁, ..., J_m ∈ I. We now define I₁, I₂, ... recursively, by letting I_k be the longest interval J_r not yet chosen such that J_r ∩ I_j = ∅ for all j < k. The selection terminates when no such interval exists. If an interval J_r is not selected, it must intersect a longer interval I_k. Writing Ĩ_k for the interval centered at I_k with length 3|I_k|, we obtain K ⊂ ∪_r J_r ⊂ ∪_k Ĩ_k, and so

(3/4) λG ≤ λK ≤ λ ∪_k Ĩ_k ≤ Σ_k |Ĩ_k| = 3 Σ_k |I_k|.  □
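The longest-first selection in the proof above is easy to experiment with numerically. In the following sketch (all names are ours), intervals are chosen greedily, and the resulting disjoint family is checked against the bound Σ_k |I_k| ≥ λG/4 on random inputs; for finite families the argument above even gives the better constant 1/3:

```python
import random

def greedy_disjoint(intervals):
    # repeatedly take the longest interval disjoint from those already chosen
    chosen = []
    for a, b in sorted(intervals, key=lambda i: i[1] - i[0], reverse=True):
        if all(b <= c or d <= a for c, d in chosen):
            chosen.append((a, b))
    return chosen

def lebesgue_of_union(intervals):
    # λ of a finite union of intervals, by sweeping left to right
    total, right = 0.0, float("-inf")
    for a, b in sorted(intervals):
        if a > right:
            total += b - a; right = b
        elif b > right:
            total += b - right; right = b
    return total

random.seed(0)
for _ in range(100):
    ivs = [(a, a + random.random()) for a in (random.uniform(0, 5) for _ in range(20))]
    sel = greedy_disjoint(ivs)
    assert sum(b - a for a, b in sel) >= lebesgue_of_union(ivs) / 4 - 1e-12
```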

Lemma 2.17 (differentiation on null sets) Let F(x) = μ(0,x] for some locally finite measure μ on ℝ, and let A ∈ B with μA = 0. Then F′ = 0 a.e. λ on A.

Proof: By Lemma 1.34 there exists for every δ > 0 some open set G_δ ⊃ A with μG_δ < δ. Define

A_ε = {x ∈ A; limsup_{h→0} μ(x−h, x+h)/h > ε},  ε > 0,

and note that each A_ε is measurable, since the lim sup may be taken along the rationals. For every x ∈ A_ε there exists some interval I = (x−h, x+h) ⊂ G_δ with 2μI > ε|I|, and we note that the class I_{ε,δ} of such intervals covers A_ε. Hence, by Lemma 2.16 we may choose some disjoint sets I₁, ..., I_n ∈ I_{ε,δ} with Σ_k |I_k| ≥ λA_ε/4. Then

λA_ε ≤ 4 Σ_k |I_k| ≤ (8/ε) Σ_k μI_k ≤ 8μG_δ/ε < 8δ/ε.

As δ → 0, we get λA_ε = 0. Thus, limsup_{h→0} μ(x−h, x+h)/h ≤ ε a.e. λ on A, and the assertion follows since ε is arbitrary.  □
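Theorem 2.15, stated earlier, can also be illustrated numerically for a smooth density: if F(x) = ∫₀^x f(t) dt with f continuous, the difference quotients h^{−1}(F(x+h) − F(x)) used in its proof converge to f(x). A small sketch with the concrete, entirely arbitrary choice f(t) = 3t², for which F(x) = x³ exactly:

```python
def F(x):
    # F(x) = ∫₀ˣ f(t) dt for the density f(t) = 3 t², i.e. F(x) = x³
    return x ** 3

def f(t):
    return 3.0 * t * t

h = 1e-6
for x in [0.1, 0.5, 1.3, 2.0]:
    quotient = (F(x + h) - F(x)) / h          # h⁻¹(F(x+h) − F(x))
    assert abs(quotient - f(x)) < 1e-4        # recovers the density at x
```

For a singular F (e.g. the Cantor function) the same quotients would instead tend to 0 at λ-almost every point, in line with Lemma 2.17.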

Proof of Theorem 2.15: Since F_s′ = 0 a.e. λ by Lemma 2.17, we may assume that F = ∫f. Define

F*(x) = limsup_{h→0} h^{−1}(F(x+h) − F(x)),
F_*(x) = liminf_{h→0} h^{−1}(F(x+h) − F(x)),

and note that F* = 0 a.e. on the set {f = 0} = {x; f(x) = 0} by Lemma 2.17. Applying this to the function F_r = ∫(f − r)⁺ for arbitrary r ∈ ℝ and noting that f ≤ (f − r)⁺ + r, we get F* ≤ r a.e. on {f ≤ r}. Thus, for r restricted to the rationals,

λ{f < F*} ≤ Σ_r λ{f ≤ r < F*} = 0,

which shows that F* ≤ f a.e. Applying this result to −F = ∫(−f) yields F_* = −(−F)* ≥ f a.e. Thus, F* = F_* = f a.e., and so F′ exists a.e. and equals f.  □

For any function F: ℝ → ℝ, we define the total variation of F on the interval [a,b] as

‖F‖_a^b = sup_{(t_k)} Σ_k |F(t_k) − F(t_{k−1})|,

where the supremum extends over all finite partitions a = t₀ < t₁ < ··· < t_n = b. Similarly, the positive and negative variations of F are defined by the same expression with the absolute value |·| replaced by the positive and negative parts (·)±. Here x± = (±x) ∨ 0, so that x = x⁺ − x⁻ and |x| = x⁺ + x⁻. We also write Δ_a^b F = F(b) − F(a). The following result gives a basic decomposition of functions of locally finite variation, similar to the Hahn decomposition in Theorem 2.8.

Proposition 2.18 (Jordan decomposition) A function F on ℝ has locally finite variation iff it is a difference of two nondecreasing functions F₊ and F₋. In that case,

‖F‖_s^t ≤ Δ_s^t F₊ + Δ_s^t F₋,  s < t,  (7)

with equality iff the increments Δ_s^t F± agree with the positive and negative variations of F on (s,t].

Proof: For any s < t we have

(Δ_s^t F)⁺ = (Δ_s^t F)⁻ + Δ_s^t F,
|Δ_s^t F| = (Δ_s^t F)⁺ + (Δ_s^t F)⁻ = 2(Δ_s^t F)⁻ + Δ_s^t F.

Summing over the intervals in an arbitrary partition s = t₀ < t₁ < ··· < t_n = t and taking the supremum of each side, we obtain

Δ_s^t F₊ = Δ_s^t F₋ + Δ_s^t F,
‖F‖_s^t = 2Δ_s^t F₋ + Δ_s^t F = Δ_s^t F₊ + Δ_s^t F₋,

where F₊ and F₋ denote the positive and negative variation functions of F. If F has locally finite variation, these functions are finite and nondecreasing, and the first relation shows that F = F₊ − F₋ up to an additive constant; the second relation then yields (7) with equality. Conversely, if F = F₊ − F₋ for some nondecreasing functions F±, then the positive and negative variations of F on (s,t] are bounded by Δ_s^t F₊ and Δ_s^t F₋, respectively, and (7) follows.  □

Proposition 2.19 (continuity of variation) Any nondecreasing function is the sum of a nondecreasing, left-continuous function and a nondecreasing, right-continuous one. Moreover, if a function F of locally finite variation is right-continuous, then so are its total variation and its minimal Jordan components F±.

Proof: For any t > 0, we define

F^l(t) = Σ_{s∈[0,t)} (F(s+) − F(s)),  F^r(t) = F(t) − F^l(t);

when t ≤ 0, we take instead the negative of the corresponding sum over (t,0]. It is easy to check that F^l is left-continuous and F^r is right-continuous, and that both functions are nondecreasing.

To prove the last assertion, assume that F is right-continuous at some point s. If ‖F‖_s^t → c > 0 as t ↓ s, we may choose t − s so small that ‖F‖_s^t < 4c/3. Next we may choose a partition s = t₀ < t₁ < ··· < t_n = t of [s,t] such that the corresponding F-increments δ_k satisfy Σ_k |δ_k| > 2c/3. By the right continuity of F at s, we may assume that t₁ − s is small enough that δ₁ = |F(t₁) − F(s)| < c/3. Then ‖F‖_{t₁}^t > c/3, and so

4c/3 > ‖F‖_s^t = ‖F‖_s^{t₁} + ‖F‖_{t₁}^t > c + c/3 = 4c/3,

a contradiction. Hence, c = 0. Assuming F± to be minimal, we obtain 2Δ_s^t F± = ‖F‖_s^t ± Δ_s^t F, and the right continuity of F± follows.  □
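Along any fixed partition, the quantities in Proposition 2.18 are elementary to compute: the sums of positive and negative increments approximate Δ F₊ and Δ F₋, and their sum approximates ‖F‖. A short sketch (names ours):

```python
def variations(values):
    """Total, positive, and negative variation of F along a partition,
    given the sequence of values F(t_0), ..., F(t_n)."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    pos = sum(d for d in diffs if d > 0)       # Σ (Δ F)⁺
    neg = -sum(d for d in diffs if d < 0)      # Σ (Δ F)⁻
    return pos + neg, pos, neg

vals = [0.0, 2.0, 1.0, 4.0, 3.5]
tv, pos, neg = variations(vals)
assert tv == pos + neg                                    # ‖F‖ = Δ F₊ + Δ F₋
assert abs((pos - neg) - (vals[-1] - vals[0])) < 1e-12    # Δ F = Δ F₊ − Δ F₋
```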

Justified by the last result, we may assume our finite-variation functions to be right-continuous. In that case, we have the following basic relation to signed measures. Here we only require the latter to be locally bounded.

Proposition 2.20 (finite-variation functions and signed measures) For any right-continuous function F of locally finite variation, there exists a unique signed measure ν on ℝ such that ν(s,t] = Δ_s^t F for all s < t. Furthermore, the Hahn decomposition ν = ν₊ − ν₋ and the Jordan decomposition F = F₊ − F₋ into minimal components are related by ν±(s,t] = Δ_s^t F±.

Proof: The positive and negative variations F± are right-continuous by Proposition 2.19. Hence, by Proposition 2.14 there exist some locally finite measures μ± on ℝ such that μ±(s,t] = Δ_s^t F±, and we may take ν = μ₊ − μ₋. To see that this agrees with the Hahn decomposition ν = ν₊ − ν₋, choose A ∈ B such that ν₊A^c = ν₋A = 0. For any B ∈ B, we get

ν₊B = ν(A ∩ B) = μ₊(A ∩ B) − μ₋(A ∩ B) ≤ μ₊B,

which shows that μ₊ ≥ ν₊. Then also μ₋ ≥ ν₋. If the equality fails on some interval (s,t], then

‖F‖_s^t = μ₊(s,t] + μ₋(s,t] > ν₊(s,t] + ν₋(s,t],

which contradicts Proposition 2.18. Hence, μ± = ν±.  □

A function F: ℝ → ℝ is said to be absolutely continuous if for any a < b and ε > 0 there exists some δ > 0 such that, for any finite collection of disjoint intervals (a_k, b_k] ⊂ (a,b] with Σ_k |b_k − a_k| < δ, we have Σ_k |F(b_k) − F(a_k)| < ε. In particular, we note that every absolutely continuous function is continuous and has locally finite variation. Given a function F of locally finite variation, we say that F is singular if for any a < b and ε > 0 there exist finitely many disjoint intervals (a_k, b_k] ⊂ (a,b] such that Σ_k |b_k − a_k| < ε and ‖F‖_a^b < Σ_k |F(b_k) − F(a_k)| + ε. We say that a locally finite signed measure ν on ℝ is absolutely continuous or singular if the components ν± of the associated Hahn decomposition satisfy ν± ≪ λ or ν± ⊥ λ, respectively. The following result relates the notions of absolute continuity and singularity for functions and measures.

Proposition 2.21 (absolutely continuous and singular functions) Let F be a right-continuous function on ℝ of locally finite variation, and let ν be the associated signed measure on ℝ with ν(s,t] = Δ_s^t F. Then F is absolutely continuous or singular iff the corresponding property holds for ν.

Proof: If F is absolutely continuous or singular, then the corresponding property holds for the total variation function ‖F‖_a^x with arbitrary a, and hence also for the minimal components F± in Proposition 2.20. Thus, we may assume that F is nondecreasing, so that ν is a positive and locally finite measure on ℝ.

First assume that F is absolutely continuous. If ν ≪ λ fails, there exists a bounded interval I = (a,b) with a subset A ∈ B such that λA = 0 but νA > 0. Taking ε = νA/2, we choose a corresponding δ > 0 as in the definition of absolute continuity. Since A is measurable and has outer Lebesgue measure 0, we may next choose an open set G with A ⊂ G ⊂ I such that λG < δ. But then νA ≤ νG < ε = νA/2, a contradiction. This shows that ν ≪ λ.

Next assume that F is singular, and fix any bounded interval I = (a,b]. Given any ε > 0, we may choose some Borel sets A₁, A₂, ... ⊂ I such that λA_n < ε2^{−n} and νA_n → νI. Then B = ∪_n A_n satisfies λB < ε and νB = νI. Next we may choose some Borel sets B_n ⊂ I with λB_n → 0 and νB_n = νI. Then C = ∩_n B_n satisfies λC = 0 and νC = νI, which shows that ν ⊥ λ on I.

Conversely, assume that ν ≪ λ, so that ν = f·λ for some locally integrable function f ≥ 0. Fix any bounded interval I, and put A_n = {x ∈ I; f(x) > n}. Fix any ε > 0. Since νA_n → 0 by Lemma 1.14, we may choose n so large that νA_n < ε/2. Put δ = ε/2n. For any Borel set B ⊂ I with λB < δ, we obtain

νB = ν(B ∩ A_n) + ν(B ∩ A_n^c) ≤ νA_n + nλB < ε,

which shows that F is absolutely continuous.

Finally, assume that ν ⊥ λ. Fix any bounded interval I = (a,b], and choose a Borel set A ⊂ I such that λA = 0 and νA = νI. For any ε > 0 we may choose some open set G ⊃ A with λG < ε. Letting (a_n, b_n) denote the connected components of G and writing I_n = (a_n, b_n], we get Σ_n |I_n| < ε and Σ_n ν(I ∩ I_n) = νI. This shows that F is singular.  □

From now on, we assume the basic space S to be locally compact, second-countable, and Hausdorff (abbreviated lcscH). Let G, F, and K denote the classes of open, closed, and compact sets in S, and put Ĝ = {G ∈ G; Ḡ ∈ K}. Let C₊ = C₊(S) denote the class of continuous functions f: S → ℝ₊ with compact support, where the latter is defined as the closure of the set {x ∈ S; f(x) > 0}. Relations such as U ≺ f ≺ V mean that f ∈ C₊ with 0 ≤ f ≤ 1 satisfies f = 1 on U and supp f ⊂ V°. By a positive linear functional on C₊ we mean a mapping μ: C₊ → ℝ₊ such that μ(f + g) = μf + μg for all f, g ∈ C₊. This clearly implies the homogeneity μ(cf) = cμf for any f ∈ C₊ and c ∈ ℝ₊. A Radon measure on S is defined as a measure μ on the Borel σ-field S = B(S) such that μK < ∞ for every K ∈ K. The following result gives the basic extension of positive linear functionals to measures.

Theorem 2.22 (Riesz representation) If S is lcscH, then every positive linear functional μ on C₊(S) extends uniquely to a Radon measure on S.

Several lemmas will be needed for the proof, and we begin with a simple topological fact.

Lemma 2.23 (partition of unity) For any open cover G₁, ..., G_n of a compact set K ⊂ S, there exist some functions f₁, ..., f_n ∈ C₊(S) with f_k ≺ G_k such that Σ_k f_k = 1 on K.

Proof: For any x ∈ K we may choose some k ≤ n and V ∈ Ĝ with x ∈ V and V̄ ⊂ G_k. By compactness, K is covered by finitely many such sets V₁, ..., V_m. For each k ≤ n, let U_k be the union of all sets V_j with V̄_j ⊂ G_k. Then Ū_k ⊂ G_k, and so we may choose g₁, ..., g_n ∈ C₊ with U_k ≺ g_k ≺ G_k. Define

f_k = g_k (1 − g₁) ··· (1 − g_{k−1}),  k = 1, ..., n.

Then f_k ≺ G_k for all k, and by induction

f₁ + ··· + f_n = 1 − (1 − g₁) ··· (1 − g_n).

It remains to note that Π_k (1 − g_k) = 0 on K, since K ⊂ ∪_k U_k.  □
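The product construction f_k = g_k(1 − g₁)···(1 − g_{k−1}) in the proof can be tried out on the real line with piecewise-linear bumps. In the sketch below (the bump shapes and the covered set are our own choices), the functions sum exactly to 1 wherever some g_k equals 1:

```python
def g(c):
    # bump equal to 1 on [c-1, c+1], vanishing outside (c-2, c+2)
    return lambda x: max(0.0, min(1.0, 2.0 - abs(x - c)))

gs = [g(0.0), g(2.0), g(4.0)]            # their "=1" zones cover K = [-1, 5]

def partition(x):
    fs, prod = [], 1.0
    for gk in gs:                         # f_k = g_k (1-g_1)···(1-g_{k-1})
        fs.append(gk(x) * prod)
        prod *= 1.0 - gk(x)
    return fs

for x in [i / 10 for i in range(-10, 51)]:
    assert abs(sum(partition(x)) - 1.0) < 1e-12      # Σ f_k = 1 on K
```

The telescoping identity Σ f_k = 1 − Π(1 − g_k) from the proof is exactly what the loop computes.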

By an inner content on an lcscH space S we mean a nondecreasing function μ: G → [0,∞], finite on Ĝ, such that μ is both finitely additive and countably subadditive, and also satisfies the inner continuity

μG = sup{μU; U ∈ Ĝ, Ū ⊂ G},  G ∈ G.  (8)

Lemma 2.24 (inner approximation) For any positive linear functional μ on C₊(S), we may define an inner content ν on S by

νG = sup{μf; f ≺ G},  G ∈ G.

Proof: Note that ν is nondecreasing with ν∅ = 0 and that νG < ∞ for bounded G. It is also clear that ν is inner continuous in the sense of (8). To show that ν is countably subadditive, fix any G₁, G₂, ... ∈ G, and let f ≺ ∪_k G_k. By compactness, f ≺ ∪_{k≤n} G_k for some n ∈ ℕ. By Lemma 2.23 applied to the compact set supp f, we may choose h₁, ..., h_n ∈ C₊ with h_k ≺ G_k and Σ_k h_k = 1 on supp f. Then f = Σ_k f h_k with f h_k ≺ G_k, and so

μf = Σ_k μ(f h_k) ≤ Σ_k νG_k.

Taking the supremum over all f ≺ ∪_k G_k gives ν ∪_k G_k ≤ Σ_k νG_k. Finally, for any disjoint sets G₁, G₂ ∈ G and functions f₁ ≺ G₁, f₂ ≺ G₂, we have f₁ + f₂ ≺ G₁ ∪ G₂, and so μf₁ + μf₂ ≤ ν(G₁ ∪ G₂). Hence, νG₁ + νG₂ ≤ ν(G₁ ∪ G₂), and the finite additivity follows by means of the subadditivity.  □

Lemma 2.25 (extension) Any inner content μ on S extends to a regular outer measure on S, given by

μA = inf{μG; G ∈ G, G ⊃ A},  A ⊂ S,  (9)

and satisfying the inner regularity

μG = sup{μU; U ∈ Ĝ, Ū ⊂ G},  G ∈ G.  (10)

Proof: The set function in (9) is clearly nonnegative and nondecreasing with μ∅ = 0, and it agrees with μ on G. To see that it is also countably subadditive, let A ⊂ ∪_n A_n. For any ε > 0 we may choose some G₁, G₂, ... ∈ G with G_n ⊃ A_n and μG_n ≤ μA_n + ε2^{−n}. Since μ is subadditive on G, we get

μA ≤ μ ∪_n G_n ≤ Σ_n μG_n ≤ Σ_n μA_n + ε.

The desired relation follows since ε was arbitrary. Thus, the extension is an outer measure on S. Finally, the inner regularity in (10) follows from (8) and the monotonicity of μ.  □

Lemma 2.26 (measurability) If μ is a regular outer measure on S, then every Borel set in S is μ-measurable.

Proof: Fix any F ∈ F and A ⊂ G ∈ G. By the inner regularity in (10), we may choose G₁, G₂, ... ∈ Ĝ with Ḡ_n ⊂ G \ F and μG_n → μ(G \ F). Since μ is nondecreasing and finitely additive on G, we get

μG ≥ μ(G \ ∂G_n) = μG_n + μ(G \ Ḡ_n)
   ≥ μG_n + μ(G ∩ F) → μ(G \ F) + μ(G ∩ F) ≥ μ(A \ F) + μ(A ∩ F).

Using the outer regularity in (9) gives

μA ≥ μ(A \ F) + μ(A ∩ F),  F ∈ F, A ⊂ S.

Hence, every closed set is measurable, and by Theorem 2.1 the measurability extends to σ(F) = B(S) = S.  □

Proof of Theorem 2.22: Construct an inner content ν as in Lemma 2.24, and conclude from Lemma 2.25 that ν admits an extension to a regular outer measure on S. By Theorem 2.1 and Lemma 2.26, the restriction of the latter to S = B(S) is a Radon measure on S, here still denoted by ν. To see that μ = ν on C₊, fix any f ∈ C₊. For n ∈ ℕ and k ∈ ℤ₊, let

f_k^n(x) = (nf(x) − k)⁺ ∧ 1,  G_k^n = {nf > k} = {f_k^n > 0}.

Noting that Ḡ_{k+1}^n ⊂ {f_k^n = 1} and using the definition of ν and the outer regularity in (9), we get for appropriate k

ν f_{k+1}^n ≤ νG_{k+1}^n ≤ μf_k^n ≤ νḠ_k^n ≤ ν f_{k−1}^n.

Writing G₀ = G₀^n = {f > 0} and noting that nf = Σ_k f_k^n, we obtain

nνf − νG₀ ≤ nμf ≤ nνf + νG₀.

Here νG₀ < ∞, since G₀ is bounded. Dividing by n and letting n → ∞ gives μf = νf.

To prove the asserted uniqueness, let μ and ν be Radon measures on S with μf = νf for all f ∈ C₊. By an inner approximation, we have μG = νG for every G ∈ Ĝ, and a monotone-class argument yields μ = ν.  □
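The layer functions f_k^n(x) = (nf(x) − k)⁺ ∧ 1 used in the proof satisfy the identity nf = Σ_k f_k^n, which is easy to verify pointwise. Writing t = nf(x), a quick numerical check (names ours):

```python
def layer(t, k):
    # f_k in terms of t = n f(x):  (t - k)+ ∧ 1
    return max(0.0, min(1.0, t - k))

for t in [0.0, 0.3, 1.0, 2.7, 9.99]:
    total = sum(layer(t, k) for k in range(12))   # 12 layers suffice for t < 12
    assert abs(total - t) < 1e-12                 # Σ_k f_k reproduces n f
```

Each layer slices the region under t into horizontal strips of height 1, which is why the sum telescopes back to t.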

By a topological group we mean a group G endowed with a topology that renders the group operations continuous. Thus, the mapping (f,g) ↦ fg is continuous from G² to G, whereas the mapping g ↦ g^{−1} is continuous from G to G. In the former case, G² is equipped with the product topology. Introducing the Borel σ-field G = B(G), we obtain a measurable group (G, G), and we note that the group operations are measurable when G is lcscH.

A measure μ on G is said to be left-invariant if μ(gB) = μB for all g ∈ G and B ∈ G, where gB = {gb; b ∈ B}, the left translate of B by g. This is clearly equivalent to ∫ f(gk) μ(dk) = μf for any measurable function f: G → ℝ₊ and element g ∈ G. The definition of right-invariant measures is similar. We may now state the basic existence and uniqueness theorem for invariant measures on groups.

Theorem 2.27 (Haar measure) On every lcscH group G there exists, uniquely up to a normalization, a left-invariant Radon measure λ ≠ 0. If G is compact, then λ is also right-invariant.

Proof (Weil): For any f, g ∈ C₊ we define |f|_g = inf Σ_k c_k, where the infimum extends over all finite sets of constants c₁, ..., c_n ≥ 0 such that

f(x) ≤ Σ_{k≤n} c_k g(s_k x),  x ∈ G,

for some s₁, ..., s_n ∈ G. By compactness, |f|_g < ∞ when g ≠ 0. We also note that |f|_g is nondecreasing and translation invariant in f, and that it satisfies the subadditivity and homogeneity properties

|f + f′|_g ≤ |f|_g + |f′|_g,  |cf|_g = c|f|_g,  (11)

as well as the inequalities

‖f‖/‖g‖ ≤ |f|_g ≤ |f|_h |h|_g,  (12)

where ‖·‖ denotes the supremum norm. We may normalize |f|_g by fixing an f₀ ∈ C₊ \ {0} and putting

λ_g f = |f|_g / |f₀|_g,  f, g ∈ C₊, g ≠ 0.

From (11) and (12) we note that

λ_g(f + f′) ≤ λ_g f + λ_g f′,  (13)
|f₀|_f^{−1} ≤ λ_g f ≤ |f|_{f₀}.  (14)

Conversely, λ_g is nearly superadditive in the following sense.

Lemma 2.28 (near superadditivity) For any f, f′ ∈ C₊ and ε > 0, there exists an open set U ∋ e such that

λ_g f + λ_g f′ ≤ λ_g(f + f′) + ε,  0 ≠ g ≺ U.

Proof: Fix any h ∈ C₊ with h = 1 on supp(f + f′), and define for δ > 0

f_δ = f + f′ + δh,  h_δ = f/f_δ,  h′_δ = f′/f_δ,

so that h_δ, h′_δ ∈ C₊. By compactness we may choose a neighborhood U of the identity element e ∈ G such that

|h_δ(x) − h_δ(y)| ∨ |h′_δ(x) − h′_δ(y)| ≤ δ,  x ∈ yU.  (15)

Now assume 0 ≠ g ≺ U, and let f_δ(x) ≤ Σ_k c_k g(s_k x) for some s₁, ..., s_n ∈ G and c₁, ..., c_n ≥ 0. Since g(s_k x) ≠ 0 implies s_k x ∈ U, we have by (15)

f(x) = f_δ(x) h_δ(x) ≤ Σ_k c_k g(s_k x) h_δ(x) ≤ Σ_k c_k g(s_k x) {h_δ(s_k^{−1}) + δ},

and similarly for f′. Noting that h_δ + h′_δ ≤ 1, we get

|f|_g + |f′|_g ≤ Σ_k c_k (1 + 2δ).

Taking the infimum over all dominating sums for f_δ and using (11), we conclude that

|f|_g + |f′|_g ≤ (1 + 2δ) |f_δ|_g ≤ (1 + 2δ) (|f + f′|_g + δ|h|_g).

Now divide by |f₀|_g, and use (14) to obtain

λ_g f + λ_g f′ ≤ {λ_g(f + f′) + δλ_g h}(1 + 2δ)
             ≤ λ_g(f + f′) + 2δ|f + f′|_{f₀} + δ(1 + 2δ)|h|_{f₀},

which tends to λ_g(f + f′) as δ → 0.  □

Returning to the proof of Theorem 2.27, we may consider the functionals λ_g as elements of the product space Λ = [0,∞]^{C₊}. For any neighborhood U of e, let Λ_U denote the closure in Λ of the set {λ_g; 0 ≠ g ≺ U}. Since λ_g f ≤ |f|_{f₀} < ∞ for all f ∈ C₊ by (14), the sets Λ_U are compact by Tychonov's theorem. Furthermore, the family {Λ_U; e ∈ U} has the finite intersection property, since U ⊂ V implies Λ_U ⊂ Λ_V. We may then choose an element λ ∈ ∩_U Λ_U, here regarded as a functional on C₊. From (14) we note that λ ≠ 0.

To see that λ is linear, fix any f, f′ ∈ C₊ and a, b ≥ 0, and choose some g₁, g₂, ... ∈ C₊ with supp g_n → {e} such that

λ_{g_n} f → λf,  λ_{g_n} f′ → λf′,  λ_{g_n}(af + bf′) → λ(af + bf′).

By (13) and Lemma 2.28 we obtain λ(af + bf′) = aλf + bλf′. Thus, λ is a nontrivial, positive linear functional on C₊, and so by Theorem 2.22 it extends uniquely to a Radon measure on G. The invariance of the functionals λ_g clearly carries over to λ.

Now consider any left-invariant Radon measure λ ≠ 0 on G. Fixing a right-invariant Radon measure μ ≠ 0 and a function h ∈ C₊ \ {0}, we define

p(x) = ∫ h(y^{−1}x) μ(dy),  x ∈ G,

and we note that p > 0 on G. Using the invariance of λ and μ together with Fubini's theorem, we get for any f ∈ C₊

(λh)(μf) = ∫ h(x) λ(dx) ∫ f(y) μ(dy)
         = ∫ h(x) λ(dx) ∫ f(yx) μ(dy)
         = ∫ μ(dy) ∫ h(x) f(yx) λ(dx)
         = ∫ μ(dy) ∫ h(y^{−1}x) f(x) λ(dx)
         = ∫ f(x) λ(dx) ∫ h(y^{−1}x) μ(dy) = λ(fp).

Since f was arbitrary, we conclude that (λh)μ = p·λ or, equivalently, λ/λh = p^{−1}·μ. Here the right-hand side is independent of λ, and the asserted uniqueness follows. If G is compact, we may choose h ≡ 1 to obtain λ/λG = μ/μG.  □

Given a group G and an abstract space S, we define a left action of G on S as a mapping (g,s) ↦ gs from G × S to S such that es = s and (gh)s = g(hs) for any g, h ∈ G and s ∈ S, where e denotes the identity element in G. Similarly, a right action is a mapping (s,g) ↦ sg such that se = s and s(gh) = (sg)h for all s, g, h as above. The action is said to be transitive if for any s, t ∈ S there exists some g ∈ G such that gs = t or sg = t, respectively. All actions are henceforth assumed to be from the left.

If G is a topological group and S is a topological space, we assume the action (x,s) ↦ xs to be continuous from G × S to S. A function h: G → S is said to be proper if h^{−1}K is compact in G for any compact set K ⊂ S; if this holds for every mapping π_s(x) = xs, s ∈ S, we say that the group action is proper. Finally, a measure μ on S is G-invariant if μ(xB) = μB for any x ∈ G and B ∈ S. This is clearly equivalent to the relation ∫ f(xs) μ(ds) = μf for any measurable function f: S → ℝ₊ and element x ∈ G.

We may now state the basic existence and uniqueness result for invariant measures on a general lcscH space. The existence of Haar measures in Theorem 2.27 is a special case.

Theorem 2.29 (invariant measure) Consider an lcscH group G that acts transitively and properly on an lcscH space S. Then there exists, uniquely up to a normalization, a G-invariant Radon measure μ ≠ 0 on S.

Proof: Fix any p ∈ S, and let π denote the mapping x ↦ xp from G to S. Letting λ be a left Haar measure on G, we define μ = λ ∘ π^{−1}. Since π is proper, we note that μ is a Radon measure on S. To see that μ is G-invariant, let f ∈ C₊ be arbitrary, and note that for any x ∈ G

∫_S f(xs) μ(ds) = ∫_G f(xyp) λ(dy) = ∫_G f(yp) λ(dy) = μf,

by the invariance of λ.

To prove the uniqueness, let μ be an arbitrary G-invariant Radon measure on S. Introduce the subgroup

K = {x ∈ G; xp = p} = π^{−1}{p},

and note that K is compact, since π is proper. Let ν be the normalized Haar measure on K, and define

f̂(x) = ∫_K f(xk) ν(dk),  x ∈ G, f ∈ C₊(G).

If xp = yp, we have y^{−1}xp = p, and so y^{−1}x = h ∈ K, which implies x = yh. Hence, the left invariance of ν yields

f̂(x) = f̂(yh) = ∫_K f(yhk) ν(dk) = ∫_K f(yk) ν(dk) = f̂(y).

We may then define a mapping f ↦ f* by

f*(s) = f̂(x),  s = xp ∈ S, x ∈ G, f ∈ C₊(G).

For any subset B ⊂ (0, ∞), we note that

(f*)^{−1}B = π(f̂^{−1}B) ⊂ π[(supp f)·K].

Here the right-hand side is compact, since the sets supp f and K are compact, and since π and the group operation in G are both continuous. Thus, f* has bounded support. Furthermore, f̂ is continuous by dominated convergence, and so f̂^{−1}[t, ∞) is closed and hence compact for every t > 0. By the continuity of π it follows that even (f*)^{−1}[t, ∞) is compact. In particular, f* is measurable.

We may now define a functional λ on C₊(G) by

λf = μf*,  f ∈ C₊(G).

The linearity and positivity of λ are clear from the corresponding properties of the mapping f ↦ f* and the measure μ. We also note that λ is finite on C₊(G), since μ is locally finite. By Theorem 2.22, we may then extend λ to a Radon measure on G. To see that λ is left-invariant, let f ∈ C₊(G) be arbitrary, and define f_y(x) = f(yx). For any s = xp ∈ S and y ∈ G we get

(f_y)*(s) = f̂_y(x) = ∫_K f(yxk) ν(dk) = f̂(yx) = f*(ys).

Hence, by the invariance of μ,

∫_G f(yx) λ(dx) = λf_y = μ(f_y)* = ∫_S f*(ys) μ(ds) = μf* = λf.

Now fix any g ∈ C₊(S), and put

f(x) = g(xp) = g ∘ π(x),  x ∈ G.

Then f ∈ C₊(G), because {f > 0} ⊂ π^{−1} supp g, which is compact since π is proper. By the definition of K, we have for any s = xp ∈ S

f*(s) = f̂(x) = ∫_K f(xk) ν(dk) = ∫_K g(xkp) ν(dk) = ∫_K g(xp) ν(dk) = g(s),

and so

μg = μf* = λf = λ(g ∘ π) = (λ ∘ π^{−1})g,

which shows that μ = λ ∘ π^{−1}. Since λ is unique up to a normalization, the same is true for μ.  □
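On a finite group, Haar measure is just counting measure, and the construction μ = λ ∘ π^{−1} of Theorem 2.29 can be carried out explicitly. The following sketch (our own toy example, not from the text) verifies both invariance properties for S₃ and the induced invariant measure for its natural action on {0, 1, 2}:

```python
from itertools import permutations

G = list(permutations(range(3)))                 # the group S₃ as tuples
def mul(g, h):                                   # composition: (g h)(i) = g(h(i))
    return tuple(g[h[i]] for i in range(3))

haar = {g: 1 for g in G}                         # counting measure on G
for g in G:
    assert all(haar[mul(g, h)] == haar[h] for h in G)   # left invariance
    assert all(haar[mul(h, g)] == haar[h] for h in G)   # right invariance

# transitive action of S₃ on S = {0, 1, 2};  μ = λ∘π⁻¹ with π(g) = g(p)
p = 0
mu = {s: sum(haar[g] for g in G if g[p] == s) for s in range(3)}
assert mu == {0: 2, 1: 2, 2: 2}                  # the invariant (uniform) measure
```

The fibers π^{−1}{s} all have the same size, which is exactly the mechanism behind the uniqueness argument above: the isotropy group K here has two elements.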

Exercises

1. Show that if μ₁ = f₁·μ and μ₂ = f₂·μ, then μ₁ ∨ μ₂ = (f₁ ∨ f₂)·μ and μ₁ ∧ μ₂ = (f₁ ∧ f₂)·μ. In particular, we may take μ = μ₁ + μ₂. Extend the result to sequences μ₁, μ₂, ....

2. Consider an arbitrary family μ_i, i ∈ I, of σ-finite measures on some measurable space S. Show that there exists a largest measure μ = ⋀_i μ_i such that μ ≤ μ_i for all i ∈ I. Show also that if the μ_i are bounded by some σ-finite measure ν, there exists a smallest measure ρ = ⋁_i μ_i such that μ_i ≤ ρ for all i. (Hint: Use Zorn's lemma.)

3. Show that any countably additive set function μ ≥ 0 on a field A with μ∅ = 0 extends to a measure on σ(A). Show also that the extension is unique whenever μ is bounded.

4. Extend the first assertion of Theorem 2.6 to the context of general invariant measures, as in Theorem 2.29.

5. Construct d-dimensional Lebesgue measure λ_d directly, by the method of Theorem 2.2. Then show that λ_d = λ^d.

6. Derive the existence of d-dimensional Lebesgue measure from Riesz's representation theorem and the basic properties of the Riemann integral.

7. Extend the mean continuity in Lemma 2.7 to general invariant measures.

8. For any bounded, signed measure ν on (Ω, A), show that there exists a smallest measure |ν| such that |νA| ≤ |ν|A for all A ∈ A. Show also that |ν| = ν₊ + ν₋, where ν± are the components in the Hahn decomposition of ν. Finally, for any bounded, measurable function f on Ω, show that |νf| ≤ |ν||f|.

9. Extend the last result to complex-valued measures χ = μ + iν, where μ and ν are bounded, signed measures on (Ω, A). Introducing the complex-valued Radon–Nikodym density f = dχ/d(|μ| + |ν|), show that |χ| = |f|·(|μ| + |ν|).

10. Show by an example that the uniqueness in Theorem 2.29 may fail if the group action is not transitive.

Chapter 3

Processes, Distributions, and Independence

Random elements and processes; distributions and expectation; independence; zero-one laws; Borel–Cantelli lemma; Bernoulli sequences and existence; moments and continuity of paths

Armed with the basic notions and results of measure theory from the previous chapter, we may now embark on our study of probability theory itself. The dual purpose of this chapter is to introduce the basic terminology and notation and to prove some fundamental results, many of which are used throughout the remainder of this book.

In modern probability theory it is customary to relate all objects of study to a basic probability space (Ω, A, P), which is nothing more than a normalized measure space. Random variables ξ may then be defined as measurable functions on Ω, and their expected values as the integrals Eξ = ∫ξ dP. Furthermore, independence between random quantities reduces to a kind of orthogonality between the induced sub-σ-fields. It should be noted, however, that the reference space Ω is introduced only for technical convenience, to provide a consistent mathematical framework. Indeed, the actual choice of Ω plays no role, and the interest focuses instead on the various induced distributions L(ξ) = P ∘ ξ^{−1}.

The notion of independence is fundamental for all areas of probability theory. Despite its simplicity, it has some truly remarkable consequences. A particularly striking result is Kolmogorov's 0–1 law, which states that every tail event associated with a sequence of independent random elements has probability zero or one. As a consequence, any random variable that depends only on the "tail" of the sequence must be a.s. constant. This result and the related Hewitt–Savage 0–1 law convey much of the flavor of modern probability: Although the individual elements of a random sequence are erratic and unpredictable, the long-term behavior may often conform to deterministic laws and patterns. Our main objective is to uncover the latter. Here the classical Borel–Cantelli lemma is a useful tool, among others.

To justify our study, we need to ensure the existence of the random objects under discussion. For most purposes, it suffices to use the Lebesgue unit interval ([0,1], B, λ) as the basic probability space. In this chapter the existence will be proved only for independent random variables with prescribed distributions; we postpone the more general discussion until Chapter 6. As a key step, we use the binary expansion of real numbers to construct a so-called Bernoulli sequence, consisting of independent random digits 0 or 1 with probabilities 1 − p and p, respectively. Such sequences may be regarded as discrete-time counterparts of the fundamental Poisson process, to be introduced and studied in Chapter 12.

The distribution of a random process X is determined by the finite-dimensional distributions, and those are not affected if we change each value X_t on a null set. It is then natural to look for versions of X with suitable regularity properties. As another striking result, we shall provide a moment condition that ensures the existence of a continuous modification of the process. Regularizations of various kinds are important throughout modern probability theory, as they may enable us to deal with events depending on the values of a process at uncountably many times.

To begin our systematic exposition of the theory, we may fix an arbitrary probability space (Ω, A, P), where P, the probability measure, has total mass 1. In the probabilistic context the sets A ∈ A are called events, and PA = P(A) is called the probability of A. In addition to results valid for all measures, there are properties that depend on the boundedness or normalization of P, such as the relation PA^c = 1 − PA and the fact that A_n ↓ A implies PA_n → PA.

Some infinite set operations have special probabilistic significance. Thus, given any sequence of events A₁, A₂, ... ∈ A, we may be interested in the sets {A_n i.o.}, where A_n happens infinitely often, and {A_n ult.}, where A_n happens ultimately (i.e., for all but finitely many n). Those occurrences are events in their own right, expressible in terms of the A_n as

{A_n i.o.} = {Σ_n 1_{A_n} = ∞} = ∩_n ∪_{k≥n} A_k,  (1)
{A_n ult.} = {Σ_n 1_{A_n^c} < ∞} = ∪_n ∩_{k≥n} A_k.  (2)

From here on, we omit the argument ω from our notation when there is no risk for confusion. For example, the expression {Σ_n 1_{A_n} = ∞} is used as a convenient shorthand form of the unwieldy {ω ∈ Ω; Σ_n 1_{A_n}(ω) = ∞}.

The indicator functions of the events in (1) and (2) may be expressed as

1{A_n i.o.} = limsup_{n→∞} 1_{A_n},  1{A_n ult.} = liminf_{n→∞} 1_{A_n},

where, for typographical convenience, we write 1{·} instead of 1_{{·}}. Applying Fatou's lemma to the functions 1_{A_n} and 1_{A_n^c}, we get

P{A_n i.o.} ≥ limsup_{n→∞} PA_n,  P{A_n ult.} ≤ liminf_{n→∞} PA_n.

Using the continuity and subadditivity of P, we further see from (1) that

P{A_n i.o.} = lim_{n→∞} P ∪_{k≥n} A_k ≤ lim_{n→∞} Σ_{k≥n} PA_k.

If Σ_n PA_n < ∞, we get zero on the right, and it follows that P{A_n i.o.} = 0. The resulting implication constitutes the easy part of the Borel–Cantelli lemma, to be reconsidered in Theorem 3.18.

Any measurable mapping ξ of Ω into some measurable space (S, S) is called a random element in S. If B ∈ S, then {ξ ∈ B} = ξ^{−1}B ∈ A, and we may consider the associated probabilities

P{ξ ∈ B} = P(ξ^{−1}B) = (P ∘ ξ^{−1})B,  B ∈ S.

The set function L(ξ) = P ∘ ξ^{−1} is a probability measure on the range space S of ξ, called the distribution or law of ξ. We shall also use the term distribution as synonymous with probability measure, even when no generating random element has been introduced.

Random elements are of interest in a wide variety of spaces. A random element in S is called a random variable when S = ℝ, a random vector when S = ℝ^d, a random sequence when S = ℝ^∞, a random or stochastic process when S is a function space, and a random measure or set when S is a class of measures or sets, respectively. A metric or topological space S will be endowed with its Borel σ-field B(S) unless a σ-field is otherwise specified. For any separable metric space S, it is clear from Lemma 1.2 that ξ = (ξ₁, ξ₂, ...) is a random element in S^∞ iff ξ₁, ξ₂, ... are random elements in S.

If (S, S) is a measurable space, then any subset A ⊂ S becomes a measurable space in its own right when endowed with the σ-field A ∩ S = {A ∩ B; B ∈ S}. By Lemma 1.6 we note in particular that if S is a metric space with Borel σ-field S, then A ∩ S is the Borel σ-field in A. Any random element in (A, A ∩ S) may clearly be regarded, alternatively, as a random element in S. Conversely, if ξ is a random element in S such that ξ ∈ A a.s. (almost surely or with probability 1) for some A ∈ S, then ξ = η a.s. for some random element η in A.

Fixing a measurable space (S, S) and an abstract index set T, we shall write S^T for the class of functions f: T → S, and let S^T also denote the σ-field in S^T generated by all evaluation maps π_t: S^T → S, t ∈ T, given by π_t f = f(t). If X: Ω → U ⊂ S^T, then clearly X_t = π_t ∘ X maps Ω into S. Thus, X may also be regarded as a function X(t,ω) = X_t(ω) from T × Ω to S.

Lemma 3.1 (measurability) Fix a measurable space (S, S), an index set T, and a subset U ⊂ S^T. Then a function X: Ω → U is U ∩ S^T-measurable iff X_t: Ω → S is S-measurable for every t ∈ T.

Proof: Since X is U-valued, the U ∩ S^T-measurability is equivalent to measurability with respect to S^T. The result now follows by Lemma 1.4 from the fact that S^T is generated by the mappings π_t.  □

A mapping X with the properties in Lemma 3.1 is called an S-valued (random) process on T with paths in U. By the lemma it is equivalent to regard X as a collection of random elements X_t in the state space S.
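Returning to the Borel–Cantelli bound P{A_n i.o.} ≤ lim_n Σ_{k≥n} PA_k derived above, a quick simulation makes it concrete: with independent events of probability PA_k = k^{−2}, the chance that any A_k with k ≥ 10 occurs stays below the tail sum Σ_{k≥10} k^{−2} ≈ 0.105. The sketch below (all parameters are our own choices) checks this by Monte Carlo:

```python
import random

random.seed(1)
N, TRIALS, START = 1000, 1000, 10
tail_bound = sum(1.0 / k**2 for k in range(START, N))   # Σ_{k≥10} P A_k ≈ 0.105

hits = 0
for _ in range(TRIALS):
    # independent events A_k with P A_k = k⁻²; does any A_k with k ≥ 10 occur?
    if any(random.random() < 1.0 / k**2 for k in range(START, N)):
        hits += 1
freq = hits / TRIALS
assert freq <= tail_bound + 0.03        # P ∪_{k≥n} A_k ≤ Σ_{k≥n} P A_k
```

Since the tail sums tend to 0, only finitely many A_n occur almost surely, which is the easy half of the lemma.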

48

Foundations of Modern Probability

For any random elements ξ and η in a common measurable space, the equality ξ =d η means that ξ and η have the same distribution, or L(ξ) = L(η). If X is a random process on some index set T, the associated finite-dimensional distributions are given by

μ_{t_1,...,t_n} = L(X_{t_1}, ..., X_{t_n}), t_1, ..., t_n ∈ T, n ∈ N.

The following result shows that the distribution of a process is determined by the set of finite-dimensional distributions.

Proposition 3.2 (finite-dimensional distributions) Fix any S, T, and U as in Lemma 3.1, and let X and Y be processes on T with paths in U. Then X =d Y iff

(X_{t_1}, ..., X_{t_n}) =d (Y_{t_1}, ..., Y_{t_n}), t_1, ..., t_n ∈ T, n ∈ N. (3)

Proof: Assume (3). Let D denote the class of sets A ∈ 𝒮^T with P{X ∈ A} = P{Y ∈ A}, and let C consist of all sets

A = {f ∈ S^T; (f_{t_1}, ..., f_{t_n}) ∈ B}, t_1, ..., t_n ∈ T, B ∈ S^n, n ∈ N.

Then C is a π-system and D a λ-system, and furthermore C ⊂ D by hypothesis. Hence, 𝒮^T = σ(C) ⊂ D by Theorem 1.1, which means that X =d Y. □

For any random vector ξ = (ξ_1, ..., ξ_d) in R^d, we define the associated distribution function F by

F(x_1, ..., x_d) = P ∩_{k≤d} {ξ_k ≤ x_k}, x_1, ..., x_d ∈ R.

The next result shows that F determines the distribution of ξ.

Lemma 3.3 (distribution functions) Let ξ and η be random vectors in R^d with distribution functions F and G. Then ξ =d η iff F = G.

Proof: Use Theorem 1.1. □

The expected value, expectation, or mean of a random variable ξ is defined as

Eξ = ∫_Ω ξ dP = ∫_R x (P ∘ ξ^{-1})(dx), (4)

whenever either integral exists. The last equality then holds by Lemma 1.22. By the same result we note that, for any random element ξ in some measurable space S and for an arbitrary measurable function f: S → R,

Ef(ξ) = ∫_Ω f(ξ) dP = ∫_S f(s) (P ∘ ξ^{-1})(ds) = ∫_R x (P ∘ (f ∘ ξ)^{-1})(dx), (5)


provided that at least one of the three integrals exists. Integrals over a measurable subset A ⊂ Ω are often denoted by

E[ξ; A] = E(ξ 1_A) = ∫_A ξ dP, A ∈ 𝒜.

For any random variable ξ and constant p > 0, the integral E|ξ|^p = ‖ξ‖_p^p is called the pth absolute moment of ξ. By Hölder's inequality (or by Jensen's inequality in Lemma 3.5) we have ‖ξ‖_p ≤ ‖ξ‖_q for p ≤ q, so the corresponding L^p-spaces are nonincreasing in p. If ξ ∈ L^p and either p ∈ N or ξ ≥ 0, we may further define the pth moment of ξ as Eξ^p. The following result gives a useful relationship between moments and tail probabilities.

Lemma 3.4 (moments and tails) For any random variable ξ ≥ 0 and p > 0,

Eξ^p = p ∫_0^∞ P{ξ > t} t^{p-1} dt = p ∫_0^∞ P{ξ ≥ t} t^{p-1} dt.

Proof: By calculus and Fubini's theorem,

Eξ^p = p E ∫_0^ξ t^{p-1} dt = p E ∫_0^∞ 1{ξ > t} t^{p-1} dt = p ∫_0^∞ P{ξ > t} t^{p-1} dt.

The proof of the second expression is similar. □
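The identity can be checked by simulation: for ξ ~ Exp(1) and p = 2 we have Eξ² = 2, and the tail-integral form p ∫₀^∞ P{ξ > t} t^{p-1} dt can be approximated by a Riemann sum against the empirical survival function. A minimal sketch (the step 0.005 and truncation point 25 are ad hoc choices, not part of the result):

```python
import bisect
import random

random.seed(0)
n = 100_000
p = 2.0
xs = sorted(random.expovariate(1.0) for _ in range(n))  # xi ~ Exp(1), so E xi^2 = 2

def surv(t):
    """Empirical survival function P{xi > t}."""
    return (n - bisect.bisect_right(xs, t)) / n

# Direct estimate of the p-th moment E xi^p
direct = sum(x ** p for x in xs) / n

# Tail-integral form  p * integral of P{xi > t} t^(p-1) dt,  midpoint Riemann sum
dt = 0.005
tail = sum(p * surv((k + 0.5) * dt) * ((k + 0.5) * dt) ** (p - 1) * dt
           for k in range(int(25 / dt)))

print(round(direct, 3), round(tail, 3))  # both close to 2
```

Both estimates are computed from the same sample, so they agree up to discretization error even before Monte Carlo error enters.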

A random vector ξ = (ξ_1, ..., ξ_d) or process X = (X_t) is said to be integrable if integrability holds for every component ξ_k or value X_t, in which case we may write Eξ = (Eξ_1, ..., Eξ_d) or EX = (EX_t). Recall that a function f: R^d → R is said to be convex if

f(px + (1 - p)y) ≤ p f(x) + (1 - p) f(y), x, y ∈ R^d, p ∈ [0, 1]. (6)

The relation may be written as f(Eξ) ≤ Ef(ξ), where ξ is a random vector in R^d with P{ξ = x} = 1 - P{ξ = y} = p. The following extension to arbitrary integrable random vectors is known as Jensen's inequality.

Lemma 3.5 (convex maps, Hölder, Jensen) For any integrable random vector ξ in R^d and convex function f: R^d → R, we have Ef(ξ) ≥ f(Eξ).

Proof: By a version of the Hahn–Banach theorem, the convexity condition (6) is equivalent to the existence, for every s ∈ R^d, of a supporting affine function h_s(x) = ax + b with f ≥ h_s and f(s) = h_s(s). Taking s = Eξ gives

Ef(ξ) ≥ Eh_s(ξ) = h_s(Eξ) = f(Eξ). □
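Jensen's inequality is easy to see numerically: with the convex choice f(x) = x² (my example, not the text's), the gap Ef(ξ) - f(Eξ) is exactly the variance of ξ, so it is plainly visible on simulated N(0, 1) data:

```python
import random

random.seed(1)
xs = [random.gauss(0.0, 1.0) for _ in range(50_000)]

f = lambda x: x * x            # convex, so E f(xi) >= f(E xi)

mean = sum(xs) / len(xs)       # ~ E xi = 0
Ef = sum(f(x) for x in xs) / len(xs)   # ~ E xi^2 = 1
fE = f(mean)                   # ~ 0; the gap Ef - fE is the sample variance

print(Ef >= fE, round(Ef - fE, 3))
```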

The covariance of two random variables ξ, η ∈ L² is given by

cov(ξ, η) = E(ξ - Eξ)(η - Eη) = Eξη - Eξ · Eη.


The resulting functional is bilinear, in the sense that

cov(Σ_{j≤m} a_j ξ_j, Σ_{k≤n} b_k η_k) = Σ_{j≤m} Σ_{k≤n} a_j b_k cov(ξ_j, η_k).

Taking η = ξ ∈ L² yields the variance

var(ξ) = cov(ξ, ξ) = E(ξ - Eξ)² = Eξ² - (Eξ)²,

and we note that, by the Cauchy–Buniakovsky inequality,

|cov(ξ, η)| ≤ {var(ξ) var(η)}^{1/2}.

Two random variables ξ and η are said to be uncorrelated if cov(ξ, η) = 0. For any collection of random variables ξ_t ∈ L², t ∈ T, the associated covariance function ρ_{s,t} = cov(ξ_s, ξ_t), s, t ∈ T, is nonnegative definite, in the sense that Σ_{i,j} a_i a_j ρ_{t_i,t_j} ≥ 0 for any n ∈ N, t_1, ..., t_n ∈ T, and a_1, ..., a_n ∈ R. This is clear if we write

Σ_{i,j} a_i a_j ρ_{t_i,t_j} = Σ_{i,j} a_i a_j cov(ξ_{t_i}, ξ_{t_j}) = var(Σ_i a_i ξ_{t_i}) ≥ 0.

The events A_t ∈ 𝒜, t ∈ T, are said to be (mutually) independent if, for any distinct indices t_1, ..., t_n ∈ T,

P ∩_{k≤n} A_{t_k} = ∏_{k≤n} P A_{t_k}. (7)

More generally, we say that the families C_t ⊂ 𝒜, t ∈ T, are independent if independence holds between the events A_t for arbitrary A_t ∈ C_t, t ∈ T. Finally, the random elements ξ_t, t ∈ T, are independent if independence holds between the generated σ-fields σ(ξ_t), t ∈ T. Pairwise independence between two objects A and B, ξ and η, or B and C is often denoted by A ⊥⊥ B, ξ ⊥⊥ η, or B ⊥⊥ C, respectively. The following result is often useful to prove extensions of the independence property.

Lemma 3.6 (extension) If the π-systems C_t, t ∈ T, are independent, then so are the generated σ-fields F_t = σ(C_t), t ∈ T.

Proof: We may clearly assume that C_t ≠ ∅ for all t. Fix any distinct indices t_1, ..., t_n ∈ T, and note that (7) holds for arbitrary A_{t_k} ∈ C_{t_k}, k = 1, ..., n. For fixed A_{t_2}, ..., A_{t_n}, we introduce the class D of sets A_{t_1} ∈ 𝒜 satisfying (7). Then D is a λ-system containing C_{t_1}, and so D ⊃ σ(C_{t_1}) = F_{t_1} by Theorem 1.1. Thus, (7) holds for arbitrary A_{t_1} ∈ F_{t_1} and A_{t_k} ∈ C_{t_k}, k = 2, ..., n. Proceeding recursively in n steps, we obtain the desired extension to arbitrary A_{t_k} ∈ F_{t_k}, k = 1, ..., n. □

As an immediate consequence, we obtain the following basic grouping property. Here and in the sequel we shall often write F ∨ G = σ{F, G} and F_S = ⋁_{t∈S} F_t = σ{F_t; t ∈ S}.


Corollary 3.7 (grouping) Let F_t, t ∈ T, be independent σ-fields, and let Γ be a disjoint partition of T. Then the σ-fields F_S = ⋁_{t∈S} F_t, S ∈ Γ, are again independent.

Proof: For any S ∈ Γ, let C_S denote the class of all finite intersections of sets in ∪_{t∈S} F_t. Then the classes C_S are independent π-systems, and by Lemma 3.6 the independence extends to the generated σ-fields F_S. □

Though independence between more than two σ-fields is clearly stronger than pairwise independence, we shall see how the full independence may be reduced to the pairwise notion in various ways. Given any set T, we say that a class Γ ⊂ 2^T is separating if, for any s ≠ t in T, there exists some S ∈ Γ such that exactly one of the elements s and t lies in S.

Lemma 3.8 (pairwise independence)
(i) The σ-fields F_1, F_2, ... are independent iff ⋁_{k≤n} F_k ⊥⊥ F_{n+1} for all n.
(ii) The σ-fields F_t, t ∈ T, are independent iff F_S ⊥⊥ F_{S^c} for all sets S in some separating class Γ ⊂ 2^T.

Proof: The necessity of the two conditions follows from Corollary 3.7. As for the sufficiency, we consider only part (ii), the proof for (i) being similar. Under the stated condition, we need to show that, for any finite subset S ⊂ T, the σ-fields F_s, s ∈ S, are independent. Let |S| denote the cardinality of S, and assume the statement to be true for |S| ≤ n. Proceeding to the case when |S| = n + 1, we may choose U ∈ Γ such that S' = S ∩ U and S'' = S \ U are nonempty. Since F_{S'} ⊥⊥ F_{S''}, we get for any sets A_s ∈ F_s, s ∈ S,

P ∩_{s∈S} A_s = (P ∩_{s∈S'} A_s)(P ∩_{s∈S''} A_s) = ∏_{s∈S} P A_s,

where the last relation follows from the induction hypothesis. □

A σ-field F is said to be P-trivial if PA = 0 or 1 for every A ∈ F. We further say that a random element is a.s. degenerate if its distribution is a degenerate probability measure.

Lemma 3.9 (triviality and degeneracy) A σ-field F is P-trivial iff F ⊥⊥ F. In that case, any F-measurable random element ξ taking values in a separable metric space is a.s. degenerate.

Proof: If F ⊥⊥ F, then for any A ∈ F we have PA = P(A ∩ A) = (PA)², and so PA = 0 or 1. Conversely, assume that F is P-trivial. Then for any two sets A, B ∈ F we have P(A ∩ B) = PA ∧ PB = PA · PB, which means that F ⊥⊥ F.

Now assume that F is P-trivial, and let ξ be as stated. For each n we may partition S into countably many disjoint Borel sets B_{nj} of diameter < n^{-1}. Since P{ξ ∈ B_{nj}} = 0 or 1, we have ξ ∈ B_{nj} a.s. for exactly one j,


say for j = j_n. Hence, ξ ∈ ∩_n B_{n,j_n} a.s. The latter set has diameter 0, so it consists of exactly one point s, and we get ξ = s a.s. □

The next result gives the basic relation between independence and product measures.

Lemma 3.10 (product measures) Let ξ_1, ..., ξ_n be random elements in some measurable spaces S_1, ..., S_n with distributions μ_1, ..., μ_n. Then the ξ_k are independent iff ξ = (ξ_1, ..., ξ_n) has distribution μ_1 ⊗ ⋯ ⊗ μ_n.

Proof: Assuming the independence, we get for any measurable product set B = B_1 × ⋯ × B_n

P{ξ ∈ B} = ∏_k P{ξ_k ∈ B_k} = ∏_k μ_k B_k = (⊗_k μ_k) B.

This extends by Theorem 1.1 to arbitrary sets in the product σ-field. □

In conjunction with Fubini's theorem, the last result leads to a useful method of computing expected values.

Lemma 3.11 (conditioning) Let ξ and η be independent random elements in some measurable spaces S and T, and let the function f: S × T → R be measurable with E(E|f(s, η)|)_{s=ξ} < ∞. Then Ef(ξ, η) = E(Ef(s, η))_{s=ξ}.

Proof: Let μ and ν denote the distributions of ξ and η, respectively. Assuming that f ≥ 0 and writing g(s) = Ef(s, η), we get, by Lemma 1.22 and Fubini's theorem,

Ef(ξ, η) = ∫ f(s, t) (μ ⊗ ν)(ds dt) = ∫ μ(ds) ∫ f(s, t) ν(dt) = ∫ g(s) μ(ds) = Eg(ξ).

For general f, this applies to the function |f|, and so E|f(ξ, η)| < ∞. The desired relation then follows as before. □
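The computational content of Lemma 3.11 is that one may integrate out η first. A sketch with ξ, η independent U(0, 1) and f(s, t) = st (my choice of example), where the inner expectation g(s) = Ef(s, η) = s/2 is explicit, so both routes should give Ef(ξ, η) = 1/4:

```python
import random

random.seed(2)
n = 100_000
xi  = [random.random() for _ in range(n)]   # xi  ~ U(0,1)
eta = [random.random() for _ in range(n)]   # eta ~ U(0,1), independent of xi

f = lambda s, t: s * t

# Direct route: average f over the joint sample
direct = sum(f(x, y) for x, y in zip(xi, eta)) / n

# Iterated route: g(s) = E f(s, eta) = s/2 in closed form, then average g over xi
g = lambda s: 0.5 * s
iterated = sum(g(x) for x in xi) / n

print(round(direct, 3), round(iterated, 3))  # both near 0.25
```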

In particular, for any independent random variables ξ_1, ..., ξ_n we have

E(ξ_1 ⋯ ξ_n) = Eξ_1 ⋯ Eξ_n,

whenever the expressions on the right exist. If ξ and η are random elements in a measurable group G, then the product ξη is again a random element in G. The following result gives the connection between independence and the convolutions of Lemma 1.28.

Corollary 3.12 (convolution) Let ξ and η be independent random elements with distributions μ and ν, respectively, in some measurable group G. Then the product ξη has distribution μ * ν.

Proof: For any measurable set B ⊂ G, we get by Lemma 3.10 and the definition of convolution

P{ξη ∈ B} = (μ ⊗ ν){(x, y) ∈ G²; xy ∈ B} = (μ * ν)B. □


Given any sequence of σ-fields F_1, F_2, ..., we introduce the associated tail σ-field

T = ∩_n ⋁_{k>n} F_k = ∩_n σ{F_k; k > n}.

The following remarkable result shows that T is trivial whenever the F_n are independent. An extension appears in Corollary 7.25.

Theorem 3.13 (Kolmogorov's 0–1 law) Let F_1, F_2, ... be independent σ-fields. Then the tail σ-field T = ∩_n ⋁_{k>n} F_k is P-trivial.

Proof: For each n ∈ N, define T_n = ⋁_{k>n} F_k, and note that F_1, ..., F_n, T_n are independent by Corollary 3.7. Hence, so are the σ-fields F_1, ..., F_n, T, and then also F_1, F_2, ..., T. By the same theorem we obtain ⋁_n F_n ⊥⊥ T, and so T ⊥⊥ T. Thus, T is P-trivial by Lemma 3.9. □

We shall consider some simple illustrations of the last theorem.

Corollary 3.14 (sums and averages) Let ξ_1, ξ_2, ... be independent random variables, and put S_n = ξ_1 + ⋯ + ξ_n. Then each of the sequences (S_n) and (S_n/n) is either a.s. convergent or a.s. divergent. For the latter sequence, the possible limit is a.s. degenerate.

Proof: Define F_n = σ{ξ_n}, n ∈ N, and note that the associated tail σ-field T is P-trivial by Theorem 3.13. Since the sets of convergence of (S_n) and (S_n/n) are T-measurable by Lemma 1.9, the first assertion follows. The second assertion is obtained from Lemma 3.9. □
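Corollary 3.14 says that when S_n/n converges, its limit is a constant. A quick illustration with Exp(1) summands, where independent runs of S_n/n all land near the same degenerate limit Eξ = 1 (that the limit exists here is the strong law of large numbers, proved in Chapter 4):

```python
import random

random.seed(3)

def average(n=200_000):
    """One independent run of S_n / n with i.i.d. Exp(1) summands."""
    return sum(random.expovariate(1.0) for _ in range(n)) / n

runs = [average() for _ in range(5)]
spread = max(runs) - min(runs)

print([round(r, 3) for r in runs], round(spread, 4))
```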

By a finite permutation of N we mean a bijective map p: N → N such that p_n = n for all but finitely many n. For any space S, a finite permutation p of N induces a permutation T_p on S^∞ given by

T_p(s) = s ∘ p = (s_{p_1}, s_{p_2}, ...), s = (s_1, s_2, ...) ∈ S^∞.

A set I ⊂ S^∞ is said to be symmetric (under finite permutations) if

T_p^{-1} I ≡ {s ∈ S^∞; s ∘ p ∈ I} = I

for every finite permutation p of N. If (S, S) is a measurable space, the symmetric sets I ∈ 𝒮^∞ form a sub-σ-field I ⊂ 𝒮^∞, called the permutation invariant σ-field in S^∞. We may now state the other basic 0–1 law, which refers to sequences of random elements that are independent and identically distributed (often abbreviated as i.i.d.).

Theorem 3.15 (Hewitt–Savage 0–1 law) Let ξ be an infinite sequence of i.i.d. random elements in some measurable space (S, S), and let I denote the permutation invariant σ-field in S^∞. Then the σ-field ξ^{-1} I is P-trivial.

Our proof is based on a simple approximation. Write

A△B = (A \ B) ∪ (B \ A),


and note that

1_{A△B} = |1_A - 1_B|, A, B ⊂ Ω. (8)

Lemma 3.16 (approximation) Given any σ-fields F_1 ⊂ F_2 ⊂ ⋯ and a set A ∈ ⋁_n F_n, there exist some A_1, A_2, ... ∈ ∪_n F_n with P(A△A_n) → 0.

Proof: Define C = ∪_n F_n, and let D denote the class of sets A ∈ ⋁_n F_n with the stated property. Then C is a π-system and D a λ-system containing C. By Theorem 1.1 we get ⋁_n F_n = σ(C) ⊂ D. □

Proof of Theorem 3.15: Define μ = L(ξ), put F_n = 𝒮^n × S^∞, and note that I ⊂ 𝒮^∞ = ⋁_n F_n. For any I ∈ I there exist by Lemma 3.16 some B_n ∈ 𝒮^n such that the corresponding cylinder sets I_n = B_n × S^∞ satisfy μ(I△I_n) → 0. Writing Ĩ_n = S^n × B_n × S^∞, it is clear from the symmetry of μ and I that μĨ_n = μI_n → μI and μ(I△Ĩ_n) = μ(I△I_n) → 0. Hence, by (8),

μ(I△(I_n ∩ Ĩ_n)) ≤ μ(I△I_n) + μ(I△Ĩ_n) → 0.

Since moreover I_n ⊥⊥ Ĩ_n under μ, we get

μI ← μ(I_n ∩ Ĩ_n) = (μI_n)(μĨ_n) → (μI)².

Thus, μI = (μI)², and so P ∘ ξ^{-1} I = μI = 0 or 1. □

The next result lists some typical applications. Say that a random variable ξ is symmetric if ξ =d -ξ.

Corollary 3.17 (random walk) Let ξ_1, ξ_2, ... be i.i.d., nondegenerate random variables, and put S_n = ξ_1 + ⋯ + ξ_n. Then
(i) P{S_n ∈ B i.o.} = 0 or 1 for any B ∈ B;
(ii) limsup_n S_n = ∞ a.s. or -∞ a.s.;
(iii) limsup_n (±S_n) = ∞ a.s. if the ξ_n are symmetric.

Proof: Statement (i) is immediate from Theorem 3.15, since for any finite permutation p of N we have x_{p_1} + ⋯ + x_{p_n} = x_1 + ⋯ + x_n for all but finitely many n. To prove (ii), conclude from Theorem 3.15 and Lemma 3.9 that limsup_n S_n = c a.s. for some constant c ∈ R̄ = [-∞, ∞]. Hence, a.s.,

c = limsup_n S_{n+1} = limsup_n (S_{n+1} - ξ_1) + ξ_1 = c + ξ_1.

If |c| < ∞, we get ξ_1 = 0 a.s., which contradicts the nondegeneracy of ξ_1. Thus, |c| = ∞. In case (iii), we have

c = limsup_n S_n ≥ liminf_n S_n = -limsup_n (-S_n) = -c,

and so -c ≤ c ∈ {±∞}, which implies c = ∞. □
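In the spirit of part (iii), a symmetric ±1 random walk keeps reaching new highs and new lows, so both running extremes grow without bound. A small simulation sketch:

```python
import random

random.seed(9)

def extremes(n=100_000):
    """Running maximum and minimum of a symmetric +-1 random walk S_k, k <= n."""
    s = smax = smin = 0
    for _ in range(n):
        s += 1 if random.random() < 0.5 else -1
        smax = max(smax, s)
        smin = min(smin, s)
    return smax, smin

smax, smin = extremes()
print(smax, smin)  # both large in absolute value
```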

Using a suitable zero–one law, one can often rather easily see that a given event has probability zero or one. Determining which alternative actually occurs is often harder. The following classical result, known as the Borel–Cantelli lemma, may then be helpful, especially when the events are independent. An extension to the general case appears in Corollary 7.20.

Theorem 3.18 (Borel, Cantelli) Let A_1, A_2, ... ∈ 𝒜. Then Σ_n P A_n < ∞ implies P{A_n i.o.} = 0, and the two conditions are equivalent when the A_n are independent.

Here the first assertion was proved earlier as an application of Fatou's lemma. The use of expected values allows a more transparent argument.

Proof: If Σ_n P A_n < ∞, we get by monotone convergence

E Σ_n 1_{A_n} = Σ_n P A_n < ∞.

Thus, Σ_n 1_{A_n} < ∞ a.s., which means that P{A_n i.o.} = 0.

Next assume that the A_n are independent and satisfy Σ_n P A_n = ∞. Noting that 1 - x ≤ e^{-x} for all x, we get

P ∪_{k≥n} A_k = 1 - ∏_{k≥n} (1 - P A_k) ≥ 1 - ∏_{k≥n} exp(-P A_k) = 1 - exp{-Σ_{k≥n} P A_k} = 1.

Hence, as n → ∞,

1 = P ∪_{k≥n} A_k ↓ P ∩_n ∪_{k≥n} A_k = P{A_n i.o.},

and so the probability on the right equals 1. □
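The dichotomy can be watched in simulation: with independent events A_n of probability n^{-2} (summable) the occurrences die out quickly, while with probability n^{-1} (divergent sum) they keep arriving. A rough sketch, with the cutoff N = 5000 a purely practical choice:

```python
import random

random.seed(4)
N = 5_000

def occurrences(prob):
    """Indices n <= N at which the independent event A_n, with P A_n = prob(n), occurs."""
    return [n for n in range(1, N + 1) if random.random() < prob(n)]

summable  = occurrences(lambda n: 1.0 / n ** 2)  # sum P A_n < infinity: finitely many
divergent = occurrences(lambda n: 1.0 / n)       # sum P A_n = infinity: A_n i.o.

print(len(summable), len(divergent))  # few vs. many
```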

For many purposes it is sufficient to use the Lebesgue unit interval ([0, 1], B[0, 1], λ) as the basic probability space. In particular, the following result ensures the existence on [0, 1] of some independent random variables ξ_1, ξ_2, ... with arbitrarily prescribed distributions. The present statement is only preliminary. Thus, we shall remove the independence assumption in Theorem 6.14, prove an extension to arbitrary index sets in Theorem 6.16, and eliminate the restriction on the spaces in Theorem 6.17.

Theorem 3.19 (existence, Borel) For any probability measures μ_1, μ_2, ... on some Borel spaces S_1, S_2, ..., there exist some independent random elements ξ_1, ξ_2, ... on ([0, 1], λ) with distributions μ_1, μ_2, .... As a consequence, there exists a probability measure μ on S_1 × S_2 × ⋯ satisfying

μ ∘ (π_1, ..., π_n)^{-1} = μ_1 ⊗ ⋯ ⊗ μ_n, n ∈ N.

For the proof, we first consider two special cases of independent interest. By a Bernoulli sequence with rate p we mean a sequence of i.i.d. random variables ξ_1, ξ_2, ... such that P{ξ_n = 1} = 1 - P{ξ_n = 0} = p. Furthermore, we say that a random variable ϑ is uniformly distributed on [0, 1] (written as U(0, 1)) if its distribution L(ϑ) equals Lebesgue measure λ on [0, 1]. Every number x ∈ [0, 1] has a binary expansion r_1, r_2, ... ∈ {0, 1} satisfying x = Σ_n r_n 2^{-n}, and to ensure uniqueness we assume that Σ_n r_n = ∞ when x > 0. The following result provides a simple construction of a Bernoulli sequence on the Lebesgue unit interval.

Lemma 3.20 (Bernoulli sequence) Let ϑ be a random variable in [0, 1] with binary expansion ξ_1, ξ_2, .... Then ϑ is U(0, 1) iff the ξ_n form a Bernoulli sequence with rate ½.

Proof: If ϑ is U(0, 1), then P ∩_{j≤n} {ξ_j = k_j} = 2^{-n} for all k_1, ..., k_n ∈ {0, 1}. Summing over k_1, ..., k_{n-1} gives P{ξ_n = k} = ½ for k = 0 and 1. A similar calculation yields the asserted independence.

Now assume instead that the ξ_n form a Bernoulli sequence with rate ½. Letting ϑ̃ be U(0, 1) with binary expansion ξ̃_1, ξ̃_2, ..., we get (ξ_n) =d (ξ̃_n). Thus,

ϑ = Σ_n ξ_n 2^{-n} =d Σ_n ξ̃_n 2^{-n} = ϑ̃. □
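Lemma 3.20 can be checked empirically: extract the first three binary digits of simulated uniforms and verify that all 2³ = 8 digit patterns appear with frequency near 1/8, which packages both the rate-½ claim and the independence. A small sketch:

```python
import random
from collections import Counter

random.seed(5)
n = 20_000

def bits(x, k=3):
    """First k binary digits of x in [0, 1)."""
    out = []
    for _ in range(k):
        x *= 2
        d = int(x)
        out.append(d)
        x -= d
    return tuple(out)

counts = Counter(bits(random.random()) for _ in range(n))
freqs = {pat: c / n for pat, c in counts.items()}

print(sorted(freqs.values()))  # all eight near 0.125
```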

The next result shows how a single U(0, 1) random variable can be used to generate a whole sequence.

Lemma 3.21 (reproduction) There exist some measurable functions f_1, f_2, ... on [0, 1] such that whenever ϑ is U(0, 1), the random variables ϑ_n = f_n(ϑ) are i.i.d. U(0, 1).

Proof: For any x ∈ [0, 1] we introduce the associated binary expansion g_1(x), g_2(x), ..., and note that the g_k are measurable. Rearranging the g_k into a two-dimensional array h_{nj}, n, j ∈ N, we define

f_n(x) = Σ_j 2^{-j} h_{nj}(x), x ∈ [0, 1], n ∈ N.

By Lemma 3.20 the random variables g_k(ϑ) form a Bernoulli sequence with rate ½, and the same result shows that the variables ϑ_n = f_n(ϑ) are U(0, 1). The latter are further independent by Corollary 3.7. □

Finally, we need to construct a random element with a given distribution from an arbitrary randomization variable. The required lemma is stated in a version for kernels, to meet the needs of Chapters 6, 8, and 14.

Lemma 3.22 (kernels and randomization) Let μ be a probability kernel from a measurable space S to a Borel space T. Then there exists a measurable function f: S × [0, 1] → T such that if ϑ is U(0, 1), then f(s, ϑ) has distribution μ(s, ·) for every s ∈ S.

Proof: We may assume that T is a Borel subset of [0, 1], in which case we may easily reduce to the case when T = [0, 1]. Define

f(s, t) = sup{x ∈ [0, 1]; μ(s, [0, x]) < t}, s ∈ S, t ∈ [0, 1], (9)

and note that f is product measurable on S × [0, 1], since the set {(s, t); μ(s, [0, x]) < t} is measurable for each x by Lemma 1.12, and the supremum in (9) can be restricted to rational x. If ϑ is U(0, 1), we get

P{f(s, ϑ) ≤ x} = P{ϑ ≤ μ(s, [0, x])} = μ(s, [0, x]), x ∈ [0, 1],

and so f(s, ϑ) has distribution μ(s, ·) by Lemma 3.3. □
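Formula (9) is the quantile (generalized inverse-cdf) transform, parametrized by s. As a concrete instance (my choice of kernel, not the lemma's generality), take μ(s, ·) to be the exponential distribution with rate s, whose quantile function is explicit; the sampled means should approach 1/s:

```python
import math
import random

random.seed(6)
n = 100_000

def f(s, t):
    """Quantile transform for mu(s, .) = Exp(s): the generalized inverse
    of the cdf 1 - exp(-s x) evaluated at t."""
    return -math.log(1.0 - t) / s

means = {}
for s in (0.5, 2.0):
    sample = [f(s, random.random()) for _ in range(n)]
    means[s] = sum(sample) / n    # should approach 1/s

print({s: round(m, 3) for s, m in means.items()})
```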

Proof of Theorem 3.19: By Lemma 3.22 there exist some measurable functions f_n: [0, 1] → S_n such that λ ∘ f_n^{-1} = μ_n. Letting ϑ be the identity mapping on [0, 1] and choosing ϑ_1, ϑ_2, ... as in Lemma 3.21, we note that the functions ξ_n = f_n(ϑ_n), n ∈ N, have the desired joint distribution. □

Next we consider the regularization and sample path properties of random processes. Say that two processes X and Y on the same index set T are versions of each other if X_t = Y_t a.s. for each t ∈ T. In the special case when T = R^d or R_+, we note that two continuous or right-continuous versions X and Y of the same process are indistinguishable, in the sense that X = Y a.s. In general, the latter notion is clearly stronger.

For any function f between two metric spaces (S, ρ) and (S', ρ'), the associated modulus of continuity w_f = w(f, ·) is given by

w_f(r) = sup{ρ'(f_s, f_t); s, t ∈ S, ρ(s, t) ≤ r}, r > 0.

Note that f is uniformly continuous iff w_f(r) → 0 as r → 0. Say that f is Hölder continuous with exponent c if w_f(r) ≲ r^c as r → 0. The property is said to hold locally if it is true on every bounded set. (Here and in the sequel, the relation f ≲ g between positive functions means that f ≤ cg for some constant c < ∞.)

A simple moment condition ensures the existence of a Hölder-continuous version of a given process on R^d. Important applications are given in Theorems 13.5, 21.3, and 22.4, and a related tightness criterion appears in Corollary 16.9.

Theorem 3.23 (moments and continuity, Kolmogorov, Loève, Chentsov) Let X be a process on R^d with values in a complete metric space (S, ρ), and assume for some a, b > 0 that

E{ρ(X_s, X_t)}^a ≲ |s - t|^{d+b}, s, t ∈ R^d. (10)

Then X has a continuous version, and the latter is a.s. locally Hölder continuous with exponent c for any c ∈ (0, b/a).

Proof: It is clearly enough to consider the restriction of X to [0, 1]^d. Define

D_n = 2^{-n} {0, 1, ..., 2^n}^d, n ∈ N,

and let

ξ_n = max{ρ(X_s, X_t); s, t ∈ D_n, |s - t| = 2^{-n}}, n ∈ N.

Since the number of pairs s, t ∈ D_n with |s - t| = 2^{-n} is of the order 2^{dn}, we get by (10), for any c ∈ (0, b/a),

E Σ_n (2^{cn} ξ_n)^a = Σ_n 2^{acn} E ξ_n^a ≲ Σ_n 2^{acn} 2^{dn} (2^{-n})^{d+b} = Σ_n 2^{(ac-b)n} < ∞.

The sum on the left is then a.s. convergent, and therefore ξ_n ≲ 2^{-cn} a.s. Now any two points s, t ∈ ∪_n D_n with |s - t| ≤ 2^{-m} can be connected by a piecewise linear path involving, for each n ≥ m, at most 2d steps between nearest neighbors in D_n. Thus, for r ∈ [2^{-m-1}, 2^{-m}],

sup{ρ(X_s, X_t); s, t ∈ ∪_n D_n, |s - t| ≤ r} ≲ Σ_{n≥m} ξ_n ≲ Σ_{n≥m} 2^{-cn} ≲ 2^{-cm} ≲ r^c,

which shows that X is a.s. Hölder continuous on ∪_n D_n with exponent c. In particular, there exists a continuous process Y on [0, 1]^d that agrees with X a.s. on ∪_n D_n, and it is easily seen that the Hölder continuity of Y on ∪_n D_n extends with the same exponent c to the entire cube [0, 1]^d.

To show that Y is a version of X, fix any t ∈ [0, 1]^d and choose t_1, t_2, ... ∈ ∪_n D_n with t_n → t. Then X_{t_n} = Y_{t_n} a.s. for each n. Furthermore, X_{t_n} → X_t in probability by (10), and Y_{t_n} → Y_t a.s. by continuity, so X_t = Y_t a.s. □

The next result shows how regularity of the paths may sometimes be established by comparison with a regular process.

Lemma 3.24 (transfer of regularity) Let X =d Y be random processes on some index set T, taking values in a separable metric space S, and assume that the paths of Y lie in a set U ⊂ S^T that is Borel for the σ-field U = (B(S))^T ∩ U. Then X has a version with paths in U.

Proof: For clarity we may write Ȳ for the path of Y, regarded as a random element in U. Then Ȳ is U-measurable, and by Lemma 1.13 there exists a measurable mapping f: S^T → U such that Ȳ = f(Y) a.s. Define X̄ = f(X), and note that (X, X̄) =d (Y, Ȳ). Since the diagonal in S² is measurable, we get in particular

P{X̄_t = X_t} = P{Ȳ_t = Y_t} = 1, t ∈ T. □

We conclude this chapter with a characterization of distribution functions in R^d, required in Chapter 5. For any vectors x = (x_1, ..., x_d) and y = (y_1, ..., y_d), write x ≤ y for the componentwise inequality x_k ≤ y_k, k = 1, ..., d, and similarly for x < y. In particular, the distribution function F of a probability measure μ on R^d is given by F(x) = μ{y; y ≤ x}. Similarly, let x ∨ y denote the componentwise maximum. Put 1 = (1, ..., 1) and ∞ = (∞, ..., ∞).

For any rectangular box (x, y] = {u; x < u ≤ y} = (x_1, y_1] × ⋯ × (x_d, y_d], we note that μ(x, y] = Σ_u s(u) F(u), where s(u) = (-1)^p with p = Σ_k 1{u_k = x_k}, and the summation extends over all corners u of (x, y]. Let F(x, y] denote the stated sum, and say that F has nonnegative increments if


F(x, y] ≥ 0 for all pairs x < y. Let us further say that F is right-continuous if F(x_n) → F(x) as x_n ↓ x, and proper if F(x) → 1 or 0 as min_k x_k → ±∞, respectively. The following result characterizes distribution functions in terms of the mentioned properties.

Theorem 3.25 (distribution functions) A function F: R^d → [0, 1] is the distribution function of some probability measure μ on R^d iff it is right-continuous and proper with nonnegative increments.

Proof: Assume that F has the stated properties, and note that the associated set function F(x, y] is finitely additive. Since F is proper, we further have F(x, y] → 1 as x → -∞ and y → ∞, that is, as (x, y] ↑ (-∞, ∞) = R^d. Hence, for every n ∈ N there exists a probability measure μ_n on (2^{-n}Z)^d with Z = {..., -1, 0, 1, ...} such that

μ_n{2^{-n}k} = F(2^{-n}(k - 1), 2^{-n}k], k ∈ Z^d, n ∈ N,

and from the finite additivity of F(x, y] we obtain

μ_m(2^{-m}(k - 1, k]) = μ_n(2^{-m}(k - 1, k]), k ∈ Z^d, m < n in N. (11)

In view of (11), we may split the Lebesgue unit interval ([0, 1], B[0, 1], λ) recursively to construct some random vectors ξ_1, ξ_2, ... with distributions μ_1, μ_2, ... such that ξ_m - 2^{-m} < ξ_n ≤ ξ_m for all m < n. In particular, ξ_1 ≥ ξ_2 ≥ ⋯ ≥ ξ_1 - 1, and so ξ_n converges pointwise to some random vector ξ. Define μ = λ ∘ ξ^{-1}. To see that μ has distribution function F, we note that, since F is proper,

λ{ξ_n ≤ 2^{-n}k} = μ_n(-∞, 2^{-n}k] = F(2^{-n}k), k ∈ Z^d, n ∈ N.

Since also ξ_n ↓ ξ a.s., Fatou's lemma yields for dyadic x ∈ R^d

λ{ξ < x} ≤ liminf_n λ{ξ_n < x} ≤ F(x) = limsup_n λ{ξ_n ≤ x} ≤ λ{ξ_n ≤ x i.o.} ≤ λ{ξ ≤ x},

and so

F(x) ≤ λ{ξ ≤ x} ≤ F(x + 2^{-n}1), n ∈ N.

Letting n → ∞ and using the right-continuity of F, we get λ{ξ ≤ x} = F(x), which extends to any x ∈ R^d by the right-continuity of both sides. □

Letting n -+ oo and using the right-continuity of F, we get >. {~ ::; x} = F( x), which extends to any x E JR.d by the right-continuity of both sides. D The last result has the following version for unbounded measures.

Corollary 3.26 (unbounded measures) Let the function F on JR.d be rightcontinuous with nonnegative increments. Then there exists a measure f..l on JR.d suchthat f..l(x,y] = F(x,y] for all x::; y in JR.d.

Proof: For any a E JR.d, we may apply Theorem 3.25 to suitably normalized versions of the function Fa(x) = F(a, a V x] to obtain a measure f..la


on [a, ∞) with μ_a(a, x] = F(a, x] for all x > a. Then clearly μ_a = μ_b on (a ∨ b, ∞) for any a and b, and so the set function μ = sup_a μ_a is a measure with the required property. □
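In d = 2 the corner sum defining F(x, y] has four terms, with sign -1 at the corners using one lower endpoint and +1 at the others. A sketch checking it against the product of two U(0, 1) distribution functions, where the mass of a box inside the unit square is just its area:

```python
def F(x1, x2):
    """Joint distribution function of two independent U(0, 1) variables."""
    clamp = lambda v: min(max(v, 0.0), 1.0)
    return clamp(x1) * clamp(x2)

def box_mass(x, y):
    """mu(x, y] via the corner sum: F(y1,y2) - F(x1,y2) - F(y1,x2) + F(x1,x2)."""
    (x1, x2), (y1, y2) = x, y
    return F(y1, y2) - F(x1, y2) - F(y1, x2) + F(x1, x2)

m = box_mass((0.2, 0.3), (0.5, 0.9))
print(round(m, 10))  # area 0.3 * 0.6 = 0.18
```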

Exercises

1. Give an example of two processes X and Y with different distributions such that X_t =d Y_t for all t.

2. Let X and Y be {0, 1}-valued processes on some index set T. Show that X =d Y iff P{X_{t_1} + ⋯ + X_{t_n} > 0} = P{Y_{t_1} + ⋯ + Y_{t_n} > 0} for all n ∈ N and t_1, ..., t_n ∈ T.

3. Let F be a right-continuous function of bounded variation and with F(-∞) = 0. Show for any random variable ξ that EF(ξ) = ∫ P{ξ ≥ t} F(dt). (Hint: First take F to be the distribution function of some random variable η ⊥⊥ ξ, and use Lemma 3.11.)

4. Consider a random variable ξ ∈ L¹ and a strictly convex function f on R. Show that Ef(ξ) = f(Eξ) iff ξ = Eξ a.s.

5. Assume that ξ = Σ_j a_j ξ_j and η = Σ_j b_j η_j, where the sums converge in L². Show that cov(ξ, η) = Σ_{i,j} a_i b_j cov(ξ_i, η_j), where the double series on the right is absolutely convergent.

6. Let the σ-fields F_{t,n}, t ∈ T, n ∈ N, be nondecreasing in n for each t and independent in t for each n. Show that the independence extends to the σ-fields F_t = ⋁_n F_{t,n}.

7. For each t ∈ T, let ξ_t¹, ξ_t², ... be random elements in some metric space S_t with ξ_tⁿ → ξ_t a.s., and assume for each n ∈ N that the random elements ξ_tⁿ are independent. Show that the independence extends to the limits ξ_t. (Hint: First show that E ∏_{t∈S} f_t(ξ_t) = ∏_{t∈S} Ef_t(ξ_t) for any bounded, continuous functions f_t on S_t and for finite subsets S ⊂ T.)

8. Give an example of three events that are pairwise independent but not independent.

9. Give an example of two random variables that are uncorrelated but not independent.

10. Let ξ_1, ξ_2, ... be i.i.d. random elements with distribution μ in some measurable space (S, S). Fix a set A ∈ S with μA > 0, and put τ = inf{k; ξ_k ∈ A}. Show that ξ_τ has distribution μ[· | A] = μ(· ∩ A)/μA.

11. Let ξ_1, ξ_2, ... be independent random variables taking values in [0, 1]. Show that E ∏_n ξ_n = ∏_n Eξ_n. In particular, show that P ∩_n A_n = ∏_n P A_n for any independent events A_1, A_2, ....

12. Let ξ_1, ξ_2, ... be arbitrary random variables. Show that there exist some constants c_1, c_2, ... > 0 such that the series Σ_n c_n ξ_n converges a.s.


13. Let ξ_1, ξ_2, ... be random variables with ξ_n → 0 a.s. Show that there exists some measurable function f > 0 with Σ_n f(ξ_n) < ∞ a.s. Also show that the conclusion fails if we only assume L¹-convergence.

14. Give an example of events A_1, A_2, ... such that P{A_n i.o.} = 0 but Σ_n P A_n = ∞.

15. Extend Lemma 3.20 to a correspondence between U(0, 1) random variables ϑ and Bernoulli sequences ξ_1, ξ_2, ... with rate p ∈ (0, 1).

16. Give an elementary proof of Theorem 3.25 for d = 1. (Hint: Define ξ = F^{-1}(ϑ), where ϑ is U(0, 1), and note that ξ has distribution function F.)

17. Let ξ_1, ξ_2, ... be random variables such that P{ξ_n ≠ 0 i.o.} = 1. Show that there exist some constants c_n ∈ R such that P{|c_n ξ_n| > 1 i.o.} = 1. (Hint: Note that P{Σ_{k≥n} |ξ_k| > 0} → 1.)

Chapter 4

Random Sequences, Series, and Averages

Convergence in probability and in L^p; uniform integrability and tightness; convergence in distribution; convergence of random series; strong laws of large numbers; Portmanteau theorem; continuous mapping and approximation; coupling and measurability

The first goal of this chapter is to introduce and compare the basic modes of convergence of random quantities. For random elements ξ and ξ_1, ξ_2, ... in a metric or topological space S, the most commonly used notions are those of almost sure convergence, ξ_n → ξ a.s., and convergence in probability, ξ_n →^P ξ, corresponding to the general notions of convergence a.e. and in measure, respectively. When S = R, we have the additional concept of L^p-convergence, familiar from Chapter 1. Those three notions are used throughout this book. For a special purpose in Chapter 25, we shall also need the notion of weak L¹-convergence.

For our second main topic, we shall study the very different concept of convergence in distribution, ξ_n →^d ξ, defined by the condition Ef(ξ_n) → Ef(ξ) for all bounded, continuous functions f on S. This is clearly equivalent to weak convergence of the associated distributions μ_n = L(ξ_n) and μ = L(ξ), written as μ_n →^w μ and defined by the condition μ_n f → μf for every f as above. In this chapter we shall only establish the most basic results of weak convergence theory, such as the "Portmanteau" theorem, the continuous mapping and approximation theorems, and the Skorohod coupling. Our development of the general theory continues in Chapters 5 and 16, and further distributional limit theorems appear in Chapters 8, 9, 12, 14, 15, 19, and 23.

Our third main theme is to characterize the convergence of series Σ_k ξ_k and averages n^{-c} Σ_{k≤n} ξ_k.

Lemma 4.1 For any random variable ξ ≥ 0, we have

(1 - r)₊² (Eξ)² / Eξ² ≤ P{ξ > rEξ} ≤ r^{-1}, r > 0. (1)

The second relation in (1) is often referred to as Chebyshev's or Markov's inequality. Assuming that Eξ² < ∞, we get in particular the well-known estimate

P{|ξ - Eξ| > ε} ≤ ε^{-2} var(ξ), ε > 0.

Proof of Lemma 4.1: We may clearly assume that $E\xi = 1$. The upper bound then follows as we take expectations in the inequality $r\,1\{\xi > r\} \le \xi$. To get the lower bound, we note that for any $r, t > 0$,

$$t^2\,1\{\xi > r\} \ge (\xi - r)(2t + r - \xi) = 2\xi(r + t) - r(2t + r) - \xi^2.$$

Taking expected values, we get for $r \in (0, 1)$

$$t^2 P\{\xi > r\} \ge 2(r + t) - r(2t + r) - E\xi^2 \ge 2t(1 - r) - E\xi^2.$$

Now choose $t = E\xi^2/(1 - r)$. $\Box$
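Since both sides of (1) are explicit, the bound is easy to sanity-check numerically. The following sketch (an illustrative check, with a standard exponential variable as the test distribution and a function name of our choosing) verifies both inequalities using the exact values $E\xi = 1$, $E\xi^2 = 2$, and $P\{\xi > r\} = e^{-r}$:

```python
import math

def moment_tail_bounds_hold(r, mean, second_moment, tail_prob):
    # Bound (1): (1 - r)_+^2 (E xi)^2 / E xi^2 <= P{xi > r E xi} <= 1/r.
    lower = max(1.0 - r, 0.0) ** 2 * mean ** 2 / second_moment
    return lower <= tail_prob <= 1.0 / r

# For xi ~ Exp(1): E xi = 1, E xi^2 = 2, and P{xi > r E xi} = e^{-r}.
for r in (0.1, 0.5, 0.9, 2.0, 5.0):
    assert moment_tail_bounds_hold(r, 1.0, 2.0, math.exp(-r))
```

For $r \ge 1$ the lower bound degenerates to $0$, so only the Markov part of (1) carries information there.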

For random elements $\xi$ and $\xi_1, \xi_2, \dots$ in a metric space $(S, \rho)$, we say that $\xi_n$ converges in probability to $\xi$ (written as $\xi_n \xrightarrow{P} \xi$) if

$$\lim_{n\to\infty} P\{\rho(\xi_n, \xi) > \varepsilon\} = 0, \qquad \varepsilon > 0.$$

By Chebyshev's inequality, this is equivalent to the condition $E[\rho(\xi_n, \xi) \wedge 1] \to 0$. This notion of convergence is related to the a.s. version as follows.

Lemma 4.2 (subsequence criterion) Let $\xi, \xi_1, \xi_2, \dots$ be random elements in a metric space $(S, \rho)$. Then $\xi_n \xrightarrow{P} \xi$ iff every subsequence $N' \subset \mathbb{N}$ has a further subsequence $N'' \subset N'$ such that $\xi_n \to \xi$ a.s. along $N''$. In particular, $\xi_n \to \xi$ a.s. implies $\xi_n \xrightarrow{P} \xi$.

This shows in particular that the notion of convergence in probability depends only on the topology and is independent of the metrization $\rho$.

Proof: Assume that $\xi_n \xrightarrow{P} \xi$, and fix an arbitrary subsequence $N' \subset \mathbb{N}$. We may then choose a further subsequence $N'' \subset N'$ such that

$$E\sum_{n \in N''} [\rho(\xi_n, \xi) \wedge 1] = \sum_{n \in N''} E[\rho(\xi_n, \xi) \wedge 1] < \infty,$$

where the equality holds by monotone convergence. The series on the left then converges a.s., which implies $\xi_n \to \xi$ a.s. along $N''$.

64

Foundations of Modern Probability

Now assume instead the stated condition. If $\xi_n \xrightarrow{P} \xi$ fails, there exists some $\varepsilon > 0$ such that $E[\rho(\xi_n, \xi) \wedge 1] > \varepsilon$ along a subsequence $N' \subset \mathbb{N}$. By hypothesis, $\xi_n \to \xi$ a.s. along a further subsequence $N'' \subset N'$, and by dominated convergence we get $E[\rho(\xi_n, \xi) \wedge 1] \to 0$ along $N''$, a contradiction. $\Box$

For a first application, we shall see how convergence in probability is preserved by continuous mappings.

Lemma 4.3 (continuous mapping) For any metric spaces $S$ and $T$, let $\xi, \xi_1, \xi_2, \dots$ be random elements in $S$ with $\xi_n \xrightarrow{P} \xi$, and let the mapping $f: S \to T$ be measurable and a.s. continuous at $\xi$. Then $f(\xi_n) \xrightarrow{P} f(\xi)$.

Proof: Fix any subsequence $N' \subset \mathbb{N}$. By Lemma 4.2 we have $\xi_n \to \xi$ a.s. along some further subsequence $N'' \subset N'$, and by continuity we get $f(\xi_n) \to f(\xi)$ a.s. along $N''$. Hence, $f(\xi_n) \xrightarrow{P} f(\xi)$ by Lemma 4.2. $\Box$
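The subsequence criterion is well illustrated by the classical example of independent indicators $\xi_n$ with $P\{\xi_n = 1\} = 1/n$, which converge to $0$ in probability but, by the second Borel-Cantelli lemma, not a.s. The simulation sketch below (an illustrative example of our own, not from the text) shows a typical path: late terms are almost all $0$, and along the sparse subsequence $n_k = 2^k$ the terms vanish eventually, since $\sum_k 1/n_k < \infty$:

```python
import random

random.seed(0)

# Independent xi_n with P{xi_n = 1} = 1/n, else 0: xi_n ->P 0, yet
# xi_n = 1 infinitely often a.s. along the full sequence.
xs = [1 if random.random() < 1.0 / n else 0 for n in range(1, 100_001)]

# Late terms are rarely 1 (convergence in probability) ...
late_ones = sum(xs[50_000:])

# ... and along n_k = 2^k we have sum_k 1/n_k < oo, so by Borel-Cantelli
# xi_{n_k} -> 0 a.s.; a typical path sees only a handful of 1's there.
sub_ones = sum(xs[2 ** k - 1] for k in range(1, 17))
```

On a typical run both counters are tiny, even though the full path contains infinitely many $1$'s in the limit.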

Now consider a sequence of metric spaces $(S_k, \rho_k)$, and introduce the product space $S = \times_k S_k = S_1 \times S_2 \times \cdots$, endowed with the product topology, a convenient metrization of which is given by

$$\rho(x, y) = \sum_k 2^{-k}\{\rho_k(x_k, y_k) \wedge 1\}, \qquad x, y \in \times_k S_k. \tag{2}$$

If each $S_k$ is separable, then $\mathcal{B}(S) = \bigotimes_k \mathcal{B}(S_k)$ by Lemma 1.2, and so a random element in $S$ is simply a sequence of random elements in the spaces $S_k$, $k \in \mathbb{N}$.

Lemma 4.4 (random sequences) For any separable metric spaces $S_1, S_2, \dots$, let $\xi = (\xi^1, \xi^2, \dots)$ and $\xi_n = (\xi_n^1, \xi_n^2, \dots)$, $n \in \mathbb{N}$, be random elements in $\times_k S_k$. Then $\xi_n \xrightarrow{P} \xi$ iff $\xi_n^k \xrightarrow{P} \xi^k$ in $S_k$ for each $k$.

Proof: With $\rho$ as in (2), we get for each $n \in \mathbb{N}$

$$E[\rho(\xi_n, \xi) \wedge 1] = E\rho(\xi_n, \xi) = \sum_k 2^{-k} E[\rho_k(\xi_n^k, \xi^k) \wedge 1].$$

Thus, by dominated convergence, $E[\rho(\xi_n, \xi) \wedge 1] \to 0$ iff $E[\rho_k(\xi_n^k, \xi^k) \wedge 1] \to 0$ for all $k$. $\Box$

Combining the last two lemmas, it is easy to see how convergence in probability is preserved by the basic arithmetic operations.

Corollary 4.5 (elementary operations) Let $\xi, \xi_1, \xi_2, \dots$ and $\eta, \eta_1, \eta_2, \dots$ be random variables with $\xi_n \xrightarrow{P} \xi$ and $\eta_n \xrightarrow{P} \eta$. Then $a\xi_n + b\eta_n \xrightarrow{P} a\xi + b\eta$ for all $a, b \in \mathbb{R}$, and $\xi_n\eta_n \xrightarrow{P} \xi\eta$. Furthermore, $\xi_n/\eta_n \xrightarrow{P} \xi/\eta$ whenever a.s. $\eta \ne 0$ and $\eta_n \ne 0$ for all $n$.

Proof: By Lemma 4.4 we have $(\xi_n, \eta_n) \xrightarrow{P} (\xi, \eta)$ in $\mathbb{R}^2$, so the results for linear combinations and products follow by Lemma 4.3. To prove the last assertion, we may apply Lemma 4.3 to the function $f: (x, y) \mapsto (x/y)1\{y \ne 0\}$, which is clearly a.s. continuous at $(\xi, \eta)$. $\Box$


Let us next examine the associated completeness properties. For any random elements $\xi_1, \xi_2, \dots$ in a metric space $(S, \rho)$, we say that $(\xi_n)$ is Cauchy (convergent) in probability if $\rho(\xi_m, \xi_n) \xrightarrow{P} 0$ as $m, n \to \infty$, in the sense that $E[\rho(\xi_m, \xi_n) \wedge 1] \to 0$.

Lemma 4.6 (completeness) Let $\xi_1, \xi_2, \dots$ be random elements in a complete metric space $(S, \rho)$. Then $(\xi_n)$ is Cauchy in probability or a.s. iff $\xi_n \xrightarrow{P} \xi$ or $\xi_n \to \xi$ a.s., respectively, for some random element $\xi$ in $S$.

Proof: The a.s. case is immediate from Lemma 1.10. Assuming $\xi_n \xrightarrow{P} \xi$, we get

$$E[\rho(\xi_m, \xi_n) \wedge 1] \le E[\rho(\xi_m, \xi) \wedge 1] + E[\rho(\xi_n, \xi) \wedge 1] \to 0,$$

which means that $(\xi_n)$ is Cauchy in probability. Now assume instead the latter condition, and define

$$n_k = \inf\{n;\ E[\rho(\xi_m, \xi_n) \wedge 1] \le 2^{-k} \text{ for all } m \ge n\}, \qquad k \in \mathbb{N}.$$

The $n_k$ are finite and satisfy

$$E\sum_k [\rho(\xi_{n_k}, \xi_{n_{k+1}}) \wedge 1] \le \sum_k 2^{-k} < \infty,$$

and so $\sum_k \rho(\xi_{n_k}, \xi_{n_{k+1}}) < \infty$ a.s. The sequence $(\xi_{n_k})$ is then a.s. Cauchy and converges a.s. toward some measurable limit $\xi$. To see that $\xi_n \xrightarrow{P} \xi$, write

$$E[\rho(\xi_m, \xi) \wedge 1] \le E[\rho(\xi_m, \xi_{n_k}) \wedge 1] + E[\rho(\xi_{n_k}, \xi) \wedge 1],$$

and note that the right-hand side tends to zero as $m, k \to \infty$, by the Cauchy convergence of $(\xi_n)$ and dominated convergence. $\Box$

Next consider any probability measures $\mu$ and $\mu_1, \mu_2, \dots$ on some metric space $(S, \rho)$ with Borel $\sigma$-field $\mathcal{S}$, and say that $\mu_n$ converges weakly to $\mu$ (written as $\mu_n \xrightarrow{w} \mu$) if $\mu_n f \to \mu f$ for every $f \in C_b(S)$, the class of bounded, continuous functions $f: S \to \mathbb{R}$. If $\xi$ and $\xi_1, \xi_2, \dots$ are random elements in $S$, we further say that $\xi_n$ converges in distribution to $\xi$ (written as $\xi_n \xrightarrow{d} \xi$) if $\mathcal{L}(\xi_n) \xrightarrow{w} \mathcal{L}(\xi)$, that is, if $Ef(\xi_n) \to Ef(\xi)$ for all $f \in C_b(S)$. Note that the latter mode of convergence depends only on the distributions and that $\xi$ and the $\xi_n$ need not even be defined on the same probability space. To motivate the definition, note that $x_n \to x$ in a metric space $S$ iff $f(x_n) \to f(x)$ for all continuous functions $f: S \to \mathbb{R}$, and also that $\mathcal{L}(\xi)$ is determined by the integrals $Ef(\xi)$ for all $f \in C_b(S)$. The following result gives a connection between convergence in probability and in distribution.


Lemma 4.7 (convergence in probability and in distribution) Let $\xi, \xi_1, \xi_2, \dots$ be random elements in a metric space $(S, \rho)$. Then $\xi_n \xrightarrow{P} \xi$ implies $\xi_n \xrightarrow{d} \xi$, and the two conditions are equivalent when $\xi$ is a.s. constant.

Proof: Assume $\xi_n \xrightarrow{P} \xi$. For any $f \in C_b(S)$ we need to show that $Ef(\xi_n) \to Ef(\xi)$. If the convergence fails, we may choose some subsequence $N' \subset \mathbb{N}$ such that $\inf_{n \in N'} |Ef(\xi_n) - Ef(\xi)| > 0$. By Lemma 4.2 there exists a further subsequence $N'' \subset N'$ such that $\xi_n \to \xi$ a.s. along $N''$. By continuity and dominated convergence we get $Ef(\xi_n) \to Ef(\xi)$ along $N''$, a contradiction.

Conversely, assume that $\xi_n \xrightarrow{d} s \in S$. Since $\rho(x, s) \wedge 1$ is a bounded and continuous function of $x$, we get $E[\rho(\xi_n, s) \wedge 1] \to E[\rho(s, s) \wedge 1] = 0$, and so $\xi_n \xrightarrow{P} s$. $\Box$
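The converse implication in Lemma 4.7 genuinely requires a degenerate limit. A minimal counterexample, sketched below (an illustrative construction of our own, not from the text), takes $\xi$ symmetric and $\xi_n = -\xi$ for all $n$, so that $\xi_n \xrightarrow{d} \xi$ trivially while $|\xi_n - \xi| \equiv 2$:

```python
import random

random.seed(4)

# xi = +-1 fair coin and xi_n = -xi for every n: xi_n has the same law as
# xi, so xi_n ->d xi, yet |xi_n - xi| = 2 always, so there is no
# convergence in probability.
xs = [random.choice((-1, 1)) for _ in range(10_000)]
neg = [-x for x in xs]

assert all(abs(a - b) == 2 for a, b in zip(xs, neg))  # never close to xi
assert abs(sum(neg)) == abs(sum(xs))                  # same symmetric law
```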

A family of random vectors $\xi_t$, $t \in T$, in $\mathbb{R}^d$ is said to be tight if

$$\lim_{r\to\infty}\,\sup_{t\in T} P\{|\xi_t| > r\} = 0.$$

For sequences $(\xi_n)$ the condition is clearly equivalent to

$$\lim_{r\to\infty}\,\limsup_{n\to\infty} P\{|\xi_n| > r\} = 0, \tag{3}$$

which is often easier to verify. Tightness plays an important role for the compactness methods developed in Chapters 5 and 16. For the moment we note only the following simple connection with weak convergence.

Lemma 4.8 (weak convergence and tightness) Let $\xi, \xi_1, \xi_2, \dots$ be random vectors in $\mathbb{R}^d$ satisfying $\xi_n \xrightarrow{d} \xi$. Then $(\xi_n)$ is tight.

Proof: Fix any $r > 0$, and define $f(x) = (1 - (r - |x|)_+)_+$. Then

$$\limsup_{n\to\infty} P\{|\xi_n| > r\} \le \lim_{n\to\infty} Ef(\xi_n) = Ef(\xi) \le P\{|\xi| > r - 1\}.$$

Here the right-hand side tends to $0$ as $r \to \infty$, and (3) follows. $\Box$
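The proof hinges on the pointwise sandwich $1\{|x| > r\} \le f(x) \le 1\{|x| > r - 1\}$ for the continuous bump $f(x) = (1 - (r - |x|)_+)_+$, which can be checked directly; the sketch below (function name ours) verifies it on a grid of points:

```python
def bump(x, r):
    # Continuous function with 1{|x| > r} <= bump(x, r) <= 1{|x| > r - 1}.
    return max(0.0, 1.0 - max(0.0, r - abs(x)))

for r in (1.0, 2.5, 7.0):
    for i in range(-200, 201):
        x = i / 10.0
        low = 1.0 if abs(x) > r else 0.0
        high = 1.0 if abs(x) > r - 1 else 0.0
        assert low <= bump(x, r) <= high
```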

We may further note the following simple relationship between tightness and convergence in probability.

Lemma 4.9 (tightness and convergence in probability) Let $\xi_1, \xi_2, \dots$ be random vectors in $\mathbb{R}^d$. Then $(\xi_n)$ is tight iff $c_n\xi_n \xrightarrow{P} 0$ for any constants $c_1, c_2, \dots \ge 0$ with $c_n \to 0$.

Proof: Assume $(\xi_n)$ to be tight, and let $c_n \to 0$. Fixing any $r, \varepsilon > 0$, and noting that $c_n r \le \varepsilon$ for all but finitely many $n \in \mathbb{N}$, we get

$$\limsup_{n\to\infty} P\{|c_n\xi_n| > \varepsilon\} \le \limsup_{n\to\infty} P\{|\xi_n| > r\}.$$

Here the right-hand side tends to $0$ as $r \to \infty$, and so $P\{|c_n\xi_n| > \varepsilon\} \to 0$. Since $\varepsilon$ was arbitrary, we get $c_n\xi_n \xrightarrow{P} 0$. If instead $(\xi_n)$ is not tight, we may choose a subsequence $(n_k) \subset \mathbb{N}$ such that $\inf_k P\{|\xi_{n_k}| > k\} > 0$. Letting $c_n = \sup\{k^{-1};\ n_k \ge n\}$, we note that $c_n \to 0$ and yet $P\{|c_{n_k}\xi_{n_k}| > 1\} \not\to 0$. Thus, the stated condition fails. $\Box$

We turn to a related notion for expected values. A family of random variables $\xi_t$, $t \in T$, is said to be uniformly integrable if

$$\lim_{r\to\infty}\,\sup_{t\in T} E[|\xi_t|;\ |\xi_t| > r] = 0. \tag{4}$$

For sequences $(\xi_n)$ in $L^1$, this is clearly equivalent to

$$\lim_{r\to\infty}\,\limsup_{n\to\infty} E[|\xi_n|;\ |\xi_n| > r] = 0. \tag{5}$$

Condition (4) holds in particular if the $\xi_t$ are $L^p$-bounded for some $p > 1$, in the sense that $\sup_t E|\xi_t|^p < \infty$. To see this, it suffices to write

$$E[|\xi_t|;\ |\xi_t| > r] \le r^{-p+1} E|\xi_t|^p, \qquad r > 0.$$

The next result gives a useful characterization of uniform integrability. For motivation we note that if $\xi$ is an integrable random variable, then $E[|\xi|;\ A] \to 0$ as $PA \to 0$, by Lemma 4.2 and dominated convergence. The latter condition means that $\sup_{A \in \mathcal{A},\, PA \le \varepsilon} E[|\xi|;\ A] \to 0$ as $\varepsilon \to 0$.

Lemma 4.10 (uniform integrability) The random variables $\xi_t$, $t \in T$, are uniformly integrable iff $\sup_t E|\xi_t| < \infty$ and

$$\lim_{PA \to 0}\,\sup_t E[|\xi_t|;\ A] = 0. \tag{6}$$

Proof: Assume the $\xi_t$ to be uniformly integrable, and write

$$E[|\xi_t|;\ A] \le r\,PA + E[|\xi_t|;\ |\xi_t| > r], \qquad r > 0.$$

Here (6) follows as we let $PA \to 0$ and then $r \to \infty$. To get the boundedness in $L^1$, it suffices to take $A = \Omega$ and choose $r > 0$ large enough.

Conversely, let the $\xi_t$ be $L^1$-bounded and satisfy (6). By Chebyshev's inequality we get as $r \to \infty$

$$\sup_t P\{|\xi_t| > r\} \le r^{-1}\sup_t E|\xi_t| \to 0,$$

and so (4) follows from (6) with $A = \{|\xi_t| > r\}$. $\Box$
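For a concrete instance of the $L^p$-boundedness criterion, the tail means $E[|\xi_t|;\ |\xi_t| > r]$ can be computed exactly for exponential variables, and the bound $r^{1-p}E|\xi_t|^p$ with $p = 2$ checked directly (an illustrative, exactly computable example, not from the text):

```python
import math

# For xi ~ Exp(1): E[xi; xi > r] = (1 + r) e^{-r} and E xi^2 = 2, so the
# tail-moment bound E[|xi|; |xi| > r] <= r^{1-p} E|xi|^p with p = 2
# reads (1 + r) e^{-r} <= 2/r.
for r in (0.5, 1.0, 2.0, 5.0, 10.0):
    tail_mean = (1.0 + r) * math.exp(-r)
    assert tail_mean <= 2.0 / r

# The common bound 2/r -> 0 as r -> infinity, so an L^2-bounded family of
# such variables is uniformly integrable, in line with condition (4).
assert (1.0 + 50.0) * math.exp(-50.0) < 1e-18
```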

The relevance of uniform integrability for the convergence of moments is clear from the following result, which also contains a weak convergence version of Fatou's lemma.

Lemma 4.11 (convergence of means) Let $\xi, \xi_1, \xi_2, \dots$ be $\mathbb{R}_+$-valued random variables with $\xi_n \xrightarrow{d} \xi$. Then $E\xi \le \liminf_n E\xi_n$, and we have $E\xi_n \to E\xi < \infty$ iff (5) holds.

Proof: For any $r > 0$ the function $x \mapsto x \wedge r$ is bounded and continuous on $\mathbb{R}_+$. Thus,

$$\liminf_{n\to\infty} E\xi_n \ge \lim_{n\to\infty} E(\xi_n \wedge r) = E(\xi \wedge r),$$


and the first assertion follows as we let $r \to \infty$. Next assume (5), and note in particular that $E\xi \le \liminf_n E\xi_n < \infty$. For any $r > 0$ we get

$$|E\xi_n - E\xi| \le |E\xi_n - E(\xi_n \wedge r)| + |E(\xi_n \wedge r) - E(\xi \wedge r)| + |E(\xi \wedge r) - E\xi|.$$

Letting $n \to \infty$ and then $r \to \infty$, we obtain $E\xi_n \to E\xi$.

Now assume instead that $E\xi_n \to E\xi < \infty$. Keeping $r > 0$ fixed, we get as $n \to \infty$

$$E[\xi_n;\ \xi_n > 2r] \le 2E(\xi_n - \xi_n \wedge r) \to 2E(\xi - \xi \wedge r).$$

Since $\xi \wedge r \uparrow \xi$ as $r \to \infty$, the right-hand side tends to zero by dominated convergence, and (5) follows. $\Box$

We may now examine the relationship between convergence in $L^p$ and in probability.

Proposition 4.12 ($L^p$-convergence) Fix any $p > 0$, and let $\xi, \xi_1, \xi_2, \dots \in L^p$ with $\xi_n \xrightarrow{P} \xi$. Then these conditions are equivalent:
(i) $\xi_n \to \xi$ in $L^p$;
(ii) $\|\xi_n\|_p \to \|\xi\|_p$;
(iii) the variables $|\xi_n|^p$, $n \in \mathbb{N}$, are uniformly integrable.

Conversely, (i) implies $\xi_n \xrightarrow{P} \xi$.

Proof: First assume that $\xi_n \to \xi$ in $L^p$. Then $\|\xi_n\|_p \to \|\xi\|_p$ by Lemma 1.29, and by Lemma 4.1 we have, for any $\varepsilon > 0$,

$$P\{|\xi_n - \xi| > \varepsilon\} \le \varepsilon^{-p} E|\xi_n - \xi|^p \to 0.$$

Thus, $\xi_n \xrightarrow{P} \xi$. For the remainder of the proof we may assume that $\xi_n \xrightarrow{P} \xi$. In particular, $|\xi_n|^p \xrightarrow{d} |\xi|^p$ by Lemmas 4.3 and 4.7, and so (ii) and (iii) are equivalent by Lemma 4.11.

Next assume (ii). If (i) fails, there exists some subsequence $N' \subset \mathbb{N}$ with $\inf_{n \in N'} \|\xi_n - \xi\|_p > 0$. By Lemma 4.2 we may choose a further subsequence $N'' \subset N'$ such that $\xi_n \to \xi$ a.s. along $N''$. But then Lemma 1.32 yields $\|\xi_n - \xi\|_p \to 0$ along $N''$, a contradiction. Thus, (ii) implies (i), and so all three conditions are equivalent. $\Box$

We shall briefly consider yet another notion of convergence of random variables. Assuming $\xi, \xi_1, \dots \in L^p$ for some $p \in [1, \infty)$, we say that $\xi_n \to \xi$ weakly in $L^p$ if $E\xi_n\eta \to E\xi\eta$ for every $\eta \in L^q$, where $p^{-1} + q^{-1} = 1$. Taking $\eta = |\xi|^{p-1}\operatorname{sgn}\xi$ gives $\|\eta\|_q = \|\xi\|_p^{p-1}$, and so by Hölder's inequality

$$\|\xi\|_p^p = E\xi\eta = \lim_{n\to\infty} E\xi_n\eta \le \|\xi\|_p^{p-1}\liminf_{n\to\infty}\|\xi_n\|_p,$$

which shows that $\|\xi\|_p \le \liminf_n \|\xi_n\|_p$. Now recall the well-known fact that any $L^2$-bounded sequence has a subsequence that converges weakly in $L^2$. The following related criterion for weak compactness in $L^1$ will be needed in Chapter 25.


Lemma 4.13 (weak $L^1$-compactness, Dunford) Every uniformly integrable sequence of random variables has a subsequence that converges weakly in $L^1$.

Proof: Let $(\xi_n)$ be uniformly integrable. Define $\xi_n^k = \xi_n 1\{|\xi_n| \le k\}$, and note that $(\xi_n^k)$ is $L^2$-bounded in $n$ for each $k$. By the compactness in $L^2$ and a diagonal argument, there exist a subsequence $N' \subset \mathbb{N}$ and some random variables $\eta_1, \eta_2, \dots$ such that $\xi_n^k \to \eta_k$ holds weakly in $L^2$, and then also in $L^1$, as $n \to \infty$ along $N'$ for fixed $k$. Now $\|\eta_k - \eta_l\|_1 \le \liminf_n \|\xi_n^k - \xi_n^l\|_1$, and by uniform integrability the right-hand side tends to zero as $k, l \to \infty$. Thus, the sequence $(\eta_k)$ is Cauchy in $L^1$, and so it converges in $L^1$ toward some $\xi$. By approximation it follows easily that $\xi_n \to \xi$ weakly in $L^1$ along $N'$. $\Box$

We now derive criteria for the convergence of random series, beginning with an important special case.

Proposition 4.14 (series with positive terms) Let $\xi_1, \xi_2, \dots$ be independent $\mathbb{R}_+$-valued random variables. Then $\sum_n \xi_n < \infty$ a.s. iff $\sum_n E[\xi_n \wedge 1] < \infty$.

Proof: Assuming the stated condition, we get $E\sum_n(\xi_n \wedge 1) < \infty$ by Fubini's theorem, so $\sum_n(\xi_n \wedge 1) < \infty$ a.s. In particular, $\sum_n 1\{\xi_n > 1\} < \infty$ a.s., so the series $\sum_n(\xi_n \wedge 1)$ and $\sum_n \xi_n$ differ by at most finitely many terms, and we get $\sum_n \xi_n < \infty$ a.s.

Conversely, assume that $\sum_n \xi_n < \infty$ a.s. Then also $\sum_n(\xi_n \wedge 1) < \infty$ a.s., so we may assume that $\xi_n \le 1$ for all $n$. Noting that $1 - x \le e^{-x} \le 1 - ax$ for $x \in [0, 1]$, where $a = 1 - e^{-1}$, we get

$$0 < E\exp\Big\{-\sum\nolimits_n \xi_n\Big\} = \prod\nolimits_n Ee^{-\xi_n} \le \prod\nolimits_n(1 - aE\xi_n) \le \prod\nolimits_n e^{-aE\xi_n} = \exp\Big\{-a\sum\nolimits_n E\xi_n\Big\},$$

and so $\sum_n E\xi_n < \infty$. $\Box$
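The dichotomy in Proposition 4.14 is visible already in the tails of a single simulated path: when $\sum_n E[\xi_n \wedge 1] < \infty$ the tail sums are negligible, while in the divergent case they stay bounded away from $0$. A small sketch (with independent exponential terms as an illustrative choice of our own):

```python
import random

random.seed(1)

def tail_sum(means):
    # Tail sum xi_{N+1} + ... + xi_{2N} of independent exponential xi_n
    # with the given means.
    return sum(random.expovariate(1.0 / m) for m in means)

tail_summable = tail_sum([1.0 / n ** 2 for n in range(1001, 2001)])  # ~ 0.0005
tail_divergent = tail_sum([1.0 / n for n in range(1001, 2001)])      # ~ log 2

assert tail_summable < 0.1    # vanishing tails: the series converges a.s.
assert tail_divergent > 0.3   # tails do not vanish: divergence
```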

To handle more general series, we need the following strengthened version of the Bienaymé-Chebyshev inequality. A further extension appears as Proposition 7.15.

Lemma 4.15 (maximum inequality, Kolmogorov) Let $\xi_1, \xi_2, \dots$ be independent random variables with mean zero, and put $S_n = \xi_1 + \cdots + \xi_n$. Then

$$P\Big\{\sup\nolimits_k |S_k| > r\Big\} \le r^{-2}\sum\nolimits_k E\xi_k^2, \qquad r > 0.$$


Proof: We may assume that $\sum_n E\xi_n^2 < \infty$. Writing $\tau = \inf\{n;\ |S_n| > r\}$ and noting that $S_k 1\{\tau = k\} \perp\!\!\!\perp (S_n - S_k)$ for $k \le n$, we get

$$\sum\nolimits_k E\xi_k^2 \ge ES_n^2 \ge \sum\nolimits_{k\le n} E[S_n^2;\ \tau = k] \ge \sum\nolimits_{k\le n}\big\{E[S_k^2;\ \tau = k] + 2E[S_k(S_n - S_k);\ \tau = k]\big\} = \sum\nolimits_{k\le n} E[S_k^2;\ \tau = k] \ge r^2 P\{\tau \le n\}.$$

As $n \to \infty$, we obtain

$$\sum\nolimits_k E\xi_k^2 \ge r^2 P\{\tau < \infty\} = r^2 P\Big\{\sup\nolimits_k |S_k| > r\Big\}. \qquad \Box$$
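Kolmogorov's maximum inequality is easy to probe by simulation; the Monte Carlo estimate below for a $\pm 1$ random walk (an illustrative setup, all names ours) stays well under the bound $r^{-2}\sum_k E\xi_k^2$:

```python
import random

random.seed(2)

# Monte Carlo check of P{ sup_k |S_k| > r } <= r^{-2} sum_k E xi_k^2 for
# a +-1 random walk (E xi_k = 0, E xi_k^2 = 1).
def exceedance_freq(n, r, trials=5000):
    count = 0
    for _ in range(trials):
        s = 0
        for _ in range(n):
            s += random.choice((-1, 1))
            if abs(s) > r:
                count += 1
                break
    return count / trials

n, r = 100, 25
p_hat = exceedance_freq(n, r)
bound = n / r ** 2          # = 100 / 625 = 0.16
assert p_hat <= bound
```

The true exceedance probability here is roughly an order of magnitude below the bound, which is typical: the inequality is crude but dimension- and distribution-free.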

The last result leads easily to the following sufficient condition for the a.s. convergence of random series with independent terms. Conditions that are both necessary and sufficient are given in Theorem 4.18.

Lemma 4.16 (variance criterion for series, Khinchin and Kolmogorov) Let $\xi_1, \xi_2, \dots$ be independent random variables with mean $0$ and $\sum_n E\xi_n^2 < \infty$. Then $\sum_n \xi_n$ converges a.s.

Proof: Write $S_n = \xi_1 + \cdots + \xi_n$. By Lemma 4.15 we get for any $\varepsilon > 0$

$$P\Big\{\sup\nolimits_{k\ge n}|S_n - S_k| > \varepsilon\Big\} \le \varepsilon^{-2}\sum\nolimits_{k\ge n} E\xi_k^2.$$

Hence, $\sup_{k\ge n}|S_n - S_k| \xrightarrow{P} 0$ as $n \to \infty$, and Lemma 4.2 yields $\sup_{k\ge n}|S_n - S_k| \to 0$ a.s. along a subsequence. Since the last supremum is nonincreasing in $n$, the a.s. convergence extends to the entire sequence, which means that $(S_n)$ is a.s. Cauchy convergent. Thus, $S_n$ converges a.s. by Lemma 4.6. $\Box$

The next result gives the basic connection between series with positive and symmetric terms. By $\xi_n \xrightarrow{P} \infty$ we mean that $P\{\xi_n > r\} \to 1$ for every $r > 0$.

Theorem 4.17 (positive and symmetric terms) Let $\xi_1, \xi_2, \dots$ be independent, symmetric random variables. Then these conditions are equivalent:

(i) $\sum_n \xi_n$ converges a.s.;
(ii) $\sum_n \xi_n^2 < \infty$ a.s.;
(iii) $\sum_n E(\xi_n^2 \wedge 1) < \infty$.

For such an $\varepsilon$ we may write $P\{\xi_n \in F\} \le P\{\xi \in F^\varepsilon\}$, and (iii) follows as we let $n \to \infty$ and then $\varepsilon \to 0$. Finally, assume (ii) and let $f \ge 0$ be continuous. By Lemma 3.4 and Fatou's lemma,

$$Ef(\xi) = \int_0^\infty P\{f(\xi) > t\}\,dt \le \int_0^\infty \liminf_{n\to\infty} P\{f(\xi_n) > t\}\,dt \le \liminf_{n\to\infty}\int_0^\infty P\{f(\xi_n) > t\}\,dt = \liminf_{n\to\infty} Ef(\xi_n).$$

Now let $f$ be continuous with $|f| \le c < \infty$. Applying the last relation to $c \pm f$, we obtain $Ef(\xi_n) \to Ef(\xi)$.


2. Let $\xi_1, \dots, \xi_n$ be independent, symmetric random variables. Show that $P\{\max_k |\xi_k| > r\} \le 2P\{|S| > r\}$ for all $r > 0$, where $S = \sum_k \xi_k$. (Hint: Let $\eta$ be the first term $\xi_k$ where $\max_k |\xi_k|$ is attained, and check that $(\eta, S - \eta) \overset{d}{=} (\eta, \eta - S)$.)

3. Let $\xi_1, \xi_2, \dots$ be i.i.d. random variables with $P\{|\xi_n| > t\} > 0$ for all $t > 0$. Show that there exist some constants $c_1, c_2, \dots$ such that $c_n\xi_n \to 0$ in probability but not a.s.

4. Show that a family of random variables $\xi_t$ is tight iff $\sup_t Ef(|\xi_t|) < \infty$ for some increasing function $f: \mathbb{R}_+ \to \mathbb{R}_+$ with $f(\infty) = \infty$.

5. Consider some random variables $\xi_n$ and $\eta_n$ such that $(\xi_n)$ is tight and $\eta_n \xrightarrow{P} 0$. Show that even $\xi_n\eta_n \xrightarrow{P} 0$.

6. Show that the random variables $\xi_t$ are uniformly integrable iff $\sup_t Ef(|\xi_t|) < \infty$ for some increasing function $f: \mathbb{R}_+ \to \mathbb{R}_+$ with $f(x)/x \to \infty$ as $x \to \infty$.

7. Show that the condition $\sup_t E|\xi_t| < \infty$ in Lemma 4.10 can be omitted if $\mathcal{A}$ is nonatomic.

8. Let $\xi_1, \xi_2, \dots \in L^1$. Show that the $\xi_n$ are uniformly integrable iff the condition in Lemma 4.10 holds with $\sup_n$ replaced by $\limsup_n$.

9. Deduce the dominated convergence theorem from Lemma 4.11.

10. Show that if $\{|\xi_t|^p\}$ and $\{|\eta_t|^p\}$ are uniformly integrable for some $p > 0$, then so is $\{|a\xi_t + b\eta_t|^p\}$ for any $a, b \in \mathbb{R}$. (Hint: Use Lemma 4.10.) Use this fact to deduce Proposition 4.12 from Lemma 4.11.

11. Give examples of random variables $\xi, \xi_1, \xi_2, \dots \in L^2$ such that $\xi_n \to \xi$ holds a.s. but not in $L^2$, in $L^2$ but not a.s., or in $L^1$ but not in $L^2$.

12. Let $\xi_1, \xi_2, \dots$ be independent random variables in $L^2$. Show that $\sum_n \xi_n$ converges in $L^2$ iff $\sum_n E\xi_n$ and $\sum_n \mathrm{var}(\xi_n)$ both converge.

13. Give an example of independent symmetric random variables $\xi_1, \xi_2, \dots$ such that $\sum_n \xi_n$ is a.s. conditionally (nonabsolutely) convergent.

14. Let $\xi_n$ and $\eta_n$ be symmetric random variables with $|\xi_n| \le |\eta_n|$ such that the pairs $(\xi_n, \eta_n)$ are independent. Show that $\sum_n \xi_n$ converges whenever $\sum_n \eta_n$ does.

15. Let $\xi_1, \xi_2, \dots$ be independent symmetric random variables. Show that $E[(\sum_n \xi_n)^2 \wedge 1] \le \sum_n E[\xi_n^2 \wedge 1]$ whenever the latter series converges. (Hint: Integrate over the sets where $\sup_n |\xi_n| \le 1$ or $> 1$, respectively.)

16. Consider some independent sequences of symmetric random variables $\xi_k, \eta_k, \eta_k^1, \eta_k^2, \dots$ with $|\eta_k^n| \le |\xi_k|$ such that $\sum_k \xi_k$ converges, and assume $\eta_k^n \xrightarrow{d} \eta_k$ for each $k$. Show that $\sum_k \eta_k^n \xrightarrow{d} \sum_k \eta_k$. (Hint: Use a truncation based on the preceding exercise.)

17. Let $\sum_n \xi_n$ be a convergent series of independent random variables. Show that the sum is a.s. independent of the order of terms iff $\sum_n |E[\xi_n;\ |\xi_n| \le 1]| < \infty$.

18. Let the random variables $\xi_{nj}$ be symmetric and independent for each $n$. Show that $\sum_j \xi_{nj} \xrightarrow{P} 0$ iff $\sum_j E[\xi_{nj}^2 \wedge 1] \to 0$.

19. Let $\xi_n \xrightarrow{d} \xi$ and $a_n\xi_n \xrightarrow{d} \xi$ for some nondegenerate random variable $\xi$ and some constants $a_n > 0$. Show that $a_n \to 1$. (Hint: Turning to subsequences, we may assume that $a_n \to a$.)

20. Let $\xi_n \xrightarrow{d} \xi$ and $a_n\xi_n + b_n \xrightarrow{d} \xi$ for some nondegenerate random variable $\xi$, where $a_n > 0$. Show that $a_n \to 1$ and $b_n \to 0$. (Hint: Symmetrize.)

21. Let $\xi_1, \xi_2, \dots$ be independent random variables such that $a_n\sum_{k\le n}\xi_k$

For any $\varepsilon > 0$, we have

$$\sum\nolimits_j E\xi_{nj}^2(1 \wedge |\xi_{nj}|) \le \varepsilon\sum\nolimits_j E\xi_{nj}^2 + \sum\nolimits_j E[\xi_{nj}^2;\ |\xi_{nj}| > \varepsilon],$$

which tends to $0$ by (ii), as $n \to \infty$ and then $\varepsilon \to 0$. Further note that

$$\sum\nolimits_j E\xi_{nj}^2(1 \wedge |\xi_{nj}|) \le \sum\nolimits_j E|\xi_{nj}|^3 \to 0$$

by the first part of the proof. $\Box$

The problem of characterizing convergence to a Gaussian limit is solved completely by the following result. The reader should notice the striking resemblance between the present conditions and those of the three-series criterion in Theorem 4.18. A far-reaching extension of the present result is obtained by different methods in Chapter 15. As before, $\mathrm{var}[\xi;\ A] = \mathrm{var}(\xi 1_A)$.

Theorem 5.15 (Gaussian convergence, Feller, Lévy) Let $(\xi_{nj})$ be a null array of random variables, and let $\xi$ be $N(b, c)$ for some constants $b$ and $c$. Then $\sum_j \xi_{nj} \xrightarrow{d} \xi$ iff these conditions hold:
(i) $\sum_j P\{|\xi_{nj}| > \varepsilon\} \to 0$ for all $\varepsilon > 0$;
(ii) $\sum_j E[\xi_{nj};\ |\xi_{nj}| \le 1] \to b$;
(iii) $\sum_j \mathrm{var}[\xi_{nj};\ |\xi_{nj}| \le 1] \to c$.

Moreover, (i) is equivalent to $\sup_j |\xi_{nj}| \xrightarrow{P} 0$. If $\sum_j \xi_{nj}$ converges in distribution, then (i) holds iff the limit is Gaussian.
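Conditions (i)-(iii) can be evaluated in closed form for simple null arrays. The sketch below uses the row $\xi_{nj} = (B_j - 1/2)/\sqrt{n/4}$, $j \le n$, of centered fair Bernoulli variables (an illustrative choice of ours, not from the text), for which $\sum_j \xi_{nj} \xrightarrow{d} N(0, 1)$:

```python
import math

# Null array xi_{nj} = (B_j - 1/2)/sqrt(n/4): every entry is +-1/sqrt(n),
# so the truncation at 1 in (ii) and (iii) is inactive for n >= 2, and the
# three sums of Theorem 5.15 can be computed exactly.
def feller_levy_conditions(n, eps=0.5):
    val = 1.0 / math.sqrt(n)                    # |xi_{nj}| for every j
    cond_i = n * (1.0 if val > eps else 0.0)    # sum_j P{|xi_{nj}| > eps}
    cond_ii = n * 0.0                           # centered terms: b = 0
    cond_iii = n * val ** 2                     # sum_j var xi_{nj} -> c = 1
    return cond_i, cond_ii, cond_iii

for n in (4, 100, 10_000):
    i_, ii_, iii_ = feller_levy_conditions(n)
    assert ii_ == 0.0 and abs(iii_ - 1.0) < 1e-12
assert feller_levy_conditions(10_000)[0] == 0.0  # (i): holds once 1/sqrt(n) <= eps
```

By contrast, for the Poisson-type array $\xi_{nj} = 1\{U_j \le \lambda/n\}$ condition (i) fails ($\sum_j P\{\xi_{nj} > \varepsilon\} \to \lambda$), matching the theorem's last assertion that the limit is then non-Gaussian.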


Proof: To see that (i) is equivalent to $\sup_j |\xi_{nj}| \xrightarrow{P} 0$, we note that

$$P\Big\{\sup\nolimits_j |\xi_{nj}| > \varepsilon\Big\} = 1 - \prod\nolimits_j\big(1 - P\{|\xi_{nj}| > \varepsilon\}\big), \qquad \varepsilon > 0.$$

Since $\sup_j P\{|\xi_{nj}| > \varepsilon\} \to 0$ under both conditions, the assertion follows by Lemma 5.8.

Now assume $\sum_j \xi_{nj} \xrightarrow{d} \xi$. Introduce medians $m_{nj}$ and symmetrizations $\tilde{\xi}_{nj}$ of the variables $\xi_{nj}$, and note that $m_n \equiv \sup_j |m_{nj}| \to 0$ and $\sum_j \tilde{\xi}_{nj} \xrightarrow{d} \tilde{\xi}$, where $\tilde{\xi}$ is $N(0, 2c)$. By Lemma 4.19 and Theorem 5.11, we get for any $\varepsilon > 0$

$$\sum\nolimits_j P\{|\xi_{nj} - m_{nj}| > \varepsilon - m_n\} \le 2\sum\nolimits_j P\{|\tilde{\xi}_{nj}| > \varepsilon - m_n\} \to 0.$$


x], where m = E~, and so we may assume that E~ = 0. Now define Cn

= 1 V sup{x > 0; nL(x)

~ x 2 },

n E N,

5. Characteristic Functions and Classical Limit Theorems

97

and note that Cn t oo. From the slow variation of L it is further clear that Cn < oo for all n and that, moreover, nL(cn) "'c;. In particular, Cn "'n 112 iff L(en)"' 1, that is, iffvar(~) = 1. We shall verify the conditions of Theorem 5.15 with b = 0, c = 1, and ~nj = ~j/Cn, j :Sn. Beginning with (i), let E > 0 be arbitrary, and conclude from Lemma 5.18 that

n

} "'c;P{j~J > CnE} "'c;P{j~J > CnE} O P{l c/ J "" Cn > E L(cn) L(cnE) -+ ·

Recalling that

E~

= 0, we get by the same lemma

njE[~/cn; J~/cnJ :S 1]j :S ~E[j~J; j~J > Cn]"' CnE[j~( ~~~ > Cn] Cn

Cn

-+ 0,

(11)

which proves (ii). To obtain (iii), we note that in view of (11)

n var[~/cn;

J~/cnJ :S

1]

= ~ L(cn)- n(E[~/cn; J~J :S Cn]) 2 -+ 1. cn

By Theorem 5.15 the required convergence follows with an = c~ 1 and mn=O. Now assume instead that the stated convergence holds for suitable constants an and mn. Then a corresponding result holds for the symmetrized variables ~' ~1 , ~2, . . . with constants an/ /2 and 0, and so we may assume that c~ 1 Lks;n ~k ~ (. Here, clearly, Cn -+ oo and, moreover, Cn+l "' Cn, since even c;:;-~ 1 Lks;n ~k ~ ( by Theorem 4.28. Now define for x

T(x) = P{l~l >X},

L(x) = E[e; 1~1

: : ; x],

>0

Ü(x) = E(e A x 2).

By Theorem 5.15 we have nT(cnc)-+ 0 for all c > 0, and also nc~ 2 L(cn) -+ 1. Thus, c;i'(cnE)ji(cn)-+ 0, which extends by monotonicity to

x 2 T(x) x 2 T(x) 0 Ü(x) ::::; i(x) -+ '

X-+

00.

Next define for any x > 0

T(x)

= P{j~J > x},

By Lemma 4.19 we have T(x + Jmj) ::::; 2T(x) for any median m of ~­ Furthermore, by Lemmas 3.4 and 4.19, we get x2

Ü(x)

x2

= fo P{~2 > t}dt::::; 2 fo P{4e > t}dt = BU(x/2).

Hence, as x -+ oo,

L(2x)- L(x) 4x 2 T(x) 8x 2 T(x -Jmj) < < -+ 0' L(x) - U(x)- x 2 T(x) - 8-1U(2x)- 2x 2 T(x -Jmj)

---'----=-..,.......,----'--'-

which shows that L is slowly varying.


Finally, assume that $n^{-1/2}\sum_{k\le n}\xi_k$ ...; for large $r$ this is a contradiction, and the asserted tightness follows. $\Box$

We may now prove the desired extension of Theorem 5.3.


Theorem 5.22 (extended continuity theorem, Lévy, Bochner) Let $\mu_1, \mu_2, \dots$ be probability measures on $\mathbb{R}^d$ with $\hat{\mu}_n(t) \to \varphi(t)$ for every $t \in \mathbb{R}^d$, where the limit $\varphi$ is continuous at $0$.

10. For any $\sigma$-field $\mathcal{F}$, event $A$, and random variable $\xi \in L^1$, show that $E[\xi|\mathcal{F}, 1_A] = E[\xi;\ A|\mathcal{F}]/P[A|\mathcal{F}]$ a.s. on $A$.

11. Let the random variables $\xi_1, \xi_2, \dots \ge 0$ and $\sigma$-fields $\mathcal{F}_1, \mathcal{F}_2, \dots$ be such that $E[\xi_n|\mathcal{F}_n] \xrightarrow{P} 0$. Show that $\xi_n \xrightarrow{P} 0$. (Hint: Consider the random variables $\xi_n \wedge 1$.)



12. Let $(\xi, \eta) \overset{d}{=} (\tilde{\xi}, \tilde{\eta})$, where $\xi \in L^1$. Show that $E[\xi|\eta] \overset{d}{=} E[\tilde{\xi}|\tilde{\eta}]$. (Hint: If $E[\xi|\eta] = f(\eta)$, then $E[\tilde{\xi}|\tilde{\eta}] = f(\tilde{\eta})$ a.s.)

13. Let $(\xi, \eta)$ be a random vector in $\mathbb{R}^2$ with probability density $f$, put $F(y) = \int f(x, y)\,dx$, and let $g(x, y) = f(x, y)/F(y)$. Show that $P[\xi \in B|\eta] = \int_B g(x, \eta)\,dx$ a.s.

14. Use conditional distributions to deduce the monotone and dominated convergence theorems for conditional expectations from the corresponding unconditional results.

15. Assume that $E^{\mathcal{F}}\xi \overset{d}{=} \xi$ for some $\xi \in L^1$. Show that $\xi$ is a.s. $\mathcal{F}$-measurable. (Hint: Choose a strictly convex function $f$ with $Ef(\xi) < \infty$, and apply the strict Jensen inequality to the conditional distributions.)

16. Assume that $(\xi, \eta) \overset{d}{=} (\xi, \zeta)$, where $\eta$ is $\zeta$-measurable. Show that $\xi \perp\!\!\!\perp_\eta \zeta$. (Hint: Show as above that $P[\xi \in B|\eta] \overset{d}{=} P[\xi \in B|\zeta]$, and deduce the corresponding a.s. equality.)

17. Let $\xi$ be a random element in some separable metric space $S$. Show that $P[\xi \in \cdot\,|\mathcal{F}]$ is a.s. degenerate iff $\xi$ is a.s. $\mathcal{F}$-measurable. (Hint: Reduce to the case when $P[\xi \in \cdot\,|\mathcal{F}]$ is degenerate everywhere and hence equal to $\delta_\eta$ for some $\mathcal{F}$-measurable random element $\eta$ in $S$. Then show that $\xi = \eta$ a.s.)

18. Assuming $\xi \perp\!\!\!\perp_\eta \zeta$ and $\gamma \perp\!\!\!\perp (\xi, \eta, \zeta)$, show that $\xi \perp\!\!\!\perp_{\eta,\gamma} \zeta$ and $\xi \perp\!\!\!\perp_\eta (\zeta, \gamma)$.

19. Extend Lemma 3.6 to the context of conditional independence. Also show that Corollary 3.7 and Lemma 3.8 remain valid for the conditional independence, given some $\sigma$-field $\mathcal{H}$.

20. Fix any $\sigma$-field $\mathcal{F}$ and random element $\xi$ in some Borel space, and define $\eta = P[\xi \in \cdot\,|\mathcal{F}]$. Show that $\xi \perp\!\!\!\perp_\eta \mathcal{F}$.

21. Let $\xi$ and $\eta$ be random elements in some Borel space $S$. Prove the existence of a measurable function $f: S \times (0, 1) \to S$ and some $U(0, 1)$ random variable $\gamma \perp\!\!\!\perp \eta$ such that $\xi = f(\eta, \gamma)$ a.s. (Hint: Choose $f$ with $(f(\eta, \vartheta), \eta) \overset{d}{=} (\xi, \eta)$ for any $U(0, 1)$ random variable $\vartheta \perp\!\!\!\perp (\xi, \eta)$, and then let $(\gamma, \tilde{\eta}) \overset{d}{=} (\vartheta, \eta)$ with $(\xi, \eta) = (f(\gamma, \tilde{\eta}), \tilde{\eta})$ a.s.)

22. Let $\xi$ and $\eta$ be random elements in some Borel space $S$. Show that we may choose a random element $\tilde{\eta}$ in $S$ with $(\xi, \eta) \overset{d}{=} (\xi, \tilde{\eta})$ and $\tilde{\eta} \perp\!\!\!\perp_\xi \eta$.

23. Let the probability measures $P$ and $Q$ on $(\Omega, \mathcal{A})$ be related by $Q = \xi \cdot P$ for some random variable $\xi \ge 0$, and consider any $\sigma$-field $\mathcal{F} \subset \mathcal{A}$. Show that $Q = E_P[\xi|\mathcal{F}] \cdot P$ on $\mathcal{F}$.

24. Assume as before that $Q = \xi \cdot P$ on $\mathcal{A}$, and let $\mathcal{F} \subset \mathcal{A}$. Show that $E_Q[\eta|\mathcal{F}] = E_P[\xi\eta|\mathcal{F}]/E_P[\xi|\mathcal{F}]$ a.s. $Q$, for any random variable $\eta \ge 0$.



Chapter 7

Martingales and Optional Times

Filtrations and optional times; random time-change; martingale property; optional stopping and sampling; maximum and upcrossing inequalities; martingale convergence, regularity, and closure; limits of conditional expectations; regularization of submartingales

The importance of martingale methods and ideas can hardly be exaggerated. Indeed, martingales and the associated notions of filtrations and optional times are constantly used in all areas of modern probability; they appear frequently throughout the remainder of this book. In discrete time a martingale is simply a sequence of integrable random variables centered at the successive conditional means, a centering that can always be achieved by the elementary Doob decomposition. More precisely, given any discrete filtration $\mathcal{F} = (\mathcal{F}_n)$, that is, an increasing sequence of $\sigma$-fields in $\Omega$, we say that a sequence $M = (M_n)$ forms a martingale with respect to $\mathcal{F}$ if $E[M_n|\mathcal{F}_{n-1}] = M_{n-1}$ a.s. for all $n$. A special role is played by the class of uniformly integrable martingales, which can be represented in the form $M_n = E[\xi|\mathcal{F}_n]$ for some integrable random variable $\xi$.

Martingale theory owes its usefulness to a number of powerful general results, such as the optional sampling theorem, the submartingale convergence theorem, and a wide range of maximum inequalities. The applications discussed in this chapter include extensions of the Borel-Cantelli lemma and Kolmogorov's 0-1 law. Martingales can also be used to establish the existence of measurable densities and to give a short proof of the law of large numbers.

Much of the discrete-time theory extends immediately to continuous time, thanks to the fundamental regularization theorem, which ensures that every continuous-time martingale with respect to a right-continuous filtration has a right-continuous version with left-hand limits. The implications of this result extend far beyond martingale theory. In particular, it will enable us in Chapters 15 and 19 to obtain right-continuous versions of independent-increment and Feller processes. The theory of continuous-time martingales is continued in Chapters 17, 18, 25, and 26 with studies of quadratic variation, random time-change, integral representations, removal of drift, additional maximum inequalities, and various decomposition theorems. Martingales also play a basic role for especially the Skorohod embedding in Chapter 14, the stochastic integration in Chapters 17 and 26, and the theories of Feller processes, SDEs, and diffusions in Chapters 19, 21, and 23. As for the closely related notion of optional times, our present treatment is continued with a more detailed study in Chapter 25. Optional times are fundamental not only for martingale theory but also for various models involving Markov processes. In the latter context they appear frequently in the sequel, especially in Chapters 8, 9, 12, 13, 14, 19, and 22-25.

To begin our systematic exposition of the theory, we may fix an arbitrary index set $T \subset \overline{\mathbb{R}}$. A filtration on $T$ is defined as a nondecreasing family of $\sigma$-fields $\mathcal{F}_t \subset \mathcal{A}$, $t \in T$. We say that a process $X$ on $T$ is adapted to $\mathcal{F} = (\mathcal{F}_t)$ if $X_t$ is $\mathcal{F}_t$-measurable for every $t \in T$. The smallest filtration with this property, namely $\mathcal{F}_t = \sigma\{X_s;\ s \le t\}$, $t \in T$, is called the induced or generated filtration. Here "smallest" is understood in the sense of set inclusion for every fixed $t$.

By a random time we mean a random element $\tau$ in $\overline{T} = T \cup \{\sup T\}$. We say that $\tau$ is $\mathcal{F}$-optional or an $\mathcal{F}$-stopping time if $\{\tau \le t\} \in \mathcal{F}_t$ for every $t \in T$, that is, if the process $X_t = 1\{\tau \le t\}$ is adapted. (Here and in similar cases, we often omit the prefix $\mathcal{F}$ when there is no risk for confusion.) If $T$ is countable, it is clearly equivalent that $\{\tau = t\} \in \mathcal{F}_t$ for every $t \in T$. For any optional times $\sigma$ and $\tau$, we note that even $\sigma \vee \tau$ and $\sigma \wedge \tau$ are optional. With every optional time $\tau$ we may associate a $\sigma$-field

$$\mathcal{F}_\tau = \{A \in \mathcal{A};\ A \cap \{\tau \le t\} \in \mathcal{F}_t,\ t \in T\}.$$

Some basic properties of optional times and the associated $\sigma$-fields are listed below.

Lemma 7.1 (optional times) For any optional times $\sigma$ and $\tau$, we have:
(i) $\tau$ is $\mathcal{F}_\tau$-measurable;
(ii) $\mathcal{F}_\tau = \mathcal{F}_t$ on $\{\tau = t\}$ for all $t \in T$;
(iii) $\mathcal{F}_\sigma \cap \{\sigma \le \tau\} \subset \mathcal{F}_{\sigma\wedge\tau} = \mathcal{F}_\sigma \cap \mathcal{F}_\tau$.

In particular, we see from (iii) that $\{\sigma \le \tau\} \in \mathcal{F}_\sigma \cap \mathcal{F}_\tau$, that $\mathcal{F}_\sigma = \mathcal{F}_\tau$ on $\{\sigma = \tau\}$, and that $\mathcal{F}_\sigma \subset \mathcal{F}_\tau$ whenever $\sigma \le \tau$.

Proof: (iii) For any $A \in \mathcal{F}_\sigma$ and $t \in T$, we have

$$A \cap \{\sigma \le \tau\} \cap \{\tau \le t\} = (A \cap \{\sigma \le t\}) \cap \{\tau \le t\} \cap \{\sigma \wedge t \le \tau \wedge t\},$$

which belongs to $\mathcal{F}_t$ since $\sigma \wedge t$ and $\tau \wedge t$ are both $\mathcal{F}_t$-measurable. Hence, $\mathcal{F}_\sigma \cap \{\sigma \le \tau\} \subset \mathcal{F}_\tau$. The first relation now follows as we replace $\tau$ by $\sigma \wedge \tau$. Replacing $\sigma$ and $\tau$ by the pairs $(\sigma \wedge \tau, \sigma)$ and $(\sigma \wedge \tau, \tau)$, we obtain $\mathcal{F}_{\sigma\wedge\tau} \subset \mathcal{F}_\sigma \cap \mathcal{F}_\tau$. To prove the reverse relation, we note that for any $A \in \mathcal{F}_\sigma \cap \mathcal{F}_\tau$ and $t \in T$,

$$A \cap \{\sigma \wedge \tau \le t\} = (A \cap \{\sigma \le t\}) \cup (A \cap \{\tau \le t\}) \in \mathcal{F}_t,$$

whence $A \in \mathcal{F}_{\sigma\wedge\tau}$.


(i) Applying (iii) to the pair $(\tau, t)$ gives $\{\tau \le t\} \in \mathcal{F}_\tau$ for all $t \in T$, which extends immediately to any $t \in \mathbb{R}$. Now use Lemma 1.4.

(ii) First assume that $\tau \equiv t$. Then $\mathcal{F}_\tau = \mathcal{F}_\tau \cap \{\tau \le t\} \subset \mathcal{F}_t$. Conversely, assume that $A \in \mathcal{F}_t$ and $s \in T$. If $s \ge t$, we get $A \cap \{\tau \le s\} = A \in \mathcal{F}_t \subset \mathcal{F}_s$, and for $s < t$ we have $A \cap \{\tau \le s\} = \emptyset \in \mathcal{F}_s$. Thus, $A \in \mathcal{F}_\tau$. This shows that $\mathcal{F}_\tau = \mathcal{F}_t$ when $\tau \equiv t$. The general case now follows by part (iii). $\Box$

Given an arbitrary filtration $\mathcal{F}$ on $\mathbb{R}_+$, we may define a new filtration $\mathcal{F}^+$ by $\mathcal{F}_t^+ = \bigcap_{u>t}\mathcal{F}_u$, $t \ge 0$, and we say that $\mathcal{F}$ is right-continuous if $\mathcal{F}^+ = \mathcal{F}$. In particular, $\mathcal{F}^+$ is right-continuous for any filtration $\mathcal{F}$. We say that a random time $\tau$ is weakly $\mathcal{F}$-optional if $\{\tau < t\} \in \mathcal{F}_t$ for every $t > 0$. In that case $\tau + h$ is clearly $\mathcal{F}$-optional for every $h > 0$, and we may define $\mathcal{F}_{\tau+} = \bigcap_{h>0}\mathcal{F}_{\tau+h}$. When the index set is $\mathbb{Z}_+$, we take $\mathcal{F}^+ = \mathcal{F}$ and make no difference between strictly and weakly optional times. The following result shows that the notions of optional and weakly optional times agree when $\mathcal{F}$ is right-continuous.

Lemma 7.2 (weakly optional times) A random time $\tau$ is weakly $\mathcal{F}$-optional iff it is $\mathcal{F}^+$-optional, in which case

$$\mathcal{F}_{\tau+} = \mathcal{F}_\tau^+ = \{A \in \mathcal{A};\ A \cap \{\tau < t\} \in \mathcal{F}_t,\ t > 0\}. \tag{1}$$

Proof: For any $t \ge 0$, we note that

$$\{\tau \le t\} = \bigcap_{r>t}\{\tau < r\}, \qquad \{\tau < t\} = \bigcup_{r<t}\{\tau \le r\}. \tag{2}$$

The first relation shows that a weakly $\mathcal{F}$-optional time is $\mathcal{F}^+$-optional, and the second yields the converse. As for (1), the condition $A \in \mathcal{F}_{\tau+}$ is equivalent to $A \cap \{\tau < t\} \in \mathcal{F}_t$ for every $t > 0$, hence to $A \cap \{\tau \le t\} \in \mathcal{F}_t^+$ for every $t \ge 0$, which means that $A \in \mathcal{F}_\tau^+$. $\Box$

We have already seen that the maximum and minimum of two optional times are again optional. The result extends to countable collections as follows.


Lemma 7.3 (closure properties) For any random times $\tau_1, \tau_2, \dots$ and filtration $\mathcal{F}$ on $\mathbb{R}_+$ or $\mathbb{Z}_+$, we have:
(i) if the $\tau_n$ are $\mathcal{F}$-optional, then so is $\sigma = \sup_n \tau_n$;
(ii) if the $\tau_n$ are weakly $\mathcal{F}$-optional, then so is $\tau = \inf_n \tau_n$, and we have $\mathcal{F}_\tau^+ = \bigcap_n \mathcal{F}_{\tau_n}^+$.

Proof: To prove (i) and the first assertion in (ii), we note that $\{\sigma \le t\} = \bigcap_n\{\tau_n \le t\}$ and

$$\{\tau < t\} = \bigcup\nolimits_n\{\tau_n < t\}, \tag{3}$$

where the strict inequalities may be replaced by $\le$ for the index set $T = \mathbb{Z}_+$. To prove the second assertion in (ii), we note that $\mathcal{F}_\tau^+ \subset \bigcap_n \mathcal{F}_{\tau_n}^+$ by Lemma 7.1. Conversely, assuming $A \in \bigcap_n \mathcal{F}_{\tau_n}^+$, we get by (3) for any $t \ge 0$

$$A \cap \{\tau < t\} = A \cap \bigcup\nolimits_n\{\tau_n < t\} = \bigcup\nolimits_n(A \cap \{\tau_n < t\}) \in \mathcal{F}_t,$$

with the indicated modification for $T = \mathbb{Z}_+$. Thus, $A \in \mathcal{F}_\tau^+$. $\Box$

Part (ii) of the last result is often useful in connection with the following approximation of optional times from the right.

Lemma 7.4 (discrete approximation) For any weakly optional time τ in ℝ₊, there exist some countably valued optional times τₙ ↓ τ.

Proof: We may define τₙ = 2⁻ⁿ⌊2ⁿτ + 1⌋, n ∈ ℕ. Then τₙ ∈ 2⁻ⁿℕ̄ for all n, and τₙ ↓ τ. Also note that the τₙ are optional, since {τₙ ≤ k2⁻ⁿ} = {τ < k2⁻ⁿ} ∈ F_{k2⁻ⁿ}. □

It is now time to relate the optional times to random processes. We say that a process X on ℝ₊ is progressively measurable, or simply progressive, if its restriction to Ω × [0, t] is F_t ⊗ B[0, t]-measurable for every t ≥ 0. Note that any progressive process is adapted by Lemma 1.26. Conversely, a simple approximation from the left or right shows that any adapted and left- or right-continuous process is progressive. A set A ⊂ Ω × ℝ₊ is said to be progressive if the corresponding indicator function 1_A has this property, and we note that the progressive sets form a σ-field.
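The dyadic approximation in Lemma 7.4 is directly computable. The following is a minimal numerical sketch (my own illustration, not from the book) of τₙ = 2⁻ⁿ⌊2ⁿτ + 1⌋, checking that the τₙ decrease to τ from strictly above, with error at most 2⁻ⁿ:

```python
import math

def dyadic_approx(tau, n):
    """n-th dyadic upper approximation of an optional time tau."""
    return math.floor(2**n * tau + 1) / 2**n

tau = 0.3
approximations = [dyadic_approx(tau, n) for n in range(1, 11)]

# Strictly above tau, within 2^-n, and nonincreasing in n.
assert all(a > tau for a in approximations)
assert all(0 < a - tau <= 2**-n for n, a in enumerate(approximations, start=1))
assert all(x >= y for x, y in zip(approximations, approximations[1:]))
```

Each τₙ takes values in the countable dyadic grid of level n, which is what makes the approximating times so convenient in proofs.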

Lemma 7.5 (optional evaluation) Fix a filtration F on an index set T, let X be a process on T with values in a measurable space (S, S), and let τ be an optional time in T. Then X_τ is F_τ-measurable under each of these conditions:

(i) T is countable and X is adapted;

(ii) T = ℝ₊ and X is progressive.

Proof: In both cases, we need to show that

{X_τ ∈ B, τ ≤ t} ∈ F_t,   t ≥ 0, B ∈ S.

This is clear in case (i), if we write

{X_τ ∈ B, τ ≤ t} = ⋃_{s≤t} {X_s ∈ B, τ = s} ∈ F_t,   B ∈ S.

In case (ii) it is enough to show that X_{τ∧t} is F_t-measurable for every t ≥ 0. We may then assume τ ≤ t and prove instead that X_τ is F_t-measurable. Writing X_τ = X ∘ ψ where ψ(ω) = (ω, τ(ω)), we note that ψ is measurable from F_t to F_t ⊗ B[0, t], whereas X is measurable on Ω × [0, t] from F_t ⊗ B[0, t] to S. The required measurability of X_τ now follows by Lemma 1.7. □

Given a process X on ℝ₊ or ℤ₊ and a set B in the range space of X, we introduce the hitting time

τ_B = inf{t > 0; X_t ∈ B}.

It is often important to decide whether τ_B is optional. The following elementary result covers the most commonly occurring cases.

Lemma 7.6 (hitting times) Fix a filtration F on T = ℝ₊ or ℤ₊, let X be an F-adapted process on T with values in a measurable space (S, S), and let B ∈ S. Then τ_B is weakly optional under each of these conditions:

(i) T = ℤ₊;

(ii) T = ℝ₊, S is a metric space, B is closed, and X is continuous;

(iii) T = ℝ₊, S is a topological space, B is open, and X is right-continuous.

Proof: In case (i) it is enough to write

{τ_B ≤ n} = ⋃_{k∈[1,n]} {X_k ∈ B} ∈ F_n,   n ∈ ℕ.

In case (ii) we get for any t > 0

{τ_B ≤ t} = ⋃_{h>0} ⋂_{n∈ℕ} ⋃_{r∈ℚ∩[h,t]} {ρ(X_r, B) < n⁻¹},

where ρ denotes the metric in S. Finally, in case (iii) we get

{τ_B < t} = ⋃_{r∈ℚ∩(0,t)} {X_r ∈ B},   t > 0,

which suffices by Lemma 7.2. □
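Case (i) of Lemma 7.6 is directly algorithmic: for a discrete-time adapted sequence, whether {τ_B ≤ n} has occurred is decided by the first n + 1 values alone. A short Python sketch (my own illustration, with a hypothetical deterministic walk):

```python
def hitting_time(path, in_B):
    """First time t > 0 with X_t in B, or len(path) if B is not hit.

    path[0] is X_0, so the scan starts at t = 1, matching
    tau_B = inf{t > 0; X_t in B}.
    """
    for t in range(1, len(path)):
        if in_B(path[t]):
            return t
    return len(path)  # stands in for tau_B = infinity on this finite path

walk = [0, 1, 2, 1, 2, 3]
assert hitting_time(walk, lambda x: x >= 2) == 2
assert hitting_time(walk, lambda x: x >= 3) == 5
assert hitting_time(walk, lambda x: x < 0) == len(walk)

# Weak optionality: {tau_B <= 2} is determined by path[:3] alone, i.e. by F_2.
assert hitting_time(walk[:3], lambda x: x >= 2) == 2
```

The last assertion illustrates the measurability statement {τ_B ≤ n} ∈ F_n: truncating the path beyond time n does not change whether the level was hit by time n.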

For special purposes we need the following more general but much deeper result, known as the debut theorem. Here and below, a filtration F is said to be complete if the basic σ-field A is complete and each F_t contains all P-null sets in A.

Theorem 7.7 (first entry, Doob, Hunt) Let the set A ⊂ ℝ₊ × Ω be progressive with respect to some right-continuous and complete filtration F. Then the time τ(ω) = inf{t ≥ 0; (t, ω) ∈ A} is F-optional.

Proof: Since A is progressive, we have A ∩ [0, t) ∈ F_t ⊗ B([0, t]) for every t > 0. Noting that {τ < t} is the projection of A ∩ [0, t) onto Ω, we get {τ < t} ∈ F_t by Theorem A1.4, and so τ is optional by Lemma 7.2. □

In applications of the last result, and for other purposes, we may need to extend a given filtration F on ℝ₊ to make it both right-continuous and complete. Writing Ā for the completion of A, we put N = {A ∈ Ā; PA = 0} and define F̄_t = σ{F_t, N}. Then F̄ = (F̄_t) is the smallest complete extension of F. Similarly, F₊ = (F_{t+}) is the smallest right-continuous extension of F. We show that the two operations commute and can be combined into a smallest right-continuous and complete extension, known as the (usual) augmentation of F.

Lemma 7.8 (augmented filtration) Every filtration F on ℝ₊ has a smallest right-continuous and complete extension G, given by

G_t = F̄_{t+} = σ{F_{t+}, N},   t ≥ 0.   (4)

Proof: First we note that

σ{F_{t+}, N} ⊂ F̄_{t+},   t ≥ 0.

Conversely, assume that A ∈ F̄_{t+}. Then A ∈ F̄_{t+h} for every h > 0, and so, as in Lemma 1.25, there exist some sets A_h ∈ F_{t+h} with P(A Δ A_h) = 0. Now choose hₙ → 0, and define A′ = {A_{hₙ} i.o.}. Then A′ ∈ F_{t+} and P(A Δ A′) = 0, so A ∈ σ{F_{t+}, N}. Thus, F̄_{t+} ⊂ σ{F_{t+}, N}, which proves the second relation in (4). In particular, the filtration G in (4) contains F and is both right-continuous and complete. For any filtration H with those properties, we have

G_t = F̄_{t+} ⊂ H̄_{t+} = H_{t+} = H_t,   t ≥ 0,

which proves the required minimality of G. □

The next result shows how the σ-fields F_τ arise naturally in connection with a random time-change.

Proposition 7.9 (random time-change) Let X ≥ 0 be a nondecreasing, right-continuous process adapted to some right-continuous filtration F. Then

τ_s = inf{t > 0; X_t > s},   s ≥ 0,

is a right-continuous process of optional times, generating a right-continuous filtration G_s = F_{τ_s}, s ≥ 0. If X is continuous and the time τ is F-optional, then X_τ is G-optional and F_τ ⊂ G_{X_τ}. If X is further strictly increasing, then F_τ = G_{X_τ}.

In the latter case, we have in particular F_t = G_{X_t} for all t, so the processes (τ_s) and (X_t) play symmetric roles.

Proof: The times τ_s are optional by Lemmas 7.2 and 7.6, and since (τ_s) is right-continuous, so is (G_s) by Lemma 7.3. If X is continuous, then by Lemma 7.1 we get for any F-optional time τ > 0 and set A ∈ F_τ

A ∩ {X_τ ≤ s} = A ∩ {τ ≤ τ_s} ∈ F_{τ_s} = G_s,   s ≥ 0.

For A = Ω it follows that X_τ is G-optional, and for general A we get A ∈ G_{X_τ}. Thus, F_τ ⊂ G_{X_τ}. Both statements extend by Lemma 7.3 to arbitrary τ.

Now assume that X is also strictly increasing. For any A ∈ G_{X_t} with t > 0 we have A ∩ {X_t ≤ s} ∈ F_{τ_s}, and so

A ∩ {t ≤ τ_s ≤ u} ∈ F_u,   s ≥ 0, u > t.

Taking the union over all s ∈ ℚ₊ (the set of nonnegative rationals) gives A ∈ F_u, and as u ↓ t we get A ∈ F_{t+} = F_t. Hence, F_t = G_{X_t}, which extends as before to t = 0. By Lemma 7.1 we now obtain for any A ∈ G_{X_τ}

A ∩ {τ ≤ t} = A ∩ {X_τ ≤ X_t} ∈ G_{X_t} = F_t,   t ≥ 0,

and so A ∈ F_τ. Thus, G_{X_τ} ⊂ F_τ, so the two σ-fields agree. □

e

Mt = E[eiFtJ,

t E T.

The process M is clearly integrable (for each t) and adapted, and by the chain rule for conditional expectations we note that

Ms = E[MtiFs] a.s.,

s:::; t.

(5)

Any integrable and adapted process M satisfying (5) is called a martingale with respect to F, or an F-martingale. When T = Z+, it suffices to require (5) fort= s + 1, so in that case the condition becomes E[~MniFn-1]

=0

n E N,

a.s.,

(6)

where ~Mn = Mn - Mn-1· A process M = (M 1 , ... , Md) in JR.d is said to be a martingale if M 1 , ..• , Md are one-dimensional martingales. Replacing the equality in (5) or (6) by an inequality, we arrive at the notions of sub- and supermartingales. Thus, a submartingale is defined as an integrable and adapted process X with 8

< t·' -

(7)

reversing the inequality sign yields the notion of a supermartingale. In particular, the mean is nondecreasing for Submartingales and nonincreasing

126

Foundations of Modern Probability

for supermartingales. (The sign convention is suggested by analogy with sub- and Superharmonie functions.) Given a filtration :Fon Z+, we say that a random sequence A = (An) with Ao = 0 is predictable with respect to :F, or :F-predictable, if An is :Fn-1-measurable for every n E N, that is, if the shifted sequence BA = (An+1) is adapted. The following elementary result, known as the Doob decomposition, is useful to deduce results for Submartingales from the corresponding martingale versions. An extension to continuous time is proved in Chapter 25. Lemma 7.10 (centering) Any integrable and :F-adapted process X on Z+ has an a.s. unique decomposition M + A, where M is an :F-martingale and A is an :F-predictable process with Ao = 0. In particular, X is a Submartingale iff A is a.s. nondecreasing. Proof: If X = M + A for some processes M and A as stated, then clearly = E[~Xni:Fn-1] a.s. for all n E N, and so

~An

(8) which proves the required uniqueness. In general, we may define a predictable process A by (8). Then M =X-Ais a martingale, since D

We proceed to show how the martingale and Submartingale properties are preserved under various transformations. Lemma 7.11 (convex maps) Let M be a martingalein JR.d, and consider a convex function f: JR.d -T IR such that X = f(M) is integrable. Then X is a submartingale. The statement remains true for any real submartingale M, provided that f is also nondecreasing. Proof: In the martingale case, the conditional version of Jensen's inequality yields f(Ms)

= f(E[Mti:Fs]) :S E[j(Mt)I:Fs]

a.s.,

s :S t,

(9)

which shows that f(M) is a submartingale. If instead M is a Submartingale and f is nondecreasing, the first relation in (9) becomes f(Ms) :::; f(E[Mti:Fs]), and the conclusion remains valid. D The last result is often applied with f(x) =

d = 1, with j(x) =X+ =X V 0.

lxiP

for some p

~

1 or,

for

We say that an optional time r is bounded if r :::; u a.s. for some u E T. This is always true when T has a last element. The following result is an elementary version of the basic optional sampling theorem. An extension to continuous-time Submartingales appears as Theorem 7.29.

7. Martingales and Optional Times

127

Theorem 7.12 (optional sampling, Doob) Let M be a martingale on some countable index set T with filtration :F, and consider two optional times a and T, where r is bounded. Then M.. is integrable, and

Mu/\r = E[M.. I:Fu] a.s. Proof: By Lemmas 6.2 and 7.1 we get for any

t:::; u

in T

E[Mui:F.. ] = E[Mui:Ft] =Mt= M.. a.s. on {r = t}, and so E[Mui:F.. ] = M.. a.s. whenever C :F7 by Lemma 7.1, and we get

:Fu

T

:::;

u a.s. If a :::; r

< u, then

On the other hand, clearly E[M.. I:Fu] = M 7 a.s. when T :::; a 1\ u. In the general case, the previous results combine by means of Lemmas 6.2 and 7.1 into

E[M.. I:Fu] = E[M.. I:FuM] = Mu/\r a.s. on {a:::; r}, E[M.. I:Fu] = E[MuMI:Fu] = Mu/\r a.s. on {a > r}.

D

In particular, we note that if M is a martingale on an arbitrary time scale T with filtration :F and (Ts) is a nondecreasing family of bounded, optional timesthat take countably many values, then the process (M... ) is a martingale with respect to the filtration (:F... ). In this sense, the martingale property is preserved by a random time-change. From the last theorem we note that every martingale M satisfies EMu= EM.., for any bounded optional times a and T that take only countably many values. An even weaker property characterizes the dass of martingales.

Lemma 7.13 (martingale criterion) Let M be an integrable, adapted process on some index set T. Then M is a martingale iff EMu = EM7 for any T -valued optional times a and T that take at most two values. Proof: If s < t in T and A E :F8 , then T = s1A + tlAc is optional, and so 0

= EMt- EM.. = EMt - E[Ms; A] - E[Mt; Ac] = E[Mt- Ms; A].

Since Ais arbitrary, it follows that E[Mt- Msi:Fs] = 0 a.s.

D

The following predictable transformation of martingales is basic for the theory of stochastic integration.

Corollary 7.14 (martingale transform) Let M be a martingale on some index set T with filtration :F, fix an optional time T that takes countably many values, and let rJ be a bounded, :F7 -measurable random variable. Then the process Nt = ry(Mt - MtM) is again a martingale. Proof: The integrability follows from Theorem 7.12, and the adaptedness is clear if we replace TJ by rJ 1{T :::; t} in the expression for Nt. Now fix any

128

Foundations of Modern Probability

bounded, optional time a taking countably many values. By Theorem 7.12 and the pull-out property of conditional expectations, we get a.s.

and so ENu

= 0.

Thus, N is a martingale by Lemma 7.13.

0

In particular, we note that optional stopping preserves the martingale property, in thesensethat the stopped process M{ = Mrl\t is a martingale whenever M is a martingale and r is an optional time that takes countably many values. More generally, we may consider predictable step processes of the form

where r 1 :::; · · · :::; r n are optional times, and each 7Jk is a bounded, :F7 kmeasurable random variable. For any process X, we may introduce the associated elementary stochastic integral

From Corollary 7.14 we note that V· X is a martingale whenever X is a martingale and each Tk takes countably many values. In discrete time we may clearly allow V to be any bounded, predictable sequence, in which case

The result for martingales extends in an obvious way to submartingales X, provided that the predictable sequence V is nonnegative. Our next aim is to derive some basic martingale inequalities. We begin with an extension of Kolmogorov's maximum inequality in Lemma 4.15.

Proposition 7.15 (maximum inequalities, Bernstein, Levy) Let X be a submartingale on a countable index set T. Then for any r ;:::: 0 and u E T, rP{supt~uXt;:::: r}

rP{suptiXtl;:::: r}

< E[Xu; :::;

SUPt~uXt;:::: r]:::;

3suptEIXtl·

EX:;,

(10) (11)

Proof: By dominated convergence it is enough to consider finite index sets, so we may assume that T = Z+· Define r = u A inf{t; Xt;:::: r} and B = {maxt~uXt;:::: r}. Then r is an optional time bounded by u, and we note that BE :F7 and X 7 ;:::: r on B. Hence, by Lemma 7.10 and Theorem 7.12,

7. Martingales and Optional Times

129

which proves (10). Letting M + A be the Doob decomposition of X and applying (10) to -M, we further get rP{mint~uXt :=:;

-r} < =

-r} :=:; EM;; EM:_- - EMu ~ EXt - EXo


1 with p- 1 + q- 1 = 1. Then IIMtllp :=:; qiiMtllp,

t E T.

Proof: By monotone convergence we may assume that T = Z+. If IIMtiiP < oo, then 11MsliP < oo for all s :=:; t by Jensen's inequality, and so we may assume that 0 < IIMtiiP < oo. Applying Proposition 7.15 to the submartingale IMI, we get rP{Mt > r} :=:; E[IMtl; Mt> r],

r > 0.

Hence, by Lemma 3.4, Fubini's theorem, and Hölder's inequality,

IIMtll~

P


r}rP- 1 dr

00

E[IMtl; Mt> r] rP- 2 dr M*

p E 1Mtl1 ' rP- 2 dr = q EI Mt I Mt(p- 1 )



7. Martingales and Optional Times

133

Corollary 7.25 (tail σ-field) If F₁, F₂, … and G are independent σ-fields, then

⋂ₙ σ{G, Fₙ, F_{n+1}, …} = G a.s.

Proof: Let T denote the σ-field on the left, and note that T ⫫_G (F₁ ∨ ··· ∨ Fₙ) by Proposition 6.8. Using Proposition 6.6 and Corollary 7.24, we get for any A ∈ T

P[A|G] = P[A|G, F₁, …, Fₙ] → P[A|G, F₁, F₂, …] = 1_A a.s.,

which shows that T ⊂ G a.s. The converse relation is obvious. □

The last theorem can be used to give a short proof of the law of large numbers. To this end, let ξ₁, ξ₂, … be i.i.d. random variables in L¹, put Sₙ = ξ₁ + ··· + ξₙ, and define F₋ₙ = σ{Sₙ, S_{n+1}, …}. Here F₋∞ is trivial by Theorem 3.15, and for any k ≤ n we have E[ξ_k|F₋ₙ] = E[ξ₁|F₋ₙ] a.s., since (ξ_k, Sₙ, S_{n+1}, …) =ᵈ (ξ₁, Sₙ, S_{n+1}, …). Hence, by Theorem 7.23,

n⁻¹Sₙ = E[n⁻¹Sₙ|F₋ₙ] = n⁻¹ Σ_{k≤n} E[ξ_k|F₋ₙ] = E[ξ₁|F₋ₙ] → E[ξ₁|F₋∞] = Eξ₁.

As a further application of Theorem 7.23, we consider a kernel version of the regularization Theorem 6.3. The result is needed in Chapter 21.

Proposition 7.26 (regular densities) For any measurable space (S, S) and Borel spaces (T, T) and (U, U), let μ be a probability kernel from S to T × U. Then the densities

ν(s, t, B) = μ(s, dt × B)/μ(s, dt × U),   s ∈ S, t ∈ T, B ∈ U,   (14)

have versions that form a probability kernel from S × T to U.

Proof: We may assume T and U to be Borel subsets of ℝ, in which case μ can be regarded as a probability kernel from S to ℝ². Letting D_n denote the σ-field in ℝ generated by the intervals I_{nk} = [(k−1)2⁻ⁿ, k2⁻ⁿ), k ∈ ℤ, we define

Mₙ(s, t, B) = Σ_k (μ(s, I_{nk} × B)/μ(s, I_{nk} × U)) 1{t ∈ I_{nk}},   s ∈ S, t ∈ T, B ∈ U,

under the convention 0/0 = 0. Then Mₙ(s, ·, B) is a version of the density in (14) with respect to D_n, and for fixed s and B it is also a martingale with respect to μ(s, · × U). By Theorem 7.23 we get Mₙ(s, ·, B) → ν(s, ·, B) a.e. μ(s, · × U). Thus, a product-measurable version of ν is given by

ν(s, t, B) = lim supₙ Mₙ(s, t, B),   s ∈ S, t ∈ T, B ∈ U.

It remains to find a version of ν that is a probability measure on U for fixed s and t. We may then proceed as in the proof of Theorem 6.3, noting that

in each step the exceptional (s, t)-set A lies in S ⊗ T and is such that the sections A_s = {t ∈ T; (s, t) ∈ A} satisfy μ(s, A_s × U) = 0 for all s ∈ S. □

In order to extend the previous theory to martingales on ℝ₊, we need to choose suitably regular versions of the studied processes. The next result provides two closely related regularizations of a given submartingale. Say that a process X on ℝ₊ is right-continuous with left-hand limits (abbreviated as rcll) if X_t = X_{t+} for all t ≥ 0 and the left-hand limits X_{t−} exist and are finite for all t > 0. For any process Y on ℚ₊, we write Y⁺ for the process of right-hand limits Y_{t+}, t ≥ 0, provided that the latter exist.

Theorem 7.27 (regularization, Doob) For any F-submartingale X on ℝ₊ with restriction Y to ℚ₊, we have:

(i) Y⁺ exists and is rcll outside some fixed P-null set A, and Z = 1_{A^c} Y⁺ is a submartingale with respect to the augmented filtration F̄₊;

(ii) if F is right-continuous, then X has an rcll version iff EX is right-continuous; this holds in particular when X is a martingale.

The proof requires an extension of Theorem 7.21 to suitable submartingales.

Lemma 7.28 (uniform integrability) A submartingale X on ℤ₋ is uniformly integrable iff EX is bounded.

Proof: Let EX be bounded. Introduce the predictable sequence

αₙ = E[ΔXₙ|F_{n−1}] ≥ 0,   n ≤ 0,

and note that

E Σ_{n≤0} αₙ = EX₀ − infₙ EXₙ < ∞.

For any s ≤ t we then have X_s⁺ ≤ E[X_t⁺|F_s] a.s. by Lemma 7.11. Hence, X⁺ is uniformly integrable by Lemma 6.5. □


For a simple application, we consider the hitting probabilities of a continuous martingale. The result will be useful in Chapters 14, 17, and 23.

Corollary 7.30 (first hit) Let M be a continuous martingale with M₀ = 0 and P{M* > 0} > 0, where M* = sup_t |M_t|, and define τ_x = inf{t > 0; M_t = x}. Then

P[τ_a < τ_b | M* > 0] ≤ b/(b − a) ≤ P[τ_a ≤ τ_b | M* > 0],   a < 0 < b.

Proof: Since τ = τ_a ∧ τ_b is optional by Lemma 7.6, Theorem 7.29 yields EM_{τ∧t} = 0 for all t > 0, and so by dominated convergence EM_τ = 0. Hence,

0 = a P{τ_a < τ_b} + b P{τ_b < τ_a},

and the asserted bounds follow since {τ_a < τ_b} ∪ {τ_b < τ_a} ⊂ {M* > 0}. □

Exercises

1. Show for any optional times σ and τ that {σ = τ} ∈ F_σ ∩ F_τ and F_σ = F_τ on {σ = τ}. However, F_τ and F_∞ may differ on {τ = ∞}.

2. Show that if σ and τ are optional times on the time scale ℝ₊ or ℤ₊, then so is σ + τ.

3. Give an example of a random time that is weakly optional but not optional. (Hint: Let F be the filtration induced by the process X_t = ηt, where P{η = ±1} = ½, and take τ = inf{t; X_t > 0}.)

4. Fix a random time τ and a random variable ξ in ℝ \ {0}. Show that the process X_t = ξ1{τ ≤ t} is adapted to a given filtration F iff τ is F-optional and ξ is F_τ-measurable. Give corresponding conditions for the process Y_t = ξ1{τ < t}.

5. Let P denote the class of sets A ⊂ ℝ₊ × Ω such that the process 1_A is progressive. Show that P is a σ-field and that a process X is progressive iff it is P-measurable.

6. Let X be a progressive process with induced filtration F, and fix any optional time τ < ∞. Show that σ{τ, X^τ} ⊂ F_τ ⊂ F_τ⁺ ⊂ σ{τ, X^{τ+h}} for every h > 0. (Hint: The first relation becomes an equality when τ takes only countably many values.) Note that the result may fail when P{τ = ∞} > 0.

7. Let M be an F-martingale on some countable index set, and fix an optional time τ. Show that M − M^τ remains a martingale conditionally on F_τ. (Hint: Use Theorem 7.12 and Lemma 7.13.) Extend the result to continuous time.

8. Show that any submartingale remains a submartingale with respect to the induced filtration.

9. Let X¹, X², … be submartingales such that the process X = supₙ Xⁿ is integrable. Show that X is again a submartingale. Also show that lim supₙ Xⁿ is a submartingale when even supₙ |Xⁿ| is integrable.

10. Show that the Doob decomposition of an integrable random sequence X = (Xₙ) depends on the filtration unless X is a.s. X₀-measurable. (Hint: Compare the filtrations induced by X and by the sequence Yₙ = (X₀, X_{n+1}).)

11. Fix a random time τ and a random variable ξ ∈ L¹, and define M_t = ξ1{τ ≤ t}. Show that M is a martingale with respect to the induced filtration F iff E[ξ; τ ≤ t | τ > s] = 0 for any s < t. (Hint: The set {τ > s} is an atom of F_s.)

12. Let F and G be filtrations on a common probability space. Show that every F-martingale is a G-martingale iff F_t ⊂ G_t and G_t ⫫_{F_t} F_∞ for every t ≥ 0. (Hint: For the necessity, consider F-martingales of the form M_s = E[ξ|F_s] with ξ ∈ L¹(F_t).)

13. Show for any rcll supermartingale X ≥ 0 and constant r ≥ 0 that r P{sup_t X_t ≥ r} ≤ EX₀.

14. Let M be an L²-bounded martingale on ℤ₊. Imitate the proof of Lemma 4.16 to show that Mₙ converges a.s. and in L².

15. Give an example of a martingale that is L¹-bounded but not uniformly integrable. (Hint: Every positive martingale is L¹-bounded.)

16. Show that if G ⫫_{Fₙ} H for some increasing σ-fields Fₙ, then G ⫫_{F_∞} H.

17. Let ξₙ → ξ in L¹. Show for any increasing σ-fields Fₙ that E[ξₙ|Fₙ] → E[ξ|F_∞] in L¹.

18. Let ξ, ξ₁, ξ₂, … ∈ L¹ with ξₙ ↑ ξ a.s. Show for any increasing σ-fields Fₙ that E[ξₙ|Fₙ] → E[ξ|F_∞] a.s. (Hint: By Proposition 7.15 we have supₘ E[ξ − ξₙ|Fₘ] → 0 in probability. Now use the monotonicity.)

19. Show that any right-continuous submartingale is a.s. rcll.

20. Let σ and τ be optional times with respect to some right-continuous filtration F. Show that the operators E^{F_σ} and E^{F_τ} commute on L¹, with product E^{F_{σ∧τ}}. (Hint: For any ξ ∈ L¹, apply the optional sampling theorem to a right-continuous version of the martingale M_t = E[ξ|F_t].)

21. Let X ≥ 0 be a supermartingale on ℤ₊, and let τ₀ ≤ τ₁ ≤ ··· be optional times. Show that the sequence (X_{τₙ}) is again a supermartingale. (Hint: Truncate the times τₙ, and use the conditional Fatou lemma.) Show by an example that the result fails for submartingales.

22. For any random time τ ≥ 0 and right-continuous filtration F = (F_t), show that the process X_t = P[τ ≤ t|F_t] has a right-continuous version. (Hint: Use Theorem 7.27 (ii).)

Chapter 8

Markov Processes and Discrete-Time Chains

Markov property and transition kernels; finite-dimensional distributions and existence; space and time homogeneity; strong Markov property and excursions; invariant distributions and stationarity; recurrence and transience; ergodic behavior of irreducible chains; mean recurrence times

A Markov process may be described informally as a randomized dynamical system, a description that explains the fundamental role that Markov processes play both in theory and in a wide range of applications. Processes of this type appear more or less explicitly throughout the remainder of this book.

To make the above description precise, let us fix any Borel space S and filtration F. An adapted process X in S is said to be Markov if for any times s < t we have X_t = f_{s,t}(X_s, ϑ_{s,t}) a.s. for some measurable function f_{s,t} and some U(0,1) random variable ϑ_{s,t} ⫫ F_s. The stated condition is equivalent to the less transparent conditional independence X_t ⫫_{X_s} F_s. The process is said to be time-homogeneous if we can take f_{s,t} = f_{0,t−s} and space-homogeneous (when S = ℝ^d) if f_{s,t}(x, ·) = f_{s,t}(0, ·) + x. A more convenient description of the evolution is in terms of the transition kernels μ_{s,t}(x, ·) = P{f_{s,t}(x, ϑ) ∈ ·}, which are easily seen to satisfy an a.s. version of the Chapman-Kolmogorov relation μ_{s,t}μ_{t,u} = μ_{s,u}. In the usual axiomatic treatment, the latter equation is assumed to hold identically.

This chapter is devoted to some of the most basic and elementary portions of Markov process theory. Thus, the space homogeneity will be shown to be equivalent to the independence of the increments, which motivates our discussion of random walks and Lévy processes in Chapters 9 and 15. In the time-homogeneous case we shall establish a primitive form of the strong Markov property and see how the result simplifies when the process is also space-homogeneous.
Next we shall see how invariance of the initial distribution implies stationarity of the process, which motivates our treatment of stationary processes in Chapter 10. Finally, we shall discuss the classification of states and examine the ergodic behavior of discrete-time Markov chains on a countable state space. The analogous but less elementary theory for continuous-time chains is postponed until Chapter 12.

The general theory of Markov processes is more advanced and is not continued until Chapter 19, which develops the basic theory of Feller processes. In the meantime we shall consider several important subclasses, such as the pure jump-type processes in Chapter 12, Brownian motion and related processes in Chapters 13 and 18, and the above-mentioned random walks and Lévy processes in Chapters 9 and 15. A detailed discussion of diffusion processes appears in Chapters 21 and 23, and additional aspects of Brownian motion are considered in Chapters 22, 24, and 25.

To begin our systematic study of Markov processes, consider an arbitrary time scale T ⊂ ℝ, equipped with a filtration F = (F_t), and fix a measurable space (S, S). An S-valued process X on T is said to be a Markov process if it is adapted to F and such that

F_t ⫫_{X_t} X_u,   t ≤ u in T.   (1)

Just as for the martingale property, we note that even the Markov property depends on the choice of filtration, with the weakest version obtained for the filtration induced by X. The simple property in (1) may be strengthened as follows.

Lemma 8.1 (extended Markov property) If X satisfies (1), then

F_t ⫫_{X_t} {X_u; u ≥ t},   t ∈ T.   (2)

Proof: Fix any t = t₀ ≤ t₁ ≤ ··· in T. By (1) we have F_{tₙ} ⫫_{X_{tₙ}} X_{t_{n+1}} for every n ≥ 0, and so by Proposition 6.8

F_t ⫫_{(X_{t₀}, …, X_{tₙ})} X_{t_{n+1}},   n ≥ 0.

By the same proposition, this is equivalent to

F_t ⫫_{X_t} (X_{t₁}, X_{t₂}, …),

and (2) follows by a monotone class argument. □

For any times s ≤ t in T, we assume the existence of some regular conditional distributions

μ_{s,t}(X_s, B) = P[X_t ∈ B|X_s] = P[X_t ∈ B|F_s] a.s.,   B ∈ S.   (3)

In particular, we note that the transition kernels μ_{s,t} exist by Theorem 6.3 when S is Borel. We may further introduce the one-dimensional distributions ν_t = L(X_t), t ∈ T. When T begins at 0, we shall prove that the distribution of X is uniquely determined by the kernels μ_{s,t} together with the initial distribution ν₀.

For a precise statement, it is convenient to use the kernel operations introduced in Chapter 1. Note in particular that if μ and ν are kernels on S, then μ ⊗ ν and μν are kernels from S to S² and S, respectively, given for s ∈ S by

(μ ⊗ ν)(s, B) = ∫ μ(s, dt) ∫ ν(t, du) 1_B(t, u),
(μν)(s, B) = (μ ⊗ ν)(s, S × B) = ∫ μ(s, dt) ν(t, B).

Proposition 8.2 (finite-dimensional distributions) Let X be a Markov process on T with one-dimensional distributions ν_t and transition kernels μ_{s,t}. Then for any t₀ ≤ ··· ≤ tₙ in T,

L(X_{t₀}, …, X_{tₙ}) = ν_{t₀} ⊗ μ_{t₀,t₁} ⊗ ··· ⊗ μ_{t_{n−1},tₙ},   (4)

P[(X_{t₁}, …, X_{tₙ}) ∈ · |F_{t₀}] = (μ_{t₀,t₁} ⊗ ··· ⊗ μ_{t_{n−1},tₙ})(X_{t₀}, ·).   (5)

Proof: Formula (4) is clearly true for n = 0. Proceeding by induction, assume (4) to be true with n replaced by n − 1, and fix any bounded measurable function f on S^{n+1}. Noting that X_{t₀}, …, X_{t_{n−1}} are F_{t_{n−1}}-measurable, we get by Theorem 6.4 and the induction hypothesis

E f(X_{t₀}, …, X_{tₙ}) = E E[f(X_{t₀}, …, X_{tₙ})|F_{t_{n−1}}]
= E ∫ f(X_{t₀}, …, X_{t_{n−1}}, xₙ) μ_{t_{n−1},tₙ}(X_{t_{n−1}}, dxₙ)
= (ν_{t₀} ⊗ μ_{t₀,t₁} ⊗ ··· ⊗ μ_{t_{n−1},tₙ}) f,

as desired. This completes the proof of (4). In particular, for any B ∈ S and C ∈ Sⁿ we get

P{(X_{t₀}, …, X_{tₙ}) ∈ B × C} = ∫_B ν_{t₀}(dx)(μ_{t₀,t₁} ⊗ ··· ⊗ μ_{t_{n−1},tₙ})(x, C)
= E[(μ_{t₀,t₁} ⊗ ··· ⊗ μ_{t_{n−1},tₙ})(X_{t₀}, C); X_{t₀} ∈ B],

and (5) follows by Theorem 6.1 and Lemma 8.1. □

Chapman-Kolmogorov relation between the transition kernels. Herewe say that two kernels J.L and J.L 1 agree a.s. if J.L(x, ·) = J.L'(x, ·) for almost every x. Corollary 8.3 (Chapman, Smoluchovsky) For any Markov process in a

Borel space S, we have

/ls,u = Jls,tJ.Lt,u a.s. Vs,

S :::;

t :::; U.

Proof: By Proposition 8.2 we have a.s. for any BE S P[Xu E BIFs] = P[(Xt, Xu) E S X BIFs] (J.Ls,t ® J.Lt,u)(Xs, S X B) = (J.Ls,t/lt,u)(Xs, B). Since S is Borel, we may choose a common null set for all B.

0

8. Markov Processes and Discrete-Time Chains

143

We henceforth assume that the Chapman-Kolmogorov relation holds identically, so that

f..ts,u

= f..ts,tf..tt,u,

S

:=:; t :=:;

(6)

U.

Thus, we define a Markov process by condition (3), in terms of some transition kernels f..ts,t satisfying (6). In discrete time, when T = Z+, the latter relation is no restriction, since we may then start from any versions of the kernels f-ln = f-ln-1,n, and define f-lm,n = f-lm+l · · · f-ln for arbitrary m < n. Given such a family of transition kernels f-ls,t and an arbitrary initial distribution v, we need to show that an associated Markov process exists. This is ensured, under weak restrictions, by the following result. Theorem 8.4 (existence, Kolmogorov) Fixa time scale T starting at 0, a Borel space ( S, S), a probability measure v on S, and a family of probability kernels f..ts,t on S, s :=:; t in T, satisfying (6). Then there exists an S-valued M arkov process X on T with initial distribution v and transition kernels f..ts,t• Proof: Introduce the probability measures Vtl , ... ,tn

=

Vf..tto,tt

0 = to :=:; t1 :=:; ••. :=:; tn, n E N.

0 ... 0 f-ltn-l,tn'

To see that the family (vt 0 , ... ,tn) is projective, let B E sn- 1 be arbitrary, and define for any k E { 1, ... , n} the set Bk= {(xb ... 'Xn) E sn;

(x1, ... 'Xk-1, Xk+1> ... 'Xn)

E

B}.

Then by (6)

as desired. By Theorem 6.16 there exists an S-valued process X on T with

and, in particular, C(Xo) = vo = v. To see that X is Markov with transition kernels f-l 8 ,t, fix any times s1 :=:; · · · :=:;Sn= s :=:; t and sets BE sn and CES, and conclude from (7) that

Vst, ... ,sn,t(B X C) E[J..ts,t(Xs, C); (Xs 11 ••• , Xsn) E B]. Writing :F for the filtration induced by X, we get by a monotone dass argument

P[Xt E C; A] = E[J..ts,t(Xs, C); A], and so P[Xt

E

CI:Fs] = f..ts,t(X 8 , C) a.s.

A E Fs, 0

144

Foundations of Modern Probability

Now assume that S is a measurable Abelian group. A kernel J.L on S is then said to be homogeneaus if

J.L(X, B) = JL(O, B- x),

x ES, BE S.

An S-valued Markov process with homogeneous transition kernels J.Ls,t is said to be space-homogeneous. Furthermore, we say that a process X in S has independent increments if, for any times to ~ · · · ~ tn, the increments Xtk- Xtk_ 1 are mutually independent and independent of Xo. More generally, given any filtration :Fon T, we say that X has :F-independent increments if Xis adapted to :Fand suchthat Xt- X 8 ll:Fs for all s ~ t in T. Note that the elementary notion of independence corresponds to the case when :F is induced by X. Proposition 8.5 {independent increments and homogeneity) Consider a measurable Abelian group S, a filtration :F on some time scale T, and an S -valued and :F -adapted process X on T. Then X is space-homogeneous :FMarkov iff it has :F-independent increments, in which case the transition kernels are given by

JLs,t(x,B)=P{Xt-XsEB-x},

xES, BES,

s~tinT.

(8)

Proof: First assume that $X$ is Markov with transition kernels satisfying
$$\mu_{s,t}(x, B) = \mu_{s,t}(B - x), \qquad x \in S, \; B \in \mathcal{S}, \; s \le t \text{ in } T, \tag{9}$$
where the right-hand side is short for $\mu_{s,t}(0, B - x)$. By Theorem 6.4, for any $s \le t$ in $T$ and $B \in \mathcal{S}$ we get
$$P[X_t - X_s \in B \mid \mathcal{F}_s] = P[X_t \in B + X_s \mid \mathcal{F}_s] = \mu_{s,t}(X_s, B + X_s) = \mu_{s,t} B.$$
Thus, $X_t - X_s$ is independent of $\mathcal{F}_s$ with distribution $\mu_{s,t}$, and (8) follows by means of (9). Conversely, assume that $X_t - X_s$ is independent of $\mathcal{F}_s$ with distribution $\mu_{s,t}$. Defining the associated kernel $\mu_{s,t}$ by (9), we get by Theorem 6.4, for any $s$, $t$, and $B$ as before,
$$P[X_t \in B \mid \mathcal{F}_s] = P[X_t - X_s \in B - X_s \mid \mathcal{F}_s] = \mu_{s,t}(B - X_s) = \mu_{s,t}(X_s, B).$$
Thus, $X$ is Markov with the homogeneous transition kernels in (9). $\Box$

We may now specialize to the time-homogeneous case, when $T = \mathbb{R}_+$ or $\mathbb{Z}_+$ and the transition kernels are of the form $\mu_{s,t} = \mu_{t-s}$, so that
$$P[X_t \in B \mid \mathcal{F}_s] = \mu_{t-s}(X_s, B) \quad \text{a.s.}, \qquad B \in \mathcal{S}, \; s \le t \text{ in } T.$$
Introducing the initial distribution $\nu = \mathcal{L}(X_0)$, we may write the formulas of Proposition 8.2 as
$$\mathcal{L}(X_{t_0}, \ldots, X_{t_n}) = \nu \mu_{t_1 - t_0} \otimes \cdots \otimes \mu_{t_n - t_{n-1}},$$
$$P[(X_{t_1}, \ldots, X_{t_n}) \in \cdot \mid \mathcal{F}_{t_0}] = (\mu_{t_1 - t_0} \otimes \cdots \otimes \mu_{t_n - t_{n-1}})(X_{t_0}, \cdot).$$

8. Markov Processes and Discrete-Time Chains

145

The Chapman–Kolmogorov relation now becomes
$$\mu_{s+t} = \mu_s \mu_t, \qquad s, t \in T,$$
which is again assumed to hold identically. We often refer to the family $(\mu_t)$ as a semigroup of transition kernels. The following result justifies the interpretation of a discrete-time Markov process as a randomized dynamical system.

Proposition 8.6 (recursion) Let $X$ be a process on $\mathbb{Z}_+$ with values in a Borel space $S$. Then $X$ is Markov iff there exist some measurable functions $f_1, f_2, \ldots : S \times [0,1] \to S$ and i.i.d. $U(0,1)$ random variables $\vartheta_1, \vartheta_2, \ldots \perp\!\!\!\perp X_0$ such that $X_n = f_n(X_{n-1}, \vartheta_n)$ a.s. for all $n \in \mathbb{N}$. Here we may choose $f_1 = f_2 = \cdots = f$ iff $X$ is time-homogeneous.

Proof: Let $X$ have the stated representation and introduce the kernels $\mu_n(x, \cdot) = P\{f_n(x, \vartheta) \in \cdot\}$, where $\vartheta$ is $U(0,1)$. Writing $\mathcal{F}$ for the filtration induced by $X$, we get by Theorem 6.4, for any $B \in \mathcal{S}$,
$$P[f_n(X_{n-1}, \vartheta_n) \in B \mid \mathcal{F}_{n-1}] = \lambda\{t;\ f_n(X_{n-1}, t) \in B\} = \mu_n(X_{n-1}, B),$$
which shows that $X$ is Markov with transition kernels $\mu_n$.

Now assume instead the latter condition. By Lemma 3.22 we may choose some associated functions $f_n$ as above. Let $\tilde\vartheta_1, \tilde\vartheta_2, \ldots$ be i.i.d. $U(0,1)$ and independent of $\tilde{X}_0 \stackrel{d}{=} X_0$, and define recursively $\tilde{X}_n = f_n(\tilde{X}_{n-1}, \tilde\vartheta_n)$ for $n \in \mathbb{N}$. As before, $\tilde{X}$ is Markov with transition kernels $\mu_n$. Hence, $\tilde{X} \stackrel{d}{=} X$ by Proposition 8.2, and so by Theorem 6.10 there exist some random variables $\vartheta_n$ with $(X, (\vartheta_n)) \stackrel{d}{=} (\tilde{X}, (\tilde\vartheta_n))$. Since the diagonal in $S^2$ is measurable, the desired representation follows. The last assertion is obvious from the construction. $\Box$

Now fix a transition semigroup $(\mu_t)$ on some Borel space $S$. For any probability measure $\nu$ on $S$, there exists by Theorem 8.4 an associated Markov process $X^\nu$, and by Proposition 8.2 the corresponding distribution $P_\nu$ is uniquely determined by $\nu$. Note that $P_\nu$ is a probability measure on the path space $(S^T, \mathcal{S}^T)$. For degenerate initial distributions $\delta_x$, we may write $P_x$ instead of $P_{\delta_x}$. Integration with respect to $P_\nu$ or $P_x$ is denoted by $E_\nu$ or $E_x$, respectively.
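Proposition 8.6 is easy to put to work numerically. The sketch below (not from the book; the three-state kernel is a made-up example) realizes a time-homogeneous chain through a single map $f(x, \vartheta)$ driven by i.i.d. uniforms, and compares the empirical one-step frequencies out of a state with the kernel.

```python
import random

# A minimal sketch of Proposition 8.6: a time-homogeneous Markov chain on
# S = {0, 1, 2} realized as a randomized dynamical system
# X_n = f(X_{n-1}, theta_n), with i.i.d. U(0,1) innovations theta_n.
# The kernel MU below is a hypothetical example, not from the text.

MU = {  # mu(x, .) as a row of transition probabilities
    0: [0.5, 0.5, 0.0],
    1: [0.2, 0.3, 0.5],
    2: [0.0, 0.4, 0.6],
}

def f(x, u):
    """Inverse-transform map: feed a U(0,1) variable through the CDF of mu(x, .)."""
    acc = 0.0
    for y, p in enumerate(MU[x]):
        acc += p
        if u < acc:
            return y
    return len(MU[x]) - 1  # guard against rounding

def simulate(x0, n, rng):
    xs = [x0]
    for _ in range(n):
        xs.append(f(xs[-1], rng.random()))
    return xs

rng = random.Random(0)
path = simulate(0, 100_000, rng)

# Empirical one-step frequencies out of state 1 should approximate mu(1, .).
visits = [t for t in range(len(path) - 1) if path[t] == 1]
freq = [sum(path[t + 1] == y for t in visits) / len(visits) for y in range(3)]
print([round(q, 2) for q in freq])  # close to [0.2, 0.3, 0.5]
```

With another kernel, only the row CDFs inside `f` change; the driving noise stays i.i.d. uniform, which is the content of the proposition.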

Lemma 8.7 (mixtures) The measures $P_x$ form a probability kernel from $S$ to $S^T$, and for any initial distribution $\nu$ we have
$$P_\nu A = \int_S P_x(A)\, \nu(dx), \qquad A \in \mathcal{S}^T. \tag{10}$$

Proof: Both the measurability of $P_x A$ and formula (10) are obvious for cylinder sets of the form $A = (\pi_{t_1}, \ldots, \pi_{t_n})^{-1} B$. The general case follows easily by a monotone class argument. $\Box$


Rather than considering one Markov process $X^\nu$ for each initial distribution $\nu$, it is more convenient to introduce the canonical process $X$, defined as the identity mapping on the path space $(S^T, \mathcal{S}^T)$, and to equip the latter space with the different probability measures $P_\nu$. Then $X_t$ agrees with the evaluation map $\pi_t : w \mapsto w_t$ on $S^T$, which is measurable by the definition of $\mathcal{S}^T$. For our present purposes, it is sufficient to endow the path space $S^T$ with the canonical filtration $\mathcal{F}$ induced by $X$. On $S^T$ we may also introduce the shift operators $\theta_t : S^T \to S^T$, $t \in T$, given by
$$(\theta_t w)_s = w_{s+t}, \qquad s, t \in T, \; w \in S^T,$$
and we note that the $\theta_t$ are measurable with respect to $\mathcal{S}^T$. In the canonical case it is further clear that $\theta_t X = X \circ \theta_t$.

Optional times with respect to a Markov process are often constructed recursively in terms of shifts on the underlying path space. Thus, for any pair of optional times $\sigma$ and $\tau$ on the canonical space, we may consider the random time $\gamma = \sigma + \tau \circ \theta_\sigma$, with the understanding that $\gamma = \infty$ when $\sigma = \infty$. Under weak restrictions on space and filtration, we show that $\gamma$ is again optional. Let $C(S)$ and $D(S)$ denote the spaces of continuous or rcll functions, respectively, from $\mathbb{R}_+$ to $S$.

Proposition 8.8 (compound optional times) For any metric space $S$, let $\sigma$ and $\tau$ be optional times on the canonical space $S^\infty$, $C(S)$, or $D(S)$, endowed with the right-continuous, induced filtration. Then even $\gamma = \sigma + \tau \circ \theta_\sigma$ is optional.

Proof: Since $\sigma \wedge n + \tau \circ \theta_{\sigma \wedge n} \uparrow \gamma$, we may assume by Lemma 7.3 that $\sigma$ is bounded. Let $X$ denote the canonical process with induced filtration $\mathcal{F}$. Since $X$ is $\mathcal{F}^+$-progressive, $X_{\sigma+s} = X_s \circ \theta_\sigma$ is $\mathcal{F}^+_{\sigma+s}$-measurable for every $s \ge 0$ by Lemma 7.5. Fixing any $t \ge 0$, it follows that all sets $A = \{X_s \in B\}$ with $s \le t$ and $B \in \mathcal{S}$ satisfy $\theta_\sigma^{-1} A \in \mathcal{F}^+_{\sigma+t}$. The sets $A$ with the latter property form a $\sigma$-field, and therefore
$$\theta_\sigma^{-1} \mathcal{F}_t \subset \mathcal{F}^+_{\sigma+t}, \qquad t \ge 0. \tag{11}$$
Now fix any $t \ge 0$, and note that

every state $x \in S$ with $\nu\{x\} > 0$ is recurrent.

Proof: By the invariance of $\nu$,
$$0 < \nu\{x\} = \int \nu(dy)\, p^n_{yx}, \qquad n \in \mathbb{N}. \tag{21}$$
Thus, by Proposition 8.12 and Fubini's theorem,
$$\infty = \sum_{n \ge 1} \int \nu(dy)\, p^n_{yx} = \int \nu(dy) \sum_{n \ge 1} p^n_{yx} = \int \nu(dy)\, \frac{r_{yx}}{1 - r_{xx}} \le \frac{1}{1 - r_{xx}}.$$
Hence, $r_{xx} = 1$, and so $x$ is recurrent. $\Box$


The period $d_x$ of a state $x$ is defined as the greatest common divisor of the set $\{n \in \mathbb{N};\ p^n_{xx} > 0\}$, and we say that $x$ is aperiodic if $d_x = 1$.

Proposition 8.14 (positivity) If $x \in S$ has period $d < \infty$, then $p^{nd}_{xx} > 0$ for all but finitely many $n$.

Proof: Define $\mathcal{S} = \{n \in \mathbb{N};\ p^{nd}_{xx} > 0\}$, and conclude from the Chapman–Kolmogorov relation that $\mathcal{S}$ is closed under addition. Since $\mathcal{S}$ has greatest common divisor 1, the generated additive group equals $\mathbb{Z}$. In particular, there exist some $n_1, \ldots, n_k \in \mathcal{S}$ and $z_1, \ldots, z_k \in \mathbb{Z}$ with $\sum_j z_j n_j = 1$. Writing $m = n_1 \sum_j |z_j| n_j$, we note that any number $n \ge m$ can be represented, for suitable $h \in \mathbb{Z}_+$ and $r \in \{0, \ldots, n_1 - 1\}$, as
$$n = m + h n_1 + r = h n_1 + \sum\nolimits_j (n_1 |z_j| + r z_j)\, n_j,$$
and so even $n \in \mathcal{S}$. By (25),
$$p^{m+h+n}_{jj} \ge p^n_{ji}\, p^h_{ii}\, p^m_{ij}, \qquad h \ge 0.$$
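The gcd characterization of the period, and the "all but finitely many" conclusion of Proposition 8.14, can be seen concretely. In the hypothetical three-state kernel below (not from the text), returns to state 0 have lengths 2 and 3, so $d_0 = 1$, yet $p^1_{00} = 0$: positivity of $p^n_{00}$ holds only for all sufficiently large $n$.

```python
from math import gcd
from functools import reduce

# A small illustration of periods: for x = 0 below, returns to 0 have
# lengths 2 and 3, so d = gcd{n : p^n_00 > 0} = 1, while p^1_00 = 0.
# The kernel P is a hypothetical example.

P = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [1.0, 0.0, 0.0],
]

def positive_support(P, nmax):
    """The set {n <= nmax : p^n_00 > 0}, via 0/1 reachability matrices."""
    m = len(P)
    A = [[P[i][j] > 0 for j in range(m)] for i in range(m)]
    reach = [row[:] for row in A]  # reachability in exactly n steps
    out = set()
    for n in range(1, nmax + 1):
        if reach[0][0]:
            out.add(n)
        reach = [[any(reach[i][k] and A[k][j] for k in range(m))
                  for j in range(m)] for i in range(m)]
    return out

S = positive_support(P, 30)
d = reduce(gcd, S)
print(d, 1 in S, all(n in S for n in range(2, 31)))  # 1 False True
```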


For $h = 0$ we get $p^{m+n}_{jj} > 0$, and so $d_j \mid (m + n)$. Hence, in general, $p^h_{ii} > 0$ implies $d_j \mid h$, and we get $d_j \le d_i$. Reversing the roles of $i$ and $j$ yields the opposite inequality. (iii) Fix any $i \in S$. Choosing $j \in S$ with $\nu_j > 0$ and then $n \in \mathbb{N}$ with $p^n_{ji} > 0$, we see from (21) that even $\nu_i > 0$. $\Box$

We may now state the basic ergodic theorem for irreducible Markov chains. Related results will appear in Chapters 12, 19, and 23. For any signed measure $\mu$ we define $\|\mu\| = \sup_A |\mu A|$.

Theorem 8.18 (ergodic behavior, Markov, Kolmogorov, Orey) For any irreducible, aperiodic Markov chain in $S$, exactly one of these cases occurs:

(i) There exists a unique invariant distribution $\nu$; it satisfies $\nu_i > 0$ for all $i \in S$, and for any distribution $\mu$ on $S$ we have
$$\|P_\mu \circ \theta_n^{-1} - P_\nu\| \to 0. \tag{26}$$
(ii) No invariant distribution exists, and we have
$$\lim_{n \to \infty} p^n_{ij} = 0, \qquad i, j \in S. \tag{27}$$

A Markov chain satisfying (i) is clearly recurrent, whereas one that satisfies (ii) may be either recurrent or transient. This leads to the further classification of the irreducible, aperiodic, and recurrent Markov chains into positive recurrent and null-recurrent ones, depending on whether (i) or (ii) applies.

We shall prove Theorem 8.18 by the powerful method of coupling. Here the general idea is to compare the distributions of two processes $X$ and $Y$ by constructing copies $\tilde{X} \stackrel{d}{=} X$ and $\tilde{Y} \stackrel{d}{=} Y$ on a common probability space. By a suitable choice of joint distribution, one may sometimes reduce the original problem to a pathwise comparison. The coupling approach often leads to simple and transparent proofs; we shall see further applications of the method in Chapters 9, 14, 15, 16, 20, and 23. For our present needs, an elementary coupling by independence is sufficient.

Lemma 8.19 (coupling) Let $X$ and $Y$ be independent Markov chains in $S$ and $T$ with transition matrices $(p_{ii'})$ and $(q_{jj'})$, respectively. Then $(X, Y)$ is a Markov chain in $S \times T$ with transition matrix $r_{ij,i'j'} = p_{ii'} q_{jj'}$. If $X$ and $Y$ are irreducible and aperiodic, then so is $(X, Y)$; in that case $(X, Y)$ is recurrent whenever invariant distributions exist for both $X$ and $Y$.

Proof: The first assertion is easily proved by computation of the finite-dimensional distributions of $(X, Y)$ for an arbitrary initial distribution $\mu \otimes \nu$ on $S \times T$, using Proposition 8.2. Now assume that $X$ and $Y$ are irreducible and aperiodic. Fixing any $i, i' \in S$ and $j, j' \in T$, we see from Proposition 8.14 that $r^n_{ij,i'j'} = p^n_{ii'} q^n_{jj'} > 0$ for all but finitely many $n \in \mathbb{N}$, and so even $(X, Y)$ has the stated properties. Finally, if $\mu$ and $\nu$ are invariant


distributions for $X$ and $Y$, respectively, then $\mu \otimes \nu$ is invariant for $(X, Y)$, and the last assertion follows by Proposition 8.13. $\Box$

The point of the construction is that, if the coupled processes eventually meet, their distributions will agree asymptotically.

Lemma 8.20 (strong ergodicity) If the Markov chain in $S^2$ with transition matrix $p_{ii'} p_{jj'}$ is irreducible and recurrent, then for any distributions $\mu$ and $\nu$ on $S$,
$$\|P_\mu \circ \theta_n^{-1} - P_\nu \circ \theta_n^{-1}\| \to 0. \tag{28}$$

Proof (Doeblin): Let $X$ and $Y$ be independent with distributions $P_\mu$ and $P_\nu$. By Lemma 8.19 the pair $(X, Y)$ is again Markov with respect to the induced filtration $\mathcal{F}$, and by Proposition 8.9 it satisfies the strong Markov property at every finite optional time $\tau$. Taking $\tau = \inf\{n \ge 0;\ X_n = Y_n\}$, we get for any measurable set $A \subset S^\infty$
$$P[\theta_\tau X \in A \mid \mathcal{F}_\tau] = P_{X_\tau} A = P[\theta_\tau Y \in A \mid \mathcal{F}_\tau] \quad \text{a.s. on } \{\tau < \infty\}.$$
In particular, $(\tau, X_\tau, \theta_\tau X) \stackrel{d}{=} (\tau, X_\tau, \theta_\tau Y)$. Defining $\tilde{X}_n = X_n$ for $n \le \tau$ and $\tilde{X}_n = Y_n$ otherwise, we obtain $\tilde{X} \stackrel{d}{=} X$, and so for any $A$ as above
$$|P\{\theta_n X \in A\} - P\{\theta_n Y \in A\}| = |P\{\theta_n \tilde{X} \in A\} - P\{\theta_n Y \in A\}| = |P\{\theta_n \tilde{X} \in A,\ \tau > n\} - P\{\theta_n Y \in A,\ \tau > n\}| \le P\{\tau > n\} \to 0. \qquad \Box$$
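Doeblin's argument is easy to test numerically: for two independent copies started in different states, the total variation distance of the time-$n$ distributions is bounded by $P\{\tau > n\}$, where $\tau$ is the first meeting time. A Monte Carlo sketch (the two-state kernel is a made-up example, not from the book):

```python
import random

# A toy illustration of Doeblin's coupling bound:
# ||mu P^n - nu P^n||_TV <= P{tau > n}, where tau is the first meeting time
# of two independent copies of the chain. The kernel P is hypothetical.

P = [[0.7, 0.3], [0.4, 0.6]]

def step(x, u):
    return 0 if u < P[x][0] else 1

def dist_after(x0, n):
    """Exact distribution of X_n started at x0, by iterating the kernel."""
    d = [0.0, 0.0]; d[x0] = 1.0
    for _ in range(n):
        d = [d[0] * P[0][j] + d[1] * P[1][j] for j in range(2)]
    return d

n = 10
tv = 0.5 * sum(abs(a - b) for a, b in zip(dist_after(0, n), dist_after(1, n)))

rng = random.Random(1)
tail, trials = 0, 50_000
for _ in range(trials):
    x, y, met = 0, 1, False
    for _ in range(n):
        x, y = step(x, rng.random()), step(y, rng.random())
        if x == y:
            met = True
            break
    tail += not met
p_tail = tail / trials

print(tv <= p_tail)  # the coupling inequality holds
```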

The next result ensures the existence of an invariant distribution. Here a coupling argument is again useful.

Lemma 8.21 (existence) If (27) fails, there exists an invariant distribution.

Proof: Assume that (27) fails, so that $\limsup_n p^n_{i_0 j_0} > 0$ for some $i_0, j_0 \in S$. By a diagonal argument we may choose a subsequence $N' \subset \mathbb{N}$ and some constants $c_j$ with $c_{j_0} > 0$ such that $p^n_{i_0 j} \to c_j$ along $N'$ for every $j \in S$. Note that $0 < \sum_j c_j \le 1$ by Fatou's lemma. To extend the convergence to arbitrary $i$, let $X$ and $Y$ be independent processes with the given transition matrix $(p_{ij})$, and conclude from Lemma 8.19 that $(X, Y)$ is an irreducible Markov chain on $S^2$ with transition probabilities $q_{ij,i'j'} = p_{ii'} p_{jj'}$. If $(X, Y)$ is transient, then by Proposition 8.12
$$\sum\nolimits_n (p^n_{ij})^2 = \sum\nolimits_n q^n_{ij,ij} < \infty, \qquad i, j \in S,$$
and (27) follows. The pair $(X, Y)$ is then recurrent, and Lemma 8.20 yields $p^n_{ij} - p^n_{i_0 j} \to 0$ for all $i, j \in S$. Hence, $p^n_{ij} \to c_j$ along $N'$ for all $i$ and $j$.


Next conclude from the Chapman–Kolmogorov relation that
$$p^{n+1}_{ik} = \sum\nolimits_j p^n_{ij}\, p_{jk} = \sum\nolimits_j p_{ij}\, p^n_{jk}, \qquad i, k \in S.$$
Using Fatou's lemma on the left and dominated convergence on the right, we get as $n \to \infty$ along $N'$
$$c_k \ge \sum\nolimits_j c_j\, p_{jk}, \qquad k \in S. \tag{29}$$
Summing over $k$ gives $\sum_j c_j \le 1$ on both sides, and so (29) holds with equality. Thus, $(c_j)$ is invariant, and we get an invariant distribution $\nu$ by taking $\nu_i = c_i / \sum_j c_j$. $\Box$

Proof of Theorem 8.18: If no invariant distribution exists, then (27) holds by Lemma 8.21. Now let $\nu$ be an invariant distribution, and note that $\nu_i > 0$ for all $i$ by Proposition 8.16. By Lemma 8.19 the coupled chain in Lemma 8.20 is irreducible and recurrent, so (28) holds for any initial distribution $\mu$, and (26) follows since $P_\nu \circ \theta_n^{-1} = P_\nu$ by Lemma 8.11. If even $\nu'$ is invariant, then (26) yields $P_{\nu'} = P_\nu$, and so $\nu' = \nu$. $\Box$

The limits in Theorem 8.18 may be expressed in terms of the mean recurrence times $E_j \tau_j$, as follows.

Theorem 8.22 (mean recurrence times, Kolmogorov) For any Markov chain in $S$ and states $i, j \in S$ with $j$ aperiodic, we have

$$p^n_{ij} \to P_i\{\tau_j < \infty\} / E_j \tau_j. \tag{30}$$

Proof: First take $i = j$. If $j$ is transient, then $p^n_{jj} \to 0$ and $E_j \tau_j = \infty$, and so (30) is trivially true. If instead $j$ is recurrent, then the restriction of $X$ to the set $S_j = \{i;\ r_{ji} > 0\}$ is irreducible recurrent by Lemma 8.17 and aperiodic by Proposition 8.16. Hence, $p^n_{jj}$ converges by Theorem 8.18. To identify the limit, define
$$L_n = \sup\{k \in \mathbb{Z}_+;\ \tau^k_j \le n\} = \sum_{k=1}^n 1\{X_k = j\}, \qquad n \in \mathbb{N}.$$
The $\tau^k_j$ form a random walk under $P_j$, and so, by the law of large numbers,
$$\frac{\tau^k_j}{k} \to E_j \tau_j \quad \text{a.s. } P_j.$$
By the monotonicity of $L_n$ and $\tau^k_j$ it follows that $L_n / n \to (E_j \tau_j)^{-1}$ a.s. $P_j$. Noting that $L_n \le n$, we get by dominated convergence
$$\frac{1}{n} \sum_{k=1}^n p^k_{jj} = \frac{E_j L_n}{n} \to \frac{1}{E_j \tau_j},$$
and (30) follows.


Now let $i \ne j$. Using the strong Markov property, the disintegration theorem, and dominated convergence, we get
$$p^n_{ij} = P_i\{X_n = j\} = P_i\{\tau_j \le n,\ (\theta_{\tau_j} X)_{n - \tau_j} = j\} = E_i[p^{n - \tau_j}_{jj};\ \tau_j \le n] \to P_i\{\tau_j < \infty\} / E_j \tau_j. \qquad \Box$$
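In the positive recurrent case, (30) says $p^n_{jj} \to \nu_j = 1 / E_j \tau_j$. A quick numerical check (the three-state kernel below is a hypothetical example, not from the book):

```python
# A numerical sanity check of Theorem 8.22 in the positive recurrent case:
# for an irreducible aperiodic chain, p^n_jj -> nu_j = 1/E_j tau_j.
# The kernel P is a made-up example.

P = [
    [0.5, 0.25, 0.25],
    [0.2, 0.6,  0.2 ],
    [0.3, 0.3,  0.4 ],
]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# p^n_jj for large n, by repeated squaring of the kernel: P^(2^7) = P^128.
Pn = P
for _ in range(7):
    Pn = matmul(Pn, Pn)

# Invariant distribution, by iterating a row distribution.
nu = [1 / 3] * 3
for _ in range(200):
    nu = [sum(nu[i] * P[i][j] for i in range(3)) for j in range(3)]

# nu_j is the limit of p^n_jj, hence also 1/E_j tau_j.
print([round(x, 4) for x in nu])  # ≈ [0.3243, 0.4054, 0.2703] = [12/37, 15/37, 10/37]
```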

We return to continuous time and a general state space, to clarify the nature of the strong Markov property of a process $X$ at finite optional times $\tau$. The condition is clearly a combination of the conditional independence $\theta_\tau X \perp\!\!\!\perp_{X_\tau} \mathcal{F}_\tau$ and the strong homogeneity
$$P[\theta_\tau X \in \cdot \mid X_\tau] = P_{X_\tau} \quad \text{a.s.} \tag{31}$$
Though (31) appears to be weaker than (13), the two properties are in fact equivalent, under suitable regularity conditions on $X$ and $\mathcal{F}$.

Theorem 8.23 (strong homogeneity) Fix a separable metric space $(S, \rho)$, a probability kernel $(P_x)$ from $S$ to $D(S)$, and a right-continuous filtration $\mathcal{F}$ on $\mathbb{R}_+$. Let $X$ be an $\mathcal{F}$-adapted rcll process in $S$ such that (31) holds for all bounded optional times $\tau$. Then $X$ satisfies the strong Markov property.

Our proof is based on a 0–1 law for absorption probabilities, involving the sets
$$I = \{w \in D;\ w_t \equiv w_0\}, \qquad A = \{x \in S;\ P_x I = 1\}. \tag{32}$$

Lemma 8.24 (absorption) For $X$ as in Theorem 8.23 and for any optional time $\tau < \infty$, we have
$$\{\theta_\tau X \in I\} = \{X_\tau \in A\} \quad \text{a.s.} \tag{33}$$

Proof: We may clearly assume that $\tau$ is bounded, say by $n \in \mathbb{N}$. Fix any $h > 0$, and divide $S$ into disjoint Borel sets $B_1, B_2, \ldots$ of diameter $< h$. For each $k \in \mathbb{N}$, define $\tau_k$ on the set $\{X_\tau \in B_k,\ \theta_\tau X \notin I\}$, and put $\tau_k = \tau$ otherwise. The times $\tau_k$ are again bounded and optional, and we note that


Using (31) and (35), we get as $n \to \infty$ and $h \to 0$
$$E[P_{X_\tau} I^c;\ \theta_\tau X \in I] = \sum\nolimits_k E[P_{X_\tau} I^c;\ \theta_\tau X \in I,\ X_\tau \in B_k] \le \sum\nolimits_k E[P_{X_{\tau_k}} I^c;\ X_{\tau_k} \in B_k] = \sum\nolimits_k P\{\theta_{\tau_k} X \notin I,\ X_{\tau_k} \in B_k\}


0. More sophisticated limit theorems of this type will be derived in Chapters 14-16 and 27, often through approximation by a Brownian motion or some other Levy process. Random walks in JRd are either recurrent or transient, and our first major task is to derive a recurrence criterion in terms of the transition distribution J..l· We proceed with some striking connections between maximum andreturn times, anticipating the arcsine laws of Chapters 13, 14, and 15. This is followed by a detailed study of ladder times and heights for one-dimensional random walks, culminating with the Wiener-Hopf factorization and Baxter's formula. Finally, we prove a two-sided version of the renewal theorem, which describes the asymptotic behavior of the occupation measure and associated intensity for a transient random walk. In addition to the already mentioned connections to other chapters, we note the relevance of renewal theory for the study of continuous-time Markov chains, as considered in Chapter 12. Renewal processes may further be regarded as constituting an elementary subdass of the regenerative

160

Foundations of Modern Pobability

sets, to be studied in full generality in Chapter 22 in connection with local time and excursion theory. To begin our systematic discussion of random walks, assume as before that Sn = 6 + · · · + ~n for all n E Z+, where the ~n are i.i.d. random vectors in JRd. The distribution of (Sn) is then determined by the common distribution f..L = C(~n) of the increments. By the effective dimension of (Sn) we mean the dimension ofthe linear subspace spanned by the support of f..L· For most purposes, we may assume that the effective dimension agrees with the dimension of the underlying space, since we may otherwise restriet our attention to the generated subspace. The occupation measure of (Sn) is defined as the random measure

We also need to consider the corresponding intensity measure

(Ery)B

=E(ryB) ="' L...,.n?:O P{Sn E B},

BE

ßd.

Writing B;, = {y; Jx- yJ < c}, we may introduce the accessible set A, the mean recurrence set M, and the recurrence set R, given by

A M

R

n{

EnBe: > 0} ' 'I X ' e:>O x E IRd·, EnBe: ., X = oo} , e:>O

x E

n {

n

e:>O

{x

]Rd·

E JR.d; 'fJB;,

= oo a.s.}.

The following result gives the basic dichotomy for random walks in JRd. Theorem 9.1 {recurrence dichotomy) Let (Sn) be a random walk in JRd, and define A, M, and R as above. Then exactly one of these conditions holds:

(i) R (ii) R

= M = A, which is then a closed additive subgroup = M = 0, and JSnl -+ oo a.s.

of JRd;

A random walk is said to be recurrent if (i) holds and to be transient otherwise. Proof: Since trivially R C M C A, the relations in (i) and (ii) are equivalent to A C R and M = 0, respectively. Further note that A is a closed additive semigroup. First assume P{JSnl -+ oo} < 1, so that P{JSnl < r i.o.} > 0 for some r > 0. Fix any c > 0, cover the r-ball araund 0 by finitely many open balls BI, ... ,Bn ofradius c/2, and note that P{Sn E Bk i.o.} > 0 for at least one k. By the Hewitt-Savage ü-1law, the latter probability equals 1. Thus, the optional time T = inf{n 2: 0; Sn E Bk} is a.s. finite, and the strong Markov property at T yields 1 = P{Sn E Bk i.o.} :S P{JSr+n- SrJ < c i.o.} = P{JSnJ < c i.o.}.

9. Random Walks and Renewal Theory

161

Hence, 0 E R in this case. To extend the latter relation to A C R, fix any x E A and e > 0. By the strong Markov property at u = inf{n 2': 0; IBn- xi < e/2},

P{ISn- xj < e i.o.}

> P{u < oo, ISCY+n- SCYI < e/2 i.o.} P{u < oo}P{ISnl < e/2 i.o.} > 0,

and by the Hewitt-Savage 0-1 law the probability on the left equals 1. Thus, x E R. The asserted group property will follow if we can prove that even -x E A. This is clear if we write

P{ISn

+ xl < e

i.o.}

=

P{ISC7+n- 8(7

>

P{ISnl

+ xi < e

< e/2 i.o.}

i.o.}

= 1.

Next assume that IBn I --+ oo a.s. Fix any m, k E N, and conclude from the Markov property at m that

< r, infn2':kiSm+nl2': r} > P{ISml < r, infn2':kiSm+n- Sml2': 2r} = P{ISml < r} P{infn2':kiSnl 2': 2r }.

P{ISml

Here the event on the left can occur for at most k different values of m, and therefore P{infn2':kiSnl2': 2r} LmP{ISml

< r} < oo,

k E N.

As k --+ oo, the probability on the left tends to 1. Hence, the sum converges, and we get E'T]B < oo for any bounded set B. This shows that M = 0. D The next result gives some easily verified recurrence criteria.

Theorem 9.2 (recurrence for d = 1, 2) A random walk (Sn) in JRd is recurrent under each of these conditions:

(i) d = 1 and n- 1 Sn .!+ 0; (ii) d = 2, E6 = 0, and El61 2 < oo. In (i) we recognize the weak law of large numbers, which is characterized in Theorem 5.16. In particular, the condition is fulfilled when E6 = 0. By contrast, E6 E (0, oo] implies Sn --+ oo a.s. by the strong law of large numbers, so in that case (Sn) is transient. Our proof of Theorem 9.2 is based on the following scaling relation. As before, a ;:S b means that a ~ cb for some constant c > 0.

Lemma 9.3 (scaling) For any random walk (Sn) in JRd, Ln2': 0P{ISnl ~ re} $ rdLn2':0P{ISnl ~ e},

r 2': 1, e

> 0.

Proof: Cover the ball {x; lxl ~ re} by balls B1. ... , Bm of radius e /2, and note that we can make m ;:S rd. Introduce the optional times Tk =

Foundations of Modern Pobability

162

inf{n; Sn E Bk}, k property that

=

~n P{JSnl :S rc}

1, ... , m, and conclude from the strong Markov


0. Now fix any c > 0 and r ~ 1, and conclude from Lemma 9.3 that

LP{JSnl :Sc} n

2: r- 2 LP{ISnl :S rc} = n

{

Jo

00

P{IS[r2tJI :S rc-}dt.

As r -+ oo, we get by Fatou's lemma

and the recurrence follows again by Theorem 9.1.

0

Our next aim is to derive a general recurrence criterion, stated in terms ofthe characteristic function P, of f.L· Write Be= {x E JR.d; JxJ < c-}.

Theorem 9.4 (recurrence criterion, Chung and Fuchs} Let (Sn) be a random walk in JR.d based on some distribution J.L, and fix any c > 0. Then (Sn) is recurrent iff 1 sup 1R ~ dt = oo. (1) Ooo

r ~ 1-1Tn/-L-> }B.r lim ~ 1-1Tn/-L = }B.r ~~ = 1A

}B.

A

J-L

n->oo

t1

00.

Thus, (1) holds, and (Sn) is recurrent. Now assume (4) instead. Decreasing c: ifnecessary, we may further assume that ~p, ~ 0 on Be:. As before, we get

r ~-1< r 1 < r _1_ < 1-r{L- JB.1-r~{L- JB.1-~{L

}B.

and so (1) fails. Thus, (Sn) is transient.

00

' 0

The last result enables us to supplement Theorem 9.2 with some conclusive information for d ~ 3.

Theorem 9.8 (transience for d dimension d ~ 3 is transient.

~

3)

Any random walk of effective

Proof: We may assume that the symmetrized distribution is again ddimensional, since J-L is otherwise supported by some hyperplane outside the origin, and the transience follows by the strong law of large numbers. By Corollary 9.6, it is enough to prove that the symmetrized random walk (Sn) is transient, and so we may assume that J-L is symmetric. Considering the conditional distributions on Br and B'j. for large enough r > 0, we may write J-L as a convex combination CJ-Ll +(1-c)J-L 2, where /-Ll is symmetric and d-dimensional with bounded support. Letting (rij) denote the covariance matrix of /-Lb we get as in Lemma 5.10

P1(t)

= 1- ~ L·•,J.rijtitj + o(ltl 2 ),

t-+ 0.

Since the matrix (rij) is positive definite, it follows that 1- P, 1 (t) ~ 1t1 2 for small enough ltl, say fort E Be:. A similar relation then holds for P,, and so

1 B.

-1 dt ;S - /-Lt -A-

1 1c dt ;S -112 B. t

0

r d-3 dr < oo.

9. Random Walks and Renewal Theory

165

Thus, (Sn) is transient by Theorem 9.4.

0

We turn to a more detailed study of the one-dimensional random walk Sn = 6 + · · · + en, n E Z+· Say that (Sn) is simple if 161 = 1 a.s. For a simple, symmetric random walk (Sn) we note that

(5) The following result gives a surprising connection between the probabilities Un and the distribution of last return to the origin. Proposition 9.9 {last retum, FeUer) Let (Sn) be a simple, symmetric random walk in Z, put CTn = max{k ::; n; s2k = 0}, and define Un by (5). Then

P{crn

= k} = UkUn-k,

0 :S k :Sn.

Our proof will be based on a simple symmetry property, which will also appear in a continuous-time version as Lemma 13.14. Lemma 9.10 {refiection principle, Andre) For any symmetric random

walk (Sn) and optional timeT, we have (Sn)

g, (Sn),

Sn= SnM- (Sn- SnM),

where

n 2:: 0.

Proof: We may clearly assume that T < 00 a.s. Writing s~ = ST+n - s'T' d n E Z+, we get by the strong Markov property S = S'll(S'T, r), and by

symmetry - s g, s. Hence, by combination (- S'' s'T' T) the assertion follows by suitable assembly.

g, (S'' s'T' T)'

and 0

Proof of Proposition 9.9: By the Markov property at time 2k, we get P{crn

= k} = P{S2k = O}P{crn-k = 0}, 0 :S k :Sn,

which reduces the proof to the case when k that P{S2

= 0.

Thus, it remains to show

# 0, ... , S2n # 0} = P{S2n = 0},

n E N.

By the Markov property at time 1, the left-hand side equals ~P{mink 0. The contradiction shows that E71 = oo, and so liminfn Sn= -oo by Proposition 9.13. (ii) In this case Sn -+ oo a.s. by the law of large numbers, and the formula ES71 = E71 E6 follows as before. (iii) This is clear from the relations S 71 2': ~t and S7 - :::; - ( [ . D 1

We proceed with a celebrated factorization, which provides some more detailed information about the distributions of ladder times and heights. Here we write x± for the possibly defective distributions of the pairs (71, S 7 J and (71, S7 - ), respectively, and let 1/J± denote the correspond1

ing distributions of (a1,SoJ and (a!,S".-). Put x; = x±({n} X·) and 1 1/J;= = 1/J± ({n} x ·). Let us finally introduce the measure x0 on N, given by X~

= =

P{S1 A · · · A Sn-1 > 0 =Sn} P{S1 V··· V Sn-1 < 0 =Sn},

n E N,

where the second equality holds by (6). Theorem 9.15 (Wiener-Hopf factorization) For any random walk in lR based on some distribution p,, we have

oo- 81 ® JL ho -1/J±

(oo- x+) * (oo -1/J-) = (oo -1/J+) * (oo- x-), (11) = (ho- X±)* (ho- X0 ). (12)

Note that the convolutions in (11) are defined Oll the space z+ X JR, whereas those in (12) can be regarded as defined Oll z+. Alternatively, we may consider x0 as a measure on N x {0}, and interpret all convolutions as defi.ned Oll /E+ X JR.

Proof: Define the measures Pb p2, ... on (0, oo) by

PnB

= =

P{S11\···ASn-1>0,SnEB}

ELk 1{7k = n, STk E B},

n E N, BE B(O,oo),

(13)

where the second equality holds by (10). Put p0 = 80 , and regard the sequence p = (Pn) as a measure on Z+ x (0, oo). Noting that the corresponding measures on lR equal Pn + 1/J;; and using the Markov property at time n - 1, we get

Pn + 1/;;;_ = Pn-1 * JL = (p * (81 ® p,))n, Applying the strong Markov property at (13), we see that also

71

n E N.

(14)

to the second expression in

n

Pn=:Lxt*Pn-k=(X+*P)n, k=1

nEN.

(15)

9. Random Walks and Renewal Theory

169

Recalling the values at zero, we get from (14) and (15) p + 1/J- =

8o + p * (81 0 J..L),

p=

8o + x+ * p.

Eliminating p between the two equations yields the first relation in (11), and the second relation follows by symmetry. To prove (12), we note that the restriction of '!j;+ to (0, oo) equals 1/J:/; -x~. Thus, for any BE B(O,oo),

(x:; -1/J:/; + X~)B = P{ma.xk liminf { Ery(I -x+tk)J.L*m(dx) k->oo }B b -limsup k->oo

}

r Ery(I- X+ tk)J.L*m(dx) ßc

> b- { limsupEry(l-x+tk)J.L*m(dx)~bJ.L*mB. } ßc

k->oo

(20)

Now fix any h > 0 with J.L(O, h] > 0. Noting that Ery[r, r + h] > 0 for all r ~ 0 and writing J = [0, a] with a = h + 1, we get by (20) liminf Ery(J + tk- r) ~ b,

r

k->oo

Next conclude from the identity 80

= (80 -

~

a.

(21)

J.L) * Ery that

As k--+ oo, we get by (21) and Fatou's lemma 1 ~ b l:k>l J.L(na, oo ). Since the sum diverges by Lemma 3.4, it follows that b = 0. D We may use the preceding theory to study the renewal equation F = f + F * J.L, which often arises in applications. Here the convolution F * J.L is

defined by

(F * J.L)t = fot F(t- s)J.L(ds),

t

~ 0,

9. Random Walks and Renewal Theory

175

whenever the integrals on the right exist. Under suitable regularity conditions, the renewal equation has the unique solution F = f * jj, where Jt denotes the renewal measure Ln>o p,*n. Additional conditions ensure the solution F to converge at oo. A precise statement requires some further terminology. By a regular step function we mean a function on ~+ of the form (22) where h > 0 and at, a2, · · · E R A measurable function f on ~+ is said tobe directly Riemann integrable if .Aifl < oo and there exist some regular step functions f~ with J;; ::; f::; Jt and .A(ft - J;;) --+ 0. Corollary 9.23 {renewal equation) Fixadistribution p, =f 80 on ~+ with associated renewal measure jj, and let f be a locally bounded and measurable function on ~+. Then the equation F = f + F * p, has the unique, locally bounded solution F = f * Jt. If f is also directly Riemann integrable and if p, is nonarithmetic with mean c, then Ft --+ c- 1 ..\f as t--+ oo. Proof: Iterating the renewal equation gives

(23) Now p,*n[o, t] --+ 0 as n --+ oo for fixed t ~ 0 by the weak law of large numbers, and so for a locally bounded F we have F * p,*n --+ 0. If even f is locally bounded, then by (23) and Fubini's theorem,

F = "" f * p,*k = f * "" p,*k = f * Jt. ~k~O ~k~O Conversely, f + f * Jt * p, = f * jj, which shows that F = f * Jt solves the

given equation. Now let p, be nonarithmetic. If f is a regular step function as in (22), then by Theorem 9.20 and dominated convergence we get as t--+ oo

=

lot

f(t- s)Jt(ds)

--+ c- 1 h""

= L.

J~l

ajP,((O, h]

+ t- jh)

a · = c- 1 )..j.

~j~l J

In the general case, we may introduce some regular step functions f~ with J;; ::; f ::; Jt and .A(ft - J;;) --+ 0, and note that

(!;; *Mt :S Ft :S (f;t * Jt)t,

t ~ 0, n E N.

Letting t --+ oo and then n --+ oo, we obtain Ft --+ c- 1 ..\f.

0

Exercises 1. Show that if (Sn) is recurrent, then so is the random walk (Snk) for each k E N. (Hint: If (Snk) is transient, then so is (Snk+j) for any j > 0.)

176

Foundations of Modern Pobability

2. For any nondegenerate random walk (Sn) in IRd, show that ISnl (Hint: Use Lemma 5.1.)

!:+ oo.

3. Let (Sn) be a random walk in IR based on a symmetric, nondegenerate distribution with bounded support. Show that (Sn) is recurrent, using the fact that limsupn(±Sn) = oo a.s. 4. Show that the accessible set A equals the closed semigroup generated by supp 1-L· Also show by examples that A may or may not be a group. 5. Let v be an invariant measure on the accessible set of a recurrent random walk in IRd. Show by examples that ETJ may or may not be ofthe form oo·v. 6. Show that a nondegenerate random walk in IRd has no invariant distribution. (Hint: If v is invariant, then 1-L * v = v.) 7. Show by examples that the conditions in Theorem 9.2 arenot necessary. (Hint: For d = 2, consider mixtures of N(O,a 2 ) and use Lemma 5.18.) 8. Consider a random walk (Sn) based on the symmetric p-stable distribution on IR with characteristic function e-IW. Show that (Sn) is recurrent for p 2: 1 and transient for p < 1. 9. Let (Sn) be a random walk in IR 2 based on the distribution /-L 2, where 1-L is symmetric p-stable. Show that (Sn) is recurrent for p = 2 and transient for p < 2.

10. Let 1-L = C/-Ll + (1- c)/-L2, where /-Ll and /-L2 are symmetric distributions on IRd and c is a constant in (0, 1). Show that a random walk based on 1-L is recurrent iff recurrence holds for the random walks based on p, 1 and p, 2 . 11. Let 1-L = /-Ll * /-L2, where /-Ll and /-L2 are symmetric distributions on IRd. Show that if a random walk based on 1-L is recurrent, then so are the random walks based on /-Ll and /-L2· Also show by an example that the converse is false. (Hint: For the latter part, let /-Ll and /-L2 be supported by orthogonal subspaces.) 12. For any symmetric, recurrent random walk on zd, show that the expected number of visits to an accessible state k =/: 0 before return to the origin equals 1. (Hint: Compute the distribution, assuming probability p for return before visit to k.)

13. Use Proposition 9.13 to show that any nondegenerate random walk in

zd has infinite mean recurrence time. Compare with the preceding problem.

14. Show how part (i) of Proposition 9.14 can be strengthened by means of Theorems 5.16 and 9.2. 15. For a nondegenerate random walk in IR, show that limsupn Sn = oo a.s. iff a1 < oo a.s. and that Sn -+ oo a.s. iff Ea1 < oo. In both conditions, note that a1 can be replaced by r 1 . 16. Let 'fJ be a renewal process based on some nonarithmetic distribution on IR+. Show for any e > 0 that sup{t > 0; ETJ[t, t + e] = 0} < oo. (Hint: Imitate the proof of Proposition 8.14.)

9. Random Walks and Renewal Theory

177

17. Let JL be a distribution Oll z+ such that the group generated by supp JL equals Z. Show that Proposition 9.18 remains true with v{ n} = c- 1 JL(n,oo), n ~ 0, and prove a corresponding version ofProposition 9.19. 18. Let TJ be the occupation measure of a random walk on Z based on some distribution JL with mean c E iR \ {0} such that the group generated by supp JL equals Z. Show as in Theorem 9.20 that Ery{ n} -+ c- 1 V 0. 19. Derive the renewal theorem for random walks on Z+ from the ergodie theorem for discrete-time Markov chains, and conversely. (Hint: Given a distribution JL Oll N, construct a Markov chain X Oll z+ with Xn+l = Xn + 1 or 0, and such that the recurrence times at 0 are i.i.d. JL· Note that X is aperiodic iff Z is the smallest group containing supp JL·) 20. Fix a distribution JL on IR with symmetrization P,. Note that if ji is nonarithmetic, then so is JL· Show by an example that the converse is false. 21. Simplify the proof of Lemma 9.21, in the case when even the symmetrization ji is nonarithmetic. ( Hint: Let 6, 6, . . . and ~~, ~~, . . . be i.i.d. JL, and define Sn= a'- a + Lk::;n(~~- ~k).) 22. Show that any monotone and Lebesgue integrable function on IR+ is directly Riemann integrable.

23. State and prove the counterpart of Corollary 9.23 for arithmetic distributions.

24. Let (ξₙ) and (ηₙ) be independent i.i.d. sequences with distributions μ and ν, put Sₙ = Σ_{k≤n}(ξₖ + ηₖ), and form the random set U = ⋃_{n≥0}[Sₙ, Sₙ + ξ_{n+1}). Show that F_t = P{t ∈ U} satisfies the renewal equation F = f + F * μ * ν with f_t = μ(t, ∞). Assuming μ and ν to have finite means, show also that F_t converges as t → ∞, and identify the limit.

25. Consider a renewal process η based on some nonarithmetic distribution μ with mean c < ∞, fix an h > 0, and define F_t = P{η[t, t + h] = 0}. Show that F = f + F * μ, where f_t = μ(t + h, ∞). Also show that F_t converges as t → ∞, and identify the limit. (Hint: Consider the first point of η in (0, t), if any.)

26. For η as above, let τ = inf{t ≥ 0; η[t, t + h] = 0}, and put F_t = P{τ ≤ t}. Show that F_t = μ(h, ∞) + ∫₀^{h∧t} μ(ds) F_{t−s}, or F = f + F * μ_h, where μ_h = 1_{[0,h]} · μ and f ≡ μ(h, ∞).
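The renewal theorem behind Exercises 17–19 is easy to see numerically in the arithmetic case. The following sketch (an illustration added here, not part of the text; the distribution μ{1} = 0.4, μ{2} = 0.6 is an arbitrary choice) computes the renewal sequence uₙ and checks that it converges to 1/c, where c is the mean of μ.

```python
# Renewal sequence u_n = P{some renewal epoch equals n} for an aperiodic
# distribution mu on {1, 2}; the renewal theorem gives u_n -> 1/c.
mu = {1: 0.4, 2: 0.6}                  # arbitrary choice; supp mu generates Z
c = sum(k * p for k, p in mu.items())  # mean of mu, here 1.6

u = [1.0]                              # u_0 = 1: a renewal at the origin
for n in range(1, 51):
    # renewal recursion: condition on the first step of the walk
    u.append(sum(p * u[n - k] for k, p in mu.items() if n - k >= 0))

print(u[50], 1 / c)                    # u_50 is already extremely close to 1/c
```

Here the error decays geometrically, since 1 − 0.4z − 0.6z² has its only root besides z = 1 at z = −5/3.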

Chapter 10

Stationary Processes and Ergodic Theory

Stationarity, invariance, and ergodicity; discrete- and continuous-time ergodic theorems; moment and maximum inequalities; multivariate ergodic theorems; sample intensity of a random measure; subadditivity and products of random matrices; conditioning and ergodic decomposition; shift coupling and the invariant σ-field

In this chapter we come to the third important dependence structure of probability theory, beside those of martingales and Markov processes, namely stationarity. A stationary process is simply a process whose distribution is invariant under shifts. Stationary processes are important in their own right, and they also arise under broad conditions as steady-state limits of various Markov and renewal-type processes, as we have seen in Chapters 8 and 9 and will see again in Chapters 12, 20, and 23. Our present aim is to present some of the most useful general results for stationary and related processes.

The key result of stationarity theory is Birkhoff's ergodic theorem, which may be regarded as a strong law of large numbers for stationary sequences and processes. After proving the classical ergodic theorems in discrete and continuous time, we turn to the multivariate versions of Zygmund and Wiener, the former in a setting for noncommutative mappings and rectangular regions, the latter in the commutative case but with averages over increasing families of convex sets. Wiener's theorem will also be considered in a version for random measures that will be useful in Chapter 11 for the theory of Palm distributions. We finally present a version of Kingman's subadditive ergodic theorem, along with an important application to random matrices.

In all the mentioned results, the limit is a random variable, measurable with respect to the appropriate invariant σ-field I. Of special interest then is the ergodic case, when I is trivial and the limit reduces to a constant. For general stationary processes, we consider a decomposition of the distribution into ergodic components. The chapter concludes with some basic criteria for coupling and shift coupling of two processes, expressed in terms of the tail and invariant σ-fields T and I, respectively. Those results will be helpful to prove some ergodic theorems in Chapters 11 and 20.

10. Stationary Processes and Ergodic Theory

179

Our treatment of stationary sequences and processes is continued in Chapter 11 with some important applications and extensions of the present theory. In particular, we will then derive ergodic theorems for Palm distributions, as well as for entropy and information. In Chapter 20 we show how the basic ergodic theorems admit extensions to suitable contraction operators, which leads to a profound unification of the present theory with the ergodic theory for Markov transition operators. Our treatment of the ratio ergodic theorem is also postponed until then.

Let us now turn to the basic notions of stationarity and invariance. First fix an arbitrary measurable space (S, S). Given a measure μ and a measurable transformation T on S, we say that T is μ-preserving or measure-preserving if μ ∘ T⁻¹ = μ. Thus, if ξ is a random element of S with distribution μ, then T is measure-preserving iff Tξ =d ξ. In particular, consider a random sequence ξ = (ξ₀, ξ₁, ...) in some measurable space (S′, S′), and let θ denote the shift on S = (S′)^∞ given by θ(x₀, x₁, ...) = (x₁, x₂, ...). Then ξ is said to be stationary if θξ =d ξ. We show that the general situation is equivalent to this special case.

Lemma 10.1 (stationarity and invariance) For any random element ξ in S and measurable transformation T on S, we have Tξ =d ξ iff the sequence (Tⁿξ) is stationary, in which case even (f ∘ Tⁿξ) is stationary for every measurable function f. Conversely, any stationary random sequence admits such a representation.

Proof: Assuming Tξ =d ξ, we get

θ(f ∘ Tⁿξ; n ≥ 0) = (f ∘ Tⁿ(Tξ); n ≥ 0) =d (f ∘ Tⁿξ; n ≥ 0),

and so (f ∘ Tⁿξ) is stationary. Conversely, if η = (η₀, η₁, ...) is stationary, we may write ηₙ = π₀(θⁿη) with π₀(x₀, x₁, ...) = x₀, and we note that θη =d η by the stationarity of η. □

In particular, we note that if ξ₀, ξ₁, ... is a stationary sequence of random elements in some measurable space S, and if f is a measurable mapping of S^∞ into some measurable space S′, then the random sequence

ηₙ = f(ξₙ, ξₙ₊₁, ...), n ≥ 0,

is again stationary. The definition of stationarity extends in the obvious way to random sequences indexed by ℤ. The two-sided versions have the technical advantage that the associated shift operators form a group, rather than just a semigroup as in the one-sided context. The following result shows that the two cases are essentially equivalent. Here we assume the existence of appropriate randomization variables, as explained in Chapter 6.
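As a concrete illustration of Lemma 10.1 (added here, not part of the text), the rotation T(x) = (x + a) mod 1 preserves Lebesgue measure on [0, 1), so for ξ uniform the sequence (Tⁿξ) is stationary; the simulation below checks empirically that Tⁿξ remains uniformly distributed.

```python
import math
import random

# T(x) = (x + a) mod 1 preserves the uniform distribution on [0, 1),
# so each coordinate of the sequence (T^n xi) has the same law as xi.
random.seed(0)
a = math.sqrt(2) % 1.0
samples = [random.random() for _ in range(100_000)]

def T(x, n=1):
    """n-fold rotation by a (mod 1)."""
    return (x + n * a) % 1.0

# Compare the empirical distribution of xi and of T^7(xi) at a few levels.
for level in (0.25, 0.5, 0.75):
    p0 = sum(x < level for x in samples) / len(samples)
    p7 = sum(T(x, 7) < level for x in samples) / len(samples)
    assert abs(p0 - level) < 0.01 and abs(p7 - level) < 0.01
print("empirical law of T^n(xi) matches that of xi")
```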

180

Foundations of Modern Probability

Lemma 10.2 (two-sided extension) Any stationary random sequence ξ₀, ξ₁, ... in a Borel space admits a stationary extension ..., ξ₋₁, ξ₀, ξ₁, ... to the index set ℤ.

Proof: Assuming ϑ₁, ϑ₂, ... to be i.i.d. U(0, 1) and independent of ξ = (ξ₀, ξ₁, ...), we may construct the ξ₋ₙ recursively as functions of ξ and ϑ₁, ..., ϑₙ such that (ξ₋ₙ, ξ₋ₙ₊₁, ...) =d ξ for all n. In fact, once ξ₋₁, ..., ξ₋ₙ have been chosen, the existence of ξ₋ₙ₋₁ is clear from Theorem 6.10 if we note that (ξ₋ₙ, ξ₋ₙ₊₁, ...) =d θξ. Finally, the extended sequence is stationary by Proposition 3.2. □

Now fix a measurable transformation T on some measure space (S, S, μ), and let S_μ denote the μ-completion of S. We say that a set I ⊂ S is invariant if T⁻¹I = I, and almost invariant if T⁻¹I = I a.e. μ, in the sense that μ(T⁻¹I Δ I) = 0. Since inverse mappings preserve the basic set operations, the classes I and I′ of invariant sets in S and almost invariant sets in S_μ form σ-fields in S, called the invariant and almost invariant σ-fields, respectively. A measurable function f on S is said to be invariant if f ∘ T = f, and almost invariant if f ∘ T = f a.e. μ. The following result gives the basic relationship between invariant or almost invariant sets and functions.

Lemma 10.3 (invariant sets and functions) Fix a measure μ and a measurable transformation T on S, and let f be a measurable mapping of S into a Borel space S′. Then f is invariant or almost invariant iff it is I-measurable or I′-measurable, respectively.

Proof: We may first apply a Borel isomorphism to reduce to the case when S′ = ℝ. If f is invariant or almost invariant, then so is the set I_x = f⁻¹(−∞, x) for any x ∈ ℝ, and so I_x ∈ I or I′, respectively. Conversely, if f is measurable with respect to I or I′, then I_x ∈ I or I′, respectively, for every x ∈ ℝ. Hence, the function fₙ(s) = 2⁻ⁿ[2ⁿf(s)], s ∈ S, is invariant or almost invariant for every n ∈ ℕ, and the invariance or almost invariance carries over to the limit f. □

The next result clarifies the relationship between the invariant and almost invariant σ-fields. Here we write I_μ for the μ-completion of I in S_μ, the σ-field generated by I and the μ-null sets in S_μ.

Lemma 10.4 (almost invariance) For any distribution μ and μ-preserving transformation T on S, the associated invariant and almost invariant σ-fields I and I′ are related by I′ = I_μ.

Proof: If J ∈ I_μ, there exists some I ∈ I with μ(I Δ J) = 0. Since T is μ-preserving, we get

μ(T⁻¹J Δ J) ≤ μ(T⁻¹J Δ T⁻¹I) + μ(T⁻¹I Δ I) + μ(I Δ J)
  = μ ∘ T⁻¹(J Δ I) = μ(J Δ I) = 0,


which shows that J ∈ I′. Conversely, given any J ∈ I′, we may choose some J′ ∈ S with μ(J Δ J′) = 0 and put I = ⋂ₙ ⋃_{k>n} T⁻ᵏJ′. Then, clearly, I ∈ I and μ(I Δ J) = 0, and so J ∈ I_μ. □

A measure-preserving mapping T on some probability space (S, S, μ) is said to be ergodic for μ, or simply μ-ergodic, if the invariant σ-field I is μ-trivial, in the sense that μI = 0 or 1 for every I ∈ I. Depending on viewpoint, we may prefer to say that μ is ergodic for T, or T-ergodic. The terminology carries over to any random element ξ with distribution μ, which is said to be ergodic whenever this is true for T or μ. Thus, ξ is ergodic iff P{ξ ∈ I} = 0 or 1 for any I ∈ I, that is, iff the σ-field I_ξ = ξ⁻¹I in Ω is P-trivial. In particular, a stationary sequence ξ = (ξₙ) is ergodic if the shift-invariant σ-field is trivial for the distribution of ξ.

The next result shows how the ergodicity of a random element ξ is related to the ergodicity of the generated stationary sequence.

Lemma 10.5 (ergodicity) Let ξ be a random element in S with distribution μ, and let T be a μ-preserving mapping on S. Then ξ is T-ergodic iff the sequence (Tⁿξ) is θ-ergodic, in which case even η = (f ∘ Tⁿξ) is θ-ergodic for every measurable mapping f on S.

Proof: Fix any measurable mapping f: S → S′, and define F = (f ∘ Tⁿ; n ≥ 0), so that F ∘ T = θ ∘ F. If I ⊂ (S′)^∞ is θ-invariant, then T⁻¹F⁻¹I = F⁻¹θ⁻¹I = F⁻¹I, and so F⁻¹I is T-invariant in S. Assuming ξ to be ergodic, we obtain P{η ∈ I} = P{ξ ∈ F⁻¹I} = 0 or 1, which shows that even η is ergodic.

Conversely, let the sequence (Tⁿξ) be ergodic, and fix any T-invariant set I in S. Put F = (Tⁿ; n ≥ 0), and define A = {s ∈ S^∞; sₙ ∈ I i.o.}. Then I = F⁻¹A and A is θ-invariant. Hence, P{ξ ∈ I} = P{(Tⁿξ) ∈ A} = 0 or 1, which means that even ξ is ergodic. □

We may now state the fundamental a.s. and mean ergodic theorem for stationary sequences of random variables. Recall that (S, S) denotes an arbitrary measurable space, and write I_ξ = ξ⁻¹I for convenience.

Theorem 10.6 (ergodic theorem, Birkhoff) Let ξ be a random element in S with distribution μ, and let T be a μ-preserving map on S with invariant σ-field I. Then for any measurable function f ≥ 0 on S,

n⁻¹ Σ_{k<n} f(Tᵏξ) → E[f(ξ) | I_ξ] a.s.,

and the same convergence holds in Lᵖ for some p ≥ 1 when f ∈ Lᵖ(μ).

The proof is based on an elementary maximum inequality.

Lemma 10.7 (maximal ergodic lemma, Hopf) Let ξ₁, ξ₂, ... be a stationary sequence of integrable random variables, and put Sₙ = ξ₁ + ··· + ξₙ. Then

E[ξ₁; supₙ Sₙ > 0] ≥ 0.
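Before turning to the proof, the theorem can be illustrated numerically (an added sketch, not from the text): for the rotation T(x) = (x + a) mod 1 with a irrational, the invariant σ-field is trivial, so the ergodic averages of f(x) = cos 2πx converge to the integral of f over [0, 1), which is 0.

```python
import math

# Ergodic averages under the golden-ratio rotation; Birkhoff's theorem
# (here, in its deterministic ergodic case) gives convergence to int f = 0.
a = (math.sqrt(5) - 1) / 2   # irrational rotation angle
x = 0.1                      # arbitrary starting point
n = 100_000
avg = sum(math.cos(2 * math.pi * ((x + k * a) % 1.0)) for k in range(n)) / n
print(avg)                   # close to 0
```

For this particular f the average is a geometric sum, so the error is O(1/n), much faster than the general rate.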


Proof (Garsia): Put Mₙ = S₁ ∨ ··· ∨ Sₙ. Assuming ξ to be defined on the canonical space ℝ^∞, we note that

Sₖ = ξ₁ + Sₖ₋₁ ∘ θ ≤ ξ₁ + (Mₙ ∘ θ)⁺, k = 1, ..., n.

Taking maxima yields Mₙ ≤ ξ₁ + (Mₙ ∘ θ)⁺ for all n ∈ ℕ, and so by stationarity

E[ξ₁; Mₙ > 0] ≥ E[Mₙ − (Mₙ ∘ θ)⁺; Mₙ > 0] ≥ E[(Mₙ)⁺ − (Mₙ ∘ θ)⁺] = 0.

Since Mₙ ↑ supₙ Sₙ, the assertion follows by dominated convergence. □

Proof of Theorem 10.6 (Yosida and Kakutani): First assume that f ∈ L¹, and put ηₖ = f(Tᵏ⁻¹ξ) for convenience. Since E[η₁ | I_ξ] is an invariant function of ξ by Lemma 10.3, the sequence ζₖ = ηₖ − E[η₁ | I_ξ] is again stationary. Writing Sₙ = ζ₁ + ··· + ζₙ, we define for any ε > 0

A_ε = {limsupₙ(Sₙ/n) > ε},   ζₙ^ε = (ζₙ − ε)1_{A_ε},

and note that the sums Sₙ^ε = ζ₁^ε + ··· + ζₙ^ε satisfy

{supₙ Sₙ^ε > 0} = {supₙ(Sₙ^ε/n) > 0} = {supₙ(Sₙ/n) > ε} ∩ A_ε = A_ε.

Since A_ε ∈ I_ξ, the sequence (ζₙ^ε) is stationary, and Lemma 10.7 yields

0 ≤ E[ζ₁^ε; supₙ Sₙ^ε > 0] = E[ζ₁ − ε; A_ε] = E[E[ζ₁ | I_ξ]; A_ε] − εPA_ε = −εPA_ε,

which implies PA_ε = 0. Thus, limsupₙ(Sₙ/n) ≤ ε a.s., and ε being arbitrary, we obtain limsupₙ(Sₙ/n) ≤ 0 a.s. Applying the same result to −Sₙ yields liminfₙ(Sₙ/n) ≥ 0 a.s., and so by combination Sₙ/n → 0 a.s.

Next assume that f ∈ Lᵖ for some p ≥ 1. Using Jensen's inequality and the stationarity of Tᵏξ, we get for any A ∈ A and r > 0

E 1_A |n⁻¹ Σ_{k≤n} ηₖ|ᵖ ≤ ...

EMᵖ = p ∫₀^∞ P{M > r} r^{p−1} dr
  ≤ 2p ∫₀^∞ E[ξ₁; 2ξ₁ > r] r^{p−2} dr
  = 2p E ξ₁ ∫₀^{2ξ₁} r^{p−2} dr
  = 2p(p−1)⁻¹ E ξ₁(2ξ₁)^{p−1} ≲ E ξ₁ᵖ.

(ii) For m = 0, we may write

E M − 1 ≤ E(M − 1)⁺ = ∫₁^∞ P{M > r} dr
  ≤ 2 ∫₁^∞ E[ξ₁; 2ξ₁ > r] r⁻¹ dr
  = 2 E ξ₁ ∫₁^{2ξ₁∨1} r⁻¹ dr = 2 E ξ₁ log⁺ 2ξ₁
  ≤ e + 2 E[ξ₁ log 2ξ₁; 2ξ₁ > e] ≲ 1 + E ξ₁ log⁺ ξ₁.

For m > 0, we instead write

E M log₊ᵐ M = ∫₀^∞ P{M log₊ᵐ M > r} dr
  = ∫₁^∞ P{M > t} (m log^{m−1} t + logᵐ t) dt
  ≤ 2 ∫₀^∞ E[ξ₁; 2ξ₁ > t] (m log^{m−1} t + logᵐ t) t⁻¹ dt.


Given a measure space (S, S, μ), we introduce for any m ≥ 0 the class L logᵐ L(μ) of measurable functions f on S satisfying ∫ |f| log₊ᵐ |f| dμ < ∞. Note in particular that L log⁰ L = L¹. Using the maximum inequalities of Proposition 10.10, we may prove the following multivariate version of Theorem 10.6 for possibly noncommuting, measure-preserving transformations T₁, ..., T_d.

Theorem 10.12 (multivariate ergodic theorem, Zygmund) Let ξ be a random element in S with distribution μ, let T₁, ..., T_d be μ-preserving maps on S with invariant σ-fields I₁, ..., I_d, and put J_k = ξ⁻¹I_k. Then for any f ∈ L log^{d−1} L(μ), we have as n₁, ..., n_d → ∞

(n₁ ··· n_d)⁻¹ Σ_{k₁≤n₁} ··· Σ_{k_d≤n_d} f(T₁^{k₁} ··· T_d^{k_d} ξ) → E^{J_d} ··· E^{J₁} f(ξ) a.s.   (3)

The same convergence holds in Lᵖ for some p ≥ 1 when f ∈ Lᵖ(μ).

Proof: Since E[f(ξ) | J_k] = μ[f | I_k] ∘ ξ a.s., e.g. by Theorem 10.6, we may take ξ to be the identity mapping on S. For d = 1 the result reduces to Theorem 10.6. Now assume the statement to be true up to dimension d. Proceeding by induction, consider any μ-preserving maps T₁, ..., T_{d+1} on S and let f ∈ L logᵈ L. By the induction hypothesis, the d-dimensional version of (3) holds as stated, and we may write the result in the form f_m → f̃ a.s., where m = (n₁, ..., n_d). Iterating Proposition 10.10, we also note that μ sup_m |f_m| < ∞. Hence, by Corollary 10.8 (i) we have as m, n → ∞

n⁻¹ Σ_{k≤n} ...

Lemma 10.15 (convex sets) Let B be a bounded, convex set in B^d with |B| > 0, and write r(B) for its inner radius. Then for any ε > 0,

(i) |B − B| ≤ C(2d, d) |B|;
(ii) |∂_ε B| ≤ 2((1 + ε/r(B))^d − 1) |B|,

where C(2d, d) denotes a binomial coefficient. We continue with a simple geometric estimate.

Lemma 10.16 (space filling) Fix any bounded, convex sets B₁ ⊂ ··· ⊂ B_m in B^d with |B₁| > 0, a bounded set K ∈ B^d, and a function p: K → {1, ..., m}. Then there exists a finite subset H ⊂ K such that the sets B_{p(x)} + x, x ∈ H, are disjoint and satisfy |K| ≤ C(2d, d) Σ_{x∈H} |B_{p(x)}|.

Proof: Put C_x = B_{p(x)} + x, and choose x₁, x₂, ... ∈ K recursively, as follows. Once x₁, ..., x_{j−1} have been selected, we choose x_j ∈ K with the largest possible p(x_j) such that C_{x_i} ∩ C_{x_j} = ∅ for all i < j. The construction terminates when no such x_j exists. Put H = {x_i}, and note that the sets C_x with x ∈ H are disjoint. Now fix any y ∈ K. By the construction of H we have C_x ∩ C_y ≠ ∅ for some x ∈ H with p(x) ≥ p(y), and so

y ∈ B_{p(x)} − B_{p(y)} + x ⊂ B_{p(x)} − B_{p(x)} + x.

Hence, K ⊂ ⋃_{x∈H}(B_{p(x)} − B_{p(x)} + x), and so by Lemma 10.15 (i)

|K| ≤ Σ_{x∈H} |B_{p(x)} − B_{p(x)}| ≤ C(2d, d) Σ_{x∈H} |B_{p(x)}|. □

We may now establish a multivariate version of Lemma 10.11, stated for convenience in terms of random measures (see the detailed discussion below). For motivation, we note that the set function ηB = ∫_B f(T_s ξ) ds in


Theorem 10.14 is a stationary random measure on ℝ^d and that the intensity m of η, defined by the relation Eη = mλ^d, is equal to Ef(ξ).

Lemma 10.17 (maximum inequality) Let ξ be a stationary random measure on ℝ^d with intensity m, and let B₁ ⊂ B₂ ⊂ ··· be bounded, convex sets in B^d with |B₁| > 0. Then

r P{supₖ(ξBₖ/|Bₖ|) > r} ≤ m C(2d, d), r > 0.

Proof: Fix any r, a > 0 and n ∈ ℕ, and define a process ν on ℝ^d and a random set K in S_a = {x ∈ ℝ^d; |x| ≤ a} by

ν(x) = inf{k ∈ ℕ; ξ(Bₖ + x) > r|Bₖ|}, x ∈ ℝ^d,
K = {x ∈ S_a; ν(x) ≤ n}.

By Lemma 10.16 there exists a finite, random subset H ⊂ K such that the sets B_{ν(x)} + x, x ∈ H, are disjoint and |K| ≤ C(2d, d) Σ_{x∈H} |B_{ν(x)}|. Writing b = sup{|x|; x ∈ Bₙ}, we get

ξS_{a+b} ≥ Σ_{x∈H} ξ(B_{ν(x)} + x) ≥ r Σ_{x∈H} |B_{ν(x)}| ≥ r |K| / C(2d, d).

Taking expectations and using Fubini's theorem and the stationarity and measurability of ν, we obtain

m C(2d, d) |S_{a+b}| ≥ r E|K| = r ∫_{S_a} P{ν(x) ≤ n} dx = r |S_a| P{max_{k≤n}(ξBₖ/|Bₖ|) > r}.

Now divide by |S_a|, and then let a → ∞ and n → ∞ in this order. □

We finally need an elementary Hilbert-space result. Recall that a contraction on a Hilbert space H is defined as a linear operator T such that ‖Tξ‖ ≤ ‖ξ‖ for all ξ ∈ H. For any linear subspace M ⊂ H, we write M⊥ for the orthogonal complement and M̄ for the closure of M. The adjoint T* of an operator T is characterized by the identity ⟨ξ, Tη⟩ = ⟨T*ξ, η⟩, where ⟨·, ·⟩ denotes the inner product in H.

Lemma 10.18 (invariant subspace) For any family T of contractions on a Hilbert space H, let N denote the T-invariant subspace of H, and let R be the linear subspace of H spanned by the set {η − Tη; η ∈ H, T ∈ T}. Then N⊥ ⊂ R̄.

Proof: If ξ ⊥ R, then

⟨ξ − T*ξ, η⟩ = ⟨ξ, η − Tη⟩ = 0, T ∈ T, η ∈ H,

which implies T*ξ = ξ for every T ∈ T. Hence, for any T ∈ T we have ⟨Tξ, ξ⟩ = ⟨ξ, T*ξ⟩ = ‖ξ‖², and so by the contraction property,

0 ≤ ‖Tξ − ξ‖² = ‖Tξ‖² + ‖ξ‖² − 2⟨Tξ, ξ⟩ ≤ 2‖ξ‖² − 2‖ξ‖² = 0,

which implies Tξ = ξ. This gives R⊥ ⊂ N, and so N⊥ ⊂ (R⊥)⊥ = R̄. □
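A finite-dimensional illustration of how Lemma 10.18 enters the mean ergodic theorem (added here, not from the text): a plane rotation T is unitary, hence a contraction, and for a nontrivial rotation angle its invariant subspace is N = {0}; accordingly the Cesàro averages n⁻¹ Σ_{k<n} Tᵏv converge to 0, the projection of v onto N.

```python
import math

# Cesaro averages of T^k v for a plane rotation T by angle theta.
# Since T has no nonzero invariant vectors, the averages tend to 0.
a = (math.sqrt(5) - 1) / 2
theta = 2 * math.pi * a
n = 100_000
sx = sy = 0.0
for k in range(n):
    # T^k applied to v = (1, 0) is (cos k*theta, sin k*theta)
    sx += math.cos(k * theta)
    sy += math.sin(k * theta)
avg = math.hypot(sx / n, sy / n)
print(avg)   # close to 0, the projection of v onto N = {0}
```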


Proof of Theorem 10.14: First assume that f ∈ L¹, and define

Aₙf = |Bₙ|⁻¹ ∫_{Bₙ} f ∘ T_s ds, n ∈ ℕ.

For any ε > 0, Lemma 10.18 yields a measurable decomposition

f = f^ε + Σ_{k≤m} (g_k − T_{s_k} g_k) + h^ε,

where f^ε ∈ L² is T_s-invariant for all s ∈ ℝ^d, the functions g₁, ..., g_m are bounded, and E|h^ε(ξ)| < ε. Here clearly Aₙf^ε = f^ε. Next, we see from Lemma 10.15 (ii) that, as n → ∞ for fixed k ≤ m and ε > 0,

‖Aₙ(g_k − T_{s_k} g_k)‖ ≤ (|(Bₙ + s_k) Δ Bₙ| / |Bₙ|) ‖g_k‖ ≤ 2((1 + |s_k|/r(Bₙ))^d − 1) ‖g_k‖ → 0.

Finally, Lemma 10.17 yields

r P{supₙ Aₙ|h^ε(ξ)| ≥ r} ≤ C(2d, d) E|h^ε(ξ)| ≤ C(2d, d) ε, r, ε > 0,

which implies supₙ Aₙ|h^ε(ξ)| → 0 in probability as ε → 0. In particular, it follows that liminfₙ Aₙf(ξ) < ∞ a.s., which justifies the estimate

(limsupₙ − liminfₙ) Aₙf(ξ) = (limsupₙ − liminfₙ) Aₙh^ε(ξ) ≤ 2 supₙ Aₙ|h^ε(ξ)| → 0 in probability.

This shows that the left-hand side vanishes a.s., and the required a.s. convergence follows. When f ∈ Lᵖ for some p ≥ 1, the asserted Lᵖ-convergence follows as before from the uniform integrability of the powers |Aₙf(ξ)|ᵖ. We may now identify the limit, as in the proof of Corollary 10.9, and the a.s. convergence extends to arbitrary f ≥ 0, as in the case of Theorem 10.6. □

We turn to a version of Theorem 10.14 for random measures on ℝ^d. Recall that a random measure ξ on ℝ^d is defined as a locally finite kernel from the basic probability space (Ω, A, P) into ℝ^d. In other words, ξ(ω, B) is required to be a locally finite measure in B ∈ B^d for fixed ω ∈ Ω and a random variable in ω ∈ Ω for every bounded set B ∈ B^d. Alternatively, we may regard ξ as a random element in the space M(ℝ^d) of locally finite measures μ on ℝ^d, endowed with the σ-field generated by all evaluation maps μ ↦ μB with B ∈ B^d. We say that ξ is stationary if θ_s ξ =d ξ for every s ∈ ℝ^d, where the shift operators θ_s on M(ℝ^d) are defined by (θ_s μ)B = μ(B + s) for all B ∈ B^d. The invariant σ-field of ξ is given by I_ξ = ξ⁻¹I, where I denotes the σ-field of all shift-invariant, measurable sets in M(ℝ^d). We may now define the sample intensity of ξ as the extended-valued random variable ξ̄ = E[ξB | I_ξ]/|B|, where B ∈ B^d is arbitrary with |B| ∈ (0, ∞). Note that this expression is independent of B, by the stationarity of ξ and Theorem 2.6.


Corollary 10.19 (sample intensity, Nguyen and Zessin) Let ξ be a stationary random measure on ℝ^d, and fix some bounded, convex sets B₁ ⊂ B₂ ⊂ ··· in B^d with r(Bₙ) → ∞. Then ξBₙ/|Bₙ| → ξ̄ a.s., where ξ̄λ^d = E[ξ | I_ξ]. The same convergence holds in Lᵖ for some p ≥ 1 when ξ[0, 1]^d ∈ Lᵖ.

Proof: By Fubini's theorem, we have for any A, B ∈ B^d

∫_B (θ_s ξ)A ds = ∫_B ds ∫ 1_A(t − s) ξ(dt) = ∫ ξ(dt) ∫_B 1_A(t − s) ds = ξ(1_A * 1_B).

Assuming |A| = 1 and A ⊂ S_a = {s; |s| < a}, and putting Bₙ⁺ = Bₙ + S_a and Bₙ⁻ = (Bₙᶜ + S_a)ᶜ, we note that 1_A * 1_{Bₙ⁻} ≤ 1_{Bₙ} ≤ 1_A * 1_{Bₙ⁺}. Applying this to the sets B = Bₙ∓ gives

(|Bₙ⁻|/|Bₙ|) (ξ(1_A * 1_{Bₙ⁻})/|Bₙ⁻|) ≤ ξBₙ/|Bₙ| ≤ (|Bₙ⁺|/|Bₙ|) (ξ(1_A * 1_{Bₙ⁺})/|Bₙ⁺|).

Since r(Bₙ) → ∞, Lemma 10.15 (ii) yields |Bₙ±|/|Bₙ| → 1. Next we may apply Theorem 10.14 to the function f(μ) = μA and the convex sets Bₙ± to obtain ξ(1_A * 1_{Bₙ±})/|Bₙ±| → E[ξA | I_ξ] = ξ̄ in the appropriate sense. □

The Lᵖ-versions of Theorem 10.14 and Corollary 10.19 remain valid under weaker conditions than previously indicated. The following results are adequate for most purposes. Here we say that the distributions (probability measures) μₙ on ℝ^d are asymptotically invariant if ‖μₙ − μₙ * δ_s‖ → 0 for every s ∈ ℝ^d, where ‖·‖ denotes the total variation norm. Similarly, the weight functions (probability densities) fₙ on ℝ^d are said to be asymptotically invariant if λ^d|fₙ − θ_s fₙ| → 0 for every s. Note that the conclusion of Theorem 10.14 can be written as μₙX → X̄, where μₙ = (1_{Bₙ} · λ^d)/|Bₙ|, X_s = f(T_s ξ), and X̄ = E[f(ξ) | I_ξ].
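As a one-dimensional illustration of Corollary 10.19 (added here, not from the text), take ξ to be a homogeneous Poisson process on ℝ with rate λ; its invariant σ-field is trivial, so the sample intensity is the constant λ, and ξ[0, t]/t → λ.

```python
import random

# Sample intensity of a rate-lam Poisson process on [0, t]:
# the window average count/t approximates lam for large t.
random.seed(1)
lam = 2.5
t = 10_000.0
s, count = 0.0, 0
while True:
    s += random.expovariate(lam)   # i.i.d. exponential gaps
    if s > t:
        break
    count += 1
print(count / t)                   # close to lam
```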

Corollary 10.20 (mean ergodic theorem)

(i) For any p ≥ 1, consider on ℝ^d a stationary, measurable, and Lᵖ-valued process X and some asymptotically invariant distributions μₙ. Then μₙX → X̄ = E[X₀ | I_X] in Lᵖ.

(ii) Consider on ℝ^d a stationary random measure ξ with finite intensity and some asymptotically invariant weight functions fₙ. Then ξfₙ → ξ̄ in L¹, where ξ̄λ^d = E[ξ | I_ξ].

Proof: (i) By Theorem 10.14 we may choose some distributions νₘ on ℝ^d such that νₘX → X̄ in Lᵖ. Using Minkowski's inequality and its extension in Corollary 1.30, along with the stationarity of X, the invariance of X̄,


and dominated convergence, we get as n → ∞ and then m → ∞

‖μₙX − X̄‖_p ≤ ‖μₙX − (μₙ * νₘ)X‖_p + ‖(μₙ * νₘ)X − X̄‖_p
  ≤ ‖μₙ − μₙ * νₘ‖ ‖X‖_p + ∫ ‖(δ_s * νₘ)X − X̄‖_p μₙ(ds)
  ≤ ‖X‖_p ∫ ‖μₙ − μₙ * δ_t‖ νₘ(dt) + ‖νₘX − X̄‖_p → 0.

(ii) By Corollary 10.19 we may choose some weight functions gₘ such that ξgₘ → ξ̄ in L¹. Using Minkowski's inequality, the stationarity of ξ, the invariance of ξ̄, and dominated convergence, we get as n → ∞ and then m → ∞

‖ξfₙ − ξ̄‖₁ ≤ ‖ξfₙ − ξ(fₙ * gₘ)‖₁ + ‖ξ(fₙ * gₘ) − ξ̄‖₁
  ≤ Eξ|fₙ − fₙ * gₘ| + ∫ ‖ξ(θ_s gₘ) − ξ̄‖₁ fₙ(s) ds
  ≤ Eξ̄ ∫ λ^d|fₙ − θ_t fₙ| gₘ(t) dt + ‖ξgₘ − ξ̄‖₁ → 0. □

any subadditive sequence c1, c2 ,

.••

E

-Cn E [-oo, oo ) . n

Proof: Iterating the subadditivity relation, we get for any k, n E N

Cn :'S [njk]ck

+ Cn-k[n/k] :'S [njk]ck +Co V··· V Ck-1.

where Co= 0. Noting that [n/k] ck/ k for all k, and so

rv

njk as n-+ oo, we get limsupn(cn/n) :S

. . f -Cn :'S 11msup. . f Cn Cn :'S lll . fCn. D :'S 11mm n n n->oo n n->oo n n n We turn to the more general case of a two-dimensional array Cjk, 0 :S j < k, which is said to be subadditive if Co,n :S co,m + Cm,n for all m < n. The present notion reduces to the previous one when Cjk = Ck-j for some sequence ck. We also note that subadditivity holds automatically for arrays of the form Cjk = aJ+1 + · · · + ak. We shall now extend the ergodie theorem to subadditive arrays of random variables ejk, 0 :'S j < k. For motivation, we recall from Theorem 10.6 that if ejk = 'T/j+1 +·. ·+'TJk for some stationary and integrable sequence ofrandom lll -

192

Foundations of Modern Probability

variables 'flk, then f.o,n/n converges a.s. and in L 1. A similar result holds for generalsubadditive arrays (f.jk) that are stationary under simultaneaus shifts in the two indices, so that (f.H1,k+1) ~ (l;,j,k)· To allow for a wider range of applications, we introduce the slightly weaker assumptions ( f.k,2k' 6k,3k' ... ) ( f.k,k+l' f.k,k+2' ... )

d d

(f.o,k, f.k,2k, ... ),

k E N,

(5)

(f.o,I. f.o,2, ... ),

k EN.

(6)

For convenience of reference, we also restate the subadditivity requirement: f.o,n ::=; f.o,m + f.m,n,

0 < m < n.

(7)

Theorem 10.22 (subadditive ergodie theorem, Kingman) Let (f.jk) be a subadditive array of random variables satisfying (5) and (6), and assume that Et;,t, 1 < oo. Then f.o,n/n converges a.s. toward a random variable [in [-oo, oo) with E[ = infn(Ef.o,n/n) c. The same convergence holds in L 1 when c > -oo. lf the sequences in (5) are ergodic, then [ is a.s. a constant.

=

Proof {Liggett): Put f.o,n = f.n for convenience. By (6) and (7) we have Et;,;i ::=; nEt;,{ < oo. We first assume c > -oo, so that the variables f.m,n are integrable. Iterating (7) gives f.n -

n

[n/k] ::=;

n

L f.(i-1)k,jk/n + L

j=1

f.j-1,j/n, j=k[n/k]+1

n, k E N.

(8)

By (5) the sequence f.(i-l)k,jk' j E N, is station~ for fixed k, and so by Theorem 10.6 we have n- 1 Lj::;;n f.(i-l)k,jk --+ f.k a.s. and in L 1, where E[k = Ef.k· Hence, the first term in (8) tends a.s. andin L 1 toward [k/k. Similarly, n- 1 Lj::;;n f.j-l,j --+ [1 a.s. andin L 1, and so the second term in (8) tends in the samesense to 0. Thus, the right-hand side converges a.s. andin L 1 toward [k/k, and since k is arbitrary, we get

(9) The variables moreover

t;,;i jn are uniformly integrable by Proposition 4.12, and

Elimsupn(f.n/n) ::=; E[ ::=; infn(E[njn)

= infn(Ef.n/n) = c.

(10)

To derive a lower bound, let l'l:nll(f.jk) be uniformly distributed over {1, ... , n} for each n, and define

kEN. By (6) we have

((f,(2, · · ·) ~ (6,6, · ·. ), n

E N.

(11)

Moreover, 'flk ::=; f.~~:n+k-1,~ 0 P{~[t, t

+ h) > c}

1)h, nh) > c} P{~[(n-1)h,nh)>c i.o.}

limsupnP{~[(n-

<
..d for some constant c E [0, oo], called the intensity of ~' and we note that c = E(, where ( is the sample intensity in Corollary 10.19. lf X and ~ are jointly stationary and ~ has finite and positive intensity, we define the Palm distribution Qx,e of (X,~) with respect to ~ by the formula

=(

Qx,d = E

Lf(Bs(X,~))~(ds)/E~B,

(1)

for any set B E ßd with >..dB E (O,oo) and for measurable functions f ;::: 0 Oll sR.d X M(~d). The following result shows that the definition is independent of the choice of B.

204

Foundations of Modern Probability

Lemma 11.2 (coding) Consider a stationary pair (X, e) on JRd, where X is a measurable process in S and e is a random measure. Then for any measurable function f 2:: 0, the stationarity carries over to the random measure

Proof: For any t E JRd and B E ßd, a simple computation gives (fltet)B

= ~t(B+t)=

J J L

{ f(Bs(X,~))~(ds) jB+t

1B(s- t) f(Bs(X, ~)) ~(ds)

1B(u) f(Bu+t(X, ~))~(du+ t) J(BuBt(X, ~)) (Bte)(du).

Writing et

= F(X, ~) and using the stationarity of (X,~), we obtain D

The mapping in (1) is essentially a one-to-one correspondence, and we proceed to derive some useful inversion formulas. To state the latter, it is suggestive to introduce a random pair (Y, TJ) with distribution Qx,t;, where in view of (1) the process Y can again be chosentobe measurable. When is a simple point process, then so is ry, and we note that ry{O} = 1 a.s. The result may then be stated in terms of the associated Voronoi cells

e

VJ.t

= {s E IRd; J.L(Sisl + s) = 0}, J.L E N(IRd),

where N(JRd) is the class of locally finite measures on IR+ and Sr denotes the openball of radius r around the origin. If also d = 1, we may enumerate the supporting points of J.L in increasing order as tn(J.L), subject to the convention to(J.L)::; 0 < tl(J.L). To simplify our statements, we often omit the obvious requirement that the space S and the functions f and g be measurable.

Proposition 11.3 (uniqueness and inversion} Consider a stationary pair (X, ~) on JRd, where X is a measurable process in S and ~ is a random measure with E( E (0, oo ). Then P[(X, ~) E ·I~=/= 0] is uniquely determined by .C(Y, ry) = Qx,t;, and the following inversion formulas hold: (i) For any f 2:: 0 and g > 0 with >...dg < oo, E[f(X, ~); e =/= 0]

= E( · E

J!(~;~~;))

g( -s) ds.

(ii) lf ~ is a simple point process, we have for any f 2:: 0 E[f(X, e);

~ =/= 0] = E( · E

{ f(Bs(Y, ry)) ds.

lv'IJ

11. Special Notions of Symmetry and Invariance

(iii) Il ~ is a simple point process and d = 1, we have lor any I

tl ('7)

E(!(X, ~); ~ # 0] = E( · E Jo

~

205

0

f(Bs(Y, ry)) ds.

To express the conditional distribution P[f(X, ~) E ·I~# 0] in terms of .C(Y, ry), it suffices in each case to divide by the corresponding formula for I 1. The latter equation also expresses P {~ # 0} / E( in terms of .C( rJ). In particular, this ratio equals EIV11 1 in case (ii) and Et1 (ry) in case (iii).

=

Prool: (i) Write (1) in the form E( · >..dB · Ef(Y,ry)

=E

Lf(Bs(X,~)) ~(ds),

BE Bd,

and extend by a monotone dass argument to

for any measurable function h ~ 0 on the appropriate product space. Applying the latter formula to the function h(x,J.L,s) = f(B-s(x,J.L),s) for measurable f ~ 0 and substituting -s for s, we get

E( · E

J

f(Bs(Y, ry), -s) ds

In particular, we have for measurable g, h

E( · E

J

j

= E !(X,~' s) ~(ds). ~

h(Bs(Y, ry)) g( -s) ds

If g > 0 with >..dg < oo, then ~g by the further Substitution

(2)

0

= E h(X, ~) ~g.

< oo a.s., and the desired relation follows

h(x,J.L) = f(x,J.L) 1{J.Lg > 0}. J.L9 (ii) Herewe may apply (2) to the nmction h(x,J.L,s) = f(x,J.L) 1{J.L{s} = 1, J.LBisl = 0}, and note that (Bsry)SI-sl = 0 iff s E V11 • (iii) In this case, we apply (2) to the nmction

h(x,J.L,s) and note that to(BsrJ)

= l(x,J.L) 1{to(J.L) = s},

= -s iff s E [0, t1(ry)).

0

Now consider a simple point process "1 on IR. and a measurable process Y on IR. with values in an arbitrary measurable space (S,S). We say that

the pair (Y, ry) is cycle-stationary if ry{O} = 1 and t1 (ry) < oo a.s., and if in addition Bt 1 ( 11 )(Y, ry) ~ (Y, ry). The variables tn(rJ) are then a.s. finite, and the successive differences Lltn (rJ) = tn+l (rJ) - tn (rJ) along with the shifted processes yn = Btn(TI)y form a stationary sequence in the space

206

Foundations of Modern Probability

(0, oo) X sR. The following result gives a striking relationE;hip between the notions of stationarity and cycle stationarity for pairs (X,~) and (Y, 1J). When d = 1 and ~ =f. 0 a.s., the definition (1) of the Palm distribution and the inversion formula in Proposition 11.3 (iii) reduce to the nearly symmetric equations

(3)

(4) Theorem 11.4 (cycle stationarity, Kaplan) Equations (3) and (4) provide a one-to-one correspondence between the distributions of all stationary pairs (X,~) on IR and all cycle-stationary ones (Y,1J), where X and Y are measurable processes in S, and ~ and 1J are simple point processes with ~ =1- 0 a.s., E[ < oo, and Et1(17) < oo.

Proof: First assume that (X,~) is stationary with ~ =1- 0 and E[ < oo, put ak = tk(~), and define .C(Y, 17) by (3). Then for any n E N and for bounded, measurable f 2:: 0, we have

L

nE[·Ef(Y,1J) =E1n f(Bs(X,~))~(ds) =E f(Ouk(X,~)). 0 UkE(O,n) Writing Tk

= tk(1J),

we get by a suitable substitution

L

nE[·EJ(Br1 (Y,1J)) =E f(Ouk+l(X,~)), ukE(O,n) and so by subtraction,

211!11

1Ef(Br1 (Y, 17))- Ef(Y, 7J)I :::; n E[. Asn-+ oo, we obtain E/(071 (Y, 1])) = Ef(Y, 17), and therefore 071 (Y, 17) :!!::. (Y, 17), which means that (Y, 17) is cycle-stationary. Alsonote that (4) holds in this case by Proposition 11.3. Next assume that (Y, 17) is cycle-stationary with Et1 (17) < oo, and define .C(X,~) by (4). Then for n and f as before, nEr1 · Ef(X,~)

= E lrn

f(Os(Y,1J))ds,

and so for any t E IR,

nEr1 · Ef(Ot(X,~))

= E forn

Hence, by subtraction,

f(Os+t(Y,1J))ds

= E lrn+t f(Os(Y,1J))ds.

11. Special Notions of Symmetry and Invariance

207

d

Asn -+ oo, we get Ef(Ot(X,~)) = Ef(X,~), and so Ot(X,~) = (X,~), which means that (X,~) is stationary. To see that (X,~) and (Y, ry) are related by (3), we introduce a possibly unbounded measure space with integration operator E and a random pair (Y, ij) satisfying

$$\tilde Ef(\tilde Y,\tilde\eta) = E\int_0^1 f(\theta_s(X,\xi))\,\xi(ds). \qquad (5)$$

Proceeding as in the proof of Proposition 11.3, except that the monotone class argument requires some extra care since $E\bar\xi$ may be infinite, we obtain

$$\tilde E\int_0^{t_1(\tilde\eta)} f(\theta_s(\tilde Y,\tilde\eta))\,ds = Ef(X,\xi) = E\int_0^{t_1(\eta)} f(\theta_s(Y,\eta))\,ds\Big/Et_1(\eta).$$

Replacing $f(x,\mu)$ by $f(\theta_{t_0(\mu)}(x,\mu))$ and noting that $t_0(\theta_s\mu)=-s$ when $\mu\{0\}=1$ and $s\in[0,t_1(\mu))$, we get

$$\tilde E[t_1(\tilde\eta)f(\tilde Y,\tilde\eta)] = E[t_1(\eta)f(Y,\eta)]/Et_1(\eta).$$

Hence, by a suitable substitution,

$$\tilde Ef(\tilde Y,\tilde\eta) = Ef(Y,\eta)/Et_1(\eta).$$

Inserting this into (5) and dividing by the same formula for $f=1$, we obtain the required equation. □
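In the renewal case $d=1$, the content of Theorem 11.4 is easy to see by simulation (a numerical sketch, not part of the text; the uniform cycle law and all constants are illustrative): under the stationary law, the cycle covering the origin is length-biased relative to the i.i.d. cycles of the cycle-stationary pair, which is exactly the weighting by $t_1(\eta)/Et_1(\eta)$ implicit in (4).

```python
import bisect
import random

random.seed(1)

# i.i.d. cycle lengths, uniform on (0, 2): E[L] = 1, E[L^2] = 4/3
def cycle_length():
    return random.uniform(0.0, 2.0)

# lay the cycles end to end over a long horizon
T = 200000.0
starts, lengths, t = [], [], 0.0
while t < T:
    L = cycle_length()
    starts.append(t)
    lengths.append(L)
    t += L

def covering_length(u):
    # length of the cycle that covers time u
    i = bisect.bisect_right(starts, u) - 1
    return lengths[i]

# sampling at an independent uniform time mimics the stationary view
samples = [covering_length(random.uniform(0.0, T)) for _ in range(100000)]
mean_cov = sum(samples) / len(samples)
# length-biased mean E[L^2]/E[L] = 4/3, versus the plain mean E[L] = 1
print(round(mean_cov, 2))
```

The plain average of the laid-out cycle lengths stays near $EL=1$, while the cycle seen from a "typical" time averages near $EL^2/EL=4/3$.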

When $\xi$ is a simple point process on $\mathbb{R}^d$, we may think of the Palm distribution $Q_{X,\xi}$ as the conditional distribution of $(X,\xi)$, given that $\xi\{0\}=1$. The interpretation is justified by the following result, which also provides an asymptotic formula for the hitting probabilities of small Borel sets. By $B_n\to0$ we mean that $\sup\{|s|;\ s\in B_n\}\to0$, and we write $\|\cdot\|$ for the total variation norm.

Theorem 11.5 (local hitting and conditioning, Korolyuk, Ryll-Nardzewski, König, Matthes) Consider a stationary pair $(X,\xi)$ on $\mathbb{R}^d$, where $X$ is a measurable process in $S$ and $\xi$ is a simple point process with $E\bar\xi\in(0,\infty)$. Let $B_1,B_2,\ldots\in\mathcal{B}^d$ with $|B_n|>0$ and $B_n\to0$, and let $f$ be bounded, measurable, and shift-continuous. On $\{\xi B_n=1\}$, let $\sigma_n$ denote the unique point of $\xi$ in $B_n$. Then

(i) $P\{\xi B_n=1\} \sim P\{\xi B_n>0\} \sim E\xi B_n$;

(ii) $\|P[\theta_{\sigma_n}(X,\xi)\in\cdot\mid \xi B_n=1] - Q_{X,\xi}\| \to 0$;

(iii) $E[f(X,\xi)\mid \xi B_n>0] \to Q_{X,\xi}f$.

Proof: (i) Since $\eta\{0\}=1$ a.s., we have $(\theta_s\eta)B_n>0$ for all $s\in-B_n$. Hence, Proposition 11.3 (ii) yields

$$\frac{P\{\xi B_n>0\}}{E\bar\xi} = E\int_{V_\eta} 1\{(\theta_s\eta)B_n>0\}\,ds \ge E\,|V_\eta\cap(-B_n)|.$$


Dividing by $|B_n|$ and using Fatou's lemma, we obtain

$$\liminf_{n\to\infty}\frac{P\{\xi B_n>0\}}{E\xi B_n} \ge \liminf_{n\to\infty}\frac{E|V_\eta\cap(-B_n)|}{|B_n|} \ge E\liminf_{n\to\infty}\frac{|V_\eta\cap(-B_n)|}{|B_n|} = 1,$$

which implies

$$\liminf_{n\to\infty}\frac{P\{\xi B_n=1\}}{E\xi B_n} \ge 2\liminf_{n\to\infty}\frac{P\{\xi B_n>0\}}{E\xi B_n} - 1 \ge 1.$$

The converse relations are obvious, since $P\{\xi B_n=1\} \le P\{\xi B_n>0\} \le E\xi B_n$.

(ii) Introduce on $S^{\mathbb{R}^d}\times\mathcal{N}(\mathbb{R}^d)$ the measures

$$\mu_n = E\int_{B_n} 1\{\theta_s(X,\xi)\in\cdot\}\,\xi(ds), \qquad \nu_n = P[\theta_{\sigma_n}(X,\xi)\in\cdot;\ \xi B_n=1],$$

and put $m_n=E\xi B_n$ and $p_n=P\{\xi B_n=1\}$. By (1) the stated total variation becomes

$$\left\|\frac{\nu_n}{p_n} - \frac{\mu_n}{m_n}\right\| \le \frac{\|\mu_n-\nu_n\|}{m_n} + \left(\frac{1}{p_n}-\frac{1}{m_n}\right)\|\nu_n\| = 2\left(1-\frac{p_n}{m_n}\right),$$

which tends to 0 in view of (i).

(iii) Here we write

$$\begin{aligned}
|E[f(X,\xi)\mid\xi B_n>0] - Q_{X,\xi}f|
&\le |E[f(X,\xi)\mid\xi B_n>0] - E[f(X,\xi)\mid\xi B_n=1]| \\
&\quad+ |E[f(X,\xi) - f(\theta_{\sigma_n}(X,\xi))\mid\xi B_n=1]| \\
&\quad+ |E[f(\theta_{\sigma_n}(X,\xi))\mid\xi B_n=1] - Q_{X,\xi}f|.
\end{aligned}$$

By (i) and (ii) the first and last terms on the right tend to 0 as $n\to\infty$. To estimate the second term, we introduce on $S^{\mathbb{R}^d}\times\mathcal{N}(\mathbb{R}^d)$ the bounded, measurable functions

$$g_\varepsilon(x,\mu) = \sup_{|s|<\varepsilon}|f(\theta_s(x,\mu)) - f(x,\mu)|, \qquad \varepsilon>0.$$

For convenience, we may sometimes write $g\cdot\mu = g\mu$.

Theorem 11.6 (pointwise averages) Consider a stationary and ergodic pair $(X,\xi)$ on $\mathbb{R}^d$, where $X$ is a measurable process in $S$ and $\xi$ is a random measure with $\bar\xi\in(0,\infty)$ a.s. Let $\mathcal{L}(Y,\eta)=Q_{X,\xi}$. Then for any bounded, measurable function $f$ and asymptotically invariant distributions $\mu_n$ or weight functions $g_n$ on $\mathbb{R}^d$, we have

(i) $f\mu_n(Y,\eta) \overset{P}{\to} Ef(X,\xi)$;

(ii) $fg_n\xi(X,\xi) \overset{P}{\to} Ef(Y,\eta)$.

The same convergence holds a.s. when $\mu_n=1_{B_n}\cdot\lambda^d$ or $g_n=1_{B_n}$, respectively, for some bounded, convex sets $B_1\subset B_2\subset\cdots$ in $\mathcal{B}^d$ with $r(B_n)\to\infty$.

We can give a short and transparent proof by using the general shift coupling in Theorem 10.28. Since the latter result applies directly only when the sample intensity $\bar\xi$ is a constant (which holds in particular when $\xi$ is ergodic), we need to replace the Palm distribution $Q_{X,\xi}$ in (1) by a suitably modified version $Q'_{X,\xi}$, given for $f\ge0$ and $B\in\mathcal{B}^d$ with $|B|\in(0,\infty)$ by

$$Q'_{X,\xi}f = E\int_B f(\theta_s(X,\xi))\,\xi(ds)\big/\bar\xi|B|,$$

whenever $\bar\xi\in(0,\infty)$ a.s. If $\xi$ is ergodic, we note that $\bar\xi=E\bar\xi$ a.s., and therefore $Q'_{X,\xi}=Q_{X,\xi}$. As previously for $Q_{X,\xi}$, it is both suggestive and convenient to introduce a random pair $(Z,\zeta)$ with distribution $Q'_{X,\xi}$.

Lemma 11.7 (shift coupling, Thorisson) Consider a stationary pair $(X,\xi)$ on $\mathbb{R}^d$, where $X$ is a measurable process in $S$ and $\xi$ is a random measure with $\bar\xi\in(0,\infty)$ a.s. Let $\mathcal{L}(Z,\zeta)=Q'_{X,\xi}$. Then there exist some random vectors $\sigma$ and $\tau$ in $\mathbb{R}^d$ such that

$$(X,\xi) \overset{d}{=} \theta_\sigma(Z,\zeta), \qquad (Z,\zeta) \overset{d}{=} \theta_\tau(X,\xi).$$

The result suggests that we think of $Q'_{X,\xi}$ as the distribution of $(X,\xi)$ shifted to a "typical" point of $\xi$. Note that this interpretation fails for $Q_{X,\xi}$ in general.

Proof: Write $\mathcal{I}$ for the shift-invariant σ-field in the measurable path space of $(X,\xi)$, and put $\mathcal{I}_{X,\xi}=(X,\xi)^{-1}\mathcal{I}$. Letting $B=[0,1]^d$ and noting that


$E[\xi B\mid\mathcal{I}_{X,\xi}]=\bar\xi$, we get for any $I\in\mathcal{I}$

$$P\{(Z,\zeta)\in I\} = E\int_B 1_I(\theta_s(X,\xi))\,\xi(ds)\big/\bar\xi = E[\xi B/\bar\xi;\ (X,\xi)\in I] = P\{(X,\xi)\in I\},$$

which shows that $(X,\xi)\overset{d}{=}(Z,\zeta)$ on $\mathcal{I}$. Both assertions now follow from Theorem 10.28. □

Proof of Theorem 11.6: (i) By Lemma 11.7 we may assume that $(Y,\eta)=\theta_\tau(X,\xi)$ for some random element $\tau$ in $\mathbb{R}^d$. Using Corollary 10.20 (i) and the asymptotic invariance of $\mu_n$, we get (i). The a.s. version follows in the same way from Theorem 10.14.

(ii) Let $\xi_f$ be the stationary and ergodic random measure in Lemma 11.2. Applying Corollary 10.20 (ii) to both $\xi$ and $\xi_f$ and using (1), we obtain

$$fg_n\xi(X,\xi) = \frac{\xi_f g_n}{\lambda^d g_n}\cdot\frac{\lambda^d g_n}{\xi g_n} \overset{P}{\to} \frac{\bar\xi_f}{\bar\xi} = \frac{E\xi_f B}{E\xi B} = Ef(Y,\eta).$$

For the pointwise version, we may use Corollary 10.19 instead. □

Taking expected values in Theorem 11.6, we get for bounded $f$ the formulas

$$Ef\mu_n(Y,\eta) \to Ef(X,\xi), \qquad Efg_n\xi(X,\xi) \to Ef(Y,\eta),$$

which may be interpreted as limit theorems for suitable space averages of the distributions $\mathcal{L}(X,\xi)$ and $\mathcal{L}(Y,\eta)$. We shall prove the less obvious fact that both relations hold uniformly for bounded $f$. For a striking formulation, we may introduce the possibly defective distributions $\mathcal{L}_\mu(X,\xi)$ and $\mathcal{L}_{g\xi}(X,\xi)$, given for measurable functions $f\ge0$ by

$$\mathcal{L}_\mu(X,\xi)f = \int Ef(\theta_s(X,\xi))\,\mu(ds), \qquad \mathcal{L}_{g\xi}(X,\xi)f = E\,fg\xi(X,\xi).$$

Theorem 11.8 (distributional averages, Slivnyak, Zähle) Consider a stationary pair $(X,\xi)$ on $\mathbb{R}^d$, where $X$ is a measurable process in $S$ and $\xi$ is a random measure with $\bar\xi\in(0,\infty)$ a.s. Let $\mathcal{L}(Z,\zeta)=Q'_{X,\xi}$. Then for any asymptotically invariant distributions $\mu_n$ or weight functions $g_n$ on $\mathbb{R}^d$,

(i) $\|\mathcal{L}_{\mu_n}(Z,\zeta) - \mathcal{L}(X,\xi)\| \to 0$;

(ii) $\|\mathcal{L}_{g_n\xi}(X,\xi) - \mathcal{L}(Z,\zeta)\| \to 0$.

Proof: (i) By Lemma 11.7 we may assume that $(Z,\zeta)=\theta_\tau(X,\xi)$. Using Fubini's theorem and the stationarity of $(X,\xi)$, we get for any measurable function $f\ge0$

$$\mathcal{L}_{\mu_n}(X,\xi)f = \int Ef(\theta_s(X,\xi))\,\mu_n(ds) = Ef(X,\xi) = \mathcal{L}(X,\xi)f.$$


Hence, by Fubini's theorem and dominated convergence,

$$\|\mathcal{L}_{\mu_n}(Z,\zeta) - \mathcal{L}(X,\xi)\| = \|\mathcal{L}_{\mu_n}(\theta_\tau(X,\xi)) - \mathcal{L}_{\mu_n}(X,\xi)\| \le E\left\|\int 1\{\theta_s(X,\xi)\in\cdot\}\,(\mu_n-\theta_\tau\mu_n)(ds)\right\| \le E\|\mu_n-\theta_\tau\mu_n\| \to 0.$$

(ii) Letting $0\le f\le1$ and defining $\xi_f$ as in Lemma 11.2, we get

$$\xi_f g_n = \int f(\theta_s(X,\xi))\,g_n(s)\,\xi(ds) \le \xi g_n.$$

Interpreting $\xi_f g_n/\xi g_n$ as 0 when $\xi g_n=0$, we obtain

$$|\mathcal{L}_{g_n\xi}(X,\xi)f - \mathcal{L}(Z,\zeta)f| = |Efg_n\xi(X,\xi) - Ef(Z,\zeta)| \le E\left|\frac{\xi_f g_n}{\xi g_n} - \frac{\bar\xi_f}{\bar\xi}\right| \to 0. \qquad \Box$$

To justify the statement, we note that singularity is a measurable property of a measure $\mu$. Indeed, by Proposition 2.21, it is equivalent that the function $F_t=\mu[0,t]$ be singular. Now it is easy to check that the singularity of $F$ can be described by countably many conditions, each involving the increments of $F$ over finitely many intervals with rational endpoints.

Exercises

20. … when $S$ is finite. (Hint: Show as in Lemma 11.19 that $\|\sup_n I(\xi|\mathcal{F}_n)\|_p<\infty$ when $\xi$ is $S$-valued, and use Corollary 10.8 (ii).)

21. Show that $H(\xi,\eta) \le H(\xi)+H(\eta)$ for any $\xi$ and $\eta$. (Hint: Note that $H(\eta|\xi)\le H(\eta)$ by Jensen's inequality.)

22. Give an example of a stationary Markov chain $(\xi_n)$ such that $H(\xi_1)>0$ but $H(\xi_1|\xi_0)=0$.

23. Give an example of a stationary Markov chain $(\xi_n)$ such that $H(\xi_1)=\infty$ but $H(\xi_1|\xi_0)<\infty$. (Hint: Choose the state space $\mathbb{Z}_+$, and consider transition probabilities $p_{ij}$ that equal 0 unless $j=i+1$ or $j=0$.)

Chapter 12

Poisson and Pure Jump-Type Markov Processes

Random measures and point processes; Cox processes, randomization, and thinning; mixed Poisson and binomial processes; independence and symmetry criteria; Markov transition and rate kernels; embedded Markov chains and explosion; compound and pseudo-Poisson processes; ergodic behavior of irreducible chains

Poisson processes and Brownian motion constitute the basic building blocks of modern probability theory. Our first goal in this chapter is to introduce the family of Poisson and related processes. In particular, we construct Poisson processes on bounded sets as mixed binomial processes and derive a variety of Poisson characterizations in terms of independence, symmetry, and renewal properties. A randomization of the underlying intensity measure leads to the richer class of Cox processes. We also consider the related randomizations of general point processes, obtainable through independent motions of the individual point masses. In particular, we will see how the latter type of transformations preserve the Poisson property.

It is usually most convenient to regard Poisson and other point processes on an abstract space as integer-valued random measures. The relevant parts of this chapter may then serve at the same time as an introduction to random measure theory. In particular, Cox processes and randomizations will be used to derive some general uniqueness criteria for simple point processes and diffuse random measures. The notions and results of this chapter form a basis for the corresponding weak convergence theory developed in Chapter 16, where Poisson and Cox processes appear as limits in important special cases.

Our second goal is to continue the theory of Markov processes from Chapter 8 with a detailed study of pure jump-type processes. The evolution of such a process is governed by a rate kernel $\alpha$, which determines both the rate at which transitions occur and the associated transition probabilities. For bounded $\alpha$ one gets a pseudo-Poisson process, which may be described as a discrete-time Markov chain with transition times given by an independent, homogeneous Poisson process. Of special interest is the case of compound Poisson processes, where the underlying Markov chain is a random walk. In Chapter 19 we shall see how every Feller process can be


approximated in a natural way by pseudo-Poisson processes, recognized in that context by the boundedness of their generators. A similar compound Poisson approximation of general Lévy processes is utilized in Chapter 15. In addition to the already mentioned connections to other topics, we note the fundamental role of Poisson processes for the theory of Lévy processes in Chapter 15 and for excursion theory in Chapter 22. In Chapter 25 the independent-increment characterization of Poisson processes is extended to a criterion in terms of compensators, and we derive some related time-change results. Finally, the ergodic theory for continuous-time Markov chains, developed at the end of this chapter, is analogous to the discrete-time theory of Chapter 8 and will be extended in Chapter 20 to a general class of Feller processes. A related theory for diffusions appears in Chapter 23.
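The compound Poisson case described above is simple to simulate (a sketch, not from the book; the rate, horizon, and Gaussian jump law are illustrative choices): jump times come from a homogeneous Poisson process, and the jump sizes are the i.i.d. increments of the underlying random walk.

```python
import random

random.seed(2)

def compound_poisson_path(rate, jump, T):
    """Piecewise-constant path on [0, T]: jump times from a homogeneous
    Poisson process of the given rate, jump sizes i.i.d. from `jump`
    (the increments of the underlying random walk)."""
    t, x, path = 0.0, 0.0, [(0.0, 0.0)]
    while True:
        t += random.expovariate(rate)   # exponential holding time
        if t > T:
            return path
        x += jump()                     # random-walk increment
        path.append((t, x))

path = compound_poisson_path(rate=2.0, jump=lambda: random.gauss(0.0, 1.0), T=1000.0)
n_jumps = len(path) - 1
print(n_jumps)   # close to rate * T = 2000 on average
```

Between jumps the path is constant, so the embedded discrete-time chain is exactly the random walk of partial sums of the jump variables.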

To introduce the basic notions of random measure theory, consider an arbitrary measurable space $(S,\mathcal{S})$. By a random measure on $S$ we mean a σ-finite kernel $\xi$ from the basic probability space $(\Omega,\mathcal{A},P)$ into $S$. Here the σ-finiteness means that there exists a partition $B_1,B_2,\ldots\in\mathcal{S}$ of $S$ such that $\xi B_k<\infty$ a.s. for all $k$. It is often convenient to think of $\xi$ as a random element in the space $\mathcal{M}(S)$ of σ-finite measures on $S$, endowed with the σ-field generated by the projection maps $\pi_B:\mu\mapsto\mu B$ for arbitrary $B\in\mathcal{S}$. Note that $\xi B=\xi(\cdot,B)$ is a random variable in $[0,\infty]$ for every $B\in\mathcal{S}$. More generally, it is clear by a simple approximation that $\xi f=\int f\,d\xi$ is a random variable in $[0,\infty]$ for every measurable function $f\ge0$ on $S$. The intensity of $\xi$ is defined as the measure $E\xi B=E(\xi B)$, $B\in\mathcal{S}$.

We often encounter the situation when $S$ is a topological space with Borel σ-field $\mathcal{S}=\mathcal{B}(S)$. In the special case when $S$ is a locally compact, second countable Hausdorff space (abbreviated as lcscH), it is understood that $\xi$ is a.s. finite on the ring $\hat{\mathcal{S}}$ of all relatively compact Borel sets. Equivalently, we assume that $\xi f<\infty$ a.s. for every $f\in\hat C_K(S)$, the class of continuous functions $f\ge0$ on $S$ with compact support. In this case, the σ-field in $\mathcal{M}(S)$ is generated by the projections $\pi_f:\mu\mapsto\mu f$ for all $f\in\hat C_K(S)$.

The following elementary result provides the basic uniqueness criteria for random measures. Stronger results are given for simple point processes and diffuse random measures in Theorem 12.8, and related convergence criteria appear in Theorem 16.16.

Lemma 12.1 (uniqueness for random measures) Let $\xi$ and $\eta$ be random measures on $S$. Then $\xi\overset{d}{=}\eta$ under each of these conditions:

(i) $(\xi B_1,\ldots,\xi B_n) \overset{d}{=} (\eta B_1,\ldots,\eta B_n)$ for any $B_1,\ldots,B_n\in\mathcal{S}$, $n\in\mathbb{N}$;

(ii) $\xi f \overset{d}{=} \eta f$ for any measurable function $f\ge0$ on $S$.

If $S$ is lcscH, it suffices in (ii) to consider functions $f\in\hat C_K(S)$.

Proof: The sufficiency of (i) is clear from Proposition 3.2. Next we note that (i) follows from (ii), as we apply the latter condition to any positive linear combination $f=\sum_k c_k1_{B_k}$ and use the Cramér–Wold Corollary 5.5.


Now assume that $S$ is lcscH, and that (ii) holds for all $f\in\hat C_K(S)$. Since $\hat C_K(S)$ is closed under positive linear combinations, we see as before that

$$(\xi f_1,\ldots,\xi f_n) \overset{d}{=} (\eta f_1,\ldots,\eta f_n), \qquad f_1,\ldots,f_n\in\hat C_K,\ n\in\mathbb{N}.$$

By Theorem 1.1 it follows that $\mathcal{L}(\xi)=\mathcal{L}(\eta)$ on the σ-field $\mathcal{G}=\sigma\{\pi_f;\ f\in\hat C_K\}$, where $\pi_f:\mu\mapsto\mu f$, and it remains to show that $\mathcal{G}$ contains $\mathcal{F}=\sigma\{\pi_B;\ B\in\hat{\mathcal{S}}\}$. Then fix any compact set $K\subset S$, and choose some functions $f_n\in\hat C_K$ with $f_n\downarrow1_K$. Since $\mu f_n\downarrow\mu K$ for every $\mu\in\mathcal{M}(S)$, the mapping $\pi_K$ is $\mathcal{G}$-measurable by Lemma 1.10. Next apply Theorem 1.1 to the Borel subsets of an arbitrary compact set, to see that $\pi_B$ is $\mathcal{G}$-measurable for any $B\in\hat{\mathcal{S}}$. Hence, $\mathcal{F}\subset\mathcal{G}$. □

By a point process on $S$ we mean an integer-valued random measure $\xi$. In other words, we assume $\xi B$ to be a $\mathbb{Z}_+$-valued random variable for every $B\in\mathcal{S}$. Alternatively, we may think of $\xi$ as a random element in the space $\mathcal{N}(S)\subset\mathcal{M}(S)$ of all σ-finite, integer-valued measures on $S$. When $S$ is Borel, we may write $\xi=\sum_{k\le\kappa}\delta_{\gamma_k}$ for some random elements $\gamma_1,\gamma_2,\ldots$ in $S$ and $\kappa$ in $\overline{\mathbb{Z}}_+$, and we note that $\xi$ is simple iff the $\gamma_k$ with $k\le\kappa$ are distinct. In general, we may eliminate the possible multiplicities to create a simple point process $\xi^*$, which agrees with the counting measure on the support of $\xi$. By construction it is clear that $\xi^*$ is a measurable function of $\xi$.

A random measure $\xi$ on a measurable space $S$ is said to have independent increments if the random variables $\xi B_1,\ldots,\xi B_n$ are independent for any disjoint sets $B_1,\ldots,B_n\in\mathcal{S}$. By a Poisson process on $S$ with intensity measure $\mu\in\mathcal{M}(S)$ we mean a point process $\xi$ on $S$ with independent increments such that $\xi B$ is Poisson with mean $\mu B$ whenever $\mu B<\infty$. By Lemma 12.1 the stated conditions specify the distribution of $\xi$, which is then determined by the intensity measure $\mu$. More generally, for any random measure $\eta$ on $S$, we say that a point process $\xi$ is a Cox process directed by $\eta$ if it is conditionally Poisson, given $\eta$, with $E[\xi|\eta]=\eta$ a.s. In particular, we may take $\eta=\alpha\mu$ for some measure $\mu\in\mathcal{M}(S)$ and random variable $\alpha\ge0$ to form a mixed Poisson process based on $\mu$ and $\alpha$.
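On a set of finite intensity mass, a mixed Poisson count is easy to realize numerically (a simulation sketch, not from the book; the uniform mixing law and the value $\mu B=5$ are illustrative): first draw $\alpha$, then a conditionally Poisson count with mean $\alpha\mu B$. The resulting counts are overdispersed, since $\mathrm{var}(\xi B)=E\alpha\cdot\mu B+\mathrm{var}(\alpha)(\mu B)^2$ exceeds the mean $E\alpha\cdot\mu B$.

```python
import math
import random

random.seed(3)

def poisson_count(mean):
    # inverse-transform sampling of a Poisson(mean) variable
    u, k = random.random(), 0
    p = math.exp(-mean)
    c = p
    while u > c and k < 10000:
        k += 1
        p *= mean / k
        c += p
    return k

# mixed Poisson count on a set B with mu(B) = 5 and alpha uniform on (0, 2):
# E xi(B) = E(alpha) mu(B) = 5, var xi(B) = 5 E(alpha) + 25 var(alpha) = 13.33...
muB = 5.0
counts = [poisson_count(random.uniform(0.0, 2.0) * muB) for _ in range(200000)]
m = sum(counts) / len(counts)
v = sum((c - m) ** 2 for c in counts) / len(counts)
print(round(m, 1), round(v, 1))
```

For a plain Poisson process the two printed numbers would agree; the excess of the variance over the mean is the signature of the random directing measure.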
We next define a ν-randomization $\zeta$ of an arbitrary point process $\xi$ on $S$, where $\nu$ is a probability kernel from $S$ to some measurable space $T$. Assuming first that $\xi$ is nonrandom and equal to $\mu=\sum_k\delta_{s_k}$, we may take $\zeta=\sum_k\delta_{s_k,\gamma_k}$, where the $\gamma_k$ are independent random elements in $T$ with distributions $\nu(s_k,\cdot)$. Note that the distribution $P_\mu$ of $\zeta$ depends only on $\mu$. In general, we define a ν-randomization $\zeta$ of $\xi$ by the condition $P[\zeta\in\cdot\,|\,\xi]=P_\xi$ a.s. In the special case when $T=\{0,1\}$ and $\nu(s,\{0\})\equiv p\in[0,1]$, we refer to the point process $\xi_p=\zeta(\cdot\times\{0\})$ on $S$ as a $p$-thinning of $\xi$. Another special instance is when $S=\{0\}$, $\xi=\kappa\delta_0$, and $\nu=\mu/\mu T$ for some $\mu\in\mathcal{M}(T)$ with $\mu T\in(0,\infty)$, in which case $\zeta$ is called a mixed binomial (or sample) process based on $\mu$ and $\kappa$. Note that $\zeta B$ is then binomially distributed, conditionally on $\kappa$, with parameters $\nu B$ and $\kappa$. If $T$ is Borel,


we can write $\zeta=\sum_{k\le\kappa}\delta_{\sigma_k,\gamma_k}$.

For any $B_1,\ldots,B_n\in\mathcal{S}\otimes\mathcal{T}$ and $s_1,\ldots,s_n\in(0,1)$, we get by Lemma 12.2 (iii)

$$E\exp\Big(\zeta_\mu\sum\nolimits_k 1_{B_k}\log s_k\Big) = \exp\Big(\mu\log\nu\exp\sum\nolimits_k 1_{B_k}\log s_k\Big) = \exp\Big(\mu\log\nu\prod\nolimits_k s_k^{1_{B_k}}\Big).$$

Using Lemma 1.41 (i) twice, we see that $\nu\prod_k s_k^{1_{B_k}}$ is a measurable function on $S$ for fixed $B_1,\ldots,B_n$ and $s_1,\ldots,s_n$, and hence that the right-hand side is a measurable function of $\mu$. Differentiating $m_k$ times with respect to $s_k$ for each $k$ and taking $s_1=\cdots=s_n=0$, we conclude that the probability $P\bigcap_k\{\zeta_\mu B_k=m_k\}$ is a measurable function of $\mu$ for any $m_1,\ldots,m_n\in\mathbb{Z}_+$. As before, it follows that $P_\mu=\mathcal{L}(\zeta_\mu)$ is a probability kernel from $\mathcal{N}(S)$ to $\mathcal{N}(S\times T)$, and the general result follows by Lemma 6.9. □

We may use Cox transformations and thinnings to derive some general uniqueness criteria for simple point processes and diffuse random measures, improving the elementary statements in Lemma 12.1. Related convergence criteria are given in Proposition 16.17 and Theorems 16.28 and 16.29.
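The preservation of the Poisson property under thinning, used repeatedly below, can be checked empirically (a simulation sketch; the rate, interval, and retention probability are illustrative choices): keeping each point of a Poisson process independently with probability $p$ yields counts whose mean and variance both equal $p$ times the original intensity mass, as they must for a Poisson count.

```python
import random

random.seed(4)

def poisson_points(rate, T):
    # homogeneous Poisson process on (0, T] via exponential spacings
    pts, t = [], 0.0
    while True:
        t += random.expovariate(rate)
        if t > T:
            return pts
        pts.append(t)

def thin(points, p):
    # keep each point independently with probability p
    return [t for t in points if random.random() < p]

rate, T, p = 3.0, 10.0, 0.4
counts = [len(thin(poisson_points(rate, T), p)) for _ in range(50000)]
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
# for a Poisson count, mean and variance agree: both p * rate * T = 12
print(round(mean, 1), round(var, 1))
```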


Theorem 12.8 (one-dimensional uniqueness criteria) Let $(S,\mathcal{S})$ be Borel.

(i) For any simple point processes $\xi$ and $\eta$ on $S$, we have $\xi\overset{d}{=}\eta$ iff $P\{\xi B=0\}=P\{\eta B=0\}$ for all $B\in\mathcal{S}$.

(ii) Let $\xi$ and $\eta$ be simple point processes or diffuse random measures on $S$, and fix any $c>0$. Then $\xi\overset{d}{=}\eta$ iff $Ee^{-c\xi B}=Ee^{-c\eta B}$ for all $B\in\mathcal{S}$.

(iii) Let $\xi$ be a simple point process or diffuse random measure on $S$, and let $\eta$ be an arbitrary random measure on $S$. Then $\xi\overset{d}{=}\eta$ iff $\xi B\overset{d}{=}\eta B$ for all $B\in\mathcal{S}$.

Proof: We may clearly assume that $S=(0,1]$.

(i) Let $\mathcal{C}$ denote the class of sets $\{\mu;\ \mu B=0\}$ with $B\in\mathcal{S}$, and note that $\mathcal{C}$ is a π-system since

$$\{\mu B=0\}\cap\{\mu C=0\} = \{\mu(B\cup C)=0\}, \qquad B,C\in\mathcal{S}.$$

By Theorem 1.1 it follows that $\xi\overset{d}{=}\eta$ on $\sigma(\mathcal{C})$. Furthermore, writing $I_{nj}=2^{-n}(j-1,j]$ for $n\in\mathbb{N}$ and $j=1,\ldots,2^n$, we have

$$\mu^*B = \lim_{n\to\infty}\sum_j\big(\mu(B\cap I_{nj})\wedge1\big), \qquad \mu\in\mathcal{N}(S),\ B\in\mathcal{S},$$

which shows that the mapping $\mu\mapsto\mu^*$ is $\sigma(\mathcal{C})$-measurable. Since $\xi$ and $\eta$ are simple, we conclude that $\xi=\xi^*\overset{d}{=}\eta^*=\eta$.

(ii) First let $\xi$ and $\eta$ be diffuse. By Theorem 12.7 we may choose some Cox processes $\tilde\xi$ and $\tilde\eta$ directed by $c\xi$ and $c\eta$. Conditioning on $\xi$ or $\eta$, respectively, we obtain

$$P\{\tilde\xi B=0\} = Ee^{-c\xi B} = Ee^{-c\eta B} = P\{\tilde\eta B=0\}, \qquad B\in\mathcal{S}. \qquad (2)$$

Since $\tilde\xi$ and $\tilde\eta$ are a.s. simple by Corollary 12.5, assertion (i) yields $\tilde\xi\overset{d}{=}\tilde\eta$, and so $\xi\overset{d}{=}\eta$ by Lemma 12.6. If $\xi$ and $\eta$ are instead simple point processes, then (2) holds by Lemma 12.2 (iv) when $\tilde\xi$ and $\tilde\eta$ are $p$-thinnings of $\xi$ and $\eta$ with $p=1-e^{-c}$, and the proof may be completed as before.

(iii) First let $\xi$ be a simple point process. Fix any $B\in\mathcal{S}$ such that $\eta B<\infty$ a.s. Defining $I_{nj}$ as before, we note that $\eta(B\cap I_{nj})\in\mathbb{Z}_+$ outside a fixed null set. It follows easily that $1_B\cdot\eta$ is a.s. integer valued, and so even $\eta$ is a.s. a point process. Noting that

$$P\{\eta^*B=0\} = P\{\eta B=0\} = P\{\xi B=0\}, \qquad B\in\mathcal{S},$$

we conclude from (i) that $\xi\overset{d}{=}\eta^*$. In particular, $\eta B\overset{d}{=}\xi B\overset{d}{=}\eta^*B$ for all $B$, and so $\eta^*=\eta$ a.s.

Next assume that $\xi$ is a.s. diffuse. Letting $\tilde\xi$ and $\tilde\eta$ be Cox processes directed by $\xi$ and $\eta$, we note that $\tilde\xi B\overset{d}{=}\tilde\eta B$ for every $B\in\mathcal{S}$. Since $\tilde\xi$ is a.s. simple by Corollary 12.5, it follows as before that $\tilde\xi\overset{d}{=}\tilde\eta$, and so $\xi\overset{d}{=}\eta$ by Lemma 12.6. □


As an easy consequence, we get the following characterization of Poisson processes. To simplify the statement, we may allow a Poisson random variable to have infinite mean, hence to be a.s. infinite.

Corollary 12.9 (one-dimensional Poisson criterion, Rényi) Let $\xi$ be a random measure on a Borel space $S$ such that $\xi\{s\}=0$ a.s. for all $s\in S$. Then $\xi$ is a Poisson process iff $\xi B$ is Poisson for every $B\in\mathcal{S}$, in which case $E\xi$ is σ-finite and diffuse.

Proof: Assume the stated condition. Then $\mu=E\xi$ is clearly σ-finite and diffuse, and by Theorem 12.7 there exists a Poisson process $\eta$ on $S$ with intensity $\mu$. Then $\eta B\overset{d}{=}\xi B$ for all $B\in\mathcal{S}$, and since $\eta$ is a.s. simple by Corollary 12.5, we conclude from Theorem 12.8 that $\xi\overset{d}{=}\eta$. □

Much of the previous theory can be extended to the case of marks. Given any measurable spaces $(S,\mathcal{S})$ and $(K,\mathcal{K})$, we define a $K$-marked point process on $S$ as a point process $\xi$ on $S\times K$ in the usual sense satisfying $\xi(\{s\}\times K)\le1$ identically and such that the projections $\xi(\cdot\times K_j)$ are σ-finite point processes on $S$ for some measurable partition $K_1,K_2,\ldots$ of $K$. We say that $\xi$ has independent increments if the point processes $\xi(B_1\times\cdot),\ldots,\xi(B_n\times\cdot)$ on $K$ are independent for any disjoint sets $B_1,\ldots,B_n\in\mathcal{S}$. We also say that $\xi$ is a Poisson process if $\xi$ is Poisson in the usual sense on the product space $S\times K$.

The following result characterizes Poisson processes in terms of the independence property. The result plays a crucial role in Chapters 15 and 22. A related characterization in terms of compensators is given in Corollary 25.25.

Theorem 12.10 (independence criterion for Poisson, Erlang, Lévy) Let $\xi$ be a $K$-marked point process on a Borel space $S$ such that $\xi(\{s\}\times K)=0$ a.s. for all $s\in S$. Then $\xi$ is Poisson iff it has independent increments, in which case $E\xi$ is σ-finite with diffuse projections onto $S$.

Proof: We may assume that $S=(0,1]$. Fix any set $B\in\mathcal{S}\otimes\mathcal{K}$ with $\xi B<\infty$ a.s., and note that the projection $\eta=(1_B\cdot\xi)(\cdot\times K)$ is a simple point process on $S$ with independent increments such that $\eta\{s\}=0$ a.s. for all $s\in S$. Introduce the dyadic intervals $I_{nj}=2^{-n}(j-1,j]$, and note that $\max_j\eta I_{nj}\vee1\to1$ a.s. Next fix any $\varepsilon>0$. By dominated convergence, every point $s\in[0,1]$ has an open neighborhood $G_s$ such that $P\{\eta G_s>0\}<\varepsilon$, and by compactness we may cover $[0,1]$ by finitely many such sets $G_1,\ldots,G_m$. Choosing $n$ so large that every interval $I_{nj}$ lies in one of the $G_k$, we get $\max_j P\{\eta I_{nj}>0\}<\varepsilon$. This shows that the variables $\eta I_{nj}$ form a null array. Now apply Theorem 5.7 to see that the random variable $\xi B=\eta S=\sum_j\eta I_{nj}$ is Poisson. Since $B$ was arbitrary, Corollary 12.9 then shows that $\xi$ is a Poisson process on $S\times K$. The last assertion is now obvious. □
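The null-array step in this proof has a simple numerical counterpart (a hypothetical sketch; the cell count, occupation probability, and sample size are illustrative): a sum of many independent, rarely positive indicator counts is nearly Poisson, which is how $\xi B=\sum_j\eta I_{nj}$ acquires its Poisson law via Theorem 5.7.

```python
import random

random.seed(9)

m, q = 500, 0.006      # many independent cells, each rarely occupied

def total_count():
    # sum of a null array of independent indicator counts
    return sum(1 for _ in range(m) if random.random() < q)

samples = [total_count() for _ in range(20000)]
mean = sum(samples) / len(samples)
p0 = sum(1 for c in samples if c == 0) / len(samples)
# compare with Poisson(m * q) = Poisson(3): P{count = 0} = exp(-3) = 0.0498...
print(round(mean, 2), round(p0, 3))
```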


The last theorem yields in particular a representation of random measures with independent increments. A version for general processes on $\mathbb{R}_+$ will be proved in Theorem 15.4.

Corollary 12.11 (independent increments) Let $\xi$ be a random measure on a Borel space $S$ such that $\xi\{s\}=0$ a.s. for all $s$. Then $\xi$ has independent increments iff a.s.

$$\xi B = \alpha B + \int_0^\infty x\,\eta(B\times dx), \qquad B\in\mathcal{S}, \qquad (3)$$

for some nonrandom measure $\alpha$ on $S$ and some Poisson process $\eta$ on $S\times(0,\infty)$. Furthermore, $\xi B<\infty$ a.s. for some $B\in\mathcal{S}$ iff $\alpha B<\infty$ and

$$\int_0^\infty(x\wedge1)\,E\eta(B\times dx) < \infty. \qquad (4)$$

Proof: Introduce on $S\times(0,\infty)$ the point process $\eta=\sum_s\delta_{s,\xi\{s\}}$, where the required measurability follows by a simple approximation. Noting that $\eta$ has independent $S$-increments, and also that

$$\eta(\{s\}\times(0,\infty)) = 1\{\xi\{s\}>0\} \le 1, \qquad s\in S,$$

we conclude from Theorem 12.10 that $\eta$ is a Poisson process. Subtracting the atomic part from $\xi$, we get a diffuse random measure $\alpha$ satisfying (3), and we note that $\alpha$ has again independent increments. Hence, $\alpha$ is a.s. nonrandom by Theorem 5.11. Next, Lemma 12.2 (i) yields for any $B\in\mathcal{S}$ and $r>0$

$$-\log E\exp\Big\{-r\int_0^\infty x\,\eta(B\times dx)\Big\} = \int_0^\infty(1-e^{-rx})\,E\eta(B\times dx).$$

As $r\to0$, it follows by dominated convergence that $\int_0^\infty x\,\eta(B\times dx)<\infty$ a.s. iff (4) holds. □

We proceed to characterize the mixed Poisson and binomial processes by a natural symmetry condition. Related results for more general processes appear in Theorems 11.15 and 16.21. Given a random measure $\xi$ and a diffuse measure $\mu$ on $S$, we say that $\xi$ is $\mu$-symmetric if $\xi\circ f^{-1}\overset{d}{=}\xi$ for every $\mu$-preserving mapping $f$ on $S$.

Theorem 12.12 (symmetric point processes) Consider a simple point process $\xi$ and a diffuse, σ-finite measure $\mu$ on a Borel space $S$. Then $\xi$ is $\mu$-symmetric iff it is a mixed Poisson or binomial process based on $\mu$.

~tU';, 1\ 1) --+ =· Show that I(~- ry)fnl !+ oo. (Hint: Consider the symmetrization i/ of a fixed measure v E N(S) with v f~ --+ oo, and argue along subsequences as in the proof of Theorem 4.17.) 22. For any pure jump-type Markov process on S, show that Px{r2 ~ t} = o(t) for all x ES. Alsonote that the bound can be sharpened to O{t 2 ) if the rate function is bounded, but not in general. (Hint: Use Lemma 12.16 and dominated convergence.) 23. Show that any transient, discrete-time Markov chain Y can be embedded into an exploding {resp., nonexploding) continuous-time chain X. (Hint: Use Propositions 8.12 and 12.19.)

24. In Corollary 12.21, use the measurability of the mapping $X=Y\circ N$ to deduce the implication (iii) ⇒ (i) from its converse. (Hint: Proceed as in the proof of Proposition 12.15.) Also use Proposition 12.3 to show that (iii) implies (ii), and prove the converse by means of Theorem 12.10.


25. Consider a pure jump-type Markov process on $(S,\mathcal{S})$ with transition kernels $\mu_t$ and rate kernel $\alpha$. Show for any $x\in S$ and $B\in\mathcal{S}$ that $\alpha(x,B)=\dot\mu_0(x,B\setminus\{x\})$. (Hint: Take $f=1_{B\setminus\{x\}}$ in Theorem 12.22, and use dominated convergence.)

26. Use Theorem 12.22 to derive a system of differential equations for the transition functions $p_{ij}(t)$ of a continuous-time Markov chain. (Hint: Take $f(i)=\delta_{ij}$ for fixed $j$.)

27. Give an example of a positive recurrent, continuous-time Markov chain such that the embedded discrete-time chain is null-recurrent, and vice versa. (Hint: Use Proposition 12.23.)

28. Establish Theorem 12.25 by a direct argument, mimicking the proof of Theorem 8.18.

Chapter 13

Gaussian Processes and Brownian Motion

Symmetries of Gaussian distribution; existence and path properties of Brownian motion; strong Markov and reflection properties; arcsine and uniform laws; law of the iterated logarithm; Wiener integrals and isonormal Gaussian processes; multiple Wiener–Itô integrals; chaos expansion of Brownian functionals

The main purpose of this chapter is to initiate the study of Brownian motion, arguably the single most important object in modern probability theory. Indeed, we shall see in Chapters 14 and 16 how the Gaussian limit theorems of Chapter 5 can be extended to approximations of broad classes of random walks and discrete-time martingales by a Brownian motion. In Chapter 18 we show how every continuous local martingale may be represented in terms of Brownian motion through a suitable random time-change. Similarly, the results of Chapters 21 and 23 demonstrate how large classes of diffusion processes may be constructed from Brownian motion by various pathwise transformations. Finally, a close relationship between Brownian motion and classical potential theory is uncovered in Chapters 24 and 25.

The easiest construction of Brownian motion is via a so-called isonormal Gaussian process on $L^2(\mathbb{R}_+)$, whose existence is a consequence of the characteristic spherical symmetry of the multivariate Gaussian distributions. Among the many important properties of Brownian motion, this chapter covers the Hölder continuity and existence of quadratic variation, the strong Markov and reflection properties, the three arcsine laws, and the law of the iterated logarithm.

The values of an isonormal Gaussian process on $L^2(\mathbb{R}_+)$ may be identified with integrals of $L^2$-functions with respect to the associated Brownian motion. Many processes of interest have representations in terms of such integrals, and in particular we shall consider spectral and moving average representations of stationary Gaussian processes. More generally, we shall introduce the multiple Wiener–Itô integrals $I_nf$ of functions $f\in L^2(\mathbb{R}_+^n)$ and establish the fundamental chaos expansion of Brownian $L^2$-functionals.

The present material is related to practically every other chapter in the book. Thus, we refer to Chapter 5 for the definition of Gaussian distributions and the basic Gaussian limit theorem, to Chapter 6 for the transfer theorem, to Chapter 7 for properties of martingales and optional times, to Chapter 8 for basic facts about Markov processes, to Chapter 9 for similarities with random walks, to Chapter 11 for some basic symmetry results, and to Chapter 12 for analogies with the Poisson process. Our study of Brownian motion per se is continued in Chapter 18 with the basic recurrence or transience dichotomy, some further invariance properties, and a representation of Brownian martingales. Brownian local time and additive functionals are studied in Chapter 22. In Chapter 24 we consider some basic properties of Brownian hitting distributions, and in Chapter 25 we examine the relationship between excessive functions and additive functionals of Brownian motion. A further discussion of multiple integrals and chaos expansions appears in Chapter 18.

To begin with some basic definitions, we say that a process $X$ on some parameter space $T$ is Gaussian if the random variable $c_1X_{t_1}+\cdots+c_nX_{t_n}$ is Gaussian for any choice of $n\in\mathbb{N}$, $t_1,\ldots,t_n\in T$, and $c_1,\ldots,c_n\in\mathbb{R}$. This holds in particular if the $X_t$ are independent Gaussian random variables. A Gaussian process $X$ is said to be centered if $EX_t=0$ for all $t\in T$. Let us also say that the processes $X^i$ on $T_i$, $i\in I$, are jointly Gaussian if the combined process $X=\{X^i_t;\ t\in T_i,\ i\in I\}$ is Gaussian. The latter condition is certainly fulfilled if the processes $X^i$ are independent and Gaussian.

The following simple facts clarify the fundamental role of the covariance function. As usual, we assume all distributions to be defined on the σ-fields generated by the evaluation maps.

Lemma 13.1 (covariance function)

(i) The distribution of a Gaussian process $X$ on $T$ is determined by the functions $EX_t$ and $\mathrm{cov}(X_s,X_t)$, $s,t\in T$.

(ii) The jointly Gaussian processes $X^i$ on $T_i$, $i\in I$, are independent iff $\mathrm{cov}(X^i_s,X^j_t)=0$ for all $s\in T_i$ and $t\in T_j$, $i\neq j$ in $I$.

Proof: (i) Let $X$ and $Y$ be Gaussian processes on $T$ with the same means and covariances. Then the random variables $c_1X_{t_1}+\cdots+c_nX_{t_n}$ and $c_1Y_{t_1}+\cdots+c_nY_{t_n}$ have the same mean and variance for any $c_1,\ldots,c_n\in\mathbb{R}$ and $t_1,\ldots,t_n\in T$, $n\in\mathbb{N}$, and since both variables are Gaussian, their distributions must agree. By the Cramér–Wold theorem it follows that $(X_{t_1},\ldots,X_{t_n})\overset{d}{=}(Y_{t_1},\ldots,Y_{t_n})$ for any $t_1,\ldots,t_n\in T$, $n\in\mathbb{N}$, and so $X\overset{d}{=}Y$ by Proposition 3.2.

(ii) Assume the stated condition. To prove the asserted independence, we may assume $I$ to be finite. Introduce some independent processes $Y^i$, $i\in I$, with the same distributions as the $X^i$, and note that the combined processes $X=(X^i)$ and $Y=(Y^i)$ have the same means and covariances. Hence, the joint distributions agree by part (i). In particular, the independence between the processes $Y^i$ implies the corresponding property for the processes $X^i$. □
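Part (i) of the lemma underlies the standard way of simulating a centered Gaussian process at finitely many times (a generic numerical sketch, not from the text; the time grid is an illustrative choice): factor the covariance matrix as $R=LL^{\mathsf T}$ and set $X=L\zeta$ with i.i.d. standard normal $\zeta$, so that $X$ has exactly the prescribed covariances.

```python
import math
import random

random.seed(5)

def cholesky(R):
    # lower-triangular L with R = L L^T (R symmetric positive definite)
    n = len(R)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(R[i][i] - s)
            else:
                L[i][j] = (R[i][j] - s) / L[j][j]
    return L

# covariance of Brownian motion: cov(B_s, B_t) = min(s, t)
times = [0.25, 0.5, 0.75, 1.0]
R = [[min(s, t) for t in times] for s in times]
L = cholesky(R)

def sample_path():
    # X = L z has covariance L L^T = R for i.i.d. standard normal z
    z = [random.gauss(0.0, 1.0) for _ in times]
    return [sum(L[i][k] * z[k] for k in range(len(times))) for i in range(len(times))]

x = sample_path()
print(len(x))
```

With the Brownian covariance $\min(s,t)$ on an equally spaced grid, $L$ has all entries $\sqrt{h}$ below the diagonal, so the recipe reduces to cumulative sums of independent increments.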


The following result characterizes the Gaussian distributions by a simple symmetry property.

Proposition 13.2 (spherical symmetry, Maxwell) Let $\xi_1,\ldots,\xi_d$ be independent random variables, where $d\ge2$. Then the distribution of $(\xi_1,\ldots,\xi_d)$ is spherically symmetric iff the $\xi_i$ are i.i.d. centered Gaussian.

Proof: Let $\varphi$ denote the common characteristic function of $\xi_1,\ldots,\xi_d$, and assume the stated condition. In particular, $-\xi_1\overset{d}{=}\xi_1$, and so $\varphi$ is real valued and symmetric. Noting that $s\xi_1+t\xi_2\overset{d}{=}\xi_1\sqrt{s^2+t^2}$, we obtain the functional equation $\varphi(s)\varphi(t)=\varphi(\sqrt{s^2+t^2})$, and so by iteration $\varphi^n(t)=\varphi(t\sqrt n)$ for all $n$. Thus, for rational $t^2$ we have $\varphi(t)=e^{at^2}$ for some constant $a$, and by continuity this extends to all $t\in\mathbb{R}$. Finally, we have $a\le0$ since $|\varphi|\le1$.

Conversely, let $\xi_1,\ldots,\xi_d$ be i.i.d. centered Gaussian, and assume that $(\eta_1,\ldots,\eta_d)=T(\xi_1,\ldots,\xi_d)$ for some orthogonal transformation $T$. Then both random vectors are Gaussian, and we may easily verify that $\mathrm{cov}(\eta_i,\eta_j)=\mathrm{cov}(\xi_i,\xi_j)$ for all $i$ and $j$. Hence, the two distributions agree by Lemma 13.1. □
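Proposition 13.2 underlies the standard method for sampling uniformly from the unit sphere in $\mathbb{R}^d$ (a brief numerical sketch, not from the text): normalize a vector of i.i.d. standard Gaussians. By spherical symmetry the result is uniform on the sphere, so for instance each coordinate has mean square $1/d$.

```python
import math
import random

random.seed(6)

def uniform_on_sphere(d):
    # normalize i.i.d. N(0,1) coordinates; by spherical symmetry of the
    # Gaussian vector, the result is uniform on the unit sphere in R^d
    g = [random.gauss(0.0, 1.0) for _ in range(d)]
    r = math.sqrt(sum(x * x for x in g))
    return [x / r for x in g]

d, n = 3, 100000
ms = sum(uniform_on_sphere(d)[0] ** 2 for _ in range(n)) / n
print(round(d * ms, 2))  # mean square of one coordinate is 1/d
```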

In infinite dimensions, the Gaussian property is essentially a consequence of the rotational symmetry alone, without any assumption of independence.

Theorem 13.3 (unitary invariance, Schoenberg, Freedman) For any infinite sequence of random variables $\xi_1, \xi_2, \dots$, the distribution of $(\xi_1, \dots, \xi_n)$ is spherically symmetric for every $n \ge 1$ iff the $\xi_k$ are conditionally i.i.d. $N(0, \sigma^2)$, given some random variable $\sigma^2 \ge 0$.

Proof: The $\xi_n$ are clearly exchangeable, and so by Theorem 11.10 there exists a random probability measure $\mu$ such that the $\xi_n$ are conditionally $\mu$-i.i.d. given $\mu$. By the law of large numbers,

$$\mu B = \lim_{n \to \infty} n^{-1} \sum_{k \le n} 1\{\xi_k \in B\} \quad \text{a.s.,} \qquad B \in \mathcal{B},$$

and in particular $\mu$ is a.s. $\{\xi_3, \xi_4, \dots\}$-measurable. Now the spherical symmetry implies that, for any orthogonal transformation $T$ on $\mathbb{R}^2$,

$$P[(\xi_1, \xi_2) \in B \mid \xi_3, \dots, \xi_n] = P[T(\xi_1, \xi_2) \in B \mid \xi_3, \dots, \xi_n], \qquad B \in \mathcal{B}(\mathbb{R}^2).$$

As $n \to \infty$, we get $\mu^{\otimes 2} = \mu^{\otimes 2} \circ T^{-1}$ a.s. Considering a countable dense set of mappings $T$, it is clear that the exceptional null set can be chosen to be independent of $T$. Thus, $\mu^{\otimes 2}$ is a.s. spherically symmetric, and so $\mu$ is a.s. centered Gaussian by Proposition 13.2. It remains to take $\sigma^2 = \int x^2\,\mu(dx)$. $\Box$

Now fix a separable Hilbert space $H$. By an isonormal Gaussian process on $H$ we mean a centered Gaussian process $\eta h$, $h \in H$, such that $E(\eta h \cdot \eta k) = \langle h, k \rangle$, the inner product of $h$ and $k$. To construct such a process $\eta$, we may introduce an orthonormal basis (ONB) $e_1, e_2, \dots \in H$, and let $\xi_1, \xi_2, \dots$ be independent $N(0,1)$ random variables. For any element

$h = \sum_i b_i e_i$, we define $\eta h = \sum_i b_i \xi_i$, where the series converges a.s. and in $L^2$, since $\sum_i b_i^2 < \infty$. The process $\eta$ is clearly centered Gaussian. It is also linear, in the sense that $\eta(ah + bk) = a\,\eta h + b\,\eta k$ a.s. for all $h, k \in H$ and $a, b \in \mathbb{R}$.

For a Brownian motion $B$, we further note by Fubini's theorem that the level sets are a.s. Lebesgue null:

$$E\,\lambda\{t;\ B_t = u\} = \int_0^\infty P\{B_t = u\}\,dt = 0, \qquad u \in \mathbb{R}. \qquad \Box$$

The next result shows that Brownian motion has locally finite quadratic variation. An extension to general continuous semimartingales is obtained in Proposition 17.17.


Theorem 13.9 (quadratic variation, Lévy) Let $B$ be a Brownian motion, and fix any $t > 0$ and a sequence of partitions $0 = t_{n,0} < t_{n,1} < \cdots < t_{n,k_n} = t$, $n \in \mathbb{N}$, such that $h_n = \max_k (t_{n,k} - t_{n,k-1}) \to 0$. Then

$$\zeta_n = \sum_k (B_{t_{n,k}} - B_{t_{n,k-1}})^2 \to t \quad \text{in } L^2. \qquad (3)$$

If the partitions are nested, then also $\zeta_n \to t$ a.s.

Proof (Doob): To prove (3), we may use the scaling property $B_t - B_s \stackrel{d}{=} |t - s|^{1/2} B_1$ to obtain

$$E\zeta_n = \sum_k E(B_{t_{n,k}} - B_{t_{n,k-1}})^2 = \sum_k (t_{n,k} - t_{n,k-1})\,EB_1^2 = t,$$
$$\mathrm{var}(\zeta_n) = \sum_k \mathrm{var}(B_{t_{n,k}} - B_{t_{n,k-1}})^2 = \sum_k (t_{n,k} - t_{n,k-1})^2\,\mathrm{var}(B_1^2) \le h_n t\,EB_1^4 \to 0.$$

For nested partitions we may prove the a.s. convergence by showing that the sequence $(\zeta_n)$ is a reverse martingale, that is,

$$E[\zeta_{n-1} - \zeta_n \mid \zeta_n, \zeta_{n+1}, \dots] = 0 \quad \text{a.s.,} \qquad n \in \mathbb{N}. \qquad (4)$$

Inserting intermediate partitions if necessary, we may assume that $k_n = n$ for all $n$. In that case there exist some numbers $t_1, t_2, \dots \in [0, t]$ such that the $n$th partition has division points $t_1, \dots, t_n$. To verify (4) for a fixed $n$, we may further introduce an auxiliary random variable $\vartheta \perp\!\!\!\perp B$ with $P\{\vartheta = \pm 1\} = \frac{1}{2}$, and replace $B$ by the Brownian motion

$$B'_s = B_{s \wedge t_n} + \vartheta(B_s - B_{s \wedge t_n}), \qquad s \ge 0.$$

Since $B'$ has the same sums $\zeta_n, \zeta_{n+1}, \dots$ as $B$, whereas $\zeta_{n-1} - \zeta_n$ is replaced by $\vartheta(\zeta_n - \zeta_{n-1})$, it is enough to show that $E[\vartheta(\zeta_n - \zeta_{n-1}) \mid \zeta_n, \zeta_{n+1}, \dots] = 0$ a.s. This is clear from the choice of $\vartheta$ if we first condition on $\zeta_{n-1}, \zeta_n, \dots$. $\Box$

The last result implies that $B$ has locally unbounded variation. This explains why the stochastic integral $\int V\,dB$ cannot be defined as an ordinary Stieltjes integral, and a more sophisticated approach is required in Chapter 17.
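The $L^2$-convergence in Theorem 13.9 is easy to see numerically. Below is a simulation sketch (the discretization and all names are ours): the sum of squared Brownian increments over a fine partition of $[0,t]$ concentrates around $t$, with variance of order $h_n t\,\mathrm{var}(B_1^2)$.

```python
import numpy as np

# Quadratic-variation sums over a uniform partition of [0, t] with k intervals.
rng = np.random.default_rng(1)
t, k, n_paths = 1.0, 2**14, 200
dB = rng.standard_normal((n_paths, k)) * np.sqrt(t / k)  # Brownian increments
zeta = (dB**2).sum(axis=1)       # zeta_n, one value per simulated path
mse = ((zeta - t)**2).mean()     # empirical L^2 error; theory gives 2 t^2 / k
```

With $k = 2^{14}$ the theoretical mean squared error $2t^2/k$ is about $1.2 \times 10^{-4}$, which the empirical value should match closely.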

Corollary 13.10 (linear variation) Brownian motion has a.s. unbounded variation on every interval $[s, t]$ with $s < t$.

Proof: The quadratic variation vanishes for any continuous function of bounded variation on $[s, t]$. $\Box$

From Proposition 8.5 we note that Brownian motion $B$ is a space-homogeneous Markov process with respect to its induced filtration. If the Markov property holds for some more general filtration $\mathcal{F} = (\mathcal{F}_t)$, that is, if $B$ is adapted to $\mathcal{F}$ and such that the process $B'_t = B_{s+t} - B_s$ is independent of $\mathcal{F}_s$ for each $s \ge 0$, we say that $B$ is a Brownian motion with respect to $\mathcal{F}$, or an $\mathcal{F}$-Brownian motion. In particular, we may take $\mathcal{F}_t = \mathcal{G}_t \vee \mathcal{N}$, $t \ge 0$, where $\mathcal{G}$ is the filtration induced by $B$ and $\mathcal{N} = \sigma\{N \subset A;\ A \in \mathcal{A},\ PA = 0\}$. With this construction, $\mathcal{F}$ becomes right-continuous by Corollary 7.25.

The Markov property of $B$ will now be extended to suitable optional times. A more general version of this result appears in Theorem 19.17. As in Chapter 7, we write $\mathcal{F}^+_t = \mathcal{F}_{t+}$.

Theorem 13.11 (strong Markov property, Hunt) For any $\mathcal{F}$-Brownian motion $B$ in $\mathbb{R}^d$ and a.s. finite $\mathcal{F}^+$-optional time $\tau$, the process $B'_t = B_{\tau+t} - B_\tau$, $t \ge 0$, is again a Brownian motion independent of $\mathcal{F}^+_\tau$.

Proof: As in Lemma 7.4, we may choose some optional times $\tau_n \to \tau$ that take countably many values and satisfy $\tau_n \ge \tau + 2^{-n}$. Then $\mathcal{F}^+_\tau \subset \bigcap_n \mathcal{F}_{\tau_n}$ by Lemmas 7.1 and 7.3, and so by Proposition 8.9 and Theorem 8.10 each process $B^n_t = B_{\tau_n + t} - B_{\tau_n}$, $t \ge 0$, is a Brownian motion independent of $\mathcal{F}^+_\tau$. The continuity of $B$ yields $B^n_t \to B'_t$ a.s. for every $t$. By dominated convergence we then obtain, for any $A \in \mathcal{F}^+_\tau$ and $t_1, \dots, t_k \in \mathbb{R}_+$, $k \in \mathbb{N}$, and for bounded continuous functions $f: \mathbb{R}^k \to \mathbb{R}$,

$$E[f(B'_{t_1}, \dots, B'_{t_k});\ A] = E f(B_{t_1}, \dots, B_{t_k}) \cdot PA.$$

The general relation $P[B' \in \cdot\,,\ A] = P\{B \in \cdot\} \cdot PA$ now follows by a straightforward extension argument. $\Box$

If $B$ is a Brownian motion in $\mathbb{R}^d$, then a process with the same distribution as $|B|$ is called a Bessel process of order $d$. More general Bessel processes may be obtained as solutions to suitable SDEs. The next result shows that $|B|$ inherits the strong Markov property from $B$.

Corollary 13.12 (Bessel processes) If $B$ is an $\mathcal{F}$-Brownian motion in $\mathbb{R}^d$, then $|B|$ is a strong $\mathcal{F}^+$-Markov process.

Proof: By Theorem 13.11 it is enough to show that $|B + x| \stackrel{d}{=} |B + y|$ whenever $|x| = |y|$. We may then choose an orthogonal transformation $T$ on $\mathbb{R}^d$ with $Tx = y$, and note that

$$|B + x| = |T(B + x)| = |TB + y| \stackrel{d}{=} |B + y|. \qquad \Box$$

We shall use the strong Markov property to derive the distribution of the maximum of Brownian motion up to a fixed time. A stronger result is obtained in Corollary 22.3.

Proposition 13.13 (maximum process, Bachelier) Let $B$ be a Brownian motion in $\mathbb{R}$, and define $M_t = \sup_{s \le t} B_s$, $t \ge 0$. Then

$$M_t \stackrel{d}{=} M_t - B_t \stackrel{d}{=} |B_t|, \qquad t \ge 0.$$

For the proof we need the following continuous-time counterpart to Lemma 9.10.


Lemma 13.14 (reflection principle) Consider a Brownian motion $B$ and an associated optional time $\tau$. Then $B$ has the same distribution as the reflected process

$$\tilde{B}_t = B_{t \wedge \tau} - (B_t - B_{t \wedge \tau}), \qquad t \ge 0.$$

Proof: It is enough to compare the distributions up to a fixed time $t$, and so we may assume that $\tau < \infty$. Define $B^\tau_t = B_{\tau \wedge t}$ and $B'_t = B_{\tau + t} - B_\tau$. By Theorem 13.11 the process $B'$ is a Brownian motion independent of $(\tau, B^\tau)$. Since, moreover, $-B' \stackrel{d}{=} B'$, we get $(\tau, B^\tau, B') \stackrel{d}{=} (\tau, B^\tau, -B')$. It remains to note that

$$B_t = B^\tau_t + B'_{(t-\tau)^+}, \qquad \tilde{B}_t = B^\tau_t - B'_{(t-\tau)^+}, \qquad t \ge 0. \qquad \Box$$

Proof of Proposition 13.13: By scaling it suffices to take $t = 1$. Applying Lemma 13.14 with $\tau = \inf\{t;\ B_t = x\}$ gives

$$P\{M_1 \ge x,\ B_1 \le y\} = P\{B_1 \ge 2x - y\}, \qquad x \ge y \vee 0.$$

By differentiation it follows that the pair $(M_1, B_1)$ has probability density $-2\varphi'(2x - y)$ on $\{x \ge y \vee 0\}$, where $\varphi$ denotes the standard normal density, and an easy computation then shows that $M_1$ and $M_1 - B_1$ both have the same distribution as $|B_1|$. $\Box$

In particular, writing $\tau_2 = \inf\{t;\ B_t = M_1\}$ and $\tau_3 = \sup\{t \le 1;\ B_t = 0\}$ for the time of the maximum and the last zero of $B$ in $[0,1]$, a symmetry argument combined with the reflection principle yields

$$P\{\tau_3 \le t\} = 2P\Big\{\sup_{t \le s \le 1}(B_s - B_t) < -B_t\Big\} = 2P\{|B_1 - B_t| < -B_t\} = P\{|B_1 - B_t| < |B_t|\} = P\{\tau_2 \le t\}. \qquad \Box$$
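Proposition 13.13 can also be verified by simulation. The sketch below (discretization, sample sizes, and names are ours) checks that the running maximum $M_1$ has mean $E|B_1| = \sqrt{2/\pi} \approx 0.798$; the grid maximum slightly underestimates the continuous-time maximum, which the tolerance allows for.

```python
import numpy as np

# Simulate Brownian paths on a grid and compare E max against E|B_1|.
rng = np.random.default_rng(2)
n_paths, n_steps = 10_000, 1_000
dB = rng.standard_normal((n_paths, n_steps)) / np.sqrt(n_steps)
M1 = np.maximum(dB.cumsum(axis=1).max(axis=1), 0.0)  # running max, B_0 = 0
err = abs(M1.mean() - np.sqrt(2 / np.pi))
```

The discretization bias of the grid maximum is of order $n^{-1/2}$ in the number of steps, here roughly $0.02$.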


The first two arcsine laws have the following counterparts for the Brownian bridge.

Theorem 13.17 (uniform laws) Let $B$ be a Brownian bridge with maximum $M_1$. Then these random variables are both $U(0,1)$:

$$\tau_1 = \lambda\{t;\ B_t > 0\}, \qquad \tau_2 = \inf\{t;\ B_t = M_1\}.$$

Proof: The relation $\tau_1 \stackrel{d}{=} \tau_2$ may be proved in the same way as for Brownian motion. To see that $\tau_2$ is $U(0,1)$, write $\langle x \rangle = x - [x]$, and consider for each $u \in [0,1]$ the process $B^u_t = B_{\langle u + t \rangle} - B_u$, $t \in [0,1]$. It is easy to check that $B^u \stackrel{d}{=} B$ for each $u$, and further that the maximum of $B^u$ occurs at $\langle \tau_2 - u \rangle$. By Fubini's theorem we hence obtain, for any $t \in [0,1]$,

$$P\{\tau_2 \le t\} = \int_0^1 P\{\langle \tau_2 - u \rangle \le t\}\,du = E\,\lambda\{u;\ \langle \tau_2 - u \rangle \le t\} = t. \qquad \Box$$
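The uniform law for the argmax of the bridge is easy to check by simulation. The sketch below builds the bridge on a grid as $B_t - tB_1$ from a Brownian path (a standard construction; sample sizes and names are ours).

```python
import numpy as np

# Argmax times of simulated Brownian bridges should look uniform on (0, 1).
rng = np.random.default_rng(3)
n_paths, n = 10_000, 512
tgrid = np.arange(1, n + 1) / n
B = (rng.standard_normal((n_paths, n)) / np.sqrt(n)).cumsum(axis=1)
bridge = B - np.outer(B[:, -1], tgrid)   # bridge values on the grid
tau2 = tgrid[bridge.argmax(axis=1)]      # times of the maximum
mean_tau = tau2.mean()
frac_quarter = (tau2 <= 0.25).mean()
```

For a $U(0,1)$ variable the mean is $1/2$ and $P\{\tau_2 \le 1/4\} = 1/4$, which the empirical values should reproduce up to Monte Carlo noise.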

From Theorem 13.5 we note that $t^{-c} B_t \to 0$ a.s. as $t \to 0$ for any $c \in [0, \frac{1}{2})$. The following classical result gives the exact growth rate of Brownian motion at $0$ and $\infty$. Extensions to random walks and renewal processes are obtained in Corollaries 14.8 and 14.14. A functional version appears in Theorem 27.18.

Theorem 13.18 (laws of the iterated logarithm, Khinchin) For a Brownian motion $B$ in $\mathbb{R}$, we have a.s.

$$\limsup_{t \to 0} \frac{B_t}{\sqrt{2t \log\log(1/t)}} = \limsup_{t \to \infty} \frac{B_t}{\sqrt{2t \log\log t}} = 1.$$

Proof: The Brownian inversion $\hat{B}_t = t B_{1/t}$ of Lemma 13.6 converts the two formulas into one another, so it is enough to prove the result for $t \to \infty$. Then we note that, as $u \to \infty$,

$$\int_u^\infty e^{-x^2/2}\,dx \sim u^{-1} \int_u^\infty x e^{-x^2/2}\,dx = u^{-1} e^{-u^2/2}.$$

By Proposition 13.13 we hence obtain, uniformly in $t$,

$$P\{M_t > u t^{1/2}\} = 2P\{B_t > u t^{1/2}\} \sim (2/\pi)^{1/2}\,u^{-1} e^{-u^2/2},$$

where $M_t = \sup_{s \le t} B_s$. Writing $h_t = h(t) = (2t \log\log t)^{1/2}$, we get for any $r > 1$ and $c > 0$

$$P\{M(r^n) > c\,h(r^{n-1})\} \lesssim n^{-c^2/r} (\log n)^{-1/2}, \qquad n \in \mathbb{N}.$$

Fixing $c > 1$ and choosing $r < c^2$, it follows by the Borel–Cantelli lemma that

$$P\{\limsup_{t \to \infty}(B_t/h_t) > c\} \le P\{M(r^n) > c\,h(r^{n-1}) \text{ i.o.}\} = 0,$$

which shows that $\limsup_{t \to \infty}(B_t/h_t) \le 1$ a.s.

To prove the reverse inequality, we may write

$$P\{B(r^n) - B(r^{n-1}) > c\,h(r^n)\} \gtrsim n^{-c^2 r/(r-1)} (\log n)^{-1/2}, \qquad n \in \mathbb{N}.$$

Taking $c = \{(r-1)/r\}^{1/2}$, we get by the Borel–Cantelli lemma

$$\limsup_{t \to \infty} \frac{B_t - B_{t/r}}{h_t} \ge \limsup_{n \to \infty} \frac{B(r^n) - B(r^{n-1})}{h(r^n)} \ge \left(\frac{r-1}{r}\right)^{1/2} \quad \text{a.s.}$$

The upper bound obtained earlier yields $\limsup_{t \to \infty}(-B_{t/r}/h_t) \le r^{-1/2}$, and combining the two estimates gives

$$\limsup_{t \to \infty} \frac{B_t}{h_t} \ge (1 - r^{-1})^{1/2} - r^{-1/2} \quad \text{a.s.}$$

Here we may finally let $r \to \infty$ to obtain $\limsup_{t \to \infty}(B_t/h_t) \ge 1$ a.s. $\Box$

In the proof of Theorem 13.5 we constructed a Brownian motion $B$ from an isonormal Gaussian process $\eta$ on $L^2(\mathbb{R}_+, \lambda)$ such that $B_t = \eta 1_{[0,t]}$ a.s. for all $t \ge 0$. If instead we start from a Brownian motion $B$ on $\mathbb{R}_+$, the existence of an associated isonormal Gaussian process $\eta$ may be inferred from Theorem 6.10. Since every function $h \in L^2(\mathbb{R}_+, \lambda)$ can be approximated by simple step functions, as in the proof of Lemma 1.35, we note that the random variables $\eta h$ are a.s. unique. We shall see how they can also be constructed directly from $B$ as suitable Wiener integrals $\int h\,dB$. As already noted, the latter fail to exist in the pathwise Stieltjes sense, and so a different approach is needed.

As a first step, we may consider the class $\mathcal{S}$ of simple step functions of the form

$$h_t = \sum_{j \le n} a_j 1_{(t_{j-1}, t_j]}(t), \qquad t \ge 0,$$

where $n \in \mathbb{Z}_+$, $0 = t_0 < \cdots < t_n$, and $a_1, \dots, a_n \in \mathbb{R}$. For such integrands $h$, we may define the integral in the obvious way as

$$\eta h = \int_0^\infty h_t\,dB_t = \sum_{j \le n} a_j (B_{t_j} - B_{t_{j-1}}).$$

Here $\eta h$ is clearly centered Gaussian with variance

$$E(\eta h)^2 = \sum_{j \le n} a_j^2 (t_j - t_{j-1}) = \int_0^\infty h_t^2\,dt = \|h\|^2,$$

where $\|h\|$ denotes the norm in $L^2(\mathbb{R}_+, \lambda)$. Thus, the integration $h \mapsto \eta h = \int h\,dB$ defines a linear isometry from $\mathcal{S} \subset L^2(\mathbb{R}_+, \lambda)$ into $L^2(\Omega, P)$. Since $\mathcal{S}$ is dense in $L^2(\mathbb{R}_+, \lambda)$, we may extend the integral by continuity to a linear isometry $h \mapsto \eta h = \int h\,dB$ from $L^2(\lambda)$ to $L^2(P)$. Here $\eta h$ is again centered Gaussian for every $h \in L^2(\lambda)$, and by linearity the whole process $h \mapsto \eta h$ is then Gaussian. By a polarization argument it is also clear that the integration preserves inner products, in the sense that

$$E(\eta h\,\eta k) = \int_0^\infty h_t k_t\,dt = \langle h, k \rangle, \qquad h, k \in L^2(\lambda).$$
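The isometry $E(\eta h)^2 = \|h\|^2$ for step functions can be checked directly by simulation. A sketch (the particular partition, step heights, and names are ours):

```python
import numpy as np

# Wiener integral of a step function: eta_h = sum_j a_j (B_{t_j} - B_{t_{j-1}}).
rng = np.random.default_rng(4)
tgrid = np.array([0.0, 0.5, 1.2, 2.0])  # partition points 0 = t_0 < ... < t_n
a = np.array([1.0, -2.0, 0.5])          # step heights a_1, ..., a_n

dB = rng.standard_normal((100_000, a.size)) * np.sqrt(np.diff(tgrid))
eta_h = dB @ a                            # one Wiener integral per path
norm_sq = (a**2 * np.diff(tgrid)).sum()   # ||h||^2 in L^2(lambda); here 3.5
iso_err = abs((eta_h**2).mean() - norm_sq)
```

The empirical second moment of $\eta h$ matches $\|h\|^2 = 3.5$ up to Monte Carlo error, and the empirical mean is near zero, as the centered Gaussian law requires.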

We shall consider two general ways of representing stationary Gaussian processes in terms of Wiener integrals $\eta h$. Here a complex notation is convenient. By a complex-valued, isonormal Gaussian process on a (real) Hilbert space $H$ we mean a process $\zeta = \xi + i\eta$ on $H$ such that $\xi$ and $\eta$ are independent, real-valued, isonormal Gaussian processes on $H$. For any $f = g + ih$ with $g, h \in H$, we define $\zeta f = \xi g - \eta h + i(\xi h + \eta g)$.

Now let $X$ be a stationary, centered Gaussian process on $\mathbb{R}$ with covariance function $r_t = E X_s X_{s+t}$, $s, t \in \mathbb{R}$. We know that $r$ is nonnegative definite, and it is further continuous whenever $X$ is continuous in probability. In that case Bochner's theorem yields a unique spectral representation

$$r_t = \int_{-\infty}^{\infty} e^{itx}\,\mu(dx), \qquad t \in \mathbb{R},$$

where the spectral measure $\mu$ is a bounded, symmetric measure on $\mathbb{R}$. The following result gives a similar spectral representation of the process $X$ itself. By a different argument, the result extends to suitable non-Gaussian processes. As usual, we assume that the basic probability space is rich enough to support the required randomization variables.

Proposition 13.19 (spectral representation, Stone, Cramér) Let $X$ be an $L^2$-continuous, stationary, centered Gaussian process on $\mathbb{R}$ with spectral measure $\mu$. Then there exists a complex, isonormal Gaussian process $\zeta$ on $L^2(\mu)$ such that

$$X_t = \Re \int e^{itx}\,d\zeta_x = \int \cos(tx)\,d\xi_x - \int \sin(tx)\,d\eta_x \quad \text{a.s.,} \qquad t \in \mathbb{R}. \qquad (5)$$

Proof: Denoting the right-hand side of (5) by $Y$, we may compute

$$E Y_s Y_t = E \int (\cos sx\,d\xi_x - \sin sx\,d\eta_x) \int (\cos tx\,d\xi_x - \sin tx\,d\eta_x)$$
$$= \int (\cos sx \cos tx + \sin sx \sin tx)\,\mu(dx) = \int \cos(s-t)x\,\mu(dx) = \int e^{i(s-t)x}\,\mu(dx) = r_{s-t}.$$

Since both $X$ and $Y$ are centered Gaussian, Lemma 13.1 shows that $Y \stackrel{d}{=} X$. Now both $X$ and $\zeta$ are continuous and defined on the separable spaces $L^2(X)$ and $L^2(\mu)$, and so they may be regarded as random elements in suitable Polish spaces. The a.s. representation in (5) then follows by Theorem 6.10. $\Box$

Another useful representation may be obtained under suitable regularity conditions on the spectral measure $\mu$.

Proposition 13.20 (moving average representation) Let $X$ be an $L^2$-continuous, stationary, centered Gaussian process on $\mathbb{R}$ with absolutely continuous spectral measure $\mu$. Then there exist an isonormal Gaussian process $\eta$ on $L^2(\mathbb{R}, \lambda)$ and a function $f \in L^2(\lambda)$ such that

$$X_t = \int_{-\infty}^{\infty} f_{s-t}\,d\eta_s \quad \text{a.s.,} \qquad t \in \mathbb{R}. \qquad (6)$$


Proof: Fix a symmetric density $g \ge 0$ of $\mu$, and define $h = g^{1/2}$. Then $h \in L^2(\lambda)$, and we may introduce the Fourier transform in the sense of Plancherel,

$$f_s = (2\pi)^{-1/2} \int_{-\infty}^{\infty} e^{isx} h_x\,dx, \qquad s \in \mathbb{R}, \qquad (7)$$

which is again real-valued and square integrable. For each $t \in \mathbb{R}$ the function $k_x = e^{-itx} h_x$ has Fourier transform $\hat{k}_s = f_{s-t}$, and so by Parseval's relation

$$r_t = \int_{-\infty}^{\infty} e^{itx} h_x^2\,dx = \int_{-\infty}^{\infty} h_x \bar{k}_x\,dx = \int_{-\infty}^{\infty} f_s f_{s-t}\,ds. \qquad (8)$$

Now consider any isonormal Gaussian process $\eta$ on $L^2(\lambda)$. For $f$ as in (7), we may define a process $Y$ on $\mathbb{R}$ by the right-hand side of (6). Using (8), we get $E Y_s Y_{s+t} = r_t$ for arbitrary $s, t \in \mathbb{R}$, and so $Y \stackrel{d}{=} X$ by Lemma 13.1. Again an appeal to Theorem 6.10 yields the desired a.s. representation of $X$. $\Box$

For an example, we may consider a moving average representation of the stationary Ornstein–Uhlenbeck process. Then introduce an isonormal Gaussian process $\eta$ on $L^2(\mathbb{R}, \lambda)$ and define

$$X_t = \int_{-\infty}^{t} e^{s-t}\,d\eta_s, \qquad t \in \mathbb{R}.$$

The process $X$ is clearly centered Gaussian, and we get

$$E X_s X_{s+t} = \tfrac{1}{2} e^{-|t|}, \qquad s, t \in \mathbb{R},$$

as desired. The Markov property of $X$ follows most easily from the fact that, for $s \le t$,

$$X_t = e^{s-t} X_s + \int_s^t e^{u-t}\,d\eta_u,$$

where the second term is independent of the process up to time $s$.

We proceed to introduce multiple integrals $I_n = \eta^{\otimes n}$ with respect to an isonormal Gaussian process $\eta$ on a separable (infinite-dimensional) Hilbert space $H$. Without loss of generality, we may take $H$ to be of the form $L^2(S, \mu)$. Then $H^{\otimes n}$ can be identified with $L^2(S^n, \mu^{\otimes n})$, where $\mu^{\otimes n}$ denotes the $n$-fold product measure $\mu \otimes \cdots \otimes \mu$, and the tensor product $\otimes_k h_k$ of elements $h_1, \dots, h_n \in L^2(\mu)$ with the function $h_1(s_1) \cdots h_n(s_n)$ on $S^n$.

To prove the latter, we may divide $A$ for each $m$ into $2^m$ subsets $B_{mj}$ of measure $\le 2^{-m}$, and note as in Theorem 13.9 and Lemma 13.23 that

$$(\eta A)^2 = \sum_j (\eta B_{mj})^2 + \sum_{i \ne j} (\eta B_{mi})(\eta B_{mj}) \to \lambda A + I_2 A^2 \quad \text{in } L^2. \qquad \Box$$

The last lemma will be used to derive an explicit representation of the integrals $I_n$ in terms of the Hermite polynomials $p_0, p_1, \dots$. The latter are defined as orthogonal polynomials of degrees $0, 1, \dots$ with respect to the standard Gaussian distribution on $\mathbb{R}$. This condition determines each $p_n$ up to a normalization, which we choose for convenience such that the leading coefficient becomes $1$. The first few polynomials are then

$$p_0(x) = 1, \qquad p_1(x) = x, \qquad p_2(x) = x^2 - 1, \qquad p_3(x) = x^3 - 3x.$$

Theorem 13.25 (orthogonal representation, Itô) On a separable Hilbert space $H$, let $\eta$ be an isonormal Gaussian process with associated multiple Wiener–Itô integrals $I_1, I_2, \dots$. Then for any orthonormal elements $e_1, \dots, e_m \in H$ and integers $n_1, \dots, n_m \ge 1$ with sum $n$, we have

$$I_n \bigotimes_{j \le m} e_j^{\otimes n_j} = \prod_{j \le m} p_{n_j}(\eta e_j).$$

Using the linearity of $I_n$ and writing $\hat{h} = h/\|h\|$, we see that the stated formula is equivalent to the factorization

$$I_n \bigotimes_{j \le m} h_j^{\otimes n_j} = \prod_{j \le m} I_{n_j} h_j^{\otimes n_j}, \qquad h_1, \dots, h_m \in H \text{ orthogonal}, \qquad (15)$$

together with the representation of the individual factors

$$I_n h^{\otimes n} = \|h\|^n\,p_n(\eta \hat{h}), \qquad h \in H \setminus \{0\}. \qquad (16)$$

Proof: We prove (15) by induction on $n$. Then assume the relation to hold for all integrals up to order $n$, fix any orthonormal elements $h, h_1, \dots, h_m \in H$ and integers $k, n_1, \dots, n_m \in \mathbb{N}$ with sum $n + 1$, and write $f = \otimes_{j \le m} h_j^{\otimes n_j}$. By Lemma 13.24 and the induction hypothesis,

$$I_{n+1}(f \otimes h^{\otimes k}) = I_n(f \otimes h^{\otimes(k-1)}) \cdot \eta h - (k-1)\,I_{n-1}(f \otimes h^{\otimes(k-2)})$$
$$= (I_{n-k+1} f)\{I_{k-1} h^{\otimes(k-1)} \cdot \eta h - (k-1)\,I_{k-2} h^{\otimes(k-2)}\} = I_{n-k+1} f \cdot I_k h^{\otimes k}.$$

Using the induction hypothesis again, we obtain the desired extension to $I_{n+1}$.

It remains to prove (16) for an arbitrary element $h \in H$ with $\|h\| = 1$. Then conclude from Lemma 13.24 that

$$I_{n+1} h^{\otimes(n+1)} = I_n h^{\otimes n} \cdot \eta h - n\,I_{n-1} h^{\otimes(n-1)}, \qquad n \in \mathbb{N}.$$

Since $I_0 1 = 1$ and $I_1 h = \eta h$, we see by induction that $I_n h^{\otimes n}$ is a polynomial in $\eta h$ of degree $n$ and with leading coefficient $1$. By the definition of Hermite polynomials, it remains to show that the integrals $I_n h^{\otimes n}$ for different $n$ are orthogonal, which holds by Lemma 13.22. $\Box$

Given an isonormal Gaussian process $\eta$ on some separable Hilbert space $H$, we introduce the space $L^2(\eta) = L^2(\Omega, \sigma\{\eta\}, P)$ of $\eta$-measurable random variables $\xi$ with $E\xi^2 < \infty$. The $n$th polynomial chaos $\mathcal{P}_n$ is defined as the closed linear subspace generated by all polynomials of degree $\le n$ in the random variables $\eta h$, $h \in H$. We also introduce for every $n \in \mathbb{Z}_+$ the $n$th homogeneous chaos $\mathcal{H}_n$, consisting of all integrals $I_n f$, $f \in H^{\otimes n}$. The relationship between the mentioned spaces is clarified by the following result. As usual, we write $\oplus$ and $\ominus$ for direct sums and orthogonal complements, respectively.

Theorem 13.26 (chaos expansion, Wiener) On a separable Hilbert space $H$, let $\eta$ be an isonormal Gaussian process with associated polynomial and homogeneous chaoses $\mathcal{P}_n$ and $\mathcal{H}_n$, respectively. Then the $\mathcal{H}_n$ are orthogonal, closed, linear subspaces of $L^2(\eta)$, satisfying

$$\mathcal{P}_n = \bigoplus_{k=0}^{n} \mathcal{H}_k, \quad n \in \mathbb{Z}_+; \qquad L^2(\eta) = \bigoplus_{n=0}^{\infty} \mathcal{H}_n. \qquad (17)$$

Furthermore, every $\xi \in L^2(\eta)$ has a unique a.s. representation $\xi = \sum_n I_n f_n$ with symmetric elements $f_n \in H^{\otimes n}$, $n \ge 0$.

In particular, we note that $\mathcal{H}_0 = \mathcal{P}_0 = \mathbb{R}$ and $\mathcal{H}_n = \mathcal{P}_n \ominus \mathcal{P}_{n-1}$, $n \in \mathbb{N}$.

Proof: The properties in Lemma 13.22 extend to arbitrary integrands, and so the spaces $\mathcal{H}_n$ are mutually orthogonal, closed, linear subspaces of $L^2(\eta)$. From Lemma 13.23 or Theorem 13.25 we see that also $\mathcal{H}_n \subset \mathcal{P}_n$. Conversely, let $\xi$ be an $n$th-degree polynomial in the variables $\eta h$. We may then choose some orthonormal elements $e_1, \dots, e_m \in H$ such that $\xi$ is an $n$th-degree polynomial in $\eta e_1, \dots, \eta e_m$. Since any power $(\eta e_j)^k$ is a linear combination of the variables $p_0(\eta e_j), \dots, p_k(\eta e_j)$, Theorem 13.25 shows that $\xi$ is a linear combination of multiple integrals $I_k f$ with $k \le n$, which means that $\xi \in \bigoplus_{k \le n} \mathcal{H}_k$.

Exercises

8. ... $= 0$ a.s. (Hint: Conclude from Kolmogorov's 0–1 law that the stated event has probability 0 or 1. Alternatively, use Theorem 13.18.)

9. For a Brownian motion $B$, define $\tau_a = \inf\{t > 0;\ B_t = a\}$. Compute the density of the distribution of $\tau_a$ for $a \ne 0$, and show that $E\tau_a = \infty$. (Hint: Use Proposition 13.13.)

10. For a Brownian motion $B$, show that $Z_t = \exp(cB_t - \frac{1}{2}c^2 t)$ is a martingale for every $c$. Use optional sampling to compute the Laplace transform of $\tau_a$ above, and compare with the preceding result.

11. (Paley, Wiener, and Zygmund) Show that Brownian motion $B$ is a.s. nowhere Lipschitz continuous, and hence nowhere differentiable. (Hint: If $B$ is Lipschitz at $t < 1$, there exist some $K, \delta > 0$ such that $|B_r - B_s| \le 2hK$ for all $r, s \in (t-h, t+h)$ with $h < \delta$. Apply this to three consecutive $n$-dyadic intervals $(r, s)$ around $t$.)

12. Refine the preceding argument to show that $B$ is a.s. nowhere Hölder continuous with exponent $c > \frac{1}{2}$.

13. Show that the local maxima of a Brownian motion are a.s. dense in $\mathbb{R}$ and that the corresponding times are a.s. dense in $\mathbb{R}_+$. (Hint: Use the preceding result.)

14. Show by a direct argument that $\limsup_t t^{-1/2} B_t = \infty$ a.s. as $t \to 0$ and as $t \to \infty$, where $B$ is a Brownian motion. (Hint: Use Kolmogorov's 0–1 law.)

15. Show that the law of the iterated logarithm for Brownian motion at 0 remains valid for the Brownian bridge.

16. Show for a Brownian motion $B$ in $\mathbb{R}^d$ that the process $|B|$ satisfies the law of the iterated logarithm at 0 and $\infty$.

17. Let $\xi_1, \xi_2, \dots$ be i.i.d. $N(0,1)$. Show that $\limsup_n (2\log n)^{-1/2} \xi_n = 1$ a.s.

18. For a Brownian motion $B$, show that $M_t = t^{-1} B_t$ is a reverse martingale, and conclude that $t^{-1} B_t \to 0$ a.s. and in $L^p$, $p > 0$, as $t \to \infty$. (Hint: The limit is degenerate by Kolmogorov's 0–1 law.) Deduce the same result from Theorem 10.9.

19. For a Brownian bridge $B$, show that $M_t = (1-t)^{-1} B_t$ is a martingale on $[0,1)$. Check that $M$ is not $L^1$-bounded.

20. Let $I_n$ be the $n$-fold Wiener–Itô integral w.r.t. Brownian motion $B$ on $\mathbb{R}_+$. Show that the process $M_t = I_n(1_{[0,t]}^{\otimes n})$ is a martingale. Express $M$ in terms of $B$, and compute the expression for $n = 1, 2, 3$. (Hint: Use Theorem 13.25.)


21. Let $\eta_1, \dots, \eta_n$ be independent, isonormal Gaussian processes on a separable Hilbert space $H$. Show that there exists a unique continuous linear mapping $\otimes_k \eta_k$ from $H^{\otimes n}$ to $L^2(P)$ such that $\otimes_k \eta_k \otimes_k h_k = \prod_k \eta_k h_k$ a.s. for all $h_1, \dots, h_n \in H$. Also show that $\otimes_k \eta_k$ is an isometry.

Chapter 14

Skorohod Embedding and Invariance Principles

Embedding of random variables; approximation of random walks; functional central limit theorem; laws of the iterated logarithm; arcsine laws; approximation of renewal processes; empirical distribution functions; embedding and approximation of martingales

In Chapter 5 we used analytic methods to derive criteria for a sum of independent random variables to be approximately Gaussian. Though this may remain the easiest approach to the classical limit theorems, the results are best understood when viewed as consequences of some general approximation theorems for random processes. The aim of this chapter is to develop a purely probabilistic technique, the so-called Skorohod embedding, for deriving such functional limit theorems.

In the simplest setting, we may consider a random walk $(S_n)$ based on some i.i.d. random variables $\xi_k$ with mean $0$ and variance $1$. In this case there exist a Brownian motion $B$ and some optional times $\tau_1 \le \tau_2 \le \cdots$ such that $S_n = B_{\tau_n}$ a.s. for every $n$. For applications it is essential to choose the $\tau_n$ such that the differences $\Delta\tau_n$ are again i.i.d. with mean one. The step process $S_{[t]} = B_{\tau_{[t]}}$ will then be close to the path of $B$, and many results for Brownian motion carry over, at least approximately, to the random walk. In particular, the procedure yields versions for random walks of the arcsine laws and the law of the iterated logarithm.

From the statements for random walks, similar results may be deduced rather easily for various related processes. In particular, we shall derive a functional central limit theorem and a law of the iterated logarithm for renewal processes, and we shall also see how suitably normalized versions of the empirical distribution functions from an i.i.d. sample can be approximated by a Brownian bridge. For an extension in another direction, we shall obtain a version of the Skorohod embedding for general $L^2$-martingales and show how any suitably time-changed martingale with small jumps can be approximated by a Brownian motion.

The present exposition depends in many ways on material from previous chapters. Thus, we rely on the basic theory of Brownian motion, as set forth in Chapter 13. We also make frequent use of ideas and results from Chapter 7 on martingales and optional times. Finally, occasional references are made to Chapter 4 for empirical distributions, to Chapter 6 for the transfer theorem, to Chapter 9 for random walks and renewal processes, and to Chapter 12 for the Poisson process. More general approximations and functional limit theorems are obtained by different methods in Chapters 15, 16, and 19. We also note the close relationship between the present approximation result for martingales with small jumps and the time-change results for continuous local martingales in Chapter 18.

To clarify the basic ideas, we begin with a detailed discussion of the classical Skorohod embedding for random walks. The main result in this context is the following.

Theorem 14.1 (embedding of random walk, Skorohod) Let $\xi_1, \xi_2, \dots$ be i.i.d. random variables with mean $0$, and put $S_n = \xi_1 + \cdots + \xi_n$. Then there exists a filtered probability space with a Brownian motion $B$ and some optional times $0 = \tau_0 \le \tau_1 \le \cdots$ such that $(B_{\tau_n}) \stackrel{d}{=} (S_n)$ and the differences $\Delta\tau_n = \tau_n - \tau_{n-1}$ are i.i.d. with $E\Delta\tau_n = E\xi_1^2$ and $E(\Delta\tau_n)^2 \le 4E\xi_1^4$.

Here the moment requirements on the differences $\Delta\tau_n$ are crucial for applications. Without those conditions the statement would be trivially true, since we could then choose $B \perp\!\!\!\perp (\xi_n)$ and define the $\tau_n$ recursively by $\tau_n = \inf\{t \ge \tau_{n-1};\ B_t = S_n\}$. In that case $E\tau_n = \infty$ unless $\xi_1 = 0$ a.s.

The proof of Theorem 14.1 is based on a sequence of lemmas. First we exhibit some martingales associated with Brownian motion.

Lemma 14.2 (Brownian martingales) For a Brownian motion $B$, the processes $B_t$, $B_t^2 - t$, and $B_t^4 - 6tB_t^2 + 3t^2$ are all martingales.

Proof: Note that $EB_t = EB_t^3 = 0$, $EB_t^2 = t$, and $EB_t^4 = 3t^2$. Write $\mathcal{F}$ for the filtration induced by $B$, let $0 \le s \le t$, and recall that the process $\tilde{B}_t = B_{s+t} - B_s$ is again a Brownian motion independent of $\mathcal{F}_s$. Hence,

$$E[B_t^2 \mid \mathcal{F}_s] = E[B_s^2 + 2B_s \tilde{B}_{t-s} + \tilde{B}_{t-s}^2 \mid \mathcal{F}_s] = B_s^2 + t - s.$$

Moreover,

$$E[B_t^4 \mid \mathcal{F}_s] = E[(B_s + \tilde{B}_{t-s})^4 \mid \mathcal{F}_s] = B_s^4 + 6B_s^2(t-s) + 3(t-s)^2,$$

and so

$$E[B_t^4 - 6tB_t^2 + 3t^2 \mid \mathcal{F}_s] = B_s^4 - 6sB_s^2 + 3s^2. \qquad \Box$$

By optional sampling, we may deduce some useful formulas.

Lemma 14.3 (moment relations) Consider a Brownian motion $B$ and an optional time $\tau$ such that $B^\tau$ is bounded. Then

$$EB_\tau = 0, \qquad E\tau = EB_\tau^2, \qquad E\tau^2 \le 4EB_\tau^4. \qquad (1)$$

Proof: By optional stopping and Lemma 14.2, we get for any $t \ge 0$

$$EB_{\tau \wedge t} = 0, \qquad E(\tau \wedge t) = EB_{\tau \wedge t}^2, \qquad (2)$$
$$EB_{\tau \wedge t}^4 - 6E(\tau \wedge t)B_{\tau \wedge t}^2 + 3E(\tau \wedge t)^2 = 0. \qquad (3)$$

The first two relations in (1) follow from (2) by dominated and monotone convergence as $t \to \infty$. In particular, we have $E\tau < \infty$. We may then take limits even in (3) and conclude by dominated and monotone convergence together with the Cauchy–Buniakovsky inequality that

$$3E\tau^2 + EB_\tau^4 = 6E\tau B_\tau^2 \le 6(E\tau^2\,EB_\tau^4)^{1/2}.$$

Writing $r = (E\tau^2 / EB_\tau^4)^{1/2}$, we get $3r^2 + 1 \le 6r$. Thus, $3(r-1)^2 \le 2$, and finally $r \le 1 + (2/3)^{1/2} < 2$. $\Box$

The next result shows how an arbitrary distribution with mean $0$ can be expressed as a mixture of centered two-point distributions. For any $a \le 0 \le b$, let $\nu_{a,b}$ denote the unique probability measure on $\{a, b\}$ with mean $0$. Clearly, $\nu_{a,b} = \delta_0$ when $ab = 0$, and otherwise

$$\nu_{a,b} = \frac{b\,\delta_a - a\,\delta_b}{b - a}, \qquad a < 0 < b.$$

It is easy to verify that $\nu$ is a probability kernel from $\mathbb{R}_- \times \mathbb{R}_+$ to $\mathbb{R}$. For mappings between two measure spaces, measurability is defined in terms of the $\sigma$-fields generated by all evaluation maps $\pi_B: \mu \mapsto \mu B$, where $B$ is an arbitrary set in the underlying $\sigma$-field.

Lemma 14.4 (randomization) For any distribution $\mu$ on $\mathbb{R}$ with mean zero, there exists a distribution $\hat{\mu}$ on $\mathbb{R}_- \times \mathbb{R}_+$ with $\mu = \int \hat{\mu}(dx\,dy)\,\nu_{x,y}$, and we can choose $\hat{\mu}$ to be a measurable function of $\mu$.

Proof (Chung): Let $\mu_\pm$ denote the restrictions of $\mu$ to $\mathbb{R}_\pm \setminus \{0\}$, define $l(x) = x$, and put $c = \int l\,d\mu_+ = -\int l\,d\mu_-$. For any measurable function $f: \mathbb{R} \to \mathbb{R}_+$ with $f(0) = 0$, we get

$$c \int f\,d\mu = \int l\,d\mu_+ \int f\,d\mu_- - \int l\,d\mu_- \int f\,d\mu_+ = \iint (y - x)\,\mu_-(dx)\,\mu_+(dy) \int f\,d\nu_{x,y},$$

and so we may take

$$\hat{\mu}(dx\,dy) = \mu\{0\}\,\delta_{0,0}(dx\,dy) + c^{-1}(y - x)\,\mu_-(dx)\,\mu_+(dy).$$

The measurability of the mapping $\mu \mapsto \hat{\mu}$ is clear by a monotone class argument, once we note that $\hat{\mu}(A \times B)$ is a measurable function of $\mu$ for arbitrary $A, B \in \mathcal{B}(\mathbb{R})$. $\Box$

The embedding in Theorem 14.1 will now be constructed recursively, beginning with the first random variable $\xi_1$.
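The mixture formula of Lemma 14.4 can be checked exactly on a discrete example. The sketch below (the example law and all function names are ours) builds $\hat{\mu}$ for a mean-zero $\mu$ and verifies that mixing the two-point laws $\nu_{x,y}$ over $\hat{\mu}$ recovers $\mu$.

```python
from collections import defaultdict

def two_point_law(a, b):
    """nu_{a,b}: the unique mean-zero law on {a, b}, for a <= 0 <= b."""
    if a == 0.0 or b == 0.0:
        return {0.0: 1.0}
    return {a: b / (b - a), b: -a / (b - a)}

def randomize(mu):
    """mu_hat(dx dy) = mu{0} delta_(0,0) + c^{-1} (y-x) mu_-(dx) mu_+(dy)."""
    c = sum(y * p for y, p in mu.items() if y > 0)
    mu_hat = defaultdict(float)
    mu_hat[(0.0, 0.0)] += mu.get(0.0, 0.0)
    for x, px in mu.items():
        if x >= 0:
            continue
        for y, py in mu.items():
            if y > 0:
                mu_hat[(x, y)] += (y - x) * px * py / c
    return dict(mu_hat)

mu = {-2.0: 0.3, 0.0: 0.2, 1.0: 0.4, 2.0: 0.1}  # a mean-zero example law
mu_hat = randomize(mu)
mix = defaultdict(float)
for (x, y), w in mu_hat.items():
    for z, q in two_point_law(x, y).items():
        mix[z] += w * q
```

Here $\hat{\mu}$ is a probability measure, and the mixture `mix` coincides with `mu` exactly (up to floating-point error).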


Lemma 14.5 (embedding of random variable) For any probability measure $\mu$ on $\mathbb{R}$ with mean $0$, consider a random pair $(\alpha, \beta)$ with distribution $\hat{\mu}$ as in Lemma 14.4, and let $B$ be an independent Brownian motion. Then the time $\tau = \inf\{t \ge 0;\ B_t \in \{\alpha, \beta\}\}$ is optional for the filtration $\mathcal{F}_t = \sigma\{\alpha, \beta;\ B_s,\ s \le t\}$, and we have

$$\mathcal{L}(B_\tau) = \mu, \qquad E\tau = \int x^2\,\mu(dx), \qquad E\tau^2 \le 4 \int x^4\,\mu(dx).$$

Proof: The process $B$ is clearly an $\mathcal{F}$-Brownian motion, and $\tau$ is $\mathcal{F}$-optional as in Lemma 7.6 (ii). Using Lemma 14.3 and Fubini's theorem gives

$$P\{B_\tau \in \cdot\} = E\,P[B_\tau \in \cdot \mid \alpha, \beta] = E\,\nu_{\alpha,\beta} = \mu,$$
$$E\tau = E\,E[\tau \mid \alpha, \beta] = E \int x^2\,\nu_{\alpha,\beta}(dx) = \int x^2\,\mu(dx),$$
$$E\tau^2 = E\,E[\tau^2 \mid \alpha, \beta] \le 4E \int x^4\,\nu_{\alpha,\beta}(dx) = 4 \int x^4\,\mu(dx). \qquad \Box$$
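The embedding idea of Lemma 14.5 can be illustrated by simulation in the simplest case of a fixed pair $(a, b)$: the exit time $\tau$ of $B$ from $(a, b)$ satisfies $B_\tau \sim \nu_{a,b}$ and $E\tau = -ab$. A sketch with discretized paths (step size, sample counts, and names are ours):

```python
import numpy as np

# Exit of discretized Brownian paths from (a, b) = (-1, 2):
# targets are P{B_tau = b} = -a/(b-a) = 1/3 and E tau = -a*b = 2.
rng = np.random.default_rng(6)
a, b, dt = -1.0, 2.0, 0.01
n_paths, max_steps = 2_000, 3_000

B = (np.sqrt(dt) * rng.standard_normal((n_paths, max_steps))).cumsum(axis=1)
crossed = (B <= a) | (B >= b)
rows = np.where(crossed.any(axis=1))[0]   # paths that exit by time 30
first = crossed[rows].argmax(axis=1)      # index of the first exit step
p_b = (B[rows, first] >= b).mean()
mean_tau = ((first + 1) * dt).mean()
```

Discrete monitoring misses some boundary crossings, so `mean_tau` slightly overshoots $2$; the bias is of order $\sqrt{dt}$.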

Proof of Theorem 14.1: Let $\mu$ be the common distribution of the $\xi_n$. Introduce a Brownian motion $B$ and some independent i.i.d. pairs $(\alpha_n, \beta_n)$, $n \in \mathbb{N}$, with the distribution $\hat{\mu}$ of Lemma 14.4. Define recursively the random times $0 = \tau_0 \le \tau_1 \le \cdots$ by

$$\tau_n = \inf\{t \ge \tau_{n-1};\ B_t - B_{\tau_{n-1}} \in \{\alpha_n, \beta_n\}\}, \qquad n \in \mathbb{N}.$$

Here each $\tau_n$ is clearly optional for the filtration $\mathcal{F}_t = \sigma\{\alpha_k, \beta_k,\ k \ge 1;\ B^t\}$, $t \ge 0$, and $B$ is an $\mathcal{F}$-Brownian motion. By the strong Markov property at $\tau_n$, the process $B^{(n)}_t = B_{\tau_n + t} - B_{\tau_n}$ is then a Brownian motion independent of $\mathcal{G}_n = \sigma\{\tau_k, B_{\tau_k};\ k \le n\}$. Since moreover $(\alpha_{n+1}, \beta_{n+1}) \perp\!\!\!\perp (B^{(n)}, \mathcal{G}_n)$, we obtain $(\alpha_{n+1}, \beta_{n+1}, B^{(n)}) \perp\!\!\!\perp \mathcal{G}_n$, and so the pairs $(\Delta\tau_n, \Delta B_{\tau_n})$ are i.i.d. The remaining assertions now follow by Lemma 14.5. $\Box$

The last theorem enables us to approximate the entire random walk by a Brownian motion. As before, we assume the underlying probability space to be rich enough to support the required randomization variables.

Theorem 14.6 (approximation of random walk, Skorohod, Strassen) Let $\xi_1, \xi_2, \dots$ be i.i.d. random variables with mean $0$ and variance $1$, and write $S_n = \xi_1 + \cdots + \xi_n$. Then there exists a Brownian motion $B$ such that

$$t^{-1/2} \sup_{s \le t} |S_{[s]} - B_s| \stackrel{P}{\to} 0, \qquad (4)$$
$$\lim_{t \to \infty} \frac{S_{[t]} - B_t}{\sqrt{2t \log\log t}} = 0 \quad \text{a.s.} \qquad (5)$$

The proof of (5) requires the following estimate.


Lemma 14.7 (rate of continuity) For a Brownian motion $B$ in $\mathbb{R}$, we have

$$\lim_{r \downarrow 1} \limsup_{t \to \infty} \sup_{t \le u \le rt} \frac{|B_u - B_t|}{\sqrt{2t \log\log t}} = 0 \quad \text{a.s.}$$

Proof: Write $h(t) = (2t \log\log t)^{1/2}$. It is enough to show that

$$\lim_{r \downarrow 1} \limsup_{n \to \infty} \sup_{r^n \le t \le r^{n+1}} \frac{|B_t - B_{r^n}|}{h(r^n)} = 0 \quad \text{a.s.} \qquad (6)$$

Proceeding as in the proof of Theorem 13.18, we get as $n \to \infty$, for fixed $r > 1$ and $c > 0$,

$$P\Big\{\sup_{t \in [r^n, r^{n+1}]} |B_t - B_{r^n}| > c\,h(r^n)\Big\} \lesssim P\{B(r^n(r-1)) > c\,h(r^n)\} \lesssim n^{-c^2/(r-1)} (\log n)^{-1/2}.$$

(As before, $a \lesssim b$ means that $a \le cb$ for some constant $c > 0$.) If $c^2 > r - 1$, it is clear from the Borel–Cantelli lemma that the limsup in (6) is a.s. bounded by $c$, and the relation follows as we let $r \downarrow 1$. $\Box$

For the main proof, we need to introduce the modulus of continuity

$$w(f, t, h) = \sup_{r, s \le t,\ |r - s| \le h} |f_r - f_s|, \qquad t, h > 0.$$

Proof of Theorem 14.6: By Theorems 6.10 and 14.1 we may choose a Brownian motion $B$ and some optional times $0 = \tau_0 \le \tau_1 \le \cdots$ such that $S_n = B_{\tau_n}$ a.s. for all $n$, and the differences $\tau_n - \tau_{n-1}$ are i.i.d. with mean $1$. Then $\tau_n/n \to 1$ a.s. by the law of large numbers, and so $\tau_{[t]}/t \to 1$ a.s. Relation (5) now follows by Lemma 14.7. Next define

$$\delta_t = \sup_{s \le t} |\tau_{[s]} - s|, \qquad t \ge 0,$$

and note that the a.s. convergence $\tau_n/n \to 1$ implies $\delta_t/t \to 0$ a.s. Fix any $t, h, \varepsilon > 0$, and conclude by the scaling property of $B$ that

$$P\Big\{t^{-1/2} \sup_{s \le t} |B_{\tau_{[s]}} - B_s| > \varepsilon\Big\} \le P\{w(B, t + th, th) > \varepsilon t^{1/2}\} + P\{\delta_t > th\}$$
$$= P\{w(B, 1 + h, h) > \varepsilon\} + P\{t^{-1}\delta_t > h\}.$$

Here the right-hand side tends to zero as $t \to \infty$ and then $h \to 0$, and (4) follows. $\Box$

As an immediate application of the last theorem, we may extend the law of the iterated logarithm to suitable random walks.

14. Skorohod Embedding and Invariance Principles

275

Corollary 14.8 (law of the iterated logarithm, Hartman and Wintner) Let $\xi_1, \xi_2, \dots$ be i.i.d. random variables with mean 0 and variance 1, and define $S_n = \xi_1 + \cdots + \xi_n$. Then
\[ \limsup_{n \to \infty} \frac{S_n}{\sqrt{2n \log\log n}} = 1 \quad \text{a.s.} \]

Proof: Combine Theorems 13.18 and 14.6. $\Box$

To derive a weak convergence result, let $D[0,1]$ denote the space of all functions on $[0,1]$ that are right-continuous with left-hand limits (rcll). For our present needs, it is convenient to equip $D[0,1]$ with the norm $\|x\| = \sup_t |x_t|$ and the $\sigma$-field $\mathcal{D}$ generated by all evaluation maps $\pi_t: x \mapsto x_t$. The norm is clearly $\mathcal{D}$-measurable, and so the same is true for the open balls $B_{x,r} = \{y;\ \|x - y\| < r\}$, $x \in D[0,1]$, $r > 0$. (However, $\mathcal{D}$ is strictly smaller than the Borel $\sigma$-field induced by the norm.) Given a process $X$ with paths in $D[0,1]$ and a mapping $f: D[0,1] \to \mathbb{R}$, we say that $f$ is a.s. continuous at $X$ if $X \notin D_f$ a.s., where $D_f$ is the set of functions $x \in D[0,1]$ where $f$ is discontinuous. (The measurability of $D_f$ is irrelevant here, provided that we interpret the condition in the sense of inner measure.) We may now state a functional version of the classical central limit theorem.

Theorem 14.9 (functional central limit theorem, Donsker) Let $\xi_1, \xi_2, \dots$ be i.i.d. random variables with mean 0 and variance 1, and define
\[ X^n_t = n^{-1/2} \sum_{k \le nt} \xi_k, \quad t \in [0,1],\ n \in \mathbb{N}. \]
Consider a Brownian motion $B$ on $[0,1]$, and let $f: D[0,1] \to \mathbb{R}$ be measurable and a.s. continuous at $B$. Then $f(X^n) \overset{d}{\to} f(B)$.

The result follows immediately from Theorem 14.6 together with the following lemma.

Lemma 14.10 (approximation and convergence) Let $X_1, X_2, \dots$ and $Y_1, Y_2, \dots$ be rcll processes on $[0,1]$ with $Y_n \overset{d}{=} Y_1 = Y$ for all $n$ and $\|X_n - Y_n\| \overset{P}{\to} 0$, and let $f: D[0,1] \to \mathbb{R}$ be measurable and a.s. continuous at $Y$. Then $f(X_n) \overset{d}{\to} f(Y)$.

Proof: Put $T = \mathbb{Q} \cap [0,1]$. By Theorem 6.10 there exist some processes $\tilde{X}_n$ on $T$ such that $(\tilde{X}_n, Y) \overset{d}{=} (X_n, Y_n)$ on $T$ for all $n$. Then each $\tilde{X}_n$ is a.s. bounded and has finitely many upcrossings of any nondegenerate interval, and so the process $\hat{X}_n(t) = \tilde{X}_n(t+)$ exists a.s. with paths in $D[0,1]$. From the right continuity of paths it is also clear that $(\hat{X}_n, Y) \overset{d}{=} (X_n, Y_n)$ on $[0,1]$ for every $n$. To obtain the desired convergence, we note that $\|\hat{X}_n - Y\| \overset{d}{=} \|X_n - Y_n\| \overset{P}{\to} 0$, and hence $f(X_n) \overset{d}{=} f(\hat{X}_n) \overset{P}{\to} f(Y)$ as in Lemma 4.3. $\Box$
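The functional limit theorem licenses numerical checks like the following sketch (illustrative only, not part of the text): for the maximum functional $f(x) = \sup_{t \le 1} x_t$, the reflection principle gives $P\{\sup_{t \le 1} B_t \le a\} = 2\Phi(a) - 1$, and a simulated $\pm 1$ walk should reproduce this value approximately.

```python
import math
import random

random.seed(1)

def max_of_scaled_walk(n):
    """max_{k <= n} S_k / sqrt(n) for a +/-1 random walk, i.e. the
    functional f(x) = sup_t x_t applied to the path X^n."""
    s, m = 0, 0
    for _ in range(n):
        s += random.choice((-1, 1))
        m = max(m, s)
    return m / math.sqrt(n)

def phi(a):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

reps, n, a = 4000, 400, 1.0
empirical = sum(max_of_scaled_walk(n) <= a for _ in range(reps)) / reps
limit = 2 * phi(a) - 1   # P{ sup_{t<=1} B_t <= 1 }, about 0.683
print(round(empirical, 3), round(limit, 4))
```

The discrete maximum sits slightly below the continuous supremum, so the empirical probability overshoots the limit by a term of order $n^{-1/2}$.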


In particular, we may recover the central limit theorem in Proposition 5.9 by taking $f(x) = x_1$ in Theorem 14.9. We may also obtain results that go beyond the classical theory, such as for the choice $f(x) = \sup_t |x_t|$. As a less obvious application, we shall see how the arcsine laws of Theorem 13.16 can be extended to suitable random walks. Recall that a random variable $\xi$ is said to be arcsine distributed if $\xi \overset{d}{=} \sin^2 \alpha$, where $\alpha$ is $U(0, 2\pi)$.

Theorem 14.11 (arcsine laws, Erdős and Kac, Sparre-Andersen) Let $(S_n)$ be a random walk based on some distribution $\mu$ with mean 0 and variance 1, and define for $n \in \mathbb{N}$
\begin{align*}
\tau_n^1 &= n^{-1} \sum\nolimits_{k \le n} 1\{S_k > 0\}, \\
\tau_n^2 &= n^{-1} \min\{k \ge 0;\ S_k = \max\nolimits_{j \le n} S_j\}, \\
\tau_n^3 &= n^{-1} \max\{k \le n;\ S_k S_n \le 0\}.
\end{align*}
Then $\tau_n^i \overset{d}{\to} \tau$ for $i = 1, 2, 3$, where $\tau$ is arcsine distributed. The results for $i = 1, 2$ remain valid for any nondegenerate, symmetric distribution $\mu$.

For the proof, we consider on $D[0,1]$ the functionals
\begin{align*}
f_1(x) &= \lambda\{t \in [0,1];\ x_t > 0\}, \\
f_2(x) &= \inf\{t \in [0,1];\ x_t \vee x_{t-} = \sup\nolimits_{s \le 1} x_s\}, \\
f_3(x) &= \sup\{t \in [0,1];\ x_t x_1 \le 0\}.
\end{align*}
The following result is elementary.

Lemma 14.12 (continuity of functionals) The functionals $f_i$ are measurable. Furthermore, $f_1$ is continuous at $x$ iff $\lambda\{t;\ x_t = 0\} = 0$, $f_2$ is continuous at $x$ iff $x_t \vee x_{t-}$ has a unique maximum, and $f_3$ is continuous at $x$ if 0 is not a local extreme of $x_t$ or $x_{t-}$ on $(0,1]$.

Proof of Theorem 14.11: Clearly $\tau_n^i = f_i(X^n)$ for $n \in \mathbb{N}$ and $i = 1, 2, 3$, where
\[ X^n_t = n^{-1/2} S_{[nt]}, \quad t \in [0,1],\ n \in \mathbb{N}. \]
To prove the first assertion, it suffices by Theorems 13.16 and 14.9 to show that each $f_i$ is a.s. continuous at $B$. Thus, we need to verify that $B$ a.s. satisfies the conditions in Lemma 14.12. For $f_1$ this is obvious, since by Fubini's theorem
\[ E\lambda\{t \le 1;\ B_t = 0\} = \int_0^1 P\{B_t = 0\}\, dt = 0. \]
The conditions for $f_2$ and $f_3$ follow easily from Lemma 13.15.

To prove the last assertion, it is enough to consider $\tau_n^1$, since $\tau_n^2$ has the same distribution by Corollary 11.14. Then introduce an independent Brownian motion $B$ and define
\[ \sigma_n^\varepsilon = n^{-1} \sum\nolimits_{k \le n} 1\{\varepsilon B_k + (1 - \varepsilon) S_k > 0\}, \quad n \in \mathbb{N},\ \varepsilon \in (0,1]. \]
By the first assertion together with Theorem 9.11 and Corollary 11.14, we have $\sigma_n^\varepsilon \overset{d}{\to} \tau$. Since $P\{S_n = 0\} \to 0$, e.g. by Theorem 4.17, we also note that
\[ \limsup_{\varepsilon \to 0} |\sigma_n^\varepsilon - \tau_n^1| \le n^{-1} \sum\nolimits_{k \le n} 1\{S_k = 0\} \overset{P}{\to} 0. \]
Hence, we may choose some constants $\varepsilon_n \to 0$ with $\sigma_n^{\varepsilon_n} - \tau_n^1 \overset{P}{\to} 0$, and by Theorem 4.28 we get $\tau_n^1 \overset{d}{\to} \tau$. $\Box$
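A small simulation (an illustrative sketch, not from the text) makes the U-shape of the arcsine limit visible: the fraction of time a $\pm 1$ walk spends positive piles up near 0 and 1, with very little mass near $1/2$, in accordance with the limiting distribution function $F(x) = (2/\pi)\arcsin\sqrt{x}$.

```python
import random

random.seed(2)

def frac_positive(n):
    """tau^1_n = n^{-1} #{k <= n : S_k > 0} for a +/-1 random walk."""
    s, pos = 0, 0
    for _ in range(n):
        s += random.choice((-1, 1))
        pos += s > 0
    return pos / n

reps, n = 3000, 200
taus = [frac_positive(n) for _ in range(reps)]

# The arcsine density is U-shaped: mass concentrates near 0 and 1.
extreme = sum(t < 0.1 or t > 0.9 for t in taus) / reps
middle = sum(0.45 < t < 0.55 for t in taus) / reps
# Limit values: F(0.1) + 1 - F(0.9) is about 0.41, while
# F(0.55) - F(0.45) is only about 0.064.
print(round(extreme, 3), round(middle, 3))
```

For finite $n$ the strict-positivity convention shifts a little extra mass toward 0, a $O(n^{-1/2})$ effect that disappears in the limit.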

Theorem 14.9 is often referred to as an invariance principle, because the limiting distribution of $f(X^n)$ is the same for all i.i.d. sequences $(\xi_k)$ with mean 0 and variance 1. This fact is often useful for applications, since a direct computation may be possible for some special choice of distribution, such as for $P\{\xi_k = \pm 1\} = \frac{1}{2}$.

The approximation Theorem 14.6 yields a corresponding result for renewal processes, regarded here as nondecreasing step processes.

Theorem 14.13 (approximation of renewal processes) Let $N$ be a renewal process based on some distribution $\mu$ with mean 1 and variance $\sigma^2 \in (0, \infty)$. Then there exists a Brownian motion $B$ such that
\[ t^{-1/2} \sup_{s \le t} |N_s - s - \sigma B_s| \overset{P}{\to} 0, \quad t \to \infty, \tag{7} \]
\[ \lim_{t \to \infty} \frac{N_t - t - \sigma B_t}{\sqrt{2t \log\log t}} = 0 \quad \text{a.s.} \tag{8} \]

Proof: Let $\tau_0, \tau_1, \dots$ be the renewal times of $N$, and introduce the random walk $S_n = n - \tau_n + \tau_0$, $n \in \mathbb{Z}_+$. Choosing a Brownian motion $B$ as in Theorem 14.6, we get
\[ \lim_{n \to \infty} \frac{N_{\tau_n} - \tau_n - \sigma B_n}{\sqrt{2n \log\log n}} = \lim_{n \to \infty} \frac{S_n - \sigma B_n}{\sqrt{2n \log\log n}} = 0 \quad \text{a.s.} \]
Since $\tau_n \sim n$ a.s. by the law of large numbers, we may replace $n$ in the denominator by $\tau_n$, and by Lemma 14.7 we may further replace $B_n$ by $B_{\tau_n}$. Hence,
\[ \frac{N_t - t - \sigma B_t}{\sqrt{2t \log\log t}} \to 0 \quad \text{a.s. along } (\tau_n). \]
Invoking Lemma 14.7, we see that (8) will follow if we can only show that
\[ \frac{\tau_{n+1} - \tau_n}{\sqrt{2\tau_n \log\log \tau_n}} \to 0 \quad \text{a.s.} \]
This may be seen most easily from Theorem 14.6.

From Theorem 14.6 we see that also
\[ n^{-1/2} \sup_{k \le n} |N_{\tau_k} - \tau_k - \sigma B_k| = n^{-1/2} \sup_{k \le n} |S_k - \tau_0 - \sigma B_k| \overset{P}{\to} 0, \]
and by Brownian scaling,
\[ n^{-1/2} w(B, n, 1) \overset{d}{=} w(B, 1, n^{-1}) \to 0. \]
To get (7), it is then enough to show that
\[ n^{-1/2} \sup_{k \le n} |\tau_k - \tau_{k-1} - 1| = n^{-1/2} \sup_{k \le n} |S_k - S_{k-1}| \overset{P}{\to} 0, \]
which is again clear from Theorem 14.6. $\Box$
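The normal fluctuations of $N_t$ predicted by the theorem are easy to check by simulation (a sketch with an arbitrary choice of spacing distribution, not taken from the text): for spacings uniform on $[0.5, 1.5]$, so that the mean is 1 and $\sigma^2 = 1/12$, the count $N_t$ should fluctuate around $t$ with standard deviation close to $\sigma\sqrt{t}$.

```python
import math
import random

random.seed(3)

def renewal_count(t):
    """N_t for a renewal process with U(0.5, 1.5) spacings
    (mean 1, variance 1/12): the number of renewals up to time t."""
    s, n = 0.0, 0
    while True:
        s += random.uniform(0.5, 1.5)
        if s > t:
            return n
        n += 1

t, reps = 400.0, 1000
counts = [renewal_count(t) for _ in range(reps)]
mean = sum(counts) / reps
sd = math.sqrt(sum((c - mean) ** 2 for c in counts) / (reps - 1))
# Expect mean close to t = 400 and sd close to sqrt(t/12), about 5.77.
print(round(mean, 1), round(sd, 2))
```

The same experiment with any other mean-1 spacing law of variance $\sigma^2$ gives standard deviation close to $\sigma\sqrt{t}$, which is the invariance built into (7).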

We may now proceed as in Corollary 14.8 and Theorem 14.9 to deduce an associated law of the iterated logarithm and a weak convergence result.

Corollary 14.14 (limits of renewal processes) Let $N$ be a renewal process based on some distribution $\mu$ with mean 1 and variance $\sigma^2 < \infty$. Then
\[ \limsup_{t \to \infty} \frac{\pm(N_t - t)}{\sqrt{2t \log\log t}} = \sigma \quad \text{a.s.} \]
If $B$ is a Brownian motion and
\[ X^r_t = \frac{N_{rt} - rt}{\sigma r^{1/2}}, \quad t \in [0,1],\ r > 0, \]
then also $f(X^r) \overset{d}{\to} f(B)$ as $r \to \infty$ for any measurable function $f: D[0,1] \to \mathbb{R}$ that is a.s. continuous at $B$.

The weak convergence part of the last corollary yields a similar result for the empirical distribution functions associated with a sequence of i.i.d. random variables. In this case the asymptotic behavior can be expressed in terms of a Brownian bridge.

Theorem 14.15 (approximation of empirical distribution functions) Let $\xi_1, \xi_2, \dots$ be i.i.d. random variables with distribution function $F$ and empirical distribution functions $\hat{F}_1, \hat{F}_2, \dots$. Then there exist some Brownian bridges $B^1, B^2, \dots$ such that
\[ \sup_x |n^{1/2}(\hat{F}_n(x) - F(x)) - B^n \circ F(x)| \overset{P}{\to} 0, \quad n \to \infty. \tag{9} \]
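Before the proof, note that (9) is the basis of the Kolmogorov-Smirnov statistic. As a numerical sketch (illustrative only; the sample size and $U(0,1)$ reduction are arbitrary choices), $\sqrt{n}\,\sup_x |\hat{F}_n(x) - F(x)|$ for uniform samples should behave like $\sup_t |B^0_t|$ for a Brownian bridge $B^0$, whose mean is $\sqrt{\pi/2}\,\ln 2 \approx 0.8687$.

```python
import math
import random

random.seed(4)

def ks_statistic(n):
    """sqrt(n) * sup_x |F_hat_n(x) - x| for a U(0,1) sample,
    computed exactly from the order statistics."""
    xs = sorted(random.random() for _ in range(n))
    d = max(max((k + 1) / n - x, x - k / n) for k, x in enumerate(xs))
    return math.sqrt(n) * d

reps, n = 500, 200
stats = [ks_statistic(n) for _ in range(reps)]
mean = sum(stats) / reps
limit_mean = math.sqrt(math.pi / 2) * math.log(2)   # about 0.8687
print(round(mean, 3), round(limit_mean, 4))
```

Since the limit in (9) does not depend on $F$, the same distribution appears for any continuous $F$, which is what makes the Kolmogorov-Smirnov test distribution-free.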

Proof: Arguing as in the proof of Proposition 4.24, we may reduce the discussion to the case when the $\xi_n$ are $U(0,1)$ and $F(t) \equiv t$ on $[0,1]$. Then clearly $\hat{F}_n(t) = n^{-1} \sum_{k \le n} 1\{\xi_k \le t\}$. Now introduce for each $n$ an independent Poisson random variable $\kappa_n$ with mean $n$, and conclude from Proposition 12.4 that $N^n_t = \sum_{k \le \kappa_n} 1\{\xi_k \le t\}$ is a homogeneous Poisson process on $[0,1]$ with rate $n$. By Theorem 14.13 there exist some Brownian motions $W^n$ on $[0,1]$ with
\[ \sup_{t \le 1} |n^{-1/2}(N^n_t - nt) - W^n_t| \overset{P}{\to} 0. \]
For the associated Brownian bridges $B^n_t = W^n_t - t W^n_1$, we get
\[ \sup_{t \le 1} |n^{-1/2}(N^n_t - t N^n_1) - B^n_t| \overset{P}{\to} 0. \]


To deduce (9), it is enough to show that $n^{-1/2} \sup_t \dots$

$\dots > 2\}$. Then $[M^n]_{\kappa_n} - \langle M^n \rangle_{\kappa_n} \overset{P}{\to} 0$ by Lemma 14.18, and so $P\{\langle M^n \rangle_{\kappa_n} < 1,\ \kappa_n < \infty\} \to 0$. We may then reduce by optional stopping to the case when $[M^n] \le 3$. The proof may now be completed as before. $\Box$

Though the Skorohod embedding has no natural extension to higher dimensions, one can still obtain useful multidimensional approximations by applying the previous results to each component separately. To illustrate the method, we proceed to show how suitable random walks in $\mathbb{R}^d$ can be approximated by continuous processes with stationary, independent increments. Extensions to more general limits are obtained by different methods in Corollary 15.20 and Theorem 16.14.

Theorem 14.20 (approximation of random walks in $\mathbb{R}^d$) Let $S^1, S^2, \dots$ be random walks in $\mathbb{R}^d$ such that $\mathcal{L}(S^n_{m_n}) \overset{w}{\to} N(0, \sigma\sigma')$ for some $d \times d$ matrix $\sigma$ and integers $m_n \to \infty$. Then there exist some Brownian motions $B^1, B^2, \dots$ in $\mathbb{R}^d$ such that the processes $X^n_t = S^n_{[m_n t]}$ satisfy $(X^n - \sigma B^n)^*_t \overset{P}{\to} 0$ for all $t \ge 0$.

Proof: By Theorem 5.15 we have
\[ \max_{k \le m_n t} |\Delta S^n_k| \overset{P}{\to} 0, \quad t \ge 0, \]
and so we may assume that $|\Delta S^n_k| \le 1$ for all $n$ and $k$. Subtracting the means, we may further assume that $E S^n_k = 0$. Applying Theorem 14.17 in each coordinate, we get $w(X^n, t, h) \overset{P}{\to} 0$ as $n \to \infty$ and then $h \to 0$. Furthermore, $w(\sigma B, t, h) \to 0$ a.s. as $h \to 0$. Using Theorem 5.15 in both directions gives $X^n_{t_n} \overset{d}{\to} \sigma B_t$ as $t_n \to t$. By independence it follows that $(X^n_{t_1}, \dots, X^n_{t_m}) \overset{d}{\to} \sigma(B_{t_1}, \dots, B_{t_m})$ for all $t_1, \dots, t_m \ge 0$, and so $X^n \overset{d}{\to} \sigma B$ on $\mathbb{Q}_+$ by Theorem 4.29. By Theorem 4.30 or, more conveniently, by Corollary 6.12 and Theorem A2.2, there exist some rcll processes $Y^n \overset{d}{=} X^n$ with $Y^n_t \to \sigma B_t$ a.s. for all $t \in \mathbb{Q}_+$. For any $t, h > 0$ we have
\begin{align*}
E[(Y^n - \sigma B)^*_t \wedge 1] &\le E\big[\max\nolimits_{j \le t/h} |Y^n_{jh} - \sigma B_{jh}| \wedge 1\big] \\
&\quad + E[w(Y^n, t, h) \wedge 1] + E[w(\sigma B, t, h) \wedge 1].
\end{align*}
Multiplying by $e^{-t}$, integrating over $t > 0$, and letting $n \to \infty$ and then $h \to 0$ along $\mathbb{Q}_+$, we get by dominated convergence
\[ \lim_{n \to \infty} \int_0^\infty e^{-t}\, E[(Y^n - \sigma B)^*_t \wedge 1]\, dt = 0. \]
Hence, by monotonicity, the last integrand tends to zero as $n \to \infty$, and so $(Y^n - \sigma B)^*_t \overset{P}{\to} 0$ for each $t > 0$. It remains to use Theorem 6.10. $\Box$
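The componentwise strategy can be illustrated by checking the limiting covariance numerically (an illustrative sketch; the correlated $\pm 1$ steps are an arbitrary choice, not from the text): if the steps have covariance matrix $\sigma\sigma'$, then $n^{-1/2} S_n$ should have empirical covariance close to $\sigma\sigma'$.

```python
import math
import random

random.seed(5)

def scaled_endpoint(n, p=0.75):
    """n^{-1/2} S_n for a planar walk whose steps (u, v) are +/-1 with
    v = u with probability p, so the step covariance matrix is
    [[1, r], [r, 1]] with r = 2p - 1 = 0.5."""
    x = y = 0
    for _ in range(n):
        u = random.choice((-1, 1))
        v = u if random.random() < p else -u
        x += u
        y += v
    return x / math.sqrt(n), y / math.sqrt(n)

reps, n = 2000, 300
pts = [scaled_endpoint(n) for _ in range(reps)]
cxx = sum(a * a for a, _ in pts) / reps
cyy = sum(b * b for _, b in pts) / reps
cxy = sum(a * b for a, b in pts) / reps
# Expect roughly 1, 1, 0.5, the entries of the step covariance matrix.
print(round(cxx, 2), round(cyy, 2), round(cxy, 2))
```

Each coordinate alone is an ordinary one-dimensional invariance principle; the joint Gaussian limit with the right cross-covariance is exactly what the componentwise argument in the proof delivers.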

Exercises

1. Proceed as in Lemma 14.2 to construct Brownian martingales with leading terms $B_t^3$ and $B_t^4$. Use multiple Wiener-Itô integrals to give an alternative proof of the lemma, and find for every $n \in \mathbb{N}$ a martingale with leading term $B_t^n$. (Hint: Use Theorem 13.25.)

2. Given a Brownian motion $B$ and an associated optional time $\tau < \infty$, show that $E\tau \ge E B_\tau^2$. (Hint: Truncate $\tau$ and use Fatou's lemma.)

3. For $S_n$ as in Corollary 14.8, show that the sequence of random variables $(2n \log\log n)^{-1/2} S_n$, $n \ge 3$, is a.s. relatively compact with set of limit points equal to $[-1, 1]$. (Hint: Prove the corresponding property for Brownian motion, and use Theorem 14.6.)

4. Let $\xi_1, \xi_2, \dots$ be i.i.d. random vectors in $\mathbb{R}^d$ with mean 0 and covariances $\delta_{ij}$. Show that the conclusion of Corollary 14.8 holds with $S_n$ replaced by $|S_n|$. More precisely, show that the sequence $(2n \log\log n)^{-1/2} S_n$, $n \ge 3$, is relatively compact in $\mathbb{R}^d$, and that the set of limit points is contained in the closed unit ball. (Hint: Apply Corollary 14.8 to the projections $u \cdot S_n$ for arbitrary $u \in \mathbb{R}^d$ with $|u| = 1$.)

5. In Theorem 13.18, show that for any $c \in (0, 1)$ there exists a sequence $t_n \to \infty$ such that the limsup along $(t_n)$ equals $c$ a.s. Conclude that the set of limit points in the preceding exercise agrees with the closed unit ball in $\mathbb{R}^d$.

6. Condition (21) clearly follows from $\sum_k E[|\Delta M^n_k| \wedge 1 \mid \mathcal{F}^n_{k-1}] \overset{P}{\to} 0$. Show by an example that the latter condition is strictly stronger. (Hint: Consider a sequence of random walks.)

7. Specialize Lemma 14.18 to random walks, and give a direct proof in this case.

8. In the special case of random walks, show that condition (21) is also necessary. (Hint: Use Theorem 5.15.)

9. Specialize Theorem 14.17 to a sequence of random walks in $\mathbb{R}$, and derive a corresponding extension of Theorem 14.9. Then derive a functional version of Theorem 5.12.

10. Specialize further to the case of successive renormalizations of a single random walk $S_n$. Then derive a limit theorem for the values at $t = 1$, and compare with Proposition 5.9.

11. In the second arcsine law of Theorem 14.11, show that the first maximum on $[0, 1]$ can be replaced by the last one. Conclude that the associated times $\sigma_n$ and $\tau_n$ satisfy $\tau_n - \sigma_n \overset{P}{\to} 0$. (Hint: Use the corresponding result for Brownian motion. Alternatively, use the symmetry of $(S_n)$ and of the arcsine distribution.)

12. Extend Theorem 14.11 to an arbitrary sequence of symmetric random walks satisfying a Lindeberg condition. Also extend the results for $\tau^1_n$ and $\tau^2_n$ to sequences of random walks based on diffuse, symmetric distributions. Finally, show that the result for $\tau^3_n$ may fail in the latter case. (Hint: Consider the $n^{-1}$-increments of a compound Poisson process based on the uniform distribution on $[-1, 1]$, perturbed by a small diffusion term $\varepsilon_n B$, where $B$ is an independent Brownian motion.)

13. In the context of Theorem 14.20, show that for any Brownian motion $B$ there exist some processes $Y^n \overset{d}{=} X^n$ such that $(Y^n - \sigma B)^*_t \to 0$ a.s. for all $t \ge 0$. Prove a corresponding version of Theorem 14.17. (Hint: Use Theorem 4.30 or Corollary 6.12.)

Chapter 15

Independent Increments and Infinite Divisibility

Regularity and integral representation; Lévy processes and subordinators; stable processes and first-passage times; infinitely divisible distributions; characteristics and convergence criteria; approximation of Lévy processes and random walks; limit theorems for null arrays; convergence of extremes

In Chapters 12 and 13 we saw how Poisson processes and Brownian motion arise as special processes with independent increments. Our present aim is to study more general processes of this type. Under a mild regularity assumption, we shall derive a general representation of independent-increment processes in terms of a Gaussian component and a jump component, where the latter is expressible as a suitably compensated Poisson integral. Of special importance is the time-homogeneous case of so-called Lévy processes, which admit a description in terms of a characteristic triple $(a, b, \nu)$, where $a$ is the diffusion rate, $b$ is the drift coefficient, and $\nu$ is the Lévy measure that determines the rates for jumps of different sizes. In the same way that Brownian motion is the basic example of both a diffusion process and a continuous martingale, the general Lévy processes constitute the fundamental cases of both Markov processes and general semimartingales. As a motivation for the general weak convergence theory of Chapter 16, we shall further see how Lévy processes serve as the natural approximations to random walks. In particular, such approximations may be used to extend two of the arcsine laws for Brownian motion to general symmetric Lévy processes. Increasing Lévy processes, also called subordinators, play a basic role in Chapter 22, where they appear in representations of local time and regenerative sets.

The distributions of Lévy processes at fixed times coincide with the infinitely divisible laws, which also arise as the most general limit laws in the classical limit theorems for null arrays. The special cases of convergence toward Poisson and Gaussian limits were considered in Chapter 5, and now we shall be able to characterize the convergence toward an arbitrary infinitely divisible law. Though characteristic functions will still be needed occasionally as a technical tool, the present treatment is more probabilistic in flavor and involves as crucial steps a centering at truncated means followed by a compound Poisson approximation.

To resume our discussion of general independent-increment processes, say that a process $X$ in $\mathbb{R}^d$ is continuous in probability if $X_s \overset{P}{\to} X_t$ whenever $s \to t$. Let us further say that a function $f$ on $\mathbb{R}_+$ or $[0,1]$ is right-continuous with left-hand limits (abbreviated as rcll) if the right- and left-hand limits $f_{t\pm}$ exist and are finite and if, moreover, $f_{t+} = f_t$. A process $X$ is said to be rcll if its paths have this property. In that case only jump discontinuities may occur, and we say that $X$ has a fixed jump at some time $t > 0$ if $P\{X_t \ne X_{t-}\} > 0$. The following result gives the basic regularity properties of independent-increment processes. A similar result for Feller processes is obtained by different methods in Theorem 19.15.

Theorem 15.1 (regularization, Lévy) If a process $X$ in $\mathbb{R}^d$ is continuous in probability and has independent increments, then $X$ has an rcll version without fixed jumps.

For the proof we shall use a martingale argument based on the characteristic functions
\[ \varphi_{s,t}(u) = E \exp\{i u'(X_t - X_s)\}, \quad u \in \mathbb{R}^d,\ 0 \le s \le t. \]
Note that $\varphi_{r,s}\varphi_{s,t} = \varphi_{r,t}$ for any $r \le s \le t$, and put $\dots$

$\dots$ In the special case when $X$ is real and nondecreasing, (1) simplifies to
\[ X_t = a_t + \int_0^t \int_0^\infty x\, \eta(ds\, dx), \quad t \ge 0, \tag{3} \]
for some nondecreasing continuous function $a$ with $a_0 = 0$ and some Poisson process $\eta$ on $(0,\infty)^2$ with
\[ \int_0^t \int_0^\infty (x \wedge 1)\, E\eta(ds\, dx) < \infty, \quad t > 0. \tag{4} \]

Both representations are a.s. unique, and all functions $m$ or $a$ and processes $G$ and $\eta$ with the stated properties may occur.

We begin the proof by analyzing the jump structure of $X$. Let us then introduce the random measure
\[ \eta = \sum\nolimits_t \delta_{(t, \Delta X_t)}, \tag{5} \]
where the summation extends over all times $t > 0$ with $\Delta X_t = X_t - X_{t-} \ne 0$. We say that $\eta$ is locally $X$-measurable if, for any $s < t$, the measure $\eta((s,t] \times \cdot)$ is a measurable function of the process $X_r - X_s$, $r \in [s,t]$.


Lemma 15.5 (Poisson process of jumps) Let $X$ be an rcll process in $\mathbb{R}^d$ with independent increments and no fixed jumps. Then $\eta$ in (5) is a locally $X$-measurable Poisson process on $(0,\infty) \times (\mathbb{R}^d \setminus \{0\})$ satisfying (2). If $X$ is further real-valued and nondecreasing, then $\eta$ is supported by $(0,\infty)^2$ and satisfies (4).

Proof (beginning): Fix any times $s < t$, and consider a sequence of partitions $s = t_{n,0} < \cdots < t_{n,n} = t$ with $\max_k (t_{n,k} - t_{n,k-1}) \to 0$. For any continuous function $f$ on $\mathbb{R}^d$ that vanishes in a neighborhood of 0, we have
\[ \sum\nolimits_k f(X_{t_{n,k}} - X_{t_{n,k-1}}) \to \int f(x)\, \eta((s,t] \times dx), \]
which implies the measurability of the integrals on the right. By a simple approximation we may conclude that $\eta((s,t] \times B)$ is measurable for every compact set $B \subset \mathbb{R}^d \setminus \{0\}$. The measurability extends by a monotone class argument to all random variables $\eta A$ with $A$ included in some fixed bounded rectangle $[0,t] \times B$, and the further extension to arbitrary Borel sets is immediate. Since $X$ has independent increments and no fixed jumps, the same properties hold for $\eta$, which is then Poisson by Theorem 12.10. If $X$ is real-valued and nondecreasing, then (4) holds by Theorem 12.13. $\Box$

The proof of (2) requires a further lemma, which is also needed for the main proof.

Lemma 15.6 (orthogonality and independence) Let $X$ and $Y$ be rcll processes in $\mathbb{R}^d$ with $X_0 = Y_0 = 0$ such that $(X, Y)$ has independent increments and no fixed jumps. Assume also that $Y$ is a.s. a step process and that $\Delta X \cdot \Delta Y = 0$ a.s. Then $X \perp Y$.

Proof: Define $\eta$ as in (5) in terms of $Y$, and note as before that $\eta$ is locally $Y$-measurable, whereas $Y$ is locally $\eta$-measurable. By a simple transformation of $\eta$ we may reduce to the case when $Y$ has bounded jumps. Since $\eta$ is Poisson, $Y$ then has integrable variation on every finite interval. By Corollary 3.7 we need to show that $(X_{t_1}, \dots, X_{t_n}) \perp (Y_{t_1}, \dots, Y_{t_n})$ for any $t_1 < \cdots < t_n$, and by Lemma 3.8 it suffices to show for all $s < t$ that $X_t - X_s \perp Y_t - Y_s$. Without loss of generality, we may take $s = 0$ and $t = 1$. Then fix any $u, v \in \mathbb{R}^d$, and introduce the locally bounded martingales $\dots$, $t \ge 0$. Note that $N$ again has integrable variation on $[0,1]$. For $n \in \mathbb{N}$, we get by the martingale property and dominated convergence $E\sum_k \dots$ $\Box$

To prove (2), it then suffices to show that
\[ \int (|x|^2 \wedge 1)\, E\eta_t(dx) < \infty, \quad t > 0. \tag{6} \]
Then introduce for each $\varepsilon > 0$ the process
\[ X^\varepsilon_t = \sum_{s \le t} \Delta X_s\, 1\{|\Delta X_s| > \varepsilon\} = \int_{|x| > \varepsilon} x\, \eta_t(dx), \quad t \ge 0, \]
and note that $X^\varepsilon \perp (X - X^\varepsilon)$ by Lemma 15.6. By Lemmas 12.2 (i) and 15.2 we get for any $s, t > 0$ and $u \in \mathbb{R}^d \setminus \{0\}$
\[ 0 < |E e^{iu'X_t}| \le |E e^{iu'X^\varepsilon_t}| = \Big| \exp \int_{|x| > \varepsilon} (e^{iu'x} - 1)\, E\eta_t(dx) \Big| = \exp \int_{|x| > \varepsilon} (\cos u'x - 1)\, E\eta_t(dx). \]
Letting $\varepsilon \to 0$ gives
\[ \int_{|u'x| \le 1} |u'x|^2\, E\eta_t(dx) \lesssim \int (1 - \cos u'x)\, E\eta_t(dx) < \infty, \]
and (6) follows since $u$ is arbitrary. $\Box$

Proof of Theorem 15.4: In the nondecreasing case, we may subtract the jump component to obtain a continuous, nondecreasing process Y with independent increments, and from Theorem 5.11 it is clear that Y is a.s. nonrandom. Thus, in this case we get a representation as in (3).


In the general case, introduce for each $\varepsilon \in [0,1)$ the martingale
\[ M^\varepsilon_t = \int_0^t \int_{|x| \in (\varepsilon, 1]} x\, (\eta - E\eta)(ds\, dx), \quad t \ge 0. \]
Put $M_t = M^0_t$, and let $J_t$ denote the last term in (1). By Proposition 7.16 we have $E(M^\varepsilon - M^0)^{*2}_t \to 0$ for each $t$. Thus, $M + J$ has a.s. the same jumps as $X$, and so the process $Y = X - M - J$ is a.s. continuous. Since $\eta$ is locally $X$-measurable, the same is true for $Y$. Theorem 13.4 then shows that $Y$ is Gaussian with continuous mean and covariance functions. Subtracting the means $m_t$ yields a continuous, centered Gaussian process $G$, and by Lemma 15.6 we get $G \perp (M^\varepsilon + J)$ for every $\varepsilon > 0$. The independence extends to $M$ by Lemma 3.6, and so $G \perp \eta$. The uniqueness of $\eta$ is clear from (5), and $G$ is then determined by subtraction. From Theorem 12.13 it is further seen that the integrals in (1) and (3) exist for any Poisson process $\eta$ with the stated properties, and we note that the resulting process has independent increments. $\Box$

We may now specialize to the time-homogeneous case, when the distribution of $X_{t+h} - X_t$ depends only on $h$. An rcll process $X$ in $\mathbb{R}^d$ with stationary independent increments and $X_0 = 0$ is called a Lévy process. If $X$ is also real and nonnegative, it is often called a subordinator.

Corollary 15.7 (Lévy processes and subordinators) An rcll process $X$ in $\mathbb{R}^d$ is Lévy iff (1) holds with $m_t \equiv bt$, $G_t \equiv \sigma B_t$, and $E\eta = \lambda \otimes \nu$ for some $b \in \mathbb{R}^d$, some $d \times d$ matrix $\sigma$, some measure $\nu$ on $\mathbb{R}^d \setminus \{0\}$ with $\int (|x|^2 \wedge 1)\, \nu(dx) < \infty$, and some Brownian motion $B \perp \eta$ in $\mathbb{R}^d$. Furthermore, $X$ is a subordinator iff (3) holds with $a_t \equiv at$ and $E\eta = \lambda \otimes \nu$ for some $a \ge 0$ and some measure $\nu$ on $(0,\infty)$ with $\int (x \wedge 1)\, \nu(dx) < \infty$. The triple $(\sigma\sigma', b, \nu)$ or pair $(a, \nu)$ is then determined by $\mathcal{L}(X)$, and any $\sigma$, $b$, $a$, and $\nu$ with the stated properties may occur.

The measure $\nu$ above is called the Lévy measure of $X$, and the quantities $\sigma\sigma'$, $b$, and $\nu$, or $a$ and $\nu$, are referred to collectively as the characteristics of $X$.

Proof: The stationarity of the increments excludes the possibility of fixed jumps, and so $X$ has a representation as in Theorem 15.4. The stationarity also implies that $E\eta$ is time invariant. Thus, Theorem 2.6 yields $E\eta = \lambda \otimes \nu$ for some measure $\nu$ on $\mathbb{R}^d \setminus \{0\}$ or $(0,\infty)$. The stated conditions on $\nu$ are immediate from (2) and (4). Finally, Theorem 13.4 gives the form of the continuous component. Formula (5) shows that $\eta$ is a measurable function of $X$, and so $\nu$ is uniquely determined by $\mathcal{L}(X)$. The uniqueness of the remaining characteristics then follows by subtraction. $\Box$

From the representations in Theorem 15.4 we may easily deduce the following so-called Lévy-Khinchin formulas for the associated characteristic functions or Laplace transforms. Here we write $u'$ for the transpose of $u$.


Corollary 15.8 (characteristic exponents, Kolmogorov, Lévy) Let $X$ be a Lévy process in $\mathbb{R}^d$ with characteristics $(a, b, \nu)$. Then $E e^{iu'X_t} = e^{t\psi_u}$ for all $t \ge 0$ and $u \in \mathbb{R}^d$, where
\[ \psi_u = i u' b - \tfrac{1}{2} u' a u + \int \big( e^{iu'x} - 1 - i u' x\, 1\{|x| \le 1\} \big)\, \nu(dx), \quad u \in \mathbb{R}^d. \tag{7} \]
If $X$ is a subordinator with characteristics $(a, \nu)$, then also $E e^{-uX_t} = e^{-t\chi_u}$ for all $t, u \ge 0$, where
\[ \chi_u = ua + \int (1 - e^{-ux})\, \nu(dx), \quad u \ge 0. \tag{8} \]
In both cases, the characteristics are determined by $\mathcal{L}(X_1)$.

Proof: Formula (8) follows immediately from (3) and Lemma 12.2 (i). Similarly, (7) is obtained from (1) by the same lemma when $\nu$ is bounded, and the general case then follows by dominated convergence. To prove the last assertion, we note that $\psi$ is the unique continuous function with $\psi_0 = 0$ satisfying $e^{\psi_u} = E e^{iu'X_1}$. By the uniqueness theorem for characteristic functions and the independence of the increments, $\psi$ determines all finite-dimensional distributions of $X$, and so the uniqueness of the characteristics follows from the uniqueness in Corollary 15.7. $\Box$

From Proposition 8.5 we note that a Lévy process $X$ is Markov for the induced filtration $\mathcal{G} = (\mathcal{G}_t)$ with translation-invariant transition kernels $\mu_t(x, B) = \mu_t(B - x) = P\{X_t \in B - x\}$. More generally, given any filtration $\mathcal{F}$, we say that $X$ is Lévy with respect to $\mathcal{F}$, or simply $\mathcal{F}$-Lévy, if $X$ is adapted to $\mathcal{F}$ and such that $(X_t - X_s) \perp \mathcal{F}_s$ for all $s < t$. In particular, we may take $\mathcal{F}_t = \mathcal{G}_t \vee \mathcal{N}$, $t \ge 0$, where $\mathcal{N} = \sigma\{N \subset A;\ A \in \mathcal{A},\ PA = 0\}$. Note that the latter filtration is right-continuous by Corollary 7.25. Just as for Brownian motion in Theorem 13.11, we further see that any process $X$ which is $\mathcal{F}$-Lévy for some right-continuous, complete filtration $\mathcal{F}$ is a strong Markov process, in the sense that the process $X' = \theta_\tau X - X_\tau$ satisfies $X \overset{d}{=} X' \perp \mathcal{F}_\tau$ for any finite optional time $\tau$.

We turn to a brief discussion of some basic symmetry properties. A process $X$ on $\mathbb{R}_+$ is said to be self-similar if for any $r > 0$ there exists some $s = h(r) > 0$ such that the process $X_{rt}$, $t \ge 0$, has the same distribution as $sX$. Excluding the trivial case when $X_t = 0$ a.s. for all $t > 0$, it is clear that $h$ satisfies the Cauchy equation $h(xy) = h(x)h(y)$. If $X$ is right-continuous, then $h$ is continuous, and the only solutions are of the form $h(x) = x^\alpha$ for some $\alpha \in \mathbb{R}$. Let us now return to the context of Lévy processes. We say that such a process $X$ is strictly stable if it is self-similar, and weakly stable if it is self-similar apart from a centering, so that for each $r > 0$ the process $(X_{rt})$ has the same distribution as $(sX_t + bt)$ for suitable $s$ and $b$. In the latter case, the corresponding symmetrized process is strictly stable, and so $s$ is again of the form $r^\alpha$. In both cases it is clear that $\alpha > 0$. We may then introduce the index $p = \alpha^{-1}$ and say that $X$ is strictly or weakly $p$-stable.


The terminology carries over to random variables or vectors with the same distribution as $X_1$.
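Formula (8) above can be spot-checked by simulation for a compound Poisson subordinator (an illustrative sketch; the drift $a = 0.5$, rate $\lambda = 2$, and Exp(1) jump law are arbitrary choices, not from the text): here $\nu(dx) = \lambda e^{-x}\, dx$, so $\chi_u = ua + \lambda u/(1+u)$, and a Monte Carlo estimate of $E e^{-uX_t}$ should match $e^{-t\chi_u}$.

```python
import math
import random

random.seed(6)

def subordinator_sample(t, a=0.5, lam=2.0):
    """X_t = a*t + compound Poisson sum of Exp(1) jumps arriving at
    rate lam, a subordinator with Levy measure lam * exp(-x) dx."""
    x, s = a * t, 0.0
    while True:
        s += random.expovariate(lam)   # next jump time
        if s > t:
            return x
        x += random.expovariate(1.0)   # Exp(1) jump size

u, t, reps = 1.0, 1.0, 4000
est = sum(math.exp(-u * subordinator_sample(t)) for _ in range(reps)) / reps
chi = u * 0.5 + 2.0 * u / (1.0 + u)   # chi_u = u*a + lam*u/(1+u) = 1.5
print(round(est, 3), round(math.exp(-t * chi), 4))  # both near exp(-1.5)
```

The same check works for any Lévy measure one can sample from; only the closed form of $\chi_u$ changes.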

Proposition 15.9 (stable Lévy processes) Let $X$ be a nondegenerate Lévy process in $\mathbb{R}$ with characteristics $(a, b, \nu)$. Then $X$ is weakly $p$-stable for some $p > 0$ iff either of these conditions holds:

(i) $p = 2$ and $\nu = 0$;

(ii) $p \in (0, 2)$, $a = 0$, and $\nu(dx) = c_\pm |x|^{-p-1}\, dx$ on $\mathbb{R}_\pm$ for some $c_\pm \ge 0$.

For subordinators, weak $p$-stability is equivalent to the condition

(iii) $p \in (0, 1)$ and $\nu(dx) = c x^{-p-1}\, dx$ on $(0, \infty)$ for some $c > 0$.

Proof: Writing $S_r: x \mapsto rx$ for any $r > 0$, we note that the processes $X(r^p t)$ and $rX$ have characteristics $r^p(a, b, \nu)$ and $(r^2 a, rb, \nu \circ S_r^{-1})$, respectively. Since the latter are determined by the distributions, it follows that $X$ is weakly $p$-stable iff $r^p a = r^2 a$ and $r^p \nu = \nu \circ S_r^{-1}$ for all $r > 0$. In particular, $a = 0$ when $p \ne 2$. Writing $F(x) = \nu[x, \infty)$ or $\nu(-\infty, -x]$, we also note that $r^p F(rx) = F(x)$ for all $r, x > 0$, and so $F(x) = x^{-p} F(1)$, which yields the stated form of the density. The condition $\int (x^2 \wedge 1)\, \nu(dx) < \infty$ implies $p \in (0, 2)$ when $\nu \ne 0$. If $X \ge 0$, we have the stronger condition $\int (x \wedge 1)\, \nu(dx) < \infty$, so in this case $p < 1$. $\Box$

If $X$ is weakly $p$-stable for some $p \ne 1$, it can be made strictly $p$-stable by a suitable centering. In particular, a weakly $p$-stable subordinator is strictly stable iff the drift component vanishes. In the latter case we simply say that $X$ is stable. The next result shows how stable subordinators may arise naturally even in the study of continuous processes. Given a Brownian motion $B$ in $\mathbb{R}$, we introduce the maximum process $M_t = \sup_{s \le t} B_s$ and its right-continuous inverse
\[ \tau_r = \inf\{t \ge 0;\ M_t > r\} = \inf\{t \ge 0;\ B_t > r\}, \quad r \ge 0. \tag{9} \]

Theorem 15.10 (first-passage times, Lévy) For a Brownian motion $B$, the process $\tau$ in (9) is a $\frac{1}{2}$-stable subordinator with Lévy measure
\[ \nu(dx) = (2\pi)^{-1/2} x^{-3/2}\, dx, \quad x > 0. \]

Proof: By Lemma 7.6, the random times $\tau_r$ are optional with respect to the right-continuous filtration $\mathcal{F}$ induced by $B$. By the strong Markov property of $B$, the process $\tau_{r+s} - \tau_r$, $s \ge 0$, is then independent of $\mathcal{F}_{\tau_r}$ with the same distribution as $\tau$. Since $\tau$ is further adapted to the filtration $(\mathcal{F}_{\tau_r})$, it follows that $\tau$ has stationary independent increments and hence is a subordinator. To see that $\tau$ is $\frac{1}{2}$-stable, fix any $c > 0$, put $\tilde{B}_t = c^{-1} B(c^2 t)$, and define $\tilde{\tau}_r = \inf\{t \ge 0;\ \tilde{B}_t > r\}$. Then
\[ \tau_{cr} = \inf\{t \ge 0;\ B_t > cr\} = c^2 \inf\{t \ge 0;\ \tilde{B}_t > r\} = c^2 \tilde{\tau}_r. \]


By Proposition 15.9 the Lévy measure of $\tau$ has a density of the form $a x^{-3/2}$, $x > 0$, and it remains to identify $a$. Then note that the process
\[ X_t = \exp(uB_t - \tfrac{1}{2} u^2 t), \quad t \ge 0, \]
is a martingale for any $u \in \mathbb{R}$. In particular, $E X_{\tau_r \wedge t} = 1$ for any $r, t \ge 0$, and since clearly $B_{\tau_r} = r$, we get by dominated convergence
\[ E \exp(-\tfrac{1}{2} u^2 \tau_r) = e^{-ur}, \quad u, r \ge 0. \]
Taking $u = \sqrt{2}$ and comparing with Corollary 15.8, we obtain
\[ \sqrt{2} = a \int_0^\infty (1 - e^{-x})\, x^{-3/2}\, dx = 2a \int_0^\infty e^{-x} x^{-1/2}\, dx = 2a\sqrt{\pi}, \]
which shows that $a = (2\pi)^{-1/2}$. $\Box$
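The $\frac{1}{2}$-stable scaling $\tau_{cr} \overset{d}{=} c^2 \tau_r$ can be seen in a crude simulation (a sketch on a discrete grid, not from the text; hitting times are detected late and paths are truncated at a finite horizon, so the medians carry a small upward bias). The reflection principle gives $P\{\tau_r \le t\} = 2(1 - \Phi(r/\sqrt{t}))$, so the median of $\tau_1$ is about 2.2 and that of $\tau_2$ about four times larger.

```python
import math
import random
import statistics

random.seed(7)

def first_passage(r, dt=0.02, horizon=60.0):
    """First grid time at which a simulated Brownian path exceeds r,
    truncated at `horizon` (crossings between grid points are missed,
    so the result slightly overestimates tau_r)."""
    b, t, sd = 0.0, 0.0, math.sqrt(dt)
    while t < horizon:
        t += dt
        b += random.gauss(0.0, sd)
        if b > r:
            return t
    return horizon

reps = 400
m1 = statistics.median(first_passage(1.0) for _ in range(reps))
m2 = statistics.median(first_passage(2.0) for _ in range(reps))
# Median of tau_r is (r / 0.6745)^2, roughly 2.2 and 8.8, and the
# 1/2-stable scaling makes the ratio close to 4.
print(round(m1, 2), round(m2, 2), round(m2 / m1, 2))
```

The heavy $x^{-3/2}$ tail of the Lévy measure is also visible here: a nontrivial fraction of paths fail to cross even at the horizon, reflecting $E\tau_r = \infty$.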

If we add a negative drift to a Brownian motion, the associated maximum process M becomes bounded, and so T = M- 1 terminates by a jump to infinity. For such occasions, it is useful to consider subordinators with possibly infinite jumps. By a generalized subordinator we mean a process of the form Xt yt + oo ·1{t 2:: (} a.s., where Y is an ordinary subordinator and ( is an independent, exponentially distributed random variable. In this case we say that X is obtained from Y by exponential killing. The representation in Theorem 15.4 remains valid in the generalized case, except that v may now have positive mass at oo. The following characterization is needed in Chapter 22.

=

Lemma 15.11 (generalized subordinators) Let X be a nondecreasing and right-continuous process in [0, oo] with Xo = 0, and let :F denote the filtration induced by X. Then X is a generalized subordinator iff P[Xs+t- Xs E ·IFs]

Proof: Writing ( equation

=

= P{Xt

E ·} a.s. on {Xs

inf{t; Xt

P{( > s+t}

=

< oo },

s, t

> 0.

(10)

oo}, we get from (10) the Cauchy

= P{( > s}P{( > t},

s,t ~ 0,

(11)

which shows that ( is exponentially distributed with mean m E (0, oo]. Next define J.Lt = P[Xt E ·IXt < oo], t 2:: 0, and conclude from (10) and (11) that the J.Lt form a semigroup under convolution. By Theorem 8.4 there exists a corresponding process Y with stationary, independent increments. From the right-continuity of X, it follows that Y is continuous in probability. Hence, Y has a version that is a subordinator. Now choose - d ( = ( with (llY, and let X denote the process Y killed at (. Comparing - d with (10), we note that X = X. By Theorem 6.10 we may assume that even X= X a.s., which means that X is a generalized subordinator. The 0 converse assertion is obvious. The next result provides the basic link between Levy processes and triangular arrays. A random vector or its distribution is said to be in-

e

Foundations of Modern Probability

294

finitely divisible if for every n E N there exist some i.i.d. random vectors ~nl, ... , ~nn with I:k ~nk 4 ~. By an i. i. d. array we mean a triangular array of random vectors ~nj, j :::; mn, where the ~nj are i.i.d. for each n and mn-+ oo. Theorem 15.12 (Levy processes andinfinite divisibility) For any random vector ~ in JR.d, these conditions are equivalent:

(i) ξ is infinitely divisible;

(ii) Σ_j ξ_nj →d ξ for some i.i.d. array (ξ_nj);

(iii) ξ =d X_1 for some Lévy process X in R^d.

Under those conditions, L(X) is determined by L(ξ) = L(X_1).
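For a concrete instance of condition (i), here is a numerical illustration that is not part of the text: the Poisson law with mean λ is infinitely divisible, being the n-fold convolution of Poisson laws with mean λ/n. The check below verifies the case n = 2 by exact convolution of truncated probability mass functions.

```python
import math
import numpy as np

def poisson_pmf(lam, n_max):
    """Poisson(lam) pmf on {0, ..., n_max}."""
    k = np.arange(n_max + 1)
    fact = np.array([math.factorial(int(i)) for i in k], dtype=float)
    return np.exp(-lam) * lam**k / fact

lam, n_max = 3.0, 40
half = poisson_pmf(lam / 2, n_max)            # law of one of the two i.i.d. parts
conv = np.convolve(half, half)[: n_max + 1]   # convolution = law of their sum
full = poisson_pmf(lam, n_max)                # target Poisson(lam) law
```

Analytically the two arrays agree exactly on {0, ..., n_max}, since Poisson(λ/2) * Poisson(λ/2) = Poisson(λ); the truncation only ignores negligible tail mass.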

A simple lemma is needed for the proof.

Lemma 15.13 (individual terms) If the ξ_nj are such as in Theorem 15.12 (ii), then ξ_n1 →P 0.

Proof: Let μ and μ_n denote the distributions of ξ and ξ_nj, respectively. Choose r > 0 so small that μ̂ ≠ 0 on [−r, r], and write μ̂ = e^ψ on this interval, where ψ: [−r, r] → C is continuous with ψ(0) = 0. Since the convergence μ̂_n^{m_n} → μ̂ is uniform on bounded intervals, it follows that μ̂_n ≠ 0 on [−r, r] for sufficiently large n. Thus, we may write μ̂_n(u) = e^{ψ_n(u)} for |u| ≤ r, where m_n ψ_n → ψ on [−r, r]. Then ψ_n → 0 on the same interval, and therefore μ̂_n → 1. Now let ε ≤ r^{−1}, and note as in Lemma 5.1 that

2r ∫ (1 − sin(rx)/(rx)) μ_n(dx) ≥ 2r (1 − sin(rε)/(rε)) μ_n{|x| ≥ ε}.

As n → ∞, the left-hand side tends to 0 by dominated convergence, and we get μ_n →w δ_0. □

Proof of Theorem 15.12: Trivially (iii) ⇒ (i) ⇒ (ii). Now let ξ_nj, j ≤ m_n, be an i.i.d. array satisfying (ii), put μ_n = L(ξ_nj), and fix any k ∈ N. By Lemma 15.13 we may assume that k divides each m_n, and write Σ_j ξ_nj = η_n1 + ··· + η_nk, where the η_nj are i.i.d. with distribution μ_n^{*(m_n/k)}. For any u ∈ R^d and r > 0 we have

(P{u'η_n1 > r})^k = P{min_{j≤k} u'η_nj > r} ≤ P{Σ_{j≤k} u'η_nj > kr},

and so the tightness of Σ_j η_nj carries over to the sequence (η_n1). By Proposition 5.21 we may extract a weakly convergent subsequence, say with limiting distribution ν_k. Since Σ_j η_nj →d ξ, it follows by Theorem 5.3 that ξ has distribution ν_k^{*k}. Thus, (ii) ⇒ (i).

Next assume (i), so that L(ξ) = μ = μ_n^{*n} for each n. By Lemma 15.13 we get μ̂_n → 1 uniformly on bounded intervals, and so μ̂ ≠ 0. We may

15. Independent Increments and Infinite Divisibility

295

then write μ̂ = e^ψ and μ̂_n = e^{ψ_n} for some continuous functions ψ and ψ_n with ψ(0) = ψ_n(0) = 0, and we get ψ = n ψ_n for each n. Hence, e^{tψ} is a characteristic function for every t ∈ Q_+, and then also for t ∈ R_+ by Theorem 5.22. By Theorem 6.16 there exists a process X with stationary independent increments such that X_t has characteristic function e^{tψ} for every t. Here X is continuous in probability, and so by Theorem 15.1 it has an rcll version, which is the desired Lévy process. Thus, (i) ⇒ (iii). The last assertion is clear from Corollary 15.8. □

Justified by the one-to-one correspondence between infinitely divisible distributions μ and their characteristics (a, b, ν) or (a, ν), we may write μ = id(a, b, ν) or μ = id(a, ν), respectively. The last result shows that the class of infinitely divisible laws is closed under weak convergence, and we proceed to derive explicit convergence criteria. Then define for each h > 0

a^h = a + ∫_{|x|≤h} x x' ν(dx),  b^h = b + ∫ x 1_{(1,h]}(|x|) ν(dx),

... for any h > 0 with m{±h} = 0. Thus, for distributions μ and μ_n on R, we have μ̄_n →w μ̄ iff ν_n →v ν on R̄ \ {0} and a_n^h → a^h for any h > 0 with ν{±h} = 0. Similarly, μ̄_n →w μ̄ holds for distributions μ and μ_n on R_+ iff ν_n →v ν on (0, ∞] and a_n^h → a^h for all h > 0 with ν{h} = 0. Thus, (ii) follows immediately from Lemma 15.15. To obtain (i) from the same lemma when d = 1, it remains to notice that the conditions b_n^h → b^h and c_n → c are equivalent when ν_n →v ν and ν{±h} = 0, since |x − x(1 + x²)^{−1}| ≲ |x|³.

Turning to the proof of (i) when d > 1, let us first assume that ν_n →v ν on R^d \ {0} and that a_n^h → a^h and b_n^h → b^h for some h > 0 with ν{|x| = h} = 0. To prove μ_n →w μ, it suffices by Corollary 5.5 to show that, for any one-dimensional projection π_u: x ↦ u'x with u ≠ 0, μ_n ∘ π_u^{−1} →w μ ∘ π_u^{−1}. Then fix any k > 0 with ν{|u'x| = k} = 0, and note that μ ∘ π_u^{−1} has the associated characteristics ν^u = ν ∘ π_u^{−1} and

a^{u,k} = u'a^h u + ∫ (u'x)² {1_{(0,k)}(|u'x|) − 1_{(0,h]}(|x|)} ν(dx),

b^{u,k} = u'b^h + ∫ u'x {1_{(1,k)}(|u'x|) − 1_{(1,h]}(|x|)} ν(dx).


Let a_n^{u,k}, b_n^{u,k}, and ν_n^u denote the corresponding characteristics of μ_n ∘ π_u^{−1}. Then ν_n^u →v ν^u on R \ {0}, and furthermore a_n^{u,k} → a^{u,k} and b_n^{u,k} → b^{u,k}. The desired convergence now follows from the one-dimensional result.

Conversely, assume that μ_n →w μ. Then μ_n ∘ π_u^{−1} →w μ ∘ π_u^{−1} for every u ≠ 0, and the one-dimensional result yields ν_n^u →v ν^u on R \ {0} as well as a_n^{u,k} → a^{u,k} and b_n^{u,k} → b^{u,k} for any k > 0 with ν{|u'x| = k} = 0. In particular, the sequence (ν_n K) is bounded for every compact set K ⊂ R^d \ {0}, and so the sequences (u'a_n^h u) and (u'b_n^h) are bounded for any u ≠ 0 and h > 0. It follows easily that (a_n^h) and (b_n^h) are bounded for every h > 0, and therefore all three sequences are relatively compact. Given any subsequence N' ⊂ N, we have ν_n →v ν' along a further subsequence N'' ⊂ N' for some measure ν' satisfying ∫ (|x|² ∧ 1) ν'(dx) < ∞. Fixing any h > 0 with ν'{|x| = h} = 0, we may choose a still further subsequence N''' such that even a_n^h and b_n^h converge toward some limits a' and b'. The direct assertion then yields μ_n →w μ' along N''', where μ' is infinitely divisible with characteristics determined by (a', b', ν'). Since μ' = μ, we get ν' = ν, a' = a^h, and b' = b^h. Thus, the convergence remains valid along the original sequence. □

By a simple approximation, we may now derive explicit criteria for the convergence Σ_j ξ_nj →d ξ in Theorem 15.12. Note that the compound Poisson distribution with characteristic measure μ = L(ξ) is given by id(0, b, μ), where b = E[ξ; |ξ| ≤ 1]. For any array of random vectors ξ_nj, we may introduce an associated compound Poisson array, consisting of rowwise independent compound Poisson random vectors ξ̃_nj with characteristic measures L(ξ_nj).
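As a sanity check on the id(a, b, ν) notation, here is an added illustration (the helper name id_cf is made up, and the truncation 1{|x| ≤ 1} is an assumption, as in the representation (7) referred to in the exercises): for ν = λδ_1 and b = E[ξ; |ξ| ≤ 1] = λ, the law id(0, λ, λδ_1) should be exactly Poisson(λ).

```python
import math
import numpy as np

def id_cf(u, a, b, atoms):
    """Characteristic function of id(a, b, nu) for a finite Levy measure
    nu given as a dict {atom: weight}, with truncation at |x| <= 1."""
    psi = 1j * u * b - 0.5 * a * u**2
    for x, w in atoms.items():
        psi = psi + w * (np.exp(1j * u * x) - 1 - 1j * u * x * (abs(x) <= 1))
    return np.exp(psi)

lam = 2.0
u = np.linspace(-4.0, 4.0, 81)
cf = id_cf(u, 0.0, lam, {1.0: lam})     # id(0, lam, lam * delta_1)
k = np.arange(60)
pmf = np.exp(-lam) * lam**k / np.array([math.factorial(int(i)) for i in k])
poisson_cf = np.exp(1j * np.outer(u, k)) @ pmf   # E exp(iuN), truncated sum
```

Here ψ(u) reduces to λ(e^{iu} − 1), the Poisson characteristic exponent, so the two arrays should agree up to tiny truncation error.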

Corollary 15.16 (i.i.d. arrays) Consider in R^d an i.i.d. array (ξ_nj) and an associated compound Poisson array (ξ̃_nj), and let ξ be id(a, b, ν). Then Σ_j ξ_nj →d ξ iff Σ_j ξ̃_nj →d ξ. For any h > 0 with ν{|x| = h} = 0, it is also equivalent that

(i) m_n L(ξ_n1) →v ν on R^d \ {0};

(ii) m_n E[ξ_n1 ξ'_n1; |ξ_n1| ≤ h] → a^h;

(iii) m_n E[ξ_n1; |ξ_n1| ≤ h] → b^h.

Proof: Let μ = L(ξ) and write μ̂ = e^ψ, where ψ is continuous with ψ(0) = 0. If μ_n^{*m_n} →w μ, then μ̂_n^{m_n} → μ̂ uniformly on compacts. Thus, on any bounded set B we may write μ̂_n = e^{ψ_n} for large enough n, where the ψ_n are continuous with m_n ψ_n → ψ uniformly on B. Hence, m_n(e^{ψ_n} − 1) → ψ, and so μ̃_n^{*m_n} →w μ. The proof in the other direction is similar. Since μ̃_n^{*m_n} is id(0, b_n, m_n μ_n) with b_n = m_n ∫_{|x|≤1} x μ_n(dx), the last assertion follows by Theorem 15.14. □

The weak convergence of infinitely divisible laws extends to a pathwise approximation property for the corresponding Lévy processes.
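The closeness of row sums to their compound Poisson counterparts can be seen numerically through characteristic functions: for small summands, φ^m and exp(m(φ − 1)) nearly coincide. The sketch below is illustrative only; the uniform summand distribution and the sizes m, h are arbitrary assumptions of mine, not taken from the text.

```python
import numpy as np

m, h = 10_000, 0.01                    # many small i.i.d. terms, as in a null array
u = np.linspace(-3.0, 3.0, 601)
phi = np.sinc(u * h / np.pi)           # ch.f. of Uniform[-h, h]: sin(uh)/(uh)
row_sum_cf = phi**m                    # ch.f. of xi_1 + ... + xi_m
compound_cf = np.exp(m * (phi - 1.0))  # ch.f. of the associated compound Poisson law
gap = float(np.max(np.abs(row_sum_cf - compound_cf)))
```

Since 1 − φ(u) is of order (uh)², the gap is of order m(1 − φ)², which stays small here because h² is of order 1/m.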

298

Foundations of Modern Probability

Theorem 15.17 (approximation of Lévy processes, Skorohod) Let X, X^1, X^2, ... be Lévy processes in R^d with X_1^n →d X_1. Then there exist some processes X̃^n =d X^n such that (X̃^n − X)_t* →P 0 for all t ≥ 0.

Before proving the general result, we consider two special cases.

Lemma 15.18 (compound Poisson case) The conclusion of Theorem 15.17 holds when X, X^1, X^2, ... are compound Poisson with characteristic measures ν, ν_1, ν_2, ... satisfying ν_n →w ν.

Proof: Allowing positive mass at the origin, we may assume that ν and the ν_n have the same total mass, which may then be reduced to 1 through a suitable scaling. If ξ_1, ξ_2, ... and ξ_1^n, ξ_2^n, ... are associated i.i.d. sequences, then (ξ_1^n, ξ_2^n, ...) →d (ξ_1, ξ_2, ...) by Theorem 4.29, and by Theorem 4.30 we may assume that the convergence holds a.s. Letting N be an independent unit-rate Poisson process, and defining X_t = Σ_{j≤N_t} ξ_j and X_t^n = Σ_{j≤N_t} ξ_j^n, it follows that (X^n − X)_t* → 0 a.s. for each t ≥ 0. □

Lemma 15.19 (case of small jumps) The conclusion of Theorem 15.17 holds when EX^n = 0 and 1 ≥ (ΔX^n)_1* →P 0.

Proof: Since (ΔX^n)_1* →P 0, we may choose some constants h_n → 0 with h_n^{−1} = m_n ∈ N such that w(X^n, 1, h_n) →P 0. By the stationarity of the increments, it follows that w(X^n, t, h_n) →P 0 for all t ≥ 0. Next, Theorem 15.14 shows that X is centered Gaussian. Thus, there exist as in Theorem 14.20 some processes Y^n =d (X^n_{[m_n t] h_n}) with (Y^n − X)_t* →P 0 for all t ≥ 0. By Corollary 6.11 we may further choose some processes X̃^n =d X^n with Y_t^n = X̃^n_{[m_n t] h_n} a.s. Then, as n → ∞ for fixed t ≥ 0,

E[(X̃^n − X)_t* ∧ 1] ≤ E[(Y^n − X)_t* ∧ 1] + E[w(X^n, t, h_n) ∧ 1] → 0. □

Proof of Theorem 15.17: The asserted convergence is clearly equivalent to ρ(X^n, X) → 0, where ρ denotes the metric

ρ(X, Y) = ∫_0^∞ e^{−t} E[(X − Y)_t* ∧ 1] dt.

For any h > 0 we may write X = L^h + M^h + J^h and X^n = L^{n,h} + M^{n,h} + J^{n,h} with L_t^h = b^h t and L_t^{n,h} = b_n^h t, where M^h and M^{n,h} are martingales containing the Gaussian components and all centered jumps of size ≤ h, and the processes J^h and J^{n,h} are formed by all remaining jumps. Write B for the Gaussian component of X, and note that ρ(M^h, B) → 0 as h → 0 by Proposition 7.16. For any h > 0 with ν{|x| = h} = 0, it is clear from Theorem 15.14 that b_n^h → b^h and ν_n^h →w ν^h, where ν^h and ν_n^h denote the restrictions of ν and ν_n, respectively, to the set {|x| > h}. The same theorem yields a_n^h → a as n → ∞ and then h → 0, and so under those conditions M_1^{n,h} →d B_1.


Now fix any ε > 0. By Lemma 15.19 there exist some constants h, r > 0 and processes M̃^{n,h} =d M^{n,h} such that ρ(M^h, B) ≤ ε and ρ(M̃^{n,h}, B) ≤ ε for all n > r. Furthermore, if ν{|x| = h} = 0, there exist by Lemma 15.18 some number r' ≥ r and processes J̃^{n,h} =d J^{n,h} independent of M̃^{n,h} such that ρ(J^h, J̃^{n,h}) ≤ ε for all n > r'. We may finally choose r'' ≥ r' so large that ρ(L^h, L^{n,h}) ≤ ε for all n > r''. The processes X̃^n = L^{n,h} + M̃^{n,h} + J̃^{n,h} =d X^n then satisfy ρ(X, X̃^n) ≤ 4ε for all n > r''. □

Combining Theorem 15.17 with Corollary 15.16, we get a similar approximation theorem for random walks, which extends the result for Gaussian limits in Theorem 14.20. A slightly weaker result is obtained by different methods in Theorem 16.14.

Corollary 15.20 (approximation of random walks) Consider in R^d a Lévy process X and some random walks S^1, S^2, ... such that S^n_{m_n} →d X_1 for some integers m_n → ∞, and let N be an independent unit-rate Poisson process. Then there exist some processes X̃^n =d (S^n ∘ N_{m_n t}) such that (X̃^n − X)_t* →P 0 for all t ≥ 0.
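The Poissonization S^n ∘ N_{m_n t} appearing in Corollary 15.20 is straightforward to simulate. The sketch below is illustrative only; the Gaussian steps, the value m_n = 1000, and the helper name poissonized_walk are all my own choices, not from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

def poissonized_walk(steps, m_n, t_grid, rng):
    """Evaluate t -> S(N(m_n t)), where S is the random walk with the
    given steps and N is an independent unit-rate Poisson process."""
    S = np.concatenate([[0.0], np.cumsum(steps)])
    counts = rng.poisson(m_n * np.diff(t_grid, prepend=0.0))
    N = np.minimum(np.cumsum(counts), len(steps))   # Poisson clock, capped at the data
    return S[N]

m_n = 1000
steps = rng.normal(size=10 * m_n) / np.sqrt(m_n)    # so S_{m_n} is roughly standard normal
t = np.linspace(0.0, 1.0, 101)
X = poissonized_walk(steps, m_n, t, rng)
```

With this scaling the time-changed walk approximates a standard Brownian motion on [0, 1], in line with the corollary.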

In particular, we may use this result to extend the first two arcsine laws in Theorem 13.16 to symmetric Lévy processes.

Theorem 15.21 (arcsine laws) Let X be a symmetric Lévy process in R with X_1 ≠ 0 a.s. Then these random variables are arcsine distributed:

τ_1 = λ{t ≤ 1; X_t > 0},  τ_2 = inf{t ≥ 0; X_t ∨ X_{t−} = sup_{s≤1} X_s}.  (16)

The purpose of the condition X_1 ≠ 0 a.s. is to exclude the degenerate case of pure jump-type processes.
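A quick Monte Carlo illustration of the first arcsine law for Brownian motion, the simplest symmetric Lévy process (an added sketch; the path counts and discretization are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)

def occupation_fractions(n_steps, n_paths, rng):
    """tau_1-analogues: fraction of grid times in (0, 1] at which a
    random-walk approximation of Brownian motion is positive."""
    steps = rng.normal(size=(n_paths, n_steps)) / np.sqrt(n_steps)
    paths = np.cumsum(steps, axis=1)
    return (paths > 0.0).mean(axis=1)

tau1 = occupation_fractions(200, 2000, rng)
# Arcsine cdf: P{tau_1 <= x} = (2/pi) arcsin(sqrt(x)); mean 1/2, with most
# mass concentrated near the endpoints 0 and 1.
```

The histogram of tau1 is U-shaped: paths tend to spend almost all of their time on one side of 0, which is exactly what the arcsine density (most mass near 0 and 1) predicts.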

Lemma 15.22 (diffuseness, Doeblin) A measure μ = id(a, b, ν) in R^d is diffuse iff a ≠ 0 or νR^d = ∞.

Proof: If a = 0 and νR^d < ∞, then μ is compound Poisson apart from a shift, and so it is clearly not diffuse. When either condition fails, then it does so for at least one coordinate projection, and we may take d = 1. If a > 0, the diffuseness is obvious by Lemma 1.28. Next assume that ν is unbounded, say with ν(0, ∞) = ∞. For each n ∈ N we may then write ν = ν_n + ν'_n, where ν'_n is supported by (0, n^{−1}) and has total mass log 2. For μ we get a corresponding decomposition μ = μ_n * μ'_n, where μ'_n is compound Poisson with Lévy measure ν'_n and μ'_n{0} = ½. For any x ∈ R and ε > 0 we get

μ{x} ≤ μ_n{x} μ'_n{0} + μ_n[x − ε, x) μ'_n(0, ε] + μ'_n(ε, ∞) ≤ ½ μ_n[x − ε, x] + μ'_n(ε, ∞).

Letting n → ∞ and then ε → 0, and noting that μ'_n →w δ_0 and μ_n →w μ, we get μ{x} ≤ ½ μ{x} by Theorem 4.25, and so μ{x} = 0. □


Proof of Theorem 15.21: Introduce the random walk S_k^n = X_{k/n}, let N be an independent unit-rate Poisson process, and define X_t^n = S^n ∘ N_{nt}. By Corollary 15.20 there exist some processes X̃^n =d X^n with (X̃^n − X)_1* →P 0. Define τ̃_1^n and τ̃_2^n as in (16) in terms of X̃^n, and conclude from Lemmas 14.12 and 15.22 that τ̃_i^n →P τ_i for i = 1, 2. Now define

σ_1^n = N_n^{−1} Σ_{k≤N_n} 1{S_k^n > 0},  σ_2^n = N_n^{−1} min{k; S_k^n = max_{j≤N_n} S_j^n}.

Since t^{−1} N_t → 1 a.s. by the law of large numbers, we have sup_t ... h > 0 with ν{|x| = h} = 0. By Theorem 15.14 (i) and Lemma 15.24 we have


Σ_j ξ_nj →d ξ iff

Σ_j μ_nj →v ν on R \ {0},  Σ_j E[ξ_nj²; |ξ_nj| ≤ h] → a + ∫_{|x|≤h} x² ν(dx),

whereas Σ_j ξ_nj² →d η iff

Σ_j μ_nj ∘ p_2^{−1} →v ν ∘ p_2^{−1} on (0, ∞],  Σ_j E[ξ_nj²; ξ_nj² ≤ h²] → a + ∫_{y≤h²} y (ν ∘ p_2^{−1})(dy).

The two sets of conditions are equivalent by Lemma 1.22. □

The limit problem for general null arrays is more delicate, since a compound Poisson approximation as in Corollary 15.16 or Lemma 15.24 applies only after a careful centering, as specified by the following key result.

Theorem 15.25 (compound Poisson approximation) Let (ξ_nj) be a null array of random vectors in R^d, and fix any h > 0. Define η_nj = ξ_nj − b_nj, where b_nj = E[ξ_nj; |ξ_nj| ≤ h], and let (η̃_nj) be an associated compound Poisson array. Then

Σ_j ξ_nj →d ξ  iff  Σ_j η̃_nj + Σ_j b_nj →d ξ.  (17)

A technical estimate is needed for the proof.

Lemma 15.26 (uniform summability) Let the random vectors η_nj = ξ_nj − b_nj in Theorem 15.25 have characteristic functions φ_nj. Then either condition in (17) implies

limsup_{n→∞} Σ_j |1 − φ_nj(u)| < ∞,  u ∈ R^d.

Proof: By the definitions of b_nj, η_nj, and φ_nj, we have

1 − φ_nj(u) = E[1 − e^{iu'η_nj} + iu'η_nj 1{|ξ_nj| ≤ h}] − iu'b_nj P{|ξ_nj| > h}.

Putting

a_n = Σ_j E[η_nj η'_nj; |ξ_nj| ≤ h],  p_n = Σ_j P{|ξ_nj| > h},

and using Lemma 5.14, we get

Σ_j |1 − φ_nj(u)| ≲ ½ u'a_n u + (2 + |u|) p_n.

Hence, it is enough to show that (u'a_n u) and (p_n) are bounded. Assuming the second condition in (17), the desired boundedness follows easily from Theorem 15.14, together with the fact that max_j |b_nj| → 0. If instead Σ_j ξ_nj →d ξ, we may introduce an independent copy (ξ'_nj) of the


array (ξ_nj) and apply Theorem 15.14 and Lemma 15.24 to the symmetric random variables ζ_nj = u'ξ_nj − u'ξ'_nj. For any h' > 0, this gives

limsup_{n→∞} Σ_j P{|ζ_nj| > h'} < ∞,  (18)

limsup_{n→∞} Σ_j E[ζ_nj²; |ζ_nj| ≤ h'] < ∞.  (19)

The boundedness of p_n follows from (18) and Lemma 4.19. Next we note that (19) remains true with the condition |ζ_nj| ≤ h' replaced by |ξ_nj| ∨ |ξ'_nj| ≤ h. Furthermore, by the independence of ξ_nj and ξ'_nj,

½ Σ_j E[ζ_nj²; |ξ_nj| ∨ |ξ'_nj| ≤ h]
  = Σ_j E[(u'η_nj)²; |ξ_nj| ≤ h] P{|ξ_nj| ≤ h} − Σ_j (E[u'η_nj; |ξ_nj| ≤ h])²
  ≥ u'a_n u min_j P{|ξ_nj| ≤ h} − Σ_j (u'b_nj P{|ξ_nj| > h})².

Here the last sum is bounded by p_n max_j (u'b_nj)² → 0, and the minimum on the right tends to 1. The boundedness of (u'a_n u) now follows by (19). □

Proof of Theorem 15.25: By Lemma 5.13 it is enough to show that

Σ_j |...| → 0 ... for every x > 0 with ν{x} = 0. In particular, for such an x,

P{α_n^+ ≤ x} = P{η_n(x, ∞) = 0} → P{η(x, ∞) = 0} = P{α^+ ≤ x},

and so α_n^+ →d α^+. Similarly, α_n^− →d α^−, which proves (iii). □

Exercises

1. Show that a Lévy process X in R is a subordinator iff X_1 ≥ 0 a.s.

2. Show that the Cauchy distribution μ(dx) = π^{−1}(1 + x²)^{−1} dx is strictly 1-stable, and determine the corresponding Lévy measure ν. (Hint: Check that μ̂(u) = e^{−|u|}. By symmetry, ν(dx) = c x^{−2} dx for some c > 0, and it remains to determine c.)

3. Let X be a weakly p-stable Lévy process. If p ≠ 1, show that the process X_t − ct is strictly p-stable for a suitable constant c. Note that the centering fails for p = 1.


4. Extend Proposition 15.23 to null arrays of spherically symmetric random vectors in R^d.

5. Show by an example that Theorem 15.25 fails without the centering at truncated means. (Hint: Without the centering, condition (ii) of Theorem 15.28 becomes Σ_j E[ξ_nj ξ'_nj; |ξ_nj| ≤ h] → a^h.)

6. Deduce Theorems 5.7 and 5.11 from Theorem 15.14 and Lemma 15.24.

7. For a Lévy process X of effective dimension d ≥ 3, show that |X_t| → ∞ a.s. as t → ∞. (Hint: Define τ = inf{t; |X_t| > 1}, and iterate to form a random walk (S_n). Show that the latter has the same effective dimension as X, and use Theorem 9.8.)

8. Let X be a Lévy process in R, and fix any p ∈ (0, 2). Show that t^{−1/p} X_t converges a.s. iff E|X_1|^p < ∞ and either p ≤ 1 or EX_1 = 0. (Hint: Define a random walk (S_n) as before, show that S_1 satisfies the same moment condition as X_1, and apply Theorem 4.23.)

9. If ξ is id(a, b, ν) and p > 0, show that E|ξ|^p < ∞ iff ∫_{|x|>1} |x|^p ν(dx) < ∞. (Hint: If ν has bounded support, then E|ξ|^p < ∞ for all p. It is then enough to consider compound Poisson distributions, for which the result is elementary.)

10. Show by a direct argument that a Z_+-valued random variable ξ is infinitely divisible (on Z_+) iff −log E s^ξ = Σ_k (1 − s^k) ν_k, s ∈ (0, 1], for some unique, bounded measure ν = (ν_k) on N. (Hint: Assuming L(ξ) = μ_n^{*n}, use the inequality 1 − x ≤ e^{−x} to show that the sequence (n μ_n) is tight on N. Then n μ_n →v ν along a subsequence for some bounded measure ν on N. Finally note that −log(1 − x) ~ x as x → 0. For the uniqueness, take differences and use the uniqueness theorem for power series.)

11. Show by a direct argument that a random variable ξ ≥ 0 is infinitely divisible iff −log E e^{−uξ} = ua + ∫ (1 − e^{−ux}) ν(dx), u ≥ 0, for some unique constant a ≥ 0 and measure ν on (0, ∞) with ∫ (x ∧ 1) ν(dx) < ∞. (Hint: If L(ξ) = μ_n^{*n}, note that the measures χ_n(dx) = n(1 − e^{−x}) μ_n(dx) are tight on R_+. Then χ_n →w χ along a subsequence, and we may write χ(dx) = a δ_0(dx) + (1 − e^{−x}) ν(dx). The desired representation now follows as before. To get the uniqueness, take differences and use the uniqueness theorem for Laplace transforms.)

12. Show by a direct argument that a random variable ξ is infinitely divisible iff ψ_u = log E e^{iuξ} exists and is given by (7) for some unique constants a ≥ 0 and b and measure ν on R \ {0} with ∫ (x² ∧ 1) ν(dx) < ∞. (Hint: Proceed as in Lemma 15.15.)


13. Given a semigroup of infinitely divisible distributions μ_t, show that there exists a process X on R_+ with stationary, independent increments and L(X_t) = μ_t for all t ≥ 0. Starting from a suitable Poisson process and an independent Brownian motion, construct a Lévy process Y with the same property. Conclude that X has a version with rcll paths and a similar representation as Y. (Hint: Use Lemma 3.24 and Theorems 6.10 and 6.16.)

Chapter 16

Convergence of Random Processes, Measures, and Sets

Relative compactness and tightness; uniform topology on C(K, S); Skorohod's J1-topology; equicontinuity and tightness; convergence of random measures; superposition and thinning; exchangeable sequences and processes; simple point processes and random closed sets

The basic notions of weak or distributional convergence were introduced in Chapter 4, and in Chapter 5 we studied the special case of distributions on Euclidean spaces. The purpose of this chapter is to develop the general weak convergence theory into a powerful tool that applies to a wide range of set, measure, and function spaces. In particular, some functional limit theorems derived in the last two chapters by cumbersome embedding and approximation techniques will then be accessible by straightforward compactness arguments. The key result is Prohorov's theorem, which gives the basic connection between tightness and relative distributional compactness. This result will enable us to convert some classical compactness criteria into convenient probabilistic versions. In particular, we shall see how the Arzelà–Ascoli theorem yields a corresponding criterion for distributional compactness of continuous processes. Similarly, an optional equicontinuity condition will be shown to guarantee the appropriate compactness for processes that are right-continuous with left-hand limits (rcll). We shall also derive some general criteria for convergence in distribution of random measures and sets, with special attention to the point process case. The general criteria will be applied to some interesting concrete situations. In addition to some already familiar results from Chapters 14 and 13, we shall obtain a general functional limit theorem for sampling from finite populations and derive convergence criteria for superpositions and thinnings of point processes. Further applications appear in subsequent chapters, such as a general approximation result for Markov chains in Chapter 19 and a method for constructing weak solutions to SDEs in Chapter 21.

Beginning with the case of continuous processes, let us fix two metric spaces (K, d) and (S, ρ), where K is compact and S is separable and complete, and consider the space C(K, S) of continuous functions from K to


S, endowed with the uniform metric ρ̂(x, y) = sup_{t∈K} ρ(x_t, y_t). For each t ∈ K we may introduce the evaluation map π_t: x ↦ x_t from C(K, S) to S. The following result shows that the random elements in C(K, S) are precisely the continuous S-valued processes on K.

Lemma 16.1 (Borel sets and evaluations) B(C(K, S)) = σ{π_t; t ∈ K}.

Proof: The maps π_t are continuous, hence Borel measurable, and so the generated σ-field C is contained in B(C(K, S)). To prove the reverse relation, we need to show that any open subset G ⊂ C(K, S) lies in C. From the Arzelà–Ascoli Theorem A2.1 we note that C(K, S) is σ-compact and hence separable. Thus, G is a countable union of open balls B_{x,r} = {y ∈ C(K, S); ρ̂(x, y) < r}, and it suffices to prove that the latter lie in C. But this is clear since, for any countable dense set D ⊂ K,

B_{x,r} = {y ∈ C(K, S); sup_{t∈D} ρ(x_t, y_t) < r} ∈ C. □

If X and X^n are random processes on K, we write X^n →fd X for convergence of the finite-dimensional distributions, in the sense that

(X^n_{t_1}, ..., X^n_{t_k}) →d (X_{t_1}, ..., X_{t_k}),  t_1, ..., t_k ∈ K, k ∈ N.  (1)

Though by Proposition 3.2 the distribution of a random process is determined by the family of finite-dimensional distributions, condition (1) is insufficient in general for the convergence X^n →d X in C(K, S). This is already clear when the processes are nonrandom, since pointwise convergence of a sequence of functions need not be uniform. To overcome this difficulty, we may add a compactness condition. Recall that a sequence of random elements ξ_1, ξ_2, ... is said to be relatively compact in distribution if every subsequence has a further subsequence that converges in distribution.

Lemma 16.2 (weak convergence via compactness) Let X, X^1, X^2, ... be random elements in C(K, S). Then X^n →d X iff X^n →fd X and (X^n) is relatively compact in distribution.

Proof: If X^n →d X, then X^n →fd X by Theorem 4.27, and (X^n) is trivially relatively compact in distribution. Now assume instead that (X^n) satisfies the two conditions. If X^n →d X fails, we may choose a bounded continuous function f: C(K, S) → R and an ε > 0 such that |Ef(X^n) − Ef(X)| > ε along some subsequence N' ⊂ N. By the relative compactness we may choose a further subsequence N'' and a process Y such that X^n →d Y along N''. But then X^n →fd Y along N'', and since also X^n →fd X, Proposition 3.2 yields X =d Y. Thus, X^n →d X along N'', and so Ef(X^n) → Ef(X) along the same sequence, a contradiction. We conclude that X^n →d X. □

The last result shows the importance of finding tractable conditions for a random sequence ξ_1, ξ_2, ... in a metric space S to be relatively compact.


Generalizing a notion from Chapter 4, we say that (ξ_n) is tight if

sup_K liminf_{n→∞} P{ξ_n ∈ K} = 1,  (2)

where the supremum extends over all compact subsets K ⊂ S. We may now state the key result of weak convergence theory, the equivalence between tightness and relative compactness for random elements in sufficiently regular metric spaces. A version for Euclidean spaces was obtained in Proposition 5.21.

Theorem 16.3 (tightness and relative compactness, Prohorov) For any sequence of random elements ξ_1, ξ_2, ... in a metric space S, tightness implies relative compactness in distribution, and the two conditions are equivalent when S is separable and complete.

In particular, we note that when S is separable and complete, a single random element ξ in S is tight, in the sense that sup_K P{ξ ∈ K} = 1. In that case we may clearly replace the "lim inf" in (2) by "inf." For the proof of Theorem 16.3 we need a simple lemma. Recall from Lemma 1.6 that a random element in a subspace of a metric space S may also be regarded as a random element in S.

Lemma 16.4 (preservation of tightness) Tightness is preserved by continuous mappings. In particular, if (ξ_n) is a tight sequence of random elements in a subspace A of some metric space S, then (ξ_n) remains tight when regarded as a sequence in S.

Proof: Compactness is preserved by continuous mappings. This applies in particular to the natural embedding I: A → S. □

Proof of Theorem 16.3 (Varadarajan): For S = R^d the result was proved in Proposition 5.21. Turning to the case when S = R^∞, consider a tight sequence of random elements ξ^n = (ξ^n_1, ξ^n_2, ...) in R^∞. Writing η^n_k = (ξ^n_1, ..., ξ^n_k), we conclude from Lemma 16.4 that the sequence (η^n_k; n ∈ N) is tight in R^k for each k ∈ N. Given any subsequence N' ⊂ N, we may then use a diagonal argument to extract a further subsequence N'' such that η^n_k →d some η_k as n → ∞ along N'' for fixed k ∈ N. The sequence (L(η_k)) is projective by the continuity of the coordinate projections, and so by Theorem 6.14 there exists a random sequence ξ = (ξ_1, ξ_2, ...) such that (ξ_1, ..., ξ_k) =d η_k for each k. But then ξ^n →fd ξ along N'', and so Theorem 4.29 yields ξ^n →d ξ along the same sequence.

Next assume that S ⊂ R^∞. If (ξ_n) is tight in S, then by Lemma 16.4 it remains tight as a sequence in R^∞. Hence, for any sequence N' ⊂ N there exist a further subsequence N'' and some random element ξ such that ξ_n →d ξ in R^∞ along N''. To show that the convergence remains valid in S, it suffices by Lemma 4.26 to verify that ξ ∈ S a.s. Then choose some compact sets K_m ⊂ S with liminf_n P{ξ_n ∈ K_m} ≥ 1 − 2^{−m} for each m ∈ N.


Since the K_m remain closed in R^∞, Theorem 4.25 yields

P{ξ ∈ K_m} ≥ limsup_{n∈N''} P{ξ_n ∈ K_m} ≥ liminf_{n→∞} P{ξ_n ∈ K_m} ≥ 1 − 2^{−m},

and so ξ ∈ ∪_m K_m ⊂ S a.s.

Now assume that S is σ-compact. In particular, it is then separable and therefore homeomorphic to a subset A ⊂ R^∞. By Lemma 16.4 the tightness of (ξ_n) carries over to the image sequence in A, and by Lemma 4.26 the possible relative compactness of the latter implies the same property for (ξ_n). This reduces the discussion to the previous case.

Now turn to the general case. If (ξ_n) is tight, there exist some compact sets K_m ⊂ S with liminf_n P{ξ_n ∈ K_m} ≥ 1 − 2^{−m}. In particular, P{ξ_n ∈ A} → 1, where A = ∪_m K_m, and so we may choose some random elements η_n in A with P{ξ_n = η_n} → 1. Here (η_n) is again tight, even as a sequence in A, and since A is σ-compact, the previous argument shows that (η_n) is relatively compact as a sequence in A. By Lemma 4.26 it remains relatively compact in S, and by Theorem 4.28 the relative compactness carries over to (ξ_n).

To prove the converse assertion, let S be separable and complete, and assume that (ξ_n) is relatively compact. For any r > 0 we may cover S by some open balls B_1, B_2, ... of radius r. Writing G_k = B_1 ∪ ··· ∪ B_k, we claim that

lim_{k→∞} inf_n P{ξ_n ∈ G_k} = 1.  (3)

Indeed, we may otherwise choose some integers n_k ↑ ∞ with sup_k P{ξ_{n_k} ∈ G_k} = c < 1. By the relative compactness we have ξ_{n_k} →d ξ along a subsequence N' ⊂ N for a suitable ξ, and so

P{ξ ∈ G_m} ≤ liminf_{k∈N'} P{ξ_{n_k} ∈ G_m} ≤ c < 1,  m ∈ N,

which leads as m → ∞ to the absurdity 1 < 1. Thus, (3) must be true.

Now take r = m^{−1} and write G^m_k for the corresponding sets G_k. For any ε > 0 there exist by (3) some k_1, k_2, ... ∈ N with

inf_n P{ξ_n ∈ G^m_{k_m}} ≥ 1 − ε 2^{−m},  m ∈ N.

Writing A = ∩_m G^m_{k_m}, we get inf_n P{ξ_n ∈ A} ≥ 1 − ε. Also, note that Ā is complete and totally bounded, hence compact. Thus, (ξ_n) is tight. □

In order to apply the last theorem, we need convenient criteria for tightness. Beginning with the space C(K, S), we may convert the classical Arzelà–Ascoli compactness criterion into a condition for tightness. Then introduce the modulus of continuity

w(x, h) = sup{ρ(x_s, x_t); d(s, t) ≤ h},  x ∈ C(K, S), h > 0.

The function w(x, h) is clearly continuous for fixed h > 0 and hence a measurable function of x.
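On a finite grid, w(x, h) can be computed directly for real-valued paths, using the fact that sup|x_s − x_t| over a time window equals the window's maximum minus its minimum. The helper below is an added illustration, not from the text, and only approximates the supremum by restricting to grid points.

```python
import numpy as np

def modulus_of_continuity(x, t, h):
    """Grid approximation of w(x, h) = sup{|x_s - x_t|; |s - t| <= h}
    for a real path x sampled at increasing times t."""
    w = 0.0
    for i in range(len(t)):
        j = np.searchsorted(t, t[i] + h, side="right")  # last index with t_j <= t_i + h
        if j > i + 1:
            window = x[i:j]
            w = max(w, float(window.max() - window.min()))
    return w
```

For metric-space-valued paths the max-minus-min shortcut no longer applies and one would compare all pairs within each window instead; monotonicity in h holds in either case.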


Theorem 16.5 (tightness in C(K, S), Prohorov) For any metric spaces K and S, where K is compact and S is separable and complete, let X, X^1, X^2, ... be random elements in C(K, S). Then X^n →d X iff X^n →fd X and

lim_{h→0} limsup_{n→∞} E[w(X^n, h) ∧ 1] = 0.  (4)

Proof: Since C(K, S) is separable and complete, Theorem 16.3 shows that tightness and relative compactness are equivalent for (X^n). By Lemma 16.2 it is then enough to show that, under the condition X^n →fd X, the tightness of (X^n) is equivalent to (4).

First let (X^n) be tight. For any ε > 0 we may then choose a compact set B ⊂ C(K, S) such that limsup_n P{X^n ∈ B^c} < ε. By the Arzelà–Ascoli Theorem A2.1 we may next choose h > 0 so small that w(x, h) ≤ ε for all x ∈ B. But then limsup_n P{w(X^n, h) > ε} < ε, and (4) follows since ε was arbitrary.

Next assume that (4) holds and X^n →fd X. Since each X^n is continuous, w(X^n, h) → 0 a.s. as h → 0 for fixed n, so the "limsup" in (4) may be replaced by "sup." For any ε > 0 we may then choose h_1, h_2, ... > 0 so small that

sup_n P{w(X^n, h_m) > m^{−1}} ≤ ε 2^{−m},  m ∈ N.  (5)

Letting t_1, t_2, ... be dense in K, we may further choose some compact sets C_1, C_2, ... ⊂ S such that

sup_n P{X^n_{t_m} ∉ C_m} ≤ ε 2^{−m},  m ∈ N.  (6)

Now define

B = {x ∈ C(K, S); x_{t_m} ∈ C_m, w(x, h_m) ≤ m^{−1}, m ∈ N}.

Then B is compact by the Arzelà–Ascoli Theorem A2.1, and from (5) and (6) we get sup_n P{X^n ∈ B^c} ≤ 2ε. Thus, (X^n) is tight. □

One often needs to replace the compact parameter space K by some more general index set T. Here we assume T to be locally compact, second-countable, and Hausdorff (abbreviated as lcscH) and endow the space C(T, S) of continuous functions from T to S with the topology of uniform convergence on compacts. As before, the Borel σ-field in C(T, S) is generated by the evaluation maps π_t, and so the random elements in C(T, S) are precisely the continuous processes on T taking values in S. The following result characterizes convergence in distribution of such processes.


Proposition 16.6 (locally compact parameter space) Let X, X^1, X^2, ... be random elements in C(T, S), where S is a metric space and T is lcscH. Then X^n →d X iff convergence holds for the restrictions to any compact subset K ⊂ T.

Proof: The necessity is obvious from Theorem 4.27, since the restriction map π_K: C(T, S) → C(K, S) is continuous for any compact set K ⊂ T. To prove the sufficiency, we may choose some compact sets K_1 ⊂ K_2 ⊂ ··· ⊂ T with K_i° ↑ T, and let X_i, X_i^1, X_i^2, ... denote the restrictions of the processes X, X^1, X^2, ... to K_i. By hypothesis we have X_i^n →d X_i for every i, and so Theorem 4.29 yields (X_1^n, X_2^n, ...) →d (X_1, X_2, ...). Now π = (π_{K_1}, π_{K_2}, ...) is a homeomorphism from C(T, S) onto its range in ×_i C(K_i, S), and so X^n →d X by Lemma 4.26 and Theorem 4.27. □

For a simple illustration, we may prove a version of Donsker's Theorem 14.9. Since Theorem 16.5 applies only to processes with continuous paths, we need to replace the original step processes by their linearly interpolated versions

X_t^n = n^{−1/2} {Σ_{k≤nt} ξ_k + (nt − [nt]) ξ_{[nt]+1}},  t ≥ 0, n ∈ N.  (7)

Corollary 16.7 (functional central limit theorem, Donsker) Let ξ_1, ξ_2, ... be i.i.d. random variables with mean 0 and variance 1, define X^1, X^2, ... by (7), and let B denote a Brownian motion on R_+. Then X^n →d B in C(R_+).
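Display (7) is easy to implement. The sketch below is an added illustration (all names and the choice of ±1 steps are mine); it evaluates the interpolated, rescaled walk at arbitrary times.

```python
import numpy as np

def interpolated_walk(xi, t):
    """X^n_t = n^{-1/2} (S_[nt] + (nt - [nt]) * xi_{[nt]+1}),
    the linear interpolation of display (7), for t in [0, 1]."""
    n = len(xi)
    S = np.concatenate([[0.0], np.cumsum(xi)])
    nt = np.asarray(t, dtype=float) * n
    k = np.minimum(nt.astype(int), n - 1)   # [nt], clipped so xi_{[nt]+1} exists
    return (S[k] + (nt - k) * xi[k]) / np.sqrt(n)

rng = np.random.default_rng(3)
xi = rng.choice([-1.0, 1.0], size=1000)
t = np.linspace(0.0, 1.0, 501)
X = interpolated_walk(xi, t)
```

The resulting path is continuous and piecewise linear, agreeing with the step-process at the grid points k/n, which is exactly why Theorem 16.5 becomes applicable.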

The following simple estimate may be used to verify the tightness.

Lemma 16.8 (maximum inequality, Ottaviani) Let ξ_1, ξ_2, ... be i.i.d. random variables with mean 0 and variance 1, and put S_n = Σ_{j≤n} ξ_j. Then

P{S_n* ≥ 2r√n} ≤ P{|S_n| ≥ r√n} / (1 − r^{−2}),  r > 1, n ∈ N.

Proof: Put c = r√n, and define τ = inf{k ∈ N; |S_k| ≥ 2c}. By the strong Markov property at τ and Theorem 6.4,

P{|S_n| ≥ c} ≥ P{|S_n| ≥ c, S_n* ≥ 2c} ≥ P{τ ≤ n, |S_n − S_τ| ≤ c} ≥ P{S_n* ≥ 2c} min_{k≤n} P{|S_k| ≤ c},

and by Chebyshev's inequality,

min_{k≤n} P{|S_k| ≤ c} ≥ min_{k≤n} (1 − k c^{−2}) ≥ 1 − n c^{−2} = 1 − r^{−2}. □

k 0,
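The inequality of Lemma 16.8 can be checked by simulation; the following Monte Carlo sketch (not from the text) estimates both sides for symmetric ±1 steps:

```python
import numpy as np

# Monte Carlo check of Ottaviani's inequality:
# P{S_n^* >= 2 r sqrt(n)} <= P{|S_n| >= r sqrt(n)} / (1 - r^{-2}),  r > 1.
rng = np.random.default_rng(1)
n, paths, r = 200, 20000, 1.5
xi = rng.choice([-1.0, 1.0], size=(paths, n))    # i.i.d. steps, mean 0, variance 1
S = np.cumsum(xi, axis=1)
Smax = np.abs(S).max(axis=1)                      # S_n^* = max_{k <= n} |S_k|
lhs = (Smax >= 2 * r * np.sqrt(n)).mean()
rhs = (np.abs(S[:, -1]) >= r * np.sqrt(n)).mean() / (1 - r ** -2)
print(lhs <= rhs)
```

The bound is typically far from sharp: the left side involves the rare event that the running maximum exceeds 2r√n, while the right side only penalizes the terminal value.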

and so by (11) and (12),

lim_{h→0} limsup_{n→∞} E w(X^n, t, h) ≤ 2ε + limsup_{n→∞} P{α_n^m < t}.  (14)

Next we conclude from (13) and Lemma 16.13 that, for any c > 0,

P{α_n^m < t} ≤ e^t E[e^{−α_n^m}; α_n^m < t] ≤ e^t {e^{−mc} + e^{−1} v_n(t + c, c)}.

By (11) the right-hand side tends to 0 as m, n → ∞ and then c → 0. Hence, the last term in (14) tends to 0 as m → ∞, and (9) follows since ε is arbitrary. □

We may illustrate the use of Theorem 16.11 by proving an extension of Corollary 16.7. A more precise result is obtained by different methods in Corollary 15.20. An extension to Markov chains appears in Theorem 19.28.

Theorem 16.14 (approximation of random walks, Skorohod) Let S^1, S^2, ... be random walks in R^d such that S^n_{m_n} →d X_1 for some Lévy process X and some integers m_n → ∞. Then the processes X_t^n = S^n_{[m_n t]} satisfy X^n →d X in D(R_+, R^d).

Proof: By Corollary 15.16 we have X^n →fd X, and so by Theorem 16.11 it is enough to show that |X^n_{τ_n + h_n} − X^n_{τ_n}| →P 0 for any finite optional times τ_n and constants h_n → 0. By the strong Markov property of S^n, or


alternatively by Theorem 11.13, we may reduce to the case when τ_n = 0 for all n. Thus, it suffices to show that X^n_{h_n} →P 0 as h_n → 0, which again may be seen from Corollary 15.16. □

For the remainder of this chapter, we assume that S is lcscH with Borel σ-field S. Write Ŝ for the class of relatively compact sets in S. Let M(S) denote the space of locally finite measures on S, endowed with the vague topology induced by the mappings π_f: μ ↦ μf = ∫ f dμ, f ∈ C_K^+. The basic properties of this topology are summarized in Theorem A2.3. Note in particular that M(S) is Polish and that the random elements in M(S) are precisely the random measures on S. Similarly, the point processes on S are random elements in the vaguely closed subspace N(S), consisting of all integer-valued measures in M(S). We begin with the basic tightness criterion.

Lemma 16.15 (tightness of random measures, Prohorov) Let ξ_1, ξ_2, ... be random measures on some lcscH space S. Then the sequence (ξ_n) is relatively compact in distribution iff (ξ_n B) is tight in R_+ for every B ∈ Ŝ.

Proof: By Theorems 16.3 and A2.3 the notions of relative compactness and tightness are equivalent for (ξ_n). If (ξ_n) is tight, then so is (ξ_n f) for every f ∈ C_K^+ by Lemma 16.4, and hence (ξ_n B) is tight for all B ∈ Ŝ. Conversely, assume the latter condition. Choose an open cover G_1, G_2, ... ∈ Ŝ of S, fix any ε > 0, and let r_1, r_2, ... > 0 be large enough that

sup_n P{ξ_n G_k > r_k} < ε 2^{-k},  k ∈ N.  (15)

Then the set A = {μ; μG_k ≤ r_k, k ∈ N} is relatively compact by Theorem A2.3 (ii), and (15) yields inf_n P{ξ_n ∈ A} > 1 − ε. Thus, (ξ_n) is tight. □

We may now derive some general convergence criteria for random measures, corresponding to the uniqueness results in Lemma 12.1 and Theorem 12.8. Define S_ξ = {B ∈ Ŝ; ξ∂B = 0 a.s.}.

Theorem 16.16 (convergence of random measures) Let ξ, ξ_1, ξ_2, ... be random measures on an lcscH space S. Then these conditions are equivalent:

(i) ξ_n →d ξ;
(ii) ξ_n f →d ξf for all f ∈ C_K^+;
(iii) (ξ_n B_1, ..., ξ_n B_k) →d (ξB_1, ..., ξB_k) for all B_1, ..., B_k ∈ S_ξ, k ∈ N.

If ξ is a simple point process or a diffuse random measure, it is also equivalent that

(iv) ξ_n B →d ξB for all B ∈ S_ξ.

Proof: By Theorems 4.27 and A2.3 (iii), condition (i) implies both (ii) and (iii). Conversely, Lemma 16.15 shows that (ξ_n) is relatively compact in distribution under both (ii) and (iii). Arguing as in the proof of Lemma 16.2, it remains to show for any random measures ξ and η on S that ξ =d η if ξf =d ηf for all f ∈ C_K^+, or if (ξB_1, ..., ξB_k) =d (ηB_1, ..., ηB_k) for all B_1, ..., B_k ∈ S_ξ, k ∈ N. In the former case, this holds by Lemma 12.1; in the latter case it follows by a monotone class argument from Theorem A2.3 (iv). The last assertion is obtained in a similar way from a suitable version of Theorem 12.8 (iii). □

Weaker conditions are required for convergence to a simple point process, as suggested by Theorem 12.8. The following conditions are only sufficient, and a precise criterion is given in Theorem 16.29. Here a class U ⊂ Ŝ is said to be separating if, for any compact and open sets K and G with K ⊂ G, there exists some U ∈ U with K ⊂ U ⊂ G. Furthermore, we say that I ⊂ Ŝ is preseparating if the finite unions of sets in I form a separating class. Applying Lemma A2.6 to the function h(B) = E e^{−ξB}, we note that the class S_ξ is separating for any random measure ξ. For Euclidean spaces S, a preseparating class typically consists of rectangular boxes, whereas the corresponding finite unions form a separating class.

Proposition 16.17 (convergence of point processes) Let ξ, ξ_1, ξ_2, ... be point processes on an lcscH space S, where ξ is simple, and fix a separating class U ⊂ Ŝ. Then ξ_n →d ξ under these conditions:

(i) P{ξ_n U = 0} → P{ξU = 0} for all U ∈ U;
(ii) limsup_n Eξ_n K ≤ EξK < ∞ for all compact sets K ⊂ S.

Proof: First note that both (i) and (ii) extend by suitable approximation to sets in S_ξ. By the usual compactness argument together with Lemma 4.11, it is enough to prove that a point process η is distributed as ξ whenever

P{ηB = 0} = P{ξB = 0},  EηB ≤ EξB,  B ∈ S_ξ.

Here the first relation yields η* =d ξ as in Theorem 12.8 (i). From the second relation we then obtain EηB ≤ Eη*B for all B ∈ S_ξ, which shows that η is a.s. simple. □

We may illustrate the use of Theorem 16.16 by showing how Poisson and Cox processes may arise as limits under superposition or thinning. Say that the random measures ξ_nj, n, j ∈ N, form a null array if they are independent for fixed n and such that, for every B ∈ Ŝ, the random variables ξ_nj B form a null array in the sense of Chapter 5. The following result is a point process version of Theorem 5.7.


Theorem 16.18 (convergence of superpositions, Grigelionis) Let (ξ_nj) be a null array of point processes on an lcscH space S, and consider a Poisson process ξ on S with Eξ = μ. Then Σ_j ξ_nj →d ξ iff these conditions hold:

(i) Σ_j P{ξ_nj B > 0} → μB for all B ∈ S_μ;
(ii) Σ_j P{ξ_nj B > 1} → 0 for all B ∈ Ŝ.

Proof: If Σ_j ξ_nj →d ξ, then Σ_j ξ_nj B →d ξB for all B ∈ S_μ by Theorem 16.16, which implies (i) and (ii) by Theorem 5.7. Conversely, assume (i) and (ii). To prove that Σ_j ξ_nj →d ξ, we may restrict our attention to an arbitrary compact set C ∈ S_μ. For notational convenience, we may also assume that S itself is compact. Now define η_nj = ξ_nj 1{ξ_nj S ≤ 1}, and note that (i) and (ii) remain true for the array (η_nj). Moreover, Σ_j η_nj →d ξ implies Σ_j ξ_nj →d ξ by Theorem 4.28. This reduces the discussion to the case when ξ_nj S ≤ 1 for all n and j. Now define μ_nj = Eξ_nj. By (i) we get

Σ_j μ_nj B = Σ_j Eξ_nj B = Σ_j P{ξ_nj B > 0} → μB,  B ∈ S_μ,

and so Σ_j μ_nj →v μ by Theorem 4.25. Noting that m(1 − e^{−f}) = 1 − e^{−mf} when m = δ_x or 0 and writing ξ_n = Σ_j ξ_nj, we get by Lemmas 5.8 and 12.2 (i)

Ee^{−ξ_n f} = Π_j Ee^{−ξ_nj f} = Π_j E{1 − ξ_nj(1 − e^{−f})}
 = Π_j {1 − μ_nj(1 − e^{−f})} ∼ exp{−Σ_j μ_nj(1 − e^{−f})}
 → exp(−μ(1 − e^{−f})) = Ee^{−ξf}. □
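The mechanism behind Theorem 16.18 can be seen in a toy simulation (not from the text): superpose n independent sources, each dropping at most one uniformly placed point with probability λ/n, and compare the counts in a test set with the Poisson prediction:

```python
import numpy as np

# Superposition of a null array: for each n, source j <= n drops a single point at a
# uniform location with probability lam/n, else nothing.  By Grigelionis' theorem the
# superposition is approximately a Poisson process with intensity lam on [0, 1].
rng = np.random.default_rng(2)
lam, n, reps = 3.0, 500, 20000
B = (0.2, 0.7)                                    # a test set with mu(B) = lam * 0.5
counts = np.empty(reps)
for i in range(reps):
    active = rng.random(n) < lam / n              # which sources contribute a point
    pts = rng.random(active.sum())                # their uniform locations
    counts[i] = np.sum((pts >= B[0]) & (pts < B[1]))
muB = lam * (B[1] - B[0])
print(round(counts.mean(), 2), round(counts.var(), 2))         # both near muB = 1.5
print(round((counts == 0).mean(), 3), round(np.exp(-muB), 3))  # both near e^{-muB}
```

Mean and variance agreeing, together with the Poisson void probability P{ξB = 0} = e^{−μB}, is exactly the signature of conditions (i) and (ii).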

We may next establish a basic limit theorem for independent thinnings of point processes.

Theorem 16.19 (convergence of thinnings) For every n ∈ N, let ξ_n be a p_n-thinning of some point process η_n on S, where S is lcscH and p_n → 0. Then ξ_n →d some ξ iff p_n η_n →d some η, in which case ξ is distributed as a Cox process directed by η.

Proof: For any f ∈ C_K^+, we get by Lemma 12.2

Ee^{−ξ_n f} = E exp(η_n log{1 − p_n(1 − e^{−f})}).

Noting that px ≤ −log(1 − px) ≤ −x log(1 − p) for any p, x ∈ (0, 1) and writing p_n′ = −log(1 − p_n), we obtain

E exp{−p_n′ η_n(1 − e^{−f})} ≤ Ee^{−ξ_n f} ≤ E exp{−p_n η_n(1 − e^{−f})}.

If p_n η_n →d η, then even p_n′ η_n →d η, and so by Lemma 12.2

Ee^{−ξ_n f} → E exp{−η(1 − e^{−f})} = Ee^{−ξf},  (17)

where ξ is a Cox process directed by η. Hence, ξ_n →d ξ.

Conversely, assume that ξ_n →d ξ. Fix any g ∈ C_K^+ and let 0 ≤ t < ‖g‖^{−1}. Applying (17) with f = −log(1 − tg), we get

liminf_{n→∞} E exp{−t p_n η_n g} ≥ E exp{ξ log(1 − tg)}.

Here the right-hand side tends to 1 as t → 0, and so by Lemmas 5.2 and 16.15 the sequence (p_n η_n) is tight. For any subsequence N′ ⊂ N, we may then choose a further subsequence N″ such that p_n η_n →d some η along N″. By the direct assertion, ξ is then distributed as a Cox process directed by η, which by Lemma 12.6 determines the distribution of η. Hence, p_n η_n →d η remains true along the original sequence. □

The last result leads in particular to an interesting characterization of Cox processes.

Corollary 16.20 (Cox processes and thinnings, Mecke) Let ξ be a point process on S. Then ξ is Cox iff for every p ∈ (0, 1) there exists a point process ξ_p such that ξ is distributed as a p-thinning of ξ_p.

Proof: If ξ and ξ_p are Cox processes directed by η and η/p, respectively, then Proposition 12.3 shows that ξ is distributed as a p-thinning of ξ_p. Conversely, assuming the stated condition for every p ∈ (0, 1), we note that ξ is Cox by Theorem 16.19. □
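As a quick numerical sketch of the thinning mechanism (not from the text): thin a Poisson process with rate λ/p by keeping each point with probability p; the counts of the thinned process should match Poisson(λ), the limit predicted by Theorem 16.19 when the directing measure is deterministic:

```python
import numpy as np

# p-thinning: keep each point of eta_n independently with probability p.  Here eta_n
# is a Poisson process on [0, 1] with rate lam/p, so p * eta_n has mean measure
# lam * Leb, and the thinned counts should look Poisson(lam).
rng = np.random.default_rng(3)
lam, p, reps = 2.0, 0.01, 20000
base = rng.poisson(lam / p, size=reps)            # eta_n[0, 1]
kept = rng.binomial(base, p)                      # xi_n[0, 1], the p-thinning
print(round(kept.mean(), 2), round(kept.var(), 2))  # both near lam = 2
```

Replacing the deterministic rate λ/p by a random one produces a genuinely Cox limit, visible as variance strictly exceeding the mean.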

The previous theory will now be used to derive a general limit theorem for sums of exchangeable random variables. The result applies in particular to sequences obtained by sampling without replacement from a finite population. It is also general enough to contain a version of Donsker's theorem. The appropriate function space in this case is D([0, 1], R) = D[0, 1], to which the results for D(R_+) apply with obvious modifications. For motivation, we begin with a description of the possible limits, which are precisely the exchangeable processes on [0, 1]. Here we say that a process X on [0, 1] is exchangeable if it is continuous in probability with X_0 = 0 and has exchangeable increments over any set of disjoint intervals of equal length. The following result is a finite-interval version of Theorem 11.15.

Theorem 16.21 (exchangeable processes on [0, 1]) A process X on [0, 1] is exchangeable iff it has a version with representation

X_t = αt + σB_t + Σ_j β_j (1{τ_j ≤ t} − t),  t ∈ [0, 1],  (18)

for some Brownian bridge B, some independent i.i.d. U(0, 1) random variables τ_1, τ_2, ..., and some independent set of coefficients α, σ, and β_1, β_2, ... such that Σ_j β_j² < ∞ a.s. In that case, the sum in (18) converges in probability, uniformly on [0, 1], toward an rcll limit.
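A small numerical sketch of representation (18) with finitely many β_j (not from the text): since B is a bridge and each term 1{τ_j ≤ t} − t vanishes at t = 1, every path must end at X_1 = α exactly:

```python
import numpy as np

# Simulate (18) with three jump coefficients; the bridge and jump terms both vanish
# at t = 1, so the path ends at X_1 = alpha.  All parameter values are arbitrary.
rng = np.random.default_rng(4)
alpha, sigma = 0.5, 1.0
beta = np.array([1.0, -0.7, 0.3])
tau = rng.random(beta.size)                       # i.i.d. U(0, 1) jump times
grid = np.linspace(0.0, 1.0, 1001)
W = np.concatenate([[0.0], np.cumsum(rng.normal(0, np.sqrt(1 / 1000), 1000))])
B = W - grid * W[-1]                              # Brownian bridge built from W
jumps = (beta[:, None] * ((tau[:, None] <= grid) - grid)).sum(axis=0)
X = alpha * grid + sigma * B + jumps
print(round(float(X[-1]), 10))  # equals alpha = 0.5 up to float rounding
```

The terminal value pins down α, which is one reason the coefficients in (18) are measurable functions of the path.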

In particular, we note that a simple point process on [0, 1] is symmetric with respect to Lebesgue measure λ iff it is a mixed binomial process based on λ, in agreement with Theorem 12.12. Combining the present result with


Theorem 11.15, we also see that a continuous process X on R_+ or [0, 1] with X_0 = 0 is exchangeable iff it can be written in the form X_t = αt + σB_t, where B is a Brownian motion or bridge, respectively, and (α, σ) is an independent pair of random variables. We first examine the convergence of the series in (18).

Lemma 16.22 (convergence of series) For any t ∈ (0, 1), the series in (18) converges a.s. iff Σ_j β_j² < ∞ a.s. In that case, it converges in probability with respect to the uniform metric on [0, 1], and the sum has a version in D[0, 1].

Proof: For both assertions, we may assume that the coefficients β_j are nonrandom. Then for fixed t ∈ (0, 1), the terms are independent and bounded with mean 0 and variance β_j² t(1 − t), and so by Theorem 4.18 the series converges iff Σ_j β_j² < ∞. To prove the second assertion, let X^n denote the nth partial sum in (18), and note that the processes M_t^n = X_t^n/(1 − t) are L²-martingales on [0, 1) with respect to the filtration induced by the processes 1{τ_j ≤ t}. By Doob's inequality we have, for any m < n and t ∈ [0, 1),

E(X^n − X^m)_t*² ≤ E(M^n − M^m)_t*² ≤ 4E(M_t^n − M_t^m)²
 = 4(1 − t)^{-2} E(X_t^n − X_t^m)² ≤ 4t(1 − t)^{-1} Σ_{j>m} β_j²,

which tends to 0 as m → ∞ for fixed t. Hence, (X^n − X)_t* → 0 a.s. along a subsequence for some process X, and then also (X^n − X)_t* →P 0 along N. By symmetry the same convergence holds for the reflected processes X̃_t = X_{(1−t)−} and X̃_t^n = X^n_{(1−t)−}, and so by combination (X^n − X)_1* →P 0. The last assertion now follows from the fact that X^n is rcll for every n. □

We plan to prove Theorem 16.21 together with the following approximation result. Here we consider for every n ∈ N some exchangeable random variables ξ_nj, j ≤ m_n, where m_n → ∞, and introduce the summation processes

X_t^n = Σ_{j ≤ m_n t} ξ_nj,  t ∈ [0, 1], n ∈ N.  (19)

Lemma 16.25 (tightness) Consider some random variables ξ_n ≥ 0 and σ-fields F_n such that, for some constant a > 0,

E[ξ_n² | F_n] ≤ a (E[ξ_n | F_n])² < ∞ a.s.,  n ∈ N.

Then if (ξ_n) is tight, so is the sequence η_n = E[ξ_n | F_n], n ∈ N.

Proof: By Lemma 4.9 we need to show that c_n η_n →P 0 whenever 0 ≤ c_n → 0. We then conclude from Lemma 4.1 that, for any r ∈ (0, 1) and ε > 0, the first term on the right in the resulting bound tends in probability to 0, since c_n ξ_n →P 0 by Lemma 4.9. Hence, 1{c_n η_n < ε} →P 1, which means that P{c_n η_n ≥ ε} → 0. Since ε is arbitrary, we get c_n η_n →P 0. □

Since the summation processes in (19) will be approximated by exchangeable processes, as in Theorem 16.21, we finally need a convergence criterion for the latter. This result also has some independent interest.


Proposition 16.26 (convergence of exchangeable processes) Let the processes X^n and pairs (α_n, κ_n) be related as in (18) and (21). Then X^n →d some X in D[0, 1] iff (α_n, κ_n) →d some (α, κ) in R × M(R̄), in which case even X and (α, κ) are related by (18) and (21).

Proof: First let (α_n, κ_n) →d (α, κ). To prove X^n →d X for the corresponding processes in (18), it suffices by Lemma 16.24 to assume that all the α_n and κ_n are nonrandom. Thus, we may restrict our attention to processes X^n with constant coefficients α_n, σ_n, and β_nj, j ∈ N. To prove that X^n →d X, we begin with four special cases. First we note that if α_n → α, then trivially α_n t → αt uniformly on [0, 1]. Similarly, σ_n → σ implies σ_n B → σB in the same sense. Next we consider the case when α_n = σ_n = 0 and β_{n,m+1} = β_{n,m+2} = ⋯ = 0 for some fixed m ∈ N. Here we may assume that even α = σ = 0 and β_{m+1} = β_{m+2} = ⋯ = 0, and that moreover β_nj → β_j for all j. The convergence X^n → X is then obvious. Finally, we may assume that α_n = σ_n = 0 and α = β_1 = β_2 = ⋯ = 0. Then max_j |β_nj| → 0, and for any s ≤ t we have

E(X_s^n X_t^n) = s(1 − t) Σ_j β_nj² → s(1 − t) σ² = E(X_s X_t).  (22)

In this case, X^n →d X by Theorem 5.12 and Corollary 5.5. By independence we may combine the four special cases to obtain X^n →d X whenever β_j = 0 for all but finitely many j. From here on, it is easy to extend to the general case by means of Theorem 4.28, where the required uniform error estimate may be obtained as in (22).

To strengthen the convergence to X^n →d X in D[0, 1], it is enough to verify the tightness criterion in Theorem 16.11. Thus, for any X^n-optional times τ_n and positive constants h_n → 0 with τ_n + h_n ≤ 1, we need to show that X^n_{τ_n + h_n} − X^n_{τ_n} →P 0. By Theorem 11.13 and a simple approximation, it is equivalent that X^n_{h_n} →P 0, which is clear since

E(X^n_{h_n})² = h_n² α_n² + h_n(1 − h_n) κ_n R̄ → 0.

To obtain the reverse implication, we assume that X^n →d X in D[0, 1] for some process X. Since α_n = X_1^n →d X_1, the sequence (α_n) is tight. Next define for n ∈ N

η_n = 2X^n_{1/2} − X_1^n = 2σ_n B_{1/2} + 2 Σ_j β_nj (1{τ_j ≤ ½} − ½).

Then

E[η_n² | κ_n] = σ_n² + Σ_j β_nj² = κ_n R̄,
E[η_n⁴ | κ_n] ≤ 3 {σ_n² + Σ_j β_nj²}² − 2 Σ_j β_nj⁴ ≤ 3 (κ_n R̄)².


Since (η_n) is tight, Lemmas 16.15 and 16.25 show that even (κ_n) is tight, and so the same thing is true for the sequence of pairs (α_n, κ_n). The tightness implies relative compactness in distribution, and so every subsequence contains a further subsequence that converges in R × M(R̄) toward some random pair (α, κ). Since the measures in (21) form a vaguely closed subset of M(R̄), the limit κ has the same form for suitable σ and β_1, β_2, .... The direct assertion then yields X^n →d Y with Y as in (18), and therefore X =d Y. Now the coefficients in (18) are measurable functions of Y, and so the distribution of (α, κ) is uniquely determined by that of X. Thus, the limiting distribution is independent of subsequence, and the convergence (α_n, κ_n) →d (α, κ) remains valid along N. We may finally use Corollary 6.11 to transfer the representation (18) to the original process X. □

Proof of Theorem 16.23: Let τ_1, τ_2, ... be i.i.d. U(0, 1) and independent of all ξ_nj, and define

Y_t^n = Σ_j ξ_nj 1{τ_j ≤ t} = α_n t + Σ_j ξ_nj (1{τ_j ≤ t} − t),  t ∈ [0, 1].

Writing ξ̃_nk for the kth jump from the left of Y^n (including possible 0 jumps when ξ_nj = 0), we note that (ξ̃_nj) =d (ξ_nj) by exchangeability. Thus, X̃^n =d X^n, where X̃_t^n = Σ_{j ≤ m_n t} ξ̃_nj. Furthermore, d(X̃^n, Y^n) → 0 a.s. by Proposition 4.24, where d is the metric in Theorem A2.2. Hence, by Theorem 4.28 it is equivalent to replace X^n by Y^n. But then the assertion follows by Proposition 16.26. □

Using similar compactness arguments, we may finally prove the main representation theorem for exchangeable processes on [0, 1].

Proof of Theorem 16.21: The sufficiency part being obvious, it is enough to prove the necessity. Thus, assume that X has exchangeable increments. Introduce the step processes

X_t^n = X(2^{−n} [2^n t]),  t ∈ [0, 1], n ∈ N.

Let f_1, f_2, ... be such as in Theorem A2.4, and choose some compact sets B_1, B_2, ⋯ ⊂ D(R_+, R_+) with (24). Then A = ∩_k {μ; μf_k ∈ B_k} is relatively compact in D(R_+, M(S)), and (24) yields P{X^n ∈ A} ≥ 1 − ε. □

We turn our attention to random sets. Fix an lcscH space S, and let F, G, and K denote the classes of closed, open, and compact subsets of S, respectively. We endow F with the so-called Fell topology, generated by the sets {F; F ∩ G ≠ ∅} and {F; F ∩ K = ∅} for arbitrary G ∈ G and K ∈ K. Some basic properties of this topology are summarized in Theorem A2.5. In particular, F is compact and metrizable, and {F; F ∩ B = ∅} is universally measurable for every B ∈ Ŝ. By a random closed set in S we mean a random element φ in F. In this context we often write φ ∩ B = φB, and we note that the probabilities P{φB = ∅} are well defined. For any random closed set φ, we introduce the class

S_φ = {B ∈ Ŝ; P{φB° = ∅} = P{φB̄ = ∅}},

which is separating by Lemma A2.6. We may now state the basic convergence criterion for random sets. It is interesting to note the formal agreement with the first condition in Proposition 16.17.


Theorem 16.28 (convergence of random sets, Norberg) Let φ, φ_1, φ_2, ... be random closed sets in an lcscH space S. Then φ_n →d φ iff P{φ_n U = ∅} → P{φU = ∅} for all U ∈ S_φ.

τ^n_{k+1} = inf{t > τ_k^n; |M_t − M_{τ_k^n}| = 2^{−n}},  k ≥ 0.

Clearly, τ_k^n → ∞ as k → ∞ for fixed n. Introduce the processes

V_t^n = Σ_k M_{τ_k^n} 1{t ∈ (τ_k^n, τ^n_{k+1}]},  Q_t^n = Σ_k (M_{t∧τ_k^n} − M_{t∧τ^n_{k−1}})².

The V^n are bounded predictable step processes, and we note that

M_t² = 2(V^n · M)_t + Q_t^n,  t ≥ 0.  (3)

By Lemma 17.3 the integrals V^n · M are continuous L²-martingales, and since |V^n − M| ≤ 2^{−n} for each n, we have

‖V^m · M − V^n · M‖ = ‖(V^m − V^n) · M‖ ≤ 2^{−m+1} ‖M‖,  m ≤ n.

17. Stochastic Integrals and Quadratic Variation


Hence, by Lemma 17.4 there exists some continuous martingale N such that (V^n · M − N)* →P 0. The process [M] = M² − 2N is again continuous, and by (3) we have

(Q^n − [M])* = 2(N − V^n · M)* →P 0.

In particular, [M] is a.s. nondecreasing on the random time set T = {τ_k^n; n, k ∈ N}, and the monotonicity extends by continuity to the closure T̄. Also note that [M] is constant on each interval in T̄^c, since this is true for M and hence also for every Q^n. Thus, [M] is a.s. nondecreasing.

Turning to the unbounded case, we define τ_n = inf{t > 0; |M_t| = n}, n ∈ N. The processes [M^{τ_n}] exist as before, and we note that [M^{τ_m}] = [M^{τ_n}]^{τ_m} a.s. for all m < n. Hence, [M^{τ_m}] = [M^{τ_n}] a.s. on [0, τ_m], and since τ_n → ∞ there exists a nondecreasing, continuous, and adapted process [M] such that [M] = [M^{τ_n}] a.s. on [0, τ_n] for each n. Here (M^{τ_n})² − [M]^{τ_n} is a local martingale for each n, and so M² − [M] is a local martingale by Lemma 17.1. □

We proceed to establish a basic continuity property.

Proposition 17.6 (continuity) For any continuous local martingales M_n starting at 0, we have M_n* →P 0 iff [M_n]_∞ →P 0.

Proof: First let M_n* →P 0. Fix any ε > 0, and define τ_n = inf{t ≥ 0; |M_n(t)| > ε}, n ∈ N. Write N_n = M_n² − [M_n], and note that N_n^{τ_n} is a true martingale on R̄_+. In particular, E[M_n]_{τ_n} ≤ ε², and so by Chebyshev's inequality

P{[M_n]_∞ > ε} ≤ P{τ_n < ∞} + ε^{−1} E[M_n]_{τ_n} ≤ P{M_n* > ε} + ε.

Here the right-hand side tends to zero as n → ∞ and then ε → 0, which shows that [M_n]_∞ →P 0. The proof in the other direction is similar, except that we need to use a localization argument together with Fatou's lemma to see that a continuous local martingale M with M_0 = 0 and E[M]_∞ < ∞ is necessarily L²-bounded. □

Next we prove a pair of basic norm inequalities involving the quadratic variation, known as the BDG inequalities. Partial extensions to discontinuous martingales are established in Theorem 26.12.

Theorem 17.7 (norm inequalities, Burkholder, Millar, Gundy, Novikov) There exist some constants c_p ∈ (0, ∞), p > 0, such that for any continuous local martingale M with M_0 = 0,

c_p^{−1} E[M]_∞^{p/2} ≤ EM*^p ≤ c_p E[M]_∞^{p/2},  p > 0.

Proof: By optional stopping we may assume that M and [M] are bounded. Write M′ = M − M^τ with τ = inf{t; M_t* = r}, and define N = (M′)² − [M′]. By Corollary 7.30 we have for any r > 0 and c ∈ (0, 2^{−p})

P{M*² ≥ 4r} − P{[M]_∞ ≥ cr} ≤ P{M*² ≥ 4r, [M]_∞ < cr}
 ≤ c P{N* > 0} ≤ c P{M*² ≥ r}.

Multiplying by (p/2) r^{p/2−1} and integrating over R_+, we get by Lemma 3.4

2^{−p} EM*^p − c^{−p/2} E[M]_∞^{p/2} ≤ c EM*^p,

and the right-hand inequality follows with c_p = c^{−p/2}/(2^{−p} − c). Next let N be as before with τ = inf{t; [M]_t = r}, and write for any r > 0 and c ∈ (0, 2^{−p/2−2})

P{[M]_∞ ≥ 2r} − P{M*² ≥ cr} ≤
τ_n = inf{t > 0; (V² · [M])_t = n}, n ∈ N. By the previous argument there exist some continuous local martingales V · M^{τ_n} such that, for any continuous local martingale N,

[V · M^{τ_n}, N] = V · [M^{τ_n}, N] a.s.,  n ∈ N.  (7)

For m < n it follows that (V · M^{τ_n})^{τ_m} satisfies the corresponding relation with [M^{τ_m}, N], and so (V · M^{τ_n})^{τ_m} = V · M^{τ_m} a.s. Hence, there exists a continuous process V · M with (V · M)^{τ_n} = V · M^{τ_n} a.s. for all n, and Lemma 17.1 shows that V · M is again a local martingale. Finally, (7) yields [V · M, N] = V · [M, N] a.s. on [0, τ_n] for each n, and so the same relation holds on R_+. □

By Lemma 17.10 we note that the stochastic integral V · M of the last theorem extends the previously defined elementary integral. It is also clear that V · M is a.s. bilinear in the pair (V, M) and satisfies the following basic continuity property.

Lemma 17.12 (continuity) For any continuous local martingales M_n and processes V_n ∈ L(M_n), we have (V_n · M_n)* →P 0 iff (V_n² · [M_n])_∞ →P 0.

Proof: Recall that [V_n · M_n] = V_n² · [M_n] and use Proposition 17.6. □

Before continuing the study of stochastic integrals, it is convenient to extend the definition to a larger class of integrators. A process X is said to be a continuous semimartingale if it can be written as a sum M + A, where M is a continuous local martingale and A is a continuous, adapted process of locally finite variation with A_0 = 0. By Proposition 17.2 the decomposition X = M + A is then a.s. unique, and it is often referred to as the canonical decomposition of X. By a continuous semimartingale in R^d we mean a process X = (X^1, ..., X^d) such that the component processes X^k are one-dimensional continuous semimartingales.

Let L(A) denote the class of progressive processes V such that the process (V · A)_t = ∫_0^t V dA exists in the sense of ordinary Stieltjes integration. For any continuous semimartingale X = M + A we may write L(X) = L(M) ∩ L(A), and we define the integral of a process V ∈ L(X) as the sum V · X = V · M + V · A. Note that V · X is again a continuous semimartingale with canonical decomposition V · M + V · A. For progressive processes V, it is further clear that V ∈ L(X) iff V² ∈ L([M]) and V ∈ L(A). From Lemma 17.12 we may easily deduce the following stochastic version of the dominated convergence theorem.

Corollary 17.13 (dominated convergence) For any continuous semimartingale X, let U, V, V_1, V_2, ... ∈ L(X) with |V_n| ≤ U and V_n → V. Then

(V_n · X − V · X)_t* →P 0,  t ≥ 0.


Proof: Assume that X = M + A. Since U ∈ L(X), we have U² ∈ L([M]) and U ∈ L(A). Hence, by dominated convergence for ordinary Stieltjes integrals, ((V_n − V)² · [M])_t → 0 and (V_n · A − V · A)_t* → 0 a.s. By Lemma 17.12 the former convergence implies (V_n · M − V · M)_t* →P 0, and the assertion follows. □

The next result extends the elementary chain rule of Lemma 1.23 to stochastic integrals.

Proposition 17.14 (chain rule) Consider a continuous semimartingale X and two progressive processes U and V, where V ∈ L(X). Then U ∈ L(V · X) iff UV ∈ L(X), in which case U · (V · X) = (UV) · X a.s.

Proof: Let M + A be the canonical decomposition of X. Then U ∈ L(V · X) iff U² ∈ L([V · M]) and U ∈ L(V · A), whereas UV ∈ L(X) iff (UV)² ∈ L([M]) and UV ∈ L(A). Since [V · M] = V² · [M], the two pairs of conditions are equivalent. The formula U · (V · A) = (UV) · A is elementary. To see that even U · (V · M) = (UV) · M a.s., let N be an arbitrary continuous local martingale, and note that

[(UV) · M, N] = (UV) · [M, N] = U · (V · [M, N]) = U · [V · M, N] = [U · (V · M), N]. □

The next result shows how the stochastic integral behaves under optional stopping.

Proposition 17.15 (optional stopping) For any continuous semimartingale X, process V ∈ L(X), and optional time τ, we have a.s.

(V · X)^τ = V · X^τ = (V 1_{[0,τ]}) · X.

Proof: The relations being obvious for ordinary Stieltjes integrals, we may assume that X = M is a continuous local martingale. Then (V · M)^τ is a continuous local martingale starting at 0, and we have

[(V · M)^τ, N] = [V · M, N^τ] = V · [M, N^τ] = V · [M^τ, N] = V · [M, N]^τ = (V 1_{[0,τ]}) · [M, N].

Thus, (V · M)^τ satisfies the conditions characterizing the integrals V · M^τ and (V 1_{[0,τ]}) · M. □

We may extend the definitions of quadratic variation and covariation to arbitrary continuous semimartingales X and Y with canonical decompositions M + A and N + B, respectively, by putting [X] = [M] and [X, Y] = [M, N]. As a key step toward the development of a stochastic calculus, we show how the covariation process can be expressed in terms of stochastic integrals. In the martingale case, the result is implicit in the proof of Theorem 17.5.
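In discrete time, the identities of Proposition 17.15 reduce to finite-sum bookkeeping and can be checked directly; a sketch (not from the text), with a deterministic stopping index for simplicity:

```python
import numpy as np

# Discrete analogue of Proposition 17.15 for the sums (V.X)_n = sum_{k<n} V_k dX_k:
# stopping the integral at tau = integrating against the stopped path = integrating
# the truncated integrand V 1_{[0, tau]} against the unstopped path.
rng = np.random.default_rng(5)
N, tau = 30, 17                                   # tau: stopping index (deterministic here)
X = np.concatenate([[0.0], np.cumsum(rng.normal(size=N))])
V = rng.normal(size=N)
dX = np.diff(X)
a = np.sum(V[:tau] * dX[:tau])                    # (V . X)_tau
dX_stop = np.where(np.arange(N) < tau, dX, 0.0)   # increments of the stopped path X^tau
b = np.sum(V * dX_stop)                           # (V . X^tau)_N
c = np.sum(V * (np.arange(N) < tau) * dX)         # ((V 1_[0,tau]) . X)_N
print(np.isclose(a, b) and np.isclose(b, c))      # all three sums agree
```

The continuous-time statement is the same bookkeeping pushed through the limits defining the stochastic integral.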


Theorem 17.16 (integration by parts) For any continuous semimartingales X and Y, we have a.s.

XY = X_0 Y_0 + X · Y + Y · X + [X, Y].  (8)

Proof: We may take X = Y, since the general result will then follow by polarization. First let X = M ∈ M², and define V^n and Q^n as in the proof of Theorem 17.5. Then V^n → M and |V^n| ≤ M_t* < ∞, and so Corollary 17.13 yields (V^n · M)_t →P (M · M)_t for each t ≥ 0. Thus, (8) follows in this case as we let n → ∞ in the relation M² = 2V^n · M + Q^n, and it extends by localization to general continuous local martingales M with M_0 = 0. If instead X = A, formula (8) reduces to A² = 2A · A, which holds by Fubini's theorem.

Turning to the general case, we may assume that X_0 = 0, since the formula for general X_0 will then follow by an easy computation from the result for X − X_0. In this case (8) reduces to X² = 2X · X + [M]. Subtracting the formulas for M² and A², it remains to prove that AM = A · M + M · A a.s. Then fix any t > 0, and introduce the processes

A_s^n = A_{(k−1)t/n},  M_s^n = M_{kt/n},  s ∈ t(k − 1, k]/n, k, n ∈ N,

which satisfy

A_t M_t = (A^n · M)_t + (M^n · A)_t,  n ∈ N.

Here (A^n · M)_t →P (A · M)_t by Corollary 17.13 and (M^n · A)_t → (M · A)_t by dominated convergence for ordinary Stieltjes integrals. □

The terms quadratic variation and covariation are justified by the following result, which extends Theorem 13.9 for Brownian motion.

Proposition 17.17 (approximation, Fisk) Let X and Y be continuous semimartingales, fix any t > 0, and consider for every n ∈ N a partition 0 = t_{n,0} < t_{n,1} < ⋯ < t_{n,k_n} = t such that max_k (t_{n,k} − t_{n,k−1}) → 0. Then

ζ_n ≡ Σ_k (X_{t_{n,k}} − X_{t_{n,k−1}})(Y_{t_{n,k}} − Y_{t_{n,k−1}}) →P [X, Y]_t.  (9)

Proof: We may clearly assume that X_0 = Y_0 = 0. Introduce the predictable step processes

X_s^n = X_{t_{n,k−1}},  Y_s^n = Y_{t_{n,k−1}},  s ∈ (t_{n,k−1}, t_{n,k}], k, n ∈ N,

and note that

X_t Y_t = (X^n · Y)_t + (Y^n · X)_t + ζ_n,  n ∈ N.

Since X^n → X and Y^n → Y, and also (X^n)_t* ≤ X_t* < ∞ and (Y^n)_t* ≤ Y_t* < ∞, we get by Corollary 17.13 and Theorem 17.16

ζ_n →P X_t Y_t − (X · Y)_t − (Y · X)_t = [X, Y]_t. □
We proceed to prove a version of Ito 's formula, arguably the most important formula in modern probability. The result shows that the dass of

340

Foundations of Modern Probability

continuous semimartingales is preserved under smooth mappings; it also exhibits the canonical decomposition of the image process in terms of the components of the original process. Extended versions appear in Corollaries 17.19 and 17.20 as well as in Theorems 22.5 and 26.7. Let Ck = Ck(JRd) denote the dass of k times continuously differentiable functions on JRd. When f E C 2 , we write f} and f}j for the first- and secondorder partial derivatives of f. Here and below, summation over repeated indices is understood. Theorem 17.18 (substitution rule, Ito) For any continuous semimartingale X in JRd and function f E C 2 (JRd), we have a.s.

f(X)

= f(Xo) + f}(X)

· Xi

+ Vij(X) · [Xi,Xi].

(10)

The result is often written in differential form as

df(X)

= f{(X) dXi + Vij(X) d[Xi,Xi].

It is suggestive to think of Itö's formula as a second-order Taylor expansion

df(X)

= f}(X)dXi + ~!Ij(X)dXidXi,

where the second-order differential dXidXi is interpreted as d[Xi, Xi]. If X has canonical decomposition M + A, we get the corresponding decomposition of f(X) by substituting Mi+ Ai for Xi on the right of (10). When M = 0, the last term vanishes, and (10) reduces to the familiar substitution rule for ordinary Stieltjes integrals. In general, the appearance of this Ito correction term shows that the Itö integral does not obey the rules of ordinary calculus. Proof of Theorem 11.18: For notational convenience we may assume that d = 1, the general case being similar. Then fix a one-dimensional, continuaus semimartingale X, and let C denote the dass of functions f E C 2 satisfying (10), now appearing in the form

f(X)

= f(Xo) + f'(X)

·X+ ~f"(X) ·[X).

(11)

The dass C is dearly a linear subspace of C 2 containing the functions f(x) 1 and f(x) x. Weshall prove that Cis dosedunder multiplication and hence contains all polynomials. To see this, assume that (11) holds for both f and g. Then F = f(X) and G = g(X) are continuous semimartingales, and so, by the definition of the integral tagether with Proposition 17.14 and Theorem 17.16, we have

=

=

(fg)(X)- (fg)(Xo) FG-FoGo =F·G+G·F+ [F,G) F · (g'(X) ·X+ ~g"(X) ·[X)) + G ·(!'(X)· X+ ~f"(X) ·[X))+ [f'(X) ·X, g'(X) ·X) (fg' + f'g)(X) ·X+ ~(fg" + 2f'g' + J"g)(X) ·[X) (fg)'(X) ·X+ ~(fg)"(X) ·[X).

17. Stochastic Integrals and Quadratic Variation

341

Now let f E C 2 be arbitrary. By Weierstrass' approximation theorem, we may choose some polynomials P1,P2, ... such that suplxi::Sc IPn(x) f"(x)l--* 0 for every c > 0. Integrating the Pn twice yields polynomials fn satisfying sup (ifn(x)- f(x)i V if~(x)- f'(x)i V if~(x)- f"(x)i)---* 0, lxi::Sc

c

> 0.

In particular, fn(Xt)--? f(Xt) foreacht > 0. Letting M +A be the canonical decomposition of X and using dominated convergence for ordinary Stieltjes integrals, we get for any t ;::=: 0 (!~(X). A + V~(X). [X])t--? (!'(X). A + ~f"(X). [XDt·

Similarly, (!~(X)- f'(X)) 2 · [M])t--? 0 for all t, and so by Lemma 17.12

(!~(X)· M)t ~(!'(X)· M)t,

t;:::: 0.

Thus, equation (11) for the polynomials fn extends in the limit to the same D formula for f. We sometimes need a local version of the last theorem, involving stochastic integrals up to the time (v when X first leaves a given domain D C JR.d. If X is continuous and adapted, then (v is clearly predictable, in the sense of being announced by some optional times Tn t (v suchthat Tn < (v a.s. on {(v > 0} for all n. In fact, writing p for the Euclidean metric in JR.d, we may choose

We say that X is a semimartingale on [0, ζ_D) if the stopped process X^{τ_n} is a semimartingale in the usual sense for every n ∈ ℕ. In that case, we may define the covariation processes [X^i, X^j] on the interval [0, ζ_D) by requiring [X^i, X^j]^{τ_n} = [(X^i)^{τ_n}, (X^j)^{τ_n}] a.s. for every n. Stochastic integrals with respect to X^1, …, X^d are defined on [0, ζ_D) in a similar way.

Corollary 17.19 (local Itô formula) For any domain D ⊂ ℝ^d, let X be a continuous semimartingale on [0, ζ_D). Then (10) holds a.s. on [0, ζ_D) for every f ∈ C^2(D).

Proof: Choose some functions f_n ∈ C^2(ℝ^d) with f_n(x) = f(x) when ρ(x, D^c) ≥ n^{-1}. Applying Theorem 17.18 to f_n(X^{τ_n}) with τ_n as in (12), we get (10) on [0, τ_n]. Since n was arbitrary, the result extends to [0, ζ_D). □
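To make the substitution rule (10) concrete, here is a minimal numerical sketch (added for illustration; it is not part of the original text and uses only the Python standard library). For f(x) = x^2 and X = B a Brownian motion, Itô's formula reads B_t^2 = 2(B·B)_t + [B]_t, and the discrete left-point sums satisfy the same identity exactly, by telescoping.

```python
import random

random.seed(1)

n = 100_000                    # partition points on [0, 1]
dt = 1.0 / n

# simulate a Brownian path B_0, B_{dt}, ..., B_1
b = [0.0]
for _ in range(n):
    b.append(b[-1] + random.gauss(0.0, dt ** 0.5))

# left-point (Ito) sum of 2 B dB, and the quadratic variation sum
ito = sum(2.0 * b[k - 1] * (b[k] - b[k - 1]) for k in range(1, n + 1))
qv = sum((b[k] - b[k - 1]) ** 2 for k in range(1, n + 1))

# discrete Ito formula: y^2 - x^2 = 2x(y - x) + (y - x)^2 telescopes exactly
assert abs(ito + qv - b[-1] ** 2) < 1e-8   # exact up to rounding
assert abs(qv - 1.0) < 0.05                # [B]_1 = 1, up to simulation error
```

The first assertion is an exact algebraic identity, valid for any path; only the second uses the probabilistic fact that the quadratic variation of B on [0, 1] is 1.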

By a complex-valued, continuous semimartingale we mean a process of the form Z = X + iY, where X and Y are real continuous semimartingales. The bilinearity of the covariation process suggests that we define the quadratic variation of Z as

[Z] = [Z, Z] = [X + iY, X + iY] = [X] + 2i[X, Y] − [Y].
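As a worked special case (an illustration added here, not in the original text): for a complex Brownian motion Z = B^1 + iB^2 with independent real components,

```latex
[Z] = [B^1] + 2i\,[B^1, B^2] - [B^2] = t + 0 - t = 0,
```

so complex Brownian motion has vanishing quadratic variation; processes with this property are called isotropic in Chapter 18.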

Foundations of Modern Probability

342

Let L(Z) denote the class of processes W = U + iV with U, V ∈ L(X) ∩ L(Y). For such a process W, we define the integral by

W·Z = (U + iV)·(X + iY) = U·X − V·Y + i(U·Y + V·X).

Corollary 17.20 (conformal mapping) Let f be an analytic function on some domain D ⊂ ℂ. Then (10) holds for any D-valued, continuous semimartingale Z.

Proof: Writing f(x + iy) = g(x, y) + ih(x, y) for any x + iy ∈ D, we get

g'_1 + ih'_1 = f',    g'_2 + ih'_2 = if',

and so by iteration

g''_{11} + ih''_{11} = f'',    g''_{12} + ih''_{12} = if'',    g''_{22} + ih''_{22} = −f''.

Equation (10) now follows for Z = X + iY, as we apply Corollary 17.19 to the semimartingale (X, Y) and the functions g and h. □

We also consider a modification of the Itô integral that does obey the rules of ordinary calculus. Assuming both X and Y to be continuous semimartingales, we define the Fisk–Stratonovich integral by

∫_0^t X ∘ dY = (X·Y)_t + ½[X, Y]_t,    t ≥ 0,    (13)

or in differential form X ∘ dY = X dY + ½ d[X, Y], where the first term on the right is an ordinary Itô integral.
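The difference between the two integrals is easy to see numerically. The sketch below (an added illustration, using only the Python standard library) compares the left-point (Itô) and trapezoidal (Fisk–Stratonovich) Riemann sums for ∫B dB: the trapezoidal sums telescope exactly to B_1^2/2, as ordinary calculus predicts, while the Itô sums fall short by half the quadratic variation, in accordance with (13).

```python
import random

random.seed(2)

n = 100_000
dt = 1.0 / n
b = [0.0]
for _ in range(n):
    b.append(b[-1] + random.gauss(0.0, dt ** 0.5))
db = [b[k] - b[k - 1] for k in range(1, n + 1)]

ito = sum(b[k - 1] * db[k - 1] for k in range(1, n + 1))                   # left-point rule
strat = sum(0.5 * (b[k - 1] + b[k]) * db[k - 1] for k in range(1, n + 1))  # trapezoidal rule
qv = sum(d * d for d in db)

# trapezoidal sums telescope exactly: sum of (B_k^2 - B_{k-1}^2)/2 = B_1^2 / 2
assert abs(strat - 0.5 * b[-1] ** 2) < 1e-8
# the two integrals differ by half the quadratic variation, as in (13)
assert abs(strat - (ito + 0.5 * qv)) < 1e-8
assert abs(qv - 1.0) < 0.05
```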

Corollary 17.21 (modified substitution rule, Fisk, Stratonovich) For any continuous semimartingale X in ℝ^d and function f ∈ C^3(ℝ^d), we have a.s.

f(X_t) = f(X_0) + ∫_0^t f'_i(X) ∘ dX^i,    t ≥ 0.

Proof: By Itô's formula, f'_i(X) = f'_i(X_0) + f''_{ij}(X)·X^j + ½f'''_{ijk}(X)·[X^j, X^k]. Using Itô's formula again, together with (6) and (13), we get

∫ f'_i(X) ∘ dX^i = f'_i(X)·X^i + ½[f'_i(X), X^i]
  = f'_i(X)·X^i + ½f''_{ij}(X)·[X^i, X^j] = f(X) − f(X_0). □

Unfortunately, the more convenient substitution rule of Corollary 17.21 comes at a high price: the new integral does not preserve the martingale property, and it requires even the integrand to be a continuous semimartingale. It is the latter restriction that forces us to impose stronger regularity conditions on the function f in the substitution rule.

Our next task is to establish a basic uniqueness property, justifying our reference to the process V·M in Theorem 17.11 as an integral.


Theorem 17.22 (uniqueness) The integral V·M in Theorem 17.11 is the a.s. unique linear extension of the elementary stochastic integral such that, for any t > 0, the convergence (V_n^2 · [M])_t →^P 0 implies (V_n · M)*_t →^P 0.

The statement follows immediately from Lemmas 17.10 and 17.12, together with the following approximation of progressive processes by predictable step processes.

Lemma 17.23 (approximation) For any continuous semimartingale X = M + A and process V ∈ L(X), there exist some processes V_1, V_2, … ∈ E such that a.s. ((V_n − V)^2 · [M])_t → 0 and ((V_n − V)·A)*_t → 0 for every t > 0.

Proof: It is enough to take t = 1, since we can then combine the processes V_n for disjoint finite intervals to construct an approximating sequence on ℝ_+. Furthermore, it suffices to consider approximations in the sense of convergence in probability, since the a.s. versions will then follow for a suitable subsequence. This allows us to perform the construction in steps, first approximating V by bounded and progressive processes V', next approximating each V' by continuous and adapted processes V'', and finally approximating each V'' by predictable step processes V'''. Here the first and last steps are elementary, so we may concentrate on the second step.

Then let V be bounded. We need to construct some continuous, adapted processes V_n such that ((V_n − V)^2 · [M])_1 → 0 and ((V_n − V)·A)*_1 → 0 a.s. Since the V_n can be taken to be uniformly bounded, we may replace the former condition by (|V_n − V| · [M])_1 → 0 a.s. Thus, it is enough to establish the approximation (|V_n − V| · A)_1 → 0 in the case when A is a nondecreasing, continuous, adapted process with A_0 = 0. Replacing A_t by A_t + t if necessary, we may even assume that A is strictly increasing. To construct the required approximations, we may introduce the inverse process τ_s = sup{t ≥ 0; A_t ≤ s}, and define

V^h_t = h^{-1} ∫_{τ(A_t − h)}^{t} V dA = h^{-1} ∫_{(A_t − h)^+}^{A_t} V(τ_s) ds,    t, h > 0.

By Theorem 2.15 we have V^h ∘ τ → V ∘ τ as h → 0, a.e. on [0, A_1]. Thus, by dominated convergence,

∫_0^1 |V^h − V| dA = ∫_0^{A_1} |V^h(τ_s) − V(τ_s)| ds → 0.

The processes V^h are clearly continuous. To prove that they are also adapted, we note that the process τ(A_t − h) is adapted for every h > 0 by the definition of τ. Since V is progressive, it is further seen that V·A is adapted and hence progressive. The adaptedness of (V·A)_{τ(A_· − h)} now follows by composition. □

Though the class L(X) of stochastic integrands is sufficient for most purposes, it is sometimes useful to allow the integration of slightly more


general processes. Given any continuous semimartingale X = M + A, let L̄(X) denote the class of product-measurable processes V such that (V − Ṽ)^2 · [M] = 0 and (V − Ṽ)·A = 0 a.s. for some process Ṽ ∈ L(X). For V ∈ L̄(X) we define V·X = Ṽ·X a.s. The extension clearly enjoys all the previously established properties of stochastic integration.

It is often important to see how semimartingales, covariation processes, and stochastic integrals are transformed by a random time-change. Let us then consider a nondecreasing, right-continuous family of finite optional times τ_s, s ≥ 0, here referred to as a finite random time-change τ. If F is right-continuous, then by Lemma 7.3 the same thing is true for the induced filtration G_s = F_{τ_s}, s ≥ 0. A process X is said to be τ-continuous if it is a.s. continuous on ℝ_+ and constant on every interval [τ_{s−}, τ_s], s ≥ 0, where τ_{0−} = X_{0−} = 0 by convention.

Theorem 17.24 (random time-change, Kazamaki) Let τ be a finite random time-change with induced filtration G, and let X = M + A be a τ-continuous F-semimartingale. Then X ∘ τ is a continuous G-semimartingale with canonical decomposition M ∘ τ + A ∘ τ and such that [X ∘ τ] = [X] ∘ τ a.s. Furthermore, V ∈ L(X) implies V ∘ τ ∈ L̄(X ∘ τ) and

(V ∘ τ)·(X ∘ τ) = (V·X) ∘ τ a.s.    (14)

Proof: It is easy to check that the time-change X ↦ X ∘ τ preserves continuity, adaptedness, monotonicity, and the local martingale property. In particular, X ∘ τ is then a continuous G-semimartingale with canonical decomposition M ∘ τ + A ∘ τ. Since M^2 − [M] is a continuous local martingale, the same thing is true for the time-changed process M^2 ∘ τ − [M] ∘ τ, and so

[X ∘ τ] = [M ∘ τ] = [M] ∘ τ = [X] ∘ τ a.s.

If V ∈ L(X), we also note that V ∘ τ is product-measurable, since this is true for both V and τ. Fixing any t ≥ 0 and using the τ-continuity of X, we get

(1_{[0,t]} ∘ τ)·(X ∘ τ) = 1_{[0, τ_t^{-1}]}·(X ∘ τ) = (X ∘ τ)^{τ_t^{-1}} = (1_{[0,t]}·X) ∘ τ,

which proves (14) when V = 1_{[0,t]}. If X has locally finite variation, the result extends by a monotone class argument and monotone convergence to arbitrary V ∈ L(X). In general, Lemma 17.23 yields the existence of some continuous, adapted processes V_1, V_2, … such that ((V_n − V)^2 · [M])_t → 0 and ((V_n − V)·A)*_t → 0 a.s. By (14) the corresponding properties hold for the time-changed processes, and since the processes V_n ∘ τ are right-continuous and adapted, hence progressive, we obtain V ∘ τ ∈ L̄(X ∘ τ). Now assume instead that the approximating processes V_1, V_2, … are predictable step processes. The previous calculation then shows that (14) holds for each V_n, and by Lemma 17.12 the relation extends to V. □
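Relation (14) and the identity [X ∘ τ] = [X] ∘ τ can be illustrated with the deterministic time change τ_s = 2s (a hypothetical example added here for illustration, using only the Python standard library): the time-changed path visits the same points, so covariations and integrals transform simply by relabeling time.

```python
import random

random.seed(3)

n = 200_000                 # grid points on [0, 2]
dt = 2.0 / n
b = [0.0]
for _ in range(n):
    b.append(b[-1] + random.gauss(0.0, dt ** 0.5))

# deterministic time change tau_s = 2s; on the grid s_j = j/n, the process
# (B o tau)_s = B_{2s} on [0, 1] visits exactly the path points b[0..n], so its
# squared increments are B's squared increments over [0, 2]
qv_time_changed = sum((b[k] - b[k - 1]) ** 2 for k in range(1, n + 1))
assert abs(qv_time_changed - 2.0) < 0.1   # [B o tau]_1 = [B]_{tau_1} = [B]_2 = 2

# relation (14) with V_t = t: (V o tau).(B o tau) at s = 1 versus (V.B) o tau at 1
lhs = sum(2.0 * (k - 1) / n * (b[k] - b[k - 1]) for k in range(1, n + 1))
rhs = sum((k - 1) * dt * (b[k] - b[k - 1]) for k in range(1, n + 1))
assert abs(lhs - rhs) < 1e-9
```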


Let us next consider stochastic integrals of processes depending on a parameter. Given any measurable space (S, 𝒮), we say that a process V on S × ℝ_+ is progressive if its restriction to S × [0, t] is 𝒮 ⊗ B_t ⊗ F_t-measurable for every t ≥ 0, where B_t = B([0, t]). A simple version of the following result will be useful in Chapter 18.

Theorem 17.25 (dependence on parameter, Doléans, Stricker and Yor) Let X be a continuous semimartingale, fix a measurable space S, and consider a progressive process V_s(t), s ∈ S, t ≥ 0, such that V_s ∈ L(X) for every s ∈ S. Then the process Y_s(t) = (V_s·X)_t has a version that is progressive on S × ℝ_+ and a.s. continuous for each s ∈ S.

Proof: Let M + A be the canonical decomposition of X. Assume the existence of some progressive processes V^n on S × ℝ_+ such that, for any t ≥ 0 and s ∈ S,

((V^n_s − V_s)^2 · [M])_t + ((V^n_s − V_s)·A)*_t → 0.

Then Lemma 17.12 yields (V^n_s·X − V_s·X)*_t →^P 0 for every s and t. Proceeding as in the proof of Proposition 4.31, we may choose a subsequence (n_k(s)) ⊂ ℕ, depending measurably on s, such that the same convergence holds a.s. along (n_k(s)) for any s and t. Define Y_{s,t} = limsup_k (V^{n_k}_s·X)_t whenever this is finite, and put Y_{s,t} = 0 otherwise. If we can choose versions of the processes (V^n_s·X)_t that are progressive on S × ℝ_+ and a.s. continuous for each s, then Y_{s,t} is clearly a version of the process (V_s·X)_t with the same properties.

This argument will now be applied in three steps. First we reduce to the case of bounded and progressive integrands by taking V^n = V 1{|V| ≤ n}. Next we apply the transformation in the proof of Lemma 17.23, to reduce to the case of continuous and progressive integrands. In the final step, we approximate any continuous, progressive process V by the predictable step processes V^n_s(t) = V_s(2^{-n}[2^n t]). Here the integrals V^n_s·X are elementary, and the desired continuity and measurability are obvious by inspection. □

We turn to the related topic of functional representations. To motivate the problem, note that the construction of the stochastic integral V·X depends in a subtle way on the underlying probability measure P and filtration F. Thus, we cannot expect any universal representation F(V, X) of the integral process V·X. In view of Proposition 4.31, one might still hope for a modified representation F(μ, V, X), where μ denotes the distribution of (V, X). Even this could be too optimistic, however, since the canonical decomposition of X may also depend on F.

Dictated by our needs in Chapter 21, we restrict our attention to a very special situation, which is still general enough to cover most applications of interest. Fixing any progressive functions σ^i_j and b^i of suitable dimension, defined on the path space C(ℝ_+, ℝ^d), we may consider an arbitrary adapted process X satisfying the stochastic differential equation

dX^i_t = σ^i_j(t, X) dB^j_t + b^i(t, X) dt,    (15)


where B is a Brownian motion in ℝ^r. A detailed discussion of such equations is given in Chapter 21. For the moment, we need only the simple fact from Lemma 21.1 that the coefficients σ^i_j(t, X) and b^i(t, X) are again progressive. Write a^{ij} = σ^i_k σ^j_k.

Proposition 17.26 (functional representation) For any progressive functions σ, b, and f of suitable dimension, there exists a measurable mapping

F: P(C(ℝ_+, ℝ^d)) × C(ℝ_+, ℝ^d) → C(ℝ_+, ℝ)    (16)

such that, whenever X is a solution to (15) with L(X) = μ and f_i(X) ∈ L(X^i) for all i, we have f_i(X)·X^i = F(μ, X) a.s.

Proof: From (15) we note that X is a semimartingale with covariation processes [X^i, X^j] = a^{ij}(X)·λ and drift components b^i(X)·λ. Hence, f_i(X) ∈ L(X^i) for all i iff the processes (f_i)^2 a^{ii}(X) and f_i b^i(X) are a.s. Lebesgue integrable. Note that this holds in particular when f is bounded. Now assume that f_1, f_2, … are progressive with

((f^n_i − f_i)^2 a^{ii}(X)·λ)_t + ((f^n_i − f_i) b^i(X)·λ)*_t → 0,    t ≥ 0.    (17)

Then (f^n_i(X)·X^i − f_i(X)·X^i)*_t →^P 0 for every t ≥ 0 by Lemma 17.12. Thus, if f^n_i(X)·X^i = F_n(μ, X) a.s. for some measurable mappings F_n as in (16), then Proposition 4.31 yields a similar representation for the limit f_i(X)·X^i. As in the preceding proof, we may apply this argument in three steps, reducing first to the case when f is bounded, next to the case of continuous f, and finally to the case when f is a predictable step function. Here the first and last steps are again elementary. For the second step, we may now use the simpler approximation

f_n(t, x) = n ∫_{(t − 1/n)^+}^{t} f(s, x) ds.

By Theorem 2.15 we have f_n(t, x) → f(t, x) a.e. in t for each x ∈ C(ℝ_+, ℝ^d), and (17) follows by dominated convergence. □

Exercises

1. Show that if M is a local martingale and ξ is an F_0-measurable random variable, then the process N_t = ξM_t is again a local martingale.

2. Use Fatou's lemma to show that every local martingale M ≥ 0 with EM_0 < ∞ is a supermartingale. Also show by an example that M may fail to be a martingale. (Hint: Let M_t = X_{t/(1−t)^+}, where X is a Brownian motion starting at 1, stopped when it reaches 0.)


3. Fix a continuous local martingale M. Show that M and [M] have a.s. the same intervals of constancy. (Hint: For any r ∈ ℚ_+, put τ = inf{t > r; [M]_t > [M]_r}. Then M^τ is a continuous local martingale on [r, ∞) with quadratic variation 0, so M^τ is a.s. constant on [r, τ]. Use a similar argument in the other direction.)

4. For any continuous local martingales M_n starting at 0 and associated optional times τ_n, show that (M_n)*_{τ_n} →^P 0 iff [M_n]_{τ_n} →^P 0. State the corresponding result for stochastic integrals.

5. Show that there exist some continuous semimartingales X^1, X^2, … such that (X^n)* →^P 0 and yet [X^n]_t ↛^P 0 for all t > 0. (Hint: Let B be a Brownian motion stopped at time 1, put A^n_{k2^{-n}} = B_{(k−1)^+ 2^{-n}}, and interpolate linearly. Define X^n = B − A^n.)

6. Consider a Brownian motion B and an optional time τ. Show that EB_τ = 0 when Eτ^{1/2} < ∞ and that EB^2_τ = Eτ when Eτ < ∞. (Hint: Use optional sampling and Theorem 17.7.)

7. Deduce the first inequality in Proposition 17.9 from Proposition 17.17 and the classical Cauchy–Buniakovsky inequality.

8. Prove for any continuous semimartingales X and Y that [X + Y]^{1/2} ≤ [X]^{1/2} + [Y]^{1/2} a.s.

9. (Kunita and Watanabe) Let M and N be continuous local martingales, and fix any p, q, r > 0 with p^{-1} + q^{-1} = r^{-1}. Show that ‖[M, N]_t‖^2_{2r} ≤ ‖[M]_t‖_p ‖[N]_t‖_q for all t > 0.

10. Let M, N be continuous local martingales with M_0 = N_0 = 0. Show that M ⊥⊥ N implies [M, N] = 0 a.s. Also show by an example that the converse is false. (Hint: Let M = U·B and N = V·B for a Brownian motion B and suitable U, V ∈ L(B).)

11. Fix a continuous semimartingale X, and let U, V ∈ L(X) with U = V a.s. on some set A ∈ F_0. Show that U·X = V·X a.s. on A. (Hint: Use Proposition 17.15.)

12. Fix a continuous local martingale M, and let U, U_1, U_2, … and V, V_1, V_2, … ∈ L(M) with |U_n| ≤ V_n, U_n → U, V_n → V, and ((V_n − V)·M)*_t →^P 0 for all t > 0. Show that (U_n·M)_t →^P (U·M)_t for all t. (Hint: Write (U_n − U)^2 ≤ 2(V_n − V)^2 + 8V^2, and use Theorem 1.21 and Lemmas 4.2 and 17.12.)

13. Let B be a Brownian bridge. Show that X_t = B_{t∧1} is a semimartingale on ℝ_+ w.r.t. the induced filtration. (Hint: Note that M_t = (1 − t)^{-1}B_t is a martingale on [0, 1), integrate by parts, and check that the compensator has finite variation.)

14. Show by an example that the canonical decomposition of a continuous semimartingale may depend on the filtration. (Hint: Let B be Brownian motion with induced filtration F, put G_t = F_t ∨ σ(B_1), and use the preceding result.)


15. Show by stochastic calculus that t^{-p}B_t → 0 a.s. as t → ∞, where B is a Brownian motion and p > ½. (Hint: Integrate by parts to find the canonical decomposition. Compare with the L^1-limit.)

16. Extend Theorem 17.16 to a product of n semimartingales.

17. Consider a Brownian bridge X and a bounded, progressive process V with ∫_0^1 V_t dt = 0 a.s. Show that E∫_0^1 V dX = 0. (Hint: Integrate by parts to get ∫_0^1 V dX = ∫_0^1 (V − U) dB, where B is a Brownian motion and U_t = (1 − t)^{-1} ∫_t^1 V_s ds.)

18. Show that Proposition 17.17 remains valid for any finite optional times t and t_{nk} satisfying max_k (t_{nk} − t_{n,k−1}) →^P 0.

19. Let M be a continuous local martingale. Find the canonical decomposition of |M|^p when p ≥ 2, and deduce for such a p the second relation in Theorem 17.7. (Hint: Use Theorem 17.18. For the last part, use Hölder's inequality.)

20. Let M be a continuous local martingale with M_0 = 0 and [M]_∞ ≤ 1. Show for any r ≥ 0 that P{sup_t M_t ≥ r} ≤ e^{−r²/2}. (Hint: Consider the supermartingale Z = exp(cM − c^2[M]/2) for a suitable c > 0.)

21. Let X and Y be continuous semimartingales. Fix a t > 0 and a sequence of partitions (t_{nk}) of [0, t] with max_k (t_{nk} − t_{n,k−1}) → 0. Show that

½ Σ_k (Y_{t_{nk}} + Y_{t_{n,k−1}})(X_{t_{nk}} − X_{t_{n,k−1}}) →^P (Y ∘ X)_t.

(Hint: Use Corollary 17.13 and Proposition 17.17.)

22. Show that the Fisk–Stratonovich integral satisfies the chain rule U ∘ (V ∘ X) = (UV) ∘ X. (Hint: Reduce to Itô integrals and use Theorems 17.11 and 17.16 and Proposition 17.14.)

23. A process is predictable if it is measurable with respect to the σ-field in ℝ_+ × Ω induced by all predictable step processes. Show that every predictable process is progressive. Conversely, given a progressive process X and a constant h > 0, show that the process Y_t = X_{(t−h)^+} is predictable.

24. Given a progressive process V and a nondecreasing, continuous, adapted process A, show that there exists some predictable process Ṽ with |V − Ṽ|·A = 0 a.s. (Hint: Use Lemma 17.23.)

25. Given the preceding statement, deduce Lemma 17.23. (Hint: Begin with predictable V, using a monotone class argument.)

26. Construct the stochastic integral V·M by approximation from elementary integrals, using Lemmas 17.10 and 17.23. Show that the resulting integral satisfies the relation in Theorem 17.11. (Hint: First let M ∈ M^2 and E(V^2·[M])_∞ < ∞, and extend by localization.)

27. Let (V, B) =^d (Ṽ, B̃), where B and B̃ are Brownian motions on possibly different filtered probability spaces and V ∈ L(B), Ṽ ∈ L(B̃). Show that (V, B, V·B) =^d (Ṽ, B̃, Ṽ·B̃). (Hint: Argue as in the proof of Proposition 17.26.)


28. Let X be a continuous F-semimartingale. Show that X remains a semimartingale conditionally on F_0, and that the conditional quadratic variation agrees with [X]. Also show that if V ∈ L(X), where V = a(Y) for some continuous process Y and measurable function a, then V remains conditionally X-integrable, and the conditional integral agrees with V·X. (Hint: Conditioning on F_0 preserves martingales.)

Chapter 18

Continuous Martingales and Brownian Motion

Real and complex exponential martingales; martingale characterization of Brownian motion; random time-change of martingales; integral representation of martingales; iterated and multiple integrals; change of measure and Girsanov's theorem; Cameron–Martin theorem; Wald's identity and Novikov's condition

This chapter deals with a wide range of applications of the stochastic calculus, the principal tools of which were introduced in the preceding chapter. A recurrent theme is the notion of exponential martingales, which appear in both a real and a complex variety. Exploring the latter yields an effortless approach to Lévy's celebrated martingale characterization of Brownian motion, as well as to the basic random time-change reduction of isotropic continuous local martingales to a Brownian motion. By applying the latter result to suitable compositions of Brownian motion with harmonic or analytic functions, we may deduce some important information about Brownian motion in ℝ^d. Similar methods can be used to analyze a variety of other transformations that lead to Gaussian processes.

As a further application of the exponential martingales, we shall derive stochastic integral representations of Brownian functionals and martingales and examine their relationship to the chaos expansions obtained by different methods in Chapter 13. In this context, we show how the previously introduced multiple Wiener–Itô integrals can be expressed as iterated single Itô integrals. A similar problem, of crucial importance for Chapter 21, is to represent a continuous local martingale with absolutely continuous covariation processes in terms of stochastic integrals with respect to a suitable Brownian motion.

Our last main topic is to examine the transformations induced by an absolutely continuous change of probability measure. The density process turns out to be a real exponential martingale, and any continuous local martingale in the original setting will remain a martingale under the new measure, apart from an additional drift term. The observation is useful for applications, where it is often employed to remove the drift from a given semimartingale. The appropriate change of measure then depends on the


process, and it becomes important to derive effective criteria for a proposed exponential process to be a true martingale.

Our present exposition may be regarded as a continuation of the discussion of martingales and Brownian motion from Chapters 7 and 13, respectively. Changes of time and measure are both important for the theory of stochastic differential equations, as developed in Chapters 21 and 23. The time-change results for continuous martingales have a counterpart for point processes explored in Chapter 25, where general Poisson processes play a role similar to that of the Gaussian processes here. The results about changes of measure are extended in Chapter 26 to the context of possibly discontinuous semimartingales.

To elaborate on the new ideas, we begin with an introduction of complex exponential martingales. It is instructive to compare them with the real versions appearing in Lemma 18.21.

Lemma 18.1 (complex exponential martingales) Let M be a real continuous local martingale with M_0 = 0. Then

Z_t = exp(iM_t + ½[M]_t),    t ≥ 0,

is a complex local martingale satisfying Z_t = 1 + i(Z·M)_t a.s.

Proof: Applying Corollary 17.20 to the complex-valued semimartingale X_t = iM_t + ½[M]_t and the entire function f(z) = e^z, we get

dZ_t = Z_t(dX_t + ½d[X]_t) = Z_t(i dM_t + ½d[M]_t − ½d[M]_t) = iZ_t dM_t. □
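When [M] is bounded, Z is a bounded, hence uniformly integrable, martingale, so EZ_t = 1. For M = B a standard Brownian motion this means e^{t/2} E cos B_t = 1 and E sin B_t = 0, which the following Monte Carlo sketch checks (an added illustration, not from the text, using only the Python standard library).

```python
import math
import random

random.seed(4)

n = 200_000
t = 1.0
re_sum = 0.0
im_sum = 0.0
for _ in range(n):
    bt = random.gauss(0.0, math.sqrt(t))          # B_t ~ N(0, t)
    # Z_t = exp(i B_t + t/2) = e^{t/2} (cos B_t + i sin B_t)
    re_sum += math.exp(t / 2.0) * math.cos(bt)
    im_sum += math.exp(t / 2.0) * math.sin(bt)

re_mean = re_sum / n
im_mean = im_sum / n
assert abs(re_mean - 1.0) < 0.02    # E Z_t = Z_0 = 1
assert abs(im_mean) < 0.02
```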

The next result gives the basic connection between continuous martingales and Gaussian processes. For any subset K of a Hilbert space, we write K̄ for the closed linear subspace generated by K.

Lemma 18.2 (isometries and Gaussian processes) Given a subset K of a Hilbert space H, consider for each h ∈ K a continuous local F-martingale M^h with M^h_0 = 0 such that

[M^h, M^k]_∞ = ⟨h, k⟩ a.s.,    h, k ∈ K.    (1)

Then there exists an isonormal Gaussian process η ⊥⊥ F_0 on K̄ such that M^h_∞ = ηh a.s. for all h ∈ K.

Proof: Fix any linear combination N_t = u_1 M^{h_1}_t + ⋯ + u_n M^{h_n}_t, and conclude from (1) that

[N]_∞ = Σ_{j,k} u_j u_k [M^{h_j}, M^{h_k}]_∞ = Σ_{j,k} u_j u_k ⟨h_j, h_k⟩ = ‖h‖^2,

where h = u_1 h_1 + ⋯ + u_n h_n. The process Z = exp(iN + ½[N]) is a.s. bounded, and so by Lemma 18.1 it is a uniformly integrable martingale. Writing ξ = N_∞, we hence obtain for any A ∈ F_0

PA = E[Z_∞; A] = E[exp(iN_∞ + ½[N]_∞); A] = E[e^{iξ}; A] e^{‖h‖^2/2}.

Since u_1, …, u_n were arbitrary, we conclude from the uniqueness theorem for characteristic functions that the random vector (M^{h_1}_∞, …, M^{h_n}_∞) is independent of F_0 and centered Gaussian with covariances ⟨h_j, h_k⟩. It is now easy to construct a process η with the stated properties. □

As a first application, we may establish the following basic martingale characterization of Brownian motion.

Theorem 18.3 (characterization of Brownian motion, Lévy) Let B = (B^1, …, B^d) be a process in ℝ^d with B_0 = 0. Then B is an F-Brownian motion iff it is a continuous local F-martingale with [B^i, B^j]_t ≡ δ_{ij}t a.s.

Proof: For fixed s < t, we may apply Lemma 18.2 to the continuous local martingales M^i_r = B^i_{r∧t} − B^i_{r∧s}, r ≥ s, i = 1, …, d, to see that the differences B^i_t − B^i_s are i.i.d. N(0, t − s) and independent of F_s. □

The last theorem suggests the possibility of transforming an arbitrary continuous local martingale M into a Brownian motion through a suitable random time-change. The proposed result is indeed true and admits a natural extension to higher dimensions; for convenience, we consider directly the version in ℝ^d. A continuous local martingale M = (M^1, …, M^d) is said to be isotropic if a.s. [M^i] = [M^j] and [M^i, M^j] = 0 for all i ≠ j. Note in particular that this holds for Brownian motion in ℝ^d. When M is a continuous local martingale in ℂ, the condition is clearly equivalent to [M] = 0 a.s., or [ℜM] = [ℑM] and [ℜM, ℑM] = 0 a.s. For isotropic processes M, we refer to [M^1] = ⋯ = [M^d] or [ℜM] = [ℑM] as the rate process of M.

The proof is straightforward when [M]_∞ = ∞ a.s., but in general it requires a rather subtle extension of the filtered probability space. To simplify our statements, we assume the existence of any requested randomization variables. This can always be achieved, as in the elementary context of Chapter 6, by passing from the original setup (Ω, A, F, P) to the product space (Ω̄, Ā, F̄, P̄), where Ω̄ = Ω × [0, 1], Ā = A ⊗ B, F̄_t = F_t × [0, 1], and P̄ = P ⊗ λ. Given two filtrations F and G on Ω, we say that G is a standard extension of F if F_t ⊂ G_t ⊥⊥_{F_t} F for all t ≥ 0. This is precisely the condition needed to ensure that all adaptedness and conditioning properties will be preserved. The notion is still flexible enough to admit a variety of useful constructions.

Theorem 18.4 (time-change reduction, Dambis, Dubins and Schwarz) Let M be an isotropic continuous local F-martingale in ℝ^d with M_0 = 0, and define

τ_s = inf{t ≥ 0; [M^1]_t > s},    G_s = F_{τ_s},    s ≥ 0.

Then there exists in ℝ^d a Brownian motion B with respect to a standard extension of G, such that a.s. B = M ∘ τ on [0, [M^1]_∞) and M = B ∘ [M^1].

Proof: We may take d = 1, the proof in higher dimensions being similar. Introduce a Brownian motion X ⊥⊥ F with induced filtration X, and put Ḡ_t = G_t ∨ X_t. Since G ⊥⊥ X, it is clear that Ḡ is a standard extension of both


G and X. In particular, X remains a Brownian motion under Ḡ. Now define

B_s = M_{τ_s} + X_s − X_{s∧[M]_∞},    s ≥ 0.    (2)

Since M is τ-continuous by Proposition 17.6, Theorem 17.24 shows that the first term M ∘ τ is a continuous G-martingale, hence also a Ḡ-martingale, with quadratic variation

[M ∘ τ]_s = [M]_{τ_s} = s ∧ [M]_∞,    s ≥ 0.

The second term in (2) has quadratic variation s − s ∧ [M]_∞, and the covariation vanishes since M ∘ τ ⊥⊥ X. Thus, [B]_s ≡ s a.s., and so Theorem 18.3 shows that B is a Ḡ-Brownian motion. Finally, B_s = M_{τ_s} for s < [M]_∞, which implies M = B ∘ [M] a.s. by the τ-continuity of M. □

In two dimensions, isotropic martingales arise naturally through the composition of a complex Brownian motion B with an arbitrary (possibly multi-valued) analytic function f. For a general continuous process X, we may clearly choose a continuous evolution of f(X), as long as X avoids the possible singularities of f. Similar results are available for harmonic functions, which is especially useful in dimensions d ≥ 3, when no analytic functions exist.

The second term in (2) has quadratic variation s - s 1\ [M] 00 , and the covariation vanishes since Mo r llX. Thus, [B]s = s a.s., and so Theorem 18.3 shows that Bis a g-Brownian motion. Finally, B 8 =MT. for s < [M] 00 , which implies M = B o [M] a.s. by the r-continuity of M. D In two dimensions, isotropic martingales arise naturally through the composition of a complex Brownian motion B with an arbitrary (possibly multi-valued) analytic function f. Forageneral continuous process X, we may clearly choose a continuous evolution of f(X), as long as X avoids the possible singularities of f. Similar results are available for harmonic functions, which is especially useful in dimensions d 2: 3, when no analytic functions exist. Theorem 18.5 {harmonic and analytic maps, Levy)

(i) Let M be an isotropic, continuous local martingale in JR.d, and fix an harmonic function f such that M a.s. avoids the sigularities of f. Then f(M) is a local martingale with [f(M)] = IVJ(MW · [M 1]. (ii) Let M be a complex, isotropic, continuous local martingale, and fix an analytic function f suchthat M a.s. avoids the singularities of f. Then f(M) is again an isotropic local martingale, and [~f(M)] = if'(MW · [~M]. If Bis a Brownian motion and f' '# 0, then [~f(B)] is a.s. unbounded and strictly increasing. Proof: (i) Using the isotropy of M, we get by Corollary 17.19 f(M) =!(Mo)+ ff ·Mi+ ~!::J.f(M) · [M 1].

Here the last term vanishes since f is harmonic, and so f(M) is a local martingale. From the isotropy of M we further obtain

(ii) Since f is analytic, we get by Corollary 17.20 f(M) =!(Mo)+ f'(M) · M

+ V"(M) · [M].

(3)

Here the last term vanishes since M is isotropic. The same property also yields

[f(M)] = [j'(M) · M] = (f'(M)) 2 · [M] = 0,

354

Foundations of Modern Probability

and so f(M) is again isotropic. Finally, writing M =X +iY and f'(M) U +iV, we get [~f(M)] = [U ·X- V· Y] = (U 2

=

+ V 2 ) ·[X]= lf'(MW · [~M].

If f' is not identically 0, it has at most countably many zeros. Hence, by Fubini's theorem

Eλ{t ≥ 0; f'(B_t) = 0} = ∫_0^∞ P{f'(B_t) = 0} dt = 0,

and so [ℜf(B)] = |f'(B)|^2 · λ is a.s. strictly increasing. To see that it is also a.s. unbounded, we note that f(B) converges a.s. on the set {[ℜf(B)]_∞ < ∞}. However, f(B) diverges a.s., since f is nonconstant and the random walk B_0, B_1, … is recurrent by Theorem 9.2. □

Combining the last two results, we may derive two basic properties of Brownian motion in ℝ^d, namely the polarity of singleton sets when d ≥ 2 and the transience when d ≥ 3. Note that the latter property is a continuous-time counterpart of Theorem 9.8 for random walks. Both properties play important roles for the potential theory developed in Chapter 24. Define τ_a = inf{t > 0; B_t = a}.

Theorem 18.6 (point polarity and transience, Lévy, Kakutani) For a Brownian motion B in ℝ^d, we have the following:

(i) If d ≥ 2, then τ_a = ∞ a.s. for all a ∈ ℝ^d.
(ii) If d ≥ 3, then |B_t| → ∞ a.s. as t → ∞.

Proof: (i) Here we may clearly take d = 2, so we may let B be a complex Brownian motion. Applying Theorem 18.5 (ii) to the entire function e^z, it is seen that M = e^B is a conformal local martingale with unbounded rate [ℑM]. By Theorem 18.4 we have M − 1 = X ∘ [ℑM] a.s. for some Brownian motion X, and since M ≠ 0 it follows that X a.s. avoids −1. Hence, τ_{−1} = ∞ a.s., and by the scaling and rotational symmetries of B we get τ_a = ∞ a.s. for every a ≠ 0. To extend the result to a = 0, we may conclude from the Markov property at h > 0 that

P_0{τ_0 ∘ θ_h < ∞} = E_0 P_{B_h}{τ_0 < ∞} = 0,    h > 0.

As h → 0, we get P_0{τ_0 < ∞} = 0, and so τ_0 = ∞ a.s.

(ii) Here we may take d = 3. For any a ≠ 0 we have τ_a = ∞ a.s. by claim (i), and so by Theorem 18.5 (i) the process M = |B − a|^{-1} is a continuous local martingale. By Fatou's lemma, M is then an L^1-bounded supermartingale, and so by Theorem 7.18 it converges a.s. toward some random variable ξ. Since M_t →^P 0, we have ξ = 0 a.s. □

Combining part (i) of the last result with Theorem 19.11, we note that a complex, isotropic continuous local martingale avoids every fixed point outside the origin. Thus, Theorem 18.5 (ii) applies to any analytic function f with only isolated singularities. Since f is allowed to be multi-valued,


the result applies even to functions with essential singularities, such as f(z) = log(1 + z). For a simple application, we may consider the windings of planar Brownian motion around a fixed point.

Corollary 18.7 (skew-product representation, Galmarino) Let B be a complex Brownian motion starting at 1, and choose a continuous version of V = arg B with V_0 = 0. Then V_t = Y ∘ (|B|^{-2} · λ)_t a.s. for some real Brownian motion Y ⊥⊥ |B|.

Proof: Applying Theorem 18.5 (ii) with f(z) = log(1 + z), we note that M_t = log|B_t| + iV_t is an isotropic martingale with rate [ℑM] = |B|^{-2} · λ. Hence, by Theorem 18.4 there exists some complex Brownian motion Z = X + iY with M = Z ∘ [ℑM] a.s., and the assertion follows. □
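Returning to Theorem 18.6 (ii), the supermartingale argument there can be probed by Monte Carlo (an added sketch; the point a = (1, 0, 0) and the sample sizes are arbitrary choices, and only the Python standard library is used): E|B_t − a|^{-1} starts at M_0 = 1, stays below 1, and decreases in t as the process escapes to infinity.

```python
import math
import random

random.seed(6)

def mean_inv_dist(t, n=20_000):
    # Monte Carlo estimate of E |B_t - a|^{-1} in three dimensions, a = (1, 0, 0)
    s = 0.0
    for _ in range(n):
        x = random.gauss(0.0, math.sqrt(t)) - 1.0
        y = random.gauss(0.0, math.sqrt(t))
        z = random.gauss(0.0, math.sqrt(t))
        s += 1.0 / math.sqrt(x * x + y * y + z * z)
    return s / n

m1 = mean_inv_dist(1.0)
m4 = mean_inv_dist(4.0)
assert m1 <= 1.0     # supermartingale started at M_0 = |0 - a|^{-1} = 1
assert m4 < m1       # E M_t decreases as B escapes to infinity
```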

For a nonisotropic continuous local martingale M in ℝ^d, there is no single random time-change that will reduce the process to a Brownian motion. However, we may transform each component M^i separately, as in Theorem 18.4, to obtain a collection of one-dimensional Brownian motions B^1, …, B^d. If the latter processes happen to be independent, they may clearly be combined into a d-dimensional Brownian motion B = (B^1, …, B^d). It is remarkable that the required independence arises automatically whenever the original components M^i are strongly orthogonal, in the sense that [M^i, M^j] = 0 a.s. for all i ≠ j.

Proposition 18.8 (orthogonality and independence, Knight) Let $M^1, M^2, \ldots$ be strongly orthogonal, continuous local martingales starting at 0. Then there exist some independent Brownian motions $B^1, B^2, \ldots$ such that $M^k = B^k \circ [M^k]$ a.s. for every $k$.

Proof: When $[M^k]_\infty = \infty$ a.s. for all $k$, the result is an easy consequence of Lemma 18.2. In general, we may introduce a sequence of independent Brownian motions $X^1, X^2, \ldots \perp\!\!\!\perp \mathcal{F}$ with induced filtration $\mathcal{X}$. Define

$$B_s^k = M^k(\tau_s^k) + X^k\big((s - [M^k]_\infty)^+\big), \qquad s \ge 0,\ k \in \mathbb{N},$$

where $\tau_s^k = \inf\{t \ge 0;\ [M^k]_t > s\}$. To check that $B^1, B^2, \ldots$ have the desired joint distribution, we may clearly assume the $[M^k]$ to be bounded. Write $\psi_t = -\log(1-t)^+$, and put $\mathcal{G}_t = \mathcal{F}_{\psi_t} \vee \mathcal{X}_{(t-1)^+}$, $t \ge 0$. Then the processes $N_t^k = M^k_{\psi_t} + X^k_{(t-1)^+}$ are strongly orthogonal, continuous $\mathcal{G}$-martingales with quadratic variations $[N^k]_t = [M^k]_{\psi_t} + (t-1)^+$, and we note that $B_s^k = N^k(\sigma_s^k)$, where $\sigma_s^k = \inf\{t \ge 0;\ [N^k]_t > s\}$. The assertion now follows from the result for $[M^k]_\infty = \infty$. $\square$

Foundations of Modern Probability

As a further application of Lemma 18.2, we consider a simple continuous-time version of Theorem 11.13. Given a continuous semimartingale $X$ on $I = \mathbb{R}_+$ or $[0,1)$ and a progressive process $T$ on $I$ that takes values in $\bar I = [0,\infty]$ or $[0,1]$, respectively, we may define

$$(X \circ T^{-1})_t = \int 1\{T_s \le t\}\,dX_s, \qquad t \in I,$$

as long as the integrals on the right exist. For motivation, we note that if $\xi$ is a random measure on $I$ with "distribution function" $X_t = \xi[0,t]$, $t \in I$, then $X \circ T^{-1}$ is the distribution function of the transformed measure $\xi \circ T^{-1}$.

Proposition 18.9 (measure-preserving progressive maps) Let $B$ be a Brownian motion or bridge on $I = \mathbb{R}_+$ or $[0,1]$, respectively, and let $T$ be a progressive process on $I$ such that $\lambda \circ T^{-1} = \lambda$ a.s. Then $B \circ T^{-1} \stackrel{d}{=} B$.

Proof: The result for $I = \mathbb{R}_+$ is an immediate consequence of Lemma 18.2, and so we may assume that $B$ is a Brownian bridge on $[0,1]$. Then $M_t = B_t/(1-t)$ is a martingale on $[0,1)$, and therefore $B$ is a semimartingale on the same interval. Integrating by parts gives

$$dB_t = (1-t)\,dM_t - M_t\,dt = dX_t - M_t\,dt, \qquad (4)$$

where $X_t = \int_0^t (1-s)\,dM_s$.

Thus, $[X]_t = [B]_t = t$ a.s. for all $t$, and $X$ is a Brownian motion by Theorem 18.3. Now let $V$ be a bounded, progressive process on $[0,1]$ such that the integral $\bar V = \int_0^1 V_t\,dt$ is a.s. nonrandom. Integrating by parts, we get for any $u \in [0,1)$

$$\int_0^u V_t M_t\,dt = M_u \int_0^u V_t\,dt - \int_0^u dM_t \int_0^t V_s\,ds = \int_0^u dM_t \int_t^1 V_s\,ds - M_u \int_u^1 V_t\,dt.$$

As $u \to 1$, we have $(1-u)M_u = B_u \to 0$, and so the last term tends to 0. Hence, by dominated convergence and (4),

$$\int_0^1 V_t\,dB_t = \int_0^1 (V_t - \hat V_t)\,dX_t,$$

where $\hat V_t = (1-t)^{-1}\int_t^1 V_s\,ds$. If $U$ is another bounded, progressive process of the same kind, we get by a simple calculation

$$E \int_0^1 U_t\,dB_t \int_0^1 V_t\,dB_t = E \int_0^1 (U_t - \hat U_t)(V_t - \hat V_t)\,dt.$$

For $U_r = 1\{T_r \le s\}$ and $V_r = 1\{T_r \le t\}$, the right-hand side becomes $s \wedge t - st = E(B_s B_t)$, and the assertion follows by Lemma 18.2. $\square$

We turn to a basic representation of martingales with respect to a Brownian filtration.


Theorem 18.10 (Brownian martingales) Let $\mathcal{F}$ be the complete filtration induced by a Brownian motion $B = (B^1, \ldots, B^d)$ in $\mathbb{R}^d$. Then any local $\mathcal{F}$-martingale $M$ is a.s. continuous, and there exist some $(P \times \lambda)$-a.e. unique processes $V^1, \ldots, V^d \in L(B^1)$ such that

$$M = M_0 + \sum\nolimits_{k \le d} V^k \cdot B^k \quad \text{a.s.} \qquad (5)$$

The statement is essentially equivalent to the following representation of Brownian functionals, which we prove first.

Lemma 18.11 (Brownian functionals, Itô) Let $B = (B^1, \ldots, B^d)$ be a Brownian motion in $\mathbb{R}^d$, and fix any $B$-measurable random variable $\xi \in L^2$ with $E\xi = 0$. Then there exist some $(P \times \lambda)$-a.e. unique processes $V^1, \ldots, V^d \in L(B^1)$ such that $\xi = \sum_k (V^k \cdot B^k)_\infty$ a.s.

Proof (Dellacherie): Let $H$ denote the Hilbert space of $B$-measurable random variables $\xi \in L^2$ with $E\xi = 0$, and write $K$ for the subspace of elements $\xi$ admitting the desired representation $\sum_k (V^k \cdot B^k)_\infty$. For such a $\xi$ we get $E\xi^2 = E\sum_k ((V^k)^2 \cdot \lambda)_\infty$, which implies the asserted uniqueness. By the obvious completeness of $L(B^1)$, it is further seen from the same formula that $K$ is closed. To obtain $K = H$, we need to show that any $\xi \in H \ominus K$ vanishes a.s. Then fix any nonrandom functions $u^1, \ldots, u^d \in L^2(\mathbb{R}_+)$. Put $M = \sum_k u^k \cdot B^k$, and define the process $Z$ as in Lemma 18.1. Then $Z - 1 = iZ \cdot M = i\sum_k (Zu^k) \cdot B^k$ by Proposition 17.14, and so $\xi \perp (Z_\infty - 1)$, or $E\,\xi \exp\{i\sum_k (u^k \cdot B^k)_\infty\} = 0$. Specializing to step functions $u^k$ and using the uniqueness theorem for characteristic functions, we get

$$E[\xi;\ (B_{t_1}, \ldots, B_{t_n}) \in C] = 0, \qquad t_1, \ldots, t_n \in \mathbb{R}_+,\ C \text{ Borel},\ n \in \mathbb{N}.$$

By a monotone class argument this extends to $E[\xi;\ A] = 0$ for arbitrary $A \in \mathcal{F}_\infty$, and so $\xi = E[\xi | \mathcal{F}_\infty] = 0$ a.s. $\square$

Proof of Theorem 18.10: We may clearly take $M_0 = 0$, and by suitable localization we may assume that $M$ is uniformly integrable. Then $M_\infty$ exists in $L^1(\mathcal{F}_\infty)$ and may be approximated in $L^1$ by some random variables $\xi_1, \xi_2, \ldots \in L^2(\mathcal{F}_\infty)$. The martingales $M^n_t = E[\xi_n | \mathcal{F}_t]$ are a.s. continuous by Lemma 18.11, and by Proposition 7.15 we get, for any $\varepsilon > 0$,

$$P\{(\Delta M)^* > 2\varepsilon\} \le P\{(M^n - M)^* > \varepsilon\} \le \varepsilon^{-1} E|\xi_n - M_\infty| \to 0.$$

Hence, $(\Delta M)^* = 0$ a.s., and so $M$ is a.s. continuous. The remaining assertions now follow by localization from Lemma 18.11. $\square$

Our next theorem deals with the converse problem of finding a Brownian motion $B$ satisfying (5) when the representing processes $V^k$ are given. The result plays a crucial role in Chapter 21.
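As a hedged numerical illustration of such integral representations in one dimension (the grid, seed, and choice of functional are ours, not the book's): for $\xi = B_1^3 - 3B_1$, Itô's formula gives the explicit integrand $V_t = 3(B_t^2 - t)$, since $B_t^3 - 3tB_t$ is a martingale; the left-point Itô sum of $V$ against the increments of $B$ should then reproduce $\xi$ up to discretization error.

```python
import numpy as np

def ito_sum(V_vals, dB):
    """Left-point (Ito) Riemann sum: sum_k V_{t_k} (B_{t_{k+1}} - B_{t_k})."""
    return float(np.sum(V_vals * dB))

def representation_error(n=100_000, seed=0):
    """|sum_k V_{t_k} dB_k - xi| for xi = B_1^3 - 3 B_1, V_t = 3(B_t^2 - t)."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n
    dB = rng.normal(0.0, np.sqrt(dt), size=n)
    B = np.concatenate([[0.0], np.cumsum(dB)])
    t = np.linspace(0.0, 1.0, n + 1)
    xi = B[-1] ** 3 - 3 * B[-1]        # the target functional
    V = 3 * (B[:-1] ** 2 - t[:-1])     # integrand from Ito's formula
    return abs(ito_sum(V, dB) - xi)
```

With $10^5$ steps the discretization error is of order $\sqrt{dt}$, so the sum and the functional agree closely on a single path.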


Theorem 18.12 (integral representation, Doob) Let $M$ be a continuous local $\mathcal{F}$-martingale in $\mathbb{R}^d$ with $M_0 = 0$ such that $[M^i, M^j] = \sum_k V^i_k V^j_k \cdot \lambda$ a.s. for some $\mathcal{F}$-progressive processes $V^i_k$, $1 \le i \le d$, $1 \le k \le n$. Then there exists in $\mathbb{R}^n$ a Brownian motion $B$ with respect to a standard extension of $\mathcal{F}$ such that $M^i = \sum_k V^i_k \cdot B^k$ a.s. for all $i$.

Proof: For any $t \ge 0$, let $N_t$ and $R_t$ be the null and range spaces of the matrix $V_t$, and write $N_t^\perp$ and $R_t^\perp$ for their orthogonal complements. Denote the corresponding orthogonal projections by $\pi_{N_t}$, $\pi_{R_t}$, $\pi_{N_t^\perp}$, and $\pi_{R_t^\perp}$, respectively. Note that $V_t$ is a bijection from $N_t^\perp$ to $R_t$, and write $V_t^{-1}$ for the inverse mapping from $R_t$ to $N_t^\perp$. All these mappings are clearly Borel-measurable functions of $V_t$, and hence again progressive. Now introduce a Brownian motion $X \perp\!\!\!\perp \mathcal{F}$ in $\mathbb{R}^n$ with induced filtration $\mathcal{X}$, and note that $\mathcal{G}_t = \mathcal{F}_t \vee \mathcal{X}_t$, $t \ge 0$, is a standard extension of both $\mathcal{F}$ and $\mathcal{X}$. Thus, $V$ remains $\mathcal{G}$-progressive, and the martingale properties of $M$ and $X$ are still valid for $\mathcal{G}$. Consider in $\mathbb{R}^n$ the local $\mathcal{G}$-martingale

$$B = V^{-1}\pi_R \cdot M + \pi_N \cdot X.$$

The covariation matrix of $B$ has density

$$(V^{-1}\pi_R)VV'(V^{-1}\pi_R)' + \pi_N\pi_N' = \pi_{N^\perp}\pi_{N^\perp}' + \pi_N\pi_N' = \pi_{N^\perp} + \pi_N = I,$$

and so Theorem 18.3 shows that $B$ is a Brownian motion. Furthermore, the process $\pi_{R^\perp} \cdot M = 0$ vanishes a.s., since its covariation matrix has density $\pi_{R^\perp}VV'\pi_{R^\perp}' = 0$. Hence, by Proposition 17.14,

$$V \cdot B = VV^{-1}\pi_R \cdot M + V\pi_N \cdot X = \pi_R \cdot M = M \quad \text{a.s.} \qquad \square$$

We may next prove a Fubini-type theorem, which shows how the multiple Wiener–Itô integrals defined in Chapter 13 can be expressed in terms of iterated Itô integrals. Introduce for each $n \in \mathbb{N}$ the simplex

$$\Delta_n = \{(t_1, \ldots, t_n) \in \mathbb{R}_+^n;\ t_1 < \cdots < t_n\}.$$

Given a function $f \in L^2(\mathbb{R}_+^n, \lambda^n)$, we write $\hat f = n!\,\tilde f\,1_{\Delta_n}$, where $\tilde f$ denotes the symmetrization of $f$ defined in Chapter 13.

Theorem 18.13 (multiple and iterated integrals) Consider a Brownian motion $B$ in $\mathbb{R}$ with associated multiple Wiener–Itô integrals $I_n$, and fix any $f \in L^2(\mathbb{R}_+^n)$. Then

$$I_n f = \int dB_{t_n} \int dB_{t_{n-1}} \cdots \int \hat f(t_1, \ldots, t_n)\,dB_{t_1} \quad \text{a.s.} \qquad (6)$$

Though a formal verification is easy, the existence of the integrals on the right depends in a subtle way on the possibility of choosing suitable versions in each step. The existence of such versions is implicitly regarded as part of the assertion.


Proof: We shall prove by induction that the iterated integral

$$V^k_{t_{k+1}, \ldots, t_n} = \int dB_{t_k} \int dB_{t_{k-1}} \cdots \int \hat f(t_1, \ldots, t_n)\,dB_{t_1}$$

exists for almost all $t_{k+1}, \ldots, t_n$, and that $V^k$ has a version supported by $\Delta_{n-k}$ that is progressive as a process in $t_{k+1}$ with parameters $t_{k+2}, \ldots, t_n$. Furthermore, we shall establish the relation

$$E\big(V^k_{t_{k+1}, \ldots, t_n}\big)^2 = \int \cdots \int \{\hat f(t_1, \ldots, t_n)\}^2\,dt_1 \cdots dt_k. \qquad (7)$$

This allows us, in the next step, to define $V^{k+1}_{t_{k+2}, \ldots, t_n}$ for almost all $t_{k+2}, \ldots, t_n$.

The integral $V^0 = \hat f$ clearly has the stated properties. Now assume that a version of the integral $V^{k-1}_{t_k, \ldots, t_n}$ has been constructed with the desired properties. For any $t_{k+1}, \ldots, t_n$ such that (7) is finite, Theorem 17.25 shows that the process

$$X^k_{t,\,t_{k+1}, \ldots, t_n} = \int_0^t V^{k-1}_{s,\,t_{k+1}, \ldots, t_n}\,dB_s, \qquad t \ge 0,$$

has a progressive version that is a.s. continuous in $t$ for fixed $t_{k+1}, \ldots, t_n$. By Proposition 17.15 we obtain

$$V^k_{t_{k+1}, \ldots, t_n} = X^k_{t_{k+1},\,t_{k+1}, \ldots, t_n} \quad \text{a.s.}, \qquad t_{k+1}, \ldots, t_n \ge 0,$$

and the progressivity clearly carries over to $V^k$, regarded as a process in $t_{k+1}$ with parameters $t_{k+2}, \ldots, t_n$. Since $V^{k-1}$ is supported by $\Delta_{n-k+1}$, we may choose $X^k$ to be supported by $\mathbb{R}_+ \times \Delta_{n-k}$, which ensures $V^k$ to be supported by $\Delta_{n-k}$. Finally, equation (7) for $V^{k-1}$ yields

$$E\big(V^k_{t_{k+1}, \ldots, t_n}\big)^2 = E \int \big(V^{k-1}_{t_k, \ldots, t_n}\big)^2\,dt_k = \int \cdots \int \{\hat f(t_1, \ldots, t_n)\}^2\,dt_1 \cdots dt_k.$$

To prove (6), we note that the right-hand side is linear and $L^2$-continuous in $f$. Furthermore, the two sides agree for indicator functions of rectangular boxes in $\Delta_n$. The relation extends by a monotone class argument to arbitrary indicator functions in $\Delta_n$, and the further extension to $L^2(\Delta_n)$ is immediate. It remains to note that $I_n \hat f = I_n \tilde f = I_n f$ for any $f \in L^2(\mathbb{R}_+^n)$. $\square$

Our previous developments have provided two entirely different representations of Brownian functionals with zero mean and finite variance, namely the chaos expansion in Theorem 13.26 and the stochastic integral representation in Lemma 18.11. We proceed to examine how the two formulas are related. For any function $f \in L^2(\mathbb{R}_+^n)$, we define $f_t(t_1, \ldots, t_{n-1}) = f(t_1, \ldots, t_{n-1}, t)$ and write $I_{n-1}f(t) = I_{n-1}f_t$ when $\|f_t\| < \infty$.
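Relation (6) is easy to check numerically in the simplest case $n = 2$ (a sketch with our own grid, seed, and tolerance): for $f = 1$ on $[0,1]^2$ we have $\hat f = 2 \cdot 1_{\Delta_2}$, the inner integral is $B_{t_2}$, and the iterated integral $2\int_0^1 B\,dB$ should agree with the multiple integral $I_2 f = B_1^2 - 1$.

```python
import numpy as np

def iterated_vs_multiple(n=100_000, seed=1):
    """Compare the iterated Ito sum for f = 1 on [0,1]^2 with I_2 f = B_1^2 - 1."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n
    dB = rng.normal(0.0, np.sqrt(dt), size=n)
    B = np.concatenate([[0.0], np.cumsum(dB)])
    iterated = 2.0 * np.sum(B[:-1] * dB)  # 2 * int_0^1 B dB, left-point rule
    multiple = B[-1] ** 2 - 1.0           # known closed form of I_2 f
    return abs(iterated - multiple)
```

The left-point sum is the discrete Itô integral, so the gap shrinks at rate $\sqrt{dt}$ on a single path.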


Proposition 18.14 (chaos and integral representations) Fix a Brownian motion $B$ in $\mathbb{R}$, and let $\xi$ be a $B$-measurable random variable with chaos expansion $\sum_{n \ge 1} I_n f_n$. Then $\xi = (V \cdot B)_\infty$ a.s., where

$$V_t = \sum\nolimits_{n \ge 1} I_{n-1}\hat f_n(t), \qquad t \ge 0.$$

Proof: For any $m \in \mathbb{N}$ we get, as in the last proof,

$$\int \sum\nolimits_{n \ge m} E\{I_{n-1}\hat f_n(t)\}^2\,dt = \sum\nolimits_{n \ge m} \|\hat f_n\|^2 = \sum\nolimits_{n \ge m} E(I_n f_n)^2 \to 0.$$

Since the $\tau_b$ remain optional with respect to the right-continuous, induced filtration, we may assume $B$ to be a canonical Brownian motion with associated distribution $P = P_0$. Defining $h_t \equiv t$ and $Z = \mathcal{E}(B)$, we see from Theorem 18.22 that $P^h = Z_t \cdot P$ on $\mathcal{F}_t$ for all $t \ge 0$. Since $\tau_b < \infty$ a.s. under both $P$ and $P^h$, Lemma 18.16 yields

$$E\exp\big(B_{\tau_b} - \tfrac{1}{2}\tau_b\big) = EZ_{\tau_b} = E[Z_{\tau_b};\ \tau_b < \infty].$$

Finally, $A$ commutes on $\mathcal{D}$ with every $T_t$.

Proof: If $f \in C_0$, then (F$_1$) yields $T_t f \in C_0$ for every $t$, and so by dominated convergence we have even $R_\lambda f \in C_0$. To prove the stated contraction property, we write for any $f \in C_0$

$$\|\lambda R_\lambda f\| \le \lambda \int_0^\infty e^{-\lambda t}\|T_t f\|\,dt \le \lambda\|f\|\int_0^\infty e^{-\lambda t}\,dt = \|f\|.$$

A simple computation yields the resolvent equation

$$R_\lambda - R_\mu = (\mu - \lambda)R_\lambda R_\mu, \qquad \lambda, \mu > 0, \qquad (3)$$

19. Feller Processes and Semigroups

which shows that the operators $R_\lambda$ commute and have a common range $\mathcal{D}$. If $f = R_1 g$ with $g \in C_0$, we get by (3), as $\lambda \to \infty$,

$$\|\lambda R_\lambda f - f\| = \|(\lambda R_\lambda - I)R_1 g\| = \|(R_1 - I)R_\lambda g\| \le \lambda^{-1}\|R_1 - I\|\,\|g\| \to 0.$$

The convergence extends by a simple approximation to the closure of $\mathcal{D}$. Now introduce the one-point compactification $\hat S = S \cup \{\Delta\}$ of $S$, and extend any $f \in C_0$ to $\hat C = C(\hat S)$ by putting $f(\Delta) = 0$. If $\bar{\mathcal{D}} \ne C_0$, then by the Hahn–Banach theorem there exists a bounded linear functional $\varphi \ne 0$ on $\hat C$ such that $\varphi R_1 f = 0$ for all $f \in C_0$. By Riesz's representation Theorem 2.22 we may extend $\varphi$ to a bounded, signed measure on $\hat S$. Letting $f \in C_0$ and using (F$_2$), we get by dominated convergence as $\lambda \to \infty$

$$0 = \lambda\varphi R_\lambda f = \int \varphi(dx)\int_0^\infty \lambda e^{-\lambda t}T_t f(x)\,dt = \int \varphi(dx)\int_0^\infty e^{-s}T_{s/\lambda}f(x)\,ds \to \varphi f,$$

and so $\varphi \equiv 0$. The contradiction shows that $\mathcal{D}$ is dense in $C_0$.

To see that the operators $R_\lambda$ are injective, let $f \in C_0$ with $R_{\lambda_0}f = 0$ for some $\lambda_0 > 0$. Then (3) yields $R_\lambda f = 0$ for every $\lambda > 0$, and since $\lambda R_\lambda f \to f$ as $\lambda \to \infty$, we get $f = 0$. Hence, the inverses $R_\lambda^{-1}$ exist on $\mathcal{D}$. Multiplying (3) by $R_\lambda^{-1}$ from the left and by $R_\mu^{-1}$ from the right, we get on $\mathcal{D}$ the relation $R_\mu^{-1} - R_\lambda^{-1} = \mu - \lambda$. Thus, the operator $A = \lambda - R_\lambda^{-1}$ on $\mathcal{D}$ is independent of $\lambda$. To prove the final assertion, we note that $T_t$ and $R_\lambda$ commute for any $t, \lambda > 0$, and write

$$T_t A f = T_t(\lambda - R_\lambda^{-1})f = (\lambda - R_\lambda^{-1})T_t f = A T_t f, \qquad f \in \mathcal{D}. \qquad \square$$

The operator $A$ in Theorem 19.4 is called the generator of the semigroup $(T_t)$. If we want to emphasize the role of the domain $\mathcal{D}$, we say that $(T_t)$ has generator $(A, \mathcal{D})$. The term is justified by the following lemma.

Lemma 19.5 (uniqueness) A Feller semigroup is uniquely determined by its generator.

Proof: The operator $A$ determines $R_\lambda = (\lambda - A)^{-1}$ for all $\lambda > 0$. By the uniqueness theorem for Laplace transforms, it then determines the measure $\mu(dt) = T_t f(x)\,dt$ on $\mathbb{R}_+$ for any $f \in C_0$ and $x \in S$. Since the density $T_t f(x)$ is right-continuous in $t$ for fixed $x$, the assertion follows. $\square$

We now aim to show that any Feller semigroup is strongly continuous and to derive abstract versions of Kolmogorov's forward and backward equations.
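For a concrete resolvent computation (our own example, not taken from the text): the translation semigroup $T_t f(x) = f(x+t)$ has $R_\lambda f(x) = \int_0^\infty e^{-\lambda t} f(x+t)\,dt$, and the test function $f(x) = e^{-x}$ is an eigenfunction with $R_\lambda f = e^{-x}/(\lambda+1)$. The sketch below checks the resolvent equation (3) and the convergence $\lambda R_\lambda f \to f$ on this example.

```python
import math

def R(lam, x):
    """Resolvent of the translation semigroup applied to f(y) = exp(-y), closed form."""
    return math.exp(-x) / (lam + 1.0)

def resolvent_identity_gap(lam, mu, x):
    """|R_lam f - R_mu f - (mu - lam) R_lam R_mu f| at the point x.

    Since f is an eigenfunction of every R_lam here, R_lam R_mu f
    evaluates to exp(-x) / ((lam + 1)(mu + 1)).
    """
    lhs = R(lam, x) - R(mu, x)
    rhs = (mu - lam) * math.exp(-x) / ((lam + 1.0) * (mu + 1.0))
    return abs(lhs - rhs)
```

On this eigenfunction the resolvent equation holds exactly, and $\lambda R_\lambda f = \frac{\lambda}{\lambda+1} f \to f$ at rate $\lambda^{-1}$, matching the contraction bound $\|R_\lambda\| \le \lambda^{-1}$.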


Theorem 19.6 (strong continuity, forward and backward equations) Let $(T_t)$ be a Feller semigroup with generator $(A, \mathcal{D})$. Then $(T_t)$ is strongly continuous and satisfies

$$T_t f - f = \int_0^t T_s A f\,ds, \qquad f \in \mathcal{D},\ t \ge 0. \qquad (4)$$

Furthermore, $T_t f$ is differentiable at 0 iff $f \in \mathcal{D}$, in which case

$$\frac{d}{dt}T_t f = T_t A f = A T_t f, \qquad t \ge 0. \qquad (5)$$

To prove this result, we introduce the so-called Yosida approximation

$$A^\lambda = \lambda A R_\lambda = \lambda(\lambda R_\lambda - I), \qquad \lambda > 0, \qquad (6)$$

and the associated semigroup $T_t^\lambda = e^{tA^\lambda}$, $t \ge 0$. The latter is clearly the transition semigroup of a pseudo-Poisson process with rate $\lambda$, based on the transition operator $\lambda R_\lambda$.

Lemma 19.7 (Yosida approximation) For any $f \in \mathcal{D}$, we have

$$\|T_t f - T_t^\lambda f\| \le t\|Af - A^\lambda f\|, \qquad t, \lambda > 0, \qquad (7)$$

and $A^\lambda f \to Af$ as $\lambda \to \infty$. Furthermore, $T_t^\lambda f \to T_t f$ as $\lambda \to \infty$ for each $f \in C_0$, uniformly for bounded $t \ge 0$.

Proof: By Theorem 19.4 we have $A^\lambda f = \lambda R_\lambda A f \to A f$ for any $f \in \mathcal{D}$. For fixed $\lambda > 0$ it is further clear that $h^{-1}(T_h^\lambda - I) \to A^\lambda$ in the norm topology as $h \to 0$. Now for any commuting contraction operators $B$ and $C$,

$$\|B^n f - C^n f\| = \|(B^{n-1} + B^{n-2}C + \cdots + C^{n-1})(Bf - Cf)\| \le n\|Bf - Cf\|.$$

Fixing any $f \in C_0$ and $t, \lambda, \mu > 0$, we hence obtain as $h = t/n \to 0$

$$\|T_t^\lambda f - T_t^\mu f\| \le n\|T_h^\lambda f - T_h^\mu f\| = t\left\|\frac{T_h^\lambda f - f}{h} - \frac{T_h^\mu f - f}{h}\right\| \to t\|A^\lambda f - A^\mu f\|.$$

For $f \in \mathcal{D}$ it follows that $T_t^\lambda f$ is Cauchy convergent as $\lambda \to \infty$ for fixed $t$, and since $\mathcal{D}$ is dense in $C_0$, the same property holds for arbitrary $f \in C_0$. Denoting the limit by $\tilde T_t f$, we get in particular

$$\|\tilde T_t f - T_t^\lambda f\| \le t\|A^\lambda f - Af\|, \qquad f \in \mathcal{D},\ t \ge 0. \qquad (8)$$

Thus, for each $f \in \mathcal{D}$ we have $T_t^\lambda f \to \tilde T_t f$ as $\lambda \to \infty$, uniformly for bounded $t$, which again extends to all $f \in C_0$. To identify $\tilde T_t$, we may use the resolvent equation (3) to obtain, for any $f \in C_0$ and $\lambda, \mu > 0$,

$$\int_0^\infty e^{-\lambda t}T_t^\mu \mu R_\mu f\,dt = (\lambda - A^\mu)^{-1}\mu R_\mu f = \frac{\mu}{\lambda + \mu}R_\nu f, \qquad (9)$$

where $\nu = \lambda\mu(\lambda + \mu)^{-1}$. As $\mu \to \infty$, we have $\nu \to \lambda$, and so $R_\nu f \to R_\lambda f$. Furthermore, $T_t^\mu \mu R_\mu f \to \tilde T_t f$, and so from (9) we get by dominated convergence $\int e^{-\lambda t}\tilde T_t f\,dt = R_\lambda f$. Hence, the semigroups $(\tilde T_t)$ and $(T_t)$ have the same resolvent operators $R_\lambda$, and so they agree by Lemma 19.5. In particular, (7) then follows from (8). $\square$
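The Yosida approximation is easy to visualize in finite dimensions, where every $Q$-matrix generates a finite-state Feller semigroup (a sketch of ours; the specific matrix, rates, and truncation order are arbitrary choices). With $A^\lambda = \lambda(\lambda R_\lambda - I)$ we check that $e^{tA^\lambda} \to e^{tA}$ as $\lambda \to \infty$:

```python
import numpy as np

def expm(M, terms=60):
    """Matrix exponential by truncated Taylor series (adequate for small matrices)."""
    out = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

def yosida_gap(lam, t=1.0):
    """Max-entry distance between e^{tA} and e^{tA_lam} for a 2-state generator."""
    A = np.array([[-1.0, 1.0], [2.0, -2.0]])   # Q-matrix of a 2-state chain
    R = np.linalg.inv(lam * np.eye(2) - A)      # resolvent (lam - A)^{-1}
    A_lam = lam * (lam * R - np.eye(2))         # Yosida approximation A^lam
    return float(np.max(np.abs(expm(t * A) - expm(t * A_lam))))
```

On the eigenvalue $\mu$ of $A$, the approximation acts as $\lambda\mu/(\lambda - \mu)$, so the gap shrinks at rate $\lambda^{-1}$, in line with Lemma 19.7.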

Proof of Theorem 19.6: The semigroup $(T_t^\lambda)$ is clearly norm continuous in $t$ for each $\lambda > 0$, and so the strong continuity of $(T_t)$ follows by Lemma 19.7 as $\lambda \to \infty$. Furthermore, we note that $h^{-1}(T_h^\lambda - I) \to A^\lambda$ as $h \downarrow 0$. Using the semigroup relation and continuity, we obtain more generally

$$\frac{d}{dt}T_t^\lambda = A^\lambda T_t^\lambda = T_t^\lambda A^\lambda, \qquad t \ge 0,$$

which implies

$$T_t^\lambda f - f = \int_0^t T_s^\lambda A^\lambda f\,ds, \qquad f \in C_0,\ t \ge 0. \qquad (10)$$

If $f \in \mathcal{D}$, then by Lemma 19.7 we get $T_s^\lambda A^\lambda f \to T_s A f$ as $\lambda \to \infty$, uniformly for bounded $s$, and so (4) follows from (10) as $\lambda \to \infty$. By the strong continuity of $T_t$ we may differentiate (4) to get the first relation in (5). The second relation holds by Theorem 19.4.

Conversely, assume that $h^{-1}(T_h f - f) \to g$ for some pair of functions $f, g \in C_0$. As $h \to 0$, we get

$$R_\lambda g \leftarrow R_\lambda\frac{T_h f - f}{h} = \frac{(T_h - I)R_\lambda f}{h} \to AR_\lambda f = \lambda R_\lambda f - f,$$

and so $f = R_\lambda(\lambda f - g) \in \mathcal{D}$. $\square$

In applications, the domain of a generator $A$ is often hard to identify or too large to be convenient for computations. It is then useful to restrict $A$ to a suitable subdomain. An operator $A$ with domain $\mathcal{D}$ on some Banach space $B$ is said to be closed if its graph $G = \{(f, Af);\ f \in \mathcal{D}\}$ is a closed subset of $B^2$. In general, we say that $A$ is closable if the closure $\bar G$ is the graph of a single-valued operator $\bar A$, the so-called closure of $A$. Note that $A$ is closable iff the conditions $\mathcal{D} \ni f_n \to 0$ and $Af_n \to g$ imply $g = 0$. When $A$ is closed, a core for $A$ is defined as a linear subspace $D \subset \mathcal{D}$ such that the restriction $A|_D$ has closure $A$. In this case, $A$ is clearly uniquely determined by $A|_D$. We shall give some conditions ensuring that $D \subset \mathcal{D}$ is a core when $A$ is the generator of a Feller semigroup $(T_t)$ on $C_0$.


Lemma 19.8 (closure and cores) The generator $(A, \mathcal{D})$ of a Feller semigroup is closed, and for any $\lambda > 0$ a subspace $D \subset \mathcal{D}$ is a core for $A$ iff $(\lambda - A)D$ is dense in $C_0$.

Proof: Assume that $f_1, f_2, \ldots \in \mathcal{D}$ with $f_n \to f$ and $Af_n \to g$. Then $(I - A)f_n \to f - g$, and since $R_1$ is bounded, it follows that $f_n \to R_1(f - g)$. Hence, $f = R_1(f - g) \in \mathcal{D}$, and we have $(I - A)f = f - g$, or $g = Af$. Thus, $A$ is closed.

If $D$ is a core for $A$, then for any $g \in C_0$ and $\lambda > 0$ there exist some $f_1, f_2, \ldots \in D$ with $f_n \to R_\lambda g$ and $Af_n \to AR_\lambda g$, and we get $(\lambda - A)f_n \to (\lambda - A)R_\lambda g = g$. Thus, $(\lambda - A)D$ is dense in $C_0$.

Conversely, assume that $(\lambda - A)D$ is dense in $C_0$. To show that $D$ is a core, fix any $f \in \mathcal{D}$. By hypothesis we may choose some $f_1, f_2, \ldots \in D$ with

$$g_n \equiv (\lambda - A)f_n \to (\lambda - A)f \equiv g.$$

Since $R_\lambda$ is bounded, we obtain $f_n = R_\lambda g_n \to R_\lambda g = f$, and thus $Af_n = \lambda f_n - g_n \to \lambda f - g = Af$. $\square$

A subspace $D \subset C_0$ is said to be invariant under $(T_t)$ if $T_t D \subset D$ for all $t \ge 0$. In particular, we note that, for any subset $B \subset C_0$, the linear span of $\bigcup_t T_t B$ is an invariant subspace of $C_0$.

Proposition 19.9 (invariance and cores, Watanabe) If $(A, \mathcal{D})$ is the generator of a Feller semigroup, then any dense, invariant subspace $D \subset \mathcal{D}$ is a core for $A$.

Proof: By the strong continuity of $(T_t)$ we note that $R_1$ can be approximated in the strong topology by some finite linear combinations $L_1, L_2, \ldots$ of the operators $T_t$. Now fix any $f \in D$, and define $g_n = L_n f$. Noting that $A$ and $L_n$ commute on $D$ by Theorem 19.4, we get

$$(I - A)g_n = (I - A)L_n f = L_n(I - A)f \to R_1(I - A)f = f.$$

Since $g_n \in D$ and $D$ is dense in $C_0$, it follows that $(I - A)D$ is dense in $C_0$. Hence, $D$ is a core by Lemma 19.8. $\square$

The Lévy processes in $\mathbb{R}^d$ are the archetypes of Feller processes, and we proceed to identify their generators. Let $C_0^\infty$ denote the class of all infinitely differentiable functions $f$ on $\mathbb{R}^d$ such that $f$ and all its derivatives belong to $C_0 = C_0(\mathbb{R}^d)$.

Theorem 19.10 (Lévy processes) Let $T_t$, $t \ge 0$, be the transition operators of a Lévy process in $\mathbb{R}^d$ with characteristics $(a, b, \nu)$. Then $(T_t)$ is a Feller semigroup, and $C_0^\infty$ is a core for the associated generator $A$. Moreover, we have for any $f \in C_0^\infty$ and $x \in \mathbb{R}^d$

$$Af(x) = \tfrac{1}{2}\sum\nolimits_{i,j} a_{ij}f_{ij}''(x) + \sum\nolimits_i b_i f_i'(x) + \int\Big\{f(x+y) - f(x) - \sum\nolimits_i y_i f_i'(x)\,1\{|y| \le 1\}\Big\}\,\nu(dy). \qquad (11)$$


In particular, a standard Brownian motion in $\mathbb{R}^d$ has generator $\frac{1}{2}\Delta$, and the uniform motion with velocity $b \in \mathbb{R}^d$ has generator $b \cdot \nabla$, both on the core $C_0^\infty$. Here $\Delta$ and $\nabla$ denote the Laplace and gradient operators, respectively. Also note that the generator of the jump component has the same form as for the pseudo-Poisson processes in Proposition 19.2, apart from the compensation for small jumps by a linear drift term.
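As a hedged numerical check of the Brownian case (the quadrature order, test point, and test function are our choices): for standard Brownian motion, $T_t f(x) = Ef(x + \sqrt{t}Z)$ with $Z$ standard normal, so $(T_t f - f)/t$ should approach $\frac{1}{2}f''$. For $f = \cos$ one even has $T_t\cos = e^{-t/2}\cos$ exactly, which the Gauss–Hermite evaluation reproduces:

```python
import numpy as np

def heat_semigroup(f, x, t, order=60):
    """T_t f(x) = E f(x + sqrt(t) Z) via Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(order)
    # E g(Z) = pi^{-1/2} sum_i w_i g(sqrt(2) x_i) for Z ~ N(0,1)
    return float(np.sum(weights * f(x + np.sqrt(2.0 * t) * nodes)) / np.sqrt(np.pi))

def generator_estimate(f, x, t=1e-4):
    """(T_t f - f)/t, which should approximate (1/2) f''(x) for small t."""
    return (heat_semigroup(f, x, t) - f(x)) / t
```

The difference quotient converges at rate $t$, consistent with the forward equation (5), which gives the next-order term $\frac{1}{2}T_t\Delta f$.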

Proof of Theorem 19.10: As $t \to 0$, we have $\mu_t^{*[t^{-1}]} \xrightarrow{w} \mu_1$. Thus, Corollary 15.20 yields $t^{-1}\mu_t \xrightarrow{v} \nu$ on $\mathbb{R}^d \setminus \{0\}$ and

$$a_h^t \equiv t^{-1}\int_{|x| \le h} xx'\,\mu_t(dx) \to a_h, \qquad b_h^t \equiv t^{-1}\int_{|x| \le h} x\,\mu_t(dx) \to b_h, \qquad (12)$$

provided that $h > 0$ satisfies $\nu\{|x| = h\} = 0$. Now fix any $f \in C_0^\infty$, and write

$$t^{-1}(T_t f(x) - f(x)) = t^{-1}\int (f(x+y) - f(x))\,\mu_t(dy)$$
$$= t^{-1}\int_{|y| \le h}\Big\{f(x+y) - f(x) - \sum\nolimits_i y_i f_i'(x) - \tfrac{1}{2}\sum\nolimits_{i,j} y_i y_j f_{ij}''(x)\Big\}\,\mu_t(dy)$$
$$\quad + t^{-1}\int_{|y| > h}(f(x+y) - f(x))\,\mu_t(dy) + \sum\nolimits_i b_{h,i}^t f_i'(x) + \tfrac{1}{2}\sum\nolimits_{i,j} a_{h,ij}^t f_{ij}''(x).$$

As $t \to 0$, the last three terms approach the expression in (11), though with $a_{ij}$ replaced by $a_{h,ij}$ and with the integral taken over $\{|x| > h\}$. To establish the required convergence, it is then enough to show that the first term on the right tends to zero as $h \to 0$, uniformly for small $t > 0$. But this is clear from (12), since the integrand is of the order $h|y|^2$ by Taylor's formula. From the uniform boundedness of the derivatives of $f$, we also see that the convergence is uniform in $x$. Thus, $C_0^\infty \subset \mathcal{D}$ by Theorem 19.6, and (11) holds on $C_0^\infty$.

It remains to show that $C_0^\infty$ is a core for $A$. Since $C_0^\infty$ is dense in $C_0$, it suffices by Proposition 19.9 to show that it is also invariant under $(T_t)$. Then note that, by dominated convergence, the differentiation operators commute with each $T_t$, and use condition (F$_1$). $\square$

We proceed to characterize the linear operators $A$ on $C_0$ whose closures $\bar A$ are generators of Feller semigroups.

Theorem 19.11 (characterization of generators, Hille, Yosida) Let $A$ be a linear operator on $C_0$ with domain $\mathcal{D}$. Then $A$ is closable and its closure $\bar A$ is the generator of a Feller semigroup on $C_0$ iff these conditions hold:
(i) $\mathcal{D}$ is dense in $C_0$;
(ii) the range of $\lambda_0 - A$ is dense in $C_0$ for some $\lambda_0 > 0$;
(iii) if $f \vee 0 \le f(x)$ for some $f \in \mathcal{D}$ and $x \in S$, then $Af(x) \le 0$.
Condition (iii) is known as the positive-maximum principle.


Proof: First assume that $\bar A$ is the generator of a Feller semigroup $(T_t)$. Then (i) and (ii) hold by Theorem 19.4. To prove (iii), let $f \in \mathcal{D}$ and $x \in S$ with $f^+ = f \vee 0 \le f(x)$. Then

$$T_t f(x) \le T_t f^+(x) \le \|T_t f^+\| \le \|f^+\| = f(x), \qquad t \ge 0,$$

and so $h^{-1}(T_h f - f)(x) \le 0$. As $h \to 0$, we get $Af(x) \le 0$.

Conversely, assume that $A$ satisfies (i), (ii), and (iii). Let $f \in \mathcal{D}$ be arbitrary, choose $x \in S$ with $|f(x)| = \|f\|$, and put $g = f\,\mathrm{sgn}\,f(x)$. Then $g \in \mathcal{D}$ with $g^+ \le g(x)$, and so (iii) yields $Ag(x) \le 0$. Thus, we get for any $\lambda > 0$

$$\|(\lambda - A)f\| \ge \lambda g(x) - Ag(x) \ge \lambda g(x) = \lambda\|f\|. \qquad (13)$$

To show that $A$ is closable, let $f_1, f_2, \ldots \in \mathcal{D}$ with $f_n \to 0$ and $Af_n \to g$. By (i) we may choose $g_1, g_2, \ldots \in \mathcal{D}$ with $g_n \to g$, and by (13) we have

$$\|(\lambda - A)(g_m + \lambda f_n)\| \ge \lambda\|g_m + \lambda f_n\|, \qquad m, n \in \mathbb{N},\ \lambda > 0.$$

As $n \to \infty$, we get $\|(\lambda - A)g_m - \lambda g\| \ge \lambda\|g_m\|$. Here we may divide by $\lambda$ and let $\lambda \to \infty$ to obtain $\|g_m - g\| \ge \|g_m\|$, which yields $\|g\| = 0$ as $m \to \infty$. Thus, $A$ is closable, and from (13) we note that the closure $\bar A$ satisfies

$$\|(\lambda - \bar A)f\| \ge \lambda\|f\|, \qquad \lambda > 0,\ f \in \mathrm{dom}(\bar A). \qquad (14)$$

Now assume that $\lambda_n \to \lambda > 0$ and $(\lambda_n - \bar A)f_n \to g$ for some $f_1, f_2, \ldots \in \mathrm{dom}(\bar A)$. By (14) the sequence $(f_n)$ is then Cauchy, say with limit $f \in C_0$. By the definition of $\bar A$ we get $(\lambda - \bar A)f = g$, and so $g$ belongs to the range of $\lambda - \bar A$. Letting $\Lambda$ denote the set of constants $\lambda > 0$ such that $\lambda - \bar A$ has range $C_0$, it follows in particular that $\Lambda$ is closed. If we can show that $\Lambda$ is open as well, then by (ii) we have $\Lambda = (0, \infty)$. Then fix any $\lambda \in \Lambda$, and conclude from (14) that $\lambda - \bar A$ has a bounded inverse $R_\lambda$ with norm $\|R_\lambda\| \le \lambda^{-1}$. For any $\mu > 0$ with $|\lambda - \mu|\,\|R_\lambda\| < 1$, we may form the bounded linear operator

$$R_\mu = \sum\nolimits_{n \ge 0}(\lambda - \mu)^n R_\lambda^{n+1},$$

and we note that

$$(\mu - \bar A)R_\mu = (\lambda - \bar A)R_\mu - (\lambda - \mu)R_\mu = I.$$

In particular, $\mu \in \Lambda$, which shows that $\Lambda$ is open.

We may next establish the resolvent equation (3). Then start from the identity $(\lambda - \bar A)R_\lambda = (\mu - \bar A)R_\mu = I$. By a simple rearrangement,

$$(\lambda - \bar A)(R_\lambda - R_\mu) = (\mu - \lambda)R_\mu,$$

and (3) follows as we multiply from the left by $R_\lambda$. In particular, (3) shows that the operators $R_\lambda$ and $R_\mu$ commute for any $\lambda, \mu > 0$. Since $R_\lambda(\lambda - \bar A) = I$ on $\mathrm{dom}(\bar A)$ and $\|R_\lambda\| \le \lambda^{-1}$, we have for any $f \in \mathrm{dom}(\bar A)$, as $\lambda \to \infty$,

$$\|\lambda R_\lambda f - f\| = \|R_\lambda \bar A f\| \le \lambda^{-1}\|\bar A f\| \to 0.$$


From (i) and the contractivity of $\lambda R_\lambda$, it follows easily that $\lambda R_\lambda \to I$ in the strong topology. Now define $A^\lambda$ as in (6), and let $T_t^\lambda = e^{tA^\lambda}$. As in the proof of Lemma 19.7, we get $T_t^\lambda f \to T_t f$ for each $f \in C_0$, uniformly for bounded $t$, where the $T_t$ form a strongly continuous family of contraction operators on $C_0$ such that $\int e^{-\lambda t}T_t\,dt = R_\lambda$ for all $\lambda > 0$. To deduce the semigroup property, fix any $f \in C_0$ and $s, t \ge 0$, and note that as $\lambda \to \infty$

$$(T_{s+t} - T_s T_t)f = (T_{s+t} - T_{s+t}^\lambda)f + T_s^\lambda(T_t^\lambda - T_t)f + (T_s^\lambda - T_s)T_t f \to 0.$$

The positivity of the operators $T_t$ will follow immediately, if we can show that $R_\lambda$ is positive for each $\lambda > 0$. Then fix any function $g \ge 0$ in $C_0$, and put $f = R_\lambda g$, so that $g = (\lambda - \bar A)f$. By the definition of $\bar A$, there exist some $f_1, f_2, \ldots \in \mathcal{D}$ with $f_n \to f$ and $Af_n \to \bar A f$. If $\inf_x f(x) < 0$, we have $\inf_x f_n(x) < 0$ for all sufficiently large $n$, and we may choose some $x_n \in S$ with $f_n(x_n) = \inf_x f_n(x) < 0$. By (iii) applied to $-f_n$, we have $Af_n(x_n) \ge 0$, and so

$$\inf\nolimits_x(\lambda - A)f_n(x) \le (\lambda - A)f_n(x_n) \le \lambda f_n(x_n) = \lambda\inf\nolimits_x f_n(x).$$

As $n \to \infty$, we get the contradiction

$$0 \le \inf\nolimits_x g(x) = \inf\nolimits_x(\lambda - \bar A)f(x) \le \lambda\inf\nolimits_x f(x) < 0.$$

It remains to show that $\bar A$ is the generator of the semigroup $(T_t)$. But this is clear from the fact that the operators $\lambda - \bar A$ are inverses to the resolvent operators $R_\lambda$. $\square$

From the proof we note that any operator $A$ on $C_0$ satisfying the positive-maximum principle in (iii) must be dissipative, in the sense that $\|(\lambda - A)f\| \ge \lambda\|f\|$ for all $f \in \mathrm{dom}(A)$ and $\lambda > 0$. This leads to the following simple observation, which will be needed later.

Lemma 19.12 (maximality) Let $(A, \mathcal{D})$ be the generator of a Feller semigroup on $C_0$, and assume that $A$ extends to a linear operator $(A', \mathcal{D}')$ satisfying the positive-maximum principle. Then $\mathcal{D}' = \mathcal{D}$.

Proof: Fix any $f \in \mathcal{D}'$, and put $g = (I - A')f$. Since $A'$ is dissipative and $(I - A)R_1 = I$ on $C_0$, we get

$$\|f - R_1 g\| \le \|(I - A')(f - R_1 g)\| = \|g - (I - A)R_1 g\| = 0,$$

and so $f = R_1 g \in \mathcal{D}$. $\square$

Our next aim is to show how a nice Markov process can be associated with every Feller semigroup $(T_t)$. In order for the corresponding transition kernels $\mu_t$ to have total mass 1, we need the operators $T_t$ to be conservative, in the sense that $\sup_{f \le 1}T_t f(x) = 1$ for all $x \in S$. This can be achieved by a suitable extension. Let us then introduce an auxiliary state $\Delta \notin S$ and form the compactified space $\hat S = S \cup \{\Delta\}$, where $\Delta$ is regarded as the point at infinity when $S$ is noncompact, and otherwise as isolated from $S$. Note that any function


$f \in C_0$ has a continuous extension to $\hat S$, obtained by putting $f(\Delta) = 0$. We may now extend the original semigroup on $C_0$ to a conservative semigroup on the space $\hat C = C(\hat S)$.

Lemma 19.13 (compactification) Any Feller semigroup $(T_t)$ on $C_0$ admits an extension to a conservative Feller semigroup $(\hat T_t)$ on $\hat C$, given by

$$\hat T_t f = f(\Delta) + T_t\{f - f(\Delta)\}, \qquad t \ge 0,\ f \in \hat C.$$

Proof: It is straightforward to verify that $(\hat T_t)$ is a strongly continuous semigroup on $\hat C$. To show that the operators $\hat T_t$ are positive, fix any $f \in \hat C$ with $f \ge 0$, and note that $g \equiv f(\Delta) - f \in C_0$ with $g \le f(\Delta)$. Hence,

$$T_t g \le T_t g^+ \le \|T_t g^+\| \le \|g^+\| \le f(\Delta),$$

and so $\hat T_t f = f(\Delta) - T_t g \ge 0$. The contraction and conservation properties now follow from the fact that $\hat T_t 1 = 1$. $\square$

Our next step is to construct an associated semigroup of Markov transition kernels $\mu_t$ on $\hat S$, satisfying

$$T_t f(x) = \int f(y)\,\mu_t(x, dy), \qquad f \in C_0. \qquad (15)$$

We say that a state $x \in \hat S$ is absorbing for $(\mu_t)$ if $\mu_t(x, \{x\}) = 1$ for each $t \ge 0$.

Proposition 19.14 (existence) For any Feller semigroup $(T_t)$ on $C_0$, there exists a unique semigroup of Markov transition kernels $\mu_t$ on $\hat S$ satisfying (15) and such that $\Delta$ is absorbing for $(\mu_t)$.

Proof: For fixed $x \in \hat S$ and $t \ge 0$, the mapping $f \mapsto \hat T_t f(x)$ is a positive linear functional on $\hat C$ with norm 1, so by Riesz's representation Theorem 2.22 there exist some probability measures $\mu_t(x, \cdot)$ on $\hat S$ satisfying

$$\hat T_t f(x) = \int f(y)\,\mu_t(x, dy), \qquad f \in \hat C,\ x \in \hat S,\ t \ge 0. \qquad (16)$$

The measurability of the right-hand side is clear by continuity. By a standard approximation followed by a monotone class argument, we then obtain the desired measurability of $\mu_t(x, B)$ for any $t \ge 0$ and Borel set $B \subset \hat S$. The Chapman–Kolmogorov relation holds on $\hat S$ by Lemma 19.1. Relation (15) is a special case of (16), and from (16) we further get

$$\int f(y)\,\mu_t(\Delta, dy) = \hat T_t f(\Delta) = f(\Delta) = 0, \qquad f \in C_0,$$

which shows that $\Delta$ is absorbing. The uniqueness of $(\mu_t)$ is a consequence of the last two properties. $\square$

For any probability measure $\nu$ on $\hat S$, there exists by Theorem 8.4 a Markov process $X^\nu$ in $\hat S$ with initial distribution $\nu$ and transition kernels $\mu_t$. As before, we denote the distribution of $X^\nu$ by $P_\nu$ and write $E_\nu$ for


the corresponding integration operator. When $\nu = \delta_x$, we often prefer the simpler forms $P_x$ and $E_x$, respectively. We may now extend Theorem 15.1 to a basic regularization theorem for Feller processes. Given a process $X$, we say that $\Delta$ is absorbing for $X^\pm$ if $X_t = \Delta$ or $X_{t-} = \Delta$ implies $X_u = \Delta$ for all $u \ge t$.

Theorem 19.15 (regularization, Kinney) Let $X$ be a Feller process in $\hat S$ with arbitrary initial distribution $\nu$. Then $X$ has an rcll version $\tilde X$ in $\hat S$ such that $\Delta$ is absorbing for $\tilde X^\pm$. If $(T_t)$ is conservative and $\nu$ is restricted to $S$, we can choose $\tilde X$ to be rcll even in $S$.

The idea of the proof is to construct a sufficiently rich class of supermartingales, to which the regularity theorems of Chapter 7 can be applied. Let $C_0^+$ denote the class of nonnegative functions in $C_0$.

Lemma 19.16 (resolvents and excessive functions) If $f \in C_0^+$, then the process $Y_t = e^{-t}R_1 f(X_t)$, $t \ge 0$, is a supermartingale under $P_\nu$ for every $\nu$.

Proof: Writing $(\mathcal{G}_t)$ for the filtration induced by $X$, we get for any $t, h \ge 0$

$$E[e^{-t-h}R_1 f(X_{t+h}) \mid \mathcal{G}_t] = e^{-t-h}T_h R_1 f(X_t) = e^{-t-h}\int_0^\infty e^{-s}T_{s+h}f(X_t)\,ds = e^{-t}\int_h^\infty e^{-s}T_s f(X_t)\,ds \le Y_t. \qquad \square$$

Proof of Theorem 19.15: By Lemma 19.16 and Theorem 7.27, the process $f(X_t)$ has a.s. right- and left-hand limits along $\mathbb{Q}_+$ for any $f \in \mathcal{D} = \mathrm{dom}(A)$. Since $\mathcal{D}$ is dense in $C_0$, the stated property holds for every $f \in C_0$. By the separability of $C_0$ we may choose the exceptional null set $N$ to be independent of $f$. If $x_1, x_2, \ldots \in \hat S$ are such that $f(x_n)$ converges for every $f \in C_0$, then the compactness of $\hat S$ ensures that $x_n$ converges in the topology of $\hat S$. Thus, on $N^c$ the process $X$ itself has right- and left-hand limits $X_{t\pm}$ along $\mathbb{Q}_+$; on $N$ we may redefine $X$ to be constant. Then clearly $\tilde X_t = X_{t+}$ is rcll. It remains to show that $\tilde X$ is a version of $X$ or, equivalently, that $X_{t+} = X_t$ a.s. for each $t \ge 0$. But this follows from the fact that $X_{t+h} \to X_t$ in probability as $h \downarrow 0$, by Lemma 19.3 and dominated convergence.

Now fix any $f \in C_0$ with $f > 0$ on $S$, and note from the strong continuity of $(T_t)$ that even $R_1 f > 0$ on $S$. Applying Lemma 7.31 to the supermartingale $Y_t = e^{-t}R_1 f(\tilde X_t)$, we conclude that $\tilde X = \Delta$ a.s. on the interval $[\zeta, \infty)$, where $\zeta = \inf\{t \ge 0;\ \Delta \in \{\tilde X_t, \tilde X_{t-}\}\}$. Discarding the exceptional null set, we can make this hold identically.

If $(T_t)$ is conservative and $\nu$ is restricted to $S$, then $\tilde X_t \in S$ a.s. for every $t \ge 0$. Thus, $\zeta > t$ a.s. for all $t$, and hence $\zeta = \infty$ a.s. Again we may assume that this holds identically. Then $\tilde X_t$ and $\tilde X_{t-}$ take values in $S$, and the stated regularity properties remain valid in $S$. $\square$


In view of the last theorem, we may choose $\Omega$ to be the space of all $\hat S$-valued rcll functions such that the state $\Delta$ is absorbing, and let $X$ be the canonical process on $\Omega$. Processes with different initial distributions $\nu$ are then distinguished by their distributions $P_\nu$ on $\Omega$. Thus, under $P_\nu$ the process $X$ is Markov with initial distribution $\nu$ and transition kernels $\mu_t$, and $X$ has all the regularity properties stated in Theorem 19.15. In particular, $X = \Delta$ on the interval $[\zeta, \infty)$, where $\zeta$ denotes the terminal time

$$\zeta = \inf\{t \ge 0;\ X_t = \Delta \text{ or } X_{t-} = \Delta\}.$$

We take $(\mathcal{F}_t)$ to be the right-continuous filtration generated by $X$, and put $\mathcal{A} = \mathcal{F}_\infty = \bigvee_t \mathcal{F}_t$. The shift operators $\theta_t$ on $\Omega$ are defined as before by

$$(\theta_t\omega)_s = \omega_{s+t}, \qquad s, t \ge 0.$$

The process $X$ with associated distributions $P_\nu$, filtration $\mathcal{F} = (\mathcal{F}_t)$, and shift operators $\theta_t$ is called the canonical Feller process with semigroup $(T_t)$. We are now ready to state a general version of the strong Markov property. The result extends the special versions obtained in Proposition 8.9 and Theorems 12.14 and 13.11. A further instance of this property appears in Theorem 21.11.

Theorem 19.17 (strong Markov property, Dynkin and Yushkevich, Blumenthal) For any canonical Feller process X, initial distribution ν, optional time τ, and random variable ξ ≥ 0, we have

E_ν[ξ ∘ θ_τ | F_τ] = E_{X_τ} ξ  a.s. P_ν on {τ < ∞}.

Proof: By Lemmas 6.2 and 7.1 we may assume that τ < ∞. Let G denote the filtration induced by X. Then Lemma 7.4 shows that the times τ_n = 2^{−n}[2^n τ + 1] are G-optional, and by Lemma 7.3 we have F_τ ⊂ G_{τ_n} for all n. Thus, Proposition 8.9 yields

E_ν[ξ ∘ θ_{τ_n}; A] = E_ν[E_{X_{τ_n}} ξ; A],  A ∈ F_τ, n ∈ N.  (17)

To extend the relation to τ, we first assume that ξ = Π_k … ≤ ε^{−1} E|ξ_n| → 0, ε > 0, and so sup_t |M^n_t| → 0 in probability. Now let

τ_h = inf{t > 0; ρ(X_t, X_0) > h},  h > 0,

where ρ denotes the metric in Ŝ. Note that a state x is absorbing iff τ_h = ∞ a.s. P_x for every h > 0.

Lemma 19.22 (escape times) For any nonabsorbing state x ∈ S, we have E_x τ_h < ∞ for all sufficiently small h > 0.

Proof: If x is not absorbing, then μ_t(x, B_x^ε) < p < 1 for some t, ε > 0, where B_x^ε = {y; ρ(x, y) ≤ ε}. By Lemma 19.3 and Theorem 4.25 we may choose h ∈ (0, ε] so small that

μ_t(y, B_x^h) ≤ μ_t(y, B_x^ε) ≤ p,  y ∈ B_x^h.

Then Proposition 8.2 yields

P_x{τ_h ≥ nt} ≤ P_x ⋂_{k≤n} {X_{kt} ∈ B_x^h} ≤ p^n,  n ∈ Z_+,

and so by Lemma 3.4

E_x τ_h = ∫_0^∞ P{τ_h ≥ s} ds ≤ t Σ_{n≥0} P{τ_h ≥ nt} ≤ t Σ_{n≥0} p^n = t/(1 − p) < ∞. □

We turn to a probabilistic description of the generator and its domain. Say that A is maximal within a class of linear operators if it extends every member of the class.

Theorem 19.23 (characteristic operator, Dynkin) Let (A, D) be the generator of a Feller process. Then for any f ∈ D we have Af(x) = 0 if x is absorbing, and otherwise

Af(x) = lim_{h→0} (E_x f(X_{τ_h}) − f(x)) / E_x τ_h.  (21)

Furthermore, A is the maximal operator on C_0 with those properties.

Proof: Fix any f ∈ D. If x is absorbing, then T_t f(x) = f(x) for all t ≥ 0, and so Af(x) = 0. For a nonabsorbing x, we get instead by Lemma 19.21

E_x f(X_{t∧τ_h}) − f(x) = E_x ∫_0^{t∧τ_h} Af(X_s) ds,  t, h > 0.  (22)

By Lemma 19.22 we have E_x τ_h < ∞ for sufficiently small h > 0, and so (22) extends by dominated convergence to t = ∞. Relation (21) now follows from the continuity of Af, together with the fact that ρ(X_s, x) ≤ h for all s < τ_h. Since the positive maximum principle holds for any extension of A with the stated properties, the last assertion follows by Lemma 19.12. □

In the special case when S = R^d, let C_K^∞ denote the class of infinitely differentiable functions on R^d with bounded support. An operator (A, D) with D ⊃ C_K^∞ is said to be local on C_K^∞ if Af(x) = 0 whenever f vanishes in some neighborhood of x. For any generator with this property, we note that the positive-maximum principle implies a local positive-maximum principle, asserting that if f ∈ C_K^∞ has a local maximum ≥ 0 at some point x, then Af(x) ≤ 0. The following result gives the basic connection between diffusion processes and elliptic differential operators. This connection is explored further in Chapters 21 and 24.
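The characteristic-operator formula (21) can be checked numerically for Brownian motion on R (my own illustrative sketch, not part of the text): here A = ½ d²/dx², τ_h is the exit time of (x − h, x + h), and the classical identities E_x f(X_{τ_h}) = ½(f(x+h) + f(x−h)) and E_x τ_h = h² turn the quotient in (21) into a symmetric second difference, which converges to ½ f″(x). The function name below is mine:

```python
import math

def dynkin_quotient(f, x, h):
    # (E_x f(X_{tau_h}) - f(x)) / E_x tau_h for Brownian motion on R,
    # using E_x f(X_{tau_h}) = (f(x+h) + f(x-h))/2 and E_x tau_h = h^2
    return (0.5 * (f(x + h) + f(x - h)) - f(x)) / h**2

f = math.sin
x = 0.7
half_f2 = -0.5 * math.sin(x)  # (1/2) f''(x) for f = sin

for h in (0.1, 0.01, 0.001):
    q = dynkin_quotient(f, x, h)
    print(h, q, abs(q - half_f2))
```

The error shrinks like h², in line with the limit h → 0 in (21).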


Theorem 19.24 (Feller diffusions and elliptic operators, Dynkin) Let (A, D) be the generator of a Feller process X in R^d, and assume that C_K^∞ ⊂ D. Then X is continuous on [0, ζ), a.s. P_ν for every ν, iff A is local on C_K^∞. In that case there exist some functions a_ij, b_i, c ∈ C(R^d), where c ≥ 0 and the a_ij form a symmetric, nonnegative definite matrix, such that for any f ∈ C_K^∞ and x ∈ R^d,

Af(x) = ½ Σ_{i,j} a_ij(x) f″_{ij}(x) + Σ_i b_i(x) f′_i(x) − c(x) f(x).  (23)

In the situation described by this result, we may choose Ω to consist of all paths that are continuous on [0, ζ). The resulting Markov process is referred to as a canonical Feller diffusion.

Proof: If X is continuous on [0, ζ), then A is local by Theorem 19.23. Conversely, assume that A is local on C_K^∞. Fix any x ∈ R^d and 0 < h < m, and choose f ∈ C_K^∞ with f ≥ 0 and support in {y; h ≤ |y − x| ≤ m}. Then Af(y) = 0 for all y ∈ B_x^h, and so Lemma 19.21 shows that f(X_{t∧τ_h}) is a martingale under P_x. By dominated convergence we get E_x f(X_{τ_h}) = 0, and since m was arbitrary,

P_x{|X_{τ_h} − x| ≤ h or X_{τ_h} = Δ} = 1,  x ∈ R^d, h > 0.

Applying the Markov property at fixed times, we obtain the corresponding statement for an arbitrary initial distribution ν, which implies that, a.s. P_ν, the path of X has no jump of size greater than h before time ζ. Since h > 0 was arbitrary, X is continuous on [0, ζ) a.s. P_ν.

To show that (23) holds for suitable a_ij, b_i, and c, we choose for every x ∈ R^d some functions f_0^x, f_i^x, f_ij^x ∈ C_K^∞ such that, for any y close to x,

f_0^x(y) = 1,  f_i^x(y) = y_i − x_i,  f_ij^x(y) = (y_i − x_i)(y_j − x_j).

Putting

c(x) = −A f_0^x(x),  b_i(x) = A f_i^x(x),  a_ij(x) = A f_ij^x(x),

we note that (23) holds locally for any function f ∈ C_K^∞ that agrees near x with a second-degree polynomial. In particular, we may choose f_0(y) = 1, f_i(y) = y_i, and f_ij(y) = y_i y_j near x to obtain

A f_0(x) = −c(x),
A f_i(x) = b_i(x) − x_i c(x),
A f_ij(x) = a_ij(x) + x_i b_j(x) + x_j b_i(x) − x_i x_j c(x).

This shows that c, b_i, and a_ij = a_ji are continuous.

Applying the local positive-maximum principle to f_0^x gives c(x) ≥ 0. By the same principle applied to the function −Σ_{i,j} u_i u_j f_ij^x, we get Σ_{i,j} u_i u_j a_ij(x) ≥ 0, which shows that (a_ij) is nonnegative definite.

Finally, we consider any function f ∈ C_K^∞ with a second-order Taylor expansion f̂ around x. Here each function

g_±(y) = ±(f(y) − f̂(y)) − ε|x − y|²,  ε > 0,

has a local maximum 0 at x, and so

A g_±(x) = ±(Af(x) − A f̂(x)) − ε Σ_i a_ii(x) ≤ 0,  ε > 0.

Letting ε → 0 gives Af(x) = A f̂(x), which shows that (23) is generally true. □

We consider next a basic convergence theorem for Feller processes, essentially generalizing the result for Lévy processes in Theorem 15.17.

Theorem 19.25 (convergence, Trotter, Sova, Kurtz, Mackevičius) Let X and X^n be Feller processes in S with semigroups (T_t) and (T_{n,t}) and generators (A, 𝒟) and (A_n, 𝒟_n), respectively. Fix a core D for A. Then these conditions are equivalent:

(i) If f ∈ D, there exist some f_n ∈ 𝒟_n with f_n → f and A_n f_n → Af.
(ii) T_{n,t} → T_t strongly for each t > 0.
(iii) T_{n,t} f → T_t f for every f ∈ C_0, uniformly for bounded t > 0.
(iv) If X_0^n →^d X_0 in S, then X^n →^d X in D(R_+, S).

For the proof we need two lemmas, the first of which extends Lemma 19.7.

Lemma 19.26 (norm inequality) Let (T_t) and (T_t′) be Feller semigroups with generators (A, 𝒟) and (A′, 𝒟′), respectively, where A′ is bounded. Then

||T_t f − T_t′ f|| ≤ ∫_0^t ||(A − A′) T_s f|| ds,  f ∈ 𝒟, t ≥ 0.  (24)

Proof: Fix any f ∈ 𝒟 and t > 0. Since (T_s′) is norm continuous, we get by Theorem 19.6

(d/ds)(T′_{t−s} T_s f) = T′_{t−s}(A − A′) T_s f,  0 ≤ s ≤ t.

Here the right-hand side is continuous in s, because of the strong continuity of (T_s), the boundedness of A′, the commutativity of A and T_s, and the norm continuity of (T_s′). Hence,

T_t f − T_t′ f = ∫_0^t (d/ds)(T′_{t−s} T_s f) ds = ∫_0^t T′_{t−s}(A − A′) T_s f ds,

and (24) follows by the contractivity of T′_{t−s}. □

We may next establish a continuity property for the Yosida approximations A^λ and A_n^λ of A and A_n, respectively.

Lemma 19.27 (continuity of Yosida approximation) Let (A, 𝒟) and (A_n, 𝒟_n) be the generators of some Feller semigroups satisfying condition (i) of Theorem 19.25. Then A_n^λ → A^λ strongly for every λ > 0.

Proof: By Lemma 19.8 it suffices to show that A_n^λ f → A^λ f for every f ∈ (λ − A)D. Then define g = R_λ f ∈ D. By (i) we may choose some g_n ∈ 𝒟_n with g_n → g and A_n g_n → Ag. Then f_n = (λ − A_n) g_n → (λ − A) g = f, and so

||A_n^λ f − A^λ f|| = λ² ||R_n^λ f − R_λ f||
  ≤ λ² ||R_n^λ (f − f_n)|| + λ² ||R_n^λ f_n − R_λ f||
  ≤ λ ||f − f_n|| + λ² ||g_n − g|| → 0. □
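On a finite state space the generator is a matrix Q, the resolvent is R_λ = (λI − Q)^{−1}, and the Yosida approximation takes the explicit form A^λ = λ²R_λ − λI. A small numerical sketch (the matrix Q below is my own illustrative example, not from the text) of the convergence A^λ → Q as λ → ∞:

```python
import numpy as np

# Generator of a 3-state continuous-time chain: nonnegative off-diagonal
# rates, rows summing to zero (an illustrative example)
Q = np.array([[-2.0, 1.0, 1.0],
              [0.5, -1.5, 1.0],
              [1.0, 2.0, -3.0]])
I = np.eye(3)

def yosida(Q, lam):
    # A^lambda = lambda^2 (lambda I - Q)^{-1} - lambda I
    R = np.linalg.inv(lam * I - Q)  # resolvent R_lambda
    return lam**2 * R - lam * I

for lam in (1.0, 10.0, 100.0, 1000.0):
    print(lam, np.abs(yosida(Q, lam) - Q).max())
```

Since A^λ = λR_λ Q, the error is of order ||Q²||/λ, visible in the printed maxima.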

Proof of Theorem 19.25: First we show that (i) implies (iii). Since D is dense in C_0, it is enough to verify (iii) for f ∈ D. Then choose some functions f_n as in (i), and conclude by Lemmas 19.7 and 19.26 that, for any n ∈ N and t, λ > 0,

||T_{n,t} f − T_t f|| ≤ ||T_{n,t}(f − f_n)|| + ||(T_{n,t} − T_{n,t}^λ) f_n|| + ||T_{n,t}^λ (f_n − f)|| + ||(T_{n,t}^λ − T_t^λ) f|| + ||(T_t^λ − T_t) f||
  ≤ 2 ||f_n − f|| + t ||(A^λ − A) f|| + t ||(A_n − A_n^λ) f_n|| + ∫_0^t ||(A_n^λ − A^λ) T_s^λ f|| ds.  (25)

By Lemma 19.27 and dominated convergence, the last term tends to zero as n → ∞. For the third term on the right, we get

||(A_n − A_n^λ) f_n|| ≤ ||A_n f_n − Af|| + ||(A − A^λ) f|| + ||(A^λ − A_n^λ) f|| + ||A_n^λ (f − f_n)||,

which tends to ||(A − A^λ) f|| by the same lemma. Hence, by (25),

limsup_{n→∞} sup_{t≤u} ||T_{n,t} f − T_t f|| ≤ 2u ||(A^λ − A) f||,  u, λ > 0,

and the desired convergence follows by Lemma 19.7 as we let λ → ∞.

Conversely, (iii) trivially implies (ii), and so the equivalence of (i)-(iii) will follow if we can show that (ii) implies (i). Then fix any f ∈ D and λ > 0, and define g = (λ − A) f and f_n = R_n^λ g. Assuming (ii), we get by dominated convergence f_n → R_λ g = f. Since (λ − A_n) f_n = g = (λ − A) f, we also note that A_n f_n → Af. Thus, even (i) holds.

It remains to show that conditions (i)-(iii) are equivalent to (iv). For convenience, we may then assume that S is compact and the semigroups (T_t) and (T_{n,t}) are conservative. First assume (iv). We may establish (ii) by showing that, for any f ∈ C and t > 0, we have T_t f(x_n) → T_t f(x) whenever x_n → x in S. Then assume that X_0 = x and X_0^n = x_n. By Lemma 19.3 the process X is a.s. continuous at t. Thus, (iv) yields X_t^n →^d X_t, and the desired convergence follows.

Conversely, assume conditions (i)-(iii), and let X_0^n →^d X_0. To obtain

X^n →^d X, it is enough to show that, for any f_0, ..., f_m ∈ C and 0 = t_0 < t_1 < ··· < t_m,

E Π_{k≤m} f_k(X_{t_k}^n) → E Π_{k≤m} f_k(X_{t_k}).  (26)

This holds by hypothesis when m = 0. Proceeding by induction, we may use the Markov property to rewrite (26) in the form

E Π_{k<m} f_k(X_{t_k}^n) · T_{n,h_m} f_m(X_{t_{m−1}}^n) → E Π_{k<m} f_k(X_{t_k}) · T_{h_m} f_m(X_{t_{m−1}}),

where h_m = t_m − t_{m−1}. … for arbitrary λ > 0 and g ∈ C_0, where R̃_n^λ = (λ − A_n)^{−1}. Now (ii) yields R_n^λ g → R_λ g, where R_n^λ = ∫ e^{−λt} T_{n,t} dt, and so it suffices to prove that (R_n^λ − R̃_n^λ) g → 0. Then note that

λ R_n^λ g − λ R̃_n^λ g = E g(Y^n_{κ_n − 1}) − E g(Y^n_{κ̃_n − 1}),

where the random variables κ_n and κ̃_n are independent of Y^n and geometrically distributed with parameters p_n = 1 − e^{−λ h_n} and p̃_n = λ h_n (1 + λ h_n)^{−1}, respectively. Since p_n ~ p̃_n, we have ||L(κ_n) − L(κ̃_n)|| → 0, and the desired convergence follows by Fubini's theorem. □
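To see why the two geometric laws merge (my own numerical sketch, not part of the text): with λ fixed and h_n → 0, both parameters p_n = 1 − e^{−λh_n} and p̃_n = λh_n/(1 + λh_n) behave like λh_n, and the total variation distance between the corresponding geometric distributions tends to 0. The helper names below are mine:

```python
import math

def geom_pmf(p, k):
    # P{kappa = k} for a geometric distribution on {1, 2, ...}
    return p * (1 - p) ** (k - 1)

def tv_geom(p, q, kmax=10000):
    # total variation distance between two geometric laws (truncated sum)
    return 0.5 * sum(abs(geom_pmf(p, k) - geom_pmf(q, k)) for k in range(1, kmax + 1))

lam = 2.0
for h in (0.5, 0.05, 0.005):
    p = 1 - math.exp(-lam * h)
    q = lam * h / (1 + lam * h)
    print(h, tv_geom(p, q))
```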

Exercises

1. Examine how the proofs of Theorems 19.4 and 19.6 can be simplified if we assume (F_3) instead of the weaker condition (F_2).

2. Consider a pseudo-Poisson process X on S with rate kernel α. Give conditions ensuring X to be Feller.

3. Verify the resolvent equation (3), and conclude that the range of R_λ is independent of λ.

4. Show that a Feller semigroup (T_t) is uniquely determined by the resolvent operator R_λ for a fixed λ > 0. Interpret the result probabilistically in terms of an independent, exponentially distributed random variable with mean λ^{−1}. (Hint: Use Theorem 19.4 and Lemma 19.5.)

5. Consider a discrete-time Markov process in S with transition operator T, and let τ be an independent random variable with a fixed geometric distribution. Show that T is uniquely determined by E_x f(X_τ) for arbitrary x ∈ S and f ≥ 0. (Hint: Apply the preceding result to the associated pseudo-Poisson process.)

6. Give a probabilistic description of the Yosida approximation T_t^λ in terms of the original process X and two independent Poisson processes with rate λ.

7. Given a Feller diffusion semigroup, write the second differential equation in Theorem 19.6, for suitable f, as a PDE for the function T_t f(x) on R_+ × R^d. Also show that the backward equation of Theorem 12.22 is a special case of the same equation.


8. Consider a Feller process X and an independent subordinator T. Show that Y_t = X(T_t) is again Markov, and that Y is Lévy whenever this is true for X. If both T and X are stable, then so is Y. Find the relationship between the transition semigroups, respectively between the indices of stability.

9. Consider a Feller process X and an independent renewal process T_0, T_1, .... Show that Y_n = X_{T_n} is a discrete-time Markov process, and express its transition kernel in terms of the transition semigroup of X. Also show that Y_t = X(T_{[t]}) may fail to be Markov, even when (T_n) is Poisson.

10. Let X and Y be independent Feller processes in S and T with generators A and B. Show that (X, Y) is a Feller process in S × T with generator extending Ā + B̄, where Ā and B̄ denote the natural extensions of A and B to C_0(S × T).

11. Consider in S a Feller process with generator A and a pseudo-Poisson process with generator B. Construct a Markov process with generator A + B.

12. Use Theorem 19.23 to show that the generator of Brownian motion in R extends A = ½Δ on the set D of functions f ∈ C_0^2 with Af ∈ C_0.

13. Let R_λ be the λ-resolvent of Brownian motion in R. For any f ∈ C_0, put h = R_λ f, and show by direct computation that λh − ½h″ = f. Conclude by Theorem 19.4 that ½Δ with domain D, defined as above, extends the generator A. Thus, A = ½Δ by the preceding exercise or by Lemma 19.12.

14. Show that if A is a bounded generator on C_0, then the associated Markov process is pseudo-Poisson. (Hint: Note as in Theorem 19.11 that A satisfies the positive-maximum principle. Next use Riesz' representation theorem to express A in terms of bounded kernels, and show that A has the form of Proposition 19.2.)

15. Let the processes X^n and X be such as in Theorem 16.14. Show that X_t^n →^d X_t for all t > 0 implies X^n →^d X in D(R_+, R^d), and compare with the stated theorem. Also prove a corresponding result for a sequence of Lévy processes X^n. (Hint: Use Theorems 19.28 and 19.25, respectively.)

Chapter 20

Ergodic Properties of Markov Processes

transition and contraction operators; ratio ergodic theorem; space-time invariance and tail triviality; mixing and convergence in total variation; Harris recurrence and transience; existence and uniqueness of invariant measure; distributional and pathwise limits

In Chapters 8 and 12 we have seen, under suitable regularity conditions, how the transition probabilities of a discrete- or continuous-time Markov chain converge in total variation toward a unique invariant distribution. Here our main purpose is to study the asymptotic behavior of more general Markov processes and their associated transition kernels. A wide range of powerful tools will then come into play. We first extend the basic ergodic theorem of Chapter 10 to suitable contraction operators on an arbitrary measure space and establish a general operator version of the ratio ergodic theorem. The relevance of those results for the study of Markov processes is due to the fact that the transition operators are positive L¹-L∞-contractions with respect to any invariant measure λ on the state space S. The mentioned results cover both the positive recurrent case, where λS < ∞, and the null-recurrent case, where λS = ∞. Even more remarkably, the same ergodic theorems apply to both the transition probabilities and the sample paths, in each case giving conclusive information about the asymptotic behavior.

Next we prove for an arbitrary Markov process that a certain strong ergodicity condition is equivalent to the triviality of the tail σ-field, the constancy of all bounded, space-time invariant functions, and a uniform mixing condition. We also consider a similar result where all four conditions are replaced by suitably averaged versions. For both sets of equivalences, one gets very simple and transparent proofs by applying the general coupling results of Chapter 10.

In order to apply the mentioned theorems to specific Markov processes, one needs to find regularity conditions ensuring the existence of an invariant measure or the triviality of the tail σ-field. Here we consider a general class of Feller processes which satisfy either a strong recurrence or a uniform transience condition. In the former case, we prove the existence of an invariant measure, required for the application of the mentioned ergodic


theorems, and show that the space-time invariant functions are constant, which implies the mentioned strong ergodicity. Our proofs of the latter results depend on some potential-theoretic tools related to those developed in Chapter 19.

To begin with the technical developments, we consider a Markov transition operator T on an arbitrary measurable space (S, 𝒮). Note that T is positive, in the sense that f ≥ 0 implies Tf ≥ 0, and also that T1 = 1. As before, we write P_x for the distribution of a Markov process on Z_+ with transition operator T starting at x ∈ S. More generally, we define P_μ = ∫_S P_x μ(dx) for any measure μ on S. A measure λ on S is said to be invariant if λTf = λf for any measurable function f ≥ 0. Writing θ for the shift on the path space S^∞, we define the associated operator θ̂ by

θ̂ f = f ∘ θ.

For any p ≥ 1, we say that an operator T on some measure space (S, 𝒮, μ) is an L^p-contraction if ||Tf||_p ≤ ||f||_p for every f ∈ L^p. By an L¹-L∞-contraction we mean an operator that is an L^p-contraction for every p ∈ [1, ∞]. The following result shows the relevance of the mentioned notions for the theory of Markov processes.

Lemma 20.1 (Markov processes and contractions) Let T be a Markov transition operator on (S, 𝒮) with invariant measure λ. Then

(i) T is a positive L¹-L∞-contraction on (S, λ);
(ii) θ̂ is a positive L¹-L∞-contraction on (S^∞, P_λ).

Proof: (i) Applying Jensen's inequality to the transition kernel μ(x, B) = T1_B(x) and using the invariance of λ, we get for any p ∈ [1, ∞) and f ∈ L^p

||Tf||_p^p = λ|Tf|^p ≤ λ T|f|^p = λ|f|^p = ||f||_p^p.

The result for p = ∞ is obvious.

(ii) Proceeding as in Lemma 8.11, we see that θ is a measure-preserving transformation on (S^∞, P_λ). Hence, for any measurable function f ≥ 0 on S^∞ and constant p ≥ 1, we have

||θ̂ f||_p^p = P_λ |f ∘ θ|^p = P_λ |f|^p = ||f||_p^p.

The contraction property for p = ∞ is again obvious. □
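A finite-state sketch of part (i) (my own illustration, not from the text): for a stochastic matrix T with stationary distribution λ, the map f ↦ Tf contracts every L^p(λ)-norm, exactly as Jensen's inequality plus invariance predicts. The matrix and helper names below are mine:

```python
import numpy as np

# Transition matrix of a 3-state chain (illustrative example)
T = np.array([[0.2, 0.5, 0.3],
              [0.4, 0.4, 0.2],
              [0.3, 0.3, 0.4]])

# Invariant measure: left eigenvector of T for eigenvalue 1, normalized
w, V = np.linalg.eig(T.T)
lam = np.real(V[:, np.argmin(np.abs(w - 1))])
lam = lam / lam.sum()

def lp_norm(f, p):
    # L^p norm with respect to the invariant measure lam
    return (lam @ np.abs(f) ** p) ** (1 / p)

rng = np.random.default_rng(0)
f = rng.normal(size=3)
for p in (1, 2, 5):
    print(p, lp_norm(T @ f, p), lp_norm(f, p))
```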

We shall see how some crucial results of Chapter 10 carry over to the context of positive L¹-L∞-contractions on an arbitrary measure space. First we consider an operator version of Birkhoff's ergodic theorem. To simplify our writing, we introduce the operators S_n = Σ_{k<n} T^k and A_n = n^{−1} S_n, together with the maximum Mf = sup_n A_n f.

Theorem 20.2 (operator ergodic theorem) Let T be a positive L¹-L∞-contraction on a measure space (S, 𝒮, μ). Then for any f ∈ L¹, the averages A_n f converge a.e. toward some T-invariant limit Af.

Lemma 20.3 (maximal ergodic lemma, Hopf) Let T be a positive L¹-contraction on (S, 𝒮, μ). Then

(i) μ[f; Mf > 0] ≥ 0, f ∈ L¹.

If T is even an L¹-L∞-contraction, then also

(ii) r μ{Mf > 2r} ≤ μ[f; f > r], f ∈ L¹, r > 0;
(iii) ||Mf||_p ≲ ||f||_p, f ∈ L^p, p > 1.
Proof: (i) For any f ∈ L¹ we write M_n f = S_1 f ∨ ··· ∨ S_n f and conclude by positivity that

S_k f = f + T S_{k−1} f ≤ f + T(M_n f)_+,  k = 1, ..., n.

Hence, M_n f ≤ f + T(M_n f)_+ for all n, and so by positivity and contractivity

μ[f; M_n f > 0] ≥ μ[M_n f − T(M_n f)_+; M_n f > 0]
  ≥ μ[(M_n f)_+ − T(M_n f)_+] = ||(M_n f)_+||_1 − ||T(M_n f)_+||_1 ≥ 0.

As before, it remains to let n → ∞.

(ii) Put f_r = f 1{f > r}. By the L∞-contractivity and positivity of A_n,

A_n f − 2r ≤ A_n(f − 2r) ≤ A_n(f_r − r),  n ∈ N,

which implies Mf − 2r ≤ M(f_r − r). Hence, by part (i),

r μ{Mf > 2r} ≤ r μ{M(f_r − r) > 0} ≤ μ[f_r; M(f_r − r) > 0] ≤ μ f_r = μ[f; f > r].

(iii) Here the earlier proof applies with only notational changes. □

Proof of Theorem 20.2: Fix any f ∈ L¹. By dominated convergence, we may approximate f in L¹ by functions f_1 ∈ L¹ ∩ L∞ ⊂ L². By Lemma 10.18, we may next approximate f_1 in L² by functions of the form f_2 + (g − Tg), where f_2, g ∈ L² and T f_2 = f_2. Finally, we may approximate g in L² by functions g_1 ∈ L¹ ∩ L∞. Since T contracts L², the functions g_1 − T g_1 will then approximate g − Tg in L². Combining the three approximations, we have for any ε > 0

f = f_ε + (g_ε − T g_ε) + h_ε + k_ε,  (1)

where f_ε ∈ L² with T f_ε = f_ε, g_ε ∈ L¹ ∩ L∞, and ||h_ε||_2 ∨ ||k_ε||_1 < ε.

Since f_ε is invariant, we have A_n f_ε = f_ε. Next we note that

||A_n(g_ε − T g_ε)||_∞ = n^{−1} ||g_ε − T^n g_ε||_∞ ≤ 2 n^{−1} ||g_ε||_∞ → 0.  (2)

Hence,

limsup_{n→∞} A_n f ≤ f_ε + M h_ε + M k_ε < ∞ a.e.,

and similarly for liminf_n A_n f. Combining the two estimates gives

(limsup_n − liminf_n) A_n f ≤ 2 M|h_ε| + 2 M|k_ε|.

Now Lemma 20.3 yields for any ε, r > 0

μ{M|k_ε| > 2r} ≤ r^{−1} ||k_ε||_1 < ε/r,

and similarly for h_ε, and so M|h_ε| + M|k_ε| → 0 a.e. as ε → 0 along a suitable sequence. Thus, A_n f converges a.e. toward some limit Af. To see that Af is T-invariant, we note that by (1) and (2) the a.e. limits Ah_ε and Ak_ε exist and satisfy T Af − Af = (TA − A)(h_ε + k_ε). By the contraction property and Fatou's lemma, the right-hand side tends to 0 a.e. as ε → 0 along some sequence, and we get T Af = Af a.e. □

A problem with the last theorem is that the limit Af may be 0, in which case the a.s. convergence A_n f → Af gives little information about the asymptotic behavior of A_n f. For example, this happens when μS = ∞ and T is the operator induced by a μ-preserving and ergodic transformation θ on S. Then Af is a constant, and the condition Af ∈ L¹ implies Af = 0. To get around this difficulty, we may instead compare the asymptotic behavior of S_n f with that of S_n g for a suitable reference function g ∈ L¹_+. This idea leads to a far-reaching and powerful extension of Birkhoff's theorem.

Theorem 20.4 (ratio ergodic theorem, Chacón and Ornstein) Let T be a positive L¹-contraction on a measure space (S, 𝒮, μ), and fix any f ∈ L¹ and g ∈ L¹_+. Then S_n f / S_n g converges a.e. on the set {S_∞ g > 0}.

Our proof will be based on three lemmas.

Lemma 20.5 (individual terms) T^n f / S_{n+1} g → 0 a.e. on {S_∞ g > 0}.

Proof: We may assume that f ≥ 0. Fix any ε > 0, and define

h_n = T^n f − ε S_{n+1} g,  A_n = {h_n > 0},  n ≥ 0.

By positivity,

h_n = T h_{n−1} − ε g ≤ T h_{n−1}^+ − ε g,  n ≥ 1.

Examining the cases A_n and A_n^c separately, we conclude that

h_n^+ ≤ T h_{n−1}^+ − ε 1_{A_n} g,  n ≥ 1,

and so by contractivity

ε μ[g; A_n] ≤ μ(T h_{n−1}^+) − μ h_n^+ ≤ μ h_{n−1}^+ − μ h_n^+.

Summing over n gives

ε μ Σ_{n≥1} 1_{A_n} g ≤ μ h_0^+ = μ(f − εg)^+ ≤ μ f < ∞,

which implies μ[g; A_n i.o.] = 0 and hence limsup_n (T^n f / S_{n+1} g) ≤ ε a.e. on {g > 0}. Since ε was arbitrary, we obtain T^n f / S_{n+1} g → 0 a.e. on {g > 0}. Applying this result to the functions T^m f and T^m g gives the same convergence on {S_{m−1} g = 0 < S_m g} for arbitrary m > 1. □

To state the next result, we introduce the nonlinear filling operator U on L¹, given by Uh = T h_+ − h_−. It is suggestive to think of the sequence U^n h as resulting from successive attempts to fill a hole h_−, by mapping in each step only the matter that has not yet fallen into the hole. We also define M_n h = S_1 h ∨ ··· ∨ S_n h.
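A tiny concrete illustration of the filling operator (my own example, not from the text): on a two-point space with a doubly stochastic T, each application of U moves the remaining positive mass h_+ forward by T and uses it to fill the hole h_−; once Uh ≥ 0, the hole is filled:

```python
import numpy as np

T = np.array([[0.5, 0.5],
              [0.5, 0.5]])  # a Markov transition matrix (illustrative)

def U(h):
    # filling operator: Uh = T h_+ - h_-
    return T @ np.maximum(h, 0.0) - np.maximum(-h, 0.0)

h = np.array([1.0, -0.5])   # positive mass at state 0, a "hole" at state 1
for k in range(4):
    print(k, h)
    h = U(h)
```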

Lemma 20.6 (filling operator) For any h ∈ L¹ and n ∈ N, we have U^{n−1} h ≥ 0 on {M_n h > 0}.

Proof: Writing h_k = h_+ + (Uh)_+ + ··· + (U^k h)_+, we claim that

S_{k+1} h ≤ h_k,  k ≥ 0.  (3)

This holds for k = 0 since h_+ = h + h_− ≥ h. Proceeding by induction, we assume (3) to be true for k = m ≥ 0. Using the induction hypothesis and the definitions of S_k, h_k, and U, we get for m + 1

h + T S_{m+1} h ≤ h + T h_m = h + Σ_{k≤m} T(U^k h)_+
  = h + Σ_{k≤m} (U^{k+1} h + (U^k h)_−)
  = h + Σ_{k≤m} ((U^{k+1} h)_+ − (U^{k+1} h)_− + (U^k h)_−)
  = h + h_{m+1} − h_+ + h_− − (U^{m+1} h)_− ≤ h_{m+1}.

This completes the proof of (3). If M_n h > 0 at some point in S, then S_k h > 0 for some k ≤ n, and so by (3) we have h_k > 0 for some k < n. But then (U^k h)_+ > 0, and therefore (U^k h)_− = 0 for the same k. Since (U^k h)_− is nonincreasing, it follows that (U^{n−1} h)_− = 0, and hence U^{n−1} h ≥ 0. □

To state our third and crucial lemma, we write g ∈ T_1(f) for a given f ∈ L¹_+ if there exists a decomposition f = f_1 + f_2 with f_1, f_2 ∈ L¹_+ such that g = T f_1 + f_2. In particular, we note that f, g ∈ L¹_+ implies U(f − g) = f′ − g for some f′ ∈ T_1(f). The classes T_n(f) are defined recursively by T_{n+1}(f) = T_1(T_n(f)), and we put T(f) = ⋃_n T_n(f). We may now introduce the functionals

ψ_B f = sup{μ[g; B]; g ∈ T(f)},  f ∈ L¹_+, B ∈ 𝒮.


Lemma 20.7 (filling functionals) Let f, g ∈ L¹_+ and B ∈ 𝒮. Then

B ⊂ {limsup_n S_n(f − g) > 0}  ⟹  ψ_B f ≥ ψ_B g.

Proof: Fix any g′ ∈ T(g) and c > 1. First we show that

{limsup_n S_n(f − g) > 0} ⊂ {limsup_n S_n(cf − g′) > 0} a.e.  (4)

We may then assume that g′ ∈ T_1(g), since the general result then follows by iteration in finitely many steps. Letting g′ = r + Ts for some r, s ∈ L¹_+ with r + s = g, we obtain

S_n(cf − g′) = Σ_{k<n} T^k(cf − r − Ts) … > 0, we get

=

U1K(x) ~ b- 1 Uh(x) ~ b- 1 (1- r)- 1 < oo,

x ES,

which shows that Xis uniformly transient. Now assume instead that Uhh 1. Fix any measurable function f on S with 0 ~ f ~ h and pf > 0, and put g = 1- Utf. By Lemma 20.13 we get

=

g

= =

1- Utf = Uhh- Uhf- Uh(h- j)Utf Uh(h- !)(1- Utf) Uh(h- f)g ~ Uhhg = Qg.

(19)

Iterating this relation and using Lemma 20.16 (ii), we obtain g ~ Qng -t vg, where V rv p is the unique Q-invariant distributionOll s. Inserting this into (19) gives g ~ Uh(h- f)vg, and so by Lemma 20.15 (ii) vg ~ v(Uh(h- !)) vg ~ (1- p(kf)) vg.

Since p(kf) > 0, we obtain vg = 0, and so UJ f = 1 - g = 1 a.e. v ,...., p. Recalling that Utf is continuous by Lemma 20.14 and suppp = S, we obtain Utf = 1. Taking expected values in (13), we conclude that Ato = oo a.s. Px for every x E S. Now fix any compact set K C S with pK > 0. Since b infK h > 0, we may choose f = b1K, and the desired Harris recurrence follows. D

=

A measure ,\ on S is said to be invariant for the semigroup (Tt) if .\(Ttf) = .\f for all t > 0 and every measurable function f 2:: 0 on S. In the Harris recurrent case, the existence of an invariant measure ,\ can be inferred from Lemma 20.16.

Theorem 20.18 (invariant measure, Harris, Watanabe) Any Harris recurrent Feller process on S with supporting measure ρ has a locally finite, invariant measure λ ~ ρ, and every σ-finite, invariant measure agrees with λ up to a normalization.

To prepare for the proof, we first express the required invariance in terms of the resolvent operators.


Lemma 20.19 (invariance equivalence) Let (T_t) be a Feller semigroup on S with resolvent (U_a), and fix any locally finite measure λ on S and constant c > 0. Then λ is (T_t)-invariant iff it is aU_a-invariant for every a ≥ c.

Proof: If λ is (T_t)-invariant, then Fubini's theorem yields for any measurable function f ≥ 0 and constant a > 0

λ(U_a f) = ∫_0^∞ e^{−at} λ(T_t f) dt = ∫_0^∞ e^{−at} λf dt = λf / a,  (20)

which shows that λ is aU_a-invariant.

Conversely, assume that λ is aU_a-invariant for every a ≥ c. Then for any measurable function f ≥ 0 on S with λf < ∞, the integrals in (20) agree for all a ≥ c. Hence, by Theorem 5.3 the measures λ(T_t f) e^{−ct} dt and λf e^{−ct} dt agree on R_+, which implies λ(T_t f) = λf for almost every t ≥ 0. By the semigroup property and Fubini's theorem we then obtain for any t ≥ 0

λ(T_t f) = c λ U_c(T_t f) = c λ ∫_0^∞ e^{−cs} T_s T_t f ds
  = c ∫_0^∞ e^{−cs} λ(T_{s+t} f) ds = c ∫_0^∞ e^{−cs} λf ds = λf,

which shows that λ is (T_t)-invariant. □
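A finite-state sketch of the first implication (my own illustration, not from the text): if Q generates the semigroup T_t = e^{tQ} and λQ = 0, then λ is aU_a-invariant for every a > 0, with U_a = (aI − Q)^{−1}:

```python
import numpy as np

Q = np.array([[-1.0, 1.0, 0.0],
              [0.5, -1.5, 1.0],
              [0.5, 0.5, -1.0]])  # generator matrix (illustrative)

# invariant measure: lam Q = 0, normalized to a probability vector
w, V = np.linalg.eig(Q.T)
lam = np.real(V[:, np.argmin(np.abs(w))])
lam = lam / lam.sum()

for a in (0.5, 1.0, 3.0):
    Ua = np.linalg.inv(a * np.eye(3) - Q)   # resolvent U_a
    print(a, np.abs(a * (lam @ Ua) - lam).max())
```

The identity is exact here: λQ = 0 gives λ(aI − Q) = aλ, hence a λU_a = λ.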

Proof of Theorem 20.18: Let h, Q, and ν be such as in Lemmas 20.15 and 20.16, and put λ = h^{−1} · ν. Using the definition of λ (twice), the Q-invariance of ν (three times), and Lemma 20.13, we get for any constant a ≥ ||h|| and bounded, measurable function f ≥ 0 on S

a λ U_a f = a ν(h^{−1} U_a f) = a ν U_h U_a f = ν(U_h f − U_a f + U_h h U_a f)
  = ν U_h f = ν(h^{−1} f) = λ f,

which shows that λ is aU_a-invariant for every such a. By Lemma 20.19 it follows that λ is also (T_t)-invariant.

To prove the asserted uniqueness, consider any σ-finite, (T_t)-invariant measure λ′ on S. By Lemma 20.19, λ′ is even aU_a-invariant for every a ≥ ||h||. Now define ν′ = h · λ′. Letting f ≥ 0 be bounded and measurable on S and using Lemma 20.13, we get as before

ν′(Q f) = λ′(h U_h(h f)) = a λ′ U_a h U_h(h f)
  = a λ′(U_a(h f) − U_h(h f) + a U_a U_h(h f))
  = a λ′ U_a(h f) = λ′(h f) = ν′ f,


which shows that ν′ is Q-invariant. Hence, the uniqueness part of Lemma 20.16 (ii) yields ν′ = c ν for some constant c ≥ 0, which implies λ′ = c λ. □
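In the finite-state picture (my own illustration, not from the text), the uniqueness statement corresponds to the null space of Qᵀ being one-dimensional for an irreducible generator, so the invariant measure is unique up to scaling:

```python
import numpy as np

Q = np.array([[-2.0, 2.0, 0.0],
              [1.0, -3.0, 2.0],
              [1.0, 1.0, -2.0]])  # irreducible generator (illustrative)

w, V = np.linalg.eig(Q.T)
null_dim = int(np.sum(np.abs(w) < 1e-10))
lam = np.real(V[:, np.argmin(np.abs(w))])
lam = lam / lam.sum()

print("dim ker Q^T =", null_dim)   # 1: invariant measure unique up to scaling
print("lam =", lam, "lam Q =", lam @ Q)
```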

A Harris recurrent Feller process is said to be positive recurrent if the invariant measure λ is bounded and null-recurrent otherwise. In the former case, we may assume that λ is a probability measure on S. For any process X in S, the divergence X_t → ∞ a.s. or X_t →^P ∞ means that 1_K(X_t) → 0 in the same sense for every compact set K ⊂ S.

Theorem 20.20 (distributional limits) For any regular Feller process X and distribution μ on S, the following holds as t → ∞:

(i) If X is positive recurrent with invariant distribution λ and A ∈ F_∞ with P_μ A > 0, then ||P_μ[ · | A] ∘ θ_t^{−1} − P_λ|| → 0.
(ii) If X is null-recurrent or transient, then X_t →^P ∞.

Proof: (i) Since P_λ ∘ θ_t^{−1} = P_λ by Lemma 8.11, the assertion follows from Theorem 20.12 together with properties (ii) and (iv) of Theorem 20.10.

(ii) (null-recurrent case): For any compact set K ⊂ S and constant ε > 0, we define

B_t = {x ∈ S; T_t 1_K(x) > μ T_t 1_K − ε},  t > 0,

and note that, for any invariant measure λ,

λK = λ(T_t 1_K) ≥ λ[T_t 1_K; B_t] ≥ (μ T_t 1_K − ε) λ B_t.  (21)

Since μ T_t 1_K − T_t 1_K(x) → 0 for all x ∈ S by Theorem 20.12, we have liminf_t B_t = S, and so λ B_t → ∞ by Fatou's lemma. Hence, (21) yields limsup_t μ T_t 1_K ≤ ε, and since ε was arbitrary, we obtain P_μ{X_t ∈ K} = μ T_t 1_K → 0.

(ii) (transient case): Fix any compact set K ⊂ S with ρK > 0, and conclude from the uniform transience of X that U 1_K is bounded. Hence, by the Markov property at t and dominated convergence,

E_μ U 1_K(X_t) = E_μ ∫_t^∞ 1_K(X_r) dr → 0,

which shows that U 1_K(X_t) →^P 0. Since U 1_K is strictly positive and also continuous by Lemma 20.14, we conclude that X_t →^P ∞. □

We complete our discussion of regular Feller processes with a pathwise limit theorem. Recall that "almost surely" means a.s. P_μ for every initial distribution μ on S.


Theorem 20.21 (pathwise limits) For any regular Feller process X on S, the following holds as t → ∞:

(i) If X is positive recurrent with invariant distribution λ, then

t^{−1} ∫_0^t f(θ_s X) ds → E_λ f(X) a.s.,  f bounded, measurable.

(ii) If X is null-recurrent, then

t^{−1} ∫_0^t 1_K(X_s) ds → 0 a.s.,  K ⊂ S compact.

(iii) If X is transient, then X_t → ∞ a.s.

Proof: (i) From Lemma 8.11 and Theorems 20.10 (i) and 20.12 we note that P_λ is stationary and ergodic, and so the assertion holds a.s. P_λ by Corollary 10.9. Since the stated convergence is a tail event and P_μ = P_λ on T for any μ, the general result follows.

(ii) Since P_λ is shift-invariant with P_λ{X_s ∈ K} = λK < ∞, the left-hand side converges a.e. P_λ by Theorem 20.2. From Theorems 20.10 and 20.12 we see that the limit is a.e. a constant c ≥ 0. Using Fatou's lemma and Fubini's theorem gives

E_λ c ≤ liminf_{t→∞} t^{−1} ∫_0^t P_λ{X_s ∈ K} ds = λK < ∞,

which implies c = 0 since λS = ∞. … (iii) Fix any compact set K ⊂ S with ρK > 0, and conclude from the Markov property at t > 0 that a.s. P_μ

U 1_K(X_t) = E_{X_t} ∫_0^∞ 1_K(X_r) dr = E_μ[∫_t^∞ 1_K(X_r) dr | F_t].

Using the chain rule for conditional expectations, we get for any s < t

E_μ[U 1_K(X_t) | F_s] = E_μ[∫_t^∞ 1_K(X_r) dr | F_s]
  ≤ E_μ[∫_s^∞ 1_K(X_r) dr | F_s] = U 1_K(X_s),

which shows that U 1_K(X_t) is a supermartingale. Since it is also nonnegative and right-continuous, it converges a.s. P_μ as t → ∞, and the limit equals 0 a.s., since U 1_K(X_t) →^P 0 by the preceding proof. Since U 1_K is strictly positive and continuous, it follows that X_t → ∞ a.s. P_μ. □


Exercises 1. Given a measure space (S,S,JL), let T be a positive, linear operator on L 1 n L 00 • Show that if T is both an L 1-contraction and an L 00 -contraction, then it is also an V-contraction for every p E [1, oo]. (Hint: Prove a Höldertype inequality forT.) 2. Extend Lemma 10.3 to arbitrary transition operators T on a measurable space (S, S). In other words, letting I denote the dass of sets BE S with T1s = 1s, show that an S-measurable function f 2: 0 isT-invariant iff it is I-measurable. 3. Prove a continuous-time version of Theorem 20.2 for measurable semigroups of positive L 1 - L 00 -contraction. (Hint: Interpolate in the discrete-time result.) 4. Let (Tt) be a measurable, discrete- or continuous-time semigroup of positive L 1 -L 00 -contractions on (S, S, v), let /11.11 2 , ... be asymptotically invariant distributionsOll z+ or R+, and define An = fTtJLn(dt). Show that Anf ~ Af for any f E L 1 (-\), where ~ denotes convergence in measure. (Hint: Proceed as in Theorem 20.2, using the contractivity tagether with Minkowski's and Chebyshev's inequalities to estimate the remainder terms.) 5. Prove a continuous-time version of Theorem 20.4. (Hint: Use Lemma 20.5 to interpolate in the discrete-time result.) 6. Derive Theorem 10.6 from Theorem 20.4. (Hint: Take g = 1, and proceed as in Corollary 10.9 to identify the limit.) 7. Show that when f 2: 0, the limit in Theorem 20.4 is strictly positive on the set {Soof 1\ S00 g > 0}. 8. Show that the limit in Theorem 20.4 is invariant, at least when T is induced by a measure-preserving map on S. 9. Derive Lemma 20.3 (i) from Lemma 20.6. (Hint: Note that if g E T(f) with f E L~, then 119 :::; JLJ. Gonelude that for any h E L 1 , JL[h; Mnh > 0] 2: JL[un- 1 h; Mnh > 0] 2: 0.) 10. Show that Brownian motion X in JRd is regular and strongly ergodie for every d E N with an invariant measure that is unique up to a constant factor. Alsoshow that Xis Harris recurrent for d = 1, 2, uniformly transient for d 2: 3. 11. 
Let X be a Markov process with associated space-time process X̂. Show that X is strongly ergodic in the sense of Theorem 20.10 iff X̂ is weakly ergodic in the sense of Theorem 20.11. (Hint: Note that a function is space-time invariant for X iff it is invariant for X̂.)
12. For a Harris recurrent process on ℝ₊ or ℤ₊, every tail event is clearly a.s. invariant. Show by an example that the statement may fail in the transient case.


13. State and prove discrete-time versions of Theorems 20.12, 20.17, and 20.18. (Hint: The continuous-time arguments apply with obvious changes.)
14. Derive discrete-time versions of Theorems 20.17 and 20.18 from the corresponding continuous-time results.
15. Show that a regular Markov process may be weakly but not strongly ergodic. (Hint: For any strongly ergodic process, the associated space-time process has the stated property. For a less trivial example, consider a suitable supercritical branching process.)
16. Give examples of nonregular Markov processes with no invariant measure, with exactly one (up to a normalization), and with more than one.
17. Show that a discrete-time Markov process X and the corresponding pseudo-Poisson process Y have the same invariant measures. Furthermore, show that regularity of X implies that Y is regular, but not conversely.

Chapter 21

Stochastic Differential Equations and Martingale Problems Linear equations and Ornstein-Uhlenbeck processes; strong existence, uniqueness, and nonexplosion criteria; weak solutions and local martingale problems; well-posedness and measurability; pathwise uniqueness and functional solution; weak existence and continuity; transformation of SDEs; strong Markov and Feller properties

In this chapter we shall study classical stochastic differential equations (SDEs) driven by a Brownian motion and clarify the connection with the associated local martingale problems. Originally, the mentioned equations were devised to provide a pathwise construction of diffusions and more general continuous semimartingales. They have later turned out to be useful in a wide range of applications, where they may provide models for a diversity of dynamical systems with random perturbations. The coefficients determine a possibly time-dependent elliptic operator A as in Theorem 19.24, which suggests the associated martingale problem of finding a process X such that the processes M^f in Lemma 19.21 become martingales. It turns out to be essentially equivalent for X to be a weak solution to the given SDE, as will be seen from the fundamental Theorem 21.7.

The theory of SDEs utilizes the basic notions and ideas of stochastic calculus, as developed in Chapters 17 and 18. Occasional references will be made to other chapters, such as to Chapter 6 for conditional independence, to Chapter 7 for martingale theory, to Chapter 16 for weak convergence, and to Chapter 19 for Feller processes. Some further aspects of the theory are displayed at the beginning of Chapter 23 as well as in Theorems 24.2, 26.8, and 27.14.

The SDEs studied in this chapter are typically of the form

  dX_t^i = σ_j^i(t, X) dB_t^j + b^i(t, X) dt,   (1)

or, more explicitly,

  X_t^i = X_0^i + Σ_j ∫_0^t σ_j^i(s, X) dB_s^j + ∫_0^t b^i(s, X) ds,  t ≥ 0.   (2)

Here B = (B¹, ..., Bʳ) is a Brownian motion in ℝʳ with respect to some filtration ℱ, and the solution X = (X¹, ..., Xᵈ) is a continuous


ℱ-semimartingale in ℝᵈ. Furthermore, the coefficients σ and b are progressive functions of suitable dimension, defined on the canonical path space C(ℝ₊, ℝᵈ) equipped with the induced filtration 𝒢_t = σ{w_s; s ≤ t}, t ≥ 0. For convenience, we shall often refer to (1) as equation (σ, b). For the integrals in (2) to exist in the sense of Itô and Lebesgue integration, X must fulfill the integrability condition

  ∫_0^t (|a(s, X)| + |b(s, X)|) ds < ∞ a.s.,  t ≥ 0,   (3)

where a^{ij} = σ_k^i σ_k^j or a = σσ′, and the bars denote any norms in the spaces of d×d matrices and d-vectors, respectively. For the existence and adaptedness of the right-hand side, it is also necessary that the integrands in (2) be progressive. This is ensured by the following result.

Lemma 21.1 (progressive functions) Let the function f on ℝ₊ × C(ℝ₊, ℝᵈ) be progressive for the induced filtration 𝒢 on C(ℝ₊, ℝᵈ), and let X be a continuous, ℱ-adapted process in ℝᵈ. Then the process Y_t = f(t, X) is ℱ-progressive.

Proof: Fix any t ≥ 0. Since X is adapted, we note that π_s(X) = X_s is ℱ_t-measurable for every s ≤ t, where π_s(w) = w_s on C(ℝ₊, ℝᵈ). Since 𝒢_t = σ{π_s; s ≤ t}, Lemma 1.4 shows that X is ℱ_t/𝒢_t-measurable. Hence, by Lemma 1.8 the mapping ...

... For any ε > 0, t ≥ 0, and x ∈ C(ℝ₊, ℝᵈ), define the coefficients σ_ε, b_ε, and let a_ε = σ_ε σ_ε′. Since σ and b are progressive, the processes σ_ε(s, X) and b_ε(s, X), s ≤ t, are measurable functions of X on [0, (t − ε)₊]. Hence, a strong solution X^ε to equation (σ_ε, b_ε) may be constructed recursively on the intervals [(n − 1)ε, nε], n ∈ ℕ, starting from an arbitrary random


vector ξ ⫫ B in ℝᵈ with distribution μ. Note in particular that X^ε solves the martingale problem for the pair (a_ε, b_ε). Applying Theorem 17.7 to equation (σ_ε, b_ε) and using the boundedness of σ and b, we get for any p > 0

  E sup_{0≤r≤h} |X^ε_{t+r} − X^ε_t|ᵖ ≲ h^{p/2} + hᵖ ≲ h^{p/2},  t, ε ≥ 0, h ∈ [0, 1].

For p > 2d it follows by Corollary 16.9 that the family {X^ε} is tight in C(ℝ₊, ℝᵈ), and by Theorem 16.3 we may then choose some ε_n → 0 such that X^{ε_n} converges in distribution to a suitable X. To see that X solves the martingale problem for (a, b), let f ∈ C_K^∞ and s < t be arbitrary, and consider any bounded, continuous function g: C([0, s], ℝᵈ) → ℝ. We need to show that

  E[{f(X_t) − f(X_s) − ∫_s^t A_r f(X) dr} g(X)] = 0.

Then note that X^ε satisfies the corresponding equation for the operators A_r^ε constructed from the pair (a_ε, b_ε). Writing the two conditions as Eφ(X) = 0 and Eφ_ε(X^ε) = 0, respectively, it suffices by Theorem 4.27 to show that φ_ε(x_ε) → φ(x) whenever x_ε → x in C(ℝ₊, ℝᵈ). This follows easily from the continuity conditions imposed on a and b.

Now assume that the solutions P_μ are unique, and let μ_n → μ weakly. Arguing as before, we see that (P_{μ_n}) is tight, and so by Theorem 16.3 it is also relatively compact. If P_{μ_n} → Q weakly along some subsequence, then as before we note that Q solves the martingale problem for (a, b) with initial distribution μ. Hence Q = P_μ, and the convergence extends to the original sequence. □

Our next aim is to show how the well-posedness of the local martingale problem for (a, b) extends from degenerate to arbitrary initial distributions. This requires a basic measurability property, which will also be needed later.

Theorem 21.10 (measurability and mixtures, Stroock and Varadhan) Let a and b be progressive and such that, for any x ∈ ℝᵈ, the local martingale problem for (a, b) with initial distribution δ_x has a unique solution P_x. Then (P_x) is a kernel from ℝᵈ to C(ℝ₊, ℝᵈ), and for every initial distribution μ, the associated local martingale problem has the unique solution P_μ = ∫ P_x μ(dx).

Proof: According to the proof of Theorem 21.7, it is enough to formulate the local martingale problem in terms of functions f belonging to some countable subclass 𝒞 ⊂ C_K^∞, consisting of suitably truncated versions of the coordinate functions x^i and their products x^i x^j. Now define 𝒫 = 𝒫(C(ℝ₊, ℝᵈ)) and 𝒫_M = {P_x; x ∈ ℝᵈ}, and write X for the canonical process in C(ℝ₊, ℝᵈ). Let D denote the class of measures P ∈ 𝒫 with degenerate projections P ∘ X_0^{−1}. Next let I consist of all measures P ∈ 𝒫 such that X satisfies the integrability condition (3). Finally, put τ_f^n = inf{t; |M_t^f| ≥ n}, and let L be the class of measures P ∈ 𝒫 such that the processes M_t^{f,n} = M^f(t ∧ τ_f^n) exist and are martingales under P for all f ∈ 𝒞 and n ∈ ℕ. Then clearly 𝒫_M = D ∩ I ∩ L.

To prove the asserted kernel property, it is enough to show that 𝒫_M is a measurable subset of 𝒫, since the desired measurability will then follow by Theorem A1.3 and Lemma 1.40. The measurability of D is clear from Lemma 1.39 (i). Even I is measurable, since the integrals on the left of (3) are measurable by Fubini's theorem. Finally, L ∩ I is a measurable subset of I, since the defining condition is equivalent to countably many relations of the form E[M_t^{f,n} − M_s^{f,n}; F] = 0, with f ∈ 𝒞, n ∈ ℕ, s < t in ℚ₊, and F ∈ ℱ_s.

Now fix any probability measure μ on ℝᵈ. The measure P_μ = ∫ P_x μ(dx) clearly has initial distribution μ, and from the previous argument we note that P_μ again solves the local martingale problem for (a, b). To prove the uniqueness, let P be any measure with the stated properties. Then E[M_t^{f,n} − M_s^{f,n}; F | X_0] = 0 a.s. for all f, n, s < t, and F as above, and so P[· | X_0] is a.s. a solution to the local martingale problem with initial distribution δ_{X_0}. Thus, P[· | X_0] = P_{X_0} a.s., and we get P = E P_{X_0} = ∫ P_x μ(dx) = P_μ. This extends the well-posedness to arbitrary initial distributions. □

We return to the basic problem of constructing a Feller diffusion with given generator A in (17) as the solution to a suitable SDE or the associated martingale problem. The following result may be regarded as a converse to Theorem 19.24.

Theorem 21.11 (strong Markov and Feller properties, Stroock and Varadhan) Let a and b be measurable functions on ℝᵈ such that, for any x ∈ ℝᵈ, the local martingale problem for (a, b) with initial distribution δ_x has a unique solution P_x. Then the family (P_x) satisfies the strong Markov property. If a and b are also bounded and continuous, then the equation T_t f(x) = E_x f(X_t) defines a Feller semigroup on C_0, and the operator A in (17) extends uniquely to the associated generator.

Proof: By Theorem 21.10 it remains to prove that, for any state x ∈ ℝᵈ and bounded optional time τ,

As in the previous proof, this is equivalent to countably many relations of the form (19) with s < t and F ∈ ℱ_s, where M^{f,n} denotes the process M^f stopped at τ_n = inf{t; |M_t^f| ≥ n}. Now θ_τ^{−1} ℱ_s ⊂ ℱ_{τ+s} by Lemma 7.5, and in the diffusion case

  (M_t^{f,n} − M_s^{f,n}) ∘ θ_τ = M^f((τ + t) ∧ ...

... c > 0 on ℝᵈ, where c is bounded away from 0 and ∞. Then weak existence and uniqueness in law hold simultaneously for equations (σ, b) and (cσ, c²b).

Proof: Assume that X solves the local martingale problem for the pair (a, b), and introduce the process V = c^{−2}(X)·λ with inverse (τ_s). By optional sampling we note that M^f_{τ_s}, s ≥ 0, is again a local martingale, and the process Y_s = X_{τ_s} satisfies

  M^f_{τ_s} = f(Y_s) − f(Y_0) − ∫_0^s (c² A f)(Y_r) dr.

Thus, Y solves the local martingale problem for (c²a, c²b). Now let T denote the mapping on C(ℝ₊, ℝᵈ) leading from X to Y, and write T′ for the corresponding mapping based on c^{−1}. Then T and T′ are mutual inverses, and so by the previous argument applied to both mappings, a measure P ∈ 𝒫(C(ℝ₊, ℝᵈ)) solves the local martingale problem for (a, b) iff P ∘ T^{−1} solves the corresponding problem for (c²a, c²b). Thus, both existence and uniqueness hold simultaneously for the two problems. By Theorem 21.7 the last statement translates immediately into a corresponding assertion for the SDEs. □

Our next aim is to examine the connection between weak and strong solutions. Under appropriate conditions, we shall further establish the existence of a universal functional solution. To explain the subsequent terminology, let 𝒢 be the filtration induced by the identity mapping (ξ, B) on the canonical space Ω = ℝᵈ × C(ℝ₊, ℝʳ), so that 𝒢_t = σ(ξ, B^t), t ≥ 0, where B^t_s = B_{s∧t}. Writing W^r for the r-dimensional Wiener measure, we introduce for any μ ∈ 𝒫(ℝᵈ) the (μ ⊗ W^r)-completion 𝒢_t^μ of 𝒢_t. The universal completion 𝒢̄_t is defined as ∩_μ 𝒢_t^μ, and we say that a function

  F: ℝᵈ × C(ℝ₊, ℝʳ) → C(ℝ₊, ℝᵈ)   (20)

is universally adapted if it is adapted to the filtration 𝒢̄ = (𝒢̄_t).

Theorem 21.14 (pathwise uniqueness and functional solution) Let σ and b be progressive and such that weak existence and pathwise uniqueness hold for solutions to equation (σ, b) starting at fixed points. Then strong existence and uniqueness in law hold for any initial distribution, and there exists a measurable and universally adapted function F as in (20) such that every solution (X, B) to equation (σ, b) satisfies X = F(X_0, B) a.s.

Note in particular that the function F above is independent of the initial distribution μ. A key step in the proof, accomplished in Lemma 21.17, is


to establish the corresponding result for a fixed μ. Two further lemmas will be needed, and we begin with a statement that clarifies the connection between adaptedness, strong existence, and functional solutions.

Lemma 21.15 (transfer of strong solution) Let (X, B) solve equation (σ, b), and assume that X is adapted to the complete filtration induced by X_0 and B. Then X = F(X_0, B) a.s. for some Borel-measurable function F as in (20), and for any basic triple (ℱ̃, B̃, ξ̃) with ξ̃ =d X_0, the process X̃ = F(ξ̃, B̃) is ℱ̃-adapted and such that the pair (X̃, B̃) solves equation (σ, b).

Proof: By Lemma 1.13 we have X = F(X_0, B) a.s. for some Borel-measurable function F as stated. By the same result, there exists for every t ≥ 0 a further representation of the form X_t = G_t(X_0, B^t) a.s., and so F(X_0, B)_t = G_t(X_0, B^t) a.s. Hence, X̃_t = G_t(ξ̃, B̃^t) a.s., and so X̃ is ℱ̃-adapted. Since also (X̃, B̃) =d (X, B), Proposition 17.26 shows that even the former pair solves equation (σ, b). □

The following result shows that even weak solutions can be transferred to any given probability space with a specified Brownian motion.

Lemma 21.16 (transfer of weak solution) Let (X, B) solve equation (σ, b), and fix any basic triple (ℱ̃, B̃, ξ̃) with ξ̃ =d X_0. Then there exists a process X̃ ⫫_{ξ̃,B̃} ℱ̃ with X̃_0 = ξ̃ a.s. and (X̃, B̃) =d (X, B). Furthermore, the filtration 𝒢 induced by (X̃, ℱ̃) is a standard extension of ℱ̃, and the pair (X̃, B̃) with filtration 𝒢 solves equation (σ, b).

Proof: By Theorem 6.10 and Proposition 6.13 there exists a process X̃ ⫫_{ξ̃,B̃} ℱ̃ satisfying (X̃, ξ̃, B̃) =d (X, X_0, B), and in particular X̃_0 = ξ̃ a.s. To see that 𝒢 is a standard extension of ℱ̃, fix any t ≥ 0 and define B′ = B̃ − B̃^t. Then (X̃^t, B̃^t) ⫫ B′, since the corresponding relation holds for (X, B), and so X̃^t ⫫_{ξ̃,B̃^t} B′. Since also X̃^t ⫫_{ξ̃,B̃} ℱ̃, Proposition 6.8 yields X̃^t ⫫_{ξ̃,B̃^t} (B′, ℱ̃) and hence X̃^t ⫫_{ℱ̃_t} ℱ̃. But then (X̃^t, ℱ̃_t) ⫫_{ℱ̃_t} ℱ̃ by Corollary 6.7, which means that 𝒢_t ⫫_{ℱ̃_t} ℱ̃. Since standard extensions preserve martingales, Theorem 18.3 shows that B̃ remains a Brownian motion with respect to 𝒢. As in Proposition 17.26, we conclude that the pair (X̃, B̃) solves equation (σ, b). □

We are now ready to establish the crucial relationship between strong existence and pathwise uniqueness.

Lemma 21.17 (strong existence and pathwise uniqueness, Yamada and Watanabe) Assume that weak existence and pathwise uniqueness hold for solutions to equation (σ, b) with initial distribution μ. Then even strong existence and uniqueness in law hold for such solutions, and there exists a measurable function F_μ as in (20) such that any solution (X, B) with initial distribution μ satisfies X = F_μ(X_0, B) a.s.


Proof: Fix any solution (X, B) with initial distribution μ and associated filtration ℱ. By Lemma 21.16 there exists some process Y ⫫_{X_0,B} ℱ with Y_0 = X_0 a.s. such that (Y, B) solves equation (σ, b) for the filtration 𝒢 induced by (Y, ℱ). Since 𝒢 is a standard extension of ℱ, the pair (X, B) remains a solution for 𝒢, and the pathwise uniqueness yields X = Y a.s. For each t ≥ 0 we have X^t ⫫_{X_0,B} X^t and (X^t, B^t) ⫫ (B − B^t), and so X^t ⫫_{X_0,B^t} X^t a.s. by Proposition 6.8. Thus, Corollary 6.7 (ii) shows that X is adapted to the complete filtration induced by (X_0, B).

Hence, by Lemma 21.15 there exists a measurable function F_μ with X = F_μ(X_0, B) a.s. and such that, for any basic triple (ℱ̃, B̃, ξ̃) with ξ̃ =d X_0, the process X̃ = F_μ(ξ̃, B̃) is ℱ̃-adapted and solves equation (σ, b) along with B̃. In particular, X̃ =d X since (ξ̃, B̃) =d (X_0, B), and the pathwise uniqueness shows that X̃ is the a.s. unique solution for the given triple (ℱ̃, B̃, ξ̃). This proves the uniqueness in law. □

Proof of Theorem 21.14: By Lemma 21.17 we have uniqueness in law for solutions starting at fixed points, and Theorem 21.10 shows that the corresponding distributions P_x form a kernel from ℝᵈ to C(ℝ₊, ℝᵈ). By Lemma 21.8 there exists a measurable mapping G such that, whenever X has distribution P_x and ϑ ⫫ X is U(0, 1), the process B = G(P_x, X, ϑ) is a Brownian motion in ℝʳ and the pair (X, B) solves equation (σ, b). Writing Q_x for the distribution of (X, B), it is clear from Lemmas 1.38 and 1.41 (ii) that the mapping x ↦ Q_x is a kernel from ℝᵈ to C(ℝ₊, ℝ^{d+r}).

Changing the notation, we may write (X, B) for the canonical process in C(ℝ₊, ℝ^{d+r}). By Lemma 21.17 we have X = F_x(x, B) = F_x(B) a.s. Q_x, and so

  Q_x[X ∈ · | B] = δ_{F_x(B)} a.s.,  x ∈ ℝᵈ.   (21)

By Proposition 7.26 we may choose versions ν_{x,w} = Q_x[X ∈ · | B ∈ dw] that combine into a probability kernel ν from ℝᵈ × C(ℝ₊, ℝʳ) to C(ℝ₊, ℝᵈ). From (21) we see that ν_{x,w} is a.s. degenerate for each x, and since the set D of degenerate measures is measurable by Lemma 1.39 (i), we can modify ν such that ν_{x,w}D ≡ 1. In that case,

  ν_{x,w} = δ_{F(x,w)},  x ∈ ℝᵈ, w ∈ C(ℝ₊, ℝʳ),   (22)

for some function F as in (20), and the kernel property of ν implies that F is product measurable. Comparing (21) and (22) gives F(x, B) = F_x(B) a.s. for all x.

Now fix any probability measure μ on ℝᵈ, and conclude as in Theorem 21.10 that P_μ = ∫ P_x μ(dx) solves the local martingale problem for (a, b) with initial distribution μ. Hence, equation (σ, b) has a solution (X, B) with distribution μ for X_0. Since conditioning on ℱ_0 preserves martingales, the equation remains conditionally valid given X_0. By the pathwise uniqueness in the degenerate case we get P[X = F(X_0, B) | X_0] = 1 a.s., and so X = F(X_0, B) a.s. In particular, the pathwise uniqueness extends to arbitrary initial distributions μ.


Returning to the canonical setting, we may take (ξ, B) to be the identity map on the canonical space ℝᵈ × C(ℝ₊, ℝʳ), endowed with the probability measure μ ⊗ W^r and the induced complete filtration 𝒢^μ. By Lemma 21.17 equation (σ, b) has a 𝒢^μ-adapted solution X = F_μ(ξ, B) with X_0 = ξ a.s., and the previous discussion shows that even X = F(ξ, B) a.s. Hence, F is adapted to 𝒢^μ, and since μ is arbitrary, the adaptedness extends to the universal completion 𝒢̄_t = ∩_μ 𝒢_t^μ, t ≥ 0. □
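The functional solution X = F(X₀, B) of Theorem 21.14 can be made concrete numerically. The sketch below assumes Lipschitz coefficients (much stronger than the theorem requires) and uses the Euler scheme, which is an illustrative choice rather than part of the text: it maps an initial point and a fixed driving Brownian path deterministically to an approximate solution path, so the same input always produces the same output, as a functional solution should. The Ornstein-Uhlenbeck comparison is likewise only an example.

```python
import numpy as np

def euler_solution(x0, dB, dt, sigma, b):
    # Approximate functional solution F(x0, B): map an initial point and
    # the increments dB of a driving path to a path of
    # dX = sigma(t, X) dB + b(t, X) dt via the Euler-Maruyama scheme.
    X = np.empty(len(dB) + 1)
    X[0] = x0
    t = 0.0
    for k, db in enumerate(dB):
        X[k + 1] = X[k] + sigma(t, X[k]) * db + b(t, X[k]) * dt
        t += dt
    return X

# Ornstein-Uhlenbeck equation dX = -X dt + dB: sigma = 1, b(x) = -x.
rng = np.random.default_rng(0)
n, T = 10_000, 1.0
dt = T / n
dB = rng.normal(0.0, np.sqrt(dt), n)

X = euler_solution(2.0, dB, dt, lambda t, x: 1.0, lambda t, x: -x)
Y = euler_solution(2.0, dB, dt, lambda t, x: 1.0, lambda t, x: -x)
assert np.allclose(X, Y)  # same (x0, B) => same path: F is deterministic

# Compare with the explicit OU solution e^{-T} x0 + int_0^T e^{-(T-s)} dB_s.
s = np.arange(1, n + 1) * dt
explicit = np.exp(-T) * 2.0 + np.exp(-(T - s)) @ dB
assert abs(X[-1] - explicit) < 0.05
```

The determinism assertion is exactly the point of the theorem: once pathwise uniqueness holds, the solution is a measurable function of the data (X₀, B) alone.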

Exercises

1. Show that for any c ∈ (0, 1), the stochastic flow X_t^x in Theorem 21.3 is a.s. Hölder continuous in x with exponent c, uniformly for bounded x and t. (Hint: Apply Theorem 3.23 to the estimate in the proof of Theorem 21.3.)
2. Show that a process X in ℝᵈ is a Brownian motion iff the process f(X_t) − ½ ∫_0^t Δf(X_s) ds is a martingale for every f ∈ C_K^∞. Compare with Theorem 18.3 and Lemma 19.21.
3. Show that a Brownian bridge in ℝᵈ satisfies the SDE dX_t = dB_t − (1 − t)^{−1} X_t dt on [0, 1) with initial condition X_0 = 0. Also show that if X^x denotes the solution starting at x, then the process Y_t^x = X_t^x − (1 − t)x is again a Brownian bridge. (Hint: Note that M_t = X_t/(1 − t) is a martingale on [0, 1) and that Y^x satisfies the same SDE as X.)
4. Solve the preceding SDE, using Proposition 21.2, to express the Brownian bridge in terms of a Brownian motion. Compare with previously known formulas.
5. Given two continuous semimartingales U and V, show that the Fisk-Stratonovich SDE dX = dU + X ∘ dV has the unique solution X = Z(X_0 + Z^{−1} ∘ U), where Z = exp(V − V_0). (Hint: Use Corollary 17.21 and the chain rule for FS-integrals, or derive the result from Proposition 21.2.)
6. Show under suitable conditions how a Fisk-Stratonovich SDE can be converted into an Itô equation, and conversely. Also give a sufficient condition for the existence of a strong solution to an FS-equation.
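Exercise 3 lends itself to a quick numerical sanity check (a sketch only; the grid size, path count, and seed are arbitrary choices, not part of the exercise). Integrating dX_t = dB_t − (1 − t)^{−1} X_t dt from X_0 = 0 with the Euler scheme, the paths should be pinned back near 0 as t → 1, in line with the Brownian-bridge variance t(1 − t):

```python
import numpy as np

# Euler scheme for the bridge SDE dX_t = dB_t - X_t/(1-t) dt on [0,1),
# started at X_0 = 0.  We stop one step short of t = 1, where the drift
# coefficient blows up.
rng = np.random.default_rng(1)
paths, n = 20_000, 1_000
dt = 1.0 / n
X = np.zeros(paths)
for k in range(n - 1):                # advance to time 1 - dt
    t = k * dt
    dB = rng.normal(0.0, np.sqrt(dt), paths)
    X += dB - X / (1.0 - t) * dt

# A Brownian bridge has mean 0 and variance t(1-t); at t = 1 - dt the
# variance is about dt = 0.001, so the paths are pinned near 0.
assert abs(X.mean()) < 0.01
assert X.var() < 0.01
```

The increasingly strong restoring drift near t = 1 is what forces the terminal variance down to order dt, mirroring the exact bridge variance.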

7. Show that weak existence and uniqueness in law hold for the SDE dX_t = sgn(X_t+) dB_t with initial condition X_0 = 0, while strong existence and pathwise uniqueness fail. (Hint: Show that any solution X is a Brownian motion, and define B = sgn(X+)·X. Note that both X and −X satisfy the given SDE.)
8. Show that weak existence holds for the SDE dX_t = sgn(X_t) dB_t with initial condition X_0 = 0, while strong existence and uniqueness in law fail. (Hint: We may take X to be a Brownian motion or put X ≡ 0.)
9. Show that strong existence holds for the SDE dX_t = 1{X_t ≠ 0} dB_t with initial condition X_0 = 0, while uniqueness in law fails. (Hint: Here X = B and X ≡ 0 are both solutions.)
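The hint of Exercise 7 can be probed by simulation (an illustrative sketch; the discretization and sample sizes are arbitrary). Taking X to be a Brownian motion and forming B = sgn(X+)·X by a left-point Riemann sum, the resulting B should again look like a standard Brownian motion:

```python
import numpy as np

# Take X to be a Brownian motion and set dB = sgn(X+) dX, with
# sgn(x+) = +1 for x >= 0 and -1 for x < 0, so the integrand has
# modulus 1.  By Levy's characterization B is again a Brownian motion.
rng = np.random.default_rng(2)
paths, n = 20_000, 500
dt = 1.0 / n
X = np.zeros(paths)
B = np.zeros(paths)
for _ in range(n):
    dX = rng.normal(0.0, np.sqrt(dt), paths)
    B += np.where(X >= 0.0, 1.0, -1.0) * dX
    X += dX

# B_1 should be approximately N(0, 1).
assert abs(B.mean()) < 0.02
assert abs(B.var() - 1.0) < 0.05
```

Since sgn(X+)² = 1, the relation dX = sgn(X+) dB is recovered from the same pair, and, as the hint notes, −X satisfies the same equation against this B, which is why pathwise uniqueness fails.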


10. Show that a given process may satisfy SDEs with different (σσ′, b). (Hint: For a trivial example, take X ≡ 0, b = 0, and σ = 0 or σ(x) = sgn x.)

11. Construct a non-Markovian solution X to the SDE dX_t = sgn(X_t) dB_t. (Hint: We may take X to be a Brownian motion, stopped at the first visit to 0 after time 1. Another interesting choice is to take X to be 0 on [0, 1] and a Brownian motion on [1, ∞).)
12. For X as in Theorem 21.3, construct an SDE in ℝ^{md} satisfied by the process (X_t^{x_1}, ..., X_t^{x_m}) for arbitrary x_1, ..., x_m ∈ ℝᵈ. Conclude that ℒ(X) is determined by ℒ(X^x, X^y) for arbitrary x, y ∈ ℝᵈ. (Hint: Note that ℒ(X^x) is determined by (σσ′, b) and x, and apply this result to the m-point motion.)
13. Find two SDEs as in Theorem 21.3 with solutions X and Y such that X^x =d Y^x for all x but the flows X and Y differ in distribution. (Hint: We may choose dX = dB and dY = sgn(Y+) dB.)

14. For a diffusion equation (σ, b) as in Theorem 21.3, show that the distribution of the associated flow X determines Σ_j σ_j^i(x) σ_j^k(y) for arbitrary pairs i, k ∈ {1, ..., d} and x, y ∈ ℝᵈ.
15. Show that if weak existence holds for the SDE (σ, b), then pathwise uniqueness can be strengthened to the corresponding property for solutions X and Y with respect to possibly different filtrations.

16. Assume that weak existence and the stronger version of pathwise uniqueness hold for the SDE (σ, b). Use Theorem 6.10 and Lemma 21.15 to prove the existence for every μ of an a.s. unique functional solution F_μ(X_0, B) with ℒ(X_0) = μ.

Chapter 22

Local Time, Excursions, and Additive Functionals Tanaka's formula and semimartingale local time; occupation density, continuity and approximation; regenerative sets and processes; excursion local time and Poisson process; Ray-Knight theorem; excessive functions and additive functionals; local time at a regular point; additive functionals of Brownian motion

The central theme of this chapter is the notion of local time, which we will approach in three different ways, namely via stochastic calculus, via excursion theory, and via additive functionals. Here the first approach leads in particular to a useful extension of Itô's formula and to an interpretation of local time as an occupation density. Excursion theory will be developed for processes that are regenerative at a fixed state, and we shall prove the basic Itô representation, involving a Poisson process of excursions on the local time scale. Among the many applications, we consider a version of the Ray-Knight theorem about the spatial variation of Brownian local time. Finally, we shall study continuous additive functionals (CAFs) and their potentials, prove the existence of local time at a regular point, and show that any CAF of one-dimensional Brownian motion is a mixture of local times.

The beginning of this chapter may be regarded as a continuation of the stochastic calculus developed in Chapter 17. The present excursion theory continues the elementary discussion for the discrete-time case in Chapter 8. Though the theory of CAFs is formally developed for Feller processes, few results from Chapter 19 will be needed beyond the strong Markov property and its integrated version in Corollary 19.19. Both semimartingale local time and excursion theory will reappear in Chapter 23 as useful tools for studying one-dimensional SDEs and diffusions. Our discussion of CAFs of Brownian motion and their associated potentials is continued at the end of Chapter 25.
For the stochastic calculus approach to local time, consider an arbitrary continuous semimartingale X in ℝ. The semimartingale local time L^0 of X at 0 may be defined through Tanaka's formula

  L_t^0 = |X_t| − |X_0| − ∫_0^t sgn(X_s−) dX_s,  t ≥ 0,   (1)


where sgn(x−) = 1_{(0,∞)}(x) − 1_{(−∞,0]}(x). Note that the stochastic integral on the right exists, since the integrand is bounded and progressive. The process L^0 is clearly continuous and adapted with L_0^0 = 0. To motivate the definition, we note that a formal application of Itô's rule to the function f(x) = |x| yields (1) with L_t^0 = ∫_0^t ...

... For any t > 0, define s = sup{r < t; g_r = g′_r}, and note that h′ ≥ h′ − h = g′ − g > 0 on (s, t]. Hence, g_s = g′_s, and so 0 < g′_t − g_t ≤ g′_s − g_s = 0, a contradiction. □

Proof of Theorem 22.1: For any h > 0, we may choose a convex function f_h ∈ C² such that f_h(x) = −x for x ≤ 0 and f_h(x) = x − h for x ≥ h. Here clearly f_h(x) → |x| and f′_h → sgn(x−) as h → 0. By Itô's formula we get, a.s. for any t ≥ 0,

  Y_t^h = f_h(X_t) − f_h(X_0) − ∫_0^t f′_h(X_s) dX_s = ½ ∫_0^t f″_h(X_s) d[X]_s,

and by Corollary 17.13 and dominated convergence we note that (Y^h − L^0)_t* → 0 in probability for each t > 0. The first assertion now follows from the fact that the processes Y^h are nondecreasing and satisfy

  ∫_0^∞ 1{X_s ∉ [0, h]} dY_s^h = 0 a.s.,  h > 0.

The last assertion is a consequence of Lemma 22.2. □
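Both descriptions of local time can be checked by simulation (a rough sketch; the step size, the window width h, and the comparison of first moments only are illustrative choices). For a Brownian motion B, the discretized Tanaka formula and the occupation-time average (2h)^{−1} Leb{s ≤ 1; |B_s| ≤ h} should both approximate L_1^0, whose mean is E|B_1| = √(2/π) ≈ 0.798:

```python
import numpy as np

rng = np.random.default_rng(3)
paths, n, h = 20_000, 2_000, 0.05
dt = 1.0 / n
B = np.zeros(paths)
stoch_int = np.zeros(paths)   # int_0^1 sgn(B_s-) dB_s, left-point rule
occup = np.zeros(paths)       # (2h)^{-1} Leb{s <= 1 : |B_s| <= h}
sgn_minus = lambda x: np.where(x > 0.0, 1.0, -1.0)  # sgn(x-) = -1 at 0
for _ in range(n):
    dB = rng.normal(0.0, np.sqrt(dt), paths)
    stoch_int += sgn_minus(B) * dB
    occup += np.where(np.abs(B) <= h, dt / (2.0 * h), 0.0)
    B += dB

tanaka = np.abs(B) - stoch_int     # L_1 = |B_1| - |B_0| - int sgn(B-) dB
target = np.sqrt(2.0 / np.pi)      # E L_1 = E|B_1|
assert abs(tanaka.mean() - target) < 0.05
assert abs(occup.mean() - target) < 0.05
```

The agreement of the two estimators is the occupation-density interpretation in miniature: Tanaka's formula and time spent near 0, rescaled by the window width, measure the same quantity.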


In particular, we may deduce a basic relationship between a Brownian motion, its maximum process, and its local time at 0. The result improves the elementary Proposition 13.13.

Corollary 22.3 (local time and maximum process, Lévy) Let L^0 be the local time at 0 of Brownian motion B, and define M_t = sup_{s≤t} B_s. Then

  (L^0, |B|) =d (M, M − B).
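Lévy's identity is easy to probe by Monte Carlo (a sketch; the sample sizes are arbitrary, and only first moments are compared here, not full distributions). Computing L_1 by the discretized Tanaka formula and M_1 = sup_{s≤1} B_s directly, all four coordinates below share the common mean E M_1 = E|B_1| = √(2/π):

```python
import numpy as np

rng = np.random.default_rng(4)
paths, n = 20_000, 2_000
dt = 1.0 / n
B = np.zeros(paths)
M = np.zeros(paths)
stoch_int = np.zeros(paths)
for _ in range(n):
    dB = rng.normal(0.0, np.sqrt(dt), paths)
    stoch_int += np.where(B > 0.0, 1.0, -1.0) * dB   # int sgn(B-) dB
    B += dB
    np.maximum(M, B, out=M)

L = np.abs(B) - stoch_int      # Tanaka: L_1 = |B_1| - int_0^1 sgn(B-) dB
target = np.sqrt(2.0 / np.pi)  # common mean of all four coordinates
for sample in (L, M, np.abs(B), M - B):
    assert abs(sample.mean() - target) < 0.05
```

The pairing matters: the identity asserts joint equality in distribution of (L^0, |B|) and (M, M − B) as processes, of which matching marginal means is only the crudest consequence.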

Proof: Define B̃ = −∫ sgn(B−) dB ...

... ν invariant, and we get

  ν ∘ S_r^{−1} = r^{1/2} ν,  r > 0.   (23)

Combining (22) and (23), we get for any r > 0

  ∫_0^∞ (ν_x ∘ S_r^{−1}) x^{−3/2} dx = r^{1/2} ∫_0^∞ ν_x x^{−3/2} dx = ∫_0^∞ ν_{rx} x^{−3/2} dx,

and by the uniqueness in (22) we obtain

  ν_x ∘ S_r^{−1} = ν_{rx},  x > 0 a.e. λ, r > 0.

By Fubini's theorem, we may then fix an x = c > 0 such that ν_c ∘ S_r^{−1} = ν_{cr} for r > 0 a.e. λ. Define ν̂ = ν_c ∘ S_{1/c}^{−1}, and conclude that for almost every r > 0

  ν_r = ν_{c(r/c)} = ν_c ∘ S_{r/c}^{−1} = ν_c ∘ S_{1/c}^{−1} ∘ S_r^{−1} = ν̂ ∘ S_r^{−1}.

Substituting this into (22) yields equation (21). If μ is another probability measure with the stated properties, then for almost every r > 0 we have μ ∘ S_r^{−1} = ν̂ ∘ S_r^{−1}, and hence

  μ = μ ∘ S_r^{−1} ∘ S_{1/r}^{−1} = ν̂ ∘ S_r^{−1} ∘ S_{1/r}^{−1} = ν̂.

Thus, ν̂ is unique. □

By continuity of paths, an excursion of Brownian motion is either positive or negative, and by symmetry the two possibilities have the same probability ½ under ν̂. This leads to the further decomposition ν̂ = ½(ν̂₊ + ν̂₋). A process with distribution ν̂₊ is called a (normalized) Brownian excursion. For subsequent needs, we continue with a simple computation.

Lemma 22.16 (height distribution) Let ν be the excursion law of Brownian motion. Then ν{u ∈ D_0; sup_t u_t > h} = (2h)^{−1}, h > 0.

Proof: By Tanaka's formula the process M = 2B∨0 − L^0 = B + |B| − L^0 is a martingale, and so we get for τ = inf{t ≥ 0; B_t = h}

  E L^0_{τ∧t} = 2E(B_{τ∧t} ∨ 0),  t ≥ 0.

Hence, by monotone and dominated convergence E L_τ^0 = 2E(B_τ ∨ 0) = 2h. On the other hand, Theorem 22.11 shows that L_τ^0 is exponentially distributed with mean (ν A_h)^{−1}, where A_h = {u; sup_t u_t ≥ h}. □

The following result gives some remarkably precise information about the spatial behavior of Brownian local time.

Theorem 22.17 (space dependence, Ray, Knight) For Brownian motion B with local time L, let τ = inf{t > 0; B_t = 1}. Then on [0, 1] the process S_t = L_τ^{1−t} is a squared Bessel process of order 2.

Several proofs are known. Here we derive the result as an application of the previously developed excursion theory.

Proof (Walsh): Fix any u ∈ [0, 1], put σ = L_τ^u, and let ξ₊ and ξ₋ denote the Poisson processes of positive and negative excursions from u. Write Y for the process B, stopped when it first hits u. Then Y ⫫ (ξ₊, ξ₋) and ξ₊ ⫫ ξ₋, so ξ₊ ⫫ (ξ₋, Y). Since σ is ξ₊-measurable, we obtain ξ₊ ⫫_σ (ξ₋, Y), which implies the Markov property of L_τ^x at x = u.


To derive the corresponding transition kernels, fix any x ∈ [0, u), and write h = u − x. Put τ_0 = 0, and let τ_1, τ_2, ... be the right endpoints of those excursions from x that reach u. Next define ζ_k = L^x_{τ_{k+1}} − L^x_{τ_k}, k ≥ 0, so that L_τ^x = ζ_0 + ··· + ζ_κ, with κ = sup{k; τ_k ≤ τ}. By Lemma 22.16 the variables ζ_k are i.i.d. and exponentially distributed with mean 2h. Since κ agrees with the number of completed u-excursions before time τ that reach x, and since σ ⫫ ξ₋, it is further seen that κ is conditionally Poisson with mean σ/2h, given σ.

We also need the fact that (σ, κ) ⫫ (ζ_0, ζ_1, ...). To see this, define α_k = L^u_{τ_k}. Since ξ₋ is Poisson, we note that (α_1, α_2, ...) ⫫ (ζ_1, ζ_2, ...), and so (σ, α_1, α_2, ...) ⫫ (Y, ζ_1, ζ_2, ...). The desired relation now follows, since κ is a measurable function of (σ, α_1, α_2, ...) and ζ_0 depends measurably on Y.

For any s ≥ 0, we may now compute

  E[e^{−s L_τ^{u−h}} | σ] = E[(E e^{−sζ_0})^{κ+1} | σ] = E[(1 + 2sh)^{−κ−1} | σ] = (1 + 2sh)^{−1} exp{−sσ/(1 + 2sh)}.

In combination with the Markov property of L_τ^x, the last relation is equivalent, via the substitutions u = 1 − t and 2s = (a − t)^{−1}, to the martingale property of the process

  M_t = (a − t)^{−1} exp{−L_τ^{1−t}/2(a − t)},  t ∈ [0, a),   (24)

for arbitrary a > 0. Now let X be a squared Bessel process of order 2, and note that L_τ^1 = X_0 = 0 by Theorem 22.4. By Corollary 13.12 the process X is again Markov. To see that X has the same transition kernel as L_τ^{1−t}, it is enough to show for an arbitrary a > 0 that the process M in (24) remains a martingale when L_τ^{1−t} is replaced by X_t. This is easily verified by means of Itô's formula, if we note that X is a weak solution to the SDE dX_t = 2X_t^{1/2} dB_t + 2dt. □

As an important application of the last result, we may show that the local time is strictly positive on the range of the process.
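The squared Bessel process of order 2 in the last proof is easy to simulate (a sketch; the Euler step with clipping at 0 is a crude but common expedient, and all sizes are arbitrary choices). From X_0 = 0, the drift term 2dt gives E X_t = 2t, which a Monte Carlo mean reproduces:

```python
import numpy as np

# Euler scheme for dX = 2 sqrt(X) dB + 2 dt, the order-2 squared Bessel
# equation from the proof above, clipped at 0 to keep sqrt well defined.
rng = np.random.default_rng(5)
paths, n, T = 20_000, 1_000, 1.0
dt = T / n
X = np.zeros(paths)
for _ in range(n):
    dB = rng.normal(0.0, np.sqrt(dt), paths)
    X = np.maximum(X + 2.0 * np.sqrt(X) * dB + 2.0 * dt, 0.0)

# The martingale part has mean 0, so E X_T = 2T.
assert abs(X.mean() - 2.0 * T) < 0.1
```

In the Ray-Knight picture these simulated values play the role of the local time L_τ^{1−t} read off at spatial distance t below the hitting level.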

Corollary 22.18 (range and support) Let M be a continuous local martingale with local time L. Then outside a fixed P-null set,

  {L_t^x > 0} = {inf_{s≤t} M_s < x < sup_{s≤t} M_s}.   (25)

Proof: ... otherwise, we may reduce by Theorem 18.3 and Corollary 22.6 to the case when M is a Brownian motion B. Letting τ_u = inf{t ≥ 0; B_t = u}, we see from Theorems 18.6 (i) and 18.16 that, outside a fixed P-null set,

  ...   (26)


If 0 ≤ x < sup_s B_s, ... > 0. □

Now let p_t(x) denote the transition density (2πt)^{−d/2} e^{−|x|²/2t} of Brownian motion in ℝᵈ, and put u^α(x) = ∫_0^∞ e^{−αt} p_t(x) dt. For any measure μ on ℝᵈ, we may introduce the associated α-potential U^α μ(x) = ∫ u^α(x − y) μ(dy). The following result shows that the Revuz measure has the same potential as the underlying CAF.

Theorem 22.21 (α-potentials, Hunt, Revuz) For Brownian motion in ℝᵈ, let A be a CAF with Revuz measure ν_A. Then U_A^α = U^α ν_A for all α ≥ 0.

Proof: By monotone convergence we may assume that α > 0. By Lemma 22.20 we may choose some positive functions f_n ↑ 1 such that ν_{f_n·A} 1 = ν_A f_n < ∞ for each n, and by dominated convergence we have U^α_{f_n·A} ↑ U_A^α and U^α ν_{f_n·A} ↑ U^α ν_A. Thus, we may further assume that ν_A is bounded. In that case, clearly U_A^α < ∞ a.e.


Now fix any bounded, continuous function f ≥ 0 on ℝᵈ, and note that by dominated convergence U^α f is again bounded and continuous. Writing h = n^{−1} for an arbitrary n ∈ ℕ, we get by dominated convergence and the additivity of A

  ν_A U^α f = E ∫_0^1 U^α f(X_s) dA_s = lim_{n→∞} E Σ_j U^α f(X_{jh}) (A_h ∘ θ_{jh})

... Now ν_A determines U_A^α by Theorem 22.21, and from the proof of Lemma 22.19 we note that U_A^α determines A a.s. P_x whenever U_A^α(x) < ∞. Since P_x ∘ X_h^{−1} ≪ λᵈ for each h > 0, it follows that A ∘ θ_h is a.s. unique, and it remains to let h → 0. □
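In one dimension the α-potential density can be computed in closed form, which gives a concrete check on the definition above (the quadrature parameters below are arbitrary choices): u^α(x) = ∫_0^∞ e^{−αt} (2πt)^{−1/2} e^{−x²/2t} dt = (2α)^{−1/2} e^{−√(2α)|x|}.

```python
import math

def u_alpha(x, alpha, steps=200_000, T=200.0):
    # Midpoint-rule quadrature of int_0^T e^{-alpha t} p_t(x) dt with
    # p_t(x) = (2 pi t)^{-1/2} exp(-x^2 / (2t)); the tail beyond T is
    # negligible for alpha = 1.
    dt = T / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        total += (math.exp(-alpha * t - x * x / (2.0 * t))
                  / math.sqrt(2.0 * math.pi * t) * dt)
    return total

alpha, x = 1.0, 0.7
closed_form = math.exp(-math.sqrt(2.0 * alpha) * abs(x)) / math.sqrt(2.0 * alpha)
assert abs(u_alpha(x, alpha) - closed_form) < 1e-4
```

As α → 0 the integral diverges for d = 1, 2 but converges to a multiple of |x|^{2−d} for d ≥ 3, matching the recurrence-transience dichotomy for Brownian motion.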

We turn to the converse problem of constructing a CAF associated with a given potential. To motivate the following definition, we may take expected values in (29) to get e^{-αt} T_t U^α_A ≤ U^α_A. A function f on S is said to be uniformly α-excessive if it is bounded and measurable with 0 ≤ e^{-αt} T_t f ≤ f for all t ≥ 0, and such that ‖T_t f − f‖ → 0 as t → 0, where ‖·‖ denotes the supremum norm.

Theorem 22.23 (excessive functions and CAFs, Volkonsky) For any Feller process X in S and constant α > 0, let f ≥ 0 be a uniformly α-excessive function on S. Then f = U^α_A for some a.s. unique, perfect CAF A of X.

Proof: For any bounded, measurable function g on S, we get by Fubini's theorem and the Markov property of X

e^{-αh} T_h U^α g(x) = E_x ∫_0^∞ e^{-α(t+h)} g(X_{t+h}) dt = E_x ∫_h^∞ e^{-αt} g(X_t) dt.  (30)

Now introduce for each h > 0 the bounded, nonnegative functions

g_h = h^{-1}(f − e^{-αh} T_h f),    f_h = U^α g_h = h^{-1} ∫_0^h e^{-αs} T_s f ds,  (31)

and define A_h(t) = ∫_0^t g_h(X_s) ds, t ≥ 0. As in (29), we note that the associated processes M_h are martingales under P_x for every x. Using the continuity of the A_h, we get by Proposition 7.16 and (31), for any x ∈ S and as h, k → 0,

E_x (A_h − A_k)*²  ≲  E_x sup_{t∈ℚ_+} |M_h(t) − M_k(t)|² + ‖f_h − f_k‖²
  ≲  E_x |A_h(∞) − A_k(∞)|² + ‖f_h − f_k‖²
  ≲  ‖f_h − f_k‖ ‖f_h + f_k‖ + ‖f_h − f_k‖² → 0.

Hence, there exists some continuous process A independent of x such that E_x(A_h − A)*² → 0 for every x. For a suitable sequence h_n → 0 we have (A_{h_n} − A)* → 0 a.s. P_x for all x, and it follows easily that A is a.s. a perfect CAF. Taking limits in the relation f_h(x) = E_x A_h(∞), we also note that f(x) = E_x A(∞) = U^α_A(x). Thus, A has α-potential f. □

We will now use the last result to construct local times. Let us say that a CAF A is supported by some set B ⊂ S if its set of increase is a.s. contained in the closure of the set {t ≥ 0; X_t ∈ B}. In particular, a nonzero and perfect CAF supported by a singleton set {x} is called a local time at x. This terminology is clearly consistent with our earlier definitions of local time. Writing τ_x = inf{t > 0; X_t = x}, we say that x is regular (for itself) if τ_x = 0 a.s. P_x. By Proposition 22.7 this holds iff P_x-a.s. the random set Z_x = {t ≥ 0; X_t = x} has no isolated points.

Theorem 22.24 (additive functional local time, Blumenthal and Getoor) A Feller process in S has a local time L at a point a ∈ S iff a is regular. In that case L is a.s. unique up to a normalization, and

U^1_L(x) = U^1_L(a) E_x e^{-τ_a} < ∞,  x ∈ S.  (32)

Proof: Let L be a local time at a. Comparing with the renewal process L^{-1}_n, n ∈ ℤ_+, we see that sup_{x,t} E_x(L_{t+h} − L_t) < ∞ for every h > 0, which implies U^1_L(x) < ∞ for all x. By the strong Markov property at τ = τ_a, we get for any x ∈ S

U^1_L(x) = E_x ∫_τ^∞ e^{-t} dL_t = E_x e^{-τ} E_a ∫_0^∞ e^{-t} dL_t = U^1_L(a) E_x e^{-τ},

proving (32). The uniqueness assertion now follows by Lemma 22.19.

To prove the existence of L, define f(x) = E_x e^{-τ}, and note that f is bounded and measurable. Since τ ≤ t + τ ∘ θ_t, we may also conclude from the Markov property at t that, for any x ∈ S,

f(x) = E_x e^{-τ} ≥ e^{-t} E_x(e^{-τ} ∘ θ_t) = e^{-t} E_x E_{X_t} e^{-τ} = e^{-t} E_x f(X_t) = e^{-t} T_t f(x).

Noting that σ_t = t + τ ∘ θ_t is nondecreasing and tends to 0 a.s. P_a as t → 0 by the regularity of a, we further obtain

0 ≤ f(x) − e^{-h} T_h f(x) = E_x(e^{-τ} − e^{-σ_h}) = E_x e^{-τ}(1 − e^{-(σ_h − τ)}) ≤ E_x e^{-τ} E_a(1 − e^{-σ_h}) ≤ E_a(1 − e^{-σ_h}) → 0.

22. Local Time, Excursions, and Additive Functionals


Thus, f is uniformly 1-excessive, and so by Theorem 22.23 there exists a perfect CAF L with U^1_L = f. To see that L is supported by the singleton {a}, note that (32) yields

E_x ∫_0^∞ e^{-t} dL_t = E_x e^{-τ} E_a ∫_0^∞ e^{-t} dL_t,

which implies L_τ = 0 a.s. Hence, the Markov property yields L_{σ_t} = L_t a.s. for all rational t. This shows that L has a.s. no point of increase outside the closure of {t ≥ 0; X_t = a}. □

The next result shows that every CAF of one-dimensional Brownian motion is a unique mixture of local times. Recall that ν_A denotes the Revuz measure of the CAF A.

Theorem 22.25 (integral representation, Volkonsky, McKean and Tanaka) For Brownian motion X in ℝ with local time L, a process A is a CAF of X iff it has an a.s. representation

A_t = ∫_{-∞}^∞ L^x_t ν(dx),  t ≥ 0,  (33)

for some locally finite measure ν on ℝ. The latter is then unique and equals ν_A.

Proof: For any measure ν we may define an associated process A as in (33). If ν is locally finite, it is clear by the continuity of L and dominated convergence that A is a.s. continuous, hence a CAF. In the opposite case, we note that ν is infinite in every neighborhood of some point a ∈ ℝ. Under P_a and for any t > 0, the process L^x_t is further a.s. continuous and strictly positive near x = a. Hence, A_t = ∞ a.s. P_a, and A fails to be a CAF. Next, we conclude from Fubini's theorem and Theorem 22.5 that

E L^x_1 = ∫ (E_y L^x_1) dy = E_0 ∫ L^{x−y}_1 dy = 1.

Since L^x is supported by {x}, we get for any CAF A as in (33)

ν_A f = E(f·A)_1 = E ∫ ν(dx) ∫_0^1 f(X_t) dL^x_t = ∫ f(x) ν(dx) E L^x_1 = ν f,

which shows that ν = ν_A.

Now consider an arbitrary CAF A. By Lemma 22.20 there exists some function f > 0 with ν_A f < ∞. The process

B_t = ∫ L^x_t ν_{f·A}(dx) = ∫ L^x_t f(x) ν_A(dx),  t ≥ 0,

is then a CAF with ν_B = ν_{f·A}, and by Corollary 22.22 we get B = f·A a.s. Thus, A = f^{-1}·B a.s., and (33) follows. □


Exercises

1. Use Lemma 13.15 to show that the set of increase of Brownian local time at 0 agrees a.s. with the zero set Z. Extend the result to any continuous local martingale. (Hint: Apply Lemma 13.15 to the process sgn(B−)·B in Theorem 22.1.)

2. (Lévy) Let M be the maximum process of a Brownian motion B. Show that B can be measurably recovered from M − B. (Hint: Use Corollaries 22.3 and 22.6.)

3. Use Corollary 22.3 to give a simple proof of the relation τ_2 =ᵈ τ_3 in Theorem 13.16. (Hint: Recall that the maximum is unique by Lemma 13.15.) Also use Proposition 18.9 to give a direct proof of the relation τ_1 =ᵈ τ_2. (Hint: Integrate separately over the positive and negative excursions of B, and use Lemma 13.15 to identify the minimum.)

4. Show that for any c ∈ (0, 1/2), Brownian local time L^x_t is a.s. Hölder continuous in x with exponent c, uniformly for bounded t. Also show that the bound c < 1/2 is best possible. (Hint: Apply Theorem 3.23 to the estimate in the proof of Theorem 22.4. For the last assertion, use Theorem 22.17.)

5. Let M be a continuous local martingale such that M = B ∘ [M] a.s. for some Brownian motion B. Show that if B has local time L^x_t, then the local time of M at x equals L^x ∘ [M]. (Hint: Use Theorem 22.5, and note that L ∘ [M] is jointly continuous.)

6. For any continuous semimartingale X, show that ∫_0^t f(X_s, s) d[X]_s = ∫ dx ∫_0^t f(x, s) dL^x_s outside a fixed null set. (Hint: Extend Theorem 22.5 by a monotone class argument.)

7. Let Z be the zero set of Brownian motion B. Use Proposition 22.12 and Theorem 22.15 to construct its local time L directly from Z. Also use Lemma 22.16 to construct L from the heights of the excursions of B. Finally, use Corollary 22.6 to construct L from the occupation measure of B.

8. Let η be the maximum of a Brownian excursion. Show that Eη = (π/2)^{1/2}. (Hint: Use Theorem 22.15 and Lemmas 22.16 and 3.4.)

9. Let L be the continuous local time of a continuous local martingale M with [M]_∞ = ∞ a.s. Show that a.s. L^x_t → ∞ as t → ∞, uniformly on compacts. (Hint: Reduce to the case of Brownian motion. Then use Corollary 22.18, the strong Markov property, and the law of large numbers.)

10. Show that the intersection of two regenerative sets is regenerative.

11. Let L be the local time of a regenerative set, and let τ be an independent, exponentially distributed time. Show that L_τ is again exponentially distributed. (Hint: Prove a Cauchy equation for the function P{L_τ > s}.)

12. For any unbounded regenerative set Z, show that the local time L of Z is a.s. determined by Z. (Hint: Use the law of large numbers.)


13. Let Z be a nontrivial regenerative set. Show that cZ =ᵈ Z for all c > 0 iff the inverse local time is strictly stable.

14. Let X be a Feller process in ℝ, and put M_t = sup_{s≤t} X_s.

Fix any t > 0, and let R be the event where B_s ∉ S_ν on (0, t]. Noting that L^x_t = 0 a.s. for x outside the range B[0, t], we get a.s. on R

A_t = ∫_{-∞}^∞ L^x_t ν(dx) ≤ ν(B[0, t]) sup_x L^x_t < ∞,

since B[0, t] is compact and L^x_t is a.s. continuous, hence bounded. Conversely, suppose that B_s ∈ S_ν for some s < t. To show that A_t = ∞ a.s. on this event, we may use the strong Markov property to reduce to the case when B_0 = a is nonrandom in S_ν. But then L^a_t > 0 a.s. by Tanaka's formula, and so by the continuity of L we get for small enough ε > 0

A_t = ∫_{-∞}^∞ L^x_t ν(dx) ≥ ν(a − ε, a + ε) inf_{|x−a|<ε} L^x_t = ∞.

Thus, for t > ζ we have A_t = ∞. Also note that A_ζ = ∞ when ζ = ∞, whereas A_ζ may be finite when ζ < ∞. In the latter case, A jumps from A_ζ to ∞ at time ζ. Now introduce the inverse

τ_t = inf{s > 0; A_s > t},  t ≥ 0.  (6)
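The inverse in (6) jumps across any interval where A is constant. A toy numerical sketch (the nondecreasing path A below is made up for illustration):

```python
import numpy as np

# Discrete sketch of tau_t = inf{s > 0 : A_s > t} for a nondecreasing path
# A sampled on a grid s_grid.
def right_inverse(s_grid, A, t):
    idx = np.searchsorted(np.asarray(A), t, side='right')  # first index with A > t
    return s_grid[min(idx, len(s_grid) - 1)]

s = np.linspace(0.0, 5.0, 501)
A = np.minimum(s, 2.0) + np.maximum(s - 3.0, 0.0)  # flat at level 2 on [2, 3]
tau_at_1 = right_inverse(s, A, 1.0)   # about 1: A is strictly increasing there
tau_at_2 = right_inverse(s, A, 2.0)   # about 3: tau jumps over the flat stretch
```

The jump of tau at level 2 mirrors how the time change skips intervals of constancy of A.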

The process τ is clearly continuous and strictly increasing on (0, A_ζ), and for t ≥ A_ζ we have τ_t = ζ. Also note that X_t = Y_{τ_t} is a continuous local martingale and, moreover,

t = A_{τ_t} = ∫_0^{τ_t} σ^{-2}(Y_r) dr = ∫_0^t σ^{-2}(X_s) dτ_s,  t ≤ A_ζ.

Hence, for t ≤ A_ζ,

[X]_t = τ_t = ∫_0^t σ²(X_s) ds.  (7)

Here both sides remain constant after time A_ζ since S_σ ⊂ N_σ, and so (7) remains true for all t ≥ 0. Hence, Theorem 18.12 yields the existence of a Brownian motion B satisfying (3), which means that X is a weak solution with initial distribution μ.

To prove the converse implication, assume that weak existence holds for any initial distribution. To show that S_σ ⊂ N_σ, we may fix any x ∈ S_σ and

23. One-dimensional SDEs and Diffusions


choose a solution X with X_0 = x. Since X is a continuous local martingale, Theorem 18.4 yields X_t = Y_{τ_t} for some Brownian motion Y starting at x and some random time-change τ satisfying (7). For A as in (5) and for t ≥ 0 we have

A_{τ_t} = ∫_0^{τ_t} σ^{-2}(Y_s) ds ≤ t.  (8)

Since A_s = ∞ for s > 0 by Lemma 23.2, we get τ_t = 0 a.s., and so X_t = x a.s. But then x ∈ N_σ by (7).

Turning to the uniqueness assertion, assume that N_σ ⊂ S_σ, and consider a solution X with initial distribution μ. As before, we may write X_t = Y_{τ_t} a.s., where Y is a Brownian motion with initial distribution μ and τ is a random time-change satisfying (7). Define A as in (5), put χ = inf{t ≥ 0; X_t ∈ S_σ}, and note that τ_χ = ζ = inf{s ≥ 0; Y_s ∈ S_σ}. Since N_σ ⊂ S_σ, we get as in (8)

A_{τ_t} = ∫_0^{τ_t} σ^{-2}(Y_s) ds = t.

Furthermore, A_s = ∞ for s > ζ by Lemma 23.2, and so (8) implies τ_t ≤ ζ a.s. for all t, which means that τ remains constant after time χ. Thus, τ and A are related by (6), which shows that τ and then also X are measurable functions of Y. Since the distribution of Y depends only on μ, the same thing is true for X, which proves the asserted uniqueness in law.

To prove the converse, assume that S_σ is a proper subset of N_σ, and fix any x ∈ N_σ \ S_σ. As before, we may construct a solution starting at x by writing X_t = Y_{τ_t}, where Y is a Brownian motion starting at x, and τ is defined as in (6) from the process A in (5). Since x ∉ S_σ, Lemma 23.2 gives A_{0+} < ∞ a.s., and so τ_t > 0 a.s. for t > 0, which shows that X is a.s. nonconstant. Since x ∈ N_σ, (3) has also the trivial solution X_t ≡ x. Thus, uniqueness in law fails for solutions starting at x. □

Proceeding with a study of pathwise uniqueness, we return to equation (1), and let w(σ, ·) denote the modulus of continuity of σ.

Theorem 23.3 (pathwise uniqueness, Skorohod, Yamada and Watanabe) Let σ and b be bounded, measurable functions on ℝ, where

∫_0^c (w(σ, h))^{-2} dh = ∞,  c > 0,  (9)

and either b is Lipschitz continuous or σ ≠ 0. Then pathwise uniqueness holds for equation (σ, b).

The significance of condition (9) is clarified by the following lemma, where for any semimartingale Y we write L^x_t(Y) for the associated local time.
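Condition (9) can be checked numerically for the standard power-law example σ(x) = |x|^γ, whose modulus of continuity is w(σ, h) = h^γ: the integral of w^{-2} = h^{-2γ} near 0 diverges iff γ ≥ 1/2. A sketch (this example and the tolerances are ours, not from the text):

```python
import numpy as np

# Partial integrals of h^{-2*gamma} over [delta, 1], computed on a
# log-spaced grid so the trapezoid rule stays accurate near 0.
def partial_integral(gamma, delta, c=1.0, n=20_000):
    h = np.geomspace(delta, c, n)
    y = h ** (-2.0 * gamma)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(h)) / 2.0)

# gamma = 1/2 (e.g. sigma(x) = sqrt(|x|)): integral ~ log(1/delta), diverges
div_small = partial_integral(0.5, 1e-3)
div_tiny = partial_integral(0.5, 1e-6)
# gamma = 0.3: integral stays bounded as delta -> 0
conv_small = partial_integral(0.3, 1e-3)
conv_tiny = partial_integral(0.3, 1e-6)
```

So sqrt-type coefficients, as for squared Bessel processes, just barely satisfy (9), while any Hölder exponent below 1/2 does not.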


Lemma 23.4 (local time) For i = 1, 2, let X^i solve equation (σ, b_i), where σ satisfies (9). Then L^0(X^1 − X^2) = 0 a.s.

Proof: Write Y = X^1 − X^2, L^x_t = L^x_t(Y), and w(x) = w(σ, |x|). Using (1) and Theorem 22.5, we get for any t > 0

∫_{-∞}^∞ (L^x_t / w²(x)) dx = ∫_0^t d[Y]_s / w²(Y_s) = ∫_0^t ({σ(X^1_s) − σ(X^2_s)}² / w²(X^1_s − X^2_s)) ds ≤ t < ∞.

By (9) and the right-continuity of L it follows that L^0_t = 0 a.s. □

Proof of Theorem 23.3 for σ ≠ 0: By Propositions 21.12 and 21.13 combined with a simple localization argument, we note that uniqueness in law holds for equation (σ, b) when σ ≠ 0. To prove the pathwise uniqueness, consider any two solutions X and Y with X_0 = Y_0 a.s. Using Tanaka's formula, Lemma 23.4, and equation (σ, b), we get

d(X_t ∨ Y_t) = dX_t + d(Y_t − X_t)^+
 = dX_t + 1{Y_t > X_t} d(Y_t − X_t)
 = 1{Y_t ≤ X_t} dX_t + 1{Y_t > X_t} dY_t
 = σ(X_t ∨ Y_t) dB_t + b(X_t ∨ Y_t) dt,

which shows that X ∨ Y is again a solution. By the uniqueness in law we get X =ᵈ X ∨ Y. Since X ≤ X ∨ Y, it follows that X = X ∨ Y a.s., which implies Y ≤ X a.s. Similarly, X ≤ Y a.s. □

The assertion for Lipschitz continuous b is a special case of the following comparison result.

Theorem 23.5 (weak comparison, Skorohod, Yamada) Fix some functions σ and b_1 ≥ b_2, where σ satisfies (9) and either b_1 or b_2 is Lipschitz continuous. For i = 1, 2, let X^i solve equation (σ, b_i), and assume that X^1_0 ≥ X^2_0 a.s. Then X^1 ≥ X^2 a.s.

Proof: By symmetry we may assume that b_1 is Lipschitz continuous. Since X^2_0 ≤ X^1_0 a.s., we get by Tanaka's formula and Lemma 23.4

(X^2_t − X^1_t)^+ = ∫_0^t 1{X^2_s > X^1_s} (σ(X^2_s) − σ(X^1_s)) dB_s + ∫_0^t 1{X^2_s > X^1_s} (b_2(X^2_s) − b_1(X^1_s)) ds.

Using the martingale property of the first term, the Lipschitz continuity of b_1, and the condition b_2 ≤ b_1, we conclude that

E(X^2_t − X^1_t)^+ ≤ E ∫_0^t 1{X^2_s > X^1_s} (b_1(X^2_s) − b_1(X^1_s)) ds
 ≲ E ∫_0^t 1{X^2_s > X^1_s} |X^2_s − X^1_s| ds = ∫_0^t E(X^2_s − X^1_s)^+ ds.

By Gronwall's lemma E(X^2_t − X^1_t)^+ = 0, and hence X^1_t ≥ X^2_t a.s. □
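The comparison mechanism can be seen directly in a discrete sketch (the coefficients below are illustrative choices of ours): two Euler schemes driven by the same Brownian increments, with σ ≡ 1 and drifts b_1(x) = 2 + sin x ≥ 1 ≥ cos x = b_2(x). Each step adds (b_1(x_1) − b_2(x_2)) h ≥ 0 to the difference, so x_1 − x_2 never decreases.

```python
import numpy as np

# Shared-noise Euler schemes for dX = dB + b_i(X) dt with b1 >= b2.
rng = np.random.default_rng(1)
n_steps, h = 1000, 0.001
x1 = x2 = 0.0
path_diff = []
for _ in range(n_steps):
    dB = rng.normal(0.0, np.sqrt(h))       # the SAME increment for both
    x1 = x1 + dB + (2.0 + np.sin(x1)) * h
    x2 = x2 + dB + np.cos(x2) * h
    path_diff.append(x1 - x2)
path_diff = np.array(path_diff)
```

Coupling the two equations through a common driving noise is exactly what makes the pathwise ordering visible.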

Imposing stronger restrictions on the coefficients, we may strengthen the last conclusion to a strict inequality.

Theorem 23.6 (strict comparison) Fix a Lipschitz continuous function σ and some continuous functions b_1 > b_2. For i = 1, 2, let X^i solve equation (σ, b_i), and assume that X^1_0 ≥ X^2_0 a.s. Then X^1 > X^2 a.s. on (0, ∞).

Proof: Since the b_i are continuous with b_1 > b_2, there exists a locally Lipschitz continuous function b on ℝ with b_1 > b > b_2. By Theorem 21.3 equation (σ, b) has a solution X with X_0 = X^1_0 ≥ X^2_0 a.s., and it suffices to show that X^1 > X > X^2 a.s. on (0, ∞). This reduces the discussion to the case when one of the functions b_i is locally Lipschitz. By symmetry we may take that function to be b_1.

By the Lipschitz continuity of σ and b_1, we may define some continuous semimartingales U and V by

U_t = ∫_0^t (b_1(X^2_s) − b_2(X^2_s)) ds,
V_t = ∫_0^t ((σ(X^1_s) − σ(X^2_s))/(X^1_s − X^2_s)) dB_s + ∫_0^t ((b_1(X^1_s) − b_1(X^2_s))/(X^1_s − X^2_s)) ds,

subject to the convention 0/0 = 0, and we note that

d(X^1_t − X^2_t) = dU_t + (X^1_t − X^2_t) dV_t.

Letting Z = exp(V − ½[V]) > 0, we get by Proposition 21.2

X^1_t − X^2_t = Z_t(X^1_0 − X^2_0) + Z_t ∫_0^t Z_s^{-1} (b_1(X^2_s) − b_2(X^2_s)) ds,

and the assertion follows since X^1_0 ≥ X^2_0 a.s. and b_1 > b_2. □

We turn to a systematic study of one-dimensional diffusions. By a diffusion on some interval I ⊂ ℝ we mean a continuous strong Markov process taking values in I. Termination will only be allowed at open endpoints of I. We define τ_y = inf{t ≥ 0; X_t = y} and say that X is regular if P_x{τ_y < ∞} > 0 for any x ∈ I° and y ∈ I. Let us further write τ_{a,b} = τ_a ∧ τ_b.


Our first aim is to transform the general diffusion process into a continuous local martingale, using a suitable change of scale. This corresponds to the removal of drift in the SDE (1).

Theorem 23.7 (scale function, Feller, Dynkin) For any regular diffusion X on I, there exists a continuous and strictly increasing function p on I such that p(X^{τ_{a,b}}) is a P_x-martingale for all a ≤ x ≤ b in I. Furthermore, an increasing function p has the stated property iff

P_x{τ_b < τ_a} = (p_x − p_a)/(p_b − p_a),  x ∈ [a, b].  (10)

A function p with the stated property is called a scale function for X, and we say that X is on a natural scale if the scale function can be chosen to be linear. In general, we note that Y = p(X) is a regular diffusion on a natural scale. Our proof begins with a study of the functions

p_{a,b}(x) = P_x{τ_b < τ_a},  a ≤ x ≤ b.
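When X solves an SDE of the form (1) with smooth coefficients, the classical candidate scale function has density p'(x) = exp(−∫_0^x 2b/σ² dy), so that ½σ²p'' + bp' = 0 and p(X) is a local martingale. The following sketch (coefficients b, σ chosen by us purely for illustration) builds p' by quadrature and verifies this harmonicity relation by finite differences:

```python
import numpy as np

b = lambda x: 1.0 - x                     # illustrative drift
sigma = lambda x: 1.0 + 0.1 * np.sin(x)   # illustrative, bounded away from 0

x = np.linspace(-2.0, 2.0, 4001)
dx = x[1] - x[0]
q = 2.0 * b(x) / sigma(x) ** 2
# cumulative trapezoid of q, normalized so the integral starts at x = 0
I = np.concatenate([[0.0], np.cumsum((q[1:] + q[:-1]) * dx / 2.0)])
I -= I[np.abs(x).argmin()]
p_prime = np.exp(-I)                      # scale density, strictly positive

p_second = np.gradient(p_prime, dx)       # numerical derivative of p'
residual = 0.5 * sigma(x) ** 2 * p_second + b(x) * p_prime
rel = np.abs(residual) / (1.0 + np.abs(b(x) * p_prime))
rel_resid = float(rel[5:-5].max())        # drop a few boundary points
```

The residual should vanish up to discretization error, confirming that p removes the drift.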

Then introduce the optional time σ_1 = τ_a + τ_x ∘ θ_{τ_a}, and define recursively σ_{n+1} = σ_n + σ_1 ∘ θ_{σ_n}. By the strong Markov property, the σ_n form a random walk in [0, ∞] under each P_x.

[0, ∞): Relation (10) yields P_x{τ_0 < ∞} ≥ P_x{τ_0 < τ_b} = (b − x)/b, which tends to 1 as b → ∞.

(−∞, ∞): The recurrence follows from the previous case.

[[0, ∞): Since 0 is reflecting, we have P_0{τ_y < ∞} > 0 for some y > 0. By the strong Markov property and the regularity of X, this extends to arbitrary y. Arguing as in the proof of Lemma 23.8, we may conclude that P_0{τ_y < ∞} = 1 for all y > 0. The asserted recurrence now follows, as we combine with the statement for [0, ∞).

(0, ∞): In this case X = B ∘ [X] a.s. for some Brownian motion B. Since X > 0, we have [X]_∞ < ∞ a.s., and therefore X converges a.s. Now P_y{τ_{a,b} < ∞} = 1 for any 0 < a ≤ y ≤ b. Applying the Markov property at an arbitrary time t > 0, we conclude that a.s. either


liminf_t X_t ≤ a or limsup_t X_t ≥ b. Since a and b are arbitrary, it follows that X_∞ is an endpoint of (0, ∞) and hence equals 0.

(0, 1): Arguing as in the previous case, we get a.s. convergence to either 0 or 1. To find the corresponding probabilities, we conclude from (10) that

P_x{τ_a < ∞} ≥ P_x{τ_a < τ_b} = (b − x)/(b − a).

Now choose some times t_n and a point x_0 ∈ I such that μ_{t_n}(x_0, ·) converges weakly to some measure μ along a subsequence, in the topology of Ī. The convergence extends by Lemma 23.17 to arbitrary x ∈ I, and so

T_{t_n} f(x) → μf,  f ∈ C_0(Ī), x ∈ I.  (26)

Now fix any h ≥ 0 and f ∈ C_0(Ī), and note that even T_h f ∈ C_0(Ī) by Theorem 23.13. Using (26), the semigroup property, and dominated convergence, we get for any x ∈ I

μ(T_h f) ← T_{t_n}(T_h f)(x) = T_h(T_{t_n} f)(x) → μf.

Thus, μT_h = μ for all h, which means that μ is invariant on Ī. In particular, μ(Ī \ I) = 0 by the nature of entrance boundaries, and so the normalized measure μ/μI is an invariant distribution on I. □

Our final lemma provides the crucial connection between speed measure and invariant distributions.

Lemma 23.19 (positive recurrence) For a regular, recurrent diffusion on a natural scale and with speed measure ν, these conditions are equivalent:
(i) νI < ∞;
(ii) the process is positive recurrent;
(iii) an invariant distribution exists.
The invariant distribution is then unique and equals ν/νI.

Proof: If the process is null-recurrent, then clearly no invariant distribution exists. The converse is also true by Lemma 23.18, and so (ii) and (iii) are equivalent. Now fix any bounded, measurable function f: I → ℝ_+ with bounded support. By Theorem 23.14, Fubini's theorem, and dominated convergence, we have for any distribution μ on I

t^{-1} ∫_0^t E_μ f(X_s) ds = E_μ t^{-1} ∫_0^t f(X_s) ds → νf/νI.

If μ is invariant, we get μf = νf/νI, and so νI < ∞. If instead X is null-recurrent, then E_μ f(X_s) → 0 as s → ∞, and we get νf/νI = 0, which implies νI = ∞. □
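The convergence of time averages to stationary moments can be illustrated with the Ornstein-Uhlenbeck diffusion dX_t = −X_t dt + dB_t (an example of our own choosing, not from the text): its invariant distribution is N(0, 1/2), and by positive recurrence long-run time averages match the stationary mean 0 and second moment 1/2.

```python
import numpy as np

# Simulate OU via its exact one-step transition
#   X_{t+h} = e^{-h} X_t + sqrt((1 - e^{-2h})/2) * xi,  xi ~ N(0,1),
# and compare time averages with the stationary moments.
rng = np.random.default_rng(2)
h, n = 0.01, 200_000
a = np.exp(-h)
s = np.sqrt((1.0 - np.exp(-2.0 * h)) / 2.0)
xi = rng.normal(size=n)

x = 0.0
sum_x = sum_x2 = 0.0
for k in range(n):
    x = a * x + s * xi[k]
    sum_x += x
    sum_x2 += x * x
avg_x, avg_x2 = sum_x / n, sum_x2 / n   # stationary values: 0 and 1/2
```

Using the exact transition rather than an Euler step avoids discretization bias in the moments.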

End of proof of Theorem 23.15: It remains to consider the cases when I is either (−∞, ∞), [[0, ∞), or [[0, 1]], since we have otherwise convergence or absorption at some endpoint. In case of [[0, 1]] we note from Theorem 23.12 (ii) that ν is bounded. In the remaining cases ν may be unbounded, and then X is null-recurrent by Lemma 23.19. If ν is bounded, then μ = ν/νI is invariant by the same lemma, and the asserted ν-ergodicity follows from Lemma 23.17 with μ_1 = μ. □

Exercises

1. Prove pathwise uniqueness for the SDE dX_t = (X_t)^{1/2} dB_t + c dt with c > 0. Also show that the solutions X^x with X^x_0 = x satisfy X^x_t < X^y_t a.s. for x < y, up to the time when X^x reaches 0.

2. Let X be Brownian motion in ℝ^d, absorbed at 0. Show that Y = |X|² is a regular diffusion on (0, ∞), describe its boundary behavior for different d, and identify the corresponding case of Theorem 23.15. Verify the conclusion by computing the associated scale function and speed measure.

3. Show that solutions to the equation dX_t = σ(X_t) dB_t cannot explode. (Hint: If X explodes at time ζ < ∞, then [X]_ζ = ∞, and the local time of X tends to ∞ as t → ζ, uniformly on compacts. Now use Theorem 22.5 to see that ζ = ∞ a.s.)

4. Assume in Theorem 23.1 that S_σ = N_σ. Show that the solutions X to (3) form a regular diffusion on a natural scale on every connected component I of S_σ^c. Also note that the endpoints of I are either absorbing or exit boundaries for X. (Hint: Use Theorems 21.11, 22.4, and 22.5, and show that the exit time from any compact interval J ⊂ I is finite.)

5. Assume in Theorem 23.1 that S_σ ⊂ N_σ, and form σ̃ from σ by taking σ̃ = 1 on A = N_σ \ S_σ. Show that any solution X to equation (σ̃, 0) also solves equation (σ, 0), but not conversely unless A = ∅. (Hint: Since λA = 0, we have ∫ 1_A(X_t) dt = ∫ 1_A(X_t) d[X]_t = 0 a.s. by Theorem 22.5.)

6. Assume in Theorem 23.1 that S_σ ⊂ N_σ. Show that equation (σ, 0) has solutions that form a regular diffusion on every connected component of S_σ^c. Prove the corresponding statement for the connected components of N_σ^c when N_σ is closed. (Hint: For S_σ^c, use the preceding result. For N_σ^c, take X to be absorbed when it first reaches N_σ.)

7. In the setting of Theorem 23.14, show that the stated relation implies the convergence in Corollary 20.8 (i). Also use the result to prove a law of large numbers for regular, recurrent diffusions with bounded speed measure ν. (Hint: Note that νg > 0 implies ∫ g(X_s) ds > 0 a.s.)
Chapter 24

Connections with PDEs and Potential Theory Backward equation and Feynman-Kac formula; uniqueness for SDEs from existence for PDEs; harmonic functions and Dirichlet's problem; Green functions as occupation densities; sweeping and equilibrium problems; dependence on conductor and domain; time reversal; capacities and random sets

In Chapters 19 and 21 we saw how elliptic differential operators arise naturally in probability theory as the generators of nice diffusion processes. This fact is the ultimate cause of some profound connections between probability theory and partial differential equations (PDEs). In particular, a suitable extension of the operator ½Δ appears as the generator of Brownian motion in ℝ^d, which leads to a close relationship between classical potential theory and the theory of Brownian motion. More specifically, many basic problems in potential theory can be solved by probabilistic methods, and, conversely, various hitting distributions for Brownian motion can be given a potential theoretic interpretation. This chapter explores some of the mentioned connections. First we derive the celebrated Feynman-Kac formula and show how existence of solutions to a given Cauchy problem implies uniqueness of solutions to the associated SDE. We then proceed with a probabilistic construction of Green functions and potentials and solve the Dirichlet, sweeping, and equilibrium problems of classical potential theory in terms of Brownian motion. Finally, we show how Green capacities and alternating set functions can be represented in a natural way in terms of random sets. Some stochastic calculus from Chapters 17 and 21 is used at the beginning of the chapter, and we also rely on the theory of Feller processes from Chapter 19. As for Brownian motion, the present discussion is essentially self-contained, apart from some elementary facts cited from Chapters 13 and 18. Occasionally we refer to Chapters 4 and 16 for some basic weak convergence theory. Finally, the results at the end of the chapter require the existence of Poisson processes from Proposition 12.5, as well as some basic facts about the Fell topology listed in Theorem A2.5.
Potential theoretic ideas are used in several other chapters, and additional, though essentially unrelated, results appear especially in Chapters 20, 22, and 25.


To begin with the general PDE connections, we consider an arbitrary Feller diffusion in ℝ^d with associated semigroup operators T_t and generator (A, 𝒟). Recall from Theorem 19.6 that, for any f ∈ 𝒟, the function

u(t, x) = T_t f(x) = E_x f(X_t),  t ≥ 0, x ∈ ℝ^d,

satisfies Kolmogorov's backward equation u̇ = Au, where u̇ = ∂u/∂t. Thus, u provides a probabilistic solution to the Cauchy problem

u̇ = Au,  u(0, x) = f(x).  (1)

Let us now add a potential term vu to (1), where v: ℝ^d → ℝ_+, and consider the more general problem

u̇ = Au − vu,  u(0, x) = f(x).  (2)

Here the solution may be expressed in terms of the elementary multiplicative functional e^{-V}, where

V_t = ∫_0^t v(X_s) ds,  t ≥ 0.

Let C^{1,2} denote the class of functions f: ℝ_+ × ℝ^d → ℝ that are of class C¹ in the time variable and of class C² in the space variables. Write C_b(ℝ^d) and C_b^+(ℝ^d) for the classes of bounded, continuous functions from ℝ^d to ℝ and ℝ_+, respectively.

Theorem 24.1 (Cauchy problem, Feynman, Kac) Let (A, 𝒟) be the generator of a Feller diffusion in ℝ^d, and fix any f ∈ C_b(ℝ^d) and v ∈ C_b^+(ℝ^d). Then any bounded solution u ∈ C^{1,2} to (2) is given by

u(t, x) = E_x e^{-V_t} f(X_t),  t ≥ 0, x ∈ ℝ^d.  (3)

Conversely, (3) solves (2) whenever f ∈ 𝒟.

The expression in (3) has an interesting interpretation in terms of killing. To see this, we may introduce an exponential random variable γ with mean 1, independent of X, and define ζ = inf{t ≥ 0; V_t > γ}. Letting X̃ denote the process X killed at time ζ, we may express the right-hand side of (3) as E_x f(X̃_t), with the understanding that f(X̃_t) = 0 when t ≥ ζ. In other words, u(t, x) = T̃_t f(x), where T̃_t is the transition operator of the killed process. It is easy to verify directly from (3) that the family (T̃_t) is again a Feller semigroup.

Proof of Theorem 24.1: Assume that u ∈ C^{1,2} is bounded and solves (2), and define for fixed t > 0

M_s = e^{-V_s} u(t − s, X_s),  s ∈ [0, t].

Letting =_m denote equality apart from a continuous local martingale or its differential, we see from Lemma 19.21, Itô's formula, and (2) that, for any s < t,

dM_s =_m e^{-V_s}{du(t − s, X_s) − u(t − s, X_s) v(X_s) ds}
 =_m e^{-V_s}{Au(t − s, X_s) − u̇(t − s, X_s) − u(t − s, X_s) v(X_s)} ds = 0.


Thus, M is a continuous local martingale on [0, t). Since M is bounded, the martingale property extends to t, and we get

u(t, x) = E_x M_0 = E_x M_t = E_x e^{-V_t} u(0, X_t) = E_x e^{-V_t} f(X_t).

Next let u be given by (3) for some f ∈ 𝒟. Integrating by parts and using Lemma 19.21, we obtain

d{e^{-V_t} f(X_t)} =_m e^{-V_t}{df(X_t) − (vf)(X_t) dt} =_m e^{-V_t}(Af − vf)(X_t) dt.

Taking expectations and differentiating at t = 0, we conclude that the generator of the semigroup T̃_t f(x) = E_x f(X̃_t) = u(t, x) equals Ã = A − v on 𝒟. Equation (2) now follows by the last assertion in Theorem 19.6. □
Taking expectations and differentiating at t = 0, we conclude that the generator of the semigroup Ttf(x) = ExfCXt) = u(t, x) equals Ä. = A- v on V. Equation (2) now follows by the last assertion in Theorem 19.6. D The converse part of Theorem 24.1 can often be improved in special cases. In particular, if v = 0 and A = ~ ß = ~ L.JP j 8x~, so that X is a Brownian motion and (2) reduces to the standard heat equation, then u(t, x) = Exf(Xt) solves (2) for any bounded, continuous function f on ~d. To see this, we note that u E C 1 •2 on (0, oo) x ~d because of the smoothness of the Brownian transition density. We may then obtain (2) by applying the backward equation to the function Thf(x) for a fixed h E (0, t). Let us now consider an SDE in ~d of the form dXi

= a}(Xt)dB{ + bi(Xt)dt,

(4)

and introduce the associated elliptic operator Av(x)

= ~aij(x)v~j(x) + bi(x)v~(x),

XE ~d, v E

C2,

where aij = ala~. The next result shows how uniqueness in law for solutions to (4) may be inferred from the existence of solutions to the associated Cauchy problern (1). Theorem 24.2 (uniqueness, Stroock and Varadhan) lf for every f E C0 (~d) the Cauchy problern in (1) has a bounded solutionon [O,c] x ~d for some c > 0, then uniqueness in law holds for the SDE (4).

Proof: Fix any f ∈ C_0^∞ and t ∈ (0, ε], and let u be a bounded solution to (1) on [0, t] × ℝ^d. If X solves (4), we note as before that M_s = u(t − s, X_s) is a martingale on [0, t], and so

E f(X_t) = E u(0, X_t) = E M_t = E M_0 = E u(t, X_0).

Thus, the one-dimensional distributions of X on [0, ε] are uniquely determined by the initial distribution.

Now assume that X and Y are solutions with the same initial distribution. To prove that their finite-dimensional distributions agree, it is enough to consider times 0 = t_0 < t_1 < ⋯ < t_n such that t_k − t_{k−1} ≤ ε for all k. Assume that the distributions agree at t_0, …, t_{n−1} = t, and fix any set C = π^{-1}_{t_0,…,t_{n−1}} B with B ∈ 𝓑^{nd}. By Theorem 21.7, both ℒ(X) and ℒ(Y) solve the local martingale problem for (a, b). If P{X ∈ C} = P{Y ∈ C} > 0, we see as in the case of Theorem 21.11 that the same property holds for the conditional measures P[θ_t X ∈ · | X ∈ C] and P[θ_t Y ∈ · | Y ∈ C]. Since the corresponding initial distributions agree by hypothesis, the one-dimensional result yields the extension

P{X ∈ C, X_{t+h} ∈ ·} = P{Y ∈ C, Y_{t+h} ∈ ·},  h ∈ (0, ε].

In particular, the distributions agree at times t_0, …, t_n. The general result now follows by induction. □
In particular, the distributions agree at times to, ... , tn. The general result now follows by induction. 0 Let us now specialize to the case when X is Brownian motion in JRd. For any closed set B C JRd, we introduce the hitting time 'TB= inf{t > O;Xt E B} and associated hitting kernet HB(X, dy)

= Px{rB < oo, X-rB

E dy},

XE

lRd.

For suitable functions J, we write HBf(x) = J f(y)HB(x, dy). By a domain in JRd we mean an open, connected subset D c JRd. A function u: D -+ lR is said to be harmonic if it belongs to C 2 (D) and satisfies the Laplace equation ßu = 0. We also say that u has the meanvalue property if it is locally bounded and measurable, and such that for any ball B C D with center x, the average of u over the boundary äB equals u( x). The following analytic result is crucial for the probabilistic developments.

Lemma 24.3 (harmonic functions, Gauss, Koebe) A function u on a domain D ⊂ ℝ^d is harmonic iff it has the mean-value property, in which case u ∈ C^∞(D).

Proof: First assume that u ∈ C²(D), and fix a ball B̄ ⊂ D with center x. Writing τ = τ_{∂B} and noting that E_x τ < ∞, we get by Itô's formula

E_x u(X_τ) − u(x) = ½ E_x ∫_0^τ Δu(X_s) ds.

Here the first term on the left equals the average of u over ∂B, due to the spherical symmetry of Brownian motion. If u is harmonic, then the right-hand side vanishes, and the mean-value property follows. If instead u is not harmonic, we may choose B such that Δu ≠ 0 on B. But then the right-hand side is nonzero, and so the mean-value property fails.

It remains to show that every function u with the mean-value property is infinitely differentiable. Then fix any infinitely differentiable and spherically symmetric probability density φ, supported by a ball of radius ε > 0 around the origin. The mean-value property yields u = u * φ on the set where the right-hand side is defined, and by dominated convergence the infinite differentiability of φ carries over to u * φ = u. □

Before proceeding to the potential theoretic developments, we need to introduce a regularity condition on the domain D. Writing ζ = ζ_D = τ_{D^c}, we note that P_x{ζ = 0} = 0 or 1 for every x ∈ ∂D by Corollary 19.18. When this probability is 1, we say that x is regular for D^c or simply regular.
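The mean-value property can be checked by direct quadrature for an illustrative harmonic polynomial (our own example): u(x, y) = x² − y² averages over any circle to its center value, while the non-harmonic x² picks up an extra r²/2.

```python
import numpy as np

# Average u over a circle of radius r centered at (cx, cy) and compare
# with the center value.
theta = np.linspace(0.0, 2.0 * np.pi, 100_000, endpoint=False)
cx, cy, r = 0.3, -0.2, 0.5
xs = cx + r * np.cos(theta)
ys = cy + r * np.sin(theta)

harmonic_avg = float(np.mean(xs**2 - ys**2))   # should equal cx^2 - cy^2
center_value = cx**2 - cy**2
nonharmonic_avg = float(np.mean(xs**2))        # equals cx^2 + r^2/2, not cx^2
```

The r²/2 excess for x² is exactly the failure of harmonicity: Δ(x²) = 2 ≠ 0.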


If this holds for every x ∈ ∂D, then the boundary ∂D is said to be regular, and we refer to D as a regular domain.

Regularity is a fairly weak condition. In particular, any domain with a smooth boundary is regular, and we shall see that even various edges and corners are allowed, provided they are not too sharp and directed inward. By a spherical cone in ℝ^d with vertex v and axis a ≠ 0 we mean a set of the form C = {x; ⟨x − v, a⟩ ≥ c|x − v|}, where c ∈ (0, |a|].

Lemma 24.4 (cone condition, Zaremba) Given a domain D ⊂ ℝ^d, let x ∈ ∂D be such that C ∩ G ⊂ D^c for some spherical cone C with vertex x and some neighborhood G of x. Then x is regular for D^c.

Proof: By compactness of the unit sphere in ℝ^d, we may cover ℝ^d by C_1 = C along with finitely many congruent cones C_2, …, C_n with vertex x. By rotational symmetry

1 = P_x{min_k τ_{C_k} = 0} ≤ Σ_k P_x{τ_{C_k} = 0} = n P_x{τ_C = 0},

so that P_x{τ_C = 0} > 0. Hence, Corollary 19.18 yields P_x{τ_C = 0} = 1, and since ζ_D ≤ τ_{C∩G}, we get ζ_D = 0 a.s. P_x. □

Now fix a domain D C llld and a continuous function f : öD -+ R A function u on D is said to solve the Dirichlet problern (D, !), if u is harmonic on D and continuous on D with u = f on öD. The solution may be interpreted as the electrostatic potential in D when the potential on the boundary is given by f. Theorem 24.5 (Dirichlet problern, Kakutani, Doob) For any regular dornain D C Jlld and function f E Cb(öD), the Dirichlet problern (D, f) is solved by the function u(x) = Ex[f(X(n); (D < oo] = Hvcf(x), x E D. (5) If (v < oo a.s., then this is the only bounded solution; when d 2: 3 and f E Co(öD), it is the only solution in Co(D).
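Formula (5) suggests a direct Monte Carlo solver for the Dirichlet problem. The sketch below is my own illustration, not the book's construction: it uses an Euler discretization of Brownian motion on the unit disk (which introduces a small boundary-overshoot bias) and boundary data $f(x, y) = x^2 - y^2$, whose harmonic extension is the same polynomial, so the answer is known in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(p):
    # Boundary data on the unit circle; its harmonic extension is x^2 - y^2.
    return p[:, 0] ** 2 - p[:, 1] ** 2

def dirichlet_mc(x0, n_paths=10000, dt=1e-3, max_steps=20000):
    # u(x) = E_x f(X_zeta): run Brownian paths from x0 until they leave
    # the unit disk, then average f over the exit points, as in (5).
    pos = np.tile(np.asarray(x0, float), (n_paths, 1))
    alive = np.ones(n_paths, dtype=bool)
    exit_pts = np.zeros_like(pos)
    for _ in range(max_steps):
        if not alive.any():
            break
        step = rng.normal(scale=np.sqrt(dt), size=(alive.sum(), 2))
        pos[alive] += step
        out = alive.copy()
        out[alive] = np.linalg.norm(pos[alive], axis=1) >= 1.0
        # Project the overshoot radially back to the circle as the exit point.
        exit_pts[out] = pos[out] / np.linalg.norm(pos[out], axis=1, keepdims=True)
        alive &= ~out
    exit_pts[alive] = pos[alive]  # stragglers; virtually never reached
    return f(exit_pts).mean()

est = dirichlet_mc((0.3, 0.2))
exact = 0.3 ** 2 - 0.2 ** 2   # x^2 - y^2 is its own harmonic extension
print(abs(est - exact) < 0.05)
```

The tolerance absorbs both the Monte Carlo error and the discretization bias; refining `dt` and increasing `n_paths` shrinks the error in the usual way.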

Thus, $H_{D^c}$ agrees with the sweeping (balayage) kernel of Newtonian potential theory, which determines the harmonic measure on $\partial D$. The following result clarifies the role of the regularity condition on $\partial D$.

Lemma 24.6 (regularity, Doob) A point $b \in \partial D$ is regular for $D^c$ iff, for any $f \in C_b(\partial D)$, the function $u$ in (5) satisfies $u(x) \to f(b)$ as $D \ni x \to b$.

Proof: First assume that $b$ is regular. For any $t > h > 0$ and $x \in D$, we get by the Markov property

$$P_x\{\zeta > t\} \leq P_x\{\zeta \circ \theta_h > t - h\} = E_x P_{X_h}\{\zeta > t - h\}.$$

Here the right-hand side is continuous in $x$, by the continuity of the Gaussian kernel and dominated convergence, and so

$$\limsup_{x \to b} P_x\{\zeta > t\} \leq E_b P_{X_h}\{\zeta > t - h\} = P_b\{\zeta \circ \theta_h > t - h\}.$$

24. Connections with PDEs and Potential Theory

475

As $h \to 0$, the probability on the right tends to $P_b\{\zeta > t\} = 0$, and so $P_x\{\zeta > t\} \to 0$ as $x \to b$, which means that $P_x \circ \zeta^{-1} \xrightarrow{w} \delta_0$. Since also $P_x \xrightarrow{w} P_b$ in $C(\mathbb{R}_+, \mathbb{R}^d)$, Theorem 4.28 yields $P_x \circ (X, \zeta)^{-1} \xrightarrow{w} P_b \circ (X, 0)^{-1}$ in $C(\mathbb{R}_+, \mathbb{R}^d) \times [0, \infty]$. By the continuity of the mapping $(x, t) \mapsto x_t$ it follows that $P_x \circ X_\zeta^{-1} \xrightarrow{w} P_b \circ X_0^{-1} = \delta_b$, and so $u(x) \to f(b)$ by the continuity of $f$.

Next assume the stated condition. If $d = 1$, then $D$ is an interval, which is obviously regular. Now assume that $d \geq 2$. By the Markov property we get for any $f \in C_b(\partial D)$

$$E_b[f(X_\zeta);\; \zeta < \infty] = E_b[u(X_h);\; h < \zeta < \infty] + E_b[f(X_\zeta);\; \zeta \leq h].$$

As $h \to 0$, it follows by dominated convergence that $u(b) = f(b)$, and for $f(x) = e^{-|x - b|}$ we get $P_b\{X_\zeta = b,\; \zeta < \infty\} = 1$. Since a.s. $X_t \neq b$ for all $t > 0$ by Theorem 18.6 (i), we may conclude that $P_b\{\zeta = 0\} = 1$, and so $b$ is regular. $\Box$

Proof of Theorem 24.5: Let $u$ be given by (5), fix any closed ball in $D$ with center $x$ and boundary $S$, and conclude by the strong Markov property at $\tau = \tau_S$ that

$$u(x) = E_x\, E_{X_\tau}[f(X_\zeta);\; \zeta < \infty] = E_x u(X_\tau).$$

This shows that $u$ has the mean-value property, and so by Lemma 24.3 it is harmonic. From Lemma 24.6 it is further seen that $u$ is continuous on $\bar{D}$ with $u = f$ on $\partial D$. Thus, $u$ solves the Dirichlet problem $(D, f)$.

Now assume that $d \geq 3$ and $f \in C_0(\partial D)$. For any $\varepsilon > 0$ we have

$$|u(x)| \leq \varepsilon + \|f\|\, P_x\{|f(X_\zeta)| > \varepsilon,\; \zeta < \infty\}. \tag{6}$$

Since $X$ is transient by Theorem 18.6 (ii) and the set $\{y \in \partial D;\, |f(y)| > \varepsilon\}$ is bounded, the right-hand side of (6) tends to $0$ as $|x| \to \infty$ and then $\varepsilon \to 0$, which shows that $u \in C_0(\bar{D})$.

To prove the asserted uniqueness, it is clearly enough to assume $f = 0$ and show that any solution $u$ with the stated properties is identically zero. If $d \geq 3$ and $u \in C_0(\bar{D})$, then this is clear by Lemma 24.3, which shows that harmonic functions can have no local maxima or minima. Next assume that $\zeta < \infty$ a.s. and $u \in C_b(\bar{D})$. By Corollary 17.19 we have $E_x u(X_{\zeta \wedge n}) = u(x)$ for any $x \in D$ and $n \in \mathbb{N}$, and as $n \to \infty$, we get by continuity and dominated convergence $u(x) = E_x u(X_\zeta) = 0$. $\Box$

To prepare for our probabilistic construction of the Green function in a domain $D \subset \mathbb{R}^d$, we need to study the transition densities of Brownian motion killed on the boundary $\partial D$. Recall that ordinary Brownian motion in $\mathbb{R}^d$ has transition densities

$$p_t(x, y) = (2\pi t)^{-d/2}\, e^{-|x - y|^2 / 2t}, \qquad x, y \in \mathbb{R}^d,\; t > 0. \tag{7}$$


By the strong Markov property and Theorem 6.4, we get for any $t > 0$, $x \in D$, and $B \in \mathcal{B}(D)$,

$$P_x\{X_t \in B\} = P_x\{X_t \in B,\; t \leq \zeta\} + E_x\big[P_{X_\zeta}\{X_{t - \zeta} \in B\};\; t > \zeta\big].$$

Thus, the killed process has transition densities

$$p_t^D(x, y) = p_t(x, y) - E_x[p_{t - \zeta}(X_\zeta, y);\; \zeta < t], \qquad x, y \in D,\; t > 0. \tag{8}$$
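For the half-line $D = (0, \infty)$ in $d = 1$ the expectation in (8) can be evaluated explicitly: since $X_\zeta = 0$, the reflection principle gives $p_t^D(x, y) = p_t(x - y) - p_t(x + y)$. A quick check of mine (not from the text) verifies the symmetry in $(x, y)$ asserted in Theorem 24.7 below, and that the total mass equals the survival probability $P_x\{\zeta > t\}$:

```python
import math

def p(t, z):
    # Standard 1-d Gaussian transition density p_t(z).
    return math.exp(-z * z / (2 * t)) / math.sqrt(2 * math.pi * t)

def p_killed(t, x, y):
    # Density of BM killed at 0, D = (0, inf): the subtracted term in (8)
    # collapses to a single image term since X_zeta = 0.
    return p(t, x - y) - p(t, x + y)

# Symmetry p_t^D(x, y) = p_t^D(y, x).
print(abs(p_killed(1.0, 0.4, 1.3) - p_killed(1.0, 1.3, 0.4)) < 1e-12)

# Total mass over y equals P_x{zeta > t} = erf(x / sqrt(2 t)).
t, x, n, h = 1.0, 0.7, 20000, 10.0 / 20000
mass = sum(p_killed(t, x, (k + 0.5) * h) for k in range(n)) * h
survival = math.erf(x / math.sqrt(2 * t))
print(abs(mass - survival) < 1e-6)
```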

The following symmetry and continuity properties of $p_t^D$ play a crucial role in the sequel.

Theorem 24.7 (transition density, Hunt) For any domain $D$ in $\mathbb{R}^d$ and time $t > 0$, the function $p_t^D$ is symmetric and continuous on $D^2$. If $b \in \partial D$ is regular, then $p_t^D(x, y) \to 0$ as $x \to b$ for fixed $y \in D$.

Proof: From (7) we note that $p_t(x, y)$ is uniformly continuous in $(x, y)$ for fixed $t > 0$, as well as in $(x, y, t)$ for $|x - y| > \varepsilon > 0$ and $t > 0$. By (8) it follows that $p_t^D(x, y)$ is equicontinuous in $y \in D$ for fixed $t > 0$. To prove the continuity in $x \in D$ for fixed $t > 0$ and $y \in D$, it is then enough to show that $P_x\{X_t \in B,\; t \leq \zeta\}$ is continuous in $x$ for fixed $t > 0$ and $B \in \mathcal{B}(D)$. Letting $h \in (0, t)$, we get by the Markov property

$$P_x\{X_t \in B,\; \zeta \geq t\} = E_x\big[P_{X_h}\{X_{t - h} \in B,\; \zeta \geq t - h\};\; \zeta > h\big].$$

Thus, for any $x, y \in D$,

$$\big|(P_x - P_y)\{X_t \in B,\; t \leq \zeta\}\big| \;\leq\; (P_x + P_y)\{\zeta \leq h\} + \big\|P_x \circ X_h^{-1} - P_y \circ X_h^{-1}\big\|,$$

which tends to $0$ as $y \to x$ and then $h \to 0$. Combining the continuity in $x$ with the equicontinuity in $y$, we conclude that $p_t^D(x, y)$ is continuous in $(x, y) \in D^2$ for fixed $t > 0$.

To prove the symmetry in $x$ and $y$, it is now enough to establish the integrated version

$$\int_C P_x\{X_t \in B,\; \zeta > t\}\, dx = \int_B P_x\{X_t \in C,\; \zeta > t\}\, dx \tag{9}$$

for any bounded sets $B, C \in \mathcal{B}(D)$. Then fix any compact set $F \subset D$. Letting $n \in \mathbb{N}$ and writing $h = 2^{-n}t$ and $t_k = kh$, we get by Proposition 8.2

$$\int_C P_x\{X_{t_k} \in F,\; k \leq 2^n;\; X_t \in B\}\, dx = \int_F \cdots \int_F 1_C(x_0)\, 1_B(x_{2^n}) \prod_{k \leq 2^n} p_h(x_{k-1}, x_k)\, dx_0 \cdots dx_{2^n}.$$

Here the right-hand side is symmetric in the pair $(B, C)$, because of the symmetry of $p_h(x, y)$. By dominated convergence as $n \to \infty$ we obtain (9) with $F$ instead of $D$, and the stated version follows by monotone convergence as $F \uparrow D$.


To prove the last assertion, we recall from the proof of Lemma 24.6 that $P_x \circ (\zeta, X)^{-1} \xrightarrow{w} P_b \circ (0, X)^{-1}$ as $x \to b$ with $b \in \partial D$ regular. In particular, $P_x \circ (\zeta, X_\zeta)^{-1} \xrightarrow{w} \delta_{(0, b)}$, and since $|y - b| > \varepsilon$ for some $\varepsilon > 0$, it is clear from (8) that $p_t^D(x, y) \to 0$. $\Box$

A domain $D \subset \mathbb{R}^d$ is said to be Greenian if either $d \geq 3$, or if $d \leq 2$ and $P_x\{\zeta_D < \infty\} = 1$ for all $x \in D$. Since the latter probability is harmonic in $x$, it is enough by Lemma 24.3 to verify the stated property for a single $x \in D$. Given a Greenian domain $D$, we may introduce the Green function

$$g^D(x, y) = \int_0^\infty p_t^D(x, y)\, dt, \qquad x, y \in D.$$

For any measure $\mu$ on $D$, we may further introduce the associated Green potential

$$G^D\mu(x) = \int g^D(x, y)\, \mu(dy), \qquad x \in D.$$

Writing $G^D\mu = G^D f$ when $\mu(dy) = f(y)\, dy$, we get by Fubini's theorem

$$G^D f(x) = E_x \int_0^\zeta f(X_t)\, dt, \qquad x \in D,$$

which identifies $g^D$ as an occupation density for the killed process. The next result shows that $g^D$ and $G^D$ agree with the Green function and Green potential of classical potential theory. Thus, $G^D\mu(x)$ may be interpreted as the electrostatic potential at $x$ arising from a charge distribution $\mu$ in $D$, when the boundary $\partial D$ is grounded.

Theorem 24.8 (Green function) For any Greenian domain $D \subset \mathbb{R}^d$, the function $g^D$ is symmetric on $D^2$. Furthermore, $g^D(x, y)$ is harmonic in $x \in D \setminus \{y\}$ for each $y \in D$, and if $b \in \partial D$ is regular, then $g^D(x, y) \to 0$ as $x \to b$ for fixed $y \in D$.
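In $d = 1$ the Green function is explicit, which makes the occupation-density identity easy to test. For $D = (0, 1)$ and Brownian motion with generator $\tfrac{1}{2}\,d^2/dx^2$ (the normalization used throughout this chapter), the known closed form is $g^D(x, y) = 2(x \wedge y)(1 - x \vee y)$; integrating against $f \equiv 1$ must then reproduce $G^D 1(x) = E_x\zeta = x(1 - x)$. A small check of mine:

```python
def g(x, y):
    # Green function of BM (generator (1/2) d^2/dx^2) on D = (0, 1):
    # g(x, y) = 2 * min(x, y) * (1 - max(x, y))  [known closed form]
    return 2 * min(x, y) * (1 - max(x, y))

def green_potential_one(x, n=200000):
    # G^D 1(x) = integral of g(x, y) dy, by midpoint quadrature.
    h = 1.0 / n
    return sum(g(x, (k + 0.5) * h) for k in range(n)) * h

x = 0.3
print(abs(green_potential_one(x) - x * (1 - x)) < 1e-9)  # E_x zeta = x(1-x)
```

The integrand is piecewise linear in $y$, so midpoint quadrature is exact up to the single cell containing the kink at $y = x$.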

The proof is straightforward when $d \geq 3$, but for $d \leq 2$ we need two technical lemmas. We begin with a uniform estimate for large $t$.

Lemma 24.9 (uniform integrability) Consider a domain $D \subset \mathbb{R}^d$, assumed to be bounded when $d \leq 2$. Then

$$\lim_{t \to \infty}\; \sup_{x, y \in D} \int_t^\infty p_s^D(x, y)\, ds = 0.$$

Proof: For $d \geq 3$ we may take $D = \mathbb{R}^d$, in which case the result is obvious from (7). Next let $d = 2$. By obvious domination and scaling arguments, we may then assume that $|x| \leq 1$, $y = 0$, $D = \{z;\, |z| \leq 2\}$, and $t > 1$.

478

Foundations of Modern Probability

Writing $p_t(x) = p_t(x, 0)$, we get by (8), for $t > 1$,

$$p_t^D(x, 0) \;\leq\; p_{t/2}(0)\, P_x\{\zeta > t/2\} + p_t(0) - p_t(1) \;\lesssim\; P_0\{\zeta > t/2\} + t^{-2}.$$

As in the case of Lemma 23.8 (ii), we have $E_0\zeta < \infty$, and so by Lemma 3.4 the right-hand side is integrable in $t \in [1, \infty)$. The proof for $d = 1$ is similar. $\Box$

We also need the fact that bounded sets have bounded Green potential.

Lemma 24.10 (boundedness) For any Greenian domain $D \subset \mathbb{R}^d$ and bounded set $B \in \mathcal{B}(D)$, the function $G^D 1_B$ is bounded.

Proof: By domination and scaling together with the strong Markov property, it suffices to take $B = \{x;\, |x| \leq 1\}$ and to show that $G^D 1_B(0) < \infty$. For $d \geq 3$ we may further take $D = \mathbb{R}^d$, in which case the result follows by a simple computation. For $d = 2$ we may assume that $D \supset B^\circ \equiv \{x;\, |x| < 2\}$. Write $\sigma = \zeta_{B^\circ} + \tau_B \circ \theta_{\zeta_{B^\circ}}$ and $\tau_0 = 0$, and recursively define $\tau_{k+1} = \tau_k + \sigma \circ \theta_{\tau_k}$, $k \geq 0$. Putting $b = (1, 0)$, we get by the strong Markov property at the times $\tau_k$

$$G^D 1_B(0) \;\leq\; \big(G^{B^\circ} 1_B(0) \vee G^{B^\circ} 1_B(b)\big) \sum\nolimits_k P_0\{\tau_k < \zeta\}.$$

Here $G^{B^\circ} 1_B(0) \vee G^{B^\circ} 1_B(b) < \infty$ by Lemma 24.9. By the strong Markov property it is further seen that $P_0\{\tau_k < \zeta\} \leq p^k$, where $p = \sup_{x \in B} P_x\{\sigma < \zeta\}$. Finally, note that $p < 1$, since $P_x\{\sigma < \zeta\}$ is harmonic and hence continuous on $B$. The proof for $d = 1$ is similar. $\Box$

Proof of Theorem 24.8: The symmetry of $g^D$ is clear from Theorem 24.7. If $d \geq 3$, or if $d = 2$ and $D$ is bounded, it is further seen from Theorem 24.7, Lemma 24.9, and dominated convergence that $g^D(x, y)$ is continuous in $x \in D \setminus \{y\}$ for each $y \in D$. Next we note that $G^D 1_B$ has the mean-value property in $D \setminus B$ for bounded $B \in \mathcal{B}(D)$. The property extends by continuity to the density $g^D(x, y)$, which is then harmonic in $x \in D \setminus \{y\}$ for fixed $y \in D$, by Lemma 24.3.

For $d = 2$ and unbounded $D$, we define $D_n = \{x \in D;\, |x| < n\}$, and note as before that $g^{D_n}(x, y)$ has the mean-value property in $x \in D_n \setminus \{y\}$ for each $y \in D_n$. Since $p_t^{D_n} \uparrow p_t^D$ by dominated convergence, we have $g^{D_n} \uparrow g^D$, and so the mean-value property extends to the limit. For any $x \neq y$ in $D$, choose a circular disk $B$ around $y$ with radius $s > 0$ small enough that $x \notin B \subset D$. Then $\pi s^2\, g^D(x, y) = G^D 1_B(x) < \infty$ by Lemma 24.10. Thus, by Lemma 24.3 even $g^D(x, y)$ is harmonic in $x \in D \setminus \{y\}$.

To prove the last assertion, fix any $y \in D$, and assume that $x \to b \in \partial D$. Choose a Greenian domain $D' \supset D$ with $b \in D'$. Since $p_t^D \leq p_t^{D'}$, and


both $p_t^{D'}(\cdot, y)$ and $g^{D'}(\cdot, y)$ are continuous at $b$ whereas $p_t^D(x, y) \to 0$ by Theorem 24.7, we get $g^D(x, y) \to 0$ by Theorem 1.21. $\Box$

We proceed to show that a measure is determined by its Green potential whenever the latter is finite. An extension appears as part of Theorem 24.12. For convenience, we write

$$P_t^D\mu(x) = \int p_t^D(x, y)\, \mu(dy), \qquad x \in D,\; t > 0.$$

Theorem 24.11 (uniqueness) If $\mu$ and $\nu$ are measures on a Greenian domain $D \subset \mathbb{R}^d$ such that $G^D\mu = G^D\nu < \infty$, then $\mu = \nu$.

Proof: For any $t > 0$ we have

$$\int_0^t (P_s^D\mu)\, ds = G^D\mu - P_t^D G^D\mu = G^D\nu - P_t^D G^D\nu = \int_0^t (P_s^D\nu)\, ds. \tag{10}$$

By the symmetry of $p^D$, we further get for any measurable function $f: D \to \mathbb{R}_+$

$$\int f(x)\, P_s^D\mu(x)\, dx = \int \mu(dy) \int f(x)\, p_s^D(x, y)\, dx = \int P_s^D f(y)\, \mu(dy).$$

Hence,

$$\int_0^t ds \int P_s^D f(y)\, \mu(dy) = \int \mu(dy) \int_0^t P_s^D f(y)\, ds,$$

and similarly for $\nu$. By (10) we obtain

$$\int \mu(dy) \int_0^t P_s^D f(y)\, ds = \int \nu(dy) \int_0^t P_s^D f(y)\, ds. \tag{11}$$

Assuming that $f \in C_K^+(D)$, we get $P_s^D f \to f$ as $s \to 0$, and so $t^{-1}\int_0^t P_s^D f\, ds \to f$. If we can take limits inside the outer integrations in (11), we obtain $\mu f = \nu f$, which implies $\mu = \nu$ since $f$ is arbitrary.

To justify the argument, it suffices to show that $\sup_s P_s^D f$ is $\mu$- and $\nu$-integrable. Then conclude from Theorem 24.7 that $f \lesssim p_s^D(\cdot, y)$ for fixed $s > 0$ and $y \in D$, and from Theorem 24.8 that $f \lesssim G^D f$. The latter property yields $P_s^D f \lesssim P_s^D G^D f \leq G^D f$, and by the former property we get for any $y \in D$ and $s > 0$

$$\mu(G^D f) = \int G^D\mu(x)\, f(x)\, dx \;\lesssim\; P_s^D G^D\mu(y) \;\leq\; G^D\mu(y) < \infty,$$

and similarly for $\nu$. $\Box$


Now let $\mathcal{F}_D$ and $\mathcal{K}_D$ denote the classes of closed and compact subsets of $D$, and write $\hat{\mathcal{F}}_D$ and $\hat{\mathcal{K}}_D$ for the subclasses of sets with regular boundary. For any $B \in \mathcal{F}_D$ we may introduce the associated hitting kernel

$$H_B^D(x, dy) = P_x\{\tau_B < \zeta_D,\; X_{\tau_B} \in dy\}, \qquad x \in D.$$

Note that if $X$ has initial distribution $\mu$, then the hitting distribution of $X^\zeta$ in $B$ equals $\mu H_B^D = \int \mu(dx)\, H_B^D(x, \cdot)$.

The next result solves the sweeping problem of classical potential theory. To avoid technical complications, here and below we shall consider only subsets with regular boundary. In general, the irregular part of the boundary can be shown to be polar, in the sense of being a.s. avoided by a Brownian motion. Given this result, one can easily remove all regularity restrictions.

Theorem 24.12 (sweeping and hitting) For any Greenian domain $D \subset \mathbb{R}^d$ and subset $B \in \hat{\mathcal{F}}_D$, let $\mu$ be a bounded measure on $D$ with $G^D\mu < \infty$ on $B$. Then $\mu H_B^D$ is the unique measure $\nu$ on $B$ with $G^D\mu = G^D\nu$ on $B$.

For an electrostatic interpretation, assume that a grounded conductor $B$ is inserted into a domain $D$ with grounded boundary and charge distribution $\mu$. Then a charge distribution $-\mu H_B^D$ arises on $B$.

A lemma is needed for the proof. Here we define $g^{D \setminus B}(x, y) = 0$ whenever $x$ or $y$ lies in $B$.

Lemma 24.13 (fundamental identity) For any Greenian domain $D \subset \mathbb{R}^d$ and subset $B \in \hat{\mathcal{F}}_D$, we have

$$g^D(x, y) = g^{D \setminus B}(x, y) + \int H_B^D(x, dz)\, g^D(z, y), \qquad x, y \in D.$$

Proof: Write $\zeta = \zeta_D$ and $\tau = \tau_B$. Subtracting relations (8) for the domains $D$ and $D \setminus B$, and using the strong Markov property at $\tau$ together with Theorem 6.4, we get

$$\begin{aligned}
p_t^D(x, y) - p_t^{D \setminus B}(x, y)
&= E_x[p_{t-\tau}(X_\tau, y);\; \tau < \zeta \wedge t] - E_x[p_{t-\zeta}(X_\zeta, y);\; \tau < \zeta < t] \\
&= E_x[p_{t-\tau}(X_\tau, y);\; \tau < \zeta \wedge t] - E_x\big[E_{X_\tau}[p_{t-\tau-\zeta}(X_\zeta, y);\; \zeta < t - \tau];\; \tau < \zeta \wedge t\big] \\
&= E_x[p_{t-\tau}^D(X_\tau, y);\; \tau < \zeta \wedge t].
\end{aligned}$$

Now integrate with respect to $t$ to get

$$g^D(x, y) - g^{D \setminus B}(x, y) = E_x[g^D(X_\tau, y);\; \tau < \zeta] = \int H_B^D(x, dz)\, g^D(z, y). \qquad \Box$$

Proof of Theorem 24.12: Since $\partial B$ is regular, we have $H_B^D(x, \cdot) = \delta_x$ for all $x \in B$, and so by Lemma 24.13 we get for all $x \in B$ and $z \in D$

$$\int g^D(x, y)\, H_B^D(z, dy) = \int g^D(z, y)\, H_B^D(x, dy) = g^D(z, x).$$


Integrating with respect to $\mu(dz)$ gives $G^D(\mu H_B^D)(x) = G^D\mu(x)$ for $x \in B$, which shows that $\nu = \mu H_B^D$ has the stated property. Now consider any measure $\nu$ on $B$ with $G^D\mu = G^D\nu$ on $B$. Noting that $g^{D \setminus B}(x, \cdot) = 0$ on $B$ whereas $H_B^D(x, \cdot)$ is supported by $B$, we get by Lemma 24.13 for any $x \in D$

$$\begin{aligned}
G^D\nu(x) &= \int \nu(dz)\, g^D(z, x) = \int \nu(dz) \int g^D(z, y)\, H_B^D(x, dy) \\
&= \int H_B^D(x, dy)\, G^D\nu(y) = \int H_B^D(x, dy)\, G^D\mu(y).
\end{aligned}$$

Thus, $\mu$ determines $G^D\nu$ on $D$, and so $\nu$ is unique by Theorem 24.11. $\Box$
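The sweeping operation has a classical special case that is easy to verify numerically: in $D = \mathbb{R}^3$, sweeping a unit point charge at the origin onto the unit sphere produces the uniform surface measure (the hitting distribution of Brownian motion started at $0$), and by Newton's theorem the two potentials agree outside the sphere. The sketch below is my own illustration; it drops the constant normalization of the kernel $|x - y|^{-1}$, which cancels in the comparison, and reduces the spherical average to a one-dimensional integral in the polar angle:

```python
import math

def sphere_avg_inv_dist(r, n=200000):
    # Average of 1/|x - y| over y uniform on the unit sphere, with |x| = r;
    # by symmetry this is (1/2) * int_0^pi sin(t)/sqrt(r^2 - 2 r cos(t) + 1) dt.
    h = math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        total += math.sin(t) / math.sqrt(r * r - 2 * r * math.cos(t) + 1)
    return 0.5 * total * h

# Newton's theorem: outside the sphere the swept charge's potential matches
# that of the original point charge (1/r); inside it is constant (1).
print(abs(sphere_avg_inv_dist(2.5) - 1 / 2.5) < 1e-6)  # |x| = 2.5 > 1
print(abs(sphere_avg_inv_dist(0.4) - 1.0) < 1e-6)      # |x| = 0.4 < 1
```

The exact antiderivative gives $((r+1) - |r-1|)/2r$, i.e. $1/\max(r, 1)$, which is what the quadrature reproduces.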

Let us now turn to the classical equilibrium problem. For any $K \in \mathcal{K}_D$ we introduce the last exit or quitting time

$$\gamma_K^D = \sup\{t < \zeta_D;\; X_t \in K\},$$

and the associated quitting kernel

$$L_K^D(x, dy) = P_x\{\gamma_K^D > 0;\; X(\gamma_K^D) \in dy\}, \qquad x \in D.$$

Theorem 24.14 (equilibrium measure and quitting, Chung) For any Greenian domain $D \subset \mathbb{R}^d$ and subset $K \in \mathcal{K}_D$, there exists a measure $\mu_K^D$ on $\partial K$ such that

$$L_K^D(x, dy) = g^D(x, y)\, \mu_K^D(dy), \qquad x \in D. \tag{12}$$

Furthermore, $\mu_K^D$ is diffuse when $d \geq 2$, and if $K \in \hat{\mathcal{K}}_D$, then $\mu_K^D$ is the unique measure $\mu$ on $K$ satisfying $G^D\mu = 1$ on $K$.

Here $\mu_K^D$ is called the equilibrium measure of $K$ relative to $D$, and its total mass $C_K^D$ is called the capacity of $K$ in $D$. For an electrostatic interpretation, assume that a conductor $K$ with potential $1$ is inserted into a domain $D$ with grounded boundary. Then a charge distribution $\mu_K^D$ arises on the boundary of $K$.

Proof of Theorem 24.14: Write $\gamma = \gamma_K^D$, and define

$$l_\varepsilon(x) = \varepsilon^{-1}\, P_x\{0 < \gamma \leq \varepsilon\}, \qquad \varepsilon > 0.$$

Using Fubini's theorem, the simple Markov property, and dominated convergence as $\varepsilon \to 0$, we get for any $f \in C_b(D)$ and $x \in D$

$$\begin{aligned}
G^D(f l_\varepsilon)(x) &= E_x \int_0^\zeta f(X_t)\, l_\varepsilon(X_t)\, dt \\
&= \varepsilon^{-1} \int_0^\infty E_x\big[f(X_t)\, P_{X_t}\{0 < \gamma \leq \varepsilon\};\; t < \zeta\big]\, dt \\
&= \varepsilon^{-1} \int_0^\infty E_x[f(X_t);\; t < \gamma \leq t + \varepsilon]\, dt \\
&= \varepsilon^{-1}\, E_x \int_{(\gamma - \varepsilon)^+}^{\gamma} f(X_t)\, dt \\
&\to E_x[f(X_\gamma);\; \gamma > 0] = L_K^D f(x).
\end{aligned}$$


If $f$ has compact support, then for each $x$ we may replace $f$ by the bounded, continuous function $f / g^D(x, \cdot)$ to get as $\varepsilon \to 0$

$$\int f(y)\, l_\varepsilon(y)\, dy \;\to\; \int \frac{L_K^D(x, dy)\, f(y)}{g^D(x, y)}. \tag{13}$$

Since the left-hand side is independent of $x$, the same is true for the measure

$$\mu_K^D(dy) = \frac{L_K^D(x, dy)}{g^D(x, y)}. \tag{14}$$

If $d = 1$, we have $g^D(x, x) < \infty$, and (14) is trivially equivalent to (12). If instead $d \geq 2$, then singletons are polar, and so the measure $L_K^D(x, \cdot)$ is diffuse, which implies the same property for $\mu_K^D$. Thus, (12) and (14) are again equivalent. We may further conclude from the continuity of $X$ that $L_K^D(x, \cdot)$, and then also $\mu_K^D$, is supported by $\partial K$. Integrating (12) over $D$ yields

$$P_x\{\tau_K < \zeta_D\} = G^D\mu_K^D(x), \qquad x \in D,$$

and so for $K \in \hat{\mathcal{K}}_D$ we get $G^D\mu_K^D = 1$ on $K$. If $\nu$ is another measure on $K$ with $G^D\nu = 1$ on $K$, then $\nu = \mu_K^D$ by the uniqueness part of Theorem 24.12. $\Box$

The next result relates the equilibrium measures and capacities for different sets $K \in \hat{\mathcal{K}}_D$.

Proposition 24.15 (consistency) For any Greenian domain $D \subset \mathbb{R}^d$ and subsets $K \subset B$ in $\hat{\mathcal{K}}_D$, we have

$$\mu_K^D = \mu_B^D H_K^D = \mu_B^D L_K^D, \qquad C_K^D = \int P_x\{\tau_K < \zeta_D\}\, \mu_B^D(dx).$$

For functions $h$ on a class $\mathcal{U}$ of sets, closed under finite unions, we define the differences

$$\Delta_{U_1} h(U) = h(U \cup U_1) - h(U), \qquad \Delta_{U_1, \ldots, U_n} h(U) = \Delta_{U_1, \ldots, U_{n-1}}\big(\Delta_{U_n} h\big)(U),$$

where the difference $\Delta_{U_n}$ in the last formula is taken with respect to $U$. Note that the higher-order differences $\Delta_{U_1, \ldots, U_n}$ are invariant under permutations of $U_1, \ldots, U_n$. We say that $h$ is alternating or completely monotone if

$$(-1)^{n+1}\, \Delta_{U_1, \ldots, U_n} h(U) \geq 0, \qquad n \in \mathbb{N},\quad U, U_1, U_2, \ldots \in \mathcal{U}.$$
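The alternating property of hitting functions can be verified by brute force for a small random set. The sketch below is my own illustration (the two-point random set and the state space $\{0, \ldots, 5\}$ are arbitrary choices): it computes the hitting function $h(U) = P\{\varphi \cap U \neq \emptyset\}$ by enumeration and checks the sign condition above for differences up to order $3$:

```python
import random

S = range(6)
# A simple random closed set: phi = {X, X+1 mod 6}, X uniform on {0,...,5}.
omegas = [frozenset({x, (x + 1) % 6}) for x in S]

def h(U):
    # Hitting function h(U) = P{phi intersects U}, by enumeration.
    U = set(U)
    return sum(1 for w in omegas if w & U) / len(omegas)

def diff(hfun, Us, U):
    # Successive differences Delta_{U_1,...,U_n} h(U) (order-independent).
    if not Us:
        return hfun(U)
    return diff(hfun, Us[:-1], set(U) | set(Us[-1])) - diff(hfun, Us[:-1], U)

random.seed(1)
ok = True
for _ in range(200):
    n = random.randint(1, 3)
    Us = [set(random.sample(S, random.randint(1, 3))) for _ in range(n)]
    U = set(random.sample(S, random.randint(0, 3)))
    ok &= (-1) ** (n + 1) * diff(h, Us, U) >= -1e-12
print(ok)  # True: the hitting function is alternating
```

In fact $(-1)^{n+1}\Delta_{U_1,\ldots,U_n}h(U)$ equals the probability that $\varphi$ hits every $U_i$ but misses $U$, which is why the check succeeds for any random set.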

Corollary 24.16 (dependence on conductor, Choquet) For any Greenian domain $D \subset \mathbb{R}^d$, the capacity $C_K^D$ is an alternating function of $K \in \hat{\mathcal{K}}_D$. Furthermore, $\mu_{K_n}^D \xrightarrow{w} \mu_K^D$ as $K_n \downarrow K$ or $K_n \uparrow K$ in $\hat{\mathcal{K}}_D$.

Proof: Let $\psi$ denote the path of $X^\zeta$, regarded as a random closed set in $D$. Writing

$$h_x(K) = P_x\{\psi K \neq \emptyset\} = P_x\{\tau_K < \zeta\}, \qquad x \in D \setminus K,$$

we get by induction that each $h_x$ is alternating, and the first assertion follows by Proposition 24.15 with $K \subset B$.

To prove the last assertion, we note that trivially $\tau_{K_n} \downarrow \tau_K$ when $K_n \uparrow K$, and that $\tau_{K_n} \uparrow \tau_K$ when $K_n \downarrow K$ since the $K_n$ are closed. In the latter case we also note that $\bigcap_n \{\tau_{K_n} < \zeta\} = \{\tau_K < \zeta\}$ by compactness. Thus, in both cases $H_{K_n}^D(x, \cdot) \xrightarrow{w} H_K^D(x, \cdot)$ for all $x \in D \setminus \bigcup_n K_n$, and by dominated convergence in Proposition 24.15 with $B^\circ \supset \bigcup_n K_n$ we get $\mu_{K_n}^D \xrightarrow{w} \mu_K^D$. $\Box$

The next result solves an equilibrium problem involving two conductors.

Corollary 24.17 (condenser theorem) For any disjoint sets $B \in \hat{\mathcal{F}}_D$ and $K \in \hat{\mathcal{K}}_D$, there exists a unique signed measure $\nu$ on $B \cup K$ with $G^D\nu = 0$ on $B$ and $G^D\nu = 1$ on $K$, namely

$$\nu = \mu_K^{D \setminus B} - \mu_K^{D \setminus B} H_B^D.$$

Proof: Applying Theorem 24.14 to the domain $D \setminus B$ with subset $K$, we get $\nu = \mu_K^{D \setminus B}$ on $K$, and then $\nu = -\mu_K^{D \setminus B} H_B^D$ on $B$ by Theorem 24.12. $\Box$

The symmetry between hitting and quitting kernels in Proposition 24.15 may be extended to an invariance under time reversal of the whole process. More precisely, putting $\gamma = \gamma_K^D$, we may relate the stopped process $X_t^\gamma = X_{\gamma \wedge t}$ to its reversal $\tilde{X}_t = X_{(\gamma - t)^+}$. For convenience, we write


$$P_\mu = \int P_x\, \mu(dx),$$

and refer to the induced measures as distributions, even when $\mu$ is not normalized.

Theorem 24.18 (time reversal) Given a Greenian domain $D \subset \mathbb{R}^d$ and a set $K \in \hat{\mathcal{K}}_D$, put $\gamma = \gamma_K^D$ and $\mu = \mu_K^D$. Then $\tilde{X} \stackrel{d}{=} X^\gamma$ under $P_\mu$.

Proof: Let $\tilde{P}_x$ and $\tilde{E}_x$ refer to the process $X^\zeta$. Fix any times $0 = t_0 < t_1 < \cdots < t_n$, and write $s_k = t_n - t_k$ and $h_k = t_k - t_{k-1}$. For any continuous functions $f_0, \ldots, f_n$ with compact supports in $D$, we define

Comparing with (17), we see that $X^\gamma$ and $\tilde{X}$ have the same finite-dimensional distributions. $\Box$

We may now extend Proposition 24.15 to the case of possibly different Greenian domains $D \subset D'$. Fixing any $K \in \mathcal{K}_D$, we recursively define the optional times

$$\tau_j = \gamma_{j-1} + \tau_K \circ \theta_{\gamma_{j-1}}, \qquad \gamma_j = \tau_j + \gamma_K^D \circ \theta_{\tau_j}, \qquad j \geq 1,$$

starting with $\gamma_0 = 0$. In other words, $\tau_k$ and $\gamma_k$ are the times of hitting or quitting $K$ during the $k$th excursion in $D$ that reaches $K$, prior to the exit

time $\zeta_{D'}$. The generalized hitting and quitting kernels are given by

$$H_K^{D,D'}(x, \cdot) = E_x \sum\nolimits_k \delta_{X(\tau_k)}, \qquad L_K^{D,D'}(x, \cdot) = E_x \sum\nolimits_k \delta_{X(\gamma_k)},$$

where the summations extend over all $k \in \mathbb{N}$ with $\tau_k < \infty$.

Theorem 24.19 (extended consistency relations) Let $D \subset D'$ be Greenian domains in $\mathbb{R}^d$ with regular compact subsets $K \subset K'$. Then

$$\mu_K^D = \mu_{K'}^{D'} H_K^{D,D'} = \mu_{K'}^{D'} L_K^{D,D'}. \tag{18}$$

Proof: Define $l_\varepsilon(x) = \varepsilon^{-1} P_x\{\gamma_K^D \in (0, \varepsilon]\}$. Proceeding as in the proof of Theorem 24.14, we get for any $x \in D'$ and $f \in C_b(D')$

$$G^{D'}(f l_\varepsilon)(x) = \varepsilon^{-1}\, E_x \int_0^{\zeta_{D'}} f(X_t)\, 1\{\gamma_K^D \circ \theta_t \in (0, \varepsilon]\}\, dt \;\to\; L_K^{D,D'} f(x).$$

If $f$ has compact support in $D$, we may conclude as before that

$$\int f(y)\, \mu_K^D(dy) \;\leftarrow\; \int (f l_\varepsilon)(y)\, dy \;\to\; \int \frac{L_K^{D,D'}(x, dy)\, f(y)}{g^{D'}(x, y)},$$

and so

$$L_K^{D,D'}(x, dy) = g^{D'}(x, y)\, \mu_K^D(dy).$$

Integrating with respect to $\mu_{K'}^{D'}$, and noting that $G^{D'}\mu_{K'}^{D'} = 1$ on $K' \supset K$, we obtain the second expression for $\mu_K^D$ in (18).

To deduce the first expression, we note that $H_{K'}^{D'} H_K^{D,D'} = H_K^{D,D'}$ by the strong Markov property at $\tau_{K'}$. Combining with the second expression in (18) and using Theorem 24.18 and Proposition 24.15, we get

$$\mu_K^D = \mu_{K'}^{D'} L_K^{D,D'} = \mu_{K'}^{D'} H_K^{D,D'}. \qquad \Box$$

The last result enables us to study the equilibrium measure $\mu_K^D$ and capacity $C_K^D$ as functions of both $D$ and $K$. In particular, we obtain the following continuity and monotonicity properties.

Corollary 24.20 (dependence on domain) For any regular, compact set $K \subset \mathbb{R}^d$, the measure $\mu_K^D$ is nonincreasing and continuous from above, as a function of the Greenian domain $D \supset K$.

Proof: The monotonicity is clear from (18) with $K = K'$, since $H_K^{D,D'}(x, \cdot) \geq \delta_x$ for $x \in K \subset D \subset D'$. It remains to prove that $C_K^D$ is continuous from above and below in $D$ for fixed $K$. By dominated convergence it is then enough to show that $\kappa_K^{D_n} \to \kappa_K^D$, where $\kappa_K^D = \sup\{j;\, \tau_j < \infty\}$ is the number of $D$-excursions hitting $K$. Assuming $D_n \uparrow D$, we need to show that if $X_s, X_t \in K$ and $X \in D$ on $[s, t]$, then $X \in D_n$ on $[s, t]$ for sufficiently large $n$. But this is clear from the compactness of the path on the interval $[s, t]$. If instead $D_n \downarrow D$, we need to show for any $r < s < t$ with $X_r, X_t \in K$ and $X_s \notin D$ that $X_s \notin D_n$ for sufficiently large $n$. But this is obvious. $\Box$


We proceed to show how Green capacities can be expressed in terms of random sets. Let $\chi$ denote the identity mapping on $\mathcal{F}_D$. Given any measure $\nu$ on $\mathcal{F}_D \setminus \{\emptyset\}$ with $\nu\{\chi K \neq \emptyset\} < \infty$ for all $K \in \mathcal{K}_D$, we may introduce a Poisson process $\eta$ on $\mathcal{F}_D \setminus \{\emptyset\}$ with intensity measure $\nu$ and form the associated random closed set $\varphi = \bigcup\{F;\, \eta\{F\} > 0\}$ in $D$. Letting $\pi_\nu$ denote the distribution of $\varphi$, we note that

$$\pi_\nu\{\chi K = \emptyset\} = P\{\eta\{\chi K \neq \emptyset\} = 0\} = \exp(-\nu\{\chi K \neq \emptyset\}), \qquad K \in \mathcal{K}_D.$$

Theorem 24.21 (Green capacities and random sets, Choquet) For any Greenian domain $D \subset \mathbb{R}^d$, there exists a unique measure $\nu$ on $\mathcal{F}_D \setminus \{\emptyset\}$ such that

$$C_K^D = \nu\{\chi K \neq \emptyset\} = -\log \pi_\nu\{\chi K = \emptyset\}, \qquad K \in \hat{\mathcal{K}}_D.$$

Proof: Let $\psi$ denote the path of $X^\zeta$ in $D$. Choose sets $K_n \uparrow D$ in $\hat{\mathcal{K}}_D$ with $K_n \subset K_{n+1}^\circ$ for all $n$, and put $\mu_n = \mu_{K_n}^D$, $\psi_n = \psi \cap K_n$, and $\chi_n = \chi \cap K_n$. Define

$$\nu_n^p = \int P_x\{\psi_p \in \cdot\,,\; \psi_n \neq \emptyset\}\, \mu_p(dx), \qquad n \leq p, \tag{19}$$

and conclude by the strong Markov property and Proposition 24.15 that

(20)

By Corollary 6.15 there exist some measures $\nu_n$ on $\mathcal{F}_D$, $n \in \mathbb{N}$, satisfying

(21)

and from (20) we note that

$$\nu_n\{\cdot\,,\; \chi_m \neq \emptyset\} = \nu_m, \qquad m \leq n. \tag{22}$$

Hence, the measures $\nu_n$ agree on $\{\chi_m \neq \emptyset\}$ for $n \geq m$, and so we may define $\nu = \sup_n \nu_n$. By (22) we have $\nu\{\cdot\,,\; \chi_n \neq \emptyset\} = \nu_n$ for all $n$. Assuming $K \in \hat{\mathcal{K}}_D$ with $K \subset K_n^\circ$, we conclude from (19), (21), and Proposition 24.15 that

$$\nu\{\chi K \neq \emptyset\} = \nu_n\{\chi K \neq \emptyset\} = \nu_n^n\{\chi K \neq \emptyset\} = \int P_x\{\psi_n K \neq \emptyset\}\, \mu_n(dx) = \int P_x\{\tau_K < \zeta\}\, \mu_n(dx) = C_K^D.$$

The uniqueness of $\nu$ is clear by a monotone class argument. $\Box$

The representation of capacities in terms of random sets will now be extended to the abstract setting of alternating set functions. As in Chapter 16, we may then fix an lcscH space $S$ with Borel $\sigma$-field $\mathcal{S}$, open sets $\mathcal{G}$, closed sets $\mathcal{F}$, and compacts $\mathcal{K}$. Write $\hat{\mathcal{S}} = \{B \in \mathcal{S};\, \bar{B} \in \mathcal{K}\}$, and recall that a class $\mathcal{U} \subset \hat{\mathcal{S}}$ is said to be separating if for any $K \in \mathcal{K}$ and $G \in \mathcal{G}$ with $K \subset G$ there exists some $U \in \mathcal{U}$ with $K \subset U \subset G$.

For any nondecreasing function $h$ on a separating class $\mathcal{U} \subset \hat{\mathcal{S}}$, we define the associated inner and outer capacities $h^\circ$ and $h_0$ by

$$h^\circ(G) = \sup\{h(U);\; U \in \mathcal{U},\; U \subset G\}, \qquad G \in \mathcal{G},$$

$$h_0(K) = \inf\{h(U);\; U \in \mathcal{U},\; U^\circ \supset K\}, \qquad K \in \mathcal{K}.$$

Note that the formulas remain valid with $\mathcal{U}$ replaced by any separating subclass. For any random closed set $\varphi$ in $S$, the associated hitting function $h$ is given by $h(B) = P\{\varphi B \neq \emptyset\}$ for all $B \in \hat{\mathcal{S}}$.

Theorem 24.22 (alternating functions and random sets, Choquet) The hitting function $h$ of a random closed set in $S$ is alternating with $h_0 = h$ on $\mathcal{K}$ and $h^\circ = h$ on $\mathcal{G}$. Conversely, given a separating class $\mathcal{U} \subset \hat{\mathcal{S}}$, closed under finite unions, and an alternating function $p: \mathcal{U} \to [0, 1]$ with $p(\emptyset) = 0$, there exists a random closed set with hitting function $h$ such that $h_0 = p_0$ on $\mathcal{K}$ and $h^\circ = p^\circ$ on $\mathcal{G}$.

The algebraic part of the construction is clarified by the following lemma.

Lemma 24.23 (discrete case) Assume $\mathcal{U} \subset \hat{\mathcal{S}}$ to be finite and closed under unions, and let $h: \mathcal{U} \to [0, 1]$ be alternating with $h(\emptyset) = 0$. Then there exists a point process $\xi$ on $S$ such that $P\{\xi U > 0\} = h(U)$ for all $U \in \mathcal{U}$.

Proof: The statement is obvious when $\mathcal{U} = \{\emptyset\}$. Proceeding by induction, assume the assertion to be true when $\mathcal{U}$ is generated by up to $n - 1$ sets, and consider a class $\mathcal{U}$ generated by $n$ nonempty sets $B_1, \ldots, B_n$. By scaling we may assume that $h(B_1 \cup \cdots \cup B_n) = 1$. For each $j \in \{1, \ldots, n\}$, let $\mathcal{U}_j$ be the class of unions formed by the sets $B_i \setminus B_j$, $i \neq j$, and define

$$h_j(U) = \Delta_U h(B_j) = h(B_j \cup U) - h(B_j), \qquad U \in \mathcal{U}_j.$$

Then each $h_j$ is again alternating with $h_j(\emptyset) = 0$, and so the induction hypothesis ensures the existence of some point process $\xi_j$ on $\bigcup_i B_i \setminus B_j$ with hitting function $h_j$. Note that $h_j$ remains the hitting function of $\xi_j$ on all of $\mathcal{U}$. Let us further introduce a point process $\xi_{n+1}$ with

For $1 \leq j \leq n + 1$, let $\nu_j$ denote the restriction of $\mathcal{L}(\xi_j)$ to the set $A_j = \bigcap_{i < j} \{\chi B_i = 0\}$, and put $\nu = \sum_j \nu_j$. We may take $\xi$ to be the canonical point process on $S$ with distribution $\nu$. To see that $\xi$ has hitting function $h$, we note that for any $U \in \mathcal{U}$ and $j \leq n$,


It remains to show that, for any $U \in \mathcal{U} \setminus \{\emptyset\}$,

$$\sum\nolimits_{j \leq n} (-1)^{j+1}\, \Delta_{B_1, \ldots, B_{j-1}, U}\, h(B_j) + (-1)^n\, \Delta_{B_1, \ldots, B_n, U}\, h(\emptyset) = h(U).$$

This is clear from the fact that

$$\Delta_{B_1, \ldots, B_{j-1}, U}\, h(B_j) = \Delta_{B_1, \ldots, B_j, U}\, h(\emptyset) + \Delta_{B_1, \ldots, B_{j-1}, U}\, h(\emptyset). \qquad \Box$$

Proof of Theorem 24.22: The direct assertion can be proved in the same way as Corollary 24.16. Conversely, let $\mathcal{U}$ and $p$ be as stated. By Lemma A2.7 we may assume $\mathcal{U}$ to be countable, say $\mathcal{U} = \{U_1, U_2, \ldots\}$. For each $n$, let $\mathcal{U}_n$ be the class of unions formed from $U_1, \ldots, U_n$. By Lemma 24.23 there exist some point processes $\xi_1, \xi_2, \ldots$ on $S$ such that

$$P\{\xi_n U > 0\} = p(U), \qquad U \in \mathcal{U}_n,\; n \in \mathbb{N}.$$

The space $\mathcal{F}$ is compact by Theorem A2.5, and so by Theorem 16.3 there exists some random closed set $\varphi$ such that, along a subsequence, $P\{\varphi U \neq \emptyset\} = \lim_n P\{\xi_n U > 0\}$ for all $n$.

With any optional time $\tau$ we may associate the $\sigma$-field $\mathcal{F}_{\tau-}$, generated by $\mathcal{F}_0$ and the classes $\mathcal{F}_t \cap \{\tau > t\}$ for arbitrary $t > 0$. The following result gives the basic properties of the $\sigma$-fields $\mathcal{F}_{\tau-}$. It is interesting to note the similarity with the results for the $\sigma$-fields $\mathcal{F}_\tau$ in Lemma 7.1.

Lemma 25.2 (strict past) For any optional times $\sigma$ and $\tau$, we have

(i) $\mathcal{F}_\sigma \cap \{\sigma < \tau\} \subset \mathcal{F}_{\tau-}$. If $t > 0$ and $A \in \mathcal{F}_t$, then clearly

$$\{X_\tau 1\{\tau < \infty\} = 1\} = A \cap \{t < \tau < \infty\} \in \mathcal{F}_{\tau-}.$$

We may now extend by a monotone class argument and subsequent approximation, first to arbitrary predictable indicator functions, and then to the general case.

(ii) We may clearly assume $\alpha$ to be integrable. Fixing an announcing sequence $(\tau_n)$ for $\tau$, we define

$$X_t^n = E[\alpha \mid \mathcal{F}_{\tau_n}]\big(1\{0 < \tau_n < t\} + 1\{\tau_n = 0\}\big), \qquad t \geq 0.$$

Then each $X^n$ is left-continuous and adapted, hence predictable. Moreover, $X^n \to X$ on $\mathbb{R}_+$ a.s. by Theorem 7.23 and Lemma 25.2 (iii). $\Box$

By a totally inaccessible time we mean an optional time $\tau$ such that $P\{\sigma = \tau < \infty\} = 0$ for every predictable time $\sigma$. An accessible time may then be defined as an optional time $\tau$ such that $P\{\sigma = \tau < \infty\} = 0$ for every totally inaccessible time $\sigma$. For any random time $\tau$, we introduce the associated graph

$$[\tau] = \{(t, \omega) \in \mathbb{R}_+ \times \Omega;\; \tau(\omega) = t\},$$

which allows us to express the previous condition on $\sigma$ and $\tau$ as $[\sigma] \cap [\tau] = \emptyset$ a.s. Given any optional time $\tau$ and set $A \in \mathcal{F}_\tau$, the time $\tau_A = \tau 1_A + \infty \cdot 1_{A^c}$ is again optional and is called the restriction of $\tau$ to $A$.

We now consider a basic decomposition of optional times. Related decompositions of increasing processes and martingales are given in Propositions 25.17 and 26.16.

25. Predictability, Compensation, and Excessive Functions

493

Proposition 25.4 (decomposition of optional times) For any optional time $\tau$ there exists an a.s. unique set $A \in \mathcal{F}_\tau \cap \{\tau < \infty\}$ such that $\tau_A$ is accessible and $\tau_{A^c}$ is totally inaccessible. Furthermore, there exist some predictable times $\tau_1, \tau_2, \ldots$ with $[\tau_A] \subset \bigcup_n [\tau_n]$ a.s.

Proof: Define

$$p = \sup P \bigcup\nolimits_n \{\tau = \tau_n < \infty\}, \tag{1}$$

where the supremum extends over all sequences of predictable times $\tau_n$. Combining sequences such that the probability in (1) approaches $p$, we may construct a sequence $(\tau_n)$ for which the supremum is attained. For such a maximal sequence, we define $A$ as the union in (1). To see that $\tau_A$ is accessible, let $\sigma$ be totally inaccessible. Then $[\sigma] \cap [\tau_n] = \emptyset$ a.s. for every $n$, and so $[\sigma] \cap [\tau_A] = \emptyset$ a.s. If $\tau_{A^c}$ is not totally inaccessible, then $P\{\tau_{A^c} = \tau_0 < \infty\} > 0$ for some predictable time $\tau_0$, which contradicts the maximality of $\tau_1, \tau_2, \ldots$. This shows that $A$ has the desired property.

To prove that $A$ is a.s. unique, let $B$ be another set with the stated properties. Then $\tau_{A \setminus B}$ and $\tau_{B \setminus A}$ are both accessible and totally inaccessible, and so $\tau_{A \setminus B} = \tau_{B \setminus A} = \infty$ a.s., which implies $A = B$ a.s. $\Box$

We proceed to establish a version of the celebrated Doob-Meyer decomposition, a cornerstone of modern probability theory. By an increasing process we mean a nondecreasing, right-continuous, and adapted process $A$ with $A_0 = 0$. We say that $A$ is integrable if $EA_\infty < \infty$. Recall that all submartingales are assumed to be right-continuous. Local submartingales and locally integrable processes are defined by localization in the usual way.

Theorem 25.5 (decomposition of submartingales, Meyer, Doléans) A process $X$ is a local submartingale iff it has a decomposition $X = M + A$, where $M$ is a local martingale and $A$ is a locally integrable, increasing, predictable process. In that case, $M$ and $A$ are a.s. unique.

The process $A$ in the statement is often referred to as the compensator of $X$, especially when $X$ is increasing. Several proofs of this result are known, most of which seem to require the deep section theorems. Here we give a relatively short and elementary proof, based on Dunford's weak compactness criterion and an approximation of totally inaccessible times. For convenience, we divide the proof into several lemmas.

Let (D) denote the class of measurable processes $X$ such that the family $\{X_\tau\}$ is uniformly integrable, where $\tau$ ranges over the set of all finite optional times. By the following result, it is enough to consider class (D) submartingales.
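In discrete time the compensator is elementary: $A_n = \sum_{k \leq n} E[X_k - X_{k-1} \mid \mathcal{F}_{k-1}]$ is predictable and increasing, and $M = X - A$ is a martingale. The simulation below is my own sketch, not the book's construction: for the submartingale $X_n = S_n^2$ of a simple random walk the compensator is $A_n = n$ in closed form, so the martingale part should average to zero:

```python
import random

random.seed(0)

def martingale_part(n_steps=100):
    # For a +-1 random walk S, X_n = S_n^2 is a submartingale with
    # E[X_k - X_{k-1} | F_{k-1}] = E[2 S_{k-1} e_k + 1 | F_{k-1}] = 1,
    # so the predictable compensator is A_n = n and M = X - A is a
    # martingale with M_0 = 0.
    s = 0
    for _ in range(n_steps):
        s += random.choice((-1, 1))
    x = s * s
    a = n_steps          # compensator, known in closed form here
    return x - a         # martingale part, mean 0

# Average of the martingale part over many runs should be near 0.
runs = [martingale_part(100) for _ in range(5000)]
avg = sum(runs) / len(runs)
print(abs(avg) < 15)  # True: E[S_100^2] = 100, so M has mean 0
```

The tolerance reflects the Monte Carlo standard error (the variance of $S_n^2 - n$ is $2n^2 - 2n$).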

Lemma 25.6 (uniform integrability) Any local submartingale $X$ with $X_0 = 0$ is locally of class (D).

Proof: First reduce to the case when $X$ is a true submartingale. Then introduce for each $n$ the optional time $\tau = n \wedge \inf\{t > 0;\; |X_t| > n\}$. Here


$|X^\tau| \leq n \vee |X_\tau|$, which is integrable by Theorem 7.29, and so $X^\tau$ is of class (D). $\Box$

An increasing process $A$ is said to be natural if it is integrable and such that $E \int_0^\infty \Delta M_t\, dA_t = 0$ for any bounded martingale $M$. As a crucial step in the proof of Theorem 25.5, we may establish the following preliminary decomposition, where the compensator $A$ is shown to be natural rather than predictable.

Lemma 25.7 (Meyer) Any submartingale X of class (D) has a decomposition X = M + A, where M is a uniformly integrable martingale and A is a natural, increasing process.

Proof (Rao): We may assume that X_0 = 0. Introduce the n-dyadic times t_k^n = k2^{−n}, k ∈ Z_+, and define for any process Y the associated differences Δ_k^n Y = Y_{t_{k+1}^n} − Y_{t_k^n}. Let

  A_t^n = Σ_{k < 2^n t} E[Δ_k^n X | F_{t_k^n}],  M^n = X − A^n,

so that M^n is a martingale along the n-dyadic times. Writing τ = τ_r^n = inf{t ≥ 0; A_{t+2^{−n}}^n > r} for n ∈ N and r > 0, which is optional since A^n is predictable along the n-dyadic times, we get by optional sampling, for any n-dyadic time t,

  E[A_t^n − r; A_t^n > 2r] ≤ E[A_t^n − A_{τ∧t}^n] = E[X_t − X_{τ∧t}] = E[X_t − X_{τ∧t}; A_t^n > r].  (2)

By the martingale property and uniform integrability, we further obtain

  r P{A_t^n > r} ≤ E A_t^n = E X_t ≤ 1,

and so the probability on the left tends to zero as r → ∞, uniformly in t and n. Since the random variables X_t − X_{τ∧t} are uniformly integrable by (D), the same property holds for the variables A_t^n by (2) and Lemma 4.10. In particular, the sequence (A_∞^n) is uniformly integrable, and each M^n is a uniformly integrable martingale. By Lemma 4.13 there exists some random variable α ∈ L^1(F_∞) such that A_∞^n → α weakly in L^1 along some subsequence N' ⊂ N. Define M_t = E[X_∞ − α | F_t] and A = X − M, and note that A_∞ = α a.s. by Theorem 7.23. For any dyadic t and bounded random variable ξ, we get by the martingale and self-adjointness properties

  E(A_t^n − A_t)ξ = E(M_t − M_t^n)ξ = E E[M_∞ − M_∞^n | F_t]ξ = E(M_∞ − M_∞^n)E[ξ | F_t] = E(A_∞^n − α)E[ξ | F_t] → 0,

as n → ∞ along N'. Thus, A_t^n → A_t weakly in L^1 for dyadic t. In particular, we get for any dyadic s < t

  0 ≤ E[A_t^n − A_s^n; A_t − A_s < 0] → E[(A_t − A_s) ∧ 0] ≤ 0.

25. Predictability, Compensation, and Excessive Functions


Thus, the last expectation vanishes, and therefore A_t ≥ A_s a.s. By right-continuity it follows that A is a.s. nondecreasing. Also note that A_0 = 0 a.s. since A_0^n = 0 for all n. To see that A is natural, consider any bounded martingale N, and conclude by Fubini's theorem and the martingale properties of N and A^n − A = M − M^n that

  Σ_k E N_∞ Δ_k^n A^n = Σ_k E N_{t_k^n} Δ_k^n A^n = Σ_k E N_{t_k^n} Δ_k^n A = E Σ_k N_{t_k^n} Δ_k^n A.

Now use weak convergence on the left and dominated convergence on the right, and combine with Fubini's theorem and the martingale property of N, to get E N_∞ A_∞ = E ∫_0^∞ N_{t−} dA_t. Hence, E ∫_0^∞ ΔN_t dA_t = 0, as required. □
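The discrete skeleton of this construction is easy to make concrete. The following sketch (our own illustration, not part of the text) takes the submartingale X_k = S_k² built from a simple random walk S, computes the predictable increments E[Δ_k X | F_k] by exact averaging over the two equally likely next steps, and confirms that the resulting compensator is A_k = k, so that M = X − A is a martingale.

```python
from itertools import product

# Simple random walk S_k = x_1 + ... + x_k with steps +/-1, each prob. 1/2.
# For the submartingale X_k = S_k**2, the predictable increment is
# E[X_{k+1} - X_k | F_k] = E[2*S_k*x + 1 | F_k] = 1, so the compensator is A_k = k.
n = 6
for path in product([-1, 1], repeat=n):
    S = 0
    for step in path:
        # conditional expectation of X_{k+1} - X_k given the path so far,
        # averaging over the two equally likely continuations
        incr = sum(((S + e) ** 2 - S ** 2) for e in (-1, 1)) / 2
        assert incr == 1  # so A_k = k along every path
        S += step
# Hence M_k = S_k**2 - k satisfies E[M_{k+1} | F_k] = M_k: a martingale.
```

The continuous-time proof above replaces this elementary averaging by the dyadic sums A^n and a weak L^1 compactness argument.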

To complete the proof of Theorem 25.5, it remains to show that the compensator A in the last lemma is predictable. This will be inferred from the following ingenious approximation of totally inaccessible times.

Lemma 25.8 (uniform approximation, Doob) For any totally inaccessible time τ, put τ_n = 2^{−n}[2^n τ], and let X^n be a right-continuous version of the process P[τ_n ≤ t | F_t]. Then

  lim_{n→∞} sup_{t≥0} |X_t^n − 1{τ ≤ t}| = 0  a.s.  (3)

Proof: Since τ_n ↑ τ, we may assume that X_t^1 ≥ X_t^2 ≥ ⋯ ≥ 1{τ ≤ t} for all t ≥ 0. Then X_t^n = 1 for t ∈ [τ, ∞), and on the set {τ = ∞} we have X_t^n ≤ P[τ < ∞ | F_t] → 0 a.s. as t → ∞ by Theorem 7.23. Thus, sup_n |X_t^n − 1{τ ≤ t}| → 0 a.s. as t → ∞. To prove (3), it is then enough to show for every ε > 0 that the optional times

  σ_n = inf{t ≥ 0; X_t^n − 1{τ ≤ t} > ε},  n ∈ N,

tend a.s. to infinity. The σ_n are clearly nondecreasing, and we denote their limit by σ. Note that either σ_n ≤ τ or σ_n = ∞ for each n. By optional sampling, Theorem 6.4, and Lemma 7.1, we have

  X_σ^n 1{σ < ∞} = P[τ_n ≤ σ < ∞ | F_σ] → P[τ ≤ σ < ∞ | F_σ] = 1{τ ≤ σ < ∞}.

Hence, X_σ^n → 1{τ ≤ σ} a.s. on {σ < ∞}, and so by right-continuity we have on this set σ_n < σ for large enough n. Thus, σ is predictable and announced by the times σ_n ∧ n.


Next apply the optional sampling and disintegration theorems to the optional times σ_n, to obtain

  ε P{σ < ∞} ≤ ε P{σ_n < ∞} ≤ E[X_{σ_n}^n; σ_n < ∞] = P{τ_n ≤ σ_n < ∞} = P{τ_n ≤ σ_n ≤ τ < ∞} → P{τ = σ < ∞} = 0,

where the last equality holds since τ is totally inaccessible. Thus, σ = ∞ a.s. □

It is now easy to see that A has only accessible jumps.

Lemma 25.9 (accessibility) For any natural increasing process A and totally inaccessible time τ, we have ΔA_τ = 0 a.s. on {τ < ∞}.

Proof: Rescaling if necessary, we may assume that A is a.s. continuous at dyadic times. Define τ_n = 2^{−n}[2^n τ]. Since A is natural, we have

  E ∫_0^∞ P[τ_n > t | F_t] dA_t = E ∫_0^∞ P[τ_n > t | F_{t−}] dA_t,

and since τ is totally inaccessible, it follows by Lemma 25.8 that

  E A_{τ−} = E ∫_0^∞ 1{τ > t} dA_t = E ∫_0^∞ 1{τ ≥ t} dA_t = E A_τ.

Hence, E[ΔA_τ; τ < ∞] = 0, and so ΔA_τ = 0 a.s. on {τ < ∞}. □

Finally, we need to show that A is predictable.

Lemma 25.10 (Doléans) Every natural increasing process is predictable.

Proof: Fix a natural increasing process A. Consider a bounded martingale M and a predictable time τ < ∞ announced by σ_1, σ_2, …. Then M^τ − M^{σ_k} is again a bounded martingale, and since A is natural, we get by dominated convergence E ΔM_τ ΔA_τ = 0. In particular, we may take M_t = P[B | F_t] with B ∈ F_τ. By optional sampling we have M_τ = 1_B and

  M_{σ_k} = P[B | F_{σ_k}] → P[B | F_{τ−}].

Thus, ΔM_τ = 1_B − P[B | F_{τ−}], and so

  E[ΔA_τ; B] = E ΔA_τ P[B | F_{τ−}] = E[E[ΔA_τ | F_{τ−}]; B].

Since B was arbitrary in F_τ, we get ΔA_τ = E[ΔA_τ | F_{τ−}] a.s., and so the process A_t^1 = ΔA_τ 1{τ ≤ t} is predictable by Lemma 25.3 (ii). It is also natural, since for any bounded martingale M

  E ΔA_τ ΔM_τ = E ΔA_τ E[ΔM_τ | F_{τ−}] = 0.

By an elementary construction we have {t > 0; ΔA_t > 0} ⊂ ⋃_n [τ_n] a.s. for some optional times τ_n < ∞, and by Proposition 25.4 and Lemma 25.9 we may assume the latter to be predictable. Taking τ = τ_1 in the previous argument, we may conclude that the process A_t^1 = ΔA_{τ_1} 1{τ_1 ≤ t} is both


natural and predictable. Repeating the argument for the process A − A^1 with τ = τ_2 and proceeding by induction, we may conclude that the jump component A^d of A is predictable. Since A − A^d is continuous and hence predictable, the predictability of A follows. □

For the uniqueness assertion we need the following extension of Proposition 17.2.

Lemma 25.11 (constancy criterion) A process M is a predictable martingale of integrable variation iff M_t ≡ M_0 a.s.

Proof: On the predictable σ-field P we define the signed measure

  μB = E ∫_0^∞ 1_B(t) dM_t,  B ∈ P,

where the inner integral is an ordinary Lebesgue–Stieltjes integral. The martingale property implies that μ vanishes for sets B of the form F × (t, ∞) with F ∈ F_t. By Lemma 25.1 and a monotone class argument it follows that μ = 0 on P. Since M is predictable, the same thing is true for the process ΔM_t = M_t − M_{t−}, and then also for the sets J_± = {t > 0; ±ΔM_t > 0}. Thus, μJ_± = 0, and so ΔM = 0 a.s., which means that M is a.s. continuous. But then M_t ≡ M_0 a.s. by Proposition 17.2. □

Proof of Theorem 25.5: The sufficiency is obvious, and the uniqueness holds by Lemma 25.11. It remains to prove that any local submartingale X has the stated decomposition. By Lemmas 25.6 and 25.11 we may assume that X is of class (D). Then Lemma 25.7 shows that X = M + A for some uniformly integrable martingale M and some natural increasing process A, and by Lemma 25.10 the latter process is predictable. □

The two conditions in Lemma 25.10 are, in fact, equivalent.

Theorem 25.12 (natural and predictable processes, Doléans) An integrable, increasing process is natural iff it is predictable.

Proof: If an integrable, increasing process A is natural, it is also predictable by Lemma 25.10. Now assume instead that A is predictable. By Lemma 25.7 we have A = M + B for some uniformly integrable martingale M and some natural increasing process B, and Lemma 25.10 shows that B is predictable. But then A = B a.s. by Lemma 25.11, and so A is natural. □

The following useful result is essentially implicit in earlier proofs.


Lemma 25.13 (dual predictable projection) Let X and Y be locally integrable, increasing processes, and assume that Y is predictable. Then X has compensator Y iff E ∫ V dX = E ∫ V dY for every predictable process V ≥ 0.

Proof: First reduce by localization to the case when X and Y are integrable. Then Y is the compensator of X iff M = Y − X is a martingale or, equivalently, iff E M_τ = 0 for every optional time τ. This is equivalent to the stated relation for V = 1_{[0,τ]}, and the general result follows by a straightforward monotone class argument. □

We may now establish the fundamental connection between predictable times and processes.
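Before moving on, a quick sanity check of Lemma 25.13's identity (a simulation sketch of ours, with arbitrary parameters): take X = N a unit-rate Poisson process on [0, T] with compensator Y_t = t, and the deterministic, hence predictable, integrand V_t = t². The lemma then asserts E ∫ V dN = E ∫ V dY = T³/3, which is easy to confirm by Monte Carlo.

```python
import random

random.seed(0)
T, n_sims = 2.0, 200_000
total = 0.0
for _ in range(n_sims):
    # unit-rate Poisson arrivals on [0, T] via exponential gaps
    t = random.expovariate(1.0)
    while t < T:
        total += t ** 2          # accumulate the integral of V_t = t**2 against dN
        t += random.expovariate(1.0)
estimate = total / n_sims
exact = T ** 3 / 3               # E int V dY for the compensator Y_t = t
assert abs(estimate - exact) < 0.05
```

The same identity with general predictable V is what makes the compensator the "dual predictable projection" of X.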

Theorem 25.14 (predictable times and processes, Meyer) For any optional time τ, these conditions are equivalent:

(i) τ is predictable;

(ii) the process 1{τ ≤ t} is predictable;

(iii) E ΔM_τ = 0 for any bounded martingale M.

Proof (Chung and Walsh): Since (i) ⇒ (ii) by Lemma 25.3 (ii), and (ii) ⇔ (iii) by Theorem 25.12, it remains to show that (iii) ⇒ (i). We then introduce the martingale M_t = E[e^{−τ} | F_t] and the supermartingale

  X_t = e^{−τ∧t} − M_t = E[e^{−τ∧t} − e^{−τ} | F_t] ≥ 0,  t ≥ 0.

Here X_τ = 0 a.s. by optional sampling. Letting σ = inf{t ≥ 0; X_{t−} ∧ X_t = 0}, we see from Lemma 7.31 that {t ≥ 0; X_t = 0} = [σ, ∞) a.s., and in particular σ ≤ τ a.s. Using optional sampling again, we get E(e^{−σ} − e^{−τ}) = E X_σ = 0, and so σ = τ a.s. Hence, X_t ∧ X_{t−} > 0 a.s. on [0, τ). Finally, (iii) yields

  E X_{τ−} = E(e^{−τ} − M_{τ−}) = E(e^{−τ} − M_τ) = E X_τ = 0,

and so X_{τ−} = 0. It is now clear that τ is announced by the optional times τ_n = inf{t; X_t < n^{−1}}. □

To illustrate the power of the last result, we may give a short proof of the following useful statement, which can also be proved directly.

Corollary 25.15 (restriction) For any predictable time τ and set A ∈ F_{τ−}, the restriction τ_A is again predictable.

Proof: The process 1_A 1{τ ≤ t} = 1{τ_A ≤ t} is predictable by Lemma 25.3, and so the time τ_A is predictable by Theorem 25.14. □

We may also use the last theorem to show that predictable martingales are continuous.


Proposition 25.16 (predictable martingales) A local martingale is predictable iff it is a.s. continuous.

Proof: The sufficiency is clear by the definitions. To prove the necessity, we note that, for any optional time τ,

  M_t^τ = M_t 1_{[0,τ]}(t) + M_τ 1_{(τ,∞)}(t),  t ≥ 0.

Thus, predictability is preserved by optional stopping, and so we may assume that M is a uniformly integrable martingale. Now fix any ε > 0, and introduce the optional time τ = inf{t > 0; |ΔM_t| > ε}. Since the left-continuous version M_{t−} is predictable, so is the process ΔM_t, as well as the random set A = {t > 0; |ΔM_t| > ε}. Hence, the same thing is true for the random interval [τ, ∞) = A ∪ (τ, ∞), and therefore τ is predictable by Theorem 25.14. Choosing an announcing sequence (τ_n), we conclude by optional sampling, martingale convergence, and Lemmas 25.2 (iii) and 25.3 (i) that

  M_{τ−} = lim_n E[M_τ | F_{τ_n}] = E[M_τ | F_{τ−}] = M_τ  a.s. on {τ < ∞}.

Thus, τ = ∞ a.s. Since ε was arbitrary, it follows that M is a.s. continuous. □

The decomposition of optional times in Proposition 25.4 may now be extended to increasing processes. We say that an rcll process X or a filtration F is quasi-leftcontinuous if X_{τ−} = X_τ a.s. on {τ < ∞} or F_{τ−} = F_τ, respectively, for every predictable time τ. We further say that X has accessible jumps if X_{τ−} = X_τ a.s. on {τ < ∞} for every totally inaccessible time τ.

Proposition 25.17 (decomposition of increasing processes) Any purely discontinuous, increasing process A has an a.s. unique decomposition into increasing processes A^q and A^a, where A^q is quasi-leftcontinuous and A^a has accessible jumps. Furthermore, there exist some predictable times τ_1, τ_2, … with disjoint graphs such that {t > 0; ΔA_t^a > 0} ⊂ ⋃_n [τ_n] a.s. Finally, if A is locally integrable with compensator Â, then A^q has compensator (Â)^c, the continuous component of Â.

Proof: Introduce the locally integrable process X_t = Σ_{s≤t} (ΔA_s ∧ 1) with compensator X̂, and define A^q = A − A^a = 1{ΔX̂ = 0} · A, or

  A_t^q = A_t − A_t^a = ∫_0^{t+} 1{ΔX̂_s = 0} dA_s,  t ≥ 0.  (4)

For any finite predictable time τ, the graph [τ] is again predictable by Theorem 25.14, and so by Lemma 25.13,

  E[ΔX_τ; ΔX̂_τ = 0] = E[ΔX̂_τ; ΔX̂_τ = 0] = 0,

which shows that A^q is quasi-leftcontinuous.


Now let τ_{n,0} = 0, and recursively define the random times

  τ_{n,k} = inf{t > τ_{n,k−1}; ΔX̂_t ∈ (2^{−n}, 2^{−n+1}]},  n, k ∈ N,

which are predictable by Theorem 25.14. Also note that {t > 0; ΔA_t^a > 0} ⊂ ⋃_{n,k} [τ_{n,k}] a.s. by the definition of A^a. Hence, if τ is a totally inaccessible time, then ΔA_τ^a = 0 a.s. on {τ < ∞}, which shows that A^a has accessible jumps.

To prove the uniqueness, assume that A has two decompositions A^q + A^a = B^q + B^a with the stated properties. Then Y = A^q − B^q = B^a − A^a is quasi-leftcontinuous with accessible jumps. Hence, by Proposition 25.4 we have ΔY_τ = 0 a.s. on {τ < ∞} for any optional time τ, which means that Y is a.s. continuous. Since it is also purely discontinuous, we get Y = 0 a.s.

If A is locally integrable, we may replace (4) by A^q = 1{ΔÂ = 0} · A, and we also note that (Â)^c = 1{ΔÂ = 0} · Â. Thus, Lemma 25.13 yields for any predictable process V ≥ 0

  E ∫ V dA^q = E ∫ 1{ΔÂ = 0} V dA = E ∫ 1{ΔÂ = 0} V dÂ = E ∫ V d(Â)^c,

and the same lemma shows that A^q has compensator (Â)^c. □

D

By the compensator of an optional time τ we mean the compensator of the associated jump process X_t = 1{τ ≤ t}. The following result characterizes the special categories of optional times in terms of the associated compensators.

Corollary 25.18 (compensation of optional times) Let τ be an optional time with compensator A. Then

(i) τ is predictable iff A is a.s. constant apart from a possible unit jump;

(ii) τ is accessible iff A is a.s. purely discontinuous;

(iii) τ is totally inaccessible iff A is a.s. continuous.

In general, τ has the accessible part τ_D, where D = {ΔA_τ > 0, τ < ∞}.

Proof: (i) If τ is predictable, then so is the process X_t = 1{τ ≤ t} by Theorem 25.14, and hence A = X a.s. Conversely, if A_t = 1{σ ≤ t} for some optional time σ, then the latter is predictable by Theorem 25.14, and Lemma 25.13 yields

  P{σ = τ < ∞} = E[ΔX_σ; σ < ∞] = E[ΔA_σ; σ < ∞] = P{σ < ∞} = E A_∞ = E X_∞ = P{τ < ∞}.

Thus, τ = σ a.s., and so τ is predictable.

(ii) Clearly, τ is accessible iff X has accessible jumps, which holds by Proposition 25.17 iff A = A^d a.s.

(iii) Here we note that τ is totally inaccessible iff X is quasi-leftcontinuous, which holds by Proposition 25.17 iff A = A^c a.s.


The last assertion follows easily from (ii) and (iii). □

The next result characterizes quasi-left-continuity for both filtrations and martingales.

Proposition 25.19 (quasi-leftcontinuous filtrations, Meyer) For any filtration F, these conditions are equivalent:

(i) every accessible time is predictable;

(ii) F_{τ−} = F_τ on {τ < ∞} for every predictable time τ;

(iii) ΔM_τ = 0 a.s. on {τ < ∞} for every martingale M and predictable time τ.

If the basic σ-field in Ω is taken to be F_∞, then F_{τ−} = F_τ on {τ = ∞} for any optional time τ, and the relation in (ii) extends to all of Ω.

Proof: (i) ⇒ (ii): Let τ be a predictable time, and fix any B ∈ F_τ with B ⊂ {τ < ∞}. Then [τ_B] ⊂ [τ], and so τ_B is accessible, hence by (i) even predictable. The process X_t = 1{τ_B ≤ t} is then predictable by Theorem 25.14, and since

  X_τ 1{τ < ∞} = 1{τ_B ≤ τ < ∞} = 1_B,

Lemma 25.3 (i) yields B ∈ F_{τ−}.

(ii) ⇒ (iii): Fix any martingale M, and let τ be a bounded, predictable time with announcing sequence (τ_n). Using (ii) and Lemma 25.2 (iii), we get as before

  M_{τ−} = lim_n E[M_τ | F_{τ_n}] = E[M_τ | F_{τ−}] = E[M_τ | F_τ] = M_τ,

and so M_{τ−} = M_τ a.s.

(iii) ⇒ (i): If τ is accessible, then by Proposition 25.4 there exist some predictable times τ_n with [τ] ⊂ ⋃_n [τ_n] a.s. By (iii) we have ΔM_{τ_n} = 0 a.s. on {τ_n < ∞} for every martingale M and all n, and so ΔM_τ = 0 a.s. on {τ < ∞}. Hence, τ is predictable by Theorem 25.14. □

In particular, quasi-left-continuity holds for canonical Feller processes and their induced filtrations.

Proposition 25.20 (quasi-left-continuity of Feller processes, Blumenthal, Meyer) Let X be a canonical Feller process with arbitrary initial distribution, and fix any optional time τ. Then these conditions are equivalent:

(i) τ is predictable;

(ii) τ is accessible;

(iii) X_{τ−} = X_τ a.s. on {τ < ∞}.

In the special case when X is a.s. continuous, we may conclude that every optional time is predictable.

Proof: (ii) ⇒ (iii): By Proposition 25.4 we may assume that τ is finite and predictable. Fix an announcing sequence (τ_n) and a function f ∈ C_0.


By the strong Markov property, we get for any h > 0

  E{f(X_{τ_n}) − f(X_{τ_n+h})}² = E(f² − 2f T_h f + T_h f²)(X_{τ_n}) ≤ ‖f² − 2f T_h f + T_h f²‖ ≤ 2‖f‖ ‖f − T_h f‖ + ‖f² − T_h f²‖.

Letting n → ∞ and then h ↓ 0, it follows by dominated convergence on the left and by strong continuity on the right that E{f(X_{τ−}) − f(X_τ)}² = 0, which means that f(X_{τ−}) = f(X_τ) a.s. Applying this to a sequence f_1, f_2, … ∈ C_0 that separates points, we obtain X_{τ−} = X_τ a.s.

(iii) ⇒ (i): By (iii) and Theorem 19.20 we have ΔM_τ = 0 a.s. on {τ < ∞} for every martingale M, and so τ is predictable by Theorem 25.14.

(i) ⇒ (ii): This is trivial. □

The following basic inequality will be needed in the proof of Theorem 26.12.

Proposition 25.21 (norm inequality, Garsia, Neveu) Consider a rightor left-continuous, predictable, increasing process A and a mndom variable ( ~ 0 such that a.s.

E[Aoo- At 1Ft]

~ E[(IFt],

t

~ 0.

(5)

Then

In the left-continuous case, predictability is clearly equivalent to adaptedness. The proper interpretation of (5) is to take E[Ati.Ft] At and to choose right-continuous versions ofthe martingales E[AooiFt] and E[(IFt]· For a right-continuous A, we may clearly choose ( = Z*, where Z is the supermartingale on the left of (5). We also note that if Ais the compensator of an increasing process X, then (5) holds with ( = X 00 •

=

Proof: We need to consider only the right-continuous case, the case of a left-continuous process A being similar but simpler. It is enough to assume that A is bounded, since we may otherwise replace A by the process A ∧ u for arbitrary u > 0 and let u → ∞ in the resulting formula. For each r > 0, the random time τ_r = inf{t; A_t ≥ r} is predictable by Theorem 25.14. By optional sampling and Lemma 25.2 we note that (5) remains true with t replaced by τ_r−. Since τ_r is F_{τ_r−}-measurable by the same lemma, we obtain

  E[A_∞ − r; A_∞ > r] ≤ E[A_∞ − r; τ_r < ∞] ≤ E[A_∞ − A_{τ_r−}; τ_r < ∞] ≤ E[ζ; τ_r < ∞] ≤ E[ζ; A_∞ ≥ r].

Writing A_∞ = α and letting p^{−1} + q^{−1} = 1, we get by Fubini's theorem, Hölder's inequality, and some calculus

  ‖α‖_p^p = p²q^{−1} E ∫_0^α (α − r) r^{p−2} dr
         = p²q^{−1} ∫_0^∞ E[α − r; α > r] r^{p−2} dr
         ≤ p²q^{−1} ∫_0^∞ E[ζ; α ≥ r] r^{p−2} dr
         = p²q^{−1} E ζ ∫_0^α r^{p−2} dr
         = p E ζ α^{p−1} ≤ p ‖ζ‖_p ‖α‖_p^{p−1}.

If ‖α‖_p > 0, we may finally divide both sides by ‖α‖_p^{p−1}. □
D

Let us now turn our attention to random measures ξ on (0, ∞) × S, where (S, S) is a Borel space. We say that ξ is adapted, predictable, or locally integrable if there exists a subring Ŝ ⊂ S with σ(Ŝ) = S such that the process ξ_t B = ξ((0, t] × B) has the corresponding property for every B ∈ Ŝ. In the case of adaptedness or predictability, it is clearly equivalent that the relevant property holds for the measure-valued process ξ_t. Let us further say that a process V on R_+ × S is predictable if it is P ⊗ S-measurable, where P denotes the predictable σ-field in R_+ × Ω.

Theorem 25.22 (compensation of random measures, Grigelionis, Jacod) Let ξ be a locally integrable, adapted random measure on some product space (0, ∞) × S, where S is Borel. Then there exists an a.s. unique predictable random measure ξ̂ on (0, ∞) × S such that E ∫ V dξ = E ∫ V dξ̂ for every predictable process V ≥ 0 on R_+ × S.

The random measure ξ̂ above is called the compensator of ξ. By Lemma 25.13 this extends the notion of compensator for real-valued processes. For the proof of Theorem 25.22 we need a simple technical lemma, which can be established by straightforward monotone class arguments.

Lemma 25.23 (predictable random measures)

(i) For any predictable random measure ξ and predictable process V ≥ 0 on (0, ∞) × S, the process V · ξ is again predictable.

(ii) For any predictable process V ≥ 0 on (0, ∞) × S and predictable, measure-valued process ρ on S, the process Ṽ_t = ∫ V_{t,s} ρ_t(ds) is again predictable.

Proof of Theorem 25.22: Since ξ is locally integrable, we may easily construct a predictable process V > 0 on R_+ × S such that E ∫ V dξ < ∞. If the random measure ζ = V · ξ has compensator ζ̂, then by Lemma 25.23 the measure ξ̂ = V^{−1} · ζ̂ is the compensator of ξ. Thus, we may henceforth assume that Eξ((0, ∞) × S) = 1.

Write η = ξ(· × S). Using the kernel operation ⊗ of Chapter 1, we may introduce the probability measure μ = P ⊗ ξ on Ω × R_+ × S and its projection ν = P ⊗ η onto Ω × R_+. Applying Theorem 6.3 to the restrictions of μ and ν to the σ-fields P ⊗ S and P, respectively, we conclude that there exists some probability kernel ρ from (Ω × R_+, P) to (S, S) satisfying μ = ν ⊗ ρ, or

  P ⊗ ξ = P ⊗ η ⊗ ρ  on (Ω × R_+ × S, P ⊗ S).

Letting η̂ denote the compensator of η, we may introduce the random measure ξ̂ = η̂ ⊗ ρ on R_+ × S. To see that ξ̂ is the compensator of ξ, we first note that ξ̂ is predictable by Lemma 25.23 (i). Next we consider an arbitrary predictable process V ≥ 0 on R_+ × S, and note that the process Ṽ_t = ∫ V_{t,s} ρ_t(ds) is again predictable by Lemma 25.23 (ii). By Theorem 6.4 and Lemma 25.13 we get

  E ∫ V dξ̂ = E ∫ η̂(dt) ∫ V_{t,s} ρ_t(ds) = E ∫ η(dt) ∫ V_{t,s} ρ_t(ds) = E ∫ V dξ.

It remains to note that ξ̂ is a.s. unique by Lemma 25.13. □

Our next aim is to show, under a weak regularity condition, how a point process can be transformed into a Poisson process by means of a suitable predictable mapping. The result leads to various time-change formulas for point processes, similar to those for continuous local martingales in Chapter 18. Recall that an S-marked point process on (0, ∞) is defined as an integer-valued random measure ξ on (0, ∞) × S such that a.s. ξ({t} × S) ≤ 1 for all t > 0. The condition implies that ξ is locally integrable, and so the existence of the associated compensator ξ̂ is automatic. We say that ξ is quasi-leftcontinuous if ξ([τ] × S) = 0 a.s. for every predictable time τ.

Theorem 25.24 (predictable mapping to Poisson) Fix a Borel space S and a σ-finite measure space (T, μ), let ξ be a quasi-leftcontinuous S-marked point process on (0, ∞) with compensator ξ̂, and let Y be a predictable mapping from R_+ × S to T with ξ̂ ∘ Y^{−1} = μ a.s. Then η = ξ ∘ Y^{−1} is a Poisson process on T with Eη = μ.

Proof: For any disjoint measurable sets B_1, …, B_n in T with finite μ-measure, we need to show that ηB_1, …, ηB_n are independent Poisson random variables with means μB_1, …, μB_n. Then introduce for each k ≤ n the processes

  J_t^k = ∫_S ∫_0^{t+} 1_{B_k}(Y_{s,x}) ξ(ds dx),  Ĵ_t^k = ∫_S ∫_0^t 1_{B_k}(Y_{s,x}) ξ̂(ds dx).

Here Ĵ_∞^k = μB_k < ∞ a.s. by hypothesis, and so the J^k are simple and integrable point processes on R_+ with compensators Ĵ^k. For fixed u_1, …, u_n ≥ 0, we define

  X_t = Σ_{k≤n} {u_k J_t^k − (1 − e^{−u_k}) Ĵ_t^k},  t ≥ 0.

The process M_t = e^{−X_t} has bounded variation and finitely many jumps, and so by an elementary change of variables

  M_t − 1 = Σ_{s≤t} ΔM_s − ∫_0^t M_s dX_s^c = −Σ_{k≤n} (1 − e^{−u_k}) ∫_0^t M_{s−} d(J^k − Ĵ^k)_s,

where the second equality uses that the continuous part of X equals −Σ_k (1 − e^{−u_k}) Ĵ^k, while ΔM_s = M_{s−}(e^{−u_k} − 1) at each jump time of J^k. Since M is bounded and the processes J^k − Ĵ^k are martingales, M is a uniformly integrable martingale, and so EM_∞ = 1, that is,

  E exp(−Σ_{k≤n} u_k ηB_k) = exp(−Σ_{k≤n} (1 − e^{−u_k}) μB_k).

This is the joint Laplace transform of independent Poisson random variables with means μB_1, …, μB_n, and the assertion follows. □

Corollary 25.25 (Poisson criterion, Watanabe) Fix a σ-finite measure μ on (0, ∞) × S with μ({t} × S) = 0 for all t > 0. Let ξ be an S-marked, F-adapted point process on (0, ∞) with compensator ξ̂. Then ξ is F-Poisson with Eξ = μ iff ξ̂ = μ a.s.

We may further deduce a basic time-change result, similar to Proposition 18.8 for continuous local martingales.

Corollary 25.26 (time-change to Poisson, Papangelou, Meyer) Let N^1, …, N^n be counting processes on R_+ with a.s. unbounded and continuous compensators N̂^1, …, N̂^n, and assume that Σ_k N^k is a.s. simple. Define τ_s^k = inf{t > 0; N̂_t^k > s} and Y_s^k = N^k(τ_s^k). Then Y^1, …, Y^n are independent unit-rate Poisson processes.

Proof: We may apply Theorem 25.24 to the random measures ξ = (ξ_1, …, ξ_n) and ξ̂ = (ξ̂_1, …, ξ̂_n) on {1, …, n} × R_+ induced by (N^1, …, N^n) and (N̂^1, …, N̂^n), respectively, and to the predictable mapping T_{k,t} = (k, N̂_t^k) on {1, …, n} × R_+. It is then enough to verify that, a.s. for fixed k and t,

  ξ̂_k{s ≥ 0; N̂_s^k ≤ t} = t,

which is clear by the continuity of N̂^k. □
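The time-change can be illustrated by simulation (a sketch of ours, with made-up parameters): generate an inhomogeneous Poisson process N with rate λ(t) = 2t by thinning, so that N̂_t = t², and check that Y_s = N(τ_s) with τ_s = √s has unit-rate Poisson statistics at a fixed unit-scale time s₀.

```python
import random

random.seed(1)
T = 3.0                      # simulate on [0, T] with rate lam(t) = 2t, so Nhat_t = t**2
s0 = 4.0                     # check Y at unit-scale time s0 (requires s0 <= T**2)
counts = []
for _ in range(100_000):
    # inhomogeneous Poisson on [0, T] by thinning a rate-2T homogeneous stream
    t, y = 0.0, 0
    while True:
        t += random.expovariate(2 * T)
        if t >= T:
            break
        if random.random() < (2 * t) / (2 * T):   # accept with prob lam(t)/lam_max
            if t * t <= s0:                       # time-changed clock: Nhat_t <= s0
                y += 1
    counts.append(y)
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
# Y at unit-scale time s0 should be Poisson(s0): mean and variance near s0
assert abs(mean - s0) < 0.1 and abs(var - s0) < 0.3
```

Counting accepted points with t² ≤ s₀ is exactly evaluating N at τ_{s₀}, since N̂ is continuous and strictly increasing here.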

There is a similar result for stochastic integrals with respect to p-stable Lévy processes, as described in Proposition 15.9. For simplicity, we consider only the case when p < 1.


Proposition 25.27 (time-change of stable integrals) For a p ∈ (0, 1), let X be a strictly p-stable Lévy process, and consider a predictable process V ≥ 0 such that the process A = V^p · λ is a.s. finite but unbounded. Define τ_s = inf{t; A_t > s}, s ≥ 0. Then (V · X) ∘ τ =d X.

Proof: Define a point process ξ on R_+ × (R \ {0}) by ξB = Σ_s 1_B(s, ΔX_s), and recall from Corollary 15.7 and Proposition 15.9 that ξ is Poisson with intensity measure of the form λ ⊗ ν, where ν(dx) = c_± |x|^{−p−1} dx for ±x > 0. In particular, ξ has compensator ξ̂ = λ ⊗ ν. Let the predictable mapping T on R_+ × R be given by T_{s,x} = (A_s, xV_s). Since A is continuous, we have {A_s ≤ t} = {s ≤ τ_t} and A_{τ_t} = t. By Fubini's theorem, we hence obtain for any t, u > 0

  (λ ⊗ ν) ∘ T^{−1}((0, t] × (u, ∞)) = (λ ⊗ ν){(s, x); A_s ≤ t, xV_s > u} = ∫_0^{τ_t} ν{x; xV_s > u} ds = ν(u, ∞) ∫_0^{τ_t} V_s^p ds = t ν(u, ∞),

and similarly for the sets (0, t] × (−∞, −u). Thus, ξ̂ ∘ T^{−1} = λ ⊗ ν = ξ̂ a.s., and so Theorem 25.24 yields ξ ∘ T^{−1} =d ξ. Finally, we note that

  (V · X)_{τ_t} = ∫_0^{τ_t+} ∫ xV_s ξ(ds dx) = ∫_0^∞ ∫ xV_s 1{A_s ≤ t} ξ(ds dx) = ∫_0^{t+} ∫ y (ξ ∘ T^{−1})(dr dy),

where the process on the right has the same distribution as X. □

We turn to an important special case where the compensator can be computed explicitly. By the natural compensator of a random measure ξ we mean the compensator with respect to the induced filtration.

Proposition 25.28 (natural compensator) For any Borel space (S, S), let (τ, ζ) be a random element in (0, ∞] × S with distribution μ. Then ξ = δ_{τ,ζ} has natural compensator

  ξ̂_t B = ∫_{(0, t∧τ]} μ(dr × B) / μ([r, ∞] × S),  t ≥ 0, B ∈ S.  (6)

Proof: The process η_t B on the right of (6) is clearly predictable for every B ∈ S. It remains to show that M_t = ξ_t B − η_t B is a martingale, hence that E[M_t − M_s; A] = 0 for any s < t and A ∈ F_s. Since M_t = M_s on {τ ≤ s}, and the set {τ > s} is a.s. an atom of F_s, it suffices to show that E(M_t − M_s) = 0, or EM_t = 0. Then use Fubini's theorem to get

  E η_t B = E ∫_{(0, t∧τ]} μ(dr × B) / μ([r, ∞] × S)
         = ∫_{(0,∞]} μ(dx × S) ∫_{(0, t∧x]} μ(dr × B) / μ([r, ∞] × S)
         = ∫_{(0,t]} (μ(dr × B) / μ([r, ∞] × S)) ∫_{[r,∞]} μ(dx × S)
         = μ((0, t] × B) = E ξ_t B. □
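For instance (our own example): if τ is exponential with rate λ, then μ(dr × B) = λe^{−λr} dr ν(B) for the mark distribution ν, and μ([r, ∞] × S) = e^{−λr}, so (6) gives ξ̂_t B = λν(B)(t ∧ τ), the familiar constant-intensity compensator. The resulting identity E λ(t ∧ τ) = P{τ ≤ t} = 1 − e^{−λt} is easy to verify numerically.

```python
from math import exp

# For tau ~ Exp(lam), formula (6) yields the compensator lam * min(t, tau), and
# E[lam * min(t, tau)] should equal P(tau <= t) = 1 - exp(-lam*t).  Check by
# midpoint quadrature against the exponential density over [0, 8t], plus the tail.
lam, t, n = 0.7, 2.5, 200_000
h = 8.0 * t / n
expected = sum(lam * min(t, (i + 0.5) * h) * lam * exp(-lam * (i + 0.5) * h) * h
               for i in range(n))
expected += lam * t * exp(-lam * 8.0 * t)   # tail mass, where min(t, tau) = t
assert abs(expected - (1 - exp(-lam * t))) < 1e-4
```

The identity is just E ξ̂_t S = E ξ_t S specialized to this one-jump process.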

We turn to some applications of the previous ideas to classical potential theory. Then fix a domain D ⊂ R^d, and let T_t = T_t^D denote the transition operators of Brownian motion X in D, killed at the boundary ∂D. A function f ≥ 0 on D is said to be excessive if T_t f ≤ f for all t > 0 and T_t f → f as t → 0. In this case clearly T_t f ↑ f. Note that if f is excessive, then f(X) is a supermartingale under P_x for every x ∈ D. The basic example of an excessive function is the Green potential G_D ν of a measure ν on a Greenian domain D, provided this potential is finite.

Though excessivity is defined globally in terms of the operators T_t^D, it is in fact a local property. For a precise statement, we say that a measurable function f ≥ 0 on D is superharmonic if, for any ball B in D with center x, the average of f over the sphere ∂B is bounded by f(x). As we shall see, it is enough to consider balls in D of radius less than an arbitrary ε > 0. Recall that f is lower semicontinuous if x_n → x implies liminf_n f(x_n) ≥ f(x).

Theorem 25.29 (superharmonic and excessive functions, Doob) Let f ≥ 0 be a measurable function on a domain D ⊂ R^d. Then f is excessive iff it is superharmonic and lower semicontinuous.

For the proof we need two lemmas, the first of which clarifies the relation between the two continuity properties.

Lemma 25.30 (semicontinuity) Consider a measurable function f ≥ 0 on a domain D ⊂ R^d such that T_t f ≤ f for all t > 0. Then f is excessive iff it is lower semicontinuous.

Proof: First assume that f is excessive, and let x_n → x in D. By Theorem 24.7 and Fatou's lemma,

  T_t f(x) ≤ liminf_n T_t f(x_n) ≤ liminf_n f(x_n),

and as t → 0, we get f(x) ≤ liminf_n f(x_n). Thus, f is lower semicontinuous.


Next assume that f is lower semicontinuous. Using the continuity of X and Fatou's lemma, we get as t → 0 along an arbitrary sequence

  f(x) = E_x f(X_0) ≤ E_x liminf_{t→0} f(X_t) ≤ liminf_{t→0} E_x f(X_t) = liminf_{t→0} T_t f(x) ≤ limsup_{t→0} T_t f(x) ≤ f(x).

Thus, T_t f → f, and f is excessive. □

For smooth functions, the superharmonic property is easy to describe.

Lemma 25.31 (smooth functions) A function f ≥ 0 in C²(D) is superharmonic iff Δf ≤ 0, in which case f is also excessive.

Proof: By Itô's formula, the process

  M_t = f(X_t) − ½ ∫_0^t Δf(X_s) ds,  t ∈ [0, ζ),  (7)

is a continuous local martingale. Now fix any closed ball B ⊂ D with center x, and write τ = τ_{∂B}. Since E_x τ < ∞, we get by dominated convergence

  E_x f(X_τ) − f(x) = ½ E_x ∫_0^τ Δf(X_s) ds.

Thus, f is superharmonic iff the last expectation is ≤ 0, and the first assertion follows.

To prove the last statement, we note that the exit time ζ = τ_{∂D} is predictable, say with announcing sequence (τ_n). If Δf ≤ 0, we get from (7) by optional sampling

  E_x f(X_{t∧τ_n}) ≤ f(x),  t ≥ 0, n ∈ N.

Hence, Fatou's lemma yields E_x[f(X_t); t < ζ] = T_t f(x) ≤ f(x), and so f is excessive by Lemma 25.30. □
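The criterion Δf ≤ 0 is easy to test numerically (an illustration of ours): in R³ the function f(x) = |x|^β has Laplacian β(β + 1)|x|^{β−2} away from the origin, so β = −1 (the Newtonian kernel) is harmonic, while β = −1/2 is strictly superharmonic.

```python
# Finite-difference Laplacian check (our own illustration, not from the text).
def laplacian(f, x, h=1e-4):
    total = 0.0
    for i in range(len(x)):
        xp = list(x); xm = list(x)
        xp[i] += h; xm[i] -= h
        # central second difference along coordinate i
        total += (f(xp) - 2 * f(x) + f(xm)) / h ** 2
    return total

def make_f(beta):
    return lambda x: sum(c * c for c in x) ** (beta / 2.0)   # f(x) = |x|**beta

x = [0.7, -0.4, 1.1]                       # a point away from the origin
assert abs(laplacian(make_f(-1.0), x)) < 1e-4   # harmonic: Delta f = 0
assert laplacian(make_f(-0.5), x) < -1e-3       # superharmonic: Delta f < 0
```

By Lemma 25.31 both functions are therefore excessive for killed Brownian motion on any domain avoiding the origin.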

Proof of Theorem 25.29: If f is excessive or superharmonic, then Lemma 25.30 shows that f ∧ n has the same property for every n > 0. The converse statement is also true, by monotone convergence and because lower semicontinuity is preserved by increasing limits. Thus, we may henceforth assume that f is bounded.

Now assume that f is excessive on D. By Lemma 25.30 it is then lower semicontinuous, and it remains to prove that f is superharmonic. Since the property T_t f ≤ f is preserved by passing to a subdomain, we may assume that D is bounded. For each h > 0 we define q_h = h^{−1}(f − T_h f) and f_h = G_D q_h. Since f and D are bounded, we have G_D f < ∞, and so f_h = h^{−1} ∫_0^h T_s f ds ↑ f as h ↓ 0. By the strong Markov property we further see that, for any optional time τ < ζ,

  E_x f_h(X_τ) = E_x E_{X_τ} ∫_0^∞ q_h(X_s) ds = E_x ∫_0^∞ q_h(X_{s+τ}) ds ≤ E_x ∫_0^∞ q_h(X_s) ds = f_h(x).

In particular, f_h is superharmonic for each h, and so by monotone convergence the same property holds for f.

Conversely, assume that f is superharmonic and lower semicontinuous. To prove that f is excessive, it is enough by Lemma 25.30 to show that T_t f ≤ f for all t. Then fix a spherically symmetric probability density ψ ∈ C^∞(R^d) with support in the unit ball, and put ψ_h(x) = h^{−d}ψ(x/h) for each h > 0. Writing ρ for the Euclidean metric in R^d, we may define f_h = ψ_h * f on the set D_h = {x ∈ D; ρ(x, D^c) > h}. Note that f_h ∈ C^∞(D_h) for all h, that f_h is superharmonic on D_h, and that f_h ↑ f. By Lemma 25.31 and monotone convergence we conclude that f is excessive on each set D_h. Letting ζ_h denote the first exit time from D_h, we obtain

  E_x[f(X_t); t < ζ_h] ≤ f(x).

As h → 0, we have ζ_h ↑ ζ, and hence {t < ζ_h} ↑ {t < ζ}. Thus, by monotone convergence T_t f(x) ≤ f(x). □

We may now prove the remarkable fact that, although an excessive function f need not be continuous, the supermartingale f(X) is a.s. continuous under P_x for every x.

Theorem 25.32 (continuity, Doob) Fix an excessive function f on a domain D ⊂ R^d, and let X be a Brownian motion killed at ∂D. Then the process f(X_t) is a.s. continuous on [0, ζ).

The proof is based on the following invariance under time reversal of a stationary version of Brownian motion. Though no such process exists in the usual sense, we may consider distributions with respect to the σ-finite measure P = ∫ P_x dx, where P_x is the distribution of a Brownian motion in R^d starting at x.

Lemma 25.33 (time reversal, Doob) For any c > 0, the processes Y_t = X_t and Ỹ_t = X_{c−t} on [0, c] have the same distribution under P.

Proof: Introduce the processes

  B_t = X_t − X_0,  B̃_t = X_{c−t} − X_c,  t ∈ [0, c],

and note that B and B̃ are Brownian motions on [0, c] under each P_x. Fix any measurable function f ≥ 0 on C([0, c], R^d). Since Ỹ = X_0 + B_c + B̃, we get by Fubini's theorem and the invariance of Lebesgue measure

  E f(Ỹ) = ∫ E_x f(x + B_c + B̃) dx = E_0 ∫ f(x + B_c + B̃) dx = E_0 ∫ f(x + B̃) dx = ∫ E_x f(Y) dx = E f(Y). □

Proof of Theorem 25.32: Since f ∧ n is again excessive for each n > 0 by Theorem 25.29, we may assume that f is bounded. As in the proof of the same theorem, we may then approximate f by smooth excessive functions f_h ↑ f on suitable subdomains D_h ↑ D. Since f_h(X) is a continuous supermartingale up to the exit time ζ_h from D_h, Theorem 7.32 shows that f(X) is a.s. right-continuous on [0, ζ) under any initial distribution μ. Using the Markov property at rational times, we may extend the a.s. right-continuity to the random time set T = {t ≥ 0; X_t ∈ D}.

To strengthen the result to a.s. continuity on T, we note that f(X) is right-continuous on T, a.e. P. By Lemma 25.33 it follows that f(X) is also left-continuous on T, a.e. P. Thus, f(X) is continuous on T, a.s. P_μ for arbitrary μ ≪ λ^d. Since P_μ ∘ X_h^{−1} ≪ λ^d for any μ and h > 0, we may conclude that f(X) is a.s. continuous on T ∩ [h, ∞) for any h > 0. This together with the right-continuity at 0 yields the asserted continuity on [0, ζ). □

If f is excessive, then f(X) is a supermartingale under P_x for every x, and so it has a Doob-Meyer decomposition f(X) = M − A. It is remarkable that we can choose A to be a continuous additive functional (CAF) of X independent of x. A similar situation was encountered in connection with Theorem 22.23.

Theorem 25.34 (compensation by additive functional, Meyer) Let f be an excessive function on a domain D ⊂ ℝ^d, and let P_x be the distribution of Brownian motion in D, killed at ∂D. Then there exists an a.s. unique CAF A of X such that M = f(X) + A is a continuous, local P_x-martingale on [0, ζ) for every x ∈ D.

The main difficulty in the proof is to construct a version of the process A that compensates −f(X) under every measure P_μ. Here the following lemma is helpful.

Lemma 25.35 (universal compensation) Consider an excessive function f on a domain D ⊂ ℝ^d, a distribution m ∼ λ^d on D, and a P_m-compensator A of −f(X) on [0, ζ). Then for any distribution μ and constant h > 0, the process A ∘ θ_h is a P_μ-compensator of −f(X ∘ θ_h) on [0, ζ ∘ θ_h).

In other words, the process M_t = f(X_t) + A_{t−h} ∘ θ_h is a local P_μ-martingale on [h, ζ) for every μ and h.


Proof: For any bounded P_m-martingale M and initial distribution μ ≪ m, we note that M is also a P_μ-martingale. To see this, write k = dμ/dm, and note that P_μ = k(X_0) · P_m. It is equivalent to show that N_t = k(X_0) M_t is a P_m-martingale, which is clear since k(X_0) is F_0-measurable with mean 1.

Now fix any distribution μ and a constant h > 0. To prove the stated property of A, it is enough to show that, for any bounded P_m-martingale M, the process N_t = M_{t−h} ∘ θ_h is a P_μ-martingale on [h, ∞). Then fix any times s < t and sets F ∈ F_h and G ∈ F_s. Using the Markov property at h and noting that P_μ ∘ X_h^{-1} ≪ m, we get

    E_μ[M_t ∘ θ_h; F ∩ θ_h^{-1}G] = E_μ[E_{X_h}[M_t; G]; F]
                                  = E_μ[E_{X_h}[M_s; G]; F]
                                  = E_μ[M_s ∘ θ_h; F ∩ θ_h^{-1}G].

Hence, by a monotone class argument, E_μ[M_t ∘ θ_h | F_{h+s}] = M_s ∘ θ_h a.s. □

Proof of Theorem 25.34: Let A^μ denote the P_μ-compensator of −f(X) on [0, ζ), and note that A^μ is a.s. continuous, e.g. by Theorem 18.10. Fix any distribution m ∼ λ^d on D, and conclude from Lemma 25.35 that A^m ∘ θ_h is a P_μ-compensator of −f(X ∘ θ_h) on [0, ζ ∘ θ_h) for any μ and h > 0. Since this is also true for the process A^μ_{t+h} − A^μ_h, we get for any μ and h > 0

    A^μ_t = A^μ_h + A^m_{t−h} ∘ θ_h,   t ≥ h, a.s. P_μ.   (8)

Restricting h to the positive rationals, we may define

    A_t = lim_{h→0} A^m_{t−h} ∘ θ_h,   t > 0,

whenever the limit exists and is continuous and nondecreasing with A_0 = 0, and put A = 0 otherwise. By (8) we have A = A^μ a.s. P_μ for every μ, and so A is a P_μ-compensator of −f(X) on [0, ζ) for every μ. For each h > 0 it follows by Lemma 25.35 that A ∘ θ_h is a P_μ-compensator of −f(X ∘ θ_h) on [0, ζ ∘ θ_h), and since this is also true for the process A_{t+h} − A_h, we get A_{t+h} = A_h + A_t ∘ θ_h a.s. P_μ. Thus, A is a CAF. □

We may now establish a probabilistic version of the classical Riesz decomposition. To avoid technical difficulties, we restrict our attention to locally bounded functions f. By the greatest harmonic minorant of f we mean a harmonic function h ≤ f that dominates all other such functions. Recall that the potential U_A of a CAF A of X is given by U_A(x) = E_x A_∞.

Theorem 25.36 (Riesz decomposition) Fix any locally bounded function f ≥ 0 on a domain D ⊂ ℝ^d, and let X be Brownian motion on D, killed at ∂D. Then f is excessive iff it has a representation f = U_A + h, where A is a CAF of X and h is harmonic with h ≥ 0. In that case, A is the compensator of −f(X) and h is the greatest harmonic minorant of f.

A similar result for uniformly α-excessive functions of an arbitrary Feller process was obtained in Theorem 22.23. From the classical Riesz representation on Greenian domains, we know that U_A may also be written as the Green potential of a unique measure ν_A, so that f = G_D ν_A + h. In the special case when D = ℝ^d with d ≥ 3, we recall from Theorem 22.21 that ν_A B = E(1_B · A)_1. A similar representation holds in the general case.

Proof of Theorem 25.36: First assume that A is a CAF with U_A < ∞. By the additivity of A and the Markov property of X, we get for any t > 0

    U_A(x) = E_x A_∞ = E_x(A_t + A_∞ ∘ θ_t)
           = E_x A_t + E_x E_{X_t} A_∞ = E_x A_t + T_t U_A(x).

By dominated convergence E_x A_t ↓ 0 as t → 0, and so U_A is excessive. Then U_A + h is excessive as well, for any harmonic function h ≥ 0.

Conversely, assume that f is excessive and locally bounded. By Theorem 25.34 there exists some CAF A such that M = f(X) + A is a continuous local martingale on [0, ζ). For any localizing and announcing sequence τ_n ↑ ζ, we get

    E_x A_{τ_n} = f(x) − E_x f(X_{τ_n}) ≤ f(x).

As n → ∞, it follows by monotone convergence that U_A ≤ f. By the additivity of A and the Markov property of X,

    E_x[A_∞ | F_t] = A_t + E_x[A_∞ ∘ θ_t | F_t]
                   = A_t + E_{X_t} A_∞ = M_t − f(X_t) + U_A(X_t).   (9)

Writing h = f − U_A, it follows that h(X) is a continuous local martingale. Since h is locally bounded, we may conclude by optional sampling and dominated convergence that h has the mean-value property. Thus, h is harmonic by Lemma 24.3.

To prove the uniqueness of A, assume that f also has a representation U_B + k for some CAF B and some harmonic function k ≥ 0. Proceeding as in (9), we get

    E_x[A_∞ | F_t] − E_x[B_∞ | F_t] = A_t − B_t + (U_A − U_B)(X_t),

which shows that A − B is a continuous local martingale. Hence, Proposition 17.2 yields A = B a.s. To see that h is the greatest harmonic minorant of f, consider any harmonic minorant k ≥ 0. Since f − k is again excessive and locally bounded, it has a representation U_B + l for some CAF B and some harmonic function l. But then f = U_B + k + l, and so A = B a.s. and h = k + l ≥ k. □

For any sufficiently regular measure ν on ℝ^d, we may now construct an associated CAF A of Brownian motion X such that A increases only when X visits the support of ν. This clearly extends the notion of local time. For convenience we may write G_D(1_D · ν) = G_D ν.


Proposition 25.37 (additive functionals induced by measures) Fix a measure ν on ℝ^d such that U(1_D · ν) is bounded for every bounded domain D. Then there exists an a.s. unique CAF A of Brownian motion X such that, for any D,

    E_x A_{ζ_D} = G_D ν(x),   x ∈ D.   (10)

Conversely, ν is uniquely determined by A. Furthermore,

    supp A ⊂ {t ≥ 0; X_t ∈ supp ν} a.s.   (11)

The proof is straightforward, given the classical Riesz decomposition, and we shall indicate the main steps only.

Proof: A simple calculation shows that G_D ν is excessive for any bounded domain D. Since G_D ν ≤ U(1_D · ν), it is further bounded. Hence, by Theorem 25.36 there exist a CAF A^D of X on [0, ζ_D) and a harmonic function h_D ≥ 0 such that G_D ν = U_{A^D} + h_D. In fact, h_D = 0 by Riesz' theorem. Now consider another bounded domain D' ⊃ D. We claim that G_{D'} ν − G_D ν is harmonic on D. This is clear from the analytic definitions, and it also follows, under a regularity condition, from Lemma 24.13. Since A^D and A^{D'} are compensators of −G_D ν(X) and −G_{D'} ν(X), respectively, we conclude that A^D − A^{D'} is a martingale on [0, ζ_D), and so A^D = A^{D'} a.s. up to time ζ_D. Now choose a sequence of bounded domains D_n ↑ ℝ^d, and define A = sup_n A^{D_n}, so that A = A^D a.s. on [0, ζ_D) for every D. It is easy to see that A is a CAF of X and that (10) holds for any bounded domain D. The uniqueness of ν is clear from the uniqueness in the classical Riesz decomposition. Finally, we obtain (11) by noting that G_D ν is harmonic on D \ supp ν for every D, so that G_D ν(X) is a local martingale on the predictable set {t < ζ_D; X_t ∉ supp ν}. □
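A concrete special case of (10) can be checked numerically (a sketch under stated assumptions, not from the text): take D = (0, 1) and ν = Lebesgue measure, so that A_t = t is a CAF with potential U_A(x) = E_x ζ, while the Green function of ½Δ on (0, 1) is G_D(x, y) = 2(x ∧ y)(1 − (x ∨ y)). Both sides of (10) then equal x(1 − x).

```python
import numpy as np

# E_x[zeta] solves (1/2) u'' = -1 on (0,1) with u(0) = u(1) = 0; the Green
# potential of Lebesgue measure is \int_0^1 G_D(x,y) dy.  Both should give
# x(1-x), which we verify on a grid.

N = 200
h = 1.0 / N
x = np.linspace(0.0, 1.0, N + 1)

# finite-difference solve of (1/2) u'' = -1 with zero boundary values
main = np.full(N - 1, -2.0)
off = np.ones(N - 2)
L = (np.diag(main) + np.diag(off, 1) + np.diag(off, -1)) / (2 * h * h)
u = np.zeros(N + 1)
u[1:N] = np.linalg.solve(L, -np.ones(N - 1))

# Green potential of Lebesgue measure by quadrature
G = 2 * np.minimum.outer(x, x) * (1 - np.maximum.outer(x, x))
green_pot = G.sum(axis=1) * h           # approximates \int_0^1 G(x,y) dy

assert np.allclose(u, x * (1 - x), atol=1e-3)
assert np.allclose(green_pot, x * (1 - x), atol=1e-2)
print("E_x[zeta] = G_D(Lebesgue)(x) = x(1-x) on (0,1)")
```

The agreement of the exit-time solution with the Green potential is exactly the identity E_x A_{ζ_D} = G_D ν(x) for this simplest choice of ν.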

Exercises

1. Show by an example that the σ-fields F_τ and F_τ− may differ. (Hint: Take τ to be constant.)

2. Give examples of optional times that are predictable; accessible but not predictable; and totally inaccessible. (Hint: Use Corollary 25.18.)

3. Show by an example that a right-continuous, adapted process need not be predictable. (Hint: Use Theorem 25.14.)

4. Given a Brownian motion B on [0, 1], let F be the filtration induced by X_t = (B_t, B_1). Find the Doob-Meyer decomposition B = M + A on [0, 1), and show that A has a.s. finite variation on [0, 1].

5. For any totally inaccessible time τ, show that sup_t |P[τ ≤ t + ε | F_t] − 1{τ ≤ t}| → 0 a.s. as ε → 0. Derive a corresponding result for the compensator. (Hint: Use Lemma 25.8.)

6. Let the process X be adapted and rcll. Show that X is predictable iff it has accessible jumps and ΔX_τ is F_τ−-measurable for every predictable time τ < ∞. (Hint: Use Proposition 25.17 and Lemmas 25.2 and 25.3.)

7. Show that the compensator A of a quasi-left-continuous local submartingale is a.s. continuous. (Hint: Note that A has accessible jumps. Use optional sampling at an arbitrary predictable time τ < ∞ with announcing sequence (τ_n).)

8. Extend Corollary 25.26 to possibly bounded compensators. Show that the result fails in general when the compensators are not continuous.

9. Show that any general inequality involving an increasing process A and its compensator Â remains valid in discrete time. (Hint: Embed the discrete-time process and filtration into continuous time.)

Chapter 26

Semimartingales and General Stochastic Integration

Predictable covariation and L²-integral; semimartingale integral and covariation; general substitution rule; Doléans' exponential and change of measure; norm and exponential inequalities; martingale integral; decomposition of semimartingales; quasi-martingales and stochastic integrators

In this chapter we shall use the previously established Doob-Meyer decomposition to extend the stochastic integral of Chapter 17 to possibly discontinuous semimartingales. The construction proceeds in three steps. First we imitate the definition of the L²-integral V · M from Chapter 17, using a predictable version ⟨M, N⟩ of the covariation process. A suitable truncation then allows us to extend the integral to arbitrary semimartingales X and bounded, predictable processes V. The ordinary covariation [X, Y] can now be defined by the integration-by-parts formula, and we may use some generalized versions of the BDG inequalities from Chapter 17 to extend the martingale integral V · M to more general integrands V.

Once the stochastic integral is defined, we may develop a stochastic calculus for general semimartingales. In particular, we shall prove an extension of Itô's formula, solve a basic stochastic differential equation, and establish a general Girsanov-type theorem for absolutely continuous changes of the probability measure. The latter material extends the appropriate portions of Chapters 18 and 21.

The stochastic integral and covariation process, together with the Doob-Meyer decomposition from the preceding chapter, provide the tools for a more detailed analysis of semimartingales. Thus, we may now establish two general decompositions, similar to the decompositions of optional times and increasing processes in Chapter 25. We shall further derive some exponential inequalities for martingales with bounded jumps, characterize local quasi-martingales as special semimartingales, and show that no continuous extension of the predictable integral exists beyond the context of semimartingales.

Throughout this chapter, M² denotes the class of uniformly square-integrable martingales. As in Lemma 17.4, we note that M² is a Hilbert space for the norm ‖M‖ = (E M²_∞)^{1/2}. We define M²_0 as the closed linear subspace of martingales M ∈ M² with M_0 = 0. The corresponding


classes M²_loc and M²_{0,loc} are defined as the sets of processes M such that the stopped versions M^{τ_n} belong to M² or M²_0, respectively, for some sequence of optional times τ_n → ∞.

For every M ∈ M²_loc we note that M² is a local submartingale. The corresponding compensator, denoted by ⟨M⟩, is called the predictable quadratic variation of M. More generally, we may define the predictable covariation ⟨M, N⟩ of two processes M, N ∈ M²_loc as the compensator of MN, also computable by the polarization formula

    4⟨M, N⟩ = ⟨M + N⟩ − ⟨M − N⟩.

Note that ⟨M, M⟩ = ⟨M⟩. If M and N are continuous, then clearly ⟨M, N⟩ = [M, N] a.s. The following result collects some further useful properties.

Proposition 26.1 (predictable covariation) For any M, M^n, N ∈ M²_loc,

(i) ⟨M, N⟩ = ⟨M − M_0, N − N_0⟩ a.s.;

(ii) ⟨M⟩ is a.s. increasing, and ⟨M, N⟩ is a.s. symmetric and bilinear;

(iii) |⟨M, N⟩| ≤ ∫ |d⟨M, N⟩| ≤ ⟨M⟩^{1/2} ⟨N⟩^{1/2} a.s.;

(iv) ⟨M, N⟩^τ = ⟨M^τ, N⟩ = ⟨M^τ, N^τ⟩ a.s. for any optional time τ;

(v) ⟨M^n⟩_∞ →^P 0 implies (M^n − M^n_0)* →^P 0.

Proof: By Lemma 25.11 we note that ⟨M, N⟩ is the a.s. unique predictable process of locally integrable variation and starting at 0 such that MN − ⟨M, N⟩ is a local martingale. The symmetry and bilinearity in (ii) follow immediately, as does property (i), since MN_0, M_0N, and M_0N_0 are all local martingales. Property (iii) is proved in the same way as Proposition 17.9, and (iv) is obtained as in Theorem 17.5.

To prove (v), we may assume that M^n_0 = 0 for all n. Let ⟨M^n⟩_∞ →^P 0. Fix any ε > 0, and define τ_n = inf{t; ⟨M^n⟩_t ≥ ε}. Since ⟨M^n⟩ is predictable, even τ_n is predictable by Theorem 25.14 and is therefore announced by some sequence τ_{nk} ↑ τ_n. The latter may be chosen such that M^n is an L²-martingale and (M^n)² − ⟨M^n⟩ a uniformly integrable martingale on [0, τ_{nk}] for every k. By Proposition 7.16,

    E (M^n)*²_{τ_{nk}} ≲ E (M^n)²_{τ_{nk}} = E ⟨M^n⟩_{τ_{nk}} ≤ ε,

and as k → ∞, we get E (M^n)*²_{τ_n−} ≲ ε. Now fix any δ > 0, and write

    P{(M^n)*² > δ} ≤ P{τ_n < ∞} + δ^{-1} E (M^n)*²_{τ_n−}
                   ≲ P{⟨M^n⟩_∞ ≥ ε} + ε/δ.

Here the right-hand side tends to ε/δ as n → ∞, and since ε was arbitrary, (v) follows.
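In discrete time the polarization formula can be checked directly, since the predictable brackets reduce to sums of conditional (co)variances of the increments. The following sketch uses hypothetical scale functions a, b and a correlation parameter rho (none of which come from the text); it illustrates the definitions only, not the book's proof.

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete-time sketch: increments have state-dependent scales a(M), b(N)
# and correlation rho, so the predictable brackets are
#   <M>_n = sum a_k^2,   <N>_n = sum b_k^2,   <M,N>_n = rho * sum a_k b_k.

n_steps, n_paths, rho = 50, 100_000, 0.6
a = lambda m: 1.0 + 0.1 * np.tanh(m)     # hypothetical scale functions
b = lambda m: 1.0 - 0.1 * np.tanh(m)

M = np.zeros(n_paths); N = np.zeros(n_paths)
qM = np.zeros(n_paths); qN = np.zeros(n_paths); qMN = np.zeros(n_paths)
for _ in range(n_steps):
    e1 = rng.choice([-1.0, 1.0], n_paths)
    e2 = np.where(rng.random(n_paths) < (1 + rho) / 2, e1, -e1)  # corr rho
    ak, bk = a(M), b(N)
    qM += ak**2; qN += bk**2; qMN += rho * ak * bk
    M += ak * e1; N += bk * e2

# polarization identity 4<M,N> = <M+N> - <M-N>, pathwise and exact
qplus = qM + 2 * qMN + qN                # <M+N>
qminus = qM - 2 * qMN + qN               # <M-N>
assert np.allclose(4 * qMN, qplus - qminus)

# M^2 - <M> and MN - <M,N> are martingales: sample means stay near 0
assert abs(np.mean(M**2 - qM)) < 1.0
assert abs(np.mean(M * N - qMN)) < 1.0
```

The polarization check is an exact algebraic identity pathwise, while the martingale checks are Monte Carlo estimates and hold only up to sampling error.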

Proof: It is enough to establish the approximation (|V_n − V|^p · A)_t →^P 0. By Minkowski's inequality we may then approximate in steps, and by dominated convergence we may first reduce to the case when V is simple. Each term may then be approximated separately, and so we may next assume that V = 1_B for some predictable set B. Approximating separately on disjoint intervals, we may finally reduce to the case when B ⊂ Ω × [0, t] for some t > 0. The desired approximation is then obtained from Lemma 25.1 by a monotone class argument. □

Proof of Theorem 26.2: As in Theorem 17.11, we may construct the integral V · M as the a.s. unique element of M²_{0,loc} satisfying (i). The mapping (V, M) ↦ V · M is clearly bilinear, and by the analogue of Lemma 17.10 it extends the elementary predictable integral. Properties (ii) and (iv) may be obtained in the same way as in Propositions 17.14 and 17.15. The stated continuity property follows immediately from (i) and Proposition 26.1 (v). To get the stated uniqueness, it is then enough to apply Lemma 26.3 with A = ⟨M⟩ and p = 2. To prove (iii), we may apply Lemma 26.3 with A_t = ⟨M⟩_t + Σ_{s≤t} (ΔM_s)².

    A_t = Σ_{s≤t} ΔM_s 1{|ΔM_s| > 2J*_{s−}},   t ≥ 0.

Since |ΔA| ≤ 2ΔJ*, we have

    ∫_0^∞ |dA_s| = Σ_s |ΔA_s| ≤ 2J*_∞ ≤ 4M*_∞.

Writing Â for the compensator of A and putting D = A − Â, we get

    E D*_∞ ∨ E [D]^{1/2}_∞ ≤ E ∫_0^∞ |dD_s| ≲ E ∫_0^∞ |dA_s| ≲ E M*_∞.   (11)

To get a similar estimate for N = M − D, we introduce the optional times

    τ_r = inf{t; N*_t ∨ J_t > r},   r > 0,

and note that

    P{N*_∞ > r} ≤ P{τ_r < ∞} + P{τ_r = ∞, N*_∞ > r}
               ≤ P{J* > r} + P{N*_{τ_r} ≥ r}.   (12)

Arguing as in the proof of Lemma 26.5, we get |ΔN| ≤ 4J*_−, and so

    N*_{τ_r} ≤ N*_∞ ∧ (N*_{τ_r−} + 4J*_{τ_r−}) ≤ N*_∞ ∧ 5r.

Since N² − [N] is a local martingale, we get by Chebyshev's inequality or Proposition 7.15, respectively,

    r² P{N*_{τ_r} > r} ≲ E N²_{τ_r} ≲ E (N*_∞ ∧ r)².

Hence, by Fubini's theorem and some calculus,

    ∫_0^∞ P{N*_{τ_r} > r} dr ≲ ∫_0^∞ E (N*_∞ ∧ r)² r^{-2} dr ≲ E N*_∞.

Combining this with (11) and (12) and using Lemma 3.4, we get

    E N*_∞ = ∫_0^∞ P{N*_∞ > r} dr ≤ ∫_0^∞ (P{J* > r} + P{N*_{τ_r} ≥ r}) dr ≲ E M*_∞.

It remains to note that E M*_∞ ≤ E D*_∞ + E N*_∞.   □
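The p = 1 comparison E M* ≍ E [M]^{1/2} behind this proof can be illustrated numerically for a discrete-time martingale; this is only a sketch, and the bracketing constants below are loose empirical bounds, not the optimal ones.

```python
import numpy as np

rng = np.random.default_rng(1)

# Compare E sup_k |M_k| with E [M]_n^{1/2} for the random-walk martingale
# M_k = xi_1 + ... + xi_k with heavy-tailed (Student-t, df=3) increments.

n, paths = 200, 20_000
xi = rng.standard_t(df=3, size=(paths, n))      # mean 0, finite variance
M = np.cumsum(xi, axis=1)

Mstar = np.abs(M).max(axis=1).mean()            # E sup_k |M_k|
qv = np.sqrt((xi**2).sum(axis=1)).mean()        # E [M]_n^{1/2}

ratio = Mstar / qv
assert 0.2 < ratio < 5.0                        # comparable up to constants
print(f"E M* / E [M]^(1/2) = {ratio:.3f}")
```

The two quantities stay within a bounded ratio of each other even though the increments have infinite fourth moment, which is the content of the p = 1 case of the inequality.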

Extension to p > 1 (Garsia): For any t ≥ 0 and B ∈ F_t, we may apply (10) with p = 1 to the local martingale 1_B(M − M^t) to get a.s.

    c_1^{-1} E[[M − M^t]^{1/2}_∞ | F_t] ≤ E[(M − M^t)*_∞ | F_t] ≤ c_1 E[[M − M^t]^{1/2}_∞ | F_t].

Since

    [M]^{1/2}_∞ − [M]^{1/2}_t ≤ [M − M^t]^{1/2}_∞ ≤ [M]^{1/2}_∞,
    M*_∞ − M*_t ≤ (M − M^t)*_∞ ≤ 2M*_∞,

the relation E[A_∞ − A_t | F_t] ≲ E[ζ | F_t] occurring in Proposition 25.21 holds with A_t = [M]^{1/2}_t and ζ = M*, and also with A_t = M*_t and ζ = [M]^{1/2}_∞. Since

    ΔM*_t ∨ Δ[M]^{1/2}_t ≤ |ΔM_t| ≤ [M]^{1/2}_t ∧ 2M*_t,

we have in both cases ΔA_τ ≲ E[ζ | F_τ] a.s. for every optional time τ, and so the cited condition remains fulfilled for the left-continuous version A_−. Hence, Proposition 25.21 yields ‖A_∞‖_p ≲ ‖ζ‖_p for every p ≥ 1, and (10) follows. □

We may use the last theorem to extend the stochastic integral to a larger class of integrands. Write M for the space of local martingales and M_0 for the subclass of processes M with M_0 = 0. For any M ∈ M, let L(M) denote the class of predictable processes V such that (V² · [M])^{1/2} is locally integrable.

Theorem 26.13 (martingale integral, Meyer) The elementary predictable integral extends a.s. uniquely to a bilinear map of any M ∈ M and V ∈ L(M) into V · M ∈ M_0, such that if V, V_1, V_2, ... ∈ L(M) with |V_n| ≤ V and (V_n² · [M])_t →^P 0 for some t > 0, then (V_n · M)*_t →^P 0. This integral satisfies properties (ii)-(iv) of Theorem 26.2 and is characterized by the condition

    [V · M, N] = V · [M, N] a.s.,   N ∈ M.   (13)

Proof: For the construction of the integral, we may reduce by localization to the case when E(M − M_0)* < ∞ and E(V² · [M])^{1/2}_∞ < ∞. For each n ∈ ℕ, define V_n = V 1{|V| ≤ n}. Then V_n · M ∈ M_0 by Theorem 26.4, and by Theorem 26.12 we have E(V_n · M)* < ∞. Using Theorems 26.6 (v) and 26.12, Minkowski's inequality, and dominated convergence, we obtain

    E(V_m · M − V_n · M)* ≲ E[(V_m − V_n) · M]^{1/2}_∞ = E((V_m − V_n)² · [M])^{1/2}_∞ → 0.

Hence, there exists a process V · M with E(V_n · M − V · M)* → 0, and clearly V · M ∈ M_0 with E(V · M)*_∞ < ∞.

To prove (13), we note that the relation holds for each V_n by Theorem 26.6 (v). Since E[V_n · M − V · M]^{1/2}_∞ → 0 by Theorem 26.12, we get by Theorem 26.6 (iii), for any N ∈ M and t ≥ 0,

    |[V_n · M, N]_t − [V · M, N]_t| ≤ [V_n · M − V · M]^{1/2}_t [N]^{1/2}_t →^P 0.   (14)


Next we note that, by Theorem 26.6 (iii) and (v),

    ∫_0^t |V_n d[M, N]| = ∫_0^t |d[V_n · M, N]| ≤ [V_n · M]^{1/2}_t [N]^{1/2}_t.

As n → ∞, we get by monotone convergence on the left and Minkowski's inequality on the right

    ∫_0^t |V d[M, N]| ≤ [V · M]^{1/2}_t [N]^{1/2}_t.
The property is clearly independent of the choice of decomposition X = M + A. To motivate the terminology, we note that any martingale M of locally finite variation may be written as M = M_0 + A − Â, where A_t = Σ_{s≤t} ΔM_s. Fix any c > 0, and put τ = inf{t; ΔM_t > c}. Define A_t = 1{τ ≤ t}, let Â denote the compensator of A, and put N = A − Â. Integrating by parts and using Lemma 25.13 gives

Thus, N is L²-bounded and hence lies in D. For any bounded martingale M', we get

    E ∫ M' dN = E ∫ ΔM' dN = E ∫ ΔM' dA = E[ΔM'_τ; τ < ∞],

where the first equality is obtained as in the proof of Lemma 25.7, the second is due to the predictability of M'_−, and the third holds since Â is predictable and hence natural. Letting M' → M in M², we obtain

Thus, ΔM ≤ c a.s., and therefore ΔM ≤ 0 a.s. since c is arbitrary. Similarly, ΔM ≥ 0 a.s., and the desired continuity follows.

Next assume that M ∈ D and N ∈ C, and choose martingales M^n → M of locally finite variation. By Theorem 26.6 (vi) and (vii) and optional sampling, we get for any optional time τ

and so [M, N] is a martingale by Lemma 7.13. Since it is also continuous by (15), Proposition 17.2 yields [M, N] = 0 a.s. In particular, E M_∞ N_∞ = 0, which shows that C ⊥ D. The uniqueness assertion now follows easily.

To prove the last assertion, we conclude from Theorem 26.6 (iv) that, for any M ∈ M²,

    [M]_t = [M]^c_t + Σ_{s≤t} (ΔM_s)²,   t ≥ 0.   (15)

Letting M ∈ D, we may choose martingales of locally finite variation M^n → M. By Theorem 26.6 (vii) and (viii) we have [M^n]^c = 0 and E[M^n − M]_∞ → 0. For any t ≥ 0, we get by Minkowski's inequality and (15)

    |{Σ_{s≤t} (ΔM^n_s)²}^{1/2} − {Σ_{s≤t} (ΔM_s)²}^{1/2}|
        ≤ {Σ_{s≤t} (ΔM^n_s − ΔM_s)²}^{1/2} ≤ [M^n − M]^{1/2}_t →^P 0,
    |[M^n]^{1/2}_t − [M]^{1/2}_t| ≤ [M^n − M]^{1/2}_t →^P 0.

Taking limits in (15) for the martingales M^n, we get the same formula for M without the term [M]^c_t, which shows that [M] = [M]^d. Now consider any M ∈ M². Using the strong orthogonality [M^c, M^d] = 0, we get a.s.

    [M]^c + [M]^d = [M] = [M^c + M^d] = [M^c] + [M^d],

which shows that even [M^c] = [M]^c a.s. By the same argument combined with Theorem 26.6 (viii) we obtain [X^d] = [X]^d a.s. for any semimartingale X. □

The last result immediately yields an explicit formula for the covariation of two semimartingales.

Corollary 26.15 (decomposition of covariation) For any semimartingale X, the process X^c is the a.s. unique continuous local martingale M with M_0 = 0 such that [X − M] is purely discontinuous. Furthermore, we have a.s. for any semimartingales X and Y

    [X, Y]_t = [X^c, Y^c]_t + Σ_{s≤t} ΔX_s ΔY_s,   t ≥ 0.   (16)

In particular, we note that (V · X)^c = V · X^c a.s. for any semimartingale X and locally bounded, predictable process V.

Proof: If M has the stated properties, then [(X − M)^c] = [X − M]^c = 0 a.s., and so (X − M)^c = 0 a.s. Thus, X − M is purely discontinuous. Formula (16) holds by Theorem 26.6 (iv) and Theorem 26.14 when X = Y, and the general result follows by polarization. □

The purely discontinuous component of a local martingale has a further decomposition, similar to the decompositions of optional times and increasing processes in Propositions 25.4 and 25.17.

Corollary 26.16 (decomposition of martingales, Yoeurp) Every purely discontinuous local martingale M has an a.s. unique decomposition M = M_0 + M^q + M^a with purely discontinuous M^q, M^a ∈ M_0, where M^q is quasi-left-continuous and M^a has accessible jumps. Furthermore, there exist some predictable times τ_1, τ_2, ... with disjoint graphs such that {t; ΔM^a_t ≠ 0} ⊂ ∪_n [τ_n] a.s. Finally, [M^q] = [M]^q and [M^a] = [M]^a a.s., and also ⟨M^q⟩ = ⟨M⟩^c and ⟨M^a⟩ = ⟨M⟩^d a.s. when M ∈ M²_loc.

Proof: Introduce the locally integrable process A_t = Σ_{s≤t} ΔM_s, with {t > 0; ΔN_t ≠ 0} ⊂ ∪_n [τ_n]. Assuming the stated condition, we get by Fubini's theorem and Lemma 25.2, for any bounded optional time τ,

    E N_τ = Σ_n E[ΔN_{τ_n}; τ_n ≤ τ] = Σ_n E[E[ΔN_{τ_n} | F_{τ_n−}]; τ_n ≤ τ] = 0,

and so N is a martingale by Lemma 7.13. Conversely, given any uniformly integrable martingale N and finite predictable time τ, we have a.s. E[N_τ | F_τ−] = N_τ− and hence E[ΔN_τ | F_τ−] = 0. □

For general martingales M, the process Z = e^{M − [M]/2} in Lemma 18.21 is not necessarily a martingale. For many purposes, however, it can be replaced by a similar supermartingale.

Lemma 26.19 (exponential supermartingales) Let M be a local martingale with M_0 = 0 and |ΔM| ≤ c < ∞ a.s., and put a = f(c) and b = g(c), where f(x) = −(x + log(1 − x)) x^{-2} and g(x) = (e^x − 1 − x) x^{-2}. Then the processes X = e^{M − a[M]} and Y = e^{M − b⟨M⟩} are supermartingales.


Proof: In the case of X we may clearly assume that c < 1. By Theorem 26.7 we get, in an obvious shorthand notation,

    X_−^{-1} · X = M − (a − ½)[M]^c + Σ { e^{ΔM − a(ΔM)²} − 1 − ΔM }.

Here the first term on the right is a local martingale, and the second term is nonincreasing since a ≥ ½. To see that even the sum is nonincreasing, we need to show that exp(x − ax²) ≤ 1 + x, or f(−x) ≤ f(c), whenever |x| ≤ c. But this is clear by a Taylor expansion of each side. Thus, X_−^{-1} · X is a local supermartingale, and since X > 0, the same is true for X_− · (X_−^{-1} · X) = X. By Fatou's lemma it follows that X is a true supermartingale.

In the case of Y, we may decompose M according to Theorem 26.14 and Corollary 26.16 as M = M^c + M^q + M^a, and conclude by Theorem 26.7 that

    Y_−^{-1} · Y = M − b⟨M⟩^c + ½[M]^c + Σ { e^{ΔM − bΔ⟨M⟩} − 1 − ΔM }
                 = M + b([M^q] − ⟨M^q⟩) − (b − ½)[M]^c
                   + Σ { e^{ΔM − bΔ⟨M⟩} − (1 + ΔM + b(ΔM)²)/(1 + bΔ⟨M⟩) }
                   + Σ { (1 + ΔM^a + b(ΔM^a)²)/(1 + bΔ⟨M^a⟩) − 1 − ΔM^a }.

Here the first two terms on the right are martingales, and the third term is nonincreasing since b ≥ ½. Even the first sum of jumps is nonincreasing, since e^x − 1 − x ≤ bx² for |x| ≤ c and e^y ≤ 1 + y for y ≥ 0. The last sum clearly defines a purely discontinuous process N of locally finite variation and with accessible jumps. Fixing any finite predictable time τ and writing ξ = ΔM_τ and η = Δ⟨M⟩_τ, we note that

    E[ (1 + ξ + bξ²)/(1 + bη) − 1 − ξ | F_τ− ] = 0.

Since E Σ_t (ΔM_t)²


Fix any u > 0, and conclude from Lemma 26.19 that the process

    X^u_t = exp{u M_t − u² f(uc) [M]_t},   t ≥ 0,

is a positive supermartingale. Since [M] ≤ 1 and X^u_0 = 1, we get for any r > 0

    P{sup_t M_t > r} ≤ P{sup_t X^u_t > exp{ur − u² f(uc)}} ≤ exp{−ur + u² f(uc)}.   (17)

Now define F(x) = 2x f(x), and note that F is continuous and strictly increasing from [0, 1) onto ℝ_+. Also note that F(x) ≤ x/(1 − x), and hence F^{-1}(y) ≥ y/(1 + y). Taking u = F^{-1}(rc)/c in (17), we get

    P{sup_t M_t > r} ≤ exp{−½ r F^{-1}(rc)/c} ≤ exp{−½ r²/(1 + rc)}.

It remains to combine with the same inequality for −M.

(ii) Define G(x) = 2x g(x), and note that G is a continuous and strictly increasing mapping onto ℝ_+. Furthermore, G(x) ≤ e^x − 1, and so G^{-1}(y) ≥ log(1 + y). Proceeding as before, we get

    P{sup_t M_t > r} ≤ exp{−½ r G^{-1}(rc)/c} ≤ exp{−½ r log(1 + rc)/c},

and the result follows. □
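The resulting Bernstein-type bound P{sup_t |M_t| > r} ≤ 2 exp{−½ r²/(1 + rc)} for [M] ≤ 1 can be tested against simulation; this is a rough numerical sketch, with the factor 2 coming from combining the bounds for M and −M.

```python
import numpy as np

rng = np.random.default_rng(2)

# M is a scaled random walk: increments +-1/sqrt(n), so the jumps are
# bounded by c = 1/sqrt(n) and [M]_n = 1 exactly, as the bound requires.

n, paths, r = 100, 50_000, 2.0
c = 1.0 / np.sqrt(n)
steps = rng.choice([-c, c], size=(paths, n))
M = np.cumsum(steps, axis=1)

empirical = (np.abs(M).max(axis=1) > r).mean()
bound = 2 * np.exp(-r**2 / (2 * (1 + r * c)))

assert empirical <= bound
print(f"P(sup|M| > {r}) ~ {empirical:.4f} <= bound {bound:.4f}")
```

As usual for exponential inequalities, the bound is far from tight at moderate r, but it holds uniformly in n and captures the Gaussian-to-Poissonian crossover through the factor (1 + rc).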

A quasi-martingale is defined as an integrable, adapted, and right-continuous process X such that

    sup_π Σ_{k≤n} E |X_{t_k} − E[X_{t_{k+1}} | F_{t_k}]| < ∞,   (18)

where the supremum extends over all finite partitions π of ℝ_+ of the form 0 = t_0 < t_1 < ... < t_n < ∞, and the last term is computed under the conventions t_{n+1} = ∞ and X_∞ = 0. In particular, we note that (18) holds when X is the sum of an L¹-bounded martingale and a process of integrable variation starting at 0. The next result shows that this case is close to the general situation. Here localization is defined in the usual way, in terms of a sequence of optional times τ_n ↑ ∞.

Theorem 26.20 (quasi-martingales, Rao) Any quasi-martingale is a difference of two nonnegative supermartingales. Thus, a process X with X_0 = 0 is a local quasi-martingale iff it is a special semimartingale.

Proof: For any t ≥ 0, let P_t denote the class of partitions π of the interval [t, ∞) of the form t = t_0 < t_1 < ... < t_n, and define

    η^±_π = Σ_{k≤n} E[ (X_{t_k} − E[X_{t_{k+1}} | F_{t_k}])^± | F_t ],   π ∈ P_t,

where t_{n+1} = ∞ and X_∞ = 0 as before. We claim that η^+_π and η^−_π are a.s. nondecreasing under refinements of π ∈ P_t. To see this, it is clearly enough to add one more division point u to π, say in the interval (t_k, t_{k+1}). Put α = X_{t_k} − X_u and β = X_u − X_{t_{k+1}}. By subadditivity and Jensen's inequality we get the desired relation

    E[ E[α + β | F_{t_k}]^± | F_t ] ≤ E[ E[α | F_{t_k}]^± + E[β | F_{t_k}]^± | F_t ]
                                   ≤ E[ E[α | F_{t_k}]^± + E[β | F_u]^± | F_t ].

Now fix any t ≥ 0, and conclude from (18) that sup_{π∈P_t} E η^±_π < ∞. For each n ∈ ℕ we may then choose some π_n ∈ P_t with E η^±_{π_n} > sup_π E η^±_π − n^{-1}. The sequences (η^±_{π_n}) are Cauchy in L¹, and so they converge in L¹ toward some limits Y^±_t. Note also that E|η^±_π − Y^±_t| < n^{-1} whenever π is a refinement of π_n. Thus, η^±_π → Y^±_t in L¹ along the directed set P_t.

Next fix any s < t, let π ∈ P_t be arbitrary, and define π' ∈ P_s by adding the point s to π. Then

    Y^±_s ≥ η^±_{π'} = (X_s − E[X_t | F_s])^± + E[η^±_π | F_s] ≥ E[η^±_π | F_s].

Taking limits along P_t on the right, we get Y^±_s ≥ E[Y^±_t | F_s] a.s., which means that the processes Y^± are supermartingales. By Theorem 7.27 the right-hand limits along the rationals Z^±_t = Y^±_{t+} then exist outside a fixed null set, and the processes Z^± are right-continuous supermartingales. For π ∈ P_t we have X_t = η^+_π − η^−_π → Y^+_t − Y^−_t, and so Z^+_t − Z^−_t = X_{t+} = X_t a.s. □
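In discrete time with a finite horizon, Rao's construction can be carried out exactly by backward induction. The sketch below uses the convention X_{N+1} = 0 and a hypothetical test process f(n, S_n) on a binomial tree; it mirrors the η^± construction above but is only an illustration, not the book's proof verbatim.

```python
import numpy as np

# On a finite horizon, eta_n^± = E[ sum_{k=n}^N (X_k - E[X_{k+1}|F_k])^± | F_n ]
# defines two nonnegative supermartingales with X = eta^+ - eta^-.  We compute
# them exactly on a simple-random-walk tree and check the decomposition.

N = 30
f = lambda n, s: np.sin(0.3 * s) + 0.05 * n       # hypothetical test process

def level_states(n):                              # states at level n
    return np.arange(-n, n + 1, 2)

eta_p = {}; eta_m = {}
g_N = f(N, level_states(N))                       # X_N - E[X_{N+1}|F_N] = X_N
eta_p[N], eta_m[N] = np.maximum(g_N, 0), np.maximum(-g_N, 0)
for n in range(N - 1, -1, -1):
    s = level_states(n)
    cond = 0.5 * (f(n + 1, s + 1) + f(n + 1, s - 1))   # E[X_{n+1} | S_n = s]
    g = f(n, s) - cond
    avg_p = 0.5 * (eta_p[n + 1][1:] + eta_p[n + 1][:-1])
    avg_m = 0.5 * (eta_m[n + 1][1:] + eta_m[n + 1][:-1])
    eta_p[n] = np.maximum(g, 0) + avg_p
    eta_m[n] = np.maximum(-g, 0) + avg_m

# X = eta^+ - eta^- at every node, and both eta^± are nonnegative
for n in range(N + 1):
    assert np.allclose(eta_p[n] - eta_m[n], f(n, level_states(n)))
    assert (eta_p[n] >= -1e-12).all() and (eta_m[n] >= -1e-12).all()
```

The supermartingale property holds by construction here, since E[η^±_{n+1} | F_n] = η^±_n − (X_n − E[X_{n+1} | F_n])^± ≤ η^±_n at every node.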

The next result shows that semimartingales are the most general processes for which a stochastic integral with reasonable continuity properties can be defined. As before, E denotes the class of bounded, predictable step processes with jumps at finitely many fixed points.

Theorem 26.21 (stochastic integrators, Bichteler, Dellacherie) A right-continuous, adapted process X is a semimartingale iff for any V_1, V_2, ... ∈ E with ‖V_n‖_∞ → 0 we have (V_n · X)_t →^P 0 for all t > 0.

The proof is based on three lemmas, the first of which separates the crucial functional-analytic part of the argument.

Lemma 26.22 (convexity and tightness) For any tight, convex set K ⊂ L¹(P), there exists a bounded random variable ρ > 0 with sup_{ξ∈K} E ρξ < ∞.

Proof (Yan): Let B denote the class of bounded, nonnegative random variables, and define C = {γ ∈ B; sup_{ξ∈K} E(γξ) < ∞}. We claim that, for any γ_1, γ_2, ... ∈ C, there exists some γ ∈ C with {γ > 0} = ∪_n {γ_n > 0}. Indeed, we may assume that γ_n ≤ 1 and sup_{ξ∈K} E(γ_n ξ) ≤ 1, in which case we may choose γ = Σ_n 2^{-n} γ_n. It is then easy to construct a ρ ∈ C such that P{ρ > 0} = sup_{γ∈C} P{γ > 0}. Clearly,

    {γ > 0} ⊂ {ρ > 0} a.s.,   γ ∈ C,   (19)

since we could otherwise choose a ρ' ∈ C with P{ρ' > 0} > P{ρ > 0}.

To show that ρ > 0 a.s., we assume instead that P{ρ = 0} > ε > 0. By the tightness of K we may choose r > 0 so large that P{ξ > r} ≤ ε for all ξ ∈ K. Then P{ξ − β > r} ≤ ε for all ξ ∈ K and β ∈ B. By Fatou's lemma we obtain P{ζ > r} ≤ ε for all ζ in the L¹-closure Z of K − B. In particular, the random variable ζ_0 = 2r 1{ρ = 0} lies outside Z. Now Z is convex and closed, and so, by a version of the Hahn-Banach theorem, there exists some γ ∈ (L¹)* = L^∞ satisfying

    sup_{ξ∈K} E γξ − inf_{β∈B} E γβ ≤ sup_{ζ∈Z} E γζ < E γζ_0 = 2r E[γ; ρ = 0].   (20)

Here γ ≥ 0, since we would otherwise get a contradiction by choosing β = b 1{γ < 0} for large enough b > 0. Hence, (20) reduces to sup_{ξ∈K} E γξ < 2r E[γ; ρ = 0], which implies γ ∈ C and E[γ; ρ = 0] > 0. But this contradicts (19), and therefore ρ > 0 a.s. □

Two further lemmas are needed for the proof of Theorem 26.21.

Lemma 26.23 (tightness and boundedness) Let T be the class of optional times τ < ∞ taking finitely many values, and consider a right-continuous, adapted process X such that the family {X_τ; τ ∈ T} is tight. Then X* < ∞ a.s.

Proof: By Lemma 7.4 any bounded optional time τ can be approximated from the right by optional times τ_n ∈ T, and by right-continuity we have X_{τ_n} → X_τ. Hence, Fatou's lemma yields P{|X_τ| > r} ≤ liminf_n P{|X_{τ_n}| > r}, and so the hypothesis remains true with T replaced by the class T̂ of all bounded optional times. By Lemma 7.6 the times τ_{t,n} = t ∧ inf{s; |X_s| > n} belong to T̂ for all t > 0 and n ∈ ℕ, and as n → ∞, we get

    P{X* > n} = sup_{t>0} P{X*_t > n} ≤ sup_{τ∈T̂} P{|X_τ| > n} → 0.   □

Lemma 26.24 (scaling) For any finite random variable ξ, there exists a bounded random variable ρ > 0 such that E|ρξ| < ∞.

Proof: We may take ρ = (|ξ| ∨ 1)^{-1}.   □

Proof of Theorem 26.21: The necessity is clear from Theorem 26.4. Now assume the stated condition. By Lemma 4.9 it is equivalent to assume, for each t > 0, that the family K_t = {(V · X)_t; V ∈ E_1} is tight, where E_1 = {V ∈ E; |V| ≤ 1}. The latter family is clearly convex, and by the linearity of the integral the convexity carries over to K_t. By Lemma 26.23 we have X* < ∞ a.s., and so by Lemma 26.24 there exists a probability measure Q ∼ P such that E_Q X*_t = ∫ X*_t dQ < ∞. In particular, K_t ⊂ L¹(Q), and we note that K_t remains tight with respect to Q. Hence, by Lemma 26.22 there exists a probability measure R ∼ Q with bounded density ρ = dR/dQ such that K_t is bounded in L¹(R). Now consider an arbitrary partition 0 = t_0 < t_1 < ... < t_n = t, and note that

where

V_s = Σ_k

the probabilities P{ξ̄_n > x} tend to 0 at an exponential rate I(x), given by the Legendre-Fenchel transform Λ* of Λ. In higher dimensions, it is often convenient to state the result more generally in the form n^{-1} log P{ξ̄_n ∈ B} → −I(B), where I(B) = inf_{x∈B} I(x) and B is restricted to a suitable class of continuity sets. In this standard format of a large-deviation principle with rate function I, the result extends to an amazing variety of contexts throughout probability theory.

A striking example, of fundamental importance in statistical mechanics, is Sanov's theorem, which provides a similar large-deviation result for the empirical distributions of a sequence of i.i.d. random variables with a common distribution μ. Here the rate function I is defined on the space of probability measures ν on ℝ and agrees with the relative entropy function H(ν|μ). Another important example is Schilder's theorem for the family of rescaled Brownian motions in ℝ^d, where the rate function becomes I(x) = ½‖x‖²_H, the squared norm in the Cameron-Martin space considered in Chapter 18. The latter result can be used to derive the Freidlin-Wentzell estimates for randomly perturbed dynamical systems. It also provides a short proof of Strassen's law of the iterated logarithm, a stunning extension of the classical Khinchin law from Chapter 13.

Modern proofs of those and other large-deviation results rely on some general extension principles, which also serve to explain the wide applicability of the present ideas. In addition to some rather straightforward and elementary techniques of continuity and approximation, we consider the more sophisticated and extremely powerful methods of inverse continuous mapping and projective limits, both of which play a crucial role in subsequent applications. We may also call attention to the significance of


Foundations of Modern Probability

exponential tightness, and to the essential equivalence between the setwise and functional formulations of the large-deviation principle. Large-deviation theory is arguably one of the most technical branches of modern probability theory. For the nonexpert it then seems essential to avoid getting distracted by topological subtleties or elaborate computations. Many results are therefore stated here under simplifying assumptions. Likewise, we postpone our discussion of general principles until the reader has become acquainted with the basic ideas in a concrete setting. For this reason, important applications appear both at the beginning and at the end of the chapter, separated by a more abstract discussion of some general notions and principles. Let us now return to the elementary context of i.i.d. random variables ξ, ξ_1, ξ_2, ... and write S_n = Σ_{k≤n} ξ_k and ξ̄_n = S_n/n. If the mean m = Eξ exists, then P{ξ̄_n ≥ x} → 0 for all x > m by the weak law of large numbers. Under stronger moment conditions, the rate of convergence turns out to be exponential and can be estimated with great accuracy. This rather elementary but quite technical result lies, along with its multidimensional counterpart, at the core of large-deviation theory and provides both a pattern and a point of departure for more advanced developments. For motivation, we begin with some simple observations.

Lemma 27.1 (convergence) Let ξ, ξ_1, ξ_2, ... be i.i.d. random variables. Then

(i) n^{-1} log P{ξ̄_n ≥ x} → sup_n n^{-1} log P{ξ̄_n ≥ x} = −h(x) for all x;
(ii) h is [0, ∞]-valued, nondecreasing, and convex;
(iii) h(x) < ∞ iff P{ξ ≥ x} > 0.

Proof: (i) Writing p_n = P{ξ̄_n ≥ x}, we get for any m, n ∈ N

 p_{m+n} = P{S_{m+n} ≥ (m+n)x} ≥ P{S_m ≥ mx, S_{m+n} − S_m ≥ nx} = p_m p_n.

Taking logarithms, we conclude that the sequence −log p_n is subadditive, and the assertion follows by Lemma 10.21.

(ii) The first two assertions are obvious. To prove the convexity, let x, y ∈ R be arbitrary, and proceed as before to get

 P{S_{2n} ≥ n(x + y)} ≥ P{S_n ≥ nx} P{S_n ≥ ny}.

Taking logarithms, dividing by 2n, and letting n → ∞, we obtain

 h(½(x + y)) ≤ ½(h(x) + h(y)), x, y ∈ R.

(iii) If P{ξ ≥ x} = 0, then P{ξ̄_n ≥ x} = 0 for all n, and so h(x) = ∞. Conversely, (i) yields log P{ξ ≥ x} ≤ −h(x), and so h(x) = ∞ implies P{ξ ≥ x} = 0. □

To determine the limit in Lemma 27.1, we need some further notation, which is given here for convenience directly in d dimensions. For any random vector ξ in R^d, we introduce the function

 Λ(u) = Λ_ξ(u) = log E e^{uξ}, u ∈ R^d, (1)

known in statistics as the cumulant-generating function of ξ. Note that Λ is convex, since by Hölder's inequality we have for any u, v ∈ R^d and p, q > 0 with p + q = 1

 Λ(pu + qv) = log E exp{(pu + qv)ξ} ≤ p Λ(u) + q Λ(v).

The associated Legendre-Fenchel transform Λ* is given by

 Λ*(x) = sup_u (ux − Λ(u)), x ∈ R^d. (2)

Theorem 27.3 (large deviations in R, Cramér) Let ξ, ξ_1, ξ_2, ... be i.i.d. random variables with mean m, and put b = ess sup ξ. Then

 n^{-1} log P{ξ̄_n ≥ x} → −Λ*(x), x ≥ m. (3)

Proof: For any u > 0, Chebyshev's inequality yields

 P{ξ̄_n ≥ x} = P{e^{uS_n} ≥ e^{nux}} ≤ e^{−nux} E e^{uS_n} = e^{nΛ(u)−nux},

and so

 n^{-1} log P{ξ̄_n ≥ x} ≤ Λ(u) − ux.

This remains true for u ≤ 0, since in that case Λ(u) − ux ≥ 0 for x ≥ m. Hence, by (2) we have the upper bound

 n^{-1} log P{ξ̄_n ≥ x} ≤ −Λ*(x), x ≥ m, n ∈ N. (4)

To derive a matching lower bound, we first assume that Λ < ∞ on R_+. Then Λ is smooth on (0, ∞) with Λ′(0+) = m and Λ′(∞) = ess sup ξ = b, and so for any a ∈ (m, b) we can choose a u > 0 such that Λ′(u) = a. Let η, η_1, η_2, ... be i.i.d. with distribution

 P{η ∈ dx} = e^{ux − Λ(u)} P{ξ ∈ dx}. (5)

Then Λ_η(r) = Λ_ξ(r + u) − Λ_ξ(u), and therefore Eη = Λ′_η(0) = Λ′_ξ(u) = a. For any ε > 0, we get by (5)

 P{|ξ̄_n − a| < ε} = e^{nΛ(u)} E[e^{−nuη̄_n}; |η̄_n − a| < ε]
  ≥ e^{nΛ(u) − nu(a+ε)} P{|η̄_n − a| < ε}. (6)

Here the last probability tends to 1 by the law of large numbers, and so by (2)

 liminf_{n→∞} n^{-1} log P{|ξ̄_n − a| < ε} ≥ Λ(u) − u(a + ε) ≥ −Λ*(a + ε).

Fixing any x ∈ (m, b) and putting a = x + ε, we get for small enough ε > 0

 liminf_{n→∞} n^{-1} log P{ξ̄_n ≥ x} ≥ −Λ*(x + 2ε).

Since Λ* is continuous on (m, b) by convexity, we may let ε → 0 and combine with (4) to obtain (3). The result for x > b is trivial, since in that case both sides of (3) equal −∞. If instead x = b < ∞, then both sides equal log P{ξ = b}, the left side by a simple computation and the right side by an elementary estimate. Finally, assume that x = m > −∞. Since the statement is trivial when ξ = m a.s., we may assume that b > m. For any y ∈ (m, b), we have

 0 ≥ n^{-1} log P{ξ̄_n ≥ m} ≥ n^{-1} log P{ξ̄_n ≥ y} → −Λ*(y) > −∞.

Here Λ*(y) → Λ*(m) = 0 as y ↓ m by continuity, and (3) follows for x = m. This completes the proof when Λ < ∞ on R_+.

The case when Λ(u) = ∞ for some u > 0 may be handled by truncation. Thus, for any r > m we consider the random variables ξ^r_k = ξ_k ∧ r. Writing Λ_r and Λ*_r for the associated functions Λ and Λ*, we get for x ≥ m ≥ Eξ^r

 n^{-1} log P{ξ̄_n ≥ x} ≥ n^{-1} log P{ξ̄^r_n ≥ x} → −Λ*_r(x). (7)

Now Λ_r(u) ↑ Λ(u) by monotone convergence as r → ∞, and by Dini's theorem the convergence is uniform on every compact interval where Λ < ∞. Since also Λ′ is unbounded on the set where Λ < ∞, it follows easily that Λ*_r(x) → Λ*(x) for all x ≥ m. The required lower bound is now immediate from (7). □

We may now supplement Lemma 27.1 with a criterion for exponential decline of the tail probabilities P{ξ̄_n ≥ x}.

Corollary 27.4 (exponential rate) Let ξ, ξ_1, ξ_2, ... be i.i.d. with m = Eξ < ∞ and b = ess sup ξ. Then for any x ∈ (m, b), the probabilities P{ξ̄_n ≥ x} decrease exponentially iff Λ(u) < ∞ for some u > 0. The exponential decline extends to x = b iff 0 < P{ξ = b} < 1.

Proof: If Λ(u) < ∞ for some u > 0, then Λ′(0+) = m by dominated convergence, and so Λ*(x) > 0 for all x > m. If instead Λ = ∞ on (0, ∞), then Λ*(x) = 0 for all x ≥ m. The statement for x = b is trivial. □
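As a concrete check of Theorem 27.3 and Corollary 27.4, the following numerical sketch (our own illustration, with parameters p and x chosen arbitrarily) takes ξ to be Bernoulli(p), where Λ*(x) has the closed form x log(x/p) + (1−x) log((1−x)/(1−p)). It compares a grid-search Legendre transform of (2) with the closed form, and checks the nonasymptotic upper bound (4) against exact binomial tail probabilities.

```python
import math

p = 0.3                      # P{xi = 1}; mean m = p (an arbitrary choice)
x = 0.5                      # threshold in (m, b), with b = ess sup xi = 1

def Lambda(u):               # cumulant-generating function of Bernoulli(p)
    return math.log(1 - p + p * math.exp(u))

# Legendre-Fenchel transform (2) by a crude grid search over u >= 0
Lambda_star_num = max(u * x - Lambda(u) for u in [k * 1e-3 for k in range(10000)])

# closed form: relative entropy between Bernoulli(x) and Bernoulli(p)
Lambda_star = x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

def tail(n):                 # exact P{S_n >= n*x} for S_n ~ Binomial(n, p)
    k0 = math.ceil(n * x)
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k0, n + 1))

for n in (50, 100, 400):
    pn = tail(n)
    assert pn <= math.exp(-n * Lambda_star) + 1e-15   # Chernoff bound, as in (4)
    rate = -math.log(pn) / n                          # tends to Lambda_star by (3)

print(Lambda_star_num, Lambda_star, rate)
```

The printed rate for n = 400 already lies close to Λ*(x), illustrating the slow (order (log n)/n) convergence in (3).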

The large-deviation estimates in Theorem 27.3 are easily extended from intervals [x, ∞) to arbitrary open or closed sets, which leads to the large-deviation principle for i.i.d. sequences in R. To fulfill the needs of subsequent applications and extensions, we shall derive a version of the same result in R^d. Motivated by the last result, and also to avoid some technical complications, we assume that Λ(u) < ∞ for all u. Write B° and B⁻ for the interior and closure of a set B.

Theorem 27.5 (large deviations in R^d, Varadhan) Let ξ, ξ_1, ξ_2, ... be i.i.d. random vectors in R^d with Λ = Λ_ξ < ∞. Then for any B ∈ ℬ^d, we have

 −inf_{x∈B°} Λ*(x) ≤ liminf_{n→∞} n^{-1} log P{ξ̄_n ∈ B}
  ≤ limsup_{n→∞} n^{-1} log P{ξ̄_n ∈ B} ≤ −inf_{x∈B⁻} Λ*(x).

Proof: To derive the upper bound, we fix any ε > 0. By (2) there exists for every x ∈ R^d some u_x ∈ R^d such that

 u_x x − Λ(u_x) > (Λ*(x) − ε) ∧ ε^{-1},

and by continuity we may choose an open ball B_x around x such that

 u_x y > Λ(u_x) + (Λ*(x) − ε) ∧ ε^{-1}, y ∈ B_x.

By Chebyshev's inequality and (1) we get for any n ∈ N

 P{ξ̄_n ∈ B_x} ≤ E exp(u_x S_n − n inf{u_x y; y ∈ B_x})
  ≤ exp(−n((Λ*(x) − ε) ∧ ε^{-1})). (8)


Also note that Λ < ∞ implies Λ*(x) → ∞ as |x| → ∞, at least when d = 1. By Lemma 27.1 and Theorem 27.3 we may then choose r > 0 so large that

 n^{-1} log P{|ξ̄_n| > r} ≤ −1/ε, n ∈ N. (9)

Now let B ⊂ R^d be closed. Then the set {x ∈ B; |x| ≤ r} is compact and may be covered by finitely many balls B_{x_1}, ..., B_{x_m} with centers x_i ∈ B. By (8) and (9) we get for any n ∈ N

 P{ξ̄_n ∈ B} ≤ Σ_{i≤m} P{ξ̄_n ∈ B_{x_i}} + P{|ξ̄_n| > r}
  ≤ Σ_{i≤m} exp(−n((Λ*(x_i) − ε) ∧ ε^{-1})) + e^{−n/ε}
  ≤ (m + 1) exp(−n((Λ*(B) − ε) ∧ ε^{-1})),

where Λ*(B) = inf_{x∈B} Λ*(x). Hence,

 limsup_{n→∞} n^{-1} log P{ξ̄_n ∈ B} ≤ −(Λ*(B) − ε) ∧ ε^{-1},

and the upper bound follows since ε was arbitrary.

Turning to the lower bound, we first assume that Λ(u)/|u| → ∞ as |u| → ∞. Fix any open set B ⊂ R^d and a point x ∈ B. By compactness and the smoothness of Λ, there exists a u ∈ R^d such that ∇Λ(u) = x. Let η, η_1, η_2, ... be i.i.d. random vectors with distribution (5), and note as before that Eη = x. For ε > 0 small enough, we get as in (6)

 P{ξ̄_n ∈ B} ≥ P{|ξ̄_n − x| < ε}
  ≥ exp(nΛ(u) − nux − nε|u|) P{|η̄_n − x| < ε},

and the lower bound follows in this case, since the last probability tends to 1 by the law of large numbers.

To remove the growth condition on Λ, let ζ, ζ_1, ζ_2, ... be i.i.d. standard normal random vectors, independent of ξ, ξ_1, ξ_2, .... For any σ > 0 and u ∈ R^d, we have by Lemma 27.2 (i)

 Λ_{ξ+σζ}(u) = Λ_ξ(u) + σ²|u|²/2,

and in particular Λ*_{ξ+σζ} ≤ Λ*_ξ. Since also Λ_{ξ+σζ}(u)/|u| ≥ σ²|u|/2 + Λ_ξ(u)/|u| → ∞, we note that the previous bound applies to ξ̄_n + σζ̄_n. Now fix any x ∈ B as before, and choose ε > 0 small enough that B contains a 2ε-ball around x. Then

 P{|ξ̄_n + σζ̄_n − x| < ε} ≤ P{ξ̄_n ∈ B} + P{σ|ζ̄_n| ≥ ε}
  ≤ 2 (P{ξ̄_n ∈ B} ∨ P{σ|ζ̄_n| ≥ ε}).

Applying the lower bound to the variables ξ̄_n + σζ̄_n and the upper bound to ζ̄_n, we get by Lemma 27.2 (i)

 −Λ*_ξ(x) ≤ −Λ*_{ξ+σζ}(x) ≤ liminf_{n→∞} n^{-1} log P{|ξ̄_n + σζ̄_n − x| < ε}
  ≤ liminf_{n→∞} n^{-1} log (P{ξ̄_n ∈ B} ∨ P{σ|ζ̄_n| ≥ ε}).

The desired lower bound now follows, as we let σ → 0 and then take the supremum over all x ∈ B. □

We can also derive large-deviation results in function spaces. Here the following theorem is basic and sets the pattern for more complex results. For convenience, we write C = C([0,1], R^d) and C_0 = {x ∈ C; x_0 = 0}. We also introduce the Cameron-Martin space H_1, consisting of all absolutely continuous functions x ∈ C_0 admitting a Radon-Nikodym derivative ẋ ∈ L², so that ‖ẋ‖₂² = ∫₀¹ |ẋ_t|² dt < ∞.

Theorem 27.6 (large deviations of Brownian motion, Schilder) Let X be a d-dimensional Brownian motion on [0, 1], and define I(x) = ½‖ẋ‖₂² for x ∈ H_1 and I(x) = ∞ otherwise. Then for any Borel set B ⊂ C([0,1], R^d), we have

 −inf_{x∈B°} I(x) ≤ liminf_{ε→0} ε² log P{εX ∈ B}
  ≤ limsup_{ε→0} ε² log P{εX ∈ B} ≤ −inf_{x∈B⁻} I(x). (10)

Lemma 27.7 (compactness of level sets) For every r ≥ 0, the set {x ∈ C_0; I(x) ≤ r} is compact in C.

Proof of Theorem 27.6: To prove the lower bound in (10), fix any x ∈ B° with I(x) < ∞; by an approximation argument we may take ẋ to be absolutely continuous with ẍ ∈ L¹. For small enough h > 0, Theorem 18.22 yields

 P{εX ∈ B} ≥ P{‖εX − x‖_∞ < h} = E[ℰ(−(ẋ/ε)·X)_1; ‖εX‖_∞ < h]. (11)

Integrating by parts gives

 log ℰ(−(ẋ/ε)·X)_1 = −ε^{-1} ∫₀¹ ẋ_t dX_t − ε^{-2} I(x)
  = −ε^{-1} ẋ_1 X_1 + ε^{-1} ∫₀¹ ẍ_t X_t dt − ε^{-2} I(x),

and so by (11)

 ε² log P{εX ∈ B} ≥ −I(x) − h|ẋ_1| − h‖ẍ‖₁ + ε² log P{‖εX‖_∞ < h}.

The lower bound in (10) now follows as we let ε → 0 and then h → 0.

Turning to the upper bound, we fix any closed set B ⊂ C and let B^h denote the closed h-neighborhood of B. Letting X_n be the polygonal approximation of X with X_n(k/n) = X(k/n) for k ≤ n, we note that

 P{εX ∈ B} ≤ P{εX_n ∈ B^h} + P{ε‖X − X_n‖ > h}. (12)

Writing I(B^h) = inf{I(x); x ∈ B^h}, we obtain

 P{εX_n ∈ B^h} ≤ P{I(εX_n) ≥ I(B^h)}.

Here 2I(X_n) is a sum of nd variables ξ²_{ik}, where the ξ_{ik} are i.i.d. N(0,1), and so by Lemma 27.2 (i) and an interpolated version of Theorem 27.5,

 limsup_{ε→0} ε² log P{εX_n ∈ B^h} ≤ −I(B^h). (13)

Next we get by Proposition 13.13 and some elementary estimates

 P{ε‖X − X_n‖ > h} ≤ n P{ε‖X‖ > h√n/2} ≤ 2nd P{ε²ξ² > h²n/4d},

where ξ is N(0,1). Applying Theorem 27.5 and Lemma 27.2 (i) again, we obtain

 limsup_{ε→0} ε² log P{ε‖X − X_n‖ > h} ≤ −h²n/8d. (14)

Combining (12), (13), and (14) gives

 limsup_{ε→0} ε² log P{εX ∈ B} ≤ −I(B^h) ∧ (h²n/8d),

and as n → ∞ we obtain the upper bound −I(B^h). It remains to show that I(B^h) ↑ I(B) as h → 0. Then fix any r > sup_h I(B^h). For every h > 0 we may choose some x_h ∈ B^h such that I(x_h) ≤ r, and by Lemma 27.7 we may extract a convergent sequence x_{h_n} → x with h_n → 0 such that even I(x) ≤ r. Since also x ∈ ∩_h B^h = B, we obtain I(B) ≤ r, as required. □
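To make the quantities in the last proof concrete, the hypothetical sketch below (our own illustration, not part of the text) simulates a one-dimensional Brownian path, forms the polygonal approximation X_n, and evaluates 2I(X_n) = Σ_k n(ΔX_k)², which is indeed a sum of n squared N(0,1) variables; it also checks the Cameron-Martin energy I(x) = ½∫|ẋ_t|² dt on the smooth path x_t = t, where the exact value is ½.

```python
import random
random.seed(1)

n = 500
dt = 1.0 / n

# increments of a standard Brownian motion on [0, 1]
dX = [random.gauss(0.0, dt ** 0.5) for _ in range(n)]

# 2*I(X_n) for the polygonal approximation: n * (Delta X_k)^2 is a squared
# N(0,1) variable, so the sum is chi-square with n degrees of freedom
energy = sum(n * d * d for d in dX)

def I(path, dt):
    """Discrete Cameron-Martin energy 0.5 * sum (dx/dt)^2 * dt of a sampled path."""
    return 0.5 * sum(((path[k + 1] - path[k]) / dt) ** 2 * dt
                     for k in range(len(path) - 1))

x = [k * dt for k in range(n + 1)]     # the smooth path x_t = t
print(energy / n, I(x, dt))            # roughly 1.0, and 0.5 up to rounding
```

The chi-square behavior of 2I(X_n) is exactly why the interpolated Theorem 27.5 applies in the proof of (13).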

The last two theorems suggest the following abstraction. Letting ξ_ε, ε > 0, be random elements in some metric space S with Borel σ-field 𝒮, we say that the family (ξ_ε) satisfies the large-deviation principle (LDP) with rate function I: S → [0, ∞] if for any B ∈ 𝒮 we have

 −inf_{x∈B°} I(x) ≤ liminf_{ε→0} ε log P{ξ_ε ∈ B}
  ≤ limsup_{ε→0} ε log P{ξ_ε ∈ B} ≤ −inf_{x∈B⁻} I(x). (15)

For sequences ξ_1, ξ_2, ... we require the same condition with the normalizing factor ε replaced by n^{-1}. It is often convenient to write I(B) = inf_{x∈B} I(x). Letting 𝒮_I denote the class {B ∈ 𝒮; I(B°) = I(B⁻)} of all I-continuity sets, we note that (15) implies the convergence

 lim_{ε→0} ε log P{ξ_ε ∈ B} = −I(B), B ∈ 𝒮_I. (16)

If ξ, ξ_1, ξ_2, ... are i.i.d. random vectors in R^d with Λ(u) = log E e^{uξ} < ∞ for all u, then by Theorem 27.5 the averages ξ̄_n satisfy the LDP in R^d with rate function Λ*. If instead X is a d-dimensional Brownian motion on [0,1], then Theorem 27.6 shows that the processes ε^{1/2}X satisfy the LDP in C([0,1], R^d) with rate function I(x) = ½‖ẋ‖₂² for x ∈ H_1 and I(x) = ∞ otherwise. We show that the rate function I is essentially unique.
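The Gaussian example can be checked numerically. In the sketch below (an illustration under our own choice of set B), ξ_ε = ε^{1/2}ξ with ξ standard normal satisfies the LDP with I(x) = x²/2, and for the I-continuity set B = [1, 2] relation (16) predicts ε log P{ξ_ε ∈ B} → −inf_B I = −1/2.

```python
import math

# xi_eps = sqrt(eps) * Z with Z ~ N(0,1); rate function I(x) = x^2 / 2.
# For B = [1, 2], inf over B of I is 1/2, and B is an I-continuity set,
# so by (16): eps * log P{xi_eps in B} -> -1/2 as eps -> 0.

def log_prob(eps):
    a, b = 1.0, 2.0
    p = 0.5 * (math.erfc(a / math.sqrt(2 * eps)) - math.erfc(b / math.sqrt(2 * eps)))
    return eps * math.log(p)

vals = [log_prob(e) for e in (0.1, 0.03, 0.01)]
print(vals)    # increases toward -0.5
```

The slow approach to the limit reflects the polynomial prefactors that the logarithmic scaling in (15) deliberately ignores.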

Lemma 27.8 (regularization and uniqueness) If (ξ_ε) satisfies the LDP in a metric space S, then the associated rate function I can be chosen to be lower semicontinuous, in which case it is unique.

Proof: Assume that (15) holds for some I. Then the function

 J(x) = liminf_{y→x} I(y), x ∈ S,

is clearly lower semicontinuous with J ≤ I. It is also easy to verify that J(G) = I(G) for all open sets G ⊂ S. Thus, (15) remains true with I replaced by J. To prove the uniqueness, assume that (15) holds for two lower semicontinuous functions I and J, and let I(x) < J(x) for some x ∈ S. By the semicontinuity of J, we may choose a neighborhood G of x such that J(G⁻) > I(x). Applying (15) to both I and J yields the contradiction

 −I(x) ≤ −I(G) ≤ liminf_{ε→0} ε log P{ξ_ε ∈ G} ≤ −J(G⁻) < −I(x). □

Justified by the last result, we may henceforth take the lower semicontinuity to be part of our definition of a rate function. (An arbitrary function I satisfying (15) will then be called a raw rate function.) No regularization is needed in Theorems 27.5 and 27.6, since the associated rate functions Λ*


and I are already lower semicontinuous, the former as the supremum of a family of continuous functions and the latter by Lemma 27.7. It is sometimes useful to impose a slightly stronger regularity condition on the function I. Thus, we say that I is good if the level sets I^{-1}[0, r] = {x ∈ S; I(x) ≤ r} are compact (rather than just closed). Note that the infimum I(B) = inf_{x∈B} I(x) is then attained for every closed set B ≠ ∅. The rate functions in Theorems 27.5 and 27.6 are clearly both good. A related condition on the family (ξ_ε) is the exponential tightness

 inf_K limsup_{ε→0} ε log P{ξ_ε ∉ K} = −∞, (17)

where the infimum extends over all compact sets K ⊂ S. We actually need only the slightly weaker condition of sequential exponential tightness, where (17) is required only along sequences ε_n → 0. To simplify our exposition, we often omit the sequential qualification from our statements and carry out the proofs under the stronger nonsequential hypothesis. We finally say that (ξ_ε) satisfies the weak LDP with rate function I if the lower bound in (15) holds as stated, while the upper bound is required only for compact sets B. We list some relations between the mentioned properties.

Lemma 27.9 (goodness, exponential tightness, and the weak LDP) Let ξ_ε, ε > 0, be random elements in a metric space S.
(i) The LDP for (ξ_ε) with rate function I implies (16), and the two conditions are equivalent when I is good.
(ii) If the ξ_ε are exponentially tight and satisfy the weak LDP with rate function I, then I is good and (ξ_ε) satisfies the full LDP.
(iii) (Pukhalsky) If S is Polish and (ξ_ε) satisfies the LDP with rate function I, then I is good iff (ξ_ε) is sequentially exponentially tight.

Proof: (i) Let I be good and satisfy (16). Write B^h for the closed h-neighborhood of B ∈ 𝒮. Since I(B^h) is nonincreasing in h, we have B^h ∉ 𝒮_I for at most countably many h > 0. Hence, (16) yields for almost every h > 0

 limsup_{ε→0} ε log P{ξ_ε ∈ B} ≤ lim_{ε→0} ε log P{ξ_ε ∈ B^h} = −I(B^h).

To see that I(B^h) ↑ I(B⁻) as h → 0, assume instead that sup_h I(B^h) < I(B⁻). Since I is good, we may choose for every h > 0 some x_h ∈ B^h with I(x_h) = I(B^h), and then extract a convergent sequence x_{h_n} → x ∈ B⁻ with h_n → 0. By the lower semicontinuity of I we get the contradiction

 I(B⁻) ≤ I(x) ≤ liminf_{n→∞} I(x_{h_n}) ≤ sup_{h>0} I(B^h) < I(B⁻),

which proves the upper bound. Next let x ∈ B° be arbitrary, and conclude from (16) that, for almost all sufficiently small h > 0,

 −I(x) ≤ −I({x}^h) = lim_{ε→0} ε log P{ξ_ε ∈ {x}^h} ≤ liminf_{ε→0} ε log P{ξ_ε ∈ B}.


The lower bound now follows as we take the supremum over x ∈ B°.

(ii) By (17) we may choose some compact sets K_r satisfying

 limsup_{ε→0} ε log P{ξ_ε ∉ K_r} < −r, r > 0. (18)

For any closed set B ⊂ S, we have

 P{ξ_ε ∈ B} ≤ 2 (P{ξ_ε ∈ B ∩ K_r} ∨ P{ξ_ε ∉ K_r}), r > 0,

and so, by the weak LDP and (18),

 limsup_{ε→0} ε log P{ξ_ε ∈ B} ≤ −I(B ∩ K_r) ∧ r ≤ −I(B) ∧ r.

The upper bound now follows as we let r → ∞. Applying the lower bound and (18) to the sets K_r^c gives

 −I(K_r^c) ≤ limsup_{ε→0} ε log P{ξ_ε ∉ K_r} < −r, r > 0,

and so I^{-1}[0, r] ⊂ K_r for all r > 0, which shows that I is good.

(iii) The sufficiency follows from (ii), applied to an arbitrary sequence ε_n → 0. Now let S be separable and complete, and assume that the rate function I is good. For any k ∈ N we may cover S by some open balls B_{k1}, B_{k2}, ... of radius 1/k. Putting U_{km} = ∪_{j≤m} B_{kj}, we have sup_m I(U_{km}^c) = ∞, since any level set I^{-1}[0, r] is covered by finitely many balls B_{kj}. Now fix any sequence ε_n → 0 and a constant r > 0. By the LDP upper bound and the fact that P{ξ_{ε_n} ∉ U_{km}} → 0 as m → ∞ for fixed n and k, we may choose m_k ∈ N so large that

 P{ξ_{ε_n} ∉ U_{k,m_k}} ≤ exp(−rk/ε_n), n, k ∈ N.

Summing a geometric series, we obtain

 limsup_n ε_n log P{ξ_{ε_n} ∈ ∪_k U_{k,m_k}^c} ≤ −r.

The asserted exponential tightness now follows, since the set ∩_k U_{k,m_k} is totally bounded and hence relatively compact. □

The analogy with weak convergence theory suggests that we look for a version of (16) for continuous functions.

Theorem 27.10 (functional LDP, Varadhan, Bryc) Let ξ_ε, ε > 0, be random elements in a metric space S.
(i) If (ξ_ε) satisfies the LDP with a rate function I and if f: S → R is continuous and bounded above, then

 Λ_f = lim_{ε→0} ε log E exp(f(ξ_ε)/ε) = sup_{x∈S} (f(x) − I(x)).

(ii) If the ξ_ε are exponentially tight and the limit Λ_f in (i) exists for every f ∈ C_b, then (ξ_ε) satisfies the LDP with the good rate function

 I(x) = sup_{f∈C_b} (f(x) − Λ_f), x ∈ S.


Proof: (i) For every n ∈ N we can choose finitely many closed sets B_1, ..., B_m ⊂ S such that f ≤ −n on (∪_{j≤m} B_j)^c and the oscillation of f on each B_j is at most n^{-1}. Then

 limsup_{ε→0} ε log E e^{f(ξ_ε)/ε}
  ≤ max_{j≤m} limsup_{ε→0} ε log E[e^{f(ξ_ε)/ε}; ξ_ε ∈ B_j] ∨ (−n)
  ≤ max_{j≤m} (sup_{x∈B_j} f(x) − inf_{x∈B_j} I(x)) ∨ (−n)
  ≤ max_{j≤m} sup_{x∈B_j} (f(x) − I(x) + n^{-1}) ∨ (−n)
  ≤ sup_{x∈S} (f(x) − I(x) + n^{-1}) ∨ (−n).

The upper bound now follows as we let n → ∞. Next we fix any x ∈ S with a neighborhood G and write

 liminf_{ε→0} ε log E e^{f(ξ_ε)/ε} ≥ liminf_{ε→0} ε log E[e^{f(ξ_ε)/ε}; ξ_ε ∈ G]
  ≥ inf_{y∈G} f(y) − inf_{y∈G} I(y) ≥ inf_{y∈G} f(y) − I(x).

Here the lower bound follows as we let G ↓ {x} and then take the supremum over x ∈ S.

(ii) First we note that I is lower semicontinuous, as the supremum of a family of continuous functions. Since Λ_f = 0 for f = 0, it is also clear that I ≥ 0. By Lemma 27.9 (ii) it remains to show that (ξ_ε) satisfies the weak LDP with rate function I. Then fix any δ > 0. For every x ∈ S, we may choose a function f_x ∈ C_b satisfying

 f_x(x) − Λ_{f_x} > (I(x) − δ) ∧ δ^{-1},

and by continuity there exists a neighborhood B_x of x such that

 f_x(y) > Λ_{f_x} + (I(x) − δ) ∧ δ^{-1}, y ∈ B_x.

By Chebyshev's inequality we get for any ε > 0

 P{ξ_ε ∈ B_x} ≤ E exp(ε^{-1}(f_x(ξ_ε) − inf{f_x(y); y ∈ B_x}))
  ≤ E exp(ε^{-1}(f_x(ξ_ε) − Λ_{f_x} − (I(x) − δ) ∧ δ^{-1})),

and so by the definition of Λ_{f_x},

 limsup_{ε→0} ε log P{ξ_ε ∈ B_x} ≤ lim_{ε→0} ε log E exp(f_x(ξ_ε)/ε) − Λ_{f_x} − (I(x) − δ) ∧ δ^{-1}
  = −(I(x) − δ) ∧ δ^{-1}.

Now fix any compact set K ⊂ S, and choose x_1, ..., x_m ∈ K such that K ⊂ ∪_i B_{x_i}. Then

 limsup_{ε→0} ε log P{ξ_ε ∈ K} ≤ max_{i≤m} limsup_{ε→0} ε log P{ξ_ε ∈ B_{x_i}}
  ≤ −min_{i≤m} (I(x_i) − δ) ∧ δ^{-1} ≤ −(I(K) − δ) ∧ δ^{-1}.

The upper bound now follows as we let δ → 0. Next consider any open set G and element x ∈ G. For any n ∈ N we may choose a continuous function f_n: S → [−n, 0] such that f_n(x) = 0 and f_n = −n on G^c. Then

 −I(x) = inf_{f∈C_b} (Λ_f − f(x)) ≤ Λ_{f_n} − f_n(x) = Λ_{f_n}
  = lim_{ε→0} ε log E exp(f_n(ξ_ε)/ε) ≤ liminf_{ε→0} ε log P{ξ_ε ∈ G} ∨ (−n).

The lower bound now follows as we let n → ∞ and then take the supremum over all x ∈ G. □

Next we note that the LDP is preserved by continuous mappings. The following results are often referred to as the direct and inverse contraction principles. Given any rate function I on S and a function f: S → T, we define the image J = I ∘ f^{-1} on T as the function

 J(y) = I(f^{-1}{y}) = inf{I(x); f(x) = y}, y ∈ T. (19)

Note that the corresponding set functions are related by

 J(B) = inf_{y∈B} J(y) = inf{I(x); f(x) ∈ B} = I(f^{-1}B), B ⊂ T.

Theorem 27.11 (continuous mapping) Consider a continuous function f between two metric spaces S and T, and let ξ_ε be random elements in S.
(i) If (ξ_ε) satisfies the LDP in S with rate function I, then the images f(ξ_ε) satisfy the LDP in T with the raw rate function J = I ∘ f^{-1}. Moreover, J is a good rate function on T whenever the function I is good on S.
(ii) (Ioffe) Let (ξ_ε) be exponentially tight in S, let f be injective, and let the images f(ξ_ε) satisfy the weak LDP in T with rate function J. Then (ξ_ε) satisfies the LDP in S with the good rate function I = J ∘ f.

Proof: (i) Since f is continuous, we note that f^{-1}B is open or closed whenever the corresponding property holds for B. Using the LDP for (ξ_ε), we get for any B ⊂ T

 −I(f^{-1}B°) ≤ liminf_{ε→0} ε log P{ξ_ε ∈ f^{-1}B°}
  ≤ limsup_{ε→0} ε log P{ξ_ε ∈ f^{-1}B⁻} ≤ −I(f^{-1}B⁻),


which proves the LDP for (f(ξ_ε)) with the raw rate function J = I ∘ f^{-1}. When I is good, we claim that

 J^{-1}[0, r] = f(I^{-1}[0, r]), r ≥ 0. (20)

To see this, fix any r ≥ 0, and let x ∈ I^{-1}[0, r]. Then

 J ∘ f(x) = I ∘ f^{-1} ∘ f(x) = inf{I(u); f(u) = f(x)} ≤ I(x) ≤ r,

which means that f(x) ∈ J^{-1}[0, r]. Conversely, let y ∈ J^{-1}[0, r]. Since I is good and f is continuous, the infimum in (19) is attained at some x ∈ S, and we get y = f(x) with I(x) ≤ r. Thus, y ∈ f(I^{-1}[0, r]), which completes the proof of (20). Since continuous maps preserve compactness, (20) shows that the goodness of I carries over to J.

(ii) Here I is again a rate function, since the lower semicontinuity of J is preserved by composition with the continuous map f. By Lemma 27.9 (ii) it is then enough to show that (ξ_ε) satisfies the weak LDP in S. To prove the upper bound, fix any compact set K ⊂ S, and note that the image set f(K) is again compact since f is continuous. Hence, the weak LDP for (f(ξ_ε)) yields

 limsup_{ε→0} ε log P{ξ_ε ∈ K} ≤ limsup_{ε→0} ε log P{f(ξ_ε) ∈ f(K)}
  ≤ −J(f(K)) = −I(K).

Next we fix any open set G ⊂ S, and let x ∈ G be arbitrary with I(x) = r < ∞. Since (ξ_ε) is exponentially tight, we may choose a compact set K ⊂ S such that

 limsup_{ε→0} ε log P{ξ_ε ∉ K} < −r. (21)

The continuous image f(K) is compact in T, and so by (21) and the weak LDP for (f(ξ_ε))

 −J(f(K^c)) ≤ −J((f(K))^c) ≤ liminf_{ε→0} ε log P{f(ξ_ε) ∉ f(K)}
  ≤ limsup_{ε→0} ε log P{ξ_ε ∉ K} < −r.

Since J(f(x)) = I(x) = r, we conclude that x ∈ K. As a continuous bijection from the compact set K onto f(K), the function f is in fact a homeomorphism between the two sets with their subset topologies. By Lemma 1.6 we may then choose an open set G′ ⊂ T such that f(x) ∈ G′ and f(G ∩ K) = G′ ∩ f(K). Noting that

 P{f(ξ_ε) ∈ G′} ≤ P{ξ_ε ∈ G} + P{ξ_ε ∉ K}

and using the weak LDP of (f(ξ_ε)), we get

 −r = −I(x) = −J(f(x)) ≤ liminf_{ε→0} ε log P{f(ξ_ε) ∈ G′}
  ≤ liminf_{ε→0} ε log P{ξ_ε ∈ G} ∨ limsup_{ε→0} ε log P{ξ_ε ∉ K}.

Hence, by (21),

 −I(x) ≤ liminf_{ε→0} ε log P{ξ_ε ∈ G}, x ∈ G,

and the lower bound follows as we take the supremum over all x ∈ G. □
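As a toy illustration of the direct contraction principle in (i), with a rate function and map of our own choosing, take S = T = R, the Gaussian rate function I(x) = x²/2, and f(x) = x². Formula (19) then gives J(y) = y/2 for y ≥ 0 and J(y) = ∞ for y < 0, which the brute-force grid computation below reproduces.

```python
# image rate function J = I o f^{-1} of (19), computed on a grid
def I(x):
    return 0.5 * x * x        # Gaussian rate function (Theorem 27.5 with N(0,1))

def f(x):
    return x * x              # a continuous map R -> R

grid = [k * 1e-3 for k in range(-4000, 4001)]

def J(y, tol=1e-3):
    vals = [I(x) for x in grid if abs(f(x) - y) <= tol]
    return min(vals) if vals else float("inf")

print(J(4.0), J(1.0), J(-1.0))   # about 2.0, about 0.5, and inf
```

Since f here is not injective, J(y) is the smaller of the costs of the two preimages ±√y, exactly as the infimum in (19) prescribes.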

We turn to the powerful method of projective limits. The following sequential version is sufficient for our needs and will enable us to extend the LDP to a variety of infinite-dimensional contexts. Some general background on projective limits is provided by Appendix A2.

Theorem 27.12 (random sequences, Dawson and Gärtner) For any metric spaces S_1, S_2, ..., let ξ_ε = (ξ_ε^k) be random elements in S = X_k S_k, such that for every n ∈ N the vectors (ξ_ε^1, ..., ξ_ε^n) satisfy the LDP in S^n = X_{k≤n} S_k with a good rate function I_n. Then (ξ_ε) satisfies the LDP in S with the good rate function

 I(x) = sup_n I_n(x_1, ..., x_n), x = (x_1, x_2, ...) ∈ S. (22)

Proof: For any m ≤ n we introduce the natural projections π_n: S → S^n and π_{mn}: S^n → S^m. Since the π_{mn} are continuous and the I_n are good, Theorem 27.11 shows that I_m = I_n ∘ π_{mn}^{-1} for all m ≤ n, and so π_{mn}(I_n^{-1}[0, r]) ⊂ I_m^{-1}[0, r] for all r ≥ 0 and m ≤ n. Hence, for each r ≥ 0 the level sets I_n^{-1}[0, r] form a projective sequence. Since they are also compact by hypothesis, and since in view of (22)

 I^{-1}[0, r] = ∩_n π_n^{-1} I_n^{-1}[0, r], r ≥ 0, (23)

Lemma A2.9 shows that the sets I^{-1}[0, r] are compact. Thus, I is again a good rate function.

Now fix any closed set A ⊂ S and put A_n = π_n A, so that π_{mn} A_n = A_m for all m ≤ n. Since the π_{mn} are continuous, we have also π_{mn} A_n⁻ ⊂ A_m⁻ for m ≤ n, which means that the sets A_n⁻ form a projective sequence. We claim that

 A = ∩_n π_n^{-1} A_n⁻. (24)

Here the relation A ⊂ ∩_n π_n^{-1} A_n⁻ is obvious. Next assume that x ∉ A. By the definition of the product topology, we may choose a k ∈ N and an open set U ⊂ S^k such that x ∈ π_k^{-1} U ⊂ A^c. It follows easily that π_k x ∈ U ⊂ A_k^c. Since U is open, we have even π_k x ∈ (A_k⁻)^c. Thus, x ∉ π_k^{-1} A_k⁻, which completes the proof of (24). The projective property carries over to the intersections A_n⁻ ∩ I_n^{-1}[0, r], and formulas (23) and (24) combine into the relation

 A ∩ I^{-1}[0, r] = ∩_n π_n^{-1}(A_n⁻ ∩ I_n^{-1}[0, r]), r ≥ 0. (25)

Now assume that I(A) > r ∈ R. Then A ∩ I^{-1}[0, r] = ∅, and by (25) and Lemma A2.9 we get A_n⁻ ∩ I_n^{-1}[0, r] = ∅ for some n ∈ N, which implies I_n(A_n⁻) ≥ r. Noting that A ⊂ π_n^{-1} A_n⁻ and using the LDP in S^n, we conclude that

 limsup_{ε→0} ε log P{ξ_ε ∈ A} ≤ limsup_{ε→0} ε log P{π_n ξ_ε ∈ A_n⁻} ≤ −I_n(A_n⁻) ≤ −r.

The upper bound now follows as we let r ↑ I(A). Finally, fix an open set G ⊂ S and let x ∈ G be arbitrary. By the definition of the product topology, we may choose n ∈ N and an open set U ⊂ S^n such that x ∈ π_n^{-1} U ⊂ G. The LDP in S^n yields

 liminf_{ε→0} ε log P{ξ_ε ∈ G} ≥ liminf_{ε→0} ε log P{π_n ξ_ε ∈ U}
  ≥ −I_n(U) ≥ −I_n ∘ π_n(x) ≥ −I(x),

and the lower bound follows as we take the supremum over all x ∈ G. □

We consider yet another basic method for extending the LDP, namely by suitable approximation. Here the following elementary result is often helpful. Let us say that the random elements ξ_ε and η_ε in a common separable metric space (S, d) are exponentially equivalent if

 lim_{ε→0} ε log P{d(ξ_ε, η_ε) > h} = −∞, h > 0. (26)

The separability of S is needed only to ensure measurability of the pairwise distances d(ξ_ε, η_ε). In general, we may replace (26) by a similar condition involving the outer measure.

Lemma 27.13 (approximation) Let ξ_ε and η_ε be exponentially equivalent random elements in a separable metric space S. Then (ξ_ε) satisfies the LDP with a good rate function I iff the same LDP holds for (η_ε).

Proof: Suppose that the LDP holds for (ξ_ε) with rate function I. Fix any closed set B ⊂ S, and let B^h denote the closed h-neighborhood of B. Then

 P{η_ε ∈ B} ≤ P{ξ_ε ∈ B^h} + P{d(ξ_ε, η_ε) > h},

and so by (26) and the LDP for (ξ_ε)

 limsup_{ε→0} ε log P{η_ε ∈ B}
  ≤ limsup_{ε→0} ε log P{ξ_ε ∈ B^h} ∨ limsup_{ε→0} ε log P{d(ξ_ε, η_ε) > h}
  ≤ −I(B^h) ∨ (−∞) = −I(B^h).

Since I is good, we have I(B^h) ↑ I(B) as h → 0, and the required upper bound follows.

Next we fix an open set G ⊂ S and an element x ∈ G. If d(x, G^c) > h > 0, we may choose a neighborhood U of x such that U^h ⊂ G. Noting that

 P{ξ_ε ∈ U} ≤ P{η_ε ∈ G} + P{d(ξ_ε, η_ε) > h},

we get by (26) and the LDP for (ξ_ε)

 −I(x) ≤ −I(U) ≤ liminf_{ε→0} ε log P{ξ_ε ∈ U}
  ≤ liminf_{ε→0} ε log P{η_ε ∈ G} ∨ limsup_{ε→0} ε log P{d(ξ_ε, η_ε) > h}
  = liminf_{ε→0} ε log P{η_ε ∈ G}.

The required lower bound now follows, as we take the supremum over all x ∈ G. □

We now demonstrate the power of the abstract theory by considering some important applications. First we study perturbations of the ordinary differential equation ẋ = b(x) by a small noise term. More precisely, we consider the unique solution X^ε with X^ε_0 = 0 of the d-dimensional SDE

 dX^ε_t = ε^{1/2} dB_t + b(X^ε_t) dt, (27)

where B is a Brownian motion in R^d and b is a bounded and uniformly Lipschitz continuous mapping on R^d. Let H_∞ denote the set of all absolutely continuous functions x: R_+ → R^d with x_0 = 0 such that ẋ ∈ L².

Theorem 27.14 (perturbed dynamical systems, Freidlin and Wentzell) For any bounded, uniformly Lipschitz continuous function b: R^d → R^d, the solutions X^ε to (27) with X^ε_0 = 0 satisfy the LDP in C(R_+, R^d) with the good rate function

 I(x) = ½ ∫₀^∞ |ẋ_t − b(x_t)|² dt, x ∈ H_∞. (28)

Here it is understood that I(x) = ∞ when x ∉ H_∞. Note that the result for b = 0 extends Theorem 27.6 to processes on R_+.

Proof: If B¹ is a Brownian motion on [0,1], then for every r > 0 the process B^r = Φ(B¹) given by B^r_t = r^{1/2} B¹_{t/r} is a Brownian motion on [0, r]. Noting that Φ is continuous from C([0,1]) to C([0,r]), we see from Theorems 27.6 and 27.11 (i) together with Lemma 27.7 that the processes ε^{1/2}B^r satisfy the LDP in C([0,r]) with the good rate function I_r = I_1 ∘ Φ^{-1}, where I_1(x) = ½‖ẋ‖₂² for x ∈ H_1 and I_1(x) = ∞ otherwise. Now Φ maps H_1 onto H_r, and when y = Φ(x) with x ∈ H_1 we have ẋ_t = r^{1/2} ẏ_{rt}. Hence, by calculus I_r(y) = ½ ∫₀^r |ẏ_s|² ds = ½‖ẏ‖₂², which extends Theorem 27.6 to [0, r]. For the further extension to R_+, let π_n x denote the restriction of a function x ∈ C(R_+) to [0, n], and infer from Theorem 27.12 that the processes ε^{1/2}B satisfy the LDP in C(R_+) with the good rate function I_∞(x) = sup_n I_n(π_n x) = ½‖ẋ‖₂².


By an elementary version of Theorem 21.3, the integral equation

 x_t = z_t + ∫₀ᵗ b(x_s) ds, t ≥ 0, (29)

has a unique solution x = F(z) in C = C(R_+) for every z ∈ C. Letting z¹, z² ∈ C be arbitrary and writing a for the Lipschitz constant of b, we note that the corresponding solutions xⁱ = F(zⁱ) satisfy

 |x¹_t − x²_t| ≤ ‖z¹ − z²‖ + a ∫₀ᵗ |x¹_s − x²_s| ds, t ≥ 0.

Hence, Gronwall's Lemma 21.4 yields ‖x¹ − x²‖ ≤ ‖z¹ − z²‖ e^{ar} on the interval [0, r], which shows that F is continuous. Using Schilder's theorem on R_+ along with Theorem 27.11 (i), we conclude that the processes X^ε satisfy the LDP in C(R_+) with the good rate function I = I_∞ ∘ F^{-1}. Now F is clearly bijective, and (29) shows that the functions z and x = F(z) lie simultaneously in H_∞, in which case ż = ẋ − b(x) a.e. Thus, I is indeed given by (28). □
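The following sketch integrates an instance of (27) by the Euler-Maruyama method, with a drift b of our own choosing (b(x) = cos x, which is bounded and Lipschitz as the theorem requires). Running the same driving noise at several values of ε illustrates the Freidlin-Wentzell picture: as ε → 0 the solution X^ε tracks the deterministic flow ẋ = b(x), with pathwise deviations of rough order ε^{1/2}.

```python
import math, random

def b(x):                                  # bounded, Lipschitz drift (our choice)
    return math.cos(x)

def euler_maruyama(eps, n=1000, T=1.0, seed=7):
    """Euler-Maruyama scheme for dX = b(X) dt + sqrt(eps) dB, X_0 = 0."""
    rng = random.Random(seed)
    dt = T / n
    x = 0.0
    path = [x]
    for _ in range(n):
        x += b(x) * dt + math.sqrt(eps * dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

ode = euler_maruyama(0.0)                  # eps = 0 gives the deterministic flow
devs = {}
for eps in (0.1, 0.001):
    pert = euler_maruyama(eps)             # same seed, hence same driving noise
    devs[eps] = max(abs(a - c) for a, c in zip(pert, ode))
print(devs)                                # deviation shrinks as eps decreases
```

The rate function (28) quantifies exactly how unlikely a given macroscopic deviation from the flow is on this exponential scale.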

Now consider a random element ξ with distribution μ in an arbitrary metric space S. We introduce the cumulant-generating functional

 Λ(f) = log E e^{f(ξ)} = log μe^f, f ∈ C_b(S),

and the associated Legendre-Fenchel transform

 Λ*(ν) = sup_{f∈C_b} (νf − Λ(f)), ν ∈ P(S), (30)

where P(S) denotes the class of probability measures on S, endowed with the topology of weak convergence. Note that Λ and Λ* are both convex, by the same argument as for R^d. Given any two measures μ, ν ∈ P(S), we define the relative entropy of ν with respect to μ by

 H(ν|μ) = ν log p = μ(p log p) when ν ≪ μ with p = dν/dμ, and H(ν|μ) = ∞ otherwise.

Theorem 27.15 (empirical distributions, Sanov) Let ξ_1, ξ_2, ... be i.i.d. random elements with distribution μ in a Polish space S, and form the empirical distributions η_n = n^{-1} Σ_{k≤n} δ_{ξ_k}. Then the random measures η_1, η_2, ... satisfy the LDP in P(S) with the good rate function

 Λ*(ν) = H(ν|μ), ν ∈ P(S). (31)

A couple of lemmas will be needed for the proof.

Lemma 27.16 (entropy, Donsker and Varadhan) In (30) it is equivalent to take the supremum over all bounded, measurable functions f: S → R. The identity (31) then holds for any probability measures μ and ν on a common measurable space S.

Proof: The first assertion holds by Lemma 1.35 and dominated convergence. If ν ≪̸ μ, then H(ν|μ) = ∞ by definition. Furthermore, we may then choose a set B ∈ S with μB = 0 and νB > 0, and take fₙ = n·1_B to obtain νfₙ − log μe^{fₙ} = nνB → ∞. Thus, even Λ*(ν) = ∞ in this case, and it remains to prove (31) when ν ≪ μ. Assuming ν = ρ · μ and writing f = log ρ, we note that

νf − log μe^f = ν log ρ − log μρ = H(ν|μ).

If f = log ρ is unbounded, we may approximate by bounded measurable functions fₙ satisfying μe^{fₙ} → 1 and νfₙ → νf, and we get Λ*(ν) ≥ H(ν|μ). To prove the reverse inequality, we first assume that S is finite and generated by a partition B₁, ..., Bₙ of S. Putting μ_k = μB_k, ν_k = νB_k, and ρ_k = ν_k/μ_k, we may write our claim in the form

g(x) = Σ_k ν_k x_k − log Σ_k μ_k e^{x_k} ≤ Σ_k ν_k log ρ_k,

where x = (x₁, ..., xₙ) ∈ ℝⁿ is arbitrary. Here the function g is concave and satisfies ∇g(x) = 0 for x = (log ρ₁, ..., log ρₙ), interpreted asymptotically when ρ_k = 0 for some k. Thus, sup_x g(x) = g(log ρ₁, ..., log ρₙ) = Σ_k ν_k log ρ_k. To prove the inequality νf − log μe^f ≤ ν log ρ in general, we may assume that f is simple. The generated σ-field F ⊂ S is then finite, and we note that ν = μ[ρ|F] · μ on F. Using the result in the finite case, together with Jensen's inequality for conditional expectations, we obtain

νf − log μe^f ≤ μ(μ[ρ|F] log μ[ρ|F]) ≤ μ μ[ρ log ρ|F] = ν log ρ.  □
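The variational identity of Lemma 27.16 is easy to check numerically on a finite state space. The sketch below (plain Python; the helper names are ours, not the book's) computes H(ν|μ) directly and verifies that f = log ρ attains the supremum of νf − log μe^f, while other bounded choices of f stay below it.

```python
import math
import random

def rel_entropy(nu, mu):
    # H(nu|mu) = sum_k nu_k log(nu_k / mu_k), with 0 log 0 = 0
    return sum(n * math.log(n / m) for n, m in zip(nu, mu) if n > 0)

def dual_value(f, nu, mu):
    # nu f - log mu e^f, the expression inside the supremum in (30)
    nuf = sum(n * x for n, x in zip(nu, f))
    log_mu_ef = math.log(sum(m * math.exp(x) for m, x in zip(mu, f)))
    return nuf - log_mu_ef

mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.2, 0.6]
H = rel_entropy(nu, mu)

# The maximizer f = log(d nu / d mu) attains the supremum exactly,
# since log mu e^f = log mu rho = log 1 = 0.
f_opt = [math.log(n / m) for n, m in zip(nu, mu)]
assert abs(dual_value(f_opt, nu, mu) - H) < 1e-12

# Any other bounded f gives a value at most H.
random.seed(0)
for _ in range(1000):
    f = [random.uniform(-5, 5) for _ in mu]
    assert dual_value(f, nu, mu) <= H + 1e-12
```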

556

Foundations of Modern Probability

Lemma 27.17 (exponential tightness) The empirical distributions ηₙ in Theorem 27.15 are exponentially tight in P(S).

Proof: If B ∈ S with P{ξ ∈ B} = p ∈ (0, 1), then by Theorem 27.3 and Lemmas 27.1 and 27.2 we have, for any x ∈ [p, 1),

sup_n n⁻¹ log P{ηₙB > x} ≤ −x log(x/p) − (1 − x) log((1 − x)/(1 − p)).  (32)

In particular, we note that the right-hand side tends to −∞ as p → 0 for fixed x ∈ (0, 1). Now fix any r > 0. By (32) and Theorem 16.3, we may choose some compact sets K₁, K₂, ⋯ ⊂ S such that

P{ηₙK_k^c > 2⁻ᵏ} ≤ e⁻ᵏⁿʳ,  k, n ∈ ℕ.

Summing over k gives

limsup_{n→∞} n⁻¹ log P ⋃_k {ηₙK_k^c > 2⁻ᵏ} ≤ −r,

and it remains to note that the set

M = ⋂_k {ν ∈ P(S); νK_k^c ≤ 2⁻ᵏ}

is compact, by another application of Theorem 16.3.  □
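Inequality (32) is a Chernoff-type bound on a binomial tail, and for small n it can be compared with the exact tail probability. A minimal sketch, assuming ηₙB is the empirical frequency in n i.i.d. trials with success probability p (the helper names are ours):

```python
import math

def kl(x, p):
    # Relative entropy between Bernoulli(x) and Bernoulli(p);
    # this is the exponent on the right-hand side of (32).
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

def binom_tail(n, p, x):
    # Exact P{ S_n / n > x } for S_n ~ Binomial(n, p)
    k0 = math.floor(n * x) + 1  # smallest k with k/n > x
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k0, n + 1))

p = 0.3
for n in (5, 20, 80):
    for x in (0.4, 0.6, 0.8):  # x > p, as required
        assert binom_tail(n, p, x) <= math.exp(-n * kl(x, p)) + 1e-15
```

The bound already holds for every finite n, not only in the limit, which is what makes the Borel–Cantelli arguments in this chapter work.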

Proof of Theorem 27.15: By Theorem A1.1 we can embed S as a Borel subset of a compact metric space K. The function space C_b(K) is separable, and we can choose a dense sequence f₁, f₂, ⋯ ∈ C_b(K). For any m ∈ ℕ, the random vector (f₁(ξ), ..., f_m(ξ)) has the cumulant-generating function

Λ_m(u) = log E exp Σ_{k≤m} u_k f_k(ξ) = Λ(Σ_{k≤m} u_k f_k),  u ∈ ℝᵐ,

and so by Theorem 27.5 the random vectors (ηₙf₁, ..., ηₙf_m) satisfy the LDP in ℝᵐ with the good rate function Λ*_m. By Theorem 27.12 it follows that the infinite sequences (ηₙf₁, ηₙf₂, ...) satisfy the LDP in ℝ^∞ with the good rate function I = sup_m (Λ*_m ∘ π_m), where π_m denotes the natural projection of ℝ^∞ onto ℝᵐ. Since P(K) is compact by Theorem 16.3 and the mapping ν ↦ (νf₁, νf₂, ...) is a continuous injection of P(K) into ℝ^∞, Theorem 27.11 (ii) shows that the random measures ηₙ satisfy the LDP in P(K) with the good rate function

I_K(ν) = I(νf₁, νf₂, ...) = sup_m Λ*_m(νf₁, ..., νf_m)
       = sup_m sup_{u∈ℝᵐ} (Σ_{k≤m} u_k νf_k − Λ(Σ_{k≤m} u_k f_k))
       = sup_{f∈F} (νf − Λ(f)) = sup_{f∈C_b} (νf − Λ(f)),  (33)

where F denotes the set of all linear combinations of f₁, f₂, ... . Next we note that the natural embedding P(S) → P(K) is continuous, since for any f ∈ C_b(K) the restriction of f to S belongs to C_b(S). Since it is also trivially injective, we see from Theorem 27.11 (ii) and Lemma 27.17

27. Large Deviations

557

that the ηₙ satisfy the LDP even in P(S), with a good rate function I_S that equals the restriction of I_K to P(S). It remains to note that I_S = Λ* by (33) and Lemma 27.16.  □

We conclude with a remarkable application of Schilder's Theorem 27.6. Writing B for a standard Brownian motion in ℝ^d, we define for any t > e the scaled process Xᵗ by

X_s^t = B_{st} / (2t log log t)^{1/2},  s ≥ 0.  (34)

Theorem 27.18 (functional law of the iterated logarithm, Strassen) Let B be a Brownian motion in ℝ^d, and define the processes Xᵗ by (34). Then the following equivalent statements hold outside a fixed P-null set:

(i) The paths Xᵗ, t ≥ 3, form a relatively compact set in C(ℝ₊, ℝ^d), whose set of limit points as t → ∞ equals K = {x ∈ H_∞; ‖ẋ‖₂ ≤ 1}.

(ii) For any continuous function F: C(ℝ₊, ℝ^d) → ℝ, we have

limsup_{t→∞} F(Xᵗ) = sup_{x∈K} F(x).

In particular, we may recover the classical law of the iterated logarithm in Theorem 13.18 by choosing F(x) = x₁. Using Theorem 14.6, we can easily derive a correspondingly strengthened version for random walks.

Proof: The equivalence of (i) and (ii) being elementary, we need to prove only (i). Noting that Xᵗ =d B/(2 log log t)^{1/2} and using Theorem 27.6, we get for any measurable set A ⊂ C(ℝ₊, ℝ^d) and constant r > 1

limsup_{n→∞} (log P{X^{rⁿ} ∈ A}) / log n ≤ limsup_{t→∞} (log P{Xᵗ ∈ A}) / log log t ≤ −2I(Ā),
liminf_{n→∞} (log P{X^{rⁿ} ∈ A}) / log n ≥ liminf_{t→∞} (log P{Xᵗ ∈ A}) / log log t ≥ −2I(A°),

where I(x) = ½‖ẋ‖₂² when x ∈ H_∞ and I(x) = ∞ otherwise. Hence,

Σ_n P{X^{rⁿ} ∈ A} < ∞ or = ∞, according as 2I(Ā) > 1 or 2I(A°) < 1.  (35)

Now fix any r > 1, and let G ⊃ K be open. Note that 2I(G^c) > 1 by Lemma 27.7. By the first part of (35) and the Borel–Cantelli lemma we have P{X^{rⁿ} ∉ G i.o.} = 0 or, equivalently, 1_G(X^{rⁿ}) → 1 a.s. Since G was arbitrary, it follows that ρ(X^{rⁿ}, K) → 0 a.s. for any metrization ρ of C(ℝ₊, ℝ^d). In particular, this holds with any c > 0 for the metric

ρ_c(x, y) = ∫₀^∞ ((x − y)*_s ∧ 1) e^{−cs} ds,  x, y ∈ C(ℝ₊, ℝ^d),

where (x − y)*_s = sup_{u≤s} |x_u − y_u|. To extend the convergence to the entire family {Xᵗ}, fix any path of B such that ρ₁(X^{rⁿ}, K) → 0, and choose some functions y^{rⁿ} ∈ K satisfying ρ₁(X^{rⁿ}, y^{rⁿ}) → 0. For any t ∈ (rⁿ, rⁿ⁺¹), the paths X^{rⁿ} and Xᵗ are related


by

Xᵗ(s) = X^{rⁿ}(t r⁻ⁿ s) ((rⁿ log log rⁿ) / (t log log t))^{1/2},  s ≥ 0.

Defining yᵗ in the same way in terms of y^{rⁿ}, we note that also yᵗ ∈ K, since I(yᵗ) ≤ I(y^{rⁿ}). (The two H_∞-norms would agree if the logarithmic factors were omitted.) Furthermore,

ρ_r(Xᵗ, yᵗ) = ∫₀^∞ ((Xᵗ − yᵗ)*_s ∧ 1) e^{−rs} ds → 0,

and so ρ_r(Xᵗ, K) → 0 as t → ∞, which proves the asserted convergence toward K.

To see that every y ∈ K arises as a limit point, fix any y ∈ K and ε, u > 0. By the established part of the theorem and the Cauchy–Buniakowski inequality, we have a.s.

limsup_{t→∞} (Xᵗ − y)*_ε ≤ sup_{x∈K} (x − y)*_ε ≤ sup_{x∈K} x*_ε + y*_ε ≤ 2ε^{1/2}.  (36)

Write x*_{ε,u} = sup_{s∈[ε,u]} |x_s − x_ε|, and choose r > u/ε to ensure independence between the variables (X^{rⁿ} − y)*_{ε,u}. Applying the second part of (35) to the open set A = {x; (x − y)*_{ε,u} < ε}, and using the Borel–Cantelli lemma together with (36), we conclude that y is a.s. a limit point of the family {Xᵗ}, completing the proof of (i).  □

Exercises

7. Use Theorem 27.11 (i) to deduce the preceding result from Schilder's theorem. (Hint: For x ∈ H₁, note that |x₁| ≤ ‖ẋ‖₂, with equality iff x_t = tx₁.)

8. Prove Schilder's theorem on [0, T] by the same argument as for [0, 1].

9. Deduce Schilder's theorem in the space C([0, n], ℝ^d) from the version in C([0, 1], ℝ^{nd}).

10. Let B be a Brownian bridge in ℝ^d. Show that the processes ε^{1/2}B satisfy the LDP in C([0, 1], ℝ^d) with the good rate function I(x) = ½‖ẋ‖₂² for x ∈ H₁ with x₁ = 0 and I(x) = ∞ otherwise. (Hint: Write B_t = X_t − tX₁, where X is a Brownian motion in ℝ^d, and use Theorem 27.11. Check that ‖ẋ − a‖₂ is minimized for a = x₁.)

11. Show that the property of exponential tightness and its sequential version are preserved by continuous mappings.

12. Prove that if the processes Xᵉ and Yᵉ in C(ℝ₊, ℝ^d) are exponentially tight, then so is any linear combination aXᵉ + bYᵉ. (Hint: Use the Arzelà–Ascoli theorem.)

13. Show directly from (27) that the processes Xᵉ in Theorem 27.14 are exponentially tight. (Hint: Use Lemmas 27.7 and 27.9 (iii) together with the Arzelà–Ascoli theorem.) Derive the same result from the stated theorem.

14. Let ξᵉ be random elements in a locally compact metric space S, satisfying the LDP with a good rate function I. Show that the ξᵉ are exponentially tight (even in the nonsequential sense). (Hint: For any r > 0, there exists a compact set K_r ⊂ S such that I⁻¹[0, r] ⊂ K_r°. Now apply the LDP upper bound to the closed sets (K_r°)^c ⊃ K_r^c.)

15. For any metric space S and lcscH space T, let Xᵉ be random elements in C(T, S) whose restrictions X_K^ε to an arbitrary compact set K ⊂ T satisfy the LDP in C(K, S) with the good rate function I_K. Show that the Xᵉ satisfy the LDP in C(T, S) with the good rate function I = sup_K (I_K ∘ π_K), where π_K denotes the restriction map from C(T, S) to C(K, S).

16. Let ξ_{kj} be i.i.d. random vectors in ℝ^d satisfying Λ(u) = E e^{u·ξ_{kj}} < ∞ for all u ∈ ℝ^d. Show that the sequences ξ̄ₙ = n⁻¹ Σ_{k≤n} (ξ_{k1}, ξ_{k2}, ...) satisfy an LDP in (ℝ^d)^∞ with the good rate function I(x) = Σ_j Λ*(x_j). Also derive an LDP for the associated random walks in ℝ^d.


17. Let~ be a sequence of i.i.d. N(O, 1) random variables. Use the preceding result to show that the sequences c 1 1 2 ~ satisfy the LDP in IR 00 with the good rate function I(x) = ~llxll 2 for XE l 2 and I(x) = 00 otherwise. Also show how the statement follows from Schilder's theorem. 18. Let 6, 6, . . . be i.i.d. random probability merasures on a Polish space S. Derive an LDP in P(S) for the averages [n = n- 1 L:k 0, such a function x has at most finitely many jumps of size >c before timet. In D(IR+, S) we introduce the modified modulus of continuity w(x, t, h)

= inf max sup p(xn x 8 ), (h)

k

r,sEh

x E D(IR+,S), t, h > 0,

(1)

where the infimum extends over all partitions of the interval [0, t) into subintervals I_k = [u, v) such that v − u ≥ h when v < t. Note that w̃(x, t, h) → 0 as h → 0 for fixed x ∈ D(ℝ₊, S) and t > 0. By a time-change on ℝ₊ we mean a monotone bijection λ: ℝ₊ → ℝ₊. Note that λ is continuous and strictly increasing with λ₀ = 0 and λ_∞ = ∞.

Theorem A2.2 (J₁-topology, Skorohod, Prohorov, Kolmogorov) Fix a separable, complete metric space (S, ρ) and a dense set T ⊂ ℝ₊. Then there exists a separable and complete metric d in D(ℝ₊, S) such that d(xₙ, x) → 0 iff

sup_{s≤t} |λₙ(s) − s| + sup_{s≤t} ρ(xₙ ∘ λₙ(s), x(s)) → 0,  t > 0,

for some time-changes λₙ on ℝ₊. Furthermore, B(D(ℝ₊, S)) = σ{π_t; t ∈ T}, and a set A ⊂ D(ℝ₊, S) is relatively compact iff π_t A is relatively compact in S for every t ∈ T and

lim_{h→0} sup_{x∈A} w̃(x, t, h) = 0,  t > 0.  (2)

In that case, ⋃_{s≤t} π_s A is relatively compact in S for every t ≥ 0.

Proof: See Ethier and Kurtz (1986), Sections 3.5 and 3.6, or Jacod and Shiryaev (1987), Section VI.1.  □
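The modulus w̃ in (1) is an infimum over partitions, which is awkward to compute exactly; the sketch below (plain Python, helper names of our own choosing) merely evaluates one admissible partition, which already suffices to witness that w̃(x, t, h) = 0 for a step function whose jumps are at least h apart.

```python
# A piecewise-constant cadlag path, given as sorted (jump_time, new_value) pairs.

def path_value(jumps, s):
    # Value of the path at time s (right-continuous step function)
    v = jumps[0][1]
    for u, val in jumps:
        if u <= s:
            v = val
    return v

def oscillation(jumps, a, b, grid=50):
    # max |x_r - x_s| over r, s in [a, b); crude grid evaluation
    pts = [a + (b - a) * i / grid for i in range(grid)]
    vals = [path_value(jumps, s) for s in pts]
    return max(vals) - min(vals)

def modulus_for_partition(jumps, cuts, t, h):
    # cuts: 0 = u_0 < u_1 < ... < u_k = t; all subintervals except possibly
    # the last must have length >= h, as required below (1)
    assert all(v - u >= h for u, v in zip(cuts[:-2], cuts[1:-1]))
    return max(oscillation(jumps, u, v) for u, v in zip(cuts, cuts[1:]))

# Jumps spaced >= h: partitioning at the jump times gives oscillation 0
# on every subinterval, so w~(x, t, h) = 0.
jumps = [(0.0, 0.0), (0.4, 1.0), (0.9, -0.5)]
t, h = 1.2, 0.3
cuts = [0.0, 0.4, 0.9, t]
assert modulus_for_partition(jumps, cuts, t, h) == 0.0
```

Note that the ordinary modulus of continuity of this path does not tend to 0, which is exactly why the modified modulus is needed in D(ℝ₊, S).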

A suitably modified version of the last result applies to the space D([0, 1], S). Here we define w̃(x, h) in terms of partitions of [0, 1) into subintervals of length ≥ h, and use time-changes λ that are increasing bijections on [0, 1]. Turning to the case of measure spaces, let S be a locally compact, second-countable Hausdorff (lcscH) space with Borel σ-field S, and let Ŝ denote the class of bounded (i.e., relatively compact) sets in S. The space S is known to be Polish, and the family C_K^+ of continuous functions f: S → ℝ₊ with compact support is separable in the uniform metric. Furthermore,


there exists a sequence of compact sets Kₙ ↑ S such that Kₙ ⊂ K°ₙ₊₁ for each n. Let M(S) denote the class of measures on S that are locally finite (i.e., finite on Ŝ), and write π_B and π_f for the mappings μ ↦ μB and μ ↦ μf = ∫ f dμ, respectively, on M(S). The vague topology in M(S) is generated by the maps π_f, f ∈ C_K^+, and we write the vague convergence of μₙ toward μ as μₙ →ᵛ μ. For any μ ∈ M(S) we define Ŝ_μ = {B ∈ Ŝ; μ∂B = 0}. Here we list some basic facts about the vague topology.

Theorem A2.3 (vague topology) For any lcscH space S, we have:

(i) M(S) is Polish in the vague topology;
(ii) a set A ⊂ M(S) is vaguely relatively compact iff sup_{μ∈A} μf < ∞ for all f ∈ C_K^+;
(iii) if μₙ →ᵛ μ and B ∈ Ŝ with μ∂B = 0, then μₙB → μB;
(iv) B(M(S)) is generated by the maps π_f, f ∈ C_K^+, and also, for any m ∈ M(S), by the maps π_B, B ∈ Ŝ_m.

Proof: (i) Let f₁, f₂, ... be dense in C_K^+, and define

ρ(μ, ν) = Σ_n 2⁻ⁿ (|μfₙ − νfₙ| ∧ 1),  μ, ν ∈ M(S).

It is easily seen that ρ metrizes the vague topology. In particular, M(S) is homeomorphic to a subset of ℝ^∞ and therefore separable. The completeness of ρ will be clear once we have proved (ii).

(ii) The necessity is clear from the continuity of π_f for each f ∈ C_K^+. Conversely, assume that sup_{μ∈A} μf < ∞ for all f ∈ C_K^+. Choose some compact sets Kₙ ↑ S with Kₙ ⊂ K°ₙ₊₁ for each n, and let the functions fₙ ∈ C_K^+ be such that 1_{Kₙ} ≤ fₙ ≤ 1_{Kₙ₊₁}. For each n the set {fₙ · μ; μ ∈ A} is uniformly bounded, and so by Theorem 16.3 it is even sequentially relatively compact. A diagonal argument then shows that A itself is sequentially relatively compact. Since M(S) is metrizable, the desired relative compactness follows.

(iii) The proof is the same as for Theorem 4.25.

(iv) A topological basis in M(S) is formed by all finite intersections of the sets {μ; a < μf < b} with 0 < a < b and f ∈ C_K^+. [...] or else there exist some xₙ ∈ A and some bounded sₙ < tₙ < uₙ with uₙ − sₙ → 0 such that

limsup_{n→∞} (ρ(x^n_{sₙ}, x^n_{tₙ}) ∧ ρ(x^n_{tₙ}, x^n_{uₙ})) > 0.  (4)

In the former case, it is clear from (3) that π_t(Af_j) fails to be relatively compact for some j ∈ ℕ, which contradicts the relative compactness of Af_j. Next assume (4). By (3) there exist some i, j ∈ ℕ such that

limsup_{n→∞} (|x^n_{sₙ}f_i − x^n_{tₙ}f_i| ∧ |x^n_{tₙ}f_j − x^n_{uₙ}f_j|) > 0.  (5)

Now for any a, a′, b, b′ ∈ ℝ, we have

½(|a| ∧ |b′|) ≤ (|a| ∧ |a′|) ∨ (|b| ∧ |b′|) ∨ (|a + a′| ∧ |b + b′|).

Since the set {f_k} is closed under addition, (5) then implies the same relation with a common i = j. But then (2) fails for Af_i, which by Theorem A2.2 contradicts the relative compactness of Af_i. Thus, (2) does hold for A, and so A is relatively compact.  □

Given an lcscH space S, we introduce the classes G, F, and K of open, closed, and compact subsets of S, respectively. Here we may consider F as a space in its own right, endowed with the Fell topology, generated by the sets {F ∈ F; F ∩ G ≠ ∅} and {F ∈ F; F ∩ K = ∅} for arbitrary G ∈ G and K ∈ K. To describe the corresponding notion of convergence, we may fix a metrization ρ of the topology in S such that every closed ρ-ball is compact.


Theorem A2.5 (Fell topology) Fix any lcscH space S, and let F be the class of closed sets F ⊂ S, endowed with the Fell topology. Then:

(i) F is compact, second-countable, and Hausdorff;
(ii) Fₙ → F in F iff ρ(s, Fₙ) → ρ(s, F) for all s ∈ S;
(iii) {F ∈ F; F ∩ B ≠ ∅} is universally Borel measurable for every B ∈ S.

Proof: First we show that the Fell topology is generated by the maps F ↦ ρ(s, F), s ∈ S. To see that those mappings are continuous, put B_{s,r} = {t ∈ S; ρ(s, t) < r}, and note that

{F; ρ(s, F) < r} = {F; F ∩ B_{s,r} ≠ ∅},
{F; ρ(s, F) > r} = {F; F ∩ B̄_{s,r} = ∅}.

Here the sets on the right are open, by the definition of the Fell topology and the choice of ρ. Thus, the Fell topology contains the ρ-topology. To prove the converse, fix any F ∈ F and a net {F_i} ⊂ F with directed index set (I, ≺) such that ρ(s, F_i) → ρ(s, F) for all s ∈ S. First consider any G ∈ G with F ∩ G ≠ ∅, fix an s ∈ F ∩ G, and choose r > 0 so small that B_{s,r} ⊂ G. Since ρ(s, F_i) → ρ(s, F) = 0, there exists an i ∈ I such that F_j ∩ B_{s,r} ≠ ∅ for all j ≻ i. Then also F_j ∩ G ≠ ∅ for all j ≻ i. Next consider any K ∈ K with F ∩ K = ∅. Define r_s = ½ρ(s, F) for each s ∈ K, and put G_s = B_{s,r_s}. Since K is compact, it is covered by finitely many balls G_{s_k}. For each k we have ρ(s_k, F_i) → ρ(s_k, F), and so there exists some i_k ∈ I such that F_j ∩ G_{s_k} = ∅ for all j ≻ i_k. Letting i ∈ I be such that i ≻ i_k for all k, it is clear that F_j ∩ K = ∅ for all j ≻ i.

Now we fix any countable dense set D ⊂ S, and assume that ρ(s, F_i) → ρ(s, F) for all s ∈ D. For any s, s′ ∈ S we have

|ρ(s, F_j) − ρ(s, F)| ≤ |ρ(s′, F_j) − ρ(s′, F)| + 2ρ(s, s′).

Given any s and ε > 0, we can make the left-hand side < ε by choosing an s′ ∈ D with ρ(s, s′) < ε/3 and then an i ∈ I such that |ρ(s′, F_j) − ρ(s′, F)| < ε/3 for all j ≻ i. This shows that the Fell topology is also generated by the mappings F ↦ ρ(s, F) with s restricted to D. But then F is homeomorphic to a subset of ℝ̄₊^D, which is second-countable and metrizable.

To prove that F is compact, it is now enough to show that every sequence (Fₙ) ⊂ F contains a convergent subsequence. Then choose a subsequence such that ρ(s, Fₙ) converges in ℝ̄₊ for all s ∈ D, and hence also for all s ∈ S. Since the family of functions ρ(·, Fₙ) is equicontinuous, even the limit f is continuous, and so the set F = {s ∈ S; f(s) = 0} is closed. To obtain Fₙ → F, we need to show that whenever F ∩ G ≠ ∅ or F ∩ K = ∅ for some G ∈ G or K ∈ K, the same relation eventually holds even for Fₙ. In the former case, we may fix any s ∈ F ∩ G and note that ρ(s, Fₙ) → f(s) = 0. Hence, we may choose some sₙ ∈ Fₙ with sₙ → s, and since sₙ ∈ G for large n, we get Fₙ ∩ G ≠ ∅. In the latter case, we assume instead that Fₙ ∩ K ≠ ∅ along a subsequence. Then there exist some

A2. Some Special Spaces

567

sₙ ∈ Fₙ ∩ K, and we note that sₙ → s ∈ K along a further subsequence. Here 0 = ρ(sₙ, Fₙ) → ρ(s, F), which yields the contradiction s ∈ F ∩ K. This completes the proof of (i).

To prove (iii), we note that the mapping (s, F) ↦ ρ(s, F) is jointly continuous and hence Borel measurable. Now S and F are both separable, and so the Borel σ-field in S × F agrees with the product σ-field S ⊗ B(F). Since s ∈ F iff ρ(s, F) = 0, it follows that {(s, F); s ∈ F} belongs to S ⊗ B(F). Hence, so does {(s, F); s ∈ F ∩ B} for arbitrary B ∈ S. The assertion now follows by Theorem A1.4.  □

We say that a class U ⊂ Ŝ is separating if for any K ⊂ G with K ∈ K and G ∈ G there exists some U ∈ U with K ⊂ U ⊂ G. A preseparating class I ⊂ Ŝ is one for which the finite unions of I-sets form a separating class. When S is Euclidean, we typically choose I to be a class of intervals or rectangles, and U as the corresponding class of finite unions.
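Theorem A2.5 (ii) characterizes Fell convergence of closed sets through the distance functions ρ(s, ·). For finite subsets of the line this is easy to illustrate numerically; the following toy sketch (helper names of our own choosing) checks that Fₙ = {0, 1 + 1/n} converges to F = {0, 1}, the pointwise error being controlled by the Hausdorff-type perturbation 1/n.

```python
def dist(s, F):
    # Euclidean distance from the point s to a finite closed set F
    return min(abs(s - f) for f in F)

F = [0.0, 1.0]
for s in (-1.0, 0.3, 0.5, 1.0, 2.0):
    for n in (10, 100, 1000):
        Fn = [0.0, 1.0 + 1.0 / n]
        # Moving one point of the set by 1/n changes each distance by at most 1/n,
        # since s -> dist(s, F) is 1-Lipschitz in the set perturbation.
        assert abs(dist(s, Fn) - dist(s, F)) <= 1.0 / n + 1e-12
```

The same comparison fails for, say, Fₙ = {n}, whose distance functions diverge at every point, matching the Fell convergence Fₙ → ∅.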

Lemma A2.6 (separation) For any monotone function h: Ŝ → ℝ, the class Ŝ_h = {B ∈ Ŝ; h(B°) = h(B̄)} is separating.

Proof: Fix a metric ρ in S such that every closed ρ-ball is compact, and let K ∈ K and G ∈ G with K ⊂ G. For any ε > 0, define K_ε = {s ∈ S; ρ(s, K) < ε}, and note that K̄_ε ⊂ {s ∈ S; ρ(s, K) ≤ ε}. Since K is compact, we have ρ(K, G^c) > 0, and so K ⊂ K_ε ⊂ G for sufficiently small ε > 0. From the monotonicity of h it is further clear that K_ε ∈ Ŝ_h for almost every ε > 0.  □

We often need the separating class to be countable.

Lemma A2.7 (countable separation) Every separating class U ⊂ Ŝ contains a countable separating subclass.

Proof: Fix a countable topological base B ⊂ Ŝ, closed under finite unions. Choose for every B ∈ B some compact sets K_{B,n} ↓ B̄ with K°_{B,n} ⊃ B̄, and then for each pair (B, n) ∈ B × ℕ some set U_{B,n} ∈ U with B̄ ⊂ U_{B,n} ⊂ K°_{B,n}. The family {U_{B,n}} is clearly separating.  □

The next result, needed for the proof of Theorem 16.29, relates the vague and Fell topologies for integer-valued measures and their supports. Let N(S) denote the class of locally finite, integer-valued measures on S, and write →ᶠ for convergence in the Fell topology.

1t for convergence in the Fell topology.

Proposition A2.8 (supports of measures) Let μ, μ₁, μ₂, ⋯ ∈ N(S) with supp μₙ →ᶠ supp μ, where S is lcscH and μ is simple. Then

limsup_{n→∞} (μₙB ∧ 1) ≤ μB ≤ liminf_{n→∞} μₙB,  B ∈ Ŝ_μ.

Proof: To prove the left inequality, we may assume that μB = 0. Since B ∈ Ŝ_μ, we have even μB̄ = 0, and so B̄ ∩ supp μ = ∅. By convergence of


the supports we get B̄ ∩ supp μₙ = ∅ for large enough n, which implies

limsup_{n→∞} (μₙB ∧ 1) ≤ limsup_{n→∞} μₙB̄ = 0 = μB.

To prove the right inequality, we may assume that μB = m > 0. Since Ŝ_μ is a separating ring, we may choose a partition B₁, ..., B_m ∈ Ŝ_μ of B such that μB_k = 1 for each k. Then also μB°_k = 1 for each k, and so B°_k ∩ supp μ ≠ ∅. By convergence of the supports we get B°_k ∩ supp μₙ ≠ ∅ for large enough n. Hence,

1 ≤ liminf_{n→∞} μₙB°_k ≤ liminf_{n→∞} μₙB_k,

and so

μB = m ≤ liminf_{n→∞} Σ_k μₙB_k = liminf_{n→∞} μₙB.  □
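The two inequalities of Proposition A2.8 can be illustrated with point measures on the line, where the supports visibly converge while mass may clump, which is why the left-hand side is truncated at 1. A toy sketch, with helper names of our own choosing:

```python
def measure(atoms, B):
    # atoms: list of (location, mass); B = (a, b) an open interval; returns mu B
    a, b = B
    return sum(m for x, m in atoms if a < x < b)

mu = [(0.0, 1), (1.0, 1)]             # simple: every atom has mass 1

def mu_n(n):
    # supports {1/n, 1 - 1/n} converge to supp mu = {0, 1} in the Fell topology,
    # but the first atom carries extra mass
    return [(1.0 / n, 2), (1.0 - 1.0 / n, 1)]

B = (-0.5, 0.5)                       # a mu-continuity set: no atoms on its boundary
muB = measure(mu, B)
vals = [measure(mu_n(n), B) for n in (10, 100, 1000)]

# limsup (mu_n B ^ 1) <= mu B <= liminf mu_n B, checked over the sampled n
assert max(min(v, 1) for v in vals) <= muB <= min(vals)
```

Here μₙB = 2 for all large n, so the right inequality is strict; without the truncation on the left, the first inequality would fail.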

To state the next result, fix any metric spaces S₁, S₂, ..., and introduce the product spaces Sⁿ = S₁ × ⋯ × Sₙ and S = S₁ × S₂ × ⋯, endowed with their product topologies. For any m ≤ n < ∞, let π_m and π_{mn} denote the natural projections of S and Sⁿ onto Sᵐ. The sets Aₙ ⊂ Sⁿ, n ∈ ℕ, are said to form a projective sequence if π_{mn}Aₙ ⊂ A_m for all m ≤ n. We may then define their projective limit in S as the set A = ⋂ₙ πₙ⁻¹Aₙ.

Lemma A2.9 (projective limits) For any metric spaces S₁, S₂, ..., consider a projective sequence of nonempty, compact sets Kₙ ⊂ S₁ × ⋯ × Sₙ, n ∈ ℕ. Then the projective limit K = ⋂ₙ πₙ⁻¹Kₙ is again nonempty and compact.

Proof: Since the Kₙ are nonempty, we may choose some sequences xⁿ = (x^n_m) ∈ πₙ⁻¹Kₙ, n ∈ ℕ. By the projective property of the sets K_m, we have π_m xⁿ ∈ K_m for all m ≤ n. In particular, the sequence x¹_m, x²_m, ... is relatively compact in S_m for each m ∈ ℕ, and by a diagonal argument we may choose a subsequence N′ ⊂ ℕ and an element x = (x_m) ∈ S such that xⁿ → x as n → ∞ along N′. Then also π_m xⁿ → π_m x along N′ for each m ∈ ℕ, and since the K_m are closed, we conclude that π_m x ∈ K_m for all m. Thus, we have x ∈ K, which shows that K is nonempty. The compactness of K may be proved by the same argument, where we assume that x¹, x², ⋯ ∈ K.  □

Historical and Bibliographical Notes

The following notes were prepared with the modest intentions of tracing the origins of some of the basic ideas in each chapter, of giving precise references for the main results cited in the text, and of suggesting some literature for further reading. No completeness is claimed, and knowledgeable readers are likely to notice misinterpretations and omissions, for which I apologize in advance. A comprehensive history of modern probability theory still remains to be written.

1. Measure Theory — Basic Notions

The first author to consider measures in the modern sense was BOREL (1895, 1898), who constructed Lebesgue measure on the Borel σ-field in ℝ. The corresponding integral was introduced by LEBESGUE (1902, 1904), who also established the dominated convergence theorem. The monotone convergence theorem and Fatou's lemma were later obtained by LEVI (1906a) and FATOU (1906), respectively. LEBESGUE also introduced the higher-dimensional Lebesgue measure and proved a first version of Fubini's theorem, subsequently generalized by FUBINI (1907) and TONELLI (1909). The integration theory was extended to general measures and abstract spaces by many authors, including RADON (1913) and FRÉCHET (1928).

The norm inequalities in Lemma 1.29 were first noted for finite sums by HÖLDER (1889) and MINKOWSKI (1907), respectively, and were later extended to integrals by RIESZ (1910). Part (i) for p = 2 goes back to CAUCHY (1821) for finite sums and to BUNIAKOWSKY (1859) for integrals. The Hilbert space projection theorem can be traced back to LEVI (1906b). The monotone class Theorem 1.1 was first proved, along with related results, already by SIERPIŃSKI (1928), but the result was not used in probability theory until DYNKIN (1961). More primitive versions had previously been employed by HALMOS (1950) and DOOB (1953).

Most results in this chapter are well known and can be found in any textbook on real analysis. Many probability texts, including LOÈVE (1977) and BILLINGSLEY (1995), contain detailed introductions to measure theory. There are also some excellent texts in real analysis adapted to the needs of probabilists, such as DUDLEY (1989) and DOOB (1994). The former author also provides some more detailed historical information.


2. Measure Theory — Key Results

As we have seen, BOREL (1895, 1898) was the first to prove the existence of one-dimensional Lebesgue measure. However, the modern construction via outer measures is due to CARATHÉODORY (1918). Functions of bounded variation were introduced by JORDAN (1881), who proved that any such function is the difference of two nondecreasing functions. The corresponding decomposition of signed measures was obtained by HAHN (1921). Integrals with respect to nondecreasing functions were defined by STIELTJES (1894), but their importance was not recognized until RIESZ (1909b) proved his representation theorem for linear functionals on C[0, 1]. The a.e. differentiability of a function of bounded variation was first proved by LEBESGUE (1904).

VITALI (1905) was the first author to see the connection between absolute continuity and the existence of a density. The Radon–Nikodym theorem was then proved in increasing generality by RADON (1913), DANIELL (1920), and NIKODYM (1930). The idea of a combined proof that also establishes the Lebesgue decomposition is due to VON NEUMANN.

Invariant measures on specific groups were early identified through explicit computation by many authors, notably by HURWITZ (1897) for the case of SO(n). HAAR (1933) proved the existence (but not the uniqueness) of invariant measures on an arbitrary lcscH group. The modern treatment originated with WEIL (1940), and excellent expositions can be found in many books on real or harmonic analysis. Invariant measures on more general spaces are usually approached via quotient spaces. Our discussion in Theorem 2.29 is adapted from ROYDEN (1988).

3. Processes, Distributions, and Independence

The use of countably additive probability measures dates back to BOREL (1909), who constructed random variables as measurable functions on the Lebesgue unit interval and proved Theorem 3.18 for independent events. CANTELLI (1917) noticed that the "easy" part remains true without the independence assumption. Lemma 3.5 was proved by JENSEN (1906) after HÖLDER had obtained a special case.

The modern framework, with random variables as measurable functions on an abstract probability space (Ω, A, P) and with expected values as P-integrals over Ω, was used implicitly by KOLMOGOROV from (1928) on and was later formalized in KOLMOGOROV (1933). The latter monograph also contains Kolmogorov's zero-one law, discovered long before HEWITT and SAVAGE (1955) obtained theirs.

Early work in probability theory deals with properties depending only on the finite-dimensional distributions. WIENER (1923) was the first author to construct the distribution of a process as a measure on a function space. The general continuity criterion in Theorem 3.23, essentially due to KOLMOGOROV, was first published by SLUTSKY (1937), with minor extensions later added by LOÈVE (1978) and CHENTSOV (1956). The general search for regularity properties was initiated by DOOB (1937, 1947). Soon it became clear, especially through the work of LÉVY (1934–35, 1954), DOOB (1951, 1953), and KINNEY (1953), that most processes of interest have right-continuous versions with left-hand limits.

More detailed accounts of the material in this chapter appear in many textbooks, such as BILLINGSLEY (1995), ITÔ (1984), and WILLIAMS (1991). Further discussions of specific regularity properties appear in LOÈVE (1977) and CRAMÉR and LEADBETTER (1967). Earlier texts tend to give more weight to distribution functions and their densities, less weight to measures and σ-fields.

4. Random Sequences, Series, and Averages

The weak law of large numbers was first obtained by BERNOULLI (1713) for the sequences named after him. More general versions were then established with increasing rigor by BIENAYMÉ (1853), CHEBYSHEV (1867), and MARKOV (1899). A necessary and sufficient condition for the weak law of large numbers was finally obtained by KOLMOGOROV (1928–29).

KHINCHIN and KOLMOGOROV (1925) studied series of independent, discrete random variables and showed that convergence holds under the condition in Lemma 4.16. KOLMOGOROV (1928–29) then obtained his maximum inequality and showed that the three conditions in Theorem 4.18 are necessary and sufficient for a.s. convergence. The equivalence with convergence in distribution was later noted by LÉVY (1954).

The strong law of large numbers for Bernoulli sequences was stated by BOREL (1909), but the first rigorous proof is due to FABER (1910). The simple criterion in Corollary 4.22 was obtained in KOLMOGOROV (1930). In (1933) KOLMOGOROV showed that existence of the mean is necessary and sufficient for the strong law of large numbers for general i.i.d. sequences. The extension to exponents p ≠ 1 is due to MARCINKIEWICZ and ZYGMUND (1937). Proposition 4.24 was proved in stages by GLIVENKO (1933) and CANTELLI (1933).

RIESZ (1909a) introduced the notion of convergence in measure, for probability measures equivalent to convergence in probability, and showed that it implies a.e. convergence along a subsequence. The weak compactness criterion in Lemma 4.13 is due to DUNFORD (1939). The functional representation of Proposition 4.31 appeared in KALLENBERG (1996a), and Corollary 4.32 was given by STRICKER and YOR (1978).

The theory of weak convergence was founded by ALEXANDROV (1940–43), who proved in particular the so-called portmanteau Theorem 4.25. The continuous mapping Theorem 4.27 was obtained for a single function fₙ ≡ f by MANN and WALD (1943) and then in the general case by PROHOROV (1956) and RUBIN. The coupling Theorem 4.30 is due for complete S to SKOROHOD (1956) and in general to DUDLEY (1968).

More detailed accounts of the material in this chapter may be found in many textbooks, such as LOÈVE (1977) and CHOW and TEICHER (1997). Additional results on random series and a.s. convergence appear in STOUT (1974) and KWAPIEŃ and WOYCZYŃSKI (1992).

5. Characteristic Functions and Classical Limit Theorems

The central limit theorem (a name first used by PÓLYA (1920)) has a long and glorious history, beginning with the work of DE MOIVRE (1733–56), who obtained the now-familiar approximation of binomial probabilities in terms of the normal density function. LAPLACE (1774, 1812–20) stated the general result in the modern integrated form, but his proof was incomplete, as was the proof of CHEBYSHEV (1867, 1890). The first rigorous proof was given by LIAPOUNOV (1901), though under an extra moment condition. Then LINDEBERG (1922a) proved his fundamental Theorem 5.12, which in turn led to the basic Proposition 5.9 in a series of papers by LINDEBERG (1922b) and LÉVY (1922a–c). BERNSTEIN (1927) obtained the first extension to higher dimensions. The general problem of normal convergence, regarded for two centuries as the central (indeed the only) theoretical problem in probability, was eventually solved in the form of Theorem 5.15, independently by FELLER (1935) and LÉVY (1935a). Slowly varying functions were introduced and studied by KARAMATA (1930).

Though characteristic functions have been used in probability theory ever since LAPLACE (1812–20), their first use in a rigorous proof of a limit theorem had to wait until LIAPOUNOV (1901). The first general continuity theorem was established by LÉVY (1922c), who assumed the characteristic functions to converge uniformly in some neighborhood of the origin. The definitive version in Theorem 5.22 is due to BOCHNER (1933). Our direct approach to Theorem 5.3 may be new, in avoiding the relatively deep HELLY selection theorem (1911–12). The basic Corollary 5.5 was noted by CRAMÉR and WOLD (1936).

Introductions to characteristic functions and classical limit theorems may be found in many textbooks, notably LOÈVE (1977). FELLER (1971) is a rich source of further information on Laplace transforms, characteristic functions, and classical limit theorems. For more detailed or advanced results on characteristic functions, see LUKACS (1970).


6. Conditioning and Disintegration

Though conditional densities have been computed by statisticians ever since LAPLACE (1774), the first general approach to conditioning was devised by KOLMOGOROV (1933), who defined conditional probabilities and expectations as random variables on the basic probability space, using the Radon–Nikodym theorem, which had recently become available. His original notion of conditioning with respect to a random vector was extended by HALMOS (1950) to general random elements and then by DOOB (1953) to abstract sub-σ-fields.

Our present Hilbert space approach to conditioning, essentially due to VON NEUMANN (1940), is more elementary and intuitive and avoids the use of the relatively deep Radon–Nikodym theorem. It has the further advantage of leading to the attractive interpretation of a martingale as a projective family of random variables.

The existence of regular conditional distributions was studied by several authors, beginning with DOOB (1938). It leads immediately to the familiar disintegration of measures on product spaces and to the frequently used but rarely stated disintegration Theorem 6.4.

Measures on infinite product spaces were first considered by DANIELL (1918–19, 1919–20), who proved the extension Theorem 6.14 for countable product spaces. KOLMOGOROV (1933) extended the result to arbitrary index sets. ŁOMNICKI and ULAM (1934) noted that no topological assumptions are needed for the construction of infinite product measures, a result that was later extended by C.T. IONESCU TULCEA (1949–50) to measures specified by a sequence of conditional distributions.

The interpretation of the simple Markov property in terms of conditional independence was indicated already by MARKOV (1906), and the formal statement of Proposition 6.6 appears in DOOB (1953). Further properties of conditional independence have been listed by DÖHLER (1980) and others. The transfer Theorem 6.10, in the present form quoted from KALLENBERG (1988), may have been first noted by THORISSON. The traditional Radon–Nikodym approach to conditional expectations appears in many textbooks, such as BILLINGSLEY (1995).

7. Martingales and Optional Times Martingales were first introduced by BERNSTEIN (1927, 1937) in his efforts to relax the independence assumption in the classical limit theorems. Both BERNSTEIN and LEVY (1935a-b, 1954) extended Kolmogorov's maximum inequality and the central limit theorem to a general martingale context. The term martingale (originally denoting part of a horse's harness and later used for a special gambling system) was introduced in the probabilistic context by VILLE (1939).

574

Foundations of Modern Probability

The first martingale convergence theorem was obtained by JESSEN (1934) and LEVY (1935b), both of whom proved Theorem 7.23 for filtrations generated by sequences of independent random variables. A submartingale version of the same result appears in SPARRE-ANDERSEN and JESSEN (1948). The independence assumption was removed by LEVY (1954), who also noted the simple martingale proof of Kolmogorov's zero-one law and obtained his conditional version of the Borel-Cantelli lemma. The general convergence theorem for discrete-time martingales was proved by DOOB (1940), and the basic regularity theorems for continuous-time martingales first appeared in DOOB (1951). The theory was extended to submartingales by SNELL (1952) and DOOB (1953). The latter book is also the original source of such fundamental results as the martingale closure theorem, the optional sampling theorem, and the L^p-inequality. Though hitting times have long been used informally, general optional times seem to appear for the first time in DOOB (1936). Abstract filtrations were not introduced until DOOB (1953). Progressive processes were introduced by DYNKIN (1961), and the modern definition of the σ-fields F_τ is due to YUSHKEVICH. Elementary introductions to martingale theory are given by many authors, including WILLIAMS (1991). More information about the discrete-time case is given by NEVEU (1975) and CHOW and TEICHER (1997). For a detailed account of the continuous-time theory and its relations to Markov processes and stochastic calculus, see DELLACHERIE and MEYER (1975-87).

8. Markov Processes and Discrete-Time Chains Markov chains in discrete time and with finitely many states were introduced by MARKOV (1906), who proved the first ergodic theorem, assuming the transition probabilities to be strictly positive. KOLMOGOROV (1936a-b) extended the theory to countable state spaces and arbitrary transition probabilities. In particular, he noted the decomposition of the state space into irreducible sets, classified the states with respect to recurrence and periodicity, and described the asymptotic behavior of the n-step transition probabilities. Kolmogorov's original proofs were analytic. The more intuitive coupling approach was introduced by DOEBLIN (1938), long before the strong Markov property had been formalized. BACHELIER had noted the connection between random walks and diffusions, which inspired KOLMOGOROV (1931a) to give a precise definition of Markov processes in continuous time. His treatment is purely analytic, with the distribution specified by a family of transition kernels satisfying the Chapman-Kolmogorov relation, previously noted in special cases by CHAPMAN (1928) and SMOLUCHOVSKY. KOLMOGOROV (1931a) makes no reference to sample paths. The transition to probabilistic methods began with the work of LEVY (1934-35) and DOEBLIN (1938). Though the strong Markov property was used informally
by those authors (and indeed already by BACHELIER (1900, 1901)), the result was first stated and proved in a special case by DOOB (1945). General filtrations were introduced in Markov process theory by BLUMENTHAL (1957). The modern setup, with a canonical process X defined on the path space Ω, equipped with a filtration F, a family of shift operators θ_t, and a collection of probability measures P_x, was developed systematically by DYNKIN (1961, 1965). A weaker form of Theorem 8.23 appears in BLUMENTHAL and GETOOR (1968), and the present version is from KALLENBERG (1987, 1998). Elementary introductions to Markov processes appear in many textbooks, such as ROGERS and WILLIAMS (2000a) and CHUNG (1982). More detailed or advanced accounts are given by DYNKIN (1965), BLUMENTHAL and GETOOR (1968), ETHIER and KURTZ (1986), DELLACHERIE and MEYER (1975-87), and SHARPE (1988). FELLER (1968) gives a masterly introduction to Markov chains, later imitated by many authors. More detailed accounts of the discrete-time theory appear in KEMENY et al. (1966) and FREEDMAN (1971a). The coupling method fell into oblivion after Doeblin's untimely death in 1940 but has recently enjoyed a revival, meticulously documented by LINDVALL (1992) and THORISSON (2000).

9. Random Walks and Renewal Theory Random walks originally arose in a wide range of applications, such as gambling, queuing, storage, and insurance; their history can be traced back to the origins of probability. The approximation of diffusion processes by random walks dates back to BACHELIER (1900, 1901). A further application was to potential theory, where in the 1920s a method of discrete approximation was devised, admitting a probabilistic interpretation in terms of a simple symmetric random walk. Finally, random walks played an important role in the sequential analysis developed by WALD (1947). The modern theory began with POLYA's (1921) discovery that a simple symmetric random walk on ℤ^d is recurrent for d ≤ 2 and transient otherwise. His result was later extended to Brownian motion by LEVY (1940) and KAKUTANI (1944a). The general recurrence criterion in Theorem 9.4 was derived by CHUNG and FUCHS (1951), and the probabilistic approach to Theorem 9.2 was found by CHUNG and ORNSTEIN (1962). The first condition in Corollary 9.7 is, in fact, even necessary for recurrence, as was noted independently by ORNSTEIN (1969) and C.J. STONE (1969). The reflection principle was first used by ANDRE (1887) in his discussion of the ballot problem. The systematic study of fluctuation and absorption problems for random walks began with the work of POLLACZEK (1930). Ladder times and heights, first introduced by BLACKWELL, were explored in an influential paper by FELLER (1949). The factorizations in Theorem 9.15 were originally derived by the Wiener-Hopf technique, which had been developed by PALEY and WIENER (1934) as a general tool in Fourier analysis.


Theorem 9.16 is due for u = 0 to SPARRE-ANDERSEN (1953-54) and in general to BAXTER (1961). The former author used complicated combinatorial methods, which were later simplified by FELLER and others. Though renewals in Markov chains are implicit already in some early work of KOLMOGOROV and LEVY, the general renewal process was apparently first introduced by PALM (1943). The first renewal theorem was obtained by ERDÖS et al. (1949) for random walks on ℤ_+. In that case, however, CHUNG noted that the result is an easy consequence of KOLMOGOROV's (1936a-b) ergodic theorem for Markov chains on a countable state space. BLACKWELL (1948, 1953) extended the result to random walks on ℝ_+. The ultimate version for transient random walks on ℝ is due to FELLER and OREY (1961). The first coupling proof of Blackwell's theorem was given by LINDVALL (1977). Our proof is a modification of an argument by ATHREYA et al. (1978), which originally did not cover all cases. The method seems to require the existence of a possibly infinite mean. An analytic approach to the general case appears in FELLER (1971). Elementary introductions to random walks are given by many authors, including CHUNG (1974), FELLER (1968, 1971), and LOEVE (1977). A detailed exposition of random walks on ℤ^d is given by SPITZER (1976).

10. Stationary Processes and Ergodic Theory The history of ergodic theory dates back to BOLTZMANN's (1887) work in statistical mechanics. Boltzmann's ergodic hypothesis, the conjectural equality between time and ensemble averages, was long accepted as a heuristic principle. In probabilistic terms it amounts to the convergence t^{-1} ∫_0^t f(X_s) ds → Ef(X_0), where X_t represents the state of the system (typically the configuration of all molecules in a gas) at time t, and the expected value is computed with respect to a suitably invariant probability measure on a compact submanifold of the state space. The ergodic hypothesis was sensationally proved as a mathematical theorem, first in an L^2-version by VON NEUMANN (1932), after KOOPMAN (1931) had noted the connection between measure-preserving transformations and unitary operators on a Hilbert space, and shortly afterwards in the pointwise form of BIRKHOFF (1932). The initially quite intricate proof of the latter was simplified in stages: first by YOSIDA and KAKUTANI (1939), who noted how the result follows easily from the maximal ergodic Lemma 10.7, and then by GARSIA (1965), who gave a short proof of the latter result. KHINCHIN (1933, 1934) pioneered a translation of the results of ergodic theory into the probabilistic setting of stationary sequences and processes. The first multivariate ergodic theorem was obtained by WIENER (1939), who proved Theorem 10.14 in the special case of averages over concentric balls. More general versions were established by many authors, including DAY (1942) and PITT (1942). The classical methods were pushed to the
limit in a notable paper by TEMPEL'MAN (1972). NGUYEN and ZESSIN (1979) proved versions of the theorem for finitely additive set functions. The first ergodic theorem for noncommutative transformations was obtained by ZYGMUND (1951). SUCHESTON (1983) noted that the statement follows easily from MAKER's (1940) result. In Lemma 10.15, part (i) is due to ROGERS and SHEPHARD (1958); part (ii) is elementary. The ergodic theorem for random matrices was proved by FURSTENBERG and KESTEN (1960), long before the subadditive ergodic theorem became available. The latter result was originally proved by KINGMAN (1968) under the stronger hypothesis that the array (X_{m,n}) be jointly stationary in m and n. The present extension and shorter proof are due to LIGGETT (1985). The ergodic decomposition of invariant measures dates back to KRYLOV and BOGOLIOUBOV (1937), though the basic role of the invariant σ-field was not recognized until the work of FARRELL (1962) and VARADARAJAN (1963). The connection between ergodic decompositions and sufficient statistics is explored in an elegant paper by DYNKIN (1978). The traditional approach to the subject is via Choquet theory, as surveyed by DELLACHERIE and MEYER (1975-87). The coupling equivalences in Theorem 10.27 (i) were proved by S. GOLDSTEIN (1979), after GRIFFEATH (1975) had obtained a related result for Markov chains. The shift coupling part of the same theorem was established by BERBEE (1979) and ALDOUS and THORISSON (1993), and the version for abstract groups was then obtained by THORISSON (1996). The latter author surveyed the whole area in (2000). Elementary introductions to stationary processes have been given by many authors, beginning with DOOB (1953) and CRAMER and LEADBETTER (1967). LOEVE (1978) contains a more advanced account of probabilistic ergodic theory. A modern and comprehensive survey of the vast area of general ergodic theorems is given by KRENGEL (1985).

11. Special Notions of Symmetry and Invariance Palm distributions are named after the Swedish engineer PALM (1943), who in a pioneering study of intensity fluctuations in telephone traffic considered some basic Palm probabilities associated with simple, stationary point processes on ℝ, using an elementary conditioning approach. Palm also derived some primitive inversion formulas. An extended and more rigorous account of Palm's ideas was given by KHINCHIN (1955), in a monograph on queuing theory. Independently of Palm's work, KAPLAN (1955) first obtained Theorem 11.4 as an extension of some results for renewal processes by DOOB (1948). A partial discrete-time result in this direction had already been noted by KAC (1947). Kaplan's result was rediscovered in the setting of Palm distributions, independently by RYLL-NARDZEWSKI (1961) and SLIVNYAK (1962). In the special case of intervals on the real line, Theorem 11.5 (i) was
first noted by KOROLYUK (as cited by KHINCHIN (1955)), and part (iii) of the same theorem was obtained by RYLL-NARDZEWSKI (1961). The general versions are due to KÖNIG and MATTHES (1963) and MATTHES (1963) for d = 1 and to MATTHES et al. (1978) for d > 1. A more primitive setwise version of Theorem 11.8 (i), due to SLIVNYAK (1962), was strengthened by ZÄHLE (1980) to convergence in total variation. DE FINETTI (1930, 1937) proved that an infinite sequence of exchangeable random variables is mixed i.i.d. The result became a cornerstone in his theory of subjective probability and Bayesian statistics. RYLL-NARDZEWSKI (1957) noted that the theorem remains valid under the weaker hypothesis of spreadability, and BÜHLMANN (1960) extended the result to continuous time. The predictable sampling property in Theorem 11.13 was first noted by DOOB (1936) for i.i.d. random variables and increasing sequences of predictable times. The general result and its continuous-time counterpart appear in KALLENBERG (1988). SPARRE-ANDERSEN's (1953-54) announcement of his Corollary 11.14 was (according to Feller) "a sensation greeted with incredulity, and the original proof was of an extraordinary intricacy and complexity." A simplified argument (different from ours) appears in FELLER (1971). Lemma 11.9 is quoted from KALLENBERG (1999b). BERTRAND (1887) noted that if two candidates A and B in an election get the proportions p and 1 − p of the votes, then the probability that A will lead throughout the counting of ballots equals (2p − 1) ∨ 0. More general "ballot theorems" and alternative proofs have been discovered by many authors, beginning with ANDRE (1887) and BARBIER (1887). TAKACS (1967) obtained the version for cyclically stationary processes on a finite interval and gave numerous applications to queuing theory. The present statement is cited from KALLENBERG (1999a).
The first version of Theorem 11.18 was obtained by SHANNON (1948), who proved the convergence in probability for stationary and ergodic Markov chains in a finite state space. The Markovian restriction was lifted by McMILLAN (1953), who also strengthened the result to convergence in L^1. CARLESON (1958) extended McMillan's result to countable state spaces. The a.s. convergence is due to BREIMAN (1957-60) and A. IONESCU TULCEA (1960) for finite state spaces and to CHUNG (1961) for the countable case. More information about Palm measures is available in MATTHES et al. (1978), DALEY and VERE-JONES (1988), and THORISSON (2000). Applications to queuing theory and other areas are discussed by many authors, including FRANKEN et al. (1981) and BACCELLI and BREMAUD (1994). ALDOUS (1985) gives a comprehensive survey of exchangeability theory. A nice introduction to information theory is given by BILLINGSLEY (1965).


12. Poisson and Pure Jump-Type Markov Processes The Poisson distribution was introduced by DE MOIVRE (1711-12) and POISSON (1837) as an approximation to the binomial distribution. The associated process arose much later from miscellaneous applications. Thus, it was considered by LUNDBERG (1903) to model streams of insurance claims, by RUTHERFORD and GEIGER (1908) to describe the process of radioactive decay, and by ERLANG (1909) to model the incoming traffic to a telephone exchange. Poisson random measures in higher dimensions appear implicitly in the work of LEVY (1934-35), whose treatment was later formalized by ITÔ (1942b). The independent-increment characterization of Poisson processes goes back to ERLANG (1909) and LEVY (1934-35). Cox processes, originally introduced by COX (1955) under the name of doubly stochastic Poisson processes, were thoroughly explored by KINGMAN (1964), KRICKEBERG (1972), and GRANDELL (1976). Thinnings were first considered by RENYI (1956). The binomial construction of general Poisson processes was noted independently by KINGMAN (1967) and MECKE (1967). One-dimensional uniqueness criteria were obtained, first in the Poisson case by RENYI (1967), and then in general by MÖNCH (1971), KALLENBERG (1973a, 1986), and GRANDELL (1976). The mixed Poisson and binomial processes were studied extensively by MATTHES et al. (1978) and KALLENBERG (1986). Markov chains in continuous time have been studied by many authors, beginning with KOLMOGOROV (1931a). The transition functions of general pure jump-type Markov processes were explored by POSPISIL (1935-36) and FELLER (1936, 1940), and the corresponding sample path properties were examined by DOEBLIN (1939b) and DOOB (1942b). The first continuous-time version of the strong Markov property was obtained by DOOB (1945). KINGMAN (1993) gives an elementary introduction to Poisson processes with numerous applications.
More detailed accounts, set in the context of general random measures and point processes, appear in MATTHES et al. (1978), KALLENBERG (1986), and DALEY and VERE-JONES (1988). Introductions to continuous-time Markov chains are provided by many authors, beginning with FELLER (1968). For a more comprehensive account, see CHUNG (1960). The underlying regenerative structure was examined by KINGMAN (1972).

13. Gaussian Processes and Brownian Motion The Gaussian density function first appeared in the work of DE MOIVRE (1733-56), and the corresponding distribution became explicit through the work of LAPLACE (1774, 1812-20). The Gaussian law was popularized by GAUSS (1809) in his theory of errors and so became named after him. MAXWELL derived the Gaussian law as the velocity distribution for the molecules in a gas, assuming the hypotheses of Proposition 13.2. Theorem
13.3 was originally stated by SCHOENBERG (1938) as a relation between positive definite and completely monotone functions; the probabilistic interpretation was later noted by FREEDMAN (1962-63). Isonormal Gaussian processes were introduced by SEGAL (1954). The process of Brownian motion was introduced by BACHELIER (1900, 1901) to model fluctuations on the stock market. Bachelier discovered some basic properties of the process, such as the relation M_t =d |B_t|. EINSTEIN (1905, 1906) later introduced the same process as a model for the physical phenomenon of Brownian motion, the irregular movement of microscopic particles suspended in a liquid. The latter phenomenon, first noted by VAN LEEUWENHOEK in the seventeenth century, is named after the botanist BROWN (1828) for his systematic observations of pollen grains. Einstein's theory was forwarded in support of the still-controversial molecular theory of matter. A more refined model for the physical Brownian motion was proposed by LANGEVIN (1909) and ORNSTEIN and UHLENBECK (1930). The mathematical theory of Brownian motion was put on a rigorous basis by WIENER (1923), who constructed the associated distribution as a measure on the space of continuous paths. The significance of Wiener's revolutionary paper was not fully recognized until after the pioneering work of KOLMOGOROV (1931a, 1933), LEVY (1934-35), and FELLER (1936). Wiener also introduced stochastic integrals of deterministic L^2-functions, which were later studied in further detail by PALEY et al. (1933). The spectral representation of stationary processes, originally deduced from BOCHNER's (1932) theorem by CRAMER (1942), was later recognized as equivalent to a general Hilbert space result due to M.H. STONE (1932). The chaos expansion of Brownian functionals was discovered by WIENER (1938), and the theory of multiple integrals with respect to Brownian motion was developed in a seminal paper of ITÔ (1951c).
The law of the iterated logarithm was discovered by KHINCHIN, first (1923, 1924) for Bernoulli sequences, and later (1933) for Brownian motion. A systematic study of the Brownian paths was initiated by LEVY (1954, 1965), who proved the existence of the quadratic variation in (1940) and the arcsine laws in (1939, 1965). Though many proofs of the latter have since been given, the present deduction from basic symmetry properties may be new. The strong Markov property was used implicitly in the work of Levy and others, but the result was not carefully stated and proved until HUNT (1956). Many modern probability texts contain detailed introductions to Brownian motion. The books by ITÔ and McKEAN (1965), FREEDMAN (1971b), KARATZAS and SHREVE (1991), and REVUZ and YOR (1999) provide a wealth of further information on the subject. Further information on multiple Wiener-Itô integrals is given by KALLIANPUR (1980), DELLACHERIE et al. (1992), and NUALART (1995). The advanced theory of Gaussian distributions is nicely surveyed by ADLER (1990).


14. Skorohod Embedding and Invariance Principles The first functional limit theorems were obtained in (1931b, 1933a) by KOLMOGOROV, who considered special functionals of a random walk. ERDÖS and KAC (1946, 1947) conceived the idea of an invariance principle that would allow functional limit theorems to be extended from particular cases to a general setting. They also treated some special functionals of a random walk. The first general functional limit theorems were obtained by DONSKER (1951-52) for random walks and empirical distribution functions, following an idea of DOOB (1949). A general theory based on sophisticated compactness arguments was later developed by PROHOROV (1956) and others. SKOROHOD's (1965) embedding theorem provided a new and probabilistic approach to Donsker's theorem. Extensions to the martingale context were obtained by many authors, beginning with DUBINS (1968). Lemma 14.19 appears in DVORETZKY (1972). Donsker's weak invariance principle was supplemented by a strong version due to STRASSEN (1964), which yields extensions of many a.s. limit theorems for Brownian motion to suitable random walks. In particular, his result yields a simple proof of the HARTMAN and WINTNER (1941) law of the iterated logarithm, which had originally been deduced from some deep results of KOLMOGOROV (1929). BILLINGSLEY (1968) gives many interesting applications and extensions of Donsker's theorem. For a wide range of applications of the martingale embedding theorem, see HALL and HEYDE (1980) and DURRETT (1995). KOMLOS et al. (1975-76) showed that the approximation rate in the Skorohod embedding can be improved by a more delicate "strong approximation." For an exposition of their work and its numerous applications, see CSÖRGÖ and REVESZ (1981).

15. Independent Increments and Infinite Divisibility Until the 1920s, Brownian motion and the Poisson process were essentially the only known processes with independent increments. In (1924, 1925) LEVY introduced the stable distributions and noted that they too could be associated with suitable "decomposable" processes. DE FINETTI (1929) saw the general connection between processes with independent increments and infinitely divisible distributions and posed the problem of characterizing the latter. A partial solution for distributions with a finite second moment was found by KOLMOGOROV (1932). The complete solution was obtained in a revolutionary paper by LEVY (1934-35), where the "decomposable" processes are analyzed by a virtuosic blend of analytic and probabilistic methods, leading to an explicit description in terms of a jump and a diffusion component. As a byproduct, Levy
obtained the general representation for the associated characteristic functions. His analysis was so complete that only improvements in detail have since been possible. In particular, ITÔ (1942b) showed how the jump component can be expressed in terms of Poisson integrals. Analytic derivations of the representation formula for the characteristic function were later given by LEVY (1954) himself, by FELLER (1937), and by KHINCHIN (1937). The scope of the classical central limit problem was broadened by LEVY (1925) to a general study of suitably normalized partial sums, obtained from a single sequence of independent random variables. To include the case of the classical Poisson approximation, KOLMOGOROV proposed a further extension to general triangular arrays, subject to the sole condition of uniformly asymptotically negligible elements. In this context, FELLER (1937) and KHINCHIN (1937) proved independently that the limiting distributions are infinitely divisible. It remained to characterize the convergence to specific limits, a problem that had already been solved in the Gaussian case by FELLER (1935) and LEVY (1935a). The ultimate solution was obtained independently by DOEBLIN (1939) and GNEDENKO (1939), and a comprehensive exposition of the theory was published by GNEDENKO and KOLMOGOROV (1968). The basic convergence Theorem 15.17 for Levy processes and the associated approximation result for random walks in Corollary 15.20 are essentially due to SKOROHOD (1957), though with rather different statements and proofs. Lemma 15.22 appears in DOEBLIN (1939a). Our approach to the basic representation theorem is a modernized version of Levy's proof, with simplifications resulting from the use of basic point process and martingale methods. Detailed accounts of the basic limit theory for null arrays are provided by many authors, including LOEVE (1977) and FELLER (1971). The positive case is treated in KALLENBERG (1986).
A modern introduction to Levy processes is given by BERTOIN (1996). General independent-increment processes and associated limit theorems are treated in JACOD and SHIRYAEV (1987). Extreme value theory is surveyed by LEADBETTER et al. (1983).

16. Convergence of Random Processes, Measures, and Sets After DONSKER (1951-52) had proved his functional limit theorems for random walks and empirical distribution functions, a general theory of weak convergence in function spaces was developed by the Russian school, in seminal papers by PROHOROV (1956), SKOROHOD (1956, 1957), and KOLMOGOROV (1956). Thus, PROHOROV (1956) proved his fundamental compactness Theorem 16.3, in a setting for separable and complete metric spaces. The abstract theory was later extended in various directions by
LE CAM (1957), VARADARAJAN (1958), and DUDLEY (1966, 1967). The elementary inequality of OTTAVIANI is from (1939). Originally SKOROHOD (1956) considered the space D([0,1]) endowed with four different topologies, of which the J_1-topology considered here is by far the most important for applications. The theory was later extended to D(ℝ_+) by C.J. STONE (1963) and LINDVALL (1973). Tightness was originally verified by means of various product moment conditions, developed by CHENTSOV (1956) and BILLINGSLEY (1968), before the powerful criterion of ALDOUS (1978) became available. KURTZ (1975) and MITOMA (1983) noted that criteria for tightness in D(ℝ_+, S) can often be expressed in terms of one-dimensional projections, as in Theorem 16.27. The weak convergence theory for random measures and point processes originated with PROHOROV (1961), who noted the equivalence of (i) and (ii) in Theorem 16.16 when S is compact. The development continued with seminal papers by DEBES et al. (1970-71), HARRIS (1971), and JAGERS (1974). The one-dimensional criteria in Proposition 16.17 and Theorems 16.16 and 16.29 are based on results in KALLENBERG (1973a, 1986, 1996b) and a subsequent remark by KURTZ. Random sets had already been studied extensively by many authors, including CHOQUET (1953-54), KENDALL (1974), and MATHERON (1975), when an associated weak convergence theory was developed by NORBERG (1984). The applications considered in this chapter have a long history. Thus, primitive versions of Theorem 16.18 were obtained by PALM (1943), KHINCHIN (1955), and OSOSKOV (1956). The present version is due for S = ℝ to GRIGELIONIS (1963) and for more general spaces to GOLDMAN (1967) and JAGERS (1972). Limit theorems under simultaneous thinning and rescaling of a given point process were obtained by RENYI (1956), NAWROTZKI (1962), BELYAEV (1963), and GOLDMAN (1967).
The general version in Theorem 16.19 was proved by KALLENBERG (1986) after MECKE (1968) had obtained his related characterization of Cox processes. Limit theorems for sampling from a finite population and for general exchangeable sequences have been proved in varying generality by many authors, including CHERNOFF and TEICHER (1958), HAJEK (1960), ROSEN (1964), BILLINGSLEY (1968), and HAGBERG (1973). The results of Theorems 16.23 and 16.21 first appeared in KALLENBERG (1973b). Detailed accounts of weak convergence theory and its applications may be found in several excellent textbooks and monographs, including BILLINGSLEY (1968), POLLARD (1984), ETHIER and KURTZ (1986), and JACOD and SHIRYAEV (1987). More information on limit theorems for random measures and point processes is available in MATTHES et al. (1978) and KALLENBERG (1986). A good general reference for random sets is MATHERON (1975).


17. Stochastic Integrals and Quadratic Variation The first stochastic integral with a random integrand was defined by ITÔ (1942a, 1944), who used Brownian motion as the integrator and assumed the integrand to be product measurable and adapted. DOOB (1953) noted the connection with martingale theory. A first version of the fundamental substitution rule was proved by ITÔ (1951a). The result was later extended by many authors. The compensated integral in Corollary 17.21 was introduced by FISK, and independently by STRATONOVICH (1966). The existence of the quadratic variation process was originally deduced from the Doob-Meyer decomposition. FISK (1966) showed how the quadratic variation can also be obtained directly from the process, as in Proposition 17.17. The present construction was inspired by ROGERS and WILLIAMS (2000b). The BDG inequalities were originally proved for p > 1 and discrete time by BURKHOLDER (1966). MILLAR (1968) noted the extension to continuous martingales, in which context the further extension to arbitrary p > 0 was obtained independently by BURKHOLDER and GUNDY (1970) and NOVIKOV (1971). KUNITA and WATANABE (1967) introduced the covariation of two martingales and proved the associated characterization of the integral. They further established some general inequalities related to Proposition 17.9. The Itô integral was extended to square-integrable martingales by COURREGE (1962-63) and KUNITA and WATANABE (1967) and to continuous semimartingales by DOLEANS-DADE and MEYER (1970). The idea of localization is due to ITÔ and WATANABE (1965). Theorem 17.24 was obtained by KAZAMAKI (1972) as part of a general theory of random time change. Stochastic integrals depending on a parameter were studied by DOLEANS (1967b) and STRICKER and YOR (1978), and the functional representation of Proposition 17.26 first appeared in KALLENBERG (1996a). Elementary introductions to Itô integration appear in many textbooks, such as CHUNG and WILLIAMS (1983) and ØKSENDAL (1998).
For more advanced accounts and for further information, see IKEDA and WATANABE (1989), ROGERS and WILLIAMS (2000b), KARATZAS and SHREVE (1991), and REVUZ and YOR (1999).

18. Continuous Martingales and Brownian Motion The fundamental characterization of Brownian motion in Theorem 18.3 was proved by LEVY (1954), who also (1940) noted the conformal invariance up to a time change of complex Brownian motion and stated the polarity of singletons. A rigorous proof of Theorem 18.6 was later provided by KAKUTANI (1944a-b). KUNITA and WATANABE (1967) gave the first modern proof of Levy's characterization theorem, based on Itô's formula and exponential martingales. The history of the latter can be traced back to the seminal CAMERON and MARTIN (1944) paper, the source of Theorem

Historical and Bibliographical Notes

585

18.22, and to WALD's (1946, 1947) work in sequential analysis, where the identity of Lemma 18.24 first appeared in a version for random walks. The integral representation in Theorem 18.10 is essentially due to ITÖ (1951c), who noted its connection with multiple stochastic integrals and chaos expansions. A one-dimensional version of Theorem 18.12 appears in DooB (1953). The general time-change Theorem 18.4 was discovered independently by DAMBIS (1965) and DUBINS and SCHWARZ (1965), and a systematic study of isotropic martingales was initiated by GETOOR and SHARPE (1972). The multivariate result in Proposition 18.8 was noted by KNIGHT (1971), and a version of Proposition 18.9 for general exchangeable processes appears in KALLENBERG (1989). The skew-product representation in Corollary 18.7 is due to GALMARINO (1963), The Cameron-Martin theorem was gradually extended to more general Settings by many authors, including MARUYAMA (1954, 1955), GIRSANOV (1960), and VAN SCHUPPEN and WoNG (1974). The martingale criterion of Theorem 18.23 was obtained by NOVIKOV (1972). The material in this chapter is covered by many texts, including the excellent monographs by KARATZAS and SHREVE (1991) and REVUZ and YoR (1999). A more advanced and amazingly informative text is JACOD (1979).

19. Feller Processes and Semigroups Semigroup ideas are implicit in KOLMOGOROV's pioneering (1931a) paper, whose central theme is the search for local characteristics that will determine the transition probabilities through a system of differential equations, the so-called Kolmogorov forward and backward equations. Markov chains and diffusion processes were originally treated separately, but in (1935) KOLMOGOROV proposed a unified framework, with transition kernels regarded as operators (initially operating on measures rather than on functions), and with local characteristics given by an associated generator. Kolmogorov's ideas were taken up by FELLER (1936), who obtained general existence and uniqueness results for the forward and backward equations. The abstract theory of contraction semigroups on Banach spaces was developed independently by HILLE (1948) and YOSIDA (1948), both of whom recognized its significance for the theory of Markov processes. The power of the semigroup approach became clear through the work of FELLER (1952, 1954), who gave a complete description of the generators of one-dimensional diffusions. In particular, Feller characterized the boundary behavior of the process in terms of the domain of the generator. The systematic study of Markov semigroups began with the work of DYNKIN (1955a). The standard approach is to postulate strong continuity instead of the weaker and more easily verified condition (F2). The positive maximum principle appears in the work of ITÔ (1957), and the core condition of Proposition 19.9 is due to S. WATANABE (1968).

The first regularity theorem was obtained by DOEBLIN (1939b), who gave conditions for the paths to be step functions. A sufficient condition for continuity was then obtained by FORTET (1943). Finally, KINNEY (1953) showed that any Feller process has a version with rcll paths, after DYNKIN (1952) had obtained the same property under a Hölder condition. The use of martingale methods for the study of Markov processes dates back to KINNEY (1953) and DOOB (1954). The strong Markov property for Feller processes was proved independently by DYNKIN and YUSHKEVICH (1956) and by BLUMENTHAL (1957) after special cases had been considered by DOOB (1945), HUNT (1956), and RAY (1956). BLUMENTHAL's (1957) paper also contains his zero-one law. DYNKIN (1955a) introduced his "characteristic operator," and a version of Theorem 19.24 appears in DYNKIN (1956). There is a vast literature on approximation results for Markov chains and Markov processes, covering a wide range of applications. The use of semigroup methods to prove limit theorems can be traced back to LINDEBERG's (1922a) proof of the central limit theorem. The general results in Theorems 19.25 and 19.28 were developed in stages by TROTTER (1958a), SOVA (1967), KURTZ (1969, 1975), and MACKEVICIUS (1974). Our proof of Theorem 19.25 uses ideas from J.A. GOLDSTEIN (1976). A splendid introduction to semigroup theory is given by the relevant chapters in FELLER (1971). In particular, Feller shows how the one-dimensional Lévy-Khinchin formula and associated limit theorems can be derived by semigroup methods. More detailed and advanced accounts of the subject appear in DYNKIN (1965), ETHIER and KURTZ (1986), and DELLACHERIE and MEYER (1975-87).

20. Ergodic Properties of Markov Processes The first ratio ergodic theorems were obtained by DOEBLIN (1938b), DOOB (1938, 1948a), KAKUTANI (1940), and HUREWICZ (1944). HOPF (1954) and DUNFORD and SCHWARTZ (1956) extended the pointwise ergodic theorem to general L¹-L∞-contractions, and the ratio ergodic theorem was extended to positive L¹-contractions by CHACON and ORNSTEIN (1960). The present approach to their result is due to AKCOGLU and CHACON (1970). The notion of Harris recurrence goes back to DOEBLIN (1940) and HARRIS (1956). The latter author used the condition to ensure the existence, in discrete time, of a σ-finite invariant measure. A corresponding continuous-time result was obtained by H. WATANABE (1964). The total variation convergence of Markov transition probabilities was obtained for a countable state space by OREY (1959, 1962) and in general by JAMISON and OREY (1967). BLACKWELL and FREEDMAN (1964) noted the equivalence of mixing and tail triviality. The present coupling approach goes back to GRIFFEATH (1975) and S. GOLDSTEIN (1979) for the case of strong ergodicity and to BERBEE (1979) and ALDOUS and THORISSON (1993) for the corresponding weak result. There is an extensive literature on ergodic theorems for Markov processes, mostly dealing with the discrete-time case. General expositions have been given by many authors, beginning with NEVEU (1971) and OREY (1971). Our treatment of Harris recurrent Feller processes is adapted from KUNITA (1990), who in turn follows the discrete-time approach of REVUZ (1984). KRENGEL (1985) gives a comprehensive survey of abstract ergodic theorems. Detailed accounts of the coupling method and its various ramifications appear in LINDVALL (1992) and THORISSON (2000).

21. Stochastic Differential Equations and Martingale Problems Long before the existence of any general theory for SDEs, LANGEVIN (1908) proposed his equation to model the velocity of a Brownian particle. The solution process was later studied by ORNSTEIN and UHLENBECK (1930) and was thus named after them. A more rigorous discussion appears in DOOB (1942a). The general idea of a stochastic differential equation goes back to BERNSTEIN (1934, 1938), who proposed a pathwise construction of diffusion processes by a discrete approximation, leading in the limit to a formal differential equation driven by a Brownian motion. However, ITÔ (1942a, 1951b) was the first author to develop a rigorous and systematic theory, including a precise definition of the integral, conditions for existence and uniqueness of solutions, and basic properties of the solution process, such as the Markov property and the continuous dependence on initial state. Similar results were obtained, later but independently, by GIHMAN (1947, 1950-51). The notion of a weak solution was introduced by GIRSANOV (1960), and a version of the weak existence Theorem 21.9 appears in SKOROHOD (1965). The ideas behind the transformations in Propositions 21.12 and 21.13 date back to GIRSANOV (1960) and VOLKONSKY (1958), respectively. The notion of a martingale problem can be traced back to LÉVY's martingale characterization of Brownian motion and DYNKIN's theory of the characteristic operator. A comprehensive theory was developed by STROOCK and VARADHAN (1969), who established the equivalence with weak solutions to the associated SDEs, obtained general criteria for uniqueness in law, and deduced conditions for the strong Markov and Feller properties. The measurability part of Theorem 21.10 is a slight extension of an exercise in STROOCK and VARADHAN (1979). YAMADA and WATANABE (1971) proved that weak existence and pathwise uniqueness imply strong existence and uniqueness in law. Under the same conditions, they further established the existence of a functional solution, possibly depending on the initial distribution of the process; that dependence was later removed by KALLENBERG (1996a). IKEDA and WATANABE (1989) noted how the notions of pathwise uniqueness and uniqueness in law extend by conditioning from degenerate to arbitrary initial distributions. The basic theory of SDEs is covered by many excellent textbooks on different levels, including IKEDA and WATANABE (1989), ROGERS and WILLIAMS (1987), and KARATZAS and SHREVE (1991). More information on the martingale problem is available in JACOD (1979), STROOCK and VARADHAN (1979), and ETHIER and KURTZ (1986).

22. Local Time, Excursions, and Additive Functionals Local time of Brownian motion at a fixed point was discovered and explored by LÉVY (1939), who devised several explicit constructions, mostly of the type of Proposition 22.12. Much of Lévy's analysis is based on the observation in Corollary 22.3. The elementary Lemma 22.2 is due to SKOROHOD (1961-62). Formula (1), first noted for Brownian motion by TANAKA (1963), was taken by MEYER (1976) as the basis for a general semimartingale approach. The general Itô-Tanaka formula in Theorem 22.5 was obtained independently by MEYER (1976) and WANG (1977). TROTTER (1958b) proved that Brownian local time has a jointly continuous version, and the extension to general continuous semimartingales in Theorem 22.4 was obtained by YOR (1978). Modern excursion theory originated with the seminal paper of ITÔ (1972), which was partly inspired by earlier work of LÉVY (1939). In particular, Itô proved a version of Theorem 22.11, assuming the existence of local time. HOROWITZ (1972) independently studied regenerative sets and noted their connection with subordinators, equivalent to the existence of a local time. A systematic theory of regenerative processes was developed by MAISONNEUVE (1974). The remarkable Theorem 22.17 was discovered independently by RAY (1963) and KNIGHT (1963), and the present proof is essentially due to WALSH (1978). Our construction of the excursion process is close in spirit to Lévy's original ideas and to those in GREENWOOD and PITMAN (1980). Elementary additive functionals of integral type had been discussed extensively in the literature when DYNKIN proposed a study of the general case. The existence Theorem 22.23 was obtained by VOLKONSKY (1960), and the construction of local time in Theorem 22.24 dates back to BLUMENTHAL and GETOOR (1964). The integral representation of CAFs in Theorem 22.25 was proved independently by VOLKONSKY (1958, 1960) and MCKEAN and TANAKA (1961). The characterization of additive functionals in terms of suitable measures on the state space dates back to MEYER (1962), and the explicit representation of the associated measures was found by REVUZ (1970) after special cases had been considered by HUNT (1957-58).

An excellent introduction to local time appears in KARATZAS and SHREVE (1991). The books by ITÔ and MCKEAN (1965) and REVUZ and YOR (1999) contain an abundance of further information on the subject. The latter text may also serve as a good introduction to additive functionals and excursion theory. For more information on the latter topics, the reader may consult BLUMENTHAL and GETOOR (1968), BLUMENTHAL (1992), and DELLACHERIE et al. (1992).

23. One-Dimensional SDEs and Diffusions The study of continuous Markov processes and the associated parabolic differential equations, initiated by KOLMOGOROV (1931a) and FELLER (1936), took a new direction with the seminal papers of FELLER (1952, 1954), who studied the generators of one-dimensional diffusions within the framework of the newly developed semigroup theory. In particular, Feller gave a complete description in terms of scale function and speed measure, classified the boundary behavior, and showed how the latter is determined by the domain of the generator. Finally, he identified the cases when explosion occurs, corresponding to the absorption cases in Theorem 23.15. A more probabilistic approach to these results was developed by DYNKIN (1955b, 1959), who along with RAY (1956) continued Feller's study of the relationship between analytic properties of the generator and sample path properties of the process. The idea of constructing diffusions on a natural scale through a time change of Brownian motion is due to HUNT (1958) and VOLKONSKY (1958), and the full description in Theorem 23.9 was completed by VOLKONSKY (1960) and ITÔ and MCKEAN (1965). The present stochastic calculus approach is based on ideas in MÉLÉARD (1986). The ratio ergodic Theorem 23.14 was first obtained for Brownian motion by DERMAN (1954), by a method originally devised for discrete-time chains by DOEBLIN (1938). It was later extended to more general diffusions by MOTOO and WATANABE (1958). The ergodic behavior of recurrent one-dimensional diffusions was analyzed by MARUYAMA and TANAKA (1957). For one-dimensional SDEs, SKOROHOD (1965) noticed that Itô's original Lipschitz condition for pathwise uniqueness can be replaced by a weaker Hölder condition. He also obtained a corresponding comparison theorem. The improved conditions in Theorems 23.3 and 23.5 are due to YAMADA and WATANABE (1971) and YAMADA (1973), respectively. PERKINS (1982) and LE GALL (1983) noted how the use of semimartingale local time simplifies and unifies the proofs of those and related results. The fundamental weak existence and uniqueness criteria in Theorem 23.1 were discovered by ENGELBERT and SCHMIDT (1984, 1985), whose (1981) zero-one law is implicit in Lemma 23.2. Elementary introductions to one-dimensional diffusions appear in BREIMAN (1968), FREEDMAN (1971b), and ROGERS and WILLIAMS (2000b). More detailed and advanced accounts are given by DYNKIN (1965) and ITÔ and MCKEAN (1965). Further information on one-dimensional SDEs may be obtained from the excellent books by KARATZAS and SHREVE (1991) and REVUZ and YOR (1999).

24. Connections with PDEs and Potential Theory The fundamental solution to the heat equation in terms of the Gaussian kernel was obtained by LAPLACE (1809). A century later BACHELIER (1900, 1901) noted the relationship between Brownian motion and the heat equation. The PDE connections were further explored by many authors, including KOLMOGOROV (1931a), FELLER (1936), KAC (1951), and DOOB (1955). A first version of Theorem 24.1 was obtained by KAC (1949), who was in turn inspired by FEYNMAN's (1948) work on the Schrödinger equation. Theorem 24.2 is due to STROOCK and VARADHAN (1969). GREEN (1828), in his discussion of the Dirichlet problem, introduced the functions named after him. The Dirichlet, sweeping, and equilibrium problems were all studied by GAUSS (1840) in a pioneering paper on electrostatics. The rigorous developments in potential theory began with POINCARÉ (1890-99), who solved the Dirichlet problem for domains with a smooth boundary. The equilibrium measure was characterized by GAUSS as the unique measure minimizing a certain energy functional, but the existence of the minimum was not rigorously established until FROSTMAN (1935). The first probabilistic connections were made by PHILLIPS and WIENER (1923) and COURANT et al. (1928), who solved the Dirichlet problem in the plane by a method of discrete approximation, involving a version of Theorem 24.5 for a simple symmetric random walk. KOLMOGOROV and LEONTOVICH (1933) evaluated a special hitting distribution for two-dimensional Brownian motion and noted that it satisfies the heat equation. KAKUTANI (1944b, 1945) showed how the harmonic measure and sweeping kernel can be expressed in terms of a Brownian motion. The probabilistic methods were extended and perfected by DOOB (1954, 1955), who noted the profound connections with martingale theory. A general potential theory was later developed by HUNT (1957-58) for broad classes of Markov processes. The interpretation of Green functions as occupation densities was known to KAC (1951), and a probabilistic approach to Green functions was developed by HUNT (1956). The connection between equilibrium measures and quitting times, implicit already in SPITZER (1964) and ITÔ and MCKEAN (1965), was exploited by CHUNG (1973) to yield the explicit representation of Theorem 24.14. Time reversal of diffusion processes was first considered by SCHRÖDINGER (1931). KOLMOGOROV (1936b, 1937) computed the transition kernels of the reversed process and gave necessary and sufficient conditions for symmetry. The basic role of time reversal and duality in potential theory was recognized by DOOB (1954) and HUNT (1958). Proposition 24.15 and the related construction in Theorem 24.21 go back to HUNT, but Theorem 24.19 may be new. The measure ν in Theorem 24.21 is related to the "Kuznetsov measures," discussed extensively in GETOOR (1990). The connection between random sets and alternating capacities was established by CHOQUET (1953-54), and a corresponding representation of infinitely divisible random sets was obtained by MATHERON (1975). Elementary introductions to probabilistic potential theory appear in BASS (1995) and CHUNG (1995), and to other PDE connections in KARATZAS and SHREVE (1991). A detailed exposition of classical probabilistic potential theory is given by PORT and STONE (1978). DOOB (1984) provides a wealth of further information on both the analytic and probabilistic aspects. Introductions to Hunt's work and the subsequent developments are given by CHUNG (1982) and DELLACHERIE and MEYER (1975-87). More advanced treatments appear in BLUMENTHAL and GETOOR (1968) and SHARPE (1988).

25. Predictability, Compensation, and Excessive Functions The basic connection between superharmonic functions and supermartingales was established by DOOB (1954), who also proved that compositions of excessive functions with Brownian motion are continuous. Doob further recognized the need for a general decomposition theorem for supermartingales, generalizing the elementary Lemma 7.10. Such a result was eventually proved by MEYER (1962, 1963), in the form of Lemma 25.7, after special decompositions in the Markovian context had been obtained by VOLKONSKY (1960) and SHUR (1961). Meyer's original proof was profound and clever. The present more elementary approach, based on DUNFORD's (1939) weak compactness criterion, was devised by RAO (1969a). The extension to general submartingales was accomplished by ITÔ and WATANABE (1965) through the introduction of local martingales. Predictable and totally inaccessible times appear implicitly in the work of BLUMENTHAL (1957) and HUNT (1957-58), in the context of quasi-left-continuity. A systematic study of optional times and their associated σ-fields was initiated by CHUNG and DOOB (1965). The basic role of the predictable σ-field became clear after DOLÉANS (1967a) had proved the equivalence between naturalness and predictability for increasing processes, thereby establishing the ultimate version of the Doob-Meyer decomposition. The moment inequality in Proposition 25.21 was obtained independently by GARSIA (1973) and NEVEU (1975) after a more special result had been proved by BURKHOLDER et al. (1972). The theory of optional and predictable times and σ-fields was developed by MEYER (1966), DELLACHERIE (1972), and others into a "general theory of processes," which has in many ways revolutionized modern probability. Natural compensators of optional times first appeared in reliability theory. More general compensators were later studied in the Markovian context by S. WATANABE (1964) under the name of "Lévy systems." GRIGELIONIS (1971) and JACOD (1975) constructed the compensator of a general random measure and introduced the related "local characteristics" of a general semimartingale. WATANABE (1964) proved that a simple point process with a continuous and deterministic compensator is Poisson; a corresponding time-change result was obtained independently by MEYER (1971) and PAPANGELOU (1972). The extension in Theorem 25.24 was given by KALLENBERG (1990), and general versions of Proposition 25.27 appear in ROSINSKI and WOYCZYNSKI (1986) and KALLENBERG (1992). An authoritative account of the general theory, including an elegant but less elementary projection approach to the Doob-Meyer decomposition due to DOLÉANS, is given by DELLACHERIE and MEYER (1975-87). Useful introductions to the theory are contained in ELLIOTT (1982) and ROGERS and WILLIAMS (2000b). Our elementary proof of Lemma 25.10 uses ideas from DOOB (1984). BLUMENTHAL and GETOOR (1968) remains a good general reference on additive functionals and their potentials. A detailed account of random measures and their compensators appears in JACOD and SHIRYAEV (1987). Applications to queuing theory are given by BREMAUD (1981), BACCELLI and BREMAUD (2000), and LAST and BRANDT (1995).

26. Semimartingales and General Stochastic Integration DOOB (1953) conceived the idea of a stochastic integration theory for general L²-martingales, based on a suitable decomposition of continuous-time submartingales. MEYER's (1962) proof of such a result opened the door to the L²-theory, which was then developed by COURRÈGE (1962-63) and KUNITA and WATANABE (1967). The latter paper contains in particular a version of the general substitution rule. The integration theory was later extended in a series of papers by MEYER (1967) and DOLÉANS-DADE and MEYER (1970) and reached its final form with the notes of MEYER (1976) and the books by JACOD (1979), MÉTIVIER and PELLAUMAIL (1979), and DELLACHERIE and MEYER (1975-87). The basic role of predictable processes as integrands was recognized by MEYER (1967). By contrast, semimartingales were originally introduced in an ad hoc manner by DOLÉANS-DADE and MEYER (1970), and their basic preservation laws were only gradually recognized. In particular, JACOD (1975) used the general Girsanov theorem of VAN SCHUPPEN and WONG (1974) to show that the semimartingale property is preserved under absolutely continuous changes of the probability measure. The characterization of general stochastic integrators as semimartingales was obtained independently by BICHTELER (1979) and DELLACHERIE (1980), in both cases with support from analysts. Quasimartingales were originally introduced by FISK (1965) and OREY (1966). The decomposition of RAO (1969b) extends a result by KRICKEBERG (1956) for L¹-bounded martingales. YOEURP (1976) combined a notion of "stable subspaces" due to KUNITA and WATANABE (1967) with the Hilbert space structure of M² to obtain an orthogonal decomposition of L²-martingales, equivalent to the decompositions in Theorem 26.14 and Proposition 26.16. Elaborating on those ideas, MEYER (1976) showed that the purely discontinuous component admits a representation as a sum of compensated jumps. SDEs driven by general Lévy processes were already considered by ITÔ (1951b). The study of SDEs driven by general semimartingales was initiated by DOLÉANS-DADE (1970), who obtained her exponential process as a solution to the equation in Theorem 26.8. The scope of the theory was later expanded by many authors, and a comprehensive account is given by PROTTER (1990). The martingale inequalities in Theorems 26.12 and 26.17 have ancient origins. Thus, a version of the latter result for independent random variables was proved by KOLMOGOROV (1929) and, in a sharper form, by PROHOROV (1959). Their result was extended to discrete-time martingales by JOHNSON et al. (1985) and HITCZENKO (1990). The present statements appeared in KALLENBERG and SZTENCEL (1991). Early versions of the inequalities in Theorem 26.12 were proved by KHINCHIN (1923, 1924) for symmetric random walks and by PALEY (1932) for Walsh series. A version for independent random variables was obtained by MARCINKIEWICZ and ZYGMUND (1937, 1938). The extension to discrete-time martingales is due to BURKHOLDER (1966) for p > 1 and to DAVIS (1970) for p = 1. The result was extended to continuous time by BURKHOLDER et al.
(1972), who also noted how the general result can be deduced from the statement for p = 1. The present proof is a continuous-time version of Davis' original argument. Excellent introductions to semimartingales and stochastic integration are given by DELLACHERIE and MEYER (1975-87) and JACOD and SHIRYAEV (1987). PROTTER (1990) offers an interesting alternative approach, originally suggested by MEYER and by DELLACHERIE (1980). The book by JACOD (1979) remains a rich source of further information on the subject.

27. Large Deviations Large deviation theory originated with certain refinements of the central limit theorem obtained by many authors, beginning with KHINCHIN (1929). Here the object of study is the ratio of tail probabilities r_n(x) = P{ζ_n > x}/P{ζ > x}, where ζ is N(0,1) and ζ_n = n^{-1/2} Σ_{k≤n} ξ_k for some i.i.d. random variables ξ_k with mean 0 and variance 1, so that r_n(x) → 1 for fixed x. A precise asymptotic expansion was obtained by CRAMÉR (1938), in the case when x varies with n at a rate x = o(n^{1/2}). (See PETROV (1995), Theorem 5.23, for details.) In the same historic paper, CRAMÉR (1938) obtained the first true large deviation result, in the form of our Theorem 27.3, though under some technical assumptions that were later removed by CHERNOFF (1952) and BAHADUR (1971). VARADHAN (1966) extended the result to higher dimensions and rephrased it in the form of a general large deviation principle. At about the same time, SCHILDER (1966) proved his large deviation result for Brownian motion, using the present change-of-measure approach. Similar methods were used by FREIDLIN and WENTZELL (1970, 1998) to study random perturbations of dynamical systems. Even earlier, SANOV (1957) had obtained his large deviation result for empirical distributions of i.i.d. random variables. The relative entropy H(ν|μ) appearing in the limit had already been introduced in statistics by KULLBACK and LEIBLER (1951). Its crucial link to the Legendre-Fenchel transform Λ*, long anticipated by physicists, was formalized by DONSKER and VARADHAN (1975-83). The latter authors also developed some profound and far-reaching extensions of Sanov's theorem, in a long series of formidable papers. ELLIS (1985) gives a detailed exposition of those results, along with a discussion of their physical significance. Much of the formalization of underlying principles and techniques was developed at a later stage. Thus, an abstract version of the projective limit approach was introduced by DAWSON and GÄRTNER (1987). BRYC (1990) supplemented VARADHAN's (1966) functional version of the LDP with a reverse proposition. Similarly, IOFFE (1991) appended a powerful inverse to the classical "contraction principle." Finally, PUKHALSKY (1991) established the equivalence, under suitable regularity conditions, of the exponential tightness and the goodness of the rate function. STRASSEN (1964) established his formidable law of the iterated logarithm by direct estimates. A detailed exposition of the original approach appears in FREEDMAN (1971b). VARADHAN (1984) recognized the result as a corollary to Schilder's theorem, and a complete proof along the suggested lines appears in DEUSCHEL and STROOCK (1989). Gentle introductions to large deviation theory and its applications are given by VARADHAN (1984) and DEMBO and ZEITOUNI (1998). The more demanding text of DEUSCHEL and STROOCK (1989) provides much additional insight to the persistent reader.

Appendix Some more advanced aspects of measure theory are covered by ROYDEN (1988), PARTHASARATHY (1967), and DUDLEY (1989). The projection and section theorems depend on capacity theory, for which we refer to DELLACHERIE (1972) and DELLACHERIE and MEYER (1975-87). The J1-topology was introduced by SKOROHOD (1956), and detailed expositions may be found in BILLINGSLEY (1968), ETHIER and KURTZ (1986), and JACOD and SHIRYAEV (1987). A discussion of the vague topology on M(S) with S lcscH is given by BAUER (1972). The topology on the space of closed sets, considered here, was introduced in a more general setting by FELL (1962), and a full account appears in MATHERON (1975), including a detailed proof (different from ours) of the basic Theorem A2.5.

Bibliography This list includes only publications that are explicitly mentioned in the text or notes or are directly related to results cited in the book. Knowledgeable readers will notice that many books and papers of historical significance have been omitted. ADLER, R.J. (1990). An Introduction to Continuity, Extrema, and Related Topics for General Gaussian Processes. Inst. Math. Statist., Hayward, CA. AKCOGLU, M.A., CHACON, R.V. (1970). Ergodic properties of operators in Lebesgue space. Adv. Appl. Probab. 2, 1-47. ALDOUS, D.J. (1978). Stopping times and tightness. Ann. Probab. 6, 335-340. - (1985). Exchangeability and related topics. Lect. Notes in Math. 1117, 1-198. Springer, Berlin. ALDOUS, D., THORISSON, H. (1993). Shift-coupling. Stoch. Proc. Appl. 44, 1-14. ALEXANDROV, A.D. (1940-43). Additive set-functions in abstract spaces. Mat. Sb. 8, 307-348; 9, 563-628; 13, 169-238. ANDRÉ, D. (1887). Solution directe du problème résolu par M. Bertrand. C.R. Acad. Sci. Paris 105, 436-437. ATHREYA, K., MCDONALD, D., NEY, P. (1978). Coupling and the renewal theorem. Amer. Math. Monthly 85, 809-814. BACCELLI, F., BREMAUD, P. (2000). Elements of Queueing [sic] Theory, 2nd ed., Springer, Berlin. BACHELIER, L. (1900). Théorie de la spéculation. Ann. Sci. École Norm. Sup. 17, 21-86. - (1901). Théorie mathématique du jeu. Ann. Sci. École Norm. Sup. 18, 143-210. BAHADUR, R.R. (1971). Some Limit Theorems in Statistics. SIAM, Philadelphia. BARBIER, E. (1887). Généralisation du problème résolu par M. J. Bertrand. C.R. Acad. Sci. Paris 105, 407, 440. BASS, R.F. (1995). Probabilistic Techniques in Analysis. Springer, NY. - (1998). Diffusions and Elliptic Operators. Springer, NY. BAUER, H. (1972). Probability Theory and Elements of Measure Theory. Engl. trans., Holt, Rinehart & Winston, NY. BAXTER, G. (1961). An analytic approach to finite fluctuation problems in probability. J. d'Analyse Math. 9, 31-70. BELYAEV, Y.K. (1963). Limit theorems for dissipative flows. Th. Probab. Appl. 8, 165-173.


BERBEE, H.C.P. (1979). Random Walks with Stationary Increments and Renewal Theory. Mathematisch Centrum, Amsterdam. BERNOULLI, J. (1713). Ars Conjectandi. Thurnisiorum, Basel. BERNSTEIN, S.N. (1927). Sur l'extension du théorème limite du calcul des probabilités aux sommes de quantités dépendantes. Math. Ann. 97, 1-59. - (1934). Principes de la théorie des équations différentielles stochastiques. Trudy Fiz.-Mat., Steklov Inst., Akad. Nauk. 5, 95-124. - (1937). On some variations of the Chebyshev inequality (in Russian). Dokl. Acad. Nauk SSSR 17, 275-277. - (1938). Équations différentielles stochastiques. Act. Sci. Ind. 738, 5-31. BERTOIN, J. (1996). Lévy Processes. Cambridge Univ. Press. BERTRAND, J. (1887). Solution d'un problème. C.R. Acad. Sci. Paris 105, 369. BICHTELER, K. (1979). Stochastic integrators. Bull. Amer. Math. Soc. 1, 761-765. BIENAYMÉ, J. (1853). Considérations à l'appui de la découverte de Laplace sur la loi de probabilité dans la méthode des moindres carrés. C.R. Acad. Sci. Paris 37, 309-324. BILLINGSLEY, P. (1965). Ergodic Theory and Information. Wiley, NY. - (1968). Convergence of Probability Measures. Wiley, NY. - (1995). Probability and Measure, 3rd ed. Wiley, NY. BIRKHOFF, G.D. (1932). Proof of the ergodic theorem. Proc. Natl. Acad. Sci. USA 17, 656-660. BLACKWELL, D. (1948). A renewal theorem. Duke Math. J. 15, 145-150. - (1953). Extension of a renewal theorem. Pacific J. Math. 3, 315-320. BLACKWELL, D., FREEDMAN, D. (1964). The tail σ-field of a Markov chain and a theorem of Orey. Ann. Math. Statist. 35, 1291-1295. BLUMENTHAL, R.M. (1957). An extended Markov property. Trans. Amer. Math. Soc. 82, 52-72. - (1992). Excursions of Markov Processes. Birkhäuser, Boston. BLUMENTHAL, R.M., GETOOR, R.K. (1964). Local times for Markov processes. Z. Wahrsch. verw. Geb. 3, 50-74. - (1968). Markov Processes and Potential Theory. Academic Press, NY. BOCHNER, S. (1932). Vorlesungen über Fouriersche Integrale. Akad. Verlagsges., Leipzig. Repr. Chelsea, NY 1948. - (1933). Monotone Funktionen, Stieltjessche Integrale und harmonische Analyse. Math. Ann. 108, 378-410. BOLTZMANN, L. (1887). Über die mechanischen Analogien des zweiten Hauptsatzes der Thermodynamik. J. Reine Angew. Math. 100, 201-212. BOREL, E. (1895). Sur quelques points de la théorie des fonctions. Ann. Sci. École Norm. Sup. (3) 12, 9-55. - (1898). Leçons sur la Théorie des Fonctions. Gauthier-Villars, Paris. - (1909). Les probabilités dénombrables et leurs applications arithmétiques. Rend. Circ. Mat. Palermo 27, 247-271.


BREIMAN, L. (1957-60). The individual ergodic theorem of information theory. Ann. Math. Statist. 28, 809-811; 31, 809-810.
- (1968). Probability. Addison-Wesley, Reading, MA. Repr. SIAM, Philadelphia 1992.
BREMAUD, P. (1981). Point Processes and Queues. Springer, NY.
BROWN, R. (1828). A brief description of microscopical observations made in the months of June, July and August 1827, on the particles contained in the pollen of plants; and on the general existence of active molecules in organic and inorganic bodies. Ann. Phys. 14, 294-313.
BRYC, W. (1990). Large deviations by the asymptotic value method. In Diffusion Processes and Related Problems in Analysis (M. Pinsky, ed.), 447-472. Birkhäuser, Basel.
BÜHLMANN, H. (1960). Austauschbare stochastische Variabeln und ihre Grenzwertsätze. Univ. Calif. Publ. Statist. 3, 1-35.
BUNIAKOWSKY, V.Y. (1859). Sur quelques inegalites concernant les integrales ordinaires et les integrales aux differences finies. Mem. de l'Acad. St.-Petersbourg 1:9.
BURKHOLDER, D.L. (1966). Martingale transforms. Ann. Math. Statist. 37, 1494-1504.
BURKHOLDER, D.L., DAVIS, B.J., GUNDY, R.F. (1972). Integral inequalities for convex functions of operators on martingales. Proc. 6th Berkeley Symp. Math. Statist. Probab. 2, 223-240.
BURKHOLDER, D.L., GUNDY, R.F. (1970). Extrapolation and interpolation of quasi-linear operators on martingales. Acta Math. 124, 249-304.
CAMERON, R.H., MARTIN, W.T. (1944). Transformation of Wiener integrals under translations. Ann. Math. 45, 386-396.
CANTELLI, F.P. (1917). Su due applicazione di un teorema di G. Boole alla statistica matematica. Rend. Accad. Naz. Lincei 26, 295-302.
- (1933). Sulla determinazione empirica della leggi di probabilita. Giorn. Ist. Ital. Attuari 4, 421-424.
CARATHEODORY, C. (1927). Vorlesungen über reelle Funktionen, 2nd ed. Teubner, Leipzig (1st ed. 1918). Repr. Chelsea, NY 1946.
CARLESON, L. (1958). Two remarks on the basic theorems of information theory. Math. Scand. 6, 175-180.
CAUCHY, A.L. (1821). Cours d'analyse de l'Ecole Royale Polytechnique, Paris.
CHACON, R.V., ORNSTEIN, D.S. (1960). A general ergodic theorem. Illinois J. Math. 4, 153-160.
CHAPMAN, S. (1928). On the Brownian displacements and thermal diffusion of grains suspended in a non-uniform fluid. Proc. Roy. Soc. London (A) 119, 34-54.
CHEBYSHEV, P.L. (1867). Des valeurs moyennes. J. Math. Pures Appl. 12, 177-184.
- (1890). Sur deux theoremes relatifs aux probabilites. Acta Math. 14, 305-315.


CHENTSOV, N.N. (1956). Weak convergence of stochastic processes whose trajectories have no discontinuities of the second kind and the "heuristic" approach to the Kolmogorov-Smirnov tests. Th. Probab. Appl. 1, 140-144.
CHERNOFF, H. (1952). A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Ann. Math. Statist. 23, 493-507.
CHERNOFF, H., TEICHER, H. (1958). A central limit theorem for sequences of exchangeable random variables. Ann. Math. Statist. 29, 118-130.
CHOQUET, G. (1953-54). Theory of capacities. Ann. Inst. Fourier Grenoble 5, 131-295.
CHOW, Y.S., TEICHER, H. (1997). Probability Theory: Independence, Interchangeability, Martingales, 3rd ed. Springer, NY.
CHUNG, K.L. (1960). Markov Chains with Stationary Transition Probabilities. Springer, Berlin.
- (1961). A note on the ergodic theorem of information theory. Ann. Math. Statist. 32, 612-614.
- (1973). Probabilistic approach to the equilibrium problem in potential theory. Ann. Inst. Fourier Grenoble 23, 313-322.
- (1974). A Course in Probability Theory, 2nd ed. Academic Press, NY.
- (1982). Lectures from Markov Processes to Brownian Motion. Springer, NY.
- (1995). Green, Brown, and Probability. World Scientific, Singapore.
CHUNG, K.L., DOOB, J.L. (1965). Fields, optionality and measurability. Amer. J. Math. 87, 397-424.
CHUNG, K.L., FUCHS, W.H.J. (1951). On the distribution of values of sums of random variables. Mem. Amer. Math. Soc. 6.
CHUNG, K.L., ORNSTEIN, D.S. (1962). On the recurrence of sums of random variables. Bull. Amer. Math. Soc. 68, 30-32.
CHUNG, K.L., WALSH, J.B. (1974). Meyer's theorem on previsibility. Z. Wahrsch. verw. Geb. 29, 253-256.
CHUNG, K.L., WILLIAMS, R.J. (1990). Introduction to Stochastic Integration, 2nd ed. Birkhäuser, Boston.
COURANT, R., FRIEDRICHS, K., LEWY, H. (1928). Über die partiellen Differentialgleichungen der mathematischen Physik. Math. Ann. 100, 32-74.
COURREGE, P. (1962-63). Integrales stochastiques et martingales de carre integrable. Sem. Brelot-Choquet-Deny 7. Publ. Inst. H. Poincare.
COX, D.R. (1955). Some statistical methods connected with series of events. J. R. Statist. Soc. Ser. B 17, 129-164.
CRAMER, H. (1938). Sur un nouveau theoreme-limite de la theorie des probabilites. Actual. Sci. Indust. 736, 5-23.
- (1942). On harmonic analysis in certain functional spaces. Ark. Mat. Astr. Fys. 28B:12 (17 pp.).
CRAMER, H., LEADBETTER, M.R. (1967). Stationary and Related Stochastic Processes. Wiley, NY.
CRAMER, H., WOLD, H. (1936). Some theorems on distribution functions. J. London Math. Soc. 11, 290-295.


CSÖRGŐ, M., RÉVÉSZ, P. (1981). Strong Approximations in Probability and Statistics. Academic Press, NY.
DALEY, D.J., VERE-JONES, D. (1988). An Introduction to the Theory of Point Processes. Springer, NY.
DAMBIS, K.E. (1965). On the decomposition of continuous submartingales. Th. Probab. Appl. 10, 401-410.
DANIELL, P.J. (1918-19). Integrals in an infinite number of dimensions. Ann. Math. (2) 20, 281-288.
- (1919-20). Functions of limited variation in an infinite number of dimensions. Ann. Math. (2) 21, 30-38.
- (1920). Stieltjes derivatives. Bull. Amer. Math. Soc. 26, 444-448.
DAVIS, B.J. (1970). On the integrability of the martingale square function. Israel J. Math. 8, 187-190.
DAWSON, D.A., GÄRTNER, J. (1987). Large deviations from the McKean-Vlasov limit for weakly interacting diffusions. Stochastics 20, 247-308.
DAY, M.M. (1942). Ergodic theorems for Abelian semigroups. Trans. Amer. Math. Soc. 51, 399-412.
DEBES, H., KERSTAN, J., LIEMANT, A., MATTHES, K. (1970-71). Verallgemeinerung eines Satzes von Dobrushin I, III. Math. Nachr. 47, 183-244; 50, 99-139.
DELLACHERIE, C. (1972). Capacites et Processus Stochastiques. Springer, Berlin.
- (1980). Un survol de la theorie de l'integrale stochastique. Stoch. Proc. Appl. 10, 115-144.
DELLACHERIE, C., MAISONNEUVE, B., MEYER, P.A. (1992). Probabilites et Potentiel, V. Hermann, Paris.
DELLACHERIE, C., MEYER, P.A. (1975-87). Probabilites et Potentiel, I-IV. Hermann, Paris. Engl. trans., North-Holland.
DEMBO, A., ZEITOUNI, O. (1998). Large Deviations Techniques and Applications, 2nd ed. Springer, NY.
DERMAN, C. (1954). Ergodic property of the Brownian motion process. Proc. Natl. Acad. Sci. USA 40, 1155-1158.
DEUSCHEL, J.D., STROOCK, D.W. (1989). Large Deviations. Academic Press, Boston.
DOEBLIN, W. (1938a). Expose de la theorie des chaines simples constantes de Markov a un nombre fini d'etats. Rev. Math. Union Interbalkan. 2, 77-105.
- (1938b). Sur deux problemes de M. Kolmogoroff concernant les chaines denombrables. Bull. Soc. Math. France 66, 210-220.
- (1939a). Sur les sommes d'un grand nombre de variables aleatoires independantes. Bull. Sci. Math. 63, 23-64.
- (1939b). Sur certains mouvements aleatoires discontinus. Skand. Aktuarietidskr. 22, 211-222.
- (1940). Elements d'une theorie generale des chaines simples constantes de Markoff. Ann. Sci. Ecole Norm. Sup. (3) 57, 61-111.
DÖHLER, R. (1980). On the conditional independence of random events. Th. Probab. Appl. 25, 628-634.


DOLÉANS(-DADE), C. (1967a). Processus croissants naturels et processus croissants tres bien mesurables. C.R. Acad. Sci. Paris 264, 874-876.
- (1967b). Integrales stochastiques dependant d'un parametre. Publ. Inst. Stat. Univ. Paris 16, 23-34.
- (1970). Quelques applications de la formule de changement de variables pour les semimartingales. Z. Wahrsch. verw. Geb. 16, 181-194.
DOLÉANS-DADE, C., MEYER, P.A. (1970). Integrales stochastiques par rapport aux martingales locales. Lect. Notes in Math. 124, 77-107. Springer, Berlin.
DONSKER, M.D. (1951-52). An invariance principle for certain probability limit theorems. Mem. Amer. Math. Soc. 6.
- (1952). Justification and extension of Doob's heuristic approach to the Kolmogorov-Smirnov theorems. Ann. Math. Statist. 23, 277-281.
DONSKER, M.D., VARADHAN, S.R.S. (1975-83). Asymptotic evaluation of certain Markov process expectations for large time, I-IV. Comm. Pure Appl. Math. 28, 1-47, 279-301; 29, 389-461; 36, 183-212.
DOOB, J.L. (1936). Note on probability. Ann. Math. (2) 37, 363-367.
- (1937). Stochastic processes depending on a continuous parameter. Trans. Amer. Math. Soc. 42, 107-140.
- (1938). Stochastic processes with an integral-valued parameter. Trans. Amer. Math. Soc. 44, 87-150.
- (1940). Regularity properties of certain families of chance variables. Trans. Amer. Math. Soc. 47, 455-486.
- (1942a). The Brownian movement and stochastic equations. Ann. Math. 43, 351-369.
- (1942b). Topics in the theory of Markoff chains. Trans. Amer. Math. Soc. 52, 37-64.
- (1945). Markoff chains - denumerable case. Trans. Amer. Math. Soc. 58, 455-473.
- (1947). Probability in function space. Bull. Amer. Math. Soc. 53, 15-30.
- (1948a). Asymptotic properties of Markov transition probabilities. Trans. Amer. Math. Soc. 63, 393-421.
- (1948b). Renewal theory from the point of view of the theory of probability. Trans. Amer. Math. Soc. 63, 422-438.
- (1949). Heuristic approach to the Kolmogorov-Smirnov theorems. Ann. Math. Statist. 20, 393-403.
- (1951). Continuous parameter martingales. Proc. 2nd Berkeley Symp. Math. Statist. Probab., 269-277.
- (1953). Stochastic Processes. Wiley, NY.
- (1954). Semimartingales and subharmonic functions. Trans. Amer. Math. Soc. 77, 86-121.
- (1955). A probability approach to the heat equation. Trans. Amer. Math. Soc. 80, 216-280.
- (1984). Classical Potential Theory and its Probabilistic Counterpart. Springer, NY.
- (1994). Measure Theory. Springer, NY.
DUBINS, L.E. (1968). On a theorem of Skorohod. Ann. Math. Statist. 39, 2094-2097.


DUBINS, L.E., SCHWARZ, G. (1965). On continuous martingales. Proc. Natl. Acad. Sci. USA 53, 913-916.
DUDLEY, R.M. (1966). Weak convergence of probabilities on nonseparable metric spaces and empirical measures on Euclidean spaces. Illinois J. Math. 10, 109-126.
- (1967). Measures on non-separable metric spaces. Illinois J. Math. 11, 449-453.
- (1968). Distances of probability measures and random variables. Ann. Math. Statist. 39, 1563-1572.
- (1989). Real Analysis and Probability. Wadsworth, Brooks & Cole, Pacific Grove, CA.
DUNFORD, N. (1939). A mean ergodic theorem. Duke Math. J. 5, 635-646.
DUNFORD, N., SCHWARTZ, J.T. (1956). Convergence almost everywhere of operator averages. J. Rat. Mech. Anal. 5, 129-178.
DURRETT, R. (1984). Brownian Motion and Martingales in Analysis. Wadsworth, Belmont, CA.
- (1995). Probability Theory and Examples, 2nd ed. Wadsworth, Brooks & Cole, Pacific Grove, CA.
DVORETZKY, A. (1972). Asymptotic normality for sums of dependent random variables. Proc. 6th Berkeley Symp. Math. Statist. Probab. 2, 513-535.
DYNKIN, E.B. (1952). Criteria of continuity and lack of discontinuities of the second kind for trajectories of a Markov stochastic process (Russian). Izv. Akad. Nauk SSSR, Ser. Mat. 16, 563-572.
- (1955a). Infinitesimal operators of Markov stochastic processes (Russian). Dokl. Akad. Nauk SSSR 105, 206-209.
- (1955b). Continuous one-dimensional Markov processes (Russian). Dokl. Akad. Nauk SSSR 105, 405-408.
- (1956). Markov processes and semigroups of operators. Infinitesimal operators of Markov processes. Th. Probab. Appl. 1, 25-60.
- (1959). One-dimensional continuous strong Markov processes. Th. Probab. Appl. 4, 3-54.
- (1961). Theory of Markov Processes. Engl. trans., Prentice-Hall and Pergamon Press, Englewood Cliffs, NJ, and Oxford. (Russian orig. 1959.)
- (1965). Markov Processes, Vols. 1-2. Engl. trans., Springer, Berlin. (Russian orig. 1963.)
- (1978). Sufficient statistics and extreme points. Ann. Probab. 6, 705-730.
DYNKIN, E.B., YUSHKEVICH, A.A. (1956). Strong Markov processes. Th. Probab. Appl. 1, 134-139.
EINSTEIN, A. (1905). On the movement of small particles suspended in a stationary liquid demanded by the molecular-kinetic theory of heat. Engl. trans. in Investigations on the Theory of the Brownian Movement. Repr. Dover, NY 1956.
- (1906). On the theory of Brownian motion. Engl. trans. in Investigations on the Theory of the Brownian Movement. Repr. Dover, NY 1956.
ELLIOTT, R.J. (1982). Stochastic Calculus and Applications. Springer, NY.
ELLIS, R.S. (1985). Entropy, Large Deviations, and Statistical Mechanics. Springer, NY.


ENGELBERT, H.J., SCHMIDT, W. (1981). On the behaviour of certain functionals of the Wiener process and applications to stochastic differential equations. Lect. Notes in Control and Inform. Sci. 36, 47-55.
- (1984). On one-dimensional stochastic differential equations with generalized drift. Lect. Notes in Control and Inform. Sci. 69, 143-155. Springer, Berlin.
- (1985). On solutions of stochastic differential equations without drift. Z. Wahrsch. verw. Geb. 68, 287-317.
ERDÖS, P., FELLER, W., POLLARD, H. (1949). A theorem on power series. Bull. Amer. Math. Soc. 55, 201-204.
ERDÖS, P., KAC, M. (1946). On certain limit theorems in the theory of probability. Bull. Amer. Math. Soc. 52, 292-302.
- (1947). On the number of positive sums of independent random variables. Bull. Amer. Math. Soc. 53, 1011-1020.
ERLANG, A.K. (1909). The theory of probabilities and telephone conversations. Nyt. Tidskr. Mat. B 20, 33-41.
ETHIER, S.N., KURTZ, T.G. (1986). Markov Processes: Characterization and Convergence. Wiley, NY.
FABER, G. (1910). Über stetige Funktionen, II. Math. Ann. 69, 372-443.
FARRELL, R.H. (1962). Representation of invariant measures. Illinois J. Math. 6, 447-467.
FATOU, P. (1906). Series trigonometriques et series de Taylor. Acta Math. 30, 335-400.
FELL, J.M.G. (1962). A Hausdorff topology for the closed subsets of a locally compact non-Hausdorff space. Proc. Amer. Math. Soc. 13, 472-476.
FELLER, W. (1935-37). Über den zentralen Grenzwertsatz der Wahrscheinlichkeitstheorie, I-II. Math. Z. 40, 521-559; 42, 301-312.
- (1936). Zur Theorie der stochastischen Prozesse (Existenz- und Eindeutigkeitssätze). Math. Ann. 113, 113-160.
- (1937). On the Kolmogoroff-P. Levy formula for infinitely divisible distribution functions. Proc. Yugoslav Acad. Sci. 82, 95-112.
- (1940). On the integro-differential equations of purely discontinuous Markoff processes. Trans. Amer. Math. Soc. 48, 488-515; 58, 474.
- (1949). Fluctuation theory of recurrent events. Trans. Amer. Math. Soc. 67, 98-119.
- (1952). The parabolic differential equations and the associated semi-groups of transformations. Ann. Math. 55, 468-519.
- (1954). Diffusion processes in one dimension. Trans. Amer. Math. Soc. 77, 1-31.
- (1968, 1971). An Introduction to Probability Theory and its Applications, 1 (3rd ed.); 2 (2nd ed.). Wiley, NY (1st eds. 1950, 1966).
FELLER, W., OREY, S. (1961). A renewal theorem. J. Math. Mech. 10, 619-624.
FEYNMAN, R.P. (1948). Space-time approach to nonrelativistic quantum mechanics. Rev. Mod. Phys. 20, 367-387.
DE FINETTI, B. (1929). Sulle funzioni ad incremento aleatorio. Rend. Acc. Naz. Lincei 10, 163-168.


- (1930). Funzione caratteristica di un fenomeno aleatorio. Mem. R. Acc. Lincei (6) 4, 86-133.
- (1937). La prevision: ses lois logiques, ses sources subjectives. Ann. Inst. H. Poincare 7, 1-68.

FISK, D.L. (1965). Quasimartingales. Trans. Amer. Math. Soc. 120, 369-389.
- (1966). Sample quadratic variation of continuous, second-order martingales. Z. Wahrsch. verw. Geb. 6, 273-278.
FORTET, R. (1943). Les fonctions aleatoires du type de Markoff associees a certaines equations lineaires aux derivees partielles du type parabolique. J. Math. Pures Appl. 22, 177-243.
FRANKEN, P., KÖNIG, D., ARNDT, U., SCHMIDT, V. (1981). Queues and Point Processes. Akademie-Verlag, Berlin.
FRECHET, M. (1928). Les Espaces Abstraits. Gauthier-Villars, Paris.
FREEDMAN, D. (1962-63). Invariants under mixing which generalize de Finetti's theorem. Ann. Math. Statist. 33, 916-923; 34, 1194-1216.
- (1971a). Markov Chains. Holden-Day, San Francisco. Repr. Springer, NY 1983.
- (1971b). Brownian Motion and Diffusion. Holden-Day, San Francisco. Repr. Springer, NY 1983.
FREIDLIN, M.I., WENTZEL, A.D. (1970). On small random perturbations of dynamical systems. Russian Math. Surveys 25, 1-55.
- (1998). Random Perturbations of Dynamical Systems. Engl. trans., Springer, NY. (Russian orig. 1979.)
FROSTMAN, O. (1935). Potentiel d'equilibre et capacite des ensembles avec quelques applications a la theorie des fonctions. Medd. Lunds Univ. Mat. Sem. 3, 1-118.
FUBINI, G. (1907). Sugli integrali multipli. Rend. Acc. Naz. Lincei 16, 608-614.
FURSTENBERG, H., KESTEN, H. (1960). Products of random matrices. Ann. Math. Statist. 31, 457-469.
GALMARINO, A.R. (1963). Representation of an isotropic diffusion as a skew product. Z. Wahrsch. verw. Geb. 1, 359-378.
GARSIA, A.M. (1965). A simple proof of E. Hopf's maximal ergodic theorem. J. Math. Mech. 14, 381-382.
- (1973). Martingale Inequalities: Seminar Notes on Recent Progress. Math. Lect. Notes Ser. Benjamin, Reading, MA.
GAUSS, C.F. (1809). Theory of Motion of the Heavenly Bodies. Engl. trans., Dover, NY 1963.
- (1840). Allgemeine Lehrsätze in Beziehung auf die im verkehrten Verhältnisse des Quadrats der Entfernung wirkenden Anziehungs- und Abstossungs-Kräfte. Gauss Werke 5, 197-242. Göttingen 1867.
GETOOR, R.K. (1990). Excessive Measures. Birkhäuser, Boston.
GETOOR, R.K., SHARPE, M.J. (1972). Conformal martingales. Invent. Math. 16, 271-308.
GIHMAN, I.I. (1947). On a method of constructing random processes (Russian). Dokl. Akad. Nauk SSSR 58, 961-964.


- (1950-51). On the theory of differential equations for random processes, I-II (Russian). Ukr. Mat. J. 2:4, 37-63; 3:3, 317-339.

GIHMAN, I.I., SKOROHOD, A.V. (1965). Introduction to the Theory of Random Processes. Engl. trans., Saunders, Philadelphia. Repr. Dover, Mineola 1996.
- (1974-79). The Theory of Stochastic Processes, 1-3. Engl. trans., Springer, Berlin.
GIRSANOV, I.V. (1960). On transforming a certain class of stochastic processes by absolutely continuous substitution of measures. Th. Probab. Appl. 5, 285-301.
GLIVENKO, V.I. (1933). Sulla determinazione empirica della leggi di probabilita. Giorn. Ist. Ital. Attuari 4, 92-99.
GNEDENKO, B.V. (1939). On the theory of limit theorems for sums of independent random variables (Russian). Izv. Akad. Nauk SSSR Ser. Mat. 181-232, 643-647.
GNEDENKO, B.V., KOLMOGOROV, A.N. (1968). Limit Distributions for Sums of Independent Random Variables. Engl. trans., 2nd ed., Addison-Wesley, Reading, MA. (Russian orig. 1949.)
GOLDMAN, J.R. (1967). Stochastic point processes: Limit theorems. Ann. Math. Statist. 38, 771-779.
GOLDSTEIN, J.A. (1976). Semigroup-theoretic proofs of the central limit theorem and other theorems of analysis. Semigroup Forum 12, 189-206.
GOLDSTEIN, S. (1979). Maximal coupling. Z. Wahrsch. verw. Geb. 46, 193-204.
GRANDELL, J. (1976). Doubly Stochastic Poisson Processes. Lect. Notes in Math. 529. Springer, Berlin.
GREEN, G. (1828). An essay on the application of mathematical analysis to the theories of electricity and magnetism. Repr. in Mathematical Papers, Chelsea, NY 1970.
GREENWOOD, P., PITMAN, J. (1980). Construction of local time and Poisson point processes from nested arrays. J. London Math. Soc. (2) 22, 182-192.
GRIFFEATH, D. (1975). A maximal coupling for Markov chains. Z. Wahrsch. verw. Geb. 31, 95-106.
GRIGELIONIS, B. (1963). On the convergence of sums of random step processes to a Poisson process. Th. Probab. Appl. 8, 172-182.
- (1971). On the representation of integer-valued measures by means of stochastic integrals with respect to Poisson measure. Litovsk. Mat. Sb. 11, 93-108.
HAAR, A. (1933). Der Maßbegriff in der Theorie der kontinuierlichen Gruppen. Ann. Math. 34, 147-169.
HAGBERG, J. (1973). Approximation of the summation process obtained by sampling from a finite population. Th. Probab. Appl. 18, 790-803.
HAHN, H. (1921). Theorie der reellen Funktionen. Julius Springer, Berlin.
HAJEK, J. (1960). Limiting distributions in simple random sampling from a finite population. Magyar Tud. Akad. Mat. Kutató Int. Közl. 5, 361-374.
HALL, P., HEYDE, C.C. (1980). Martingale Limit Theory and its Application. Academic Press, NY.


HALMOS, P.R. (1950). Measure Theory. Van Nostrand, Princeton. Repr. Springer, NY 1974.
HARDY, G.H., LITTLEWOOD, J.E. (1930). A maximal theorem with function-theoretic applications. Acta Math. 54, 81-116.
HARRIS, T.E. (1956). The existence of stationary measures for certain Markov processes. Proc. 3rd Berkeley Symp. Math. Statist. Probab. 2, 113-124.
- (1971). Random measures and motions of point processes. Z. Wahrsch. verw. Geb. 18, 85-115.
HARTMAN, P., WINTNER, A. (1941). On the law of the iterated logarithm. Amer. J. Math. 63, 169-176.
HELLY, E. (1911-12). Über lineare Funktionaloperatoren. Sitzungsber. Nat. Kais. Akad. Wiss. 121, 265-297.
HEWITT, E., SAVAGE, L.J. (1955). Symmetric measures on Cartesian products. Trans. Amer. Math. Soc. 80, 470-501.
HILLE, E. (1948). Functional analysis and semi-groups. Amer. Math. Colloq. Publ. 31, NY.
HITCZENKO, P. (1990). Best constants in martingale version of Rosenthal's inequality. Ann. Probab. 18, 1656-1668.
HÖLDER, O. (1889). Über einen Mittelwertsatz. Nachr. Akad. Wiss. Göttingen, math.-phys. Kl., 38-47.
HOPF, E. (1954). The general temporally discrete Markov process. J. Rat. Mech. Anal. 3, 13-45.
HOROWITZ, J. (1972). Semilinear Markov processes, subordinators and renewal theory. Z. Wahrsch. verw. Geb. 24, 167-193.
HUNT, G.A. (1956). Some theorems concerning Brownian motion. Trans. Amer. Math. Soc. 81, 294-319.
- (1957-58). Markoff processes and potentials, I-III. Illinois J. Math. 1, 44-93, 316-369; 2, 151-213.
HUREWICZ, W. (1944). Ergodic theorem without invariant measure. Ann. Math. 45, 192-206.
HURWITZ, A. (1897). Über die Erzeugung der Invarianten durch Integration. Nachr. Ges. Göttingen, math.-phys. Kl., 71-90.
IKEDA, N., WATANABE, S. (1989). Stochastic Differential Equations and Diffusion Processes, 2nd ed. North-Holland and Kodansha, Amsterdam and Tokyo.
IOFFE, D. (1991). On some applicable versions of abstract large deviations theorems. Ann. Probab. 19, 1629-1639.
IONESCU TULCEA, A. (1960). Contributions to information theory for abstract alphabets. Ark. Mat. 4, 235-247.
IONESCU TULCEA, C.T. (1949-50). Mesures dans les espaces produits. Atti Accad. Naz. Lincei Rend. 7, 208-211.
ITÔ, K. (1942a). Differential equations determining Markov processes (Japanese). Zenkoku Shijō Sūgaku Danwakai 244:1077, 1352-1400.
- (1942b). On stochastic processes (I) (Infinitely divisible laws of probability). Jap. J. Math. 18, 261-301.
- (1944). Stochastic integral. Proc. Imp. Acad. Tokyo 20, 519-524.


- (1946). On a stochastic integral equation. Proc. Imp. Acad. Tokyo 22, 32-35.
- (1951a). On a formula concerning stochastic differentials. Nagoya Math. J. 3, 55-65.
- (1951b). On stochastic differential equations. Mem. Amer. Math. Soc. 4, 1-51.
- (1951c). Multiple Wiener integral. J. Math. Soc. Japan 3, 157-169.
- (1957). Stochastic Processes (Japanese). Iwanami Shoten, Tokyo.
- (1972). Poisson point processes attached to Markov processes. Proc. 6th Berkeley Symp. Math. Statist. Probab. 3, 225-239.
- (1984). Introduction to Probability Theory. Engl. trans., Cambridge Univ. Press.

ITÔ, K., McKEAN, H.P. (1965). Diffusion Processes and their Sample Paths. Repr. Springer, Berlin 1996.
ITÔ, K., WATANABE, S. (1965). Transformation of Markov processes by multiplicative functionals. Ann. Inst. Fourier 15, 15-30.
JACOD, J. (1975). Multivariate point processes: Predictable projection, Radon-Nikodym derivative, representation of martingales. Z. Wahrsch. verw. Geb. 31, 235-253.
- (1979). Calcul Stochastique et Problemes de Martingales. Lect. Notes in Math. 714. Springer, Berlin.
JACOD, J., SHIRYAEV, A.N. (1987). Limit Theorems for Stochastic Processes. Springer, Berlin.
JAGERS, P. (1972). On the weak convergence of superpositions of point processes. Z. Wahrsch. verw. Geb. 22, 1-7.
- (1974). Aspects of random measures and point processes. Adv. Probab. Rel. Topics 3, 179-239. Marcel Dekker, NY.
JAMISON, B., OREY, S. (1967). Markov chains recurrent in the sense of Harris. Z. Wahrsch. verw. Geb. 8, 206-223.
JENSEN, J.L.W.V. (1906). Sur les fonctions convexes et les inegalites entre les valeurs moyennes. Acta Math. 30, 175-193.
JESSEN, B. (1934). The theory of integration in a space of an infinite number of dimensions. Acta Math. 63, 249-323.
JOHNSON, W.B., SCHECHTMAN, G., ZINN, J. (1985). Best constants in moment inequalities for linear combinations of independent and exchangeable random variables. Ann. Probab. 13, 234-253.
JORDAN, C. (1881). Sur la serie de Fourier. C.R. Acad. Sci. Paris 92, 228-230.
KAC, M. (1947). On the notion of recurrence in discrete stochastic processes. Bull. Amer. Math. Soc. 53, 1002-1010.
- (1949). On distributions of certain Wiener functionals. Trans. Amer. Math. Soc. 65, 1-13.
- (1951). On some connections between probability theory and differential and integral equations. Proc. 2nd Berkeley Symp. Math. Statist. Probab., 189-215. Univ. of California Press, Berkeley.
KAKUTANI, S. (1940). Ergodic theorems and the Markoff process with a stable distribution. Proc. Imp. Acad. Tokyo 16, 49-54.
- (1944a). On Brownian motions in n-space. Proc. Imp. Acad. Tokyo 20, 648-652.


- (1944b). Two-dimensional Brownian motion and harmonic functions. Proc. Imp. Acad. Tokyo 20, 706-714.
- (1945). Markoff process and the Dirichlet problem. Proc. Japan Acad. 21, 227-233.


KALLENBERG, O. (1973a). Characterization and convergence of random measures and point processes. Z. Wahrsch. verw. Geb. 27, 9-21.
- (1973b). Canonical representations and convergence criteria for processes with interchangeable increments. Z. Wahrsch. verw. Geb. 27, 23-36.
- (1986). Random Measures, 4th ed. Akademie-Verlag and Academic Press, Berlin and London (1st ed. 1975).
- (1987). Homogeneity and the strong Markov property. Ann. Probab. 15, 213-240.
- (1988). Spreading and predictable sampling in exchangeable sequences and processes. Ann. Probab. 16, 508-534.
- (1990). Random time change and an integral representation for marked stopping times. Probab. Th. Rel. Fields 86, 167-202.
- (1992). Some time change representations of stable integrals, via predictable transformations of local martingales. Stoch. Proc. Appl. 40, 199-223.
- (1996a). On the existence of universal functional solutions to classical SDEs. Ann. Probab. 24, 196-205.
- (1996b). Improved criteria for distributional convergence of point processes. Stoch. Proc. Appl. 64, 93-102.
- (1999a). Ballot theorems and sojourn laws for stationary processes. Ann. Probab. 27, 2011-2019.
- (1999b). Asymptotically invariant sampling and averaging from stationary-like processes. Stoch. Proc. Appl. 82, 195-204.
KALLENBERG, O., SZTENCEL, R. (1991). Some dimension-free features of vector-valued martingales. Probab. Th. Rel. Fields 88, 215-247.
KALLIANPUR, G. (1980). Stochastic Filtering Theory. Springer, NY.
KAPLAN, E.L. (1955). Transformations of stationary random sequences. Math. Scand. 3, 127-149.
KARAMATA, J. (1930). Sur une mode de croissance reguliere des fonctions. Mathematica (Cluj) 4, 38-53.
KARATZAS, I., SHREVE, S.E. (1991). Brownian Motion and Stochastic Calculus, 2nd ed. Springer, NY.
KAZAMAKI, N. (1972). Change of time, stochastic integrals and weak martingales. Z. Wahrsch. verw. Geb. 22, 25-32.
KEMENY, J.G., SNELL, J.L., KNAPP, A.W. (1966). Denumerable Markov Chains. Van Nostrand, Princeton.
KENDALL, D.G. (1974). Foundations of a theory of random sets. In Stochastic Geometry (eds. E.F. Harding, D.G. Kendall), pp. 322-376. Wiley, NY.
KHINCHIN, A.Y. (1923). Über dyadische Brüche. Math. Z. 18, 109-116.
- (1924). Über einen Satz der Wahrscheinlichkeitsrechnung. Fund. Math. 6, 9-20.
- (1929). Über einen neuen Grenzwertsatz der Wahrscheinlichkeitsrechnung. Math. Ann. 101, 745-752.


- (1933). Zur mathematischen Begründung der statistischen Mechanik. Z. Angew. Math. Mech. 13, 101-103.
- (1933). Asymptotische Gesetze der Wahrscheinlichkeitsrechnung. Springer, Berlin. Repr. Chelsea, NY 1948.
- (1934). Korrelationstheorie der stationären stochastischen Prozesse. Math. Ann. 109, 604-615.
- (1937). Zur Theorie der unbeschränkt teilbaren Verteilungsgesetze. Mat. Sb. 2, 79-119.
- (1938). Limit Laws for Sums of Independent Random Variables (Russian). Moscow.
- (1960). Mathematical Methods in the Theory of Queuing. Engl. trans., Griffin, London. (Russian orig. 1955.)

KHINCHIN, A.Y., KOLMOGOROV, A.N. (1925). Über Konvergenz von Reihen, deren Glieder durch den Zufall bestimmt werden. Mat. Sb. 32, 668-676.
KINGMAN, J.F.C. (1964). On doubly stochastic Poisson processes. Proc. Cambridge Phil. Soc. 60, 923-930.
- (1967). Completely random measures. Pacific J. Math. 21, 59-78.
- (1968). The ergodic theory of subadditive stochastic processes. J. Roy. Statist. Soc. (B) 30, 499-510.
- (1972). Regenerative Phenomena. Wiley, NY.
- (1993). Poisson Processes. Clarendon Press, Oxford.
KINNEY, J.R. (1953). Continuity properties of Markov processes. Trans. Amer. Math. Soc. 74, 280-302.
KNIGHT, F.B. (1963). Random walks and a sojourn density process of Brownian motion. Trans. Amer. Math. Soc. 107, 56-86.
- (1971). A reduction of continuous, square-integrable martingales to Brownian motion. Lect. Notes in Math. 190, 19-31. Springer, Berlin.
KOLMOGOROV, A.N. (1928-29). Über die Summen durch den Zufall bestimmter unabhängiger Grössen. Math. Ann. 99, 309-319; 102, 484-488.
- (1929). Über das Gesetz des iterierten Logarithmus. Math. Ann. 101, 126-135.
- (1930). Sur la loi forte des grands nombres. C.R. Acad. Sci. Paris 191, 910-912.
- (1931a). Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung. Math. Ann. 104, 415-458.
- (1931b). Eine Verallgemeinerung des Laplace-Liapounoffschen Satzes. Izv. Akad. Nauk USSR, Otdel. Matem. Estestv. Nauk 1931, 959-962.
- (1932). Sulla forma generale di un processo stocastico omogeneo (un problema di B. de Finetti). Atti Accad. Naz. Lincei Rend. (6) 15, 805-808, 866-869.
- (1933a). Über die Grenzwertsätze der Wahrscheinlichkeitsrechnung. Izv. Akad. Nauk USSR, Otdel. Matem. Estestv. Nauk 1933, 363-372.
- (1933b). Zur Theorie der stetigen zufälligen Prozesse. Math. Ann. 108, 149-160.
- (1933c). Foundations of the Theory of Probability (German). Springer, Berlin. Engl. trans., Chelsea, NY 1956.
- (1935). Some current developments in probability theory (in Russian). Proc. 2nd All-Union Math. Congr. 1, 349-358. Akad. Nauk SSSR, Leningrad.
- (1936a). Anfangsgründe der Markoffschen Ketten mit unendlich vielen möglichen Zuständen. Mat. Sb. 1, 607-610.


- (1936b). Zur Theorie der Markoffschen Ketten. Math. Ann. 112, 155-160.
- (1937). Zur Umkehrbarkeit der statistischen Naturgesetze. Math. Ann. 113, 766-772.
- (1956). On Skorohod convergence. Th. Probab. Appl. 1, 213-222.

KOLMOGOROV, A.N., LEONTOVICH, M.A. (1933). Zur Berechnung der mittleren Brownschen Fläche. Physik. Z. Sowjetunion 4, 1-13.
KOMLÓS, J., MAJOR, P., TUSNÁDY, G. (1975-76). An approximation of partial sums of independent r.v.'s and the sample d.f., I-II. Z. Wahrsch. verw. Geb. 32, 111-131; 34, 33-58.
KÖNIG, D., MATTHES, K. (1963). Verallgemeinerung der Erlangschen Formeln, I. Math. Nachr. 26, 45-56.
KOOPMAN, B.O. (1931). Hamiltonian systems and transformations in Hilbert space. Proc. Nat. Acad. Sci. USA 17, 315-318.
KRENGEL, U. (1985). Ergodic Theorems. de Gruyter, Berlin.
KRICKEBERG, K. (1956). Convergence of martingales with a directed index set. Trans. Amer. Math. Soc. 83, 313-357.
- (1972). The Cox process. Symp. Math. 9, 151-167.
KRYLOV, N., BOGOLIOUBOV, N. (1937). La theorie generale de la mesure dans son application a l'etude des systemes de la mecanique non lineaires. Ann. Math. 38, 65-113.
KULLBACK, S., LEIBLER, R.A. (1951). On information and sufficiency. Ann. Math. Statist. 22, 79-86.
KUNITA, H. (1990). Stochastic Flows and Stochastic Differential Equations. Cambridge Univ. Press, Cambridge.
KUNITA, H., WATANABE, S. (1967). On square integrable martingales. Nagoya Math. J. 30, 209-245.
KURTZ, T.G. (1969). Extensions of Trotter's operator semigroup approximation theorems. J. Funct. Anal. 3, 354-375.
- (1975). Semigroups of conditioned shifts and approximation of Markov processes. Ann. Probab. 3, 618-642.
KWAPIEŃ, S., WOYCZYŃSKI, W.A. (1992). Random Series and Stochastic Integrals: Single and Multiple. Birkhäuser, Boston.
LANGEVIN, P. (1908). Sur la theorie du mouvement brownien. C.R. Acad. Sci. Paris 146, 530-533.
LAPLACE, P.S. DE (1774). Memoire sur la probabilite des causes par les evenemens. Engl. trans. in Statistical Science 1, 359-378.
- (1809). Memoire sur divers points d'analyse. Repr. in Oeuvres Completes de Laplace 14, 178-214. Gauthier-Villars, Paris 1886-1912.
- (1812-20). Theorie Analytique des Probabilites, 3rd ed. Repr. in Oeuvres Completes de Laplace 7. Gauthier-Villars, Paris 1886-1912.
LAST, G., BRANDT, A. (1995). Marked Point Processes on the Real Line: The Dynamic Approach. Springer, NY.
LEADBETTER, M.R., LINDGREN, G., ROOTZEN, H. (1983). Extremes and Related Properties of Random Sequences and Processes. Springer, NY.

Bibliography


LEBESGUE, H. (1902). Intégrale, longueur, aire. Ann. Mat. Pura Appl. 7, 231-359.
- (1904). Leçons sur l'Intégration et la Recherche des Fonctions Primitives. Paris.
LE CAM, L. (1957). Convergence in distribution of stochastic processes. Univ. California Publ. Statist. 2, 207-236.
LE GALL, J.F. (1983). Applications des temps locaux aux équations différentielles stochastiques unidimensionnelles. Lect. Notes in Math. 986, 15-31.
LEVI, B. (1906a). Sopra l'integrazione delle serie. Rend. Ist. Lombardo Sci. Lett. (2) 39, 775-780.
- (1906b). Sul principio di Dirichlet. Rend. Circ. Mat. Palermo 22, 293-360.
LEVY, P. (1922a). Sur le rôle de la loi de Gauss dans la théorie des erreurs. C.R. Acad. Sci. Paris 174, 855-857.
- (1922b). Sur la loi de Gauss. C.R. Acad. Sci. Paris 1682-1684.
- (1922c). Sur la détermination des lois de probabilité par leurs fonctions caractéristiques. C.R. Acad. Sci. Paris 175, 854-856.
- (1924). Théorie des erreurs. La loi de Gauss et les lois exceptionnelles. Bull. Soc. Math. France 52, 49-85.
- (1925). Calcul des Probabilités. Gauthier-Villars, Paris.
- (1934-35). Sur les intégrales dont les éléments sont des variables aléatoires indépendantes. Ann. Scuola Norm. Sup. Pisa (2) 3, 337-366; 4, 217-218.
- (1935a). Propriétés asymptotiques des sommes de variables aléatoires indépendantes ou enchaînées. J. Math. Pures Appl. (8) 14, 347-402.
- (1935b). Propriétés asymptotiques des sommes de variables aléatoires enchaînées. Bull. Sci. Math. (2) 59, 84-96, 109-128.
- (1939). Sur certains processus stochastiques homogènes. Comp. Math. 7, 283-339.
- (1940). Le mouvement brownien plan. Amer. J. Math. 62, 487-550.
- (1954). Théorie de l'Addition des Variables Aléatoires, 2nd ed. Gauthier-Villars, Paris (1st ed. 1937).
- (1965). Processus Stochastiques et Mouvement Brownien, 2nd ed. Gauthier-Villars, Paris (1st ed. 1948).
LIAPOUNOV, A.M. (1901). Nouvelle forme du théorème sur la limite des probabilités. Mém. Acad. Sci. St. Pétersbourg 12, 1-24.
LIGGETT, T.M. (1985). An improved subadditive ergodic theorem. Ann. Probab. 13, 1279-1285.
LINDEBERG, J.W. (1922a). Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung. Math. Zeitschr. 15, 211-225.
- (1922b). Sur la loi de Gauss. C.R. Acad. Sci. Paris 174, 1400-1402.
LINDVALL, T. (1973). Weak convergence of probability measures and random functions in the function space D[0, ∞). J. Appl. Probab. 10, 109-121.
- (1977). A probabilistic proof of Blackwell's renewal theorem. Ann. Probab. 5, 482-485.
- (1992). Lectures on the Coupling Method. Wiley, NY.
LIPSTER, R.S., SHIRYAEV, A.N. (2000). Statistics of Random Processes, I-II, 2nd ed. Springer, Berlin.


Foundations of Modern Probability

LOEVE, M. (1977-78). Probability Theory 1-2, 4th ed. Springer, NY (1st ed. 1955).
LOMNICKI, Z., ULAM, S. (1934). Sur la théorie de la mesure dans les espaces combinatoires et son application au calcul des probabilités: I. Variables indépendantes. Fund. Math. 23, 237-278.
LUKACS, E. (1970). Characteristic Functions, 2nd ed. Griffin, London.
LUNDBERG, F. (1903). Approximerad Framställning av Sannolikhetsfunktionen. Återförsäkring av Kollektivrisker. Thesis, Uppsala.
MACKEVICIUS, V. (1974). On the question of the weak convergence of random processes in the space D[0, ∞). Lithuanian Math. Trans. 14, 620-623.
MAISONNEUVE, B. (1974). Systèmes Régénératifs. Astérisque 15. Soc. Math. de France.
MAKER, P. (1940). The ergodic theorem for a sequence of functions. Duke Math. J. 6, 27-30.
MANN, H.B., WALD, A. (1943). On stochastic limit and order relations. Ann. Math. Statist. 14, 217-226.
MARCINKIEWICZ, J., ZYGMUND, A. (1937). Sur les fonctions indépendantes. Fund. Math. 29, 60-90.
- (1938). Quelques théorèmes sur les fonctions indépendantes. Studia Math. 7, 104-120.
MARKOV, A.A. (1899). The law of large numbers and the method of least squares (Russian). Izv. Fiz.-Mat. Obshch. Kazan Univ. (2) 8, 110-128.
- (1906). Extension of the law of large numbers to dependent events (Russian). Bull. Soc. Phys. Math. Kazan (2) 15, 135-156.
MARUYAMA, G. (1954). On the transition probability functions of the Markov process. Natl. Sci. Rep. Ochanomizu Univ. 5, 10-20.
- (1955). Continuous Markov processes and stochastic equations. Rend. Circ. Mat. Palermo 4, 48-90.
MARUYAMA, G., TANAKA, H. (1957). Some properties of one-dimensional diffusion processes. Mem. Fac. Sci. Kyushu Univ. 11, 117-141.
MATHERON, G. (1975). Random Sets and Integral Geometry. Wiley, London.
MATTHES, K. (1963). Stationäre zufällige Punktfolgen, I. Jahresber. Deutsch. Math.-Verein. 66, 66-79.
MATTHES, K., KERSTAN, J., MECKE, J. (1978). Infinitely Divisible Point Processes. Wiley, Chichester. (German ed. 1974, Russian ed. 1982.)
McKEAN, H.P. (1969). Stochastic Integrals. Academic Press, NY.
McKEAN, H.P., TANAKA, H. (1961). Additive functionals of the Brownian path. Mem. Coll. Sci. Univ. Kyoto, A 33, 479-506.
McMILLAN, B. (1953). The basic theorems of information theory. Ann. Math. Statist. 24, 196-219.
MECKE, J. (1967). Stationäre zufällige Maße auf lokalkompakten Abelschen Gruppen. Z. Wahrsch. verw. Geb. 9, 36-58.
- (1968). Eine charakteristische Eigenschaft der doppelt stochastischen Poissonschen Prozesse. Z. Wahrsch. verw. Geb. 11, 74-81.

MELEARD, S. (1986). Application du calcul stochastique à l'étude des processus de Markov réguliers sur [0,1]. Stochastics 19, 41-82.

METIVIER, M. (1982). Semimartingales: A Course on Stochastic Processes. de Gruyter, Berlin.
METIVIER, M., PELLAUMAIL, J. (1980). Stochastic Integration. Academic Press, NY.
MEYER, P.A. (1962). A decomposition theorem for supermartingales. Illinois J. Math. 6, 193-205.
- (1963). Decomposition of supermartingales: The uniqueness theorem. Illinois J. Math. 7, 1-17.
- (1966). Probability and Potentials. Engl. trans., Blaisdell, Waltham.
- (1967). Intégrales stochastiques, I-IV. Lect. Notes in Math. 39, 72-162. Springer, Berlin.
- (1971). Démonstration simplifiée d'un théorème de Knight. Lect. Notes in Math. 191, 191-195. Springer, Berlin.
- (1976). Un cours sur les intégrales stochastiques. Lect. Notes in Math. 511, 245-398. Springer, Berlin.
MILLAR, P.W. (1968). Martingale integrals. Trans. Amer. Math. Soc. 133, 145-166.
MINKOWSKI, H. (1907). Diophantische Approximationen. Teubner, Leipzig.
MITOMA, I. (1983). Tightness of probabilities on C([0,1]; S') and D([0,1]; S'). Ann. Probab. 11, 989-999.
DE MOIVRE, A. (1711-12). On the measurement of chance. Engl. trans., Int. Statist. Rev. 52 (1984), 229-262.
- (1718-56). The Doctrine of Chances; or, a Method of Calculating the Probability of Events in Play, 3rd ed. (post.) Repr. Cass and Chelsea, London and NY 1967.
- (1733-56). Approximatio ad Summam Terminorum Binomii (a+b)^n in Seriem Expansi. Translated and edited in The Doctrine of Chances, 2nd and 3rd eds. Repr. Cass and Chelsea, London and NY 1967.
MÖNCH, G. (1971). Verallgemeinerung eines Satzes von A. Renyi. Studia Sci. Math. Hung. 6, 81-90.
MOTOO, M., WATANABE, H. (1958). Ergodic property of recurrent diffusion process in one dimension. J. Math. Soc. Japan 10, 272-286.
NAWROTZKI, K. (1962). Ein Grenzwertsatz für homogene zufällige Punktfolgen (Verallgemeinerung eines Satzes von A. Renyi). Math. Nachr. 24, 201-217.
VON NEUMANN, J. (1932). Proof of the quasi-ergodic hypothesis. Proc. Natl. Acad. Sci. USA 18, 70-82.
- (1940). On rings of operators, III. Ann. Math. 41, 94-161.
NEVEU, J. (1971). Mathematical Foundations of the Calculus of Probability. Holden-Day, San Francisco.
- (1975). Discrete-Parameter Martingales. North-Holland, Amsterdam.
NGUYEN, X.X., ZESSIN, H. (1979). Ergodic theorems for spatial processes. Z. Wahrsch. verw. Geb. 48, 133-158.
NIKODYM, O.M. (1930). Sur une généralisation des intégrales de M. J. Radon. Fund. Math. 15, 131-179.


NORBERG, T. (1984). Convergence and existence of random set distributions. Ann. Probab. 12, 726-732.
NOVIKOV, A.A. (1971). On moment inequalities for stochastic integrals. Th. Probab. Appl. 16, 538-541.
- (1972). On an identity for stochastic integrals. Th. Probab. Appl. 17, 717-720.
NUALART, D. (1995). The Malliavin Calculus and Related Topics. Springer, NY.
ØKSENDAL, B. (1998). Stochastic Differential Equations, 5th ed. Springer, Berlin.
OREY, S. (1959). Recurrent Markov chains. Pacific J. Math. 9, 805-827.
- (1962). An ergodic theorem for Markov chains. Z. Wahrsch. verw. Geb. 1, 174-176.
- (1966). F-processes. Proc. 5th Berkeley Symp. Math. Statist. Probab. 2:1, 301-313.
- (1971). Limit Theorems for Markov Chain Transition Probabilities. Van Nostrand, London.
ORNSTEIN, D.S. (1969). Random walks. Trans. Amer. Math. Soc. 138, 1-60.
ORNSTEIN, L.S., UHLENBECK, G.E. (1930). On the theory of Brownian motion. Phys. Review 36, 823-841.
OSOSKOV, G.A. (1956). A limit theorem for flows of homogeneous events. Th. Probab. Appl. 1, 248-255.
OTTAVIANI, G. (1939). Sulla teoria astratta del calcolo delle probabilità proposta dal Cantelli. Giorn. Ist. Ital. Attuari 10, 10-40.
PALEY, R.E.A.C. (1932). A remarkable series of orthogonal functions I. Proc. London Math. Soc. 34, 241-264.
PALEY, R.E.A.C., WIENER, N. (1934). Fourier transforms in the complex domain. Amer. Math. Soc. Coll. Publ. 19.
PALEY, R.E.A.C., WIENER, N., ZYGMUND, A. (1933). Notes on random functions. Math. Z. 37, 647-668.
PALM, C. (1943). Intensity Variations in Telephone Traffic (German). Ericsson Technics 44, 1-189. Engl. trans., North-Holland Studies in Telecommunication 10, Elsevier 1988.
PAPANGELOU, F. (1972). Integrability of expected increments of point processes and a related random change of scale. Trans. Amer. Math. Soc. 165, 486-506.
PARTHASARATHY, K.R. (1967). Probability Measures on Metric Spaces. Academic Press, NY.
PERKINS, E. (1982). Local time and pathwise uniqueness for stochastic differential equations. Lect. Notes in Math. 920, 201-208. Springer, Berlin.
PETROV, V.V. (1995). Limit Theorems of Probability Theory. Clarendon Press, Oxford.
PHILLIPS, H.B., WIENER, N. (1923). Nets and Dirichlet problem. J. Math. Phys. 2, 105-124.
PITT, H.R. (1942). Some generalizations of the ergodic theorem. Proc. Camb. Phil. Soc. 38, 325-343.
POINCARE, H. (1890). Sur les équations aux dérivées partielles de la physique mathématique. Amer. J. Math. 12, 211-294.
- (1899). Théorie du Potentiel Newtonien. Gauthier-Villars, Paris.


POISSON, S.D. (1837). Recherches sur la Probabilité des Jugements en Matière Criminelle et en Matière Civile, Précédées des Règles Générales du Calcul des Probabilités. Bachelier, Paris.
POLLACZEK, F. (1930). Über eine Aufgabe der Wahrscheinlichkeitstheorie, I-II. Math. Z. 32, 64-100, 729-750.
POLLARD, D. (1984). Convergence of Stochastic Processes. Springer, NY.
PÓLYA, G. (1920). Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung und das Momentenproblem. Math. Z. 8, 171-181.
- (1921). Über eine Aufgabe der Wahrscheinlichkeitsrechnung betreffend die Irrfahrt im Strassennetz. Math. Ann. 84, 149-160.
PORT, S.C., STONE, C.J. (1978). Brownian Motion and Classical Potential Theory. Academic Press, NY.
POSPISIL, B. (1935-36). Sur un problème de MM. S. Bernstein et A. Kolmogoroff. Casopis Pest. Mat. Fys. 65, 64-76.
PROHOROV, Y.V. (1956). Convergence of random processes and limit theorems in probability theory. Th. Probab. Appl. 1, 157-214.
- (1959). Some remarks on the strong law of large numbers. Th. Probab. Appl. 4, 204-208.
- (1961). Random measures on a compactum. Soviet Math. Dokl. 2, 539-541.
PROTTER, P. (1990). Stochastic Integration and Differential Equations. Springer, Berlin.
PUKHALSKY, A.A. (1991). On functional principle of large deviations. In New Trends in Probability and Statistics (V. Sazonov and T. Shervashidze, eds.), 198-218. VSP Moks'las, Moscow.
RADON, J. (1913). Theorie und Anwendungen der absolut additiven Mengenfunktionen. Wien Akad. Sitzungsber. 122, 1295-1438.
RAO, K.M. (1969a). On decomposition theorems of Meyer. Math. Scand. 24, 66-78.
- (1969b). Quasimartingales. Math. Scand. 24, 79-92.
RAY, D.B. (1956). Stationary Markov processes with continuous paths. Trans. Amer. Math. Soc. 82, 452-493.
- (1963). Sojourn times of a diffusion process. Illinois J. Math. 7, 615-630.
RENYI, A. (1956). A characterization of Poisson processes. Magyar Tud. Akad. Mat. Kutato Int. Közl. 1, 519-527.
- (1967). Remarks on the Poisson process. Studia Sci. Math. Hung. 2, 119-123.
REVUZ, D. (1970). Mesures associées aux fonctionnelles additives de Markov, I-II. Trans. Amer. Math. Soc. 148, 501-531; Z. Wahrsch. verw. Geb. 16, 336-344.
- (1984). Markov Chains, 2nd ed. North-Holland, Amsterdam.
REVUZ, D., YOR, M. (1999). Continuous Martingales and Brownian Motion, 3rd ed. Springer, Berlin.
RIESZ, F. (1909a). Sur les suites de fonctions mesurables. C.R. Acad. Sci. Paris 148, 1303-1305.
- (1909b). Sur les opérations fonctionnelles linéaires. C.R. Acad. Sci. Paris 149, 974-977.

- (1910). Untersuchungen über Systeme integrierbarer Funktionen. Math. Ann. 69, 449-497.
- (1926-30). Sur les fonctions subharmoniques et leur rapport à la théorie du potentiel, I-II. Acta Math. 48, 329-343; 54, 321-360.

ROGERS, C.A., SHEPHARD, G.C. (1958). Some extremal problems for convex bodies. Mathematika 5, 93-102.
ROGERS, L.C.G., WILLIAMS, D. (2000a/b). Diffusions, Markov Processes, and Martingales, 1 (2nd ed.); 2. Cambridge Univ. Press.
ROSEN, B. (1964). Limit theorems for sampling from a finite population. Ark. Mat. 5, 383-424.
ROSINSKI, J., WOYCZYNSKI, W.A. (1986). On Itô stochastic integration with respect to p-stable motion: Inner clock, integrability of sample paths, double and multiple integrals. Ann. Probab. 14, 271-286.
ROYDEN, H.L. (1988). Real Analysis, 3rd ed. Macmillan, NY.
RUTHERFORD, E., GEIGER, H. (1908). An electrical method of counting the number of particles from radioactive substances. Proc. Roy. Soc. A 81, 141-161.
RYLL-NARDZEWSKI, C. (1957). On stationary sequences of random variables and the de Finetti's [sic] equivalence. Colloq. Math. 4, 149-156.
- (1961). Remarks on processes of calls. Proc. 4th Berkeley Symp. Math. Statist. Probab. 2, 455-465.
SANOV, I.N. (1957). On the probability of large deviations of random variables (Russian). Engl. trans.: Sel. Trans. Math. Statist. Probab. 1 (1961), 213-244.
SCHILDER, M. (1966). Some asymptotic formulae for Wiener integrals. Trans. Amer. Math. Soc. 125, 63-85.
SCHOENBERG, I.J. (1938). Metric spaces and completely monotone functions. Ann. Math. 39, 811-841.
SCHRÖDINGER, E. (1931). Über die Umkehrung der Naturgesetze. Sitzungsber. Preuss. Akad. Wiss. Phys. Math. Kl. 144-153.
VAN SCHUPPEN, J.H., WONG, E. (1974). Transformation of local martingales under a change of law. Ann. Probab. 2, 879-888.
SEGAL, I.E. (1954). Abstract probability spaces and a theorem of Kolmogorov. Amer. J. Math. 76, 721-732.
SHANNON, C.E. (1948). A mathematical theory of communication. Bell System Tech. J. 27, 379-423, 623-656.
SHARPE, M. (1988). General Theory of Markov Processes. Academic Press, Boston.
SHIRYAEV, A.N. (1995). Probability, 2nd ed. Springer, NY.
SHUR, M.G. (1961). Continuous additive functionals of a Markov process. Dokl. Akad. Nauk SSSR 137, 800-803.
SIERPINSKI, W. (1928). Un théorème général sur les familles d'ensembles. Fund. Math. 12, 206-210.
SKOROHOD, A.V. (1956). Limit theorems for stochastic processes. Th. Probab. Appl. 1, 261-290.

- (1957). Limit theorems for stochastic processes with independent increments. Th. Probab. Appl. 2, 122-142.
- (1961-62). Stochastic equations for diffusion processes in a bounded region, I-II. Th. Probab. Appl. 6, 264-274; 7, 3-23.
- (1965). Studies in the Theory of Random Processes. Addison-Wesley, Reading, MA. (Russian orig. 1961.)

SLIVNYAK, I.M. (1962). Some properties of stationary flows of homogeneous random events. Th. Probab. Appl. 7, 336-341.
SLUTSKY, E.E. (1937). Qualche proposizione relativa alla teoria delle funzioni aleatorie. Giorn. Ist. Ital. Attuari 8, 183-199.
SNELL, J.L. (1952). Application of martingale system theorems. Trans. Amer. Math. Soc. 73, 293-312.
SOVA, M. (1967). Convergence d'opérations linéaires non bornées. Rev. Roumaine Math. Pures Appl. 12, 373-389.
SPARRE-ANDERSEN, E. (1953-54). On the fluctuations of sums of random variables, I-II. Math. Scand. 1, 263-285; 2, 195-223.
SPARRE-ANDERSEN, E., JESSEN, B. (1948). Some limit theorems on set-functions. Danske Vid. Selsk. Mat.-Fys. Medd. 25:5 (8 pp.).
SPITZER, F. (1964). Electrostatic capacity, heat flow, and Brownian motion. Z. Wahrsch. verw. Geb. 3, 110-121.
- (1976). Principles of Random Walk, 2nd ed. Springer, NY.
STIELTJES, T.J. (1894-95). Recherches sur les fractions continues. Ann. Fac. Sci. Toulouse 8, 1-122; 9, 1-47.
STONE, C.J. (1963). Weak convergence of stochastic processes defined on a semi-infinite time interval. Proc. Amer. Math. Soc. 14, 694-696.
- (1969). On the potential operator for one-dimensional recurrent random walks. Trans. Amer. Math. Soc. 136, 427-445.
STONE, M.H. (1932). Linear transformations in Hilbert space and their applications to analysis. Amer. Math. Soc. Coll. Publ. 15.
STOUT, W.F. (1974). Almost Sure Convergence. Academic Press, NY.
STRASSEN, V. (1964). An invariance principle for the law of the iterated logarithm. Z. Wahrsch. verw. Geb. 3, 211-226.
STRATONOVICH, R.L. (1966). A new representation for stochastic integrals and equations. SIAM J. Control 4, 362-371.
STRICKER, C., YOR, M. (1978). Calcul stochastique dépendant d'un paramètre. Z. Wahrsch. verw. Geb. 45, 109-133.
STROOCK, D.W. (1993). Probability Theory: An Analytic View. Cambridge Univ. Press.
STROOCK, D.W., VARADHAN, S.R.S. (1969). Diffusion processes with continuous coefficients, I-II. Comm. Pure Appl. Math. 22, 345-400, 479-530.
- (1979). Multidimensional Diffusion Processes. Springer, Berlin.
SUCHESTON, L. (1983). On one-parameter proofs of almost sure convergence of multiparameter processes. Z. Wahrsch. verw. Geb. 63, 43-49.
TAKACS, L. (1967). Combinatorial Methods in the Theory of Stochastic Processes. Wiley, NY.


TANAKA, H. (1963). Note on continuous additive functionals of the 1-dimensional Brownian path. Z. Wahrsch. verw. Geb. 1, 251-257.
TEMPEL'MAN, A.A. (1972). Ergodic theorems for general dynamical systems. Trans. Moscow Math. Soc. 26, 94-132.
THORISSON, H. (1996). Transforming random elements and shifting random fields. Ann. Probab. 24, 2057-2064.
- (2000). Coupling, Stationarity, and Regeneration. Springer, NY.
TONELLI, L. (1909). Sull'integrazione per parti. Rend. Acc. Naz. Lincei (5) 18, 246-253.
TROTTER, H.F. (1958a). Approximation of semi-groups of operators. Pacific J. Math. 8, 887-919.
- (1958b). A property of Brownian motion paths. Illinois J. Math. 2, 425-433.
VARADARAJAN, V.S. (1958). Weak convergence of measures on separable metric spaces. On the convergence of probability distributions. Sankhyā 19, 15-26.
- (1963). Groups of automorphisms of Borel spaces. Trans. Amer. Math. Soc. 109, 191-220.
VARADHAN, S.R.S. (1966). Asymptotic probabilities and differential equations. Comm. Pure Appl. Math. 19, 261-286.
- (1984). Large Deviations and Applications. SIAM, Philadelphia.
VILLE, J. (1939). Étude Critique de la Notion de Collectif. Gauthier-Villars, Paris.
VITALI, G. (1905). Sulle funzioni integrali. Atti R. Accad. Sci. Torino 40, 753-766.
VOLKONSKY, V.A. (1958). Random time changes in strong Markov processes. Th. Probab. Appl. 3, 310-326.
- (1960). Additive functionals of Markov processes. Trudy Mosk. Mat. Obshch. 9, 143-189.
WALD, A. (1946). Differentiation under the integral sign in the fundamental identity of sequential analysis. Ann. Math. Statist. 17, 493-497.
- (1947). Sequential Analysis. Wiley, NY.
WALSH, J.B. (1978). Excursions and local time. Astérisque 52-53, 159-192.
WANG, A.T. (1977). Generalized Itô's formula and additive functionals of Brownian motion. Z. Wahrsch. verw. Geb. 41, 153-159.
WATANABE, H. (1964). Potential operator of a recurrent strong Feller process in the strict sense and boundary value problem. J. Math. Soc. Japan 16, 83-95.
WATANABE, S. (1964). On discontinuous additive functionals and Lévy measures of a Markov process. Japan. J. Math. 34, 53-79.
- (1968). A limit theorem of branching processes and continuous state branching processes. J. Math. Kyoto Univ. 8, 141-167.
WEIL, A. (1940). L'Intégration dans les Groupes Topologiques et ses Applications. Hermann et Cie, Paris.
WIENER, N. (1923). Differential space. J. Math. Phys. 2, 131-174.
- (1938). The homogeneous chaos. Amer. J. Math. 60, 897-936.
- (1939). The ergodic theorem. Duke Math. J. 5, 1-18.

WILLIAMS, D. (1991). Probability with Martingales. Cambridge Univ. Press.

YAMADA, T. (1973). On a comparison theorem for solutions of stochastic differential equations and its applications. J. Math. Kyoto Univ. 13, 497-512.
YAMADA, T., WATANABE, S. (1971). On the uniqueness of solutions of stochastic differential equations. J. Math. Kyoto Univ. 11, 155-167.
YOEURP, C. (1976). Décompositions des martingales locales et formules exponentielles. Lect. Notes in Math. 511, 432-480. Springer, Berlin.
YOR, M. (1978). Sur la continuité des temps locaux associés à certaines semimartingales. Astérisque 52-53, 23-36.
YOSIDA, K. (1948). On the differentiability and the representation of one-parameter semigroups of linear operators. J. Math. Soc. Japan 1, 15-21.
YOSIDA, K., KAKUTANI, S. (1939). Birkhoff's ergodic theorem and the maximal ergodic theorem. Proc. Imp. Acad. 15, 165-168.
ZÄHLE, M. (1980). Ergodic properties of general Palm measures. Math. Nachr. 95, 93-106.
ZAREMBA, S. (1909). Sur le principe du minimum. Bull. Acad. Sci. Cracovie.
ZYGMUND, A. (1951). An individual ergodic theorem for noncommutative transformations. Acta Sci. Math. (Szeged) 14, 103-110.

Symbol Index