Mathematical foundations of the calculus of probability


English Pages [248] Year 1965


NUNC COGNOSCO EX PARTE

TRENT UNIVERSITY LIBRARY

Digitized by the Internet Archive in 2019 with funding from Kahle/Austin Foundation

https://archive.org/details/mathematicalfounOOOOneve

MATHEMATICAL FOUNDATIONS OF

THE CALCULUS OF PROBABILITY

HOLDEN-DAY SERIES IN PROBABILITY AND STATISTICS

E. L. Lehmann, Editor

MATHEMATICAL FOUNDATIONS OF

THE CALCULUS OF PROBABILITY By JACQUES NEVEU Faculty of Sciences, University of Paris

Translated by AMIEL FEINSTEIN Foreword by R. FORTET

HOLDEN-DAY, INC. San Francisco, London, Amsterdam 1965

This book is translated from Bases Mathématiques du Calcul des Probabilités, 1964, Masson et Cie, Paris.

© Copyright 1965 by Holden-Day, Inc., 728 Montgomery Street, San Francisco, California. All rights reserved. No part of this book may be reproduced in any form without permission in writing from the publisher.

Library of Congress Catalog Card Number: 66-11140

Printed in the United States of America


FOREWORD

In its present state, the calculus of probability and, in particular, the theory of stochastic processes and vector-valued random variables, cannot be understood by one who does not have, to begin with, a thorough understanding of measure theory.

If one is to prepare for participation in the future development of the calculus of probability, it is not sufficient to know the fundamental concepts and results of measure theory; one must also be experienced in its techniques and able to use them and extend them to new situations.

Again, one often hears (and quite justifiably in a certain sense) that the calculus of probability is simply a paragraph of the theory of measure; but within measure theory, the calculus of probability stands out by the nature of the questions which it seeks to answer, a nature which has its origins not in measure theory itself, but in the philosophical and practical content of the notion of probability.

The advanced course in the calculus of probability is aimed at those students having the body of knowledge which in France is called “licence de Mathématiques”; this body of knowledge covers mathematics in general. It naturally encompasses the theory of measure and integration, but is necessarily limited to an introduction to the subject. It is thus necessary that this question be taken up again and developed in advanced studies; it is still more necessary that its exposition be oriented specifically toward applications to probability theory.

Since 1959, Professor Neveu has been given the task of presenting this course at the Faculty of Sciences of Paris.

It is not necessary to introduce him to the specialists in probability theory. In a short number of years he has gained their attention by brilliant work; but not all can know as well as I how much our students and young researchers appreciate his lively and clear method of teaching. The course which he has taught, enriched by this pedagogic experience, constitutes the subject matter of the present work.

To be sure, there are already books, some of them more extensive, on the theory of measure and several of them are excellent.

However, I have already stated why probabilists have need of a text written especially for their use; and for beginners, a text of limited size is preferable.

In such a domain, Professor Neveu has naturally sought to write an expository book, not one of original work, in the sense that he does not pretend to introduce new concepts or to establish new theorems. The fact that he has, very usefully, enriched each chapter with Complements and Problems underlines the essentially pedagogic objective of his book, concerning which I can with pleasure point out two non-trivial merits: he avoids an overburdened notation, and, in a subject which is by its nature abstract, he does not hesitate to insert whenever necessary a paragraph which interprets, which states the reason for things, or which calls attention to an error to be avoided.

The exposition nevertheless proceeds with profound originality. First, by its contents: to the classical elements of measure and integration, the author adds all the theorems for the construction of a probability by extension: from an algebra to a $\sigma$-algebra, from a compact subclass to a semialgebra, from finite products of spaces to infinite products of spaces (theorems of Kolmogorov and Tulcea), etc. He treats the measurability, separability and the construction of random functions; conditional expectations, and martingales. He illustrates general results by applications to stopping times, ergodic theory, Markov processes, as well as other problems, all of these rarely included in treatises on measure theory, some of them because of their recent development, others because, while they are of major importance in probability theory, they are perhaps of less interest in general measure theory.

The originality appears equally in the presentation; I particularly appreciate the simple but systematic way in which Professor Neveu has set forth from the first the algebraic structures of families of events which intervene (Boolean algebra and $\sigma$-algebras, etc.), while avoiding the premature introduction of topological concepts, whose significance is thereby even better understood.

Throughout, he has succeeded in establishing the most concise and elegant proofs, so that in a small number of pages he is able to be remarkably complete; for example he treats, at least briefly, $L_p$ spaces and even, by a judicious use of the Complements and Problems, decision theory and sufficient statistics.

As a text for study by advanced students, as a reference work for researchers, I can without risk predict for this book long life and great success.

R. Fortet
August, 1965
Geneva

TRANSLATOR’S PREFACE

In comparison with the French original, this translation has benefited by the addition of a section (IV.7) on sequences of independent random variables, as well as by certain additions to the Complements and Problems. Also, the proofs of a few results have been modified. For these improvements, and in particular for the full measure of assistance which I have received from Professor Neveu at every stage of the translation, it is a pleasure to record here my deep gratitude.

A. Feinstein

DEFINITIONS AND NOTATION

Definitions of the terms partially ordered set (or system), totally ordered set, lattice, complete lattice, generalized sequence, vector space, Banach space, and linear functional (among others used in this book) may be found in Chapters I and II of the treatise Linear Operators, Part I, by N. Dunford and J. T. Schwartz.

A real vector lattice is a set which is both a lattice (under some partial ordering) and a real vector space, and such that $x \le y$ implies $cx \le cy$ for every real $c \ge 0$, and also $z + x \le z + y$ for every $z$. A linear functional $f$ on a vector lattice is said to be positive if $x \ge 0$ implies $f(x) \ge 0$.

A partially ordered set $E$ is said to be inductive if it satisfies the hypothesis of Zorn’s lemma (Dunford and Schwartz, p. 6), i.e., if every totally ordered subset of $E$ has an upper bound in $E$.

A pre-Hilbert space is a space satisfying all the axioms of a Hilbert space except the axiom of completeness.

The symbol $\Rightarrow$ indicates logical implication; | denotes the end of a proof; $\{x : \cdots\}$ denotes the set of all objects $x$ which satisfy the conditions $\cdots$; * marks difficult sections or problems, for whose understanding or solution concepts not discussed in the text may be needed; finally, = has occasionally been used for “equality by definition.”

AUTHOR’S PREFACE

The object of the theory of probability is the mathematical analysis of the notion of chance. As a mathematical discipline, it can only develop in a rigorous manner if it is based upon a system of precise definitions and axioms. Historically, the formulation of such a mathematical basis and the mathematical elaboration of the theory goes back to the 1930’s. In fact, it was only at this period that the theory of measure and of integration on general spaces was sufficiently developed to furnish the theory of probability with its fundamental definitions, as well as its most powerful tool for development.

Since then, numerous probabilistic investigations, undertaken in the theoretical as well as practical domain, in particular those making use of functional spaces, have only served to confirm the close relations established between probability theory and measure theory. These relations are, incidentally, so close that certain authors have been loath to see in probability theory more than an extension (but how important a one!) of measure theory.

In any case, it is impossible at the present time to undertake a profound study of probability theory and mathematical statistics without continually making use of measure theory, unless one limits oneself to a study of very elementary probabilistic models and, in particular, cuts oneself off from the consideration of random functions. Attempts have been made, it is true, to treat convergence problems of probability theory within the restricted framework of the study of distribution functions; but this procedure only gives a false simplification of the question and further conceals the intuitive basis of these problems.

The book reproduces the essentials of a course for the first year of the third cycle (which corresponds roughly to the first or second year of graduate work in the United States) which is addressed to students who already have some elementary notions of the calculus of probability; it is intended to furnish them with a solid mathematical base for probability theory. Only a reader with a sound mathematical development could consider this book an introduction to the theory of probability.

Our first aim in this course is therefore to teach the reader how to handle the powerful tools provided by measure theory and to permit him subsequently to deal with any chapter of probability theory. Numerous problems complement the text; given the very “technical” nature of the subject being treated, it would seem to us indispensable for the reader to try to read and solve the greater part of these problems. (To help the reader in this task, we have frequently sketched a solution of a problem.)

In accordance with a presently well-established French tradition concerning introductory treatises, we have not deemed it worthwhile to insert bibliographical references in the text, or, with rare exceptions, to attribute the results obtained to their various authors. The reader will find, at the end of the book, a concise bibliography relating to the text or to the complements; most of the problems, in particular, arise out of the works listed in this bibliography.

We would not wish to conceal from the reader the fact that measure theory is not the unique tool of probability theory, even though it is its principal tool; we could not too strongly advise him to learn, if he has not already done so, the precise notions of topology, the theory of metric spaces, and the theory of Hilbert and Banach spaces. This book could not contain within its limited confines any introduction to these theories. Certain problems, and even certain portions of the text,† make use of notions borrowed from these theories; the beginner can ignore them without fear of losing the thread of the presentation, while the more advanced reader will be able to find connections with outside fields which may interest him.

I wish to take this opportunity to thank Professors R. Fortet, M. Loève and A. Tortrat for their suggestions and encouragement. The form of this book also owes much to the reactions of the students who have taken my course. Finally, my thanks go equally to Dr. A. Feinstein for his excellent work of translation.

J. Neveu

† We have marked them with an asterisk.

TABLE OF CONTENTS

Foreword
Translator’s Preface
Author’s Preface

Chapter I. PROBABILITY SPACES
I.1. Events
I.2. Trials
I.3. Probabilities
I.4. Probability spaces
I.5. Extension of a probability
I.6. Boolean semialgebras, compact classes, and distribution functions on the real line

Chapter II. INTEGRATION OF RANDOM VARIABLES
II.1. Measurable mappings
II.2. Real random variables
II.3. The expectation of real random variables
II.4. Almost sure convergence and convergence in probability
II.5. Uniform integrability and mean convergence
II.6. $L_p$ spaces
*II.7. Integration on topological spaces

Chapter III. PRODUCT SPACES AND RANDOM FUNCTIONS
III.1. The product of two measurable spaces
III.2. Transition probabilities and product probabilities
III.3. Infinite products of measurable spaces and canonical probability spaces associated with random functions
III.4. Separability and measurability of random functions
III.5. Continuity of real random functions
III.6. Stopping times

Chapter IV. CONDITIONAL EXPECTATIONS AND MARTINGALES
IV.1. Measures
IV.2. Duality of $L_p$ spaces, and the weak topology on the space $L_1$
IV.3. Conditional expectations
IV.4. Independence
IV.5. Martingale theory
IV.6. Centered sequences of real random variables
IV.7. Sequences of independent random variables

Chapter V. ERGODIC THEORY AND MARKOV PROCESSES
V.1. A theorem of Ionescu Tulcea and a theorem on product spaces
V.2. Construction of canonical Markov processes (discrete time)
V.3. Strong ergodic theorem
V.4. Sub-Markovian operators
V.5. Ergodic decomposition
V.6. Pointwise ergodic theorem

Bibliography
Index

CHAPTER I

PROBABILITY SPACES

The fundamental concepts of the theory of probability are those of events and of probabilities. Axiomatically, events are mathematical entities which are susceptible of combination by the logical operations “not,” “and,” “or” (according to the rules specified in Section 1 of this chapter), while a probability is a valuation on the class of events whose properties are by definition analogous to those of a frequency (see Section 3).

Another notion, which is in fact frequently introduced as the first notion of the theory of probability, is that of a trial, that is, the result of a random experiment. From the natural condition of considering only events and trials relating to the experiment which is being studied, every trial necessarily determines, by its definition, either the realization or nonrealization of every event which one wishes to consider. We are thus led to introduce the ensemble $\Omega$ of trials (or possible results of the experiment being considered) and to identify each event with the subensemble of trials which realize this event; a probability thus becomes a set function, similar to a volume defined on certain subsets of a Euclidean space. The preceding ensemble point of view is that of measure theory, which we shall develop in the first chapter.

With regard to probabilities, we have defined them first on Boolean algebras (or, as in Section 6, on Boolean semialgebras), following which we extend them to $\sigma$-algebras and thus construct probability spaces. This procedure has the advantage of exhibiting a very important extension theorem of measure theory; moreover, in the construction of probabilities on Euclidean spaces or on product spaces (see Chapter 3), probabilities turn out to be defined naturally, at the outset, on algebras or semialgebras.

1.1. EVENTS

The first concept of the theory of probability is that of an event; we shall consider events only from the point of view of their occurrence or non-occurrence. The analysis of this concept will lead us to endow the ensemble of events, which we wish to consider relative to a definite problem, with the structure of a Boolean algebra.

We consider first two special events: the impossible event, denoted by $\emptyset$, and the certain event, denoted by $\Omega$. With every event $A$ we associate the contrary event, denoted by $A^c$; by definition the latter event is realized if and only if the event $A$ is not realized. The following properties of this operation (which are “intuitively” evident) are then set down as axioms:

$$(A^c)^c = A; \qquad \emptyset^c = \Omega; \qquad \Omega^c = \emptyset.$$

With every pair $A$, $B$ of events we associate, on the one hand, the event “union of $A$ and $B$,” that is, “$A$ or $B$,” denoted by $A \cup B$ or $\sup(A, B)$; and on the other hand, the event “intersection of $A$ and $B$,” that is, “$A$ and $B$,” denoted by $A \cap B$, $AB$ or $\inf(A, B)$. By definition the event $A \cup B$ occurs if and only if at least one of the two events $A$ and $B$ occurs, while the event $A \cap B$ occurs if and only if both of the events $A$ and $B$ occur. The operations of union and intersection are commutative and associative:

$$A \cup B = B \cup A; \qquad A \cap B = B \cap A;$$
$$(A \cup B) \cup C = A \cup (B \cup C); \qquad (A \cap B) \cap C = A \cap (B \cap C)$$

so that every finite nonempty family $\{A_i, i \in I\}$ of events has a union $\bigcup_I A_i = \sup_I A_i$ and an intersection $\bigcap_I A_i = \inf_I A_i$. The following formulas are again set down as axioms:

$$A \cup \emptyset = A, \qquad A \cap \emptyset = \emptyset;$$
$$A \cup A = A, \qquad A \cap A = A;$$
$$A \cup \Omega = \Omega, \qquad A \cap \Omega = A;$$
$$A \cup A^c = \Omega, \qquad A \cap A^c = \emptyset;$$

as are the following relations which we write for a finite nonempty family $\{A_i, i \in I\}$ of events:

$$\Bigl(\bigcup_I A_i\Bigr)^c = \bigcap_I A_i^c; \qquad \Bigl(\bigcap_I A_i\Bigr)^c = \bigcup_I A_i^c.$$

Finally, the operations of union and intersection are distributive relative to each other:

$$B \cap \Bigl(\bigcup_I A_i\Bigr) = \bigcup_I (B \cap A_i); \qquad B \cup \Bigl(\bigcap_I A_i\Bigr) = \bigcap_I (B \cup A_i).$$
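As a concrete illustration of the complementation, De Morgan and distributive rules of this section, here is a minimal Python sketch. It assumes a hypothetical four-point space `OMEGA` and models events as `frozenset` objects, anticipating the identification of events with sets of trials made in Section 1.2; the names `complement`, `all_events` and `check_axioms` are illustrative only and do not come from the text.

```python
from itertools import combinations

OMEGA = frozenset(range(4))  # hypothetical finite space of trials


def complement(a):
    """Contrary event A^c, taken relative to OMEGA."""
    return OMEGA - a


def all_events(omega):
    """Every subset of omega, used here as the class of events."""
    items = sorted(omega)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def check_axioms(events):
    """Check (A^c)^c = A, both De Morgan laws and both distributive laws."""
    for a in events:
        if complement(complement(a)) != a:
            return False
        for b in events:
            if complement(a | b) != complement(a) & complement(b):
                return False
            if complement(a & b) != complement(a) | complement(b):
                return False
            for c in events:
                if b & (a | c) != (b & a) | (b & c):
                    return False
                if b | (a & c) != (b | a) & (b | c):
                    return False
    return True


EVENTS = all_events(OMEGA)
AXIOMS_HOLD = check_axioms(EVENTS)
```

An exhaustive check over a small space is of course not a proof; it only shows that the subsets of a finite set give one model in which the axioms hold.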

The structure which is established on the ensemble of events by the preceding definitions and axioms is called the structure of a Boolean algebra. The following auxiliary notions which are defined on such a Boolean algebra are no less important.

Two events $A$, $B$ such that $AB = \emptyset$ are said to be exclusive or disjoint; in this case we call their union their sum and write $A + B$ instead of $A \cup B$. Given a finite nonempty family $\{A_i, i \in I\}$ of events which are pairwise disjoint, we similarly call their union the “sum of the $A_i$ ($i \in I$)” and write $\sum_I A_i$ instead of $\bigcup_I A_i$.

The difference of two events $A$, $B$, denoted by $A - B$, is defined by $A - B = AB^c$, while their symmetric difference, denoted by $A \,\Delta\, B$, is defined by $A \,\Delta\, B = (A - B) + (B - A)$. The event $A - B$ occurs if and only if the event $A$ occurs and $B$ does not; the event $A \,\Delta\, B$ occurs if and only if one, but not both, of the two events $A$ and $B$ occurs.

It is convenient to extend the notation $\bigcup_I A_i$, $\bigcap_I A_i$, $\sum_I A_i$ to the case of an empty family of events by setting

$$\bigcup_I A_i = \emptyset, \qquad \sum_I A_i = \emptyset, \qquad \bigcap_I A_i = \Omega \qquad (I \text{ empty}).$$

By means of this natural convention, all the formulas written earlier relative to a finite family of events are valid even if the family is empty. This convention also permits us, for example, to write in a simple form the formula which constitutes the following elementary lemma.

Lemma 1.1.1. Given a family $\{A_i, 1 \le i \le n\}$ of $n$ ($\ge 1$) events, we have

$$\bigcup_{i=1}^{n} A_i = \sum_{i=1}^{n} \Bigl(A_i - \bigcup_{j<i} A_j\Bigr).$$

This lemma is proved by induction on $n$. For $n = 1$ it is obvious. Consequently (on setting $A = \bigcup_{i=1}^{n} A_i$, $B = A_{n+1}$), it suffices to prove that $A \cup B = A + (B - A)$ for every pair $A$, $B$ of events; but this identity is an easy consequence of the definitions.
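The decomposition in Lemma 1.1.1 can be sketched in a few lines of Python. This is only an illustration under the assumption that events are finite sets; the function name `disjointify` and the sample family are hypothetical.

```python
def disjointify(events):
    """Return the pieces A_i - (A_1 u ... u A_{i-1}) of Lemma 1.1.1, in order."""
    seen = set()      # running union of the events already visited
    pieces = []
    for a in events:
        pieces.append(frozenset(a) - seen)
        seen |= a
    return pieces


family = [{1, 2}, {2, 3}, {1, 3, 4}]
pieces = disjointify(family)

# The union of the family equals the (disjoint) sum of the pieces.
union_of_family = frozenset().union(*family)
union_of_pieces = frozenset().union(*pieces)
```

For the empty family the function returns an empty list, which matches the convention that an empty union or sum is the impossible event.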

Let us remark that the preceding lemma states simply that one of the events of the sequence $A_1, \ldots, A_n$ is realized if and only if there exists a first event of this sequence which is realized. (The event $A_i - \bigcup_{j<i} A_j$ is in fact realized if and only if $A_i$ is the first event of the sequence to be realized.)

The event $A$ is said to imply the event $B$ (which implication is denoted by $A \subset B$ or $B \supset A$) if $A = A \cap B$, or equivalently, if $B = A \cup B$. Two events $A$ and $B$ such that $A \subset B$ and $B \subset A$ are said to be equivalent ($A = B$); we shall never distinguish between two such events. The relation of implication is an order relation on the ensemble of events, that is:

$$A \subset A; \qquad A \subset B,\ B \subset A \Rightarrow A = B; \qquad A \subset B,\ B \subset C \Rightarrow A \subset C.$$

Moreover, the union $A \cup B$ (the intersection $A \cap B$) of two events $A$ and $B$ is the supremum (infimum) of these two events under the order relation:

$$A \subset C,\ B \subset C \iff A \cup B \subset C; \qquad A \supset C,\ B \supset C \iff A \cap B \supset C.$$

Finally, we note that $A \subset B \Rightarrow B^c \subset A^c$.

Complements and problems

1.1.1. Starting from the definitions and axioms above, show that the following identities are valid for any events $A$, $B$, $C$, $D$:

$$A - B = A - (A \cap B) = (A \cup B) - B;$$
$$A \,\Delta\, B = (A \cup B) - (A \cap B);$$
$$(A - B) \cap (C - D) = (A \cap C) - (B \cup D);$$
$$(A \,\Delta\, B) \cap (A \,\Delta\, C) = A \,\Delta\, (B \cup C);$$
$$(A \,\Delta\, B) + (A \,\Delta\, B^c) = \Omega.$$

1.1.2. Show that for the two operations $\Delta$ and $\cap$, every Boolean algebra is (in the algebraic sense) a commutative ring with a unit ($\Omega$) such that $A \cap A = A$ for all $A$; for this reason the operation $\Delta$ of symmetric difference is also called the Boolean sum, and the operation $\cap$ of intersection is called the Boolean product. Conversely, given a commutative ring $\mathcal{A}$ with a unit, say $\Omega$, such that $A \cdot A = A$ for all $A$, the operations

$$A \cap B = A \cdot B; \qquad A \cup B = A + B + A \cdot B; \qquad A^c = A + \Omega$$

define a Boolean algebra structure on $\mathcal{A}$. (The signs $+$ and $\cdot$ here denote the operations of addition and multiplication, respectively, in the ring $\mathcal{A}$.)
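The ring structure of Problem 1.1.2 can be checked mechanically on a small example. The sketch below assumes a hypothetical three-point space and takes symmetric difference as addition and intersection as multiplication; the names `add`, `mul` and `is_boolean_ring` are illustrative, and the exhaustive check only covers this one finite model.

```python
from itertools import combinations

OMEGA = frozenset("abc")  # hypothetical three-point space
EMPTY = frozenset()
SUBSETS = [frozenset(c) for r in range(len(OMEGA) + 1)
           for c in combinations(sorted(OMEGA), r)]


def add(a, b):
    """Boolean sum: symmetric difference."""
    return a ^ b


def mul(a, b):
    """Boolean product: intersection."""
    return a & b


def is_boolean_ring(subsets):
    """Check the ring axioms used in Problem 1.1.2 on every triple."""
    for a in subsets:
        # zero element, unit element, idempotence, and A^c = A + Omega
        if add(a, EMPTY) != a or mul(a, OMEGA) != a or mul(a, a) != a:
            return False
        if add(a, OMEGA) != OMEGA - a:
            return False
        for b in subsets:
            if add(a, b) != add(b, a) or mul(a, b) != mul(b, a):
                return False
            # union recovered from the ring operations: A + B + A.B
            if add(add(a, b), mul(a, b)) != a | b:
                return False
            for c in subsets:
                if add(add(a, b), c) != add(a, add(b, c)):
                    return False
                if mul(a, add(b, c)) != add(mul(a, b), mul(a, c)):
                    return False
    return True


RING_OK = is_boolean_ring(SUBSETS)
```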

1.2. TRIALS

A second concept which is generally introduced at the beginning of the theory of probability is that of a trial. One thinks of a trial as an experiment in which chance intervenes, or rather as the outcome of this random experiment. As a consequence, every trial related to the model being considered necessarily implies either the realization or non-realization of every event given a priori and relative to the model in question. We shall now make precise in mathematical terms this relation between the concepts of trial and event.

Let us first consider the ensemble of all trials related to a given model; we shall substitute the consideration of this ensemble for that of the model being studied. Let us therefore associate with every event $A$ that part $A'$ of the space of trials consisting of the trials which realize $A$; it is then natural to seek to “identify” $A$ and $A'$. To this end, we shall first suppose that the correspondence $A \to A'$ is one-to-one, that is, that the space of trials is large enough that, given two distinct events, there is at least one trial which realizes one of these events to the exclusion of the other.

Let us turn next to the definitions of Section 1.1. To the certain event $\Omega$ corresponds the ensemble $\Omega'$ consisting of all the trials, while to the impossible event $\emptyset$ corresponds the ensemble $\emptyset'$ containing no trial; in other words $\Omega'$ is the space of trials and $\emptyset'$ is the empty set in $\Omega'$. If to the event $A$ corresponds the set $A'$ in $\Omega'$, then to the event $A^c$, which is realized if and only if $A$ is not, there will correspond the set $(A')^c$ which is complementary to $A'$ in $\Omega'$, that is, which consists of those points in $\Omega'$ (trials) not belonging to $A'$. Similarly if $A$ and $B$ are two events and $A'$ and $B'$ are the sets in $\Omega'$ consisting of the trials which realize $A$ and $B$ respectively, the set $(A \cup B)'$ [$(A \cap B)'$] in $\Omega'$ is made up of those trials belonging to $A'$ or $B'$ [to $A'$ and $B'$].

In short, if we denote by $^c$, $\cup$ and $\cap$ the operations of complementation, union and intersection, respectively, defined in the sense of set theory on $\Omega'$, the preceding can be written as:

$$(A^c)' = (A')^c; \qquad (A \cup B)' = A' \cup B'; \qquad (A \cap B)' = A' \cap B'.$$

The reader can now verify that the various axioms of Section 1.1 go over, under the correspondence $A \to A'$, into axioms of set theory. We shall say that the correspondence $A \to A'$ establishes an isomorphism of the Boolean algebra of events into the Boolean algebra $\mathcal{P}(\Omega')$ consisting of all the subsets of the space $\Omega'$ of trials. (In general there exist subsets of $\Omega'$ which do not correspond to any event.)

It is thus permissible to identify the event $A$ and the set $A'$ of trials which realize it. (We shall in future suppress the sign $'$.) To the various notions of Section 1.1 relative to events, there correspond the classical notions of set theory. This explains, in particular, the dual terminology of Section 1.1, namely:

    A or B,                        union of A and B;
    A and B,                       intersection of A and B;
    A and B are incompatible,      A and B are disjoint;
    the certain event $\Omega$,    the space $\Omega$;
    the impossible event $\emptyset$,    the empty set $\emptyset$.

From the preceding we shall, in essence, carry over the following notions for the sequel:

(a) the specification of a set $\Omega$ (or space of trials),
(b) the specification of a Boolean algebra of sets in $\Omega$ (or events); that is, by definition, a class of sets in $\Omega$ containing $\emptyset$, $\Omega$, and closed under the operations of complementation, finite union and finite intersection.

(A class $\mathcal{C}$ of subsets of a set $\Omega$ is said to be closed under an operation on sets if this operation, applied to any subsets of $\Omega$ belonging to $\mathcal{C}$, yields a subset of $\Omega$ belonging to $\mathcal{C}$.)

Note that by virtue of the identities:

$$A_1 \cap \cdots \cap A_n = (A_1^c \cup \cdots \cup A_n^c)^c; \qquad A_1 \cup \cdots \cup A_n = (A_1^c \cap \cdots \cap A_n^c)^c$$

it suffices, for $\mathcal{A}$ to be a Boolean algebra, that $\mathcal{A}$ contain $\emptyset$, $\Omega$, and be closed under complementation and finite union (alternatively, under complementation and finite intersection).

Given an arbitrary class $\mathcal{C}$ of subsets of a set $\Omega$, there exists a smallest Boolean algebra $\mathcal{A}$ of subsets of $\Omega$ which contains $\mathcal{C}$. To see this, it suffices to define $\mathcal{A}$ as the class of subsets of $\Omega$ belonging to all Boolean algebras (of subsets of $\Omega$) containing $\mathcal{C}$. (Such algebras exist; for example $\mathcal{P}(\Omega)$.) The Boolean algebra thus defined is said to be generated by $\mathcal{C}$.

The reasoning which shows its existence is of greater generality: it shows that when one considers a certain number of operations on sets (in the above, complementation, finite union and finite intersection), for every class $\mathcal{C}$ of subsets of $\Omega$ there exists a smallest class of subsets of $\Omega$ containing $\mathcal{C}$ and closed under the set operations considered.
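On a finite set the generated Boolean algebra can be computed explicitly. The following Python sketch does not use the intersection-of-all-algebras argument given above; it substitutes an iterative closure under complementation and finite union, which on a finite space arrives at the same smallest algebra. The function name `generated_algebra` and the sample generators are assumptions made for illustration.

```python
def generated_algebra(omega, generators):
    """Smallest class containing the generators, the empty set and omega,
    closed under complementation and finite union (finite omega only)."""
    omega = frozenset(omega)
    algebra = {frozenset(), omega} | {frozenset(g) for g in generators}
    while True:
        new = {omega - a for a in algebra}                 # complements
        new |= {a | b for a in algebra for b in algebra}   # pairwise unions
        if new <= algebra:      # nothing new: the class is closed
            return algebra
        algebra |= new


# Hypothetical example: Omega = {0, 1, 2, 3}, generators {0} and {0, 1}.
ALGEBRA = generated_algebra(range(4), [{0}, {0, 1}])
```

Closure under finite intersection is not imposed separately, since by the identities noted above it follows from closure under complementation and finite union. In this example the result has the three atoms {0}, {1} and {2, 3}, hence eight elements.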

Proposition 2 below will give us a more explicit construction of the Boolean algebra generated by a class $\mathcal{C}$. First we will study the particularly simple example of finite Boolean algebras.

Definition 1.2.1. A finite partition $\mathcal{P} = \{A_i, i \in I\}$ of a set $\Omega$ is a finite family of nonempty subsets of $\Omega$, pairwise disjoint and with union $\Omega$.

A finite partition $\mathcal{P}'$ of $\Omega$ is said to be finer than a finite partition $\mathcal{P}$ if every subset of $\Omega$ in $\mathcal{P}$ is the union of subsets of $\Omega$ in $\mathcal{P}'$ or, as one can easily show to be equivalent, if every subset of $\Omega$ in $\mathcal{P}'$ is contained in some subset of $\Omega$ in $\mathcal{P}$. We note that if $\mathcal{P}_1$ and $\mathcal{P}_2$ are two finite partitions of $\Omega$, then there always exists a finite partition $\mathcal{P}$ of $\Omega$ which is finer than both $\mathcal{P}_1$ and $\mathcal{P}_2$; in fact, if $\mathcal{P}_1 = \{A_i, i \in I\}$ and $\mathcal{P}_2 = \{B_j, j \in J\}$ it suffices to take for $\mathcal{P}$ the family of nonempty $A_i \cap B_j$ ($i \in I$, $j \in J$).

Proposition 1.2.1. The Boolean algebra $\mathcal{A}$ generated by a finite partition $\mathcal{P}$ of $\Omega$ is made up of the set of unions of all subfamilies of $\mathcal{P}$ (if $\mathcal{P}$ consists of $n$ elements, $\mathcal{A}$ consists of $2^n$). Conversely, if $\mathcal{A}$ is a finite Boolean algebra of subsets of $\Omega$, the ensemble of its atoms (nonempty subsets $A$ of $\Omega$ such that $\emptyset \ne B \subset A$, $B \in \mathcal{A} \Rightarrow B = A$) constitutes a finite partition of $\Omega$ which generates $\mathcal{A}$.

Concise proof. The verification of the first part of the proposition is simple and is left to the reader. To prove the second part, we note that if $\mathcal{A} = \{A_i, i \in I\}$ is a finite Boolean algebra of subsets of $\Omega$, then every subset $B$ of $\Omega$ of the form $B = \bigcap_I B_i$, where $B_i = A_i$ or $A_i^c$ ($i \in I$), is either empty or an atom of $\mathcal{A}$. Since two distinct nonempty sets $B$ are necessarily disjoint and since every $A \in \mathcal{A}$ (in particular $\Omega$) is the union of nonempty sets $B$ which it contains, the set of nonempty $B$ indeed forms a finite partition of $\Omega$ which generates $\mathcal{A}$. Finally, it is clear that every atom of $\mathcal{A}$ is equal to a set $B$.

Proposition 1.2.2. Let $\mathcal{C}$ be an arbitrary class of subsets of a set $\Omega$.

We form successively: (1) the class $\mathcal{C}_1$ consisting of $\emptyset$, $\Omega$ and the $A \subset \Omega$ such that $A$ or $A^c \in \mathcal{C}$; (2) the class $\mathcal{C}_2$ of finite intersections of subsets of $\Omega$ in $\mathcal{C}_1$; (3) the class […]

[…] $B \in \mathcal{F}$, (c) $A, B \in \mathcal{F} \Rightarrow AB \in \mathcal{F}$. In order that a filter be maximal, that is, that it not be contained in any other filter, it is necessary and sufficient that for every $A \in \mathcal{A}$, either $A$ or $A^c$ belong to $\mathcal{F}$. (Observe that if $\mathcal{F}$ is a filter and if $A \in \mathcal{A}$, the ensemble $\{C : C \supset AB \text{ for a } B \in \mathcal{F}\}$ is either a filter, or else equal to $\mathcal{A}$.) Every filter is the intersection of the maximal filters which contain it. (Use Zorn’s lemma to show that if $\mathcal{F}$ is a filter and if $A \notin \mathcal{F}$, there exists a maximal filter not containing $A$, but containing $\mathcal{F}$.) The maximal filters will be called trials and denoted by […] A' = [ê: A e S'} is an isomorphism of $\mathcal{A}$ onto a Boolean algebra of subsets of $E$. In order that $A$ be an atom of $\mathcal{A}$, it is necessary and sufficient that $A'$ consist of a single trial. If $\mathcal{A}$ is isomorphic to a Boolean algebra of subsets of a set $\Omega$, the mapping $\varphi$ of $\Omega$ into $E$ such that […]

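$$A_n \downarrow A \ (n \to \infty) \text{ in } \mathcal{A} \Rightarrow P(A_n) \downarrow P(A); \qquad A_n \uparrow A \ (n \to \infty) \text{ in } \mathcal{A} \Rightarrow P(A_n) \uparrow P(A).$$

Proof. The lemma follows from axioms (b) and (c) and the relations

$$A_n = (A_n - A) + A \quad \text{and} \quad (A_n - A) \downarrow \emptyset \text{ in } \mathcal{A}, \qquad \text{if } A_n \downarrow A \text{ in } \mathcal{A};$$
$$A = (A - A_n) + A_n \quad \text{and} \quad (A - A_n) \downarrow \emptyset \text{ in } \mathcal{A}, \qquad \text{if } A_n \uparrow A \text{ in } \mathcal{A}. \quad |$$

Lemma 1.3.2 ($\sigma$-additivity). In order that a mapping $P$ of $\mathcal{A}$ into $[0, 1]$ be a probability, it is necessary and sufficient that:

(a) $0 \le P(A) \le 1$ ($A \in \mathcal{A}$); $P(\Omega) = 1$;
(b') $P\bigl(\sum_I A_i\bigr) = \sum_I P(A_i)$ for every countable family (finite or infinite) $\{A_i, i \in I\}$ of pairwise disjoint events such that $\sum_I A_i \in \mathcal{A}$.

Proof. In fact, if $P$ is a probability on $(\Omega, \mathcal{A})$ and if $\{A_n, n \ge 1\}$ is a countably infinite family of pairwise disjoint events such that $\sum_{n \ge 1} A_n \in \mathcal{A}$, it follows from the convergence $\sum_{m \le n} A_m \uparrow \sum_{m \ge 1} A_m$ in $\mathcal{A}$, when $n \uparrow \infty$, that by virtue of axiom (b) and the preceding lemma we have

$$P\Bigl(\sum_{m \ge 1} A_m\Bigr) = \lim_{n \uparrow \infty} \uparrow P\Bigl(\sum_{m \le n} A_m\Bigr) = \lim_{n \uparrow \infty} \uparrow \sum_{m \le n} P(A_m) = \sum_{m \ge 1} P(A_m).$$

Conversely, to show that axioms (a) and (b') imply (c), let us consider a sequence $\{A_n, n \ge 1\}$ of events which decreases to $\emptyset$; it then follows from the identity $A_n = \sum_{m \ge n} (A_m - A_{m+1})$ that

$$1 \ge P(A_n) = \sum_{m \ge n} P(A_m - A_{m+1}) \downarrow 0 \qquad (n \uparrow \infty). \quad |$$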

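Lemma 1.3.3 ($\sigma$-subadditivity). For every countable family $\{A_i, i \in I\}$ of events such that $\bigcup_I A_i \in \mathcal{A}$ one has

$$P\Bigl(\bigcup_I A_i\Bigr) \le \sum_I P(A_i).$$

More generally, if the countable families $\{A_i, i \in I\}$ and $\{B_i, i \in I\}$ of events are such that $A_i \supset B_i$ ($i \in I$) and $\bigcup_I A_i \in \mathcal{A}$, $\bigcup_I B_i \in \mathcal{A}$, then

$$P\Bigl(\bigcup_I A_i\Bigr) - P\Bigl(\bigcup_I B_i\Bigr) \le \sum_I \bigl[P(A_i) - P(B_i)\bigr].$$

(We leave the proof to the reader.)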
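A small numerical check of both inequalities of Lemma 1.3.3 can be sketched as follows. The example assumes a hypothetical six-point space with the uniform probability, and the families `A` and `B` are chosen only for illustration; exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction

OMEGA = frozenset(range(6))  # hypothetical space, uniform probability


def prob(event):
    """Uniform probability of an event contained in OMEGA."""
    return Fraction(len(event & OMEGA), len(OMEGA))


def union(family):
    """Union of a finite family of events."""
    out = frozenset()
    for a in family:
        out |= a
    return out


A = [frozenset({0, 1, 2}), frozenset({2, 3}), frozenset({3, 4, 5})]
B = [frozenset({0, 1}), frozenset({3}), frozenset({4})]  # B_i contained in A_i

# Plain subadditivity: P(union A_i) <= sum P(A_i)
p_union = prob(union(A))
p_sum = sum(prob(a) for a in A)

# General form: P(union A_i) - P(union B_i) <= sum [P(A_i) - P(B_i)]
lhs = prob(union(A)) - prob(union(B))
rhs = sum(prob(a) - prob(b) for a, b in zip(A, B))
```

Here the events overlap, so both inequalities are strict: the left-hand side of the general form is 1/3 and the right-hand side is 2/3.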

Complements and problems

1.3.1. Metric Boolean algebra. If $\mathcal{A}$ is a Boolean algebra and if $P$ is a set function defined on $\mathcal{A}$, satisfying axioms (a), (b) of this section, the formula $d(A_1, A_2) = P(A_1 \,\Delta\, A_2)$ ($A_1, A_2 \in \mathcal{A}$) defines a function $d$ from $\mathcal{A} \times \mathcal{A}$ into $[0, 1]$ which satisfies the triangle inequality. On the other hand, the relation between events defined by $A_1 \overset{P}{=} A_2$ if $P(A_1 \,\Delta\, A_2) = 0$ is an equivalence relation [$A_1 \overset{P}{=} A_1$ for every $A_1$; $A_1 \overset{P}{=} A_2$ is equivalent to $A_2 \overset{P}{=} A_1$; $A_1 \overset{P}{=} A_2$ and $A_2 \overset{P}{=} A_3$ implies $A_1 \overset{P}{=} A_3$]; moreover, the Boolean algebra structure of $\mathcal{A}$ induces on the quotient […] $\infty$) implies $A_n \overset{d}{\to} A$.
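The distance $d(A_1, A_2) = P(A_1 \,\Delta\, A_2)$ of Problem 1.3.1 is easy to experiment with on a finite space. The sketch below assumes a hypothetical four-point space with the uniform probability and checks the triangle inequality exhaustively; the names `dist` and `TRIANGLE_OK` are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = frozenset(range(4))  # hypothetical space, uniform probability
EVENTS = [frozenset(c) for r in range(len(OMEGA) + 1)
          for c in combinations(sorted(OMEGA), r)]


def dist(a, b):
    """d(A, B) = P(A symmetric-difference B) under the uniform probability."""
    return Fraction(len(a ^ b), len(OMEGA))


# Triangle inequality d(A, C) <= d(A, B) + d(B, C) over all triples of events.
TRIANGLE_OK = all(dist(a, c) <= dist(a, b) + dist(b, c)
                  for a in EVENTS for b in EVENTS for c in EVENTS)
```

With the uniform probability every nonempty event has positive probability, so here $d(A, B) = 0$ only when $A = B$; for a general $P$ one must pass to the quotient by the equivalence relation described in the problem.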

1.4. PROBABILITY SPACES One of the essential results of this chapter is that every probability defined on a Boolean algebra srf of subsets of a set Q has a unique extension to a probability on the cr-algebra generated by

.

In the present section,

we shall introduce various fundamental notions necessary for the proof of this result. Definition 1.4.1.

A Boolean ct-algebra

(or Borel field) of

SUBSETS OF A SET Q IS A CLASS OF SUBSETS OF Q WHICH CONTAINS 0 AND Q AND IS CLOSED

UNDER THE OPERATIONS OF COMPLEMENTATION, COUNT¬

ABLE UNION AND COUNTABLE INTERSECTION.

THE PAIR (Q, sf) CONSISTING

OF A SET Q AND A (BOOLEAN) ct-ALGEBRA S0 OF SUBSETS OF Q IS CALLED A MEASURABLE SPACE.

We remark that every class $\mathcal{A}$ of subsets of $\Omega$ which contains $\emptyset$ and $\Omega$ and is closed under complementation and countable union is already a $\sigma$-algebra. Given a monotone increasing (decreasing) sequence $\{A_n, n \ge 1\}$ of subsets of $\Omega$, we set $\lim_n \uparrow A_n = \bigcup_n A_n$ ($\lim_n \downarrow A_n = \bigcap_n A_n$).

Definition 1.4.2. A class $\mathcal{C}$ of subsets of $\Omega$ is said to be monotone if it is closed under the operations $\lim \uparrow$ and $\lim \downarrow$ (sequential limits).

Proposition 1.4.1. In order that a Boolean algebra $\mathcal{A}$ be a $\sigma$-algebra, it is necessary and sufficient that it also be a monotone class.

Proof. Clearly, a Boolean $\sigma$-algebra is closed under the operations of monotone limits. Conversely, consider a class $\mathcal{C}$ of subsets of $\Omega$ which is closed under finite union; we shall show that $\mathcal{C}$ is closed under countable union if and only if $\mathcal{C}$ is closed under the operation $\lim \uparrow$. To this end, we observe simply that for every sequence $\{A_n, n \ge 1\}$ of subsets of $\Omega$ we have

$$\bigcup_n A_n = \lim_n \uparrow \Big(\bigcup_{m \le n} A_m\Big).$$

∎

Given a class $\mathcal{C}$ of subsets of $\Omega$, the smallest $\sigma$-algebra (= the intersection of all $\sigma$-algebras) containing $\mathcal{C}$ is called the $\sigma$-algebra generated by $\mathcal{C}$. Similarly, the smallest monotone class containing $\mathcal{C}$ is called the monotone class generated by $\mathcal{C}$.

Proposition 1.4.2. The Boolean $\sigma$-algebra generated by a Boolean algebra $\mathcal{A}$ is identical with the monotone class generated by $\mathcal{A}$.

Proof. Let $\mathcal{B}$ be the Boolean $\sigma$-algebra generated by $\mathcal{A}$ and let $\mathcal{M}$ be the monotone class generated by $\mathcal{A}$. By Proposition 1.4.1, $\mathcal{B}$ is a monotone class; hence $\mathcal{M} \subset \mathcal{B}$. It therefore suffices to prove that $\mathcal{M}$ is a Boolean algebra in order to show (Proposition 1.4.1) that $\mathcal{M}$ is a Boolean $\sigma$-algebra and hence that $\mathcal{B} \subset \mathcal{M}$.

To show first that $\mathcal{M}$ is closed under complementation, we must prove that the class $\mathcal{M}' = \{B : B \text{ and } B^c \in \mathcal{M}\}$ coincides with $\mathcal{M}$. But this follows from $\mathcal{A} \subset \mathcal{M}' \subset \mathcal{M}$ and the fact that $\mathcal{M}'$ is a monotone class by virtue of the formulas

$$(\lim \uparrow B_n)^c = \lim \downarrow B_n^c, \qquad (\lim \downarrow B_n)^c = \lim \uparrow B_n^c.$$

We next introduce, for each $A \in \mathcal{M}$, the subclass $\mathcal{M}_A = \{B : B \in \mathcal{M}, AB \in \mathcal{M}\}$ of $\mathcal{M}$. The identity $\lim AB_n = A \lim B_n$, which holds for every monotone sequence $\{B_n, n \ge 1\}$, shows, to begin with, that the $\mathcal{M}_A$ are monotone classes. When $A \in \mathcal{A}$, we verify at once that $\mathcal{A} \subset \mathcal{M}_A \subset \mathcal{M}$; this is possible only if $\mathcal{M}_A = \mathcal{M}$. This implies, by virtue of the equivalence $A \in \mathcal{M}_B \Leftrightarrow B \in \mathcal{M}_A$, that $A \in \mathcal{M}_B$ for every $A \in \mathcal{A}$ and $B \in \mathcal{M}$. Consequently $\mathcal{A} \subset \mathcal{M}_B \subset \mathcal{M}$ for every $B \in \mathcal{M}$, from which we conclude that $\mathcal{M}_B = \mathcal{M}$ for every $B \in \mathcal{M}$. This shows that $\mathcal{M}$ is closed under intersection, and is therefore a Boolean algebra. ∎

Definition 1.4.3. A Boolean $\sigma$-algebra $\mathcal{A}$ of subsets of a set $\Omega$ is said to be of countable type (or separable) if there exists a countable family of subsets of $\Omega$ which generates $\mathcal{A}$.
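The notion of a generated $\sigma$-algebra can be made concrete on a finite set, where closure under complementation and finite union is enough. The following is only an illustrative sketch: the function name `generated_sigma_algebra`, the set `omega` and the generating class are assumptions chosen for the example and do not come from the text.

```python
from itertools import combinations

def generated_sigma_algebra(omega, generators):
    """Smallest class containing the generators, the empty set and omega,
    closed under complement and union. On a finite omega this class is
    the sigma-algebra generated by the generators."""
    omega = frozenset(omega)
    sets = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(sets)
        # Close under complementation.
        for a in current:
            c = omega - a
            if c not in sets:
                sets.add(c)
                changed = True
        # Close under pairwise (hence finite) union.
        for a, b in combinations(current, 2):
            u = a | b
            if u not in sets:
                sets.add(u)
                changed = True
    return sets

# Hypothetical example: two generating sets on a four-point space.
omega = {1, 2, 3, 4}
sigma = generated_sigma_algebra(omega, [{1}, {1, 2}])
```

In this example the generated class consists of the unions of the three cells {1}, {2}, {3, 4}, so it has eight elements; the points 3 and 4 are never separated.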

Most of the $\sigma$-algebras which are considered in the applications are of countable type. This is the case for the $\sigma$-algebra generated by a countable partition $\{A_i, i \in I\}$, which consists of the unions of all subfamilies of the partition. One should be cautioned against supposing that every $\sigma$-algebra of countable type can be generated by a countable partition of the space.

Given an arbitrary sequence $\{A_n, n \ge 1\}$ of subsets of $\Omega$, we define the subsets $\limsup_{n \to \infty} A_n$ and $\liminf_{n \to \infty} A_n$ by the formulas

$$\limsup_{n \to \infty} A_n = \lim_n \downarrow \Big(\sup_{m \ge n} A_m\Big) = \bigcap_n \bigcup_{m \ge n} A_m,$$

$$\liminf_{n \to \infty} A_n = \lim_n \uparrow \Big(\inf_{m \ge n} A_m\Big) = \bigcup_n \bigcap_{m \ge n} A_m.$$

The subset $\limsup_n A_n$ ($\liminf_n A_n$) of $\Omega$ consists of those $\omega \in \Omega$ which are contained in infinitely many (all but a finite number) of the $A_n$ $(n \ge 1)$. Clearly one always has $\liminf_n A_n \subset \limsup_n A_n$. When these two sets are identical, they are denoted by $\lim_n A_n$; in particular, when the sequence $\{A_n\}$ is monotone increasing (decreasing), then $\lim_n \uparrow A_n = \lim_n A_n$ ($\lim_n \downarrow A_n = \lim_n A_n$). Every Boolean $\sigma$-algebra is clearly closed under the (sequential) operations $\limsup$ and $\liminf$.
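As a small illustration of these definitions, consider a periodic sequence of sets. Every set of the period then occurs infinitely often, so $\bigcup_{m \ge n} A_m$ is the union over one period for every $n$, and $\bigcap_{m \ge n} A_m$ is the intersection over one period. The sketch below relies on this observation; the helper name and the sample sets are assumptions made for the example.

```python
from functools import reduce

def limsup_liminf_periodic(period):
    """lim sup and lim inf of a periodic sequence of sets.

    For a periodic sequence the tail unions all equal the union over one
    period, and the tail intersections all equal the intersection over
    one period, so the general formulas reduce to these two operations.
    """
    sets = [frozenset(s) for s in period]
    limsup = reduce(frozenset.union, sets)
    liminf = reduce(frozenset.intersection, sets)
    return limsup, liminf

# Hypothetical sequence alternating between {0, 1} and {1, 2}.
limsup, liminf = limsup_liminf_periodic([{0, 1}, {1, 2}])
```

Here the point 1 belongs to all the sets, while 0 and 2 each belong to infinitely many of them without belonging to all but finitely many; the two limits differ, so this sequence has no limit.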

We also observe that the event $\limsup_n A_n$ ($\liminf_n A_n$) occurs if and only if an infinite number of the events $A_n$ (all of the events $A_n$ with the exception of at most finitely many) occur.

Definition 1.4.4. A probability space $(\Omega, \mathcal{A}, P)$ is defined by the specification of a nonempty set $\Omega$ (the space of trials), a Boolean $\sigma$-algebra $\mathcal{A}$ of subsets of $\Omega$ (events) and a probability $P$ defined on $\mathcal{A}$.

Proposition 1.4.3. (Sequential continuity of a probability.) For every sequence $\{A_n, n \ge 1\}$ of events defined in a probability space $(\Omega, \mathcal{A}, P)$ one has the inequalities

$$P(\liminf_n A_n) \le \liminf_n P(A_n) \le \limsup_n P(A_n) \le P(\limsup_n A_n).$$

In particular, if $\lim_n A_n$ exists, then $\lim_n P(A_n)$ exists and equals $P(\lim_n A_n)$.

Proof. This proposition is a consequence of Lemma 1.3.1, which it generalizes. In fact, by virtue of this lemma we have

$$P(\liminf_n A_n) = \lim_m \uparrow P\Big(\inf_{n \ge m} A_n\Big) \le \liminf_m P(A_m),$$

$$P(\limsup_n A_n) = \lim_m \downarrow P\Big(\sup_{n \ge m} A_n\Big) \ge \limsup_m P(A_m).$$

∎
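The chain of inequalities in Proposition 1.4.3 can be checked by hand on a toy case. The sketch below assumes a uniform probability on a three-point space and a sequence of events alternating between two sets; both choices are illustrative only. For such a periodic sequence, $\liminf_n P(A_n)$ and $\limsup_n P(A_n)$ are the smallest and largest probabilities in the period.

```python
from fractions import Fraction

omega = {0, 1, 2}

def prob(event):
    # Uniform probability on omega (an assumption made for this example).
    return Fraction(len(event), len(omega))

# The sequence A_n alternates between these two events.
period = [frozenset({0, 1}), frozenset({1, 2})]

liminf_set = frozenset.intersection(*period)  # points in all but finitely many A_n
limsup_set = frozenset.union(*period)         # points in infinitely many A_n
values = [prob(a) for a in period]

# P(lim inf A_n), lim inf P(A_n), lim sup P(A_n), P(lim sup A_n)
chain = [prob(liminf_set), min(values), max(values), prob(limsup_set)]
```

The four values come out as 1/3, 2/3, 2/3 and 1, so both outer inequalities are strict here while the two middle terms agree.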

The following result is used very frequently in the calculus of probability:

Proposition 1.4.4. If $\{A_n, n \ge 1\}$ is a sequence of events in $(\Omega, \mathcal{A}, P)$ such that $\sum_n P(A_n) < \infty$, then $P(\limsup_n A_n) = 0$.

Proof. It suffices to let $n \to \infty$ in the following inequality, which is deduced from Lemma 1.3.3: $P(\sup_{m \ge n} A_m) \le \sum_{m \ge n} P(A_m)$.

Definition 1.4.5. A set $N$ in a probability space $(\Omega, \mathcal{A}, P)$ is said to be negligible (for the probability $P$) if there exists a set $A \in \mathcal{A}$ such that $N \subset A$, $P(A) = 0$. The probability space $(\Omega, \mathcal{A}, P)$ is said to be complete if $\mathcal{A}$ contains every $P$-negligible subset of $\Omega$.

It is clear that every subset of a negligible set is again negligible, and it follows from Lemma 1.3.3 that the union of every countable family of negligible sets is again negligible.

Proposition 1.4.5. If $\mathcal{N}$ denotes the class of negligible sets of a probability space $(\Omega, \mathcal{A}, P)$, the class $\bar{\mathcal{A}}$ of sets of the form $A \cup N$, where $A \in \mathcal{A}$ and $N \in \mathcal{N}$, is identical with the $\sigma$-algebra generated by $\mathcal{A}$ and $\mathcal{N}$. Moreover the formula $\bar{P}(A \cup N) = P(A)$ defines (unambiguously) the unique probability $\bar{P}$ on $\bar{\mathcal{A}}$ which extends $P$, and the probability space $(\Omega, \bar{\mathcal{A}}, \bar{P})$ is complete.

The space $(\Omega, \bar{\mathcal{A}}, \bar{P})$ is called the completion of $(\Omega, \mathcal{A}, P)$. (This operation of completion has nothing in common with the operation of completion in the theory of metric spaces and uniform spaces.)

Proof. The class $\bar{\mathcal{A}}$ is (as are the classes $\mathcal{A}$ and $\mathcal{N}$) closed under the operation of countable union. It is closed under complementation as well, for if $N \subset B$ where $P(B) = 0$, then

$$(A \cup N)^c = (A \cup B)^c + B \cap (A \cup N)^c.$$

The class $\bar{\mathcal{A}}$ is thus identical with the $\sigma$-algebra generated by $\mathcal{A}$ and $\mathcal{N}$. Since $A_1 \cup N_1 = A_2 \cup N_2$ implies that $A_1 \triangle A_2 \subset N_1 \cup N_2$ and consequently that $P(A_1 \triangle A_2) = 0$ and thus that $P(A_1) = P(A_2)$, the formula in the proposition defines $\bar{P}$ on $\bar{\mathcal{A}}$ unambiguously. One can then verify immediately that $\bar{P}$ is a probability and that the space $(\Omega, \bar{\mathcal{A}}, \bar{P})$ is complete. ∎

It is interesting to connect this simple result with the properties of the outer and inner probabilities defined on a probability space $(\Omega, \mathcal{A}, P)$.

Definition 1.4.6. The outer probability $P^*$ and inner probability $P_*$ on a probability space $(\Omega, \mathcal{A}, P)$ are defined as the set functions on $\mathcal{P}(\Omega)$:

$$P^*(\Omega_0) = \inf\{P(A);\ \Omega_0 \subset A \in \mathcal{A}\}, \qquad P_*(\Omega_0) = \sup\{P(A);\ \Omega_0 \supset A \in \mathcal{A}\}.$$

These functions have the following obvious properties:

$P_*(\Omega_0) \le P^*(\Omega_0)$ for every $\Omega_0$,

$P_*(A) = P(A) = P^*(A)$ if $A \in \mathcal{A}$,

$P^*(\Omega_0) = 1 - P_*(\Omega_0^c)$ for every $\Omega_0$.

Proposition 1.4.6. Given a probability space $(\Omega, \mathcal{A}, P)$, the $\sigma$-algebra $\bar{\mathcal{A}}$, the completion of $\mathcal{A}$ for $P$ introduced in Proposition 1.4.5, is identical with the class of subsets of $\Omega$ on which $P^*$ and $P_*$ coincide, therefore also with the class $\{\Omega_0 : P^*(\Omega_0) + P^*(\Omega_0^c) = 1\}$ of subsets of $\Omega$. Moreover, $\bar{P} = P^* = P_*$ on $\bar{\mathcal{A}}$.

Proof. We shall begin by proving the following lemma, which makes essential use of the closure of $\mathcal{A}$ under the operations of countable union and intersection.

Lemma. The infimum and supremum in the formulas defining $P^*$ and $P_*$ are attained.

To see this in the case of $P^*$, for example, it suffices to note that if $\Omega_0$ is a subset of $\Omega$ and if $\{A_n\}$ is a sequence in $\mathcal{A}$ such that $\Omega_0 \subset A_n$, $P(A_n) \to P^*(\Omega_0)$, the set $A = \bigcap_n A_n$ belongs to $\mathcal{A}$, contains $\Omega_0$, and is such that $P(A) = P^*(\Omega_0)$.

The preceding lemma thus shows the existence, for each subset $\Omega_0$ of $\Omega$, of two sets $A$ and $A' \in \mathcal{A}$ such that $A \subset \Omega_0 \subset A'$ and

$$P(A) = P_*(\Omega_0), \qquad P(A') = P^*(\Omega_0).$$

Hence if $P_*(\Omega_0) = P^*(\Omega_0)$, the set $A' - A$ is negligible and $\Omega_0 \in \bar{\mathcal{A}}$; moreover $\bar{P}(\Omega_0) = P_*(\Omega_0) = P^*(\Omega_0)$. Conversely, for every set $\bar{A} = A \cup N$ in $\bar{\mathcal{A}}$ we can write

$$P(A) \le P_*(\bar{A}) \le P^*(\bar{A}) \le P(A \cup B),$$

where $B \in \mathcal{A}$ is chosen so that $N \subset B$ and $P(B) = 0$; the preceding inequalities are thus equalities, and the proposition is proved. ∎

The preceding argument also shows that the outer (inner) probabilities defined starting from $(\Omega, \mathcal{A}, P)$ and from $(\Omega, \bar{\mathcal{A}}, \bar{P})$, respectively, are identical.

Given a measurable space $(\Omega, \mathcal{A})$, those subsets of $\Omega$ which, for every probability $P$ on $(\Omega, \mathcal{A})$, belong to the completion of $\mathcal{A}$ for $P$, are called universally measurable. Clearly these sets form a $\sigma$-algebra to which every probability on $(\Omega, \mathcal{A})$ can be extended.
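The outer and inner probabilities, and the characterization of the completion in Proposition 1.4.6, can be computed explicitly when the $\sigma$-algebra is generated by a finite partition. In that case the events are the unions of cells, the smallest event containing a subset is the union of the cells that meet it, and the largest event inside it is the union of the cells it contains. The cells, their probabilities and the function names below are assumptions made for this sketch.

```python
from fractions import Fraction

# Hypothetical finite space: the cells of the partition generating the
# sigma-algebra, with their probabilities. The cell {4, 5} is a null event.
atoms = {
    frozenset({1}): Fraction(1, 2),
    frozenset({2, 3}): Fraction(1, 2),
    frozenset({4, 5}): Fraction(0),
}

def outer(subset):
    """P*: probability of the union of the cells that meet the subset."""
    return sum((p for a, p in atoms.items() if a & subset), Fraction(0))

def inner(subset):
    """P_*: probability of the union of the cells contained in the subset."""
    return sum((p for a, p in atoms.items() if a <= subset), Fraction(0))

def in_completion(subset):
    """A subset belongs to the completion exactly when P* and P_* agree."""
    return outer(subset) == inner(subset)

negligible = frozenset({4})       # subset of a null event
not_measurable = frozenset({2})   # splits a cell of positive probability
completed = frozenset({1, 4})     # an event plus a negligible set
```

The set {4} is negligible and {1, 4} is of the form $A \cup N$, so both belong to the completion, whereas {2} has outer probability 1/2 and inner probability 0.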

Complements and problems

1.4.1. If $\Omega$ is the real line and $A_n$ is the interval $(-\infty, a_n)$ $(n \ge 1)$, what are $\limsup_n A_n$, $\liminf_n A_n$?

1.4.2. Demonstrate and interpret the following identities in terms of events:

$$(\liminf_n A_n)^c = \limsup_n (A_n^c),$$

$$\limsup_n (A_n \cup B_n) = \limsup_n A_n \cup \limsup_n B_n,$$

$$\liminf_n A_n \cap \limsup_n B_n \subset \limsup_n (A_n \cap B_n) \subset \limsup_n A_n \cap \limsup_n B_n.$$

1.4.3. An atom in a probability space $(\Omega, \mathcal{A}, P)$ is defined as (the equivalence class of) a set $A \in \mathcal{A}$ such that $P(A) > 0$ and $A \supset B \in \mathcal{A} \Rightarrow P(B)$ or $P(A - B) = 0$ [compare with the notion of an atom of a Boolean algebra (Proposition 1.2.1) and observe that the notion of an atom is a notion in the metric Boolean algebra $\mathcal{A}/P$ (Problem 1.3.2)]. Since two distinct (i.e., non $P$-equivalent) atoms have as intersection (the $P$-equivalence class of) the empty set, a probability space contains at most $n$ atoms with probabilities $\ge 1/n$ and therefore at most countably many atoms. Every probability space can be decomposed into a countable union of atoms and a "non-atomic" part; if the latter has probability zero, the space is said to be atomic.

If $(\Omega, \mathcal{A}, P)$ is a probability space without atoms, then for every $a \in [0, 1]$ there exists at least one set $A \in \mathcal{A}$ of probability $P(A) = a$. [Let $B_0$ be a maximal element of the subclass $\mathcal{B}$ of $\mathcal{A}/P$ consisting of those $B$ such that $P(B) \le a$, which subclass is inductive under inclusion. Show that $P(B_0) < a$ would imply that the subclass $\mathcal{C}$ of $\mathcal{A}/P$ of those $C$ such that $P(C) > 0$ and $C \cap B_0 = \emptyset$ is inductive for $\supset$; but every maximal element of $\mathcal{C}$ can only be an atom.] Show that for every probability space $(\Omega, \mathcal{A}, P)$ and for every $\varepsilon > 0$ there exists a finite partition of $\Omega$ in $\mathcal{A}$, each of whose elements either has probability $\le \varepsilon$ or is an atom with probability $> \varepsilon$.

1.5

19

EXTENSION OF A PROBABILITY

1.4.4. The trace probability and conditional probability relative to a set. If $(\Omega, \mathcal{A})$ is a measurable space and $\Omega_1$ is a nonempty subset of $\Omega$, show that $\Omega_1 \cap \mathcal{A} = \{\Omega_1 A;\ A \in \mathcal{A}\}$ is a $\sigma$-algebra of subsets of $\Omega_1$ (called the trace of $\mathcal{A}$ on $\Omega_1$). Show that if $\mathcal{C}$ is a class of subsets of $\Omega$ which generates the $\sigma$-algebra $\mathcal{A}$, then $\Omega_1 \cap \mathcal{C}$ generates $\Omega_1 \cap \mathcal{A}$. If $P^*(\Omega_1) = 1$, show that the formula $P_1(\Omega_1 A) = P^*(\Omega_1 A)$ unambiguously defines a probability $P_1$ on $(\Omega_1, \Omega_1 \cap \mathcal{A})$, called the trace of $P$ on $\Omega_1$. Show also that the metric Boolean algebras $\mathcal{A}/P$ and $\Omega_1 \cap \mathcal{A}/P_1$ are isomorphic (use the lemma of Proposition 1.4.6).

For any $B \in \mathcal{A}$ with probability $P(B) > 0$, the formula $P_B(A) = P(AB)/P(B)$ defines a probability on both $(\Omega, \mathcal{A})$ and $(B, B \cap \mathcal{A})$. Show that this result remains valid for a set $B$ not belonging to $\mathcal{A}$, if one sets $P_B(A) = P^*(AB)/P^*(B)$.

1.4.5. $\sigma$-additive class of sets. We apply this name to a class $\mathcal{F}$ of subsets of $\Omega$ such that: (a) $\Omega \in \mathcal{F}$; (b) $F_1, F_2 \in \mathcal{F}$ implies $F_1 + F_2 \in \mathcal{F}$ if $F_1 F_2 = \emptyset$, and $F_1 - F_2 \in \mathcal{F}$ if $F_1 \supset F_2$; (c) $\mathcal{F} \ni F_n \uparrow$ implies $\lim_n F_n \in \mathcal{F}$. Every class of subsets of $\Omega$ is contained in a smallest $\sigma$-additive class, which is said to be "generated by [...]

[...] $(n \to \infty)$ with $A_n \in \mathcal{A}$; the lemma shows in fact that the second member does not depend on the sequence $\{A_n\}$, but only on $G$.

Lemma 1.5.1. If $\{A_m, m \ge 1\}$ and $\{A'_n, n \ge 1\}$ are two increasing sequences in $\mathcal{A}$ such that $\bigcup_m A_m \subset \bigcup_n A'_n$, then $\lim_m \uparrow P(A_m) \le \lim_n \uparrow P(A'_n)$. If, moreover, $\bigcup_m A_m = \bigcup_n A'_n$, then $\lim \uparrow P(A_m) = \lim \uparrow P(A'_n)$.

Proof. Since for every fixed $m$, $\{A_m A'_n, n \ge 1\}$ is an increasing sequence in $\mathcal{A}$ such that $\bigcup_n A_m A'_n = A_m$, we have

$$\lim_n \uparrow P(A'_n) \ge \lim_n \uparrow P(A_m A'_n) = P(A_m),$$

using the sequential continuity of $P$. The assertion of the lemma follows from this by letting $m \to \infty$. ∎

The following proposition establishes the properties of $\mathcal{G}$ and $\Pi$, defined above; it states, in particular, that $\mathcal{G}$ is the smallest class of subsets of $\Omega$ containing $\mathcal{A}$ and closed under the operations of finite intersection and countable union.

Proposition 1.5.1. The class $\mathcal{G}$ and the set function $\Pi$ defined above have the following properties:

(a) $\emptyset \in \mathcal{G}$ and $\Pi(\emptyset) = 0$; $\Omega \in \mathcal{G}$ and $\Pi(\Omega) = 1$; $0 \le \Pi(G) \le 1$ $(G \in \mathcal{G})$;

(b) $G_1, G_2 \in \mathcal{G} \Rightarrow G_1 \cup G_2,\ G_1 \cap G_2 \in \mathcal{G}$; $\Pi(G_1 \cup G_2) + \Pi(G_1 \cap G_2) = \Pi(G_1) + \Pi(G_2)$;

(c) $G_1 \subset G_2$, $G_1, G_2 \in \mathcal{G} \Rightarrow \Pi(G_1) \le \Pi(G_2)$;

(d) $G_n \uparrow G$ $(n \to \infty)$ with $G_n \in \mathcal{G} \Rightarrow G \in \mathcal{G}$, $\Pi(G) = \lim_n \Pi(G_n)$.

Moreover $\mathcal{G}$ is the smallest class of sets in $\Omega$ containing $\mathcal{A}$ and enjoying the preceding properties (relative to $\mathcal{G}$); finally, $\Pi(A) = P(A)$ if $A \in \mathcal{A}$.

Proof.

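The verification of (a) is immediate. Next, let us consider two increasing sequences $\{A_{n,1}, n \ge 1\}$ and $\{A_{n,2}, n \ge 1\}$ in $\mathcal{A}$ with limits $G_1$ and $G_2$ respectively; to show (b), it suffices to let $n \to \infty$ in $P(A_{n,1}) + P(A_{n,2}) = P(A_{n,1}$ [...]

[...] $\varepsilon > 0$ and choose $G_1, G_2 \in \mathcal{G}$ such that $\Omega_i \subset G_i$, $\Pi^*(\Omega_i) + \varepsilon/2 \ge \Pi(G_i)$ for $i = 1, 2$; then

$$\Pi^*(\Omega_1) + \Pi^*(\Omega_2) + \varepsilon \ge \Pi(G_1) + \Pi(G_2) = \Pi(G_1 \cup G_2) + \Pi(G_1 \cap G_2) \ge \Pi^*(\Omega_1 \cup \Omega_2) + \Pi^*(\Omega_1 \cap \Omega_2).$$

Property (c) follows from the monotonicity of $\Pi$. To prove (d), we fix $\varepsilon > 0$ and choose a sequence of $\varepsilon_n > 0$ $(n \ge 1)$ such that $\sum_n \varepsilon_n \le \varepsilon$ and a sequence of $G_n \in \mathcal{G}$ such that $\Omega_n \subset G_n$, $\Pi^*(\Omega_n) + \varepsilon_n \ge \Pi(G_n)$. We then set $G'_n = \bigcup_{m \le n} G_m$, so that $\Omega_n \subset G'_n$ $(n \ge 1)$ and $\{G'_n, n \ge 1\}$ is an increasing sequence in $\mathcal{G}$. We shall next show by induction on $n$ that

$$\Pi^*(\Omega_n) + \sum_{m \le n} \varepsilon_m \ge \Pi(G'_n) \qquad (n \ge 1).$$

This relation is by hypothesis satisfied for $n = 1$; suppose that it holds for $n$. Then by virtue of $\Omega_n \subset G'_n \cap G_{n+1} \in \mathcal{G}$, we have

$$\Pi(G'_{n+1}) = \Pi(G'_n \cup G_{n+1}) = \Pi(G'_n) + \Pi(G_{n+1}) - \Pi(G'_n \cap G_{n+1}) \le \Big[\Pi^*(\Omega_n) + \sum_{m \le n} \varepsilon_m\Big] + [\Pi^*(\Omega_{n+1}) + \varepsilon_{n+1}] - \Pi^*(\Omega_n) = \Pi^*(\Omega_{n+1}) + \sum_{m \le n+1} \varepsilon_m.$$

The preceding relation is thus proved; letting $n \to \infty$ and taking into account that $\Omega_\infty \subset \lim_n \uparrow G'_n \in \mathcal{G}$, we obtain

$$\lim_n \uparrow \Pi^*(\Omega_n) + \varepsilon \ge \Pi(\lim_n \uparrow G'_n) \ge \Pi^*(\Omega_\infty).$$

Since $\varepsilon$ is arbitrary and since, conversely, $\Pi^*(\Omega_n) \le \Pi^*(\Omega_\infty)$ by virtue of (c), property (d) is proved.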


Proof. [...] that $D \in \mathcal{D} \Rightarrow D^c \in \mathcal{D}$, and from property (a) that $\Pi^*(\emptyset) + \Pi^*(\Omega) = 1$, and therefore that $\emptyset, \Omega \in \mathcal{D}$. If $D_1, D_2 \in \mathcal{D}$, the sum of the right sides of the inequalities

$$\Pi^*(D_1 \cup D_2) + \Pi^*(D_1 \cap D_2) \le \Pi^*(D_1) + \Pi^*(D_2),$$

$$\Pi^*[(D_1 \cup D_2)^c] + \Pi^*[(D_1 \cap D_2)^c] \le \Pi^*(D_1^c) + \Pi^*(D_2^c)$$

equals 2. Since property (b) of the proposition implies, on the other hand, that

$$\Pi^*(D_1 \cup D_2) + \Pi^*[(D_1 \cup D_2)^c] \ge 1, \qquad \Pi^*(D_1 \cap D_2) + \Pi^*[(D_1 \cap D_2)^c] \ge 1,$$

we see that the four preceding inequalities are compatible only if they are all equalities. It follows that $D_1 \cup D_2,\ D_1 \cap D_2 \in \mathcal{D}$. The class $\mathcal{D}$ is thus closed under finite union and intersection; the function $\Pi^*$ is additive (in the strong sense) on $\mathcal{D}$.

If $\{D_n, n \ge 1\}$ is an increasing sequence in $\mathcal{D}$, it follows from properties (c) and (d) that

$$\Pi^*\Big(\bigcup_n D_n\Big) = \lim_n \uparrow \Pi^*(D_n), \qquad \Pi^*\Big[\Big(\bigcup_n D_n\Big)^c\Big] \le \Pi^*(D_m^c) \quad \text{if } m \ge 1.$$

This implies that $\Pi^*[\bigcup_n D_n] + \Pi^*[(\bigcup_n D_n)^c] \le 1$ and consequently (by property (b)) that $\bigcup_n D_n \in \mathcal{D}$. The preceding suffices to prove that $\mathcal{D}$ is a $\sigma$-algebra of subsets of $\Omega$ and that the restriction of $\Pi^*$ to $\mathcal{D}$ is a probability. The corollary is proved upon noting that $\Omega_1 \subset D \in \mathcal{D}$, $\Pi^*(D) = 0$ implies that $\Pi^*(\Omega_1) + \Pi^*(\Omega_1^c) \le$ [...]

[...] $\lim_{m,n \to \infty} P(A_m \triangle A_n) = 0$ for every sequence $\{A_n, n \ge 1\}$ in $\mathcal{A}$ and observe that one can extract from every Cauchy sequence $\{A_n, n \ge 1\}$ in $\mathcal{A}/P$ a subsequence $\{A_{n_j}, j \ge 1\}$ such that $\sum_j P(A_{n_j} \triangle A_{n_{j+1}}) < \infty$.] Show how one can derive the extension theorem for a probability from the theorem on metric completion.

1.5.2. Let $(\Omega, \mathcal{A}, P)$ be a probability space. If $\mathcal{B}$ is a Boolean algebra of subsets of $\Omega$ which generates the $\sigma$-algebra $\mathcal{A}$ (possibly up to negligible sets), show that the classes $\mathcal{B}_{\sigma\delta}$ and $\mathcal{B}_{\delta\sigma}$ differ from $\mathcal{A}$ only by negligible sets. (The notation is that of Problem 1.2.1.)

1.5.3. Let $(\Omega, \mathcal{A}, P)$ be a probability space and let $\mathcal{F}$ be a $\sigma$-filter of subsets $F$ of $\Omega$ [Definition: if $F_n \in \mathcal{F}$ $(n \ge 1)$, then $\bigcap_n F_n \in \mathcal{F}$ [...] having outer probability $P^*(F) = 1$. Show that there exists a probability $P'$ on the $\sigma$-algebra generated by $\mathcal{A}$ and $\mathcal{F}$, equal to $P$ on $\mathcal{A}$ and identically 1 on $\mathcal{F}$ [apply Proposition 1.5.2 to the set function $\Pi(\Omega_0) = \inf\{P(A) : A \in \mathcal{A},\ \Omega_0 \subset A \cup F^c \text{ for some } F \in \mathcal{F}$ [...]

*1.5.4. Theorem on capacities. Let $\mathcal{F}$ be a class of subsets of $\Omega$ containing $\emptyset$ which is closed under finite union and countable intersection. A capacity $\psi$ is then a mapping of $\mathcal{P}(\Omega)$ into $R$ such that (a)

£21 C £22 => 0(Fn) j >£(F).

Show that for every set A in the Souslin class 3?s one has: >P(A) - e

where

Gp =

(J

Fv.

v:vi 0

1.6

PROBABILITY SPACES

28

we choose Cn in # so that Cn C An and P(An) ^ P(Cn) + e2 n.

As

= 0-

On Cn c (~]n An = 0, there exists an integer N such that Cn The formula AN = An c (An - C„) and the finite additivity and subadditivity of P now show that

fW

p(an)
lim f E[inf (*;, JQ] = E{X'm) n

(m > 1).

n

Letting $m \to \infty$ proves the lemma.

It follows from the lemma that if $X = \lim_n \uparrow X_n$ $(X_n \in S_+,\ X \in \mathcal{J}_+)$, the expression $\lim_n \uparrow E(X_n)$ depends only on $X$ and not on the sequence $\{X_n, n \ge 1\}$. This justifies the definition of $E(X)$ when $X \in \mathcal{J}_+$, and it clearly follows from the positivity of $E$ that we have thus obtained a positive extension of $E$ from $S_+$ to $\mathcal{J}_+$. Furthermore, the lemma proves property (c) above. To prove properties (b), it is sufficient to observe that if

$$X_1 = \lim_n \uparrow X_{n,1} \quad \text{and} \quad X_2 = \lim_n \uparrow X_{n,2} \qquad (X_{n,i} \in S_+,\ X_i \in \mathcal{J}_+),$$

then

$$cX_1 = \lim_n \uparrow cX_{n,1} \ (c \ge 0), \qquad X_1 + X_2 = \lim_n \uparrow (X_{n,1} + X_{n,2}),$$

$$\sup(X_1, X_2) = \lim_n \uparrow \sup(X_{n,1}, X_{n,2}) \quad \text{and} \quad \inf(X_1, X_2) = \lim_n \uparrow \inf(X_{n,1}, X_{n,2}),$$

and then to apply the definition of $E(\cdot)$.

To prove property (d), we observe that if $X_n = \lim_m \uparrow Y_{m,n}$ $(Y_{m,n} \in S_+;\ n \ge 1)$ increases in $\mathcal{J}_+$ with $n$, then, setting

$$Z_m = \sup_{n \le m} Y_{m,n} \in S_+ \qquad (m \ge 1),$$

we have

$$Y_{m,n} \le Z_m \le X_m \ (m \ge n), \qquad E(Y_{m,n}) \le E(Z_m) \le E(X_m) \ (m \ge n),$$

$$Z_m \le Z_{m+1}, \qquad EZ_m \le EZ_{m+1};$$

hence, letting $m \to \infty$ and then $n \to \infty$, we obtain $\lim_m \uparrow X_m = \lim_m \uparrow Z_m \in \mathcal{J}_+$ and

$$\lim_m \uparrow EX_m = \lim_m \uparrow EZ_m = E(\lim_m \uparrow Z_m).$$

∎

In the case where the space $S$ of the preceding proposition is the space of step r.r.v.'s on a probability space $(\Omega, \mathcal{A}, P)$, Proposition II.2.4 shows that the class $\mathcal{J}_+$ is identical with the class of all positive r.r.v.'s (finite or not) on $(\Omega, \mathcal{A}, P)$. The preceding proposition then establishes the properties of the expectation of positive r.r.v.'s.

A r.r.v. $X$ is said to be integrable if $E(X^+) < \infty$ and $E(X^-) < \infty$; in particular, every bounded r.r.v. and every step r.r.v. is integrable; a positive r.r.v. $X$ is integrable if and only if $E(X) < \infty$. For every integrable r.r.v. $X$, we set $E(X) = E(X^+) - E(X^-)$; we thus obtain an extension of $E(\cdot)$ to all integrable r.r.v.'s which still has the property of linearity, positivity and monotone continuity (see the following proposition).

More generally, a r.r.v. $X$ is said to be quasi-integrable if at least one of the numbers $E(X^+)$ and $E(X^-)$ is finite; this condition is the most general which still permits us to define the expectation $E(X)$ as $E(X) = E(X^+) - E(X^-)$.

Proposition II.3.3. Given a probability space $(\Omega, \mathcal{A}, P)$, the expectation $E(\cdot)$ defined on the set of quasi-integrable r.r.v.'s has the following properties:

(a) $E(X) \in \bar{R}$; $E(X) \in R$ if and only if $X$ is integrable, in which case $P(\{X = \pm\infty\}) = 0$; $E(X) \ge 0$ if $X \ge 0$, or, in fact, if $P(\{X < 0\}) = 0$;

(b) $E(cX) = cE(X)$ for every finite constant $c$; $E(X + Y) = E(X) + E(Y)$ if $X + Y$ is defined and if $X^-$ and $Y^-$ (or $X^+$ and $Y^+$) are integrable;

(c) $X \le Y \Rightarrow E(X) \le E(Y)$;

(d) $X_n \uparrow X \Rightarrow E(X_n) \uparrow E(X)$ if $X_n^-$ is integrable for at least one $n$; $X_n \downarrow X \Rightarrow E(X_n) \downarrow E(X)$ if $X_n^+$ is integrable for at least one $n$.

Proof. Properties (a) and the first of properties (b) follow immediately from the definitions. To prove the additivity of $E(\cdot)$, we note first that if $X_1$ and $X_2$ are two positive r.r.v.'s of which one at least is integrable, and if $X_1 - X_2 \ne \infty - \infty$ at every point of $\Omega$, then the r.r.v. $X = X_1 - X_2$ is quasi-integrable and $E(X) = E(X_1) - E(X_2)$. In fact, we have $X^+ \le X_1$, $X^- \le X_2$ (from which it follows that $X$ is quasi-integrable), and $X^+ + X_2 = X^- + X_1$ (from which it follows that $E(X^+) - E(X^-) = E(X_1) - E(X_2)$). The additivity of $E(\cdot)$ under the conditions (b) above is then a simple consequence of the decomposition $X + Y = (X^+ + Y^+) - (X^- + Y^-)$. The monotonicity of $E(\cdot)$ follows from its linearity and positivity.

Let $\{X_n, n \ge 1\}$ be an increasing sequence of r.r.v.'s such that $X_{n_0}^-$ is integrable for some fixed $n_0$ and let $X = \lim \uparrow X_n$. Then $X^-$ [...]
$$E[\limsup_n X_n] \ge \limsup_n E(X_n),$$

$$X_n \ge Z \Rightarrow E[\liminf_n X_n] \le \liminf_n E(X_n).$$

In particular, if the sequence $\{X_n, n \ge 1\}$ is convergent and if there exists an integrable r.r.v. $U$ such that $|X_n| \le U$ $(n \ge 1)$, then $E(\lim_n X_n) = \lim_n E(X_n)$.

Proof. We note first that if $Y$ is an integrable r.r.v. and if the r.r.v. $X$ is such that $X \le Y$, then $X^+$ is integrable and $X$ is quasi-integrable. The first hypothesis of the corollary thus implies that $(\sup_n X_n)^+$ is integrable; since $\sup_{m \ge n} X_m \downarrow \limsup_n X_n$, we deduce from property (d) of the proposition that

$$\sup_{m \ge n} E(X_m) \le E\Big[\sup_{m \ge n} X_m\Big] \downarrow E[\limsup_n X_n] \qquad (n \to \infty).$$

The second implication of the corollary is proved in a similar way. Hence if $-U \le X_n \le U$ where $U$ is integrable, then

$$E[\liminf_n X_n] \le \liminf_n E(X_n) \le \limsup_n E(X_n) \le E[\limsup_n X_n];$$

if the sequence $\{X_n, n \ge 1\}$ is in addition convergent, this implies that $\lim_n E(X_n)$ exists and equals $E[\lim_n X_n]$. ∎

With every positive r.r.v. $X$ we associate the set function defined on $\mathcal{A}$ by $\int_A X = E[X 1_A]$; this set function (called the indefinite integral of $X$) obviously has the following properties:

(a) $0 \le \int_A X \le E(X)$; $\int_A X = 0 \Leftrightarrow P(A\{X > 0\}) = 0$;

(b) $\int_{\sum_I A_i} X = \sum_I \int_{A_i} X$ for every countable family $\{A_i, i \in I\}$ of pairwise disjoint sets;

(c) $A_1 \subset A_2 \Rightarrow \int_{A_1} X \le \int_{A_2} X$;

(d) $A_n \uparrow A \Rightarrow \int_{A_n} X \uparrow \int_A X$; $A_n \downarrow A \Rightarrow \int_{A_n} X \downarrow \int_A X$ except possibly if $\int_{A_n} X = \infty$ for every $n \ge 1$.

More generally, the set function $\int X$ can be defined for every quasi-integrable r.r.v. $X$; it again has properties analogous to (a)-(d).

Complements and problems

II.3.1. Extension of a probability. Let $(\Omega, \mathcal{A}, P)$ be a complete probability space, $\{B_i, i \in I\}$ a countable partition of $\Omega$, and $\mathcal{B}$ the $\sigma$-algebra generated by $\mathcal{A}$ and $\{B_i, i \in I\}$. By the lemma of Proposition 1.4.6, there exist sets $\bar{B}_i$ $(i \in I)$ in $\mathcal{A}$ such that $B_i \subset \bar{B}_i$, $P^*(B_i) = P(\bar{B}_i)$. Show that every set $B$ in $\mathcal{B}$ can be written in the form $B = \sum_I A_i B_i$ where the $A_i$ are subsets of the $\bar{B}_i$ belonging to $\mathcal{A}$, and are determined by $B$ up to equivalence. Give an analogous representation for the $\mathcal{B}$-measurable r.r.v.'s. Show that the most general probability $\bar{P}$ on $(\Omega, \mathcal{B})$ whose restriction to $\mathcal{A}$ is equal to $P$ is given by

$$\bar{P}(B) = \sum_I \int_{A_i} X_i \, dP,$$

where the $X_i$ are positive r.r.v.'s defined on $(\Omega, \mathcal{A}, P)$, vanishing off the respective $\bar{B}_i$, with $\sum_I X_i = 1$; these r.r.v.'s are determined by $\bar{P}$ up to equivalence. Deduce from this that except for the trivial case where all the $B_i$ belong to $\mathcal{A}$, there exists an infinity of probabilities $\bar{P}$ on $(\Omega, \mathcal{B})$ which extend $P$.

II.4. ALMOST SURE CONVERGENCE AND CONVERGENCE IN PROBABILITY

Two r.r.v.'s $X$ and $X'$ are said to be equal almost surely (or almost everywhere) if $P(X \ne X') = 0$. This relation, which is clearly an equivalence relation, is indicated by $X = X'$ a.s. One can show without difficulty that $X = X'$ a.s. and $Y = Y'$ a.s. implies that $cX = cX'$ a.s., that $X + Y = X' + Y'$ a.s. and that $XY = X'Y'$ a.s. as long as the sums and products are meaningful; in the same way, if $X_i = X'_i$ a.s. for every $i \in I$, where $I$ is a countable index set, then $\sup_I X_i = \sup_I X'_i$ a.s. and $\inf_I X_i = \inf_I X'_i$ a.s. Moreover, if $X$ has an expectation, every r.r.v. $X' = X$ a.s. has an expectation $E(X')$ equal to $E(X)$; in particular $X'$ is integrable if and only if $X$ is.

Given a r.r.v. $X$, we denote its equivalence class by $\tilde{X}$, i.e. $\tilde{X} = \{X' : X' = X \text{ a.s.}\}$; obviously $\tilde{X}$ is determined by any one of its elements. As will be seen in the sequel, most problems of the theory of probability involve equivalence classes of r.r.v.'s rather than r.r.v.'s themselves; the importance of the foregoing elementary properties lies in that they allow one to operate on equivalence classes of r.r.v.'s in the same way as on r.r.v.'s themselves, provided however that one considers only a countable family of r.r.v.'s at one time. In general one identifies (by abuse of language) an equivalence class of r.r.v.'s with an arbitrary one of its representatives; the reader should be warned that this is valid only if one is considering countable families of r.r.v.'s.

If $\{X_i, i \in I\}$ is a countable family of r.r.v.'s and if $\tilde{X}_i$ $(i \in I)$ are their respective equivalence classes, we have already remarked that the equivalence class of $\sup_I X_i$ depends only on the $\tilde{X}_i$ $(i \in I)$ and is therefore the supremum of the $\tilde{X}_i$ $(i \in I)$. We shall show that every family, even uncountable, $\{\tilde{X}_i, i \in I\}$ of equivalence classes of r.r.v.'s has a supremum, denoted by $\operatorname{ess\,sup}_I \tilde{X}_i$; one should note that in the uncountable case the function of $\omega$: $\sup_I X_i(\omega)$ (where $X_i \in \tilde{X}_i$) is not necessarily a r.r.v., and that even if it is measurable, its equivalence class is not necessarily equal to $\operatorname{ess\,sup}_I \tilde{X}_i$ (see the example below).

Proposition II.4.1. The ensemble of equivalence classes of [...] defined on [...]

[...] We then set $U = \sup_{J_0} X_i$. For every r.r.v. $Y$ such that $X_i \le Y$ a.s. $(i \in I)$ we obviously have $U \le Y$ a.s. To show the converse implication, it suffices to show that $X_i \le U$ a.s. for every $i \in I$. But it follows from the maximality property of $J_0$ that for every $i \in I$ we have $E(f[\sup(X_i, U)]) = E[f(U)] = \alpha$; hence $f[\sup(X_i, U)] = f(U)$ a.s., and so $\sup(X_i, U) = U$ a.s. for every $i \in I$. We have proved the existence of $\operatorname{ess\,sup}_I X_i$; its uniqueness up to a.s. equivalence is immediate. The existence and uniqueness of $\operatorname{ess\,inf}_I X_i$ can be proved in the same way.

Example. Let $(\Omega, \mathcal{A}, P)$ be the complete probability space constructed from the Lebesgue measure defined on the interval $[0, 1]$; we denote by $X_r$ $(r \in [0, 1])$ the r.r.v.: $X_r(\omega) = 1$ if $\omega = r$, $= 0$ if $\omega \ne r$. In this case $X_r = 0$ a.s. and $\operatorname{ess\,sup}_r X_r = 0$ a.s. In contrast the supremum of the set $\{X_r, r \in [0, 1]\}$ of functions from $\Omega$ into $R$ is equal to 1. Note the role played by the subsets of $\Omega$ with probability zero in this example and in the proof of the preceding proposition.

If $A \in \mathcal{A}$ has probability zero, two r.r.v.'s $X$ and $X'$ which are equal on $A^c$ are a.s. equal; in other words the restriction of $X$ to $A^c$ already determines the equivalence class $\tilde{X}$ of $X$. A measurable mapping of $A^c$ into $R$ (for example the restriction of $X$ to $A^c$) is called a r.r.v. defined almost everywhere; it is always possible to extend a r.r.v. defined almost everywhere to a r.r.v. on $(\Omega, \mathcal{A})$ [for example by setting it equal to 0 where it is undefined].

The interest in complete probability spaces (Section 1.4) is due to the fact that one can modify a r.r.v. $X$ arbitrarily on a negligible set of such a space, without in the process disturbing the measurability of $X$ (nor by the way its equivalence class). We remark that by completing a probability space one increases the number of r.r.v.'s but does not introduce any new equivalence classes.

Definition II.4.1. A sequence $\{X_n, n \ge 1\}$ of r.r.v.'s is said to converge almost surely (a.s.) if $\limsup_n X_n = \liminf_n X_n$ a.s.

The limit of $\{X_n, n \ge 1\}$ is then, by definition, any one of the r.r.v.'s in the (uniquely determined) equivalence class of $\limsup_n X_n$; we write $\lim \text{a.s.}_{n \to \infty} X_n$ for this equivalence class or any one of its elements.

Cauchy criterion. In order that a sequence $\{X_n, n \ge 1\}$ of a.s. finite r.r.v.'s converge a.s. to an a.s. finite r.r.v., it is necessary and sufficient that it be a Cauchy sequence for a.s. convergence, that is, that $\{X_m - X_n;\ m, n \ge 1\}$ converge a.s. to 0 as $m, n \to \infty$.

This criterion results immediately from the Cauchy criterion for sequences of real numbers, upon observing that the sequence $\{X_n\}$ ($\{X_m - X_n;\ m, n \ge 1\}$) converges a.s. if and only if the sequence $\{X_n(\omega)\}$ ($\{X_m(\omega) - X_n(\omega);\ m, n \ge 1\}$) converges in $R$ for every $\omega$ outside of a set having probability zero.

Proposition II.4.2. In order that a sequence $\{X_n, n \ge 1\}$ of a.s. finite r.r.v.'s converge a.s., it is sufficient that there exist a summable sequence $\{\varepsilon_n, n \ge 1\}$ of positive numbers such that

$$\sum_{n=1}^{\infty} P(|X_{n+1} - X_n| > \varepsilon_n) < \infty;$$

the limit is then a.s. finite.

Proof. We set $A_n = \{|X_{n+1} - X_n| > \varepsilon_n\}$ for every $n \ge 1$. The hypothesis and Proposition 1.4.4 imply that $\limsup_n A_n = \emptyset$ a.s. We can therefore define, outside of the negligible set $\limsup_n A_n$, a r.v. $N$ with positive integer values by setting

$$N(\omega) = n \ \text{ on } \ \bigcup_{m \ge n} A_m - \bigcup_{m > n} A_m, \qquad = 0 \ \text{ on } \ \Big(\bigcup_m A_m\Big)^c.$$

Under these conditions the sequence $\{X_{n+1}(\omega) - X_n(\omega)\}$ is majorized in absolute value, from the $(N(\omega) + 1)$st term on, by the sequence $\{\varepsilon_n\}$, which suffices to show the existence of

$$X(\omega) = \lim_{n \to \infty} X_n(\omega) = X_1(\omega) + \sum_n [X_{n+1}(\omega) - X_n(\omega)]$$

for every $\omega$ [...] $|X(\omega) - X_m(\omega)| \le \sum_{n \ge m} \varepsilon_n$ as long as $N(\omega) < m$ (note that $P(\{N < m\}) \uparrow 1$ as $m \uparrow \infty$).

Definition II.4.2. A sequence $\{X_n, n \ge 1\}$ of a.s. finite r.r.v.'s converges in probability to the a.s. finite r.r.v. $X$ if

$$P(|X_n - X| > \varepsilon) \to 0 \qquad (n \to \infty)$$

for every $\varepsilon > 0$. We then write $X_n \xrightarrow{P} X$.

A sequence {Xn, n ^

Cauchy criterion.

1}

of

finite

a.s.

r.r.v.’s

converges in probability if and only if it is a Cauchy sequence for convergence in probability, that is if Xm — Xn—>0 (m, n -> oo). p

We shall establish this criterion simultaneously with the following result, which gives the connection between a.s. convergence and conver¬ gence in probability. Proposition

which converges same limit. r.r.v.’s

II.4.3.

a.s.

Every sequence {Xn, n ^

to an

a.s.

finite

r.r.v.

1}

of a.s. finite

r.r.v.’s

converges in probability to the

Conversely, from every sequence {Xn, n ^ 1} of

a.s.

finite

which converges in probability one can extract a subsequence which

converges

a.s.

Proof.

to the same limit. Let {Xn, n ^ 1} be a sequence of a.s. finite r.r.v.’s, and let X

be an a.s. finite r.r.v.

Then:

(1) Xn —> X => Xn —j> X, since for every e > 0 lim sup P{{ | Xn — X\ >

e})

n

< P[lim sup {\Xn — X\ >

e}]

n

< P({ — e + lim sup Xn < X < e + lim inf Xfic) — 0. n

(2) Xn-y> X

n

(Xm - Xn) -y> 0 (m, n -> co),

since, for every

e > 0,

{ \ Xm - Xn\ >e)C{\Xm- X\> e/2} (J { | Xn - X\> e/2} and hence P{\xm - Xn\ > as m, n

oo.

e)

^ P(\Xm - X\>

e/2)

+ P(\Xn - X\> e/2) —>0

48

II.4

INTEGRATION OF RANDOM VARIABLES

(3) Xm- Xn-^0 (m, n -> oo) => XRj some r.r.v. X and some subsequence {«,}.

X and

Xn^ X for

In fact, we determine the terms

of this subsequence step by step by setting nx = 1 and taking for nf the smallest integer N > /z*_i such that p[\Xr - Xs\ > j, ) < jf

if

r,s> N.

It follows, then, from 2; P( IXnj+ 1 — Xnj\ > 1/2J) < 2/1/3; < oo that the sequence {Xnpj ^ 1} is a.s. convergent (Proposition II.4.2); moreover if X denotes its limit, it follows from P(\xn - X\ >

e)


e/2) +

P(\Xnf -

x\>

e/2),

letting n and j go to oo, and using the hypothesis and (1), that Xn —p» X. The proof of the Cauchy criterion and the proposition is thus complete.

|
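The converse half of the proposition only promises an a.s. convergent subsequence, and the full sequence can indeed fail to converge at every point. As an illustration (not part of the original text), the following Python sketch uses the classical sliding dyadic intervals on $[0, 1]$ with Lebesgue measure, the same construction that reappears in Problem II.4.2. The function names `dyadic_interval` and `indicator` are chosen here for the example only.

```python
from fractions import Fraction

def dyadic_interval(n):
    """Endpoints of A_n = [q/2^p, (q+1)/2^p], where n = 2^p + q and 0 <= q < 2^p."""
    p = n.bit_length() - 1          # largest p with 2^p <= n
    q = n - (1 << p)
    width = Fraction(1, 1 << p)
    return q * width, (q + 1) * width

def indicator(n, omega):
    """Value of the indicator of A_n at the point omega."""
    left, right = dyadic_interval(n)
    return 1 if left <= omega <= right else 0

# P(A_n) is the interval length 2^-p, which tends to 0,
# so the indicators converge to 0 in probability.
lengths = [dyadic_interval(n)[1] - dyadic_interval(n)[0] for n in (1, 2, 4, 8, 1024)]

# At a fixed point, every dyadic level p produces at least one hit,
# so the indicator equals 1 infinitely often and 0 infinitely often.
omega = Fraction(1, 3)
hits = [n for n in range(1, 64) if indicator(n, omega)]
```

The subsequence $n_j = 2^j$ picks the intervals $[0, 2^{-j}]$, whose indicators converge to 0 at every point except 0, which is consistent with the extraction of an a.s. convergent subsequence.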

Complements and problems

II.4.1. In order that the notions of almost sure convergence and convergence in probability be equivalent on a probability space $(\Omega, \mathcal{A}, P)$, it is necessary and sufficient that the space be atomic.

II.4.2. If $(\Omega, \mathcal{A}, P)$ is the interval $[0, 1]$ of the real line, taken with the $\sigma$-algebra of the Borel sets and the Lebesgue measure, let $\{A_n, n \ge 1\}$ be the sequence of subintervals of $[0, 1]$ of the form

$$A_n = \Big[\frac{q}{2^p}, \frac{q + 1}{2^p}\Big],$$

where $2^p + q = n$ is the (unique) decomposition of $n \ge 1$ such that $p$ and $q$ are integers satisfying $p \ge 0$, $0 \le q < 2^p$. Show that $1_{A_n} \xrightarrow{P} 0$ but that $\limsup_n 1_{A_n} = 1$, $\liminf_n 1_{A_n} = 0$.

II.4.3. The functional

$e(X) = E\Big(\dfrac{|X|}{1 + |X|}\Big)$ on the set $V$ (of equivalence classes) of a.s. finite r.r.v.'s is such that $e(X + Y) \le e(X) + e(Y)$ and $e(cX) \le [\max(1, c)]\,e(X)$. Show that $d(X, Y) = e(X - Y)$ defines a metric on $V$, that the topology of the metric space $(V, d)$ is that of convergence in probability (i.e., that $d(X_n, X) \to 0 \Leftrightarrow X_n \xrightarrow{P} X$) and that the metric space $(V, d)$ is complete.


Show that for a subset $H$ of $V$ to be relatively compact in this topology, it is necessary and sufficient that for every $\varepsilon > 0$ there exist a real constant $C$ and a finite family $\{A_i, i \in I\}$ of measurable sets such that (a) $P(\bigcup_I A_i) > 1 - \varepsilon$; (b) $|X| \le C$ on $\bigcup_I A_i$ for every $X \in H$; (c) $\operatorname{ess\,sup}_{A_i} X - \operatorname{ess\,inf}_{A_i} X < \varepsilon$ for every $X \in H$.

II.4.4. (Egorov's theorem.) If $X_n \to X$ a.s. on $(\Omega, \mathcal{A}, P)$ and $X$ is a.s. finite, then for every $\varepsilon > 0$ there exists a set $A_\varepsilon$ with probability $P(A_\varepsilon) \ge 1 - \varepsilon$ such that $X_n \to X$ uniformly on $A_\varepsilon$. [Take $A_\varepsilon^c = \sup_k \sup_{n \ge n_k} \{|X_n - X| > 1/k\}$ with a suitable choice of $\{n_k\}$.]

II.4.5. Let $(\Omega, \mathcal{A}, P)$ be a probability space and let $(\Omega, \bar{\mathcal{A}}, \bar{P})$ be its completion. Show that there is an identity between: (1) the r.r.v.'s defined on $(\Omega, \bar{\mathcal{A}}, \bar{P})$; (2) the mappings of $\Omega$ into $R$ which are equal, except on a negligible set, to a r.r.v. defined on $(\Omega, \mathcal{A}, P)$.

II.4.6. Let $\{X_n, n \ge 1\}$ be a sequence of real random variables. Show that there exists a smallest (largest) equivalence class $Y'$ ($Y''$) of r.r.v.'s such that for every $\varepsilon > 0$ one has lim P(Y' - X_n
€„]
$$\lim_{a \to \infty} \int_{\{X > a\}} X = \int_{\{X = \infty\}} X = 0.$$

The first part of the proposition now follows from the fact that the inequality $|X_i| \le X$ implies

$$\int_{\{|X_i| > a\}} |X_i| \le \int_{\{X > a\}} X.$$

Every finite family $\{X_i, i \in I\}$ of integrable r.r.v.'s is majorized by the integrable r.r.v. $X = \sum_I |X_i|$, from which follows the second part of the proposition. |

Proposition II.5.2. A family $\{X_i, i \in I\}$ of integrable r.r.v.'s is uniformly integrable if and only if it satisfies the following two conditions:

(a) (uniform absolute continuity) for every $\varepsilon > 0$ there exists an $\eta_\varepsilon > 0$ such that $\sup_I \int_A |X_i| \le \varepsilon$ whenever $P(A) \le \eta_\varepsilon$;

(b) $\sup_I \int_\Omega |X_i| < \infty$.


For every sequence $\{X_n, n \ge 1\}$ of integrable r.r.v.'s the following two conditions are equivalent:

(a) $\{X_n\}$ converges in the mean of order 1 as $n \to \infty$;

(b) $\{X_n\}$ is a Cauchy sequence for convergence in the mean, that is, a sequence such that $E|X_n - X_m| \to 0$ as $m, n \to \infty$.

Proposition II.5.4. For every sequence $\{X_n, n \ge 1\}$ of integrable r.r.v.'s and for every r.r.v. $X$, the following two conditions are equivalent:

(c) $\{X_n, n \ge 1\}$ is uniformly integrable and $X_n \xrightarrow{P} X$ as $n \to \infty$;

(d) $X$ is integrable and $X_n \xrightarrow{L_1} X$ as $n \to \infty$.

Proof. We shall prove that (a) $\Rightarrow$ (b) $\Rightarrow$ (c) $\Rightarrow$ (d) $\Rightarrow$ (a). The necessity of the Cauchy criterion follows from the fact that every sequence $\{X_n, n \ge 1\}$ of integrable r.r.v.'s which converges in the mean to the integrable r.r.v. $X$ is such that

$$E|X_n - X_m| \le E|X_n - X| + E|X_m - X| \to 0 \quad \text{as } m \text{ and } n \to \infty.$$

Next, we use the criterion of Proposition II.5.2 to show that every sequence $\{X_n, n \ge 1\}$ of integrable r.r.v.'s which is a Cauchy sequence for convergence in the mean is necessarily uniformly integrable. Let us first choose, for every $\varepsilon > 0$, an index $N_\varepsilon$ such that $\int |X_m - X_n| \le \varepsilon$ if $m, n \ge N_\varepsilon$. The inequality $\int_A |X_n| \le \int_A |X_m| + \int |X_n - X_m|$ now implies that

$$\sup_n \int_A |X_n| \le \sup_{m \le N_\varepsilon} \int_A |X_m| + \varepsilon$$

for every $A \in \mathcal{A}$. As the finite family $\{X_m, m \le N_\varepsilon\}$ is uniformly integrable (Proposition II.5.1), it follows that $\sup_n \int |X_n| < \infty$ and that $\sup_n \int_A |X_n| \le 2\varepsilon$ as long as $P(A)$ is sufficiently small. We complete the proof of (b) $\Rightarrow$ (c) upon observing that $E|X_m - X_n| \to 0$ implies that

$$P[|X_m - X_n| > \varepsilon] \le \frac{1}{\varepsilon}\,E|X_m - X_n| \to 0 \quad \text{as } m, n \to \infty,$$

and that the sequence $\{X_n\}$ therefore converges in probability to a finite r.r.v. by virtue of the Cauchy criterion for convergence in probability.

Under the hypothesis (c), the r.r.v. $X$ is necessarily integrable. In fact, if $\{n_j\}$ is an increasing sequence of integers such that $X_{n_j} \xrightarrow{\text{a.s.}} X$ (Proposition II.4.3), and therefore such that $|X_{n_j}| \xrightarrow{\text{a.s.}} |X|$, the Fatou-Lebesgue lemma shows that $E|X| \le \liminf_j E|X_{n_j}| \le \sup_n E|X_n| < \infty$ (Proposition II.5.2). Next, under the same hypothesis,

$$\int |X_n - X| \le \int_{\{|X_n - X| \le \varepsilon\}} |X_n - X| + \int_{\{|X_n - X| > \varepsilon\}} |X_n - X| \le \varepsilon + \int_{\{|X_n - X| > \varepsilon\}} |X_n| + \int_{\{|X_n - X| > \varepsilon\}} |X| \to 0$$

as $n \to \infty$ and then $\varepsilon \to 0$, since $P[|X_n - X| > \varepsilon] \to 0$ as $n \to \infty$ and since the sequence $\{X_n\}$ is uniformly absolutely continuous. The proposition is proved, because (d) is equivalent to (a).

|
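A small numerical sketch may help to see why uniform integrability cannot be dropped from condition (c). It is an illustration added here, not taken from the text. The example compares two hypothetical families on $[0, 1]$ with Lebesgue measure: the spikes $X_n = n\,1_{[0, 1/n]}$ and the bounded variables $Y_n = 1_{[0, 1/n]}$. Both converge to 0 in probability, since $P(X_n \ne 0) = P(Y_n \ne 0) = 1/n$, but only the second converges in the mean. The helper names `tail_integral` and `sup_tail` are assumptions of this sketch.

```python
from fractions import Fraction

def tail_integral(c, p, a):
    """Integral of |X| over {|X| > a} when X = c on a set of probability p and 0 elsewhere."""
    return abs(c) * p if abs(c) > a else Fraction(0)

def sup_tail(family, a):
    """Supremum over the family of the tail integrals at level a."""
    return max(tail_integral(c, p, a) for c, p in family)

# The families are truncated at N terms, so only levels a < N are meaningful:
# a finite family is always uniformly integrable (Proposition II.5.1).
N = 1000
spikes = [(Fraction(n), Fraction(1, n)) for n in range(1, N + 1)]   # X_n = n on [0, 1/n]
bounded = [(Fraction(1), Fraction(1, n)) for n in range(1, N + 1)]  # Y_n = 1 on [0, 1/n]

spike_means = {c * p for c, p in spikes}        # E|X_n| for every n
bounded_means = [c * p for c, p in bounded]     # E|Y_n| = 1/n

spike_tails = [sup_tail(spikes, a) for a in (1, 10, 100)]
bounded_tails = [sup_tail(bounded, a) for a in (Fraction(1, 2), 1, 10)]
```

The tail integrals of the spikes stay equal to 1 at every level below the truncation, so the family is not uniformly integrable and $E|X_n| = 1$ does not tend to 0. The bounded family has vanishing tails as soon as $a \ge 1$ and $E|Y_n| = 1/n \to 0$, in agreement with (c) $\Rightarrow$ (d).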

Complements and problems

II.5.1. If $X$, $Y$ are two positive integrable r.r.v.'s and if $Z = \sup(X, Y)$, then

$$\int_{\{Z > a\}} Z \le \int_{\{X > a\}} X + \int_{\{Y > a\}} Y \qquad (a \ge 0).$$

Deduce from this that if the sequence $\{X_n, n \ge 1\}$ of r.r.v.'s is uniformly integrable, then

$$E\Big[\frac{1}{n} \sup_{m \le n} |X_m|\Big] \to 0 \qquad (n \to \infty).$$

II.5.2. In order that the family $\{X_i, i \in I\}$ of r.r.v.'s be uniformly integrable, it suffices that there exist a function $f$ defined on $[0, \infty]$, which is real, positive, measurable and such that $\lim_{x \to +\infty} (1/x) f(x) = \infty$ and $\sup_I E[f(|X_i|)] < \infty$.

Examples: $f(x) = x^p$ for $p > 1$; $f(x) = x(\log x)^+$.

II.5.3. If the two families $\{X_i, i \in I\}$ and $\{Y_j, j \in J\}$ of integrable r.r.v.'s are uniformly absolutely continuous (uniformly integrable), the family $\{X_i + Y_j;\ i \in I, j \in J\}$ is again uniformly absolutely continuous (uniformly integrable).

II.5.4. For every integrable r.r.v. $X$, the set function $\int_{\cdot} X$ is uniformly continuous on the Boolean metric algebra $\mathcal{A}$. In order that the family $\{X_i, i \in I\}$ of integrable r.r.v.'s be uniformly absolutely continuous, it is necessary and sufficient that the family $\{\int_{\cdot} X_i, i \in I\}$ of set functions be equiuniformly continuous on $\mathcal{A}$.

II.5.5. Show that for every atomless probability space $(\Omega, \mathcal{A}, P)$ the uniform integrability of a family of r.r.v.'s is equivalent to its uniform absolute continuity (use the existence for every $\varepsilon > 0$ of a finite partition of the space consisting of sets with probability $< \varepsilon$). Deduce from this that in the general case condition (b) in Proposition II.5.2 can be replaced by the following condition: On every atom $A$ the set of constant values taken by the $X_i$ ($i \in I$) is bounded. Consequently, one can always replace uniform integrability in condition (c) of Proposition II.5.4 by uniform absolute continuity.

II.5.6. Extend the definitions and the results of the last two sections to generalized sequences of r.r.v.'s. A generalized sequence $\{X_\alpha\}$ of integrable r.r.v.'s is said to be uniformly integrable at infinity if for every $\varepsilon > 0$ there exists an index $\alpha_\varepsilon$ and a real number $a_\varepsilon$ such that

$$\int_{\{|X_\alpha| > a_\varepsilon\}} |X_\alpha| < \varepsilon \quad \text{if } \alpha \ge \alpha_\varepsilon.$$

Show that for a sequence of r.r.v.'s, this notion is equivalent to uniform integrability. Establish for uniform integrability at infinity a result analogous to Proposition II.5.2. Show that Proposition II.5.4 generalizes to generalized sequences of r.r.v.'s if we introduce uniform integrability at infinity in condition (c).


II.6. $L_p$ SPACES

Lemma II.6.1. If $\varphi$ is a real continuous and concave function defined on a convex domain $D$ in $R^n$, then E[ X.

|

Corollary. Let $p \in [1, \infty)$ and let $\{X_n, n \ge 1\}$ be a sequence in $L_p$ majorized in absolute value by $Y \in L_p$: $|X_n| \le Y$. For every r.r.v. $X$ the following two conditions are equivalent:

(a) $X_n \xrightarrow{P} X$ as $n \to \infty$.

(b) $X \in L_p$ and $X_n \xrightarrow{L_p} X$ as $n \to \infty$.

A number of earlier results can be brought together in the following form:

Proposition II.6.2. For every $p \in [1, \infty]$, the space $L_p(\Omega, \mathcal{A}, P)$ is a complete normed vector space (a Banach space) and a complete lattice.

Proof. We have already shown that $L_p$ is a normed vector space; the validity of the Cauchy criterion implies that it is a complete space, hence a Banach space. A partially ordered vector space $L$ is said to be a complete lattice if every finite family and every upper bounded (lower bounded) infinite family has a supremum (infimum). By virtue of Proposition II.4.1, it suffices to prove here: (a) that $\sup(X_1, X_2) \in L_p$ if $X_1, X_2 \in L_p$; (b) that if two positive r.r.v.'s $X$ and $Y$ are such that $X \le Y$ and $Y \in L_p$, then $X \in L_p$. But (b) is immediate and (a) follows from it, since

Corollary. The space $L_2(\Omega, \mathcal{A}, P)$ is a Hilbert space for the scalar product $\langle X, Y \rangle = E(XY)$; moreover $|\langle X, Y \rangle| \le \|X\|_2\,\|Y\|_2$, where by definition $\|X\|_2 = \sqrt{\langle X, X \rangle}$.

Complements and problems

II.6.1. For every r.r.v. $X$ the mapping of $[1, \infty]$ into $[0, \infty]$ defined by $p \to \|X\|_p$ is continuous except possibly at some one point $p_0$, at which it is then continuous from the left and such that $\|X\|_p < \infty$ if $p < p_0$, $\|X\|_p = \infty$ if $p > p_0$. Show that on the interval where $\|X\|_p < \infty$ the continuous function $\log \|X\|_p$ is convex in $p$.

II.6.2. If $u$ is a continuous increasing mapping of $[0, \infty]$ onto itself and if $v$ is its inverse, show that $xy \le U(x) + V(y)$ for every $x, y \in [0, \infty]$, where $U(x) = \int_0^x u(z)\,dz$, $V(y) = \int_0^y v(z)\,dz$. It follows that if $X$, $Y$ are two r.r.v.'s

on $(\Omega, \mathcal{A}, P)$, the product $XY$ is integrable whenever the r.r.v.'s $U[|X|]$ and $V[|Y|]$ are [example: $u(x) = x^{p-1}$ where $p > 1$].

II.6.3. Let $E$ be the vector space of equivalence classes of step r.r.v.'s defined on a probability space $(\Omega, \mathcal{A}, P)$. Show that

$$E \subset L_\infty \subset L_q \subset L_p \subset L_1 \qquad (1 < p < q < \infty),$$

and that for $X_n \xrightarrow{L_p} X$ it suffices that $X_n \xrightarrow{L_q} X$ if $q > p$; show that this condition is necessary only if $\mathcal{A}/P$ is finite.

II.6.4. Every positive linear mapping $T$ of a space $L_p(\Omega, \mathcal{A}, P)$ into a space $L_{p'}(\Omega', \mathcal{A}', P')$ is necessarily continuous [arguing by contradiction, show that there exists a constant $C$ such that $\|T(X)\|_{p'} \le C\|X\|_p$ for every positive r.r.v. $X \in L_p$]. If $T$ is a positive linear transformation of a space $L_1(\Omega, \mathcal{A}, P)$ into a space $L_1(\Omega', \mathcal{A}', P')$ such that $T(1) = 1$, show that for every $p \in [1, \infty]$ the restriction of $T$ to $L_p(\Omega, \mathcal{A}, P)$ is a positive linear mapping of $L_p(\Omega, \mathcal{A}, P)$ into $L_p(\Omega', \mathcal{A}', P')$ whose norm is equal to 1.

II.6.5. If the sequence $\{X_n, n \ge 1\}$ of positive integrable r.r.v.'s converges in probability to a positive r.r.v. $X$, show that the condition

$$\int X_n \to \int X \qquad (n \to \infty)$$

is enough to imply that $X_n \xrightarrow{L_1} X$. [Show that $(X - X_n)^+ \xrightarrow{L_1} 0$ as $n \to \infty$.]


II.6.6. The space $L_\infty(\Omega, \mathcal{A}, P)$ is a Banach algebra. Show that the characters of this algebra (that is, the continuous linear functionals $u$ on $L_\infty$ such that $u(XY) = u(X)u(Y)$ for every $X, Y \in L_\infty$) are put in one-to-one correspondence with the maximal filters of $\mathcal{A}/P$ (Problem I.2.3) by the formula $S = \{A : u(1_A) = 1\}$.

II.6.7. Integrability of r.v.'s in a Banach space. Let $(\Omega, \mathcal{A}, P)$ be a probability space and let $E$ be a Banach space. The random variables in $E$ have been defined in Problem II.2.2; show that the norm $\|X(\cdot)\|$ in $E$ of the random variable $X$ is a positive r.r.v. For every step r.v. $X$ in $E$, $X = \sum_I x_i 1_{A_i}$, we define

$$\int X\,dP = \sum_I x_i\,P(A_i)$$

in $E$. Extend this integral by continuity to all the r.v.'s in $E$ for which $\|X(\cdot)\|$ is in $L_1$, and show that $\|\int X\,dP\| \le \int \|X\|\,dP$. Show that the space $L_1(E)$ of equivalence classes of r.v.'s in $E$ whose norms are integrable is a Banach space for the norm $|||X||| = \int \|X\|\,dP$ and that the integral defines a continuous linear operator from $L_1(E)$ into $E$. Define the spaces $L_p(E)$.

II.6.8. Deduce from Proposition II.6.1 and from Problem II.4.3 a necessary and sufficient condition for a subset $H$ in $L_p$ to be relatively compact $(1 \le p < \infty)$.

*II.7. INTEGRATION ON TOPOLOGICAL SPACES

In this section we intend to study the relations which can exist between the measurable structure and the topological structure of a given space $\Omega$. Making use of the results of Sections I.5 and II.3, we start by proving the following fundamental result (Daniell).

Proposition II.7.1. Let $\mathcal{E}$ be a Riesz space (vector lattice) of real functions defined on a set $\Omega$, containing the constant function 1. Let $E$ be a positive linear functional defined on $\mathcal{E}$ such that $E(1) = 1$, and having the following sequential continuity property: $\lim_n \downarrow E(X_n) = 0$ for every sequence $\{X_n\}$ in $\mathcal{E}$ which decreases to 0 on $\Omega$. If $\mathcal{A}$ denotes the smallest $\sigma$-algebra of subsets of $\Omega$ with respect to which all the functions in $\mathcal{E}$ are measurable, there exists a unique probability $P$ on $(\Omega, \mathcal{A})$ such that every function $X$ in $\mathcal{E}$ is integrable and $E(X) = \int X\,dP$.

Proof. We shall adopt the notation of Proposition II.3.2; we denote by $\mathcal{G}$ the class $\{G : 1_G \in \mathcal{J}_+\}$ of subsets of $\Omega$ and set $\Pi(G) = E(1_G)$ on $\mathcal{G}$.


Then Proposition II.3.2 shows that the class $\mathcal{G}$ and the function $\Pi$ satisfy the hypotheses of Proposition I.5.2; this latter proposition and its corollary show that if we introduce the set function

$$\Pi^*(\Omega_1) = \inf\{\Pi(G);\ G \in \mathcal{G},\ G \supset \Omega_1\}$$

on $\mathcal{P}(\Omega)$, then the class $\mathcal{D} = \{D : \Pi^*(D) + \Pi^*(D^c) = 1\}$ is a $\sigma$-algebra and the restriction of $\Pi^*$ to $\mathcal{D}$ is a probability.

For every $Z \in \mathcal{J}_+$ the sets $\{Z > a\}$ belong to $\mathcal{G}$ by virtue of the formula

$$1_{\{Z > a\}} = \lim_n \uparrow \min[1, n(Z - a)^+].$$

This remark and the definition of $\mathcal{G}$ show that the $\sigma$-algebra generated by $\mathcal{G}$ and the smallest $\sigma$-algebra with respect to which the functions in $\mathcal{J}_+$ are measurable, are identical; moreover these $\sigma$-algebras also coincide with the $\sigma$-algebra $\mathcal{A}$ defined in the proposition, since every function in $\mathcal{J}_+$ is obviously $\mathcal{A}$-measurable.

We show next that $\mathcal{D}$ contains $\mathcal{G}$ and therefore also the $\sigma$-algebra $\mathcal{A}$ which $\mathcal{G}$ generates. To this end let us establish first that for every subset $\Omega_1$ of $\Omega$ we have

$$\Pi^*(\Omega_1) = \inf\{\Pi(G);\ G \in \mathcal{G},\ G \supset \Omega_1\} = \inf\{E(Z);\ Z \in \mathcal{J}_+,\ Z \ge 1_{\Omega_1}\}.$$

The first equality being simply a definition and the second term being obviously larger than the third, it suffices to prove that $\Pi^*(\Omega_1) \le E(Z)$ for every $Z \in \mathcal{J}_+$ such that $Z \ge 1_{\Omega_1}$. But for such a $Z$ and every real $a \in (0, 1)$ the set $\{Z > a\}$ belongs to $\mathcal{G}$ and contains $\Omega_1$; thus

$$E(Z) \ge a\,\Pi(\{Z > a\}) \ge a\,\Pi^*(\Omega_1),$$

and to obtain the desired inequality it remains only to let $a$ tend to 1.

Now let $G_0$ be a set in $\mathcal{G}$. Since by definition there exists a sequence $\{X_n\}$ in $\mathcal{E}_+$ which increases to $1_{G_0}$, we have:

$$\Pi(G_0) = \lim_n \uparrow E(X_n)$$

and

$$\Pi^*(G_0^c) = \inf\{E(Z);\ Z \in \mathcal{J}_+,\ Z \ge 1_{G_0^c}\} \le \lim_n \downarrow E(1 - X_n).$$

It follows that $\Pi^*(G_0) + \Pi^*(G_0^c) \le 1$; the strict inequality being impossible, $G_0$ belongs to $\mathcal{D}$. This shows that $\mathcal{G}$ and hence $\mathcal{A}$ is contained in the $\sigma$-algebra $\mathcal{D}$; we denote by $P$ the restriction to $\mathcal{A}$ of the set function $\Pi^*$ ($P$ is a probability).


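From the foregoing, every $Z \in \mathcal{J}_+$ is measurable with respect to the

$\varepsilon > 0$ the sets $\{X_\alpha \ge \varepsilon\}$ form a generalized sequence in $\mathcal{F}$ decreasing to $\emptyset$, the inequality $E(X_\alpha) \le \varepsilon + C\,P'(X_\alpha \ge \varepsilon)$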


implies that $\lim_\alpha \downarrow E(X_\alpha) = 0$ as long as the probability $P'$ satisfies the condition $\lim_\alpha \downarrow P'(F_\alpha) = 0$ for every generalized sequence in $\mathcal{F}$ decreasing to $\emptyset$. Assuming that $P'$ satisfies this condition, we denote by $P$ the regular probability on $(\Omega, \mathcal{B})$ which we have associated with the functional $E$ by the construction at the beginning; we will then have completed the proof of the proposition, upon showing that $P = P'$ on $\mathcal{B}$ if and only if $P'$ is regular on $\mathcal{B}$. To this end, it suffices to prove that if $P'$ is regular, then $\int X\,dP' = \lim_\alpha \uparrow \int X_\alpha\,dP'$ when $X = \lim_\alpha \uparrow X_\alpha$ ($X_\alpha \in C_\infty^+$); this implies, in fact, that the integrals of $P$ and $P'$ coincide on $\mathcal{J}_+$ and therefore that $P = P'$ on $\mathcal{B}$. But if $X = \lim_\alpha \uparrow X_\alpha$ in $\mathcal{J}_+$, then for every $a > 0$, $\{X > a\} = \lim_\alpha \uparrow \{X_\alpha > a\}$ in the class $\mathcal{G}$ of open sets and hence, if $P'$ is regular,

$$P'(X > a) = \lim_\alpha \uparrow P'(X_\alpha > a).$$

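By virtue of the formula

$$Y = \lim_n \uparrow \frac{1}{2^n} \sum_{q=1}^{n 2^n} 1_{\{Y > q/2^n\}}$$

which holds for every positive function $Y$, the foregoing implies that

$$\lim_\alpha \uparrow \int X_\alpha\,dP' = \lim_{\alpha, n} \uparrow 2^{-n} \sum_{q=1}^{n 2^n} P'\Big(X_\alpha > \frac{q}{2^n}\Big) = \lim_n \uparrow 2^{-n} \sum_{q=1}^{n 2^n} P'\Big(X > \frac{q}{2^n}\Big) = \int X\,dP'.$$

|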
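The dyadic formula used at the end of this proof can be checked by hand on a single value. The sketch below is an added illustration, with the hypothetical helper name `dyadic_lower`: it evaluates $2^{-n} \sum_{q=1}^{n 2^n} 1_{\{y > q/2^n\}}$ for one positive number $y$ in exact rational arithmetic and shows the approximations increasing toward $y$.

```python
from fractions import Fraction

def dyadic_lower(y, n):
    """Compute 2^-n * #{q in 1..n*2^n : y > q / 2^n} exactly."""
    count = sum(1 for q in range(1, n * 2**n + 1) if y > Fraction(q, 2**n))
    return Fraction(count, 2**n)

y = Fraction(22, 7)
approximations = [dyadic_lower(y, n) for n in range(1, 7)]

# The approximations never exceed y and never decrease with n.
is_nondecreasing = all(a <= b for a, b in zip(approximations, approximations[1:]))
gap = y - approximations[-1]
```

For small $n$ the cap $q \le n 2^n$ truncates the sum at level $n$, which is why the first values are 1, 2, 3; once $n$ exceeds $y$ the error is at most $2^{-n}$.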

When $\Omega$ is a compact topological space, every positive linear functional on $C(\Omega)$ such that $E(1) = 1$ has the continuity property introduced in the preceding proposition; in fact every generalized sequence of continuous functions on $\Omega$ which decreases to 0 decreases uniformly to 0 (Dini's lemma) and moreover we have $E(X) \le \sup_\Omega |X|$. For such a space the results of this section assume the following simple form:

Proposition II.7.5. Let $\Omega$ be a compact topological space and let $\mathcal{F}$, $\mathcal{A}$ and $\mathcal{B}$ be the classes of closed sets, Baire sets and Borel sets, respectively, in $\Omega$. Let us call a probability $P$ on $(\Omega, \mathcal{B})$ regular if it satisfies one of the following equivalent conditions:


(a) $P(B) = \sup\{P(F);\ F \in \mathcal{F},\ F \subset B\}$ for every $B \in \mathcal{B}$;

(b) for every $F \in \mathcal{F}$ and every $\varepsilon > 0$ there exists an open set $G$ containing $F$ such that $P(G) \le P(F) + \varepsilon$;

(c) for every generalized sequence of closed sets $F_\alpha$ decreasing to $F$, $\lim_\alpha \downarrow P(F_\alpha) = P(F)$;

then the formula $E(X) = \int X\,dP$, where $X \in C(\Omega)$, establishes a one-to-one correspondence between: (1) the positive linear functionals $E$ on $C(\Omega)$ such that $E(1) = 1$; (2) the probabilities on $(\Omega, \mathcal{A})$; (3) the regular probabilities on $(\Omega, \mathcal{B})$.

Proof. By virtue of the earlier results, it remains to show only that conditions (a)-(c) are equivalent for a probability $P$ defined on $(\Omega, \mathcal{B})$. But property (a) is equivalent by passing to complements to the property $P(B) = \inf\{P(G);\ G \text{ open},\ G \supset B\}$; restricting ourselves to closed sets $B$, we obtain property (b). If $P$ satisfies condition (b) and $\{F_\alpha\}$ is a generalized sequence in $\mathcal{F}$ which decreases to $F$, we choose an open set $G \supset F$ such that $P(G) \le P(F) + \varepsilon$ for a given $\varepsilon$; since $\lim_\alpha F_\alpha \cap G^c = \emptyset$, we have $F_{\alpha_0} \cap G^c = \emptyset$ for at least one $\alpha_0$ by virtue of the compactness of $\Omega$, and hence $\lim_\alpha \downarrow P(F_\alpha) \le P(F_{\alpha_0}) \le P(G) \le P(F) + \varepsilon$; thus the probability $P$ satisfies condition (c). Finally, by Proposition II.7.4, every probability $P$ satisfying condition (c) also satisfies (a).

|

Complements and problems

II.7.1. Show that every positive linear functional $E$ defined on the Riesz space $C(\Omega)$ of continuous functions (bounded or not) defined on a topological space $\Omega$, necessarily satisfies the following condition: $\lim_n \downarrow E(X_n) = 0$ if $X_n \downarrow 0$ in $C(\Omega)$; deduce from this that $E(1) > 0$ if $E$ is not to be identically zero. [If $X_n \downarrow 0$, show that $E[(X_n - \varepsilon)^+] \downarrow 0$ for every $\varepsilon > 0$, by noting that the series $\sum_n (X_n - \varepsilon)^+$ defines a function which is continuous on each open set $\{X_m < \varepsilon\}$ and therefore on $\Omega$. Observe next that $E(1) = 0$ implies that $E(X) = E(X - a)^+$ for every $a > 0$ and every $X$.]

II.7.2. If a completely regular topological space $\Omega$ is the countable union of compact subsets (more generally, a Lindelöf space), every positive linear functional on $C_\infty(\Omega)$ such that $\lim_n \downarrow E(X_n) = 0$ for every sequence $\{X_n\}$ in $C_\infty(\Omega)$ which decreases to 0 necessarily has the stronger continuity property: $\lim_\alpha \downarrow E(X_\alpha) = 0$ for every generalized sequence $\{X_\alpha\}$ in $C_\infty(\Omega)$ which decreases to 0. Deduce from this that on such a space, every probability defined on the $\sigma$-algebra of Baire sets has a unique extension to the $\sigma$-algebra of Borel sets which is a regular probability. Carry over this result to the case of an arbitrary completely regular space and a probability whose support is a countable union of compact sets $C_n$.

II.7.3. (Lusin's theorem.) Let $(\Omega, \mathcal{A})$ be a Polish space taken with the $\sigma$-algebra of its Borel sets and let $P$ be a probability on this space. Show that for every r.r.v. $X$ defined on $(\Omega, \mathcal{A}, P)$ and for every $\varepsilon > 0$ there exists a compact set $K_\varepsilon$ of probability $P(K_\varepsilon) \ge 1 - \varepsilon$ such that the restriction of $X$ to $K_\varepsilon$ is a continuous mapping of $K_\varepsilon$ into $R$. [Use Proposition II.7.3 to prove this in the case of a step r.r.v. and Problem II.4.4 to pass to the case where $X$ is a.s. finite.]

CHAPTER III

PRODUCT SPACES AND RANDOM FUNCTIONS

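III.1. THE PRODUCT OF TWO MEASURABLE SPACES

Given two arbitrary sets $\Omega_1$ and $\Omega_2$, we denote by $\Omega_1 \times \Omega_2$ the product of $\Omega_1$ and $\Omega_2$, that is, by definition, the set of all pairs $\omega = (\omega_1, \omega_2)$ obtained when $\omega_1$ runs through $\Omega_1$ and $\omega_2$ runs through $\Omega_2$. The mapping of $\Omega_1 \times \Omega_2$ into $\Omega_i$ ($i = 1, 2$) which takes $\omega = (\omega_1, \omega_2)$ into $\omega_i$ is called the $i$-th coordinate.

If $A$ is an arbitrary subset of $\Omega_1 \times \Omega_2$, we denote by $A_{\omega_1}$ the section of $A$ at $\omega_1$, that is, the subset of $\Omega_2$ defined by $A_{\omega_1} = \{\omega_2 : (\omega_1, \omega_2) \in A\}$. For every fixed $\omega_1$ the mapping $A \to A_{\omega_1}$ of $\mathcal{P}(\Omega_1 \times \Omega_2)$ onto $\mathcal{P}(\Omega_2)$ is a homomorphism for the operations of union, intersection, and complementation; if $\{A_\alpha\}$ is a family in $\mathcal{P}(\Omega_1 \times \Omega_2)$, we in fact have

$$\Big(\bigcup_\alpha A_\alpha\Big)_{\omega_1} = \bigcup_\alpha (A_\alpha)_{\omega_1}, \qquad \Big(\bigcap_\alpha A_\alpha\Big)_{\omega_1} = \bigcap_\alpha (A_\alpha)_{\omega_1}, \qquad (A^c)_{\omega_1} = (A_{\omega_1})^c.$$

If $X$ is an arbitrary mapping of $\Omega_1 \times \Omega_2$ into an arbitrary space, we denote by $X_{\omega_1}$ the section of $X$ at $\omega_1$, that is, the mapping defined on $\Omega_2$ by $X_{\omega_1}(\omega_2) = X(\omega_1, \omega_2)$. We observe, to justify this terminology, that $(1_A)_{\omega_1} = 1_{A_{\omega_1}}$. The transformation $X \to X_{\omega_1}$ ($\omega_1$ fixed) obviously preserves the usual operations on functions, including pointwise convergence.

A rectangle in $\Omega_1 \times \Omega_2$ is a subset of the form

$$A_1 \times A_2 = \{(\omega_1, \omega_2) : \omega_1 \in A_1,\ \omega_2 \in A_2\};$$

a rectangle is empty if and only if one of its sides $A_1$ or $A_2$ is empty. The section of a rectangle is given by: $(A_1 \times A_2)_{\omega_1} = A_2$ or $\emptyset$ according as $\omega_1 \in A_1$ or $\omega_1 \notin A_1$.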
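The behaviour of sections can be checked directly on small finite sets. The following Python sketch is an added illustration: the sets `O1`, `O2`, `A`, `B` and the helper `section` are made-up examples, not objects from the text. It verifies that taking the section at a fixed first coordinate commutes with union, intersection and complementation, and computes the sections of a rectangle.

```python
from itertools import product

def section(A, w1):
    """Section of A at w1: the set {w2 : (w1, w2) in A}."""
    return {w2 for (x, w2) in A if x == w1}

O1, O2 = {0, 1, 2}, {'a', 'b'}
full = set(product(O1, O2))            # the product space O1 x O2

A = {(0, 'a'), (1, 'a'), (1, 'b')}
B = {(1, 'b'), (2, 'a')}

# Sections commute with union, intersection and complementation.
homomorphism_ok = all(
    section(A | B, w) == section(A, w) | section(B, w)
    and section(A & B, w) == section(A, w) & section(B, w)
    and section(full - A, w) == O2 - section(A, w)
    for w in O1
)

# Section of the rectangle {0, 1} x {'a'}: the side {'a'} above its base, empty elsewhere.
rect = set(product({0, 1}, {'a'}))
inside, outside = section(rect, 0), section(rect, 2)
```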

Let $\mathcal{A}_1$ and $\mathcal{A}_2$ be $\sigma$-algebras of subsets of $\Omega_1$ and $\Omega_2$ respectively. A rectangle $A_1 \times A_2$ is said to be measurable (with respect to $\mathcal{A}_1$ and

Corollary 2.

J

Xmi dP\[ajl5 •)
1)Pl(oj1, A2)

(A2

£

sZf)-

For every positive (quasi-integrable) r.r.v. $Z$ defined on $(\Omega_2, \mathcal{A}_2, P_2)$ the function $Y(\omega_1) = \int P_2^1(\omega_1, d\omega_2)\,Z(\omega_2)$ is defined a.e., is $\mathcal{A}_1$-measurable on $\Omega_1$ and is positive (quasi-integrable with respect to $P_1$). Moreover

$$\int Z\,dP_2 = \int P_1(d\omega_1)\,Y(\omega_1).$$

The special case of the preceding proposition obtained by assuming that the transition probability does not depend on the variable $\omega_1$ carries the name Fubini's theorem.

Proposition III.2.2. Let $(\Omega_1, \mathcal{A}_1, P_1)$ and $(\Omega_2, \mathcal{A}_2, P_2)$ be two probability spaces. There exists one and only one probability $P$ on

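$(\Omega_1 \times \Omega_2, \mathcal{A}_1 \otimes \mathcal{A}_2)$

$$\int_{\Omega_1} P_1(d\omega_1) \int_{\Omega_2} P_2(d\omega_2)\,X_{\omega_1}(\omega_2) = \int_{\Omega_2} P_2(d\omega_2) \int_{\Omega_1} P_1(d\omega_1)\,X_{\omega_2}(\omega_1).$$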


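The probability space $(\Omega_1 \times \Omega_2, \mathcal{A}_1 \otimes \mathcal{A}_2, P_1 \times P_2)$ is called the product of $(\Omega_1, \mathcal{A}_1, P_1)$ and $(\Omega_2, \mathcal{A}_2, P_2)$; it is also denoted by $(\Omega_1, \mathcal{A}_1, P_1) \times (\Omega_2, \mathcal{A}_2, P_2)$.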
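On finite spaces the product probability and the interchange of the order of integration reduce to sums, which makes them easy to verify. The sketch below is an added illustration under that finiteness assumption. The two small probability spaces `P1`, `P2` and the function `X` are invented for the example. It builds $P = P_1 \times P_2$ from the rule $P(\{(\omega_1, \omega_2)\}) = P_1(\{\omega_1\}) P_2(\{\omega_2\})$ and compares the double sum with both iterated sums.

```python
from fractions import Fraction
from itertools import product

# Two finite probability spaces, given by the weights of their points.
P1 = {'h': Fraction(1, 3), 't': Fraction(2, 3)}
P2 = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}

# Product probability on the pairs (w1, w2).
P = {(w1, w2): P1[w1] * P2[w2] for w1, w2 in product(P1, P2)}

def X(w1, w2):
    """An arbitrary real function on the product space."""
    return w2 * w2 if w1 == 'h' else -w2

# Integral with respect to P, then the two iterated integrals.
double = sum(P[w] * X(*w) for w in P)
inner_w2_first = sum(P1[w1] * sum(P2[w2] * X(w1, w2) for w2 in P2) for w1 in P1)
inner_w1_first = sum(P2[w2] * sum(P1[w1] * X(w1, w2) for w1 in P1) for w2 in P2)

# Probability of the rectangle {'h'} x {2, 3}.
rectangle = sum(P[(w1, w2)] for w1 in ('h',) for w2 in (2, 3))
```

The three integrals agree, and the rectangle has probability $P_1(\{h\})\,P_2(\{2, 3\}) = \tfrac{1}{3} \cdot \tfrac{1}{2}$.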

Proof. It suffices to apply Proposition III.2.1 to the probability $P_1$ and the transition probability [P2(