Radically Elementary Probability Theory (AM-117)
ISBN 9781400882144

Table of contents :
Table of contents
Preface
Acknowledgments
1. Random variables
2. Algebras of random variables
3. Stochastic processes
4. External concepts
5. Infinitesimals
6. External analogues of internal notions
7. Properties that hold almost everywhere
8. L1 random variables
9. The decomposition of a stochastic process
10. The total variation of a process
11. Convergence of martingales
12. Fluctuations of martingales
13. Discontinuities of martingales
14. The Lindeberg condition
15. The maximum of a martingale
16. The law of large numbers
17. Nearly equivalent stochastic processes
18. The de Moivre-Laplace-Lindeberg-Feller-Wiener-Lévy-Doob-Erdös-Kac-Donsker-Prokhorov theorem
Appendix
Index


Annals of Mathematics Studies Number 117

RADICALLY ELEMENTARY PROBABILITY THEORY

BY

EDWARD NELSON

PRINCETON UNIVERSITY PRESS

PRINCETON, NEW JERSEY

1987

Copyright © 1987 by Princeton University Press. All Rights Reserved. The Annals of Mathematics Studies are edited by Luis A. Caffarelli, John N. Mather, and Elias M. Stein.

Library of Congress Cataloging-in-Publication Data

Nelson, Edward, 1932-
Radically elementary probability theory.
(Annals of mathematics studies; no. 117)
Includes index.
1. Martingales. 2. Stochastic processes. 3. Probabilities. I. Title. II. Series.
QA274.5.N45 1987 519.2 87-3160
ISBN 0-691-08473-4
ISBN 0-691-08474-2 (pbk.)

Princeton University Press books are printed on acid-free paper and meet the guidelines for permanence and durability of the Committee on Production Guidelines for Book Longevity of the Council on Library Resources.

http://pup.princeton.edu

Printed in the United States of America



Preface

More than any other branch of mathematics, probability theory has developed in conjunction with its applications. This was true in the beginning, when Pascal and Fermat tackled the problem of finding a fair way to divide the stakes in a game of chance, and it continues to be true today, when the most exciting work in probability theory is being done by physicists working on statistical mechanics.

The foundations of probability theory were laid just over fifty years ago, by Kolmogorov. I am sure that many other probabilists teaching a beginning graduate course have also had the feeling that these measure-theoretic foundations serve more to salve our mathematical consciences than to provide an incisive tool for the scientist who wishes to apply probability theory.

This work is an attempt to lay new foundations for probability theory, using a tiny bit of nonstandard analysis. The mathematical background required is little more than that which is taught in high school, and it is my hope that it will make deep results from the modern theory of stochastic processes readily available to anyone who can add, multiply, and reason.

What makes this possible is the decision to leave the results in nonstandard form. Nonstandard analysts have a new way of thinking about mathematics, and if it is not translated back into conventional terms then it is seen to be remarkably elementary.

Mathematicians are quite rightly conservative and suspicious of new ideas. They will ask whether the results developed here are as powerful as the conventional results, and whether it is worth their while to learn nonstandard methods. These questions are addressed in an appendix, which assumes a much greater level of mathematical knowledge than does the main text. But I want to emphasize that the main text stands on its own.

Acknowledgments

I am grateful to Eric Carlen, Mai Gehrke, Klaus Kaiser, and Brian White for helpful comments, and to Pierre Cartier for a critical reading of the manuscript and the correction of many errors, together with helpful suggestions. This work was partially supported by the National Science Foundation. I thank Morgan Phillips for the six illustrations that LaTeX did not draw.

Radically Elementary Probability Theory

Chapter 1. Random variables

Here are some of the basic definitions and inequalities of probability theory, in the context of a finite probability space. A finite probability space is a finite set $\Omega$ and a strictly positive function $\mathrm{pr}$ on $\Omega$ such that $\sum_\omega \mathrm{pr}(\omega) = 1$. Then a random variable on $\Omega$ is a function $x\colon \Omega \to \mathbf{R}$, where $\mathbf{R}$ is the real numbers. The expectation or mean of a random variable $x$ is
$$Ex = \sum_\omega x(\omega)\,\mathrm{pr}(\omega).$$
An event is a subset $A$ of $\Omega$, and the probability of an event $A$ is
$$\Pr A = \sum_{\omega \in A} \mathrm{pr}(\omega).$$
If $A$ is an event, we define a random variable $\chi_A$, called its indicator function, by $\chi_A(\omega) = 1$ if $\omega \in A$ and $\chi_A(\omega) = 0$ if $\omega \notin A$. Then $\Pr A = E\chi_A$. Also, we define $A^c$ to be the complementary event $A^c = \Omega \setminus A$ of all $\omega$ in $\Omega$ that are not in $A$.

The set $\mathbf{R}^\Omega$ of all random variables on $\Omega$ is an $n$-dimensional vector space, where $n$ is the number of points in $\Omega$. Consider the expression $Exy$, where $x$ and $y$ are any two random variables. Then $Exy = Eyx$, $Exy$ is linear in $x$ and $y$, and $Exx > 0$ unless $x = 0$. Thus $Exy$ has all of the properties of the inner product on $n$-dimensional Euclidean space. The Euclidean norm $\sqrt{Ex^2}$ of the random variable $x$ is denoted by $\|x\|_2$.

The expectation is a linear function on $\mathbf{R}^\Omega$, so the random variables of mean 0 form a hyperplane. The orthogonal complement of this hyperplane is the one-dimensional subspace of constant random variables. We identify the constant random variable whose value is $\lambda$ with the number $\lambda$. With this identification, $x \mapsto Ex$ is the orthogonal projection onto the constant random variables and $x \mapsto x - Ex$ is the orthogonal projection onto the


random variables of mean 0.

We call $\mathrm{Var}\,x = E(x - Ex)^2$ the variance of $x$, $\sqrt{\mathrm{Var}\,x}$ the standard deviation of $x$, $E(x - Ex)(y - Ey)$ the covariance of $x$ and $y$, and
$$\frac{E(x - Ex)(y - Ey)}{\sqrt{\mathrm{Var}\,x}\,\sqrt{\mathrm{Var}\,y}}$$
the correlation coefficient of $x$ and $y$. Thus if $x$ and $y$ have mean 0, the variance of $x$ is the square $\|x\|_2^2$ of its Euclidean norm, the standard deviation of $x$ is its Euclidean norm $\|x\|_2$, the covariance of $x$ and $y$ is their inner product, and the correlation coefficient of $x$ and $y$ is the cosine of the angle between them.

Other norms on random variables are frequently useful. For $1 \le p < \infty$ let $\|x\|_p = (E|x|^p)^{1/p}$, and let $\|x\|_\infty = \max_\omega |x(\omega)|$. Clearly $\|x\|_p \le \|x\|_\infty$, and if $\omega_0$ is a point at which $|x|$ attains its maximum, then
$$\|x\|_p \ge \bigl(|x(\omega_0)|^p\,\mathrm{pr}(\omega_0)\bigr)^{1/p} \to \|x\|_\infty$$
as $p \to \infty$, so that $\|x\|_p \to \|x\|_\infty$ as $p \to \infty$. For $1 < p < \infty$, let $p'$, called the conjugate exponent to $p$, be defined by $p' = p/(p-1)$, so that
$$\frac{1}{p} + \frac{1}{p'} = 1.$$
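Because the probability space is finite, all of these quantities are directly computable. The following is a minimal sketch of my own (not from the book, with hypothetical names) that realizes a finite probability space as a dictionary of weights and implements the expectation, variance, and $p$-norms just defined:

```python
# Sketch only: a finite probability space (Omega, pr) as a dict of strictly
# positive weights summing to 1, and a random variable x as a dict on Omega.
from math import inf

pr = {"a": 0.25, "b": 0.25, "c": 0.5}
x = {"a": 1.0, "b": -2.0, "c": 3.0}

def E(x, pr):
    # Ex = sum over omega of x(omega) pr(omega)
    return sum(x[w] * pr[w] for w in pr)

def var(x, pr):
    # Var x = E(x - Ex)^2
    m = E(x, pr)
    return E({w: (x[w] - m) ** 2 for w in pr}, pr)

def norm(x, pr, p):
    # ||x||_p = (E|x|^p)^(1/p) for finite p; ||x||_oo = max |x(omega)|
    if p == inf:
        return max(abs(v) for v in x.values())
    return E({w: abs(x[w]) ** p for w in pr}, pr) ** (1.0 / p)

# ||x||_p is increasing in p and tends to ||x||_oo, as the text shows:
for p in (1, 2, 4, 16, 64, inf):
    print(p, round(norm(x, pr, p), 6))
```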

Hölder's inequality asserts that
$$|Exy| \le \|x\|_p \|y\|_{p'}. \tag{1.1}$$
If $x$ or $y$ is 0, this is trivially true. Otherwise, we can assume that $\|x\|_p = \|y\|_{p'} = 1$ after replacing $x$ by $x/\|x\|_p$ and $y$ by $y/\|y\|_{p'}$. Then $E|x|^p = E|y|^{p'} = 1$ and we want to show that $|Exy| \le 1$. Since $|Exy| \le E|xy|$, this will follow if we can show that $|xy|$ is less than a convex combination of $|x|^p$ and $|y|^{p'}$. Taking the obvious convex combination, we need only show that
$$|xy| \le \frac{1}{p}|x|^p + \frac{1}{p'}|y|^{p'}.$$
To see this, take logarithms. By the concavity of the logarithm function, the logarithm of the right hand side is greater than
$$\frac{1}{p}\log|x|^p + \frac{1}{p'}\log|y|^{p'} = \log|xy|,$$
which concludes the proof of Hölder's inequality.

Keeping the normalizations $\|x\|_p = \|y\|_{p'} = 1$, we see that we have strict inequality in (1.1) unless $|Exy| = E|xy|$ (that is, unless $x$ and $y$ have the same sign) and unless $|x|^p = |y|^{p'}$; but if $x = \mathrm{sgn}\,y\,|y|^{p'/p}$, then $Exy = 1$. Consequently, for all random variables $x$ we have
$$\|x\|_p = \max_{\|y\|_{p'}=1} |Exy|. \tag{1.2}$$
An immediate consequence of (1.2) is Minkowski's inequality
$$\|x+y\|_p \le \|x\|_p + \|y\|_p. \tag{1.3}$$
If we let $1' = \infty$ and $\infty' = 1$, then (1.1), (1.2), and (1.3) hold for all $p$ with $1 \le p \le \infty$.

Let $f$ be a convex function. By definition this means that $f\bigl(\sum x(\omega)\,\mathrm{pr}(\omega)\bigr) \le \sum f(x(\omega))\,\mathrm{pr}(\omega)$; that is,
$$f(Ex) \le Ef(x), \tag{1.4}$$
which is Jensen's inequality. If we apply this to the convex function $f(x) = |x|^p$, where $1 \le p < \infty$, we obtain $|Ex|^p \le E|x|^p$. Applied to $|x|$ this gives $\|x\|_1 \le \|x\|_p$, and applied to $|x|^r$, where $1 \le r < \infty$, this gives $\|x\|_r \le \|x\|_{rp}$, so that $\|x\|_p$ is an increasing function of $p$ for $1 \le p \le \infty$.

Let $f$ be a positive function. Then, for $\lambda > 0$,
$$Ef(x) = \sum_{\omega} f(x(\omega))\,\mathrm{pr}(\omega) \ge \sum_{\omega \in \{f(x) \ge \lambda\}} f(x(\omega))\,\mathrm{pr}(\omega) \ge \lambda \Pr\{f(x) \ge \lambda\},$$
so that
$$\Pr\{f(x) \ge \lambda\} \le \frac{Ef(x)}{\lambda}. \tag{1.5}$$
(Here $\{f(x) \ge \lambda\}$ is an abbreviation for $\{\omega \in \Omega : f(x(\omega)) \ge \lambda\}$; such abbreviations are customary in probability theory.) In particular, for $\lambda > 0$ and $p > 0$ we have $\{|x| \ge \lambda\} = \{|x|^p \ge \lambda^p\}$, and so by (1.5) we have
$$\Pr\{|x| \ge \lambda\} \le \frac{E|x|^p}{\lambda^p}. \tag{1.6}$$
This is Chebyshev's inequality.
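As a quick sanity check, inequalities (1.1) and (1.6) can be verified numerically on a randomly generated finite probability space. This is my own illustration, not part of the book:

```python
# Sketch only: numeric check of Hölder (1.1) and Chebyshev (1.6).
import random

random.seed(0)
n = 6
w = [random.random() for _ in range(n)]
pr = [wi / sum(w) for wi in w]                 # strictly positive, sums to 1
x = [random.uniform(-2, 2) for _ in range(n)]
y = [random.uniform(-2, 2) for _ in range(n)]

E = lambda z: sum(zi * pi for zi, pi in zip(z, pr))
norm = lambda z, p: E([abs(zi) ** p for zi in z]) ** (1 / p)

p = 3.0
q = p / (p - 1)                                # conjugate exponent p'
# Hölder (1.1): |Exy| <= ||x||_p ||y||_p'
lhs = abs(E([xi * yi for xi, yi in zip(x, y)]))
rhs = norm(x, p) * norm(y, q)
assert lhs <= rhs + 1e-12, (lhs, rhs)

# Chebyshev (1.6): Pr{|x| >= lam} <= E|x|^p / lam^p
lam = 1.0
prob = sum(pi for xi, pi in zip(x, pr) if abs(xi) >= lam)
assert prob <= E([abs(xi) ** p for xi in x]) / lam ** p + 1e-12
print("Hölder:", lhs, "<=", rhs, "; Chebyshev ok")
```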

Chapter 2. Algebras of random variables

The set $\mathbf{R}^\Omega$ of all random variables on $\Omega$ is not only a vector space, it is an algebra. By an algebra $\mathcal{A}$ of random variables we will always mean a subalgebra of $\mathbf{R}^\Omega$ containing the constants; that is, $\mathcal{A}$ is a set of random variables containing the constants and such that whenever $x$ and $y$ are in $\mathcal{A}$, then $x + y$ and $xy$ are in $\mathcal{A}$.

The structure of an algebra $\mathcal{A}$ is very simple. By an atom of $\mathcal{A}$ we mean a maximal event $A$ such that each random variable in $\mathcal{A}$ is constant on $A$. Thus $\Omega$ is partitioned into atoms; that is, $\Omega$ is the union of the atoms and different atoms are disjoint. If $A$ is an atom and $\omega \notin A$, then by definition there is an $x$ in $\mathcal{A}$ such that $x(A) \ne x(\omega)$, where $x(A)$ denotes the constant value of $x$ on $A$. Let $x_\omega = (x - x(\omega))^2/(x(A) - x(\omega))^2$. [...]


The expectation preserves constants; that is, $E\lambda = \lambda$ if $\lambda$ is a constant. The conditional expectation preserves elements of $\mathcal{A}$; that is, $E_{\mathcal{A}}y = y$ if $y$ is in $\mathcal{A}$. The expectation is the orthogonal projection onto the constants; that is, $E(x - Ex)^2 \le E(x - \lambda)^2$ for all constants $\lambda$. By relativization, $E_{\mathcal{A}}(x - E_{\mathcal{A}}x)^2 \le E_{\mathcal{A}}(x - y)^2$ for all $y$ in $\mathcal{A}$. Notice that $E_{\mathcal{B}}E_{\mathcal{A}} = E_{\mathcal{B}}$ if $\mathcal{B} \subset \mathcal{A}$. In particular, $EE_{\mathcal{A}} = E$ since $E$ is the conditional expectation with respect to the trivial algebra consisting of the constants. Therefore $E(x - E_{\mathcal{A}}x)^2 \le E(x - y)^2$ for all $y$ in $\mathcal{A}$, so that $E_{\mathcal{A}}$ is the orthogonal projection onto $\mathcal{A}$.

Another notation for $E_{\mathcal{A}}x$ is $E\{x|\mathcal{A}\}$, and we use $E\{x|x_1,\dots,x_n\}$ for the conditional expectation of $x$ with respect to the algebra generated by $x_1,\dots,x_n$. In the example of Fig. 2.1, $E\{x|z\} = E\{y|z\}$ by symmetry, so that $E\{x|z\} = \frac{1}{2}E\{z|z\} = \frac{1}{2}z$.

We define the conditional probability $\Pr_{\mathcal{A}}B$ of an event $B$ with respect to the algebra $\mathcal{A}$, by relativization, as $\Pr_{\mathcal{A}}B = E_{\mathcal{A}}\chi_B$. Thus
$$\Pr{}_{\mathcal{A}}B(\omega) = \frac{\Pr(B \cap A_\omega)}{\Pr A_\omega},$$
where $A_\omega$ is the atom of $\mathcal{A}$ containing $\omega$. Note that this is a random variable, not (in general) a constant.

The relativization of Hölder's inequality is $|E_{\mathcal{A}}xy| \le (E_{\mathcal{A}}|x|^p)^{1/p}(E_{\mathcal{A}}|y|^{p'})^{1/p'}$, the relativization of Jensen's inequality is $f(E_{\mathcal{A}}x) \le E_{\mathcal{A}}f(x)$ for $f$ convex, and the relativization of Chebyshev's inequality is
$$\Pr{}_{\mathcal{A}}\{f(x) \ge y\} \le \frac{E_{\mathcal{A}}f(x)}{y}$$
for $f$ positive and $y > 0$ in $\mathcal{A}$. From Jensen's inequality we have $|E_{\mathcal{A}}x|^p \le E_{\mathcal{A}}|x|^p$, and since $EE_{\mathcal{A}} = E$, this gives $\|E_{\mathcal{A}}x\|_p \le \|x\|_p$, valid for $1 \le p \le \infty$.

Let $\mathcal{A}$ be an algebra of random variables. We denote the set of all atoms of $\mathcal{A}$ by $\mathrm{at}(\mathcal{A})$. Not only is each element $A$ of $\mathrm{at}(\mathcal{A})$ a finite probability space with respect to $\mathrm{pr}_A$, but $\mathrm{at}(\mathcal{A})$ is itself a finite probability space with respect to
$$\mathrm{pr}_{\mathrm{at}(\mathcal{A})}(A) = \Pr A$$


for $A$ in $\mathrm{at}(\mathcal{A})$. We say that the original probability space $(\Omega, \mathrm{pr})$ is fibered over $(\mathrm{at}(\mathcal{A}), \mathrm{pr}_{\mathrm{at}(\mathcal{A})})$, with fibers $(A, \mathrm{pr}_A)$. In the example of Fig. 2.1, this can be visualized by rotating the figure 45° clockwise. Expectations with respect to $\mathrm{pr}_{\mathrm{at}(\mathcal{A})}$ are denoted by $E_{\mathrm{at}(\mathcal{A})}$, and the probability of a set of atoms is denoted by $\Pr_{\mathrm{at}(\mathcal{A})}$. Notice that $E_{\mathrm{at}(\mathcal{A})}E_{\mathcal{A}}x = Ex$.

A special case of a fibering is a product. Suppose that $(\Omega_1, \mathrm{pr}_1)$ and $(\Omega_2, \mathrm{pr}_2)$ are finite probability spaces. Then $\Omega_1 \times \Omega_2$ is a finite probability space with respect to $\mathrm{pr}_1 \times \mathrm{pr}_2$, where $\mathrm{pr}_1 \times \mathrm{pr}_2(\langle\omega_1,\omega_2\rangle) = \mathrm{pr}_1(\omega_1)\,\mathrm{pr}_2(\omega_2)$. Let $\mathcal{A}_1$ be the algebra of all random variables that are functions of $\omega_1$ alone. Then $\mathrm{at}(\mathcal{A}_1)$ consists of all sets of the form $\{\langle\omega_1,\omega_2\rangle : \omega_1 = \eta_1\}$, where $\eta_1$ is any element of $\Omega_1$.
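The atoms of the algebra generated by a random variable $z$ are just the level sets of $z$, and $E_{\mathcal{A}}x$ averages $x$ over each atom with the renormalized weights. The following sketch is my own (the two-coin example is meant to echo, not reproduce, Fig. 2.1, which is not shown in this text):

```python
# Sketch only: atoms of the algebra generated by z, and E{x|z} by averaging.
Omega = ["hh", "ht", "th", "tt"]
pr = {w: 0.25 for w in Omega}
x = {w: (1 if w[0] == "h" else 0) for w in Omega}   # first toss
z = {w: w.count("h") for w in Omega}                # number of heads

def atoms(z):
    # Level sets of z partition Omega into the atoms of its algebra.
    parts = {}
    for w in z:
        parts.setdefault(z[w], []).append(w)
    return list(parts.values())

def cond_E(x, z):
    # E{x|z}: on each atom A, the average sum_A x(w) pr(w) / Pr A.
    out = {}
    for A in atoms(z):
        PrA = sum(pr[w] for w in A)
        val = sum(x[w] * pr[w] for w in A) / PrA
        for w in A:
            out[w] = val
    return out

Ex_z = cond_E(x, z)
# E E_A x = E x: conditional expectation preserves the mean.
assert abs(sum(Ex_z[w] * pr[w] for w in Omega)
           - sum(x[w] * pr[w] for w in Omega)) < 1e-12
print(Ex_z)   # on the atom {ht, th} (one head) the value is 1/2
```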

Chapter 3. Stochastic processes

The word "stochastic" means random, and "process" in this context means function, so that a stochastic process is a function whose values are random variables. Let $T$ be a finite set and let $(\Omega, \mathrm{pr})$ be a finite probability space. A stochastic process indexed by $T$ and defined over $(\Omega, \mathrm{pr})$ is a function $\xi\colon T \to \mathbf{R}^\Omega$. By a "stochastic process" we will always mean one that is indexed by a finite set and defined over a finite probability space. We write $\xi(t,\omega)$ for the value of $\xi(t)$ at $\omega$, and we write $\xi(\cdot,\omega)$ for the function $t \mapsto \xi(t,\omega)$. Thus each $\xi(t)$ is a random variable, each $\xi(t,\omega)$ is a real number, and each $\xi(\cdot,\omega)$ is a function from $T$ to $\mathbf{R}$. [...]

Chapter 4. External concepts

[...] If $A(n)$ is a formula of conventional mathematics, then we can form the subset $S = \{n \in \mathbf{N} : A(n)\}$ of all natural numbers $n$ for which $A(n)$ holds. However, the formula must be a formula of the agreed-upon language for mathematics. Sets are not objects in the real world; they are formal mathematical objects and only exist when the formal rules of mathematics say they exist. For example, it does not make sense to consider $S = \{n \in \mathbf{N} : A(n)\}$ if $A(n)$ is "$n$ is not in my opinion enormously large".

From the work of Gödel in the early thirties it emerged that the basic intuitive systems of mathematics, such as $\mathbf{N}$, cannot be completely characterized by any axiom scheme. To explain what this means, let us adjoin to the language of conventional mathematics a new undefined predicate "standard". Then "$n$ is standard" has no meaning within conventional mathematics. We call a formula internal in case it does not involve "standard", that is, in case it is a formula of conventional mathematics; otherwise we call it external. Thus the simplest example of an external formula is "$n$ is standard". Another example of an external formula is "$x$ is infinitesimal", since by definition this means: there exists a nonstandard natural number $\nu$ such that $|x| \le 1/\nu$.

Only internal formulas may be used to form subsets. (For example, it makes no sense to speak of "the set of all standard natural numbers" or "the set of all infinitesimal real numbers".) We call an abuse of this rule illegal set formation.

We make the following assumptions:

1. 0 is standard,

2. for all $n$ in $\mathbf{N}$, if $n$ is standard then $n + 1$ is standard.

Then it is impossible to prove that every $n$ in $\mathbf{N}$ is standard. (This does not contradict the induction theorem; it merely shows that it is impossible to prove that there is a subset $S$ of $\mathbf{N}$ such that a natural number $n$ is in $S$ if and only if $n$ is standard.) That is, it is consistent to assume also:

3. there exists a nonstandard $n$ in $\mathbf{N}$.

We also assume:

4. if $A(0)$ and if for all standard $n$ whenever $A(n)$ then $A(n+1)$, then for all standard $n$ we have $A(n)$.

In (4), $A(n)$ is any formula, internal or external. This assumption is called external induction. It is a complement to ordinary induction, which as we have seen may fail for external formulas. (Of course, ordinary induction continues to hold for ordinary, i.e., internal, formulas. Nothing in conventional mathematics is changed; we are merely constructing a richer language to discuss the same mathematical objects as before.)

Using external induction we can easily prove that every nonstandard natural number is greater than every standard natural number (let $\nu$ be a nonstandard natural number and in (4) let $A(n)$ be "$\nu > n$"), that the sum of two standard natural numbers is standard (let $m$ be a standard natural number and let $A(n)$ be "$n + m$ is standard"), and that the product of two standard natural numbers is standard (let $m$ be a standard natural number, let $A(n)$ be "$nm$ is standard", and use the fact just proved about sums).

Another assumption that we shall occasionally use is called the sequence principle. Let $A(n,x)$ be a formula, internal or external. If for all standard $n$ there is an $x$ such that $A(n,x)$, then, of course, there is an $x_0$ such that $A(0,x_0)$, an $x_1$ such that $A(1,x_1)$, an $x_2$ such that $A(2,x_2)$, and so forth. We assume:

*5. if for all standard $n$ there is an $x$ such that $A(n,x)$, then there is a sequence $n \mapsto x_n$ such that for all standard $n$ we have $A(n,x_n)$.

For an example of the use of the sequence principle, see the proof of Theorem 6.1. Results that use the sequence principle will be starred.

Notice that by (2) there is no smallest nonstandard natural number. We can picture the natural numbers as lying on a tape (Fig. 4.1). The standard natural numbers behave just like the full system $\mathbf{N}$, so far as internal properties are concerned. But $\mathbf{N}$ consists of the standard and the nonstandard natural numbers as well. Notice that we did not start with the left portion of the tape and invent a right portion to be added on.

[Figure 4.1: The natural numbers. A tape with the standard numbers 0, 1, 2, 3, ... on the left portion and the nonstandard numbers on the right.]

Rather we started with the whole tape and then adjoined a predicate to our language that allows us to distinguish the two portions of the tape. The use of this new predicate "standard" is similar to color on a TV set: the picture is the same, but we see distinctions that we could not make before.

For a long time the incompleteness of axiomatic systems was regarded by mathematicians as unfortunate. It was the genius of Abraham Robinson, in the early sixties, to turn it to good use and show that thanks to it a vast simplification of mathematical reasoning can be achieved.

Chapter 5. Infinitesimals

Now we introduce some useful external notions for the field $\mathbf{R}$ of real numbers. A real number $x$ is called infinitesimal in case $|x| \le 1/\nu$ for some nonstandard natural number $\nu$. Since a nonstandard $\nu$ is bigger than every standard $n$, it follows that if $x$ is infinitesimal then $|x| \le 1/n$ for every standard natural number $n$. I claim that the converse is also true. If $|x| \le 1/n$ for all $n$ in $\mathbf{N}$, then $x = 0$, and so is infinitesimal; otherwise, let $\mu$ be the least natural number such that $|x| \ge 1/\mu$. Then $\mu$ is nonstandard, so $\nu = \mu - 1$ is nonstandard. But $|x| < 1/\nu$, so $x$ is infinitesimal.

A real number $x$ is called limited in case $|x| \le n$ for some standard $n$ in $\mathbf{N}$; otherwise $x$ is called unlimited. The words "finite" and "infinite" are sometimes used as synonyms for "limited" and "unlimited", respectively, but since they already have internal meanings, their use can lead to confusion, as in "this integral is finite". If $x$ and $y$ are real numbers we say that:

$x \simeq y$ in case $x - y$ is infinitesimal,

$x \lesssim y$ in case $x \le y + \alpha$ for some infinitesimal $\alpha$,

$x \gtrsim y$ in case $y \lesssim x$,

$x \ll y$ in case $x \lesssim y$ and $x \not\simeq y$,

$x \gg y$ in case $y \ll x$.

We may read $x \simeq y$ as $x$ is infinitely close (or nearly equal) to $y$, $x \lesssim y$ as $x$ is weakly less than $y$, $x \gtrsim y$ as $x$ is weakly greater than $y$, $x \ll y$ as $x$ is strongly less than $y$, and $x \gg y$ as $x$ is strongly greater than $y$. In particular, $x \gg 0$ means that $x$ is strongly positive. [...]

Chapter 6. External analogues of internal notions

[...] Suppose that each $\delta_t$ is strongly positive, and let $\delta = \min_t \delta_t$. Then $\delta$ is also strongly positive (simply because it is equal to $\delta_t$ for some $t$ in the finite set $T$). Thus $\xi$ is nearly continuous on $T$ if and only if for all $\varepsilon \gg 0$ there is a $\delta \gg 0$ such that for all $s$ and $t$, $|s - t| \le \delta$ implies $|\xi(s) - \xi(t)| \le \varepsilon$. Hence near continuity at $t$ is an elementary analogue of continuity, and near continuity on $T$ is an elementary analogue of uniform continuity. Here are some simple and useful illustrations of these notions. [...]

Chapter 7. Properties that hold almost everywhere

Let $A(\omega)$ be a formula, internal or external. We say that $A(\omega)$ holds almost everywhere (a.e.) in case for all $\varepsilon \gg 0$ there is an event $N$ with $\Pr N \le \varepsilon$ such that $A(\omega)$ holds for all $\omega$ in $N^c$. If $A(\omega)$ is internal, then we can form the event $\{A\} = \{\omega \in \Omega : A(\omega)\}$, and $A(\omega)$ holds a.e. if and only if $\Pr\{A\} \simeq 1$. But some of the most interesting properties that we shall consider are external, and we need the formulation of the preceding paragraph to avoid illegal set formation. Whether $A(\omega)$ is internal or external, though, the intuitive content of the statement that $A(\omega)$ holds a.e. is near certainty: given $\varepsilon \gg 0$ (for example, $\varepsilon = 10^{-100}$) there is an event $N$ with $\Pr N \le \varepsilon$ such that with the possible exception of points in $N$ the formula $A(\omega)$ always holds.

Theorem 7.1. Let $x$ be a random variable. Then the following are equivalent: (i) $x \simeq 0$ a.e., (ii) for all $\lambda \gg 0$ we have $\Pr\{|x| \ge \lambda\} \simeq 0$, (iii) there is a $\lambda \simeq 0$ such that $\Pr\{|x| \ge \lambda\} \simeq 0$.

Proof. Suppose (i), and let $\lambda \gg 0$ and $\varepsilon \gg 0$. Then there is an event $N$


with $\Pr N \le \varepsilon$ such that $x(\omega) \simeq 0$ for all $\omega$ in $N^c$, so that $\{|x| \ge \lambda\} \subset N$ and thus $\Pr\{|x| \ge \lambda\} \le \varepsilon$. Since $\varepsilon \gg 0$ is arbitrary, $\Pr\{|x| \ge \lambda\} \simeq 0$. Thus (i) $\Rightarrow$ (ii). Suppose (ii). Then the set of all $\lambda$ such that $\Pr\{|x| \ge \lambda\} \le \lambda$ contains all $\lambda \gg 0$ and so contains some $\lambda \simeq 0$ by overspill. Thus (ii) $\Rightarrow$ (iii). Finally, the implication (iii) $\Rightarrow$ (i) is obvious. □

So long as we are considering a single random variable $x$, if $x \simeq 0$ a.e. then we can safely think of $x$ as being 0 for all practical purposes: the probability of being able to detect with the naked eye any difference from 0 is less than $10^{-100}$. The situation changes radically when we consider an unlimited number of random variables $x_1, \dots, x_\nu$ each of which is infinitesimal a.e. Suppose that the day is divided into $\nu$ equal parts of infinitesimal duration $1/\nu$, that we have a device whose malfunction would cause a disaster, that the probability of malfunction in any period is $c/\nu$ where $0 \ll c \ll \infty$, and that different periods are independent. If we let $x_n$ be the indicator function of the event of a malfunction in the $n$th period, then for each $n$ we have $x_n \simeq 0$ a.e. (in fact, $x_n = 0$ a.e.). But we are really interested in $\max x_n$, the indicator function of a disaster sometime during the day. By independence, the probability of no disaster during the day is
$$\Bigl(1 - \frac{c}{\nu}\Bigr)^{\nu} \simeq e^{-c}.$$
[...]
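The no-disaster probability above is easy to check numerically by taking larger and larger $\nu$ in place of an unlimited one. This is my own illustration, not the book's:

```python
# Sketch only: (1 - c/nu)^nu approaches e^{-c} as nu grows.
import math

c = 1.0
for nu in (10**3, 10**6, 10**9):
    print(nu, (1 - c / nu) ** nu, math.exp(-c))
```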

Let $x_1, \dots, x_\nu$ be a finite sequence of random variables, with $\nu$ unlimited. We say that $x_1, \dots, x_\nu$ (nearly) converges to $x$ in probability in case $x_n \simeq x$ a.e. for all unlimited $n \le \nu$. As the example above shows, this is not very restrictive. A more interesting question is whether $x_1, \dots, x_\nu$ converges to $x$ a.e. For convergence in probability the exceptional set $N$ is allowed to depend on $n$, but not for convergence a.e.

Theorem 7.2. Let $x_1, \dots, x_\nu$ be random variables. Then $x_1, \dots, x_\nu$ converges to 0 a.e. if and only if for all $\lambda \gg 0$ and all unlimited $n \le \nu$ we have
$$\Pr\Bigl\{\max_{n \le j \le \nu} |x_j| \ge \lambda\Bigr\} \simeq 0. \tag{7.1}$$

Proof. Let $M(n,\lambda) = \bigl\{\max_{n \le j \le \nu} |x_j| \ge \lambda\bigr\}$. Suppose that $x_1, \dots, x_\nu$ converges to 0 a.e., and let $\lambda \gg 0$ and $\varepsilon \gg 0$. Then there is an event $N$ with $\Pr N \le \varepsilon$ such that $x_1, \dots, x_\nu$ converges to 0 on $N^c$. Then $M(n,\lambda) \subset N$ if $n$ is unlimited, so that $\Pr M(n,\lambda) \le \varepsilon$. Since $\varepsilon \gg 0$ is arbitrary, $\Pr M(n,\lambda) \simeq 0$.


Conversely, suppose that $\Pr M(n,\lambda) \simeq 0$ for $n \simeq \infty$ and $\lambda \gg 0$. Let $\varepsilon \gg 0$, and for $j \ne 0$ in $\mathbf{N}$, let $n_j$ be the least natural number such that
$$\Pr M\Bigl(n_j, \frac{1}{j}\Bigr) \le \frac{\varepsilon}{2^j}.$$
Let
$$N = \bigcup_j M\Bigl(n_j, \frac{1}{j}\Bigr).$$
(Not that it matters, but the $M(n_j, 1/j)$ are empty for $j$ sufficiently big, since $\nu$ is finite.) Then $\Pr N \le \varepsilon$. Notice that if $j$ is limited, so is $n_j$, for otherwise we would have $n_j - 1 \simeq \infty$ and $1/j \gg 0$, so that $\Pr M(n_j - 1, 1/j)$ would be infinitesimal by hypothesis and so $\le \varepsilon/2^j$, contradicting the definition of $n_j$. Consequently, if $\omega \in N^c$ and $n \simeq \infty$, then $|x_n(\omega)| \le 1/j$ for all limited $j \ne 0$. Since $\varepsilon \gg 0$ is arbitrary, this shows that $x_1, \dots, x_\nu$ converges to 0 a.e. □

Notice that, by Theorem 7.1, the relation (7.1) is equivalent to saying that for all unlimited $n \le \nu$ we have $\max_{n \le j \le \nu} |x_j| \simeq 0$ a.e. [...]

[...] For $\lambda > 0$, Chebyshev's inequality gives $\Pr\{|K - EK| \ge \lambda\} \le \mathrm{Var}\,K/\lambda^2$. But by independence,
$$\mathrm{Var}\,K = \sum_n \mathrm{Var}\,\chi_{A_n} \le \sum_n E\chi_{A_n}^2 = EK.$$
Hence $\Pr\{|K - EK| \ge \lambda\} \le EK/\lambda^2$. Suppose that $EK \simeq \infty$ and take $\lambda = \frac{1}{2}EK$. Then $\Pr\{|K - EK| \ge \frac{1}{2}EK\} \simeq 0$. Thus except for an event of infinitesimal probability we have $|K - EK| \le \frac{1}{2}EK$, so that $K \simeq \infty$ a.e. □ If the $A_n$ are independent, then either $K$ [...]

Chapter 8. $L^1$ random variables

[...] $\sum |\lambda|\,\mathrm{pr}_x(\lambda)$ converges, or as we shall say more briefly, $\sum |\lambda|\,\mathrm{pr}_x(\lambda)$ (nearly) converges. Thus if $x$ is $L^1$, then $E|x| \ll \infty$. (Here $x^{(a)}$ denotes $x$ truncated at $a$: $x^{(a)} = x\chi_{\{|x| \le a\}}$.) The converse is not true in general: suppose that $\mathrm{pr}(\omega) \simeq 0$ and let $x = \mathrm{pr}(\omega)^{-1}\chi_{\{\omega\}}$. Then $E|x| = 1 \ll \infty$, but for $a \simeq \infty$ with $a \le \mathrm{pr}(\omega)^{-1}$ we have $E|x - x^{(a)}| = 1 \not\simeq 0$.

Theorem 8.1. (Radon-Nikodym and converse) A random variable $x$ is $L^1$ if and only if $E|x|$ is limited and for all events $M$ with $\Pr M \simeq 0$ we have

$E|x|\chi_M \simeq 0$.

Proof. Suppose that $x$ is $L^1$ and $\Pr M \simeq 0$. Let $a \simeq \infty$ be such that $a \Pr M \simeq 0$ (for example, let $a = 1/\sqrt{\Pr M}$). Then
$$E|x|\chi_M \le E|x^{(a)}|\chi_M + E|x - x^{(a)}|\chi_M \le a \Pr M + E|x - x^{(a)}| \simeq 0.$$
Conversely, suppose that $E|x|$ is limited and that $E|x|\chi_M \simeq 0$ whenever $\Pr M \simeq 0$. Let $a \simeq \infty$ and let $M = \{|x| \ge a\}$. Then


(by Chebyshev's inequality for $p = 1$) we have $\Pr M \le E|x|/a \simeq 0$, so that $E|x|\chi_M \simeq 0$; that is, $E|x - x^{(a)}| \simeq 0$. □

It follows from this criterion that if $x$ and $y$ are $L^1$, then so is $x + y$; if $x$ is $L^1$ and $|y| \le |x|$, then $y$ is $L^1$; and if $x$ is $L^1$ and $\|y\|_\infty \ll \infty$, then $yx$ is $L^1$.

Theorem 8.2. (Lebesgue) If $x$ and $y$ are $L^1$ and $x \simeq y$ a.e., then $Ex \simeq Ey$.

Proof. Let $z = x - y$. Then $z \simeq 0$ a.e., so (by Theorem 7.1) there is an $\alpha \simeq 0$ such that $\Pr\{|z| \ge \alpha\} \simeq 0$. But $|z| \le |z|\chi_{\{|z| \ge \alpha\}} + \alpha$, and since $z$ is $L^1$, Theorem 8.1 implies that $E|z| \simeq 0$. Hence $Ex \simeq Ey$. □

For $1 \le p < \infty$ we say that $x$ is $L^p$ in case $|x|^p$ is $L^1$, and we say that $x$ is $L^\infty$ in case $\|x\|_\infty \ll \infty$. If $x$ is $L^p$ and $y$ is $L^{p'}$, where $p'$ is the conjugate exponent to $p$, then by the inequality
$$|xy| \le \frac{1}{p}|x|^p + \frac{1}{p'}|y|^{p'}$$
proved in Chapter 1, the product $xy$ is $L^1$. Also, if $p \gg 1$ and $E|x|^p \ll \infty$, then $x$ is $L^1$.

Theorem 8.3. Let $x$ be $L^p$ where $1 \le p \le \infty$, and let $\mathcal{A}$ be an algebra of random variables. Then $E_{\mathcal{A}}x$ is $L^p$.

Proof. For $p = \infty$ this is obvious. For $1 < p < \infty$ the relativized Jensen inequality implies that $|E_{\mathcal{A}}x|^p \le E_{\mathcal{A}}|x|^p$, so we need only prove the result for $p = 1$. Let $x$ be $L^1$. Then $E|E_{\mathcal{A}}x| \le E|x|$ [...] so $E_{\mathcal{A}}x$ is $L^1$. □

Theorem 8.4. Let $x$ be $L^1$ and let $\mathcal{A}$ be an algebra of random variables. Then $x$ is $L^1$ on a.e. atom of $\mathcal{A}$.

Proof. Let $\varepsilon \gg 0$. For each $n$ in $\mathbf{N}$ let $a_n$ be the least natural number such that

$$\Pr{}_{\mathrm{at}(\mathcal{A})}\bigl\{E_{\mathcal{A}}|x - x^{(a_n)}| \ge \tfrac{1}{n}\bigr\} \le \frac{\varepsilon}{2^n}.$$


(See Chapter 2 for the definition of $\Pr_{\mathrm{at}(\mathcal{A})}$.) [...] $a \ge a_n$ for all $n \ll \infty$. Thus $x$ is $L^1$ on those atoms. Since $\varepsilon \gg 0$ is arbitrary, this concludes the proof. □

In the converse direction, suppose that $x$ is $L^1$ on every atom. If $a \simeq \infty$, then $E_{\mathcal{A}}|x - x^{(a)}| \simeq 0$ everywhere, so that $E|x - x^{(a)}| \simeq 0$. Thus if $x$ is $L^1$ on every atom of $\mathcal{A}$, then $x$ is $L^1$. This is the most that can be said in general, since we can always alter an $L^1$ random variable on a single point of infinitesimal probability and obtain a random variable that is not $L^1$.

Theorem 8.4 has the following corollary.

Corollary. (Fubini) If $x$ is $L^1$ on $(\Omega_1 \times \Omega_2, \mathrm{pr}_1 \times \mathrm{pr}_2)$, then the random variable $x_{\omega_1}$ on $(\Omega_2, \mathrm{pr}_2)$, given by $x_{\omega_1}(\omega_2) = x(\omega_1, \omega_2)$, is $L^1$ for a.e. $\omega_1$ in the space $\Omega_1$.
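The truncation $x^{(a)}$ and the counterexample above can be seen concretely by using a very small weight as a stand-in for an infinitesimal. A sketch of my own (the names and numeric stand-ins are mine, not the book's):

```python
# Sketch only: x = pr(w0)^{-1} chi_{w0} on a point of tiny probability has
# E|x| = 1 (limited), yet E|x - x^(a)| = 1 for any cutoff a below 1/pr(w0).
eps = 1e-12                                   # stand-in for an infinitesimal
pr = {"w0": eps, "w1": 1 - eps}
x = {"w0": 1 / eps, "w1": 0.0}

def truncate(x, a):
    # x^(a): keep x where |x| <= a, set it to 0 elsewhere.
    return {w: (v if abs(v) <= a else 0.0) for w, v in x.items()}

E = lambda z: sum(z[w] * pr[w] for w in pr)

print(E({w: abs(v) for w, v in x.items()}))   # E|x| = 1, limited
a = 1e6                                       # "unlimited" cutoff, a < 1/eps
xa = truncate(x, a)
print(E({w: abs(x[w] - xa[w]) for w in pr}))  # still 1, so x is not L^1
```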

Chapter 9. The decomposition of a stochastic process

We will study stochastic processes $\xi$ indexed by a finite subset $T$ of $\mathbf{R}$. Recall the general notation introduced at the beginning of Chapter 6. Typical cases will be $T = \{1, \dots, \nu\}$, where $\nu$ is an unlimited natural number, or the case that $T$ is a near interval. Thus, although we require $T$ to be finite, we will be studying the classical subjects of "infinite" sequences of random variables and of "continuous" time parameter stochastic processes.

Let $P\colon t \mapsto \mathcal{P}_t$ be an increasing function from $T$ to the set of all algebras of random variables on $(\Omega, \mathrm{pr})$. This is called a filtration. We abbreviate $E_{\mathcal{P}_t}$ by $E_t$. A $P$-process, or a process adapted to $P$, is a stochastic process $\xi$ indexed by $T$ such that for all $t$ in $T$ we have $\xi(t) \in \mathcal{P}_t$. Since $\mathcal{P}_s \subset \mathcal{P}_t$ for $s \le t$, if $\xi$ is a $P$-process, then $\xi(s) \in \mathcal{P}_t$ for all $s \le t$. If $\xi$ is any stochastic process indexed by $T$, then it is a $P$-process if we define each $\mathcal{P}_t$ to be the algebra generated by the $\xi(s)$ with $s \le t$, but it is convenient to allow for the possibility that $\mathcal{P}_t$ is larger. The algebra $\mathcal{P}_t$ represents the past at time $t$, and if $y$ is a random variable, then $E_t y$ is the best prediction of $y$ that can be made knowing the past at time $t$.

A $P$-process $\xi$ is called a martingale in case $E_s\xi(t) = \xi(s)$ for all $s \le t$, a submartingale in case $E_s\xi(t) \ge \xi(s)$ for all $s \le t$, and a supermartingale in case $E_s\xi(t) \le \xi(s)$ for all $s \le t$. Thus in the trivial case that $\Omega$ consists of a single point, a martingale reduces to a constant function of $t$, a submartingale to an increasing function, and a supermartingale to a decreasing function. Notice that if $\xi$ is a martingale, then
$$\xi(t) = E_t\xi(b)$$


for all $t$. Conversely, given a filtration $P$ and any random variable $x$, the process $\xi$ defined by $\xi(t) = E_t x$ is a martingale.

Let $\xi$ be a $P$-process. We define $D\xi$, $\bar d\xi$, and $\sigma_\xi^2$ by
$$D\xi(t)\,dt = E_t\,d\xi(t),$$
$$d\xi(t) = D\xi(t)\,dt + \bar d\xi(t),$$
$$\sigma_\xi^2(t)\,dt = E_t\,\bar d\xi(t)^2.$$
Thus $D\xi(t)\,dt$ and $\sigma_\xi^2(t)\,dt$ are the conditional mean and variance of the increment $d\xi(t)$. Notice that $D\xi$ and $\sigma_\xi^2$ are $P$-processes indexed by $T'$, whereas $d\xi$ and $\bar d\xi$ are not in general $P$-processes. Observe that $D\xi = 0$ if $\xi$ is a martingale, $D\xi \ge 0$ if $\xi$ is a submartingale, and $D\xi \le 0$ if $\xi$ is a supermartingale. (Of course, for a general process $D\xi$ need not have a constant sign either in $t$ or in $\omega$.) We will show that these conditions are sufficient as well as necessary. We have

$$\xi(t) = \xi(s) + \sum_{s \le r < t} D\xi(r)\,dr + \sum_{s \le r < t} \bar d\xi(r), \qquad s \le t. \tag{9.1}$$
Now $E_r\,\bar d\xi(r) = 0$. Since $\mathcal{P}_s \subset \mathcal{P}_r$ for $s \le r$, we have $E_s = E_s E_r$ for $s \le r$. Hence $E_s\,\bar d\xi(r) = 0$ for $s \le r$. Therefore, if we apply $E_s$ to (9.1) we obtain
$$E_s\xi(t) = \xi(s) + E_s \sum_{s \le r < t} D\xi(r)\,dr, \qquad s \le t. \tag{9.2}$$

Therefore $\xi$ is a martingale if and only if $D\xi = 0$, a submartingale if and only if $D\xi \ge 0$, and a supermartingale if and only if $D\xi \le 0$.

We call $D\xi$ the trend of $\xi$, and if $\bar d\xi = 0$ we say that $\xi$ is a predictable process. Thus $\xi$ is a predictable process if and only if $\sigma_\xi^2 = 0$ or, equivalently, $d\xi$ is a $P$-process. We let
$$\hat\xi(t) = \sum_{s < t} D\xi(s)\,ds,$$
so that $\hat\xi(a) = 0$. Then $d\hat\xi(t) = D\xi(t)\,dt$ is in $\mathcal{P}_t$, so that $\hat\xi$ is a predictable process. We call it the predictable process associated with $\xi$. Notice that if we know $\mathcal{P}_t$, then we know $d\hat\xi(t)$; we can predict the increment with certainty. But to predict the next increment with certainty, we would need to know $\mathcal{P}_{t+dt}$, and this is not in general generated by $\mathcal{P}_t$ and $d\xi(t)$. We let
$$\tilde\xi(t) = \xi(a) + \sum_{s < t} \bar d\xi(s).$$
Thus $\tilde\xi$ is a $P$-process whose increments, as the notation requires, are the $\bar d\xi(t)$. Since $D\tilde\xi = 0$, the process $\tilde\xi$ is a martingale. We call it the martingale associated with $\xi$. Notice that if $\xi$ is already a martingale, then $\tilde\xi = \xi$. We have the decomposition $\xi = \hat\xi + \tilde\xi$ of an arbitrary $P$-process $\xi$ into a predictable process $\hat\xi$ and a martingale $\tilde\xi$, and this decomposition is unique if we impose the normalization $\hat\xi(a) = 0$.

If the $d\xi(t)$ are independent, and if $\mathcal{P}_t$ is the algebra generated by the $\xi(s)$ with $s \le t$, then $D\xi(t) = E_t\,d\xi(t) = E\,d\xi(t)$. Therefore the partial sums $\xi(t) = \sum_{s < t} d\xi(s)$ of independent random variables $d\xi(s)$ of mean 0 form a martingale. To specify such a process up to equivalence, it is only necessary to give the probability distribution of the increments. Here are two examples. In the first example, which we call the Wiener walk, we set

$$d\xi(t) = \begin{cases} \sqrt{dt} & \text{with probability } \tfrac{1}{2}, \\ -\sqrt{dt} & \text{with probability } \tfrac{1}{2}, \end{cases}$$
and in the second example, which we call the Poisson walk, we assume that $dt \le 1$ for all $t$ and set
$$d\xi(t) = \begin{cases} 1 & \text{with probability } \tfrac{1}{2}dt, \\ 0 & \text{with probability } 1 - dt, \\ -1 & \text{with probability } \tfrac{1}{2}dt. \end{cases}$$
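To see the Wiener walk behave as advertised, one can simulate it: its increments have conditional mean 0 (so the trend $D\xi$ vanishes and the walk is a martingale) and conditional variance $dt$, so $\xi(1)$ has mean 0 and variance 1. A Monte Carlo sketch of my own (the parameters are arbitrary choices):

```python
# Sketch only: simulate the Wiener walk on {0, dt, 2dt, ..., 1}; each
# increment is +sqrt(dt) or -sqrt(dt) with probability 1/2.
import random

random.seed(1)
nu = 400                      # number of steps; dt = 1/nu
dt = 1.0 / nu
paths = 5000

finals = []
for _ in range(paths):
    s = 0.0
    for _ in range(nu):
        s += dt ** 0.5 if random.random() < 0.5 else -(dt ** 0.5)
    finals.append(s)

mean = sum(finals) / paths
var = sum((f - mean) ** 2 for f in finals) / paths
print(mean, var)              # should be near 0 and near 1
```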

Let $\xi$ be a martingale. If $r_1 < r_2$, then $d\xi(r_1) \in \mathcal{P}_{r_2}$. Since $d\xi(r_1)$ is in $\mathcal{P}_{r_2}$ and $E_{r_2}$ is linear, we have $E_{r_2}\,d\xi(r_1)\,d\xi(r_2) = d\xi(r_1)\,E_{r_2}\,d\xi(r_2) = 0$. But $E_s E_{r_2} = E_s$ for $s \le r_2$, and consequently
$$E_s\,d\xi(r_1)\,d\xi(r_2) = 0, \qquad r_1 \ne r_2, \quad s \le \max\{r_1, r_2\}.$$
Since $\xi(t) - \xi(s) = \sum_{s \le r < t} d\xi(r)$ [...]
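The orthogonality of distinct martingale increments can likewise be checked by simulation; for the Wiener walk the increments are independent with mean 0, so the expectation of a product of two distinct increments vanishes. Again a sketch of my own:

```python
# Sketch only: estimate E[d xi(r1) d xi(r2)], r1 != r2, for the Wiener walk.
import random

random.seed(2)
dt = 1.0 / 50
paths = 200000
acc = 0.0
for _ in range(paths):
    d1 = dt ** 0.5 if random.random() < 0.5 else -(dt ** 0.5)
    d2 = dt ** 0.5 if random.random() < 0.5 else -(dt ** 0.5)
    acc += d1 * d2
print(acc / paths)            # near 0, matching E d xi(r1) d xi(r2) = 0
```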