Stochastic Processes: A Survey of the Mathematical Theory (Applied Mathematical Sciences, 23) 9780387902753, 0387902759


English Pages 288 [284]


Applied Mathematical Sciences

EDITORS

Fritz John, Courant Institute of Mathematical Sciences, New York University, New York, N.Y. 10012

Lawrence Sirovich, Division of Applied Mathematics, Brown University, Providence, R.I. 02912

Joseph P. LaSalle, Division of Applied Mathematics, Lefschetz Center for Dynamical Systems, Providence, R.I. 02912

ADVISORS

H. Cabannes, University of Paris-VI
M. Ghil, New York University
J.K. Hale, Brown University
J. Keller, Stanford University
J. Marsden, Univ. of California at Berkeley
G.B. Whitham, California Inst. of Technology

EDITORIAL STATEMENT

The mathematization of all sciences, the fading of traditional scientific boundaries, the impact of computer technology, the growing importance of mathematical-computer modelling and the necessity of scientific planning all create the need both in education and research for books that are introductory to and abreast of these developments. The purpose of this series is to provide such books, suitable for the user of mathematics, the mathematician interested in applications, and the student scientist. In particular, this series will provide an outlet for material less formally presented and more anticipatory of needs than finished texts or monographs, yet of immediate interest because of the novelty of its treatment of an application or of mathematics being applied or lying close to applications. The aim of the series is, through rapid publication in an attractive but inexpensive format, to make material of current interest widely accessible. This implies the absence of excessive generality and abstraction, and unrealistic idealization, but with quality of exposition as a goal. Many of the books will originate out of and will stimulate the development of new undergraduate and graduate courses in the applications of mathematics. Some of the books will present introductions to new areas of research, new applications and act as signposts for new directions in the mathematical sciences. This series will often serve as an intermediate stage of the publication of material which, through exposure here, will be further developed and refined. These will appear in conventional format and in hard cover.

MANUSCRIPTS

The Editors welcome all inquiries regarding the submission of manuscripts for the series. Final preparation of all manuscripts will take place in the editorial offices of the series in the Division of Applied Mathematics, Brown University, Providence, Rhode Island.

SPRINGER-VERLAG NEW YORK INC., 175 Fifth Avenue, New York, N.Y. 10010

Applied Mathematical Sciences

Volume 23

John Lamperti

Stochastic Processes A Survey of the Mathematical Theory

Springer-Verlag New York Heidelberg Berlin

John Lamperti, Department of Mathematics, Dartmouth College, Hanover, New Hampshire 03755

AMS Subject Classifications:

60-01, 60Gxx, 60Jxx

Library of Congress Cataloging in Publication Data

Lamperti, John. Stochastic processes. (Applied mathematical sciences ; v. 23) Bibliography: p. Includes index. 1. Stochastic processes. 2. Stationary processes. 3. Markov processes. I. Title. II. Series. QA1.A647 vol. 23 [QA274] 510'.8s [519.2] 77-24321

All rights reserved. No part of this book may be translated or reproduced in any form without written permission from Springer-Verlag. © 1977 by Springer-Verlag, New York Inc.

9 8 7 6 5 4 3

ISBN 978-0-387-90275-3 ISBN 978-1-4684-9358-0 (eBook) DOI 10.1007/978-1-4684-9358-0

PREFACE

This book is the result of lectures which I gave during the academic year 1972-73 to third-year students at Aarhus University in Denmark. The purpose of the book, as of the lectures, is to survey some of the main themes in the modern theory of stochastic processes.

In my previous book Probability: A Survey of the Mathematical Theory I gave a short overview of "classical" probability mathematics, concentrating especially on sums of independent random variables. I did not discuss specific applications of the theory; I did strive for a spirit friendly to application by coming to grips as fast as I could with the major problems and techniques and by avoiding too high levels of abstraction and completeness. At the same time, I tried to make the proofs both rigorous and motivated and to show how certain results have evolved rather than just presenting them in polished final form. The same remarks apply to this book, at least as a statement of intentions, and it can serve as a sequel to the earlier one continuing the story in the same style and spirit.

The contents of the present book fall roughly into two parts.

The first deals mostly with stationary processes, which provide the mathematics for describing phenomena in a steady state overall but subject to random fluctuations. Chapter 4 is the heart of this part. The simple geometry of the Wold decomposition is the starting point for discussing linear prediction, and the analysis is derived from it in a direct and natural way. Basic results such as criteria for singularity or regularity, the factorization of the spectral density and the error of optimum prediction are thus obtained with a minimum of heavy analytical machinery, while the need for those tools which are used becomes clear in advance. The individual ergodic theorem and the strong law of large numbers then round out the study of stationary processes.

The second part of the book is mainly about Markov processes; if desired this can be read before most of part one by going directly from Chapter 2 to Chapter 6. I think that the application of semigroup theory is the key to understanding in this area, and so Chapter 7 in which this tool is developed is basic to the discussion. Properties of path functions, strong Markov processes and a little martingale theory are introduced in the later chapters of this part.

I believe that, in the last analysis, probability cannot be properly understood just as pure mathematics, separated from the body of experience and examples which have brought it to life. The students attending my Aarhus lectures already had a considerable acquaintance with applied probability and statistics, and I regard some such experience as one of the essential prerequisites for reading this book with profit. The other prerequisites are a general knowledge of "real analysis" as well as some familiarity with the measure-theoretic formulation of probability theory itself; details are given below. I hope that after finishing this book readers will be prepared either to go on to the frontiers of mathematical research through more specialized literature, or to turn toward applied problems with an ability to relate them to the general theory and to use its tools and ideas as far as may be possible.

If it is true that the mathematics discussed in this book is applicable, the question naturally must arise "Applicable for what?" In the preface to the mimeographed version of my Aarhus lectures,*) given while the Indochina War was still raging, I said:

"It is impossible for me these days to write or lecture about mathematics without ambivalence. It is obvious that in many nations, and most of all in my own, science and mathematics are all too often serving as tools for militarism and oppression. Probability theory has played a considerable role in some of these perversions, and those who, like myself, work in "pure mathematics" rather than directly with applications must also accept a share of the responsibility. I believe that today it is a vital duty for the scientific community to struggle against such misuse of science, and to resist the demands - made in the name of "defense" or "security" - to develop ever more efficient means for killing and exploiting other human beings."

Such concerns, of course, are not new. The American mathematician who has contributed most to the theories developed in this book is undoubtedly Norbert Wiener. In 1947, in the Introduction to his influential book Cybernetics,**) Wiener wrote:

"Those of us who have contributed to the new science of cybernetics thus stand in a moral position which is, to say the least, not very comfortable. We have contributed to the initiation of a new science which, as I have said, embraces technical developments with great possibilities for good and for evil. We can only hand it over into the world that exists about us, and this is the world of Belsen and Hiroshima. We do not even have the choice of suppressing these new technical developments. They belong to the age, and the most any of us can do by suppression is to put the development of the subject into the hands of the most irresponsible and most venal of our engineers. The best we can do is to see that a large public understands the trend and the bearing of the present work and to confine our personal efforts to those fields, such as physiology and psychology, most remote from war and exploitation."

*) (Aarhus University Mathematics Lecture Note series, no. 38.)
**) [W], page 28.

That was an important statement, but we must now go further. I believe that scientists have an obligation to try to estimate which of the possible results of new technical developments are likely to occur in reality. This cannot be done in a social and political vacuum. In a peaceful, liberated, nonexploitative society there would be little to fear; beneficial applications would be pushed while harmful ones would wither. But in today's United States it is mainly the government, especially the Pentagon, and the giant corporations which have the resources and the desire to exploit advanced technology for their own purposes. I do not think the prospects here for the benign application of science are encouraging. Elsewhere in the world the outlook is rarely much better, and sometimes worse.

What then can be done? To personally abstain from immediately harmful work is a first step, but no more. Wiener's emphasis on public education is surely important; the vital decisions must not be left to the experts and rulers, but should be made in a broad political forum. This is beginning to happen in the nuclear energy controversy, for example, despite powerful efforts to exclude the public from meaningful participation. Individual scientists and engineers, and several organizations of scientists, have played important roles in this process.

Perhaps the key word which must be added to Wiener's statement is "organize."

The great day of the dedicated solitary researcher is over, if indeed it ever existed. Now our scientific work is elaborately planned and supported, but the old individualistic ideology of "disinterested research" and "knowledge for its own sake" persists. These concepts can serve as intellectual blinders which prevent us from understanding the social role which we in fact do play as mathematicians, scientists and engineers, and which keep us from working effectively for change. In their stead, concern for the human consequences of scientific and technological achievement must become part of our working lives, of our teaching and learning, of our professional meetings and writing. Only through organized collective action can this be achieved.

The goal of controlling and humanizing science will not be fully attained, I believe, until radical changes have been made in the structure of society. I also believe that to wait for that day before beginning to act invites disaster. Fortunately there appear to be a growing number of people, in the U.S. and elsewhere, who are deeply concerned about the social consequences of their scientific work, who are ready to give this concern a major role in their professional lives, and who are getting together in old and new ways to develop their ideas and to put them into practice. Since this must be the starting point, perhaps there is some basis for optimism.

The author of a book such as this one is obviously indebted to almost everyone who has contributed to the field, and I am drawing not only on the research but also on the expository writing of many others.

In particular, lecture notes of A. D. Wentzel and (especially) of K. Ito have been very helpful in developing my feeling for stochastic processes, and the writings of Noam Chomsky, D. F. Fleming and I. F. Stone have aided me to better understand the world in which we live. Other resources are listed in the bibliography.

I wish to express my appreciation for the hospitality of the Mathematics Institute of Aarhus University where my lectures were given four years ago; my visit there was both pleasant and profitable. And finally, I am grateful to the Dartmouth College mathematics department for its generosity with leave and assistance during the preparation of the final version of the book.

John Lamperti
Hanover, N.H.
February, 1977

PREREQUISITES

There are three general mathematical prerequisites for reading this book with profit. They are an adequate knowledge of mathematical analysis, knowledge of basic probability mathematics (including its measure-theoretic foundations), and familiarity with examples and applications from elementary probability, preferably including finite Markov chains.

Taking up the last point first, there is no use in trying to prescribe exactly what must be known; the idea is that readers should have some feeling for the importance of the mathematics we will be discussing, as well as some basic intuition about how probability works. There are innumerable valid ways to gain such experience, but in my opinion anyone who does not yet have it should postpone reading this book. I can think of no better source to which to turn than William Feller's beautiful text [F1]. The chapters on Markov chains and on examples of continuous-time stochastic processes (chapters 15 and 17 respectively in the 3rd edition) will be especially helpful in motivating most of part 2 here.

It is assumed that the reader has already studied the modern formulation of probability theory: probability spaces, random variables as measurable functions, the concept of independence and some properties of sums of independent random variables such as the laws of large numbers (weak and strong) and the central limit theorem. One source for all this is chapters 1 through 3 of my previous book [L]. Of course there are many other places to find this material, and some of them are listed in the bibliographies here and in [L].

One specific comment: the general (non-discrete) theory of conditional probabilities and expectations is not treated in [L] but will be needed here. For this reason the essentials have been set out in Appendix 2, and any reader who is not familiar with these topics should begin there.

The prerequisites in analysis begin, of course, with an adequate knowledge of measure theory. This must include familiarity with abstract measure spaces and integrals on them, plus such basic results as the extension theorem for constructing σ-additive measures on Borel fields, the dominated and monotone convergence theorems, the Fubini theorem for product measures, and the Radon-Nikodym theorem. In addition to measure theory, considerable use is made of Hilbert and Banach spaces, both in the abstract and such more specific ones as $L^2$ and spaces of continuous or bounded functions.

The concept of "Hilbert space" is not defined in this book; it is assumed that the reader has studied Hilbert spaces already and is familiar with such things as orthogonal bases and series (generalized Fourier series), subspaces, bounded linear operators and functionals, and projections. The spectral theorem is not assumed, although a form of it is basic for Chapters 3 and 4; we will derive what we need there in (I hope) a relatively painless way. As for Banach spaces, the concepts of linear functional and of bounded and unbounded linear operators will be used (in part 2) without special explanations. The results of F. Riesz relating linear functionals on spaces of continuous functions to integrals should also be familiar.

The material above is usually included in a course in "real analysis" such as the one taught at Dartmouth College for first-year mathematics graduate students. All that I have mentioned, and much more, can be found in Rudin's Real and Complex Analysis [R]. I might add, however, that it is certainly not necessary to know everything in that book in order to read this one! Also, I personally feel that maintaining strict logical priorities can be a block to learning mathematics; it is not a sin to understand the statement of a theorem and to use it before learning the proof! If this point of view is accepted, the above list of prerequisites may seem much less formidable.

Some of the more particular bits of information needed are listed below, arranged according to the chapter in which they occur:

Chapter 2. In section 3 some facts about the multivariate normal distribution are used without proof or much explanation; adequate background can be found in [F2], Chapter 3, section 6.

Chapter 3. When reading section 1 it will help to have seen the solution of linear, homogeneous finite-difference equations. Section 3 requires the use of "Helly's theorems" about the weak convergence of measures on $R^1$; one reference among many is [L], section 12. Weak convergence turns up elsewhere in the book too.

Chapter 4. Some standard Hilbert-space theory and harmonic analysis are needed throughout the chapter, and in section 2 we need to know that linear functionals on continuous functions are represented by at most one signed measure. For this, see [R], Chapter 6 (the Riesz representation theorem). Then in section 3 we require a theorem of F. and M. Riesz whose proof is sketched later in section 7. That proof (which can be omitted without loss of continuity) involves some complex analysis and a few facts about harmonic functions which go beyond what is needed elsewhere; they are all in [R].

Chapter 7. Basic ideas about Banach spaces are used throughout the chapter, but semigroup theory is developed from scratch. A little advance familiarity with Laplace transforms may help; it's not strictly required. The weak topology is mentioned several times but never really used except at the end of section 6 (which can be omitted). Some knowledge of elementary differential equations will help with certain examples.

Chapters 8 and 9 involve lots of measure theory, but nothing exotic. The last section of Chapter 9 probably can't be appreciated without some prior knowledge of potential theory and harmonic functions (see [R] and/or [K]) but it plays no role in what follows.

And that's about all.

REMARKS ON NOTATION

As a rule, bold-face letters have been used for operators throughout the book; at times they have also been used for the names of particular Hilbert or Banach spaces.

The Hardy symbols $o$ and $O$ appear occasionally; the statement that a function $f(h) = o(h)$ as $h \to a$ means that $\lim_{h \to a} f(h)/h = 0$, while the similar statement with $O(h)$ means that $f(h)/h$ is bounded. (The "as $h \to a$" may be omitted if it is clear from the context.)

A few other symbols used repeatedly, and their meanings, are listed below:

$(\Omega, \mathscr{F}, P)$ is the standard notation for a probability space, consisting of a set $\Omega$, a σ-field $\mathscr{F}$ of its subsets and a probability measure $P$. The letter $E$ is always used to denote mathematical expectation of a random variable.

$\mathscr{F}(W)$, where $W$ is either a collection of sets or of random variables, means the smallest σ-field containing all the sets in $W$, or with respect to which all the random variables are measurable, whichever is appropriate.

$\varphi_A(x)$ is the indicator function of the set $A$; that is, $\varphi_A(x) = 1$ if $x \in A$ and $0$ otherwise.

$\mathbb{C}$ denotes the complex numbers. $R^n$ is real Euclidean $n$-space; $R^{n*}$ is real $n$-space compactified by adding a single "point at $\infty$."

$\delta_{ij}$ is the Kronecker $\delta$, which is $1$ if $i = j$ and $0$ if $i \neq j$.

$\mathbf{1}$ means the constant function whose value is always 1.

'iff' is sometimes used in place of 'if and only if.'

TABLE OF CONTENTS

Prerequisites
Notation
Chapter 1. General Introduction
Chapter 2. Second-Order Random Functions
Chapter 3. Stationary Second-Order Processes
Chapter 4. Interpolation and Prediction
Chapter 5. Strictly-Stationary Processes and Ergodic Theory
Chapter 6. Markov Transition Functions
Chapter 7. The Application of Semigroup Theory
Chapter 8. Markov Processes
Chapter 9. Strong Markov Processes
Chapter 10. Martingale Theory
Appendix 1.
Appendix 2.
Bibliography
Index

CHAPTER 1 GENERAL INTRODUCTION

1. Basic definitions.

A stochastic (or random) process is formally defined to be a collection of random variables defined on a common probability space $(\Omega, \mathscr{F}, P)$ and indexed by the elements of a parameter set $T$. The set $T$ will in this book generally be one of these: $R^1$, $R^+ = [0, \infty)$, $Z = \{\ldots, -1, 0, 1, 2, \ldots\}$, or $Z^+ = \{0, 1, 2, \ldots\}$; in all these cases the parameter $t \in T$ may usually be thought of as time. If $T = Z$ or $Z^+$, one sometimes speaks of a random sequence. If $T = R^n$ with $n > 1$ the process is often called a random field. The random variables of the process need not always be real-valued but must have the same range-space $S$; this may be $R^n$ (a vector-valued process) or some other measurable space. In the first part of this book the (common) range-space of the random variables making up the process will almost always be the real or complex numbers; this is not generally true in the latter part, although the reader can privately make that restriction if desired. In any case the range $S$ of the variables is called the state space of the process.

In describing a stochastic process as we have just done there is a certain psychological bias: one tends to regard the process primarily as a function on $T$ whose values for each $t \in T$ are random variables. Of course we are really dealing with one function of two variables, say $X = X(t, \omega)$, where $t \in T$, $\omega \in \Omega$, and where for each fixed $t$ the function $X(t, \cdot)$ is measurable with respect to $\mathscr{F}$. If instead we fix an $\omega \in \Omega$, we obtain a function $X(\cdot, \omega): T \to R^1$ (or into whatever the state space $S$ may be) which is called a trajectory or a path-function or sample-function of the process. It is also legitimate, and sometimes most appropriate, to think of the process $X$ as a single random variable whose range is a space of functions on $T$; the term random function perhaps suggests this point of view.

Let $t_1, t_2, \ldots, t_n \in T$, and let $C$ be any measurable set in $S^n$. Then the definition

$$P_{t_1, \ldots, t_n}(C) = P\big(\{\omega : (X(t_1, \omega), \ldots, X(t_n, \omega)) \in C\}\big)$$

makes sense since each function $X(t_i, \cdot)$ is $\mathscr{F}$-measurable, and the set function $P_{t_1, \ldots, t_n}(\cdot)$ is a probability measure on the measurable sets of $S^n$. The measures so obtained, for all finite subsets of $T$, are called the finite-dimensional distributions of the process. This family of distributions is often (but not always!) the most important aspect of the process, and one frequently needs to study a random process by starting with its finite-dimensional distributions. The existence of such a process (that is, one whose finite-dimensional distributions coincide with a given family of measures) is often assured provided the measures satisfy certain simple consistency conditions. This theorem is due to Kolmogorov and should be familiar; to review it see Appendix 1.
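As a small illustration of these definitions, the following Python sketch uses a simple ±1 random walk on $T = \{0, 1, \ldots, n\}$. The example, the function names and the Monte Carlo approach are all assumptions made here for illustration; none of them come from the text. Fixing $\omega$ corresponds to drawing one path, and a finite-dimensional distribution is estimated by counting how often the path takes given values at the chosen times.

```python
import random


def sample_path(n_steps, rng):
    """One trajectory X(., w) of a +/-1 random walk on T = {0, ..., n_steps}."""
    path = [0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.choice((-1, 1)))
    return path


def empirical_fdd(times, n_paths, seed=0):
    """Estimate the finite-dimensional distribution at `times` by path frequencies."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_paths):
        path = sample_path(max(times), rng)
        key = tuple(path[t] for t in times)  # (X(t_1, w), ..., X(t_n, w))
        counts[key] = counts.get(key, 0) + 1
    return {k: v / n_paths for k, v in counts.items()}


def marginal(fdd, keep):
    """Project a finite-dimensional distribution onto the coordinates in `keep`."""
    out = {}
    for values, prob in fdd.items():
        key = tuple(values[i] for i in keep)
        out[key] = out.get(key, 0.0) + prob
    return out


fdd_12 = empirical_fdd(times=(1, 2), n_paths=20000)
fdd_1 = marginal(fdd_12, keep=(0,))  # distribution of X(1) recovered from that of (X(1), X(2))
```

Projecting the estimated distribution for times (1, 2) onto the first coordinate gives the distribution for time 1 alone, which is the kind of consistency that Kolmogorov's theorem requires of a family of finite-dimensional distributions.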

2. Remarks on methods.

The theory of random processes, although always dealing with the same general sort of mathematical object, has many aspects and uses very diverse methods. For example, suppose that $X$ is real or complex and $E(|X_t|^2) < \infty$ for all $t \in T$. Then the function $K(s,t) = E(X_s \overline{X_t})$


have

$$P(A \cap B \mid X_t) = P(A \mid X_t)\,P(B \mid X_t) \quad \text{(a.s.)}. \tag{1}$$

That is, given the "present" of the process, the "past" event $A$ and the "future" event $B$ are conditionally independent.

The general physical interpretation mentioned above can be paraphrased as asserting that the conditional probability of a "future" event $B$ given the "present" is the same as the probability of $B$ given the present and the past; in analogy with (1) this becomes

$$P(B \mid \mathscr{F}_{\leq t}) = P(B \mid X_t) \quad \text{(a.s.)} \tag{2}$$

for any $B \in \mathscr{F}_{\geq t}$.

Problem 6. Prove that (1) and (2) are equivalent.

Note that the definition (1) is symmetric with respect to past and future. It follows that the process $X'$ defined by $X'_t = X_{-t}$ is also a Markov process whenever $X$ is one. It also follows that another equivalent expression of the Markov property can be obtained by interchanging the roles of past and future in (2).

The notion of Markov process has a generalization to cases when $T$ is an ordered set not consisting of real numbers, but this extension seems not to be of great interest. However, the extension of a sort of Markov property to certain random fields is a subject of active research at this time. We will not deal with Markov random fields in this book; an introduction can be found in [S].

A real or complex process $X$ with $T \subset R^1$ is said to have independent increments if, whenever $t_1 < t_2 < \cdots < t_n \in T$, the variables

$$(X_{t_2} - X_{t_1}),\ (X_{t_3} - X_{t_2}),\ \ldots,\ (X_{t_n} - X_{t_{n-1}})$$

are mutually independent. If $T = Z^+$, such a process consists of the partial sums of a sequence of independent random variables. Especially important is the special case of stationary independent increments which obtains when $X$ satisfies the requirements both for stationary increments and for independent ones. In the example $T = Z^+$ this means that the process is formed from partial sums of independent and identically-distributed random variables. The most famous examples with $T = R^+$ are the Brownian-motion and the Poisson processes.*)

Problem 7. Prove that if ... and the process $X$ has independent increments, then $X$ is a Markov process. (Hint: Choose $t > 0$ and define a future set $B$ as follows:

$$B = \{\omega : X_{t_1}(\omega) \in E_1, \ldots, X_{t_n}(\omega) \in E_n\},$$

where $t < t_1 < t_2 < \cdots < t_n$ and the $E_j$ are Borel sets. First show that

$$P(B \mid \mathscr{F}_{\leq t}) = P(B \mid X_t)$$

for such sets $B$, using the independence of the increments of the process and properties of conditional probability. Then show that the class of sets $B$ for which the property holds must form a σ-field.)

*) These two examples will be mentioned repeatedly later on. For introductions to them see for example [L], sections 20 and 21, and [F1], chapter 17, section 2, respectively.
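The remark that partial sums of independent, identically distributed variables have stationary independent increments can be explored numerically. The sketch below is only an illustration under stated assumptions: the steps are taken to be standard normal, the helper names are invented here, and the check looks at variances and a covariance of increments, which is a necessary consequence of the property rather than a proof of it.

```python
import random
from itertools import accumulate


def partial_sums(steps):
    """X_0 = 0 and X_n = steps[0] + ... + steps[n-1]."""
    return [0.0] + list(accumulate(steps))


def increment_samples(pairs, n_paths=20000, horizon=8, seed=1):
    """For each (s, t) in pairs, collect samples of X_t - X_s over many paths."""
    rng = random.Random(seed)
    samples = {pair: [] for pair in pairs}
    for _ in range(n_paths):
        # Assumed step distribution: iid standard normal.
        x = partial_sums([rng.gauss(0.0, 1.0) for _ in range(horizon)])
        for s, t in pairs:
            samples[(s, t)].append(x[t] - x[s])
    return samples


def covariance(u, v):
    """Sample covariance of two equally long lists (variance when u is v)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


samples = increment_samples([(1, 3), (4, 6)])
var_early = covariance(samples[(1, 3)], samples[(1, 3)])  # variance of X_3 - X_1
var_late = covariance(samples[(4, 6)], samples[(4, 6)])   # variance of X_6 - X_4
cross = covariance(samples[(1, 3)], samples[(4, 6)])      # covariance of the two increments
```

With these assumptions both variances should be close to 2 (the increment length times the step variance), reflecting stationarity of the increments, and the covariance of the two non-overlapping increments should be close to 0.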

CHAPTER 2 SECOND-ORDER RANDOM FUNCTIONS

In this chapter we will consider some general properties of stochastic processes with finite second moments. As already noted, the variables of such a process $\{X_t\}$ can be thought of as elements in the Hilbert space $H = L^2(\Omega, \mathscr{F}, P)$. It will be convenient to sometimes allow complex values for the process, and so we will understand that $H$ may mean either the real or the complex $L^2$ space. The norm and inner product will be denoted

$$\|X\| = \big[E(|X|^2)\big]^{1/2}, \qquad (X, Y) = E(X\overline{Y})$$

respectively, and this notation will be used when the Hilbert-space aspects of the process, rather than more probabilistic ones, are in the foreground. The covariance function of $\{X_t\}$ can be written in two ways:

$$K(s,t) = E(X_s \overline{X_t}) = (X_s, X_t).$$

We will be looking mainly at those aspects of $\{X_t\}$ which make sense in Hilbert-space terms and we always assume that $T$ is a subset of $R^1$ (although certain things can be generalized to other cases).

1. Differential calculus.

The word "limit" will now always mean convergence in the $L^2$ norm; for example, if $T = R^1$ and $Y \in L^2$, $\lim_{t \to t_0} X_t = Y$ means that

$$\lim_{t \to t_0} E(|X_t - Y|^2) = 0.$$

It is worth noting in passing, however, that rapid convergence in the norm does imply almost sure convergence:

Proposition. Suppose $T = Z^+$ and $X_n \to Y$ in such a way that $\sum_{n=1}^{\infty} E(|X_n - Y|^2) < \infty$. Then $X_n \to Y$ almost surely.

Proof.

The functions




D.

However, our proof of (1) used the definiteness property only in the case when

Thus the apparently weaker condition

for all real $\lambda$ is actually sufficient for (1), while it is easy to see that (1) in turn implies the positivity inequality in its original form (with arbitrary $z_1, \ldots, z_N$).

The Bochner-Khintchine theorem, dealing with the continuous case, is often interpreted in probability theory as the condition for a function $\varphi(t)$ to be the characteristic function of some probability distribution. We will not prove it here; see, for instance, [F2], chapter 19, section 2.

We now derive the spectral form of the stationary process itself, which is based on the "stochastic integral" discussed in Chapter 2. We will use the notation of the continuous case $T = R^1$; changes for the discrete case are trivial and obvious. The idea is to set up an isometry between the Hilbert space $M$ (a subspace of $L^2(\Omega, \mathscr{F}, P)$) generated by the random variables $\{X_t\}$ and the space $L^2(R^1, dF)$, where $dF$ is the spectral measure.

We begin with the set $M_1$

of all finite linear combinations of the $X_t$'s and with the set of trigonometric polynomials, and we define a correspondence between them by letting

$$\psi\Big(\sum_{j=1}^{n} c_j X_{t_j}\Big) = \sum_{j=1}^{n} c_j e^{i t_j \lambda}$$

for any $t_j \in T$, $c_j \in \mathbb{C}$. It is easily verified using (2) that $\psi$ is an isometry:

$$\Big\|\sum_{j=1}^{n} c_j X_{t_j}\Big\|^2 = \int \Big|\sum_{j=1}^{n} c_j e^{i t_j \lambda}\Big|^2 \, dF(\lambda).$$

Then in the usual way $\psi$ extends to an isometry between the closure of the set of trigonometric polynomials, which is $L^2(R^1, dF)$,*) and the closure $M$ of $M_1$. We have $\psi X_t = e^{it\lambda}$, and the unitary operator $U_t$ on $M$ characterized by $U_t X_s = X_{s+t}$ goes over into multiplication by the function $e^{it\lambda}$.

*) This point may not seem quite so obvious as before, and the reader is invited to think it over a little.
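The isometry between finite linear combinations of the $X_t$ and trigonometric polynomials can be checked numerically in a simple special case. The Python sketch below assumes a purely discrete spectral measure $dF$ with finitely many atoms and assumes that the covariance has the spectral form $K(t) = \int e^{it\lambda}\,dF(\lambda)$ (taken here to be the content of the formula cited as (2), which is an assumption). The function names and the sample numbers are hypothetical.

```python
import cmath


def covariance(t, atoms):
    """K(t) = sum_k w_k * exp(i t lambda_k) for a discrete spectral measure."""
    return sum(w * cmath.exp(1j * t * lam) for lam, w in atoms)


def norm_sq_in_M(coeffs, times, atoms):
    """||sum_j c_j X_{t_j}||^2 = sum_{j,l} c_j conj(c_l) K(t_j - t_l)."""
    total = 0
    for cj, tj in zip(coeffs, times):
        for cl, tl in zip(coeffs, times):
            total += cj * cl.conjugate() * covariance(tj - tl, atoms)
    return total.real


def norm_sq_in_L2_dF(coeffs, times, atoms):
    """Integral of |sum_j c_j exp(i t_j lambda)|^2 with respect to dF."""
    total = 0.0
    for lam, w in atoms:
        poly = sum(c * cmath.exp(1j * t * lam) for c, t in zip(coeffs, times))
        total += w * abs(poly) ** 2
    return total


# Hypothetical spectral measure: (frequency, mass) pairs.
atoms = [(-1.0, 0.5), (0.3, 1.5), (2.0, 0.25)]
coeffs = [1 + 2j, -0.5j, 3.0 + 0j]
times = [0.0, 1.5, -2.0]

lhs = norm_sq_in_M(coeffs, times, atoms)
rhs = norm_sq_in_L2_dF(coeffs, times, atoms)
```

Under these assumptions the two quantities agree up to rounding error, which is the statement that $\psi$ preserves norms on $M_1$.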


The function

1

is in

must correspond to something in M

and so it

L2 (R ,dF)

let's say to

is easy to see that the random variables

{Za}

with orthogonal increments; in fact, if

a

b


O.

0,

(~n,Xn-k)

But

In

X ) "n' n-k ..

( c

-n

so that

einAW(A)e-i(n-k)Af(A)dA

is orthogonal to

e

-ikA for all

k >

o.

Therefore we may say that $f_2 \in L^2_{\geq 0}$, which means the span in $L^2$ (Lebesgue) of $\{e^{in\lambda} : n \geq 0\}$; in other words, $f_2$ is of "power series type." Finally, we must have

~n

E Mn;

~o

E MO

is enough.

But

It is clear that this will belong to -1

series of

quencies

*)

f

1

Mo

if the Fourier

contains only terms with non-positive fre-

, since then

can be expressed as a limit of -1

*)This statement requires some caution, since square-integrable.

fl

We will just assume that

f- l ELI); in particular, this is true if

f

-1

fl

may not be E L2

(Le.,

is continuous


linear combinations of X 's with n < O. Consequently, the n series for f- l will involve only non-negative frequencies. 2

To sum up, we have proved: Theorem. torization f2

which

f2

~

f

f -1

and

Suppose the spectral density

belong to

of pOlver-series

L>O

has a fac---

(i.e., have Fourier series Then the random variables

~).

(3)

the orthonormal set appearing in the Wold decomposition.

~

Corollary.

~

Suppose that

rzn

fl(~)

-ik~

f 1 E L)

A) then

[-w,w)

$\{X_t\}$ must be stationary. (Compare this with problem 4 of Chapter 1.)

(b) If $\{X_t\}$ and $\{Y_t\}$ are stationary processes in $R^n$ which are independent, then $\{X_t + Y_t\}$ is stationary. (Proof?) In particular, the random oscillations of example (a) can be added if they are independent to form a larger class of stationary processes.

(c) An independent sequence $\{\xi_n\}$ is a stationary process (with $T = Z$) iff the random variables are identically distributed.

(d) Let $g: R^{N+1} \to R^1$ be a Borel function and define

$$X_n = g(\xi_n, \xi_{n+1}, \ldots, \xi_{n+N}),$$

where $\xi_n$ are independent and identically distributed (iid). Then $\{X_n\}$ is stationary. If $g$ is a linear function $\{X_n\}$ will be a moving average of the type we have considered before, but $g$ need not necessarily be linear. (It is possible also for $g$ to involve infinitely many variables.)

(e) Suppose that $P = [p_{ij}]$ is a stochastic matrix; that is, an $N \times N$ matrix of non-negative numbers with row-sums $= 1$. Also, suppose that $\pi$ is a row-vector of probabilities such that $\pi P = \pi$. Then there exists a stationary Markov chain with $P$ as its transition matrix and $\pi$ as its stationary distribution.*) To construct such a process, let $S$ be any set of cardinality $N$ (which may be countably infinite) and think of the indices $i, j$ as ranging over $S$. Now set $\Omega = S^Z$; $S$ will be the state space of the process under construction. Sets such as

$$C = \{\omega \in \Omega : \omega_a = s_0, \omega_{a+1} = s_1, \ldots, \omega_{a+b} = s_b\},$$

where $s_t \in S$ and $a, b \in Z$, are called cylinder sets; let $\mathscr{F}$ be the σ-field which they generate. Finally, we define a measure $P$ on cylinder sets by putting

$$P(C) = \pi_{s_0}\, p_{s_0 s_1}\, p_{s_1 s_2} \cdots p_{s_{b-1} s_b}.$$

This prescription determines a family of finite-dimensional distributions. It is not hard to see that they are consistent, and so by Kolmogorov's theorem $P$ extends to a measure on $\mathscr{F}$. Since $P(C)$ is independent of $a$, the joint distributions of the random variables $X_n(\omega) = \omega_n$ are invariant under time-translation; hence $\{X_n\}$ is indeed a stationary process.

{X n }

is a Markov process as

It's also true that

n

defined in Chapter 1, but we will not prove it now since this matter is discussed in more generality in Chapter 8. A method of obtaining many more examples will be described in the next section. In example (c) (an iid sequence of random variables {~n})

the famous strong law of large numbers (slln) holds pro-

vided

E(I~

n

I)
O} n

E fI, n

o.

~

(2)

is a stationary process.

If $\varphi$ is invertible, (2) can be extended to negative values of $n$ as well. As we will see later, it is "almost true" that every stationary sequence is of this form.

Clearly $X_n \in L^1$ (or $L^2$) for all $n$ iff $X \in L^1$ ($L^2$). The mp mapping $\varphi$ induces a norm-preserving operator on $L^2$ which is defined by

$$(UX)(\omega) = X(\varphi\omega)$$

when $X \in L^2$; if $\varphi$ is invertible, then $U$ is unitary. (Of course, this $U$ agrees with the unitary operator associated with the stationary process $\{X_n\}$.) Although every invertible mp mapping induces a unitary operator on $L^2$ in this way, most unitary operators are not of this form. (Consider the case when $\Omega$ consists of 2 points, so that $L^2$ is 2-dimensional!) This illustrates the difference between wide-sense and strict stationarity.

Here are some examples of mp mappings. Translation of $R^1$ or $R^n$ is measure-preserving; this fact is a basic property of Lebesgue measure. Of course these are not probability spaces. But one can also translate modulo 1 (that is, on the circumference of a circle), and it is easy to check that the mapping $\varphi_a: [0,1) \to [0,1)$ defined by

$$\varphi_a(x) = x + a \pmod 1$$

is indeed measure preserving and invertible for every real $a$. This idea extends to compact topological groups in general, on which one can construct probability measures (the Haar measures) invariant under translation by the group elements.

2. Measure-preserving transformations

The transformation $\varphi x = 2x \pmod 1$ on $[0,1)$ is measure-preserving but not invertible; it is two-to-one everywhere, in fact.

Looking back to the Markov chain example (e) above, we can define $\varphi: \Omega \to \Omega$ by

$$(\varphi\omega)_n = \omega_{n+1}, \quad n \in \mathbb{Z};$$

in other words, $\omega$ -- which is a doubly-infinite sequence of elements of $S$ -- is simply translated one unit to the left. This mapping is called a shift and it is clearly measure-preserving. Moreover, the Markov chain random variables were defined by (2) above using the shift: we have $X_0(\omega) = \omega_0$ and

$$X_n(\omega) = \omega_n = X_0(\varphi^n\omega).$$

The shift is invertible provided the process is defined for all $n \in \mathbb{Z}$, but even if $T = \mathbb{Z}^+$ the shift makes sense; it is then measure-preserving but no longer one-to-one. Consider, for example, a sequence of independent coin-tossing random variables. When $T = \mathbb{Z}^+$, the shift for this process can be identified with the mapping $x \to 2x \pmod 1$ on $[0,1)$. (How?)
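One way to see the identification asked about above is to read a coin-tossing sequence as the binary digits of a point of $[0,1)$. The sketch below is only an illustration with exact rational arithmetic; the helper names `doubling`, `from_bits` and `preimage_length` are hypothetical and are not notation from the text.

```python
from fractions import Fraction

def doubling(x):
    """The map x -> 2x (mod 1) on [0, 1)."""
    return (2 * x) % 1

def from_bits(bits):
    """Point of [0, 1) whose binary expansion starts with the given 0/1 digits."""
    return sum(Fraction(b, 2 ** (n + 1)) for n, b in enumerate(bits))

def preimage_length(a, b):
    """Lebesgue measure of the preimage of [a, b) under the doubling map.

    The preimage consists of the two intervals [a/2, b/2) and
    [(a+1)/2, (b+1)/2), which is why the map is two-to-one.
    """
    return (b / 2 - a / 2) + ((b + 1) / 2 - (a + 1) / 2)

# A finite run of coin tosses, read as binary digits of a point x.
bits = [1, 0, 1, 1, 0, 0, 1]
x = from_bits(bits)

# Dropping the first toss (the one-sided shift) is the same as doubling mod 1.
shifted = from_bits(bits[1:])
assert doubling(x) == shifted

# The preimage of an interval has the same length as the interval itself.
assert preimage_length(Fraction(1, 3), Fraction(3, 4)) == Fraction(3, 4) - Fraction(1, 3)
```

For fair, independent tosses the point obtained this way is uniformly distributed on $[0,1)$, so the shift on sequences and the doubling map on $[0,1)$ describe the same measure-preserving system.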

5. STRICTLY-STATIONARY PROCESSES

As a final example, we mention the context in which the study of mp transformations began. Suppose that a classical mechanical system is confined to a bounded region $V$ of its phase space (position and momentum space). Its time-evolution can be described by Hamilton's system of first-order differential equations, whose solutions provide a family of mappings $\varphi_t: V \to V$ ($t \in R^1$). The mapping $\varphi_t$ has the following interpretation: If the system is in a state $x_0 \in V$ at time $t_0$, then its state at time $t_0 + t$ will be $\varphi_t x_0$. These mappings satisfy the group condition

$$\varphi_{t+s} = \varphi_t \varphi_s \quad \text{for all } t,s \in R^1;$$

moreover, they preserve the Lebesgue measure of subsets of $V$. (The latter fact is known as Liouville's theorem, and it partially explains why position-momentum coordinates are more useful in mechanics than position-velocity ones, for which the mappings corresponding to $\varphi_t$ are no longer measure-preserving.)

A mechanical system can thus be considered as a stationary stochastic process. Of course, its "randomness" lies only in the choice of an initial point in the phase space $V$; after that is done the path of the process is determined. But the same thing is true for any process written in the form (2), including, for example, a sequence of independent random variables! There is less here than meets the eye at first glance.

We will now prove the "recurrence theorem" of H. Poincare, which (in 1912) was the first general result about measure-preserving transformations.

In the context of mechanics as described above, this theorem implies that a system returns again and again arbitrarily close to its initial state. This apparently rules out irreversible changes of state such as the assertion in thermodynamics that the entropy of a system can only increase, and much has been written in attempts to resolve this seeming contradiction.

Theorem. Let $\varphi$ be a mp transformation on a probability space $(\Omega, \mathscr{F}, P)$. Then for any $A \in \mathscr{F}$, $\varphi^n\omega \in A$ holds for infinitely many values of $n \ge 1$, for almost every $\omega \in A$.

Proof. Let

$$B = \{\omega \in A: \varphi^k\omega \notin A \text{ for all } k \ge 1\}.$$

Clearly $B \cap \varphi^{-n}B = \emptyset$ for all $n \ge 1$. Then

$$\varphi^{-m}B \cap \varphi^{-(m+n)}B = \varphi^{-m}(B \cap \varphi^{-n}B) = \emptyset$$

also, so that the sets $\{\varphi^{-n}B\}$ are disjoint for $n \ge 0$. These sets all have the same measure because $\varphi$ is mp. But then since $P(\Omega) = 1$, we must have $P(B) = 0$. Thus for almost every $\omega \in A$, $\varphi^n\omega \in A$ for at least one $n \ge 1$.

To see that $\varphi^n\omega \in A$ infinitely often, we shall apply the result already proved to the mp transformations $\varphi^k$. When $\omega \in A - N$, where $N$ is a set of measure 0 equal to the union of the exceptional sets for each $k \ge 1$, we see that for every $k$ there exists $n_k \ge 1$ such that $(\varphi^k)^{n_k}\omega \in A$. Thus $\varphi^n\omega \in A$ for arbitrarily large $n$, as asserted.

Corollary 1. Let $f \ge 0$. Then $\sum_{n=0}^{\infty} f(\varphi^n\omega) = \infty$ almost everywhere on $\{\omega: f(\omega) > 0\}$.

Proof. Apply the recurrence theorem to the set $A = \{\omega: f(\omega) > 1/n\}$, and then let $n \to \infty$.

Corollary 2. With probability 1, a stationary Markov chain returns infinitely often to the state it occupies at time 0.

Proof. Apply the theorem to the sets $\{\omega: X_0(\omega) = i\}$ for each $i \in S$.

3. The ergodic theorem.

Ergodic theory has been called "a theorem looking for a theory." This is doubtless no longer fair, but for some time the subject consisted mostly of applications and refinements of George Birkhoff's celebrated ergodic theorem, first proved by him in 1931. The proof given here is much simpler than the original one. Many authors have contributed improvements to the proof; here we are using a lemma published by A. Garsia in 1965. Our version is adapted from [T].

Ergodic

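Theorem. Let $\varphi$ be a mp transformation on a probability space $(\Omega, \mathscr{F}, P)$, and let $f \in L^1$. Then

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} f(\varphi^k\omega) = \bar{f}(\omega) \qquad (1)$$

exists almost everywhere. Moreover, $\bar{f} \in L^1$ and $\int_A \bar{f} = \int_A f$ for every $A \in \mathscr{F}_I = \{C \in \mathscr{F}: \varphi^{-1}C = C\}$.

Lemma 1. Define $S_0 = 0$ and $S_n = \sum_{k=0}^{n-1} f(\varphi^k\omega)$, and let $B = \{\omega: S_n(\omega) > 0 \text{ for some } n\}$. Then

$$\int_{A \cap B} f\,dP \ge 0$$

for every $A \in \mathscr{F}_I$.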

Proof.

Let

S" (w) n

max Sk (w). O -

Inn

S* n

since

S* (~w) n

~

f

f

IA g (~w)dP

But because

f dP > 0

Inn

that

w(/;B. n

S (~w)dP = O. * (w)dP f A Sn A n*

A E STr ' fA g dP =

preserving.)

if

0

0, we have

(Note that for any integrable function set

Hence

00.)

Bn

f.~n

~

g

and any invariant

~

since

B as

f dP > 0

n

is measure-

+ ~,

it follows from

also.

I\J ID

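Problem 1. Give a complete proof of the assertion in parentheses just above which is used in the proof of the lemma.

Lemma 2. Let

$$B_\alpha = \Big\{\omega: \sup_{n \ge 1} \frac{1}{n} \sum_{k=0}^{n-1} f(\varphi^k\omega) > \alpha\Big\}.$$

Then

$$\int_{B_\alpha \cap A} f\,dP \ge \alpha P(B_\alpha \cap A)$$

for $A \in \mathscr{F}_I$.

Proof. Apply Lemma 1 to the function $g(\omega) = f(\omega) - \alpha$.

Proof of the Ergodic Theorem. Let us put

$$\bar{f}(\omega) = \limsup_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} f(\varphi^k\omega),$$

and let $\underline{f}(\omega)$ be the lim inf. For any real numbers $\alpha, \beta$ let

$$E_{\alpha,\beta} = \{\omega: \underline{f}(\omega) < \beta,\ \bar{f}(\omega) > \alpha\},$$

and note both that $E_{\alpha,\beta} \in \mathscr{F}_I$ and that $E_{\alpha,\beta} \subset B_\alpha$. Then by Lemma 2,

$$\int_{E_{\alpha,\beta}} f\,dP \ge \alpha P(E_{\alpha,\beta}). \qquad (2)$$

Replacing $f, \alpha, \beta$ by $-f, -\beta, -\alpha$ respectively and applying Lemma 2 again, we have the same set $E_{\alpha,\beta}$ as before and so (2) becomes

$$\int_{E_{\alpha,\beta}} f\,dP \le \beta P(E_{\alpha,\beta}). \qquad (3)$$

Comparing (2) and (3), we see that $P(E_{\alpha,\beta}) = 0$ if $\beta < \alpha$.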

But since

{w:

few)

this implies that

-f

the limit

A

P(!

few)}

U Es r r 0, be two random n-

sequences, not necessarily defined on the same probability space, which have the same finite-dimensional joint distributions. Suppose that $P(\lim_{n \to \infty} X_n = X) = 1$. Prove that $P(\lim_{n \to \infty} X'_n = X') = 1$, where $X'$ is a random variable defined on the same space as $\{X'_n\}$ and having the same distribution as $X$.

Clearly the same conclusion holds for Cesaro convergence, and so Theorem 1 -- the strong law of large numbers for all stationary sequences -- follows from the ergodic theorem.

6. Continuous time-parameter.

In this final section, we shall show how Theorem 2 (section 1) can be deduced from what we have already proved. First, we note that if $m = E(|X_t|)$ (we assumed that it is finite) then

$$E\Big(\int_0^T |X_t|\,dt\Big) = \int_0^T E(|X_t|)\,dt = mT$$

by Fubini's theorem, since $\{X_t\}$ is measurable. It follows that for every $T$, $\int_0^T |X_t|\,dt$ exists almost surely, as does $\int_0^T X_t\,dt$.

Now let us define two random sequences

$$Y_n = \int_n^{n+1} X_t\,dt, \qquad Z_n = \int_n^{n+1} |X_t|\,dt.$$

It is plausible that both $\{Y_n\}$ and $\{Z_n\}$ are stationary, and clearly both are integrable since

$$E(|Y_n|) \le E(Z_n) = \int_n^{n+1} E(|X_t|)\,dt = m < \infty.$$


The question of stationarity, however, is not as obvious as it at first appears, and we will return to this point below. Accepting the stationarity for the moment, by the strong law already proved for sequences we have

$$\frac{1}{n} \sum_{k=0}^{n-1} Y_k \to \bar{X} \quad \text{a.s.}, \qquad (1)$$

where $\bar{X} = E(Y_0 \mid \mathscr{F}_I)$. To prove Theorem 2, we must extend (1) from integral to arbitrary values of $t$. It is just for this small point that the process $\{Z_n\}$ was introduced. In fact, the ergodic theorem applies to $\{Z_n\}$ too and yields

$$\frac{1}{n} \sum_{k=0}^{n-1} Z_k \to \bar{Z} \quad \text{a.s.} \qquad (2)$$

If a sequence $a_k$ converges in the Cesaro sense to a finite limit it follows that $a_n/n \to 0$, and so (2) implies that

$$\frac{1}{n} \int_n^{n+1} |X_t|\,dt \to 0 \quad \text{a.s.} \qquad (3)$$

But we can write ($[t]$ means the greatest integer $\le t$)

$$\frac{1}{t} \int_0^t X_s\,ds = \frac{[t]}{t} \cdot \frac{1}{[t]} \int_0^{[t]} X_s\,ds + \frac{1}{t} \int_{[t]}^t X_s\,ds,$$

and then applying (1) and (3) we see that the first term on the right side converges a.s. to $\bar{X}$, while the second term tends to 0 a.s. Thus

$$\frac{1}{t} \int_0^t X_s\,ds \to \bar{X} \quad \text{a.s.}$$

where $\bar{X} \in L^1$ and $E(\bar{X}) = E(Y_0) = E(X_0)$ (the second step involves Fubini's theorem again), and the slln is proved.

It remains to show that $\{Y_n\}$ is stationary. (The result for $\{Z_n\}$ will be included, since $\{|X_t|\}$ is stationary provided $\{X_t\}$ is.) This would be quite easy if the integrals defining the $Y_n$'s were of the Riemann type, for then each $Y_n$ could be approximated by a sum involving finitely many of the $X_t$'s. But what in general, when we assume only measurability? It is not at a glance obvious that the distribution of, for example, $Y_0 = \int_0^1 X_s\,ds$ is even determined by the finite-dimensional distributions of $\{X_t\}$. Nevertheless, it is true:

Lemma.

Let

dom process such

rUt: a that

~

t

~

be a measurable real ran-

b}

f~ E(lutl)dt

distribution of the random variable


1. Markov chains: discrete time and states

Theorem. Let $P$ be a finite stochastic matrix having $p_{ij}^{(m)} > 0$ for all $i,j$, for some $m$. Then

$$\lim_{n \to \infty} P^n = \Pi \qquad (1)$$

exists, where $\Pi$ is a stochastic matrix with identical rows.

Proof. Assume first that $m = 1$, so that $p_{ij} \ge \varepsilon > 0$ for each $i,j$. Let $m_j(n) = \min_i p_{ij}^{(n)}$ denote the smallest element in the $j$'th column of $P^n$, and similarly let $M_j(n)$ be the largest element. Then

$$p_{ij}^{(n)} = \sum_k p_{ik}\, p_{kj}^{(n-1)} \ge \sum_k p_{ik}\, m_j(n-1) = m_j(n-1). \qquad (2)$$

Since (2) holds for each $i$ it holds for the minimum, and so

$$m_j(n) \ge m_j(n-1); \qquad (3)$$

i.e., the column-minima increase with $n$. In just the same way we can see that the maxima decrease; both $m_j(n)$ and $M_j(n)$, therefore, have limits as $n \to \infty$. The theorem will be proved if we can show that these limits are the same.

To see this we estimate a little more carefully and we use (for the first time) the assumption that $p_{ij} \ge \varepsilon$. Suppose that the minimum $m_j(n)$ and maximum $M_j(n-1)$ are attained when $i = i_0$ and $i = i_1$, respectively. Then

$$m_j(n) = \sum_k p_{i_0 k}\, p_{kj}^{(n-1)} \ge p_{i_0 i_1} M_j(n-1) + (1 - p_{i_0 i_1})\, m_j(n-1),$$

so that

$$m_j(n) \ge \varepsilon M_j(n-1) + (1-\varepsilon)\, m_j(n-1). \qquad (4)$$

In just the same way (do it!) it can be shown that

$$M_j(n) \le \varepsilon\, m_j(n-1) + (1-\varepsilon) M_j(n-1). \qquad (5)$$

Subtracting (4) from (5) then yields

$$M_j(n) - m_j(n) \le (1-2\varepsilon)[M_j(n-1) - m_j(n-1)],$$

so that $M_j(n) - m_j(n) \to 0$ as $n \to \infty$. This proves (1) when $m = 1$.

Now suppose that $m > 1$. We know in any case that

$$\lim_{n \to \infty} P^{nm} = \Pi \qquad (7)$$

since the stochastic matrix $P^m$ falls into the case already treated. For any $k = 1, 2, \ldots, m-1$ we then have

$$\lim_{n \to \infty} P^{nm+k} = P^k \lim_{n \to \infty} P^{nm} = P^k \Pi. \qquad (8)$$

Since $P^k$ has row-sums equal to 1 and $\Pi$ has constant columns, $P^k\Pi$ is simply $\Pi$; (7) and (8) therefore imply that (1) holds in this case also and the proof is complete.
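The contraction step in the proof above can be watched numerically. The following is a minimal sketch, not part of the original argument: the matrix `P` is a hypothetical example with all entries at least $\varepsilon = 1/5$, and `column_spread` is an illustrative helper returning $\max_j [M_j(n) - m_j(n)]$. Exact fractions are used so that the inequalities hold without rounding.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def column_spread(A):
    """Largest gap between the maximum and minimum entry of any column."""
    return max(max(col) - min(col) for col in zip(*A))

# Hypothetical strictly positive stochastic matrix (each row sums to 1).
P = [[F(1, 2), F(1, 4), F(1, 4)],
     [F(1, 5), F(3, 5), F(1, 5)],
     [F(1, 3), F(1, 3), F(1, 3)]]
eps = min(min(row) for row in P)  # smallest entry, here 1/5

# spreads[k] is the column spread of P^(k+1).
spreads = []
Pn = P
for n in range(1, 11):
    spreads.append(column_spread(Pn))
    Pn = matmul(Pn, P)

# Each step shrinks the spread by at least the factor (1 - 2*eps).
for k in range(len(spreads) - 1):
    assert spreads[k + 1] <= (1 - 2 * eps) * spreads[k]

print([float(s) for s in spreads])
```

The printed spreads decrease geometrically, so the rows of $P^n$ become indistinguishable, which is exactly the statement that the limit has identical rows.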

Corollary. The (identical) rows $\pi = \{\pi_i\}$ of the limiting matrix $\Pi$ satisfy

$$\sum_i \pi_i p_{ij} = \pi_j, \quad \pi_i > 0, \quad \text{and} \quad \sum_i \pi_i = 1.$$

There is no vector other than $\pi$ which satisfies these conditions.

Proof. Everything is clear except (possibly) the uniqueness. Suppose $v$ is another vector which satisfies the conditions. Then

$$v = vP = vP^n \quad \text{and} \quad v = \lim_{n \to \infty} vP^n = v\Pi.$$

But since $\sum_i v_i = 1$ and $\Pi$ has constant columns, $v\Pi = \pi$, which proves uniqueness.

The hypothesis of the theorem that, for some $m$, $p_{ij}^{(m)} > 0$ for all $i,j$ is easily seen to be necessary as well as sufficient for (1) when it is assumed that $\Pi$ is positive in addition to having identical rows. Still, this hypothesis is probabilistically quite unnatural and it is useful to rephrase it.

Definition. If for every pair of states ($s_i$'s) $i$ and $j$ we have $p_{ij}^{(n)} > 0$ for some $n$ (which may depend on $i$ and $j$) we say the matrix $P$ is irreducible or that all states communicate.

Definition. The integer

$$d_i = \text{g.c.d.}\{n \ge 1: p_{ii}^{(n)} > 0\}$$

is called the period of state $s_i$. If $d_i = 1$ the state is said to be aperiodic.
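Both definitions depend only on which entries of $P$ are positive, so they can be checked mechanically for a small matrix. The sketch below is illustrative only; the function names and the search horizon `max_n` are assumptions, chosen so that the horizon is ample for irreducible chains (closed walks of length at most $3N$ already determine the period).

```python
from math import gcd

def reachable_steps(P, i, j, max_n):
    """All n in 1..max_n with p_ij^(n) > 0, using only the zero pattern of P."""
    N = len(P)
    current = {i}  # states reachable from i in exactly n steps
    steps = []
    for n in range(1, max_n + 1):
        current = {k for s in current for k in range(N) if P[s][k] > 0}
        if j in current:
            steps.append(n)
    return steps

def is_irreducible(P):
    """True if every state reaches every state (N steps always suffice)."""
    N = len(P)
    return all(reachable_steps(P, i, j, N) for i in range(N) for j in range(N))

def period(P, i, max_n=None):
    """g.c.d. of the return times to state i found up to the horizon max_n."""
    N = len(P)
    horizon = max_n if max_n is not None else max(3, 2 * N * N)
    d = 0
    for n in reachable_steps(P, i, i, horizon):
        d = gcd(d, n)
    return d  # 0 means no return was found

flip = [[0, 1], [1, 0]]            # deterministic alternation
lazy = [[0.5, 0.5], [1.0, 0.0]]    # same chain with a holding probability
print(is_irreducible(flip), period(flip, 0))
print(is_irreducible(lazy), period(lazy, 0), period(lazy, 1))
```

The two hypothetical matrices differ only by a holding probability in state 0, which is enough to destroy the periodicity of both states.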

Proposition. A stochastic matrix $P$ satisfies the positivity condition of the theorem if and only if $P$ is irreducible and has at least one aperiodic state.

Proof. Clearly the positivity condition implies that all large powers of $P$ are strictly positive, and so all states communicate (in $m$ steps, in fact) and are aperiodic. Conversely, suppose that $s_i$ is aperiodic for some $i$. Since we have

$$p_{ii}^{(n+m)} = \sum_k p_{ik}^{(n)} p_{ki}^{(m)} \ge p_{ii}^{(n)} p_{ii}^{(m)},$$

the set $D = \{n: p_{ii}^{(n)} > 0\}$ is closed under addition. This, with the fact that its gcd $= 1$, implies that $D$ contains all large positive integers, say all $n \ge M$. Now consider any other state $s_j$ and choose $k$ and $\ell$ so that $p_{ij}^{(k)} > 0$ and $p_{ji}^{(\ell)} > 0$. Then for all $n \ge M$ we have

$$p_{jj}^{(n+k+\ell)} \ge p_{ji}^{(\ell)}\, p_{ii}^{(n)}\, p_{ij}^{(k)} > 0,$$

which means that $p_{jj}^{(m)} > 0$ for all large $m$ so that $s_j$ is also aperiodic. Similarly we have

$$p_{ij}^{(n+k)} \ge p_{ii}^{(n)}\, p_{ij}^{(k)} > 0$$

when $n \ge M$, and so the off-diagonal entries of $P^n$ are likewise positive when $n$ is large enough. Since $P$ is a finite matrix, it is clear that for big values of $n$, $P^n$ is strictly positive.

6. MARKOV TRANSITION FUNCTIONS

If the hypothesis of irreducibility is kept but some state has period $d > 1$, the theorem remains true when Cesaro convergence is substituted for the limit; the statements about the stationary vector $\pi$ still hold. Finally, when $P$ is reducible, the states of the system can be divided into equivalence classes of mutually communicating states within which $P$ acts as described above. There may also be some transient states which belong to none of the equivalence classes. Their probabilistic role is temporary; after a finite time (with probability one) the system must be found in one of the irreducible classes. For proofs and further developments refer to [F$_1$] or the other references.

Finally we mention that if the states form a denumerably-infinite set much of this theory continues to hold, though different methods of proof are required to attain adequate generality. (The hypothesis that $P^m$ be strictly positive for some $m$ is now much too restrictive, and is no longer even sufficient for the theorem to hold.) In addition some quite different, and very interesting, phenomena arise including a far-reaching connection with potential theory. In addition to [F$_1$], chapter 1 of [DY] is highly recommended for an introduction to some of these problems.

2. Continuous-time Markov chains.

The intuitive idea will be the same as in section 1 except that "transitions" of the system from one state to another can take place at any time $t \ge 0$, not only at $t = 0, 1, 2, \ldots$. Thus for any $t \ge 0$ we should have a stochastic matrix $P(t)$ which expresses the probabilities of passage from one state to another after a time-lapse of duration $t$. The same argument which previously led us to conclude that the matrix powers $P^n$ represented the probabilities of transitions in $n$ steps now suggests that the matrices $P(t)$ should satisfy

$$P(t+s) = P(t)P(s); \quad s,t > 0. \qquad (1)$$

Thus the matrices $\{P(t): t > 0\}$ form a semigroup; we will see that this idea is central in the general theory of continuous-time Markov processes.

In contrast to the discrete case, where we could write down stochastic matrices at will, it is not so obvious how to find the solutions of equation (1). If $P(t)$ were a scalar rather than a matrix, (1) would be the functional equation of the exponential function (but a regularity condition would still be necessary to exclude non-measurable solutions). Something of the same sort holds for matrices as well, as we shall now show.

It is reasonable to imagine that no transition can take place in zero time, so we put $P(0) = I$. (Then (1) holds for $s,t \ge 0$.) As stated, $P(t)$ is an $N \times N$ stochastic matrix for each $t$. Finally, we assume that $P(\cdot)$ is continuous at $t = 0$:

$$\lim_{t \to 0+} P(t) = I.$$

A matrix function $P(\cdot)$ satisfying these conditions and equation (1) will be called a Markov transition function.

Definition.

Let $A$ be any $N \times N$ matrix ($N < \infty$). Then

$$\exp(A) = \sum_{n=0}^{\infty} \frac{A^n}{n!}.$$

Theorem 1. Let $P(t)$, $t \ge 0$, be a Markov transition function. Then

$$P(t) = \exp(tG), \quad t \ge 0, \qquad (2)$$

where $G$ is an $N \times N$ matrix satisfying

$$g_{ij} \ge 0 \text{ for } i \ne j \quad \text{and} \quad \sum_j g_{ij} = 0 \text{ for all } i. \qquad (3)$$

The correspondence between Markov transition functions and the class of matrices $G$ satisfying (3) is one-to-one.

Proof. Equation (1) and the assumption that $P(t)$ is continuous from the right at $t = 0$ imply that $P(t)$ is continuous for all $t > 0$. In fact, we have for $h > 0$

$$P(t+h) = P(t)P(h) \quad \text{and} \quad P(t-h) = P(t)P(h)^{-1}$$

($P(h)^{-1}$ exists for all small $h$ since $P(h) \to I$ as $h \to 0+$), and letting $h \to 0$ we see that

$$\lim_{h \to 0} P(t \pm h) = P(t)$$

for all $t > 0$. (Actually $P(t)$ is uniformly continuous on $[0,\infty)$; we will use this fact later on.)

Now assume that $P(t)$ is also differentiable (from the right) at $t = 0$; i.e., that

$$\lim_{t \to 0+} \frac{p_{ij}(t) - \delta_{ij}}{t} = g_{ij} \qquad (4)$$

exists for all $i,j$. Clearly the $g_{ij}$ satisfy (3). Then

$$P'(t) = \lim_{h \to 0+} \frac{P(t+h) - P(t)}{h} = P(t) \lim_{h \to 0+} \frac{P(h) - I}{h} = P(t)G; \quad \text{also} \quad P'(t) = GP(t). \qquad (5)$$

(Here $G = [g_{ij}]$. Left differentiability when $t > 0$ is just as easily verified.) Now both of the differential equations in (5) have unique solutions under the initial condition $P(0) = I$. But we know that the function $\exp(tG)$ satisfies both differential equations as well as the initial condition $P(0) = I$, and so (2) must hold: every differentiable transition function $P(t)$ is an exponential, with a generator (as the matrix $G$ is called) satisfying (3).

Conversely, $P(t) = \exp(tG)$ is a Markov transition function for every generator $G$. All the conditions are quite obvious except the fact that $P(t)$ must be a stochastic matrix for each $t > 0$. Since $G$ has row-sums equal to 0,

$$P(t)\mathbf{1} = \sum_{n=0}^{\infty} \frac{(tG)^n}{n!}\mathbf{1} = \mathbf{1},$$

which shows that $P(t)$ has row-sums equal to unity. (Here "$\mathbf{1}$" means a column-vector with all entries $= 1$.) Suppose that $g_{ij} > 0$ for each $i \ne j$. Then since

$$p_{ij}(t) = \delta_{ij} + t g_{ij} + O(t^2),$$

it is clear that $p_{ij}(t) > 0$ for all small $t$. But because $P(t) = P(t/k)^k$ it follows that $P(t) > 0$ for all $t$ if it is true for small $t$. Finally, if $g_{ij} = 0$ in some cases, we consider a perturbed generator $G^{(\varepsilon)}$, $\varepsilon > 0$, whose off-diagonal entries are strictly positive and which tends to $G$ as $\varepsilon \to 0+$. By the previous argument $P^{(\varepsilon)}(t) = \exp(tG^{(\varepsilon)}) > 0$. But then clearly

$$P(t) = \lim_{\varepsilon \to 0+} P^{(\varepsilon)}(t) \ge 0,$$

and so we conclude that $P(t) = \exp(tG)$ is a Markov transition function.

To complete the proof of Theorem 1, we must prove that $P'(0)$ actually exists. From (1) we have

$$[P(h) - I][I + P(h) + P(2h) + \cdots + P(nh-h)] = P(nh) - I \qquad (6)$$

for any $h > 0$ and any $n$. Now if $h \to 0$ while $nh \to t > 0$ we have

$$h \sum_{k=0}^{n-1} P(kh) \to \int_0^t P(u)\,du,$$

since $P(\cdot)$ is continuous. Since $P(0) = I$, it is easy to see that the integral is non-singular when $t$ is small enough; if we fix such a $t$, then the Riemann sum will also be non-singular for small $h$. Clearly $P(nh) - I \to P(t) - I$. Combining these facts with (6), we obtain

$$\lim_{h \to 0+} \frac{P(h) - I}{h} = [P(t) - I]\Big[\int_0^t P(u)\,du\Big]^{-1},$$

so that, in particular, $P'(0+)$ exists. This finishes the proof.

Next we will discuss the behaviour of

$P(t)$ for large $t$; the facts are simpler than in the discrete-time case. (We continue to assume that $N < \infty$.)

Definition. A Markov transition function $P(\cdot)$ is irreducible if $p_{ij}(t) > 0$ for all $i,j$ and all $t > 0$.

Criterion. $P(\cdot)$ is irreducible iff for every $i \ne j$ there exists a finite sequence of states indexed by $i = i_1, i_2, \ldots, i_n = j$ such that $g_{i_k i_{k+1}} > 0$ for $k = 1, \ldots, n-1$.

Proof. If for some $i \ne j$ there is no such sequence (a "path" from $i$ to $j$), then $[G^n]_{ij} = 0$ for all $n \ge 1$. Consequently,

$$p_{ij}(t) = \sum_{n=0}^{\infty} \frac{t^n [G^n]_{ij}}{n!} = 0$$

for all $t$, so $P(\cdot)$ is not irreducible. Conversely, if there are paths from $i$ to $j$ ($i \ne j$), let $n_0$ be the length of the shortest ones. All $g_{i_k i_{k+1}}$ in any such path must be nonnegative since diagonal terms cannot occur in paths of minimum length; accordingly $[G^{n_0}]_{ij} > 0$. As before, this implies that $p_{ij}(t) > 0$ for small $t$ -- and hence for all $t > 0$. Therefore $P(\cdot)$ is irreducible if the criterion on $G$ is satisfied.

Note that in the continuous-time case there is no analogue for "periodicity."
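The properties of $\exp(tG)$ used in this section can be checked numerically on a small example. The sketch below sums the exponential series directly, which is adequate only for small matrices of moderate norm; the generator `G` is a hypothetical three-state example whose only positive off-diagonal rates run around the cycle $0 \to 1 \to 2 \to 0$, and the function names are illustrative.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def exp_series(A, terms=60):
    """Partial sum of sum_n A^n / n! (a plain truncation, not a robust solver)."""
    n = len(A)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        total = [[s + x for s, x in zip(srow, xrow)] for srow, xrow in zip(total, term)]
    return total

def transition(G, t):
    """P(t) = exp(tG) for a generator G."""
    return exp_series([[t * g for g in row] for row in G])

# Hypothetical generator: nonnegative off-diagonal entries, zero row-sums.
G = [[-2.0, 2.0, 0.0],
     [0.0, -1.0, 1.0],
     [3.0, 0.0, -3.0]]

P_half = transition(G, 0.5)
product = matmul(transition(G, 0.3), transition(G, 0.2))

print([round(sum(row), 12) for row in P_half])      # row-sums of P(0.5)
print(min(min(row) for row in P_half) > 0)          # strict positivity
print(max(abs(product[i][j] - P_half[i][j])
          for i in range(3) for j in range(3)))     # semigroup defect
```

Although $g_{02} = 0$, the entry $p_{02}(t)$ is positive for every $t > 0$ because state 2 can be reached from state 0 through state 1, as the criterion predicts.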

Theorem 2. Let $P(t)$, $t \ge 0$, be an irreducible Markov transition function with finitely many states. Then

$$\lim_{t \to \infty} P(t) = \Pi \qquad (7)$$

exists; the limiting matrix is a stochastic matrix with constant columns. Its (identical) rows are the unique probability vector $\pi$ such that $\pi = \pi P(t)$ for all $t > 0$.

Proof. For any $h > 0$, $P(h)$ is a strictly-positive stochastic matrix. By our results in the discrete-parameter case we know that

$$\lim_{n \to \infty} P(nh) = \lim_{n \to \infty} P(h)^n = \Pi \qquad (8)$$

exists, where the identical rows of $\Pi$ are the unique probability vector satisfying $\pi = \pi P(h)$. But we have seen that $P(t)$ is uniformly continuous on $[0,\infty)$. Any real, bounded, uniformly continuous function $f$ on $[0,\infty)$ such that $\lim_{n \to \infty} f(nh)$ exists for each $h > 0$ is easily seen to possess a limit as $t \to \infty$, and so the limit in (7) exists. The limits (8) thus are the same for each $h$, and so $\pi = \pi P(t)$ for every $t$. The solution is unique (even for a single fixed $t > 0$), and the proof of the theorem is complete.

It is clear that not every stochastic matrix can be embedded in a continuous-time Markov family so that the given matrix $P$ becomes $P(1)$; for example, if some entry $p_{ij} = 0$ while $p_{ij}^{(m)} > 0$ for an $m > 1$ then $P$ cannot be embedded. (Why?) Another criterion: an embeddable $P$ must be invertible. (Again, why?) It is not always easy to tell when embedding is possible, and quite a few papers have been written on this problem. The following examples may be of interest:

Problem 2. Show that

$$P = \begin{bmatrix} a & 1-a \\ 1-a & a \end{bmatrix}$$

can be embedded in a Markov transition function $P(t)$ iff $a \in (1/2, 1]$. If the condition holds, compute the corresponding $P(t)$ and show that it is unique.

Problem 3. (Jane Speakman, 1967). Even when it is possible to embed a stochastic matrix $P$ into a continuous-time Markov transition function, the latter need not be unique (in contrast to the previous problem). Let

$$G_1 = \begin{bmatrix} -1 & 0 & 1 \\ 1 & -1 & 0 \\ 0 & 1 & -1 \end{bmatrix} \quad \text{and} \quad G_2 = \begin{bmatrix} -1 & 1/2 & 1/2 \\ 1/2 & -1 & 1/2 \\ 1/2 & 1/2 & -1 \end{bmatrix};$$

note that both are generators. Find the corresponding transition functions $P^{(1)}(t)$ and $P^{(2)}(t)$, and show that they agree for $t = 4\pi k/\sqrt{3}$, where $k$ is any integer. (Hint: by considering the diagonal form of the generators, one sees without much computation that $p_{ij}(t)$ must be of a certain form. In the case of $P^{(2)}(t)$, for example, each $p_{ij}(t)$ has the form $a_{ij} + b_{ij} e^{-3t/2}$, where $a_{ij}$ and $b_{ij}$ are constants. Then the initial conditions and symmetries allow one quite easily to find the precise formulas; in particular, the diagonal terms of $P^{(2)}(t)$ are $1/3 + 2/3\, e^{-3t/2}$.)
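The claim of Problem 3 can be checked numerically before attempting the algebra. The sketch below is an assumption-laden illustration: the first generator is partly illegible in this copy, so `G1` is taken to be the cyclic generator (each state jumps at rate 1 to a single neighbour), and `G2` is the symmetric generator with off-diagonal entries $1/2$. The exponential is computed with the identity $P(t) = P(t/k)^k$ from the proof of Theorem 1, using $k = 2^6$; the function names are hypothetical.

```python
from math import exp, pi, sqrt

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transition(G, t, squarings=6, terms=25):
    """exp(tG): series for exp((t / 2**squarings) G), then repeated squaring."""
    n = len(G)
    scale = t / 2 ** squarings
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[x * scale / k for x in row] for row in matmul(term, G)]
        total = [[s + x for s, x in zip(srow, xrow)] for srow, xrow in zip(total, term)]
    for _ in range(squarings):
        total = matmul(total, total)
    return total

def max_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

G1 = [[-1.0, 0.0, 1.0], [1.0, -1.0, 0.0], [0.0, 1.0, -1.0]]   # assumed cyclic form
G2 = [[-1.0, 0.5, 0.5], [0.5, -1.0, 0.5], [0.5, 0.5, -1.0]]

t_star = 4 * pi / sqrt(3)   # the case k = 1
print(max_diff(transition(G1, t_star), transition(G2, t_star)))  # essentially zero
print(max_diff(transition(G1, 1.0), transition(G2, 1.0)))        # clearly nonzero
print(transition(G2, t_star)[0][0], 1 / 3 + 2 / 3 * exp(-1.5 * t_star))
```

The two transition functions differ at a generic time such as $t = 1$ but coincide at $t = 4\pi/\sqrt{3}$, so the single matrix $P(4\pi/\sqrt{3})$ is embedded in two different Markov transition functions.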

On the other hand, any stochastic matrix can describe the transitions of a continuous-time Markov chain if the transitions are governed by a "random clock"; that is, a Poisson process. We assume namely that in time $t$ our "system" undergoes $k$ transitions, where $k$ is a Poisson random variable with mean $\lambda t$. Each transition is governed by the stochastic matrix $P$ which can be quite arbitrary. Thus the probability of passing from state $i$ to $j$ after time $t$ has elapsed should be

$$p_{ij}(t) = e^{-\lambda t} \sum_{k=0}^{\infty} \frac{(\lambda t)^k}{k!}\, p_{ij}^{(k)},$$

so that we get

$$P(t) = e^{-\lambda t} \exp(\lambda t P) = \exp[\lambda t(P - I)]. \qquad (9)$$

Since $\lambda(P - I)$ is a generator (satisfies (3)), this function is a Markov transition function.

Conversely, let $P(t)$ be any transition function; we shall express it in the form (9). Since $P(t) = \exp(tG)$ by Theorem 1, we need only find a $\lambda > 0$ and a stochastic matrix $P$ such that $\lambda(P - I) = G$. Thus if $\lambda > 0$ is fixed, we must take

$$p_{ij} = \frac{g_{ij}}{\lambda}, \quad i \ne j; \qquad p_{ii} = 1 + \frac{g_{ii}}{\lambda}. \qquad (10)$$

The matrix defined in (10) automatically has row-sums $= 1$ and so it will be stochastic iff $p_{ii} \ge 0$; i.e., provided that $\lambda \ge \max_i |g_{ii}|$. Hence, for all large $\lambda$, there is a $P$ such that $P(t)$ is given by (9).

The representation of $P(t)$ as a discrete-time Markov chain governed by a "random clock" makes it easy to construct the sample functions of the corresponding stochastic process. However, we will leave this matter until Chapter 8, when we will construct the sample spaces and random variables for a wide class of Markov processes.
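The two directions of the "random clock" representation can be compared numerically. This is a minimal sketch under stated assumptions: `G` is a hypothetical generator, the clock rate is taken to be the smallest admissible value $\lambda = \max_i |g_{ii}|$, and the helper names (`clock_matrix`, `poisson_mixture`, `exp_series`) are illustrative rather than notation from the text.

```python
from math import exp

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]

def clock_matrix(G, lam):
    """Jump matrix P = I + G / lam, i.e. lam * (P - I) = G."""
    n = len(G)
    return [[float(i == j) + G[i][j] / lam for j in range(n)] for i in range(n)]

def poisson_mixture(P, lam, t, terms=80):
    """e^{-lam t} * sum_k (lam t)^k / k! * P^k, truncated after `terms` terms."""
    n = len(P)
    total = [[0.0] * n for _ in range(n)]
    power = identity(n)
    weight = exp(-lam * t)  # Poisson probability of zero jumps
    for k in range(terms):
        total = [[total[i][j] + weight * power[i][j] for j in range(n)] for i in range(n)]
        power = matmul(power, P)
        weight *= lam * t / (k + 1)
    return total

def exp_series(A, terms=80):
    """Truncated exponential series, for comparison only."""
    total = identity(len(A))
    term = identity(len(A))
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        total = [[s + x for s, x in zip(srow, xrow)] for srow, xrow in zip(total, term)]
    return total

G = [[-2.0, 2.0, 0.0], [0.0, -1.0, 1.0], [3.0, 0.0, -3.0]]   # hypothetical generator
lam = max(abs(G[i][i]) for i in range(3))                    # here 3.0
P = clock_matrix(G, lam)
t = 0.7
via_clock = poisson_mixture(P, lam, t)
via_generator = exp_series([[t * g for g in row] for row in G])
print(P)
print(max(abs(via_clock[i][j] - via_generator[i][j]) for i in range(3) for j in range(3)))
```

Choosing a larger $\lambda$ gives a different jump matrix with larger holding probabilities but the same $P(t)$, which is why the representation is not unique.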

Incidentally, using a "random clock" and equation (9) is still a valid way to construct continuous-time Markov transition functions from individual stochastic matrices even when the number of states ($N$) is infinite. However, one no longer gets all transition functions this way, and the ones which are missed are naturally the most interesting!

3. Discrete time, general state-space.

Let $(S, \mathscr{S})$ be a measurable space; that is, a set $S$ together with a $\sigma$-field $\mathscr{S}$ of subsets. The analogues of the stochastic matrices considered in section 1 are functions $p = p(x,A)$, $x \in S$, $A \in \mathscr{S}$, which will be called Markov kernels or stochastic kernels provided that

(a) $p(x,\cdot)$ is a probability measure on $\mathscr{S}$ for each $x \in S$, and

(b) $p(\cdot,A)$ is measurable (with respect to $\mathscr{S}$) for each $A \in \mathscr{S}$.

Given such a function we define the "matrix powers" $p^{(n)}$ inductively by setting $p^{(1)} = p$ and

$$p^{(n+1)}(x,A) = \int_S p^{(n)}(y,A)\, p(x,dy). \qquad (1)$$

It is then easy to show that

$$p^{(n+m)}(x,A) = \int_S p^{(n)}(y,A)\, p^{(m)}(x,dy) \qquad (2)$$

for all $n,m \ge 0$. (We define $p^{(0)}(x,A) = 1_A(x)$ to be the point measure with mass at $x$.) Clearly the function $p^{(n)}(x,A)$ ought to represent the $n$-step transition probabilities of the Markov process, in analogy with the matrix powers of section 1.

Examples. It is easy to construct an enormous variety of stochastic kernels. Let $(S, \mathscr{S})$ be $R^1$ with the Borel field; we will consider some special types.

a) Suppose that from $x$ only two transitions are possible, say to $f_1(x)$ and $f_2(x)$ with probability $h(x)$ and $1 - h(x)$ respectively. Then

$$p(x,A) = h(x)\, 1_A(f_1(x)) + (1 - h(x))\, 1_A(f_2(x)),$$

and this is a stochastic kernel provided only that $f_1$, $f_2$ and $h$ are measurable functions with $0 \le h(x) \le 1$ for all $x$. Such processes arise in certain mathematical models for psychological experiments in learning. If $f_1(x) = x+1$, $f_2(x) = x-1$, we obtain a random walk on the integers.

b) Alternatively, take a continuous distribution depending on several parameters -- to be specific, we can choose the normal $N_{\sigma,\mu}$. Let $\sigma$ and $\mu$ be arbitrary measurable functions of $x$, and then choose $N_{\sigma(x),\mu(x)}$ as the measure $p(x,\cdot)$. This again yields a stochastic kernel.

c) If $p(x,A)$ is independent of $x$, we have independent trials.

d) The most important case corresponds to the process "sums of independent random variables." Let $m(\cdot)$ be any probability measure on $R^1$ and define

$$p(x,A) = m(A - x)$$

(where $A - x = \{y: y = a - x \text{ for some } a \in A\}$). Then it is easy to see that $p^{(n)}(x,A) = m^{(n)}(A - x)$, where $m^{(n)}$ denotes the $n$-fold convolution of $m$ with itself.
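For a step distribution concentrated on finitely many integers, the statement in example d) can be verified exactly. The sketch below is a discrete illustration under that assumption: measures are represented as dictionaries from points to probabilities, the integral in recursion (1) of this section becomes a finite sum, and all names (`convolve`, `kernel`, `next_power`) are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction as F

def convolve(m1, m2):
    """Convolution of two finitely supported probability measures on the integers."""
    out = defaultdict(F)
    for a, pa in m1.items():
        for b, pb in m2.items():
            out[a + b] += pa * pb
    return dict(out)

def kernel(m, x):
    """p(x, .) = m(. - x): the step distribution m translated by x."""
    return {x + step: prob for step, prob in m.items()}

def next_power(m, pn, x):
    """p^(n+1)(x, .) = sum over y of p(x, {y}) * p^(n)(y, .), as in recursion (1)."""
    out = defaultdict(F)
    for y, pxy in kernel(m, x).items():
        for z, pyz in pn(y).items():
            out[z] += pxy * pyz
    return dict(out)

# Hypothetical step distribution.
m = {-1: F(1, 4), 0: F(1, 4), 2: F(1, 2)}

def p1(x):
    return kernel(m, x)

p2_at_5 = next_power(m, p1, 5)   # two-step kernel started at x = 5
m2 = convolve(m, m)              # two-fold convolution of m

print(p2_at_5 == kernel(m2, 5))  # the two-step kernel is m2 translated by 5
```

The same recursion applied repeatedly gives $p^{(n)}(x,\cdot)$ as the $n$-fold convolution translated by $x$, which is the distribution of $x$ plus a sum of $n$ independent steps.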

We shall not discuss the general theory of this class of Markov processes. Some of the important problems, as in the discrete case, are to study the limiting behaviour of $p^{(n)}(x,A)$ as $n \to \infty$, to investigate the possible existence of a "stationary measure" $\pi$ on $\mathscr{S}$ such that $\pi(A) = \int_S p(x,A)\, \pi(dx)$, and to formulate the best analogues for such concepts as irreducibility, aperiodicity and transience. Some of what has been done along these lines can be found in Chung's 1964 paper which reviews and extends the pioneering work done by W. Doeblin before his untimely death in World War II.

4. Continuous time and space: definition and examples.

The idea behind the following definition is simple; the quantity $p_t(x,E)$, where $t \ge 0$, $x \in S$ and $E \in \mathcal{S}$, is supposed to mean the probability that a "system" which is in "state" $x$ at some time $s$ will be found at a later time $s+t$ in the set $E$. Accordingly we assume that

(a) $p_t(x,\cdot)$ is a probability measure on $\mathcal{S}$ for each $t > 0$ and each $x \in S$;

(b) $p_t(\cdot,E)$ is an $\mathcal{S}$-measurable function for each $t > 0$ and $E \in \mathcal{S}$;

(c) $p_0(x,E) = \chi_E(x)$;

and finally

(d) $p_{t+s}(x,E) = \int_S p_t(x,dy)\,p_s(y,E)$ for all $t, s \ge 0$.

A function satisfying (a) - (d) will be called a Markov transition function. The intuitive meaning of the crucial condition (d) should now be clear by analogy with what has gone before. This condition is often called the "Chapman-Kolmogorov equation," and it is just this property which reflects the Markov principle of the lack of any "memory" in the system. In addition (as in the finite-state case) some sort of continuity condition with respect to $t$ will be needed; we will formulate various forms of this condition later as appropriate.

Examples. (i)

Let $S$ be a finite set with $\mathcal{S} = 2^S$. Then except for the absence of the continuity assumption the above definition reduces to that of section 2.

(ii) Let $p = p(x,A)$ be a stochastic kernel, as in section 3. If the transitions of the corresponding discrete-time Markov process are governed by a Poisson "clock" we obtain

$$p_t(x,E) = \sum_{k=0}^{\infty} e^{-\lambda t}\,\frac{(\lambda t)^k}{k!}\,p^{(k)}(x,E), \tag{1}$$

where $\{p^{(k)}\}$ are the iterates of $p$ defined in (1), section 3 (with $p^{(0)}(x,E) = \chi_E(x)$). Note that the "Poisson process" itself is a special case: simply let $S = R^1$ and $p(x,E) = \chi_E(x+1)$ so that (1) becomes

$$p_t(x,\{x+n\}) = e^{-\lambda t}\,\frac{(\lambda t)^n}{n!}.$$

Problem 4. Verify that (1) defines a Markov transition function on $(S,\mathcal{S})$ and that it satisfies as well the continuity condition

$$\lim_{t \to 0+} p_t(x,\{x\}) = 1, \qquad \text{all } x \in S. \tag{e}$$

(iii) Let $(S,\mathcal{S}) = (R^1,\mathcal{B})$, where $\mathcal{B}$ is the Borel field. An important class of transition functions are those which satisfy (a) - (d) and

$$p_t(x+y,E+y) = p_t(x,E), \qquad \text{all } y \in R^1. \tag{f}$$

We will see that when a Markov process is constructed from such a translation-invariant transition function, the result is a process with independent increments (Chapter 1). The Poisson process is the simplest example.

Let $\mu$ be any infinitely-divisible probability measure on $R^1$; this means that for each integer $n > 0$ there is a distribution whose $n$'th "convolution power" is $\mu$. It is then not hard to show that for each $t > 0$ there is a probability measure $\mu_t$ such that $\mu_1 = \mu$ and $\mu_t * \mu_s = \mu_{t+s}$ (where "$*$" denotes convolution). We define

$$p_t(x,E) = \mu_t(E-x);$$

it follows immediately that $p_t$ is a translation-invariant transition function.

The most important example is the normal distribution: if $\mu$ is $N(0,1)$, then $\mu_t = N(0,t)$ and we obtain

$$p_t(x,E) = \frac{1}{\sqrt{2\pi t}} \int_{E-x} e^{-y^2/2t}\,dy. \tag{2}$$

This is the transition function of the famous Brownian motion or Wiener process.
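The convolution identity $\mu_t * \mu_s = \mu_{t+s}$ behind (2) can be checked numerically. The following is only a sketch under stated assumptions: the function names, the integration window and the step count are illustrative choices, not part of the text, and the check covers just the normal case $\mu_t = N(0,t)$.

```python
import math

def gauss_density(t, y):
    """Density of N(0, t) at y, i.e. the integrand of (2) with its normalising factor."""
    return math.exp(-y * y / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def convolve_at(t, s, z, half_width=12.0, steps=4000):
    """Trapezoidal approximation of the density of mu_t * mu_s at the point z.

    half_width and steps are assumed numerical parameters; the window
    [-half_width, half_width] must be wide enough that the tails are negligible.
    """
    a = -half_width
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        y = a + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * gauss_density(t, y) * gauss_density(s, z - y)
    return total * h

def max_semigroup_error(t, s, points):
    """Largest gap between the convolved density and the N(0, t+s) density."""
    return max(abs(convolve_at(t, s, z) - gauss_density(t + s, z)) for z in points)

if __name__ == "__main__":
    err = max_semigroup_error(0.7, 1.3, [-2.0, -0.5, 0.0, 1.0, 2.5])
    print(f"max |mu_t * mu_s - mu_(t+s)| = {err:.2e}")
```

A small error here is the density form of the Chapman-Kolmogorov condition (d) for the transition function (2); it illustrates the identity and proves nothing.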

Its character is quite different from that of the Poisson process. The transition function (2) does not satisfy the continuity condition (e), but does satisfy

$$p_t(x,[x-\varepsilon,x+\varepsilon]) = 1 - o(t) \tag{g}$$

for every $\varepsilon > 0$ and every $x \in R^1$. This means (as we will see later) that the Wiener process never stands still, as does the "compound-Poisson" process governed by (1), but is in constant motion. Condition (g), however, implies that the process changes state not by jumps but by continuous motion. A Markov process with this property is called a diffusion.

Problem 5. Suppose that $\mu$ is the "Cauchy distribution" with density

$$f(x) = \frac{1}{\pi}\,\frac{1}{1+x^2}, \qquad x \in R^1.$$

Show that $\mu$ is infinitely divisible and that $\mu_t$ is the distribution with density

$$f_t(x) = \frac{1}{\pi}\,\frac{t}{t^2+x^2}.$$

(The resulting transition function corresponds to a Markov process called the Cauchy process.) Does this transition function satisfy condition (g)?

(iv)

Other closely related examples can be constructed easily from (2). For example, if $\mu = N(m,1)$ we have $\mu_t = N(mt,t)$ so that

$$p_t(x,E) = \frac{1}{\sqrt{2\pi t}} \int_{E-x} e^{-(y-mt)^2/2t}\,dy. \tag{3}$$

This represents Wiener process with a constant "drift" of magnitude $m$ superimposed. If $\{X_t\}$ are the random variables of the original Wiener process governed by (2), then (3) corresponds to the process $\{X_t + mt\}$. This new one still has independent increments.

Another modification leads to a transition function on $R^+$. Due to the symmetry of the function (2) about $x = 0$, it is plausible that the process $\{|X_t|\}$, where $X_t$ is governed by (2), could represent a Wiener process on $R^+$ with a reflecting barrier at $x = 0$. Since $|X_t|$ goes from $x$ to $y$ if $X_t$ goes from $x$ to $\pm y$, the following transition function suggests itself:

$$p_t(x,E) = \frac{1}{\sqrt{2\pi t}} \left[ \int_{E-x} e^{-y^2/2t}\,dy + \int_{E+x} e^{-y^2/2t}\,dy \right], \tag{4}$$

where $x \ge 0$ and $E \subset R^+$.

Problem 6. Verify that (4) is a Markov transition function on $(R^+,\mathcal{B}^+)$ and satisfies condition (g).

5. Kolmogorov's equations.

The Brownian motion, even with its modifications, is only one example of the large class of transition functions which satisfy conditions (a) - (d) and (g) of section 4 and which we anticipate may correspond to processes with continuous paths. We will now show that these functions generate solutions to certain parabolic partial-differential equations. Conversely, the equations can be used to construct and study the transition functions and the Markov processes they govern.

We begin with the Brownian case, where the transition function $p_t(x,E)$ has the density

$$f(t,x,y) = \frac{1}{\sqrt{2\pi t}}\,e^{-(y-x)^2/2t}, \qquad t > 0. \tag{1}$$

This function is, for each $x$, the fundamental solution of the classical "diffusion equation" (or "heat equation")

$$\frac{\partial f}{\partial t} = \frac{1}{2}\,\frac{\partial^2 f}{\partial y^2}. \tag{2}$$

This means that the right side of (1) is a solution of (2) for $t > 0$, and that the measures defined using $f(t,x,\cdot)$ as a density converge weakly, as $t \to 0+$, to a unit mass at $x$.
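Both halves of this statement can be probed numerically. The sketch below is an illustration under assumptions: the step sizes `dt`, `dy` and the quadrature resolution are arbitrary choices, and the helper names are hypothetical. It estimates the residual of the heat equation (2) for the density (1) by central differences, and the mass that $f(t,x,\cdot)$ places near $x$ for small $t$.

```python
import math

def f(t, x, y):
    """Brownian transition density (1): the N(x, t) density evaluated at y."""
    return math.exp(-(y - x) ** 2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def heat_residual(t, x, y, dt=1e-5, dy=1e-3):
    """Central-difference estimate of df/dt - (1/2) d^2 f / dy^2 at (t, x, y)."""
    df_dt = (f(t + dt, x, y) - f(t - dt, x, y)) / (2.0 * dt)
    d2f_dy2 = (f(t, x, y + dy) - 2.0 * f(t, x, y) + f(t, x, y - dy)) / (dy * dy)
    return df_dt - 0.5 * d2f_dy2

def mass_near(t, x, eps, steps=2000):
    """Trapezoidal estimate of the mass f(t, x, .) assigns to [x - eps, x + eps]."""
    h = 2.0 * eps / steps
    total = 0.0
    for k in range(steps + 1):
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * f(t, x, x - eps + k * h)
    return total * h

if __name__ == "__main__":
    print("heat-equation residual:", heat_residual(1.0, 0.0, 0.5))
    for t in (1.0, 0.1, 0.01):
        print(f"t = {t}: mass within 0.5 of x = {mass_near(t, 0.0, 0.5):.6f}")
```

The residual should be small relative to the size of the derivatives, and the mass near $x$ should approach 1 as $t$ decreases, in line with weak convergence to a unit mass.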

This way of looking at the problem makes (2) the so-called "forward equation" of the Wiener process: for fixed initial state, the density of the random variable $X_t$ satisfies a parabolic differential equation.

There is another approach, which at first appears less natural but turns out to be both simpler and more general. This is, basically, to consider the terminal state to be fixed and to vary the initial state and the elapsed time. In the case of (1) the change seems deceptively trivial; if $y$ rather than $x$ is fixed, $f$ is clearly still a fundamental solution of (2) with $\partial^2/\partial x^2$ in place of $\partial^2/\partial y^2$. A slightly different way to put this is as follows: let $\varphi(x)$ be any bounded, continuous function on $R^1$ and define

$$\psi(t,x) = \int_{-\infty}^{\infty} \varphi(y)\,f(t,x,y)\,dy. \tag{3}$$

Then $\psi$ satisfies the heat equation (2) (with $x$ replacing $y$) and also the initial condition $\psi(0,x) = \varphi(x)$. In this context (2) becomes the "backward equation" of the process.

The situation can be clarified by returning to the finite-state case studied in section 2. We began with the basic matrix equation in the form $p(t+h) = p(h)p(t)$, which led to the differential equation $p'(t) = Gp(t)$. This is the backward equation of the Markov chain since we analyzed a transition in time $t+h$ from $s_i$ to $s_j$ by holding the final state $s_j$ fixed and performing first the short-time transition (time $h$) from the initial state $s_i$ to a new intermediate state. Alternatively, if we begin with $p(t+h) = p(t)p(h)$ and derive $p'(t) = p(t)G$, we obtain the forward equation. In the first case the operator $G$ acts on the initial state $s_i$; in the second (forward) case, on the final state $s_j$.

More generally, let $p_t(x,E)$ be any transition function.

The forward approach seeks a differential equation (or a more general functional equation) satisfied by the density of the measure $p_t(x,\cdot)$ for fixed $x$; the choice of $x$ will enter only through the initial condition that as $t \to 0+$ the solution should converge to a unit mass at $x$. This method was known to physicists such as Smoluchowski even before the work of Wiener, and led to useful results about problems such as diffusion subject to outside forces or inhomogeneities in the medium.

The backward approach was introduced by Kolmogorov in 1931 in a famous paper which for the first time surveyed the whole question; we will now take a closer look.

Define, as in (3),

$$\psi(t,x) = \int_{-\infty}^{\infty} \varphi(y)\,p_t(x,dy), \tag{4}$$

where $\varphi$ is bounded and continuous. We will try to find the differential equation satisfied by these functions $\psi$; this time $\varphi$ will enter through the initial condition $\psi(0+,x) = \varphi(x)$. If the equation, subject to this condition, can be solved for every $\varphi$ it is possible to reconstruct the transition function (although the information is given in a less intuitive form than in the forward case). The advantages of the backward method lie in its greater generality and theoretical simplicity, and it has come to occupy the main place in the mathematical literature.

We will now derive a (backward) "diffusion equation" satisfied by the functions $\psi$ defined in (4). Since we are interested in continuous paths, we assume that $p_t(x,E)$ satisfies (g); that is, that

$$\lim_{t \to 0+} \frac{1}{t}\,p_t(x,R^1 - [x-\varepsilon,x+\varepsilon]) = 0 \tag{5}$$

for all $\varepsilon > 0$, all $x$. It is then also reasonable to require that the limits

$$\lim_{t \to 0+} \frac{1}{t} \int_{x-\varepsilon}^{x+\varepsilon} p_t(x,dy)\,(y-x) = a(x) \tag{6}$$

and

$$\lim_{t \to 0+} \frac{1}{t} \int_{x-\varepsilon}^{x+\varepsilon} p_t(x,dy)\,(y-x)^2 = b(x) \tag{7}$$

exist for each $x$, for some $\varepsilon > 0$ (and hence, because of (5), for all $\varepsilon$). Physically, $a(x)$ may be thought of as a mean (over $\omega$) instantaneous (with respect to $t$) velocity when the process is "at $x$"; $b(x)$ has a similar interpretation as a variance. Finally, we have to assume that the function $\psi$ has a continuous second partial derivative with respect to $x$, for all $t > 0$.

Theorem. If the above conditions hold, then the function $\psi$ defined in (4) satisfies the differential equation

$$\frac{\partial \psi}{\partial t} = a(x)\,\frac{\partial \psi}{\partial x} + \frac{1}{2}\,b(x)\,\frac{\partial^2 \psi}{\partial x^2}, \qquad t > 0, \tag{8}$$

and the initial condition

$$\lim_{t \to 0+} \psi(t,x) = \varphi(x), \qquad x \in R^1. \tag{9}$$

Proof. We will write down the difference quotient which leads to $\partial\psi/\partial t$, and show that its limit exists. Assume first that $h > 0$. We can use the Chapman-Kolmogorov equation (d) to write $p_{t+h}$ in terms of $p_t$ and $p_h$; clearly there are again two ways of doing this. The one appropriate to the "backward" approach is to begin with $p_h$:

$$\psi(t+h,x) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} p_h(x,dy)\,p_t(y,dz)\,\varphi(z) = \int_{-\infty}^{\infty} p_h(x,dy)\,\psi(t,y),$$

which leads to

$$\frac{\psi(t+h,x) - \psi(t,x)}{h} = \frac{1}{h} \int_{-\infty}^{\infty} p_h(x,dy)\,\{\psi(t,y) - \psi(t,x)\}. \tag{10}$$

To take the limit as $h \to 0$, we first observe that by using (5) and the fact that $\psi$ is bounded by $\sup|\varphi(y)|$ we can rewrite (10) as

$$\frac{\psi(t+h,x) - \psi(t,x)}{h} = \frac{1}{h} \int_{x-\varepsilon}^{x+\varepsilon} p_h(x,dy)\,\{\psi(t,y) - \psi(t,x)\} + o(1) \tag{11}$$

for each $\varepsilon > 0$. Next, using the assumptions about the smoothness of $\psi$ we have

$$\psi(t,y) = \psi(t,x) + (y-x)\,\psi_x(t,x) + \frac{(y-x)^2}{2}\,\psi_{xx}(t,x) + r(t,y)\,(y-x)^2 \tag{12}$$

by Taylor's theorem, where $r(t,y) \to 0$ as $y \to x$. We are going to substitute (12) into the right side of (11), and then use (6) and (7). The error term in (12) leads to an integral bounded by the quantity

$$\max_{y \in [x-\varepsilon,x+\varepsilon]} |r(t,y)| \cdot \frac{1}{h} \int_{x-\varepsilon}^{x+\varepsilon} p_h(x,dy)\,(y-x)^2;$$

as $h \to 0+$ [...]

[...] 0, verify that (9) holds for every bounded, continuous function $\varphi$.

We will not derive the forward equation analogous to (8), but the reader may still want to know what it looks like. Assumptions (5), (6) and (7) are again required along with regularity conditions more stringent than for the backward case; to begin with, $p_t(x,E)$ must have a density $f(t,x,y)$ which is twice continuously differentiable in $y$. The equation satisfied by $f$, for fixed $x$, turns out to be

$$\frac{\partial f}{\partial t} = \frac{1}{2}\,\frac{\partial^2}{\partial y^2}\{b(y)f\} - \frac{\partial}{\partial y}\{a(y)f\}, \tag{15}$$

so that the operator on the right is the formal adjoint of the one in (8). It is apparent that (15) holds less generally than the backward equation; for example, previously the functions $a$ and $b$ did not have to be differentiable.

We have shown that a transition function satisfying certain conditions generates solutions of the system (8) and (9). The most important applications, however, are in the opposite direction: given the functions $a(x)$ and $b(x)$, we would like to construct $p_t(x,E)$ by solving (8). This was done before 1920 in special cases (using the forward equation (15) instead of (8)), but the first general treatment was given by W. Feller in 1936. Under certain conditions on $a(x)$ and $b(x)$, Feller showed that there is a unique bounded solution $\psi$ to (8) and (9) for each $\varphi$, and that these functions $\psi$ are derived via (4) from a transition-probability function.*) It was first proved by R. Fortet in 1943 that these probabilities correspond to processes with continuous paths. This is about where the theory of diffusion stood at the end of the 1940's.

But there was already much more than this to be said about Markov processes. Kolmogorov's 1931 paper discussed, in addition to diffusions, processes of the "purely discontinuous" sort (the compound-Poisson processes of (1), last section, are examples of this kind) plus processes of "mixed type" which move both by jumps and continuously. In addition to these generalizations, the effect of boundaries (with associated "boundary conditions" such as "reflecting," "absorbing," etc.) had to be taken into account. The result of all this was a picture of considerable complexity. Most theorems required ad-hoc hypotheses and regularity conditions, and there were few complete, clear-cut results.

*) This is not quite how Feller put it, but equivalent. The hypotheses of our theorem, incidentally, are approximately those of Feller's 1936 paper rather than of Kolmogorov's.

CHAPTER 7

THE APPLICATION OF SEMIGROUP THEORY

1. Introduction.

The skeleton key which brings order out of all this chaos is the theory of semigroups; its application to Markov processes was developed in the early 1950's, with W. Feller doing the pioneering work.

Let $p_t$ be any Markov transition function on a measurable space $(S,\mathcal{S})$ and let $F(S)$ be the space of bounded, measurable functions from $S$ to $R^1$; $F(S)$ is a Banach space with respect to the supremum norm. For $f \in F(S)$ we define

$$(T_t f)(x) = \int_S p_t(x,dy)\,f(y). \tag{1}$$

It is clear that for each $t > 0$ $T_t$ is a contraction operator mapping $F$ into $F$; that is, $T_t$ is linear and $\|T_t f\| \le \|f\|$ for all $f \in F$. Moreover, the Chapman-Kolmogorov equation (condition (d) in the definition of a transition function) yields

$$(T_{t+s}f)(x) = \int p_{t+s}(x,dy)\,f(y) = \int\!\!\int p_t(x,du)\,p_s(u,dy)\,f(y) = (T_t(T_s f))(x),$$

so that the operators $\{T_t\}$ satisfy the semigroup condition

$$T_{t+s} = T_t T_s. \tag{2}$$

Because of (c), we also have $T_0 = I$.

On the other hand, let $M(S)$ denote the Banach space of finite signed measures on $(S,\mathcal{S})$ with total-variation norm. For each $\nu \in M(S)$, we define

$$(U_t \nu)(E) = \int_S \nu(dx)\,p_t(x,E). \tag{3}$$

Again, it is easy to see that the operators $\{U_t;\ t > 0\}$ (with $U_0 = I$) also form a contraction semigroup of operators on $M(S)$. (The reader should verify this for herself.)

Finally, there is a natural bilinear functional between $F(S)$ and $M(S)$ defined by

$$(\nu,f) = \int_S f(x)\,\nu(dx) \qquad \text{for } \nu \in M,\ f \in F.$$

In terms of this functional the two semigroups we have constructed are adjoint, since using Fubini's theorem we have

$$(\nu,T_t f) = (U_t \nu,f). \tag{4}$$

Thus we expect a certain duality to hold between the two semigroups; it is incomplete, since in general neither $M$ nor $F$ is the entire conjugate space of the other. However, we may expect that it will be enough to study primarily one of the two semigroups and we will see that $\{T_t\}$ is usually the better one on which to concentrate.

As an example, when $S$ is finite $F(S)$ is the space of column vectors and $M(S)$ that of row vectors. Equation (1) becomes

$$(T_t f)_i = \sum_j p_{ij}(t)\,f_j = (p(t)f)_i,$$

while

$$(U_t \nu)_j = \sum_i \nu_i\,p_{ij}(t) = (\nu p(t))_j.$$

The duality between $F$ and $M$ is of course complete in this case, and $U_t$ and $T_t$ are true adjoints of each other, corresponding simply to transposing the matrices $p(t)$.
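For a finite state space the adjoint relation and the semigroup condition reduce to matrix identities that can be checked directly. The following sketch assumes a two-state chain with hypothetical rates `a` and `b`, for which $p(t)$ has a well-known closed form; the function names are illustrative only.

```python
import math

def p(t, a=2.0, b=1.0):
    """Transition matrix p(t) of a two-state chain with assumed rates a (0 -> 1), b (1 -> 0)."""
    r = a + b
    e = math.exp(-r * t)
    return [[(b + a * e) / r, (a - a * e) / r],
            [(b - b * e) / r, (a + b * e) / r]]

def T(t, f):
    """(T_t f)_i = sum_j p_ij(t) f_j: the matrix p(t) acting on a column vector f."""
    m = p(t)
    return [sum(m[i][j] * f[j] for j in range(2)) for i in range(2)]

def U(t, nu):
    """(U_t nu)_j = sum_i nu_i p_ij(t): a row vector nu multiplied by p(t)."""
    m = p(t)
    return [sum(nu[i] * m[i][j] for i in range(2)) for j in range(2)]

def pairing(nu, f):
    """The bilinear functional (nu, f) = sum_i nu_i f_i."""
    return sum(n * v for n, v in zip(nu, f))

if __name__ == "__main__":
    f = [1.0, -3.0]          # a bounded function on S = {0, 1}
    nu = [0.25, -1.5]        # a signed measure on S
    print(pairing(nu, T(0.4, f)), pairing(U(0.4, nu), f))   # the two sides of the adjoint relation
    print(T(0.3, T(0.5, f)), T(0.8, f))                     # semigroup condition T_t T_s = T_{t+s}
```

The two printed pairs should agree up to rounding; the example also makes it easy to confirm the contraction property $\max_i |(T_t f)_i| \le \max_i |f_i|$.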

The semigroup equation (2) reduces to the multiplicative property of the stochastic matrices which we studied in section 2 of the last chapter.

In the next four sections we will study the structure of contraction-operator semigroups. The theory bears considerable resemblance to the matrix case treated earlier; there is again a differential equation analogous to $y' = ay$ and an operator which "generates" the semigroup, although the simple exponential function no longer suffices to construct it. These general results will then be specialized to the Markov case, and the mysteries of "backward" and "forward," boundary conditions and the like will largely disappear. After all this we will turn in Chapter 8 to the study of the Markov processes themselves, using the considerable supply of information about their transition functions which will then be available.

2. Definitions and preliminaries.

We have seen that there are two semigroups associated in a natural way with each Markov transition function. We will now develop the most necessary results from the general theory of semigroups, due to E. Hille, K. Yosida, R. Phillips and others. From here on, the letter $V$ will mean an arbitrary Banach space over the real numbers. Elements of $V$ will be denoted by letters such as $x$, $y$ and $z$, scalars by Greek letters, and operators by boldface Latin capitals. The norm of an element $x \in V$ is written $\|x\|$.

Definition 1. Let $\{T_t\}$, $t \in R^+$, be a family of linear operators on $V$. Then $\{T_t\}$ is a one-parameter contraction semigroup provided that

(i) $\|T_t x\| \le \|x\|$ for all $x \in V$;

(ii) $T_{t+s} = T_t T_s$ for all $t, s \ge 0$;

(iii) $\lim_{t \to 0+} \|T_t x - x\| = 0$ for all $x \in V$.

We will say for short that $\{T_t\}$ is a semigroup if these three conditions are satisfied.

Remarks. It follows from (ii) and (iii) that $T_0 = I$, the identity operator. (Why?) Condition (iii) is (by definition) equivalent to requiring that $T_t \to I$ as $t \to 0+$ in the strong operator topology. It is natural to ask why the uniform or the weak topologies are not used instead. The answer in the first case is that requiring uniform convergence would be much too restrictive; this holds only when the generator (defined below) of $\{T_t\}$ is a bounded operator. As for weak convergence, it turns out that it is actually equivalent to strong convergence in our situation. This fact is sometimes useful but will not be essential for us; see section 6, propositions 3 and 4.

Lemma.

$\{T_t\}$ is strongly continuous in $t$ for all $t > 0$.

Proof. We must show that

$$\lim_{h \to 0} \|T_{t+h}x - T_t x\| = 0 \qquad \text{for all } x \in V,\ t > 0.$$

But by (i) and (ii) we have for $h > 0$

$$\|T_{t+h}x - T_t x\| = \|T_t(T_h x - x)\| \le \|T_h x - x\|,$$

which tends to 0 as $h \to 0$ by (iii). The case $h < 0$ (but so small that $t+h > 0$) is handled similarly. As in the finite-dimensional case, the proof shows that $\{T_t\}$ is actually uniformly continuous for $t > 0$.

Definition 2. Let $\{T_t\}$ be a semigroup. Define

$$Ax = \lim_{t \to 0+} \frac{T_t x - x}{t} \tag{1}$$

provided the limit exists (in the norm topology). Then $A$ is called the generator*) of $\{T_t\}$, and its domain (that is, the set of $x \in V$ such that the limit (1) exists) is denoted $\mathcal{D}_A$.

It is clear that $A$ is a linear operator mapping $\mathcal{D}_A$ into $V$; it may or may not be continuous. This operator is a sort of time-derivative of the semigroup at $t = 0$, and the hope is that it can play a role like that of the matrix $G$ in the finite-dimensional case. In that case, $G$ was the generator of the matrix semigroup $p(t)$, and $\mathcal{D}_G$ was the whole space. In general $\mathcal{D}_A \ne V$ occurs frequently, but there are always enough elements in $\mathcal{D}_A$ to make the concept of generator very useful. This is quite easy to show:

*) Sometimes $A$ is also called the "infinitesimal generator" or the "infinitesimal operator" of $\{T_t\}$.

Problem 1. Let $0 < a < b$ [...] $\int_a^b f(t)\,T_t x\,dt \in \mathcal{D}_A$ [...]

[...] 0, and we have

$$\frac{d}{dt}\,T_t x = A\,T_t x = T_t\,Ax, \qquad t > 0. \tag{2}$$

Proof. The derivative on the left side of (2) is taken in the norm topology; this will always be assumed hereafter unless the contrary is explicitly stated. Fix $t > 0$. Then for any small $h > 0$ we have

$$\frac{T_{t+h}x - T_t x}{h} = T_t\,\frac{T_h - I}{h}\,x = \frac{T_h - I}{h}\,T_t x,$$

and as $h \to 0+$ the limit of the second expression is clearly $T_t(Ax)$ because of (1) and the fact that $T_t$ is continuous. The third expression thus approaches this limit also, which proves that $T_t x \in \mathcal{D}_A$ and that (2) holds with "right-hand derivative" on the left side.

Now assume that $h < 0$ (but $t+h > 0$). Then

$$[\ldots] \le \left\| \frac{T_{|h|}x - x}{|h|} - T_{|h|}Ax \right\| \ [\ldots]$$


[...] 0; [...] $C,\ m < \infty$; [...] $0+$.

We already know that $u(t) = T_t x$ is a solution and that it satisfies the conditions; it remains to show uniqueness. Suppose that $u_1$ and $u_2$ are both solutions satisfying (a) - (c) and set $v(t) = u_1(t) - u_2(t)$; then $v$ is a solution satisfying (a) and (b) and tending to 0 as $t \to 0+$. Put $w(t) = e^{-\lambda t}v(t)$, taking $\lambda > \max(m_1,m_2)$. Since $v(t)$ satisfies (6) we have

$$\frac{d}{dt}\,w(t) = -\lambda w(t) + e^{-\lambda t}Av(t) = -(\lambda I - A)\,w(t) = -R_\lambda^{-1}\,w(t)$$

by Theorem 1; i.e.,

$$w(t) = -R_\lambda\,\frac{dw(t)}{dt}.$$

Integrating both sides from 0 to $s$ we obtain

$$\int_0^s w(t)\,dt = -R_\lambda \int_0^s \frac{dw(t)}{dt}\,dt = -R_\lambda\,w(s),$$

since $R_\lambda$ commutes with the integration (why?) and $w(0) = 0$. But when $s \to \infty$, the left side tends to the Laplace transform of $v$, while the right side tends to 0 because of assumption (b) and the choice of $\lambda$. Thus the Laplace transform of $v(t)$ vanishes for each $\lambda > m$ and by the lemma above it follows that $v(t) \equiv 0$ itself. This completes the proof of uniqueness.

proof of uniqueness. Corollary. then

Tt

A of

{T t }

is bounded,

= exp(tA).

Proof. exp(tA)x

If the generator

We have seen that, when

A is bounded, u(t) =

satisfies the differential equation (6) and condi-

tions (a) and (c).

The verification that the exponential

series converges also shows that Ilexp(tA)xll ~ exp(tIIAII) Ilxll, so that (b) holds as well. TtX

for each

x E V.

x

in

!?)A'

Hence by the theorem \~hich

exp(tA)x

in this case means all
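In finite dimensions every generator is bounded, so the corollary can be watched at work. The sketch below assumes a hypothetical three-state generator matrix `A` (rows summing to zero, non-negative off-diagonal entries) and sums the exponential series directly; the number of terms and the step `h` are arbitrary numerical choices.

```python
def mat_mul(x, y):
    """Product of two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_exp(a, t, terms=60):
    """Partial sum of the series exp(tA) = sum_k (tA)^k / k!."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    ta = [[t * v for v in row] for row in a]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, ta)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def max_abs_diff(x, y):
    """Largest entrywise difference between two matrices."""
    return max(abs(u - v) for rx, ry in zip(x, y) for u, v in zip(rx, ry))

def difference_quotient(a, h):
    """(T_h - I) / h with T_h = exp(hA); this should approach A as h -> 0+."""
    th = mat_exp(a, h)
    n = len(a)
    return [[(th[i][j] - (1.0 if i == j else 0.0)) / h for j in range(n)] for i in range(n)]

# Hypothetical generator of a three-state chain (an assumption for illustration).
A = [[-2.0, 1.5, 0.5],
     [0.3, -0.3, 0.0],
     [1.0, 1.0, -2.0]]

if __name__ == "__main__":
    print("semigroup error:", max_abs_diff(mat_mul(mat_exp(A, 0.4), mat_exp(A, 0.6)), mat_exp(A, 1.0)))
    print("generator error:", max_abs_diff(difference_quotient(A, 1e-6), A))
```

For this choice of `A` the matrices $\exp(tA)$ are stochastic, so they are contractions in the supremum norm, and the difference quotient recovers `A` as in Definition 2.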

4. The Hille-Yosida theorem.

We have seen that semigroups are characterized by their generators, and have clarified the relation between generator, semigroup and resolvent. But for constructing semigroups, it is important to know precisely when an operator $A$ is the generator of some semigroup. This question is answered by the following:

Theorem (Hille and Yosida). Let $A$ be a linear operator on $V$ with domain $\mathcal{D}_A$. In order that $A$ be the generator of a contraction semigroup it is necessary and sufficient that $A$ satisfy the following three conditions:

(i) $\mathcal{D}_A$ is dense in $V$;

(ii) for every $\lambda > 0$ and $y \in V$ the equation $\lambda x - Ax = y$ has a unique solution $x \in \mathcal{D}_A$;

(iii) the solution in (ii) satisfies $\|x\| \le \|y\|/\lambda$.

Outline: We will find bounded operators $A_\lambda$ which approximate $A$ for $x \in \mathcal{D}_A$ when $\lambda$ is large. Using these operators, we construct semigroups by setting

$$T_t(\lambda) = \exp(tA_\lambda). \tag{1}$$

Finally, we will define $T_t = \lim_{\lambda \to \infty} T_t(\lambda)$, and prove that $\{T_t\}$ is a semigroup with generator $A$.
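The approximation step in this outline can be made concrete in a case where everything is computable. The sketch below is an illustration under assumptions: it takes a hypothetical $2 \times 2$ generator `A` (already bounded, so the construction is not needed there, only visible), forms the resolvent $R_\lambda = (\lambda I - A)^{-1}$ and the operators $A_\lambda = \lambda A R_\lambda$ used in the proof, and measures how far $A_\lambda$ is from $A$ in the operator norm induced by the supremum norm.

```python
def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul2(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def sup_norm(m):
    """Operator norm induced by the sup norm on column vectors: max absolute row sum."""
    return max(sum(abs(v) for v in row) for row in m)

# Hypothetical bounded generator (a two-state chain), chosen only for illustration.
A = [[-2.0, 2.0], [1.0, -1.0]]

def resolvent(lam):
    """R_lambda = (lambda I - A)^(-1)."""
    return inv2([[lam - A[0][0], -A[0][1]], [-A[1][0], lam - A[1][1]]])

def yosida(lam):
    """A_lambda = lambda * A * R_lambda, the bounded approximation to A."""
    ar = mul2(A, resolvent(lam))
    return [[lam * v for v in row] for row in ar]

def gap(lam):
    """Sup-norm distance between A_lambda and A."""
    al = yosida(lam)
    return sup_norm([[al[i][j] - A[i][j] for j in range(2)] for i in range(2)])

if __name__ == "__main__":
    for lam in (10.0, 1000.0, 1e6):
        print(lam, lam * sup_norm(resolvent(lam)), gap(lam))
```

For each $\lambda$ the product $\lambda\|R_\lambda\|$ should not exceed 1, as condition (iii) requires, and the gap should shrink as $\lambda$ grows.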

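Proof. The necessity of the above three conditions was proved in section 3. Suppose the conditions hold. From (ii) it is clear that the operator $\lambda I - A$ is invertible, and we write $R_\lambda = (\lambda I - A)^{-1}$. (Of course we anticipate that $R_\lambda$ will become the resolvent of the semigroup we plan to construct.) For each $\lambda > 0$ $R_\lambda$ maps $V$ onto $\mathcal{D}_A$, and by (iii) we have $\|R_\lambda\| \le \lambda^{-1}$.

We now define $A_\lambda = \lambda A R_\lambda$, $\lambda > 0$; this choice is motivated by the corollary to Theorem 1 of the previous section. Since

$$A_\lambda = \lambda A R_\lambda = \lambda^2 R_\lambda - \lambda I \tag{2}$$

we see that $A_\lambda$ is continuous, with $\|A_\lambda\| \le 2\lambda$. We will now prove that

$$\lim_{\lambda \to \infty} A_\lambda x = Ax \qquad \text{for all } x \in \mathcal{D}_A. \tag{3}$$

In order to do this, we first show that

$$\lim_{\lambda \to \infty} \lambda R_\lambda x = x \qquad \text{for all } x \in \mathcal{D}_A. \tag{4}$$

Since $A$ commutes with $R_\lambda$ we have

$$\lambda R_\lambda x - x = A R_\lambda x = R_\lambda Ax.$$

Therefore

$$\|\lambda R_\lambda x - x\| \le \|R_\lambda\|\,\|Ax\| \le \lambda^{-1}\|Ax\|,$$

which proves (4). Next we extend (4) from $\mathcal{D}_A$ to all of $V$. For any $x \in V$ and any $\varepsilon > 0$, we can choose $y \in \mathcal{D}_A$ so that $\|y - x\| \le \varepsilon$. Then

$$\|\lambda R_\lambda x - x\| \le \|\lambda R_\lambda (x - y)\| + \|\lambda R_\lambda y - y\| + \|y - x\| \le 2\varepsilon + \|\lambda R_\lambda y - y\|,$$

and then as $\lambda \to \infty$ we have $\limsup \|\lambda R_\lambda x - x\| \le 2\varepsilon$ by the result (4) already proved. Thus (4) holds for all $x$. Finally, if $x \in \mathcal{D}_A$, we have

$$\lim_{\lambda \to \infty} A_\lambda x = \lim_{\lambda \to \infty} \lambda R_\lambda Ax = Ax$$

because (4) holds for $Ax$. Therefore (3) is proved; in a sense, this says that the bounded operators $A_\lambda$ approximate $A$ for large $\lambda$.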

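Now we consider the operators $T_t(\lambda)$ defined by (1). We know that $\{T_t(\lambda);\ t \ge 0\}$ is a semigroup for each $\lambda > 0$, and it is easy to see that it is actually a contraction semigroup. In fact by (2) we have

$$T_t(\lambda) = \exp(tA_\lambda) = e^{-\lambda t}\exp(\lambda^2 t R_\lambda)$$

so that

$$\|T_t(\lambda)\| \le e^{-\lambda t}\exp(\lambda^2 t\,\|R_\lambda\|) \le e^{-\lambda t}e^{\lambda t} = 1.$$

We must next prove that $T_t(\lambda)x$ has a limit as $\lambda \to \infty$; to do this we will show that

$$\lim_{\lambda,\mu \to \infty} \|T_t(\lambda)x - T_t(\mu)x\| = 0 \qquad \text{for } x \in \mathcal{D}_A. \tag{5}$$

For fixed $\lambda$ and $\mu$ and $x \in \mathcal{D}_A$ write

$$f(t) = T_t(\lambda)x - T_t(\mu)x$$

and note that

$$f'(t) = A_\lambda T_t(\lambda)x - A_\mu T_t(\mu)x = A_\lambda f(t) + g(t),$$

where $g(t) = (A_\lambda - A_\mu)\,T_t(\mu)x$. But then we have*)

$$\frac{d}{dt}\bigl[\exp(-tA_\lambda)f(t)\bigr] = \exp(-tA_\lambda)\,g(t)$$

*) This formula involves taking the derivative of a "product" as in elementary calculus, but in quite different circumstances. Think through what is involved!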

so that, since $f(0) = 0$,

$$\exp(-tA_\lambda)f(t) = \int_0^t \exp(-sA_\lambda)\,g(s)\,ds.$$

Noting that $T_t(\lambda)$ and $A_\mu$ commute (why?) we can write

$$f(t) = \int_0^t T_{t-s}(\lambda)\,T_s(\mu)\,(A_\lambda - A_\mu)x\,ds.$$

Hence, since the operators $\{T_t(\lambda)\}$ are contractions,

$$\|f(t)\| \le \int_0^t \|T_{t-s}(\lambda)\,T_s(\mu)\,(A_\lambda - A_\mu)x\|\,ds \le t\,\|(A_\lambda - A_\mu)x\|. \tag{6}$$

But by (3) the right side tends to 0 as $\lambda,\mu \to \infty$, and so (5) is established. We note that the convergence is uniform for $t$ ranging over any bounded set.

We now can define the operators $\{T_t\}$ by setting

$$T_t x = \lim_{\lambda \to \infty} T_t(\lambda)x, \qquad x \in \mathcal{D}_A.$$

Because convergence is uniform in $t$, $T_t x$ will be continuous in $t$, and $\lim_{t \to 0} T_t x = x$. Moreover, for each $x \in \mathcal{D}_A$

$$\|T_t x\| \le \|T_t(\lambda)x\| + \|T_t x - T_t(\lambda)x\|.$$

The first term is at most $\|x\|$ since $T_t(\lambda)$ is a contraction while the second term tends to 0 as $\lambda \to \infty$; hence $\|T_t x\| \le \|x\|$. It is now easy to extend the definition of $T_t$ to all $x \in V$, and to verify that the resulting operators are contractions and form a strongly continuous semigroup. (These details are left to the reader.)

Finally, we must show that the generator of the semigroup $\{T_t\}$ which we have constructed is precisely $A$. In any case $\{T_t\}$ has a generator; let us call it $A'$. We will try to compute $A'x$ for $x \in \mathcal{D}_A$. We have

$$T_t(\lambda)x - x = \int_0^t T_s(\lambda)\,A_\lambda x\,ds. \tag{7}$$

Now

$$\|T_t(\lambda)A_\lambda x - T_t Ax\| \le \|(T_t(\lambda) - T_t)Ax\| + \|T_t(\lambda)(A_\lambda x - Ax)\|.$$

The first term tends uniformly to 0 as $\lambda \to \infty$ for $t$ bounded. The second term is bounded by $\|Ax - A_\lambda x\|$ since $T_t(\lambda)$ is a contraction, and this bound tends to 0 as $\lambda \to \infty$. Therefore

$$\lim_{\lambda \to \infty} T_t(\lambda)\,A_\lambda x = T_t\,Ax$$

holds for each fixed $x \in \mathcal{D}_A$ and the convergence is uniform in $t$ over bounded intervals. As a result we can pass to the limit in (7) to obtain

$$T_t x - x = \int_0^t T_s\,Ax\,ds. \tag{8}$$

Dividing by $t$ and letting $t \to 0$, we have

$$A'x = \lim_{t \to 0+} \frac{T_t x - x}{t} = \lim_{t \to 0+} \frac{1}{t}\int_0^t T_s\,Ax\,ds = Ax$$

(since $T_0 = I$). Hence $\mathcal{D}_{A'} \supset \mathcal{D}_A$ and $A'x = Ax$ for $x \in \mathcal{D}_A$; in other words, $A'$ is an extension of $A$. Since $A'$ is the generator of $\{T_t\}$, $\lambda I - A'$ maps $\mathcal{D}_{A'}$ in a 1-1 manner onto $V$ by Theorem 1 of the last section. But $\lambda I - A'$ agrees with $\lambda I - A$ on $\mathcal{D}_A$ and maps this possibly smaller domain also onto $V$ by condition (ii). It follows that $\mathcal{D}_{A'} = \mathcal{D}_A$ and so $A' = A$. This completes the proof of the theorem.A)

5. Complements.

a)

Although we will not need to do so, it is quite easy to extend the above theory to certain semigroups which are not contractions; instead of condition (i) in the definition of a "semigroup" (page 137) we can assume

(i') $\|T_t x\| \le Ce^{mt}\|x\|$,

where $C$ and $m$ are constants independent of $x$. Nothing in section 2 is changed essentially. In section 3 we note that the resolvent $R_\lambda$ is defined for all $\lambda > m$, and the bound (2) is obviously replaced by

$$\|R_\lambda x\| \le \frac{C\|x\|}{\lambda - m}. \tag{1}$$

The theorems proved in the section remain true with only the obvious change that in Theorem 1 we must take $\lambda > m$.

Problem 4. Prove that the resolvent of a semigroup satisfying (i') is bounded by

$$\|R_\lambda^n x\| \le \frac{C\|x\|}{(\lambda - m)^n} \qquad \text{for } n = 1,2,3,\ldots \tag{2}$$

in place of the similar formula, with $C^n$ on the right, which would be obtained by simply iterating (1).

As for the Hille-Yosida theorem (section 4), we expect that condition (i) remains unchanged and that (ii) is the same as before except that "for every $\lambda > 0$" becomes "for every $\lambda > m$." Because of (2) above, however, we must replace condition (iii) by the following:

(iii') the operator $(\lambda I - A)^{-1}$, which exists by (ii), satisfies

$$\|(\lambda I - A)^{-n}x\| \le \frac{C\|x\|}{(\lambda - m)^n}, \qquad n = 1,2,\ldots$$

A) It is sometimes useful to note that conditions (ii) and (iii) need only be assumed to hold for all large $\lambda$, rather than for all $\lambda > 0$.

Problem 5. Outline the changes which must be made in the proof of the Hille-Yosida theorem to adapt it to these more general conditions.

b)

The theory can be adapted to cases when condition (iii) in the definition of a semigroup, asserting that $T_t x \to x$ as $t \to 0$ for each $x$, is not satisfied. In that case we define

$$M = \{x \in V: \lim_{t \to 0+} T_t x = x\},$$

and restrict our attention to this space.

Problem 6. Show that $M$ is a closed subspace of $V$ and that the operators $\{T_t\}$ leave $M$ invariant (i.e., $T_t M \subset M$ for each $t > 0$).

Because of this result the theory we have developed applies to the semigroup $\{T_t\}$ restricted to $M$. The difficulty, of course, is that $M$ may be a very small subspace! But there are also situations in which this trick is quite useful.

c)

Recall that an operator $B$ on $V$ is closed if $x_n \in \mathcal{D}_B$, $x_n \to x$ and $Bx_n \to y$ imply that $x \in \mathcal{D}_B$ and $y = Bx$. (That is, the graph of $B$ is a closed subset of $V \times V$.)

Problem 7. Show that the generator $A$ of a contraction semigroup (or the slightly more general type of (a) above) is closed. (Hint: use the proposition on page 139 which implies that

$$T_t z - z = \int_0^t T_s Az\,ds$$

for all $t > 0$ whenever $z \in \mathcal{D}_A$.)

From this fact, recalling the "closed graph theorem," we have at once this

Corollary. If $A$ is a generator whose domain is the entire space $V$, then $A$ must be a bounded operator.

6. Markov transition functions: continuity conditions.

We now return to the study of Markov transition functions.

The idea is to apply the general theory we have just

been developing to the semigroups

{Ttl

and

{Utl

which

were defined in section 1 by equations (1) and (3). technical reasons we will concentrate on

{Ttl.)

nential" differentia·l equations associated with {utl

(For

The "expo{T t}

and

will then be respectively the "backward" and "forward"

equations associated with the Markov process. Our first job is to adapt the continuity condition (iii) of section 2 to the present context. requires that

I ITtx-xl

1+

For the Markov semigroup

0

as

{Ttl

bounded, measurable functions on lim

sup

t+O+ xES

for all

f E F.

IJ

S

t

+

0+

Condition (iii) for all

acting on the space

x EV. F(S)

of

5, this assumption becomes

Pt(x,dy)f(y) - f(x)1

I

In particular, choosing

=0 f =

(1) ~{x}

we get

6.

Markov transition functions: continuity conditions

lim

t ... O+

Pt(x,{x}) = 1

for each

ISS

xES.

(2)

This is the same as condition (e) on page 124, and its validity is thus necessary in order for tinuous at

t

=0

on the space

{T t }

to be strongly con-

F(S).

As an example (or rather, a large class of examples) we can consider the "compound-Poisson" process defined by (1) on page 124 (see also problem 4 just below it).

Problem 8. Let $\{T_t\}$ be the semigroup on $F(S)$ obtained from the above process. Verify that $\{T_t\}$ is strongly continuous at $t = 0$, and that its generator is the bounded operator $A$ defined by

$$Af(x) = \lambda \int_S p(x,dy)\,[f(y) - f(x)] \tag{3}$$

for all $f \in F$. Verify also that the semigroup $\{U_t\}$ on $M(S)$ is strongly continuous and compute its generator.

Every semigroup is, as we have seen, associated with an "exponential" differential equation $\frac{du}{dt} = Au$, where $A$ is the generator of the semigroup. For the above semigroup $\{T_t\}$, this equation becomes

$$\frac{\partial \psi(x,t)}{\partial t} = \lambda \int_S p(x,dy)\,[\psi(y,t) - \psi(x,t)]. \tag{4}$$

Equation (4) is thus the Kolmogorov backward equation for the compound-Poisson process, and of course it has the solution $\psi(x,t) = T_t f(x)$ for $f \in F$. This function satisfies both the equation and the initial condition $\psi(x,0) = f(x)$, and it is unique under mild additional restrictions (Theorem 3, page 145).
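As a numerical sanity check of the backward equation, the following sketch specializes the compound-Poisson semigroup to a three-point state space. That specialization is an assumption made purely for illustration: the jump kernel becomes a stochastic matrix, the rate is a scalar, and $T_t = \exp(tA)$ with $A$ equal to the rate times (jump matrix minus identity). The names `RATE`, `JUMP`, `F` and all helper functions are hypothetical and do not come from the text.

```python
def mat_mul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_vec(a, v):
    """Apply the matrix a to the column vector v."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def mat_exp(m, terms=60):
    """Truncated power series for exp(m); adequate for small matrices of moderate norm."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[r + s for r, s in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    return result


def generator(rate, jump):
    """Finite-state analogue of the bounded generator: rate * (jump - I)."""
    n = len(jump)
    return [[rate * (jump[i][j] - (i == j)) for j in range(n)] for i in range(n)]


def semigroup(rate, jump, t):
    """T_t = exp(t A) for the generator above."""
    return mat_exp([[t * x for x in row] for row in generator(rate, jump)])


# Illustrative data (assumed): jump rate, jump distribution, initial function f.
RATE = 2.0
JUMP = [[0.0, 0.5, 0.5],
        [0.25, 0.5, 0.25],
        [1.0, 0.0, 0.0]]
F = [1.0, -2.0, 0.5]


def psi(t):
    """psi(., t) = T_t f, the candidate solution of the backward equation."""
    return mat_vec(semigroup(RATE, JUMP, t), F)


def backward_residual(t, h=1e-5):
    """Largest gap between d(psi)/dt (central difference) and A psi at time t."""
    lhs = [(up - down) / (2 * h) for up, down in zip(psi(t + h), psi(t - h))]
    rhs = mat_vec(generator(RATE, JUMP), psi(t))
    return max(abs(l - r) for l, r in zip(lhs, rhs))
```

Calling `backward_residual(0.7)` should return a value near rounding error, and `semigroup(RATE, JUMP, t)[x][x]` tends to 1 as `t` decreases to 0, which is the finite-state form of condition (2).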

The forward equation is obtained similarly from the generator of the semigroup $\{U_t\}$; it may be a useful exercise to derive it.

It is clear, however, that we can't construct a satisfactory general theory in quite this way, since the Brownian motion process -- our most important and interesting example -- does not satisfy condition (2).

Perhaps it would be natural to simply try and apply remark (b) in the last section, but we will do something a little different and shift our attention to continuous functions. We accordingly assume from now on that $S$ is a metric space and $\mathscr{S}$ is its $\sigma$-field of Borel sets. The metric itself will be denoted $\rho$.

Definition 1. A Markov transition function on $(S,\mathscr{S})$ is a Feller function if $T_t f$, defined by

$$T_t f(x) = \int_S P_t(x,dy)\, f(y),$$

is a continuous function whenever $f$ is bounded and continuous.

Let $C(S)$ denote the Banach space of bounded continuous functions with the uniform norm. The "Feller property" is equivalent to the assertion that $C(S)$ is an invariant subspace of $F(S)$ for the operators $\{T_t\}$. It is also equivalent to say that the measures $P_t(x,\cdot)$ depend continuously on $x$ in the usual weak topology, for every fixed $t > 0$. Almost all the Markov processes which arise "naturally" in $R^1$ or $R^n$ (including Brownian motion!) do have the Feller property, although of course it is easy to construct examples where the property fails.

The Feller condition deals with continuity of the transition function $P_t(x,E)$ in $x$, and does not, by itself, imply anything about continuity in $t$. However, it is much easier for $\{T_t\}$ to be continuous in $t$ on $C$ than on $F$; in particular, condition (2) is no longer necessary since the function $\chi_{\{x\}}$ is usually not continuous. We will now explore conditions related to continuity in $t$, in the sense appropriate to semigroups on $C(S)$.

Notation:

we write

Ne;(x) '" {y E S; p(y,x)
0

and

XES, and define f E C(S).

Clearly $F$ is a linear functional on $C(S)$, and it is easy to see that it is bounded (its norm is 1) and non-negative. According to the Riesz representation theorem, such a functional always arises by integration with respect to a finite, non-negative Borel measure on $S$. Of course the functional $F$ and so the measure which represents it will depend on the choice of $t$ and $x$; we denote it $P_t(x,\cdot)$. Thus we have

$$T_t f(x) = F(f) = \int_S P_t(x,dy)\, f(y),$$

and we must show that $P_t(x,\cdot)$ is a normal Markov transition function.

Since $P_t(x,\cdot)$ is a finite non-negative measure, (ii) clearly implies that the total mass is 1. Because $T_0 = I$ the measure $P_0(x,\cdot)$ must be just a unit mass at the point $x$. $T_t f(x)$ is continuous in $x$ so the Feller property of $P_t$ comes automatically; this also implies that $P_t(\cdot,E)$ is measurable for each Borel set $E$. We have

$$\int_S P_{t+s}(x,du)\, f(u) = \int_S \Big\{ \int_S P_t(x,dy)\, P_s(y,du) \Big\} f(u)$$

by the semigroup property of $\{T_t\}$ and Fubini's theorem. Since this holds for all $f \in C(S)$, it follows*) that

$$P_{t+s}(x,E) = \int_S P_t(x,dy)\, P_s(y,E),$$

i.e., the Chapman-Kolmogorov equation is satisfied. Thus $P_t$ is a Feller transition function.

*) If $\mu$ and $\nu$ are finite Borel measures on a metric space $S$ and if $\int f\, d\mu = \int f\, d\nu$ for all $f \in C(S)$, then $\mu = \nu$. When $S$ is compact this is the uniqueness part of the Riesz theorem used above, but it is quite easy to prove it directly for any metric space.

We know that $T_t f \to f$ uniformly as $t \to 0+$ for $f \in C(S)$. Even if the convergence were only pointwise, this implies stochastic continuity for $P_t$ by proposition 1. If we rely on proposition 3 of the last section, uniform stochastic continuity follows automatically and so the proof that $P_t$ is normal is complete. Without using proposition 3 we can proceed as follows: define for $\varepsilon > 0$

$$f_\varepsilon(x,y) = \begin{cases} 1 - \dfrac{1}{\varepsilon}\,\rho(x,y) & \text{for } \rho(x,y) < \varepsilon; \\[4pt] 0 & \text{for } \rho(x,y) \ge \varepsilon. \end{cases}$$

By the triangle inequality we have, for any $x$ and $z$,

- f (z y)/ < p(x,z) max /f~(x,y)

f

f

Then for any

to x

=g

-hi

is unique.

Af - Af

=

g

at-

Repeating with $-f$ and $-g$, we obtain $\lambda \|f\| \le \|g\|$, which verifies the third condition necessary to apply the Hille-Yosida theorem. Hence we can conclude that the operator $A$ does in fact generate some contraction semigroup on $C(S)$; let us call it $\{T_t\}$.

Next, we know that the initial-value problem

$$\frac{du}{dt} = Au, \qquad u(0) = \mathbf{1}$$

has the unique bounded solution $u(t) = T_t \mathbf{1}$. We see that $u(t) = \mathbf{1}$ is also a solution, since $A\mathbf{1} = 0$ by (iv) above. Thus $T_t \mathbf{1} = \mathbf{1}$ for all $t$. It only remains to prove that $T_t$ is non-negative and then we can apply Theorem 1.

We first show that $R_\lambda = (\lambda I - A)^{-1}$ is non-negative. Suppose that $\lambda f - Af = g$ and that $g(x) < 0$ for all $x$. If $f$ attains its maximum at $x_0$, then $Af(x_0) \le 0$ and we have

$$\lambda f(x_0) = Af(x_0) + g(x_0) \le g(x_0) < 0,$$

so that $f < 0$. But $f = R_\lambda g$ and so this means that $g < 0$ implies $R_\lambda g < 0$; that is, the operators $R_\lambda$ preserve positivity.

The same property holds for $T_t$. In the proof of the Hille-Yosida theorem, we constructed the operators

$$T_t^{(\lambda)} = e^{-t\lambda} \sum_{n=0}^{\infty} \frac{(t\lambda^2 R_\lambda)^n}{n!}.$$

Clearly, since $R_\lambda$ is a positive operator the same holds for $T_t^{(\lambda)}$. Finally, since $T_t = \lim_{\lambda\to\infty} T_t^{(\lambda)}$, we conclude that $T_t$ is itself positive.

We have now shown that $A$ generates a semigroup $\{T_t\}$ and that this semigroup satisfies the hypotheses of Theorem 1. Hence $\{T_t\}$ is normal, and Theorem 2 is proved.
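The positivity argument in the proof can be watched in action on a finite state space. The sketch below is only an illustration under stated assumptions: the generator is taken to be a 3-by-3 matrix `G` (hypothetical data), the resolvent is a matrix inverse, and the Hille-Yosida approximants are evaluated by a truncated exponential series. Every term of that series is non-negative as soon as the resolvent is, which is exactly the mechanism used above.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def scale(a, c):
    """Multiply every entry of a by the scalar c."""
    return [[c * x for x in row] for row in a]


def mat_exp(m, terms=400):
    """Truncated power series for exp(m); 400 terms suffice for norms up to about 100."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[r + s for r, s in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    return result


def inverse(m):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(m)
    aug = [[float(x) for x in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def resolvent(g, lam):
    """R_lambda = (lambda I - G)^(-1)."""
    n = len(g)
    return inverse([[lam * (i == j) - g[i][j] for j in range(n)] for i in range(n)])


def yosida_semigroup(g, lam, t):
    """exp(-t lambda) * exp(t lambda^2 R_lambda), the approximant from the proof."""
    series = mat_exp(scale(resolvent(g, lam), t * lam * lam))
    return scale(series, math.exp(-t * lam))


def max_gap(a, b):
    """Largest entrywise difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Hypothetical generator: non-negative off the diagonal, row sums zero.
G = [[-2.0, 1.0, 1.0],
     [1.0, -1.0, 0.0],
     [0.0, 3.0, -3.0]]
```

For example, `yosida_semigroup(G, 200.0, 0.5)` has non-negative entries and unit row sums, and `max_gap` against `mat_exp(scale(G, 0.5))` shrinks as the parameter grows.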

Remark. In the definition of a Markov transition function, we assumed that $P_t(x,S) = 1$ for all $t, x$. It is sometimes useful to relax this by assuming only that $P_t(x,S) \le 1$. The appropriate changes in Theorems 1 and 2 are simply to delete the conditions that $T_t \mathbf{1} = \mathbf{1}$ or that $\mathbf{1} \in \mathscr{D}_A$ and $A\mathbf{1} = 0$, respectively. If $P_t(x,S) < 1$, we interpret the difference as the chance that the process, starting from $x$, has "vanished" from the state-space $S$ before time $t$. Sometimes one says instead that the process has been "killed."

We conclude this section with a result of less importance, but which comes in handy for discussing certain examples and will have more applications later on.

Theorem 3. Suppose that $A$ is the generator of a normal semigroup $\{T_t\}$ on $C(S)$, and suppose that for $f \in C(S)$

$$\lim_{t\to 0+} \frac{T_t f(x) - f(x)}{t} = g(x) \tag{1}$$

exists pointwise (for each $x \in S$). Then if $g$ is continuous, $f \in \mathscr{D}_A$; the convergence in (1) must be uniform, and $Af = g$.

Proof. Let $\tilde{\mathscr{D}} = \{f \in C(S)$: the limit (1) exists and is continuous$\}$, and write $\tilde{A}f = g$ when $f \in \tilde{\mathscr{D}}$. It is clear that $\tilde{A}$ is an extension of the generator $A$. Therefore $\lambda I - \tilde{A}$ is an extension of $\lambda I - A$ as well.

We will show that $\lambda I - \tilde{A}$ is one-to-one on $\tilde{\mathscr{D}}$. Suppose that $f \in \tilde{\mathscr{D}}$ and that $\lambda f - \tilde{A}f = 0$, and let $f$ attain its maximum at $x_0$. It is very easy to see that $\tilde{A}f(x_0) \le 0$, and so (because $\lambda f(x_0) = \tilde{A}f(x_0)$) we find that $f(x) \le 0$ for all $x$. Repeating the argument for $-f$, we find that $f = 0$ and so $\lambda I - \tilde{A}$ is one-to-one.

But $\lambda I - A$ maps $\mathscr{D}_A$ onto $C(S)$; $\lambda I - \tilde{A}$ extends it and is also one-to-one. Clearly, this implies that $\tilde{\mathscr{D}} = \mathscr{D}_A$.

8. Examples.

(a)

Suppose $S$ is finite (the case of Chapter 6, section 2). Then $F(S) = C(S)$ consists of column-vectors, and a transition function is a family of stochastic matrices satisfying $P(0) = I$, $P(t+s) = P(t)\cdot P(s)$, and $\lim_{t\to 0+} P(t) = I$. We define

$$T_t f(i) = \sum_j p_{ij}(t)\, f(j).$$

The semigroup $\{T_t\}$ has a generator $G$ whose domain is dense; since $C$ is finite-dimensional, a dense linear subspace must be $C$ itself. Hence

$$\lim_{t\to 0+} \frac{\sum_j p_{ij}(t)\, f(j) - f(i)}{t} = Gf(i)$$

exists for all $f$. Choosing $f(i) = \delta_{i i_0}$ for fixed $i_0$, we find that $p'_{ij}(0) = g_{ij}$ exists for each $i,j \in S$; this differentiability was proved in a different way in Chapter 6. It also follows from the general theory that $P(t) = \exp(tG)$.
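The relation between a generator matrix and its transition function is easy to explore numerically. The following is a minimal sketch, not a method from the text: the matrices `G` and `BAD` are made-up examples, and the matrix exponential is computed by a truncated power series. It checks that `exp(tG)` is stochastic and satisfies the semigroup property when `G` is non-negative off the diagonal with zero row sums, and that a negative off-diagonal entry produces a negative "probability" for small `t`.

```python
def mat_mul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, terms=60):
    """Truncated power series for exp(m); adequate for small matrices of moderate norm."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[r + s for r, s in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    return result


def transition_matrix(g, t):
    """P(t) = exp(tG)."""
    return mat_exp([[t * x for x in row] for row in g])


def is_generator(g, tol=1e-12):
    """True when g is non-negative off the diagonal and every row sums to zero."""
    n = len(g)
    off_diagonal_ok = all(g[i][j] >= 0 for i in range(n) for j in range(n) if i != j)
    row_sums_ok = all(abs(sum(row)) <= tol for row in g)
    return off_diagonal_ok and row_sums_ok


# A valid generator (assumed example data).
G = [[-3.0, 2.0, 1.0],
     [0.5, -0.5, 0.0],
     [1.0, 1.0, -2.0]]

# Row sums are zero, but the (0, 2) entry is negative.
BAD = [[-1.0, 2.0, -1.0],
       [0.0, 0.0, 0.0],
       [1.0, 0.0, -1.0]]
```

With these definitions, `transition_matrix(G, h)[0][1] / h` approaches the entry `G[0][1]` as `h` decreases, which is the differentiability statement above in matrix form.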

It is easy to use Theorem 2 above to verify that a matrix $G$ is the generator of a transition function iff it is non-negative off the main diagonal and has row-sums equal to 0. Indeed, any matrix $G$ satisfies (i) and (ii) of the theorem. Condition (iv) is clearly equivalent to row-sums being 0. If the entries of $G$ are also non-negative off the diagonal, it is clear that condition (iii) of Theorem 2 holds. Finally, suppose (iii) holds and consider the function (vector)

$$f(i) = \begin{cases} 1, & i = i_0, \\ -a, & i = i_1, \\ 0 & \text{otherwise.} \end{cases}$$

We take $a \ge 0$, so $f$ has its maximum at $i = i_0$. Then by (iii)

$$Gf(i_0) = g_{i_0 i_0} - a\, g_{i_0 i_1} \le 0.$$

Taking $a = 0$, we conclude that $g_{i_0 i_0} \le 0$. But if the non-diagonal term $g_{i_0 i_1}$ were negative, choosing a large enough $a$ would produce a contradiction. Hence $G$ does satisfy the description stated above, which agrees with the condition on generators derived earlier in Chapter 6.

(b)

The previous examples have all involved generators which are bounded operators. The simplest unbounded case is probably the deterministic motion with constant velocity $v$. We take $S = R^*$ and $P_t(x,\{x+vt\}) = 1$, so that $T_t f(x) = f(x+vt)$ for any $f \in F(S)$. Clearly, if $f$ is differentiable we have

$$\lim_{t\to 0+} \frac{T_t f(x) - f(x)}{t} = v f'(x),$$

and if $f' \in C(S)$ the convergence will be uniform according to Theorem 3 in the last section. (It is also easy to see it directly using the mean value theorem.) Hence $C_1 \subset \mathscr{D}_A$ where $C_1 = \{f: f, f' \in C(S)\}$, and for $f \in C_1$ we have $Af = vf'$. Note that $f'(\infty)$ must be 0 since $f$ is bounded, which agrees with the need to have $Af(\infty) = 0$ because the state "$\infty$" is a trap. (A state $x_0$ is called a trap in case $P_t(x_0,\{x_0\}) = 1$ for $t > 0$.)

We now will prove that $\mathscr{D}_A = C_1$. Since $\lambda I - A$ extends $\big(\lambda - v\frac{d}{dx}\big)$ and $\lambda I - A$ maps $\mathscr{D}_A$ onto $C$ in a one-to-one way, it is enough to show that $\lambda - v\frac{d}{dx}$ actually maps $C_1$ onto $C$. That is, for all large $\lambda$ and all $g \in C$ we must show that

$$\lambda f(x) - v f'(x) = g(x) \tag{1}$$

has a solution $f \in C_1$. The general solution of (1) is

$$f(x) = e^{\lambda x/v} \Big[ K - \frac{1}{v} \int_0^x e^{-\lambda u/v}\, g(u)\, du \Big],$$

where $K$ is an arbitrary constant. As $x \to +\infty$, the term in brackets must approach 0 if there is to be any chance for $f \in C(S)$ and so we have to choose

$$K = \frac{1}{v} \int_0^\infty e^{-\lambda u/v}\, g(u)\, du.$$

It is then easy to verify (do it) that, with this choice of $K$, $f$ actually is in $C_1$; hence (1) has a solution in $C_1$, and so $C_1 = \mathscr{D}_A$.
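The choice of the constant in example (b) can be checked numerically. With that choice, and assuming $v > 0$, the solution collapses to $f(x) = \frac{1}{v}\int_0^\infty e^{-\lambda s/v} g(x+s)\,ds$, which is the form used in the sketch below. The test function `bump`, the Simpson-rule helper and the cutoff of the infinite integral are illustrative assumptions, not part of the text.

```python
import math


def simpson(func, a, b, steps=8000):
    """Composite Simpson rule on [a, b]; steps is rounded up to an even number."""
    if steps % 2:
        steps += 1
    h = (b - a) / steps
    total = func(a) + func(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def resolvent_solution(g, lam, v, x, cutoff=40.0):
    """f(x) = (1/v) * integral over s >= 0 of exp(-lam*s/v) * g(x + s), for v > 0.

    The integral is truncated where exp(-lam*s/v) has dropped below exp(-cutoff).
    """
    upper = cutoff * v / lam
    return simpson(lambda s: math.exp(-lam * s / v) * g(x + s), 0.0, upper) / v


def residual(g, lam, v, x, h=1e-3):
    """lam*f(x) - v*f'(x) - g(x), with f' estimated by a central difference."""
    f = lambda y: resolvent_solution(g, lam, v, y)
    derivative = (f(x + h) - f(x - h)) / (2 * h)
    return lam * f(x) - v * derivative - g(x)


def bump(u):
    """A bounded continuous test function with limits at infinity (assumed example)."""
    return 1.0 / (1.0 + u * u)
```

A call such as `residual(bump, 2.0, 1.5, 0.4)` should be close to zero, and `2.0 * resolvent_solution(bump, 2.0, 1.5, x)` never exceeds the maximum of `bump`, in line with the contraction estimate for the resolvent.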

(c) As noted already, the Brownian motion transition function (page 125) is normal on $R^*$. We will now compute its generator. To begin with, we partly follow the method used in proving the theorem on page 130: if $f \in C_2 = \{f: f, f', f'' \in C(R^*)\}$ we write

$$\frac{T_t f(x) - f(x)}{t} = \frac{1}{t\sqrt{2\pi t}} \int_{-\infty}^{\infty} e^{-\frac{(y-x)^2}{2t}}\, [f(y) - f(x)]\, dy.$$

Then we use the Taylor expansion

$$f(y) - f(x) = f'(x)(y-x) + \tfrac12 f''(x)(y-x)^2 + r(x,y)(y-x)^2,$$

where $r(x,y) \to 0$ as $y \to x$, so that we have

$$\frac{T_t f(x) - f(x)}{t} = \tfrac12 f''(x) + \frac{1}{t\sqrt{2\pi t}} \int_{-\infty}^{\infty} r(x,y)(y-x)^2\, e^{-\frac{(y-x)^2}{2t}}\, dy. \tag{2}$$

Since $r(x,y)$ is bounded by $\max|f''|$ and tends to 0 near $y = x$, it is easy to show that

$$\lim_{t\to 0+} \frac{T_t f(x) - f(x)}{t} = \tfrac12 f''(x).$$

According to Theorem 3, since $f'' \in C$ we actually have uniform convergence, so that $f \in \mathscr{D}_A$ and $Af = \tfrac12 f''$. Thus $C_2 \subset \mathscr{D}_A$, and $A$ is an extension of $\tfrac12 (d^2/dx^2)$.

Actually, $C_2 = \mathscr{D}_A$.
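The pointwise limit that identifies the Brownian generator with half the second derivative can be observed numerically. The sketch below is an illustration under stated assumptions: the Gaussian integral is written as an expectation over a standard normal variable and evaluated by Simpson's rule on a truncated range, and the test function `f` with its hand-computed second derivative `f_second` is a made-up example.

```python
import math


def simpson(func, a, b, steps=4000):
    """Composite Simpson rule on [a, b]; steps is rounded up to an even number."""
    if steps % 2:
        steps += 1
    h = (b - a) / steps
    total = func(a) + func(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def heat_semigroup(func, t, x, width=10.0):
    """T_t func(x) = E[func(x + sqrt(t) * Z)] for standard normal Z, truncated to |Z| <= width."""
    root = math.sqrt(t)
    density = lambda z: math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    return simpson(lambda z: func(x + root * z) * density(z), -width, width)


def f(x):
    """Bounded C^2 test function with limits at infinity (assumed example)."""
    return 1.0 / (1.0 + x * x)


def f_second(x):
    """Second derivative of f, computed by hand."""
    return (6 * x * x - 2) / (1 + x * x) ** 3


def difference_quotient(t, x):
    """(T_t f(x) - f(x)) / t, which should approach f''(x) / 2 as t decreases to 0."""
    return (heat_semigroup(f, t, x) - f(x)) / t
```

For instance, `difference_quotient(1e-3, 0.5)` and `0.5 * f_second(0.5)` agree to within a few thousandths, the gap being of the order of `t` times the fourth derivative.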

The method of proof resembles example (b); we simply need to verify that the equation

$$\lambda f(x) - \tfrac12 f''(x) = g(x)$$

has a solution $f \in C_2$ for any $g \in C$, $\lambda > 0$. Changing notation slightly, we consider the equivalent equation $f'' - \lambda^2 f = g$. The general solution is

$$f(x) = K_1 e^{\lambda x} + K_2 e^{-\lambda x} + \frac{1}{\lambda} \int_0^x \sinh(\lambda x - \lambda u)\, g(u)\, du, \tag{3}$$

where $K_1$ and $K_2$ are arbitrary constants. We will leave it to the reader to verify that there is a solution $f \in C_2$; from this fact we can conclude that $\mathscr{D}_A = C_2$.

Problem 9. Show that the constants $K_1$ and $K_2$ in (3) are uniquely determined by the requirement that $f$ be bounded, and that the resulting function actually belongs to $C_2$.

(d)

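We give another derivation of the generator of the Brownian motion, using the resolvent. Because of the integration formula

$$\int_0^\infty e^{-\lambda t}\, \frac{1}{\sqrt{2\pi t}}\, e^{-\frac{u^2}{2t}}\, dt = \frac{1}{\sqrt{2\lambda}}\, e^{-\sqrt{2\lambda}\,|u|}$$

we have

$$R_\lambda g(x) = \int_{-\infty}^{\infty} g(y)\, \frac{1}{\sqrt{2\lambda}}\, e^{-\sqrt{2\lambda}\,|x-y|}\, dy.$$

Let us put $R_\lambda g = f$; by direct calculation it is easy to show that

$$f''(x) = -2g(x) + 2\lambda R_\lambda g(x).$$

But since $\lambda f - Af = g$ the last expression above equals $2Af(x)$, and we have proved that $\mathscr{D}_A \subset C_2$ and $Af = \tfrac12 f''$. This time the second derivative is seen to be an extension of $A$ instead of the reverse.

To prove $\mathscr{D}_A = C_2$, we must show that the solution $f$ of the equation $\lambda f - \tfrac12 f'' = g$ is unique. (Why?) But the general solution of $\lambda f - \tfrac12 f'' = 0$ is

$$f(x) = K_1 e^{\sqrt{2\lambda}\,x} + K_2 e^{-\sqrt{2\lambda}\,x},$$

which clearly is not bounded (and hence not in $C_2$) unless $K_1 = K_2 = 0$. Again we have the result $\mathscr{D}_A = C_2$.

(e)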

On page 126 we discussed Brownian motion on $R^+$ with a reflecting barrier at $x = 0$. Let us compute the generator of this process. If $x > 0$, essentially the same argument as before shows that when $f \in C_2$ we must have $Af(x) = \tfrac12 f''(x)$. At $\infty$, of course, $Af(\infty) = 0$. We will now consider the case $x = 0$. In fact, for $f \in C_2$ we again use a short Taylor series and get

$$\frac{T_t f(0) - f(0)}{t} = \frac{1}{t}\sqrt{\frac{2}{\pi t}} \int_0^\infty e^{-\frac{y^2}{2t}}\, [f(y) - f(0)]\, dy = \frac{f'(0)}{t}\sqrt{\frac{2}{\pi t}} \int_0^\infty y\, e^{-\frac{y^2}{2t}}\, dy + \tfrac12 f''(0) + o(1);$$

the error term "$o(1)$" is similar to that in (2). The integral in the first term can easily be calculated, and the result is simply $f'(0)\sqrt{2/(\pi t)}$. Hence the limit as $t \to 0$ cannot exist unless $f'(0) = 0$, and so this boundary condition is a necessary restriction on $\mathscr{D}_A$. If $f \in C_2$ and satisfies $f'(0) = 0$, then

$$\lim_{t\to 0+} \frac{T_t f(x) - f(x)}{t} = \tfrac12 f''(x)$$

exists pointwise with $f'' \in C$, and again Theorem 3 implies that the convergence is uniform so that

$$\mathscr{D}_A \supset C_2 \cap \{f: f'(0) = 0\}.$$

Once more, there is actually equality. The proof follows the same path as before: the equation $\lambda f - \tfrac12 f'' = g$ has a solution $f \in C_2 \cap \{f'(0) = 0\}$ for each $g$. We leave the details as an exercise:

Problem 10. For each $\lambda > 0$ and each $g \in C([0,\infty])$, show that $\lambda f - \tfrac12 f'' = g$ has a unique solution $f \in C_2$ which satisfies $f'(0) = 0$. Conclude that the domain of the generator $A$ of the reflecting Brownian motion is exactly $C_2 \cap \{f'(0) = 0\}$.

(f)

There are two other simple cases: we mention the results without writing down the corresponding transition functions. When the Brownian motion reaches $x = 0$ for the first time, instead of reflecting it may stick there forever; in this case 0 is a trap. Consequently, we must have $T_t f(0) = f(0)$ for all $t$ and all $f$, so that $Af(0) = 0$. But for $x > 0$, we still have $Af(x) = \tfrac12 f''(x)$. Since $Af$ is continuous, it follows that $f''(0) = 0$ if $f \in \mathscr{D}_A$. This is called the sticking barrier boundary condition. The domain of $A$ is now just $C_2 \cap \{f: f''(0) = 0\}$, and for functions in this space the formula for the generator is still $Af = \tfrac12 f''(x)$.

Alternatively, the process may terminate upon reaching 0 by the "killing" of the Brownian "particle." We will have $T_t f(0) = 0$, and so $f \notin \mathscr{D}_A$ unless $f(0) = 0$; the latter is the boundary condition for this case. this example involves $P_t(x,[0,\infty])$


0

$$= (1 - ct)\, f(0) + ct \int_0^\infty f(y)\, dF(y) + o(t),$$

where $c$ is the parameter in the exponential waiting-time distribution at $x = 0$. From this we would get

$$\lim_{t\to 0+} \frac{T_t f(0) - f(0)}{t} = c \int_0^\infty [f(y) - f(0)]\, dF(y).$$

*)See Chapter 8, section 4, for a proof.

But if $f \in \mathscr{D}_A$ and $Af(x) = \tfrac12 f''(x)$ for $x > 0$ the above limit must also equal $\tfrac12 f''(0)$. Thus we expect that

$$f''(0) = 2c \int_0^\infty [f(y) - f(0)]\, dF(y) \tag{5}$$

will be a necessary condition for $f \in \mathscr{D}_A$; in other words, (5) is the appropriate "boundary condition" for this process. The dependence of the condition on values of $f$ far away from the boundary $x = 0$ (unlike our previous boundary conditions) reflects the fact that the motion of the process is not continuous in this case when it reaches the boundary.

Problem 11. Show that the operator $Af = \tfrac12 f''$, defined on the domain $C_2([0,\infty]) \cap \{f$ satisfies (5)$\}$, is the generator of a normal Markov function.

9. Conclusion.

Here are three more problems, perhaps a little harder than most of the earlier ones:

Problem 12. Consider the Brownian motion with "sticking" boundary at $x = 1$ and reflecting boundary at $x = 0$. (I.e., $Af = \tfrac12 f''$ and $f \in \mathscr{D}_A$ iff $f''$ is continuous on $[0,1]$, $f'(0) = 0$ and $f''(1) = 0$.) Prove that, for each $x$, $P_t(x,\cdot)$ converges weakly as $t \to \infty$ to the unit measure concentrated at $x = 1$. (That is, $\int_0^1 P_t(x,dy)\, f(y) \to f(1)$ for every continuous function $f$.) This gives some justification for the name "sticking".

Problem 13. Suppose instead that the boundary at $x = 1$ is "sticky" -- the boundary condition is now $f'(1) + p f''(1) = 0$ for $0 < p < \infty$. In this case, show that $T_t f(x) \to f(1)$ does not hold for all $f$ as $t \to \infty$; the process will never become permanently stuck at $x = 1$.

Problem 14. Let $A$ be the generator of a normal transition function on the compact space $S$, and let $p \in C(S)$ be strictly positive. Show that the operator $B$ defined by

$$Bf(x) = p(x)\, Af(x),$$

so that $\mathscr{D}_B = \mathscr{D}_A$, is also the generator of a normal Markov function on $S$. (The corresponding new process is said to be obtained from the old one by a "random time change.")

A final remark. It is clear that semigroup theory has not made everything utterly simple -- that would be too much to expect. What it has done, I hope, is to put the peculiarities of particular Markov processes into a unified context, so that one knows what to try and do, and why. Now there are specific technical problems to deal with, instead of general confusion. I feel it's a big improvement!

An important paper, in which many of these things were done for the first time in a systematic way, was published by Feller in 1952 with the title "The parabolic differential equations and the associated semigroups of transformations." To read it (or part of it) is still a valuable experience.

CHAPTER 8

MARKOV PROCESSES

So far in the second part of this book we have studied Markov transition functions with only informal references to the random variables which actually form the processes themselves. We now turn to this neglected side of our subject. Of necessity, the discussion will have a more measure-theoretical flavor than hitherto. In fact the modern theory of Markov processes has become very complex, because it has been necessary to introduce a great deal of machinery to bridge the gap between intuition about how a process "without memory" should behave, on one hand, and what can be rigorously proved, on the other. Here we will try to keep this machinery as simple as possible, and we will introduce its components only gradually, as they are needed.

1. Construction of processes.

Suppose that $(S,\mathscr{S})$ is a measurable space and that $P_t$ is a Markov transition function on $S$. The idea behind the definition of a transition function suggests the following:

Definition. A stochastic process $\{x_t(\omega): t \ge 0\}$ with values in $S$, defined on some probability space $(\Omega, \mathscr{B}, P)$, is said to be governed by the transition function $P_t$ provided that for some probability measure $\mu$ on $(S,\mathscr{S})$ we have

$$P(x_{t_1} \in B_1, \ldots, x_{t_n} \in B_n) = \int_{y_n\in B_n} \cdots \int_{y_1\in B_1} \int_{x\in S} \mu(dx)\, P_{t_1}(x,dy_1)\, P_{t_2-t_1}(y_1,dy_2) \cdots P_{t_n-t_{n-1}}(y_{n-1},dy_n) \tag{1}$$

for all $0 < t_1 < t_2 < \cdots < t_n$
0, and

D

E~,


where, of course, each equality is to hold almost surely.

We

begin with the second equality in (1); the other part drops out along the way. It is clear that

is measurable with respect

pu(xt,O)

to

?=t' and hence with respect to a measurable function from S into

~t' since Pu (. ,0) [0,1). What must be

verified, therefore, is that for any set

fA p

u

(xt(w),O)dP = peA n

{X t

+u

is

A E~t C ~t; thus we have A E ~t' we conclude that

B E ~t

and the theorem is

proved.

If~: s(n) ...

Corollary. pect to

u(n),

and

J

l' f

R1

is measurable with res-

'" ( x t + "" ,x t + ) uI un

,

, 1n ' t egra hI e, 1S

then

This result is,of course, an extension of (5) above, and is proved in much the same way using that fact that ~(Xt+

ul

, ... ,X t +U

n

omit the details.

)

is measurable with respect to

~t;

we

An extension to an even larger class of functions is also possible, which amounts to letting $\varphi$ in (8) depend on infinitely many of the "future" random variables instead of a finite number. We won't stop to formulate this now; however, although I won't call it a "problem," it is a good idea for the reader to try to state and prove a result of this kind independently.

3. Path functions of Markov processes.

So far our study of Markov processes has been based entirely on equation (1) (page 182) which shows how the finite-dimensional distributions of the process are calculated from a Markov transition function. As we saw long ago, however, knowledge of all the finite-dimensional distributions may be insufficient to determine a random process as precisely as one may require. This difficulty arises also in the Markov case -- indeed, the example on page 4/5 is a Markov process and can be obtained from the normal transition function defined by $P_t(x,\{x\}) = 1$ for all $t > 0$ and $x \in [0,1]$. Thus the Markov property by itself does not imply that the path functions of a process must be "reasonable."

But as asserted in Chapter 1, the correct question to ask is a different one: given a transition function and an initial distribution, does there exist a process having the corresponding finite-dimensional distributions whose paths are almost all "nice" in some sense? Kolmogorov's existence theorem does not help much here, because if we use the space $\Omega = S^T$ and the field $\mathscr{B}$ generated by the cylinder sets of $\Omega$ as in proving that theorem we find that $\{\omega: x_t(\omega)$ is continuous$\}$ is not a measurable set. (The same holds for right-continuity, for being locally bounded, and for most other reasonable properties which we might want the paths of the process to have.) Nevertheless, under quite general conditions there does exist a Markov process with nice paths

equivalent*) to any given process:

Theorem 1. Suppose that $S$ is a compact metric space, and let $\{x_t;\ t \ge 0\}$ be a stochastic process with values in $S$ which is governed by a normal Markov transition function. Then there exists a process $\{\tilde{x}_t\}$, equivalent to $\{x_t\}$, such that except for an $\omega$-set of probability 0 the functions $\tilde{x}_{(\cdot)}(\omega)$ are right-continuous and have left-hand limits for all $t > 0$.

We will refer to such a process for short as a "right-continuous Markov process," although more than just right-continuity is involved. Since equivalent processes have the same finite-dimensional distributions, $\{\tilde{x}_t\}$ is governed by the same transition function as $\{x_t\}$ and has also the same initial distribution. This theorem will be fundamental for our further study of Markov processes. Its proof is rather a long story, since it uses the theory of "martingales" and "super-martingales" which we have not yet discussed. (They have many other uses as well.) We will therefore give a proof of Theorem 1 later, in Chapter 10 which is devoted to martingale theory and some of its applications. For now we will use the theorem without proof -- or, perhaps more logically, we can simply assume that we are dealing with right-continuous processes, while secretly relying on Theorem 1 for assurance that the objects under discussion do really exist.

*)Recall that processes $\{x_t\}$ and $\{y_t\}$ on the same probability space and parameter set $T$ are equivalent provided that $P(x_t = y_t) = 1$ for all $t \in T$.

It is naturally interesting and important to discover when the paths of $\{x_t\}$ are actually continuous for all $t$. The rest of this section is devoted to establishing a useful sufficient condition for path-continuity, and applying it to some of the examples we have considered earlier. The results below have evolved from ideas of Dynkin, Kinney and Dobrushin.

Theorem 2.

{xt(w); t ~ O}

Let

be a measurable*) ran-

dom process with values in a metric space for each

E

and

> 0

M
e;}, then we


0, then



t

~

0)

= 1.

(5)

We know that right-continuity of its paths

implies that the process

{x t }

is measurable (page 6).

It is easy to see that condition (4) for the transition

func-

tion implies (1); in fact

where

~

is the initial distribution, 'and if (4) holds uni-

formly (1) follows immediately.

Hence (2) holds, and since by assumption the process has no discontinuities except jumps we get (5).

The last result gives considerable justification for the idea that conditions like (5) on page 129 are closely related to continuity of paths; we see that it is actually sufficient for continuity if that condition holds uniformly in $x$.

Problem 2. Show that there exists a version of the Brownian-motion (Wiener) process which has almost-surely continuous path functions.

*)Remember that by "right-continuous process" we mean that the paths have limits from the left as well.

It is usually difficult to verify condition (4) directly, since it is rather exceptional when any simple formula for the transition probability function is available. By making stronger assumptions, however, we can give a useful criterion in terms of the generator of the process:

Theorem 3. Let $\{x_t\}$ be a right-continuous Markov process governed by a normal transition function. Suppose that, for each $\varepsilon > 0$ and $x \in S$, there exists $f \in \mathscr{D}_A$ such that (i) $f \ge 0$ on $S$, (ii) $f(y) > 0$ for $y \notin N_\varepsilon(x)$, (iii) $f$ and $Af$ both vanish on some neighborhood of $x$. Then

$$P(x_t(\omega) \text{ is continuous for all } t \ge 0) = 1.$$

Proof. Suppose that condition (4) of the corollary above does not hold. Then there is an $\varepsilon > 0$ such that $P_h(x, S - N_{2\varepsilon}(x))$ is not $o(h)$ uniformly in $x$; in other words, for this $\varepsilon$ there exist $\delta > 0$, $h_n \to 0$ and $x_n \in S$ such that

$$P_{h_n}(x_n, S - N_{2\varepsilon}(x_n)) > \delta h_n. \tag{6}$$

Since $S$ is compact, we can assume that $x_n$ tends to some limit $x$; then when $\rho(x_n,x) < \varepsilon$ we will have $S - N_\varepsilon(x) \supset S - N_{2\varepsilon}(x_n)$. Hence (6) implies that

$$P_{h_n}(x_n, S - N_\varepsilon(x)) > \delta h_n \tag{7}$$

for all large $n$. Now using this $x$ and $\varepsilon$, consider the function $f$ described in the hypothesis of the theorem. Since $f$ is positive on the compact set $S - N_\varepsilon(x)$, let $c > 0$ denote its minimum there. By (7) we have

$$T_{h_n} f(x_n) \ge \int_{S - N_\varepsilon(x)} P_{h_n}(x_n,dy)\, f(y) > c\,\delta\, h_n \tag{8}$$

for large $n$, when $x_n$ is close to $x$. But, since $f \in \mathscr{D}_A$, the convergence

$$\lim_{t\to 0} \Big[ \frac{T_t f(y) - f(y)}{t} - Af(y) \Big] = 0$$

is uniform in $y$. Substituting $t = h_n$ and $y = x_n$, and noting that $f(x_n) = Af(x_n) = 0$ for large $n$, we find a contradiction with (8). If such functions $f \in \mathscr{D}_A$ exist as postulated, therefore, (8) is impossible and so we see condition (4) does hold. The conclusion of Theorem 3 therefore follows from the corollary to Theorem 2 above.

This theorem is easy to apply in many cases.

We have, for example, the following:

Corollary. Let $\{x_t\}$ be a right-continuous Markov process on $R^{1*}$ or $R^{+*}$ whose generator is of the form

$$Af(x) = \frac{b(x)}{2}\, f''(x) + a(x)\, f'(x)$$

with $a(x)$ and $b(x)$ continuous and $b(x) > 0$. (See Chapter 7, section 8i.) Suppose the domain of $A$ is $C_2$ restricted by boundary conditions such as

at each finite end-point $a_i$ of the state-interval. Then $\{x_t\}$ has continuous paths with probability 1.

Proof. As an example, suppose $S = R^{+*}$, and consider at first $x \in (0,\infty)$. If $f$ vanishes in a neighborhood of $x$ so does $Af$, since $A$ is a differential operator. Thus it is only necessary to find a $C_2$ function satisfying the boundary condition at 0 which vanishes near $x$ and is positive elsewhere; there is no difficulty at all in doing this. The cases $x = 0$ and $x = \infty$ must be examined separately but they are just as simple. The reader can think over the details of the various cases more easily than they can be written down.

To understand the working of our criterion for continuity a little better, it is useful to think through why it does not hold in the case of example 8j; even though $A$ is still a differential operator in that case, the boundary condition is no longer of local character. Another example: when is the condition satisfied in the case (example 8a) of a finite Markov chain?

4. Holding times.

Suppose that a Markov process $\{x_t\}$ on $S$ begins in state $x$. How long does it stay at $x$ before leaving for the first time? In other words, if $\tau_x = \inf\{t \ge 0: x_t \ne x\}$, what can be said about the distribution $F_x(t) = P_x(\tau_x \le t)$?

The first thing to notice is that in general $\tau_x$ may not be a random variable. In fact, if $S = R^1$ and $\Omega$, $\mathscr{B}$, and the random variables $\{x_t\}$ are those constructed in the proof of Kolmogorov's existence theorem, $\tau_x$ is definitely not $\mathscr{B}$-measurable. This melancholy situation is redeemed by Theorem 1 above, however, since when $\{x_t\}$ is right-continuous we have

8.

196

for all

s < t}

for all rational

s < t}

{W: x

= {W:

x s .. x

MARKOV PROCESSES

.. x

s

x}. n s

O',

At the other extreme,

a trap.

Using the right continuity of the paths, we have

4.

Holding times

f(t)

197

= -ni t

s

n ....

00

lim

n ....

Pt/n (x, {x})

O,l, ... ,n-l)

i

n-l

00

Thus the theorem will be proved as soon as we establish the following:

Lemma. Suppose that $a(t)$ is a function on $[0,\infty)$ with values in $[0,1]$ such that $\lim_{n\to\infty} a(t/n)^n = f(t)$ exists for every $t > 0$, where $f(t)$ is non-increasing. Then $f(t) = e^{-ct}$ for some constant $c \in [0,\infty]$.

Proof. Formally, for any $\alpha > 0$ we can simply write

$$f(\alpha t) = \lim_{n\to\infty} a\Big(\frac{\alpha t}{n}\Big)^{n} = \lim_{m\to\infty} \Big[ a\Big(\frac{t}{m}\Big)^{m} \Big]^{\alpha} = f(t)^{\alpha}.$$

The only difficulty with this is that making the change of variable $n = \alpha m$ may lead to non-integral values of $m$. If $\alpha$ is rational, however, $m$ will be an integer infinitely often, and so the limit relation $f(\alpha t) = f(t)^\alpha$ is justified in this case. Since $f$ is monotonic the relation then must hold for all $\alpha$, and so fixing $t$ and writing $s$ for $\alpha t$ and $c$ for $-\log f(t)/t$ we get $f(s) = \exp(-cs)$ as claimed.
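The lemma can be illustrated on a case where everything is explicit. As an assumption made only for this sketch, take $a(t)$ to be the probability that a two-state Markov chain started in state 0 is again in state 0 at time $t$; the rates `LEAVE_RATE` and `RETURN_RATE` and the function names are hypothetical. Then $a(t/n)^n$ should approach the exponential holding-time tail with constant equal to the rate of leaving state 0.

```python
import math

LEAVE_RATE = 1.5    # assumed rate of jumping from state 0 to state 1
RETURN_RATE = 0.5   # assumed rate of jumping from state 1 back to state 0


def stay_probability(t):
    """a(t): probability of being in state 0 at time t for the chain started at 0."""
    total = LEAVE_RATE + RETURN_RATE
    return (RETURN_RATE + LEAVE_RATE * math.exp(-total * t)) / total


def discretized_holding(t, n):
    """a(t/n)^n, the quantity whose limit the lemma identifies."""
    return stay_probability(t / n) ** n


def limit_value(t):
    """exp(-c t) with c equal to the rate of leaving state 0."""
    return math.exp(-LEAVE_RATE * t)
```

Comparing `discretized_holding(1.0, n)` with `limit_value(1.0)` for increasing `n` shows the gap shrinking roughly like `1/n`.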

This proves the lemma and the theorem. The discussion so far has concerned only the first holding time in

x, assuming

x

is the initial state.

The

result can be strengthened by using the Markov property: Theorem 2. governed by where

AE

Let

be a right-continuous process

-----

Pt' and suppose that ~t'

for

P~({Xt

= x} n

A)

>

0

Then t u is deter-

that

We must have

mined by n Z + k'

{Ti' i > Z}

and must produce the values

tl - u, t1 + t z

at times

-

that (Z) defines the original path. peNt

1

= k'

u

k'

and

in just the same way

Thus we have

+ I, N = k' + 1 + n 2 ) tl+t z

Substituting from (3) for the probability in the integrand (because of the induction hypothesis), this becomes k' k' nZ A (tl-u) -HZ (Hz) } -----e du k"•

e

which agrees with (3). k

=

-At

1

n Z·'

(At l )

k' +1

(k' + 1) !

e

-At z (At Z)

nZ

Thus (3) holds in all cases

where

Z.

Formula (3) for general $k$ can now be established by an induction on $k$ in just the same way we passed above from $k = 1$ to $k = 2$. The details of this induction step will be omitted.

There is nothing very surprising, incidentally, about this dual way of describing the Poisson process, either as a Markov process with Poisson-distributed independent increments or as a "counting" of the number of exponentially distributed waiting times which have elapsed. One can see the same phenomenon in a simple model using discrete Bernoulli trials.
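The "counting" description of the Poisson process can be tried out by simulation. The following sketch counts how many independent exponential waiting times have elapsed by time $t$ and compares the empirical distribution of that count with the Poisson probabilities. The rate, the time, the number of trials, the seed and all function names are assumptions chosen here for illustration.

```python
import math
import random

def count_by_waiting_times(lam, t, rng):
    # Number of exponential(lam) waiting times that have fully elapsed by time t.
    elapsed, count = rng.expovariate(lam), 0
    while elapsed <= t:
        count += 1
        elapsed += rng.expovariate(lam)
    return count

def empirical_pmf(lam, t, trials, seed=0):
    # Relative frequencies of the count over independent simulated paths.
    rng = random.Random(seed)
    counts = {}
    for _ in range(trials):
        n = count_by_waiting_times(lam, t, rng)
        counts[n] = counts.get(n, 0) + 1
    return {n: c / trials for n, c in counts.items()}

lam, t = 2.0, 1.5
pmf = empirical_pmf(lam, t, trials=100_000)
for k in range(6):
    exact = math.exp(-lam * t) * (lam * t) ** k / math.factorial(k)
    print(k, round(pmf.get(k, 0.0), 4), round(exact, 4))
```

A simulation of this kind only checks the one-dimensional distribution of the count at a single time; it says nothing by itself about the independence of increments.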

Suppose for each $n = 1, 2, \dots$ that $X_n = 1$ or $0$ with probability $p$ or $q = 1 - p$, and that the $\{X_n\}$ are independent; then set $N_n = \sum_{i=1}^{n} X_i$ (with $N_0 = 0$). The distribution of $N_n$ is obviously binomial for each $n$, and $\{N_n\}$ has independent increments. On the other hand, the waiting times between occurrences of a "one" are independent and have a common geometric distribution, and the relation of $N_n$ to the number of such waiting times which have elapsed before time $n$ is analogous to our construction of the Poisson process. Moreover, if $p \to 0$ and time is measured in large units by setting $n = [\lambda t/p]$, the discrete model approaches the Poisson process with parameter $\lambda$ as a limit.

Problem 4. Find a way to formulate rigorously the above statement about the approximation of the discrete process to the Poisson process, and then prove your assertion.

In much the same way, using the idea of waiting times in each state, one can directly construct the paths of any finite-state Markov chain (Chapter 6) and lots of more complicated processes as well.
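The Bernoulli approximation mentioned above can be explored numerically before attempting a rigorous formulation. The sketch below compares the binomial distribution of the count after $[\lambda t/p]$ trials with the Poisson distribution of mean $\lambda t$ for several small values of $p$. It looks only at one fixed time and at the largest pointwise gap between the two probability functions, which is one possible reading of "approaches" and is an assumption made here; the function names are hypothetical.

```python
import math

def binomial_pmf(k, n, p):
    # P(N_n = k) when N_n is a sum of n independent Bernoulli(p) variables.
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def poisson_pmf(k, mean):
    # P(N = k) for a Poisson variable with the given mean.
    return math.exp(-mean) * mean ** k / math.factorial(k)

def max_gap(lam, t, p, kmax=20):
    # Largest difference between the two probability functions for k = 0..kmax,
    # with n taken as the integer part of lam * t / p.
    n = math.floor(lam * t / p)
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam * t))
               for k in range(kmax + 1))

for p in (0.1, 0.01, 0.001):
    print(p, max_gap(lam=2.0, t=1.5, p=p))  # the gap shrinks as p decreases
```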

But of course this won't work for Brownian motion, since in that case $P_x(x_t = x) = 0$ for any $t > 0$. A very nice direct construction of Brownian motion as the sum of an infinite series of functions of $t$ with random coefficients is described in Chapter 4 of my book Probability (I did not discover it!); Wiener's original construction used random Fourier series. We won't go further with such things in this book. In my opinion, though, comparison of special methods with general ones is often productive and almost always interesting, and no one can "really understand" probability theory without doing some of that for herself.

CHAPTER 9

STRONG MARKOV PROCESSES

Suppose that $\{x_t\}$ is a Markov process. If it is known that $x_{t_0} = x$, the process can be thought of as "beginning afresh" thereafter as though $x$ had been its initial state. In particular, we have seen that

[...]    (i)

But suppose that $t_0$ is not fixed, but instead is itself a random variable. Does any analogous statement hold in this case?

It is quite natural to expect that something similar to (i) should be true. For example, if $t_0$ is the random moment when a continuous Brownian motion first reaches the state $x = a$ (having started elsewhere), then it is plausible and true that the motion behaves subsequently as though it were beginning in state $a$ with no "memory" of its past. On the other hand, if $t_0$ means a certain fixed length of time before the process reaches $a$, then clearly (i) does not hold.
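The contrast between these two kinds of random time can be seen in a discrete analogue. The sketch below replaces Brownian motion by a simple symmetric random walk started at 0 (a substitution made here purely for illustration) and lets $T$ be the first time the walk reaches $a = 3$. It records the step taken immediately after time $T$ and the step taken immediately after time $T - 1$, the latter being a time fixed one unit before the walk reaches $a$. All names, the seed and the parameter values are assumptions.

```python
import random

def walk_until_hit(a, max_steps, rng):
    # Simple symmetric random walk from 0. Returns (steps, T), where T is the
    # first time the walk equals a, or None if a is not reached in max_steps.
    steps, pos, hit = [], 0, None
    for n in range(1, max_steps + 1):
        s = rng.choice((-1, 1))
        steps.append(s)
        pos += s
        if hit is None and pos == a:
            hit = n
        elif hit is not None:
            break  # exactly one step beyond time T has now been recorded
    return steps, hit

rng = random.Random(1)
after_T, after_T_minus_1 = [], []
for _ in range(5000):
    steps, T = walk_until_hit(a=3, max_steps=200, rng=rng)
    if T is not None and T < len(steps):
        after_T.append(steps[T])              # step taken from time T to T + 1
        after_T_minus_1.append(steps[T - 1])  # step taken from time T - 1 to T

frac_up_after_T = sum(s == 1 for s in after_T) / len(after_T)
frac_up_after_T_minus_1 = sum(s == 1 for s in after_T_minus_1) / len(after_T_minus_1)
print(frac_up_after_T, frac_up_after_T_minus_1)
```

After $T$ the walk moves up about half the time, as a walk freshly started at $a$ would; after $T - 1$ it always moves up, because that step is the one which reaches $a$, so the walk does not "begin afresh" there.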


The tasks of the present chapter are to formulate precisely this idea of beginning afresh at "suitable" random times, to show that many processes do satisfy the resulting "strong Markov property," and to explore some of the consequences. In the following pages, of course, we will only make a beginning with these matters, which lead into much of the recent research on Markov processes. In particular, section 6 is a brief introduction to a substantial chapter in contemporary probability theory. Some suggestions for further reading can be found in the bibliography.

1. Families of $\sigma$-fields.

We have previously defined and discussed the Markov property using the $\sigma$-fields $\mathscr{F}_{\le t}$, $\mathscr{F}_{=t}$ and $\mathscr{F}_{\ge t}$ (page 9) which are generated by the random variables of a process $\{x_t\}$.

A small change in our definition and point of view turns out to be quite useful:

Definition 1. Let $(\Omega, \mathscr{F}, P)$ be a probability space, let $T$ be a subset of $R^1$, and suppose that, for each $t \in T$, there is a sub-$\sigma$-algebra $\mathscr{F}_t$ of $\mathscr{F}$ such that $\mathscr{F}_{t_1} \subset \mathscr{F}_{t_2}$ when $t_1 \le t_2$. Let $\{x_t\}$ be a random process on $(\Omega, \mathscr{F}, P)$. Then $\{x_t\}$ is said to be adapted to the fields $\{\mathscr{F}_t\}$ provided that the function $x_t(\cdot)$ is $\mathscr{F}_t$-measurable for every $t \in T$.

Naturally every process $\{x_t\}$ is adapted to its own past, that is, to the fields $\mathscr{F}_{\le t}$, and these fields are the smallest possible ones. In fact $\{x_t\}$ is adapted to any non-decreasing family of fields such that $\mathscr{F}_t \supset \mathscr{F}_{\le t}$ for all $t$, and to no others. However, important aspects of the relationship between the variables and the fields may be distorted if the latter are "too big." We will discuss this idea a little in the case of Markov processes.

Definition 2. Let $\{x_t\}$ be a random process on $(\Omega, \mathscr{F}, P)$ with $T = R^+$ or $R^1$, and let $\{\mathscr{F}_t\}$ be an increasing family of sub-$\sigma$-fields of $\mathscr{F}$. We say that $\{x_t\}$ has the Markov property with respect to the fields $\{\mathscr{F}_t\}$ provided that (i) $\{x_t\}$ is adapted to $\{\mathscr{F}_t\}$ and (ii)

[...]  (a.s.)    (1)

for every $t \in T$ and every set $B \in \mathscr{F}_{\ge t}$.

This definition reduces to our earlier one if $\mathscr{F}_t = \mathscr{F}_{\le t}$.
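The notion of a process adapted to an increasing family of fields can be made concrete on a finite sample space, where a field corresponds to a partition into atoms and measurability means being constant on each atom. The sketch below assumes a sample space of three coin tosses, with the field at time $t$ generated by the first $t$ tosses; this model and every name in it are illustrative assumptions rather than anything defined in the text.

```python
from itertools import product

OMEGA = list(product((0, 1), repeat=3))  # outcomes of three coin tosses

def blocks(t):
    # Atoms of the field generated by the first t tosses:
    # outcomes are grouped by their common prefix of length t.
    groups = {}
    for w in OMEGA:
        groups.setdefault(w[:t], []).append(w)
    return list(groups.values())

def is_measurable(x, t):
    # On a finite space, x is measurable with respect to the field at time t
    # exactly when it is constant on every atom of that field.
    return all(len({x(w) for w in block}) == 1 for block in blocks(t))

def is_adapted(process, times):
    # process(t) returns the random variable x_t as a function on OMEGA.
    return all(is_measurable(process(t), t) for t in times)

partial_sums = lambda t: (lambda w: sum(w[:t]))     # x_t uses tosses 1..t only
look_ahead = lambda t: (lambda w: sum(w[:t + 1]))   # x_t peeks at toss t + 1

print(is_adapted(partial_sums, range(3)), is_adapted(look_ahead, range(3)))
# prints: True False
```

The partial sums are adapted because each depends only on the tosses already made, while the second process fails because its value at time $t$ requires a toss not yet recorded in the field at time $t$.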

[...]

The set in brackets on the right is in $\mathscr{F}_{a+m^{-1}}$, and so the limit as $m \to \infty$ belongs to $\mathscr{F}_{a+}$ or, because of right-continuity, to $\mathscr{F}_a$. This proves that $A \in \mathscr{F}_T$, so $\bigcap_n \mathscr{F}_{T_n} \subset \mathscr{F}_T$.

(vii) Let $\{x_t\}$ be a right-continuous Markov process on some metric space $S$, where the Markov property holds with respect to fields $\{\mathscr{F}_t\}$. Let $U \subset S$ be an open set and define $T = \inf\{t \ge 0 : x_t \in U\}$ (the first-passage time to $U$). Then $T$ is a stopping time of $\{\mathscr{F}_{t+}\}$.

Proof. $\{\omega : T < a\} = \{\omega : x_t \in U \text{ for some } t < a\} = \{\omega : x_t \in U \text{ for some rational } t < a\}$. Obviously this set is in $\mathscr{F}_a$.

(viii) [...]
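The idea behind the first-passage time argument, that whether $T < a$ can be decided from the path before time $a$ alone, has a simple discrete-time analogue. The sketch below works with a path observed at integer times, which sidesteps the right-continuity and rational-time considerations needed in continuous time; the set $U = (1, \infty)$, the sample path and the function names are assumptions made here for illustration.

```python
def first_passage_index(path, in_U):
    # Smallest index n with path[n] in U; None if the path never enters U.
    for n, x in enumerate(path):
        if in_U(x):
            return n
    return None

def passage_before(path, in_U, a):
    # The event {T < a}, decided from path[0], ..., path[a - 1] only.
    return any(in_U(x) for x in path[:a])

in_U = lambda x: x > 1.0           # U = (1, infinity), an open set
path = [0.0, 0.4, 0.9, 1.3, 0.7, 2.0]
altered = path[:4] + [9.9, -5.0]   # same values before time 4, different afterwards

T = first_passage_index(path, in_U)
print(T, passage_before(path, in_U, 4), passage_before(altered, in_U, 4))
# prints: 3 True True
```

Changing the path from time 4 onward leaves the answer to "is $T < 4$?" unchanged, which is the discrete counterpart of the event $\{T < a\}$ depending only on the process before time $a$.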

[...] $p(x_1, \dots, x_n)$, and assume for [...] everywhere in $R^n$. (This can easily be generalized.) In elementary probability, one defines the conditional density of $X_n$ given $X_1, \dots, X_{n-1}$ to be the function [...]

It is an important exercise to verify that if $E(|X_n|) < \infty$ then the conditional expectation of $X_n$ as defined above can be computed in this "elementary" way:

$$E(X_n \mid X_1, \dots, X_{n-1}) = \frac{\int_{-\infty}^{\infty} x\, p(X_1, \dots, X_{n-1}, x)\, dx}{\int_{-\infty}^{\infty} p(X_1, \dots, X_{n-1}, x)\, dx}.$$
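The "elementary" computation of a conditional expectation as a ratio of two integrals of the joint density can be tested on a case where the answer is known. The sketch below assumes $n = 2$ and a standard bivariate normal density with correlation $\rho$, for which the conditional expectation of $X_2$ given $X_1 = x_1$ is $\rho x_1$; the density, the integration range, the midpoint rule and the names are all illustrative choices made here.

```python
import math

def joint_density(x1, x2, rho):
    # Standard bivariate normal density with correlation rho.
    z = (x1 * x1 - 2.0 * rho * x1 * x2 + x2 * x2) / (1.0 - rho * rho)
    return math.exp(-0.5 * z) / (2.0 * math.pi * math.sqrt(1.0 - rho * rho))

def conditional_mean(x1, rho, lo=-12.0, hi=12.0, steps=24000):
    # Ratio of midpoint-rule approximations to
    #   integral of x * p(x1, x) dx   and   integral of p(x1, x) dx.
    h = (hi - lo) / steps
    num = den = 0.0
    for j in range(steps):
        x = lo + (j + 0.5) * h
        p = joint_density(x1, x, rho)
        num += x * p
        den += p
    return num / den

print(conditional_mean(0.8, 0.6), 0.6 * 0.8)  # numerical ratio against rho * x1
```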