Dynamic Programming 9781400835386

This classic book is an introduction to dynamic programming, presented by the scientist who coined the term and develope


English, 392 [372] pages, 2021


DYNAMIC PROGRAMMING

BY RICHARD BELLMAN

With a new introduction by Stuart Dreyfus

PRINCETON UNIVERSITY PRESS PRINCETON AND OXFORD

Copyright © 1957 by Princeton University Press
New introduction © 2010 by Princeton University Press
Published by Princeton University Press, 41 William Street, Princeton, New Jersey 08540
In the United Kingdom: Princeton University Press, 6 Oxford Street, Woodstock, Oxfordshire OX20 1TW
press.princeton.edu

First edition, 1957
First Princeton Landmarks in Mathematics edition, with a new introduction, 2010

Library of Congress Control Number 2009943155
ISBN 978-0-691-14668-3

Printed on acid-free paper.
Printed in the United States of America
1 3 5 7 9 10 8 6 4 2

To Betty-Jo whose decision processes defy analysis

Contents

INTRODUCTION TO THE 2010 EDITION xv
PREFACE xix

CHAPTER I. A MULTI-STAGE ALLOCATION PROCESS

1.1 Introduction 3
1.2 A multi-stage allocation process 4
1.3 Discussion 5
1.4 Functional equation approach 7
1.5 Discussion 9
1.6 A multi-dimensional maximization problem 10
1.7 A "smoothing" problem 10
1.8 Infinite stage approximation 11
1.9 Existence and uniqueness theorems 12
1.10 Successive approximations 16
1.11 Approximation in policy space 16
1.12 Properties of the solution—I: Convexity 19
1.13 Properties of the solution—II: Concavity 20
1.14 Properties of the solution—III: Concavity 22
1.15 An "ornery" example 25
1.16 A particular example—I 26
1.17 A particular example—II 28
1.18 Approximation and stability 29
1.19 Time-dependent processes 30
1.20 Multi-activity processes 31
1.21 Multi-dimensional structure theorems 33
1.22 Locating the unique maximum of a concave function 34
1.23 Continuity and memory 37
1.24 Stochastic allocation processes 38
1.25 Functional equations 39
1.26 Stieltjes integrals 40
Exercises and research problems 40
Bibliography and comments 59


CHAPTER II. A STOCHASTIC MULTI-STAGE DECISION PROCESS

2.1 Introduction 61
2.2 Stochastic gold-mining 61
2.3 Enumerative treatment 62
2.4 Functional equation approach 63
2.5 Infinite stage approximation 63
2.6 Existence and uniqueness 64
2.7 Approximation in policy space and monotone convergence 65
2.8 The solution 66
2.9 Discussion 69
2.10 Some generalizations 69
2.11 The form of f(x, y) 71
2.12 The problem for a finite number of stages 72
2.13 A three-choice problem 74
2.14 A stability theorem 76
Exercises and research problems 77
Bibliography and comments 79

CHAPTER III. THE STRUCTURE OF DYNAMIC PROGRAMMING PROCESSES

3.1 Introduction 81
3.2 Discussion of the two preceding processes 81
3.3 The principle of optimality 83
3.4 Mathematical formulation—I. A discrete deterministic process 83
3.5 Mathematical formulation—II. A discrete stochastic process 85
3.6 Mathematical formulation—III. A continuous deterministic process 86
3.7 Continuous stochastic processes 87
3.8 Generalizations 87
3.9 Causality and optimality 87
3.10 Approximation in policy space 88
Exercises and research problems 90
Bibliography and comments 115


CHAPTER IV. EXISTENCE AND UNIQUENESS THEOREMS

4.1 Introduction 116
4.2 A fundamental inequality 117
4.3 Equations of type one 119
4.4 Equations of type two 121
4.5 Monotone convergence 122
4.6 Stability theorems 123
4.7 Some directions of generalization 124
4.8 An equation of the third type 125
4.9 An "optimal inventory" equation 129
Exercises and research problems 132
Bibliography and comments 151

CHAPTER V. THE OPTIMAL INVENTORY EQUATION

5.1 Introduction 152
5.2 Formulation of the general problem 153
    A. Finite total time period 154
    B. Unbounded time period—discounted cost 156
    C. Unbounded time period—partially expendable items 156
    D. Unbounded time period—one period lag in supply 156
    E. Unbounded time period—two period lag 157
5.3 A simple observation 157
5.4 Constant stock level—preliminary discussion 158
5.5 Proportional cost—one-dimensional case 159
5.6 Proportional cost—multi-dimensional case 164
5.7 Finite time period 166
5.8 Finite time—multi-dimensional case 169
5.9 Non-proportional penalty cost—red tape 169
5.10 Particular cases 171
5.11 The form of the general solution 171
5.12 Fixed costs 172
5.13 Preliminaries to a discussion of more complicated policies 173
5.14 Unbounded process—one period time lag 173
5.15 Convex cost function—unbounded process 176
Exercises and research problems 178
Bibliography and comments 182


CHAPTER VI. BOTTLENECK PROBLEMS IN MULTI-STAGE PRODUCTION PROCESSES

6.1 Introduction 183
6.2 A general class of multi-stage production problems 184
6.3 Discussion of the preceding model 187
6.4 Functional equations 188
6.5 A continuous version 189
6.6 Notation 191
6.7 Dynamic programming formulation 192
6.8 The basic functional equation 192
6.9 The resultant nonlinear partial differential equation 193
6.10 Application of the partial differential equation 193
6.11 A particular example 194
6.12 A dual problem 197
6.13 Verification of the solution given in §10 200
6.14 Computational solution 202
6.15 Nonlinear problems 203
Exercises and research problems 204
Bibliography and comments 205

CHAPTER VII. BOTTLENECK PROBLEMS: EXAMPLES

7.1 Introduction 207
7.2 Preliminaries 209
7.3 Delta-functions 211
7.4 The solution 211
7.5 The modified w solution 214
7.6 The equilibrium solution 215
7.7 A short-time w solution 217
7.8 Description of solution and proof 218
Bibliography and comments 221

CHAPTER VIII. A CONTINUOUS STOCHASTIC DECISION PROCESS

8.1 Introduction 222
8.2 Continuous versions—I: A differential approach 223
8.3 Continuous versions—II: An integral approach 224
8.4 Preliminary discussion
8.5 Mixing at a point
8.6 Reformulation of the gold-mining process
8.7 Derivation of the differential equations
8.8 The variational procedure
8.9 The behavior of Kt
8.10 The solution for T = ∞
8.11 Solution for finite total time
8.12 The three-choice problem
8.13 Some lemmas and preliminary results
8.14 Mixed policies
8.15 The solution for infinite time, D > 0