Direct Methods in Control Problems [illustrated] 0817647228, 9780817647223

Various general techniques have been developed for control and systems problems, many of which involve indirect methods.

English Pages 311 [312] Year 2020

Peter Falb

Direct Methods in Control Problems

Peter Falb Laboratory for Information and Decision Systems Massachusetts Institute of Technology Cambridge, MA, USA Division of Applied Mathematics Brown University Providence, RI, USA

ISBN 978-0-8176-4723-0 (eBook) ISBN 978-0-8176-4722-3 https://doi.org/10.1007/978-0-8176-4723-0 © Springer Science+Business Media, LLC, part of Springer Nature 2019 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This book is published under the imprint Birkhäuser, www.birkhauser-science.com by the registered company Springer Science+Business Media, LLC The registered company address is: 233 Spring Street, New York, NY 10013, U.S.A.

To Aaron, Eda and Ruth

Preface

Control theory represents an attempt to codify, in mathematical terms, the principles and techniques used in the analysis and design of control systems. Considerable interest in control problems has developed over many years. During the 1960s, there were a number of significant advances, namely (among many): (1) Bellman’s dynamic programming; (2) the Kalman filter and the introduction of stochastic differential equations; (3) multivariable linear system theory using linear algebra; (4) Lyapunov stability; and (5) the maximum principle of Pontryagin. The development which culminated in this volume began when I was at the Lincoln Laboratory in 1965. I gave some lectures on the topic at the Lund Institute of Technology in Sweden in the mid-1970s and taught a course on the topic at Brown University circa 1985. I then moved on to different things but again began to think about direct methods in 2006 or so. While there is a considerable literature on direct methods in the calculus of variations, there is less (considerably less) in control theory. Moreover, the control literature is a bit fragmented and does not appear to have a unified thematic point of view. Our aim in this book is to consider direct methods from a unified general point of view and to provide a stimulus for future research. Explicitly, implicitly, and by example, we hope to indicate potential areas of inquiry which may prove fruitful. We include (brief) summaries of relevant mathematics in some appendices. Hopefully, the reader will recognize that we wish to convey the ideas and general principles involved with direct methods in control problems (and not the detailed treatment of particular, specific control problems).

Acknowledgements

There are a great many friends, colleagues, teachers and students to whom considerable thanks are due. My professors at the Harvard Mathematics Department from 1953–1961 were enormously influential on me. As noted, the original idea for this book goes back a long way. I gave some preliminary lectures at the Lund Institute of Technology in the 1970s and I should like to express my appreciation to Karl Åström. In the 1980s, I taught a graduate seminar-type course on the subject and I should like to thank my former student, the late Ruth Curtain, for useful correspondence. For many years, I have had the use of a quiet office in the Laboratory for Information and Decision Systems (LIDS) at MIT. Without this office and the colleagues there, this book would never have been written. The formal effort began about 12 years ago. The aid, encouragement and friendship of Professor Sanjoy K. Mitter and Professor Moe Win are very much appreciated. Last, but by no means least, considerable gratitude is due Rachel Cohen of LIDS for her extraordinary computer preparation of this challenging manuscript.

Peter Falb
Cambridge, 2019

Contents

1 Minimization of Functionals: Generalities
2 Derivatives and Solutions
3 Inverse and Implicit Functions
4 Minimization of Functionals: Hilbert Spaces
5 Minimization of Functionals: Uniformly Convex Spaces
6 Minimization of Functionals: Hilbert Space, 2
7 Dynamical Control Systems
8 Optimization of Functionals: Necessary Conditions
9 Control Problems: Classical Methods
10 Control Problems: An Existence Theory
11 Interlude: Variations and Solutions
12 Approximation I: Generalities
13 Approximation II: Standard Discretizations
14 Some Examples
15 Roads Not Traveled and Future Paths
A Uniformly Convex Spaces
B Operators
C Spectral Theory
D Convexity
E Sobolev Spaces
F Finite Elements
References
Index

Introduction

Considerable interest in control problems has developed over many years and, consequently, various general techniques for the solution and analysis of these problems have been studied extensively. In particular, dynamic programming, viscosity solutions, Lyapunov’s method, the maximum principle of Pontryagin, the “classical” variational approach, stochastic control and filtering come to mind. Primarily, the basic method of attack has been to “reduce” the control problem to a problem involving a differential equation (such as the Hamilton–Jacobi–Bellman partial differential equation of dynamic programming), or a system of differential equations (such as the canonical equations of the maximum principle). These indirect methods are not always effective and so alternative direct methods have been sought. This book is essentially devoted to an exposition of some of these methods.

Consider the prototype problem:

Problem 0.1. Determine $\inf\{J(x, u) : x = \varphi(u),\ u \in U\}$.

If we let $J(u) = J(\varphi(u), u)$ and $H = X \times \mathcal{U}$, where $x \in X$ and $U \subset \mathcal{U}$, then the abstract prototype control problem becomes:

Problem 0.2. Determine $\inf\{J(u) : u \in U\}$.

We let $A \subset H$ be the set $\{(\varphi(u), u) : u \in U\}$ and so we can abstract the problem a bit further to:

Problem 0.3. Determine $\inf\{J(h) : h \in A\}$.

This problem makes sense if there is an $h$ with $J(h)$ finite and if $J(\cdot)$ is bounded below. In that case, we let $J^* = \inf_{h \in A} J(h)$.

Definition 0.4. A family $\{h_\alpha : \alpha \ge 0\}$ is a minimizing family if $\lim_{\alpha\to\infty} J(h_\alpha) = J^*$. A sequence $\{h_i\}$ is a minimizing sequence if $\lim_{i\to\infty} J(h_i) = J^*$.

For simplicity here, we let $\{h_\alpha^*\}$ denote either a minimizing family or sequence. These always exist. If we knew that $\{h_\alpha^*\}$ had a limit $h^*$ in $A$ and that $J(\cdot)$ was “smooth” enough for $J(h^*) = \lim_{\alpha\to\infty} J(h_\alpha^*)$ (so that $J(\lim_{\alpha\to\infty} h_\alpha^*) = \lim_{\alpha\to\infty} J(h_\alpha^*)$), then we could conclude that $h^*$ was a solution to the minimization problem. In other words, the direct method approach consists of:

(i) constructing a minimizing family $\{h_\alpha^*\}$;
(ii) showing that $\lim_{\alpha\to\infty} h_\alpha^* = h^* \in A$; and,
(iii) demonstrating that $J(h^*) = J^*$.

We have two prototype lemmas.

Lemma 0.5. If $J$ is lower semi-continuous on $A$ and if $A$ is compact, then there is an $h^* \in A$ such that $J^* = \inf_{h\in A} J(h) = J(h^*)$.

Lemma 0.6. If $\lim_{\alpha\to\infty} h_\alpha = h^*$, $\{h_\alpha\}$ is a minimizing family, and $J$ is lower semi-continuous at $h^*$, then $J(h^*) = \lim_{\alpha\to\infty} J(h_\alpha) = J^*$.

These proto-lemmas are a guide to approaching control problems via direct methods. The key is constructing convergent minimizing sequences or families. We, loosely, distinguish two approaches: (i) integration methods; and, (ii) approximation methods. A typical integration method takes the form

$$\frac{dh(\tau)}{d\tau} = \rho(h(\tau)), \qquad h(0) = h_0$$

where the equation has a solution $h^*(\tau)$ which converges asymptotically, i.e.,

$$\lim_{\tau\to\infty} h^*(\tau) = h^*$$

and gives a minimizing family. A typical approximation method takes the form: Let $\psi_1, \ldots, \psi_k, \ldots$ be elements of $H$ and let $H_k = \operatorname{span}\{\psi_1, \ldots, \psi_k\}$ and $A_k = H_k \cap A$. Consider the problem $\inf\{J(h) : h \in A_k\} = \inf\{J_k(h)\}$. Since $A_k \subset A_{k+1} \subset \cdots$, $J_k \ge J_{k+1} \ge \cdots$, if $\lim_{k\to\infty} \inf_{h\in A_k}\{J_k(h)\} = J^*$, then $\{h_k^*\}$ (with $h_k^*$ a minimizer over $A_k$) is a minimizing sequence. Two versions used in the sequel are the Ritz–Galerkin method and the finite element method.

Our aim is to provide a general overview of the ideas and methodology. For instance, we show that, for a standard control problem for an elliptic linear partial differential equation, both the Ritz–Galerkin method and the finite element method converge. However, for a specific equation on a specific domain, the topology of the domain dictates the choice of partition and the particulars of the equation dictate the choice of approximation basis. We do not deal with this important question.

Now let us turn our attention to outlining the book. First, note that the level of mathematical sophistication generally rises as the book progresses. Moreover, there are a number of examples which appear again and again from different points of view and different levels of sophistication. There are six
mathematical appendices which provide material both relevant to the text and to potential extensions and amplifications to results in the text (some discussed in Chapter 15). We begin in Chapter 1 with a broad general account of the problem. A result of Moser (Theorem 1.13) is at the heart of penalty function methods and merits investigation. In Chapter 2, the idea of differentiation, weak derivatives, and a preliminary examination of the notion of solution are considered. General inverse and implicit function theorems are given in Chapter 3. In Chapter 4, we begin to develop the theory for Hilbert spaces. Lemma 4.12 provides a preliminary Lagrange multiplier rule and leads to an elementary Hilbert space integration method (Theorem 4.27). Chapter 5 involves extension of the theory to uniformly convex spaces. The Sobolev gradient property is introduced and an integration method theorem proved (Theorem 5.19). We return in Chapter 6 to Hilbert space problems. We deal with both integration and approximation methods. Coerciveness plays an important role. We consider “least squares” methods and introduce, by example, finite element methods. In Chapter 7, we define dynamical control systems, introduce generalized controls, and prove basic trajectory existence theorems. We deal with necessary conditions in Chapter 8. The approach is somewhat topological using a notion of a twisted approximation (Definition 8.34) which provides a way of defining admissible variations. A key result is Lemma 8.43. Chapter 9 deals with “classical” methods in control problems. In essence, this is an (abbreviated) review of the Pontryagin principle and some ideas from dynamic programming. A general existence theory for optimal controls is developed in Chapter 10. The key is to prove continuity in the control. While in principle elementary, the approach appears somewhat new in execution and should be rather generally applicable. 
Chapter 11 is a change of pace in which we revisit the notions of admissible variations and solutions. We study in a general way the idea of approximation in Chapter 12. These concepts probably merit further investigation. Standard discretizations of control problems (primarily for dynamical control systems) are studied in Chapter 13. A convergence result is obtained for rather general discretization (13.12). We also examine both the Ritz–Galerkin and finite element methods here. Chapter 14 is devoted to some examples involving control systems defined by partial differential equations (and which are non-dynamic). Both the Ritz–Galerkin and finite element methods converge. In Chapter 15, we summarize and indicate both aspects we did not study and areas for potential advances.
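
As a rough illustration of the integration method described in this Introduction, the following sketch integrates $dh/d\tau = \rho(h)$, $h(0) = h_0$, with forward Euler steps. Everything concrete here is an assumption made for the example and is not taken from the text: the quadratic functional `J`, the steepest-descent choice $\rho(h) = -\nabla J(h)$, the step size, and the function names. The computed path plays the role of a (discretized) minimizing family.

```python
def J(h, c=(1.0, -2.0)):
    """Hypothetical functional J(h) = 0.5 * |h - c|^2, minimized at h = c."""
    return 0.5 * sum((hi - ci) ** 2 for hi, ci in zip(h, c))


def rho(h, c=(1.0, -2.0)):
    """Assumed right-hand side rho(h) = -grad J(h) (steepest descent)."""
    return tuple(-(hi - ci) for hi, ci in zip(h, c))


def integrate(h0, step=0.1, n_steps=200):
    """Forward-Euler integration of dh/dtau = rho(h), h(0) = h0.

    Returns the whole discrete path h_0, h_1, ..., h_n.
    """
    path = [tuple(h0)]
    h = tuple(h0)
    for _ in range(n_steps):
        r = rho(h)
        h = tuple(hi + step * ri for hi, ri in zip(h, r))
        path.append(h)
    return path


path = integrate((5.0, 5.0))
values = [J(h) for h in path]   # J along the path; should decrease toward inf J = 0
print(path[-1], values[-1])
```

For this particular quadratic example the path converges to the minimizer and $J$ decreases along it; for a general functional neither property is automatic, which is why the convergence theorems of the later chapters are needed.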

Chapter 1

Minimization of Functionals: Generalities

Let $X$ be a topological space and let $U$ be a non-empty subset of $X$ so that $U \subset X$ and $U \neq \emptyset$. Let $J : X \to \mathbb{R}$ be a real-valued function on $X$ with $\operatorname{dom}(J) \supset U$. We consider

Problem 1.1. Minimize $J(u)$ for $u \in U$.

We shall without further ado make the following assumptions:

Assumption 1.2. There is a $u \in U$ such that $J(u) < \infty$.

Assumption 1.3. $\inf\{J(u) : u \in U\} = J^*$ is finite.

These assumptions mean that the problem makes sense.

Lemma 1.4. If $J$ is lower semi-continuous and $U$ is compact, then there exists $u^* \in U$ such that

$$J^* = \inf\{J(u) : u \in U\} = J(u^*). \tag{1.5}$$

$u^*$ is called a minimizing element.

Proof. Let

$$K_n = \{h : J(h) \le J^* + 1/n\} \cap U. \tag{1.6}$$

Then we can immediately see that $K_n$ is closed in $U$ and that $K_{n+1} \subset K_n$ for all $n$ and that $K_n$ is non-empty for every $n$. It follows that

$$K^N = \bigcap_{n=1}^{N} K_n \neq \emptyset \tag{1.7}$$

and, hence, since $U$ is compact, that there is an element $u^*$ of $U$ such that $u^* \in K_n$ for all $n$. Since $u^* \in U$, $J(u^*) \ge J^*$. On the other hand, $J(u^*) \le J^* + 1/n$ for all $n$ implies $J(u^*) \le J^*$ and the lemma is established. □

© Springer Science+Business Media, LLC, part of Springer Nature 2019 P. Falb, Direct Methods in Control Problems, https://doi.org/10.1007/978-0-8176-4723-0_1

Definition 1.8. Let $\Omega = \{\omega : \omega \in [0, \infty]\}$ be an (unbounded) upwardly directed set. A family $\{u_\omega : u_\omega \in U\}$ is a minimizing family for $J$ over $U$ if

$$J^* = \inf\{J(u) : u \in U\} = \lim_{\omega\to\infty} J(u_\omega). \tag{1.9}$$

Lemma 1.10. If $\{u_\omega\}$ is a minimizing family, if $\lim_{\omega\to\infty} u_\omega = u^*$ exists and is an element of $U$, and if $J$ is lower semi-continuous at $u^*$, then

$$J^* = \inf\{J(u) : u \in U\} = \lim_{\omega\to\infty} J(u_\omega) = J(u^*)$$

so that $u^*$ is a minimizing element.

Proof. Since $\{u_\omega\}$ is a minimizing family, $J^* = \inf\{J(u) : u \in U\} = \lim_{\omega\to\infty} J(u_\omega)$. But $u^* \in U$ and so $J(u^*) \ge \lim_{\omega\to\infty} J(u_\omega)$. If $\epsilon > 0$ is given, then there is an $\omega(\epsilon)$ such that

$$J(u^*) \le J(u_\omega) + \epsilon \tag{1.11}$$

for $\omega > \omega(\epsilon)$ since $J$ is lower semi-continuous at $u^*$. It follows that $J(u^*) \le \lim_{\omega\to\infty} J(u_\omega) + \epsilon = J^* + \epsilon$. Since $\epsilon$ is arbitrary, $J(u^*) \le J^*$ and the lemma is established. □

Often the subset $U$ of $X$ is given by

$$U = \{x \in X : \varphi(x) = 0\} \tag{1.12}$$

where $\varphi : X \to \mathbb{R}$ is a non-negative functional, i.e., $\varphi(x) \ge 0$ for all $x \in \operatorname{dom}(\varphi)$ and $\varphi$ is lower semi-continuous. We write $U = X(\varphi)$ or $X_\varphi$ in this case.

Theorem 1.13 (Moser). Suppose that $J$ and $\varphi$ are lower semi-continuous, that $\varphi(x) \ge 0$ for all $x \in \operatorname{dom}(\varphi)$, and that $U = \{x \in X : \varphi(x) = 0\} = X_\varphi$ is non-empty. Let $\delta_n \in (0, \infty)$ with $\delta_n \to \infty$ as $n \to \infty$. Suppose

(i) for all $\delta_n$, there is an $x_n \in X$ such that

$$J(x_n) + \delta_n \varphi(x_n) \le J(x) + \delta_n \varphi(x) \tag{1.14}$$

for all $x$;
(ii) there is a $u^*$ in $X$ such that $\lim_{n\to\infty} x_n = u^*$.

Then $u^*$ is an element of $U = X_\varphi$ and $J(u^*) \le J(x)$ for all $x$ in $U = X_\varphi$.

Proof. Let $\gamma_n = J(x_n) + \delta_n \varphi(x_n)$ and let $\gamma = \operatorname{glb}\{J(x) : x \in U\}$.

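If $x \in U$, then $J(x) = J(x) + \delta_n \varphi(x) \ge \gamma_n$ via (i) and so $\gamma_n \le \gamma$. In other words,

$$J(x_n) + \delta_n \varphi(x_n) \le \gamma. \tag{1.15}$$

Since $\delta_n \varphi(x_n) \ge 0$, we have $J(x_n) \le \gamma$ for all $n$. It follows from the lower semi-continuity of $J$ that

$$J(u^*) \le \liminf_{n\to\infty} J(x_n) \le \gamma. \tag{1.16}$$

It follows that, say, $J(u^*) - 1 < \liminf_{n\to\infty} J(x_n)$ and, hence, that there is an $n_0$ such that for $n > n_0$, $J(x_n) \ge J(u^*) - 1$. By virtue of 1.14, for some $d$, $J(u^*) - 1 + \delta_n \varphi(x_n) \le d$ or $\delta_n \varphi(x_n) \le d - J(u^*) + 1$ for $n > n_0$. Thus $\{\delta_n \varphi(x_n)\}$ is bounded. Since $\delta_n \to \infty$ as $n \to \infty$, $\varphi(x_n) \to 0$ as $n \to \infty$. But then $\liminf_{n\to\infty} \varphi(x_n) = \lim_{n\to\infty} \varphi(x_n) = 0$. Since $\varphi$ is lower semi-continuous, $\varphi(u^*) \le \liminf_{n\to\infty} \varphi(x_n) = 0$. But, by assumption, $\varphi(x) \ge 0$ and so $\varphi(u^*) = 0$, i.e., $u^* \in U = X_\varphi$. Therefore, $J(u^*) \ge \gamma$ and, by 1.16, $J(u^*) = \gamma$. The theorem is established. □

Corollary 1.17. If $X$ is compact, then (i) and (ii) are satisfied.

Proof. Let $\delta_n$ be any sequence in $(0, \infty)$ with $\delta_n \to \infty$ as $n \to \infty$. Since $J(x) + \delta_n \varphi(x)$ is lower semi-continuous, by Lemma 1.4, we have at least one $x_n^*$ such that $J(x_n^*) + \delta_n \varphi(x_n^*) \le J(x) + \delta_n \varphi(x)$. Since $X$ is compact, $\{x_n^*\}$ has a subsequence $x_{n_i}^*$ which converges to $u^*$ in $X$. Set $x_i = x_{n_i}^*$ and let $\delta_i = \delta_{n_i}$ and the corollary follows. □

The approximation Theorem 1.13 often provides a way to “remove” constraints.

Example 1.18. Let $f : \mathbb{R}^2 \to \mathbb{R}$, $g : \mathbb{R}^2 \to \mathbb{R}$ be $C^2$ in a neighborhood $U$ of $\mathbf{x}_0 = (x_0, y_0)$. Let $S_g = \{\mathbf{x} : g(\mathbf{x}) = 0\}$ and suppose that $(\nabla g)(\mathbf{x}_0) \neq 0$ so that $\|\nabla g(\mathbf{x}_0)\|^2 \neq 0$. Assume that $\mathbf{x}_0$ is a minimum of $f(\mathbf{x})$ subject to $\mathbf{x} \in S_g$ and that there is an open neighborhood $U_1$ of $\mathbf{x}_0$ such that $f(\mathbf{x}) > f(\mathbf{x}_0)$ for all $\mathbf{x} \neq \mathbf{x}_0$ in $U_1 \cap S_g$. Let $J = f$ and $\varphi = g^2/2$ (note $X_\varphi = S_g$). Let $V \subset U_1$ be a closed bounded neighborhood of $\mathbf{x}_0$. Let $\mathbf{x}_n$ be a point at which $f(\mathbf{x}) + \delta_n g^2(\mathbf{x})/2$ is minimized. By Corollary 1.17, $\mathbf{x}_n \to \mathbf{x}_0$ and, since $\mathbf{x}_0$ is an interior point of $V$, eventually (i.e., for $n > N_0$) $\mathbf{x}_n$ is an interior point of $V$. It follows that

$$\frac{\partial f}{\partial x}(\mathbf{x}_n) + \delta_n g(\mathbf{x}_n)\frac{\partial g}{\partial x}(\mathbf{x}_n) = 0, \qquad \frac{\partial f}{\partial y}(\mathbf{x}_n) + \delta_n g(\mathbf{x}_n)\frac{\partial g}{\partial y}(\mathbf{x}_n) = 0,$$

and, hence, that $\partial(f, g)/\partial(x, y)|_{\mathbf{x}_n} = 0$. By continuity, $\partial(f, g)/\partial(x, y)|_{\mathbf{x}_0} = 0$ and so there is a (Lagrange multiplier) $\lambda$ such that

$$\frac{\partial f}{\partial x}(\mathbf{x}_0) + \lambda\frac{\partial g}{\partial x}(\mathbf{x}_0) = 0, \qquad \frac{\partial f}{\partial y}(\mathbf{x}_0) + \lambda\frac{\partial g}{\partial y}(\mathbf{x}_0) = 0.$$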
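
The penalty construction of Example 1.18 can be tried on a small concrete case. The sketch below is only an illustration under assumptions that are not in the text: the functions $f(x, y) = x^2 + y^2$ and $g(x, y) = x + y - 1$, the sample values of $\delta_n$, and the helper names are all hypothetical. For this choice the penalized functional $f + \delta_n g^2/2$ is a convex quadratic, so its minimizer is found by solving the two stationarity equations directly; the constrained minimizer is $(1/2, 1/2)$ with multiplier $\lambda = -1$.

```python
def penalized_minimizer(delta):
    """Minimize f + (delta / 2) * g**2 for the assumed example
    f(x, y) = x**2 + y**2, g(x, y) = x + y - 1.

    Stationarity gives the linear system
        (2 + delta) x + delta y = delta
        delta x + (2 + delta) y = delta
    which is solved here by Cramer's rule.
    """
    a, b = 2.0 + delta, delta
    det = a * a - b * b
    x = (delta * a - b * delta) / det
    y = (a * delta - delta * b) / det
    return x, y


def g(x, y):
    """Constraint function of the assumed example."""
    return x + y - 1.0


for delta in (1.0, 10.0, 100.0, 1e4):
    x, y = penalized_minimizer(delta)
    multiplier_estimate = delta * g(x, y)   # plays the role of delta_n * g(x_n)
    print(delta, (x, y), g(x, y), multiplier_estimate)
```

In this example the minimizers approach the constraint set as $\delta_n$ grows and the products $\delta_n g(\mathbf{x}_n)$ approach the Lagrange multiplier, which is the mechanism behind the two displayed stationarity systems above.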

Of course, the result can be established without the approximation theorem.

Example 1.19. Consider the control system

$$\dot{x} = f(x, u), \qquad t \in (t_0, t_1), \qquad x(t_0) = x_0, \qquad u(t) \in \Omega \subset \mathbb{R}^m \tag{1.20}$$

with appropriate continuity assumptions [A-3]. Let $S_g$ be a “smooth” k-fold in $\mathbb{R}^n$ given by $S_g = \{x : g(x) = 0\}$. Assume that the cost functional $J(u)$ is given by

$$J(u) = \int_{t_0}^{t_1} L(x_u(t), u(t))\,dt$$

where $x_u(t)$ is the trajectory of 1.20 generated by $u(\cdot)$. The control problem is: determine the (admissible) control $u$, which transfers $(t_0, x_0)$ to $\{t_1\} \times S_g$, that minimizes $J(u)$. Let $\delta_n \in (0, \infty)$ with $\delta_n \to \infty$ as $n \to \infty$. Let $\varphi(u) = \tfrac{1}{2} g^2(x_u(t_1))$ so that $x_u(t_1) \in S_g$ if and only if $\varphi(u) = 0$. If there is a $u_{\delta_n}$ such that $J(u_{\delta_n}) + \delta_n \varphi(u_{\delta_n}) \le J(u) + \delta_n \varphi(u)$ and $\lim_{n\to\infty} u_{\delta_n} = u^*$, then, by the theorem, $u^*$ will solve the control problem.

In essence, the theorem allows for the replacement of problems involving constraints, particularly boundary conditions, by problems which are “free” (i.e., without constraints). An alternative point of view is expressed by the following problem:

Problem 1.21. Let $X$, $Y$ be topological vector spaces. Let $U \subset X$, let $\pi$ be a (convex) subset of $Y$, let $J : X \to \mathbb{R}$ with $D_J \supset U$ ($D_J$ is the domain of $J$) and let $g : X \to Y$. Determine $u^* \in U$ such that $g(u^*) \in \pi$ and such that $J(u^*) \le J(u)$ for all $u$ with $u \in U$ and $g(u) \in \pi$.

We may say “find the minimum of $J$ on $U$ subject to the constraint $g$.” We make the assumptions analogous to 1.2, 1.3; namely, (1) there is a $u \in U$ with $g(u) \in \pi$ and $J(u) < \infty$, and (2) $\inf\{J(u) : u \in U,\ g(u) \in \pi\} = J^*$ is finite. This formulation [H-2, W-1] is useful in providing necessary conditions using properties of convex sets and fixed point theorems. The idea is to consider the map $K = J \times g : X \to \mathbb{R} \times Y$ and the ray $\rho = \{(b, P) : b < J^*,\ P \in \pi\}$. If there are a $\tilde{U} \subset U$ and a $\tilde{K} : X \to \mathbb{R} \times Y$ with $\tilde{U} \subset D_{\tilde{K}}$ and $\tilde{K}$ “close to” $K$ such that $\tilde{K}(\tilde{U})$ is convex, then there will be a non-zero $\lambda \in (\mathbb{R} \times Y)^*$ which separates $\tilde{K}(\tilde{U})$ and $\rho$. The pair $\tilde{U}, \tilde{K}$ is an “approximation” to $U, K$. For instance, if $X = \mathbb{R}^n$, $Y = \mathbb{R}^m$, $\pi = \{0\}$, and $J$ and $g$ are continuously differentiable, then $\tilde{K}$ is the Jacobian of the map $K = J \times g$ of $\mathbb{R}^n \to \mathbb{R}^{m+1}$ so that

$$K(x + \delta_x) = K(x) + \frac{\partial K}{\partial x}\delta_x + o(\|\delta_x\|) = K(x) + \tilde{K}(x)\delta_x + o(\|\delta_x\|)$$

and $\tilde{U}$ satisfies the following condition: if $\delta u_1, \ldots, \delta u_k$ are independent elements of $\tilde{U}$, then there is an $\epsilon > 0$ and a continuous map $\zeta$ of the convex hull of $\{u^*, u^* + \epsilon\,\delta u_1, \ldots, u^* + \epsilon\,\delta u_k\}$ into $U$ such that $\zeta(u^* + \delta u) = u^* + \delta u + o(\|\delta u\|)$. We shall examine this approach in the sequel (Chapters 8, 9).

Chapter 2

Derivatives and Solutions

We often consider control problems for systems defined by “differential” equations (ordinary, partial or stochastic) and we require an appropriate notion of differentiation and what constitutes a solution. Suppose $f : \mathbb{R} \to \mathbb{R}$, then, in elementary calculus, we say that $f$ is differentiable at $x_0$ if

$$\lim_{h\to 0} \frac{f(x_0 + h) - f(x_0)}{h} = f'(x_0)$$

exists. Alternatively, we can view $f'(x_0)$ as defining a linear transformation $Df(x_0) \in \mathcal{L}(\mathbb{R}, \mathbb{R})$, via $Df(x_0)h = f'(x_0)h$ and, if we say that $f : \mathbb{R} \to \mathbb{R}$, $g : \mathbb{R} \to \mathbb{R}$ are tangent at $x_0$ if

$$\lim_{h\to 0} |f(x_0 + h) - g(x_0 + h)|/|h| = 0,$$

then we can say that $f$ is differentiable at $x_0$ if there is a linear map $Df(x_0) \in \mathcal{L}(\mathbb{R}, \mathbb{R})$ such that $f(x_0) + Df(x_0)h$ is tangent to $f$ at $x_0$. We now have:

Definition 2.1. Let $X$, $Y$ be Banach spaces with $U$ open in $X$, and let $f$, $g$ be continuous maps of $U$ into $Y$. $f$ and $g$ are tangent at $x_0 \in U$ if

$$\lim_{\|h\|\to 0} \|f(x_0 + h) - g(x_0 + h)\|/\|h\| = 0. \tag{2.2}$$

Remark 2.3. There is at most one linear map $L$ such that $f(x_0) + L(x - x_0)$ is tangent to $f$ at $x_0$.

Definition 2.4. $f : U \to Y$ is differentiable at $x_0$ if there exists a (unique) linear map $Df(x_0)$ such that $f(x_0) + Df(x_0)(x - x_0)$ is tangent to $f$ at $x_0$. $f$ is differentiable on $U$ if $Df(x_0)$ exists for all $x_0 \in U$.

It is easy to show that $Df(x_0) \in \mathcal{L}(X, Y)$. We call $Df(x)$ the Fréchet or classical derivative of $f$.
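
The tangency condition (2.2) behind Definition 2.4 can be checked numerically in finite dimensions. The following sketch is an illustration only; the map `f` on $\mathbb{R}^2$, the base point, and the direction of the increments are assumptions chosen for the example, not taken from the text. It evaluates the ratio $\|f(x_0 + h) - f(x_0) - Df(x_0)h\| / \|h\|$ for a sequence of shrinking increments $h$.

```python
import math


def f(x, y):
    """Hypothetical smooth map R^2 -> R^2 used only for illustration."""
    return (x * x * y, math.sin(x) + y)


def Df(x, y):
    """Jacobian matrix of f at (x, y), the candidate Frechet derivative."""
    return ((2 * x * y, x * x), (math.cos(x), 1.0))


def tangency_ratio(x0, h):
    """||f(x0 + h) - f(x0) - Df(x0) h|| / ||h|| in the Euclidean norm."""
    fx = f(*x0)
    fxh = f(x0[0] + h[0], x0[1] + h[1])
    A = Df(*x0)
    lin = (A[0][0] * h[0] + A[0][1] * h[1],
           A[1][0] * h[0] + A[1][1] * h[1])
    r = (fxh[0] - fx[0] - lin[0], fxh[1] - fx[1] - lin[1])
    return math.hypot(*r) / math.hypot(*h)


x0 = (1.0, 2.0)
# Increments h = (t, -t) with t = 10^-1, ..., 10^-5.
ratios = [tangency_ratio(x0, (10.0 ** -k, -(10.0 ** -k))) for k in range(1, 6)]
print(ratios)
```

For a differentiable map the ratios shrink as $\|h\| \to 0$; for this smooth example they decrease roughly in proportion to $\|h\|$.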

© Springer Science+Business Media, LLC, part of Springer Nature 2019 P. Falb, Direct Methods in Control Problems, https://doi.org/10.1007/978-0-8176-4723-0_2

Example 2.5. Let $U \subset \mathbb{R}$ and $f : U \to \mathbb{R}$. Consider the equation

$$\lim_{h\to 0} \int_U \frac{f(x + h) - f(x) - f'(x)h}{h}\,dx = 0. \tag{2.6}$$

If $f'(x)$ exists a.e. in $U$, then clearly (2.6) holds. Suppose (2.6) holds for some $f'(x)$ and $f$ is bounded, then $f'(x)$ is the derivative of $f$ a.e. in $U$ (Exercise). The point is that the classical derivative is not the entire story.

We can, by analogy with ordinary calculus, define higher and partial derivatives [D-7, D-9, B-1] which have the standard properties.

The notion of a solution requires some care. Let us begin by considering the simple “control” system

$$-\frac{d^2x}{dt^2} = u(t), \qquad t \in (0, 1). \tag{2.7}$$

Observe that $x_{a,b}(t) = at + b$ is a solution of $-d^2x/dt^2 = 0$ for all $a, b$ and hence that, if $x_u(t)$ is a solution of (2.7), then so is $x_u(t) + x_{a,b}(t)$ for all $a, b$. In other words, we need auxiliary data for uniqueness. For example, if $x_u(t)$ is to be the solution, then

$$x_u(0) = \alpha, \qquad x_u'(1) = \beta \tag{2.8}$$

would be typical auxiliary conditions. Let

$$Q(f, g) = \int_0^1 \Big(\frac{df}{dt}\Big)\Big(\frac{dg}{dt}\Big)\,dt. \tag{2.9}$$

Let $Y = \{v \in L^2[0, 1] : v(0) = 0,\ Q(v, v) < \infty\}$. Consider the problem: find $x \in Y$ such that $Q(x, v) = [u, v]$ for all $v \in Y$ where $[u, v] = \int_0^1 u(t)v(t)\,dt$. This variational (or weak) form of (2.7) (with $\alpha = 0$, $\beta = 0$) is “equivalent” to the differential equation form but may have solutions of bounded variation.

Proposition 2.10. If $u$ is continuous and $x_u \in C^2([0, 1])$ is a solution of the variational problem, then $x_u$ is a solution of the differential equation with $\alpha = 0$, $\beta = 0$.

Proof. Let $y \in Y$ with $y \in C^1([0, 1])$ and $y(1) = 0$. Then

$$[u, y] = Q(x_u, y) = \int_0^1 (-x_u'')\,y\,dt + x_u'(1)y(1) \tag{2.11}$$

$$= \int_0^1 (-x_u'')\,y\,dt.$$

Or, equivalently, $[u - (-x_u''), y] = 0$ for all such $y$. If $u + x_u'' \not\equiv 0$, then on some $[t_0, t_1] \subset [0, 1]$, $u + x_u''$ (say) $> 0$. Let $\tilde{y}(t) = (t - t_0)^2 (t - t_1)^2$ on $[t_0, t_1]$ and 0 elsewhere. Then $[u + x_u'', \tilde{y}] \neq 0$ is a contradiction and so $-x_u'' = u$. In view of (2.11), $x_u'(1) = 0$ (by taking $y(t) = t$). □

Let us examine a slightly different problem, namely:

$$-\frac{d^2x}{dt^2} = u, \qquad -\frac{dx}{dt}\Big|_0 = \alpha, \qquad x(1) = \beta, \tag{2.12}$$

and let

$$H^1([0, 1]) = \Big\{u : \int_0^1 \Big(\frac{du}{dt}\Big)^2 dt < \infty\Big\}$$

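(a Hilbert space). Let $Y = \{x \in H^1 : x(1) = \beta\}$ ($Y$ is not a subspace). Let $W = \{w \in H^1 : w(1) = 0\}$ ($W$ is a subspace). Consider the “variational” problem: find $x_u \in Y$ such that

$$\int_0^1 \frac{dw}{dt}\frac{dx_u}{dt}\,dt = \int_0^1 wu\,dt + w(0)\alpha \tag{2.13}$$

for all $w \in W$. We claim these problems are equivalent. If $x_u$ solves (2.12), then

$$0 = -\int_0^1 w\Big(\frac{d^2x_u}{dt^2} + u\Big)dt$$

and, integrating by parts,

$$\int_0^1 \frac{dw}{dt}\frac{dx_u}{dt}\,dt = \int_0^1 wu\,dt + w(0)\alpha.$$

Since $x_u(1) = \beta$, $x_u$ solves (2.13). Conversely, if $x_u \in Y$ solves (2.13), then $x_u(1) = \beta$ and

$$\int_0^1 \frac{dw}{dt}\frac{dx_u}{dt}\,dt = \int_0^1 wu\,dt + w(0)\alpha$$

for all $w \in W$. Let $\tilde{w}(t) = t(1 - t)$ so that $\tilde{w}(1) = 0$. Since $x_u$ solves (2.13) for all $w$, we have (taking the test function $w = \tilde{w}\,(d^2x_u/dt^2 + u)$, which presumes enough smoothness of $x_u$ for this $w$ to lie in $W$, and integrating by parts)

$$0 = \int_0^1 t(1 - t)\Big(\frac{d^2x_u}{dt^2} + u\Big)^2 dt + \tilde{w}(0)\alpha = \int_0^1 t(1 - t)\Big(\frac{d^2x_u}{dt^2} + u\Big)^2 dt.$$

But $t(1 - t) > 0$ on $(0, 1)$ and so $d^2x_u/dt^2 + u = 0$. It then follows that

$$0 = w(0)\Big[\frac{dx_u}{dt}\Big|_0 + \alpha\Big]$$

for all $w \in W$ and the claim follows.

Let us view the simple control system a bit differently. Let

$$\mathbf{x} = \begin{pmatrix} x \\ \dot{x} \end{pmatrix}, \qquad A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad \mathbf{u} = \begin{pmatrix} 0 \\ u \end{pmatrix}, \qquad \mathbf{y} = \begin{pmatrix} 0 \\ y \end{pmatrix},$$

and consider the system equation $\dot{\mathbf{x}} = A\mathbf{x} - \mathbf{u}$. If we set

$$\{\mathbf{u}, \mathbf{y}\} = \int_0^1 \langle \mathbf{u}(t), \mathbf{y}(t)\rangle\,dt = \int_0^1 u(t)y(t)\,dt,$$

$$Q(\mathbf{x}, \mathbf{y}) = \int_0^1 \langle A'A\mathbf{x}, \dot{\mathbf{y}}\rangle\,dt = \int_0^1 \dot{x}(t)\dot{y}(t)\,dt,$$

then it is easy to see (exercise) that the formulations are equivalent.

Let us introduce another idea in this elementary example. Suppose $Z$ is a subspace of $Y$ and consider the problem: find an $x_u$ such that

$$[u, y] = Q(x_u, y) \quad \text{for all } y \in Z. \tag{2.14}$$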

Suppose $Z$ is finite-dimensional with $\dim Z = n$, and let $z_1, \ldots, z_n$ be a basis. Set $u_i = [u, z_i]$, $Q_{ij} = Q(z_i, z_j)$ and $x_u = \sum x_i z_i$. Then, for $\mathbf{x}_u = (x_i)$, $\mathbf{Q} = (Q_{ij})$, and $\mathbf{u} = (u_i)$, (2.14) becomes $\mathbf{Q}\mathbf{x}_u = \mathbf{u}$, a matrix equation. This is an opening to the Ritz method.

The variational form $Q(x_u, v) = [u, v]$ may, as we noted earlier, have solutions which are not “classical,” i.e., are not suitably differentiable. This is especially true for systems defined by partial differential equations. Let us consider some simple examples.

Example 2.15. Let $D \subset \mathbb{R}^2$ be the “domain” $\{(t, x) : t \in (0, \infty),\ x \in (0, 1)\}$ so that $\partial D = \{(t, 0), (t, 1), (0, x)\}$. Let $L$ be the (partial) differential operator given by

$$L = \frac{\partial}{\partial t} - \frac{\partial^2}{\partial x^2} = D_t - D_x^2$$

and consider the control system

$$L\psi_u = u, \qquad \psi_u(t, 0) = \psi_u(t, 1) = 0, \qquad \psi_u(0, x) = \psi_0(x)$$

which can also be written

$$D_t \boldsymbol{\psi}_u = A\boldsymbol{\psi}_u + \mathbf{u}, \qquad \boldsymbol{\psi}_u = \begin{pmatrix} \psi_u \\ D_t\psi_u \end{pmatrix}, \qquad A = \begin{pmatrix} 0 & 1 \\ D_x^2 - D_t & D_t \end{pmatrix}, \qquad \mathbf{u} = \begin{pmatrix} 0 \\ u \end{pmatrix}.$$

The associated variational problem takes the form $[L\psi_u, Lv] = [u, Lv]$ for all $v$ (in a suitable Hilbert space).
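
The finite-dimensional reduction described before Example 2.15 (a basis $z_1, \ldots, z_n$ of $Z$, the numbers $u_i = [u, z_i]$ and $Q_{ij} = Q(z_i, z_j)$, and the matrix equation for the coefficients $x_i$) can be sketched for the variational form of (2.7) with $\alpha = \beta = 0$. The choices below are assumptions made for illustration and do not come from the text: the monomial basis $z_i(t) = t^i$ (which satisfies $z_i(0) = 0$), the control $u(t) \equiv 1$, and the small exact-arithmetic solver. With these choices $Q_{ij} = ij/(i + j - 1)$ and $u_i = 1/(i + 1)$, and the classical solution is $x_u(t) = t - t^2/2$.

```python
from fractions import Fraction


def ritz_system(n):
    """Return Q_ij = Q(z_i, z_j) and u_i = [u, z_i] for the assumed basis
    z_i(t) = t**i (i = 1..n) and the assumed control u(t) = 1 on [0, 1]."""
    Q = [[Fraction(i * j, i + j - 1) for j in range(1, n + 1)]
         for i in range(1, n + 1)]
    rhs = [Fraction(1, i + 1) for i in range(1, n + 1)]
    return Q, rhs


def solve(Q, rhs):
    """Gauss-Jordan elimination with exact rational arithmetic."""
    n = len(rhs)
    M = [row[:] + [b] for row, b in zip(Q, rhs)]
    for col in range(n):
        # Find a row with a non-zero pivot and move it into place.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


coeffs = solve(*ritz_system(3))   # coefficients x_1, x_2, x_3 of x_u = sum x_i z_i
print(coeffs)
```

Because the classical solution already lies in the span of the chosen basis, the Ritz coefficients reproduce it exactly here; in general one only obtains an approximation whose quality depends on the subspace $Z$.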

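Example 2.16. Let $D \subset \mathbb{R}^2$ be the “domain” $\{(t, x) : t \in (0, T),\ x \in (0, \pi)\}$ so that $\partial D = \{(t, 0), (t, \pi), (0, x), (T, x)\}$. Let $L$ be the (partial) differential operator

$$L = -\frac{\partial^2}{\partial t^2} - \frac{\partial^2}{\partial x^2} = -D_t^2 - D_x^2$$

and consider the control system

$$L\psi_u = u, \qquad \psi_u(t, 0) = \psi_u(t, \pi) = 0, \qquad \psi_u(0, x) = \psi_0(x), \qquad D_t\psi_u(0, x) = \psi_1(x)$$

which can also be written

$$D_t \boldsymbol{\psi}_u = A\boldsymbol{\psi}_u + \mathbf{u}, \qquad \boldsymbol{\psi}_u = \begin{pmatrix} \psi_u \\ D_t\psi_u \end{pmatrix}, \qquad A = \begin{pmatrix} 0 & 1 \\ D_x^2 & 0 \end{pmatrix}, \qquad \mathbf{u} = \begin{pmatrix} 0 \\ u \end{pmatrix}.$$

The associated variational problem takes the form $[L\psi_u, Lv] = [u, Lv]$ for all $v$ (in a suitable Hilbert space).

Example 2.17. Let $D \subset \mathbb{R}^2$ be the “domain” $\{(x, y) : x^2 + y^2 \le 1\}$ so that $\partial D = \{x^2 + y^2 = 1\}$. Let $L$ be the (partial) differential operator

$$L = -\frac{\partial^2}{\partial x^2} - \frac{\partial^2}{\partial y^2} = -D_x^2 - D_y^2$$

and consider the control system $L\psi_u = u$, $\psi_u(\partial D) = \psi_0(\partial D)$ which can also be written

$$-D_x \boldsymbol{\psi}_u = A\boldsymbol{\psi}_u + \mathbf{u}, \qquad \boldsymbol{\psi}_u = \begin{pmatrix} \psi_u \\ D_x\psi_u \end{pmatrix}, \qquad A = \begin{pmatrix} 0 & -1 \\ D_y^2 & 0 \end{pmatrix}, \qquad \mathbf{u} = \begin{pmatrix} 0 \\ u \end{pmatrix}.$$

The associated variational problem (again) takes the form $[L\psi_u, Lv] = [u, Lv]$ for all $v$ (in a suitable Hilbert space).

Let $H$ be a Hilbert (or pre-Hilbert) space and let $L : H \to H$ be a linear operator. Let

$$Q_L(w, v) = \langle Lw, Lv\rangle \tag{2.18}$$

and let

$$J_u(v) = \frac{1}{2}\langle Lv, Lv\rangle - \langle u, Lv\rangle \tag{2.19}$$

for $w, v, u$ in $H$. We have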

Problem 2.20. Given $u$, find a $w$ such that $Q_L(w, v) = \langle u, Lv\rangle$ for all $v$.

Problem 2.21. Given $u$, find a $w$ such that

$$J_u(w) = \inf_v J_u(v).$$

Clearly, if $Lw = u$, then $w$ is a solution of Problem 2.20. Note also that

$$J_u(v) + \frac{\langle u, u\rangle}{2} = \frac{1}{2}\langle Lv - u, Lv - u\rangle,$$

i.e., that

$$J_u(v) + \frac{1}{2}\|u\|^2 = \frac{1}{2}\|Lv - u\|^2.$$

Clearly, if $Lw = u$, then $w$ solves Problem 2.21.

Proposition 2.22. $w^*$ solves 2.20 if and only if $w^*$ solves 2.21.

Proof. Observe that

$$J_u(w + tv) = J_u(w) + t\langle Lw - u, Lv\rangle + \frac{t^2}{2}\langle Lv, Lv\rangle \tag{2.23}$$

for all $t, v$. If $w^*$ solves 2.20, then the term linear in $t$ vanishes and, by (2.23), $J_u(w^* + tv) - J_u(w^*) \ge 0$ for all $t, v$, so that $w^*$ solves 2.21. If $w^*$ solves 2.21, then $J_u(w^* + tv) - J_u(w^*) \ge 0$ for all $t, v$, so that

$$t\{\langle Lw^*, Lv\rangle - \langle u, Lv\rangle\} + \frac{t^2}{2}\|Lv\|^2 \ge 0$$

for all $t, v$. For any $v$ with $Lv \neq 0$, let $t = -\{\langle Lw^*, Lv\rangle - \langle u, Lv\rangle\}/\|Lv\|^2$, and then we get

$$0 \le J_u(w^* + tv) - J_u(w^*) = -\frac{\{\langle Lw^*, Lv\rangle - \langle u, Lv\rangle\}^2}{2\|Lv\|^2}.$$

It follows that $\langle Lw^*, Lv\rangle - \langle u, Lv\rangle = 0$ (which holds trivially when $Lv = 0$). □
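
In finite dimensions the equivalence in Proposition 2.22 can be observed directly. The sketch below is illustrative only; it assumes $H = \mathbb{R}^2$ with the usual inner product, a hypothetical $2 \times 2$ matrix in the role of $L$, and a hypothetical vector $u$. Problem 2.20 then reads $(L^{\mathsf T} L)w = L^{\mathsf T} u$, and identity (2.23) predicts $J_u(w + tv) - J_u(w) = \tfrac{t^2}{2}\|Lv\|^2 \ge 0$ at the solution.

```python
def matvec(M, v):
    """Matrix-vector product for small dense matrices stored as lists of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def dot(a, b):
    """Euclidean inner product."""
    return sum(x * y for x, y in zip(a, b))


def J(L, u, v):
    """J_u(v) = 0.5 <Lv, Lv> - <u, Lv>, as in (2.19)."""
    Lv = matvec(L, v)
    return 0.5 * dot(Lv, Lv) - dot(u, Lv)


def solve_variational(L, u):
    """Solve <Lw, Lv> = <u, Lv> for all v, i.e. (L^T L) w = L^T u, for 2x2 L
    with L^T L invertible (an assumption of this sketch)."""
    Lt = [list(col) for col in zip(*L)]                              # transpose of L
    G = [[dot(Lt[i], Lt[j]) for j in range(2)] for i in range(2)]    # L^T L
    b = matvec(Lt, u)                                                # L^T u
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    return [(b[0] * G[1][1] - G[0][1] * b[1]) / det,
            (G[0][0] * b[1] - G[1][0] * b[0]) / det]


L = [[2.0, 1.0], [0.0, 1.0]]   # hypothetical operator
u = [1.0, 3.0]                 # hypothetical data
w = solve_variational(L, u)

# Differences J_u(w + t v) - J_u(w) for a few directions v and step sizes t.
gaps = [J(L, u, [w[0] + t * v[0], w[1] + t * v[1]]) - J(L, u, w)
        for v in ([1.0, 0.0], [0.0, 1.0], [1.0, -1.0])
        for t in (-1.0, 0.5, 2.0)]
print(w, J(L, u, w), gaps)
```

Since the matrix chosen here is invertible, the solution also satisfies $Lw = u$; as the text goes on to stress, that conclusion is not automatic for a general operator $L$.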

Now, we have purposely been somewhat loose about the choice of the space $H$ and the characteristics of the operator $L$. Moreover, it is by no means clear that a solution of either problem solves the equation $Lw = u$. Considerable effort will be devoted to clarifying these points in the sequel.

Definition 2.24. Let $D$ be a domain in $\mathbb{R}^n$ and $f$ be a continuous map of $D$ in $\mathbb{R}$. The support of $f$ is $\operatorname{supp} f = \overline{\{x \in D : f(x) \neq 0\}}$. $f$ has compact support in $D$ if $\operatorname{supp} f$ is compact and is contained in the interior of $D$.

Let $\alpha = (\alpha_1, \ldots, \alpha_n)$ be a multi-index, $\alpha_i \in \mathbb{Z}$, $\alpha_i \ge 0$ and let $|\alpha| = \sum \alpha_i$. We write

$$D^\alpha f, \qquad D_x^\alpha f, \qquad \Big(\frac{\partial}{\partial x}\Big)^{\alpha} f, \qquad \Big(\frac{\partial}{\partial x_1}\Big)^{\alpha_1} \cdots \Big(\frac{\partial}{\partial x_n}\Big)^{\alpha_n} f$$

for derivatives.

Definition 2.25. Let $C_0^\infty(D) = \{f : f \in C^\infty(D) \text{ and } \operatorname{supp} f \text{ compact in } D\}$. Let $L^p_\ell(D) = \{f : f \in L^p(K), \text{ for all compact } K \subset D\}$. $L^p_\ell(D)$ is often called the space of locally $L^p$ functions in $D$.

Definition 2.26. A function $f \in L^1_\ell(D)$ is weakly differentiable, with weak derivative $D_w^\alpha f = g$, if there is a $g \in L^1_\ell(D)$ such that

$$\int_D g(x)\psi(x)\,dx = (-1)^{|\alpha|} \int_D f(x) D^{\alpha}\psi(x)\,dx$$

for all $\psi \in C_0^\infty(D)$.
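
A standard illustration of Definition 2.26 is $f(x) = |x|$ on $D = (-1, 1)$, which has no classical derivative at $0$ but has the weak derivative $g(x) = \operatorname{sign}(x)$. The sketch below checks the defining identity numerically for one test function. The particular test function, the midpoint quadrature, and all names are assumptions made for this example; a single test function of course verifies nothing in general and only illustrates the identity.

```python
import math


def psi(x):
    """Assumed test function in C_0^infinity((-1, 1)):
    psi(x) = (1 + x) * exp(-1 / (1 - x^2)) inside (-1, 1), zero outside."""
    if abs(x) >= 1.0:
        return 0.0
    return (1.0 + x) * math.exp(-1.0 / (1.0 - x * x))


def dpsi(x):
    """Classical derivative of psi."""
    if abs(x) >= 1.0:
        return 0.0
    s = 1.0 - x * x
    return math.exp(-1.0 / s) * (1.0 - (1.0 + x) * 2.0 * x / (s * s))


def midpoint(func, a, b, n=20000):
    """Composite midpoint rule on [a, b] with n cells."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


f = abs                                    # f(x) = |x|


def g(x):
    """Candidate weak derivative sign(x)."""
    return 1.0 if x > 0 else -1.0


# Definition 2.26 with |alpha| = 1: int g psi dx = - int f psi' dx.
lhs = midpoint(lambda x: g(x) * psi(x), -1.0, 1.0)
rhs = -midpoint(lambda x: f(x) * dpsi(x), -1.0, 1.0)
print(lhs, rhs)
```

The two integrals agree up to quadrature error, as the definition requires of a weak derivative.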

We observe that if $f \in C^{|\alpha|}(D)$, then $D_w^\alpha f$ exists and $D_w^\alpha f = D^\alpha f$. In other words, if a continuous classical derivative exists, then so does the weak derivative and they are the same. The converse is not true.

Example 2.27. Let $D$ be a domain in $\mathbb{R}^n$ and let $f \in L^1_\ell(D)$. Set

$$\|f\|_{W_2^1(D)} = \Big(\sum_{|\alpha| \le 1} \|D_w^\alpha f\|^2_{L^2(D)}\Big)^{1/2} \tag{2.28}$$

and let

$$H^1(D) = W_2^1(D) = \{f \in L^1_\ell(D) : \|f\|_{W_2^1(D)} < \infty\}. \tag{2.29}$$

$H^1(D) = W_2^1(D)$ is an example of a Sobolev space ([A-1], Appendix E). Consider the differential operator $L$ on $H^1(D)$ given by

$$L\psi = -\nabla \cdot (a(x)\nabla\psi) + c(x)\psi \tag{2.30}$$

and the associated form

$$Q_L(\psi, \varphi) = \int_D \{a(x)\nabla\psi \cdot \nabla\varphi + c(x)\psi\varphi\}\,dx. \tag{2.31}$$

$Q_L$ is bounded and symmetric on $H^1(D)$ and we can consider the problems: (1) given $u \in H^1(D)^*$ (as a Hilbert space), find $\psi$ such that $Q_L(\psi, \varphi) = u(\varphi)$ for all $\varphi \in H^1$; (2) find $\psi$ which minimizes $J_u(\varphi) = \{Q_L(\varphi, \varphi) - 2\langle u, \varphi\rangle\}$; and (3) solve $L\psi = u$. Dealing with these types of problems in a more general setting will be an important part of the sequel. Note that the solution need not have a classical derivative.

Let us reexamine the equation of Example 2.15 written in the form $D_t\boldsymbol{\psi}_u = A\boldsymbol{\psi}_u + \mathbf{u}$. This is a differential equation in a Banach space. Many control systems are defined by such equations. Let $E$ be an open interval and let $X$, $\mathcal{U}$ be Banach spaces. Let $O$ be open in $X$, and $U$ be open in $\mathcal{U}$. Consider a map $f : E \times O \times U \to X$ and the equation

$$\frac{dx_u}{dt} = f(t, x_u, u) \tag{2.32}$$

(describing a typical control system). What are reasonable conditions for solutions? Definition 2.33 ([D-7]). A mapping of f of an interval E into a Banach space X is regulated if it has one-sided limits at every point. Remark 2.34 ([D-7]). If E is a compact interval, then f : E → X is regulated if and only if f is the limit of a uniformly convergent sequence of step-functions. Remark 2.35. Continuous mappings are regulated. Remark 2.36. If E is a compact interval, then any map f : E → X of bounded variation is regulated (the converse is not true). A well-known [C-6, D-7] existence theorem involves a locally Lipschitzian assumption and the condition that t → f (t, x(t), u(t)) be regulated for regulated x(t) and u(t). In that case, the solution is defined except for an, at most countable, set of points in E.
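To make the role of regulated controls concrete, here is a minimal numerical sketch of (2.32). It assumes the scalar system $f(t, x, u) = -x + u$ with a step-function control that jumps at $t = 1$; the system, the step size and the explicit Euler scheme are illustrative choices, not part of the text. The trajectory is continuous, but its derivative jumps at the switching time, which is the kind of exceptional point the existence theorem allows.

```python
import math

def control(t):
    """Hypothetical step-function (hence regulated) control with one jump at t = 1."""
    return 0.0 if t < 1.0 else 1.0

def simulate(x0=1.0, t_final=2.0, dt=1e-3):
    """Explicit Euler integration of dx/dt = -x + u(t)."""
    steps = int(round(t_final / dt))
    x = x0
    for k in range(steps):
        x += dt * (-x + control(k * dt))
    return x

def exact(t, x0=1.0):
    """Closed-form solution obtained by patching the two intervals at t = 1."""
    if t <= 1.0:
        return x0 * math.exp(-t)
    x1 = x0 * math.exp(-1.0)
    return 1.0 + (x1 - 1.0) * math.exp(-(t - 1.0))

x_numeric = simulate()
x_exact = exact(2.0)
```

The two values agree to within the first-order accuracy of the Euler scheme.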

Chapter 3

Inverse and Implicit Functions

Consider, from elementary calculus, the following: let $f : \mathbb{R} \to \mathbb{R}$ be $C^1$ on $(a, b)$ with $f'(x_0) \ne 0$ for some $x_0 \in (a, b)$; suppose $f'(x_0) > 0$ (say) so that there exists $\epsilon > 0$ with $f'(x) > 0$ on $(x_0 - \epsilon, x_0 + \epsilon) = S(x_0, \epsilon)$ and, hence, $f$ is strictly increasing on $S(x_0, \epsilon)$. It follows that $f$ is injective with an inverse $f^{-1}$ defined on $S(f(x_0), \delta)$, some $\delta > 0$, and $f^{-1} \circ f = 1$ and $f^{-1} \in C^1$. We want to develop such an inverse function theorem for Banach spaces.

Lemma 3.1. Let $(X, d)$ be a complete metric space and let $T : X \to X$. If there exists $\lambda$, $0 < \lambda < 1$, such that $d(T(x_1), T(x_2)) \le \lambda d(x_1, x_2)$ for all $x_1, x_2 \in X$, then there exists a unique $x^* \in X$ such that $T(x^*) = x^*$.

Proof. If $T(x_1) = x_1$, $T(x_2) = x_2$, then $d(T(x_1), T(x_2)) = d(x_1, x_2) \le \lambda d(x_1, x_2)$, with $\lambda < 1$, so that $x_1 = x_2$. For existence, let $x$ be any element of $X$ and $x_n = T^n(x) = T(T^{n-1}(x))$. Then $d(x_{n+1}, x_n) \le \lambda^n d(x_1, x)$ and so $d(x_{n+m}, x_n) \le \lambda^n d(x_1, x)/(1 - \lambda)$, so $x_n$ is Cauchy and $x_n \to x^*$. But $T(x_n) = x_{n+1} \to x^*$ also and $T(x_n) \to T(x^*)$ ($T$ is continuous), so that $T(x^*) = x^*$. □

Theorem 3.2. Let $X$, $Y$ be Banach spaces, $U_X$ be an open neighborhood in $X$, and $f : U_X \to Y$ be a $C^p$ map, $p \ge 1$. Let $x_0 \in U_X$ and suppose that $(Df)(x_0) = f'(x_0)$ is a linear isomorphism of $X$ onto $Y$. Then there is an open neighborhood $V_X$ of $x_0$ and an open neighborhood $W_Y$ of $f(x_0)$ such that $f : V_X \to W_Y$ is a $C^p$-isomorphism.

Proof. We may assume $X = Y$, $f'(x_0) = I$ (consider $f'(x_0)^{-1} \circ f$) and that $x_0 = 0$, $f(x_0) = 0$ (via translation). Let $h(x) = x - f(x)$. Then $h'(x_0) = 0$ and, since $h'$ is continuous, for small $\epsilon > 0$, $|x| < 2\epsilon \Rightarrow |h'(x)| < 1/2$, and $h : S(0, \epsilon) \to S(0, \epsilon/2)$. Let $w \in S(0, \epsilon/2)$ and consider

$$\tilde{h}_w(x) = x - f(x) + w = h(x) + w.$$

If $x \in S(0, \epsilon)$, then $\tilde{h}_w(x) \in S(0, \epsilon)$ (as $w \in S(0, \epsilon/2)$), and it follows that

$$|\tilde{h}_w(x_1) - \tilde{h}_w(x_2)| = |h(x_1) - h(x_2)| \le \tfrac{1}{2}|x_1 - x_2|$$

© Springer Science+Business Media, LLC, part of Springer Nature 2019 P. Falb, Direct Methods in Control Problems, https://doi.org/10.1007/978-0-8176-4723-0_3


(mean value theorem). Thus, there is a fixed point of $\tilde{h}_w$, i.e., $\tilde{h}_w(x) = x - f(x) + w = x$ so that $f(x) = w$. This gives the local inverse $\tilde{f} = f^{-1}$. We now show:

(1) $\tilde{f}$ is continuous, because $|x_1 - x_2| \le |f(x_1) - f(x_2)| + |h(x_1) - h(x_2)| \le |f(x_1) - f(x_2)| + \tfrac12|x_1 - x_2|$ and hence $|x_1 - x_2| \le 2|f(x_1) - f(x_2)|$;

(2) $\tilde{f}$ is differentiable in $S(0, \epsilon/2)$: let $y_1 = f(x_1)$, $y_2 = f(x_2)$, $y_i \in S(0, \epsilon/2)$, $x_i \in S(0, \epsilon)$. Then

$$|\tilde{f}(y_1) - \tilde{f}(y_2) - f'(x_2)^{-1}(y_1 - y_2)| = |x_1 - x_2 - f'(x_2)^{-1}(f(x_1) - f(x_2))|. \tag{3.3}$$

Since $1 = f'(x_2)^{-1} f'(x_2)$, there is a constant $M$ such that the quantity in (3.3) is $\le M|f'(x_2)(x_1 - x_2) - f(x_1) + f(x_2)|$. Since $f$ is differentiable, and, by continuity, quantities are $o(x_1 - x_2)$ and $o(y_1 - y_2)$, $\tilde{f}$ is differentiable and $\tilde{f}'(y) = f'(\tilde{f}(y))^{-1}$ so that $\tilde{f}$ is $C^1$. $C^p$ follows by induction. □

Theorem 3.4 (Implicit Function). Let $X$, $Y$, $Z$ be Banach spaces and $f$ a $C^1$ map of $U$ open in $X \times Y$ into $Z$. Suppose $(x_0, y_0) \in U$ with $f(x_0, y_0) = 0$ and $D_2 f(x_0, y_0)$ a linear homeomorphism of $Y \to Z$. Then there exists an open neighborhood $U_0$ of $x_0$ in $X$ such that, for all connected neighborhoods $\tilde{U}$ of $x_0$, $\tilde{U} \subset U_0$, there is a unique, continuous map $\psi_{\tilde{u}} : \tilde{U} \to Y$ with $\psi_{\tilde{u}}(x_0) = y_0$, $(x, \psi_{\tilde{u}}(x)) \in U$, and $f(x, \psi_{\tilde{u}}(x)) = 0$ for all $x \in \tilde{U}$. Moreover, $\psi_{\tilde{u}}$ is $C^1$ in $\tilde{U}$ and

$$\psi_{\tilde{u}}'(x) = -(D_2 f(x, \psi_{\tilde{u}}(x)))^{-1} \circ D_1 f(x, \psi_{\tilde{u}}(x)).$$

Proof. We may assume $(x_0, y_0) = (0, 0)$ and $Y = Z$. Define $\varphi : X \times Y \to X \times Y$ by $\varphi(x, y) = (x, f(x, y))$. Then, $\varphi'$ is represented by the matrix

$$\begin{pmatrix} I & 0 \\ D_1 f & D_2 f \end{pmatrix}$$

and is a linear homeomorphism (isomorphism) at $(x_0, y_0)$. By the previous theorem, there is a suitable local inverse, which yields $\psi_{\tilde{u}}$. □

(Alternate proof sketch). Let $L = D_2 f(x_0, y_0)$ with inverse $L^{-1}$. Then $f(x, y) = 0$ if and only if $y = y - L^{-1} f(x, y)$. Let $h(x, y) = y - L^{-1} f(x, y)$. Then, on a small open neighborhood $N$,

$$h(x, y_1) - h(x, y_2) = L^{-1}\big(D_2 f(x_0, y_0)(y_1 - y_2) - (f(x, y_1) - f(x, y_2))\big) = L^{-1}(w(x, y_1, y_2))$$

(as $L^{-1} \circ L = I$). Let $\epsilon > 0$ be such that $\epsilon\|L^{-1}\| \le 1/2$. Since $f$ is $C^1$ in $N$, there are open neighborhoods $U_0$ of $x_0$, $W_0$ of $y_0$ such that if $x \in U_0$ and $y_1, y_2 \in W_0$, then $\|w(x, y_1, y_2)\| \le \epsilon\|y_1 - y_2\|$ and hence that

$$\|h(x, y_1) - h(x, y_2)\| \le \epsilon\|L^{-1}\|\,\|y_1 - y_2\| \le \tfrac{1}{2}\|y_1 - y_2\|.$$

By Lemma 3.1 (as $f(x_0, y_0) = 0$), we have a unique continuous map $\psi_u : U_0 \to W_0$ (possibly smaller neighborhoods) with $f(x, \psi_u(x)) = 0$. Differentiability can be shown similarly to (2) of Theorem 3.2 (cf. [D-7]). □

Corollary 3.5. Let $f = (f_1, \ldots, f_n) : \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}^n$ be a $C^1$ map of a neighborhood $N_m \times N_n$ of $(x_0, y_0) = (x_{01}, \ldots, x_{0m}, y_{01}, \ldots, y_{0n})$ such that $f(x_0, y_0) = 0$ and the Jacobian $\partial(f)/\partial(y) = \partial(f_1, \ldots, f_n)/\partial(y_1, \ldots, y_n)$ does not vanish at $(x_0, y_0)$. Then there is an open neighborhood $U_0 \subset N_m$, $x_0 \in U_0$, such that, for any connected open neighborhood $\tilde{U}$ of $x_0$ with $\tilde{U} \subset U_0$, there is a unique continuously differentiable $\psi_{\tilde{u}} : \tilde{U} \to N_n$ with $\psi_{\tilde{u}}(x_0) = y_0$ and $f(x, \psi_{\tilde{u}}(x)) = 0$ for $x \in \tilde{U}$ and the Jacobian satisfies

$$(D_j \psi_{\tilde{u},i}(x)) = -\Big(\frac{\partial(f)}{\partial(y)}\Big)^{-1}\Big|_{\psi_{\tilde{u}}(x)} \Big(\frac{\partial(f)}{\partial(x)}\Big)\Big|_{\psi_{\tilde{u}}(x)}.$$
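The proof of Theorem 3.2 is constructive: the local inverse at $w$ is the fixed point of $x \mapsto x - f(x) + w$, and Lemma 3.1 guarantees that iterating this map converges. The following is a minimal sketch, assuming a hypothetical smooth map on $\mathbb{R}^2$ normalized so that $f(0) = 0$ and $f'(0) = I$; the particular map and the target $w$ are illustrative assumptions.

```python
import numpy as np

def f(x):
    """Hypothetical smooth map with f(0) = 0 and f'(0) = I."""
    return x + 0.1 * np.array([x[0] ** 2, x[0] * x[1]])

def local_inverse(w, iterations=60):
    """Iterate the contraction x -> x - f(x) + w = h(x) + w, starting at 0."""
    x = np.zeros(2)
    for _ in range(iterations):
        x = x - f(x) + w
    return x

w = np.array([0.2, -0.1])       # target value near f(0) = 0
x = local_inverse(w)            # approximates f^{-1}(w)
residual = np.linalg.norm(f(x) - w)
```

Near the origin the derivative of $h(x) = x - f(x)$ has norm well below $1/2$ for this map, so the iteration converges geometrically and `residual` is at rounding level.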

Chapter 4

Minimization of Functionals: Hilbert Spaces

Let $H$ be an inner product space with $[\cdot, \cdot]$ as inner product and let $\mathcal{H}$ be the Hilbert space which is the completion of $H$ with respect to $[\cdot, \cdot]$. Let $J$ be a functional on $H$.

Definition 4.1. $J$ is differentiable at $h_0$ if there is a continuous linear functional $DJ(h_0)$ on $H$ such that

$$J(h_0 + h) - J(h_0) = DJ(h_0)\cdot h + o(\|h\|) \tag{4.2}$$

where

$$\lim_{\|h\| \to 0} \frac{o(\|h\|)}{\|h\|} = 0. \tag{4.3}$$

If $J$ is differentiable at $h_0$, then there is an element $h^0$ in $\mathcal{H}$, such that

$$DJ(h_0)\cdot h = [h^0, h] \quad \text{for all } h \in H; \tag{4.4}$$

$h^0$ is the gradient of $J$ at $h_0$ and we write $h^0 = \nabla J(h_0)$. $J$ is twice differentiable at $h_0$ if there is a continuous linear functional $DJ(h_0)$ on $H$ and a continuous quadratic functional $D^2J(h_0)$ on $H$ such that

$$J(h_0 + h) - J(h_0) = DJ(h_0)h + D^2J(h_0)(h, h) + o(\|h\|^2). \tag{4.5}$$

Since $D^2J(h_0)$ is continuous, there is a linear transformation $L$ of $\mathcal{H}$ into itself such that

$$D^2J(h_0)(h, g) = [Lh, g]. \tag{4.6}$$

$L$ is the Hessian of $J$ at $h_0$ and we write $L = \Theta J(h_0)$ so that

$$D^2J(h_0)(h, g) = [\Theta J(h_0)h, g] \tag{4.7}$$

for all $h$, $g$.

Proposition 4.8. Let $h^*$ be an interior local minimum of $J$. Then $\nabla J(h^*) = 0$.

© Springer Science+Business Media, LLC, part of Springer Nature 2019 P. Falb, Direct Methods in Control Problems, https://doi.org/10.1007/978-0-8176-4723-0_4

Proof. Suppose $\nabla J(h^*) \ne 0$. Then there is an $\tilde{h}$, $\|\tilde{h}\| = 1$, with $[\nabla J(h^*), \tilde{h}] = \alpha > 0$. For $|t|$ small enough, $J(h^* + t\tilde{h}) - J(h^*) = t\cdot\alpha + o(|t|)$. Let $\alpha > \epsilon > 0$. Let $t < 0$ with $|t|$ so small that $J(h^* + t\tilde{h}) - J(h^*) \ge 0$ (local minimum) but $t\alpha < 0$ and $o(|t|)/|t| < \epsilon$. Hence, $t\cdot\alpha + o(|t|) = -|t|\alpha + o(|t|) \ge 0$, so that $o(|t|)/|t| \ge \alpha > 0$. But then $\epsilon > o(|t|)/|t| \ge \alpha$ and $\alpha < \epsilon$ is a contradiction. □

Proposition 4.9. If $h^*$ is an interior point, if $\nabla J(h^*) = 0$, and if $\Theta J(h^*)$ is strictly positive (i.e., $[\Theta J(h^*)h, h] \ge M\|h\|^2$, $M > 0$ and all $h$), then $h^*$ is a local minimum of $J$.

Proof. Exercise. □

Definition 4.10. Let $V(\{G_\alpha\}) = \{h : G_\alpha(h) = 0,\ G_\alpha \text{ functions on } H\}$. $V(\{G_\alpha\})$ is a smooth variety in $H$ if the $G_\alpha$ are continuously differentiable and $\nabla G_\alpha(h)$, $h \in V(\{G_\alpha\})$, are linearly independent. If $h \in V(\{G_\alpha\})$, then $V_h(\{G_\alpha\}) = \operatorname{span}(\{\nabla G_\alpha(h)\})$ and the tangent space to $V(\{G_\alpha\})$ at $h$ is $V_h^\perp(\{G_\alpha\}) = \{\tilde{h} : [\nabla G_\alpha(h), \tilde{h}] = 0 \text{ all } \alpha\}$. $V(\{G_\alpha\})$ is finitely defined if $\alpha = 1, \ldots, k$, $k$ finite.

Remark 4.11. If $V = V(G_1, \ldots, G_k)$ is a finitely-defined, smooth variety and $h^* \in V$ and $M_V(h^*) = ([\nabla G_\alpha(h^*), \nabla G_\beta(h^*)])$, $\alpha, \beta = 1, \ldots, k$, then $\det M_V(h^*) \ne 0$ and for any given $h$, the equation

$$M_V(h^*)v = ([\nabla G_i(h^*), h]),\quad i = 1, \ldots, k,\quad v = (v_1, \ldots, v_k)^T,$$

has a unique solution $v = \lambda(h)$, $v_i = \lambda_i(h)$, $i = 1, \ldots, k$.

Lemma 4.12 (Lagrange Multipliers). Let $V = V(G_1, \ldots, G_k)$ be a finitely-defined, smooth variety on $H$. Suppose that $J$ is differentiable on $V$ and that $h^*$ is a local minimum of $J$ on $V$ and that $\nabla J, \nabla G_1, \ldots, \nabla G_k$ are continuous in a neighborhood of $h^*$. Then: (1) $\nabla J(h^*)$ is orthogonal to the tangent space $V_{h^*}$ to $V$ at $h^*$; and (2) if $\nabla J(h^*) \ne 0$, then there are scalars $\lambda_i = \lambda_i(\nabla J(h^*))$, $i = 1, \ldots, k$, such that

$$\nabla J(h^*) = \sum_{i=1}^{k} \lambda_i \nabla G_i(h^*). \tag{4.13}$$

The $\lambda_i$ are called Lagrange Multipliers.

Proof. Let us assume for the moment that the assertion (1) holds. Since the $\nabla G_i(h^*)$, $i = 1, \ldots, k$, are linearly independent, we have by Remark 4.11 that

$$M_V(h^*)\lambda = ([\nabla G_i(h^*), \nabla J(h^*)])_{i=1,\ldots,k} \tag{4.14}$$

has a unique solution $\lambda = \lambda(\nabla J(h^*)) = (\lambda_i(\nabla J(h^*))) = (\lambda_i)$. Moreover, if $\nabla J(h^*) \ne 0$, then the $\lambda_i$ are not all zero since $\nabla J(h^*)$ is orthogonal to $V_{h^*}$. In view of (4.14),

$$\nabla J(h^*) - \sum_{i=1}^{k} \lambda_i \nabla G_i(h^*)$$

is in $V_{h^*}$. It follows that $[\nabla J(h^*) - \sum\lambda_i\nabla G_i(h^*),\ \nabla J(h^*) - \sum\lambda_i\nabla G_i(h^*)] = 0$ and so, $\nabla J(h^*) - \sum\lambda_i\nabla G_i(h^*) = 0$, which is assertion (2) of the lemma.

Now, let us turn to assertion (1). We are going to use the Implicit Function Theorem 3.4 to obtain the result. Let $h$ be an element of $V_{h^*}$. Let $K_h = \operatorname{span}\{h\}$ and $K_G = \operatorname{span}\{\nabla G_i(h^*),\ i = 1, \ldots, k\}$. Since $K_h$ and $K_G$ are orthogonal, $K = K_h + K_G$ is a direct sum. Consider the functions $\Gamma_i : K \to \mathbb{R}$, $i = 1, \ldots, k$, defined by

$$\Gamma_i(v) = \Gamma_i(v_h + v_G) = G_i(h^* + v_h + v_G),\quad i = 1, \ldots, k, \tag{4.15}$$

where $v_h$ and $v_G$ are the $K_h$ and $K_G$ components, respectively, of $v$ in $K$. Setting $\gamma_j = \nabla G_j(h^*)$, $j = 1, \ldots, k$, we have

$$\Gamma_i\Big(ah + \sum_{j=1}^{k} a_j\gamma_j\Big) = G_i\Big(h^* + ah + \sum_{j=1}^{k} a_j\gamma_j\Big),\quad i = 1, \ldots, k, \tag{4.16}$$

and so we may view the $\Gamma_i$ as functions on $\mathbb{R}^{k+1}$. In view of the continuity assumptions and the linear independence of the $\gamma_j$, we can use Theorem 3.4 to locally solve the equations

$$\Gamma_i(a, a_1, \ldots, a_k) = 0,\quad i = 1, \ldots, k, \tag{4.17}$$

for the $a_j$ in terms of $a$ in a neighborhood $N(0)$ of the origin. Let $\alpha_j(a)$, $j = 1, \ldots, k$, be this local solution and observe that

$$\frac{d\alpha_j(a)}{da}\Big|_0 = \frac{-\partial(\Gamma_1, \ldots, \Gamma_k; a_1, \ldots, a_{j-1}, a, a_{j+1}, \ldots, a_k)}{\partial(\Gamma_1, \ldots, \Gamma_k; a_1, a_2, \ldots, a_k)}\Big|_0 \tag{4.18}$$

for $j = 1, \ldots, k$ where the $\partial(\ldots, \ldots)$ are suitable Jacobians. However,

$$\frac{\partial\Gamma_i}{\partial a} = \Big[\nabla G_i\Big(h^* + ah + \sum a_j\gamma_j\Big), h\Big],\quad i = 1, \ldots, k, \tag{4.19}$$

and $h$ is an element of $V_{h^*}$. It follows that

$$\frac{d\alpha_j(a)}{da}\Big|_0 = 0,\quad j = 1, \ldots, k. \tag{4.20}$$

Let $N(h^*)$ be a suitably small neighborhood of $h^*$ and assume that $h^* + ah + \sum a_j\gamma_j \in N(h^*)$ if $(a, a_1, \ldots, a_k) \in N(0)$. Thus, if $a$ is sufficiently "near" 0,

$$J\Big(h^* + ah + \sum\alpha_j(a)\gamma_j\Big) - J(h^*) = a\Big\{[\nabla J(h^*), h] + \sum \frac{d\alpha_j(a)}{da}\Big|_0 [\nabla J(h^*), \gamma_j]\Big\} + o(|a|) \tag{4.21}$$

or, in view of (4.20),

$$J\Big(h^* + ah + \sum\alpha_j(a)\gamma_j\Big) - J(h^*) = a[\nabla J(h^*), h] + o(|a|). \tag{4.22}$$

But $h^*$ is a local minimum of $J$ on $V$, $h^* + ah + \sum\alpha_j(a)\gamma_j$ is an element of $V$ for $a$ "near" zero, and since $a$ may be positive or negative, we have $[\nabla J(h^*), h] = 0$ and the lemma is established. □

We shall generalize this result in Chapter 6. The key elements in the proof are the invertibility of a linear transformation ($M_V(h^*)$ here) and the use of the Implicit Function Theorem.

Let $D$ be a convex domain (has interior) in $H$, let $V = V(G_1, \ldots, G_k)$ be a finitely-defined, smooth variety, let

$$U = D \cap V, \tag{4.23}$$

and let $J$ be a functional on $U$. If $h \in U$, then the "direction" in which $J$ is decreasing most rapidly in $U$ is given by

$$-\nabla J(h) + \sum_{\beta=1}^{k} \lambda_\beta(h)\nabla G_\beta(h) \tag{4.24}$$

where the $\lambda_\beta(h)$ are the unique solution of the system

$$M_V(h)v = ([\nabla G_i(h), \nabla J(h)]). \tag{4.25}$$

Thus, if $h_0 \in U$, then the solution of the differential equation

$$\dot{h}(t) = -\Big\{\nabla J(h(t)) - \sum_{j=1}^{k} \lambda_j(h(t))\nabla G_j(h(t))\Big\},\quad t \ge 0, \tag{4.26}$$

with $h(0) = h_0$, is a curve of steepest descent. If $\lim_{t\to\infty} h(t) = h^*$ exists, then integration of (4.26) will generate a (locally) minimizing family. We have:

Theorem 4.27 (Elementary Hilbert space integration method [R-4]). Suppose that: (i) $J, G_1, \ldots, G_k$ are $C^2$ on $U$; (ii) if $h \in V$ and $V_h$ is the tangent space to $V$ at $h$, then

$$\Big[\Big\{\Theta J(h) - \sum_{j=1}^{k} \lambda_j(h)\Theta G_j(h)\Big\}g, g\Big] \ge M\|g\|^2,\quad M > 0 \tag{4.28}$$

for all $g \in V_h$ (coercive on $V_h$); (iii) let $\sigma(f) = \inf_{h \in \partial V}\{\|f - h\|\}$ be the distance from $f$ to the boundary $\partial V$ of $V$ and let $\Sigma \subset V$ be the set of elements $h_0 \in V$ such that the solution of (4.26), with $h(0) = h_0$, exists on $[0, \infty)$. Suppose that, for $h_0 \in \Sigma$,

$$\Big\|\nabla J(h_0) - \sum_{j=1}^{k} \lambda_j(h_0)\nabla G_j(h_0)\Big\| < M\sigma(h_0).$$

Then, if $h(t)$ is the solution of (4.26) with $h(0) = h_0 \in \Sigma$: (a) $\lim_{t\to\infty} h(t) = h^*$ exists in $H$; (b) $\lim_{t\to\infty} J(h(t)) = J^*$ exists; and (c) if $h \in U$, then $J(h) \ge J^* + o(\|h - h^*\|)$.

Proof. We have

$$\frac{dG_i(h(t))}{dt} = [\nabla G_i(h(t)), \dot{h}(t)] = \Big[\nabla G_i(h(t)),\ -\nabla J(h(t)) + \sum_{j=1}^{k} \lambda_j(h(t))\nabla G_j(h(t))\Big] = 0 \tag{4.29}$$

(by the definition of the $\lambda_j$) and

$$\frac{dJ(h(t))}{dt} = [\nabla J(h(t)), \dot{h}(t)] = \Big[\nabla J(h(t)) - \sum_{j=1}^{k} \lambda_j(h(t))\nabla G_j(h(t)),\ \dot{h}(t)\Big] = [-\dot{h}(t), \dot{h}(t)] = -\|\dot{h}(t)\|^2 \le 0,$$

hence $J(h(t))$ is non-increasing. It follows that

$$\frac{d^2J(h(t))}{dt^2} = -2[\dot{h}(t), \ddot{h}(t)] \tag{4.30}$$

and that

$$\frac{d^2J(h(t))}{dt^2} = [\Theta J(h(t))\dot{h}(t), \dot{h}(t)] + [\nabla J(h(t)), \ddot{h}(t)] = \Big[\Big\{\Theta J(h(t)) - \sum_{j=1}^{k} \lambda_j(h(t))\Theta G_j(h(t))\Big\}\dot{h}(t), \dot{h}(t)\Big] - [\dot{h}(t), \ddot{h}(t)]. \tag{4.31}$$

Combining (4.30) and (4.31), we have, in view of assumption (ii) and the fact that $\dot{h}(t) \in V_{h(t)}$ from (4.29), that

$$\frac{d^2J(h(t))}{dt^2} \ge -2M\frac{dJ(h(t))}{dt}. \tag{4.32}$$

Hence $e^{2Mt}\,dJ(h(t))/dt$ is an increasing function on $[0, \infty)$. But by (4.30)

$$-\|\dot{h}(0)\|^2 \le -e^{2Mt}\|\dot{h}(t)\|^2 \le 0 \tag{4.33}$$

which implies

$$0 \le \|\dot{h}(t)\| \le e^{-Mt}M\sigma(h_0). \tag{4.34}$$

Integrating gives

$$\|h(t) - h(s)\| \le \sigma(h_0)e^{-Ms},\quad t > s \ge 0, \tag{4.35}$$

which establishes (a). Property (b) is proved in an entirely similar way (Exercise). Let us turn our attention to (c). Observe that

$$\lim_{t\to\infty}\Big\{\nabla J(h(t)) - \sum_{j=1}^{k} \lambda_j(h(t))\nabla G_j(h(t))\Big\} = 0. \tag{4.36}$$

Let $h \in U$ and let $\varphi(\xi, t)$ be the function defined by

$$\varphi(\xi, t) = J(h(t) + \xi(h - h(t))) - \sum_{j=1}^{k} \lambda_j(h(t))G_j(h(t) + \xi(h - h(t))). \tag{4.37}$$

Then

$$\varphi(1, t) = J(h) = \varphi(0, t) + \frac{\partial\varphi}{\partial\xi}(0, t) + o(\|h - h(t)\|) = J(h(t)) + \Big[\nabla J(h(t)) - \sum_{j=1}^{k} \lambda_j(h(t))\nabla G_j(h(t)),\ h - h(t)\Big] + o(\|h - h(t)\|). \tag{4.38}$$

Letting $t \to \infty$, we find that $J(h) \ge J^* + o(\|h - h^*\|)$ which is (c). □

Corollary 4.39. Suppose that for $f \in D$ and $g \in H$

$$\Big[\Big\{\Theta J(f) - \sum_{j=1}^{k} \lambda_j(f)\Theta G_j(f)\Big\}g, g\Big] \ge M\|g\|^2,\quad M > 0,$$

then $h \in U$ implies $J(h) \ge J^* + M\|h - h^*\|^2$.

Proof. If $h \in U$ and $\varphi$ is given by (4.37), then

$$\varphi(1, t) = J(h) = \varphi(0, t) + \frac{\partial\varphi}{\partial\xi}(0, t) + \frac{\partial^2\varphi}{\partial\xi^2}(\hat{\xi}, t)$$

where $0 < \hat{\xi} < 1$. It follows that

$$J(h) = J(h(t)) + \Big[\nabla J(h(t)) - \sum_{j=1}^{k} \lambda_j(h(t))\nabla G_j(h(t)),\ h - h(t)\Big] + \Big[\Big\{\Theta J(\hat{h}) - \sum_{j=1}^{k} \lambda_j(\hat{h})\Theta G_j(\hat{h})\Big\}(h - h(t)),\ h - h(t)\Big]$$

where $\hat{h} = h(t) + \hat{\xi}(h - h(t))$. In view of the hypothesis,

$$J(h) \ge J^* + \Big[\nabla J(h(t)) - \sum_{j=1}^{k} \lambda_j(h(t))\nabla G_j(h(t)),\ h - h(t)\Big] + M\|h - h(t)\|^2.$$

Letting $t \to \infty$, we find that $J(h) \ge J^* + M\|h - h^*\|^2$ and the corollary is established. □

Theorem 4.27 is a simple form of integration method (often called the gradient method) which we will extend in the sequel and, also, interpret in a control context. Let us, in the same spirit, now turn our attention to a simple form of representation method (called the Ritz or Ritz–Galerkin method). Let $\psi_0, \psi_1, \psi_2, \ldots$ be a sequence of (linearly independent) elements of $H$ and let

$$H_k = \Big\{h : h = \sum_{j=0}^{k} a_j\psi_j\Big\} = \operatorname{span}\{\psi_0, \ldots, \psi_k\}. \tag{4.40}$$

Let $U \subset H$ and let $J$ be a functional. Then $H_k \cap U = U_k$ is a subset of $U$ and we let $\mathcal{U}_k$ be the subset of $\mathbb{R}^{k+1}$ given by

$$\mathcal{U}_k = \Big\{(a_0, a_1, \ldots, a_k) : \sum_{j=0}^{k} a_j\psi_j \in U\Big\}. \tag{4.41}$$

If $J_k$ is the function defined by

$$J_k(a_0, \ldots, a_k) = J(a_0\psi_0 + a_1\psi_1 + \cdots + a_k\psi_k), \tag{4.42}$$

then minimizing $J$ on $U_k$ is equivalent to minimizing $J_k$ on $\mathcal{U}_k$. Since $U_0 \subset U_1 \subset \cdots$, we note that

$$\inf_{\mathcal{U}_0} J_0 \ge \inf_{\mathcal{U}_1} J_1 \ge \cdots. \tag{4.43}$$

Thus, if $\lim_{k\to\infty} \inf_{\mathcal{U}_k} J_k = \inf_U J$ and if there are $\tilde{u}_k \in U_k$ such that $J(\tilde{u}_k) = \inf_{\mathcal{U}_k} J_k$, then $\{\tilde{u}_k\}$ is indeed a minimizing sequence. In other words, the problem of minimizing $J$ has been replaced by a (nested) sequence of finite-dimensional minimization problems. This idea is at the heart of representation methods.
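The nested finite-dimensional problems can be illustrated numerically. The following is a minimal sketch, assuming the model functional $J(h) = \int_0^1 \{\tfrac12 h'(x)^2 - h(x)\}\,dx$ on functions vanishing at $0$ and $1$, the basis $\psi_j(x) = \sin((j+1)\pi x)$, and midpoint quadrature; all three are illustrative assumptions rather than choices made in the text. For this $J$ the exact minimizer is $x(1-x)/2$ with minimum value $-1/24$, so the monotone behaviour of (4.43) can be observed directly.

```python
import numpy as np

N = 4000
x = (np.arange(N) + 0.5) / N      # midpoint quadrature nodes on (0, 1)

def ritz_minimum(k):
    """Minimum of J_k(a) = 0.5*a^T A a - b^T a over span{psi_0, ..., psi_k}."""
    n = np.arange(1, k + 2)[:, None]             # frequencies 1, ..., k+1
    psi = np.sin(n * np.pi * x)                  # basis functions, shape (k+1, N)
    dpsi = n * np.pi * np.cos(n * np.pi * x)     # their derivatives
    A = dpsi @ dpsi.T / N                        # A_ij ~ integral of psi_i' psi_j'
    b = psi.sum(axis=1) / N                      # b_i  ~ integral of psi_i (load f = 1)
    a = np.linalg.solve(A, b)                    # stationarity: A a = b
    return 0.5 * a @ A @ a - b @ a

minima = [ritz_minimum(k) for k in range(6)]
exact_minimum = -1.0 / 24.0
```

The list `minima` is nonincreasing in $k$ and approaches $-1/24$ from above; only odd frequencies contribute for this load, so consecutive entries may coincide up to rounding.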

Definition 4.44. $\psi_0, \psi_1, \ldots$ is complete in $U$ if $u \in U$ and $\epsilon > 0$ together imply that there is a $k$ ($= k(\epsilon)$) and an element $u_k$ of $U_k$ such that $\|u_k - u\| < \epsilon$.

Theorem 4.45 (Elementary Hilbert space representation method). Suppose that, for all $k$, there is a $u_k^*$ in $U_k$ such that

$$J(u_k^*) = \inf_{U_k} J = \inf_{\mathcal{U}_k} J_k. \tag{4.46}$$

Suppose further that $\{\psi_0, \psi_1, \ldots\}$ is complete in $U$ and that $J$ is continuous on $U$. Then $u_0^*, u_1^*, \ldots$ is a minimizing sequence for $J$ on $U$.

Proof ([L-6]). Let $\epsilon > 0$ and let $\tilde{u} \in U$ be such that $J(\tilde{u}) < J^* + \epsilon$ where $J^* = \inf_U J(u)$. Since $J$ is continuous on $U$, there is a $\delta > 0$ such that $\|u - \tilde{u}\| < \delta$, $u \in U$, imply $|J(u) - J(\tilde{u})| < \epsilon$. Since $\tilde{u} \in U$ and the $\{\psi_j\}$ are complete in $U$, we observe that, for some $k$, there is a $\tilde{u}_k \in U_k$ such that $\|\tilde{u}_k - \tilde{u}\| < \delta$ and hence that $|J(\tilde{u}_k) - J(\tilde{u})| < \epsilon$ and that $J(\tilde{u}_k) < J^* + 2\epsilon$. By assumption $J(u_k^*) \le J(\tilde{u}_k)$ and so $J^* \le J(u_k^*) < J^* + 2\epsilon$. Since the values $J(u_k^*)$ are nonincreasing in $k$ by (4.43), it follows immediately that $\lim_{k\to\infty} J(u_k^*) = J^*$. □

Chapter 5

Minimization of Functionals: Uniformly Convex Spaces

Let $X$ be a Banach space and let $X^*$ be its dual. We have:

Definition 5.1. If $f \in X^*$, then $f$ has the (Sobolev) gradient property if, for $\lambda > 0$, there is a unique $x_f \in X$ such that

$$f(x_f) = \sup_{x \in X,\ \|x\| = \lambda} |f(x)| \tag{5.2}$$

(clearly enough for $\lambda = 1$). $X$ has the gradient property if every $f \in X^*$ has the gradient property.

Proposition 5.3. If $X = H$ is a Hilbert space, then $H$ has the gradient property.

Proof. If $f \in H^* = H$, then there is a unique $h_f \in H$ such that $f(h) = \langle h_f, h\rangle$. Then

$$\sup_{h \in H,\ \|h\| = 1} |\langle h_f, h\rangle| = \frac{\langle h_f, h_f\rangle}{\|h_f\|},$$

since $|\langle h_f, h\rangle| \le \|h_f\|\cdot\|h\| \le \|h_f\|$ if $\|h\| = 1$. □

Definition 5.4. Let $X$ have the gradient property and let $\psi \in C^1(X, \mathbb{R})$ so that $\psi'(x) \in X^*$ and $\psi' : X \to X^*$. The gradient of $\psi$, $\nabla\psi$, is a map of $X$ into $X$ given by

$$\psi'(x)[\nabla\psi(x)] = \sup_{y \in X,\ \|y\| = 1} |\psi'(x)y| \tag{5.5}$$

for $x \in X$.

We want to prove an analogue of Theorem 4.27 for these (Sobolev) gradients in so-called uniformly convex spaces.

Proposition 5.6. Let $X$ be a Banach space. Then the following conditions (1) and (2) are equivalent.

(1) If $0 < \epsilon \le 2$, $\|x\| \le 1$, $\|y\| \le 1$, and $\|x - y\| \ge \epsilon$, then there is a $\delta > 0$ such that $\|(x + y)/2\| \le 1 - \delta$.

© Springer Science+Business Media, LLC, part of Springer Nature 2019 P. Falb, Direct Methods in Control Problems, https://doi.org/10.1007/978-0-8176-4723-0_5

(2) If $\epsilon > 0$, $\|x\| = \|y\| = 1$, and $\|x - y\| \ge 2\epsilon$, then there is a $\delta$, $0 < \delta < 1$, such that $\|(x + y)/2\| \le \delta$ (or $\|x + y\| \le 2\delta$).

Proof. Assume (1). Let $\epsilon > 0$, $\|x\| = \|y\| = 1$, and $\|x - y\| \ge 2\epsilon$; then we must have $0 < \epsilon \le 1$ as $2 \ge \|x - y\| \ge 2\epsilon$. Let $\epsilon' = 2\epsilon$ so that by (1), there is a $\delta' > 0$ such that $\|(x + y)/2\| \le 1 - \delta'$. Let $\delta = 1 - \delta'$ and note $1 - \delta' > 0$ so that $1 > \delta'$ and $0 < \delta < 1$. Assume (2). Let $\epsilon' = \epsilon/2$ so that $\|x - y\| \ge 2\epsilon'$ which implies there is a $\delta'$, $0 < \delta' < 1$, with $\|(x + y)/2\| \le \delta'$. Let $\delta = 1 - \delta'$. Then $\delta > 0$ and $\|x - y\| \ge \epsilon$ and $\|(x + y)/2\| \le 1 - \delta$. □

Definition 5.7. A Banach space $X$ is uniformly convex if the equivalent conditions of Proposition 5.6 are satisfied [C-4].

Proposition 5.8. If $X$ is uniformly convex, then $X$ has the gradient property.

Proof. Since $X$ is reflexive [N-1, H-5], there is an $x_f^*$, $\|x_f^*\| = 1$, such that $\|f\| = \sup_{\|x\|=1} |f(x)| = f(x_f^*)$ for any $f$ in $X^*$. Suppose that

$$\|f\| = \sup_{\|x\|=1} |f(x)| = f(x_1) = f(x_2)$$

with $x_1 \ne x_2$. Then say, $\|x_1 - x_2\| \ge \epsilon$, $0 < \epsilon \le 2$, and so there is a $\delta$, $0 < \delta < 1$, such that $\|(x_1 + x_2)/2\| \le \delta < 1$. But

$$f\Big(\frac{x_1 + x_2}{2}\Big) = \frac{f(x_1) + f(x_2)}{2} = \frac{2f(x_1)}{2} = f(x_1) = \|f\|$$

and $\|(x_1 + x_2)/2\| < 1$ which is a contradiction (for $\|(x_1 + x_2)/2\| \le \delta < 1$ implies $|f(h/\|h\|)| = |f(h)|/\|h\|$ where $h = (x_1 + x_2)/2$, i.e., $\|f\|/\|h\| > \|f\|$). □

Following [N-1] and [Z-1], we are now going to develop a gradient theory for uniformly convex spaces extending the results of Chapter 4.

Proposition 5.9. Suppose that $X$ is uniformly convex and that $J \in C^1(X)$. Let $\nabla J(x)$ be the Sobolev gradient of $J$ at $x$. Then

$$DJ(x)[\nabla J(x)] = \|\nabla J(x)\|^2 \tag{5.10}$$

for all $x \in X$.

Proof. By definition $\|\nabla J(x)\| = |DJ(x)| = \sup_{\|h\|=1} |DJ(x)h|$ and $|DJ(x)|^2 = \sup_{\|h\|=|DJ(x)|} |DJ(x)h|$. By definition then, $|DJ(x)|^2 = DJ(x)[\nabla J(x)]$ and so $\|\nabla J(x)\|^2 = DJ(x)[\nabla J(x)]$. □

Corollary 5.11. If $x^*$ is a local minimum of $J$, then $DJ(x^*) = 0$ and so $DJ(x^*)[\nabla J(x^*)] = \|\nabla J(x^*)\|^2 = 0$.

Proof. Since $x^*$ is a local minimum, $DJ(x^*) = 0$ and so $DJ(x^*)[\nabla J(x^*)] = \|\nabla J(x^*)\|^2 = 0$. □

Definition 5.12. Let $X$ be a Banach space and let $J \in D^1(X)$ (continuous and differentiable). An element $u^* \in U \subset X$ is an extremum or critical point of $J$ on $U$ if $DJ(u^*) = 0$.

Theorem 5.13 ([Z-1, N-1]). Let $X$ be a uniformly convex Banach space and let $\psi \in C^2(X)$. If $x_0 \in X$, then there is a unique solution $x(t)$, $t \in [0, \infty)$, of

$$\dot{x}(t) = -(\nabla\psi)(x(t)) \tag{5.14}$$

with $x(0) = x_0$, $t \ge 0$.

Proof. By standard results on existence and uniqueness, there is a solution $x_0^*(t)$ on an interval $[0, t_0^*)$. Consider the set of all such $t_0^*$ and suppose it is bounded with $\operatorname{lub} t_0^* = t^*$ ($< \infty$). We shall show that this leads to contradiction. Let $x^*(t)$ be the solution on $[0, t^*)$ and let $0 \le \alpha < \beta < t^*$. Then

$$\|x^*(\beta) - x^*(\alpha)\|^2 = \Big\|\int_\alpha^\beta \frac{dx^*}{ds}(s)\,ds\Big\|^2 \le (\beta - \alpha)\int_\alpha^\beta \Big\|\frac{dx^*}{ds}(s)\Big\|^2 ds.$$

Now,

$$\frac{d\psi}{ds}(x^*(s)) = \frac{d\psi}{dx}\Big|_{x^*(s)}\frac{dx^*(s)}{ds} = -(D\psi(x^*(s)))[\nabla\psi(x^*(s))]$$

and so,

$$\psi(x^*(\beta)) - \psi(x^*(\alpha)) = \int_\alpha^\beta \frac{d\psi}{ds}(x^*(s))\,ds = -\int_\alpha^\beta \|D\psi(x^*(s))\|^2\,ds.$$

It follows that

$$\int_\alpha^\beta \|(D\psi)(x^*(s))\|^2\,ds \le \psi(x^*(\alpha)),\quad \beta \ge \alpha.$$

But, then

$$\|x^*(\beta) - x^*(\alpha)\| \le \int_\alpha^\beta \|-(\nabla\psi)(x^*(s))\|\,ds = \int_\alpha^\beta \Big\|\frac{dx^*(s)}{ds}\Big\|\,ds \le (t^*\psi(x^*(\alpha)))^{1/2}$$

for $\alpha \le \beta < t^*$. Then

$$\lim_{\beta \to t^*-} \int_\alpha^\beta \Big\|\frac{dx^*(s)}{ds}\Big\|\,ds \text{ exists}$$

and

$$\lim_{t \to t^*-} x^*(t) = y^* \text{ exists}.$$

Consequently, the solution can be extended from $t^*$ (i.e., solve (5.14) with $x(t^*) = y^*$ and patch). □
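The energy identity used in this proof, $\psi(x(\beta)) - \psi(x(\alpha)) = -\int_\alpha^\beta \|D\psi(x(s))\|^2\,ds$, can be observed on a finite-dimensional example. The following is a minimal sketch, assuming $X = \mathbb{R}^2$ with the Euclidean norm, the hypothetical functional $\psi(x) = \tfrac12\|Ax - b\|^2$, and explicit Euler time stepping; the matrix, the data and the step size are illustrative assumptions.

```python
import numpy as np

A = np.array([[2.0, 0.0], [1.0, 1.0]])   # hypothetical invertible matrix
b = np.array([1.0, -1.0])

def psi(x):
    r = A @ x - b
    return 0.5 * r @ r

def grad_psi(x):
    return A.T @ (A @ x - b)

dt, steps = 1e-3, 5000                   # integrate (5.14) on [0, 5]
x = np.array([3.0, -2.0])                # initial point x(0) = x_0
values = [psi(x)]
dissipated = 0.0
for _ in range(steps):
    g = grad_psi(x)
    dissipated += dt * (g @ g)           # Riemann sum for the integral of ||grad psi||^2
    x = x - dt * g                       # Euler step for x' = -grad psi(x)
    values.append(psi(x))

energy_drop = values[0] - values[-1]
```

Up to the first-order time-stepping error, `energy_drop` matches `dissipated`, and `values` decreases monotonically along the discrete trajectory.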

34

5 Minimization of Functionals: Uniformly Convex Spaces

Proposition 5.15. Let $f \in C^2(X, Y)$, $\alpha \ge 2$, and let $J_f : X \to \mathbb{R}$ be given by

$$J_f(x) = \frac{\|f(x)\|^\alpha}{\alpha}. \tag{5.16}$$

If $f(x_0) = 0$, then $(DJ_f)(x_0) = 0$.

Proof. If $x$ is near $x_0$, then $\|f(x)\|/\|x - x_0\| < 1 + \|Df(x_0)\|$. Since $f$ is continuous and $f(x_0) = 0$, $\|f(x)\|$ […] $0$. It follows that, as $J_f(x_0) = 0$, $|J_f(x) - J_f(x_0)| = \|f(x)\|^\alpha/\alpha$ […] $= 0$.

Theorem 5.19. Let $X$ be uniformly convex and $J \in C^2(X)$. Let $U \subset X$ and $J$ be gradient coercive on $U$. If $u_0 \in U$, and $x(t) \in U$, $t \ge 0$, where

$$\dot{x}(t) = -(\nabla J)(x(t)),\quad x(0) = u_0, \tag{5.20}$$

then $\lim_{t\to\infty} x(t) = u^*$ exists and $J(u^*) = 0$ (hence $u^* \in U$ and $J(u^*) = \inf_{u \in U} J(u)$ as $J(u) \ge 0$, $u \in U$).

Proof. If $\nabla J(u_0) = 0$, then we are done. So assume $\nabla J(u_0) \ne 0$ and, consequently, $\nabla J(x(t)) \ne 0$ for all $t$ (otherwise it contradicts uniqueness). Now, by Proposition 5.9, $\frac{d}{ds}J(x(s)) = -\|\nabla J(x(s))\|^2$ and, by gradient coerciveness, $\frac{d}{ds}J(x(s)) \le -\lambda^2 J(x(s))$ and

$$\ln\Big[\frac{J(x(s))}{J(u_0)}\Big] \le -\lambda^2 s$$

which implies $J(x(s)) \le J(u_0)e^{-\lambda^2 s}$ so that $\lim_{t\to\infty} J(x(t)) = 0$. Moreover,

$$\int_0^\infty \Big\|\frac{dx(s)}{ds}\Big\|\,ds \le \sqrt{J(u_0)}\sum_{n=0}^{\infty} e^{-\lambda^2 n/2} = \frac{\sqrt{J(u_0)}}{1 - e^{-\lambda^2/2}}.$$

In other words,

$$\frac{dx(s)}{ds} \in L^1([0, \infty))$$

and so $\lim_{t\to\infty} x(t) = u^*$ exists and $J(u^*) = 0$ as $0 = \lim_{t\to\infty} J(x(t))$ and $J$ is continuous. □

Corollary 5.21. Let $f \in C^2(X, Y)$, $\alpha \ge 2$, and $X$ be uniformly convex. Let $J_f : X \to \mathbb{R}$ be given by

$$J_f(x) = \frac{\|f(x)\|^\alpha}{\alpha}.$$

Suppose that on $S(u_0, r) = \{\|x - u_0\| \le r\}$, $\|\nabla J_f(x)\|_X \ge \lambda\|f(x)\|_Y$, for some $\lambda > 0$. Let $x(t)$, $t \in [0, \infty)$, be the solution of

$$\frac{dx}{dt}(t) = -\nabla J_f(x(t)),\quad x(0) = u_0.$$

If $\|f(u_0)\| \ge (\lambda r)^{1/\alpha - 1}$, then $\lim_{t\to\infty} x(t) = u^*$ exists and $J_f(u^*) = 0$.

Proposition 5.22. Suppose that $X$ is uniformly convex and $J \in C^2(X, \mathbb{R}^+)$. Let $u_0 \in X$ with $\nabla J(u_0) \ne 0$ and let $x(t)$ be the solution of $\dot{x}(t) = -\nabla J(x(t))$ with $x(0) = u_0$. Let $\sigma(t) = DJ(x(t))[D(\nabla J(x(t)))](\nabla J(x(t)))$ and suppose there is a $\lambda > 0$ such that

$$D^2J(x(t))[\nabla J(x(t)), \nabla J(x(t))] - \sigma(t) \ge \lambda\|\nabla J(x(t))\|^2. \tag{5.23}$$

Then $\lim_{t\to\infty} x(t) = u^*$ exists and $\nabla J(u^*) = 0$.

Proof. Let $\xi(t) = J(x(t))$. Then

$$\dot{\xi}(t) = DJ(x(t))[\nabla J(x(t))] = \|\nabla J(x(t))\|^2,$$

$$\ddot{\xi}(t) = D^2J(x(t))[\nabla J(x(t)), \nabla J(x(t))] - DJ(x(t))[D(\nabla J(x(t)))](\nabla J(x(t))).$$

Hence, since $\ddot{\xi}(t)/\dot{\xi}(t) = d\log\dot{\xi}(t)/dt$, $\ddot{\xi}(t)/\dot{\xi}(t) \le -\lambda$ and

$$\int_0^t \frac{d\log\dot{\xi}(s)}{ds}\,ds = \log\frac{\dot{\xi}(t)}{\dot{\xi}(0)} \le -\lambda t,$$

therefore $\dot{\xi}(t)/\dot{\xi}(0) \le e^{-\lambda t}$, $\lambda > 0$. But $\dot{\xi}(0) = \|\nabla J(u_0)\|^2 \ne 0$ and so, $\lim_{t\to\infty} x(t) = u^*$ exists (estimate the integral by a sum). By continuity $\nabla J(u^*) = 0$. □

Now let us consider a simple form of a representation method in the situation of a uniformly convex space. The situation is complicated by the absence of orthonormal bases. In fact, even a separable, infinite-dimensional Banach space does not have a countable vector basis.

Definition 5.24. Let $X$ be a Banach space. A family $\{x_\lambda\}$ is a vector (or algebraic) basis of $X$ if any $x \in X$ can be written as a unique finite linear combination of elements of $\{x_\lambda\}$. If $V$ is a linear subspace of $X$, then a linear subspace $W$ of $X$ is an algebraic supplement of $V$ if $X = V + W$ and $V \cap W = \{0\}$.

Proposition 5.25. Given a linear subspace $V \subset X$, there exists an algebraic supplement $W$ of $V$.

Proof. Consider the family $\mathcal{A}$ of subspaces $A$ such that $V < V + A$ and $V \cap A = \{0\}$. We may assume $V \ne X$. Order $\mathcal{A}$ by inclusion and note that if $A_\alpha$ is a chain in $\mathcal{A}$, then $\bigcup A_\alpha$ is in $\mathcal{A}$ and so (Zorn's lemma), there is a maximal element $W$ which must be an algebraic supplement. □

Let $X_1$, $X_2$ be Banach spaces (normed linear is enough) and let $X = X_1 \times X_2$. Then the maps $i_1 : x_1 \to (x_1, 0)$ and $i_2 : x_2 \to (0, x_2)$ are linear isometries of $X_1$ and $X_2$ onto the closed subspaces $X_1 \times \{0\}$ and $\{0\} \times X_2$, respectively, of $X$. Clearly $X = [X_1 \times \{0\}] \oplus [\{0\} \times X_2]$. On the other hand, suppose $X$ is the algebraic direct sum of two subspaces $X_1$ and $X_2$. Then the map $(x_1, x_2) \to x_1 + x_2$ is a continuous (but not necessarily bicontinuous) map of $X_1 \times X_2$ onto $X$. If $x \in X$, then $x = \pi_1(x) + \pi_2(x)$ with $\pi_1 : X \to X_1$ and $\pi_2 : X \to X_2$.

Definition 5.26. If either (hence both) of the mappings $\pi_i$ are continuous, then $X$ is the topological direct sum of $X_1$ and $X_2$ and we write $X = X_1 \oplus X_2$. If $V$ is a closed subspace of $X$, then a (necessarily) closed subspace $W$ of $X$ is a topological supplement of $V$ if $X = V \oplus W$.

Definition 5.27. A subset $A \subset X$ is total if $\overline{\operatorname{sp}}(A) = X$ and a family $\{x_\alpha\}$ is total if $A = \{x_\alpha\}$ is total.

Remark 5.28. If $V$ is a closed subspace of $X$ and $W$ is a subspace with $\dim W = n$ finite, then $V + W$ is closed in $X$ [D-9] and hence any finite-dimensional subspace is closed. If $V$ is closed and $W$ is an algebraic supplement of $V$ with $\dim W = n$ finite, then $W$ is also a topological supplement of $V$ and $X = V \oplus W$.

If there is a countable total family $\{x_n\}$ in $X$, then $X$ is separable and conversely, if $X$ is separable, then there is a countable linearly independent total family $\{x_n\}$ in $X$. The spaces $L^p$, $\ell^p$ for $1 \le p < \infty$ are separable but $L^\infty$, $\ell^\infty$ are not. So suppose that $X$ is separable, $U \subset X$ and $J$ is a functional. Let $\{\psi_n\}$ be a total linearly independent set (so $\overline{\operatorname{sp}}\{\psi_n\} = X$). Then, letting

$$X_k = \Big\{x : x = \sum_{j=0}^{k} a_j\psi_j\Big\} = \operatorname{span}\{\psi_0, \ldots, \psi_k\} \tag{5.29}$$

and $U_k = X_k \cap U$, we have

$$\mathcal{U}_k = \Big\{(a_0, \ldots, a_k) : \sum_{j=0}^{k} a_j\psi_j \in U\Big\} \tag{5.30}$$

as a subset of $\mathbb{R}^{k+1}$. If $J_k$ is the function defined by

$$J_k(a_0, \ldots, a_k) = J(a_0\psi_0 + \cdots + a_k\psi_k), \tag{5.31}$$

then minimizing $J$ on $U_k$ is equivalent to minimizing $J_k$ on $\mathcal{U}_k$. Since $U_0 \subset U_1 \subset \cdots$, we note that

$$\inf_{\mathcal{U}_0} J_0 \ge \inf_{\mathcal{U}_1} J_1 \ge \cdots. \tag{5.32}$$

Thus, if $\lim_{k\to\infty} \inf_{\mathcal{U}_k} J_k = \inf_U J$ and if there are $\tilde{u}_k \in U_k$ such that $J(\tilde{u}_k) = \inf_{\mathcal{U}_k} J_k$, then $\{\tilde{u}_k\}$ is indeed a minimizing sequence.

Theorem 5.33 (Elementary separable space representation method). Suppose that, for all $k$, there is a $u_k^*$ in $U_k$ such that

$$J(u_k^*) = \inf_{U_k} J = \inf_{\mathcal{U}_k} J_k, \tag{5.34}$$

$\{\psi_n\}$ is total, and $J$ is continuous on $U$. Then $u_0^*, u_1^*, \ldots$ is a minimizing sequence for $J$ on $U$.

Proof. Essentially the same as the proof of Theorem 4.45. Let $J^* = \inf_U J(u)$ and let $\epsilon > 0$. Then there is a $\tilde{u} \in U$ such that $J(\tilde{u}) < J^* + \epsilon$. By continuity, there is a $\delta > 0$ such that $\|u - \tilde{u}\| < \delta$, $u \in U$ imply $|J(u) - J(\tilde{u})| < \epsilon$. Since $\tilde{u} \in U$ and $\{\psi_n\}$ is total, there is, for some $k$, a $\tilde{u}_k$ with $\|\tilde{u}_k - \tilde{u}\| < \delta$. Then $|J(\tilde{u}_k) - J(\tilde{u})| < \epsilon$ and, by assumption, $J(u_k^*) \le J(\tilde{u}_k) < J^* + 2\epsilon$ and $\lim_{k\to\infty} J(u_k^*) = J^*$. □

Let $X$, $Y$ be Banach spaces and $g$ be a map of $X$ into $Y$.

Definition 5.35. An element $x$ in $X$ with $x \in \operatorname{dom}(g)$ is regular for $g$ if: (1) $dg(x, \cdot) = g'(x)$ exists; (2) the kernel of $g'(x)$, $K_x(g) = \{h : g'(x)h = 0_Y\}$, has a topological supplement $K_x^\perp(g)$ in $X$ (the notation is suggestive); (3) $g'(x)$ when restricted to $K_x^\perp(g)$ is a linear homeomorphism between $K_x^\perp(g)$ and $Y$.

Definition 5.36. Let $S(g) = \{x : g(x) = 0_Y\}$. $S(g)$ is smooth at $x_0 \in S(g)$ if $x_0$ is regular for $g$. $S(g)$ is a smooth variety if it is smooth at all its points.

Definition 5.37. An element $x$ is admissible for $g$ if $dg(\cdot, \cdot) = g'(\cdot)$ is continuous on a neighborhood $\tilde{N} + x$ of $x$ (viewed as a map of $X$ into $L(X, Y)$). If $x \in S(g)$, then $x$ is admissible for $S(g)$ if $x$ is admissible for $g$ and $(\tilde{N} + x) \cap S(g)$ consists of regular points of $g$.

These notions express some requirements of the Implicit Function Theorem.

38

5 Minimization of Functionals: Uniformly Convex Spaces

Theorem 5.38 (Lagrange Multipliers). Let J : X → R. Suppose that x∗ is a local minimum of J on the smooth variety S(g). If dJ(x∗ , · ) exists and noting that x∗ is admissible for g, then dJ(x∗ , · ) is an element of the annihilator of Kx∗ (g). Proof. Let h be an element of Kx∗ (g) and let [h] be the subspace of X spanned by [h]. Let Σ = [h] ⊕ Kx⊥∗ (g) (5.39) and consider the mapping Γ : Σ → Y given by Γ (ah, k) = g(x∗ + ah + k)

(5.40)

for $a \in \mathbb R$, $k \in K_{x^*}^\perp(g)$. Since $x^*$ is admissible for $g$, it follows from the Implicit Function Theorem that there are (i) an interval $U \subset \mathbb R$ with $0 \in \operatorname{int}(U)$; (ii) a neighborhood $\tilde N$ of $0_X$; and (iii) a map $\psi : U \to K_{x^*}^\perp(g)$ such that $\psi(0) = 0_X$, $ah + \psi(a) \in \tilde N$, and $\Gamma(ah, \psi(a)) = 0_Y$ for all $a$ in $U$. Moreover, $\psi$ is differentiable at $0$ and
$$d\psi(0, a) = -\varphi[g'(x^*)h]a \tag{5.41}$$
where $\varphi$ is the inverse of the restriction of $g'(x^*)$ to $K_{x^*}^\perp(g)$. But $h \in K_{x^*}(g)$, the kernel of $g'(x^*)$, and so $g'(x^*)h = 0$ and $d\psi(0, a) = 0_X$. Hence,
$$\psi(a) = \psi(0) + d\psi(0, a) + o(a) = o(a) \tag{5.42}$$
for all $a \in U$. Since $dJ(x^*, \cdot)$ exists,
$$J(x^* + ah + \psi(a)) - J(x^*) = dJ(x^*, ah) + dJ(x^*, \psi(a)) + o(a) \tag{5.43}$$
for all $a \in U$. As $dJ(x^*, \cdot)$ is a continuous linear functional on $X$ and, hence, on $K_{x^*}^\perp(g)$, there is an $M > 0$ with $|dJ(x^*, k)| \le M\|k\|$. Since $\psi(a) = o(a)$, it follows that $dJ(x^*, \psi(a)) = o(a)$ and so, by (5.43),
$$J(x^* + ah + \psi(a)) - J(x^*) = a\,dJ(x^*, h) + o(a)$$
for all $a \in U$. Since $x^* + ah + \psi(a)$ is in $S(g)$, since $x^*$ is a local minimum of $J$ on $S(g)$, and since $a$ may be positive or negative as $0 \in \operatorname{int}(U)$, we conclude that $dJ(x^*, h) = 0$. □

Corollary 5.44. If $X = H$, a Hilbert space, then $dJ(x^*, \cdot) = \langle \nabla J(x^*), \cdot \rangle$ with $\nabla J(x^*)$ orthogonal to $K_{x^*}(g)$. Thus, if $\{e_\beta\}$ is an orthonormal basis of $K_{x^*}^\perp(g)$, then $\nabla J(x^*) = \sum_\beta \lambda_\beta e_\beta$.
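Corollary 5.44 can be checked numerically in a simple finite-dimensional setting. The sketch below is only an illustration: the choices $X = \mathbb R^3$, $g(x) = \|x\|^2 - 1$ (so that $S(g)$ is the unit sphere), $J(x) = \|x - c\|^2$, and all function names are assumptions made for this example and do not come from the text. At the minimizer $x^* = c/\|c\|$ the gradient of $J$ should have no component in $K_{x^*}(g)$ and should be a multiple of $\nabla g(x^*)$.

```python
import numpy as np

def grad_J(x, c):
    """Gradient of the illustrative functional J(x) = ||x - c||^2."""
    return 2.0 * (x - c)

def grad_g(x):
    """Gradient of g(x) = ||x||^2 - 1, so that g'(x)h = <grad_g(x), h>."""
    return 2.0 * x

def kernel_projector(x):
    """Orthogonal projector onto K_x(g) = {h : <grad_g(x), h> = 0}."""
    n = grad_g(x)
    n = n / np.linalg.norm(n)
    return np.eye(len(x)) - np.outer(n, n)

c = np.array([1.0, 2.0, 2.0])      # hypothetical target point with ||c|| = 3
x_star = c / np.linalg.norm(c)     # closest point of the unit sphere to c

# Component of grad J(x*) lying in the kernel of g'(x*); it should vanish.
tangential = kernel_projector(x_star) @ grad_J(x_star, c)

# Multiplier lam with grad J(x*) = lam * grad g(x*).
gg = grad_g(x_star)
lam = (grad_J(x_star, c) @ gg) / (gg @ gg)
print(np.linalg.norm(tangential), lam)
```

Here $K_{x^*}^\perp(g)$ is one-dimensional, so the sum in the corollary reduces to a single multiplier.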


Exercise. Interpret Corollary 5.44 in finite dimensions. Exercise. Interpret Corollary 5.44 for Sobolev gradients.


Chapter 6

Minimization of Functionals: Hilbert Space, 2

Let $H$ be a Hilbert space with inner product $\langle\ ,\ \rangle$. Let $L : H \to H$ be a linear map with $D(L)$ dense in $H$. For example, $L$ may be a differential operator. We want to consider equations of the form $Lw = u$. Let $Q_L(w, v)$ be given by
$$Q_L(w, v) = \langle Lw, Lv\rangle \tag{6.1}$$
for $w, v \in D(L)$ and note that $Q_L$ is positive. Let $J(u)(v)$ be given by
$$J(u)(v) = \tfrac{1}{2} Q_L(v, v) - \langle u, Lv\rangle \tag{6.2}$$

for $v \in D(L)$. Just as in Chapter 2, consider the problems:

Problem 6.3. Given $u$, find $w$ such that $Q_L(w, v) = \langle u, Lv\rangle$ for all $v \in D(L)$.

Problem 6.4. Find $w$ such that $J(u)(w) = \inf_{v \in D(L)} J(u)(v)$.

We saw in Chapter 2 that these problems are equivalent in that $w$ is a solution of 6.3 if and only if $w$ is a solution of 6.4. We also note that if $Lw = u$, then $w$ solves both these problems but the converse does not always hold.

Problem 6.5. Find the critical points of $J(u)(\cdot)$, i.e., the points where the gradient $\nabla J(u)(\cdot)$ is zero.

Problem 6.6. Find the points $w$ where $\|Lw - u\|^2$ is a minimum (mean square approximation or least squares). Clearly $Lw = u$ if and only if $w$ is a critical point of $J(u)$.

Problem 6.7. Find the points where $\|Lw - u\|^2 = 0$ (or where $\|Lw - u\|^p/p = 0$).

Example 6.8. Let $\psi(x) = \|Lx - u\|^2/2$ and let $Q = L^*L$ (which is positive and symmetric). Assume $Q$ is coercive, i.e., $\langle x, Qy\rangle = \langle Qx, y\rangle$ and also $\langle x, Qx\rangle \ge M\|x\|^2$, $M > 0$. Set $\|x\|_Q^2 = \langle x, Qx\rangle$. Then $\|\cdot\|$ and

© Springer Science+Business Media, LLC, part of Springer Nature 2019 P. Falb, Direct Methods in Control Problems, https://doi.org/10.1007/978-0-8176-4723-0_6


$\|\cdot\|_Q$ are equivalent norms and $\psi$ is differentiable in either. Hence $\psi'(x)h = [\nabla_Q\psi(x), h]_Q = \langle\nabla_Q\psi(x), Qh\rangle = \langle\nabla\psi(x), h\rangle$ so that $Q\nabla_Q\psi(x) = \nabla\psi(x)$ and $\nabla_Q\psi(x) = Q^{-1}\nabla\psi(x)$.

Now let us consider a map $L : H \to K$ where $K$ is another Hilbert space. We want to solve the equation $Lw = u$ for a given $u \in K$.

Proposition 6.9 ([E-3]). If there is a $w$ with $Lw = u$, let $\psi(v) = \|Lv - u\|^2/2$, let $w_0 \in H$ and let $z : [0, \infty) \to H$ be such that
$$\dot z(t) = -\nabla\psi(z(t)), \quad z(0) = w_0, \quad t \ge 0, \tag{6.10}$$

then $w^* = \lim_{t\to\infty} z(t)$ exists and $Lw^* = u$.

Proof. Observe that $\psi'(v)h = \langle Lh, Lv - u\rangle_K = \langle h, L^*(Lv - u)\rangle_H$. Hence $z(0) = w_0$ and
$$\dot z(t) = -L^*Lz(t) + L^*u, \quad t \ge 0. \tag{6.11}$$
It follows that
$$z(t) = e^{-tL^*L}w_0 + \int_0^t e^{-(t-s)L^*L}L^*u\,ds \tag{6.12}$$
and, since there is a $w$ with $Lw = u$,
$$z(t) = e^{-tL^*L}w_0 + w - e^{-tL^*L}w. \tag{6.13}$$
But $L^*L$ is positive and symmetric so that $e^{-tL^*L}$ converges in norm to the projection $E_L$ onto $\operatorname{Ker} L$. Then $w^* = \lim_{t\to\infty} z(t) = E_Lw_0 + w - E_Lw$ and $Lw^* = LE_Lw_0 + Lw - LE_Lw = Lw = u$ (as $LE_L = 0$). □

Corollary 6.14. limt→∞ z(t) = u = E(Im L)⊥ u.

Corollary 6.15. If $w^* = \lim_{t\to\infty} z(t)$, then $\|w^* - z(t)\| < \|w - z(t)\|$, $t \ge 0$, where $Lw = u$ and $w \ne w^*$ [E-3].

In effect the propositions say that if there is a solution, then the gradient integration method will converge to it.

Proposition 6.16. If $\psi \in C^2(H, \mathbb R)$ and $\psi \ge 0$ (e.g., $\psi(x) = \|Lx - u\|^p/p$) and $w_0 \in H$, then there is a unique $z : [0, \infty) \to H$ such that $z(0) = w_0$ and $\dot z(t) = -\nabla\psi(z(t))$, $t \ge 0$.

Proof. Simply a special case of Theorem 5.13. □
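The gradient integration of Proposition 6.9 can be illustrated in finite dimensions. The following is a minimal sketch, assuming $H = \mathbb R^3$, $K = \mathbb R^2$, a matrix $L$ with nontrivial kernel, and an explicit Euler discretization of (6.11); the function name, the step size rule and the test data are assumptions for the example, not part of the text. The computed limit is compared with the formula $w^* = E_Lw_0 + w - E_Lw$ from the proof.

```python
import numpy as np

def gradient_flow(L, u, w0, step=None, iters=2000):
    """Explicit Euler steps for z' = -L^T (L z - u), z(0) = w0."""
    L = np.asarray(L, dtype=float)
    if step is None:
        # Any step below 2 / ||L||^2 keeps the iteration stable.
        step = 1.0 / np.linalg.norm(L, 2) ** 2
    z = np.array(w0, dtype=float)
    for _ in range(iters):
        z = z - step * (L.T @ (L @ z - u))
    return z

L = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])        # 2 x 3, so Ker L is one-dimensional
w = np.array([1.0, 2.0, 3.0])
u = L @ w                              # Lw = u is solvable by construction
w0 = np.array([0.5, -1.0, 2.0])

z_inf = gradient_flow(L, u, w0)

# Orthogonal projection onto Ker L and the limit predicted by the proof.
E_L = np.eye(3) - np.linalg.pinv(L) @ L
w_star = E_L @ w0 + w - E_L @ w
print(np.linalg.norm(L @ z_inf - u), np.linalg.norm(z_inf - w_star))
```

The component of $w_0$ in $\operatorname{Ker} L$ is left untouched by the iteration, which is why the limit depends on the starting point when the kernel is nontrivial.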

If we let Tψ (s)h = z(s), then Tψ (t)Tψ (s) = Tψ (t + s) for t, s ≥ 0, so the solution is represented by a one-parameter semigroup. We shall in the sequel extend and apply these results in control problems. We now turn our attention to representation methods which work especially well in Hilbert spaces. Let L : H → H be a densely defined symmetric positive operator and consider the equation

$$Lw = u, \quad w \in D(L). \tag{6.17}$$

Proposition 6.18. There is at most one solution of (6.17) (in other words, if a solution exists, it is unique).

Proof. If $w_1, w_2$ are solutions, then $w_1 - w_2 \in D(L)$ and $L(w_1 - w_2) = 0$. Hence $\langle L(w_1 - w_2), w_1 - w_2\rangle = 0$ and $w_1 - w_2 = 0$ as $L$ is positive (i.e., $\langle Lh, h\rangle > 0$ for all $h \in D(L)$, $h \ne 0$). □

Let $J(u)(v)$ be given by
$$J(u)(v) = \langle Lv, v\rangle - \langle u, v\rangle - \langle v, u\rangle \tag{6.19}$$

for $v \in D(L)$. Then:

Theorem 6.20. If $w$ is a solution, then $w$ is the unique element of $D(L)$ which minimizes $J(u)(v)$. Conversely, if $w \in D(L)$ minimizes $J(u)(v)$, then $Lw = u$.

Proof. Suppose $Lw = u$ and $v \in D(L)$, $v \ne w$. Set $w - v = z$. Then
$$J(u)(w) - J(u)(v) = \langle Lz, w\rangle + \langle Lw, z\rangle - \langle Lz, z\rangle - \langle u, z\rangle - \langle z, u\rangle = \langle Lw - u, z\rangle + \langle z, Lw - u\rangle - \langle Lz, z\rangle = -\langle Lz, z\rangle.$$
Since $z \ne 0$, $\langle Lz, z\rangle > 0$ and so $J(u)(v) - J(u)(w) > 0$. Now suppose $w$ minimizes $J(u)(v)$ over $D(L)$. Then
$$J(u)(w) \le J(u)(w + \epsilon v) \tag{6.21}$$
for $v \in D(L)$ and $\epsilon$ in $\mathbb R$. For fixed $v$, the function of $\epsilon$ has a minimum at $\epsilon = 0$, and
$$\frac{d}{d\epsilon} J(u)[w + \epsilon v]\Big|_{\epsilon = 0} = 0. \tag{6.22}$$
Computing, we see that $\langle Lw - u, v\rangle + \langle v, Lw - u\rangle = 0$ and, for $iv$, $-\langle Lw - u, v\rangle + \langle v, Lw - u\rangle = 0$ for all $v \in D(L)$. It follows that $\langle Lw - u, v\rangle = 0$ for all $v \in D(L)$. Since $D(L)$ is dense, $Lw = u$. □
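The identity $J(u)(v) - J(u)(w) = \langle Lz, z\rangle$ used in the proof can be checked numerically. The sketch below assumes $H = \mathbb C^4$ and a randomly generated Hermitian positive definite matrix in the role of $L$; the names and data are hypothetical and serve only to illustrate the statement.

```python
import numpy as np

def J(L, u, v):
    """Energy functional (6.19): <Lv, v> - <u, v> - <v, u> (real for Hermitian L)."""
    return (np.vdot(v, L @ v) - np.vdot(v, u) - np.vdot(u, v)).real

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
L = A.conj().T @ A + np.eye(4)          # Hermitian and positive definite
u = rng.standard_normal(4) + 1j * rng.standard_normal(4)
w = np.linalg.solve(L, u)               # the solution of Lw = u

# An arbitrary competitor v != w and the difference z = w - v.
v = w + rng.standard_normal(4) + 1j * rng.standard_normal(4)
z = w - v

gap = J(L, u, v) - J(L, u, w)           # should equal <Lz, z> > 0
energy = np.vdot(z, L @ z).real
print(gap, energy)
```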

Suppose that Vn is an n-dimensional subspace of D(L) and let En be the projection onto Vn . If v ∈ Vn , then En v = v and

$$J(u)(v) = \langle Lv, E_nv\rangle - \langle u, E_nv\rangle - \langle E_nv, u\rangle = \langle E_nLv, v\rangle - \langle E_nu, v\rangle - \langle v, E_nu\rangle,$$
since $E_n$ is self-adjoint. The operator $E_nL$ may be viewed as an operator on $V_n$. It is symmetric and positive. Minimizing $J(u)(v)$ for $v \in V_n$ is equivalent to solving the $n$-dimensional equation
$$E_nLw_n = E_nu, \quad w_n \in V_n. \tag{6.23}$$
Note that this is simply projecting the equation $Lw = u$ onto $V_n$, which can be done as an approach even if $L$ is nonlinear or not positive (the Ritz–Galerkin method). Equation (6.23) represents a set of $n$ equations in $n$ unknowns. If $v_1, \ldots, v_n$ is a basis of $V_n$, then
$$w_n = \sum_{j=1}^n \alpha_jv_j$$
and (6.23) can be written as
$$\sum_{i=1}^n \alpha_i\langle Lv_i, v_j\rangle = \langle u, v_j\rangle \tag{6.24}$$
for $j = 1, \ldots, n$. If $\langle Lv_i, v_j\rangle = \delta_{ij}$, then (6.24) simplifies to $\alpha_i = \langle u, v_i\rangle$, and
$$w_n = \sum_{j=1}^n \langle u, v_j\rangle v_j.$$
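The system (6.24) is easy to set up once a basis is fixed. The following sketch assumes a real finite-dimensional stand-in for $H$ and a symmetric positive definite matrix for $L$ (a tridiagonal second-difference matrix); the function name `ritz_galerkin` and the data are illustrative assumptions, not the author's construction.

```python
import numpy as np

def ritz_galerkin(L, u, V):
    """Solve sum_i alpha_i <L v_i, v_j> = <u, v_j>, j = 1..n, as in (6.24).

    L : (m, m) symmetric positive definite matrix
    u : (m,) right-hand side
    V : (m, n) matrix whose columns v_1, ..., v_n span V_n
    Returns the coefficients alpha and w_n = sum_j alpha_j v_j.
    """
    G = V.T @ L @ V              # G[j, i] = <L v_i, v_j>
    rhs = V.T @ u                # rhs[j] = <u, v_j>
    alpha = np.linalg.solve(G, rhs)
    return alpha, V @ alpha

m = 5
L = 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)   # SPD tridiagonal matrix
u = np.ones(m)

V2 = np.eye(m)[:, :2]                    # two-dimensional subspace V_2
alpha2, w2 = ritz_galerkin(L, u, V2)
residual2 = L @ w2 - u                   # orthogonal to V_2, cf. (6.23)

_, w_full = ritz_galerkin(L, u, np.eye(m))   # V_n = whole space
print(V2.T @ residual2, np.linalg.norm(L @ w_full - u))
```

The residual of the projected problem is orthogonal to $V_n$, which is the content of (6.23); when $V_n$ is the whole space the exact solution is recovered.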

Now, let $[\ ,\ ]_L$ be defined by
$$[x, y]_L = \langle Lx, y\rangle, \quad x, y \in D(L). \tag{6.25}$$
This is an inner product on $D(L)$ (since $L$ is positive) and defines a norm on $D(L)$ by
$$\|w\|_L = [w, w]_L^{1/2} = \langle Lw, w\rangle^{1/2}. \tag{6.26}$$
If $v_1, \ldots, v_k, \ldots$ satisfies the conditions
$$[v_i, v_j]_L = \delta_{ij}, \quad \sum_{j=1}^\infty [w, v_j]_Lv_j = w \tag{6.27}$$
for all $i, j$ and $w \in D(L)$, then we say that $\{v_j\}$ is a complete orthonormal set in $L$-energy. Suppose $\{v_j\}$ is such a set and let $w = \sum_{j=1}^\infty \langle u, v_j\rangle v_j$. Then the series converges in $\|\cdot\|_L$ and $Lw = u$. However, the series need not converge in $\|\cdot\|$ (i.e., in $H$). If $L$ is coercive, then the series converges in $\|\cdot\|$ as well, since $\|x\|_L^2 \ge m\|x\|^2$ for some $m > 0$. The generation of the solution

(or minimizing sequence) by this method for symmetric positive operators is often referred to as an energy method.

Now, let $L : H \to H$ be a densely defined symmetric operator (i.e., $\langle Lh_1, h_2\rangle = \langle h_1, Lh_2\rangle$ for $h_1, h_2 \in D(L)$) but not necessarily positive. Consider the equation
$$Lw = u, \quad w \in D(L), \tag{6.28}$$
and let
$$J(u)(v) = \langle Lv, v\rangle - \langle u, v\rangle - \langle v, u\rangle \tag{6.29}$$
for $v \in D(L)$.

Proposition 6.30. If $Lw = u$, then $w$ is a stationary point of $J(u)(\cdot)$. Conversely, if $w \in D(L)$ is a stationary point of $J(u)(\cdot)$, then $Lw = u$.

Proof. Suppose $Lw = u$ and $v \in D(L)$, then
$$\frac{d}{d\epsilon} J(u)(w + \epsilon v)\Big|_{\epsilon = 0} = \langle Lw, v\rangle + \langle Lv, w\rangle - \langle u, v\rangle - \langle v, u\rangle = \langle Lw - u, v\rangle + \langle v, Lw - u\rangle = 0$$
by symmetry. In other words, $w$ is a stationary point. On the other hand, if $w$ is a stationary point, then
$$0 = \frac{d}{d\epsilon} J(u)(w + \epsilon v)\Big|_{\epsilon = 0} = \langle Lw - u, v\rangle + \langle v, Lw - u\rangle. \tag{6.31}$$

If $v \in D(L)$, then $iv \in D(L)$ and so $0 = -\langle Lw - u, v\rangle + \langle v, Lw - u\rangle$ and so $\langle v, Lw - u\rangle = 0$ for all $v \in D(L)$. But $D(L)$ is dense so $Lw - u = 0$. □

So if $L$ is simply symmetric, then solutions of $Lw = u$ correspond to stationary points of $J(u)$. Now suppose that $L : H \to H$ is densely defined. Then, by Remark B.18, $L^* : H \to H$ is also densely defined. Consider the equations
$$Lw = u, \quad L^*z = \tilde u, \tag{6.32}$$
with $w \in D(L)$, $z \in D(L^*)$. Consider the functional
$$J(u, \tilde u)(w, z) = \langle Lw, z\rangle - \langle w, \tilde u\rangle - \langle u, z\rangle \tag{6.33}$$

with $w \in D(L)$, $z \in D(L^*)$.

Proposition 6.34. If $Lw = u$ and $L^*z = \tilde u$, then $(w, z)$ is a stationary point of $J(u, \tilde u)(\cdot, \cdot)$. Conversely, if $(w, z)$ is a stationary point, then $Lw = u$, $L^*z = \tilde u$.

Proof. Suppose $Lw = u$, $L^*z = \tilde u$ and let $v \in D(L)$, $x \in D(L^*)$. Then
$$\frac{dJ}{dt}(w + tv, z)\Big|_{t=0} = \langle Lv, z\rangle - \langle v, \tilde u\rangle = \langle v, L^*z - \tilde u\rangle = 0, \qquad \frac{dJ}{dt}(w, z + tx)\Big|_{t=0} = \langle Lw, x\rangle - \langle u, x\rangle = \langle Lw - u, x\rangle = 0, \tag{6.35}$$

and $(w, z)$ is a stationary point. Conversely, if $(w, z)$ is a stationary point, then $\langle v, L^*z - \tilde u\rangle = 0$ for all $v$ in $D(L)$ and $\langle Lw - u, x\rangle = 0$ for all $x \in D(L^*)$, and $Lw = u$, $L^*z = \tilde u$ by density. □

In other words, in the general case, the variational problem involves both $L$ and $L^*$. Of course, in any of these cases, we can project on a sequence of finite-dimensional subspaces to obtain approximate solutions which may converge to a solution.

Example 6.36. Let $H = L^2([0, 1])$ and let $L = -D_t^2$ with $D(L) = C^2([0, 1])$ so that $L$ is densely defined. Now,
$$\langle Lu, v\rangle - \langle u, Lv\rangle = -\int_0^1 [vD_t^2u - uD_t^2v]\,dt = -\int_0^1 D_t[vD_tu - uD_tv]\,dt = -[(vD_tu - uD_tv)]_0^1.$$
If, for instance, $u = t^2$, $v = t$, then $\langle Lu, v\rangle - \langle u, Lv\rangle = -1$, so $L$ is not symmetric; moreover $\operatorname{Ker} L \ne \{0\}$, since $L$ annihilates $1$ and $t$.

Example 6.37. Let $D \subset \mathbb R^m$ be a bounded domain and let $\Gamma = \partial D$. Let
$$L = -\nabla^2 = -\sum_{i=1}^m D_{x_i}^2 = -\sum_{i=1}^m \frac{\partial^2}{\partial x_i^2}$$

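and let $D(L) = \{u : u \in C^2(D),\ u(\Gamma) = 0\} \subset H = L^2(D)$. By Green's theorem
$$\int_D [u\nabla^2v - v\nabla^2u]\,dx = \int_\Gamma \Big[u\frac{\partial v}{\partial n} - v\frac{\partial u}{\partial n}\Big]\,d\gamma \tag{6.38}$$
where $\partial/\partial n$ is the exterior normal derivative. Clearly then $L$ is symmetric. By another version of Green's theorem,
$$-\int_D u\nabla^2v\,dx = \int_D \langle\nabla u, \nabla v\rangle\,dx - \int_\Gamma u\frac{\partial v}{\partial n}\,d\gamma, \tag{6.39}$$
and so for $u \in D(L)$,
$$\langle Lu, u\rangle = \langle -\nabla^2u, u\rangle = \int_D \langle\nabla u, \nabla u\rangle\,dx = \int_D \|\nabla u\|^2\,dx \ge 0, \tag{6.40}$$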

and, if $\langle -\nabla^2u, u\rangle = 0$, then $\nabla u \equiv 0$ so $u$ is constant and $u(\Gamma) = 0$ means $u \equiv 0$. In other words, $L$ is positive definite. Now, so far, we have shown that
$$\langle -\nabla^2u, u\rangle = \int_D \sum_{j=1}^m \Big(\frac{\partial u}{\partial x_j}\Big)^2 dx \tag{6.41}$$
for $u \in D(L)$. We now want to establish an inequality due to Friedrichs [B-5]:
$$\int_D \sum_{j=1}^m \Big(\frac{\partial u}{\partial x_j}\Big)^2 dx \ge \lambda\int_D u(x)^2\,dx = \lambda\|u\|^2 \tag{6.42}$$
for some $\lambda > 0$, all $u \in D(L)$. In other words, $L$ is coercive. Since $D$ is bounded, we can enclose $D$ in an $m$-cube $C$ with edges of length $\alpha > 0$ along the coordinate axes. If $u \in D(L)$, then we extend $u$ by zero outside $D$. Let $(x_1, y_2, \ldots, y_m) \in C$. Then
$$\int_0^{x_1} \frac{\partial u}{\partial x_1}(x_1, y_2, \ldots, y_m)\,dx_1 = u(x_1, y_2, \ldots, y_m) - u(0, y_2, \ldots, y_m) \tag{6.43}$$
and, since $(0, y_2, \ldots, y_m)$ lies on the boundary of the $m$-cube, $u(0, y_2, \ldots, y_m) = 0$. Applying the Cauchy–Schwarz inequality,
$$u^2(x_1, y_2, \ldots, y_m) \le x_1\int_0^{x_1} \Big[\frac{\partial u}{\partial x_1}\Big]^2 dx_1 \le \alpha\int_0^\alpha \Big[\frac{\partial u}{\partial x_1}\Big]^2 dx_1. \tag{6.44}$$
Integrating this inequality, we obtain (6.42) for a suitable $\lambda$. The condition $u(\Gamma) = 0$ is called a Dirichlet condition.

Example 6.45. Let $D \subset \mathbb R^m$ be a bounded domain and let $\Gamma = \partial D$. Let $L = -\nabla^2$ and let $D(L) = \{u : u \in C^2(D),\ \partial u/\partial n(\Gamma) = 0\} \subset H = L^2(D)$. In view of (6.38), $L$ is symmetric with this domain. However, while $\langle -\nabla^2u, u\rangle = \int_D \|\nabla u\|^2\,dx \ge 0$ for $u \in D(L)$, the vanishing of $\langle -\nabla^2u, u\rangle$ does not imply $u \equiv 0$ as, for example, $u \equiv 1$ is in $D(L)$. Let now $D_1(L) = \{u : u \in C^2(D),\ \partial u/\partial n(\Gamma) = 0 \text{ and } \int_D u(x)\,dx = 0\}$. On $D_1(L)$, $L$ is positive since, if $\langle -\nabla^2u, u\rangle = 0$, then $u$ is a constant (as $\nabla u = 0$) and $\int_D u\,dx = 0$ implies $u \equiv 0$.

Example 6.46 (Incomplete). Let $D \subset \mathbb R^m$ be a bounded domain and let $\Gamma = \partial D$. Let $L = \partial/\partial t - \nabla^2$ and let $D(L) = \{\psi : \psi \in C_c^2([0, \infty), L^2(D))$ (compact support), $\psi(t, \Gamma) = 0$, $t > 0$, and $\psi(t, x) = \psi_1(t)\psi_2(x)$ with $\psi_1(0) = -1\}$. Then $L$ is densely defined and $L^* = -\partial/\partial t - \nabla^2$. We observe that

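$$\begin{aligned}
\langle\theta, L\psi\rangle &= \int_0^\infty\!\!\int_D [\theta, L\psi]\,dx\,dt \\
&= \int_0^\infty\!\!\int_D [\theta, D_t\psi]\,dx\,dt + \int_0^\infty\!\!\int_D [\nabla\theta, \nabla\psi]\,dx\,dt - \int_T\!\int_\Gamma \Big[\theta, \frac{\partial\psi}{\partial n}\Big]\,d\gamma\,dt \\
&= \int_0^\infty\!\!\int_D [\theta, D_t\psi]\,dx\,dt + \int_0^\infty\!\!\int_D [\nabla\theta, \nabla\psi]\,dx\,dt
\end{aligned}$$
for $\theta, \psi \in D(L)$ (by Green's theorem). It follows, by a similar calculation for $L^*$, that
$$\begin{aligned}
\langle\theta, L\psi\rangle - \langle\psi, L^*\theta\rangle &= \int_0^\infty\!\!\int_D ([\theta, D_t\psi] + [D_t\theta, \psi])\,dx\,dt + \int_0^\infty\!\!\int_D ([\nabla\theta, \nabla\psi] - [\nabla\psi, \nabla\theta])\,dx\,dt \\
&= \int_0^\infty ([\theta_1, D_t\psi_1] + [D_t\theta_1, \psi_1])\,dt \int_D [\theta_2, \psi_2]\,dx \\
&= [\theta_1, \psi_1]\Big|_0^\infty \int_D [\theta_2, \psi_2]\,dx \\
&= -\int_D [\theta_2, \psi_2]\,dx.
\end{aligned}$$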

If we want to solve the problem $L\psi = 0$ with $\psi(0, x) = \psi_0(x)$, $\psi(t, \Gamma) = 0$, $t > 0$, then, using "separation of variables", $\psi(t, x) = \psi_1(t)\psi_2(x)$ with $\psi_1(0) = -1$ and $\psi_2(x) = -\psi_0(x)$. In other words, $\langle\psi, L\psi\rangle - \langle\psi, L^*\psi\rangle = \|\psi_0(x)\|^2$.

Now, let $\Lambda : H \to H$ be a densely defined map ($\Lambda$ is not necessarily linear) and consider the equation
$$\Lambda\psi = u, \quad \psi \in D(\Lambda). \tag{6.47}$$
A well-known approach to attempting to solve (6.47) is the "method of least squares." Let $v_1, \ldots, v_n$ be independent elements of $D(\Lambda)$ and let $V_n = \operatorname{span}\{\Lambda v_i\}$. Then $\dim V_n \le n$ and we can consider the problem: minimize
$$\Big\|\sum_{j=1}^n \alpha_j\Lambda v_j - u\Big\|^2.$$
If $E_n$ is the projection on $V_n$, then the solution is $E_nu$. Note, however, if $\Lambda$ is not linear, then $E_nu$ need not be an element of $D(\Lambda)$. If $v_1, \ldots, v_k, \ldots$ is an independent total set in $D(\Lambda)$ (such a set exists by density), then $V_n \subset V_{n+1} \subset \cdots$ is nested and, consequently, $E_{n+j}E_n = E_n$ for $j = 1, \ldots, k$, and
$$\min\Big\|\sum_{j=1}^n \alpha_j\Lambda v_j - u\Big\|^2 \ge \min\Big\|\sum_{j=1}^{n+1} \alpha_j\Lambda v_j - u\Big\|^2 \ge \cdots \ge 0.$$

Let $u_n^* = \sum_{j=1}^n \alpha_j^*\Lambda v_j = E_nu$ be the minimizer over $V_n$ and let $\epsilon_n = \|u_n^* - u\|^2$; then $\epsilon_n$ is a monotone decreasing sequence which is bounded below and so $\epsilon_n \downarrow \epsilon^* \ge 0$. However, for a general $\Lambda$, $\epsilon^*$ need not be $0$.

Example 6.48. Let $H = L^2([0, 1])$ and let $D(\Lambda) = C^1([0, 1])$ and define $\Lambda$ by $\Lambda(f) = (f')^2$. Clearly $\Lambda$ is not linear but is densely defined and, for instance, $v_i = t^i$ for $i = 0, 1, \ldots$ is a basis contained in $D(\Lambda)$. However, there are elements $u$ in $L^2([0, 1])$ such that $\min\|\Lambda\psi - u\|^2 > 0$.

Suppose now that $\Lambda = L$ is a densely defined linear operator from $H$ into $H$. Let $J(u)(\psi) = \|L\psi - u\|^2$ and let $\{v_j\}_{j=1}^\infty$ be linearly independent elements of $D(L)$. Let $V_n = \operatorname{span}\{v_1, \ldots, v_n\}$ and let $J_n(u)(\psi_n) = \|L\psi_n - u\|^2$ where $\psi_n \in V_n$.

Definition 6.49. $\{v_j\}_{j=1}^\infty$ is $L$-complete if for given $\psi \in D(L)$ and $\epsilon > 0$, there is an $N$ such that $\|L\psi - L\psi_N\| < \epsilon$ for some $\psi_N \in V_N$.

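Observe that if $L \in \mathcal L(H, H)$, then any basis, in particular any orthonormal basis, is $L$-complete. Consider the problem
$$\text{minimize } J_n(u)(\psi_n) = \|L\psi_n - u\|^2 \tag{6.50}$$
for $\psi_n \in V_n$. Let $\psi_n = \sum_{j=1}^n \alpha_jv_j \in V_n$ (where $\alpha_j \in \mathbb C$). Then
$$\|L\psi_n - u\|^2 = \sum_{j,k=1}^n \alpha_j\bar\alpha_k\langle Lv_j, Lv_k\rangle - \sum_{j=1}^n \alpha_j\langle Lv_j, u\rangle - \sum_{k=1}^n \bar\alpha_k\langle u, Lv_k\rangle + \langle u, u\rangle \tag{6.51}$$
and if $\alpha_j = a_j + ib_j$, then the minimum value requires
$$\frac{\partial\|L\psi_n - u\|^2}{\partial a_j} = 0, \quad \frac{\partial\|L\psi_n - u\|^2}{\partial b_j} = 0, \quad j = 1, \ldots, n,$$
or, equivalently, differentiating (6.51) with respect to $\bar\alpha_k$, we have
$$\sum_{j=1}^n \alpha_j\langle Lv_j, Lv_k\rangle = \langle u, Lv_k\rangle, \quad k = 1, \ldots, n. \tag{6.52}$$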

These are the "least squares" equations. The system is symmetric and finite dimensional, and its determinant is the Gram determinant of the $Lv_j$, $j = 1, \ldots, n$.

Proposition 6.53. If $\operatorname{Ker} L = \{0\}$, then the least squares equations can be solved for any $n$ and the solution $\psi_n^*$ is unique.

Proof. Since $\operatorname{Ker} L = \{0\}$, $L\psi = 0$ has only the zero solution. If
$$0 = \sum_{j=1}^n \gamma_jLv_j = L\Big(\sum_{j=1}^n \gamma_jv_j\Big),$$
then $\gamma_j = 0$ for all $j$ ($v_j$ independent) and so $Lv_1, \ldots, Lv_n$ are independent and the Gram determinant is not zero, so the solution $\psi_n^*$ exists and is unique. □

Proposition 6.54. Suppose $\{v_j\}_{j=1}^\infty$ is $L$-complete, that there is a $\lambda > 0$ such that $\|\psi\| \le \lambda\|L\psi\|$ for all $\psi \in D(L)$, and that there is a solution $\psi^*$ of $L\psi^* = u$. Then there are $\psi_n^*$ which minimize $J_n(u)(\psi_n)$ and $\psi_n^* \to \psi^*$ (in other words, $\psi_n^*$ is a minimizing sequence of $J(u)(\psi)$).

Proof. Since $\|\psi\| \le \lambda\|L\psi\|$, $\operatorname{Ker} L = \{0\}$ and, by Proposition 6.53, there is a sequence $\{\psi_n^*\}_{n=1}^\infty$, $\psi_n^* \in V_n$, and $\psi_n^*$ minimizes $J_n(u)(\psi_n)$ for all $n$. Since $\{v_j\}_{j=1}^\infty$ is $L$-complete, there is an $N$, for given $\epsilon > 0$, such that there is a $\tilde\psi_n \in V_n$ with $\|L\psi^* - L\tilde\psi_n\| < \epsilon/\lambda$ for $n \ge N$. Since $\psi_n^*$ minimizes $\|L\psi_n - u\| = \|L\psi_n - L\psi^*\|$ over $V_n$, we have $\|L(\psi^* - \psi_n^*)\| \le \|L\psi^* - L\tilde\psi_n\| < \epsilon/\lambda$. But $\|\psi^* - \psi_n^*\| \le \lambda\|L(\psi^* - \psi_n^*)\| < \epsilon$ and so $\psi_n^* \to \psi^*$. □
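The least squares equations (6.52) and the monotone behaviour of the residuals can be illustrated with matrices. The sketch below is an assumption-laden toy case: $H = \mathbb R^6$, $L$ an invertible upper-triangular matrix (so $\operatorname{Ker} L = \{0\}$), and $v_j$ the standard basis vectors; the function name and the data are hypothetical.

```python
import numpy as np

def least_squares_step(L, u, V):
    """Solve the normal equations (6.52) on the span of the columns of V."""
    LV = L @ V                       # columns are L v_j
    G = LV.T @ LV                    # Gram matrix <L v_j, L v_k>
    rhs = LV.T @ u                   # right-hand sides <u, L v_k>
    alpha = np.linalg.solve(G, rhs)
    psi_n = V @ alpha
    return psi_n, np.linalg.norm(L @ psi_n - u) ** 2

m = 6
L = np.triu(np.ones((m, m))) + np.eye(m)   # invertible, hence Ker L = {0}
psi_star = np.arange(1.0, m + 1.0)         # chosen solution of L psi = u
u = L @ psi_star

residuals = []
for n in range(1, m + 1):
    V_n = np.eye(m)[:, :n]                 # v_1, ..., v_n = standard basis vectors
    psi_n, eps_n = least_squares_step(L, u, V_n)
    residuals.append(eps_n)

print(residuals)                           # non-increasing, last entry near 0
```

In this finite-dimensional case the basis is trivially $L$-complete, so the final iterate coincides with $\psi^*$ up to rounding.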

If $L$ is densely defined, then its adjoint $L^*$ is also densely defined (Proposition B.14 and Remark B.19). The operator $L^*L$ is symmetric, self-adjoint and positive. It is positive since $\langle L^*Lx, x\rangle = \langle Lx, Lx\rangle = \|Lx\|^2 \ge 0$. Let us now consider the equation
$$L^*L\psi = L^*u, \quad u \in D(L^*). \tag{6.55}$$

Proposition 6.56. If $u \in D(L^*)$ and $L\psi = u$, then $L^*L\psi = L^*u$. Conversely, if $\tilde\psi$ satisfies $L^*L\tilde\psi = L^*u$ and there is a solution $\psi$ of $L\psi = u$, then $L\tilde\psi = L\psi = u$.

Proof. The first part is obvious. On the other hand, if $L^*L\tilde\psi = L^*u$, then $0 = \langle L^*L(\tilde\psi - \psi), (\tilde\psi - \psi)\rangle = \|L(\tilde\psi - \psi)\|^2$ and $L\tilde\psi = L\psi = u$. □

Proposition 6.57. If $\|\psi\| \le \lambda\|L\psi\|$, $\psi \in D(L)$, then $L^*L$ is coercive.

Proof. $\langle L^*L\psi, \psi\rangle = \langle L\psi, L\psi\rangle = \|L\psi\|^2 \ge \|\psi\|^2/\lambda^2$. □

Proposition 6.58. Consider the problems (1) $L\psi = u$ and (2) $L^*L\psi = L^*u$. Under the assumptions of Proposition 6.54, the method of least squares for (1) is equivalent to the energy method ([Lu, v] = [ · , · ]L ) for (2).

Proof. Simply note that solving (2) is equivalent to minimizing
$$\tilde J(u)(\psi) = \langle L^*L\psi, \psi\rangle - \langle\psi, L^*u\rangle - \langle L^*u, \psi\rangle = \langle L\psi - u, L\psi - u\rangle - \langle u, u\rangle = \|L\psi - u\|^2 - \|u\|^2$$
which is, thus, equivalent to minimizing $\|L\psi - u\|^2$. □

Corollary 6.59. If $L$ is coercive, then the energy method for $L\psi = u$ is equivalent to least squares for $\sqrt{L}\,\psi = (\sqrt{L})^{-1}u$ ($\sqrt{L}$ is the unique positive square root of $L$ as in Appendix B).

Example 6.60. Let $D \subset \mathbb R^2$ be a bounded domain with $\partial D = \Gamma$ (smooth). Consider the problem
$$-\nabla^2\psi(x, y) = u(x, y), \quad (x, y) \in D, \quad \psi(\Gamma) = 0, \tag{6.61}$$

and let $L = -\nabla^2$ with $D(L) = \{\psi : \psi \in C^2(D),\ \psi(\Gamma) = 0\} \subset H = L^2(D)$. Let $\{v_j(x, y)\}_{j=1}^\infty$ be, say, an orthonormal independent set with $v_j(\Gamma) = 0$. Let $V_n = \operatorname{span}\{v_j\}_{j=1}^n$ and let $J_n(u)(\psi_n) = \|L\psi_n - u\|^2$ for $\psi_n \in V_n$. Let $\psi_n = \sum_{j=1}^n \alpha_jv_j$; we want $\alpha_1^*, \ldots, \alpha_n^*$ such that $\psi_n^* = \sum_{j=1}^n \alpha_j^*v_j$ minimizes
$$\int_D (-\nabla^2\psi_n - u)^2\,dx\,dy = \|L\psi_n - u\|^2 \tag{6.62}$$
over $\psi_n \in V_n$. In view of the propositions, we have
$$\lim_{n\to\infty} \|-\nabla^2\psi_n^* - u\|^2 = \lim_{n\to\infty} \int_D [-\nabla^2\psi_n^* - u]^2\,dx\,dy = 0.$$
Let $K(x, y; \xi, \eta)$ be the Green's function of the problem which is given by
$$K(x, y; \xi, \eta) = \frac{1}{2\pi}\log\frac{1}{\|(x, y) - (\xi, \eta)\|} = \frac{1}{2\pi}\log\frac{1}{\sqrt{(x - \xi)^2 + (y - \eta)^2}},$$
then
$$\psi^*(x, y) - \psi_n^*(x, y) = \int_D K(x, y; \xi, \eta)[-\nabla^2\psi^* + \nabla^2\psi_n^*]\,d\xi\,d\eta$$
(where $\psi^*$ is the solution of $L\psi = u$) and, by the Schwarz inequality,
$$[\psi^*(x, y) - \psi_n^*(x, y)]^2 \le \int_D K^2(x, y; \xi, \eta)\,d\xi\,d\eta \int_D [\nabla^2\psi_n^* + u]^2\,dx\,dy.$$

Since $\int_D K^2(x, y; \xi, \eta)\,d\xi\,d\eta$ is bounded (logarithmic) and $\int_D [\nabla^2\psi_n^* + u]^2\,dx\,dy = \int_D [-\nabla^2\psi_n^* - u]^2\,dx\,dy \to 0$ as $n \to \infty$, we have $\psi_n^* \to \psi^*$ uniformly in $D$. One can also show that $D_x\psi_n^* \to D_x\psi^*$, $D_y\psi_n^* \to D_y\psi^*$ in $L^2$-norm by an extension of the argument.

Example 6.63. Let $(\Omega, \mathcal B, \mu)$ be a probability space and let $H = L^2(\Omega, \mathbb R)$ with inner product $\langle x, y\rangle = E\{xy\} = \int_\Omega x(\omega)y(\omega)\,d\omega$ ($E$ is the expectation). Suppose that $L\psi = \alpha\psi + \beta$ and that $u$ is an element of $H$. We want to determine $\alpha^*, \beta^*$ so that $\|u - (\alpha\psi + \beta)\|^2 = E\{(u - (\alpha\psi + \beta))^2\}$ is minimized. Consider first the case $\alpha = 0$ (i.e., a constant approximation). Then $E\{(u - \beta)^2\} = E\{u^2\} - 2\beta E\{u\} + \beta^2$ and, differentiating with respect to $\beta$, we have
$$\frac{\partial\|u - \beta\|^2}{\partial\beta} = 2\beta - 2E\{u\} = 0$$
if $\beta = E\{u\}$. Thus $\beta^* = E\{u\}$. Next consider the general case which, by the constant case, reduces to minimizing $\|(u - E\{u\}) - \alpha(\psi - E\{\psi\})\|^2$ over $\alpha$ with $\beta^* = E\{u\} - \alpha E\{\psi\}$. Now
$$\|(u - E\{u\}) - \alpha(\psi - E\{\psi\})\|^2 = E\{(u - E\{u\})^2\} - 2\alpha E\{(u - E\{u\})(\psi - E\{\psi\})\} + \alpha^2E\{(\psi - E\{\psi\})^2\}$$
and so $\alpha^* = \operatorname{cov}(u, \psi)/\operatorname{var}(\psi)$ where $\operatorname{cov}(\cdot, \cdot)$ is the covariance and $\operatorname{var}(\cdot)$ is the variance.

Example 6.64. Let $(\Omega, \mathcal B, \mu)$ be a probability space and let $H$ be a Hilbert space. Consider the Hilbert space $\mathcal H = L^2([0, 1] \times \Omega, H)$ with inner product given by
$$\langle x, y\rangle = \int_\Omega\int_0^1 [x(t, \omega), y(t, \omega)]_H\,dt\,d\omega \tag{6.65}$$

where $[\ ,\ ]_H$ is the inner product in $H$. If we view $\mathcal H$ as $L^2(\Omega, L^2([0, 1], H))$, then
$$\langle x, y\rangle = E\{[x(\omega), y(\omega)]\} \tag{6.66}$$
where $[\ ,\ ]$ is the inner product in $L^2([0, 1], H)$. Let $m(t)(x) = E\{x(t, \omega)\} = \int_\Omega x(t, \omega)\,d\omega$ be the (moving) mean of $x$; let $\sigma(t, s)(x)$ be given by
$$\sigma(t, s)(x) = E\{[x(t, \omega) - m(t)(x), x(s, \omega) - m(s)(x)]_H\}, \tag{6.67}$$
and let $\sigma(t, t)(x)$ be the (moving) variance of $x$; and, let $K(t, s)(x, y)$ be given by
$$K(t, s)(x, y) = E\{[x(t, \omega) - m(t)(x), y(s, \omega) - m(s)(y)]_H\} \tag{6.68}$$
so that $K(t, s)(x, y)$ is the (moving) covariance of $x$ and $y$. Observe that $K(t, t)(x, y) = K(t, t)(y, x)$, and that $\sigma(t, t)(\cdot) \ge 0$. In other words, as operators on $H$, $K(t, s)$ is symmetric and non-negative.

Suppose $L$ is a linear map of $\mathcal H$ into $\mathcal H$ and $u$ is an element of $\mathcal H$. Consider the functional $J(u)(\psi)$ given by $J(u)(\psi) = \|L\psi - u\|^2$ and the "least squares" problem of minimizing $J(u)(\psi)$ as a function of $\psi$. We shall treat a special case in this example. Assume that $m(t)(u) = m$ is constant ($u$ is wide-sense stationary). To begin with, assume that $L = I$ and that $\psi(t, \omega) = \lambda(t) = \lambda$ is a constant sure function. Then, it is easy to see that
$$\|\lambda - u\|^2 = \|\lambda\|^2 + \|u\|^2 - 2\langle\lambda, m(t)(u)\rangle = \|\lambda\|^2 + \|u\|^2 - 2\langle\lambda, m\rangle$$
and, hence, that the minimum is achieved when $\lambda = m$. Now, suppose that $m(t)(u) = m(t)$ and that $L = I$ and $\psi(t, \omega) = \lambda(t)$, still a sure function but

not a constant. Then
$$\|\lambda(t) - u\|^2 = \int_0^1 [\lambda(t), \lambda(t) - 2m(t)]_H\,dt + \int_0^1 [m(t), m(t)]_H\,dt = \int_0^1 [\lambda(t) - m(t), \lambda(t) - m(t)]_H\,dt$$
and so the minimum is achieved when $\lambda(t) = m(t)$. Suppose now that $L\psi = \alpha(t)\psi + \beta(t)$ where $\alpha(t), \beta(t)$ are sure functions and $m(t)(u) = m(t)$. Let us assume further that $u$ and $\psi$ are centered, i.e., that $m(t)(u) = m$ and $m(t)(\psi) = n$ are constant (this assumption is not essential). Then, as before, $\beta^*(t) = m - \alpha(t)n$ and we want to minimize $\|(u - m) - \alpha(\psi - n)\|^2 = J(u)(\alpha)$ over $\alpha(t)$. Then
$$J(u)(\alpha) = \int_0^1 \{\sigma(t, t)(u) - 2\alpha(t)K(t, t)(u, \psi) + \alpha^2(t)\sigma(t, t)(\psi)\}\,dt,$$
and the minimum is achieved for $\alpha(t) = K(t, t)(u, \psi)/\sigma(t, t)(\psi)$. Then $J(u)(\alpha) = \int_0^1 [\sigma(t, t)(u) - K(t, t)(u, \psi)K(t, t)(u, \psi)/\sigma(t, t)(\psi)]\,dt$. If $\psi = u$, then $\alpha = 1$ and $J(u)(\alpha) = 0$.

Example 6.69. Let $(\Omega, \mathcal B, \mu)$ be a probability space and let $\mathcal H = L^2(\Omega, H)$ where $H$ is a Hilbert space. Let $u, n \in \mathcal H$ and $L$ a linear map of $\mathcal H$ into $\mathcal H$. Consider the equation $u = L\psi + n$; we want to find a $\hat\psi$ such that $E\{\|\hat\psi - \psi\|_H^2\}$ is minimized over an appropriate set of $\psi$. Suppose, for simplicity, that $E\{u\} = 0$, $E\{n\} = 0$ and the admissible set of $\psi$, $A$, is $\{\psi = Ku : K$ a linear map of $\mathcal H$ into $\mathcal H\}$. If $L^*L$ is invertible, then $E\{\|\hat\psi - \psi\|^2\} = E\{\|(L^*L)^{-1}L^*u - Ku\|^2\}$ is minimized over $A$ for $K = (L^*L)^{-1}L^*$. Note also that $E\{(L^*L)^{-1}L^*u\} = 0$. Observe also that since $E\{\|\hat\psi - \eta\|^2\} \ge 0$ for any $\eta$, $\hat\psi = (L^*L)^{-1}L^*u$ gives the minimum for all $\eta$. $\hat\psi$ is called a minimum variance estimate. Note also that $(L^*L)^{-1}L^*L = I$.

Example 6.70. Let $(\Omega, \mathcal B, \mu)$ be a probability space and let $T$ be a real interval and $H$ a Hilbert space. Let $\mathcal H = L^2(T \times \Omega, H)$. If $n(t, \omega) \in \mathcal H$, then (Appendix B) $n(t, \omega) \circ n(t, \omega)$ $(= n(t, \omega)n^*(t, \omega))$ is the element of $\mathcal L(H, H)$ given by
$$[n(t, \omega) \circ n(t, \omega)]h = n(t, \omega)[n(t, \omega), h]. \tag{6.71}$$
Note that it is a Hilbert–Schmidt operator and that if $K$ is bounded, then $\|Kn(t, \omega)\|_H^2 = \|Kn(t, \omega)n^*(t, \omega)K^*\|_{HS}$. It follows that, in $\mathcal H$,

$$\|Kn\|^2 = \int_T E\{\|Kn(t, \omega)n^*(t, \omega)K^*\|_{HS}\}\,dt \tag{6.72}$$
$$= \int_T K\,\mathrm{Tr}_{HS}[E\{n \circ n\}]K^*\,dt \tag{6.73}$$
where $\mathrm{Tr}_{HS}$ is the Hilbert–Schmidt trace [D-9]. Consider the equation $u = L\psi + n$ with $L$ a linear map of $\mathcal H$ into $\mathcal H$. We want to find an admissible $\hat\psi$ such that $E\{\|\hat\psi - \psi\|^2\}$ is minimized (i.e., $\|\hat\psi - \psi\|_{\mathcal H}^2$ is minimized). Let $A = \{\tilde\psi : \tilde\psi = Ku,\ KL = I\}$. This is the admissible set. It is easy to see that, over $A$, $E\{\|\hat\psi - \psi\|^2\}$ is $\|Kn\|^2$ and is given by (6.73). If $W = \int_T \mathrm{Tr}_{HS}[E\{n \circ n\}]\,dt$, then the problem reduces to: find $K$ such that $KL = I$ and $KWK^*$ is minimized.

Example 6.74 (Finite Elements). Consider $H^1([0, 1]) = H^1$ with inner product $[x, y] = \int_0^1 \dot x(t)\dot y(t)\,dt$ and the problem
$$-\frac{d^2x}{dt^2} = u, \quad \frac{dx}{dt}\Big|_{t=0} = \alpha, \quad x(1) = \beta. \tag{6.75}$$

Let $Y = \{y \in H^1 : y(1) = \beta\}$ (note that $Y$ is not a subspace) and let $W = \{w \in H^1 : w(1) = 0\}$ ($W$ is a subspace). Then (see (2.12) and (2.13)) the problem (6.75) is equivalent to the variational problem: find $x_u \in H^1$ such that
$$[w, x_u] = \int_0^1 \dot w(t)\dot x_u(t)\,dt = \int_0^1 w(t)u(t)\,dt + w(0)\alpha \tag{6.76}$$
for all $w \in W$. Let $(x, y) = \int_0^1 x(t)y(t)\,dt$ so that (6.76) becomes
$$[w, x_u] = (w, u) + w(0)\alpha. \tag{6.77}$$
Let $Y_h \subset Y$ and $W_h \subset W$. Given $W_h$ and $\gamma_h$ with $\gamma_h(1) = \beta$, we define $Y_h(W_h, \gamma_h) = W_h + \gamma_h$ (a translate of $W_h$). If $y_h \in Y_h = Y_h(W_h, \gamma_h)$, then $y_h = v_h + \gamma_h$, $v_h \in W_h$, and $y_h(1) = v_h(1) + \gamma_h(1) = 0 + \gamma_h(1) = \beta$. Then we can consider the approximate problem: find $v_h^* \in W_h$ such that
$$[w_h, v_h^*] = (w_h, u) + w_h(0)\alpha - [w_h, \gamma_h] \tag{6.78}$$
for all $w_h \in W_h$. Let $\psi_j$, $j = 1, \ldots, n$, be independent elements of $W$ and let $\psi_{n+1} \notin W$ with $\psi_{n+1}(1) = 1$. Set $W_n = \operatorname{span}\{\psi_1, \ldots, \psi_n\}$ and $Y_n = Y_n(W_n, \psi_{n+1}\beta) = W_n + \beta\psi_{n+1}$. Then the approximation problem takes the form
$$\sum_{j=1}^n [\psi_i, \psi_j]a_j = (\psi_i, u) + \psi_i(0)\alpha - [\psi_i, \psi_{n+1}]\beta \tag{6.79}$$
for $i = 1, \ldots, n$. If $K = ([\psi_i, \psi_j])$, $\Phi = ((\psi_i, u) + \psi_i(0)\alpha - [\psi_i, \psi_{n+1}]\beta)$ and $a = (a_j)$, then (6.79) is the matrix equation $Ka = \Phi$. Now let us choose

the functions $\psi_i$. Let $0 = t_1 < t_2 < \cdots < t_{n+1} = 1$ be a partition of $[0, 1]$, $h_j = t_{j+1} - t_j$, and $h = \max(h_j)$ (mesh size). Let
$$\psi_1(t) = \begin{cases} (t_2 - t)/h_1, & t_1 \le t \le t_2, \\ 0, & t > t_2, \end{cases}$$
$$\psi_j(t) = \begin{cases} (t - t_{j-1})/h_{j-1}, & t_{j-1} \le t \le t_j, \\ (t_{j+1} - t)/h_j, & t_j \le t \le t_{j+1}, \\ 0, & \text{elsewhere}, \end{cases} \quad j = 2, \ldots, n,$$
$$\psi_{n+1}(t) = \begin{cases} 0, & t < t_n, \\ (t - t_n)/h_n, & t_n < t \le t_{n+1}. \end{cases}$$
Then $\psi_j$, $j = 1, \ldots, n$, are independent elements of $W$ and $\psi_{n+1}(1) = 1$. These are "piecewise linear finite elements." As an exercise, the reader should solve the approximation problem for, say, $n = 2$ and prove convergence as $n \to \infty$ (cf. [E-3]).
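A small numerical sketch of the system $Ka = \Phi$ with the piecewise linear elements above may help with the exercise. It rests on several assumptions that are not in the text: the boundary condition at $t = 0$ is read as $-\dot x(0) = \alpha$, which is the sign convention under which integration by parts yields the term $+w(0)\alpha$ in (6.76); the last term of (6.79) is taken in the energy bracket, in line with (6.78); the load integrals are approximated by two-point Gauss quadrature; and the names `fem_solve`, `nodes`, and the test data are hypothetical.

```python
import numpy as np

def fem_solve(nodes, u, alpha, beta):
    """Piecewise linear elements for -x'' = u, -x'(0) = alpha, x(1) = beta.

    nodes : increasing array t_1 = 0 < ... < t_{n+1} = 1
    u     : callable load function
    Returns the nodal values of the approximate solution.
    """
    nodes = np.asarray(nodes, dtype=float)
    m = len(nodes)                       # m = n + 1 nodes
    K = np.zeros((m, m))
    F = np.zeros(m)
    g = 1.0 / np.sqrt(3.0)               # two-point Gauss abscissa on [-1, 1]
    for e in range(m - 1):
        a, b = nodes[e], nodes[e + 1]
        h = b - a
        # Element contribution to [psi_i, psi_j] = int psi_i' psi_j' dt.
        K[e:e + 2, e:e + 2] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
        for xi in (-g, g):
            t = 0.5 * (a + b) + 0.5 * h * xi
            F[e] += 0.5 * h * u(t) * (b - t) / h      # (psi_e, u) on this element
            F[e + 1] += 0.5 * h * u(t) * (t - a) / h  # (psi_{e+1}, u) on this element
    F[0] += alpha                        # psi_1(0) * alpha; other psi_i vanish at 0
    n = m - 1
    rhs = F[:n] - K[:n, n] * beta        # subtract [psi_i, psi_{n+1}] * beta
    coeffs = np.linalg.solve(K[:n, :n], rhs)
    return np.append(coeffs, beta)       # nodal value at t = 1 is beta

nodes = np.array([0.0, 0.3, 0.7, 1.0])   # n = 3 unknowns, non-uniform mesh
alpha, beta = 2.0, -1.0
x_h = fem_solve(nodes, lambda t: 1.0, alpha, beta)

# Exact solution of -x'' = 1, -x'(0) = alpha, x(1) = beta for comparison.
x_exact = -0.5 * nodes**2 - alpha * nodes + beta + alpha + 0.5
print(np.max(np.abs(x_h - x_exact)))
```

For this constant load the quadrature is exact and the nodal values agree with the exact solution up to rounding; for a general load only convergence as the mesh is refined should be expected.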

Chapter 7

Dynamical Control Systems

Let T be completely ordered by t0 such that (t1 , ψ(t1 ; t0 , x0 , u)) ∈ S.

© Springer Science+Business Media, LLC, part of Springer Nature 2019 P. Falb, Direct Methods in Control Problems, https://doi.org/10.1007/978-0-8176-4723-0_7


So far the concept is extremely general. The next series of definitions provides for various more concrete restrictions.

Definition 7.3. $\Sigma$ is a discrete time system if $T \subset \mathbb Z$, the integers. $\Sigma$ is a (real) continuous time system if $T$ is an open set in $\mathbb R$. $\Sigma$ is a time-invariant system if $T$ is an additive group, the shift map $\sigma_\tau : \Omega \to \Omega$, $\tau \in T$, given by $\sigma_\tau(u)(t) = u(t + \tau)$ is defined, and (i) $\psi(t; t_0, x, u) = \psi(t + s; t_0 + s, x, \sigma_s(u))$ for all $s \in T$ (shift invariance), and (ii) $\varphi(t, \cdot) : X \to Y$ is independent of $t$.

Definition 7.4. $\Sigma$ is a geometric (Banach) control system if
(i) $\Sigma$ is discrete or continuous time;
(ii) $U, X, Y$ are Banach manifolds;
(iii) $\Omega \subset$ space of bounded maps of $T \to U$; and
(iv) $\psi$ and $\varphi$ are continuous (in appropriate topologies).

A geometric system is linear if $X, Y, \Omega$ are linear spaces and (i) $\psi(t; t_0, x_0, u)$ is linear (affine) in $x$ and (locally) linear in $u$ for all $t, t_0$; and (ii) $\varphi(t, x)$ is linear in $x$ for all $t$ (with $(t, x) \in D_\varphi$).

Remark 7.5. If $T$ is an additive monoid and $E(x, u) = \{t_0 \in T : (t_0, x, u) \in T \times X \times \Omega$ and $\psi(t; t_0, x, u)$ is defined$\}$, then $s(s_1 + s_2; r, u) = s(s_1 + s_2; s_1, u)s(s_1; r, u) = s(s_1 + s_2; s_2, u)s(s_2; r, u)$ for $s_1, s_2, r \in E(x, u)$.

Example 7.6. Let $X, Y, U$ be Banach spaces and $T = [0, \infty)$. Let $\Omega = \{u : u : T \to U$, $u$ regulated, bounded$\}$ and let $B \in \mathcal L(U, X)$, $C \in \mathcal L(X, Y)$. Suppose that $S(t)$, $t \in [0, \infty)$, is a strongly continuous semigroup so that (i) $S(t) \in \mathcal L(X, X)$; (ii) $S(0) = I$; (iii) $S(t + s) = S(t)S(s) = S(s)S(t)$, $s, t \ge 0$; and (iv) $\lim_{t\to 0+} \|S(t)x - x\| = 0$ for all $x$. Define
$$\psi(t; t_0, x_0, u) = S(t - t_0)x_0 + \int_{t_0}^t S(t - \tau)Bu(\tau)\,d\tau$$
for $t \ge t_0$ and $\varphi(t, x) = Cx$ (independent of $t$). Then $s(t; t_0, u)x_0 = \psi(t; t_0, x_0, u)$. Let us check the properties of Definition 7.1. Clearly $\Omega$ is non-empty. Since
$$\psi(t_0; t_0, x_0, u) = S(0)x_0 + \int_{t_0}^{t_0} S(t_0 - \tau)Bu(\tau)\,d\tau = Ix_0 = x_0,$$
DCS2 holds. If $t_0 < s \le t$, then
$$\begin{aligned}
\psi(t; s, \psi(s; t_0, x_0, u), u) &= S(t - s)\psi(s; t_0, x_0, u) + \int_s^t S(t - \tau)Bu(\tau)\,d\tau \\
&= S(t - s)\Big[S(s - t_0)x_0 + \int_{t_0}^s S(s - \tau)Bu(\tau)\,d\tau\Big] + \int_s^t S(t - \tau)Bu(\tau)\,d\tau \\
&= S(t - t_0)x_0 + \int_{t_0}^s S(t - \tau)Bu(\tau)\,d\tau + \int_s^t S(t - \tau)Bu(\tau)\,d\tau \\
&= \psi(t; t_0, x_0, u)
\end{aligned}$$

and so DCS3 is satisfied. DCS4 is an immediate consequence of properties of the integral and DCS5 holds by the definition of $\varphi$. In finite dimensions, $S(t) = e^{At}$ for a matrix $A$.

Example 7.7. Let $X, Y, U$ be Banach spaces and $T = [0, \infty)$. Let $\Omega = \{u : u : T \to U$ regulated, bounded$\}$ and let $B(\cdot) : T \to \mathcal L(U, X)$, $C(\cdot) : T \to \mathcal L(X, Y)$ be regulated. Let $E(\tau) = \{(t, s) : 0 \le s \le t \le \tau\}$ and let $S(t, s) : E(\tau) \to \mathcal L(X, X)$ satisfy (i) $S(s, s) = I$, $(s, s) \in E(\tau)$; (ii) $S(t, r)S(r, s) = S(t, s)$, $0 \le s \le r \le t \le \tau$; (iii) $S(\cdot, s)$ is strongly continuous on $[0, t]$. If
$$\psi(t; t_0, x_0, u) = S(t, t_0)x_0 + \int_{t_0}^t S(t, s)B(s)u(s)\,ds,$$
then $\psi(t; t_0, x_0, u)$ defines a dynamical control system (Exercise: show this).

Definition 7.8. A dynamical control system $\Sigma$ is a differential system if (i) $\Sigma$ is discrete or continuous time; (ii) $U, X, Y$ are subsets of Banach spaces $B_U, B_X, B_Y$, respectively, with $U \times X \times Y \subset O$ open in $B_U \times B_X \times B_Y$ (often strengthened to U × X × Y ⊂ O); (iii) there is a continuous map $f : T \times B_X \times B_U \to B_X$ ($D_f$ not necessarily all of $T \times B_X \times B_U$ and $T \times X \times U \subset D_f$) such that, given $t_0 \in T$, $x_0 \in X$, and $u \in \Omega$, and $E = [t_0, t_1]$, the differential equation
$$\frac{dz(t)}{dt} = f(t, z(t), u(t)) \tag{7.9}$$

has a unique solution ψ(t; t0 , x0 , u) with z(t0 ) = x0 and t ∈ E; (iv) moreover, ψ(t; t0 , x0 , u) ∈ X for all t; (v) there is a continuous map ϕ : T × X → Y with (t1 , ψ(t1 ; t0 , x0 , u)) ∈ Dϕ (with t1 a “final” time). f is called the generator of Σ. In general, existence and uniqueness results for such differential equations are local, and “finite escape times” are possible. The basic conditions are usually continuity and/or boundedness of f and locally Lipschitz with respect to z and u. The approach is to replace (7.9) by the integral equation

$$z(t) = x_0 + \int_{t_0}^t f(s, z(s), u(s))\,ds \tag{7.10}$$

and solve by iteration (as we shall see in the sequel). The key issue is to appropriately topologize the space of solutions and the space of controls.

Definition 7.11. Let $E$ be an interval with end points $a, b$, $a < b$, where either may be infinite, and let $X$ be a Banach space. Let $B(E, X) = \{f : f$ is a bounded map of $E$ into $X\}$ with $\|f\| = \sup_{t \in E} \|f(t)\|_X$. A map $f : E \to X$ is a step-function if there is a finite sequence $t_0 = a < t_1 < \cdots < t_n = b$ in $\bar E$ (closure of $E$) such that $f$ is constant on the open intervals $(t_i, t_{i+1})$, $0 \le i \le n - 1$. Let $S(E, X) = \{f : f$ is a step-function from $E$ to $X\}$. A mapping $f : E \to X$ is regulated if $f(t-)$, $f(t+)$ exist for all $t$ in $E$ (i.e., $f$ has one-sided limits at every point). Let $R(E, X) = \{f : f$ is a regulated map of $E$ into $X\}$. Clearly every step-function is regulated and bounded.

Proposition 7.12 ([D-7]). If $E = [a, b]$ is a compact interval, then $R(E, X)$ is closed in $B(E, X)$ and $S(E, X)$ is dense in $R(E, X)$.

Proof. See [D-7]. □
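The iteration mentioned for the integral equation (7.10) can be sketched numerically for a scalar state. The example below is an illustration under stated assumptions: the generator $f(t, z, u) = -z + u$, the constant control $u \equiv 1$, the uniform grid, and the cumulative trapezoidal rule used for the integral are all choices made here, and `picard` is a hypothetical helper name.

```python
import numpy as np

def picard(f, u, x0, t, sweeps=30):
    """Iterate (Psi phi)(t) = x0 + int_{t0}^t f(s, phi(s), u(s)) ds on a grid.

    The integral is approximated by the cumulative trapezoidal rule on the
    grid t, with t[0] playing the role of t0; the state is scalar-valued.
    """
    phi = np.full_like(t, x0, dtype=float)       # start from the constant x0
    for _ in range(sweeps):
        integrand = f(t, phi, u(t))
        increments = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t)
        phi = x0 + np.concatenate(([0.0], np.cumsum(increments)))
    return phi

t = np.linspace(0.0, 1.0, 2001)
f = lambda s, z, w: -z + w                       # Lipschitz in z with constant 1
u = lambda s: np.ones_like(s)                    # a constant (hence regulated) control
z = picard(f, u, 0.0, t)

exact = 1.0 - np.exp(-t)                         # solution of z' = -z + 1, z(0) = 0
print(np.max(np.abs(z - exact)))
```

On a short interval each sweep contracts the error, which is the mechanism behind the local existence argument; the remaining discrepancy in the sketch comes from the quadrature, not from the iteration.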

Clearly, also, any continuous map of E into X is regulated as is any monotone map of E into R.

Definition 7.13. Let f : E → X and let
$$ v(f, E) = \operatorname{glb}\Big\{ v \ge 0 : \sum_{i=0}^{n-1} \|f(t_{i+1}) - f(t_i)\| \le v \ \text{for any strictly increasing (finite) sequence } t_0 < t_1 < \cdots < t_n \text{ in } E \Big\}. $$
v(f, E) is the variation of f on E and f is of bounded variation on E if v(f, E) < ∞. Let BV(E, X) = {f : f of bounded variation on E}.

Proposition 7.14. If E = [a, b] is a compact interval, then BV(E, X) is a Banach space with ||f|| = ||f(a+)|| + v(f, E). If f ∈ BV(E, X), then f ∈ R(E, X).

Proof. Exercise. □
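By way of illustration (a minimal sketch, not part of the original development), the sum appearing in Definition 7.13 can be evaluated on one fixed partition. The helper name `variation_on_partition`, the sample step-function, and the grid are hypothetical choices; we take X = R so that the norm is the absolute value. The variation itself is the least upper bound of such sums over all partitions, which for the two sample functions is already attained on the grid used here.

```python
def variation_on_partition(f, points):
    """Sum of |f(t_{i+1}) - f(t_i)| over a strictly increasing finite sequence."""
    return sum(abs(f(b) - f(a)) for a, b in zip(points, points[1:]))


def step(t):
    # Hypothetical step-function on [0, 3] with jumps at t = 1 and t = 2.
    if t < 1:
        return 0.0
    if t < 2:
        return 2.0
    return -1.0


# Uniform partition of [0, 3]; it contains points on both sides of each jump.
grid = [3 * i / 300 for i in range(301)]

v_step = variation_on_partition(step, grid)               # jumps of size 2 and 3
v_mono = variation_on_partition(lambda t: t ** 3, grid)   # monotone: f(3) - f(0)
```

For a monotone real-valued map the sum telescopes, which is one way to see that monotone maps are of bounded variation on a compact interval.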

Definition 7.15. A map f : E → X is absolutely continuous if, given ε > 0, there is a δ > 0 such that if (αi, βi), i = 1, . . . , n, are disjoint subintervals of E with Σ_{i=1}^{n} |βi − αi| < δ, then Σ_{i=1}^{n} ||f(βi) − f(αi)|| < ε. Let AC(E, X) = {f : f is absolutely continuous}.

Remark 7.16. AC(E, X) is a subspace of BV(E, X) with the norm inherited from BV(E, X) and BV(E, X) ⊂ R(E, X).

Remark 7.17 (Ascoli lemma). A family Φ ⊂ C(E, X) is (relatively) compact if and only if
(1) Φ is equicontinuous (see footnote 7.1); and
(2) Φ[x] = {f(x) : f ∈ Φ} is relatively compact in X (all x in E).

7.1 Φ is equicontinuous if, at every t0 ∈ E and for given ε > 0, there is a δ > 0 such that if ||t − t0|| < δ, then ||f(t) − f(t0)|| < ε for all f ∈ Φ.

Consider a differential dynamical control system Σ with generator f(t, z, u). The following standard (local) existence theorem is a typical starting point.

Theorem 7.18. Assume:
(1) T is open;
(2) f : T × X × U → X is continuous;
(3) f(t, x(t), w(t)) is regulated if x(t), w(t) are regulated;
(4) t0 ∈ T, x0 ∈ X;
(5) u : T → U is regulated;
(6) f is bounded on a neighborhood of (t0, x0, u(t0)); and
(7) ||f(t, x1, u1) − f(t, x2, u2)|| ≤ λ(|t − t0|){||x1 − x2|| + ||u1 − u2||} on a neighborhood of (t0, x0, u(t0)) (locally Lipschitz).

Then there is a neighborhood of (t0, x0) on which there is a unique solution.

Proof. There are α, τ, ν such that, for ||x − x0|| ≤ α and |t − t0| ≤ τ, u(s) is regulated on [t0 − τ, t0 + τ], u(s) ∈ S(u(t0), ν), and N = S(t0, τ) × S(x0, α) × S(u(t0), ν) is closed and bounded with f bounded on N, ||f(t, x(t), u(t))|| ≤ M, and ||f(t, x1, u(t)) − f(t, x2, u(t))|| ≤ λ(|t − t0|) ||x1 − x2|| on N. Let 0 < k < 1 and let
$$ \rho \le \min\Big\{ \frac{\alpha}{M},\ \frac{k}{\max \lambda} \Big\}. $$
Let Γ = {ϕ(t) : ||ϕ(t) − x0|| ≤ α, t ∈ S(t0, ρ)} and define the map Ψ : Γ → Γ via
$$ (\Psi\varphi)(t) = x_0 + \int_{t_0}^{t} f(s, \varphi(s), u(s))\,ds. $$
Then
$$ \|(\Psi\varphi)(t) - x_0\| = \Big\| \int_{t_0}^{t} f(s, \varphi(s), u(s))\,ds \Big\| \le |t - t_0| \sup_{s \in [t_0, t]} \|f(s, \varphi(s), u(s))\| \le |t - t_0| \cdot M \le \frac{\alpha}{M} \cdot M = \alpha $$
and so Ψ maps Γ → Γ. We claim that Ψ is a contraction. Since
$$ \|(\Psi\varphi_1)(t) - (\Psi\varphi_2)(t)\| \le \Big| \int_{t_0}^{t} \|f(s, \varphi_1(s), u(s)) - f(s, \varphi_2(s), u(s))\|\,ds \Big| \le \Big| \int_{t_0}^{t} \lambda(|s - t_0|)\, \|\varphi_1(s) - \varphi_2(s)\|\,ds \Big| \le \rho \max \lambda \cdot \|\varphi_1 - \varphi_2\| \le k\, \|\varphi_1 - \varphi_2\|, $$
Ψ is a contraction on Γ and, by the contraction mapping principle, the theorem holds. □
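The iteration used in the proof can be tried numerically. The following is a minimal sketch, assuming a scalar state (X = R), a uniform time grid, and trapezoidal quadrature; the function name `picard` and the test problem z' = −z + u with u ≡ 1 are hypothetical choices made only for illustration, not constructions from the text.

```python
import math


def picard(f, u, x0, t0, t1, n_grid=200, n_iter=25):
    """Iterate (Psi phi)(t) = x0 + int_{t0}^{t} f(s, phi(s), u(s)) ds on a grid."""
    h = (t1 - t0) / n_grid
    ts = [t0 + i * h for i in range(n_grid + 1)]
    phi = [x0] * (n_grid + 1)            # initial guess: the constant map x0
    for _ in range(n_iter):
        g = [f(s, p, u(s)) for s, p in zip(ts, phi)]
        new_phi = [x0]
        for i in range(n_grid):
            # cumulative trapezoidal rule for the integral from t0 to ts[i + 1]
            new_phi.append(new_phi[-1] + 0.5 * h * (g[i] + g[i + 1]))
        phi = new_phi
    return ts, phi


# Hypothetical test problem: z' = -z + u(t), u = 1, z(0) = 0,
# whose exact solution is z(t) = 1 - exp(-t).
ts, phi = picard(lambda t, z, u: -z + u, lambda t: 1.0, 0.0, 0.0, 1.0)
err = max(abs(p - (1.0 - math.exp(-t))) for t, p in zip(ts, phi))
```

Here the Lipschitz constant is 1 and the interval has length 1, so the iterates converge factorially fast; the remaining error `err` is then dominated by the quadrature error on the grid.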

Corollary 7.19. If f(t, x, u) = A(t)x + B(t)u with A( · ), B( · ) regulated on T, then there is a unique solution on T (i.e., for linear systems, there is a global result).

The key elements are defining appropriate notions of distance and control classes and having “good” smoothness (at least locally) notions for the generator f. With this in mind, let us extend Definition 7.8 to allow almost everywhere considerations.

Definition 7.20. A dynamical control system Σ is an extended differential system if
(i) Σ is discrete or continuous time;
(ii) U, X, Y are subsets of Banach spaces BU, BX, BY, respectively, with U × X × Y ⊂ O open in BU × BX × BY (often strengthened to the closure of U × X × Y being contained in O);
(iii) there is a map f : T × BX × BU → BX (Df not necessarily all of T × BX × BU and T × X × U ⊂ Df) such that, given t0 ∈ T, x0 ∈ X, u ∈ Ω, and E = [t0, t1], the differential equation
$$ \frac{dz(t)}{dt} = f(t, z(t), u(t)) \tag{7.21} $$
has a (unique) solution ψ(t; t0, x0, u) with z(t0) = x0 for almost all t ∈ E (i.e., except on a set of measure 0);
(iv) moreover, ψ(t; t0, x0, u) ∈ X for all t;
(v) there is a continuous output map ϕ : E × X → Y.

We shall use (abuse of language) the term “differential system” for extended differential systems as well. The existence theorem which follows is due to Carathéodory [C-2] and the proof follows Coddington and Levinson [C-6].

Theorem 7.22 ([C-2, C-6]). Let Ũ ⊂ U be bounded and let Ω = {u( · ) : u : E → Ũ, u is measurable} (see footnote 7.2). Let D be a bounded domain in E × X and let f : D × Ũ → X so that if (t, x, ω) ∈ D × Ũ, then f(t, x, ω) ∈ X. Suppose that:
(i) f is measurable in t for fixed (x, ω);
(ii) f is continuous in (x, ω) for fixed t;
(iii) there is a (Lebesgue) integrable function m(t) on E with ||f(t, x, ω)|| ≤ m(t) for (t, x, ω) ∈ D × Ũ.
Then, given t0 ∈ E, u ∈ Ω and x0 ∈ X such that (t0, x0) ∈ D (hence (t0, x0, u(t0)) ∈ D × Ũ), there is an α > 0 and a solution ψ(t; t0, x0, u) of
$$ \dot z(t) = f(t, z(t), u(t)) \tag{7.23} $$
almost everywhere on [t0 − α, t0 + α] with
$$ \psi(t_0; t_0, x_0, u) = x_0. \tag{7.24} $$

7.2 We agree as usual to identify functions which differ on sets of measure 0 and slur over the equivalence classes.

Proof. Suppose E = [a, b] and t ≥ t0 (the case t ≤ t0 is entirely similar). We will use the Ascoli lemma (Remark 7.17). Set
$$ M(t) = \begin{cases} 0, & t < t_0, \\ \int_{t_0}^{t} m(s)\,ds, & t_0 \le t \le b, \end{cases} $$
so that M is a continuous, non-negative, non-decreasing function with M(t0) = 0. Choose α > 0 so that t ∈ [t0, t0 + α] (i.e., t − t0 ≤ α) and ||(t, x) − (t0, x0)|| ≤ M(t) implies (t, x) ∈ D and, consequently, (t, x, u(t)) ∈ D × Ũ. Now, let
$$ z_1(t) = x_0, \qquad t \in [t_0, t_0 + \alpha], $$
$$ z_2(t) = \begin{cases} x_0, & t \in [t_0, t_0 + \alpha/2], \\ x_0 + \int_{t_0}^{t - \alpha/2} f(s, z_2(s), u(s))\,ds, & t_0 + \alpha/2 < t \le t_0 + \alpha, \end{cases} $$
and, generally,
$$ z_n(t) = x_0, \qquad t_0 \le t \le t_0 + \frac{\alpha}{n}, \tag{7.25} $$
$$ z_n(t) = x_0 + \int_{t_0}^{t - \alpha/n} f(s, z_n(s), u(s))\,ds, \qquad t_0 + \frac{\alpha}{n} < t \le t_0 + \alpha. \tag{7.26} $$
Observe that, for any n, (7.25) defines zn(t) on the interval [t0, t0 + α/n] and, since (t, x0) ∈ D on that interval, (7.26) defines zn(t) on [t0 + α/n, t0 + 2α/n], and zn(t) is continuous on this interval with
$$ \|z_n(t) - x_0\| \le M\Big(t - \frac{\alpha}{n}\Big). \tag{7.27} $$
If zn(t) is defined on t0 ≤ t ≤ t0 + να/n, 1 < ν < n, then (7.26) defines zn(t) as a continuous function on (t0 + να/n, t0 + (ν + 1)α/n]. By induction, the zn(t) are continuous functions on [t0, t0 + α] such that
$$ z_n(t) = x_0, \quad t \in [t_0, t_0 + \alpha/n], \qquad \|z_n(t) - x_0\| \le M(t - \alpha/n), \quad t \in (t_0 + \alpha/n, t_0 + \alpha]. \tag{7.28} $$
Moreover, if s1, s2 ∈ [t0, t0 + α], then
$$ \|z_n(s_1) - z_n(s_2)\| \le \Big| M\Big(s_1 - \frac{\alpha}{n}\Big) - M\Big(s_2 - \frac{\alpha}{n}\Big) \Big|. $$
It follows that the zn( · ) are a uniformly-bounded equicontinuous set and, hence, by the Ascoli lemma, that there is a uniformly convergent subsequence z_{n_j} which converges to a continuous function z(t). Moreover, ||f(t, z_{n_j}(t), u(t))|| ≤ m(t) and, since f is continuous for fixed t, dominated convergence gives
$$ \lim_{j \to \infty} \int_{t_0}^{t} f(s, z_{n_j}(s), u(s))\,ds = \int_{t_0}^{t} f(s, z(s), u(s))\,ds $$
for t ∈ [t0, t0 + α]. But
$$ z_{n_j}(t) = x_0 + \int_{t_0}^{t} f(s, z_{n_j}(s), u(s))\,ds - \int_{t - \alpha/n_j}^{t} f(s, z_{n_j}(s), u(s))\,ds, $$
where the last integral tends to 0 as j → ∞, and so
$$ z(t) = x_0 + \int_{t_0}^{t} f(s, z(s), u(s))\,ds $$
for t ∈ [t0, t0 + α], which is the result. □
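The delayed approximations (7.25) and (7.26) are explicit, since the value of zn at time t only uses zn on [t0, t − α/n]. A minimal numerical sketch follows, assuming a scalar state, a uniform grid with m sub-steps per delay α/n, and trapezoidal quadrature; the name `delayed_approximation`, the parameter m, and the test problem are hypothetical and serve only to illustrate the construction.

```python
import math


def delayed_approximation(f, u, x0, t0, alpha, n, m=20):
    """Grid version of z_n from (7.25)-(7.26), with delay alpha / n."""
    h = alpha / (n * m)                  # m grid steps per delay interval
    big_n = n * m
    ts = [t0 + i * h for i in range(big_n + 1)]
    z = [x0] * (big_n + 1)               # (7.25): z_n = x0 on [t0, t0 + alpha/n]
    integral = [0.0] * (big_n + 1)       # integral[j] ~ int_{t0}^{ts[j]} f(s, z_n(s), u(s)) ds
    for i in range(1, big_n + 1):
        if i > m:
            # (7.26): z_n(t) = x0 + integral up to t - alpha/n, already computed
            z[i] = x0 + integral[i - m]
        g_prev = f(ts[i - 1], z[i - 1], u(ts[i - 1]))
        g_cur = f(ts[i], z[i], u(ts[i]))
        integral[i] = integral[i - 1] + 0.5 * h * (g_prev + g_cur)
    return ts, z


def max_error(n):
    # Hypothetical test problem: z' = -z + 1, z(0) = 0, exact z(t) = 1 - exp(-t).
    ts, z = delayed_approximation(lambda t, x, w: -x + w, lambda t: 1.0, 0.0, 0.0, 1.0, n)
    return max(abs(zi - (1.0 - math.exp(-t))) for t, zi in zip(ts, z))


err_coarse = max_error(10)
err_fine = max_error(200)
```

In this smooth example the whole sequence converges as the delay shrinks; in the setting of the theorem only a subsequence is guaranteed to converge.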

Theorem 7.18 and Theorem 7.22 are “local” existence theorems. We next develop some “global” results. The key idea is to combine boundedness of f(t, x(t), u(t)) and the local Lipschitz condition to extend solutions. We begin with five versions of Gronwall’s lemma which applies to non-negative real-valued functions and which we apply to norms and locally finite measures.

Lemma 7.29 (Gronwall 1). Let ψ ∈ L1([0, a], R) with ψ ≥ 0 and let ϕ0 ∈ R with ϕ0 ≥ 0. If w(t) ∈ L∞([0, a], R), w ≥ 0, with
$$ w(t) \le \varphi_0 + \int_0^t \psi(s) w(s)\,ds, \qquad t \in [0, a], \tag{7.30} $$
then
$$ w(t) \le \varphi_0 \exp\Big( \int_0^t \psi(s)\,ds \Big) \tag{7.31} $$
on [0, a].

Proof. Let R(t) = ∫_0^t ψ(s)w(s) ds. Then, in view of (7.30), R′(t) − R(t)ψ(t) ≤ ϕ0 ψ(t) a.e. Let
$$ \alpha(t) = \exp\Big( -\int_0^t \psi(s)\,ds \Big). $$
Then
$$ \frac{d}{dt}\big[\alpha(t) R(t)\big] = \alpha(t)\{R'(t) - R(t)\psi(t)\} \le \varphi_0 \psi(t) \alpha(t), $$
and the result follows on integrating from 0 to t (α(t) is an “integrating factor”). □
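A quick numerical sanity check of Lemma 7.29 may be helpful. The sketch below is only an illustration under stated assumptions: the functions ψ(t) = 1 + t and w(t) = 1 + sin²t, the constant ϕ0 = 1, and the helper `cumulative_trapezoid` are hypothetical choices. It first verifies the hypothesis (7.30) on a grid and then the conclusion (7.31).

```python
import math


def cumulative_trapezoid(values, h):
    """Running trapezoidal integral of grid values with step h."""
    out = [0.0]
    for left, right in zip(values, values[1:]):
        out.append(out[-1] + 0.5 * h * (left + right))
    return out


a, n, phi0 = 2.0, 2000, 1.0
h = a / n
ts = [i * h for i in range(n + 1)]
psi = [1.0 + t for t in ts]                   # hypothetical psi >= 0
w = [1.0 + math.sin(t) ** 2 for t in ts]      # hypothetical w >= 0

integral_psi_w = cumulative_trapezoid([p * v for p, v in zip(psi, w)], h)
integral_psi = cumulative_trapezoid(psi, h)

# Hypothesis (7.30): w(t) <= phi0 + int_0^t psi(s) w(s) ds
hypothesis_holds = all(v <= phi0 + i + 1e-12 for v, i in zip(w, integral_psi_w))
# Conclusion (7.31): w(t) <= phi0 * exp(int_0^t psi(s) ds)
bound_holds = all(v <= phi0 * math.exp(i) + 1e-12 for v, i in zip(w, integral_psi))
```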

Lemma 7.32 (Gronwall 2). Let ψ : [0, a] → R, ϕ0 : [0, a] → R be regulated with ψ ≥ 0 and ϕ0 ≥ 0. If w is regulated, w ≥ 0, and
$$ w(t) \le \varphi_0(t) + \int_0^t \psi(s) w(s)\,ds, \tag{7.33} $$
then
$$ w(t) \le \varphi_0(t) + \int_0^t \varphi_0(s) \psi(s) \exp\Big( \int_s^t \psi(r)\,dr \Big)\,ds \tag{7.34} $$
on [0, a].

Proof. Mimics the previous proof. Let R(t) = ∫_0^t ψ(s)w(s) ds. Then, R′(t) − R(t)ψ(t) ≤ ϕ0(t)ψ(t). Let
$$ S(t) = R(t) \exp\Big( -\int_0^t \psi(s)\,ds \Big), $$
where the exponential is an “integrating factor”, and note S(0) = 0. Then
$$ S(t) \le \int_0^t \varphi_0(s) \psi(s) \exp\Big( -\int_0^s \psi(r)\,dr \Big)\,ds, $$
and the result follows. □

Lemma 7.35 (Gronwall 3). Let ψ : [0, a] → R, ψ continuous (regulated is enough) with ψ ≥ 0. Suppose there are α > 0, β > 0 such that
$$ \psi(t) \le \beta \int_0^t \psi(s)\,ds + \alpha t, \qquad t \in [0, a], \tag{7.36} $$
then
$$ \psi(t) \le \frac{\alpha}{\beta}\big(e^{\beta t} - 1\big) \tag{7.37} $$
on [0, a].

Proof. Let m = ||ψ||∞ = sup{|ψ(t)| : t ∈ [0, a]}. Then |ψ(t)| ≤ mβt + αt = mβt + (α/β)(βt). Apply to (7.36) to obtain
$$ |\psi(t)| \le m \frac{\beta^2 t^2}{2!} + \frac{\alpha}{\beta} \sum_{j=1}^{2} \frac{(\beta t)^j}{j!} $$
and by induction
$$ |\psi(t)| \le m \frac{(\beta t)^n}{n!} + \frac{\alpha}{\beta} \sum_{j=1}^{n} \frac{(\beta t)^j}{j!}. $$
Taking the limit as n → ∞ gives the result. □
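The inductive bound in the proof can be evaluated directly to watch it approach the right-hand side of (7.37). This is a minimal sketch; the function name `iterated_bound` and the numerical values of m, α, β, t are hypothetical and chosen only for illustration.

```python
import math


def iterated_bound(m, alpha, beta, t, n):
    """n-th bound from the proof: m (beta t)^n / n! + (alpha/beta) sum_{j=1}^{n} (beta t)^j / j!."""
    bt = beta * t
    tail = m * bt ** n / math.factorial(n)
    partial = sum(bt ** j / math.factorial(j) for j in range(1, n + 1))
    return tail + (alpha / beta) * partial


alpha, beta, t, m = 0.5, 2.0, 1.5, 10.0      # hypothetical values
limit = (alpha / beta) * (math.exp(beta * t) - 1.0)
bounds = [iterated_bound(m, alpha, beta, t, n) for n in (1, 5, 10, 20, 40)]
```

The term involving m dies out factorially, which is why the final estimate does not depend on the a priori bound m.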

Corollary 7.38. Suppose that f(t, x, ω) is continuous, ||f(t, x0, u0)|| ≤ α for all t, and ||f(t, x1, ω1) − f(t, x2, ω2)|| ≤ β{||x1 − x2|| + ||ω1 − ω2||}. If ż = f(t, z, u) with z(t0) = x0 and ψ(t) = ||z(t0 + t) − x0||, then ψ(t) ≤ β ∫_0^t ψ(s) ds + αt and ||z(t) − x0|| ≤ (α/β)(e^{β|t−t0|} − 1).

Proof. Apply Gronwall 3. □

Lemma 7.39. Let {ϕn}, {ψn} be non-negative sequences. Let {wn} be a non-negative sequence such that
$$ w_n \le \varphi_n + \sum_{0 \le j\, \ldots} \psi_j w_j, \qquad n \ge 0. \tag{7.40} $$
… ε > 0. Let δ > 0 such that |t| < δ implies |o(||th||)/t| < ε. If −δ < t < 0, then (J(z0 + th) − J(z0))/t ≤ 0 but G′(z0)h + o(||th||)/t > ε − ε = 0. □

Remark 8.38. If J is Gateaux differentiable on K convex and if z0 ∈ K is a minimum of J, then dW J(z0)[h − z0] ≥ 0 for all h ∈ K.

Proof. Since K is convex, z0 + θ(h − z0) ∈ K for 0 ≤ θ ≤ 1 and, by elementary calculus,
$$ \frac{d}{d\theta} J(z_0 + \theta[h - z_0]) \Big|_{\theta = 0} \ge 0. $$
□

Remark 8.39. If J ∈ C²(U), U open and convex in Z, J′(u0) = 0, and J″(u) ≥ 0 for u ∈ U, then u0 is a minimum of J on U.

Proof. We have
$$ J(u_0 + h) - J(u_0) = 0 \cdot h + \int_0^1 (1 - s)\, J''(u_0 + sh)[h, h]\,ds. $$
But u0 + sh ∈ U, so J″(u0 + sh)[h, h] ≥ 0. □

Let u ⊂ Z, z0 ∈ u, and Au(z0) be an approximation to u at z0. Let Cu(z0) = {(dW J(z0)δx, δx) : δx ∈ Au(z0)} and ρu(z0) = {(ρ, δx) : ρ < 0, δx ∈ Au(z0)}. These are convex cones “attached” to (J(z0), z0).

Proposition 8.40. If there is a δx in Au(z0) such that J(z0 + δx) − J(z0) < 0, then Cu(z0) ∩ ρu(z0) is non-empty and conversely.

Proof. We have J(z0 + θδx) − J(z0) = θ dW J(z0)δx + o(θ). Since J(z0 + δx) − J(z0) < 0, there is an ε with J(z0 + δx) − J(z0) < ε < 0 and hence, for 0 < θ ≤ 1, dW J(z0)δx + o(θ)/θ < ε < 0. Taking lim_{θ→0} gives dW J(z0)δx ≤ ε < 0 and (dW J(z0)δx, δx) is in Cu(z0) ∩ ρu(z0). The converse is similar. □

Corollary 8.41. If z0 is a minimum of J, then there is a λ = (λ0, λ1) in (R ⊕ X)∗ which separates Cu(z0) and ρu(z0).

Proof. Enough to show Cu(z0) ∩ ρu(z0) = ∅. If not, we have a contradiction by the proposition. □

Corollary 8.42. If z0 is a minimum of J and J is twice Gateaux differentiable, then d²W J(z0, h) ≥ 0 for all h.

Proof. Apply the proposition and Proposition 8.18 (or Equation (8.36)). □

Assume, for the moment, that Y = R^m is finite dimensional and that W = U ∩ ZG.

Lemma 8.43 ([C-1]). If z0 is an optimal element and if AW(z0; J, G) is an approximation to W relative to J and G, then there is a non-zero linear functional λ = (λ0, λ1, . . . , λm) on R ⊕ R^m such that
$$ \langle \lambda, (h_J(z_0)\delta x,\ h_G(z_0)\delta x) \rangle \le 0 \tag{8.44} $$

for all δx in AW(z0; J, G).

Proof. Let C(z0) = {(hJ(z0)δx, hG(z0)δx) : δx ∈ AW(z0; J, G)}, ρ = {(ρ, 0) : ρ < 0}. Both C(z0) and ρ are convex cones in R ⊕ R^m. We claim that C(z0) and ρ are separated. If so, then there is a λ ≠ 0 such that ⟨λ, C(z0)⟩ ≤ 0 and ⟨λ, ρ⟩ ≥ 0. The result follows since hJ(z0), hG(z0) are continuous. Suppose the claim were false. Then there is a δx1 ≠ 0 in AW(z0; J, G) with
$$ h_J(z_0)\delta x_1 < 0, \qquad h_G(z_0)\delta x_1 = 0. $$
If dim[hG(z0)AW(z0; J, G)] < m, then there would be a q ≠ 0 (in (R^m)∗ = R^m) such that (0, q) would separate C(z0) and ρ. So we may suppose that dim[hG(z0)AW(z0; J, G)] = m. It follows that there is a simplex [z1, z2, . . . , zm+1] in R ⊕ R^m with 0 as an interior point such that
(i)
$$ z_1 = h_G(z_0)\delta x_1 = 0, \qquad z_j = h_G(z_0)\delta x_j, \quad j = 2, \ldots, m + 1; \tag{8.45} $$
(ii)
$$ \psi(\delta x) \in W - \{z_0\} \ \text{for all } \delta x \text{ in } \operatorname{co}\{\delta x_1, \ldots, \delta x_{m+1}\} \tag{8.46} $$
where ψ is the twisting map in the definition of AW(z0; J, G);
(iii)
$$ h_J(z_0)\delta x_j < 0, \qquad j = 1, \ldots, m + 1; \tag{8.47} $$
(iv) {z2 − z1, . . . , zm+1 − z1} = {z2, . . . , zm+1} is a basis of R^m.
The existence of such a simplex is easy to prove ([C-1] or exercise). Now define a linear map L : R^m → Z by
$$ L\{z_j - z_1\} = L(z_j) = \delta x_j - \delta x_1, \qquad j = 2, \ldots, m + 1, $$
and extending by linearity. If z = Σ rj zj + (1 − Σ rj)z1 = Σ rj zj is an element of [0, z2, . . . , zm+1], then ϕ(z) = L(z − z1) + δx1 = Σ rj δxj + (1 − Σ rj)δx1 is a continuous map of [0, z2, . . . , zm+1] into co{δx1, . . . , δxm+1}. For 0 < α ≤ 1, define a continuous map hα : α[0, z2, . . . , zm+1] → R^m by
$$ h_\alpha(\alpha z) = \alpha z - G(z_0 + \psi(\alpha[L(z) + \delta x_1])). \tag{8.48} $$
Since AW(z0; J, G) is an approximation, hα(αz) = −og(αL(z) + αδx1) and og(αL(z) + αδx1) ∈ α[0, z2, . . . , zm+1] for all α with 0 < α ≤ α0 for some α0. Since hJ(z0)δxj < 0 for j = 1, . . . , m + 1, we have J(z0 + ψ(αL(z) + αδx1)) < J(z0) for all α with 0 < α ≤ α1 for some α1. Let β = min{α0, α1}. Then, by the Brouwer fixed point theorem [L-4], there is an element zβ in β[0, z2, . . . , zm+1] which is a fixed point of hβ, i.e. hβ(βzβ) = βzβ. From (8.48) it follows that G(z0 + ψ(βL(zβ) + βδx1)) = 0. In other words, z = z0 + ψ(βL(zβ) + βδx1) is an element such that z − z0 ∈ ψ(co[δx1, . . . , δxm+1]) ⊂ W − {z0} and J(z) < J(z0) and G(z) = 0. This contradicts the optimality of z0. □

Suppose now that Y need not be finite-dimensional, that W = U ∩ ZG, and that AW(z0; J, G) is an approximation to W relative to J, G at an optimal element z0. Then
$$ C(z_0) = \{(h_J(z_0)\delta x,\ h_G(z_0)\delta x) : \delta x \in A_W(z_0; J, G)\}, \qquad \rho = \{(\rho, 0) : \rho < 0\}, $$

are convex cones in R ⊕ Y. If these cones are separated, then there is a λ ≠ 0 in (R ⊕ Y)∗ (= R ⊕ Y∗) such that λ(C(z0)) ≤ 0 and λ(ρ) ≥ 0. Suppose these cones were not separated, so there would be a δx1 ≠ 0 with hJ(z0)δx1 < 0 and hG(z0)δx1 = 0. We now use a typical argument to obtain a contradiction. If δx1, δy1, . . . , δym are linearly independent elements of AW(z0; J, G) and we let K be the convex cone in AW(z0; J, G) generated by {δx1, δy1, . . . , δym}, then CK(z0) = {(hJ(z0)δx, hG(z0)δx) : δx ∈ K} is a convex cone in R ⊕ Y which is not separated from ρ. Moreover, CK(z0) is contained in a finite-dimensional subspace R^{µ+1} (µ ≤ m) ⊂ R ⊕ Y and CK(z0) is not separated from the ray ρµ = {(ρ, 0µ) : ρ < 0} in R^{µ+1}. By the proof of the lemma, there is an element z with J(z) < J(z0) and G(z) = 0. This contradicts the optimality of z0. So we have established:

Theorem 8.49. If z0 is an optimal element and if AW(z0; J, G) is an approximation to W relative to J and G, then there is a non-zero λ ∈ (R ⊕ Y)∗ such that λ((hJ(z0)δx, hG(z0)δx)) ≤ 0 and λ({(ρ, 0) : ρ < 0}) ≥ 0 for all δx in AW(z0; J, G).

In essence, the notion of approximation provides a way of defining admissible variations. This is a critical issue.

Example 8.50. Let X = C²([a, b]) and consider two norms:
$$ \|x(\cdot)\|_\infty = \sup_{t \in [a, b]} |x(t)|, \qquad \|x(\cdot)\|_d = \|x(\cdot)\|_\infty + \|x'(\cdot)\|_\infty. $$
Obviously, if ||x( · )||d is small, then ||x( · )||∞ is small. If xε(t) = ε sin(t/ε²), then xε′(t) = (1/ε) cos(t/ε²). Note that lim_{ε→0} ||xε( · )||∞ = 0 but lim_{ε→0} ||xε′( · )||∞ is not defined (oscillates infinitely). Given x(t) ∈ C¹([a, b]), an element δW x with δW x(a) = δW x(b) = 0, and ||δW x( · )||d ≤ α, α > 0, is called a weak variation; an element δS x with δS x(a) = δS x(b) = 0 and ||δS x( · )||∞ ≤ α, α > 0, is called a strong variation. These lead to notions of weak and strong approximations and minima which need not be the same [G-1]. It is instructive to examine the differences.

Example 8.51 (A discrete control problem [C-8]). Let fi : R^n × R^m → R^n, i = 0, . . . , k − 1; Ui ⊂ R^m, i = 0, 1, . . . , k − 1; and let U = {[u(i)] : u(i) ∈ Ui, i = 0, 1, . . . , k − 1}. Suppose that fi( · , u) is C¹ for all u in Ui and consider the difference equation
$$ x(i + 1) - x(i) = f_i(x(i), u(i)), \tag{8.52} $$

i = 0, . . . , k − 1. Let h0 : R^n → R^{ν0}, h1 : R^n → R^{ν1} be C¹ maps of maximum rank. Then S0 = {x : h0(x) = 0}, S1 = {x : h1(x) = 0} are smooth varieties (initial and terminal sets). Let f_i^0 : R^n × R^m → R, i = 0, 1, . . . , k − 1, be such that f_i^0( · , u) is C¹ for all u in Ui. Let
$$ J(x(0), x(k), [u(i)]) = \sum_{i=0}^{k-1} f_i^0(x(i), u(i)) \tag{8.53} $$

for x(0) ∈ S0, x(k) ∈ S1, [u(i)] ∈ U and [x(i)] the corresponding solution of (8.52). Let Fi : R^n × R^m → R^{n+1} be given by Fi(x, u) = (f_i^0(x, u), fi(x, u)) for i = 0, 1, . . . , k − 1. Set Z̃ = R^{(k+1)n+k(n+1)}. If z̃ ∈ Z̃, then z̃ = (x(0), . . . , x(k), y(0), . . . , y(k − 1)) with x(j) ∈ R^n, y(ℓ) ∈ R^{n+1}, and y(ℓ) = (y0(ℓ), ȳ(ℓ)) with ȳ(ℓ) ∈ R^n. Let W = {z̃ ∈ Z̃ : y(ℓ) ∈ Fℓ(x(ℓ), Uℓ), ℓ = 0, 1, . . . , k − 1} and let J : Z̃ → R, G : Z̃ → R^{kn+ν0+ν1} be given by
$$ J(\tilde z) = \sum_{\ell=0}^{k-1} y_0(\ell), \tag{8.54} $$
$$ G(\tilde z) = (x(1) - x(0) - \bar y(0), \ldots, x(k) - x(k - 1) - \bar y(k - 1),\ h_0(x(0)),\ h_1(x(k))). $$
Let Cr(z̃, K), K convex, be the radial cone of K at z̃ so that Cr(z̃, K) = {δz : ∃ ε1 > 0 with ⟨z̃, δz⟩ > 0 and z̃ + εδz in K for 0 ≤ ε ≤ ε1}. Given z̃0, let
$$ A_W(z_0; J, G) = \Big\{ \delta x : \delta y(\ell) \in \frac{\partial F_\ell}{\partial x}(x_0(\ell), u_0(\ell))\,\delta x(\ell) + C_r\big[F_\ell(x_0(\ell), u_0(\ell)),\ F_\ell(x_0(\ell), U_\ell)\big] \Big\}. $$
Let δxµ, µ = 1, . . . , N, be linearly independent elements of AW(z0; J, G). Then there exist ε > 0 and uµ(i) ∈ Ui with
$$ F_i(x_0(i), u_0(i)) + \varepsilon\,\delta y_\mu(i) = \varepsilon \frac{\partial F_i}{\partial x}(x_0(i), u_0(i))\,\delta x_\mu(i) + F_i(x_0(i), u_\mu(i)). $$
If δx ∈ co{εδx1, . . . , εδxN}, then δx = ε Σ_{µ=1}^{N} αµ(δx)δxµ and we let
$$ \psi(\delta x) = (\delta x(0), \ldots, \delta x(k),\ \psi_0(\delta x), \ldots, \psi_{k-1}(\delta x)) $$
where
$$ \psi_i(\delta x) = F_i(x_0(i) + \delta x(i), u_0(i)) - F_i(x_0(i), u_0(i)) + \sum_{\mu=1}^{N} \alpha_\mu(\delta x)\big[F_i(x_0(i) + \delta x(i), u_\mu(i)) - F_i(x_0(i) + \delta x(i), u_0(i))\big], $$
with the obvious choices for hJ and hG with ψ as the twisting (Exercise [C-8]). The theorem then implies there is a λ = (λ0, ϕ) with λ ≠ 0, λ0 ≤ 0,

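ϕ = (−ϕ1, . . . , −ϕk, ξ0, ξ1) with ϕi ∈ R^n, ξ0 ∈ R^{ν0}, ξ1 ∈ R^{ν1} such that for all δx = (δx(0), . . . , δx(k), δu(0), . . . , δu(k − 1)) in AW(z0; J, G)
$$ 0 \ge \lambda_0 \sum_{i=0}^{k-1} \Big[ \frac{\partial f_i^0}{\partial x}(x_0(i), u_0(i))\,\delta x(i) + \frac{\partial f_i^0}{\partial u}(x_0(i), u_0(i))\,\delta u(i) \Big] + \sum_{i=0}^{k-1} \Big\langle -\varphi_{i+1},\ \delta x(i + 1) - \delta x(i) - \frac{\partial f_i}{\partial x}(x_0(i), u_0(i))\,\delta x(i) - \frac{\partial f_i}{\partial u}(x_0(i), u_0(i))\,\delta u(i) \Big\rangle + \Big\langle \xi_0,\ \frac{\partial h_0}{\partial x}(x_0(0))\,\delta x(0) \Big\rangle + \Big\langle \xi_1,\ \frac{\partial h_1}{\partial x}(x_0(k))\,\delta x(k) \Big\rangle. $$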

(The reader should do the calculation for, say, n = 2.)

If J : Z → R is a functional, then we have examined two approaches to necessary conditions: one via derivatives and the other via separation of convex sets. Constraints, via G(z) = 0, are handled by a “Lagrange multiplier” λ.

Example 8.55. Let D be a closed domain in X, C_0^∞(D) = {ψ : D → R : ψ ∈ C^∞(D), ψ(∂D) = 0} (i.e., vanishing on the boundary of D). If x ∈ D, then λx : C(D) → R given by λx(f) = f(x) is an element of C(D)∗ as |λx(f)| = |f(x)| ≤ ||f( · )||∞ and ||λx|| ≤ 1. If P(D) is the space of probability measures on D and if, for x0 ∈ D, we set µx0(A) = 1 if x0 ∈ A, µx0(A) = 0 if x0 ∉ A, then µx0( · ) ∈ P(D) and λx0(f) = ∫_D f(s)µx0(ds). Suppose that inf_{x∈D}[J(x)] = J(x∗) = α with J ∈ C¹(D) and −∞ < α < ∞. Then
$$ \inf_{x \in D} [J(x)] = \lambda_{x^*}(J) = \int_D J(s)\,\mu_{x^*}(ds). $$
Let J̃(x) = J(x) − α so that inf_{x∈D}[J̃(x)] = J̃(x∗) = 0 and J̃(x) ≥ J̃(x∗) for all x ∈ D. Consider “test” functions {ψ ∈ C_0^∞(D) : ψ(x) > 0 for x ∈ Int(D)}. Then, for such ψ, J̃(x)ψ(x) ≥ 0, x ∈ D, and inf_{x,ψ}[J̃(x)ψ(x)] = 0. Suppose that x∗ ∈ Int(D) and that
$$ \int_D \tilde J(s)\,\psi'(s)\,\mu_{x^*}(ds) = -\int_D \tilde J'(s)\,\psi(s)\,\mu_{x^*}(ds) $$
for all test functions (an “integration by parts”). Then 0 = J̃(x∗)ψ′(x∗) = −J̃′(x∗)ψ(x∗) for all ψ. An example without a moral.

We have already encountered the notion of “generalized” controls (i.e., probability measure-valued controls). Let X be a Banach space and let U be a metric space with metric ρ. Let G : X × U → X be a map with G and Gx = DxG (derivative of G with respect to x) continuous. We have (cf. Lemma 8.23):

Lemma 8.56. Let (xu, u) be a given element of X × U and let N be a neighborhood of (xu, u). Suppose that if (xv, v) ∈ N, then ||xu − xv|| ≤ ρ(u, v). Then
$$ G(x_u, u) - G(x_v, v) = G(x_u, u) - G(x_u, v) + G_x(x_u, u)[x_u - x_v] + o(\rho(u, v)) $$
for (xv, v) ∈ N.

Proof. Entirely similar to Lemma 8.23 as o(||xu − xv||) = o(ρ(u, v)). □

Again if J : X × U → R and ψ : X × U → Y and H : X × U × Y∗ → R is given by
$$ H[x, u, \lambda] = J(x, u) + \lambda[\psi(x, u)], \tag{8.57} $$
we have:

Proposition 8.58. Let N be a neighborhood of (xu, u) and suppose that J, ψ and xu, xv satisfy the conditions of the lemma. If ψ(xu, u) = 0, if (xv, v) ∈ N with ψ(xv, v) = 0, and if
$$ H_x[x_u, u, \lambda] = J_x(x_u, u) + \lambda[\psi_x(x_u, u)] = 0, \tag{8.59} $$
then
$$ J(x_u, u) - J(x_v, v) = H[x_u, u, \lambda] - H[x_u, v, \lambda] + o(\rho(u, v)) \tag{8.60} $$
(relating the change in J to the change in H).

Proof. Entirely similar to the proof of Proposition 8.25. □

Example 8.61. Let X be a Banach space, U be a (separable) metric space, and E be a compact metric space with µ a regular Borel measure on E. Let P(U) = {ν : ν a regular, Borel, probability measure on U} and let ρ( · , · ) be the Prokhorov metric on P(U). Let M = {f : f : E → X × U, f measurable} and MP = {σ : σ : E → X × P(U), σ measurable}. If u ∈ U and δ(u) is the Dirac (probability) measure concentrated at u, then f : E → X × U with f(t) = (xf(t), uf(t)) can be viewed as σf : E → X × P(U) via σf(t) = (xf(t), δ(uf(t))) so that M ⊂ MP. Let ψ : X × U → X and J : X × U → R satisfy the conditions of the proposition. Let Ω = {u : u : E → U, u measurable, and there exists a unique xu : E → X such that ψ(xu, u) = 0}. We say that ψP, JP extend ψ, J if ψP(x, δ(u)) = ψ(x, u), JP(x, δ(u)) = J(x, u). Suppose that ψP, JP extend ψ, J and let Ωg = {σ : σ : E → P(U), σ measurable and there exists a unique xσ : E → X such that ψP(xσ, σ) = 0}. Then, we let H[x, u, λ] = J(x, u) + λ[ψ(x, u)] and HP[x, σ, λ] = JP[x, σ] + λ[ψP(x, σ)]. Then, for u ∈ Ω or σ ∈ Ωg, and λ a solution of (8.59), we have, for instance,
$$ J_P(x_\sigma, \sigma) - J_P(x_\tau, \tau) = H_P[x_\sigma, \sigma, \lambda] - H_P[x_\sigma, \tau, \lambda] + o(\rho(\sigma, \tau)), $$
or, emphasizing the dependence on t ∈ E,
$$ J_P(x_\sigma(t), \sigma(t)) - J_P(x_\tau(t), \tau(t)) = H_P[x_\sigma(t), \sigma(t), \lambda(t)] - H_P[x_\sigma(t), \tau(t), \lambda(t)] + o(\rho(\sigma(t), \tau(t))). $$
Suppose there is continuity in t and, say, for simplicity, E = [t0, t1]. Then, by a standard argument, if σ∗(t) is a local minimum of JP, then it is a local minimum of HP[xσ∗, σ, λ∗]. (See [L-6]).

Chapter 9

Control Problems: Classical Methods

Let E = [t0, t1] be a fixed compact interval and let X be a Banach space. Let U ⊂ Bu be a Banach manifold (Bu a Banach space). Let f : X × U → X, X0 ⊂ X and consider the differential equation
$$ x(t, u(t)) = x_0 + \int_{t_0}^{t} f(x(s), u(s))\,ds, \qquad x_0 \in X_0, \tag{9.1} $$
where u( · ) ∈ Ω = {u : E → U : u is regulated and bounded}. Let K : X → R and L : X × U → R. If u( · ) ∈ Ω, let xu( · ) be the solution (if any) of (9.1) on E and consider
$$ J(u(\cdot), x_0) = K(x_u(t_1)) + \int_{t_0}^{t_1} L(x_u(s), u(s))\,ds, \qquad G(x_u(\cdot), u(\cdot)) = x_u(\cdot) - x_0 - \int_{t_0}^{\cdot} f(x_u(s), u(s))\,ds. \tag{9.2} $$
Then, the “classical” control problem is: find u( · ) ∈ Ω which minimizes J subject to G = 0. Observe that if the problem is not autonomous so that there is t-dependence (i.e., ẋ = f(x, u, t), etc.) then letting z be a new variable with ż = 1 and z(t0) = t0 and setting y = (x, z), the system ẏ = f̃(y, u) with cost integrand L̃(y, u), where f̃(y, u) = (f(x, u, z), 1), L̃(y, u) = L(x, u, z), is autonomous. In effect there is no loss of generality.

Before developing the results which illustrate the variational approach, we prove some lemmas critical to that approach. Let X be a Banach space and let C_0^1(E, X) = {h ∈ C¹(E, X) : h(t0) = 0, h(t1) = 0}.

Lemma 9.3. If x∗ : E → X∗ is continuous and if ∫_E x∗(s)[h(s)] ds = 0 for all h ∈ C(E, X), then x∗(s) ≡ 0.

Proof. If x∗ ≢ 0, then there exists s0 with x∗(s0) ≠ 0, and by the Hahn–Banach theorem, there exists an h ∈ X with x∗(s0)[h] > 0. By continuity there is an interval [s1, s2] with s0 ∈ [s1, s2] and x∗(s)[h] ≥ ε > 0 on [s1, s2]. Let λ : R → R+ be given by

© Springer Science+Business Media, LLC, part of Springer Nature 2019 P. Falb, Direct Methods in Control Problems, https://doi.org/10.1007/978-0-8176-4723-0_9

$$ \lambda(s) = \begin{cases} (s - s_1)^2 (s_2 - s)^2, & s \in [s_1, s_2], \\ 0, & \text{elsewhere.} \end{cases} $$
Then h̃(s) = λ(s)h is continuous on E and ∫_{t0}^{t1} x∗(s)[h̃(s)] ds > 0, a contradiction. □

Corollary 9.4. If x∗ : E → X∗ is continuous and if ∫_{t0}^{t1} x∗(s)[h(s)] ds = 0 for all h ∈ C_0^1(E, X), then x∗(s) ≡ 0.

Lemma 9.5. Suppose x∗ : E → X∗ is continuous. If ∫_{t0}^{t1} x∗(s)[v(s)] ds = 0 for all v : E → X which are continuous (regulated more than enough) and satisfy ∫_{t0}^{t1} v(s) ds = 0, then x∗(s) ≡ c where c is a constant. [Note that if h(t) = ∫_{t0}^{t} v(s) ds, then v(t) = ḣ(t).]

Proof ([S-1]). If x∗( · ) is not a constant, then there exist s1, s2 with t0 < s1 < s2 < t1 such that x∗(s1) ≠ x∗(s2). Hence, there exists h ∈ X with (say) x∗(s1)[h] > x∗(s2)[h]. Let ε1, ε2 be such that x∗(s1)[h] > ε1 > ε2 > x∗(s2)[h]. By continuity, for ε > 0 sufficiently small,
$$ x^*(s)[h] > \varepsilon_1, \quad s \in [s_1 - \varepsilon, s_1 + \varepsilon], \qquad x^*(s)[h] < \varepsilon_2, \quad s \in [s_2 - \varepsilon, s_2 + \varepsilon], $$
and t0 ≤ s1 − ε ≤ s1 + ε ≤ s2 − ε ≤ s2 + ε ≤ t1. Let ϕ : R → R+ be a C∞ function with ϕ(s) > 0 on (−ε, ε) and 0 elsewhere. Set ξ(s) = ϕ(s − s1) − ϕ(s − s2) so that ξ(s) is C∞, ∫_{t0}^{t1} ξ(s) ds = 0, and ξ(s) > 0 for s ∈ (s1 − ε, s1 + ε), ξ(s) < 0 for s ∈ (s2 − ε, s2 + ε). Let v(s) = ξ(s)h. Then v(s) is continuous and ∫_{t0}^{t1} v(s) ds = 0. But
$$ \int_{t_0}^{t_1} x^*(s)[v(s)]\,ds > \varepsilon_1 \int_{s_1 - \varepsilon}^{s_1 + \varepsilon} \varphi(s - s_1)\,ds - \varepsilon_2 \int_{s_2 - \varepsilon}^{s_2 + \varepsilon} \varphi(s - s_2)\,ds = (\varepsilon_1 - \varepsilon_2) \int_{-\varepsilon}^{\varepsilon} \varphi(s)\,ds > 0 $$
which is a contradiction. □
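The construction in the proof is easy to reproduce numerically. In the sketch below, which is only an illustration under stated assumptions, we take X = R and h = 1, the non-constant functional x∗(s) = cos s, the points s1 = 1, s2 = 2, and a standard smooth bump for ϕ; the names `bump` and `trapezoid` and all numerical values are hypothetical. The test function ξ has zero integral, yet its pairing with x∗ is strictly positive.

```python
import math


def bump(s, eps):
    """Smooth bump: positive on (-eps, eps) and zero elsewhere."""
    d = 1.0 - (s / eps) ** 2
    return math.exp(-1.0 / d) if d > 0.0 else 0.0


def trapezoid(g, a, b, n):
    """Composite trapezoidal rule for g on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (g(a) + g(b))
    for i in range(1, n):
        total += g(a + i * h)
    return h * total


t0, t1, s1, s2, eps = 0.0, 3.0, 1.0, 2.0, 0.25
x_star = math.cos                       # hypothetical non-constant x*, with x*(s1) > x*(s2)


def xi(s):
    # xi(s) = phi(s - s1) - phi(s - s2), as in the proof
    return bump(s - s1, eps) - bump(s - s2, eps)


integral_of_xi = trapezoid(xi, t0, t1, 3000)
pairing = trapezoid(lambda s: x_star(s) * xi(s), t0, t1, 3000)
```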

Lemma 9.6 (Du Bois-Reymond). Let x∗ : E → X∗, y∗ : E → X∗ be continuous. Then
$$ \int_{t_0}^{t_1} \{x^*(s)[h(s)] + y^*(s)[\dot h(s)]\}\,ds = 0 $$
for all h ∈ C_0^1(E, X) if and only if y∗( · ) is differentiable and ẏ∗( · ) = x∗( · ).

Proof. If ẏ∗ = x∗, then
$$ \int_{t_0}^{t_1} \{x^*(s)[h(s)] + y^*(s)[\dot h(s)]\}\,ds = \int_{t_0}^{t_1} \frac{d}{ds}\big(y^*(s)[h(s)]\big)\,ds = y^*(t_1)[h(t_1)] - y^*(t_0)[h(t_0)] = 0 $$
as h( · ) ∈ C_0^1(E, X). Conversely, let x1∗(t) = ∫_{t0}^{t} x∗(s) ds. Then
$$ x^*(t)[h(t)] = \dot x_1^*(t)[h(t)] = \frac{d}{dt}\big(x_1^*(t)[h(t)]\big) - x_1^*(t)[\dot h(t)] $$
and so
$$ \int_{t_0}^{t_1} \{x^*(s)[h(s)] + y^*(s)[\dot h(s)]\}\,ds = \int_{t_0}^{t_1} \big(y^*(s) - x_1^*(s)\big)[\dot h(s)]\,ds + x_1^*[h] \Big|_{t_0}^{t_1}, $$
where the boundary term vanishes since h(t0) = h(t1) = 0. By the previous lemma, y∗(t) = x1∗(t) + C, C a constant, and hence, ẏ∗(t) = ẋ1∗(t) = x∗(t). □

Lemma 9.7. Let X, Y be Banach spaces and λ : E × X → Y with E a compact interval (say, E = [0, 1]). If λ is a continuous map of E × U → Y and U is open in X, then
$$ \tilde\lambda(x) = \int_E \lambda(t, x)\,dt $$

is continuous on U.

Proof. Let ε > 0 and x0 ∈ U. For any t̃ ∈ E, there is a neighborhood O(t̃) and a δ(t̃) > 0 such that if t ∈ O(t̃) and ||x − x0|| ≤ δ(t̃), then ||λ(t, x) − λ(t̃, x0)|| ≤ ε. Since E is compact, there is a finite cover O(t̃i), i = 1, . . . , ν of E; put δ = min{δ(t̃i)}. If ||x − x0|| ≤ δ, then ||λ(t, x) − λ(t, x0)|| ≤ 2ε for all t ∈ E. It follows that ||λ̃(x) − λ̃(x0)|| ≤ ∫_E ||λ(t, x) − λ(t, x0)|| dt ≤ 2εµ(E) for ||x − x0|| ≤ δ, i.e., λ̃ is continuous. □

Lemma 9.8. Let λ, X, Y, E, U be as in the previous lemma. Suppose that Dxλ(t, x) is a continuous map of E × U into L(X, Y). Then λ̃(x) is continuously differentiable on U and
$$ D_x \tilde\lambda(x) = \int_E D_x \lambda(t, x)\,dt $$
for x ∈ U.

Proof. Let µ(x) = ∫_E Dxλ(t, x) dt. Then µ is continuous by the previous lemma. Since Dxλ(t, x) is continuous, ε > 0 implies that there exists δ > 0 such that ||Dxλ(t, x + h) − Dxλ(t, x)|| ≤ ε for all t ∈ E and ||h|| ≤ δ (by the argument in the proof of the previous lemma). It follows that
$$ \|\lambda(t, x + h) - \lambda(t, x) - D_x\lambda(t, x)h\| \le \varepsilon \|h\| $$
for all t and ||h|| ≤ δ, and hence, that
$$ \|\tilde\lambda(x + h) - \tilde\lambda(x) - \mu(x)h\| = \Big\| \int_E [\lambda(t, x + h) - \lambda(t, x) - D_x\lambda(t, x)h]\,dt \Big\| \le \int_E \|\lambda(t, x + h) - \lambda(t, x) - D_x\lambda(t, x)h\|\,dt \le \varepsilon \|h\| \mu(E), $$
and the result follows. □
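Lemma 9.8 can be checked numerically in the simplest case X = Y = R, E = [0, 1]. The sketch below is an illustration only; the integrand λ(t, x) = sin(tx), the point x = 0.7, and the helper `integrate` (composite Simpson rule) are hypothetical choices. It compares a central difference quotient of λ̃ with the integral of Dxλ.

```python
import math


def integrate(g, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0


def lam(t, x):
    return math.sin(t * x)            # hypothetical integrand lambda(t, x)


def d_lam(t, x):
    return t * math.cos(t * x)        # its partial derivative D_x lambda(t, x)


def lam_tilde(x):
    return integrate(lambda t: lam(t, x), 0.0, 1.0)


x, step = 0.7, 1e-5
finite_difference = (lam_tilde(x + step) - lam_tilde(x - step)) / (2 * step)
integral_of_derivative = integrate(lambda t: d_lam(t, x), 0.0, 1.0)
```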

Consider now the “classical” control problem for the cost and systems (9.2). Let I be an interval in R and let u0( · ) ∈ Ω. Then a map u(t; θ) : E × I → U is a (smooth) perturbation (variation) of u0 if
(i) 0 ∈ interior of I;
(ii) u(t; 0) = u0(t);
(iii) u(t; θ), Dθu(t; θ) are continuous;
(iv)
$$ x(t; \theta) = x_0 + \int_{t_0}^{t} f(x(s; \theta), u(s; \theta))\,ds \tag{9.9} $$
is defined and lim_{θ→0} ||x(t; θ) − x_{u0}(t)|| = 0 uniformly in t.

For instance, u(t; θ) = u0(t) + θh(t), h given. Then
$$ D_\theta x(t; \theta) = \int_{t_0}^{t} D_x f(x(s; \theta), u(s; \theta))[D_\theta x(s; \theta)]\,ds + \int_{t_0}^{t} D_u f(x(s; \theta), u(s; \theta))[D_\theta u(s; \theta)]\,ds. $$
Let A(s; θ) = Dxf(x(s; θ), u(s; θ)), B(s; θ) = Duf(x(s; θ), u(s; θ)) and ψ(t; θ) = Dθx(t; θ) with ψ(t0; θ) = 0 (the initial state x0 does not depend on θ). Then
$$ \frac{d}{dt}\psi(t; \theta) = A(t; \theta)\psi(t; \theta) + B(t; \theta) D_\theta u(t; \theta), $$
and this is a linear (perturbation) equation for small θ. Fix x0 so that J(u( · ), x0) = J(u( · )) (abuse of notation). For ease of exposition, set uθ( · ) = u( · ; θ), xθ( · ) = x( · ; θ), etc. Let
$$ P(\theta) = J(u_\theta) = K(x_{u_\theta}(t_1)) + \int_{t_0}^{t_1} L(x_\theta(s), u_\theta(s))\,ds. $$
Then P(θ) − P(0) = J(uθ) − J(u0) represents the change in cost due to the perturbation uθ. In view of our assumptions, DθP(θ) is defined for θ small. Define a map H : X × U × X∗ → R by
$$ H(x, u, \lambda) = L(x, u) + \lambda[f(x, u)]. \tag{9.10} $$

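H is called the Hamiltonian. Observe that
$$ P(\theta) = K(x_\theta(t_1)) + \int_{t_0}^{t_1} \Big[ H(x_\theta(s), u_\theta(s), \lambda(s)) - \lambda(s) \frac{dx_\theta(s)}{ds} \Big]\,ds $$
and that
$$ D_\theta K(x_\theta(t_1)) = \frac{\partial K}{\partial x}(x_\theta(t_1))\, D_\theta x_\theta(t_1) = \frac{\partial K}{\partial x}(x_\theta(t_1))\, \psi(t_1; \theta), $$
$$ D_\theta \Big[ \int_{t_0}^{t_1} H(x_\theta(s), u_\theta(s), \lambda(s))\,ds \Big] = \int_{t_0}^{t_1} D_x H(x_\theta(s), u_\theta(s), \lambda(s))[\psi(s; \theta)]\,ds + \int_{t_0}^{t_1} D_u H(x_\theta(s), u_\theta(s), \lambda(s))[D_\theta u_\theta(s)]\,ds, $$
and, integrating by parts, that
$$ D_\theta \Big[ -\int_{t_0}^{t_1} \lambda(s) \frac{d}{ds} x_\theta(s)\,ds \Big] = -\int_{t_0}^{t_1} \lambda(s) \frac{d}{ds} \psi(s; \theta)\,ds = \lambda(t_0)\psi(t_0; \theta) - \lambda(t_1)\psi(t_1; \theta) + \int_{t_0}^{t_1} \dot\lambda(s)\psi(s; \theta)\,ds. $$
If u0 is a minimizer for J, then DθP(θ)|_{θ=0} = 0. Suppose that λ0(s) satisfies
$$ \frac{d\lambda_0(s)}{ds} = -D_x H(x(s; 0), u(s; 0), \lambda_0(s)) = -D_x H(x_0(s), u_0(s), \lambda_0(s)), \qquad \lambda_0(t_1) = D_x K(x_0(t_1)). \tag{9.11} $$
Then, the (necessary) condition for a minimum becomes
$$ D_\theta P(\theta) \Big|_{\theta = 0} = \int_{t_0}^{t_1} D_u H(x_0(s), u_0(s), \lambda_0(s))\, D_\theta u(s; \theta) \Big|_{\theta = 0}\,ds = 0. $$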
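The formula for DθP at θ = 0 in terms of DuH holds for any nominal control u0 once λ0 satisfies (9.11) and ψ(t0; θ) = 0; at a minimizer the value is zero. As a rough numerical check (a sketch under stated assumptions, not a method from the text), take the hypothetical scalar problem f(x, u) = −x + u, L(x, u) = (x² + u²)/2, K(x) = x²/2 on [0, 1] with x0 = 1, nominal control u0 ≡ 0 and direction h ≡ 1. An explicit Euler discretization is assumed, so the adjoint value agrees with a finite difference of the cost only up to the step size. For this problem the derivative can be computed by hand as (1 − e⁻²)/2.

```python
import math

# Hypothetical scalar test problem (X = U = R), chosen only for illustration.
f = lambda x, u: -x + u
L = lambda x, u: 0.5 * (x * x + u * u)
K = lambda x: 0.5 * x * x
H_x = lambda x, u, lam: x - lam        # D_x H for H = L + lam * f
H_u = lambda x, u, lam: u + lam        # D_u H
K_x = lambda x: x


def state(u, x0, dt):
    """Explicit Euler for (9.1); u holds the control values at the grid nodes."""
    xs = [x0]
    for ui in u[:-1]:
        xs.append(xs[-1] + dt * f(xs[-1], ui))
    return xs


def trapezoid(values, dt):
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


def cost(u, x0, dt):
    xs = state(u, x0, dt)
    return K(xs[-1]) + trapezoid([L(x, ui) for x, ui in zip(xs, u)], dt)


def adjoint_derivative(u, h, x0, dt):
    """Approximates int D_u H(x0, u0, lambda0) h ds with lambda0 from (9.11)."""
    xs = state(u, x0, dt)
    lam = [0.0] * len(xs)
    lam[-1] = K_x(xs[-1])                              # terminal condition
    for i in range(len(xs) - 2, -1, -1):
        # d(lambda)/ds = -D_x H, integrated backwards in time
        lam[i] = lam[i + 1] + dt * H_x(xs[i + 1], u[i + 1], lam[i + 1])
    return trapezoid([H_u(x, ui, l) * hi for x, ui, l, hi in zip(xs, u, lam, h)], dt)


n, t0, t1, x0 = 2000, 0.0, 1.0, 1.0
dt = (t1 - t0) / n
u0 = [0.0] * (n + 1)                                   # nominal control
h = [1.0] * (n + 1)                                    # perturbation direction
theta = 1e-4
u_plus = [a + theta * b for a, b in zip(u0, h)]
u_minus = [a - theta * b for a, b in zip(u0, h)]

fd = (cost(u_plus, x0, dt) - cost(u_minus, x0, dt)) / (2 * theta)
adj = adjoint_derivative(u0, h, x0, dt)
exact = 0.5 * (1.0 - math.exp(-2.0))
```

The appeal of the adjoint form is that one backward solve for λ0 gives the derivative in every direction h at once, whereas the finite difference needs new cost evaluations for each h.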

By, for example, Lemma 9.3, DuH(x0(s), u0(s), λ0(s)) ≡ 0. Note also that
$$ \dot x_0(s) = f(x_0(s), u_0(s)) = D_\lambda H(x_0(s), u_0(s), \lambda_0(s)), $$
so that x, λ may be viewed as “adjoint” or “conjugate” variables. Note that K(x(t1)) = K(x(t0)) + ∫_{t0}^{t1} DxK(x(s))[f(x(s), u(s))] ds so that (under our assumptions) the terminal cost term can be omitted without loss of generality. If we vary the problem slightly by assuming the end points satisfy a constraint of the form g(x(t0), x(t1)) = 0 and the cost is of the form J(u) = K(xu(t0), xu(t1)) + ∫_{t0}^{t1} L(xu(s), u(s)) ds, then there is an additional multiplier µ and the adjoint (or costate) equation (9.11) becomes
$$ \frac{d}{ds}\lambda(s) = -D_x H(x_0(s), u_0(s), \lambda(s)), \qquad (-\lambda_0(t_0), \lambda_0(t_1)) = [D_x K + \mu D_x g][x(t_0), x(t_1)], $$
and again DuH ≡ 0.

Let us now, for simplicity, assume that there is no terminal cost so that
$$ P(\theta) = \int_{t_0}^{t_1} \Big[ H(x_\theta(s), u_\theta(s), \lambda(s)) - \lambda(s) \frac{d}{ds} x_\theta(s) \Big]\,ds, $$
and that H is smooth, say, C³ in x and u. Moreover, suppose that u(t; θ) = u0(t) + θh(t) for h(t) in a ball about 0. Let Q[xθ(s), uθ(s), λ(s)] be the quadratic form given by
$$ Q[x_\theta(s), u_\theta(s), \lambda(s)][\xi, \eta] = D^2_{x^2} H \big|_\theta [\xi, \xi] + 2 D^2_{xu} H \big|_\theta [\xi, \eta] + D^2_{u^2} H \big|_\theta [\eta, \eta]. $$
Then, the second derivative (or variation) is given by
$$ \frac{d^2 P(\theta)}{d\theta^2} = \int_{t_0}^{t_1} Q[x_\theta(s), u_\theta(s), \lambda(s)][\psi(s; \theta), h(s)]\,ds + \int_{t_0}^{t_1} D_x H(x_\theta(s), u_\theta(s), \lambda(s))[D_\theta \psi(s; \theta)]\,ds - \int_{t_0}^{t_1} \lambda(s) \frac{d}{ds} D_\theta \psi(s; \theta)\,ds. $$
Integrating by parts, we ultimately have
$$ \frac{d^2 P(\theta)}{d\theta^2} = \int_{t_0}^{t_1} Q[x_\theta(s), u_\theta(s), \lambda(s)][\psi(s; \theta), h(s)]\,ds $$
and the further necessary condition P″(0) ≥ 0 where
$$ P''(0) = \frac{d^2 P(\theta)}{d\theta^2} \Big|_{\theta = 0} = \int_{t_0}^{t_1} Q[x_0(s), u_0(s), \lambda_0(s)][\psi(s; 0), h(s)]\,ds. $$

Before analyzing the situation further, we point out that the conditions are merely necessary and, so far, require heavy assumptions. Let us recall Taylor’s formula [D-7].

Lemma 9.12. Let $E$ be open in $\mathbb{R}$, $f : E \to X$, $X$ a Banach space. Suppose $f \in C^p(E,X)$. Then, for any $a, \theta \in E$,
\[
f(\theta) - f(a) = (\theta-a)Df(a) + \frac{(\theta-a)^2}{2!}(D^2f)(a) + \cdots + \frac{(\theta-a)^{p-1}}{(p-1)!}(D^{p-1}f)(a) + r(a,\theta)
\]
where the remainder $r(a,\theta)$ is given by
\[
r(a,\theta) = \int_a^\theta \frac{(\theta-s)^{p-1}}{(p-1)!}\,D^pf(s)\,ds.
\]
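As a quick sanity check of Lemma 9.12, the following sketch compares both sides of Taylor's formula numerically. The choices $f = \exp$, $X = \mathbb{R}$, $p = 3$ and the Simpson quadrature are assumptions made only for this illustration; they are not taken from the text.

```python
import math

def taylor_with_remainder(a, theta, p=3, n=2000):
    """Return (lhs, rhs) of Taylor's formula for f = exp (illustrative choice).

    lhs = f(theta) - f(a)
    rhs = sum_{k=1}^{p-1} (theta-a)^k / k! * f^(k)(a) + r(a, theta)
    """
    # Every derivative of exp is exp, so D^k f(a) = exp(a).
    poly = sum((theta - a) ** k / math.factorial(k) * math.exp(a)
               for k in range(1, p))

    # Integrand of the remainder r(a, theta).
    def integrand(s):
        return (theta - s) ** (p - 1) / math.factorial(p - 1) * math.exp(s)

    # Composite Simpson rule on [a, theta]; n must be even.
    h = (theta - a) / n
    total = integrand(a) + integrand(theta)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(a + i * h)
    remainder = total * h / 3

    return math.exp(theta) - math.exp(a), poly + remainder

lhs, rhs = taylor_with_remainder(0.0, 1.0)
print(lhs, rhs)
```

The two printed numbers should agree up to quadrature error, which is the content of the lemma in this scalar case.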

Lemma 9.13 ([D-7]). Let $X$, $Y$ be Banach spaces, $O$ be open in $X$; $f \in C^p(O,Y)$. Let $x \in O$ and $h \in X$ with $x + \alpha h \in O$ for $0 \le \alpha \le 1$. Then
\[
f(x+h) - f(x) = Df(x)[h] + \frac{1}{2!}D^2f(x)[h,h] + \cdots + \frac{D^{p-1}f(x)}{(p-1)!}[h,\ldots,h] + \rho(x,h)
\]
where
\[
\rho(x,h) = \int_0^1 \frac{(1-s)^{p-1}}{(p-1)!}\,D^pf(x+sh)[h,\ldots,h]\,ds
\]
(we sometimes write $h^{(n)}$ for $[h,\ldots,h]$ with $n$ entries). If $\varepsilon > 0$, then there is a $\delta > 0$ such that if $\|h\| < \delta$, then
\[
\Big\|f(x+h) - f(x) - \sum_{j=1}^{p}\frac{1}{j!}(D^jf)(x)[h^{(j)}]\Big\| \le \varepsilon\|h\|^p
\]

(note $D^nf(x)$ is an $n$-form). Suppose, by way of illustration, that $O$ is open in $\mathbb{R}^n$, $F : O \to \mathbb{R}$ with $F \in C^2(O,\mathbb{R})$. Let
\[
P_F(\alpha) = F(x+\alpha h),\qquad 0 \le \alpha \le 1,\quad x+\alpha h \in O.
\]
Then
\[
P_F(\alpha) = P_F(0) + \alpha P_F'(0) + \alpha^2\int_0^1 (1-s)P_F''(\alpha s)\,ds \tag{9.14}
\]
where $P_F'(\alpha) = DF(x+\alpha h)[h]$, $P_F''(\alpha) = D^2F(x+\alpha h)[h,h]$. Since $\int_0^1(1-s)\,ds = 1/2$, (9.14) becomes
\[
P_F(\alpha) - P_F(0) = \alpha P_F'(0) + \frac{\alpha^2}{2}P_F''(0) + \alpha^2\int_0^1 [P_F''(\alpha s) - P_F''(0)](1-s)\,ds. \tag{9.15}
\]
Here $(D^2F)(x+\alpha h)$ is a bilinear operator and since $X = \mathbb{R}^n$, this operator can be represented by the $n \times n$ matrix $\Theta_F = [\partial^2F/\partial x_i\partial x_j]$.

Proposition 9.16. If $\tilde x$ is a local minimum of $F$ in $O$, then $DF(\tilde x) = 0$ and $\Theta_F(\tilde x) \ge 0$ (i.e., $\Theta_F(\tilde x)[h,h] \ge 0$ for all $h$).

Proof. Apply the theory of a single variable to $P_F$. □
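The necessary conditions of Proposition 9.16 can be checked numerically on a concrete function. The sketch below is only an illustration: the quadratic `F`, its minimizer $(1, -0.5)$, and the finite-difference helpers `gradient` and `hessian` are all hypothetical choices, not objects from the text.

```python
def F(x):
    """Hypothetical smooth function on R^2 with its minimum at (1, -0.5)."""
    a, b = x[0] - 1.0, x[1] + 0.5
    return a * a + a * b + 2.0 * b * b

def gradient(F, x, eps=1e-5):
    """Central-difference approximation of DF(x)."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        g.append((F(xp) - F(xm)) / (2 * eps))
    return g

def hessian(F, x, eps=1e-4):
    """Central-difference approximation of Theta_F(x) = [d^2 F / dx_i dx_j]."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                y = list(x)
                y[i] += di * eps
                y[j] += dj * eps
                return F(y)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * eps * eps)
    return H

def quad_form(H, h):
    """Evaluate Theta_F[h, h]."""
    return sum(H[i][j] * h[i] * h[j] for i in range(len(h)) for j in range(len(h)))

x_min = [1.0, -0.5]
g = gradient(F, x_min)
H = hessian(F, x_min)
directions = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0], [-2.0, 0.5]]
print(g, H, [quad_form(H, h) for h in directions])
```

At the minimizer the gradient vanishes and every sampled value of $\Theta_F[h,h]$ is nonnegative, as the proposition requires.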

Proposition 9.17. If $DF(\tilde x) = 0$ and $\Theta_F(\tilde x) > 0$, i.e., if $\Theta_F(\tilde x)[h,h] > 0$ for all $h \ne 0$, then $\tilde x$ is a strict local minimum of $F$.

Proof. Let $\gamma(\alpha)$ be given by
\[
\gamma(\alpha) = \int_0^1 [P_F''(\alpha s) - P_F''(0)](1-s)\,ds.
\]
Since $\lim_{\alpha\to0}\gamma(\alpha) = 0$, there is a $\delta > 0$ such that if $|\alpha| < \delta$, then for $\tilde x + \alpha h \in S(\tilde x,\delta)$, $|\gamma(\alpha)| < P_F''(0)/4$. Applying (9.15) and noting that $P_F'(0) = 0$, we have
\[
P_F(\alpha) - P_F(0) = \alpha^2\Big[\frac{P_F''(0)}{2} + \gamma(\alpha)\Big] \ge \alpha^2\Big[\frac{P_F''(0)}{2} - |\gamma(\alpha)|\Big].
\]
It follows that $P_F(\alpha) - P_F(0) > \alpha^2P_F''(0)/4$ for $\alpha \ne 0$ (as $|\gamma(\alpha)|$ is strictly less than $P_F''(0)/4$). In other words, $P_F(\alpha) > P_F(0)$, and we have a strict local minimum. □

Observe that if $Q$ is a quadratic form on $\mathbb{R}^n$, then $Q$ is positive definite if and only if it is coercive (exercise: prove this). However, in infinite dimensions, this is false and coerciveness is critical.

Example 9.18. Let $U \subset X$ and $J : U \to \mathbb{R}$ be $C^2$. Then
\[
P_J(\alpha) = J(u+\alpha h) = J(u) + \alpha DJ(u)[h] + \alpha^2\int_0^1 (1-s)D^2J(u+\alpha sh)[h,h]\,ds
= J(u) + \alpha P_J'(0)[h] + \alpha^2\int_0^1 (1-s)P_J''(\alpha s)[h,h]\,ds.
\]
Hence (abuse of notation)
\[
P_J(\alpha) - P_J(0) = \alpha P_J'(0)[h] + \frac{\alpha^2}{2}P_J''(0)[h,h] + \alpha^2\int_0^1 (1-s)[P_J''(\alpha s) - P_J''(0)][h,h]\,ds.
\]
Consider $\|h\| = 1$ and $S(u,\delta)$, i.e., $|\alpha| < \delta$. Suppose $P_J'(0) = 0$ and that $P_J''(0)$ is coercive, so there is $K > 0$ with $P_J''(0)[h,h] \ge K\|h\|^2$. Let $\gamma(\alpha,h)$ be given by
\[
\gamma(\alpha,h) = \int_0^1 (1-s)[P_J''(\alpha s) - P_J''(0)][h,h]\,ds.
\]
Since $\lim_{\alpha\to0}[P_J''(\alpha s) - P_J''(0)] = 0$, choose $\alpha$ sufficiently small so that $|\gamma(\alpha,h)| < K\|h\|^2/4 \le P_J''(0)[h,h]/4$. Then

\[
P_J(\alpha) - P_J(0) = \alpha^2\Big[\frac{P_J''(0)}{2}[h,h] + \gamma(\alpha,h)\Big] \ge \alpha^2\Big[\frac{P_J''(0)}{2}[h,h] - |\gamma(\alpha,h)|\Big]
\ge \alpha^2\Big[\frac{K}{2}\|h\|^2 - \frac{K}{4}\|h\|^2\Big] = \frac{\alpha^2}{4}K\|h\|^2 > 0.
\]
In other words $u$ is a local minimum. □
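The remark that positive definiteness and coerciveness part ways in infinite dimensions can be made concrete. The sketch below uses a standard textbook-style example that is an assumption of this illustration (it does not appear in the text): the quadratic form $Q[h,h] = \sum_{n \ge 1} h_n^2/n$ on $\ell^2$, evaluated on finitely supported sequences.

```python
def Q(h):
    """Q[h, h] = sum_n h_n^2 / n for a finitely supported sequence h (n starts at 1)."""
    return sum(v * v / n for n, v in enumerate(h, start=1))

def unit_vector(n, length):
    """The n-th coordinate unit vector e_n, truncated to the given length."""
    return [1.0 if k == n else 0.0 for k in range(1, length + 1)]

# Q is positive on every nonzero h, yet Q[e_n, e_n] = 1/n tends to 0 while
# ||e_n|| = 1, so no constant K > 0 gives Q[h, h] >= K ||h||^2 for all h.
values = [Q(unit_vector(n, 1000)) for n in (1, 10, 100, 1000)]
print(values)
```

In $\mathbb{R}^n$ the unit sphere is compact, so a positive definite form attains a positive minimum there; the sequence $e_n$ shows how this argument fails in $\ell^2$.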

Suppose $X_1$, $X_2$, $Y$ are Banach spaces. Then [D-7], $L(X_1 \times X_2; Y) \simeq L(X_1; L(X_2; Y))$ and so, if $Q \in L(X_1 \times X_2; Y)$ is a continuous bilinear form then, for $x_1 \in X_1$, $Q_1(x_1) : X_2 \to Y$ given by $Q_1(x_1)[x_2] = Q(x_1,x_2)$ is an element of $L(X_2; Y)$ and the map $Q \to Q_1$ is an isometry of $L(X_1 \times X_2; Y)$ onto $L(X_1; L(X_2; Y))$. Returning to the control problem, we observe that for the Hamiltonian $H(x,u,\lambda) : X \times U \times X^* \to \mathbb{R}$, $D_xH \in L(X,\mathbb{R}) = X^*$, $D_uH \in L(U,\mathbb{R}) = U^*$ and
\[
\begin{aligned}
D_x(D_xH) &= D_x^2H \in L(X \times X;\mathbb{R}), & D_u(D_uH) &= D_u^2H \in L(U \times U;\mathbb{R}),\\
D_x(D_uH) &= D_{xu}^2H \in L(X \times U;\mathbb{R}), & D_u(D_xH) &= D_{ux}^2H \in L(U \times X;\mathbb{R}).
\end{aligned} \tag{9.19}
\]
Thus the bilinear form of the “second variation” is given by (suppressing $\lambda$)
\[
Q_H(\xi,\eta) = D_x^2H[\xi,\xi] + D_{xu}^2H[\xi,\eta] + D_{ux}^2H[\eta,\xi] + D_u^2H[\eta,\eta]. \tag{9.20}
\]
We let $Q_x$, $Q_u$ be the linear maps corresponding to $D_x^2H$, $D_u^2H$, respectively. Similarly we let $Q_{xu} : X \to U^*$ and $Q_{ux} : U \to X^*$ be the linear maps which correspond to $D_{xu}^2H$ and $D_{ux}^2H$. If $X$ and $U$ are Hilbert spaces (or, perhaps, simply reflexive), then $Q_{xu} = Q_{ux}^*$ and $Q_{ux} = Q_{xu}^*$ and we have
\[
Q_H[x,u,\lambda](\xi,\eta) = \langle Q_x(x,u,\lambda)\xi,\xi\rangle + 2\langle Q_{xu}(x,u,\lambda)\xi,\eta\rangle + \langle Q_u(x,u,\lambda)\eta,\eta\rangle. \tag{9.21}
\]
In particular, if $X = \mathbb{R}^n$, $U = \mathbb{R}^m$, then $Q_x$ is an $n \times n$ symmetric matrix, $Q_u$ is an $m \times m$ symmetric matrix, and $Q_{xu}$ is an $m \times n$ matrix with $Q_{ux}$ as adjoint (i.e., transpose).

Proposition 9.22. Let $E = [a,b]$ be an interval and $A : E \to L(X,Y)$ be regulated. Then $\dot x(t) = A(t)x(t)$ has a unique solution $\varphi(t,s;x_0)$ for $(s,x_0)$ such that $\varphi(s,s;x_0) = x_0$. Moreover, (1) if $t \in E$, then $x_0 \to \varphi(t,s;x_0)$ is a surjective linear homeomorphism $\Phi(t,s)$ of $X$ into $X$; (2) $d\Phi(t,s)/dt$ is the unique solution of
\[
\dot Z(t) = A(t)Z(t),\qquad Z(s) = I
\]

in $L(X,X)$; and (3) if $r, s, t$ are in $E$, then $\Phi(r,t) = \Phi(r,s)\Phi(s,t)$, $\Phi(t,s)^{-1} = \Phi(s,t)$ [so $\Phi$ is invertible]; and the map $(s,t) \to \Phi(s,t)$ is continuous.

Corollary 9.23. If $s \to B(s)$ is a regulated map of $E \to L(U,X)$ and $s \to h(s)$ is a regulated map of $E$ into $U$, then
\[
\psi_h(t) = \Phi(t,t_0)x_0 + \int_{t_0}^{t}\Phi(t,s)B(s)h(s)\,ds \tag{9.24}
\]
is the unique solution of $\dot x(t) = A(t)x(t) + B(t)h(t)$ with $x(t_0) = x_0$.

Corollary 9.25. Suppose $x_0 = 0$. Then there is a constant $K$ such that
\[
\int_{t_0}^{t_1}\|\psi_h(s)\|^2\,ds \le K\int_{t_0}^{t_1}\|h(s)\|^2\,ds
\]
and hence
\[
\int_{t_0}^{t_1}\{\|\psi_h(s)\|^2 + \|h(s)\|^2\}\,ds \le (K+1)\int_{t_0}^{t_1}\|h(s)\|^2\,ds.
\]

Proof. Since $(t,s) \to \Phi(t,s)$ is continuous and $B(s)$ is regulated, there is an $M > 0$ such that
\[
\|\psi_h(t)\| \le M\int_{t_0}^{t}\|h(s)\|\,ds \le M\int_{t_0}^{t_1}\|h(s)\|\,ds.
\]
Then
\[
\int_{t_0}^{t_1}\|\psi_h(s)\|^2\,ds \le \int_{t_0}^{t_1}M^2\Big\{\int_{t_0}^{t_1}\|h(r)\|\,dr\Big\}^2ds
\le M^2(t_1-t_0)\Big\{\int_{t_0}^{t_1}\|h(r)\|\,dr\Big\}^2
\le M^2(t_1-t_0)^2\int_{t_0}^{t_1}\|h(s)\|^2\,ds
\]
by the Schwarz inequality. □
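The estimate of Corollary 9.25 can be observed on a scalar example. In the sketch below the system $\dot\psi = a\psi + bh$, the input $h(s) = \cos 3s$, the Euler discretization, and the function name `check_bound` are all assumptions chosen for illustration; the constant is the one produced by the proof, $K = M^2(t_1-t_0)^2$ with $M$ a bound for $|\Phi(t,s)B(s)|$.

```python
import math

def check_bound(a=0.5, b=2.0, t0=0.0, t1=1.0, n=20000):
    """Scalar instance of (9.24) with x0 = 0: psi' = a*psi + b*h, psi(t0) = 0.

    Returns (integral of psi^2, K * integral of h^2) with K = M^2 (t1 - t0)^2.
    """
    h = lambda s: math.cos(3.0 * s)       # hypothetical regulated input
    dt = (t1 - t0) / n
    psi, int_psi2, int_h2 = 0.0, 0.0, 0.0
    for i in range(n):
        s = t0 + i * dt
        int_psi2 += psi * psi * dt        # left Riemann sums of both integrals
        int_h2 += h(s) ** 2 * dt
        psi += (a * psi + b * h(s)) * dt  # explicit Euler step
    # Here Phi(t, s) = exp(a (t - s)), so |Phi(t, s) b| <= |b| exp(|a| (t1 - t0)).
    M = abs(b) * math.exp(abs(a) * (t1 - t0))
    K = M * M * (t1 - t0) ** 2
    return int_psi2, K * int_h2

lhs, rhs = check_bound()
print(lhs, rhs)
```

The bound is far from sharp in this example, which is consistent with the crude estimates used in the proof.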

So, now let $E = [t_0,t_1]$ be a fixed compact interval. Let $x_0$ be a given element of $X$ and let $L : X \times U \to \mathbb{R}$, $f : X \times U \to X$ be elements of $C_R^2$ (regulated second partials, continuous first partials). Let $\Omega = \{u : u : E \to U,\ u \text{ bounded and regulated}\}$. If $u(\cdot) \in \Omega$, set
\[
G(x_u,u) = x_u(\cdot) - x_0 - \int_{t_0}^{\cdot} f(x(s),u(s))\,ds,\qquad
J(u,x_0) = \int_{t_0}^{t_1} L(x_u(s),u(s))\,ds, \tag{9.26}
\]
where $x_u(\cdot)$ is the solution (trajectory) of $G(x_u,u) = 0$ corresponding to $u$. Let $H(x,u,\lambda) = L(x,u) + \lambda[f(x,u)]$ be the Hamiltonian. For a given $u_0(\cdot)$, let $u_h(t,\theta) = u_0(t) + \theta h(t)$ where $h(t)$ is regulated and $u_h(t,\theta) \in \Omega$. If
\[
P_H(\theta,h) = \int_{t_0}^{t_1}\big[H(x_{u_0+\theta h}(s),u_0(s)+\theta h(s),\lambda(\theta,s)) - \lambda(\theta,s)[\dot x_{u_0+\theta h}(s)]\big]\,ds, \tag{9.27}
\]
then, suppressing $x_0$ which is fixed, $J(u_h) - J(u_0) = P_H(\theta,h) - P_H(0,h)$ and we have
\[
P_H(\theta,h) - P_H(0,h) = \theta P_H'(0,h)[h] + \frac{\theta^2}{2!}P_H''(0,h)[h,h] + R(\theta,h)
\]
where
\[
R(\theta,h) = \theta^2\int_0^1 (1-s)\{P_H''(\theta s,h) - P_H''(0,h)\}[h,h]\,ds.
\]

Consider the linear differential equation
\[
\dot\psi_h(t) = A(t)\psi_h(t) + B(t)h(t),\qquad \psi_h(t_0) = 0, \tag{9.28}
\]
and the differential equation
\[
\dot\lambda(\theta,s) = -D_xH(x_{u_0+\theta h}(s),u_0(s)+\theta h(s),\lambda(\theta,s)),\qquad \lambda(\theta,t_1) = 0, \tag{9.29}
\]
where $A(t) = D_xf(x_{u_0}(t),u_0(t))$, $B(t) = D_uf(x_{u_0}(t),u_0(t))$. As in (9.19), let $Q_x$, $Q_u$ be the linear maps which correspond to $D_x^2H$, $D_u^2H$, respectively and let $Q_{xu}$, $Q_{ux}$ be the linear maps corresponding to $D_{xu}^2H$, $D_{ux}^2H$, respectively. If
\[
\begin{aligned}
a(s) &= Q_x((x_{u_0+\theta h})(s),(u_0+\theta h)(s),\lambda(\theta,s)) - Q_x(x_{u_0}(s),u_0(s),\lambda(s)),\\
b(s) &= (Q_{xu}+Q_{ux})((x_{u_0+\theta h})(s),(u_0+\theta h)(s),\lambda(\theta,s)) - (Q_{xu}+Q_{ux})(x_{u_0}(s),u_0(s),\lambda(s)),\\
c(s) &= Q_u((x_{u_0+\theta h})(s),(u_0+\theta h)(s),\lambda(\theta,s)) - Q_u(x_{u_0}(s),u_0(s),\lambda(s)),
\end{aligned} \tag{9.30}
\]
where $\lambda(s) = \lambda(0,s)$, then
\[
R(\theta,h) = \theta^2\int_0^1 (1-s)\{a(s)[\psi_h,\psi_h] + b(s)[\psi_h,h] + c(s)[h,h]\}\,ds
\]
(abuse of notation). Since $a(\cdot)$, $b(\cdot)$, $c(\cdot)$ are continuous, given $\varepsilon > 0$, there is a $\delta > 0$ such that

\[
R(\theta,h) \le \varepsilon\int_{t_0}^{t_1}\|h(s)\|^2\,ds \tag{9.31}
\]
for $\|h\| \le \delta$. Moreover, if $x_{u_0+\theta h} - x_{u_0} = \varepsilon\varphi = \varepsilon\psi_h + \varepsilon[\varphi - \psi_h]$, then, setting $z(\cdot) = \varphi(\cdot) - \psi_h(\cdot)$,
\[
\dot z = D_xf(x_{u_0},u_0)z + o(\varepsilon)/\varepsilon,\qquad z(t_0) = 0,
\]
it follows that $\theta z(t) = o(\theta)$ and, since $\lim_{\theta\to0}z(t) = 0$ uniformly in $t$, we can write
\[
P_H''(0)[h,h] = \int_{t_0}^{t_1}\{Q_u^0[h,h] + Q_{xu}^0[\psi_h,h] + Q_{ux}^0[h,\psi_h] + Q_x^0[\psi_h,\psi_h]\}\,ds \tag{9.32}
\]

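where
\[
\begin{aligned}
Q_u^0[h,h] &= Q_u(x_{u_0}(s),u_0(s),\lambda(s))[h,h], & Q_x^0[\psi_h,\psi_h] &= Q_x(x_{u_0}(s),u_0(s),\lambda(s))[\psi_h,\psi_h],\\
Q_{xu}^0[\psi_h,h] &= Q_{xu}(x_{u_0}(s),u_0(s),\lambda(s))[\psi_h,h], & Q_{ux}^0[h,\psi_h] &= Q_{ux}(x_{u_0}(s),u_0(s),\lambda(s))[h,\psi_h].
\end{aligned}
\]
In view of (9.31) and (9.30), it follows that if $(D_uH)(x_{u_0},u_0,\lambda) = 0$ and $P_H''(0)$ is coercive, then $u_0$ is a local minimum of $J$ (exercise: prove this). Observe also that if $u_0$ is a local minimum, then $P_H''(0)$ is positive (i.e., $P_H''(0)[h,h] \ge 0$ for all $h$).

What more can be done? Let us, for the moment, assume that $X$, $U$ are Hilbert spaces so that $Q_H$ has the form (9.21) and let us write (suggestively)
\[
\frac{\partial^2H}{\partial x^2}(x,u,\lambda) = Q_x(x,u,\lambda),\qquad
\frac{\partial^2H}{\partial x\partial u}(x,u,\lambda) = Q_{xu}(x,u,\lambda),\qquad
\frac{\partial^2H}{\partial u^2}(x,u,\lambda) = Q_u(x,u,\lambda).
\]
Then
\[
P_H''(0)[h,h] = \int_{t_0}^{t_1}\Big\{\Big\langle\frac{\partial^2H}{\partial x^2}\psi_h,\psi_h\Big\rangle + 2\Big\langle\frac{\partial^2H}{\partial x\partial u}\psi_h,h\Big\rangle + \Big\langle\frac{\partial^2H}{\partial u^2}h,h\Big\rangle\Big\}\,ds.
\]
Suppose that $u_0$, $x_{u_0}$, $\lambda_0$ is a local minimum and that
\[
\frac{\partial^2H}{\partial x^2}\Big|_0(s),\qquad \frac{\partial^2H}{\partial x\partial u}\Big|_0(s),\qquad \frac{\partial^2H}{\partial u^2}\Big|_0(s)
\]
are the corresponding values.

Claim 9.33. $\dfrac{\partial^2H}{\partial u^2}\Big|_0(s)$ is positive for $s \in E = [t_0,t_1]$. [This is Legendre’s necessary condition for a minimum.]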

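Proof. Suppose the claim be false. Then there is a $\xi \in (t_0,t_1)$ and an $h$ such that
\[
\Big\langle\frac{\partial^2H}{\partial u^2}\Big|_0(\xi)h,h\Big\rangle = -\gamma\|h\|^2,\qquad \gamma > 0.
\]
By continuity$^{9.1}$, there is an interval $(\xi-a,\xi+a) \subset E$ such that
\[
\Big\langle\frac{\partial^2H}{\partial u^2}\Big|_0(\sigma)h,h\Big\rangle < -\frac{\gamma}{2}\|h\|^2
\]
for $\sigma \in (\xi-a,\xi+a)$. Let $0 < \varepsilon < a$ and let $\nu(t)$ be such that $|\nu(t)| \le 1$, $|\nu'(t)| = 1/\varepsilon$ in $(\xi-\varepsilon,\xi+\varepsilon)$ and 0 elsewhere.

Fig. 9.1 A function $\nu(t)$.

Let $\varphi(s) = \nu'(s)$ and $h(s) = \varphi(s)h$. Then for this $h(s)$,
\[
\|\psi_h(t)\| \le M\int_{t_0}^{t}|\varphi(s)|\,\|h\|\,ds \le M\|h\|\int_{\xi-\varepsilon}^{\xi+\varepsilon}\frac{1}{\varepsilon}\,ds = 2M\|h\|
\]
with $M > 0$. Let $\alpha$, $\beta$ be such that
\[
\Big\|\frac{\partial^2H}{\partial x^2}\Big|_0(\sigma)\Big\| < \alpha,\qquad \Big\|\frac{\partial^2H}{\partial x\partial u}\Big|_0(\sigma)\Big\| < \beta
\]
for $\sigma \in [\xi-a,\xi+a]$. Then
\[
\int_{\xi-\varepsilon}^{\xi+\varepsilon}\Big\langle\frac{\partial^2H}{\partial x^2}\Big|_0\psi_h,\psi_h\Big\rangle\,ds \le \int_{\xi-\varepsilon}^{\xi+\varepsilon}\Big\|\frac{\partial^2H}{\partial x^2}\Big|_0(s)\Big\|\,\|\psi_h(s)\|^2\,ds \le 2\varepsilon\alpha\,4M^2\|h\|^2,
\]
\[
\int_{\xi-\varepsilon}^{\xi+\varepsilon}\Big\langle\frac{\partial^2H}{\partial x\partial u}\Big|_0\psi_h,\varphi(s)h\Big\rangle\,ds \le \int_{\xi-\varepsilon}^{\xi+\varepsilon}\Big\|\frac{\partial^2H}{\partial x\partial u}\Big|_0(s)\Big\|\,\|\psi_h(s)\|\,|\varphi(s)|\,\|h\|\,ds \le 2\varepsilon\beta\,2M\|h\|\,\frac{1}{\varepsilon}\|h\| = 4M\beta\|h\|^2,
\]
\[
\int_{\xi-\varepsilon}^{\xi+\varepsilon}\Big\langle\frac{\partial^2H}{\partial u^2}\Big|_0\varphi(s)h,\varphi(s)h\Big\rangle\,ds = \int_{\xi-\varepsilon}^{\xi+\varepsilon}\nu'(s)^2\Big\langle\frac{\partial^2H}{\partial u^2}\Big|_0h,h\Big\rangle\,ds
< \frac{1}{\varepsilon^2}\int_{\xi-\varepsilon}^{\xi+\varepsilon}\Big(-\frac{\gamma}{2}\Big)\|h\|^2\,ds \le -\frac{\gamma}{\varepsilon}\|h\|^2.
\]
It follows that
\[
P_H''(0)[h(s),h(s)] < [8M^2\alpha\varepsilon + 4M\beta - \gamma/\varepsilon]\,\|h\|^2.
\]
As $\varepsilon \to 0$, $P_H''(0)[h(s),h(s)] < 0$, a contradiction. □

$^{9.1}$ Note that although only regulated, there are only simple jump discontinuities so we leave the (strict) argument to the reader.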

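We leave the question of a Banach space version open. In the “classical” theory, an attempt was made to show that if $\partial^2H/\partial u^2$ was positive definite, then there was a minimum. This did not quite work. We shall give an indication of what can be done with the variational approach using the so-called Legendre transformation. Again suppose that $X$, $U$ are Hilbert spaces and write
\[
P_H''(0,0)[h,h] = \int_{t_0}^{t_1}(\psi_h^*,h^*)
\begin{bmatrix}\dfrac{\partial^2H}{\partial x^2} & \dfrac{\partial^2H}{\partial x\partial u}\\[2mm] \dfrac{\partial^2H}{\partial u\partial x} & \dfrac{\partial^2H}{\partial u^2}\end{bmatrix}
\begin{pmatrix}\psi_h\\ h\end{pmatrix}ds. \tag{9.34}
\]
Suppose that $\Gamma : E \to L(X,X)$ and that $\dot\Gamma$ is continuous (regulated enough). Then
\[
\int_{t_0}^{t_1}\frac{d}{ds}\langle\Gamma(s)\psi_h(s),\psi_h(s)\rangle\,ds = 0,\qquad \Gamma(t_1) = 0,
\]
so that adding this term to (9.34) does not change $P_H''(0)[h,h]$. If $\Gamma$ is symmetric, then
\[
P_H''(0,0)[h,h] = \int_{t_0}^{t_1}(\psi_h^*,h^*)
\begin{bmatrix}\dfrac{\partial^2H}{\partial x^2} + \dot\Gamma + \Gamma\dfrac{\partial f}{\partial x} + \dfrac{\partial f}{\partial x}^{*}\Gamma & \dfrac{\partial^2H}{\partial x\partial u} + \Gamma\dfrac{\partial f}{\partial u}\\[2mm] \dfrac{\partial^2H}{\partial u\partial x} + \dfrac{\partial f}{\partial u}^{*}\Gamma & \dfrac{\partial^2H}{\partial u^2}\end{bmatrix}
\begin{pmatrix}\psi_h\\ h\end{pmatrix}ds. \tag{9.35}
\]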

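Suppose that $\partial^2H/\partial u^2(x_{u_0}(s),u_0(s),\lambda(s))$ is uniformly coercive on $E$ where $u_0$ is an extremal control.$^{9.2}$ Then the “matrix” in (9.35) will be coercive if and only if the Schur complement of $\partial^2H/\partial u^2$ is coercive.

Remark 9.36. If $M = \begin{bmatrix}A & B\\ B^* & D\end{bmatrix}$ with $D$ invertible, then
\[
M = \begin{bmatrix}I & BD^{-1}\\ 0 & I\end{bmatrix}\begin{bmatrix}A - BD^{-1}B^* & 0\\ 0 & D\end{bmatrix}\begin{bmatrix}I & 0\\ D^{-1}B^* & I\end{bmatrix}.
\]
$A - BD^{-1}B^*$ is called the Schur complement of $D$. $M$ will be coercive if and only if both $D$ and $A - BD^{-1}B^*$ are coercive.

Claim 9.37. If $\dfrac{\partial^2H}{\partial u^2}$ is coercive (at $u_0$, $x_0$, $\lambda$) and if
\[
\dot\Gamma + \frac{\partial^2H}{\partial x^2} + \Gamma\frac{\partial f}{\partial x} + \frac{\partial f}{\partial x}^{*}\Gamma - \Big(\frac{\partial^2H}{\partial x\partial u} + \Gamma\frac{\partial f}{\partial u}\Big)\Big(\frac{\partial^2H}{\partial u^2}\Big)^{-1}\Big(\frac{\partial^2H}{\partial u\partial x} + \frac{\partial f}{\partial u}^{*}\Gamma\Big) = Z \tag{9.38}
\]
is coercive, then $P_H''(0)[h,h]$ is coercive and $u_0$ is a (strict) local minimum.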
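The factorization in Remark 9.36 is easy to verify in the simplest case. The sketch below assumes scalar blocks ($A = a$, $B = b$, $D = d$ real numbers), so that $M$ is a symmetric $2 \times 2$ matrix; the helper names are hypothetical and the example is only meant to illustrate the statement, not to replace the operator-theoretic argument.

```python
def matmul(P, Q):
    """Product of two small matrices given as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def schur_factors(a, b, d):
    """Factors of M = [[a, b], [b, d]] from Remark 9.36 with scalar blocks."""
    s = a - b * (1.0 / d) * b              # Schur complement A - B D^{-1} B*
    upper = [[1.0, b / d], [0.0, 1.0]]
    middle = [[s, 0.0], [0.0, d]]
    lower = [[1.0, 0.0], [b / d, 1.0]]
    return upper, middle, lower, s

def is_positive_definite_2x2(M):
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    return M[0][0] > 0 and M[0][0] * M[1][1] - M[0][1] * M[1][0] > 0

for a, b, d in [(2.0, 1.0, 3.0), (0.2, 1.0, 3.0)]:
    upper, middle, lower, s = schur_factors(a, b, d)
    product = matmul(matmul(upper, middle), lower)
    M = [[a, b], [b, d]]
    print(product, is_positive_definite_2x2(M), d > 0 and s > 0)
```

In the first case both $d$ and the Schur complement are positive and $M$ is positive definite; in the second the Schur complement is negative and $M$ is not.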

The requirements of the variational approach are much too stringent and consequently other approaches are in order.

Problems (for the reader). Find a version of Claim 9.37 where $X$, $U$ are Banach spaces. Assume reflexivity if necessary. Primarily it is a matter of careful notational bookkeeping. What can be done about finding $\Gamma$ such that $Z$ in (9.38) is coercive, even in the finite-dimensional case?

We now consider the complex of ideas leading to the Hamilton–Jacobi–Bellman equation. Consider the control problem with
\[
J(u;x_0,x_1) = \int_{t_0}^{t_1}L(x_u(s),u(s))\,ds,\qquad x_u(s) = x_0 + \int_{t_0}^{s}f(x_u(r),u(r))\,dr,\qquad x_u(t_1) = x_1.
\]
Suppose that $u_0$ gives a minimum and let $t_0 \le s_0 < s_1 \le t_1$ and $y_0 = x_u(s_0)$, $y_1 = x_u(s_1)$. Consider the problem
\[
J_1(u;y_0,y_1) = \int_{s_0}^{s_1}L(x_u(s),u(s))\,ds,\qquad x_u(s) = y_0 + \int_{s_0}^{s}f(x_u(r),u(r))\,dr,\qquad x_u(s_1) = y_1.
\]
Claim 9.39. $u_0$ on $[s_0,s_1]$ minimizes $J_1$.

$^{9.2}$ Note that a coercive self-adjoint operator on a Hilbert space is invertible (Appendix B).


Proof. If not then there exists a $v(\cdot)$ on $[s_0,s_1]$ such that $J_1(v;y_0,y_1) < J_1(u_0;y_0,y_1)$ and $\dot x_v(s) = f(x_v(s),v(s))$, $x_v(s_0) = y_0$, $x_v(s_1) = y_1$. Let $\tilde u$ be given by
\[
\tilde u = u_0 \text{ on } [t_0,s_0],\qquad \tilde u = v \text{ on } [s_0,s_1],\qquad \tilde u = u_0 \text{ on } [s_1,t_1].
\]
Then $\tilde u$ is admissible (as regulated) and $\dot x_{\tilde u}(s) = f(x_{\tilde u}(s),\tilde u(s))$ on $[t_0,t_1]$, $x_{\tilde u}(t_0) = x_0$, $x_{\tilde u}(t_1) = x_1$ (by uniqueness of the solution of differential equations), and $J(\tilde u;x_0,x_1) < J(u_0;x_0,x_1)$ contradicting the optimality of $u_0$. □

The claim can be interpreted as “segments of optimal trajectories are optimal.” This is the principle of optimality [B-2]. Let $\tau \in (t_0,t_1)$ and let $x \in X$ which is reachable from $x_0$ at $\tau$, i.e., there is a $u \in \Omega$ such that $\dot x_u(s) = f(x_u(s),u(s))$, $x_u(t_0) = x_0$, $x_u(t_1) = x_1$, and $x_u(\tau) = x$. If
\[
J[x,\tau] = J[x_u(\tau),\tau] = \int_{\tau}^{t_1}L(x_u(s),u(s))\,ds, \tag{9.40}
\]
then, differentiating with respect to $\tau$,
\[
\frac{dJ}{d\tau} = (D_xJ)[x_u(\tau),\tau][\dot x_u(\tau)] + (D_\tau J)[x_u(\tau),\tau] = -L(x_u(\tau),u(\tau)),
\]
or, equivalently,
\[
D_\tau J[x_u(\tau),\tau] + H(x_u(\tau),u(\tau),(D_xJ)[x_u(\tau),\tau]) = 0. \tag{9.41}
\]

This is the celebrated Hamilton–Jacobi–Bellman equation for the “cost to go” which holds along any admissible trajectory.

Proposition 9.42. Suppose that $V(x,t;\alpha)$ is a solution of (9.41) depending smoothly on a parameter $\alpha$ where the trajectory and control are extremal. Then $D_\alpha V$ is constant.

Proof. Along an extremal, $\dot x_u = D_\lambda H$. But
\[
\frac{d}{dt}D_\alpha V(x,t;\alpha) = D_{t\alpha}^2V + (D_{x\alpha}^2V)[\dot x_u]
\]
and, differentiating (9.41) with respect to $\alpha$, we have
\[
D_{t\alpha}^2V = -D_{x\alpha}^2V[D_\lambda H]
\]
and so, $dD_\alpha V(x,t;\alpha)/dt \equiv 0$. □

Proposition 9.43 ([C-2]). Let $\dim X = n$ so $X = \mathbb{R}^n$, and let $V(x,t;\alpha)$, $\alpha = (\alpha_1,\ldots,\alpha_n)$, be a general solution of (9.41). Suppose that the Jacobian $\partial V(x,t;\alpha)/\partial(x,\alpha)$ is (uniformly) nonsingular. Let $\beta = (\beta_1,\ldots,\beta_n)$ be an (arbitrary) element of $\mathbb{R}^n$. If $\psi_i(t;\alpha,\beta)$ and $\lambda_i$, $i = 1,\ldots,n$, are given by
\[
D_{\alpha_i}V(\psi(t),t;\alpha) = \beta_i,\qquad \lambda_i = D_{x_i}V(\psi(t),t;\alpha),\qquad \psi = (\psi_1,\ldots,\psi_n),\quad \lambda = (\lambda_1,\ldots,\lambda_n),
\]
then $\psi$, $\lambda$ are a general solution of the canonical equations
\[
\dot\psi = \frac{\partial H}{\partial\lambda},\qquad \dot\lambda = -\frac{\partial H}{\partial x}. \tag{9.44}
\]
For a proof, see [C-2].

In essence, the canonical equations (9.44) are the “characteristics” of the partial differential equation (9.41). Moreover, these equations hold along any extremal. Intuitively, if $\omega(t) = (\partial H/\partial x)\dot x + (\partial H/\partial\lambda)\dot\lambda$, then for $\psi$, $\lambda$, $\omega(t) = 0$.

Problem 9.45. Generalize Proposition 9.43 to the case where $X = H$ is a (separable) Hilbert space. Note that if $\alpha \in H$, then $D_\alpha H \in H^* = H$. What is an appropriate substitute for the Jacobian (inverse mapping theorem 3.2)? What is a suitable notion of a “general” solution? Examining this problem should provide considerable insight.

Suppose now that $E = [t_0,t_1]$ and that $\Sigma$ is a smooth dynamical system on $E$ with generator $f(x,u,t)$ (Chapter 7). Let $S$ be fixed in $X$, $O$ an open domain in $E \times X$ with $O \subset D_f$, $(t_0,x_0) \in O$, and $u \in \Omega$. Then:

Definition 9.46. $u$ is $O$-admissible relative to $(t_0,x_0)$ if $(t,x_u(t)) \in O$ for $t \ge t_0$ and $x_u(t_0) = x_0$. If $u$ is $O$-admissible relative to $(t_0,x_0)$, then $u$ $O$-transfers $(t_0,x_0)$ to $S$ if (i) $\dot x_u(s) = f(x_u(s),u(s),s)$, $x_u(t_0) = x_0$, $s \in [t_0,t_1]$; (ii) $(s,x_u(s)) \in O$, $s \in [t_0,t_1]$; and (iii) $x_u(t_1) \in S$.

Definition 9.47. A control $u_0 \in \Omega$ is $O$-optimal relative to $(t_0,x_0)$ if $u_0$ $O$-transfers $(t_0,x_0)$ to $S$ and $J(u_0;t_0,x_0) \le J(u;t_0,x_0)$ for all $u \in \Omega$ which $O$-transfer $(t_0,x_0)$ to $S$.

Assume that $J$ is given by
\[
J(u;t_0,x_0) = K(t_1,x_u(t_1)) + \int_{t_0}^{t_1}L(x_u(s),u(s),s)\,ds.
\]


We have:

Lemma 9.48 (Caratheodory–Kalman). Suppose $K(t,x) = 0$ for $(t,x) \in S$ and that there exists $\omega_0(s,x) \in U$ such that $0 = L(x,\omega_0(s,x),s) \le L(x,\omega,s)$ for all $\omega \in U$, $(s,x) \in O$. If there is a $u_0 \in \Omega$ such that $u_0$ $O$-transfers $(t_0,x_0)$ to $S$ (with $(t_1,x_{u_0}(t_1)) \in S$) and $u_0(s) = \omega_0(s,x_{u_0}(s))$ a.e. $s \in [t_0,t_1]$, then $u_0$ is $O$-optimal relative to $(t_0,x_0)$.

Proof. Note first that
\[
J(u_0;t_0,x_0) = K(t_1,x_{u_0}(t_1)) + \int_{t_0}^{t_1}L(x_{u_0}(s),\omega_0(s,x_{u_0}(s)),s)\,ds = 0.
\]
Let $u$ be any control which $O$-transfers $(t_0,x_0)$ to $S$. Then
\[
J(u;t_0,x_0) = K(t_2,x_u(t_2)) + \int_{t_0}^{t_2}L(x_u(s),u(s),s)\,ds,
\]
and since $(t_2,x_u(t_2)) \in S$, $K(t_2,x_u(t_2)) = 0$. Since $L(x_u(s),u(s),s) \ge L(x_u(s),\omega_0(s,x_u(s)),s) = 0$, we have $J(u;t_0,x_0) \ge 0 = J(u_0;t_0,x_0)$. □

Definition 9.49. Let $H(x,u,\lambda,t) = L(x,u,t) + \lambda[f(x,u,t)]$ be the Hamiltonian. $H$ is $O$-regular if, for $(t,x) \in O$ and $\lambda \in X^*$, there is an $\omega_0(t,x,\lambda) \in U$ such that
\[
H(x,\omega_0(t,x,\lambda),\lambda,t) \le H(x,\omega,\lambda,t) \tag{9.50}
\]
for all $\omega \in U$. If $H$ is $O$-regular, then
\[
D_tV(t,x) + H(x,\omega_0(t,x,D_xV(t,x)),D_xV(t,x),t) = 0 \tag{9.51}
\]
is the HJB (Hamilton–Jacobi–Bellman) equation. Observe that, for any $C^1$ functions (regulated enough) $V(t,x)$ and $x(t)$,
\[
\frac{dV(t,x(t))}{dt} = D_tV(t,x(t)) + D_xV(t,x(t))\Big[\frac{dx(t)}{dt}\Big] \tag{9.52}
\]
is the total derivative.

Theorem 9.53. Suppose that $H$ is $O$-regular and that the trajectories $x_u(t) = \varphi(t;u,t_0,x_0)$ are $C^1$ on $O$ for $u \in \Omega$. If (i) $u_0$ $O$-transfers $(t_0,x_0)$ to $S$ with $(t_1,x_{u_0}(t_1)) \in S$, (ii) there is a $C^1$ solution of the HJB equation (9.51) on $O$ with $V(t,x) = K(t,x)$ on $S$, and


(iii) $u_0(s) = \omega_0(s,x_{u_0}(s),D_xV(s,x_{u_0}(s)))$ a.e. for $s \in [t_0,t_1]$, then $u_0$ is $O$-optimal relative to $(t_0,x_0)$.

Proof. Let $\tilde K(t,x) = 0$ and let
\[
\tilde L(x,u,t) = D_tV(t,x) + H(x,u,D_xV(t,x),t).
\]
Since $D_tV$ does not depend (explicitly) on $u$ and $H$ is $O$-regular,
\[
\tilde L(x,\omega_0(t,x,D_xV(t,x)),t) \le \tilde L(x,\omega,t)
\]
for $(t,x) \in O$ and $\omega \in U$. Let
\[
\tilde J(u;t_0,x_0) = \int_{t_0}^{t_2}\tilde L(x_u(s),u(s),s)\,ds.
\]
If $u_1$ $O$-transfers $(t_0,x_0)$ to $S$ with $t_2$ as transfer time, then
\[
0 = \tilde J(u_0;t_0,x_0) = \int_{t_0}^{t_1}\tilde L(x_{u_0}(s),u_0(s),s)\,ds \le \tilde J(u_1;t_0,x_0) = \int_{t_0}^{t_2}\tilde L(x_{u_1}(s),u_1(s),s)\,ds.
\]
In view of (9.52), we have
\[
\frac{d}{dt}V(t,x_{u_0}(t)) + L(x_{u_0}(t),u_0(t),t) = \tilde L(x_{u_0}(t),u_0(t),t),\qquad
\frac{d}{dt}V(t,x_{u_1}(t)) + L(x_{u_1}(t),u_1(t),t) = \tilde L(x_{u_1}(t),u_1(t),t).
\]
Hence, by hypotheses (ii) and (iii),
\[
\begin{aligned}
0 &= K(t_1,x_{u_0}(t_1)) + \int_{t_0}^{t_1}L(x_{u_0}(s),u_0(s),s)\,ds - V(t_0,x_0) = J(u_0;t_0,x_0) - V(t_0,x_0)\\
&\le K(t_2,x_{u_1}(t_2)) + \int_{t_0}^{t_2}L(x_{u_1}(s),u_1(s),s)\,ds - V(t_0,x_0) \le J(u_1;t_0,x_0) - V(t_0,x_0),
\end{aligned}
\]
and the theorem holds. □
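Theorem 9.53 can be tried out on a scalar linear-quadratic problem for which the HJB equation has a closed-form solution. Everything in the sketch below is an assumption made for illustration: the system $\dot x = u$, the running cost $L = x^2 + u^2$, no terminal cost, the horizon $[0,1]$, and the Euler simulation. For this problem $H = x^2 + u^2 + \lambda u$ is minimized at $\omega_0 = -\lambda/2$, and $V(t,x) = p(t)x^2$ solves (9.51) when $\dot p = p^2 - 1$, $p(t_1) = 0$, i.e. $p(t) = \tanh(t_1 - t)$.

```python
import math

T0, T1 = 0.0, 1.0   # hypothetical horizon

def p(t):
    """Solution of the Riccati equation p' = p^2 - 1 with p(T1) = 0."""
    return math.tanh(T1 - t)

def V(t, x):
    """Candidate solution V(t, x) = p(t) x^2 of the HJB equation."""
    return p(t) * x * x

def hjb_residual(t, x, eps=1e-6):
    """D_t V + min_u { x^2 + u^2 + D_x V * u }, with the minimum at u = -D_x V / 2."""
    dVdt = (V(t + eps, x) - V(t - eps, x)) / (2 * eps)
    dVdx = 2 * p(t) * x
    u_star = -dVdx / 2
    return dVdt + x * x + u_star * u_star + dVdx * u_star

def cost(feedback, x0, n=20000):
    """Euler simulation of x' = u, u = feedback(t, x); returns the integral of x^2 + u^2."""
    dt = (T1 - T0) / n
    x, total = x0, 0.0
    for i in range(n):
        t = T0 + i * dt
        u = feedback(t, x)
        total += (x * x + u * u) * dt
        x += u * dt
    return total

J_opt = cost(lambda t, x: -p(t) * x, 1.0)   # feedback from hypothesis (iii)
J_zero = cost(lambda t, x: 0.0, 1.0)        # competing control u = 0
J_lin = cost(lambda t, x: -x, 1.0)          # competing control u = -x
print(hjb_residual(0.3, 1.7), J_opt, V(T0, 1.0), J_zero, J_lin)
```

The residual is zero up to finite-difference error, the cost of the feedback control matches $V(t_0,x_0)$ up to discretization error, and the two competing controls cost more, in line with the theorem and with Corollary 9.54.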

Corollary 9.54. $V(t_0,x_0) = J(u_0;t_0,x_0)$ and for $t \in [t_0,t_1]$, $V(t,x_{u_0}(t)) = J(u_0;t,x_{u_0}(t))$. (So, in a sense, the “cost” satisfies the HJB equation.)

Example 9.55 ([B-2]). Consider the smooth dynamical system $\dot x = f(x,u,t)$, $x(t_0) = x_0$ with cost functional
\[
J(u;t_0,x_0) = K(x_u(t_1),t_1) + \int_{t_0}^{t_1}L(x_u(s),u(s),s)\,ds
\]
and terminal condition $g(x_u(t_1),t_1) = 0$. Let $t \ge t_0$ be the “current” time and let $\varepsilon > 0$ be small. Then
\[
J(u;t,x) = K(x_u(t_1),t_1) + \int_{t}^{t+\varepsilon}L(x_u(s),u(s),s)\,ds + \int_{t+\varepsilon}^{t_1}L(x_u(s),u(s),s)\,ds
\]
and so
\[
J(u;t,x) = \int_{t}^{t+\varepsilon}L(x_u(s),u(s),s)\,ds + J(u;t+\varepsilon,x_u(t+\varepsilon)).
\]
Let $V(t,x) = \inf_u J(u;t,x)$. Then
\[
V(t,x) = \inf_{u_{(t,t+\varepsilon]}}\Big[\int_{t}^{t+\varepsilon}L(x_u(s),u(s),s)\,ds + V(t+\varepsilon,x_u(t+\varepsilon))\Big]
\]
(where $u_{(t,t+\varepsilon]}$ is $u$ restricted to $(t,t+\varepsilon]$). Assuming that $V$ is $C^1$ and noting that $x_u(t+\varepsilon) = x_u(t) + \varepsilon f(x_u(t),u(t),t) + o(\varepsilon^2)$, we have
\[
V(t+\varepsilon,x(t+\varepsilon)) = V(t,x) + \varepsilon D_tV + \varepsilon D_xV[f] + o(\varepsilon^2).
\]
It follows that
\[
0 = \varepsilon\big[D_tV + \inf_u(L + D_xV[f])\big] + o(\varepsilon^2).
\]
Dividing by $\varepsilon$ and letting $\varepsilon \to 0$, gives
\[
D_tV + \inf_u\{H(x,u,D_xV,t)\} = 0 \tag{9.56}
\]
with $V(t_1,x(t_1)) = K(x(t_1),t_1)$ on $g = 0$. Fix $\tau$ and $x$ so that (9.56) becomes
\[
D_tV(\tau,x) + \inf_{u(\tau)}\{L(x,u(\tau),\tau) + D_xV(\tau,x)[f(x,u(\tau),\tau)]\} = 0. \tag{9.57}
\]
Write $w = u(\tau) \in U$ and (9.56) becomes
\[
D_tV(\tau,x) + \inf_{w}\{L(x,w,\tau) + D_xV(\tau,x)[f(x,w,\tau)]\} = 0. \tag{9.58}
\]
A necessary condition for $\tilde w$ to minimize is
\[
D_wL(x,\tilde w,\tau) + D_xV(\tau,x)[D_wf(x,\tilde w,\tau)] = 0, \tag{9.59}
\]
i.e., $D_wH(x,\tilde w,\tau,D_xV(\tau,x)) = 0$. Suppose that $\dim X = \dim U = n$ and that the $n \times n$ matrix $D_wf(x,\tilde w,\tau)$ is non-singular. Then, from (9.59),

\[
\begin{aligned}
D_xV(\tau,x) &= -D_wL(x,\tilde w,\tau)[D_wf(x,\tilde w,\tau)]^{-1} = P(x,\tilde w),\\
D_tV(\tau,x) &= -\big\{L(x,\tilde w,\tau) + P(x,\tilde w)[f(x,\tilde w,\tau)]\big\} = Q(x,\tilde w).
\end{aligned} \tag{9.60}
\]
Assuming differentiability, it follows that
\[
(D_wP)(D_tw) = (D_wQ)D_xw + D_xQ \tag{9.61}
\]

is a first-order linear PDE for $w = u(t,x)$ [C-2]. Essentially the first-order PDE (HJB equation) (9.56) with appropriate boundary conditions, if solvable, provides a direct solution method for the control problem. There is a vast literature on the subject.

Exercise. Relate the results in Example 9.55 to the necessary conditions.

Example 9.62. Consider the Hamiltonian $H(x,u,\lambda,t) = L(x,u,t) + \lambda[f(x,u,t)]$ and let $E = [t_0,t_1]$. Fix a control $w : E \to U$ (i.e., $w \in \Omega$) so that $\dot x_w(s) = f(x_w(s),w(s),s)$, $x_w(t_0) = x_0$. If $t \in E$, let $x_w(t) = x$ ($t$ fixed). Let
\[
\begin{aligned}
V(t,x;w) &= \int_{t}^{t_1}L(x_w(s),w(s),s)\,ds\\
&= \int_{t}^{t+\varepsilon}L(x_w(s),w(s),s)\,ds + \int_{t+\varepsilon}^{t_1}L(x_w(s),w(s),s)\,ds\\
&= \int_{t}^{t+\varepsilon}L(x_w(s),w(s),s)\,ds + V(t+\varepsilon,x_w(t+\varepsilon);w).
\end{aligned}
\]
It follows, as in the previous example, that
\[
D_tV(t,x_w(t);w) + H(x_w(t),w(t),D_xV(t,x;w),t) = 0 \tag{9.63}
\]
for $t \in E$. In other words, under our assumptions, the HJB equation holds along any admissible trajectory. Let $F_u(t,x,z,\lambda_t,\lambda_x) = \lambda_t + H(x,u,\lambda_x,t)$. Consider the system of ODE’s

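\[
\begin{aligned}
\frac{dt}{ds} &= D_{\lambda_t}F_u = 1,\qquad \frac{dx}{ds} = D_{\lambda_x}F_u = f(x,u,t) = D_{\lambda_x}H,\\
\frac{d\lambda_t}{ds} &= -D_tF_u - (D_zF_u)\lambda_t = -D_tF_u = -D_tH \quad\text{as } D_zF_u = 0,\\
\frac{d\lambda_x}{ds} &= -D_xF_u - (D_zF_u)\lambda_x = -D_xF_u = -D_xH \quad\text{as } D_zF_u = 0,\\
\frac{dz}{ds} &= \lambda_tD_{\lambda_t}F_u + \lambda_xD_{\lambda_x}F_u = \lambda_t + \lambda_x[f] = \lambda_t + H - L.
\end{aligned} \tag{9.64}
\]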

Suppose that $V(t,x;w)$ is a solution of the HJB equation, then
\[
\begin{aligned}
\dot x(s) &= f(x(s),w(s),s) = D_{\lambda_x}H(x(s),w(s),D_xV(s,x(s);w),s),\\
\lambda_t &= D_tV(t,x;w),\qquad \lambda_x = D_xV(t,x;w),\\
\frac{dz}{ds} &= D_tV(t,x;w) + D_xV(t,x;w)[f(x,w,s)] = -L(x(s),w(s),s),
\end{aligned} \tag{9.65}
\]
are a solution of the system (9.64). The system (9.65) may be referred to as the “characteristic system” of the problem [C-2].

We now expand on the approach taken. Let $E = [t_0,t_1]$ and $L : X \times U \to \mathbb{R}$ and $f : X \times U \to X$. Set
\[
H(x,u,\lambda_0,\lambda) = \lambda_0L(x,u) + \lambda[f(x,u)] \tag{9.66}
\]
with $\lambda_0 \in \mathbb{R}^* = \mathbb{R}$ and $\lambda \in X^*$. For ease of exposition, set $y = (y_0,x)$, $g(y,u) = (L(x,u),f(x,u))$ so that $g : (\mathbb{R} \times X) \times U \to \mathbb{R} \times X$ (or $Y = \mathbb{R} \times X$ and $g : Y \times U \to Y$). Let $\nu' = (\nu_0,\nu) \in \mathbb{R} \times X$ and consider
\[
\dot y_w(t) = g(y_w(t),w(t)) = (D_{\lambda_0}H,D_\lambda H) \tag{9.67}
\]
with $y_w(t_0;\varepsilon\nu') = y_w(t_0) + \varepsilon\nu' + o(\varepsilon,\nu')$, $y_w(t_0) = (0,x_0)$. Note the following:

Remark 9.68 ([C-5]). If $s \in (t_0,t_1)$, there is an $\varepsilon > 0$ such that if $\|\xi - y_w(s)\| < \varepsilon$, then there is a unique solution $y_w(t;\xi,s)$ of (9.67) on $E$ with $y_w(s;\xi,s) = \xi$. In other words (under our standard assumption on $L$, $f$), small changes or perturbations in the initial point, or, in fact, any point still yield trajectories defined on all of $E$.

We let
\[
y_w(t;\varepsilon\nu') = y_w(t) + \varepsilon\psi_w(t) + o(\varepsilon,t) \tag{9.69}
\]
where $\lim_{\varepsilon\to0}\sup_{t\in E}\|o(\varepsilon,t)\|/\varepsilon = 0$ (uniformly in $t$). Setting $\psi_w(s) = (\varphi_0(s),\varphi(s))$ we have
\[
\dot\varphi_0(s) = D_xL(y_w(s),w(s))[\varphi(s)],\qquad \dot\varphi(s) = D_xf(y_w(s),w(s))[\varphi(s)], \tag{9.70}
\]
with $\varphi_0(t_0) = \nu_0$, $\varphi(t_0) = \nu$ or
\[
\dot\psi_w(s) = \Delta_w(s)\psi_w(s),\qquad \psi_w(t_0) = \nu' \tag{9.71}
\]

where $\Delta_w(s) : \mathbb{R} \times X \to \mathbb{R} \times X$ is a bounded linear map with the “matrix”
\[
\Delta_w(s) = \begin{bmatrix}0 & D_xL(y_w(s),w(s))\\ 0 & D_xf(y_w(s),w(s))\end{bmatrix}.
\]
The linear equation (9.71) has a unique solution for any set of initial data and therefore we let $\Phi_w(t,s)$ be the fundamental solution [C-5]. Then $\Phi_w(t,s)$ is a linear homeomorphism of $Y$. Thus $\Phi_w(t,s)$ maps a vector attached to $y_w(s)$ into a vector attached to $y_w(t)$, and therefore, a cone attached to $y_w(s)$ into a cone attached to $y_w(t)$. Consider the “adjoint” system of (9.71)
\[
\dot{\boldsymbol\lambda} = -\Delta_w^*\boldsymbol\lambda,\qquad \boldsymbol\lambda = (\lambda_0,\lambda) \tag{9.72}
\]
where $\Delta_w^*(s) : \mathbb{R}^* \times X^* \to \mathbb{R}^* \times X^*$ is the adjoint of $\Delta_w(s)$.

Remark 9.73. Let $Z$ be reflexive and $\Delta(s) \in L(Z,Z)$ for $s \in E$. If $\dot z(s) = \Delta(s)z(s)$ and $\dot\xi(s) = -\Delta^*(s)\xi(s)$, then $\xi(s)[z(s)]$ is constant on $E$. Let $F(\xi,z) = \xi[z]$ and $G(s) = F(\xi(s),z(s))$. Then
\[
dG(s)/ds = (D_\xi F)[\dot\xi] + (D_zF)[\dot z] = (D_\xi F)(-\Delta^*\xi) + (D_zF)(\Delta z) = -\Delta^*\xi[z] + \xi[\Delta z] = 0
\]
(viewing $D_\xi F$ as the element $z$ by reflexivity and by the definition of the adjoint). Exercise: what is the situation if $Z$ is not reflexive (needs careful “bookkeeping”)?

Now fix a $\lambda_1$ in $\mathbb{R} \times X^*$ and let $N_{t_1}$ be the hyperplane in $Y^* = \mathbb{R} \times X^*$ given by $\lambda_1[y - y_1] = 0$. The solution of (9.72) with $\lambda(t_1) = \lambda_1$ defines a hyperplane $N_t$ with $\lambda(t;\lambda_1)[y - y_w(t)] = 0$ where $y_w(t)$ is the solution of (9.71) with $y_w(t_1) = y_1$. Similarly given $\lambda_0$ and the hyperplane $P_0$ defined by $\lambda_0[y - y_0] = 0$, then $\lambda(t;\lambda_0)$, the solution of (9.72) with $\lambda(t_0;\lambda_0) = \lambda_0$, defines a hyperplane $P_t$ with $\lambda(t;\lambda_0)[y - y_w(t)] = 0$. These are called “comoving” hyperplanes [H-2].

Let us recall Theorem 8.49 and the necessity of separating an approximation and the ray of decreasing cost at an optimal element. The key, as was noted there, is determining the approximation from variations in the control. We shall, in a moment, determine the effects of certain special variations of a control. Let $\Omega = \{w : w : E \to U\}$ be the set of admissible controls. Then (9.71) becomes
\[
\dot y_w(t) = g(y_w(t),w(t)) = (D_{\lambda_0}H,D_\lambda H),\qquad y_w(t_0) = x_0. \tag{9.74}
\]
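Returning briefly to Remark 9.73, the invariance of the pairing $\xi(s)[z(s)]$ along solutions of a linear system and its adjoint can be observed numerically. The sketch below assumes $Z = \mathbb{R}^2$, a hypothetical time-varying matrix `delta(s)`, and a classical Runge–Kutta integrator; none of these choices come from the text.

```python
def rk4_step(f, y, t, dt):
    """One classical Runge-Kutta step for y' = f(t, y), y a list of floats."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, [yi + dt / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + dt / 2, [yi + dt / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + dt, [yi + dt * ki for yi, ki in zip(y, k3)])
    return [yi + dt / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def delta(s):
    """Hypothetical time-varying 2x2 matrix Delta(s)."""
    return [[0.0, 1.0], [-2.0 - s, -0.5]]

def rhs(s, y):
    """Joint system: z' = Delta z and xi' = -Delta^T xi, with y = z + xi."""
    z, xi = y[:2], y[2:]
    D = delta(s)
    dz = [D[0][0] * z[0] + D[0][1] * z[1],
          D[1][0] * z[0] + D[1][1] * z[1]]
    dxi = [-(D[0][0] * xi[0] + D[1][0] * xi[1]),
           -(D[0][1] * xi[0] + D[1][1] * xi[1])]
    return dz + dxi

def pairing_history(z0, xi0, t0=0.0, t1=1.0, n=1000):
    """Values of xi(s)[z(s)] = xi . z on a uniform grid of [t0, t1]."""
    y = list(z0) + list(xi0)
    dt = (t1 - t0) / n
    out = [y[2] * y[0] + y[3] * y[1]]
    for i in range(n):
        y = rk4_step(rhs, y, t0 + i * dt, dt)
        out.append(y[2] * y[0] + y[3] * y[1])
    return out

hist = pairing_history([1.0, 0.0], [0.5, -1.0])
print(hist[0], hist[-1], max(hist) - min(hist))
```

The pairing stays at its initial value up to integration error, which is the mechanism behind the comoving hyperplanes described above.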

Let $\Psi_w(t)$ be the solution (a.e. $t \in E$) of
\[
\dot\Psi_w(t) = -\Psi_w(t)\Lambda_w(t),\qquad \Psi_w(t_1) = I \tag{9.75}
\]
(this is a differential equation in $L(\mathbb{R} \times X,\mathbb{R} \times X)$). If $v \in \Omega$ and
\[
Z_{v,w}(s) = \Psi_w(s)[y_v(s) - y_w(s)], \tag{9.76}
\]
then $Z_{v,w}(\cdot)$ exists, is unique, and is continuous.

Definition 9.77. Let $A = \{y_w(t_1) : w \in \Omega\}$ be the set of attainable points or set of attainability. If $w$ is a fixed element of $\Omega$, then $A(w) = \{Z_{v,w}(t_1) : v \in \Omega\}$ is the $w$-approximate set of attainability.

Note that the first (or $y_0$) coordinate of $Z_{v,w}(t_1)$ is $J(v) - J(w)$.

Lemma 9.78. If $y_w(t_1)$ is an interior point of $A$, then there is a $v \in \Omega$ with $J(v) < J(w)$ (i.e., $w$ is not optimal).

Proof. If $(J(w),x_w(t_1))$ is an interior point, then $(J(w) - \varepsilon,x_w(t_1)) \in A$ for some $\varepsilon > 0$ and there is a $v \in \Omega$ with $(J(v),x_v(t_1)) = (J(w) - \varepsilon,x_w(t_1))$ so that $J(v) < J(w)$. □

Corollary 9.79. If $u^*$ is optimal, then $y_{u^*}(t_1)$ is a boundary point of $A$.

Corollary 9.80. If $y_w(t_1)$ is a boundary point of $A$, then 0 is a boundary point of $A(w)$.

Proof. $A(w) = A - \{y_w(t_1)\}$ since $\Psi_w(t_1) = I$. □

These corollaries are a key element in Halkin’s approach to necessary conditions.

Example 9.81 (Temporal variation). Let $\tau \in \mathbb{R}$ and let $\varepsilon > 0$ be “small.” Assume the system (9.74) is defined on $[t_0,t_1+\varepsilon\tau]$ if $\tau > 0$. Let $w \in \Omega$ and define $w[\tau](s)$ by
\[
w[\tau](s) = \begin{cases} w(s), & t_0 \le s \le t_1+\varepsilon\tau,\ \tau < 0,\\ w(s), & t_0 \le s \le t_1,\\ w(t_1), & t_1 < s \le t_1+\varepsilon\tau,\ \tau > 0.\end{cases} \tag{9.82}
\]
Then $y_w(t_1+\varepsilon\tau) = y_w(t_1) + \varepsilon v_w(\tau) + o(\varepsilon)$ where $v_w(\tau) \in \mathbb{R} \times X$ is independent of $\varepsilon$. Since $y_w(t_1+\varepsilon\tau) - y_w(t_1) = \dot y_w(t_1)\varepsilon\tau + o(\varepsilon)$, we have
\[
v_w(\tau) = g(y_w(t_1),w(t_1))\tau. \tag{9.83}
\]
Thus the map $\tau \to v_w(\tau)$ takes $\mathbb{R} \to \operatorname{span}[g(y_w(t_1),w(t_1))]$ which is a line through $y_w(t_1)$. The map is linear. If $\tau_1,\ldots,\tau_M \in \mathbb{R}$ and $\alpha_1 \ge 0,\ldots,\alpha_M \ge 0$, then $v(s) = w[\Sigma\alpha_i\tau_i](s)$ is the “temporal variation” of $w(s)$ generated by $\tau_1,\ldots,\tau_M$, $\alpha_1,\ldots,\alpha_M$. Exercise: what is $Z_{w[\tau],w}$? Is it an element of $A(w)$?

Example 9.84 (Spatial variation). Let $\varepsilon > 0$ be small. Let $\beta \in (t_0,t_1)$, $\alpha > 0$ and let $E_{\alpha,\beta} = (\beta - \varepsilon\alpha,\beta] \subset E$. Let $w(\cdot)$ be an element of $\Omega$ and let $\omega \in U$. Define $w[\omega,E_{\alpha,\beta}](s)$ by
\[
w[\omega,E_{\alpha,\beta}](s) = \begin{cases}\omega, & s \in E_{\alpha,\beta},\\ w(s), & s \notin E_{\alpha,\beta}.\end{cases} \tag{9.85}
\]

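It is assumed here that $w[\omega,E_{\alpha,\beta}](\cdot) \in \Omega$, and, for ease of exposition, let $\omega(s) = w[\omega,E_{\alpha,\beta}](s)$. Then
\[
\begin{aligned}
y_\omega(\beta) &= y_w(\beta) + \varepsilon[\dot y_\omega(\beta) - \dot y_w(\beta)]\alpha + o(\varepsilon)\\
&= y_w(\beta) + \varepsilon[g(y_\omega(\beta),\omega) - g(y_w(\beta),w(\beta))]\alpha + o(\varepsilon)\\
&= y_w(\beta) + \varepsilon\xi_\beta(\omega)\alpha + o(\varepsilon)
\end{aligned}
\]
where
\[
\xi_\beta(\omega) = (L(x_w(\beta),\omega),f(x_w(\beta),\omega)) - (L(x_w(\beta),w(\beta)),f(x_w(\beta),w(\beta))),
\]
since $L(x_\omega(\beta),\omega) = L(x_w(\beta),\omega) + o(\varepsilon)/\varepsilon$ and $f(x_\omega(\beta),\omega) = f(x_w(\beta),\omega) + o(\varepsilon)/\varepsilon$ as $L$, $f$ are differentiable with respect to $x$. Note that $\xi_\beta(\omega)$ depends only on $\beta$ and $\omega$. It follows that for $s \ge \beta$,
\[
y_\omega(s) = y_w(s) + \varepsilon\Phi_w(s,\beta)\xi_\beta(\omega)\alpha + o(\varepsilon)
\]
or
\[
y_\omega(s) = y_w(s) + \varepsilon\Psi_w^{-1}(s,\beta)\xi_\beta(\omega)\alpha + o(\varepsilon),
\]
because $\Psi_w\Phi_w$ is constant on $E$ $(= I)$ ($\Psi$ the fundamental solution of (9.75) and $\Phi$ the fundamental solution of (9.71)). Let $\gamma[\omega,E_{\alpha,\beta}] = \Phi_w(t_1,\beta)\xi_\beta(\omega)\alpha$ so that
\[
y_\omega(t_1) = y_w(t_1) + \varepsilon\gamma[\omega,E_{\alpha,\beta}] + o(\varepsilon).
\]
Viewing $\gamma[\omega,E_{\alpha,\beta}]$ as “attached” to $y_w(t_1)$, the direction is independent of $\alpha$ and so defines a ray $\vec\rho[\omega,\beta] = \{y_w(t_1) + \rho\xi_\beta(\omega) : \rho \ge 0\}$. If $E_{\alpha_1,\beta_1} = (\beta_1 - \varepsilon\alpha_1,\beta_1]$, $E_{\alpha_2,\beta_2} = (\beta_2 - \varepsilon\alpha_2,\beta_2]$ are small disjoint intervals of $E$ and $\omega_1$, $\omega_2$ are elements of $U$, let $\tilde\omega(s) = w[\omega_1,\omega_2,\alpha_1,\beta_1,\alpha_2,\beta_2](s)$ be given by
\[
\tilde\omega(s) = \begin{cases}\omega_1, & s \in E_{\alpha_1,\beta_1},\\ \omega_2, & s \in E_{\alpha_2,\beta_2},\\ w(s), & s \notin E_{\alpha_1,\beta_1} \cup E_{\alpha_2,\beta_2}\end{cases}
\]
(again $\tilde\omega(\cdot)$ assumed in $\Omega$). Then
\[
y_{\tilde\omega}(s) = y_w(s) + \varepsilon[\Phi_w(s,\beta_1)\xi_{\beta_1}(\omega_1)\alpha_1 + \Phi_w(s,\beta_2)\xi_{\beta_2}(\omega_2)\alpha_2] + o(\varepsilon)
\]
and
\[
y_{\tilde\omega}(t_1) = y_w(t_1) + \varepsilon[\gamma[\omega_1,E_{\alpha_1,\beta_1}] + \gamma[\omega_2,E_{\alpha_2,\beta_2}]] + o(\varepsilon).
\]
Thus the rays lie in a cone generated by spatial variations.

Let $\vec\rho(t_1)$ be the ray due to temporal variations and let $A_w(t_1) = \overline{\mathrm{co}}\{\gamma[\omega,E_{\alpha,\beta}]\}$ be the closed convex hull generated by (a set of) spatial variations. Let $C_w(t_1) = \vec\rho(t_1) + A_w(t_1)$ be a convex cone of attainability and then so is its closure $\overline{C}_w(t_1)$. The process can be done for any $s \in (t_0,t_1]$ giving $C_w(s)$ and $\overline{C}_w(s)$. It is easy to show that
\[
\Phi_w(t,s)C_w(s) \subset C_w(t),\qquad t \ge s,
\]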

and, in particular, $\Phi_w(t_1,s)C_w(s) \subset \overline{C}_w(t_1)$ or, equivalently, $C_w(s) \subset \Psi_w(t_1,s)C_w(t_1)$. The key is examining the elements $Z_{v,w}(s)$ of (9.76) when $v$ comes from a temporal or spatial or more general variation. We note that for $Z_{v,w}(s) = \Psi_w(s)[y_v(s) - y_w(s)]$ or, equivalently, $y_v(s) - y_w(s) = \Psi_w^{-1}(s)[Z_{v,w}(s)]$, we have
\[
\dot Z_{v,w}(s) = -\Psi_w(s)\Lambda_w(s)\Psi_w^{-1}(s)Z_{v,w}(s) + \Psi_w(s)[g(y_v(s),v(s)) - g(y_w(s),w(s))], \tag{9.86}
\]
so the dependence of $g(y_v(s),v(s)) - g(y_w(s),w(s))$ on $v(s) - w(s)$ is critical. For example, if $g(y,u)$ is Lipschitz in $u$, then a local minimum $u^*$ of $J$ is a minimum of the Hamiltonian $H[x_u^*(t),v,t,\lambda^*(t)]$ as a function of $v \in U$. [Exercise: prove this.] Note that $\|Z_{v,u^*}\|$ can be made small and $H$ is continuous.

Example 9.87. Let
\[
J(u) = \int_{t_0}^{t_1}L(x(s),u(s))\,ds,\qquad x_u(s) = x_0 + \int_{t_0}^{s}f(x_u(r),u(r))\,dr,
\]

and $H(x,u,\lambda) = L(x,u) + \lambda[f(x,u)]$. Then
\[
J(v) - J(u) = \int_{t_0}^{t_1} [H(x_v(s),v(s),\lambda(s)) - H(x_u(s),u(s),\lambda(s))]\,ds - \int_{t_0}^{t_1} \lambda(s)[\dot x_v(s) - \dot x_u(s)]\,ds.
\]
Suppose that $u^*(s)$ is an admissible control, with trajectory $x^*(s)$, such that
\[
H(x_u(s),u(s),\lambda(s)) - H(x^*(s),u^*(s),\lambda(s)) \ge -\dot\lambda(s)[x_u(s) - x^*(s)]
\]
(a.e. on $E$) for every admissible $u$. Then
\[
J(u) - J(u^*) \ge \int_{t_0}^{t_1} \{-\dot\lambda(s)[x_u(s) - x^*(s)] - \lambda(s)[\dot x_u(s) - \dot x^*(s)]\}\,ds
\]
so that
\[
J(u) - J(u^*) \ge \int_{t_0}^{t_1} -\frac{d}{ds}\bigl(\lambda(s)[x_u(s) - x^*(s)]\bigr)\,ds = \lambda(t_0)[x_u(t_0) - x^*(t_0)] - \lambda(t_1)[x_u(t_1) - x^*(t_1)].
\]
If $\lambda(t_1) = 0$, then, since $x_u(t_0) = x^*(t_0) = x_0$, we have $J(u) - J(u^*) \ge 0$ and $u^*$ is optimal.

Let $A(t) = \{(x_u(t),u(t)) : x_u, u \text{ admissible}\}$, $A_x(t) = \{u \in U : (x,u) \in A(t)\}$ (i.e. there is an admissible pair with $u(t) = u$ and $x_u(t) = x$) and $A_u(t) = \{x \in X : (x,u) \in A(t) \text{ for some } u \in U\}$. Let $H^*(x,\lambda(s)) = \min_{u \in A_x(s)} H(x,u,\lambda(s))$ (if $A_x(s)$ is compact, the minimum exists). If $u^*(s), x^*(s)$ are a trajectory with $H^*(x^*(s),\lambda(s)) = H(x^*(s),u^*(s),\lambda(s))$ and $H^*(x,\lambda(s)) - H^*(x^*(s),\lambda(s)) \ge -\dot\lambda(s)[x - x^*(s)]$ (a.e. in $s$), then $u^*$ is optimal.

Example 9.88. Suppose that $\|f(x_u(t),u(t))\| \le m(t)$ with $\int_E m(t)\,dt < \infty$ for all $u \in \Omega$. Let $A(x_0,t) = \{x_u(t) : u \in \Omega\}$ be the attainable set from $x_0$ and let $\overline{A}(x_0,t)$ be the closure of $A(x_0,t)$. We claim: $\overline{A}(x_0,t)$ is bounded and the map $t \to \overline{A}(x_0,t)$ is continuous (in the Hausdorff distance). Since $x_u(t) = x_0 + \int_{t_0}^{t} f(x_u(s),u(s))\,ds$,
\[
\|x_u(t,x_0)\| \le \|x_0\| + \int_{t_0}^{t} \|f(x_u(s),u(s))\|\,ds \le \|x_0\| + \int_E m(s)\,ds
\]

so that $A(x_0,t)$ is bounded. Suppose $P_1 \in \overline{A}(x_0,s_1)$ and $\epsilon > 0$; then there exists $u$ such that $\|x_u(s_1) - P_1\| < \epsilon/2$. Since $m(\cdot)$ is integrable and $\|x_u(s) - x_u(s_1)\| \le \bigl|\int_{s_1}^{s} m(r)\,dr\bigr|$, there is a $\delta(\epsilon)$ such that $\|x_u(s) - x_u(s_1)\| < \epsilon/2$ if $|s - s_1| < \delta(\epsilon)$. Thus $\rho(P_1, A(x_0,s)) < \epsilon$ if $|s - s_1| < \delta(\epsilon)$. Similarly, if $P_2 \in \overline{A}(x_0,s)$, then $\rho(A(x_0,s_1),P_2) < \epsilon$ for $\delta(\epsilon)$ small enough. In other words, $\rho(\overline{A}(x_0,s),\overline{A}(x_0,s_1)) < \epsilon$ if $|s - s_1| < \delta(\epsilon)$.

Let $B(x,t) = \{f(x,u(t)) : u \in U\}$. Suppose that $B(x,t)$ is convex for all $x, t$; then $A(x_0,t)$ is convex (and hence so is $\overline{A}(x_0,t)$). For let $x_u(t), x_{u_1}(t) \in A(x_0,t)$; then
\[
\alpha x_u(t) + (1-\alpha)x_{u_1}(t) = \alpha x_0 + (1-\alpha)x_0 + \int_{t_0}^{t} [\alpha f(x_u(s),u(s)) + (1-\alpha) f(x_{u_1}(s),u_1(s))]\,ds.
\]
But $f(x_u(s),u(s)), f(x_{u_1}(s),u_1(s)) \in B(x,s)$. So $\overline{A}(x_0,t)$ is a closed, bounded, convex set in $X$. If $X$ is reflexive, $\overline{A}(x_0,t)$ is weakly compact.

Suppose that $J(u)$, $u \in \Omega$, is bounded below and that $J, f$ are convex; then an optimal solution exists. For let $J^* = \inf_{u \in \Omega} J(u)$; then, by weak compactness of $\overline{A}(x_0,t)$, there exist $u_j \in \Omega$, $x_{u_j}$ such that $J(u_j) \to J^*$ and $f(x_{u_j}(\cdot),u_j(\cdot)) \to \varphi(\cdot)$ with $\varphi(\cdot)$ integrable. If $x^*(\cdot) = x_0 + \int_{t_0}^{\cdot} \varphi(s)\,ds$, then $x_{u_j}(\cdot) \to x^*(\cdot)$ uniformly. We claim $\varphi(s) \in B(x^*(s),s)$. If so, then there exists $u^*(s)$ such that $\varphi(s) = f(x^*(s),u^*(s))$ and $J^* = J(u^*)$. Suppose the claim is false. Then on some $[s_0,s_1] \subset E$, there exists $\lambda \in X^*$ such that $\lambda[\varphi(s)] > \limsup_{j\to\infty} \lambda[f(x^*(s),u_j(s))]$ for $s \in [s_0,s_1]$.


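But $\lim_{j\to\infty} x_j(s) = x^*(s)$ and $\|u_j(s) - u_k(s)\| \to 0$, so that $\lambda[\varphi(s)] > \limsup_{j\to\infty} \lambda[(J,f)(x_j(s),u_j(s))]$, a contradiction. [For illustration consider the linear, quadratic case as an exercise.]

Example 9.89. Suppose that (under the usual assumptions)
\[
x_\sigma(t) = x_0 + \int_{t_0}^{t}\!\int_U f(x_\sigma(s),u)\,\mu_\sigma(du)(s)\,ds, \qquad
v_\sigma(s,x_\sigma(s)) = \int_U f(x_\sigma(s),u)\,\mu_\sigma(du)(s),
\]
\[
J(x_\sigma,\sigma) = \int_{t_0}^{t_1}\!\int_U L(x_\sigma(s),u)\,\mu_\sigma(du)(s)\,ds, \quad\text{and}\quad
L(x_\sigma,\sigma(s)) = \int_U L(x_\sigma(s),u)\,\mu_\sigma(du)(s).
\]
Let $H(x_\sigma(s),\sigma(s),\lambda(s)) = L(x_\sigma(s),\sigma(s)) + \lambda(s)[v_\sigma(s,x_\sigma(s))]$ and note that
\[
x_\sigma(t) = x_0 + \int_{t_0}^{t} v_\sigma(s,x_\sigma(s))\,ds
\]
so that $\dot x_\sigma(t) = v_\sigma(t,x_\sigma(t))$. Then
\[
J(\sigma) - J(\tau) = \int_{t_0}^{t_1} [L(x_\sigma(t),\sigma(t)) - L(x_\tau(t),\tau(t))]\,dt
= \int_{t_0}^{t_1} [H(x_\sigma(t),\sigma(t),\lambda(t)) - H(x_\tau(t),\tau(t),\lambda(t))]\,dt + \int_{t_0}^{t_1} \lambda(t)[\dot x_\tau(t) - \dot x_\sigma(t)]\,dt
\]
and so $J(\sigma) - J(\tau) \ge 0$ for $\lambda(t_1) = 0$, as $x_\tau(t_0) = x_\sigma(t_0) = x_0$. This leads to a sufficient condition as in Example 9.87. Frequently, we take
\[
\dot\lambda(t) = -\frac{\partial H}{\partial x}(x_\sigma(t),\sigma(t),\lambda(t)), \qquad \lambda(t_1) = 0,
\]
and
\[
\frac{\partial H}{\partial x}(x_\sigma(t),\sigma(t),\lambda(t)) = \int_U \frac{\partial L}{\partial x}(x_\sigma(t),u)\,\mu_\sigma(du)(t) + \lambda(t)\int_U \frac{\partial f}{\partial x}(x_\sigma(t),u)\,\mu_\sigma(du)(t).
\]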


The reader should consider this in light of the necessary conditions and for the linear quadratic case.
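As one possible reading of the linear quadratic exercise, here is a short worked sketch in the ordinary (non-relaxed) setting of Example 9.87. The scalar data $a$, $b$, the quadratic cost, and the unconstrained control set $U = \mathbb{R}$ are assumptions chosen for illustration; they are not taken from the text.

```latex
% Assumed data: L(x,u) = (x^2 + u^2)/2, f(x,u) = a x + b u, U = R.
\begin{align*}
H(x,u,\lambda) &= \tfrac12\,(x^2 + u^2) + \lambda\,(a x + b u), \\
\dot\lambda(t) &= -\frac{\partial H}{\partial x}(x^*(t),u^*(t),\lambda(t))
               = -x^*(t) - a\,\lambda(t), \qquad \lambda(t_1) = 0, \\
0 &= \frac{\partial H}{\partial u}(x^*(t),u^*(t),\lambda(t))
   = u^*(t) + b\,\lambda(t)
   \quad\Longrightarrow\quad u^*(t) = -b\,\lambda(t).
\end{align*}
% For fixed lambda, H is jointly convex in (x,u), so the gradient inequality gives
\begin{align*}
H(x,u,\lambda) - H(x^*,u^*,\lambda)
  &\ge \frac{\partial H}{\partial x}(x^*,u^*,\lambda)\,(x - x^*)
     + \frac{\partial H}{\partial u}(x^*,u^*,\lambda)\,(u - u^*) \\
  &= -\dot\lambda\,(x - x^*).
\end{align*}
```

Under these assumptions the last line is the hypothesis of Example 9.87, so the control $u^* = -b\lambda$ obtained from the adjoint equation would be optimal. The relaxed version of Example 9.89 can be examined along the same lines by replacing $L$ and $f$ with their averages against $\mu_\sigma$.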

Chapter 10

Control Problems: An Existence Theory

© Springer Science+Business Media, LLC, part of Springer Nature 2019. P. Falb, Direct Methods in Control Problems, https://doi.org/10.1007/978-0-8176-4723-0_10

Let $E = [a,b]$ be a compact interval, let $X$ be a Banach space, let $U$ be a separable metric space, and let $P(U)$ be the (metric) space of regular, Borel, probability measures on $U$. If $u \in U$, then $\delta_u$ denotes the Dirac probability measure on $U$, i.e., $\delta_u(A) = 1$ if $u \in A$ and $\delta_u(A) = 0$ if $u \notin A$, for $A \subset U$ a measurable set. We note that $\mathrm{co}\,\{\delta_u : u \in U\}$ is dense in $P(U)$ and, in fact, $\mathrm{co}_{\mathbb{Q}}\{\delta_u : u \in U\} = \{\sum \alpha_j \delta_{u_j} : \alpha_j \in \mathbb{Q},\ \alpha_j \ge 0,\ \sum \alpha_j = 1\}$ is also dense in $P(U)$ (as $U$ is separable [P-2]).

Definition 10.1. Let $f : E \to X$. $f$ is a step-function if there is an increasing finite sequence $a = s_0 < s_1 < \cdots < s_n = b$ such that $f$ is constant on $(s_j,s_{j+1})$, $j = 0,\ldots,n-1$, i.e., $f(s) = x_j$ for $s \in (s_j,s_{j+1})$. Let $S(E,X) = \{\varphi : \varphi \text{ is a step-function}\}$. A map $f : E \to X$ is regulated if it has right and left limits at all points of $(a,b)$ and a right limit at $a$ and a left limit at $b$. Let $R(E,X) = \{\varphi : \varphi \text{ is a regulated map of } E \text{ into } X\}$. The sequence $s_0, s_1, \ldots, s_n$ is called a partition of $E$, and a rational partition of $E$ if $s_1,\ldots,s_{n-1}$ are elements of $\mathbb{Q}$ (i.e. are rational).

Clearly $S(E,X) \subset R(E,X)$ and the elements of $S(E,X)$ are bounded maps of $E$ into $X$. Let $B(E,X) = \{f : E \to X,\ f \text{ is bounded}\}$ with $\|f\| = \sup_{t\in E}\|f(t)\|$. We shall show that $R(E,X)$ is closed in $B(E,X)$ and that $S(E,X)$ is dense in $R(E,X)$.

Remark 10.2. Suppose $f \in R(E,X)$, $\tau \in E$, and $n$ is a positive integer. Then there is an open interval $(a(\tau),b(\tau)) \subset E$, and, in fact, with $a(\tau), b(\tau)$ in $\mathbb{Q}$, such that $\|f(s) - f(t)\| < 1/n$ if either both $s,t \in (a(\tau),\tau)$ or $s,t \in (\tau,b(\tau))$.

Proposition 10.3 ([D-7]). If $f$ is regulated, then $f$ is the limit in $B(E,X)$ of step-functions, i.e., $f$ is the limit of a uniformly convergent sequence of step-functions.

Proof. By the remark and the compactness of $E$, there are $\tau_i$ rational and open intervals $(a(\tau_i),b(\tau_i))$ with $a(\tau_i), b(\tau_i)$ rational which are a finite cover of $E$. Let $s_j$, $j = 0,\ldots,m$, be the (rational) partition corresponding to $a, b, \tau_i, a(\tau_i), b(\tau_i)$. Suppose $s_j \in (a(\tau_i),b(\tau_i))$; then either $s_{j+1} \in (a(\tau_i),b(\tau_i))$ or $s_{j+1} = b(\tau_i)$. This holds for $j \le m-1$. By the remark, this means that if $s,t \in (s_j,s_{j+1})$, then $\|f(s) - f(t)\| \le 1/n$. Let $\varphi_n(s_j) = f(s_j)$ and $\varphi_n(s) = f((s_j + s_{j+1})/2)$ if $s \in (s_j,s_{j+1})$. Then $\varphi_n$ is a step-function and $\|f - \varphi_n\| \le 1/n$. □

Proposition 10.4 ([D-7]). If $f \in \overline{S(E,X)}$ (closure in $B(E,X)$), then $f$ is regulated.

Proof. Let $f$ be the uniform limit of $\varphi_n \in S(E,X)$. If $\epsilon > 0$, there exists $n$ such that $\|f - \varphi_n\| < \epsilon/3$. If $\tau \in E$, then, by the remark, there is $(a(\tau),b(\tau))$ with $\|\varphi_n(s) - \varphi_n(t)\| < \epsilon/3$ if $s,t$ in $(a(\tau),\tau)$ or $s,t$ in $(\tau,b(\tau))$. But then
\[
\|f(s) - f(t)\| \le \|f(s) - \varphi_n(s) + \varphi_n(s) - \varphi_n(t) + \varphi_n(t) - f(t)\| \le \epsilon/3 + \epsilon/3 + \epsilon/3.
\]
So $f$ is regulated as $X$ is complete. □
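The construction in the proof of Proposition 10.3 (value $f(s_j)$ at each partition point, midpoint value on each open cell) can be tried numerically. The sketch below is an illustration under stated assumptions: the test function `f`, the uniform partition, and the sampling grid used to estimate the sup norm are all hypothetical choices, and the jump of `f` is placed at a partition point so that `f` is Lipschitz on every open cell.

```python
import math

def f(s):
    """Regulated (not continuous) test function: smooth part plus a unit jump at s = 0.5."""
    return math.sin(3.0 * s) + (1.0 if s >= 0.5 else 0.0)

def step_approximation(f, a, b, n):
    """Return phi_n with phi_n(s_j) = f(s_j) and phi_n = f(midpoint) on each open cell."""
    h = (b - a) / n
    knots = [a + j * h for j in range(n + 1)]

    def phi(s):
        j = min(int((s - a) / h), n - 1)      # index of the cell containing s
        if s == knots[j] or s == knots[j + 1]:
            return f(s)                        # value at a partition point
        return f(0.5 * (knots[j] + knots[j + 1]))

    return phi

def sup_error(f, phi, a, b, samples=10001):
    """Estimate sup |f - phi| on [a, b] from a finite sample grid."""
    grid = [a + (b - a) * k / (samples - 1) for k in range(samples)]
    return max(abs(f(s) - phi(s)) for s in grid)

# n is even in each case, so s = 0.5 is a partition point.
errors = {n: sup_error(f, step_approximation(f, 0.0, 1.0, n), 0.0, 1.0)
          for n in (8, 64, 512)}
for n, err in errors.items():
    print(n, err)
```

Since $\sin(3s)$ has Lipschitz constant 3, the sampled error for $n$ cells should not exceed $3/(2n)$, which shrinks as the partition is refined, in line with $\|f - \varphi_n\| \le 1/n$ in the proposition.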

Corollary 10.5. If $S_{\mathbb{Q}}(E,X) = \{\varphi : \varphi \text{ a step-function with a rational partition}\}$, then $S_{\mathbb{Q}}(E,X)$ is dense in $R(E,X)$.

Corollary 10.6. If $X$ is separable, then $R(E,X)$ is separable.

Proof. Let $\{x_j\}$ be a countable dense set in $X$. Let $\Pi = \{(s_\alpha) : (s_\alpha) \text{ a rational partition of } E\}$. Then $\Pi$ is countable and $S_{\mathbb{Q}}(E,\{x_j\})$ (step-functions with values in $\{x_j\}$) is countable and dense. □

Note that, as we have defined step-functions, the value at the points of the partition is not defined!

Proposition 10.7. $C(E,X) \subset R(E,X)$.

Proof. By the definition. Here is an alternate proof. If $\psi \in C(E,X)$, then $\psi$ is uniformly continuous and so, if $\epsilon > 0$, there is a $\delta > 0$ such that $|s - t| < \delta$ implies $\|\psi(t) - \psi(s)\| < \epsilon$. Let $a = s_0 < s_1 < \cdots < s_n = b$ be a rational partition with $|s_j - s_{j-1}| < \delta$, and set $t_j = (s_j + s_{j+1})/2$ (say) so that $s_{j-1} < t_j < s_{j+1}$. If $t \in (s_j,s_{j+1})$, then $|t - t_j| < \delta$ and $\|\psi(t) - \psi(t_j)\| < \epsilon$. Set $\psi_\alpha(t) = \sum_j \chi_{E_j}(t)\psi(t_j)$, $E_j = (s_j,s_{j+1})$. Then $\psi_\alpha \in S(E,X)$ and $\|\psi_\alpha - \psi\| < \epsilon$. [Footnote 10.1: Note that $\psi(s_j+) = \lim_{s \to s_j,\ s \ge s_j} \psi(s)$, that the left limit of $\psi_\alpha$ at $s_{j+1}$ is $\psi(t_j)$, and that $\|\psi(s_j) - \psi(t_j)\| < \epsilon$.] □

Corollary 10.8. If $X$ is separable, then $C(E,X)$ is separable.

We shall ultimately consider controls in $R(E,U)$ or in $R(E,P(U))$ or in $L_\infty(E,U)$ or in $L_\infty(E,P(U))$, which remain to be defined. Let now $(X,\rho)$ be a metric space ($\rho$ being the metric).

Definition 10.9. A map $\psi : E \to X$ is bounded if the diameter of $\psi(E)$, i.e. $\sup_{s,t \in E} \rho(\psi(t),\psi(s))$, is finite. Let $B(E,X)$ be the set of bounded functions from $E$ into $X$. Define a metric $\rho_B$ on $B(E,X)$ by
\[
\rho_B(\psi,\varphi) = \sup_{t\in E} \rho(\varphi(t),\psi(t)).
\]
A function $f : E \to X$ is a step-function if there is a partition $a = s_0 < s_1 < \cdots < s_n = b$ such that $f$ is constant on $(s_j,s_{j+1})$, $j = 0,1,\ldots,n-1$. Let $S(E,X) = \{\varphi : \varphi \text{ is a step-function}\}$. A map $f : E \to X$ is regulated if it has right and left limits at all points of $(a,b)$ and a right limit at $a$ and a left limit at $b$. [Note that since $(X,\rho)$ is a metric space, limits are defined.] Let $R(E,X)$ be the set of regulated maps of $E$ into $X$.

We shall mimic the previous development in this situation. Suppose $f$ is regulated and $\tau \in E$. Then there is an open interval $(a(\tau),b(\tau)) \subset E$ and, in fact, with $a(\tau), b(\tau)$ in $\mathbb{Q}$ such that $\rho(f(s),f(t)) \le 1/n$ if either both $s,t \in (a(\tau),\tau)$ or $s,t \in (\tau,b(\tau))$. For instance, let $f(\tau+) = x_0$; then $\rho(f(s),f(t)) \le \rho(f(s),x_0) + \rho(x_0,f(t))$ and, for $s \ge \tau$, $t \ge \tau$ and $|s - \tau| < \delta$, $|t - \tau| < \delta$, $\rho(f(t),x_0) < 1/2n$.

Proposition 10.10 ([D-7]). If $f$ is regulated, then $f$ is the limit in $B(E,X)$ of step-functions, i.e., $f$ is the limit of a uniformly convergent sequence of step-functions.

Proof. Entirely analogous to the proof of Proposition 10.3, with $\|f(s) - f(t)\| \le 1/n$ replaced by $\rho(f(s),f(t)) \le 1/n$ and $\|f - \varphi_n\| \le 1/n$ replaced by $\rho_B(f,\varphi_n) \le 1/n$. □

Similar replacements of $\|\cdot\|$ by $\rho$ as appropriate lead to the following results.

Proposition 10.11. If $f \in \overline{S(E,X)}$ (closure in $B(E,X)$), then $f$ is regulated.

Corollary 10.12. If $S_{\mathbb{Q}}(E,X) = \{\varphi : \varphi \text{ a step-function with a rational partition}\}$, then $S_{\mathbb{Q}}(E,X)$ is dense in $R(E,X)$.

Corollary 10.13. If $X$ is separable, then $R(E,X)$ is separable.

Proposition 10.14. $C(E,X) \subset R(E,X)$.

We shall primarily consider the metric space $P(U)$ with (a version of) the metric $\rho_P$, the Prokhorov metric. Let us begin with consideration of a simple situation where $E = [a,b]$, $U$ is a separable metric space with metric $\rho_U$ and $X$ is a Banach space. We consider $C(E,X)$ and $C(E,U)$ with the "norm" (better, metric) $|||u - v|||_\infty$ in $C(E,U)$ given by
\[
|||u - v|||_\infty = \sup_{t\in E} \rho_U(u(t),v(t)) = \rho_\infty(u,v). \tag{10.15}
\]

The control problem is: minimize
\[
J(u) = \int_a^b L(s,x(u)(s),u(s))\,ds \tag{10.16}
\]
subject to
\[
x(u)(t) = x_0 + \int_a^t f(s,x(u)(s),u(s))\,ds \tag{10.17}
\]
and $u(\cdot) \in C(E,U)$. We assume that $L, f$ are Lipschitz in $x, u$ (with, for simplicity, a single Lipschitz constant $L$) and other appropriate existence conditions (Chapter 7).

Lemma 10.18. The map of $C(E,U)$ into $C(E,X)$ given by (10.17) is continuous and so is the corresponding map of $C(E,U)$ into $\mathbb{R}$ given by (10.16).

Proof. Let $u(\cdot), v(\cdot)$ be elements of $C(E,U)$. Let
\[
\varphi_{k+1,u}(t) = x_0 + \int_a^t f(s,\varphi_{k,u}(s),u(s))\,ds, \qquad
\varphi_{k+1,v}(t) = x_0 + \int_a^t f(s,\varphi_{k,v}(s),v(s))\,ds,
\]
for $k = 0,1,\ldots$ with $\varphi_{0,u}(t) = x_0 = \varphi_{0,v}(t)$ [cf. Chapter 7]. Let
\[
\Delta_{k,u}(t) = \varphi_{k+1,u}(t) - \varphi_{k,u}(t), \qquad \Delta_{k,v}(t) = \varphi_{k+1,v}(t) - \varphi_{k,v}(t),
\]
for $k = 0,1,\ldots$ and let
\[
\tilde\varphi_{n,u}(t) = \varphi_{0,u}(t) + \sum_{j=0}^{n-1} \Delta_{j,u}(t), \qquad
\tilde\varphi_{n,v}(t) = \varphi_{0,v}(t) + \sum_{j=0}^{n-1} \Delta_{j,v}(t),
\]
for $n = 0,1,\ldots$. Then $\tilde\varphi_{n,u} \to x(u)$ and $\tilde\varphi_{n,v} \to x(v)$. We have
\[
\|x(u)(t) - x(v)(t)\| = \|x(u)(t) - \tilde\varphi_{n,u}(t) + \tilde\varphi_{n,u}(t) - \tilde\varphi_{n,v}(t) + \tilde\varphi_{n,v}(t) - x(v)(t)\|
\]
so that
\[
\|x(u) - x(v)\|_\infty \le \|x(u) - \tilde\varphi_{n,u}\|_\infty + \|\tilde\varphi_{n,u} - \tilde\varphi_{n,v}\|_\infty + \|x(v) - \tilde\varphi_{n,v}\|_\infty. \tag{10.19}
\]

Given $\epsilon > 0$, there exists an $n_0$ such that, if $n \ge n_0$, then $\|x(u) - \tilde\varphi_{n,u}\|_\infty < \epsilon/3$ and $\|x(v) - \tilde\varphi_{n,v}\|_\infty < \epsilon/3$ (Chapter 7). We have the following:

Claim 10.20. There exist $\delta > 0$ and $n_1 \ge n_0$ such that if $|||u - v|||_\infty < \delta$ and $n \ge \max(n_1,n_0)$, then $\|\tilde\varphi_{n,u} - \tilde\varphi_{n,v}\| < \epsilon/3$, and, hence, in view of (10.19), given $\epsilon > 0$, there exists a $\delta > 0$ such that if $|||u - v|||_\infty < \delta$, then $\|x(u) - x(v)\|_\infty < \epsilon$, i.e., the map $u \to x(u)$ is continuous.

Proof of Claim. Since $\varphi_{0,u}(t) = x_0 = \varphi_{0,v}(t)$, we have
\[
\tilde\varphi_{n,u}(t) - \tilde\varphi_{n,v}(t) = \sum_{j=0}^{n-1} [\Delta_{j,u}(t) - \Delta_{j,v}(t)]
= \sum_{j=0}^{n-1} \{\varphi_{j+1,u}(t) - \varphi_{j,u}(t) + \varphi_{j,v}(t) - \varphi_{j+1,v}(t)\}
= \sum_{j=0}^{n-1} \{[\varphi_{j+1,u}(t) - \varphi_{j+1,v}(t)] + [\varphi_{j,v}(t) - \varphi_{j,u}(t)]\}.
\]
Let $\Delta_{k,u,v}(t) = \varphi_{k+1,u}(t) - \varphi_{k+1,v}(t)$; then we need to estimate $\|\varphi_{k+1,u}(t) - \varphi_{k+1,v}(t)\|$. Since $f$ is Lipschitz, we have $\|f(s,x_1,u_1) - f(s,x_2,u_2)\| \le L\|x_1 - x_2\| + L|||u_1 - u_2|||$ (where $|||u_1 - u_2||| = \rho(u_1,u_2)$). It follows that
\[
\|\Delta_{k,u,v}(t)\| \le L\int_a^t \|\Delta_{k-1,u,v}(s)\|\,ds + L\int_a^t |||u(s) - v(s)|||\,ds
\]
and
\[
\|\Delta_{0,u,v}(t)\| = \|\varphi_{1,u}(t) - \varphi_{1,v}(t)\| \le L\int_a^t |||u(s) - v(s)|||\,ds \le L(t - t_0)\,|||u - v|||_\infty,
\]
\[
\|\Delta_{1,u,v}(t)\| \le L\int_a^t L\,|||u - v|||_\infty (s - a)\,ds + L(t - t_0)\,|||u - v|||_\infty.
\]
By induction (with $a = t_0$),
\[
\|\Delta_{k,u,v}(t)\| \le L^{k+1}|||u - v|||_\infty \frac{(t-a)^{k+1}}{(k+1)!} + L\,|||u - v|||_\infty \frac{(t-a)^{k}}{k!}.
\]
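The Picard iterates $\varphi_{k,u}$, $\varphi_{k,v}$ used in this proof are easy to compute on a grid, which gives a quick numerical check that close controls produce close states. The sketch below is an illustration under stated assumptions: the scalar right-hand side $f(s,x,u) = \sin x + u$ (Lipschitz constant 1 in both $x$ and $u$), the two controls, the left Riemann sum used for the integral, and the comparison bound $|||u - v|||_\infty (e^{L(b-a)} - 1)$ are hypothetical choices; the bound is a standard Gronwall-type estimate and is not the constant derived in the text.

```python
import math

def picard_iterates(control, x0=0.0, a=0.0, b=1.0, steps=400, iterations=25):
    """Picard iterates phi_{k+1}(t) = x0 + int_a^t f(s, phi_k(s), u(s)) ds on a uniform grid.

    The integral is replaced by a left Riemann sum, and the assumed right-hand
    side is f(s, x, u) = sin(x) + u (Lipschitz constant 1 in x and in u).
    """
    h = (b - a) / steps
    grid = [a + i * h for i in range(steps + 1)]
    u = [control(t) for t in grid]
    phi = [x0] * (steps + 1)                  # phi_0 is the constant x0
    for _ in range(iterations):
        new_phi = [x0]
        total = 0.0
        for i in range(steps):
            total += h * (math.sin(phi[i]) + u[i])
            new_phi.append(x0 + total)
        phi = new_phi
    return phi

delta = 1e-2                                  # sup-norm distance between the controls
u = lambda t: math.cos(t)
v = lambda t: math.cos(t) + delta * math.sin(5.0 * t)

x_u = picard_iterates(u)
x_v = picard_iterates(v)
gap = max(abs(p - q) for p, q in zip(x_u, x_v))
bound = delta * (math.e - 1.0)                # delta * (exp(L * (b - a)) - 1) with L = 1
print(gap, bound)
```

With the left Riemann sum, the same induction as in the proof applies to the discrete iterates, so the printed gap should stay below the printed bound and shrink proportionally as `delta` is reduced.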

If $\alpha = (t - a)$, then $\|\Delta_{k,u,v}(t)\| \le (Le^{L\alpha} + Le^{\alpha})\,|||u - v|||_\infty$ and so, for $|||u - v|||_\infty$ sufficiently small, $\|\tilde\varphi_{n,u} - \tilde\varphi_{n,v}\| < \epsilon/3$ (since, for $\beta = (b - a) \ge (t - a) = \alpha$, we have $e^{L\beta} \ge e^{L\alpha}$ and $e^{\beta} \ge e^{\alpha}$, so that $\|\Delta_{k,u,v}(t)\| \le L(e^{L\beta} + e^{\beta})\,|||u - v|||_\infty$, all $t$). □

Now, let us prove a similar result under the Caratheodory existence conditions.

Lemma 10.21. Let $E \times D \subset E \times X$ with $D$ a domain and let $(U,\rho)$ be a separable, complete metric space (not totally necessary). If $u(\cdot) : E \to U$, let $S_\gamma(u_0,\rho_\infty) = \{u(\cdot) : u : E \to U,\ \rho_\infty(u,u_0) = \sup_{t\in E}\rho(u(t),u_0(t)) < \gamma\}$ and let $\mathcal{D} = E \times D \times S_\gamma(u_0,\rho_\infty)$. Suppose that (i) $f(\cdot,x,u)$ is measurable for $(x,u) \in D \times U$; (ii) $f(t,\cdot,\cdot)$ is continuous; and (iii) there is a $\psi_f \in L_1(E,\mathbb{R})$ with $\|f(t,x,u)\| \le \psi_f(t)$ for $(t,x,u) \in E \times D \times U$. Then the map of $C(E,U)$ into $C(E,X)$ given by (10.17) is continuous (and so is the corresponding map of $C(E,U)$ into $\mathbb{R}$ given by (10.16)).

Proof (cf. [C-5]). Suppose that $u_n(\cdot) \to u_0(\cdot)$ in $C(E,U)$ and let, for $n = 0,1,\ldots$,
\[
x_n(t) = x_0 + \int_a^t f(s,x_n(s),u_n(s))\,ds.
\]
Then
\[
\|x_n(t)\| \le \|x_0\| + \int_a^t \|f(s,x_n(s),u_n(s))\|\,ds \le \|x_0\| + \int_a^t \psi_f(s)\,ds \le \|x_0\| + \|\psi_f(\cdot)\|_1
\]
and, hence, for $n = 1,2,\ldots$ the sequence is uniformly bounded. Now, let $\epsilon > 0$ and $t_1,t_2 \in E$. Then
\[
x_n(t_1) - x_n(t_2) = \int_a^{t_1} f(s,x_n(s),u_n(s))\,ds - \int_a^{t_2} f(s,x_n(s),u_n(s))\,ds.
\]
Let $M(t) = \int_a^t \psi_f(s)\,ds$ and (say) $t_2 \le t_1$. Then
\[
\|x_n(t_1) - x_n(t_2)\| \le \int_{t_2}^{t_1} \psi_f(s)\,ds \le |M(t_1) - M(t_2)|.
\]

Since $M$ is continuous and $E$ is compact, $M$ is uniformly continuous and the sequence $\{x_n(\cdot)\}$ is equicontinuous. Suppose there exists a $\tilde t \in E$ where $x_n(\tilde t) \not\to x_0(\tilde t)$. There exists a subsequence $u_{n_j} \to u_0$ such that $x_{n_j} \to \psi$ uniformly on $E$ but $\psi(\tilde t) \ne x_0(\tilde t)$ (Ascoli). But $\psi$ is continuous, $f$ is continuous, and $x_{n_j} \to \psi$ uniformly together imply that
\[
\psi(t) = x_0 + \int_a^t f(s,\psi(s),u_0(s))\,ds,
\]
i.e., $\psi$ is a solution of (10.17) corresponding to $u_0$. By uniqueness, $\psi(t) = x_0(t)$ and $\psi(\tilde t) = x_0(\tilde t)$, a contradiction. It follows that $x_n \to x_0$ pointwise and, by equicontinuity, uniformly on $E$. □

We remark that these results (modulo equivalence almost everywhere) remain valid with $u(\cdot) : E \to U$ bounded, Borel measurable with $\rho_\infty(u,v) = \operatorname{ess\,sup}_{t\in E}\rho(u(t),v(t))$, i.e., $u(\cdot) \in L_\infty(E,U)$. The proofs are essentially similar.

Now let us turn to the case of generalized controls. Let $U$ be a separable, complete metric space and $P(U) = \{\mu : \mu \text{ a regular Borel probability measure on } U\}$. We consider (among many) two metrics on $P(U)$, both of which metrize weak convergence. The first is the Prokhorov metric $\rho(\mu,\nu)$ and the second the bounded Lipschitz metric $\rho_{BL}(\mu,\nu)$ (see [P-2]), and for $\sigma : E \to P(U)$, $\tau : E \to P(U)$ we let
\[
\rho_\infty(\sigma,\tau) = \sup_{t\in E}\rho(\sigma(t),\tau(t)), \qquad
\rho_\infty^1(\sigma,\tau) = \sup_{t\in E}\rho_{BL}(\sigma(t),\tau(t)).
\]
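To make the notion of a generalized control concrete, the sketch below compares the state of the relaxed system $x_\sigma(t) = x_0 + \int\!\!\int_U f(x_\sigma(s),u)\,\mu_\sigma(du)(s)\,ds$ for the constant measure-valued control $\sigma(t) = \tfrac12\delta_{-1} + \tfrac12\delta_{+1}$ with the states of ordinary "chattering" controls that switch rapidly between $-1$ and $+1$. Everything here is an assumption chosen for illustration: the dynamics $f(x,u) = u$, the interval $[0,1]$, $x_0 = 0$, and the switching pattern. Note that $\delta_{u_n(t)}$ stays at a fixed positive distance from $\sigma(t)$ for every $t$, so the example shows convergence of the states, not convergence of the controls in $\rho_\infty$.

```python
def chattering_state(n, steps=4096):
    """State x_n of dx/dt = u_n(t), x_n(0) = 0, on [0, 1], sampled on a uniform grid.

    u_n switches between +1 and -1 on consecutive intervals of length 1/(2n).
    Because the right-hand side does not depend on x, the forward sum is exact
    at grid points as long as the switching times lie on the grid.
    """
    assert steps % (2 * n) == 0
    h = 1.0 / steps
    x = [0.0]
    for i in range(steps):
        cell = (i * 2 * n) // steps           # index of the switching interval containing t_i
        u = 1.0 if cell % 2 == 0 else -1.0
        x.append(x[-1] + h * u)
    return x

def relaxed_state(steps=4096):
    """State for the generalized control sigma(t) = (delta_{-1} + delta_{+1}) / 2.

    The averaged right-hand side is 0.5 * (-1) + 0.5 * (+1) = 0, so x_sigma is identically 0.
    """
    return [0.0] * (steps + 1)

x_sigma = relaxed_state()
gaps = {}
for n in (1, 4, 16, 64):
    x_n = chattering_state(n)
    gaps[n] = max(abs(p - q) for p, q in zip(x_n, x_sigma))
    print(n, gaps[n])
```

In this sketch the sup-norm gap between the chattering state and the relaxed state is the peak of a triangle wave, $1/(2n)$, so the ordinary trajectories approach the relaxed one uniformly as the switching frequency grows.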

We now have:

Proposition 10.22. Let $E = [a,b]$ and $\sigma \in C(E,P(U))$. Then $\sigma$ is uniformly continuous in either metric, i.e., if $\epsilon > 0$, then there exists $\delta > 0$ such that $|t - s| < \delta$ implies $\rho(\sigma(t),\sigma(s)) < \epsilon$ or $\rho_{BL}(\sigma(t),\sigma(s)) < \epsilon$.

Proposition 10.23 ([P-2]). Since $U$ is separable, $\sigma_n \to \sigma$ in $\rho$ (or $\rho_{BL}$) if and only if $\sigma_n \to \sigma$ weakly.

Proposition 10.24. If $\sigma_n(\cdot) \to \sigma_0(\cdot)$ in $C(E,P(U))$ (either $\rho_\infty$ or $\rho_\infty^1$), then $\sigma_n(t) \to \sigma_0(t)$ weakly for all $t$ and so
\[
\int_U g(s,x,u)\,\mu_{\sigma_n}(du)(s) \longrightarrow \int_U g(s,x,u)\,\mu_{\sigma_0}(du)(s)
\]
for continuous $g$ and all $s$ in $E$.

Remark 10.25. Let $g(t,x,u)$ be bounded Lipschitz in $u$ for all $t,x$, i.e., $\|g(t,x,\cdot)\|_{BL} = \|g(t,x,\cdot)\|_\infty + \|g(t,x,\cdot)\|_{Lip} \le M$ for all $t,x$. Suppose that $\sigma_n(\cdot) \to \sigma_0(\cdot)$ in $\rho_\infty^1$. Then
\[
\int_U g(s,x,u)\,\mu_{\sigma_n}(du)(s) \longrightarrow \int_U g(s,x,u)\,\mu_{\sigma_0}(du)(s)
\]
uniformly in $s$.

Proof. We have
\[
\Bigl\| M\Bigl[\int_U \frac{g(s,x,u)}{M}\,\mu_{\sigma_n}(du)(s) - \int_U \frac{g(s,x,u)}{M}\,\mu_{\sigma_0}(du)(s)\Bigr]\Bigr\| \le M\rho_{BL}(\sigma_n(s),\sigma_0(s)) \le M\rho_\infty^1(\sigma_n,\sigma_0).
\]
Since $\sigma_n \to \sigma_0$, given $\epsilon > 0$, there exists $n_0$ such that if $n \ge n_0$, then $\rho_\infty^1(\sigma_n,\sigma_0) < \epsilon/M$ and $\rho_{BL}(\sigma_n(s),\sigma_0(s)) < \epsilon/M$. Hence the convergence is uniform. □

Consider the generalized system
\[
x_\sigma(t) = x_0 + \int_a^t\!\int_U f(s,x_\sigma(s),u)\,\mu_\sigma(du)(s)\,ds \tag{10.26}
\]

where $E = [a,b]$ and $\sigma : E \to P(U)$ is a generalized control.

Corollary 10.27. If $f(t,x,u)$ is bounded Lipschitz in $u$ for all $t,x$, then the map $\sigma \to x_\sigma$ given by (10.26) of $C(E,P(U))$ into $C(E,X)$ is continuous.

Note that the corollary remains true (with appropriate modification) for $\sigma(\cdot) \in B(E,P(U))$, the space of bounded Borel measurable maps of $E$ into $P(U)$.

Definition 10.28. Let $\sigma(\cdot), \tau(\cdot)$ be elements of $C(E,P(U))$ with metric $\rho_\infty(\sigma,\tau)$ and let $f : E \times D \times U \to X$. $f$ is generalized Lipschitz if there exists an $L$ (the same for simplicity) such that
\[
\Bigl\|\int_U f(s,x_1,u)\,\mu_\sigma(du)(s) - \int_U f(s,x_2,v)\,\mu_\tau(dv)(s)\Bigr\| \le L\{\|x_1 - x_2\| + \rho(\sigma(s),\tau(s))\} \le L\{\|x_1 - x_2\| + \rho_\infty(\sigma,\tau)\}
\]
for (almost) all $s \in E$. [Note that this also applies if $\sigma(\cdot) \in B(E,P(U))$.]

Proposition 10.29 (cf. Lemma 10.18). If $f$ is generalized Lipschitz, then the map $\sigma(\cdot) \to x_\sigma(\cdot)$ given by (10.26) of $C(E,P(U))$ into $C(E,X)$ is continuous.

Proof. The proof is analogous to the proof of Lemma 10.18. Let
\[
\varphi_{k+1,\sigma}(t) = x_0 + \int_a^t\!\int_U f(s,\varphi_{k,\sigma},u)\,\mu_\sigma(du)(s)\,ds, \qquad
\varphi_{k+1,\tau}(t) = x_0 + \int_a^t\!\int_U f(s,\varphi_{k,\tau},u)\,\mu_\tau(du)(s)\,ds,
\]

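for $k = 0,1,\ldots$ with $\varphi_{0,\sigma}(t) = x_0 = \varphi_{0,\tau}(t)$. Let
\[
\Delta_{k,\sigma}(\cdot) = \varphi_{k+1,\sigma}(\cdot) - \varphi_{k,\sigma}(\cdot), \qquad \Delta_{k,\tau}(\cdot) = \varphi_{k+1,\tau}(\cdot) - \varphi_{k,\tau}(\cdot)
\]
and
\[
\tilde\varphi_{n,\sigma}(\cdot) = \varphi_{0,\sigma}(\cdot) + \sum_{j=0}^{n-1}\Delta_{j,\sigma}(\cdot), \qquad
\tilde\varphi_{n,\tau}(\cdot) = \varphi_{0,\tau}(\cdot) + \sum_{j=0}^{n-1}\Delta_{j,\tau}(\cdot).
\]
Then
\[
\|x_\sigma(t) - x_\tau(t)\| \le \|x_\sigma(t) - \tilde\varphi_{n,\sigma}(t)\| + \|\tilde\varphi_{n,\sigma}(t) - \tilde\varphi_{n,\tau}(t)\| + \|\tilde\varphi_{n,\tau}(t) - x_\tau(t)\|,
\]
and, if $\epsilon > 0$, then (by existence results) there exists $n_0$ such that if $n \ge n_0$, then $\|x_\sigma(t) - \tilde\varphi_{n,\sigma}(t)\| < \epsilon/3$ and $\|x_\tau(t) - \tilde\varphi_{n,\tau}(t)\| < \epsilon/3$, for all $t$. So we need to estimate $\|\tilde\varphi_{n,\sigma}(\cdot) - \tilde\varphi_{n,\tau}(\cdot)\|$.

Claim. There exist $\delta > 0$ and $n_1 \ge n_0$ such that if $\rho_\infty(\sigma,\tau) < \delta$ and $n \ge n_1$, then $\|\tilde\varphi_{n,\sigma}(\cdot) - \tilde\varphi_{n,\tau}(\cdot)\| < \epsilon/3$, and, consequently, the proposition follows.

Let (as in Lemma 10.18) $\Delta_{k,\sigma,\tau}(t) = \varphi_{k+1,\sigma}(t) - \varphi_{k+1,\tau}(t)$ (and note that $\tilde\varphi_{n,\sigma}(\cdot) - \tilde\varphi_{n,\tau}(\cdot) = \sum_{j=0}^{n-1}\{\Delta_{j,\sigma,\tau} + [\varphi_{j,\tau} - \varphi_{j,\sigma}]\}$). But
\[
\|\Delta_{k,\sigma,\tau}(t)\| = \|\varphi_{k+1,\sigma}(t) - \varphi_{k+1,\tau}(t)\|
\le \int_a^t \Bigl\|\int_U f(s,\varphi_{k,\sigma}(s),u)\,\mu_\sigma(du)(s) - \int_U f(s,\varphi_{k,\tau}(s),v)\,\mu_\tau(dv)(s)\Bigr\|\,ds
\le L\int_a^t \bigl[\|\Delta_{k-1,\sigma,\tau}(s)\| + \rho_\infty(\sigma,\tau)\bigr]\,ds
\]
and $\|\Delta_{0,\sigma,\tau}(t)\| \le L\int_a^t \rho_\infty(\sigma,\tau)\,ds$. So, just as in Lemma 10.18, we get an exponential bound and the claim is established. □

We next do the generalized version of Lemma 10.21 with the Caratheodory conditions.

Proposition 10.30. Suppose $U$ is a separable metric space and $P(U)$ is the space of probability measures with metric $\rho$. Let $\sigma_0(\cdot) : E \to P(U)$ be bounded and measurable and let $S_\gamma(\sigma_0) = \{\sigma(\cdot) : \sigma \text{ a bounded measurable map of } E \text{ into } P(U),\ \rho_\infty(\sigma,\sigma_0) < \gamma\}$ where $\rho_\infty(\sigma,\tau) = \sup_{t\in E}\rho(\sigma(t),\tau(t))$. Let $\mathcal{D} = E \times D \times S_\gamma(\sigma_0)$ with $D$ a domain in $X$. Suppose that (i) $f(\cdot,x,u)$ is measurable for $(x,u) \in D \times U$; (ii) $f(t,\cdot,\cdot)$ is continuous for all $t \in E$; (iii) there exists $\psi_f \in L_1(E,\mathbb{R})$ with $\|f(t,x,u)\| \le \psi_f(t)$ for $(t,x,u)$ in $E \times D \times U$. Then the map of $C(E,P(U)) \to C(E,X)$ given by
\[
x_\sigma(t) = x_0 + \int_a^t\!\int_U f(s,x_\sigma(s),u)\,\mu_\sigma(du)(s)\,ds
\]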

is continuous.

Proof (cf. Lemma 10.21). Let $\sigma_n(\cdot) \to \sigma_0(\cdot)$ in $C(E,P(U))$ and let
\[
x_n(t) = x_0 + \int_a^t\!\int_U f(s,x_n(s),u)\,\mu_{\sigma_n}(du)(s)\,ds.
\]
Then, letting $M(t) = \int_a^t \psi_f(s)\,ds$, we have
\[
\|x_n(t)\| \le \|x_0\| + \int_a^t\!\int_U \|f(s,x_n(s),u)\|\,\mu_{\sigma_n}(du)(s)\,ds \le \|x_0\| + \int_a^t \psi_f(s)\,ds \le \|x_0\| + \|\psi_f\|_1.
\]
So $x_n(\cdot)$ is a uniformly bounded sequence. Let $\epsilon > 0$, $t_1,t_2 \in E$ with (say) $t_2 \le t_1$. Then
\[
\|x_n(t_1) - x_n(t_2)\| \le \int_{t_2}^{t_1}\!\int_U \|f(s,x_n(s),u)\|\,\mu_{\sigma_n}(du)(s)\,ds \le \int_{t_2}^{t_1}\psi_f(s)\,ds \le |M(t_1) - M(t_2)|
\]
(as $\int_U \|f(s,x_n(s),u)\|\,\mu_{\sigma_n}(du)(s) \le \int_U \psi_f(s)\,\mu_{\sigma_n}(du)(s) = \psi_f(s)$, since $\sigma_n(s)$ is a probability measure). Since $E$ is compact and $M$ is continuous, $x_n(\cdot)$ is uniformly bounded and equicontinuous. The result follows exactly as in Lemma 10.21. □

Let us now consider an alternate approach (cf. [Y-1, W-2]) by viewing the equation
\[
x_\sigma(t) = x_0 + \int_a^t\!\int_U f(s,x_\sigma(s),u)\,\mu_\sigma(du)(s)\,ds \tag{10.31}
\]

in a different light. Suppose that $f(s,x,\cdot)$ is, as a function of $u$, an element of $C(U,X)$. Observe that if $c(\cdot) \in C(U,X)$ and $\mu_\sigma \in P(U)$, then the map $\lambda_\sigma : C(U,X) \to X$ given by
\[
\lambda_\sigma[c(u)] = \int_U c(u)\,\mu_\sigma(du)
\]
is a bounded linear map. Thus, we can write (10.31) in the form
\[
x_\sigma(t) = x_0 + \int_a^t \lambda_\sigma(s)[f(s,x_\sigma(s),u)]\,ds. \tag{10.32}
\]
Note also that if $x^* \in X^*$, then
\[
x^*[x_\sigma(t)] = x^*[x_0] + \int_a^t [x^* \circ \lambda_\sigma(s)][f(s,x_\sigma(s),u)]\,ds \tag{10.33}
\]
and, conversely, if (10.33) holds for $x^*$ in a total family, then (10.32) holds. Let $Z = C(U,X)$. Then $x^* \circ \lambda_\sigma \in Z^* = C(U,X)^*$ and weak convergence with respect to $P(U)$ corresponds to $Z$-convergence in $Z^*$ (i.e., weak-*-convergence). Since weak-*-convergence is "nice," we can see the attraction of this approach.

Definition 10.34. Let $f : E \times D \times U \to X$ and let $\lambda_\sigma(\cdot), \lambda_\tau(\cdot) \in L_1(E,\mathcal{L}(C(U,X),X))$. Then $f$ is generalized Lipschitz if
\[
\|\lambda_\sigma(s)[f(s,x_1,u)] - \lambda_\tau(s)[f(s,x_2,u)]\| \le L\|x_1 - x_2\| + L\|\lambda_\sigma(s) - \lambda_\tau(s)\|
\]
for almost all $s \in E$ (with the same $L$ for simplicity).

Proposition 10.35. Suppose that $f$ is generalized Lipschitz. Let $P(E,U) = \{\lambda_\sigma(\cdot) \in L_1(E,\mathcal{L}(C(U,X),X)) : \sigma(\cdot) \in C(E,P(U))\}$. Then the map of $P(E,U) \to C(E,X)$ given by (10.32) is continuous (using the $L_1$-norm in $P(E,U)$).

Proof. We have
\[
x(t) = x_0 + \int_a^t \lambda_\sigma(s)[f(s,x(s),u)]\,ds, \qquad
y(t) = x_0 + \int_a^t \lambda_\tau(s)[f(s,y(s),u)]\,ds.
\]
As previously, let

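\[
\varphi_{k+1,\sigma}(t) = x_0 + \int_a^t \lambda_\sigma(s)[f(s,\varphi_{k,\sigma}(s),u)]\,ds, \qquad
\varphi_{k+1,\tau}(t) = x_0 + \int_a^t \lambda_\tau(s)[f(s,\varphi_{k,\tau}(s),u)]\,ds
\]
for $k = 0,1,\ldots$, and $\varphi_{0,\sigma}(t) = x_0 = \varphi_{0,\tau}(t)$. Let $\Delta_{k,\sigma} = \varphi_{k+1,\sigma} - \varphi_{k,\sigma}$, $\Delta_{k,\tau} = \varphi_{k+1,\tau} - \varphi_{k,\tau}$,
\[
\tilde\varphi_{n,\sigma} = \varphi_{0,\sigma} + \sum_{j=0}^{n-1}\Delta_{j,\sigma}, \qquad
\tilde\varphi_{n,\tau} = \varphi_{0,\tau} + \sum_{j=0}^{n-1}\Delta_{j,\tau}.
\]
Then
\[
\|x(t) - y(t)\| \le \|x(t) - \tilde\varphi_{n,\sigma}(t)\| + \|\tilde\varphi_{n,\sigma}(t) - \tilde\varphi_{n,\tau}(t)\| + \|\tilde\varphi_{n,\tau}(t) - y(t)\|.
\]
Given $\epsilon > 0$, by the standard existence theory for the differential equations, there exists $n_0$ such that if $n \ge n_0$, then
\[
\|x(t) - \tilde\varphi_{n,\sigma}(t)\| < \epsilon/3, \qquad \|y(t) - \tilde\varphi_{n,\tau}(t)\| < \epsilon/3
\]
(for all $t$). So we want to estimate $\|\tilde\varphi_{n,\sigma}(t) - \tilde\varphi_{n,\tau}(t)\|$. Let $\Delta_{k,\sigma,\tau}(t) = \varphi_{k+1,\sigma}(t) - \varphi_{k+1,\tau}(t)$ so that
\[
\tilde\varphi_{n,\sigma}(t) - \tilde\varphi_{n,\tau}(t) = \sum_{j=0}^{n-1}[\Delta_{j,\sigma}(t) - \Delta_{j,\tau}(t)] = \sum_{j=0}^{n-1}[\Delta_{j,\sigma,\tau}(t) - \Delta_{j-1,\sigma,\tau}(t)],
\]
so we need to estimate $\|\varphi_{k+1,\sigma}(t) - \varphi_{k+1,\tau}(t)\|$. Since $f$ is generalized Lipschitz and since
\[
\lambda_\sigma(s)[f(s,\varphi_{k,\sigma}(s),u)] - \lambda_\tau(s)[f(s,\varphi_{k,\tau}(s),u)] = \lambda_\sigma(s)[f(s,\varphi_{k,\sigma}(s),u) - f(s,\varphi_{k,\tau}(s),u)] + [\lambda_\sigma(s) - \lambda_\tau(s)][f(s,\varphi_{k,\tau}(s),u)],
\]
we have
\[
\|\Delta_{k,\sigma,\tau}(s)\| \le \|\lambda_\sigma(s)\|\,\|\Delta_{k-1,\sigma,\tau}(s)\| + \|\lambda_\sigma(s) - \lambda_\tau(s)\|\,\|f(s,\varphi_{k,\tau}(s),\cdot)\|.
\]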

It follows, since $\|\Delta_{0,\sigma,\tau}(t)\| \le L\int_a^t \|\lambda_\sigma(s) - \lambda_\tau(s)\|\,ds \le \|\lambda_\sigma(\cdot) - \lambda_\tau(\cdot)\|_1$, that
\[
\|\Delta_{k,\sigma,\tau}(s)\| \le L\int_a^t \|\lambda_\sigma(s)\| \cdot \|\Delta_{k-1,\sigma,\tau}(s)\|\,ds + \int_a^t L\|\lambda_\sigma(s) - \lambda_\tau(s)\|\frac{s^{k-1}}{(k-1)!}\,ds,
\]
and, hence, that
\[
\|\Delta_{k,\sigma,\tau}(t)\| \le L^{k+1}\|\lambda_\sigma\|_1^{k+1}\frac{(t-a)^{k+1}}{(k+1)!} + \|\lambda_\sigma - \lambda_\tau\|_1\frac{(t-a)^{k}}{k!}.
\]
This gives an exponential bound (as in Lemma 10.18) and so, for $\|\lambda_\sigma - \lambda_\tau\|_1$ sufficiently small, $\|\tilde\varphi_{n,\sigma}(\cdot) - \tilde\varphi_{n,\tau}(\cdot)\| < \epsilon/3$. In other words, the map is continuous. □

Let us now make the following observation. Suppose that $\mathbf{U}(\cdot) : E \to 2^U$ is a continuous map of $E$ (with respect to the Hausdorff distance in $2^U$) and we consider the set $C(E,\mathbf{U}(\cdot)) = \{u : u(t) \in \mathbf{U}(t),\ u(\cdot) \text{ continuous}\}$ with the metric $\rho(u,v) = \sup_{t\in E}\rho(u(t),v(t))$. Then Lemmas 10.18 and 10.21 remain true in this context. Similarly, if we let $\overline{\mathbf{U}}(s)$ be the closure of $\mathbf{U}(s)$ and we consider (generalized) controls $\sigma(\cdot)$ with support $\overline{\mathbf{U}}(s)$, i.e., $\mu_{\sigma(s)}(\overline{\mathbf{U}}(s)) = 1$, then the results remain true in this context. We leave the details to the reader as there are few changes required.

We noted earlier that $\mathrm{co}\,\{\delta_u : u \in U\}$ is dense in $P(U)$. We also note that if $\sigma(\cdot)$ is a generalized control, then there is a step-function $\sigma_1(\cdot)$ arbitrarily close to $\sigma(\cdot)$.

Remark 10.36. Suppose $\sigma(\cdot) \in C(E,P(U))$. If $\epsilon > 0$, then there is a $\sigma_u(\cdot) \in R(E,\mathrm{co}\,\{\delta_u\})$ with $\rho(\sigma_u(\cdot),\sigma(\cdot)) < \epsilon$.

Proof. Since $\sigma(\cdot)$ is continuous and $E$ is compact, $\sigma(\cdot)$ is uniformly continuous. So there is a $\delta > 0$ such that $|t - s| < \delta$ implies $\rho(\sigma(s),\sigma(t)) < \epsilon/3$. Let $s_0 = a < s_1 < \cdots < s_n = b$ be a partition with $|s_{j+1} - s_j| < \delta$. Then there exist $\alpha_{j,i}(s_i) \ge 0$ with $\sum_i \alpha_{j,i}(s_i) = 1$, and there exist $u_i \in U$ such that $\rho(\sum_i \alpha_{j,i}(s_i)\delta_{u_i},\sigma(s_i)) < \epsilon/3$. Let $u(s) = \sum_i \alpha_{j,i}(s)u_i$, where $\alpha_{j,i}(s) = \alpha_{j,i}(s_i)$, $s \in E_j$. Then $\rho(\sigma_u(s),\sigma(s)) \le \rho(\sigma_u(s),\sigma(s_i)) + \rho(\sigma(s_i),\sigma(s)) < \epsilon$ for all $s$. □

Note, working almost everywhere, the remark shows that $R(E,\mathrm{co}\,(U))$ is dense in $R(E,P(U))$. In other words, an optimal generalized control can be approximated by regular controls.

Let $\mathcal{W} = \{u(\cdot) : u(\cdot) : E \to U,\ u \text{ bounded and Borel measurable}\}$ and $\mathcal{W}_g = \{\sigma(\cdot) : \sigma(\cdot) : E \to P(U),\ \sigma \text{ bounded and Borel measurable}\}$. Suppose that $L : E \times D \times U \to \mathbb{R}$ and that
\[
|L(s,x,u)| \le \psi(s), \qquad \psi(s) \in L_1(E,\mathbb{R})
\]

on $E \times D \times U$. If $u(\cdot) \in \mathcal{W}$, let
\[
x_u(t) = x_0 + \int_a^t f(s,x_u(s),u(s))\,ds
\]
where $f : E \times D \times U \to X$ and
\[
\|f(s,x,u)\| \le \psi_1(s), \qquad \psi_1(s) \in L_1(E,\mathbb{R})
\]
on $E \times D \times U$. Similarly, if $\sigma(\cdot) \in \mathcal{W}_g$, let
\[
x_\sigma(t) = x_0 + \int_a^t\!\int_U f(s,x_\sigma(s),u)\,\mu_\sigma(du)(s)\,ds.
\]
Let
\[
J(u,x_0) = \int_E L(t,x_u(t),u(t))\,dt, \qquad u \in \mathcal{W},
\]
and
\[
J_g(\sigma,x_0) = \int_E\!\int_U L(t,x_\sigma(t),u)\,\mu_\sigma(du)(t)\,dt, \qquad \sigma \in \mathcal{W}_g.
\]
Observe that
\[
|J(u,x_0)| \le \int_E |L(t,x_u(t),u(t))|\,dt \le \int_E \psi(t)\,dt \le \|\psi(\cdot)\|_1
\]
and that
\[
|J_g(\sigma,x_0)| \le \int_E\!\int_U |L(t,x_\sigma(t),u)|\,\mu_\sigma(du)(t)\,dt \le \int_E\!\int_U \psi(t)\,\mu_\sigma(du)(t)\,dt \le \int_E \psi(t)\,dt \le \|\psi(\cdot)\|_1.
\]
In other words, if $\mathcal{U} \subset \mathcal{W}$ or $\mathcal{U}_g \subset \mathcal{W}_g$, then $\inf_{u\in\mathcal{U}} J(u,x_0)$ exists and so does $\inf_{\sigma\in\mathcal{U}_g} J_g(\sigma,x_0)$.

Proposition 10.37. If $\mathcal{U}$ (or $\mathcal{U}_g$) is sequentially compact and $J(u,x_0)$ (or $J_g(\sigma,x_0)$) is (lower semi-)continuous and the map $u \to x_u$ (or $\sigma \to x_\sigma$) is continuous, then there is an optimal $u^*(\cdot) \in \mathcal{U}$ (or $\sigma^* \in \mathcal{U}_g$).

Proof. Let $J^* = \inf_{u\in\mathcal{U}} J(u,x_0)$, which is finite. Thus there are $u_n(\cdot) \in \mathcal{U}$, $x_n(\cdot) = x_{u_n}(\cdot)$ such that $J(u_n,x_0) \to J^*$. Since $\mathcal{U}$ is sequentially compact, there is a convergent subsequence $u_{n_j}(\cdot) \to u^*(\cdot) \in \mathcal{U}$ (and by continuity $x_{n_j}(\cdot) \to x_{u^*}(\cdot)$). So $J^* \le J(u^*,x_0)$, i.e. $J^* = \lim J(u_{n_j},x_0) = J(u^*,x_0)$ (by continuity of $J$). The proof for $\mathcal{U}_g$ is entirely similar. □

Let us now develop a sort of general example.

Remark 10.38. $P(U)$ is convex. For if $\mu,\nu \in P(U)$, then $\alpha\mu(U) + (1-\alpha)\nu(U) = \alpha + (1-\alpha) = 1$ ($0 \le \alpha \le 1$) and $[\alpha\mu + (1-\alpha)\nu](\cup A_i) = \alpha\mu(\cup A_i) + (1-\alpha)\nu(\cup A_i) = \alpha\sum\mu(A_i) + (1-\alpha)\sum\nu(A_i) = \sum[\alpha\mu(A_i) + (1-\alpha)\nu(A_i)] = \sum[\alpha\mu + (1-\alpha)\nu](A_i)$ where the $A_i$ are disjoint.

Let $\mathcal{Z}$ be a metric space and $Z \subset \mathcal{Z}$. If $Z$ is compact, then $Z$ is sequentially compact. Let $L_\infty(E,\mathcal{Z}) = \{u(\cdot) : u(\cdot) : E \to \mathcal{Z},\ u(\cdot) \text{ bounded, Borel measurable}\}$ and $L_\infty(E,Z) \subset L_\infty(E,\mathcal{Z})$.

Remark 10.39. If $Z$ is convex, then $L_\infty(E,Z)$ is convex. For if $u(\cdot),v(\cdot) \in L_\infty(E,Z)$, then $\alpha u(t) + (1-\alpha)v(t) \in Z$ a.e. in $t$ ($0 \le \alpha \le 1$) so that $\alpha u(\cdot) + (1-\alpha)v(\cdot) \in L_\infty(E,Z)$ (since it is clearly bounded and Borel measurable).

Remark 10.40. $Z$ compact implies $Z$ is complete, separable, closed and sequentially compact. If $\mathcal{Z}$ is a Banach space, then $A \subset \mathcal{Z}^*$ is $\mathcal{Z}$-compact (i.e., weak-*-compact) if and only if $A$ is $\mathcal{Z}$-closed and norm bounded.

Now let $\mathcal{Z}$ be a Banach space and $U \subset \mathcal{Z}$. As noted, if $U$ is convex, then $L_\infty(E,U)$ is convex, and if $U$ is compact, then $U$ is closed, separable and sequentially compact. Suppose that $\mathcal{Z}$ is a reflexive Banach space. Then $L_1(E,\mathcal{Z}^*)^* = L_\infty(E,\mathcal{Z})$, i.e., $L_\infty(E,\mathcal{Z})$ is the dual of $L_1(E,\mathcal{Z}^*)$ via: $\psi(\cdot) \in L_\infty(E,\mathcal{Z})$ operates on $L_1(E,\mathcal{Z}^*)$ by
\[
\lambda_\psi(\varphi) = \int_E \varphi(t)[\psi(t)]\,dt.
\]

Then $K \subset L_\infty(E,\mathcal{Z})$ is $L_1(E,\mathcal{Z}^*)$-compact if and only if $K$ is $L_1(E,\mathcal{Z}^*)$-closed and $\|\cdot\|_\infty$ bounded. If $U \subset \mathcal{Z}$ is compact and convex, then $K = L_\infty(E,U) \subset L_\infty(E,\mathcal{Z})$ is convex and $\|\cdot\|_\infty$ bounded. [For, $|\psi(\varphi)| \le \int_E |\varphi(t)[\psi(t)]|\,dt \le \int_E \|\varphi(t)\|\,\|\psi\|_\infty\,dt \le \|\varphi\|_1\|\psi\|_\infty$.] Also, $L_\infty(E,U)$ is closed in $L_\infty(E,\mathcal{Z})$ as $\|\varphi_n - \varphi\|_\infty \to 0$ implies $\varphi_n(t) \to \varphi(t)$ a.e. in $t$ and $U$ compact implies $\varphi(t) \in U$. Thus $K = L_\infty(E,U)$ is convex and $L_1(E,\mathcal{Z}^*)$-closed (as closed, convex means weakly closed and weak-*-closed) and therefore weak-*-compact ($\mathcal{Z}$ reflexive).

Proposition 10.41. If $\mathcal{U} = L_\infty(E,U)$, $U$ compact, convex, and $U \subset \mathcal{Z}$ a reflexive Banach space, and $J$ is (lower semi-)continuous, and the map $u(\cdot) \to x_u(\cdot)$ is continuous, then there is an optimal $u^*(\cdot) \in \mathcal{U} = L_\infty(E,U)$. In particular, this holds for $U \subset \mathbb{R}^n$, $U$ compact and convex.

Now let us adopt a somewhat different point of view (cf. [Y-1, W-2, D-1]). Consider $E \times D \times U$ and maps $\varphi : E \times D \times U \to X$. We suppose that:

(1) $\varphi(t,\cdot,\cdot)$ is bounded and Borel measurable;
(2) $\varphi(t,x,\cdot)$ is continuous for all $t,x$;
(3) $\varphi(\cdot,x,u)$ is Lebesgue measurable for all $x,u$; and
(4) there exists $\psi \in L_1(E,\mathbb{R})$ such that $\|\varphi(t,x,u)\| \le \psi(t)$ on $E \times D \times U$.

For instance, the standard system function $f$ satisfies these conditions, which we call the Caratheodory conditions. We let $\Phi = \{\varphi(t,x,u) : \varphi \text{ satisfies (1)–(4)}\}$. Observe that if $\varphi \in \Phi$, then $c(\cdot) = \varphi(t,x,\cdot)$ is an element of $C(U,X)$. If $u_0 \in U$ and $c(\cdot) \in C(U,X)$, then
\[
\lambda_{u_0}[c(\cdot)] = c(u_0) \tag{10.42}
\]

is a bounded, linear map of C (U, X) into X. Let B ∗ = {x∗ ∈ X ∗ : ||x∗ || ≤ 1} and let ∂B ∗ = {x∗ ∈ B ∗ : ||x∗ || = 1}. Thus if x∗ ∈ B ∗ (or ∂B ∗ ), then [x∗ ◦ λu0 ][c( · )] = x∗ [c(u0 )]

(10.43)

is an element of C (U, X)∗ . Proposition 10.44. If u( · ) ∈ L∞ (E, U ) and x∗ ∈ B ∗ , then the map [x∗ ◦ λu( · ) ]( · ) = x∗ [c(u( · ))]

(10.45)

is an element of L∞(E, C(U, X)∗). Moreover, u( · ) = v( · ) a.e. if and only if x∗ ◦ λu( · ) = x∗ ◦ λv( · ) a.e.

Proof. Clearly x∗ ◦ λu( · ) ∈ L∞(E, C(U, X)∗), since ||x∗|| ≤ 1 and c( · ) is a bounded, continuous function. If u( · ) = v( · ) a.e., then x∗ ◦ λu( · ) = x∗ ◦ λv( · ) a.e., as c(u( · )) = c(v( · )) a.e. Conversely, suppose that x∗ ◦ λu( · ) = x∗ ◦ λv( · ) almost everywhere. We claim ||u( · ) − v( · )||∞ = 0. If not, suppose ||u( · ) − v( · )||∞ = α > 0 so that ||u(s) − v(s)|| > 0 on some E1 ⊂ E of positive measure. Let x0 ∈ X, x0 ≠ 0 and x0∗ ∈ B∗, ||x0∗|| = 1, such that x0∗x0 ≠ 0. Let c(w( · )) = ||w( · ) − u( · )||∞ χE1 x0 so that c( · ) ∈ C(U, X). But c(u( · )) = 0 and c(v( · )) = αx0. But then x0∗[c(u( · ))] = 0 and x0∗[c(v( · ))] ≠ 0, a contradiction. □

In other words, the map (10.45) of L∞(E, U) → L∞(E, C(U, X)∗) is well defined. Let

\[ W = \{ w_u : w_u = x^* \circ \lambda_u \ \text{for all } x^* \in B^* \}, \tag{10.46} \]

i.e., the elements wu = B∗ ◦ λu, u ∈ L∞(E, U). We define an equivalence relation on W via

\[ w_u \simeq w_v \quad \text{iff} \quad x^* \circ \lambda_u = x^* \circ \lambda_v \ \text{for all } x^* \in B^*, \]

and let W be the set of equivalence classes [wu]. Then the map u( · ) → [wu( · )] is bijective, so we may identify the space of controls L∞(E, U) with W. This is to be understood in the following way: if u( · ) ∈ L∞(E, U) and

\[ x_u(t) = x_0 + \int_a^t f(s, x_u(s), u(s))\,ds, \]

then

\[ x^* x_u(t) = x^* x_0 + \int_a^t x^*[f(s, x_u(s), u(s))]\,ds \]

for all x∗ ∈ X∗. Suppose that ϕ ∈ Φ and wu ∈ W. Then wu acts on Φ via

\[ w_u(x^*)(\varphi) = \int_a^t [x^* \circ \lambda_u][\varphi(s, x, \cdot)]\,ds. \]

Note also that wu(x∗) is an element of Φ∗.

Definition 10.47. We say that un( · ) → u( · ) weakly if

\[ [x^* \circ \lambda_{u_n(\cdot)}][c(\cdot)] \to [x^* \circ \lambda_{u(\cdot)}][c(\cdot)] \tag{10.48} \]

for all x∗ ∈ B∗ and c( · ) ∈ C(U, X).

Proposition 10.49. If un( · ) → u( · ) in L∞(E, U), then un( · ) → u( · ) weakly.

Proof. Now

\[ [x^* \circ \lambda_{u_n}][c(\cdot)] = \int_E x^*[c(u_n(s))]\,ds, \qquad [x^* \circ \lambda_u][c(\cdot)] = \int_E x^*[c(u(s))]\,ds. \]

Since un → u in L∞(E, U), un → u a.e. and, consequently, c(un( · )) → c(u( · )) a.e. But c( · ) is bounded and µ(E) is finite. It follows that c ◦ un → c ◦ u in L1(E, X) and hence that [x∗ ◦ λun][c( · )] → [x∗ ◦ λu][c( · )]. □

Corollary 10.50. If f(s, x, u) is a control system function, then un → u weakly implies [x∗xun] → [x∗xu] for all x∗ and hence xun → xu a.e. Similarly for λun and λu.

We observe that if X = R, then everything simplifies and, since C(U, Rn) = C(U, R) ⊕ · · · ⊕ C(U, R), the finite-dimensional theory is straightforward (cf. [Y-1]).

We consider now the generalized case. Let σ ∈ P(U) and c( · ) ∈ C(U, X) (bounded, continuous functions from U into X). We define a map λσ : C(U, X) → X by

\[ \lambda_\sigma(c(\cdot)) = \int_U c(u)\,\mu_\sigma(du). \]

Clearly λσ is a bounded linear map and, as earlier, x∗ ◦ λσ ∈ C(U, X)∗. If σ( · ) ∈ L∞(E, P(U)) (a metric space, not normed; see footnote 10.2), then

\[ \lambda_{\sigma(\cdot)}(t)[c(\cdot)] = \int_a^t \int_U c(u)\,\mu_\sigma(du)(s)\,ds = \int_a^t \lambda_{\sigma(s)}[c(\cdot)]\,ds. \]

Footnote 10.2. A better notation is B∞(E, P(U)) with metric ρ(σ, τ) = ess sup_t ρP(σ(t), τ(t)).

Then λσ( · ) is a map of E → L(C(U, X), X) and

[x∗ ◦ λσ( · )]( · ) ∈ L∞(E, C(U, X)∗) for x∗ ∈ B∗. The map of B∞(E, P(U)) → L∞(E, C(U, X)∗) is well defined, and we let Wg = {wσ : wσ = x∗ ◦ λσ}. As earlier, we define an equivalence in Wg by wσ ≅ wτ if x∗ ◦ λσ = x∗ ◦ λτ for all x∗ ∈ B∗. Let W̄g be the set of equivalence classes. Then the map σ( · ) → [wσ] (equivalence class) is bijective and we may identify the space of generalized controls with W̄g, and we have

\[ x_\sigma(t) = x_0 + \int_a^t \int_U f(s, x_\sigma(s), u)\,\mu_\sigma(du)(s)\,ds = x_0 + \int_a^t \lambda_{\sigma(s)}[f(s, x_\sigma(s), \cdot)]\,ds \]

and

\[ x^* \circ x_\sigma(t) = x^* x_0 + \int_a^t [x^* \circ \lambda_{\sigma(\cdot)}][f(s, x_\sigma(s), \cdot)]\,ds. \]

Similarly (and as earlier), W̄g maps Φ into a family of curves by

\[ w_\sigma(x^*)\varphi = \int_a^t [x^* \circ \lambda_{\sigma(s)}][\varphi(s, x, \cdot)]\,ds \]

and wσ(x∗) ∈ Φ∗.

Definition 10.51. We say that σn( · ) → σ( · ) weakly if [x∗ ◦ λσn][c( · )] → [x∗ ◦ λσ][c( · )] for all x∗ ∈ B∗ and c( · ) ∈ C(U, X).

As earlier, convergence in B∞(E, P(U)) implies weak convergence, which is essentially weak-*-convergence in C(U, X)∗.

Now let us consider still another point of view. Let A be a subset of a complete metric space and let C(A, X) be the Banach space of bounded, continuous functions c( · ) mapping A → X with X a Banach space. Let B∗ = S∗(0, 1) = {x∗ ∈ X∗ : ||x∗|| ≤ 1} be the closed unit ball in X∗. Note that B∗ is X-compact (i.e., weak-*-compact). Let Z = A × B∗ and let C(Z) = C(Z, R) be the space of bounded continuous maps of Z → R. We define a map ψ : C(A, X) → C(Z) by

\[ \psi(c(\cdot))(a, x^*) = x^*[c(a)]. \tag{10.52} \]
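As a rough numerical illustration of (10.52), one can take a finite-dimensional stand-in: X = R² with the Euclidean norm (so that X∗ is identified with R² and B∗ with the closed unit disc), A a finite sample of [0, 1], and c a hypothetical continuous map. The sketch below evaluates ψ(c)(a, x∗) = x∗[c(a)] over sampled unit functionals and compares sup |ψ(c)| with sup ||c(a)||. The function `c`, the helper names and the sampling densities are assumptions made only for this illustration.

```python
import math

def c(a):
    """A hypothetical bounded continuous map c : A = [0, 1] -> X = R^2."""
    return (math.cos(3.0 * a), a * a - 0.5)

def psi(c_fn, a, x_star):
    """psi(c)(a, x*) = x*[c(a)], with x* acting through the Euclidean dot product."""
    v = c_fn(a)
    return x_star[0] * v[0] + x_star[1] * v[1]

def sup_norm_c(c_fn, samples):
    """sup over the sample of ||c(a)||, approximating the norm of c in C(A, X)."""
    return max(math.hypot(*c_fn(a)) for a in samples)

def sup_psi(c_fn, samples, n_dirs):
    """sup of |psi(c)(a, x*)| over the sample and over n_dirs unit functionals."""
    dirs = [(math.cos(2 * math.pi * k / n_dirs), math.sin(2 * math.pi * k / n_dirs))
            for k in range(n_dirs)]
    return max(abs(psi(c_fn, a, d)) for a in samples for d in dirs)

samples = [i / 200 for i in range(201)]
norm_c = sup_norm_c(c, samples)
norm_psi_c = sup_psi(c, samples, 720)
print(norm_c, norm_psi_c)
```

Up to the angular sampling error, the two printed numbers agree: the supremum over B∗ is attained at a functional aligned with c(a), so in this stand-in ψ preserves the norm.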

Let ψ(C(A, X)) = M ⊂ C(Z) be the range of ψ.

Proposition 10.53. ψ is a bounded (continuous) linear map and M is a closed subspace of C(Z).

Proof. Clearly ψ(αc( · ))(a, x∗) = x∗[αc(a)] = αx∗[c(a)] = αψ(c)(a, x∗) for all a, x∗. Similarly ψ(c1( · ) + c2( · ))(a, x∗) = x∗[c1(a) + c2(a)] = x∗[c1(a)] + x∗[c2(a)]. Since the distance in C(A, X) is given by ρ(c1, c2) = sup_{a∈A} ||c1(a) − c2(a)|| and ||c( · )|| = ρ(c, 0), we have

\[ |\psi(c(\cdot))(a, x^*)| = |x^*[c(a)]| \le \|x^*\| \cdot \|c(\cdot)\| \le 1 \cdot \|c\|, \]

so ψ is bounded and ||ψ|| = 1. The map ψ is injective since, if c1 ≠ c2, there exist an a with c1(a) ≠ c2(a) and an x∗ ∈ B∗ with x∗[c1(a)] ≠ x∗[c2(a)]. All that remains is to show that M is closed. If ψ(cn( · )) converges in C(Z), then ψ(cn( · ))( · , · ) converges uniformly and ||ψ(cn( · ))|| ≥ |ψ(cn( · ))(a, x∗)| = |x∗[cn(a)]| for all x∗. But there exists xa∗ ∈ B∗ such that |xa∗[cn(a)]| = ||cn(a)|| and so ||ψ(cn( · ))|| ≥ ||cn( · )||. It follows (applying the same estimate to the differences cn − cm) that cn( · ) is uniformly convergent to a continuous function c( · ) in C(A, X) and that ψ(c( · )) = lim_{n→∞} ψ(cn( · )) in C(Z). In other words, M is closed. □

We apply this proposition to C(U, X) and maps of E → U or P(U). We begin with an extension of Proposition 10.53 to the “generalized” case. Consider C(U, X) and let Zg = P(U) × B∗. Define a map ψg : C(U, X) → C(Zg, R) = C(Zg) via

\[ \psi_g(c(\cdot))(\mu, x^*) = x^*\Big[\int_U c(u)\,\mu(du)\Big] \tag{10.54} \]

for µ ∈ P(U), x∗ ∈ B∗. If Pδ(U) = {µδ : Dirac measures} and we imbed U in Pδ(U) via u → µδu where

\[ \mu_{\delta_u}(A) = \begin{cases} 1 & \text{if } u \in A, \\ 0 & \text{if } u \notin A, \end{cases} \]

then Z = U × B∗ → Pδ(U) × B∗ ⊂ Zg.

Proposition 10.55. ψg is a bounded (continuous) linear map and ψg(C(U, X)) = Mg is closed in C(Zg).

Proof. Linearity is clear. Now

\[ |\psi_g(c(\cdot))(\mu, x^*)| = \Big| x^*\Big[\int_U c(u)\,\mu(du)\Big] \Big| \le \|x^*\| \int_U \|c(\cdot)\|\,\mu(du) \le \|x^*\|\,\|c(\cdot)\| \tag{10.56} \]

so that |ψg(c( · ))(µ, x∗)| ≤ ||c( · )|| for all µ, x∗. Let ψg(cn( · )) converge uniformly. From (10.56) we note that there exists x∗n,u such that

\[ x^*_{n,u}\Big[\int_U c_n(u)\,\mu(du)\Big] = \Big\| \int_U c_n(u)\,\mu(du) \Big\|. \]

If µ = δu is Dirac measure, then ||ψg(cn( · ))|| ≥ ||cn(u)|| for all u. Hence cn( · ) → c( · ) in C(U, X) and lim_{n→∞} ψg(cn( · )) = ψg(c( · )). □

We observe that, with the identification of U and Pδ(U), ψg is an extension of ψ. Let Φ = {ϕ(t, x, u) : ϕ satisfies the standard Carathéodory conditions (1)–(4)}. Let σ( · ) be a bounded Borel measurable map of E → Zg, so that σ(t) = (µσ (ϕ), x∗) (fixed x∗). Then, σ∗(t) given by

\[ \sigma^*(t)[\psi_g(c(\cdot))] = x^*\Big[\int_U c(u)\,\mu_\sigma(du)(t)\Big] \tag{10.57} \]

is an element of Mg∗ and, if ϕ ∈ Φ, then

\[ \sigma^*(t)[\varphi(t, x, u)] = \sigma^*(t)[\psi_g(\varphi(t, x, \cdot))] = x^*\Big[\int_U \varphi(t, x, u)\,\mu_\sigma(du)(t)\Big] \tag{10.58} \]

is an element of Φ∗.

Definition 10.59. σn( · ) converges to σ( · ) weakly (σn →w σ) if σn∗( · ) → σ∗( · ) in L∞(E, Mg∗) with the Mg topology of Mg∗ (i.e., the weak-*-topology).

This means that σn∗(t)[ψg(c( · ))] → σ∗(t)[ψg(c( · ))] uniformly in t for each c( · ) or, equivalently, that ψg(c( · ))[σn( · )] → ψg(c( · ))[σ( · )] uniformly in t for each c( · ). Observe that if σn( · ) → σ( · ), then σn( · ) →w σ( · ). In other words, xn∗ → x∗ uniformly and µσn( · ) → µσ( · ) weakly with uniformity in t. Let σ( · ) : E → Zg = (P(U) × B∗) be a bounded measurable map and let σ∗( · ) be the corresponding element of L∞(E, Mg∗). The σ∗( · ) are called generalized controls. If f(s, x, u) is the system function, then the system becomes

\[ x_{\sigma^*}(t) = x^* x_0 + \int_a^t \sigma^*(s)[\psi_g(f(s, x_{\sigma^*}(s), \cdot))]\,ds = x^* x_0 + \int_a^t x^*(s)\Big[\int_U f(s, x_{\sigma^*}(s), u)\,\mu_{\sigma^*}(du)(s)\Big]\,ds. \]

Under standard (e.g., Lipschitz) conditions, the map σ∗( · ) → xσ∗( · ) is continuous and, if σn∗( · ) → σ∗( · ) weakly, then xσn∗( · ) → xσ∗( · ) uniformly. If the set of generalized controls is weak-*-compact, then (for appropriate cost) we have existence. Since B∗ is weak-*-compact, the critical conditions are on P(U). If U is a separable metric space, then P(U) is separable and second countable.

Definition 10.60. A Γ ⊂ P(U) is tight if, for every ε > 0, there exists a compact Kε ⊂ U such that µ(U − Kε) < ε for all µ ∈ Γ (i.e., µ(Kε) > 1 − ε).

Note that if U is compact, then every Γ ⊂ P(U) is tight. There is also:

Theorem 10.61 ([P-2]). If U is a complete separable metric space and Γ ⊂ P(U), then Γ is tight if and only if Γ̄ (closure in P(U) with respect to weak convergence) is compact.

In view of this theorem, we usually consider generalized controls σ( · ) with µσ( · ) ∈ Γ, Γ a given tight family of measures. In other words, the generalized controls lie in the compact (appropriate topologies) Γ̄ × B∗. Under these conditions and the standard conditions on the system and cost, there

is existence and a convergent minimizing sequence with values in the convex hull of the Dirac measures.

Example 10.62 (cf. Remark 7.81). Let X = R and suppose that f(s, x, u) satisfies the conditions of Remark 7.81 (i.e., f : E × R × U → R, measurable, bounded, Lipschitz, etc.) with E = [0, T]. Let σ( · ) : E → P(U) × B∗ and τ( · ) : E → P(U) × B∗ and suppose, for simplicity, x0 = 0 (we leave the necessary modifications if x0 ≠ 0 to the reader). Let

\[ y_\sigma(t) = \int_0^t\!\!\int_U x^*(s)[f(s, y_\sigma(s), u)]\,\mu_\sigma(du)(s)\,ds, \qquad y_\tau(t) = \int_0^t\!\!\int_U z^*(s)[f(s, y_\tau(s), u)]\,\mu_\tau(du)(s)\,ds. \]

We want to estimate ||yσ( · ) − yτ( · )||∞. Let ϕ0 = ψ0 = 0 and let yσ^m( · ) = Tσ^m(ϕ0)( · ) and yτ^m( · ) = Tτ^m(ψ0)( · ) where

\[ T_\sigma(\varphi)(t) = \int_0^t\!\!\int_U x^*(s)[f(s, \varphi(s), u)]\,\mu_\sigma(du)(s)\,ds, \qquad T_\tau(\varphi)(t) = \int_0^t\!\!\int_U z^*(s)[f(s, \varphi(s), u)]\,\mu_\tau(du)(s)\,ds. \]

Then

\[ \|y_\sigma - y_\tau\|_\infty \le \|y_\sigma - y_\sigma^m\|_\infty + \|y_\sigma^m - y_\tau^m\|_\infty + \|y_\tau^m - y_\tau\|_\infty. \]

Given ε > 0 (Example 7.82), there exists m0 such that if m ≥ m0, then

\[ \|y_\sigma - y_\sigma^m\|_\infty < \varepsilon/3, \qquad \|y_\tau - y_\tau^m\|_\infty < \varepsilon/3, \]

and so we need to estimate ||yσ^m( · ) − yτ^m( · )||∞ = ||Tσ^m(ϕ0)( · ) − Tτ^m(ϕ0)( · )||∞. Write ϕm( · ) = yσ^m( · ) and ψm( · ) = yτ^m( · ) so that ϕm+1 = Tσ(ϕm) and ψm+1 = Tτ(ψm). Then ||ϕm+1 − ψm+1|| = ||Tσ(ϕm) − Tτ(ψm)|| ≤ ||Tσ(ϕm) − Tσ(ψm)|| + ||Tσ(ψm) − Tτ(ψm)||. We have

\[
\begin{aligned}
\varphi_{m+1} - \psi_{m+1} ={}& \int_0^t\!\!\int_U (x^*(s) - z^*(s))[f(s, \varphi_m(s), u)]\,\mu_\sigma(du)(s)\,ds \\
&+ \int_0^t\!\!\int_U z^*(s)[f(s, \varphi_m(s), u)]\,\mu_\sigma(du)(s)\,ds \\
&- \int_0^t\!\!\int_U z^*(s)[f(s, \varphi_m(s), u)]\,\mu_\tau(du)(s)\,ds \\
&+ \int_0^t\!\!\int_U z^*(s)[f(s, \varphi_m(s), u) - f(s, \psi_m(s), u)]\,\mu_\tau(du)(s)\,ds.
\end{aligned}
\]

It follows that

\[ \|\varphi_{m+1}(t) - \psi_{m+1}(t)\| \le \|x^*(\cdot) - z^*(\cdot)\|_\infty \cdot t + \rho_{BL}(\sigma, \tau) \cdot t + \int_0^t \|\varphi_m(s) - \psi_m(s)\|\,ds. \]

Thus ||ϕ1(t) − ψ1(t)|| ≤ αt, where α = ||x∗ − z∗||∞ + ρBL(σ, τ), and ||ϕ2(t) − ψ2(t)|| ≤ αt + ∫_0^t αs ds = αt + αt²/2! and similarly,

\[ \|\varphi_{m+1} - \psi_{m+1}\|_\infty \le \alpha\Big( T + \cdots + \frac{T^m}{m!} \Big) \le \alpha e^T. \]

If α < ε/(3e^T), then ||ϕm+1 − ψm+1||∞ < ε/3 and so, for m large enough, ||yσ( · ) − yτ( · )||∞ < ε. In other words, the map σ( · ) → yσ( · ) is continuous. Thus if σn( · ) → σ( · ) weakly, then yσn( · ) → yσ( · ) and, if P(U) is compact (weak convergence), we have an existence result. Since Rn = R ⊕ · · · ⊕ R, the result also holds in Rn.

Example 10.63. Suppose that f : E × D × U → X with the usual continuity and measurability conditions and that ||f(s, x, u)|| ≤ m(s) with m( · ) ∈ L1(E, R). For simplicity we suppose that E = [0, 1]. We let B∗ = {x∗ : ||x∗|| ≤ 1} in X∗. We consider controls σ( · ) : E → P(U) × B∗ with σ(s) = (µσ(du)(s), xσ∗) which are bounded and measurable. Then

\[ x_\sigma(t) = x_\sigma^* x_0 + \int_0^t\!\!\int_U x_\sigma^*[f(s, x_\sigma(s), u)]\,\mu_\sigma(du)(s)\,ds \]

or, equivalently,

\[ x_\sigma(t) = x_\sigma^* x_0 + \int_0^t \lambda_\sigma(s)[f(s, x_\sigma(s), u)]\,ds \]

where λσ(s) is the element of C(U, X)∗ given by

\[ \lambda_\sigma(s)[c(u)] = \int_U x_\sigma^*[c(u)]\,\mu_\sigma(du)(s). \]

Suppose that λσ( · ) is bounded, i.e., ||λσ( · )||∞ ≤ Lσ. Then

\[ \|x_\sigma(t)\| \le \|x_\sigma^*\| \cdot \|x_0\| + \int_0^t \|\lambda_\sigma(\cdot)\|\,m(s)\,ds \le \|x_\sigma^*\| \cdot \|x_0\| + L_\sigma \|m(\cdot)\|_1. \]

It follows that if Ωg = {σ( · ) : ||λσ( · )||∞ ≤ L} is a bounded set of controls, then Ag(t, x0) = {xσ(t) : σ( · ) ∈ Ωg} is bounded; and in turn, the closure Āg(t, x0), the convex hull co(Ag(t, x0)), and the closed convex hull c̄o(Ag(t, x0)) are all bounded. We have:

Claim 10.64. The maps t → Ag(t, x0), t → Āg(t, x0), t → co(Ag(t, x0)), t → c̄o(Ag(t, x0)) are all continuous with respect to the Hausdorff distance ρH.

Proof. It is enough to show this for Ag(t, x0). Let P1 ∈ Ag(s1, x0). Then there exists a δ(ε) with ||xσ(s) − P1|| < ε/2 if |s − s1| < δ(ε). Since m( · ) is in L1(E, R), we have ||xσ(s) − xσ(s1)|| ≤ ∫_{s1}^{s} L m(r) dr < ε/2 for δ(ε) small enough. In other words, ρH(P1, Ag(s, x0)) < ε if |s − s1| < δ(ε). Similarly, ρH(Ag(s, x0), P2) < ε. So we have ρH(Ag(s, x0), Ag(s1, x0)) < 2ε if |s − s1| < δ(ε). □

Suppose that Ωg is convex as well as bounded. If f(s, x, u) is convex in x, then Ag(t, x0) is convex. If J is bounded below and J, f are convex, then there is an optimal σ̃( · ) in Ωg (cf. Example 9.88).
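The role of generalized controls as limits of ordinary ones can be seen in a small numerical experiment. The sketch below is only an illustration under assumptions not taken from the text: the scalar dynamics f(x, u) = ux + u², the control values ±1, and the relaxed control σ = ½(δ₊₁ + δ₋₁), for which the averaged dynamics reduce to ẋ = 1. Ordinary controls that switch between +1 and −1 ever faster ("chattering") should produce trajectories approaching the relaxed trajectory uniformly.

```python
def f(x, u):
    """Illustrative system function (an assumption): f(x, u) = u*x + u*u."""
    return u * x + u * u

def chattering(n):
    """Ordinary control equal to +1 on the first half and -1 on the second half
    of each period of length 1/n."""
    def u(t):
        return 1.0 if (t * n) % 1.0 < 0.5 else -1.0
    return u

def ordinary_path(control, n_steps, x0=0.0):
    """Explicit Euler on [0, 1]; the control is sampled at step midpoints."""
    h = 1.0 / n_steps
    x, path = x0, [x0]
    for i in range(n_steps):
        x += h * f(x, control((i + 0.5) * h))
        path.append(x)
    return path

def relaxed_path(n_steps, x0=0.0):
    """Explicit Euler for the relaxed system x' = 0.5*f(x, 1) + 0.5*f(x, -1)."""
    h = 1.0 / n_steps
    x, path = x0, [x0]
    for _ in range(n_steps):
        x += h * 0.5 * (f(x, 1.0) + f(x, -1.0))
        path.append(x)
    return path

def sup_gap(n):
    """Sup-norm distance between the chattering and the relaxed trajectories."""
    n_steps = 200 * n  # 100 Euler steps per half period
    a = ordinary_path(chattering(n), n_steps)
    b = relaxed_path(n_steps)
    return max(abs(p - q) for p, q in zip(a, b))

gaps = [sup_gap(n) for n in (4, 16, 64)]
print(gaps)
```

The printed gaps shrink as the switching frequency grows, which is the behaviour one expects from a minimizing sequence of ordinary controls converging weakly to a generalized control.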

Chapter 11

Interlude: Variations and Solutions

We have already noted that the choice of admissible controls or variations plays a critical role in control and optimization problems and is closely related to the notion of solution. We consider several (relatively) simple problems to illustrate these issues again. We use a variety of standard spaces:

C([0, 1]), C([0, 1], X),
Cp([0, 1]) = {f : f is piecewise continuous},
D([0, 1]) = {f : f^(j) ∈ C([0, 1]), j < n, f^(n) ∈ Cp([0, 1])},
Lp([0, 1], X) and Lp([0, 1], H), p ≥ 1, H a Hilbert space, (11.1)

and a variety of situations.

11.2. Let H be a Hilbert space and let A : H → H be a closed densely-defined linear operator. Let Dt = d/dt be a differential operator on {z( · ) : z maps [0, 1] into H}. Let B : U → dom(A) be a bounded linear operator (U ⊂ U a Hilbert space).

11.3. Let D be a regular domain in Rn with Γ = ∂D. Let α = (α1, . . . , αn) be a multi-index and |α| = Σαj, αj ≥ 0. Let

C0∞(D) = {f : f infinitely differentiable, supp f ⊂ K, K compact ⊂ D},
Wm,p(D) = {f( · ) ∈ Lp(D) : Dw^α f ∈ Lp(D), |α| ≤ m},

where Dw^α f is the weak derivative so that

\[ \int_D D_w^\alpha f\,\varphi\,dx = (-1)^{|\alpha|} \int_D f\,D^\alpha \varphi\,dx \quad \text{for all } \varphi \in C_0^\infty(D), \]

H1(D) = W1,2(D) with

© Springer Science+Business Media, LLC, part of Springer Nature 2019 P. Falb, Direct Methods in Control Problems, https://doi.org/10.1007/978-0-8176-4723-0_11


\[ \|f\|_{1,2} = \Big( \sum_{j=0}^{1} \|f^{(j)}\|_2^2 \Big)^{1/2} \quad \text{and} \quad \langle f, g \rangle = \sum_{j=0}^{1} \langle f^{(j)}, g^{(j)} \rangle_2, \]

H01(D) = {f ∈ H1(D) : f = 0 on Γ}. Now let Lψ = −ΣDxi(aij Dxj ψ) + a0ψ be a differential operator with aij(x), a0(x) ∈ L∞(D) and a0(x) ≥ a > 0 and

\[ \sum a_{ij}(x)\,x_i x_j \ge \alpha \|x\|^2 \qquad \text{(coercive).} \]

Let dom(L) = H01(D). Let U ⊂ L2(D), and we consider the system Lψ(u) = f + u, f ∈ H01(D) in Example 11.6.

Example 11.4. Let

\[ J(u) = \int_0^1 \sqrt{x^2 + u^2}\,ds, \qquad \dot{x} = u, \qquad x(0) = 0, \quad x(1) = 1, \]

and let U = {u ∈ Cp([0, 1]) : ∫_0^1 u(s) ds = xu(1) = 1}. So we want to find inf J(u) subject to the constraints. Note that U is convex but J is not and that J(u) > 1 for all u ∈ U. Let uk(t) = kt^(k−1) so that xk(t) = t^k. Then

\[ \int_0^1 u_k(s)\,ds = \int_0^1 \frac{d(s^k)}{ds}\,ds = 1 \]

and uk ∈ U. Moreover,

\[
\begin{aligned}
J(u_k) &= \int_0^1 \sqrt{x_k(s)^2 + u_k(s)^2}\,ds = \int_0^1 \sqrt{s^{2k} + k^2 s^{2k-2}}\,ds = \int_0^1 s^{k-1}\sqrt{s^2 + k^2}\,ds \\
&\le \int_0^1 s^{k-1}(s + k)\,ds = 1 + \frac{1}{k+1},
\end{aligned}
\]

so inf_{u∈U} J(u) = 1 but there is no optimal u∗ in Cp([0, 1]).

Example 11.5. Consider the situation of 11.2. Let U ⊂ {u( · ) : u maps [0, 1] to U} and consider the system

\[ \dot{x}_u(t) = A x_u(t) + B u(t), \qquad x_u(0) = x_0, \]

and let

\[ J(u) = \int_0^1 L(s, x_u(s), u(s))\,ds \]


where L(s, · , · ) is measurable and convex (a fortiori continuous [R-3]). Assume that U is convex so that for u, v ∈ U, α ∈ [0, 1] implies w(α) = αu + (1 − α)v ∈ U. Then ẋw(α) = Axw(α) + Bw(α), xw(α)(0) = x0, and we claim that xw(α) = αxu + (1 − α)xv, since αẋu + (1 − α)ẋv = αAxu + (1 − α)Axv + αBu + (1 − α)Bv = A(αxu + (1 − α)xv) + B(αu + (1 − α)v) and αxu(0) + (1 − α)xv(0) = x0. By uniqueness the claim holds. In other words, if U is convex, then J(u) is convex, since L(s, · , · ) is convex and {xu( · ) : u ∈ U} is convex.

But what do we mean by a solution of the initial value problem ẋu = Axu + Bu, xu(0) = x0? This, of course, depends on a careful specification of the function spaces {u( · )}, {xu( · )}. A first reasonable requirement is that the functions (controls) u( · ) be bounded and (Borel) measurable. Since A is a closed, densely defined operator, A defines a strongly continuous semigroup SA(t) and the function

\[ x_u(t) = S_A(t)x_0 + \int_0^t S_A(t - s)Bu(s)\,ds \]

is absolutely continuous and satisfies the differential equation almost everywhere (depending on the interpretation of the operator d/dt) and supposing x0 ∈ dom(A). In that case, xu(t) ∈ dom(A) for t ≥ 0. What are potential choices for the space of functions containing dom(A) (and on which Dt operates)? Let H be a Hilbert space and x( · ) : [0, 1] → H. Then we could consider such spaces as: C([0, 1], H), Cp([0, 1], H), D^(n)([0, 1], H), or Lp([0, 1], H), p ≥ 1. The operators A and Dt could have a corresponding choice (we are being, on purpose, a bit vague here as we want the reader to examine matters). Suppose we consider the case of L2([0, 1], H) and d/dt = Dt,w the weak derivative. Then we have

\[ D_{t,w}x_u = Ax_u + Bu, \qquad x_u(0) = x_0, \]

which means ⟨Dt,w xu, g⟩ = −⟨xu, Dt g⟩ = −⟨Axu + Bu, g⟩ for g in C0∞([0, 1], H). Suppose, for the moment, that u( · ) = 0 so that xu(t) = SA(t)x0. But Dt,w(SA(t)x0) = −ASA(t)x0 (as SA( · ) is a strongly continuous semigroup with A as infinitesimal generator). Next suppose that x0 = 0. Then

\[ x_u(t) = \int_0^t S_A(t - s)Bu(s)\,ds \]

and

\[ \int_0^1 \Big\langle \int_0^t S(t - s)Bu(s)\,ds + Bu(t),\ g(t) \Big\rangle dt = -\int_0^1 \Big\langle A\int_0^t S(t - s)Bu(s)\,ds + Bu(t),\ g'(t) \Big\rangle dt, \]

and so the weak solution is given by SA(t)x0 + ∫_0^t S(t − s)Bu(s) ds. We leave it to the reader to examine other situations and to relate matters to the spectrum of A.

Example 11.6. Consider the situation of 11.3. So, let D ⊂ Rn be a regular domain with boundary Γ = ∂D. Let L be a linear differential operator with

\[ L\psi = -\sum_{ij} \frac{\partial}{\partial x_i}\Big( a_{ij}\frac{\partial \psi}{\partial x_j} \Big) + a_0(x)\psi \]

with aij( · ) ∈ L∞(D), a0( · ) ∈ L∞(D), and

\[ \sum_{ij} a_{ij}(x)\,x_i x_j \ge a|x|^2, \qquad a_0(x) > a, \qquad a > 0, \]

and x = (x1, . . . , xn) ∈ Rn. Suppose that U is a closed subspace of H1(D) such that H01(D) ⊂ U and u( · ) ≡ 0 ∈ U. Consider the problem

\[ J(u) = \inf_{u \in U} \int_D (\psi(u) - \psi_0)^2\,dx + \langle Qu, u \rangle, \qquad L\psi(u) = f + u, \qquad \psi(u) = g \ \text{on } \Gamma, \]

where Q is symmetric and coercive, g is smooth, and ψ(u) is an element of H1(D). We assume that there exists Ψ ∈ H1(D) such that τ(Ψ) = g, i.e., that the trace of Ψ, τ(Ψ), on Γ is g. Let ψ̃(u) = ψ(u) − Ψ. Then τ(ψ̃(u)) = 0, i.e., ψ̃(u) = 0 on Γ and so ψ̃ ∈ H01(D). If Lψ(u) = f + u, then

\[ L\tilde\psi(u) = L\psi(u) - L\Psi = -L\Psi + f + u \]

or, equivalently, Lψ̃(u) = f̃ + u where f̃ = f − LΨ, and ψ̃(u) = 0 on Γ. Now let ψ̃0 = ψ0 − Ψ. Then

\[ \tilde{J}(u) = \inf_{u \in U} \int_D (\tilde\psi(u) - \tilde\psi_0)^2\,dx + \langle Qu, u \rangle, \qquad L\tilde\psi(u) = \tilde{f} + u. \]

In other words, the problem

\[ \tilde{J}(u) = \inf_{u \in U} \int_D (\tilde\psi(u) - \tilde\psi_0)^2\,dx + \langle Qu, u \rangle, \qquad L\tilde\psi(u) = \tilde{f} + u \]


with f̃ = f − LΨ, ψ̃0 = ψ0 − Ψ and ψ̃(u) ∈ H01(D) is equivalent to the original problem. So we now consider the reduced problem

\[ J(u) = \inf_{u \in U} \int_D (\varphi(u) - \varphi_0)^2\,dx + \langle Qu, u \rangle, \qquad \varphi(u) \in H_0^1(D), \qquad L\varphi(u) = f + u, \qquad Q \ \text{coercive}. \]

We next reduce the problem further. Let ϕf be the solution of

\[ L\varphi_f = f, \qquad \varphi_f \in H_0^1(D), \]

and let ϕ̃0 = ϕ0 − ϕf, ϕ̃(u) = ϕ(u) − ϕf. Then

\[ L\tilde\varphi(u) = L\varphi(u) - L\varphi_f = f + u - f = u. \]

In other words, if Lϕ̃(u) = u with ϕ̃(u) ∈ H01(D), then ϕ(u) = ϕ̃(u) + ϕf ∈ H01(D) satisfies Lϕ(u) = f + u, ϕ(u) ∈ H01(D). Moreover,

\[ J(u) = \inf_{u \in U} \int_D (\varphi(u) - \varphi_0)^2\,dx + \langle Qu, u \rangle = \inf_{u \in U} \int_D (\tilde\varphi(u) - \tilde\varphi_0)^2\,dx + \langle Qu, u \rangle. \]

Thus, we have reduced the problem to

\[ J(u) = \inf_{u \in U} \int_D (\xi(u) - \xi_0)^2\,dx + \langle Qu, u \rangle, \qquad L\xi(u) = u, \qquad \xi(u) \in H_0^1(D). \]

For suitable L (coercive) and U, a solution exists [A-2]. Observe that J(u) is bounded below. Suppose that U is convex; then, if u1, u2 ∈ U, we have ξ(αu1 + (1 − α)u2) = αξ(u1) + (1 − α)ξ(u2) by the linearity of L and uniqueness. ⟨Qu, u⟩ is convex (and coercive). Moreover,

\[
\begin{aligned}
\int_D [\alpha\xi(u_1) + (1 - \alpha)\xi(u_2) - \xi_0]^2\,dx &= \int_D [\alpha(\xi(u_1) - \xi_0) + (1 - \alpha)(\xi(u_2) - \xi_0)]^2\,dx \\
&\le \alpha\int_D (\xi(u_1) - \xi_0)^2\,dx + (1 - \alpha)\int_D (\xi(u_2) - \xi_0)^2\,dx
\end{aligned}
\]

(as f(x) = x² is convex). In other words, J is convex and bounded below on U convex. So the various results of Appendix D, Corollary D.33, Proposition D.34 and Corollary D.35 apply, and the problem is well suited to a direct method approach.
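The convexity argument can be checked numerically on a one-dimensional stand-in. The sketch below is an illustration under assumptions not made in the text: D = (0, 1), L = −d²/dx² + 1 with homogeneous Dirichlet conditions discretized by central differences, Q = qI with q = 0.1, and a constant target ξ0. It solves Lξ(u) = u for random controls and counts violations of J(αu1 + (1 − α)u2) ≤ αJ(u1) + (1 − α)J(u2).

```python
import random

def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[i] multiplies x[i-1] and upper[i] multiplies x[i+1]
    in row i (lower[0] and upper[-1] are unused). No pivoting is done, which is
    adequate for the symmetric positive definite matrix used here."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def state(u):
    """Discrete solution xi(u) of -xi'' + xi = u on (0, 1), xi(0) = xi(1) = 0."""
    n = len(u)
    h = 1.0 / (n + 1)
    off = -1.0 / (h * h)
    return solve_tridiagonal([off] * n, [2.0 / (h * h) + 1.0] * n, [off] * n, u)

def cost(u, xi0, q=0.1):
    """Discrete J(u): tracking term plus the quadratic control term with Q = q*I."""
    h = 1.0 / (len(u) + 1)
    xi = state(u)
    tracking = h * sum((a - b) ** 2 for a, b in zip(xi, xi0))
    return tracking + q * h * sum(v * v for v in u)

random.seed(0)
n = 50
xi0 = [0.1] * n
violations = 0
for _ in range(200):
    u1 = [random.uniform(-1, 1) for _ in range(n)]
    u2 = [random.uniform(-1, 1) for _ in range(n)]
    alpha = random.random()
    mix = [alpha * a + (1 - alpha) * b for a, b in zip(u1, u2)]
    if cost(mix, xi0) > alpha * cost(u1, xi0) + (1 - alpha) * cost(u2, xi0) + 1e-9:
        violations += 1
print("convexity violations:", violations)
```

No violations are expected, since the discrete state map is linear in u and the cost is a sum of squares, mirroring the two ingredients used in the example.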


Example 11.7. Let X, U be Banach spaces and let E (= [0, 1]) be a compact interval. Let U ⊂ U and let ϕ : U → X. Consider a set U = {u( · ) : u a measurable map of E → U with u(s) ∈ U a.e.}, and if u( · ) ∈ U, let xu( · ) = ϕ(u( · )), i.e., xu(s) = ϕ(u(s)) (almost everywhere). Let J : U → R and consider

Problem 11.8.

\[ \inf_{u(\cdot) \in U} J(u(\cdot)), \qquad x_u(\cdot) = \varphi(u(\cdot)). \]

The problem is convex if J and ϕ are convex. Suppose that L : X × U → R and that

\[ J(u(\cdot)) = \int_0^1 L(x_u(s), u(s))\,ds. \]

Then we say the problem is strongly convex if (i) ϕ is convex; (ii) L is convex; (iii) L is non-decreasing in x. (Implicit is the assumption that U and U are convex.) Observe that if the problem is strongly convex, then J(u( · )) is convex. For, since L is non-decreasing in x and ϕ is convex, L(ϕ(αu1 + (1 − α)u2), αu1 + (1 − α)u2) ≤ L(αϕ(u1) + (1 − α)ϕ(u2), αu1 + (1 − α)u2). Since L is convex, L(αϕ(u1) + (1 − α)ϕ(u2), αu1 + (1 − α)u2) ≤ αL(ϕ(u1), u1) + (1 − α)L(ϕ(u2), u2). Clearly we could also add constraints of the form ψ(xu) = ψ(ϕ(u)) ≤ 0.

Now let Π = {s0, s1, . . . , sN}, 0 = s0 < s1 < · · · < sN = 1 be a partition of E (= [0, 1]) and ρ(Π) = sup(sj+1 − sj) be the “mesh” of the partition. Let Ej = (sj, sj+1), j = 0, 1, . . . , N − 1, and let S(Π, U) = {u( · ) ∈ U : u(Ej) = uj, j = 0, . . . , N − 1} be the space of the step-functions in U based on Π. If u( · ) ∈ S(Π, U), then ϕ(u( · )) = xu( · ) ∈ S(Π, X) and

\[ J(u(\cdot)) = \sum_{j=0}^{N-1} L(x_u(E_j), u(E_j))(s_{j+1} - s_j). \]

Moreover, the map ϕΠ : S(Π, U) → S(Π, X) is convex and so is J(u( · )). In other words, the (discrete) problem based on Π: minimize J(u( · )) subject to xu = ϕ(u) is a convex optimization problem.

Let Π = {s0, . . . , sN}, Π^1 = {s^1_0, . . . , s^1_{N1}} be partitions of E. We say Π^1 is a refinement of Π if Π ⊂ Π^1. If Π^1, Π^2 are partitions, then Π^1 ∪ Π^2 is a common refinement.

Proposition 11.9. If Π ⊂ Π^1, then S(Π, U) ⊂ S(Π^1, U).

Proof. More or less obvious, but here is the idea. Say Π^1 = {s0, τ, s1, . . . , sN} with 0 = s0 < τ < s1 < · · · < sN = 1 and u( · ) ∈ S(Π, U) with u(Ej) = uj,


j = 0, . . . , N − 1, Ej = (sj, sj+1), and E0 = (s0, s1). Let E^1_0 = (s0, τ) and E^1_1 = (τ, s1), and E^1_j = Ej−1, j = 2, . . . , N. Set u^1(E^1_j) = u(Ej−1), j = 2, . . . , N and u^1(E^1_0) = u0, u^1(E^1_1) = u0. This is an injection of S(Π, U) into S(Π^1, U). □

Corollary 11.10. The map ϕΠ : S(Π, U) → S(Π^1, X) extends to a map ϕΠ^1 : S(Π, U) → S(Π^1, X).

Corollary 11.11. If Π ⊂ Π^1, then inf{J(u) : u( · ) ∈ S(Π, U)} ≥ inf{J(u( · )) : u( · ) ∈ S(Π^1, U)}.

Observe also that if u( · ) ∈ S(Π, X), then

\[
\begin{aligned}
J(u(\cdot)) &= L(x_u(E_0), u_0)(s_1 - s_0) + \sum_{j=1}^{N-1} L(x_u(E_j), u_j)(s_{j+1} - s_j) \\
&= L(x_u(E_0^1), u_0)(\tau - s_0) + L(x_u(E_1^1), u_0)(s_1 - \tau) + \sum_{j=1}^{N-1} L(x_u(E_j), u_j)(s_{j+1} - s_j),
\end{aligned}
\]

since xu(E^1_0) = xu(E^1_1) = xu(E0).

Corollary 11.12. If Π ⊂ Π^1, then ρ(Π) ≥ ρ(Π^1).

The next elementary proposition is very useful in direct methods based on discretization since, if U is separable, then C(E, U) is dense in Lp(E, U) for p ≥ 1 (E compact).

Proposition 11.13. Suppose that u( · ) ∈ C(E, U) and Π^n is a sequence of partitions with ρ(Π^n) → 0 as n → ∞. Then there exists un( · ) ∈ S(Π^n, U) based on Π^n (and u( · )), such that un( · ) → u( · ) in || · ||∞.

Proof. Since E is compact, u( · ) is uniformly continuous. If ε > 0, there exists δ > 0 such that |s − t| < δ implies |u(s) − u(t)| < ε. Take n0 so that ρ(Π^n0) < δ and, with E_j^n0 = (s_j^n0, s_{j+1}^n0), set

\[ u_{n_0}(E_j^{n_0}) = u\big( s_j^{n_0} + (s_{j+1}^{n_0} - s_j^{n_0})/2 \big). \]

□

Corollary 11.14. If u( · ) ∈ C(E, U) and u( · ) ∈ U and ϕ is continuous, then xun( · ) → xu( · ).

Corollary 11.15. If U is separable and u( · ) ∈ Lp(E, U), p ≥ 1 for u( · ) ∈ U, then there exists Π^n, a sequence of partitions with ρ(Π^n) → 0 as n → ∞, such that un( · ) ∈ S(Π^n, U) converges to u( · ) in Lp(E, U).

Proof. Given ε > 0, there exists ũ( · ) ∈ C(E, U) with ||ũ( · ) − u( · )||p < ε/2 and there exists Π^n, un( · ) such that ||un( · ) − ũ( · )||p < ε/2 (as µ(E) = 1). But then ||un( · ) − u( · )||p ≤ ||un( · ) − ũ( · )||p + ||ũ( · ) − u( · )||p < ε/2 + ε/2 = ε. □
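The midpoint construction in the proof of Proposition 11.13 is easy to try out. The following sketch assumes, purely for illustration, the scalar control u(s) = sin(2πs) on E = [0, 1] and uniform partitions; the function names are hypothetical. Since this u is Lipschitz with constant 2π, the sup-norm error on a partition with mesh h should not exceed πh.

```python
import math

def step_approximation(u, n_cells):
    """Step function on the uniform partition of [0, 1] with n_cells cells,
    taking on each open cell the value of u at the cell midpoint."""
    h = 1.0 / n_cells
    values = [u((j + 0.5) * h) for j in range(n_cells)]
    def u_n(s):
        j = min(int(s / h), n_cells - 1)  # index of the cell containing s
        return values[j]
    return u_n

def sup_error(u, u_n, n_cells, samples_per_cell=20):
    """Approximate sup |u - u_n| by sampling interior points of every cell."""
    h = 1.0 / n_cells
    err = 0.0
    for j in range(n_cells):
        for k in range(samples_per_cell):
            s = (j + (k + 0.5) / samples_per_cell) * h
            err = max(err, abs(u(s) - u_n(s)))
    return err

def u(s):
    """Illustrative continuous control (an assumption, not from the text)."""
    return math.sin(2.0 * math.pi * s)

errors = {}
for n_cells in (10, 100, 1000):
    errors[n_cells] = sup_error(u, step_approximation(u, n_cells), n_cells)
    print(n_cells, errors[n_cells])
```

The errors decay in proportion to the mesh, in line with the uniform continuity argument; for a merely continuous u the rate would be governed by its modulus of continuity instead.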


Corollary 11.16. If ϕ is continuous, then xun( · ) → xu( · ) (in Lp(E, X)).

We note that if L is convex and ϕ is linear, then J(u( · )) is convex and the problem is convex.

Example 11.17. Let H and U be Hilbert spaces and let S(t) be a strongly continuous semi-group on H. Let B ∈ L(U, H) and let u( · ) ∈ L2(E, U) (E = [0, 1]). Then Bu( · ) ∈ L2([0, 1], H) since

\[ \int_0^1 \langle Bu(s), Bu(s) \rangle\,ds \le \int_0^1 |\langle Bu(s), Bu(s) \rangle|\,ds \le \|B\|^2 \cdot \|u(\cdot)\|_2^2. \]

Let xu(t) = ∫_0^t S(t − s)Bu(s) ds and let Lu( · ) = xu( · ). Then L is a bounded linear map of L2(E, U) → L2(E, H). For, if v( · ) = Bu( · ) so that xu(t) = ∫_0^t S(t − s)v(s) ds, then v( · ) ∈ L2(E, H) and, since ||S(t − s)|| ≤ M (i.e., is bounded) on E, we have ||xu(t)|| ≤ M ∫_0^t ||v(s)|| ds ≤ M||v( · )||2 and, consequently,

\[ \int_0^1 \langle x_u(t), x_u(t) \rangle\,dt \le M^2 \int_0^1 \|v(t)\|^2\,dt \le M^2 \|v(\cdot)\|_2^2. \]

We further observe that since B, L are linear, they are convex. Let

\[ J(u(\cdot)) = \frac{1}{2} \int_0^1 \{ \langle x_u(t), Qx_u(t) \rangle + \langle u(t), Ru(t) \rangle \}\,dt \]

where Q is positive and self-adjoint and R is positive, self-adjoint and coercive.

Remark 11.18. J(u( · )) is convex (and lower semi-continuous).

Proof. Either by L linear and the convexity of the integrand or directly as ⟨αxu1 + (1 − α)xu2, Q(αxu1 + (1 − α)xu2)⟩ ≤ α⟨xu1, Qxu1⟩ + (1 − α)⟨xu2, Qxu2⟩ and ⟨αu1 + (1 − α)u2, R(αu1 + (1 − α)u2)⟩ ≤ α⟨u1, Ru1⟩ + (1 − α)⟨u2, Ru2⟩ (proved just as the proof that t² is convex), so that

\[
\begin{aligned}
J(\alpha u_1 + (1 - \alpha)u_2) &\le \frac{\alpha}{2} \int_0^1 \{ \langle x_{u_1}, Qx_{u_1} \rangle + \langle u_1, Ru_1 \rangle \}\,dt + \frac{1 - \alpha}{2} \int_0^1 \{ \langle x_{u_2}, Qx_{u_2} \rangle + \langle u_2, Ru_2 \rangle \}\,dt \\
&\le \alpha J(u_1) + (1 - \alpha)J(u_2).
\end{aligned}
\]

□
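The inequality for the quadratic forms, "proved just as the proof that t² is convex", follows from a single identity. As a short worked step (stated for a generic positive self-adjoint operator Q and arbitrary elements x1, x2, which stand for xu1, xu2 or for u1, u2 with R in place of Q):

```latex
\begin{aligned}
&\alpha\langle x_1, Qx_1\rangle + (1-\alpha)\langle x_2, Qx_2\rangle
 - \langle \alpha x_1 + (1-\alpha)x_2,\; Q(\alpha x_1 + (1-\alpha)x_2)\rangle \\
&\qquad = \alpha(1-\alpha)\bigl[\langle x_1, Qx_1\rangle - 2\langle x_1, Qx_2\rangle + \langle x_2, Qx_2\rangle\bigr] \\
&\qquad = \alpha(1-\alpha)\,\langle x_1 - x_2,\; Q(x_1 - x_2)\rangle \;\ge\; 0,
\qquad 0 \le \alpha \le 1 .
\end{aligned}
```

Self-adjointness of Q is used to combine the two cross terms, and positivity gives the final sign.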


In other words, this linear quadratic problem is convex. The techniques outlined in Example 11.7 (and other direct methods) work well for this type of problem.

Example 11.19. Let E = [0, 1] and H1(E) = {f ∈ L2(E) : Dw f = f′ ∈ L2(E)} = W1,2(E) (Appendix E). Let ⟨f, g⟩ be the inner product in L2(E) and let [f, g]1 = ⟨f, g⟩ + [f, g] where

\[ [f, g] = \langle f', g' \rangle = \int_0^1 f'(s)g'(s)\,ds. \]

Consider the system

\[ -D_w^2 x = -\frac{d^2 x}{d\tau^2} = u, \qquad x(0) = 0, \qquad x'(1) = 0. \]

Let V = {f ∈ H1(E) : f(0) = 0} and [f, g]V = [f, g]1.

Claim 11.20. If xu( · ) is a solution of the system, then xu is characterized as (the) solution of the variational problem [xu, v] = ⟨u, v⟩ for all v ∈ V.

Proof.

\[
\begin{aligned}
\langle u, v \rangle &= \int_0^1 u(s)v(s)\,ds = \int_0^1 (-x''(s))v(s)\,ds = -x'v\,\Big|_0^1 + \int_0^1 x'(s)v'(s)\,ds \\
&= \int_0^1 x'(s)v'(s)\,ds = [x, v] \quad \text{(by parts and boundary conditions).}
\end{aligned}
\]

□
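The variational identity [xu, v] = ⟨u, v⟩ lends itself to computation once v is restricted to a finite-dimensional subspace of V. The sketch below is a minimal illustration under stated assumptions: u ≡ 1 (so that the exact solution of −x″ = 1, x(0) = 0, x′(1) = 0 is x(s) = s − s²/2), continuous piecewise linear "hat" functions on a hypothetical non-uniform partition, and a hand-written tridiagonal solver. The function name and the partition are illustrative choices.

```python
def galerkin_hat_solution(nodes):
    """Solve [x, v] = <u, v> with u = 1 over continuous piecewise linear functions
    on `nodes` (0 = s_0 < ... < s_n = 1) that vanish at s_0.
    Returns the nodal values x(s_1), ..., x(s_n)."""
    n = len(nodes) - 1
    h = [nodes[j + 1] - nodes[j] for j in range(n)]
    # K_ii = [phi_i, phi_i]; the last hat function has only a left half.
    diag = [1.0 / h[i] + (1.0 / h[i + 1] if i + 1 < n else 0.0) for i in range(n)]
    # K_{i,i+1} = K_{i+1,i} = [phi_{i+1}, phi_i]
    off = [-1.0 / h[i + 1] for i in range(n - 1)]
    # Load entries <1, phi_i>: the area under each hat function.
    load = [0.5 * h[i] + (0.5 * h[i + 1] if i + 1 < n else 0.0) for i in range(n)]
    # Thomas algorithm for the symmetric tridiagonal system K x = load.
    c, d = [0.0] * n, [0.0] * n
    c[0] = off[0] / diag[0] if n > 1 else 0.0
    d[0] = load[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off[i - 1] * c[i - 1]
        c[i] = off[i] / denom if i < n - 1 else 0.0
        d[i] = (load[i] - off[i - 1] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

nodes = [0.0, 0.1, 0.25, 0.5, 0.8, 1.0]
x_nodal = galerkin_hat_solution(nodes)
exact = [s - 0.5 * s * s for s in nodes[1:]]
max_error = max(abs(a - b) for a, b in zip(x_nodal, exact))
print(max_error)
```

For this one-dimensional problem with an exactly integrated load, the computed nodal values coincide with the exact solution up to rounding; between nodes the piecewise linear interpolant is of course only an approximation.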

Claim 11.21. Let Vn ⊂ V be a subspace of dimension n and consider

\[ [x_u, v] = \langle u, v \rangle, \qquad v \in V_n. \tag{11.22} \]

If ũ ∈ L2(E), this equation has a unique solution x̃ũ.

Proof. Let v1, . . . , vn be a basis of Vn. Set x = Σ_{j=1}^n xj vj and Kij = [vj, vi] and uj = ⟨u, vj⟩. Then (11.22) is equivalent to the problem Kx = u with K an n × n symmetric matrix. In an n-dimensional space, existence and uniqueness are equivalent. So, suppose there exists x ≠ 0 with Kx = 0. Then [x, vj] = 0 for j = 1, . . . , n and so [x, x] = ∫_0^1 (x′(s))² ds = 0 and hence x′( · ) = 0 and x( · ) is constant; but x(0) = 0, so we have a contradiction. □

Let Π = {s0, . . . , sn} be a partition of E and let Vn = {v( · )} such that

(i) v( · ) ∈ C(E) (or Cp(E));
(ii) v( · ) on Ej = [sj, sj+1], j = 0, . . . , n − 1 is a linear polynomial; and
(iii) v(0) = 0.

For instance let ϕi(sj) = δij, i = 1, . . . , n, e.g.,

[Figure: graph over the nodes s0, s1, s2.]

Then {ϕ1, . . . , ϕn} is a basis of Vn (as Σαiϕi(sj) = 0 implies αj = 0).

Claim 11.23. If v( · ) ∈ C(E), let vΠ = Σv(sj)ϕj. If v ∈ Vn, then v = vΠ.

The claim holds since v − vΠ is linear on Ej and 0 at the end points. Note that E is compact with non-empty interior and is a “regular” domain. Suppose now that V is a space of functions on E and Vn ⊂ V is an n-dimensional subspace. Let N = {ψ1, . . . , ψn} be a basis of Vn∗ (ψi being the restrictions of elements of V∗) and let {ϕ1, . . . , ϕn} be the dual basis, i.e., ψi(ϕj) = δij. N is called a nodal basis. Some examples:

A. V2 linear polynomials:

\[ \psi_1(v) = v(0), \qquad \psi_2(v) = v(1), \qquad \varphi_1(s) = 1 - s, \qquad \varphi_2(s) = s. \]

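B. V2 linear polynomials:

\[ \psi_1(v) = v(1/2), \qquad \psi_2(v) = v(1) - v(0), \]

\[ \varphi_1(s) = \begin{cases} 2s & \text{on } [0, 1/2], \\ 2(1 - s) & \text{on } [1/2, 1], \end{cases} \qquad \varphi_2(s) = \begin{cases} 0 & \text{on } [0, 1/2], \\ 2s - 1 & \text{on } [1/2, 1]. \end{cases} \]

[Figure: graphs of ϕ1 and ϕ2 on [0, 1], with ticks at 1/2 and 1.]

C. V2 piecewise linear polynomials:

\[ \psi_1(v) = \sqrt{2}\,v(1/2), \qquad \psi_2(v) = \sqrt{2}\,v(1), \]

\[ \varphi_1(s) = \begin{cases} \sqrt{2}\,s & \text{on } [0, 1/2], \\ 0 & \text{on } (1/2, 1], \end{cases} \qquad \varphi_2(s) = \begin{cases} 0 & \text{on } [0, 1/2], \\ (2s - 1)/\sqrt{2} & \text{on } [1/2, 1]. \end{cases} \]

[Figure: graphs of ϕ1 and ϕ2 on [0, 1], with ticks at 1/2 and 1.]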

Note here that ψi(ϕj) = δij and [ϕi, ϕj] = δij so that (11.22) becomes x = u, so that x = x1ϕ1 + x2ϕ2 with x1 = ⟨u, ϕ1⟩, x2 = ⟨u, ϕ2⟩. Observe now that the approach works also for the system

\[ -D_w^2 x = -\frac{d^2 x}{dt^2} = \varphi(u), \qquad x(0) = 0, \qquad x'(1) = 0, \]

with the characterization [xu, v] = ⟨ϕ(u), v⟩. The (simple) examples are “finite elements.” We also observe that if we seek a solution with x(0) = a, then x̃a = x + a will provide such a solution.

Now let E = [0, π] and consider L2(E). Let

\[ \varphi_n(s) = \sqrt{\frac{2}{\pi}}\,\sin ns, \qquad n = 1, 2, \ldots \]

Then the ϕn are linearly independent and, if m ≠ n, then

\[ \varphi_m(s)\varphi_n(s) = \frac{2}{\pi}\sin ms\,\sin ns \]

and

\[ \int_0^\pi \varphi_m(s)\varphi_n(s)\,ds = \frac{2}{\pi}\Big[ -\frac{\sin(m + n)s}{2(m + n)} + \frac{\sin(m - n)s}{2(m - n)} \Big]_0^\pi = 0. \]

Moreover,

\[ \varphi_n(s)\varphi_n(s) = \frac{2}{\pi}(\sin ns)^2 \]

and

\[ \frac{2}{\pi}\int_0^\pi (\sin ns)^2\,ds = \frac{2}{\pi n}\int_0^{n\pi} \sin^2 r\,dr = \frac{2}{\pi n}\Big[ \frac{r}{2} - \frac{1}{4}\sin 2r \Big]_0^{n\pi} = 1. \]


In other words, ⟨ϕm, ϕn⟩ = δmn. Next observe that

\[ \varphi_n'(s) = \sqrt{\frac{2}{\pi}}\,n\cos ns, \qquad \varphi_m'(s)\varphi_n'(s) = \frac{2}{\pi}\,nm\cos ms\,\cos ns, \]

and, if m ≠ n, then

\[ \frac{2nm}{\pi}\int_0^\pi \cos ms\,\cos ns\,ds = \frac{2nm}{\pi}\Big[ \frac{\sin(m + n)s}{2(m + n)} + \frac{\sin(m - n)s}{2(m - n)} \Big]_0^\pi = 0. \]

Let ϕ̃n(s) = ϕn(s)/n so that ϕ̃n′(s) = ϕn′(s)/n. Then

\[ \int_0^\pi \tilde\varphi_m'(s)\tilde\varphi_n'(s)\,ds = \frac{1}{nm}\int_0^\pi \varphi_m'(s)\varphi_n'(s)\,ds = 0 \]

if m ≠ n. In addition

\[
\begin{aligned}
\int_0^\pi \tilde\varphi_n'(s)\tilde\varphi_n'(s)\,ds &= \frac{1}{n^2}\int_0^\pi \varphi_n'(s)\varphi_n'(s)\,ds = \frac{2}{\pi}\int_0^\pi (\cos ns)^2\,ds \\
&= \frac{2}{\pi n}\int_0^{n\pi} \cos^2 r\,dr = \frac{2}{n\pi}\Big[ \frac{r}{2} + \frac{1}{4}\sin 2r \Big]_0^{n\pi} = 1
\end{aligned}
\]

so that [ϕ̃m, ϕ̃n] = δmn. Let VN = span{ϕ1, . . . , ϕN} = span{ϕ̃1, . . . , ϕ̃N}. If v ∈ L2(E), let ψj and ψ̃j be given by ψj(v) = ⟨v, ϕj⟩, ψ̃j(v) = ⟨v, ϕ̃j⟩. These are elements of L2(E)∗ = L2(E) and VN∗. Moreover, ψi(ϕj) = δij so that {ψi} is dual to {ϕj}.

Exercise 11.24. Let V = {f ∈ H1(E) : f(0) = 0} and VN = span{ϕ1, . . . , ϕN} = span{ϕ̃1, . . . , ϕ̃N}. Analyze this situation.

Consider now the problem, E = [0, π],

\[ -\frac{d^2 x}{dt^2} = u, \qquad x(0) = a, \qquad u \in H^1(E), \]


and

\[ J(u) = \int_0^\pi L(x_u(s), u(s))\,ds, \]

and minimize J(u). Then xu = yu + a where

\[ -\frac{d^2 y_u}{dt^2} = u, \qquad y_u(0) = 0. \]

Let VN = span{ϕ̃1, . . . , ϕ̃N} = span{ϕ1, . . . , ϕN} and note that [ϕ̃m, ϕ̃n] = 0, ⟨ϕ̃m, ϕ̃n⟩ = 0 if m ≠ n. Recall from Claim 11.21 that if yu = Σyjϕ̃j and K = (Kij) = ([ϕ̃j, ϕ̃i]) and u = Σujϕ̃j, then y = (yj) is a solution of the (matrix) equation Ky = u. Now here K is symmetric and diagonal and in fact K = I so that y = u. It follows that, for u ∈ VN,

\[ J(u) = \int_0^\pi L(u(s) + a, u(s))\,ds = \int_0^\pi L\Big( \sum u_j\tilde\varphi_j + a,\ \sum u_j\tilde\varphi_j \Big)\,ds. \]

In other words, the minimization is over an N-dimensional space.

Example 11.25. Let N = 2 and

\[ J(u) = \frac{1}{2}\int_0^\pi (x^2 + u^2)\,ds. \]

Then x = y + a and

\[ J(u) = \frac{1}{2}\int_0^\pi [(y + a)^2 + u^2]\,ds = \frac{1}{2}\int_0^\pi [(u + a)^2 + u^2]\,ds = \int_0^\pi (u^2 + au)\,ds + \frac{1}{2}\int_0^\pi a^2\,ds. \]

But u = u1ϕ̃1 + u2ϕ̃2 and ∫_0^π a² ds is independent of u, so, on V2, we want to minimize

\[ J(u) = \langle u, Qu \rangle + \langle a, Qu \rangle \]

where

\[ u = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}, \qquad a = \begin{pmatrix} a \\ a \end{pmatrix}, \]

⟨ , ⟩ is the inner product in R² and

\[ Q = \begin{pmatrix} \langle \tilde\varphi_1, \tilde\varphi_1 \rangle_{L_2} & 0 \\ 0 & \langle \tilde\varphi_2, \tilde\varphi_2 \rangle_{L_2} \end{pmatrix} \]

where ⟨ , ⟩L2 is the inner product in L2(E), i.e., ⟨ϕ̃i, ϕ̃i⟩L2 = ∫_0^π ϕ̃i²(s) ds. Clearly, there is a solution u (why?). What is the situation for arbitrary N?

Chapter 12

Approximation I: Generalities

Let $Y, Z$ be Banach spaces and let $J : Z \to \mathbb{R}$ and $\Phi : Z \to Y$. Consider the prototype problem:

Problem 12.1 (P). Find $\inf\{J(z) : \Phi(z) = 0\}$, i.e., if $V(\Phi) = \{z \in \operatorname{dom} \Phi : \Phi(z) = 0\}$, then find $\inf\{J(z) : z \in V(\Phi)\}$.

What are reasonable approaches to defining the notion of an approximation to P? We shall examine three (somewhat) different approaches.

Approach 1. Let $W_m, Y_m$ be Banach spaces, $J_m : W_m \to \mathbb{R}$ and $\Phi_m : W_m \to Y_m$. Consider

Problem 12.2 ($P_m$). Find $\inf\{J_m(w_m) : \Phi_m(w_m) = 0\}$, i.e., $\inf\{J_m(w_m) : w_m \in V(\Phi_m)\}$.

Definition 12.3. $\{P_m\}$ weakly approximates P if
(i) $J_m^* \to J^*$ where $J_m^* = \inf\{J_m(w_m) : w_m \in V(\Phi_m)\}$ and $J^* = \inf\{J(z) : z \in V(\Phi)\}$;
(ii) there exists $m_0$ such that, for $m \geq m_0$, there exists $w_m^*$ with $J_m^* = J_m(w_m^*)$ and $w_m^* \in V(\Phi_m)$.

Suppose that there are surjective bounded linear maps $\psi_m : Z \to W_m$. Then, for $m \geq m_0$, there exists a $z_m^*$ with $\psi_m(z_m^*) = w_m^*$; but, $J_m^* = \inf\{J(w_m) : w_m \in V(\Phi_m)\} = J(w_m^*) = \inf\{(J \circ \psi_m)(z_m)\}$. If, for $m$ sufficiently large, $\inf\{(J \circ \psi_m)(z) : z \in V(\Phi)\} = J_m^*$ and $z_m^* \in V(\Phi)$, then $z_m^*$ is a minimizing sequence. An interesting case is where the $W_m$ are closed subspaces of $Z$ and $W_m \subset W_{m+1} \subset \cdots$ (the reader should consider the case where $Z$ is a Hilbert space).

Definition 12.4. $\{P_m\}$ strongly approximates P if
(i) $\{P_m\}$ weakly approximates P; and

© Springer Science+Business Media, LLC, part of Springer Nature 2019 P. Falb, Direct Methods in Control Problems, https://doi.org/10.1007/978-0-8176-4723-0_12


(ii) for $m \geq m_1$, there exist surjective bounded linear maps $\psi_m : Z \to W_m$ such that $\psi_m(z_m^*) = w_m^*$ and $z_m^* \to z^*$ with $z^* \in V(\Phi)$ and $J(z_m^*) = J_m^* = J_m(w_m^*) \to J^*$.

In the case of a strong approximation, $z_m^*$ is a convergent minimizing sequence.

Example 12.5. Let $E = [t_0, t_1]$ and let $Z = C_p^1(E)$ and $\Phi(z) = \dot z - f(t, z(t))$ with $z(t_0) = z_0$. Let $\Pi = \{s_0, \dots, s_n\}$ be a partition of $E$ and $E_0 = \{s_0\}$, $E_j = (s_{j-1}, s_j]$, $j = 1, \dots, n$, so that $E = \bigcup E_j$ is a disjoint union. Let $\mu(E_j) = h_j$, $j = 1, \dots, n$, and $h = \sup\{\mu(E_j)\}$. Here $V(\Phi) = \{z(\cdot) : \dot z - f(t, z) = 0,\ z(t_0) = z_0\}$ and $J$ is, for the moment, simply a "nice" map of $Z \to \mathbb{R}$. We begin by approximating $V(\Phi)$. Consider $\Phi_n$ where we have
\[
z(s_0) = z_0, \qquad z(s_j) = z(s_{j-1}) + h_j f(s_{j-1}, z(s_{j-1})), \qquad j = 1, \dots, n.
\]
Then, $\Phi_n$ is, in some sense, an approximation or, here, a discretization of $\Phi$. Let $F(s)$ be given by
\[
F(s) = \sum_{j=1}^n \chi_j(s) f(s_{j-1}, z(s_{j-1}))
\]
where $\chi_j(\cdot)$ is the characteristic function of $E_j$. Then
\[
\int_{s_{j-1}}^{s_j} F(s)\, ds = (s_j - s_{j-1}) f(s_{j-1}, z(s_{j-1})) = h_j f(s_{j-1}, z(s_{j-1}))
\]
for $j = 1, \dots, n$, and so $\Phi_n$ becomes
\[
z(s) - z_0 - \int_{t_0}^s F(r)\, dr = 0.
\]
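The discretization $\Phi_n$ is the explicit Euler recursion on the partition $\Pi$. The sketch below is only an illustration under stated assumptions: the test equation $\dot z = -z$, $z(0) = 1$ on $[0, 1]$ and the uniform partition are our choices, and the helper names are hypothetical.

```python
import math

def euler_on_partition(f, z0, partition):
    # z(s_0) = z0,  z(s_j) = z(s_{j-1}) + h_j * f(s_{j-1}, z(s_{j-1})).
    z = [z0]
    for j in range(1, len(partition)):
        h_j = partition[j] - partition[j - 1]
        z.append(z[-1] + h_j * f(partition[j - 1], z[-1]))
    return z

def uniform_partition(t0, t1, n):
    # s_j = t0 + j (t1 - t0) / n, j = 0, ..., n (an assumed, convenient choice).
    return [t0 + (t1 - t0) * j / n for j in range(n + 1)]

def endpoint_error(n):
    # Assumed test problem: z' = -z, z(0) = 1, exact solution exp(-t).
    partition = uniform_partition(0.0, 1.0, n)
    z = euler_on_partition(lambda t, z: -z, 1.0, partition)
    return abs(z[-1] - math.exp(-1.0))
```

Evaluating `endpoint_error(n)` for increasing `n` shows the error at $t_1$ shrinking with the mesh $h = 1/n$.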

If $\{E_j^\alpha\}$ is a refinement of $\{E_j\}$ with $\sup\{\mu(E_j^\alpha)\} = h_\alpha < h$, then as $h_\alpha \downarrow 0$, the solution of $\Phi_\alpha = 0$ converges to the solution of $\Phi = 0$ (modulo the usual assumptions on $f$). Assuming appropriate properties of $J$, we get an approximation via $J_n$, i.e., $J$ discretized.

Example 12.6. Let $E = [t_0, t_1]$ and $\sigma : E \to P(U)$ a bounded measurable map and consider the system
\[
\dot z(t) = v_\sigma(t, z(t)), \qquad z(t_0) = z_0, \qquad v_\sigma(t, z(t)) = \int_U f(t, z(t), u)\, \mu_\sigma(du)(t).
\]
As in Example 12.5, let $\{s_0, \dots, s_n\}$ be a partition with $E_0 = \{s_0\}$, $E_j = (s_{j-1}, s_j]$, $\mu(E_j) = h_j$, $h = \sup h_j$. Let $\Phi_h$ be the system
\[
z(s_0) = z_0, \qquad z(s_j) = z(s_{j-1}) + h_j v_\sigma(s_{j-1}, z(s_{j-1})),
\]
and let
\[
F(s) = \sum_{j=1}^n \chi_j(s) v_\sigma(s_{j-1}, z(s_{j-1}))
\]
so that
\[
\int_{s_{j-1}}^{s_j} F(s)\, ds = h_j v_\sigma(s_{j-1}, z(s_{j-1})),
\]
and so, $\Phi_h$ becomes
\[
z_h(s) - z_0 - \int_{t_0}^s F(r)\, dr = 0.
\]

Similar comments follow as in Example 12.5.

Approach 2. Consider Problem 12.1 (P) and suppose that $E$ is a compact metric space with a regular finite Borel measure $\mu$. Let $\{E_j\}$ be a partition of $E$ and $\chi_j$ be the characteristic function of $E_j$. Assume that the elements of $Z$ are maps $x : E \to X$, $X$ a Banach space. If $t \in E$ and $x \in Z$, let
\[
\tilde x(t) = \sum \chi_j(t) x(t) = \sum \tilde x_j(t), \qquad \tilde x_j(t) = \chi_j(t) x(t).
\]
We call $\{E_j\}$ a $\delta$-partition if $\sup\{\mu(E_j)\} < \delta$. Let $\hat x : E \to X$ be a regulated function with $\hat x(E_j) = x_j$. We call $\hat x(\cdot)$ an $(\epsilon, \delta)$-approximation if
(i) $\{E_j\}$ is a $\delta$-partition;
(ii) $\|\Phi(\hat x)\| < \epsilon$, i.e., $\hat x$ is close to $V(\Phi)$;
(iii) $|J(\hat x) - J^*| < \epsilon$ where $J^* = \inf\{J(x) : x \in V(\Phi)\}$.
[Note that the definition presupposes all items are defined.] Suppose, for simplicity (without loss of generality), that $\delta = \epsilon$. Let $P_\epsilon$ be the problem:

Problem 12.7 ($P_\epsilon$). Find an $(\epsilon, \epsilon)$-approximation to P.

Definition 12.8. Let $\epsilon_n \downarrow 0$ and let $P_n = P_{\epsilon_n}$. Let $x_n^*$ be an $\epsilon_n$-approximation, i.e., a solution of $P_n$. Then $\{P_n\}$ weakly approximates P if, for $n \geq n_0$, $J(x_n^*) \to J^*$ and $\Phi(x_n^*) \to 0$. $\{P_n\}$ strongly approximates P if $\{P_n\}$ weakly approximates P and $x_n^* \to x^*$ with $x^* \in V(\Phi)$ (i.e., $x_n^*$ is (in a sense) a convergent minimizing sequence).


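Before providing an example, we consider still another approach [O-1].

Approach 3. Let $Z, W, U$ be Banach spaces and let $Z_d, Y_d$ be Banach spaces and let $\varphi_d : Z \to Z_d$, $\psi_d : Y \to Y_d$ be surjective bounded linear maps. Suppose that $F : Z \to Y$.

Definition 12.9. $(\varphi_d, \psi_d)$ is called a discretization of $Z, Y$ and $F$ is compatible with the discretization $(\varphi_d, \psi_d)$ if $\varphi_d(z_1) = \varphi_d(z_2) \Rightarrow \psi_d(F(z_1)) = \psi_d(F(z_2))$. In other words, if $z_1 - z_2 \in \operatorname{Ker} \varphi_d$, then $F(z_1) - F(z_2) \in \operatorname{Ker} \psi_d$.

Let $F_d : Z_d \to Y_d$ be given by
\[
F_d(z_d) = \psi_d(F z), \qquad z_d = \varphi_d(z),
\]
or, equivalently,
\[
F_d(\varphi_d(z)) = \psi_d(F(z)).
\]
If $F$ is compatible, then $F_d$ is well defined and completes the diagram
\[
\begin{array}{ccc}
Z & \xrightarrow{\ F\ } & Y \\
\big\downarrow{\scriptstyle \varphi_d} & & \big\downarrow{\scriptstyle \psi_d} \\
Z_d & \xrightarrow{\ F_d\ } & Y_d
\end{array}
\]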

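Let $\tilde F : Z/\operatorname{Ker} \varphi_d \to Y/\operatorname{Ker} \psi_d$ be given by $\tilde F(\bar z) = \overline{F(z)}$ (where the overbar indicates the appropriate residue). If $F$ is compatible, then $\tilde F$ is well defined and completes the diagram
\[
\begin{array}{ccc}
Z/\operatorname{Ker} \varphi_d & \xrightarrow{\ \tilde F\ } & Y/\operatorname{Ker} \psi_d \\
\big\downarrow{\scriptstyle \bar\varphi_d} & & \big\downarrow{\scriptstyle \bar\psi_d} \\
Z_d & \xrightarrow{\ \bar F_d\ } & Y_d
\end{array}
\]
where $\bar\varphi_d(\bar z) = \varphi_d(z)$, $\bar\psi_d(\bar y) = \psi_d(y)$, $\bar y = \overline{F(z)}$ and $\bar F_d = F_d$.

Example 12.10. Let $\Phi : Z \to Y$, $V(\Phi, y) = \{z : \Phi(z) = y\}$ and $J : Z \to \mathbb{R}$. Let $J^*(\Phi, y) = \inf\{J(z) : z \in V(\Phi, y)\}$. Let $\varphi_d : Z \to Z_d$, $\psi_d : Y \to Y_d$ be surjective linear maps. Let $\Phi_d : Z \to Y_d$ be given by $\Phi_d(z) = \psi_d(\Phi(z))$ and let $z_d = \varphi_d(z)$. Define a map $\tilde\Phi_d : Z_d \to Y_d$ by
\[
\tilde\Phi_d(z_d) = \Phi_d(z), \qquad z_d = \varphi_d(z).
\]
If $z \in V(\Phi, y)$, then $\psi_d(\Phi(z)) = \psi_d(y) = y_d$ and $\tilde\Phi_d(z_d) = y_d$. If $\Phi$ is compatible, then $\varphi_d(z_1) = \varphi_d(z_2)$ implies $\psi_d(\Phi(z_1)) = \psi_d(\Phi(z_2))$. Then $z \in V(\Phi, y)$ implies $z_d \in V(\tilde\Phi_d, y_d)$, $y_d = \psi_d(y)$, i.e., $\varphi_d(z) \in V(\tilde\Phi_d, y_d)$, and compatibility carries over.

However, in general, compatibility is too strong a requirement, so we consider an intermediate factorization. Let $K \in L(W, Y)$ and $F : Z \to Y$.

Definition 12.11. $F$ is $K$-factorable if there exists $\Theta : Z \to W$ with $\operatorname{dom} \Theta = \operatorname{dom} F$ such that $F = K\Theta$.

Let $\mathcal{F}(K) = \{F : F \text{ is } K\text{-factorable}\}$ and suppose $L$ is another element of $L(W, Y)$. Define a map $\mathcal{K}(K, L) : \mathcal{F}(K) \to \mathcal{F}(L)$ by $\mathcal{K}(K, L)(F) = \mathcal{K}(K, L)(K\Theta) = L\Theta = G$ (note $\operatorname{dom} G = \operatorname{dom} \Theta$).

Definition 12.12. $F \in \mathcal{F}_d(K, L)$ if (i) $F \in \mathcal{F}(K)$; and (ii) $\operatorname{dom} \mathcal{K}(K, L) \subset \operatorname{dom} \varphi_d$, i.e., $F = K\Theta$, $L\Theta = G$ with $\operatorname{dom} \Theta \subset \operatorname{dom} \varphi_d$.

We have the diagrams:
\[
\begin{array}{ccccc}
Z & \xrightarrow{\ \Theta\ } & W & \xrightarrow{\ K\ } & Y \\
\big\downarrow{\scriptstyle \varphi_d} & & & & \big\downarrow{\scriptstyle \psi_d} \\
Z_d & & & & Y_d
\end{array}
\qquad
\begin{array}{ccccc}
Z & \xrightarrow{\ \Theta\ } & W & \xrightarrow{\ L\ } & Y \\
\big\downarrow{\scriptstyle \varphi_d} & & & & \big\downarrow{\scriptstyle \psi_d} \\
Z_d & & & & Y_d
\end{array}
\]
So compatibility means $\psi_d(K\Theta) = (K\Theta)_d \circ \varphi_d$ where $(K\Theta)_d : Z_d \to Y_d$,
\[
(K\Theta)_d(z_d) = \psi_d(K\Theta z),
\]
and $z_d = \varphi_d(z)$. If $F \in \mathcal{F}_d(K, L)$, then there is a map $F_d : Z_d \to Y_d$ given by
\[
F_d(z_d) = \psi_d(L\Theta z), \qquad \varphi_d(z) = z_d,
\]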

or, equivalently, $(L\Theta)_d(z_d)$ or $\mathcal{K}(K, L)(z_d)$. Let us look at an example to illuminate the opaqueness of the abstraction.

Example 12.13. Let
\[
z(t) = z_0 + \int_{t_0}^t f(s, z(s))\, ds
\]
and let
\[
(\Theta z)(r) = f(r, z(r)), \qquad (Kw)(r) = \int_{t_0}^r w(s)\, ds,
\]
so that
\[
(Fz)(r) = (K\Theta)(z)(r) = \int_{t_0}^r f(s, z(s))\, ds.
\]
Let $L = K_d$ where
\[
(K_d w)(r) = \sum_{j=1}^n \alpha_j \chi_j(r) w(s_j)
\]
(where $\{s_j\}$ is a partition of the interval). Compatibility requires $\varphi_d(z) = \psi_d(z)$ here and so we need the same partition. [The reader should work through this example.]

Now let us examine still another formulation. Let $E$ be a finite regular Borel measure space and let $X, U$ be Banach spaces. Let $u : E \to U$, $x : E \to X$, $X_0 \subset X$, $\sigma : E \to P(U)$, $\Omega = \{u : u \text{ bounded measurable}\}$, and $\Omega_g = \{\sigma : \sigma \text{ bounded measurable}\}$. Suppose that, if $u \in \Omega$, $x_0 \in X_0$, then there exists a unique $x_u$ such that $\Phi(x_u, u, x_0) = 0$ (where $\Phi$ maps appropriate function spaces into $X$). Similarly, suppose that if $\sigma \in \Omega_g$ and $x_0 \in X_0$, there exists a unique solution $x_\sigma$ such that $\Phi_g(x_\sigma, \sigma, x_0) = 0$. As in previous chapters, we view $\Omega \subset \Omega_g$.

Definition 12.14 ([W-2]). $\Phi_g$ is a Young Extension of $\Phi$ if $\Phi_g(x_u, \delta_u, x_0) = \Phi(x_u, u, x_0)$ (all $u, x_0, x_u$). Similarly, $x_{g,\sigma}$ is a Young Extension of $x_u = x(u, x_0)$ if $x_{g,\sigma} = x(\sigma, x_0) = x(u, x_0)$, $\sigma = \delta_u$ (Dirac measure).

Let $J : \Omega \times X_0 \to \mathbb{R}$ be given by $J(u, x_0) = \psi(x_u, u, x_0)$ and $J_g : \Omega_g \times X_0 \to \mathbb{R}$ be given by $J_g(\sigma, x_0) = \psi_g(x_\sigma, \sigma, x_0)$. Suppose that $J_g$ is a Young Extension of $J$. Let $\Sigma_\alpha = \{A_\alpha \subset U,\ A_\alpha \neq \emptyset\}$ be a family of subsets of $U$ and let $\Sigma_\alpha(X) = \{B \subset X :$ (i) $X_0 \subset B$; (ii) if $\varphi : E \to \Sigma_\alpha$ is given, then $u(t) \in \varphi(t)$ a.e. implies $x_u(t) \in B$ a.e.$\}$. These represent constraints. One can define $\Sigma_{g,\alpha}$ and $\Sigma_{g,\alpha}(X)$ similarly.


Problem 12.15 (A). Minimize $J : \Omega \times X_0 \to \mathbb{R}$ subject to $x_0 \in X_0$, $u(t) \in \varphi(t)$ a.e., and $\Phi(x_u, u, x_0) = 0$.

Problem 12.16 (B). Minimize $J : \Omega \times X_0 \to \mathbb{R}$ subject to $x_0 \in X_0$, $u(t) \in \varphi(t)$ a.e., $x_u(t) \in$ some $B \in \Sigma_\alpha(X)$ a.e., and $\Phi(x_u, u, x_0) = 0$.

Clearly we can formulate Problems $A_g$, $B_g$ analogously.

Definition 12.17. An approximate solution to Problem 12.15 (A) is a sequence $\{(u_n, x_{0,n})\}$ with $u_n \in \Omega$, $x_{0,n} \in X_0$, $u_n(t) \in \varphi(t)$ a.e. on $E$ such that $J(u_n, x_{0,n}) \to J^* = \inf J$, $x_n = x_{u_n}(u_n, x_{0,n}) \to x^*$ as $n \to \infty$ (i.e., a convergent minimizing sequence).

Definition 12.18. An approximate solution to Problem 12.16 (B) is a sequence $\{(u_n, x_{0,n})\}$ with $u_n \in \Omega$, $x_{0,n} \in X_0$, $u_n(t) \in \varphi(t)$ a.e. and $x_n = x_{u_n}(u_n, x_{0,n}) \in \Sigma_\alpha(X)$ a.e. (i.e., in an element $B$ of $\Sigma_\alpha(X)$) such that $J(u_n, x_{0,n}) \to J^*$ $(= \inf J)$, $x_n = x_{u_n}(u_n, x_{0,n}) \to x^*$, i.e., a convergent minimizing sequence.

We can similarly define approximate solutions for the "generalized" Problems $A_g$, $B_g$.

Example 12.19. Let $f : E \times X \times U \to X$, $K : X_0 \to \mathbb{R}$. Suppose (1) if $x \in X$, then $f(\cdot, x, \cdot) : E \times U \to X$ is an element of $L^1(E, C(U, X))$ so that $f(t, x, \cdot) \in C(U, X)$ a.e. in $t$; (2) there is a $\psi \in L^1(E, \mathbb{R})$ such that $\|f(t, x_1, u) - f(t, x_2, u)\| \leq \psi(t) \|x_1 - x_2\|$ for $t \in E$, $x_1, x_2 \in X$ and $u \in \operatorname{co}(U) = U$. Let $\sigma : E \to P(U)$ be bounded measurable. Then
\[
x_\sigma(t) = x_0 + \int_{t_0}^t \int_U f(s, x_\sigma(s), u)\, \mu_\sigma(du)(s)\, ds
\]
has a unique absolutely continuous solution for $x_0 \in X_0$ and
\[
\dot x_\sigma(t) = \int_U f(t, x_\sigma(t), u)\, \mu_\sigma(du)(t)
\]
a.e. in $t$. Let $\Omega = \{u(\cdot) : u : E \to U, \text{ bounded measurable}\}$. If $u \in \Omega$, let $\sigma_u(t) = \mu_u(t)$ where $\mu_u(\cdot)$ is the atom with $\mu_u(t)(A) = 1$ if $u(t) \in A$ and $0$ if $u(t) \notin A$.


Let $\Omega_g = \{\sigma(\cdot) : \sigma : E \to P(U),\ \sigma \text{ bounded measurable}\}$. Then $\Omega \subset \Omega_g$ via $u \to \sigma_u$ and
\[
\int_U f(s, x_{\sigma_u}(s), u)\, \mu_{\sigma_u}(du)(s) = f(s, x_u(s), u(s)).
\]
Let $x : \Omega \times X_0 \to X$ via
\[
x(u, x_0) = x_0 + \int_E f(s, x_u(s), u(s))\, ds
\]
and let $J : \Omega \times X_0 \to \mathbb{R}$ be given by
\[
J(u(\cdot), x_0) = K(x_0) + \int_E L(s, x_u(s), u(s))\, ds
\]
with
\[
x_u(s) = x_0 + \int_{t_0}^s f(r, x_u(r), u(r))\, dr.
\]
Consider $\Omega_g \times X_0$ and $x_g : \Omega_g \times X_0 \to X$ where
\[
x_g(\sigma(\cdot), x_0) = x_0 + \int_E \int_U f(s, x_\sigma(s), u)\, \mu_\sigma(du)(s)\, ds.
\]
Then $x_g = x$ on $\Omega \times X_0$ and is a Young extension or representation of $x$. Also, let $J_g : \Omega_g \times X_0 \to \mathbb{R}$ be given by
\[
J_g(\sigma(\cdot), x_0) = K(x_0) + \int_E \int_U L(s, x_\sigma(s), u)\, \mu_\sigma(du)(s)\, ds.
\]
Then $J_g$ is a Young extension of $J$. By Prokhorov's theorem, $\Omega_g$ is weak-*-compact and if $U$ is compact, then $P(U)$ and $\Omega_g$ are compact. As we saw in Chapter 7, under appropriate conditions, there are approximations for both Problem 12.15 (A) and Problem 12.16 (B) and Problems $A_g$, $B_g$.

Example 12.20. Consider again the general problem: minimize $J(z)$ subject to $z \in V(\Phi)$. Let $H(z, \lambda) = J(z) + \lambda[\Phi(z)]$ where $\lambda \in Y^*$. If $z_*$ is a local minimum of $J$ on $V(\Phi)$, then $z_*$ is also a local minimum of $H(z, \lambda)$ on $V(\Phi)$. It follows that $D_z H(z_*, \lambda) = D_z J(z_*) + \lambda D_z \Phi(z_*) = 0$ and $D_\lambda H(z_*, \lambda) = \Phi(z_*) = 0$ (assuming the derivatives exist).
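To make the stationarity conditions concrete, here is a small finite-dimensional sketch. Everything in it is an assumption chosen for illustration: $Z = \mathbb{R}^2$, $J(z) = z_1^2 + z_2^2$, $\Phi(z) = z_1 + z_2 - 1$, so that $D_z H = 0$ and $\Phi = 0$ form a linear system in $(z_1, z_2, \lambda)$, solved here exactly by Gauss-Jordan elimination.

```python
from fractions import Fraction

def solve_linear(A, b):
    # Gauss-Jordan elimination over exact rationals; A is assumed invertible.
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def stationary_point():
    # D_z H = (2 z1 + lam, 2 z2 + lam) = 0 and Phi(z) = z1 + z2 - 1 = 0.
    A = [[2, 0, 1],
         [0, 2, 1],
         [1, 1, 0]]
    b = [0, 0, 1]
    z1, z2, lam = solve_linear(A, b)
    return z1, z2, lam
```

For this assumed problem the conditions single out $z_* = (1/2, 1/2)$ with $\lambda = -1$, which is also the constrained minimizer because $J$ is convex; in general the conditions are only necessary.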


These necessary conditions (translated to standard control problems) have been used for approximation. The drawback, of course, is that the conditions are not sufficient. In effect, one approximates the problem $D_z J(z) + \lambda D_z \Phi(z) = 0$, $D_\lambda H(z, \lambda) = \Phi(z) = 0$ by problems of the form $D_z J_m(z_m) + \lambda_m D_z \Phi_m(z_m) = 0$, $D_{\lambda_m} H(z_m, \lambda_m) = \Phi_m(z_m) = 0$ where these represent "discretizations."

Let $Z = H$ be a Hilbert space and let $J : H \to \mathbb{R}$. Suppose $V \subset H$ ($V$ may be of the form $V = V(f)$ where $f : H \to Y$, $Y$ another Hilbert space). Let $\{\varphi_j\}_{j=1}^\infty$ be an orthonormal basis of $H$ and $H_m = \operatorname{span}\{\varphi_1, \dots, \varphi_m\}$ so that $H_m \subset H_{m+m_1}$, $m_1 = 1, 2, \dots$. $H_m$ is a closed subspace for all $m$. Let $E_m : H \to H_m$ be the projection so that $E_m$ is a surjective bounded linear map; and let $V_m = E_m V$. Consider

Problem 12.21 (P). $\inf\{J(z) : z \in V\}$.

Problem 12.22 ($P_m$). $\inf\{J_m(z_m) : z_m \in V_m,\ J_m = J \circ E_m\}$.

We call $P_m$ a Ritz–Galerkin approximation to P. If there is a solution $z^*$ of P, then for large $m$, there is a $z_m$ (arbitrarily) close to $z^*$ with $J_m(z_m) = J(z_m) = J(E_m z^*)$ (arbitrarily) close to $J^*$. In other words, there is a convergent minimizing sequence. In any case, if $J^* = \inf\{J(z) : z \in V\}$ is finite, then there is a minimizing sequence $\{z_m \in V_m\}$.

This somewhat opaque chapter is at the heart of the idea behind direct methods. We will further develop the ideas concretely in the sequel. The abstract formulations all have concrete versions in control problems.
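The Ritz–Galerkin problems $P_m$ can be illustrated on a toy case. The sketch below rests on assumptions of our own: $H = L^2[0, \pi]$ with the orthonormal basis $\varphi_n(s) = \sqrt{2/\pi}\, \sin ns$, $V = H$, and the cost $J(z) = \|z - g\|^2$ for the hypothetical target $g(s) = s(\pi - s)$, so that $J^* = 0$ and the minimizer over $H_m$ is $E_m g$.

```python
import math

def integrate(func, a, b, steps=4000):
    # Composite midpoint rule on [a, b].
    h = (b - a) / steps
    return h * sum(func(a + (k + 0.5) * h) for k in range(steps))

def phi(n, s):
    # Assumed orthonormal basis of L^2[0, pi].
    return math.sqrt(2.0 / math.pi) * math.sin(n * s)

def g(s):
    # Hypothetical target function defining the cost J(z) = ||z - g||^2.
    return s * (math.pi - s)

def coefficients(m):
    # Coordinates <g, phi_n>, n = 1, ..., m, of the projection E_m g.
    return [integrate(lambda s, n=n: g(s) * phi(n, s), 0.0, math.pi)
            for n in range(1, m + 1)]

def ritz_value(m):
    # J_m^* = min over H_m of ||z_m - g||^2, attained at z_m = E_m g.
    c = coefficients(m)

    def residual_sq(s):
        proj = sum(cn * phi(n, s) for n, cn in enumerate(c, start=1))
        return (g(s) - proj) ** 2

    return integrate(residual_sq, 0.0, math.pi)
```

The values `ritz_value(1)`, `ritz_value(3)`, `ritz_value(5)` decrease toward $J^* = 0$, and the projections $E_m g$ form a convergent minimizing sequence in this example.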

Chapter 13

Approximation II: Standard Discretizations

Let $E = [t_0, t_1]$ and consider the control problem: minimize
\[
J(u) = \int_{t_0}^{t_1} L(x, u)\, dt
\]
subject to
\[
\dot x = f(x, u), \qquad x(t_0) = x_0,
\]
and $u(\cdot) \in L^2(E, U)$ (or $L^\infty(E, U)$) with $U$ bounded. The standard assumptions of Chapters 7 and 10 are assumed and so the maps $u(\cdot) \to x_u(\cdot)$ and $u(\cdot) \to J(u(\cdot)) \in \mathbb{R}$ are continuous. Let $\Pi_\nu = \{s_0, \dots, s_{N_\nu}\}$ be a partition of $E$ with $E_j^\nu = (s_j, s_{j+1}]$ and $\mu_\nu = \sup\{|s_{j+1} - s_j|\}$ the mesh of $\Pi_\nu$. Let $\{u_j^\nu : u_j^\nu \in U,\ j = 0, \dots, N_\nu - 1\}$ be a sequence of elements of $U$ and let $u_\nu(s)$ be the step-function based on $\{u_j^\nu\}$, i.e., $u_\nu(s) = u_j^\nu$, $s \in E_j^\nu$. Consider the systems
\[
x_\nu(s) = x_0 + \int_{s_0}^s f(x_\nu(r), u_\nu(r))\, dr, \tag{13.1}
\]
\[
x_{j+1}^\nu = x_j^\nu + \int_{s_j}^{s_{j+1}} f(x_j^\nu, u_j^\nu)\, dr, \tag{13.2}
\]
and the corresponding costs $J(u_\nu)$, $J(\{u_j^\nu\})$. For instance,

© Springer Science+Business Media, LLC, part of Springer Nature 2019 P. Falb, Direct Methods in Control Problems, https://doi.org/10.1007/978-0-8176-4723-0_13


\[
J(u_\nu) = \inf\Big\{ \int_{t_0}^{t_1} L(x_\nu(s), u_\nu(s))\, ds : u_\nu(\cdot) \text{ a step function based on } \Pi_\nu \Big\}
\]
and
\[
J(\{u_j^\nu\}) = \inf\Big\{ \sum_{j=0}^{N_\nu - 1} L(x_j^\nu, u_j^\nu) \Delta_j : \Delta_j = (s_{j+1} - s_j),\ u_j^\nu \in U \Big\}
\]
(note that $J(u_\nu) \geq J(u)$).

Now, for ease of exposition, we suppose that $J$ is an initial coordinate of the system (i.e., $\dot x_0 = L(x, u)$ and the system is technically $\dot y = (\dot x_0, \dot x)$ where $y = (x_0, x)$; we shall slur over the technicality). Let $z_j^\nu = \|x_\nu(s_j) - x_j^\nu\|$, so that
\[
\begin{aligned}
z_{j+1}^\nu &= \|x_\nu(s_{j+1}) - x_{j+1}^\nu\| \\
&= \Big\| x_\nu(s_j) + \int_{s_j}^{s_{j+1}} f(x_\nu(r), u_j^\nu)\, dr - x_j^\nu - \int_{s_j}^{s_{j+1}} f(x_j^\nu, u_j^\nu)\, dr \Big\| \\
&\leq z_j^\nu + \int_{s_j}^{s_{j+1}} \|f(x_\nu(r), u_j^\nu) - f(x_\nu(s_j), u_j^\nu)\|\, dr + \int_{s_j}^{s_{j+1}} \|f(x_\nu(s_j), u_j^\nu) - f(x_j^\nu, u_j^\nu)\|\, dr \\
&\leq z_j^\nu + \int_{E_j} \|f(x_\nu(r), u_j^\nu) - f(x_\nu(s_j), u_j^\nu)\|\, dr + \int_{E_j} L \|x_\nu(s_j) - x_j^\nu\|\, dr \\
&\leq z_j^\nu + M \mu_\nu^2 + L \mu_\nu z_j^\nu
\end{aligned}
\]
where $L$ is the Lipschitz constant and $M/2$ is the bound for $f$.

Lemma 13.3. Let $z_n$, $n = 1, \dots$, $z_n \in \mathbb{R}$. Suppose that $z_{n+1} \leq (1 + L\mu) z_n + M\mu$, then
\[
z_n \leq M\mu \Big\{ \frac{(1 + L\mu)^n - 1}{L\mu} \Big\} + (1 + L\mu)^n z_0
\]
and, hence,
\[
z_n \leq \frac{M\mu}{L\mu} (e^{L\mu} - 1) + e^{L\mu} z_0.
\]

Proof. Let $n = 1$ so that by assumption
\[
z_1 \leq (1 + L\mu) z_0 + M\mu \leq M\mu \Big\{ \frac{(1 + L\mu) - 1}{L\mu} \Big\} + (1 + L\mu) z_0.
\]
Now use induction. So assume the inequality for $n$. Then, by assumption,
\[
\begin{aligned}
z_{n+1} &\leq (1 + L\mu) z_n + M\mu \\
&\leq (1 + L\mu) M\mu \Big\{ \frac{(1 + L\mu)^n - 1}{L\mu} \Big\} + M\mu + (1 + L\mu)^{n+1} z_0 \\
&\leq M\mu \Big\{ \frac{(1 + L\mu)^{n+1} - (1 + L\mu)}{L\mu} + 1 \Big\} + (1 + L\mu)^{n+1} z_0 \\
&\leq M\mu \Big\{ \frac{(1 + L\mu)^{n+1} - 1}{L\mu} \Big\} + (1 + L\mu)^{n+1} z_0
\end{aligned}
\]
and the lemma is established. $\square$

It follows from the lemma (since $z_0 = 0$) that
\[
z_j^\nu \leq \frac{M}{L} (e^{L\mu} - 1)
\]

and, hence, that if we take refinements of $\Pi_\nu$ with $\mu_\nu \to 0$, the difference approximations (13.2) converge to the differential equation.

Let P be the problem: find $\inf\{J(u) : u(\cdot) \in \mathcal{U}$, a sphere in $L^2(E, U)$ (or $L^\infty(E, U)$), subject to $\dot x = f(x, u)$ on $E\}$ and let $P_\nu$ be the problem: find $\inf\{J(\{u^\nu\}) : u_j^\nu \in U_\nu \subset U$ and subject to $x_{j+1}^\nu = x_j^\nu + \int_{E_j} f(x_j^\nu, u_j^\nu)\, dr\}$.

Definition 13.4. $P_\nu$ is a weak approximation to P if (i) there exist admissible solutions for $P_\nu$; and (ii) $\lim_{\mu_\nu \to 0} J_\nu^* = J^*$.

Remark 13.5. $C(E, U)$ is dense in $L^2(E, U)$ (or $L^\infty(E, U)$). By definition $S(E, U)$ is dense in $L^2(E, U)$ so if $u(\cdot) \in L^2(E, U)$, then there exists a sequence $\psi_n \in S(E, U)$ with $\psi_n \to u(\cdot)$ in $L^2(E, U)$. By Proposition 10.10, there exists $\tilde\psi_n \in C(E, U)$ such that $\|\tilde\psi_n - \psi_n\| < \epsilon/2$ for $n \geq n_0$ and $\|\psi_n - u\| < \epsilon/2$ if $n \geq n_0$. It follows that $\|\tilde\psi_n - u\| \leq \|\tilde\psi_n - \psi_n\| + \|\psi_n - u\| < \epsilon$.

Suppose $u^*(\cdot)$, $x^*(\cdot)$ are optimal for P. Let $\Pi_\nu$ be a partition and let $u_\nu^*(\cdot)$, $x_\nu^*(\cdot)$ be the step-function solution of the system based on $\Pi_\nu$ (and $u^*(\cdot)$, $x^*(\cdot)$). Let $J^* = J(u^*)$, $J_\nu^* = J(u_\nu^*)$, $\tilde J_\nu = \inf J_\nu(\{u_j^\nu\})$. By the remark, there exists a continuous $\psi$ with $\|u^*(\cdot) - \psi\|$ sufficiently small so that $|J^* - J(\psi)| < \epsilon/2$. Let $\mu_\nu$ be sufficiently small so that there exists a step-function $\psi_\nu$ based on $\Pi_\nu$ with $\|\psi_\nu - \psi\|$ small enough that $|J(\psi) - J(\psi_\nu)| < \epsilon/2$. Hence, $|J^* - J(\psi_\nu)| < \epsilon$. In other words,
\[
J^* = J(u^*) \leq J(\psi_\nu) \leq J(u^*) + \epsilon
\]
and
\[
J^* \leq \tilde J_\nu \leq J(\psi_\nu) \leq J(u^*) + \epsilon.
\]
Hence, $\lim_{\mu_\nu \to 0} \tilde J_\nu^* = J^*$, i.e., $P_\nu$ is a weak approximation.

For simplicity, now assume that $E = [0, 1]$ and that
\[
J(u(\cdot)) = \int_0^1 L(x, u)\, ds
\]
subject to
\[
\dot x = f(x, u), \qquad x(0) = x_0,
\]
and $u(\cdot) \in L^\infty(E, U)$ with $U$ bounded (or $L^p(E, U)$, $p \geq 1$). Under the standard assumptions of Chapter 7, the maps $u(\cdot) \to x_u(\cdot)$ and $u(\cdot) \to J(u(\cdot))$ are continuous. Let $\Pi_\nu = \{0 = s_0 < s_1 < \cdots < s_{N_\nu} = 1\}$ be a partition, $h_j = (s_{j+1} - s_j)$ and $\mu_\nu = \sup\{h_j : j = 0, \dots, N_\nu - 1\}$. Consider the discrete system
\[
J_\nu^{j+1} = \gamma_\nu^j(x_\nu^{j+1}, x_\nu^j, u_\nu^j), \qquad x_\nu^{j+1} = g_\nu^j(x_\nu^{j+1}, x_\nu^j, u_\nu^j),
\]

with $u_\nu^j \in U$ and $j = 0, 1, \dots, N_\nu - 1$. The discrete problem $P_\nu$ is: $\inf\{J_\nu^{N_\nu} : x_\nu^{j+1} = g_\nu^j(x_\nu^{j+1}, x_\nu^j, u_\nu^j),\ x_\nu^0 = x_0,\ u_\nu^j \in U\}$.

Example 13.6 (Euler approximation). Let $x_\nu^{j+1} = x_\nu(s_{j+1})$ where
\[
x_\nu(s_{j+1}) = x_\nu(s_j) + h_j f(x_\nu(s_j), u_\nu(s_j)) \tag{13.7}
\]
and $u_\nu(s_j) = u_\nu^j$, and let $J_\nu^{j+1} = J_\nu(s_{j+1}) = J_\nu(s_j) + h_j L(x_\nu(s_j), u_\nu(s_j))$. In other words, the problem $P_\nu$ becomes
\[
\inf\Big\{ J_\nu(1) = \sum_{j=0}^{N_\nu - 1} h_j L(x_\nu(s_j), u_\nu(s_j)) : x_\nu(s_{j+1}) = x_\nu(s_j) + h_j f(x_\nu(s_j), u_\nu(s_j)) \Big\}.
\]
Observe that $g_\nu^j(x_\nu^{j+1}, x_\nu^j, u_\nu^j) = x_\nu^j + h_j f(x_\nu^j, u_\nu^j)$ and that $\gamma_\nu^j = J_\nu^j + h_j L(x_\nu^j, u_\nu^j)$. We say that $P_\nu$ is a weak approximation to P if there exists a solution feasible for $P_\nu$ such that $J_\nu^* \to J^*$. Suppose that $u(\cdot)$ is continuous and that $\Pi_\nu = \{s_0, \dots, s_{N_\nu}\}$ is a partition. Let $u_\nu^j = u_\nu(s_j)$ and let $x_u(\cdot)$ be the solution of the differential equation. The error $e_{j,\nu}$ is given by
\[
e_{j,\nu} = x_u(s_j) - x_\nu(s_j), \qquad j = 0, \dots, N_\nu.
\]
Now note that the solution of (13.7) may not be exact, i.e.,
\[
x_\nu(s_{j+1}) = x_\nu(s_j) + h_j f(x_\nu(s_j), u_\nu^j) - \tau_j
\]
($\tau_j$ is called the truncation error). It follows that
\[
e_{j+1,\nu} = e_{j,\nu} + \int_{s_j}^{s_{j+1}} [f(x_u(s), u(s)) - f(x_\nu(s_j), u_\nu^j)]\, ds + \tau_j.
\]

But
\[
f(x_u(s), u(s)) - f(x_\nu(s_j), u_\nu^j) = [f(x_u(s), u(s)) - f(x_\nu(s_j), u(s))] + [f(x_\nu(s_j), u(s)) - f(x_\nu^j, u_\nu^j)]
\]
and
\[
x_u(s) - x_\nu(s_j) = [x_u(s) - x_u(s_j)] + [x_u(s_j) - x_\nu(s_j)]
\]
so that, where $L$ is the Lipschitz constant,
\[
e_{j+1,\nu} \leq e_{j,\nu} + L h_j e_{j,\nu} + L h_j \sup_{E_j} \{ \|x_u(s) - x_u(s_j)\| + \|u(s) - u(s_j)\| \} + \tau_j,
\]
and so, for $\mu_\nu$ small enough,
\[
e_{j+1,\nu} \leq (1 + L\mu_\nu) e_{j,\nu} + L M \mu_\nu + \tau_\nu.
\]
Since $f(x, u)$ is bounded, we can show that $\tau_\nu$ is $O(\mu_\nu^2)$, i.e., $|\tau_\nu| \leq N \mu_\nu^2 \leq N \mu_\nu$ for $\mu_\nu$ small. In other words, by the lemma,
\[
e_{j,\nu} \leq \alpha (e^{\lambda \mu_\nu} - 1) + e^{\lambda \mu_\nu} e_{0,\nu},
\]
and there is convergence as $\mu_\nu \to 0$. By arguments entirely similar to those given earlier, the Euler approximation produces a weak approximation (as the argument applies to $J$).

Example 13.8 (First-order discretization). Let $E = [0, 1]$ and $u(\cdot) \in L^\infty(E, U)$ (or $L^p(E, U)$),
\[
J(u(\cdot)) = \int_0^1 L(x_u(t), u(t))\, dt, \qquad x_u(t) = x_0 + \int_0^t f(x_u(s), u(s))\, ds \tag{13.9}
\]
(under the standard assumptions). Consider the Problem P: $\inf\{J(u(\cdot)) : x_u(\cdot) \text{ satisfies (13.9)}\}$. Let $\Pi_\nu = \{0 = s_0 < s_1 < \cdots < s_{N_\nu} = 1\}$ be a partition with $h_j = (s_{j+1} - s_j)$ and $\mu_\nu = \sup\{h_j\}$. For $\mu_\nu$ sufficiently small, there is a step-function $u_\nu(\cdot) = \{u_\nu^j\}$ (i.e., $u_\nu(s) = u_\nu^j$, $s \in E_j = [s_j, s_{j+1})$), such that $\|u(\cdot) - u_\nu(\cdot)\|_\infty$ is small and, by continuity, $|J(u(\cdot)) - J(u_\nu(\cdot))|$ and $\|x_u(\cdot) - x_\nu(\cdot)\|$ are also small. Note that
\[
x_\nu(s) = x_\nu(s_j) + \int_{s_j}^s f(x_\nu(r), u_\nu^j)\, dr
\]
and hence that
\[
x_\nu(s_{j+1}) = x_\nu(s_j) + \int_{E_j} f(x_\nu(r), u_\nu^j)\, dr.
\]
Now consider the discrete approximation
\[
x_\nu^{j+1} = x_\nu^j + h_j g_\nu^j(x_\nu^j, u_\nu^j) \tag{13.10}
\]
(with $x_\nu^0 = x_0 = x_\nu(0)$ and $g_\nu^j(\cdot, \cdot)$ bounded). Let $z_j = \|x_\nu(s_j) - x_\nu^j\|$ so that
\[
z_{j+1} = \Big\| x_\nu(s_j) + \int_{E_j} f(x_\nu(s), u_\nu^j)\, ds - x_\nu^j - h_j g_\nu^j(x_\nu^j, u_\nu^j) \Big\|.
\]

Observe that
\[
f(x_\nu(s), u_\nu^j) = [f(x_\nu(s), u_\nu^j) - f(x_\nu(s_j), u_\nu^j)] + [f(x_\nu(s_j), u_\nu^j) - f(x_\nu^j, u_\nu^j)] + f(x_\nu^j, u_\nu^j)
\]
and it follows that
\[
\begin{aligned}
\Big\| \int_{E_j} f(x_\nu(s), u_\nu^j)\, ds - h_j g_\nu^j(x_\nu^j, u_\nu^j) \Big\|
&\leq \Big\| \int_{E_j} [f(x_\nu(s), u_\nu^j) - f(x_\nu(s_j), u_\nu^j)]\, ds \Big\| \\
&\quad + \Big\| \int_{E_j} [f(x_\nu(s_j), u_\nu^j) - f(x_\nu^j, u_\nu^j)]\, ds \Big\| \\
&\quad + \Big\| \int_{E_j} f(x_\nu^j, u_\nu^j)\, ds - h_j g_\nu^j(x_\nu^j, u_\nu^j) \Big\|
\end{aligned}
\]
and, hence, for a Lipschitz constant $L$ and a bound $M$, that
\[
z_{j+1} \leq (1 + L\mu) z_j + M\mu,
\]
and so there is convergence and if $P_\nu$ is the problem for (13.10), then $P_\nu$ approximates P (as in the prior considerations).

Example 13.11 (Second-order Runge–Kutta). As usual, consider the standard problem P with $E = [0, 1]$ and $\inf\{J(u(\cdot)) : u(\cdot) \in L^p(E, U),\ p \geq 1\}$ with

\[
J(u(\cdot)) = \int_0^1 L(x_u(s), u(s))\, ds
\]
and
\[
x_u(t) = x_0 + \int_0^t f(x_u(s), u(s))\, ds
\]
(under our standard assumptions). Let $\Pi_\nu = \{0 = s_0 < s_1 < \cdots < s_{N_\nu} = 1\}$ and $u_\nu(s) = u_\nu^j$, $s \in E_j$, with $\|u(\cdot) - u_\nu(\cdot)\|$ small. Then, for $t \in E_j$,
\[
x_\nu(t) = x_0 + \int_0^t f(x_\nu(s), u_\nu(s))\, ds = x_\nu(s_j) + \int_{s_j}^t f(x_\nu(s), u_\nu^j)\, ds.
\]
Consider the discretization
\[
x_\nu^{j+1} = x_\nu^j + \frac{h_j}{2} \{ f(x_\nu^j, u_\nu^j) + f(x_\nu^j + h_j f(x_\nu^j, u_\nu^j), u_\nu^j) \}
\]
where $h_j = s_{j+1} - s_j$. Note that it is required that $x_\nu^j + h_j f(x_\nu^j, u_\nu^j)$ is in the domain (in $x$) of $f$. By the Lipschitz condition and boundedness, this works for $\mu_\nu = \sup\{h_j\}$ small. If there are state constraints, additional assumptions may be needed (cf. [C-8, H-1]). Let $z_j = \|x_\nu(s_j) - x_\nu^j\|$. Then
\[
z_{j+1} = \|x_\nu(s_{j+1}) - x_\nu^{j+1}\| = \Big\| x_\nu(s_j) - x_\nu^j + \int_{E_j} \Big[ f(x_\nu(s), u_\nu^j) - \frac{1}{2} \{ f(x_\nu^j, u_\nu^j) + f(x_\nu^j + h_j f(x_\nu^j, u_\nu^j), u_\nu^j) \} \Big]\, ds \Big\|.
\]
Observe that
\[
f(x_\nu(s), u_\nu^j) = [f(x_\nu(s), u_\nu^j) - f(x_\nu(s_j), u_\nu^j)] + f(x_\nu(s_j), u_\nu^j)
\]
and that, for $\Delta_j^1$, $\Delta_j^2$ given by
\[
\Delta_j^1 = f(x_\nu^j, u_\nu^j), \qquad \Delta_j^2 = f(x_\nu^j + h_j \Delta_j^1, u_\nu^j),
\]
we have
\[
f(x_\nu(s), u_\nu^j) - \frac{1}{2} \{ \Delta_j^1 + \Delta_j^2 \} = [f(x_\nu(s), u_\nu^j) - f(x_\nu(s_j), u_\nu^j)] + \frac{1}{2} [f(x_\nu(s_j), u_\nu^j) - \Delta_j^1] + \frac{1}{2} [f(x_\nu(s_j), u_\nu^j) - \Delta_j^2].
\]
It follows that $z_{j+1} \leq (1 + L\mu) z_j + M\mu$ for suitable $L$, $M$. As argued previously, the Problem $P_\nu$ is a weak approximation to P. A similar argument applies for higher-order Runge–Kutta methods.
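The second-order scheme of Example 13.11 can be compared with the Euler scheme of Example 13.6 on a simple case. This is a sketch under assumptions of our own: the scalar system $\dot x = -x + u$ with $u \equiv 1$, $x(0) = 0$ on $[0, 1]$, a uniform partition, and hypothetical helper names.

```python
import math

def euler_step(f, x, u, h):
    # One step of x^{j+1} = x^j + h f(x^j, u^j).
    return x + h * f(x, u)

def heun_step(f, x, u, h):
    # One step of x^{j+1} = x^j + (h/2) {f(x^j, u^j) + f(x^j + h f(x^j, u^j), u^j)}.
    d1 = f(x, u)
    d2 = f(x + h * d1, u)
    return x + 0.5 * h * (d1 + d2)

def run(step, f, x0, controls, h):
    # Apply the chosen one-step rule along a uniform partition.
    x = x0
    for u in controls:
        x = step(f, x, u, h)
    return x

def endpoint_errors(n):
    # Assumed test system: x' = -x + u, u = 1, x(0) = 0, so x(1) = 1 - exp(-1).
    f = lambda x, u: -x + u
    h = 1.0 / n
    controls = [1.0] * n
    exact = 1.0 - math.exp(-1.0)
    err_euler = abs(run(euler_step, f, 0.0, controls, h) - exact)
    err_heun = abs(run(heun_step, f, 0.0, controls, h) - exact)
    return err_euler, err_heun
```

For this test system the second-order error is much smaller than the Euler error and falls by roughly a factor of four when `n` is doubled, while both schemes converge as the mesh tends to zero.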


Let $E = [0, 1]$, $y = (x_0, x)$, $\dot x_0 = L(x, u)$, $x_0(0) = 0$, $\dot x = f(x, u)$, $x(0) = x_0$, so that
\[
y_u(t) = y_0 + \int_0^t \mathbf{f}(y_u(s), u(s))\, ds, \qquad \mathbf{f} = (L, f). \tag{13.12}
\]
Let P be the problem $\inf\{x_0(1) : y_u(\cdot) \text{ given by (13.12)}\}$. Since $E$ is compact, $S(E, U)$ is dense. Given $u(\cdot)$, there exists a partition $\Pi_\nu = \{0 = s_0 < s_1 < \cdots < s_{N_\nu} = 1\}$ such that $\|u(\cdot) - u_\nu(\cdot)\|$ is small where $u_\nu(s) = u_\nu^j$, $s \in E_j$, and $\{u_\nu^j\}$ are the step values. Consider the approximation
\[
y_\nu^{j+1} = y_\nu^j + h_j g_\nu^j(y_\nu^j, y_\nu^{j-1}, \dots, u_\nu^j, u_\nu^{j-1}, \dots) \tag{13.13}
\]
and the Problem $P_\nu$: $\inf\{y_\nu^{N_\nu} : \text{subject to (13.13)}\}$. Let $y_\nu(s_{j+1}) = y_\nu(s_j) + \int_{E_j} \mathbf{f}(y_\nu(s), u_\nu^j)\, ds$. Let $z_j = \|y_\nu(s_j) - y_\nu^j\|$. Then
\[
\begin{aligned}
z_{j+1} &= \|y_\nu(s_{j+1}) - y_\nu^{j+1}\| \\
&= \Big\| y_\nu(s_j) + \int_{E_j} \mathbf{f}(y_\nu(s), u_\nu^j)\, ds - y_\nu^j - h_j g_\nu^j(y_\nu^j, \dots, u_\nu^j, \dots) \Big\| \\
&\leq \|y_\nu(s_j) - y_\nu^j\| + \Big\| \int_{E_j} \mathbf{f}(y_\nu(s), u_\nu^j)\, ds - h_j g_\nu^j(y_\nu^j, \dots, u_\nu^j, \dots) \Big\|.
\end{aligned}
\]

Now
\[
\begin{aligned}
\mathbf{f}(y_\nu(s), u_\nu^j) &= [\mathbf{f}(y_\nu(s), u_\nu^j) - \mathbf{f}(y_\nu(s_j), u_\nu^j)] + \mathbf{f}(y_\nu(s_j), u_\nu^j) \\
&= [\mathbf{f}(y_\nu(s), u_\nu^j) - \mathbf{f}(y_\nu(s_j), u_\nu^j)] + [\mathbf{f}(y_\nu(s_j), u_\nu^j) - \mathbf{f}(y_\nu^j, u_\nu^j)] + \mathbf{f}(y_\nu^j, u_\nu^j).
\end{aligned}
\]
It follows, using the Lipschitz condition and boundedness, that
\[
\begin{aligned}
z_{j+1} &\leq z_j + M_1 \mu + L\mu z_j + h_j \|\mathbf{f}(y_\nu^j, u_\nu^j) - g_\nu^j(y_\nu^j, \dots, u_\nu^j, \dots)\| \\
&\leq (1 + L\mu) z_j + (M_1 + M_2)\mu = (1 + L\mu) z_j + M\mu.
\end{aligned}
\]
Convergence follows from the lemma and so $P_\nu$ is a weak approximation to P. [Again there is a requirement that (13.13) be defined on an appropriate domain [B-4, D-2, C-8].]

Example 13.14 (Linear system). Suppose $E = [0, 1]$ and
\[
x(t) = e^{At} \Big[ x_0 + \int_0^t e^{-As} B u(s)\, ds \Big]
\]
with $u(\cdot) \in L^p(E, U)$, $p \geq 1$. Let $\Pi_\nu = \{0 = s_0 < \cdots < s_{N_\nu} = 1\}$ be a partition, here with $E_j = (s_{j-1}, s_j]$, and let $u(s) = u_j$, $s \in E_j$. Then
\[
x_\nu(s_j) = e^{A s_j} \Big[ x_0 + \sum_{i=1}^j \Big( \int_{E_i} e^{-As} B\, ds \Big) u_i \Big]
\]
and
\[
x_\nu(s_j) - x_\nu(s_{j-1}) = K_j x_\nu(s_{j-1}) + L_j u_j
\]
with
\[
K_j = e^{A s_j} e^{-A s_{j-1}} - I, \qquad L_j = e^{A s_j} \Big( \int_{E_j} e^{-As} B\, ds \Big).
\]
Note that if $X_j = e^{A s_j}$, then $X_j - X_{j-1} = K_j X_{j-1}$, $X_0 = I$, and also that $K_j = e^{A(s_j - s_{j-1})} - I = e^{A h_j} - I$. But
\[
e^{A h_j} = I + A h_j + \frac{A^2 h_j^2}{2!} + \cdots
\]
and so
\[
\|e^{A h_j} - I\| \leq \|A h_j\| \Big( 1 + \sum_{k=1}^\infty \frac{\|A\|^k \mu^k}{(k+1)!} \Big).
\]
Since the series converges, $\|K_j\| \leq M\mu$ and there is convergence as $\mu \to 0$. For $L_j$, we have
\[
\int_{E_j} e^{A(s_j - s)} B\, ds = \int_{E_j} [I + A(s_j - s) + \cdots] B\, ds
\]
and so $\|L_j\| \leq \int_{E_j} [1 + \|A\| h_j + \cdots] \|B\|\, ds \leq e^{\|A\| h_j} \|B\| h_j$ and $\|L_j\| \to 0$ as $\mu \to 0$. Writing $x_j = x_\nu(s_j)$, we have
\[
x_j = x_0 + \sum_{r=1}^j (K_r x_{r-1} + L_r u_r).
\]
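In the scalar case the quantities $K_j$ and $L_j$ have closed forms, and the recursion can be checked against the variation-of-constants formula. The sketch below assumes a scalar system $\dot x = ax + bu$ with $a \neq 0$ and a control that is constant on each subinterval; the function names are hypothetical.

```python
import math

def step_coefficients(a, b, h):
    # Scalar versions of K_j = e^{A h_j} - I and
    # L_j = e^{A s_j} * integral over E_j of e^{-A s} B ds = b (e^{a h} - 1) / a.
    K = math.exp(a * h) - 1.0
    L = b * (math.exp(a * h) - 1.0) / a  # assumes a != 0
    return K, L

def propagate(a, b, x0, partition, controls):
    # x_j = x_{j-1} + K_j x_{j-1} + L_j u_j on the given partition.
    x = x0
    for j in range(1, len(partition)):
        h = partition[j] - partition[j - 1]
        K, L = step_coefficients(a, b, h)
        x = x + K * x + L * controls[j - 1]
    return x
```

For a constant control $u \equiv 1$ the result of `propagate` should agree with $e^{a} x_0 + b(e^{a} - 1)/a$ up to rounding on any partition of $[0, 1]$, since the recursion is exact for step-function controls.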

Exercise 13.15. Can you modify this treatment for the case
\[
x(t) = S(t) x_0 + \int_0^t S(t - s) B u(s)\, ds
\]
where $S(\cdot)$ is a strongly continuous semi-group?

We now turn our attention to "finite element" methods ([B-4, B-5, E-3], Appendix F). Let $D$ be a regular domain (i.e., closed, bounded, non-empty interior, and $\partial D$ piecewise smooth or Lipschitz). Let $X_d(D)$ be a space of functions on $D$ with $\dim X_d(D) = d$, generally finite. Suppose $X_d(D) \subset X$, an infinite-dimensional space of functions on $D$. Let $\Phi = \{\varphi_1, \dots, \varphi_d\}$ with $\varphi_i \in X^*$ and $\{\varphi_1, \dots, \varphi_d\}$ (restricted to $X_d(D)$) be a basis of $X_d(D)^*$.

Example 13.16. (a) Let $D = [a, b]$, $X_k(D) = \{\psi(t) : \psi \text{ a polynomial of degree} \leq k\}$. Spaces for $X$: $C(D)$, $C^\infty(D)$, $L^p(D)$. Let
\[
\varphi_i(\psi) = \psi\Big( a + \frac{(b - a) i}{k} \Big), \qquad i = 0, \dots, k.
\]
Note that, in general, $\varphi_1, \dots, \varphi_d$ is a basis of $X_d(D)^*$ if and only if $\varphi_i(\psi) = 0$, $i = 1, \dots, d$, implies $\psi \equiv 0$.

(b) $D = [a_1, b_1] \times [a_2, b_2]$:

[Fig. 13.1: $D$ a rectangle]

Let $Q_k(D) = \operatorname{span}\{p_i(x) q_j(y) : p_i \in P_k,\ q_j \in P_k\}$ where $P_k$ are polynomials of degree $\leq k$. Note that $\dim Q_k(D) = [\dim P_k]^2 = (k + 1)^2$. Let
\[
\varphi_{ij}(\psi) = \psi\Big( a_1 + \frac{(b_1 - a_1) i}{k},\ a_2 + \frac{(b_2 - a_2) j}{k} \Big), \qquad i = 0, \dots, k, \quad j = 0, \dots, k.
\]
Then the $\{\varphi_{ij}\}$ are a basis of $Q_k(D)^*$.

(c) Let $D \subset \mathbb{R}^2$ be a triangle with vertices $\alpha_1, \alpha_2, \alpha_3$. Let $X_k(D) = \{\psi(x, y) : \psi \text{ a polynomial of degree} \leq k\}$ so that $\dim X_k(D) = (k + 1)(k + 2)/2$. If $k = 1$, then $\varphi_i(\psi) = \psi(\alpha_i)$, $i = 1, 2, 3$, span $X_k(D)^*$. If $k = 2$, then $\psi(x, y) = a_0 + a_1 x + a_2 y + a_3 x^2 + a_4 xy + a_5 y^2$, and let $\alpha_1 = (0, 0)$, $\alpha_2 = (2, 0)$, $\alpha_3 = (0, 2)$, $\alpha_4 = (1, 1)$, $\alpha_5 = (0, 1)$, $\alpha_6 = (1, 0)$ (the vertices $\alpha_1, \alpha_2, \alpha_3$ together with the edge midpoints $\alpha_4, \alpha_5, \alpha_6$). Let $\varphi_j(\psi) = \psi(\alpha_j)$, $j = 1, \dots, 6$. Then $\{\varphi_j\}$ are a basis of $X_k(D)^*$. Observe that $\psi(\alpha_1) = 0 \Rightarrow a_0 = 0$; $\psi(\alpha_2) = 0 \Rightarrow 2a_1 + 4a_3 = 0$ and $\psi(\alpha_6) = 0 \Rightarrow a_1 + a_3 = 0$, so $a_3 = 0$, $a_1 = 0$; $\psi(\alpha_3) = 0 \Rightarrow 2a_2 + 4a_5 = 0$ and $\psi(\alpha_5) = 0 \Rightarrow a_2 + a_5 = 0$, so $a_2 = 0$ and $a_5 = 0$; and $\psi(\alpha_4) = 0 \Rightarrow a_4 = 0$, so $\psi \equiv 0$.

Consider now the control problem with

\[
x(t) = x_0 + \int_0^t f(x(s), u(s), s)\, ds, \qquad x(0) = x_0.
\]
Let $z(t) = x(t) - x_0$, $g(z, u, s) = f(z + x_0, u, s)$, then
\[
z(t) = 0 + \int_0^t g(z(s), u(s), s)\, ds,
\]
and so we may suppose, for ease of exposition, that $x_0 = 0$.

Example 13.17. Let
\[
\dot x = Ax + Bu, \qquad x(0) = x_0, \qquad f(x, u) = Ax + Bu,
\]
\[
z(s) = x(s) - x_0, \qquad \dot z = A(z + x_0) + Bu, \qquad z(0) = 0, \qquad g(z, u) = Az + (Ax_0 + Bu),
\]
then
\[
x(t) = e^{At} x_0 + e^{At} \int_0^t e^{-As} B u(s)\, ds,
\]
\[
\begin{aligned}
z(t) &= e^{At} \int_0^t e^{-As} A x_0\, ds + e^{At} \int_0^t e^{-As} B u(s)\, ds \\
&= e^{At} \int_0^t \frac{d(-e^{-As})}{ds}\, ds \cdot x_0 + e^{At} \int_0^t e^{-As} B u(s)\, ds \\
&= [e^{At} - I] x_0 + e^{At} \int_0^t e^{-As} B u(s)\, ds \\
&= x(t) - x_0.
\end{aligned}
\]
Let $E = [0, T]$ and $f : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n$. Consider the system
\[
x(t) = \int_0^t f(x(s), u(s))\, ds, \qquad x(0) = 0. \tag{13.18}
\]

For suitable $\varphi : E \to (\mathbb{R}^n)^*$, we have, integrating by parts,
\[
0 = \int_0^T \varphi(s) [\dot x(s) - f(x(s), u(s))]\, ds = -\int_0^T [\dot\varphi(s) x(s) + \varphi(s) f(x(s), u(s))]\, ds + \varphi x \Big|_0^T. \tag{13.19}
\]
Given $u(\cdot)$, then $x_u(\cdot)$ is a solution of (13.18) if and only if it is a solution of (13.19) for all "suitable" $\varphi$. This is at the heart of the finite element approach. By way of illustration, suppose that $n = 2$, $m = 1$, so that $x = (x_1, x_2)$, $u = (u)$. Let $P_k(E) = \{\psi(s) : \psi \text{ a polynomial of degree} \leq k\}$. If $u(\cdot) = (u(\cdot)) \in P_\nu(E)$ and $x_1(\cdot) \in P_\nu(E)$, $x_2(\cdot) \in P_\nu(E)$ (so that $x(\cdot) \in [P_\nu(E)]^2 = X_\nu(E)$), then, letting $\varphi(\cdot)$ be an element of $X_\nu(E)^*$ (as the restriction of an element of a larger space $C(E, \mathbb{R}^2)$, $C^\infty(E, \mathbb{R}^2)$,


Lp (E, R2 ), etc.), then an approximation equation to (13.19) can be developed. Example 13.20. Let µ = 1, ν = 1. Then u(s) = u0 + u1 s and x1 (s) = x10 + x1s s, x2 (s) = x20 + x21 s. Then T

   x10 f (x + x11 s, x20 + x21 s, u0 + u1 s) ds − 1 10 x20 f2 (x10 + x11 s, x20 + x21 s, u0 + u1 s) 0   Z T x1s + x11 s ˙ ϕ(s) (13.21) = · x20 + x22 s 0   f (x + x11 s, x20 + x21 s, u0 + u1 s) ds + ϕ(s) · 1 10 f2 (x10 + x11 s, x20 + x21 s, u0 + u1 s)   x + x11 T . − ϕ(T ) 10 x20 + x21 T

Z



ϕ(s) ·

Let e1 =

  1 , 0

e2 =

  0 , s

so that x10 + x11 s = (x10 , x11 ) · e1 + (x10 , x11 ) · e2 . For ease of exposition assume T = 1. Let ϕ11 (e1 ) = 1,

ϕ11 (e2 ) = 0,

ϕ12 (e1 ) = 0,

ϕ12 (e2 ) = 1

be a dual basis so that $\varphi_{11}(x_1(s)) = x_1(0)$, $\varphi_{12}(x_1(s)) = [x_1(1) - x_1(0)]$. Similarly, we have $\varphi_{21}$ and $\varphi_{22}$ (and $\varphi_{11}(x_2) = 0$, $\varphi_{12}(x_2) = 0$, $\varphi_{21}(x_1) = 0$, $\varphi_{22}(x_1) = 0$) so that $\varphi(s) = [\alpha_{11} \varphi_{11} + \alpha_{12} \varphi_{12},\ \alpha_{21} \varphi_{21} + \alpha_{22} \varphi_{22}]$ and
\[
\varphi(s) \cdot \begin{pmatrix} x_1(s) \\ x_2(s) \end{pmatrix} = [\alpha_{11} x_1(0) + \alpha_{12} (x_1(1) - x_1(0)) + \alpha_{21} x_2(0) + \alpha_{22} (x_2(1) - x_2(0))].
\]
The reader can calculate (13.21) and see that, given $u_0$, $u_1$, it is an equation in the $x_{ij}$ and $\alpha_{ij}$.

Example 13.22. Consider the system (on $E = [0, 1]$) with
\[
\ddot x + f(t, u) = 0, \qquad -\dot x(0) = b, \qquad x(1) = a,
\]
and the problem: find $u(\cdot)$ such that
\[
J(u) = \int_0^1 [x_u^2(s) + u^2(s)]\, ds
\]
is minimized. Observe that
\[
x_u(t) = a + (1 - t) b + \int_t^1 \Big[ \int_0^s f(r, u(r))\, dr \Big]\, ds. \tag{13.23}
\]

Let $H^1 = \{v(s) : \int_0^1 \dot v(s)^2\, ds < \infty\}$, $S = \{v \in H^1 : v(1) = a\}$, and $V = \{w \in H^1 : w(1) = 0\}$. Consider the (weak) problem: find $u$ such that $x_u \in S$ and
\[
\int_0^1 \dot w(s) \dot x_u(s)\, ds = \int_0^1 w(s) f(s, u(s))\, ds + w(0) b \tag{13.24}
\]
for all $w \in V$ and $J(u)$ is minimized.

Claim 1. If $x_u$ is a solution of (13.23), then $x_u$ is a solution of (13.24).

Proof. We have
\[
0 = -\int_0^1 w (\ddot x_u + f(t, u))\, dt \quad \text{for any } w \in V.
\]
Integrate by parts to get
\[
0 = \int_0^1 \dot w \dot x_u\, ds - \int_0^1 w f(t, u)\, ds - w \dot x_u \Big|_0^1.
\]
But $\dot x_u(0) = -b$, $w(1) = 0$, so $x_u$ is a solution of (13.24). $\square$

Claim 2. If $x_u$ satisfies (13.24), then $x_u$ satisfies (13.23).

Proof. Since $x_u$ satisfies (13.24), $x_u \in S$ and $x_u(1) = a$. But
\[ \int_0^1 \dot{w}\dot{x}_u\,ds = \int_0^1 w f(t, u)\,dt + w(0)b \quad \text{for all } w \in V, \]
and, integrating by parts and noting $w(1) = 0$,
\[ 0 = \int_0^1 w(\ddot{x}_u + f)\,dt + w(0)[\dot{x}_u(0) + b]. \tag{13.25} \]
Let $w = \varphi(\ddot{x}_u + f)$ where $\varphi > 0$ on $(0, 1)$, $\varphi(0) = \varphi(1) = 0$ (e.g. $\varphi(t) = t(1 - t)$). Then $w \in V$ and (13.25) becomes
\[ 0 = \int_0^1 \varphi(\ddot{x}_u + f)^2\,dt. \]
Since $\varphi > 0$, $\ddot{x}_u + f = 0$ and (13.25) becomes $0 = w(0)[\dot{x}_u(0) + b]$. But take $w \in V$ with $w(0) \neq 0$ to get the result. □
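The equivalence stated in Claims 1 and 2 can be sanity-checked numerically. The following is a minimal sketch under illustrative assumptions that are not taken from the text: the forcing term is fixed as $f(t, u) \equiv 1$, the data are $a = 2$, $b = 3$, and the test function is $w(s) = 1 - s$ (so that $w(1) = 0$). The helper names are hypothetical. For this choice (13.23) has the closed form $x_u(t) = a + (1 - t)b + (1 - t^2)/2$, and the code compares the two sides of (13.24) with a midpoint rule.

```python
def x_u(t, a, b):
    # Closed form of (13.23) for the sample choice f(t, u) = 1 (an assumption).
    return a + (1.0 - t) * b + (1.0 - t * t) / 2.0


def dx_u(t, b):
    # Derivative of x_u with respect to t for the same sample choice.
    return -b - t


def midpoint(g, n=2000):
    # Composite midpoint rule on [0, 1].
    h = 1.0 / n
    return h * sum(g((k + 0.5) * h) for k in range(n))


def weak_residual(b):
    # Left side minus right side of (13.24) for the test function w(s) = 1 - s.
    w = lambda s: 1.0 - s
    dw = lambda s: -1.0
    lhs = midpoint(lambda s: dw(s) * dx_u(s, b))
    rhs = midpoint(lambda s: w(s) * 1.0) + w(0.0) * b
    return lhs - rhs


a, b = 2.0, 3.0
boundary_at_1 = x_u(1.0, a, b)      # should equal a
boundary_at_0 = -dx_u(0.0, b)       # should equal b
residual = weak_residual(b)         # should vanish up to rounding
```

Both integrands are linear in $s$ here, so the midpoint rule is exact up to rounding; for a general $f$ the residual would only vanish up to quadrature error.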

Let $\Pi = \{0, s_1, \ldots, s_n, 1\}$ be a partition of $[0, 1]$ so $0 = s_0 < s_1 < \cdots < s_n < s_{n+1} = 1$, and let $h = \sup |s_{i+1} - s_i|$ be the mesh size. Let $\langle w, f \rangle = \int_0^1 w f\,ds$ and $[w, x] = \int_0^1 \dot{w}\dot{x}\,ds$, so that (13.24) reads $[w, x_u] = \langle w, f \rangle + w(0)b$, and consider the problem: minimize
\[ J(u) = \int_0^1 (x_u^2 + u^2)\,ds \]
subject to $x_u$ being a solution of (13.24). Several approaches are possible.

Approach 1 (Ritz–Galerkin). Let $W \subset V$ with $W$ finite-dimensional and let $\{w_1, \ldots, w_n\}$ be a basis of $W$ and $w_{n+1} \notin W$ with $w_{n+1}(1) = 1$. Let $\beta = a w_{n+1}$ and set $\tilde{x}_u = x_u - \beta$ so that $\tilde{x}_u \in V$, and assume that $[w_j, w_{n+1}] = 0$ and $\langle w_j, w_{n+1} \rangle = 0$ for $j = 1, \ldots, n$. Observe that $[w, x_u] = [w, \tilde{x}_u] + [w, \beta]$. If $w \in W$, then $[w, x_u] = [w, \tilde{x}_u]$ and we may consider the problem: minimize
\[ J(u) = \int_0^1 (\tilde{x}_u^2 + u^2)\,ds \]
subject to
\[ [w, \tilde{x}_u] = \langle w, f \rangle + w(0)b \tag{13.26} \]
for all $w \in W$.$^{13.1}$

If $\tilde{x}_u = \Sigma x_j w_j$, $k_{ij} = [w_i, w_j]$, $f_j = \langle w_j, f \rangle$, and $\theta_j = w_j(0)b$, then (13.26) becomes
\[ Kx = f + \theta \]
where $x = (x_j)$, $f = (f_j)$, $\theta = (\theta_j)$. It is easy to see that if $y$ is such that $Ky = 0$, then $y = 0$ so that $K$ is invertible and (13.26) has a unique solution given $u$. If $u$ ranges over a finite-dimensional subspace, then calculation can produce a solution.

$^{13.1}$ Note that $\int_0^1 2\beta w_j\,ds = 2a\langle w_{n+1}, w_j \rangle = 0$, $j = 1, \ldots, n$.

Approach 2 (Finite element). Let $\Pi = \{s_0, \ldots, s_{n+1}\}$ be a partition of $[0, 1]$ so that $0 = s_0 < s_1 < \cdots < s_{n+1} = 1$ and $h = \sup |s_{i+1} - s_i|$ be the mesh size. Let $\mu_j = s_{j+1} - s_j$ and let

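\[ \varphi_1(s) = \begin{cases} \dfrac{s_1 - s}{\mu_1}, & s_0 \le s \le s_1, \\ 0, & s > s_1, \end{cases} \]
\[ \varphi_j(s) = \begin{cases} \dfrac{s - s_{j-1}}{\mu_{j-1}}, & s_{j-1} \le s \le s_j, \\ \dfrac{s_{j+1} - s}{\mu_j}, & s_j \le s \le s_{j+1}, \\ 0, & \text{elsewhere}, \end{cases} \qquad j = 2, \ldots, n, \]
\[ \varphi_{n+1}(s) = \begin{cases} \dfrac{s - s_n}{\mu_n}, & s_n \le s \le s_{n+1}, \\ 0, & s < s_n. \end{cases} \]
These are clearly independent as $\varphi_j(s_i) = \delta_{ji}$. Let $\beta_\Pi = a\varphi_{n+1}$ and let
\[ V_\Pi = \operatorname{span}\{\varphi_1, \ldots, \varphi_n\}, \qquad S_\Pi = V_\Pi + \beta_\Pi. \]
Note that $V_\Pi \subset V$ and $S_\Pi \subset S$. Given $u$, let
\[ x_u = \sum_{j=1}^n a_j\varphi_j + a\varphi_{n+1} \in S_\Pi \]
and consider the equation $[x_u, w] = \langle f, w \rangle + w(0)b$ for all $w$. For $\varphi_i$, $i = 1, \ldots, n$, we have
\[ \Sigma a_j[\varphi_j, \varphi_i] + a[\varphi_{n+1}, \varphi_i] = \langle f, \varphi_i \rangle + \varphi_i(0)b \]
or, if $K = ([\varphi_i, \varphi_j])$, $a = (a_j)$, and $F = (\langle f, \varphi_i \rangle + \varphi_i(0)b - a[\varphi_{n+1}, \varphi_i])$, then
\[ Ka = F. \]
As in Approach 1, $K$ is invertible so there is a unique solution and, given $u$ in a finite-dimensional subspace, there is a solution to the control problem.

Example 13.27. Consider the system (on $E = [0, 1]$) with
\[ \ddot{x} + f(t, u) = 0, \qquad x(0) = 0, \quad \dot{x}(1) = 0, \tag{13.28} \]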

and the problem: find $u(\cdot)$ such that
\[ J(u) = \int_0^1 [x_u^2 + u^2]\,ds \]
is minimized. Let $V = \{w \in H^2 : w(0) = 0\}$. Then, integration by parts gives
\[ \begin{aligned} \langle f, w \rangle &= \int_0^1 f(s, u(s))w(s)\,ds = \int_0^1 (-\ddot{x}_u(s))w(s)\,ds \\ &= (-\dot{x}_u)w \Big|_0^1 + \int_0^1 \dot{w}(s)\dot{x}_u(s)\,ds \\ &= \int_0^1 \dot{w}(s)\dot{x}_u(s)\,ds = [w, x_u] \end{aligned} \tag{13.29} \]
as $w(0) = 0$, $\dot{x}_u(1) = 0$. As before, (13.28) and (13.29) are equivalent.

Let $\Pi = \{s_0, \ldots, s_n\}$ be a partition and let $V_\Pi = \{v \in V : v|_{E_j} \in P_k,\ v(0) = 0\}$ (where the $P_k$ are polynomials of degree $\le k$). $V_\Pi$ is contained in $V$. Suppose, for the moment, $k = 1$ (linear polynomials). Let $\varphi_j$, $j = 1, \ldots, n$, be such that $\varphi_j(s_i) = \delta_{ji}$.

Claim. $\varphi_j$ is a basis of $V_\Pi$. First if $\Sigma a_j\varphi_j(s_i) = 0$, then $a_j = 0$, so the $\varphi_j$ are independent. If $w \in V$, the interpolant $w_\Pi \in V_\Pi$ of $w$ is given by
\[ w_\Pi(s) = \sum_{j=1}^n w(s_j)\varphi_j. \]

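But if $w \in V_\Pi$, then $w_\Pi = w$ so the $\varphi_j$ span. Letting $x_u = \Sigma a_j\varphi_j$, we have, for $\varphi_i$, $i = 1, \ldots, n$,
\[ \langle f(t, u), \varphi_i \rangle = \Sigma a_j[\varphi_j, \varphi_i] \]
and $Ka = F$ with $K = ([\varphi_i, \varphi_j])$, $a = (a_j)$, $F = (\langle f(t, u), \varphi_i \rangle)$. Again $K$ is invertible and, for $u$ in a finite-dimensional space, we get a solution.

Example 13.30. Let $D \subset \mathbb{R}^2$ be a regular domain and let $\Gamma = \partial D$ and $n$ the unit exterior normal to $\Gamma$. Let $(x, y)$ be the coordinates on $\mathbb{R}^2$ (we write $dx = dx\,dy$). Then
\[ \nabla = \Big( \frac{\partial}{\partial x}, \frac{\partial}{\partial y} \Big), \qquad \partial_n\psi = \langle \nabla\psi, n \rangle, \qquad \nabla^2\psi = \langle \nabla, \nabla\psi \rangle = \frac{\partial^2\psi}{\partial x^2} + \frac{\partial^2\psi}{\partial y^2}, \]
and
\[ [\psi, \varphi] = \int_D \langle \nabla\psi, \nabla\varphi \rangle\,dx. \]
By Green's theorem
\[ [\psi, \varphi] = \int_D \langle \nabla\psi, \nabla\varphi \rangle\,dx = -\int_D (\nabla^2\psi)\varphi\,dx + \int_\Gamma (\partial_n\psi)\varphi\,d\gamma. \]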

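Suppose that $\Gamma = \Gamma_d + \Gamma_n$ with $\Gamma_d \cap \Gamma_n = \emptyset$. Consider $L^2(D)$ and $H^1(D) = \{\varphi \in L^2(D) : \nabla\varphi \in L^2(D)\}$ with norm (see Appendix E)
\[ \|\varphi\|_{1,D} = \Big\{ \int_D [|\nabla\varphi|^2 + |\varphi|^2]\,dx \Big\}^{1/2} = \Big\{ \int_D [\langle \nabla\varphi, \nabla\varphi \rangle + |\varphi|^2]\,dx \Big\}^{1/2}. \]
Consider the system
\[ -\nabla^2\psi + c\psi = Bu, \qquad c \ge 0, \quad B \text{ bounded}, \tag{13.31} \]
with boundary data $Tr_{\Gamma_d}\psi = \psi_d$, $\partial_n\psi = \psi_n$ (with $\psi_d$, $\psi_n$ given on $\Gamma_d$, $\Gamma_n$, respectively). Let
\[ J(u) = \int_D \{\psi(u)^2 + \langle Qu, u \rangle\}\,dx, \qquad Q \text{ coercive}. \]
Then the problem is: find $u$ such that $\psi(u)$ satisfies (13.31) and minimizes $J(u)$. We “replace” (13.31) by the “weak” version
\[ [\psi, \varphi] + \int_D c\psi\varphi\,dx = \int_D (Bu)\varphi\,dx + \int_{\Gamma_n} \psi_n\varphi\,d\gamma \tag{13.32} \]
with $Tr_{\Gamma_d}\psi = \psi_d$ and (13.32) holding for all $\varphi \in H^1_{\Gamma_d}(D) = \{\varphi \in H^1(D) : \varphi(\Gamma_d) = 0\}$. We note also that Green's theorem ensures that $\partial_n\psi = \psi_n$.

Assume there exists $\psi_0 \in H^1(D)$ such that $Tr_{\Gamma_d}\psi_0 = \psi_d$. If $\psi(u)$ is a solution of (13.32) (or (13.31)), then $\tilde{\psi}(u) = \psi(u) - \psi_0$ is an element of $H^1_{\Gamma_d}(D)$. Let
\[ \tilde{J}(u) = \int_D \{[\tilde{\psi}(u)^2 + 2\tilde{\psi}(u)\psi_0 + \psi_0^2] + \langle Qu, u \rangle\}\,dx. \]
Then $\tilde{J}(u) = J(u)$. Observe that
\[ [\tilde{\psi}, \varphi] + \int_D c\tilde{\psi}\varphi\,dx = [\psi, \varphi] - [\psi_0, \varphi] + \int_D c\psi\varphi\,dx - \int_D c\psi_0\varphi\,dx. \]
If $\psi$ satisfies (13.32), noting that (Green's theorem)
\[ [\psi_0, \varphi] = -\int_D (\nabla^2\psi_0)\varphi\,dx + \int_\Gamma (\partial_n\psi_0)\varphi\,d\gamma = -\int_D (\nabla^2\psi_0)\varphi\,dx + \int_{\Gamma_n} (\partial_n\psi_0)\varphi\,d\gamma \]
(as $\partial_n\psi_0 = 0$ on $\Gamma_d$) and that $\partial_n\psi = \partial_n\tilde{\psi} - \partial_n\psi_0$, then
\[ [\tilde{\psi}, \varphi] + \int_D c\tilde{\psi}\varphi\,dx = \int_D [Bu + \nabla^2\psi_0]\varphi\,dx + \int_{\Gamma_n} (\partial_n\tilde{\psi})\varphi\,d\gamma. \tag{13.33} \]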

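Conversely, if $\tilde{\psi}$ satisfies (13.33), then $\tilde{\psi} + \psi_0$ satisfies (13.32). Moreover, if $u^*$, $\psi(u^*)$ minimize $J$, then $u^*$, $\tilde{\psi}(u^*)$ minimize $\tilde{J}(\cdot)$ and conversely.

Now let us turn to the question of discretization. Let us consider the following problem:
\[ [\psi(u), \varphi] + \int_D c\psi(u)\varphi\,dx = \int_D [B(u)]\varphi\,dx + \int_{\Gamma_n} (\psi_n)\varphi\,d\gamma \]
with $c > 0$, $\psi(u) \in H^1_{\Gamma_d}(D)$ for all $\varphi \in H^1_{\Gamma_d}(D)$ and
\[ J(u) = \int_D [\psi(u)^2 + \langle Qu, u \rangle]\,dx \]
with $Q$ coercive and $B(\cdot)$ continuous and bounded. Let $\Psi = H^1(D)$ with
\[ \|\psi\| = \|\psi\|_{1,D} = \Big( \int_D \{|\nabla\psi|^2 + |\psi|^2\}\,dx \Big)^{1/2} \]
and let $\Psi_0 = \{\psi \in \Psi : Tr_{\Gamma_d}\psi = 0\} = H^1_{\Gamma_d}(D)$. Let
\[ \alpha(\psi, \varphi) = \int_D \langle \nabla\psi, \nabla\varphi \rangle\,dx + c\int_D \psi\varphi\,dx, \qquad c > 0 \]
($\alpha$ is a bilinear form) and let
\[ \lambda(\varphi) = \int_D [B(u)]\varphi\,dx + \int_{\Gamma_n} \psi_n\varphi\,d\gamma \]
which is a linear form in $\varphi$. The problem becomes: find $u \in U$ such that
\[ \psi(u) \in \Psi_0, \qquad \alpha(\psi(u), \varphi) = \lambda(\varphi) \quad \text{for all } \varphi \in \Psi_0 \tag{13.34} \]
and $J(u)$ is minimized (cf. Chapters 1, 5, 6). Note that:
(1) $\Psi$ is a Hilbert space;
(2) $\Psi_0$ is a closed subspace;
(3) $\alpha(\psi, \varphi)$ is continuous ($|\alpha(\psi, \varphi)| \le m\|\psi\|\|\varphi\|$);
(4) $\lambda$ is continuous ($|\lambda(\varphi)| \le c\|\varphi\|$ for all $\varphi \in \Psi$);
(5) as $c > 0$, $\alpha(\cdot, \cdot)$ is coercive, i.e. $\alpha(\varphi, \varphi) \ge \alpha\|\varphi\|^2$ for all $\varphi \in \Psi_0$ and $\alpha > 0$.

Under these assumptions with $B(u) \in L^2(D)$, there is a unique solution of (13.34). Moreover $\|\psi(u)\| \le c/\alpha$ and so
\[ 0 \le J(u) \le \int_D \Big[ \Big( \frac{c}{\alpha} \Big)^2 + \|u\|_Q^2 \Big]\,dx \]
where $\|u\|_Q^2 = \langle Qu, u \rangle$.

Aside. Consider the problem (pure Neumann) $-\nabla^2\psi = B(u)$, $\partial_n\psi = \psi_n$ (i.e., $c = 0$). Then (Green's theorem)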

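\[ \int_D \nabla^2\psi\,dx = \int_D \operatorname{div}(\nabla\psi)\,dx = \int_\Gamma \partial_n\psi\,d\gamma \]
so that
\[ \int_D B(u)\,dx + \int_{\Gamma_n} \psi_n\,d\gamma = 0 \]
(a compatibility condition) must hold. But if this is satisfied, uniqueness cannot hold as $-\nabla^2\psi = 0$, $\partial_n\psi = 0$ has any constant as a solution. The key is coerciveness.

We next show that the map $u \to \psi(u)$ (a solution) is continuous. Let
\[ \|\varphi\|_{1,c}^2 = \alpha(\varphi, \varphi) = \int_D \{|\nabla\varphi|^2 + c|\varphi|^2\}\,dx, \qquad c > 0, \]
and observe that $\varphi_n \to \varphi$ in $\|\cdot\|_1$ if and only if $\varphi_n \to \varphi$ in $\|\cdot\|_{1,c}$ [Exercise: prove this. Hint: $\nabla\varphi_n \to \nabla\varphi$, $\varphi_n \to \varphi$ a.e.]. If $\psi(u)$ is a solution, then
\[ \|\psi(u)\|_{1,c}^2 = \alpha(\psi(u), \psi(u)) = \lambda(\psi(u)) = \int_D B(u)\psi(u)\,dx + \int_{\Gamma_n} \psi_n^2\,d\gamma \]
as $Tr_{\Gamma_n}\psi(u) = \psi_n$. If $u, w \in U$ and $\psi(u)$, $\psi(w)$ are solutions, then
\[ \begin{aligned} \|\psi(u) - \psi(w)\|_{1,c}^2 &= \alpha(\psi(u) - \psi(w), \psi(u) - \psi(w)) \\ &= \lambda_u(\psi(u)) - \lambda_u(\psi(w)) - \lambda_w(\psi(u)) + \lambda_w(\psi(w)) \\ &= \int_D B(u)\psi(u)\,dx + \int_{\Gamma_n} \psi_n^2\,d\gamma - \int_D B(u)\psi(w)\,dx - \int_{\Gamma_n} \psi_n^2\,d\gamma \\ &\quad - \int_D B(w)\psi(u)\,dx - \int_{\Gamma_n} \psi_n^2\,d\gamma + \int_D B(w)\psi(w)\,dx + \int_{\Gamma_n} \psi_n^2\,d\gamma \\ &= \int_D [B(u) - B(w)][\psi(u) - \psi(w)]\,dx. \end{aligned} \]
Hence
\[ \|\psi(u) - \psi(w)\|_{1,c}^2 \le \int_D |B(u) - B(w)||\psi(u) - \psi(w)|\,dx \le \|B(u) - B(w)\|_2 \|\psi(u) - \psi(w)\|_2 \]
(by the Schwarz inequality). But
\[ \|\varphi\|_{1,c}^2 = \int_D |\nabla\varphi|^2\,dx + c\|\varphi\|_2^2. \]
Hence
\[ c\|\psi(u) - \psi(w)\|_2^2 \le \|B(u) - B(w)\|_2 \|\psi(u) - \psi(w)\|_2 \]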

and $\psi(\cdot)$ is continuous. So if $U$ is compact, there is a solution $u^* \in U$. We can discretize in several ways.

Approach 1 (Ritz–Galerkin). Let $\operatorname{span}[\varphi_1, \ldots, \varphi_n] = W$ be a subspace of dimension $n$ and let $U_n = B^{-1}(W_n)$ (assumed non-empty). Consider the system
\[ \alpha(\psi(u), \varphi_j) = \int_D B(u)\varphi_j\,dx + \int_{\Gamma_n} \psi_n\varphi_j\,d\gamma, \qquad j = 1, \ldots, n, \tag{13.35} \]
with $u \in U_n$ and $\psi(u) = \Sigma x_i\varphi_i \in W$. Let $K = (\alpha(\varphi_i, \varphi_j))$ so that (13.35) becomes
\[ K\psi = K_1\beta + \theta \]
with $B(u) = \Sigma\beta_i\varphi_i$, $K_1 = (\int_D \varphi_i\varphi_j\,dx)$, $\beta = (\beta_i)$ and $\theta = (\int_{\Gamma_n} \psi_n\varphi_i\,d\gamma)$; here $\psi$ on the left stands for the coefficient vector $(x_i)$. As before, $K$ is a non-singular matrix and there is a unique solution.

Approach 2 (Finite element). Here the situation is less straightforward than the scalar equation case due to the issue of the character of the domain. We look at several simple cases.

(1) Let $D$ be the square $0 < x < 1$, $0 < y < 1$.

[Fig. 13.2: the unit square with corners (0,0), (1,0), (1,1), (0,1); $K_1$, $K_2$ are the closed triangles.]

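$\Pi = \{K_1, K_2\}$ is a partition as $\operatorname{int} K_1 \cap \operatorname{int} K_2 = \emptyset$ and $K_1 \cup K_2 = D$. Let $P_1 = P_1^1(K_1)$, $P_2 = P_1^1(K_2)$ so that $\dim P_i = 3$. Set $x_0 = (0, 0)$, $x_1 = (0, 1)$, $x_2 = (1, 0)$, $x_3 = (1, 1)$. Let $\varphi_1^1 = 1$, $\varphi_1^2 = x$, $\varphi_1^3 = y$, and $\varphi_2^1 = 1$, $\varphi_2^2 = x$, $\varphi_2^3 = y$ be bases of $P_1$, $P_2$, respectively. Let
\[ \begin{aligned} \psi_{11}(w) &= w(x_0), & \psi_{12}(w) &= w(x_1), & \psi_{13}(w) &= w(x_2), \\ \psi_{21}(w) &= w(x_1), & \psi_{22}(w) &= w(x_2), & \psi_{23}(w) &= w(x_3). \end{aligned} \]
Then these are the dual bases. The local interpolants are:
\[ I_{K_i}(w) = \sum_{j=1}^{3} \psi_{ij}(w)\varphi_i^j(\cdot) \]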


and IΠ (w)|Ki = IKi (w) gives the global interpolating function. The weak solution approximation is analogous to earlier work. (2) Let D = {(r, θ) : 0 < r < 1, 0 ≤ θ < 2π}. Let K1 = {(r, θ) : 0 ≤ r ≤ 1, 0 ≤ θ < π/2}, K2 = {(r, θ) : 0 ≤ r ≤ 1, π/2 ≤ θ < π}, K3 = {(r, θ) : 0 ≤ r ≤ 1, π ≤ θ < 3π/2}, K4 = {(r, θ) : 0 ≤ r ≤ 1, 3π/2 ≤ θ < 2π}. Let P1 = P11 ×J11 so that dim P1 = 5. We let ρ11 (r) = 1, ρ21 (r) = r, ψ11 (ρ(r)) = ρ(0), ψ12 (ρ(r)) = ρ(1) (where ρ(r) = a0 + a1 r). Then these give dual bases. Let ϕ21 (θ) = 1,

ϕ22 (θ) = 1 − sin θ,

ϕ13 (θ) = sin θ + cos θ − 1.

We view J1 as a finite-dimensional (here 3-dimensional) subspace of L2 ([0, π/2]) (or H 1 ). Let π ψ21 (f (θ)) = f (0), ψ22 (f (θ)) = f , and 2√  π i 1 h π  2  ψ23 (f (θ)) = √ f − 1− f (0) + f . 4 2 2 2−1 Then, ψ2j (ϕi2 ) = δij . For instance, √  i 1 h π  2 1 − sin − 1− (ψ21 (ϕ11 ) + ψ22 (ϕ11 ) = 0, 4 2 2−1 √  h   i π 1 2 ψ23 (ϕ22 ) = √ 1 − cos − 1− (ψ21 (ϕ21 ) + ψ22 (ϕ22 ) = 0, 4 2 2−1 h i 1 π π ψ13 (ϕ32 ) = √ sin + cos − 1 4 4 2−1 √ √ h i 1 2 2 =√ + − 1 = 1. 2 2−1 2

ψ33 (ϕ11 ) = √

These again lead to a weak approximation. There is considerable potential for the use of finite element methods in control problems. In the next chapter, we shall examine several general examples.
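The one-dimensional finite element construction of Example 13.27 can be sketched in a few lines. This is a minimal illustration under assumptions that are not in the text: a uniform mesh with $h = 1/n$, a fixed control so that $f(t, u) \equiv 1$, piecewise linear hat functions attached to the nodes $s_1, \ldots, s_n$ (the last one being a half hat because $\dot{x}(1) = 0$ is a natural boundary condition), and hypothetical helper names `solve_tridiagonal` and `fe_solve`. The matrix $K = ([\varphi_i, \varphi_j])$ is then tridiagonal and $Ka = F$ is solved by forward elimination and back substitution.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    # Thomas algorithm; lower[0] and upper[-1] are unused placeholders.
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def fe_solve(n, f_const=1.0):
    # Nodal values a_1..a_n of the finite element solution of
    # x'' + f = 0, x(0) = 0, x'(1) = 0 on a uniform mesh (assumed setup).
    h = 1.0 / n
    diag = [2.0 / h] * (n - 1) + [1.0 / h]      # [phi_i, phi_i]
    lower = [0.0] + [-1.0 / h] * (n - 1)        # [phi_i, phi_{i-1}]
    upper = [-1.0 / h] * (n - 1) + [0.0]        # [phi_i, phi_{i+1}]
    rhs = [f_const * h] * (n - 1) + [f_const * h / 2.0]   # <f, phi_i>
    return solve_tridiagonal(lower, diag, upper, rhs)


nodal = fe_solve(8)
exact = [s - s * s / 2.0 for s in [(i + 1) / 8.0 for i in range(8)]]
max_nodal_error = max(abs(p - q) for p, q in zip(nodal, exact))
```

For $f \equiv 1$ the exact solution is $x(t) = t - t^2/2$, and in this one-dimensional setting the computed nodal values agree with it up to rounding; between nodes the piecewise linear interpolant carries the usual $O(h^2)$ error.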

Chapter 14

Some Examples

Let $X$ be a Banach space and let $B[x, y]$ be a symmetric bilinear form on $X$. Suppose that $b$ is a given element of $X$ and that $X_0 = X - b$. Let $J(x) = B[x, x]$, $J_0(x_0) = B[x_0, x_0] + 2B[x_0, b] + B[b, b]$. Note that if $x = x_0 + b$, then $J(x) = J_0(x_0)$.

Lemma 14.1. If $x^*$ minimizes $J(x)$, then $x_0^* = x^* - b$ minimizes $J_0(x_0)$ and conversely.

Proof. Let $x_0 \in X_0$ and $x = x_0 + b$. Then $J_0(x_0) = B[x_0, x_0] + 2B[x_0, b] + B[b, b] = B[x, x] = J(x) \ge J(x^*) = B[x^*, x^*] = J_0(x_0^*)$, i.e., $J_0(x_0) \ge J_0(x_0^*)$. The converse is similar. □
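The identity $J(x_0 + b) = J_0(x_0)$ behind Lemma 14.1 is easy to confirm in finite dimensions. The sketch below assumes $X = \mathbb{R}^2$ and a symmetric matrix `A` representing $B[x, y] = x^\top A y$; the matrix, the shift `b` and the point `x0` are arbitrary illustrative values, and the function names are hypothetical.

```python
def bilinear(A, x, y):
    # B[x, y] = x^T A y for a symmetric matrix A (finite-dimensional stand-in).
    return sum(A[i][j] * x[i] * y[j] for i in range(len(x)) for j in range(len(y)))


def J(A, x):
    return bilinear(A, x, x)


def J0(A, x0, b):
    # Shifted functional from the text: B[x0, x0] + 2 B[x0, b] + B[b, b].
    return bilinear(A, x0, x0) + 2.0 * bilinear(A, x0, b) + bilinear(A, b, b)


A = [[2.0, 0.5], [0.5, 1.0]]   # symmetric, illustrative
b = [1.0, -2.0]
x0 = [0.3, 0.7]
x = [x0[i] + b[i] for i in range(2)]
gap = abs(J(A, x) - J0(A, x0, b))
```

Symmetry of `A` is what allows the two cross terms $B[x_0, b]$ and $B[b, x_0]$ to be combined into $2B[x_0, b]$; with a non-symmetric matrix the gap would in general be non-zero.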

Let $D \subset \mathbb{R}^n$ be a regular domain and let $L$ be a uniformly strongly elliptic operator of order $m$ on $D$. Let $\beta[\psi, \varphi]$ be the bilinear form
\[ \beta[\psi, \varphi] = \sum_{|\alpha| \le m,\ |\beta| \le m} (D^\alpha\psi, a_{\alpha\beta}D^\beta\varphi) \tag{14.2} \]
where
\[ (D^\alpha\psi, a_{\alpha\beta}D^\beta\varphi) = \int_D (D^\alpha\psi)(a_{\alpha\beta}D^\beta\varphi)\,dx \]
(and $L(\cdot) = \sum_{|\alpha| \le m,\ |\beta| \le m} D^\alpha(a_{\alpha\beta}D^\beta)(\cdot)$).

Consider the existence problem:

Problem 14.3. Given $g \in H^m(D)$, $f \in L^2(D)$, $Bu \in L^2(D)$, find $\psi(u) \in H^m(D)$ such that (i) $\psi(u) - g \in H_0^m(D)$; and (ii) $\beta[\psi(u), \varphi] = (f, \varphi) + (Bu, \varphi)$ for all $\varphi \in C_0^\infty(D)$, a fortiori for all $\varphi \in H^m(D)$.

Theorem 14.4 (Lax–Milgram [L-1]). If $\beta[\cdot, \cdot]$ is bounded and coercive, then Problem 14.3 has a unique solution for $u \in U$ and $B : U \to L^2(D)$ a bounded linear map.

Let
\[ J(u) = \int_D [\psi(u)^2 + \langle QBu, Bu \rangle]\,dx \tag{14.5} \]
with $Q$ symmetric and coercive. Then
\[ J[u, v] = \int_D [\psi(u)\psi(v) + \langle QBu, Bv \rangle]\,dx \]
is a symmetric bilinear form (as $Q = Q^*$) and $J(u) = J[u, u]$. Let $g$ be given. Then Problem 14.3 is (under appropriate assumptions) equivalent to the following [L-3]:

Problem 14.6. Find $\psi(u) \in H^m(D)$ with $Tr_m\psi(u) = g$ on $\Gamma = \partial D$, where
\[ Tr_m g = \Big( g, \frac{\partial g}{\partial n}, \cdots, \frac{\partial^{m-1} g}{\partial n^{m-1}} \Big) \]
and $\beta[\psi(u), \varphi] = (f, \varphi) + (Bu, \varphi)$ for all $\varphi$.

Again by the Lax–Milgram theorem, there is a unique solution. Now let $X = \{\psi(u) : L\psi(u) = f + Bu,\ Tr_m\psi(u) = g\}$ and let $X_0 = \{\psi(u) : L\psi(u) = f + Bu,\ Tr_m\psi(u) = 0\}$. Let $u_g$ be such that $L\psi(u_g) = Bu_g$ and $Tr_m\psi(u_g) = g$. Set $b = \psi(u_g)$.

Claim 14.7. $X = X_0 + b$.

Proof. Let $\psi(\hat{u})$ be the unique solution with $L\psi(\hat{u}) = f + B\hat{u}$, $Tr_m\psi(\hat{u}) = g$ on $\Gamma$ and let $\tilde{u} = \hat{u} - u_g$. Then $L\psi(\hat{u}) = L\psi(\tilde{u} + u_g) = f + B(\tilde{u} + u_g) = f + B\tilde{u} + Bu_g$ (and $Tr_m\psi(\hat{u}) = Tr_m\psi(\tilde{u}) + Tr_m\psi(u_g) = g$ as $Tr_m\psi(\tilde{u}) = 0$). In other words, $L\psi(\hat{u}) = f + B\hat{u} = L[\psi(\tilde{u}) + \psi(u_g)]$ and $Tr_m\psi(\hat{u}) = g$. By uniqueness, $\psi(\hat{u}) = \psi(\tilde{u}) + \psi(u_g)$. □

Remark 14.8. If $U = \tilde{U} + u_g$, then $u^*$ minimizes $J(u)$ if and only if $\tilde{u}^*$ minimizes $J_0(\tilde{u})$.

Does there exist a minimum? Let $v, w \in \tilde{U}$ and $\psi(v)$, $\psi(w)$ be solutions of $L\psi(v) = f + Bv$, $L\psi(w) = f + Bw$, $\psi(v), \psi(w) \in H_0^m(D)$, respectively. Then

\[ \beta[\psi(v), \varphi] = (f, \varphi) + (Bv, \varphi), \qquad \beta[\psi(w), \varphi] = (f, \varphi) + (Bw, \varphi), \tag{14.9} \]
for all $\varphi \in H_0^m(D)$. Hence $\beta[\psi(v) - \psi(w), \varphi] = (B(v - w), \varphi)$. Since $\beta$ is bounded and coercive,
\[ \alpha\|\psi(v) - \psi(w)\|_m^2 \le \beta[\psi(v) - \psi(w), \psi(v) - \psi(w)] \le \|B\| \cdot \|v - w\| \|\psi(v) - \psi(w)\|_m. \]
In other words, $\psi(\cdot)$ is continuous. Thus, if $B\tilde{U}$ (or $BU$) is compact (or weakly compact, or weak-*-compact) a solution exists.

Now let us be somewhat redundant and recall some definitions and specifics. Let $D \subset \mathbb{R}^n$ be a regular domain. We consider $H^m(D)$ and $H_0^m(D)$. We let $\partial^j/\partial n^j$ be the exterior normal derivative of order $j$ to $\Gamma = \partial D$. Recall that (Appendix E)
\[ \langle \psi, \varphi \rangle_{m,D} = \int_D \sum_{|\alpha| \le m} (D^\alpha\psi)(D^\alpha\varphi)\,dx, \qquad \|\psi\|_{m,D}^2 = \langle \psi, \psi \rangle_{m,D}, \qquad |\psi|_{m,D} = \Big[ \int_D \sum_{|\alpha| = m} |D^\alpha\psi|^2\,dx \Big]^{1/2}. \]
Since we assume $D$ fixed, we omit the subscript $D$. Suppose that $a_{\alpha\beta}(\cdot)$ are in $L^\infty(D)$ and that if $|\alpha| = |\beta| = m$, the $a_{\alpha\beta}(\cdot)$ are continuous. Let
\[ Q[\psi, \varphi] = \sum_{|\alpha| \le m,\ |\beta| \le m} \int_D (D^\alpha\psi)(a_{\alpha\beta}D^\beta\varphi)\,dx + \sum_{j=0}^{m-1} \int_\Gamma \frac{\partial^j\psi}{\partial n^j} \lambda_{2n-1-j}(\varphi)\,d\gamma \tag{14.10} \]
where $\lambda_k = \lambda_k(\gamma, D)$, $k = 0, \ldots, m - 1$, are differential operators. Let
\[ L\psi = \sum_{|\alpha| \le m,\ |\beta| \le m} D^\alpha(a_{\alpha\beta}D^\beta\psi). \tag{14.11} \]
We assume that $Q$ is strongly coercive and bounded, that $L$ is strongly elliptic, and that $B : U \to L^2(D)$ is a bounded linear operator. Then the existence problem
\[ Q[\psi, \varphi] = Q_L[L\psi, \varphi] = (f, \varphi) + (Bu, \varphi) \tag{14.12} \]
for all $\varphi \in H_0^m(D)$ has a unique solution $\psi(u)$. If the $a_{\alpha\beta}(\cdot)$ are smooth (order $2m$), then $L$ is a differential operator as is the formal adjoint $L'$ given by

\[ L'\psi = \sum_{|\alpha| \le m,\ |\beta| \le m} (-1)^{|\alpha|} D^\alpha a_{\alpha\beta}(\cdot) D^\beta\psi \tag{14.13} \]
and $Q_L[L\psi, \varphi] = Q_L[\psi, L'\varphi]$ (integration by parts).

Problem 14.14. Find $u \in U$ such that
\[ Q[\psi, \varphi] = Q_L[L\psi, \varphi] = Q_L[\psi, L'\varphi] = (f, \varphi) + (Bu, \varphi) \]
for all $\varphi \in H_0^m(D)$ and
\[ J(u) = \int_D [\psi(u)^2 + \langle RBu, Bu \rangle]\,dx \]
with $R = R^*$ coercive is minimized.

If $BU$ is appropriately compact, then a solution $u^*$ exists. In view of Lemma 14.1, we shall suppose the boundary data vanish. As before, we consider two approaches, namely: (i) the Ritz–Galerkin method; and, (ii) the finite element method. We let $M = \overline{\mathrm{co}}(BU)$ (i.e. the closed convex hull of $BU$) in $L^2(D)$.

Remark 14.15 (Properties of $BU$ [B-1, D-9]). Let $B : X \to Y$ be a bounded linear map of Banach spaces. Then:
(1) If $U$ is convex, then $BU$, $\overline{BU}$ are convex.
(2) If $U$ is bounded, then $BU$, $\overline{BU}$ are bounded.
(3) If $U$ is weakly sequentially compact, then $BU$, $\overline{BU}$ are weakly sequentially compact.
(4) If $B$ is weakly compact, then $B^*$ is also weakly compact and conversely.
(5) If either $X$ or $Y$ is reflexive, every $B \in L(X, Y)$ is weakly compact.
(6) If $X$ is reflexive, then the weak-*-topology and the weak topology in $X^*$ are the same.

Proof of 3. Let $v_n \in BU$, $v_n = Bu_n$, and let $\{u_n\}$ have a weakly convergent subsequence $\{u_{n'}\}$, $u_{n'} \to u$. Let $y^* \in Y^*$. Then $y^* \circ B \in X^*$ and $(y^* \circ B)(u_{n'}) = y^*(Bu_{n'}) = y^*(v_{n'}) \to (y^* \circ B)(u) = y^*(Bu) = y^*(v)$, $v = Bu$. □

Now let us examine the two approaches.

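1. The Ritz–Galerkin Method

Let $M = BU$ and $S(M)$ be the closed subspace generated by $M$. Let
\[ \beta[\psi, \varphi] = Q_L[L\psi, \varphi] = \sum_{|\alpha| \le m,\ |\beta| \le m} \int_D (D^\alpha\psi)(a_{\alpha\beta}D^\beta\varphi)\,dx \tag{14.16} \]
with $\beta$ bounded and coercive. We have:

Problem$_M$ 14.17. Find $v \in M$ such that, for $f \in L^2(D)$,
(i) $\beta[\psi, \varphi] = (f, \varphi) + (v, \varphi)$ for all $\varphi \in H_0^m(D)$,
(ii) $J(v) = \int_D [\psi(v)^2 + \langle Rv, v \rangle]\,dx$, with $R = R^*$ coercive,
is minimized.

Let $W_N \subset H_0^m(D)$, $\dim W_N = N$, with $V_N = W_N \cap M$ non-empty and $\{\varphi_1, \ldots, \varphi_N\}$ be a basis of $W_N$ with $\varphi_i \in V_N$, $i = 1, \ldots, N$.

Problem$_M^N$ 14.18. Find $v \in V_N = W_N \cap M$ such that, for $f \in L^2(D)$,
(i) $\beta[\psi, \varphi] = (f, \varphi) + (v, \varphi)$, $\psi \in W_N$, for all $\varphi \in W_N$,
(ii) $J(v) = \int_D [\psi(v)^2 + \langle Rv, v \rangle]\,dx$,
is minimized.

Suppose that $\psi = \sum_{i=1}^N x_i\varphi_i$. Then $L\psi = \sum x_i L\varphi_i$ and
\[ \beta[\psi, \varphi_j] = \beta[\Sigma x_i\varphi_i, \varphi_j] = \sum_{i=1}^N x_i\beta[\varphi_i, \varphi_j]. \tag{14.19} \]
Let $k_{ij} = \beta[\varphi_i, \varphi_j]$, $K = (k_{ij})$, and
\[ x = \begin{pmatrix} x_1 \\ \vdots \\ x_N \end{pmatrix}, \qquad \theta = \begin{pmatrix} (f, \varphi_1) \\ \vdots \\ (f, \varphi_N) \end{pmatrix}, \qquad v = \begin{pmatrix} (v, \varphi_1) \\ \vdots \\ (v, \varphi_N) \end{pmatrix}. \tag{14.20} \]
Then (14.16) becomes
\[ Kx = \theta + v. \tag{14.21} \]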
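The passage from (14.19) to the linear system (14.21) can be sketched on a model problem. The following is an illustration only, under assumptions that are not in the text: $D = (0, 1)$, $m = 1$, the form $\beta[\psi, \varphi] = \int_0^1 \psi'\varphi'\,dx + c\int_0^1 \psi\varphi\,dx$, and the basis $\varphi_i(x) = \sin(i\pi x)$ of a subspace of $H_0^1(0, 1)$. The helper names `quad`, `gauss_solve` and `ritz_galerkin` are hypothetical. The entries $k_{ij}$, $(f, \varphi_j)$ and $(v, \varphi_j)$ are computed with a midpoint rule and the system is solved by Gaussian elimination.

```python
import math


def quad(g, n=400):
    # Composite midpoint rule on [0, 1].
    h = 1.0 / n
    return h * sum(g((k + 0.5) * h) for k in range(n))


def gauss_solve(A, rhs):
    # Gaussian elimination with partial pivoting on a small dense system.
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * y[k] for k in range(i + 1, n))
        y[i] = (M[i][n] - s) / M[i][i]
    return y


def ritz_galerkin(f, v, N, c=1.0):
    # Assemble K = (beta[phi_i, phi_j]), theta, v and solve K x = theta + v.
    phi = [lambda x, i=i: math.sin(i * math.pi * x) for i in range(1, N + 1)]
    dphi = [lambda x, i=i: i * math.pi * math.cos(i * math.pi * x)
            for i in range(1, N + 1)]
    K = [[quad(lambda x: dphi[i](x) * dphi[j](x) + c * phi[i](x) * phi[j](x))
          for j in range(N)] for i in range(N)]
    theta = [quad(lambda x: f(x) * phi[j](x)) for j in range(N)]
    vvec = [quad(lambda x: v(x) * phi[j](x)) for j in range(N)]
    return gauss_solve(K, [t + w for t, w in zip(theta, vvec)])


# Sample data: f = sin(pi x), control term v = 0, three basis functions.
coeffs = ritz_galerkin(lambda x: math.sin(math.pi * x), lambda x: 0.0, 3, c=1.0)
```

For this sample data only the first coefficient is non-zero and it equals $1/(\pi^2 + c)$, which matches the exact solution of $-\psi'' + c\psi = \sin(\pi x)$ with zero boundary values. The sine basis makes $K$ diagonal here; a general basis would produce a full matrix, which is why a dense solver is used in the sketch.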

In view of the existence and uniqueness via Lax–Milgram, $K$ is an $N \times N$ non-singular matrix and (14.21) for Problem$_M$ 14.17 has a unique solution for all $v \in W_N \cap M$. Since $W_N$ is finite-dimensional and $M$ is closed, $W_N \cap M$ is in fact compact. As $J$ is lower semi-continuous, there is a solution $v_N^*$ to Problem$_M^N$ 14.18.

Now let $M \subset L^2(D)$, a separable reflexive space. If $M$ is bounded (e.g., if $U$ is bounded), then $M$ is weak-*-compact (a fortiori weak compact) so that an optimal solution $v^* \in M$ exists. We shall assume that $M$ has a non-empty interior. Let $S(\tilde{v}, r)$ be a sphere contained in $M$. Then there is a basis $\{\varphi_1, \ldots, \varphi_N, \ldots\}$ of $L^2(D)$ which is contained in $M$. Let $W_N = \operatorname{span}\{\varphi_1, \ldots, \varphi_N\}$ and let $V_N = W_N \cap M$. Then the $V_N$ are nested, i.e., $V_N \subset V_{N+1} \subset \cdots$, and the infima are decreasing, $\cdots \inf J(V_{N+1}) \le \inf J(V_N)$ for all $N$. In other words, if $v^*$ solves Problem$_M$ 14.17 and $v_N^*$ solves Problem$_M^N$ 14.18, then
\[ J(v^*) \le J(v_{N+1}^*) \le J(v_N^*) \le \cdots \tag{14.22} \]
for all $N$. Moreover, since $\{\varphi_1, \ldots, \varphi_N, \ldots\}$ is a basis, $v^* = \sum_{i=1}^\infty \alpha_i\varphi_i$. Let $\tilde{v}_N = \sum_{i=1}^N \alpha_i\varphi_i$ so that $\tilde{v}_N \to v^*$. Note that
\[ J(\tilde{v}_N) - J(v^*) \ge J(v_N^*) - J(v^*), \tag{14.23} \]
since $\tilde{v}_N \in V_N$.

Remark 14.24. $\lim_{N \to \infty} [J(\tilde{v}_N) - J(v^*)] = 0$ and hence $\lim_{N \to \infty} [J(v_N^*) - J(v^*)] = 0$ (in other words, $\{v_N^*\}$ is a minimizing sequence).

Proof. Note that
\[ J(\tilde{v}_N) - J(v^*) = \int_D [\psi(\tilde{v}_N)^2 - \psi(v^*)^2] + [\langle Q\tilde{v}_N, \tilde{v}_N \rangle - \langle Qv^*, v^* \rangle]\,dx. \]
However,
\[ \int_D [\psi(\tilde{v}_N)^2 - \psi(v^*)^2]\,dx = \int_D \psi(\tilde{v}_N)[\psi(\tilde{v}_N) - \psi(v^*)] + \psi(v^*)[\psi(\tilde{v}_N) - \psi(v^*)]\,dx \]
so that
\[ \int_D [\psi(\tilde{v}_N)^2 - \psi(v^*)^2]\,dx \le \{\|\psi(\tilde{v}_N)\| + \|\psi(v^*)\|\}\|\psi(\tilde{v}_N) - \psi(v^*)\|. \]
Similarly,
\[ \int_D [\langle Q\tilde{v}_N, \tilde{v}_N \rangle - \langle Qv^*, v^* \rangle]\,dx \le \|Q\|\{\|\tilde{v}_N\| + \|v^*\|\}\|\tilde{v}_N - v^*\|. \]
Let $m = \sup\{\|\psi(\tilde{v}_N)\|, \|\psi(v^*)\|, \|\tilde{v}_N\|, \|v^*\|,\ N = 1, 2, \ldots\}$. Then there exist $N_1$, $N_2$ such that, respectively, if $N \ge N_1$, then $\|\psi(\tilde{v}_N) - \psi(v^*)\| \le \epsilon/2m$ (continuity of $\psi$); and if $N \ge N_2$, then $\|\tilde{v}_N - v^*\| \le \epsilon/2m$. It follows that if $N \ge \max(N_1, N_2)$, then $|J(\tilde{v}_N) - J(v^*)| < \epsilon$. Observe that $\tilde{v}_N$ is a convergent minimizing sequence. For bounded $M$, $\{v_N^*\}$ has a convergent subsequence $\{v_{N_i}^*\}$ which is a convergent minimizing sequence. □

Example 14.25 (cf: [L-3]). Let $D \subset \mathbb{R}^n$ be a regular domain and let
\[ L\psi = -\nabla(A(x)\nabla\psi) + b(x) \cdot \nabla\psi + c(x)\psi \tag{14.26} \]
where $A(\cdot) \in L^\infty(D, \mathbb{R}^n \times \mathbb{R}^n)$, $b(\cdot) \in L^\infty(D, \mathbb{R}^n)$, $\nabla b(\cdot) \in L^\infty(D, \mathbb{R}^n)$, $c(\cdot) \in L^\infty(D, \mathbb{R})$, and
\[ \sum_{i,j=1}^n a_{ij}\xi_i\xi_j \ge \alpha\|\xi\|^2 \quad \text{a.e. in } D \]
with $\alpha > 0$ (elliptic). Let
\[ \beta[\psi, \varphi] = \int_D [\nabla\psi A \cdot \nabla\varphi + \psi(b \cdot \nabla\varphi) + c\psi\varphi]\,dx. \tag{14.27} \]

Let $H$ be a Hilbert space with $U \subset H$ and let $B \in L(H, L^2(D))$. We consider the equation
\[ \beta[\psi, \varphi] = (f + Bu, \varphi) \quad \text{for all } \varphi \in H_0^1(D). \tag{14.28} \]
The solution $\psi(u)$ (via Lax–Milgram) is an element of $H_0^1(D)$. Let
\[ J(u) = \|\psi(u)\|^2 + \langle Qu, u \rangle \tag{14.29} \]
where $Q \in L(H, H)$ and $Q = Q^*$ is positive definite. We write $J(u) = \|\psi(u)\|^2 + \|u\|_Q^2$. Let $M = BU$ and let us assume $0 \in \operatorname{int} M$. Then there is an orthogonal basis $\{\varphi_1, \ldots, \varphi_N, \ldots\}$ in $M$ (actually in $\operatorname{int}(M)$).$^{14.1}$ Let $W_N = M \cap V_N$ with $V_N = \operatorname{span}\{\varphi_1, \ldots, \varphi_N\}$. Suppose that $U$ is closed and bounded and so weakly compact as $H$ is reflexive. Since $B$ is bounded linear, $BU$ is weakly compact. If $Bu_n \rightharpoonup v$ (weakly), then $\{u_n\}$ has a weakly convergent subsequence $u_{n_i} \rightharpoonup u$. Hence $Bu_{n_i} \rightharpoonup Bu$. But $Bu_n \rightharpoonup v$ so that $Bu = v$ and $BU$ is closed. We have:

$^{14.1}$ Since $S(0, r) \subset M$ for some $r > 0$, if $\{\tilde{\varphi}_1, \ldots, \tilde{\varphi}_N, \ldots\}$ is an orthonormal basis of $H_0^1(D)$ relative to $\beta$, then $\varphi_i = (\lambda r)\tilde{\varphi}_i$, for any $0 < \lambda < 1$, is an orthogonal basis with $\varphi_i \in M$.

Problem$_M$ 14.30. Find $u \in U$ such that
(i) $\beta[\psi(u), \varphi] = (f + Bu, \varphi)$ for all $\varphi \in H_0^1(D)$,
(ii) $J(u) = \int_D [\psi(u)]^2\,dx + \|u\|_Q^2$,
is minimized.

Problem$_M^N$ 14.31. Find $u \in U$ such that
(i) $\beta[\psi(u), \varphi] = (f + Bu, \varphi)$, for all $\varphi \in V_N$, $\psi(u) \in V_N$,
(ii) $J_N(u) = \int_D [\psi(u)]^2\,dx + \|u\|_Q^2$,
is minimized.

By Lax–Milgram, both problems have solutions as $J(\cdot)$, $J_N(\cdot)$ are lower semi-continuous and $\psi(\cdot)$, $\psi_N(\cdot)$ are continuous. Let $u_i$, $i = 1, \ldots, N, \ldots$ be elements of $U$ such that $Bu_i = \varphi_i$. Then the $u_i$ are linearly independent (if $\sum_{i=1}^N \alpha_i u_i = 0$, then $\Sigma\alpha_i Bu_i = \Sigma\alpha_i\varphi_i = 0$ and $\alpha_i = 0$). In view of the assumption that $U$ is bounded and closed and thus weakly compact, we have $M = BU = \overline{BU}$, $V_N = \operatorname{span}[\varphi_1, \ldots, \varphi_N] = B\operatorname{span}[u_1, \ldots, u_N]$ (where $\varphi_i = Bu_i$). If $U_N = \operatorname{span}[u_1, \ldots, u_N]$ and $\mathcal{U}_N = U_N \cap U$, then $B(U_N \cap U) = BU_N \cap BU$ and $V_N = BU_N$ so that $W_N = B\mathcal{U}_N$ (the key is that if $\Sigma\alpha_i u_i = \Sigma\beta_i u_i$, then $\Sigma\alpha_i\varphi_i = \Sigma\beta_i\varphi_i$ and $\alpha_i = \beta_i$). Consider
\[ \beta[\psi_N(u), \varphi_j] = (f + Bu, \varphi_j), \qquad j = 1, \ldots, N. \tag{14.32} \]

Let $\psi_N(u) = \sum_{i=1}^N v_i\varphi_i$ and set
\[ v = \begin{pmatrix} v_1 \\ \vdots \\ v_N \end{pmatrix}, \qquad \varphi = \begin{pmatrix} \varphi_1 \\ \vdots \\ \varphi_N \end{pmatrix}, \]
so that $\psi_N(u) = v'\varphi$ ($v' = (v_1, \ldots, v_N)$). Let $u = \sum_{i=1}^N \alpha_i u_i$ where $Bu_i = \varphi_i$, $i = 1, \ldots, N$, and set
\[ u = \begin{pmatrix} u_1 \\ \vdots \\ u_N \end{pmatrix}, \qquad \alpha = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_N \end{pmatrix}. \]
Then (14.32) becomes
\[ \sum_{i=1}^N v_i\beta[\varphi_i, \varphi_j] = (f, \varphi_j) + \sum_{i=1}^N \alpha_i(\varphi_i, \varphi_j) \tag{14.33} \]
for $j = 1, \ldots, N$. Letting $k_{ij} = \beta[\varphi_i, \varphi_j] = \beta[Bu_i, Bu_j]$ and $a_{ij} = (\varphi_i, \varphi_j)$, (14.33) becomes
\[ Kv = \theta + L\alpha \tag{14.34} \]
where $K = (k_{ij})$, $L = (a_{ij})$ and
\[ \theta = \begin{pmatrix} (f, \varphi_1) \\ \vdots \\ (f, \varphi_N) \end{pmatrix}. \]
Note that $K$ is non-singular and symmetric and that $L$ is symmetric. It follows that
\[ v = (K^{-1})\theta + (K^{-1}L)\alpha \tag{14.35} \]
and that
\[ J(u) = v'\Big( \int_D \langle \varphi_i, \varphi_j \rangle\,dx \Big)v + \alpha'\tilde{Q}\alpha \tag{14.36} \]
where $\tilde{Q} = (\sqrt{Q}u_i, \sqrt{Q}u_j)$ (as $Q = Q^*$ positive definite). In other words, $J(u) = J(\alpha)$ is a quadratic form in $\alpha$ which is bounded below and so has a minimum $\alpha^*$. By earlier arguments, we have a minimizing sequence and a convergent minimizing sequence.
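How the minimizer $\alpha^*$ of the quadratic form can be computed is easiest to see in a decoupled special case. The sketch below assumes, purely for illustration, that the basis is chosen so that $K$, $L$, the Gram matrix $(\int_D \varphi_i\varphi_j\,dx)$ and $\tilde{Q}$ are all diagonal with entries `k[i]`, `l[i]`, `g[i]`, `q[i]` (this is not claimed in the text, and the numbers are arbitrary). Then (14.35) gives $v_i = (\theta_i + l_i\alpha_i)/k_i$ and (14.36) splits into scalar problems, each minimized where its derivative vanishes.

```python
def cost(alpha, k, l, g, q, theta):
    # J(alpha) from (14.35)-(14.36) in the assumed diagonal case.
    total = 0.0
    for a_i, k_i, l_i, g_i, q_i, t_i in zip(alpha, k, l, g, q, theta):
        v_i = (t_i + l_i * a_i) / k_i
        total += g_i * v_i * v_i + q_i * a_i * a_i
    return total


def minimizer(k, l, g, q, theta):
    # Setting d/da [ g (t + l a)^2 / k^2 + q a^2 ] = 0 component by component
    # gives a = -g l t / (g l^2 + q k^2).
    return [-g_i * l_i * t_i / (g_i * l_i * l_i + q_i * k_i * k_i)
            for k_i, l_i, g_i, q_i, t_i in zip(k, l, g, q, theta)]


# Illustrative data only.
k = [5.0, 20.0]
l = [0.5, 0.5]
g = [0.5, 0.5]
q = [1.0, 1.0]
theta = [1.0, -2.0]

alpha_star = minimizer(k, l, g, q, theta)
j_star = cost(alpha_star, k, l, g, q, theta)
```

In the general (non-diagonal) case the same stationarity condition leads to a linear system for $\alpha^*$, which is well posed when $\tilde{Q}$ is positive definite; the diagonal case is used here only to keep the sketch short.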

2. The Finite Element Method

Let $D$ be a polyhedron and let $\Pi_N = \{K_{i_N}, P_{i_N}, P_{i_N}^*\}$, $N = 1, \ldots$, $i_N = 1, \ldots, \nu_N$, be a simplicial partition of $D$ (the $K_{i_N}$ being affine equivalent $n$-simplices). Let $k \ge 1$ be a fixed integer. Suppose that $\Pi_{N+1}$ is an edgewise subdivision of $\Pi_N$ (based on $k$) for $N = 1, 2, \ldots$. Let $h_N = h(D, \Pi_N) = \sup\{h(K_{i_N})\}$. Then $h_{N+1} \le h_N$ and $\lim_{N \to \infty} h_N = 0$. The basic idea is: start with an initial partition $\Pi_1$, a simplicial finite element with basic finite elements affine equivalent to a fixed element; subsequent partitions are obtained by edgewise subdivision or other suitable conformal refinement.

Example 14.37. Let $D$ be the unit square $\{(x, y) : 0 \le x \le 1,\ 0 \le y \le 1\}$ and let $P_r = \{p(x, y) : p(x, y) = \sum_{i+j \le r} \alpha_{ij}x^i y^j\}$ be the space of polynomials of degree $\le r$ on $D$. Let $\Pi_1 = \{(K_1, P_1, P_1^*), (K_2, P_2, P_2^*)\}$ where $K_i$ are the triangles and $P_i = P_r|_{K_i}$ for $i = 1, 2$.

[Fig. 14.1: the unit square with corners (0,0), (1,0), (1,1), (0,1), divided into the triangles $K_1$ and $K_2$.]

Note that $(K_i, P_i, P_i^*)$ are affine equivalent to $(K_2, P_r(K_2), P_r(K_2)^*)$. Let $M = BU = \overline{BU}$ and let $L_1^r = \{\psi \in H_0^m(D) : \psi|_{K_i} = \psi_i \in P_i\}$ and let $V_1 = L_1^r \cap M$, $V_{1,i} = L_1^r \cap M \cap P_i$ for $i = 1, 2$. Note that $\dim P_i = (r + 2)!/r!\,2! = (r + 2)(r + 1)/2 = \dim V_{1,i}$, and we suppose that $0$ is an interior point. Let $\{\xi_1^a\}$, $\{\xi_2^b\}$ be dual bases of $P_1^*$, $P_2^*$, respectively. Consider the problem(s)
\[ \beta^i[\psi_i(u), \varphi] = (f + Bu, \varphi), \quad \text{for all } \varphi \in H_0^m(K_i) \]
with $\psi_i \in H_0^m(K_i)$ and the problem(s)
\[ \beta^i[\psi_i(v), \varphi] = (f + Bu, \varphi), \quad \text{for all } \varphi \in H_0^m(K_i) \]
and $Bu \in V_{1,i}$, and determine the minimum of $J_i(u) = \int_D \psi_i(u)^2\,dx + \|u\|_Q^2$. In other words, solve the approximate problem(s) restricted to $K_i$, $i = 1, 2$. This is Problem$_M^1$. Let $k = 2$ and consider the edgewise subdivision(s) of $K_1$ and $K_2$ with spaces $P_{1,1}, P_{1,2}, \ldots, P_{1,4}$ and $P_{2,1}, \ldots, P_{2,4}$, see Figure 14.2. This leads to Problem$_M^2$. All have solutions by Lax–Milgram and continuity. (The issue of cross interior boundary is handled by using $C^m$-functions and a density argument.) By results in Appendix F, $\lim_{N \to \infty} \|u^* - u_N^*\| = 0$, so there is a convergent minimizing sequence.

[Fig. 14.2: edgewise subdivision of the unit square into the triangles $K_{1,1}, \ldots, K_{1,4}$ and $K_{2,1}, \ldots, K_{2,4}$.]

We now return to the (more) general situation. Let $M = BU$ and $S(M)$ be the subspace generated by $M$. Let

\[ \beta[\psi, \varphi] = \sum_{|\alpha| \le m,\ |\beta| \le m} \int_D (D^\alpha\psi)(a_{\alpha\beta}D^\beta\varphi)\,dx \tag{14.38} \]
with $\beta$ bounded and coercive. We have:

Problem$_M$ 14.39. Find $v \in M$ such that, for $f \in L^2(D)$,
(i) $\beta[\psi, \varphi] = (f, \varphi) + (v, \varphi)$ for all $\varphi \in H_0^m(D)$,
(ii) $J(v) = \int_D [\psi(v)^2 + \langle Rv, v \rangle]\,dx$, with $R = R^*$ coercive,
is minimized.

Let $(\tilde{K}, P, P^*)$ be an $n$-simplicial basic finite element (e.g., $\tilde{K} = K_2$, $P = P_r(K_2)$, $P^* = P_r(K_2)^*$). Let $\Pi_N = \{(K_{i_N}, P_{i_N}, P_{i_N}^*) : i_N = 1, \ldots, \nu_N\}$ be a finite element on $D$. We suppose that all $(K_{i_N}, P_{i_N}, P_{i_N}^*)$ are affine equivalent to $(\tilde{K}, P, P^*)$ and that $\tau_{i_N} : \tilde{K} \to K_{i_N}$ are the affine transformations defining the equivalence (so $\tau_{i_N}^*(P) = P_{i_N}$, $\tau_{i_N,*}(P_{i_N}^*) = P^*$). We suppose that $\Pi_N$, $N = 1, 2, \ldots$, is shape regular with $\Pi_{N+1}$ a “refinement” (or subdivision) of $\Pi_N$. Then $h_{N+1} \le h_N$, and we assume that, given $\epsilon > 0$, there exists an $N_0$ such that $h_{N_0} < \epsilon$ (and hence, $\lim_{N \to \infty} h_N = 0$). Let
\[ P_N = \{\varphi : \tau_{i_N}^*\varphi = \varphi \circ \tau_{i_N} \in P_{i_N},\ i_N = 1, \ldots, \nu_N\}, \qquad V_N = P_N \cap M, \qquad V_{i_N} = P_{i_N} \cap M, \tag{14.40} \]
where $M = BU$. We have:

Problem$_M^N$ 14.41. Find $v_N \in V_N = W_N \cap M$ such that,
(i) $\beta[\psi(v_N), \varphi] = (f, \varphi) + (v_N, \varphi)$, for all $\varphi \in P_N$,
(ii) $J_N(v_N) = \int_D [\psi(v_N)^2 + \langle Rv_N, v_N \rangle]\,dx$, with $R = R^*$ coercive,
is minimized.

By Lax–Milgram, Problem$_M$ 14.39 and Problem$_M^N$ 14.41 have solutions $v^*$, $v_N^*$, respectively. By the results in Appendix F, $\|v^* - v_N^*\| \to 0$ as $N \to \infty$.

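Example 14.42 (Example 14.25 revisited). Consider the problem of Example 14.25 and let $\Pi_N$ be a family of finite elements satisfying the assumptions made earlier. Let $v^*$, $v_N^*$ be optimal. Then, by the (standard) results (Appendix F, [B-5, E-3]), $\lim_{N \to \infty} \|v^* - v_N^*\|_1 = 0$,
\[ \|v^* - v_N^*\|_1 \le a h_N^{s-1}|v^*|_s \]
(if $n/2 < s \le r + 1$). Sharper estimates are possible under certain conditions [B-5, E-3].

Example 14.43 (“Maxwell's” Equations [S-6]). Let $D \subset \mathbb{R}^3$ be a regular domain. Let $H = L^2(D) \oplus L^2(D) \oplus L^2(D)$ with $(\cdot, \cdot)_H$ as inner product. Let $\partial D = \gamma_1 \cup \gamma_2$ and let $n$ be the exterior normal to $\partial D$. Let $X_i = \{w : w \in H,\ \nabla \times w \in H,\ w \times n = 0 \text{ on } \gamma_i\}$ for $i = 1, 2$ and let $\|w\|_{X_i} = \|w\|_0 + \|\nabla \times w\|_0$ for $i = 1, 2$ ($L^2$-norms). Let $Z = H \oplus H$ with $\|(\eta, \theta)\|_Z = \|\eta\|_0 + \|\theta\|_0$ and let $Z_1 = X_1 \times X_2$ with $\|(\psi, \varphi)\|_{Z_1} = \|\psi\|_{X_1} + \|\varphi\|_{X_2}$. Consider the system
\[ \begin{aligned} (i\omega + \sigma)E - \nabla \times H &= f_1 + B_1u_1, \\ \nabla \times E - i\omega\mu H &= f_2 + B_2u_2, \\ E \times n = 0 \text{ on } \gamma_1, \quad H \times n &= 0 \text{ on } \gamma_2, \end{aligned} \tag{14.44} \]
with $f_i \in H$, $B_i \in L(U_i, H)$, $U_i$ Hilbert spaces, for $i = 1, 2$. Let $L : Z_1 \to Z$ be given by
\[ L(\psi, \varphi) = ((i\omega + \sigma)\psi - \nabla \times \varphi,\ \nabla \times \psi - i\omega\mu\varphi) \tag{14.45} \]
and $\beta : Z_1 \times Z \to \mathbb{R}$ be given by
\[ \beta[(\psi, \varphi), (\eta, \theta)] = (L(\psi, \varphi), (\eta, \theta))_Z. \tag{14.46} \]
Then $L \in L(Z_1, Z)$ and $\beta[\ ,\ ] \in L(Z_1 \times Z, \mathbb{R})$. Moreover, $\beta$ is bounded and coercive, i.e., there exists $\alpha > 0$ such that
\[ \beta[(\psi, \varphi), (\psi, \varphi)] \ge \alpha\|(\psi, \varphi)\|_{Z_1}^2 \]
(take appropriate $(\tilde{\eta}, \tilde{\theta})$, e.g., $\tilde{\eta} = (\sigma - i\omega)\varphi - (1 - i)\nabla \times \varphi$, and consider $\beta[(\psi, \varphi), (\tilde{\eta}, \tilde{\theta})]$). We consider the system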

\[ \beta[(\psi, \varphi), (\eta, \theta)] = ((f_1 + B_1u_1, f_2 + B_2u_2), (\eta, \theta))_Z = ((f_1, \eta) + (B_1u_1, \eta),\ (f_2, \theta) + (B_2u_2, \theta)) \tag{14.47} \]
and let $J(u_1, u_2)$ be given by
\[ J(u_1, u_2) = \int_D \psi(u_1)^2\,dx + \langle R_1u_1, u_1 \rangle + \int_D \varphi(u_2)^2\,dx + \langle R_2u_2, u_2 \rangle \]
with $R_i = R_i^*$ coercive.

As in previous situations we have:

Problem$_M$ 14.48. Let $M_i = B_iU_i$, $i = 1, 2$, and $M = M_1 \oplus M_2$. Then find $(v_1, v_2) \in M$ such that
(i) $\beta[(\psi(v_1), \varphi(v_2)), (\eta, \theta)] = ((f_1 + v_1, f_2 + v_2), (\eta, \theta))$ for all $(\eta, \theta)$,
(ii) $J(v_1, v_2)$ is minimized.

Let $N_i \ge 1$, $i = 1, 2$, be integers and let $X_{i,N_i}$, $i = 1, 2$, be $N_i$-dimensional subspaces of $X_i$, respectively. Let $M_{N_1,N_2} = M \cap (X_{1,N_1} \oplus X_{2,N_2})$. Then we have:

Problem$_M^{N_1,N_2}$ 14.49. Find $(v_{1,N_1}, v_{2,N_2}) \in M_{N_1,N_2}$ such that,
(i) $\beta[(\psi(v_{1,N_1}), \varphi(v_{2,N_2})), (\eta, \theta)] = ((f_1 + v_{1,N_1}, f_2 + v_{2,N_2}), (\eta, \theta))$ for all $(\eta, \theta)$ in $X_{1,N_1} \oplus X_{2,N_2}$,
(ii)
\[ J_{N_1,N_2}(v_{1,N_1}, v_{2,N_2}) = \int_D \psi(v_{1,N_1})^2\,dx + \int_D \varphi(v_{2,N_2})^2\,dx + \langle R_1v_{1,N_1}, v_{1,N_1} \rangle + \langle R_2v_{2,N_2}, v_{2,N_2} \rangle \]
is minimized.

Both the Ritz–Galerkin and finite element methods can be applied. Convergence results can be obtained as in the earlier examples. A general theory for linear PDE control problems with convex cost functionals should be straightforward to develop.

Chapter 15

Roads Not Traveled and Future Paths

Since the book was written over a rather long period of time, there are a number of problems and ideas that we considered but did not pursue. In this somewhat disorganized chapter we provide some discussion which can point the reader to potential avenues of investigation.

Two important areas we did not present are stochastic problems and the role of convexity. Direct methods could certainly play a role here. Consider the following prototype problem:

Proto-Problem 15.1. Find $u \in U$ which minimizes $J(x, u)$ subject to $x = \varphi(u)$.

There is a prototype theorem:

Proto-Theorem 15.2. If $J(\cdot, \cdot)$ is lower semi-continuous, $\varphi(\cdot)$ is continuous, and $U$ is compact, then a solution exists.

This was the idea inherent in Chapter 10. We did not consider, for example, stochastic systems
\[ dx_u = a(t, x_u, u)\,dt + b(t, x_u, u)\,dW \tag{15.3} \]
with $W$ a Brownian motion on $\Omega$, $u(t)$ adapted to appropriate Borel fields $B_t$, standard assumptions on $a(\cdot)$, $b(\cdot)$, $|u(\cdot)|$ bounded a.e., and $E\{\int |u(t)|^n\,d\mu\} < \infty$ (see [F-1, K-3, K-1]). For suitable $U$, continuity can be proved and an existence theorem follows. Since for this case, the solution is a continuous square integrable martingale, application of direct methods is quite feasible. The more general case of Lévy stochastic systems might be possible (but very, very technical).

Convexity (and quasi convexity) are quite significant in control problems [D-1, R-3, L-6].

Proto-Theorem 15.4. If $J(\cdot, \cdot)$ is convex, $\varphi$ is linear, and $U$ is convex, then there is a solution.


This “proto-theorem” is at the heart of linear quadratic problems which are particularly amenable to direct methods. Another path we did not follow is the dynamic programming approach (e.g., [B-2, F-1]). In a sense, this is also a direct method but would have led us far afield. Now let us turn to some more specific issues. Moser’s theorem (Theorem 1.13) is at the heart of penalty function methods and has not been exploited in control problems. We think it particularly relevant for PDE problems. In a similar vein, the general implicit function and inverse function theorems, while leading to a (fairly) general multiplier rule, were not fully exploited. Most spaces involved in control problems are uniformly convex and the Sobolev gradient integration method is applicable but we could have examined some specific cases of interest. We believe a further investigation would prove fruitful here. In particular, examination of difference, i.e., discrete approximation of the “gradient” equation could lead to convergent minimizing sequences. Let, say, L be a linear differential operator on H = H m (D) and consider the control system (L − λ)ψ = Bu, B bounded, linear. Symbolically, ψ(u) = (L − λ)−1 Bu. Thus, in a sense, the solution is (intimately) related to the spectrum. Consider a system of the form Z t xu (t) = (15.5) K(t, s)Bu(s)ds t0

where $K(t,s)$ is a Mercer kernel. Then there is an eigenvalue expansion (as the operator $Lf(\cdot) = \int_E K(\cdot,s)f(s)\,ds$ is compact) and the expansion can be used as the basis of an approximation method for control problems. Consider also a (general) stochastic control system $x_{t,u}(\omega)$ with $x_{t,u}(\cdot)$ a square integrable martingale for all controls $u$. Then $K_u(t,s) = E\{x_{t,u}, x_{s,u}\}$ generates an appropriate kernel and the spectrum may be used for an approximation method. Perhaps it may be possible to generalize to systems defined by normal operators.

We dealt primarily with continuous time dynamical systems, although in later chapters we considered some (non-dynamical) systems defined by partial differential equations. Several extensions are worth considering. In place of $E = [a, b]$, we could examine control systems defined on $E = [a_1, b_1] \times \cdots \times [a_n, b_n]$, i.e., defined by multiple integrals as

$$x_u(t_1, t_2, \dots, t_n) = x(t_1^0, \dots, t_n^0) + \int_{a_1}^{t_1} \cdots \int_{a_n}^{t_n} f(x_u(s_1, \dots, s_n), u(s_1, \dots, s_n))\,ds \tag{15.6}$$

(corresponds to a partial differential equation). Standard classical control theory can be developed for such control systems although there does not seem to be a treatment in the control literature. We also did not do much


on discrete systems; however, if $E \subset \mathbb{Z}$, the integers, and $X$, $U$ are spaces of bounded or $\ell^p$ sequences, then there should be versions of a number of the results (in particular, the existence theory). We also did not consider dynamic programming and viscosity solutions [F-1]. These methods can be used to create minimizing sequences. Our existence theory, while elementary, is quite broad in its applicability as illustrated in the examples in Chapter 14. A version involving quasiconvexity might merit some investigation [D-1].

Direct methods, such as the Ritz–Galerkin method, work particularly well in Hilbert space. Suppose the spaces involved are not necessarily Hilbert but are reflexive. Some relevant facts are: (1) a uniformly convex Banach space is reflexive; (2) if $X^*$ is separable, then $X^*$ contains a countable total set and conversely if $X$ (hence $X^*$) is reflexive. Sets $\{x_j\} \subset X$, $\{x_i^*\} \subset X^*$ are bi-orthogonal if $x_i^*(x_j) = \delta_{ij}$. Then, for $\{x_j\}$, $\{x_i^*\}$ bi-orthogonal,

(3) if
$$\sup_n \Big\| \sum_{i=1}^{n} x_i^*(x)\,x_i \Big\| < \infty$$
for all $x \in X$, then $x = \sum_{i=1}^{\infty} x_i^*(x)\,x_i$ if $x \in \overline{\operatorname{span}}\{x_1, \dots\}$;

(4) if
$$\sup_n \Big\| \sum_{i=1}^{n} x^*(x_i)\,x_i^* \Big\| < \infty$$
for all $x^* \in X^*$, then $x^* = \sum_{i=1}^{\infty} x^*(x_i)\,x_i^*$ if $x^* \in \overline{\operatorname{span}}\{x_i^*\}$.

As separable reflexive spaces have (complete) bi-orthogonal $\{x_j\}$, $\{x_i^*\}$, expansions such as in (3) and (4) are available and approximation methods should be available for control problems in such spaces.

In Chapter 13, we proved an approximation result for a fairly general method (13.13) where the system differential equation was approximated by

$$y_\nu^{j+1} = y_\nu^{j} + h_j\, g_\nu^{j}(y_\nu^{j}, y_\nu^{j-1}, \dots, u_\nu^{j}, u_\nu^{j-1}, \dots)$$

with an arbitrary bounded $g$. It should be possible to generalize the approach and extend it to more general systems.

The use of approximation methods such as Ritz–Galerkin and finite elements has been somewhat limited in control problems ([B-4, D-2, H-1, H-6], for example). Often such methods have been applied to necessary conditions such as the Pontryagin maximum principle and then the necessary conditions have been strengthened to make them sufficient. We have used them directly to obtain solutions. In particular, these methods should have broad applicability in control problems involving partial differential equations. Several examples were given in Chapter 14. We indicate some others now.
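The simplest one-step instance of a difference scheme of the kind recalled above can be sketched as follows. This is only an illustration under stated assumptions, not the method (13.13) itself: the names `simulate` and `g`, the test dynamics $\dot y = -y + u$, and the uniform step are hypothetical choices, and the dependence on earlier iterates $y^{j-1}, u^{j-1}, \dots$ is dropped.

```python
import math


def simulate(g, y0, controls, steps):
    """Explicit one-step scheme y[j+1] = y[j] + h[j] * g(y[j], u[j])."""
    ys = [y0]
    for h, u in zip(steps, controls):
        ys.append(ys[-1] + h * g(ys[-1], u))
    return ys


def g(y, u):
    # Hypothetical bounded-on-bounded-sets dynamics: dy/dt = -y + u.
    return -y + u


n = 1000
h = 1.0 / n
# Zero control on [0, 1]; the exact solution is then y(t) = exp(-t).
ys = simulate(g, 1.0, [0.0] * n, [h] * n)
final_error = abs(ys[-1] - math.exp(-1.0))
```

With the zero control the discrete trajectory can be compared with the exact solution $e^{-t}$; refining the step should shrink `final_error` roughly in proportion to $h$.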


Example 15.7 (Parabolic). Let $E = [t_0, t_1]$ and $D \subset \mathbb{R}^n$ be a regular domain. Consider the differential operator

$$L_t\psi = \sum_{i,j=1}^{n} D_i(a_{ij} D_j\psi) + \sum_{j=1}^{n} b_j D_j\psi + c\psi \tag{15.8}$$

where the coefficients $a_{ij}(t,x)$, $b_j(t,x)$, $c(t,x)$ are in $L^\infty(E \times D)$, the $a_{ij}$ are symmetric, and $L_t$ is uniformly elliptic and coercive, uniformly in $t$. The parabolic control problem is:

$$\begin{aligned} D_t\psi(u)(t,x) + L_t\psi(u)(t,x) &= f + Bu, \\ \psi(u)(t,x) &= 0, \quad x \in \Gamma = \partial D, \\ \psi(u)(t_0,x) &= g(x), \quad x \in D, \end{aligned} \tag{15.9}$$

where $f \in L^2(E, H_0^1(D))$, $g \in L^2(D)$, and $B$ is a bounded linear map from a Hilbert space into $H_0^{-1}(D)$. The cost is a quadratic functional

$$J(u) = \int_E \int_D |\psi(u)(t,x)|^2\,dx\,dt + \langle Qu, u\rangle.$$

Let $(\cdot,\cdot)$ be the inner product on $L^2(D)$ and let $\{\cdot,\cdot\}$ be the pairing of $H_0^{-1}(D)$ and $H_0^1(D)$. Then $\psi(u) : E \to H_0^1(D)$ is a weak solution of (15.9) if: (i) $\psi(u) \in L^2(E, H_0^1(D))$, $D_t\psi(u) \in L^2(E, H_0^{-1}(D))$; (ii) for all $\varphi \in H_0^1(D)$, $(D_t\psi(u), \varphi) + \beta^t[\psi(u), \varphi] = \{f(t) + Bu, \varphi\}$ almost everywhere on $E$, where

$$\beta^t[\psi, \varphi] = [L_t\psi, \varphi] = \sum_{i,j=1}^{n} \int_D a_{ij}(t,x) D_i\psi(x) D_j\varphi(x)\,dx + \sum_{j=1}^{n} \int_D b_j(t,x) D_j\psi(x)\varphi(x)\,dx + \int_D c(t,x)\psi(x)\varphi(x)\,dx,$$

and (iii) $\psi(u)(t_0, x) = g(x)$. Because of boundedness of the forms and coerciveness, solutions exist [L-3], and the approximation methods work (for $u \in U$ compact) as in Chapter 14. A similar treatment for the wave equation (hyperbolic) is possible. Development of a general theory along these lines should not be unduly difficult.

By way of summary, we have tried to sketch some potential avenues of future investigation based on extensions or refinements of material developed in the book and based on areas of interest not covered.
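A one-dimensional model instance of Example 15.7 can be explored numerically. The sketch below is a rough illustration under several assumptions that are not in the text: $D = (0,1)$, $L_t = -D_x^2$, $f = 0$, $B$ the identity, $\langle Qu, u\rangle = q\int\!\!\int u^2$, and explicit finite differences in place of the Galerkin-type approximations of Chapter 14. The names `heat_cost`, `nx`, and `q` are hypothetical.

```python
import math


def heat_cost(u, g, nx=20, T=0.1, q=1.0):
    """Explicit finite differences for psi_t - psi_xx = u on (0, 1).

    u: control, a function (t, x) -> float; g: initial profile x -> float.
    Zero Dirichlet boundary values are imposed.
    Returns (state at time T, approximate quadratic cost).
    """
    dx = 1.0 / nx
    nt = math.ceil(T / (0.4 * dx * dx))   # keeps dt <= dx^2 / 2 (stability)
    dt = T / nt
    xs = [i * dx for i in range(nx + 1)]
    psi = [g(x) for x in xs]
    psi[0] = psi[-1] = 0.0
    cost = 0.0
    for k in range(nt):
        t = k * dt
        # Left Riemann sums for the state term and the control term.
        cost += dt * dx * sum(p * p for p in psi)
        cost += q * dt * dx * sum(u(t, x) ** 2 for x in xs[1:-1])
        new = psi[:]
        for i in range(1, nx):
            lap = (psi[i - 1] - 2.0 * psi[i] + psi[i + 1]) / (dx * dx)
            new[i] = psi[i] + dt * (lap + u(t, xs[i]))
        psi = new
    return psi, cost


# Uncontrolled run from g(x) = sin(pi x); exact state is exp(-pi^2 t) sin(pi x).
psi, cost = heat_cost(lambda t, x: 0.0, lambda x: math.sin(math.pi * x))
```

For the uncontrolled run the midpoint value can be compared with $e^{-\pi^2 T}$; a minimizing sequence would then be sought over a compact family of controls `u`.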

Appendix A

Uniformly Convex Spaces

Let $X$ be a Banach space.

Proposition A.1. The properties (1) and (2) are equivalent. (1) If $0 < \epsilon < 2$, $\|x\| \le 1$, $\|y\| \le 1$, $\|x - y\| \ge \epsilon$, then there is a $\delta > 0$ such that $\|(x + y)/2\| \le 1 - \delta$. (2) If $\epsilon > 0$, $\|x\| = \|y\| = 1$, $\|x - y\| \ge 2\epsilon$, then there is a $\delta$, $0 < \delta < 1$, such that $\|(x + y)/2\| \le \delta$ (or $\|x + y\| \le 2\delta$).

Proof. Assume (1). Let $\epsilon > 0$, $\|x\| = \|y\| = 1$, and $\|x - y\| \ge 2\epsilon$. Then we must have $0 < \epsilon < 1$ as $2 \ge \|x - y\| \ge 2\epsilon$. Let $\epsilon' = 2\epsilon$ so that, by (1), there is a $\delta' > 0$ such that $\|(x + y)/2\| \le 1 - \delta'$. Let $\delta = 1 - \delta'$ and note that $1 - \delta' > 0$, so that $1 > \delta'$ and $0 < \delta < 1$. Assume (2). Let $\epsilon' = \epsilon/2$ so that $\|x - y\| \ge 2\epsilon'$, which implies there is a $\delta'$, $0 < \delta' < 1$, with $\|(x + y)/2\| \le \delta'$. Let $\delta = 1 - \delta'$. Then $\delta > 0$, $\|x - y\| \ge \epsilon$ and $\|(x + y)/2\| \le 1 - \delta$. □

Definition A.2. $X$ is uniformly convex if the equivalent conditions of Proposition A.1 are satisfied [C-4].

We are going to show that the spaces $\ell^p$, $L^p$ are uniformly convex for $1 < p < \infty$. As usual we let $q$ be such that $(1/p) + (1/q) = 1$ so that $q = p/(p - 1)$ and $p = q/(q - 1)$. We observe that if we prove the result for $\ell^p$, then the result will follow for $L^p$ as step functions are dense in $L^p$.

Lemma A.3. Suppose that $0 \le \lambda \le 1$ and that $1 < p \le 2$. Then

$$(1 + \lambda)^q + (1 - \lambda)^q \le 2(1 + |\lambda|^p)^{q-1}. \tag{A.4}$$
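The inequality (A.4) is easy to probe numerically before reading the proof. The following sketch evaluates the gap (right side minus left side) on a grid of $p \in (1, 2]$ and $\lambda \in [0, 1]$; the function name `lemma_a3_gap` and the grid are illustrative assumptions, and a numerical check is of course no substitute for the argument that follows.

```python
def lemma_a3_gap(p, lam):
    """Right side minus left side of (A.4); nonnegative up to rounding."""
    q = p / (p - 1.0)                      # conjugate exponent
    lhs = (1.0 + lam) ** q + (1.0 - lam) ** q
    rhs = 2.0 * (1.0 + abs(lam) ** p) ** (q - 1.0)
    return rhs - lhs


ps = [(11 + k) / 10.0 for k in range(10)]  # p = 1.1, 1.2, ..., 2.0
lams = [k / 50.0 for k in range(51)]       # lambda = 0, 0.02, ..., 1
worst_gap = min(lemma_a3_gap(p, lam) for p in ps for lam in lams)
```

Equality is expected at $\lambda = 0$, at $\lambda = 1$, and for $p = 2$, so `worst_gap` should be zero up to floating-point rounding.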

Proof. Let $\lambda = (1 - \theta)/(1 + \theta)$ so that (A.4) becomes

$$\Big(1 + \frac{1 - \theta}{1 + \theta}\Big)^q + \Big(1 - \frac{1 - \theta}{1 + \theta}\Big)^q \le 2\Big(1 + \frac{(1 - \theta)^p}{(1 + \theta)^p}\Big)^{q-1}$$

or

$$2^q(1 + \theta^q) \le 2\big((1 + \theta)^p + (1 - \theta)^p\big)^{q-1}$$

or

$$2(1 + \theta^q)^{1/(q-1)} \le (1 + \theta)^p + (1 - \theta)^p$$

or

$$2(1 + \theta^q)^{p-1} \le (1 + \theta)^p + (1 - \theta)^p$$

for $1 < p \le 2$, $0 \le \theta \le 1$, since

$$\frac{1}{q - 1} = \frac{1}{\frac{p}{p-1} - 1} = p - 1.$$

Let

$$\psi(\theta) = \frac{(1 + \theta)^p}{2} + \frac{(1 - \theta)^p}{2} - (1 + \theta^q)^{p-1}$$

on $[0, 1]$. Then $\psi(0) = 0$, $\psi(1) = 0$, and we claim that $\psi(\theta) \ge 0$ for $0 < \theta < 1$. The Taylor series for $\psi(\theta)$ is given by

$$\psi(\theta) = \sum_{n=1}^{\infty} \frac{(2 - p)(3 - p) \cdots (2n - p)}{(2n - 1)!}\, \theta^{2n} \Big[ \frac{1 - \theta^{(2n-p)/(p-1)}}{(2n - p)/(p - 1)} - \frac{1 - \theta^{2n/(p-1)}}{2n/(p - 1)} \Big].$$

Since the function $(1 - \theta^t)/t$, $t > 0$, $0 < \theta < 1$, is decreasing, the claim follows. □

Corollary A.5. If $2 \le p < \infty$, then $(1 + \lambda)^p + (1 - \lambda)^p \le 2(1 + |\lambda|^q)^{p-1}$ for $0 \le \lambda \le 1$.

Corollary A.6. If $1 < p \le 2$ and $u, v \in \mathbb{C}$, then

$$|u + v|^q + |u - v|^q \le 2(|u|^p + |v|^p)^{q-1}. \tag{A.7}$$

Proof. Suppose $|u| \ge |v|$; then, dividing by $|u|^q$, (A.7) becomes $|1 + \mu|^q + |1 - \mu|^q \le 2(1 + |\mu|^p)^{q-1}$ with $|\mu| \le 1$. Set $\mu = \lambda e^{iz}$ and it is easy to see that we need only consider $z = 0$ and $0 \le \lambda \le 1$, which is the lemma. □

Remark A.8 (Minkowski’s Inequality). Let $1 < p < \infty$. Then

$$\Big(\sum_{i=1}^{n} |\alpha_i + \beta_i|^p\Big)^{1/p} \le \Big(\sum_{i=1}^{n} |\alpha_i|^p\Big)^{1/p} + \Big(\sum_{i=1}^{n} |\beta_i|^p\Big)^{1/p} \tag{A.9}$$

and, if $(\alpha_i), (\beta_i) \in \ell^p$, then (A.9) holds for $n = \infty$. If $0 < p \le 1$, then

$$\Big(\sum |\alpha_i|^p\Big)^{1/p} + \Big(\sum |\beta_i|^p\Big)^{1/p} \le \Big(\sum |\alpha_i + \beta_i|^p\Big)^{1/p}. \tag{A.10}$$

Proof. We prove (A.9) and leave (A.10) to the reader. Note that $\sum |\alpha_i + \beta_i|^p = \sum |\alpha_i + \beta_i|^{p-1}|\alpha_i + \beta_i| \le \sum |\alpha_i + \beta_i|^{p-1}(|\alpha_i| + |\beta_i|)$. By Hölder’s inequality, $\sum |\alpha_i + \beta_i|^{p-1}|\alpha_i| \le (\sum |\alpha_i + \beta_i|^{(p-1)q})^{1/q} \cdot (\sum |\alpha_i|^p)^{1/p} = (\sum |\alpha_i + \beta_i|^p)^{1/q} (\sum |\alpha_i|^p)^{1/p}$ as $(p - 1)q = p$. Similarly with respect to $\beta_i$. We then have

$$\sum |\alpha_i + \beta_i|^p \le \Big(\sum |\alpha_i + \beta_i|^p\Big)^{1/q} \Big[\Big(\sum |\alpha_i|^p\Big)^{1/p} + \Big(\sum |\beta_i|^p\Big)^{1/p}\Big],$$

and the result follows. □

Lemma A.11. If $1 < p \le 2$, then $\|x + y\|^q + \|x - y\|^q \le 2(\|x\|^p + \|y\|^p)^{q-1}$ for $x, y \in \ell^p$.

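Proof. Let $x = (x_i)$, $y = (y_i)$ and let $\lambda = p/q$ and $\alpha_i = |x_i + y_i|^q$, $\beta_i = |x_i - y_i|^q$. Then, by (A.10) with exponent $\lambda$, we have

$$\begin{aligned} \Big[\sum |x_i + y_i|^p\Big]^{q/p} + \Big[\sum |x_i - y_i|^p\Big]^{q/p} &\le \Big[\sum \big(|x_i + y_i|^q + |x_i - y_i|^q\big)^{p/q}\Big]^{q/p} \\ &\le \Big[\sum \big(2\{|x_i|^p + |y_i|^p\}^{q/p}\big)^{p/q}\Big]^{q/p} \\ &= \Big[\sum 2^{p/q}\{|x_i|^p + |y_i|^p\}\Big]^{q/p} \end{aligned}$$

(by Corollary A.6). Since $q - 1 = q/p$, the result follows. □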
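Lemma A.11 can be spot-checked on finite real sequences, viewed as elements of $\ell^p$ with finitely many nonzero terms. The sketch below samples random vectors and exponents $p \in [1.2, 2]$; the names `lp_norm` and `lemma_a11_gap`, the dimension 5, and the fixed seed are all assumptions made for the illustration.

```python
import random


def lp_norm(x, p):
    """The l^p norm of a finite sequence."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def lemma_a11_gap(x, y, p):
    """Right side minus left side of Lemma A.11 (nonnegative for 1 < p <= 2)."""
    q = p / (p - 1.0)
    plus = [a + b for a, b in zip(x, y)]
    minus = [a - b for a, b in zip(x, y)]
    lhs = lp_norm(plus, p) ** q + lp_norm(minus, p) ** q
    rhs = 2.0 * (lp_norm(x, p) ** p + lp_norm(y, p) ** p) ** (q - 1.0)
    return rhs - lhs


rng = random.Random(0)                     # fixed seed for reproducibility
gaps = []
for _ in range(2000):
    p = rng.uniform(1.2, 2.0)
    x = [rng.uniform(-1.0, 1.0) for _ in range(5)]
    y = [rng.uniform(-1.0, 1.0) for _ in range(5)]
    gaps.append(lemma_a11_gap(x, y, p))
min_gap = min(gaps)
```

For $p = 2$ the inequality reduces to the parallelogram law, so the gap vanishes there up to rounding.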

Lemma A.12. If $1 < p \le 2$, then

$$\|x + y\|^p + \|x - y\|^p \ge 2(\|x\|^q + \|y\|^q)^{p-1} \tag{A.13}$$

for $x, y \in \ell^p$.

Proof. Let us write the inequality of Lemma A.11 in the form $2(\|u\|^p + \|v\|^p)^{q-1} \ge \|u + v\|^q + \|u - v\|^q$. Let $u = x + y$, $v = x - y$ so that $u + v = 2x$ and $u - v = 2y$. Then the inequality becomes

$$2(\|x + y\|^p + \|x - y\|^p)^{q-1} \ge 2^q(\|x\|^q + \|y\|^q)$$

or

$$(\|x + y\|^p + \|x - y\|^p)^{q-1} \ge 2^{q-1}(\|x\|^q + \|y\|^q)$$

or

$$(\|x + y\|^p + \|x - y\|^p)^{1/(p-1)} \ge 2^{1/(p-1)}(\|x\|^q + \|y\|^q)$$

or

$$\|x + y\|^p + \|x - y\|^p \ge 2(\|x\|^q + \|y\|^q)^{p-1}$$

which is the result. □

Lemma A.14. If $p \ge 2$, then the following inequalities are equivalent: (i) $2(\|x\|^p + \|y\|^p)^{q-1} \le \|x + y\|^q + \|x - y\|^q$, (ii) $\|x + y\|^p + \|x - y\|^p \le 2(\|x\|^q + \|y\|^q)^{p-1}$, for $x, y \in \ell^p, L^p$.

Proof. Assume (i). Let $u = x + y$, $v = x - y$. Then, by (i), $2(\|u\|^p + \|v\|^p)^{q-1} \le \|u + v\|^q + \|u - v\|^q$ or $(\|x + y\|^p + \|x - y\|^p)^{q-1} \le 2^{q-1}(\|x\|^q + \|y\|^q)$ or $\|x + y\|^p + \|x - y\|^p \le 2(\|x\|^q + \|y\|^q)^{p-1}$ as $1/(q - 1) = p - 1$. The other implication is proved in a similar way. □

Lemma A.15. If $p > 2$, then

$$2(\|x\|^p + \|y\|^p)^{q-1} \le \|x + y\|^q + \|x - y\|^q \tag{A.16}$$

and

$$\|x + y\|^p + \|x - y\|^p \le 2(\|x\|^q + \|y\|^q)^{p-1} \tag{A.17}$$

for $x, y \in \ell^p$ (or $L^p$).

Proof. In view of Lemma A.14, it is enough to prove (A.16). Applying the Minkowski inequality (A.9) with exponent $p/q > 1$, we have

$$\Big[\sum |x_i + y_i|^p\Big]^{q/p} + \Big[\sum |x_i - y_i|^p\Big]^{q/p} \ge \Big[\sum \big(|x_i + y_i|^q + |x_i - y_i|^q\big)^{p/q}\Big]^{q/p}$$

where $x = (x_i)$, $y = (y_i)$. Since $p > 2$, $q < 2$ and by Lemma A.12 (with the roles of $p$ and $q$ interchanged), $|x_i + y_i|^q + |x_i - y_i|^q \ge 2(|x_i|^p + |y_i|^p)^{q-1}$, which implies that

$$\Big[\Big(\sum |x_i + y_i|^p\Big)^{1/p}\Big]^q + \Big[\Big(\sum |x_i - y_i|^p\Big)^{1/p}\Big]^q \ge 2\Big[\sum (|x_i|^p + |y_i|^p)\Big]^{q-1}$$

or $\|x + y\|^q + \|x - y\|^q \ge 2(\|x\|^p + \|y\|^p)^{q-1}$, which is the result. □


Remark A.18. For any $p$, the following are equivalent: (i) $2(\|x\|^p + \|y\|^p) \le \|x + y\|^p + \|x - y\|^p$, (ii) $\|x + y\|^p + \|x - y\|^p \le 2^{p-1}(\|x\|^p + \|y\|^p)$.

Proof. Assume (i). Let $x' = x + y$, $y' = x - y$. Then by (i), $2(\|x'\|^p + \|y'\|^p) \le \|x' + y'\|^p + \|x' - y'\|^p$ or $2(\|x + y\|^p + \|x - y\|^p) \le 2^p(\|x\|^p + \|y\|^p)$. The other implication is proved similarly. □

Proposition A.19. If $p > 2$, then $\|x + y\|^p + \|x - y\|^p \le 2^{p-1}(\|x\|^p + \|y\|^p)$ for $x, y \in \ell^p$ (or $L^p$).

Proof. In view of (A.17), it is enough to show that

$$2(\|x\|^q + \|y\|^q)^{p-1} \le 2^{p-1}(\|x\|^p + \|y\|^p). \tag{A.20}$$

Suppose, without loss of generality, that $\|y\| \ge \|x\| > 0$ and divide by $\|y\|^{q(p-1)} = \|y\|^p$ so that (A.20) becomes $2(\lambda^q + 1)^{p-1} \le 2^{p-1}(\lambda^p + 1)$ with $0 \le \lambda \le 1$. This is equivalent to showing that $\psi(\lambda) = 2^{(p-2)/p}(\lambda^p + 1)^{1/p}/(\lambda^q + 1)^{1/q} \ge 1$ for $0 \le \lambda \le 1$. But $\psi(1) = 1$, $\psi(0) > 1$, $\psi'(\lambda) < 0$ gives the result. □

Corollary A.21. If $p > 2$, then $\ell^p$ (or $L^p$) is uniformly convex.

Proof. Let $\|x\| = \|y\| = 1$. If $\|x - y\| \ge \epsilon$, $0 < \epsilon \le 2$, then $\|x + y\|^p + \|x - y\|^p \le 2^{p-1}(\|x\|^p + \|y\|^p) = 2^p$ and $\|(x + y)/2\|^p \le [1 - (\epsilon/2)^p]$ so that $\|(x + y)/2\| \le [1 - (\epsilon/2)^p]^{1/p}$. Now take $\delta(\epsilon) = 1 - [1 - (\epsilon/2)^p]^{1/p}$. □

Proposition A.22. If $1 < p \le 2$, then $\|x + y\|^p + \|x - y\|^p \ge 2^{p-1}(\|x\|^p + \|y\|^p)$ for $x, y \in \ell^p$ (or $L^p$).

Proof. In view of Lemma A.12, it is enough to show

$$2(\|x\|^q + \|y\|^q)^{p-1} \ge 2^{p-1}(\|x\|^p + \|y\|^p). \tag{A.23}$$

Suppose, without loss of generality, that $\|y\| \ge \|x\| > 0$ and divide by $\|y\|^{q(p-1)} = \|y\|^p$ so that (A.23) becomes $2(\lambda^q + 1)^{p-1} \ge 2^{p-1}(\lambda^p + 1)$ with $0 \le \lambda \le 1$. Then argue analogously to Proposition A.19 and the result follows. □

Corollary A.24. If $1 < p \le 2$, then $\ell^p$ (or $L^p$) is uniformly convex.

Example A.25. Consider $p = \infty$. Then the inequality (for example) $2(\|f\| + \|g\|) \le \|f + g\| + \|f - g\|$ is false. Let $f \equiv 2$, $g \equiv 1$ so that $f + g \equiv 3$ and $f - g \equiv 1$. Since uniform convexity implies reflexivity [C-4, H-5], neither $\ell^1$ (or $L^1$) nor $\ell^\infty$ (or $L^\infty$) is uniformly convex.

Let $X_1$ and $X_2$ be uniformly convex and let $\psi : \mathbb{R}_0^+ \times \mathbb{R}_0^+ \to \mathbb{R}_0^+$ be a function which is (1) positively homogeneous, i.e., $\psi(\lambda r_1, \lambda r_2) = \lambda\psi(r_1, r_2)$, $\lambda \ge 0$; (2) strictly convex; and (3) strictly increasing in each variable separately. For instance, $\psi(r_1, r_2) = (\sum_{i=1}^{2} r_i^p)^{1/p}$. Let $X = X_1 \times X_2$ with $\|x\|_\psi = \psi(\|x_1\|, \|x_2\|)$; $x = (x_1, x_2)$. $\|x\|_\psi$ is in fact a norm and $X$ is a Banach space.

Proposition A.26 ([C-4]). $(X, \|x\|_\psi)$ is uniformly convex.

Corollary A.27. A product $X_1 \times \cdots \times X_n$ of uniformly convex spaces with a $\psi$-norm is uniformly convex.

Corollary A.28. $L^p \times \cdots \times L^p = X$, $p > 1$, with $\|x\|_X = (\sum_{j=1}^{n} \|x_j\|^p)^{1/p}$ is uniformly convex.

For a proof of the proposition, see [C-4].
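The modulus $\delta(\epsilon) = 1 - [1 - (\epsilon/2)^p]^{1/p}$ from Corollary A.21 and the failure of uniform convexity for the sup norm can both be illustrated on finite sequences. The sketch below is a hedged numerical illustration only: the choice $p = 3$, the dimension 4, the seed, and the helper names `lp_norm`, `delta`, `unit`, `sup_norm` are assumptions, not part of the text.

```python
import random


def lp_norm(x, p):
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def sup_norm(x):
    return max(abs(t) for t in x)


def delta(eps, p):
    """Modulus of Corollary A.21 (intended for p > 2 and 0 < eps <= 2)."""
    return 1.0 - (1.0 - (eps / 2.0) ** p) ** (1.0 / p)


def unit(x, p):
    n = lp_norm(x, p)
    return [t / n for t in x]


rng = random.Random(1)
p = 3.0
worst_slack = float("inf")
for _ in range(2000):
    x = unit([rng.uniform(-1.0, 1.0) for _ in range(4)], p)
    y = unit([rng.uniform(-1.0, 1.0) for _ in range(4)], p)
    eps = min(lp_norm([a - b for a, b in zip(x, y)], p), 2.0)
    mid = lp_norm([(a + b) / 2.0 for a, b in zip(x, y)], p)
    # Corollary A.21 predicts mid <= 1 - delta(eps).
    worst_slack = min(worst_slack, (1.0 - delta(eps, p)) - mid)

# Sup norm: two unit vectors at distance 2 whose midpoint still has norm 1.
x_inf, y_inf = [1.0, 1.0], [1.0, -1.0]
dist_inf = sup_norm([a - b for a, b in zip(x_inf, y_inf)])
mid_inf = sup_norm([(a + b) / 2.0 for a, b in zip(x_inf, y_inf)])
```

The second computation shows why no $\delta > 0$ can work for $p = \infty$: the midpoint of two unit vectors at the maximal distance 2 remains on the unit sphere.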

Appendix B

Operators

Let $X$ and $Y$ be (complex) Banach spaces.

Definition B.1. A linear operator (or operator) from $X$ into $Y$ is a pair $(D(T), T)$ consisting of a subspace $D(T)$ of $X$, called the domain of $T$, and a linear map $T : D(T) \to Y$. The image (or range) of $T$, $\operatorname{Im}(T)$, is $T(D(T))$ and is also a subspace of $Y$. The kernel (or null space) of $T$, $\operatorname{Ker}(T)$, is the subspace of $X$ given by $\{x \in D(T) : Tx = 0\}$. A linear operator $(D(T_e), T_e)$ is an extension of $(D(T), T)$ if $D(T) \subseteq D(T_e)$ and $T_e x = Tx$ for all $x \in D(T)$.

We shall soon see why this formality is necessary. We often omit specific mention of the domain and simply speak of the operator $T$.

Example B.2. Let $X = C([0, 1])$ and let $(D(T_0), T_0)$ be the operator with $D(T_0) = \{x \in X : D_t x \in X, x(0) = 0\}$, $T_0 x = D_t x$; and let $(D(T_1), T_1)$ be the operator with $D(T_1) = \{x \in X : D_t x \in X\}$, $T_1 x = D_t x$. These operators are not the same even though $T_0 x = T_1 x$ when $x \in D(T_0) \subseteq D(T_1)$. For instance, $T_1$ is not injective as $T_1 x = T_1(x + \lambda)$ for any $\lambda \in \mathbb{C}$ but $T_0$ is injective.

Suppose $T$ is bounded on $D(T)$, i.e., $\|Tx\| \le M\|x\|$ for all $x \in D(T)$ and some $M > 0$; then $T$ can be extended to $\overline{D(T)}$ (closure of $D(T)$) and the extension is still bounded. Thus, if $D(T)$ is dense in $X$ and $T$ is bounded on $D(T)$, we may as well assume that $D(T) = X$ and $T \in L(X, Y)$. We now have [B-1, D-9]:

Proposition B.3. Let $T : X \to Y$ be a linear map with $D(T) = X$. The following are equivalent: (1) $T$ is bounded, i.e., there is an $M > 0$ such that $\|Tx\| \le M\|x\|$ for all $x$; (2) $T$ is uniformly continuous on $X$; (3) $T$ is continuous at a point (which may be taken to be 0).

Proof. (1) ⇒ (2): Let $\epsilon > 0$ and let $\delta = \epsilon/M$. If $\|x_1 - x_2\| < \delta$, then $\|Tx_1 - Tx_2\| = \|T(x_1 - x_2)\| \le M\|x_1 - x_2\| \le \epsilon$. (2) ⇒ (3): Obvious. (3) ⇒ (1): If $T$ is continuous at 0, then, for, say, $\epsilon = 1$, there is a $\delta > 0$ such that $\|x\| \le \delta$ implies $\|Tx\| \le 1$ (as $T0 = 0$). Let $x_1 \in X$, $x_1 \ne 0$. Then

$$\frac{\|\delta x_1\|}{\|x_1\|} \le \delta \quad \text{and} \quad \frac{\|T\delta x_1\|}{\|x_1\|} = \frac{\delta\|Tx_1\|}{\|x_1\|} \le 1$$

so that

$$\|Tx_1\| \le \frac{1}{\delta}\|x_1\|.$$

□

If $T$ is an injective operator, then $T^{-1} : \operatorname{Im}(T) \to X$ is given by $T^{-1}(Tx) = x$.

Proposition B.4. Suppose $D(T) = X$. Then $T^{-1}$ exists and is bounded if and only if there is an $m > 0$ such that $\|Tx\| \ge m\|x\|$ for all $x \in X$.

Proof. If $\|Tx\| \ge m\|x\|$, then $x \ne 0$ implies $Tx \ne 0$ and $T$ is injective. But $\|T^{-1}Tx\| = \|x\| \le \|Tx\|/m$ and so $T^{-1}$ is bounded. If $T^{-1}$ exists and is bounded, then $\|x\| = \|T^{-1}Tx\| \le M\|Tx\|$ and $\|Tx\| \ge \|x\|/M$. □

Let us now recall the following:

Theorem B.5 (Open Mapping Theorem [D-9, B-1, H-5]). If $T$ is a surjective continuous linear map of $X$ onto $Y$, then $T$ maps open sets into open sets.

Proof. See [D-9, B-1, H-5]. □

If $X$ and $Y$ are Banach spaces, then $X \times Y$ with, say,

$$\|(x, y)\| = (\|x\|^2 + \|y\|^2)^{1/2} \tag{B.6}$$

(or any equivalent norm) is also a Banach space and if $X$ and $Y$ are Hilbert spaces, then so is $X \times Y$ with this norm.

Definition B.7. If $(D(T), T)$ is a linear operator, then the graph of $T$, $\operatorname{Gr}(T)$, is the subspace of $X \times Y$ given by $\{(x, y) : x \in D(T), y = Tx\}$.

Theorem B.8 (Closed Graph Theorem). If $D(T) = X$ and $\operatorname{Gr}(T)$ is closed in $X \times Y$, then $T$ is continuous.

Proof. Let $\pi_X$, $\pi_Y$ be the projections of $X \times Y$ on $X$ and $Y$ respectively. Then $\pi_X : \operatorname{Gr}(T) \to X$ is bijective, linear and continuous and so $(\pi_X^{-1})^{-1} = \pi_X$ maps open sets onto open sets, i.e., $\pi_X^{-1}$ is continuous. It follows that $T = \pi_Y \circ \pi_X^{-1}$ is continuous. □

Definition B.9. An operator $T$ is closed if $\operatorname{Gr}(T)$ is closed.


Example B.10. Let $X = C([0, 1])$, $Y = C^1([0, 1])$, and let $T : X \to Y$ be given by $(Tx)(t) = D_t(x)(t)$ for $D(T) = \{x \in X : D_t x \in X\}$. Then $T$ is closed but $T$ is unbounded.

Since the “differentiation” operator is generally closed, closed operators are quite important. We have:

Proposition B.11. (1) $T$ is closed if and only if $x_n \in D(T)$, $x_n \to x$, $Tx_n \to y$ together imply that $x \in D(T)$, $Tx = y$; (2) if $T$ is closed, then $\operatorname{Ker} T$ is closed; (3) if $T$ is closed and injective, then $T^{-1}$ is closed; (4) if $D(T)$ is closed and $T$ is continuous, then $T$ is closed.

Proof. (1) is simply a statement of $\operatorname{Gr}(T)$ being closed. As for (2), if $x_n \in \operatorname{Ker} T$, $x_n \to x$, then $Tx_n = 0 \to 0$ and $x \in D(T)$ with $Tx = 0$ (by $T$ being closed). (3) Let $y_n \in D(T^{-1})$, $y_n \to y$, $T^{-1}y_n \to x$. Since $y_n \in D(T^{-1})$, $y_n = Tx_n$ with $x_n \in D(T)$ and so $T^{-1}y_n = x_n \to x$ and $Tx_n = y_n \to y$. As $T$ is closed, $x \in D(T)$ and $Tx = y$, i.e., $y \in D(T^{-1})$ with $T^{-1}y = x$, and $T^{-1}$ is closed. (4) is obvious. □

We shall now introduce the notion of an adjoint. This is a very important concept for the theory. We recall that if $X$ is a Banach space, then $X^* = \{f : f \in L(X, k)\}$, $k = \mathbb{R}$ or $\mathbb{C}$, is the dual space of $X$ and is also a Banach space. We note that there is a natural isometric embedding of $X$ into $(X^*)^*$ ($= X^{**}$) and $X$ is reflexive if $X = X^{**}$ (by identification via the isometry).

Definition B.12. A closed operator $\overline{T}$ is a closure of $T$ if $D(\overline{T}) \supset D(T)$, $\overline{T}$ extends $T$, i.e., $\overline{T}x = Tx$ for $x \in D(T)$, and if $x \in D(\overline{T})$, there are $x_n \in D(T)$ such that $x_n \to x$ and $Tx_n \to \overline{T}x$.

Definition B.13. An operator $T^a : Y^* \to X^*$ is an algebraic adjoint of $T$ if $(T^a y^*)(x) = y^*(Tx)$ for all $x \in D(T)$ and for $y^* \in D(T^a)$. An operator $T^*$ is the adjoint of $T$ if $T^*$ is an algebraic adjoint and if $y^*(Tx) = f(x)$, $f \in X^*$, for all $x \in D(T)$, then $y^* \in D(T^*)$ and $T^* y^* = f$.

If $D(T)$ is dense in $X$, then $D(T^*) = \{y^* \in Y^* : y^*T \text{ is continuous on } D(T)\}$ and $T^*y^* = y^*T$ extended by continuity to $X$. We note that $T^*$ is linear with $D(T^*)$ a subspace of $Y^*$ and $\operatorname{Im}(T^*) \subset X^*$.

Proposition B.14. If $D(T)$ is dense in $X$, then $T^*$ is closed.

Proof. Suppose $y_n^* \in D(T^*)$, $y_n^* \to y^*$, and $T^*y_n^* \to x^*$. Then $y_n^*Tx \to y^*Tx$ for all $x \in D(T)$ and hence, for all $x \in X$ by density. Therefore, $y^*T$ is continuous on $X$ and $y^* \in D(T^*)$. Then $(T^*y^*)(x) = y^*T(x)$ for all $x \in D(T)$ and hence for all $x$ by density. But $(T^*y_n^*)(x) \to T^*y^*(x)$ for all $x$ and so $T^*y^* = x^*$. □

Definition B.15. Let $(D(T), T)$ be a linear operator from $X$ into $Y$. $T$ is bounded if there is an $M > 0$ such that $\|Tx\| \le M\|x\|$ for all $x \in D(T)$. $T$ is continuous at $x_0 \in D(T)$ if, given $\epsilon > 0$, there is a $\delta > 0$ such that $x \in D(T) \cap S(x_0, \delta)$ implies $\|Tx - Tx_0\| < \epsilon$.


Proposition B.16 (same as Proposition B.3). Let $T$ be a linear operator from $X$ into $Y$. The following are equivalent: (1) $T$ is continuous at a point $x_0 \in D(T)$; (2) $T$ is uniformly continuous on $D(T)$; (3) $T$ is bounded on $D(T)$.

Proof. (1) ⇒ (2): Let $\epsilon > 0$. Then there is a $\delta > 0$ such that if $x \in S(x_0, \delta) \cap D(T)$, then $\|T(x - x_0)\| < \epsilon$. Let $x_1, x_2 \in D(T)$ with $\|x_1 - x_2\| < \delta$. Then $\|x_0 + x_1 - x_2 - x_0\| = \|x_1 - x_2\| < \delta$ and $x_0 + x_1 - x_2 \in D(T)$ together imply $\|T(x_1 - x_2)\| = \|Tx_0 - T(x_0 + x_1 - x_2)\| < \epsilon$. (2) ⇒ (3): Since $0 \in D(T)$, given $\epsilon = 1$, there is a $\delta > 0$ such that $\tilde{x} \in S(0, \delta) \cap D(T)$ implies $\|T\tilde{x}\| < 1$. If $x \in D(T)$, $x \ne 0$, then

$$\tilde{x} = \frac{\delta}{2}\frac{x}{\|x\|} \in D(T) \quad \text{and} \quad \|\tilde{x}\| = \frac{\delta}{2} < \delta$$

so that

$$\Big\| T\Big(\frac{\delta}{2}\frac{x}{\|x\|}\Big) \Big\| \le 1$$

and hence $\|Tx\| \le (2/\delta)\|x\|$, so $M = 2/\delta$ gives a bound. (3) ⇒ (1) is obvious. □

Proposition B.17. $D(T^*) = Y^*$ if and only if $T$ is continuous (with $D(T) = X$). Then $T^*$ is continuous and $\|T^*\| = \|T\|$.

Proof. If $T$ is continuous with $D(T) = X$, then $y^*T$ is continuous on $X$ for all $y^* \in Y^*$ so $D(T^*) = Y^*$. If $D(T^*) = Y^*$, then $\|T^*y^*\| = \sup_{\|x\|=1} |(T^*y^*)(x)| = \sup_{\|x\|=1} |y^*Tx|$ so that $\|T^*\| = \|T\|$. By the Hahn–Banach theorem (e.g., [D-9]), if $x_n \to x$, then $Tx_n \to Tx$ if and only if $y^*Tx_n \to y^*Tx$ for all $y^*$. □

Remark B.18. If $Z$ is a reflexive Banach space, then $\{z_\nu^*\} \subset Z^*$ is total (see Definition 5.27) if and only if $\{z_\nu^*\}$ is dense in $Z^*$.

Proof. Clearly dense implies total. Suppose $\{z_\nu^*\}$ is total but not dense. Then there is a $z^{**} \in Z^{**}$, $z^{**} \ne 0$, with $z^{**}(\{z_\nu^*\}) = 0$. By reflexivity, there is a $z \ne 0$, $z \in Z$, with $0 = z^{**}z_\nu^* = z_\nu^* z$ for all $z_\nu^*$, which contradicts $\{z_\nu^*\}$ total. □

Remark B.19. If $D(T)$ is dense in $X$, i.e., $\overline{D(T)} = X$, then $D(T^*)$ is total and therefore dense in $Y^*$ if $Y$ is reflexive.

Proof. Suppose, for $y \ne 0$, $(0, y) \notin \operatorname{Gr}(T)$. Then there is a $\psi \in (X \times Y)^*$ with $\psi(\operatorname{Gr}(T)) = 0$ and $\psi(0, y) \ne 0$ (by the Hahn–Banach theorem). Let $x^* \in X^*$ be given by $x^*x = \psi(x, 0)$ and let $y^* \in Y^*$ be given by $y^*y = \psi(0, y)$. Then $\psi(x, Tx) = x^*x + y^*Tx = 0$ for all $x \in D(T)$ and $0 \ne \psi(0, y) = y^*y$. So $y^* \in D(T^*)$ and $y^*y \ne 0$ (i.e., $D(T^*)$ is total). □

Lemma B.20. If $D(T)$ is dense and $Y$ is reflexive, then $(T^*)^*$ exists and is a closed extension of $T$.


Proof. By the remarks and Proposition B.14, $(T^*)^*$ exists and is closed. Clearly $(T^*)^*$ extends $T$. □

Corollary B.21. The closure $\overline{T}$ of $T$ exists and $D(T) \subseteq D(\overline{T}) \subseteq D((T^*)^*) \subseteq X$.

Corollary B.22. If $Y$ is reflexive, then $\overline{T} = (T^*)^* = T^{**}$ and, similarly, if $Y = X$.

Since the archetypal reflexive space is Hilbert space, these results are particularly significant for Hilbert spaces. For instance, let $H_1$, $H_2$ be Hilbert spaces with inner products $\langle\cdot,\cdot\rangle_1$ and $\langle\cdot,\cdot\rangle_2$, respectively, and let $T : H_1 \to H_2$. Then, identifying $H_1^*$ with $H_1$ and $H_2^*$ with $H_2$ via the Riesz representation theorem, the adjoint $T^* : H_2 \to H_1$ and

$$\langle h_1, T^*h_2\rangle_1 = \langle Th_1, h_2\rangle_2 \tag{B.23}$$

for $h_1 \in H_1$, $h_2 \in H_2$. In particular, if $H_1 = H_2$, then a notion of self-adjoint makes sense.

Definition B.24. Let $M \subset X$. The annihilator of $M$ is $M^\perp = \{f \in X^* : f(M) = 0, \text{ i.e., } f(x) = 0 \text{ for all } x \in M\} = \{f \in X^* : \operatorname{Ker} f \supseteq M\}$. Let $N \subset X^*$. The $X$-annihilator of $N$ is $N_X^\perp = \{x \in X : N(x) = 0, \text{ i.e., } f(x) = 0 \text{ for all } f \in N\} = \bigcap_{f \in N} \operatorname{Ker} f$.

We observe that, for subspaces $M$ and $N$, both $M^\perp$ and $N_X^\perp$ are closed subspaces. Consider heuristically an equation $Tx = y$. Then $T^{-1}y$ is a solution provided $T^{-1}$ exists. Moreover we can “solve” the equation only for $y \in \operatorname{Im} T$. If, for instance, $T$ is a “differentiation” operator (so unbounded), then $T^{-1}$ would be an “integration” operator (so very bounded). These remarks are some intuitive bases for the next set of results.
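In finite dimensions the identity (B.23) can be checked directly: for $H_1 = \mathbb{C}^3$, $H_2 = \mathbb{C}^2$ with the standard inner products, the adjoint is the conjugate transpose. The following sketch assumes that convention (inner product linear in the first slot); the matrix `T`, the vectors, and the helper names are hypothetical test data.

```python
def apply(T, h):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(T[i][j] * h[j] for j in range(len(h))) for i in range(len(T))]


def adjoint(T):
    """Conjugate transpose of T."""
    rows, cols = len(T), len(T[0])
    return [[T[i][j].conjugate() for i in range(rows)] for j in range(cols)]


def inner(a, b):
    """<a, b> = sum_i a_i * conj(b_i), linear in the first argument."""
    return sum(x * y.conjugate() for x, y in zip(a, b))


T = [[1 + 2j, 0j, 3j],
     [2 - 1j, 1j, 1 + 0j]]                 # maps C^3 into C^2
h1 = [1 + 1j, 2 + 0j, -1j]
h2 = [0.5j, 1 - 1j]

lhs = inner(h1, apply(adjoint(T), h2))     # <h1, T* h2>_1
rhs = inner(apply(T, h1), h2)              # <T h1, h2>_2
```

The two sides agree up to rounding, and applying `adjoint` twice returns `T`, the finite-dimensional shadow of $T^{**} = T$.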

Proposition B.25. If $M \subset X$, then $\overline{M} = (M^\perp)_X^\perp$.

Proof. $(M^\perp)_X^\perp = \bigcap_{f \in M^\perp} \operatorname{Ker} f$ and $M^\perp = \{f : f(M) = 0\}$. So $M \subset (M^\perp)_X^\perp$, which is closed, and hence $\overline{M} \subset (M^\perp)_X^\perp$. If $x \in (M^\perp)_X^\perp$, $x \notin \overline{M}$, then, by the Hahn–Banach theorem, there is an $f$ such that $f(\overline{M}) = 0$ (so $f(M) = 0$) and $f(x) \ne 0$, which is a contradiction. □

Proposition B.26. If $N \subset X^*$, then $(N_X^\perp)^\perp \supset \overline{N}$ and if $X$ is reflexive, then $(N_X^\perp)^\perp = \overline{N}$.

Proof. Let $M = N_X^\perp = \{x : f(x) = 0 \text{ for all } f \in N\}$. $M^\perp = (N_X^\perp)^\perp = \{f : f(M) = 0\}$. If $f \in \overline{N}$, there are $f_n \in N$ with $f_n \to f$. But $f_n(x) = 0$ implies $f(x) = 0$ and so $f \in (N_X^\perp)^\perp = M^\perp$. If $X$ is reflexive and if $f \in M^\perp$, $f \notin \overline{N}$, then there is an $x = x^{**} \in X^{**} = X$ such that $x^{**}(\overline{N}) = \overline{N}(x) = 0$ and $x^{**}(f) = f(x) \ne 0$. But $x \in N_X^\perp = M$ implies $f(x) = 0$, a contradiction. □


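Proposition B.27. Let $T : X \to Y$ with $D(T)$ dense in $X$. Then:

(a) $\operatorname{Ker} T^* = (\operatorname{Im} T)^\perp = (\overline{\operatorname{Im} T})^\perp$;
(b) $\overline{\operatorname{Im} T} = (\operatorname{Ker} T^*)_Y^\perp$;
(c) $\operatorname{Ker} T \subset (\operatorname{Im} T^*)_X^\perp$;
(d) $\overline{\operatorname{Im} T^*} \subset (\operatorname{Ker} T)^\perp$.

Proof. (a) If $f \in (\operatorname{Im} T)^\perp$, then $f(\operatorname{Im} T) = 0$ and $f(Tx) = (T^*f)(x) = 0$ for all $x \in D(T)$ and so $f \in \operatorname{Ker} T^*$. If $f \in \operatorname{Ker} T^*$, then $(T^*f)(x) = 0$ for all $x \in D(T)$ and so $f(Tx) = 0$ for all $x \in D(T)$.

(b) Let $M = \operatorname{Im} T$. Then $\overline{M} = (M^\perp)_Y^\perp = \{y : M^\perp(y) = 0\}$ and $M^\perp = \{y^* : y^*(\operatorname{Im} T) = 0\}$ so that $(M^\perp)_Y^\perp = (\operatorname{Ker} T^*)_Y^\perp$.

(c) $(\operatorname{Im} T^*)_X^\perp = \{x : (T^*y^*)(x) = 0 \text{ for all } y^* \in D(T^*)\} = \{x : y^*(Tx) = 0 \text{ for all } y^* \in D(T^*)\}$ so $\operatorname{Ker} T \subset (\operatorname{Im} T^*)_X^\perp$.

(d) Similarly, $\overline{\operatorname{Im} T^*} \subset (\operatorname{Ker} T)^\perp$. □

Corollary B.28. If $D(T^*)$ is dense in $Y^*$, then $\operatorname{Ker} T = (\operatorname{Im} T^*)_X^\perp$.

Proof. If $x \in (\operatorname{Im} T^*)_X^\perp$, then $x^*(x) = 0$ for all $x^* = T^*y^*$, $y^* \in D(T^*)$, and so $y^*(Tx) = 0$ for all $y^* \in D(T^*)$. Since $D(T^*)$ is dense, $y^*(Tx) = 0$ for all $y^* \in Y^*$ and $Tx = 0$. □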
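Part (a) of Proposition B.27 has a transparent finite-dimensional instance. In the sketch below, $T : \mathbb{R}^2 \to \mathbb{R}^3$ is a hypothetical matrix with columns `c1`, `c2`; its image is a plane, $(\operatorname{Im} T)^\perp$ is spanned by the cross product of the columns, and $T^*$ is the transpose. These choices are assumptions made for illustration only.

```python
def cross(a, b):
    """Cross product in R^3."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Columns of a hypothetical T : R^2 -> R^3; Im T is their span.
c1, c2 = [1, 0, 1], [0, 1, 1]

# A vector orthogonal to both columns spans (Im T)^perp.
n = cross(c1, c2)

# T^* = T^T acts on y in R^3 by y -> (c1 . y, c2 . y).
Tt_n = [dot(c1, n), dot(c2, n)]
```

Since the two columns are independent, $\operatorname{Ker} T^*$ has dimension $3 - 2 = 1$, so the single vector `n` spans both $\operatorname{Ker} T^*$ and $(\operatorname{Im} T)^\perp$.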

Corollary B.29. If $Y$ is reflexive, then $D(T^*)$ is dense ($D(T)$ dense) and $\operatorname{Ker} T = (\operatorname{Im} T^*)_X^\perp$.

Corollary B.30. If $T \in L(X, Y)$, then $\operatorname{Ker} T = (\operatorname{Im} T^*)_X^\perp$.

Theorem B.31 (See [D-9, B-1, H-5]). (1) $\operatorname{Im} T^* = X^*$ if and only if $T$ has a bounded inverse; (2) if $T$ is closed, then $\operatorname{Im} T = Y$ if and only if $T^*$ has a bounded inverse.

Proof. See [D-9, B-1, H-5]. □

Lemma B.32. If $T$ is closed, then $T$ has a bounded inverse if and only if $T$ is injective and $\operatorname{Im} T$ is closed.

Proof. If $T$ is injective and $\operatorname{Im} T$ is closed, then $T^{-1}$ is closed and continuous by the closed graph theorem. If $T$ has a bounded inverse, then $X^* = \operatorname{Im} T^*$, so $\operatorname{Ker} T = (X^*)^\perp = \{0\}$, i.e., $T$ is injective. Suppose $Tx_n \to y$. Since $T^{-1}$ is bounded, $\|Tx_n - Tx_m\| \ge \lambda\|x_n - x_m\|$ for some $\lambda > 0$. Hence $x_n$ is Cauchy and $x_n \to x$. Since $T$ is closed, $x \in D(T)$ and $Tx = y$, so $\operatorname{Im} T$ is closed. □

Theorem B.33. Suppose $T : X \to Y$ is closed with $D(T)$ dense. Then the following are equivalent:

(i) $\operatorname{Im} T$ is closed;
(ii) $\operatorname{Im} T^*$ is closed;
(iii) $\operatorname{Im} T = (\operatorname{Ker} T^*)_Y^\perp$; and
(iv) $\operatorname{Im} T^* = (\operatorname{Ker} T)^\perp$.


Proof. Clearly (iii) ⇒ (i) and (iv) ⇒ (ii). If $\operatorname{Im} T^* = (\operatorname{Ker} T)^\perp$, then $(\operatorname{Im} T^*)_X^\perp = ((\operatorname{Ker} T)^\perp)_X^\perp$. If $\operatorname{Im} T^*$ is closed, then $((\operatorname{Im} T^*)_X^\perp)^\perp = \overline{\operatorname{Im} T^*} = \operatorname{Im} T^* = (\operatorname{Ker} T)^\perp$ so that (ii) ⇒ (iv). Similarly, (i) ⇒ (iii). So we show (ii) ⇒ (i) and the result follows. Let $Y_1 = \overline{\operatorname{Im} T}$ and consider $T_1 : X \to Y_1$ with $T_1 x = Tx$. Then $T$ closed implies $T_1$ is closed. So by Theorem B.31, it is enough to show that $\operatorname{Im} T_1 = Y_1$. Now $\operatorname{Im} T_1 = Y_1$ if and only if $T_1^*$ has a bounded inverse. $T_1^*$ is injective and has an inverse. But $\operatorname{Im} T^* = \operatorname{Im} T_1^*$ is closed and the result follows by Lemma B.32. □

Lemma B.34. Suppose $T$ is closed (with $D(T)$ dense). Then $\operatorname{Im} T$ is closed if and only if there is a $\lambda > 0$ such that if $y \in \operatorname{Im} T$, there is an $x \in D(T)$, $y = Tx$, and $\|x\| \le \lambda\|Tx\|$ (or, equivalently, $\lambda^{-1}\|x\| \le \|Tx\|$).

Proof. If $\operatorname{Im} T$ is closed, then, by the open mapping theorem, $TS(0, 1) \supset \{y : y \in \operatorname{Im} T, \|y\| < m\}$. If $0 \ne y \in \operatorname{Im} T$, then $y = Tx$ and $\tilde{y} = my/2\|y\| \in \operatorname{Im} T$ with $\|\tilde{y}\| < m$. So there is an $\tilde{x} \in D(T)$, $\|\tilde{x}\| \le 1$, with $\tilde{y} = T\tilde{x}$. Let $\lambda = 2/m$. Then $\tilde{x} = mx/2\|Tx\|$, $\|\tilde{x}\| \le 1$ imply $\|x\| \le \lambda\|Tx\|$. If there is such a $\lambda$ and $y_n \in \operatorname{Im} T$ with $y_n \to y$, then there are $x_n$ with $y_n = Tx_n$ and $\|x_n\| \le \lambda\|y_n\|$. Since $y_n$ is Cauchy, so is $x_n$ and $x_n \to x$, $Tx_n \to y$. Since $T$ is closed, $x \in D(T)$ and $Tx = y$, i.e., $y \in \operatorname{Im} T$. □

Theorem B.35. Suppose $T : X \to Y$ is closed with $D(T)$ dense. Then the following are equivalent:

(a) $T^*$ is surjective;
(b) $T$ is injective and $\operatorname{Im} T$ is closed;
(c) there is an $\alpha > 0$ such that $\alpha\|x\| \le \|Tx\|$; and
(d) there exists $\beta > 0$ so that
$$\inf_x \sup_{y^*} \frac{|y^*(Tx)|}{\|y^*\|\,\|x\|} \ge \beta, \qquad \|x\| \ne 0, \quad \|y^*\| \ne 0.$$

Proof. (c) and (d) are, by definition, equivalent. If $T^*$ is surjective, then $\operatorname{Im} T^* = X^*$ is closed and $X^* = \operatorname{Im} T^* = (\operatorname{Ker} T)^\perp$ so $\operatorname{Ker} T = 0$ and $T$ is injective. In other words, (a) ⇒ (b). Also (c) is equivalent to (b) by Lemma B.34 (as $\|Tx\| \ge \lambda\|x\|$ implies $T$ injective). If $\operatorname{Im} T$ is closed and $T$ is injective, then $\operatorname{Im} T^* = X^*$ is closed by Theorem B.33. □

The dual result is slightly different and we proceed with some propositions.

Proposition B.36. Suppose $T : X \to Y$ is closed with $D(T)$ dense. If $T$ is surjective, then $\operatorname{Im} T^*$ is closed and $T^*$ is injective. Conversely, if $\operatorname{Im} T^*$ is closed and $T^*$ is injective, then $T$ is surjective.

Proof. By Theorem B.33, $\operatorname{Im} T^*$ is closed (as $T$ surjective means $\operatorname{Im} T$ is closed) and $Y = \operatorname{Im} T = (\operatorname{Ker} T^*)_Y^\perp$ so that $\operatorname{Ker} T^* = 0$. Conversely, if $\operatorname{Im} T^*$ is closed, then $\operatorname{Im} T = (\operatorname{Ker} T^*)_Y^\perp$ and if $T^*$ is injective, then $\operatorname{Ker} T^* = 0$ and $(\operatorname{Ker} T^*)_Y^\perp = (0)_Y^\perp = Y$. □
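For a matrix acting between Euclidean spaces, the inf-sup constant in (d) of Theorem B.35 is the smallest singular value, because the supremum over $y^*$ of $|y^*(Tx)|/\|y^*\|$ equals $\|Tx\|$. The sketch below illustrates this for a hypothetical $2 \times 2$ matrix `A` by sampling unit vectors and comparing with the closed-form value; both the matrix and the sampling resolution are assumptions.

```python
import math

A = [[2.0, 1.0],
     [0.0, 1.0]]                            # hypothetical injective operator on R^2


def norm_Ax(theta):
    """|A x| for the unit vector x = (cos theta, sin theta)."""
    x0, x1 = math.cos(theta), math.sin(theta)
    return math.hypot(A[0][0] * x0 + A[0][1] * x1,
                      A[1][0] * x0 + A[1][1] * x1)


# inf over unit x of sup over y of |y.(Ax)| / |y| = inf over unit x of |Ax|.
n = 20000
beta_sampled = min(norm_Ax(math.pi * k / n) for k in range(n))

# Closed form: square root of the smallest eigenvalue of A^T A.
a = A[0][0] ** 2 + A[1][0] ** 2
b = A[0][0] * A[0][1] + A[1][0] * A[1][1]
d = A[0][1] ** 2 + A[1][1] ** 2
beta_exact = math.sqrt(0.5 * ((a + d) - math.sqrt((a - d) ** 2 + 4.0 * b * b)))
```

Here `beta_exact` plays the role of both $\alpha$ in (c) and $\beta$ in (d); it is positive exactly because `A` is injective.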


Proposition B.37. There is a $\lambda > 0$ such that $\|T^*y^*\| \ge \lambda\|y^*\|$ for all $y^* \in D(T^*)$ if and only if there is a $\mu > 0$ such that

$$\inf_{y^*} \sup_x \frac{|(T^*y^*)(x)|}{\|y^*\|\,\|x\|} \ge \mu, \qquad \|y^*\| \ne 0, \quad \|x\| \ne 0, \quad y^* \in D(T^*).$$

Proof. Obvious. □

Proposition B.38. Suppose $T : X \to Y$ with $D(T)$ dense in $X$ so that $T^*$ is closed by Proposition B.14. If there is a $\lambda > 0$ such that $\|T^*y^*\| \ge \lambda\|y^*\|$ for all $y^* \in D(T^*)$, then $T^*$ is injective and $\operatorname{Im} T^*$ is closed.

Proof. Since $T^*y^* = 0$ implies $0 = \|T^*y^*\| \ge \lambda\|y^*\|$ and $\|y^*\| = 0$, $T^*$ is injective. Let $x_n^* \in \operatorname{Im} T^*$ with $x_n^* = T^*y_n^*$ and $x_n^* \to x^*$. Then $\|x_n^* - x_m^*\| \ge \lambda\|y_n^* - y_m^*\|$ so that $y_n^*$ is Cauchy and $y_n^* \to y^*$, $T^*y_n^* \to x^*$. Since $T^*$ is closed, $y^* \in D(T^*)$ and $T^*y^* = x^*$. □

Proposition B.39. Suppose $T : X \to Y$ is closed with $D(T)$ dense. If $D(T^*)$ is dense in $Y^*$ (in particular if $Y$ is reflexive) and if $T^*$ is injective and $\operatorname{Im} T^*$ is closed, then there is a $\lambda > 0$ such that $\|T^*y^*\| \ge \lambda\|y^*\|$ for all $y^* \in D(T^*)$.

Proof. Lemma B.34 applied to $T^*$. □

Suppose now that $T : X \to Y$ is a densely defined closed operator so that $T^* : Y^* \to X^*$ is also a closed operator. Now consider the following:

Problem 1. Given $y \in Y$, is there an $x \in D(T)$ such that $Tx = y$?

Problem 1$^*$. Given $x^* \in X^*$, is there a $y^* \in D(T^*)$ such that $T^*y^* = x^*$?

Properties 1.
(1.a) Uniqueness for Problem 1, i.e., $\operatorname{Ker} T = \{0\}$;
(1.b) there is a $\lambda > 0$ such that $\|x\| \le \lambda\|Tx\|$ for all $x \in D(T)$;
(1.c) $\operatorname{Im} T$ is dense, i.e., $\overline{\operatorname{Im} T} = Y$;
(1.d) $\operatorname{Im} T$ is closed.

Properties 1$^*$.
(1$^*$.a) Uniqueness for Problem 1$^*$, i.e., $\operatorname{Ker} T^* = \{0\}$;
(1$^*$.b) there is a $\lambda^* > 0$ such that $\|y^*\| \le \lambda^*\|T^*y^*\|$ for all $y^* \in D(T^*)$;
(1$^*$.c) $\operatorname{Im} T^*$ is dense, i.e., $\overline{\operatorname{Im} T^*} = X^*$;
(1$^*$.d) $\operatorname{Im} T^*$ is closed.

Proposition B.40. (i) (1.b) ⇒ (1.a); (ii) (1.d) ⇒ (1.c); and (iii) (1.d) ⇔ (1.b). Similarly for the $*$ properties.

Proposition B.41. (i) (1$^*$.c) ⇒ (1.a) and if $X$ is reflexive, then (1.a) ⇒ (1$^*$.a); (ii) (1.b) ⇔ $T^*$ is surjective, i.e., $\operatorname{Im} T^* = X^*$; (iii) (1.d) ⇔ (1$^*$.c) ≡ (1$^*$.d);


(iv) $T$ is surjective, i.e., $\operatorname{Im} T = Y$ ⇔ (1$^*$.b).

Proof. Apply the various results (Proposition B.27, Theorems B.31 and B.33), and note for the second part of (i) that $D(A^{-1*}) = D(A^*)$. □

Definition B.42. Problem 1 is “well-posed” if Property (1.b) is satisfied and Problem 1$^*$ is “well-posed” if Property (1$^*$.b) is satisfied.

Theorem B.43. If $T$ is a closed operator with $D(T)$ dense, then the (a priori) estimates (1.b), (1$^*$.b) are necessary and sufficient conditions for surjectivity of $T$ and $T^*$.

The operators we shall deal with often come from control problems involving integral or partial differential equations. Often these have certain “finite-dimensional” aspects.

Definition B.44. Let $T : X \to Y$. Then, if $W$ is a subspace of a Banach space $Z$, the codimension of $W$, $\operatorname{codim} W$, is the dimension of $W^\perp$, i.e., $\operatorname{codim} W = \dim W^\perp$. We say (abuse of language) that the codimension of $T$, $\operatorname{codim} T$, is $\operatorname{codim} \operatorname{Im} T = \dim (\operatorname{Im} T)^\perp$.

We note that if $T$ has $D(T)$ dense, then $\operatorname{codim} T = \dim(\operatorname{Im} T)^\perp = \dim \operatorname{Ker} T^*$. In particular, if $\operatorname{codim} T < \infty$, then $\dim \operatorname{Ker} T^* < \infty$. We also have:

Proposition B.45. If $T$ is closed with $D(T)$ dense, then $\operatorname{codim} T < \infty$ if and only if $\dim \operatorname{Ker} T^* < \infty$.

Definition B.46. A closed densely defined operator $T$ is a Fredholm operator if $\operatorname{codim} T < \infty$ and $\dim \operatorname{Ker} T < \infty$. In that case, $\dim \operatorname{Ker} T - \operatorname{codim} T$ is the index of $T$, $i(T)$. Moreover, $T^*$ is also a Fredholm operator and $i(T^*) = \dim \operatorname{Ker} T^* - \operatorname{codim} T^* = \operatorname{codim} T - \dim \operatorname{Ker} T = -i(T)$. If $i(T) = 0$, $T$ is a standard Fredholm operator.

Definition B.47. Let $T : X \to Y$ and let $S_X = S_X(0, 1) = \{x : \|x\| \le 1\}$. Then $TS_X = \{Tx : x \in S_X \cap D(T)\}$. $T$ is compact (or completely continuous) if $\overline{TS_X}$ is compact in $Y$.

Remark B.48. If $T$ is bounded and $\dim \operatorname{Im} T < \infty$, then $T$ is compact.

Remark B.49. $T$ is compact if and only if $\{x_n\}$ bounded in $D(T)$ implies that $\{Tx_n\}$ has a convergent subsequence.

Remark B.50. If $T$ is compact, then $T$ is continuous (on $D(T)$).

Proof. Since $\overline{TS_X}$ is compact (thus totally bounded), $TS_X$ is bounded. If $y \in TS_X$, then $\|y\| \le M$ for some $M$, i.e., $y = Tx$, $x \in S_X \cap D(T)$, $\|Tx\| \le M$. Hence, for $x \in D(T)$, $x \ne 0$,

$$\frac{x}{\|x\|} \in S_X \cap D(T) \quad \text{and} \quad \Big\| T\frac{x}{\|x\|} \Big\| \le M$$

so that $\|Tx\| \le M \cdot \|x\|$. □


Theorem B.51. Suppose that T ∈ L(X, Y). Then T is compact if and only if T∗ is compact.

Proof. Suppose T is compact, let ε > 0 be given and set S = S_X ∩ D(T) = S_X. Then there are x_1, ..., x_n in S such that, if x ∈ S, then ||Tx − Tx_i|| < ε/3 for some x_i. Let T̃∗ : Y∗ → C^n be given by

$$\tilde T^* y^* = (y^* T x_1, \ldots, y^* T x_n). \tag{B.52}$$

Then T̃∗ is compact and there are y_j∗, j = 1, ..., m, with y_j∗ ∈ S_{Y∗} such that y∗ ∈ S_{Y∗} implies ||T̃∗y∗ − T̃∗y_j∗|| < ε/3 for some j. For x ∈ S, choosing x_i with ||Tx − Tx_i|| < ε/3, we then have

|y∗Tx − y_j∗Tx| ≤ |y∗Tx − y∗Tx_i| + |y∗Tx_i − y_j∗Tx_i| + |y_j∗Tx_i − y_j∗Tx| ≤ ε,

i.e., ||T∗y∗ − T∗y_j∗|| ≤ ε, so T∗ is compact. Conversely, if T∗ is compact, then T∗∗ is compact by the first part. Letting η_X, η_Y be the isometric embeddings into X∗∗, Y∗∗ respectively, we have T∗∗η_X = η_Y T and, noting the invertibility of the η’s on their images, we have the result. □

Lemma B.53. If span {x_1, ..., x_n} = M is a finite-dimensional (hence closed) proper subspace of X, then there is an x ∈ S_X such that d(x, M) = 1.

Proof. If w ∈ X, w ∉ M, then there is a sequence w_j ∈ M such that ||w − w_j|| → d(w, M) and so a subsequence w_{j_α} with w_{j_α} → w̃, w̃ ∈ M. Then ||w − w̃|| = lim_{α→∞} ||w − w_{j_α}|| = d(w − w̃, M). Since w ∉ M, w − w̃ ≠ 0 and

$$d\Big(\frac{w - \tilde w}{\|w - \tilde w\|},\, M\Big) = 1.$$

Lemma B.54. If S_X is totally bounded, then dim X < ∞.

Proof. There are x_1, ..., x_m in S_X such that if x ∈ S_X, then ||x − x_i|| < 1 for some x_i. Let M = sp {x_1, ..., x_m}. If M ≠ X, there is, by Lemma B.53, an x ∈ X, ||x|| ≤ 1, such that d(x, M) = 1. A contradiction. □

Theorem B.55. If T is compact and dim X ∩ D(T) = ∞, then T invertible implies T⁻¹ is unbounded.

Proof. Suppose T has a bounded inverse on a subspace M of D(T). Then T S_M is totally bounded and, since T⁻¹ is bounded, T⁻¹T S_M = S_M is totally bounded and so, by Lemma B.54, M is finite-dimensional. Hence, T does not have a bounded inverse on any infinite-dimensional subspace of D(T) (in particular X ∩ D(T)). □

Now let T : X → X be a compact operator, T ∈ L(X, X), so that T∗ : X∗ → X∗ is also compact. Then λI − T, λI − T∗, λ ∈ C, are Fredholm operators and we have:

Theorem B.56. Consider the equations

(1) (λI − T)x = y,    (1∗) (λI − T∗)x∗ = y∗,

and the homogeneous equations

(1h) (λI − T)x = 0,    (1∗h) (λI − T∗)x∗ = 0.

Then (1) and (1∗) have unique solutions for any y in X or any y∗ in X∗ if and only if the homogeneous equations (1h), (1∗h) have only the zero solution. Otherwise, both have finite-dimensional solution spaces of the same dimension, i.e., dim Ker (λI − T) = dim Ker (λI − T∗) = n < ∞. Moreover, in that case (1) has a solution if and only if y ∈ Ker (λI − T∗)⊥ and (1∗) has a solution if and only if y∗ ∈ Ker (λI − T)⊥.

Compact operators are intimately connected with integral equations, especially through kernels. We have:

Proposition B.57 ([K-2]). Let 1 < p < ∞ and (1/p) + (1/q) = 1. Let (S, B, µ) be a measure space and let K : S × S → C or R be a map such that

$$\int_S \Big[\int_S |K(t,s)|^p\,\mu(ds)\Big]^{q/p}\mu(dt) = M < \infty. \tag{B.58}$$

Then the map T : Lp(S, C or R) → Lp(S, C or R) given by

$$Tf(t) = \int_S K(t,s)\,f(s)\,\mu(ds) \tag{B.59}$$

is compact and ||T|| ≤ M.

Corollary B.60. If p = q = 2 and

$$\Big[\int_S\int_S |K(t,s)|^2\,\mu(ds)\,\mu(dt)\Big]^{1/2} = \|K(\cdot,\cdot)\|_2 = M < \infty,$$

then T is a compact operator on L2.

Proposition B.61. If

$$\operatorname*{ess\,sup}_t \int_S |K(t,s)|\,\mu(ds) \le M \quad\text{and}\quad \operatorname*{ess\,sup}_s \int_S |K(t,s)|\,\mu(dt) \le M,$$

then T is a compact operator in L(Lp, Lp) with ||T|| ≤ M.

Proposition B.62. If S is compact with µ a Borel measure and if K is continuous, then T is a compact operator in L(C(S), C(S)).

Proofs can be found in [D-9, K-2]. Let us now examine some simple examples.

Example B.63. Let X = L2[0, 1] and let D ⊂ X be given by D = {x : x is absolutely continuous and D_t x = dx/dt ∈ X}. We consider several operators T_i : X → X with domains D_i = D(T_i), all contained in D. Let T1 = D_t with D1 = D. Let T2 = D_t with D2 = D ∩ {x : x(0) = 0}. Let T3 = D_t with D3 = D ∩ {x : x(0) = 0, x(1) = 0}. Let T4 = D_t with D4 = D ∩ {x : x(0) = 0, x′(0) = 0}. Let T5 = D_t with D5 = D ∩ {x : x(0) − x(1) = 0, i.e.,

x(0) = x(1)}. We shall treat T1 in detail and leave the analysis of the other operators to the reader. We first list the various properties:

(1) T1 is closed with D1 dense and Im T1 = X; T1∗ = −D_t and D(T1∗) = D3.
(2) T2 is closed with D2 dense and Im T2 = X; T2∗ = −D_t and D(T2∗) = D ∩ {x : x(1) = 0}.
(3) T3 is closed with D3 dense; T3∗ = −D_t and D(T3∗) = D1.
(4) T4 is closed with D4 dense but Im T4 is not closed; T4∗ = −D_t and D(T4∗) = D(T2∗) so that T4∗ = T2∗.
(5) T5 is closed with D5 dense; T5∗ = −D_t and D(T5∗) = D5.

We observe that T3 ⊂ T2 ⊂ T1 (where here ⊂ means “is extended by,” i.e., T1 extends T2 which extends T3) and that T1∗ ⊂ T2∗ ⊂ T3∗. Generally, restricting the domain of an operator T increases (or at least does not reduce) the domain of T∗. In other words, “the more constraints on T, the fewer constraints on T∗.”

Let us now deal with T1. First, T1 is closed. Suppose x_n → x and T1x_n = x_n′ → y. Then

$$x_n(t) = \int_0^t \frac{dx_n}{ds}\,ds + x_n(0)$$

and x_n(t) → x implies

$$x(t) = \int_0^t y(s)\,ds + \lim_{n\to\infty} x_n(0).$$

It follows that lim_{n→∞} x_n(0) exists and that x′ = y, i.e., T1 is closed. Let ⟨·, ·⟩ denote the inner product in L2[0, 1]. Suppose ⟨T1x, y⟩ = ⟨x, w⟩ for all x ∈ D1, i.e., w = T1∗y. Now,

$$\int_0^1 x(t)w(t)\,dt = -\int_0^1 x'(t)\Big[\int_0^t w(s)\,ds\Big]dt + x(1)\int_0^1 w(s)\,ds \tag{B.64}$$

(integration by parts of ⟨x, w⟩). Since x_0(t) ≡ 1 is in D1 and T1x_0 = 0 (so T1 is not injective), ⟨x_0, w⟩ = 0 so that ∫_0^1 w(s)ds = 0 (or equivalently ∫_0^1 w̄(s)ds = 0, with w̄ the complex conjugate of w). It follows that

$$\langle x, w\rangle = \int_0^1 x(t)w(t)\,dt = -\int_0^1 x'(t)\Big[\int_0^t w(s)\,ds\Big]dt \tag{B.65}$$

for all x ∈ D1. But

$$\langle T_1 x, y\rangle = \int_0^1 \frac{dx}{dt}\,y(t)\,dt \tag{B.66}$$

and (B.65), (B.66) together imply

$$\int_0^1 x'(t)\Big[y(t) + \int_0^t w(s)\,ds\Big]dt = 0 \tag{B.67}$$

for all x ∈ D1. Substituting x(t) = ∫_0^t [y(τ) + ∫_0^τ w(s)ds]dτ into (B.67), we find that y(t) = −∫_0^t w(s)ds and hence that w(t) = −D_t y, i.e., T1∗ = −D_t. Moreover, since y(t) = −∫_0^t w(s)ds, y(0) = 0 and y(1) = 0 = ∫_0^1 w(s)ds. In other words, D(T1∗) = D3. Also Ker T1∗ = 0, which implies Im T1 = L2[0, 1]. □

The other properties are proved in a similar fashion.

Example B.68. Let X = L2[0, 1] and let D = {x : x, x′, x″ ∈ X and x, x′ absolutely continuous}. Let T = −D_t² = −d²/dt² and let D(T) = D ∩ {x : x(0) = 0, x(1) = 0}. Consider the equation Tx = u. Let K(t, s) be given by

$$K(t,s) = \begin{cases} (1-t)s, & 0 \le s < t \le 1,\\ (1-s)t, & 0 < t < s \le 1, \end{cases} \tag{B.69}$$

on [0, 1] × [0, 1]. Consider the (compact) operator U given by

$$x(t) = (Uu)(t) = \int_0^1 K(t,s)\,u(s)\,ds. \tag{B.70}$$
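The operator U of (B.70) can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes a uniform grid, the trapezoidal rule, and the test input u ≡ 1, for which the boundary value problem −x″ = 1, x(0) = x(1) = 0 has the closed-form solution x(t) = t(1 − t)/2. The names `green_kernel` and `apply_U` are illustrative only.

```python
import numpy as np

def green_kernel(t, s):
    """Kernel (B.69): (1 - t) s for s < t and (1 - s) t otherwise."""
    return np.where(s < t, (1.0 - t) * s, (1.0 - s) * t)

def apply_U(u, n=401):
    """Approximate x = U u on a uniform grid using the trapezoidal rule."""
    grid = np.linspace(0.0, 1.0, n)
    h = grid[1] - grid[0]
    weights = np.full(n, h)
    weights[0] = h / 2.0
    weights[-1] = h / 2.0
    K = green_kernel(grid[:, None], grid[None, :])  # rows: t, columns: s
    return grid, K @ (weights * u(grid))

grid, x = apply_U(lambda s: np.ones_like(s))
h = grid[1] - grid[0]

# Compare with the closed-form solution of -x'' = 1, x(0) = x(1) = 0.
exact = grid * (1.0 - grid) / 2.0
max_error = float(np.max(np.abs(x - exact)))

# Central second difference as a discrete stand-in for x''.
second_diff = (x[:-2] - 2.0 * x[1:-1] + x[2:]) / h**2
max_ode_defect = float(np.max(np.abs(-second_diff - 1.0)))
```

For other inputs u the trapezoidal rule is only approximate, so the agreement with the true solution would degrade to the order of the grid spacing squared.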

Note that since K(0, s) = 0 and K(1, s) = 0, we have x(0) = 0 and x(1) = 0. We claim that x = Uu is a solution of Tx = u. Let t_0 ∈ (0, 1). Then

$$\frac{dx}{dt} = \int_0^{t_0-} \frac{dK(t,s)}{dt}\,u(s)\,ds + \int_{t_0+}^{1} \frac{dK(t,s)}{dt}\,u(s)\,ds \tag{B.71}$$

and

$$\frac{dK(t,s)}{dt}\Big|_{(t,\,t_0-)} - \frac{dK(t,s)}{dt}\Big|_{(t,\,t_0+)} = -1. \tag{B.72}$$

Since d²K(t, s)/dt² = 0 for s < t and for s > t, it follows that

$$\frac{d^2x}{dt^2} = \int_0^{t_0-}\frac{d^2K(t,s)}{dt^2}\,u(s)\,ds + \int_{t_0+}^{1}\frac{d^2K(t,s)}{dt^2}\,u(s)\,ds + \frac{dK}{dt}\Big|_{t,\,t_0-}u(t_0-) - \frac{dK}{dt}\Big|_{t,\,t_0+}u(t_0+) = -u(t_0), \tag{B.73}$$

or, in other words, TUu = u. If the boundary conditions are x(0) = a, x(1) = b, then the solution is x(t) = (Uu)(t) + (1 − t)a + tb = ∫_0^1 K(t, s)u(s)ds + (1 − t)a + tb.

Example/Exercise B.74 ([S-5]). Let X = L2[0, 1] and let D = {x : x, x′, x″ ∈ X, x, x′ absolutely continuous}. Consider the operators:

(1) T1 = D_t² = d²/dt², D(T1) = D;
(2) T2 = D_t², D(T2) = D ∩ {x : x(0) = 0, x(1) = 0};
(3) T3 = D_t², D(T3) = D ∩ {x : x′(0) = 0, x′(1) = 0};
(4) T4 = D_t², D(T4) = D(T2) ∩ D(T3);
(5) T5 = D_t², D(T5) = D ∩ {x : x(0) = 0, x′(0) = 0};
(6) T6 = D_t², D(T6) = D ∩ {x : x(0) = 0, x′(0) = 0, x″(0) = 0}.

The operators T1, T2, T3, T4, T5 are closed and T5 is the closure of T6. The operators T2 and T3 are self-adjoint (i.e., T2∗ = T2, T3∗ = T3). The operator T4 is symmetric. Determine the adjoints and domains as in Example B.63.

Example B.75. Consider the unit disc C = {(x, y) : x² + y² < 1} (or {z : |z| < 1} in the complex plane) and let γ be the circle x² + y² = 1. Let

$$\nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$$

and u ∈ C(γ). Consider the Dirichlet problem: ∇²ψ = 0 on C and ψ(γ) = u(γ). Using polar coordinates (r, θ), C = {re^{iθ} : 0 ≤ r < 1, θ ∈ [−π, π]} and γ = {re^{iθ} : r = 1}. Suppose

$$K(r,\theta) = \frac{1 - r^2}{1 - 2r\cos\theta + r^2} = \operatorname{Re}\Big(\frac{1 + re^{i\theta}}{1 - re^{i\theta}}\Big), \qquad 0 \le r < 1,\ -\pi \le \theta \le \pi,$$

and let T be given by

$$(Tf)(re^{i\theta}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} K(r, \theta - s)\,f(e^{is})\,ds.$$

T maps C(γ) into harmonic functions on C.

Example B.76. Consider the half-plane H : y > 0, i.e., {(x, y) : y > 0}, and let γ be its boundary, so γ is simply the x-axis. Let ∇² = ∂²/∂x² + ∂²/∂y² and let f ∈ Lp(R) (= Lp(γ)). Consider the problem: determine ψ such that ∇²ψ = 0 on H and ψ(γ) = f(γ). Let (Tf)(x, y) be given by

$$(Tf)(x,y) = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{y}{(x-s)^2 + y^2}\,f(s)\,ds;$$

then (Tf)(x, y) gives a solution. The map is actually into a Hardy space H^p [S-6] and ||ψ||_{H^p} = ||f||_{L^p}.

Example B.77. Let D be a domain in R^n with ∂D “smooth” and let L be a (symmetric) differential operator given by

$$L\psi = -\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{\partial}{\partial x_i}\Big\{a_{ij}\frac{\partial\psi}{\partial x_j}\Big\} + b\psi$$

with a_ij = a_ji, i, j = 1, ..., n. Let ν be the external normal to ∂D. Green’s formula [S-6, S-4]

$$\int_D (\varphi L\psi - \psi L\varphi)\,dx = \int_{\partial D}(\psi N\varphi - \varphi N\psi)\,d\sigma$$

(σ for ∂D), where Nψ = Σ_{i=1}^n Σ_{j=1}^n a_ij (∂ψ/∂x_j) cos(ν, x_i), connects integrals over D with integrals on the boundary ∂D and leads to the notion of a Green’s function for the operator L, which we denote by K_L(x, y). If u is a given function on ∂D, then we seek a solution ψ of Lψ = 0 such that ψ(∂D) = u. The “solution” is determined by the (unique) solution v of the Fredholm equation

$$\int_{\partial D} v(\sigma)\,\frac{\partial K_L}{\partial\nu}(x,\sigma)\,d\sigma - \frac{v(\sigma)}{2} = u(\sigma),$$

since

$$\psi(x) = \int_{\partial D} v(\sigma)\,\frac{\partial K_L}{\partial\nu}(x,\sigma)\,d\sigma.$$

Properties of the Fredholm operator depend on the domain, particularly the smoothness of the boundary and the space to which the boundary data belong. We generally deal with Fredholm equations of the following forms: (1) Tψ = u; (2) Tψ = λψ, λ ∈ C; (3) Tψ = λψ + u, λ ∈ C, where T is a Fredholm operator. In case (2), λ is called an eigenvalue if a solution ψ ≠ 0 exists. Equations of the form (1) are called Fredholm equations of the first kind and equations of the form (3) are called Fredholm equations of the second kind.

Example B.78. Consider E = [0, 1] × [0, 1] and let K(s, t) be continuous on E − {s − t = 0} and bounded on E, i.e., |K(s, t)| ≤ M on E. Consider the linear map T : L1[0, 1] → C([0, 1]) given by

$$(Tu)(s) = \int_0^1 K(s,t)\,u(t)\,dt$$

and an equation

$$(I - T)u = f, \qquad f \in L_1[0,1]. \tag{B.79}$$

One approach is to use successive approximation. Let u_0 = 0, u_1 = f + Tu_0 (= f), u_2 = f + Tu_1, ..., u_n = f + Tu_{n−1} = Σ_{j=0}^{n−1} T^j f, .... Note C([0, 1]) ⊂ L1[0, 1]. The series Σ_{j=0}^∞ T^j f is called the Neumann series [K-2]. If the series is uniformly convergent, then the sum u = Σ_{j=0}^∞ T^j f solves (B.79), as

$$Tu = \sum_{j=0}^{\infty} T^{j+1} f = u - f.$$

If M < 1, then the series is uniformly convergent.

Example B.80. Let E = [0, 1] × [0, 1] and let K(s, t) be a bounded Volterra kernel, so that K(s, t) = 0 for 0 ≤ s < t ≤ 1, and let M = max |K(s, t)| on E. Again let T : L1[0, 1] → C([0, 1]) be given by

$$(Tu)(s) = \int_0^s K(s,t)\,u(t)\,dt.$$

Then |Tu(s)| ≤ M · ||u||_1 s and |T^n u(s)| ≤ M · ||u||_1 s^{n−1}/(n − 1)!, which implies

$$\Big\|\sum_{j=1}^{\infty} T^j u\Big\| \le \sum_{j=1}^{\infty}\frac{M\cdot\|u\|_1}{(j-1)!} < \infty,$$

and successive approximation solves (B.79).

Proposition B.81. Let T ∈ L(X, X). If ||T|| < 1, then the series I + T + T² + ··· = Σ_{j=0}^∞ T^j is (norm) convergent and I − T is invertible with (I − T)⁻¹ = Σ_{j=0}^∞ T^j.

Proof. Note that ||T(Tx)|| ≤ ||T|| ||Tx|| ≤ ||T||² ||x|| and, by induction, ||T^n|| ≤ ||T||^n. It follows that

$$\|I + T + \cdots + T^n\| \le 1 + \sum_{j=1}^{n}\|T\|^j \le 1 + \sum_{j=1}^{\infty}\|T\|^j < \infty$$

for all n, as ||T|| < 1. Thus the series converges and

$$(I - T)\sum_{j=0}^{\infty} T^j = \sum_{j=0}^{\infty} T^j - \sum_{j=1}^{\infty} T^j = I.$$
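The successive approximation of Example B.78 and the bound implied by Proposition B.81 can be illustrated in finite dimensions. The sketch below is an illustration only: it assumes X = R^5 with the Euclidean norm and a randomly generated matrix rescaled to operator norm 0.5; the function name `neumann_solve` and the tolerance are choices made here, not taken from the text.

```python
import numpy as np

def neumann_solve(T, f, tol=1e-12, max_terms=10_000):
    """Successive approximation u_n = f + T u_{n-1}, starting from u_0 = 0."""
    u = np.zeros_like(f)
    for _ in range(max_terms):
        u_next = f + T @ u
        if np.linalg.norm(u_next - u) <= tol:
            return u_next
        u = u_next
    raise RuntimeError("Neumann series did not converge within max_terms")

rng = np.random.default_rng(0)
T = rng.standard_normal((5, 5))
T *= 0.5 / np.linalg.norm(T, 2)      # rescale so that ||T|| = 0.5 < 1
f = rng.standard_normal(5)

u = neumann_solve(T, f)

# u should solve (I - T) u = f, and ||(I - T)^{-1}|| should not exceed 1 / (1 - ||T||).
residual = float(np.linalg.norm((np.eye(5) - T) @ u - f))
inverse_norm = float(np.linalg.norm(np.linalg.inv(np.eye(5) - T), 2))
bound = 1.0 / (1.0 - 0.5)
```

The iteration contracts the error by a factor of at most ||T|| per step, so the number of terms needed grows as ||T|| approaches 1.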

Corollary B.82. If, under the hypothesis of Proposition B.57, the estimate M of (B.58) is less than 1, then I − T is invertible and has bounded inverse with

$$\|(I - T)^{-1}\| \le 1 + \sum_{j=1}^{\infty} M^j = \frac{1}{1 - M}.$$

Definition B.83. An operator T in L(X, Y) has finite rank if dim Im T < ∞ (and hence T is compact).

We observe that if T_n → T with each T_n of finite rank, then T is compact. In fact, if T_n → T in L(X, Y) (Y Banach) with T_n compact, then T is compact.

Example B.84. Let S = [0, 1] and let K(s, t) satisfy ∫_0^1 ∫_0^1 |K(s, t)|² ds dt < ∞. Then there are

$$K_n(s,t) = \sum_{j=1}^{n} \alpha_j(s)\,\beta_j(t)$$

(clearly of finite rank) such that K_n → K and hence T_n given by T_n f = ∫_0^1 K_n(s, t)f(t)dt converges to T given by Tf = ∫_0^1 K(s, t)f(t)dt.

Example B.85. Suppose that T is of finite rank with dim Im T = n and let y ∈ Im T = E_n (Euclidean n-space). Then Tx = Σa_jϕ_j where {ϕ_j} is a basis of Im T and y = Σb_jϕ_j. So to solve Tx = y we need a_j = b_j, j = 1, ..., n, a finite number of equations. Suppose now that, as in Example B.84, T is defined by a kernel K(s, t) = Σ_{j=1}^n u_j(s)v_j(t) where the u_j(·), v_j(·) are linearly independent. Consider the equation x − Tx = y or (I − T)x = y. Then (Tx)(s) = Σ_{i=1}^n ⟨x, v_i⟩u_i(s) and every solution (if any) is of the form x = y + Σ_{j=1}^n ξ_j u_j. Then

$$y + \sum_{j=1}^{n} \xi_j u_j = y + \sum_{j=1}^{n}\Big\langle y + \sum_{i=1}^{n}\xi_i v_i,\ u_j\Big\rangle u_j,$$

and it follows that ξ_j − Σ_{i=1}^n ⟨v_i, u_j⟩ξ_i = ⟨y, u_j⟩ for j = 1, ..., n. In other words, if A = (a_ij) where a_ij = ⟨v_i, u_j⟩, then there is a solution if and only if det(I − A) ≠ 0.

Proposition B.86. Suppose that T ∈ L(X, X) is of finite rank with dim Im T = n. Let ϕ_1, ..., ϕ_n be a basis of Im T and let Tϕ_i = Σ_{j=1}^n α_ij ϕ_j, i = 1, ..., n. If A = (α_ij), then x − Tx = y (i.e., (I − T)x = y) has a solution if and only if det(I_n − A) ≠ 0.

Proof. Any solution of x − Tx = y is of the form x = Tx + y = Σ_{i=1}^n ξ_iϕ_i + y since Tx ∈ span {ϕ_1, ..., ϕ_n}. As in Example B.85, we have

$$y + \sum_{i=1}^{n}\xi_i\varphi_i = y + Ty + \sum_{i=1}^{n}\xi_i T\varphi_i = y + Ty + \sum_{i=1}^{n}\xi_i\sum_{j=1}^{n}\alpha_{ij}\varphi_j$$

and it follows that there is a solution if and only if ξ_i − Σ_{j=1}^n α_ji ξ_j = α_i, where Ty = Σ_{i=1}^n α_iϕ_i and i = 1, ..., n. In other words, there is a solution if and only if det(I_n − A) does not vanish. □

Corollary B.87. Let T ∈ L(X, X). Suppose that T = T1 + T2 where ||T1|| < 1 and T2 has finite rank. Set K = T2(I − T1)⁻¹ (note that (I − T1)⁻¹ exists by Corollary B.82) and note that K has finite rank. If det(I − K) ≠ 0 (as in the proposition), then (I − T)x = y has a solution.

Proof. x − Tx = y is equivalent to (I − T1)x − T2x = y. Let x = (I − T1)⁻¹x_1, so that the equation becomes x_1 − Kx_1 = y, which has a solution. □

Corollary B.88. If T is a compact operator defined by a kernel K(s, t) as in (say) Proposition B.57, then T = T1 + T2 where ||T1|| < 1 and T2 has finite rank.

Proof. K(s, t) is a limit of degenerate kernels

$$K_n(s,t) = \sum_{i=1}^{n}\alpha_i(s)\,\beta_i(t).$$

□
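The reduction of (I − T)x = y to a finite linear system for a degenerate kernel, as in Example B.85, can be sketched numerically. The example below is an assumption-laden illustration: it works in real L2[0, 1] with trapezoidal quadrature, picks the hypothetical kernel K(s, t) = t + s t² (so u_1 = 1, u_2 = s, v_1 = t, v_2 = t²), and starts directly from (Tx)(s) = Σ ⟨x, v_j⟩u_j(s), so that the unknowns are ξ_j = ⟨x, v_j⟩ and the matrix entries are G[j, i] = ⟨u_i, v_j⟩. That index convention is a choice made for this sketch.

```python
import numpy as np

n_grid = 2001
s = np.linspace(0.0, 1.0, n_grid)
h = s[1] - s[0]
w = np.full(n_grid, h)
w[0] = h / 2.0
w[-1] = h / 2.0

def inner(f, g):
    """Trapezoidal approximation of the real L2[0, 1] inner product."""
    return float(np.sum(w * f * g))

U = np.stack([np.ones_like(s), s])   # u_1(s) = 1, u_2(s) = s
V = np.stack([s, s**2])              # v_1(t) = t, v_2(t) = t^2
y = np.cos(np.pi * s)                # arbitrary right-hand side

# With xi_j = <x, v_j> and x = y + sum_i xi_i u_i:
#   xi_j - sum_i <u_i, v_j> xi_i = <y, v_j>,  j = 1, 2.
G = np.array([[inner(U[i], V[j]) for i in range(2)] for j in range(2)])
b = np.array([inner(y, V[j]) for j in range(2)])
determinant = float(np.linalg.det(np.eye(2) - G))

xi = np.linalg.solve(np.eye(2) - G, b)
x = y + xi @ U

# Apply T with the same quadrature and measure the defect of x - T x = y.
Tx = sum(inner(x, V[j]) * U[j] for j in range(2))
residual = float(np.max(np.abs(x - Tx - y)))
```

For this kernel the exact Gram entries are 1/2, 1/3, 1/3, 1/4, so det(I − G) = 19/72, which is nonzero and the system is uniquely solvable.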


Now, let us recall that if (S, B, µ) is a measure space and K : S × S → C or R is a kernel such that

$$\int_S\int_S |K(t,s)|^2\,\mu(dt)\,\mu(ds) < \infty,$$

then T is a compact operator from L2 → L2 and is called a Hilbert–Schmidt operator. The kernel is symmetric if K(t, s) = K(s, t) for all s, t. Such operators have some special properties as we shall see in the sequel.

Example B.89. Let S = (−∞, ∞) and let K(s, t) be given by

$$K(s,t) = \frac{e^{-|s-t|}}{2}; \tag{B.90}$$

then K(s, t) = K(t, s). But

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\Big[\frac{e^{-|s-t|}}{2}\Big]^2 ds\,dt = \infty,$$

so K is not a Hilbert–Schmidt kernel. However, let

$$(Tf)(s) = \int_{-\infty}^{\infty}\frac{e^{-|s-t|}}{2}\,f(t)\,dt$$

for f ∈ L2(−∞, ∞). Then

$$|Tf(s)|^2 = \frac{1}{4}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-|x|}e^{-|y|}f(s+x)f(s+y)\,dx\,dy$$

and

$$\|Tf(\cdot)\|^2 \le \frac{1}{4}\|f\|^2\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-|x|}e^{-|y|}\,dx\,dy = \|f\|^2.$$

Hence T is bounded with ||T|| ≤ 1. In fact ||T|| = 1. The operator T is derived from the differential operator (−D_t² + 1) on (−∞, ∞).

Suppose now that X, Y are Banach spaces and that Y is reflexive. Let Q ∈ (X × Y)∗. Q is called a quadratic functional. Associated with Q is an operator T_Q : X → Y∗ given by

$$(T_Q x)(y) = Q(x,y). \tag{B.91}$$

T_Q is an element of L(X, Y∗) and ||T_Q|| = ||Q|| where

$$\|Q\| = \sup_{x\ne 0}\,\sup_{y\ne 0}\frac{|Q(x,y)|}{\|x\|\,\|y\|}. \tag{B.92}$$

From earlier results, we have:

Proposition B.93. The following are equivalent: (1) for all f ∈ Y∗, there is a unique x_f ∈ X such that Q(x_f, y) = f(y) for all y ∈ Y (i.e., T_Q x_f = f); (2) there is a µ > 0 such that

$$\inf_{x\ne 0}\,\sup_{y\ne 0}\frac{|Q(x,y)|}{\|x\|\,\|y\|} \ge \mu$$

and, if Q(x, y) = 0 for all x, then y = 0.

Problem B.94. Given f ∈ Y∗, find an x ∈ X such that Q(x, y) = f(y) for all y ∈ Y, i.e., T_Q x = f.

Definition B.95. Problem B.94 is well-posed (Hadamard) if, given f, there is a unique solution x_f and there is a λ > 0 such that ||x_f|| ≤ λ||f||.

Theorem B.96 (Generalized Lax–Milgram Theorem). Problem B.94 is well-posed if and only if (i) there is a µ > 0 such that

$$\inf_{x\ne 0}\,\sup_{y\ne 0}\frac{|Q(x,y)|}{\|x\|\,\|y\|} \ge \mu;$$

(ii) for any y, if Q(x, y) = 0 for all x, then y = 0.

Corollary B.97. Under the assumptions of the theorem, ||x_f|| ≤ λ||f|| where λ = 1/µ.

Proof (of Corollary).

$$\mu\|x_f\| \le \sup_{y\ne 0}\frac{|Q(x_f,y)|}{\|y\|} = \sup_{y\ne 0}\frac{|f(y)|}{\|y\|} = \|f\|.$$

□
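In finite dimensions the inf-sup condition of Theorem B.96 and the estimate of Corollary B.97 become statements about singular values. The sketch below assumes X = Y = R³ with Euclidean norms and Q(x, y) = y · (Bx) for a hypothetical invertible matrix B; under these assumptions sup over y of |Q(x, y)|/||y|| equals ||Bx||, so the inf-sup constant µ is the smallest singular value of B.

```python
import numpy as np

# Hypothetical matrix representing Q(x, y) = y . (B x); it is circulant with det 9.
B = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [1.0, 0.0, 2.0]])

# inf_x sup_y |Q(x, y)| / (|x| |y|) = smallest singular value of B.
mu = float(np.linalg.svd(B, compute_uv=False).min())

# Solve Q(x_f, y) = f . y for all y, i.e. B x_f = f.
f = np.array([1.0, -2.0, 3.0])
x_f = np.linalg.solve(B, f)
norm_x = float(np.linalg.norm(x_f))
a_priori_bound = float(np.linalg.norm(f)) / mu

# Sampled lower bound: |B x| / |x| should never fall below mu.
rng = np.random.default_rng(1)
samples = rng.standard_normal((1000, 3))
ratios = np.linalg.norm(samples @ B.T, axis=1) / np.linalg.norm(samples, axis=1)
smallest_ratio = float(ratios.min())
```

Since B is not symmetric positive definite, the example is not covered by coercivity alone; it is the inf-sup constant that controls the solution.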

The theorem can be deduced from the complex of ideas around Theorem B.35. Observe also that if Q is a coercive element of (X × X)∗ , i.e., Q(x, x) ≥ α||x||2 for some α > 0, then the conditions of Theorem B.96 are satisfied. Let X be a Banach space. We have: Definition B.98. A linear map P : X → X is a projection if P is continuous and P 2 = P . If P is a projection, then so is I − P and every element x of X can be expressed uniquely as x = x1 + x2 where P x1 = x1 , P x2 = 0. In fact, x = P x+(I −P )x. Moreover, P X is a closed subspace of X (if yn = P xn → y, then P yn = P 2 xn = P xn → P y and so y = P y ∈ P X). On the other hand, if M1 and M2 are closed subspaces with X = M1 ⊕ M2 , then the map P1 : X → X given by P1 x = m1 where x = m1 + m2 , mi ∈ Mi is a

projection. Thus projections correspond to direct sum decompositions of X. We say that projections P1, P2 commute if P1P2 = P2P1.

Proposition B.99. Let P be a projection. Then the adjoint P∗ of P is a projection and P∗X∗ = [(I − P)X]⊥, (I − P∗)X∗ = (PX)⊥.

Proof. Clearly P∗ is continuous and P∗² = P∗, so P∗ is a projection. Let x∗ ∈ X∗ with P∗x∗ = x∗. Then x∗(I − P)x = x∗x − x∗Px = x∗x − P∗x∗x = 0, so x∗ ∈ [(I − P)X]⊥. On the other hand, if x∗ ∈ [(I − P)X]⊥, then x∗[(I − P)x] = 0 for all x, so that x∗x − x∗Px = (x∗ − P∗x∗)(x) = 0 for all x and x∗ = P∗x∗. That (I − P∗)(X∗) = (PX)⊥ follows. □

Proposition B.100. Let P1, P2 be commuting projections and let M1 = P1X, M2 = P2X, N1 = (I − P1)X, N2 = (I − P2)X. Then:

(1) P1P2 = P2 if and only if M2 ⊆ M1;
(2) if P = P1P2, then P is a projection with PX = M1 ∩ M2 and (I − P)X = sp (N1 ∪ N2);
(3) if P = P1 + P2 − P1P2, then P is a projection and PX = sp (M1 ∪ M2) and (I − P)X = N1 ∩ N2; and
(4) if P = P1 − P2, then P is a projection if and only if P1P2 = P2.

Proof. (1) If P1P2 = P2, then P1(P2X) = P2X = M2 ⊆ M1 = P1X. If P2X = M2 ⊆ M1 = P1X, then P1(P2x) = P2x, so P1P2 = P2. (2) (P1P2)(P1P2) = P1²P2² (commuting) = P1P2, so P1P2 is a projection. If y = P1P2x, then y = P2P1x and y ∈ M1 ∩ M2, and similarly for (I − P1P2)X = sp (N1 ∪ N2). (3) (P1 + P2 − P1P2)² = P1² + P1P2 − P1²P2 + P2P1 + P2² − P1P2² − P1²P2 − P1P2² + (P1P2)² = P1 + P2 − P1P2 (using commuting and P1² = P1, P2² = P2). The rest is an argument similar to (2). (4) (P1 − P2)(P1 − P2) = P1² + P2² − P1P2 − P2P1 = P1 + P2 − P1P2 − P2P1 will be P1 − P2 if and only if P2 = P1P2. □

We can order projections as follows: P1 ≺ P2 if P1P2 = P2P1 = P1 or, equivalently, if P1X ⊆ P2X and (I − P1)X ⊇ (I − P2)X. This defines a partial order on projections.

Proposition B.101. ≺ is a partial order. In other words, (a) P1 ≺ P1; (b) if P1 ≺ P2 and P2 ≺ P1, then P1 = P2; and (c) if P1 ≺ P2, P2 ≺ P3, then P1 ≺ P3.

Proof. Obvious. □
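Proposition B.100 and the order ≺ can be checked on small matrices. The sketch below is illustrative only: it assumes X = R⁴ and builds two commuting, non-orthogonal projections by conjugating diagonal 0/1 matrices with a fixed invertible matrix S; the helper names `projection`, `is_projection`, and `precedes` are introduced here and are not from the text.

```python
import numpy as np

# A fixed invertible change of basis (upper bidiagonal, determinant 1).
S = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0, 1.0]])
S_inv = np.linalg.inv(S)

def projection(mask):
    """Projection that is diagonal (entries 0 or 1) in the basis given by the columns of S."""
    return S @ np.diag(mask) @ S_inv

def is_projection(P):
    return bool(np.allclose(P @ P, P))

def precedes(A, B):
    """A precedes B in the order of the text: A B = B A = A."""
    return bool(np.allclose(A @ B, A) and np.allclose(B @ A, A))

P1 = projection([1.0, 1.0, 0.0, 0.0])
P2 = projection([0.0, 1.0, 1.0, 0.0])

product = P1 @ P2                 # projects onto the intersection of the ranges
combined = P1 + P2 - P1 @ P2      # projects onto the span of the two ranges

commute = bool(np.allclose(P1 @ P2, P2 @ P1))
rank_product = int(np.linalg.matrix_rank(product))
rank_combined = int(np.linalg.matrix_rank(combined))
```

Both projections are simultaneously diagonal in the basis S, which is why they commute; projections without a common diagonalizing basis would generally fail the `commute` check.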

If P1 and P2 commute, then P1 + P2 − P1P2 is a least upper bound for P1, P2 with respect to ≺ and we write P1 ∨ P2 (sometimes called the “join”), and P1P2 is a greatest lower bound of P1, P2 and we write P1 ∧ P2 = P1P2.

Definition B.102. A partially ordered set L with order < is a Boolean algebra if:

(i) every pair of elements λ, µ in L has a greatest lower bound λ ∧ µ and a least upper bound λ ∨ µ;
(ii) L has a unit 1 such that λ < 1 for all λ ∈ L;
(iii) L has a zero 0 such that 0 < λ for all λ ∈ L;
(iv) L is distributive, i.e., λ ∧ (µ ∨ ν) = (λ ∧ µ) ∨ (λ ∧ ν) for all λ, µ, ν ∈ L;
(v) L is complemented, i.e., if λ ∈ L, then there is a λ′ (the complement of λ) in L such that λ ∨ λ′ = 1 and λ ∧ λ′ = 0.

Observing that I and 0 (the linear operator 0x = 0) are projections which commute with any projection, we can see that there will be Boolean algebras of commuting projections.

Example B.103. Let Σ be a set. Define a partial order < on subsets of Σ by A < B if A ⊆ B. Then the subsets of Σ, ordered by <, form a Boolean algebra with unit 1 = Σ, zero 0 = ∅, lub A ∨ B = A ∪ B, glb A ∧ B = A ∩ B, and complement A′ = Σ − A. Let Σ be a topological space and let Σ_s = {A ⊂ Σ : A is both open and closed}. If A, B ∈ Σ_s, set A ∨ B = A ∪ B and A ∧ B = A ∩ B. Then Σ_s is a Boolean algebra with unit Σ, zero ∅, and complement A′ = Σ − A. Distributivity is simply A ∧ (B ∨ C) = A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) = (A ∧ B) ∨ (A ∧ C).

Suppose now that X = H is a (complex) Hilbert space with inner product ⟨ , ⟩. We identify H∗ with H via the Riesz representation theorem. If M is a closed subspace of H, then M⊥ = {h ∈ H : ⟨h, M⟩ = 0} is the orthogonal complement of M and H = M ⊕ M⊥. We say that elements h1, h2 in H are orthogonal if ⟨h1, h2⟩ = 0 and we write h1 ⊥ h2.

Proposition B.104. Let M be a closed subspace of the Hilbert space H and let P : H → H be given by Ph = m where h = m + m′ is the unique representation of h in M ⊕ M⊥. Then

(1) P is a projection;
(2) Im P = M and Ker P = M⊥;
(3) ||P|| = 1; and
(4) P = P∗ is self-adjoint.

P is called the orthogonal projection onto M.

Proof. (1) P is clearly linear and idempotent. If h = m + m′, then Ph = m and ||Ph||² = ||m||² ≤ ||m||² + ||m′||² = ||h||² (as m ⊥ m′), and P is continuous. (2) Obvious. (3) Since ||Ph|| ≤ ||h||, ||P|| ≤ 1, and if h = m ∈ M, then ||Ph|| = ||h|| so ||P|| = 1. (4) Let h1 = m1 + m1′, h2 = m2 + m2′. Then ⟨Ph1, h2⟩ = ⟨m1, h2⟩ = ⟨m1, m2⟩ and ⟨h1, Ph2⟩ = ⟨h1, m2⟩ = ⟨m1, m2⟩, i.e., P = P∗. □

Proposition B.105. If P : H → H is idempotent and self-adjoint, then P is an orthogonal projection onto M = PH.

Proof. M is closed, for if m̃ is in the closure of M, then there are m_n → m̃, m_n ∈ M. Since P = P∗ and P∗ is closed (by Proposition B.14), P is continuous and

P m_n = m_n → P m̃ = m̃. Let h ∈ H; then h = m + m_1 where m = Ph, m_1 = (I − P)h. Since P² = P, Pm = m and m ∈ M. We claim (I − P)h ∈ M⊥. If m ∈ M, then ⟨m, (I − P)h⟩ = ⟨(I − P)m, h⟩ = ⟨0, h⟩ = 0, so (I − P)h is in M⊥. □

Proposition B.106. Let T ∈ L(H, H) and let M be a closed subspace of H with P the orthogonal projection on M. Then TM ⊂ M (i.e., M is invariant under T) if and only if TP = PTP.

Proof. Suppose TP = PTP and m ∈ M. Then Tm = TPm = P(TPm) = PTPm ∈ M. On the other hand, if TM ⊂ M and h ∈ H with h = m + m′, then Ph = m and TPh = Tm = m̃ ∈ M so that P(TPh) = TPh for all h ∈ H. □

Corollary B.107. TM ⊂ M and TM⊥ ⊂ M⊥ if and only if T commutes with P.

Now let P1, P2 be orthogonal projections on closed subspaces M1, M2, respectively. If P1, P2 commute, then P = P1P2 is an orthogonal projection onto M1 ∩ M2, by Proposition B.100. On the other hand, if P = P1P2 is an orthogonal projection, then P = P1P2 = P∗ = (P1P2)∗ = P2∗P1∗ = P2P1, i.e., P1, P2 commute. We call orthogonal projections P1, P2 orthogonal if P1P2 = P2P1 = 0 and we write P1 ⊥ P2. We observe that P1 ⊥ P2 if and only if M1 ⊥ M2.

Definition B.108. Let T : H → H be an element of L(H, H). T is positive if ⟨Th, h⟩ ≥ 0 for all h (this means also that ⟨Th, h⟩ is always real; see footnote B.1). T is positive definite if ⟨Th, h⟩ > 0 for h ≠ 0. T is coercive if ⟨Th, Th⟩ ≥ λ||h||² for λ > 0.

Let Σ_A = {T ∈ L(H, H) : T is self-adjoint}. Define ≺ on Σ_A by T1 ≺ T2 if and only if T2 − T1 is positive. Observe that ≺ defines a partial order on Σ_A and that if P is an orthogonal projection, then P ∈ Σ_A. Also 0 and I are elements of Σ_A. However, Σ_A is not a Boolean algebra (exercise: show this).

Proposition B.109. Let P1, P2 be projections on M1, M2, respectively. Then P1 ≺ P2 in Σ_A if and only if P1P2 = P2P1 = P1.

Proof. If P1, P2 commute with P1P2 = P2P1 = P1, then by Proposition B.100, P2 − P1 is an orthogonal projection and ⟨(P2 − P1)x, x⟩ = ||(P2 − P1)x||² ≥ 0. If P2 − P1 is positive, then ||P2x||² = ⟨P2x, x⟩ ≥ ⟨P1x, x⟩ = ||P1x||². If x ∈ M1, then ||x|| = ||P1x|| ≤ ||P2x|| ≤ ||x|| so that x = P2x ∈ M2, i.e., M1 ⊂ M2. If h = m + m′, m ∈ M1, m′ ∈ M1⊥, then P1h = m ∈ M2, so P2P1h = P2m = m = P1h, i.e., P2P1 = P1 and hence P1 = P1∗ = (P2P1)∗ = P1∗P2∗ = P1P2. □

Footnote B.1. In fact, such a T must be self-adjoint as well [B-1], and ⟨Th, h⟩ = ⟨h, T∗h⟩ means that ⟨h, T∗h⟩ is real.

Since commutativity of operators plays an important role, we use the symbol ↔ [B-1] to indicate that operators commute, i.e., T1 ↔ T2 means T1T2 = T2T1 (T_i ∈ L(X, X) or T_i ∈ L(H, H)). If T ∈ L(X, X), then T^n, n = 1, 2, ..., is also in L(X, X) and, conventionally, T⁰ = I. Thus, if p(z) = Σ_{j=0}^n α_j z^j is a polynomial with coefficients α_j ∈ C (or R), then p(T) = Σ_{j=0}^n α_j T^j is an element of L(X, X). Note that if S ↔ T, then S ↔ p(T) for any polynomial p(z). If p(z) has real coefficients and T is self-adjoint, then so is p(T) [note also that (iT)∗ = −iT∗ for T ∈ L(H, H)]. If T is any element of L(H, H), then

$$T = \frac{(T + T^*)}{2} + i\,\frac{(T - T^*)}{2i} \tag{B.110}$$

where (T + T∗)/2 and (T − T∗)/2i are self-adjoint. Also note that T∗T and TT∗ are both positive as, for example, ⟨T∗Th, h⟩ = ⟨Th, (T∗)∗h⟩ = ⟨Th, Th⟩ ≥ 0 for all h. Suppose that T is a self-adjoint element of L(H, H); then T² is always positive. We shall eventually show that

$$T = T^+ - T^- \tag{B.111}$$

where T⁺, T⁻ are positive, commute, and commute with T.

Lemma B.112. Let T_n be a bounded monotonically non-decreasing sequence of self-adjoint operators in L(H, H). Then lim_{n→∞} T_n = T exists and T is self-adjoint.

Proof. We may assume that ||T_n|| ≤ 1 and so 0 < T1 < T2 < ··· < I. If n > m, then T_n − T_m ≥ 0 and ||(T_n − T_m)x||⁴ ≤ ⟨(T_n − T_m)x, x⟩⟨(T_n − T_m)²x, (T_n − T_m)x⟩. Since T_n − T_m ≥ 0 and (T_n − T_m) ≤ I, ||T_n − T_m|| ≤ 1 and ||T_n x − T_m x||⁴ ≤ [⟨T_n x, x⟩ − ⟨T_m x, x⟩]||x||². But |⟨T_n x, x⟩| is bounded and monotonically non-decreasing, hence convergent. Thus, T_n x converges to Tx. □

Theorem B.113. Let T ∈ L(H, H) be a positive self-adjoint operator. Then there is a unique positive self-adjoint transformation S such that S² = T (we write S = √T) and S commutes with all transformations which commute with T.

Proof. We may assume T ≤ I. Set T = I − V, 0 < V < I, and S = I − W. Then, for S² = T, we need to have

$$W = \frac{(V + W^2)}{2}. \tag{B.114}$$

Set W_{n+1} = (V + W_n²)/2, n = 1, ..., W_1 = V/2. We claim that W_n is a polynomial in V with non-negative real coefficients. This follows by induction

as W_{n+1} − W_n = (W_n² − W_{n−1}²)/2 = (W_n + W_{n−1})(W_n − W_{n−1})/2. Since V > 0, V^n > 0 for all n, so that W_n > 0 and W_n − W_{n−1} > 0. Since ||W_n|| ≤ 1, the lemma applies and W = lim W_n satisfies (B.114) and so S² = T. Moreover, S is a limit of a sequence of polynomials in T and so commutes with all transformations which commute with T. □
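The iteration used in the proof of Theorem B.113 can be run directly on a matrix. The sketch below is a finite-dimensional illustration under stated assumptions: T is a hypothetical symmetric 2 × 2 matrix with eigenvalues strictly between 0 and 1, and the iteration count is fixed rather than adaptive. The function name `sqrt_by_iteration` is introduced here.

```python
import numpy as np

def sqrt_by_iteration(T, iterations=200):
    """Iterate W_{n+1} = (V + W_n^2) / 2 with V = I - T, W_1 = V / 2; return S = I - W."""
    identity = np.eye(T.shape[0])
    V = identity - T
    W = V / 2.0
    for _ in range(iterations):
        W = (V + W @ W) / 2.0
    return identity - W

# Symmetric matrix with eigenvalues (5 +/- sqrt(5)) / 8, both in (0, 1).
T = np.array([[2.0, 1.0],
              [1.0, 3.0]]) / 4.0

S = sqrt_by_iteration(T)
square_defect = float(np.max(np.abs(S @ S - T)))
symmetry_defect = float(np.max(np.abs(S - S.T)))
smallest_eigenvalue = float(np.linalg.eigvalsh((S + S.T) / 2.0).min())
```

On each eigenvalue t of T the iteration converges to 1 − √t at a linear rate of about 1 − √t, so convergence slows markedly when T has eigenvalues close to 0.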

We leave the proof of uniqueness to the reader [B-1].

Corollary B.115. Suppose that T > T1 and that V is a positive self-adjoint operator such that T ↔ V, T1 ↔ V. Then TV > T1V.

Let T be a self-adjoint element of L(H, H) and let S = √(T²). Let

$$T^+ = \frac{1}{2}(S + T), \qquad T^- = \frac{1}{2}(S - T). \tag{B.116}$$

Then T = T⁺ − T⁻ and S, T⁺, T⁻ commute (Theorem B.113) and commute with all transformations which commute with T. Note also that

$$T^+T^- = \frac{1}{4}(S + T)(S - T) = \frac{1}{4}(S^2 - T^2) = 0$$

as S² = T².

Proposition B.117. T⁺, T⁻ are positive self-adjoint elements of L(H, H).

Proof. Consider Ker T⁺ and let E₊ be the orthogonal projection onto Ker T⁺. Then T⁺E₊ = 0 and E₊T⁺ = (T⁺E₊)∗ = 0. Since T⁺T⁻ = 0, E₊T⁻ = T⁻ and T⁻E₊ = (E₊T⁻)∗ = (T⁻)∗ = T⁻ and (I − E₊)T = T(I − E₊) = T⁺. In other words, E₊ ↔ T⁺, E₊ ↔ T⁻, and E₊ ↔ (T⁺ + T⁻). But E₊ > 0, I − E₊ > 0, and S > 0 together imply T⁻ = E₊(T⁺ + T⁻) = E₊S > 0 and T⁺ = S − T⁻ = (I − E₊)S > 0. Thus, T⁺, T⁻ are positive. □

Let T be self-adjoint and let

$$m(T) = \operatorname{lub}\{m : mI < T\}, \qquad M(T) = \operatorname{glb}\{M : T < MI\}. \tag{B.118}$$
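For a real symmetric matrix the objects in (B.116) and (B.118) can be computed from an eigendecomposition. The sketch below is an illustration under that assumption: S = √(T²) is formed by taking absolute values of the eigenvalues, and m(T), M(T) are read off as the smallest and largest eigenvalues. The example matrix is hypothetical and was chosen to have eigenvalues −3 and 2.

```python
import numpy as np

T = np.array([[1.0, 2.0],
              [2.0, -2.0]])          # symmetric, eigenvalues -3 and 2

eigenvalues, Q = np.linalg.eigh(T)   # ascending eigenvalues, orthonormal eigenvectors
S = Q @ np.diag(np.abs(eigenvalues)) @ Q.T   # S = sqrt(T^2)

T_plus = (S + T) / 2.0
T_minus = (S - T) / 2.0

m_T = float(eigenvalues[0])          # lower bound m(T)
M_T = float(eigenvalues[-1])         # upper bound M(T)

decomposition_defect = float(np.max(np.abs(T_plus - T_minus - T)))
product_defect = float(np.max(np.abs(T_plus @ T_minus)))
min_eig_plus = float(np.linalg.eigvalsh(T_plus).min())
min_eig_minus = float(np.linalg.eigvalsh(T_minus).min())
operator_norm = float(np.linalg.norm(T, 2))
```

In this example T⁺ keeps the eigenvalue 2 and T⁻ keeps the eigenvalue 3 (the absolute value of −3), each acting as zero on the other eigenvector.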

Note that ||T|| = max{|m(T)|, |M(T)|}. Let λ ∈ R and let T_λ = T − λI. Then T_λ is self-adjoint. Observe that T_λ ↔ T_µ for all λ, µ and that, if λ < µ, then T_λ⁺ > T_λ > T_µ and T_λ⁺ > T_µ. If λ < m(T), then T_λ⁺ = T_λ > (m(T) − λ)I and if λ > M(T), T_λ < 0 and so T_λ⁺ = 0. Let K_λ = Ker T_λ⁺ and E_λ be the orthogonal projection on K_λ. If λ < m(T), then K_λ = {0} and if λ ≥ M(T), then K_λ = H. This means that E_λ = 0 for λ < m(T), E_λ = I for λ ≥ M(T), and for λ < µ,

$$\lambda(E_\mu - E_\lambda) < T(E_\mu - E_\lambda) < \mu(E_\mu - E_\lambda). \tag{B.119}$$

If we consider the mapping E : [m(T), M(T)], then E is a projection-valued, finitely additive, set function such that if M1, M2 ⊂ [m(T), M(T)] and M1 ⊂ M2, then E(M1) < E(M2) and E(M2 − M1) = E(M2) − E(M1). E might well be referred to as a “projection valued measure.” More on this in Appendix C. Observe now only that E(λ) is a monotone non-decreasing right continuous function.

Now let us examine L(X, X) from a slightly different point of view.

Definition B.120. A set A is a Banach algebra if it is a Banach space with a multiplication x ◦ y from A × A → A such that

(1) z ◦ (x + y) = z ◦ x + z ◦ y;
(2) (x + y) ◦ z = x ◦ z + y ◦ z;
(3) α(x ◦ y) = (αx) ◦ y = x ◦ (αy), α ∈ C or R; and
(4) ||x ◦ y|| ≤ ||x|| · ||y||.

An element e of A is an identity if e ◦ x = x ◦ e = x. If A has an identity, then A is a Banach algebra with identity.

Example B.121. (a) Let A = L(X, X) with S ◦ T = ST. Then A is a Banach algebra with identity I.

(b) Let A = C([0, 1]) and set (f ◦ g)(t) = f(t)g(t) and ||f|| = sup_t |f(t)|. Then A is a Banach algebra with identity 1 (i.e., f(t) = 1 for all t ∈ [0, 1]).

(c) Let A = {f : f is analytic on |z| < 1 and continuous on |z| ≤ 1}. Set (f ◦ g)(z) = f(z)g(z) and ||f|| = sup_{|z|≤1} |f(z)| = sup_{|z|=1} |f(z)| (by the maximum modulus theorem). Then A is a Banach algebra with identity 1 (i.e., f(z) = 1 for all z).

(d) Let P = {p ∈ C([0, 1]) : p is a polynomial}. Then P is not a Banach algebra as P is not complete (i.e., not a Banach space). If T ∈ L(X, X) and P(T) = {p(T) : p ∈ P}, then P(T) has the same properties as P.

(e) Let A = L1(R) and define a multiplication ∗, convolution, by

$$(f * g)(t) = \int_{-\infty}^{\infty} f(t - s)\,g(s)\,ds. \tag{B.122}$$

Then A is a Banach algebra [D-9] without an identity. A is commutative, i.e., f ∗ g = g ∗ f, whereas L(X, X) is not. This algebra, and generalizations, is intimately connected with the Fourier transform [L-6].

(f) Let A = L1(R, L(X, X)) and define a convolution ∗ on A by

$$(S * T)(t) = \int_{-\infty}^{\infty} S(t - s)\,T(s)\,ds \tag{B.123}$$

(note S(·) : R → L(X, X) and similarly for T(·)). Then A is a Banach algebra without an identity.

(g) Let X be a set and let Σ be a σ-algebra of subsets of X and let M be a fixed element of Σ. Let Σ_M be the set of all measurable subsets of M and let A = {µ : µ is a finite signed measure on Σ_M}. Then A is a Banach space with ||µ|| = |µ|(M) where |µ| is the total variation of µ on M. Let (µ ∗ ν)(F) = ∫ µ(F − t)dν when M is, say, an interval. Then A becomes a Banach algebra.

Example B.124 ([D-9]). Let H be a Hilbert space and let L ∈ L(H, H). L is a Hilbert–Schmidt operator if, given an orthonormal basis {ϕ_α} of H, then

$$\|L\|_{HS}^2 = \sum_{\alpha}\|L\varphi_\alpha\|^2 < \infty \tag{B.125}$$

(here || · || is the norm in H). || · ||_HS is called the Hilbert–Schmidt norm of L. It is well defined and ||L|| ≤ ||L||_HS and ||L||_HS = ||L∗||_HS. For instance, if ε > 0, then there is a ϕ_1 of norm 1 with ||L||^2 < ||Lϕ_1||^2 + ε. Choose an orthonormal basis {ϕ_α} containing ϕ_1 so that ||L||^2 ≤ ||L||_HS^2 + ε. This can be used to show that A = {L : L a Hilbert–Schmidt operator} is a Banach

2 0. Moreover, if ν(λ) > 0, then there exists x ≠ 0, x ∈ Ker {λI − L}. Let P(z) = Σ a_i z^i be a polynomial so that P(L) = Σ a_i L^i and P(A) = Σ a_i A^i. Let {λ_1, . . . , λ_s} = spec (L), λ_i distinct. Let E_{λ_i} be the projection on N_{λ_i,ν(λ_i)}. Then E_{λ_i}^2 = E_{λ_i} and N_{λ_i,ν(λ_i)} ∩ N_{λ_j,ν(λ_j)} = {0} if λ_i ≠ λ_j, so E_{λ_i}E_{λ_j} = 0. Moreover, I = Σ_{j=1}^{s} E_{λ_j} and X = Σ_{j=1}^{s} E_{λ_j}X.

Another approach. spec (L) = {λ_1, . . . , λ_n} is a closed bounded set of isolated points. Let O be an open neighborhood of spec (L) and A(L) = {f(z) : f analytic on O}. Then f(z) = Σ_{m≤ν(z)−1} p(m)(z)z^m + g(z) and f(L) = P(L) = Σ p(m)(L)L^m. This gives a functional calculus. Let
\[ e_\lambda(z) = \begin{cases} 1 & \text{near } \lambda, \\ 0 & \text{on a neighborhood of } \operatorname{spec} L - \{\lambda\}, \end{cases} \]
then E_λ = e_λ(L). If f ∈ A(L), then
\[ f(L) = \sum_{i=1}^{s} \sum_{j=0}^{\nu(\lambda_i)-1} \frac{(L - \lambda_i I)^j}{j!}\, f^{(j)}(\lambda_i)\, E_{\lambda_i}. \]
If f ∈ A(D), D ⊃ O (O open ⊃ spec (L) and ∂O = Γ a finite number of rectifiable Jordan curves), then
\[ f(L) = \frac{1}{2\pi i} \int_{\Gamma} f(z)(zI - L)^{-1}\, dz. \]

Example C.3. e^{tz} is entire in z and
\[ \frac{d^n e^{tz}}{dt^n} = \frac{d^{n-1}(z e^{tz})}{dt^{n-1}} \quad \text{and} \quad \frac{d}{dt} e^{tA} = A e^{tA} \]

which solves ẋ = Ax.

Example C.4. If ν(λ_i) = 1 for all λ_i ∈ spec (L), then f(L) = Σ f(λ_i)E_{λ_i}. If ⟨Lx, y⟩ = ⟨x, Ly⟩, then ν(λ) = 1. spec (L) = spec (L∗) and f(L∗) = f(L)∗. Also if LL∗ = L∗L (normal), then ν(λ) = 1.

Example C.5 (Spectral Map). Let B be a Borel algebra in C, B(L) = {algebra generated by E_{λ_i}}. Map E : B → B(L) by
\[ E(\sigma) = \begin{cases} 0 & \text{if } \sigma \cap \operatorname{spec}(L) = \emptyset, \\ \sum_{\lambda_i \in \sigma} E_{\lambda_i} & \text{if } \sigma \cap \operatorname{spec}(L) \neq \emptyset. \end{cases} \]

C Spectral Theory


This is a homomorphism, E(spec (L)) = I, E(σ)L = LE(σ). If L_σ = L on E(σ)X, then spec (L_σ) ⊆ σ.

Example C.6. If A is non-singular, then there exists B with e^B = A. Write A in Jordan form, i.e.,
\[ A = \begin{pmatrix} A_0 & & \\ & \ddots & \\ & & A_s \end{pmatrix}, \quad A_0 = \operatorname{diag}\{\lambda_1, \ldots, \lambda_m\}, \ \nu(\lambda_j) = 1, \quad A_j = \lambda_{m+j}\Big( I_j + \frac{N_j}{\lambda_{m+j}} \Big), \ N_j \text{ nilpotent}. \]
Take
\[ B_0 = \operatorname{diag}\{\log \lambda_i\}, \quad B_j = (\log \lambda_{m+j}) I_j + \log\Big[ I_j + \frac{N_j}{\lambda_{m+j}} \Big]. \]
This gives B since: (i) the series
\[ \sum_{k} \frac{(-1)^{k+1}}{k}\, \frac{N_j^k}{(\lambda_{m+j})^k} \]
is analytic as N_j is nilpotent; (ii) log(I_j + N_j/λ_{m+j}) is a polynomial in N_j/λ_{m+j} (as 1 + x = e^{log(1+x)}).

Let us consider an alternate approach (which generalizes smoothly). Let X = C^n.

Definition C.7. A is normal if AA∗ = A∗A. If A is self-adjoint, i.e., A = A∗, then A is normal.

Proposition C.8. If A(W) ⊂ W, then A∗(W⊥) ⊂ W⊥.

Proof. Let w ∈ W, z ∈ W⊥, so that ⟨w, z⟩ = 0. Then 0 = ⟨Aw, z⟩ = ⟨w, A∗z⟩ for all w ∈ W, and so A∗z ∈ W⊥. □

Proposition C.9. If A is normal, then (i) ||Ax|| = ||A∗x|| for all x; and (ii) if Ker (A − λI) ≠ {0}, then Ker (A∗ − λ̄I) ≠ {0}.

Proof. (i) ||Ax||^2 = ⟨Ax, Ax⟩ = ⟨x, A∗Ax⟩ = ⟨x, AA∗x⟩ = ⟨A∗x, A∗x⟩ = ||A∗x||^2. (ii) Since A − λI is also normal, ||(A − λI)x|| = ||(A∗ − λ̄I)x||, so x ≠ 0, x ∈ Ker (A − λI) implies x ∈ Ker (A∗ − λ̄I). □

Theorem C.10. If A is normal, then there exist eigenvectors x_1, . . . , x_n of A which form an orthonormal basis.


Proof. Let x_1 be an eigenvector with ||x_1|| = 1. If dim X = 1, we are done. Use induction on n. W = span [x_1], X = W ⊕ W⊥ and A(W) ⊂ W so that A∗(W⊥) ⊂ W⊥ and, hence, W⊥ is invariant under A∗∗ = A. Let Ã = A restricted to W⊥. Then Ã(W⊥) ⊂ W⊥ and dim W⊥ < dim X give the result. □

If E is a projection, then E is normal if and only if E is self-adjoint if and only if E is an orthogonal projection on Im (E). If X = ⊕_{i=1}^{k} W_i with W_i ⊥ W_j, i ≠ j, and E_i the projection on W_i, then I = Σ_{i=1}^{k} E_i and E_iE_j = E_jE_i = 0 if i ≠ j. If {x_{i,α_i}} is an orthonormal basis of W_i, then
\[ \bigcup_{i=1}^{k} \{x_{i,\alpha_i}\} \]
is an orthonormal basis for X.

Remark C.11. If A is normal, λ_1, λ_2 ∈ spec (A), λ_1 ≠ λ_2, Ax = λ_1x, Ay = λ_2y, then ⟨x, y⟩ = 0.

Proof. By Proposition C.9, A∗y = λ̄_2y. Hence λ_1⟨x, y⟩ = ⟨λ_1x, y⟩ = ⟨Ax, y⟩ = ⟨x, A∗y⟩ = ⟨x, λ̄_2y⟩ = λ_2⟨x, y⟩, so (λ_1 − λ_2)⟨x, y⟩ = 0 and ⟨x, y⟩ = 0 as (λ_1 − λ_2) ≠ 0. □

Theorem C.12 (Spectral Decomposition). Suppose A is normal and λ_1, . . . , λ_k are the distinct elements of spec (A). Then there exist projections E_1, . . . , E_k such that
(a) E_i is an orthogonal projection on Ker (A − λ_iI);
(b) E_i ≠ 0 and E_iE_j = 0 if i ≠ j;
(c) I = Σ_{i=1}^{k} E_i; and
(d) A = Σ_{i=1}^{k} λ_iE_i.

Proof. Let W_i = Ker (A − λ_iI) and {x_{i,α_i}} be an orthonormal basis for W_i. Then X = Σ_{i=1}^{k} W_i. If x ∈ W_i ∩ (W_1 + · · · + W_{i−1} + W_{i+1} + · · · + W_k), then x = Σ_{j≠i} w_j and ⟨x, x⟩ = ⟨x, Σ_{j≠i} w_j⟩ = 0 as ⟨x, w_j⟩ = 0, j ≠ i. E_i the orthogonal projection on W_i gives (a), (b), (c). As for (d), if x ∈ X, then
\[ x = \sum_{i=1}^{k} w_i. \]
Aw_i = λ_iw_i implies Ax = Σ Aw_i = Σ λ_iw_i = Σ λ_iE_ix. □

Corollary C.13. p(A)x = Σ p(λ_i)E_ix for any polynomial p(z) and also for any p(z) analytic on a neighborhood of spec (A).
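The spectral decomposition and the resulting calculus p(A) = Σ p(λ_i)E_i can be checked by hand on a small case. The following is a minimal sketch, assuming a real symmetric 2 × 2 matrix with two distinct eigenvalues; the helper names (`mat_add`, `mat_scale`, `mat_mul`, `close`, `spectral_apply`) and the sample matrix are illustrative choices, not taken from the text. The projections are obtained from the elementary two-eigenvalue formula E_1 = (A − λ_2I)/(λ_1 − λ_2), E_2 = (A − λ_1I)/(λ_2 − λ_1).

```python
import math

def mat_add(P, Q):
    """Entrywise sum of two 2x2 matrices."""
    return [[P[i][j] + Q[i][j] for j in range(2)] for i in range(2)]

def mat_scale(c, P):
    """Scalar multiple of a 2x2 matrix."""
    return [[c * P[i][j] for j in range(2)] for i in range(2)]

def mat_mul(P, Q):
    """Product of two 2x2 matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(P, Q, tol=1e-12):
    """True if two 2x2 matrices agree entrywise up to tol."""
    return all(abs(P[i][j] - Q[i][j]) <= tol
               for i in range(2) for j in range(2))

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
ZERO = [[0.0, 0.0], [0.0, 0.0]]

# Illustrative self-adjoint (hence normal) matrix; its eigenvalues are 1 and 3.
A = [[2.0, 1.0], [1.0, 2.0]]

# Eigenvalues of a real symmetric 2x2 matrix in closed form.
center = (A[0][0] + A[1][1]) / 2
spread = math.hypot((A[0][0] - A[1][1]) / 2, A[0][1])
lam1, lam2 = center - spread, center + spread

# Orthogonal projections onto Ker(A - lam1 I) and Ker(A - lam2 I).
E1 = mat_scale(1 / (lam1 - lam2), mat_add(A, mat_scale(-lam2, IDENTITY)))
E2 = mat_scale(1 / (lam2 - lam1), mat_add(A, mat_scale(-lam1, IDENTITY)))

def spectral_apply(f):
    """f(A) = f(lam1) E1 + f(lam2) E2, as in Corollary C.13."""
    return mat_add(mat_scale(f(lam1), E1), mat_scale(f(lam2), E2))

A_squared = spectral_apply(lambda z: z * z)   # agrees with A times A
exp_A = spectral_apply(math.exp)              # matrix exponential of A
```

Here `E1 + E2` reproduces the identity, `E1 E2` vanishes, and `spectral_apply` applied to f(z) = z returns A itself, which are properties (b), (c), (d) of Theorem C.12 in this special case.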

We now turn our attention to infinite dimensions. Let X be a Banach space and L an operator on X with domain D(L).


Definition C.14. λ is a regular point of L if Ker (λI − L) = {0}, Im (λI − L) is dense in X, and λI − L has a bounded inverse on Im (λI − L). The set ρ(L) = {λ ∈ C : λ a regular point of L} is the resolvent set of L. The complement of ρ(L), σ(L) = spec (L), is the spectrum of L.

The situation is more complicated in infinite dimensions.

Definition C.15. If Ker (λI − L) ≠ {0}, then (λI − L)^{-1} does not exist and λ is an eigenvalue of L. The set σ_p(L) = {λ : λ is an eigenvalue of L} is the point spectrum of L. If Ker (λI − L) = {0} and Im (λI − L) is dense but (λI − L)^{-1} is unbounded, then λ is an element of the continuous spectrum, σ_c(L), of L. If Ker (λI − L) = {0} and Im (λI − L) is not dense, so that (λI − L)^{-1} exists on Im (λI − L) (either bounded or unbounded), then λ is an element of the residual spectrum, σ_r(L), of L.

We observe that σ(L) is the disjoint union of σ_p(L), σ_c(L), and σ_r(L) and also that C = ρ(L) ∪ σ(L). Moreover, if L is closed and (λI − L)^{-1} exists, then (λI − L)^{-1} is also closed. Also if L is closed, then for λ ∈ C, either (1) λ ∈ ρ(L); (2) λ ∈ σ_p(L); (3) λ ∈ σ_c(L); or (4) λ ∈ σ_r(L).

Example C.16. Let X = C([0, 1]), L = D_t.
(a) D(L) = {x : x′ ∈ C([0, 1]), x(0) = 0}. If (λI − L)x = 0, then x = 0, so Ker (λI − L) = {0}. λI − L is closed, so (λI − L)^{-1} is closed and bounded for: if x ∈ X, let
\[ y(t) = \int_{0}^{t} x(s)\, ds \]
so that y ∈ D(L) and Ly = x, so L^{-1}x = y, ||L^{-1}x|| ≤ ||x||. Thus σ(L) = ∅.
(b) D(L) = {x : x′ ∈ C([0, 1])}. Then Ker (λI − L) ≠ {0} for all λ (constants, or λx − D_tx = 0, x(t) = αe^{λt}, any α ≠ 0). So σ(L) = C.
(c) D(L) = {x : x′ ∈ C([0, 1]), x(0) = x(1)}. Then σ(L) = σ_p(L) = {2πin}, a countable set of eigenvalues.

Now let us suppose L = A ∈ L(X, X), i.e., A is bounded. Let R(z, A) = (zI − A)^{-1}. Recall (Appendix B) that if ||I − B|| < 1, then B is invertible.

Proposition C.17. If A ∈ L(X, X), then σ(A) is compact and if λ ∈ σ(A), then |λ| ≤ ||A||.

Proof. If λ_0 ∈ ρ(A), then λ_0I − A is invertible and ||I − (λ_0I − A)^{-1}(λI − A)|| = ||(λ_0I − A)^{-1}[λ_0I − A − (λI − A)]|| ≤ ||(λ_0I − A)^{-1}|| |λ_0 − λ|, which is < 1 if |λ_0 − λ| is small. In other words, ρ(A) is open, so σ(A) is closed. If |λ| > ||A||, then ||A/λ|| < 1, so ||I − (I − A/λ)|| < 1 and λI − A is invertible. □

In other words, if λ ∈ σ(A), |λ| ≤ ||A|| and σ(A) is bounded.

Remark C.18. Suppose A ∈ L(X, X). Then:
(1) R(z, A) is analytic on C − σ(A).
(2) R(z_1, A) − R(z_2, A) = (z_1 − z_2)R(z_1, A)R(z_2, A).


(3) If z_0 ∈ ρ(A), z ∈ S(z_0, ||R(z_0, A)||^{-1}), then z ∈ ρ(A) and R(z, A) = R(z_0, A) + (z − z_0)R(z_0, A)^2 + · · · .
(4) If |λ| > ||A^n||^{1/n}, then λ ∈ ρ(A).
(5) If X ≠ {0}, then σ(A) ≠ ∅.

Now suppose L ∈ L(X, X) is a compact operator. Such operators represent a natural extension of finite-dimensional operators, i.e., matrices. In Appendix B, we noted that λI − L and λI − L∗ are Fredholm operators and that if Ker (λI − L) = {0} (or Ker (λI − L∗) = {0}), then (λI − L)x = y (or (λI − L∗)x∗ = y∗) has a unique solution x for any y. If λ ≠ 0 and L is compact, then, if λI − L is surjective, λI − L is injective, and conversely, if λI − L is injective, then λI − L is surjective. It immediately follows that if λ ≠ 0, then λ ∉ σ_r(L).

Lemma C.19. If L ∈ L(X, X) is compact, then Im (λI − L) is closed.

Proof. Deny. So there exist x_n with (λI − L)x_n → y, y ≠ 0, and y ∉ Im (λI − L). So we may assume x_n ∉ Ker (λI − L) and let δ_n = d(x_n, Ker (λI − L)), z_n ∈ Ker (λI − L), ||x_n − z_n|| < 2δ_n.

Claim. ||x_n − z_n|| → ∞. If not, (x_n − z_n) contains a bounded subsequence and L(x_n − z_n) has a convergent subsequence. But x_n − z_n = λ^{-1}[(λI − L)(x_n − z_n) + L(x_n − z_n)]. Then x_n − z_n has a convergent subsequence x_ν − z_ν with x_ν − z_ν → x̃. But (λI − L)(x_ν − z_ν) → y and so (λI − L)x̃ = y, a contradiction. Thus ||x_n − z_n|| → ∞.

Let v_n = (x_n − z_n)/||x_n − z_n||. So ||v_n|| = 1 and
\[ (\lambda I - L)v_n = (\lambda I - L)\frac{x_n}{\|x_n - z_n\|} \to 0. \]
Now v_n = λ^{-1}[(λI − L)v_n + Lv_n] and v_n bounded together imply that there is a convergent subsequence ṽ_n with (λI − L)ṽ_n → 0. If ṽ_n → v, (λI − L)v = 0, set y_n = z_n + ||x_n − z_n||v. Then y_n ∈ Ker (λI − L) and δ_n ≤ ||x_n − y_n||, and it follows that δ_n ≤ ||x_n − y_n|| < 2δ_n||v_n − v||, which is a contradiction. □

Corollary C.20. If λ ≠ 0, then λ ∉ σ_c(L).

Corollary C.21. If λ ≠ 0 (and L compact), then Ker (λI − L) is finite-dimensional.

Theorem C.22. If L is compact, then σ(L) = σ_p(L) is at most countable and 0 is the only possible limit point.

Proof. Let S_{1/n} = {λ ∈ σ_p(L) : |λ| ≥ 1/n}. Clearly
\[ \sigma_p(L) - \{0\} = \bigcup_{n=1}^{\infty} S_{1/n}. \]


Suppose the theorem is false. Then there are {λ_i} distinct and infinite with λ_i ∈ S_{1/n_0} for some n_0. Let x_i be independent eigenvectors and set W_n = span {x_1, . . . , x_n}. Then W_{n−1} ⊂ W_n and there exists w_n ∈ W_n with ||w_n|| = 1 and d(w_n, W_{n−1}) ≥ 1/2. Clearly (λ_nI − L)(W_n) ⊂ W_{n−1}. Consider L(w_n − w_m) = λ_n(w_n − λ_n^{-1}w̃) with w̃ = (λ_nI − L)w_n + Lw_m, 1 ≤ m < n. Then ||L(w_n − w_m)|| ≥ |λ_n|/2 ≥ 1/(2n_0). But w_n is bounded and L is compact, so we have a contradiction. □

Now, by way of illustration, let X = H be a Hilbert space and L ∈ L(H, H) be a self-adjoint operator. Then:
(a) ⟨Lh, h⟩ is real for all h, as ⟨Lh, h⟩ = ⟨h, L∗h⟩ = ⟨h, Lh⟩, which is the complex conjugate of ⟨Lh, h⟩.
(b) ||L|| = sup_{||h||=1} |⟨Lh, h⟩| = ||L∗|| (Appendix B).
(c) If Ker (λI − L) ≠ {0}, then λ ∈ R as
\[ \lambda = \frac{\langle Lh, h\rangle}{\langle h, h\rangle}, \quad h \neq 0. \]
(d) If λ_1 ≠ λ_2 are eigenvalues with H_{λ_1} = Ker (λ_1I − L), H_{λ_2} = Ker (λ_2I − L), then H_{λ_1} ⊥ H_{λ_2}.
(e) If L is compact, let
\[ m(L) = \inf_{\|h\|=1} \langle Lh, h\rangle \quad \text{and} \quad M(L) = \sup_{\|h\|=1} \langle Lh, h\rangle, \]
then ||L|| = max[|m(L)|, |M(L)|].

Proposition C.23. If L is compact, then L has an eigenvalue.

Proof. Say ||L|| = M(L) = λ̃. Then there exist h_n, ||h_n|| = 1, ⟨Lh_n, h_n⟩ → λ̃. As L is compact, we may assume Lh_n → h so that ||Lh_n − λ̃h_n||^2 ≤ ||L||^2 + λ̃^2 − 2λ̃⟨Lh_n, h_n⟩. Then Lh_n − λ̃h_n → 0 and h_n → h̃ = h/λ̃ and Lh̃ = λ̃h̃, as h̃ ≠ 0, ||h̃|| = 1. □

Remark C.24. ||L|| = |λ_1| and if L has no λ_i ≠ 0, then L = 0. If λ is an eigenvalue, let H_λ be the corresponding eigenspace and E_λ the projection on H_λ.

Theorem C.25. Suppose L is compact and self-adjoint. Let {λ_1, . . . , λ_j, . . .}, j = 1, . . . , ∞ (or N), be the set of distinct eigenvalues. Then L = Σ_j λ_jE_j. Let H_0 = Ker (L) and H_1 = ⊕H_{λ_j}. Then H = H_0 ⊕ H_1.

Proof. L and E_{λ_i} commute and E_{λ_i}E_{λ_j} = 0, i ≠ j. Let λ_1 be an eigenvalue with |λ_1| = ||L||. Let L_1 = L, L_2 = L_1 − λ_1E_{λ_1}. L_2 is compact and self-adjoint and ||L_2|| ≤ ||L_1||. Then L_2 has an eigenvalue λ_2 with |λ_2| = ||L_2|| ≤ |λ_1|. Suppose Ker (λI − L_2) ≠ {0}. Then L_1(I − E_{λ_1})h = λh and E_{λ_1}L_1(I − E_{λ_1})h = λ(I − E_{λ_1})h (h ≠ 0, h ∈ Ker (λI − L_2)). But (I − E_{λ_1})L_1(I − E_{λ_1})h = L(I − E_{λ_1})^2h = λh, so h ∈ Ker (λI − L_1). As λ ≠ λ_1, H_λ, H_{λ_1} are orthogonal and L_2h = L_1h − λ_1E_{λ_1}h = L_1h = λh. Continue by induction setting


\[ L_{n+1} = L_n - \lambda_n E_{\lambda_n} = L - \sum_{j=1}^{n} \lambda_j E_{\lambda_j}. \]
Moreover, |λ_1| ≥ |λ_2| ≥ · · · ≥ |λ_n| and ||L_n|| = |λ_n|. If L_N = 0 for some N, then we are done. If not, we claim λ_n → 0 (if false, |λ_n| ≥ λ̃ > 0 for all n and h_n ∈ H_{λ_n}, ||h_n|| = 1 with ||Lh_m − Lh_n||^2 = ||λ_mh_m − λ_nh_n||^2 = |λ_m|^2 + |λ_n|^2 ≥ 2λ̃^2 > 0, which contradicts compactness). So ||L_n|| → 0 and L = Σ_{j=1}^{∞} λ_jE_{λ_j}. Now suppose h_0 ∈ H_0 and h ∈ H. Then ⟨Lh, h_0⟩ = ⟨h, Lh_0⟩ = 0, so Lh ⊥ H_0. Im (L) ⊂ H_1 and the closure of Im (L) is H_1. If h_0 ⊥ L(H), then 0 = ⟨Lh, h_0⟩ for all h, so Lh_0 = 0, i.e., h_0 ∈ H_0. □

Corollary C.26. λ_j = ± sup{|⟨Lh, h⟩| : ||h|| = 1, h ⊥ H_{λ_j}}.

Now, again let A ∈ L(X, X) with X a Banach space. We want to define an “operational calculus” for A so that, for example, sin(A) makes sense. The following properties are easy to establish:
(i) ρ(A) is open;
(ii) σ(A) is compact;
(iii) R(z, A) is analytic on ρ(A);
(iv) sup |σ(A)| = lim_{n→∞} ||A^n||^{1/n} (the spectral radius);
(v) for |z| > sup |σ(A)|,
\[ R(z, A) = \sum_{n=0}^{\infty} \frac{A^n}{z^{n+1}}; \]
(vi) σ(A∗) = σ(A) and R(z, A∗) = [R(z, A)]∗.

Definition C.27. A(A) = {f : f analytic on open O ⊃ spec (A), ∂O a finite number of rectifiable Jordan curves, O ⊂ dom f},
\[ f(A) = \frac{1}{2\pi i} \int_{\Gamma} f(z) R(z, A)\, dz \tag{C.28} \]
with Γ = ∂O. Since f is analytic, the definition of f(A) is independent of the choice of O by Cauchy’s integral theorem.

Proposition C.29. If f, g ∈ A(A), then:
(a) αf + βg ∈ A(A) and (αf + βg)(A) = αf(A) + βg(A);
(b) (fg)(A) = f(A)g(A);
(c) if f(z) = Σ_{j=0}^{∞} α_jz^j, then f(A) = Σ_{j=0}^{∞} α_jA^j;
(d) f ∈ A(A∗) and f(A∗) = [f(A)]∗.
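The contour formula (C.28) can be tried numerically in the simplest setting. The following is a minimal sketch, assuming a 2 × 2 upper triangular matrix and a circle Γ of radius 5 centered at the origin that encloses the spectrum {1, 3}; the function names `resolvent` and `contour_calculus`, the sample matrix, and the number of quadrature points are illustrative assumptions. The integral is approximated by the trapezoid rule in the angle, which is very accurate for integrands analytic near Γ.

```python
import cmath
import math

def resolvent(z, A):
    """R(z, A) = (zI - A)^{-1} for a 2x2 matrix A, via the adjugate formula."""
    a, b = z - A[0][0], -A[0][1]
    c, d = -A[1][0], z - A[1][1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def contour_calculus(f, A, radius, n_points=400):
    """Approximate (1 / 2 pi i) * integral of f(z) R(z, A) dz over |z| = radius."""
    total = [[0j, 0j], [0j, 0j]]
    for k in range(n_points):
        theta = 2 * math.pi * k / n_points
        z = radius * cmath.exp(1j * theta)
        dz = 1j * z * (2 * math.pi / n_points)   # dz = i z d(theta)
        weight = f(z) * dz / (2j * math.pi)
        R = resolvent(z, A)
        for i in range(2):
            for j in range(2):
                total[i][j] += weight * R[i][j]
    return total

# Illustrative non-normal matrix with spectrum {1, 3}.
A = [[1.0, 2.0], [0.0, 3.0]]

square_via_contour = contour_calculus(lambda z: z * z, A, radius=5.0)
exp_via_contour = contour_calculus(cmath.exp, A, radius=5.0)
```

For f(z) = z^2 the result agrees with the matrix product A·A, as Proposition C.29 (c) predicts, and for f(z) = e^z the off-diagonal entry agrees with 2(e^3 − e)/(3 − 1), the value obtained from the power series.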

Theorem C.30. f(σ(A)) = σ(f(A)).

Proof. If λ ∈ σ(A), let g(z) = (f(λ) − f(z))/(λ − z). Then f(λ)I − f(A) = (λI − A)g(A), so if f(λ)I − f(A) has a bounded inverse B, then g(A)B is a bounded inverse of λI − A (a contradiction). So f(λ) ∈ σ(f(A)). The converse is similar. □

Theorem C.31 ([D-9]). f ∈ A(A), g ∈ A(f(A)) implies g ◦ f ∈ A(A) and (g ◦ f)(A) = g(f(A)).
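The spectral mapping theorem is easy to observe on a matrix. The following is a minimal sketch, assuming a 2 × 2 real matrix and the polynomial p(z) = z^2 + 1; the helper names (`mat_mul`, `eigenvalues_2x2`, `p`, `p_of_matrix`) and the sample matrix are illustrative choices. Eigenvalues are computed from the trace and determinant.

```python
import cmath

def mat_mul(P, Q):
    """Product of two 2x2 matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def sort_key(z):
    """Order complex numbers by real part, then imaginary part."""
    return (z.real, z.imag)

def eigenvalues_2x2(M):
    """Roots of z^2 - trace(M) z + det(M), sorted."""
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    root = cmath.sqrt(trace * trace - 4 * det)
    return sorted([(trace - root) / 2, (trace + root) / 2], key=sort_key)

def p(z):
    """Illustrative polynomial p(z) = z^2 + 1."""
    return z * z + 1

def p_of_matrix(M):
    """p(M) = M^2 + I for a 2x2 matrix M."""
    sq = mat_mul(M, M)
    return [[sq[0][0] + 1, sq[0][1]], [sq[1][0], sq[1][1] + 1]]

A = [[1.0, 2.0], [3.0, 0.0]]                 # spectrum {-2, 3}

spectrum_A = eigenvalues_2x2(A)
mapped_spectrum = sorted((p(lam) for lam in spectrum_A), key=sort_key)
spectrum_pA = eigenvalues_2x2(p_of_matrix(A))
```

Both `mapped_spectrum` and `spectrum_pA` consist of the values 5 and 10, so p(σ(A)) and σ(p(A)) coincide in this example.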


Remark C.32. If f_n ∈ A(A), f_n defined on O, and f_n → f uniformly on O, then f_n(A) → f(A). [This is a property of the integrals in (C.28).]

Definition C.33. σ ⊂ σ(A) is a spectral set of A if σ is both open and closed in σ(A). Let B(A) = {σ : σ is a spectral set of A}. Then B(A) is a Boolean algebra with σ_1 < σ_2 if σ_1 ⊂ σ_2, σ_1 ∨ σ_2 = σ_1 ∪ σ_2, σ_1 ∧ σ_2 = σ_1 ∩ σ_2. If σ ∈ B(A), then there exists f_σ ∈ A(A) such that f_σ(σ) = 1, f_σ(σ(A) − σ) = 0. Let E(σ, A) (or E(σ)) be given by
\[ E(\sigma, A) = f_\sigma(A) = \frac{1}{2\pi i} \int_{\Gamma} f_\sigma(z) R(z, A)\, dz. \tag{C.34} \]
Clearly E(σ, A) is idempotent and independent of f_σ. Observe also that if σ̃ ∈ B(f(A)), then σ(A) ∩ f^{-1}(σ̃) ∈ B(A) and E(σ̃, f(A)) = E(f^{-1}(σ̃), A). Let X_σ = E(σ)X. Then AX_σ ⊂ X_σ. Let A_σ be A restricted to X_σ and note that A and E(σ) commute (for instance, via series expansions).

Theorem C.35. Suppose σ ∈ B(A). Then: (1) σ(A_σ) = σ (i.e., spec (A_σ) = σ); (2) if f ∈ A(A), then f ∈ A(A_σ) and f(A)_σ = f(A_σ).

Proof. (1) If λ ∈ σ and λ ∉ σ(A_σ), then there exists B with (λI − A)Bx = B(λI − A)x = x for x ∈ X_σ. Let g = 0 near σ and g = (λ − z)^{-1} on σ(A) − σ. Then, for B_1 = BE(σ), (λI − A)(B_1 + g(A)) = (B_1 + g(A))(λI − A) = I, a contradiction, so λ ∈ σ(A_σ). The converse is similar. As for (2),
\[ f(A)_\sigma = \frac{1}{2\pi i} \int_{\Gamma} f(z) R(z, A)_\sigma\, dz = \frac{1}{2\pi i} \int_{\Gamma} f(z) R(z, A_\sigma)\, dz = f(A_\sigma). \]
□

Corollary C.36. σ → E(σ) is an isomorphism of B(A) onto B(E(σ)).

The formula (C.28) defines an “operational calculus” for bounded operators. Can a similar theory be developed for unbounded operators? So, assume until further notice that L is a closed, densely defined operator on the Banach space X.

Remark C.37. If λ_0 ∈ ρ(L) and ||(λ_0 − λ)(λ_0I − L)^{-1}|| < 1, then λ ∈ ρ(L) and (λI − L)^{-1} = Σ_{j≥0} (λ_0 − λ)^j (λ_0I − L)^{-j-1}. Thus ρ(L) is open and σ(L) is closed. In view of Example C.16 (b), σ(L) = C is possible and ρ(L) = ∅. We shall assume from now on that ρ(L) ≠ ∅ (i.e., σ(L) ≠ C).
Let f (z) be given with D(f ) = domain of f and assume D(f ) is open and f analytic. D(f ) has a finite number of connected components Di (f ) with Di (f ) ∩ Dj (f ) = φ for i 6= j and ∂Di (f ) a finite number of closed, rectifiable, non-intersecting, Jordan curves.


Definition C.38. If L is closed, densely defined and ρ(L) ≠ ∅, let Ã(L) = {f : σ(L) ⊂ D(f), D(f) ⊃ O_∞, f regular at ∞} (analytic functions of L).

Definition C.39. If f ∈ Ã(L), then there is a domain D with D ⊂ D(f), σ(L) ⊂ D, and D unbounded. Set
\[ f(L) = f(\infty) I + \frac{1}{2\pi i} \int_{\partial D} f(z) R(z, L)\, dz \tag{C.40} \]
(where f(∞) = lim_{z→∞} f(z)). (The integral is independent of D by Cauchy’s integral theorem.) Let Ã_0(L) = {f ∈ Ã(L) : f(∞) = 0} (vanishing at ∞).

Theorem C.41. Suppose f, g ∈ Ã(L). (1) If f(z) = λ, then f(L) = λI. (2) (f + g)(L) = f(L) + g(L) and (fg)(L) = f(L)g(L).

Proof. Exercise (see [D-9, B-1]). □

Note that if L is bounded, then Ã(L) ⊂ A(L) (entire functions). This provides an “operational calculus” for L. We will ultimately relate this to projections and projection-valued measures.

We now return to X = H a separable (for ease of exposition) Hilbert space and L a bounded self-adjoint operator. Then σ(L) is real.

Definition C.42. λ is an element of the approximate spectrum of L, π(L), if given ε > 0 there exists x ∈ dom (L) with ||(λI − L)x|| < ε, ||x|| = 1.

Proposition C.43. λ ∈ π(L) if and only if λI − L does not have a bounded inverse. Hence π(L) ⊂ σ(L).

Proof. If λ ∈ π(L), then there exist x_n, ||x_n|| = 1, such that ||(λI − L)x_n|| < 1/n, so there is no µ > 0 with µ||x|| ≤ ||(λI − L)x|| for all x, and conversely no such µ means λ ∈ π(L). □

Proposition C.44. If L is normal, then π(L) = σ(L).

Proof. If λ ∉ π(L), then ||(λI − L)h|| ≥ ε||h|| for some ε > 0 and all h. So if λI − L has a bounded inverse, we claim Im (λI − L) = H or, equivalently, that Im (λI − L)⊥ = {0}. But Im (λI − L)⊥ = Ker (λI − L∗). If h ∈ Ker (λI − L∗), then (λI − L∗)h = 0 and, as observed earlier, ||(λI − L∗)h|| ≥ ε||h||, so h = 0. □

Corollary C.45. If L is self-adjoint, then σ(L) = π(L) ⊂ R.

Proof. If λ = α + iβ, β ≠ 0, then 0 < |λ − λ̄| ||h||^2 ≤ 2||(λI − L)h|| · ||h||. If λ ∈ π(L), there exist x_n, ||x_n|| = 1, ||(λI − L)x_n|| → 0. But 2||(λI − L)x_n|| ≥ |λ − λ̄| > 0 if β ≠ 0. So β = 0. □


Proposition C.46. L is normal if and only if L = A + iB where A and B are self-adjoint and commute.

Proof. If L is normal, let A = (L + L∗)/2 and B = (L − L∗)/2i. Conversely, let L = A + iB with A, B self-adjoint and commuting. Then L∗ = A∗ − iB∗ = A − iB and LL∗ = (A + iB)(A − iB) = A^2 − iAB + iBA + B^2 = A^2 + B^2 = A^2 − iBA + iAB + B^2 = (A − iB)(A + iB). □

In other words, it makes sense to develop the theory for bounded normal operators. Recall that m(L) = inf_{||h||=1} ⟨Lh, h⟩ and M(L) = sup_{||h||=1} ⟨Lh, h⟩ and that ||L|| = max[|m(L)|, |M(L)|].

Proposition C.47. σ(L) ⊂ [m(L), M(L)] and m(L), M(L) ∈ σ(L).

Proof. If λ ∉ [m(L), M(L)], then λ < m(L) or λ > M(L). Say λ > M(L) (the other case is similar). Then λ = M(L) + ε, ε > 0, and ⟨(L − λI)h, h⟩ = ⟨Lh, h⟩ − λ⟨h, h⟩ ≤ M(L)⟨h, h⟩ − λ⟨h, h⟩ = −ε⟨h, h⟩ ≤ 0. Hence |⟨(L − λI)h, h⟩| ≥ ε||h||^2 and, since |⟨(L − λI)h, h⟩| ≤ ||(λI − L)h|| ||h||, ||(L − λI)h|| ≥ ε||h||, so λ ∉ π(L) = σ(L). Suppose now that M(L) ≥ m(L) ≥ 0 (treat other cases similarly). Then there exist h_n, ||h_n|| = 1, such that ⟨Lh_n, h_n⟩ ↑ M(L), i.e., ⟨Lh_n, h_n⟩ = M(L) − ε_n, ε_n > 0, ε_n ↓ 0. Then ||Lh_n − M(L)h_n||^2 ≤ ||L||^2 − 2M(L)⟨Lh_n, h_n⟩ + M(L)^2 = 2M(L)ε_n → 0. Hence M(L) ∈ π(L) = σ(L). Since there exists θ with M(L) − θ ≥ m(L) − θ ≥ 0, M(L) − θ ∈ σ(L − θI) = {λ − θ : λ ∈ σ(L)}. □

Definition C.48. Let L be a bounded, self-adjoint element of L(H, H). A family {E(λ) : λ ∈ R} of orthogonal projections is a resolution of the identity relative to L if:
(1) λ_1 ≤ λ_2 implies E(λ_1)E(λ_2) = E(λ_2)E(λ_1) = E(λ_1);
(2) lim_{λ→λ_0+} E(λ)h = E(λ_0+)h = E(λ_0)h;
(3) if λ < m(L), E(λ) = 0 and if λ > M(L), E(λ) = I;
(4) E(λ) commutes with L and E(λ) reduces Im (L);
(5) if, in addition, for h_1, h_2 ∈ H, ⟨E(λ)h_1, h_2⟩ is of bounded variation on any interval [s, t] with s < m(L), t ≥ M(L) and
\[ \langle L h_1, h_2 \rangle = \int_{m(L)}^{M(L)} \lambda\, d\langle E(\lambda) h_1, h_2 \rangle, \]
then {E(λ)} is a spectral resolution of the identity relative to L.


Suppose [m(L), M(L)] ⊂ [s, t] and {E(λ)} is a spectral resolution of I relative to L. If p(λ) is a real polynomial, then p(L) is self-adjoint and
\[ \langle p(L) h_1, h_2 \rangle = \int_{s}^{t} p(\lambda)\, d\langle E(\lambda) h_1, h_2 \rangle. \]
Since p( · ) ∈ C([m(L), M(L)]) and σ(L) ⊂ [m(L), M(L)],
\[ \|p(L)\| = \sup_{\lambda \in \sigma(L)} |p(\lambda)| \le \|p(\cdot)\|_\infty. \]
So the map p( · ) → C given by p( · ) → ⟨p(L)h_1, h_2⟩ is a continuous linear map. By density it extends to all of C([m(L), M(L)]) and
\[ \langle \varphi(L) h_1, h_2 \rangle = \int_{s}^{t} \varphi(\lambda)\, d\langle E(\lambda) h_1, h_2 \rangle \]
for ϕ ∈ C([m(L), M(L)]).

Theorem C.49 (Proof [B-1]). If L ∈ L(H, H) is self-adjoint, then there is a spectral resolution of I relative to L.

Proof. Recall (Appendix B), if L is self-adjoint, then L^2 is positive. Let
\[ |L| = \sqrt{L^2} = \sqrt{LL^*}, \quad L^+ = \tfrac{1}{2}\{|L| + L\} = \tfrac{1}{2}\{\sqrt{LL^*} + L\}, \quad L^- = \tfrac{1}{2}\{|L| - L\} = \tfrac{1}{2}\{\sqrt{LL^*} - L\}. \]
All commute with L and each other. Moreover L^+L^- = 1/4(√(LL∗) + L)(√(LL∗) − L) = 0. Let N = Ker L^+ and E be the projection on N. Then:
(a) L^+E = 0, EL^+ = (L^+E)∗ = 0;
(b) Im L^- ⊂ N implies EL^- = L^-, L^-E = (EL^-)∗ = (L^-)∗ = L^-;
(c) EL = LE = EL^+ − EL^- = 0 − L^- = −L^-;
(d) E > 0, (I − E) > 0, |L| > 0 so E|L| ≥ 0 and (I − E)|L| ≥ 0;
(e) |L| = L^+ + L^- ≥ L^+ − L^- ≥ L and |L| ≥ L^- − L^+ ≥ −L and L^+ ≥ L^+ − L^- = L and L^- ≥ L^- − L^+ = −L.

We note the following claim (ordering as in Appendix B):

Claim. If A, B are self-adjoint and commute, then
\[ \tfrac{1}{2}(A + B + |A - B|) = A \vee B, \quad \tfrac{1}{2}(A + B - |A - B|) = A \wedge B \]
(the proof is left to the reader).

We are now going to use the partial order ≥ on projections and we let m = m(L) and M = M(L). Let λ ∈ R and let


L_λ = L − λI. L_λ is self-adjoint as λ ∈ R. If λ_1 < λ_2, then
\[ L^+_{\lambda_1} \ge L_{\lambda_1} \ge L_{\lambda_2}, \quad L^+_{\lambda_1} \ge 0, \quad L^+_{\lambda_1} \ge L^+_{\lambda_2} \]
(and all commute). Observe also that if λ < m, then L_λ ≥ 0 so that L^+_λ = L_λ ≥ (m − λ)I; and, if λ > M, L_λ ≤ 0 so that L^+_λ = 0. Let N_λ = Ker L^+_λ and let E(λ) be the projection on N_λ. This will ultimately give us the spectral resolution.

Claim. If λ < m, then N_λ = {0}. If λ_1 ≤ λ_2, then N_{λ_1} ⊆ N_{λ_2}. If λ ≥ M, then N_λ = H.

Proof. If λ_1 ≤ λ_2, then L_{λ_1} ≥ L_{λ_2} so that L^+_{λ_1}L^+_{λ_2} ≥ (L^+_{λ_2})^2 and hence that ⟨L^+_{λ_2}L^+_{λ_1}h, h⟩ ≥ ||L^+_{λ_2}h||^2 for all h, so N_{λ_1} ⊆ N_{λ_2}. The other assertions are clear. □

Note that E(λ) = 0 if λ < m and E(λ) = I if λ > M. Since λ_1 ≤ λ_2 implies E(λ_1) ≤ E(λ_2), the map λ → E(λ) is increasing. Then E(λ_2)[E(λ_2) − E(λ_1)] = [I − E(λ_1)][E(λ_2) − E(λ_1)] = E(λ_2) − E(λ_1). Since all commute,
\[ (\lambda_2 I - L)[E(\lambda_2) - E(\lambda_1)] = L^-_{\lambda_2}[E(\lambda_2) - E(\lambda_1)] \ge 0, \quad (L - \lambda_1 I)[E(\lambda_2) - E(\lambda_1)] = L^+_{\lambda_1}[E(\lambda_2) - E(\lambda_1)] \ge 0, \tag{C.50} \]
or, setting ΔE(λ_1, λ_2) = E(λ_2) − E(λ_1), λ_1ΔE(λ_1, λ_2) ≤ LΔE(λ_1, λ_2) ≤ λ_2ΔE(λ_1, λ_2). We shall use (C.50) to obtain the integral representation. Now let
\[ (\Delta E)_{\lambda_1} = \lim_{\lambda_2 \to \lambda_1 +} \Delta E(\lambda_1, \lambda_2), \]
then λ_1(ΔE)_{λ_1} ≤ L(ΔE)_{λ_1} ≤ λ_1(ΔE)_{λ_1} and λ_1(ΔE)_{λ_1} = L(ΔE)_{λ_1}. But L_{λ_1}(ΔE)_{λ_1} = 0 and L^+_{λ_1}(ΔE)_{λ_1} = (I − E(λ_1))L_{λ_1}(ΔE)_{λ_1} = 0. In other words, (ΔE)_{λ_1}h ∈ N_{λ_1} for all h so that (ΔE)_{λ_1} = E(λ_1)(ΔE)_{λ_1}. Taking the limit in (C.50), (I − E(λ_1))(ΔE)_{λ_1} = (ΔE)_{λ_1} implies (ΔE)_{λ_1} = 0. Thus E(λ) is a resolution of I relative to L.

We claim that E(λ) is a spectral resolution. This is basically an exercise in integration. Let t_0 < m < t_1 < · · · < t_{n−1} < M ≤ t_n. Then, summing (C.50), we get
\[ \sum_{j=1}^{n} t_{j-1}\, \Delta E(t_{j-1}, t_j) \le L \sum_{j=1}^{n} \Delta E(t_{j-1}, t_j) \le \sum_{j=1}^{n} t_j\, \Delta E(t_{j-1}, t_j), \tag{C.51} \]
and, since Σ_{j=1}^{n} ΔE(t_{j−1}, t_j) = I − 0 = I, L Σ_{j=1}^{n} ΔE(t_{j−1}, t_j) = L. Suppose max(t_j − t_{j−1}) ≤ ε and λ_j ∈ (t_{j−1}, t_j). Then
\[ \Big\| L - \sum_{j=1}^{n} \lambda_j\, \Delta E(t_{j-1}, t_j) \Big\| \le \varepsilon. \]
Let ε → 0, n → ∞. Then
\[ L = \int_{-\infty}^{\infty} \lambda\, dE(\lambda) = \int_{m}^{M} \lambda\, dE(\lambda). \tag{C.52} \]
By commutativity and orthogonality, (C.52) holds for polynomials and by density for continuous functions. In other words, if f ∈ C([m, M]), then
\[ \langle f(L) h_1, h_2 \rangle = \int_{m}^{M} f(\lambda)\, d\langle E(\lambda) h_1, h_2 \rangle \]
as a Stieltjes integral. Since E(λ) is a monotone non-decreasing right continuous bounded function, ⟨E(λ)h_1, h_2⟩ is of bounded variation. □

Corollary C.53. If L ∈ L(H, H) is normal, then
\[ A = \frac{1}{2}(L + L^*), \quad B = \frac{1}{2i}(L - L^*) \]
are self-adjoint with spectral resolutions E_A(λ), E_B(λ) and
\[ L = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (\lambda + i\mu)\, dE_A(\lambda)\, dE_B(\mu). \]

Definition C.54. Let U : H → H be a linear (not necessarily bounded) map. U is an isometry if ⟨Uh_1, Uh_2⟩ = ⟨h_1, h_2⟩ or, equivalently, if U∗U = I. Note if U is surjective, then also UU∗ = I and so U∗ = U^{-1}.

We ultimately want a spectral theorem for closed, densely defined, self-adjoint but unbounded operators L. We shall do so by “transforming” L into a unitary operator, applying the theorem for unitary operators and “transforming” back [B-1, R-1].

Definition C.55. A family E(λ), λ ∈ R, is a spectral family if
(1) E(λ) are orthogonal projections;
(2) if λ_1 < λ_2, then E(λ_1) ≤ E(λ_2) (i.e., monotone non-decreasing);
(3) lim_{λ→λ_0+} E(λ) = E(λ_0) (right continuous);
(4) E(λ) → 0 as λ → −∞ and E(λ) → I as λ → ∞.

Example C.56 (without a moral). Let x : Ω → R be a random variable on a probability space (Ω, µ) and let ϕ_x(λ) = µ(x ≤ λ). Then (1) 0 ≤ ϕ_x(λ) ≤ 1 for all λ; (2) if λ_1 < λ_2, then ϕ_x(λ_1) ≤ ϕ_x(λ_2) (monotone non-decreasing); (3) lim_{λ→λ_0+} ϕ_x(λ) = ϕ_x(λ_0); and (4) lim_{λ→−∞} ϕ_x(λ) = 0, lim_{λ→∞} ϕ_x(λ) = 1. However, ϕ_x(λ)^2 ≠ ϕ_x(λ) in general.


Theorem C.57. If U is unitary, then there is a spectral family E(θ), θ ∈ [0, 2π], such that E(θ) is continuous at 0 and as such is uniquely determined by U. Moreover, E(θ) is a limit of polynomials in U, U∗.

Proof ([B-1]). Let p(e^{iθ}) = Σ_{k=−n}^{n} c_ke^{ikθ} and p(U) = Σ_{k=−n}^{n} c_kU^k. Note that the complex conjugate of p(e^{iθ}) is Σ_{k=−n}^{n} c̄_ke^{−ikθ}. If p( · ) is real, then p(U) is self-adjoint. If p(e^{iθ}) ≥ 0, then p(U) ≥ 0. We note [B-1] that if p( · ) is positive, then p(e^{iθ}) = q̄(e^{iθ})q(e^{iθ}) (with q̄ the complex conjugate of q) and p(U) = q(U)∗q(U) and ⟨p(U)h, h⟩ = ⟨q(U)h, q(U)h⟩ ≥ 0. This can be extended even to continuous ψ(e^{iθ}). Define functions e_µ(θ) as follows:
\[ e_0(\theta) \equiv 0, \quad e_{2\pi}(\theta) \equiv 1, \]
\[ e_\mu(\theta) = 1, \quad 2j\pi < \theta \le 2j\pi + \mu, \qquad e_\mu(\theta) = 0, \quad 2j\pi + \mu < \theta \le 2(j+1)\pi, \]
where 0 < µ < 2π, j ∈ Z. Let E(µ) = e_µ(U). Then the E(µ) are projections with E(0) = 0, E(2π) = I, E(µ) ≤ E(ν) if µ ≤ ν. It is easy to show right continuity, and by the definition, E(µ) is a limit of polynomials in U, U∗ (= U^{-1}). Now let 0 = t_0 < t_1 < · · · < t_n = 2π be a partition with max |t_j − t_{j−1}| ≤ ε and let θ_j ∈ [t_{j−1}, t_j]. Then
\[ \Big| e^{i\theta} - \sum_{j=1}^{n} e^{i\theta_j}\,[e_{t_j}(\theta) - e_{t_{j-1}}(\theta)] \Big| \le |\theta - \theta_j| \le \varepsilon \]
for t_{j−1} < θ ≤ t_j (and also θ = 0). It follows that
\[ 0 \le \Big| e^{i\theta} - \sum_{j=1}^{n} e^{i\theta_j}\,[e_{t_j}(\theta) - e_{t_{j-1}}(\theta)] \Big|^2 \le \varepsilon^2 \]
and hence that ||U − Σ_{j=1}^{n} e^{iθ_j}(E(t_j) − E(t_{j−1}))|| ≤ ε, so that
\[ U = \int_{0}^{2\pi} e^{i\theta}\, dE(\theta). \tag{C.58} \]

The rest is left to the reader. □

Let w(z) be the linear fractional transformation of C → C given by:
\[ w(z) = \frac{z - i}{z + i}. \tag{C.59} \]
Observe the following:
(1) if z = x ∈ R, then w(x) = (x − i)/(x + i) and |w(x)| = 1, as w(x) times its complex conjugate is (x − i)(x + i)/((x + i)(x − i)) = 1, i.e., w : R → {z : |z| = 1};
(2) (1, 0) ∉ w(R);
(3) if x ∈ R, w(x) = (x^2 − 2ix − 1)/(x^2 + 1) = u(x) + iv(x) with u(x) = (x^2 − 1)/(x^2 + 1) and v(x) = −2x/(x^2 + 1), so lim_{|x|→∞} w(x) = (1, 0);
(4) the inverse of w is given by z = i(w + 1)/(1 − w);
(5) given w = u + iv, |w|^2 = u^2 + v^2 = 1 and u ≠ 1 (i.e., w ≠ (1, 0)), then for x = v/(u − 1) ∈ R, w(x) = w (in other words, w : R → {e^{iθ} : θ ∈ (0, 2π)}).

Let W(L) be (tentatively) given by:
\[ W(L) = (L - iI)(L + iI)^{-1}. \tag{C.60} \]
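Observations (1) through (5) about the scalar map (C.59) can be verified numerically. The following is a minimal sketch using Python complex arithmetic; the function names `cayley`, `inverse_cayley`, `u`, `v` and the sample points are illustrative assumptions, and the operator transform W(L) itself is not modeled here.

```python
def cayley(z):
    """w(z) = (z - i) / (z + i), the map (C.59)."""
    return (z - 1j) / (z + 1j)

def inverse_cayley(w):
    """z = i (w + 1) / (1 - w), defined for w != 1."""
    return 1j * (1 + w) / (1 - w)

def u(x):
    """Real part of w(x) for real x."""
    return (x * x - 1) / (x * x + 1)

def v(x):
    """Imaginary part of w(x) for real x."""
    return -2 * x / (x * x + 1)

# Illustrative real sample points.
samples = [-100.0, -2.0, -0.5, 0.0, 0.5, 2.0, 100.0]

images = [cayley(x) for x in samples]             # points on the unit circle
recovered = [inverse_cayley(w) for w in images]   # back to the real line
```

Each entry of `images` has modulus 1 and differs from 1, the entries approach 1 as |x| grows, and `recovered` reproduces `samples` up to rounding.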

W(L) is called the Cayley Transform of L. Recall that L is a closed, densely defined, self-adjoint unbounded linear operator. We will show that W(L) is a unitary operator in L(H, H).

Proposition C.61. R(i, L) = (L − iI)^{-1} and R(−i, L) = (L + iI)^{-1} exist and are bounded.

Proof. ||(L ∓ iI)h||^2 = ||Lh||^2 + ||h||^2 implies Ker (L ∓ iI) = {0}, so inverses exist. Moreover, ||(L ∓ iI)h|| ≥ ||h|| implies ||R(∓i, L)h|| ≤ ||h|| for all h in D(R(∓i, L)). But these domains are all H as (L ∓ iI) is closed and densely defined. □

Proposition C.62. Let z = x + iy, y ≠ 0. Then R(z, L) ∈ L(H, H).

Proof.
\[ (L - (x + iy)I)^{-1} = \frac{1}{y}\Big( \frac{L - xI}{y} - iI \Big)^{-1}. \]
□

Proposition C.63. ||(L − iI)h|| = ||(L + iI)h|| so W(L) is an isometry for h ∈ D(L). W(L) for h_1 = (L + iI)h is defined by W(L)h_1 = (L − iI)h.

Proof. ||(L ∓ iI)h||^2 = ||Lh||^2 + ||h||^2 and h_1, W(L)h_1 run through H. □

Corollary C.64. W(L) is unitary.

Corollary C.65. L = i(I + W(L))(I − W(L))^{-1}.

Proof. Let h_1 = (L + iI)h. Then (I + W(L))h_1 = 2Lh and (I − W(L))h_1 = 2ih. If h_1 ∈ Ker (I − W(L)), then h_1 = 0, so Ker (I − W(L)) = {0} and (I − W(L))^{-1} exists. Since W(L) is a bounded unitary operator, W(L) has a spectral decomposition on [0, 2π]
\[ W(L) = \int_{0}^{2\pi} e^{i\theta}\, dF(\theta) \tag{C.66} \]
where F(0) = 0, F(2π) = I and F( · ) is continuous at 0 and 2π. Now if λ ∈ R, then −2 arc cot λ ∈ [0, 2π]. So let E(λ) = F(−2 arc cot λ). E(λ) is a spectral family on (−∞, ∞) and ∫_{−∞}^{∞} λ dE(λ) is a self-adjoint operator. Let θ_n be such that −cot(θ_n/2) = n for n = 0, ±1, ±2, . . . (i.e., n ∈ Z). Let P_n = F(θ_n) − F(θ_{n−1}). The P_n are pairwise orthogonal projections and

\[ \sum_{n=-\infty}^{\infty} P_n = \lim_{\theta \to 2\pi} F(\theta) - \lim_{\theta \to 0} F(\theta) = I. \]
The P_n commute with W(L) and L. Let H_n = P_nH so that, for h_n ∈ H_n,
\[ L h_n = L P_n h_n = i(I + W(L))(I - W(L))^{-1} P_n h_n = \int_{\theta_{n-1}}^{\theta_n} i(1 + e^{i\theta})(1 - e^{i\theta})^{-1}\, dF(\theta) h_n = \int_{\theta_{n-1}}^{\theta_n} -\cot\frac{\theta}{2}\, dF(\theta) h_n. \]
Thus
\[ L h_n = \int_{n-1}^{n} \lambda\, dE(\lambda) h_n \tag{C.67} \]
and ∫_{−∞}^{∞} λ dE(λ) reduces to (C.67) on H_n. But
\[ H_n = (F(\theta_n) - F(\theta_{n-1}))H = [E(n) - E(n-1)]H, \tag{C.68} \]
and a standard argument [B-1] gives the decomposition. This proof is originally due to Von Neumann [R-1]. □

So, we have, in effect, an “operational calculus” for continuous functions of possibly unbounded operators. We want to extend this to measurable functions.

Definition C.69. Let Ω be a set and B(Ω) a Boolean algebra of subsets of Ω. Let H be a Hilbert space. A map E of B(Ω) into the set of projections on H is a spectral measure if: (i) E(Ω) = I; (ii) if M_n are disjoint elements of B(Ω), then E(∪M_n) = Σ E(M_n).

If E is a spectral measure, then E is monotone (i.e., M ⊂ N implies E(M) ≤ E(N)) and subtractive (i.e., E(N − M) = E(N) − E(M)).

Proposition C.70 ([B-1, H-3]). A projection-valued map E( · ) is a spectral measure if and only if (a) E(Ω) = I; and (b) µ_{h_1,h_2}(M) = ⟨E(M)h_1, h_2⟩ is a C-valued measure for h_1, h_2 ∈ H.

Proof. If E is a spectral measure, then (a) and (b) are clear. Conversely, if M_n are disjoint and ∪M_n = M, then the E(M_n) are orthogonal projections (as E(M)E(N) = E(M ∩ N)) and Σ_n ||E(M_n)h||^2 = ⟨E(M)h, h⟩ = ||E(M)h||^2. Thus E(M_n)h is summable and
\[ \langle E(M) h_1, h_2 \rangle = \sum_n \langle E(M_n) h_1, h_2 \rangle = \Big\langle \sum_n E(M_n) h_1, h_2 \Big\rangle. \]
□


If E is a spectral measure on Ω, this will allow us ultimately to define

\[ \int_\Omega f(\omega)\,dE(\omega) \]

as an operator for measurable f. If f is bounded (or essentially bounded), then we have provided a path to the proof, which is left to the reader. We want to extend to arbitrary Borel measurable functions f : Ω → C.

Lemma C.71. Let

\[ D_f = \Big\{ h \in H : \int_\Omega |f(\omega)|^2\, d\|E(\omega)h\|^2 < \infty \Big\} \tag{C.72} \]

(f : Ω → C Borel measurable, E a spectral measure). Then D_f is a dense subspace and

\[ \int_\Omega |f(\omega)|\, d|\mu_{h_1,h}|(\omega) \le \|h\| \Big( \int_\Omega |f(\omega)|^2\, d\mu_{h_1,h_1}(\omega) \Big)^{1/2} \tag{C.73} \]

for h1 ∈ D_f, h ∈ H (where µ_{h1,h} is the complex measure ⟨E(·)h1, h⟩).

Proof. D_f is a subspace as ||E(ω)(h1 + h2)||² ≤ 2||E(ω)h1||² + 2||E(ω)h2||² for h1, h2 ∈ D_f. Let Ω_n = {ω ∈ Ω : |f(ω)| < n}. If h ∈ Im(E(Ω_n)), then µ_{h,h}(Ω − Ω_n) = 0 and

\[ \int_\Omega |f|^2\, d\mu_{h,h} = \int_{\Omega_n} |f|^2\, d\mu_{h,h} \le n^2 \|h\|^2 < \infty, \]

so h ∈ D_f. Since Ω_n ⊂ Ω_{n+1} and E(Ω) = I, E(Ω_n)h → h as n → ∞ (by monotone convergence for χ_{Ω_n}), so we have density. It is easy to establish (C.73) for bounded f; the result follows for general f by letting f_n = χ_{Ω_n} f and using (Lebesgue) monotone convergence. □

Lemma C.74. There is a unique linear operator L_f : H → H with dom(L_f) = D_f and

\[ \langle L_f h_1, h\rangle = \int_\Omega f(\omega)\, d\mu_{h_1,h}(\omega) \tag{C.75} \]

for h1 ∈ D_f, h ∈ H. Moreover,

\[ \|L_f h_1\|^2 = \int_\Omega |f(\omega)|^2\, d\mu_{h_1,h_1}(\omega) \tag{C.76} \]

for h1 ∈ D_f. In addition, L_f^* = L_{\bar f}, so L_f is closed.

Proof. Uniqueness holds since (C.75) determines L_f. If h1 ∈ D_f, then the map h → ∫_Ω f(ω) dµ_{h1,h}(ω) is an element of H* (= H). Let L_f h1 be the element defining it, i.e.


\[ \int_\Omega f(\omega)\, d\mu_{h_1,h}(\omega) = \langle L_f h_1, h\rangle. \]

Since µ_{h1,h} is linear in h1 and ||L_f h1||² ≤ ∫_Ω |f(ω)|² dµ_{h1,h1}(ω), we are done if f is bounded. For general f this follows by truncation and Lebesgue dominated convergence. If h1 ∈ D_f and h̃1 ∈ D_{\bar f} = D_f, then

\[ \langle L_f h_1, \tilde h_1\rangle = \int_\Omega f(\omega)\, d\mu_{h_1,\tilde h_1}(\omega) = \langle h_1, L_{\bar f}\tilde h_1\rangle, \]

so h̃1 ∈ dom(L_f^*) and L_f^* extends L_{\bar f}. On the other hand, if h̃2 ∈ dom(L_f^*), then

\[ \langle L_{\bar f_n}\tilde h_2, h\rangle = \langle \tilde h_2, L_{f_n} h\rangle = \int_{\Omega_n} \bar f(\omega)\, d\mu_{\tilde h_2,h}(\omega) \]

(where Ω_n = {ω : |f(ω)| < n}, f_n = χ_{Ω_n} f). But

\[ \langle E(\Omega_n) L_f^*\tilde h_2, h\rangle = \langle \tilde h_2, L_f E(\Omega_n)h\rangle = \int_\Omega \bar f(\omega)\, d\mu_{\tilde h_2,E(\Omega_n)h}(\omega) \]

so that L_{\bar f_n} h̃2 = E(Ω_n) L_f^* h̃2. The result follows by monotone convergence, i.e., h̃2 ∈ D_f. □

The two lemmas and properties of the Cayley transform give the spectral decomposition for measurable f [B-1].

There are many approaches to spectral theory. One elegant and intriguing approach, due to Davies [D-3, D-4], is based on almost analytic extensions and defines f(L) for slowly decreasing smooth functions f on R. We note that the space of such functions includes all rational functions with no poles on R which are proper (i.e., degree of numerator < degree of denominator). In other words, they are certain transfer functions. We very briefly indicate the idea. If z ∈ C, consider (1 + |z|²)^{1/2} or, if x ∈ R, (1 + |x|²)^{1/2}. Let ν ∈ R and let

\[ S_\nu = \{ f : R \to C : |D^n f(x)| \le \gamma_n [(1+|x|^2)^{1/2}]^{\nu-n} \} \]

for some γ_n < ∞, all x ∈ R and n ≥ 0. Let S = ν −∞. For if α = −∞, then x ∈ K(α) = K̄(α) for all α ∈ R, which is not possible. Hence x ∈ K(α + ε) = K̄(α + ε) for all ε > 0, i.e., f(x) ≤ lim inf_{x_n→x} f(x_n), which is lower semi-continuity. □

Corollary D.7. If K_f is compact and Y = R, then f is lower semi-continuous.

Proposition D.8. Let Y = R and f : K → R be convex. Suppose that the relative interior of K is non-empty. Then K_f has a relative interior point (x0, ρ0) if and only if f is continuous at x0.

Proof ([R-3]). If f is continuous at x0 and 0 < ε < 1, there is a δ > 0 such that if x ∈ S(x0, δ) ∩ span[K], then x ∈ int(K) and |f(x) − f(x0)| < ε. Let ρ0 = f(x0) + 2. Then (x0, ρ0) ∈ K_f and (x, r) ∈ K_f if |r − ρ0| < 1 and x ∈ S(x0, δ) ∩ span[K]. In other words, (x0, ρ0) is a relative interior point. On the other hand, if (x0, ρ0) is a relative interior point of K_f, then there exist ε1, δ1 such that if x ∈ S(x0, δ1) ∩ span[K] and |r − ρ0| < ε1, then f(x) ≤ r, and so f(x) is bounded by f(x0) + ε1 on a neighborhood. Assume without loss of generality that x0 = 0 and f(x0) = 0.
Given ε, 0 < ε < 1, and x ∈ S(x0, εδ1) ∩ span[K], we have f(x) ≤ (1 − ε)f(0) + εf(x/ε) and f(x) ≥ −εε1. It follows that if x ∈ S(x0, εδ1) ∩ span[K], then |f(x)| ≤ εε1, so that f is continuous at x0 = 0. □

Corollary D.9. If f is convex on K and continuous at some x0 in the relative interior of K, then f is continuous at all x in the relative interior of K.

Corollary D.10. If f is convex on K and dim K < ∞, then f is continuous on the relative interior of K.


Let E = (a, b) be an open interval (a = −∞, b = ∞ are allowed). If ψ : E → R is convex, then ψ is continuous by, say, Corollary D.10.

Proposition D.11. ψ : E → R is convex if and only if

\[ \frac{\psi(t)-\psi(s)}{t-s} \le \frac{\psi(u)-\psi(t)}{u-t} \tag{D.12} \]

for a < s < t < u < b.

Proof ([R-7]). Suppose (D.12) holds. Let s < u be in E. If 0 < α < 1, then t = αs + (1 − α)u satisfies s < t < u and so

\[ \frac{\psi(\alpha s+(1-\alpha)u)-\psi(s)}{(\alpha-1)s+(1-\alpha)u} \le \frac{\psi(u)-\psi(\alpha s+(1-\alpha)u)}{\alpha u-\alpha s} \]

(as t − s = (α − 1)s + (1 − α)u, u − t = αu − αs). It follows that αψ(αs + (1 − α)u) − αψ(s) ≤ (1 − α)ψ(u) − (1 − α)ψ(αs + (1 − α)u) and ψ(αs + (1 − α)u) ≤ αψ(s) + (1 − α)ψ(u). The converse is similar and is left to the reader. □

Theorem D.13 [Jensen's Inequality]. Let (Σ, β, µ) be a finite measure space with µ(Σ) = 1 and let f ∈ L¹(µ) with f(x) ∈ E = (a, b). If ψ is convex on E, then

\[ \psi\Big(\int f\,d\mu\Big) \le \int (\psi\circ f)\,d\mu. \]

Proof (Rudin). Since ψ is continuous, ψ ◦ f is measurable. Let t = ∫ f dµ so that a < t < b. Let β = sup [ψ(t) − ψ(s)]/(t − s) α a if ||u|| > ρ.

Proposition D.37. Suppose that J is coercive, that U = Z* for some Banach space Z, and that J is weakly lower semi-continuous. Then J has a global minimum on U.

Proof. Let a* = inf_{u∈U} J(u) (if a* = ∞, there is nothing to prove) and let a > a*. Since J is coercive, there exists ρ > 0 with J(u) > a if ||u|| > ρ. Hence a* = inf{J(u) : u ∈ ρS(0, 1)}. But the closed ball ρS(0, 1) is weak-*-compact and so there is a minimum. □

Corollary D.38. If J is coercive and weakly lower semi-continuous and U is reflexive, then J has a global minimum.

Proof. Set Z = U* so Z* = U** = U and the weak-*-topology is equivalent to the weak topology. □

Corollary D.39. Suppose J is coercive, convex and lower semi-continuous and that U is reflexive. Then J has a global minimum.

Corollary D.40. If J is convex, then J has a minimum on every weakly compact subset of X. Hence, if X is reflexive, J has a minimum on every bounded, closed, convex set in X and, if W ⊂ X is bounded, then J has a minimum on co(W).

Corollary D.41. || · ||_{X*} is weak-*-lower semi-continuous.

Proof. {|| · ||_{X*} ≤ α} is closed and hence weak-*-closed. See also [D-1]. □
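As a quick numerical illustration of Jensen's inequality (Theorem D.13), the following sketch checks it on a discrete probability measure. The sample values, the weights, and the helper name `jensen_sides` are hypothetical choices made only for this illustration; the weights are normalized so that the total mass is 1.

```python
import math

def jensen_sides(psi, values, weights):
    """Return (psi(integral of f), integral of psi(f)) for a discrete measure.

    The weights are normalized so that the measure has total mass 1,
    which is the setting in which Jensen's inequality is stated.
    """
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(p * v for p, v in zip(probs, values))
    lhs = psi(mean)
    rhs = sum(p * psi(v) for p, v in zip(probs, values))
    return lhs, rhs

# Hypothetical data: f takes four values with unequal masses.
values = [0.5, 1.0, 2.0, 4.0]
weights = [1.0, 2.0, 3.0, 4.0]

lhs_exp, rhs_exp = jensen_sides(math.exp, values, weights)
lhs_sq, rhs_sq = jensen_sides(lambda t: t * t, values, weights)
print(lhs_exp <= rhs_exp, lhs_sq <= rhs_sq)
```

Both convex choices of ψ (the exponential and the square) should satisfy the inequality; a concave ψ such as the logarithm would reverse it.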

Appendix E

Sobolev Spaces

We gather here some results on Sobolev spaces [A-1, M-1, D-8, S-3] which are critical to the formulation and analysis of control problems for partial differential equations.

Definition E.1. A domain D ⊂ R^n is a bounded open subset of R^n. D is a C^k-domain, k ≥ 1, if ∂D is an (n − 1)-dimensional C^k-manifold (locally smooth boundary). D is a Lipschitz domain if ∂D is locally Lipschitz, i.e., each x ∈ ∂D has a neighborhood O_x such that O_x ∩ D is defined by a Lipschitz function. Let x ∈ R^n, S1 a ball with center x, and S2 a ball with x ∉ S2; then K_x = S1 ∩ {x + ρ(y − x) : y ∈ S2, ρ > 0} is a finite cone with vertex x. D is a conic domain if there is a finite cone K such that each x ∈ ∂D is the vertex of a finite cone K_x congruent to K (i.e., obtained by a Euclidean motion). We note that if D is a C^k-domain, then D is a Lipschitz domain and that if D is a Lipschitz domain, then it is a conic domain.

Definition E.2. If D is a domain and F(D), F(R^n) are (suitable) function spaces, then a continuous linear map E : F(D) → F(R^n) is called an extension (or imbedding) operator.

Let C_0^∞(D) = {f : f ∈ C^∞(D), f has compact support supp f ⊂ interior of D}. We note that C_0^∞(D) is dense in L^p(D), 1 ≤ p < ∞. A function f is locally integrable on D if f ∈ L¹(K) for all compact K contained in (the interior of) D.

Remark E.3. If f is locally integrable on D and ∫_D fϕ dx = 0 for all ϕ ∈ C_0^∞(D), then f = 0 a.e.

C_0^∞(D) is a topological vector space [S-2]. Let C_0^∞(D)* be its dual space, which is called the space of (Schwartz) distributions. Let {·, ·} denote the duality (scalar product) so that if f ∈ C_0^∞(D)* and ϕ ∈ C_0^∞(D), then {f, ϕ} is the action of f on ϕ, and we write

\[ \{f, \varphi\} = \int_D f(x)\varphi(x)\,dx. \tag{E.4} \]


Definition E.5. If f ∈ C_0^∞(D)*, then the (generalized) derivative D_{x_i} f = ∂f/∂x_i is the unique element of C_0^∞(D)* such that

\[ \Big\{\frac{\partial f}{\partial x_i}, \varphi\Big\} = -\Big\{f, \frac{\partial\varphi}{\partial x_i}\Big\} \]

for all ϕ ∈ C_0^∞(D).

Example E.6. Let δ be the Dirac delta function. Then {δ(x − x0), ϕ} = ∫_D δ(x − x0)ϕ(x)dx = ϕ(x0) and

\[ \Big\{\frac{\partial\delta}{\partial x_i}, \varphi\Big\} = -\Big\{\delta(x-x_0), \frac{\partial\varphi}{\partial x_i}\Big\} = -\int_D \delta(x-x_0)\frac{\partial\varphi}{\partial x_i}\,dx = -\frac{\partial\varphi}{\partial x_i}(x_0). \]

Remark E.7. If f ∈ L^p(D), then there is a distribution, again denoted f, in C_0^∞(D)* given by {f, ϕ} = ∫_D f(x)ϕ(x)dx. The map taking the function to the distribution is an injective continuous linear map and so we have (an identification) [S-2]

\[ C_0^\infty(D) \subset L^p(D) \subset C_0^\infty(D)^*. \tag{E.8} \]

Let α = (α1, . . . , αn) be a multi-index with αi ≥ 0 and let |α| = Σ_{i=1}^n αi. We define D^α as usual by setting D^α f = (D_1^{α1} · · · D_n^{αn})f where D_i = D_{x_i} = ∂/∂x_i. Then we have:

Definition E.9. Let D be a domain, m ≥ 0, and 1 ≤ p ≤ ∞. Let

\[ W^{m,p}(D) = \Big\{ f \in L^p(D) : \text{for each } |\alpha| \le m \text{ there exists a } g_\alpha \in L^p(D) \text{ such that } (-1)^{|\alpha|}\int_D f\,D^\alpha\varphi\,dx = \int_D g_\alpha\varphi\,dx \text{ for all } \varphi \in C_0^\infty(D) \Big\}, \]

i.e., g_α = D^α f is the generalized derivative, by the identification (E.8). Set

\[ \|f\|_{W^{m,p}} = \Big( \sum_{|\alpha|\le m} \|D^\alpha f\|_p^p \Big)^{1/p}. \]

Then W^{m,p}(D) is the (m, p) Sobolev space. If W_0^{m,p}(D) is the closure of C_0^∞(D) in W^{m,p}(D), then W_0^{m,p}(D) is also a Sobolev space. We note that if f ∈ C^m(D) ∩ W^{m,p}(D), then D^α f is the classical derivative. In this sense, we can speak of "classical" solutions.

Theorem E.10. W^{m,p}(D) is a Banach space.

Proof. Let w_j be Cauchy in W^{m,p}(D). Then the "weak" (or generalized) derivatives {D_w^α w_j}, |α| ≤ m, are Cauchy sequences in L^p(D) and hence there is a w^α in L^p(D) with D_w^α w_j → w^α. Let w be the L^p limit of the w_j. We claim that D_w^α w = w^α. Since


w_j → w in L^p(D), it follows that {w_j, ϕ} → {w, ϕ} for all ϕ in C_0^∞(D), since (Hölder's inequality) ||w_jϕ − wϕ||_p ≤ ||w_j − w||_p ||ϕ||_∞ → 0. Now

\[ \{w^\alpha, \varphi\} = \lim_{j\to\infty} \{D_w^\alpha w_j, \varphi\} = \lim_{j\to\infty} (-1)^{|\alpha|}\{w_j, D^\alpha\varphi\} = (-1)^{|\alpha|}\{w, D^\alpha\varphi\} \]

and the result follows. □
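The defining identity for a generalized derivative (Definition E.9 with |α| = 1) can be checked numerically in one dimension. The sketch below takes D = (−1, 1), f(x) = |x| and the candidate weak derivative g(x) = sign(x), and compares −∫ f ϕ′ dx with ∫ g ϕ dx for a single test function. The bump function, its centre `C` and radius `R`, and the midpoint quadrature are illustrative assumptions and not part of the text.

```python
import math

# Hypothetical test function: a smooth bump with support [C - R, C + R] = [-0.3, 0.7],
# which lies inside D = (-1, 1).
C, R = 0.2, 0.5

def phi(x):
    d = R * R - (x - C) ** 2
    return math.exp(-1.0 / d) if d > 0.0 else 0.0

def dphi(x):
    """Classical derivative of phi, computed by the chain rule."""
    u = x - C
    d = R * R - u * u
    if d <= 0.0:
        return 0.0
    return math.exp(-1.0 / d) * (-2.0 * u) / (d * d)

def integrate(func, a=-1.0, b=1.0, n=20000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

f = abs                                   # f(x) = |x|
g = lambda x: 1.0 if x > 0 else -1.0      # candidate weak derivative sign(x)

lhs = -integrate(lambda x: f(x) * dphi(x))   # (-1)^1 * integral of f * phi'
rhs = integrate(lambda x: g(x) * phi(x))     # integral of g * phi
print(abs(lhs - rhs))
```

The two integrals should agree up to quadrature error. One test function proves nothing, of course; the definition requires the identity for every ϕ in C_0^∞(D).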

Remark E.11. If 1 < p ≤ ∞, then W^{m,p}(D) = {f : f ∈ dom(D^α)* ⊂ L^p(D), |α| ≤ m}. Viewing D^α as a map C_0^∞(D) → L^q(D) where 1/p + 1/q = 1, we have that W^{m,p}(D) is reflexive for 1 < p < ∞.

Remark E.12. Let H^m(D) = W^{m,2}(D) = {f : f ∈ L²(D), D^α f ∈ L²(D), |α| ≤ m}. Set ⟨f, g⟩ = Σ_{|α|≤m} ⟨D^α f, D^α g⟩_2 for f, g in H^m(D). Then ⟨·, ·⟩ is a scalar product on H^m(D) and H^m(D) is a Hilbert space.

Proposition E.13. (1) If k ≤ m and 1 ≤ p ≤ ∞, then W^{m,p}(D) ⊂ W^{k,p}(D); (2) if 1 ≤ p ≤ q ≤ ∞, then W^{m,q}(D) ⊂ W^{m,p}(D) [i.e., the identity map is continuous].

Proposition E.14. Suppose that the (bounded) domain D has a "regular" boundary (i.e., is a C^k or Lipschitz or conic domain). Then, for 1 < p < ∞ and 0 ≤ s < t, the identity map W^{t,p}(D) → W^{s,p}(D) is compact. In particular, for 0 ≤ s < t, the identity map H^t(D) → H^s(D) is compact [A-1].

While the most general "embedding" results are for conic domains, we shall deal with Lipschitz domains as these are quite adequate for our needs.

Theorem E.15 ([S-7]). Let D be a Lipschitz domain, m a non-negative integer, and 1 ≤ p ≤ ∞. Then there is an extension map E : W^{m,p}(D) → W^{m,p}(R^n) such that Ew restricted to D is w and ||Ew||_{R^n} ≤ M||w||_D for w ∈ W^{m,p}(D), M > 0 (i.e., E is bounded).

Since the converse of Theorem E.15 is obviously true (i.e., if w ∈ W^{m,p}(R^n), then w_D = w restricted to D is clearly in W^{m,p}(D)), we can, for Lipschitz domains, move from D to R^n.

Consider now L²(R). If f ∈ L²(R), then the Fourier transform, F(f), of f is the element of L²(R) given by

\[ F(f)(x) = \int_{-\infty}^{\infty} e^{-2\pi i x y} f(y)\,dy \tag{E.16} \]

(integral in mean square). The map F : L²(R) → L²(R) is an isometric isomorphism with ||F(f)||_2 = ||f||_2 and F^{-1} given by

\[ F^{-1}(g)(x) = \int_{-\infty}^{\infty} e^{2\pi i x y} g(y)\,dy. \]


Similarly, if f ∈ L²(R^n), then

\[ F(f)(x) = \int_{R^n} e^{-2\pi i\langle x,y\rangle} f(y)\,dy \]

is an isometric isomorphism with

\[ F^{-1}(g)(x) = \int_{R^n} e^{2\pi i\langle x,y\rangle} g(y)\,dy. \]

These extend by continuity so that

\[ F(D^\alpha u)(x) = (2\pi i)^{|\alpha|} x_1^{\alpha_1}\cdots x_n^{\alpha_n} F(u)(x), \qquad u \in L^2(R^n). \]

We see that H^m(R^n) = {f : x^α F(f) ∈ L²(R^n), |α| ≤ m} or H^m(R^n) = {f : (1 + |x|²)^{m/2} F(f) ∈ L²(R^n)}. We can define an equivalent scalar product on H^m(R^n) via

\[ \big\langle (1+|x|^2)^{m/2}F(f),\ (1+|x|^2)^{m/2}F(g) \big\rangle. \tag{E.17} \]

Definition E.18. H^s(R^n) = {f : (1 + |x|²)^{s/2} F(f) ∈ L²(R^n)} for any s.

H^s(R^n) is a Hilbert space for (E.17) (with m replaced by s) and, for D a Lipschitz domain, H^s(D)* = H^{−s}(D).

Example E.19. Let D = (0, 1), f(x) = x^α. If α > s − 1/2, s ≥ 0, then f ∈ H^s(D).

Example E.20. Let D = S(0, 1/2) in R². Let f(x) = log(−log(x1² + x2²)), x = (x1, x2). Then f ∈ H¹(D) but f is neither continuous nor bounded.

Example E.21. Let D = (0, 1), f(t) = 1 for t ∈ (0, 1/3), f(t) = 2 for t ∈ (1/3, 2/3) and f(t) = 3 for t ∈ (2/3, 1). Then f ∈ L²(D) but f ∉ C(D) and f ∉ H²(D).

Lemma E.22 (cf. Remark E.3). If f ∈ L¹_loc(a, b) and if, for some k > 0, ∫ f ϕ^{(k)} dx = 0 for all ϕ ∈ C_0^∞(a, b), then f is a polynomial of degree ≤ k − 1 a.e.

Corollary E.23. If f ∈ L^p(D), then f ∈ W^{m,p}(D) if and only if, for 1 ≤ k ≤ m, there are f_k ∈ L^p(D) with {f, ϕ^{(k)}} = (−1)^k {f_k, ϕ} for all ϕ ∈ C_0^∞(D).

Proposition E.24. Suppose D is connected and 1 ≤ p ≤ ∞. If f ∈ W^{1,p}(D) and

\[ \nabla f = \Big(\frac{\partial f}{\partial x_i}\Big) = 0 \]

a.e., then f is a constant.


Theorem E.25 ([S-7]). C^∞(D) ∩ W^{s,p}(D) is dense in W^{s,p}(D) for p < ∞ (D Lipschitz).

We often want to impose boundary conditions or use boundary controls. Observe that if f ∈ L^p(D), then there is no meaning to f on ∂D.

Remark E.26. Suppose f ∈ C(D̄) and let τ : C(D̄) → C(∂D) be given by τ(f)(θ) = f(θ) = (tr_{∂D} f)(θ) for θ ∈ ∂D. If D is a Lipschitz domain, then τ can be extended to W^{1,p}(D). Further, if D is Lipschitz, then the identity map I : W^{s,p}(D) → W^{k,p}(D), 0 ≤ k < s, is compact.

Definition E.27. Let 1 ≤ p ≤ ∞ and s ≥ 0. If there is a bounded linear map T : W^{s,p}(D) → W^{s,p}(R^n) such that (Tf)_D = f for all f in W^{s,p}(D), then D is extendable and Tf is the extension of f.

We note that if D is a Lipschitz domain, then D is extendable (1, p). This allows for boundary values and normal derivatives.

Definition E.28. Call a domain D regular if D is open, bounded and Lipschitz with no point of ∂D interior to D̄. Let W_0^{1,p}(D) = {f ∈ W^{1,p}(D) : f(∂D) = 0} be the functions vanishing on the boundary.

Theorem E.29 ([M-1]). Suppose p < ∞, D is regular, and 1/p + 1/q = 1. Then the map τ : W^{1,p}(D) → W^{1/q,p}(∂D) is surjective with Ker τ = W_0^{1,p}(D).

Corollary E.30. There is a γ such that, for all ψ in W^{1/q,p}(∂D), there is an f_ψ in W^{1,p}(D) with τ(f_ψ) = ψ and ||f_ψ|| ≤ γ||ψ|| (in other words, boundary data lift).

Corollary E.31. Suppose D is regular with C²-boundary. Then the map (τ, τ) : W^{2,p}(D) × W^{2,p}(D) → W^{1+1/q,p}(∂D) × W^{1+1/q,p}(∂D) is surjective with kernel W_0^{2,p}(∂D) (vanishes on ∂D and its normal derivative vanishes on ∂D).

Remark E.32. If D is regular, then the map I : W^{m,p}(D) → W^{0,p}(D) = L^p(D) is compact and ||f||_{m,p} ≥ ||D^α f||_p for |α| ≤ m.

These results lead to useful estimates and appropriate problem formulation. We now turn our attention to the situation where D ⊂ R^n and the range is a (reflexive) Banach space X or a Hilbert space H (see [A-1, L-5] for the approach).


Let D be a bounded, connected, open set in R^n with a Lipschitz boundary. If f ∈ C(D̄), then the restriction of f to ∂D, f_{∂D}, is well-defined (and continuous) on ∂D. However, we need "weak" solutions, and if f ∈ L^p(D, X), then f is not necessarily well defined on ∂D. Let 1 ≤ p < ∞, 1/p + 1/q = 1, X be a reflexive Banach space with X* as dual, and H a Hilbert space with H* = H.

Remark E.33 ([P-2]). L^p(D, X)* = L^q(D, X*) and if K ⊂ D, K compact, then L^p(K, X)* = L^q(K, X*).

Definition E.34. 𝒟(D) = C_0^∞(D, X) = {f : D → X : f is C^∞ with compact support}. 𝒟̃(D) = C_0^∞(D, X*) = {ϕ : D → X* : ϕ is C^∞ with compact support}. L^p_loc(D, X) = {f : D → X : f ∈ L^p(K, X), all compact K ⊂ D}.

Let α = (α1, . . . , αn) be a multi-index with the αi non-negative integers and |α| = Σ αi. There is a natural pairing 𝒟̃ × 𝒟 → R via

\[ [\varphi, f] = \int_D \varphi(x)[f(x)]\,dx. \tag{E.35} \]

The pairing extends to 𝒟̃ × L¹_loc(D, X) as the ϕ ∈ 𝒟̃ have compact support. Observe that if f ∈ 𝒟, then

\[ D_i f = \frac{\partial f}{\partial x_i} \in L(R, X) = X \]

and that, if ϕ ∈ 𝒟̃, then

\[ \tilde D_i\varphi = \frac{\partial\varphi}{\partial x_i} \in L(R, X^*) = X^*. \]

Definition E.36. If f, g ∈ L¹_loc(D, X), then g = D_{i,w} f is the weak derivative of f if

\[ \int_D (\tilde D_i\varphi)(x)[f(x)]\,dx = (-1)\int_D \varphi(x)[g(x)]\,dx \tag{E.37} \]

for all ϕ ∈ 𝒟̃.

Proposition E.38 ([L-2, H-5, R-7]). If f ∈ L¹(D, X), then for almost all x0 ∈ D,

\[ f(x_0) = \lim_{|c|\to 0} \frac{1}{|c|}\int_c f(x)\,dx \]

where the c are "cubes" about x0 with |c| as "volume." [For example, in R², x0 = (x1, x2), c(γ) = {(s, t) : x1 − γ < s < x1 + γ, x2 − γ < t < x2 + γ}, |c(γ)| = (2γ)².]

Proof (Sketch). We may assume X separable with {x_j} dense, as the range of f is separable. By the standard Lebesgue differentiation theorem,


\[ \|f(x_0) - x_j\| = \lim_{|c|\to 0} \frac{1}{|c|}\int_c \|f(x) - x_j\|\,dx \quad \text{for all } j \]

and almost everywhere in x0. Then

\[ \limsup_{|c|\to 0} \frac{1}{|c|}\int_c \|f(x) - f(x_0)\|\,dx \le \limsup_{|c|\to 0} \frac{1}{|c|}\int_c \{\|f(x) - x_j\| + \|f(x_0) - x_j\|\}\,dx \le 2\|f(x_0) - x_j\|; \]

since the x_j are dense, the result follows. □

Definition E.39. Let f, h ∈ L^p_loc(D, X); then h = D_g^α f is the α-th generalized derivative of f if for each compact K ⊂ D, there is a sequence f_j in C^{|α|}(K, X) such that f_j → f in L^p(K, X) and D^α f_j → h in L^p(K, X).

Remark E.40. If f ∈ L¹_loc(D, X) and ∫_D ϕ(x)[f(x)]dx = 0 for all ϕ ∈ 𝒟̃, then f = 0 a.e.

Proof. If f is not 0 a.e., then there exists a compact K with µ(K) > 0 such that f ≠ 0 on K, i.e., ||f(x)|| ≠ 0 for x ∈ K. So there exist K1 compact, µ(K1) > 0, K1 ⊂ K, and ϕ(x) = λ with λ[f(x)] > 0 on K1. A contradiction. □

Proposition E.41. If f ∈ L^p(D, X) and ψ ∈ L^q(D, X*), then |[ψ, f]| ≤ ||f||_p ||ψ||_q.

Proof. Note that

\[ ab \le \frac{a^p}{p} + \frac{b^q}{q} \quad \text{for} \quad \frac{1}{p} + \frac{1}{q} = 1. \]

Then

\[ |[\psi, f]| \le \int_D |\psi(x)[f(x)]|\,dx \le \int_D \|\psi(x)\|\cdot\|f(x)\|\,dx. \]

If a = ||f(x)||/||f||_p and b = ||ψ(x)||/||ψ||_q, then

\[ |\psi(x)[f(x)]| \le \|f\|_p^{1-p}\|\psi\|_q \frac{\|f(x)\|^p}{p} + \|f\|_p\|\psi\|_q^{1-q}\frac{\|\psi(x)\|^q}{q} \]

and the result follows on integrating over D. □

Remark E.42. If f ∈ L¹_loc(D, X), D_w^α f = g, D_w^β g = h, then D_w^{α+β} f = h.

Proof. Let ψ ∈ C_0^∞(D, X*) and ϕ = D̃^β ψ. Then

\[ [\tilde D^{\alpha+\beta}\psi, f] = (-1)^{|\alpha|}[\tilde D^\beta\psi, D_w^\alpha f] = (-1)^{|\alpha+\beta|}[\psi, D_w^\beta g]. \]

□


Remark E.43. Let ϕ ∈ C_0^∞(R^n, R) so that K = supp ϕ is compact and therefore closed and bounded. Hence |ϕ(x)| ≤ M < ∞ for all x ∈ R^n and the map T_ϕ : L^p(R^n, X) → L^p(R^n, X) given by T_ϕ f = ϕf is a continuous linear map as ∫ ||ϕ(x)f(x)||^p dx ≤ M^p ||f||_p^p.

Proposition E.44. If f ∈ L^p_loc(D, X) and D_g^α f = h in L^p(D, X), then D_g^α f = D_w^α f = h.

Proof. Let ϕ ∈ 𝒟̃ and K = supp ϕ. Let ε > 0 and γ ∈ C^∞(K, X) be such that ||γ − f||_{p,K} < ε and ||D^α γ − h||_{p,K} < ε. Then

\[ \big|[\tilde D^\alpha\varphi, f]_K - (-1)^{|\alpha|}[\varphi, D_g^\alpha f]_K\big| \le \big|[\tilde D^\alpha\varphi, f-\gamma]_K\big| + \big|[\tilde D^\alpha\varphi, \gamma]_K - (-1)^{|\alpha|}[\varphi, D^\alpha\gamma]_K\big| + \big|[\varphi, D_g^\alpha f - D^\alpha\gamma]_K\big| \]
\[ \le \|f-\gamma\|_{p,K}\|\tilde D^\alpha\varphi\|_q + \|D_g^\alpha f - D^\alpha\gamma\|_{p,K}\|\varphi\|_q \le \varepsilon\big(\|\tilde D^\alpha\varphi\|_q + \|\varphi\|_q\big) \]

(the middle term vanishes on integrating by parts, as γ is smooth). Let ε → 0 and the result follows. □

We want to prove the converse.

Definition E.45. Let ρ ∈ C_0^∞(R^n, R) be such that (i) supp ρ ⊂ S(0, 1); (ii) ∫ ρ(x)dx = 1; and (iii) ρ(x) ≥ 0. Then ρ is called a regularizer. If ε > 0, let

\[ (T_\varepsilon f)(x_0) = \frac{1}{\varepsilon^n}\int_D \rho\Big(\frac{x_0 - x}{\varepsilon}\Big) f(x)\,dx \]

if the integral exists. T_ε f is a mollifier of f.

Remark E.46. If f ∈ L¹_loc(D, X), K compact, K ⊂ D and ε < d(K, ∂D), then T_ε f is in C^∞(K, X) and T_ε f(x0) = ∫_{S(0,1)} ρ(x) f(x0 − εx)dx.

Remark E.47. For f ∈ L^p(D, X), S = S(0, 1),

\[ \|T_\varepsilon f(x_0)\| \le \int_S \rho(x)^{1/q}\rho(x)^{1/p}\|f(x_0 - \varepsilon x)\|\,dx, \qquad \|T_\varepsilon f(x_0)\|^p \le \int_S \rho(x)\|f(x_0 - \varepsilon x)\|^p\,dx. \]

Moreover, if K ⊂ int K1, K1 compact, ε < d(K, ∂K1), then ||T_ε f||_{p,K} ≤ ||f||_{p,K1}. Hence if f ∈ L^p_loc(D, X) and K ⊂ D, K compact, then ||T_ε f − f||_{p,K} → 0 as ε → 0.

Proof. Let δ > 0 and let ε be small enough so that there exists h ∈ C^∞(K1, X) with ||f − h||_{p,K1} < δ/3 and ||T_ε h − h||_{p,K} < δ/3 (note that ∫_S ρ(x)[h(x0 − εx) − h(x0)]dx → 0 uniformly on K with ε). Then we have ||T_ε f − f||_{p,K} ≤ ||f − h||_{p,K} + ||T_ε f − T_ε h||_{p,K} + ||T_ε h − h||_{p,K} and the result follows. □

Proposition E.48. D_w^α f = h implies D_w^α f = D_g^α f in L^p(D, X).

Proof. Let K ⊂ D, K compact, ε < d(K, ∂D). Then T_ε f ∈ C^∞(K, X) and, for x0 ∈ K,

\[ (D^\alpha T_\varepsilon f)(x_0) = \frac{1}{\varepsilon^n}\int_D D_{x_0}^\alpha \rho\Big(\frac{x_0-x}{\varepsilon}\Big) f(x)\,dx = \frac{1}{\varepsilon^n}\int_D (-1)^{|\alpha|} D_x^\alpha \rho\Big(\frac{x_0-x}{\varepsilon}\Big) f(x)\,dx = \frac{1}{\varepsilon^n}\int_D \rho\Big(\frac{x_0-x}{\varepsilon}\Big) h(x)\,dx. \]

But ||T_ε f − f||_{p,K} → 0 and ||D^α T_ε f − h||_{p,K} = ||T_ε h − h||_{p,K} → 0 as ε → 0. □

Definition E.49. Let f : D → X. Then,

(i) \[ \|f\|_{m,p} = \Big( \sum_{|\alpha|\le m}\int_D \|D^\alpha f(x)\|^p\,dx \Big)^{1/p}; \]

(ii) C_{m,p}(D, X) = {f : f ∈ C^m(D, X), ||f||_{m,p} < ∞}; and

(iii) H_p^m(D, X) is the completion of C_{m,p} with respect to ||f||_{m,p}.

In view of Proposition E.44 and Definition E.45, H_p^m(D, X) = {f : f ∈ L^p(D, X), D^α f ∈ L^p(D, X), |α| ≤ m and there exist f_j ∈ C_{m,p}(D, X), D^α f_j → D^α f in L^p(D, X), |α| ≤ m}.
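The convergence ||T_ε f − f||_{p,K} → 0 of Remark E.47 can be observed numerically. The following sketch works in one dimension with X = R, f(x) = |x|, K = [−0.5, 0.5] and p = 2, using the form T_ε f(x0) = ∫_S ρ(x) f(x0 − εx) dx of Remark E.46. The particular bump used as regularizer, the quadrature sizes and the function names are assumptions made for illustration only.

```python
import math

def bump(x):
    """Unnormalized smooth bump supported in [-1, 1]."""
    d = 1.0 - x * x
    return math.exp(-1.0 / d) if d > 0.0 else 0.0

# Midpoint quadrature nodes on S = (-1, 1); the weights are normalized so that
# the discrete regularizer rho integrates to 1 (condition (ii) of Definition E.45).
N = 400
H = 2.0 / N
NODES = [-1.0 + (i + 0.5) * H for i in range(N)]
MASS = sum(bump(x) for x in NODES) * H
WEIGHTS = [bump(x) * H / MASS for x in NODES]

def mollify(f, eps, x0):
    """T_eps f(x0) = integral over S of rho(x) f(x0 - eps x) dx."""
    return sum(w * f(x0 - eps * x) for w, x in zip(WEIGHTS, NODES))

def lp_error(f, eps, a=-0.5, b=0.5, p=2, m=200):
    """Approximate ||T_eps f - f||_{p,K} on K = [a, b] by the midpoint rule."""
    h = (b - a) / m
    total = 0.0
    for k in range(m):
        x0 = a + (k + 0.5) * h
        total += abs(mollify(f, eps, x0) - f(x0)) ** p * h
    return total ** (1.0 / p)

errors = [lp_error(abs, eps) for eps in (0.2, 0.1, 0.05)]
print(errors)
```

For this f the mollified function differs from f only within distance ε of the kink at 0, so the errors should shrink as ε is halved.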

Definition E.50. W^{m,p}(D, X) = {f : f ∈ L^p(D, X), D^α f ∈ L^p(D, X), |α| ≤ m}.

Definition E.51. Let D be a regular domain. If {U_n} is an open cover of D and {ψ_n} are elements of C_0^∞(D, R), then the {ψ_n} are a partition of unity (based on {U_n}) if
(i) ψ_n ≥ 0;
(ii) supp ψ_n is contained in some U_j;
(iii) Σ ψ_n(x) = 1 for all x ∈ D;
(iv) if K ⊂ D is compact, there exist n0 and an open set O with K ⊂ O ⊂ D and Σ_{n=1}^{n0} ψ_n(x) = 1 for all x ∈ O.

Partitions of unity exist for regular domains.

Theorem E.52. H_p^m(D, X) = W^{m,p}(D, X).

Proof. Let f ∈ W^{m,p}(D, X) and let ε > 0. We want to find h ∈ C_{m,p}(D, X) such that ||D^α f − D^α h||_p < ε for |α| ≤ m. For ν = 1, 2, . . ., let D_ν = {x : |x| < ν, d(x, ∂D) > 1/ν} and let U_ν = D_{ν+2} − D_ν. {U_ν} is an open cover. Let ψ_ν be a partition of unity. Then fψ_ν has support in U_ν and is weakly differentiable for |α| ≤ m. Take ε_ν > 0 so that T_{ε_ν}(fψ_ν) has support in D_{ν+3} − D_{ν−1} and ||T_{ε_ν}(fψ_ν) − fψ_ν||_{m,p} < ε/2^ν. Take h = Σ_{ν=1}^∞ T_{ε_ν}(fψ_ν). Then h ∈ C_0^m(D, X) and

\[ \|D^\alpha h - D^\alpha f\|_p = \Big\| \sum_{\nu=1}^\infty D^\alpha\big(T_{\varepsilon_\nu}(f\psi_\nu) - f\psi_\nu\big) \Big\|_p \le \sum_{\nu=1}^\infty \big\|D^\alpha\big(T_{\varepsilon_\nu}(f\psi_\nu) - f\psi_\nu\big)\big\|_p \le \sum_{\nu=1}^\infty \frac{\varepsilon}{2^\nu} \le \varepsilon. \]

□

Corollary E.53 (of argument). C^∞(D, X) is dense in W^{m,p}(D, X).

Proof. Using the construction of the theorem, there are h_ν ∈ C_0^∞(U_ν, X) with ||T_{ε_ν}(fψ_ν) − h_ν|| small enough to give Σ h_ν ∈ C^∞(D, X) and close to f in W^{m,p}(D, X). □

Definition E.54. W^{m,p}(∂D, X) = {γ : γ ∈ L^p(∂D, X), D^α γ ∈ L^p(∂D, X), |α| ≤ m}.

Definition E.55. D ⊂ R^n is a C^{m−1,1}-domain with Γ = ∂D if there is an open cover {O_j}_{j=0}^ν of D such that (i) O_0 ⊂ D; (ii) for j = 1, . . . , ν, there is a C^{m−1,1} diffeomorphism γ_j with γ_j(Γ ∩ O_j) = U_j ∩ ∂H^n, γ_j(D ∩ O_j) = U_j ∩ H^n, where H^n = R^n_+ = {(t1, . . . , tn) : tn > 0}.

For such a domain, there is a boundary map. Let x ∈ H^n, x = (y, t) ∈ R^{n−1} × [0, ∞), α = (α1, . . . , α_{n−1}), |α| ≤ m − 1. Then

\[ D_y^\alpha f(y, 0) = D_y^\alpha f(y, t) - \int_0^t D_y^\alpha D_r f(y, r)\,dr. \]

It follows that, for 1 ≤ p < ∞,

\[ \|D_y^\alpha f(y, 0)\|^p \le 2^{p/q}\Big\{ \|D_y^\alpha f(y, t)\|^p + |t|^{p/q}\int_0^t \|D_y^\alpha D_r f(y, r)\|^p\,dr \Big\}. \]

Integrating over R^{n−1} × [0, τ], we get

\[ \|D^\alpha f\|^p_{L^p(\partial H^n)} \le \frac{2^{p-1}}{\tau}\Big\{ \|D^\alpha f\|^p_{L^p(H^n)} + \frac{\tau^p}{p}\|D^{\alpha+e_n} f\|^p_{L^p(H^n)} \Big\}. \]

If D is a domain with smooth boundary, then there is a natural (restriction) map γ : C^∞(D, X) → C^∞(Γ, X) (Γ = ∂D) given by γf = f|_Γ. To obtain boundary values we want to extend this bounded linear map to H^s(D, X) or W^{m,p}(D, X).


Theorem E.56. If D is a C^{m−1,1} domain and 1/2 < s < m, then γ has a unique extension to a bounded linear map γ : H^s(D, X) → H^{s−1/2}(Γ, X) and γ has a continuous right inverse.

Proof. [A-1, L-4]. □

Theorem E.57. If m ≥ 1 and D is a C^{m−1,1} domain, then there is a unique bounded linear map γ : W^{m,p}(D, X) → W^{m−1,p}(Γ, X) such that γf = f|_Γ for f ∈ C^m(D, X).

Proof (sketch). Let {U_j, γ_j, W_j, U_0, j = 1, . . . , ν} be as in the definition of a C^{m−1,1} domain and let {ψ_i : i = 0, . . . , ν}, ψ_i ∈ C_0^∞(U_i, [0, 1]), be a partition of unity. Let θ_i = γ_i|_{∂U_i} with θ_i^{−1} as right inverse. If f ∈ C^m(D, X), then

\[ \|f|_\Gamma\|_{W^{m-1,p}(\Gamma,X)} = \sum_{i=1}^\nu \|(\psi_i f)|_\Gamma\circ\theta_i^{-1}\|_{W^{m-1,p}(W_i\cap\partial H^n, X)} \le M\sum_{i=1}^\nu \|(\psi_i f)\circ\theta_i^{-1}\|_{W^{k,p}(W_i\cap H^n, X)} + \|\psi_0 f\circ\theta_\nu^{-1}\|_{W^{k,p}(W_0, X)} \]

for k ≤ m, M = max[1, M_i]. We are done as C^m(D, X) is dense. □

The norm ||f||_{m,p} in W^{m,p}(D, X) is given by

\[ \|f\|_{m,p} = \Big[ \sum_{0\le|\alpha|\le m} \|D^\alpha f\|^p_{L^p(D,X)} \Big]^{1/p}. \tag{E.58} \]

Equivalently, let N = Σ_{0≤|α|≤m} 1 be the number of multi-indices and let L_p^N(D, X) = Π_{i=1}^N L^p(D, X) with

\[ \|u\|_N = \Big( \sum_{j=1}^N \|u_j\|_p^p \Big)^{1/p}, \qquad \|u\|_\infty = \max_{1\le j\le N}\|u_j\|_\infty, \]

where u = (u1, . . . , uN). Then W^{m,p}(D, X) is a closed subspace of L_p^N (and inherits the norm).

Remark E.59. (a) If X is separable, W^{m,p}(D, X) is separable for 1 ≤ p < ∞; (b) W^{m,p}(D, X) is reflexive and uniformly convex (X reflexive) if 1 < p < ∞.

From now on, let X = H be a Hilbert space. We are going to develop a theory on R^n using the Fourier transform and then specialize to a regular domain D.


Definition E.60. Let

\[ S = S(R^n, H) = \{ f : f \in C^\infty(R^n, H),\ \lim_{|t|\to\infty} |t|^k D^\alpha f(t) = 0 \text{ for all } k, \alpha \}. \]

A sequence f_n ∈ S converges to f ∈ S, f_n → f, if || |t|^k (D^α f_n(t) − D^α f(t)) || → 0 uniformly in t as n → ∞ for all k, α.

Remark E.61. If ϕ ∈ C^∞(R^n, R) and |D^α ϕ(t)| < P_α(t) for all t, where P_α is a polynomial, then ϕf ∈ S for f ∈ S and the map T_ϕ : S → S given by T_ϕ f = ϕf is continuous.

Proof. Case k = 0, α = 0: ||ϕ(t)f(t)|| < P_0(t)||f(t)|| but |t|^r f(t) → 0 as |t| → ∞. Case k > 0, α = 0 is similar. Case k > 0, α = e_i = (0, . . . , 1, . . . , 0): || |t|^k D_i(ϕf)(t) || ≤ || |t|^k ((D_i ϕ)f(t) + ϕD_i f(t)) || → 0 as |t| → ∞. The result follows by induction. □

Let t = (t1, . . . , tn), t^α = t1^{α1} · · · tn^{αn}, ξ = (ξ1, . . . , ξn), ξ^α = ξ1^{α1} · · · ξn^{αn} and ⟨t, ξ⟩ = Σ t_i ξ_i (the inner product on R^n).

Definition E.62. If f ∈ S, let f̂ : (R^n)* (= R^n) → H be given by

\[ \hat f(\xi) = \int_{R^n} e^{-i\langle t,\xi\rangle} f(t)\,dt. \tag{E.63} \]

f̂ is the Fourier transform of f.

Remark E.64.

\[ \hat f(0) = \int_{R^n} f(t)\,dt \quad \text{and} \quad \|\hat f(\xi)\| \le \int_{R^n} \|f(t)\|\,dt < \infty. \]

Proof. lim_{|t|→∞} |t|^{k+2} D^α f(t) = 0 ⇒ || |t|^k D^α f(t) || ≤ M/|t|² for some M > 0. □

Remark E.65. If g(t) = (−i)^{|α|} t^α f(t), then

\[ D^\alpha \hat f(\xi) = (-i)^{|\alpha|}\int_{R^n} e^{-i\langle t,\xi\rangle} t^\alpha f(t)\,dt = \hat g(\xi). \]

Proof. By absolute integrability we can differentiate under the integral for D_i. Then we argue by induction. □

Remark E.66. \widehat{D_j f}(ξ) = iξ_j f̂(ξ).

Proposition E.67. f̂ ∈ S = S(R^{n*}, H).


Proof. By earlier remarks, f̂(·), \widehat{D_j f}(·) ∈ C^∞(R^n, H), so the map f → f̂ is linear and sequentially continuous.

If t0 ∈ R^n and T_{t0} f(t) = f(t − t0), then \widehat{T_{t0} f}(ξ) = e^{−i⟨t0,ξ⟩} f̂(ξ). Recall from elementary calculus:
(1) if ϕ : R → R is even, then ϕ̂(ξ) = ∫ cos(tξ) ϕ(t)dt;
(2) if ϕ(t) = e^{−t²/2}, then ϕ̂(ξ) = √(2π) e^{−ξ²/2};
(3) for R > 0, lim_{ε→0} ∫_ε^{1/ε} (sin Rt)/t dt = π/2.
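Facts (1) and (2) above lend themselves to a quick numerical check. The sketch below approximates the cosine integral of the Gaussian by the trapezoid rule on a truncated interval and compares it with √(2π) e^{−ξ²/2}, using the sign convention of (E.63). The truncation width, the step count and the function names are illustrative assumptions.

```python
import math

def gaussian_hat(xi, half_width=12.0, steps=4800):
    """Trapezoid approximation of the integral of cos(t * xi) * exp(-t^2 / 2) over R.

    The Gaussian is even, so by fact (1) its transform reduces to this cosine integral.
    The tails beyond |t| = half_width are negligible at double precision.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        t = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.cos(t * xi) * math.exp(-0.5 * t * t)
    return total * h

def closed_form(xi):
    """Fact (2): the transform of exp(-t^2 / 2) is sqrt(2 pi) * exp(-xi^2 / 2)."""
    return math.sqrt(2.0 * math.pi) * math.exp(-0.5 * xi * xi)

max_err = max(abs(gaussian_hat(xi) - closed_form(xi)) for xi in (0.0, 0.5, 1.0, 2.0))
print(max_err)
```

The trapezoid rule is unusually accurate for rapidly decaying smooth integrands such as this one, so the discrepancy should be near rounding level.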

Definition E.68. Let

\[ \tilde f(t) = \frac{1}{(2\pi)^n}\int_{R^n} e^{i\langle t,\xi\rangle}\hat f(\xi)\,d\xi. \]

f̃ is the inverse Fourier transform of f̂.

Theorem E.69 (Fourier Inversion). f(t) = f̃(t).

Proof. By Fubini's Theorem, we reduce to the case n = 1, so we need to show that

\[ f(t) = \lim_{R\to\infty}\frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-R}^{R} f(s)e^{i(t-s)\xi}\,d\xi\,ds \]

where, equivalently, the right-hand side is the limit of

\[ \frac{1}{\pi}\int_{-\infty}^{\infty} f(s)\frac{\sin R(t-s)}{t-s}\,ds = \frac{2}{\pi}\int_0^{\infty} \frac{f(t-r)+f(t+r)}{2}\,\frac{\sin Rr}{r}\,dr. \]

Let g(r) = [f(t − r) + f(t + r)]/2 − f(t); then g(0) = 0 and g(r) = rh(r) with h ∈ C¹, 0 ≤ r ≤ 1. Integrating by parts gives, for R sufficiently large,

\[ \Big| \int_\varepsilon^1 g(r)\frac{\sin Rr}{r}\,dr + \int_1^{1/\varepsilon} g(r)\frac{\sin Rr}{r}\,dr \Big| < \varepsilon. \]

A standard limit gives the result. □

Definition E.70. Let S(R^n, L(H, H)) = {Φ : Φ maps R^n into L(H, H) and lim_{|t|→∞} |t|^k D^α Φ(t) = 0 for all k, α}. Then

\[ (\Phi * f)(t) = \int_{R^n} \Phi(t-s)f(s)\,ds \tag{E.71} \]

is the convolution of Φ and f.

Proposition E.72. (i) lim_{|t|→∞} (1 + |t|²)(Φ ∗ f)(t) = 0; (ii) D_i(Φ ∗ f) = (D_i Φ) ∗ f = Φ ∗ (D_i f) (so Φ ∗ f ∈ S); (iii) \widehat{Φ ∗ f}(ξ) = Φ̂(ξ) f̂(ξ).

Proof. Just as in the classical case. □


On S, define ||f||_s, s ∈ R, by

\[ \|f\|_s^2 = \Big(\frac{1}{2\pi}\Big)^n \int (1+\|\xi\|^2)^s \|\hat f(\xi)\|^2\,d\xi \]

and

\[ \langle f, g\rangle_s = \int (1+\|\xi\|^2)^s \langle \hat f(\xi), \hat g(\xi)\rangle\,d\xi. \]

Remark E.73. If s ≤ t, then ||f||_s ≤ ||f||_t.

Remark E.74 (Classical Parseval).

\[ \int \|f(t)\|^2\,dt = \Big(\frac{1}{2\pi}\Big)^n \int \|\hat f(\xi)\|^2\,d\xi. \]

Let K^s be defined by

\[ K^s f(t) = \Big(\frac{1}{2\pi}\Big)^n \int e^{i\langle t,\xi\rangle}(1+|\xi|^2)^s \hat f(\xi)\,d\xi; \tag{E.75} \]

then

\[ \widehat{K^s f}(\xi) = (1+|\xi|^2)^s \hat f(\xi). \tag{E.76} \]

Remark E.77. (i) ||K^s f||_t² = ||f||²_{t+2s}; (ii) ⟨K^s f, g⟩_t = ⟨f, g⟩_{s+t}.

Proof. (i)

\[ \|K^s f\|_t^2 = \Big(\frac{1}{2\pi}\Big)^n \int (1+|\xi|^2)^t \|\widehat{K^s f}(\xi)\|^2\,d\xi = \Big(\frac{1}{2\pi}\Big)^n \int (1+|\xi|^2)^t (1+|\xi|^2)^{2s}\|\hat f(\xi)\|^2\,d\xi = \|f\|^2_{t+2s}; \]

(ii) is similar. □

Remark E.78. K^s ◦ K^t = K^{s+t} and, therefore, K^s ◦ K^{−s} = I, so K^s is invertible.

Proof.

\[ K^s\circ K^t f(t) = \Big(\frac{1}{2\pi}\Big)^n \int e^{i\langle t,\xi\rangle}(1+|\xi|^2)^s(1+|\xi|^2)^t \hat f(\xi)\,d\xi = \Big(\frac{1}{2\pi}\Big)^n \int e^{i\langle t,\xi\rangle}(1+|\xi|^2)^{s+t}\hat f(\xi)\,d\xi. \]

□

Definition E.79. H^s(R^n, H) is the completion of S(R^n, H) in || · ||_s.

Note that the H^s(R^n, H) are Hilbert spaces. The map K^s : H^{t+2s}(R^n, H) → H^t(R^n, H) is an isometry by virtue of Remark E.77. There is a natural


pairing ⟨f, g⟩_0 of H^s(R^n, H) and H^{−s}(R^n, H) given by

\[ \langle f, g\rangle_0 = \int \langle \hat f(\xi), \hat g(\xi)\rangle\,d\xi. \tag{E.80} \]

Proposition E.81. |⟨f, g⟩_0| ≤ ||f||_s ||g||_{−s}, so ⟨f, g⟩_0 is a bounded bilinear form on H^s(R^n, H) × H^{−s}(R^n, H).

Proof.

\[ |\langle f, g\rangle_0| \le \int (1+|\xi|^2)^{s/2}(1+|\xi|^2)^{-s/2}|\langle \hat f(\xi), \hat g(\xi)\rangle|\,d\xi \le \Big(\int (1+|\xi|^2)^s\|\hat f(\xi)\|^2\,d\xi\Big)^{1/2}\Big(\int (1+|\xi|^2)^{-s}\|\hat g(\xi)\|^2\,d\xi\Big)^{1/2} \le \|f\|_s\cdot\|g\|_{-s}. \]

□

Observe that if g ∈ H^{−s}(R^n, H), then g defines an element λ_g of H^s(R^n, H)* via λ_g(f) = ⟨f, g⟩_0. Conversely, since K^s is an isometry, every element of H^s(R^n, H)* is a λ_g for some g. Moreover, if g ∈ H^{−s}(R^n, H), then

\[ \|g\|_{-s} = \sup_{f\in H^s(R^n,H)} \frac{|\langle f, g\rangle_0|}{\|f\|_s} \]

(since if g ≠ 0, there is an f = K^{−s}g and ||g||_{−s}||f||_s = ⟨K^{−s/2}g, K^{−s/2}g⟩ = ||g||²_{−s}).

Theorem E.82. If s > n/2, the inclusion map S(R^n, H) → C(R^n, H) extends to a bounded linear map of H^s(R^n, H) into C(R^n, H) (with the sup norm).

Proof.

\[ f(t) = \Big(\frac{1}{2\pi}\Big)^n \int e^{i\langle t,\xi\rangle}\hat f(\xi)\,d\xi \]

so that

\[ \|f(t)\| \le \Big(\frac{1}{2\pi}\Big)^n \int |e^{i\langle t,\xi\rangle}|(1+|\xi|^2)^{s/2}(1+|\xi|^2)^{-s/2}\|\hat f(\xi)\|\,d\xi \le \|f\|_s\Big(\int (1+|\xi|^2)^{-s}\,d\xi\Big)^{1/2} < \infty \]

(the last integral is finite because 2s > n). □

In other words, ||f(·)||_∞ ≤ M(n, s)||f||_s. (This is Sobolev's inequality.)

Corollary E.83 (by induction). If s > n/2, |α| = m, and f ∈ H^{s+m}(R^n, H), then f has m continuous derivatives and ||D^α f||_∞ ≤ M_α ||f||_{m+s}.

Now let D ⊂ R^n be a Lipschitz domain and let S_0(D, H) = {f : f ∈ S(R^n, H), supp f ⊂ D}. If f ∈ S_0(D, H), then f|_D or f|_{D̄} is in C_0^∞(D, H) or C_0^∞(D̄, H). Then, extending f by 0, we have f ∈ S(R^n, H) and a fortiori in S_0(D, H).


Proposition E.84. If f ∈ S_0(D, H), then ||f̂(ξ)|| ≤ µ(D)^{1/2} ||f||_0 and ||\widehat{D^α f}(ξ)|| ≤ µ(D)^{1/2} ||D^α f||_0.

Proof. The second part follows by induction. Now

\[ \|f\|_0^2 = \Big(\frac{1}{2\pi}\Big)^n \int_{R^n} \|\hat f(\theta)\|^2\,d\theta = \int_{R^n} \|f(t)\|^2\,dt \]

and

\[ \|\hat f(\theta)\| = \Big\| \int_{R^n} e^{-i\langle t,\theta\rangle} f(t)\,dt \Big\| \le \mu(D)^{1/2}\Big(\int_D \|f(t)\|^2\,dt\Big)^{1/2} \]

(by the Schwarz inequality). Let m_i = lub_{x∈D} |x_i| and m^α = m_1^{α1} · · · m_n^{αn}. Then

\[ \|D^\alpha \hat f(\xi)\| \le m^\alpha \mu(D)^{1/2}\|f\|_0 \tag{E.85} \]

(since D^α f̂(ξ) = (−i)^{|α|} ∫ e^{−i⟨t,ξ⟩} t^α f(t)dt; then consider || · || and use the Schwarz inequality). □

We can now apply Theorem E.56 to obtain boundary values appropriate to the smoothness of the domain. The idea is: let H^s(D, H) be the completion of S_0(D, H) in || · ||_s. Let ψ ∈ S(R^n, R) with ψ = 1 on D and let ψ̃(t, ξ) = e^{−i⟨t,ξ⟩} ψ(t). Then, for f ∈ S_0(D, H),

\[ \|\hat f(\xi)\| = \Big\| \int \tilde\psi(t, \xi) f(t)\,dt \Big\| \le \|f\|_s \|\tilde\psi\|_{-s}. \]

If ψ̃_α(t, ξ) = t^α ψ̃(t, ξ), then (as in earlier arguments) ||D^α f̂(ξ)|| ≤ ||f||_s ||ψ̃_α||_{−s}. It follows that if f ∈ H^s(D, H) (s > n/2), then f̂(·) is differentiable and there are constants M_s(α, ξ, D) such that ||D^α f̂(ξ)|| ≤ ||f||_s M_s(α, ξ, D). But ||ψ̃(t, ξ)|| ≤ ||ψ(t)|| and so there are M_s(α, D) with sup_ξ ||D^α f̂(ξ)|| ≤ M_s(α, D)||f||_s.

We leave it to the reader to ponder the situation for a reflexive Banach space.

Appendix F

Finite Elements

Finite elements (cf. [B-5, E-3, H-7]) provide a mechanism for developing discrete methods for the numerical solution of many problems (particularly those involving partial differential equations). These methods have not been singularly exploited in control problems. In this brief appendix, we sketch out some of the basic ideas.

Let $K\subset\mathbb{R}^n$ be a compact set with $\mathrm{int}(K)$ a (non-empty) regular domain (i.e., $\partial K$ is, say, Lipschitz). Let $S(K)$ be a function space on $K$; let $P$ be a finite-dimensional subspace of $S(K)$ with $\dim P = d$ and $\{\varphi_1,\dots,\varphi_d\}$ a basis of $P$; and let $\{\psi_1,\dots,\psi_d\}$ be a basis of $P^*$ with the $\psi_i$ elements of a larger space (or, better, the restrictions of such).

Definition F.1. $\{K, P, P^*\}$ is called a basic finite element. Elements of $P$ are called shape functions and elements of $P^*$ are called nodal functions (or elements).

We delineate several (standard) choices for $P\subset S(K)$ (which is often $H^1(K)$). Let

$$P_k^n = \{p(x_1,\dots,x_n) : \text{polynomials of degree} \le k\},$$
$$Q_k^n = \{p(x_1,\dots,x_n) : p = \textstyle\sum_j\alpha_j\,p_{1j}(x_1)\cdots p_{nj}(x_n),\ p_{ij}(\cdot)\ \text{polynomials of degree} \le k\},$$
$$J_n^k = \{p(r,\theta_1,\dots,\theta_{n-1}) : \text{trigonometric polynomials of degree} \le k\}.$$

Note that

$$\dim P_k^n = \binom{n+k}{k} = \frac{(n+k)!}{n!\,k!},\qquad \dim Q_k^n = (\dim P_k^1)^n = (k+1)^n,\qquad \dim P_k^1 = k+1,\qquad \dim J_1^k = 2k+1.$$
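The dimension formulas above are easy to tabulate. The following is a minimal sketch; the function names `dim_P`, `dim_Q` and `dim_J1` are illustrative and not part of the text.

```python
from math import comb


def dim_P(n: int, k: int) -> int:
    """Dimension of P_k^n: polynomials in n variables of total degree <= k."""
    return comb(n + k, k)


def dim_Q(n: int, k: int) -> int:
    """Dimension of Q_k^n: polynomials of degree <= k in each of n variables."""
    return (k + 1) ** n


def dim_J1(k: int) -> int:
    """Dimension of J_1^k: trigonometric polynomials of degree <= k in one angle."""
    return 2 * k + 1


if __name__ == "__main__":
    # The quadratic space in two variables and the two tensor-product spaces.
    print(dim_P(2, 2), dim_Q(2, 1), dim_Q(2, 2), dim_J1(1))
```

For instance, `dim_P(2, 2)` counts the six monomials $1, x, y, x^2, xy, y^2$.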

© Springer Science+Business Media, LLC, part of Springer Nature 2019 P. Falb, Direct Methods in Control Problems, https://doi.org/10.1007/978-0-8176-4723-0


Example F.2.

$$P_1^1 = \{a_0 + a_1t\},\qquad P_2^2 = \{a_0 + a_1x + a_2y + a_3x^2 + a_4xy + a_5y^2\},$$
$$Q_1^2 = \{a_0b_0 + a_0b_1y + a_1b_0x + a_1b_1xy\},$$
$$Q_2^2 = \{a_0b_0 + a_0b_1y + a_0b_2y^2 + a_1b_0x + a_1b_1xy + a_1b_2xy^2 + a_2b_0x^2 + a_2b_1x^2y + a_2b_2x^2y^2\},$$
$$J_2^1 = \{r + a\sin\theta + b\cos\theta\},\qquad J_3^1 = \{r + r\cos\theta_1 + r\sin\theta_1\cos\theta_2\}.$$

We note that for $J_2^1$ we use polar coordinates and for $J_3^1$ spherical coordinates so that, for $J_3^1$, $r = \sqrt{x_1^2+x_2^2+x_3^2}$, $\theta_1 = \arccos(x_1/r)$, $\theta_2 = \arccos\big(x_2/\sqrt{x_2^2+x_3^2}\big)$ (with special treatment for 0) and $x_1 = r\cos\theta_1\sin\theta_2$, $x_2 = r\sin\theta_1\sin\theta_2$, $x_3 = r\cos\theta_2$.

Definition F.3. Let $D$ be a regular domain and $S(D)$ a space of functions on $D$. A set $\Pi = \{K_i\}_1^n$ is a partition of $D$ if (i) the $K_i$ are compact with $\mathrm{int}(K_i)$ a regular domain; (ii) $\mathrm{int}(K_i)\cap\mathrm{int}(K_j) = \emptyset$ if $i\ne j$; and (iii) $\bigcup_i K_i = D$. Let $h_i = \mathrm{diam}(K_i)$ and let $h = \sup(h_i)$. $h$ is the mesh size of $\Pi$.

If $\Pi = \{K_i\}$ is a partition of $D$, we let $S_i(D) = S(D)|_{K_i}$ (restriction to $K_i$). Let $P_i\subset S_i(D)$ be a finite-dimensional subspace with $\dim P_i = d_i$, let $\{\varphi^i_{j_i} : j_i = 1,\dots,d_i\}$ be a basis of $P_i$, and let $\{\psi^i_{j_i} : j_i = 1,\dots,d_i\}$ be the dual basis in $P_i^*$ (viewed as elements of a larger space).

Definition F.4. If $\Pi = \{K_i\}$ is a partition of $D$, then $\{\Pi,\{P_i\},\{P_i^*\}\}$ is a (general) finite element. Elements of $P_i$ are shape functions and elements of $P_i^*$ are nodal functions. The mesh of $\Pi$ is $\sup\{\mathrm{diam}\,K_i\}$.

Critical to the process is the structure of the partition $\Pi$. Often the $K_i$ are the “same” type of set. For example, if $D$ is a polygonal domain, then the $K_i$ are all triangles.

Definition F.5. Let $\{K, P, P^*\}$ be a basic finite element. Let $\{\psi_1,\dots,\psi_d\}$ be a given nodal basis and let $\{\varphi_1,\dots,\varphi_d\}$ be the dual basis of $P$. If $f\in\mathrm{dom}\,\psi_i$, $i = 1,\dots,d$, then the $K$ or local interpolating function (or interpolant) $I_K(f)$ is given by

$$I_K(f) = \sum_i\psi_i(f)\varphi_i. \tag{F.6}$$

Proposition F.7. (i) $I_K$ is linear; (ii) $\psi_j(I_K(f)) = \psi_j(f)$; and (iii) $I_K(f) = f$ if $f\in P$.

Proof. Obvious. The point is that the interpolant describes the elements of $P$. $\square$
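As an illustration of (F.6), the sketch below builds the local interpolant on $K = [a,b]$ with $P = P_1^1(K)$. It assumes, purely for illustration, that the nodal functionals are point evaluations at the endpoints, $\psi_1(f) = f(a)$ and $\psi_2(f) = f(b)$; the text does not fix this choice, and the name `local_interpolant` is hypothetical.

```python
def local_interpolant(f, a, b):
    """Return I_K(f) = psi_1(f)*phi_1 + psi_2(f)*phi_2 on K = [a, b].

    Assumed nodal functionals: psi_1(f) = f(a), psi_2(f) = f(b).
    The dual shape functions are then
        phi_1(t) = (b - t)/(b - a),  phi_2(t) = (t - a)/(b - a).
    """
    fa, fb = f(a), f(b)

    def interpolant(t):
        return fa * (b - t) / (b - a) + fb * (t - a) / (b - a)

    return interpolant


if __name__ == "__main__":
    # A linear function is reproduced (Proposition F.7 (iii)).
    p = lambda t: 3.0 - 2.0 * t
    Ip = local_interpolant(p, 1.0, 4.0)
    print(Ip(2.5), p(2.5))
```

With this choice, properties (ii) and (iii) of Proposition F.7 amount to the statements that the interpolant matches $f$ at the endpoints and reproduces linear functions.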


Definition F.8. Let $\Pi = \{K_i\}$ be a partition of $D$ and $\{\Pi,\{P_i\},\{P_i^*\}\}$ be a general (global) finite element. If $f\in\mathrm{dom}\,P_i$ for all $i$, then the global or $\Pi$-interpolating function (or interpolant) $I_\Pi(f)$ is given by $I_\Pi(f)|_{K_i} = I_{K_i}(f)$ [i.e., the restriction to $K_i$ is the local interpolating function for $K_i$].

Proposition F.9. (i) $I_\Pi$ is linear; (ii) if $f\in\mathrm{dom}\,P_i$ for all $i$, then $\psi^i_{j_i}(I_\Pi(f)|_{K_i}) = \psi^i_{j_i}(f)$; and (iii) if $f\in P_i$, then $I_\Pi(f) = f$.

Proof. Obvious. $\square$

Another critical issue is determining bounds for the interpolating function which depend on the mesh of the partition.

Proposition F.10. Consider $P_k^n(K)$, $Q_k^n(K)$ and $J_n^k(K)$. Then $P_k^n(K)\subset H^k(K)$, $Q_k^n(K)\subset H^k(K)$ and $J_n^k(K)\subset H^k(K)$.

Proof. $K$ is compact and all the functions are continuously differentiable of arbitrary order. (We are only “scratching the surface” of a very complex subject here [B-5, E-3, H-7].) $\square$

Consider the basic finite element $\{K, P, P^*\}$ and suppose that $S(K) = H^k(K)$, $k\ge 1$. Then we can view $P\subset H^k(K)$ and $P^*\subset[H^k(K)]^* = H^k(K)$ (as $H^k(K) = W^{k,2}(K)$ is a Hilbert space [Appendix E]). If $\psi(\cdot)\in P^*$, then $\psi$ extends to an element $\tilde\psi$ of $H^k(K)^* = H^k(K)$. Let $\{\psi_i\}$ be the nodal basis and $\{\tilde\psi_i\}$ be the extension to $H^k(K)$.

Definition F.11. Let $\{K, P, P^*\}$ be a basic finite element and let

$$I_K(f) = \sum_i\tilde\psi_i(f)\varphi_i,\qquad f\in H^k(K),$$

where $\varphi_i\in P\subset H^k(K)$ are the shape functions and $\tilde\psi_i\in H^k(K)$ the extensions of the nodal functions. $I_K(\cdot)$ is called the local interpolating function (or interpolant).

Proposition F.12. (i) $I_K(\cdot)$ is linear; (ii) $\tilde\psi_j(I_K(f)) = \tilde\psi_j(f)$, $f\in H^k(K)$; (iii) $I_K(f) = f$ for $f\in P$; and (iv) $I_K$ is a bounded linear map from $H^k(K)$ to $H^k(K)$.

Proof. (i), (ii), (iii) are obvious. Let $f\in H^k(K)$. Then $\|I_K(f)\| \le \sum_i|\tilde\psi_i(f)|\cdot\|\varphi_i\|$ and $|\tilde\psi_i(f)| \le \|\tilde\psi_i\|\cdot\|f\|$ so that $\|I_K(f)\| \le \big(\sum_i\|\tilde\psi_i\|\cdot\|\varphi_i\|\big)\|f\|$. $\square$

Critical is estimating the error due to interpolation. We have (Appendix E, [E-3, B-5]):


Theorem F.13. Let $O\subset\mathbb{R}^n$ be a bounded, connected, open, regular domain. If $2m > n$, then $H^{m+j}(O)\subset C^j(O)$ for $j = 0, 1, \dots$.

Corollary F.14. If $n = 1$, $H^1(O)\subset C(O)$; if $n = 2$, $H^{2+j}(O)\subset C^j(O)$, $j = 0, \dots$; and, if $n = 2+j$, $H^{2+j}(O)\subset C^j(O)$, $j = 0, 1, \dots$.

Theorem F.15. $C^\infty(O)$ is dense in $H^j(O)$ for all $j$.

Suppose $B(u,v)$ is a bilinear form on a Hilbert space $V$, and $V_d$ is a finite-dimensional subspace of $V$ with $\dim V_d = d$. Suppose $B$ is an inner product and $\langle\ ,\ \rangle$ is the inner product on $V$. Consider the problems:

$$P:\quad B(u,v) = \langle f,v\rangle\ \text{ for all } v\in V, \tag{F.16}$$

$$P_d:\quad B(u_d,v_d) = \langle f,v_d\rangle\ \text{ for all } v_d\in V_d. \tag{F.17}$$

Then if $u^*$ is a solution of $P$ and $u_d^*$ is a solution of $P_d$, (i) $B(u^*-u_d^*, v_d) = 0$ for all $v_d\in V_d$ (i.e., $u^*-u_d^*$ is $B$-orthogonal to $V_d$); and (ii) $\|u^*-u_d^*\|_B \le \|u^*-v_d\|_B$ for all $v_d\in V_d$, and hence, $\|u^*-u_d^*\|_B \le \|u^*-I(v)\|_B$ for all $v\in V$. We want to obtain sharper estimates of the errors due to the interpolation. We first examine an example.

Example F.18. Let $K = [a,b]$, $P = P_1^1(K)$ and $P^* = (P_1^1(K))^*$ $(= P_1^1(K))$. Let

$$\varphi_1(t) = \frac{1}{\sqrt{b-a}},\qquad \varphi_2(t) = \frac{\gamma}{\sqrt{b-a}}\Big[t - \frac{a+b}{2}\Big],$$

where $\gamma = \sqrt{12}/\sqrt{(b-a)^2+12}$. Observe that for

$$\langle\varphi,\psi\rangle = \int_a^b\varphi\psi\,dt + \int_a^b\varphi'\psi'\,dt,$$

$\langle\varphi_i,\varphi_j\rangle = \delta_{ij}$ for $i, j = 1, 2$. [Exercise: prove this.] Now let

$$\tilde K = [0,1],\qquad \tilde P = P_1^1(\tilde K),\qquad \tilde P^* = (P_1^1(\tilde K))^*.$$
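The orthonormality claimed in Example F.18 can be checked numerically before attempting the exercise. The sketch below is only a sanity check, not a proof; it uses Simpson's rule, which is exact for the quadratic integrands that arise here, and the helper names `simpson` and `h1_gram` are illustrative.

```python
import math


def simpson(g, a, b):
    """Simpson's rule on [a, b]; exact for polynomials of degree <= 3."""
    return (b - a) / 6.0 * (g(a) + 4.0 * g((a + b) / 2.0) + g(b))


def h1_gram(a, b):
    """Gram matrix of phi_1, phi_2 for <u, v> = int(u v) + int(u' v') on [a, b]."""
    gamma = math.sqrt(12.0) / math.sqrt((b - a) ** 2 + 12.0)
    root = math.sqrt(b - a)
    phi = [lambda t: 1.0 / root,
           lambda t: gamma / root * (t - (a + b) / 2.0)]
    dphi = [lambda t: 0.0,
            lambda t: gamma / root]
    gram = []
    for i in range(2):
        row = []
        for j in range(2):
            integrand = lambda t, i=i, j=j: (phi[i](t) * phi[j](t)
                                             + dphi[i](t) * dphi[j](t))
            row.append(simpson(integrand, a, b))
        gram.append(row)
    return gram


if __name__ == "__main__":
    # Should print (up to rounding) the 2 x 2 identity matrix.
    print(h1_gram(-1.0, 3.0))
```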

The (affine) map

$$T : t\to\frac{t-a}{b-a} = \frac{t}{b-a} - \frac{a}{b-a}$$

is a smooth bijective map between $[a,b]$ and $[0,1]$. If $p(s) = a_0 + a_1s$ is an element of $\tilde P = P_1^1([0,1])$, then

$$(p\circ T)(t) = a_0 - \frac{a_1a}{b-a} + \frac{a_1t}{b-a}$$


is an element of $P$ and the map $T^* : p\to p\circ T$ is surjective. Suppose that $\psi\in P^*$; then $\psi\circ T^*$ is an element of $\tilde P^*$, i.e., if $p\in\tilde P$, then $T^*(p)\in P$ and $\psi(T^*(p))\in\mathbb{R}$. We write $T_*\psi$ for this map. It too is surjective. The point is that we can reduce estimates for an arbitrary interval $[a,b]$ to those on a standard interval $[0,1]$.

Definition F.19. Let $(K, P, P^*)$, $(\tilde K,\tilde P,\tilde P^*)$ be (basic) finite elements and let $T$ be a non-singular affine map (i.e., $T(x) = Ax + b$, $A$ non-singular). Then $(K, P, P^*)$, $(\tilde K,\tilde P,\tilde P^*)$ are (affine) equivalent if
(i) $T(K) = \tilde K$;
(ii) $T^*(\tilde p) = \tilde p\circ T$ maps $\tilde P$ onto $P$; and
(iii) $T_*(\psi) = \psi\circ T^*$ maps $P^*$ onto $\tilde P^*$.

Similarly, if $\theta$ is a smooth diffeomorphism, then $\{K, P, P^*\}$, $\{\tilde K,\tilde P,\tilde P^*\}$ are smoothly equivalent if
(i) $\theta(K) = \tilde K$;
(ii) $\theta^*(\tilde p) = \tilde p\circ\theta$ maps $\tilde P$ onto $P$; and
(iii) $\theta_*(\psi) = \psi\circ\theta^*$ maps $P^*$ onto $\tilde P^*$.
[The degree of smoothness depends on that of $P$ and $\tilde P$.]

We observe that $([a,b], P_1^1([a,b]), (P_1^1([a,b]))^*)$ and $([0,1], P_1^1([0,1]), (P_1^1([0,1]))^*)$ are affine equivalent.

Definition F.20. Let

$$K_\sigma = \Big\{x\in\mathbb{R}^n : x_i\ge 0,\ i = 1,\dots,n,\ \sum_{i=1}^n x_i\le 1\Big\}.$$

Then $K_\sigma$ is the unit simplex in $\mathbb{R}^n$. $K_\sigma$ can also be defined as the convex hull of the points $0, \{e_i\}$, $i = 1,\dots,n$, where the $e_i$ are the unit vectors $e_i = (0, 0, \dots, 1, \dots, 0)$ (with the 1 in the $i$-th position). If $T(x) = Ax + b$, then $T(K_\sigma) = \mathrm{co}(b, Ae_1+b, \dots, Ae_n+b)$ is the convex hull of $n+1$ points in “general position.” If $T(x) = Ax + b$ is a non-singular affine transformation, then $T(K_\sigma)$ is a simplex. Let $D\subset\mathbb{R}^n$ be a regular domain. A partition $\{K_i\}$ of $D$ is simplicial (or a “triangulation”) if all the $K_i$ are simplexes.

Example F.21. Let $n = 2$. Then $K_\sigma$ is the triangle with vertices $\alpha_0 = (0,0)$, $\alpha_1 = (0,1)$, $\alpha_2 = (1,0)$. Let $T(x) = Ax + b$ be a non-singular affine transformation of $\mathbb{R}^2$. Then $T(K_\sigma)$ is the triangle with vertices $T(\alpha_0) = (b_1, b_2)$, $T(\alpha_1) = (a_{12}+b_1, a_{22}+b_2)$, $T(\alpha_2) = (a_{11}+b_1, a_{21}+b_2)$. Given a triangle $K$ with vertices $b, c, d$, let $T_1(x) = x - b$ so that $T_1(K)$ is a triangle with vertices $0$, $c-b$, $d-b$. Let $T_2(x) = Ax$ where $A(c-b, d-b) = I$ (why is this possible?). Then $(T_2\circ T_1)(K) = K_\sigma$. (This process generalizes quite considerably [B-5, E-3].)
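The construction $T_2\circ T_1$ of Example F.21 can be sketched for a concrete triangle. The code below assumes plain tuples for points and inverts the $2\times 2$ matrix with columns $c-b$, $d-b$ by hand; the function name `to_unit_simplex` is hypothetical. The inverse exists exactly when $b, c, d$ are not collinear, which is one way to answer the "why is this possible?" question.

```python
def to_unit_simplex(b, c, d):
    """Return the affine map x -> A(x - b) taking the triangle (b, c, d) to K_sigma.

    A is the inverse of the 2x2 matrix whose columns are c - b and d - b,
    so that A(c - b) = e1 and A(d - b) = e2.
    """
    u = (c[0] - b[0], c[1] - b[1])
    v = (d[0] - b[0], d[1] - b[1])
    det = u[0] * v[1] - v[0] * u[1]
    if det == 0:
        raise ValueError("vertices are collinear; the triangle is degenerate")
    # Inverse of [[u0, v0], [u1, v1]].
    A = ((v[1] / det, -v[0] / det),
         (-u[1] / det, u[0] / det))

    def T(x):
        y = (x[0] - b[0], x[1] - b[1])          # T_1: translate b to the origin
        return (A[0][0] * y[0] + A[0][1] * y[1],  # T_2: apply A
                A[1][0] * y[0] + A[1][1] * y[1])

    return T


if __name__ == "__main__":
    T = to_unit_simplex((1.0, 1.0), (3.0, 2.0), (2.0, 4.0))
    print(T((1.0, 1.0)), T((3.0, 2.0)), T((2.0, 4.0)))
```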


Example F.22. Let $D = [a,b]$ and $a = s_0 < s_1 < \cdots < s_N = b$ be a partition with $K_j = [s_j, s_{j+1}]$, $j = 0,\dots,N-1$, and $h = \sup\{h_j\}$, $h_j = s_{j+1}-s_j$. Let $P_i = P_1^1(K_i)$, $P_i^*$ the dual space and $\{\Pi,\{P_i\},\{P_i^*\}\}$ the global finite element. The global interpolant can be viewed as follows: let $P\subset H^1([a,b])$ be $\{p : p|_{K_i}\in P_i\}$; let $\varphi_i^j$, $j = 1, 2$, $i = 0,\dots,N-1$, be a basis of $P_i$ and $\psi_i^j$ the dual basis of $P_i^*$. Define the map $I_\Pi : H^1([a,b])\to H^1([a,b])$ by

$$I_\Pi(f) = \sum_{i=0}^{N-1}\sum_{j=1}^{2}\psi_i^j(f)\varphi_i^j.$$

Clearly $I_\Pi$ is linear and, appropriately interpreted, maps $P$ into $P$. Let $f\in H^1([a,b])$ so that $f$ is continuous and $f'\in L_2([a,b])\subset L_1([a,b])$. Then

$$|f(t)-f(s)| \le \int_s^t|f'(r)|\,dr \le |t-s|^{1/2}\|f'\|_2$$

(norm in $L_2([a,b])$). Moreover, since $f$ is continuous on $[a,b]$, if $|f(s_0)| = \min\{|f(s)| : s\in[a,b]\}$, then $|f(s_0)|(b-a) \le \int_a^b|f(s)|\cdot 1\,ds \le \|f\|_2\cdot\sqrt{b-a}$ so that

$$|f(s_0)| \le \frac{1}{\sqrt{b-a}}\|f\|_2.$$

It follows that $I_\Pi$ is a well-defined map of $H^1([a,b])$ into $H^1([a,b])$ and, on $K_j = [s_j, s_{j+1}]$, $\|(I_\Pi|_{K_j}f)'\|_2 \le \|f'\|_2$. Hence, since (Schwartz inequality)

$$\|I_K(f)\|_2 \le \sqrt{b-a}\,\|I_K(f)\|_\infty \le \sqrt{b-a}\,\|f\|_\infty,$$

$\|I_\Pi(f)\| \le c\{\|f\|_2 + \|f'\|_2\}$ independent of $h$. In a similar way, we can show that, if $f\in H^2([a,b])$, then

$$\|f - I_\Pi(f)\|_2 \le h^2\|f''\|_2,\qquad \|(f - I_\Pi(f))'\|_2 \le h\|f''\|_2.$$

If $f\in H^1([a,b])$, then $\|f - I_\Pi(f)\|_2 \le h\|f'\|_2$ for all $h$.

Estimating the interpolation error is critical. Let $D$ be a regular domain and consider $W_p^k(D)$ with

$$\|w\|_p^k = \Big(\sum_{|\alpha|\le k}\|D^{\alpha}w\|_p^p\Big)^{1/p},\qquad |w|_p^k = \Big(\sum_{|\alpha|=k}\|D^{\alpha}w\|_p^p\Big)^{1/p}.$$
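The $h^2$ estimate of Example F.22 can be observed numerically. The sketch below assumes a uniform mesh, nodal (endpoint) interpolation by piecewise-linear functions, and the test function $f = \sin$ on $[0,\pi]$, for which $\|f''\|_2 = \sqrt{\pi/2}$; these choices and the function names are illustrative only.

```python
import math


def interp_error_l2(f, a, b, n_cells, quad_pts=200):
    """L2 norm of f - I_Pi(f) for piecewise-linear nodal interpolation.

    Uses a uniform mesh with n_cells cells and a composite midpoint rule
    with quad_pts points per cell to approximate the integral.
    """
    h = (b - a) / n_cells
    total = 0.0
    for i in range(n_cells):
        s0 = a + i * h
        f0, f1 = f(s0), f(s0 + h)
        dt = h / quad_pts
        for q in range(quad_pts):
            t = s0 + (q + 0.5) * dt
            linear = f0 + (f1 - f0) * (t - s0) / h
            total += (f(t) - linear) ** 2 * dt
    return math.sqrt(total)


def error_and_bound(n_cells):
    """Return (||f - I_Pi f||_2, h^2 ||f''||_2) for f = sin on [0, pi]."""
    h = math.pi / n_cells
    err = interp_error_l2(math.sin, 0.0, math.pi, n_cells)
    bound = h ** 2 * math.sqrt(math.pi / 2.0)
    return err, bound


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(n, *error_and_bound(n))
```

Halving the mesh size should reduce the error by roughly a factor of four, in line with the $h^2$ rate.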


Let $S_N\subset W_p^{k+1}(D)$ be a finite-dimensional subspace with $\dim S_N = N$ and let $X = W_p^{k+1}(D)/S_N$ be the quotient space with $\|w\|_X = \inf\{\|w+\sigma\|_p^{k+1} : \sigma\in S_N\}$.

Proposition F.23. For all $w\in X$, there exists $c\ge 0$ such that

$$|w|_p^{k+1} \le \|w\|_X \le c\,|w|_p^{k+1}$$

for all $k, p$.

Proof ([B-5, E-3]). The first inequality is obvious. Let $\{\psi_j : j = 1,\dots,N\}$ be the dual basis in $S_N^*$ (and extended by Hahn–Banach to $W_p^{k+1}(D)^*$). We claim that

$$\|w\|_p^{k+1} \le c\Big\{|w|_p^{k+1} + \sum_{j=1}^{N}\psi_j(w)\Big\} \tag{F.24}$$

(note if $w = w_1 + \sigma$, then $\psi_j(w_1) = 0$). If not, then there exists a bounded sequence $w_n$ with

$$\lim_{n\to\infty}\Big(|w_n|_p^{k+1} + \sum_j\psi_j(w_n)\Big) = 0.$$

Since the embedding $W_p^{k+1}(D)\to W_p^k(D)$ is compact, there is a subsequence $w_{n_\nu}$ such that $w_{n_\nu}\to w$ in $W_p^k(D)$ (as $\|w_{n_\nu}-w\|_p^k\to 0$ and $\|w_{n_\nu}\|_p^k\to 0$). It follows that $\|D^{\alpha}w\| = \lim_{n_\nu\to\infty}\|D^{\alpha}w_{n_\nu}\| = 0$ for $|\alpha| = k+1$. Hence $w\in S_N$ and $\psi_j(w) = 0$ so $w = 0$. This contradiction proves (F.24) and the proposition. $\square$

Corollary F.25 (Bramble–Hilbert). If $\psi\in W_p^{k+1}(D)^*$ with $\psi(S_N) = 0$, then there exists $c > 0$ such that

$$|\psi(w)| \le c\,\|\psi\|_p^{k+1}\cdot|w|_p^{k+1}.$$

Lemma F.26. Let $\{K, P, P^*\}$, $\{\tilde K,\tilde P,\tilde P^*\}$ be affine equivalent basic finite elements with $\tilde x = T(x) = Ax + b$. If $\tilde w\in C^r(\tilde K)$ and $w = T^*(\tilde w) = \tilde w\circ T$ (so that $w(x) = \tilde w(Ax+b)$), then there exists $c(r,n) = c$ ($K\subset\mathbb{R}^n$) such that

$$|w|_p^r \le c\,\frac{\|A\|^r}{|\det A|^{1/p}}\,|\tilde w|_p^r,$$

and, if $w\in C^r(K)$ and $\tilde w = (T^{-1})^*(w) = w\circ T^{-1}$, then

$$|\tilde w|_p^r \le c\,\|A^{-1}\|^r\,|\det A|^{1/p}\,|w|_p^r.$$

Proof. If $\tilde w\in C^r(\tilde K)$, then $w = \tilde w\circ T\in C^r(K)$ and $|D^{\alpha}w(x)| \le \|D^{\alpha}w(x)\| = \sup_{\|\xi\|=1}\{|D^{\alpha}w(x)(\xi)|\}$ with $|\alpha| = r$. Then


$$|w|_p^r = \Big(\int_K\sum_{|\alpha|=r}|D^{\alpha}w(x)|^p\,dx\Big)^{1/p} \le \#\{\alpha : |\alpha| = r\}^{1/p}\Big(\int_K|D^rw(x)|^p\,dx\Big)^{1/p}.$$

By the chain rule,

$$\|D^rw(x)\| \le \|A\|^r\,\|D^r\tilde w\|,$$

and, since $A$ is the Jacobian, the result follows. $\square$

Corollary F.27. The result holds for $W_p^r(K)$ and $W_p^r(\tilde K)$ (by density).

Definition F.28. Let $\{K, P, P^*\}$ be a basic finite element. Define the shape parameters $\rho(K)$, $h(K)$, $\sigma(K)$ by

$$\rho(K) = \sup\{\mathrm{diam}\,O : O \text{ a sphere},\ O\subset K\},\qquad h(K) = \mathrm{diam}\,K,\qquad \sigma(K) = \frac{h(K)}{\rho(K)}.$$

Let $\Pi = \{K_i\}$ be a partition of a regular domain $D$. Then the shape parameters of $D$ relative to $\Pi$, $\rho(D,\Pi)$, $h(D,\Pi)$, $\sigma(D,\Pi)$, are defined as

$$\rho(D,\Pi) = \sup\rho(K_i),\qquad h(D,\Pi) = \sup h(K_i),\qquad \sigma(D,\Pi) = \frac{h(D,\Pi)}{\rho(D,\Pi)}$$

(the “aspect ratio” $\sigma$ measures the “thinness” of $\Pi$). A family of partitions $\Pi^\nu$ is (quasi)-uniform if $\sigma(D,\Pi^\nu)\le\sigma_0$ for all $\nu$.

Example F.29. Let $K = \{x\in\mathbb{R}^2 : \|x\|\le 1\}$; then $h(K) = 2$, $\rho(K) = 2$, $\sigma(K) = 1$. Let $K = \{x\in\mathbb{R}^2 : 0\le x_1\le 1,\ 0\le x_2\le 1\}$; then $h(K) = \sqrt{2}$, $\rho(K) = 1$, $\sigma(K) = \sqrt{2}$. Let $K = \{x\in\mathbb{R}^2 : 0\le x_1\le 1,\ 0\le x_2\le\epsilon\}$; then $h(K) = \sqrt{1+\epsilon^2}$, $\rho(K) = \epsilon$, $\sigma(K) = \sqrt{1+\epsilon^2}/\epsilon$ (so as $\epsilon\to 0$, $\sigma(K)\to\infty$).

Let $T(x) = Ax + b$ be an affine transformation of $\mathbb{R}^n$. Then [B-5], if $T(K) = \tilde K$,

$$|\det A| = \frac{\mu(\tilde K)}{\mu(K)},\qquad \|A\| \le \frac{h(\tilde K)}{\rho(K)},\qquad \|A^{-1}\| \le \frac{h(K)}{\rho(\tilde K)},$$

since

$$\|A\| = \frac{1}{\rho(K)}\sup\{\|Ax\| : \|x\| = \rho(K)\}$$

and $A$ is nonsingular.

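Let $\{K, P, P^*\}$, $\{\tilde K,\tilde P,\tilde P^*\}$ be affine equivalent basic finite elements with $\tilde x = T(x) = Ax + b$ and $T^* : \tilde P\to P$ given by $T^*(\tilde f) = \tilde f\circ T$ and $T_* : P^*\to\tilde P^*$ given by $T_*(\psi)(\tilde f) = \psi(\tilde f\circ T)$. Let $\{\varphi_i\}$, $\{\psi^j\}$ be dual bases of $P$, $P^*$ so that $\psi^j(\varphi_i) = \delta_{ij}$. Set $\tilde\varphi_i = \varphi_i\circ T^{-1}$ (so that $\tilde\varphi_i\circ T = \varphi_i$) and $\tilde\psi^j = T_*\psi^j$. Then $\{\tilde\varphi_i\}$, $\{\tilde\psi^j\}$ are dual bases of $\tilde P$, $\tilde P^*$.

Remark F.30. $I_K\circ T^* = T^*\circ I_{\tilde K}$.

Proof. $I_K(T^*\tilde f) = I_K(\tilde f\circ T) = \sum_i\psi^i(\tilde f\circ T)\varphi_i = \sum_i(T_*\psi^i)(\tilde f)\varphi_i = \sum_i\tilde\psi^i(\tilde f)(\tilde\varphi_i\circ T) = (I_{\tilde K}\tilde f)\circ T = (T^*\circ I_{\tilde K})(\tilde f)$. $\square$

Suppose that $k, m, p, q$ are integers such that for the basic finite element $\{K, P, P^*\}$, the map $\tau : W_p^{k+1}(K)\to W_q^m(K)$ is continuous and that $S_N$ ($\dim S_N = N$) $\subset P$ and is invariant under $\tau$ (i.e., $\tau(S_N)\subset S_N$). Let $\{\tilde K,\tilde P,\tilde P^*\}$ be affine equivalent to $\{K, P, P^*\}$ and let $\tilde\tau : W_p^{k+1}(\tilde K)\to W_q^m(\tilde K)$ be given by $\tilde\tau(\tilde f) = \tau(\tilde f\circ T)$.

Proposition F.31. There exists a constant $c(\tau, K) > 0$ such that

$$|\tilde f - I_{\tilde K}\tilde f|_{m,q} \le c(\tau,K)\,(\mu(\tilde K))^{1/q-1/p}\,\frac{h_{\tilde K}^{k+1}}{\rho_{\tilde K}^{m}}\,|\tilde f|_{k+1,p}$$

for all $\tilde f$ and any $\{\tilde K,\tilde P,\tilde P^*\}$ affine equivalent to $\{K, P, P^*\}$.

Proof. $\tilde f\circ T - I_K(\tilde f\circ T) = [\tilde f - I_{\tilde K}\tilde f]\circ T$ so that

$$|\tilde f - I_{\tilde K}\tilde f|_{m,q} \le c\,\|A^{-1}\|^m\,|\det A|^{1/q}\,|(\tilde f\circ T) - I_K(\tilde f\circ T)|_{m,q}$$

and $|\tilde g\circ T|_{k+1,p} \le c\,\|A\|^{k+1}\,|\det A|^{-1/p}\,|\tilde g|_{k+1,p}$. It follows that

$$|\tilde f - I_{\tilde K}\tilde f|_{m,q} \le c\,\|A^{-1}\|^m\,|\det A|^{1/q}\,\|A\|^{k+1}\,|\det A|^{-1/p}\,|\tilde f|_{k+1,p} \le c\,\frac{h_{\tilde K}^{k+1}}{\rho_K^{k+1}}\,\mu(K)^{1/p-1/q}\,\frac{h_K^{m}}{\rho_{\tilde K}^{m}}\,\mu(\tilde K)^{1/q-1/p}\,|\tilde f|_{k+1,p}. \qquad\square$$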

Definition F.32. $\{K, P, P^*\}$ is a $(k,m)$ basic finite element if $k+1-m\ge 0$, and if $\sigma$ is the maximal degree of derivatives in $P^*$, then the map $H^{k+1}(K)\to C^{\sigma}(K)$ is continuous. If $k+1-m\ge 0$, $m > n/2$, then this holds for $\sigma\le k+1-m$. If $\tilde K$ is affine equivalent to $K$ via $T$, then the maps

$$H^{k+1}(\tilde K)\to H^m(\tilde K),\qquad H^{k+1}(\tilde K)\to C^{\sigma}(\tilde K)$$

are also continuous. Moreover,

$$|\tilde f - I_{\tilde K}\tilde f|_m \le c(K,P,P^*)\,\frac{h_{\tilde K}^{k+1}}{\rho_{\tilde K}^{m}}\,|\tilde f|_{k+1}$$

for $\tilde f\in H^{k+1}(\tilde K)$.

Definition F.33. A family $T_\alpha(K) = K_\alpha$ of affine equivalent finite elements is (shape) regular if there exists $\sigma_0$ such that $h_{K_\alpha}/\rho_{K_\alpha}\le\sigma_0$ for all $\alpha$.

Remark F.34. If $\{K_\alpha\}$ is shape regular, then there exists $c(K,P,P^*)$ such that

$$|f_\alpha - I_{K_\alpha}f_\alpha|_m \le c(K,P,P^*)\,h_{K_\alpha}^{k+1-m}\,|f_\alpha|_{k+1}$$

for $f_\alpha\in H^{k+1}(K_\alpha)$.

Proof. By an earlier estimate,

$$|f_\alpha - I_{K_\alpha}f_\alpha|_m \le c\,\frac{h_{K_\alpha}^{k+1}}{\rho_{K_\alpha}^{m}}\,\mu(K_\alpha)^{1/2-1/2}\,|f_\alpha|_{k+1} \le c\,h_{K_\alpha}^{k+1-m}(\sigma_{K_\alpha})^m\,|f_\alpha|_{k+1} \le c\,\sigma_0^m\,h_{K_\alpha}^{k+1-m}\,|f_\alpha|_{k+1}$$

for $f_\alpha\in H^{k+1}(K_\alpha)$. $\square$

Definition F.35. Let $D$ be a regular domain, let $\{K, P, P^*\}$ be a $(k,m)$ basic finite element, and let $\Pi = \{K_i\}_{i=1}^N$ be a partition of $D$. $\Pi$ is a $(k,m)$ shape regular partition of $D$ based on $K$ if (i) $K_i$ is affine equivalent to $K$ for all $i$; and (ii) $h_{K_i}/\rho_{K_i} = \sigma_i\le\sigma_0$ for some $\sigma_0 > 0$ for all $i$ (i.e., quasi-uniform). $K$ need not be one of the $K_i$. If $K$ is the unit simplex, then all the $K_i$ are simplexes and $\Pi$ is simplicial (or a “triangulation”).

Remark F.36. Let $f_i = f|_{K_i}$. Then

$$|f_i - I_{K_i}f_i|_m \le c(K,P,P^*)\,h_i^{k+1-m}\,|f_i|_{k+1} \le c\,h^{k+1-m}\,|f_i|_{k+1},$$

where $h = \sup h_i$ is the mesh size.

Now let $\Pi = \{K_i\}$ be a $(k,m)$ shape regular partition of $D$. Let $S(D) = H^{k+1}(D)$ and $S_{i,m}(D) = H^{k+1}(D)\cap H^m(K_i)$ and $P_i\subset S_{i,m}(D)$. Then:

$$\|f - I_\Pi f\|_{m,D} \le c\,h^{k+1-m}\,|f|_{k+1,D}. \tag{F.37}$$

Proof of (F.37). If $m = 0, 1$, then

$$\Big(\sum_i\|f_i - I_{K_i}f_i\|^2\Big)^{1/2} = \|f - I_\Pi f\|_m$$

and $\|f_i - I_{K_i}f_i\|_{m,K_i} \le c\,h^{k+1-m}\,|f_i|_{k+1,K_i} \le c\,h^{k+1-m}\,|f|_{k+1,D}$. If $2\le m\le k+1$, then $I_{K_i}f|_{K_i} = (I_\Pi f)|_{K_i}$ and so

$$\Big(\sum_i\|f_i - I_{K_i}f_i\|_{m,K_i}^2\Big)^{1/2} \le c\,h^{k+1-m}\Big(\sum_i|f_i|_{k+1,K_i}^2\Big)^{1/2} \le c\,h^{k+1-m}\,|f|_{k+1,D}. \qquad\square$$

Suppose now that $\theta$ is a $C^r$-diffeomorphism of $\mathbb{R}^n$ with $r\ge 1$. Let $D_i$, $i = 1,\dots,n$, be weak derivatives and let $\{K, P, P^*\}$ be a basic finite element. Let $\{\tilde K,\tilde P,\tilde P^*\}$ be smoothly equivalent to $\{K, P, P^*\}$ under $\theta$. Then

$$D_i(\tilde f\circ\theta) = \sum_{j=1}^n(D_i\theta_j)\,(D_j\tilde f)\circ\theta$$

and, for example, by the product and chain rules,

$$D_k(D_i(\tilde f\circ\theta)) = \sum_{j=1}^n\Big\{(D_kD_i\theta_j)\,(D_j\tilde f)\circ\theta + (D_i\theta_j)\sum_{\ell=1}^n(D_k\theta_\ell)\,[D_\ell(D_j\tilde f)]\circ\theta\Big\}$$

(and so on). The previous results for affine equivalence can be extended but considerable care is required. For instance, the numeration of nodes must be compatible and it is usually assumed that orientation is preserved (i.e., $J(\theta)$, the Jacobian of $\theta$, is positive).

Example F.38. Let $K$ be the unit simplex triangle in $\mathbb{R}^2$ with $\alpha_1 = (0,0)$, $\alpha_2 = (0,1)$, $\alpha_3 = (1,0)$, and let $\theta$ be the map

$$\theta(x,y) = \Big(x + \frac{y^2}{2},\ y - x^2\Big)$$

so that $\theta(\alpha_1) = (0,0)$, $\theta(\alpha_2) = (1/2, 1)$, $\theta(\alpha_3) = (1,-1)$ and $\theta(x,0) = (x,-x^2)$, $\theta(0,y) = (y^2/2, y)$, and $\det J(\theta) = 1 + 2xy \ge 1 > 0$ as $x, y\ge 0$. Consider Figure F.1, where $\theta(\gamma_1) : x_2 = -x_1^2$, $\theta(\gamma_2) : x_1 = x_2^2/2$, and $\theta(\gamma_3)$ is a parabola. (Exercise: determine the equation of $\theta(\gamma_3)$.) Note the distortion due to $\theta$. This simple example is an indication of the complications of smooth equivalence.

[Fig. F.1: the unit simplex with vertices $\alpha_1 = (0,0)$, $\alpha_2 = (0,1)$, $\alpha_3 = (1,0)$ and edges $\gamma_1, \gamma_2, \gamma_3$, and its image under $\theta$ with vertices $\theta(\alpha_1) = (0,0)$, $\theta(\alpha_2) = (1/2,1)$, $\theta(\alpha_3) = (1,-1)$ and edges $\theta(\gamma_1), \theta(\gamma_2), \theta(\gamma_3)$.]

We now want to examine the notion of refinements. Let

$$K_n = \{x = (x_1,\dots,x_n)\in\mathbb{R}^n : 0\le x_i\le 1,\ \textstyle\sum x_i\le 1\},$$
$$C_n = \{x = (x_1,\dots,x_n)\in\mathbb{R}^n : 0\le x_i\le 1\},$$
$$S_n = \{x = (x_1,\dots,x_n)\in\mathbb{R}^n : \textstyle\sum x_i^2\le 1\},$$

be the unit $n$-simplex, unit $n$-cell, and unit $n$-sphere, respectively. Let $G_A(\mathbb{R}^n) = \{T : T(x) = Ax + b,\ \det A\ne 0\}$ be the affine group of $\mathbb{R}^n$ and let $G_A^+(\mathbb{R}^n)$ be the subgroup with $\det A > 0$ (i.e., orientation-preserving). Observe that if $T(x) = Ax + b$, then $T^{-1}(y) = A^{-1}(y-b)$ and $\det A^{-1} = 1/\det A$.

Remark F.39. (i) $\mu(K_n) = 1/n!$; (ii) $\mu(C_n) = 1$; (iii) $\mu(S_n) = 2(\sqrt{\pi})^n/n\Gamma(n/2)$ where $\Gamma(\cdot)$ is the gamma function and $\Gamma(m) = (m-1)!$, $\Gamma(m+1/2) = \sqrt{\pi}\,[1\cdot 3\cdots(2m-1)]/2^m$ (note that if $S_n(r) = \{x : \sum x_i^2\le r^2\}$, then $\mu(S_n(r)) = 2(\sqrt{\pi})^nr^n/n\Gamma(n/2)$).

Definition F.40. If $\tilde K = T(K_n)$, then $\tilde K$ is an $n$-simplex. If $\tilde C = T(C_n)$, then $\tilde C$ is an $n$-cell. If $\tilde S = T(S_n)$, then $\tilde S$ is an $n$-sphere.

In view of the properties of affine transformations, we have $\mu(\tilde K) = |\det A|/n!$, $\mu(\tilde C) = |\det A|$, and $\mu(\tilde S) = [2(\sqrt{\pi})^n/n\Gamma(n/2)]\,|\det A|$. Observe also that $G_A^+(\mathbb{R}^n)$ provides an equivalence (affine equivalence) for $\{\tilde K\}$, $\{\tilde C\}$, $\{\tilde S\}$, respectively.

Definition F.41. Let $D$ be a regular domain and $\Pi^\nu = \{(K_i, P_i, P_i^*)\}$, $\Pi^\gamma = \{(C_i, P_i, P_i^*)\}$ be partitions of $D$. If the $K_i$ are simplices, then $\Pi^\nu$ is a simplicial partition; and, if the $C_i$ are cells, then $\Pi^\gamma$ is a cellular partition.

We shall consider “refinements” within the classes of simplicial and cellular partitions. Let $h(\cdot)$, $\rho(\cdot)$ be the shape parameters. Then $h(\tilde K)\le\|A\|\cdot h(K_n) = \|A\|\sqrt{2}$ and $h(\tilde C)\le\|A\|\,h(C_n) = \|A\|\cdot\sqrt{n}$; and, $\rho(K_n) = 2/\sqrt{n}(1+\sqrt{n})$, $\rho(C_n) = 2/2 = 1$. (These are radii of the inscribed sphere multiplied by 2.)
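The closed-form volumes of Remark F.39 and the shape parameters quoted above can be evaluated directly. The following sketch uses only the standard library; the function names are illustrative, and $h(K_n) = \sqrt{2}$ is assumed to refer to $n\ge 2$.

```python
import math


def volume_simplex(n):
    """mu(K_n) = 1/n!."""
    return 1.0 / math.factorial(n)


def volume_ball(n, r=1.0):
    """mu(S_n(r)) = 2 (sqrt(pi))^n r^n / (n Gamma(n/2))."""
    return 2.0 * math.pi ** (n / 2.0) * r ** n / (n * math.gamma(n / 2.0))


def rho_simplex(n):
    """rho(K_n) = 2 / (sqrt(n) (1 + sqrt(n))), the inscribed-sphere diameter."""
    return 2.0 / (math.sqrt(n) * (1.0 + math.sqrt(n)))


def aspect_simplex(n):
    """sigma(K_n) = h(K_n) / rho(K_n) with h(K_n) = sqrt(2) (n >= 2 assumed)."""
    return math.sqrt(2.0) / rho_simplex(n)


def aspect_cube(n):
    """sigma(C_n) = h(C_n) / rho(C_n) = sqrt(n) / 1."""
    return math.sqrt(n)


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, volume_simplex(n), volume_ball(n), aspect_simplex(n), aspect_cube(n))
```

For $n = 2$ and $n = 3$ the ball volumes reduce to the familiar $\pi$ and $4\pi/3$.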


Example F.42. Consider $K_2$, the triangle with vertices $(0,0)$, $(0,1)$, $(1,0)$. Then clearly $h(K_2) = \sqrt{1^2+1^2} = \sqrt{2}$. As for $\rho(K_2)$, consider the center $c$ of the inscribed circle and note that the distance from $c$ to each side of the triangle is $r$ (radius of inscribed circle) and is the distance from $c$ to the midpoint of the hypotenuse side. Thus $c = (r, r)$ and $r = \sqrt{2(1/2-r)^2} = \sqrt{2}\,(1/2-r)$ so that $(1+\sqrt{2})\,r = 1/\sqrt{2}$.

[Fig. F.2: the triangle $K_2$ with its inscribed circle of radius $r$, tangent to the hypotenuse at $(1/2, 1/2)$ and centered at $c = \big(\tfrac{1}{\sqrt{2}(1+\sqrt{2})}, \tfrac{1}{\sqrt{2}(1+\sqrt{2})}\big)$.]

A similar argument works for $K_n$, i.e., let $c = (r, r, \dots, r)$ and note that the point on the face generated by $e_1,\dots,e_n$ is $(1/n,\dots,1/n)$ so that $r = \sqrt{n(1/n-r)^2} = \sqrt{n}\,(1/n-r)$. Alternatively, let $O\subset K_n$ be the inscribed sphere. Then $K_n = \mathrm{co}(0, e_1,\dots,e_n) = \mathrm{co}(v_0, v_1,\dots,v_n)$ with $n+1$ faces $F_0 = \mathrm{co}(e_1,\dots,e_n)$, $F_j = \mathrm{co}(v_0, v_1,\dots,v_{j-1}, v_{j+1},\dots,v_n)$. If $c$ is the center of $O$, then $d(c,F_j) = r$, $c = \sum c_je_j$, and $d(c,F_j) = c_j$, $j = 1,\dots,n$. Thus $c_j = r$, $\|c\| = \big(\sum c_j^2\big)^{1/2} = \sqrt{n}\,r$, and $d(c,F_0) = \sqrt{n(1/n-r)^2} = r$.

Example F.43. Consider $c_2$ with vertices $(0,0)$, $(0,1)$, $(1,0)$, $(1,1)$.

[Fig. F.3: the unit square $c_2$ with its inscribed circle of radius $r$ centered at $(1/2, 1/2)$.]

Then $h(c_2) = \sqrt{1^2+1^2} = \sqrt{2}$ and $r = 1/2$. Consider $c_3$ with vertices $(0,0,0)$, $(0,1,0)$, $(0,1,1)$, $(1,0,0)$, $(1,0,1)$, $(1,1,0)$, $(0,0,1)$, $(1,1,1)$.


Then $h(c_3) = \sqrt{1^2+1^2+1^2} = \sqrt{3}$ (similarly for $c_n$, $h(c_n) = \sqrt{n}$). Let $c = (1/2, 1/2, 1/2)$. Then the distance from $c$ to any face of the cube is $1/2$ and $\rho(c_3) = 2\cdot 1/2 = 1$ (similarly for $c_n$).

Now $K_n = \mathrm{co}(0, e_1,\dots,e_n)$. If $x\in\mathbb{R}^n$, then $x = \sum_{i=1}^n\beta_i(x)e_i$. Let $\beta_0(x) = 1 - \sum_{i=1}^n\beta_i(x)$. Then $\sum_{j=0}^n\beta_j(x) = 1$ and $x = \sum_{j=0}^n\beta_j(x)v_j$ where $[v_0,\dots,v_n] = [0, e_1,\dots,e_n]$. Moreover, $\beta_j(v_k) = \delta_{j,k}$ for $j, k = 0, 1,\dots,n$. If $x\in K_n$, then $0\le\beta_j(x)\le 1$ and $\sum_{j=0}^n\beta_j(x) = 1$. The $\beta_j(\cdot)$ are called barycentric coordinates of $x$.

Let $\sigma$ be a permutation of $\{0, 1,\dots,n\}$ (i.e., $\sigma\in S_{n+1}$, the symmetric group on $n+1$ symbols). Let $v_0 = 0$, $v_i = e_i$, $i = 1,\dots,n$. Let $\beta_\sigma^j$ be the barycenter of $\mathrm{co}(v_{\sigma(j)},\dots,v_{\sigma(n)})$ [note that $\mathrm{co}(v_{\sigma(j)},\dots,v_{\sigma(n)})$ is a $K_{n-j}$ as these are $n+1-j$ points]. Let $K_n^\sigma = \mathrm{co}(\beta_\sigma^0,\dots,\beta_\sigma^n)$, which is an $n$-simplex. There are $(n+1)!$ such, and we have (i) $\mathrm{int}\,K_n^\sigma\cap\mathrm{int}\,K_n^{\sigma'} = \emptyset$ if $\sigma\ne\sigma'$, and (ii) $\bigcup_\sigma K_n^\sigma = K_n$. Thus $\{K_n^\sigma : \sigma\in S_{n+1}\}$ is a subdivision, called the barycentric subdivision of $K_n$. The barycenter of $K_n$, the point where medians to the faces meet, is $(1/(n+1),\dots,1/(n+1))$. If $\#q(k,n)$ is the number of $k$-faces of $K_n$, then

$$\#q(k,n) = \binom{n+1}{k+1}$$

(as a $k$-face is the convex hull of $k+1$ points) and (by induction using the binomial theorem)

$$\sum_{k=0}^n\#q(k,n) = 2^{n+1}-1.$$
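The barycentric subdivision described above can be generated mechanically from the permutations of the vertices. The sketch below is a minimal illustration in exact rational arithmetic; the helper names (`barycenter`, `det`, `simplex_volume`, `barycentric_subdivision`) are hypothetical, and the Laplace-expansion determinant is only meant for small $n$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def barycenter(points):
    """Barycenter (arithmetic mean) of a list of points."""
    m = len(points)
    return tuple(sum(p[c] for p in points) / m for c in range(len(points[0])))


def det(m):
    """Determinant by Laplace expansion along the first row (small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def simplex_volume(verts):
    """Volume of the simplex co(verts) in R^n, with n + 1 vertices."""
    base = verts[0]
    rows = [[v[c] - base[c] for c in range(len(base))] for v in verts[1:]]
    return abs(det(rows)) / factorial(len(base))


def barycentric_subdivision(n):
    """Vertices of the (n+1)! simplices K_n^sigma of the unit n-simplex."""
    verts = [tuple(Fraction(int(i == j)) for j in range(1, n + 1))
             for i in range(n + 1)]  # v_0 = 0, v_i = e_i
    pieces = []
    for sigma in permutations(range(n + 1)):
        # beta_sigma^j is the barycenter of v_sigma(j), ..., v_sigma(n).
        pieces.append([barycenter([verts[s] for s in sigma[j:]])
                       for j in range(n + 1)])
    return pieces


if __name__ == "__main__":
    pieces = barycentric_subdivision(2)
    print(len(pieces), sum(simplex_volume(p) for p in pieces))
```

For $n = 2$ this produces six triangles whose areas add up to $\mu(K_2) = 1/2$, consistent with properties (i) and (ii).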

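Example F.44. Consider $K_2 = \mathrm{co}(0, e_1, e_2)$ and the group $S_3$ of order 6. Let $v_0 = (0,0)$, $v_1 = (1,0)$, $v_2 = (0,1)$. Let $\sigma_0 = (0)(1)(2)$ (identity), $\sigma_1 = (01)(2)$ (permutes $v_0$, $v_1$, leaves $v_2$ fixed), $\sigma_2 = (02)(1)$, $\sigma_3 = (0)(12)$, $\sigma_4 = (012)$, $\sigma_5 = (021)$ be the elements of $S_3$. Then there are 6 simplices $K_2^\sigma$.

[Fig. F.4: the barycentric subdivision of $K_2$, with vertices $v_0, v_1, v_2$, the points $A, B, C$, the barycenter $b = (1/3, 1/3)$, and the piece $K_2^{\sigma_0}$ marked.]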

For example, $K_2^{\sigma_0}$ is the triangle $bAv_2$. We leave it to the reader to determine the remaining $K_2^\sigma$.

Let $K_n(\lambda) = \mathrm{co}(0, \lambda e_1,\dots,\lambda e_n) = (\lambda I)K_n$, and $C_n(\lambda) = (\lambda I)C_n$. Then $h(K_n(\lambda)) = \lambda\sqrt{2}$ and $h(C_n(\lambda)) = \lambda\sqrt{n}$. Moreover, $\rho(K_n(\lambda)) = 2\lambda/\sqrt{n}(1+\sqrt{n})$ and $\rho(C_n(\lambda)) = \lambda$. Also $\mu(K_n(\lambda)) = \lambda^n/n!$, $\mu(C_n(\lambda)) = \lambda^n$. Consider $C_n$ and note that the number of $k$-dimensional “faces,” $\#(k,n)$, is given by

$$\#(k,n) = 2^{n-k}\,\frac{n!}{(n-k)!\,k!} = 2^{n-k}\binom{n}{k}$$

and that

$$\sum_{k=0}^n\#(k,n) = 3^n\quad\text{and}\quad\#(n-1,n) = 2n.$$
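Both face-counting identities, $\sum_k\#q(k,n) = 2^{n+1}-1$ for the simplex and $\sum_k\#(k,n) = 3^n$ for the cube, can be confirmed for small $n$ with a few lines. The function names below are illustrative.

```python
from math import comb


def simplex_faces(k: int, n: int) -> int:
    """#q(k, n): number of k-faces of the n-simplex K_n."""
    return comb(n + 1, k + 1)


def cube_faces(k: int, n: int) -> int:
    """#(k, n): number of k-dimensional faces of the n-cube C_n."""
    return 2 ** (n - k) * comb(n, k)


if __name__ == "__main__":
    for n in range(1, 6):
        total_simplex = sum(simplex_faces(k, n) for k in range(n + 1))
        total_cube = sum(cube_faces(k, n) for k in range(n + 1))
        print(n, total_simplex, 2 ** (n + 1) - 1, total_cube, 3 ** n)
```

For the ordinary cube ($n = 3$) this recovers 8 vertices, 12 edges, 6 faces and the cube itself.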

Example F.45. Consider $C_2 = \mathrm{co}(0, e_1, e_2, e_1+e_2)$.

[Fig. F.5: the unit square $C_2$ with vertices $(0,0)$, $(1,0)$, $(0,1)$, $(1,1)$ and center $(1/2, 1/2)$.]

We can subdivide $C_2$ into 4 ($= 2^2$) 2-cubes by connecting the “faces.” We can also subdivide into $\sum_{k=0}^2\#(k,2) = 3^2 = 9$ 2-cubes by considering the points $(0,0)$, $(1/3,0)$, $(2/3,0)$, $(1,0)$, $(0,1/3)$, $(0,2/3)$, $(0,1)$, $(1/3,1/3)$, $(1/3,2/3)$, $(1/3,1)$, $(2/3,1/3)$, $(2/3,2/3)$, $(2/3,1)$, $(1,1/3)$, $(1,2/3)$, $(1,1)$.

[Fig. F.6: the unit square divided into nine squares of side $1/3$ by the grid points listed above.]

This subdivision provides nine 2-cubes of side length $1/3$. This method provides a small cube from each vertex, from each edge (from each $k$-face) and one in the center (the cubes being distinct).

Definition F.46. Let $F_1$, $F_2$ be opposite $n-1$ cells of $C_n$; then there is an $n-1$ cell through the center $(1/2,\dots,1/2)$ which “bisects” the faces. There are $n$ such dividing $n-1$ cubes, each of edge $1/2$. Thus $C_n$ is subdivided into $2^n$ copies of $C_n(1/2)$. This is called the bisecting cellular subdivision of $C_n$.


Definition F.47. Consider $C_n(1/3)$ and divide the edges of $C_n$ into thirds. Then $C_n$ is the union of $3^n$ copies of $C_n(1/3)$ with a unique small cube from each $k$-face and one in the center. This is called the cubic cellular subdivision of $C_n$.

Observe that $h(C_n(\lambda)) = \lambda\sqrt{n}$ and that $\rho(C_n(\lambda)) = \lambda$. Thus if $C_n$ has a bisecting cellular division, then $h(C_n(1/2))/\rho(C_n(1/2)) = (\sqrt{n}/2)/(1/2) = \sqrt{n}$ and so for fixed $n$, the process of subdivision is uniform. Similarly, for the cubic cellular subdivision, we have $h(C_n(1/3))/\rho(C_n(1/3)) = \sqrt{n}$ and the process is uniform. Put another way, let $SD(\ )$ represent a subdivision operator and $SD^\nu(K) = SD(SD^{\nu-1}(K))$, $\nu$ an integer $\ge 1$, represent repeated subdivisions. Then $h(SD^\nu(C_n))/\rho(SD^\nu(C_n)) = \sqrt{n}$ for all $\nu$ (in still other words, the aspect ratio does not change).

Lemma F.48. Let $K$ be an $n$-simplex, $h = h(K)$, and $SD(K)$ be a subdivision of $K$. Then

$$h(SD(K)) \le \frac{n}{n+1}\,h$$

and, consequently, $\lim_{\nu\to\infty}h(SD^\nu(K)) = 0$.

Proof. Let $F_1$, $F_2$ be two disjoint $n-1$ faces of $SD(K)$ with barycenters $C_1$, $C_2$, respectively. The line joining $C_1$, $C_2$ has length $\le h$ and splits in the ratio $(1+\dim F_1)/(1+\dim F_2)$ so that the fraction lies between $1/(n+1)$ and $n/(n+1)$. Hence the edge length is $\le (n/(n+1))\,h$. $\square$

Example F.49. Consider the unit simplex $K_2$ and the barycentric subdivision $\mathrm{bsd}(K_2) = \tilde K$ into 6 triangles. For $K_2$,

$$\sigma(K_2) = \frac{h(K_2)}{\rho(K_2)} = \frac{\sqrt{2}}{2}\,\sqrt{2}\,(\sqrt{2}+1) = \sqrt{2}+1.$$

Consider the triangle with vertices $(1/3,1/3)$, $(1/2,1/2)$, $(0,1)$ [i.e., $K_2^{\sigma_0}$]. Then $h(K_2^{\sigma_0}) = \sqrt{5}/3$ and, by elementary analytic geometry, $\rho(K_2^{\sigma_0}) = 4\Delta(K_2^{\sigma_0})/s(K_2^{\sigma_0})$ where $\Delta(K_2^{\sigma_0}) = 1/12$ is the area and $s(K_2^{\sigma_0}) = (\sqrt{5}+2\sqrt{2})/3$ is the length of the perimeter. Hence $\rho(K_2^{\sigma_0}) = 1/(\sqrt{5}+2\sqrt{2})$ and $\sigma(K_2^{\sigma_0}) = \sqrt{5}\,(\sqrt{5}+2\sqrt{2})/3 \ge \sqrt{2}+1$. In other words, the aspect ratio increases. In fact [D-5, D-6], it can be shown that repeated barycentric subdivision is not regular (uniform). What is a solution?
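The aspect ratios in Example F.49 can be reproduced with a short computation, using $h$ as the longest edge and $\rho = 4\Delta/s$ as the diameter of the inscribed circle. The function name `aspect_ratio` is illustrative; the third triangle in the usage lines is a corner piece obtained by joining edge midpoints, included only for comparison.

```python
import math


def aspect_ratio(p, q, r):
    """sigma = h / rho for the triangle pqr.

    h is the longest edge and rho = 4 * area / perimeter is the diameter
    of the inscribed circle.
    """
    edges = [math.dist(p, q), math.dist(q, r), math.dist(r, p)]
    area = abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0
    return max(edges) * sum(edges) / (4.0 * area)


if __name__ == "__main__":
    unit = aspect_ratio((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
    piece = aspect_ratio((1.0 / 3.0, 1.0 / 3.0), (0.5, 0.5), (0.0, 1.0))
    corner = aspect_ratio((0.0, 0.0), (0.5, 0.0), (0.0, 0.5))
    print(unit, piece, corner)
```

The barycentric piece is noticeably thinner than $K_2$, while the midpoint piece keeps the aspect ratio $\sqrt{2}+1$.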

Example F.50. Consider again the unit simplex $K_2$. Consider a subdivision $K_2^1$, $K_2^2$, $K_2^3$, $K_2^4$ based on connecting the midpoints (or barycenters) of the edges. Note that the $K_2^j$ are all congruent with (depending on an ordering of the vertices) equivalence under $T(x) = Ax + b$ with $\det A = 1$.

[Fig. F.7: the unit simplex $K_2$ divided into the four congruent triangles $K_2^1$, $K_2^2$, $K_2^3$, $K_2^4$ by joining the edge midpoints $(1/2, 0)$, $(0, 1/2)$, $(1/2, 1/2)$.]

Note the following (for $j = 1, 2, 3, 4$):

$$h(K_2^j) = \frac{1}{\sqrt{2}},\qquad h(K_2) = \sqrt{2},\qquad h(K_2^j) = \frac{1}{2}\,h(K_2),$$
$$\Delta(K_2^j) = \frac{1}{8},\qquad \Delta(K_2) = \frac{1}{2},\qquad \Delta(K_2^j) = \frac{1}{2^2}\,\Delta(K_2),$$
$$s(K_2^j) = 1 + \frac{1}{\sqrt{2}},\qquad s(K_2) = 2+\sqrt{2},\qquad s(K_2^j) = \frac{1}{2}\,s(K_2),$$
$$\rho(K_2^j) = \frac{4\Delta(K_2^j)}{s(K_2^j)} = \frac{1}{2+\sqrt{2}},\qquad \rho(K_2) = \frac{4\Delta(K_2)}{s(K_2)} = \frac{2}{2+\sqrt{2}},\qquad \rho(K_2^j) = \frac{1}{2}\,\rho(K_2),$$
$$\sigma(K_2^j) = \frac{h(K_2^j)}{\rho(K_2^j)} = \frac{\tfrac12 h(K_2)}{\tfrac12\rho(K_2)} = \sigma(K_2).$$

Thus the aspect ratio is unchanged. Continuing the subdivision will be regular. However [E-2], this does not quite work in dimensions $> 2$.

Example F.51. Consider $K_2$ and let $k = 3$.

[Fig. F.8: the unit simplex $K_2$ divided into nine congruent triangles by the grid points $(0,0)$, $(1/3,0)$, $(2/3,0)$, $(1,0)$, $(0,1/3)$, $(1/3,1/3)$, $(2/3,1/3)$, $(0,2/3)$, $(1/3,2/3)$, $(0,1)$. Let $K_2^j$, $j = 1,\dots,9$, be the triangles shown.]

Then $h(K_2^j) = \sqrt{2}/3$, $h(K_2) = \sqrt{2}$, $\Delta(K_2^j) = 1/(9\cdot 2)$, $\Delta(K_2) = 1/2$, $s(K_2^j) = (2+\sqrt{2})/3$, $s(K_2) = 2+\sqrt{2}$, $\rho(K_2^j) = 2/3(2+\sqrt{2})$, $\rho(K_2) = 2/(2+\sqrt{2})$, and

$$\sigma(K_2^j) = \frac{\tfrac13 h(K_2)}{\tfrac13\rho(K_2)} = \sigma(K_2).$$

This approach readily extends. For let $k$ be an integer and consider the points

$$(0,0),\ (0,1/k),\ \dots,\ (0,(k-1)/k),\ (0,1),\ (1/k,0),\ (2/k,0),\ \dots,\ ((k-1)/k,0),\ (1,0),\ (1/k,(k-1)/k),\ \dots,\ ((k-1)/k,1/k),\ \text{etc.},$$

creating $K_2^j$ for $j = 1,\dots,k^2$. This gives a subdivision with congruent simplices such that $h(K_2^j) = \sqrt{2}/k$, $\Delta(K_2^j) = 1/2k^2$, $s(K_2^j) = (2+\sqrt{2})/k$ and $\sigma(K_2^j) = \sigma(K_2)$.

This generalizes to $K_n$. Let $K_n = \mathrm{co}(0, e_1,\dots,e_n)$ be the unit $n$-simplex. Let $v_0 = 0$, $v_i = e_i$, $i = 1,\dots,n$. Let $\alpha = (\alpha_1,\dots,\alpha_k)$, $\alpha_i\in\{0, 1,\dots,n\}$ and $\alpha_1\le\alpha_2\le\cdots\le\alpha_k$ where $k\ge 1$. Given $\alpha = (\alpha_1,\dots,\alpha_k)$, let

$$u_\alpha = u_{(\alpha_1,\dots,\alpha_k)} = \frac{\sum_{j=1}^k v_{\alpha_j}}{k}.$$

Suppose that $\alpha^j = (\alpha_1^j,\dots,\alpha_k^j)$ where $\alpha_i^j\in\{0, 1,\dots,n\}$. Call $\{\alpha^j\}$ admissible if (i) $\alpha_1^j\le\alpha_2^j\le\cdots\le\alpha_k^j$; and (ii) $\alpha_r^j = \alpha_r^{j-1} + \theta_r^{j-1}$, $r = 1,\dots,k$, $j = 1,\dots,n$, where

$$\theta_r^{j-1} = 1 \text{ or } 0\quad\text{and}\quad\sum_{j=1}^n\sum_{r=1}^k\theta_r^{j-1} = n$$

(in effect, add 1 appropriately and maintain the ordering).

Example F.52. Let $n = 2$, $k = 2$. There are 6 possible ordered pairs, $(0\ 0)$, $(0\ 1)$, $(0\ 2)$, $(1\ 1)$, $(1\ 2)$, $(2\ 2)$, and 6 potential combinations of 3 pairs, namely:

(1) $(0\ 0)$, $(0\ 1)$, $(0\ 2)$,
(2) $(0\ 1)$, $(0\ 2)$, $(1\ 2)$,
(3) $(0\ 1)$, $(1\ 1)$, $(1\ 2)$,
(4) $(0\ 2)$, $(1\ 2)$, $(2\ 2)$,
(5) $(1\ 1)$, $(1\ 2)$, $(2\ 2)$,
(6) $(0\ 1)$, $(0\ 2)$, $(2\ 2)$.


Combinations (1), (2), (3), (4) satisfy the admissibility conditions while (5) and (6) do not. For instance, for (3),

$$\alpha_1^1 = \alpha_1^0 + 1,\quad \alpha_2^1 = \alpha_2^0 + 0,\quad \alpha_1^2 = \alpha_1^1 + 0,\quad \alpha_2^2 = \alpha_2^1 + 1,\quad \theta_1^0 + \theta_2^1 = 2,$$

and, for (6),

$$\alpha_1^1 = \alpha_1^0 + 0,\quad \alpha_2^1 = \alpha_2^0 + 1,\quad \alpha_1^2 = \alpha_1^1 + 2,\quad \alpha_2^2 = \alpha_2^1 + 0,$$

and, for (5), the points $u_{\alpha^0}$, $u_{\alpha^1}$, $u_{\alpha^2}$ are co-linear.

If $\alpha^0,\dots,\alpha^n$ is an admissible set, and $u_{\alpha^0},\dots,u_{\alpha^n}$ are affinely independent, then $\mathrm{co}(u_{\alpha^0},\dots,u_{\alpha^n})$ is an $n$-simplex contained in $K_n$. Moreover, the map $\phi : \{1,\dots,n\}\to\{1,\dots,k\}$, where $K_u = \mathrm{co}(u_0,\dots,u_n)$ is an $n$-simplex contained in $K_n$ determined by an admissible set, given by $\phi(j) = \lambda_j$ such that $\alpha^i_{\lambda_j} = \alpha^{i-1}_{\lambda_j} + 1$, is bijective. Hence, the number of $n$-simplices is $(k)^n$ and the volume of any $K_u$ is $v(K_u) = v(K_n)/k^n$. The subdivision generated in this way is called the edgewise subdivision of order $k$ [E-2, F-2]. It can be shown that there are $n!/2$ congruence classes in the subdivision and that the edgewise subdivision is regular.

Example F.53. Let $n = 3$, $k = 2$ so that we are considering the unit tetrahedron. An $\alpha$ consists of 4 affinely independent points and there are $8 = (2)^3$ $(= (k)^n)$ elements of the subdivision. Let $K = \mathrm{co}(0, e_1, e_2, e_3)$ and let $v_0 = 0$, $v_i = e_i$, $i = 1, 2, 3$. Let $[i,j] = (v_i + v_j)/2$ for $i, j = 0, 1, 2, 3$. Then the subdivision simplices are given by:

(1) $\mathrm{co}([0,0], [0,1], [0,2], [0,3])$,
(2) $\mathrm{co}([0,1], [0,2], [0,3], [1,3])$,
(3) $\mathrm{co}([0,1], [0,2], [1,2], [1,3])$,
(4) $\mathrm{co}([0,1], [1,1], [1,2], [1,3])$,
(5) $\mathrm{co}([0,2], [0,3], [1,3], [2,3])$,
(6) $\mathrm{co}([0,2], [1,2], [1,3], [2,3])$,
(7) $\mathrm{co}([0,2], [1,2], [2,2], [2,3])$,
(8) $\mathrm{co}([0,3], [1,3], [2,3], [3,3])$.

There are $n!/2 = 3$ congruence classes.

Definition F.54. Let $D\subset\mathbb{R}^n$ be a regular domain and $\Pi^\nu = \{K_i, P_i, P_i^*\}$ be a simplicial partition (the $K_i$ affine equivalent $n$-simplices). Let $k\ge 1$ be a positive integer. Let $SD(K_i, k) = \{K_i^\alpha : \alpha\in(k)^n\}$ be the edgewise subdivision of $K_i$, $P_i^\alpha = P|_{K_i^\alpha}$ (restriction), and $(P_i^\alpha)^* = P_i^*|_{P_i^\alpha}$. Then $\Pi_k^\nu = \{K_i^\alpha, P_i^\alpha, (P_i^\alpha)^*\}$ is the edgewise refinement of $\Pi^\nu$. The edgewise refinement is regular and the error estimates for the interpolant decrease with the diameter.
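Examples F.52 and F.53 can be reproduced by brute force for small $n$ and $k$. The sketch below rests on one reading of the admissibility conditions, stated here as an assumption: each step from $\alpha^{j-1}$ to $\alpha^j$ adds 1 to exactly one entry while keeping the entries non-decreasing and at most $n$, and chains whose points $u_{\alpha^0},\dots,u_{\alpha^n}$ are affinely dependent are discarded. All names are hypothetical and exact rational arithmetic is used throughout.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import factorial


def det(m):
    """Determinant by Laplace expansion along the first row (small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def edgewise_subdivision(n, k):
    """Return (chain, volume) pairs for the non-degenerate admissible chains."""
    verts = [tuple(Fraction(int(i == j)) for j in range(1, n + 1))
             for i in range(n + 1)]  # v_0 = 0, v_i = e_i

    def u(alpha):
        # u_alpha = (v_{alpha_1} + ... + v_{alpha_k}) / k
        return tuple(sum(verts[a][c] for a in alpha) / k for c in range(n))

    def extend(chain):
        if len(chain) == n + 1:
            yield chain
            return
        last = chain[-1]
        for r in range(k):
            nxt = last[:r] + (last[r] + 1,) + last[r + 1:]
            ordered = all(nxt[i] <= nxt[i + 1] for i in range(k - 1))
            if nxt[r] <= n and ordered:
                yield from extend(chain + [nxt])

    pieces = []
    for start in combinations_with_replacement(range(n + 1), k):
        for chain in extend([start]):
            pts = [u(alpha) for alpha in chain]
            rows = [[p[c] - pts[0][c] for c in range(n)] for p in pts[1:]]
            volume = abs(det(rows)) / factorial(n)
            if volume != 0:  # keep only affinely independent chains
                pieces.append((chain, volume))
    return pieces


if __name__ == "__main__":
    for chain, volume in edgewise_subdivision(2, 2):
        print(chain, volume)
```

For $n = 2$, $k = 2$ this lists the four combinations (1) to (4) of Example F.52, each of area $1/8 = v(K_2)/k^2$.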


Note if $D$ has a “curved” boundary, then the partition must be modified so that the $K_i$ which intersect the boundary are (appropriately) “smoothly” equivalent to a reference element.

Definition F.55. A domain $D$ such that $\partial D$ is the union of $n-1$ simplices is a polyhedron.

If $D\subset\mathbb{R}^n$ is a polyhedron, then there is a sequence $\Pi^\nu$ of (affine equivalent) simplicial finite elements with diameters $h_\nu = h(\Pi^\nu)$ for $D$ and $\lim_{\nu\to\infty}h_\nu = 0$. If $n/2 < s\le r+1$, and if $\psi\in H^s(D)$, then

$$\|\psi-\psi_\nu\|_1 \le \alpha\,h_\nu^{s-1}\,|\psi|_s$$

for some constant $\alpha$ and $\psi_\nu\in S(\Pi^\nu)$ [B-5, E-3]. In other words, we can, by suitable refinement, converge to a solution.

References

[A-1] Adams, R.A., Sobolev Spaces, Academic Press, 1975.
[A-2] Agmon, S., Lectures on Elliptic Boundary Value Problems, Van Nostrand, 1965.
[A-3] Athans, M. and Falb, P.L., Optimal Control: An Introduction to the Theory and its Applications, Dover, 2007.
[B-1] Bachman, G. and Narici, L., Functional Analysis, Dover, 2000.
[B-2] Bellman, R., Dynamic Programming, Princeton University Press, 1957.
[B-3] Bensoussan, A., DaPrato, G., Delfour, M.C., and Mitter, S.K., Representation and Control of Infinite Dimensional Systems, Birkhäuser, 2nd ed., 2007.
[B-4] Bottasso, C.L. and Ragazzi, A., “Finite Element and Runge–Kutta Methods for Boundary-Value and Optimal Control Problems,” AIAA J., 2000.
[B-5] Brenner, S.C. and Scott, L.R., The Mathematical Theory of Finite Element Methods, Springer-Verlag, 2002.
[B-6] Budak, B.M., Berkovich, E.M., and Solov’eva, E.N., “Difference Approximations in Optimal Control Problems,” SIAM J. Control, Vol. 7, No. 1, 1969.
[C-1] Canon, M., Cullum, J., and Polak, E., “Constrained Minimization Problems in Finite Dimensional Systems,” SIAM J. Control, Vol. 4, 1967.
[C-2] Caratheodory, C., Calculus of Variations and Partial Differential Equations of the First Order, Amer. Math. Soc., Chelsea Publishers, 2008.
[C-3] Cesari, L., “An Existence Theorem in Problems of Optimal Control,” SIAM J. Control, Vol. 3, No. 1, 1965.
[C-4] Clarkson, J.A., “Uniformly Convex Spaces,” Trans. Amer. Math. Soc., Vol. 40, 1936.

© Springer Science+Business Media, LLC, part of Springer Nature 2019 P. Falb, Direct Methods in Control Problems, https://doi.org/10.1007/978-0-8176-4723-0


[C-5] Clerc, J.L. and de Verdière, C., “Compacité dans les espaces localement convexes application aux espaces du type C(K) et $L^1(\mu)$,” Séminaire Choquet (Secrétariat mathématique, Paris), 1967–68.
[C-6] Coddington, E.A. and Levinson, N., Theory of Ordinary Differential Equations, McGraw-Hill, 1955.
[C-7] Courant, R., “Calculus of Variations,” NYU Notes, Revised by J. Moser, 1962.
[C-8] Cullum, J., “Discrete Approximation to Continuous Optimal Control Problems,” SIAM J. Control, Vol. 7, No. 1, 1969.
[C-9] Curtain, R.F. and Zwart, H.J., An Introduction to Infinite-Dimensional Linear Systems Theory, Springer-Verlag, 1995.
[D-1] Dacorogna, B., Direct Methods in the Calculus of Variations, 2nd ed., Springer, 2008.
[D-2] Daniels, J.W., “On the Convergence of a Numerical Method for Optimal Control Problems,” J. Opt. Theory and Applications, Vol. 4, No. 3, 1969.
[D-3] Davies, E.B., Spectral Theory and Differential Operators, (paperback), Cambridge Univ. Press, 1996.
[D-4] Davies, E.B., “The Functional Calculus,” J. London Math. Soc., Vol. 52, 1995.
[D-5] Diaconis, P. and McMullen, C.T., “On Barycentric Subdivision,” Combinatorics, Probability and Computing, Vol. 20, 2011.
[D-6] Diaconis, P. and Laurent, M., “On barycentric subdivision, with simulations,” arXiv:1007.3385, 2010.
[D-7] Dieudonné, J., Foundations of Modern Analysis, Academic Press, 1960.
[D-8] Dlotko, T., “Sobolev Spaces and Embedding Theorems,” Notes, Silesian Univ., Poland.
[D-9] Dunford, N. and Schwartz, J.T., Linear Operators Part I: General Theory, Interscience, 1967.
[E-1] Edelsbrunner, H. and Grayson, D., Proc. 15th Annual Symposium on Computational Geometry, 1999.
[E-2] Ekeland, I. and Turnbull, T., Infinite Dimensional Optimization and Convexity, University of Chicago Press, 1983.
[E-3] Ern, A. and Guermond, J.-L., Theory and Practice of Finite Elements, Springer-Verlag, 2004.
[F-1] Fleming, W.H. and Rishel, R.W., Deterministic and Stochastic Optimal Control, Springer-Verlag, 1975.
[F-2] Freudenthal, H., “Simplizialzerlegung von beschränkter Flachheit,” Ann. Math., Vol. 43, 1942.
[G-1] Gelfand, I.M. and Fomin, S.V., Calculus of Variations, Prentice Hall, 1963.
[H-1] Hager, W.W., “Runge–Kutta Discretizations of Optimal Control Problems,” in Systems Theory Modeling, Analysis and Control, Chapter 17, Kluwer, 2000.


[H-2] Halkin, H., “Method of Convex Ascent,” in Computing Methods in Optimization Problems, Academic Press, 1964.
[H-3] Halmos, P., Measure Theory, Van Nostrand, 1961.
[H-4] Hartman, P., “Difference Equations: Disconjugacy, Principal Solutions, Green’s Functions, Complete Monotonicity,” Trans. Amer. Math. Soc., Vol. 246, 1978.
[H-5] Hille, E. and Phillips, R.S., “Functional Analysis and Semi-Groups,” Amer. Math. Soc., 1968, Revised Edition.
[H-6] Hodges, D.H. and Bless, R., “Weak Hamiltonian Finite Element Method for Optimal Control Problems,” J. Guidance, 1990.
[H-7] Hughes, T.J.R., The Finite Element Method, Dover, 2000.
[K-1] Kalman, R.E., Falb, P.L., and Arbib, M.A., Topics in Mathematical Systems Theory, McGraw-Hill, 1969.
[K-2] Kantorovich, L.V. and Akilov, G.P., Functional Analysis in Normed Spaces, Macmillan (Pergamon Press), 1964.
[K-3] Kushner, H., Introduction to Stochastic Control, Rinehart & Winston, 1971.
[L-1] Lax, P.D. and Milgram, A.N., “Parabolic Equations,” Ann. Math. Studies, no. 33, 1954.
[L-2] Lebesgue, H., Leçons sur l’intégration et la recherche des fonctions primitives, Gauthier–Villars, Paris, Deuxième édition, 1928.
[L-3] Lions, J.L., Optimal Control of Systems Governed by Partial Differential Equations, Springer-Verlag, 1971.
[L-4] Lions, J.L. and Magenes, E., Non-homogeneous Boundary Value Problems and Applications, Springer, 1972.
[L-5] Loomis, L.H. and Sternberg, S., Advanced Calculus, Addison-Wesley, 1968.
[L-6] Luenberger, D.G., Optimization by Vector Space Methods, Wiley & Sons, 1969.
[M-1] Meyers, N.G. and Serrin, J., “H = W,” Proc. Nat. Acad. Sci. USA, Vol. 51, 1964.
[M-2] Moser, J., “A rapidly convergent iteration method and non-linear partial differential equations,” Annali della Scuola Normale, Vol. 20, No. 2, 1966.
[N-1] Neuberger, J.W., Sobolev Gradients and Differential Equations, Springer Lecture Notes, 1997.
[O-1] Ortega, J.M. and Rheinboldt, W.C., Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, 1970.
[P-1] Parthasarathy, K.R., Probability Measures on Metric Spaces, Amer. Math. Soc., Chelsea Publ., 1967.
[P-2] Phillips, R.S., “On weakly compact subsets of a Banach space,” Amer. J. Math., Vol. 65, 1943.
[R-1] Riesz, F. and Sz.-Nagy, B., Functional Analysis, Ungar Publ. Co., 1955.


[R-2] Rockafellar, R.T., “Duality Theorems for Convex Functions,” Bull. Amer. Math. Soc., Vol. 7, No. 1, 1964.
[R-3] Rockafellar, R.T., Convex Analysis, Princeton Univ. Press, 1970.
[R-4] Rosenbloom, P., “The method of steepest descent,” Proc. Symp. App. Math. VI, Amer. Math. Soc., 1961.
[R-5] Roxin, E., “The Existence of Optimal Controls,” Mich. Math. J., Vol. 9, No. 1, 1967.
[R-6] Rudin, W., Principles of Mathematical Analysis, McGraw-Hill, 1953.
[R-7] Rudin, W., Principles of Mathematical Analysis, McGraw-Hill, 3rd Edition, 1976.
[S-1] Sagan, H., Introduction to the Calculus of Variations, Dover, 1992.
[S-2] Schwartz, L., Mathematics for the Physical Sciences, Hermann, Addison-Wesley, 1966.
[S-3] Sobolev, S.L., Applications of Functional Analysis in Mathematical Physics, Amer. Math. Soc., 1963.
[S-4] Spivak, M., Calculus on Manifolds, W.A. Benjamin, 1965.
[S-5] Stakgold, I., Boundary Value Problems of Mathematical Physics, Vol. I, Macmillan, 1967.
[S-6] Stakgold, I., Boundary Value Problems of Mathematical Physics, Vol. II, Macmillan, 1968.
[S-7] Stein, E., Singular Integrals and Differentiability Properties of Functions, Princeton Univ. Press, 1970.
[T-1] Taylor, A.E., “The Resolvent of a Closed Transformation,” Bull. Amer. Math. Soc., Vol. 44, 1938.
[T-2] Taylor, A.E., “Spectral Theory of Closed Dissipative Operators,” Acta Math., Vol. 84, 1951.
[T-3] Taylor, S., “An Introduction to Sobolev Spaces,” Notes, Montana State Univ.
[W-1] Warga, J., “Necessary conditions for minimum in relaxed variational problems,” J. Math. Anal. Appl., Vol. 4, 1962.
[W-2] Warga, J., “Relaxed Variational Problems,” J. Math. Anal. Appl., Vol. 4, 1962.
[Y-1] Young, L.C., “Lectures on the Calculus of Variations and Optimal Control Theory,” Amer. Math. Soc., Chelsea Publ., 2nd Edition, 1980.
[Z-1] Zahran, M.M., “Steepest descent on a uniformly convex space,” Rocky Mtn. J. Math., Vol. 33, No. 4, 2003.

Index

$H^m_p(D, X)$, 279
$H^s(R^n, H)$, 284
$S = \sqrt{T}$, 239
$W^{m,p}(D, X)$, 279
$W^{m,p}(\partial D, X)$, 280
$\mathcal{D}(D)$, 276
$H^s(R^n)$, 274
$S = S(R^n, H)$, 282
$C^{m,p}(D, X)$, 279
$\tilde{\mathcal{D}}(D)$, 276
(ε, δ)-approximation, 165
f(A), 250
f(L), 252
α-th generalized derivative, 277
absolutely continuous, 60
adjoint of T, 219
adjoint system, 117
admissible controls, 149
admissible for g, 37
admissible variations, 149
affine equivalent, 291
algebraic adjoint, 219
algebraic basis, 36
algebraic supplement, 36
almost analytic extensions, 261
analytic functions of L, 252
annihilator of M, 221
approximate normal to W, 84
approximate solution, 169
approximate spectrum of L, 252
approximation, 163
    (ε, δ), 165
    Euler, 176, 177
    Ritz–Galerkin, 171
    strong, 164
    weak, 163, 175, 180

approximation methods, 2
approximation of W, 85
approximation to W, 84
approximation to W at $z_0$ relative to J, G, 85
Ascoli lemma, 61
auxiliary data, 12
Banach algebra, 241
    with identity, 241
Banach control system, 58
basic finite element, 287
bisecting cellular subdivision, 301
Boolean algebra, 236
bounded, 127, 219
bounded Lipschitz metric, 131
bounded variation, 18, 60
Bramble–Hilbert theorem, 293
$C^k$-domain, 271
canonical equations, 111
Carathéodory conditions, 133, 139
    standard, 143
Carathéodory existence conditions, 130
Cayley transform of L, 258
cellular partition, 298
characteristic system, 116
classical control problem, 95, 98
closed graph theorem, 218
closed operator, 218
closure of T, 219
codimension, 225
coercive, 270
commuting projections, 236
comoving hyperplanes, 117
compact, 225
compact support, 17


compatible with the discretization, 166
complete in U, 30
completely continuous, 225
conic domain, 271
conjugate function, 266
conjugate set, 265
continuous at $x_0$, 219
continuous spectrum, 247
control system
    Banach, 58
    dynamical, 57
    geometric, 58
convergence of a sequence, 282
converges to σ( · ) weakly, 144
convex, 263
convex on K, 264
convolution, 283
critical point, 33
critical points, 41
cubic cellular subdivision, 302
curve of steepest descent, 26
δ-partition, 165
derivative
    weak, 17
differentiable, 23
    at $x_0$, 11
    on U, 11
    twice, 23
    weakly, 17
differential system, 59
    extended, 62
differential tangent to W, 84
Dirichlet condition, 47
discrete time system, 58
discretization, 166
domain, 217, 271
dynamical control system, 57
ε-approximate solution, 73
edgewise refinement of $\Pi^\nu$, 305
edgewise subdivision of order k, 305
eigenvalue, 247
elementary Hilbert space integration method, 26
elementary Hilbert space representation method, 30
elementary separable space representation method, 37
equivalence in $W_g$, 142
equivalence relation on W, 140
Euler approximation, 176, 177
existence theorem, 61
    local, 64

extendable, 275
extended differential system, 62
extension, 217
extension operator, 271
extremum, 33
finite element method, 181, 186, 192, 202
finite elements, 54
finite rank, 232
first-order discretization, 177
Fourier inversion, 283
Fourier transform, 273, 282
    inverse, 283
Fréchet differentiable, 79
Fréchet differential, 79
Fredholm equation, 231
Fredholm operator, 225
    standard, 225
Gateaux differential, 79, 80
general finite element, 288
generalized controls, 70, 144
generalized derivative, 272
generalized Lax–Milgram theorem, 235
generalized Lipschitz, 132, 135
generalized uniformly continuous, 75
geometric control system, 58
global (P)-minimum, 263
gradient, 23
gradient coercive, 34
gradient integration method, 42
gradient method, 29
gradient property, 31, 32
    Sobolev, 31
graph of T, 218
Green’s function, 51
Green’s theorem, 189
Gronwall lemmas, 64–67
Hamilton–Jacobi–Bellman equation, 109, 110, 112
Hamiltonian, 99, 103, 105, 115
Hausdorff distance, 121, 147
Hessian, 23
Hilbert–Schmidt norm, 242
Hilbert–Schmidt operator, 53, 242
identity, 241
imbedding operator, 271
implicit function theorem, 20, 25
index of T, 225
integration methods, 2
interpolant, 188, 288, 289
    local, 192

inverse Fourier transform, 283
inverse function theorem, 19
isometry, 256
Jensen’s inequality, 265
(k, m) shape regular partition, 296
K-factorable, 167
kernel, 217
L-complete, 49
Lagrange multiplier, 8
Lagrange multipliers, 24, 38
Lax–Milgram theorem, 196
    generalized, 235
Legendre transformation, 108
Legendre’s necessary condition, 106
limiting normal to W, 84
linear (perturbation) equation, 98
linear fractional transformation, 257
linear operator, 217
linear system, 180
Lipschitz domain, 271
local (P)-minimum, 263
local (relative) minimum, 82
local existence theorem, 64
local interpolant, 192
local interpolating function, 288, 289
locally $L^p$ functions, 17
locally integrable, 271
m-differential tangent to W, 84
maximal solution, 69
Maxwell’s equations, 205
mean square approximation, 41
method
    elementary Hilbert space integration, 26
    elementary Hilbert space representation, 30
    elementary separable space representation, 37
    gradient, 29
    gradient integration, 42
    of least squares, 48
    Ritz, 29
    Ritz–Galerkin, 29, 44
minimizing element, 5, 6
minimizing family, 1, 6
minimizing sequence, 1
Minkowski’s inequality, 212
mollifier, 278
Moser’s theorem, 208
multi-index, 17

n-cell, 298
n-simplex, 298
n-sphere, 298
nodal functions, 287, 288
normal, 245
O-admissible, 111
O-optimal, 111
O-regular, 112
open mapping theorem, 218
operational calculus, 251, 252
orthogonal complement, 237
$P_n$ strongly approximates P, 165
$P_n$ weakly approximates P, 165
P-convex, 263
parabolic control problem, 210
partial order on projections, 236
partition of D, 288
partition of E, 125
    rational, 125
partition of unity, 279
penalty function, 3
perturbation, 83
Π-interpolant, 289
Π-interpolating function, 289
point-spectrum, 247
positive, 238
positive cone, 263
positive definite, 238
principle of optimality, 110
projection, 235
projection-valued measure, 241
Prokhorov metric, 75, 127, 131
prototype problem, 207
prototype theorem, 207
quadratic functional, 234
rational partition of E, 125
refinement, 154
regular, 275
regular for g, 37
regular point, 247
regularizer, 278
regulated, 18, 60, 125, 127
relaxed controls, 70
residual spectrum, 247
resolution of the identity relative to L, 253
resolvent set, 247
Ritz method, 29
Ritz–Galerkin approximation, 171
Ritz–Galerkin method, 29, 44, 186, 192, 198

Schur complement, 109
second (Fréchet) differential, 81
second variation, 103
second-order Runge–Kutta, 178
separation of convex sets, 267
set of attainable points, 118
shape functions, 287, 288
shape parameters, 294
shape parameters of D relative to Π, 294
shape regular, 296
simplicial, 291
simplicial partition, 298
smooth at $x_0$, 37
smooth variety in H, 24
smoothly equivalent, 291
Sobolev gradient property, 31
Sobolev space, 272
space
    uniformly convex, 31
space of distributions, 271
spatial variation, 118
spectral decomposition, 246
spectral family, 256
spectral map, 244
spectral measure, 259
spectral resolution of the identity relative to L, 253
spectral set, 251
spectrum of L, 244, 247
standard Carathéodory conditions, 143
standard Fredholm operator, 225
step-function, 60, 125, 127
strong approximation, 164
strong duality, 270
strongly convex, 154
support of f, 17
tangent
    at $x_0$, 11
    to f at $x_0$, 11
tangent set to W, 84
tangent space, 24
tangent to W, 84

Taylor formula, 100
temporal variation, 118
theorem
    implicit function, 20, 25
    inverse function, 19
tight, 144
topological direct sum, 36
total, 36
transfers, 57
triangulation, 291
twice differentiable, 23
twisted approximation of W, 85
twisting, 85
uniformly continuous
    generalized, 75
uniformly convex, 32, 211, 216
uniformly convex space, 31
unit n-cell, 298
unit n-simplex, 298
unit n-sphere, 298
unit simplex, 291
variation, 83
    spatial, 118
    temporal, 118
variation of f, 60
vector basis, 36
weak approximation, 163, 175, 180
weak derivative, 17, 276
weak differential, 80
weak second differential, 81
weakly, 141, 142
weakly differentiable, 17
well-posed, 225
    Hadamard, 235
X-annihilator of N, 221
Young extension, 168, 170