Relaxation in Optimization Theory and Variational Calculus 9783110811919




de Gruyter Series in Nonlinear Analysis and Applications 4

Editors A. Bensoussan (Paris) R. Conti (Florence) A. Friedman (Minneapolis) K.-H. Hoffmann (Munich) L. Nirenberg (New York) Managing Editors J. Appell (Würzburg) V. Lakshmikantham (Melbourne, USA)

Tomáš Roubíček

Relaxation in Optimization Theory and Variational Calculus

Walter de Gruyter · Berlin · New York 1997

Author
Tomáš Roubíček
Mathematical Institute, Charles University, 18600 Praha 8
and
Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic, 18208 Praha 8, Czech Republic

1991 Mathematics Subject Classification: Primary: 49-02; 49J, 49K, 90D. Secondary: 54D35, 93C, 34H05, 73C50, 65K10.

Keywords: Optimization theory, optimal control, ordinary and partial differential equations, integral equations, variational calculus, game theory

© Printed on acid-free paper which falls within the guidelines of the ANSI to ensure permanence and durability.

Library of Congress Cataloging-in-Publication Data

Roubicek, Tomas, 1956- Relaxation in optimization theory and variational calculus / Tomas Roubicek. p. cm. - (De Gruyter series in nonlinear analysis and applications, ISSN 0941-813X ; 4) Includes bibliographical references (p. ) and index. ISBN 3-11-014542-1 (alk. paper) 1. Relaxation methods (Mathematics) 2. Mathematical optimization. 3. Calculus of variations. I. Title. II. Series. QA297.55.R68 1996 515'.64-dc20 96-31728 CIP

Die Deutsche Bibliothek - CIP-Einheitsaufnahme

Roubicek, Tomas: Relaxation in optimization theory and variational calculus / Tomas Roubicek. - Berlin ; New York : de Gruyter, 1997 (De Gruyter series in nonlinear analysis and applications ; 4) ISBN 3-11-014542-1

ISSN 0941-813X © Copyright 1997 by Walter de Gruyter & Co., D-10785 Berlin. All rights reserved, including those of translation into foreign languages. No part of this book may be reproduced in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Printed in Germany. Disk conversion with TeX: Lewis & Leins, Berlin. Printing: Gerike GmbH, Berlin. Binding: Lüderitz & Bauer GmbH, Berlin. Cover design: Thomas Bonnie, Hamburg.

To the memory of Marie and Dr. Ervin Robitschek, victims of the Holocaust.

Preface

"Has not every... variational problem a solution, provided... if need be that the notion of a solution shall be suitably extended?" David Hilbert [227, p. 470]

"This sentence¹ had an immense influence on analysis: it led ultimately to 'weak' or generalised solutions, to my generalised curves and to Schwartz distributions,..., and thence to chattering controls, to mixed strategies in Game Theory, all things that we find essential today." Laurence Chisholm Young [475, p. 241]

Let us begin with a piece of history. In his 20th problem, David Hilbert started a fascinating development going through the whole 20th century, namely the effort to generalize more and more the notion of solutions to various applied problems that mathematics met. This includes in particular the calculus of variations, ordinary and partial differential equations, and problems from optimization theory and game theory. The generalization is always based on a natural (mostly continuous) extension of the original problems; such continuous extension is often addressed as relaxation. The first success was a generalization of the classical solution to differential equations to the weak solution², which admits less smoothness than necessary to evaluate the original differential equation in the usual sense. This is basically related to the theory of distributions invented later by L. Schwartz [409]. The weak formulation of boundary and/or initial value problems for differential equations is nowadays accepted so generally that the original classical solution is often reckoned as less natural. Later it was found that in some nonlinear problems one must handle, apart from the loss of smoothness, also two other phenomena: oscillations and concentrations of the solutions. The former phenomenon was treated for the first time in the pioneering work by L.C. Young [471, 472, 473], followed also by E.J. McShane [312, 313]. The latter was investigated typically in connection with the famous Plateau minimal surface problem by a lot of authors³. The renascence of Young's idea came in connection with the generalization of solutions to some optimal control problems⁴ and much later to some nonlinear partial differential equations and nonconvex variational problems arising in continuum mechanics⁵. Just recently, DiPerna and Majda in their pioneering work [155] made the first attempt to deal with concentration and oscillation effects simultaneously.

The common feature of this ever-going generalization is to solve more and more general problems and to ensure existence of their solutions (in a reasonable sense) in larger classes than the original ones, where the existence can actually fail. A typical property that makes this possible is compactness. The enlarged sets where the solutions are sought thus represent certain compactifications of the original sets where the problems are "classically" formulated, and the finer the compactifications, the "more generalized" the solutions we thus obtain. This can, in principle, eventually yield very fine and abstract compactifications which are not easy to imagine, nor to use for more detailed investigations. A natural restriction of generality is to require the existence of some "auxiliary" algebraic structure which could also be used for a more detailed analysis. It appears useful to require the investigated compactification to be a convex subset of some linear topological space; then we will speak about a convex compactification. Typical usage of the convex structure is for optimality conditions. The development of the research on optimality conditions has also been encouraged by David Hilbert [227], namely in his 23rd problem in connection with the calculus of variations. This led to the weak formulation of the Euler-Lagrange equation, and later also to an appropriate optimality condition for problems involving oscillation phenomena, namely the so-called Euler-Weierstrass condition for variational problems⁶ and the Pontryagin maximum principle for optimal control problems.⁷

During the whole 20th century, we can also observe a parallel, intensive development of the supporting areas of mathematics, in particular of general topology, abstract functional analysis, and later also nonlinear analysis and optimization theory. The purpose of this book is to reflect these achievements and give a fairly abstract-analysis viewpoint of the concrete problems mentioned above. Also it can

¹ It refers to a sentence by D. Hilbert with a similar meaning as that one cited above.
² The first appearance of this philosophy is probably in the work by Leray [287]. The huge development took place since the fifties with the work by Sobolev, Lions, and many others.
³ See Giusti [207] for detailed references.
⁴ The first contributions appeared as from the sixties by Gamkrelidze [198, 199], Ghouila-Houri [203], McShane [314], Medhin [308], Rishel [371] and Warga [455, 456, 457, 458, 459]. Recently many other authors dealt with these so-called relaxed controls, e.g. Ahmed [10, 11, 13], Balder [27, 28, 29, 30], Berliocchi and Lasry [61], Carlson [97], Chryssoverghi [121, 122], Goh and Teo [209, 435], Halanay [219], Fattorini [172, 173, 174, 175, 176, 177], Papageorgiou [342, 343, 345], Rosenblueth [375, 376, 377], Schwarzkopf [410], etc. Cf. also McShane [315] for a survey. For another relaxation approach see Rubio [400].
⁵ This was initiated by Tartar [429], followed by many other authors, especially by Ball [36], Ball and James [37, 38], Chipot and Kinderlehrer [118], Dacorogna [143, 144], DiPerna [154], Evans [168], Kinderlehrer and Pedregal [255, 257], Murat [324], Schonbek [408], etc.
⁶ For one-dimensional relaxed problems it was first formulated by Young [472] and McShane [313], and generalized for special two-dimensional cases in [473].
⁷ First it was formulated only for ordinary controls by the Russian school around Pontryagin, involving also Boltyanskii, Gamkrelidze and Mishchenko in [68, 69, 199, 355, 356], following some earlier ideas of Hestenes [223]. The origin of these ideas can be found, however, already in the work of Carathéodory [95], Valentine [446] and Weierstraß [465]; cf. also Pesch and Bulirsch [353] for a historical survey. The extension of the maximum principle for relaxed controls is due to Gamkrelidze [200] and Rishel [371] and, in a more general form, to Warga [460], Fattorini [173, 176], Halanay [219], Schwarzkopf [410] and many others.


be said that the presented point of view represents a properly nonlinear approach, because the original linear structure (if any) is ignored to a greater or lesser extent and a new one is imposed for the relaxed problem. I believe this genuinely reflects the fact that the original nonlinear problems themselves more or less violate the linear structure of the original spaces.

Let us now go briefly through the contents of the book. After Chapter 1, which only summarizes very briefly and mostly without proofs some standard mathematical background, the general theory of convex compactifications is introduced in Chapter 2. This provides an abstract framework for our relaxation method. Then, in Chapter 3, this general convex-compactification theory is applied to get (σ- or also locally) compact convex envelopes of Lebesgue spaces, which represent a basic tool for relaxation of concrete problems appearing in variational calculus and optimization of systems described by differential equations. Having the tools at hand, we will be able to treat concrete problems. In Chapters 4-6 they will typically have the abstract structure

(P)   Minimize f(y, u)  subject to  Π(y, u) = 0,  B(y, u) ≤ 0,  y ∈ Y,  u ∈ U,

where f : Y × U → ℝ is a cost function, Y a Banach space of "states", U a set of admissible "controls", and Π : Y × U → Λ and B : Y × U → Λ₁ are mappings forming, respectively, the "state equation" and the "state constraints"; Λ and Λ₁ are Banach spaces, the latter one being ordered. The extended (relaxed) problems will then have the structure

(RP)  Minimize f̄(z, y)  subject to  Π̄(z, y) = 0,  B̄(z, y) ≤ 0,  y ∈ Y,  z ∈ K,

where f̄ : Z × Y → ℝ, K is a convex set in a locally convex linear topological space Z, and Π̄ : Z × Y → Λ and B̄ : Z × Y → Λ₁ are continuous mappings. The original set U is to be considered densely imbedded into K, and f̄, Π̄, and B̄ are regarded as extensions of f, Π, and B, respectively. The following questions will be pursued both on an abstract level and in particular cases:⁸

• Relations between (P) and (RP); in particular the so-called correctness of the relaxation scheme.
• Relations between various relaxation schemes (RP) for a given (P).
• Existence and stability of solutions to (RP); well-posedness of (RP).

⁸ As the emphasis is put on the relaxation method itself, a lot of other aspects will remain untouched; this includes higher-order optimality conditions, problems yielding nonsmooth relaxed problems, sensitivity analysis, etc.


• First-order optimality conditions for (RP); Pontryagin or Weierstrass maximum principles.
• Impacts of results for (RP) on the original problem (P).
• Approximation theory for the relaxed problem (RP).
• Numerical implementation of approximate relaxed problems.

Going from simpler tasks to more complicated ones, we begin in Chapter 4 with optimal control problems, which certainly represent the simplest variant of (P), at least if the state constraints have a reasonable structure. A typical example is an optimal control problem for a nonlinear dynamical system (t ∈ (0, T) denotes time):

(P₁)  Minimize ∫₀ᵀ φ(t, y, u) dt                   (cost functional)
      subject to dy/dt = f(t, y, u),  y(0) = y₀,   (state equation)
      u(t) ∈ S(t),                                 (control constraints)
      y ∈ W^{1,q}((0, T); ℝⁿ),  u ∈ L^p((0, T); ℝᵐ).

Here the data φ, the right-hand side f of the state equation, y₀ ∈ ℝⁿ, and S(t) ⊂ ℝᵐ are subject to certain data qualification; W^{1,q} and L^p denote, respectively, the Sobolev and the Lebesgue spaces. Quite equally, the state equation can (and will) be a nonlinear partial differential equation, say of elliptic or parabolic type, or a nonlinear integral equation. In every case, the resulting relaxed problem (RP) has the property that the extended state equation Π̄(z, y) = 0 admits, for any z ∈ K, precisely one solution y = π(z), and the mapping π : K → Y, called an (extended) state operator, is continuous and even smooth.

Chapter 5 is devoted to scalar variational-calculus problems of the type

(P₂)  Minimize ∫_Ω φ(x, y(x), ∇y(x)) dx   subject to y ∈ W^{1,p}(Ω).

Here Ω ⊂ ℝⁿ is a bounded Lipschitz domain and the "energy density" φ : Ω × ℝ × ℝⁿ → ℝ satisfies certain data qualification, but φ(x, r, ·) : ℝⁿ → ℝ is allowed to be nonconvex. Then the relaxed variational problem has the form (RP) with Π̄ linear, K convex σ-compact but noncompact, and B̄ = 0.

The vectorial variational-calculus problems of the type

(P₃)  Minimize ∫_Ω φ(x, y(x), ∇y(x)) dx   subject to y ∈ W^{1,p}(Ω; ℝᵐ),

with φ : Ω × ℝᵐ × ℝ^{m×n} → ℝ, are handled in Chapter 6; the adjective "vectorial" refers to y being ℝᵐ-valued, m > 1. Although at first sight (P₃) has the same


form as (P₂), (P₃) is much more difficult than (P₂) when simultaneously n ≥ 2 and m ≥ 2, and there are several essential differences between Chapters 5 and 6. One of them is that either K is nonconvex or Π̄ is nonlinear, and also B̄ ≠ 0 in general. Contrary to (P₂), whose understanding is fairly complete, there are still many essential open questions as far as the relaxation of (P₃) is concerned. This is basically connected with our poor understanding of quasiconvexity and related questions. In Chapters 4-6, two fundamental concepts (i.e. compactness and convexity) have been used rather separately: the former ensured existence and stability of solutions, while the latter enabled us to make a more detailed analysis, e.g. to pose optimality conditions. However, there are applications with much more intimate connections between convexity and compactness. We have in mind the noncooperative game theory, or more generally the underlying fixed-point theory, where typically compactness and convexity are required simultaneously to ensure mere existence of solutions. This will be the topic of Chapter 7, though it represents rather a sample of the wide area of potential applications. Let me conjecture here that every abstract problem where compactness and convexity play a certain role can reasonably be interpreted as a relaxed problem to a certain "original" problem. It should be emphasized that the relaxation method has, besides its purely mathematical aspects, also an essential interpretation aspect. For example, the fact inf(P) > min(RP) must be prevented in some situations while in other situations it is welcome. The former case is typical for variational problems, where this fact means that an optimal relaxed solution cannot be attained by a minimizing sequence for the original problem. This is due to the necessity of the constraint u = ∇y to hold exactly.⁹
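This non-attainment can be seen concretely on a classical one-dimensional example in the spirit of Bolza and L.C. Young (our illustration, not taken from the text): minimize ∫₀¹ (y(t)² + (u(t)² − 1)²) dt subject to y′ = u, y(0) = 0. The integrand forces u = ±1 and y = 0 simultaneously, so the infimum 0 is approached only by ever faster oscillating controls and is attained solely by the relaxed (Young-measure) solution νₜ = ½δ₊₁ + ½δ₋₁. A minimal numerical sketch (the function name and discretization are ours):

```python
import math

def cost(k, n=20000):
    """Cost of the bang-bang control u_k(t) = sign(sin(2*pi*k*t)) for
    minimize  J(u) = int_0^1 ( y(t)^2 + (u(t)^2 - 1)^2 ) dt,  y' = u, y(0) = 0,
    computed by an explicit Euler scheme with n steps."""
    h = 1.0 / n
    y, J = 0.0, 0.0
    for i in range(n):
        t = (i + 0.5) * h
        u = 1.0 if math.sin(2.0 * math.pi * k * t) >= 0.0 else -1.0
        J += (y * y + (u * u - 1.0) ** 2) * h   # (u^2 - 1)^2 = 0 since u = +-1
        y += u * h                              # Euler step for y' = u
    return J

# faster oscillation => smaller cost, approaching inf J = 0,
# which no single classical control attains
print([cost(k) for k in (1, 10, 100)])
```

Doubling the switching rate roughly quarters the cost (y oscillates with amplitude of order 1/k), mirroring how a minimizing sequence converges to the relaxed solution while no classical control attains the infimum.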
The latter case appears typically in state-constrained optimal control problems, where only the state equation is to be satisfied exactly while the state constraints may be satisfied only approximately, with a certain tolerance. Then the gap inf(P) > min(RP) means that relaxed controls can achieve a lower "cost" than the original ones, which is naturally welcome.¹⁰

⁹ Cf. also Remark 5.4.6.
¹⁰ This aspect can be reflected by introducing suitable "tolerances" in optimization problems; cf. [381] for a systematic pursuit of the "tolerance approach".

The book should be read according to the following chart:

[Reading-scheme chart omitted.]

The material of Chapter 1 (presented mostly without proofs) is more or less standard and the reader can basically only consult it via the Index when needed. Likewise, if not interested in the details of the convex-compactification theoretical background, the reader can more or less skip Chapter 2. Of course, the reader not


involved in numerics will skip the corresponding sections, namely 2.4, 3.5, 4.3.d and 5.5. The reader is also asked to tolerate occasionally a bit unusual notation, created as a compromise by unifying the standard notation from various fairly distant fields.¹¹ Sometimes, very standard notation appearing on fairly different occasions was kept, hopefully without any confusion.¹² Bibliographical notes are mostly mentioned as footnotes; anyhow, because of the wideness of the subject, only basic references are provided, either for historical purposes or just as a source of other references for a more detailed study.

Finally, I would like to mention some colleagues with whom I had numerous fruitful discussions, among them especially Professors K. Bhattacharya, N.D. Botkin, B. Dacorogna, H.O. Fattorini, J. Haslinger, J. Malý, S. Müller, I. Netuka, J.V. Outrata, W.H. Schmidt, J. Souček, and V. Šverák. When writing this monograph, I also benefited from the courses I held during the academic years 1993/94 and 1995/96 at Charles University in Prague for undergraduate and graduate students. Moreover, M. Kružík and M. Mátlová contributed, apart from careful reading and commenting on the whole manuscript, by computer implementation of the proposed algorithms and by calculation of the examples. It is my duty and pleasure to express my deep thanks to all of them. Last but not least, special gratitude goes to Professor Karl-Heinz Hoffmann, Professor Jindřich Nečas and Dr. Jiří Jarůšek, who influenced very essentially both my intellectual life and professional career and thus, directly or not, the theme of this book. I also warmly acknowledge the hospitality of the Institut für Angewandte Mathematik und Statistik (TU München), where a great deal of the book has been accomplished. Besides, shorter stays at ENS (Lyon), IMA (Minneapolis), EPF (Lausanne) and Ernst-Moritz-Arndt-Universität (Greifswald) were inspiring.

Praha / München, January 1996

Tomáš Roubíček

¹¹ A typical dilemma was, e.g., that "u" normally stands for the control variable in optimal control theory while in variational calculus it usually denotes the state variable; "p" denotes the polynomial growth in Lᵖ-spaces while in optimal control it stands for the adjoint state, etc.
¹² E.g., δ stands for a small positive real number, and also for the Dirac distribution, and for the indicator function.

Contents

Preface vii

1 Background generalities 1
1.1 Order and topology 1
1.2 Linear and convex analysis 9
1.3 Optimization theory 16
1.4 Function and measure spaces 35
1.5 Means of continuous functions 47
1.6 Some differential and integral equations 50
1.7 Non-cooperative game theory 60

2 Theory of convex compactifications 68
2.1 Convex compactifications 69
2.2 Canonical form of convex compactifications 71
2.3 Convex σ-compactifications 81
2.4 Approximation of convex compactifications 92
2.5 Extension of mappings 96

3 Young measures and their generalizations 102
3.1 Classical Young measures 103
3.2 Various generalizations 120
3.2.a Generalization by Fattorini 120
3.2.b Generalization by Schonbek, Ball, Kinderlehrer and Pedregal 123
3.2.c Generalization by DiPerna and Majda 131
3.2.d Fonseca's extension of L¹-spaces 148
3.3 Convex compactifications of balls in Lᵖ-spaces 151
3.4 Convex σ-compactifications of Lᵖ-spaces 173
3.5 Approximation theory 188
3.6 Extensions of Nemytskiĭ mappings 202
3.6.a One-argument mappings: affine extensions 202
3.6.b Two-argument mappings: semi-affine extensions 206
3.6.c Two-argument mappings: bi-affine extensions 215

4 Relaxation in optimization theory 222
4.1 Abstract optimization problems 223
4.2 Optimization problems on Lebesgue spaces 242
4.3 Example: Optimal control of dynamical systems 252
4.3.a Original problem 252
4.3.b Relaxation scheme, correctness, well-posedness 261
4.3.c Optimality conditions 267
4.3.d Approximation theory 275
4.3.e Sample calculations 282
4.4 Example: Elliptic optimal control problems 287
4.5 Example: Parabolic optimal control problems 300
4.6 Example: Optimal control of integral equations 312

5 Relaxation in variational calculus I 320
5.1 Convex compactifications of Sobolev spaces 321
5.2 Relaxation of variational problems; p > 1 331
5.3 Optimality conditions for relaxed problems 339
5.4 Relaxation of variational problems; p = 1 349
5.5 Convex approximations of relaxed problems 354

6 Relaxation in variational calculus II 366
6.1 Prerequisities around quasiconvexity 367
6.2 Gradient generalized Young functionals 372
6.3 Relaxation scheme and its FEM-approximation 383
6.4 Further approximation: an inner case 391
6.5 Further approximation: an outer case 394
6.6 Double-well problem: sample calculations 399

7 Relaxation in game theory 412
7.1 Abstract game-theoretical problems 412
7.2 Games on Lebesgue spaces 419
7.3 Example: Games with dynamical systems 423
7.4 Example: Elliptic games 437

Bibliography 443

List of Symbols 463

Index 469

Chapter 1

Background generalities

In this chapter we collect selected fundamental concepts and results concerning general topology, functional analysis, and optimization and game theory. We also summarize some results from the theory of function spaces, the theory of means of spaces (or rings) of continuous functions, and from differential and integral equations. This chapter is not intended as a survey of these fields, because only items needed frequently throughout the book are included here. Also, the generality is restricted to the level actually required in what follows. Moreover, some notions needed only locally have not been included in this chapter; they will be stated in footnotes at the relevant places. As the reader is supposed to have a basic knowledge of general topology, functional analysis, and function spaces, we introduce the results in Sections 1.1, 1.2, and 1.4 without proofs. All other results, though also being more or less standard, are mostly accompanied by (at least sketched) proofs. In fact, this chapter should be used rather for consultation via the Index while reading the following chapters, not for a thorough systematic study. Set-theoretical notions like relations, mappings, inverse mappings, Cartesian products, etc., are supposed to be well known and will not be defined here.

1.1

Order and topology

In this section we will briefly summarize fundamental ideas and results concerning ordered sets and general topology.

A binary relation, denoted by ≤, on a set X is called an ordering if it is reflexive, transitive, and antisymmetric; X endowed with ≤ is then called an ordered set. A mapping f : X₁ → X₂ between two ordered sets is called increasing if x₁ < x₂ implies f(x₁) < f(x₂). We say that x₁ ∈ X is the greatest element of the ordered set X if x₂ ≤ x₁ for any x₂ ∈ X. Similarly, x₁ ∈ X is the least element of X if x₁ ≤ x₂ for any x₂ ∈ X. We say that x₁ ∈ X is maximal in the ordered set X if there is no x₂ ∈ X such that x₁ < x₂. Note that the greatest element, if it exists, is always maximal but not conversely. Similarly, x₁ ∈ X is minimal in X if there is no x₂ ∈ X such that x₁ > x₂. The ordering ≤ on X induces also an ordering on a subset A of X, given just by the restriction of the relation ≤ to A.

An ordered set Ξ is called directed if every pair of its elements has an upper bound in Ξ. A mapping from a directed set Ξ into X is called a net in X and is denoted by {x_ξ}_{ξ∈Ξ}. We say that a net {x_ξ}_{ξ∈Ξ} eventually has the property in question if there is ξ₀ ∈ Ξ such that x_ξ has this property for every ξ ≥ ξ₀.

1.1.2. Example. (Concept of sequences.) The set of all natural numbers ℕ ordered by the standard ordering ≤ is a directed set. The nets having ℕ (directed by this standard ordering) as index set are called sequences. Any subsequence of a given sequence can be simultaneously understood as a finer net.⁴

A collection ℱ of subsets of X will be called a filter on X if A, B ∈ ℱ implies A ∩ B ∈ ℱ, if A ⊃ B ∈ ℱ implies A ∈ ℱ, and if ∅ ∉ ℱ. Furthermore, a collection ℬ of subsets of X will be called a filter base on X if A₁, A₂ ∈ ℬ implies B ⊂ A₁ ∩ A₂ for some B ∈ ℬ, and if ∅ ∉ ℬ. For a filter base ℬ, the collection {A ⊂ X; ∃B ∈ ℬ : A ⊃ B} is a filter on X; we will say that this filter is generated by the filter base ℬ.

Furthermore, we will introduce a topology τ of a set X, which will be a collection of subsets of X such that τ contains the empty set and X itself, with every finite collection of sets also their intersection, and with every arbitrary collection of sets also their union. The elements of τ are called open sets (or τ-open, if we want to indicate explicitly the topology in question), while their complements are called closed. A set X endowed with a topology τ will be called a topological space; sometimes we will denote it by (X, τ) to refer to τ explicitly. Having a subset A ⊂ X, τ|_A := {A ∩ B; B ∈ τ} is a topology on A; we will address it as a relativized topology. A collection τ₀ of subsets of X is called a base (resp. a pre-base) of a topology τ if every τ-open set is a union of elements of τ₀ (resp. a union of finite intersections of elements of τ₀). Having x ∈ N ⊂ X, we say that N is a neighbourhood of x if there is an open set A such that x ∈ A ⊂ N. It is easy to see that the collection of all neighbourhoods of a given point x, denoted by 𝒩(x), is a filter on X; we will call it the neighbourhood filter of x.
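On a finite set the filter axioms can be checked exhaustively. The following sketch (our illustration; helper names such as `is_filter` are of course not from the text) verifies that the supersets of {1} in X = {1, 2, 3} form a filter, the principal filter generated by {1}, while the whole power set fails to be one because it contains ∅:

```python
from itertools import combinations

def powerset(X):
    """All subsets of a finite set X, as frozensets."""
    X = list(X)
    return [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]

def is_filter(F, X):
    """Check the three filter axioms from the text on a finite set X."""
    F = set(F)
    if frozenset() in F:                            # the empty set must not belong to F
        return False
    if any(A & B not in F for A in F for B in F):   # A, B in F  =>  A intersect B in F
        return False
    if any(A <= B and B not in F                    # A in F, A subset of B  =>  B in F
           for A in F for B in powerset(X)):
        return False
    return True

X = frozenset({1, 2, 3})
principal = [S for S in powerset(X) if frozenset({1}) <= S]  # all supersets of {1}
print(is_filter(principal, X), is_filter(powerset(X), X))
```

The same routine can be pointed at any candidate collection, e.g. at a filter base together with the filter it generates, to confirm the generation construction described above.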
Moreover, we define the interior, the closure, and the boundary of a set A respectively by

int(A) := {x ∈ X; ∃N ∈ 𝒩(x) : N ⊂ A},
cl(A) := {x ∈ X; ∀N ∈ 𝒩(x) : N ∩ A ≠ ∅},
bd(A) := cl(A) \ int(A).

Having A ⊂ B ⊂ X, we say that A is dense in B if cl(A) ⊃ B. A topological space is called separable if it contains a countable subset which is dense in it. Having a net {x_ξ}_{ξ∈Ξ} in the topological space X, we say that it converges to a point x ∈ X if, for any neighbourhood N of x, there is ξ₀ ∈ Ξ large enough that x_ξ ∈ N for every ξ ≥ ξ₀.

⁴ Indeed, having a sequence {x_k}_{k∈ℕ} and its subsequence {x_k}_{k∈N} with some N ⊂ ℕ, one can put Ξ := (N, ≤); see, e.g., Engelking [165, Proposition 1.6.1] for details.
⁶ In the literature, the notions of "stronger" and "weaker" are sometimes used in place of "finer" and "coarser", respectively.
⁷ An example of a nonmetrizable topology is the product topology on an uncountable number of Hausdorff topological spaces X_j having at least two elements.


Topologies of a given set X may have various important properties. One of them is a separation property. We say that τ is a T₀-topology (resp. T₁-topology) if for any x₁, x₂ ∈ X there is N₁ ∈ 𝒩(x₁) such that x₂ ∉ N₁ or (resp. and) there is N₂ ∈ 𝒩(x₂) such that x₁ ∉ N₂. If, for any x₁, x₂ ∈ X, there are N₁ ∈ 𝒩(x₁) and N₂ ∈ 𝒩(x₂) such that N₁ ∩ N₂ = ∅, then τ is called a T₂-topology or Hausdorff topology. Every net in a Hausdorff space may have at most one limit point. A Hausdorff topological space (X, τ) is called completely regular⁸ if, for any x ∈ X and any N ∈ 𝒩(x), there is a continuous function f : X → ℝ such that f(x) = 0 and f(X \ N) = 1. Finally,⁹ a Hausdorff topological space (X, τ) is called normal if for every closed mutually disjoint subsets M, N ⊂ X there is a continuous function f : X → [0, 1] such that f(N) = 0 while f(M) = 1.

The collection of open intervals τ₀ := {(a, +∞); a ∈ ℝ} is an example of a topology on the real line ℝ which is a T₀-topology but not a T₁-topology. Having a topological space (X, τ), a (τ, τ₀)-continuous function f : X → ℝ will also be called lower semicontinuous (with respect to the topology τ). A function f : X → ℝ is called upper semicontinuous if −f is lower semicontinuous. A function f : X → ℝ is called continuous (with respect to the topology τ on X) if it is both lower and upper semicontinuous. The reader can easily verify that this continuity is equivalent to the (τ, τ₁)-continuity, where τ₁ denotes the standard topology on ℝ induced by the metric d(a₁, a₂) := |a₁ − a₂|. Besides, the lower (resp. upper) semicontinuity of f at x is equivalent to liminf_{x̃→x} f(x̃) ≥ f(x) (resp. limsup_{x̃→x} f(x̃) ≤ f(x)), where the limes inferior and the limes superior are defined respectively by

liminf_{x̃→x} f(x̃) := sup_{N∈𝒩(x)} inf_{x̃∈N} f(x̃),    limsup_{x̃→x} f(x̃) := inf_{N∈𝒩(x)} sup_{x̃∈N} f(x̃).
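In the standard topology of ℝ, the neighbourhood filter of x has the base {(x − r, x + r); r > 0}, so the supremum/infimum in the formulas above can be approximated on a grid. As a hedged numerical sketch (grid sampling only approximates the infima; the helper names are ours): the function equal to 1 on (0, ∞) and to 0 on (−∞, 0] is lower semicontinuous at 0, while its negative is not.

```python
def liminf_at(f, x, radii=(1.0, 0.1, 0.01, 0.001)):
    """Approximate liminf_{x~ -> x} f(x~) = sup_N inf_{x~ in N} f(x~)
    over the neighbourhood base N = (x - r, x + r), sampled on a grid."""
    best = float("-inf")
    for r in radii:
        pts = [x + r * (i / 500.0 - 1.0) for i in range(1001)]  # grid on [x-r, x+r]
        best = max(best, min(f(p) for p in pts))
    return best

step = lambda t: 1.0 if t > 0 else 0.0

# liminf at 0 equals step(0) = 0, so step is lower semicontinuous at 0;
# for -step the liminf at 0 is -1 < -step(0) = 0, so -step is not
print(liminf_at(step, 0.0), liminf_at(lambda t: -step(t), 0.0))
```

The same routine with min and max swapped would approximate the limes superior, giving the symmetric test for upper semicontinuity.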

1.1.3. Example. (Universal index set.) It is known¹⁰ that, if X is a completely regular topological space, then there is a filter base 𝒰 on X × X such that, for any x ∈ X, 𝒰(x) := {{x̃ ∈ X; (x̃, x) ∈ B}; B ∈ 𝒰} is a base of the neighbourhood filter 𝒩(x). For many investigations in completely regular spaces, a universal sufficiently rich index set is Ξ := 𝒰 ordered by the inclusion, which makes it directed.

⁸ Completely regular Hausdorff spaces are denoted as T₃½-spaces and sometimes also called Tikhonov spaces in the literature.
⁹ This separation property is also addressed as T₄. For further separation qualification, namely T₃, T₅, and T₆, we refer, e.g., to Engelking [165].
¹⁰ In fact, it suffices to take a base of any uniformity structure on X; cf., e.g., Bourbaki [80, Chapter II] or Engelking [165, Chapter 8].


Then, for example, every x ∈ cl(A) can be attained by some net {x_ξ}_{ξ∈Ξ} ⊂ A.¹¹ Moreover, if a net {x_ξ}_{ξ∈Ξ} has a cluster point x ∈ X, then we can claim that there exists a finer net (using the same index set) which converges to x.¹²

The central topological notion we will rely on is compactness. A topology τ on X is called compact if every cover of X by open subsets contains a finite sub-cover. Equally,¹³ we can define τ compact if every net in X has a cluster point in X. Continuous mappings map compact sets onto compact ones. On a given set, compact topologies are minimal in the class of all Hausdorff topologies. Every lower (resp. upper) semicontinuous function X → ℝ on a compact topological space (X, τ) attains its minimum (resp. maximum), which is known as (a generalization of) the Bolzano-Weierstrass theorem.¹⁴

A topological space (X, τ) is called sequentially compact if every sequence in X admits a subsequence that converges in X. A metrizable topology is compact if and only if it is sequentially compact,¹⁵ while for non-metrizable topologies these notions are not comparable.¹⁶ A subset A of a topological space (X, τ) is called relatively (sequentially) compact if the closure of A is (sequentially) compact in X. A topological space is called σ-compact if it is a union of a countable number of compact subsets, and it is called locally (sequentially) compact if every point possesses a (sequentially) compact neighbourhood.

Having an arbitrary collection {(X_j, τ_j)}_{j∈J} of topological spaces, we define the topology τ on the product X := ∏_{j∈J} X_j canonically as the coarsest topology on X that makes all the projections X → X_j (τ, τ_j)-continuous; this topology has¹⁷ a base {∏_{j∈J} A_j; ∀j ∈ J : A_j ∈ τ_j, and A_j = X_j for all but a finite number of indices j ∈ J}.
The Tikhonov theorem,¹⁸ based on the Kuratowski–Zorn lemma, says that the product space (X, τ) is compact if and only if all the (X_j, τ_j) are compact; however, if J is countable and all the (X_j, τ_j) are metrizable, the usage of the Kuratowski–Zorn lemma is rather trivial, so that the compactness of Π_{j∈J} X_j can be obtained without it.

Bauer's principle reads: let A be a convex compact subset of a locally convex space X and f : A → ℝ be concave and lower semicontinuous; then f attains a minimum on A in at least one extreme point of A. From Bauer's principle one can prove⁴¹ the following important theorem, which shows that extreme points can fully characterize the shape of convex compact subsets:

1.2.6. Theorem.⁴² (Krein and Milman [270].) Let A be a convex compact subset of a locally convex space X. Then A is the closed convex hull of its extreme points:

(2.11)

    A = co̅ ({x ∈ A; x is an extreme point in A}) .

By a ray we will understand an open half-line. We say that R ⊂ A ⊂ X is an extreme ray in A if every open interval contained in A and intersecting R is fully contained in R. In other words, a ray R := {x₀ + αx; α ∈ ℝ₊} is called extreme if x₁, x₂ ∈ A and x₀ + αx = ᾱx₁ + (1 − ᾱ)x₂ for some α ∈ ℝ₊ and ᾱ ∈ (0, 1) imply that, for any ã ∈ (0, 1), there is α̃ ∈ ℝ₊ such that x₀ + α̃x = ãx₁ + (1 − ã)x₂.

40 In fact, [94] asserts that every point of a simplex is a convex combination of its vertices, from which Theorem 1.2.4 already follows quite directly; cf. also Valentine [447, Section I.C].
41 For the proof of the Krein–Milman theorem based on Bauer's principle see, e.g., Choquet [120, Section 25].
42 See also, e.g., Bishop and Bridges [66, Section 7.7], Day [148, Section V.1], Dunford and Schwartz [160, Theorem V.8.4], Edwards [162, Chapter 10], Köthe [265, Section 25.1], or Valentine [446].


1.2.7. Theorem.⁴³ (Klee [259].) Let A be a closed convex locally compact subset of a locally convex space X containing no line. Then A is the closed convex hull of all its extreme points and extreme rays, i.e.

(2.12)

    A = co̅ ({x ∈ A; x is an extreme point in A} ∪ ⋃{R ⊂ A; R is an extreme ray in A}) .
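A concrete planar illustration of this representation (an added example, not from the original text): the epigraph A := {(x₁, x₂) ∈ ℝ²; x₂ ≥ |x₁|} is closed, convex, locally compact, and contains no line, so the theorem applies with one extreme point and two extreme rays.

```latex
\text{extreme point: } (0,0); \qquad
\text{extreme rays: } R_{\pm} := \{\alpha(\pm 1,1);\ \alpha\in\mathbb{R}_+\};\\[2pt]
A = \overline{\mathrm{co}}\big(\{(0,0)\}\cup R_+\cup R_-\big),
\quad\text{since } (x_1,x_2) = \tfrac{x_2+x_1}{2}(1,1) + \tfrac{x_2-x_1}{2}(-1,1)
\text{ with both coefficients} \ge 0.
```

Note that the extreme points alone would not suffice here, which is precisely why the rays enter the statement for unbounded (merely locally compact) sets.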

1.3 Optimization theory

In this section we summarize some basic results concerning existence, stability, approximation and optimality conditions for the abstract constrained minimization problem in the form

(P)    Minimize   Φ(z)           for z ∈ Z,
       subject to  R(z) ≤ 0,   z ∈ K,

where Z is a locally convex space, τ being its topology, K a subset of Z, Φ : Z → ℝ a "cost" function, R : Z → Λ₁, and Λ₁ a Banach space ordered by a closed convex cone D with the vertex at the origin. If D has a non-empty interior, i.e. int_{Λ₁}(D) ≠ ∅, we will write λ < λ̃ if and only if λ̃ − λ ∈ int_{Λ₁}(D). Besides, we will consider a locally convex topology θ on Λ₁ finer than the weak topology of Λ₁, so that the cone D is θ-closed, as well. Throughout this section, we will suppose:

(3.1a)    K is convex,

(3.1b)    Φ is lower semicontinuous with respect to τ,

(3.1c)    R is a (τ, θ)-continuous mapping.

Let us remark that, thanks to its generality, (P) covers a majority of functionally constrained single-criterion optimization problems. We will denote the admissible domain for (P) by 𝒟_ad(P), which means

    𝒟_ad(P) := {z ∈ Z; R(z) ≤ 0, z ∈ K}.

Moreover, we define in a natural way the value inf(P) := inf{Φ(z); z ∈ 𝒟_ad(P)}. Of course, we will use the convention inf(P) := +∞ if 𝒟_ad(P) = ∅. We will say that (P) is feasible if 𝒟_ad(P) ≠ ∅ or, equivalently, if −D ∩ R(K) ≠ ∅. Eventually, the (possibly empty) set of solutions to (P) will be denoted by

    Argmin(P) := {z ∈ 𝒟_ad(P); Φ(z) = inf(P)}.

43 Cf. also Holmes [232, Section 8] or Köthe [265, Section 25.5]. Klee's paper [259] contains even a more general assertion.
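For a minimal concrete instance of these notions (an illustration only, not taken from the text), take Z = K := ℝ, Λ₁ := ℝ, D := [0, +∞), Φ(z) := z², and R(z) := 1 − z. Then:

```latex
\mathscr{D}_{\mathrm{ad}}(P) = \{z\in\mathbb{R};\ 1-z\le 0\} = [1,+\infty),\qquad
\inf(P)=1,\qquad \mathrm{Argmin}(P)=\{1\}.
```

In particular, (P) is feasible here since −D ∩ R(K) = (−∞, 0] ∩ ℝ ≠ ∅.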


The basic⁴⁴ principle ensuring the existence of a solution to (P), called the direct method, is based on compactness of the level sets of Φ on K, defined by

    Lev_{K,c} Φ := {z ∈ K; Φ(z) ≤ c}.

1.3.1. Proposition. (Existence and uniqueness of a solution to (P).) Let (3.1b,c) be valid, let (P) be feasible, and let⁴⁵

(3.2)    ∃c > inf(P):   Lev_{K,c} Φ is τ-compact in Z.

Then (P) possesses at least one solution. Moreover, if also (3.1a) is valid, Φ is strictly convex, and R is D-convex, then (P) has precisely one solution.

Proof. As (P) is feasible, there exists a minimizing sequence {z_k}_{k∈ℕ}; this means z_k ∈ 𝒟_ad(P) and lim_{k→∞} Φ(z_k) = inf(P). Then, for any c > inf(P), this sequence is eventually contained in the set Lev_{K,c} Φ. Thanks to (3.2), Lev_{K,c} Φ is compact for an appropriate choice of c. Therefore, this sequence possesses a cluster point z₀ ∈ Lev_{K,c} Φ, and there is a finer net {z_{k_ξ}}_{ξ∈Ξ} converging to z₀ ∈ K. As Φ is lower semicontinuous, Φ(z₀) ≤ liminf_{ξ∈Ξ} Φ(z_{k_ξ}) = lim_{k→∞} Φ(z_k) = inf(P). By the assumed continuity of R, θ-lim_{ξ∈Ξ} R(z_{k_ξ}) = R(z₀). As D is θ-closed, −R(z₀) ∈ D or, in other words, R(z₀) ≤ 0. Altogether, we have proved z₀ ∈ Argmin(P).

Let us now suppose that R is D-convex and Φ is strictly convex, and take z₁, z₂ ∈ Argmin(P). Then 𝒟_ad(P) is convex, so that ½z₁ + ½z₂ ∈ 𝒟_ad(P). If z₁ ≠ z₂, then Φ(½z₁ + ½z₂) < ½Φ(z₁) + ½Φ(z₂) = min(P), a contradiction showing z₁ = z₂. □

The next standard task concerns a stability analysis of (P) with respect to a certain data perturbation. Perturbed problems arise typically due to uncertain data (say, obtained by inexact measurements in real situations) or/and due to the numerical treatment of the model. For two perturbation parameters ε₁ ≥ 0 and ε₂ ≥ 0, the perturbed problem (P_{ε₁,ε₂}) will be considered in the form:

(P_{ε₁,ε₂})    Minimize   Φ_{ε₁}(z)             for z ∈ Z,
               subject to  R_{ε₂}(z) ≤ λ_{ε₁},   z ∈ K.

Of course, we assume Φ₀ := Φ, R₀ := R, and λ₀ := 0, so that (P_{ε₁,ε₂}) coincides with (P) for ε₁ = ε₂ = 0.

44 Let us only remark that sometimes more sophisticated principles can be applied; e.g. various fixed-point theorems, see e.g. Zeidler [477], or the Palais–Smale [341] condition, see also Aubin and Ekeland [22, Sect. 5.5] or Zeidler [478, Chapter 38]. For a usage of Bauer's principle see also Corollary 4.3.6 below.
45 Note how carefully the condition (3.2) is composed: the existence would not be guaranteed if the condition (3.2) were weakened by admitting c ≥ inf(P) only. For example, let us take (P) with Z = K = Λ₁ := ℝ, D := [0, +∞), R(z) := z, and Φ(z) := min(eᶻ, (z − 1)²). Then inf(P) = 0 and Lev_{K,0} Φ = {1} is compact, so that (3.2) would be satisfied for c = inf(P), but Φ does not attain its minimum on 𝒟_ad(P) = {z; z ≤ 0}.


At first, we will suppose that (3.2) holds uniformly also for the perturbed problems, namely

(3.3)    ∃B₀ ⊂ Z compact  ∀ε₁, ε₂ ∈ ℝ⁺  ∃c := c_{ε₁,ε₂} > inf(P_{ε₁,ε₂}):   Lev_{K,c} Φ_{ε₁} ⊂ B₀.

Moreover, we will need a mode of convergence suitable for lower semicontinuous functions. Here we suppose

(3.4a)    ∀z ∈ Z:   liminf_{ε→0, z̃→z} Φ_ε(z̃) ≥ Φ(z),

(3.4b)    ∀z ∈ Z:   limsup_{ε→0} Φ_ε(z) ≤ Φ(z).

A suitable mode of convergence for R_ε is the continuous convergence, which means

(3.5)    ∀z ∈ Z:   lim_{ε→0, z̃→z} R_ε(z̃) = R(z),

where "lim" refers to the θ-topology on Λ₁; we will write shortly R_ε →ᶜ R for ε → 0 if (3.5) holds.
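A simple family satisfying these convergence modes (an illustration only, not from the text): with Z = Λ₁ := ℝ, take Φ_ε(z) := z² + ε|z| and R_ε(z) := z − ε. Then

```latex
\liminf_{\varepsilon\to0,\ \tilde z\to z}\big(\tilde z^2+\varepsilon|\tilde z|\big) = z^2 = \Phi(z)
\quad\text{(so both (3.4a) and (3.4b) hold)},\\[2pt]
\lim_{\varepsilon\to0,\ \tilde z\to z}(\tilde z-\varepsilon) = z = R(z),
\quad\text{i.e. } R_\varepsilon \xrightarrow{\;c\;} R \text{ for } \varepsilon\to0 .
```

The perturbation terms are bounded by ε on bounded sets, which is what makes the limits independent of the route (ε, z̃) → (0, z).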

1.3.2. Proposition.⁴⁶ (Hadamard-type well-posedness of (P).) Let (3.1)–(3.2) be valid also for the perturbed data, let (P) be feasible, let the cone D ⊂ Λ₁ have a nonempty interior, λ_ε > 0, λ_ε → 0 for ε → 0, and let (3.3)–(3.5) hold. Then there is E : ℝ⁺ → ℝ⁺ such that

(3.6)    lim_{ε₁,ε₂→0, ε₂≤E(ε₁)}  min(P_{ε₁,ε₂}) = min(P),

(3.7)    Limsup_{ε₁,ε₂→0, ε₂≤E(ε₁)}  Argmin(P_{ε₁,ε₂}) ⊂ Argmin(P).

Proof. At first, let us show that, for any ε₁ > 0, there is E = E(ε₁) > 0 such that

(3.8)    ∀ε₂ ∈ (0, E):   (z ∈ B₀ & R(z) ≤ 0)  ⟹  R_{ε₂}(z) ≤ λ_{ε₁},

where B₀ comes from (3.3). Supposing that (3.8) is not true for any E > 0, we would get some sequence {z_{ε₂}}_{ε₂>0} such that z_{ε₂} ∈ B₀, R(z_{ε₂}) ≤ 0, and simultaneously R_{ε₂}(z_{ε₂}) ≰ λ_{ε₁}, i.e. λ_{ε₁} − R_{ε₂}(z_{ε₂}) ∈ Λ₁ \ D. Since B₀ is compact, we can suppose (after possibly taking a finer net) that {z_{ε₂}} converges to a limit, say z. Passing to the limit with the help of (3.5), one gets R(z) ≤ 0 and simultaneously λ_{ε₁} − R(z) ∈ cl(Λ₁ \ D). This is, however, a contradiction because the sets −D and λ_{ε₁} − cl(Λ₁ \ D) are disjoint; recall that we suppose λ_{ε₁} > 0.

Taking z̄ ∈ Argmin(P), we can suppose z̄ ∈ B₀, and by (3.8) we have R_{ε₂}(z̄) ≤ λ_{ε₁} provided ε₂ ≤ E(ε₁), so that z̄ is admissible for (P_{ε₁,ε₂}), too. By (3.4b) we get

(3.9)    limsup_{ε₁,ε₂→0, ε₂≤E(ε₁)} min(P_{ε₁,ε₂})  ≤  limsup_{ε₁→0} Φ_{ε₁}(z̄)  ≤  Φ(z̄) = min(P).

On the other hand, we can take z_{ε₁,ε₂} ∈ Argmin(P_{ε₁,ε₂}) such that (after possibly taking a finer net) {z_{ε₁,ε₂}} converges for ε₁, ε₂ → 0 to some limit, say z₀. Passing to the limit in R_{ε₂}(z_{ε₁,ε₂}) ≤ λ_{ε₁}, with the help of (3.5) one gets R(z₀) ≤ 0, so that z₀ ∈ 𝒟_ad(P). By (3.4a) we then get

(3.10)    liminf_{ε₁,ε₂→0, ε₂≤E(ε₁)} min(P_{ε₁,ε₂})  =  liminf_{ε₁,ε₂→0} Φ_{ε₁}(z_{ε₁,ε₂})  ≥  Φ(z₀)  ≥  min(P).

This proves (3.6). Then also, if ε₂ ≤ E(ε₁), we can see that z₀ ∈ Argmin(P) because Φ_{ε₁}(z_{ε₁,ε₂}) = min(P_{ε₁,ε₂}) → min(P), so that (3.10) implies Φ(z₀) = min(P). This proves (3.7). Eventually, let us realize that the stability criterion just guarantees 𝒟_ad(P_{ε₁,ε₂}) ⊃ 𝒟_ad(P). This can automatically be ensured if R_ε = R and λ_ε ≥ 0. □

1.3.3. Remark. (Interpretation of the upper semicontinuity of Argmin(P_{ε₁,ε₂}).) The formula (3.7) can be interpreted in such a way that, if the assumed data qualification holds, the perturbed problem cannot have any ambition to approximate all solutions of the unperturbed problem, but only some of them.

1.3.4. Remark. (Weaker modes of convergence of Φ_ε.) In the unconstrained case (i.e. K := Z and D := Λ₁) the convergence mode (3.4b) can be considerably weakened, requiring only

(3.11)    ∀z ∈ Z  ∃{z_ε}_{ε>0} ⊂ Z:   limsup_{ε→0} Φ_ε(z_ε) ≤ Φ(z).

Let us note that z_ε need not converge to z. Such a very weak concept of convergence, (3.4a) with (3.11), developed by Zolezzi [484], is called a variational convergence. Then (3.3), (3.4a) and (3.11) imply (3.6) and (3.7) if K := Z and D := Λ₁.


However, the variational convergence is not a convergence in the usual sense because a net {Φ_ε}_{ε>0} can converge variationally to various "limits".⁴⁷ Likewise, Φ_ε + Φ₀ need not converge variationally to Φ + Φ₀ even if Φ_ε does converge variationally to Φ. To overcome this discrepancy, the following "intermediate" mode of convergence is appropriate, namely

(3.12)    ∀z ∈ Z  ∃ a net z_ε → z:   limsup_{ε→0} Φ_ε(z_ε) ≤ Φ(z).

The net Φ_ε converges to Φ in the sense of (3.4a) and (3.12) if and only if the respective epigraphs converge, which means Lim_{ε→0} epi(Φ_ε) = epi(Φ). As such, the mode of convergence (3.4a) and (3.12) is called epi-convergence.⁴⁸ Obviously, (3.4b) implies (3.12), and the "limsup" in (3.12) can be replaced by "lim" if (3.12) is considered together with (3.4a). Equivalently, Φ_ε epi-converges to Φ if, for any z ∈ Z,

    sup_{N∈𝒩(z)} liminf_{ε→0} inf_{z̃∈N} Φ_ε(z̃)  =  Φ(z)  =  sup_{N∈𝒩(z)} limsup_{ε→0} inf_{z̃∈N} Φ_ε(z̃),

where 𝒩(z) denotes the collection of all neighbourhoods of z. Then Φ is also called a Γ-limit of {Φ_ε}_{ε>0}, and denoted by Γ(Z⁻)-lim_{ε→0} Φ_ε, being a special case of the so-called hybrid limits.⁴⁹ If Φ_ε = Φ₀ is independent of ε > 0, then the Γ-limit of the constant net {Φ_ε}_{ε>0} = {Φ₀} is called a Γ-regularization, denoted by ΓΦ₀, i.e.

    ΓΦ₀(z) = liminf_{z̃→z} Φ₀(z̃).

Note that epi(ΓΦ₀) = cl_{Z×ℝ}(epi(Φ₀)).

Let us turn our attention to an abstract approximation theory for (P). The numerical implementation of the problem (P) also often requires avoiding the functional constraint R(z) ≤ 0. The simplest technique⁵⁰ relies on the usage of a penalty

47 For example, the constant net Φ_ε(z) := z², z ∈ Z := ℝ, converges variationally both to Φ(z) = z² and to Φ(z) = 0; see Dontchev and Zolezzi [157, Chapter IV, Remark 7].
48 For results of the type (3.6) and (3.7) if Φ_ε epi-converges to Φ see, e.g., Aubin and Frankowska [23, Proposition 7.3.5].
49 For a general theory of hybrid limits and interconnections with the epi- and variational convergence the reader is referred, e.g., to Buttazzo [90, Chapter 5], De Giorgi and Dal Maso [150], Dal Maso [147], Dontchev and Zolezzi [157, Chapter 4], Franconi [187], and Wets [466]. Epigraphical convergence is thoroughly treated in the monograph by Aubin and Frankowska [23, Chapter 7].
50 More advanced techniques are based on the Lindberg duality theory [289], which covers both the penalty-function technique and the so-called augmented-Lagrangean technique. The latter one originated in the works by Hestenes [224] and Powell [358], and was further generalized by Rockafellar [373] and many others; see also Outrata and Jarušek [339, 340] for a survey.


function ω : Λ₁ → ℝ satisfying

(3.13)    ω(λ) = 0 for λ ≤ 0,    ω(λ) > 0 elsewhere.

Moreover, the original data Φ, R and K must usually be approximated by some simpler objects, denoted respectively by Φ_d, R_d and K_d, where d > 0 is an abstract discretization parameter. Then the approximate penalized problem has the form

(P_{d,ρ})    Minimize   Φ_d^ρ(z) := Φ_d(z) + (1/ρ) ω(R_d(z))    for z ∈ Z,
             subject to  z ∈ K_d.

A very typical situation is that K_d is an internal approximation of K in the sense

(3.14)    ∀d₁ ≥ d₂ > 0:   K_{d₁} ⊂ K_{d₂} ⊂ K   &   cl_Z(⋃_{d>0} K_d) = K,

which is equivalent to

(3.14′)    {K_d}_{d>0} nondecreasing (as d decreases)   and   Lim_{d→0} K_d = K.
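The penalty construction entering (P_{d,ρ}) above can be tried out numerically. The following sketch is an illustration only; the concrete data Φ(z) = (z − 2)², R(z) = z − 1, and ω(λ) = max(λ, 0)² are my choice, not the book's, and no discretization of K is involved. It minimizes Φ(z) + (1/ρ)ω(R(z)) for a sequence of penalty parameters ρ → 0; the unconstrained minimizers approach the constrained solution z = 1.

```python
def phi(z):                 # cost function, unconstrained minimizer at z = 2
    return (z - 2.0) ** 2

def R(z):                   # constraint mapping; feasibility means R(z) <= 0, i.e. z <= 1
    return z - 1.0

def omega(lam):             # penalty as in (3.13): zero for lam <= 0, positive elsewhere
    return max(lam, 0.0) ** 2

def argmin_ternary(f, lo, hi, iters=200):
    """Minimize a unimodal function on [lo, hi] by ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

for rho in (1.0, 0.1, 0.01, 0.001):
    z_rho = argmin_ternary(lambda z: phi(z) + omega(R(z)) / rho, -1.0, 4.0)
    print(f"rho = {rho:g}   z_rho = {z_rho:.4f}")
```

For this toy data the penalized minimizer can be computed by hand as z_ρ = 1 + ρ/(1 + ρ), so the printed values drift towards the constrained minimizer z = 1 as ρ → 0, in agreement with the convergence theory below.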

We will now focus our attention on the convergence of (P_{d,ρ}), expressed, for simplicity, by the following abstract condition⁵¹ (to be possibly satisfied only for d and ρ small enough):

(3.15)    ∃B₀ ⊂ Z compact  ∀d, ρ ∈ ℝ⁺  ∃c := c_{d,ρ} > inf(P_{d,ρ}):   Lev_{K_d,c} Φ_d^ρ ⊂ B₀.

1.3.5. Proposition. (Convergence of (P_{d,ρ}).) Let all the mappings Φ, Φ_d : Z → ℝ, R, R_d : Z → Λ₁, and ω : Λ₁ → ℝ be continuous, let (P) be feasible, Φ_d →ᶜ Φ, R_d →ᶜ R, let (3.2), (3.13), (3.14), and (3.15) be valid, and let K_d ∩ B₀ be closed. Then there exists a function κ : ℝ⁺ → ℝ⁺ such that

(3.16)    lim_{d,ρ→0, d≤κ(ρ)}  min(P_{d,ρ}) = min(P).


Moreover, it is obvious that the supposed coercivity enables us to weaken all the assumptions (3.24)–(3.27) by a localization on bounded subsets only.

As we have at our disposal not only a topological structure but also a linear structure of the involved spaces, we can express the local changes of the involved mappings in terms of their differentials (if they exist) and then pose first-order optimality conditions. We will confine ourselves to a smooth⁵⁵ case and to first-order optimality conditions of the F. John type.⁵⁶ On the other hand, we will assume Z to be a general locally convex space, which will be useful in what follows.

55 For a generalization to nondifferentiable data the reader is referred to the monographs by Ekeland and Temam [164] or by Ioffe and Tikhomirov [239] in the convex case, or by Clarke [130] in the nonconvex case; see also [22, 478], for example.
56 Another, more general form of optimality conditions was established in the Dubovitskiĭ–Milyutin theory [158]; see also [478], for example.


Considering the ordering ≤ of Λ₁ induced by the cone D, the negative polar cone −D° ⊂ Λ₁* apparently induces an ordering on the dual space Λ₁*; we will denote this dual ordering by "≥". The following assertion represents just a single possibility from a widely developed theory of first-order necessary optimality conditions. The important fact is that the convex set K is admitted to have an empty interior, which is a typical case in relaxed problems.

1.3.12. Proposition.⁵⁷ (F. John-type necessary conditions.) Let K be convex, let both Φ and R be Gâteaux differentiable, and let the cone D have a non-empty interior. Then, for every solution z to (P), there exist multipliers λ₀* ∈ ℝ and λ₁* ∈ Λ₁* such that λ₀* ≥ 0 and λ₁* ≥ 0 and

(3.28)    λ₀* ∇Φ(z) + [∇R(z)]* λ₁*  ∈  −N_K(z),

(3.29)    ⟨λ₁*, R(z)⟩ = 0,

(3.30)    (λ₀*, λ₁*) ≠ 0.
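As an elementary finite-dimensional illustration of (3.28)–(3.30) (a standard textbook-style instance, not taken from the book): take Z = K := ℝ², Λ₁ := ℝ, D := [0, +∞), Φ(z) := z₁² + z₂², and R(z) := 1 − z₁ − z₂.

```latex
\text{Solution: } \bar z=(\tfrac12,\tfrac12),\qquad
\nabla\Phi(\bar z)=(1,1),\quad \nabla R(\bar z)=(-1,-1),\quad
N_K(\bar z)=\{0\}\ \text{since } K=\mathbb{R}^2.\\[2pt]
\text{(3.28): } \lambda_0^*(1,1)+\lambda_1^*(-1,-1)=0\ \Rightarrow\ \lambda_1^*=\lambda_0^*,
\ \text{e.g. } (\lambda_0^*,\lambda_1^*)=(1,1)\ne 0;\qquad
\text{(3.29): } \langle\lambda_1^*,R(\bar z)\rangle=1\cdot 0=0 .
```

Here the normal case λ₀* > 0 occurs; the multipliers are determined up to a positive multiplicative constant, as is typical for conditions of F. John type.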

Sketch of the proof.⁵⁸ Let us consider two subsets of ℝ × Λ₁:

(3.31)    S₁ := {([∇Φ(z)](z̃ − z), R(z) + [∇R(z)](z̃ − z)) ∈ ℝ × Λ₁;  z̃ ∈ K},

(3.32)    S₂ := (−ℝ⁺) × int(−D).

Note that both sets are non-empty and convex, the latter one being open. Supposing S₁ ∩ S₂ ≠ ∅, we get some z̃ ∈ K such that [∇Φ(z)](z̃ − z) < 0 and R(z) + [∇R(z)](z̃ − z) < 0. By the definition of the Gâteaux differential, for 0 < ε < 1 small enough we have also Φ(z + ε(z̃ − z)) < Φ(z) and R(z + ε(z̃ − z)) < 0. Since K is convex, z + ε(z̃ − z) ∈ K, so that this point is admissible for (P), and therefore z cannot be the optimal solution to (P), a contradiction. Thus S₁ ∩ S₂ = ∅. Then, by the Eidelheit theorem, they can be separated by a continuous linear functional, i.e., there is (λ₀*, λ₁*) ∈ ℝ × Λ₁* = (ℝ × Λ₁)* different from zero (this gives already (3.30)) such that ⟨(λ₀*, λ₁*), S₁⟩ ≥ 0 and ⟨(λ₀*, λ₁*), S₂⟩ ≤ 0. The latter inequality implies λ₀* ≥ 0 and λ₁* ∈ −D°, i.e. λ₁* ≥ 0, while the former one yields

(3.33)    ∀z̃ ∈ K:   λ₀* [∇Φ(z)](z̃ − z) + ⟨λ₁*, R(z) + [∇R(z)](z̃ − z)⟩ ≥ 0.

57 This type of optimality conditions was first invented by F. John [245]. The assertion presented here is basically due to Casas [99, Theorem 5.2]. For the case that K has a nonempty interior, see also Zeidler [478, Sect. 48.3], where an infinite-dimensional Λ₁ is also admitted provided certain additional assumptions on D and ∇R are imposed. For certain special data Φ, R, and D, an infinite-dimensional Λ₁ is also admitted in Ioffe and Tikhomirov [239, Section 1.1.4].
58 This proof is essentially due to Casas [99].


Especially, for z̃ = z, we get ⟨λ₁*, R(z)⟩ ≥ 0. Since λ₁* ≥ 0 and R(z) ≤ 0, we have also ⟨λ₁*, R(z)⟩ ≤ 0, which proves (3.29). Putting (3.29) into (3.33), one can write the resulting inequality just in the form (3.28). □

1.3.13. Convention. In fact, the mappings Φ and R need not be defined on the whole space Z but only on the convex subset K. Then the meaning of the differential ∇R ∈ ℒ(Z, Λ₁) of R at a point z ∈ K is that [∇R(z)](z̃ − z) = lim_{ε↘0} (R(z + ε(z̃ − z)) − R(z))/ε for z̃ ∈ K only (and not for z̃ ∈ Z as usual). The modification for Φ is straightforward. This may cause the differentials to be determined uniquely only up to a closed linear subspace provided K is "flat". We will occasionally use this convention in what follows.

The first multiplier λ₀* in the F. John conditions can sometimes degenerate to zero, and then such conditions become not very selective because the cost function Φ falls completely out.⁵⁹ Therefore, the so-called normal case λ₀* > 0 (or, equivalently, λ₀* = 1) is of particular interest:

1.3.14. Proposition.⁶⁰ (Karush–Kuhn–Tucker conditions.) Let (3.28)–(3.30) hold, int(D) ≠ ∅, and let the so-called Mangasarian–Fromovitz constraint qualification⁶¹ be valid at the optimal solution z:

(3.34)    ∃z̃ ∈ T_K(z):   [∇R(z)](z̃) < 0.

Then λ₀* = 1.

Proof. As the tangent cone T_K(z) equals cl_Z(⋃_{a>0} a(K − z)), we can modify (3.34) to the condition: ∃z̃ ∈ K: [∇R(z)](z̃ − z) < 0. Supposing λ₀* = 0 for a moment, (3.30) yields λ₁* ≠ 0, and then (3.28) says that ⟨[∇R(z)]* λ₁*, z̃ − z⟩ ≥ 0 for any z̃ ∈ K or, equivalently, ⟨λ₁*, [∇R(z)](z̃ − z)⟩ ≥ 0. As λ₁* ≥ 0 but λ₁* ≠ 0, we get ⟨λ₁*, [∇R(z)](z̃ − z)⟩ < 0 for the special z̃ ∈ K from (3.34). This is a contradiction, implying λ₀* > 0. □

1.3.15. Remark. (Convex problems.) An important class of problems, namely those having a linear/convex structure of the constraints and possibly also of the cost functional, allows us to strengthen the results.⁶² For example, if Z and Λ₁ are Banach

59 A model example appears for the data Z = K := ℝ², Λ₁ := ℝ², D := [0, +∞)², Φ(z) := z₂, and R(z) := (−z₁, z₁ − z₂³). Then the unique solution to (P) is z = (0, 0) and λ₀* = 0 while λ₁* = (1, 1), the multipliers (λ₀*, λ₁*) being determined uniquely up to positive multiplicative constants.
60 Conditions of such a kind were first isolated by Karush in his thesis [251] and later, independently, once again by Kuhn and Tucker [276]. There are many generalizations, for example by Zowe and Kurcyusz [488].
61 Cf. Mangasarian and Fromovitz [302].
62 For a more extensive study the reader is referred, e.g., to the monographs by Aubin and Ekeland [22], Ekeland and Temam [164], or Zeidler [478].


spaces, R is linear, and Φ is convex and Gâteaux differentiable, then (3.28)–(3.29) hold⁶³ with λ₀* = 1 and λ₁* ≥ 0 provided 0 ∈ int(R(K) + D). The last condition is fulfilled, e.g., if ∃z̃ ∈ K: R(z̃) < 0; this works⁶⁴ also in the case that R is Gâteaux differentiable and D-convex. In particular, if Z and Λ₁ are Banach spaces and Φ is Gâteaux differentiable (but not necessarily convex) but D = {0} and R is linear (so that we admit linear equality constraints only), then (3.28)–(3.29) hold⁶⁵ with λ₀* = 1 and λ₁* ∈ Λ₁* provided R is surjective on K, i.e. R(K) = Λ₁. Moreover, in the convex case the F. John conditions are even sufficient: if Φ is convex, R is D-convex, and if z ∈ K satisfies R(z) ≤ 0 and (3.28)–(3.29) for some λ₀* > 0 and λ₁* ≥ 0, then z ∈ Argmin(P).

To evaluate the gradient needed for the above optimality conditions, we often face the peculiarity that both Φ and R involve some implicit mapping π : K → Y determined so that y := π(z) is the (unique) solution to the abstract, so-called state equation

(3.35)    Π(z, y) = 0,

where Y is a Banach space and Π : Z × Y → Λ, with Λ another Banach space. We call π the state mapping and suppose that

(3.36)    Φ(z) := J(z, π(z)),    R(z) := B(z, π(z))

for some J : Z × Y → ℝ and B : Z × Y → Λ₁. Then our task is to evaluate the gradients of Φ and R from the knowledge of the gradients of J and B; note that the gradient of π is assumed not to be obtainable explicitly, which is actually the very typical situation in optimal control problems unless the state equation (3.35) is merely elementary. The presented method is called the adjoint-equation technique.⁶⁶ Let us formulate the results only for the mappings R and B, the case of Φ and J being only a simplification. We will say that Π(·, y) : Z → Λ is, at a point z ∈ K, Gâteaux equi-differentiable around y ∈ Y if, for any z̃ ∈ K, it holds that ‖(Π(z + ε(z̃ − z), ỹ) − Π(z, ỹ))/ε − [∇_z Π(z, ỹ)](z̃ − z)‖_Λ = o_z̃(ε) with some o_z̃ : ℝ⁺ → ℝ⁺ such

63 It follows, e.g., from Aubin and Ekeland [22, Chapter 4, Sect. 6, Corollary 3].
64 We refer to Ekeland and Temam [164, Chapter 3, Proposition 5.1], where this result is formulated even for the case when Φ + λ₁* ∘ R is not differentiable.
65 It follows from the condition ∇Φ(z) ∈ −N_{K∩Ker R}(z) = Im R* − N_K(z); for the last identity we refer, e.g., to Aubin and Ekeland [22, Section 4.1, Corollary 17] if 0 ∈ int R(K).
66 This technique is based on the implicit-mapping theorem. In the context of optimal control, it was basically originated by Luenberger [294, Sect. 9.6].


that lim_{ε↘0} o_z̃(ε) = 0 for all ỹ in some neighbourhood (depending possibly on z̃) of the current point y.

1.3.16. Lemma. (Adjoint-equation technique.) Let R be given by (3.36), let Π(z, ·) : Y → Λ and B(z, ·) : Y → Λ₁ be Fréchet differentiable at y = π(z), and let Π(·, y) : Z → Λ and B(·, y) : Z → Λ₁ be, at z ∈ K, Gâteaux equi-differentiable around y ∈ Y (in the sense of Convention 1.3.13); by ∇_z and ∇_y we will denote the respective partial differentials.⁶⁷ Moreover, for any z̃ ∈ K, let the mappings [∇_z Π(z, ·)](z̃) : Y → Λ and [∇_z B(z, ·)](z̃) : Y → Λ₁ be continuous, and also let the state mapping π : K → Y be continuous. Eventually, let the so-called adjoint equation

(3.37)    P ∘ ∇_y Π(z, y) = ∇_y B(z, y)

have, for y = π(z), a solution⁶⁸ P ∈ ℒ(Λ, Λ₁). Then R is Gâteaux differentiable at z (in the sense of Convention 1.3.13), with the differential given by

(3.38)    ∇R(z) = ∇_z B(z, y) − P ∘ ∇_z Π(z, y) ∈ ℒ(Z, Λ₁).

Proof.⁶⁹ For z, z̃ ∈ K and 0 < ε < 1, we put z_ε := z + ε(z̃ − z). Obviously, z_ε ∈ K because K is convex. Besides y := π(z), we abbreviate also y_ε := π(z_ε). Then we can write

(3.39)    (R(z_ε) − R(z))/ε  =  (B(z_ε, y_ε) − B(z, y_ε))/ε  +  (B(z, y_ε) − B(z, y))/ε.

The fact that B(·, y_ε) is Gâteaux equi-differentiable yields ‖(B(z_ε, y_ε) − B(z, y_ε))/ε − [∇_z B(z, y_ε)](z̃ − z)‖_{Λ₁} = o₁(ε) for ε ↘ 0, with o₁ depending possibly on z̃. Then the first term approaches [∇_z B(z, y)](z̃ − z), thanks to the estimate

    ‖(B(z_ε, y_ε) − B(z, y_ε))/ε − [∇_z B(z, y)](z̃ − z)‖_{Λ₁}
        ≤ ‖[∇_z B(z, y_ε) − ∇_z B(z, y)](z̃ − z)‖_{Λ₁} + o₁(ε) ≤ õ_z̃(ε) + o₁(ε) =: o₂(ε),

where õ_z̃(ε) := ‖[∇_z B(z, y_ε) − ∇_z B(z, y)](z̃ − z)‖_{Λ₁} tends to zero for ε ↘ 0 because [∇_z B(z, ·)](z̃ − z) is assumed continuous and because y_ε → y thanks to the assumed continuity of the state mapping π.⁶⁷

67 The equi-differentiability can be ensured by the joint continuity of ∇_z Π and ∇_z B. Indeed, this follows from the formula ‖(Π(z + ε(z̃ − z), y) − Π(z, y))/ε − [∇_z Π(z, y)](z̃ − z)‖_Λ ≤ sup_{0≤ϑ≤ε} ‖∇_z Π(z + ϑ(z̃ − z), y) − ∇_z Π(z, y)‖_{ℒ(Z,Λ)} ‖z̃ − z‖_Z; cf. e.g. Kolmogorov and Fomin [264, Section X.1.3]. The joint continuity of ∇_z Π then ensures that the corresponding remainder o_z̃(ε) tends to zero uniformly for y ranging over a neighbourhood of the current state. Moreover, the inverse [∇_y Π(z, y)]⁻¹ ∈ ℒ(Λ, Y) depends continuously on the data: it suffices to apply the implicit-mapping theorem to the mapping f : ℒ(Y, Λ) × ℒ(Λ, Y) → ℒ(Y, Y) defined by f(A₁, A₂) := A₂ ∘ A₁ − id at the point (A₁, A₂) := (∇_y Π(z, y), [∇_y Π(z, y)]⁻¹). By the remaining assumptions, the mapping z ↦ P(z) := ∇_y B(z, π(z)) ∘ [∇_y Π(z, π(z))]⁻¹ is then continuous, and then also z ↦ ∇R(z) = ∇_z B(z, π(z)) − P(z) ∘ ∇_z Π(z, π(z)) is continuous, as claimed.
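The adjoint-equation technique of Lemma 1.3.16 can be sketched in finite dimensions and its output compared against difference quotients. In this toy instance (my data, not the book's): Z = Y = Λ := ℝⁿ, state equation Π(z, y) := Ay − z = 0, and Φ(z) := J(z, π(z)) with J(z, y) := ½ yᵀy; the adjoint equation (3.37) then reads Aᵀp = ∇_y J(z, y) = y, and (3.38) gives ∇Φ(z) = p, since ∇_z J = 0 and ∇_z Π = −I.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # well-conditioned state operator
z = rng.standard_normal(n)

y = np.linalg.solve(A, z)     # state y = pi(z): solve Pi(z, y) = A y - z = 0
p = np.linalg.solve(A.T, y)   # adjoint state from A^T p = grad_y J(z, y) = y
grad = p                      # (3.38): grad Phi(z) = grad_z J - P o grad_z Pi = p

def phi(z):                   # Phi(z) = J(z, pi(z)), evaluated from scratch
    yy = np.linalg.solve(A, z)
    return 0.5 * yy @ yy

h = 1e-6                      # central finite-difference check, component by component
fd = np.array([(phi(z + h * e) - phi(z - h * e)) / (2 * h) for e in np.eye(n)])
print(np.max(np.abs(fd - grad)))  # discrepancy is of finite-difference size only
```

This is exactly the kind of derivative check the adjoint methodology invites: compute the gradient once via one extra linear solve with the transposed operator, then compare a few components with difference quotients to localize coding mistakes.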


1.3.18. Remark. (The adjoint state.) The solution P ∈ ℒ(Λ, Λ₁) of the adjoint equation (3.37), called the adjoint state, has a very concrete meaning. Namely, it is the Fréchet differential of the mapping Λ → Λ₁ : λ ↦ B(z, y_λ), where y_λ ∈ Y is the unique solution of the perturbed state equation

(3.41)    Π(z, y_λ) = λ.

Indeed, subtracting (3.35) from (3.41), one gets

(3.42)    λ = Π(z, y_λ) − Π(z, y) = [∇_y Π(z, y)](y_λ − y) + o₁(‖y_λ − y‖_Y),

where o₁ : ℝ⁺ → Λ is some mapping such that lim_{ε↘0} o₁(ε)/ε = 0. Also, we can write B(z, y_λ) − B(z, y) = [∇_y B(z, y)](y_λ − y) + o₂(‖y_λ − y‖_Y), where o₂ : ℝ⁺ → Λ₁ is some mapping such that lim_{ε↘0} o₂(ε)/ε = 0. Applying the mapping P to (3.42) and using the adjoint equation (3.37), we get Pλ = [∇_y B(z, y)](y_λ − y) + P o₁(‖y_λ − y‖_Y). Eventually, we can write⁷¹

(3.43)    B(z, y_λ) − B(z, y) = Pλ − P o₁(‖y_λ − y‖_Y) + o₂(‖y_λ − y‖_Y).

By the definition of the differential, P is thus the differential at the point 0 of the mapping Λ → Λ₁ : λ ↦ B(z, y_λ), as claimed.

At the end of this section, let us apply Proposition 1.3.12 together with Lemma 1.3.16 to an abstract optimal-control problem:

(P′)    Minimize   J(z, y)                      for z ∈ Z, y ∈ Y,
        subject to  Π(z, y) = 0,  B(z, y) ≤ 0,  z ∈ K,

supposing again that, for every "control" z ∈ K, the state equation Π(z, y) = 0 has a unique solution y = π(z).

'The formula (3.43) is very useful within practical realization of the evaluation of the differential via the adjoint equation technique because it enables us to check numerically the value of the adjoint state Ρ by comparing its components with those ones computed approximately as differential quotients from (3.43). The realization is extremely simple because it just suffices to perturb the right-hand side of the state equation, see (3.41). The necessity of checking the adjoint state is very urgent even for quite simple tasks because it enables us to localize possible (and practically almost inevitable) mistakes created within writing a computer code.

1.3 Optimization theory such

that AQ > 0, λ* > 0, (AQ, λ*) φ 0,

(3.44)

33

and

+

(3.45)

€ -Νκ(ζ) ,

[ν,Π(ζ,^Π* =

(3.46)

Xp?yJ(z,y)

+

(λΐΒ(ζ,γ)) =

[VyB(z,y)]*Xi

,

0.

We shall apply Proposition 1.3.12 to the problem (P') with the data determined by (3.36); note that, due to our assumptions, Lemma 1.3.16 ensures that Φ and R are Gateaux differentiable, as required in Proposition 1.3.12. It gives the existence of AQ > 0 and λ* € Λ* satisfying (3.30), (3.46), and (3.28). Now we apply Lemma 1.3.16 to evaluate both ν Φ and V/?. As ν^Πίζ, y) is supposed to be invertible at the point (z,y) in question, the adjoint equation (3.37) has a solution Ρ G ££(A, Ai); this means we have Proof.

(3.47)

Ρ ο

V y II(z,

y)

=

VyB(z,y)

.

Then (3.38) gives (3.48)

VR(z)

=

VzB(z,y)

- Ρ ο VzU(z,y)

.

Replacing B, R, and Ai respectively with J, Φ, and R, one gets (3.49)

νΦ(ζ) = V z / ( z , ) 0 - [ V z n ( z , ) 0 ] * Ä * ,

where λ* € A* solves the adjoint equation (3.50)

[VyIl(z,y)]*X*

=

VyJ(z,y) .

Summing (3.49) and (3.48) with the "weights" λJ and λ\, we get Α^νΦ(ζ)

+ λ\ ο

VR(z)

=

λ*0νζ]

=

W

z

{ζ, y) + λ\ ο VzB(z,

J { z , y ) + k \ o V

z

y) -

(λ*0λ* + λ\ ο Ρ) ο ν ζ Π ( ζ , y)

B ( z , y ) - X *

oVzn(z,y) ,

where we have abbreviated λ* := λ*0λ* + λ\ ο Ρ € i£(A, Ε) = Λ*. By (3.28) we immediately get (3.44). Summing (3.50) and (3.47) with the same weights XQ and λ*, we get the "adjoint equation" for λ*, namely (3.45). • 1.3.20. Remark. (An alternative approach.) The transformation (3.36), which turned the problem (P') into the original general form (P), is not the only possibility. We could also imagine the optimal control problem (P') not as an optimization problem on the space of "controls" Ζ, but as a constrained minimization problem

34

1 Background generalities

on the product space of the pairs control-state Ζ χ Y. Then the corresponding problem (P) would bear the data Ζ χ Υ , Κ χ Υ , Α χ A\, {0} χ D, J, and (Π, B) in place of Ζ , K , A l 5 D , Φ, and R, respectively. The optimality conditions like in Proposition 1.3.12 would yield some multipliers ( λ $ , λ * , λ \ ) € ( Ε χ Λ χ AO* which play the same role as those ones obtained in Corollary 1.3.19; in particular, λ* can thus be identified as the Lagrange multiplier to the constraint Π(ζ, y) = 0. Nevertheless, the results would have been slightly different; e.g., we would obtain the non-degeneracy (AQ,A*,AJ) φ 0 only, instead of our (AQ,A*) φ 0. Besides, Proposition 1.3.12 could not be applied directly to such a problem because the cone {0} χ D has an empty interior. It is plain that the approach presented above reflects much better the nature of optimal control problems. 1.3.21. Remark. {Approximation of (P')·) A standard situation is that we cannot solve directly the state equation Π(ζ, y) = 0 but rather some "approximate" state equation I U z , ; y ) = 0 coming from the necessity of solving the original state equation numerically or/and from inexact knowledge of the original state mapping Π. Numerical treatment also typically requires an approximation of Κ by some Kd. Supposing, for simplicity, that the other data J and Β can be handled exactly, we face the following perturbed optimal control problem:

{

Minimize J(z,y)

for ζ G Z , y € Υ ,

subject to Γ U z , y) = 0, Β (ζ, y) < 0 , ζ e Kd .

The realistic assumptions are the following: Π, E U B, J continuous, Κ and Kd compact (or possibly only o -compact if a suitable coercivity structure is presented in the problem), there exists a continuous mappings π , Jt d : Κ Y such that rT(z,y) = 0 if and only if y = π (ζ) and I U z , ; y ) = 0 if and only if y — ^ ( z ) , c

and Kd —> π and Kd Κ in the sense (3.14). It should be emphasized that such quite strong assumptions still do not guarantee any mode of convergence of Argmin(P^) to Argmin(P') for d —» 0 even in very elementary situations. 72 Nevertheless, an approximate treatment of the state constraints can help here. Let us illustrate this again with the simplest variant - the penalty function technique. This gives rise to an approximate penalized optimal control problem:

{P> ,P

72

)

f Minimize Jp(z,y)\= J (z,y) + }co(B(z,y)) s \ subject to F U z , y) = 0 , ζ 6 Kd ,

for ζ e Z , y e Υ ,

Indeed, for Ζ = Υ = A = Λ, := R, Κ = Kd := [0, 1], D := [0, +oo), J(z, y) := -y, Π(ζ, y) := ζ — y, Πd(z,y) := (1 + d)z — y, and B(z, y) := (z — y)2, we obtain the situation that 2>ad(P') = {(z,y) e [0, l ] x R ; ζ = y} while Söad(P^) = {(0, 0)} for any d > 0, and therefore apparently Argmin(P^) = {(0, 0)} does not approximate Argmin(P') = {(1, 1)}.

where ω : Λ₁ → ℝ is a continuous function satisfying (3.13). Then we can easily see that there exists a function κ : ℝ⁺ → ℝ⁺ such that⁷³

(3.51)    lim_{d,ρ→0, d≤κ(ρ)}  min(P′_{d,ρ}) = min(P′).

1.4 Function and measure spaces

The set of all additive set functions Σ → ℝ with bounded variations will be denoted by ba(Ω; Σ), and its subset consisting of σ-additive set functions will be denoted by ca(Ω; Σ). If, in addition, Ω is a topological space, we can define rba(Ω; Σ) as the collection of all regular additive set functions Σ → ℝ with a bounded variation, and by

92 See, e.g., Dunford and Schwartz [160, Corollary III.6.16] or Kolmogorov and Fomin [264, Section V.5.5].
93 Note that the linear hull of all characteristic functions χ_A with A ⊂ Ω measurable is dense in L^∞(Ω) = L¹(Ω)*, so that the sequence {m_k}, being bounded in L¹(Ω), converges weakly in L¹(Ω) and, as such, it is relatively sequentially weakly compact, hence by the Eberlein–Šmuljan theorem relatively weakly compact, too.

1 Background generalities

40

rca(Ω;Σ) we denote its subset consisting of σ-additive set functions. The smallest σ-algebra containing all open subsets of Ω consists just of all Borel subsets of Ω and, as such, it will be called the Borel σ-algebra. Typically, Ω will be a domain in R^n endowed not only with the Euclidean topology, but also with the Lebesgue measure. Then another natural choice for the σ-algebra Σ is the set of all subsets of Ω that are measurable with respect to the Lebesgue measure.94 Then by vba(Ω;Σ) we denote the set of all additive set functions with bounded variations that vanish on sets having Lebesgue measure zero. All the introduced spaces ba(Ω;Σ), ca(Ω;Σ), rba(Ω;Σ), rca(Ω;Σ), vba(Ω;Σ) are linear vector spaces which can be normed by means of the variation, i.e. ||μ|| := |μ|(Ω). This makes them Banach spaces. Let us remark that σ-additive set functions defined on a σ-algebra are called measures, while the additive set functions are sometimes also called finitely additive measures. If Ω is a topological space and Σ its Borel σ-algebra, the measures from ca(Ω;Σ) will be called Borel measures, while the elements of rca(Ω;Σ) will be then addressed as Radon measures. Moreover, a positive (finitely additive) measure μ will be called a probability measure if μ(Ω) = 1. The convex subsets of positive (resp. probability) measures will be denoted by a superscript "+" (resp. a subscript "1" with "+"), for example rca^+(Ω;Σ) or ba^+(Ω;Σ) (resp. rca_1^+(Ω;Σ) or rba_1^+(Ω;Σ)). An important example of a probability measure is the Dirac measure δ_x supported at a point x ∈ Ω, which is defined for any subset A ∈ Σ by

(4.3)    δ_x(A) := 1 if x ∈ A,  δ_x(A) := 0 otherwise.

1.4.8. Theorem.95 (Extreme probability measures.) (i) The Dirac measures are extreme points in the set of all probability measures. (ii) Conversely, if Ω is compact, then every extreme point in rca_1^+(Ω;Σ) is of the form δ_x for some x ∈ Ω.

By B(Ω) we will denote the space of all bounded functions Ω → R endowed with the Chebyshev norm ||u|| := sup_{x∈Ω} |u(x)|, which makes it a Banach space. If Ω bears also a topology, say τ, we denote by C^0(Ω) := C(Ω) ∩ B(Ω) the linear
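In the finite, discrete setting Theorem 1.4.8 reduces to the elementary fact that the extreme points of the probability simplex are exactly the unit (Dirac) vectors. A small numerical illustration (not part of the original text; the mass-shift ε is an arbitrary choice):

```python
# Hedged finite-dimensional illustration of Theorem 1.4.8: on Omega = {0,1,2},
# probability measures are vectors p >= 0 summing to 1, Dirac measures are
# the unit vectors, and any non-Dirac p is the midpoint of two distinct
# probability vectors, hence not an extreme point of the simplex.
def is_dirac(p):
    return sorted(p) == [0.0] * (len(p) - 1) + [1.0]

def nontrivial_split(p, eps=1e-3):
    # shift a little mass between two coordinates lying strictly in (0,1)
    i, j = [k for k, pk in enumerate(p) if 0.0 < pk < 1.0][:2]
    q1, q2 = list(p), list(p)
    q1[i] += eps; q1[j] -= eps
    q2[i] -= eps; q2[j] += eps
    return q1, q2

p = [0.5, 0.3, 0.2]                      # a mixture, not a Dirac measure
q1, q2 = nontrivial_split(p)
assert q1 != q2 and min(q1 + q2) >= 0.0
assert abs(sum(q1) - 1.0) < 1e-12 and abs(sum(q2) - 1.0) < 1e-12
assert all(abs((a + b) / 2 - c) < 1e-12 for a, b, c in zip(q1, q2, p))
assert is_dirac([0.0, 1.0, 0.0])         # Dirac measures are the vertices
```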

94 In fact, this is the so-called Lebesgue extension of the Borel σ-algebra, created by adding all subsets of sets having the measure zero.
95 For the point (i) see, e.g., Köthe [265, Section 25.2]. For Ω compact see also Dunford and Schwartz [160, Lemma V.8.6] and realize that, if δ_x is an extreme point in the unit ball of rca(Ω;Σ), it remains extreme in rca_1^+(Ω;Σ), too. The point (ii) follows from Dunford and Schwartz [160, Lemma V.8.5] if one realizes that, by Proposition 1.5.1 and Remark 1.5.2, rca_1^+(Ω;Σ) is the weak* closed convex hull of {δ_x; x ∈ Ω}.

A measure σ ∈ rca(Ω̄) is called absolutely continuous (with respect to the Lebesgue measure) if ∀ε > 0 ∃δ > 0 ∀A ⊂ Ω measurable: |A| < δ ⟹ |σ(A)| < ε. Also the converse assertion is true: every absolutely continuous measure possesses a density belonging to L^1(Ω). This is known as the Radon–Nikodým theorem [359], [333].97 Every measure σ ∈ rca(Ω̄) admits98 a uniquely determined decomposition σ = σ_1 + σ_2 where σ_1 is absolutely continuous and σ_2 is singular (with respect to the Lebesgue measure) in the sense that it is supported on some subset of Ω having the Lebesgue measure zero; the splitting σ = σ_1 + σ_2 is called the Lebesgue decomposition. Considering two Lebesgue measurable sets Ω_1 ⊂ R^{m_1} and Ω_2 ⊂ R^{m_2}, the identity

∫_{Ω_1×Ω_2} g(x_1,x_2) d(x_1,x_2) = ∫_{Ω_1} ∫_{Ω_2} g(x_1,x_2) dx_2 dx_1 = ∫_{Ω_2} ∫_{Ω_1} g(x_1,x_2) dx_1 dx_2

holds provided g ∈ L^1(Ω_1 × Ω_2) or provided one of the double integrals does exist and is finite. This is known as the Fubini theorem99 [194].

Let us turn our attention to functions which enjoy some smoothness. Considering Ω ⊂ R^n open and an increasing sequence of compact subsets K_i ⊂ Ω such that Ω = ∪_{i∈N} K_i, we put D(Ω) := ∪_{i∈N} ∩_{k∈N} D^k_{K_i}(Ω), where D^k_{K_i}(Ω) denotes the space of all functions Ω → R which are continuous together with all their derivatives up to the order k and which have the support contained in K_i. Each D_{K_i}(Ω) := ∩_{k∈N} D^k_{K_i}(Ω) is endowed with the collection of seminorms (|·|_{k,K_i})_{k∈N}. … The derivative ∂^k u / ∂x_1^{k_1}…∂x_n^{k_n} with k = k_1 + … + k_n, k_i ≥ 0 for any i = 1,…,n, is defined as a distribution such that

⟨∂^k u / ∂x_1^{k_1}…∂x_n^{k_n}, g⟩ = (−1)^k ⟨u, ∂^k g / ∂x_1^{k_1}…∂x_n^{k_n}⟩

for any g ∈ D(Ω;R^m).

97 If σ has a density d_σ, we will use the notation σ(dx) = d_σ(x) dx.
103 See Adams [3, Theorem 3.5].
104 This means that Γ can be divided into a finite number of overlapping parts, each of them being a graph of a scalar Lipschitz function on an open subset of R^{n−1}.
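The Fubini theorem recalled above has an elementary discrete analogue: for a summable "density" on a product grid, the double sum equals either iterated sum. A hedged sketch (the grid and integrand are arbitrary illustrative choices):

```python
# Discrete analogue of the Fubini identity: double sum over a product grid
# equals the two iterated sums, in either order.
g = [[(i + 1) * (j + 2) ** 0.5 for j in range(40)] for i in range(30)]

double  = sum(g[i][j] for i in range(30) for j in range(40))
iter_12 = sum(sum(row) for row in g)                              # x1 outer
iter_21 = sum(sum(g[i][j] for i in range(30)) for j in range(40)) # x2 outer
assert abs(double - iter_12) < 1e-6 and abs(double - iter_21) < 1e-6
```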


domains on which the former definition of Sobolev–Slobodeckii spaces can be already used.105 The closed linear subspace {u ∈ W^{1,p}(Ω;R^m); u|_Γ = 0} is denoted by W_0^{1,p}(Ω;R^m).

Relations between various function and measure spaces are often in the form of inclusions. Such inclusions are always linear operators which can have some additional properties: in particular, an imbedding is called continuous, compact, dense, or homeomorphical if the corresponding linear operator is continuous, compact, has a dense range, or the inverse operator (restricted to the range of the original operator) is continuous together with the original operator, respectively. The following imbedding theorems will be often used: The imbedding C(Ω̄) ⊂ L^p(Ω) is always continuous, and for 1 ≤ p < +∞ it is dense but not homeomorphical, while for p = +∞ it is homeomorphical but not dense. For 1 ≤ p ≤ q ≤ +∞, we have the continuous dense imbedding L^q(Ω) ⊂ L^p(Ω) (recall that we supposed Ω bounded, hence |Ω| < +∞; otherwise this imbedding would not hold). Neither of the mentioned imbeddings is compact. On the other hand, we have106

(4.6)    1/p > 1/q − k/n  ⟹  W^{k,q}(Ω) ⊂ L^p(Ω)  compactly;

recall that n is the dimension of Ω ⊂ R^n. If 1/p = 1/q − k/n, then in general the imbedding W^{k,q}(Ω) ⊂ L^p(Ω) is only continuous, provided kq < n or kq = n = 1. Also, for kq > n ≥ 2 or for n = 1, W^{k,q}(Ω) is continuously imbedded into C(Ω̄).

The imbeddings can be transposed, thus resulting in relations between the respective dual spaces. Having two function spaces G_1 ⊂ G_2 and the continuous imbedding I : G_1 → G_2, the adjoint operator I* : G_2* → G_1* makes just the restriction onto G_1 of the linear continuous functionals on G_2. Let us distinguish two typical situations:
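The continuous imbedding L^q(Ω) ⊂ L^p(Ω) for p ≤ q on a bounded Ω mentioned above follows from Hölder's inequality, which gives ||u||_p ≤ |Ω|^{1/p−1/q} ||u||_q. A quick numerical sanity check (not from the original text; Ω = (0,1), so the constant is 1, and the sample function is an arbitrary choice):

```python
# Hedged check of ||u||_p <= ||u||_q on Omega = (0,1) for p <= q,
# using a midpoint-rule approximation of the L^r norms.
N = 100000
xs = [(i + 0.5) / N for i in range(N)]

def norm(u, r):
    return (sum(abs(u(x)) ** r for x in xs) / N) ** (1.0 / r)

u = lambda x: x ** (-0.2)          # lies in L^r(0,1) for all r < 5
for p, q in [(1, 2), (2, 4), (1.5, 3)]:
    assert norm(u, p) <= norm(u, q) + 1e-9
```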

(A) The imbedding I : G_1 → G_2 is homeomorphical. Then the adjoint operator I* : G_2* → G_1* is surjective (because every linear continuous functional on G_1 remains continuous also with respect to the topology induced from G_2 and can be then extended onto G_2 by the Hahn–Banach theorem 1.2.3).

(B) The imbedding I : G_1 → G_2 is continuous and dense. Then the adjoint operator I* : G_2* → G_1* is injective (because two different linear continuous functionals on G_2 must also have different traces on any dense subset, in particular on G_1).

Sometimes the above cases can appear simultaneously, which gives rise to the third situation:

105 See, e.g., Adams [3] or Kufner, Fučík and John [275].
106 Recall that throughout the book we use the convention 1/p := 0 for p = +∞.


(AB) The imbedding I : G_1 → G_2 is homeomorphical and dense. Then the adjoint operator I* : G_2* → G_1* is one-to-one. Though I* need not be a (weak*,weak*)-homeomorphism, it is the (weak*,weak*)-homeomorphism if restricted to a ball in G_2* (which is weakly* compact). In addition, if G_1 and G_2 are normed spaces (so that the duals G_1* and G_2* are Banach spaces), then the inverse operator (I*)^{−1} is (norm,norm)-continuous, a consequence of the open-mapping theorem. In the situation (B) and thus also (AB), it is a common convention to consider G_2* imbedded via I* into G_1* and then not to distinguish between elements of G_2* and their images in G_1*.

1.4.10. Example. (Intermediate subspace G.) Let Ω := [−1,1] and let G be a linear space of functions g : Ω → R which are possibly discontinuous at 0 but possess the unilateral limits at 0 and also at −1 and 1, i.e. g|_{(−1,0)} ∈ C([−1,0]) and g|_{(0,1)} ∈ C([0,1]). Of course, we endow G with the supremum norm. Obviously we have

C([−1,1]) ⊂ G ⊂ L^∞([−1,1]),

both imbeddings107 being homeomorphical but not dense, i.e. of the type (A) but not (B). Again, the dual space G* can be identified with the space of certain measures, namely rca([−1,0]) × rca([0,1]). The relations between the dual spaces are obviously the surjections: vba[−1,1] → G* → rca[−1,1]. Neither of these surjections is invertible. For example, for any a ∈ R, the mapping aδ_{0−} + (1−a)δ_{0+} defined by

[aδ_{0−} + (1−a)δ_{0+}](g) = a lim_{x↗0} g(x) + (1−a) lim_{x↘0} g(x)

forms a linear continuous functional on G. Obviously, if g ∈ C([−1,1]), then [aδ_{0−} + (1−a)δ_{0+}](g) = g(0).

The transposed diagram is the following (the labels "sur" and "inj" indicate respectively the surjectivity or injectivity of the mapping corresponding to the particular arrow):

D(Ω)*  ←(inj)−  rca(Ω)  ←(sur)−  G*  ←(sur)−  vba(Ω).

The relations between the involved spaces are accomplished by the observation that the first diagram is connected with the second one because we have always the imbedding L^p(Ω) → vba(Ω). For p ≥ 2, we have an even stronger connection because there is the imbedding L^p(Ω) → L^{p/(p−1)}(Ω).

1.4.12. Remark. (Insufficiency of the concept of sequences.) The weak* topology on vba(Ω) = L^1(Ω)** is not metrizable even if restricted to bounded subsets, which is related to the fact that L^∞(Ω) is not separable. Besides, this is an example of a situation where sequences are not a satisfactory tool. Namely, no element from the remainder vba(Ω) \ L^1(Ω) can be attained (with respect to the weak* topology) by a sequence from L^1(Ω), though L^1(Ω) is dense in vba(Ω). Indeed, if it were possible, such a sequence would be weakly Cauchy in L^1(Ω) because the trace of the weak* topology of vba(Ω) coincides with the weak topology of L^1(Ω). However, the limit of such a sequence must live in L^1(Ω) because L^1(Ω) is sequentially weakly complete.109

108 Cf. also the example by Lang [280, Section VII.4].
109 For this nontrivial fact the reader is referred, e.g., to Dunford and Schwartz [160, Section IV.8] or Edwards [162, Theorem 4.21.4].


1.5 Means of continuous functions

As we have already introduced the concepts of compactifications (Section 1.1), dual spaces (Section 1.2) and spaces of continuous functions (Section 1.4), we are ready to present (to a minimal necessary extent) the theory of means on spaces of continuous bounded functions, and in particular multiplicative means on rings of continuous bounded functions.110 Again, we will consider a topological space (U, τ) and the Banach space C^0(U) of all continuous, bounded, real-valued functions on U endowed with the Chebyshev norm ||f||_{C^0(U)} := sup_{u∈U} |f(u)|. Let us now consider a linear subspace F of C^0(U) containing constant functions. The set of all means111 on F is defined by (5.1)

M(F) := { μ ∈ F*;  ||μ||_{F*} = 1  and  μ(1) = 1 },

where F* denotes the dual of F endowed with the standard dual norm, i.e.

(5.2)    ||μ||_{F*} := sup_{f ∈ F, ||f||_{C^0(U)} ≤ 1} μ(f).

Moreover, e : U → M(F) is the evaluation mapping defined by (5.3)

⟨e(u), f⟩ := f(u)

for u ∈ U and f ∈ F. Besides the norm topology of the Banach space F*, the set M(F) will be also endowed with the (relativized) weak* topology. The natural ordering of C^0(U) relativized on F induces the dual ordering on F*, i.e. μ ≥ 0 means just that ⟨μ, f⟩ ≥ 0 for any f ≥ 0, f ∈ F.

1.5.1. Proposition.112 (Means.) Let F be a linear subspace of C^0(U) containing constants. Then:

110 For more details, the reader is referred to the monographs by, e.g., Berglund et al. [56], Čech [108], Edwards [162], Engelking [165], Gillman and Jerison [206], and Yosida [470].
111 The means can be defined even a bit more generally on a linear subspace F of bounded functions on U, not necessarily continuous and not necessarily containing constants. Namely, a mean μ is by definition a linear functional F → R such that inf_{u∈U} f(u) ≤ μ(f) ≤ sup_{u∈U} f(u); cf. Edwards [162, Sect. 3.5]. This coincides with our definition provided F ⊂ C^0(U) and 1 ∈ F.
112 We refer to Berglund, Junghenn, Milnes [56, Section 1.3]; however, the assertion presented here is a bit modified, e.g. C^0(U) is not a complex but a real algebra and F need not be closed.


(i) The set of all means on F can alternatively be expressed as:

(5.4)    M(F) = { μ : F → R linear;  μ ≥ 0  &  ⟨μ, 1⟩ = 1 }.

(ii) The evaluation mapping e : U → M(F) is weakly* continuous.

(iii) M(F) is a weakly* compact and convex subset of F*.

(iv) M(F) is the weak* closure of the set of all finite means.

Proof. Let μ ∈ M(F) and f ≥ 0, f ∈ F. Put f_max := sup f(U) and f_min := inf f(U). Obviously, f_min ≥ 0. Since μ ∈ M(F), |⟨μ, f − ½(f_max + f_min)⟩| ≤ ||f − ½(f_max + f_min)||_{C^0(U)} ≤ ½(f_max − f_min), so that ⟨μ, f⟩ ≥ ½(f_max + f_min) − ½(f_max − f_min) = f_min ≥ 0. Conversely, let us take μ : F → R linear such that μ ≥ 0 and ⟨μ, 1⟩ = 1. Furthermore, take f ∈ F and put f̃ := −f + ||f||_{C^0(U)}. Obviously, f̃ ≥ 0 and therefore ⟨μ, f̃⟩ = −⟨μ, f⟩ + ||f||_{C^0(U)} ≥ 0. This yields ⟨μ, f⟩ ≤ ||f||_{C^0(U)} for any f ∈ F. Therefore μ is continuous, i.e. μ ∈ F*, and even ||μ||_{F*} = 1. Thus (5.4) has been proved.

The weak* continuity of e : U → F* means precisely that u ↦ ⟨e(u), f⟩ = f(u) : U → R is continuous for any f ∈ F, which follows directly from the continuity of each f ∈ F ⊂ C^0(U). This shows (ii).

In view of (5.4), M(F) is convex and closed. By (5.1), M(F) is contained in the unit ball of F*, and therefore by the Alaoglu–Bourbaki theorem 1.2.2 it must be weakly* compact, as claimed in (iii).

Let us go on to (iv). Take μ ∈ M(F) and put M_1(F) := w*-cl(co(e(U))), the weak* closure of the set of finite means. If μ did not belong to M_1(F), then by the Hahn–Banach theorem 1.2.3 there would exist f ∈ F such that ⟨μ, f⟩ > sup_{ν∈M_1(F)} ⟨ν, f⟩; realize also that, by Theorem 1.2.1, every weakly* continuous linear functional on F* has the form μ ↦ ⟨μ, f⟩ for some f ∈ F. However, then ⟨μ, f⟩ > sup_{u∈U} ⟨e(u), f⟩ = sup_{u∈U} f(u), and we obtain a contradiction with the fact that every mean satisfies ⟨μ, f⟩ ≤ sup_{u∈U} f(u). This shows that μ ∈ M_1(F). □

1.5.2. Remark. (Connection between means and probability measures.) From (5.4) one can easily see that in the special case F := C^0(U) the set of all means is precisely the set of all probability measures on U. Thus, in view of the Riesz representation theorem 1.4.9, M(F) = rca_1^+(U) (resp. M(F) = rba_1^+(U)) if F = C^0(U) and U is compact (resp. normal). If F is smaller than C^0(U), the means

can alternatively be understood as classes of probability measures with respect to a suitable equivalence.113

Let us now turn our attention to multiplicative means on rings of continuous bounded functions. We call a subspace R ⊂ C^0(U) a ring if f_1, f_2 ∈ R implies f_1 f_2 ∈ R, where f_1 f_2 denotes the pointwise multiplication defined naturally by [f_1 f_2](u) := f_1(u) f_2(u). The ring is called complete if it is closed in C^0(U), contains constants,114 and eventually, for every A ⊂ U closed and u ∈ U \ A, contains f being equal 1 on A and vanishing at u. In particular, if (U, τ) is completely regular, then C^0(U) itself is a complete ring. The aim is to construct for every compactification of U its representation in terms of multiplicative means on a suitable ring.115 If R is a subring of C^0(U), then we can define (5.5)

M_mult(R) := { μ ∈ M(R);  ∀f_1, f_2 ∈ R :  ⟨μ, f_1 f_2⟩ = ⟨μ, f_1⟩⟨μ, f_2⟩ }.

…, it holds

(6.2)    y(t) ≤ ( C + ∫_0^t b(τ) dτ ) exp( ∫_0^t a(τ) dτ )
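In the finite, discrete setting the multiplicativity condition (5.5) pins down the Dirac (evaluation) functionals. A small numerical sketch (illustrative only, not part of the original development; the chosen test functions are an arbitrary spanning set for C^0(U) with U finite):

```python
# Hedged finite illustration of (5.5): for the full ring R = C^0(U) over a
# finite set U, a mean is a probability vector p, and multiplicativity
# <p, f*g> = <p, f><p, g> for all f, g forces p to be a Dirac (unit) vector.
from itertools import product

def mean(p, f):
    return sum(pi * fi for pi, fi in zip(p, f))

def is_multiplicative(p, test_functions):
    fg = lambda f, g: [a * b for a, b in zip(f, g)]
    return all(abs(mean(p, fg(f, g)) - mean(p, f) * mean(p, g)) < 1e-12
               for f, g in product(test_functions, repeat=2))

fs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 2.0, 3.0]]
assert is_multiplicative([1.0, 0.0, 0.0], fs)       # Dirac at the first point
assert not is_multiplicative([0.5, 0.5, 0.0], fs)   # a genuine mixture fails
```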

whenever we know that y(t) ≤ C + ∫_0^t ( a(τ) y(τ) + b(τ) ) dτ for some a, b ≥ 0 integrable. We now give some classical existence and uniqueness results.

1.6.1. Proposition.118 (Ordinary differential equations.) Let 1 ≤ p ≤ +∞ and … a : Ω × R^n → R^n, φ : Ω × R → R and φ_b : Γ × R → R are Carathéodory mappings representing a "conductivity" coefficient, distributed sources, and a boundary flux, respectively.120
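The Gronwall inequality (6.2) can be checked numerically in the extremal case, where the integral inequality holds with equality. The following sketch (plain Python; the coefficients a, b and the grid are arbitrary illustrative choices) integrates y' = a(t)y + b(t) by explicit Euler and verifies the bound along the way:

```python
# Hedged numerical check of the Gronwall inequality (6.2): if
# y(t) <= C + int_0^t (a*y + b), then y(t) <= (C + int_0^t b) * exp(int_0^t a).
import math

C, T, N = 1.0, 2.0, 20000
h = T / N
a = lambda t: 0.5 + 0.25 * math.sin(t)
b = lambda t: 0.3 * t

y, t = C, 0.0
int_a, int_b = 0.0, 0.0
for _ in range(N):
    y += h * (a(t) * y + b(t))       # explicit Euler for the extremal ODE
    int_a += h * a(t)
    int_b += h * b(t)
    t += h
    bound = (C + int_b) * math.exp(int_a)
    assert y <= bound + 1e-6         # the Gronwall bound holds at every step
```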

120 For the Neumann boundary conditions (i.e. φ_b = 0) the results concerning the studied nonlinear elliptic problem can be found as special cases in the monographs by Fučík and Kufner [195], Ladyzhenskaya and Uraltseva [279], Lions [291], etc. For the general case, we refer to Nečas [327, Section 3.2].
121 Note that this normal does exist a.e. on Γ because Γ is Lipschitzian.

1.6 Some differential and integral equations

53

As the classical (=pointwise) understanding of the problem (6.3) is not natural for both mathematical and "physical" reasons, the standard understanding of (6.3) is in the sense of distributions, which leads to the notion of a so-called weak solution. The weak formulation arises by multiplying the equation in (6.3) by some test function y, by integration over Ω, and by applying Green's formula, i.e.

∫_Ω ( y_1 div y_2 + y_2 · ∇y_1 ) dx = ∫_Γ y_1 ( y_2 · n ) dS,

which holds for any y_1 : Ω̄ → R and y_2 : Ω̄ → R^n smooth enough, and eventually by substitution of the conormal derivative from the boundary condition in (6.3), which finally yields the identity (6.4)

∫_Ω [ a(x,∇y) · ∇ỹ + φ(x,y) ỹ ] dx + ∫_Γ φ_b(x,y) ỹ dS = 0    for all ỹ ∈ W^{1,p}(Ω).

Note that the growth is designed so that a ∘ ∇y ∈ L^{p/(p−1)}(Ω;R^n) for any y ∈ W^{1,p}(Ω), and that φ ∘ y ∈ L^{np/(np−n+p)}(Ω) for any y ∈ L^{np/(n−p)}(Ω) because W^{1,p}(Ω) ⊂ L^{np/(n−p)}(Ω), see Section 1.4. This imbedding is continuous for p < n, while for p = n this is not guaranteed, which forced us to take a certain reserve ε > 0. On the other hand, for p > n, the imbedding W^{1,p}(Ω) ⊂ L^∞(Ω) is again continuous, so that an arbitrary growth of the coefficient φ can be admitted. The growth condition (6.9) was designed analogously, taking into account the imbedding W^{1−1/p,p}(Γ) ⊂ L^{(np−p)/(n−p)}(Γ) for p < n.

Indeed, putting φ̂(x,r) := r ∫_0^1 φ(x, τr) dτ, one has

∂φ̂/∂r (x,r) = ∫_0^1 ( φ(x, τr) + τr ∂φ/∂r(x, τr) ) dτ = ∫_0^1 d/dτ [ τ φ(x, τr) ] dτ = [ τ φ(x, τr) ]_{τ=0}^{τ=1} = φ(x, r).

123 In fact some assumptions can be weakened a bit. For example, one of the constants c_1 and c_2 can possibly vanish.
124 For more details in the case φ_b = 0, we refer to the monographs by Fučík and Kufner [195], or Lions [291].


Note that (6.5) has been used in the first case. The fact that ∂φ̂_b(x,r)/∂r = φ_b(x,r) can be derived in the same way. The particular terms of Φ are continuous as a consequence of the continuity of the mapping y ↦ ∇y : W^{1,p}(Ω) → L^p(Ω;R^n), of the imbedding of W^{1,p}(Ω) into L^{a_1/(a_1−1)}(Ω) and of the trace operator y ↦ y|_Γ : W^{1,p}(Ω) → L^{a_2/(a_2−1)}(Γ), and thanks to the continuity of the respective Nemytskii mappings into L^1(Ω) and L^1(Γ). Besides, the assumptions (6.12)–(6.14) imply the convexity of the respective terms of Φ, which causes Φ to be weakly lower semicontinuous. By (6.15), one has â(x,ζ) = ∫_0^1 ζ · a(x,τζ) dτ ≥ ∫_0^1 c_0 τ^{p−1} |ζ|^p dτ = c_0 p^{−1} |ζ|^p. Similarly, (6.16) and (6.17) imply respectively φ̂(x,r) ≥ c_1 p^{−1} |r|^p − b_1(x)|r| and φ̂_b(x,r) ≥ c_2 p^{−1} |r|^p − b_2(x)|r|. This enables us to estimate

Φ(y) ≥ (1/p) ∫_Ω ( c_0 |∇y|^p + c_1 |y|^p − b_1 |y| ) dx + (1/p) ∫_Γ ( c_2 |y|^p − b_2 |y| ) dS
     ≥ (1/p) min(c_0, c_1) ||y||^p_{W^{1,p}(Ω)} − ||b_1||_{L^{p/(p−1)}(Ω)} ||y||_{L^p(Ω)} − ||b_2||_{L^{p/(p−1)}(Γ)} ||y||_{L^p(Γ)}.

Hence, as p > 1, the functional Φ is coercive in the sense that Φ(y) → +∞ for ||y||_{W^{1,p}(Ω)} → +∞. Then the reflexivity of W^{1,p}(Ω) ensures the existence of a minimizer of Φ over W^{1,p}(Ω); recall that we assumed 1 < p < +∞. The last assumption in the theorem ensures strict convexity of Φ, which makes the solution unique. □

The next equation we want to treat here is the nonlinear parabolic equation.125 More precisely, we will consider the homogeneous Neumann initial-boundary-value problem for such an equation:

(6.18)    ∂y/∂t − div a(x,∇y) = …

… ∈ L(R^k, R^n) = R^{k×n} and the Carathéodory mapping φ : Ω × R^n → R^k are supposed to satisfy, for a suitable 1 ≤ a ≤ p,

(6.24)    ∫_Ω ( ∫_Ω |k(x,ξ)|^{a/(a−1)} dξ )^{p(a−1)/a} dx < +∞,

(6.25)    ∃ a_φ ∈ L^a(Ω)  ∃ b_φ ∈ R^+ :  |φ(x,r)| ≤ a_φ(x) + b_φ |r|^{p/a},

(6.26)    ∃ ℓ ∈ L^{pa/(p−a)}(Ω) :  |φ(x,r_1) − φ(x,r_2)| ≤ ℓ(x) |r_1 − r_2|.

1.6.4. Proposition. (Hammerstein integral equation.) Let (6.24)–(6.26) be satisfied, y_0 ∈ L^p(Ω;R^n), and ||k||_{L^p(Ω;L^{a/(a−1)}(Ω;R^{k×n}))} ||ℓ||_{L^{pa/(p−a)}(Ω)} < 1. Then the integral equation (6.23) possesses precisely one solution y. Moreover,

(6.27)    ||y||_{L^p(Ω;R^n)} ≤ …
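The smallness condition in Proposition 1.6.4 makes the Hammerstein map a contraction, so the solution can be computed by Banach fixed-point iteration. A hedged one-dimensional sketch (the kernel, nonlinearity, and grid below are illustrative choices, not the data of the proposition):

```python
# Fixed-point iteration for a discretized Hammerstein equation
#   y(x) = y0(x) + int_0^1 k(x,s) phi(s, y(s)) ds
# with a small kernel and a 1-Lipschitz nonlinearity (contraction).
import math

N = 100
xs = [(i + 0.5) / N for i in range(N)]
y0 = [math.sin(2 * math.pi * x) for x in xs]
k = lambda x, s: 0.4 * math.exp(-abs(x - s))   # sup_x int |k| < 1
phi = lambda s, r: math.atan(r)                # Lipschitz constant 1

def T(y):
    return [y0[i] + sum(k(xs[i], xs[j]) * phi(xs[j], y[j])
                        for j in range(N)) / N
            for i in range(N)]

y = [0.0] * N
for _ in range(50):
    y = T(y)
res = max(abs(a - b) for a, b in zip(y, T(y)))
assert res < 1e-10    # y solves the discretized integral equation
```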


(7.8)    ∀d > 0 :  K_1^d ⊂ K_1 ,  K_1^d convex closed,  cl_{Z_1} ∪_{d>0} K_1^d = K_1 ,

(7.9)    ∀d > 0 :  K_2^d ⊂ K_2 ,  K_2^d convex closed,  cl_{Z_2} ∪_{d>0} K_2^d = K_2 ,

(7.10)    ∀z_1 ∈ K_1, z_2 ∈ K_2 :  Φ_1^d(z_1, ·) ⇉ Φ_1(z_1, ·)  &  Φ_2^d(·, z_2) ⇉ Φ_2(·, z_2) ,

(7.11)    Φ_1^d + Φ_2^d ⇉ Φ_1 + Φ_2 ,

(7.12)    ∃c ∈ R  ∃K_{1,c} convex compact  ∀d ∈ R^+  ∀z_2 ∈ K_2^d :  ∅ ≠ Lev_{K_1^d,c} Φ_1^d(·, z_2) ⊂ K_{1,c} ,

(7.13)    ∃c ∈ R  ∃K_{2,c} convex compact  ∀d ∈ R^+  ∀z_1 ∈ K_1^d :  ∅ ≠ Lev_{K_2^d,c} Φ_2^d(z_1, ·) ⊂ K_{2,c} .

Then, for all d > 0, Nash_{K_1^d×K_2^d}(Φ_1^d, Φ_2^d) ≠ ∅ and

(7.14)    Limsup_{d→0} Nash_{K_1^d×K_2^d}(Φ_1^d, Φ_2^d) ⊂ Nash_{K_1×K_2}(Φ_1, Φ_2).

Proof. First, let us note that, by Theorem 1.7.1, the approximate problems always admit Nash equilibria; note that both K_1^d ∩ K_{1,c} and K_2^d ∩ K_{2,c} are convex compact. Having (z_1^d, z_2^d) ∈ Nash_{K_1^d×K_2^d}(Φ_1^d, Φ_2^d), by the uniform coercivity of the approximate problems (7.12) and (7.13) one can localize all considerations on a compact set K_{1,c} × K_{2,c} and suppose that (possibly after taking a finer net) there is (z̄_1, z̄_2) ∈ K_1 × K_2 such that z_1^d → z̄_1 and z_2^d → z̄_2 for d → 0. Our aim is to show that (z̄_1, z̄_2) ∈ Nash_{K_1×K_2}(Φ_1, Φ_2). We know that Φ_1^d(z_1^d, z_2^d) ≤ Φ_1^d(z_1, z_2^d) and Φ_2^d(z_1^d, z_2^d) ≤ Φ_2^d(z_1^d, z_2) for any (z_1, z_2) ∈ K_1^d × K_2^d. In particular,

(7.15)    ∀(z_1, z_2) ∈ K_1^d × K_2^d :  Φ_1^d(z_1^d, z_2^d) + Φ_2^d(z_1^d, z_2^d) ≤ Φ_1^d(z_1, z_2^d) + Φ_2^d(z_1^d, z_2).

By (7.10), lim_{d→0} Φ_1^d(z_1, z_2^d) = Φ_1(z_1, z̄_2) and lim_{d→0} Φ_2^d(z_1^d, z_2) = Φ_2(z̄_1, z_2). Moreover, by (7.11) also lim_{d→0} Φ_1^d(z_1^d, z_2^d) + Φ_2^d(z_1^d, z_2^d) = Φ_1(z̄_1, z̄_2) + Φ_2(z̄_1, z̄_2). This allows us to pass to the limit in (7.15), which gives

(7.16)    ∀(z_1, z_2) ∈ K_1^d × K_2^d :  Φ_1(z̄_1, z̄_2) + Φ_2(z̄_1, z̄_2) ≤ Φ_1(z_1, z̄_2) + Φ_2(z̄_1, z_2).

1.7 Non-cooperative game theory

63

Eventually, by (7.3) with (7.8) and (7.9) one can see that (7.16) holds even for any (z_1, z_2) ∈ K_1 × K_2. In particular, taking z_1 := z̄_1 shows that Φ_2(z̄_1, z̄_2) ≤ Φ_2(z̄_1, z_2) for any z_2 ∈ K_2. Analogously, z_2 := z̄_2 shows that z̄_1 minimizes Φ_1(·, z̄_2) over K_1. □

In view of the (quite restrictive) conditions (7.4) and (7.11), it is worth considering a special class of games where Φ_1 + Φ_2 is constant - without any loss of generality, we can suppose Φ_1 + Φ_2 = 0. This means that the players have entirely antagonistic goals in the sense that the profit of one player is just the loss of the other one and vice versa. In such a situation we speak about a zero-sum game. Putting Φ := Φ_1 = −Φ_2, from (7.1) one can easily see that the point (z_1, z_2) ∈ K_1 × K_2 is a Nash equilibrium if and only if

(7.17)    min_{z̃_1∈K_1} Φ(z̃_1, z_2) = Φ(z_1, z_2) = max_{z̃_2∈K_2} Φ(z_1, z̃_2).

Such a point is also called a saddle point of Φ, and Φ is addressed as a payoff. Let us denote the set of all saddle points by

(7.18)    Saddle_{K_1×K_2} Φ := Nash_{K_1×K_2}(Φ, −Φ) = { (z_1, z_2) ∈ K_1 × K_2;  (7.17) holds }.

The fact that (z_1, z_2) ∈ K_1 × K_2 is a saddle point of Φ is equivalent134 to the fact that

(7.19)    inf_{z_1∈K_1} sup_{z_2∈K_2} Φ(z_1, z_2) = sup_{z_2∈K_2} inf_{z_1∈K_1} Φ(z_1, z_2),

and z_1 ∈ K_1 and z_2 ∈ K_2 are so-called conservative strategies in the sense that

sup_{z̃_2∈K_2} Φ(z_1, z̃_2) = inf_{z̃_1∈K_1} sup_{z̃_2∈K_2} Φ(z̃_1, z̃_2),    inf_{z̃_1∈K_1} Φ(z̃_1, z_2) = sup_{z̃_2∈K_2} inf_{z̃_1∈K_1} Φ(z̃_1, z̃_2).

The meaning of a conservative strategy is that a player tries to reach the highest own profit on the assumption that the only goal of the opponent is to harm him or her as much as possible.135 As a plain consequence of Theorem 1.7.1, we can claim that Φ has a saddle point provided Φ is separately continuous, Φ(·, z_2) is convex and inf-compact while Φ(z_1, ·) is concave, sup-compact and uniformly coercive, and K_1 and K_2 are convex. Nevertheless, the special character of the zero-sum problem makes it possible to modify the coercivity assumptions:

134 See, e.g., Aubin [21, Proposition 8.1] or Aubin and Ekeland [22, Section 6.2, Proposition 1].
135 If no convex/concave structure of the game can be guaranteed (as typical, e.g., in games with fully nonlinear systems or pursuer/evader games), it is often a satisfactory task to find a conservative strategy of at least one of the players; cf. Friedman [191], McMillan and Triggiani [311], Nikol'skii [334], Warga [460, Chapter IX], etc.
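The saddle-point property (7.17) and the minimax identity (7.19) can be illustrated on a simple convex-concave payoff. A hedged numerical sketch (the payoff and grid below are arbitrary illustrative choices):

```python
# Saddle point of Phi(z1,z2) = (z1-0.5)^2 - (z2-0.5)^2 on K1 x K2 = [0,1]^2:
# convex in z1, concave in z2, saddle point at (0.5, 0.5) with value 0,
# and minimax = maximin as in (7.19).
grid = [k / 100 for k in range(101)]
Phi = lambda z1, z2: (z1 - 0.5) ** 2 - (z2 - 0.5) ** 2

minimax = min(max(Phi(z1, z2) for z2 in grid) for z1 in grid)
maximin = max(min(Phi(z1, z2) for z1 in grid) for z2 in grid)
assert abs(minimax) < 1e-12 and abs(maximin) < 1e-12  # both equal Phi(.5,.5)=0
assert all(Phi(0.5, z2) <= Phi(0.5, 0.5) <= Phi(z1, 0.5)
           for z1 in grid for z2 in grid)             # saddle-point inequalities
```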


1.7.3. Theorem.136 (Saddle point - von Neumann [450], generalized.) Let the following assumptions be satisfied:

(7.20)    Φ is separately continuous,

(7.21)    K_1 and K_2 are convex,

(7.22)    ∀z_1 ∈ K_1, z_2 ∈ K_2 :  Φ(·, z_2) is convex,  Φ(z_1, ·) is concave,

(7.23)    ∃z_2 ∈ K_2  ∀c ∈ R :  Lev_{K_1,c} Φ(·, z_2) is compact,

(7.24)    ∃z_1 ∈ K_1  ∀c ∈ R :  Lev_{K_2,c}( −Φ(z_1, ·) ) is compact.

Then Saddle_{K_1×K_2} Φ ≠ ∅.

If Φ_1(·, z_2) and Φ_2(z_1, ·) possess Gâteaux derivatives, denoted respectively by ∇_{z_1}Φ_1 and ∇_{z_2}Φ_2, from (7.1) we can easily establish the first-order necessary conditions for the Nash equilibrium point (z_1, z_2), namely

(7.25)    ∇_{z_1}Φ_1(z_1, z_2) ∈ −N_{K_1}(z_1)  &  ∇_{z_2}Φ_2(z_1, z_2) ∈ −N_{K_2}(z_2).

… Π : Z_1 × Z_2 × Y → Λ with Y and Λ Banach spaces. Like in Section 1.3, we suppose that the state equation Π(z_1, z_2, y) = 0 has always a unique solution y := π(z_1, z_2), which defines the state mapping π : K_1 × K_2 → Y. Then we define in a natural way the set of equilibrium points of (P) by

Nash(P) := Nash_{K_1×K_2}(Φ_1, Φ_2)    for  Φ_i(z_1, z_2) := J_i(z_1, z_2, π(z_1, z_2)),  i = 1, 2.

Theorem 1.7.1 and Proposition 1.7.2 can be applied straightforwardly to (P); note that the assumption (7.5) about the convex structure of the composed cost functions Φ_i basically forces us to consider only the state mapping π which is bi-affine.137 It is noteworthy to specify the optimality conditions (7.25):

136 See [450] for the special case that the sets of admissible strategies K_1 and K_2 are mixed strategies for a finite game, or Nikaidô and Isoda [332] for K_1 and K_2 convex and compact. The presented general version is due to Aubin and Ekeland [22, Section 6.2, Theorem 8], where even a lower/upper semicontinuous payoff function Φ is admitted.
137 A uniform convexity of J_1(·, z_2, ·) and J_2(z_1, ·, ·) may sometimes guarantee (7.5) even if π(·, u_2) and π(u_1, ·) are "slightly" nonlinear, cf. [396].


1.7.4. Proposition. (Optimality conditions for (P).) Let J_i(z_1, z_2, ·) : Y → R, i = 1, 2, and Π(z_1, z_2, ·) : Y → Λ be Fréchet differentiable, J_1(·, z_2, y) : Z_1 → R, J_2(z_1, ·, y) : Z_2 → R, Π(·, z_2, y) : Z_1 → Λ, and Π(z_1, ·, y) : Z_2 → Λ be Gâteaux equi-differentiable around y ∈ Y. Let the state mapping π : K_1 × K_2 → Y as well as all the mappings [∇_{z_1}J_1(z_1, z_2, ·)](z̃_1) : Y → R, [∇_{z_2}J_2(z_1, z_2, ·)](z̃_2) : Y → R, [∇_{z_1}Π(z_1, z_2, ·)](z̃_1) : Y → Λ, and [∇_{z_2}Π(z_1, z_2, ·)](z̃_2) : Y → Λ be continuous, let ∇_yΠ(z_1, z_2, y) ∈ L(Y, Λ) have a bounded inverse, and (7.21) be valid. Then:

(i) If (z_1, z_2) ∈ Nash(P) and y = π(z_1, z_2), then

(7.26)    [∇_{z_i}Π(z_1, z_2, y)]* λ_i* − ∇_{z_i}J_i(z_1, z_2, y) ∈ N_{K_i}(z_i),    i = 1, 2,

for λ_1*, λ_2* ∈ Λ* satisfying the adjoint equation

(7.27)    [∇_yΠ(z_1, z_2, y)]* λ_i* = ∇_y J_i(z_1, z_2, y),    i = 1, 2.

(ii) Conversely, if for some (z_1, z_2) ∈ K_1 × K_2 the composed cost functions J_1(·, z_2, π(·, z_2)) : K_1 → R and J_2(z_1, ·, π(z_1, ·)) : K_2 → R are convex and (7.26)–(7.27) hold for y = π(z_1, z_2) and for λ_1*, λ_2* ∈ Λ*, then (z_1, z_2) ∈ Nash(P).

Sketch of the proof. Statement (i) is just (7.25) if one evaluates ∇_{z_1}Φ_1(z_1, z_2) and ∇_{z_2}Φ_2(z_1, z_2) by means of Lemma 1.3.16. The sufficiency (i.e. (ii)) then follows by the convex structure of the particular minimization problems; cf. Remark 1.3.15. □

For the special case J := J_1 = −J_2, (P) becomes the zero-sum game-theoretical problem involving a state equation:

(P_0)    Minimax  J(z_1, z_2, y)
         subject to  Π(z_1, z_2, y) = 0,
         z_1 ∈ K_1,  z_2 ∈ K_2,

and it is natural to define the set of saddle points of (P_0) by

Saddle(P_0) := Saddle_{K_1×K_2} Φ    for  Φ(z_1, z_2) := J(z_1, z_2, π(z_1, z_2)).

1.7.5. Corollary. (Optimality conditions for (P_0).) Let J(z_1, z_2, ·) : Y → R be Fréchet differentiable, J(·, z_2, y) : Z_1 → R and J(z_1, ·, y) : Z_2 → R be Gâteaux equi-differentiable around y ∈ Y. Let the mappings [∇_{z_1}J(z_1, z_2, ·)](z̃_1) : Y → R and [∇_{z_2}J(z_1, z_2, ·)](z̃_2) : Y → R be continuous, let (7.21) be valid, and let Π and π be as in Proposition 1.7.4. Then:


(i) If (z_1, z_2) ∈ Saddle(P_0) and y = π(z_1, z_2), then

(7.28)    ∇_{z_1}J(z_1, z_2, y) − [∇_{z_1}Π(z_1, z_2, y)]* λ* ∈ −N_{K_1}(z_1),

(7.29)    ∇_{z_2}J(z_1, z_2, y) − [∇_{z_2}Π(z_1, z_2, y)]* λ* ∈ N_{K_2}(z_2),

with λ* ∈ Λ* satisfying the adjoint equation

(7.30)    [∇_yΠ(z_1, z_2, y)]* λ* = ∇_y J(z_1, z_2, y).

(ii) Conversely, if, for some (z_1, z_2) ∈ K_1 × K_2, J(·, z_2, π(·, z_2)) : K_1 → R is convex, J(z_1, ·, π(z_1, ·)) : K_2 → R is concave, and (7.28)–(7.30) hold for y = π(z_1, z_2) and some λ* ∈ Λ*, then (z_1, z_2) ∈ Saddle(P_0).

1.7.6. Remark. (Other concepts.) Just for completeness, let us mention the concepts of equilibria which assume more complicated relations between both players or their cooperative behaviour.138 It is a frequent case that the game is non-symmetric, having a certain hierarchical structure. For example, the first player, called a leader, can get the privilege to select his or her strategy first, and afterwards to communicate it to the second player, called a follower, who wants only to minimize his or her own cost; such a situation arises quite often - let us mention, e.g., the control of a decentralized enterprise (=follower) by a central management office (=leader) or the relation between a central bank (=leader) and commercial banks (=followers), etc. Also, Φ_2(u_1, ·) may represent the potential energy of a stationary physical system whose configuration u_2 is governed by an energy-minimization principle while u_1 is then to control such a system optimally with respect to the cost functional Φ_1. Then the natural choice of the leader's optimal strategy z_1 ∈ Z_1 is to minimize Φ̃_1 : K_1 → R defined by Φ̃_1(z_1) = sup_{z_2 ∈ Argmin Φ_2(z_1,·)} Φ_1(z_1, z_2). This choice will guarantee a minimal cost for the leader on the condition that the follower will behave optimally according to the cost function Φ_2(z_1, ·). The pair (z_1, z_2) ∈ Argmin Φ̃_1 × Argmin Φ_2(z_1, ·) is called a Stackelberg equilibrium [451]. Let us finally mention a fully cooperative situation when both players attempt simultaneously to minimize both cost functions Φ_1 and Φ_2. Then they will naturally accept only those pairs of strategies (z_1, z_2) ∈ K_1 × K_2, called optimal in the sense of Pareto [346], such that there is no (z̃_1, z̃_2) ∈ K_1 × K_2 such that (Φ_1(z̃_1, z̃_2), Φ_2(z̃_1, z̃_2)) differs from (Φ_1(z_1, z_2), Φ_2(z_1, z_2)) and simultaneously Φ_1(z̃_1, z̃_2) ≤ Φ_1(z_1, z_2) and Φ_2(z̃_1, z̃_2) ≤ Φ_2(z_1, z_2). In other words, z := (z_1, z_2) makes (Φ_1, Φ_2) minimal with respect to the ordering of R × R by the cone [0, +∞)^2. In fact, most of the results from Section 1.3 and Chapter 4 can be

138 For more information about cooperative games we refer, e.g., to Aubin [21], Leitmann [285], Szép and Forgó [427], etc.


straightforwardly generalized to this fully-cooperative game concept (also called a multicriteria decision concept) just by replacing Φ and R ordered by the cone [0, +∞) respectively with (Φ_1, Φ_2) and R × R ordered by the cone [0, +∞)^2 (or even by a more general cone with non-empty interior139). Various combinations are also conceivable: e.g. in a non-cooperative game one or both players may have vector-valued individual cost functions whose minimization represents a multicriteria decision making problem.

139 The non-empty interior is needed for S_2 from (3.32) to be still open if R^+ is replaced by the interior of the cone which orders R × R. Then the F. John-type optimality conditions just involve the "first multiplier" λ_0* ∈ (R × R)* which is positive with respect to the dual ordering.

Chapter 2 Theory of convex compactifications

In various relaxation schemes the common feature is the convexity of the used compact envelopes of the original spaces. Thus, to give an abstract and unified viewpoint of particular concrete cases, it is worth developing a general theory of what we will call "convex compactifications". This means, as it sounds, compactifications which are simultaneously convex subsets of some locally convex spaces. The convexity is certainly a considerable restriction, and it should be emphasized that not every topological space admits nontrivial convex compactifications but, on the other hand, there are topological spaces which admit a lot of them. It is then certainly useful to introduce a natural ordering of convex compactifications of a given space. This will be done in Section 2.1. Furthermore, we will find it useful to have a certain unified (we will say "canonical") form of an arbitrary convex compactification. Imitating the classical construction based on the multiplicative means on some ring of continuous bounded functions, in Section 2.2 we will construct our convex compactifications by using the means (cf. Section 1.5) on a suitable (we will say "convexifying") linear subspace of the space of continuous bounded functions on a topological space to be compactified. An important result is that there is a one-to-one order-preserving correspondence between all closed convexifying subspaces and all convex compactifications. In particular, it identifies the topology of the uniform convergence as decisive for the created convex compactification in the sense that, on the one hand, making a closure of the convexifying linear subspace in this topology does not change the corresponding convex compactification but, on the other hand, any further enlargements create convex compactifications which are actually strictly finer. In many concrete applied problems the spaces to be compactified possess, besides a topological structure, also a boundedness structure.
This enables us to speak about a coercivity of these problems, which eventually localizes the investigation onto one "sufficiently" large bounded set. Typically this set cannot be chosen a priori for a given class of problems, which forces us to modify, in Section 2.3, our concept of convex compactifications in such a manner that the resulting envelope is convex and σ-compact but, in general, not compact. We will then speak of a "convex σ-compactification". It may itself be a linear manifold, though typically it is not. Sometimes convex σ-compactifications can have additional important special properties, like local compactness, homogeneity, etc. The canonical form enables us, in Section 2.4, to develop an approximation theory of convex compactifications, which forms an abstract framework for developing a computer-implementable numerical scheme in concrete cases. Also, the canonical form enables us to formulate simple criteria for mappings (esp. functions) to admit an affine continuous extension onto the respective convex σ-compactifications and also to investigate their differentiability properties. This will be performed in Section 2.5.

2.1 Convex compactifications

Let us begin directly with the definition of the notion of a convex compactification, which represents the fundamental concept used for the relaxation theory presented in this book. Let us consider a topological space U to be compactified, τ being its topology.

2.1.1. Definition. (Convex compactification.) A triple (K, Z, i) is called a convex compactification of a topological space (U, τ) if

(a) Z is a Hausdorff locally convex space,

(b) K is a convex, compact subset of Z,

(c) i : U → K is continuous, and

(d) i(U) is dense in K.

If i is also injective (resp. a homeomorphical imbedding), (K, Z, i) is called a Hausdorff (resp. τ-consistent) convex compactification.
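For orientation, a concrete finite-dimensional example (an illustration added here, not taken from the text): take U = ℝⁿ with its usual topology, Z = ℝⁿ, K the closed unit ball, and i(u) = u/(1+|u|). Then (a)–(d) hold and i is a homeomorphical imbedding onto the open unit ball, so (K, Z, i) is a τ-consistent convex compactification of ℝⁿ. A minimal sketch checking (b) and (d) numerically (function names are ours):

```python
import numpy as np

def i_embed(u):
    """Embedding i(u) = u / (1 + |u|); maps R^n homeomorphically onto the open unit ball."""
    u = np.asarray(u, dtype=float)
    return u / (1.0 + np.linalg.norm(u))

# (b)-type check: images of arbitrary points stay in the (convex, compact) closed unit ball K
rng = np.random.default_rng(0)
samples = rng.normal(scale=50.0, size=(1000, 3))
images = np.array([i_embed(u) for u in samples])
assert np.all(np.linalg.norm(images, axis=1) < 1.0)

# (d)-type check: a boundary point d of K is approached by i(t*d) as t grows,
# illustrating that i(U) is dense in K
d = np.array([1.0, 0.0, 0.0])  # a unit direction, i.e. a boundary point of K
dist = [np.linalg.norm(i_embed(t * d) - d) for t in (1e1, 1e3, 1e5)]
assert dist[0] > dist[1] > dist[2] and dist[2] < 1e-4
```

Here the compactification "adds" exactly the sphere of directions at infinity, which is the simplest instance of the envelopes constructed abstractly below.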

Note that the pair (K, i), created from a convex compactification (K, Z, i) by "forgetting" Z, is a compactification of U in the usual sense as introduced in Section 1.1. Also note that, in general, the imbedding i is not even required to be injective, so that some points of the original space U can be "glued together" in the compactification K. The set of all convex compactifications of a given topological space U can be ordered in a natural way.

2.1.2. Definition. (Ordering of convex compactifications.) Let (K₁, Z₁, i₁) and (K₂, Z₂, i₂) be two convex compactifications of U. Then we will say that:

(i) (K₁, Z₁, i₁) is a finer convex compactification of U than (K₂, Z₂, i₂), and we write (K₁, Z₁, i₁) ≽ (K₂, Z₂, i₂), if there is an affine continuous mapping φ : K₁ → K₂ fixing U; the adjective "affine" means φ(½z₁ + ½z₂) = ½φ(z₁) + ½φ(z₂) for any z₁, z₂ ∈ K₁,

(ii) (K₁, Z₁, i₁) and (K₂, Z₂, i₂) are equivalent convex compactifications of U if both (K₁, Z₁, i₁) ≽ (K₂, Z₂, i₂) and (K₂, Z₂, i₂) ≽ (K₁, Z₁, i₁).

From (2.3) we can deduce that for every f ∈ 𝔉 and n ∈ ℕ there is ξ_{f,n} ∈ Ξ such that |f(u_ξ) − ½f(u₁) − ½f(u₂)| < 1/n for all ξ ≥ ξ_{f,n}. As (Ξ_𝔉, ≥) is a directed index set, for every a := (F, n) ∈ Ξ_𝔉 there is ξ_a ∈ Ξ such that ξ_a ≥ ξ_{f,n} for all f ∈ F. We denote ({f}, n) ∈ Ξ_𝔉 by a_{f,n}. Putting ū_a := u_{ξ_a}, we get

∀f ∈ 𝔉, n ∈ ℕ, a ≥ a_{f,n}:   |f(ū_a) − ½f(u₁) − ½f(u₂)| < 1/n.

By the diagonalization procedure we obtain a net {u_{α_a}}_{a∈Ξ_𝔉} such that

∀f ∈ 𝔉, n ∈ ℕ, a ≥ a_{f,n}:   |f(u_{α_a}) − ½f(u₁) − ½f(u₂)| ≤ |f(u_{α_a}) − f(ū_a)| + |f(ū_a) − ½f(u₁) − ½f(u₂)| < 2/n.
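The computation above reflects the basic relaxation phenomenon: a convex combination such as ½f(u₁) + ½f(u₂) can be realized as a limit of averaged evaluations of f along ordinary elements, e.g. along sequences oscillating between u₁ and u₂ with equal proportions, as exploited concretely for Young measures in Chapter 3. A minimal numerical sketch (illustrative only; the test function f and points u₁, u₂ below are arbitrary choices of ours, not from the text):

```python
import numpy as np

def chatter_average(f, u1, u2, k):
    """Mean of f over a sequence alternating k times between u1 and u2,
    i.e. the barycenter (1/2) f(u1) + (1/2) f(u2) of two Dirac masses."""
    seq = np.tile([u1, u2], k)  # u1, u2, u1, u2, ... (2k terms, equal proportions)
    return float(np.mean(f(seq)))

f = lambda u: np.cos(u) + u**2       # an arbitrary continuous test function
u1, u2 = 0.5, 2.0
target = 0.5 * f(u1) + 0.5 * f(u2)   # the convex combination to be realized
assert abs(chatter_average(f, u1, u2, 1000) - target) < 1e-12
```

No single point u satisfies f(u) = ½f(u₁) + ½f(u₂) for all f simultaneously, which is precisely why the convex envelope K must contain new, "limit" elements beyond i(U).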