English Pages 278 Year 2020
Table of contents:
Preface
Acknowledgements
Introduction
References
Contents
1 Convex Sets: Basic Properties
1.1 *Motivation: Fair Bargains
1.2 Convex Sets and Cones: Definitions
1.3 Chapters 1–4 in the Special Case of Subspaces
1.4 Convex Cones: Visualization by Models
1.4.1 Ray Model for a Convex Cone
1.4.2 Sphere Model for a Convex Cone
1.4.3 Hemisphere Model for a Convex Cone
1.4.4 Top-View Model for a Convex Cone
1.5.1 Homogenization: Definition
1.5.2 Nonuniqueness Homogenization
1.6 Convex Sets: Visualization by Models
1.7.1 The Three Golden Convex Cones
1.7.2 Convex Hull and Conic Hull
1.7.3 Primal Description of the Convex Hull
1.8.1 Definition of Three Operations
1.8.2 Preservation of Closedness and Properness
1.9 Radon's Theorem
1.10 Helly's Theorem
1.11 *Applications of Helly's Theorem
1.12 Carathéodory's Theorem
1.13 Preference Relations and Convex Cones
1.14 *The Shapley–Folkman Lemma
1.15 Exercises
1.16 Hints for Applications of Helly's Theorem
References
2 Convex Sets: Binary Operations
2.1.2 *Crash Course in Working Coordinate-Free
2.2 Binary Operations and the Functions +X, X
2.3 Construction of a Binary Operation
2.4 Complete List of Binary Operations
2.5 Exercises
References
3 Convex Sets: Topological Properties
3.1 *Crash Course in Topological Notions
3.2 Recession Cone and Closure
3.3 Recession Cone and Closure: Proofs
3.4 Illustrations Using Models
3.5 The Shape of a Convex Set
3.6 Topological Properties Convex Set
3.7 Proofs Topological Properties Convex Set
3.8.2 Certificates for Insolubility of an Optimization Problem
3.9 Exercises
4 Convex Sets: Dual Description
4.1.1 Child Drawing
4.1.2 How to Control Manufacturing by a Price Mechanism
4.2 Duality Theorem
4.3 Other Versions of the Duality Theorem
4.3.2 Separation Theorems
4.3.3 Theorem of Hahn–Banach
4.3.4 Involution Property of the Polar Set Operator
4.3.5 Nontriviality Polar Cone
4.4 The Coordinate-Free Polar Cone
4.5 Polar Set and Homogenization
4.6 Calculus Rules for the Polar Set Operator
4.7 Duality for Polyhedral Sets
4.8.1 Theorems of the Alternative
4.8.2 *The Black–Scholes Option Pricing Model
4.8.3 *Child Drawing
4.8.4 *How to Control Manufacturing by a Price Mechanism
4.9 Exercises
References
5 Convex Functions: Basic Properties
5.1.1 Description of Convex Sets
5.1.2 Why Convex Functions that Are Not Nice Can Arise in Applications
5.2 Convex Function: Definition
5.3 Convex Function: Smoothness
5.4 Convex Function: Homogenization
5.5 Image and Inverse Image of a Convex Function
5.6 Binary Operations for Convex Functions
5.7 Recession Cone of a Convex Function
5.8.1 Description of Convex Sets by Convex Functions
5.9 Exercises
6 Convex Functions: Dual Description
6.2 Conjugate Function
6.3 Conjugate Function and Homogenization
6.4 Duality Theorem
6.5 Calculus for the Conjugate Function
6.6 Duality: Convex Sets and Sublinear Functions
6.7 Subgradients and Subdifferential
6.8 Norms as Convex Objects
6.9 *Illustration of the Power of the Conjugate Function
6.10 Exercises
7 Convex Problems: The Main Questions
7.1 Convex Optimization Problem
7.2.1 Existence
7.2.2 Uniqueness
7.2.3 Illustration of Existence and Uniqueness
7.3 Smooth Optimization Problem
7.4 Fermat's Theorem (Smooth Case)
7.5 Convex Optimization: No Need for Local Minima
7.6 Fermat's Theorem (Convex Case)
7.7 Perturbation of a Problem
7.8 Lagrange Multipliers (Smooth Case)
7.9 Lagrange Multipliers (Convex Case)
7.10 *Generalized Optimal Solutions Always Exist
7.11 Advantages of Convex Optimization
7.12 Exercises
8 Optimality Conditions: Reformulations
8.1 Duality Theory
8.2 Karush–Kuhn–Tucker Theorem: Traditional Version
8.3 KKT in Subdifferential Form
8.4 Minimax and Saddle Points
8.5 Fenchel Duality Theory
8.6 Exercises
Reference
9.1 Least Squares
9.2 Generalized Facility Location Problem
9.3 Most Likely Matrix with Given Row and Column Sums
9.4 Minimax Theorem: Penalty Kick
9.5 Ladies Diary Problem
9.6 The Second Welfare Theorem
9.7 *Minkowski's Theorem on Polytopes
9.8 Duality Theory for LP
9.9 Solving LP Problems by Taking a Limit
9.10 Exercises
A.1.2 Some Specific Sets
A.1.3 Linear Algebra
A.1.6 Convex Sets
A.1.8 Convex Functions
A.1.10 Optimization
B.1.3 Subspaces
B.1.7 Norms
Index
Graduate Texts in Operations Research Series Editors: Richard Boucherie · Johann Hurink
Jan Brinkhuis
Convex Analysis for Optimization A Unified Approach
Series Editors
Richard Boucherie, University of Twente, Enschede, The Netherlands
Johann Hurink, University of Twente, Enschede, The Netherlands
This series contains compact volumes on the mathematical foundations of Operations Research, in particular in the areas of continuous, discrete and stochastic optimization. Inspired by the PhD course program of the Dutch Network on the Mathematics of Operations Research (LNMB), and by similar initiatives in other territories, the volumes in this series offer an overview of mathematical methods for post-master students and researchers in Operations Research. Books in the series are based on the established theoretical foundations in the discipline, teach the needed practical techniques and provide illustrative examples and applications.
More information about this series at http://www.springer.com/series/15699
Jan Brinkhuis, Econometric Institute, Erasmus University Rotterdam, Rotterdam, The Netherlands
ISSN 2662-6012 ISSN 2662-6020 (electronic)
Graduate Texts in Operations Research
ISBN 978-3-030-41803-8 ISBN 978-3-030-41804-5 (eBook)
https://doi.org/10.1007/978-3-030-41804-5
© Springer Nature Switzerland AG 2020
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To Grażyna
Preface
The novelty of this textbook is a simplification: convex analysis is based completely on one single method.
A Unified Method Borrowed from Geometry
Convex analysis, with its wealth of formulas and calculus rules, immediately intrigued me, as someone with a background in pure mathematics working at a school of economics. This background helped me to see that some phenomena in various branches of geometry, which can be dealt with successfully by one tested method called homogenization, run parallel to phenomena in convex analysis. This suggested using this tested method in convex analysis as well, and it turned out to work well beyond my initial expectations. Here are sketches of two examples of such parallels and of their simple but characteristic use in convex analysis.
First Example of Homogenization in Convex Analysis
Taking the homogenization of circles, ellipses, parabolas, and hyperbolas amounts to viewing them as conic sections, that is, as intersections of a plane and the boundary of a cone. This has been known for more than 2000 years to be a fruitful point of view. Parallel to this, taking the homogenization of a convex set amounts to viewing this convex set as the intersection of a hyperplane and a homogeneous convex set, also called a convex cone. This also turns out to be a fruitful point of view. For example, consider the task of proving the following proposition: a convex set is closed under finite convex combinations. The usual proof, by induction, requires some formula manipulation. The homogenization method reduces this to the task of proving that a convex cone is closed under positive linear combinations. The proof of this, by induction, requires no effort. In general, one can always reduce a task that involves convex sets to a task that involves convex cones, and as a rule this is an easier task. This reduction is the idea of the single method for convex analysis that is proposed in this book.
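The reduction in the first example can be tried out on numbers. The following is a minimal sketch, assuming the usual lifting of a point a to (a, 1); the helper names `homogenize`, `positive_combination`, and `dehomogenize` are hypothetical and chosen only for this illustration. A positive combination of lifted points with weights w_i is (Σ w_i a_i, Σ w_i), so scaling it back to level 1 yields a convex combination of the original points.

```python
from fractions import Fraction

def homogenize(point):
    """Lift a point a of R^n to the point (a, 1) of R^(n+1)."""
    return tuple(point) + (1,)

def positive_combination(vectors, weights):
    """Positive linear combination of vectors (the operation a convex cone is closed under)."""
    assert all(w > 0 for w in weights)
    dim = len(vectors[0])
    return tuple(sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(dim))

def dehomogenize(vector):
    """Scale (x, t) with t > 0 back to level 1 and drop the last coordinate."""
    *x, t = vector
    assert t > 0
    return tuple(xi / t for xi in x)

# Three points of the plane (hypothetical data) and arbitrary positive weights;
# the weights do not need to sum to 1.
points = [(Fraction(0), Fraction(0)), (Fraction(4), Fraction(0)), (Fraction(0), Fraction(2))]
weights = [Fraction(1), Fraction(2), Fraction(1)]

lifted = [homogenize(p) for p in points]
result = dehomogenize(positive_combination(lifted, weights))

# The same point computed directly as a convex combination with weights w_i / sum(w).
total = sum(weights)
direct = tuple(sum(w / total * p[i] for p, w in zip(points, weights)) for i in range(2))

print(result == direct)  # both are the point (2, 1/2)
```

Exact rational arithmetic (`Fraction`) is used so that the two results can be compared with `==` rather than up to rounding.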
Second Example of Homogenization in Convex Analysis
In algebraic and differential geometry, the homogenization operation often gives sets that are not closed. Then it is natural to make these sets closed by adding the so-called points at infinity. This is known to be a fruitful method for dealing with the behavior “at infinity.” The added points model two-sided or one-sided directions. Parallel to this, taking the homogenization of an unbounded closed convex set gives a convex cone that is not closed, and again it is natural to make it closed by taking its closure and to view the added points as points at infinity of the given convex set. The added points turn out to correspond exactly to a well-known important auxiliary concept in convex analysis: recession directions of the convex set. These are directions in which one can travel forever inside the convex set. It is a conceptual advantage of the homogenization method that it makes this auxiliary concept arise in a natural way and makes it easier to work with. This turns out to be a fruitful way of working with unbounded convex sets. For example, consider the task of proving the following theorem of Krein–Milman: each point of a closed convex set is the sum of a convex combination of extreme points and a conic combination of extreme recession directions. The usual proof, by induction, requires a relatively intricate argument involving recession directions. The homogenization method reduces this to the task of proving that a closed convex cone is generated by its extreme rays. The proof of this, by induction, requires no effort. In general, one can always reduce a task that involves an unbounded convex set (and that therefore requires the use of recession directions) to a task that involves a closed convex cone, and as a rule this is an easier task. In other words, the distinction between bounded and unbounded convex sets disappears, in some sense. The homogenization method made everything fall into its proper place for me.
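How a point at infinity arises as a limit can be seen numerically. The sketch below is an illustration under stated assumptions, not a construction taken from the book: the set A = {(x, y) : y ≥ x²} is a hypothetical example of an unbounded closed convex set, and the names `in_A` and `unit_ray_point` are invented here. Each point of A is lifted to level 1, and the ray through the lifted point is represented by its unit vector; as points of A run off to infinity, these unit vectors approach a vector at level 0.

```python
import math

def in_A(x, y):
    """Membership test for the assumed example set A = {(x, y) : y >= x^2}."""
    return y >= x * x

def unit_ray_point(x, y):
    """Unit vector on the open ray through the lifted point (x, y, 1)."""
    n = math.sqrt(x * x + y * y + 1.0)
    return (x / n, y / n, 1.0 / n)

# Points (k, k^2) of A running off to infinity: their rays tilt towards level 0.
for k in (1.0, 10.0, 100.0, 1000.0):
    print(k, unit_ray_point(k, k * k))

far = unit_ray_point(1e6, 1e12)  # numerically close to (0, 1, 0)

# On these samples, (0, 1) behaves like a recession direction:
# moving from a point of A in that direction stays inside A.
samples = [(-2.0, 4.0), (0.0, 0.0), (1.5, 3.0)]
stays_inside = all(in_A(x, y + s) for (x, y) in samples for s in (0.0, 1.0, 10.0, 1e6))

# (1, 0) is not a recession direction: one step from (0, 0) already leaves A.
leaves = not in_A(0.0 + 1.0, 0.0)

print(far, stays_inside, leaves)
```

The limit vector has last coordinate 0, so it does not belong to the homogenization of A itself; it is one of the points added by taking the closure, and its first two coordinates give the direction (0, 1).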
On the Need for Calculus Rules for Convex Sets
For example, I was initially puzzled by the calculus rules for convex sets: formulas for a convexity-preserving operation (such as the intersection of two convex sets) followed by the duality operator for convex sets. These rules are of fundamental interest, but do they have practical applications? If not, it might be better to omit them from a textbook on convex analysis, especially as they are relatively intricate, involving, for example, the concept of sublinear functions. The homogenization method gives the following insight. In order to solve concrete convex optimization problems, one has to compute subdifferentials, certain auxiliary convex sets; this requires calculus rules for convex sets, and these are essentially the formulas mentioned above, as you will see.
A Unified Definition of the Properties Closed and Bounded
Another example is that the homogenization method reveals that the standard theory of convex analysis is completely satisfactory and straightforward, apart from one complication. To understand what this is, one has to keep in mind that convex sets and functions that occur in applications have two nice properties, closedness and properness; these properties make it easy to work with them. However, when you work with them, you might be led to new convex sets and functions that do not possess these nice properties; then these are not so easy to work with. Another complication is that previously the definitions of the properties closed and proper for various convex objects (sets and functions) had to be made individually and seemed to involve several judicious choices. The homogenization method gives a unified definition for each of these two properties; this definition can be used for each type of convex object. This also helps to some extent to deal with the complications mentioned above.
Complete Lists of Convex Calculus Rules
The use of the homogenization method is in essence known to experts; it is even used in some textbooks. However, it does not appear to be widely known that homogenization is useful for virtually everything in convex analysis. For example, several attempts have been made to obtain complete lists of the many convex calculus rules for convex sets and functions. In these attempts, experts had dismissed the possibility of doing this by homogenization: convex cones were considered to be too poor in structure to capture fully the rich world of these rules. However, when I tried this nevertheless, it worked: complete lists were obtained in joint work with V.M. Tikhomirov. This makes clear that the homogenization method can also lead to novel results in convex analysis.
A Convex Analysis Course Based on Homogenization
The present book grew out of lecture notes for the 9-week course Convex Analysis for Optimization of the LNMB (Dutch Network for the Mathematics of Operations Research), given to about 30 PhD students and based completely on homogenization. The chosen style is as informal as I could manage, given the wish to be precise. The transparent structure of convex analysis that one gets by using homogenization is emphasized. An active reader will ask himself or herself many questions, such as how the theory can be applied to examples, and will try to answer these. This helps to master the material. To encourage this, I have tried to anticipate these questions and have placed them at the end of each chapter. One chapter was presented each week of the course; the supporting examples and the applications at the beginning and end of each chapter were optional material.
At the end of the course, participants had to hand in, in small groups, the solutions of their own selection of exercises: three of the more challenging exercises from each chapter.
How Homogenization Generates Convex Analysis
Writing these lecture notes made me realize that the homogenization method, in combination with some models for visualization, generates every concept, result, formula, technique using formulas, and proof from convex analysis. Every task can be carried out by the same method: reduction to the homogeneous case, that is, to convex cones, which always makes the task easier. This makes this unified approach, together with the models for visualization, very suitable for an introductory course: all you need to know and understand from convex analysis becomes easily available. Moreover, the structure of the entire theory becomes more transparent. The only work to be done is to establish the standard properties of convex sets, always by means of homogenization. Then all properties of convex functions (such as their continuity) and convex optimization problems are just consequences. Moreover, a number of relatively technical subjects of convex optimization, such as the Karush–Kuhn–Tucker conditions and duality theory, are seen to be reformulations of a simpler-looking result: the convex analogue of the Lagrange multiplier rule. The course did not include an analysis of the performance of algorithms for convex optimization problems. It remains to be seen how helpful homogenization will be for that purpose.
Textbook on Convex Analysis Based on Homogenization
After I had given this course several times, the LNMB and Springer Verlag kindly invited me to turn the lecture notes into a textbook, in order to share the unified approach to this subject by the homogenization method with the wide circle of practitioners who want to learn in an efficient way the minimum one needs to know about convex analysis.
Acknowledgements
I would like to thank Vladimir M. Tikhomirov for stimulating my interest in convex analysis and for working with me on the construction of complete lists of convex calculus rules. I extend my thanks to Vladimir Protasov for sharing his insights. Moreover, I would like to thank Krzysztof Postek, with whom I taught the LNMB course in 2018, for reading the first version of each chapter in my presence and giving immediate feedback. I am very grateful to Jan van de Craats, who produced the figures, which provide easy access to the entire material in this book. I would also like to thank Andrew Chisholm and Alexander Brinkhuis for their suggestions for the Preface and the Introduction.
Rotterdam, The Netherlands
Jan Brinkhuis
Introduction
Prerequisites
This book is meant for master, graduate, and PhD students who need to learn the basics of convex analysis for use in some of its many applications, such as machine learning, robust optimization, and economics. The prerequisites for reading the book are some familiarity with linear algebra, differential calculus, and Lagrange multipliers. The necessary background information can be found in standard texts, often in appendices: for example, pp. 503–527 and pp. 547–550 of the textbook [1] “Optimization: Insights and Applications,” written jointly with Vladimir M. Tikhomirov.
Convex Analysis: What It Is
To begin with, a convex set is a subset of n-dimensional space $\mathbb{R}^n$ that consists of one piece and has no holes or dents. A convex function is a function on a convex set for which the region above its graph is a convex set. Finally, a convex optimization problem is the problem of minimizing a convex function. Convex analysis is the theory of convex sets, convex functions, and, as its climax, convex optimization problems. The central result is that each boundary point of a convex set lies on a hyperplane that has the convex set entirely on one of its two sides. This hyperplane can be seen as a linearization at this point of the convex set, or even better, as a linearization at this point of the boundary of the convex set. This is the only successful realization of the idea of approximating objects that are nonlinear (and so are difficult to work with) by objects that are linear (and so are easy to work with thanks to linear algebra) other than the celebrated realization of this idea that is called differential calculus. The latter proceeds by taking derivatives of functions and leads to the concept of tangent space.
Convex Analysis: Why You Should Be Interested
One reason that convex analysis is a standard tool in so many different areas is the analogy with differential calculus.
Another reason for the importance of convex analysis is that if you want to find in an efficient and reliable way a good approximation of an optimal solution of an optimization problem, then it is almost always possible to find it if this problem is convex, but it is rarely possible otherwise.
The Main Results of Convex and Smooth Optimization Are Compared
It is explained in this book that the main result of convex optimization comes in various equivalent forms such as the Karush–Kuhn–Tucker conditions, duality theory, and the minimax theorem. Moreover, it is explained that the main result of convex optimization is the analogue of the main result of smooth optimization, the Lagrange multiplier method, in a precise sense. Figure 1 illustrates how changing the method of linearization in the main result of smooth optimization gives the main result of convex optimization, for a simple but characteristic optimization problem that is both smooth and convex. A paraboloid is drawn. It is the graph of a function $z = F(x, y)$. The problem of interest is that of minimizing $F(x, 0)$. In the figure, a vertical plane is drawn; this is the coordinate plane $y = 0$. Moreover, a parabola is drawn; it is the intersection of the paraboloid and the vertical plane. This parabola is the graph of the function $F(x, 0)$. The variable y represents some change of a parameter in the problem, and if this change is y, then the problem is perturbed and becomes that of minimizing $F(x, y)$ as a function of x, for that value of y. Thus the optimization problem of interest is embedded in a family of optimization problems parametrized by y, and this entire family is described by one function, F. Now let $\hat{x}$ be a point of minimum for the problem of interest. Then we can linearize the graph of F at the point $P = (\hat{x}, 0, F(\hat{x}, 0))$ in two ways: using that F is smooth or using that F is convex. The smooth linearization is to take the tangent plane at P; the convex linearization is to take a plane through P that lies completely under the graph of F. The outcome of both linearization methods is the same plane. In the figure, this plane is drawn. The main result of smooth and convex optimization for this situation is that this plane is parallel to the x-axis.
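The two linearizations can be compared on a concrete function. The sketch below is only an illustration: the text does not specify the paraboloid of Fig. 1, so F(x, y) = (x − 1)² + xy + y² is a hypothetical choice that is both smooth and convex, and the names `grad_F` and `plane` are invented here. The code computes the tangent plane at P and checks on a sample grid that it lies under the graph, so that on these samples the smooth and the convex linearization agree.

```python
def F(x, y):
    """Hypothetical smooth convex function; F(x, 0) = (x - 1)^2 is minimal at x = 1."""
    return (x - 1) ** 2 + x * y + y ** 2

def grad_F(x, y):
    """Partial derivatives of F with respect to x and y, computed by hand."""
    return (2 * (x - 1) + y, x + 2 * y)

x_hat = 1.0                    # minimizer of the unperturbed problem F(x, 0)
fx, eta = grad_F(x_hat, 0.0)   # slopes of the tangent plane at P in the x- and y-direction

def plane(x, y):
    """Tangent plane to the graph of F at P = (x_hat, 0, F(x_hat, 0))."""
    return F(x_hat, 0.0) + fx * (x - x_hat) + eta * y

# Convex linearization: the same plane should lie under the graph of F.
# Here F(x, y) - plane(x, y) = u^2 + u*y + y^2 with u = x - 1, which is never negative.
grid = [i / 4 for i in range(-20, 21)]
gap = min(F(x, y) - plane(x, y) for x in grid for y in grid)

print(fx, eta, gap)  # slope 0 in the x-direction, slope 1 in the y-direction, gap 0
```

The slope in the x-direction vanishes because x_hat minimizes F(x, 0); the slope in the y-direction is the quantity that Fig. 1 labels η.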
Fig. 1 The main result of smooth and convex optimization (shown: the graph $z = F(x, y)$, the point $(\hat{x}, 0, F(\hat{x}, 0))$, and the tangent plane with slope $\eta$, over the axes x and y with origin O)
That this plane is parallel to the x-axis follows from the minimality property of $\hat{x}$. Therefore, the equation of this plane is of the form $z - F(\hat{x}, 0) = \eta y$ for some real number $\eta$. This number $\eta$ is the Lagrange multiplier for the present situation. This procedure can be generalized to arbitrary smooth or convex functions $F(x, y)$ with $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$. For smooth F one has to use smooth linearization; for convex F one has to use convex linearization. This gives the main result of smooth optimization and of convex optimization, respectively. Thus the difference between the derivations of the two results is only in the method of linearization that is used.
Novelty: One Single Method
Convex analysis requires a large number of different operations (for example, maximum, sum, convex hull of the minimum, conjugate dual, and subdifferential) and formulas relating them, such as the Fenchel–Moreau theorem $f^{**} = f$, the Moreau–Rockafellar theorem $\partial(f + g)(x) = \partial f(x) + \partial g(x)$, and the Dubovitskii–Milyutin theorem $\partial \max(f, g)(x) = \partial f(x) \,\mathrm{co}\cup\, \partial g(x)$ if $f(x) = g(x)$, where $\mathrm{co}\cup$ denotes the convex hull of the union, depending on the task at hand. Previously, all these operations and formulas had to be learned one by one, and all proofs had to be studied separately. The results on convex analysis in this textbook are the same as in all textbooks on convex analysis. However, what is different in this textbook is that a single approach is given that can be applied to all tasks involved in convex analysis: construction of operations and formulas relating them, formulation of results, and construction of proofs. Using this approach should save a lot of time. Once you have grasped the meaning of this approach, you will see that the structure of convex analysis is transparent.
Description of the Unified Method
The foundation of the unified approach is the homogenization method. This method consists of three steps:
1. homogenize, that is, translate a given task into the language of nonnegative homogeneous convex sets, also called convex cones,
2. work with convex cones, which is relatively easy,
3. dehomogenize, that is, translate back to get the answer to the task at hand.
The use of homogenization in convex analysis is borrowed from its use in geometry. Therefore, we first take a look at its use in geometry.
History of the Unified Method
As long ago as 200 BCE, Apollonius of Perga used an embryonic form of homogenization in his eight-volume work “Conics,” the apex of ancient Greek mathematics. He showed that totally different curves (circles, ellipses, parabolas, and hyperbolas) have many common properties as they are all conic sections, intersections of a plane with a cone. Figure 2 shows an ice cream cone and boundaries of intersections with four planes. This gives a circle for the horizontal plane, an ellipse for the slightly slanted plane, a parabola for the plane that is parallel to a ray on the boundary of the cone, and a branch of a hyperbola for the plane that is even more slanted. Thus, apparently unrelated curves can be seen to have common properties because they are formed in the same way: as intersections of one cone with various planes. This phenomenon runs parallel to the fact that totally different convex
sets (that is, sets in column space $\mathbb{R}^n$ that consist of one piece and have no holes or dents) have many common properties and that this can be explained by homogenization: each convex set is the intersection of a hyperplane and a convex cone, that is, a convex set that is positive homogeneous, containing all positive scalar multiples of each of its points. Figure 3 shows the homogenization process in graphical form. A convex set A is drawn in the horizontal plane $\mathbb{R}^2$. If one vertical dimension is added and three-dimensional space $\mathbb{R}^3$ is considered, then the set A can be lifted up in a vertical direction to level 1; this gives the set $A \times \{1\}$. Next, the convex cone C that is the union of all open rays with endpoint the origin O and running through a point of $A \times \{1\}$ is drawn. This convex cone C is called the homogenization or conification of A. Note that there is no loss of information in going from A to C. If you intersect C with the horizontal plane at level 1 and drop the intersection down vertically onto the horizontal plane at level 0, then you get A back again. This description of a convex set as the intersection of a hyperplane and a convex cone helps by reducing the task of proving any property of a convex set to the task of proving a property of a convex cone, which is easier.
Fig. 2 Conic sections: circle, ellipse, parabola, branch of a hyperbola
Fig. 3 Convex set as intersection of a hyperplane and a convex cone (shown: the convex set A, its lift A × {1} at level 1, the convex cone C, and the origin O)
Working with Unboundedness in Geometry by the Unified Method
In 1415, the Florentine architect Filippo Brunelleschi made the first picture that used linear perspective. In this technique, horizontal lines that have the same direction intersect at the horizon in one point called the vanishing point.
Fig. 4 Parallel tulip fields and linear perspective
Figure 4 illustrates linear perspective; it shows a stylized version of parallel tulip fields of different colors that seem to stretch to the horizon. The vanishing point is behind the windmill. In the early nineteenth century, the technique of linear perspective inspired the discovery of projective geometry. Projective space includes a “horizon” consisting of “points at infinity,” which represent two-sided directions. Projective space enables dealing with problems at infinity in algebraic geometry. This is again an instance of homogenization. Even before the discovery of projective space, Carl Friedrich Gauss made a recommendation about how one should deal with one-sided directions. In the first lines of his “Disquisitiones Generales Circa Superficies Curvas” (general investigations of curved surfaces), published in 1828, the most important work in the history of differential geometry, he wrote:
Disquisitiones, in quibus de directionibus variarum rectarum in spatio agitur, plerumque ad maius perspicuitatis et simplicitatis fastigium evehuntur, in auxilium vocando superficiem sphaericam radio = 1 circa centrum arbitrarium descriptam, cuius singula puncta repraesentare censebuntur directiones rectarum radiis ad illa terminatis parallelarum.
Investigations, in which the directions of various straight lines in space are to be considered, attain a high degree of clearness and simplicity if we employ, as an auxiliary, a sphere of unit radius described about an arbitrary center, and suppose the different points of the sphere to represent the directions of straight lines parallel to the radii ending at these points.
Gauss’ recommendation for dealing with problems in space involving one-sided directions is another example of homogenization. That this is indeed homogenization can be seen more clearly from the work of August Ferdinand Möbius, a student of Gauss. He developed the idea further, applying it one dimension lower, to the plane (in order to solve a problem in algebraic geometry). To begin with, Möbius modeled a point $(x, y)$ in the plane as the open ray $\{(tx, ty, t) \mid t > 0\}$ in three-dimensional space, and he modeled a one-sided direction in the plane given by the unit vector $(\tilde{x}, \tilde{y})$ as the open ray $\{(t\tilde{x}, t\tilde{y}, 0) \mid t > 0\}$ in three-dimensional space. This is convenient as all of his objects of interest, points in the plane and one-sided directions in the plane, are modeled in the same way: as open rays in the upper half-space $z \geq 0$ (the ray model). This model is based on homogenization as it is essentially the same construction as the one depicted in Fig. 3.
However, open rays are infinite sets, and Möbius preferred to model points as points and not as infinite sets. Therefore, he took the intersection of each ray with the standard unit sphere $x^2 + y^2 + z^2 = 1$. This creates a modified model of the set that consists of the points in the plane and the one-sided directions in the plane, the hemisphere model. In this model, the point $(x, y)$ in the plane is modeled as the point $(x^2 + y^2 + 1)^{-1/2}(x, y, 1)$ of the open upper hemisphere $x^2 + y^2 + z^2 = 1$, $z > 0$, and the one-sided direction in the plane given by the unit vector $(\tilde{x}, \tilde{y})$ is modeled as the point $(\tilde{x}, \tilde{y}, 0)$ on the circle that bounds the hemisphere, $x^2 + y^2 = 1$, $z = 0$. The advantage of the hemisphere model over the ray model is that points and directions are modeled as points instead of as infinite sets. Note that the circle created by the hemisphere model is the analogue for the plane of the auxiliary sphere that Gauss recommended for three-dimensional space. Compared with the Gauss model, the hemisphere model of Möbius is more convenient to use because it does not require the artificial constructs required by the Gauss model (“we employ as an auxiliary, a sphere of unit radius described about an arbitrary center”). One more simplification is possible. Looking down at the upper hemisphere from high above makes it look like a disk in the plane. To be more precise, one gets this top-view model of the plane and its one-sided directions by orthogonal projection of the hemisphere on the horizontal coordinate plane $z = 0$: then a point $(x, y)$ in the plane is modeled as the point $(x^2 + y^2 + 1)^{-1/2}(x, y)$ of the standard open unit disk $x^2 + y^2 < 1$ in the plane $\mathbb{R}^2$, and a one-sided direction in the plane given by the unit vector $(\tilde{x}, \tilde{y})$ is modeled as this unit vector, which is a point of the circle that bounds the disk, $x^2 + y^2 = 1$. The advantage of this top-view model over the hemisphere model is that it is one dimension lower.
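The three models of a point of the plane can be written down as small functions. This is a minimal sketch that follows the formulas in the text; the function names are hypothetical, and `top_view_inverse` is an addition not stated in the text, derived from the observation that the height on the hemisphere equals $\sqrt{1 - p^2 - q^2}$ for a point $(p, q)$ of the open unit disk.

```python
import math

def ray_model(x, y):
    """Representative (x, y, 1) of the open ray {(tx, ty, t) : t > 0}."""
    return (x, y, 1.0)

def hemisphere_model(x, y):
    """Intersection of that ray with the unit sphere: (x^2 + y^2 + 1)^(-1/2) (x, y, 1)."""
    s = 1.0 / math.sqrt(x * x + y * y + 1.0)
    return (s * x, s * y, s)

def top_view_model(x, y):
    """Orthogonal projection of the hemisphere point onto the plane z = 0."""
    hx, hy, _ = hemisphere_model(x, y)
    return (hx, hy)

def top_view_inverse(p, q):
    """Recover (x, y) from a point (p, q) of the open unit disk (derived here, not from the text)."""
    z = math.sqrt(1.0 - p * p - q * q)  # height of the corresponding hemisphere point
    return (p / z, q / z)

h = hemisphere_model(3.0, 4.0)      # lies on the unit sphere, with positive height
p = top_view_model(3.0, 4.0)        # lies inside the open unit disk
back = top_view_inverse(*p)         # approximately (3, 4) again

# A point far away in the direction of the unit vector (0.6, 0.8)
# lands close to that unit vector on the bounding circle.
far = top_view_model(3e6, 4e6)
print(h, p, back, far)
```

The last line illustrates why the bounding circle models one-sided directions: points that move off to infinity in a fixed direction approach the corresponding point of the circle.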
This top-view model for the plane and its one-sided directions is the closed unit disk in the plane itself, but the hemisphere model for the plane and its one-sided directions requires three-dimensional space. The top-view model for three-dimensional space and its one-sided directions is the closed unit ball in three-dimensional space itself, but the hemisphere model for three-dimensional space and its one-sided directions requires four-dimensional space, which is hard to visualize. The following line from the novel “Dom dzienny, dom nocny” (House of day, house of night) by Olga Tokarczuk can be read as evoking the idea of modeling the points and one-sided directions of the plane by means of the top-view model: Widzę kolistą linię horyzontu, który zamyka dolinę ze wszystkich stron. I see the circular line of the horizon, which encloses the valley on all sides.
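The hemisphere and top-view models of the plane described above can be tried out numerically. The following is a minimal Python sketch, not code from the book; the function names hemisphere_point, topview_point, and topview_direction are hypothetical names chosen for this illustration. It suggests that points far away in the direction of the vector (3, 4) have top-view images close to the point (0.6, 0.8) of the bounding circle, which is the model of that one-sided direction.

```python
import math


def hemisphere_point(x, y):
    """Hemisphere model: scale (x, y, 1) onto the open upper unit hemisphere."""
    s = 1.0 / math.sqrt(x * x + y * y + 1.0)
    return (s * x, s * y, s)


def topview_point(x, y):
    """Top-view model: project the hemisphere point onto the plane z = 0."""
    hx, hy, _ = hemisphere_point(x, y)
    return (hx, hy)


def topview_direction(dx, dy):
    """Model a one-sided direction, given by a nonzero vector, on the unit circle."""
    n = math.hypot(dx, dy)
    if n == 0.0:
        raise ValueError("a one-sided direction needs a nonzero vector")
    return (dx / n, dy / n)


if __name__ == "__main__":
    # Points t * (3, 4) move away from the origin as t grows.
    for t in (1.0, 10.0, 1000.0):
        print(t, topview_point(3.0 * t, 4.0 * t))
    # The one-sided direction of (3, 4), modeled as a point of the circle.
    print(topview_direction(3.0, 4.0))
```

The sketch keeps points and directions in separate functions on purpose: a point of the plane always lands strictly inside the disk, while a direction always lands on the circle.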
Similarly to the way in which the three models were created for the plane, models can be created for the line, for three-dimensional space, and more generally for a vector space of any finite dimension. Figure 5 illustrates the ray, hemisphere, and top-view models in the simplest case, for dimension one, the line. In this figure, the extended real line $\overline{\mathbb{R}} = \mathbb{R} \cup \{+\infty, -\infty\}$ is modeled. This is the real line together with the two one-sided directions in the real line: right ($+\infty$) and left ($-\infty$). A real number r is modeled in the ray model by the open ray containing the point (r, 1), in the hemisphere model by the point $q = (r/\sqrt{r^2 + 1},\ 1/\sqrt{r^2 + 1})$ on the open upper half-circle, and in the top-view model by the real number $p = r/\sqrt{r^2 + 1} \in (-1, 1)$. The one-sided direction “right” ($+\infty$) is modeled in the ray model by the positive horizontal axis, in the hemisphere model by the right endpoint (1, 0) of the upper half-circle, and in the top-view model by the right endpoint of $[-1, +1]$, the number 1. The one-sided direction “left” ($-\infty$) is modeled in a similar way.
Fig. 5 Ray, hemisphere, and top-view model in dimension 1 (labels: (r, 1), q, 0, p, r)
Working with Unboundedness in Convex Analysis by the Unified Method
Having described the use of the unified method to work with unboundedness in geometry, we now show how to use the unified method to work with unboundedness in convex analysis. We are going to explain how homogenization makes working with unbounded convex sets, which was previously complicated, as simple as working with bounded convex sets. We do this by borrowing constructions from differential geometry and algebraic geometry, which are explained above. In each branch of geometry, such as differential or algebraic geometry, the ideal objects are sets that have the so-called compactness property. A subset of n-dimensional column space $\mathbb{R}^n$ is compact if it is bounded and closed. This suggests how compactness can be achieved for a bounded convex set A: just take the closure of A. That is, add to A all points that are not in A but that are the limit of a sequence of points in A. Now we consider an unbounded convex set A. Previously, working with such a set involved the introduction of an artificial construction, that of the so-called recession directions of A. This required considerable time and effort to get used to in a course on convex analysis.
However, homogenization leads to the hemisphere model or the top-view model of A, and these are both bounded, so all we have to do to achieve the desired compactness property (that is, to compactify A) is to take the closure of one of these two models of A. One can also compactify A using the ray model, but this requires some preparation: the introduction of the concept of metric space. In terms of A itself, this compactification corresponds to taking not only the closure of A but also adjoining to it all its recession directions. Working with the enlarged compact set is as simple for an unbounded convex set as for a bounded one; in both cases, it allows a convenient analysis of the original convex set. So, the homogenization method gives a natural mechanism for the construction of recession directions, and this helps to see why they are “points at infinity” (“on the horizon”), why they should be considered if you want to work with an unbounded convex set, and how you should prove all properties of recession directions.
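The compactification just described can be illustrated in dimension one with the top-view formula p = r/√(r² + 1). The following is a minimal Python sketch under stated assumptions: the half-line A = [2, +∞) and the names topview_1d and halfline_model_samples are hypothetical choices made for this illustration, not taken from the book. The sampled images of points of A increase and stay strictly below 1, which suggests that the number 1 (the model of the direction “right”) is the one point that taking the closure of the model adds.

```python
import math


def topview_1d(r):
    """Top-view model of a real number r: a point of the open interval (-1, 1)."""
    return r / math.sqrt(r * r + 1.0)


def halfline_model_samples(a, count):
    """Top-view images of the points a + 10**k, k = 0, ..., count - 1, of [a, +inf)."""
    return [topview_1d(a + 10.0 ** k) for k in range(count)]


if __name__ == "__main__":
    a = 2.0  # assumed left endpoint of the half-line A = [a, +inf)
    for p in halfline_model_samples(a, 6):
        # Each image lies in [topview_1d(a), 1); the gap to 1 shrinks.
        print(p, 1.0 - p)
```

The left endpoint of the model, topview_1d(a), already belongs to the model because A is closed; only the right endpoint 1 is new.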
Recession Directions of a Convex Set
We want to illustrate how taking the closure of one of the models of a convex set A corresponds precisely to taking the closure of A and simultaneously adjoining the recession directions of A. To begin with, we give the definition of a recession direction. A recession direction of a convex set A is a one-sided direction in which one can travel forever inside the convex set A without leaving it, after having started in an arbitrary point of A. For a nonempty closed convex set A, a recession direction exists precisely if A is unbounded. The set of all vectors having such a direction forms, together with the origin, the so-called recession cone $R_A$. So a recession direction of A can be defined as an open ray of $R_A$, the set of all positive scalar multiples of a nonzero vector of $R_A$.
Illustration in Dimension One
Figure 6 illustrates in dimension one, the line $\mathbb{R}$, that recession vectors of a closed convex set arise by taking the closure of the ray model of this convex set. The ray model of an unbounded closed convex set A is drawn in the plane, namely for the half-line A = [a, +∞) for some a > 0. The set A has one recession direction, “right” (+∞). The direction “right” is modeled in the ray model by the horizontal open ray with endpoint the origin that points to the right. This is the unique ray of the recession cone $R_A$. You can see intuitively in this figure that this horizontal ray belongs to the closure of the ray model of A. Indeed, the beginning of a sequence of rays is drawn (the arrow suggests the direction of the sequence), and the rays of this sequence are less and less slanted, and their limit is precisely the horizontal ray that points to the right. This shows that the direction “right” is a point in the closure of the ray model of A; moreover, it does not lie in the ray model of A as all rays in this model are nonhorizontal. In fact, it is the only such point.
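The limit of rays described above can be written out as a short computation. This is a sketch under an assumption that is not made in the text: rays are compared through their unit direction vectors, and the symbols $\rho_r$ and $u_r$ are notation introduced here only for this purpose.

```latex
% Ray model of A = [a, +\infty), a > 0: one open ray \rho_r for each r \ge a.
\[
  \rho_r = \{\, t\,(r, 1) : t > 0 \,\}, \qquad
  u_r = \frac{(r, 1)}{\sqrt{r^2 + 1}} \in \rho_r .
\]
% The unit vectors u_r converge to the unit vector of the horizontal ray.
\[
  \lim_{r \to +\infty} u_r
  = \lim_{r \to +\infty}
    \left( \frac{r}{\sqrt{r^2 + 1}},\ \frac{1}{\sqrt{r^2 + 1}} \right)
  = (1, 0).
\]
```

Since the second coordinate of $u_r$ is positive for every r, the limit vector (1, 0) spans a ray that is not itself one of the rays $\rho_r$, in agreement with the remark that all rays in the model of A are nonhorizontal.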
Fig. 6 Recession directions by means of taking the closure (labels: A × {1}, O, $R_A$, A)
Illustration in Dimension Two
Figure 7 illustrates in dimension two, the plane $\mathbb{R}^2$, that recession vectors of a closed convex set arise by adding boundary points to the top-view model of this convex set. The top-view model is drawn for three closed convex sets A in the plane that contain the origin as an interior point. The model of A is also denoted by A and the model of the set of recession directions of A is denoted by $R_A$. In each one of the three pictures, the circle that bounds the disk is the horizon, as in the quote on the valley above. The points on the circle represent the one-sided directions in the plane. The center of the disk bounded by the circle represents the origin of the plane $\mathbb{R}^2$. The points of the shaded region form the closure of the top-view model of A. To be more precise, the points in the shaded region that do not lie on the circle model the points of A itself; the points in the shaded region that do lie on the circle model the recession directions of A, that is, the rays of the recession cone $R_A$. The straight arrows with initial point at the center of the disk represent straight paths from the origin that lie completely inside A. The straight arrows either end inside the circle at the boundary of the model of A, or they end on the circle. In the first case, the path ends on the boundary of A. In the second case, the path is a half-line; it never ends. Looking at Fig. 7, from left to right, you see three top-view models: of a bounded set (such as a set bounded by a circle or an ellipse), of a set having precisely one recession direction (such as a set bounded by a parabola), and of a set having infinitely many recession directions (such as a set bounded by a branch of a hyperbola). In each case, you see that if you take the closure of the top-view model of A, then what happens is that the recession directions are added. From left to right, you see: no point is added, the unique recession direction is added, all the infinitely many recession directions are added.
Fig. 7 The modeling of convex sets and their recession directions (labels: A, $R_A$)
Recession Directions in the Three Models
In the three models that we consider, the ray model, the hemisphere model, and the top-view model, a recession direction of a convex set $A \subseteq \mathbb{R}^n$ is modeled as follows: in the ray model as a horizontal ray that is the limit of a sequence of nonhorizontal rays that represent points of A, in the hemisphere model as a point on the boundary of the hemisphere that is the limit of a sequence of points on the open hemisphere that represent points of A, and in the top-view model as a point on the boundary of the ball that is the limit of a sequence of nonboundary points of the ball that represent points of A.
Conclusion of the Application of Homogenization to Unbounded Convex Sets
The unified method proposed in this book, homogenization, and the use of the three visualization models make unbounded convex sets as simple to work with as bounded ones: all that is required is to take the closure of one of the three models of a given convex set. Then recession directions arise without requiring an artificial construction, and the geometric insight provided by the visualization models makes it easy to construct the proofs of properties of recession directions.
Comparison with the Literature
The novelty of the present book compared to existing textbooks on convex analysis is that all convex analysis tasks are simplified by performing them using only the homogenization method. Moreover, many motivating examples and applications as well as numerous exercises are given. Each chapter has an abstract (“why” and “what”) and a road map. Other textbooks include: Dimitri P. Bertsekas [2], Jonathan M. Borwein and Adrian S. Lewis [3], Stephen Boyd and Lieven Vandenberghe [4], Werner Fenchel [5], Osman Güler [6], Jean-Baptiste Hiriart-Urruty and Claude Lemaréchal [7, 8], Georgii G. Magaril-Il’yaev and Vladimir M. Tikhomirov [9], Ralph Tyrrell Rockafellar [10, 11], Vladimir M. Tikhomirov [12], and Jan van Tiel [13]. The use of the homogenization method in convex analysis is well known to experts. One example is the treatment of polarity in projective space in “Convex Cones, Sets and Functions” (1953) by Werner Fenchel [5]. Another one is the use of extreme points at infinity in “Convex Analysis” (1970) by Ralph Tyrrell Rockafellar [10]. Still another example is the concept of cosmic space in “Variational Analysis” (1997) by Ralph Tyrrell Rockafellar and Roger Wets [14]. However, it does not appear to be widely known that homogenization is useful for any task in convex analysis. For example, attempts have been made to obtain complete lists of rules for calculating the dual object of a convex object (such as a convex set or function) but without using homogenization. A recent attempt was described in “Convex Analysis: Theory and Applications” (2003) by Georgii G. Magaril-Il’yaev and Vladimir M. Tikhomirov [9]. Subsequently, it was shown that the homogenization method leads to the construction of complete lists: in “Duality and calculus of convex objects (theory and applications)”, Sbornik Mathematics, Volume 198, Number 2 (2007), jointly with Vladimir M. Tikhomirov [15], and also in [16].
The methods that are explained in this book can be applied to convex optimization problems. For a small but very interesting minority of these problems, one can get the optimal solutions exactly or by means of some law that characterizes the optimal solution. Numerous examples are given in this book. These methods are also used, for example, in Yurii Nesterov [17] and in Aharon Ben-Tal [18], to construct algorithms that can efficiently and reliably find approximations to optimal solutions for almost all convex optimization problems. These algorithms are not considered in this book, as it is not yet clear how helpful the homogenization method is in the analysis of these algorithms.
How Is the Book Organized?
Each chapter starts with an abstract (why and what), a road map, and an optional section providing motivation, and it ends with an optional section providing applications and with many exercises (the more challenging ones are marked by a star). Chapters 1–4 present all standard results on convex sets. Each result is proved in the same systematic way using the homogenization method, that is, by reduction to results on convex cones. To prove some of these results on convex cones, simple techniques and tricks, described in the text, have to be applied.
The remaining chapters use the homogenization method on its own to derive, as corollaries of the standard results on convex sets, the standard results for convex functions (Chaps. 5 and 6) and for convex optimization (Chaps. 7 and 8). In Chaps. 7 and 8, many concrete smooth and convex optimization problems are solved completely, all by one systematic four-step method. The consistent use of homogenization for all tasks (definitions, constructions, explicit formulas, and proofs) reveals that the structure of convex analysis is very transparent. To avoid repetition, only proofs containing new ideas will be described in the book. Finally, in Chap. 9 some additional convex optimization applications are given.
References
1. J. Brinkhuis, V.M. Tikhomirov, Optimization: Insights and Applications (Princeton University Press, Princeton, 2005)
2. D.P. Bertsekas, A. Nedic, A.E. Ozdaglar, Convex Analysis and Optimization (Athena Scientific, Nashua, 2003)
3. J.M. Borwein, A.S. Lewis, Convex Analysis and Nonlinear Optimization: Theory and Examples (Springer, Berlin, 2000)
4. S. Boyd, L. Vandenberghe, Convex Optimization (Cambridge University Press, Cambridge, 2004)
5. W. Fenchel, Convex Cones, Sets and Functions. Lecture Notes, Department of Mathematics (Princeton University, Princeton, 1951)
6. O. Güler, Foundations of Optimization (Springer, Berlin, 2010)
7. J.-B. Hiriart-Urruty, C. Lemaréchal, Convex Analysis and Minimization Algorithms, vols. I–II (Springer, Berlin, 1993)
8. J.-B. Hiriart-Urruty, C. Lemaréchal, Fundamentals of Convex Analysis. Grundlehren Text Editions (Springer, Berlin, 2004)
9. G.G. Magaril-Il’yaev, V.M. Tikhomirov, Convex Analysis: Theory and Applications. Translations of Mathematical Monographs, vol. 222 (American Mathematical Society, Providence, 2003)
10. R.T. Rockafellar, Convex Analysis (Princeton University Press, Princeton, 1970)
11. R.T. Rockafellar, Conjugate Duality and Optimization (Society for Industrial and Applied Mathematics, Philadelphia, 1989, first published in 1974)
12. V.M. Tikhomirov, Convex Analysis. Analysis II, Encyclopaedia of Mathematical Sciences, vol. 14 (Springer, Berlin, 1990)
13. J. van Tiel, Convex Analysis: An Introductory Text (Wiley, Hoboken, 1971)
14. R.T. Rockafellar, R.J.B. Wets, Variational Analysis. Grundlehren der Mathematischen Wissenschaften (Springer, Berlin, 1997)
15. J. Brinkhuis, V.M. Tikhomirov, Duality and calculus of convex objects. Sb. Math. 198(2), 171–206 (2007)
16. J. Brinkhuis, Convex duality and calculus: reduction to cones. J. Optim. Theory Appl. 143, 439 (2009)
17. Y. Nesterov, Lectures on Convex Optimization. Springer Optimization and Its Applications, vol. 137 (Springer, Berlin, 2018)
18. A. Ben-Tal, A. Nemirovskii, Lectures on Modern Convex Optimization (Society for Industrial and Applied Mathematics, Philadelphia, 2001)
Contents
1 Convex Sets: Basic Properties
  1.1 *Motivation: Fair Bargains
  1.2 Convex Sets and Cones: Definitions
  1.3 Chapters 1–4 in the Special Case of Subspaces
  1.4 Convex Cones: Visualization by Models
    1.4.1 Ray Model for a Convex Cone
    1.4.2 Sphere Model for a Convex Cone
    1.4.3 Hemisphere Model for a Convex Cone
    1.4.4 Top-View Model for a Convex Cone
  1.5 Convex Sets: Homogenization
    1.5.1 Homogenization: Definition
    1.5.2 Nonuniqueness Homogenization
    1.5.3 Homogenization Method
  1.6 Convex Sets: Visualization by Models
  1.7 Basic Convex Sets
    1.7.1 The Three Golden Convex Cones
    1.7.2 Convex Hull and Conic Hull
    1.7.3 Primal Description of the Convex Hull
  1.8 Convexity Preserving Operations
    1.8.1 Definition of Three Operations
    1.8.2 Preservation of Closedness and Properness
  1.9 Radon’s Theorem
  1.10 Helly’s Theorem
  1.11 *Applications of Helly’s Theorem
  1.12 Carathéodory’s Theorem
  1.13 Preference Relations and Convex Cones
  1.14 *The Shapley–Folkman Lemma
  1.15 Exercises
  1.16 Hints for Applications of Helly’s Theorem
  References
2 Convex Sets: Binary Operations
  2.1 *Motivation and Preparation
    2.1.1 *Why Binary Operations for Convex Sets Are Needed
    2.1.2 *Crash Course in Working Coordinate-Free
  2.2 Binary Operations and the Functions $+_X$, $\Delta_X$
  2.3 Construction of a Binary Operation
  2.4 Complete List of Binary Operations
  2.5 Exercises
  References
3 Convex Sets: Topological Properties
  3.1 *Crash Course in Topological Notions
  3.2 Recession Cone and Closure
  3.3 Recession Cone and Closure: Proofs
  3.4 Illustrations Using Models
  3.5 The Shape of a Convex Set
  3.6 Topological Properties Convex Set
  3.7 Proofs Topological Properties Convex Set
  3.8 *Applications of Recession Directions
    3.8.1 Certificates for Unboundedness of a Convex Set
    3.8.2 Certificates for Insolubility of an Optimization Problem
  3.9 Exercises
4 Convex Sets: Dual Description
  4.1 *Motivation
    4.1.1 Child Drawing
    4.1.2 How to Control Manufacturing by a Price Mechanism
    4.1.3 Certificates of Insolubility
    4.1.4 The Black-Scholes Option Pricing Model
  4.2 Duality Theorem
  4.3 Other Versions of the Duality Theorem
    4.3.1 The Supporting Hyperplane Theorem
    4.3.2 Separation Theorems
    4.3.3 Theorem of Hahn–Banach
    4.3.4 Involution Property of the Polar Set Operator
    4.3.5 Nontriviality Polar Cone
  4.4 The Coordinate-Free Polar Cone
  4.5 Polar Set and Homogenization
  4.6 Calculus Rules for the Polar Set Operator
  4.7 Duality for Polyhedral Sets
  4.8 Applications
    4.8.1 Theorems of the Alternative
    4.8.2 *The Black-Scholes Option Pricing Model
    4.8.3 *Child Drawing
    4.8.4 *How to Control Manufacturing by a Price Mechanism
  4.9 Exercises
  References
5 Convex Functions: Basic Properties
  5.1 *Motivation
    5.1.1 Description of Convex Sets
    5.1.2 Why Convex Functions that Are Not Nice Can Arise in Applications
  5.2 Convex Function: Definition
  5.3 Convex Function: Smoothness
  5.4 Convex Function: Homogenization
  5.5 Image and Inverse Image of a Convex Function
  5.6 Binary Operations for Convex Functions
  5.7 Recession Cone of a Convex Function
  5.8 *Applications
    5.8.1 Description of Convex Sets by Convex Functions
    5.8.2 Application of Convex Functions that Are Not Specified
  5.9 Exercises
6 Convex Functions: Dual Description
  6.1 *Motivation
  6.2 Conjugate Function
  6.3 Conjugate Function and Homogenization
  6.4 Duality Theorem
  6.5 Calculus for the Conjugate Function
  6.6 Duality: Convex Sets and Sublinear Functions
  6.7 Subgradients and Subdifferential
  6.8 Norms as Convex Objects
  6.9 *Illustration of the Power of the Conjugate Function
  6.10 Exercises
7 Convex Problems: The Main Questions
  7.1 Convex Optimization Problem
  7.2 Existence and Uniqueness of Optimal Solutions
    7.2.1 Existence
    7.2.2 Uniqueness
    7.2.3 Illustration of Existence and Uniqueness
  7.3 Smooth Optimization Problem
  7.4 Fermat’s Theorem (Smooth Case)
  7.5 Convex Optimization: No Need for Local Minima
  7.6 Fermat’s Theorem (Convex Case)
  7.7 Perturbation of a Problem
  7.8 Lagrange Multipliers (Smooth Case)
  7.9 Lagrange Multipliers (Convex Case)
  7.10 *Generalized Optimal Solutions Always Exist
  7.11 Advantages of Convex Optimization
  7.12 Exercises
8 Optimality Conditions: Reformulations
  8.1 Duality Theory
  8.2 Karush–Kuhn–Tucker Theorem: Traditional Version
  8.3 KKT in Subdifferential Form
  8.4 Minimax and Saddle Points
  8.5 Fenchel Duality Theory
  8.6 Exercises
  Reference
9 Application to Convex Problems
  9.1 Least Squares
  9.2 Generalized Facility Location Problem
  9.3 Most Likely Matrix with Given Row and Column Sums
  9.4 Minimax Theorem: Penalty Kick
  9.5 Ladies Diary Problem
  9.6 The Second Welfare Theorem
  9.7 *Minkowski’s Theorem on Polytopes
  9.8 Duality Theory for LP
  9.9 Solving LP Problems by Taking a Limit
  9.10 Exercises
Appendix A
  A.1 Symbols
    A.1.1 Set Theory
    A.1.2 Some Specific Sets
    A.1.3 Linear Algebra
    A.1.4 Analysis
    A.1.5 Topology
    A.1.6 Convex Sets
    A.1.7 Convex Sets: Conification
    A.1.8 Convex Functions
    A.1.9 Convex Functions: Conification
    A.1.10 Optimization
Appendix B
  B.1 Calculus Formulas
    B.1.1 Convex Sets Containing the Origin
    B.1.2 Nonempty Convex Cones
    B.1.3 Subspaces
    B.1.4 Convex Functions
    B.1.5 Proper Sublinear Functions p, q and Nonempty Convex Sets A
    B.1.6 Subdifferentials of Convex Functions
    B.1.7 Norms
Index
Chapter 1
Convex Sets: Basic Properties
Abstract
• Why. The central object of convex analysis is a convex set. Whenever weighted averages play a role, such as in the analysis by Nash of the question ‘what is a fair bargain?’, one is led to consider convex sets.
• What. An introduction to convex sets and to the reduction of these to convex cones (‘homogenization’) is given. The ray, hemisphere and top-view models to visualize convex sets are explained. The classic results of Radon, Helly and Carathéodory are proved using homogenization. It is explained how convex cones are equivalent to preference relations on vector spaces. It is proved that the Minkowski sum of a large collection of sets of vectors approximates the convex hull of the union of the collection (lemma of Shapley–Folkman).

Road Map
For each figure, the explanation below the figure has to be considered as well.
• Definition 1.2.1 and Fig. 1.3 (convex set).
• Definition 1.2.6 and Fig. 1.4 (convex cone; note that a convex cone is not required to contain the origin).
• Figure 1.5 (the ray, hemisphere and top-view model for a convex cone in dimension one; similar in higher dimensions: for dimension two, see Figs. 1.8 and 1.9).
• Figure 1.10 (the (minimal) homogenization of a convex set in dimension two and in the bounded case).
• Figure 1.12 (the ray, hemisphere and top-view model for a convex set in dimension one; for dimension two, see Figs. 1.13 and 1.14).
• Definition 1.7.2 and Fig. 1.15 (convex hull).
• Definition 1.7.4 and Fig. 1.16 (conic hull; note that the conic hull contains the origin).
• Definitions 1.7.6, 1.7.9, Proposition 1.7.8, its proof and the explanation preceding it (the simplest illustration of the homogenization method).
• Section 1.8.1 (the ingredients for the construction of binary operations for convex sets).
• Theorems 1.9.1, 1.10.1, 1.12.1 and the structure of their proofs (theorems of Radon, Helly, Carathéodory, proved by the homogenization method).
• Definitions 1.12.6, 1.12.9 and 1.12.11 (polyhedral set, polyhedral cone and Weyl’s theorem).
• Definition 1.10.5, Corollary 1.12.4 and its proof (use of compactness in the proof).
• The idea of Sect. 1.13 (preference relations: a different angle on convex cones).

© Springer Nature Switzerland AG 2020 J. Brinkhuis, Convex Analysis for Optimization, Graduate Texts in Operations Research, https://doi.org/10.1007/9783030418045_1
1.1 *Motivation: Fair Bargains

The aim of this section is to give an example of how a convex set can arise in a natural way. You can skip this section, and similar first sections in later chapters, if you want to come to the point immediately.

John Nash has solved the problem of finding a fair outcome of a bargaining opportunity (see [1]). This problem does not involve convex sets, but its solution requires convex sets and their properties.

Here is how I learned about Nash bargaining. Once I was puzzled by the outcome of bargaining between two students, exchanging a textbook and sunglasses. The prices of book and glasses were comparable, the sunglasses being slightly more expensive. The bargain that was struck was that the student who owned the more expensive item, the sunglasses, paid an additional amount of money. What could be the reason for this? Maybe this student had poor bargaining skills? Someone told me that the problem of what is a fair bargain in the case of equal bargaining skills had been considered by Nash. The study of this theory, Nash bargaining, was an eye-opener: it explained that the bargain I had witnessed, and that initially surprised me, was a fair one.

Nash bargaining will be explained by means of a simple example. Consider Alice and Bob. They possess certain goods and these have known utilities (≈ pleasure) to them. Alice has a bat and a box, Bob has a ball. The bat has utility 1 to Alice and utility 2 to Bob; the box has utility 2 to Alice and utility 2 to Bob; the ball has utility 4 to Alice and utility 1 to Bob. If Alice and Bob came to the bargaining agreement to exchange all their goods, they would both be better off: their total utility would increase, for Alice from 3 to 4 and for Bob from 1 to 4. Bob would profit much more than Alice from this bargain. So this bargain does not seem fair.

Now suppose that it is possible to exchange goods with a certain probability. Then we could modify this bargain by agreeing that Alice will give the box to Bob with probability 1/2. For example, a coin could be tossed: if heads comes up, then Alice should give the box to Bob, otherwise Alice keeps her box. This bargain increases expected utility for Alice from 3 to 5 and for Bob from 1 to 3. This looks more fair. More is true: we will see that this is the unique fair bargain in this situation.
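The expected utilities quoted above can be recomputed mechanically. The following is a minimal sketch, not part of the original text; the dictionary layout and the function name `expected_utilities` are illustrative assumptions, and only the utilities and ownership data come from the example.

```python
from fractions import Fraction

# Utilities (to Alice, to Bob) of each good, as given in the example.
UTILITY = {"bat": (1, 2), "box": (2, 2), "ball": (4, 1)}
# Owner of each good before the bargain.
OWNER = {"bat": "Alice", "box": "Alice", "ball": "Bob"}


def expected_utilities(handover_prob):
    """Expected total utility (Alice, Bob) when each good changes hands
    with the given probability (hypothetical helper for illustration)."""
    alice = Fraction(0)
    bob = Fraction(0)
    for good, (u_alice, u_bob) in UTILITY.items():
        p = Fraction(handover_prob[good])
        # Probability that Alice holds this good after the bargain.
        p_alice = 1 - p if OWNER[good] == "Alice" else p
        alice += p_alice * u_alice
        bob += (1 - p_alice) * u_bob
    return alice, bob


no_trade = expected_utilities({"bat": 0, "box": 0, "ball": 0})
full_swap = expected_utilities({"bat": 1, "box": 1, "ball": 1})
coin_toss = expected_utilities({"bat": 1, "box": Fraction(1, 2), "ball": 1})
print(no_trade, full_swap, coin_toss)
```

Exact fractions are used so that the probability 1/2 introduces no rounding; the three results correspond to keeping everything, exchanging everything, and the coin-toss bargain.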
To begin with, we model the bargain where bat, box and ball are handed over with probability $p_1, p_2, p_3$ respectively. To keep formulas as simple as possible, we do this by working with a different but equivalent measure of utility after the bargain: expected change in utility instead of expected utility. To this end, we consider the vector in $\mathbb{R}^2$ that has as coordinates the expected changes in total utility for Alice and Bob:

$$p_1(-1, 2) + p_2(-2, 2) + p_3(4, -1).$$

Indeed, this is the difference of the following two vectors: the vector of expected utilities after the bargain,

$$(1 - p_1)(1, 0) + p_1(0, 2) + (1 - p_2)(2, 0) + p_2(0, 2) + (1 - p_3)(0, 1) + p_3(4, 0),$$

and the vector of the utilities before the bargain, $(1, 0) + (2, 0) + (0, 1)$. Thus the utilities have been adjusted. We allow Alice and Bob to make no agreement; in this example, it is decided in advance that in case of no agreement, each one keeps her or his own goods.

Now we model the bargaining opportunity of Alice and Bob as a pair $(F, v)$, where $F$, the set of bargains, is the set of all vectors $p_1(-1, 2) + p_2(-2, 2) + p_3(4, -1)$ with $0 \le p_1, p_2, p_3 \le 1$, and $v$, the disagreement point, is the origin $(0, 0)$. Figure 1.1 illustrates this bargaining opportunity. The set $F$ contains $v$ and it has two properties that are typical for all bargaining opportunities. We have to define these properties, compactness and convexity.

That $F$ is compact means that it is bounded (there exists a bound $M > 0$ for the absolute values of the coordinates of the points of $F$, that is, $|x_i| \le M$ for all $x \in F$, $i = 1, 2$) and that it is closed (for each convergent sequence in $\mathbb{R}^2$ that is contained in $F$, its limit is also contained in $F$). These properties are easily checked and they are evident from Fig. 1.1. So $F$ is compact.

That $F$ is convex means that it consists of one piece, and has no holes or dents. Figure 1.2 illustrates the concept of convex set. For a precise definition, see Definition 1.2.1. We readily see from Fig. 1.1 that $F$ is convex.

Fig. 1.1 Bargaining opportunity for Alice and Bob (the set $F$ and the disagreement point $v$)
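Since Fig. 1.1 itself is not reproduced here, the polygon $F$ can be recovered numerically. The sketch below is an illustration under one stated fact: $F$ is the image of the cube $[0,1]^3$ under a linear map, so it is the convex hull of the images of the eight corners of the cube. The hull routine is a standard monotone-chain construction; the names `cross`, `convex_hull` and `vertices` are choices made for this sketch only.

```python
from itertools import product

# Expected changes in utility (Alice, Bob) when bat, box, ball are handed over.
GENERATORS = [(-1, 2), (-2, 2), (4, -1)]


def cross(o, a, b):
    """Twice the signed area of the triangle o, a, b (positive if counterclockwise)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Vertices of the convex hull in counterclockwise order (monotone chain)."""
    pts = sorted(set(points))

    def half(seq):
        chain = []
        for p in seq:
            # Drop points that would create a clockwise or straight turn.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    return half(pts) + half(reversed(pts))


# Images of the eight corners (p1, p2, p3) in {0, 1}^3 of the cube.
corners = [
    (sum(p * g[0] for p, g in zip(ps, GENERATORS)),
     sum(p * g[1] for p, g in zip(ps, GENERATORS)))
    for ps in product((0, 1), repeat=3)
]
vertices = convex_hull(corners)
print(vertices)
```

Under these assumptions the hull is a hexagon; the disagreement point $(0, 0)$ is one of its vertices, and $(3, 1)$ and $(1, 3)$ are two adjacent vertices.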
Fig. 1.2 Convex set: yes or no (five sets, labeled yes, yes, no, no, no)
So we see that if we model a bargaining opportunity, we are led to the concept of convex set. We point out the source of this: a weighted average of two choices of probabilities $p_1, p_2, p_3$ and $q_1, q_2, q_3$ that specify certain bargains gives a choice of probabilities $r_1, r_2, r_3$ that specifies another bargain. To be more precise, if

$$v = p_1(-1, 2) + p_2(-2, 2) + p_3(4, -1), \quad 0 \le p_i \le 1,\ i = 1, 2, 3$$

and

$$w = q_1(-1, 2) + q_2(-2, 2) + q_3(4, -1), \quad 0 \le q_i \le 1,\ i = 1, 2, 3,$$

and we write $r_i = (1 - \rho)p_i + \rho q_i$, $i = 1, 2, 3$, for some $0 \le \rho \le 1$, then $0 \le r_i \le 1$, $i = 1, 2, 3$, and

$$(1 - \rho)v + \rho w = r_1(-1, 2) + r_2(-2, 2) + r_3(4, -1).$$

So we see that a weighted average of two bargains is indeed again a bargain. This is an example of a situation where a convex set turns up in a natural way. In fact, the modeling of a bargaining opportunity would force one to invent the concept of convex set, if it had not been invented already.

Now we explain, by means of the same example, how to solve the problem of finding a fair outcome of a bargaining opportunity. To do this, we should begin by considering not only the individual bargaining opportunity of Alice and Bob above, but all bargaining opportunities simultaneously. A bargaining opportunity is defined formally as a pair $(F, v)$ consisting of a compact convex set $F \subset \mathbb{R}^2$, called the set of all allowable bargains, and a point $v \in F$, called the disagreement point. It is assumed that there is a point $w \in F$ for which $w > v$, that is, $w_1 > v_1$ and $w_2 > v_2$. The coordinates of an element of $F$, that is, of a bargain, are the expected utilities of the bargain (maybe adjusted, such as was done above) for Alice and Bob respectively. We consider all rules $\varphi$ that assign to each bargaining opportunity $(F, v)$ an element $\varphi(F, v)$ in $F$, the outcome of the bargaining opportunity according to that rule. We want to find a rule that is fair, in some sense to be made precise. To this end, we give three properties that we can reasonably expect a rule $\varphi$ to have, in order for it to deserve the title ‘fair rule’.

1. For each bargaining opportunity $(F, v)$, there does not exist $x \in F$ for which $x > \varphi(F, v)$. This represents the idea that each individual wishes to maximize the utility to himself of the bargain.
2. For each two bargaining opportunities $(E, v)$ and $(F, v)$ for which $E \subseteq F$ and $\varphi(F, v) \in E$, one has $\varphi(E, v) = \varphi(F, v)$. This represents the idea that eliminating from $F$ feasible points (other than the disagreement point) that would not have been chosen should not affect the outcome.
3. If a bargaining opportunity $(F, v)$ can be brought, by transformations of the two utility functions of the type $v_i = a_i u_i + b_i$, $i = 1, 2$ with $a_i > 0$, into symmetric form, in the sense that $(r, s) \in F$ implies $(s, r) \in F$ and $v$ is on the line $y = x$, then if $(F, v)$ is written in symmetric form, one has that $\varphi(F, v)$ is on the line $y = x$. This represents equality of bargaining skills: if both have the same bargaining position, then they will get the same utility out of it.

What comes now surprises most people; it certainly surprised me. These requirements single out one unique rule among all possible rules. This unique rule should be called the fair rule, as it is reasonable to demand that a fair rule should satisfy the three properties above, and as moreover it is the only rule satisfying these properties. Furthermore, it turns out that this rule can be described by means of an optimization problem.

Theorem 1.1.1 (Nash) There is precisely one rule $\varphi$ that satisfies the three requirements above. This rule associates to a bargaining opportunity $(F, v)$, consisting of a compact convex set $F$ in the plane $\mathbb{R}^2$ and a point $v \in F$ for which there exists $w \in F$ with $w > v$, the unique solution of the following optimization problem:

$$\max f(x) = (x_1 - v_1)(x_2 - v_2), \quad x \in F,\ x \ge v.$$
Example 1.1.2 (The Fair Bargain for Alice and Bob) Now we can verify that the following bargain is optimal for the bargaining opportunity of Alice and Bob: Bob gives the ball to Alice, Alice gives the bat to Bob, and a coin is tossed: if it lands on heads, then Alice keeps the box, if it lands on tails, then Alice gives the box to Bob. That is, the objects are handed over with probabilities $p_1 = 1$, $p_2 = \frac{1}{2}$, $p_3 = 1$. This gives the vector $(-1, 2) + \frac{1}{2}(-2, 2) + (4, -1) = (2, 2)$ in the set $F$ of allowable bargains. By Theorem 1.1.1, it suffices to check that $(2, 2)$ is a point of maximum of the function $f(x) = (x_1 - v_1)(x_2 - v_2) = x_1 x_2$ on the set $F$, which is drawn in Fig. 1.1. To do this, it suffices to check that the line $x_1 + x_2 = 4$ separates the set $F$ and the solution set of $f(x) \ge f(2, 2)$, that is, of $x_1 x_2 \ge 4$, $x_1 \ge 0$, $x_2 \ge 0$, and to check that $(2, 2)$ is the only point on this line that belongs to both sets.

Well, on the one hand, two adjacent vertices of the polygon $F$, the points $(1, 3)$ and $(3, 1)$, lie on the line $x_1 + x_2 = 4$, so $F$ lies on one side of this line and its intersection with this line is the closed line interval with endpoints $(1, 3)$ and $(3, 1)$. On the other hand, the solution set of $x_1 x_2 \ge 4$, $x_1 \ge 0$, $x_2 \ge 0$, which is bounded by one branch of the hyperbola $x_1 x_2 = 4$, lies on the other side of the line $x_1 + x_2 = 4$, and its only common point with this line is $(2, 2)$; this follows as the line $x_1 + x_2 = 4$ is tangent to the hyperbola $x_1 x_2 = 4$ at the point $(2, 2)$. This completes the verification that the given bargain is the unique fair one. Later, we will solve the bargaining problem of Alice and Bob in a systematic way, using the theory of convex optimization, in Example 8.3.5.

Thus, by using convex sets and their properties, a very interesting result has been obtained. The applications are limited by the fact that they require that the two participants cooperate and give enough information, so that the compact convex set $F$ can be determined. The temptation must be resisted to influence the shape of $F$, and so the outcome, by giving information that is not truthful. However, this result sheds light on bargaining opportunities that are carried out in an amicable way, for example among friends. This is precisely the case with the bargain of a textbook and sunglasses, which led me to study Nash bargaining. Nash bargaining makes clear that this is a fair bargain! Indeed, the student who owned the textbook had passed her exam and had no more need for the book and could sell it to one of the many students who had to take the course next year; the other student really needed the textbook as she had to take the course.
The student who owned the sunglasses on second thought regretted that she had bought them; the other student liked them but did not have to buy them. If you translate this information into a bargaining opportunity of the form $(F, v)$, you will see that the fair outcome according to Nash agrees with the outcome of the textbook and sunglasses bargaining that I had observed in reality.

If you are interested in the proof of Theorem 1.1.1, here is a sketch. It already contains some properties of convexity in embryonic form. These properties will be explained in detail later in this book.

Proof of the Nash Bargaining Theorem We begin by proving that the optimization problem has a unique optimal solution. By the theorem of Weierstrass that a continuous function on a nonempty compact set assumes its maximal value, there exists at least one optimal solution. Each optimal solution $x$ satisfies $x > v$: by assumption there exists $w \in F$ for which $w > v$, and so $f(x) \ge f(w) > f(v) = 0$, which together with $x \ge v$ forces both factors of $f(x)$ to be positive. The optimization problem can be rewritten, by taking the logarithm, changing the sign, and replacing the constraints $x \ge v$ by $x > v$, as follows,

$$\min g(x) = -\ln(x_1 - v_1) - \ln(x_2 - v_2), \quad x \in F,\ x > v.$$

The function $g$ is convex, that is, its epigraph, the region above or on the graph of $g$, that is, the set of pairs $(x, \rho)$ for which $x \in F$, $x > v$, $g(x) \le \rho$, is a convex set in three-dimensional space $\mathbb{R}^3$. The function $g$ is even strictly convex, that is, it has the additional property that its graph contains no line segment of positive length. We do not display the verification of these properties of $g$. These properties imply that $g$ cannot have more than one minimum. We show this by contradiction. If there were two different minima $x$ and $y$, then $g(x) = g(y)$ and so, by convexity of $g$, the line segment with endpoints $(x, g(x))$ and $(y, g(y))$ lies nowhere below the graph of $g$. But by the minimality property of $x$ and $y$, this line segment must lie on the graph of $g$. This contradicts the strict convexity of $g$. Thus we have proved that the optimization problem

$$\max f(x) = (x_1 - v_1)(x_2 - v_2), \quad x \in F,\ x \ge v$$

has a unique solution.

Now assume that $\varphi$ is a rule that satisfies the three requirements. We want to prove that $\varphi(F, v)$ is the unique solution of the optimization problem

$$\max f(x) = (x_1 - v_1)(x_2 - v_2), \quad x \in F,\ x \ge v$$

for each bargaining opportunity $(F, v)$. We can assume, by an appropriate choice of the utility functions of the two individuals, that the two coordinates of the optimal solution of the optimization problem above are equal to 1 and that $v_1 = v_2 = 0$. Then $(1, 1)$ is the solution of the problem $\max x_1 x_2$, $x \in F$, $x \ge 0$. We claim that $F$ is contained in the half-plane $G$ given by $x_1 + x_2 \le 2$. We prove this by contradiction. Assume that there exists a point $x \in F$ outside this half-plane. Then the line interval with endpoints $(1, 1)$ and $x$ is contained in $F$, by the convexity of $F$. This line interval must contain points $y$ near $(1, 1)$ with $y_1 y_2 > 1$, as a simple picture makes clear (and a calculation can show rigorously). This is in contradiction with the maximality of $(1, 1)$. As $G$ is symmetric, we get $\varphi(G, 0) = (1, 1)$. Moreover $\varphi(G, 0) \in F$, so $\varphi(F, 0) = (1, 1)$. Thus we have proved that a rule $\varphi$ that satisfies the three requirements must be given by the optimization problem above.

Finally, it is not hard to show that, conversely, the rule that is given by the optimization problem satisfies the three requirements. We do not display the verification.
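The verification in Example 1.1.2 can be cross-checked by brute force. The following is a rough sketch only, not the method of the text: it evaluates the Nash product $x_1 x_2$ on a grid of probabilities $(p_1, p_2, p_3)$ with step $1/20$ (an arbitrary choice that happens to contain $p_2 = 1/2$), so it confirms the claimed maximum on that grid rather than proving it.

```python
from fractions import Fraction
from itertools import product

# Expected changes in utility (Alice, Bob) when bat, box, ball are handed over.
GENERATORS = [(-1, 2), (-2, 2), (4, -1)]
STEPS = 20  # grid resolution, an assumption of this sketch
grid = [Fraction(k, STEPS) for k in range(STEPS + 1)]


def bargain(p):
    """Point of F that belongs to the probabilities p = (p1, p2, p3)."""
    x1 = sum(pi * g[0] for pi, g in zip(p, GENERATORS))
    x2 = sum(pi * g[1] for pi, g in zip(p, GENERATORS))
    return x1, x2


points = [(p, bargain(p)) for p in product(grid, repeat=3)]

# Maximize the Nash product over the grid points with x >= v = (0, 0).
best_value, best_p, best_x = max(
    (x[0] * x[1], p, x) for p, x in points if x[0] >= 0 and x[1] >= 0
)

# All grid points of F lie on one side of the line x1 + x2 = 4.
on_one_side = all(x[0] + x[1] <= 4 for _, x in points)
print(best_value, best_p, best_x, on_one_side)
```

Exact rational arithmetic avoids any ambiguity about ties; on this grid the maximum is attained only at the probabilities of Example 1.1.2.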
1.2 Convex Sets and Cones: Definitions

Now we begin properly, with our central concept: convex set. We will use the notation $\mathbb{R}_+ = [0, +\infty)$ and $\mathbb{R}_{++} = (0, +\infty)$.

Definition 1.2.1 A set $A \subseteq \mathbb{R}^n$, where $\mathbb{R}^n$ is the space of $n$-columns, is called convex if it contains for each two of its points also the entire line segment having these points as endpoints,

$$a, b \in A,\ \rho, \sigma \in \mathbb{R}_+,\ \rho + \sigma = 1 \ \Rightarrow\ \rho a + \sigma b \in A.$$

If $\sigma$ runs from 0 to 1, and so $\rho$ runs from 1 to 0, then the point $\rho a + \sigma b = (1 - \sigma)a + \sigma b$ runs in a straight line from $a$ to $b$.
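Definition 1.2.1 can be probed on examples. The sketch below is illustrative only: it searches a finite grid for a pair of points of a set whose segment leaves the set. Finding such a pair proves that the set is not convex; finding none proves nothing, since only finitely many segments are tested. The helper names and the two test sets (a closed disk, and a ring with a hole) are assumptions of this sketch.

```python
from itertools import product


def convex_combination(a, b, sigma):
    """Return rho*a + sigma*b with rho = 1 - sigma, a point of the segment [a, b]."""
    rho = 1 - sigma
    return tuple(rho * ai + sigma * bi for ai, bi in zip(a, b))


def find_violation(contains, points, sigmas=(0.25, 0.5, 0.75)):
    """Return (a, b, c) with a, b in the set and c on [a, b] outside it, or None."""
    inside = [p for p in points if contains(p)]
    for a, b in product(inside, repeat=2):
        for sigma in sigmas:
            c = convex_combination(a, b, sigma)
            if not contains(c):
                return a, b, c
    return None


def in_disk(p):
    """Closed unit disk: convex."""
    return p[0] ** 2 + p[1] ** 2 <= 1


def in_ring(p):
    """Closed unit disk with a hole of radius 1/2: not convex."""
    return 0.25 <= p[0] ** 2 + p[1] ** 2 <= 1


# Grid with step 1/4; all coordinates are exactly representable as floats.
grid = [(i / 4, j / 4) for i in range(-4, 5) for j in range(-4, 5)]

disk_violation = find_violation(in_disk, grid)
ring_violation = find_violation(in_ring, grid)
print(disk_violation, ring_violation)
```

For the ring, one witness is the pair $(-0.75, 0)$, $(0.75, 0)$, whose midpoint is the origin; the search may report a different pair first.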
Fig. 1.3 Convex set: yes or no (five sets, labeled from left to right: yes, yes, no, no, no)
Example 1.2.2 (Convex Set)
1. Figure 1.3 illustrates the concept of convex set. Five shaded subsets of the plane $\mathbb{R}^2$ have been drawn. From left to right: the first two are convex, the other three are not: the third as it has a dent, the fourth as it has a hole, the fifth as it consists of two pieces.
2. For $n = 2$, if a convex set $A$ in the plane $\mathbb{R}^2$ represents land, and its complement $\mathbb{R}^2 \setminus A$ water, then a rope stretched between any two points of $A$ remains dry.
3. The boundary of a convex set can be very irregular. Each set $A$ in $\mathbb{R}^2$ that contains the standard open unit disk $x_1^2 + x_2^2 < 1$ and is contained in the standard closed unit disk $x_1^2 + x_2^2 \le 1$ is convex. So the complement of $A$ in $x_1^2 + x_2^2 \le 1$ can be any subset of the standard unit circle $x_1^2 + x_2^2 = 1$. The only such set $A$ that is closed is $x_1^2 + x_2^2 \le 1$.

One could try to enumerate all convex sets.

Example 1.2.3 (Attempts to Enumerate Convex Sets)
1. The convex subsets of the line $\mathbb{R}$ can be enumerated. A subset of $\mathbb{R}$ is convex iff (if and only if) it is an interval; on each side it can be open or closed, finite or infinite. That is, the list of convex sets in $\mathbb{R}$ is
$$\emptyset,\ (a, b),\ [a, b],\ (a, b],\ [a, b),\ (a, +\infty),\ (-\infty, a),\ [a, +\infty),\ (-\infty, a],\ (-\infty, +\infty)$$
for all $-\infty < a < b < +\infty$.
2. The convex subsets of the plane $\mathbb{R}^2$ cannot be enumerated. However, if they are bounded (contained in a large disk with center the origin), then one can do the next best thing: one can approximate them by simpler convex sets, convex polygons, and these can be enumerated. A polygon is a region of the plane for which the boundary consists of a finite number of straight line segments, the sides of the polygon.
(a) A polygon is convex iff all its interior angles are less than $\pi$.
(b) The convex polygons can be enumerated by the lengths of their sides and their angles. Let a $k$-sided convex polygon be given. Number its vertices from 1 to $k$, starting at some chosen vertex and going counterclockwise. Let $\varphi_i$ be the angle at the $i$th vertex and let $l_i$ be the length of the side that has the $i$th vertex as the first one of its two boundary vertices, going counterclockwise. Then $\varphi_1, \ldots, \varphi_k$ is a sequence of positive numbers in $(0, \pi)$ with sum $(k - 2)\pi$, and $l_1, \ldots, l_k$ is a sequence of positive numbers. All such pairs of sequences are obtained for a unique $k$-sided convex polygon, if convex polygons that differ by rotations and translations are identified. This result has no simple generalization beyond the plane.
(c) Here is another way to enumerate convex polygons. Let a $k$-sided convex polygon be given. Let $T$ be the set of its outside normals $t$ of unit length, $\|t\| = (t_1^2 + t_2^2)^{1/2} = 1$; for each $t \in T$, let $a_t$ be the length of the side with normal $t$. Then $\sum_{t \in T} a_t t = 0$. All such pairs of sequences are obtained for a unique $k$-sided polygon, if polygons that differ by rotations and translations are identified. The advantage of this result is that it has a nice straightforward generalization beyond the plane.
(d) Note that a convex $k$-sided polygon has $k$ vertices, and that this set of vertices characterizes the polygon. However, the characterization of sets of $k$ points in the plane that are the vertices of a convex $k$-sided polygon is not so simple. This is a disadvantage of enumerating polygons by their vertices.
(e) For each convex set $A$ that contains a small disk with center the origin and for each $\varepsilon > 0$, there is a convex polygon $P$ such that $P \supseteq A \supseteq (1 - \varepsilon)P = \{(1 - \varepsilon)p \mid p \in P\}$.
3. The generalization of a convex polygon in the plane $\mathbb{R}^2$ to dimension $n$ is a convex polytope in $\mathbb{R}^n$, a region in $\mathbb{R}^n$ for which the boundary consists of a finite number of bounded pieces of hyperplanes in $\mathbb{R}^n$. The enumeration of convex polygons by angles and lengths of sides cannot be generalized in a convenient way to polytopes. However, the enumeration of convex polygons by normals and lengths of sides can be extended in a convenient way to polytopes (see Theorem 1.12.7).
The approximation of bounded convex sets in the plane by convex polygons can also be extended to dimension $n$, with convex polytopes in the role of polygons.

Note that the intersection of any collection of convex sets in $\mathbb{R}^n$ is again a convex set. In particular, if two convex sets in $\mathbb{R}^n$ have no common point, then their intersection is the empty set, which is considered to be a convex set by the definition above. However, the union of two convex sets need not be convex, as the example of the convex sets $[2, 3]$ and $[4, 6]$ shows.

Often only the direction of a vector $v \in \mathbb{R}^n$ is important. Then one identifies two vectors that are a positive scalar multiple of each other. This leads to the following definition.

Definition 1.2.4 A cone is a set $T$ in $\mathbb{R}^n$ that is positive homogeneous, that is, for each $v \in T$, also all positive scalar multiples $\rho v$, $\rho > 0$ belong to $T$.

So a cone is a union of a collection of open rays $R_u = \{\rho u \mid \rho \in \mathbb{R}_{++}\}$, where $u$ is a unit vector, $\|u\| = (u_1^2 + \cdots + u_n^2)^{1/2} = 1$, with possibly the origin adjoined to it.
Each open ray in $\mathbb{R}^n$ contains a unique unit vector, and so the collection of open rays of a cone in $\mathbb{R}^n$ can be viewed as a subset of the standard unit sphere $x_1^2 + \cdots + x_n^2 = 1$. Each open ray $R_u$ with positive last coordinate, $u_n > 0$, contains a unique vector with last coordinate equal to 1, $(x_1, \ldots, x_{n-1}, 1)$; this gives a point $(x_1, \ldots, x_{n-1})$ in $\mathbb{R}^{n-1}$. Therefore, the collection of open rays of a cone in $\mathbb{R}^n$ that is contained in the half-space $x_n > 0$ can be viewed as a subset of $\mathbb{R}^{n-1}$.

Example 1.2.5 (A Cone Determined by a Positive Homogeneous Function) Let $f : \mathbb{R}^n \to \mathbb{R}$ be a positive homogeneous function of degree $\nu \in \mathbb{R}$, that is, $f(\rho x) = \rho^\nu f(x)$ for all $\rho > 0$ and $x \in \mathbb{R}^n$. Then the solution set of $f(x) = 0$ is a cone. For example, the function $f_n(x_1, x_2, x_3) = x_2^2 x_3 - x_1^3 + n^2 x_1 x_3^2$ is positive homogeneous of degree 3 for all $n \in \mathbb{R}$. The set of solutions of $f_n = 0$ for which $x_3 \ge 0$ is a cone that has two horizontal rays, $R_{(0,1,0)}$ and $R_{(0,-1,0)}$, and the set of rays that are contained in the open half-space $x_3 > 0$ can be viewed as the curve $x_2^2 = x_1^3 - n^2 x_1$ in the plane $\mathbb{R}^2$.

[Aside. This curve is related to the famous unsolved congruent number problem: determine whether a square-free natural number $n$ is a congruent number, that is, the area of a right triangle with rational number sides. Such triangles correspond one-to-one to the nontrivial rational number points on the curve $x_2^2 = x_1^3 - n^2 x_1$ (nontrivial means $x_2 \ne 0$). There is an easily testable criterion for the existence of such points, but this depends on the truth of the Birch and Swinnerton-Dyer conjecture, which is one of the Millennium Problems (‘million dollar problems’).]

We are only interested in a special type of cones.

Definition 1.2.6 A convex cone $C \subseteq \mathbb{R}^n$ is a set in $\mathbb{R}^n$ that is simultaneously a cone and a convex set.

A simpler description of a convex cone is that it is a set $C \subseteq \mathbb{R}^n$ that contains for each two elements also all their positive linear combinations,

$$a, b \in C,\ \sigma, \tau \in \mathbb{R}_{++} \ \Rightarrow\ \sigma a + \tau b \in C.$$
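The homogeneity claim of Example 1.2.5 is easy to spot-check. The sketch below is an illustration, not part of the original text: it evaluates $f_n$ with exact rational arithmetic at a few sample values. The choice $n = 6$ and the curve point $(-3, 9)$ are assumptions picked for the sketch (6 is the area of the right triangle with sides 3, 4, 5).

```python
from fractions import Fraction


def f(n, x):
    """The function f_n(x1, x2, x3) = x2^2 x3 - x1^3 + n^2 x1 x3^2."""
    x1, x2, x3 = x
    return x2 ** 2 * x3 - x1 ** 3 + n ** 2 * x1 * x3 ** 2


def scale(rho, x):
    """Positive multiple rho * x of the vector x."""
    return tuple(rho * xi for xi in x)


n = 6
samples = [(1, 2, 3), (-2, 5, 1), (Fraction(1, 2), -3, 4)]
rhos = [Fraction(1, 3), 2, Fraction(7, 5)]

# Homogeneity of degree 3: f_n(rho x) = rho^3 f_n(x).
homogeneous = all(
    f(n, scale(rho, x)) == rho ** 3 * f(n, x) for x in samples for rho in rhos
)

# (-3, 9) lies on x2^2 = x1^3 - n^2 x1 for n = 6, so the whole open ray
# through (-3, 9, 1) lies in the solution set of f_n = 0.
ray_point = (-3, 9, 1)
on_cone = all(f(n, scale(rho, ray_point)) == 0 for rho in rhos)

# The two horizontal rays mentioned in the example.
horizontal = f(n, (0, 1, 0)) == 0 and f(n, (0, -1, 0)) == 0
print(homogeneous, on_cone, horizontal)
```

A finite check of this kind only illustrates the definition; the identity itself follows from each term of $f_n$ having total degree 3.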
Note that the intersection of any collection of convex cones in $\mathbb{R}^n$ is again a convex cone.

Warning In some texts, convex cones are required to contain the origin. This hardly matters: if we adjoin the origin to a convex cone that does not contain the origin, we get a convex cone containing the origin. However, for some constructions it is convenient to allow convex cones that do not contain the origin. The precise relation between convex cones that do not contain the origin and convex cones that do contain the origin is given in Exercise 10.

Example 1.2.7 (Convex Cone)
1. Figure 1.4 illustrates the concept of convex cone. The left-hand side illustrates the defining property in the simpler description. The right-hand side illustrates why the word ‘cone’ has been chosen: a typical three-dimensional cone looks like an extended ice-cream cone.
Fig. 1.4 Convex cone (left: the points $a$, $b$, $\sigma a$, $\tau b$ and $\sigma a + \tau b$, with the origin $O$; right: a three-dimensional cone with apex $O$)
2. Convex cones in the line $\mathbb{R}$ can be enumerated: $\emptyset$, $(0, +\infty)$, $(-\infty, 0)$, $[0, +\infty)$, $(-\infty, 0]$, $(-\infty, +\infty)$.
3. Convex cones in the plane can be enumerated. If they are closed (that is, contain their boundary points), then the list is: $\emptyset$, $\mathbb{R}^2$ and each region with boundary the two legs of an angle $\le \pi$, with legs having their endpoint at the origin.
4. Three interesting convex cones in $\mathbb{R}^3$:
(a) the first orthant $\mathbb{R}^3_+ = \{x \in \mathbb{R}^3 \mid x_1 \ge 0, x_2 \ge 0, x_3 \ge 0\}$,
(b) the Lorentz cone or ice cream cone $L^3 = \{x \in \mathbb{R}^3 \mid x_3 \ge (x_1^2 + x_2^2)^{1/2}\}$,
(c) the positive semidefinite cone $S^3_+ = \{x \in \mathbb{R}^3 \mid x_1 x_3 - x_2^2 \ge 0, x_1 \ge 0, x_3 \ge 0\}$. If you identify $x \in \mathbb{R}^3$ with the symmetric $2 \times 2$-matrix $(a_{ij})_{ij}$ given by $a_{11} = x_1$, $a_{12} = a_{21} = x_2$, $a_{22} = x_3$, then this convex cone is identified with the set of positive semidefinite $2 \times 2$-matrices (that is, the symmetric $2 \times 2$-matrices $A$ for which $v^\top A v \ge 0$ for all $v \in \mathbb{R}^2$).
(d) The three examples of convex cones in $\mathbb{R}^3$ given above can be generalized, as we will see in Sect. 1.7.1.
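The three cones of Example 1.2.7 can be explored on an integer grid. The following is a small sketch under stated assumptions: the membership tests transcribe the defining inequalities (for the Lorentz cone in the equivalent squared form $x_3 \ge 0$, $x_3^2 \ge x_1^2 + x_2^2$), the closure test only samples the weights 1, 2, 3, and the eigenvalue formula is the standard closed form for a symmetric $2 \times 2$-matrix. All function names are choices made for this sketch.

```python
import math
from itertools import product


def in_orthant(x):
    return all(c >= 0 for c in x)


def in_lorentz(x):
    x1, x2, x3 = x
    return x3 >= 0 and x3 * x3 >= x1 * x1 + x2 * x2


def in_psd(x):
    x1, x2, x3 = x
    return x1 >= 0 and x3 >= 0 and x1 * x3 - x2 * x2 >= 0


def min_eigenvalue(x):
    """Smallest eigenvalue of the symmetric matrix [[x1, x2], [x2, x3]]."""
    x1, x2, x3 = x
    return (x1 + x3 - math.sqrt((x1 - x3) ** 2 + 4 * x2 * x2)) / 2


def closed_under_positive_combinations(member, points, weights=(1, 2, 3)):
    """Check sigma*a + tau*b for sampled a, b in the set and sampled weights."""
    inside = [x for x in points if member(x)]
    return all(
        member(tuple(s * ai + t * bi for ai, bi in zip(a, b)))
        for a in inside for b in inside for s in weights for t in weights
    )


grid = list(product(range(-3, 4), repeat=3))

cone_checks = {
    name: closed_under_positive_combinations(member, grid)
    for name, member in [("orthant", in_orthant),
                         ("lorentz", in_lorentz),
                         ("psd", in_psd)]
}
# The inequalities defining the third cone agree with 'smallest eigenvalue >= 0'.
psd_agrees = all(in_psd(x) == (min_eigenvalue(x) >= -1e-12) for x in grid)
print(cone_checks, psd_agrees)
```

The small tolerance in the eigenvalue comparison guards against rounding at boundary points of the cone; it is a choice of this sketch.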
1.3 Chapters 1–4 in the Special Case of Subspaces

The aim of this section is to give some initial insight into the contents of Chaps. 1–4 of this book. The aim of these chapters is to present the properties of convex sets, and now we display the specializations of some of these properties from convex sets to special convex sets, subspaces. A subspace of $\mathbb{R}^n$ is a subset $L$ of $\mathbb{R}^n$ that is closed under sums and scalar multiples,

$$x, y \in L,\ \alpha, \beta \in \mathbb{R} \ \Rightarrow\ \alpha x + \beta y \in L.$$

A subspace can be viewed as the central concept from linear algebra: it is the solution set of a finite system of homogeneous linear equations in $n$ variables. A subspace is a convex cone and so, in particular, it is a convex set. The specializations of the results from Chaps. 1–4 from convex sets to subspaces are standard results from linear algebra.

Chapter 1 A subspace of $X = \mathbb{R}^n$ is defined to be a nonempty subset $L \subseteq X$ that is closed under linear combinations of two elements:

$$a, b \in L,\ \rho, \sigma \in \mathbb{R} \ \Rightarrow\ \rho a + \sigma b \in L.$$

Example 1.3.1 (False Equations and Homogenization) Now we illustrate homogenization of convex sets by means of a simple high school example. Before the use of the language of sets in high school, the following three cases were distinguished in some school books, for a system of two linear equations in two variables

$$a_{11} x_1 + a_{12} x_2 = b_1, \quad a_{21} x_1 + a_{22} x_2 = b_2,$$

where for each equation, the coefficients on the left side are not both zero: the system is either undetermined, or uniquely determined, or false. This distinction of cases, and in particular the intriguing choice of terminology ‘false’, might suggest that this is a substantial phenomenon. This distinction can be expressed as follows, using sets: the solution set of the system is either infinite, or it consists of one element, or it is the empty set. In geometric language: two lines in the plane are either identical, or they intersect in one point, or they are parallel. A variant of homogenization (for simplicity, we work with nonzero multiples instead of with positive multiples) turns the system into the following form:

$$a_{11} x_1 + a_{12} x_2 = b_1 x_3, \quad a_{21} x_1 + a_{22} x_2 = b_2 x_3.$$

If these equations are not multiples of each other, then the system always has a unique nonzero solution up to scalar multiples. In geometric language: the intersection of two different planes in space through the origin is always a line. So homogenization makes the distinction of cases disappear, and reveals the simple essence of the phenomenon of false systems of equations.

Chapter 2 There are two standard binary operations for subspaces, the intersection $L_1 \cap L_2$ and the Minkowski sum $L_1 + L_2 = \{x + y \mid x \in L_1, y \in L_2\}$. These can be described systematically as follows.
The subspace $L_1 + L_2$ is the image of the set $L_1 \times L_2$ under the linear function sum map $+_X : X \times X \to X$, given by the recipe $(x, y) \mapsto x + y$. The subspace $L_1 \cap L_2$ is the inverse image of the same set $L_1 \times L_2$ under the linear function diagonal map $\Delta_X : X \to X \times X$, given by the recipe $x \mapsto (x, x)$. This description has the advantage that it reveals a relation between $\cap$ and $+$, as the linear functions $\Delta_X$ and $+_X$ are each other's transpose. That is,

$$\Delta_X(u) \cdot (v, w) = u \cdot {+_X}(v, w)$$

for all $u, v, w \in X$, where the inner product on $X \times X$ is denoted as $(x, y) \cdot (v, w) = x_1 v_1 + \cdots + x_n v_n + y_1 w_1 + \cdots + y_n w_n$.

Chapter 3 For each subspace $L$, we can choose a finite sequence of elements in $L$ such that each element of $L$ can be written in a unique way as a linear combination of these elements. So all subspaces have essentially the same ‘shape’.

Chapter 4 For each subspace $L$, we can consider its so-called dual object, the orthogonal complement $L^\perp$, consisting of all elements in $X$ that are orthogonal to $L$,

$$L^\perp = \{a \in X \mid a \cdot x = a_1 x_1 + \cdots + a_n x_n = 0 \ \ \forall x \in L\}.$$

That is, $L^\perp$ consists of all $a \in X$ for which $L$ is contained in the solution set of the equation $a_1 x_1 + \cdots + a_n x_n = 0$. One has the duality theorem $L^{\perp\perp} = L$. Moreover, one has the calculus rule that $K_1 + K_2$ and $L_1 \cap L_2$ are each other's orthogonal complement if $K_i$ and $L_i$ are each other's orthogonal complement for $i = 1, 2$. This rule is an immediate consequence of the systematic description of $\cap$ and $+$ and the following property: if the subspaces $K, L \subseteq \mathbb{R}^n$ are each other's orthogonal complement, then for each $m \times n$-matrix $M$, the following two subspaces in $\mathbb{R}^m$ are each other's orthogonal complement: $\{Mx \mid x \in L\}$, the image of $L$ under $M$, and $\{y \mid M^\top y \in K\}$, the inverse image of $K$ under the transpose of $M$.
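Returning to Example 1.3.1, the homogenized system can be solved uniformly with a cross product: each homogenized equation $a_{i1} x_1 + a_{i2} x_2 - b_i x_3 = 0$ is a plane through the origin with normal $(a_{i1}, a_{i2}, -b_i)$, and the cross product of two such normals spans the intersection line of two different planes. The sketch below is illustrative; the function names and the three sample systems are assumptions, and the case labels are the school-book terms quoted in the example.

```python
from fractions import Fraction


def cross(u, v):
    """Cross product of two vectors in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def solve_homogenized(eq1, eq2):
    """Each equation is (a1, a2, b), meaning a1*x1 + a2*x2 = b.
    Returns a vector (x1, x2, x3) spanning the solutions of the homogenized
    system, or (0, 0, 0) if the two homogenized equations are multiples."""
    n1 = (eq1[0], eq1[1], -eq1[2])
    n2 = (eq2[0], eq2[1], -eq2[2])
    return cross(n1, n2)


def classify(eq1, eq2):
    x1, x2, x3 = solve_homogenized(eq1, eq2)
    if (x1, x2, x3) == (0, 0, 0):
        return "undetermined"          # identical lines
    if x3 == 0:
        return "false"                 # parallel lines: a solution 'at infinity'
    return "uniquely determined"       # the point (x1/x3, x2/x3)


unique = solve_homogenized((1, 1, 3), (1, -1, 1))    # x1 + x2 = 3, x1 - x2 = 1
point = (Fraction(unique[0], unique[2]), Fraction(unique[1], unique[2]))
parallel = solve_homogenized((1, 1, 1), (1, 1, 2))   # x1 + x2 = 1, x1 + x2 = 2
print(point, parallel)
```

In the 'false' case the homogenized solution has $x_3 = 0$ and its first two coordinates give the common direction of the two parallel lines.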
1.4 Convex Cones: Visualization by Models
The aim of this section is to present three models for convex cones $C \subseteq X = \mathbb{R}^n$ that lie in the upper half-space $x_n \ge 0$: the ray model, the hemisphere model and the top-view model.

Example 1.4.1 (Ray, Hemisphere and Top-View Model for a Convex Cone) Figure 1.5 illustrates the three models in one picture for a convex cone C in the plane $\mathbb{R}^2$ that lies above or on the horizontal axis. The ray model visualizes this convex cone $C \subseteq \mathbb{R}^2$ by viewing $C \setminus \{0_2\}$ as a collection of open rays, and then considering the set whose elements are these rays; the hemisphere model visualizes C by means of an arc: the intersection of C with the upper half-circle $x_1^2 + x_2^2 = 1,\ x_2 \ge 0$; the top-view model visualizes C by looking
Fig. 1.5 Ray, hemisphere and top-view model for a convex cone
from high above at the arc, or, to be precise, by taking the interval in $[-1, +1]$ that is the orthogonal projection of the arc onto the horizontal axis. The three models are all useful for visualization in low dimensions. Fortunately, most, maybe all, properties of convex cones in $\mathbb{R}^n$ for general n can be understood by looking at such low-dimensional examples. Therefore, making pictures in your head or with pencil and paper by means of one of these models is of great help in understanding properties of convex cones, and so in understanding all of convex analysis, as we will see.
1.4.1 Ray Model for a Convex Cone
The role of nonzero elements of a convex cone $C \subseteq X = \mathbb{R}^n$ is usually to describe one-sided directions. Then, two nonzero elements of C that differ by a positive scalar multiple, $a$ and $\rho a$ with $a \in \mathbb{R}^n$ and $\rho > 0$, are considered equivalent. The equivalence classes of $C \setminus \{0_n\}$ are the open rays of C, sets of all positive multiples of a nonzero element. So, what often only matters about a convex cone C is its set of open rays, as this describes a set of one-sided directions. The set of open rays of C will be called the ray model for the convex cone C. The ray model is the simplest description of the one-sided directions of C from a mathematical point of view, but it might be seen as inconvenient that a one-sided direction is modeled by an infinite set. Moreover, it requires some preparations such as the definition of distance between two open rays: this is needed in order to define convergence of a sequence of rays. Suppose we have two open rays, $\{\rho v \mid \rho > 0\}$ and $\{\rho w \mid \rho > 0\}$, where $v, w \in X$ are unit vectors, that is, their lengths or Euclidean norms, $\|v\| = (v_1^2 + \cdots + v_n^2)^{1/2}$ and $\|w\| = (w_1^2 + \cdots + w_n^2)^{1/2}$, are both equal to 1. Then the distance between these two rays can be taken to be the angle $\varphi \in [0, \pi]$ between the rays, which is defined by $\cos \varphi = v \cdot w = v_1 w_1 + \cdots + v_n w_n$, the dot product of v and w. To be precise about the concept of distance, the concept of metric space would be needed; however, we will not consider this concept.

Example 1.4.2 (Ray Model for a Convex Cone) Figure 1.5 above illustrates the ray model for convex cones in dimension two. A number of rays of the convex cone C are drawn.
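The distance between two open rays described above can be computed directly from any two nonzero representatives. The following is a small sketch; the function name `ray_angle` is an illustrative choice, and the representatives need not be unit vectors because the code normalizes them first.

```python
import math

def ray_angle(v, w):
    """Angle in [0, pi] between the open rays through nonzero vectors v and w."""
    norm_v = math.sqrt(sum(x * x for x in v))
    norm_w = math.sqrt(sum(x * x for x in w))
    cos_phi = sum(a * b for a, b in zip(v, w)) / (norm_v * norm_w)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_phi)))

# Two representatives of the same ray are at distance (almost exactly) zero.
print(ray_angle((1, 1), (3, 3)))
# Perpendicular rays are at distance pi / 2.
print(ray_angle((1, 0), (0, 2)))
```

The result does not depend on the choice of representatives, which is what makes the angle a well-defined notion of distance on the ray model.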
1.4.2 Sphere Model for a Convex Cone
One often chooses a representative for each open ray, in order to replace open rays (which are infinite sets of points) by single points. A convenient way to do this is to normalize: that is, to choose the unit vector on each ray. Thus the one-sided directions in the space $X = \mathbb{R}^n$ are modeled as the points on the standard unit sphere $S_X = S_n = \{x \in X \mid \|x\| = 1\}$ in X. The set of unit vectors in a convex cone $C \subseteq X = \mathbb{R}^n$ will be called the sphere model for C. The subsets of the standard unit sphere $S_X$ that one gets in this way are precisely the geodesically convex subsets of $S_X$. A subset T of $S_X$ is called geodesically convex if for each two different points p, q of T that are not antipodes ($p \ne -q$), the shortest curve on $S_X$ that connects them is entirely contained in T. This curve is called the geodesic connecting these two points. Note that for two different points p, q on $S_X$ that are not antipodes, there is a unique great circle on $S_X$ that contains them. A great circle on $S_X$ is a circle on $S_X$ with center the origin, that is, it is the intersection of the sphere $S_X$ with a two-dimensional subspace of X. This great circle through p and q gives two curves on $S_X$ on this circle connecting the two points, a short one and a long one. The short one is the geodesic connecting these points.

Example 1.4.3 (Sphere Model for a Convex Cone)
1. Figure 1.5 above illustrates the sphere model for convex cones in dimension two. The convex cone C is modeled by an arc.
2. Figure 1.6 illustrates the sphere model for convex cones in dimension three, and it illustrates the concepts great circle and geodesic on $S_X$ for $X = \mathbb{R}^3$. Two great circles are drawn. These model two convex cones that are planes through the origin. For the two marked points on one of these circles, the segment on this circle on the front of the sphere connecting the two points is their geodesic.
Moreover, you see that two planes in $\mathbb{R}^3$ through the origin (and remember that a plane is a convex cone) are modeled in the sphere model for convex cones by two great circles on the sphere.
3. Figure 1.7 is another illustration of the sphere model for a convex cone C in dimension two. Here the convex cone C is again modeled by an arc.

The sphere model is convenient for visualizing the one-sided directions determined by a convex cone $C \subseteq X = \mathbb{R}^n$ for dimension n up to three: for n = 3, a one-sided direction in three-dimensional space is modeled as one point on the standard unit sphere $S_3$ in three-dimensional space, the solution set of the equation $x_1^2 + x_2^2 + x_3^2 = 1$.

Fig. 1.6 Geodesic and great circle
Fig. 1.7 Sphere model for a convex cone
Fig. 1.8 The hemisphere model for a convex cone
1.4.3 Hemisphere Model for a Convex Cone
Often, we will be working with convex cones in $X = \mathbb{R}^n$ that lie above or on the horizontal coordinate hyperplane $x_n = 0$. Then the unit vectors of these convex cones lie on the upper hemisphere $x_1^2 + \cdots + x_n^2 = 1,\ x_n \ge 0$. Then the sphere model is called the hemisphere model.

Example 1.4.4 (Hemisphere Model for a Convex Cone)
1. Figure 1.5 above illustrates the hemisphere model for convex cones in dimension two. The convex cone C is modeled by an arc on the upper half-circle.
2. Figure 1.8 illustrates the hemisphere model for three convex cones C in dimension three. The models of these three convex cones C are shaded regions on the upper hemisphere. These regions are indicated by the same letter as the convex cone that they model, by C. The number of points in the model of C that lie on the circle that bounds the hemisphere is, from left to right: zero, one, infinitely many. This means that the number of horizontal rays of the convex cone C is, from left to right: zero, one, infinitely many.
1.4.4 Top-View Model for a Convex Cone
In fact, for a convex cone $C \subseteq X = \mathbb{R}^n$ that is contained in the half-space $x_n \ge 0$, one can use an even simpler model, in one dimension lower: in $\mathbb{R}^{n-1}$ instead of in $\mathbb{R}^n$. Suppose we look at the hemisphere from high above it. Here we view the nth coordinate axis of $X = \mathbb{R}^n$ as a vertical line. Then the hemisphere looks like the standard closed unit ball in dimension $n - 1$, and the subset of the hemisphere that models C looks like a subset of this ball. To be more precise, one can take the orthogonal projection of points on the hemisphere onto the hyperplane $x_n = 0$, by setting the last coordinate equal to zero. Then the hyperplane $x_n = 0$ in $X = \mathbb{R}^n$ can be identified with $\mathbb{R}^{n-1}$ by omitting the last coordinate 0. This gives that the set of one-sided directions of the convex cone is modeled as a subset of the standard closed unit ball in $\mathbb{R}^{n-1}$,
$$B_{\mathbb{R}^{n-1}} = B_{n-1} = \{x \in \mathbb{R}^{n-1} \mid x_1^2 + \cdots + x_{n-1}^2 \le 1\}.$$
This subset will be called the top-view model for a convex cone.

Example 1.4.5 (Top-View Model for a Convex Cone)
1. Figure 1.5 above illustrates the top-view model for convex cones in dimension two. Here the convex cone C is modeled by a closed line segment contained in the closed interval $[-1, 1]$, the standard closed unit ball $B_1$ in $\mathbb{R}^1$.
2. Figure 1.9 illustrates the top-view model for three convex cones C in dimension three. The models for these three convex cones C are shaded regions in the standard closed unit disk $B_2$ in the plane $\mathbb{R}^2$, which are indicated by the same letter as the convex cones that they model, by C. All three convex cones that are modeled are chosen in such a way that they contain the positive vertical coordinate axis $\{(0, 0, \rho) \mid \rho > 0\}$ in their interior. As a consequence, their top-view models contain the center of the disk in their interior. The arrows represent images under orthogonal projection on the plane $x_3 = 0$ of some geodesics that start at the top of the hemisphere. The number of points in the model of C that lie on the circle is, from left to right: zero, one, infinitely many. This means that the number of horizontal rays of the convex cone C is, from left to right: zero, one, infinitely many.
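The passage from an open ray to its point in the hemisphere model and then in the top-view model can be sketched as follows. The function names `hemisphere_point` and `top_view_point` are hypothetical; the input is any nonzero representative of a ray with last coordinate at least zero.

```python
import math

def hemisphere_point(c):
    """Unit vector on the open ray through a nonzero vector c with last coordinate >= 0."""
    if c[-1] < 0:
        raise ValueError("the cone is assumed to lie in the half-space x_n >= 0")
    norm = math.sqrt(sum(x * x for x in c))
    if norm == 0:
        raise ValueError("the origin does not lie on an open ray")
    return tuple(x / norm for x in c)

def top_view_point(c):
    """Orthogonal projection of the hemisphere point onto x_n = 0, last coordinate omitted."""
    return hemisphere_point(c)[:-1]

# The vertical ray is seen as the center of the disk.
print(top_view_point((0, 0, 5)))
# A horizontal ray is seen as a point on the bounding circle.
print(top_view_point((2, 0, 0)))
```

Horizontal rays are exactly the rays whose top-view point has norm 1, which matches the count of boundary points in Example 1.4.5.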
Fig. 1.9 The top-view model for a convex cone
The top-view model for convex cones can be used for visualization of convex cones in one dimension higher than the sphere and the hemisphere model, as it models one-sided directions of $C \subseteq X = \mathbb{R}^n$ as points in the ball $B_{\mathbb{R}^{n-1}} \subseteq \mathbb{R}^{n-1}$: so it can be used for visualization up to and including dimension four, whereas the sphere and hemisphere model can be used only up to and including dimension three. The top-view model has only one disadvantage: the orthogonal projections of the geodesics between two points on the hemisphere, which are needed to characterize the subsets of the ball that model convex cones, are less nice than the geodesics themselves. These geodesics on the hemisphere are nice: they are halves of great circles. Their projections onto the ball are less nice: they are halves of ellipses that are contained inside the ball and that are tangent to the boundary of the ball at two antipodes on the ball. The exception to this are the projections of halves of great circles through the top of the hemisphere: they are nice, as they are just straight line segments connecting two antipodes on the boundary of the ball, and so they run through the center of the ball.

Example 1.4.6 (Straight Geodesics in the Top-View Model) Figure 1.9 above illustrates the projections of these geodesics: some straight arrows are drawn from the center of the disk to the boundary of the top-view model for the convex cone C.
1.5 Convex Sets: Homogenization Now we come to the distinctive feature of this book: the consistent use of the description of a convex set in terms of a convex cone, its homogenization or conification.
1.5.1 Homogenization: Definition
Now we are going to present homogenization of convex sets, the main ingredient of the unified approach to convex analysis that is used in this book. That is, we are going to show that every convex set in $X = \mathbb{R}^n$ (and we recall that this is our basic object of study) can be described by a convex cone in a space one dimension higher, $X \times \mathbb{R} = \mathbb{R}^{n+1}$, that lies entirely on or above the horizontal coordinate hyperplane $x_{n+1} = 0$, that is, it lies in the closed upper half-space $x_{n+1} \ge 0$.

Example 1.5.1 (Homogenization of a Bounded Convex Set) Figure 1.10 illustrates the idea of homogenization of convex sets for a bounded convex set A in the plane $\mathbb{R}^2$. This crucial picture can be kept in mind throughout this book!
The plane $\mathbb{R}^2$ is drawn as the floor, $\mathbb{R}^2 \times \{0\}$, in three-dimensional space $\mathbb{R}^3$. The convex set A is drawn as lying on the floor, as the set $A \times \{0\}$, and this copy is indicated by the same letter as the set itself, by A. This set is lifted upward in vertical direction to level 1, giving the set $A \times \{1\}$. Then the union of all open rays that start at the origin and that run through a point of the set $A \times \{1\}$ is taken. This union is defined to be the homogenization or conification $C = c(A)$ of A. Then A is called the (convex set) dehomogenization or deconification of C. There is no loss of information if we go from A to C: we can recover A from C as follows. Intersect C with the horizontal plane at level 1. This gives the set $A \times \{1\}$. Then drop this set down on the floor in vertical direction. This gives the set A, viewed as lying on the floor.

Fig. 1.10 Homogenization of a convex set

The formal definition is as follows. It is convenient to reverse the order and to define first the dehomogenization and then the homogenization. If $C \subseteq X \times \mathbb{R} = \mathbb{R}^{n+1}$ is a convex cone and $C \subseteq X \times \mathbb{R}_+$, where $\mathbb{R}_+ = [0, +\infty)$ (that is, C lies on or above the horizontal hyperplane $X \times \{0\}$, or, to be more precise, C is contained in the closed upper half-space $x_{n+1} \ge 0$), then the intersection $C \cap (X \times \{1\})$ is of the form $A \times \{1\}$ for some unique set $A = d(C) \subseteq X$. This set is convex. Here is the verification. If $a, b \in A$, $\rho, \sigma \in [0, 1]$, $\rho + \sigma = 1$, then the point
$$\rho(a, 1) + \sigma(b, 1) = (\rho a + \sigma b, \rho + \sigma) = (\rho a + \sigma b, 1)$$
lies in C and also in $X \times \{1\}$; so it lies in $A \times \{1\}$. This gives $\rho a + \sigma b \in A$. This proves that A is a convex set, as claimed. Every convex set $A \subseteq X$ is the dehomogenization of some convex cone $C \subseteq X \times \mathbb{R}_+$.

Definition 1.5.2 The (convex set) dehomogenization or deconification of a convex cone $C \subseteq X \times \mathbb{R}_+$ is the convex set $A = d(C) \subseteq X$ defined by $A \times \{1\} = C \cap (X \times \{1\})$.
Then the convex cone C is called a homogenization or conification of the convex set A. There is no loss of information if we go over from a convex set A to a homogenization C: we can recover A from C: $A = d(C)$, that is, $A \times \{1\} = C \cap (X \times \{1\})$.

Example 1.5.3 (Deconification) The convex cones
$$C_1 = \{x \in \mathbb{R}^3 \mid x_1^2 + x_2^2 - x_3^2 \le 0,\ x_3 \ge 0\},$$
$$C_2 = \{x \in \mathbb{R}^3 \mid 2x_1^2 + 3x_2^2 - 5x_3^2 \le 0,\ x_3 \ge 0\},$$
$$C_3 = \{x \in \mathbb{R}^3 \mid x_1^2 - x_2 x_3 \le 0,\ x_3 \ge 0\},$$
$$C_4 = \{x \in \mathbb{R}^3 \mid -x_1 x_2 + x_3^2 \le 0,\ x_1 \ge 0,\ x_2 \ge 0,\ x_3 \ge 0\}$$
have as deconifications the convex sets that have the following boundaries, respectively: the circle $x_1^2 + x_2^2 = 1$, the ellipse $2x_1^2 + 3x_2^2 = 5$, the parabola $x_2 = x_1^2$, and the branch of the hyperbola $x_1 x_2 = 1$ that is contained in the first orthant $x \ge 0$.
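Definition 1.5.2 translates into two simple membership tests. The following sketch is illustrative only: the helper names are hypothetical, sets and cones are represented by membership predicates, and the cone $C_1$ of Example 1.5.3 serves as test case.

```python
def in_C1(x1, x2, x3):
    """Membership in the cone C1 = {x1^2 + x2^2 - x3^2 <= 0, x3 >= 0}."""
    return x1 * x1 + x2 * x2 - x3 * x3 <= 0 and x3 >= 0

def in_deconification(in_cone, a):
    """a lies in d(C) exactly when (a, 1) lies in C."""
    return in_cone(*a, 1)

def in_min_conification(in_set, point):
    """(x, t) lies in R_++ (A x {1}) exactly when t > 0 and x / t lies in A."""
    *x, t = point
    return t > 0 and in_set(tuple(xi / t for xi in x))

def in_unit_disk(a):
    return a[0] ** 2 + a[1] ** 2 <= 1

# The deconification of C1 is the closed unit disk.
print(in_deconification(in_C1, (0.5, 0.5)), in_unit_disk((0.5, 0.5)))
# Going back up: (1, 1, 2) is a positive multiple of (0.5, 0.5, 1).
print(in_min_conification(in_unit_disk, (1, 1, 2)))
```

The second test divides by the last coordinate, which is the computational content of "the open ray through $(a, 1)$".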
1.5.2 Nonuniqueness Homogenization
We have already stated the fact that each convex set $A \subseteq X$ is the deconification of a suitable convex cone $C \subseteq X \times \mathbb{R}_+$. This convex cone C, a homogenization of A, is not unique. To give a precise description of the nonuniqueness we need the important concept of recession direction.

Definition 1.5.4 Let $A \subseteq X = \mathbb{R}^n$ be a convex set. A recession vector of A is a vector $c \in X$ such that $a + tc \in A$ for all $a \in A$ and all $t \ge 0$. This means, if $c \ne 0_n$, that one can travel in A in the one-sided direction of c without ever leaving A. So you can escape to infinity within A in this direction. The recession cone $R_A$ of the convex set A is the set in X consisting of all recession vectors of A. The set $R_A$ is a convex cone. A recession direction of A is a one-sided direction of $R_A$, that is, an open ray of the convex cone $R_A$.

Example 1.5.5 (Recession Cone (Geometric)) Figure 1.11 illustrates the concept of recession cone for convex sets with boundary an ellipse, a parabola and a branch of a hyperbola respectively. On the left, a closed convex set A is drawn with boundary an ellipse. This set is bounded, so there are no recession directions: $R_A = \{0_n\}$. In the middle, a closed convex set A is drawn with boundary a parabola. This set is unbounded, and there is a unique recession direction: $R_A$ is a closed ray. This recession direction is indicated by a dotted half-line: c is a nonzero recession vector. On the right, a closed convex set A is drawn with boundary a branch of a hyperbola. This set is unbounded, and there are infinitely many recession directions: $R_A$ is the closed convex cone in the plane with legs the directions of the two asymptotes to the branch of the hyperbola that is the boundary of A.

Fig. 1.11 Recession cone of a convex set

Example 1.5.6 (Recession Cone (Analytic))
1. Ellipse. The recession cone of $2x_1^2 + 3x_2^2 \le 5$ is $\{0_2\}$.
2. Parabola. The recession cone of $x_2 \ge x_1^2$ is the closed ray containing $(0, 1)$.
3. Hyperbola. The recession cone of $x_1 x_2 \ge 1,\ x_1 \ge 0,\ x_2 \ge 0$ is the first quadrant $\mathbb{R}^2_+$.

At this point, it is recommended that you find all the recession vectors of some simple convex subsets of the line and the plane, such as in Exercise 27. The following result describes all conifications of a given nonempty convex set $A \subseteq X$.

Proposition 1.5.7 Let $A \subseteq X = \mathbb{R}^n$ be a nonempty convex set. For each conification $C \subseteq X \times \mathbb{R}_+$ of A, the intersection of C with the open upper half-space $X \times \mathbb{R}_{++}$ is the smallest convex cone containing $A \times \{1\}$,
$$\mathbb{R}_{++}(A \times \{1\}) = \{\rho(a, 1) \mid \rho \in \mathbb{R}_{++},\ a \in A\};$$
the intersection of C with the horizontal hyperplane $X \times \{0\}$ is a convex cone contained in $R_A \times \{0\}$. Conversely, each convex cone contained in $R_A \times \{0\}$ is the intersection of a unique conification of A with the horizontal hyperplane $X \times \{0\}$.

This result follows immediately from the definitions. It shows that the conifications C of a nonempty convex set $A \subseteq X = \mathbb{R}^n$ are precisely the intermediate
convex cones between the convex cones $c_{\min}(A) = \mathbb{R}_{++}(A \times \{1\})$ and $c_{\max}(A) = c_{\min}(A) \cup (R_A \times \{0\})$,
$$c_{\min}(A) \subseteq C \subseteq c_{\max}(A).$$
So $c_{\min}(A)$ and $c_{\max}(A)$ are the minimal and maximal conifications of A respectively. We will choose the minimal homogenization of A to be our default homogenization. We write $c(A) = c_{\min}(A)$ and we call $c(A)$ the homogenization of A. The recession cone $R_A$ of a convex set $A \subseteq X$ is an important concept in convex analysis. It arises here naturally from homogenization: $R_A \times \{0\}$ is the intersection of $c_{\max}(A)$, the maximal conification of A, with the horizontal hyperplane $X \times \{0\}$. The main point that this book tries to make is that all concepts in convex analysis, their properties and all proofs, arise naturally from homogenization. The recession cone is the first example of this phenomenon. In Chap. 3, the recession cone will be considered in more detail.

Example 1.5.8 (Minimal and Maximal Conification)
1. Consider the convex set $A = (-7, +\infty)$. This is the solution set of the inequality $x_1 > -7$. Homogenization of this inequality gives $x_1 > -7x_2$, that is, $x_2 > -\tfrac{1}{7} x_1$. So $c_{\min}(A) = \{x \in \mathbb{R}^2 \mid x_2 > -\tfrac{1}{7} x_1,\ x_2 > 0\}$. One has $R_A = [0, +\infty)$. It follows that $c_{\max}(A) = \{x \in \mathbb{R}^2 \mid x_2 > -\tfrac{1}{7} x_1,\ x_2 \ge 0\} \cup \{0_2\}$, where the origin has to be added separately because the strict inequality excludes it.
2. Here is a numerical illustration of the minimal and maximal homogenization for the three convex sets in Fig. 1.11 with a convenient choice of coordinates.
• Ellipse. Let the convex set A be the solution set of $x_1^2 + 5x_2^2 - 1 \le 0$. Then the recession cone is $R_A = \{0_2\}$. The minimal homogenization $c(A) = c_{\min}(A)$ is the solution set of $x_1^2 + 5x_2^2 - x_3^2 \le 0,\ x_3 > 0$. The maximal homogenization $c_{\max}(A)$ is the solution set of $x_1^2 + 5x_2^2 - x_3^2 \le 0,\ x_3 \ge 0$.
• Parabola. Let the convex set A be the solution set of $x_2 \ge x_1^2$. Then the recession cone $R_A$ is the solution set of $x_1 = 0,\ x_2 \ge 0$. The minimal homogenization $c(A) = c_{\min}(A)$ is the solution set of $x_2 x_3 \ge x_1^2,\ x_3 > 0$. The maximal homogenization $c_{\max}(A)$ is the solution set of $x_2 x_3 \ge x_1^2,\ x_2 \ge 0,\ x_3 \ge 0$; the condition $x_2 \ge 0$ is automatic for $x_3 > 0$ and is needed only on the hyperplane $x_3 = 0$.
• Hyperbola. Let the convex set A be the solution set of $x_1^2 - x_2^2 + 1 \le 0,\ x_2 \ge 0$. Then the recession cone $R_A$ is the epigraph of the absolute value function, the solution set of $x_2 \ge |x_1|$. The minimal homogenization $c(A) = c_{\min}(A)$ is the solution set of $x_1^2 - x_2^2 + x_3^2 \le 0,\ x_2 \ge 0,\ x_3 > 0$. The maximal homogenization $c_{\max}(A)$ is the solution set of $x_1^2 - x_2^2 + x_3^2 \le 0,\ x_2 \ge 0,\ x_3 \ge 0$.

These calculations illustrate also the terminology 'homogenization'. For example, the usual homogenization of the quadratic function $x_1^2 + 5x_2^2 - 1$, which occurs in the description of the first convex set above, is the function $x_1^2 + 5x_2^2 - x_3^2$, which occurs in the description of the two homogenizations of this convex set. Note that this last function is homogeneous of degree 2: when each of its arguments is multiplied by any number $t > 0$, the value of the function is multiplied by $t^2$.
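The parabola case of Example 1.5.8 can be explored numerically. The sketch below uses hypothetical helper names and the set $A = \{x \in \mathbb{R}^2 \mid x_2 \ge x_1^2\}$; it builds $c_{\max}(A)$ literally as $c_{\min}(A) \cup (R_A \times \{0\})$. The recession test only samples finitely many points and step sizes, so it can refute a candidate recession vector but cannot prove one.

```python
def in_A(x1, x2):
    """The convex set A = {x2 >= x1^2}, bounded by a parabola."""
    return x2 >= x1 * x1

def stays_in_A_on_samples(c, samples, ts=(0.5, 1.0, 10.0, 1000.0)):
    """Necessary check only: a + t*c stays in A for the sampled a in A and t >= 0."""
    return all(in_A(a[0] + t * c[0], a[1] + t * c[1]) for a in samples for t in ts)

def in_cmin(x1, x2, x3):
    """Minimal conification: positive multiples of A x {1}."""
    return x3 > 0 and in_A(x1 / x3, x2 / x3)

def in_cmax(x1, x2, x3):
    """Maximal conification: c_min(A) together with R_A x {0}, R_A = {x1 = 0, x2 >= 0}."""
    return in_cmin(x1, x2, x3) or (x3 == 0 and x1 == 0 and x2 >= 0)

samples = [(0, 0), (1, 1), (-2, 5)]
print(stays_in_A_on_samples((0, 1), samples))   # vertical direction passes the check
print(stays_in_A_on_samples((1, 0), samples))   # horizontal direction leaves A
print(in_cmin(0, 1, 0), in_cmax(0, 1, 0))       # a horizontal ray only in c_max
```

The last line shows the only difference between the two conifications: points at level $x_3 = 0$.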
1.5.3 Homogenization Method
Now we formulate the unified method that will be used in this book for all tasks: the homogenization method or conification method. This method consists of three steps:
1. conify (homogenize): translate into the language of convex cones (homogeneous convex sets) by taking the conifications of all convex sets involved,
2. work in terms of convex cones, that is, homogeneous convex sets, to carry out the task,
3. deconify (dehomogenize): translate back.
1.6 Convex Sets: Visualization by Models
For visualization of convex sets A in $X = \mathbb{R}^n$, we will make use of the ray model, the hemisphere model and the top-view model for a convex set.

Example 1.6.1 (Ray Model, Hemisphere Model and Top-View Model for a Convex Set) Figure 1.12 illustrates the three models in the case of a convex set in the line $\mathbb{R}$. A convex set $A \subseteq \mathbb{R}$ is drawn; $\mathbb{R}$ is identified with the horizontal axis in the plane $\mathbb{R}^2$; the set A is lifted up to level 1; the open rays that start from the origin and that run through a point of this lifted-up set form the ray model for A; the arc that consists of the intersections of these rays with the upper half-circle forms the hemisphere model for A; the orthogonal projection of this arc onto the horizontal axis forms the top-view model for A.
Fig. 1.12 Ray, hemisphere and top-view models for a convex set
Fig. 1.13 The hemisphere model for a convex set
The three models for visualization of convex sets are the combination of two constructions that have been presented in the previous two sections: homogenization of convex sets in X, and visualization of convex cones in $X \times \mathbb{R} = \mathbb{R}^{n+1}$ lying above the horizontal hyperplane $X \times \{0\}$ by means of the ray, hemisphere and top-view model for convex cones.

Ray Model for a Convex Set Each point $x \in X = \mathbb{R}^n$ can be represented by the open ray in $X \times \mathbb{R}_{++}$ that contains the point $(x, 1)$. Thus each subset of X is modeled as a set of open rays in the open upper half-space $X \times \mathbb{R}_{++}$. This turns out to be especially fruitful for visualizing convex sets, if we want to understand their behavior at infinity. We will see this in Chap. 3. We emphasize that this is an important issue.

Hemisphere Model for a Convex Set Consider the upper hemisphere $\{(x, \rho) \mid x \in X,\ \rho \in \mathbb{R}_+,\ x_1^2 + \cdots + x_n^2 + \rho^2 = 1\}$ in $X \times \mathbb{R} = \mathbb{R}^{n+1}$. Each point x in $X = \mathbb{R}^n$ can be represented by a point in this hemisphere: the intersection of the hemisphere with the open ray that contains the point $(x, 1)$. Thus $x \in X$ is represented by the point $\|(x, 1)\|^{-1}(x, 1)$ on the hemisphere, where $\|\cdot\|$ denotes the Euclidean norm. Thus we get for each subset of X, and in particular for each convex set in X, a bijection with a subset of the hemisphere. This set has no points in common with the boundary of the hemisphere, which is the unit sphere of the horizontal hyperplane $X \times \{0\}$, a sphere in one dimension lower.

Example 1.6.2 (Hemisphere Model for a Convex Set) Figure 1.13 illustrates the hemisphere model for three closed convex sets A in the plane $\mathbb{R}^2$. The models are subsets of the upper hemisphere; these subsets are indicated by the same letter as the set that they model, by A. These models contain no points of
the circle that bounds the upper hemisphere. You see from left to right: (1) the closed convex set A is bounded and the model of A is closed, (2) the closed convex set A is unbounded and the model of A is not closed, but if we added one suitable point on the boundary of the hemisphere, then the model would become closed, (3) the closed convex set A is unbounded and the model of A is not closed, but if we added a suitable infinite set of points on the boundary of the hemisphere, then the model would become closed. The set of points that would be added in the middle and on the right in order to make the model closed, as explained above, is denoted by $R_A$. The reason for this notation is that the points of this set correspond to the rays of the recession cone $R_A$ of A.

Top-View Model for a Convex Set Now consider the standard closed unit ball in $X = \mathbb{R}^n$, $B_n = \{x \in X \mid x_1^2 + \cdots + x_n^2 \le 1\}$. Each point $(x, \rho)$ in the hemisphere above, that is, each solution of $x_1^2 + \cdots + x_n^2 + \rho^2 = 1,\ \rho \ge 0$, can be represented by a point in this ball: the point x. Note here that $(x, 0)$ is the orthogonal projection of the point $(x, \rho)$ onto the horizontal hyperplane $X \times \{0\}$. Combining this with the hemisphere model above, we get for each subset of X a bijection to a subset of the ball $B_n$, in fact, to a subset of the open ball $U_X = U_n = \{x \in X \mid \|x\| < 1\}$. Here a point $x \in X$ corresponds to the point $\|(x, 1)\|^{-1} x \in U_n$.

Example 1.6.3 (Top-View Model for a Convex Set) Figure 1.14 illustrates the top-view model for three convex sets A in the plane $\mathbb{R}^2$ that contain the origin as an interior point. The top-view models of these three convex sets A are subsets of the standard open unit disk; these are also indicated by A. In particular, these models include no points of the circle that bounds the disk. On the left, you see the model of a closed bounded A; this model is a closed set in the interior of the disk. In the middle and on the right, you see the model of a closed unbounded A.
These models are not closed. In the middle, you should add one suitable point from the circle in order to make it closed. On the right, you should add a suitable infinite set of points from the circle to make it closed. The sets of points that have to be added in order to make the model closed have again been indicated by the notation for the recession cone, by $R_A$. The top-view model for a convex set will be especially fruitful for visualizing convex sets if we want to understand their behavior at infinity. The advantage of the top-view model over the hemisphere model is that it is in one dimension lower. For example, we can visualize convex sets in three-dimensional space $\mathbb{R}^3$ as subsets of the open ball $x_1^2 + x_2^2 + x_3^2 < 1$ and convex sets in the plane $\mathbb{R}^2$ as subsets of the open disk $x_1^2 + x_2^2 < 1$.

Fig. 1.14 The top-view model for a convex set
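The two bijections of this section, from X to the open hemisphere and from X to the open unit ball, can be sketched as follows. The function names are hypothetical; `from_top_view` is the inverse map, obtained by recovering the height of the hemisphere point and rescaling.

```python
import math

def hemisphere_model(x):
    """Point (x, 1) / ||(x, 1)|| on the open upper hemisphere representing x."""
    norm = math.sqrt(sum(xi * xi for xi in x) + 1.0)
    return tuple(xi / norm for xi in x) + (1.0 / norm,)

def top_view_model(x):
    """Point x / ||(x, 1)|| in the open unit ball representing x."""
    return hemisphere_model(x)[:-1]

def from_top_view(u):
    """Inverse map: recover x from its top-view point u with ||u|| < 1."""
    height = math.sqrt(1.0 - sum(ui * ui for ui in u))  # last hemisphere coordinate
    return tuple(ui / height for ui in u)

# Points far away in X are pushed toward, but never onto, the bounding circle.
for x in [(0.0, 0.0), (3.0, -4.0), (1000.0, 0.0)]:
    u = top_view_model(x)
    print(x, u, math.sqrt(sum(ui * ui for ui in u)))
```

This makes the behavior at infinity visible: a sequence in A that escapes along a recession direction has top-view points converging to a point of the bounding circle.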
1.7 Basic Convex Sets
The definition of the concept of convex set serves, as all definitions, for academic precision only: for every concept that one considers, a definition has to be given. If we work with a concept, we will never use its definition, but instead a user-friendly calculus. For convex sets, this means that we want basic examples of convex sets as well as convexity preserving operations, rules to make new convex sets from old ones. Here are, to begin with, some basic convex sets: the three golden cones, the convex hull of a set, and the conic hull of a set.
1.7.1 The Three Golden Convex Cones
The most celebrated basic examples of convex sets are the following three 'golden' convex cones.
• The first orthant $\mathbb{R}^n_+$ is the set of points in $\mathbb{R}^n$ with nonnegative coordinates.
• The Lorentz cone $L_n$, also called the ice cream cone, is the epigraph of the Euclidean norm on $\mathbb{R}^n$, that is, the region in $\mathbb{R}^{n+1}$ above or on the graph of the Euclidean norm,
$$\{(x, y) \mid x \in \mathbb{R}^n,\ y \in \mathbb{R},\ y \ge (x_1^2 + \cdots + x_n^2)^{1/2}\}.$$
• The positive semidefinite cone $S^n_+$ is the set of positive semidefinite $n \times n$ matrices in the vector space of symmetric $n \times n$ matrices $S^n$ (a symmetric $n \times n$ matrix M is called positive semidefinite if $x^\top M x \ge 0$ for all $x \in \mathbb{R}^n$).

Note that in the third example we do not work in $\mathbb{R}^n$, but in the space $S^n$ of symmetric $n \times n$ matrices. This can be justified by identifying this space with $\mathbb{R}^m$ where $m = \tfrac{1}{2} n(n + 1)$ by stacking symmetric matrices, that is, by taking the entries on or above the main diagonal of a symmetric matrix and then writing them as one long column vector, taking the columns of the matrix from left to right and writing these entries for each column from top to bottom. The convex cone $S^n_+$ models in terms of linear algebra (and so relatively simply) a solution set of a system of nonlinear inequalities (that is, a complicated set).

Example 1.7.1 (Positive Semidefinite Cone) If we identify a symmetric $2 \times 2$ matrix B with the vector in $\mathbb{R}^3$ with coordinates $b_{11}, b_{22}, b_{12}$ respectively, then the convex cone $S^2_+$ gets identified with the convex set $x_1 x_2 - x_3^2 \ge 0,\ x_1 \ge 0,\ x_2 \ge 0$.
This is the maximal conification cmax (A) of the closed convex set A that has boundary the branch of the hyperbola x1 x2 = 1 that is contained in the first quadrant x ≥ 0.
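Membership in each of the three golden cones is easy to test in small cases. The sketch below uses hypothetical function names and treats the positive semidefinite cone only for $2 \times 2$ matrices, with the coordinates $b_{11}, b_{22}, b_{12}$ of Example 1.7.1; `quadratic_form_2x2` evaluates $x^\top B x$ so that the definition can be spot-checked directly.

```python
import math

def in_first_orthant(x):
    """All coordinates nonnegative."""
    return all(xi >= 0 for xi in x)

def in_lorentz_cone(x, y):
    """(x, y) with y at least the Euclidean norm of x."""
    return y >= math.sqrt(sum(xi * xi for xi in x))

def in_psd_cone_2x2(b11, b22, b12):
    """[[b11, b12], [b12, b22]] is positive semidefinite exactly when
    b11 >= 0, b22 >= 0 and b11*b22 - b12^2 >= 0."""
    return b11 >= 0 and b22 >= 0 and b11 * b22 - b12 * b12 >= 0

def quadratic_form_2x2(b11, b22, b12, x):
    """Value of x^T B x for the symmetric matrix B above."""
    return b11 * x[0] ** 2 + 2 * b12 * x[0] * x[1] + b22 * x[1] ** 2

print(in_first_orthant((0, 2, 5)), in_lorentz_cone((3, 4), 5))
print(in_psd_cone_2x2(2, 2, 1))
# A matrix outside the cone has a direction with negative quadratic form.
print(in_psd_cone_2x2(1, 1, 2), quadratic_form_2x2(1, 1, 2, (1, -1)))
```

The three inequalities in `in_psd_cone_2x2` are the nonlinear system that, as the text remarks, the cone $S^2_+$ expresses in the language of linear algebra.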
1.7.2 Convex Hull and Conic Hull
Here is a source of many basic examples of convex sets.

Definition 1.7.2 To each set $S \subseteq X = \mathbb{R}^n$, one can add points from X in a minimal way in order to make a convex set $\mathrm{co}(S) \subseteq X$, the convex hull of S. That is, $\mathrm{co}(S)$ is the smallest convex set containing S. It is the intersection of all convex sets containing S.

Example 1.7.3 Figure 1.15 illustrates the concept of convex hull. From left to right: the first two sets are already convex, so they are equal to their convex hull; for the other sets, taking their convex hull means respectively filling a dent, filling a hole, and making from a set consisting of two pieces a set consisting of one piece, without dents.

Here is a similar source of examples of convex cones.

Definition 1.7.4 To each set $S \subseteq X$, one can adjoin to $S \cup \{0_X\}$ points from X in a minimal way in order to make a convex cone $\mathrm{cone}(S)$, the conic hull of S. That is, $\mathrm{cone}(S)$ is the smallest convex cone containing S and the origin. It is the intersection of all convex cones containing S and the origin.

Example 1.7.5 (Conic Hull)
1. Figure 1.16 illustrates the concept of conic hull. This picture makes clear that a lot of structure can get lost if we pass from a set to its conic hull.
2. The conification of a convex set $A \subset \mathbb{R}^n$ is essentially the conic hull of $A \times \{1\}$: $c(A) = \mathrm{cone}(A \times \{1\}) \setminus \{0_{X \times \mathbb{R}}\}$.
Fig. 1.15 Convex hull
Fig. 1.16 Conic hull
Fig. 1.17 Primal description of a convex set
1.7.3 Primal Description of the Convex Hull
The convex hull can be described explicitly as follows. We need a definition.

Definition 1.7.6 A convex combination or weighted average of a finite set $S = \{s_1, \ldots, s_k\} \subseteq X = \mathbb{R}^n$ is a nonnegative linear combination of the elements of S with sum of the coefficients equal to 1,
$$\rho_1 s_1 + \cdots + \rho_k s_k \quad \text{with } \rho_i \in \mathbb{R}_+,\ 1 \le i \le k, \text{ and } \rho_1 + \cdots + \rho_k = 1.$$

Example 1.7.7 (Convex Combination) Let a, b, c be three points in $\mathbb{R}^2$ that do not lie on one line. Then the set of convex combinations of a, b, c is the triangle with vertices a, b, c.

Proposition 1.7.8 The convex hull of a set $S \subseteq X = \mathbb{R}^n$ consists of the convex combinations of finite subsets of S. In particular, a convex set is closed under taking convex combinations of a finite set of points.

This could be called a primal description, or a description from the inside, of a convex set. Each choice of a finite subset of the convex set gives an approximation from the inside: the set of all convex combinations of this subset. Figure 1.17 illustrates the primal description of a convex set.
Proposition 1.7.8 will be proved by the homogenization method. This is the first of many proofs of results for convex sets that will be given in this way. Therefore, we make explicit here how all these proofs will be organized.
1. We begin by stating a version of the result for convex cones, that is, for homogeneous convex sets. This version is obtained by conification (or homogenization) of the original statement, which is for convex sets. For brevity we only display the outcome, which is a statement for convex cones; we do not display how we got this statement from the original statement for convex sets.
2. Then we prove this convex cone version, working with convex cones.
3. Finally, we translate back by deconification (or dehomogenization).
We need a definition.
Definition 1.7.9 A conic combination of a finite set S = {s1 , . . . , sk } ⊂ X = R^n is a nonnegative linear combination of the elements of S, ρ1 s1 + · · · + ρk sk , with ρi ∈ R+ , 1 ≤ i ≤ k.
Example 1.7.10 (Conic Combination) The set of conic combinations of the standard basis e1 = (1, 0, . . . , 0), . . . , en = (0, . . . , 0, 1) is the first orthant R^n_+.
Proof of Proposition 1.7.8
1. Convex cone version. We consider the following statement: The conic hull of a set S ⊆ X consists of all conic combinations of finite subsets of S.
2. Work with convex cones. The statement above holds true. Indeed, all conic combinations of finite subsets of S lie in cone(S) (this follows immediately by induction on k), and as it is readily checked that they form a convex cone containing S, they form the conic hull of S.
3. Translate back. The statement above implies the statement of the proposition: for S ⊆ X, we have that the convex cone cone(S × {1}) consists of all elements ρ1 (s1 , 1) + · · · + ρk (sk , 1) = (ρ1 s1 + · · · + ρk sk , ρ1 + · · · + ρk ) where k ≥ 0, si ∈ S, 1 ≤ i ≤ k, ρi ∈ R+ . The convex hull of S is the deconification of this convex cone; it consists of all elements ρ1 s1 + · · · + ρk sk where k ≥ 0, si ∈ S, 1 ≤ i ≤ k, ρi ∈ R+ , ρ1 + · · · + ρk = 1, as required.
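The three steps of the homogenization method can be mimicked numerically. The sketch below is only an illustration under our own naming (conify, conic_combination and deconify are hypothetical helper names): points are lifted to height 1, a conic combination is taken, and the result is scaled back to height 1.

```python
from fractions import Fraction

def conify(point):
    """Lift s in R^n to (s, 1) in R^(n+1)."""
    return tuple(point) + (1,)

def conic_combination(coeffs, vectors):
    """Nonnegative linear combination of the given vectors (Definition 1.7.9)."""
    if any(c < 0 for c in coeffs):
        raise ValueError("coefficients must be nonnegative")
    dim = len(vectors[0])
    return tuple(sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(dim))

def deconify(vector):
    """Scale a vector with positive last coordinate to height 1 and drop
    that last coordinate."""
    last = vector[-1]
    if last <= 0:
        raise ValueError("last coordinate must be positive")
    return tuple(Fraction(x) / last for x in vector[:-1])

# Hypothetical data: three points in the plane and coefficients 2, 1, 1.
points = [(0, 0), (4, 0), (0, 4)]
lifted = conic_combination([2, 1, 1], [conify(p) for p in points])
result = deconify(lifted)
print(lifted, result)
```

Dividing by the last coordinate ρ1 + · · · + ρk turns the coefficients 2, 1, 1 into the weights 1/2, 1/4, 1/4, so the result is a convex combination of the original points.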
Now we already mention briefly something that is the main subject of Chap. 4 and that is the heart of convex analysis. For convex sets, one has not only a primal description, that is, a description from the inside, but also a so-called dual description, that is, a description from the outside. Each choice of a finite system of linear inequalities in n variables which is satisfied by all points of the convex set gives an approximation from the outside: the set of all solutions of this system. Definition 1.7.11 Figure 1.18 illustrates the dual description of a convex set. In the figure, six lines are drawn that do not intersect the convex set A. For each of them we cut away the side of the line that is disjoint from A. This gives a six-sided polygon that is an outer approximation of A.
Fig. 1.18 Dual description of a convex set (a convex set A and six lines H1 , . . . , H6 )
1.8 Convexity Preserving Operations
Now we turn to three fundamental convexity preserving operations, that is, rules for making new convex sets from old ones: Cartesian product, image, and inverse image under a linear function. We will make good use of these operations, to begin with in Chap. 2. There they will allow us to give a systematic construction of the four standard binary operations for convex sets: these make from two old convex sets a new one. In Chaps. 7 and 8, the image will be used to construct the optimal value function for a perturbation of an optimization problem.
1.8.1 Definition of Three Operations
The first operation is the Cartesian product. If we have two convex sets A ⊆ Y = R^m , B ⊆ X = R^n , then their Cartesian product is the set A × B = {(a, b) | a ∈ A, b ∈ B} in Y × X = R^m × R^n = R^(m+n) . This is a convex set.
The second operation is the image of a convex set under a linear function. Let M be an m × n matrix. This matrix determines a linear function R^n → R^m by the recipe x ↦ Mx. For each convex set A ⊆ R^n , its image under M is the set MA = {Ma | a ∈ A} in R^m . This is a convex set.
Example 1.8.1 (Image Convex Set Under Linear Function) Let M be a 2 × 2 matrix, neither of whose two columns is a scalar multiple of the other column. 1. Let A be a rectangle in the plane R^2 whose sides are parallel to the two coordinate axes, a ≤ x < b, c ≤ y < d for constants a […] 0, y = 0. It is not closed: it does not contain the origin.
A sufficient condition for preservation of closedness of a convex set A ⊆ R^n by the image under a linear function R^n → R^m determined by an m × n matrix M is that the kernel of M (the solution set of Mx = 0) and the recession cone RA of A have only the origin in common, as we will see in Chap. 3. Because of this, we will sometimes work with open convex sets instead of with closed sets: openness of convex sets is preserved by the image of a linear function R^n → R^m determined by an m × n matrix M provided this linear function is surjective, that is, provided the rows of M are linearly independent. Properness of convex sets is preserved by Cartesian products and by images under linear functions, but not by inverse images under linear functions.
Fig. 1.19 The projection of a closed convex set that is not closed
Fig. 1.20 Radon’s theorem
1.9 Radon’s Theorem
For a finite set of points in X = R^n , we will give a concept of ‘points in the middle’.
Theorem 1.9.1 (Radon’s Theorem) A finite set S ⊂ X = R^n of at least n + 2 points can be partitioned into two disjoint sets B (‘blue’) and R (‘red’) whose convex hulls intersect, co(B) ∩ co(R) ≠ ∅.
Figure 1.20 illustrates Radon’s theorem for four specific points in the plane and for five specific points in three-dimensional space. A common point of the convex hulls of B and R is called a Radon point of S. It can be considered as a point in the middle of S.
Example 1.9.2 (Radon Point)
1. Three points on the line R can be ordered s1 ≤ s2 ≤ s3 and then their unique Radon point is s2 .
2. Four points in the plane R^2 either have one point lying in the triangle formed by the other three points, or they can be partitioned into two pairs of points such that the two closed line segments with endpoints the points of a pair intersect. In both cases, there is a unique Radon point. In the first case, it is the one of these four points that lies in the triangle formed by the other three points. In the second case, it is the intersection of the two closed line segments.
Proof of Radon’s Theorem We use the homogenization method.
1. Convex cone version. We consider the following statement: A finite set S ⊂ R^(n−1) × (0, +∞) (so each element of S has positive last coordinate) of more than n elements can be partitioned into two disjoint sets B and R whose conic hulls intersect in a nonzero point, cone(B) ∩ cone(R) ≠ {0n }.
2. Work with convex cones. As #(S) > n, we can choose a nontrivial linear relation among the elements of S. As each element of S has positive last coordinate, it follows that in this nontrivial linear relation, at least one coefficient is positive and at least one is negative. We rewrite this linear relation, leaving all terms with positive coefficient on the left side of the equality sign and bringing the remaining terms to the other side, giving an equality of two conic combinations. Thus S has been partitioned into two disjoint nonempty sets B and R, and one has an equality
$\sum_{b \in B} \mu_b b = \sum_{r \in R} \nu_r r$
for suitable numbers μb > 0 ∀b ∈ B and νr ≥ 0 ∀r ∈ R. In particular, by the version of Proposition 1.7.8 for convex cones, the common value of left and right hand side is a point in cone(B) as well as a point in cone(R). That is, it is an element of the intersection cone(B) ∩ cone(R). This cannot be the zero vector: its last coordinate is $\sum_{b \in B} \mu_b b_n$ and this is a nonempty sum of products of positive numbers: bn > 0 and μb > 0 for all b ∈ B. Thus we have partitioned S into two disjoint subsets B and R whose conic hulls intersect in a nonzero point. This concludes the proof of the version of the theorem of Radon for convex cones.
3. Translate back. What we just proved (the convex cone version) implies the statement of Theorem 1.9.1. Indeed, let S be a subset of R^n of at least n + 2 points. Apply the convex cone version to the set S × {1} in R^(n+1) . This is allowed as #(S × {1}) ≥ n + 2 > dim(R^(n+1) ). This gives a partitioning of S × {1} into
two disjoint sets B × {1} and R × {1} whose conic hulls intersect in a nonzero point. That is, we have the equation
$\sum_{b \in B} \lambda_b (b, 1) = \sum_{r \in R} \mu_r (r, 1)$
for suitable λb ∈ [0, +∞), b ∈ B, μr ∈ [0, +∞), r ∈ R. Moreover, not all λb are zero (or, equivalently, not all μr are zero). This equation is equivalent to the following two equations:
$\sum_{b \in B} \lambda_b b = \sum_{r \in R} \mu_r r$,
$\sum_{b \in B} \lambda_b = \sum_{r \in R} \mu_r$.
Denote both sides of the second equation by u. We have u > 0 as all λb are nonnegative and at least one of them is nonzero. Now we divide the first equation by u. This gives that a convex combination of the set B equals a convex combination of the set R. By Proposition 1.7.8, this shows that the convex hulls of B and R intersect. This concludes the proof of the theorem.
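The proof is constructive, and its steps can be followed on a computer. The following Python sketch is one possible way to do this, not a definitive implementation: the helper names null_vector and radon_partition are our own, and the nontrivial linear relation among the lifted points (s, 1) is found by Gaussian elimination over the rational numbers.

```python
from fractions import Fraction

def null_vector(columns):
    """Return a nonzero list c with sum_j c[j] * columns[j] = 0, or None if
    the columns are linearly independent (Gaussian elimination over Q)."""
    k = len(columns)
    rows = [[Fraction(col[i]) for col in columns] for i in range(len(columns[0]))]
    pivots, r = [], 0
    for j in range(k):
        pivot = next((i for i in range(r, len(rows)) if rows[i][j] != 0), None)
        if pivot is None:
            continue  # column j is a free column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        rows[r] = [x / rows[r][j] for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][j] != 0:
                factor = rows[i][j]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        pivots.append((r, j))
        r += 1
    pivot_cols = {j for _, j in pivots}
    free = [j for j in range(k) if j not in pivot_cols]
    if not free:
        return None
    c = [Fraction(0)] * k
    c[free[0]] = Fraction(1)
    for i, j in pivots:
        c[j] = -rows[i][free[0]]
    return c

def radon_partition(points):
    """Return (blue indices, red indices, Radon point) for points in R^n."""
    lifted = [tuple(p) + (1,) for p in points]       # homogenize: s -> (s, 1)
    c = null_vector(lifted)
    if c is None:
        raise ValueError("need at least n + 2 points in R^n")
    blue = [i for i, ci in enumerate(c) if ci > 0]   # positive coefficients
    red = [i for i, ci in enumerate(c) if ci <= 0]   # the remaining points
    u = sum(c[i] for i in blue)                      # common sum of both sides
    dim = len(points[0])
    point = tuple(sum(c[i] / u * points[i][k] for i in blue) for k in range(dim))
    return blue, red, point

# Hypothetical example: the four corners of a square in the plane.
blue, red, point = radon_partition([(0, 0), (2, 0), (0, 2), (2, 2)])
print(blue, red, point)
```

For the four corners of a square, the two diagonals form the partition and their intersection is the Radon point, in agreement with the second case of Example 1.9.2.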
The proof of Radon’s theorem implies a uniqueness result.
Definition 1.9.3 A finite set of vectors in X = R^n is called affinely independent if there is no nontrivial linear relation among them with sum of the coefficients equal to 0 or, equivalently, if the differences of one of its vectors with the other ones form a linearly independent set.
As a linearly independent set in R^n can have at most n elements, this implies that the number of elements of an affinely independent set in R^n is always at most n + 1.
Example 1.9.4 (Affinely Independent) Consider subsets of R^3 : a set of two different points is always affinely independent, a set of three points is affinely independent if they do not lie on one line, a set of four points is affinely independent if they do not lie in one plane.
Here is the promised uniqueness statement.
Proposition 1.9.5 If S ⊆ X = R^n is a set of precisely n + 2 points, such that deleting any point gives an affinely independent set, then the Radon point is unique.
One application of Radon points is that they can be used to approximate center points of any finite set of points in R^n (center points are a generalization of the median to data in R^n , used in statistics and computational geometry) in an amount of time that is polynomial in both the number of points and the dimension n. However, the main use of Radon’s theorem is that it is a key step in the proof of Helly’s theorem.
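The characterization of affine independence by differences lends itself to a direct check. Here is a minimal sketch in Python, assuming exact rational arithmetic; the function names rank and affinely_independent are our own choices.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors, by Gaussian elimination over Q."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for j in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][j] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][j] / rows[r][j]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def affinely_independent(points):
    """True if the differences of the points with the first point are
    linearly independent."""
    base = points[0]
    diffs = [[x - b for x, b in zip(p, base)] for p in points[1:]]
    return rank(diffs) == len(diffs)

# Hypothetical examples in R^3.
on_a_line = [(0, 0, 0), (1, 1, 1), (2, 2, 2)]
triangle = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
in_a_plane = triangle + [(1, 1, 0)]
print(affinely_independent(on_a_line),
      affinely_independent(triangle),
      affinely_independent(in_a_plane))
```

The three sample sets correspond to the cases of Example 1.9.4: three points on one line, three points not on one line, and four points in one plane.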
1.10 Helly’s Theorem If you want to know whether a large collection of convex sets in X = Rn intersects, that is, has a common point, then you only have to check that each subcollection of n + 1 elements has a common point. Theorem 1.10.1 (Helly’s Theorem) A finite collection of more than n convex sets in X = Rn has a common point iff each subcollection of n + 1 sets has a common point. Example 1.10.2 (Helly’s Theorem) 1. Figure 1.21 illustrates Helly’s theorem for four convex sets S1 , S2 , S3 , S4 in the plane. These four sets have no common point, as you can check. Therefore, by Helly’s theorem, there should be a choice of three of these sets that has no common point. Indeed, one sees that S1 , S3 , S4 have no common point. 2. Figure 1.22 illustrates that Helly’s theorem cannot be improved: three convex subsets of the plane for which each pair has a nonempty intersection need not have a common point. Now we prove Helly’s theorem for some special cases. It is recommended that you draw some pictures for a good understanding of the arguments that will be given. Fig. 1.21 Helly’s theorem
Fig. 1.22 Helly’s theorem cannot be improved
Example 1.10.3 (Proof of Helly’s Theorem) 1. Consider three convex subsets A, B, C of the line R for which each pair intersects. Helly’s theorem states that then these three sets intersect. Let us prove this, using Radon’s theorem 1.9.1. Choose a ∈ B ∩ C, b ∈ C ∩ A, c ∈ A ∩ B. Now take the Radon point of a, b, c. We have already seen that this can be found as follows. Order the three points a, b, c. We assume that a ≤ b ≤ c as we may wlog (without loss of generality). Then b is the Radon point of a, b, c. We have a, c ∈ B, so by convexity of B, we get b ∈ B. Combining this with b ∈ C ∩ A, we get that b is a common point of A, B, C. 2. Consider four convex sets A, B, C, D in the plane R2 for which each triplet intersects. The theorem of Helly states that then these four sets intersect. Let us check this. Choose a point in the intersection of each triplet. This gives four points: a ∈ B ∩ C ∩ D, b ∈ A ∩ C ∩ D, c ∈ A ∩ B ∩ D, d ∈ A ∩ B ∩ C. Take their Radon point. We claim that this point is contained in all four convex sets A, B, C, D. We have seen that for the Radon point of four points in the plane, there are two cases to distinguish. • The Radon point is one of the four points and it is contained in the triangle formed by the other three points. We assume that this point is d, as we may wlog. Then d ∈ A ∩ B ∩ C by choice of d. Moreover, a, b, c ∈ D by choice of a, b, c. So by convexity of D, all points inside the triangle with vertices a, b, c are contained in D. In particular, D contains the Radon point d. This concludes the proof that the Radon point is contained in all four convex sets A, B, C, D. • The Radon point is the intersection of two line segments which have the four points as endpoints. We assume that one of these line segments has endpoints a and b, as we may wlog. So the other line segment has endpoints c and d. As a and b both belong to C and D, it follows from the convexity of C and D that the first line segment is contained in C and D. 
In particular, the Radon point belongs to C and D. Repeating this argument for the other line segment gives that the Radon point also belongs to A and B. This concludes the proof that the Radon point is contained in all four sets A, B, C, D. Now we extend these arguments to the general case. Proof of Helly’s Theorem We use the homogenization method. 1. Convex cone version. We consider the following statement: A finite collection Ci , i ∈ I of at least n convex cones in Rn that are contained in (Rn−1 × (0, +∞)) ∪ {0n } intersects in a nonzero point iff each subcollection of n convex cones intersects in a nonzero point. In order to prove this statement, it is convenient to reformulate it as follows: for each m ≥ n one has that each m convex cones of the collection intersect in a nonzero point. 2. Work with convex cones. We prove the reformulated statement above by induction on m. The start of the induction, m = n, holds by assumption. For the induction
step, assume that the statement holds for some m ≥ n and that J ⊆ I with #(J ) = m + 1. Choose for each j ∈ J a nonzero element cj ∈ ∩_{k ∈ J \{j}} Ck , as we may by the assumption of the induction step. Our knowledge about the chosen elements can be formulated as follows: if j, k are two different elements of J , then one has cj ∈ Ck . Now we apply the version of Radon’s theorem for convex cones to the set {cj | j ∈ J }. Then we get a partitioning of J into two disjoint sets B and R such that the conic hulls of the sets {cb | b ∈ B} and {cr | r ∈ R} intersect in some nonzero point c. To do the induction step, it suffices to show that c is a common point of the collection Cj , j ∈ J . Take, to begin with, an arbitrary j ∈ B. Then for all r ∈ R, one has r ≠ j and so one has cr ∈ Cj . So, as Cj is a convex cone, the conic hull of the set {cr | r ∈ R} is contained in the cone Cj . As this conic hull contains the point c, we get that c ∈ Cj . Repeating this argument for an arbitrary j ∈ R, we get again c ∈ Cj . Thus we have proved for all j ∈ J = B ∪ R that c ∈ Cj . This concludes the induction step and so it concludes the proof of the version of Helly’s theorem for convex cones.
3. Translate back. What we have just proved (the convex cone version) implies the statement of Theorem 1.10.1. Indeed, let Ai , i ∈ I be a collection of more than n convex sets in R^n . Apply the convex cone version to the collection of convex cones c(Ai ) ⊂ R^(n+1) , i ∈ I . This is allowed as c(Ai ) ⊆ (R^n × (0, +∞)) ∪ {0n+1 } by definition of homogenization of a convex set and #(I ) ≥ n + 1 = dim(R^(n+1) ). This gives the existence of a nonzero point in the intersection of the convex cones c(Ai ) ⊆ R^(n+1) , i ∈ I . By scaling, we may assume moreover that the last coordinate of this point is 1. As Ai × {1} = c(Ai ) ∩ (R^n × {1}) for all i ∈ I , we get that the collection Ai , i ∈ I intersects in a point.
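For n = 1 the statement of Helly’s theorem is easy to test by computer. The sketch below restricts attention to closed bounded intervals [lo, hi], which is an assumption made only to keep the code short; the function names are our own.

```python
from itertools import combinations

def intersect(intervals):
    """Common part of closed intervals (lo, hi); None if it is empty."""
    lo = max(a for a, _ in intervals)
    hi = min(b for _, b in intervals)
    return (lo, hi) if lo <= hi else None

def helly_on_line(intervals):
    """Return (every pair intersects, the whole collection intersects).
    Helly's theorem for n = 1 says that the two answers agree."""
    pairs_meet = all(intersect(pair) is not None
                     for pair in combinations(intervals, 2))
    all_meet = intersect(intervals) is not None
    return pairs_meet, all_meet

# Hypothetical collections of intervals on the line.
good = [(0, 5), (3, 8), (4, 6), (1, 9)]
bad = [(0, 1), (2, 3), (0, 3)]
print(helly_on_line(good), intersect(good))
print(helly_on_line(bad))
```

In the second collection the pair (0, 1), (2, 3) is already disjoint, so neither the hypothesis nor the conclusion holds.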
The assumption in Helly’s theorem that the collection of convex sets is finite is essential.
Example 1.10.4 (Finiteness Assumption in Helly’s Theorem)
1. The infinite collection of convex subsets [n, +∞) ⊂ R, n ∈ N has no common point, but every finite subcollection has a common point. The reason for the trouble here is that the common points of finite subcollections run away to infinity.
2. Another example where things go wrong is the infinite collection (0, 1/n) ⊂ R, n ∈ N. The reason for the trouble here is that common points of finite subcollections tend to a point that is not in the intersection, in fact, that does not belong to any of these sets: zero.
If the two reasons for failure given above are prevented, then one gets a version of the result for infinite collections. This requires the topological concept of compactness (in Chap. 3 more details about topology will be given).
Definition 1.10.5 A subset S ⊆ X = R^n is closed if it contains the limit of each sequence in S that is convergent. A subset S ⊆ X = R^n is bounded if there exists a positive number R such that |xi | ≤ R for all x ∈ S and all i ∈ {1, . . . , n}. A subset S ⊆ X = R^n is called compact if it is closed and bounded.
A compact set has the following property.
Proposition 1.10.6 A collection of closed subsets of a compact set has a nonempty intersection if each finite subcollection has a nonempty intersection.
Theorem 1.10.7 (Helly’s Theorem for Infinite Collections) An infinite collection of compact convex sets in X = R^n has a common point iff each subcollection of n + 1 sets has a common point.
Proof Assume that each subcollection of n + 1 sets intersects. By the theorem of Helly, it follows that each finite subcollection intersects. Then it follows by the compactness assumption and Proposition 1.10.6 that the entire collection intersects.
Note that the assumption of Theorem 1.10.7 can be relaxed: it is sufficient that the convex sets from the collection are closed and that at least one of them is compact.
1.11 *Applications of Helly’s Theorem
Many interesting results can be proved with unexpected ease when you observe that Helly’s theorem can be applied. We formulate a number of these results. This material is optional. At a first reading, you can just look at the statements of the results below. Some of the proofs of these results are quite challenging. At the end of the chapter, hints are given.
Proposition 1.11.1 If each three points from a set in the plane R^2 of at least three elements can be enclosed inside a unit disk, then the entire set can be enclosed inside a unit disk.
Here is a formulation in nontechnical language of the next result: if there is a spot of diameter d on a tablecloth, then we can cover it with a circular napkin of radius d/√3.
Proposition 1.11.2 (Jung’s Theorem) If the distance between any two points of a set in the plane R^2 is at most 1, then the set is contained in a disk of radius 1/√3.
We can also give a similar result about a disk contained in a given convex set in the plane.
Proposition 1.11.3 (Blaschke’s Theorem) Every bounded convex set A in the plane of width 1 contains a disk of radius 1/3 (the width is the smallest distance between parallel supporting lines; a line is called supporting if it contains a point of the set and moreover has the set entirely on one of its two sides).
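The constants 1/√3 and 1/3 in the theorems of Jung and Blaschke can be made plausible with an equilateral triangle. The following Python sketch computes the relevant quantities for one hypothetical triangle with side 1; it uses the standard facts that the width of an equilateral triangle equals its altitude and that its inscribed and circumscribed circles are centred at the centroid.

```python
import math

# Equilateral triangle with side length 1 (illustrative choice of coordinates).
a, b, c = (0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)
center = tuple(sum(p[i] for p in (a, b, c)) / 3 for i in range(2))

# Largest distance between two points of the triangle: the side length.
diameter = max(math.dist(p, q) for p, q in ((a, b), (b, c), (c, a)))
# Radius of the disk around the centroid that passes through the vertices.
circumradius = max(math.dist(center, p) for p in (a, b, c))

height = c[1]          # altitude; this is the width of the triangle
inradius = center[1]   # distance from the centroid to the side on the x-axis

print(circumradius / diameter, 1 / math.sqrt(3))
print(inradius / height, 1 / 3)
```

Both ratios agree with the constants in Propositions 1.11.2 and 1.11.3, which suggests that these constants cannot be improved.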
The next result can again be formulated in nontechnical language: if a painting gallery consists of several rooms connected with one another whose walls are completely hung with paintings and if for each three paintings there is a point from which all three can be seen, then there exists a point from which all the paintings can be seen.
Proposition 1.11.4 (Krasnoselski’s Theorem) If for each three points A, B, C of a polygon (that is, a subset of the plane that is bounded by a finite number of line segments) there is a point M such that all three line segments MA, MB, MC lie entirely inside the polygon, then there exists a point O in the interior of the polygon all of whose connecting line segments with points of the polygon lie entirely inside the polygon.
If you try to generalize the concept of median of a finite set S ⊂ R (a median of S is a real number such that the number of elements in S that are larger is equal to the number of elements in S that are smaller) to a finite set of data points S ⊂ R^n , then you come across the problem that there does not always exist a point in R^n such that each hyperplane through the point has at least (1/2) #(S) data points on each of its two sides. However, if you replace 1/2 by 1/(n + 1), then such points exist. These points are called centerpoints.
Proposition 1.11.5 (Centerpoint Theorem) For each finite set S ⊂ R^n there exists a point c ∈ R^n such that any hyperplane in R^n through c contains on each of its two sides at least #(S)/(n + 1) points.
Some interesting information about the approximation of continuous functions by polynomial functions of given degree can be obtained. We need a preparation.
Proposition 1.11.6 If for each three line segments of a finite collection of parallel line segments in the plane there exists a line that intersects them, then there exists a line that intersects all the line segments.
Now we state the promised result on approximation of functions in the simplest case.
Proposition 1.11.7 (Theorem of Chebyshev for Affine Functions) Let a continuous function f : [−1, 1] → R be given. Then the following conditions on an affine (linear plus constant) function y = ax + b are equivalent and there exists a unique such affine function.
1. The affine function y = ax + b approximates f best in the following sense: the value ε = max_{x∈[−1,1]} |f (x) − (ax + b)| is minimal.
2. There exist −1 ≤ x′ < x″ < x‴ ≤ 1 such that the absolute value of the error f (x) − (ax + b) equals ε for x = x′, x″, x‴ and that the signs of the error alternate for x = x′, x″, x‴.
To illustrate this result, consider the function x^2 on [−1, 1], and check that the unique affine function having the two equivalent properties has a = 0, b = 1/2: the error x^2 − 1/2 equals 1/2, −1/2, 1/2 at x = −1, 0, 1.
Remark This last result can be generalized to polynomials of any given degree. Then the polynomial function of degree < n that best approximates the function x ↦ x^n on [−1, 1] can be given explicitly: by the recipe x ↦ x^n − 2^(−(n−1)) cos(n arccos x).
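The recipe in the remark can be evaluated numerically. The sketch below (function names are our own) does this for n = 2, where the recipe should reduce to a constant function, and lists the error at the three points −1, 0, 1.

```python
import math

def best_approximation(n, x):
    """Value at x of x^n - 2^(-(n-1)) * cos(n * arccos x)."""
    return x ** n - 2.0 ** (-(n - 1)) * math.cos(n * math.acos(x))

def error(n, x):
    """Error x^n minus its approximation: 2^(-(n-1)) * cos(n * arccos x)."""
    return x ** n - best_approximation(n, x)

grid = [k / 10 for k in range(-10, 11)]
values = [best_approximation(2, x) for x in grid]   # approximation of x^2
errors = [error(2, x) for x in (-1.0, 0.0, 1.0)]
print(min(values), max(values))
print(errors)
```

For n = 2 the identity cos(2 arccos x) = 2x^2 − 1 shows that the recipe gives the constant function 1/2, and the error x^2 − 1/2 takes the values 1/2, −1/2, 1/2 with alternating signs, as the characterization in Proposition 1.11.7 requires.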
1.12 Carathéodory’s Theorem
We have seen that each point of the convex hull of a set S ⊆ X = R^n is a convex combination of a finite subset of S. This result can be refined.
Theorem 1.12.1 (Carathéodory’s Theorem) Each point in the convex hull of a set S ⊆ R^n is a convex combination of an affinely independent subset of S. In particular, it is a convex combination of at most n + 1 points of S.
Example 1.12.2 (Carathéodory’s Theorem) Figure 1.23 illustrates this result for a specific set of five points in the plane R^2 . Each point in the convex hull of this set is by Proposition 1.7.8 a convex combination of these five points. But by Carathéodory’s theorem it is even a convex combination of three of these points (not always the same three points). Indeed, in the figure the convex hull is split up into three triangles. Each point in the convex hull belongs to one of the triangles. Therefore, it is a convex combination of the three vertices of the triangle. This is in agreement with Carathéodory’s theorem.
Proof of Carathéodory’s Theorem We use the homogenization method.
1. Convex cone version. We consider the following statement: For a set S ⊆ X = R^n , every point in its conic hull cone(S) is a conic combination of a linearly independent subset of S.
2. Work with convex cones. Choose an arbitrary x ∈ cone(S). Let r be the minimal number for which x can be expressed as a conic combination of some finite subset T of S of r elements, #T = r. Choose such a subset T and write x as a conic combination of T . All coefficients of this conic combination must be positive by the minimality property of r. We claim that T is linearly independent. We prove this claim by contradiction. Suppose there were a nontrivial linear relation among the elements of T . We may assume that one of its coefficients is positive (by multiplying it by −1 if necessary). Then we subtract a positive multiple of this linear relation from the chosen expression of x in T . If we let the positive scalar with which we multiply increase from zero, then initially all coefficients of the new linear combination of T will remain positive. But eventually we will reach a value of this positive scalar for which at least one of the coefficients becomes zero, while all coefficients are still nonnegative. Then we would have expressed x as a conic combination of r − 1 elements. This would contradict the minimality property of r. This concludes the proof that T is linearly independent.
3. Translate back. What we have just proved (the convex cone version) implies the statement of Carathéodory’s theorem. Indeed, if we apply the result that we have just established to the set S × {1} and then dehomogenize, we get the required conclusion.
Fig. 1.23 Carathéodory’s theorem
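The argument in step 2 can be turned into a procedure: as long as the lifted points (s, 1) that carry positive weight are linearly dependent, subtract a multiple of a linear relation until a weight becomes zero. The following Python sketch illustrates this under our own naming (null_vector and caratheodory are hypothetical helpers), using exact rational arithmetic.

```python
from fractions import Fraction

def null_vector(columns):
    """Nonzero c with sum_j c[j] * columns[j] = 0, or None if the columns
    are linearly independent (Gaussian elimination over Q)."""
    k = len(columns)
    rows = [[Fraction(col[i]) for col in columns] for i in range(len(columns[0]))]
    pivots, r = [], 0
    for j in range(k):
        pivot = next((i for i in range(r, len(rows)) if rows[i][j] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        rows[r] = [x / rows[r][j] for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][j] != 0:
                factor = rows[i][j]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        pivots.append((r, j))
        r += 1
    free = [j for j in range(k) if j not in {j for _, j in pivots}]
    if not free:
        return None
    c = [Fraction(0)] * k
    c[free[0]] = Fraction(1)
    for i, j in pivots:
        c[j] = -rows[i][free[0]]
    return c

def caratheodory(weights, points):
    """Rewrite a convex combination of points in R^n as one that uses an
    affinely independent subset; returns {index: weight}."""
    weights = [Fraction(w) for w in weights]
    support = [i for i, w in enumerate(weights) if w > 0]
    while True:
        relation = null_vector([tuple(points[i]) + (1,) for i in support])
        if relation is None:          # lifted points linearly independent
            return {i: weights[i] for i in support}
        # Largest multiple of the relation that keeps all weights nonnegative.
        t = min(weights[i] / c for i, c in zip(support, relation) if c > 0)
        for i, c in zip(support, relation):
            weights[i] -= t * c
        support = [i for i in support if weights[i] > 0]

# Hypothetical data: five points in the plane with equal weights.
points = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)]
reduced = caratheodory([Fraction(1, 5)] * 5, points)
print(reduced)
```

Because the last coordinate of every lifted point is 1, the coefficients of each relation sum to zero, so the weights keep summing to 1 and the represented point does not change.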
The convex cone version of Carathéodory’s theorem, which has been established in the proof above, is worth stating separately.
Proposition 1.12.3 Every point in the conic hull of a set S ⊆ X = R^n is a conic combination of a linearly independent subset of S. In particular, it is a conic combination of at most n elements of S.
Carathéodory’s theorem has many applications. The following one is especially useful.
Corollary 1.12.4 The convex hull of a compact set S ⊆ X = R^n is closed.
In the proof, we will need the following property of the concept of compactness. Recall that a function f from a set S ⊆ X = R^n to Y = R^m is called continuous if lim_{x→a} f (x) = f (a) for all a ∈ S.
Proposition 1.12.5 The image of a compact set S ⊆ X = R^n under a continuous function S → Y , with Y = R^m , is again compact.
Here we recall from Example 1.8.2 that the image of a closed set under a linear function need not be closed.
Proof of Corollary 1.12.4 By Carathéodory’s theorem, co(S) is the image of the function {(α1 , . . . , αn+1 ) ∈ [0, 1]^(n+1) | α1 + · · · + αn+1 = 1} × S^(n+1) → R^n , where S^(n+1) = S × · · · × S (n + 1 copies), given by the recipe (α1 , . . . , αn+1 , s1 , . . . , sn+1 ) ↦ α1 s1 + · · · + αn+1 sn+1 . The domain of this function is compact and the function is continuous, so co(S) is compact by Proposition 1.12.5. In particular, co(S) is closed.
In the special case of Corollary 1.12.4 that the set S is finite, a much stronger result holds true. It is not hard to derive the following famous theorem from Theorem 1.12.1. We need two definitions.
Definition 1.12.6 A polyhedral set in X = R^n is the intersection of a finite collection of closed halfspaces of X, that is, it is the solution set of a finite system of linear inequalities ai · x ≤ αi , i ∈ I with I a finite index set, ai ∈ X \ {0X } and αi ∈ R for all i ∈ I . A polytope is a bounded polyhedral set. A polygon is a polytope in the plane R^2 .
Polytopes can be characterized as follows. For each polytope in R^n , we let ni , 1 ≤ i ≤ k be the outward normals of unit length of its (n − 1)-dimensional boundaries and let Si , 1 ≤ i ≤ k be the corresponding (n − 1)-dimensional volumes of the boundaries.
Theorem 1.12.7 (Minkowski) For each polytope in R^n one has $\sum_{i=1}^k S_i n_i = 0$. Conversely, every set of unit vectors {n1 , . . . , nk } in R^n and every list of positive numbers Si , 1 ≤ i ≤ k for which $\sum_{i=1}^k S_i n_i = 0$ arise in this way for a unique polytope in R^n .
This result will be proved in Chap. 9.
Example 1.12.8 (Regular Polytope) A polytope in R^n is called regular if it is as symmetrical as possible. Here is a precise definition for n = 3 that can be generalized to arbitrary dimension. A polytope in R^3 has faces of dimension 0, called vertices, of dimension 1, called edges, and of dimension 2, called just faces. A flag is a triple (v, e, f ) where v is a vertex of the polytope, e is an edge containing v, and f is a face containing e. A polytope is regular if for any two flags (v, e, f ) and (v′, e′, f′) there is a symmetry of the polytope that takes v to v′, e to e′, and f to f′. A polygon is regular if all sides have the same length and all angles are equal. The number of vertices can be any number ≥ 3. In each dimension n ≥ 3, one has the following three types of regular polytopes. If you take n + 1 points all at the same distance from each other, then they form the vertices of a regular simplex. For n = 2, this is an equilateral triangle and for n = 3 this is a regular tetrahedron. The set of all points x ∈ R^n for which ‖x‖∞ = max_i |xi | ≤ 1 forms a hypercube. For n = 2 this is a square and for n = 3 this is a cube. The set of points x ∈ R^n for which ‖x‖1 = |x1 | + · · · + |xn | ≤ 1 forms an orthoplex. For n = 2 this is a square and for n = 3 this is an octahedron. Besides, there are only five more regular polytopes, two for n = 3, the dodecahedron and the icosahedron, and three for n = 4. That the given list of regular polytopes is complete can be proved for n = 3 as follows: there must be at least three faces meeting at each vertex, and the angles at that vertex must add up to less than 2π. Therefore, the only possibilities for the faces at a vertex are three, four, or five triangles, three squares, or three pentagons. These give the tetrahedron, the octahedron, the icosahedron, the cube, and the dodecahedron.
Definition 1.12.9 A polyhedral cone in X = R^n is the intersection of a finite collection of halfspaces of X that have the origin on their boundary, that is, it is the solution set of a finite system of linear inequalities ai · x ≤ 0, i ∈ I with I a finite index set, ai ∈ X \ {0X } for all i ∈ I .
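The first statement of Minkowski’s theorem (Theorem 1.12.7) is easy to check numerically for a polygon, where the boundaries are the sides and their volumes are the side lengths. The sketch below assumes that the vertices are listed in counterclockwise order; the function name and the sample polygon are our own.

```python
import math

def weighted_normal_sum(vertices):
    """For a convex polygon with vertices in counterclockwise order, return
    the sum of S_i * n_i over all sides, with S_i the length of side i and
    n_i its outward normal of unit length."""
    total_x, total_y = 0.0, 0.0
    k = len(vertices)
    for i in range(k):
        (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % k]
        ex, ey = x1 - x0, y1 - y0            # vector along the side
        length = math.hypot(ex, ey)          # S_i
        nx, ny = ey / length, -ex / length   # side rotated clockwise by 90 degrees
        total_x += length * nx
        total_y += length * ny
    return total_x, total_y

# Hypothetical convex quadrilateral, vertices in counterclockwise order.
polygon = [(0, 0), (3, 0), (4, 2), (1, 3)]
print(weighted_normal_sum(polygon))
```

In the plane the reason is transparent: S_i n_i is the side vector rotated by a quarter turn, and the side vectors of a closed polygon add up to zero.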
Example 1.12.10 (Polyhedral Cone) The first orthant R^n_+ is an example of a polyhedral cone, and so of an unbounded polyhedral set.
A polyhedral set is a closed convex set, and a polyhedral cone is a closed convex cone. Polyhedral sets are a very important and nice class of convex sets; we will come back to them in Chap. 4.
Theorem 1.12.11 (Weyl’s Theorem)
1. A convex set is a compact polyhedral set iff it is the convex hull of a finite set of points.
2. A convex cone is a polyhedral cone iff it is the conic hull of a finite set of points.
We will not display the derivation, as we will get this result in Chap. 4 as a special case of a more general result.
1.13 Preference Relations and Convex Cones
Convex cones arise naturally if you consider a preference relation for the elements of X = R^n . A preference (relation) on X is a partial ordering ⪰ on X that satisfies the following conditions:
• x ⪰ x.
• x ⪰ y, u ⪰ v ⇒ x + u ⪰ y + v.
• x ⪰ y, α ≥ 0 ⇒ αx ⪰ αy.
If ⪰ is a preference relation on X, then it can be verified that the set {x ∈ X | x ⪰ 0} is a convex cone C containing the origin. Conversely, each convex cone C ⊆ X containing the origin gives a preference relation ⪰ on X, defined by x ⪰ y iff x − y ∈ C. That is, there is a natural bijection between the set of preference relations on X and the set of convex cones in X that contain the origin. One can translate properties of a preference relation on X to its corresponding convex cone C ⊆ X. For example, ⪰ is antisymmetric (that is, x ⪰ y and y ⪰ x implies x = y) iff the convex cone C does not contain any line through the origin. A convex cone containing the origin that does not contain any line through the origin is called a pointed convex cone.
Example 1.13.1 (A Preference Relation on Space-Time) The following example of a preference relation on space-time, the four-dimensional space that is considered in special relativity theory (see Susskind and Friedman [2]), plays a key role in that theory. This explains why the convex cone y ≥ (x1^2 + · · · + xn^2 )^(1/2) is named after the physicist Lorentz. No information can travel faster than the speed of light (for convenience, we assume that this speed equals 1, by suitable choices of units of length and time). Therefore, an event e that takes place at the point (x, y, z) at time t according to some observer can influence an event e′ that takes place at the point (x′, y′, z′) at time t′ according to the same observer if and only if light that is sent at the moment t from the point (x, y, z) in the direction of the point (x′, y′, z′) reaches this point at a time ≤ t′. Then one writes e′ ⪰ e. This defines a preference relation on space-time R^4 = {(x, y, z, t)}. Its corresponding convex cone is the Lorentz cone L3 = {(x, y, z, t) | t ≥ (x^2 + y^2 + z^2 )^(1/2) }. Indeed, light that is sent from the point (0, 0, 0) at time 0 in the direction of a point (x, y, z) will reach this point at a time ≤ t iff t ≥ (x^2 + y^2 + z^2 )^(1/2) . Here are some interesting properties of this preference relation from special relativity theory. Whether one has for two given events e and e′ that e′ ⪰ e or not does not depend on the observer. Therefore, one sometimes says that e′ ⪰ e means that event e′ happens absolutely after event e. Note that it can happen for two events e and e′ that e′ happens after e according to some observer but that e happens after e′ according to another observer.
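The relation between events described in Example 1.13.1 can be written down in a few lines. In the sketch below, events are tuples (x, y, z, t), the speed of light is 1 as in the text, and the function names in_lorentz_cone and can_influence are our own.

```python
import math

def in_lorentz_cone(v):
    """Membership of v = (x, y, z, t) in the cone t >= sqrt(x^2 + y^2 + z^2)."""
    *space, t = v
    return t >= math.sqrt(sum(s * s for s in space))

def can_influence(e, e_prime):
    """True if the difference e_prime - e lies in the Lorentz cone, that is,
    if light sent from e reaches the place of e_prime no later than e_prime."""
    return in_lorentz_cone(tuple(b - a for a, b in zip(e, e_prime)))

# Hypothetical events (x, y, z, t).
origin = (0, 0, 0, 0)
reachable = (3, 4, 0, 5)      # at distance 5, five time units later
too_early = (3, 4, 0, 4)      # at distance 5, only four time units later
print(can_influence(origin, reachable), can_influence(origin, too_early))
```

The relation is not symmetric: an event in the future of the origin cannot influence the origin unless the two events coincide.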
1.14 *The Shapley–Folkman Lemma
You might like to do the following experiment yourself (with pencil and paper or with a computer). Choose a large number of arbitrary nonempty subsets S_1, . . . , S_m of the plane R^2, and make a drawing of their Minkowski sum S = S_1 + · · · + S_m = {s_1 + · · · + s_m | s_i ∈ S_i, 1 ≤ i ≤ m}. That is, S is the set that you get by picking in all possible ways one element from each set and then taking the vector sum. You might expect that S can be any set in the plane. However, it has a very special property: each point in the convex hull of S lies close to a point of S.
Example 1.14.1 (Shapley–Folkman Lemma) Figures 1.24, 1.25 and 1.26 illustrate this experiment. Figure 1.24 shows a choice of sets: ten sets, each consisting of two antipodal points. Figure 1.25 shows the scaled outcome of the experiment, the Minkowski sum of the ten sets. Figure 1.26 also gives the convex hull of the Minkowski sum.
This empirical phenomenon can be formulated as a precise result (see Starr [3]). This holds not only for the plane, but more generally in each dimension. To begin with, note the easy property that co(S) = co(S_1) + · · · + co(S_m) for sets S_1, . . . , S_m ⊆ X = R^n. That is, each x ∈ co(S) has a representation $x = \sum_{i=1}^{m} x_i$ such that x_i ∈ co(S_i) for all i. Now we give a stronger version of this property, and this is the promised precise result.
Lemma 1.14.2 (Shapley–Folkman Lemma) Let S_i (i = 1, . . . , m) be m ≥ n nonempty subsets of R^n. Write $S = \sum_{i=1}^{m} S_i$ for their Minkowski sum. Then each x ∈ co(S) has a representation $x = \sum_{i=1}^{m} x_i$ such that x_i ∈ co(S_i) for all i, and x_i ∈ S_i for at least (m − n) indices i.
Fig. 1.24 Ten sets of two antipodal points
Fig. 1.25 The Minkowski sum of the ten sets (scaled)
Fig. 1.26 The Minkowski sum of the ten sets compared to its convex hull
This result indeed implies the empirical observation above. For example, if each set S_i, 1 ≤ i ≤ m, is contained in a ball of radius r, then the Shapley–Folkman lemma implies that each point in co(S) lies within distance 2nr of a point in S. In a more striking formulation, each point in the convex hull of the average sum $\frac{1}{m}(S_1 + \cdots + S_m)$ lies within distance 2nr/m of a point of this average sum; note that 2nr/m tends to zero if m → ∞.
Example 1.14.3 (Shapley–Folkman Lemma for Many Identical Sets) If we take averages $S, \frac{1}{2}(S + S), \frac{1}{3}(S + S + S), \ldots$ for an arbitrary bounded set S ⊂ R^n, then these averages converge to the convex hull of S in the sense that the set of all limits of convergent sequences s_1, s_2, . . . with s_k in the kth average for all k is equal to the convex hull of S.
The Shapley–Folkman lemma has many applications in economic theory, in optimization and in probability theory.
**I cannot resist the temptation to say something here about the light that the Shapley–Folkman lemma sheds on the intriguing and deep question why the necessary conditions for optimal control problems, Pontryagin’s Maximum Principle, one of the highlights of optimization theory in the twentieth century, are as strong as if these problems were convex, which they are not. In passing, it is interesting that such strong necessary conditions are not (yet) known for optimal control problems where functions of several variables have to be chosen in an optimal way. By taking a limit, one can derive from the Shapley–Folkman lemma the result that the range of an atomless vector measure is convex. This result plays a key role in the proof of Pontryagin’s Maximum Principle; it is the ingredient that makes these necessary conditions so strong.
If you are interested in a simple proof of the Shapley–Folkman lemma, here is one.
**The following proof is due to Lin Zhou (see [4]).
Proof Choose an arbitrary x ∈ co(S). Write it as $x = \sum_{i=1}^{m} y_i$ with y_i ∈ co(S_i) for all i. Then write, for each i, $y_i = \sum_{j=1}^{l_i} a_{ij} y_{ij}$, in which a_{ij} > 0 for all j, $\sum_{j=1}^{l_i} a_{ij} = 1$, and y_{ij} ∈ S_i for all j. Construct the following auxiliary vectors in R^{m+n}:
$$z = (x^\top, 1, 1, \ldots, 1), \quad z_{1j} = (y_{1j}^\top, 1, 0, \ldots, 0), \quad \ldots, \quad z_{mj} = (y_{mj}^\top, 0, 0, \ldots, 1).$$
Then we have $z = \sum_{i=1}^{m} \sum_{j=1}^{l_i} a_{ij} z_{ij}$. Now we apply the convex cone version of Carathéodory’s theorem, Proposition 1.12.3. It follows that z can be expressed as $z = \sum_{i=1}^{m} \sum_{j=1}^{l_i} b_{ij} z_{ij}$, in which one has b_{ij} ≥ 0 for all pairs (i, j) and b_{ij} > 0 for at most (m + n) pairs (i, j). This amounts to:
$$x = \sum_{i=1}^{m} \sum_{j=1}^{l_i} b_{ij} y_{ij} \quad \text{and} \quad \sum_{j=1}^{l_i} b_{ij} = 1 \ \text{for all } i.$$
Now write $x_i = \sum_{j=1}^{l_i} b_{ij} y_{ij}$ for all i. Then $x = \sum_{i=1}^{m} x_i$ and x_i ∈ co(S_i) for each i. As the number of pairs (i, j) for which b_{ij} > 0 is at most m + n and as for each i one has b_{ij} > 0 for at least one j, there are at most n indices i that have b_{ij} > 0 for more than one j. Hence x_i ∈ S_i for at least (m − n) indices i.
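The experiment of Example 1.14.1 and the distance bound 2nr can be tried out with a short Python sketch. It is only an illustration: the ten pairs of antipodal points are a hypothetical random choice with a fixed seed, and the point of co(S) that is tested is the origin, which lies in co(S) because it is the sum of the midpoints of the ten sets.

```python
import itertools
import math
import random

random.seed(0)   # fixed seed so that the run is reproducible
n, m = 2, 10     # dimension of the plane and number of sets

# Ten sets S_i = {p_i, -p_i} of two antipodal points (hypothetical random choice).
points = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(m)]

# Every S_i lies in the ball of radius r around the origin.
r = max(math.hypot(px, py) for px, py in points)

# Minkowski sum S: pick one point from each set in all 2^m ways and add them up.
minkowski_sum = [
    (sum(s * p[0] for s, p in zip(signs, points)),
     sum(s * p[1] for s, p in zip(signs, points)))
    for signs in itertools.product((1, -1), repeat=m)
]

# Distance from the origin (a point of co(S)) to the nearest point of S.
dist = min(math.hypot(x, y) for x, y in minkowski_sum)
bound = 2 * n * r
print(f"nearest point of S at distance {dist:.4f}, bound 2nr = {bound:.4f}")
```

In practice the observed distance is usually much smaller than 2nr; the bound only uses the fact that at most n of the summands x_i have to be replaced by points of S_i.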
1.15 Exercises
1. Make a drawing similar to Fig. 3, but now for a landscape with a straight road that stretches out to the horizon, with lampposts along the road.
2. Check that the set F that is used to model the bargaining opportunity in the Alice and Bob example is indeed a convex set.
3. Make a drawing of the pair (F, v) in the Alice and Bob example and check that this gives Fig. 1.1.
4. Show that the list of all convex subsets of the line R given in Example 1.2.3 is correct.
5. *Prove the statements about convex polygons in the plane R^2 in Example 1.2.3.
6. *Prove the statement about approximating a bounded convex set in the plane R^2 by polygons in Example 1.2.3.
7. Give a complete list of all convex cones in the plane R^2.
8. Consider the perspective function P : R^n × R_{++} → R^n given by the recipe (x, t) ↦ t^{-1}x. Show that the image of a convex set A ⊆ R^n × R_{++} under the perspective function is convex.
9. Determine for each one of the following statements whether it is correct or not.
(a) If you adjoin the origin to a convex cone that does not contain the origin, then you get a convex cone containing the origin.
(b) If you delete the origin from a convex cone that contains the origin, then you get a convex cone that does not contain the origin.
Hint: one statement is correct and one is wrong.
10. Prove the following statements, which describe the relation between convex cones that do not contain the origin and convex cones that do contain the origin.
(a) For each convex cone C ⊆ R^n that does not contain the origin, the union with the origin, C ∪ {0_n}, is a pointed convex cone.
(b) Each convex cone C ⊆ R^n that contains the origin is the Minkowski sum of a pointed convex cone D and a subspace L, C = D + L = {d + l | d ∈ D, l ∈ L}. The subspace L is uniquely determined: it is the lineality space L_C of the convex cone C, the largest subspace of R^n contained in the convex cone C, that is, L_C = C ∩ (−C).
11. In this exercise, we apply homogenization of convex sets to the analysis of a system of two linear equations in two variables. Consider a system of two equations of the form
a_{11} x_1 + a_{12} x_2 = b_1, a_{21} x_1 + a_{22} x_2 = b_2.
Assume that for each equation at least one coefficient is nonzero.
(a) How can you determine whether the system has 0, 1 or infinitely many solutions, by means of the homogenized system
a_{11} x_1 + a_{12} x_2 = b_1 x_3, a_{21} x_1 + a_{22} x_2 = b_2 x_3?
(b) Assume that the equations are not scalar multiples of each other. Show that the homogenized system has a unique (up to positive scalar multiples) solution with x_3 ≥ 0.
(c) Show that translation of this uniqueness result into geometric language gives the following statement. Consider two closed halfplanes in the upper halfspace x_3 ≥ 0 that are both bounded by lines in the horizontal plane x_3 = 0 through the origin, and that are both not contained in the horizontal plane x_3 = 0. Then their intersection is a closed ray in the upper halfspace x_3 ≥ 0.
12. Show that a subspace of R^n is a convex cone, and so in particular a convex set.
13. Let M be an m × n-matrix. Show that the subspaces {x ∈ R^n | Mx = 0} and {M^T y | y ∈ R^m} of R^n are each other’s orthogonal complement.
14. Give precise proofs for the results on subspaces from linear algebra that are listed in Sect. 1.2 as specializations of results on convex sets in Chaps. 1–4 of this book. In particular, show that for subspaces L_1, L_2 ⊆ R^n, one has (L_1 + L_2)^⊥ = L_1^⊥ ∩ L_2^⊥ and (L_1 ∩ L_2)^⊥ = L_1^⊥ + L_2^⊥, and show that the maps Δ_X and +_X are each other’s transpose.
15. *Let A be a bounded convex set in the plane R^2. Show that there exists an inscribed rectangle r in A and a homothetic copy of r, R = {αx + β | x ∈ r} for suitable α > 0 and β, that is circumscribed and such that the homothety ratio α is at most 2 and moreover
$\frac{1}{2}\,\mathrm{area}(R) \le \mathrm{area}(A) \le 2\,\mathrm{area}(r)$.
16. Show that convex sets in the line R are precisely all intervals of the following forms: ∅, (a, b), (a, b], [a, b), [a, b], (a, +∞), [a, +∞), (−∞, b), (−∞, b], (−∞, +∞), where a, b ∈ R, a < b.
17. Show that the solution set of the inequality f(x) ≤ 0 is a convex set if f is the simplest nontrivial type of function f : R^n → R, an affine function x ↦ b · x + β with b ∈ R^n \ {0_n} and β ∈ R.
18. Show that the solution set of the inequality f(x) ≤ 0 is a convex set if f is the second simplest nontrivial type of function f : R^n → R, a quadratic function x ↦ x^T Bx + b^T x + β with B a symmetric n × n-matrix, b ∈ R^n, β ∈ R, provided B is positive semidefinite.
19. Show that a circle, an ellipse, a parabola and a branch of a hyperbola are all boundaries of convex sets.
20. Give a relation between the convex hull of a set S ⊆ R^n and the conic hull of the set S × {1}.
21. Show that the intersection of any collection of convex sets (respectively convex cones) is again a convex set (respectively a convex cone).
22. Show that the closed convex cones C in the plane R^2 are C = ∅, C = {0_2}, C = R^2, closed rays, closed halfplanes with 0_2 on the boundary, and sets consisting of all conic combinations of two linearly independent unit vectors u and v in R^2, C = {αu + βv | α, β ∈ [0, +∞)}.
23. Show that the first orthant, the Lorentz cone and the semidefinite cone are convex cones and that these convex cones are pointed.
24. Show that the vector space of symmetric n × n-matrices can be identified with R^m where $m = \frac{1}{2}n(n + 1)$, by means of stacking.
25. Show that viewing circles, ellipses, parabolas and branches of hyperbolas as conic sections is an example of the homogenization of convex sets.
26. Make a version of Fig. 1.11 for an unbounded convex set in the plane. That is, draw the conification for such a set.
27. Determine the recession cones of some simple convex sets, such as (7, +∞), {(x, y) | x ≥ 0, y ≥ 0}, {(x, y) | x > 0, y > 0}, {(x, y) ∈ R^2 | y ≥ x^2}, and {(x, y) ∈ R^2 | xy ≥ 1, x ≥ 0, y ≥ 0}.
28. Let A ⊆ X = R^n be a nonempty convex set. Show that the largest set S in R^n for which adjoining S × {0} to the convex cone c(A) preserves the convex cone property, is the recession cone of A, S = R_A, defined to be the set of all r ∈ X for which a + tr ∈ A for all t ≥ 0 and all a ∈ A.
29. Show that the recession cone R_A of a nonempty closed convex set A ⊆ R^n is a closed convex cone.
30. Prove the statements of Proposition 1.5.7.
31. Show that the minimal conification of an open convex set is open.
32. Show that the maximal conification of a closed convex set is closed.
33. Show that the maximal conification of an open convex set is never open.
34. Show that the minimal conification of a closed convex set is never closed.
35. Let A ⊆ X = R^n be a nonempty convex set. Show that cone(A), the conic hull of A, is equal to R_+ A = {ρa | ρ ≥ 0, a ∈ A}.
36. Verify that the following three operations make new convex sets (respectively convex cones) from old ones: the Cartesian product, image and inverse image under a linear function.
37. As an exercise in the homogenization method, derive the property that the image and inverse image of a convex set under a linear function are convex sets from the analogous property for convex cones.
38. Show that the image of the closed convex set {(x, y) | x, y ≥ 0, xy ≥ 1} under the linear function R^2 → R given by the recipe (x, y) ↦ x is a convex set in R that is not closed.
39. Show that the property closedness is preserved by the following two operations: the Cartesian product and the inverse image under a linear function.
40. Show that the image MA of a closed convex set A ⊆ R^n under a linear function M : R^n → R^m is closed if the kernel of M, that is, the solution set of Mx = 0, and the recession cone of A, R_A, have only the origin in common.
41. Show that the image MA of an open convex set A ⊆ R^n under a surjective linear function M : R^n → R^m is open.
42. Show that the proof of Theorem 1.9.1 gives an algorithm for finding the sets B and R and a point in B ∩ R.
43. Show that the proof of Theorem 1.10.1 gives in principle an algorithm for finding a common point of the given collection of convex sets, provided one has a common point of each subcollection of n + 1 of these sets.
44. Show that the assumption in Theorem 1.10.7 that all convex sets of the given collection are compact can be relaxed to: all convex sets of the given collection are closed and at least one of them is bounded.
45. Make the algorithm that is implicit in the proof of Theorem 1.12.1 explicit.
46. Is the smallest closed set that contains a given convex set always convex?
47. Is the smallest convex set that contains a given closed set always closed?
48. Give a constructive proof (‘an algorithm’) without using any result for Weyl’s theorem in the case of a tetrahedron, the convex hull of an affinely independent set. To be precise, prove the following statements.
(a) Show that a tetrahedron is a bounded polyhedral set.
(b) Show that a bounded polyhedral set in R^n that is the intersection of n + 1 closed halfspaces and that has a nonempty interior is a tetrahedron.
(c) Show that the conic hull of a basis of R^n is a polyhedral cone.
(d) Show that a polyhedral cone in R^n that is the intersection of n halfspaces that have the origin on their boundary and that have linearly independent normals of these boundaries, is the conic hull of a basis of R^n.
49. Show that there is for a given vector space V a bijection between the preference relations ⪰ on V and the convex cones C ⊆ V containing the origin, defined as follows: ⪰ and C correspond iff x ∈ C ⇔ x ⪰ 0.
50. Show that for a corresponding convex cone C containing the origin and a preference relation ⪰, one has that C is pointed iff ⪰ is asymmetric.
51. Consider the relation ⪰ on space-time R^4 defined by (x′, y′, z′, t′) ⪰ (x, y, z, t) iff a point that departs from position (x, y, z) at time t and travels in a straight line with speed 1 in the direction of position (x′, y′, z′) reaches position (x′, y′, z′) not later than at time t′.
(a) Show that ⪰ is an asymmetric preference relation.
(b) Show that (x′, y′, z′, t′) ⪰ (x, y, z, t) implies t′ ≥ t, but that the converse implication does not hold.
(c) Show that the corresponding convex cone is the Lorentz cone L_3.
52. Assume that u, u_1, . . . , u_p are vectors in R^q. Show that if u is a conic combination of {u_1, . . . , u_p}, then it can be written as a conic combination of no more than q vectors from {u_1, . . . , u_p}.
53. Prove the following equality for m nonempty subsets S_1, . . . , S_m in R^n, where we have written S = S_1 + · · · + S_m = {s_1 + · · · + s_m | s_i ∈ S_i, 1 ≤ i ≤ m}: co(S) = co(S_1) + · · · + co(S_m).
54. Show that if we take averages $S, \frac{1}{2}(S + S), \frac{1}{3}(S + S + S), \ldots$ for a bounded set S ⊂ R^n, then these averages converge to the convex hull of S in the sense that the set of all limits of convergent sequences s_1, s_2, . . . with s_k in the kth average for all k is equal to the convex hull of S.
1.16 Hints for Applications of Helly’s Theorem
Here are hints for the applications of the theorem of Helly that are formulated in Sect. 1.11.
• Proposition 1.11.1. Consider the collection of unit disks with center an element of the set.
• Proposition 1.11.2. Show that each three points of the set are included in a disk of radius 1/√3.
• Proposition 1.11.3. Let l run over the set L of supporting lines of A, define for each l ∈ L the set A_l to consist of all points in A of distance to l at most 1/3 and then consider the collection of sets A_l, l ∈ L.
• Proposition 1.11.4. Consider the collection of closed halfplanes in the plane which contain on their boundary a side of the polygon and which also contain all points of the polygon close to this side.
• Proposition 1.11.5. Consider the collection of convex hulls of subsets of S of at least nm/(n + 1) points, where m = #(S).
• Proposition 1.11.6. Let l run over the set of parallel line segments L, define for each l ∈ L the set A_l consisting of all lines that intersect l and then consider the collection A_l, l ∈ L.
References
1. J. Nash, The bargaining problem. Econometrica 18(2), 155–162 (1950)
2. L. Susskind, A. Friedman, Special Relativity and Classical Field Theory: The Theoretical Minimum (Basic Books, New York, 2017)
3. R.M. Starr, Quasi-equilibria in markets with non-convex preferences (Appendix 2: The Shapley–Folkman theorem, pp. 35–37). Econometrica 37(1), 25–38 (1969)
4. L. Zhou, A simple proof of the Shapley–Folkman theorem. Econ. Theory 3, 371–372 (1993)
Chapter 2
Convex Sets: Binary Operations
Abstract
• Why. You need the Minkowski sum and the convex hull of the union, two binary operations on convex sets, to solve optimization problems. A convex function f has a minimum at a given point x̂ iff the origin lies in the subdifferential ∂f(x̂), a certain convex set. Now suppose that f is made from two convex functions f_1, f_2 with known subdifferentials at x̂, by one of the following two binary operations for convex functions: either the pointwise sum (f = f_1 + f_2) or the pointwise maximum (f = max(f_1, f_2)), and, in the latter case, f_1(x̂) = f_2(x̂). Then ∂f(x̂) can be calculated, and so it can be checked whether it contains the origin or not: ∂(f_1 + f_2)(x̂) is equal to the Minkowski sum of the convex sets ∂f_1(x̂) and ∂f_2(x̂), and ∂(max(f_1, f_2))(x̂) is equal to the convex hull of the union of ∂f_1(x̂) and ∂f_2(x̂) in the difficult case that f_1(x̂) = f_2(x̂).
• What. The four standard binary operations for convex sets are the Minkowski sum, the intersection, the convex hull of the union and the inverse sum. The defining formulas for these binary operations look completely different, but they can all be generated in exactly the same systematic way by a reduction to convex cones (‘homogenization’). These binary operations preserve closedness for polyhedral sets but not for arbitrary convex sets.
Road Map
• Section 2.1.1 (motivation binary operations convex sets).
• Section 2.1.2 (crash course in working coordinate-free).
• Section 2.2 (construction binary operations +, ∩ for convex cones by homogenization; the fact that the sum map and the diagonal map are each other’s transpose).
• Section 2.3 (construction of one binary operation for convex sets by homogenization).
• Proposition 2.4.1 and Fig. 2.1 (list for the construction of the standard binary operations +, ∩, co∪, # for convex sets by homogenization; note that not all of these preserve the nice properties closedness and properness).
© Springer Nature Switzerland AG 2020 J. Brinkhuis, Convex Analysis for Optimization, Graduate Texts in Operations Research, https://doi.org/10.1007/9783030418045_2
2.1 *Motivation and Preparation
2.1.1 *Why Binary Operations for Convex Sets Are Needed
Binary operations on convex sets were already considered a long time ago, for example in the classic work [1]. In [2] and later in [3], attempts were made to construct complete lists of binary operations on various types of convex objects. This plan was realized using the method presented in this chapter, by homogenization, in [4] and [5].
A driving idea behind convex analysis is the parallel with differential analysis. Let us apply this idea to the minimization of a function f : R^n → R. If f is differentiable, then one has, at a minimum x̂, that f′(x̂) = 0: the derivative of f at x̂ is zero. If moreover f is built up from differentiable functions for which one knows the derivative at x̂, as f = f_1 + f_2 or as f = f_1 f_2, then f′(x̂) can be expressed in f_1′(x̂) and f_2′(x̂). Here are the convex parallels of these two facts. If f is convex (that is, the region above its graph is a convex set), then x̂ is a minimum iff 0 ∈ ∂f(x̂): zero lies in a certain convex set, called the subdifferential of f at x̂. If moreover f is built up from convex functions for which one knows the subdifferential at x̂, as f = f_1 + f_2 or as f = max(f_1, f_2), then ∂f(x̂) can be expressed in ∂f_1(x̂) and ∂f_2(x̂), using binary operations for convex sets. We will see in Chap. 7 that these parallel facts are used to solve optimization problems.
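The two convex calculus rules can be illustrated for functions of one variable, where subdifferentials are closed intervals. The sketch below is a hypothetical example, not taken from the text: it assumes f_1(x) = x^2 and f_2(x) = (x − 2)^2, uses the fact that the subdifferential of a differentiable convex function consists of the derivative only, and represents an interval [a, b] by the pair (a, b).

```python
def minkowski_sum(i1, i2):
    """Minkowski sum of two closed intervals: [a, b] + [c, d] = [a + c, b + d]."""
    return (i1[0] + i2[0], i1[1] + i2[1])

def hull_of_union(i1, i2):
    """Convex hull of the union of two closed intervals."""
    return (min(i1[0], i2[0]), max(i1[1], i2[1]))

def contains_zero(interval):
    return interval[0] <= 0 <= interval[1]

# Assumed example: f1(x) = x^2, f2(x) = (x - 2)^2, candidate point x_hat = 1.
x_hat = 1.0
d1 = 2 * x_hat          # f1'(x_hat) = 2
d2 = 2 * (x_hat - 2)    # f2'(x_hat) = -2
sub1, sub2 = (d1, d1), (d2, d2)   # both subdifferentials are single points

# The 'difficult case' for the maximum: both functions are active at x_hat.
assert x_hat ** 2 == (x_hat - 2) ** 2

sub_sum = minkowski_sum(sub1, sub2)   # subdifferential of f1 + f2 at x_hat
sub_max = hull_of_union(sub1, sub2)   # subdifferential of max(f1, f2) at x_hat

print(sub_sum, contains_zero(sub_sum))   # (0.0, 0.0) True
print(sub_max, contains_zero(sub_max))   # (-2.0, 2.0) True
```

In both cases zero lies in the computed set, so x̂ = 1 minimizes both f_1 + f_2 and max(f_1, f_2) in this example.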
2.1.2 *Crash Course in Working Coordinate-Free
In order to present the constructions of the binary operations for convex sets as simply as possible, we will work coordinate-free. In this section, we recall the minimal knowledge that is needed for this. More information can be found in any textbook on linear algebra. The idea is that if you work with column spaces, matrices, dot products and the transpose of a matrix, you can do this in another, more abstract, language, which does not make use of coordinates.
• Column spaces are described by finite dimensional vector spaces. These are defined as sets V that are equipped with a function V × V → V, which is denoted by (v, w) ↦ v + w and called vector addition, and with a function R × V → V, which is denoted by (α, v) ↦ αv and called scalar multiplication. In order to make sure that you can work with vector addition and scalar multiplication in the same way that you do in column spaces, you impose a number of suitable axioms on them such as α(v + w) = αv + αw. Finite dimensionality of V is ensured by requiring that there exist v_1, . . . , v_n ∈ V such that each v ∈ V can be written as α_1 v_1 + · · · + α_n v_n for suitable numbers α_1, . . . , α_n ∈ R.
The notion of finite dimensional vector space is essentially the same as the notion of column space; in particular, it is not more general. If the natural number n above is chosen to be minimal, then v_1, . . . , v_n is called a basis of V. The reason for this terminology is that each v ∈ V can be written uniquely as α_1 v_1 + · · · + α_n v_n for suitable numbers α_1, . . . , α_n ∈ R. Then you can replace v by the column vector with coordinates α_1, . . . , α_n (from top to bottom). If you do this, then you are essentially working with column space R^n with the usual vector addition and scalar multiplication. So a finite dimensional vector space is indeed essentially a column space.
• Matrices are described by linear functions between two finite dimensional vector spaces V and W. These are defined as functions L : V → W that send sums to sums, L(u + v) = L(u) + L(v), and scalar multiples to the same scalar multiples, L(αv) = αL(v). The notion of a linear function is essentially the same as the notion of a matrix. Let L : V → W be a linear function between finite dimensional vector spaces. Choose a basis v_1, . . . , v_n of V and a basis w_1, . . . , w_m of W. Then there exists a unique m × n-matrix M such that if you go over to coordinates, and so get a function R^n → R^m, then this function is given by left multiplication by the matrix M, that is, by the recipe x ↦ Mx, where Mx is matrix multiplication. So the notion of a linear function is indeed essentially the same as the notion of a matrix. Here we point out one advantage of working with linear functions over working with matrices: it reveals the deeper idea behind the matrix product, which is just the coordinate version of composition of linear functions (such a composition is of course again a linear function).
• The dot product x · y = x_1 y_1 + · · · + x_n y_n is described by an inner product on a finite dimensional vector space V.
This is defined to be a function V × V → R, denoted by (u, v) ↦ ⟨u, v⟩ = ⟨u, v⟩_V and called inner product, on which suitable axioms are imposed so that you can work with it just as you do with the dot product. The notion of an inner product is essentially the same as the notion of the dot product; in particular, it is not more general. Let ⟨·, ·⟩ be an inner product on a finite dimensional vector space V. Choose an orthonormal basis v_1, . . . , v_n of V (that is, ⟨v_i, v_j⟩ equals 0 if i ≠ j and it equals 1 if i = j). Go over to coordinates, and so get an inner product on R^n. Then this inner product is the dot product. So the notion of inner product is indeed essentially the same as the notion of the dot product. A finite dimensional vector space together with an inner product is called a finite dimensional inner product space. So a finite dimensional inner product space is essentially a column space considered with the dot product.
• The transpose of a matrix is described by the transpose L^T of a linear function L between two finite dimensional inner product spaces V and W. This is defined as the unique linear function L^T : W → V for which
⟨L^T w, v⟩_V = ⟨w, Lv⟩_W
for all v ∈ V, w ∈ W. Note that (L^T)^T = L. That is, for two linear functions L : V → W and M : W → V we have L^T = M iff M^T = L. Then L and M are said to be each other’s transpose. The notion of the transpose of a linear function is essentially the same as the notion of the transpose of a matrix. Let L : V → W be a linear function between finite dimensional inner product spaces. Choose orthonormal bases v_1, . . . , v_n of V and w_1, . . . , w_m of W. Then we go over to coordinates, and so L is turned into an m × n-matrix and L^T is turned into an n × m-matrix. Then these two matrices are each other’s transpose.
One can do everything that was done in Chap. 1 coordinate-free. For example, one can define convex sets and convex cones in vector spaces, one can take Cartesian products of convex sets, one can take the image of a convex set under a linear function between two vector spaces and one can take the inverse image of a convex set under a linear function between two vector spaces.
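Two of the statements above can be checked numerically once coordinates are chosen: composition of linear functions corresponds to the matrix product, and the transpose is characterized by its defining identity. The following sketch is only an illustration; the matrices `L` and `K`, the vectors `v` and `w`, and the helper names are hypothetical, and the inner product is the dot product on R^n.

```python
def apply(matrix, vector):
    """Apply the linear function x -> Mx, with M given as a list of rows."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def matmul(A, B):
    """Matrix product AB."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

L = [[1, 2], [0, 1], [3, -1]]   # a linear function R^2 -> R^3
K = [[2, 0, 1], [1, 1, 0]]      # a linear function R^3 -> R^2
v = [5, -7]
w = [1, 4, -2]

# Composition of linear functions is given by the matrix product KL.
print(apply(K, apply(L, v)) == apply(matmul(K, L), v))          # True

# Defining property of the transpose: <L^T w, v> = <w, L v>.
print(dot(apply(transpose(L), w), v) == dot(w, apply(L, v)))    # True
```

Integer entries are used so that both comparisons are exact.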
2.2 Binary Operations and the Functions +_X, Δ_X
The aim of this section is to construct the two binary operations for the homogeneous case, that is, for convex cones (intersection and Minkowski sum), in a systematic way: by homogenization. Moreover, it is shown that these two binary operations are closely related by means of the transpose operator. For these tasks, it is convenient to work coordinate-free. However, we recall that a finite dimensional inner product space is essentially just R^n equipped with the dot product. We consider two special linear functions for a finite dimensional inner product space X:
• the sum map +_X : X × X → X is the linear function with recipe (x, y) ↦ x + y,
• the diagonal map Δ_X : X → X × X is the linear function with recipe x ↦ (x, x).
The function +_X needs no justification: it gives the formal description of addition in X. The function Δ_X gives a formal description for the equality relation x = y on pairs (x, y) of elements of X: these pairs are of the form Δ_X(x). Moreover, we now show that these two special linear functions +_X and Δ_X are closely related: they are each other’s transpose. This fundamental property will be used later, for example in Chap. 6, to get calculus rules such as ∂(f_1 + f_2)(x, y) = ∂f_1(x, y) + ∂f_2(x, y).
Proposition 2.2.1 (Relation Sum and Diagonal Map) Let X be a finite dimensional inner product space. Then the sum map +_X and the diagonal map Δ_X are each other’s transpose.
Proof For all u, v, w ∈ X one has the following chain of equalities:
⟨u, Δ_X^T(v, w)⟩ = ⟨Δ_X(u), (v, w)⟩ = ⟨(u, u), (v, w)⟩ = ⟨u, v⟩ + ⟨u, w⟩ = ⟨u, v + w⟩ = ⟨u, +_X(v, w)⟩.
It follows that Δ_X^T = +_X as required.
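The chain of equalities in this proof can be checked on concrete vectors. The sketch below assumes X = R^3 with the dot product and the usual inner product on X × X (the sum of the inner products of the two components); the function names and the sample vectors are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sum_map(pair):
    """The sum map: (x, y) -> x + y."""
    x, y = pair
    return [a + b for a, b in zip(x, y)]

def diagonal_map(x):
    """The diagonal map: x -> (x, x)."""
    return (list(x), list(x))

def dot_pair(p, q):
    """Inner product on X x X: sum of the inner products of the components."""
    return dot(p[0], q[0]) + dot(p[1], q[1])

u, v, w = [1, -2, 3], [4, 0, -1], [2, 5, 7]
lhs = dot_pair(diagonal_map(u), (v, w))   # inner product of (u, u) with (v, w)
rhs = dot(u, sum_map((v, w)))             # inner product of u with v + w
print(lhs, rhs)   # 14 14
```

Both sides agree for every choice of u, v, w, which is exactly the statement that the sum map is the transpose of the diagonal map.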
Now we construct binary operations on convex cones. For two convex cones C, D ⊆ X, we take, to begin with, the Cartesian product C × D, which is a convex cone in X × X. Now we use the two operations by linear functions on convex sets: image and inverse image. We take (C × D)Δ_X, the inverse image of C × D under the diagonal map Δ_X. This is a convex cone in X. Note that this is the intersection C ∩ D of C and D: (C × D)Δ_X = C ∩ D. We take +_X(C × D), the image of C × D under the sum map +_X. This is also a convex cone in X. Note that this is the Minkowski sum C + D of C and D: +_X(C × D) = C + D. Thus we have constructed the two standard binary operations (C, D) ↦ C ◦ D on convex cones, intersection and Minkowski sum respectively, by taking for C ◦ D the outcome of the following two operations on the subset C × D of X × X: ‘inverse image under the diagonal map’ and ‘image under the sum map’ respectively.
Conclusion A systematic construction (by homogenization) gives the two standard binary operations for convex cones, intersection and Minkowski sum. This makes clear that these binary operations are related by the transpose operator.
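The two identities for the inverse image under the diagonal map and the image under the sum map are purely set-theoretic, so they can be tried out on small finite sets. The sketch below is an illustration under that assumption: the sets `C` and `D` are hypothetical finite stand-ins in the plane and are not convex cones, and the inverse image is computed only over a finite list of candidate points.

```python
from itertools import product

def image_under_sum(pairs):
    """Image of a subset of X x X under the sum map (x, y) -> x + y."""
    return {tuple(a + b for a, b in zip(x, y)) for x, y in pairs}

def inverse_image_under_diagonal(pairs, candidates):
    """All candidate points x for which (x, x) lies in the given subset of X x X."""
    return {x for x in candidates if (x, x) in pairs}

C = {(0, 0), (1, 0), (2, 0)}
D = {(0, 0), (1, 0), (0, 1)}
CxD = set(product(C, D))   # Cartesian product C x D

intersection = inverse_image_under_diagonal(CxD, C | D)
minkowski = image_under_sum(CxD)

print(intersection == (C & D))   # True
print(sorted(minkowski))
```

Restricting the candidates to C ∪ D loses nothing here, because a point x with (x, x) ∈ C × D must lie in both C and D.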
2.3 Construction of a Binary Operation
The aim of this section is to give a systematic construction for binary operations for convex sets: by the homogenization method. We repeat once again that this means that we begin by working with convex cones and then we dehomogenize. The skill that is demonstrated in this section is somewhat technical. You can master it by reading the explanation of the construction of binary operations below to get the idea, then studying the example carefully to understand the details, then repeating the calculation from the example yourself, and finally constructing one of the other three binary operations in the same way.
The precise way to construct a binary operation on convex sets by homogenization is as follows. We take the space that contains the homogenizations of convex sets in a finite dimensional vector space X (we recall that X is essentially R^n): this space is the product space X × R. Then we make, for each of the two factors X and R of this space, a choice between + and −. Let the choice for X be ε_1 and let the choice for R be ε_2. This gives 2^2 = 4 possibilities for (ε_1, ε_2): (++), (+−), (−+), (−−). Each choice (ε_1, ε_2) will lead to a binary operation ◦_{(ε_1,ε_2)} on convex sets. Let (ε_1, ε_2) be one of the four choices, and let A and B be two ‘old’ convex sets in X. We use the homogenization method to construct a ‘new’ convex set A ◦_{(ε_1,ε_2)} B in X.
1. Homogenize. We conify A and B. This gives the conifications c(A) and c(B) respectively. These are convex cones in X × R_{++}.
2. Work with homogeneous convex sets, that is, with convex cones. We are going to associate to the old convex cones C = c(A) and D = c(B) a new convex cone E that is contained in X × R_{++}. We take the Cartesian product C × D. This is a convex cone in X × R_{++} × X × R_{++}. We rearrange its factors: (X × X) × (R_{++} × R_{++}). Now we view C × D as a subset of (X × X) × (R_{++} × R_{++}). Then we act on C × D as follows, using the pair (ε_1, ε_2) that we have chosen. We define the promised new convex cone E to be the outcome of applying simultaneously two operations on C × D: the first operation is taking the image under the mapping +_X if ε_1 = +, but it is taking the inverse image under the mapping Δ_X if ε_1 = −; the second operation is taking the image under the mapping +_R if ε_2 = +, but it is taking the inverse image under the mapping Δ_R if ε_2 = −. Here we recall the domain and range where we consider these mappings:
+_X : X × X → X, Δ_X : X → X × X, +_R : R_{++} × R_{++} → R_{++}, Δ_R : R_{++} → R_{++} × R_{++}.
The outcome E is a convex cone in X × R_{++}.
3. Dehomogenize. We take the dehomogenization of E to be A ◦_{(ε_1,ε_2)} B.
Example 2.3.1 (The Binary Operation ◦_{(+−)} for Convex Sets) We will only do one of the four possibilities: as a specific example, we choose + for the factor X and we choose − for the factor R. Then we form the pair (+−).
This pair will determine a binary operation (A, B) → A ◦(+−) B on convex sets in X. At this point, it remains to be seen which binary operation we will get.
To explain what this binary operation is, let A and B be two ‘old’ convex sets in X. We have to make a new convex set A ◦_(+−) B in X from A and B. We use the homogenization method.

1. Homogenize. We conify A and B. This gives the conifications c(A) and c(B) respectively. These are convex cones in X × R_++.
2. Work with homogeneous convex sets, that is, with convex cones. We want to associate to the old convex cones C = c(A) and D = c(B) a new convex cone E that is contained in X × R_++. We take the Cartesian product C × D. This is a convex cone in X × R_++ × X × R_++. We rearrange its factors: (X × X) × (R_++ × R_++). Now we view C × D as a subset of (X × X) × (R_++ × R_++). Then we act on C × D as follows, using the pair (+−) that we have chosen. We define the promised new convex cone E to be the outcome of taking simultaneously the image under +_X (as the pair (+−) starts with +) and the inverse image under Δ_R (as the pair (+−) ends with −). Then E is a convex cone in X × R_++. Explicitly, E consists of all elements (z, γ) ∈ X × R_++ for which there exists an element ((x, α), (y, β)) ∈ C × D such that z = +_X(x, y) and Δ_R(γ) = (α, β), that is, z = x + y and α = β = γ. In passing, we note that this explicit description of E confirms something that we have already observed: that E, just as C and D, is a convex cone that is contained in X × R_++. Now we use that C = c(A) and D = c(B). It follows that E consists of all (z, γ) ∈ X × R_++ for which there exists an element (s(a, 1), t(b, 1)) = (sa, s, tb, t) ∈ c(A) × c(B) for which z = sa + tb and γ = s = t. Thus E consists of the pairs (γ(a + b), γ) with γ > 0, a ∈ A, b ∈ B. That is, E = c(A + B), the conification or homogenization of the Minkowski sum A + B.
3. Dehomogenize. The deconification or dehomogenization of E = c(A + B) is A + B.

Therefore, the choice of (+−) leads to the binary operation Minkowski sum, A ◦_(+−) B = A + B for all convex sets A, B ⊆ X = R^n. Now it is recommended that you repeat the construction for one of the other three possibilities.

Conclusion We have seen that for each one of the 2^2 = 4 possibilities of choosing for each one of the two factors of X × R between + and −, we get by means of the homogenization method a binary operation for convex sets in X. For the choice + for X and − for R, this gives just the well-known binary operation Minkowski sum (A, B) → A + B.
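As a numerical sanity check of Example 2.3.1, the following minimal sketch carries out the three steps for X = R and two closed intervals. The function names in_E and in_result, the restriction to intervals, and the sample values are assumptions made for illustration only; they do not come from the text.

```python
from fractions import Fraction

def in_E(z, gamma, A, B):
    """Test whether (z, gamma) lies in E for the pair (+-).

    A and B are closed intervals given as (lo, hi) pairs. (z, gamma) lies in E
    iff gamma > 0 and there are x, y with x + y = z, (x, gamma) in c(A) and
    (y, gamma) in c(B); the two heights agree because we take the inverse
    image under the diagonal map on R.
    """
    if gamma <= 0:
        return False
    (a_lo, a_hi), (b_lo, b_hi) = A, B
    # (x, gamma) in c(A)      <=>  a_lo*gamma <= x <= a_hi*gamma
    # (z - x, gamma) in c(B)  <=>  z - b_hi*gamma <= x <= z - b_lo*gamma
    x_min = max(a_lo * gamma, z - b_hi * gamma)
    x_max = min(a_hi * gamma, z - b_lo * gamma)
    return x_min <= x_max

def in_result(z, A, B):
    """Dehomogenize: z lies in the constructed set iff (z, 1) lies in E."""
    return in_E(z, 1, A, B)

A, B = (1, 2), (10, 20)
grid = [Fraction(k, 4) for k in range(0, 120)]   # 0, 1/4, ..., 29.75
# Compare with the Minkowski sum [1 + 10, 2 + 20] = [11, 22].
agrees = all(in_result(z, A, B) == (11 <= z <= 22) for z in grid)
print(agrees)
```

On this grid the membership test for the constructed set coincides with membership in [11, 22], which is what E = c(A + B) predicts. Exact rational arithmetic is used so that the endpoints are tested without rounding.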
2.4 Complete List of Binary Operations

Here is a list that gives which choice leads to which binary operation if we construct the binary operations for convex sets systematically, by the homogenization method.

Proposition 2.4.1 (Systematic Construction Binary Operations for Convex Sets) The outcome of the 2^2 = 4 possibilities (±, ±) of choosing for each one of the two factors of X × R between + and − is as follows:
• (+−) gives the Minkowski sum +, defined by A + B = {a + b | a ∈ A, b ∈ B},
• (−−) gives the intersection ∩, defined by A ∩ B = {x ∈ X | x ∈ A & x ∈ B},
• (++) gives the convex hull of the union co ∪, defined by A co ∪ B = co(A ∪ B),
• (−+) gives the inverse sum #, defined by A # B = ∪_{α∈[0,1]} (αA ∩ (1 − α)B).

Example 2.4.2 (Minkowski Sum) For n = 1, the Minkowski sum of two intervals is given by [a, b) + [c, d) = [a + c, b + d). For n = 2, the Minkowski sum of two rectangles whose sides are parallel to the coordinate axes is given by
([a_1, b_1) × [a_2, b_2)) + ([c_1, d_1) × [c_2, d_2)) = [a_1 + c_1, b_1 + d_1) × [a_2 + c_2, b_2 + d_2).
This can be used to compute the Minkowski sum of two arbitrary convex sets in R^2 by approximating these by a disjoint union of such rectangles and then using that if two convex sets A, B in R^2 are written as the disjoint union of convex sets, A = ∪_{i∈I} A_i and B = ∪_{j∈J} B_j, then A + B = ∪_{i∈I, j∈J} (A_i + B_j). For general n, one can proceed in a similar way as for n = 2.

Example 2.4.3 (Convex Hull of the Union) Figure 2.1 illustrates the convex hull of the union. It shows that if we want to determine the convex hull of the union of two convex sets A, B analytically in the interesting case that A ⊈ B and B ⊈ A, then we have to determine the outer common tangents to A and B and the points of tangency.

Example 2.4.4 (Inverse Sum) Let A, B ⊆ R^n be convex sets. Then 0_n ∈ A # B iff 0_n ∈ A or 0_n ∈ B. For a nonzero vector v ∈ R^n, v ∈ A # B iff it lies in the inverse sum of the following two convex subsets of the open ray R generated by v: A ∩ R and B ∩ R. It remains to determine the inverse sum of two intervals in (0, +∞):
[a^(−1), b^(−1)) # [c^(−1), d^(−1)) = [(a + c)^(−1), (b + d)^(−1))
for all a > b > 0 and c > d > 0. This formula explains the terminology inverse sum.
Fig. 2.1 Convex hull of the union (two convex sets A and B and the set A co ∪ B)
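The interval formulas of Examples 2.4.2 and 2.4.4 can be checked with a few lines of code. The sketch below is only an illustration under stated assumptions: intervals are half-open and represented as (left, right) pairs, the inverse sum is restricted to intervals inside (0, +∞), and all function names are hypothetical.

```python
from fractions import Fraction

def minkowski_sum(I, J):
    """[a, b) + [c, d) = [a + c, b + d); intervals are (left, right) pairs."""
    (a, b), (c, d) = I, J
    return (a + c, b + d)

def inverse_sum(I, J):
    """Closed form for the inverse sum of [p, q) and [r, s) inside (0, +inf)."""
    (p, q), (r, s) = I, J
    # Endpoints are (1/p + 1/r)^(-1) and (1/q + 1/s)^(-1).
    return (p * r / (p + r), q * s / (q + s))

def in_inverse_sum(x, I, J):
    """Membership from the definition: x in alpha*I and (1 - alpha)*J for some alpha.

    For x > 0, x in alpha*[p, q) means x/q < alpha <= x/p, and
    x in (1 - alpha)*[r, s) means 1 - x/r <= alpha < 1 - x/s.
    With p < q and r < s, such an alpha exists iff both tests below hold.
    """
    (p, q), (r, s) = I, J
    return x > 0 and x / q < 1 - x / s and 1 - x / r <= x / p

F = Fraction
# a = 4 > b = 2 and c = 5 > d = 3, so I = [1/4, 1/2) and J = [1/5, 1/3).
I, J = (F(1, 4), F(1, 2)), (F(1, 5), F(1, 3))
lo, hi = inverse_sum(I, J)
print(lo, hi)   # endpoints (a + c)^(-1) and (b + d)^(-1)
```

Here the closed form returns the endpoints 1/9 and 1/5, matching (a + c)^(−1) and (b + d)^(−1) for a = 4, b = 2, c = 5, d = 3, and the direct membership test agrees with it on sample points.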
It would be interesting if we could characterize the four standard binary operations by means of some axioms. This remains an open question. But at least we have demonstrated that, from the point of view of homogenization, the four standard binary operations look the same and form the complete list of binary operations for convex sets that can be constructed by the homogenization method.

Note that the binary operations + and co ∪ preserve properness, but that the binary operations ∩ and # do not preserve properness. The binary operation ∩ preserves closedness, but the binary operations +, co ∪ and # do not preserve closedness.

We have now four binary operations on convex sets. If we restrict them to convex cones, then they agree pairwise. Can you see how?

Conclusion The four standard binary operations on convex sets (intersection, convex hull of the union, Minkowski sum and inverse sum) are defined by completely different looking formulas, but they look exactly the same from the point of view of the homogenization method, and they form the complete list of the binary operations on convex sets that are constructed by this method.
2.5 Exercises

1. Determine, by means of a drawing, the Minkowski sum A + B of the following convex sets A, B in the plane R^2:
   (a) A = {(2, 3)} and B is the closed disk B((−1, 1), 4),
   (b) A and B are closed line segments; the endpoints of A are (2, 0) and (4, 0), the endpoints of B are (0, −1) and (0, 3),
   (c) A and B are closed line segments; the endpoints of A are (2, 3) and (7, 5), the endpoints of B are (1, 4) and (6, 8),
   (d) A is the closed line segment with endpoints (2, 3) and (4, 1) and B is the closed disk B((0, 0), 2),
   (e) A and B are closed disks: A = B((2, 1), 3) and B = B((−3, −2), 5),
   (f) A is the solution set of y ≥ e^x and B is the solution set of y ≥ e^(−x),
   (g) A is the solution set of y ≥ x^2 and B is the solution set of y ≥ e^(−x).
2. Determine, by means of a drawing, the convex hull of the union A co ∪ B of the following convex sets A, B in the plane R^2:
   (a) A = {(4, 7)} and B is the closed line segment with endpoints (2, 3) and (4, 5),
   (b) A and B are closed line segments; the endpoints of A are (−1, 0) and (2, 0), the endpoints of B are (0, −3) and (0, 4),
   (c) A and B are closed line segments; the endpoints of A are (1, 0) and (2, 0), the endpoints of B are (0, 3) and (0, 4),
   (d) A and B are closed line segments; the endpoints of A are (2, 3) and (7, 5), the endpoints of B are (1, 4) and (6, 8),
   (e) A = {(0, 0)} and B = B((4, 4), 3),
   (f) A = B((0, 0), 1) and B = B((2, 5), 3),
   (g) A is the closed line segment with endpoints (2, −1) and (2, 3) and B is the standard closed unit disk B((0, 0), 1).
3. Determine, by means of a drawing, the inverse sum A # B of some self-chosen convex sets A, B in the plane R^2.
4. Prove that the Minkowski sum of two compact convex sets is compact. How is this for the other three standard binary operations for convex sets?
5. Prove that the Minkowski sum of a compact convex set and a closed set is closed. How is this for the other three standard binary operations for convex sets?
6. Show that the Minkowski sum of two closed convex sets need not be closed. How is this for the other three standard binary operations for convex sets?
7. Show that for two functions f, g : R^n → R that are convex (that is, the region above their graph is convex), the pointwise sum f + g is also a convex function.
8. Let a > b > 0 and c > d > 0. Show that
   [a^(−1), b^(−1)) # [c^(−1), d^(−1)) = [(a + c)^(−1), (b + d)^(−1)).
   This formula explains the terminology inverse sum for the binary operation # on convex sets.
9. Give a ‘coordinate proof’ of Proposition 2.2.1: show that the matrices of +_X and Δ_X for X = R^n, equipped with the dot product, are each other's transpose.
10. Verify the two formulas (C × D)Δ_X = C ∩ D (the inverse image of C × D under Δ_X) and +_X(C × D) = C + D for convex cones C, D ⊆ X = R^n.
11. Show that each one of the two standard binary operations for convex cones ◦ = ∩, + is commutative, A ◦ B = B ◦ A, and associative, (A ◦ B) ◦ C = A ◦ (B ◦ C). Do this in two ways: by the explicit formulas for these binary operations and in terms of their systematic construction.
12. Check that the set E constructed in Sect. 2.3 is a convex cone in X × R for which E ⊆ X × R_++.
13. (a) Show that the following two binary operations for convex sets, the Minkowski sum and the convex hull of the union, are the same for convex cones.
    (b) Show that the following two binary operations for convex sets, the intersection and the inverse sum, are the same for convex cones.
14. Show that each one of the four well-known binary operations on convex sets ◦ = ∩, +, co ∪, # is commutative, A ◦ B = B ◦ A, and associative, (A ◦ B) ◦ C = A ◦ (B ◦ C). Do this in two ways: by the explicit formulas for these binary operations and in terms of their systematic construction.
15. Prove Proposition 2.4.1.
Chapter 3
Convex Sets: Topological Properties
Abstract
• Why. Convex sets have many useful special topological properties. The challenging concept of recession directions of a convex set has to be mastered: this is needed for work with unbounded convex sets. Here is an example of the use of recession directions: they can turn ‘nonexistence’ (of a bound for a convex set or of an optimal solution for a convex optimization problem) into existence (of a recession direction). This gives a certificate for nonexistence.
• What. The recession directions of a convex set can be obtained by taking the closure of its homogenization. These directions can be visualized by means of the three models as ‘points at infinity’ of a convex set, lying on ‘the horizon’. All topological properties of convex sets can be summarized by one geometrically intuitive result: if you adjoin to a convex set A its points at infinity, then the outcome always has the same ‘shape’, whether A is bounded or unbounded: it is a slightly deformed open ball (its relative interior) that is surrounded on all sides by a ‘peel’ (its relative boundary with its points at infinity adjoined). In particular, this is essentially a reduction of unbounded convex sets to the simpler bounded convex sets. This single result implies all standard topological properties of convex sets. For example, there is a very close relation between the relative interior and the closure of a convex set: each can be obtained from the other, in one direction by taking the relative interior and in the other direction by taking the closure. Another example is that the relative boundary of a convex set has a ‘continuity’ property.

Road Map
• Definition 1.5.4 and Fig. 3.1 (recession vector).
• Section 3.1 (list of topological notions).
• Proposition 3.2.1, Definition 1.5.4, Fig. 3.2 (construction of recession vectors by homogenization).
• Corollary 3.2.4 (unboundedness equivalent to existence of a recession direction).
• Figures 3.3, 3.4, 3.5, 3.6, 3.7 (visualization of recession directions as points on the horizon).
• Theorem 3.5.7 (the shape of bounded and unbounded convex sets).
• Figure 3.10 (the idea behind the continuity of the relative boundary of a convex set).
• Corollary 3.6.2 (all standard topological properties of convex sets).
• Figure 3.11 (warning that the image under a linear function of a closed convex set need not be closed).

© Springer Nature Switzerland AG 2020 J. Brinkhuis, Convex Analysis for Optimization, Graduate Texts in Operations Research, https://doi.org/10.1007/9783030418045_3
3.1 *Crash Course in Topological Notions

The study of convex sets A leads to the need to consider limits of those sequences of points in A that tend either to a boundary point of A or ‘to a recession direction of A’ (in particular, their Euclidean norms go to infinity). The most convenient way to do this is to use topological notions. In this section, we recall the minimal knowledge that is needed for this. More information can be found in any textbook on analysis or on point-set topology.

A set U ⊆ X = R^n is called open if it is the union of a collection of open balls. An open ball is the solution set U_X(a, r) = U_n(a, r) = U(a, r) of an inequality of the form ‖x − a‖ < r for some vector a ∈ R^n and some positive real number r. The sets ∅ and X are open, the intersection of a finite collection of open sets is open and the union of any collection of open sets is open. For any set S ⊆ X, the union of all open sets contained in S is called the interior of S and it is denoted by int(S). This is the largest open subset of S. Points in int(S) are called interior points of S.

A set C ⊆ X is called closed if its complement X \ C is open. The sets ∅ and X are closed, the union of a finite collection of closed sets is closed and the intersection of any collection of closed sets is closed. For any S ⊆ X, the intersection of all closed sets containing S is called the closure of S and it is denoted by cl(S). This is the smallest closed set containing S. For each set S ⊆ X, the points from the closure of S that are not contained in the interior of S are called boundary points.

A set S ⊆ X is called bounded if there exists K > 0 such that |s_i| ≤ K for all s ∈ S and all i ∈ {1, . . . , n}. Otherwise, S is called an unbounded set. A set S ⊆ X is called compact if it is closed and bounded.

An infinite sequence of points v_1, v_2, . . . in X has limit v ∈ X (notation: lim_{k→∞} v_k = v) if there exists for each ε > 0 a natural number N such that ‖v_k − v‖ < ε for all k ≥ N. Here ‖·‖ is the Euclidean norm on X given by ‖x‖ = (x_1^2 + · · · + x_n^2)^(1/2) for all x ∈ X. Then the sequence is called convergent. An alternative definition of a closed set in X is that it is a subset A of X such that for every sequence of points in A that is convergent in X, its limit is also a point in A.

Let S ⊆ X = R^n, T ⊆ Y = R^m, a ∈ S, b ∈ Y and f : S → T a function. Then one says that, for s → a, the limit of f(s) equals b (notation: lim_{s→a} f(s) = b) if there exists for each ε > 0 a number δ > 0 such that ‖f(s) − b‖ < ε for all s ∈ S for which ‖s − a‖ < δ.
A function f : S → T, where S ⊆ X and T ⊆ Y, is called continuous at a ∈ S if lim_{s→a} f(s) = f(a). The function f is called continuous if it is continuous at all a ∈ S. If a function f : S → T, where S ⊆ X = R^n and T ⊆ Y = R^m, is continuous and surjective, then the following implication holds: if S is compact, then T is compact (‘compactness is a topological invariant’). This implies the following theorem of Weierstrass: if S ⊆ X = R^n is a nonempty compact set and f : S → R is a continuous function, then f assumes its global minimum, that is, there exists ŝ ∈ S for which f(s) ≥ f(ŝ) for all s ∈ S. We will also use the following result (which characterizes compact sets): each sequence in a compact set has a convergent subsequence.

One can also define topological concepts relative to a given subset S ⊆ X = R^n: the intersection of an open (resp. closed) subset of X with S is called open (resp. closed) relative to S. We will need relative topological notions in two situations.

Here is the first situation where we need relative topology in convex analysis: when we work with a convex set.

Example 3.1.1 Let A ⊆ R^3 be a convex set that is contained in a plane. Then its interior is empty and all its points are boundary points. So the concepts of interior and boundary are not very informative here. Therefore, we need a finer topological notion. This can be obtained by taking the relative topology with respect to the plane.

For a nonempty convex set A ⊆ X = R^n, it is usually natural to take topological notions on A (such as open subset of A, interior of A and boundary of A) relative to aff(A), the affine hull of A. This is defined as follows. An affine subspace of X is a subset of X of the form v + L = {v + l | l ∈ L} for a point v ∈ R^n and a subspace L ⊆ X. The affine hull of a convex set A ⊆ X is the smallest affine subspace aff(A) of X that contains A. Then we will say relatively open subset of A, relative interior ri(A) of A and relative boundary of A. Both the closure and the relative interior of a convex set are again convex sets.

Often we will avoid working with these relative notions, for the sake of simplicity. We will do this by translating the affine hull of a given convex set in such a way that it becomes a subspace, and then we replace the space X by this subspace. Then we can work with the concepts open, interior and boundary, without having to use ‘relative’ notions.

Here is the second situation where we need relative topology in convex analysis. If we work with the sphere model and the hemisphere model, we will take the topological notions on subsets of the sphere and the hemisphere relative to the sphere.
3.2 Recession Cone and Closure

The main aim of this section is to state the result that the recession cone R_A of a convex set A ⊆ X = R^n, defined in Definition 1.5.4, can be constructed by means of a closure operation. The proof will be given in the next section. Figure 3.1 illustrates the concept of recession cone of a convex set.

Proposition 3.2.1 For each nonempty closed convex set A ⊆ X = R^n, the convex cone R_A × {0} is the intersection of cl(cone(A × {1})), the closure of the conic hull of A × {1}, with the horizontal hyperplane X × {0}. (This closure equals cl(c(A)), the closure of the conification of A.)

Note that this result implies that the recession cone R_A of a nonempty closed convex set A is a closed convex cone containing the origin. Indeed, cl(cone(A × {1})) is a closed convex cone containing the origin, and so R_A × {0}, which you get from it by intersection with the horizontal hyperplane X × {0}, is also a closed convex cone containing the origin. This means that R_A is a closed convex cone containing the origin.

Example 3.2.2 (Recession Cone by Means of a Closure Operation (Geometric)) Figure 3.2 illustrates Proposition 3.2.1 for the closed half-line A = [0, +∞) ⊂ X = R.
Fig. 3.1 Recession cone of a convex set
Fig. 3.2 Recession direction as a limit
The usual definition of the recession cone gives that A has a unique recession direction, ‘go right’, described by the open ray {(x, 0) | x > 0}. The convex cone c(A) is the solution set of x ≥ 0, y > 0. Its closure is the solution set of x ≥ 0, y ≥ 0. So if you take the closure, then you add all points (x, 0) with x ≥ 0. So the direction ‘go right’ is added. Figure 3.2 can also be used to illustrate this proposition in terms of the ray model (consider the sequence of open rays that is drawn and for which the angle with the positive x-axis tends to zero; note that the positive x-axis models the recession direction) and the hemisphere model (consider the sequence of dotted points on the arc that tend to the encircled point; note that this point models the recession direction).

Example 3.2.3 (Recession Cone by Means of a Closure Operation (Analytic))
1. Ellipse. Let A ⊆ R^2 be the solution set of 2x_1^2 + 3x_2^2 ≤ 5 (a convex set in the plane whose boundary is an ellipse). The (minimal) conification c(A) = c_min(A) ⊆ R^2 × R_++ can be obtained by replacing x_i by x_i/x_3 with x_3 > 0, for i = 1, 2, and then multiplying by x_3^2. This gives the inequality 2x_1^2 + 3x_2^2 ≤ 5x_3^2. Its solution set in R^2 × R_++ is c(A). Then the maximal conification c_max(A) can be obtained by taking the closure. This gives that c_max(A) is the solution set in R^2 × R_+ of the inequality 2x_1^2 + 3x_2^2 ≤ 5x_3^2. The recession cone R_A ⊆ R^2 can be obtained by putting x_3 = 0 in this inequality. This gives the inequality 2x_1^2 + 3x_2^2 ≤ 0. Its solution set in R^2 is {0_2}. This is R_A.
2. Parabola. Let A ⊆ R^2 be the solution set of x_2 ≥ x_1^2 (a convex set in the plane whose boundary is a parabola). We proceed as above. This gives that the (minimal) conification c(A) = c_min(A) ⊆ R^2 × R_++ is the solution set of x_2 x_3 ≥ x_1^2. The maximal conification c_max(A) can be obtained by taking the closure. One can check that this gives that c_max(A) is the solution set in R^2 × R_+ of the system x_2 x_3 ≥ x_1^2, x_2 ≥ 0. The recession cone R_A ⊆ R^2 can be obtained by putting x_3 = 0 in the first inequality. This gives the system of inequalities 0 ≥ x_1^2, x_2 ≥ 0. Its solution set in R^2 is the closed vertical ray in R^2 that points upward, {(0, ρ) | ρ ≥ 0}. This is R_A.
3. Hyperbola. Let A be the solution set of x_1 x_2 ≥ 1, x_1 ≥ 0, x_2 ≥ 0 (a convex set in the plane whose boundary is a branch of a hyperbola). We proceed again as above. This gives that the (minimal) conification c(A) = c_min(A) ⊆ R^2 × R_++ is the solution set of x_1 x_2 ≥ x_3^2, x_1 ≥ 0, x_2 ≥ 0. The maximal conification c_max(A) can be obtained by taking the closure. One can check that this gives that c_max(A) is the solution set in R^2 × R_+ of the system x_1 x_2 ≥ x_3^2, x_1 ≥ 0, x_2 ≥ 0. The recession cone R_A ⊆ R^2 can be obtained by putting x_3 = 0 in the first inequality. This gives the system of inequalities x_1 x_2 ≥ 0, x_1 ≥ 0, x_2 ≥ 0. Its solution set in R^2 is the first quadrant R^2_+. This is R_A.

Proposition 3.2.1 has the following consequence.
Corollary 3.2.4 Let A ⊆ X = R^n be a given nonempty closed convex set. Then the following conditions are equivalent:
1. A is bounded,
2. A has no recession directions,
3. c(A) ∪ {0_(n+1)} is closed.
For each nonempty closed convex set A, there is a natural bijection between the set of recession directions of A and the set of ‘horizontal’ open rays of cl(c(A)), that is, those open rays of cl(c(A)) that are not open rays of c(A).

Example 3.2.5 (Unboundedness of Nonconvex Sets) The subset of the plane R^2 that is the solution set of x^2 + 1 ≥ y ≥ x^2 is not convex; this set is unbounded, but it contains no half-lines. So the fact that unboundedness of a convex set is equivalent to the existence of a recession direction is a special property of convex sets. It does not hold for arbitrary subsets of R^n. Checking unboundedness of nonconvex sets is therefore much harder than checking boundedness of convex sets.
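The recession cones found in Example 3.2.3 and the contrast drawn in Example 3.2.5 can be spot-checked numerically. The sketch below is a hypothetical illustration, not a method from the text: it tests whether base + t·v stays in the set for a few sampled values of t, which is only a necessary condition for v to be a recession vector. All names, base points and sample directions are assumptions.

```python
def ellipse(x1, x2):      # Example 3.2.3, item 1
    return 2 * x1**2 + 3 * x2**2 <= 5

def parabola(x1, x2):     # Example 3.2.3, item 2
    return x2 >= x1**2

def hyperbola(x1, x2):    # Example 3.2.3, item 3
    return x1 * x2 >= 1 and x1 >= 0 and x2 >= 0

def strip(x1, x2):        # the nonconvex set of Example 3.2.5
    return x1**2 <= x2 <= x1**2 + 1

def stays_inside(member, base, v, times=(1, 10, 100, 10**3, 10**6)):
    """Necessary test only: is base + t*v in the set for every sampled t?"""
    return all(member(base[0] + t * v[0], base[1] + t * v[1]) for t in times)

directions = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]

# Recession cones predicted by putting x3 = 0 in Example 3.2.3.
predicted = {
    "ellipse":   lambda v: v == (0, 0),
    "parabola":  lambda v: v[0] == 0 and v[1] >= 0,
    "hyperbola": lambda v: v[0] >= 0 and v[1] >= 0,
}
sets = {"ellipse": (ellipse, (0, 0)),
        "parabola": (parabola, (0, 1)),
        "hyperbola": (hyperbola, (1, 1))}

for name, (member, base) in sets.items():
    found = [v for v in directions if stays_inside(member, base, v)]
    expected = [v for v in directions if predicted[name](v)]
    print(name, found == expected)

# The strip is unbounded, yet no sampled nonzero direction stays inside it.
print([v for v in directions if v != (0, 0) and stays_inside(strip, (0, 0), v)])
```

For the three convex sets, the sampled directions that survive are exactly those in the recession cones computed above. For the strip, the points (k, k^2) show unboundedness while none of the sampled nonzero directions survives.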
3.3 Recession Cone and Closure: Proofs

Now we prove the results from the previous section. Here is the proof of the main result.

Proof of Proposition 3.2.1 It suffices to prove the second statement as both the recession cone R_A and the closure of the conification of A, cl(c(A)), do not change if A is replaced by its closure cl(A).

⊆: Let (v, ρ) ∈ cl(c(A)) \ c(A). Choose a sequence ρ_n(a_n, 1), n ∈ N, in c(A) (so a_n ∈ A and ρ_n > 0 for all n) that has limit (v, ρ). First we prove by contradiction that ρ = 0. Assume ρ > 0. Then the sequence (a_n, 1), n ∈ N, in c(A) has limit (ρ^(−1)v, 1) as lim_{n→∞} ρ_n = ρ > 0. That is, lim_{n→∞} a_n = ρ^(−1)v, so ρ^(−1)v ∈ A as A is closed. Therefore, (v, ρ) = ρ(ρ^(−1)v, 1) lies in c(A). This is the required contradiction. So ρ = 0.

By the choice of the sequence above, we have lim_{n→∞} ρ_n a_n = v and lim_{n→∞} ρ_n = ρ = 0. Choose a ∈ A. For each t ≥ 0, we consider the sequence ((1 − tρ_n)a + tρ_n a_n)_n in A, where we start the sequence from the index n = N_t, where N_t is chosen large enough such that tρ_n < 1 for all n ≥ N_t. This choice is possible as lim_{n→∞} ρ_n = 0. This sequence ((1 − tρ_n)a + tρ_n a_n)_n converges to a + tv as ρ_n → 0 and ρ_n a_n → v, and so a + tv ∈ A as A is closed. This proves that a + tv ∈ A for all t ≥ 0 and so, by definition, v is a recession vector of A, that is, v ∈ R_A. This establishes the inclusion ⊆.

⊇: Let v ∈ R_A. If v = 0_n, then we choose a ∈ A and note that (v, 0) = 0_(n+1) = lim_{t↓0} t(a, 1) ∈ cl(c(A)). Now let v ≠ 0_n. Choose a ∈ A such that a + tv ∈ A for all t ≥ 0. As v ≠ 0_n, we get lim_{t→∞} ‖(a + tv, 1)‖^(−1)(a + tv, 1) = ‖v‖^(−1)(v, 0). So ‖v‖^(−1)(v, 0) lies in the closure of c(A), and hence so does (v, 0), as cl(c(A)) is a cone; that is, (v, 0) ∈ cl(c(A)). As the last coordinate of (v, 0) is zero, we have (v, 0) ∉ c(A), as c(A) ⊆ X × R_++. So we have proved that (v, 0) ∈ cl(c(A)) \ c(A). This establishes the inclusion ⊇.
Here is the proof of the corollary.

Proof of Corollary 3.2.4 We prove the following implications.

3 ⇒ 1: If A is unbounded, we can choose a sequence a_n, n ∈ N, in A for which lim_{n→∞} ‖a_n‖ = +∞. Then we can choose a subsequence a_(n_k), k ∈ N, such that a_(n_k) ≠ 0_n for all k and such that, moreover, the sequence ‖a_(n_k)‖^(−1) a_(n_k), k ∈ N, is convergent, say to v ∈ S_n. This is possible as the unit sphere S_n is compact and as each sequence in a compact set has a convergent subsequence. Then the sequence (‖a_(n_k)‖^(−1) a_(n_k), ‖a_(n_k)‖^(−1)), k ∈ N, in c(A) converges to (v, 0), which is nonzero and does not lie in c(A). So c(A) ∪ {0_(n+1)} is not closed. This proves the implication: c(A) ∪ {0_(n+1)} closed ⇒ A bounded.

2 ⇒ 3: follows from Proposition 3.2.1.

1 ⇒ 2: follows from the definition of recession direction.

The last statement of the corollary follows immediately from Proposition 3.2.1.
3.4 Illustrations Using Models

Now, the description of recession directions by a closure operation will be illustrated in the three models (ray, hemisphere, top-view). This is possible as a convex cone C ⊆ X × R_+ containing the origin, where X = R^n, is closed iff one of its three models is closed, as one can check. To be precise, for the ray model this statement requires the concept of metric space.

Example 3.4.1 (Recession Directions and the Ray Model) Figure 3.3 illustrates the case A = [a, +∞) ⊆ X = R with a > 0. The ray model is used. The beginning of a sequence of rays of the convex cone c(A) has been drawn. The rays of the sequence are flatter and flatter and the angle they make with the horizontal ray tends to zero. In the limit, you get the positive horizontal axis; this ray represents the unique recession direction of A, ‘go right’.
Fig. 3.3 Recession cone and the ray model
Fig. 3.4 Recession cone and the hemisphere model
Fig. 3.5 The hemisphere model for the recession directions of the plane
Example 3.4.2 (Recession Directions and the Hemisphere Model in Dimension One) Figure 3.4 illustrates again the case of a closed half-line, A = [a, +∞) ⊆ X = R with a > 0. Now the hemisphere model is used. You see that A is visualized as a half-open, half-closed arc Â (the bottom point is not included) in the hemisphere model, and that the unique recession direction (‘go right’) is visualized as the bottom point of the arc, also in the hemisphere model. If we took the closure of this half-open, half-closed arc, the effect would be that this bottom point would be added to it. The closure of the conification, cl(c(A)), is denoted in the figure by c(A) with an overline.

Example 3.4.3 (Recession Directions and the Hemisphere Model for the Plane) Figure 3.5 illustrates the case of the plane A = X = R^2. The hemisphere model is used. The plane A = X is visualized as the standard open upper hemisphere X̂, and the set of recession directions of X, that is, of the open rays of the recession cone R_X, is visualized by the circle that bounds the upper hemisphere. This last set is indicated in the same way as the recession cone of A = X, by R_X. If you take the closure of the model of A = X, then what is added is precisely the model of the set of recession directions. Indeed, if we take the closure of the open upper hemisphere, then the circle that bounds the upper hemisphere is added to it.

Example 3.4.4 (Recession Directions and the Hemisphere Model for Convex Sets Bounded by an Ellipse, a Parabola, and a Branch of a Hyperbola) Figure 3.6 illustrates the case of three closed convex sets A in the plane R^2. The hemisphere model is used.
Fig. 3.6 The hemisphere model for the recession directions of some convex sets in the plane
Fig. 3.7 The top-view model for the recession directions of some convex sets in the plane
Each one of the three shaded regions is denoted in the same way as the convex set that it models, as A; the set of points of the shaded region that lie on the boundary of the hemisphere is denoted in the same way as the recession cone of A, whose set of rays it models, as R_A. On the left, the model of A is closed; this corresponds to the fact that A has no recession directions, or equivalently, to the fact that A is bounded. In the middle, the model of A is not closed: you have to add one suitable point from the circle that bounds the hemisphere to make it closed. This point corresponds to the unique recession direction of A, the unique open ray of the recession cone R_A. In particular, A is not bounded. On the right, the model of A is not closed: you have to add a suitable closed arc from the circle that bounds the hemisphere to make it closed. The points of this arc correspond to the recession directions of A, that is, to the open rays of R_A. In particular, A is not bounded.

Example 3.4.5 (Recession Directions and the Top-View Model for Convex Sets Bounded by an Ellipse, a Parabola, and a Branch of a Hyperbola) Figure 3.7 also illustrates the case of three closed convex sets A in the plane R^2, just as Fig. 3.6, but it differs slightly. In Fig. 3.7, the convex sets contain the origin in their interior, the top-view model is used, and some arrows are drawn. The arrows in Fig. 3.7 have to be explained. These model straight paths in the convex set A that start at the origin of the plane. If an arrow ends in the interior of the disk, then the path in A ends at a boundary point of A; if the arrow ends at a boundary point of the disk, then the path in A does not end: it is a half-line.

Example 3.4.6 (Recession Directions and the Ray Model for n-Dimensional Space R^n) Consider the case A = X = R^n. Note to begin with that cl(c(X)) equals the closed upper half-space X × R_+.
Its nonhorizontal open rays correspond to the points of X, and its horizontal open rays correspond to the one-sided directions of X, that is, to S_n, the standard unit sphere in R^n. Indeed, these nonhorizontal rays are the rays {ρ(x, 1) | ρ > 0} where x ∈ X, and these horizontal rays are the rays {ρ(x, 0) | ρ > 0} where x ∈ S_n.

Example 3.4.7 (Gauss and Recession Directions) It is interesting that Gauss already used a similar model to visualize one-sided directions in space.

Disquisitiones, in quibus de directionibus variarum rectarum in spatio agitur, plerumque ad maius perspicuitatis et simplicitatis fastigium evehuntur, in auxilium vocando superficiem sphaericam radio = 1 circa centrum arbitrarium descriptam, cuius singula puncta repraesentare censebuntur directiones rectarum radiis ad illa terminatis parallelarum.

Investigations, in which the directions of various straight lines in space are to be considered, attain a high degree of clearness and simplicity if we employ, as an auxiliary, a sphere of unit radius described about an arbitrary center, and suppose the different points of the sphere to represent the directions of straight lines parallel to the radii ending at these points. (First sentence of General Investigations of Curved Surfaces: Carl Friedrich Gauss, 1828.)
This fits in with the hemisphere model. The auxiliary sphere of unit radius that is recommended by Gauss is precisely the boundary of the open upper hemisphere $x_1^2 + x_2^2 + x_3^2 + x_4^2 = 1$, $x_4 > 0$ that models the points of three-dimensional space. Indeed, this boundary is $x_1^2 + x_2^2 + x_3^2 = 1$, $x_4 = 0$, that is, it is a sphere of radius 1. So the auxiliary sphere recommended by Gauss agrees completely with the visualization of recession vectors in the hemisphere model, which has been explained above.
3.5 The Shape of a Convex Set

The aim of this section is to give the main result of this chapter: a result describing the 'shape' of a convex set. We will formulate this result in terms of the hemisphere model. We begin by giving a parametrization of the standard upper hemisphere.

Definition 3.5.1 The angle $\angle(u, w)$ between two unit vectors $u, w \in \mathbb{R}^p$ is the number $\varphi \in [0, \pi]$ for which $\cos\varphi = u \cdot w$.

Definition 3.5.2 (Parametrization of the Upper Hemisphere) For each $v \in S_X = S_n$, the standard unit sphere in $X = \mathbb{R}^n$, and each $t \in [0, \tfrac{1}{2}\pi]$, we define the point $x_v(t)$ in the standard upper hemisphere in $X \times \mathbb{R} = \mathbb{R}^{n+1}$, given by $x_1^2 + \cdots + x_{n+1}^2 = 1$, $x_{n+1} \ge 0$, implicitly by the following two conditions:

$$x_v(t) \in \mathrm{cone}(\{(0_n, 1), (v, 0)\}) \quad \text{and} \quad \angle(x_v(t), (0_n, 1)) = t;$$

explicitly, $x_v(t) = ((\sin t)v, \cos t)$. This parametrization of the standard upper hemisphere is not unique for the point $(0_n, 1)$: for this point one can take $t = 0$ and any $v \in S_n$.

Fig. 3.8 Parametrization of the upper hemisphere

Example 3.5.3 (Parametrization of the Upper Hemisphere) Figure 3.8 illustrates this parametrization. In order to visualize how the point $x_v(t)$ depends on $t$ for a given unit vector $v$, the vector $v$ and the vector $(0_n, 1)$ are drawn as $(1, 0)$ and $(0, 1)$ in the plane $\mathbb{R}^2$ respectively. The figure shows the point on the standard unit circle in the plane $\mathbb{R}^2$ that lies in the first quadrant $\mathbb{R}^2_+$ and for which the arc with endpoints $(0, 1)$ and this point has length $t$: this point is $x_v(t)$.

The hemisphere model for $X = \mathbb{R}^n$ and the one-sided directions in $X$ can be expressed algebraically in terms of the parametrization of the standard upper hemisphere given above. The point $x_v(t)$ on the hemisphere models the point $(\tan t)v$ in $X$ if $0 \le t < \tfrac{1}{2}\pi$, and it models the one-sided direction in $X$ given by the unit vector $v$ if $t = \tfrac{1}{2}\pi$.

The following result is a first step towards the main result; it is also of independent interest. From now on, we will always take topological notions for a convex set $A \subseteq \mathbb{R}^n$ relative to the affine hull of $A$.

Proposition 3.5.4 Each nonempty convex set $A \subseteq X = \mathbb{R}^n$ has a relative interior point.

Proof Take a maximal affinely independent set of points $a_1, \ldots, a_r$ in $A$. Then the set of points $\rho_1 a_1 + \cdots + \rho_r a_r$ with $\rho_1, \ldots, \rho_r > 0$, $\rho_1 + \cdots + \rho_r = 1$ is contained in $A$ by convexity and is open in $\mathrm{aff}(A)$, and so all its points are relative interior points of $A$.
This result gives a better grip on convex sets. Whenever you deal with a convex set, it helps to choose a relative interior point and then consider the position of the convex set in comparison with the chosen point. We will do this below in order to simplify the formulation and proof of the main result, Theorem 3.5.7.
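Returning to the parametrization of Definition 3.5.2, the correspondence between the point $x_v(t)$ on the hemisphere and the point $(\tan t)v$ of $X$ can be checked numerically. The following is a minimal sketch, not part of the original text; the function names `hemisphere_point` and `to_hemisphere` and the chosen values of `v` and `t` are illustrative assumptions.

```python
import math

def hemisphere_point(v, t):
    """x_v(t) = ((sin t) v, cos t) for a unit vector v and t in [0, pi/2]."""
    return tuple(math.sin(t) * c for c in v) + (math.cos(t),)

def to_hemisphere(x):
    """Hemisphere model of a point x of R^n: the unit vector (x, 1) / ||(x, 1)||."""
    norm = math.sqrt(sum(c * c for c in x) + 1.0)
    return tuple(c / norm for c in x) + (1.0 / norm,)

v = (0.6, 0.8)   # a unit vector in R^2 (illustrative choice)
t = 1.0          # angle with the point (0, 0, 1), in radians (illustrative choice)

p = hemisphere_point(v, t)
x = tuple(math.tan(t) * c for c in v)   # the point (tan t) v of X

print(p)                   # the point x_v(t) on the upper hemisphere
print(to_hemisphere(x))    # the same point, obtained by normalizing (x, 1)
print(math.acos(p[-1]))    # angle between x_v(t) and (0, 0, 1): recovers t
print(hemisphere_point(v, math.pi / 2))   # last coordinate is numerically zero
```

For $t = \tfrac{1}{2}\pi$ the last coordinate vanishes up to rounding, so the point lies on the bounding sphere and represents the one-sided direction given by $v$ rather than a point of $X$.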
To be more precise, Proposition 3.5.4 shows that we can always assume wlog that a given nonempty convex set $A \subseteq \mathbb{R}^n$ contains the origin in its interior. Indeed, we can choose a relative interior point and translate $A$ to a new position $A'$ such that this point is brought to the origin; this turns the affine hull of $A$ into the span of $A'$. Finally, we consider $A'$ as a convex subset of the subspace $\mathrm{span}(A')$ and we give a coordinate description of $\mathrm{span}(A')$ by choosing a basis for it. This concludes the reduction to a convex set that has the origin as an interior point. Before we state the main result formally, we will explain its geometric meaning.

Example 3.5.5 (The Main Result on the Shape of a Convex Set for a Closed Ball) The following description of the shape of a closed ball in $\mathbb{R}^n$ is the view that will be given below for arbitrary convex sets. If you travel in the direction of some unit vector $v$, with constant speed and in a straight line, starting from the center of the ball, then the following will happen. You travel initially through the interior of the ball, then at one moment you hit the boundary of the ball, and after that moment you travel forever outside the ball. The hitting time does not depend on the direction $v$.

For the purpose of explaining the geometric meaning of the main result, the top-view model is the most convenient one; for the purpose of giving a formal description of the result, the hemisphere model is more convenient.

Example 3.5.6 (The Main Result on the Shape of a Convex Set for Sets in the Plane Bounded by an Ellipse, a Parabola, and a Branch of a Hyperbola) Figure 3.9 illustrates the main result for three closed convex sets $A$ in the plane $\mathbb{R}^2$ that contain the origin in their interiors: the left one is bounded by an ellipse, the middle one is bounded by a parabola and the right one is bounded by a branch of a hyperbola. Here is some explanation for this figure.
For each one of these three convex sets $A$, its model is indicated by the same letter as the set itself, by $A$; the model of the set of recession directions is indicated by the same letter as the recession cone, by $R_A$. Recall here that the recession directions of $A$ correspond to the open rays of $R_A$. Each arrow in the figure models either a closed line segment with one endpoint the origin and the other endpoint a boundary point of $A$, or a half-line contained in $A$ with the origin as its endpoint.

Fig. 3.9 The shape of a closed convex set together with its recession directions
Now we describe the main result for these three sets $A$. Here we get the same description for three different looking convex sets, after adding recession directions. Suppose that you travel, in each one of the three cases, in a straight line in $A$, starting from the origin and going in some direction $v$ (a unit vector), with speed increasing in such a way that the speed would be constant in the hemisphere model. Then in the top-view model, you also travel in a straight line. At first, you will travel through the interior of the model of $A$. Then, at one moment $\tau(v)$, you will hit the boundary of the model of $A$. If this boundary point does not lie on the boundary of the closed standard unit disk, then after that moment you will travel outside the model of $A$ until you hit the boundary of the disk. The hitting time $\tau(v)$ of the boundary of the model of $A$ depends continuously on the initial direction $v$. This completes the statement of the main result for these three convex sets $A$. So we see that the three sets $A$, which look completely different from each other, have exactly the same shape, that of a closed disk, after we adjoin to $A$ the set of its recession directions.

Now we are ready to state the main result formally and in general. As already announced, we do this in terms of the hemisphere model, as this allows the simplest formulation. We use the parametrization of the upper hemisphere from Definition 3.5.2. We recall that all topological notions for subsets of the standard upper hemisphere $\{x \in S_k \mid x_k \ge 0\}$ are taken relative to the standard unit sphere $S_k = \{x \in \mathbb{R}^k \mid \|x\| = 1\}$.

Theorem 3.5.7 (Shape of a Convex Set) Let $A \subseteq X = \mathbb{R}^n$ be a closed convex set that contains the origin in its interior. Consider the hemisphere model $\mathrm{cl}(c(A)) \cap S_{n+1}$ for the convex set $A$ to which the set of its recession directions has been added. For each $v \in S_n$, there exists a unique $\tau(v) \in (0, \tfrac{1}{2}\pi]$ such that the point $x_v(t)$ lies in the interior of $\mathrm{cl}(c(A)) \cap S_{n+1}$ for $t < \tau(v)$, on the boundary of $\mathrm{cl}(c(A)) \cap S_{n+1}$ for $t = \tau(v)$, and outside $\mathrm{cl}(c(A)) \cap S_{n+1}$ for $t > \tau(v)$. Moreover, $\tau(v)$ depends continuously on $v$.

The meaning of this result is that if you adjoin the set of recession directions to a convex set $A$, then the outcome always has the same shape: for example, the shape of a closed ball in $\mathbb{R}^d$, where $d$ is the dimension of the affine hull of $A$. Note that the continuity of the function $\tau(\cdot)$ means that the union of the relative boundary of a convex set and its set of recession directions has a continuity property in the hemisphere model: if you travel at unit speed along a geodesic starting from a relative interior point, then the time it takes to reach a point of this union depends continuously on the initial direction. A nice feature of this result is that it immediately implies all standard topological properties of convex sets, bounded as well as unbounded ones, as we will see in the next section.

Proof of Theorem 3.5.7 The set $\mathrm{cl}(c(A))$ is a closed convex cone contained in the half-space $X \times \mathbb{R}_+$ and it contains an open ball with center $(0_n, 1)$ and some radius $r > 0$. This implies all statements of the theorem, except the continuity statement, which follows from the lemma below.
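Since the point $x_v(t)$ models the point $(\tan t)v$ of $X$, the hitting time of Theorem 3.5.7 can be written as $\tau(v) = \arctan(\sup\{s \ge 0 \mid sv \in A\})$, with value $\tfrac{1}{2}\pi$ when the supremum is infinite. The sketch below approximates this by bisection. It is only an illustration under stated assumptions: the three sets are hypothetical examples of the three types in Fig. 3.9 (not the sets drawn there), and a ray that is still inside $A$ at the large parameter `s_max` is simply treated as a half-line in $A$.

```python
import math

def hitting_time(contains, v, s_max=1e6, tol=1e-9):
    """Approximate tau(v) = arctan(sup{s >= 0 : s*v in A}) for a unit vector v.

    `contains` is a membership test for a closed convex set A that has the
    origin in its interior. If s_max*v still lies in A, the ray is treated as
    a half-line contained in A (a numerical assumption) and pi/2 is returned.
    """
    def on_ray(s):
        return contains(tuple(s * c for c in v))

    if on_ray(s_max):
        return math.pi / 2
    lo, hi = 0.0, s_max            # invariant: lo*v in A, hi*v not in A
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if on_ray(mid):
            lo = mid
        else:
            hi = mid
    return math.atan(lo)

# Hypothetical example sets, each containing the origin in its interior.
def ellipse(p):
    return p[0] ** 2 / 4 + p[1] ** 2 <= 1

def parabola(p):
    return p[1] >= p[0] ** 2 - 1

def hyperbola(p):
    return p[1] + 2 >= 0 and (p[1] + 2) ** 2 - p[0] ** 2 >= 1

for name, A in (("ellipse", ellipse), ("parabola", parabola), ("hyperbola", hyperbola)):
    for v in ((1.0, 0.0), (0.0, 1.0), (0.0, -1.0)):
        print(name, v, round(hitting_time(A, v), 6))
```

Bisection is valid here because, by convexity and since the origin lies in $A$, the parameters $s$ with $sv \in A$ form an interval starting at 0. The value $\tfrac{1}{2}\pi$ is returned exactly for the directions that are recession directions of the set, up to the `s_max` heuristic.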
Fig. 3.10 Illustration of Lemma 3.5.9
Example 3.5.8 (Continuity Property of the Boundary of a Convex Set) Figure 3.10 illustrates the following auxiliary result and makes clear why this result implies a 'continuity' property for the boundary of a convex set: the boundary is squeezed at some point between two sets with 'continuity' properties at this point.

Lemma 3.5.9 Let $A \subseteq X = \mathbb{R}^n$ be a closed convex set that contains an open ball $U(a, \varepsilon)$ and let $x$ be a boundary point of $A$. Then the convex hull of the point $x$ and the open ball $U(a, \varepsilon)$ is contained in $A$, and its point reflection with respect to the point $x$ is disjoint from $A$, apart from the point $x$.

Proof The first statement follows from the convexity of $A$, together with the assumptions that $x \in A$ and $U(a, \varepsilon) \subseteq A$. The second statement is immediate from the easy observation that the convex hull of any point of the reflection other than $x$ and the open ball $U(a, \varepsilon)$ contains $x$ in its interior. So if a point of the reflection other than $x$ belonged to $A$, then by the convexity of $A$ and by $U(a, \varepsilon) \subseteq A$, the point $x$ would lie in the interior of $A$. This would contradict the assumption that $x$ is a boundary point of $A$.
To derive the required continuity statement precisely from this lemma requires some care. You have to intersect the convex cone $\mathrm{cl}(c(A))$ with a hyperplane of $X \times \mathbb{R}$ (a subspace of dimension $n$ of $X \times \mathbb{R} = \mathbb{R}^{n+1}$) that does not contain the point $(v, 0)$. We do not display the details of this derivation.
3.6 Topological Properties Convex Set

The aim of this section is to give a list of all standard topological properties of convex sets. It is an advantage of the chosen approach that all topological properties of convex sets are a part of, or easy consequences of, one result, Theorem 3.5.7.

Definition 3.6.1 An extreme point of a convex set $A \subseteq X = \mathbb{R}^n$ is a point of $A$ that is not the average of two different points of $A$. A recession direction of $A$ (a one-sided direction) is an extreme recession direction of $A$ if a vector that represents this direction is not the average of two linearly independent recession vectors of $A$. We write $\mathrm{extr}(A)$ for the set of extreme points of $A$ and $\mathrm{extrr}(A)$ for the set of unit vectors that represent an extreme recession direction of $A$.

The kernel of a linear function $L : X \to Y$ between finite dimensional inner product spaces is defined to be the following subspace of $X$: $\ker(L) = \{x \in X \mid L(x) = 0_Y\}$.

Corollary 3.6.2 Let $A \subseteq X = \mathbb{R}^n$ be a nonempty convex set. Then the following statements hold true.
1. The relative interior of $A$ is nonempty.
2. All points on an open line segment with endpoints a relative interior point of $A$ and a relative boundary point of $A$ are relative interior points of $A$.
3. The closure of $A$ is a convex set and it has the same relative interior as $A$.
4. The relative interior of $A$ is a convex set and it has the same closure as $A$.
5. A convex set is unbounded iff it has a recession direction.
6. The inverse image of a closed resp. open convex set under a linear function is closed resp. open.
7. The image of an open convex set under a surjective linear function is open.
8. Let $L : X \to Y = \mathbb{R}^m$ be a linear function. If $A$ is closed, and either $R_A \cap \ker(L) = \{0\}$ or $A$ is a polyhedral set, then the image $LA$ is closed.
9. Krein–Milman theorem. If $A$ is closed, then each point of $A$ is the sum of an element of the convex hull of the extreme points of $A$ and an element of the conic hull of the extreme recession directions of $A$.
10. Let $A \subseteq X = \mathbb{R}^n$ be a nonempty closed convex set that contains no line. Then each point of $A$ can be written as a linear combination of an affinely independent subset of the union of $\mathrm{extr}(A)$ and $\mathrm{extrr}(A)$; here all coefficients are positive and the coefficients of the points in $\mathrm{extr}(A)$ add up to 1.

Properties 3 and 4 show that there is a close connection between a convex set, its closure and its relative interior. This is a special property of convex sets. For an arbitrary subset $S$ of $\mathbb{R}^n$, one still has the inclusions $\mathrm{int}(S) \subseteq S \subseteq \mathrm{cl}(S)$, but these three sets can be completely different.
Figure 3.11 illustrates the need for an assumption in statement 8: the image of a closed convex set under a linear function need not be closed.

Fig. 3.11 Bad behavior image under linear function of closed convex set
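The need for an assumption in statement 8 of Corollary 3.6.2 can also be seen in a small computation. The set and the projection below are a standard textbook example chosen for this sketch; they are an assumption and not necessarily the picture of Fig. 3.11. The set $A = \{(x, y) \mid x > 0,\ xy \ge 1\}$ is closed and convex, but its image under $L(x, y) = x$ is the open half-line $(0, +\infty)$.

```python
def in_A(x, y):
    """Membership in A = {(x, y) : x > 0, x*y >= 1}, a closed convex set in R^2."""
    return x > 0 and x * y >= 1

def L(x, y):
    """Projection on the first coordinate, a linear function R^2 -> R."""
    return x

# The points (2^-j, 2^j) lie in A and their images 2^-j tend to 0 ...
points = [(2.0 ** -j, 2.0 ** j) for j in range(0, 12, 2)]
images = [L(x, y) for x, y in points if in_A(x, y)]
print(images)

# ... but 0 itself is not in the image: no point (0, y) belongs to A.
print(any(in_A(0.0, y) for y in (1.0, 1e3, 1e6, 1e9)))

# The assumption R_A ∩ ker(L) = {0} fails: c = (0, 1) is a nonzero recession
# vector of A (checked here only from one sample point a) with L(c) = 0.
a, c = (0.5, 2.0), (0.0, 1.0)
print(all(in_A(a[0] + t * c[0], a[1] + t * c[1]) for t in (0, 1, 10, 1000)), L(*c))
```

Powers of two are used so that the products $x y$ are computed exactly in floating point. The example is consistent with statement 8: $A$ is not polyhedral and $R_A \cap \ker(L)$ contains the nonzero vector $(0, 1)$.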
Statement 9 is, in the bounded case, due to Minkowski for $n = 3$ and to Steinitz for any $n$. The result was generalized by Krein and Milman, still in the bounded case, to suitable infinite dimensional spaces (those that are locally convex), but there one might have to take the closure. Nowadays the name Krein–Milman is often used for the finite dimensional case as well.

The assumption in statement 10 that $A$ contains no line can be made essentially wlog. Indeed, for each closed convex set $A \subseteq X = \mathbb{R}^n$ one has $A = L_A + (A \cap L_A^\perp)$, where $L_A$ is the subspace $R_A \cap R_{-A}$, the lineality space of $A$, and $L_A^\perp$ is the orthogonal complement of $L_A$, that is, $L_A^\perp = \{x \in X \mid x \cdot l = 0 \ \forall l \in L_A\}$.
3.7 Proofs Topological Properties Convex Set

The statements 1–7 follow immediately from the known shape of a convex set, that is, from Theorem 3.5.7. Statement 10 is a combination of Krein–Milman and Carathéodory's theorem 1.12.1. It remains to prove the statements 8 and 9.

Proof Krein–Milman Take a nonempty closed subset $S$ of the upper hemisphere that is convex in the sense that for two different and nonantipodal points in $S$ the geodesic that connects them is contained in $S$. For each point $s \in S$ let $L_s$ be the subspace of $X$ of all tangent vectors at $s$ to geodesics in $S$ that have $s$ as an interior point. Now pick an element $p \in S$. If $L_p = \{0_n\}$, then stop. Otherwise, take a nonzero $v \in L_p$ and choose the geodesic through $p$ which has $v$ as tangent vector at $p$. Consider the endpoints $q, r$ of the part of this geodesic that lies in $S$. Then repeat this step for $q$ and $r$. Note that $L_q, L_r \subseteq L_p$ but that $L_q, L_r$ are not equal to $L_p$, as they do not contain $v$. As the dimensions go down at each step, this process must lead to a finite set of points $s$ with $L_s = \{0_n\}$ such that $p$ lies in the convex hull of these points.
Now we prove statement 8.

Proof Statement 8 under the Assumption on the Recession Cone By the assumption, we get a surjective continuous function from the intersection $S_X \cap C$ to the intersection $S_Y \cap LC$ defined by $u \mapsto Lu/\|Lu\|$. So $S_Y \cap LC$ is compact, as compactness is preserved under taking the image under a continuous mapping. Hence the convex cone $LC$ is closed.
Sketch Proof Statement 8 for the Polyhedral Case The statement holds if $L$ is injective: then we may assume wlog that $L$ is defined by adding $m - n$ zeros below each column $x \in \mathbb{R}^n$ (by working with coordinates in $X$ and $Y$ with respect to suitable bases); this shows that if $A$ is a polyhedral set, then $LA$ is also a polyhedral set and so $LA$ is closed. So, as each linear function is the composition of injective linear functions and linear functions of the form $\pi_r : \mathbb{R}^{r+1} \to \mathbb{R}^r$ given by the recipe $(x_1, \ldots, x_r, x_{r+1}) \mapsto (x_1, \ldots, x_r)$, it suffices to prove the statement for linear functions of the latter type. Note that each polyhedral set $P$ in $\mathbb{R}^{r+1}$ is the solution set of a system of linear inequalities of the form

$$L_i(x_1, \ldots, x_r) \ge x_{r+1}, \quad x_{r+1} \ge R_j(x_1, \ldots, x_r), \quad S_k(x_1, \ldots, x_r) \ge 0 \quad \text{for all } i \in I,\ j \in J,\ k \in K$$

for suitable finite index sets $I, J, K$. Here $L_i, R_j, S_k$ are affine functions of $x_1, \ldots, x_r$. Then the image $\pi_r(P)$ of $P$ under $\pi_r$ is readily seen to be the solution set of the system of linear inequalities

$$L_i(x_1, \ldots, x_r) \ge R_j(x_1, \ldots, x_r), \quad S_k(x_1, \ldots, x_r) \ge 0 \quad \text{for all } i \in I,\ j \in J,\ k \in K.$$

That is, it is again a polyhedral set and in particular it is closed.

The method that was used in this proof is called Fourier–Motzkin elimination, a method to eliminate variables in a system of linear inequalities.
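One elimination step of the kind used in the proof above can be sketched as follows. This is a minimal illustration, not the author's code: the function name `eliminate_last` and the encoding of an inequality $a \cdot x \le b$ as a pair `(a, b)` are assumptions made for this sketch. Inequalities with a positive coefficient for the last variable play the role of the upper bounds $L_i \ge x_{r+1}$, those with a negative coefficient the role of the lower bounds $x_{r+1} \ge R_j$, and the remaining ones the role of $S_k \ge 0$.

```python
from fractions import Fraction

def eliminate_last(system):
    """One Fourier-Motzkin step: project {x : a . x <= b for (a, b) in system}
    onto all coordinates except the last one."""
    upper, lower, free = [], [], []
    for a, b in system:
        a = [Fraction(c) for c in a]
        b = Fraction(b)
        last = a[-1]
        if last > 0:      # x_last <= b/last - (a'/last) . x'       (upper bound)
            upper.append(([c / last for c in a[:-1]], b / last))
        elif last < 0:    # x_last >= (a'/|last|) . x' - b/|last|   (lower bound)
            lower.append(([c / -last for c in a[:-1]], b / -last))
        else:             # the inequality does not involve x_last
            free.append((a[:-1], b))
    # Each pair "lower bound <= upper bound" gives one inequality without x_last.
    combined = [([p + q for p, q in zip(au, al)], bu + bl)
                for au, bu in upper for al, bl in lower]
    return free + combined

# Triangle x >= 0, y >= 0, x + y <= 1, written as a . (x, y) <= b.
triangle = [((-1, 0), 0), ((0, -1), 0), ((1, 1), 1)]
projection = eliminate_last(triangle)
for a, b in projection:
    print([str(c) for c in a], "<=", b)
```

For the triangle, eliminating $y$ leaves the two inequalities $-x \le 0$ and $x \le 1$, that is, the interval $[0, 1]$. Exact rational arithmetic is used so that no rounding enters the combined inequalities.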
3.8 *Applications of Recession Directions

We give two applications of recession vectors. One can usually convince a nonexpert that something exists by giving an example. To convince him/her that something does not exist is usually much harder. However, in some cases we are lucky and then we can translate nonexistence of something into existence of something else.

3.8.1 Certificates for Unboundedness of a Convex Set

How to convince a nonexpert that a convex set $A$ is unbounded, that is, that there exists no upper bound for the absolute value of the coordinates of the points in $A$? Well, the nonexistence of a bound for a convex set $A$ is equivalent to the existence of a nonzero recession vector for $A$. So a recession vector for $A$ is valuable: it is a certificate for unboundedness for $A$. So you can convince a nonexpert that there exists no bound for $A$ by giving him/her an explicit nonzero recession vector for $A$.

3.8.2 Certificates for Insolubility of an Optimization Problem

How to convince a nonexpert that there does not exist a global minimum for a linear function $l$ on a closed convex set $A$? This nonexistence is equivalent to the existence of a nonzero recession vector of $A$ along which the function $l$ descends. So such a vector is valuable: it is a certificate for insolubility for the considered optimization problem. So you can convince a nonexpert that there does not exist an optimal solution for the given optimization problem by giving him/her an explicit nonzero recession vector of $A$ along which the function $l$ descends.
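Checking such certificates is easy when the set is given explicitly. The sketch below restricts attention, as an assumption of this example, to a nonempty polyhedral set $A = \{x \mid a_i \cdot x \le b_i \text{ for all } i\}$, for which a vector $c$ is a recession vector exactly when $a_i \cdot c \le 0$ for all $i$ (the right-hand sides $b_i$ play no role). The function names and the sample data are hypothetical.

```python
def dot(u, w):
    return sum(a * c for a, c in zip(u, w))

def is_recession_vector(rows, c):
    """For a nonempty set {x : a . x <= b_a for each row a}:
    c is a recession vector iff a . c <= 0 for every row a."""
    return all(dot(a, c) <= 0 for a in rows)

def certifies_unbounded(rows, c):
    """A nonzero recession vector certifies that the set is unbounded."""
    return any(ci != 0 for ci in c) and is_recession_vector(rows, c)

def certifies_no_minimum(rows, l, c):
    """A recession vector along which x -> l . x descends certifies that
    the linear function has no global minimum on the set."""
    return is_recession_vector(rows, c) and dot(l, c) < 0

# A = {(x, y) : x >= 0, y >= 0, y <= x + 1}; only the left-hand sides matter here.
rows = [(-1, 0), (0, -1), (-1, 1)]

print(certifies_unbounded(rows, (1, 1)))            # (1, 1) is a recession vector
print(certifies_unbounded(rows, (0, 1)))            # (0, 1) violates y <= x + 1 eventually
print(certifies_no_minimum(rows, (-1, 0), (1, 1)))  # l(x, y) = -x descends along (1, 1)
print(certifies_no_minimum(rows, (1, 0), (1, 1)))   # l(x, y) = x does not descend
```

The point of a certificate is visible in the code: verifying it requires only a few inner products, whereas finding it may be much harder.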
Fig. 3.12 Hint for Exercise 1

3.9 Exercises

1. Give a direct proof that for an interior point $a$ of a convex set $A$ and an arbitrary point $\tilde{b} \in A$, the half-open line segment $[a, \tilde{b})$ is contained in the interior of $A$. Hint: see Fig. 3.12.
2. Show that both the closure and the relative interior of a convex set are convex sets.
3. Show that the following two properties of a set $C \subseteq \mathbb{R}^n$ are equivalent (both define the concept of closed set): (a) the complement $\mathbb{R}^n \setminus C$ is open, (b) $C$ contains along with each convergent sequence of its points also its limit.
4. Show that a set $S \subseteq \mathbb{R}^n$ is compact, that is, closed and bounded, iff every sequence in $S$ has a subsequence that converges to an element in $S$.
5. Show that the following three properties of a function $f : S \to T$ with $S \subseteq \mathbb{R}^n$, $T \subseteq \mathbb{R}^m$ are equivalent (all three define the concept of continuous function): (a) $\lim_{s \to a} f(s) = f(a)$ for all $a \in S$, (b) the inverse image under $f$ of every set that is open relative to $T$ is open relative to $S$, (c) the inverse image under $f$ of every set that is closed relative to $T$ is closed relative to $S$.
6. Show that one can write each nonempty closed convex set $A$ as a Minkowski sum $D + L$ where $D$ is a closed convex set that contains no line and $L$ is a subspace. In fact, $L$ is uniquely determined to be $L_A$, the lineality space of $A$. Moreover, one can choose $D$ to be the intersection $A \cap L_A^\perp$. This result allows us to restrict attention to nonempty closed convex sets that do not contain a line.
7. Show that the following conditions on a convex cone $C \subseteq \mathbb{R}^n \times \mathbb{R}_+$ containing the origin are equivalent: (a) $C$ is closed, (b) the model of $C$ in the ray model is closed in the sense that if the angles between the rays of a sequence of rays of $C$ and a given ray in $\mathbb{R}^n \times \mathbb{R}_+$ tend to zero, then this given ray is also a ray of $C$, (c) the model of $C$ in the hemisphere model is closed, (d) the model of $C$ in the top-view model is closed.
8. Show that an unbounded convex set always has a recession direction, but that an arbitrary unbounded subset of $\mathbb{R}^n$ does not always have a recession direction.
9. Determine the recession cones of some simple convex sets, such as $(7, +\infty)$, $\{(x, y) \in \mathbb{R}^2 \mid y \ge x^2\}$ and $\{(x, y) \in \mathbb{R}^2 \mid xy \ge 1,\ x \ge 0,\ y \ge 0\}$.
10. Give an example of a convex set $A \subseteq \mathbb{R}^n$, two recession vectors $c, c'$ and a point $a \in A$ such that $a + tc \in A \ \forall t \ge 0$ but not $a + tc' \in A \ \forall t \ge 0$.
11. Show that the recession cone $R_A$ of a nonempty convex set $A \subset \mathbb{R}^n$ is a convex cone.
12. Show that the recession cone $R_A$ of a nonempty closed convex set $A \subset \mathbb{R}^n$ is closed.
13. Show that the recession cone $R_A$ of a nonempty convex set $A \subset \mathbb{R}^n$ that is not closed need not be closed.
14. Let $A$ be a nonempty closed convex set in $X = \mathbb{R}^n$. Show that a unit vector $c \in X$ is in the recession cone $R_A$ of $A$ iff there exists a sequence $a_1, a_2, \ldots$ of nonzero points in $A$ for which $\|a_i\| \to \infty$ such that $a_i/\|a_i\| \to c$ for $i \to \infty$.
15. Show that the recession cone $R_A$ of a nonempty closed convex set $A \subseteq X = \mathbb{R}^n$ consists, for any choice of $a \in A$, of all $c \in X$ for which $a + tc \in A$ for all $t \ge 0$. Figure 3.13 illustrates this. So for closed convex sets, a simplification of the definition of recession vector is possible: we can take the same arbitrary choice of $a$ in the entire set $A$, for all recession vectors $c$.
16. Verify in detail that Fig. 3.2 is in agreement with Proposition 3.2.1. Do this using each one of the models (ray, hemisphere, top-view).
17. Verify that the gauge of a closed convex set that contains the origin in its interior is well-defined, in the following way. Let $A \subseteq X = \mathbb{R}^n$ be a closed convex set that contains the origin in its interior. Show that there exists a unique sublinear function $g : X \to \mathbb{R}_+$ for which $A = \{x \in X \mid g(x) \le 1\}$.
18. Show that the epigraph of the gauge $g$ of a closed convex set $A$ that contains the origin in its interior is equal to the closure of the homogenization of $A$, $\mathrm{epi}(g) = \mathrm{cl}(c(A))$.
19. Let $A \subseteq X = \mathbb{R}^n$ be a closed convex set that has the origin as an interior point, and let $g$ be its gauge. Show that $\mathrm{int}(A) = \{x \in X \mid g(x) < 1\}$.
20. Give a precise derivation of the continuity statement of Theorem 3.5.7 by applying Lemma 3.5.9 in a suitable way.
21. Show that the following two characterizations of extreme points of a closed convex set $A \subseteq \mathbb{R}^n$ are equivalent: 'a point in $A$ that is not the average of two different points of $A$' and 'a point in $A$ that does not lie on any open interval with its two endpoints in $A$'.
22. Show that each extreme point of a closed convex set $A \subseteq \mathbb{R}^n$ is a relative boundary point of $A$.
23. *Prove or disprove the following statement: the set of extreme points of a nonempty bounded closed convex set with nonempty interior is closed.
24. Show that a recession vector of a closed convex set $A \subseteq \mathbb{R}^n$ is extreme iff it does not lie in the interior of an angle in $(0, \pi)$ spanned by two recession directions of $A$.
25. Determine the extreme directions for the semidefinite cone.
26. Let $A \subseteq \mathbb{R}^n$ be a nonempty closed convex set. Show that the lineality space $L_A = R_A \cap (-R_A)$ is a subspace.
27. Show that the lineality space $L_A$ of a nonempty closed convex set $A \subseteq X = \mathbb{R}^n$ consists of all $c \in X$ such that $a + tc \in A$ for all $t \in \mathbb{R}$ for some, and so for all, $a \in A$.
28. Let $A \subseteq \mathbb{R}^n$ be a nonempty closed convex set. Show that for each $a \in A$, the lineality space $L_A$ is the largest subspace of $\mathbb{R}^n$, with respect to inclusion, that is contained in $A - a = \{b - a \mid b \in A\}$.
29. Show that a closed convex set $A \subseteq \mathbb{R}^n$ does not contain any line iff $L_A = \{0_n\}$.
30. Give the details of the sketch of the proof that has been given for statement 8 of Corollary 3.6.2 in the polyhedral case.

Fig. 3.13 Recession vector of a closed convex set
Chapter 4
Convex Sets: Dual Description
Abstract
• Why. The heart of the matter of convex analysis is the following phenomenon: there is a perfect symmetry (called 'duality') between the two ways in which a closed proper convex set can be described: from the inside, by its points ('primal description'), and from the outside, by the half-spaces that contain it ('dual description'). Applications of duality include the theorems of the alternative: nonexistence of a solution for a system of linear inequalities is equivalent to existence of a solution for a certain other such system. The best known of these results is Farkas' lemma. Another application is the celebrated result from finance that European call options have to be priced by a precise rule, the formula of Black–Scholes. Last but not least, duality is very important for practitioners of optimization methods, as we will see in Chap. 8.
• What. Duality for convex sets is stated and a novel proof is given: this amounts to just throwing a small ball against a convex set. Many equivalent versions of the duality result are given: the supporting hyperplane theorem, the separation theorems, the theorem of Hahn–Banach, the fact that a duality operator on convex sets containing the origin, the polar set operator $B \mapsto B^\circ$, is an involution. For each one of the four binary operations $\ast$ on convex sets a rule is given of the type $(B_1 \ast B_2)^\circ = B_1^\circ \star B_2^\circ$, where $\star$ is another one of the four binary operations on convex sets, and where $B_1, B_2$ are convex sets containing the origin. Each one of these four rules usually requires a separate proof, but homogenization generates a unified proof. This involves a construction of the polar set operator by means of a duality operator for convex cones, the polar cone operator. The following technical problem requires attention: all convex sets that occur in applications have two nice properties, closedness and properness, but they might lose these if you work with them. For example, $B_1 \ast B_2$ need not have these properties if $B_1$ and $B_2$ have them and $\ast$ is one of the four binary operations on convex sets; then the rule above might not hold. This rule only holds under suitable assumptions.

Road Map
1. Definition 4.2.1, Theorem 4.2.5, Figs. 4.5, 4.6 (the duality theorem for convex sets, remarks on the two standard proofs, the novel ball throwing proof).
2. Figure 4.7, Definition 4.3.1, Theorem 4.3.3, hint (reformulation duality theorem: supporting hyperplane theorem).
3. Figure 4.8, Definition 4.3.4, Theorem 4.3.6, hint (reformulations duality theorem: separation theorems).
4. Definitions 4.3.7, 4.3.9, Theorem 4.3.11, hint (reformulation duality theorem: theorem of Hahn–Banach).
5. Definition 4.3.12, Theorem 4.3.15, hint, remark (reformulation duality theorem: polar set operation is involution).
6. Figure 4.9, Definition 4.3.16, Theorem 4.3.19, hint, Theorem 4.3.20, hint (reformulations of duality theorem: (1) polar cone operator is involution, (2) polar cone of $C \ne \mathbb{R}^n$ is nontrivial; self-duality of the three golden cones).
7. The main idea of Sects. 4.5 and 4.6 (construction polar set operator by homogenization; pay attention to the minus sign in the bilinear mapping on $X \times \mathbb{R}$ and the proof that the 'type' of a homogenization of a convex set containing the origin remains unchanged under the polar cone operator).
8. Figure 4.12 (geometric construction of polar set).
9. Propositions 4.6.1, 4.6.2 (calculus rules for computing polar cones).
10. Propositions 4.6.3, 4.6.4 (calculus rules for computing polar sets, take note of the structure of the proof by homogenization).
11. Section 4.8 (Minkowski–Weyl representation, many properties of convex sets simplify for polyhedral sets).
12. Theorem 4.8.2, structure proofs (theorems of the alternative, short proofs thanks to calculus rules).

© Springer Nature Switzerland AG 2020 J. Brinkhuis, Convex Analysis for Optimization, Graduate Texts in Operations Research, https://doi.org/10.1007/978-3-030-41804-5_4
4.1 *Motivation

We recall that the first sections of chapters, which motivate the contents of the chapter, are optional material and are not used in the remainder of the chapter.
4.1.1 Child Drawing

The aim of this section is to give an example of a curved line that is described in a dual way, by its tangent lines. A child draws lines on a sheet of paper using a ruler, according to some precise plan she has learned from a friend, who in turn has learned it from an older brother. For each line, one endpoint of the ruler is placed somewhere on the bottom side of the sheet and the other, accordingly, on the left side of the sheet. The drawing is shown in Fig. 4.1. The fun of it is that drawing all these straight lines leads to something that is not straight: a curved line! The lines are tangent lines to the curve: the curve is created by its tangent lines. What is fun for us is to determine the equation for this curved line. You can see that the region above the curve is a convex set in the plane and that the lines that are drawn are all outside this region and tangent to the region. This description of a convex set in the plane by lines lying outside this set is called the dual description of the set. You can already try to derive the equation now. A calculation using the theory of this chapter will be given in Sect. 4.8.

Fig. 4.1 Envelope of a family of lines

Fig. 4.2 Manufacturing controlled by prices of resources
4.1.2 How to Control Manufacturing by a Price Mechanism

The aim of this section is to state an economic application of the dual description of a convex set: how a regulator can make sure that a firm produces a certain good in a desired way, for example respecting the environment. How to do this? Consider a product that requires $n$ resources (such as labor, electricity, water) to manufacture. The product can be manufactured in many ways. Each way is described by a resource vector: an element in a closed convex set $P$ in the first positive orthant $\mathbb{R}^n_{++}$ for which the recession cone is the first orthant $\mathbb{R}^n_+$. In addition, it is assumed that $P$ is strictly convex: its boundary contains no line segments of positive length. The regulator can set the prices $\pi_i \ge 0$, $1 \le i \le n$, for the resources. Then the firm will choose a way of manufacturing $x = (x_1, \ldots, x_n) \in P$ that minimizes the cost of production $\pi_1 x_1 + \cdots + \pi_n x_n$. We will prove in Sect. 4.8, using the dual description of the convex set $P$, that for any reasonable way of manufacturing $v \in P$, the regulator can make sure that the firm chooses $v$. Here, reasonable means that $v$ should be a boundary point of $P$; indeed, other points will never be chosen by a firm as they can be replaced by points that give a lower cost of production, whatever the prices are. Figure 4.2 illustrates how the firm chooses, for given prices, the way of manufacturing that minimizes the cost of production.
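A toy instance may make the price mechanism concrete. The sketch below assumes the hypothetical set $P = \{x \in \mathbb{R}^2_{++} \mid x_1 x_2 \ge 1\}$, which is closed, strictly convex and has recession cone $\mathbb{R}^2_+$; this set, the function names and the numbers are illustrative assumptions, and the general argument is the one announced for Sect. 4.8. For a boundary point $v$ (so $v_1 v_2 = 1$), the regulator sets prices proportional to the normal vector $(v_2, v_1)$ of the boundary at $v$.

```python
import math

def prices_for(v):
    """Prices that steer the firm to the boundary point v of P = {x1*x2 >= 1}:
    a normal vector of the boundary curve x1*x2 = 1 at v."""
    return (v[1], v[0])

def cheapest_point(prices):
    """Minimizer of p1*x1 + p2*x2 over P for positive prices: on the boundary
    x2 = 1/x1 the cost p1*x1 + p2/x1 is minimal at x1 = sqrt(p2/p1)."""
    p1, p2 = prices
    x1 = math.sqrt(p2 / p1)
    return (x1, 1.0 / x1)

def cost(prices, x):
    return prices[0] * x[0] + prices[1] * x[1]

v = (2.0, 0.5)                # desired way of manufacturing, on the boundary of P
pi = prices_for(v)
print(pi, cheapest_point(pi))

# Sanity check on a grid of boundary points: none is cheaper than v.
grid = [(k / 100, 100 / k) for k in range(10, 1001)]
print(min(cost(pi, x) for x in grid) >= cost(pi, v) - 1e-12)
```

With positive prices the minimum over $P$ is attained on the boundary, which is why only boundary points are sampled in the check.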
4.1.3 Certificates of Insolubility

How can you convince a nonexpert who wants to know for sure whether a certain system of equations and inequalities has a solution or not, of the truth? If there exists a solution, then you should find one and give it: then everyone can check that it is indeed a solution. But how to do this if there is no solution? In the case of a linear system, there is a way. It is possible to give certificates of insolubility by means of so-called theorems of the alternative. This possibility depends on the dual description of convex sets. This will be explained in Sect. 4.8.
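To give an impression of what such a certificate looks like, here is a minimal sketch for one classical theorem of the alternative (a variant of Farkas' lemma often attributed to Gale); it is an illustration and not necessarily the formulation used in Sect. 4.8. The system $Ax \le b$ has no solution iff there is a vector $y \ge 0$ with $y^\top A = 0$ and $y^\top b < 0$. The function name and the sample system are hypothetical.

```python
from fractions import Fraction

def is_insolubility_certificate(A, b, y):
    """Check that y >= 0, y^T A = 0 and y^T b < 0 for the system A x <= b.
    If such a y exists, any solution x would give 0 = y^T A x <= y^T b < 0,
    which is impossible, so the system has no solution."""
    y = [Fraction(c) for c in y]
    nonnegative = all(c >= 0 for c in y)
    columns = range(len(A[0]))
    combination = [sum(yi * Fraction(row[j]) for yi, row in zip(y, A)) for j in columns]
    rhs = sum(yi * Fraction(bi) for yi, bi in zip(y, b))
    return nonnegative and all(c == 0 for c in combination) and rhs < 0

# The system x <= 0, -x <= -1 (that is, x >= 1) clearly has no solution.
A = [[1], [-1]]
b = [0, -1]
print(is_insolubility_certificate(A, b, [1, 1]))   # adding both inequalities gives 0 <= -1
print(is_insolubility_certificate(A, b, [1, 0]))   # not a certificate
```

Checking the certificate needs no search at all: it amounts to verifying one nonnegative combination of the given inequalities.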
4.1.4 The Black–Scholes Option Pricing Model

Aristotle writes in his book 'Politics', in the fourth century BCE, that another philosopher, Thales of Miletus, once paid the owners of olive presses an amount of money to secure the rights to use these presses at harvest time. When this time came and there was a great demand for these presses due to a huge harvest, as Thales had predicted, he sold his rights to those who needed the presses and made a great profit. Thus he created what are now called European call options: the right, but not the obligation, to buy something at a fixed future date at a fixed price. For thousands of years, such options were traded although there was no method for determining their value. Then Black and Scholes discovered a model to price them. They showed how one could make money without risk if such options were not priced using their model. Therefore, from then on, all these options have been traded using the model of Black and Scholes. In Sect. 4.8, we will apply the dual description of a convex set to establish the result of Black and Scholes for the pricing of European call options in a simple but characteristic situation.
4.2 Duality Theorem
The aim of this section is to present the duality theorem for a closed convex set. The meaning of this result is that the two descriptions of such a set, the primal description (from the inside) and the dual description (from the outside), are equivalent.
Definition 4.2.1 A subset $H$ of $\mathbb{R}^n$ is a closed halfspace if it is the solution set of an inequality of the type $a_1 x_1 + \cdots + a_n x_n \le b$ for $a_1, \ldots, a_n, b \in \mathbb{R}$ with $a_i \ne 0$ for at least one $i$. So, $H$ is the set of points on one of the two sides of the hyperplane $a_1 x_1 + \cdots + a_n x_n = b$ (including the hyperplane).
A nonempty closed convex set $A \subseteq \mathbb{R}^n$ can be approximated from the inside, as the convex hull of a finite subset $\{a_1, \ldots, a_k\}$ of $A$, as we have already explained.
Fig. 4.3 Approximation from the inside
Fig. 4.4 Approximation from the outside
Example 4.2.2 (Approximation of a Convex Set from the Inside) Figure 4.3 illustrates a primal approximation of a convex set. In the figure, seven points in a given convex set $A$ in the plane and their convex hull are drawn. The convex hull is an approximation of $A$ from the inside, also called a primal approximation.
The set $A$ can also be approximated from the outside, as the intersection of a finite set of closed halfspaces $H_1, \ldots, H_l$ containing $A$.
Example 4.2.3 (Approximation of a Convex Set from the Outside) Figure 4.4 illustrates a dual approximation of a convex set. In the figure, six halfplanes containing a given convex set $A$ in the plane and their intersection are drawn. The intersection is an approximation of $A$ from the outside, also called a dual approximation.
Most of the standard results from convex analysis are a consequence of the following central result, the meaning of which is that ‘in the limit’ the two types of approximation, from the inside and from the outside, coincide.
Example 4.2.4 (Approximation of a Convex Set from the Inside and the Outside) Figure 4.5 illustrates both types of approximation, primal and dual, together in one picture. In the figure, both a primal and a dual approximation of a convex set $A$ in the plane are drawn. The convex set $A$ is squeezed between the primal and the dual approximation.
Now we formulate the duality theorem for closed convex sets. This is the central result of convex analysis.
Fig. 4.5 Primal-dual approximation: $\operatorname{co}(\{a_1, \ldots, a_7\})$ and $H_1 \cap \cdots \cap H_6$
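The sandwich of Fig. 4.5 can be reproduced numerically. The following is a minimal sketch, assuming that $A$ is the closed unit disk in the plane (a choice made here for illustration, not taken from the text), that the points $a_1, \ldots, a_k$ are equally spaced on the unit circle, and that $H_j$ is the halfplane tangent to the disk at $a_j$. The function names are hypothetical. The areas of the inner and the outer polygon show how both approximations close in on $A$ as $k$ grows.

```python
import numpy as np

def directions(k, offset=0.0):
    """k unit vectors at equally spaced angles, starting at angle `offset`."""
    angles = offset + 2.0 * np.pi * np.arange(k) / k
    return np.column_stack([np.cos(angles), np.sin(angles)])

def in_inner(x, k):
    """Membership in co({a_1, ..., a_k}), the regular k-gon inscribed in the disk."""
    normals = directions(k, offset=np.pi / k)    # outward normals of the k edges
    return bool(np.all(normals @ x <= np.cos(np.pi / k) + 1e-12))

def in_disk(x):
    """Membership in the assumed set A, the closed unit disk."""
    return bool(x @ x <= 1.0 + 1e-12)

def in_outer(x, k):
    """Membership in H_1 ∩ ... ∩ H_k with H_j = {x : x . a_j <= 1}."""
    return bool(np.all(directions(k) @ x <= 1.0 + 1e-12))

def inner_area(k):
    """Area of the inscribed regular k-gon."""
    return 0.5 * k * np.sin(2.0 * np.pi / k)

def outer_area(k):
    """Area of the circumscribed regular k-gon."""
    return k * np.tan(np.pi / k)
```

For every $k \ge 3$ one has `inner_area(k) < np.pi < outer_area(k)`, and the gap shrinks as $k$ increases, in line with the statement that the two types of approximation coincide ‘in the limit’.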
Theorem 4.2.5 (Duality Theorem for Convex Sets) A closed convex set is the intersection of the closed halfspaces containing it.
Example 4.2.6 (Duality Theorem and the Shape of Beach Pebbles) There are two types of beach pebbles: sea pebbles and river pebbles. Sea pebbles usually have the shape of a bounded closed convex set in three-dimensional space $\mathbb{R}^3$. River pebbles usually do not have a convex shape. What might be the reason for this? Pebbles form gradually, sometimes over billions of years, as the water of the ocean or the river washes over rock particles. River current is gentler than ocean current. Therefore, the outside effect of ocean current on a pebble is that the pebble is exactly equal to its outside approximation, the intersection of the halfspaces containing it. However, the outside effect of river current on a pebble achieves in general only that it is roughly equal to its outside approximation. By the duality theorem for convex sets, Theorem 4.2.5, the sets in three-dimensional space that are equal to their outside approximation are precisely the closed convex sets. So the difference in shape between sea pebbles and river pebbles illustrates the duality theorem for convex sets.
To prove the duality theorem, one has to show that for each point $p$ outside $A$ there exists a hyperplane that separates $p$ and $A$, that is, such that the point $p$ and the set $A$ lie on different sides of it (where each side is considered to contain the hyperplane). There are two standard ways of proving this theorem: the proof by means of the shortest distance problem, which produces the hyperplane in one blow, and the standard proof of the Hahn–Banach theorem (for the finite dimensional case), which produces step by step a chain of affine subspaces $L_1 \subset L_2 \subset \cdots \subset L_{n-1}$ of $\mathbb{R}^n$ such that the dimension of $L_i$ is $i$ for all $i$ and such that, moreover, $L_{n-1}$ is the desired hyperplane (see Exercises 3 and 9, respectively).
The proof of the Hahn–Banach theorem is usually given in a course on functional analysis for the general, infinite dimensional, case. Here a novel proof is presented.
Example 4.2.7 (Ball-Throwing Proof of the Duality Theorem) Figure 4.6 illustrates this proof.
Fig. 4.6 Ball-throwing proof of the duality theorem
The idea of the proof can be explained as follows for $n = 3$. Throw a small ball with center $p$ outside $A$ in a straight line until it hits $A$, say at the point $q$. Then take the tangent plane at $q$ to the ball in its final position. Geometric intuition suggests that this plane separates the point $p$ and the set $A$. Now we give a more precise version of this proof. Figure 4.6 also illustrates this more precise version.
Proof The statement of the theorem holds for $A = \emptyset$ (all closed halfspaces contain $\emptyset$ and their intersection is $\emptyset$) and for $A = \mathbb{R}^n$ (the set of closed halfspaces containing $\mathbb{R}^n$ is the empty set and their intersection is $\mathbb{R}^n$; here we use the convention that the intersection of the empty set of subsets of a set $S$ is $S$). So assume that $A \ne \emptyset, \mathbb{R}^n$. Take an arbitrary $p \in \mathbb{R}^n \setminus A$. Choose $\varepsilon > 0$ such that the ball $B(p, \varepsilon)$ is disjoint from the closed convex set $A$. Choose $a \in A$. Define the point $p_t = (1 - t)p + ta$ for all $t \in [0, 1]$. So if $t$ runs from $0$ to $1$, then $p_t$ runs in a straight line from $p$ to $a$. Let $\bar t$ be the smallest value of $t$ for which $B(p_t, \varepsilon) \cap A \ne \emptyset$. This exists as $A$ is closed. We claim that $B(p_{\bar t}, \varepsilon)$ and $A$ have a unique common element $q$. Indeed, let $q$ be a common element. Then, for each $x \in B(p_{\bar t}, \varepsilon) \setminus \{q\}$ there exists $t < \bar t$ such that $x \in B(p_t, \varepsilon)$, and so, by the minimality property of $\bar t$, $x \notin A$. Now we use the fact that the open halflines that start at a boundary point of a given closed ball and run through another point of the ball form an open halfspace that is bounded by the hyperplane that is tangent to the ball at this boundary point. Apply this property to the ball $B(p_{\bar t}, \varepsilon)$ and the point $q$. This shows that the hyperplane that is tangent to the ball $B(p_{\bar t}, \varepsilon)$ at the point $q$, which has equation $(x - q) \cdot (p_{\bar t} - q) = 0$, separates the point $p$ and the set $A$.
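The steps of the ball-throwing proof can be followed numerically. The following is a minimal sketch, assuming that $A$ is the closed unit disk in the plane and that $p$, $a$ and $\varepsilon$ take the specific values below (all chosen here for illustration, not taken from the text). The value $\bar t$ is located by bisection, which is justified because the distance from $p_t$ to a convex set containing $a$ does not increase as $t$ runs from $0$ to $1$. The function names are hypothetical.

```python
import numpy as np

def dist_to_disk(x):
    """Distance from x to the assumed set A, the closed unit disk."""
    return max(np.linalg.norm(x) - 1.0, 0.0)

def project_to_disk(x):
    """Nearest point of the closed unit disk to x."""
    r = np.linalg.norm(x)
    return x if r <= 1.0 else x / r

def throw_ball(p, a, eps, iters=80):
    """Return (t_bar, q, normal): first hitting time, touching point, p_tbar - q."""
    lo, hi = 0.0, 1.0     # ball misses A at t = lo and meets A at t = hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dist_to_disk((1.0 - mid) * p + mid * a) > eps:
            lo = mid
        else:
            hi = mid
    centre = (1.0 - hi) * p + hi * a      # centre p_tbar of the ball when it hits A
    q = project_to_disk(centre)           # the common point of the ball and A
    return hi, q, centre - q

p = np.array([3.0, 1.0])       # point outside A
a = np.array([0.5, -0.5])      # point inside A
eps = 0.5                      # radius of the ball, smaller than dist(p, A)
t_bar, q, normal = throw_ball(p, a, eps)
# The separating hyperplane is {x : (x - q) . normal = 0}.
```

One can then check that `(p - q) @ normal` is positive, while `(x - q) @ normal` is nonpositive for every sampled point `x` of the disk.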
4.3 Other Versions of the Duality Theorem
The aim of this section is to give no fewer than six equivalent reformulations of the important Theorem 4.2.5. These are often used. We only give hints for the proofs of the equivalence.
4.3.1 The Supporting Hyperplane Theorem
We need a definition.
Definition 4.3.1 A hyperplane $H$ supports a nonempty closed convex set $A \subset \mathbb{R}^n$ at a point $\bar x \in A$ if $A$ lies entirely on one of the two sides of $H$ (it is allowed to have points on $H$) and if, moreover, the point $\bar x$ lies on $H$. Then at least one of the two closed halfspaces bounded by $H$ contains $A$. Such a halfspace is called a supporting halfspace of $A$.
Example 4.3.2 (Supporting Halfspace) In Fig. 4.7, a convex set $A$ in the plane $\mathbb{R}^2$ and a boundary point $\bar x$ of $A$ are drawn. A halfplane $H$ is drawn that contains $A$ entirely and that has the point $\bar x$ on its boundary line.
Theorem 4.3.3 (Supporting Hyperplane Theorem) Let $\bar x$ be a relative boundary point of a closed convex set $A \subseteq \mathbb{R}^n$. Then there exists a hyperplane that supports the set $A$ at the point $\bar x$.
Hint for the equivalence to the duality theorem 4.2.5: this result can be derived from Theorem 4.2.5 by choosing, for a relative boundary point $\bar x$ of $A$, a sequence of points outside the closure of $A$ that converges to $\bar x$. The other implication is easier: it suffices to use that a closed line segment for which one endpoint is a relative interior point of $A$ and the other one is not in $A$ contains a relative boundary point of $A$.
Fig. 4.7 Supporting halfspace
4.3.2 Separation Theorems
Definition 4.3.4 Two convex sets $A, B \subseteq X = \mathbb{R}^n$ can be separated if they lie on different sides of a hyperplane (they are allowed to have points in common with this hyperplane).
Example 4.3.5 (Separation of Two Convex Sets by a Hyperplane) In Fig. 4.8, both on the left and on the right, two convex sets $A$ and $B$ in the plane $\mathbb{R}^2$ are separated by a line. On the left, this line has no points in common with $A$ or $B$; on the right, this line has points in common with both $A$ and $B$, and moreover with their intersection $A \cap B$.
Fig. 4.8 Separating hyperplane
Theorem 4.3.6 (Separation Theorems) Let $A, B \subseteq X = \mathbb{R}^n$ be nonempty convex sets.
1. Assume that $A$ and $B$ have no relative interior point in common, $\operatorname{ri}(A) \cap \operatorname{ri}(B) = \emptyset$. Then $A$ and $B$ can be separated by a hyperplane.
2. Assume that $A$ and $B$ have no point in common, $A \cap B = \emptyset$, and, moreover, assume that one of $A$, $B$ is compact and the other one is closed. Then they can even be separated by two parallel hyperplanes, that is, there exist real numbers $\alpha_1, \ldots, \alpha_n$, not all zero, and two different real numbers $\beta < \gamma$ such that $\alpha_1 x_1 + \cdots + \alpha_n x_n \le \beta$ for all $x \in A$ and $\alpha_1 x_1 + \cdots + \alpha_n x_n \ge \gamma$ for all $x \in B$.
Hint for the equivalence to the duality theorem 4.2.5: separating nonempty convex sets $A$ and $B$ in $\mathbb{R}^n$ by a hyperplane is equivalent to finding a nonzero linear function $l : \mathbb{R}^n \to \mathbb{R}$ for which $l(a) \ge l(b)$ for all $a \in A$ and $b \in B$, that is, for which $l(a - b) \ge 0$ for all $a \in A$ and $b \in B$. This is equivalent to separating the convex set $A - B = \{a - b \mid a \in A, b \in B\}$ from the origin by a hyperplane.
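The reduction in the hint, from separating $A$ and $B$ to separating $A - B$ from the origin, can be illustrated for polytopes. The following is a minimal sketch, assuming that $A$ and $B$ are convex hulls of the finite point sets below (hypothetical data, not from the text). Since a linear function on a convex hull attains its extreme values at the generating points, it suffices to test these points. The function names are illustrative.

```python
import numpy as np

def separates(alpha, A, B):
    """True if l(x) = alpha . x satisfies l(a) >= l(b) for all listed a and b."""
    return bool((A @ alpha).min() >= (B @ alpha).max())

def difference_set(A, B):
    """All pairwise differences a - b; their convex hull is A - B."""
    return (A[:, None, :] - B[None, :, :]).reshape(-1, A.shape[1])

def separates_from_origin(alpha, D):
    """True if l(d) >= 0 for all listed points d, so that l separates D from 0."""
    return bool((D @ alpha).min() >= 0.0)

A = np.array([[2.0, 0.0], [3.0, 1.0], [3.0, -1.0]])   # generators of a triangle
B = np.array([[0.0, 0.0], [1.0, 1.0], [1.0, -1.0]])   # generators of a second triangle
D = difference_set(A, B)

good = np.array([1.0, 0.0])    # l(x) = x_1 separates A and B
bad = np.array([0.0, 1.0])     # l(x) = x_2 does not
```

For each choice of `alpha`, the two tests `separates(alpha, A, B)` and `separates_from_origin(alpha, D)` give the same answer, as the hint asserts.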
4.3.3 Theorem of Hahn–Banach
The following result is the finite dimensional case of an important result from functional analysis. We need some definitions.
Definition 4.3.7 A norm on a vector space $X$ is a function $N : X \to [0, \infty)$ having the following properties:
1. $N(x + y) \le N(x) + N(y)$ for all $x, y \in X$,
2. $N(\alpha x) = |\alpha| N(x)$ for all $\alpha \in \mathbb{R}$ and all $x \in X$,
3. $N(x) = 0 \iff x = 0_X$.
A normed space is a vector space equipped with a norm.
Example 4.3.8 (Norm) Let $n$ be a natural number. For each $p \in [1, +\infty]$, the function $\|\cdot\|_p : \mathbb{R}^n \to [0, +\infty)$ given by the recipe $x \mapsto (|x_1|^p + \cdots + |x_n|^p)^{1/p}$ for $p \ne +\infty$ and $x \mapsto \max(|x_1|, \ldots, |x_n|) = \lim_{q \to +\infty} \|x\|_q$ for $p = +\infty$ is a norm (the $p$-norm or $l_p$-norm). For $p = 2$ this gives the Euclidean norm. The most often used $l_p$-norms are the cases $p = 1, 2, +\infty$.
Definition 4.3.9 The norm of a linear function $\psi : X \to \mathbb{R}$, where $X$ is a finite dimensional normed vector space with norm $N$, is
$$N^*(\psi) = \max_{N(x) = 1} \psi(x).$$
The set of all linear functions $X \to \mathbb{R}$ on a finite dimensional normed vector space with norm $N$ forms a finite dimensional normed vector space with norm $N^*$. One calls $N^*$ the dual norm.
Example 4.3.10 (Dual Norm) Let $n$ be a natural number. If $p, q \in [1, +\infty]$ and $\frac{1}{p} + \frac{1}{q} = 1$ (with the convention $\frac{1}{+\infty} = 0$), then $\|\cdot\|_q$ is the dual norm of $\|\cdot\|_p$, if we identify each $v \in \mathbb{R}^n$ with the linear function $\mathbb{R}^n \to \mathbb{R}$ given by the recipe $x \mapsto v \cdot x$.
Now we are ready to give another reformulation of Theorem 4.2.5.
Theorem 4.3.11 (Hahn–Banach) Each linear function on a subspace of a finite dimensional normed space $X$ with norm $N$ can be extended to a linear function on $X$ having the same norm. That is, if $L$ is a subspace of $X$ and $\varphi : L \to \mathbb{R}$ is a linear function, then there is a linear function $\psi : X \to \mathbb{R}$ for which $\psi|_L = \varphi$ and $N^*(\psi) = N^*(\varphi)$.
Hint for the equivalence to the duality theorem 4.2.5: the statement can be viewed as separation of the epigraph of the norm $\|\cdot\|$ on $\mathbb{R}^n$ from the subspace $\operatorname{graph}(\varphi)$ by a hyperplane. We have already seen that separating two convex sets by a hyperplane is equivalent to the duality theorem 4.2.5.
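The duality between $\|\cdot\|_p$ and $\|\cdot\|_q$ of Example 4.3.10 can be checked numerically. The following is a minimal sketch for $1 < p < +\infty$; it uses the standard fact (the equality case of Hölder's inequality, not spelled out in the text) that the maximum of $x \mapsto v \cdot x$ over $\|x\|_p = 1$ is attained at a multiple of the vector with coordinates $\operatorname{sign}(v_i)|v_i|^{q-1}$. The function names are hypothetical.

```python
import numpy as np

def p_norm(x, p):
    """The l_p norm of x for p in [1, +inf]."""
    x = np.abs(np.asarray(x, dtype=float))
    return float(x.max()) if np.isinf(p) else float((x ** p).sum() ** (1.0 / p))

def conjugate_exponent(p):
    """The q with 1/p + 1/q = 1, using the convention 1/inf = 0."""
    if p == 1:
        return np.inf
    return 1.0 if np.isinf(p) else p / (p - 1.0)

def maximiser(v, p):
    """A point x with ||x||_p = 1 maximizing v . x (valid for 1 < p < inf, v != 0)."""
    q = conjugate_exponent(p)
    x = np.sign(v) * np.abs(v) ** (q - 1.0)
    return x / p_norm(x, p)

v = np.array([3.0, -4.0, 1.0])
x = maximiser(v, 3.0)
dual_norm_value = v @ x      # equals the l_q norm of v with q = 3/2
```

Comparing `dual_norm_value` with `p_norm(v, 1.5)` confirms that the maximum in Definition 4.3.9 is the $l_q$-norm of $v$ in this instance.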
4.3.4 Involution Property of the Polar Set Operator
Now we define an operator on convex sets $B \subseteq X = \mathbb{R}^n$ that contain the origin, the polar set operator. Note that the assumption that $B$ contains the origin does not restrict the generality as long as we consider one nonempty convex set at a time (which is what we are doing now): this assumption can be achieved by translation. If $A \subseteq X = \mathbb{R}^n$ is a nonempty convex set, then choose $\tilde a \in A$ and replace $A$ by $B = A - \tilde a = \{a - \tilde a \mid a \in A\}$ or, equivalently, move the origin to the point $\tilde a$.
Definition 4.3.12 The polar set of a convex set $B$ in $X = \mathbb{R}^n$ that contains the origin is defined to be the following closed convex set in $X$ that contains the origin:
$$B^\circ = \{y \in X \mid x \cdot y \le 1 \ \forall x \in B\}.$$
Note that for a nonzero $y \in X$ one has that $y \in B^\circ$ iff the solution set of $x \cdot y \le 1$ is a closed halfspace in $X$ that contains $B$ and for which the origin does not lie on the boundary of the halfspace. Therefore, the polar set operator is called a duality operator for convex sets containing the origin. Indeed, it turns the primal description of such a set (by points inside the set) into the dual description (by closed halfspaces containing the set that do not contain the origin on their boundary).
Example 4.3.13 (Polar Set Operator) Here are some illustrations of the polar set operator.
• The polar set of a subspace is its orthogonal complement.
• The polar set of the standard unit ball in $\mathbb{R}^n$ with respect to some norm $N$, $\{x \in \mathbb{R}^n \mid N(x) \le 1\}$, is the standard unit ball in $\mathbb{R}^n$ with respect to the dual norm $N^*$, $\{y \in \mathbb{R}^n \mid N^*(y) \le 1\}$.
• The polar set $P^\circ$ of a polygon $P$ in the plane $\mathbb{R}^2$ that contains the origin is again a polygon that contains the origin. The vertices of each of these two polygons correspond one-to-one to the edges of the other polygon. Indeed, for each vertex of $P$, the supporting halfplanes to $P$ that have the vertex on the boundary line form an edge of $P^\circ$. Moreover, for each edge of $P$, the unique supporting halfplane to $P$ that contains the edge on its boundary line is a vertex of $P^\circ$.
• The polar set $P^\circ$ of a polytope $P$ in three-dimensional space $\mathbb{R}^3$ that contains the origin is again a polytope that contains the origin. The vertices of each of these two polytopes correspond one-to-one to the faces of the other polytope; the edges of $P$ correspond one-to-one to the edges of $P^\circ$. Indeed, for each vertex of $P$, the supporting halfspaces to $P$ that have the vertex on the boundary plane form a face of $P^\circ$. Moreover, for each face of $P$, the unique supporting halfspace to $P$ that contains the face on its boundary plane is a vertex of $P^\circ$. Furthermore, for each edge of $P$, the supporting halfspaces to $P$ that have the edge on the boundary plane form an edge of $P^\circ$.
• The results above for polytopes in $\mathbb{R}^2$ and $\mathbb{R}^3$ can be extended to polytopes in $\mathbb{R}^n$ for general $n$.
• The polar set of a regular polytope that has its center at the origin is again a regular polytope that has its center at the origin:
– if $P$ is a regular polygon in $\mathbb{R}^2$ with $k$ vertices, then $P^\circ$ is a regular polygon with $k$ vertices;
– if $P$ is a regular simplex in $\mathbb{R}^n$, then $P^\circ$ is a regular simplex;
– if $P$ is a hypercube in $\mathbb{R}^n$, then $P^\circ$ is an orthoplex, and conversely;
– if $P$ is a dodecahedron in $\mathbb{R}^3$, then $P^\circ$ is an icosahedron, and conversely.
Definition 4.3.14 The bipolar set $B^{\circ\circ}$ of a convex set $B \subseteq X = \mathbb{R}^n$ containing the origin is the polar set of the polar set of $B$,
$$B^{\circ\circ} = (B^\circ)^\circ.$$
Now we are ready to give the promised reformulation of the duality theorem 4.2.5, in terms of the polar set operator.
Theorem 4.3.15 A closed convex set $B \subseteq X = \mathbb{R}^n$ that contains the origin is equal to its bipolar set, $B^{\circ\circ} = B$. In particular, the bipolar set operator on convex sets containing the origin is the same operator as the closure operator: $B^{\circ\circ} = \operatorname{cl}(B)$ for every convex set $B \subseteq X = \mathbb{R}^n$ that contains the origin.
Hint for the equivalence to the duality theorem 4.2.5. Elements of $B^\circ$ correspond to closed halfspaces containing $B$ that do not contain the origin on their boundary. So $B = B^{\circ\circ}$ is equivalent to the property that $B$ is the intersection of closed halfspaces.
An advantage of this reformulation is that it shows that there is a complete symmetry between the primal description and the dual description of a closed convex set that contains the origin. Indeed, both are given by a convex set ($B$ for the primal description and $\tilde B$ for the dual one), and these convex sets are each other's polar set: $\tilde B = B^\circ$ and $B = \tilde B^\circ$. So taking the dual description of the dual description brings you back to the primal description.
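The correspondence between edges of a polygon and vertices of its polar set in Example 4.3.13, and the involution property $B^{\circ\circ} = B$ of Theorem 4.3.15, can be checked on a small case. The following is a minimal sketch, assuming a convex polygon given by its vertices in counterclockwise order with the origin in its interior; the square used is an illustrative choice and the function names are hypothetical.

```python
import numpy as np

def polar_polygon(vertices):
    """Vertices of the polar set of a convex polygon with the origin in its interior.

    Each edge [v, w] lies on a line {x : a . x = 1}; this vector a is a vertex
    of the polar polygon.
    """
    V = np.asarray(vertices, dtype=float)
    W = np.roll(V, -1, axis=0)     # W[i] is the vertex that follows V[i]
    return np.array([np.linalg.solve(np.array([v, w]), np.ones(2))
                     for v, w in zip(V, W)])

def in_polar(vertices, y, tol=1e-12):
    """Direct check of the defining inequalities x . y <= 1 at the vertices x."""
    return bool(np.all(np.asarray(vertices, dtype=float) @ y <= 1.0 + tol))

square = np.array([[1.0, 1.0], [-1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]])
diamond = polar_polygon(square)      # vertices (0,1), (-1,0), (0,-1), (1,0)
bipolar = polar_polygon(diamond)     # the square again, up to a cyclic shift
```

The square is a hypercube in $\mathbb{R}^2$ and the resulting diamond is the corresponding orthoplex, in agreement with the list of regular polytopes above.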
4.3.5 Nontriviality Polar Cone
Now we define a duality operator on convex cones $C \subseteq \mathbb{R}^n$.
Definition 4.3.16 The polar cone of a convex cone $C$ in $\mathbb{R}^n$ is defined to be the following closed convex cone in $\mathbb{R}^n$:
$$C^\circ = \{y \in \mathbb{R}^n \mid x \cdot y \le 0 \ \forall x \in C\}.$$
This operator is called a duality operator for convex cones; it turns the primal description of a closed convex cone (by its rays) into the dual description (by the halfspaces containing the convex cone that have the origin on their boundary: for each nonzero vector $y \in C^\circ$, the set of solutions $x$ of the inequality $x \cdot y \le 0$ is such a halfspace).
Fig. 4.9 Polar cone
Warning The terminologies polar cone and polar set are similar and the notations are even the same, but these operators are defined by different formulas. However, these operators agree for convex cones.
Example 4.3.17 (Polar Cone Operator)
1. Figure 4.9 illustrates the polar cone operator for convex cones in the plane and in three-dimensional space. Note the orthogonality property in both pictures: for each ray on the boundary of a closed convex cone $C$, there is a unique ray of $C^\circ$ for which the angle is minimal; this minimal angle is always $\frac{1}{2}\pi$.
2. The polar cone of a convex cone is the polar set of that convex cone.
3. For each one of the three ‘golden cones’ $C$ (the first orthant, the Lorentz cone and the positive semidefinite cone) one has $C^\circ = -C = \{-c \mid c \in C\}$.
Sometimes one defines, for a convex cone $C \subseteq \mathbb{R}^n$, the dual cone
$$C^* = \{y \in \mathbb{R}^n \mid x \cdot y \ge 0 \ \forall x \in C\} = -C^\circ,$$
and a convex cone $C$ is called self-dual if $C^* = C$. Then one can say that the three golden cones are self-dual.
One gets the following reformulation of the duality theorem 4.2.5 in terms of convex cones.
Definition 4.3.18 The bipolar cone $C^{\circ\circ}$ of a convex cone $C \subseteq X = \mathbb{R}^n$ containing the origin is the polar cone of the polar cone of $C$,
$$C^{\circ\circ} = (C^\circ)^\circ.$$
Theorem 4.3.19 A closed nonempty convex cone $C \subseteq X = \mathbb{R}^n$ is equal to its bipolar cone, $C^{\circ\circ} = C$. In particular, the bipolar cone operator on nonempty convex cones is the same operator as the closure operator: $C^{\circ\circ} = \operatorname{cl}(C)$.
Hint for the equivalence to the duality theorem 4.2.5. One should use the construction of the polar set operator by the homogenization method, which will be given later in this chapter.
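For a convex cone generated by finitely many vectors, membership in the polar cone and in the dual cone can be tested on the generators alone, because the inequality $x \cdot y \le 0$ is preserved under nonnegative combinations of the vectors $x$. The following is a minimal sketch, with the first orthant in $\mathbb{R}^3$ as a test case for the relation $C^\circ = -C$ of Example 4.3.17; the function names are hypothetical.

```python
import numpy as np

def in_polar_cone(generators, y, tol=1e-12):
    """y is in C° iff x . y <= 0 for every generator x of the convex cone C."""
    G = np.asarray(generators, dtype=float)
    return bool(np.all(G @ np.asarray(y, dtype=float) <= tol))

def in_dual_cone(generators, y, tol=1e-12):
    """y is in C* = -C° iff x . y >= 0 for every generator x of C."""
    return in_polar_cone(generators, -np.asarray(y, dtype=float), tol)

orthant = np.eye(3)    # generators e_1, e_2, e_3 of the first orthant in R^3

inside_polar = in_polar_cone(orthant, [-1.0, -2.0, 0.0])     # a point of -C
outside_polar = in_polar_cone(orthant, [-1.0, 0.5, -1.0])    # has a positive coordinate
self_dual_check = in_dual_cone(orthant, [1.0, 0.0, 2.0])     # a point of C
```

For the first orthant, a vector lies in the polar cone exactly when all its coordinates are nonpositive, and in the dual cone exactly when all its coordinates are nonnegative, which illustrates that this golden cone is self-dual.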
In fact, one can moreover give the following weak looking reformulation of the duality theorem 4.2.5 in terms of convex cones. This weak looking reformulation represents in a sense the essence of the duality of convex sets, and so the essence of convex analysis.
Theorem 4.3.20 The polar cone of a convex cone $C \subsetneq \mathbb{R}^n$ (so $C \ne \mathbb{R}^n$) that contains the origin is nontrivial: that is, $C^\circ \ne \{0_n\}$.
Hint for the equivalence to Theorem 4.3.19 (and so to the duality theorem 4.2.5). In order to derive from the weak looking Theorem 4.3.20 the stronger looking Theorem 4.3.19, one should use that separating a point $p$ outside $C$ by a hyperplane from $C$ is equivalent to separating the origin by a hyperplane from the convex cone $\operatorname{cone}(C, -p) = \{c - \alpha p \mid c \in C, \alpha \ge 0\}$. This is the same as producing a nonzero element of $(\operatorname{cone}(C, -p))^\circ$.
Example 4.3.21 (Throwing the Ball in Any Curved Line to Prove Theorem 4.3.20) The ball-throwing proof of Theorem 4.2.5 that has been given can be simplified if you want to prove the weak looking Theorem 4.3.20: it turns out that it is not necessary to throw the ball in a straight line. It can be thrown along any curved line that will hit the convex cone eventually. Figure 4.10 illustrates this simpler proof. The reason that you have to throw the ball in a straight line if you want to prove Theorem 4.2.5, but that you are allowed to throw the ball along any curved line if you want to prove Theorem 4.3.20, is as follows. In the first case, you have to make sure that the point $p$ from where you throw the ball and the given convex set $A$ lie on different sides of the plane that you get at the point where the ball hits the convex set; this is only guaranteed if you throw in a straight line. Otherwise, although $A$ will lie entirely on one side of the plane, this point $p$ might lie on the same side. In the second case, it suffices to have any plane for which the given convex cone $C$ lies entirely on one side of it. Therefore, it is possible to throw the ball along any curved line.
Fig. 4.10 Ball-throwing proof for the weakest looking duality result
4.4 The Coordinate-Free Polar Cone
The aim of this section is to give a coordinate-free construction of the polar cone operator. This will be convenient for giving an equivalent definition for the polar set operator, this time in a systematic way, by homogenization (rather than by an explicit formula, as we have done before).
Definition 4.4.1 Let $X, Y$ be two finite dimensional vector spaces. A mapping $b : X \times Y \to \mathbb{R}$ is bilinear if it is linear in both arguments, that is,
$$b(\alpha x + \beta y, z) = \alpha b(x, z) + \beta b(y, z) \quad \text{for all } \alpha, \beta \in \mathbb{R},\ x, y \in X,\ z \in Y,$$
$$b(x, \alpha y + \beta z) = \alpha b(x, y) + \beta b(x, z) \quad \text{for all } \alpha, \beta \in \mathbb{R},\ x \in X,\ y, z \in Y.$$
Definition 4.4.2 Let $X, Y$ be two finite dimensional vector spaces and $b : X \times Y \to \mathbb{R}$ a bilinear mapping. Then $b$ is nondegenerate if for each nonzero $x \in X$ the linear function on $Y$ given by the recipe $y \mapsto b(x, y)$ is not the zero function, and if moreover for each nonzero $y \in Y$ the linear function on $X$ given by the recipe $x \mapsto b(x, y)$ is not the zero function.
Definition 4.4.3 A pair of finite dimensional vector spaces $X, Y$ is said to be in duality if a nondegenerate bilinear mapping $b : X \times Y \to \mathbb{R}$ is given.
Proposition 4.4.4 (A Standard Pair of Vector Spaces in Duality) Let $X = \mathbb{R}^n$ and $Y = \mathbb{R}^n$. The mapping $b : X \times Y \to \mathbb{R}$ given by the recipe $(x, y) \mapsto x \cdot y$, that is, the standard inner product, called the dot product, makes $X$ and $Y$ into two vector spaces in duality.
We only work with finite dimensional vector spaces and for them the standard pair of vector spaces in duality given above is essentially the only example.
Remark The reasons for introducing a second space $Y$, and not just taking an inner product on $X$, are as follows. For infinite dimensional vector spaces $X$ of interest, there is often no natural generalization of the concept of inner product. Then the best one can do is to find a different space $Y$ and a nondegenerate bilinear mapping $b : X \times Y \to \mathbb{R}$. Even for finite dimensional spaces, it is often more natural to introduce a second space, although it has the same dimension as $X$, so it could be taken equal to $X$. For example, if one considers consumption vectors as column vectors $x$, then one models price vectors as row vectors $p$. However, for our purpose, the really compelling reason to introduce $Y$ and a bilinear map $b : X \times Y \to \mathbb{R}$ is that we will need other $b$ than the inner product.
Example 4.4.5 (Useful Nonstandard Bilinear Mappings) Here are the two nonstandard pairs of spaces in duality that we will need.
1. Let $X = \mathbb{R}^{n+1}$ and $Y = \mathbb{R}^{n+1}$. The mapping $b : X \times Y \to \mathbb{R}$ given by the recipe $((x, \alpha), (y, \beta)) \mapsto x \cdot y - \alpha\beta$ makes $X$ and $Y$ into two vector spaces in duality. Note the minus sign.
2. Let $X = \mathbb{R}^{n+2}$ and $Y = \mathbb{R}^{n+2}$. The mapping $b : X \times Y \to \mathbb{R}$ given by the recipe $((x, \alpha, \alpha'), (y, \beta, \beta')) \mapsto x \cdot y - \alpha\beta' - \alpha'\beta$ makes $X$ and $Y$ into two vector spaces in duality. Note the minus signs and the transposition.
Now we are ready to give the definition of the polar cone.
Definition 4.4.6 Let $X$ and $Y$ be two finite dimensional vector spaces in duality with underlying mapping $b$. The polar cone $C^\circ$ of a convex cone $C$ in $X$ with respect to $b$ is the following closed convex cone in $Y$:
$$C^\circ = \{y \in Y \mid b(x, y) \le 0 \ \forall x \in C\}.$$
We have mentioned that for each nondegenerate bilinear mapping $b : X \times Y \to \mathbb{R}$ we can choose bases of $X$ and $Y$ such that, after going over to coordinate vectors, this mapping gives the dot product on $\mathbb{R}^n$. Then we get the usual coordinate definition of the polar cone that was given in the previous section.
Example 4.4.7 (Polar Cone) Figure 4.11 illustrates the polar cone operator with respect to the dot product.
Example 4.4.8 (Dual Norms and Polar Cones) The dual norm is related to the polar cone operator. Indeed, for each finite dimensional normed space, the epigraphs of the norm $N$ and the dual norm $N^*$,
$$\operatorname{epi}(N) = \{(x, \rho) \mid x \in X, \rho \in \mathbb{R}, \rho \ge N(x)\} \quad \text{and} \quad \operatorname{epi}(N^*) = \{(x, \rho) \mid x \in X, \rho \in \mathbb{R}, \rho \ge N^*(x)\},$$
are convex cones that are each other's polar cone with respect to the bilinear pairing defined by the recipe $\langle (x, \alpha), (y, \beta) \rangle = x \cdot y - \alpha\beta$.
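The first nonstandard bilinear mapping of Example 4.4.5 and the statement of Example 4.4.8 can be explored numerically. The following is a minimal sketch, assuming the norm $N = \|\cdot\|_1$ on $\mathbb{R}^3$ with dual norm $N^* = \|\cdot\|_\infty$ (a choice made here for illustration). It represents the mapping by a matrix, so that nondegeneracy amounts to invertibility of that matrix, and it samples points of the two epigraphs to check one half of the claim, namely that every point of $\operatorname{epi}(N^*)$ satisfies the inequalities defining the polar cone of $\operatorname{epi}(N)$. The function names are hypothetical.

```python
import numpy as np

def b(u, v):
    """b((x, alpha), (y, beta)) = x . y - alpha * beta; last coordinate is alpha or beta."""
    return float(u[:-1] @ v[:-1] - u[-1] * v[-1])

def gram_matrix(n):
    """Matrix G with b(u, v) = u^T G v; b is nondegenerate iff G is invertible."""
    return np.diag(np.append(np.ones(n), -1.0))

rng = np.random.default_rng(0)
n = 3
worst = -np.inf
for _ in range(1000):
    x, y = rng.normal(size=n), rng.normal(size=n)
    u = np.append(x, np.abs(x).sum() + rng.random())    # a point of epi(l_1 norm)
    v = np.append(y, np.abs(y).max() + rng.random())    # a point of epi(l_inf norm)
    worst = max(worst, b(u, v))
# worst stays <= 0, since x . y <= ||x||_1 * ||y||_inf
```

The sampling does not prove the reverse inclusion; it only illustrates why the minus sign in the mapping is the right choice for pairing the two epigraphs.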
Fig. 4.11 Polar cone with respect to the dot product
4.5 Polar Set and Homogenization
The aim of this section is to define the polar set operator by homogenization. This definition will make it easier to establish properties of the polar set operator. Here is a detail that needs some attention before we start: the homogenization $c(B)$ of a convex set $B \subseteq X = \mathbb{R}^n$ that contains the origin is contained in the space $X \times \mathbb{R}$. For defining the polar cone $(c(B))^\circ$, we have to define a suitable nondegenerate bilinear mapping on the space $X \times \mathbb{R}$. We take the following nondegenerate bilinear mapping $b$:
$$b((x, \alpha), (y, \beta)) = x \cdot y - \alpha\beta \qquad (*)$$
for all $(x, \alpha), (y, \beta) \in X \times \mathbb{R}$. Note the minus sign! The following result gives a nice property of a convex set that contains the origin: its dual object is again a convex set containing the origin.
Proposition 4.5.1 Let $X = \mathbb{R}^n$.
1. A convex cone $C \subseteq X \times \mathbb{R}$ containing the origin is a homogenization of a convex set that contains the origin iff $\mathbb{R}_+(0_X, 1) \subseteq C \subseteq X \times \mathbb{R}_+$.
2. If a convex cone $C \subseteq X \times \mathbb{R}$ containing the origin is a homogenization of a convex set containing the origin, then the polar cone $C^\circ$ with respect to the bilinear mapping $b$ defined by $(*)$ is also a homogenization of a convex set containing the origin.
Proof The first statement holds by definition of homogenization of a convex set. The second statement follows from the fact that the convex cones $\mathbb{R}_+(0_X, 1)$ and $X \times \mathbb{R}_+$ are each other's polar cone with respect to the bilinear mapping $b$. Therefore, we get that $\mathbb{R}_+(0_X, 1) \subseteq \operatorname{cl}(C) \subseteq X \times \mathbb{R}_+$ implies $(\mathbb{R}_+(0_X, 1))^\circ \supseteq C^\circ \supseteq (X \times \mathbb{R}_+)^\circ$, and so it gives $X \times \mathbb{R}_+ \supseteq \operatorname{cl}(C^\circ) \supseteq \mathbb{R}_+(0_X, 1)$, as required.
Note that the second property of Proposition 4.5.1 is not true for arbitrary convex sets. In Chap. 6, it is explained how the duality of arbitrary convex sets can be described without translating them first to convex sets that contain the origin. This will be necessary, as for applications in convex optimization, we will have to deal simultaneously with two convex sets that do have a common point. Now we are ready to construct the polar set operator by the homogenization method.
Proposition 4.5.2 The duality operator on convex sets containing the origin that one gets by the homogenization method (first conify, then apply the polar cone operator, finally deconify) is equal to the polar set operator.
Proof Here are the three steps of the construction of the duality operator on convex sets containing the origin that one gets by the homogenization method.
1. Homogenize. Take $\operatorname{cl}(c(B))$, the closure of the homogenization of a given convex set $B \subseteq X = \mathbb{R}^n$ containing the origin. Choose a conification $C$ of $B$. Note that $C$ is a convex cone $C \subseteq X \times \mathbb{R}$ for which $\mathbb{R}_+(0_X, 1) \subseteq C \subseteq X \times \mathbb{R}_+$.
2. Work with convex cones. Take the polar cone with respect to the bilinear mapping $b$ defined above, that is, $\{(y, \beta) \in X \times \mathbb{R} \mid b((x, \alpha), (y, \beta)) \le 0 \ \forall (x, \alpha) \in C\}$. This gives $(\mathbb{R}_+(0_X, 1))^\circ \supseteq C^\circ \supseteq (X \times \mathbb{R}_+)^\circ$. As the convex cones $\mathbb{R}_+(0_X, 1)$ and $X \times \mathbb{R}_+$ are each other's polar cone with respect to this bilinear pairing, it follows that $X \times \mathbb{R}_+ \supseteq C^\circ \supseteq \mathbb{R}_+(0_X, 1)$. So $C^\circ$ is a homogenization of a convex set in $X$ that contains the origin. Moreover, as $C^\circ$ is closed, this convex set is closed.
3. Dehomogenize. Take the dehomogenization of the convex cone $C^\circ$ that is the outcome of the previous step. This gives a closed convex set in $X$ that contains the origin.
A calculation, which we do not display, shows that the polar set operator defined by the homogenization method is equal to the usual polar set operator.
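The omitted calculation can be sketched as follows. This sketch assumes that, up to closure, the conification consists of the points $\rho(x, 1)$ with $x \in B$ and $\rho > 0$, and it uses that an inequality $b(\cdot, (y, \beta)) \le 0$ that holds on a set also holds on its closure; it is offered as one possible way to carry out the calculation, not necessarily the one intended by the text.

```latex
% Polar cone of C = cl(c(B)) with respect to b((x,alpha),(y,beta)) = x.y - alpha*beta:
\begin{align*}
(y,\beta) \in C^\circ
  &\iff b\bigl(\rho(x,1),(y,\beta)\bigr) \le 0
        \quad \forall x \in B,\ \forall \rho > 0 \\
  &\iff \rho\,(x \cdot y - \beta) \le 0
        \quad \forall x \in B,\ \forall \rho > 0 \\
  &\iff x \cdot y \le \beta
        \quad \forall x \in B .
\end{align*}
% Dehomogenize: take the points of C° at height beta = 1.
\[
\{\, y \in X \mid (y,1) \in C^\circ \,\}
  = \{\, y \in X \mid x \cdot y \le 1 \ \ \forall x \in B \,\}
  = B^\circ .
\]
```

The minus sign in the bilinear mapping $(*)$ is what turns the homogeneous inequality $b \le 0$ into the inequality $x \cdot y \le 1$ of the explicit definition of the polar set.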
Remark This is another example where a formula from convex analysis—here the defining formula for the polar set operation—is generated by the homogenization method. Now it follows that the duality result for closed convex cones C that contain the origin, C ◦◦ = C, implies the duality result for closed convex sets B that contain the origin, B ◦◦ = B.
Fig. 4.12 Geometric construction of the polar set
Example 4.5.3 (Geometric Construction of the Polar Set Operator) The construction of the polar set operator by the homogenization method gives a geometric construction of the polar set operator. Figure 4.12 illustrates this. Indeed, consider a closed convex set $B$ in the plane $\mathbb{R}^2$ that contains the origin. Take two closed convex cones in three-dimensional space $\mathbb{R}^3$ that are each other's polar cone, taken with respect to the standard inner product, one of which is the homogenization of $B$. Note that this is easy to visualize because of ‘the right angle between the convex cones’. Now intersect the upper cone with the horizontal plane at height $1$ and intersect the lower cone with the horizontal plane at height $-1$. Take the orthogonal projections of the two intersections on the horizontal coordinate plane. Then you get two closed convex sets containing the origin that are each other's polar set.
4.6 Calculus Rules for the Polar Set Operator
For every type of convex object, there are calculus rules. These can be used to compute the duality operator for this type of object. Here we give these calculus rules for convex sets $B$ containing the origin. Here the duality operator is the polar set operator. We already have the rule $B^{\circ\circ} = B$ if $B$ is closed. Now we will also give rules expressing $(B_1 \circ B_2)^\circ$, for a binary operation $\circ$ (sum, intersection, convex hull of the union, or inverse sum), in terms of $B_1^\circ$ and $B_2^\circ$, and rules expressing $(MB)^\circ$ and $(BN)^\circ$ in terms of $B^\circ$ for linear functions $M$ and $N$.
We will derive the calculus rules for convex sets by using the homogenization method. In order to do this, we need the calculus rules for convex cones $C$ to begin with. Therefore, we prepare by giving these rules. We already have the rule $C^{\circ\circ} = C$ if $C$ is closed and contains the origin. We will halve the number of formulas needed by the introduction of a handy shorthand notation. We will use for two convex cones $C, \tilde C \subseteq X = \mathbb{R}^n$ that are in duality, in the sense that
$$C^\circ = \tilde C, \qquad \tilde C^\circ = \operatorname{cl}(C),$$
the following notation:
$$C \overset{d}{\longleftrightarrow} \tilde C.$$
Note that this implies that $\tilde C$ is closed; however, $C$ does not have to be closed. It also implies that $C$ and $\tilde C$ both contain the origin.
Proposition 4.6.1 (Calculus Rules for Binary Operations on Convex Cones) Let $C, \tilde C, D, \tilde D$ be closed convex cones in $X = \mathbb{R}^n$ for which $C \overset{d}{\longleftrightarrow} \tilde C$ and $D \overset{d}{\longleftrightarrow} \tilde D$. Then
$$C + D \overset{d}{\longleftrightarrow} \tilde C \cap \tilde D.$$
The formula in Proposition 4.6.1 is a consequence of the following result, taking into account that $+_X$ and $\Delta_X$ are each other's transpose by Proposition 2.2.1. In this result, we look at what happens to convex cones in duality under the two actions by linear functions: image and inverse image.
Proposition 4.6.2 (Calculus Rules for the Two Actions of Linear Functions on Convex Cones) Let $M : \mathbb{R}^n \to \mathbb{R}^m$ be a linear function, and $C, \tilde C \subseteq \mathbb{R}^n$ convex cones that are in duality, $C \overset{d}{\longleftrightarrow} \tilde C$. Then
$$MC \overset{d}{\longleftrightarrow} \tilde C M^\top.$$
This result follows from the duality theorem for convex cones. Under suitable assumptions, the cone $MC$ in Proposition 4.6.2 is closed (see Exercise 42).
Now we are ready to present the calculus rules for convex sets containing the origin. Again we use similar shorthand notation. We will use for two convex sets $B, \tilde B \subseteq X = \mathbb{R}^n$ containing the origin that are in duality, in the sense that
$$B^\circ = \tilde B, \qquad \tilde B^\circ = \operatorname{cl}(B),$$
the following notation:
$$B \overset{d}{\longleftrightarrow} \tilde B.$$
Note that this implies that $\tilde B$ is closed; however, $B$ does not have to be closed.
Proposition 4.6.3 (Calculus Rules for Binary Operations on Convex Sets Containing the Origin) Let $A, \tilde A, B, \tilde B$ be convex sets in $X = \mathbb{R}^n$ containing the origin, for which $A \overset{d}{\longleftrightarrow} \tilde A$ and $B \overset{d}{\longleftrightarrow} \tilde B$. Then
$$A \operatorname{co}\!\cup B \overset{d}{\longleftrightarrow} \tilde A \cap \tilde B,$$
$$A + B \overset{d}{\longleftrightarrow} \tilde A \,\#\, \tilde B.$$
This result can be proved by the homogenization method, using the systematic definition of the binary operations for convex sets. Now we give the calculus rules for the two actions of linear functions (image and inverse image).
Proposition 4.6.4 (Calculus Rules for the Two Actions of Linear Functions on Convex Sets Containing the Origin) Let $M : \mathbb{R}^n \to \mathbb{R}^m$ be a linear function, and let $B, \tilde{B} \subset \mathbb{R}^n$ be convex sets containing the origin for which $B \overset{d}{\longleftrightarrow} \tilde{B}$. Then
$$MB \overset{d}{\longleftrightarrow} \tilde{B}M^\top.$$
This result follows by the conification method from the corresponding result for convex cones. Again one ‘can omit the closure operator’ in the formulas above under suitable assumptions. We do not display these conditions.
Example 4.6.5 (Calculus Rules for Bounded Convex Sets Containing the Origin) The calculus rules above simplify if one restricts attention to the following type of convex object: bounded convex sets that contain the origin in their interior. The part of the assumption that requires the convex sets to contain the origin in their interior is essentially no restriction of generality: it can be achieved by a translation that brings a relative interior point of the convex set to the origin, followed by restricting the entire space to the linear span of the translated copy of the convex set. Note that the polar set of such a set is of the same type, and so the polar set operator preserves the type. Indeed, these convex sets are deconifications of convex cones $C$ for which
$$\operatorname{epi}(R\|\cdot\|) \subseteq \operatorname{cl}(C) \subseteq \operatorname{epi}(r\|\cdot\|)$$
for some numbers $R > r > 0$, where $\|\cdot\|$ is the Euclidean norm. If you apply the polar cone operator with respect to $b$, then you get
$$(\operatorname{epi}(R\|\cdot\|))^\circ \supseteq C^\circ \supseteq (\operatorname{epi}(r\|\cdot\|))^\circ.$$
As $\operatorname{epi}(\|\cdot\|)$ is equal to its polar cone with respect to $b$, you get
$$\operatorname{epi}(R^{-1}\|\cdot\|) \supseteq \operatorname{cl}(C^\circ) \supseteq \operatorname{epi}(r^{-1}\|\cdot\|).$$
The simplification in the calculus rules for bounded sets containing the origin in their interior is that the closure operations in the calculus formulas can be omitted. This follows from the fact that the image of a compact set under a continuous function is again compact.
Conclusion The unified definition of the four classic binary operations for convex sets containing the origin, by means of the homogenization method, makes it possible to prove their calculus formulas in one blow. They are a consequence of the calculus formulas for the two actions on convex cones by linear functions: image and inverse image.
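The rule of Proposition 4.6.4 can be tried out on a small polytope. The following is a minimal sketch, not the book's method: it assumes that $\tilde{B}M^\top$ denotes the inverse image $\{y \mid M^\top y \in \tilde{B}\}$, and it uses the elementary fact that the polar set of the convex hull of the origin and finitely many points $g_1, \ldots, g_k$ is $\{y \mid g_i \cdot y \le 1 \text{ for all } i\}$. The generators `B_GENS`, the matrix `M` and all function names are hypothetical choices made for illustration.

```python
from fractions import Fraction as F
import random


def dot(u, v):
    """Dot product of two vectors given as sequences."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(M, x):
    """Matrix-vector product, with M given as a list of rows."""
    return [dot(row, x) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def in_polar(generators, y):
    """Is y in the polar set of conv({0} and the generators)?"""
    return all(dot(g, y) <= 1 for g in generators)


# Hypothetical data: B = conv{0, g1, g2, g3} in R^3 and a linear map M: R^3 -> R^2.
B_GENS = [[F(1), F(0), F(2)], [F(0), F(3), F(-1)], [F(-2), F(1), F(1, 2)]]
M = [[F(1), F(2), F(0)],
     [F(0), F(-1), F(1)]]
MB_GENS = [mat_vec(M, g) for g in B_GENS]  # generators of the image MB
M_T = transpose(M)


def rule_holds(y):
    """Check that y is in (MB)° exactly when M^T y is in B°."""
    return in_polar(MB_GENS, y) == in_polar(B_GENS, mat_vec(M_T, y))


# Exact rational sample points, so no rounding issues arise at the boundary.
random.seed(0)
SAMPLES = [[F(random.randint(-300, 300), 100) for _ in range(2)]
           for _ in range(500)]
ALL_AGREE = all(rule_holds(y) for y in SAMPLES)
print(ALL_AGREE)
```

Here $B$ is compact, so $MB$ is closed and no closure operator is needed; the check only illustrates the formula on sample points and is no substitute for the proof.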
4.7 Duality for Polyhedral Sets
The aim of this section is to present the duality theory for polyhedral sets. Polyhedral sets are the convex sets par excellence. They are the convex sets that occur in Linear Programming. Each convex set can be approximated as closely as desired by polyhedral sets. The theorems of convexity are essentially simpler and stronger for polyhedral sets than for arbitrary convex sets. Polyhedral sets have an interesting specific structure: vertices, edges, facets. Some properties of the duality for polyhedral sets have already been given in Example 4.3.13. A closed convex cone in $X \times \mathbb{R}_+$, where $X = \mathbb{R}^n$, has as deconification a polyhedral set iff it is a polyhedral cone.
We begin with the representation theorem for polyhedral sets $P$, which is the central result. We give a version of the representation theorem where the representation only depends on one choice, that of a basis for the lineality space $L_P$ of the polyhedral set $P$. Note that the definition of a polyhedral set, as the intersection of a finite collection of closed halfspaces, gives a dual description of polyhedral sets. The representation theorem gives a primal description as well.
Theorem 4.7.1 (Minkowski–Weyl Representation) Let $P \subseteq X = \mathbb{R}^n$ be a polyhedral set. Consider the following sets: $S$ is the set of extreme points of $P \cap L_P^\perp$, $T$ is a set of representatives of the extreme rays of $P \cap L_P^\perp$, and $U$ is a basis of $L_P$. Then each point of $P$ can be represented in the form
$$\sum_{s \in S} \alpha_s s + \sum_{t \in T} \beta_t t + \sum_{u \in U} \gamma_u u,$$
where $\alpha_s \in \mathbb{R}_+$ for all $s \in S$, $\sum_{s \in S} \alpha_s = 1$, $\beta_t \in \mathbb{R}_+$ for all $t \in T$, and $\gamma_u \in \mathbb{R}$ for all $u \in U$. Conversely, each set that is the Minkowski sum of the convex hull of a finite set and the conic hull of a finite set is a polyhedral set.
Example 4.7.2 (Minkowski–Weyl Representation) Figure 4.13 illustrates the Minkowski–Weyl representation for four different polyhedral sets $P$ in the plane $\mathbb{R}^2$. From left to right:
• a bounded $P$ is the convex hull of its vertices (extreme points),
Fig. 4.13 Minkowski–Weyl representation
• an angle is the conic hull of its two legs (extreme recession directions),
• an unbounded $P$ that contains no line is the Minkowski sum of the convex hull of its extreme points and the conic hull of its extreme recession directions,
• a general polyhedral set $P$ is the Minkowski sum of a polyhedral set that contains no line and a subspace.
Note that this result implies that a bounded convex set is the convex hull of a finite set iff it is the intersection of a finite collection of closed halfspaces. The proof of Theorem 4.7.1 is left as an exercise. It can be proved from first principles by the homogenization method (recommended!) or it can be derived from the theorem of Krein–Milman for arbitrary (possibly unbounded) convex sets that we have given. It follows that the polar set of a polyhedral set that contains the origin is again a polyhedral set and that the polar cone of a polyhedral cone is again a polyhedral cone.
We use again the notation $P \overset{d}{\longleftrightarrow} \tilde{P}$ for polyhedral sets in duality. Note that now there is no closure operation in its definition: it just means $P^\circ = \tilde{P}$, $\tilde{P}^\circ = P$. So there is complete symmetry between $P$ and $\tilde{P}$. This is the advantage of working with polyhedral sets. If we take this into account, then the theorems for general convex sets specialize to the following results for polyhedral sets.
Theorem 4.7.3 (Theorem on Polyhedral Sets) The following statements hold:
1. The polar set of a polyhedral set is again a polyhedral set.
2. The image and inverse image of a polyhedral set under a linear transformation are again polyhedral sets.
3. The polyhedral property is preserved by the standard binary operations for convex sets: sum, intersection, convex hull of the union, inverse sum.
4. For polyhedral sets $P, Q \subset \mathbb{R}^n$ containing the origin, the following formulas hold:
$$P + Q \overset{d}{\longleftrightarrow} P^\circ \,\#\, Q^\circ,$$
$$P \ \mathrm{co}\cup\ Q \overset{d}{\longleftrightarrow} P^\circ \cap Q^\circ.$$
5. For polyhedral sets $P, \tilde{P} \subset \mathbb{R}^n$ containing the origin for which $P \overset{d}{\longleftrightarrow} \tilde{P}$ and $M \in \mathbb{R}^{m \times n}$, the following formula holds:
$$MP \overset{d}{\longleftrightarrow} \tilde{P}M^\top.$$
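The Minkowski–Weyl representation of Theorem 4.7.1 can be made concrete on a small example. The sketch below assumes a hypothetical polyhedral set $P = \{(x, y) \mid x \ge 0,\ y \ge 0,\ x + y \ge 1\}$ in the plane; it contains no line, so $L_P = \{0\}$ and $U$ is empty. Its extreme points are $(1, 0)$ and $(0, 1)$, and its extreme rays are spanned by the same two vectors. The function names `combine` and `decompose` are illustrative only, and `decompose` uses a shortcut specific to this $P$, not a general algorithm.

```python
from fractions import Fraction as F

S = [(F(1), F(0)), (F(0), F(1))]  # extreme points of P
T = [(F(1), F(0)), (F(0), F(1))]  # representatives of the extreme rays of P


def in_P(pt):
    """Dual description: P as an intersection of three closed halfspaces."""
    x, y = pt
    return x >= 0 and y >= 0 and x + y >= 1


def combine(alpha, beta):
    """Primal description: convex combination of S plus conic combination of T."""
    if any(a < 0 for a in alpha) or sum(alpha) != 1 or any(b < 0 for b in beta):
        raise ValueError("coefficients do not satisfy the Minkowski-Weyl conditions")
    terms = list(zip(alpha, S)) + list(zip(beta, T))
    return (sum(c * v[0] for c, v in terms), sum(c * v[1] for c, v in terms))


def decompose(pt):
    """Find coefficients (alpha, beta) representing a point of this particular P."""
    x, y = F(pt[0]), F(pt[1])
    if not in_P((x, y)):
        raise ValueError("point is not in P")
    a = max(F(0), 1 - y)               # weight of the extreme point (1, 0)
    alpha = (a, 1 - a)                 # convex part: the point (a, 1 - a)
    beta = (x - a, y - (1 - a))        # conic part: both entries are >= 0
    return alpha, beta


alpha, beta = decompose((2, 3))
print(alpha, beta, combine(alpha, beta))
```

Every output of `combine` satisfies the three inequalities, and every point of $P$ is recovered by `decompose`, which is the content of the theorem for this one set.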
We do not go into the definitions of vertices, edges and facets of polyhedral sets. However we invite you to think about what happens to the structure of vertices, edges and facets of a polyhedral set when you apply the polar set operator. Try to guess this. Hint: try to figure out first the case of the platonic solids—tetrahedron, cube, octahedron, dodecahedron, icosahedron—if they contain the origin.
Conclusion Polyhedral sets containing the origin are the nicest class of convex sets. In particular, the polyhedral set property is preserved under taking the image under a linear transformation and under taking the polar set. This makes the duality theory for polyhedral sets very simple: it involves no closure operator.
4.8 Applications
The aim of this section is to give some applications of duality for convex sets.
4.8.1 Theorems of the Alternative
We present the theorems of the alternative. We show that for each system of linear inequalities and equations, one can write down another such system such that exactly one of the two systems is soluble. These are results about polyhedral sets, as polyhedral sets are solution sets of finite systems of linear inequalities and equations. We do this for some specific forms of systems. We start with an example.
Example 4.8.1 (Theorems of the Alternative) Consider the system of linear inequalities
$$2x + 3y \le 7, \qquad 6x - 5y \le 8, \qquad -3x + 4y \le 2.$$
This system is soluble. It is easy to convince a nonexpert of this. You just have to produce an example of a solution. For example, you give $(2, 1)$. You could call this a certificate of solubility. This holds more generally for each system of inequalities and equations: if it is soluble, then each solution can be used to give a proof of the solubility.
Now consider another system of linear inequalities
$$2x - 3y + 5z \le 4, \qquad 6x - y - 4z \le 1, \qquad -18x + 11y - 7z \le -15.$$
This system is not soluble: it has no solution. How to convince a nonexpert of this? Well, it is again easy. Just produce a linear combination of these inequalities with nonnegative coefficients that gives an inequality of the form
$$0 \cdot x + 0 \cdot y + 0 \cdot z \le a$$
for some negative number $a$. Then the assumption that a solution to the system of linear inequalities exists leads to a contradiction. Hence, there exists no solution. It seems reasonable to expect that nonexperts will be convinced by this argument. For example, you can take as coefficients $3, 2, 1$. This gives that
$$3(2x - 3y + 5z) + 2(6x - y - 4z) + 1 \cdot (-18x + 11y - 7z)$$
is less than or equal to
$$3 \cdot 4 + 2 \cdot 1 + 1 \cdot (-15).$$
After simplification we get
$$0 \cdot x + 0 \cdot y + 0 \cdot z \le -1.$$
You could call this a certificate of insolubility. Such certificates do not exist for arbitrary systems of inequalities and equations. But fortunately they always exist if the inequalities and equations are linear. This is the content of the theorems of the alternative. We do not go into the efficient method to find such certificates, which is the simplex method. Of course, it is not necessary that a nonexpert understands how the certificate (here the coefficients $3, 2, 1$) has been found. This is a ‘trick of the trade’.
Now we give several theorems of the alternative.
Theorem 4.8.2 (Theorems of the Alternative) Let $X = \mathbb{R}^n$, $Y = \mathbb{R}^m$, $A \in \mathbb{R}^{m \times n}$ and $b \in Y$.
1. Minkowski [1] and Farkas [2]. Either the system $Ax = b$, $x \ge 0$ is soluble or the system $A^\top y \ge 0$, $b \cdot y < 0$ is soluble.
2. Fan Tse (Ky Fan) [3]. Either the system $Ax \le b$ is soluble or the system $A^\top y = 0$, $y \ge 0$, $b \cdot y < 0$ is soluble.
3. Gale [4]. Either the system $Ax \le b$, $x \ge 0$ is soluble or the system $A^\top y \ge 0$, $y \ge 0$, $b \cdot y < 0$ is soluble.
The first one of these theorems of the alternative is usually called Farkas’ lemma. The proofs of these theorems of the alternative will show how handy the calculus rules for the binary operations are. In order to avoid minus signs, we work with dual cones and not with polar cones. Recall that they differ only by a minus sign, $C^* = -C^\circ$. We will use $(\mathbb{R}^n_+)^* = \mathbb{R}^n_+$.
Proof
1. The solubility of $Ax = b$, $x \ge 0$ means that $b \in A\mathbb{R}^n_+$. The convex cone $A\mathbb{R}^n_+$ is polyhedral, and therefore
$$A\mathbb{R}^n_+ = (A\mathbb{R}^n_+)^{**} = ((A^\top)^{-1}\mathbb{R}^n_+)^*.$$
This settles 1.
2. The solubility of $Ax \le b$ means that $b \in A\mathbb{R}^n + \mathbb{R}^m_+$. The convex cone $A\mathbb{R}^n + \mathbb{R}^m_+$ is polyhedral, and therefore
$$A\mathbb{R}^n + \mathbb{R}^m_+ = (A\mathbb{R}^n + \mathbb{R}^m_+)^{**} = ((A^\top)^{-1}(0) \cap \mathbb{R}^m_+)^*.$$
This settles 2.
3. The solubility of $Ax \le b$, $x \ge 0$ means that $b \in A\mathbb{R}^n_+ + \mathbb{R}^m_+$. The convex cone $A\mathbb{R}^n_+ + \mathbb{R}^m_+$ is polyhedral, and therefore
$$A\mathbb{R}^n_+ + \mathbb{R}^m_+ = (A\mathbb{R}^n_+ + \mathbb{R}^m_+)^{**} = ((A^\top)^{-1}(\mathbb{R}^n_+) \cap \mathbb{R}^m_+)^*.$$
This settles 3.
For one more theorem of the alternative, by Gordan, see Exercise 50.
Conclusion The calculus rules for binary operations allow very efficient proofs for the so-called theorems of the alternative. These theorems show the existence of certificates of insolubility of systems of linear inequalities.
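Both certificates from Example 4.8.1 can be checked mechanically. The following sketch verifies that $(2, 1)$ solves the first system and that the coefficients $3, 2, 1$ form a certificate of insolubility for the second system, in the sense of the second alternative in the theorem of Ky Fan: $A^\top y = 0$, $y \ge 0$, $b \cdot y < 0$. The function and variable names are illustrative and not taken from the text.

```python
# Data of the two systems A x <= b from Example 4.8.1 (rows of A, then b).
A1 = [[2, 3], [6, -5], [-3, 4]]
B1 = [7, 8, 2]
A2 = [[2, -3, 5], [6, -1, -4], [-18, 11, -7]]
B2 = [4, 1, -15]


def is_solution(A, b, x):
    """Certificate of solubility: x satisfies every inequality of A x <= b."""
    return all(sum(a * xi for a, xi in zip(row, x)) <= bi
               for row, bi in zip(A, b))


def is_insolubility_certificate(A, b, y):
    """Certificate of insolubility of A x <= b: y >= 0, A^T y = 0 and b . y < 0."""
    nonnegative = all(yi >= 0 for yi in y)
    # Coefficient of each variable in the combined inequality.
    combined = [sum(yi * row[j] for yi, row in zip(y, A))
                for j in range(len(A[0]))]
    rhs = sum(yi * bi for yi, bi in zip(y, b))
    return nonnegative and all(c == 0 for c in combined) and rhs < 0


print(is_solution(A1, B1, (2, 1)))
print(is_insolubility_certificate(A2, B2, (3, 2, 1)))
```

Checking a certificate needs only a few multiplications and additions, which is why it convinces a nonexpert; finding one is the harder task that the text leaves to the simplex method.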
4.8.2 *The Black–Scholes Option Pricing Model
We present, as announced in Sect. 4.1.4, the result of Black and Scholes for the pricing of European call options in a simple but characteristic situation. Consider a stock with current value $p$. Assume that there are two possible scenarios at a fixed future date: either the value will go down to $v_d$, or the value will go up to $v_u$. The question is to determine the ‘correct’ current value $\pi$ for the right to buy the stock at the fixed future date for a given price $q$. Such a right is called a European call option. This given price is always chosen between the low value $v_d$ and the high value $v_u$, that is, $v_d < q < v_u$. Other choices make no sense. Black and Scholes showed that there exists a unique ‘correct’ current value for the option. It is characterized by the following interesting property:
There is a unique auxiliary probability distribution for the two scenarios such that, for both the stock and the option, the current value is equal to the expected value at the fixed future date.
This property is easy to memorize. The condition on the stock and the option is called the martingale property. Keep in mind that the statement is not that this probability distribution describes what the probabilities really are; it is just an auxiliary gadget. Let us first see why and how this determines the value of the option now. We choose notation for the probability distribution: let $y_d$ be the probability that the value will go down and let $y_u$ be the probability that the value will go up. So $y_d, y_u \ge 0$ with $y_d + y_u = 1$. Now we put ourselves in the position of an imaginary person who thinks that the probability for the value of the stock to go down is $y_d$ and that the probability for the value of the stock to go up is $y_u$. We do not imply at all that these thoughts have any relation to the real probabilities.
The expected value of the stock at the future date is $y_d v_d + y_u v_u$. Thus the property above gives the equality $p = y_d v_d + y_u v_u$. When the price of the stock goes down, then the right to buy something for the price $q$ that has at that date a lower price $v_d$ is useless. So this right will not be used. Therefore, the value of the option at that date is $0$. However, when the price of the stock goes up, then you can buy stock for the price $q$ and then sell it immediately for the price $v_u$. This gives a profit $v_u - q$. Therefore, the value of the option at that date turns out to be $v_u - q$. It follows that the expected value of the option at the future date is $y_d \cdot 0 + y_u(v_u - q) = y_u(v_u - q)$. Thus the property above gives the equality $\pi = y_u(v_u - q)$. Now we have three linear equations in the unknown quantities $\pi, y_d, y_u$:
$$y_d + y_u = 1, \qquad p = y_d v_d + y_u v_u, \qquad \pi = y_u(v_u - q).$$
These equations have a unique solution and for this solution $y_d, y_u \ge 0$, as one readily checks. This concludes the explanation of how the property above determines a unique current value for the option.
It remains to justify the property above. The source of this is the fact that in financial markets, there are always people on the lookout for opportunities to make money without taking any risk, called arbitrage opportunities. Their activities lead to changes in prices, which make these opportunities disappear. Therefore, the assumption is made that there are no arbitrage opportunities. We are going to see that this implies the property above. To see this, we first have to give a precise description of the absence of arbitrage opportunities. There are three assets here: money, stock, options. We assume that there exists no portfolio of these assets that currently has a negative value but at the fixed future date has a nonnegative value in each one of the two scenarios. That is, there exist no real numbers $x_m, x_s, x_o$ for which
$$x_m + x_s p + x_o \pi < 0$$
and
$$x_m + x_s v_d + x_o \cdot 0 \ge 0, \qquad x_m + x_s v_u + x_o(v_u - q) \ge 0.$$
Note that here we assume that the value of money stays the same from now till the future date. We also allow going short, that is, $x_m, x_s, x_o$ are allowed to be negative, which means holding negative amounts of the asset, and we allow $x_m, x_s, x_o$ to be non-integers. Now we are ready to strike: we apply Farkas’ lemma. This gives the property above, as can be checked readily. This concludes the proof of the result of Black and Scholes.
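The three linear equations $y_d + y_u = 1$, $p = y_d v_d + y_u v_u$, $\pi = y_u(v_u - q)$ can be solved in closed form: $y_u = (p - v_d)/(v_u - v_d)$, $y_d = 1 - y_u$, $\pi = y_u(v_u - q)$. The sketch below does this with exact fractions. The function name and the sample numbers are hypothetical, and the check $v_d \le p \le v_u$ is an added assumption: it is what makes $y_d, y_u \ge 0$ in this setting, where money keeps its value.

```python
from fractions import Fraction


def one_period_call_price(p, v_d, v_u, q):
    """Solve y_d + y_u = 1, p = y_d*v_d + y_u*v_u, pi = y_u*(v_u - q).

    Returns (y_d, y_u, pi) as exact fractions.
    """
    p, v_d, v_u, q = (Fraction(t) for t in (p, v_d, v_u, q))
    if not v_d < q < v_u:
        raise ValueError("the exercise price q must lie strictly between v_d and v_u")
    if not v_d <= p <= v_u:
        # Assumption of this sketch: otherwise y_d or y_u would be negative.
        raise ValueError("the current value p must lie between v_d and v_u")
    y_u = (p - v_d) / (v_u - v_d)   # from the first two equations
    y_d = 1 - y_u
    pi = y_u * (v_u - q)            # expected value of the option
    return y_d, y_u, pi


# Hypothetical numbers: stock at 100 now, then 80 or 130; right to buy at 110.
y_d, y_u, pi = one_period_call_price(100, 80, 130, 110)
print(y_d, y_u, pi)
```

With these sample numbers the auxiliary probabilities are $y_d = 3/5$ and $y_u = 2/5$, the stock satisfies $100 = \tfrac{3}{5} \cdot 80 + \tfrac{2}{5} \cdot 130$, and the option value is $\pi = \tfrac{2}{5} \cdot 20 = 8$.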
4.8.3 *Child Drawing
The aim of this section is to determine the equation of the curve which is the outcome of the child drawing of Sect. 4.1.1. Figure 4.14 gives this drawing again. Let the length of the drawer be 1. We choose coordinates in such a way that the x-axis lies horizontally at a height of 1 above the bottom of the sheet and that the y-axis lies vertically at a distance of 1 to the right of the left side of the sheet. Then the lines $ax + by = 1$ that the child draws enclose a closed convex set $A$ containing the origin; they are tangent lines to the boundary of $A$. This boundary is the curve that we are looking for. The lines that the child draws correspond to the points of a second curve, the dual curve, which forms the boundary of the polar set $A^\circ$. By the duality theorem, the polar set of $A^\circ$ is $A$. The points on the boundary of $A = (A^\circ)^\circ$, the polar set of $A^\circ$, correspond to the tangent lines to the dual curve.
Fig. 4.14 Child drawing
This discussion suggests how to do the calculation. To keep this calculation as simple as possible we change to the following coordinates: the bottom of the sheet is the x-axis and the left-hand side of the sheet is the y-axis. Note that now the convex set $A$ does not contain the origin. However, this does not matter for the present purpose. The lines that the child draws are the lines $ax + by = 1$ for which the distance between the intercepts $(a^{-1}, 0)$ and $(0, b^{-1})$ is 1. Thus the pair $(a, b)$ has to satisfy, by the theorem of Pythagoras, the equation $a^{-2} + b^{-2} = 1$. So the dual curve has equation $x^{-2} + y^{-2} = 1$. Now we take in each point of the dual curve the tangent $ax + by = 1$. Then one can verify that the coefficients $a$ and $b$ of a tangent $ax + by = 1$ to the dual curve satisfy the equation $a^{2/3} + b^{2/3} = 1$. This shows that the curve in the child drawing has the equation
$$x^{2/3} + y^{2/3} = 1.$$
Conclusion By using the duality theorem for convex sets we can derive from the dual description of a convex curved line (‘its tangent lines’) an equation for the curved line.
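The verification step in this calculation can be checked numerically. At a point $(x_0, y_0)$ of the curve $x^{-2} + y^{-2} = 1$, the gradient of $x^{-2} + y^{-2}$ is $(-2x_0^{-3}, -2y_0^{-3})$, so the tangent line is $x_0^{-3} x + y_0^{-3} y = x_0^{-2} + y_0^{-2} = 1$, that is, $a = x_0^{-3}$ and $b = y_0^{-3}$. The sketch below evaluates this for sample points; the parametrization $(1/\cos\theta, 1/\sin\theta)$ and the function names are choices made here for illustration.

```python
import math


def tangent_coefficients(x0, y0):
    """Coefficients (a, b) of the tangent a*x + b*y = 1 to x^-2 + y^-2 = 1 at (x0, y0)."""
    return x0 ** -3, y0 ** -3


def residuals(theta):
    """Deviations from the three identities, for the curve point at parameter theta."""
    x0, y0 = 1 / math.cos(theta), 1 / math.sin(theta)
    a, b = tangent_coefficients(x0, y0)
    on_curve = x0 ** -2 + y0 ** -2 - 1           # the point lies on the curve
    touches = a * x0 + b * y0 - 1                # the tangent passes through the point
    astroid = a ** (2 / 3) + b ** (2 / 3) - 1    # (a, b) satisfies a^(2/3) + b^(2/3) = 1
    return on_curve, touches, astroid


# Sample parameters strictly between 0 and pi/2.
worst = max(abs(r) for k in range(1, 50) for r in residuals(k * math.pi / 100))
print(worst < 1e-9)
```

In this parametrization $a = \cos^3\theta$ and $b = \sin^3\theta$, so $a^{2/3} + b^{2/3} = \cos^2\theta + \sin^2\theta = 1$, which is the identity the text asks the reader to verify.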
4.8.4 *How to Control Manufacturing by a Price Mechanism
The aim of this section is to prove the result stated in Sect. 4.1.2: for each reasonable way to manufacture a product, there exist prices for the resources such that this way will be chosen by a cost minimizing firm. Let notations and definitions be as in Sect. 4.1.2. Let a reasonable way to manufacture the product be chosen: this is a boundary point $\hat{x}$ of $P$. Apply the supporting hyperplane theorem, Theorem 4.3.3, to $\hat{x}$ and $P$. This gives a nonzero vector $p \in \mathbb{R}^n$ such that the halfspace $p \cdot (x - \hat{x}) \ge 0$ supports $P$ at $\hat{x}$. As $R_P = \mathbb{R}^n_+$ and as $\hat{x}$ is efficient, it follows that $p > 0$. If we now set the prices for the resources to be $p_1, \ldots, p_n$, then we get, by the definition of a supporting hyperplane, that the problem to minimize $p \cdot x$ subject to $x \in P$ has optimal solution $\hat{x}$. This completes the analysis of the manufacturing problem.
Conclusion The supporting hyperplane theorem can be applied to show the following result. For each efficient way to manufacture a product, there exist prices for the resources such that this efficient way will be chosen by a cost minimizing firm.
4.9 Exercises
1. In Sect. 4.1.2, it is stated that boundary points of $P$ are the reasonable ways of manufacturing. Why is this the case?
2. Check the details of the ball throwing proof for the duality theorem for convex sets, Theorem 4.2.5, which is given in this chapter.
3. One of the two standard proofs for duality: by means of the shortest distance problem. Consider the setup of Theorem 4.2.5 and assume $A \ne \emptyset$, $A \ne \mathbb{R}^n$. Show that for each $p \in \mathbb{R}^n \setminus A$, there exists a unique point $\hat{x} \in A$ which is closest to $p$ and show that the closed halfspace $\{x \in \mathbb{R}^n \mid (x - \hat{x}) \cdot (\hat{x} - p) \ge 0\}$ contains $A$ but not $p$. Figure 4.15 illustrates the shortest distance problem. Figure 4.16 illustrates the proof for duality by means of the shortest distance problem for a convex cone $C$ in the plane. A disk with center $p$ is blown up till it hits the convex cone, at some point $\hat{x}$. Then the tangent line to the disk at the point $\hat{x}$ has $C$ on one side and the point $p$ on the other side.
Fig. 4.15 Shortest distance problem for a point and a convex set
Fig. 4.16 Shortest distance proof for duality
4. *Let $p \in \mathbb{R}^n$ and let $A \subseteq \mathbb{R}^n$ be a nonempty closed convex set. Show that the shortest distance problem for the point $p$ and the set $A$ has a unique solution. This unique solution is sometimes called the projection of the point $p$ on the convex set $A$. This result is called the Hilbert projection theorem.
5. Theorem of Moreau. *Here is the main result on the shortest distance problem. Let $C, D$ be two closed convex cones in $X = \mathbb{R}^n$ that are each other’s polar cone, $C = D^\circ$, $D = C^\circ$. Let $x \in X$ be given. Show that there exist unique $c \in C$ and $d \in D$ for which $x = c + d$, $c \cdot d = 0$. Moreover show that $c$ is the point in $C$ closest to $x$ and $d$ is the point of $D$ closest to $x$. Figure 4.17 illustrates the theorem of Moreau, as $c \cdot d = 0$ means that the vectors $c$ and $d$ are orthogonal.
Fig. 4.17 Theorem of Moreau
6. Prove that, for a finite dimensional vector space $X$, the subsets of $X \times \mathbb{R}$ that are the epigraph of a norm on $X$ are precisely the closed convex cones $C \subset X \times \mathbb{R}$ for which $C \setminus \{(0_X, 0)\} \subseteq X \times \mathbb{R}_{++}$, for which $(0_X, 1)$ is an interior point of $C$, and for which moreover $(x, \rho) \in C$ implies $(-x, \rho) \in C$.
7. Show that the norm of a linear function $\psi : X \to \mathbb{R}$ on a finite dimensional normed vector space $X$ with norm $N$ is well-defined by the formula
$$N^*(\psi) = \max_{N(x) = 1} \psi(x).$$
That is, show that the function $\psi(x)$ assumes its maximum on the solution set of $N(x) = 1$ (‘the sphere of the norm $N$’).
8. Prove the statement in Example 4.3.10.
9. Hahn–Banach. One of the two standard proofs for duality: the Hahn–Banach proof. The following proof of the duality theorem has the advantage that it can be extended to infinite dimensional spaces (this only requires that you use in addition Zorn’s lemma); it is essentially the proof of one of the main theorems of functional analysis, the theorem of Hahn–Banach. Figure 4.18 illustrates the induction step of the Hahn–Banach proof for duality, for the step from dimension two to dimension three. Here we consider the weakest looking version of duality, Theorem 4.3.20.
Fig. 4.18 Hahn–Banach proof for duality
A convex cone C in threedimensional space that has nonempty interior and that is not equal to the entire space, is drawn. We have to prove duality for it: that is, we have to prove that there exists a plane through the origin that has C entirely on one of its two sides. We assume that we have duality for convex cones in one dimension lower, in a plane. That is, for each convex cone D in a twodimensional space that has nonempty interior and that is not equal to the entire plane, there exists a line in this plane that has D entirely on one of its two sides. An interior point v of C has been chosen and drawn in Fig. 4.18. Moreover, a plane V through the origin and through the point v has been chosen and drawn. Then the intersection C ∩ V is a convex cone in the plane V . We have duality for this convex cone by assumption. So we can choose a line l in V through the origin that has C ∩ V entirely on one of its two sides. This line l has been drawn. Now we take the orthogonal projection of C on the plane l ⊥ through the origin that is orthogonal to l. This is a convex cone in l ⊥ that is not equal to l ⊥ (this follows readily from the special choice of l). We have duality for this convex cone by assumption. Therefore, we can choose a line in l ⊥ through the origin that has this convex cone entirely on one of its two sides: in the figure, this projection is a halfplane (drawn on the righthand side). Therefore this line is unique. Now the plane that contains this line and that contains the line l as well, has the convex cone C entirely one of its two sides, as one can check. This completes the proof of this induction step. This proof for the induction step from dimension two to dimension three extends to a proof for the induction step from dimension n − 1 to dimension n for arbitrary n ≥ 3; this gives Theorem 4.3.20. In this exercise, you are invited to construct a precise proof of the duality—in the version of Theorem 4.2.5—by means of this induction argument. 
(a) Show that in order to prove Theorem 4.2.5, it suffices to prove the following statement. For each subspace H in Rn for which H ∩ int(A) = ∅ and dim H < n−1 there exists a subspace K in Rn for which H ⊂ K, dim K = dim H + 1 and K ∩ int(A) = ∅.
(b) Let $H$ be a subspace in $\mathbb{R}^n$ for which $H \cap \operatorname{int}(A) = \emptyset$ and $\dim H < n - 1$. Show that there exists a subspace $L$ in $\mathbb{R}^n$ for which $H \subset L$ and $\dim L = \dim H + 2$.
(c) Show that the (possibly empty) set of hyperplanes in $L$ that contain $H$ and that are disjoint from $\operatorname{int}(A)$ is in a natural bijection to the set of lines in the plane $L/H$ through the origin that are disjoint from the interior of $(A + H)/H$.
(d) Show that, for a given convex set $B$ in the plane not having the origin as an interior point, there exists a line in the plane through the origin which is disjoint from $\operatorname{int}(B)$.
10. Prove the supporting hyperplane theorem 4.3.3 by deriving it from the duality theorem 4.2.5. Hint. Apply the duality theorem to a sequence of points outside $A$ that converge to the given relative boundary point of $A$. This leads to a sequence of hyperplanes. Take the limit of a suitable subsequence. Here the concept of limit requires some care.
11. Derive the duality theorem 4.2.5 from the supporting hyperplane theorem 4.3.3. Hint. Choose a relative interior point $a$ in $A$. Then show that for each point $x$ outside $A$, the closed line segment with endpoints $a$ and $x$ contains at least one relative boundary point of $A$.
12. **Prove the separation theorem, Theorem 4.3.6. Hints. For the first statement, reduce the general case to the problem of separating $A - B = \{a - b \mid a \in A, b \in B\}$ from the origin if $0 \notin A - B$. Note that we have already established this fact. For the second statement, show that $A - B$ is closed if $A$ is closed and $B$ is compact.
13. Show that the set $X^*$ of linear functions $X \to \mathbb{R}$ on a given finite dimensional normed vector space $X$ with norm $N$ is a finite dimensional normed vector space with norm $N^*$, defined in Definition 4.3.9. The set $X^*$ is called the dual space of $X$ and $N^*$ is called the dual norm.
14. **Prove the finite dimensional Hahn–Banach theorem, Theorem 4.3.11, in two different ways: (1) derive it from the separation theorem, Theorem 4.3.6, (2) give a proof from first principles, extending the linear function step by step, each time to a subspace of one dimension more than the previous subspace (this is the usual proof given in textbooks).
15. Show that for two convex cones $C_1, C_2 \subseteq \mathbb{R}^n$ one has the implication $C_1 \subseteq C_2 \Rightarrow C_1^\circ \supseteq C_2^\circ$.
16. Show that for two closed convex cones in duality, $C, \tilde{C}$, one has that $C$ has a nonempty interior iff $C^\circ$ is pointed (that is, it contains no entire line).
17. Show that for two closed convex sets containing the origin, $B_1, B_2 \subseteq \mathbb{R}^n$, one has the implication $B_1 \subseteq B_2 \Rightarrow B_1^\circ \supseteq B_2^\circ$.
18. Let $B \subseteq X = \mathbb{R}^n$ be a closed convex set containing the origin. Check that $B^\circ$ is a closed convex set in $X$ containing the origin.
19. Show that for two closed convex sets containing the origin, $B, \tilde{B}$, that are each other’s polar set, one has that $B$ is bounded iff $\tilde{B}$ contains the origin in its interior. Hint. Proceed in three steps:
(a) Show that $B$ is bounded iff its conification $c(B)$ is contained in the epigraph of $\varepsilon\|\cdot\|$ for some $\varepsilon > 0$.
(b) Show that $B$ contains the origin in its interior iff its conification $c(B)$ contains the epigraph of $N\|\cdot\|$ for some $N > 0$.
(c) Show that the epigraphs of $N^{-1}\|\cdot\|$ and $N\|\cdot\|$ are each other’s polar cone with respect to the bilinear pairing $b((x, \alpha), (y, \beta)) = x \cdot y - \alpha\beta$.
20. Show that the polar set of a subspace is its orthogonal complement.
21. Show that the polar set of the standard unit ball in $\mathbb{R}^n$ with respect to some norm $\|\cdot\|$ is the standard unit ball in $\mathbb{R}^n$ with respect to the dual norm $\|\cdot\|^*$.
22. Show that the fact that the polar set operator $B \mapsto B^\circ$ is an involution, that is, Theorem 4.3.15, is indeed a reformulation of the duality theorem for convex sets, Theorem 4.2.5. Hint. Check first that for every nonzero $v \in \mathbb{R}^n$, one has $v \in B^\circ$ iff the closed halfspace given by the inequality $x \cdot v \le 1$ contains $B$.
23. Prove the weakest looking version of the duality theorem (‘polar cone of a proper convex cone is nontrivial’) by means of the ball-throwing proof, where the ball is not necessarily thrown in a straight line.
24. Show that for every closed convex cone $C \subset X = \mathbb{R}^n$ that has nonempty interior, there exists a basis of the vector space $X$ such that if you take coordinates with respect to this basis, $C$ is the epigraph of a function $g : \mathbb{R}^{n-1} \to \mathbb{R}_+$ for which $\operatorname{epi}(g)$ is a convex cone.
25.
*Show that for every closed convex cone C ⊂ X = Rn that has nonempty interior and that contains no line through the origin, there exists a basis of the vector space X such that if you take coordinates with respect to this basis, C is the epigraph of a function g : Rn−1 → R+ for which epi(g) is a convex cone and g(0) = 0 iff x = 0. 26. *Show that a given closed convex cone does not contain any line through the origin iff its polar cone has a nonempty interior. 27. *Show that the three golden cones—the first orthant, the Lorentz cone and the positive semidefinite cone—are selfdual, that is, show that they are equal to their dual cone. 28. *Prove Proposition 4.4.4. 29. Let X, be two finite dimensional vector spaces which are put in duality by a nondegenerate bilinear mapping b : X × → R. (a) *Show that X and have the same dimension (= n). (b) Choose bases for X and and identify vectors of X and with the coordinate vectors with respect to these bases. Thus X and are identified
with Rn and then b is identified with a mapping b : Rn × Rn → R. Show that there is a unique invertible n × nmatrix M for which b(x, y) = x My for all x ∈ X, y ∈ . (c) Show that the bases for X and can be chosen in such a way that M = In the identity n × nmatrix. Then b(x, y) = x y = x · y for all x, y ∈ Rn — that is, b is the dot product. 30. Let X and be two finite dimensional vector spaces in duality with underlying mapping b. Check that for a convex cone C ⊆ X, the polar cone C ◦ with respect to b is a closed convex cone in . 31. Give a nondegenerate bilinear mapping b : Rn × Rn → R for which the polar cone operator with respect to b is just the dual cone operator with respect to the dot product. 32. Check that the polar set and the polar cone of a convex cone in X = Rn are equal. 33. Prove the following relation between the concepts polar cone and dual cone: C ∗ = −C ◦ . 34. Show that the polar cone of a subspace L of X = Rn is its orthogonal complement L⊥ = {x ∈ X  x · y = 0 ∀y ∈ L}. 35. Show that the convex cones R+ (0X , 1) and X × R+ , where X = Rn , are each other’s polar cone with respect to the nondegenerate bilinear pairing b on X×R defined by b((x, α), (y, β)) = x · y − αβ for all (x, α), (y, β) ∈ X × R. 36. Show that the definition of polar set by the homogenization method is welldefined. To be specific, check that for a closed convex set B containing the origin, one has that c(B)◦ , the polar cone of c(B), is a closed convex cone in X × R+ for which c(B)◦ \ {0} is contained in X × R++ . 37. Show that the duality operator on convex sets containing the origin, defined by the homogenization method, is equal to the polar set operator. 38. Show that the duality operator on norms, defined by the homogenization method, is equal to the wellknown dual norm operator. 39. Let notation and assumptions be as in Proposition 4.6.1. Show that if ri(C ◦ ) ∩ ˜ ri(D ◦ ) = ∅ then C˜ + D˜ is closed. So then (C ∩ D)◦ = C˜ + D. 40. 
(a) Show that for a convex set A ⊆ X = Rn the intersection RA ∩ R−A is a subspace of X. This is called the lineality space of A. (b) Show that the lineality space of a convex cone C ⊆ X is the largest subspace of X contained in C. 41. Prove Proposition 4.6.2 using the duality result for convex cones. 42. Let notation and assumptions be as in Proposition 4.6.2. Show that MC is ˜ )◦ = MC. closed if C ∩ ker(M) ⊆ LC . So then (CM 43. (a) Derive Proposition 4.6.2 from the separation theorem. (b) Derive Proposition 4.6.1 by means of conification, using Proposition 2.2.1.
44. *Show that ‘the closure operators cannot be omitted’ from the formulas above in general. That is, one might have, for suitable pairs of closed convex cones in duality $(C, \tilde{C})$ and $(D, \tilde{D})$ in $\mathbb{R}^n$, $C + D \neq (\tilde{C} \cap \tilde{D})^\circ$ and $(\tilde{C}M^\top)^\circ \neq MC$.
45. *Derive Proposition 4.6.4 from the corresponding result for convex cones, Proposition 4.6.2.
46. *Derive Proposition 4.6.3 from Proposition 4.6.4 by means of the homogenization method. Hint: use that $+_Y$ and $\Delta_Y$ are each other’s transpose for each finite dimensional inner product space $Y$.
47. Formulate Theorem 4.7.3 for polyhedral cones.
48. *Let $P$ be the solution set of the system of linear inequalities in $n$ variables $a_j^\top x \le b_j$, $1 \le j \le r$, where $a_j \in \mathbb{R}^n$ and $b_j \in \mathbb{R}$. Show that for each extreme point $\bar{x}$ of this polyhedral set there is a subset $J$ of $\{1, 2, \ldots, r\}$ for which the vectors $a_j$, $j \in J$, form a basis of $\mathbb{R}^n$ and $\bar{x}$ is the unique solution of the system of linear equations $a_j^\top x = b_j$, $j \in J$.
49. Consider the five platonic solids: tetrahedron, cube, octahedron, dodecahedron, icosahedron. We assume, after a translation, that they contain the origin; this is done for convenience: then we can give dual descriptions by means of the polar set operator. (a) Show that these are polyhedral sets. (b) Show that the polar set of a platonic solid is again a platonic solid. (c) *Determine for each platonic solid which platonic solid is its polar set.
50. Gordan’s theorem of the alternative. *Let $a_1, \ldots, a_r \in \mathbb{R}^n$. Prove the following statement (Gordan [5]). Either the system $a_1^\top x < 0, \ldots, a_r^\top x < 0$ is soluble, or the system $\mu_1 a_1 + \cdots + \mu_r a_r = 0$, $\mu \neq 0$, $\mu \ge 0$ is soluble.
51. Show that the coefficients $a$ and $b$ of a tangent $ax + by = 1$ to the curve in the child drawing example analyzed in Sect. 4.8.3 satisfy the equation $a^{2/3} + b^{2/3} = 1$.
52. Now the child makes a second drawing (after the drawing that is analysed in Sect. 4.8.3). Again she draws straight lines from the left-hand side of the paper to the bottom of the paper.
This time she is doing this in such a way that the sum of the distances of the endpoints of each line to the bottom-left corner of the paper is constant. Again these lines enclose a closed convex set. Find the equation of the curve formed by the boundary points of this closed convex set.
References
1. H. Minkowski, Geometrie der Zahlen (Teubner, Leipzig, 1896)
2. G. Farkas, Über die Theorie der einfachen Ungleichungen. J. Reine Angew. Math. 124, 1–24 (1901)
3. Fan Tse (Ky Fan), On Systems of Linear Inequalities. Linear Inequalities and Related Systems (Princeton University Press, Princeton, 1956)
4. D. Gale, The Theory of Linear Economic Models (McGraw-Hill, New York, 1960)
5. P. Gordan, Über die Auflösung linearer Gleichungen mit reellen Coefficienten. Math. Ann. 6, 23–28 (1873)
Chapter 5
Convex Functions: Basic Properties
Abstract • Why. A second main object of convex analysis is a convex function. Convex functions can help to describe convex sets: these are infinite sets, but they can often be described by a formula for a convex function, so in finite terms. Moreover, in many optimization applications, the function that has to be minimized is convex, and then the convexity is used to solve the problem. • What. In the previous chapters, we have invested considerable time and effort in convex sets and convex cones, proving all their standard properties. Now there is good news. No more essentially new properties have to be established in the remainder of this book. It remains to reap the rewards. In Chaps. 5 and 6, we consider to begin with convex functions: the dual properties in Chap. 6, and the properties that do not require duality—the primal properties in this chapter. This requires convex sets: convex functions are special functions on convex sets. Even better, they can be expressed entirely in terms of convex sets: convex functions are functions for which the epigraph, that is, the region above the graph, is a convex set. Some properties of convex functions follow immediately from a property of a convex set, applied to the epigraph. An example is the continuity property of convex functions. For other properties, a deeper investigation is required: one has to take the homogenization of the epigraph, and then one should apply a property of convex cones. An example is the unified construction of the eight standard binary operations on convex functions—sum, maximum, convex hull of the minimum, infimal convolution, Kelley’s sum, and three nameless ones—by means of the homogenization method. The defining formulas for them look completely different from each other, but they can all be generated in exactly the same systematic way by a reduction to convex cones (‘homogenization’). 
One has for convex functions the same technical problem as for convex sets: all convex functions that occur in applications have the nice properties of being closed and proper, but if you work with them and make new functions out of them, then they might lose these properties. Two examples of such work are: (1) the consideration, in a sensitivity analysis for an optimization problem, of the optimal value function, and (2) the application of a binary operation. Finally,
© Springer Nature Switzerland AG 2020 J. Brinkhuis, Convex Analysis for Optimization, Graduate Texts in Operations Research, https://doi.org/10.1007/9783030418045_5
twice continuously differentiable functions $f(x)$ are convex iff their Hessian $f''(x)$ is positive semidefinite for all $x$.
Road Map
1. Definitions 5.2.1, 5.2.4, 5.2.6, 5.2.9 and 5.2.10 (convex function, defined either by Jensen’s inequality or by the convexity of a set: its (strict) epigraph).
2. Proposition 5.3.1 (continuity property of a convex function).
3. Propositions 5.3.6, 5.3.9 (first and second order characterizations of convex functions).
4. Figure 5.7 and Definition 5.4.2 (construction of a convex function by homogenization).
5. Definitions 5.2.12, 5.3.4 (two nice properties for convex functions, closedness and properness).
6. Definitions 5.5.1, 5.5.2 (the two actions by linear functions on convex functions).
7. Figure 5.8, Definition 5.6.1, Proposition 5.6.6 (binary operations on convex functions, namely pointwise sum, pointwise maximum, convex hull of pointwise minimum, infimal convolution, Kelley’s sum and three nameless ones, defined by formulas and constructed by homogenization).
5.1 *Motivation
The aim of this section is to motivate the concept of convex function.
5.1.1 Description of Convex Sets
The aim of this section is to give two examples of how a convex set, which is usually an infinite set, can be described by a function, which can often be described in finite terms: by a formula.
Example 5.1.1 (An Unbounded Closed Convex Set Viewed as an Epigraph) Suppose that $A \subseteq X = \mathbb{R}^n$ is an unbounded closed convex set. Then it has a nonzero recession vector $v$. Assume that $-v$ is not a recession vector for $A$. Then we can view $A$ as an epigraph, that is, as the region above the graph of a function. Figure 5.1 illustrates how this can be done. Here is a precise description. Let $L$ be the hyperplane in $X$ through the origin orthogonal to $v$, that is, $L = v^\perp = \{x \in X \mid x \cdot v = 0\}$. Let $B \subseteq L$ be the image of $A$ under the orthogonal projection of $X$ onto $L$. Then we put $f(x) = \min\{\rho \mid x + \rho v \in A\}$ for each $x \in B$. This is well-defined: by the definition of $B$, there exists a number $\rho \in \mathbb{R}$ such that $x + \rho v \in A$; as $-v$ is not a recession direction of $A$, the set of such numbers $\rho$ is bounded below; by the closedness of $A$, it follows that there is a minimal such number $\rho$. This completes the verification that $f(x)$ is well-defined for all $x \in B$. This gives that $x + f(x)v \in A$ and that $x + \rho v \in A$ implies $\rho \ge f(x)$. Conversely, for all $\rho \ge f(x)$, we have that the point $x + \rho v = (x + f(x)v) + (\rho - f(x))v$ is in $A$, as $x + f(x)v \in A$ and as moreover $v$ is a recession vector of $A$. So we get a function $f : B \to \mathbb{R}$, and the set $A$ is the subset of $X$ consisting of all vectors $x + \rho v$ where $(x, \rho)$ runs over the epigraph of $f$, $\mathrm{epi}(f) = \{(x, \rho) \mid x \in B, \rho \in \mathbb{R}, \rho \ge f(x)\}$. In this way, the convex set $A$ is described completely in terms of the function $f$: it is essentially the epigraph of $f$. The function $f$ is called a convex function as its epigraph is a convex set. Moreover, $f$ is called lower semicontinuous or closed as its epigraph is a closed set. There are many functions defined by a formula (that is, in finite terms) that can be verified to be convex functions, as we will see. For such a function $f$, one gets a convenient ‘finite’ description of the convex set $\mathrm{epi}(f)$, the epigraph of $f$, by the formula for the function $f$.
Fig. 5.1 Unbounded closed convex set viewed as an epigraph
Example 5.1.2 (A Closed Convex Set Described by Its Gauge) The Euclidean norm on $\mathbb{R}^n$ can be described in terms of the standard closed unit ball $B_n$ as $\|x\| = \inf\{t > 0 \mid x/t \in B_n\}$. Now we generalize this construction. Suppose that we have a closed convex set $A \subseteq X = \mathbb{R}^n$ that contains the origin in its interior (recall that this condition can be assumed for a given nonempty closed convex set by translating it to a new position in such a way that a relative interior point is brought to the origin, and then restricting the space from $\mathbb{R}^n$ to the span of the translated copy of the given set). Then the closure of $c(A)$, the homogenization of $A$, is the epigraph of a function $p_A : X \to \mathbb{R}_+$. One has $p_A(x) = \inf\{t > 0 \mid x/t \in A\}$.
Fig. 5.2 A closed convex set described by its gauge
This function is called the Minkowski function or the gauge of $A$. The set $A$ can be expressed in terms of this function by means of $A = \{x \in X \mid p_A(x) \le 1\}$. If $p_A$ can be given by a formula, then this gives again a convenient finite description of the convex set $A$: by the formula for its gauge $p_A$, which is a convex function, as its epigraph is a convex cone and so a convex set. Figure 5.2 illustrates this description for the case that $A$ is bounded. A subset $A$ of the plane $\mathbb{R}^2$ is drawn. It is a bounded closed convex set containing the origin in its interior. Some arrows are drawn from the origin to the boundary of $A$. On such an arrow the value of the gauge $p_A$ of $A$ grows linearly from 0 to 1.
Conclusion We have seen two methods to describe convex sets by means of functions; this is convenient, as functions can often be described by a formula. Then an ‘infinity’ (a convex set) is captured in finite terms. The first method is to view an unbounded convex set with the aid of a recession direction as the epigraph of a function. The second method is to describe a convex set by its gauge.
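The gauge formula $\inf\{t > 0 \mid x/t \in A\}$ can be evaluated numerically when only a membership test for $A$ is available. The following is a minimal sketch, not a construction from the text: the names `gauge`, `member`, `t_hi` and the two example sets are assumptions made for illustration. It relies on the standing hypotheses (closed, convex, origin in the interior), under which the admissible values of $t$ form an interval unbounded above, so bisection applies.

```python
def gauge(member, x, t_hi=1e6, tol=1e-12):
    """Approximate the gauge inf{t > 0 : x/t in A} by bisection on t.

    `member` is a hypothetical membership oracle for A. The sketch assumes
    A is closed and convex with 0 in its interior, so that the admissible t
    form an interval unbounded above, and that x / t_hi lies in A.
    """
    if not member([c / t_hi for c in x]):
        raise ValueError("x / t_hi is not in A; increase t_hi")
    lo, hi = 0.0, t_hi              # invariant: x/hi is in A
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if member([c / mid for c in x]):
            hi = mid                # mid is admissible, so the gauge is <= mid
        else:
            lo = mid                # mid is too small
    return hi


def in_disc(p):                     # standard closed unit ball in the plane
    return p[0] ** 2 + p[1] ** 2 <= 1.0


def in_ellipse(p):                  # the set 2*x1^2 + 3*x2^2 <= 5
    return 2 * p[0] ** 2 + 3 * p[1] ** 2 <= 5.0


g_disc = gauge(in_disc, [3.0, 4.0])      # Euclidean norm of (3, 4)
g_ell = gauge(in_ellipse, [1.0, 1.0])    # (1, 1) lies on the boundary
g_ell2 = gauge(in_ellipse, [2.0, 2.0])   # twice the previous point
```

For the unit disc the result agrees with the Euclidean norm, and for the ellipse the values at $(1,1)$ and $(2,2)$ illustrate that the gauge equals 1 on the boundary and grows linearly along a ray from the origin.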
5.1.2 Why Convex Functions that Are Not Nice Can Arise in Applications
In this chapter, we will define two nice properties, closedness and properness, that all convex functions that occur in applications possess. Now we describe an operation that has to be carried out in many applications and that leads to a convex function that might not have these two nice properties. This is the reason why one allows in the theory of convex functions also functions that do not have these two nice properties.
Example 5.1.3 (The Importance of Sensitivity Analysis) Sometimes the sensitivity of an optimal solution is considered to be more important than the optimal solution itself. Suppose, for example, that a company wants to start a new project. To begin with, some possible scenarios for how to go about it might be created. Then, in order to help the responsible managers to make a choice, the expected revenue minus cost
is determined for each scenario, by means of optimization. You might be surprised to learn that managers are often not very interested in comparing the optimal profits of the scenarios. They are more interested in avoiding risky scenarios. Some of the data of an optimization problem for a given scenario, such as prices, are uncertain. In some scenarios, the sensitivity of optimal profits to the data is much higher than in others. Let $S(y)$ be the optimal profit for a given scenario if the data vector is $y$. Then the function $S$ is called the optimal value function, and by its definition it determines the sensitivity of the optimal value to changes in $y$. Therefore, the optimal value function is of great interest. The function $S$ is concave (that is, $-S$ is convex) if the functions that describe the optimization problem for the scenario are convex. Indeed, making $-S$ out of these convex functions is a convexity preserving operation. In each application, the convex functions that describe the optimization problem for a given scenario have the two nice properties. However, it turns out that the optimal value function $S$ need not have these two nice properties.
5.2 Convex Function: Definition
The aim of this section is to define convex functions. Here is the traditional definition of a convex function.
Definition 5.2.1 A proper convex function is a function $f : A \to \mathbb{R}$, where $A \subseteq X = \mathbb{R}^n$ is a nonempty convex set, for which Jensen’s inequality $f((1 - \alpha)x + \alpha y) \le (1 - \alpha)f(x) + \alpha f(y)$ holds for all $\alpha \in [0, 1]$ and all $x, y \in A$.
In geometric terms, Jensen’s inequality means that each chord with endpoints on the graph of a proper convex function $f$ lies entirely above or on the graph of $f$. Figure 5.3 illustrates this property. Here we make an observation that is often useful, as it allows one to reduce some arguments involving a convex function of several variables to the case of a convex function of one variable: a function is convex iff its restriction to each line is convex.
Fig. 5.3 Jensen’s inequality
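Jensen’s inequality lends itself to a randomized falsification test: sample two points and a weight, and compare both sides. The sketch below is illustrative only; `violates_jensen`, the samplers and the tolerance are assumed names and choices, not part of the text. Finding no violation does not prove convexity, whereas a single violation disproves it.

```python
import math
import random


def violates_jensen(f, sample, trials=10_000, tol=1e-9, seed=0):
    """Search for (x, y, alpha) with f((1-a)x + a y) > (1-a) f(x) + a f(y).

    `sample(rng)` is a hypothetical sampler returning a point (a list of
    floats) of a convex domain. Returns a violating triple, or None.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = sample(rng), sample(rng)
        a = rng.random()
        z = [(1 - a) * xi + a * yi for xi, yi in zip(x, y)]
        if f(z) > (1 - a) * f(x) + a * f(y) + tol:
            return (x, y, a)
    return None


def euclid(x):                       # Euclidean length, a convex function
    return math.sqrt(sum(c * c for c in x))


def sine(x):                         # not convex on [-3, 3]
    return math.sin(x[0])


def sample_square(rng):
    return [rng.uniform(-3, 3), rng.uniform(-3, 3)]


def sample_interval(rng):
    return [rng.uniform(-3, 3)]


witness_norm = violates_jensen(euclid, sample_square)
witness_sine = violates_jensen(sine, sample_interval)
```

Here `witness_norm` stays `None`, while `witness_sine` holds a chord of the sine graph that dips below the graph.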
Example 5.2.2 (Convex Function) Here are some characteristic examples of convex functions. Later we will show how it can be verified that a function is convex in an easier way than by checking Jensen’s inequality.
• Each linear function $f(x) = a_1 x_1 + \cdots + a_n x_n$; more generally, each quadratic function $f(x) = x^\top A x + b^\top x + c = \sum_{i,j} a_{ij} x_i x_j + \sum_k b_k x_k + c$ for which the matrix $A + A^\top$ is positive semidefinite.
• The function $f(x) = (x_1^2 + \cdots + x_n^2)^{1/2}$, the Euclidean length. This function is not differentiable at $x = 0$.
• The function $f(x) = 1/(1 - x^2)$, $x \in (-1, 1)$. This function is not defined on the entire line $\mathbb{R}$.
• The function $f$ on the nonnegative real numbers $\mathbb{R}_+$ for which $f(x) = 0$ for $x > 0$ and $f(0) = 1$. This function is not continuous at 0.
It is often convenient to extend a convex function $f : A \to \mathbb{R}$ to the entire space $X$ by defining its value everywhere outside $A$ formally to be $+\infty$. The resulting function $X \to (-\infty, +\infty]$ is then called a proper convex function on $X$.
Example 5.2.3 (Indicator Function) Let $A \subseteq X = \mathbb{R}^n$ be a nonempty convex set. Then the indicator function of $A$, defined to be the function $\delta_A : X \to (-\infty, +\infty]$ given by the recipe $x \mapsto 0$ if $x \in A$ and $x \mapsto +\infty$ otherwise, is a proper convex function on $X$.
However, a more efficient way to define convex functions (and then to distinguish between proper and improper ones) is by reducing this concept to the concept of convex set: a function is defined to be convex if the region above its graph is a convex set. Now we formulate this definition in a precise way, using the concept epigraph of a function.
Definition 5.2.4 The epigraph of a function $f$ on $X = \mathbb{R}^n$ taking values in the extended real number line $\overline{\mathbb{R}} = [-\infty, +\infty] = \mathbb{R} \cup \{-\infty, +\infty\}$ is the region above its graph, that is, it is the set $\mathrm{epi}(f) = \{(x, \rho) \in X \times \mathbb{R} \mid \rho \ge f(x)\}$ in $X \times \mathbb{R}$.
We give an informal explanation of the idea behind this definition. Suppose we are given a continuous function of two variables.
Consider its graph, a surface in three-dimensional space. Pour concrete on this surface and let it turn to stone. This stone is the epigraph. You do not lose any information by taking the epigraph of a function, as a function $f$ can be retrieved from its epigraph as follows: $f(x) = \min\{\rho \mid (x, \rho) \in \mathrm{epi}(f)\}$
if the set of real numbers $\rho$ for which $(x, \rho) \in \mathrm{epi}(f)$ is nonempty and bounded below, $f(x) = +\infty$ if the set of real numbers $\rho$ for which $(x, \rho) \in \mathrm{epi}(f)$ is empty, and $f(x) = -\infty$ if the set of real numbers $\rho$ for which $(x, \rho) \in \mathrm{epi}(f)$ is unbounded below. This can be expressed by means of one formula, by using the concept of infimum.
Definition 5.2.5 The infimum $\inf S$ of a set $S \subseteq \mathbb{R}$ is the largest lower bound $r \in \overline{\mathbb{R}} = \mathbb{R} \cup \{\pm\infty\}$ of $S$.
Note that we allow the infimum to be $-\infty$ (this means that $S$ is unbounded below) or $+\infty$ (this means that $S$ is empty). It is a fundamental property of the real numbers that this concept is well-defined. The infimum always exists, but the minimum of a set $S \subseteq \mathbb{R}$ does not always exist. If the minimum of a set $S \subseteq \mathbb{R}$ does exist, then it is equal to the infimum of $S$. Therefore, the infimum is a sort of surrogate minimum. A function can be retrieved from its epigraph by means of the following formula: $f(x) = \inf\{\rho \mid (x, \rho) \in \mathrm{epi}(f)\}$ for all $x \in X$. Here is the promised efficient definition of a convex function.
Definition 5.2.6 A function $f : X = \mathbb{R}^n \to \overline{\mathbb{R}}$ is convex if its epigraph is a convex set in $X \times \mathbb{R}$.
Figure 5.4 illustrates this definition. We give an informal explanation of the idea behind this definition for a function of two variables. Pour concrete on the graph of this function and let it turn to stone. If the stone is a convex set, then the function is called convex. Now the announcement in the beginning of this chapter, about no essentially new properties having to be established to prove properties of convex functions, can be explained in more detail. Standard results on a convex function will follow either from a property of a convex set, applied to the epigraph of the function, or from a property of a convex cone, applied to the homogenization of the epigraph.
Fig. 5.4 Convexity of a function and of its epigraph
Here is
a first example of how properties of convex functions follow from properties of convex sets.
Proposition 5.2.7 Let $f : X = \mathbb{R}^n \to \overline{\mathbb{R}}$ be a convex function and let $\gamma \in \mathbb{R}$. Then the sublevel set $\{x \in X \mid f(x) \le \gamma\}$ and the strict sublevel set $\{x \in X \mid f(x) < \gamma\}$ are convex sets.
Proof We give the proof for the sublevel set; the proof for the strict sublevel set is similar. As $f$ is a convex function, its epigraph is convex. Its intersection with the closed half-space $X \times (-\infty, \gamma]$, which is a convex set, is again a convex set. Therefore, the image of this intersection under the orthogonal projection from $X \times \mathbb{R}$ onto $X$, given by $(x, \alpha) \mapsto x$, is a convex set. This image is by definition the sublevel set $\{x \in X \mid f(x) \le \gamma\}$.
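Proposition 5.2.7 can be sanity-checked on samples: collect points of a sublevel set and test whether convex combinations of them stay inside. This is only a sketch under stated assumptions; the helper `sublevel_convex_on_samples`, the sampling boxes and the two test functions are chosen here for illustration and do not come from the text.

```python
import random


def sublevel_convex_on_samples(f, gamma, sample, trials=5000, seed=1):
    """Return False if some convex combination of two sampled points of
    {x : f(x) <= gamma} leaves that set, and True otherwise.

    `sample(rng)` is a hypothetical sampler; True is evidence, not a proof.
    """
    rng = random.Random(seed)
    pts = [p for p in (sample(rng) for _ in range(trials)) if f(p) <= gamma]
    for _ in range(trials):
        x, y = rng.choice(pts), rng.choice(pts)
        a = rng.random()
        z = [(1 - a) * xi + a * yi for xi, yi in zip(x, y)]
        if f(z) > gamma + 1e-9:
            return False
    return True


def quad(p):                         # convex: 2*x1^2 + 3*x2^2
    return 2 * p[0] ** 2 + 3 * p[1] ** 2


def neg_square(p):                   # not convex: -x^2
    return -(p[0] ** 2)


ellipse_ok = sublevel_convex_on_samples(
    quad, 5.0, lambda rng: [rng.uniform(-2, 2), rng.uniform(-2, 2)])
split_set_ok = sublevel_convex_on_samples(
    neg_square, -1.0, lambda rng: [rng.uniform(-2, 2)])
```

The first check passes for the convex quadratic. The second fails, because the set where $-x^2 \le -1$ consists of two separate half-lines.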
Example 5.2.8 (Convex Sets as Sublevel Sets of Convex Functions) The following sets are convex as they are sublevel sets of convex functions.
• Ellipse. The solution set of $2x_1^2 + 3x_2^2 \le 5$ is a sublevel set of $x \mapsto 2x_1^2 + 3x_2^2$.
• Parabola. The solution set of $x_2 \ge x_1^2$ is a sublevel set of $x \mapsto x_1^2 - x_2$.
• Hyperbola. The solution set of $x_1 x_2 \ge 1$, $x_1 \ge 0$, $x_2 \ge 0$ is a sublevel set of the function $x \mapsto -\ln x_1 - \ln x_2$ defined on $\mathbb{R}^2_{++}$.
Sometimes, the definition of a convex function in terms of the strict epigraph is more convenient.
Definition 5.2.9 The strict epigraph of a function $f$ on $X = \mathbb{R}^n$ taking values in the extended real number line $\overline{\mathbb{R}} = [-\infty, +\infty] = \mathbb{R} \cup \{-\infty, +\infty\}$ is the set $\mathrm{epi}_s(f) = \{(x, \rho) \in X \times \mathbb{R} \mid \rho > f(x)\}$ in $X \times \mathbb{R}$.
If you take the strict epigraph of a function, you do not lose any information either.
Definition 5.2.10 A function $f : X = \mathbb{R}^n \to \overline{\mathbb{R}}$ is convex if its strict epigraph is a convex set in $X \times \mathbb{R}$.
Definition 5.2.11 The effective domain of a convex function $f$ on $X = \mathbb{R}^n$ is the subset $\mathrm{dom}(f) = \{x \in X \mid f(x) \neq +\infty\}$ of $X$.
Note that the effective domain of a convex function is a convex set. Note also that we allow a convex function to take value $-\infty$ on its effective domain. The convex functions that are of interest in the first place are the following ones.
Definition 5.2.12 A convex function $f : X = \mathbb{R}^n \to \overline{\mathbb{R}}$ is proper if it does not assume the value $-\infty$ and if moreover it does not only assume the value $+\infty$. Convex functions that are not proper are called improper.
In other words, a proper convex function on $X$ arises from a function $f : A \to \mathbb{R}$, where $A$ is a nonempty convex set in $X$, that satisfies Jensen’s inequality
by extending it to the entire space $X$, defining its value to be $+\infty$ everywhere outside $A$.
Example 5.2.13 (Improper Convex Function) What do improper convex functions look like? They can be characterized almost completely in a simple way. Assume that $A \subseteq X = \mathbb{R}^n$ is a closed convex set. Define the ‘almost identical’ functions $r_A^-, r_A^+ : X \to \overline{\mathbb{R}}$ by $r_A^-(x) = -\infty$ for all $x \in A$, $r_A^-(x) = +\infty$ for all $x \notin A$, and $r_A^+(x) = -\infty$ for all $x \in \mathrm{ri}(A)$, $r_A^+(x) = +\infty$ for all $x \notin \mathrm{ri}(A)$. Then $r_A^-, r_A^+$ are improper convex functions. In fact they are essentially the only improper convex functions. To be precise, a convex function $f : X \to \overline{\mathbb{R}}$ is improper iff one has, for $A = \mathrm{dom}(f)$, that $r^-_{\mathrm{cl}(A)} \le f \le r^+_{\mathrm{cl}(A)}$. Figure 5.5 illustrates this characterization of improper convex functions. We cannot avoid improper convex functions; this has been explained in Sect. 5.1.2.
Conclusion The most fruitful definition of a convex function is by means of its property that the epigraph is a convex set. One has to allow in the theory so-called improper convex functions on $X = \mathbb{R}^n$; these can be classified almost completely in terms of closed convex sets in $X$.
Fig. 5.5 Improper convex functions
5.3 Convex Function: Smoothness
Convex functions always have the following continuity property.
Proposition 5.3.1 Each proper convex function $f$ on $X = \mathbb{R}^n$ is continuous on the relative interior of its effective domain $\mathrm{dom}(f)$.
Proof We may assume wlog that the interior of the effective domain of $f$ is nonempty. This implies that the affine hull of $\mathrm{epi}(f)$ is $X \times \mathbb{R}$. Now we apply the theorem on the shape of a convex set, Theorem 3.5.7. It follows that $\mathrm{epi}(f)$ has a nonempty interior. For each point $x \in \mathrm{dom}(f)$, the point $(x, f(x))$ is a boundary point of the epigraph of $f$. Another application of Theorem 3.5.7 gives that the strict epigraph of the restriction of $f$ to $\mathrm{int}(\mathrm{dom}(f))$ is contained in the interior of $\mathrm{epi}(f)$. Now choose an arbitrary $v \in \mathrm{int}(\mathrm{dom}(f))$. To prove the proposition, it suffices to show that $f$ is continuous at $v$. Now we are going to apply Lemma 3.5.9. Let $A = \mathrm{epi}(f)$, $x = (v, f(v))$ and $a = (v, f(v) + 1)$. By what we have proved above, $a \in \mathrm{int}(A)$, so we can choose $\varepsilon > 0$ such that $U(a, \varepsilon) \subseteq A$. Now all conditions of Lemma 3.5.9 hold for these choices of $A, x, a, \varepsilon$. Therefore, the conclusion of this lemma holds. In terms of the function $f$, this says that for each $h \in U(0, \varepsilon)$ one has the inequality $|f(v + h) - f(v)| < \varepsilon^{-1}\|h\|$. This implies that $f$ is continuous at $v$.
Remark 5.3.2 This proof shows a stronger result than is stated: a proper convex function $f$ is Lipschitz continuous on each compact set $G$ inside the relative interior of its effective domain: there exists a constant $K > 0$ such that $|f(x) - f(y)| \le K\|x - y\|$ for all $x, y \in G$.
So, by Proposition 5.3.1, proper convex functions behave nicely on the relative interior of their effective domain. In contrast to this nice behavior on the relative interior, their behavior can be very rough on the relative boundary of their effective domain.
Example 5.3.3 (Convex Function that Behaves Wildly on the Boundary of Its Effective Domain) Any function $S_n \to [0, +\infty)$ on the standard unit sphere is the restriction of a convex function $B_n \to [0, +\infty)$ on the standard unit ball: define the function to be 0 everywhere in the interior of $B_n$. So a convex function can behave wildly on the boundary of its effective domain.
However, convex functions that occur in applications always behave nicely on the boundary of their effective domain. We need a definition to formulate precisely what we mean by nice behavior of a convex function on this boundary.
Definition 5.3.4 A convex function $f : \mathbb{R}^n \to [-\infty, +\infty]$ is closed or lower semicontinuous if the following equivalent properties hold:
1. the epigraph of $f$ is closed,
2. $\liminf_{x \to a} f(x) = f(a)$ for each point $a$ on the relative boundary of the effective domain of $f$.
We recall that $\liminf_{x \to a} f(x)$ is defined to be the limit in $\overline{\mathbb{R}}$ of $\inf\{f(x) \mid \|x - a\| < r\}$ for $r \downarrow 0$. It is equal to $\lim_{x \to a} f(x)$ if this limit exists in $\overline{\mathbb{R}}$.
Warning Sometimes, a different definition is given for closedness of improper functions. Then it is considered that the only improper convex functions that are closed are the constant functions $+\infty$ and $-\infty$.
All convex functions that occur in applications have the following two nice properties: they are proper and they are closed or lower semicontinuous. One can ‘make a convex function closed by changing its values on the relative boundary of its effective domain’: by taking the closure of its epigraph.
Proposition 5.3.5 Let $f : X = \mathbb{R}^n \to \overline{\mathbb{R}}$ be a convex function. Then the following statements hold.
1. The closure of the epigraph of $f$ is the epigraph of a convex function $\mathrm{cl}(f) : X \to \overline{\mathbb{R}}$, the maximal closed underestimator of $f$.
2. The functions $f$ and $\mathrm{cl}(f)$ agree on the relative interior of $\mathrm{dom}(f)$.
Suppose you are given a function by an explicit formula and you want to check whether it is convex. You will find that it is often not convenient to verify whether the epigraph is convex or whether Jensen’s inequality holds, even if it is a function of one variable and the formula for the function is simple. Here is a criterion for convexity of differentiable functions.
Proposition 5.3.6 Let an open convex set $A \subseteq \mathbb{R}^n$ and a differentiable function $f : A \to \mathbb{R}$ be given. Then $f$ is convex iff $f(x) - f(y) \ge f'(y)(x - y)$ for all $x, y \in A$.
This is already more convenient for checking convexity of a function than the definition.
Example 5.3.7 (A Nondifferentiable Convex Function: A Norm) Not all convex functions are differentiable everywhere.
The main basic example of a convex function that is not differentiable everywhere is a norm; for dimension one this gives the absolute value function $|x|$. Figure 5.6 illustrates this example. The graph of the Euclidean norm on $\mathbb{R}^n$ is drawn for $n = 1$ (this is the graph of the absolute value function) and for $n = 2$ (then the region above the graph is an ice cream cone).
Fig. 5.6 Nondifferentiable convex functions
The nondifferentiability of a convex function can never be very serious. For example, a proper closed convex function of one variable can have at most countably many points of its effective domain where it is nondifferentiable. Here is a criterion for convexity of twice differentiable functions. This is often very convenient, in particular for functions of one variable. A variant of it gives a sufficient condition for the following stronger concept than convexity. This stronger concept will be used when we consider optimization in Chap. 7.
Definition 5.3.8 A proper convex function is called strictly convex if its graph does not contain an interval of positive length.
Proposition 5.3.9 Assume that $A \subseteq \mathbb{R}^n$ is an open convex set. Then a twice differentiable function $f : A \to \mathbb{R}$ is convex iff its Hessian $f''(x)$ is positive semidefinite for each $x \in A$. Moreover, if $f''(x)$ is positive definite for all $x \in A$, then $f$ is strictly convex.
Example 5.3.10 (Second Order Conditions for (Strict) Convexity of a Function)
1. The function $f(x) = e^x$ is strictly convex as $f''(x) = e^x > 0$ for all $x$.
2. The function $f(x) = 1/(1 - x^2)$, $-1 < x < 1$, is strictly convex as $f''(x) = 2(1 - x^2)(1 + 3x^2)/(1 - x^2)^4 > 0$.
3. The function $f(x_1, x_2) = \ln(e^{x_1} + e^{x_2})$ is convex, as $f_{11} = f_{22} = e^{x_1 + x_2}(e^{x_1} + e^{x_2})^{-2} > 0$ and $f_{11} f_{22} - f_{12}^2 = 0$, but not strictly convex, as $f(x, x) = x + \ln 2$ and so the graph of $f$ contains an interval of positive length.
4. Let $f$ be a quadratic function of $n$ variables. Write $f(x) = x^\top C x + c^\top x + \gamma$ where $C \in S^n$, the symmetric $n \times n$ matrices, $c$ is a column vector of length $n$ and $\gamma$ is a real number, and where $x$ runs over $\mathbb{R}^n$. This function $f$ is convex iff the matrix $C$ is positive semidefinite, and strictly convex if $C$ is positive definite.
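The second order conditions in Example 5.3.10 can be checked mechanically for functions of two variables. The sketch below is an illustration under assumptions: the helper names are invented here, and it uses the elementary fact that a symmetric $2 \times 2$ matrix is positive semidefinite iff its trace and determinant are nonnegative, and positive definite iff both are positive.

```python
import math


def is_psd_2x2(h11, h12, h22, tol=1e-12):
    # Symmetric 2x2 matrix: positive semidefinite iff trace >= 0 and det >= 0.
    return h11 + h22 >= -tol and h11 * h22 - h12 * h12 >= -tol


def is_pd_2x2(h11, h12, h22):
    # Positive definite iff trace > 0 and det > 0.
    return h11 + h22 > 0 and h11 * h22 - h12 * h12 > 0


def hessian_log_sum_exp(x1, x2):
    # Entries (f11, f12, f22) of the Hessian of ln(e^x1 + e^x2), item 3 above.
    s = math.exp(x1) + math.exp(x2)
    p = math.exp(x1 + x2) / (s * s)
    return p, -p, p


grid = [(-2.0 + 0.5 * i, -2.0 + 0.5 * j) for i in range(9) for j in range(9)]
lse_psd_everywhere = all(is_psd_2x2(*hessian_log_sum_exp(a, b)) for a, b in grid)
lse_pd_somewhere = any(is_pd_2x2(*hessian_log_sum_exp(a, b)) for a, b in grid)

# A quadratic x^T C x with symmetric C has constant Hessian 2C (item 4 above).
quad_convex = is_psd_2x2(2 * 1.0, 2 * 1.0, 2 * 1.0)       # C = [[1, 1], [1, 1]]
quad_strict = is_pd_2x2(2 * 2.0, 2 * 1.0, 2 * 2.0)        # C = [[2, 1], [1, 2]]
quad_not_convex = is_psd_2x2(2 * 1.0, 2 * 2.0, 2 * 1.0)   # C = [[1, 2], [2, 1]]
```

On the grid, the Hessian of the log-sum-exp function is positive semidefinite everywhere but nowhere positive definite, in line with item 3: the function is convex without being strictly convex.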
You are invited to apply this result to a few one-dimensional examples in Exercise 25. For a challenging multidimensional example, see Exercise 27. This last exercise requires considerable algebraic manipulations. In the next chapter, we will show that this exercise can be done effortlessly as well, by virtue of a trick.
Conclusion A convex function is continuous on the relative interior (but not necessarily on the relative boundary) of its effective domain. A convex function is not necessarily differentiable at a relative interior point: the main example is a norm at the origin. If a function is twice differentiable, then there is a useful criterion for
its convexity in terms of its Hessian. This criterion gives an explicit description of the following nice explicit class of convex functions: the quadratic functions that are convex.
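The first order criterion of Proposition 5.3.6 can be probed numerically in the same spirit: evaluate the gap $f(x) - f(y) - f'(y)(x - y)$ on random pairs and look at its sign. This is a hedged sketch; `gradient_gap`, the sampling box and the non-convex comparison function $x_1 x_2$ are assumptions made for illustration.

```python
import math
import random


def gradient_gap(f, grad, x, y):
    # f(x) - f(y) - f'(y)(x - y); Proposition 5.3.6 requires this to be >= 0.
    return f(x) - f(y) - sum(g * (xi - yi) for g, xi, yi in zip(grad(y), x, y))


def lse(x):                          # ln(e^x1 + e^x2), convex
    return math.log(math.exp(x[0]) + math.exp(x[1]))


def lse_grad(x):
    s = math.exp(x[0]) + math.exp(x[1])
    return [math.exp(x[0]) / s, math.exp(x[1]) / s]


def prod(x):                         # x1 * x2, not convex
    return x[0] * x[1]


def prod_grad(x):
    return [x[1], x[0]]


rng = random.Random(0)
pairs = [([rng.uniform(-3, 3), rng.uniform(-3, 3)],
          [rng.uniform(-3, 3), rng.uniform(-3, 3)]) for _ in range(5000)]
min_gap_lse = min(gradient_gap(lse, lse_grad, x, y) for x, y in pairs)
min_gap_prod = min(gradient_gap(prod, prod_grad, x, y) for x, y in pairs)
```

For $x_1 x_2$ the gap simplifies to $(x_1 - y_1)(x_2 - y_2)$, which takes negative values, so the criterion rejects it; for the log-sum-exp function no negative gap appears beyond rounding error.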
5.4 Convex Function: Homogenization
The aim of this section is to present the homogenization or conification of a convex function. We are going to show that every convex function $f : X \to \overline{\mathbb{R}}$, where $X = \mathbb{R}^n$, can be described by a convex cone in two dimensions higher, $X \times \mathbb{R} \times \mathbb{R} = \mathbb{R}^{n+2}$, that lies entirely on or above the horizontal coordinate hyperplane $x_{n+2} = 0$, that is, it is contained in the closed half-space $x_{n+2} \ge 0$, and that remains a convex cone if you adjoin to it the positive $(n + 1)$st coordinate axis, that is, $\mathbb{R}_{++}(0_n, 1, 0)$.
Example 5.4.1 (Homogenization Convex Function) Figure 5.7 illustrates the idea of homogenization of convex functions for a convex function $f$ of one variable. In this figure, the graph of a convex function $f : \mathbb{R} \to \mathbb{R}$ is drawn, as a subset of the floor. Its strict epigraph $\mathrm{epi}_s(f)$ is shaded. This is the region strictly above the graph of $f$. This region is a convex set in the plane $\mathbb{R}^2$. Then the conification of this convex set is taken. We recall how this is done. The strict epigraph of $f$ is lifted up in space $\mathbb{R}^3$ in vertical direction to level 1. Then the union is taken of all those open rays that originate from the origin and that run through a point on the lifted up copy of the strict epigraph of $f$. This convex cone is the promised conification of the convex set $\mathrm{epi}_s(f)$; we define this convex cone to be the homogenization or conification of $f$ and we denote it by $C = c(f)$. Note that $c(f)$ remains a convex cone if you adjoin the ray $\{(0, \rho, 0) \mid \rho > 0\}$ to it. The convex function $f$ is called the (convex function) dehomogenization or deconification of the convex cone $C$.
Fig. 5.7 Homogenization of a convex function
Note that there is no loss of information if you go over from a convex function $f$ to its homogenization: you can retrieve $f$ from $c(f)$. Just intersect $c(f)$ with the horizontal plane at level 1 and drop the intersection down in vertical direction
on the floor. This gives the strict epigraph of $f$, from which the function $f$ can be recovered.
Now we give a formal definition of the description of a convex function $f : X \to \overline{\mathbb{R}}$, where $X = \mathbb{R}^n$, in terms of a convex cone. Again, just as for the homogenization of a convex set, it is convenient to give first the definition of deconification and then the definition of conification. Let $C \subseteq X \times \mathbb{R} \times \mathbb{R} = \mathbb{R}^{n+2}$ be a convex cone for which $C + \mathbb{R}_+(0_X, 1, 0) \subseteq C \subseteq X \times \mathbb{R} \times \mathbb{R}_+$. Note that this implies the following condition (and if $C$ is closed it is even equivalent to it): $\mathbb{R}_+(0_X, 1, 0) \subseteq \mathrm{cl}(C) \subseteq X \times \mathbb{R} \times \mathbb{R}_+$. Then $C \cap (X \times \mathbb{R} \times \{1\}) = A \times \{1\}$ for some convex set $A \subseteq X \times \mathbb{R}$ that has $(0_X, 1)$ as a recession vector. Then this set $A$ is the convex set deconification of the convex cone $C$. Consider the function $f : X \to \overline{\mathbb{R}}$ that is defined by $f(x) = \inf\{\rho \mid (x, \rho) \in A\}$ for all $x \in X$. This function $f$ can be expressed directly in terms of the convex cone $C$ by $f(x) = \inf\{\rho \mid (x, \rho, 1) \in C\}$ for all $x \in X$. This function is readily verified to be convex. All convex functions $f$ on $X$ are the dehomogenization of some convex cone $C \subseteq X \times \mathbb{R} \times \mathbb{R} = \mathbb{R}^{n+2}$ for which $C + \mathbb{R}_+(0_X, 1, 0) \subseteq C \subseteq X \times \mathbb{R} \times \mathbb{R}_+$.
Definition 5.4.2 The (convex function) dehomogenization or deconification of a convex cone $C \subseteq X \times \mathbb{R} \times \mathbb{R} = \mathbb{R}^{n+2}$ for which $C + \mathbb{R}_+(0_X, 1, 0) \subseteq C \subseteq X \times \mathbb{R} \times \mathbb{R}_+$ is the convex function $f = d(C) : X \to \overline{\mathbb{R}}$ given by $f(x) = \inf\{\rho \mid (x, \rho, 1) \in C\}$ for all $x \in X$.
Then the convex cone C is called a (convex function) homogenization or conification of the convex function f. There is no loss of information if we go over from a convex function f to a homogenization C: we can recover f from C by f = d(C), that is, f(x) = inf{ρ | (x, ρ, 1) ∈ C} ∀x ∈ X.

Example 5.4.3 (Deconification) The convex cones
C1 = {x ∈ R^3 | x_1^2 + x_2^2 ≤ x_3^2, x_2 ≤ 0, x_3 ≥ 0},
C2 = {x ∈ R^3 | x_2 x_3 ≥ x_1^2, x_3 ≥ 0},
C3 = {x ∈ R^3 | x_1 x_2 ≥ x_3^2, x_3 ≥ 0}
have as deconifications the convex functions that have as graphs a half-circle, a parabola, and a branch of a hyperbola, respectively; their function recipes are x_1 → −√(1 − x_1^2), where −1 ≤ x_1 ≤ 1, x_1 → x_1^2, where x_1 ∈ R, and x_1 → 1/x_1, where x_1 > 0, respectively.
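As a numerical illustration of Definition 5.4.2, the following Python sketch approximates f(x) = inf{ρ | (x, ρ, 1) ∈ C} for the three cones of Example 5.4.3 by searching over a finite grid of values of ρ. The function names, the search grid and the sample points are assumptions made for this illustration only; the result is a grid approximation and not an exact infimum.

```python
import numpy as np

def deconify(in_cone, x, rhos):
    """Approximate d(C)(x) = inf{rho | (x, rho, 1) in C} over the grid rhos."""
    feasible = [rho for rho in rhos if in_cone(x, rho, 1.0)]
    return float(min(feasible)) if feasible else float("inf")

# Membership tests for the cones C1, C2, C3 of Example 5.4.3.
def in_C1(x1, x2, x3):
    return x1**2 + x2**2 <= x3**2 and x2 <= 0 and x3 >= 0

def in_C2(x1, x2, x3):
    return x2 * x3 >= x1**2 and x3 >= 0

def in_C3(x1, x2, x3):
    return x1 * x2 >= x3**2 and x3 >= 0

rhos = np.linspace(-5.0, 5.0, 10001)  # hypothetical search grid, step 0.001

# Each value is compared with the function recipe stated in the example.
half_circle = deconify(in_C1, 0.6, rhos)  # recipe: -sqrt(1 - 0.6**2) = -0.8
parabola = deconify(in_C2, 1.5, rhos)     # recipe: 1.5**2 = 2.25
hyperbola = deconify(in_C3, 2.0, rhos)    # recipe: 1 / 2.0 = 0.5 (sample point with x1 > 0)
```

Outside the interval −1 ≤ x_1 ≤ 1 no point (x_1, ρ, 1) lies in C1, so the sketch returns +∞ there, in agreement with the convention that the infimum of the empty set is +∞.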
We have already stated the fact that each convex function f on X is the (convex function) deconification of a suitable convex cone C ⊆ X × R × R = R^{n+2} for which C + R_+(0_X, 1, 0) ⊆ C ⊆ X × R × R_+. This convex cone is not unique. The following result describes all conifications of a given convex function f on X.

Proposition 5.4.4 Let f be a convex function on X = R^n. The conifications C ⊆ X × R × R_+ of f are the intermediate convex cones of the following two convex cones: the minimal conification
c_min(f) = R_{++}(epi_s(f) × {1}) = {ρ(x, σ, 1) | ρ ∈ R_{++}, x ∈ dom(f), σ > f(x)},
which is contained in X × R × R_{++}, and the maximal conification
c_max(f) = R_{++}(epi(f) × {1}) ∪ (R_{epi(f)} × {0}),
where R_{epi(f)} denotes the recession cone of epi(f).

So the minimal conification of a convex function f on X = R^n is the convex set conification of the strict epigraph of f, lifted up to level 1 in an additional dimension; the maximal conification is the union of the convex set conification of the epigraph of f, lifted up to level 1 of the additional dimension, and the recession cone of the epigraph of f, on level 0 of the additional dimension.
This result follows immediately from the definitions. We choose the minimal homogenization of f to be our default homogenization. We write c(f ) = cmin (f ) and we call c(f ) the homogenization of f . Conclusion A convex function can be described in terms of a convex cone: by taking the conification of the convex set which is the strict epigraph of the convex function.
5.5 Image and Inverse Image of a Convex Function

The aim of this section is to give the following two ways to make a new convex function from an old one by means of a linear function: by taking the image and by taking the inverse image. Let M : R^n → R^m be a linear function, that is, essentially an m × n matrix.

Definition 5.5.1 The inverse image fM of a convex function f on R^m under M is the convex function on R^n given by (fM)(x) = f(M(x)) ∀x ∈ R^n.

This definition shows that the chosen notation for inverse images of convex sets and functions is natural and convenient. Note that taking the inverse image of a convex function does not preserve properness.

Definition 5.5.2 The image Mg of a convex function g on R^n under M is the convex function on R^m given by (Mg)(y) = inf{g(x) | M(x) = y} ∀y ∈ R^m.

In terms of the reduction of convex functions to convex cones, the image and inverse image can be characterized as follows. If C is a conification of f, then CM is a conification of fM. To be more precise, CM is the inverse image of C under the linear function R^{n+2} → R^{m+2} given by the recipe (x, α, β) → (Mx, α, β). If C is a conification of g, then MC is a conification of Mg. Again, to be more precise, MC is the image of C under the linear function R^{n+2} → R^{m+2} given by the recipe (x, α, β) → (Mx, α, β).
In terms of the reduction of convex functions to convex sets (their epigraph or strict epigraph), the image and inverse image can be characterized as follows. For the inverse image both epigraph and strict epigraph work fine: epi(fM) = (epi f)M and epi_s(fM) = (epi_s f)M. For the image, only the strict epigraph works fine: epi_s(Mg) = M(epi_s(g)). This is a first example where the strict epigraph is more convenient than the epigraph. It is the reason why the strict epigraph is chosen to define the homogenization of a convex function.

Warning The properties properness and closedness are in general not preserved under taking images and inverse images; the only exception is that closedness is preserved under inverse images.
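To make Definitions 5.5.1 and 5.5.2 concrete, here is a small Python sketch for one specific linear function, M(x_1, x_2) = x_1 + x_2 from R^2 to R. The choice of M, the test functions f and g, the helper names and the grid are all assumptions made for this illustration; the image is only approximated, by a minimum over a finite grid on the fibre {x | Mx = y}.

```python
import numpy as np

def inverse_image(f, M):
    """Return the function fM : x -> f(Mx)."""
    M = np.asarray(M, dtype=float)
    return lambda x: f(M @ np.asarray(x, dtype=float))

def image_under_sum(g, y, us):
    """Approximate (Mg)(y) = inf{g(x) | Mx = y} for M(x1, x2) = x1 + x2.

    The fibre {x | x1 + x2 = y} is parametrized as x = (u, y - u);
    the infimum is approximated by a minimum over the grid us.
    """
    return float(min(g(np.array([u, y - u])) for u in us))

M = np.array([[1.0, 1.0]])            # the linear function R^2 -> R, x -> x1 + x2
f = lambda v: float(v[0] ** 2)        # convex function on R (argument is a 1-vector)
g = lambda x: float(x @ x)            # convex function on R^2: g(x) = x1^2 + x2^2

fM = inverse_image(f, M)              # (fM)(x) = (x1 + x2)^2
us = np.linspace(-10.0, 10.0, 2001)   # hypothetical grid for the fibre parameter u
Mg_at_2 = image_under_sum(g, 2.0, us) # closed form for this g: y^2 / 2, so 2 at y = 2
```

For this particular g the minimum over the fibre is attained at u = y/2, which is why the closed form y^2/2 is available for comparison.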
5.6 Binary Operations for Convex Functions

The aim of this section is to construct the five standard binary operations for convex functions in a systematic way, by the homogenization method. So far we did not need homogenization of convex functions: it was enough to describe convex functions by means of their epigraph or strict epigraph. But in this section, this is not enough: we do need homogenization, to reduce all the way to convex cones. Here are five standard binary operations for convex functions, that is, rules to make a new convex function X = R^n → R from two old ones.

Definition 5.6.1
• f + g (pointwise sum): (f + g)(x) = f(x) + g(x) for all x ∈ X,
• f ∨ g (pointwise maximum): (f ∨ g)(x) = max(f(x), g(x)) for all x ∈ X,
• f co∧ g (convex hull of minimum): the maximal convex underestimator of f and g, that is, the strict epigraph of f co∧ g is the convex hull of the strict epigraph of f ∧ g = min(f, g), the pointwise minimum of f and g,
• f ⊕ g (infimal convolution): (f ⊕ g)(x) = inf{f(x_1) + g(x_2) | x = x_1 + x_2} for all x ∈ X,
• f # g (Kelley's sum): (f # g)(x) = inf{f(x_1) ∨ g(x_2) | x = x_1 + x_2} for all x ∈ X, where a ∨ b = max(a, b) for a, b ∈ R.
Fig. 5.8 Two binary operations (panels: f ∨ g and f co∧ g)
Note that for two of these operations, pointwise sum and pointwise maximum, the value of the new function at a point depends only on the values of f and g at this same point.

Example 5.6.2 (Binary Operations for Convex Functions (Geometric)) Figure 5.8 illustrates the pointwise maximum and the convex hull of the pointwise minimum. To get the graph of f ∨ g one should follow everywhere the highest one of the graphs of f and g. To get the graph of f co∧ g, one should first follow everywhere the lowest one of the graphs of f and g; then one should make the resulting graph convex, by taking the graph of the highest convex minorizing function.

Example 5.6.3 (Binary Operations for Convex Functions (Analytic)) Consider the two convex functions f(x) = x^2 and g(x) = (x − 1)^2 = x^2 − 2x + 1.
1. (f + g)(x) = x^2 + (x − 1)^2 = 2x^2 − 2x + 1,
2. (f ∨ g)(x) = (x − 1)^2 = x^2 − 2x + 1 for x ≤ 1/2 and = x^2 for x ≥ 1/2,
3. (f co∧ g)(x) = x^2 for x ≤ 0, = 0 for 0 ≤ x ≤ 1, and = (x − 1)^2 = x^2 − 2x + 1 for x ≥ 1.
4. (f ⊕ g)(x) = inf_u (u^2 + (x − u − 1)^2) by definition; by completion of the square of the quadratic function of u inside the brackets, this is equal to
inf_u [2(u − (1/2)(x − 1))^2 + (1/2)(x − 1)^2] = (1/2)(x − 1)^2 = (1/2)x^2 − x + 1/2.
5. (f # g)(x) = inf_u (u^2 ∨ (x − u − 1)^2) by definition; standard knowledge of parabolas gives that this is equal to the common value of the functions u^2 and (x − u − 1)^2 at the u for which these two values are equal, u = (1/2)(x − 1), so this is equal to ((1/2)(x − 1))^2 = (1/4)x^2 − (1/2)x + 1/4.

Example 5.6.4 (The Convenience of Working with Epigraphs) As an illustration of the convenience of working with epigraphs or strict epigraphs of convex functions rather than with Jensen's inequality, we verify for one of the binary operations, the pointwise maximum, that this operation is well defined, that is, that it turns two old convex functions into a new convex function.
Proof As the functions f, g are convex, the sets epi(f ), epi(g) are convex. Therefore the intersection epi(f )∩epi(g) is convex. This intersection is by definition of the binary operation max equal to epi(f ∨g). So the function f ∨g is convex.
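The closed forms of Example 5.6.3 can be checked numerically. The Python sketch below evaluates the five operations of Definition 5.6.1 for f(x) = x^2 and g(x) = (x − 1)^2 by brute force over finite grids. The grids and function names are assumptions made for this illustration. For the convex hull of the minimum it uses the fact that the convex hull of the union of two convex sets consists of the convex combinations (1 − t)a + tb of one point from each set, so the computed values are upper approximations of the infima, not exact values.

```python
import numpy as np

f = lambda x: x**2
g = lambda x: (x - 1.0)**2
U = np.linspace(-5.0, 5.0, 2001)  # hypothetical grid for the free variable u

def pointwise_sum(x):
    return f(x) + g(x)

def pointwise_max(x):
    return max(f(x), g(x))

def inf_convolution(x):
    # inf over u of f(u) + g(x - u), approximated on the grid U
    return float(np.min(f(U) + g(x - U)))

def kelley_sum(x):
    # inf over u of max(f(u), g(x - u)), approximated on the grid U
    return float(np.min(np.maximum(f(U), g(x - U))))

def hull_of_min(x):
    # inf of (1 - t) f(x1) + t g(x2) subject to (1 - t) x1 + t x2 = x, 0 <= t <= 1;
    # the endpoints t = 0 and t = 1 give f(x) and g(x).
    best = min(f(x), g(x))
    for t in np.linspace(0.01, 0.99, 99):
        x2 = (x - (1.0 - t) * U) / t
        best = min(best, float(np.min((1.0 - t) * f(U) + t * g(x2))))
    return best

# Compare with the closed forms of Example 5.6.3:
#   pointwise_sum(x)    ~ 2x^2 - 2x + 1
#   pointwise_max(x)    ~ (x - 1)^2 for x <= 1/2, x^2 for x >= 1/2
#   hull_of_min(x)      ~ 0 for 0 <= x <= 1
#   inf_convolution(x)  ~ (x - 1)^2 / 2
#   kelley_sum(x)       ~ (x - 1)^2 / 4
```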
Warning The properties closedness and properness are not always preserved under binary operations. For example, if A1, A2 are nonempty disjoint convex sets in R^n and f1 : A1 → R, f2 : A2 → R are convex functions, then f1 + f2 has effective domain A1 ∩ A2 = ∅, and so it is not proper.

Now a systematic construction is given of the five binary operations of Definition 5.6.1 by the homogenization method. That this can be done is an advantage of the homogenization method. It will be useful later, when we will give for convex functions the convex calculus rules for their dual description and for their subdifferentials. Recall at this point how this was done for convex sets. Here the procedure will be similar. The precise way to construct a binary operation on convex functions by homogenization is as follows. We take the space that contains the homogenizations of convex functions on X (we recall that X is essentially R^n): this space is the product space X × R × R. Then we make for each of the three factors of this space (X, the first factor R, and the second factor R) a choice between + and −. Let the choice for X be ε1, let the choice for the first factor R be ε2, and let the choice for the second factor R be ε3. This gives 2^3 = 8 possibilities for (ε1, ε2, ε3): (+ + +), (+ + −), (+ − +), (+ − −), (− + +), (− + −), (− − +), (− − −). Each choice (ε1, ε2, ε3) will lead to a binary operation ◦(ε1,ε2,ε3) on convex functions. Let (ε1, ε2, ε3) be one of the eight choices, and let f and g be two 'old' convex functions on X. We will use the homogenization method to construct a 'new' convex function f ◦(ε1,ε2,ε3) g on X.

1. Conify. We conify f and g. This gives conifications c(f) and c(g) respectively. These are convex cones C in X × R × R = R^{n+2} for which C + R_+(0_X, 1, 0) ⊆ C ⊆ X × R × R_+.

2. Work with convex cones. We want to associate to the old convex cones C = c(f) and D = c(g) a new convex cone E in X × R × R = R^{n+2} for which E + R_+(0_X, 1, 0) ⊆ E ⊆ X × R × R_+. We take the Cartesian product C × D. This is a convex cone in X × R × R_+ × X × R × R_+. We rearrange its factors: (X × X) × (R × R) × (R_+ × R_+). This has three factors: the first factor is X × X, the second factor is R × R and the third factor is R_+ × R_+. Now we view C × D as a subset of (X × X) × (R × R) × (R_+ × R_+). Then we act on C × D as follows, using the triplet (ε1, ε2, ε3) that we have chosen. Recall that +_Y : Y × Y → Y is the sum map (u, v) → u + v and that Δ_Y : Y → Y × Y is the diagonal map u → (u, u). We define the promised new cone E to be the outcome of applying simultaneously three operations on C × D: the first operation is taking the image under the mapping +_X if ε1 = +, but it is the inverse image under the mapping Δ_X if ε1 = −; the second operation is taking the image under the mapping +_R on the second
factor if ε2 = +, but it is taking the inverse image under the mapping Δ_R on the second factor if ε2 = −; the third operation is taking the image under the mapping +_R on the third factor if ε3 = +, but it is taking the inverse image under the mapping Δ_R on the third factor if ε3 = −. The outcome E is a convex cone in X × R × R = R^{n+2} for which E + R_+(0_X, 1, 0) ⊆ E ⊆ X × R × R_+.

3. Deconify. We take the convex function deconification of E to be f ◦(ε1,ε2,ε3) g.

Example 5.6.5 (The Binary Operation ◦(−+−) for Convex Functions) We will only do one of the eight possibilities: as a specific example, we choose − for the factor X, + for the first factor R, and − for the second factor R. Then we form the triplet (− + −). This triplet will determine a binary operation (f, g) → f ◦(−+−) g on convex functions on X. At this point it remains to be seen which binary operation we will get. To explain what this binary operation is, let f and g be two 'old' convex functions on X. We have to make a new convex function f ◦(−+−) g on X from f and g. We use the homogenization method.

1. Conify. We conify f and g. This gives conifications c(f) and c(g) respectively. These are convex cones F in X × R × R = R^{n+2} for which F + R_+(0_X, 1, 0) ⊆ F ⊆ X × R × R_+.

2. Work with convex cones. We want to associate to the old convex cones C = c(f) and D = c(g) a new convex cone E in X × R × R = R^{n+2} for which E + R_+(0_X, 1, 0) ⊆ E ⊆ X × R × R_+. We take the Cartesian product C × D. This is a convex cone in X × R × R_+ × X × R × R_+. We rearrange its factors: (X × X) × (R × R) × (R_+ × R_+). This has three factors: the first factor is X × X, the second factor is R × R and the third factor is R_+ × R_+. Now we view C × D as a subset of (X × X) × (R × R) × (R_+ × R_+). Then we act on C × D as follows, using the triplet (ε1, ε2, ε3) = (− + −) that we have chosen. We define the promised new cone E to be the outcome of applying simultaneously the inverse image under Δ_X on the first of the three factors of (X × X) × (R × R) × (R_+ × R_+) (as the triplet (− + −) has − in the first place), the image under +_R on the second of the three factors (as the triplet (− + −) has + in the second place), and the inverse image under Δ_R on the third of the three factors (as the triplet (− + −) has − in the third place).
The outcome E is a convex cone in X × R × R = R^{n+2} for which E + R_+(0_X, 1, 0) ⊆ E ⊆ X × R × R_+. Explicitly, E consists of all elements (z, γ, τ) ∈ X × R × R for which there exists an element ((x, α, ρ), (y, β, σ)) ∈ C × D such that Δ_X(z) = (x, y), γ = +_R(α, β), Δ_R(τ) = (ρ, σ), that is, z = x = y, γ = α + β and τ = ρ = σ. One can use this description to check that E is a convex cone in X × R × R for which the closure is intermediate between R_+(0_X, 1, 0) and X × R × R_+. Now we use that C = c(f) and D = c(g). It follows that E consists of all triplets (z, γ, τ) ∈ X × R × R for which there exists an element (s(x, ξ, 1), t(y, η, 1)) = (sx, sξ, s, ty, tη, t) ∈ c(f) × c(g) with s, t > 0, (x, ξ) ∈ epi_s(f), (y, η) ∈ epi_s(g) for which z = sx = ty, γ = sξ + tη, τ = s = t. Thus E consists of the triplets (τx, τ(ξ + η), τ) with τ > 0, (x, ξ) ∈ epi_s(f), (x, η) ∈ epi_s(g). That is, E = c(f + g), the homogenization of the pointwise sum f + g.

3. Deconify. The deconification of E = c(f + g) is f + g. Therefore, the choice of (− + −) leads to the binary operation pointwise sum, f ◦(−+−) g = f + g for all convex functions f, g on X = R^n.

It is recommended that you repeat this construction, but now for some of the other triplets. Here is a list showing which choice in the construction leads to which one of the standard binary operations pointwise sum, pointwise maximum, convex hull of the pointwise minimum and infimal convolution (we do not display the choice that gives Kelley's sum, as this binary operation is rarely used).

Proposition 5.6.6 The four well-known binary operations for convex functions are obtained by the following choices of triplets in the construction by the homogenization method:
1. (− + −) gives pointwise sum f + g,
2. (− − −) gives pointwise maximum f ∨ g,
3. (+ + +) gives convex hull of the pointwise minimum f co∧ g,
4. (+ + −) gives infimal convolution f ⊕ g.

It is satisfying to see that the four well-known binary operations on convex functions ('pointwise sum', 'pointwise maximum', 'convex hull of the pointwise minimum', 'infimal convolution'), which are traditionally given by completely different looking explicit formulas, are all generated in a systematic way by the homogenization method. We saw this phenomenon, that all classic explicit formulas of convex analysis can be generated systematically by the same method, already in action for the binary operations for convex sets. Moreover, we see that there are four choices of triplets left. One of these leads to Kelley's sum and the other three lead to three nameless binary operations on convex functions for which so far no applications have been found. The formulas for these binary operations will not be displayed, but you are welcome to derive the formula for one such binary operation.

The eight binary operations on convex functions can be denoted in a systematic way as ◦(ε1 ε2 ε3) where εi ∈ {+, −} for i = 1, 2, 3. The binary operation that is obtained by choosing the triplet (α1, α2, α3) with α1 ∈ {+_X, Δ_X} and α2, α3 ∈ {+_R, Δ_R} is denoted by ◦(ε1 ε2 ε3), where (ε1 ε2 ε3) is made from (α1, α2, α3) by replacing +_Y by + and Δ_Y by − (where Y = X or Y = R). For example, ◦(−+−) denotes pointwise sum: this binary operation is obtained if the triplet (Δ_X, +_R, Δ_R) is chosen; then we replace Δ_X by −, +_R by +, and Δ_R by −, and this gives the triplet (− + −).

Conclusion The five standard binary operations for convex functions are defined by completely different formulas, but they are all generated in the same systematic way, by the homogenization method.
5.7 Recession Cone of a Convex Function

The aim of this section is to define the recession cone of a convex function. The recession cone of the epigraph of a proper convex function on X = R^n always contains the vector (0_X, 1). This is not interesting. Therefore, we define recession vectors of a convex function f by taking the vertical projections on X of the recession vectors of the epigraph of f that do not point upward, that is, their last coordinate should be nonpositive.

Definition 5.7.1 For each convex function f on R^n and each real number γ the sublevel set of f for level γ is the following convex set: V_γ = {x | f(x) ≤ γ}.

Definition 5.7.2 The recession cone R_f of a closed proper convex function f on X = R^n is the recession cone of a nonempty sublevel set of f.
Fig. 5.9 Recession cone of a convex function
This concept is well defined: which nonempty sublevel set is chosen does not matter. Figure 5.9 illustrates the concept of recession cone of a convex function. Thus, the recession cone of a convex function is defined to be 'the interesting part' of the recession cone of the epigraph of the function. Here is the main application of this concept.

Proposition 5.7.3 If f is closed and R_f = {0}, then f assumes its minimum.

It might be surprising that this somewhat artificial looking concept can be defined in a natural way, as a dual concept to the concept of effective domain of a convex function. We will see this in Chap. 6.
5.8 *Applications The aim of this section is to give applications of the concept of convex function.
5.8.1 Description of Convex Sets by Convex Functions

The aim of this section is to give further explanation that convex functions allow us to describe some convex sets. By the second order criterion for convexity, one can give explicit examples of convex functions of one variable such as x^2, −x^{1/2}, 1/x, e^x, −ln x (x > 0). For such a function f, its epigraph {(x, y) | y ≥ f(x)} is a convex set in the plane R^2. More generally, the second order conditions can be used to verify the convexity of functions of several variables such as f(x) = x^T A x + a^T x + α where A is a positive semidefinite n × n matrix, a is an n-column and α a real number. Besides, each norm on R^n is a convex function. By repeatedly applying binary operations one gets many more convex functions. Then for each convex function f of n variables constructed explicitly, one gets an explicit construction of a convex set: its epigraph.

Conclusion One can give explicit examples of convex sets by constructing convex functions and taking their epigraphs.
Fig. 5.10 Risk averse
5.8.2 Application of Convex Functions that Are Not Specified The aim of this section is to explain why, in some valuable applications in economic theory, convex functions occur that are not specified. Suppose that someone expects to receive an amount of money a with probability p ∈ [0, 1] and an amount of money b with probability q = 1−p. Then the expected amount of money he will receive is pa + qb. If he is risk averse, then he would prefer to receive an amount of money slightly less than pa + qb for sure. This can be modeled by means of a utility function U of the person: a prospect of getting the amount c for sure is equivalent to the prospect above if U (c) = pU (a) + qU (b). It is reasonable to require that a utility function is increasing: ‘the more the better’. Moreover, it should be concave (that is −U should be convex), as the person is risk averse. Figure 5.10 illustrates the modeling of risk averse behavior by concavity of the utility function. Thus risk averse behavior is modeled by a convex function that is not specified. In economic theory, many problems are modeled by convex functions and analyzed fruitfully, although these functions are not specified by a formula; one only assumes these functions to be convex or concave. Conclusion The property of a function to be convex or concave corresponds sometimes to desired properties in economic models. In the analysis of these models these properties are assumed and used in the analysis, without a specification of the functions by a formula.
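Although the text deliberately leaves U unspecified, a single concrete choice may help to see the effect of concavity. The Python sketch below assumes the utility function U(c) = √c (increasing and concave on c ≥ 0); this choice, the amounts and the probability are hypothetical and serve only as an illustration. It solves U(c) = pU(a) + qU(b) for the sure amount c and compares c with the expected amount pa + qb.

```python
import math

def certainty_equivalent(U, U_inverse, a, b, p):
    """Return the sure amount c with U(c) = p*U(a) + q*U(b), where q = 1 - p."""
    q = 1.0 - p
    return U_inverse(p * U(a) + q * U(b))

# Hypothetical data: utility U(c) = sqrt(c), amounts a = 100 and b = 0, probability p = 1/2.
a, b, p = 100.0, 0.0, 0.5
c = certainty_equivalent(math.sqrt, lambda u: u**2, a, b, p)
expected_amount = p * a + (1.0 - p) * b

# For a concave U, Jensen's inequality gives U(pa + qb) >= pU(a) + qU(b) = U(c),
# so c <= pa + qb because U is increasing.
risk_averse = c <= expected_amount
```

With these numbers the sure amount c is 25, well below the expected amount 50; a different concave U would give a different c but the same inequality.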
5.9 Exercises

1. Show that every linear function is convex.
2. Show that −ln det(X), where X runs over the positive definite n × n matrices, is a convex function.
3. Show that if f, g : R → R are convex functions of one variable and g is nondecreasing, then h(x) = g(f(x)) is convex.
4. Show that a convex function f : [0, +∞) → R with f(0) ≤ 0 has the following property: f(x) + f(y) ≤ f(x + y) for all x, y.
5. Prove for a convex function f on [a, b] (where a < b) Hadamard's inequality
f((a + b)/2) ≤ (1/(b − a)) ∫_a^b f(x) dx ≤ (f(a) + f(b))/2.
6. Let f be a proper convex function on X = R^n. Show that the perspective function g(x, t) = t f(x/t) with effective domain {(x, t) | x/t ∈ dom(f), t > 0} is convex. Relate the perspective function to the homogenization of f.
7. *Show that a closed convex set A ⊆ R^n that has a nonzero recession vector v for which −v is no recession vector can be viewed as the epigraph of a function. Hint. Assume wlog that v is the vertical vector (0, 0, . . . , 0, 1).
8. Show that for a convex set A ⊆ X = R^n and a function f : A → R which has a convex epigraph, the solution set of the inequality f(x) ≤ γ is a convex set in X for each real number γ. Do the same for the strict inequality f(x) < γ.
9. Show how you can retrieve a function f : X = R^n → R from its epigraph by means of an infimum operation. This shows that you do not lose information if you pass from a function to its epigraph.
10. *Show that a function f : X = R^n → R is convex iff its strict epigraph epi_s(f) is a convex set in X × R.
11. Show that the effective domain of a convex function is a convex set.
12. Show that the definition of a proper convex function by Jensen's inequality is equivalent to the definition in terms of its epigraph.
13. *Prove the statements in Example 5.2.13.
14. Let a function f : X = R^n → R be given. Show that f is convex iff the restriction of f to each line L in X is a convex function.
15. **Show that each norm on X = R^n is a convex function and characterize epigraphs of norms as convex cones in X × R by suitable axioms.
16. Prove that each proper convex function is continuous on the relative interior of its effective domain.
17. *Give an example of a convex function f : A → R for which A is the unit disk x_1^2 + x_2^2 ≤ 1 and f is not continuous at any boundary point.
18. Prove Proposition 5.3.5.
19. Prove that the function x^2 is convex by verifying that its epigraph is convex and that Jensen's inequality holds. Note that even for this simple example, these methods to verify convexity of a function are not very convenient.
20. Prove Proposition 5.3.6.
21. Prove that the function x^2 is convex by the first order criterion for convexity of differentiable functions. Note that this is more convenient than using one of the two definitions of convex functions: by the convexity of the epigraph or by Jensen's inequality.
22. *Show that no norm on X = R^n is differentiable at the origin.
23. **Show that a convex function f : R → R has at most countably many points of nondifferentiability. Hints. Show that for each a ∈ R the left and right derivative of f in a, f'_−(a) and f'_+(a) respectively, are defined and that f'_−(a) ≤ f'_+(a). Then show that a < b implies f'_+(a) ≤ f'_−(b). Finally prove and apply the fact that each uncountable collection of positive numbers has a countable subcollection with sum +∞.
24. Prove Proposition 5.3.9.
25. *Show that the following functions are convex:
(a) ax^2 + bx + c (for a ≥ 0),
(b) e^x,
(c) −ln x, x > 0,
(d) x ln x, x > 0,
(e) x^r, x ≥ 0 (for r ≥ 1),
(f) −x^r, x > 0 (for 0 ≤ r ≤ 1),
(g) x^r, x > 0 (for r ≤ 0),
(h) |x| + |x − a|,
(i) |x − 1|,
(j) √(1 + x^2),
(k) −ln(1 − e^x),
(l) ln(1 + e^x).
26. *Consider a quadratic function f of n variables. That is, consider a function of the form f(x) = x^T C x + c^T x + γ, where C ∈ S^n, c ∈ R^n and γ ∈ R. Use the second order criterion to show that the function f is convex iff C is positive semidefinite. Moreover, show that f is strictly convex iff C is positive definite.
27. **Show by means of the second order conditions that the function ln(e^{x_1} + · · · + e^{x_n}) is convex.
28. Show how to retrieve a convex function from its conification.
29. *Use conification to prove Jensen's inequality for a convex function f : X = R^n → R,
f(∑_{i=1}^m α_i x_i) ≤ ∑_{i=1}^m α_i f(x_i)
for α_i ≥ 0, ∑_{i=1}^m α_i = 1, x_i ∈ X ∀i.
30. Compare the proof you have found in Exercise 29 to the usual proof ('derivation from Jensen's inequality for two points').
31. Show that the inverse image fM of a convex function f under a linear function M is again a convex function.
32. Show that taking the inverse image of a convex function under a linear function does not preserve properness.
33. Show that the epigraph of the inverse image of a convex function under a linear function is the inverse image of the epigraph of this convex function.
34. Show that the image Mg of a convex function g under a linear function M is again a convex function.
35. Show that taking the image of a convex function under a linear function does not preserve properness.
36. *Show that the image Mg of a convex function g : R^n → R under M can be characterized as the function Mg : R^m → R for which the strict epigraph is the image of the strict epigraph of g under the mapping R^n × R → R^m × R : (x, ρ) → (Mx, ρ).
37. *Show that the five binary operations in Definition 5.6.1 are well defined. For the pointwise maximum, this is done in the main text.
38. Prove the following geometric interpretation for the binary operation infimal convolution of convex functions: this amounts to taking the Minkowski sum of the strict epigraphs.
39. *Repeat the construction of Example 5.6.5, but now for the triplet (Δ_X, Δ_R, Δ_R), and show that this gives the pointwise maximum.
40. Repeat the construction of Example 5.6.5 again, but now for the triplets (+_X, +_R, +_R) and (+_X, +_R, Δ_R), and show that this gives the convex hull of the pointwise minimum and the infimal convolution respectively.
41. Construct with the conification method Kelley's sum by a suitable triplet.
42. Construct with the conification method formulas for the three nameless binary operations on convex functions. Try them for some simple concrete convex functions on R.
43. Show that the recession cone of a closed proper convex function is well defined. That is, show that all nonempty sublevel sets of a convex function have the same recession cone.
44. Give a homogenization definition of the recession cone of a convex function.
Chapter 6
Convex Functions: Dual Description
Abstract
• Why. The phenomenon of duality for a proper closed convex function, that is, the possibility to describe it from the outside of its epigraph by graphs of affine (constant plus linear) functions, has to be investigated. This has to be done for its own sake and as a preparation for the duality theory of convex optimization problems. An illustration of the power of duality is the following task, which is challenging without duality but easy if you use duality: prove the convexity of the main function from geometric programming, ln(e^{x_1} + · · · + e^{x_n}), a smooth function that approximates the nonsmooth function max(x_1, . . . , x_n).
• What. The duality for convex functions can be formulated efficiently by a property of a duality operator on convex functions that have the two nice properties (proper and closed), the conjugate function operator f → f*: this operator is an involution. For each one of the eight binary operations □ on convex functions, one has a rule of the type (f □ g)* = f* ◇ g*, where ◇ is another one of the eight binary operations on convex functions. Again, homogenization generates a unified proof for these eight rules. This requires the construction of the conjugate function operator by means of a duality operator for convex cones (the polar cone operator). There is again a technical complication due to the fact that the rules only hold if all convex functions involved, f, g, f □ g and f* ◇ g*, have the two nice properties. Even more important, for applications, is that one has, for the two binary operations + and max on convex functions, a rule for subdifferentials: the theorem of Moreau-Rockafellar ∂(f1 + f2)(x) = ∂f1(x) + ∂f2(x) and the theorem of Dubovitskii-Milyutin ∂ max(f1, f2)(x) = ∂f1(x) co∪ ∂f2(x) in the difficult case f1(x) = f2(x). Homogenization generates a unified proof for these two rules. Again, these rules only hold under suitable assumptions.
Again, there is no need to prove something new: all duality results for convex functions follow from properties of convex sets (their epigraphs) or of convex cones (their homogenizations).

Road Map
1. Figure 6.1, Definition 6.2.3 (conjugate function).
2. Figure 6.2 (dual norm is example of either conjugate function or duality operator by homogenization).
3. Theorem 6.4.2 and structure proof (duality theorem for proper closed convex functions: conjugate operator is an involution, proof by homogenization).
4. Propositions 6.5.1, 6.5.2 (calculus rules for computation conjugate functions).
5. Take note of the structure of Sect. 6.5 (duality between convex sets and sublinear functions; this is a preparation for the proof of calculus rules for subdifferentials).
6. Definitions 6.7.1, 6.7.2, Propositions 6.7.8–6.7.11 (subdifferential convex function at a point, nonemptiness and compactness, at points of differentiability, Fenchel-Moreau, Dubovitskii-Milyutin, subdifferential norm).

© Springer Nature Switzerland AG 2020 J. Brinkhuis, Convex Analysis for Optimization, Graduate Texts in Operations Research, https://doi.org/10.1007/9783030418045_6
6.1 *Motivation

The dual description of a convex function is fundamental and turns up often, for example in duality theory for convex optimization. Here we mention a striking example of the power of using the dual description of a convex function. We give an example of a function for which it is quite involved to verify convexity by means of the second order criterion. However, this will be surprisingly simple if you use its dual description, as we will see at the end of this chapter. Consider the function f : R^n → R given implicitly by e^y = e^{x_1} + · · · + e^{x_n} or explicitly by f(x) = ln(e^{x_1} + · · · + e^{x_n}). This function is useful as it is a smooth function resembling closely the nondifferentiable function max_i(x_i) (see Exercise 1). One way to prove that f is convex is as follows. You can calculate its second order derivatives and then form the Hessian f''(x). Then it is in principle possible to prove that f''(x) is positive semidefinite for all x ∈ R^n (and hence that f is convex), but it is not easy. On the other hand, the dual description of this function turns out to be essentially the sum of n functions of one variable. It is straightforward to check that each of these is convex. This implies that the function f is convex. The details will be given at the end of this chapter.
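The two claims about this function, that it stays close to max_i(x_i) and that its Hessian is positive semidefinite, can be spot-checked numerically. The Python sketch below is only such a spot check at one sample point and not a proof. It uses two standard facts that are not stated in the text: the bounds max_i(x_i) ≤ f(x) ≤ max_i(x_i) + ln n, and the formula diag(p) − p p^T for the Hessian, where p_i = e^{x_i} / (e^{x_1} + · · · + e^{x_n}). The function names and the sample point are assumptions made for this illustration.

```python
import numpy as np

def log_sum_exp(x):
    """Evaluate ln(e^{x_1} + ... + e^{x_n}), shifting by max(x) to avoid overflow."""
    x = np.asarray(x, dtype=float)
    m = np.max(x)
    return float(m + np.log(np.sum(np.exp(x - m))))

def log_sum_exp_hessian(x):
    """Hessian of log-sum-exp: diag(p) - p p^T with p_i = e^{x_i} / sum_j e^{x_j}."""
    x = np.asarray(x, dtype=float)
    p = np.exp(x - np.max(x))
    p = p / np.sum(p)
    return np.diag(p) - np.outer(p, p)

x = np.array([1.0, -2.0, 0.5])  # hypothetical sample point
value = log_sum_exp(x)
lower, upper = float(np.max(x)), float(np.max(x) + np.log(len(x)))

# Smallest eigenvalue of the (symmetric) Hessian at x; nonnegative up to rounding.
smallest_eigenvalue = float(np.min(np.linalg.eigvalsh(log_sum_exp_hessian(x))))
```

The Hessian always has the eigenvalue 0 (with eigenvector (1, . . . , 1)), so the smallest eigenvalue is only nonnegative up to rounding error; this is one reason why a direct proof by the second order criterion needs some care.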
6.2 Conjugate Function

The aim of this section is to define the duality operator for proper closed convex functions: the conjugate function operator.

Example 6.2.1 Figure 6.1 illustrates the definition of the conjugate function. The graph of a convex function f : R → R is drawn. Its primal ('inner') description is its epigraph. Its dual ('outer': outside the epigraph) description is the collection of lines in the plane that lie entirely under or on the graph. Each one of these lines can be described by an equation y = ax − b. It turns out that the set of pairs (a, b) that one gets is precisely the epigraph of a convex function f*. This function f* is the dual description of the convex function f, called the conjugate function of f. Figure 6.1 shows that f* is given by the following formula:
f*(a) = sup_x (ax − f(x)).

Fig. 6.1 Conjugate function

Indeed, the condition that the line y = ax − b lies entirely under or on the graph of f means that ax − b ≤ f(x) ∀x. This can be rewritten as ax − f(x) ≤ b ∀x, that is, as
sup_x (ax − f(x)) ≤ b.
This shows that the set of pairs $(a, b)$ for which the line $y = ax - b$ lies entirely under or on the graph of $f$ is indeed the epigraph of the function with recipe $a \mapsto \sup_x (ax - f(x))$. The formula above for $f^*$ implies that $f^*$ is convex, as for each $x$ the function $a \mapsto ax - f(x)$ is affine and so convex, and the pointwise supremum of a collection of convex functions is again convex.

Example 6.2.2 (Comparison of Conjugate Function and Derivative) Let $f : \mathbb{R} \to \mathbb{R}$ be a differentiable convex function; assume for simplicity that $f' : \mathbb{R} \to \mathbb{R}$ is a bijection. Then both the functions $f^*$ and $f'$ give descriptions of the set of the tangent lines to the graph of $f$. In the case of the conjugate function $f^*$, the set is described as a family $(L_a)_{a \in \mathbb{R}}$ indexed by the slope of the tangent line: $L_a$ is the graph of the function with recipe $x \mapsto ax - f^*(a)$. In the case of the derivative $f'$, the set is described as a family $(M_b)_{b \in \mathbb{R}}$ indexed by the x-coordinate $b$ of the point of tangency of the tangent line and the graph of $f$: $M_b$ is the graph of the function with recipe $x \mapsto f(b) + f'(b)(x - b)$. The description by means of $f^*$ has the advantage over the description by $f'$ that if $f$ is convex but not differentiable, this description includes all supporting lines at points of nondifferentiability of $f$.

These arguments will be generalized in Proposition 6.7.5 to arbitrary proper convex functions: then one gets a relation between the conjugate function and the subdifferential function, which is the analogue of the derivative for convex functions.

Now we give the formal definition of the duality operator for convex functions.

Definition 6.2.3 Let a proper convex function $f$ be given. That is, let a nonempty convex set $A \subseteq X = \mathbb{R}^n$ and a convex function $f : A \to \mathbb{R}$ be given. Then the conjugate function of $f$ is the convex function on $X$ given by

$$f^*(y) = \sup_{x \in A} \,(x \cdot y - f(x)).$$
Now $f$ is extended in the usual way to $f : X \to \mathbb{R} \cup \{+\infty\}$ by defining $f(x) = +\infty$ for all $x \notin A$. Then one can write

$$f^*(y) = \sup_{x \in X} \,(x \cdot y - f(x)) \quad \forall y \in X.$$
Note that the function $f^*$ is a proper closed convex function. The conjugate function $f^*$ of a proper convex function $f : \mathbb{R}^n \to \mathbb{R}$ gives a description of all affine functions $\mathbb{R}^n \to \mathbb{R}$ that support the graph of $f$ at some point; if the slope of such an affine function is $a \in \mathbb{R}^n$, then this affine function is given by the recipe $x \mapsto a \cdot x - f^*(a)$.

Example 6.2.4 (Conjugate Function) The conjugate function of $f(x) = e^x$ can be determined as follows. Consider for each $y \in \mathbb{R}$ the problem to maximize the concave function (that means that minus the function is convex) $x \mapsto xy - e^x$. For $y > 0$, we get $\frac{d}{dx}(xy - e^x) = y - e^x = 0$, so $x = \ln y$, and so $f^*(y) = y \ln y - y$. For $y = 0$, we get that $xy - e^x$ tends to $0$ for $x \to -\infty$ and to $-\infty$ for $x \to +\infty$, so $f^*(0) = 0$. For $y < 0$, we get that $xy - e^x$ tends to $+\infty$ for $x \to -\infty$, so $f^*(y) = +\infty$.

This is a good moment to compute the conjugate function for a few other convex functions (see Exercises 4–6).

Example 6.2.5 (Dual Norm and Duality) Now we consider the dual norm $\|\cdot\|_*$ of a norm $\|\cdot\|$ on $X = \mathbb{R}^n$, defined in Chap. 4 (see Definition 4.3.9). This definition is not just some arbitrary convention: it fits into the general pattern of duality. Indeed, the indicator function of the unit ball of the dual norm is the conjugate of the convex function $\|\cdot\|$, as one can readily check. Even more simply, the epigraphs of $\|\cdot\|$ and $-\|\cdot\|_*$ are each other's polar cone. Figure 6.2 illustrates this. In particular, the well-known formula $\|\cdot\|_{**} = \|\cdot\|$ follows from the formula $C^{\circ\circ} = C$ for closed convex cones $C$.

[Fig. 6.2 Dual norm by means of dual cones. Labels in the figure: $\|\cdot\|$, $-\|\cdot\|_*$.]

Conclusion The duality operator for proper closed convex functions, the conjugate function operator, has been defined by an explicit formula.
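The closed form found in Example 6.2.4 can be checked numerically. The following is only a minimal sketch: it replaces the supremum over all real $x$ by a maximum over a finite grid, and the helper name `conjugate_on_grid`, the grid bounds and the sample slopes are choices made here for illustration, not part of the text.

```python
import numpy as np

def conjugate_on_grid(f, xs, ys):
    """Approximate f*(y) = sup_x (x*y - f(x)) by a maximum over the grid xs."""
    # Rows are indexed by the slopes y, columns by the grid points x.
    values = ys[:, None] * xs[None, :] - f(xs)[None, :]
    return values.max(axis=1)

# Assumed grid: wide enough to contain the maximizer x = ln(y) for each slope below.
xs = np.linspace(-20.0, 5.0, 250001)
ys = np.array([0.5, 1.0, 2.0, 5.0])      # slopes y > 0

approx = conjugate_on_grid(np.exp, xs, ys)
exact = ys * np.log(ys) - ys             # formula from Example 6.2.4 for y > 0

max_error = float(np.max(np.abs(approx - exact)))
print(max_error)
```

A grid maximum can only underestimate the true supremum, so the approximation approaches $y \ln y - y$ from below as the grid is refined.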
6.3 Conjugate Function and Homogenization

The aim of this section is to define the conjugate function operator by homogenization. This alternative definition will make it easier to establish properties of the conjugate function operator: they can be derived from similar properties of the polar cone operator.

Here is a detail that needs some attention before we start: the conification $c(f)$ of a convex function $f$ on $X = \mathbb{R}^n$ is contained in the space $X \times \mathbb{R} \times \mathbb{R}$. For defining the polar cone $(c(f))^\circ$, we have to define a suitable nondegenerate bilinear mapping on $X \times \mathbb{R} \times \mathbb{R}$. We take the following nondegenerate bilinear mapping $b$:

$$b((x, \alpha, \alpha'), (y, \beta, \beta')) = x \cdot y - \alpha\beta' - \alpha'\beta \qquad (*)$$

for all $(x, \alpha, \alpha'), (y, \beta, \beta') \in X \times \mathbb{R} \times \mathbb{R}$. Note the minus signs and the transposition!

The following result gives a nice property of a convex function: its dual object is again a convex function.

Proposition 6.3.1 Let $X = \mathbb{R}^n$.
1. A convex cone $C \subseteq X \times \mathbb{R} \times \mathbb{R}$ is a conification of a convex function $f$ on $X$ iff $\mathbb{R}_+(0_X, 1, 0) \subseteq \mathrm{cl}(C) \subseteq X \times \mathbb{R} \times \mathbb{R}_+$.
2. If a convex cone $C \subseteq X \times \mathbb{R} \times \mathbb{R}$ is a conification of a convex function on $X$, then the polar cone $C^\circ$ with respect to the bilinear mapping $b$ defined by $(*)$ is also a conification of a convex function.
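Proof The first statement follows by definition of conification of a convex set. The second statement follows from the fact that the convex cones $\mathbb{R}_+(0_X, 1, 0)$ and $X \times \mathbb{R} \times \mathbb{R}_+$ are each other's polar cone with respect to the bilinear mapping $b$. Therefore we get that

$$\mathbb{R}_+(0_X, 1, 0) \subseteq \mathrm{cl}(C) \subseteq X \times \mathbb{R} \times \mathbb{R}_+$$

implies

$$(\mathbb{R}_+(0_X, 1, 0))^\circ \supseteq C^\circ \supseteq (X \times \mathbb{R} \times \mathbb{R}_+)^\circ,$$

and so it gives

$$X \times \mathbb{R} \times \mathbb{R}_+ \supseteq C^\circ \supseteq \mathbb{R}_+(0_X, 1, 0),$$

as required.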
Now we are ready to construct the conjugate function operator by the homogenization method. Proposition 6.3.2 The duality operator on convex functions that one gets by the homogenization method (first conify, then apply the polar cone operator, finally deconify) is equal to the conjugate function operator. Indeed, by the previous proposition, the duality operator by the homogenization method gives, when applied to an ‘old’ convex function f on X, a ‘new’ convex function on X. A calculation, which we do not display, shows that the new convex function is f ∗ , the conjugate function of f . This is yet another example where a formula from convex analysis—here the defining formula for the conjugate function operator—is generated by the homogenization method. Conclusion The duality operator for proper closed convex functions—the conjugate function operator—has been constructed by the homogenization method.
6.4 Duality Theorem

The aim of this section is to give the duality theorem for convex functions.

Definition 6.4.1 The biconjugate function $f^{**}$ of a proper convex function $f$ on $X = \mathbb{R}^n$ is the conjugate function of the conjugate function of $f$, $f^{**} = (f^*)^*$.

Theorem 6.4.2 (Duality Theorem for Convex Functions) A closed proper convex function $f$ on $X = \mathbb{R}^n$ is equal to its biconjugate function $f^{**}$,

$$f^{**} = f.$$

In particular, the biconjugate function operator on convex functions is the same operator as the closure operator: $f^{**} = \mathrm{cl}(f)$.

This result is called the theorem of Fenchel–Moreau. Here is the geometric meaning of this result: there is complete symmetry between the primal and dual descriptions of a proper closed convex function: the functions $f$ and $f^*$ are each other's conjugate. The proof below illustrates the convenience of the homogenization definition of the conjugate function operator.

Proof We apply the homogenization method.
1. Homogenize. Take $\mathrm{cl}(c(f))$, which is a homogenization of $f$.
2. Work. Apply the duality theorem for cones to $C = \mathrm{cl}(c(f))$. This gives $C^{\circ\circ} = C$.
3. Dehomogenize. As $C$ is a conification of $f$, $C^\circ$ is a conification of $f^*$, so $C^{\circ\circ}$ is a conification of $f^{**}$. So deconification gives $f^{**} = f$.
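The statement $f^{**} = f$ can be observed numerically by applying a discrete conjugate twice. This is a sketch under stated assumptions: the test function $f(x) = |x| + \tfrac12 x^2$, the interval $[-3, 3]$ and the slope range $[-5, 5]$ are chosen here for illustration, and the helper `legendre` is a hypothetical name. Restricting $x$ to an interval amounts to adding an indicator function, which keeps the function closed and convex; the slope range is taken large enough to contain every subgradient of $f$ on that interval.

```python
import numpy as np

def legendre(values, grid, dual_grid):
    """Discrete conjugate: for each t in dual_grid, max over s in grid of s*t - values(s)."""
    return (dual_grid[:, None] * grid[None, :] - values[None, :]).max(axis=1)

xs = np.linspace(-3.0, 3.0, 1201)    # primal grid (assumed interval)
ys = np.linspace(-5.0, 5.0, 2001)    # dual grid; subgradients on [-3, 3] lie in [-4, 4]

f_vals = np.abs(xs) + 0.5 * xs**2    # a closed convex function with a kink at 0
f_star = legendre(f_vals, xs, ys)    # approximate conjugate on the slope grid
f_bistar = legendre(f_star, ys, xs)  # approximate biconjugate back on the primal grid

gap = float(np.max(np.abs(f_bistar - f_vals)))
print(gap)
```

On the grid the biconjugate never exceeds $f$ (this is the Fenchel–Young inequality), and the remaining gap is due only to the finite slope grid.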
6.5 Calculus for the Conjugate Function

The aim of this section is to present calculus rules to compute conjugate functions. We already have given one rule above: $(f^*)^* = f$ if $f$ is proper and closed. Now we will also give rules for expressing $(f \circ g)^*$ in $f^*$ and $g^*$, where $\circ$ is a binary operation for convex functions (sum, maximum, convex hull of the minimum, or infimal convolution), and for expressing $(Mf)^*$ and $(fN)^*$ in $f^*$ for linear functions $M$ and $N$.

Just as we did for convex sets, we will halve the number of formulas by the introduction of a shorthand notation. We will write for two proper convex functions $f, \tilde f : X = \mathbb{R}^n \to \mathbb{R}$ that are in duality, in the sense that

$$f^* = \tilde f, \qquad \tilde f^* = \mathrm{cl}(f),$$

as follows:

$$f \overset{d}{\longleftrightarrow} \tilde f.$$

Then we will say that $(f, \tilde f)$ is a pair in duality. Note that here $f$ need not be closed (that is, lower semicontinuous).

Here are the promised calculus rules for conjugate functions.

Proposition 6.5.1 (Calculus Rules for the Conjugate Function Operator) Let $(f, \tilde f)$ and $(g, \tilde g)$ be two pairs of proper convex functions on $X = \mathbb{R}^n$ in duality,

$$f \overset{d}{\longleftrightarrow} \tilde f, \qquad g \overset{d}{\longleftrightarrow} \tilde g.$$

Then

$$f + g \overset{d}{\longleftrightarrow} \tilde f \oplus \tilde g$$

and

$$f \vee g \overset{d}{\longleftrightarrow} \tilde f \,\mathrm{co}\wedge\, \tilde g,$$

provided, for each of these two rules, that the outcomes of the binary operations involved are proper.

The proofs of these results in a traditional approach are relatively involved. However, we can get a unified proof by the homogenization method. Indeed these results follow immediately from the following convex calculus rules and the definition of the binary operations by the homogenization method (recall that this involves the special linear functions $+_Y$ and $\Delta_Y$).

Proposition 6.5.2 Let $M \in \mathbb{R}^{m \times n}$ and let $(g, \tilde g)$ be a pair of proper convex functions on $\mathbb{R}^n$ in duality,

$$g \overset{d}{\longleftrightarrow} \tilde g.$$

Then $(Mg, \tilde g M^\top)$ is a pair in duality,

$$Mg \overset{d}{\longleftrightarrow} \tilde g M^\top,$$

provided $Mg$ and $\tilde g M^\top$ are proper.

This proposition is an immediate consequence of the corresponding calculus rule for convex sets, applied to strict epigraphs of convex functions.
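As a small sanity check of the first rule of Proposition 6.5.1, here is a worked instance for $n = 1$. The two quadratics are chosen here purely for illustration, and $\oplus$ is read as infimal convolution, $(\tilde f \oplus \tilde g)(y) = \inf_u (\tilde f(u) + \tilde g(y - u))$, which is an assumption about the notation made for this example.

```latex
% Illustrative choice (not from the text): f(x) = x^2/2, g(x) = x^2.
\begin{align*}
f^*(y) &= \sup_x \bigl(xy - \tfrac12 x^2\bigr) = \tfrac12 y^2,
&
g^*(y) &= \sup_x \bigl(xy - x^2\bigr) = \tfrac14 y^2,
\\[4pt]
(f+g)^*(y) &= \sup_x \bigl(xy - \tfrac32 x^2\bigr) = \tfrac16 y^2
&&\text{(maximizer } x = y/3\text{)},
\\[4pt]
(f^* \oplus g^*)(y) &= \inf_u \bigl(\tfrac12 u^2 + \tfrac14 (y-u)^2\bigr)
&&\text{(minimizer } u = y/3\text{)}
\\
&= \tfrac1{18} y^2 + \tfrac19 y^2 = \tfrac16 y^2 .
\end{align*}
```

Both sides equal $\tfrac16 y^2$, in agreement with the rule $f + g \overset{d}{\longleftrightarrow} \tilde f \oplus \tilde g$ in this instance.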
6.6 Duality: Convex Sets and Sublinear Functions

The aim of this section is to prepare for the nondifferentiable analogue of the derivative of a differentiable function: the subdifferential of a convex function $f$ at a point $x$, a concept that we need for optimization.

Example 6.6.1 (Subdifferential) Figure 6.3 illustrates the subdifferential for functions on $X = \mathbb{R}^n$ for $n = 1$.

[Fig. 6.3 Subdifferential of a convex function at a point. Labels in the figure: $f$, $x$.]
The graph of a convex function $f$ of one variable that is nondifferentiable at $x$ is drawn, as well as some of the lines that run through the kink point of the graph and that lie nowhere above the graph of $f$. The set of the slopes of all these lines is defined to be the subdifferential of $f$ at $x$.

It is fruitful to view the concept of subdifferential of a convex function at a point as defined in a systematic way, in two steps. The first step is that an auxiliary function $g$ is taken. This auxiliary function is characterized as follows: it takes value $0$ at $0$, it is linear in both directions (left and right), and it has as left-hand slope the left derivative of $f$ at $x$ and as right-hand slope the right derivative of $f$ at $x$. The epigraph of the auxiliary function $g$ is a convex cone. A function for which the epigraph is a convex cone is called a sublinear function. The second step is that the dual description of the auxiliary function $g$ is taken: this consists of the set of lines through the origin that lie nowhere above the graph of this function. This set can be viewed as a convex subset of the line $\mathbb{R}$, the slopes of these lines. Note that a nonvertical line through the origin is determined by its slope. This convex subset is precisely the subdifferential of $f$ at $x$.

The preparation to be done in this section for the systematic construction of the subdifferential is the presentation of the duality between sublinear functions $X = \mathbb{R}^n \to \mathbb{R}$ and arbitrary convex sets in $X = \mathbb{R}^n$. We recall the definition of sublinear functions.

Definition 6.6.2 A sublinear function on $X = \mathbb{R}^n$ is a function $p : X \to \mathbb{R}$ for which the epigraph is a convex cone and for which $\{0_X\} \times \mathbb{R}_+ \subseteq \mathrm{cl}(\mathrm{epi}(p))$. Moreover, $p$ is called closed if $\mathrm{epi}(p)$ is closed and proper if $\{0_X\} \times \mathbb{R} \not\subseteq \mathrm{cl}(\mathrm{epi}(p))$.

Example 6.6.3 (Sublinear Function: Norm) A norm $N$ on $X = \mathbb{R}^n$ is an example of a sublinear function.

Alternatively, a sublinear function can be defined by homogenization. For a convex cone $C \subseteq X \times \mathbb{R}$ for which $\mathbb{R}_+(0_X, 1) \subseteq \mathrm{cl}(C)$, one defines its dehomogenization to be the function $p$ on $X$ given by

$$p(x) = \inf\{\rho \mid (x, \rho) \in C\} \quad \forall x \in X.$$

Then the convex cone $C$ is called a homogenization of the sublinear function $p$. The homogenizations of $p$ are the cones $C$ for which

$$\mathrm{epi}_s(p) \subseteq C \subseteq \mathrm{epi}(p).$$

Now we define the duality operators for proper sublinear functions and for general nonempty convex sets.

Definition 6.6.4 The support function of a nonempty convex set $A \subseteq X = \mathbb{R}^n$ is the proper closed sublinear function $s(A) : X \to \mathbb{R} \cup \{+\infty\}$ given by $y \mapsto \sup\{x \cdot y \mid x \in A\}$.

Example 6.6.5 (Support Function) The support function of the standard n-dimensional closed unit ball $x_1^2 + \cdots + x_n^2 \le 1$ is the Euclidean norm

$$N(x) = \|x\| = (x_1^2 + \cdots + x_n^2)^{1/2}.$$

Definition 6.6.6 The subdifferential $\partial p$ of a proper sublinear function $p$ on $X = \mathbb{R}^n$ is the nonempty closed convex set

$$\partial p = \{y \in X \mid x \cdot y \le p(x) \ \forall x \in X\}.$$

Example 6.6.7 (Subdifferential) The subdifferential of the Euclidean norm on $\mathbb{R}^n$ is the standard closed unit ball in $\mathbb{R}^n$.

Note that here we have for the first time duality operators that do not preserve the type. So far, we have seen that the polar set operator preserves convex sets containing the origin, the polar cone operator preserves nonempty convex cones, and the conjugate function operator preserves proper convex functions.

Now we can state the duality theorem for convex sets and sublinear functions. This gives the dual descriptions of convex sets and sublinear functions.

Theorem 6.6.8 (Duality Theorem for Convex Sets and Sublinear Functions) One has the following duality formulas:
1. $\partial s(A) = A$ for all nonempty closed convex sets $A \subseteq X = \mathbb{R}^n$,
2. $s \partial p = p$ for all proper closed sublinear functions $p$ on $X$.

The meaning of this result is that it gives a perfect symmetry between closed nonempty convex sets $A$ in $X$ and closed proper sublinear functions $p$ on $X$: $p$ is the support function of $A$ iff $A$ is the subdifferential of $p$.

Example 6.6.9 (Subdifferential and Support Function: Inverse Operators) The n-dimensional standard closed unit ball $B$ and the n-dimensional Euclidean norm $\|\cdot\|$ are each other's dual object: $B$ is the subdifferential of $\|\cdot\|$, and $\|\cdot\|$ is the support function of $B$.
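Examples 6.6.5 and 6.6.7 can be illustrated numerically in the plane. The sketch below makes two simplifying assumptions that are not in the text: the unit disc is replaced by a fine sample of its boundary circle (a linear function attains its supremum over the disc on the boundary), and membership in the subdifferential is only tested on a finite random set of points $x$. The helper names are hypothetical.

```python
import numpy as np

def support_function(points, y):
    """s(A)(y) = sup{x . y | x in A}, with A replaced by a finite sample of points."""
    return float(np.max(points @ y))

def in_subdifferential_of_norm(y, test_points):
    """Test x . y <= ||x|| on a finite set of points x (a necessary condition only)."""
    return bool(np.all(test_points @ y <= np.linalg.norm(test_points, axis=1) + 1e-12))

# Fine sample of the unit circle, the boundary of the closed unit disc in R^2.
angles = np.linspace(0.0, 2.0 * np.pi, 200001)
circle = np.column_stack([np.cos(angles), np.sin(angles)])

rng = np.random.default_rng(0)
directions = rng.normal(size=(5, 2))
max_error = max(abs(support_function(circle, y) - np.linalg.norm(y)) for y in directions)

test_points = rng.normal(size=(1000, 2))
inside = in_subdifferential_of_norm(np.array([0.6, 0.8]), test_points)   # ||y|| = 1
outside = in_subdifferential_of_norm(np.array([0.9, 0.9]), test_points)  # ||y|| > 1
print(max_error, inside, outside)
```

The first check reflects that the support function of the unit disc is the Euclidean norm; the second reflects that a vector $y$ satisfies $x \cdot y \le \|x\|$ for all $x$ exactly when it lies in the closed unit ball.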
These two duality operators are defined by explicit formulas which might look like they come out of nowhere, and the duality theorem 6.6.8 might look like a new phenomenon that requires a new proof. However, one can also define these operators, equivalently, in a systematic way: by homogenization. This gives the same duality operators, and then the duality theorem 6.6.8 is seen to be an immediate consequence of the duality theorem for convex cones.

Indeed, let $p$ be a proper sublinear function on $X = \mathbb{R}^n$. Choose a homogenization $C$ of $p$. This is a convex cone in $X \times \mathbb{R}$ for which

$$\mathbb{R}_+(0_X, 1) \subseteq \mathrm{cl}(C), \qquad \mathbb{R}(0_X, 1) \not\subseteq \mathrm{cl}(C).$$

Apply the polar cone operator with respect to the bilinear mapping $b$ on $X \times \mathbb{R}$ given by $b((x, \alpha), (y, \beta)) = x \cdot y - \alpha\beta$. This gives

$$(\mathbb{R}_+(0_X, 1))^\circ \supseteq C^\circ, \qquad (\mathbb{R}(0_X, 1))^\circ \not\supseteq C^\circ.$$

Now use that the convex cones $\mathbb{R}_+(0_X, 1)$ and $X \times \mathbb{R}_+$ are each other's dual with respect to $b$, and that the convex cones $\mathbb{R}(0_X, 1)$ and $X \times \{0\}$ are also each other's dual with respect to $b$. It follows that we get

$$X \times \mathbb{R}_+ \supseteq C^\circ, \qquad X \times \{0\} \not\supseteq C^\circ.$$

That is,

$$C^\circ \subseteq X \times \mathbb{R}_+, \qquad C^\circ \not\subseteq X \times \{0\}.$$

This means that $C^\circ$ is a homogenization of a nonempty convex set $A$. This is readily verified to be the subdifferential $\partial p$ that was defined above. In the same way, one can define by homogenization a duality operator from convex sets to sublinear functions. This gives precisely the support function operator $s$ defined above.

These two operators are each other's inverse on closed proper objects, that is, for a closed proper convex set $A \subset X$ and a closed proper sublinear function $p$ on $X$, one has that $A$ is the subdifferential of $p$ iff $p$ is the support function of $A$. This follows from the homogenization definitions of the subdifferential operator and the support function operator together with the duality theorem $C^{\circ\circ} = C$ for closed nonzero convex cones $C$.

One can go on and derive in a systematic way, by homogenization, the convex calculus rules for subdifferentials and for support functions. This gives eight rules: there are four binary operations on sublinear functions, $+, \vee, \oplus, \#$, and there are four binary operations on convex sets, $+, \mathrm{co}\cup, \cap, \#$, and for each one of these eight binary operations there is a duality rule. We only need two of these rules, those for the operations $+, \vee$ on sublinear functions. Therefore we only display the four duality rules for sublinear functions.

Proposition 6.6.10 (Calculus Rules for the Sublinear Functions) Let $p, q$ be proper closed sublinear functions on $X = \mathbb{R}^n$. Then the following formulas hold:
1. $\partial(p + q) = \mathrm{cl}(\partial p + \partial q)$,
2. $\partial(p \vee q) = \mathrm{cl}((\partial p) \,\mathrm{co}\cup\, (\partial q))$,
3. $\partial(p \oplus q) = \partial p \cap \partial q$,
4. $\partial(p \,\#\, q) = \partial p \,\#\, \partial q$,

provided, for each one of these rules, that the outcomes of the two binary operations involved are proper. The closure operators can be omitted if the relative interiors of the effective domains of $p, q$ have a common point, $\mathrm{ri}(\mathrm{dom}\, p) \cap \mathrm{ri}(\mathrm{dom}\, q) \ne \emptyset$.

We emphasize that for applications in convex optimization, we need only the first two rules. We do not display the details of the proofs by homogenization. These are similar to the proofs of the corresponding rules for other convex objects than sublinear functions, such as convex sets containing the origin or proper convex functions.

Conversely, one can give rules for the support function of 'new convex sets' in terms of old ones. We do not need these rules so we will not display them.

Conclusion The dual description of a proper sublinear function is a general nonempty convex set. This will be needed for the definition of the 'surrogate' for the derivative of a convex function in a point of nondifferentiability.
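By Theorem 6.6.8, the first two rules of Proposition 6.6.10 can be read on the side of convex sets: for $p = s(A)$ and $q = s(B)$ they say that the support function of $A + B$ is $s(A) + s(B)$ and that the support function of the convex hull of $A \cup B$ is $s(A) \vee s(B)$. The sketch below checks this reading for two polygons in the plane. The polygons and helper names are chosen here for illustration; it uses the standard fact that the support function of a polytope is the maximum of $x \cdot y$ over its vertices.

```python
import numpy as np

def support(vertices, y):
    """Support function of the convex hull of the given vertices, evaluated at y."""
    return float(np.max(vertices @ y))

# Two illustrative polygons, given by their vertices.
A = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 1.0]])                    # a triangle
B = np.array([[-1.0, -1.0], [1.0, -1.0], [1.0, 1.0], [-1.0, 1.0]])    # a square

# A + B is the convex hull of all pairwise vertex sums;
# co(A u B) is the convex hull of all vertices taken together.
minkowski_sum = (A[:, None, :] + B[None, :, :]).reshape(-1, 2)
union = np.vstack([A, B])

rng = np.random.default_rng(1)
ys = rng.normal(size=(100, 2))

sum_gap = max(abs(support(minkowski_sum, y) - (support(A, y) + support(B, y))) for y in ys)
hull_gap = max(abs(support(union, y) - max(support(A, y), support(B, y))) for y in ys)
print(sum_gap, hull_gap)
```

Both gaps vanish up to rounding. For compact polygons no closure operation is needed, which is why none appears in the check.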
6.7 Subgradients and Subdifferential

The aim of this section is to present the basics for subgradients and subdifferentials of convex functions at a point. These concepts play a role in optimization. For convex functions they are the analogue of the derivative. Here is the usual definition of the subdifferential of a proper convex function at a point.

Definition 6.7.1 A subgradient of a proper convex function $f$ on $X = \mathbb{R}^n$ at a point $x$ from its effective domain $\mathrm{dom}(f)$ is a vector $d \in X$ for which

$$f(z) \ge f(x) + d \cdot (z - x) \quad \forall z \in X.$$

In words, if an affine function on $X$ has the same value at $x$ as $f$ and its graph lies nowhere above the graph of $f$, then its slope is called a subgradient of $f$ at $x$.
[Fig. 6.4 Subdifferential of a convex function at a point. Labels in the figure: $f$, $x$.]
Definition 6.7.2 The subdifferential $\partial f(x)$ of a proper convex function $f$ on $X$ at a point $x \in \mathrm{dom}(f)$ is the set of all subgradients of $f$ at $x$. Note that this is a closed convex set.

Example 6.7.3 (Subdifferential Convex Function at a Point (Geometric)) Figure 6.4 illustrates this definition for the case that $n = 1$. The geometric meaning of a subgradient is that it corresponds to a nonvertical supporting hyperplane to the epigraph of $f$ at the point $(x, f(x))$.

Example 6.7.4 (Subdifferential Convex Function at a Point (Analytic))
1. The subdifferential of the Euclidean norm on $X = \mathbb{R}^n$ at a point $x \in X$ is the standard closed unit ball in $X$ if $x = 0_n$, the origin, and it is $\{x / \|x\|\}$ if $x \ne 0_n$.
2. The subdifferential of a convex quadratic function $f(x) = x^\top B x + b^\top x + \beta$ (where $B$ is a positive semidefinite matrix) at the point $x_0$ is $\{2Bx_0 + b\}$.

The subdifferential operator is related to the conjugate function operator.

Proposition 6.7.5 Let $f$ be a proper convex function on $X = \mathbb{R}^n$. For each $x, y \in X$ one has

$$y \in \partial f(x) \iff f(x) + f^*(y) = x \cdot y.$$

This result follows from the definitions. It often allows one to compute the subdifferential $\partial f(x)$ if the conjugate function $f^*$ is known.

The subdifferential of a proper convex function at a point can be expressed in terms of directional derivatives, and this expression often makes it possible to calculate subdifferentials $\partial f(x)$ of a convex function $f$ at a point $x$.

Definition 6.7.6 The directional derivative $f'(x; v) \in [-\infty, +\infty]$ of a proper convex function $f : X = \mathbb{R}^n \to \mathbb{R}$ at a point $x \in \mathrm{dom}(f)$ in direction $v \in X$ is the limit

$$f'(x; v) = \lim_{t \downarrow 0} \frac{f(x + tv) - f(x)}{t}.$$

The directional derivative $f'(x; v)$ is a sublinear function of $v \in X$. Note that a directional derivative $f'(x; v)$ always exists, but it can take the values $+\infty$ and $-\infty$.

Proposition 6.7.7 The subdifferential $\partial f(x)$ of a proper convex function $f$ at a point $x \in \mathrm{dom}(f)$ is equal to the dual object $\partial(f'(x; \cdot))$ of the sublinear function $f'(x; \cdot)$ that is given by the directional derivatives of $f$ at $x$, $v \mapsto f'(x; v)$:

$$\partial f(x) = \partial(f'(x; \cdot)).$$

The proof is easy: it suffices to prove this result for the case of a function of one real variable, by restriction of $f$ to lines through the point $x$. The proof in this case follows readily from the definitions.

In passing, we mention that this property of convex functions is no exception to the rule that all properties of convex functions can be proved by applying a property of a convex set or a convex cone. It is a consequence of a property of the tangent cone $T_A(x) = \mathrm{cl}[\mathbb{R}_+(A - x)]$ of a convex set $A$ at a point $x \in A$, but we do not consider the concept of tangent cone.

The meaning of Proposition 6.7.7 is twofold. First, it gives a systematic definition, by homogenization, of the subdifferential of a proper convex function $f$ at a point $x$ of its effective domain. Second, it gives a method to calculate the subdifferential of a convex function at a point in many examples.

Concerning the first point: one takes the sublinear function $f'(x; \cdot)$ given by the directional derivatives of $f$ at $x$, $v \mapsto f'(x; v)$. Then one takes the dual description of this sublinear function, the convex set $\partial(f'(x; \cdot))$. Proposition 6.7.7 shows that this definition is equivalent to the definition given above.

Concerning the second point: Proposition 6.7.7 reduces the task of calculating a subdifferential of a convex function at a point to the task of calculating, using differential calculus, one-sided derivatives of functions of one variable at a point.

This systematic definition of the subdifferential of a convex function at a point makes it possible to give straightforward proofs, by the homogenization method, of all its properties. Here are some of these properties.

Proposition 6.7.8 The subdifferential of a proper convex function at a relative interior point of the effective domain is a nonempty compact convex set.

The subdifferential is, at a point of differentiability, essentially the derivative.

Proposition 6.7.9 If $f : X = \mathbb{R}^n \to \mathbb{R}$ is a proper convex function and $x \in \mathrm{ri}(\mathrm{dom}(f))$, then $f$ is differentiable in $x$ iff $\partial f(x)$ consists of one element. Then this element is $f'(x)$, the derivative of $f$ at $x$.
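For a function of one variable, Proposition 6.7.7 says that $\partial f(x)$ is the interval $[-f'(x; -1),\, f'(x; 1)]$, since a slope $y$ satisfies $vy \le f'(x; v)$ for all $v$ exactly when it does so for $v = \pm 1$. The following minimal sketch approximates the two one-sided derivatives by difference quotients with a small step. The step size, the helper names and the test function $f(x) = |x| + x^2$ are assumptions made here for illustration.

```python
def directional_derivative(f, x, v, t=1e-7):
    """Difference-quotient approximation of f'(x; v) = lim_{t -> 0+} (f(x + t v) - f(x)) / t."""
    return (f(x + t * v) - f(x)) / t

def subdifferential_interval(f, x):
    """Approximate the subdifferential [-f'(x; -1), f'(x; 1)] of a convex f on the real line."""
    return (-directional_derivative(f, x, -1.0), directional_derivative(f, x, 1.0))

def f(x):
    # Illustrative convex function with a kink at 0.
    return abs(x) + x * x

kink_lo, kink_hi = subdifferential_interval(f, 0.0)      # close to [-1, 1]
smooth_lo, smooth_hi = subdifferential_interval(f, 1.0)  # both close to f'(1) = 3
print(kink_lo, kink_hi, smooth_lo, smooth_hi)
```

At the kink the interval has positive length, while at a point of differentiability it collapses to the single value $f'(x)$, in line with Proposition 6.7.9.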
Moreover, we get the following calculus rules for the subdifferential of a proper convex function at a point of its effective domain.

Proposition 6.7.10 (Calculus Rules for Subdifferential Convex Function) Let $f, g$ be two proper convex functions on $X = \mathbb{R}^n$. Then the following formulas hold true:
1. $\partial(f \vee g)(x) = \mathrm{cl}(\partial f(x) \,\mathrm{co}\cup\, \partial g(x))$ if $f(x) = g(x)$,
2. $\partial(f + g)(x) = \mathrm{cl}(\partial f(x) + \partial g(x))$,

provided, for each of these formulas, that the outcomes of the two binary operations involved are proper.

The formulas of Proposition 6.7.10 are celebrated theorems. The first one is called the theorem of Dubovitskii–Milyutin (note that the rule for $\partial(f \vee g)(x)$ in the case $f(x) \ne g(x)$ is simply that it is equal to $\partial f(x)$ if $f(x) > g(x)$ and that it is equal to $\partial g(x)$ if $f(x) < g(x)$), the second one the theorem of Moreau–Rockafellar. Usually it requires considerable effort to prove them; by the systematic approach, the homogenization method, you get them essentially for free: they are immediate consequences of the corresponding rules for sublinear functions, the first two rules from Proposition 6.6.10.

Proposition 6.7.11 The subdifferential of a norm $N$ on $X = \mathbb{R}^n$ at the origin is equal to the unit ball of the dual norm $N^*$, $\{x \in X \mid N^*(x) \le 1\}$.

The material in this section can be generated by the homogenization method; the details are not displayed here.

Conclusion A surrogate for the derivative, the subdifferential, has been defined which is also defined for points of nondifferentiability of convex functions. This is done by a formula and equivalently by means of the duality between sublinear functions and convex sets. The latter definition can be used to derive the calculus rules for the subdifferential.
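Proposition 6.7.11 can be illustrated with the $\ell_1$ norm on $\mathbb{R}^3$, whose dual norm is the $\ell_\infty$ norm (a standard fact, used here as an assumption rather than derived). The sketch tests the subgradient inequality $N(z) \ge N(0) + y \cdot z$ only on a finite set of points $z$, so a positive answer is evidence, not a proof; the helper names are hypothetical.

```python
import numpy as np

def l1_norm(z):
    return float(np.sum(np.abs(z)))

def is_subgradient_at_origin(norm, y, test_points, tol=1e-12):
    """Check N(z) >= y . z on a finite set of points z (recall N(0) = 0)."""
    return all(norm(z) >= float(y @ z) - tol for z in test_points)

rng = np.random.default_rng(0)
# Random test points together with the signed unit vectors.
test_points = np.vstack([rng.normal(size=(500, 3)), np.eye(3), -np.eye(3)])

y_inside = np.array([1.0, -0.3, 0.7])    # sup norm equal to 1: in the dual unit ball
y_outside = np.array([1.2, 0.0, 0.0])    # sup norm 1.2: outside the dual unit ball

inside_ok = is_subgradient_at_origin(l1_norm, y_inside, test_points)
outside_ok = is_subgradient_at_origin(l1_norm, y_outside, test_points)
print(inside_ok, outside_ok)
```

The second vector fails the inequality at the first unit vector, where the $\ell_1$ norm equals $1$ but $y \cdot z = 1.2$.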
6.8 Norms as Convex Objects

One can define the conification of a norm $N$ on $X = \mathbb{R}^n$ in two ways. In the first place, one can take the convex cone in $X \times \mathbb{R} \times \mathbb{R}$ that is the conification of $N$ as a convex function. However, a simpler concept of conification is possible: the convex cone in $X \times \mathbb{R}$ that is the strict epigraph of $N$. These two conifications give as dual object of $N$ different convex functions. The conification of $N$ as a convex function gives the indicator function of the unit ball of the standard dual norm $N^*$, which is given by $N^*(y) = \sup_{N(x) = 1} x \cdot y$ for all $y \in X$. That is, this function takes the value $0$ in this ball and the value $+\infty$ outside this ball. The conification by the epigraph gives the standard dual norm $N^*$ itself.

Each norm is continuous. So if $N, \tilde N$ are norms on $X$, then $\tilde N = N^*$ iff $N = \tilde N^*$; then one says that the norms $N$ and $\tilde N$ are dual to each other.

One gets four binary operations on norms, the restrictions of the binary operations on convex functions $+, \oplus, \vee, \mathrm{co}\wedge$. The binary operations $+$ and $\oplus$ are dual to each other, as are the binary operations $\vee$ and $\mathrm{co}\wedge$.
6.9 *Illustration of the Power of the Conjugate Function

The aim of this section is to give an example that illustrates the power of the dual description of convex functions.

Example 6.9.1 (Proving the Convexity of a Function by Means of Its Conjugate Function) We consider the problem to verify that the function $f(x) = \ln(e^{x_1} + \cdots + e^{x_n})$ is convex. This function plays a central role in a certain type of optimization problems, so-called geometric programming problems. At first sight, it might seem that the best thing to do is to check that its Hessian $f''(x)$ is positive semidefinite for all $x$. This verification turns out to be elaborate.

Here is a better way: by a trick. Its beginning might look surprising and in conflict with the right way to prove something: assume the very thing that has to be proved, the convexity of $f$. Then compute the conjugate function of $f$: for each $y \in \mathbb{R}^n$ we have by definition that $f^*(y)$ is the optimal value of the problem to maximize the function $x \mapsto x \cdot y - f(x)$. Write down the stationarity conditions, that is, put all partial derivatives equal to zero. This gives:

$$y_i - (e^{x_1} + \cdots + e^{x_n})^{-1} e^{x_i} = 0 \quad \text{for } 1 \le i \le n.$$

These imply

$$\sum_{i=1}^n y_i = 1, \quad y > 0.$$

For each such $y$, the following choice of $x$ is a solution of the stationarity conditions:

$$x_i = \ln y_i, \quad 1 \le i \le n.$$

So the optimal value of the optimization problem under consideration is

$$\sum_{i=1}^n y_i \ln y_i.$$

So we have computed $f^*$. It is the function $g(y)$ that takes the value $\sum_{i=1}^n y_i \ln y_i$ if $\sum_{i=1}^n y_i = 1$, $y \ge 0$ (with, for simplicity of notation, the convention $0 \ln 0 = 0$; note here that $\lim_{t \downarrow 0} t \ln t = 0$) and that takes the value $+\infty$ otherwise. If we would take the conjugate function of $g$ we get $f$ back, by the duality theorem.

After this 'secret and private preparation', we can start the official proof that the function $f$ is convex. We define out of the blue the following function $g : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$:

$$g(y) = \sum_{i=1}^n y_i \ln y_i$$

for all $y \in \mathbb{R}^n$ for which

$$\sum_{i=1}^n y_i = 1, \quad y > 0,$$

and which takes the value $+\infty$ otherwise. We are entitled to consider any function we like. There is no need to explain in a proof how we have come up with our ideas to make the proof work.

Now one sees straightaway that $g$ is convex. Just calculate the second derivative of $\eta \mapsto \eta \ln \eta$: it is $\eta^{-1}$ and so it is positive if $\eta > 0$. Hence $\eta \ln \eta$ is convex and so $g$ is convex.

Now we compute the conjugate function of $g$: for each $x \in \mathbb{R}^n$ we have by definition that $g^*(x)$ is the optimal value of the problem to maximize the function $y \mapsto x \cdot y - g(y)$ subject to the constraints

$$\sum_{i=1}^n y_i = 1, \quad y \ge 0.$$

Writing down the stationarity conditions of this problem and solving them gives readily that the value of this problem is $f(x)$. It follows that $g^* = f$. Hence $f$ is a convex function, as each conjugate function is convex.

Conclusion To prove that the central function from geometric programming is convex, one can derive nonrigorously a formula for its conjugate function. This formula is easily seen to define a convex function. Therefore, its conjugate function is convex. It can be shown readily in a rigorous way that this convex function is the original given function.
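The identity $g^* = f$ from Example 6.9.1 can be checked numerically at a sample point. The sketch below relies on the observation, implicit in the stationarity conditions above, that the maximizer of $y \mapsto x \cdot y - g(y)$ is $y_i = e^{x_i} / (e^{x_1} + \cdots + e^{x_n})$ (often called the softmax of $x$, a name not used in the text). The dimension, the random seed and the sampling of the simplex by a Dirichlet distribution are choices made here for illustration.

```python
import numpy as np

def log_sum_exp(x):
    """f(x) = ln(e^{x_1} + ... + e^{x_n}), computed in a numerically stable way."""
    m = np.max(x)
    return float(m + np.log(np.sum(np.exp(x - m))))

def neg_entropy(y):
    """g(y) = sum_i y_i ln y_i for y > 0 with sum_i y_i = 1."""
    return float(np.sum(y * np.log(y)))

rng = np.random.default_rng(0)
x = rng.normal(size=4)

# Candidate maximizer from the stationarity conditions: y_i = e^{x_i} / sum_j e^{x_j}.
softmax = np.exp(x - log_sum_exp(x))
value_at_softmax = float(x @ softmax) - neg_entropy(softmax)

# No sampled point of the simplex should give a larger value of x . y - g(y).
samples = rng.dirichlet(np.ones(4), size=2000)
best_sampled = max(float(x @ y) - neg_entropy(y) for y in samples)

print(value_at_softmax, log_sum_exp(x), best_sampled)
```

The value at the candidate maximizer agrees with $f(x)$ up to rounding, and the sampled values stay below it, which is what $g^*(x) = f(x)$ predicts.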
6.10 Exercises

1. *Prove the following inequalities:

$$\max_i \{x_i\} \le \ln\Big(\sum_i e^{x_i}\Big) \le \ln n + \max_i \{x_i\}.$$

These inequalities show that the function $\ln(\sum_i e^{x_i})$ is a good smooth approximation of the useful but nondifferentiable function $\max_i \{x_i\}$.
2. * (a) Show that the condition that a line $y = ax - b$ lies entirely under the graph of a given convex function $f : \mathbb{R} \to \mathbb{R}$ can be written as follows:

$$\inf_x \,(f(x) - (ax - b)) \ge 0$$

(see Fig. 6.1).
(b) Show that this condition can be rewritten as

$$b \ge \sup_x \,(ax - f(x)).$$
(c) Conclude that the pairs $(a, b)$ form the epigraph of the function $a \mapsto f^*(a) = \sup_x (ax - f(x))$.
(d) Show that the conjugate function $f^*$ is convex, as it is written as the supremum of convex functions $l_x : a \mapsto ax - f(x)$ where $x$ runs over $\mathbb{R}$.
3. *Check that the conjugate function $f^*$ of a proper closed convex function $f$ is again a proper closed convex function on $X$. Show that its effective domain consists of the slopes $a$ of all affine functions $a \cdot x - b$ that minorize $f$.
4. *Compute the conjugates of the following convex functions of one variable:
(a) $ax^2 + bx + c$ (for $a \ge 0$),
(b) $e^x$,
(c) $-\ln x$, $x > 0$,
(d) $x \ln x$, $x > 0$,
(e) $x^r$, $x \ge 0$ (for $r \ge 1$),
(f) $-x^r$, $x > 0$ (for $0 \le r \le 1$),
(g) $x^r$, $x > 0$ (for $r \le 0$),
(h) $|x| + |x - a|$,
(i) $|x - 1|$,
(j) $\sqrt{1 + x^2}$,
(k) $-\ln(1 - e^x)$,
(l) $\ln(1 + e^x)$.
5. Compute the conjugate of a convex quadratic function of $n$ variables, that is, for $f(x) = x^\top C x + c^\top x + \gamma$ where $C$ is a positive semidefinite $n \times n$ matrix, $c \in \mathbb{R}^n$ and $\gamma \in \mathbb{R}$.
6. *Calculate the conjugate function of the convex function $f : \mathbb{R}^n \to \mathbb{R} : x \mapsto \max\{x_1, \ldots, x_n\}$.
7. Show that $\|\cdot\|_*$ is a norm: directly from the defining formula and from the definition by the homogenization method.
8. *Show that the conjugate of a norm $\|\cdot\|$ on $X = \mathbb{R}^n$ is the usual dual norm $\|\cdot\|_*$ on $X$.
9. Show that the two definitions of the conjugate function operator for a proper, not necessarily closed, convex function (by an explicit formula and by the homogenization method) are equivalent.
10. Prove the following implication for proper convex functions $f, g$ on $X = \mathbb{R}^n$: $f \le g \Rightarrow f^* \ge g^*$.
11. Prove the Fenchel–Young inequality: $p \cdot x \le f(x) + f^*(p)$.
12. Check that for a proper closed convex function $f$ on $X = \mathbb{R}^n$ one has indeed $c(f)^\circ = \mathrm{cl}(c(f^*))$.
13. *Prove Proposition 6.5.2.
14. *Derive Proposition 6.5.1 from Proposition 6.5.2.
15. Show that a proper closed function $p : X = \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is sublinear iff $C = \{x \in X \mid p(x) < +\infty\}$ is a convex cone and $p(\alpha x + \beta y) \le \alpha p(x) + \beta p(y)$ for all $\alpha, \beta \ge 0$ and all $x, y \in X$.
16. Show that the support function of a nonempty convex set is a proper sublinear function.
17. *Show that the subdifferential of a proper sublinear function is a nonempty convex set.
18. Show that if a sublinear function takes all its values in $\mathbb{R}$, then its subdifferential is compact.
19. *Show that the two definitions of the duality operators for nonempty convex sets and proper sublinear functions (by an explicit formula and by the homogenization method; formulate the latter explicitly) are equivalent.
20. Prove Theorem 6.6.8 by the homogenization method. That is, use the definition of the duality operators for nonempty convex sets and proper sublinear functions by dehomogenization and use the duality theorem for convex cones.
21. Construct binary operations on proper sublinear functions by conification, show that these are equal to binary operations defined by formulas and then prove Proposition 6.6.10 by reduction to convex cones.
22. Show that the directional derivative of a proper convex function at a point of its effective domain and in the direction of some vector is a well-defined element in $[-\infty, +\infty]$.
23. Show that the directional derivative of a proper convex function at a point of its effective domain and in the direction of some vector $v$ is a sublinear function of $v$.
24. Give an example of a proper convex function $f$ on $X = \mathbb{R}^n$, a point $x \in \mathrm{dom}(f)$ and a vector $v \in X$ such that the directional derivative $f'(x; v)$ is $+\infty$.
25. Give an example of a proper convex function $f$ on $X = \mathbb{R}^n$, a point $x \in \mathrm{dom}(f)$ and a vector $v \in X$ such that the directional derivative $f'(x; v)$ is $-\infty$.
26. Prove the following expression relating the subdifferential of a proper convex function $f : X = \mathbb{R}^n \to \overline{\mathbb{R}}$ at a point $x$ of its effective domain and the directional derivatives of $f$ at $x$: $\partial f(x) \cdot v = f'(x; v)$ $\forall v \in X$.
27. *Let $X = \mathbb{R}^n$, let $f$ be a proper convex function on $X$, let $\bar{x} \in \operatorname{dom}(f)$ and let $\eta \in X$. Write $A = \operatorname{epi}(f)$ and $v = (\bar{x}, f(\bar{x}))$. Show that the closed halfspace in $X \times \mathbb{R}$ having $v$ on its bounding hyperplane and for which the vector $(\eta, -1)$ is a normal pointing out of the halfspace (that is, the halfspace $\eta \cdot (x - \bar{x}) - (\rho - f(\bar{x})) \le 0$) supports $A$ at $v$ iff $\eta \in \partial f(\bar{x})$.
28. Derive the theorems of Moreau–Rockafellar and Dubovitskii–Milyutin from the corresponding rules for subdifferential functions: the first two rules from Proposition 6.6.10.
29. Give the rule for $\partial(f \vee g)(x)$ in the case $f(x) = g(x)$.
30. *Compute the subdifferential of the Euclidean norm on the plane, $\|\cdot\| : \mathbb{R}^2 \to [0, \infty)$ given by the recipe $x = (x_1, x_2) \mapsto (x_1^2 + x_2^2)^{1/2}$, at each point.
31. Find the subdifferential $\partial f(x, y)$ of the function $f(x, y) = \max\{x, 2y\} + 2\sqrt{x^2 + y^2}$ at the points $(x, y) = (3, 2);\ (0, -1);\ (2, 1);\ (0, 0)$.
32. Find the subdifferential $\partial f(x, y)$ for the function $f(x, y) = \max\{2x, y\} + \sqrt{x^2 + y^2}$ at every point $(x, y)$.
33. Find the subdifferential $\partial f(x, y)$ of the function $f(x, y) = \max\{e^{x+y} - 1,\ x + 2y - 1,\ 2\sqrt{x^2 + y^2}\}$ at the point $(x, y) = (0, 0)$.
34. Consider a right truncated cone of height 2 with the discs of radii 2 and 1 in the bases. Construct an example of a function for which this truncated cone is a subdifferential at some point.
35. Prove Proposition 6.7.7.
36. **Prove Proposition 6.7.8.
37. **Prove Proposition 6.7.9.
38. *Give a precise derivation of Proposition 6.7.10 from Proposition 6.6.10.
39. *Show that the definition of properness of a sublinear function that is given by homogenization (its epigraph does not contain the vertical line through the origin) is equivalent to the property that the function does not assume the value $-\infty$ at any point.
40. Prove all results mentioned in Sect. 6.8, and determine which choice $(++)$, $(+-)$, $(-+)$, $(--)$ gives rise to which binary operation on norms $+$, $\oplus$, $\vee$, $\operatorname{co}\wedge$.
41. *Formulate all results on norms $N$ on $X = \mathbb{R}^n$ that are mentioned in Sect. 6.8, in terms of their unit balls $\{x \in X \mid N(x) \le 1\}$. In particular, determine how the binary operations on norms correspond to the binary operations on unit balls of norms.
42. *Write down the stationarity conditions of the optimization problem $y \mapsto x \cdot y - g(y)$, where $g(y) = \sum_{i=1}^{n} y_i \ln y_i$ for all $y > 0$, subject to the constraints $\sum_{i=1}^{n} y_i = 1$, $y > 0$, and solve them. Compute the optimal value of this problem as a function of $x$.
Chapter 7
Convex Problems: The Main Questions
Abstract
• Why. Convex optimization includes linear programming, quadratic programming, semidefinite programming, least squares problems and shortest distance problems. It is necessary to have theoretical tools to solve these problems. Finding optimal solutions exactly, or by means of a law that characterizes them, is possible only for a small minority of problems, but this minority contains very interesting problems. Therefore, most problems have to be solved numerically, by an algorithm. An analysis of the performance of algorithms for convex optimization requires techniques that are different from those presented in this book; therefore such an analysis falls outside the scope of this book.
• What. We give the answers to the main questions of convex optimization (except the last one), and in this chapter these answers are compared to those for smooth optimization:
1. What are the conditions for optimality? A convex function $f$ assumes its infimum at a point $\hat{x}$ if and only if $0 \in \partial f(\hat{x})$ (Fermat’s theorem in the convex case). Now assume, moreover, that a convex perturbation has been chosen: a convex function of two vector variables $F(x, y)$ for which $f(x) = F(x, 0)$ $\forall x$; here $y \in Y = \mathbb{R}^m$ represents changes in some data of the problem to minimize $f$ (such as prices or budgets). Then $f$ assumes its infimum at $\hat{x}$ if and only if $f(\hat{x}) = L(\hat{x}, \eta, \eta_0)$ and $0 \in \partial L(\hat{x}, \eta, \eta_0)$ for some selection of multipliers $(\eta, \eta_0)$, provided the bad case $\eta_0 = 0$ is excluded (Lagrange’s multiplier method in the convex case). Here $L$ is the Lagrange function of the perturbation function $F$, which will be defined in this chapter. The conditions of optimality play the central role in the complete analysis of optimization problems. The complete analysis can always be done by one and the same systematic four-step method. This is illustrated for some examples of optimization problems.
2. How to establish in advance the existence of optimal solutions? A suitable compactness assumption implies that there exists at least one optimal solution.
3. How to establish in advance the uniqueness of the optimal solution? A suitable strict convexity assumption implies that there is at most one solution.
© Springer Nature Switzerland AG 2020 J. Brinkhuis, Convex Analysis for Optimization, Graduate Texts in Operations Research, https://doi.org/10.1007/9783030418045_7
4. What is the sensitivity of the optimal value $S(y)$ for small changes $y$ in the data of the problem? The multiplier $\eta$ is an element of the subdifferential $\partial S(0)$ provided $\eta_0 = 1$, so then it is a measure for the sensitivity.
5. Can one find in a guaranteed efficient way approximations for the optimal solution, by means of an algorithm? This is possible for almost all convex problems.
Road Map
1. Section 7.1 (convex optimization problem, types: LP, QP, SDP, least squares, shortest distance).
2. Theorems 7.2.1–7.2.3, Definition 7.2.5, Theorems 7.2.6, 7.2.7, Example 7.2.3 (existence and uniqueness for convex problems).
3. Section 7.3 (definitions leading up to the concept of smooth local minimum).
4. Theorem 7.4.1, Example 7.4.4 (Fermat’s theorem (smooth case)).
5. Figure 7.2, Proposition 7.5.1 (no need for local minima in convex optimization).
6. Figure 7.3, Theorem 7.6.1, Example 7.6.5 (Fermat’s theorem (convex case)).
7. Section 7.7 (perturbation of a problem).
8. Theorem 7.8.1, Example 7.8.3 (Lagrange multipliers (smooth case)).
9. Figure 7.7, Proposition 7.9.1, Definitions 7.9.2–7.9.4, Theorem 7.9.5, Definition 8.2.2 (Lagrange multipliers (convex case)).
10. Figure 7.8 (generalized optimal solutions always exist).
11. Section 7.11 (list of advantages of convex optimization).
7.1 Convex Optimization Problem
We will work coordinate-free. $X$ will always be a finite-dimensional inner product space in this chapter. Then we take on $X$ the norm defined by $\|x\| = \langle x, x \rangle^{1/2}$ for all $x \in X$. The main example is $X = \mathbb{R}^n$, equipped with the standard inner product, the dot product $\langle x, y \rangle = x \cdot y = \sum_i x_i y_i$; we recall that the concept of finite-dimensional inner product space is not more general than $\mathbb{R}^n$ with the dot product. For semidefinite programming problems (to be defined below), the space $X$ is $S^n$, the space of symmetric $n \times n$ matrices, with inner product $\langle M, N \rangle = \operatorname{Tr}(MN) = \sum_{i,j} m_{ij} n_{ij}$ (where $\operatorname{Tr}$ is the trace of a square matrix, the sum of its diagonal entries). We begin by defining the concept of a convex optimization problem.
Definition 7.1.1 A convex optimization problem is a problem to minimize a convex function $f : X \to \overline{\mathbb{R}}$. The optimal value $v$ of the problem is the infimum of all values assumed by $f$, $v = \inf_x f(x) \in [-\infty, +\infty]$.
An optimal solution of the problem is an element $\hat{x} \in X$ for which $f(\hat{x}) \in \mathbb{R}$ and $f(\hat{x}) \le f(x)$ $\forall x \in X$, or, equivalently, for which $f$ assumes its infimum and this
is finite, $f(\hat{x}) = v \in \mathbb{R}$. Convex problems of primary interest are those where $f$ is proper: other problems cannot have an optimal solution, of course.
Example 7.1.2 (Hiding Constraints by Using +∞) Someone wants to spend at most 5 euro in order to maximize the pleasure $x^3 y^2$ of playing $x$ times in a game hall, which costs 50 euro cents for one game, and eating $y$ times an ice cream that costs 1 euro. This problem can be modeled as the convex optimization problem to minimize the convex function $f : X = \mathbb{R}^2 \to \overline{\mathbb{R}}$ defined by $f(x, y) = -3 \ln x - 2 \ln y$ for $x > 0$, $y > 0$ with $\tfrac{1}{2}x + y \le 5$, and $f(x, y) = +\infty$ otherwise. Here the constraints $x > 0$, $y > 0$ and the budget constraint $\tfrac{1}{2}x + y \le 5$ have been hidden.
Note that the set of optimal solutions of a convex optimization problem is a convex set. Indeed, if $f : X \to \overline{\mathbb{R}}$ is a convex function, and $v = \inf_x f(x)$ is a real number, then $f$ assumes the value $v$ precisely at the sublevel set $\{x \in X \mid f(x) \le v\}$ and a sublevel set of a convex function is a convex set.
We emphasize the generality of this definition. If $f$ is proper, then we have the problem to minimize a convex function $A \to \mathbb{R}$, where $A = \operatorname{dom}(f)$ is a convex set in $X$. So the definition hides the constraint $x \in A$ by defining the value of the objective function outside $A$ formally to be $+\infty$. In particular, this definition includes the following type of problem, called a convex programming problem: $\min f_0(x),\ x \in X,\ f_1(x) \le 0, \ldots, f_m(x) \le 0$, where $f_0, \ldots, f_m : X \to \mathbb{R}$ are convex functions. This definition also includes the following type of problem, called a conic problem: $\min \langle a, x \rangle,\ x \in L,\ x \in C$, where $a \in X$, $L$ is an affine subspace of $X$ (that is, it is of the form $p + M = \{p + m \mid m \in M\}$ for $p \in X$ and $M$ a subspace of $X$), and $C$ is a convex cone in $X$. Conversely, each convex optimization problem can be reformulated as a conic problem, using homogenization.
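The device of Example 7.1.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the budget constraint is read off from the prices in the example, the grid step 0.05 is an arbitrary choice, and the names `f`, `grid`, `best_x`, `best_y` are hypothetical. The point is that an unconstrained search needs no special handling of infeasible points, because they simply carry the value +∞.

```python
import math

def f(x, y):
    """Objective of Example 7.1.2; the constraints are hidden in the value +inf."""
    # Assumed feasible region: x > 0, y > 0 and the budget 0.5*x + y <= 5.
    if x > 0 and y > 0 and 0.5 * x + y <= 5:
        return -3 * math.log(x) - 2 * math.log(y)
    return math.inf

# Unconstrained brute-force search over a grid with step 0.05.
grid = [(i / 20, j / 20) for i in range(0, 201) for j in range(0, 101)]
best_x, best_y = min(grid, key=lambda p: f(*p))
print(best_x, best_y)  # the grid minimizer, which spends the whole budget
```

The minimizer found on this grid is the point (6, 2), which agrees with what the stationarity conditions on the budget line give.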
Convex optimization problems include the following important types of problems:
• linear programming problem (LP): this is a conic problem with $X = \mathbb{R}^n$ and $C = \mathbb{R}^n_+$, the nonnegative orthant,
• quadratic programming problem (QP): this is a conic problem with $X = \mathbb{R}^n$ and $C = \{x \in \mathbb{R}^n \mid x_n^2 \ge x_1^2 + \cdots + x_{n-1}^2,\ x_n \ge 0\}$, the Lorentz cone or ice cream cone,
• semidefinite programming problem (SDP): this is a conic problem with $X = S^n$, the space of symmetric $n \times n$ matrices, with inner product $\langle M, N \rangle = \operatorname{Tr}(MN) = \sum_{i,j} m_{ij} n_{ij}$, and $C$ the convex cone of positive semidefinite $n \times n$ matrices, the positive semidefinite cone.
• least squares problems (LS): this is the problem to minimize the length of the error vector $Ax - b$ where $A$ is a given $m \times n$ matrix of rank $m$, $b$ is a given column vector in $\mathbb{R}^m$ and $x$ runs over $X = \mathbb{R}^n$.
• shortest distance problems: this is the problem to find the point on a given convex set $A \subseteq X$ that has the shortest distance to a given point $p \in X$.
We will emphasize throughout this chapter the parallel between convex and smooth optimization. By the latter we mean, roughly speaking, optimization problems that can have equality constraints and where all functions that describe the problem are continuously differentiable, that is, all their partial derivatives exist and are continuous. A precise definition of smooth optimization will be given later, when we need it. The main questions are the same for smooth problems as for convex problems, but the answers are much more satisfying for convex problems.
7.2 Existence and Uniqueness of Optimal Solutions
In applications, most questions of existence and uniqueness of optimal solutions can be answered by means of the following results.
7.2.1 Existence
To prove existence of optimal solutions for an optimization problem with a continuous objective function, one always uses the following theorem. If it is not possible to prove, by means of this theorem, existence of an optimal solution for an optimization problem occurring in an application, then the problem usually has no solution, and it is likely that something has gone wrong in the modeling process that led to the optimization problem.
Theorem 7.2.1 (Weierstrass Theorem) A lower semicontinuous function $f : T \to \mathbb{R}$ on a nonempty compact subset $T \subseteq X$ assumes its infimum.
Proof As $T$ is nonempty, we can choose a sequence $t_1, t_2, \ldots$ in $T$ for which the sequence of values $f(t_1), f(t_2), \ldots$ converges to $v = \inf_t f(t) \in [-\infty, +\infty)$. Note that at this point we do not yet know that $v > -\infty$. Now take a convergent subsequence of $t_1, t_2, \ldots$, with limit $\hat{t}$. This is possible as $T$ is compact, and the limit of the $f$ values of this subsequence is the limit of the $f$ values of the original sequence. Now we replace the sequence $t_1, t_2, \ldots$ by the chosen subsequence and leave the notation for it the same. Then $v = \lim_{i \to +\infty} f(t_i)$ is $\ge f(\hat{t})$, as $f$ is lower semicontinuous, and so, as $v = \inf_t f(t)$ and $f(\hat{t}) \in \mathbb{R}$, we get $f(\hat{t}) = v \in \mathbb{R}$.
We recall that compactness amounts to closedness and boundedness. Thus, four properties have to be verified: lower semicontinuity (or the stronger property continuity) of f , nonemptiness, closedness and boundedness of the domain T . The
hardest property to verify is boundedness. If all properties except boundedness have been checked, then existence of optimal solutions follows from Theorem 7.2.1 if there exists $\bar{t} \in T$ for which the sublevel set $\{t \in T \mid f(t) \le f(\bar{t})\}$ is bounded. This property holds for example if $f$ is coercive, that is, if $\lim_{\|t\| \to +\infty} f(t) = +\infty$. To prove existence of optimal solutions for a convex problem, one always uses the following consequence of Theorem 7.2.1, which is slightly more convenient.
Theorem 7.2.2 Let $f : X \to \overline{\mathbb{R}}$ be a closed proper convex function. If $f$ has no recession directions, then $f$ assumes its infimum.
Proof Choose $\bar{x} \in \operatorname{dom}(f)$. Then the infimum of $f$ does not change if we restrict it to the sublevel set $\{x \in X \mid f(x) \le f(\bar{x})\}$. This sublevel set is closed, as $f$ is closed. The recession cone of this sublevel set is the recession cone of $f$ and so it is equal to $\{0\}$. Therefore, this sublevel set is bounded, and so it is compact. Now apply Theorem 7.2.1.
Usually, it is relatively easy to check that f has no recession direction. Therefore, it is in general easier to prove existence in the convex case than in the smooth case.
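As a small worked illustration of Theorem 7.2.2 (the two functions below are chosen for this purpose and do not come from the text), compare two closed proper convex functions on $X = \mathbb{R}$. Here a recession direction of $f$ is taken to be a nonzero direction $d$ along which $f$ never increases, which is how the recession cone of the sublevel sets is used in the proof above.

```latex
\begin{align*}
f(x) = e^{x}:&\quad f(x + t \cdot (-1)) = e^{x - t} \text{ is nonincreasing in } t \ge 0,
  \text{ so } d = -1 \text{ is a recession direction;} \\
  &\quad \inf_x f(x) = 0 \text{ is not assumed.} \\
g(x) = x^{2}:&\quad g(x + t d) = (x + t d)^{2} \to +\infty \text{ for } t \to +\infty
  \text{ and each } d \ne 0, \text{ so there are no recession directions;} \\
  &\quad \inf_x g(x) = g(0) = 0 \text{ is assumed.}
\end{align*}
```

The first function shows that the assumption on recession directions cannot simply be dropped.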
7.2.2 Uniqueness
Here is the main uniqueness result for convex optimization.
Theorem 7.2.3 There is at most one optimal solution for the problem to minimize a strictly convex function $f : X = \mathbb{R}^n \to \mathbb{R}$.
Proof If $v = \inf_x f(x)$ is finite, then $f$ assumes its infimum on the (possibly empty) sublevel set $\{x \in X \mid f(x) \le v\}$, which is a convex set. If we let $x$ run over this set, then the point $(x, v)$ runs over a convex subset of the graph of $f$. As $f$ is strictly convex, this convex subset cannot contain any closed line segment of positive length. It follows that it is either empty or a singleton. That is, $f$ has at most one point of minimum.
Moreover, we give a uniqueness result in a situation where the objective function is not strictly convex. We consider convex optimization problems of the following type: minimize a linear function on a convex set. Each convex optimization problem can be brought into this form: minimization of a convex function $f : X \to \mathbb{R}$ is equivalent to letting $(x, \rho)$ run over the epigraph of $f$ and minimizing $\rho$. This type of problem has the following property.
Proposition 7.2.4 Let $l : X \to \mathbb{R}$ be a linear function and $A \subseteq X$ a convex set. Assume that $l$ is not constant on $A$. Then the (possibly empty) set of optimal solutions of the problem to minimize $l$ on $A$ is contained in the relative boundary of $A$.
Proof It suffices to prove that a relative interior point $p$ of $A$ cannot be an optimal solution. Choose $x, y \in A$ for which $l(x) < l(y)$. Then choose $t > 0$ so small that $p + t(x - y) \in A$; this is possible as $p$ is a relative interior point of $A$. Then $l(p + t(x - y)) = l(p) + t(l(x) - l(y)) < l(p)$ and so $p$ is not an optimal solution.
We need a definition.
Definition 7.2.5 A convex set $A \subseteq X$ is strictly convex if its relative boundary contains no closed line segment of positive length.
Now we are ready to give the promised second uniqueness result.
Theorem 7.2.6 There is at most one optimal solution for the problem to minimize, on a strictly convex set $A$, a linear function $l$ that is not constant on $A$.
Proof The set of optimal solutions is a convex set that is, by Proposition 7.2.4, a subset of the relative boundary of $A$. Therefore, it does not contain any closed line segment of positive length, as $A$ is strictly convex. It follows that it is either empty or a singleton.
Here is a third class of convex optimization problems for which one has uniqueness.
Theorem 7.2.7 The problem to find a point on a nonempty closed convex set $A \subseteq X$ having shortest distance to a given point $p \in X$ has a unique optimal solution.
There are no uniqueness results for global optima of nonconvex optimization problems.
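Theorem 7.2.7 can be illustrated numerically for one simple set. The sketch below assumes that $A$ is the closed unit disc in the plane, for which the closest point is known in closed form (scale $p$ back to the disc if it lies outside). The helper names and the sample point are hypothetical, and the random sampling is only a sanity check, not a proof.

```python
import math
import random

def project_onto_unit_disc(p):
    """Closest point to p in the closed unit disc {a in R^2 : |a| <= 1}."""
    norm = math.hypot(p[0], p[1])
    scale = 1.0 if norm <= 1.0 else 1.0 / norm
    return (scale * p[0], scale * p[1])

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

p = (3.0, 4.0)
a_hat = project_onto_unit_disc(p)   # approximately (0.6, 0.8)
shortest = dist(a_hat, p)           # approximately 4

# Sanity check: no randomly sampled point of the disc is closer to p.
random.seed(0)
samples = []
while len(samples) < 10_000:
    a = (random.uniform(-1, 1), random.uniform(-1, 1))
    if math.hypot(a[0], a[1]) <= 1.0:
        samples.append(a)
closest_sample_distance = min(dist(a, p) for a in samples)
```

Every sampled point is at least as far from `p` as `a_hat` is, in line with the uniqueness statement.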
7.2.3 Illustration of Existence and Uniqueness
Here is an illustration of the use of these existence and uniqueness results for convex optimization.
Example 7.2.8 (The Waist of Three Lines in Space) Once, Dr. John Tyrrell, lecturer at King’s College London, challenged us PhD students to crack an unsolved problem: to give a proof for the following phenomenon. If an elastic band is stretched around three iron wires in space that are mutually disjoint and at least two of which are not parallel, then the band will always slide to an equilibrium position, and this position does not depend on how you initially stretch the elastic band around the wires. In other words, the three lines have a unique waist. We all did a lot of thinking and calculating but all was in vain. Only many years later, when I gave a course on convex analysis, did I realize how simple the solution is. It does not require any calculation.
The problem can be formalized as follows: show that for three given mutually disjoint lines $l_1, l_2, l_3$ in space $\mathbb{R}^3$ (at least two of them should not be parallel) the problem to minimize the function $f(p_1, p_2, p_3) = \|p_1 - p_2\| + \|p_2 - p_3\| + \|p_3 - p_1\|$, where the point $p_i$ runs over the line $l_i$ for $i = 1, 2, 3$, possesses a unique optimal solution. This problem is illustrated in Fig. 7.1.
Fig. 7.1 Waist of three lines in space (the lines $l_1, l_2, l_3$ with the points $p_1, p_2, p_3$ on them)
This is a convex optimization problem as all three terms of $f$ are convex. This gives already some information: the function $f$ cannot have any stationary points (points where all three partial derivatives are equal to zero) that are not a global minimum. In particular, there can be no local minimum that is not a global minimum.
Existence of an optimal solution follows from Theorem 7.2.2. Indeed, if we choose $q_i, r_i \in l_i$ for $i = 1, 2, 3$, such that $q_j \ne r_j$ for all $j$, and write $p_i(t) = (1 - t)q_i + t r_i$ for all $t \in \mathbb{R}_+$, then it follows that the three terms of $f(p_1(t), p_2(t), p_3(t)) = \|p_1(t) - p_2(t)\| + \|p_2(t) - p_3(t)\| + \|p_3(t) - p_1(t)\|$ are the Euclidean norm of a vector function of the form $a + tb$, and for at least one of the three, $b$ is not the zero vector. This implies that $f(p_1(t), p_2(t), p_3(t))$ tends to $+\infty$ for $t \to +\infty$; in particular, it is nondecreasing for large $t$, as required.
Uniqueness of an optimal solution follows from Theorem 7.2.3: indeed, the function $f$ is strictly convex. We check this now. As all three terms of $f$ are convex, it suffices to show that at least one of them is strictly convex. Let $p_i(t) = q_i + t r_i$ with $q_i, r_i \in \mathbb{R}^3$ be a parametric description of line $l_i$ for $i = 1, 2, 3$. Consider the differences $p_1(t) - p_2(t)$, $p_2(t) - p_3(t)$, $p_3(t) - p_1(t)$. None of these three functions takes the value zero as the lines are mutually disjoint, and not all three can be constant, as the lines $l_1, l_2, l_3$ are not all parallel. We assume that $p_1(t) - p_2(t)$ is not constant, as we may without restricting the generality of the argument. It follows readily that the term $\|p_1(t) - p_2(t)\|$ is strictly convex. Hence the function $f$ is strictly convex. This settles the ‘waist of three lines in space’ problem.
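The existence and uniqueness of the waist can also be observed numerically. The following sketch is an illustration only: the three lines are hypothetical data chosen to be mutually disjoint and not all parallel, and the minimization method (cyclic coordinate descent with a golden-section line search on the interval [-10, 10]) is a simple stand-in, not a method from the text. Two very different initial positions of the band slide to the same equilibrium.

```python
import math

# Hypothetical data: line i is {base[i] + t * direction[i] : t real}.
base = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, -1.0, 0.0)]
direction = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

def point(i, t):
    return tuple(b + t * d for b, d in zip(base[i], direction[i]))

def f(t):
    """Length of the band through the points with parameters t = (t1, t2, t3)."""
    p = [point(i, t[i]) for i in range(3)]
    return math.dist(p[0], p[1]) + math.dist(p[1], p[2]) + math.dist(p[2], p[0])

def golden_min(phi, lo, hi, iters=80):
    """Minimize a unimodal function phi on [lo, hi] by golden-section search."""
    r = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = phi(c), phi(d)
    for _ in range(iters):
        if fc < fd:                      # minimizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = phi(c)
        else:                            # minimizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = phi(d)
    return (a + b) / 2

def waist(start, sweeps=200):
    """Cyclic coordinate descent: slide one point at a time along its line."""
    t = list(start)
    for _ in range(sweeps):
        for i in range(3):
            def phi(s, i=i):
                u = t[:]
                u[i] = s
                return f(u)
            t[i] = golden_min(phi, -10.0, 10.0)
    return t

first = waist((-5.0, 5.0, -5.0))
second = waist((8.0, -8.0, 8.0))
gap = max(abs(x - y) for x, y in zip(first, second))  # close to zero
```

Coordinate descent is adequate here because $f$ is convex and differentiable on these data (the lines are disjoint, so no term of $f$ vanishes); for other data a general-purpose convex solver would be the more robust choice.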
7.3 Smooth Optimization Problem
For a good understanding of convex optimization, it is useful to keep in mind that this runs parallel to the more familiar smooth optimization. In particular, the two optimality conditions for convex problems that will be given will be seen to run parallel to two well-known optimality conditions for smooth problems: Fermat’s theorem (‘in a local minimum all partial derivatives are equal to zero’) and
Lagrange’s multiplier rule (‘in a local minimum of an equality constrained problem, you should first take up the constraint functions in the objective function by means of multipliers and then put all partial derivatives equal to zero’). Before giving each of the two optimality conditions for convex problems, we will first consider its smooth analogue. As a preparation for the latter, we give in this section a precise definition of smooth optimization. This requires that we make precise the intuitive idea that a local minimum should be considered to be ‘smooth’ if near this point not only the description of the feasible region but also the graph of the objective function are ‘nonsingular’ in some sense. Here are some examples of points of local minimum that should be considered to be nonsmooth.
Example 7.3.1 (Nonsmooth Points of Local Minimum)
1. The problem to minimize the absolute value, $\min f(x) = |x|$, $x \in \mathbb{R}$, has optimal solution $\hat{x} = 0$. At this point $f$ is nondifferentiable.
2. For the problem to minimize the square of the Euclidean norm on the plane over the union of the two coordinate axes, $\min f(x) = x_1^2 + x_2^2$, $x \in \mathbb{R}^2$, $x_1 x_2 = 0$, the origin $\hat{x} = 0_2$ is an optimal solution. Near this point, the feasible set is singular, being the union of two lines that intersect at this point.
3. The problem to minimize the square of the Euclidean norm on the plane over the horizontal axis, which is described by squaring the natural description $x_2 = 0$, that is, $\min f(x) = x_1^2 + x_2^2$, $x \in \mathbb{R}^2$, $x_2^2 = 0$, has optimal solution $\hat{x} = 0_2$. Near this point, the feasible set is described in an unnecessarily complicated way.
Now we are going to give a precise definition of a smooth point of local minimum that is adequate for the present purpose, which is to shed light on the parallel of convex optimization with smooth optimization. To begin with, we need the concepts differentiable and continuously differentiable.
Definition 7.3.2 A function $G : U \to Y$ from an open subset $U$ of a finite-dimensional inner product space $X$ to another finite-dimensional inner product space $Y$ is differentiable at a point $\hat{x} \in U$ if there exists a linear function $G'(\hat{x}) : X \to Y$, necessarily unique and called the derivative of $G$ at $\hat{x}$, such that $\|G(\hat{x} + h) - G(\hat{x}) - G'(\hat{x})(h)\| / \|h\|$ tends to zero if $h \to 0$.
If $Y = \mathbb{R}$, then a linear function $X \to Y$ is of the form $x \mapsto \langle a, x \rangle$ for some $a \in X$; this allows us to identify the derivative of a function $X \to \mathbb{R}$ at a point with an element of $X$. If $X = \mathbb{R}^n$ and $Y = \mathbb{R}^m$, then the derivative of a function $X \to Y$ at a point can be identified with an $m \times n$ matrix. For practical purposes, we need the following stronger notion of differentiability.
Definition 7.3.3 A function $G : U \to Y$ from an open subset $U$ of a finite-dimensional inner product space $X$ to a finite-dimensional inner product space $Y$ is continuously differentiable if it is differentiable at all points of $U$ and if moreover the derivative $G'(x)$ depends continuously on $x \in U$. For brevity, a continuously differentiable function is called a $C^1$ function.
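The defining property in Definition 7.3.2 can be checked numerically for a concrete map. In the sketch below, the map `G`, the point `x_hat` and the directions `h` are illustrative assumptions; the candidate derivative is written as the 2 × 2 matrix of partial derivatives, in line with the identification of derivatives with matrices mentioned above.

```python
import math

def G(x):
    """Illustrative C^1 map G : R^2 -> R^2 (an assumption, not from the text)."""
    return (x[0] ** 2, x[0] * x[1])

def dG(x):
    """Candidate derivative of G at x, as a 2 x 2 matrix of partial derivatives."""
    return ((2 * x[0], 0.0), (x[1], x[0]))

def remainder_ratio(x_hat, h):
    """Compute ||G(x_hat + h) - G(x_hat) - G'(x_hat)(h)|| / ||h||."""
    J = dG(x_hat)
    moved = G((x_hat[0] + h[0], x_hat[1] + h[1]))
    start = G(x_hat)
    linear = (J[0][0] * h[0] + J[0][1] * h[1], J[1][0] * h[0] + J[1][1] * h[1])
    r = (moved[0] - start[0] - linear[0], moved[1] - start[1] - linear[1])
    return math.hypot(r[0], r[1]) / math.hypot(h[0], h[1])

x_hat = (1.0, 2.0)
steps = (1e-1, 1e-2, 1e-3)
ratios = [remainder_ratio(x_hat, (t, -t)) for t in steps]  # shrink like t
```

For this particular map the remainder is exactly $(h_1^2, h_1 h_2)$, so along $h = (t, -t)$ the ratio equals $t$ up to rounding, which tends to zero as the definition requires.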
The following result gives a partial explanation why the concept of continuous differentiability is convenient.
Proposition 7.3.4 A function $G = (g_1, \ldots, g_m) : U \to Y$, where $X = \mathbb{R}^n$, $Y = \mathbb{R}^m$ and $U$ is an open subset of $\mathbb{R}^n$, is continuously differentiable iff all partial derivatives $\partial g_i / \partial x_j$ exist and are continuous. Then one has the following formula for the derivative of $G$: $G'(x) = \left( \frac{\partial g_i}{\partial x_j}(x) \right)_{ij}$.
This reduces the task of calculating the derivative of a continuously differentiable vector function of several variables to the task of calculating partial derivatives, which can be done by means of the calculus rules for calculating derivatives of functions of one variable. Now we are going to use this concept to give the definitions of local minimum and smooth local minimum.
Definition 7.3.5 A function $f : U \to \mathbb{R}$ on an open set $U$ of $\mathbb{R}^n$ has a local minimum at a point $\hat{x} \in U$ if there exists an open subset $V$ of $U$ containing the point $\hat{x}$ such that $f$ assumes its minimum on $V$ at the point $\hat{x}$, $f(x) \ge f(\hat{x})$ $\forall x \in V$.
Definition 7.3.6 A point of local minimum $\hat{x} \in X$ of a function $f : X \to (-\infty, +\infty]$ for which the graph of $f$ is described in an open set $U \subseteq X$ containing $\hat{x}$ as the zero set of a $C^1$ function $G : U \to Y$, where $Y$ is a finite-dimensional inner product space, is smooth if $G'(\hat{x})$ is surjective.
Note that this property can be verified in concrete examples: if $G$ is given, then one can usually compute $G'(\hat{x})$ and then check surjectivity. In order to understand the sense of this definition, we need the concept tangent vector.
Definition 7.3.7 A tangent vector to a subset $R$ of a finite-dimensional vector space $X$ at a point $r \in R$ is a vector $d \in X$ for which there exist sequences $(d_k)_k$ in $X$ and $(t_k)_k$ in $\mathbb{R}_{++}$ such that $d_k \to d$, $t_k \to 0$ for $k \to \infty$, and $r + t_k d_k \in R$ for all $k$. We write $T_r(R)$ for the set of all tangent vectors to $R$ at $r$ and call it the tangent cone of the set $R$ at the point $r$. Note that this is a cone: for each $d \in T_r(R)$ and each $\alpha \in \mathbb{R}_{++}$, one has $\alpha d \in T_r(R)$.
The following nontrivial result makes it possible to calculate the tangent cone in some interesting cases.
Theorem 7.3.8 (Tangent Space Theorem) Let $X, Y$ be finite-dimensional inner product spaces, $G : X \to Y$ a continuously differentiable function and $\hat{x} \in X$ for which $G(\hat{x}) = 0$ and moreover the linear function $G'(\hat{x})$ from $X$ to $Y$ is surjective.
Then the tangent cone to the set of solutions of the vector equation $G(x) = 0$ at the
point $\hat{x}$ is the kernel of the derivative of $G$ at $\hat{x}$, $\{h \in X \mid G'(\hat{x})(h) = 0\}$: $T_{\hat{x}}(\{x \in X \mid G(x) = 0\}) = \ker G'(\hat{x})$.
This result is an equivalent version of two related central results from differential calculus: the inverse function theorem and the implicit function theorem. Note in particular that the tangent cone in Theorem 7.3.8 is a subspace. Therefore, the tangent cone of the solution set of $G(x) = 0$ at a solution $\hat{x}$ for which $G'(\hat{x})$ is surjective is called tangent space instead of tangent cone. Now we can understand the sense of the definition of smooth local minimum above. Indeed, if the graph of a function $f$ is described near a point $(\hat{x}, f(\hat{x}))$ as the solution set of a $C^1$ vector function $G$ with $G'(\hat{x}, f(\hat{x}))$ surjective, then the graph of $f$ is near the point $(\hat{x}, f(\hat{x}))$ smooth in the following sense: it looks like an affine subspace, $(\hat{x}, f(\hat{x})) + T_{(\hat{x}, f(\hat{x}))}(\Gamma_f) = (\hat{x}, f(\hat{x})) + \ker G'(\hat{x}, f(\hat{x}))$.
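A small numerical illustration of the tangent space theorem, under the assumption that the solution set is the unit circle, $G(x) = x_1^2 + x_2^2 - 1$, and the point is $\hat{x} = (1, 0)$; these choices and the helper names are not from the text. A direction $d$ is tangent when the distance from $\hat{x} + t d$ to the solution set is small compared with $t$, which is what the sequences in Definition 7.3.7 express.

```python
import math

def distance_to_circle(x):
    """Distance from x to the solution set of G(x) = x1^2 + x2^2 - 1 = 0."""
    return abs(math.hypot(x[0], x[1]) - 1.0)

x_hat = (1.0, 0.0)             # G(x_hat) = 0 and G'(x_hat) = (2, 0) is surjective
kernel_direction = (0.0, 1.0)  # G'(x_hat)(h) = 2 * h1 = 0
other_direction = (1.0, 0.0)   # not in the kernel of G'(x_hat)

def ratio(d, t):
    """Distance from x_hat + t*d to the circle, divided by t."""
    return distance_to_circle((x_hat[0] + t * d[0], x_hat[1] + t * d[1])) / t

steps = (1e-1, 1e-2, 1e-3)
kernel_ratios = [ratio(kernel_direction, t) for t in steps]  # roughly t / 2
other_ratios = [ratio(other_direction, t) for t in steps]    # stay at 1
```

The ratios for the kernel direction tend to zero, while those for the other direction stay at 1, which matches the claim that the tangent cone at $\hat{x}$ is exactly $\ker G'(\hat{x})$.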
7.4 Fermat’s Theorem (Smooth Case)
We recall Fermat’s theorem for smooth problems.
Theorem 7.4.1 (Fermat’s Theorem (Smooth Case)) Let a function $f : U \to \mathbb{R}$, where the set $U \subseteq \mathbb{R}^n$ is open, and a point of differentiability $\hat{x}$ of $f$ be given. Then the derivative of $f$ at $\hat{x}$ is zero, $f'(\hat{x}) = 0$, if $f$ has a local minimum at $\hat{x}$.
In order to emphasize the parallel between the two optimality conditions for convex optimization and their smooth analogues to be given, of which this is the first one, we will prove all these optimality conditions using the concept of the tangent cone. In the proof, we need the following result.
Proposition 7.4.2 Let $X, Y$ be finite-dimensional inner product spaces, $G : X \to Y$ a continuously differentiable function and $\hat{x} \in X$. Then the tangent cone of the set $R = \Gamma_G$, the graph of $G$, at the point $r = (\hat{x}, G(\hat{x}))$, is the graph of the derivative $G'(\hat{x})$: $T_{(\hat{x}, G(\hat{x}))}(\Gamma_G) = \Gamma_{G'(\hat{x})}$.
This result follows immediately from the definitions of the derivative and the tangent cone. Alternatively, it is a special case of the tangent space theorem given above: consider the function with recipe $(x, z) \mapsto G(x) - z$. Now we are ready to give the proof of Theorem 7.4.1 in the promised style.
Proof of Fermat’s Theorem in the Smooth Case Assume that $\hat{x}$ is a point of local minimum of the function $f$. To prove the theorem, it suffices by Proposition 7.4.2 to prove the following claim: $(X \times 0) + T_{(\hat{x}, f(\hat{x}))}(\Gamma_f) \subsetneq X \times \mathbb{R}$. So we have here a proper inclusion and no equality. To prove this proper inclusion, we show that for every $d = (x, \rho) \in T_{(\hat{x}, f(\hat{x}))}(\Gamma_f)$, one has $\rho = 0$. Choose sequences $(d_k = (x_k, \rho_k))_k$ in $X \times \mathbb{R}$ and $(t_k)_k$ in $\mathbb{R}_{++}$ such that $d_k \to d$, $t_k \to 0$ and $(\hat{x}, f(\hat{x})) + t_k d_k = (\hat{x} + t_k x_k, f(\hat{x}) + t_k \rho_k) \in \Gamma_f$. So, as $\hat{x}$ is a point of local minimum of $f$ and $x_k \to x$, $t_k \to 0$, we have $f(\hat{x}) + t_k \rho_k = f(\hat{x} + t_k x_k) \ge f(\hat{x})$ for all sufficiently large $k$. Hence $t_k \rho_k \ge 0$ and so, as $t_k \in \mathbb{R}_{++}$, $\rho_k \ge 0$ for all sufficiently large $k$. As $\rho_k \to \rho$, it follows that $\rho \ge 0$. Now we repeat the argument for $-d$ instead of for $d$; this is allowed as $T_{(\hat{x}, f(\hat{x}))}(\Gamma_f)$ is a subspace, so $d \in T_{(\hat{x}, f(\hat{x}))}(\Gamma_f)$ implies $-d \in T_{(\hat{x}, f(\hat{x}))}(\Gamma_f)$. This gives $-\rho \ge 0$. In all, we get $\rho = 0$.
The power of Fermat’s theorem in the smooth case is based on its combination with the differential calculus (rules for computing the derivative of a continuously differentiable function).
Remark 7.4.3 Note that Fermat’s theorem in the smooth case is equivalent to the proper inclusion $(X \times 0) + T_{(\hat{x}, f(\hat{x}))}(\Gamma_f) \subsetneq X \times \mathbb{R}$.
It is convenient that for each smooth or convex optimization problem that can be analyzed completely, the analysis can always be presented in the same systematic way, which we call the four-step method. This method proceeds as follows.
1. Modeling (and existence and/or uniqueness). The problem is modeled in a precise way. Moreover, existence and/or uniqueness of the optimal solution are proved if this is desirable.
2. Optimality conditions. The optimality conditions are written down for the problem.
3. Analysis. The optimality conditions are analyzed.
4. Conclusion. The optimal solution(s) is/are given.
Now we give the first illustration of this four-step method.
Example 7.4.4 Minimize the function $2x_1^4 + x_2^4 - x_1^2 - 2x_2^2$.
1. Modeling and existence. $\min f(x) = 2x_1^4 + x_2^4 - x_1^2 - 2x_2^2$, $x \in \mathbb{R}^2$. Existence of an optimal solution $\hat{x}$ follows from the coercivity of $f$ ($f(x) = x_1^2(2x_1^2 - 1) + x_2^2(x_2^2 - 2) \to +\infty$ for $\|x\| \to +\infty$).
2. Fermat: $f'(x) = (0, 0) \Rightarrow \partial f / \partial x_1 = 8x_1^3 - 2x_1 = 0$, $\partial f / \partial x_2 = 4x_2^3 - 4x_2 = 0$.
3. Analysis. Nine stationary points: $x_1 \in \{0, +\tfrac{1}{2}, -\tfrac{1}{2}\}$ and $x_2 \in \{0, +1, -1\}$. Comparison of $f$ values shows that four stationary points have lowest $f$ value: $x_1 = \pm\tfrac{1}{2}$, $x_2 = \pm 1$.
4. Conclusion. There are four optimal solutions, $\hat{x} = (\pm\tfrac{1}{2}, \pm 1)$. (Remark: $(0, 0)$ is a local maximum.)
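Steps 2 to 4 of this example are easy to replay mechanically. The sketch below (variable names are illustrative) lists the nine stationary points, confirms that the gradient vanishes at each of them, and compares the values of $f$.

```python
from itertools import product

def f(x1, x2):
    return 2 * x1**4 + x2**4 - x1**2 - 2 * x2**2

def grad(x1, x2):
    """Partial derivatives of f with respect to x1 and x2."""
    return (8 * x1**3 - 2 * x1, 4 * x2**3 - 4 * x2)

# Step 3: the nine candidates from the stationarity conditions.
stationary = list(product((0.0, 0.5, -0.5), (0.0, 1.0, -1.0)))
all_stationary = all(grad(*p) == (0.0, 0.0) for p in stationary)

# Step 4: compare f values and keep the points with the lowest value.
values = {p: f(*p) for p in stationary}
lowest = min(values.values())
minimizers = sorted(p for p, v in values.items() if v == lowest)
print(lowest, minimizers)
```

The lowest value is $-9/8$, attained at the four points $(\pm\tfrac{1}{2}, \pm 1)$, in agreement with the conclusion of the example.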
7.5 Convex Optimization: No Need for Local Minima
Before giving Fermat’s theorem in the convex case, we explain in this section why the concept local minimum, which plays a role in Fermat’s theorem in the smooth case, plays no role in convex optimization.
Proposition 7.5.1 Each local minimum of a proper convex function is also a global minimum.
Figure 7.2 illustrates this result and its proof.
Fig. 7.2 No distinction between local and global minima (graph of $f$ with the points $\bar{x}$, $x_\alpha$ and $\hat{x}$ marked)
Proof We argue by contradiction. Assume $f$ is a proper convex function on $X$ and $\hat{x}$ is a local minimum but not a global minimum. Then we can pick $\bar{x}$ such that $f(\bar{x}) < f(\hat{x})$. Now consider a point $P$ on the segment with endpoints $(\hat{x}, f(\hat{x}))$ and $(\bar{x}, f(\bar{x}))$ close to $(\hat{x}, f(\hat{x}))$. That is, $P = (x_\alpha, y_\alpha) = (1 - \alpha)(\hat{x}, f(\hat{x})) + \alpha(\bar{x}, f(\bar{x}))$ with $\alpha \in (0, 1)$ sufficiently small. What ‘sufficiently small’ means will become clear in a moment. On the one hand, $P$ lies in the epigraph of $f$, and therefore, by the local minimality of $\hat{x}$, we get that for $\alpha > 0$ sufficiently small, $P$ is so close to $(\hat{x}, f(\hat{x}))$ that the last coordinate of $P$ is $\ge f(\hat{x})$. On the other hand, this last coordinate equals $(1 - \alpha)f(\hat{x}) + \alpha f(\bar{x})$, and, by $f(\bar{x}) < f(\hat{x})$, this last coordinate is $< (1 - \alpha)f(\hat{x}) + \alpha f(\hat{x}) = f(\hat{x})$. So we get the required contradiction.
7.6 Fermat’s Theorem (Convex Case)

In order to obtain Fermat’s theorem for convex problems, you should replace the equation $f'(x) = 0$ by the inclusion $0 \in \partial f(x)$.

Theorem 7.6.1 (Fermat’s Theorem (Convex Case)) Let $f : X \to (-\infty, +\infty]$ be a proper convex function and let $\hat{x} \in \mathrm{dom}(f)$. Then $\hat{x}$ is an optimal solution of the convex optimization problem $\min_x f(x)$ iff $0 \in \partial f(\hat{x})$.

Figure 7.3 illustrates Theorem 7.6.1. The power of this result is based on its combination with the convex calculus (rules for computing the subdifferential of a convex function). This result itself is just a tautology and therefore we do not display the proof: by the definition of the subdifferential, $0 \in \partial f(\hat{x})$ means precisely that $f(x) \ge f(\hat{x})$ for all $x \in X$. However, in order to emphasize the parallel between the four optimality conditions for smooth and convex optimization that we give (Fermat and Lagrange for smooth and convex problems), we mention the following reformulation of the result above in terms of a tangent cone.

Remark 7.6.2 Fermat’s theorem in the convex case is equivalent to the proper inclusion
$(X \times 0) + T_{(\hat{x}, f(\hat{x}))}(\mathrm{epi}(f)) \subsetneq X \times \mathbb{R}$.
Compare this reformulation of Fermat’s theorem in the convex case to the one given for Fermat’s theorem in the smooth case in Remark 7.4.3. Note that the epigraph and the graph of a function $f : X \to (-\infty, +\infty]$ are related: $\mathrm{epi}(f) = \mathrm{graph}(f) + (\{0_X\} \times [0, +\infty))$.

Fig. 7.3 Fermat’s theorem (convex version): $0 \in \partial f(\hat{x})$
In passing, we mention that here we come across another situation where the tangent cone can be determined easily.

Proposition 7.6.3 The tangent cone to a convex set $A \subseteq X$ at a point $\bar{a} \in A$ is the cone generated by the set $A - \bar{a}$,
$T_{\bar{a}}(A) = \mathbb{R}_+(A - \bar{a}) = \{\rho(a - \bar{a}) \mid \rho \in \mathbb{R}_+,\ a \in A\}$.

We emphasize that Fermat’s theorem 7.6.1 in the convex case gives a criterion for global optimality: that is, the condition $0 \in \partial f(\hat{x})$ holds if and only if $\hat{x}$ is an optimal solution for the convex optimization problem $\min_x f(x)$. In other words, it is a necessary and sufficient condition for global optimality. Therefore, the ideal of the theory of optimization is reached here, in some sense. In comparison, Fermat’s theorem 7.4.1 in the smooth case gives only a necessary condition for local optimality. Now we give some examples.

Example 7.6.4 (Minimization of a Complicated Looking Function) Solve the problem to minimize $f(x, y) = x^2 + 2y^2 - 3x - y + 2\sqrt{x^2 - 4x + y^2 + 4}$.
1. Modeling and convexity. $f(x, y) = (x\ y)\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} - 3x - y + 2\|(x, y) - (2, 0)\| \to \min$, $(x, y) \in (\mathbb{R}^2)^T$. The function $f(x, y)$ is strictly convex, as its building pattern reveals: it is the sum of the strictly convex function $(x\ y)\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}$ and the two convex functions $-3x - y$ and $2\|(x, y) - (2, 0)\|$.
2. Criterion. Convex Fermat: $0 \in \partial f(x, y)$. There is a unique point of nondifferentiability, $(2, 0)$; at this point, the subdifferential of $f$ is the sum of the derivative $(x^2 + 2y^2 - 3x - y)' = (2x - 3, 4y - 1)$ taken at the point $(2, 0)$, that is, $(1, -1)$, and the unit disk scaled from the origin by the factor 2. That is, we have to take the disk with center the origin and radius 2 and translate it over the vector $(1, -1)$. This gives the disk with center $(1, -1)$ and radius 2.
3. Analysis. Let us try the point of nondifferentiability to begin with. It is a special point. Its subdifferential is a relatively large set, so we have a fair chance that it contains the origin. All other points have a subdifferential consisting of one point only. It would be unlikely that at a randomly chosen such point the derivative is zero. Trying to find a point of differentiability that is a solution therefore requires the solution of the stationarity equations. These are two complicated equations in two unknowns. Let us hope that we are lucky with the investigation of the point of nondifferentiability. Then we do not have to solve this system. Well, the criterion for optimality of the point of nondifferentiability $(2, 0)$ is: the origin has to lie inside the disk with center $(1, -1)$ and radius 2. This is illustrated in Fig. 7.4. This picture shows that this is indeed the case. To be more precise, the distance between $(1, -1)$ and $(0, 0)$ is $\sqrt{2}$, which is indeed smaller than 2.
4. Conclusion. The problem has a unique solution $(2, 0)$.

Fig. 7.4 Catching the origin in the subdifferential of $f$ at some point

In the next example, both calculus rules for subdifferentials, Moreau–Rockafellar and Dubovitskii–Milyutin, are used.

Example 7.6.5 (Application of Moreau–Rockafellar and Dubovitskii–Milyutin) Minimize the function $\sqrt{x^2 + y^2} + |x| + \tfrac{3}{2}x + y^4 + \tfrac{1}{2}y$.
1. Modeling and existence. $\min f(x, y) = \sqrt{x^2 + y^2} + |x| + \tfrac{3}{2}x + y^4 + \tfrac{1}{2}y$, $(x, y) \in \mathbb{R}^2$. This problem is convex.
2. Convex Fermat: $0 \in \partial f(x, y)$.
3. Analysis. Again we try the point of nondifferentiability $(x, y) = (0, 0)$. We have
• $\partial(\sqrt{x^2 + y^2})(0, 0) = B(0, 1)$, the disk with radius 1 and center at the origin;
• $\partial(\tfrac{3}{2}x + y^4 + \tfrac{1}{2}y)(0, 0) = \{(\tfrac{3}{2}x + y^4 + \tfrac{1}{2}y)'(0, 0)\} = \{(\tfrac{3}{2}, \tfrac{1}{2})\}$;
• $\partial|x|(0, 0) = \partial\max(-x, x)(0, 0)$, and this is by Dubovitskii–Milyutin equal to $\mathrm{co}(\{(-1, 0)\} \cup \{(1, 0)\})$, the closed line segment on the $x$-axis with endpoints $-1$ and $+1$.
This gives, by Moreau–Rockafellar, that $\partial f(0, 0)$ is the Minkowski sum of $B(0, 1)$, the interval $[-1, 1]$ on the $x$-axis, and the singleton $\{(\tfrac{3}{2}, \tfrac{1}{2})\}$. Then you have the pleasure of drawing a picture of this set. This reveals that $\partial f(0, 0)$ is an upright square with sides 2 and with center the point $(\tfrac{3}{2}, \tfrac{1}{2})$, to which half-disks have been attached on the left and on the right. This set contains the origin.
4. Conclusion. The point $(0, 0)$ is an optimal solution.

These were numerical examples. Now we give a convincing example of an application to the facility location problem of Fermat–Weber.

Example 7.6.6 (The Facility Location Problem of Fermat–Weber) A facility has to be placed for which the average distance to three given points $a_i$, $i = 1, 2, 3$, that
do not lie on one straight line, is minimal. This is also called the Fermat–Torricelli–Steiner problem. The usual textbook solution is by the theorem of Fermat. The cases in which a Torricelli point does not exist are often omitted or require an awkward additional analysis. Below we use the convex Fermat theorem, and this gives an efficient treatment that includes all cases.
1. Modeling and existence: $f(x) = \tfrac{1}{3}\sum_{i=1}^{3}\|x - a_i\| \to \min$, $x \in \mathbb{R}^2$; $f(x) \approx \|x\| \to \infty$ for $\|x\| \to \infty$, so by Weierstrass an optimal solution $\hat{x}$ exists. The function $f$ is strictly convex, so the optimal solution is unique.
2. Convex Fermat: $0 \in \partial f(x)$; this subdifferential consists of the single point $\tfrac{1}{3}\sum_{i=1}^{3}\frac{x - a_i}{\|x - a_i\|}$ if $x \ne a_1, a_2, a_3$, and it equals $\tfrac{1}{3}\left(\{u \mid \|u\| \le 1\} + \frac{a_i - a_j}{\|a_i - a_j\|} + \frac{a_i - a_k}{\|a_i - a_k\|}\right)$ for $x = a_i$, where we have written $\{a_i, a_j, a_k\} = \{a_1, a_2, a_3\}$.
3. Analysis: in a point of nondifferentiability $x = a_i$ we get (use: the sum of two unit vectors has length $\le 1$ iff these make an angle $\ge 2\pi/3$) that the angle of the triangle with vertices $a_1, a_2, a_3$ at $a_i$ is $\ge 2\pi/3$; in a point of differentiability we get (use: the sum of three unit vectors equals zero iff these make angles $2\pi/3$ with each other) that $x$ is the Torricelli point of the triangle, the point that sees each pair of vertices of the triangle under the same angle, $2\pi/3$.
4. Conclusion: if one of the angles of the triangle is $\ge 2\pi/3$, then the unique optimal location is at the vertex of that angle; otherwise, there exists a unique Torricelli point and this is the unique optimal location. Figure 7.5 illustrates this: the first case on the left hand side and the second case on the right hand side.

Fig. 7.5 Facility location problem of Fermat–Weber (in the second case the three angles at the optimal location are 120°)
Note that in the last example the optimal solution is characterized by a law (provided it is a point of differentiability for the objective function): that the angles between the line segments from the optimal solution to the three given points are equal.
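The criterion $0 \in \partial f(\hat{x})$ used in Examples 7.6.4 to 7.6.6 reduces to elementary distance checks, which can be sketched in Python as follows. This is only an illustration under stated assumptions: all function names are hypothetical, and for the case in which no vertex is optimal the sketch uses Weiszfeld's fixed-point iteration, a standard numerical method that is not part of the text, to approximate the Torricelli point.

```python
import math

def sub(p, q):
    """Difference p - q of two points in the plane."""
    return (p[0] - q[0], p[1] - q[1])

def unit(v):
    """Scale a nonzero vector to length 1."""
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n)

# Example 7.6.4: the subdifferential at (2, 0) is the disk with center (1, -1)
# and radius 2, so the test is whether the origin is within distance 2 of (1, -1).
origin_in_disk = math.hypot(1.0, -1.0) <= 2.0

# Example 7.6.5: B(0,1) + [-1,1]x{0} + {(3/2,1/2)} is the set of points within
# distance 1 of the segment from (1/2, 1/2) to (5/2, 1/2).
nearest_x = min(max(0.0, 0.5), 2.5)   # x-coordinate of the segment point closest to the origin
origin_in_stadium = math.hypot(nearest_x, 0.5) <= 1.0

def vertex_is_optimal(a, i):
    """Example 7.6.6: 0 lies in the subdifferential at a_i iff the two unit
    vectors from the other vertices towards a_i have a sum of length <= 1."""
    j, k = [m for m in range(3) if m != i]
    u = unit(sub(a[i], a[j]))
    w = unit(sub(a[i], a[k]))
    return math.hypot(u[0] + w[0], u[1] + w[1]) <= 1.0 + 1e-12

def unit_vector_sum(a, x):
    """Sum of the unit vectors (x - a_i)/|x - a_i|; zero at a Torricelli point."""
    us = [unit(sub(x, p)) for p in a]
    return (sum(u[0] for u in us), sum(u[1] for u in us))

def fermat_weber(a, iters=5000):
    """Optimal location for three non-collinear points a[0], a[1], a[2]."""
    for i in range(3):
        if vertex_is_optimal(a, i):
            return a[i]
    # Assumption: Weiszfeld iteration (not from the text), started at the centroid.
    x = (sum(p[0] for p in a) / 3.0, sum(p[1] for p in a) / 3.0)
    for _ in range(iters):
        d = [math.hypot(*sub(x, p)) for p in a]
        if min(d) == 0.0:      # landed exactly on a vertex; stop to avoid dividing by zero
            break
        w = [1.0 / di for di in d]
        s = sum(w)
        x = (sum(wi * p[0] for wi, p in zip(w, a)) / s,
             sum(wi * p[1] for wi, p in zip(w, a)) / s)
    return x

wide = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.1)]    # angle at (0, 0) exceeds 120 degrees
acute = [(0.0, 0.0), (4.0, 0.0), (1.0, 3.0)]    # all angles below 120 degrees
print(origin_in_disk, origin_in_stadium)
print(fermat_weber(wide))
print(fermat_weber(acute), unit_vector_sum(acute, fermat_weber(acute)))
```

For the triangle `wide` the vertex test succeeds immediately, while for `acute` the returned point should make the sum of the three unit vectors vanish up to rounding, which is the 120° law mentioned in the note above.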
7.7 Perturbation of a Problem

We want to consider the sensitivity of a given smooth or convex optimization problem to changes in some of its data. We describe the given problem as follows. Let $f : X \to (-\infty, +\infty]$ be a function, where $X$ is a finite dimensional inner product space. Note that we allow $f$ to take the value $+\infty$. Consider the problem

$\min f(x),\ x \in X.$   $(P)$

Now we describe a change in the data of this problem by embedding $(P)$ into a family of problems

$\min f_y(x),\ x \in X,$   $(P_y)$

where the index $y$ runs over a finite dimensional inner product space $Y$, and where $f_y$ is a function $X \to (-\infty, +\infty]$ for all $y \in Y$, and where $f_0 = f$. The vector $y$ represents changes in some of the data of problem $(P)$. This family of optimization problems can be described efficiently by just one object, the function $F : X \times Y \to (-\infty, +\infty]$ defined by $F(x, y) = f_y(x)$ for all $x \in X$, $y \in Y$. The function $F$ is called the perturbation function and $(P_y)$ is called a perturbed problem. Here are two examples where perturbations arise naturally.

Example 7.7.1 (Perturbation)
1. Suppose you want to choose a location on a road that is close to a given number of locations; to be precise, the sum of the squares of the distances from this point to the given points has to be minimized. Consider the numerical example that there are three given locations, given by points on the line $\mathbb{R}$, two of which are precisely known, 3 and 5, and one is approximately known, $y \approx 0$. Then you choose $X = Y = \mathbb{R}$ and $F(x, y) = (x - y)^2 + (x - 3)^2 + (x - 5)^2$ for all $x, y \in \mathbb{R}$.
2. Suppose that you want to find the point in the plane $\mathbb{R}^2$ that satisfies the inequality $x_1 + 3x_2 \le 1$ and that is as close as possible to the point $(1, 2)$. We are interested in the sensitivity of this problem to a change of the right hand side of the inequality, which might represent a budget. Then you choose $X = \mathbb{R}^2$, $Y = \mathbb{R}$ and $F(x, y) = (x_1 - 1)^2 + (x_2 - 2)^2$ if $x_1 + 3x_2 - 1 + y \le 0$ and $F(x, y) = +\infty$ otherwise.
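The two perturbation functions of Example 7.7.1 translate directly into code. The sketch below is illustrative only: the names `F1`, `F2` and `S1` are not from the text, and `S1` (the optimal value of the perturbed problem in the first example) relies on the elementary fact that a sum of squared distances on the line is minimized at the average of the given points.

```python
import math
from fractions import Fraction

def F1(x, y):
    """Perturbation function of Example 7.7.1(1): F(x, y) = f_y(x)."""
    return (x - y) ** 2 + (x - 3) ** 2 + (x - 5) ** 2

def F2(x, y):
    """Perturbation function of Example 7.7.1(2); +infinity encodes infeasibility."""
    x1, x2 = x
    if x1 + 3 * x2 - 1 + y <= 0:
        return (x1 - 1) ** 2 + (x2 - 2) ** 2
    return math.inf

def S1(y):
    """Optimal value of the perturbed problem (P_y) for F1.

    The minimizer of a sum of squared distances is the average of the points.
    """
    x_opt = (y + 3 + 5) / 3
    return F1(x_opt, y)

# The unperturbed problem (P) is recovered at y = 0.
print(S1(Fraction(0)))          # optimal value of (P)

# Sensitivity of the optimal value to y, by an exact symmetric difference quotient
# (exact here because S1 is a quadratic function of y).
h = Fraction(1, 1000)
slope_at_0 = (S1(h) - S1(-h)) / (2 * h)
print(slope_at_0)

# The point (1, 2) itself violates the constraint of the second example at y = 0.
print(F2((1, 2), 0), F2((1, 0), 0))
```

Exact fractions are used for `S1` so that the value and slope at $y = 0$ can be compared with hand computations without rounding.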
7.8 Lagrange Multipliers (Smooth Case)

We recall the method of Lagrange multipliers for smooth problems.

Theorem 7.8.1 (Lagrange Multiplier Rule) Let functions $f_0, \ldots, f_m : U \to \mathbb{R}$, where the set $U \subseteq \mathbb{R}^n$ is open, and a point $\hat{x}$ of continuous differentiability of these functions be given. Then there exists a selection of real numbers $\lambda = (\lambda_0, \ldots, \lambda_m)$, not all zero, such that the derivative of the Lagrange function
$x \mapsto L(x; \lambda) = \lambda_0 f_0(x) + \cdots + \lambda_m f_m(x)$
at $\hat{x}$ is zero,
$L'(\hat{x}; \lambda) = \lambda_0 f_0'(\hat{x}) + \cdots + \lambda_m f_m'(\hat{x}) = 0$,
if the following problem has a local minimum at $\hat{x}$:
$\min f_0(x),\ x \in X,\ f_1(x) = \cdots = f_m(x) = 0$.

Sketch of the Proof of Theorem 7.8.1 We assume that $f_1'(\hat{x}), \ldots, f_m'(\hat{x})$ are linearly independent. This can be done wlog, for if there exists a nontrivial relation $\lambda_1 f_1'(\hat{x}) + \cdots + \lambda_m f_m'(\hat{x}) = 0$, then we can choose $\lambda_0 = 0$ and then we are through. Consider the following perturbation of the constraints of the problem: $f_j(x) + y_j = 0$, $1 \le j \le m$. To be more precise, we consider the perturbation function $F : X \times Y \to (-\infty, +\infty]$, where $Y = \mathbb{R}^m$, defined by $F(x, y) = f_0(x)$ if $f_1(x) + y_1 = \cdots = f_m(x) + y_m = 0$ and $F(x, y) = +\infty$ otherwise. To prove the theorem, it suffices to verify the claim that the following proper inclusion holds:
$(X \times 0_Y \times 0) + T_{(\hat{x}, 0, f(\hat{x}))}(\mathrm{graph}(F)) \subsetneq X \times Y \times \mathbb{R}$.
Indeed, $\mathrm{graph}(F)$ can be seen as the graph of the $C^1$ function with recipe
$x \mapsto (z, y) = (f_0(x), -f_1(x), \ldots, -f_m(x))$,
so by Proposition 7.4.2, $T_{(\hat{x}, 0, f(\hat{x}))}(\mathrm{graph}(F))$ can be seen as the graph of the function with recipe
$x \mapsto (z, y) = (f_0'(\hat{x})x, -f_1'(\hat{x})x, \ldots, -f_m'(\hat{x})x)$.
The claim implies that this tangent space lies in a hyperplane with equation $\langle \eta, y \rangle - \eta_0 z = 0$ for suitable $\eta = (\lambda_1, \ldots, \lambda_m) \in Y$, $\eta_0 = \lambda_0 \in \mathbb{R}$, not both zero. This gives the required Lagrange equations. To prove the proper inclusion above, it suffices to show that the intersection of the tangent space $T_{(\hat{x}, 0, f(\hat{x}))}(\mathrm{graph}(F))$ and the subspace $\{y = 0\}$ is contained in $X \times Y \times 0$. Indeed, this implies that the intersection of $(X \times 0_Y \times 0) + T_{(\hat{x}, 0, f(\hat{x}))}(\mathrm{graph}(F))$ and the subspace $\{y = 0\}$ is contained in $X \times 0_Y \times 0$; therefore $(X \times 0_Y \times 0) + T_{(\hat{x}, 0, f(\hat{x}))}(\mathrm{graph}(F))$ cannot be the entire space $X \times Y \times \mathbb{R}$. To begin with, the tangent space to the intersection $\mathrm{graph}(F) \cap \{y = 0\}$ at the point $(\hat{x}, 0, f(\hat{x}))$ is equal to the intersection of the tangent spaces to $\mathrm{graph}(F)$ and $\{y = 0\}$ at this point. We omit the easy verification.
So, as $f(x) = F(x, 0)$ for all $x$, it remains to show that the tangent space to $\mathrm{graph}(f)$ at the point $(\hat{x}, f(\hat{x}))$ is contained in $X \times 0$. This follows immediately from the local minimality of $f$ at $\hat{x}$.

Remark 7.8.2 This sketch shows that Lagrange’s multiplier method in the smooth case is equivalent to the proper inclusion
$(X \times 0_Y \times 0) + T_{(\hat{x}, 0, f(\hat{x}))}(\mathrm{graph}(F)) \subsetneq X \times Y \times \mathbb{R}$.
Compare this reformulation of Lagrange’s multiplier method in the smooth case with the reformulations of Fermat’s theorem in the smooth and convex case that have been given in Remarks 7.4.3 and 7.6.2 respectively. As an aside, we point out that the proof that is sketched above can be generalized to a more general type of smooth optimization problem than minimization under equality constraints.

The power of the Lagrange multiplier method is the reversal of the natural order of the two tasks, which is ‘first eliminate and then differentiate’. The advantage of this reversal is that it simplifies the most difficult task, elimination, turning it from a nonlinear problem into a linear problem. In many examples of optimization problems with equality constraints, this reversal is not necessary: one uses the constraints to eliminate some variables, and then one applies Fermat’s theorem (in the smooth case). But sometimes this does not work, such as in the following example of a simple looking problem that would be very hard or even impossible to solve without this reversal of tasks.

Example 7.8.3 (Lagrange Multiplier Rule) Find the minima and maxima of the function $x_1^2 + 12x_1x_2 + 2x_2^2$ on the ellipse $4x_1^2 + x_2^2 = 25$.
1. Modeling and existence. $f_0(x) = x_1^2 + 12x_1x_2 + 2x_2^2$, $x \in \mathbb{R}^2$, $f_1(x) = 4x_1^2 + x_2^2 - 25 = 0$. Global extrema exist by Weierstrass: $f_0$ is continuous and the solution set of $f_1(x) = 0$ is an ellipse, so it is nonempty, closed and bounded.
2. Lagrange. The Lagrange function is $L = \lambda_0(x_1^2 + 12x_1x_2 + 2x_2^2) + \lambda_1(4x_1^2 + x_2^2 - 25)$. The optimality condition is $L_x = 0$, that is,
$\partial L/\partial x_1 = \lambda_0(2x_1 + 12x_2) + \lambda_1(8x_1) = 0$,
$\partial L/\partial x_2 = \lambda_0(12x_1 + 4x_2) + \lambda_1(2x_2) = 0$.
Exclusion of the bad case $\lambda_0 = 0$: if $\lambda_0 = 0$, then $\lambda_1 \ne 0$ and $x_1 = x_2 = 0$; this contradicts the constraint.
3. Analysis. Eliminate $\lambda_1$: $x_1x_2 + 6x_2^2 = 24x_1^2 + 8x_1x_2$. This can be rewritten as
$6\left(\frac{x_2}{x_1}\right)^2 - 7\left(\frac{x_2}{x_1}\right) - 24 = 0$,
as $x_1$ cannot be equal to zero (this would imply that $x_2 = 0$ and this contradicts the constraint). This gives $x_2 = \tfrac{8}{3}x_1$ or $x_2 = -\tfrac{3}{2}x_1$. In the first (resp. second) case we get, using the equality constraint, $x_1 = \pm\tfrac{3}{2}$ and so $x_2 = \pm 4$ (resp. $x_1 = \pm 2$ and so $x_2 = \mp 3$; we use the sign $\mp$ as $x_2$ has a different sign than $x_1$). Comparison of these four candidates:
$f_0(2, -3) = f_0(-2, 3) = -50$,
$f_0(\tfrac{3}{2}, 4) = f_0(-\tfrac{3}{2}, -4) = 106\tfrac{1}{4}$.
4. Conclusion. $(2, -3)$ and $(-2, 3)$ are global minima and $(\tfrac{3}{2}, 4)$ and $(-\tfrac{3}{2}, -4)$ are global maxima.
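The arithmetic of Example 7.8.3 can be reproduced exactly with rational numbers. The following Python sketch (variable and function names are illustrative, not from the text) solves the quadratic equation for the ratio $x_2/x_1$, recovers the four candidates from the constraint, checks the Lagrange equations with $\lambda_0 = 1$, and compares the values of $f_0$.

```python
from fractions import Fraction
from math import isqrt

def f0(x1, x2):
    """Objective of Example 7.8.3."""
    return x1**2 + 12 * x1 * x2 + 2 * x2**2

def f1(x1, x2):
    """Constraint function; the ellipse is f1 = 0."""
    return 4 * x1**2 + x2**2 - 25

# Roots t = x2/x1 of 6 t^2 - 7 t - 24 = 0 (the discriminant is a perfect square).
disc = 7 * 7 + 4 * 6 * 24
root = isqrt(disc)
assert root * root == disc
ratios = [Fraction(7 + root, 12), Fraction(7 - root, 12)]

candidates = []
for t in ratios:
    # Constraint with x2 = t*x1: (4 + t^2) x1^2 = 25.
    x1_sq = Fraction(25) / (4 + t * t)
    x1 = Fraction(isqrt(x1_sq.numerator), isqrt(x1_sq.denominator))
    assert x1 * x1 == x1_sq          # x1 is rational in both cases
    for sign in (1, -1):
        candidates.append((sign * x1, sign * t * x1))

for x1, x2 in candidates:
    assert f1(x1, x2) == 0
    # Lagrange equations with lambda0 = 1: solve the first for lambda1, check the second.
    lam1 = -(2 * x1 + 12 * x2) / (8 * x1)
    assert (12 * x1 + 4 * x2) + lam1 * (2 * x2) == 0

values = {c: f0(*c) for c in candidates}
for c, v in sorted(values.items(), key=lambda item: item[1]):
    print(tuple(str(z) for z in c), str(v))
```

The smallest and largest printed values correspond to the global minima and maxima stated in the conclusion of the example.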
7.9 Lagrange Multipliers (Convex Case)

Now we give the method of Lagrange multipliers for convex problems. The setup is the same as in Sect. 7.7. We write $S(y)$ for the optimal value of the problem $(P_y)$, for all $y \in Y$. Thus we get a function $S : Y \to [-\infty, +\infty]$, called the optimal value function.

Proposition 7.9.1 If the perturbation function $F$ is convex, then the optimal value function $S$ is convex.

The proof of this result shows how convenient the concept of strict epigraph can be (proving the result using the epigraph is not convenient).

Proof By definition, $\mathrm{epi}_s(S)$, the strict epigraph of $S$, is the image of $\mathrm{epi}_s(F)$, the strict epigraph of $F$, under the projection $X \times Y \times \mathbb{R} \to Y \times \mathbb{R}$ given by the recipe $(x, y, z) \mapsto (y, z)$. As $F$ is a convex function, $\mathrm{epi}_s(F)$ is a convex set. As convexity of sets is preserved under taking the image under a linear function, we get that $\mathrm{epi}_s(S)$ is a convex set and so that $S$ is a convex function.
Figure 7.6 illustrates the shadow price interpretation of a multiplier $\eta$. By the proposition above, if $S(0)$ is finite, there exists a closed halfspace in $Y \times \mathbb{R}$ that supports the epigraph of $S$ at the point $(0, S(0))$. Let $(\eta, -\eta_0)$ be an outward normal of this halfspace. If this halfspace is not vertical, that is, if $\eta_0 \ne 0$, then we can assume that $\eta_0 = 1$ by scaling. Then we have $S(y) \ge S(0) + \eta \cdot y$ for all $y \in Y$. That is, $\eta \in \partial S(0)$. Therefore, $\eta$ is a shadow price: it measures the sensitivity of the optimal value to changes in the data of our problem.

Fig. 7.6 Shadow price interpretation of multiplier $\eta$ (graph of $S$ against $y$, with a supporting line of slope $\eta$ at $y = 0$)

We need some definitions.

Definition 7.9.2 A selection of multipliers is an element $(\eta, \eta_0) \in (Y \times [0, +\infty)) \setminus \{0_{Y \times \mathbb{R}}\}$. In particular, it is not allowed that $\eta$ and $\eta_0$ are simultaneously zero: at least one of the two elements $\eta \in Y$ and $\eta_0 \in \mathbb{R}$ should be nonzero.

Definition 7.9.3 The Lagrange condition holds for a feasible solution $\hat{x}$ of $(P)$ and a selection of multipliers $(\eta, \eta_0)$ if the closed halfspace in $Y \times \mathbb{R}$ with $(0_Y, f(\hat{x}))$ on the boundary and with outward pointing normal $(\eta, -\eta_0)$ is a supporting halfspace for $\mathrm{epi}(S)$.

For the parallel to the smooth case, note that the Lagrange condition holds for $\hat{x}$ and some selection of multipliers $(\eta, \eta_0)$ iff the following proper inclusion holds:
$(X \times 0 \times 0) + T_{(\hat{x}, 0, f(\hat{x}))}(\mathrm{epi}(F)) \subsetneq X \times Y \times \mathbb{R}$.

Definition 7.9.4 A Slater point is a vector $\bar{x} \in X$ for which $0_Y$ is an interior point of $\mathrm{dom}(F(\bar{x}, \cdot))$. The Slater condition holds if there exists a Slater point.

Now we can state Lagrange’s method for convex problems.

Theorem 7.9.5 (Method of Lagrange Multipliers (Convex Case)) Let $X, Y$ be finite dimensional inner product spaces, $F$ a perturbation function $X \times Y \to (-\infty, +\infty]$ and $\hat{x} \in X$. Assume that $F$ is a proper closed convex function. Then the following statements hold true for the convex problem $(P)$ to minimize $f(x) = F(x, 0)$:
1. If the point $\hat{x}$ is an optimal solution of $(P)$, then the Lagrange condition holds for $\hat{x}$ and some selection of multipliers.
2. If the Lagrange condition holds for $\hat{x}$ and some selection of multipliers $(\eta, \eta_0)$ with $\eta_0 \ne 0$, then $\hat{x}$ is an optimal solution of $(P)$.
3. If the Slater condition holds, then the Lagrange condition cannot hold with $\eta_0 = 0$.

Note that, under the Slater condition, the convex method of Lagrange multipliers gives a criterion, that is, a necessary and sufficient condition, for optimality.
Proof
1. If $\hat{x}$ is an optimal solution of $(P)$, then $f(\hat{x}) = S(0)$. Take a supporting halfspace to $\mathrm{epi}(S)$ at the point $(0, S(0))$. Let $(\eta, -\eta_0)$ be its outward normal. Then the Lagrange condition holds for $\hat{x}$ and $(\eta, \eta_0)$, by definition of the Lagrange condition.
2. If the Lagrange condition holds for $\hat{x}$ and $(\eta, \eta_0)$ with $\eta_0 \ne 0$, then the halfspace involved is not vertical and hence $f(\hat{x})$ must be equal to $S(0)$.
3. If the Slater condition holds, then in some neighborhood of $0_Y$ the optimal value function does not take the value $+\infty$. Hence, a supporting halfspace to $\mathrm{epi}(S)$ at the point $(0, S(0))$ cannot be vertical.
Remark 7.9.6 The Lagrange multiplier method in the convex case is equivalent to the following proper inclusion:
$(X \times 0_Y \times 0) + T_{(\hat{x}, 0, f(\hat{x}))}(\mathrm{epi}(F)) \subsetneq X \times Y \times \mathbb{R}$.
Compare this reformulation for Lagrange’s multiplier method in the convex case to the reformulations for Fermat’s theorem in the smooth and convex case and the reformulation for Lagrange’s multiplier method in the smooth case, given in Remarks 7.4.3, 7.6.2, and 7.8.2 respectively.

At first sight, it might not be clear how the method of Lagrange for the convex case can be used to solve optimization problems: the Lagrange condition is formulated in terms of the optimal value function $S$, which is a priori unknown. However, if we replace $S$ by its definition in terms of the function $F$, we get a repeated minimization: first over $x \in X$ and then over $y \in Y$. Changing the order of this repeated minimization brings the Lagrange condition into a form that is convenient to use. Now we give the outcome of this reformulation of the Lagrange condition. We need a definition, a notation for the minimization over $y \in Y$ in the reversed order.

Definition 7.9.7 The Lagrange function $L : X \times ((Y \times [0, +\infty)) \setminus \{0_{Y \times \mathbb{R}}\}) \to [-\infty, +\infty]$ is given by the recipe
$L(x, \eta, \eta_0) = \inf_{y \in Y}\,(\eta_0 F(x, y) - \langle \eta, y \rangle)$,
where $y$ runs over all elements of $Y$ for which $F(x, y) \in \mathbb{R}$. We write $L(x, \eta) = L(x, \eta, 1)$.

The Lagrange function has the following property.

Proposition 7.9.8 The Lagrange function $L(x, \eta, \eta_0)$ is convex in $x \in X$.

The proof of this result is similar to the proof of Proposition 7.9.1. Now we can give the promised user-friendly reformulation of the Lagrange condition.

Proposition 7.9.9 The Lagrange condition for $\hat{x} \in X$ and a selection of Lagrange multipliers $(\eta, \eta_0)$ is equivalent to the conditions $f(\hat{x}) = L(\hat{x}, \eta, \eta_0)$ and $0 \in \partial L(\hat{x}, \eta, \eta_0)$.

Expressed in terms of the perturbation function $F$, the Lagrange condition for the point $\hat{x}$ and the selection of multipliers $(\eta, \eta_0)$ states that the epigraph of $F$ is contained in a closed halfspace $\langle \eta, y \rangle \le \eta_0\,(z - f(\hat{x}))$ for some $\eta \in Y$, $\eta_0 \in \mathbb{R}_+$, not both zero. If moreover $\eta_0 \ne 0$, then we can take $\eta_0 = 1$ and then we get $(0_X, \eta) \in \partial F(\hat{x}, 0_Y)$.

Figure 7.7 illustrates Lagrange’s multiplier method for the convex case, for the formulation of the Lagrange condition in terms of $F$. Here $F$ is chosen such that its graph is a paraboloid. So the problems $(P_y)$, and in particular $(P)$, are problems to find the lowest point of a valley parabola. For $(P)$ this parabola is drawn and its lowest point is at $\hat{x}$. The Slater condition is satisfied: each point $\bar{x}$ is a Slater point as the effective domain of $F$ is the entire space $X \times Y = \mathbb{R}^2$. The tangent plane to the paraboloid is drawn at the point above $(\hat{x}, 0)$. Its slope in the $x$-direction is 0 by the minimality property of $\hat{x}$. Here we see that the convex method of Lagrange multipliers is just the convex Fermat theorem in disguise! Its slope in the $y$-direction is denoted by $\eta$. As this tangent plane supports the paraboloid at the point above $(\hat{x}, 0)$, we have, by the definition of the subdifferential, that the first one of the two equivalent conditions in the theorem holds: $(0, \eta) \in \partial F(\hat{x}, 0)$. In fact, Fig. 7.7 also illustrates Lagrange’s multiplier method for the smooth case (to be more precise, for a more general version than given in Theorem 7.8.1: for a function $F$ that satisfies suitable smoothness conditions). Note here that if $F$ is both convex and smooth, as in Fig. 7.7, then linearization of $F$ by a supporting hyperplane (‘convex linearization’) is the same as linearization of $F$ by a tangent space (‘smooth linearization’).

Fig. 7.7 Convex method of Lagrange multipliers: $(0, \eta) \in \partial F(\hat{x}, 0)$ (the surface $z = F(x, y)$ with its tangent plane, of slope $\eta$ in the $y$-direction, at the point $(\hat{x}, 0, F(\hat{x}, 0))$)

The purpose of the following simple example is to illustrate all notions from this section and Lagrange’s multiplier method for the convex case.

Example 7.9.10 (Lagrange’s Multiplier Method for the Convex Case) Suppose you want to choose a location on a road that is close to a given number of locations on
this road; to be precise, the sum of the squares of the distances from this point to the given points has to be minimized. Consider the numerical example that there are three given locations, given by points on the line $\mathbb{R}$, two of which are precisely known, 3 and 5, and one is approximately known, 0. This can be modeled as follows:

$\min f(x) = x^2 + (x - 3)^2 + (x - 5)^2,\ x \in \mathbb{R}.$   $(P)$

This is a convex optimization problem. We embed this problem into the following family of optimization problems:

$\min f_y(x) = (x - y)^2 + (x - 3)^2 + (x - 5)^2,\ x \in \mathbb{R}.$   $(P_y)$

Here $y$ models the position of the location that is approximately known. These problems $(P_y)$ are all convex optimization problems. It is straightforward to solve the problem $(P)$ and to determine the sensitivity of its optimal value to small changes in the location that is approximately known to be 0. The solution of $(P_y)$ is seen to be the average location $(3 + 5 + y)/3$; therefore, the solution of $(P)$ is $8/3$, and substitution of $x = (3 + 5 + y)/3$ in $f_y(x)$ and then taking the derivative at $y = 0$ gives $\eta = S'(0) = -16/3$. So now that we are in the comfortable possession of these answers, we can use this numerical problem to get familiar with the convex method of Lagrange and all concepts involved.

The family of problems $(P_y)_y$ can be described by one perturbation function, $F(x, y) = (x - y)^2 + (x - 3)^2 + (x - 5)^2$ for all $x, y \in \mathbb{R}$. This is a proper closed convex function. Note that the Slater condition holds: each point $\bar{x} \in \mathbb{R}$ is a Slater point as $F(x, y) < +\infty$ for all $x, y \in \mathbb{R}$. The Lagrange function is
$L(x, \eta) = \inf_y\,[(x - y)^2 + (x - 3)^2 + (x - 5)^2 - \eta y]$.
The expression that has to be minimized is convex. Carry out this minimization by putting the derivative with respect to $y$ equal to zero: $2(y - x) - \eta = 0$. So $y = x + \tfrac{1}{2}\eta$. Substitution gives the following formula for the Lagrange function,
$L(x, \eta) = -\tfrac{1}{4}\eta^2 - x\eta + (x - 3)^2 + (x - 5)^2$.
This function is convex in $x$ for each $\eta$. The convex method of Lagrange multipliers gives that $\hat{x}$ is a solution of the given problem $(P)$ iff there exists $\eta \in \mathbb{R}$ such that $f(\hat{x}) = L(\hat{x}, \eta)$ and $L_x(\hat{x}, \eta) = 0$. Here we have used that, as $L(x, \eta)$ is convex in $x$, the condition $L(\hat{x}, \eta) = \min_x L(x, \eta)$ is equivalent to the condition $L_x(\hat{x}, \eta) = 0$. So we have, writing $x$ instead of $\hat{x}$ for simplicity:
$x^2 + (x - 3)^2 + (x - 5)^2 = -\tfrac{1}{4}\eta^2 - x\eta + (x - 3)^2 + (x - 5)^2$,
$-\eta + 2(x - 3) + 2(x - 5) = 0$.
The second equation gives $\eta = 4x - 16$. Substitution in the first equation and simplifying gives $9x^2 - 48x + 64 = 0$, that is, $(3x - 8)^2 = 0$, so $x = 8/3$, and so $\eta = 4 \cdot \tfrac{8}{3} - 16 = -16/3$. This is in agreement with the outcomes above.
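The two conditions $f(\hat{x}) = L(\hat{x}, \eta)$ and $L_x(\hat{x}, \eta) = 0$ of Example 7.9.10 can be confirmed with exact arithmetic. The Python sketch below is an illustration only (the names `F`, `L`, `L_x` and the sample grid are choices made here, not taken from the text); it also spot-checks that the closed formula for $L$ really is the infimum over $y$.

```python
from fractions import Fraction

def F(x, y):
    """Perturbation function of Example 7.9.10."""
    return (x - y) ** 2 + (x - 3) ** 2 + (x - 5) ** 2

def L(x, eta):
    """Closed formula for the Lagrange function L(x, eta) derived in the example."""
    return -eta ** 2 / 4 - x * eta + (x - 3) ** 2 + (x - 5) ** 2

def L_x(x, eta):
    """Derivative of L with respect to x."""
    return -eta + 2 * (x - 3) + 2 * (x - 5)

x_hat = Fraction(8, 3)
eta = Fraction(-16, 3)

# The two Lagrange conditions at (x_hat, eta).
condition_value = F(x_hat, 0) == L(x_hat, eta)
condition_stationary = L_x(x_hat, eta) == 0
print(condition_value, condition_stationary)

# Spot check of the inner minimization: F(x, y) - eta*y >= L(x, eta) for sampled y,
# with equality at y = x + eta/2.
samples = [Fraction(n, 4) for n in range(-40, 41)]
inner_ok = all(F(x_hat, y) - eta * y >= L(x_hat, eta) for y in samples)
y_star = x_hat + eta / 2
inner_tight = F(x_hat, y_star) - eta * y_star == L(x_hat, eta)
print(inner_ok, inner_tight)
```

Both conditions hold at $\hat{x} = 8/3$, $\eta = -16/3$, in agreement with the values obtained by solving the problem directly.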
7.10 *Generalized Optimal Solutions Always Exist

The aim of this section is to explain by means of examples that for any given convex optimization problem, an optimal solution in a suitable generalized sense always exists.

Example 7.10.1 (Generalized Optimal Solution for a Convex Optimization Problem) Figure 7.8 illustrates the existence of optimal solutions using the top-view model, for a better insight in the existence question.

Fig. 7.8 Existence of a generalized optimal solution in the top-view model (six panels, numbered 1 to 6)
The illustrations suggest that an optimal solution in a suitable generalized sense always exists. The top-view model suggests how to define this generalized optimal solution, as we will see. Now we will explain Fig. 7.8. We include for convenience an explanation of the top-view model, which was already presented in Chap. 1.

The top-view model works as follows for a convex function $f$ of one variable. Recall that the points of the epigraph of $f$ can be modeled as rays of the convex cone $c(f)$, by homogenization. The rays of $\mathrm{cl}(c(f))$, the closure of $c(f)$, correspond bijectively to points on the standard upper hemisphere in space $\mathbb{R}^3$, by normalization (dividing nonzero vectors by their length). Now we make a top-view model: we look at the hemisphere from above; then we see a closed disk. More formally, the points on the hemisphere correspond bijectively to the points of the standard closed unit disk in the plane $\mathbb{R}^2$, by orthogonal projection $(x, y, z) \mapsto (x, y)$ of the hemisphere on the disk. So, all in all, we can model the epigraph of $f$ as a subset of the standard unit disk in the plane. The effect of taking the closure of this subset of the disk is that certain points on the boundary of the disk (this boundary is the standard unit circle) will be added. In particular, the point $(0, 1)$ will be added: this represents the fact that $(0, 1)$ is a recession vector of the epigraph of each function. The points that are added on or below the horizontal coordinate axis of the plane correspond bijectively to the rays of $R_f$, the recession cone of $f$.

Here are details of the six cases in Fig. 7.8. In each case, you see on the left hand side the epigraph of $f$ together with a horizontal line at the level of the optimal value $r$, provided this is finite. On the right hand side, you see the closure of this epigraph in the ball model and this horizontal line. Note that here we also have such a line if the optimal value is $-\infty$: the lower half of the circle.
We define a generalized solution as the points given by the common points of the graph of $f$ and the horizontal line at the level of the optimal value of $f$.
1. Here $f$ has a unique minimum $\hat{x}$. We have assumed wlog that $\hat{x} = 0$ and $f(\hat{x}) = 0$.
2. Here $f$ has an unbounded set of minima, but $f$ is not constant. We have assumed wlog that the optimal value is 0 and that the solution set is $[0, +\infty)$. In the top-view model, you see that there is a unique generalized optimal solution which is not an ordinary optimal solution, represented by the point $(1, 0)$. This point corresponds to the fact that the recession cone of $f$ consists of the ray generated by the vector $(1, 0)$.
3. Here the optimal value is finite, but there is no optimal solution. We have assumed wlog that the optimal value is 0, and that $\lim_{x \to +\infty} f(x) = 0$. In the top-view model, you see again that there is a unique generalized optimal solution, represented by the point $(1, 0)$. This point corresponds to the fact that the recession cone of $f$ consists of the ray generated by the vector $(1, 0)$. Note that the curve drawn inside the disk intersects the circle in the point $(1, 0)$ at a right angle.
4. Here the optimal value is $-\infty$ and $f$ tends to $-\infty$ with less than linear speed. We have assumed wlog that $\lim_{x \to +\infty} f(x) = -\infty$. In the top-view model, you see again that there is a unique generalized optimal solution, represented by the point $(1, 0)$. This point corresponds to the fact that the recession cone of $f$ consists of the ray generated by the vector $(1, 0)$. Note that the curve drawn inside the circle is tangent to the circle in the point $(1, 0)$.
5. Here, $f$ has a downward sloping asymptote for $x \to +\infty$. In the top-view model, you see that the set of generalized solutions is a circular arc: one endpoint is $(1, 0)$ and the other endpoint corresponds to the slope of the asymptote mentioned above. Note that the curve drawn inside the circle is not tangent to the circle in the ‘other’ endpoint.
6. Here, $f$ is unbounded below for $x \to +\infty$ but it has no asymptote. In the top-view model, you see that the set of generalized solutions is again a circular arc: one endpoint is $(1, 0)$ and the other endpoint corresponds to the arrow on the left hand side, which is a steepest recession vector of the epigraph of $f$. Note that the curve drawn inside the circle is tangent to the circle in the ‘other’ endpoint.

That there exists an optimal solution in a generalized sense, for each of the six problems in Fig. 7.8, is no coincidence. It is not hard to define this solution concept in general, in the same spirit, and then it follows from the compactness of the ball that there always exists an optimal solution in this generalized sense. All existence results for ordinary optimal solutions above can be derived from this general result.

Conclusion For each convex optimization problem, there exists a solution in a natural generalized sense. This gives existence of ordinary solutions under assumptions on recession vectors.
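The way points of a graph approach the boundary circle in the top-view model can be explored numerically. The sketch below is a rough illustration under explicit assumptions: it takes the homogenization of a point $(x, z)$ of the plane to be the ray through $(x, z, 1)$, so that normalization followed by the projection $(x, y, z) \mapsto (x, y)$ sends it to $(x, z)/\sqrt{x^2 + z^2 + 1}$; the three sample functions are hypothetical stand-ins for cases 3, 4 and 5, not the functions drawn in Fig. 7.8.

```python
import math

def top_view(x, z):
    """Image in the closed unit disk of the point (x, z) of the plane.

    Assumption: homogenize to (x, z, 1), normalize to the upper hemisphere,
    then drop the last coordinate.
    """
    n = math.sqrt(x * x + z * z + 1.0)
    return (x / n, z / n)

# Hypothetical convex sample functions on x >= 0.
cases = {
    "case 3 (finite value, not attained)": lambda x: math.exp(-x),
    "case 4 (sublinear decrease)": lambda x: -math.sqrt(x),
    "case 5 (asymptote of slope -1)": lambda x: -x + 1.0 / (x + 1.0),
}

# Follow the graph of each function far to the right and record where it lands.
limits = {}
for name, f in cases.items():
    x = 1e12
    limits[name] = top_view(x, f(x))
    print(name, limits[name])
```

In the first two cases the images approach the point $(1, 0)$, and in the third they approach the point on the circle in the direction $(1, -1)$, which matches the description of the endpoints of the set of generalized solutions in cases 3, 4 and 5.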
7.11 Advantages of Convex Optimization

The aim of this section is to list advantages of working with convex optimization problems.
1. All local minima are global minima.
2. The optimality conditions are criteria for global minimality: they are sufficient as well as necessary conditions for optimality. That is, they hold if and only if the point under investigation is a minimum. In particular, note the following advantages:
(a) No need to prove existence of an optimal solution if you can find a solution of the optimality conditions and are satisfied with just one optimal solution. However, if the analysis of the optimality conditions leads only to equations which characterize the optimal solutions and not to a closed formula, then it is essential that you check independently the existence of an optimal solution to ensure that these equations have a solution.
(b) The following ‘educated guess’ method is available: often the optimality conditions lead to the distinction of many cases depending on which constraints are binding; sometimes you have a hunch which case is promising, and then it is not necessary to consider all the other cases. 3. Checking existence of an optimal solution is easier than in general, using recession cones. Moreover, if a problem has no optimal solution, then it is often possible to prove this, again using recession cones. 4. The set of optimal solutions is a convex set. 5. For uniqueness, the principle of monotonicity boils down to conditions of strict convexity (note that a differentiable function of one variable is strictly convex iff its derivative is strictly increasing): a strictly convex function can have at most one minimum and a nonconstant linear function on a strictly convex set has at most one minimum. 6. Determining the sensitivity amounts to computing the subdifferential of a certain auxiliary function, the optimal value function, at a point, and this computation is often possible. 7. Last but not least, the class of convex optimization problems corresponds more or less to the class of problems that can be solved numerically in a way that can be proven to be efficient. However, optimization algorithms fall outside the scope of this book.
7.12 Exercises 1. Show that the optimization problem in Nash’s theorem that characterizes the fair bargain for a bargaining opportunity, Theorem 1.1.1, is equivalent to the following optimization problem min(− ln f (x)), x ∈ F, x > v and show that this problem is convex. 2. Show that a point x ∈ X is an optimal solution for the convex optimization problem to minimize a convex function f : X → R iff f ( x ) = v ∈ R, where v is the optimal value of the problem. 3. Show that if the problem to minimize a given convex function f : X → R has a solution, then f must be proper. 4. Show that the set of optimal solutions of a convex problem is a convex set. 5. Give a precise proof of Theorem 1.1.1 on Nash bargaining. A sketch of the proof has been given in Chap. 1. 6. Give examples of convex optimization problems having no optimal solutions in each of the following cases:
(a) the optimal value is +∞, (b) the optimal value is −∞ and the objective function is proper, (c) the optimal value is a real number. 7. What is the hidden strict convexity in Theorem 7.2.7? 8. Show that each convex optimization problem can be reformulated as a conic problem, by using homogenization. 9. Show that the local minima of the three problems that are given in Example 7.3.1 are not smooth. 10. Prove Theorem 7.4.1 by a direct argument (not using the concept of tangent space). 11. Let a line l in the plane R² be given and suppose that you can move with speed v1 on one side of l and with speed v2 on the other side of l. Solve the problem of the fastest route between two given points on different sides of l by means of a law. That is, prove that there exists an optimal solution, that it is unique, and rewrite the equation given by Fermat’s theorem in some attractive way. 12. Solve the problem about the waist of three lines in space by finding a law that characterizes the optimal solution. 13. Show that Theorem 7.6.1 is just a tautology: its proof is an immediate consequence of the definitions of the subdifferential and of an optimal solution. 14. Let $f_i$, 0 ≤ i ≤ m, be proper closed convex functions on X. Define $F(x, y) = f_0(x)$ if $f_i(x) + y_i \le 0$, 1 ≤ i ≤ m, and $F(x, y) = +\infty$ otherwise. Show that F is a proper closed convex function. 15. Solve each one of the optimization problems in Example 7.7.1 in two ways: using Lagrange’s multiplier method for smooth problems and using Lagrange’s multiplier method for convex problems. 16. *Show that the function f and, more generally, the functions $f_y$, y ∈ Y, in Sect. 7.9 are always closed but not necessarily proper. 17. Prove that $L(x, \eta) = -F^{*(2)}(x, \eta)$ for all x ∈ X, η ∈ Y, where the superscript ∗(2) denotes the operator of taking the conjugate function with respect to the second variable of F, which is y. 18. Derive Theorem 7.6.1 from Theorem 7.9.5.
Hint: take any finite dimensional inner product space Y , for example the zero space, and define F (x, y) = f (x) for all x ∈ X, y ∈ Y . 19. Try to prove Proposition 7.9.1 by using that a function is convex iff its epigraph is convex. Note that this proof is not so straightforward as the proof in terms of the strict epigraph. 20. *Show that the optimal value function need not be proper and need not be closed. 21. Show that the Slater condition together with the condition S(0) ∈ R imply that S is proper, that 0Y ∈ ri(dom(S)) and so that ∂S(0) is nonempty. 22. *Give the details of the proof of Theorem 7.9.5. 23. Prove Proposition 7.9.8. 24. Prove Proposition 7.9.9.
25. Give an example of a convex optimization problem, a feasible point $\hat{x}$, and a perturbation, such that the Lagrange condition holds for some selection of multipliers $(\eta, \eta_0)$ with $\eta_0 = 0$, but $\hat{x}$ is not an optimal solution. 26. Formulate a concept of generalized solution for a convex optimization problem and prove that every convex optimization problem has a generalized solution. Hint. Work with the top-view model and use that every closed subset of a ball is compact. 27. Give an example of a twice differentiable strictly convex function f for which the Hessian is not positive definite everywhere. 28. Prove Proposition 7.2.7. 29. Solve the problem $|x - 2| + \max\{|3x + 3|,\ e^x + 2\} \to \min$. 30. Solve the problem $|x + y - 2| + |3x - 2y - 1| + x^2 - x + \tfrac{1}{2} y^2 + e^{x - 3y} \to \min$. 31. For all possible values of the parameters a, b > 0, solve the problem $\sqrt{x^2 + y^2} + 2\sqrt{(x - a)^2 + (y - b)^2} \to \min$. 32. Two villages are located on the same side of a straight highway (a straight line). The first one is at a distance of 1 km from the highway and the second one is at a distance of 6 km from the highway and of 13 km from the first village. Where should the gasoline station be put so that the sum of its distances to the two villages and to the highway is minimal? 33. Formulate and prove an analogue of the Fermat-Torricelli-Steiner theorem for four points in three-dimensional space. 34. For all possible values of the parameter α > 0, solve the problem $\max\{|2x + 1|,\ x + y\} + \alpha\sqrt{x^2 + y^2 - 4x + 4} \to \min$. 35. Let $a \in \mathbb{R}^n$ be a given nonnegative vector. Reduce the following problem to a convex problem:
$$x_1 \cdots x_n \to \max, \qquad \langle a, x \rangle \le 1, \qquad x_i \ge 0,\ i = 1, \ldots, n.$$
36. Let $a \in \mathbb{R}^n$ be a given nonnegative vector. Reduce the following problem to a linear programming problem:
$$\langle a, x \rangle \to \min, \qquad \sum_{i=1}^n i\,|x_i| \le 1, \qquad \sum_{i=1}^n x_i = 0.$$
Try to make the new problem have a not very large number of constraints (the smaller the better).
Chapter 8
Optimality Conditions: Reformulations
Abstract
• Why. In applications, many different forms of the conditions for optimality for convex optimization are used: duality theory, the Karush–Kuhn–Tucker (KKT) conditions, the minimax and saddle point theorem, Fenchel duality. Therefore, it is important to know what they are and how they are related.
• What.
– Duality theory. To a convex optimization problem, one can often associate a concave optimization problem with a completely different variable vector of optimization, but with the same optimal value. This is called the dual problem.
– KKT. This is the most popular form for the optimality conditions. We also present a reformulation of KKT in terms of subdifferentials, which is easier to work with.
– Minimax and saddle point. When two parties optimize against each other, then this sometimes leads to an equilibrium, where both have no incentive to change the current situation. This equilibrium can be described, in formal language, by a saddle point, that is, by vectors $\hat{x} \in \mathbb{R}^n$ and $\hat{y} \in \mathbb{R}^m$ for which, for a suitable function F(x, y), one has that $F(\hat{x}, \hat{y})$ equals both $\min_x \max_y F(x, y)$ and $\max_y \min_x F(x, y)$.
– Fenchel duality. We do not describe this result in this abstract.
Road Map
• Figure 8.1 and Definition 8.1.2 (the dual problem and its geometric intuition).
• Proposition 8.1.3 (explicit description of the dual problem).
• Figure 8.2, Theorem 8.1.5 (duality theory in one picture).
• Definitions 8.2.1–8.2.4, Theorem 8.2.5, numerical example (convex programming problem, Lagrange function, Slater condition, KKT theorem).
• The idea of Definition 8.2.8 (KKT in the context of a perturbation of a problem).
• Theorem 8.3.1, numerical example (KKT in subdifferential form).
• Section 8.4 (minimax, maximin, saddle point, minimax theorem of von Neumann).
• Section 8.5 (Fenchel duality).
© Springer Nature Switzerland AG 2020 J. Brinkhuis, Convex Analysis for Optimization, Graduate Texts in Operations Research, https://doi.org/10.1007/9783030418045_8
8.1 Duality Theory
The aim of this section is to present duality theory. We consider again the setup for the optimality conditions in terms of perturbations of data. Let X, Y be finite dimensional inner product spaces and let F : X × Y → (−∞, +∞] be a proper closed convex function. The problem $\min_x f(x)$ (P), where f(x) = F(x, 0), is called in the present context the primal problem (P). Now we define the dual problem. We recall the notation $S(y) = \inf_x F(x, y)$, the optimal value of the problem $(P_y)$ to minimize F(x, y) as a function of x, for all y ∈ Y, and that the function S is convex.

Example 8.1.1 (Dual Problem) The idea of the dual problem is to get the best possible lower bound of a certain type for the optimal value S(0) of the primal problem (P). Figure 8.1 illustrates this. Let η ∈ Y be given. Take an affine (= linear plus constant) function on Y with slope η for which the graph intersects the graph of S. Then move this graph in the vertical direction downward until the last moment that it has a point in common with the graph of S. Equivalently, take an affine function on Y with slope η for which the graph lies completely under the graph of S and then move it upward until it ‘bounces’ against the graph of S. Then consider the intercept with the vertical axis. This intercept is by definition at height precisely ϕ(η). It follows from the convexity of S that ϕ(η) is a lower bound for S(0): S(0) ≥ ϕ(η) for all η ∈ Y. This concludes the explanation that the dual problem is the problem of finding the best lower bound of a certain type for the optimal value of (P).

Now we give a formal definition of the dual problem.

Definition 8.1.2 The dual problem (D) is the problem to maximize the function $\varphi : Y \to \overline{\mathbb{R}}$ given by $\varphi = -S^*$.

Note that ϕ is concave (that is, minus this function is convex). Note that the dual problem depends on the chosen family of perturbed problems $(P_y)$, y ∈ Y (that is, on F) and not only on the primal problem (that is, not only on f). However, often there is for a given convex optimization problem a unique natural perturbation $(P_y)$, y ∈ Y, that is, a unique natural F.

Fig. 8.1 Dual problem (the graph of S, a line with slope η lying under it, and the height ϕ(η) of its intercept with the vertical axis above 0)

The meaning of the dual problem is that it is the problem to find the best possible lower bound of a certain type for the optimal value of (P). Indeed, if you work out the definition of ϕ, then you get the equivalent description given in Example 8.1.1 and illustrated in Fig. 8.1. At first sight, this dual problem might not look very useful. What we know is F, and not S, let alone $S^*$. Fortunately, one can express the dual problem in terms of the Lagrange function L(x, η) as follows.

Proposition 8.1.3 $\varphi(\eta) = \inf_x L(x, \eta)$ for all η ∈ Y.
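Proof We have $\varphi = -S^*$, $S^*(\eta) = \sup_y \left(\langle \eta, y \rangle - S(y)\right)$ for all η ∈ Y, and $S(y) = \inf_{x \in X} F(x, y)$. So in all we get
$$\varphi(\eta) = \inf_y \inf_x \left[F(x, y) - \langle \eta, y \rangle\right].$$
Now comes the crux of the proof. We interchange the two infimum operations. This gives
$$\varphi(\eta) = \inf_x \inf_y \left[F(x, y) - \langle \eta, y \rangle\right].$$
By definition of the Lagrange function, $L(x, \eta) = \inf_y \left[F(x, y) - \langle \eta, y \rangle\right]$, this gives
$$\varphi(\eta) = \inf_x L(x, \eta) \quad \forall \eta \in Y,$$
as desired.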
This result is useful: experience has shown that often one can calculate, for a given function F, the Lagrange function L and then the function ϕ. The point is that, for the function F(x, y) − η · y, minimizing first with respect to x and then with respect to y is often more complicated than doing these minimizations in the other order. Therefore you can proceed as follows: choose an element η and compute ϕ(η) by using Proposition 8.1.3; this gives something valuable, an explicit lower bound for the value of the primal problem: S(0) ≥ ϕ(η).

Example 8.1.4 (Duality Theory in One Picture) Our findings are essentially the full duality theory. Duality theory can be seen in one simple picture, Fig. 8.2, which shows that ϕ(η) is maximal if it equals S(0), and then η is a subgradient of S at 0, so it is a measure of sensitivity for the optimal value.
Fig. 8.2 Duality theory for convex optimization in one picture (the graph of S, the line with slope $\hat{\eta} = S'(0)$ touching it above 0, and the height $\varphi(\hat{\eta})$)
Now we give a precise description of duality theory.

Theorem 8.1.5 Let F : X × Y → (−∞, +∞] be a proper closed convex function. Let the primal problem (P), the objective function f of (P), the dual problem (D), the objective function ϕ of (D), the optimal value function S and the Lagrange function L be defined in terms of F as above.
1. Formula for the objective function of the dual problem. $\varphi(\eta) = \inf_x L(x, \eta)$ for all η ∈ Y.
2. Weak duality. For each x ∈ X and η ∈ Y one has the inequality f(x) ≥ ϕ(η).
3. Strong duality. Let $\hat{x} \in X$, $\hat{\eta} \in Y$. Then the problem (P) has optimal solution $\hat{x}$, the problem (D) has optimal solution $\hat{\eta}$, and the optimal values of (P) and (D) are equal iff $f(\hat{x}) = \varphi(\hat{\eta})$.
4. Sensitivity to perturbations. If S(0) ∈ R and $\hat{\eta}$ is an optimal solution of (D), then $\hat{\eta} \in \partial S(0)$ and so $S(y) \ge S(0) + \hat{\eta} \cdot y$ for all y ∈ Y.

Strong duality need not always hold: if (P) and (D) have a different optimal value, then one says that there is a duality gap.

Example 8.1.6 (Duality Theory) Consider the problem to minimize $x_1^2 + 2x_1x_2 + 3x_2^2$ under the constraint $x_1 + 2x_2 \ge 1$. We perturb this problem by replacing the constraint by $x_1 + 2x_2 \ge 1 + y$, so that y = 0 gives back the original problem. Then, one finds that the Lagrange function L(x, η) equals $x_1^2 + 2x_1x_2 + 3x_2^2 + \eta(1 - x_1 - 2x_2)$,
provided η ≥ 0; otherwise, it equals −∞. So the set of η for which L(x, η) is a proper function of x is [0, +∞). Putting the partial derivatives of L with respect to $x_1$ and $x_2$ equal to zero, that is, $2x_1 + 2x_2 - \eta = 0$ and $2x_1 + 6x_2 - 2\eta = 0$, and solving gives $x_1(\eta) = (1/4)\eta$ and $x_2(\eta) = (1/4)\eta$. This is a point of minimum for L(x, η) as a function of x, as this function is convex, by the second order condition. Therefore, $\varphi(\eta) = L(x_1(\eta), x_2(\eta), \eta) = -(3/8)\eta^2 + \eta$. This function has a stationary point $\hat{\eta} = 4/3$. So $\hat{x} = x(4/3) = (1/3, 1/3)$. As $f(\hat{x}) = 2/3 = \varphi(\hat{\eta})$, it follows that $\hat{x} = (1/3, 1/3)$ is a solution of the given problem. Observe that it lies on the boundary of the feasible set: the constraint is binding.

For a coordinate-free treatment of duality theory for LP problems, see Sect. 9.8. The oldest pair of primal dual problems was published in 1755 in the journal Ladies Diary ([1]). See Sect. 9.5 for a description of this pair.

There is complete symmetry between primal and dual problem: the dual problem can be embedded in a family of perturbed problems in a natural way such that if we take its associated dual problem, we are back to square one: the primal problem.

Proposition 8.1.7 (Duality Scheme) Let F : X × Y → (−∞, +∞] be a proper closed convex function. Then one has the following scheme:
$$\begin{array}{llll}
(P) & \min_x f(x) & \max_\eta \varphi(\eta) & (D) \\
(P_y) & \min_x F(x, y) & \max_\eta \Phi(\xi, \eta) & (D_\xi)
\end{array}$$
where $f(x) = F(x, 0)$, $\Phi(\xi, \eta) = -F^*(\xi, \eta)$ and $\varphi(\eta) = \Phi(0, \eta)$.
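The computations of Example 8.1.6 can be checked with exact rational arithmetic. The following Python sketch is only an illustration: the names `f`, `lagrangian`, `stationarity` and `phi` are chosen here and do not come from the text, and `phi` assumes the minimizer $x_1 = x_2 = \eta/4$ found in the example rather than minimizing numerically.

```python
from fractions import Fraction

def f(x1, x2):
    """Primal objective of Example 8.1.6."""
    return x1**2 + 2*x1*x2 + 3*x2**2

def lagrangian(x1, x2, eta):
    """Lagrange function of Example 8.1.6, valid for eta >= 0."""
    return f(x1, x2) + eta * (1 - x1 - 2*x2)

def stationarity(x1, x2, eta):
    """Partial derivatives of the Lagrange function with respect to x1 and x2."""
    return (2*x1 + 2*x2 - eta, 2*x1 + 6*x2 - 2*eta)

def phi(eta):
    """Dual objective: the Lagrange function at its minimizer x1 = x2 = eta/4."""
    x = eta / 4
    return lagrangian(x, x, eta)

eta_hat = Fraction(4, 3)                  # stationary point of phi
x_hat = (eta_hat / 4, eta_hat / 4)        # corresponding primal point
primal_value = f(*x_hat)
dual_value = phi(eta_hat)
constraint_value = x_hat[0] + 2 * x_hat[1]

# Weak duality on a grid of multipliers eta = 0, 1/10, ..., 4.
weak_duality_holds = all(phi(Fraction(k, 10)) <= primal_value for k in range(41))

print(x_hat, primal_value, dual_value, constraint_value, weak_duality_holds)
```

Equality of `primal_value` and `dual_value` is exactly the criterion of strong duality in Theorem 8.1.5, and `constraint_value` equal to 1 confirms that the constraint is binding.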
8.2 Karush–Kuhn–Tucker Theorem: Traditional Version
The aim of this section is to present the Karush–Kuhn–Tucker (KKT) conditions in the traditional version.

Definition 8.2.1 A convex programming problem is a convex optimization problem of the form
$$\min f_0(x) \quad \text{s.t.} \quad f_i(x) \le 0, \quad 1 \le i \le m,$$
where $f_0, \ldots, f_m$ are proper closed convex functions on $X = \mathbb{R}^n$.

We define for this type of problem the Lagrange function and the Slater condition. Later you will see that these definitions agree with the ones given for the formulation of the optimality conditions in terms of F, for a suitable choice of F.

Definition 8.2.2 The Lagrange function L(x, λ) on $X \times \mathbb{R}^m_+$ for the convex programming problem is given by
$$L(x, \lambda) = f_0(x) + \lambda_1 f_1(x) + \cdots + \lambda_m f_m(x)$$
for all x ∈ X and all $\lambda \in \mathbb{R}^m_+$, where $\lambda = (\lambda_1, \ldots, \lambda_m)$, that is, for all $\lambda_1, \ldots, \lambda_m \in \mathbb{R}_+$.

Definition 8.2.3 A Slater point of the convex programming problem is a feasible point $\bar{x}$ for which all constraints hold strictly: $f_1(\bar{x}) < 0, \ldots, f_m(\bar{x}) < 0$. The Slater condition holds for a convex programming problem if there exists a Slater point.

Now we define the optimality conditions for the convex programming problem.

Definition 8.2.4 The Karush–Kuhn–Tucker (KKT) conditions hold for a feasible point $\hat{x}$ of the convex programming problem and for the selection of multipliers $\lambda_0, \ldots, \lambda_m \in \mathbb{R}_+$, not all zero, if the function x → L(x, λ) has a minimum in $x = \hat{x}$ and if moreover $\lambda_i f_i(\hat{x}) = 0$ for all i ∈ {1, . . . , m}.

Theorem 8.2.5 (Karush–Kuhn–Tucker Theorem for a Convex Programming Problem) Consider a convex programming problem
$$\min f_0(x) \quad \text{s.t.} \quad f_i(x) \le 0, \quad 1 \le i \le m.$$
Assume that the Slater condition holds. Let $\hat{x}$ be a feasible point of the problem. Then $\hat{x}$ is an optimal solution of the problem iff the KKT conditions hold for $\hat{x}$ and a suitable selection of multipliers $\lambda \in \mathbb{R}^m_+$.

Note again that we get the ideal situation: a criterion for optimality for a feasible point $\hat{x}$, as the KKT conditions are necessary and sufficient for optimality.

Example 8.2.6 (KKT Example with One Constraint) Solve the shortest distance problem for the point (1, 2) and the half-plane $x_1 + 3x_2 \le 1$.
The easiest way to solve this simple problem is geometrically: if you draw a figure, then you see that you have to determine the orthogonal projection of the point (1, 2) on the line $x_1 + 3x_2 = 1$ and to compute its distance to the point (1, 2). The purpose of the analytical solution below is to illustrate the use of the KKT conditions. We will see that the analysis leads to the distinction of two cases; working out these cases is routine but tedious, and the details will not be displayed below. We will assume that we have a hunch which of the two cases leads to the optimal solution. If this case leads to a point that satisfies the KKT conditions, then it is optimal, as the KKT conditions are not only necessary but also sufficient. So then we do not have to investigate the other case. This possibility of making use of a hunch is an advantage of the KKT conditions. We emphasize here that the analysis of an optimization problem with equality constraints by means of the Lagrange equations also often leads to a distinction of cases, but then all cases have to be analyzed, even if the first case that is considered leads to a point that satisfies the Lagrange equations.
1. Modeling and convexity.
$$\min_{x_1, x_2} f(x) = (x_1 - 1)^2 + (x_2 - 2)^2 \quad \text{subject to} \quad g(x) = x_1 + 3x_2 - 1 \le 0. \qquad (P)$$
This problem is convex, as f is readily seen from its Hessian to be strictly convex and g is affine. So there is at most one optimal solution. The Slater condition holds: (0, 0) is a Slater point as g(0, 0) < 0.
2. Criterion. The KKT conditions are: there exists η ≥ 0 for which
$$2(x_1 - 1) + \eta = 0, \quad 2(x_2 - 2) + 3\eta = 0, \quad \eta(x_1 + 3x_2 - 1) = 0.$$
3. Analysis. Suppose you have a hunch that in the optimum the constraint is binding, that is, $x_1 + 3x_2 - 1 = 0$. Then the KKT conditions boil down to:
$$2(x_1 - 1) + \eta = 0, \quad 2(x_2 - 2) + 3\eta = 0, \quad x_1 + 3x_2 - 1 = 0, \quad \eta \ge 0.$$
Now you solve these three linear equations in three unknowns. This gives $x_1 = \tfrac{2}{5}$, $x_2 = \tfrac{1}{5}$, $\eta = \tfrac{6}{5}$. Then you check that this solution satisfies the condition η ≥ 0.
4. Conclusion. The given problem has a unique solution $\hat{x} = (\tfrac{2}{5}, \tfrac{1}{5})$.
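The elimination in step 3 of Example 8.2.6 can be reproduced exactly. The sketch below is an illustration only: the helper name `kkt_residuals` is chosen here, and the elimination order (solve the two stationarity equations for $x_1$, $x_2$ in terms of η, then substitute in the binding constraint) is one possible route, not necessarily the one the author had in mind.

```python
from fractions import Fraction

def kkt_residuals(x1, x2, eta):
    """Left-hand sides of the three KKT equations of Example 8.2.6."""
    return (2*(x1 - 1) + eta,
            2*(x2 - 2) + 3*eta,
            eta * (x1 + 3*x2 - 1))

# Stationarity gives x1 = 1 - eta/2 and x2 = 2 - 3*eta/2.
# Substituting in x1 + 3*x2 = 1 gives 7 - 5*eta = 1.
eta = Fraction(7 - 1, 5)
x1 = 1 - eta / 2
x2 = 2 - 3 * eta / 2

residuals = kkt_residuals(x1, x2, eta)
squared_distance = (x1 - 1)**2 + (x2 - 2)**2

print(x1, x2, eta, residuals, squared_distance)
```

The squared distance agrees with the geometric solution: the distance from (1, 2) to the line $x_1 + 3x_2 = 1$ is $|1 + 6 - 1|/\sqrt{10}$, whose square is 18/5.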
Example 8.2.7 (KKT Problem with Two Constraints) Solve the shortest distance problem for the point (2, 3) and the set of points that have at most distance 2 to the origin and that have moreover first coordinate at least 1. Again, the easiest way to solve this problem is geometrically. A precise drawing shows that one should determine the positive scalar multiple of (2, 3) that has length 2. Then one should take the distance of this point to (2, 3). The purpose of the analytical solution below is to illustrate the KKT conditions for a problem with several inequality constraints. For a problem with m inequality constraints, $2^m$ cases have to be distinguished. So in this example the number of cases is 4. Again we assume that we have a hunch. Again, as working out a case is routine but tedious, the details will not be displayed below.
1. Modeling and convexity.
$$\min_{x = (x_1, x_2)} (x_1 - 2)^2 + (x_2 - 3)^2, \quad x_1^2 + x_2^2 - 4 \le 0, \quad 1 - x_1 \le 0.$$
This is a convex optimization problem. By the strict convexity of $(x_1 - 2)^2 + (x_2 - 3)^2$, there is at most one optimal solution. The Slater condition holds: for example, (3/2, 0) is a Slater point. We do not yet establish existence of an optimal solution as we hope that we can solve the KKT conditions.
2. Criterion. The Lagrange function is
$$L(x, \eta, \eta_0) = \eta_0\left((x_1 - 2)^2 + (x_2 - 3)^2\right) + \eta_1(x_1^2 + x_2^2 - 4) + \eta_2(1 - x_1).$$
The KKT conditions, taking $\eta_0 = 1$, are:
$$\frac{\partial L}{\partial x_1} = 2(x_1 - 2) + 2\eta_1 x_1 - \eta_2 = 0, \quad \frac{\partial L}{\partial x_2} = 2(x_2 - 3) + 2\eta_1 x_2 = 0,$$
$$\eta_1, \eta_2 \ge 0, \quad \eta_1(x_1^2 + x_2^2 - 4) = 0, \quad \eta_2(1 - x_1) = 0.$$
3. Analysis. Suppose we have a hunch that in the optimum only the first constraint is binding. We start by considering this case (there are four cases). That is, $x_1^2 + x_2^2 = 4$, $1 - x_1 \ne 0$. Then the KKT conditions lead, by some tedious arguments, to the solution $x_1 = 4/\sqrt{13}$, $x_2 = 6/\sqrt{13}$ and $\eta_1 = (\sqrt{13} - 2)/2$, $\eta_2 = 0$. By the sufficiency of the KKT conditions, it follows that this is the unique solution.
So we do not have to consider the other three cases; our educated guess turned out to be right.
4. Conclusion. The problem has a unique solution, $\hat{x} = (4/\sqrt{13}, 6/\sqrt{13})$.

Now we are going to indicate how Theorem 8.2.5 can be derived from the convex method of Lagrange multipliers. First, we embed the convex programming problem in a family of perturbed problems.

Definition 8.2.8 For each $y = (y_1, \ldots, y_m) \in \mathbb{R}^m$ the standard perturbed problem $(P_y)$ of a convex programming problem is the optimization problem
$$\min f_0(x) \quad \text{s.t.} \quad f_i(x) + y_i \le 0, \quad 1 \le i \le m.$$

It is easy to give a convex function $F : X \times Y \to \overline{\mathbb{R}}$, where $X = \mathbb{R}^n$, $Y = \mathbb{R}^m$, such that for each y ∈ Y the problem $(P_y)$ is equivalent to the problem $\min f_y(x) = F(x, y)$, x ∈ X. Indeed, define $F(x, y) = f_0(x)$ if $f_i(x) + y_i \le 0$, 1 ≤ i ≤ m, and $F(x, y) = +\infty$ otherwise. This function satisfies the requirements. Then we apply the convex method of Lagrange multipliers to this function F. It can be verified that this gives Theorem 8.2.5.

See Sect. 9.6 for an upbeat application of KKT in economics, the celebrated second welfare theorem, which basically expresses that every desirable allocation of goods can be achieved by a suitable price mechanism, provided one corrects the starting position, for example by means of education in the case of the labor market. And, last but not least, see the proof of Minkowski’s theorem on polytopes in Chap. 9.
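The ‘tedious arguments’ of Example 8.2.7 are not displayed in the text, but the stated outcome can at least be verified numerically. The following sketch is an illustration under that limited goal: it plugs the claimed point and multipliers into the KKT conditions in floating point. The helper names and the tolerance `1e-12` are choices made here.

```python
import math

def constraints(x1, x2):
    """Values of the two inequality constraints (feasible iff both <= 0)."""
    return (x1**2 + x2**2 - 4, 1 - x1)

def stationarity(x1, x2, eta1, eta2):
    """Partial derivatives of the Lagrange function (with eta0 = 1)."""
    return (2*(x1 - 2) + 2*eta1*x1 - eta2,
            2*(x2 - 3) + 2*eta1*x2)

r = math.sqrt(13)
x1, x2 = 4 / r, 6 / r            # claimed optimal point
eta1, eta2 = (r - 2) / 2, 0.0    # claimed multipliers

g1, g2 = constraints(x1, x2)
s1, s2 = stationarity(x1, x2, eta1, eta2)
tol = 1e-12

kkt_ok = (abs(s1) < tol and abs(s2) < tol      # stationarity
          and eta1 >= 0 and eta2 >= 0          # nonnegative multipliers
          and g1 <= tol and g2 <= 0            # feasibility
          and abs(eta1 * g1) < tol             # complementary slackness
          and eta2 * g2 == 0)

distance = math.hypot(x1 - 2, x2 - 3)
print(kkt_ok, distance)
```

The computed distance equals $\sqrt{13} - 2$, the length of (2, 3) minus the radius 2, in agreement with the geometric solution.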
8.3 KKT in Subdifferential Form
The aim of this section is to reformulate the optimality conditions as the Karush–Kuhn–Tucker (KKT) conditions in subdifferential form. Vladimir Protasov has discovered a formulation of the KKT conditions in terms of subdifferentials. This is very convenient in concrete examples. We only display the main statement.

Theorem 8.3.1 (Karush–Kuhn–Tucker Conditions for a Convex Programming Problem in Subdifferential Form) Assume there exists a Slater point for a given convex programming problem. Let $\hat{x}$ be a feasible point. Then $\hat{x}$ is a solution iff
$$0_n \in \mathrm{co}\left(\partial f_0(\hat{x}),\ \partial f_k(\hat{x}),\ k \in K\right),$$
where $K = \{k \in \{1, \ldots, m\} \mid f_k(\hat{x}) = 0\}$ is the set of constraints that are active in $\hat{x}$.
To illustrate the use of the subdifferential version of the KKT conditions, we first consider the two run-of-the-mill examples that were solved in the previous section by the traditional version of the KKT conditions. We have seen that working with the traditional version leads to calculations that are routine but tedious. Now we will see that working with the subdifferential version leads to more pleasant activities: drawing a picture and figuring out how the insight that the picture provides can be translated into analytical conditions. Therefore, the subdifferential version of KKT is an attractive alternative.

Example 8.3.2 (Subdifferential Version of KKT for a Shortest Distance Problem)
1. Modeling and convexity. $\min f_0(x) = (x_1 - 1)^2 + (x_2 - 2)^2$, $x \in \mathbb{R}^2$, $f_1(x) = x_1 + 3x_2 - 1 \le 0$. This problem is convex as $f_0$ and $f_1$ are convex functions. The function $f_0$ is even strictly convex, so there is at most one solution. The Slater condition holds: (0, 0) is a Slater point as $f_1(0, 0) < 0$.
2. Criterion (taking into account the hunch that the constraint is active in the optimal point $\hat{x}$). The origin (0, 0) is contained in the convex hull of the two points $(2(x_1 - 1), 2(x_2 - 2))$ and (1, 3), the gradients of $f_0$ and $f_1$.
3. Analysis. Drawing a picture reveals what the criterion means: the point $(2(x_1 - 1), 2(x_2 - 2))$ must lie on the line through (0, 0) and (1, 3), but on the opposite side of (0, 0) from (1, 3) (it is allowed that it is equal to the point (0, 0)). That is, $(2(x_1 - 1), 2(x_2 - 2))$ is a nonpositive scalar multiple of (1, 3). Translating this geometric insight into analytical conditions gives the equation $2(x_2 - 2) = 6(x_1 - 1)$ (here we use that two vectors (a, b) and (c, d) are linearly dependent iff ad = bc) and the inequality $2(x_1 - 1) \le 0$. The equation can be rewritten as $6x_1 - 2x_2 = 2$; the active constraint gives $x_1 + 3x_2 = 1$. We get the solution $(\tfrac{2}{5}, \tfrac{1}{5})$. Finally, we check that the inequality is satisfied: $2(\tfrac{2}{5} - 1) = -\tfrac{6}{5}$ is indeed nonpositive.
4. Conclusion. The point $(\tfrac{2}{5}, \tfrac{1}{5})$ is the unique optimal solution of the problem.

Example 8.3.3 (Subdifferential Version of KKT for a Shortest Distance Problem)
1. Modeling and convexity. $\min f_0(x) = (x_1 - 2)^2 + (x_2 - 3)^2$, $x \in \mathbb{R}^2$, $f_1(x) = x_1^2 + x_2^2 - 4 \le 0$, $f_2(x) = 1 - x_1 \le 0$. This is a convex problem as $f_0, f_1, f_2$ are convex functions. The function $f_0$ is even strictly convex, so there is at most one optimal solution.
2. Criterion (taking into account the hunch that the only active constraint is $x_1^2 + x_2^2 - 4 \le 0$). The origin (0, 0) is contained in the convex hull of the two points $(2(x_1 - 2), 2(x_2 - 3))$ and $(2x_1, 2x_2)$.
3. Analysis. Translation of the criterion, which is a geometric condition, into analytical conditions gives the equation $(x_1 - 2)x_2 = (x_2 - 3)x_1$ and the inequality $2(x_1 - 2) \cdot 2x_1 \le 0$ (that is, $x_1$ and $x_1 - 2$ do not have the same sign). The equation can be
rewritten as $-2x_2 = -3x_1$. Substitution in the active constraint $x_1^2 + x_2^2 - 4 = 0$ gives $x_1^2 + \tfrac{9}{4}x_1^2 = 4$ and so $x_1^2 = \tfrac{16}{13}$. Thus we get, taking into account that $x_1$ and $x_1 - 2$ do not have the same sign, that $x_1 = \tfrac{4}{\sqrt{13}}$ and $x_2 = \tfrac{6}{\sqrt{13}}$.
4. Conclusion. The point $(\tfrac{4}{\sqrt{13}}, \tfrac{6}{\sqrt{13}})$ is the unique optimal solution.
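In Examples 8.3.2 and 8.3.3 the criterion of Theorem 8.3.1 reduces to the question whether the origin lies on the segment between two vectors in the plane. A minimal sketch of that test follows; the function name `origin_in_segment` and the tolerance parameter are choices made here, and the test is specific to two vectors in R² (it uses the criterion ad = bc for linear dependence mentioned in Example 8.3.2 together with a nonpositive inner product).

```python
from fractions import Fraction
import math

def origin_in_segment(u, v, tol=0):
    """True iff (0, 0) lies on the segment between the plane vectors u and v.

    This holds iff u and v are linearly dependent (zero cross product)
    and point in opposite directions or one of them is zero (dot <= 0).
    """
    cross = u[0] * v[1] - u[1] * v[0]
    dot = u[0] * v[0] + u[1] * v[1]
    return abs(cross) <= tol and dot <= tol

# Example 8.3.2: gradients of f0 and f1 at (2/5, 1/5), exact arithmetic.
x1, x2 = Fraction(2, 5), Fraction(1, 5)
in_hull_832 = origin_in_segment((2*(x1 - 1), 2*(x2 - 2)), (1, 3))

# Example 8.3.3: gradients of f0 and f1 at (4/sqrt(13), 6/sqrt(13)), floats.
z1, z2 = 4 / math.sqrt(13), 6 / math.sqrt(13)
in_hull_833 = origin_in_segment((2*(z1 - 2), 2*(z2 - 3)), (2*z1, 2*z2), tol=1e-12)

print(in_hull_832, in_hull_833)
```

For more than one active constraint, or in higher dimensions, membership of the origin in a convex hull is no longer a two-line test; it can be decided, for example, by solving a small linear program.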
Now we give a more interesting numerical example. It would be complicated to solve this example using the traditional version of the KKT conditions.

Example 8.3.4 (Subdifferential Version of KKT for a Problem Involving Complicated Functions) We consider the problem to minimize $y^4 - y - 4x$ subject to the constraints
$$\max\{x + 1, e^y\} + 2\sqrt{x^2 + y^2} \le 3y + 2\sqrt{5},$$
$$\sqrt{x^2 + y^2 - 4x - 2y + 5} + x^2 - 2y \le 2.$$
1. Modeling and convexity. $f(x, y) = y^4 - y - 4x \to \min$, $(x, y) \in (\mathbb{R}^2)^T$, $g(x, y) = \max\{x + 1, e^y\} + 2\|(x, y)\| - 3y - 2\sqrt{5} \le 0$, $h(x, y) = \|(x, y) - (2, 1)\| + x^2 - 2y - 2 \le 0$. This is seen to be a convex optimization problem and the objective function is strictly convex.
2. Criterion. The origin is contained in the convex hull of ∂f(x, y), ∂g(x, y) and ∂h(x, y).
3. Analysis. Let us begin by investigating the point of nondifferentiability (2, 1). At this point we have: (1) the subdifferential of f consists of the point $f'(2, 1) = (-4, 3)$, (2) the first constraint is active and the subdifferential of g consists of the point $g'(2, 1) = (1, 0) + 2\frac{(2, 1)}{\|(2, 1)\|} + (0, -3) = (1 + \tfrac{4}{5}\sqrt{5},\ -3 + \tfrac{2}{5}\sqrt{5})$, (3) the second constraint is active, and the subdifferential of h is D + (4, −2), where D denotes the unit disk. It is strongly recommended that you try to draw a picture. How to compute the convex hull of these subdifferentials? We can try to do this graphically by drawing a picture. However, is a picture reliable? Maybe not, but it gives us a valuable idea. We see that the origin lies more or less between the point (−4, 3), the subdifferential of f, and the disk with center (4, −2) and radius 1, which is the subdifferential of h. This suggests the following rigorous observation: the origin is the midpoint of the segment with endpoints (−4, 3) and (4, −3), and the latter point lies in the disk with center (4, −2) and radius 1. This shows that the origin lies in the convex hull of the subdifferentials (and the subdifferential of the objective function f is genuinely taking part). That is, the criterion of step 2 holds true and so (2, 1) is a point of minimum.
4. Conclusion. The problem has a unique solution, $(\hat{x}, \hat{y}) = (2, 1)$.
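The ‘rigorous observation’ in Example 8.3.4 is a short computation that can be replayed, together with a crude sanity check of the conclusion. The sketch below is illustrative: `f`, `g`, `h` transcribe the functions of step 1, while the grid (spacing 0.02 on [0, 3] × [0, 2]) and the feasibility tolerance `1e-9` are arbitrary choices made here. The grid search proves nothing; it only fails to contradict the claim.

```python
import math

def f(x, y):
    return y**4 - y - 4*x

def g(x, y):
    return max(x + 1, math.exp(y)) + 2*math.hypot(x, y) - 3*y - 2*math.sqrt(5)

def h(x, y):
    return math.hypot(x - 2, y - 1) + x**2 - 2*y - 2

grad_f = (-4.0, 3.0)     # f'(2, 1) = (-4, 4*1**3 - 1)
s = (4.0, -3.0)          # candidate element of the subdifferential of h at (2, 1)
center = (4.0, -2.0)     # the subdifferential of h at (2, 1) is the unit disk around this point

# The origin is the midpoint of grad_f and s, and s lies in the disk.
midpoint = ((grad_f[0] + s[0]) / 2, (grad_f[1] + s[1]) / 2)
distance_to_center = math.hypot(s[0] - center[0], s[1] - center[1])

# Crude check: smallest objective value over the feasible grid points.
tol = 1e-9
best = min((f(i / 50, j / 50), i / 50, j / 50)
           for i in range(151) for j in range(101)
           if g(i / 50, j / 50) <= tol and h(i / 50, j / 50) <= tol)

print(midpoint, distance_to_center, best)
```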
Finally, we consider the bargaining example of Alice and Bob that was presented in Sect. 1.1 and that was solved there. However, this time the problem will not be solved by ad hoc arguments, but by the KKT conditions, in subdifferential form.

Example 8.3.5 (Subdifferential KKT for a Bargaining Problem) One can make an educated guess about the solution of the bargaining problem of Alice and Bob: Bob should give his ball, which has utility 1 to him, to Alice, to whom it has the much higher utility 4; in return, Alice should give her bat, which has utility 1 to her, to Bob, to whom it has the higher utility 2; moreover, as Alice profits more from this exchange than Bob, she should give the box with some probability to Bob; this probability should be > 0 and [...] > 0, $2p_1 + 2p_2 - p_3 > 0$. Here we have used that maximizing f(x) is equivalent to minimizing $-\ln f(x) = -\ln x_1 - \ln x_2$, which is a convex function; moreover, we have expressed x in $p_1, p_2, p_3$.
2. Criterion (taking into account the hunch that the constraints that are active in the optimum $\hat{p}$ are $p_1 - 1 \le 0$, $p_3 - 1 \le 0$). The origin is contained in the convex hull of $\partial g(p) = \{g'(p)\}$, $\partial(p_1 - 1) = \{(1, 0, 0)\}$, $\partial(p_3 - 1) = \{(0, 0, 1)\}$. For geometric reasons, this means that
$$\frac{\partial g}{\partial p_1} \le 0, \quad \frac{\partial g}{\partial p_2} = 0, \quad \frac{\partial g}{\partial p_3} \le 0.$$
3. Analysis. The second condition is
$$-\frac{1}{-p_1 - 2p_2 + 4p_3}(-2) - \frac{1}{2p_1 + 2p_2 - p_3} \cdot 2 = 0.$$
Substitution of $p_1 = p_3 = 1$ gives
$$\frac{2}{3 - 2p_2} - \frac{2}{1 + 2p_2} = 0.$$
This can be rewritten as $3 - 2p_2 = 1 + 2p_2$, that is, as $4p_2 = 2$. This has the unique solution $p_2 = \tfrac{1}{2}$. It remains to check that $\hat{p} = (1, \tfrac{1}{2}, 1)$ satisfies the first and the third condition above. These conditions are
$$-\frac{1}{-p_1 - 2p_2 + 4p_3}(-1) - \frac{1}{2p_1 + 2p_2 - p_3} \cdot 2 \le 0,$$
$$-\frac{1}{-p_1 - 2p_2 + 4p_3} \cdot 4 - \frac{1}{2p_1 + 2p_2 - p_3}(-1) \le 0.$$
Now we substitute $p = (1, \tfrac{1}{2}, 1)$ in the left hand sides of these inequalities and check that we get a nonpositive outcome. For the first one, we get
$$\frac{1}{-1 - 1 + 4} - \frac{2}{2 + 1 - 1} = \frac{1}{2} - 1 < 0.$$
For the second one, we get
$$\frac{-4}{-1 - 1 + 4} + \frac{1}{2 + 1 - 1} = -2 + \frac{1}{2} < 0.$$
4. Conclusion. The hunch that it is optimal for Bob to give the ball to Alice, and for Alice to give the bat to Bob, is correct; moreover, it is then optimal that Alice gives the box to Bob with probability $\tfrac{1}{2}$.
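The check in step 3 of Example 8.3.5 can be carried out exactly, and one can also exhibit the convex combination whose existence the criterion asserts. In the sketch below, the objective $g(p) = -\ln(-p_1 - 2p_2 + 4p_3) - \ln(2p_1 + 2p_2 - p_3)$ is an assumption: it is read off from the partial derivatives displayed in the example. The names `gradient`, `a`, `b`, `c` are chosen here for illustration.

```python
from fractions import Fraction

def gradient(p1, p2, p3):
    """Gradient of g(p) = -ln(x1) - ln(x2) with
    x1 = -p1 - 2*p2 + 4*p3 and x2 = 2*p1 + 2*p2 - p3 (assumed form of g)."""
    x1 = -p1 - 2*p2 + 4*p3
    x2 = 2*p1 + 2*p2 - p3
    return (1/x1 - 2/x2, 2/x1 - 2/x2, -4/x1 + 1/x2)

p_hat = (Fraction(1), Fraction(1, 2), Fraction(1))
grad = gradient(*p_hat)

# Look for weights a, b, c >= 0 with a + b + c = 1 and
# a*grad + b*(1, 0, 0) + c*(0, 0, 1) = 0.
# The first and third coordinates give b = -a*grad[0], c = -a*grad[2].
a = 1 / (1 - grad[0] - grad[2])
b = -a * grad[0]
c = -a * grad[2]
weights = (a, b, c)
combination = (a*grad[0] + b, a*grad[1], a*grad[2] + c)

print(grad, weights, combination)
```

The weights come out nonnegative precisely because the first and third partial derivatives are nonpositive, which is the ‘geometric reason’ behind the three conditions of step 2.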
8.4 Minimax and Saddle Points
The aim of this section is to present von Neumann’s minimax theorem. Let a function $L : X \times Y \to \overline{\mathbb{R}}$ be given, to be called a saddle function.

Definition 8.4.1 The minimax of L is the following element of $\overline{\mathbb{R}}$:
$$\inf_{x \in X} \sup_{\eta \in Y} L(x, \eta),$$
and the maximin of L is the following element of $\overline{\mathbb{R}}$:
$$\sup_{\eta \in Y} \inf_{x \in X} L(x, \eta).$$
There is a relation between the minimax and the maximin that holds in general.

Proposition 8.4.2 (Minimax Inequality) The minimax is not smaller than the maximin:
\[ \inf_{x \in X} \sup_{\eta \in Y} L(x, \eta) \ge \sup_{\eta \in Y} \inf_{x \in X} L(x, \eta). \]
In order to formulate the proof, and for later purposes, it is useful to make the following definitions.

Definition 8.4.3 The following two optimization problems are associated to $L$. Write
\[ f(x) = \sup_{\eta \in Y} L(x, \eta) \quad \forall x \in X \]
and
\[ \varphi(\eta) = \inf_{x \in X} L(x, \eta) \quad \forall \eta \in Y, \]
and consider the primal problem $\min_x f(x)$ and the dual problem $\max_\eta \varphi(\eta)$.

Proof of Proposition 8.4.2 One has $f(x) \ge L(x, \eta) \ge \varphi(\eta)$ for all $x \in X$, $\eta \in Y$, and so
\[ \inf_{x \in X} \sup_{\eta \in Y} L(x, \eta) = \inf_{x \in X} f(x) \ge \sup_{\eta \in Y} \varphi(\eta) = \sup_{\eta \in Y} \inf_{x \in X} L(x, \eta). \qquad (*) \]
The converse inequality does not hold in general.

Definition 8.4.4 If equality holds in the minimax inequality, then the common value is called the saddle value of $L$.

Under suitable assumptions, the converse inequality holds as well and both outer optimizations have optimal solutions. In order to formulate these conclusions, the following concept is convenient.

Definition 8.4.5 A saddle point of $L$ is a pair $(\hat{x}, \hat{\eta}) \in X \times Y$ for which
\[ L(x, \hat{\eta}) \ge L(\hat{x}, \hat{\eta}) \ge L(\hat{x}, \eta) \]
for all $x \in X$, $\eta \in Y$.

The following result shows that the concept of saddle point is precisely what is needed for the present purpose.
Proposition 8.4.6 A pair $(\hat{x}, \hat{\eta}) \in X \times Y$ is a saddle point of $L$ iff $\hat{x}$ is an optimal solution of the primal problem, $\hat{\eta}$ is an optimal solution of the dual problem, and moreover $L$ has a saddle value.

Proof The definition of the saddle point can be written as the condition $f(\hat{x}) = L(\hat{x}, \hat{\eta}) = \varphi(\hat{\eta})$. The required equivalence follows immediately from $(*)$.
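For finite sets $X$ and $Y$ the notions above can be computed by brute force. The following minimal sketch stores $L$ as a table with rows indexed by $x$ and columns by $\eta$; the two tables are made-up illustrations, not taken from the text. The first one shows that the minimax inequality can be strict, and then there is no saddle point; the second one has a saddle value and a saddle point, in line with Proposition 8.4.6.

```python
def minimax(L):
    """inf over rows x of sup over columns eta of L[x][eta]."""
    return min(max(row) for row in L)

def maximin(L):
    """sup over columns eta of inf over rows x of L[x][eta]."""
    return max(min(row[j] for row in L) for j in range(len(L[0])))

def saddle_points(L):
    """All (x, eta) whose entry is minimal in its column and maximal in its row."""
    return [(i, j)
            for i, row in enumerate(L)
            for j, v in enumerate(row)
            if v == max(row) and v == min(r[j] for r in L)]

# Hypothetical table without a saddle value: minimax > maximin.
pennies = [[1, -1],
           [-1, 1]]

# Hypothetical table with a saddle value: minimax == maximin.
with_saddle = [[3, 1],
               [4, 2]]

for table in (pennies, with_saddle):
    print(minimax(table), maximin(table), saddle_points(table))
```

The test for a saddle point is a direct transcription of Definition 8.4.5: the entry must be the smallest in its column (first inequality) and the largest in its row (second inequality).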
Here we give the most celebrated result on minimax and saddle points. We need a definition.

Definition 8.4.7 A function $L : X \times Y \to \overline{\mathbb{R}}$ is called convex-concave if $L(\cdot, \eta)$ is convex for each $\eta \in Y$, and $L(x, \cdot)$ is concave for each $x \in X$.

Theorem 8.4.8 (Minimax Theorem of von Neumann) Let $A \subset X$ and $B \subset Y$ be compact convex sets. If $L : A \times B \to \mathbb{R}$ is a continuous convex-concave function, then
\[ \min_{a \in A} \max_{b \in B} L(a, b) = \max_{b \in B} \min_{a \in A} L(a, b). \]
In particular, $L$ has a saddle point.

The use of this result in game theory is illustrated by the example of taking a penalty kick in Sect. 9.4.

Fundamentally, searching for the KKT point of a convex programming problem is a problem of searching for the best (highest) lower bound on the optimal value of the primal problem. For a given vector of Lagrange multipliers λ, the minimum of the Lagrange function gives a lower bound on the optimal value of the convex programming problem. Then, one can look for the best possible lower bound by maximizing these lower bounds over λ. Therefore, one is in fact seeking to maximize a (concave) function of λ which is itself defined as an infimum of a (convex) function of the primal variables x. Such problems are known as saddle point problems and are of independent interest in fields that involve optimization of two adversarial forces. Examples are:
• robust optimization, where one seeks to optimize the primal decisions x assuming that at the same time nature picks worst-possible forms (from a given set) of the functions $f_i$,
• game theory, with classical two-player games as an example (von Neumann himself famously said “As far as I can see, there could be no theory of games . . . without that theorem I thought there was nothing worth publishing until the Minimax Theorem was proved.”),
• chemical engineering, where a balance point of a system is a result of several adversarial forces.
A careful reader might observe that if the function L(x, η) is concave in η, then the inner maximization problem is equivalent to a convex optimization problem, which can be dualized to obtain an inner convex minimization problem over some new decision vector, say λ. After such a dualization, one would have an outer minimization over x and an inner minimization over λ. Because both the inner and the outer problem would be minimization problems, and the resulting function could be jointly convex, they could be considered jointly as a single convex optimization problem. Hence, there seems to be no reason to consider a saddle point problem as a separate entity. Unfortunately, the dualization of the inner problem could increase the dimension of the problem significantly, which would outweigh the benefit of having a min problem instead of a min-max problem. For that reason, in many applications, for example in machine learning, one prefers to stay with the min-max form of a given problem and to solve it as such using so-called saddle-point algorithms.
8.5 Fenchel Duality Theory

The aim of this section is to reformulate the optimality conditions as Fenchel’s duality theory. In a previous section, the reader has been introduced to the ideas of Lagrangian duality. In the Lagrangian world, the dual problem is viewed through the perspective of adding the constraint functions to the objective function using the Lagrange multipliers. A minimum of such a function is then a lower bound on the optimal value of the original convex optimization problem, and the dual problem is about finding the best lower bound by manipulating the λ. Another view on duality is offered by Fenchel duality, which is fundamentally equivalent to Lagrange duality, but which sometimes makes it easier to determine the dual problem by using a simple transformation of functions called conjugation, $f \mapsto f^*$. The intuition behind this approach is the same as in the dual description of a convex cone: one wants to describe a convex function in a dual way, as a supremum of all linear functions that underestimate it.

We need the concept of conjugate function of a concave function (that is, a function for which minus the function is convex).

Definition 8.5.1 The concave conjugate $f_*$ of a concave function $f : A \to \mathbb{R}$, where the set $A \subset X$ is convex, is the function $f_* : X \to [-\infty, +\infty)$ given by the recipe
\[ f_*(\xi) = \inf_{x \in A} [\langle x, \xi \rangle - f(x)]. \]
Note that the function $f_*$ is concave.
We consider two proper convex functions $f$ and $-g$ on $X$, and we are going to relate the minimum of the convex function $f - g$ and the maximum of the concave function $g_* - f^*$. Note that the minimization of $f - g$ takes place over the convex set $\mathrm{dom}(f - g) = \mathrm{dom}(f) \cap \mathrm{dom}(-g)$, and that the maximization of $g_* - f^*$ takes place over the convex set $\mathrm{dom}(f^* - g_*) = \mathrm{dom}(f^*) \cap \mathrm{dom}(-g_*)$.

Theorem 8.5.2 (Fenchel Duality) Let $f$ and $-g$ be proper convex functions. Assume that the relative interiors of the effective domains of $f$ and $-g$ have a point in common, that is, $\mathrm{ri}(\mathrm{dom}(f)) \cap \mathrm{ri}(\mathrm{dom}(-g)) \ne \emptyset$. Then the equality
\[ \inf_{x \in X} (f(x) - g(x)) = \sup_{\xi \in X} (g_*(\xi) - f^*(\xi)) \]
holds, and the supremum in the right hand side is attained.
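A small worked instance may help to see Theorem 8.5.2 in action. The functions below are chosen here purely for illustration (they do not come from the text): take $X = \mathbb{R}$, $f(x) = \frac{1}{2}x^2$ and $g(x) = -\frac{1}{2}(x - 1)^2$, so that $f$ and $-g$ are finite convex functions and the assumption on the relative interiors holds trivially.

```latex
% Primal side: minimize f - g.
\[
  f(x) - g(x) = \tfrac12 x^2 + \tfrac12 (x-1)^2 ,
  \qquad
  \text{minimal at } x = \tfrac12 \text{ with value } \tfrac18 + \tfrac18 = \tfrac14 .
\]
% Convex conjugate of f and concave conjugate of g.
\[
  f^*(\xi) = \sup_x \bigl[ x\xi - \tfrac12 x^2 \bigr] = \tfrac12 \xi^2 ,
  \qquad
  g_*(\xi) = \inf_x \bigl[ x\xi + \tfrac12 (x-1)^2 \bigr] = \xi - \tfrac12 \xi^2
  \quad (\text{attained at } x = 1 - \xi).
\]
% Dual side: maximize g_* - f^*.
\[
  g_*(\xi) - f^*(\xi) = \xi - \xi^2 ,
  \qquad
  \text{maximal at } \xi = \tfrac12 \text{ with value } \tfrac12 - \tfrac14 = \tfrac14 .
\]
```

Both optimal values are equal to $\frac{1}{4}$, and the supremum on the dual side is attained, as the theorem predicts.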
8.6 Exercises

1. Show that the objective function $\varphi$ of the dual problem is concave (that is, minus this function is convex).
2. Write out the definition $\varphi = -S^*$ and conclude that this gives the description illustrated in Fig. 8.1 and described verbally just below this figure.
3. *Derive Theorem 8.1.5 from Lagrange’s principle in the convex case, and show that Theorem 8.1.5 is in fact equivalent to Lagrange’s principle in the convex case.
4. Show that the duality theorem is essentially equivalent to Lagrange’s principle in the convex case.
5. Illustrate all statements of Theorem 8.1.5 by means of Figs. 8.1 and 8.2.
6. **Check all statements implicit in Proposition 8.1.7. Give a precise formulation of the statement above that there is complete symmetry between the primal and dual problem and prove this statement.
7. Write down the KKT conditions for the optimization problem that characterizes the fair bargain for a bargaining opportunity that is given in Nash’s theorem 1.1.1, after first having rewritten it as a convex optimization problem, as $\min(-\ln f(x))$, $x \in F$, $x > v$.
8. Show that a convex programming problem is a convex optimization problem.
9. *Derive Theorem 8.2.5 from Lagrange’s principle in the convex case.
10. *Derive Theorem 8.2.5 directly from the convex version of the Fermat theorem, using rules for computing the subdifferential.
11. Formulate the Slater condition, the KKT conditions and the KKT theorem for a convex programming problem with inequality and equality constraints:
\[ \min f_0(x) \quad \text{s.t.} \quad f_i(x) \le 0, \ 1 \le i \le m', \qquad f_i(x) = 0, \ m' + 1 \le i \le m, \]
where $f_0, \ldots, f_{m'}$ are proper closed convex functions on $X$ and $f_{m'+1}, \ldots, f_m$ are affine functions on $X$. Derive this KKT theorem from Lagrange’s principle in the convex case.
12. Formulate the KKT conditions and the KKT theorem for a convex programming problem for which the Slater conditions do not necessarily hold. Derive this KKT theorem from Lagrange’s principle in the convex case.
13. Show that the problems considered in Examples 8.2.6 and 8.2.7 are convex optimization problems.
14. Show that the function $F : X \times Y \to (-\infty, +\infty]$ given by the recipe $F(x, y) = (x_1 - 1)^2 + (x_2 - 2)^2$ if $x_1 + 3x_2 - 1 + y \le 0$ and $F(x, y) = +\infty$ otherwise, which is the perturbation function for Example 8.2.6, is a proper closed convex function and that $F(x, 0) = f(x)$ for all $x \in \mathbb{R}^2$.
15. In the complete analysis of the problem considered in Examples 8.2.6 and 8.2.7, use is made of the fortunate assumption that you have a hunch which constraints are binding in the optimum. Which changes are necessary if you do not have such a hunch?
16. Write down the analysis of the problem of the waist of three lines in space considered in Sect. 7.2.3 in the style of the four step method.
17. Derive Theorem 8.3.1 from the principle of Fermat-Lagrange. Hint: use the subdifferential form of Lagrange’s principle in the convex case.
18. Derive the KKT conditions in subdifferential form from the formal duality theory for convex functions $F : X \times Y \to \mathbb{R}$.
19. *Show that the inequality in Proposition 8.4.2 can be strict.
20. *Derive the minimax theorem 8.4.8 from Lagrange’s principle in the convex case.
21. Show that the function $f_*$ is concave.
22. **Derive Theorem 8.5.2 from Lagrange’s principle in the convex case. Hints. Reformulate the problem of minimizing $f - g$ as follows:
\[ \min_{y, z} f(y) - g(z) \]
subject to $z = y$, $y \in \mathrm{dom}(f)$, $z \in \mathrm{dom}(g)$. Then take the dual problem with respect to a suitable perturbation and show that this dual problem is the problem to maximize $g_* - f^*$.
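23. Derive, conversely, the KKT theorem from Fenchel duality. Hint. Reformulate the convex programming problem as
\[ \inf_x \Big[ f(x) - \sum_{i=1}^{m} \big( -\delta_{f_i}(x) \big) \Big], \]
where $\delta_{f_i}$ is the indicator function of the set $\{x \mid f_i(x) \le 0\}$:
\[ \delta_{f_i}(x) = \begin{cases} 0 & \text{if } f_i(x) \le 0, \\ +\infty & \text{otherwise.} \end{cases} \]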
24. Solve the shortest distance problem for the point (1, 2) and the halfplane x1 + 3x2 ≤ 1. 25. *Solve the problem to minimize y 4 − y − 4x subject to the constraints √ max(x + 1, ey ) + 2 x 2 + y 2 ≤ 3y + 2 5 and
x 2 + y 2 − 4x − 2y + 5 + x 2 − 2y ≤ 0.
26. Solve the problem to maximize x + y − 7 subject to the constraint max(e + 3, y ) + x
2
x 2 + y 2 − 4y + 4 ≤ 4.
27. Find the minimal value a such that the system of inequalities has a solution: ⎧ x ⎪ ⎨ x + y − 2 + 2x− y + 2 − 7y ≤ a 4 2 x + 2y + x 2 + y 2 − 5 ≤a ⎪ ⎩ 3x 2 + y 2 − 2y + 1 + ex − 3y ≤ a 28. Solve the problem: ⎧ 2 ⎨ x − 4y → min 2x − y + y − x − 1 + y 2 ≤ 4 ⎩ 3 max 2x, y + e 2x−y ≤ 7 29. Solve the problem: ⎧ 2 ⎨ 2x − 3x + 5y → min x − 2y + 1 + (y + 1)2 ≤ 4 ⎩ 2 2 x + y 2 ≤ x + 2y
30. Solve the problem:
x− 2y + 5x 2 + y 2 − 3x → min 4 x 2 + y 2 + 6y + 9 ≤ 3x − 2y − 6
31. Use Theorem 1.1.1, the reformulation of the problem in this theorem as a convex problem min(− ln f (x)), x ∈ F, x > v and the KKT conditions to determine the fair bargain for two persons, Alice (who possesses a bat and a box) and Bob (who possesses a ball). The bat has utility 1 to Alice and utility 2 to Bob; the box has utility 2 to Alice and utility 2 to Bob; the ball has utility 4 to Alice and utility 1 to Bob. If they do not come to an agreement, then they both keep what they have. 32. *(Creative problem). Suggest a method of approximate solution of the linear regression model with the uniform metric: ⎧ ⎨ f (a) = maxk=1,...,N a, x (k) → min ⎩
x ∈ Rd ,
(a, a) = 1
where x (1) , . . . , x (N ) are given points (samples) from Rd .
Reference 1. T. M. Diary, Question CCCXCVI in Ladies Diary or Women’s Almanack, p. 47 (1755)
Chapter 9
Application to Convex Problems
Al freír de los huevos lo verá. You will see it when you fry the eggs. Miguel de Cervantes, Don Quixote (1615)
Abstract
• Why. The aim of this chapter is to teach by means of some additional examples the craft of making a complete analysis of a convex optimization problem. These examples will illustrate all theoretical concepts and results in this book. This phenomenon is in the spirit of the quote by Cervantes. Enjoy watching the frying of eggs in this chapter and then fry some eggs yourself!
• What. In this chapter, the following problems are solved completely; in brackets the technique that they illustrate is indicated.
– Least squares (convex Fermat).
– Generalized Fermat-Weber location problem (convex Fermat).
– RAS method (KKT).
– How to take a penalty (minimax and saddle point).
– Ladies Diary problem (duality theory).
– Second welfare theorem (KKT).
– Minkowski’s theorem on an enumeration of convex polytopes (KKT).
– Duality for LP (duality theory).
– Solving LP by taking a limit (the interior point algorithms are based on convex analysis).
© Springer Nature Switzerland AG 2020 J. Brinkhuis, Convex Analysis for Optimization, Graduate Texts in Operations Research, https://doi.org/10.1007/9783030418045_9

9.1 Least Squares

The aim of this section is to present a method that is often used in practice, the least squares method. Suppose you have many data points $(x_i, y_i)$ with $1 \le i \le N$. Suppose you are convinced that they should lie on a straight line $y = ax + b$ if not for some small errors. You wonder what are the best values of $a$ and $b$ to take. This can be viewed as the problem to find the best approximate solution for the following system of $N$ equations in the unknowns $a, b$:
\[ y_i = a x_i + b, \quad 1 \le i \le N. \]
More generally, often one wants to find the best approximate solution of a linear system of $m$ equations in $n$ unknowns $Ax = b$. Here $A$ is a given $m \times n$ matrix, $b$ is a given $m$-column and $x$ is a variable $n$-column. Usually the columns of $A$ are linearly independent. Then what is done in practice is the following: the system $Ax = b$ is replaced by the following auxiliary system
\[ A^\top A x = A^\top b. \]
It can be proved that this system has a unique solution. This is taken as approximate solution of $Ax = b$.

Now we give the justification of this method. It shows that, in some sense, this is the best one can do. For each $\bar{x} \in \mathbb{R}^n$, the Euclidean length of the error vector $e = A\bar{x} - b$ is a measure of the quality of $\bar{x}$ as approximate solution of the system $Ax = b$. The smaller this norm, the better the quality. This suggests to consider the problem to minimize the length of the error vector, $f(x) = \|Ax - b\|$. This is equivalent to minimizing its square,
\[ g(x) = \|Ax - b\|^2 = (Ax - b)^\top (Ax - b) = x^\top A^\top A x - 2 b^\top A x + \|b\|^2. \]
This quadratic function is strictly convex and coercive (that is, its value tends to $+\infty$ if $\|x\| \to +\infty$), as the matrix product $A^\top A$ is positive definite. Therefore it has a unique point of minimum. This is the solution of $g'(x) = 0$, that is, of $A^\top A x = A^\top b$. This justifies that the method described above is best, in some sense. This method is called the method of least squares, as by definition $\|Ax - b\|^2$ is a sum of squares and this sum is minimized.
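For the line fitting problem that opens this section, the auxiliary system is a system of two equations in the unknowns $a, b$ and can be solved by hand. The following is a minimal sketch; the function name `fit_line` and the sample data are made up for illustration. The matrix $A$ has rows $(x_i, 1)$, so $A^\top A$ has entries $\sum x_i^2$, $\sum x_i$, $\sum x_i$, $N$ and $A^\top y$ has entries $\sum x_i y_i$, $\sum y_i$.

```python
def fit_line(xs, ys):
    """Solve the 2x2 system A^T A (a, b) = A^T y for the line y = a*x + b."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx   # determinant of A^T A; nonzero iff the x_i are not all equal
    a = (n * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b

# Hypothetical data lying exactly on y = 2x + 1.
exact = fit_line([0, 1, 2, 3], [1, 3, 5, 7])

# Hypothetical data that do not lie on a line.
xs, ys = [0, 1, 2], [0, 1, 1]
a, b = fit_line(xs, ys)
residuals = [a * x + b - y for x, y in zip(xs, ys)]
print(exact, (a, b), residuals)
```

At the solution, the error vector is orthogonal to both columns of $A$; this is exactly what the equation $A^\top(Ax - b) = 0$ says, and it gives a quick sanity check on the computed residuals.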
9.2 Generalized Facility Location Problem

Now we apply the convex Fermat theorem to a generalization of the famous facility location problem of Fermat-Weber, which is to find a point that has minimal sum of distances to the vertices of a given triangle. The generalized problem is to find a point in $n$-dimensional space that has minimal sum of distances to a given finite set of points. This leads to a solution that is much shorter than all known geometric solutions of this problem. Consider the problem to minimize the function
\[ f(x) = \sum_{i=1}^{d} \|x - x_i\|. \]
Here $x_1, \ldots, x_d$ are $d$ given distinct points in $\mathbb{R}^n$ and $x$ runs over $\mathbb{R}^n$. If you take $n = 2$ and $d = 3$, then you get the problem of Fermat-Torricelli-Steiner. The function $f$ is continuous and coercive, so there exists a point of minimum $\hat{x}$. The function $f$ is convex, so such a point is characterized by the inclusion $0 \in \partial f(\hat{x})$. We write $f_i(x) = \|x - x_i\|$ for all $i$, and
\[ u_i(x) = \frac{x - x_i}{\|x - x_i\|} \]
for all $x \in X$ and all $i$ for which $x \ne x_i$; note that then $u_i(x)$ is a unit vector that has the same direction as $x - x_i$. We distinguish two cases:

1. The point $\hat{x}$ is not equal to any of the points $x_i$. In this case, each one of the functions $f_i$ is differentiable at $\hat{x}$ and $f_i'(\hat{x}) = u_i(\hat{x})$. Therefore, the point $\hat{x}$ is an optimal solution of the problem iff $f'(\hat{x}) = \sum_{i=1}^{d} u_i(\hat{x}) = 0$.
2. The point $\hat{x}$ is equal to $x_j$ for some $j$. Then $\partial f_j(\hat{x}) = B(0, 1)$, the closed ball with radius 1 and center the origin, and $\partial f_i(\hat{x}) = \{u_i(\hat{x})\}$ for all $i \ne j$, as all these functions $f_i$ are differentiable in the point $\hat{x}$. Now we apply the theorem of Moreau-Rockafellar. This gives that
\[ \partial f(x_j) = B(0, 1) + \sum_{i \ne j} u_i(x_j) = B\Big( \sum_{i \ne j} u_i(x_j), 1 \Big). \]
Therefore, $x_j$ is an optimal solution iff the closed ball with center $\sum_{i \ne j} u_i(x_j)$ and radius 1 contains the origin. This means that
\[ \Big\| \sum_{i \ne j} u_i(x_j) \Big\| \le 1. \]

Conclusion For $d$ given distinct points $x_1, \ldots, x_d \in \mathbb{R}^n$, there is either a point $\hat{x}$ that does not coincide with any of these points, and for which $\sum_{i=1}^{d} u_i(\hat{x}) = 0$, or among these points there is one, $x_j$, for which $\| \sum_{i \ne j} u_i(x_j) \| \le 1$, and then we put $\hat{x} = x_j$. In each case, the point $\hat{x}$ is the optimal solution of the given problem. If you take $n = 2$ and $d = 3$, you get the solution of the problem of Fermat-Torricelli-Steiner, which is called the problem of Fermat-Weber in location theory. This special case and the geometric description of its solution are considered in detail in Example 7.6.6.
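The criterion in the conclusion is easy to test numerically for a candidate point. The following minimal sketch only checks the optimality condition; it does not search for the optimal point. The function names, the tolerance `tol` and the two sample triangles are assumptions made for illustration.

```python
import math

def unit(x, xi):
    """u_i(x) = (x - x_i) / ||x - x_i||, defined for x != x_i."""
    diff = [a - b for a, b in zip(x, xi)]
    norm = math.sqrt(sum(c * c for c in diff))
    return [c / norm for c in diff]

def is_optimal(x, points, tol=1e-9):
    """Check 0 in the subdifferential of f(x) = sum_i ||x - x_i|| at the point x."""
    others = [p for p in points if list(p) != list(x)]
    total = [sum(cs) for cs in zip(*(unit(x, p) for p in others))]
    length = math.sqrt(sum(c * c for c in total))
    if len(others) == len(points):   # case 1: x differs from every x_i
        return length <= tol
    return length <= 1 + tol         # case 2: x equals some x_j

# Equilateral triangle: the centroid is optimal, a vertex is not.
equilateral = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
centroid = (0.5, math.sqrt(3) / 6)

# Triangle with a very obtuse angle at the origin: that vertex is optimal.
obtuse = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.1)]

print(is_optimal(centroid, equilateral), is_optimal((0.0, 0.0), equilateral))
print(is_optimal((0.0, 0.0), obtuse), is_optimal((1.0, 0.0), obtuse))
```

In case 1 the sum of unit vectors must vanish, which in floating point can only be tested up to a tolerance; in case 2 the test is the inequality $\| \sum_{i \ne j} u_i(x_j) \| \le 1$.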
9.3 Most Likely Matrix with Given Row and Column Sums

Suppose you want to know the entries of a positive square matrix, but you only know its row and column sums. Then you do not have enough information. Now suppose that you have reason to believe that the matrix is the most likely matrix with row and column sums as given. This is the case in a number of applications, such as the estimate or prediction of input-output matrices. To be more precise, suppose that these row and column sums are positive integers, and assume that all entries of the matrix you are looking for are positive integers. Assume moreover that the matrix has been made by making many consecutive independent random decisions to assign one unit to one of the entries, with equal probability for each entry. Then ask for the most likely outcome given that the row and column sums are as given. Taking the required matrix to be the most likely one in the sense just explained is called the RAS method.

Now we explain how to find this most likely matrix. Using Stirling’s approximation $\ln n! = n \ln n - n + O(\ln n)$, one gets the following formula for the logarithm of the probability that a given matrix $(T_{ij})$ is the outcome:
\[ C - \sum_{i,j} [T_{ij} \ln T_{ij} - T_{ij}], \]
where $C$ is a constant that does not depend on the choice of the matrix. Then the problem to find the most likely matrix with given row sums $R_i$ and column sums $C_j$ is approximated by the following optimization problem:
\[ \min f(T) = \sum_{i,j} [T_{ij} \ln T_{ij} - T_{ij}], \]
subject to
\[ \sum_{j} T_{ij} = R_i \ \forall i, \qquad \sum_{i} T_{ij} = C_j \ \forall j, \qquad T_{ij} > 0 \ \forall i, j. \]
Note that this is a convex optimization problem. The objective function is coercive (it tends to $+\infty$ if the max-norm $\|T\| = \max_{i,j} |T_{ij}|$ of the matrix tends to $+\infty$) and strictly convex, so there exists a unique optimal solution for the problem. Solving the Lagrange equations of this problem gives that this optimal solution is
\[ T_{ij} = \frac{R_i C_j}{S} \quad \forall i, j, \qquad \text{where } S = \sum_i R_i = \sum_j C_j. \]
The resulting matrix is given as an estimate for the desired matrix: it has positive entries and the desired row and column sums, but it does not necessarily have integral coefficients. The most likely matrix with given row and column sums is used to estimate input-output matrices for which only the row sums and column sums are known.
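The closed formula for the optimal solution is simple to evaluate. The following minimal sketch builds the matrix with entries $R_i C_j / S$ in exact arithmetic and confirms that it has the prescribed row and column sums; the function name and the sample sums are made up for illustration.

```python
from fractions import Fraction

def most_likely_matrix(R, C):
    """Matrix with entries R_i * C_j / S, where S is the common total of R and C."""
    S = sum(R)
    if S != sum(C):
        raise ValueError("row sums and column sums must have the same total")
    return [[Fraction(r * c, S) for c in C] for r in R]

# Hypothetical positive integer row sums and column sums with common total 10.
R, C = [3, 7], [4, 6]
T = most_likely_matrix(R, C)

row_sums = [sum(row) for row in T]
col_sums = [sum(col) for col in zip(*T)]
print(T, row_sums, col_sums)
```

As the text notes, the entries need not be integers: here they are 6/5, 9/5, 14/5 and 21/5.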
9.4 Minimax Theorem: Penalty Kick

The aim of this section is to illustrate minimax and the saddle point theorem. We consider the taking of a penalty kick, both from the point of view of the goalkeeper and from the point of view of the football player taking the penalty. For simplicity we assume that each one has only two options. The player can aim left or right (as seen from the direction of the keeper), the keeper can dive left or right. So their choices lead to four possibilities. For each one the probability is known that the keeper will stop the ball: these probabilities are stored in a $2 \times 2$ matrix $A$. The first and second rows represent the choice of the keeper to dive left and right respectively; the first and second columns represent the choice of the player to aim left and right respectively. Both know this matrix. It is easy to understand that it is in the interest of each one that his choice will be a surprise for the other. Therefore, they both choose randomly, with carefully determined probabilities. These probabilities they have to make known to the other. Which choices of probabilities should they make for given $A$?

The point of view of the keeper, who knows his adversary is smart, is to determine the following maximin (where we let $I = \{x = (x_1, x_2) \mid x_1, x_2 \ge 0, \ x_1 + x_2 = 1\}$):
\[ \max_{x \in I} \min_{y \in I} x^\top A y. \]
The point of view of the player, who also knows the keeper is smart, is to determine the following minimax:
\[ \min_{y \in I} \max_{x \in I} x^\top A y. \]
By von Neumann’s minimax theorem, these two values are equal, and there exists a choice for each of the two for which this value is realized and for which nobody has an incentive, after hearing the choice of his adversary, to change his own choice. This pair of choices is the saddle point of the function $x^\top A y$.
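For a $2 \times 2$ matrix the saddle point can be written down explicitly. The following minimal sketch assumes that the matrix has no saddle point in pure strategies (no entry that is both smallest in its row and largest in its column); under that assumption each side mixes so that the other is indifferent between his two options. The stopping probabilities in `A` are made up for illustration.

```python
from fractions import Fraction

def solve_2x2(A):
    """Mixed saddle point of x^T A y for a 2x2 matrix without a pure saddle point."""
    (a, b), (c, d) = A
    den = a - b - c + d
    x1 = (d - c) / den              # probability that the keeper dives left
    y1 = (d - b) / den              # probability that the player aims left
    value = (a * d - b * c) / den   # resulting probability that the ball is stopped
    return (x1, 1 - x1), (y1, 1 - y1), value

# Hypothetical stopping probabilities: rows = keeper left/right, columns = player left/right.
A = [[Fraction(6, 10), Fraction(1, 10)],
     [Fraction(2, 10), Fraction(5, 10)]]
x, y, value = solve_2x2(A)

# Against x, both pure choices of the player give the same stopping probability,
# and against y, both pure choices of the keeper do.
per_column = [x[0] * A[0][j] + x[1] * A[1][j] for j in range(2)]
per_row = [A[i][0] * y[0] + A[i][1] * y[1] for i in range(2)]
print(x, y, value, per_column, per_row)
```

Since $x^\top A y$ is linear in each argument separately, the inner minimum and maximum over $I$ are attained at pure choices, so the indifference shown by `per_column` and `per_row` confirms that the maximin and the minimax are both equal to `value`.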
9.5 Ladies Diary Problem The aim of this section is to describe the oldest example of the duality theorem for optimization problems. In 1755 Th. Moss posed the following problem in the journal Ladies Diary: In the three sides of an equilateral field stand three trees at the distance of 10, 12 and 16 chains from one another, to find the content of the field, it being the greatest.
Keep in mind that this was before the existence of scientific journals; it was not unusual to find scientific discoveries published for the first time in the Ladies Diary. Already in the beginning of the seventeenth century, Fermat posed at the end of a celebrated essay on finding minima and maxima by precalculus methods (the differential calculus was only discovered later, by Newton and Leibniz independently) the following challenge: Let he who does not approve of my method attempt the solution of the following problem: Given three points in the plane, find a fourth point such that the sum of its distances to the three given points is a minimum!
Fig. 9.1 The oldest pair of primal-dual optimization problems
These two optimization problems look unrelated. However, if you solve one, then you have the solution of the other one. The relation between their solutions is given in Fig. 9.1. Each one of these two problems is just the dual description of the other one.

Conclusion You have seen the historically first example of the remarkable phenomenon that seemingly unrelated problems, one a minimization and the other a maximization, are in fact essentially equivalent.
9.6 The Second Welfare Theorem

The aim of this section is to illustrate the claim that you should be interested in duality of convex sets, as this is the heart of convexity. We give the celebrated second welfare theorem. This gives a positive answer to the following intriguing question. Can price mechanisms achieve each reasonable allocation of goods without interference, provided one corrects the starting position, for example, by means of education in the case of the labor market?

We first present the problem in the form of a parable. A group of people is going to have lunch. Each one has brought some food. Initially, each person intends to eat his own food. Then one of them offers his neighbor an apple in exchange for her pear, which he prefers. She accepts, as she prefers an apple to a pear. So this exchange makes them both better off, because of the difference in their tastes. Now suppose that someone who knows the tastes of the people of the group comes up with a ‘socially desirable’ reallocation of all the food. He wants to realize this using a price system, in the following way. He assigns suitable prices to each sort of food, all the food is put on the table, and everyone gets a personal budget to choose his own favorite food. The budget will not necessarily be the value of the lunch that the person in question has brought to the lunch. The starting position has been corrected. Show that the prices and the budgets can be chosen in such a way that the self-interest of all individuals leads precisely to the planned socially desirable allocation, under natural assumptions.

Solution. The situation can be modeled as follows. We consider $n$ consumers $1, 2, \ldots, n$ and $k$ goods $1, \ldots, k$. A nonnegative vector in $\mathbb{R}^k$ is called a consumption vector. It represents a bundle of the $k$ goods. The consumers have initial endowments $\omega_1, \ldots, \omega_n$. These are consumption vectors representing the initial lunches. An allocation is a sequence of $n$ consumption vectors $x = (x_1, \ldots, x_n)$. The allocation is called admissible if
\[ \sum_{i=1}^{n} x_i \le \sum_{i=1}^{n} \omega_i. \]
That is, the allocation can be carried out given the initial endowments. The consumers have utility functions $u_1, \ldots, u_n$, defined on the set of consumption vectors. These represent the individual preferences of the consumers as follows. If $u_i(x_i) > u_i(x_i')$ then this means that consumer $i$ prefers consumption vector $x_i$ to $x_i'$. If $u_i(x_i) = u_i(x_i')$ then this means that consumer $i$ is indifferent between the consumption vectors $x_i$ and $x_i'$. The utility functions are assumed to be strictly quasiconcave (that is, for each $\gamma \ge 0$ the set $\{x \mid u_i(x) \ge \gamma\}$ is convex for each $i$) and strictly monotonic increasing in each variable. An admissible allocation is called Pareto efficient if there is no other admissible allocation that makes everyone better off. This is equivalent to the following more striking looking property: each allocation that makes one person better off, makes at least one person worse off.

A price vector is a nonnegative row vector $p = (p_1, \ldots, p_k)$. It represents a choice of prices of the $k$ goods. A budget vector is a nonnegative row vector $b = (b_1, \ldots, b_n)$. It represents a choice of budgets for all consumers. For each choice of price vector $p$, each choice of budget vector $b$ and each consumer $i$ we consider the problem
\[ \max u_i(x_i), \quad \text{subject to} \quad p \cdot x_i \le b_i, \qquad x_i \le \sum_{j=1}^{n} \omega_j, \qquad x_i \in \mathbb{R}^k_+. \]
This problem models the behavior of consumer $i$, who chooses the best consumption vector he can get for his budget. This problem has a unique point of maximum $d_i(p, b)$. This will be called the demand function of consumer $i$. A nonzero price vector $p$ and a budget vector $b$ for which
\[ \sum_{i=1}^{n} b_i = p \cdot \sum_{i=1}^{n} \omega_i, \]
that is, for which total budget equals total value of the initial endowments, are said to give a Walrasian equilibrium if the sequence of demands
\[ d(p, b) = (d_1(p, b), \ldots, d_n(p, b)) \]
is an admissible allocation, that is, if the utility maximizing behavior of all individuals does not lead to excess demand. Here are two positive results about the present model.

Theorem 9.6.1 (First Welfare Theorem) For each Walrasian equilibrium determined by price $p$ and budget $b$, the resulting admissible allocation $d(p, b)$ is Pareto efficient.

Theorem 9.6.2 (Second Welfare Theorem) Each Pareto efficient admissible allocation $x^*$ for which all consumers hold a positive amount of each good is a Walrasian equilibrium. To be more precise, there is a nonzero price vector $p$ which gives for the choice of budget vector $b = (p \cdot x_1^*, \ldots, p \cdot x_n^*)$ a Walrasian equilibrium with $d(p, b) = x^*$.

These theorems can be proved by means of the KKT conditions.
9.7 *Minkowski’s Theorem on Polytopes

Minkowski’s theorem gives a convenient description of all polytopes.

Theorem 9.7.1 (Minkowski’s Theorem on Polytopes) Let $n_i$, $1 \le i \le k$, be a sequence of different unit vectors in $\mathbb{R}^n$ and let $a_1, \ldots, a_k$ be a sequence of positive numbers, such that the equality
\[ \sum_{i=1}^{k} a_i n_i = 0 \]
holds. Then there exists a unique polytope with $k$ $(n-1)$-dimensional boundaries with outward normals $n_i$, $1 \le i \le k$, and with $(n-1)$-volumes $a_1, \ldots, a_k$. All polytopes in $\mathbb{R}^n$ are obtained in this way.

In three-dimensional space, the equality in the theorem has a physical meaning. If a polytope is filled with an ideal gas under pressure $p$, then the force that acts on the $i$th boundary is equal to $p a_i n_i$. As the total force of the gas on all boundaries is equal to zero (otherwise, this force would move the polytope and we would get a perpetuum mobile, which is impossible), we get $\sum_{i=1}^{k} p a_i n_i = 0$.

The theorem consists of three statements: the necessary condition $\sum_{i=1}^{k} a_i n_i = 0$ for the existence of a polytope, its sufficiency, and the uniqueness of the polytope. We prove all except the uniqueness. This proof, by Vladimir Protasov, is a bit technical, but it appears to be the shortest proof. It is a convincing example of application of the KKT theorem to a well-known geometric problem. If the nonnegativity conditions of the Lagrange multipliers are not taken into account, then one comes to a wrong conclusion. This mistake is made in several well-known books on convex analysis.

Proof of the Necessity and the Sufficiency

Necessity Let $P$ be a polytope with $k$ $(n-1)$-dimensional boundaries with outward normals $n_i$, $1 \le i \le k$, and with $(n-1)$-volumes $a_1, \ldots, a_k$. For an arbitrary point $x \in P$, we denote by $h_i = h_i(x)$ the distance of $x$ to the $i$th boundary. As the volume of a pyramid with top $x$ and base the $i$th boundary of $P$ is equal to $\frac{1}{n} h_i a_i$, we get $\sum_{i=1}^{k} h_i(x) a_i = n \, \mathrm{vol}(P)$. Taking the derivative of both sides of the equality with respect to $x$ (the gradient of $h_i$ is $-n_i$) gives $\sum_{i=1}^{k} a_i n_i = 0$.

Sufficiency For an arbitrary sequence of nonnegative numbers $h = (h_1, \ldots, h_k)$ we consider the polytope
\[ P(h) = \{x \in \mathbb{R}^n \mid n_i \cdot x \le h_i, \ i = 1, \ldots, k\} \]
and we denote by $V(h)$ its volume. We observe that $0 \in P(h)$ for all $h$. We consider the optimization problem
\[ \min -V(h), \qquad \sum_{i=1}^{k} a_i h_i = 1, \qquad -h_i \le 0, \ i = 1, \ldots, k. \]
In other words, among all polytopes in $\mathbb{R}^n$ containing the origin, bounded by hyperplanes that are orthogonal to the vectors $n_i$ and that have distances $h_i$ to the origin, and for which the equation $\sum_{i=1}^{k} a_i h_i = 1$ holds, we look for the polytope with maximal volume. There exists a solution to this optimization problem, as the set of admissible vectors $h$ is compact. We denote a solution by $\hat{h}$. Let the $(n-1)$-volumes of the boundaries of $P(\hat{h})$ be $S_i$, $1 \le i \le k$. By the KKT theorem, one has
\[ L_h(\hat{h}, \lambda) = 0, \]
where
\[ L = -\lambda_0 V(h) - \sum_{i=1}^{k} \lambda_i h_i + \lambda_{k+1} \Big( \sum_{i=1}^{k} a_i h_i - 1 \Big) \]
is the Lagrange function. Moreover $\lambda_i \ge 0$ and $\lambda_i \hat{h}_i = 0$, $i = 1, \ldots, k$. Differentiating with respect to $h_i$, we get
\[ -\lambda_0 V_{h_i} - \lambda_i + \lambda_{k+1} a_i = 0, \quad i = 1, \ldots, k. \]
If for some $j$, the hyperplane $\{x \in \mathbb{R}^n \mid n_j \cdot x = \hat{h}_j\}$ is disjoint from the polytope $P(\hat{h})$, that is, if this hyperplane is superfluous, then $V_{h_j} = 0$, and hence $\lambda_j = \lambda_{k+1} a_j$. This also holds in the case $\lambda_0 = 0$. If $\lambda_j = 0$, then $\lambda_{k+1} = 0$, and so $\lambda_i = -\lambda_0 V_{h_i}$ for all $i$. If $\lambda_0 = 0$, then we get a contradiction to the requirement that not all Lagrange multipliers are zero, but if $\lambda_0 = 1$, then $\lambda_i = -V_{h_i}$ for all $i = 1, \ldots, k$. If the $i$th hyperplane is not disjoint from the polytope $P(\hat{h})$ (such hyperplanes exist of course), then $V_{h_i} = S_i > 0$, and so $\lambda_i < 0$, which contradicts the condition of nonnegativity of the Lagrange multipliers. Therefore, $\lambda_0 = 1$, and all $k$ hyperplanes ‘form the boundary’, that is, $V_{h_i} = S_i > 0$ for all $i$. Then
\[ -S_i - \lambda_i + \lambda_{k+1} a_i = 0, \quad i = 1, \ldots, k. \]
Multiplying this equality by the vector $n_i$, taking the sum over all $i$, and taking into account that $\sum_{i=1}^{k} S_i n_i = \sum_{i=1}^{k} a_i n_i = 0$, we obtain $\sum_{i=1}^{k} \lambda_i n_i = 0$. We denote by $I$ the set of all indices $i$ for which $\lambda_i > 0$. Then $\sum_{i \in I} \lambda_i n_i = 0$, and taking into account the condition of complementary slackness, $\hat{h}_i = 0$ for all $i \in I$. On the other hand, the polytope $P(\hat{h})$ has at least one interior point $z$, that is, a point for which $n_i \cdot z < \hat{h}_i$, $i = 1, \ldots, k$. Multiplying by $\lambda_i$ and taking the sum over all $i \in I$, we get $0 < \sum_{i \in I} \lambda_i \hat{h}_i = 0$. This contradiction means that $I$ is empty, that is, $\lambda_i = 0$ and so $S_i = \lambda_{k+1} a_i$ for all $i = 1, \ldots, k$. Therefore, the volumes of the boundaries of the polytope $P(\hat{h})$ are proportional to the numbers $a_i$. Carrying out a homothety with the required coefficient, we obtain a polytope for which the volumes of the boundaries are equal to these numbers.
Remark It can be proved, using the Brunn–Minkowski inequality, that the function $-\ln V(h)$ is strictly convex; this implies the uniqueness in Minkowski's theorem on polytopes.
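The proof above uses the identity $\sum_i S_i n_i = 0$ for the facet volumes $S_i$ and outward unit normals $n_i$ of a polytope. The following is a minimal sketch that checks this identity numerically in the planar case, where the 'facets' are the edges of a convex polygon and $S_i$ is an edge length. The function names and the sample triangle are illustrative assumptions, not taken from the text.

```python
from math import hypot

def facet_data(vertices):
    """Return (length, outward unit normal) for each edge of a convex polygon.

    The vertices are assumed to be listed counter-clockwise.
    """
    facets = []
    m = len(vertices)
    for i in range(m):
        (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % m]
        dx, dy = x1 - x0, y1 - y0
        length = hypot(dx, dy)
        # Rotating the edge direction clockwise by 90 degrees gives the
        # outward normal of a counter-clockwise polygon.
        normal = (dy / length, -dx / length)
        facets.append((length, normal))
    return facets

def weighted_normal_sum(vertices):
    """Compute sum_i S_i * n_i, which should vanish for a closed polygon."""
    facets = facet_data(vertices)
    sx = sum(S * n[0] for S, n in facets)
    sy = sum(S * n[1] for S, n in facets)
    return (sx, sy)

# Hypothetical example: a right triangle with legs 4 and 3.
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
print(weighted_normal_sum(triangle))
```

Minkowski's theorem goes in the opposite direction: given normals $n_i$ and positive numbers $a_i$ with $\sum_i a_i n_i = 0$, it produces a polytope whose facet volumes are the $a_i$.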
9.8 Duality Theory for LP
The aim of this section is to derive in a geometric way the formulas that are usually given to define dual problems. A natural perturbation of the data will be chosen, and then the dual problem associated to this perturbation will be determined. This is of
interest for the following reason. There are several standard forms for LP problems. Often one defines the dual problem separately for each type. The coordinate-free method allows one to see that all formulas for dual problems are just explicit versions of a single coordinate-free, and so geometrically intuitive, concept of dual problem. To begin with, we proceed coordinate-free as much as possible, for better insight, starting from scratch.
Definition 9.8.1 A linear programming problem is an optimization problem $\mathcal{P}$ of the form
$$\min_p \; p \cdot d_0, \quad p \in P^+,$$
where $d_0 \in X = \mathbb{R}^n$, $P \subset X$ is an affine subspace, say $P = \tilde P + p_0$ for a subspace $\tilde P \subset X$ and a vector $p_0 \in X$, and $P^+ = \{p \in P \mid p \ge 0\}$. This problem will be called the primal problem.
We perturb $\mathcal{P}$ by a translation of the affine subspace $P$. That is, we consider the family of problems $\mathcal{P}_y$, $y \in X$, given for each $y \in X$ by
$$\min_p \; p \cdot d_0, \quad p \in (\tilde P + y)^+.$$
So $\mathcal{P} = \mathcal{P}_{p_0}$. For simplicity of formulas we do not translate $p_0$ to the origin as we have done in the main text. We write $\mathcal{D}$ for the dual problem of the primal problem $\mathcal{P}$ with respect to the perturbation $\mathcal{P}_y$, $y \in X$. Now we write down the function $F : X \times X \to (-\infty, +\infty]$ that describes the chosen perturbation. We do this by giving the epigraph of $F$: it is the set of all $(x, y, z) \in X \times X \times \mathbb{R}$ for which $x \in \tilde P + y$, $x \ge 0$, $z \ge x \cdot d_0$. This set is readily seen to be a polyhedral set and to be the epigraph of a function $F : X \times X \to (-\infty, +\infty]$. Now we derive the following explicit description of the dual problem $\mathcal{D}$.
Proposition 9.8.2 The dual problem $\mathcal{D}$ is the following problem
$$\max_d \; p_0 \cdot (d_0 - d), \quad d \in D^+,$$
where $\tilde D = \tilde P^\perp = \{\tilde d \in X \mid \tilde d \cdot \tilde p = 0 \ \forall \tilde p \in \tilde P\}$, the orthogonal complement of $\tilde P$ in $X$, and $D = \tilde D + d_0$.
Proof We begin by determining the Lagrange function $L : X \times X \to \overline{\mathbb{R}}$:
$$L(p, \eta) = \inf_y \{p \cdot d_0 + (p_0 - y) \cdot \eta \mid p \in (\tilde P + y)^+\},$$
which becomes [provided $p \ge 0$, otherwise $L(p, \eta) = +\infty$; in this step, we use the equivalence of $p \in (\tilde P + y)^+$ with $y \in \tilde P + p$ and $p \ge 0$]
$$L(p, \eta) = \inf_y \{p \cdot d_0 + (p_0 - y) \cdot \eta \mid y \in \tilde P + p\},$$
which becomes [provided $\eta \in \tilde D$, otherwise $L(p, \eta) = -\infty$; in this step, use that $\inf_{u \in \tilde P} u \cdot \eta$ equals $0$ if $\eta \in \tilde D$ and that it equals $-\infty$ otherwise]
$$L(p, \eta) = p \cdot d_0 + (p_0 - p) \cdot \eta = p \cdot (d_0 - \eta) + p_0 \cdot \eta.$$
Thus, the Lagrange function has been determined. The next step is to determine the objective function of the dual problem, that is, to determine $\varphi(\eta) = \inf_{p \in X} L(p, \eta)$ for all $\eta \in X$:
$$\varphi(\eta) = \inf_p \{p \cdot (d_0 - \eta) + p_0 \cdot \eta \mid p \in X^+\} = p_0 \cdot \eta,$$
provided $d_0 - \eta \ge 0$ and $\eta \in \tilde D$, otherwise $\varphi(\eta) = -\infty$. Thus, the objective function $\varphi$ of the dual problem $\mathcal{D}$ has been determined. It follows that $\mathcal{D}$ takes the following form:
$$\max_\eta \; p_0 \cdot \eta, \quad \eta \in \tilde D, \ d_0 - \eta \ge 0.$$
Writing $d = d_0 - \eta$, this can be rewritten as
$$\max_d \; p_0 \cdot (d_0 - d), \quad d \in (\tilde D + d_0)^+,$$
as required.
Now duality theory takes the following symmetric form.
Theorem 9.8.3 (Coordinate Free Duality Theory for LP) Let $P = \tilde P + p_0$, $D = \tilde D + d_0 \subset X = \mathbb{R}^n$ be two affine subspaces for which $\tilde P$ and $\tilde D$ are subspaces of $X$ that are each other's orthogonal complement. Consider the following LP problems
$$\min_p \; p \cdot d_0, \quad p \in P^+, \tag{P}$$
$$\max_d \; p_0 \cdot (d_0 - d), \quad d \in D^+. \tag{D}$$
Then the following statements hold true.
1. Either P and D have the same optimal value or they are both infeasible.
2. P and D both possess an optimal solution if their common optimal value is finite.
3. Let $p \in P^+$ and $d \in D^+$. Then the following two conditions are equivalent:
(a) $p$ is an optimal solution of P and $d$ is an optimal solution of D;
(b) the complementarity condition $p \cdot d = 0$ holds.
This result is a consequence of the duality theory for convex optimization. Finally, we go over to coordinates. Then we get the familiar descriptions for LP problems and their duals.
Proposition 9.8.4 The description of a pair of primal-dual LP problems in standard form is as follows:
$$\min_x \; c^\top x, \quad Ax = b, \ x \ge 0, \tag{P}$$
$$\max_{y,s} \; b^\top y, \quad A^\top y + s = c, \ s \ge 0. \tag{D}$$
Here, $A$ is an $m \times n$ matrix, $b \in \mathbb{R}^m$ and $c \in \mathbb{R}^n$.
Proposition 9.8.5 The description of a pair of primal-dual LP problems in symmetric form is as follows:
$$\min_x \; c^\top x, \quad Ax \ge b, \ x \ge 0, \tag{P}$$
$$\max_y \; b^\top y, \quad A^\top y \le c, \ y \ge 0. \tag{D}$$
Here, $A$ is an $m \times n$ matrix, $b \in \mathbb{R}^m$ and $c \in \mathbb{R}^n$.
One can determine, more generally, the dual problem of a conic optimization problem in the same way.
Proposition 9.8.6 Consider the conic optimization problem
$$\min_x \; c^\top x \quad \text{s.t.} \quad Ax = b, \ x \succeq 0.$$
Here $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $c \in \mathbb{R}^n$, the variable $x$ runs over $\mathbb{R}^n$, and $x \succeq 0$ means that $x \in C$, a closed pointed solid convex cone in $\mathbb{R}^n$.
Then the dual problem determined by the perturbation of $b$ defined by replacing in $Ax = b$ the right-hand side $b$ by $b - y$ is
$$\max_{y,s} \; b^\top y \quad \text{s.t.} \quad A^\top y + s = c, \ s \succeq^* 0,$$
where $s \succeq^* 0$ means that $s \in C^*$, the dual cone of $C$.
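The standard-form pair of Proposition 9.8.4 and the complementarity condition of Theorem 9.8.3 can be illustrated on a tiny instance. The sketch below uses only plain Python and a hand-picked problem (the data `A`, `b`, `c` and the candidate points are illustrative assumptions). It checks feasibility of a primal point $x$ and a dual point $(y, s)$ and confirms that the duality gap $c^\top x - b^\top y$ equals $x \cdot s$, so that the gap vanishes exactly when $x \cdot s = 0$.

```python
def dot(u, v):
    """Dot product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))

def mat_vec(A, x):
    """Compute A x for a matrix A given as a list of rows."""
    return [dot(row, x) for row in A]

def mat_T_vec(A, y):
    """Compute A^T y for a matrix A given as a list of rows."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]

def primal_dual_values(A, b, c, x, y, tol=1e-12):
    """Check feasibility of x and (y, s) and return (c.x, b.y, s)."""
    assert all(abs(r - bi) <= tol for r, bi in zip(mat_vec(A, x), b)), "Ax = b fails"
    assert all(xi >= -tol for xi in x), "x >= 0 fails"
    s = [cj - aj for cj, aj in zip(c, mat_T_vec(A, y))]  # slack s = c - A^T y
    assert all(sj >= -tol for sj in s), "s >= 0 fails"
    return dot(c, x), dot(b, y), s

# Hypothetical instance: minimize x1 + 2 x2 + 3 x3 on the unit simplex.
A = [[1.0, 1.0, 1.0]]
b = [1.0]
c = [1.0, 2.0, 3.0]

# A complementary pair: x.s = 0, hence both points are optimal.
x_opt, y_opt = [1.0, 0.0, 0.0], [1.0]
primal, dual, s = primal_dual_values(A, b, c, x_opt, y_opt)
print(primal, dual, dot(x_opt, s))

# A feasible but non-optimal pair: the gap equals x.s and is positive.
x_feas, y_feas = [0.5, 0.25, 0.25], [0.5]
primal2, dual2, s2 = primal_dual_values(A, b, c, x_feas, y_feas)
print(primal2 - dual2, dot(x_feas, s2))
```

The identity behind the check is $c^\top x - b^\top y = c^\top x - (Ax)^\top y = x^\top (c - A^\top y) = x \cdot s$, which is the coordinate version of the complementarity condition $p \cdot d = 0$.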
9.9 Solving LP Problems by Taking a Limit
In this section, an introduction is given to primal-dual interior point methods, also called barrier methods. These methods allow one to solve numerically most convex optimization problems in an efficient and reliable way, both in practice and in theory. We restrict attention to LP problems. We begin with a result that is of independent interest. We define a mapping that seems to belong to linear algebra. It turns out, maybe surprisingly, to be a bijection. This can be proved by a somewhat miraculous argument, involving the construction of a suitable strictly convex function.
Let $P$ and $D$ be affine subspaces in $X = \mathbb{R}^n$ that are each other's orthogonal complement. That is, after translation of both to subspaces $\tilde P$ and $\tilde D$ respectively, each of the two consists of the vectors in $X$ that are orthogonal to the other one,
$$\tilde P = \tilde D^\perp = \{x \in X \mid x \cdot y = 0 \ \forall y \in \tilde D\}$$
and, equivalently,
$$\tilde D = \tilde P^\perp = \{x \in X \mid x \cdot y = 0 \ \forall y \in \tilde P\}.$$
We let $X_{++}$, $P_{++}$, $D_{++}$ denote the subsets of all points in $X$, $P$, $D$ respectively for which all coordinates are positive. We use the Hadamard product, the coordinatewise multiplication $v \circ w = (v_1 w_1, \dots, v_n w_n) \in \mathbb{R}^n$ of two vectors $v, w \in \mathbb{R}^n$. Now we are ready to state the promised surprising fact.
Proposition 9.9.1 The map $P_{++} \times D_{++} \to X_{++}$ given by pointwise multiplication, $(p, d) \mapsto p \circ d$, is a bijection.
Figure 9.2 illustrates this proposition.
Fig. 9.2 A surprising bijection
Proof Choose an arbitrary point $x \in \mathbb{R}^n_{++}$. We have to prove that there exist unique $p \in P_{++}$, $d \in D_{++}$ for which $x = p \circ d$, where $\circ$ is the Hadamard product (coordinatewise multiplication). To this end we consider the optimization problem
$$\min f(p, d) = p \cdot d - x \cdot \ln(p \circ d), \quad \text{subject to } p \in P_{++}, \ d \in D_{++}.$$
Here we write $\ln$ for coordinatewise taking the logarithm: $\ln v = (\ln v_1, \dots, \ln v_n)$ for all $v \in \mathbb{R}^n_{++}$. It is readily checked that there is an optimal solution of this problem; moreover, one can check that the function $f(p, d)$ is strictly convex, so this optimal solution is unique. Let $s$ be the unique common point of $P$ and $D$. Then $P = \tilde P + s = \{\tilde p + s \mid \tilde p \in \tilde P\}$ and $D = \tilde D + s = \{\tilde d + s \mid \tilde d \in \tilde D\}$. The stationarity conditions of this optimization problem are readily seen to be as follows:
$$(d - x \circ p^{-1}) \cdot \tilde p = 0 \ \ \forall \tilde p \in \tilde P, \qquad (p - x \circ d^{-1}) \cdot \tilde d = 0 \ \ \forall \tilde d \in \tilde D.$$
Here we write $v^r$ for coordinatewise raising to the power $r$: $v^r = (v_1^r, \dots, v_n^r)$ for all $v \in \mathbb{R}^n_{++}$ and all $r \in \mathbb{R}$. The stationarity conditions can be rewritten as follows:
$$(p^{1/2} \circ d^{1/2} - x \circ p^{-1/2} \circ d^{-1/2}) \cdot (p^{-1/2} \circ d^{1/2} \circ \tilde p) = 0 \ \ \forall \tilde p \in \tilde P,$$
$$(p^{1/2} \circ d^{1/2} - x \circ p^{-1/2} \circ d^{-1/2}) \cdot (p^{1/2} \circ d^{-1/2} \circ \tilde d) = 0 \ \ \forall \tilde d \in \tilde D.$$
That is, the vector $p^{1/2} \circ d^{1/2} - x \circ p^{-1/2} \circ d^{-1/2}$ is orthogonal to the subspaces $p^{-1/2} \circ d^{1/2} \circ \tilde P$ and $p^{1/2} \circ d^{-1/2} \circ \tilde D$ of $\mathbb{R}^n$. These subspaces are each other's orthogonal complement. We get
$$p^{1/2} \circ d^{1/2} - x \circ p^{-1/2} \circ d^{-1/2} = 0.$$
This can be rewritten as $p \circ d = x$.
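To make Proposition 9.9.1 concrete, here is a minimal sketch for one hand-picked pair in $\mathbb{R}^2$ (an illustrative assumption, not an example from the text): $s = (1, 1)$, $\tilde P$ spanned by $(1, -1)$ and $\tilde D$ spanned by $(1, 1)$, so that $P = \{(1 + t, 1 - t)\}$ and $D = \{(1 + u, 1 + u)\}$. For this pair, $p \circ d = x$ can be solved in closed form: adding the two coordinates gives $2(1 + u) = x_1 + x_2$, and subtracting them gives $2t(1 + u) = x_1 - x_2$.

```python
def hadamard(v, w):
    """Coordinatewise (Hadamard) product of two vectors."""
    return tuple(a * b for a, b in zip(v, w))

def split(x):
    """Return the unique (p, d) in P_++ x D_++ with p∘d = x for the toy pair.

    P = {(1 + t, 1 - t)} and D = {(1 + u, 1 + u)} are the hypothetical
    affine subspaces of R^2 described in the lead-in.
    """
    x1, x2 = x
    assert x1 > 0 and x2 > 0, "x must lie in the positive orthant"
    u = (x1 + x2) / 2.0 - 1.0      # from (1 + t)(1 + u) + (1 - t)(1 + u) = x1 + x2
    t = (x1 - x2) / (x1 + x2)      # from 2 t (1 + u) = x1 - x2; note |t| < 1
    p = (1.0 + t, 1.0 - t)
    d = (1.0 + u, 1.0 + u)
    return p, d

p, d = split((3.0, 1.0))
print(p, d, hadamard(p, d))
```

Since $|t| < 1$ and $1 + u > 0$ for every positive $x$, the recovered $p$ and $d$ are automatically positive, which is the existence and uniqueness claim of the proposition in this special case.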
Now we turn to primal-dual interior point methods for LP problems. An LP problem can be formulated coordinate-free in the following unusual way. Let $P$ and $D$ be affine subspaces in $X = \mathbb{R}^n$ that are each other's orthogonal complement. That is, after translation of both to subspaces $\tilde P$ and $\tilde D$ respectively, each of the two consists of the vectors in $X$ which are orthogonal to the other one, $\tilde P = \tilde D^\perp = \{x \in X \mid x \cdot y = 0 \ \forall y \in \tilde D\}$ and, equivalently, $\tilde D = \tilde P^\perp = \{x \in X \mid x \cdot y = 0 \ \forall y \in \tilde P\}$. Then $P$ and $D$ have a unique common point $s$; this can be checked using that each element of $X$ can be written in a unique way as the sum of an element of $\tilde P$ and an element of $\tilde D = \tilde P^\perp$. Then $P = s + \tilde P$ and $D = s + \tilde D$. Assume moreover that both $P$ and $D$ contain positive vectors. One can consider the problem to find nonnegative
vectors $p \in P$ and $d \in D$ for which $p \cdot d = 0$. It follows from Theorem 9.8.3 that this is equivalent to the problem of solving an arbitrary LP problem and its dual problem. The interior point method solves this problem by taking a limit: one considers pairs $(p, d)$ consisting of positive vectors $p \in P$ and $d \in D$, that is, $p \in P_{++}$, $d \in D_{++}$. Then one takes a limit $(\hat p, \hat d)$ of a convergent sequence of such pairs for which $p \cdot d$ tends to zero. Then $(\hat p, \hat d)$ is a solution of the problem, by the continuity of the inner product.
Now we are going to sketch how one can compute such limits numerically in an efficient way, using Newton's algorithm for solving systems of equations, by virtue of Proposition 9.9.1. Suppose you have explicit points $\bar p \in P_{++}$ and $\bar d \in D_{++}$. Compute $\bar p \circ \bar d$. Choose a positive vector $v$ in $X$ that lies close to $\bar p \circ \bar d$ and that is a bit closer to the origin than $\bar p \circ \bar d$. Then do one step of the Newton method for solving the system of equations $p \circ d = v$ with initial approximation $p = \bar p$, $d = \bar d$. We recall how this works and what the intuition behind it is. One expects the solution of $p \circ d = v$ to be close to $(\bar p, \bar d)$, as $v$ is close to $\bar p \circ \bar d$. Therefore one rewrites the vector equation $p \circ d = v$, writing $p = \bar p + (p - \bar p)$ and $d = \bar d + (d - \bar d)$ and then expanding brackets. This gives
$$\bar p \circ \bar d + \bar p \circ (d - \bar d) + (p - \bar p) \circ \bar d + (p - \bar p) \circ (d - \bar d) = v.$$
Now omit the last term, which we expect to be negligible if we substitute the solution: indeed, then both $p - \bar p$ and $d - \bar d$ become small vectors, so $(p - \bar p) \circ (d - \bar d)$ will become very small. This leads to the linear vector equation
$$\bar p \circ \bar d + \bar p \circ (d - \bar d) + (p - \bar p) \circ \bar d = v,$$
which can also be written as
$$p \circ d - (p - \bar p) \circ (d - \bar d) = v.$$
This has a unique solution $(p', d')$ having $p' \in P$, $d' \in D$. Take this solution. It is reasonable to expect that this is a good approximation of the unique $p \in P_{++}$, $d \in D_{++}$ for which $p \circ d = v$.
In particular, it is reasonable to expect that $p'$ and $d'$ are positive vectors. Then repeat this step, now starting from $(p', d')$. Repeat the step a number of times. This can be done in such a way that one gets an algorithm that is efficient both in theory and in practice.
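The Newton step described above can be sketched for a two-dimensional toy case. The setup is an illustrative assumption, not taken from the text: $\tilde P$ is spanned by `p_dir = (1, -1)` and $\tilde D$ by `d_dir = (1, 1)`, so the step has the form $p' = \bar p + \alpha\,$`p_dir`, $d' = \bar d + \beta\,$`d_dir`, and the linearized equation becomes a $2 \times 2$ linear system in $(\alpha, \beta)$, solved here by Cramer's rule. The target `v` is deliberately chosen further from $\bar p \circ \bar d$ than a practical method would choose, so that the effect of the omitted second-order term is visible.

```python
def newton_step(p_bar, d_bar, v, p_dir=(1.0, -1.0), d_dir=(1.0, 1.0)):
    """One linearized step for p∘d = v in R^2.

    Solves  d_bar∘(alpha * p_dir) + p_bar∘(beta * d_dir) = v - p_bar∘d_bar
    for the scalars alpha, beta and returns the updated pair (p', d').
    """
    r = [v[i] - p_bar[i] * d_bar[i] for i in range(2)]  # right-hand side
    a11, a12 = d_bar[0] * p_dir[0], p_bar[0] * d_dir[0]
    a21, a22 = d_bar[1] * p_dir[1], p_bar[1] * d_dir[1]
    det = a11 * a22 - a12 * a21  # positive when p_bar, d_bar are positive
    alpha = (r[0] * a22 - a12 * r[1]) / det
    beta = (a11 * r[1] - a21 * r[0]) / det
    p_new = tuple(p_bar[i] + alpha * p_dir[i] for i in range(2))
    d_new = tuple(d_bar[i] + beta * d_dir[i] for i in range(2))
    return p_new, d_new

# Hypothetical starting pair in P_++ x D_++ and target v.
p, d = (1.5, 0.5), (1.0, 1.0)
v = (0.5, 0.5)
for step in range(3):
    p, d = newton_step(p, d, v)
    print(step, p, d, tuple(pi * di for pi, di in zip(p, d)))
```

After the first step the product $p' \circ d'$ differs from $v$ by exactly the omitted term $(p' - \bar p) \circ (d' - \bar d)$; further steps with the same $v$ remove this error. An actual interior point method would instead shrink $v$ between steps so that $p \cdot d$ tends to zero.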
9.10 Exercises
1. Show that there exists an optimal solution for the problem of the waist of three lines in space.
2. (a) Check that the set of all $(x, y, z) \in X \times X \times \mathbb{R}$ for which $x \in \tilde P + y$, $x \ge 0$, $z \ge x \cdot d_0$ is a polyhedral set which is the epigraph of a function $F : X \times X \to (-\infty, +\infty]$.
(b) Give an explicit description of this function $F$.
3. Derive Theorem 9.8.3 from the duality theory for convex optimization.
4. Derive the explicit form for the dual problem of an LP problem in standard form that is given in Proposition 9.8.4.
5. Derive the explicit form for the dual problem of an LP problem in symmetric form that is given in Proposition 9.8.5.
6. Derive the explicit form for the dual problem of a conic optimization problem given in Proposition 9.8.6.
7. Give a coordinate-free description of a pair of primal-dual conic optimization problems and formulate complementarity conditions for such pairs.
8. Determine the best strategies for both players in the penalty problem from Sect. 9.4 if $a_{ll} = 0.9$, $a_{lr} = 0.4$, $a_{rl} = 0.2$, $a_{rr} = 0.8$.
9. Give a precise formalization of the maximization problem by Th. Moss from 1755 formulated in Sect. 9.5.
10. Give a precise formalization of the minimization problem by Fermat formulated in Sect. 9.5.
11. Show that there exists a unique global optimal solution to the problem $\min f(p, d) = p \cdot d - x \cdot \ln(p \circ d)$, subject to $p \in P_{++}$, $d \in D_{++}$, considered in Sect. 9.9.
12. Check that the stationarity conditions for the problem $\min f(p, d) = p \cdot d - x \cdot \ln(p \circ d)$, subject to $p \in P_{++}$, $d \in D_{++}$, considered in Sect. 9.9 are as follows:
$$(d - x \circ p^{-1}) \cdot \tilde p = 0 \ \ \forall \tilde p \in \tilde P, \qquad (p - x \circ d^{-1}) \cdot \tilde d = 0 \ \ \forall \tilde d \in \tilde D.$$
Appendix A
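A.1 Symbols
A.1.1 Set Theory
$s \in S$    The element $s$ belongs to the set $S$.
$S \subseteq T$    For sets $S, T$: $S$ subset of $T$, $s \in S \Rightarrow s \in T$.
$S \subset T$    For sets $S, T$: $S$ proper subset of $T$, $S \subseteq T$, $S \ne T$.
$S \times T$    For sets $S, T$: Cartesian product of $S$ and $T$, the set $\{(s, t) \mid s \in S, t \in T\}$.
$f : S \to T$    For sets $S, T$: a function $f$ from $S$ to $T$.
$\mathrm{epi}(f)$    For a set $T$ and a function $f : T \to \mathbb{R}$: epigraph of $f$, the set $\{(t, \rho) \mid t \in T, \rho \in \mathbb{R}, \rho \ge f(t)\}$.
$\#T$    For a finite set $T$: the number of elements of $T$.
A.1.2 Some Specific Sets
$\mathbb{R}$    Real numbers.
$\mathbb{R}^n$    Real $n$-column vectors.
$\mathbb{R}_+$    Nonnegative numbers, $[0, +\infty) = \{x \in \mathbb{R} \mid x \ge 0\}$.
$\mathbb{R}_{++}$    Positive numbers, $(0, +\infty) = \{x \in \mathbb{R} \mid x > 0\}$.
$\overline{\mathbb{R}}$    Extended real numbers, $[-\infty, +\infty] = \mathbb{R} \cup \{-\infty, +\infty\}$.
$[a, b]$    For $a, b \in \mathbb{R}$, $a \le b$: $\{x \in \mathbb{R} \mid a \le x \le b\}$.
$(a, b)$    For $a, b \in \mathbb{R}$, $a \le b$: $\{x \in \mathbb{R} \mid a < x < b\}$.
$[a, b)$    For $a, b \in \mathbb{R}$, $a \le b$: $\{x \in \mathbb{R} \mid a \le x < b\}$.
$(a, b]$    For $a, b \in \mathbb{R}$, $a \le b$: $\{x \in \mathbb{R} \mid a < x \le b\}$.
$R_c$    For nonzero $c \in X = \mathbb{R}^n$: the open ray spanned by $c$, $\{\rho c \mid \rho > 0\}$.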
© Springer Nature Switzerland AG 2020 J. Brinkhuis, Convex Analysis for Optimization, Graduate Texts in Operations Research, https://doi.org/10.1007/9783030418045
$\mathbb{R}^n_+$    First orthant, the convex cone $\{x \in \mathbb{R}^n \mid x \ge 0_n\} = \{x \in \mathbb{R}^n \mid x_1 \ge 0, \dots, x_n \ge 0\}$.
$L^n$    Lorentz cone or ice cream cone, the convex cone that is the epigraph of the Euclidean norm on $\mathbb{R}^n$, $\{(x, \rho) \mid x \in \mathbb{R}^n, \rho \in \mathbb{R}, \rho \ge (x_1^2 + \cdots + x_n^2)^{1/2}\}$.
$S^n$    Inner product space of symmetric $n \times n$ matrices $M$ (that is, for which $M^\top = M$), with inner product $\langle M, N \rangle = \sum_{i,j} M_{ij} N_{ij}$.
$S^n_+$    Positive semidefinite cone, the convex cone in $S^n$ that consists of all $M$ that are positive semidefinite, $x^\top M x \ge 0$ for all $x \in \mathbb{R}^n$.
A.1.3 Linear Algebra
$X$    A finite dimensional (f.d.) inner product space with inner product $\langle \cdot, \cdot \rangle$; often (without loss of generality) $\mathbb{R}^n$ with the dot product $x \cdot y = x_1 y_1 + \cdots + x_n y_n$.
$\|\cdot\|_X$    The Euclidean norm on $X$, $\|x\|_X = \langle x, x \rangle^{1/2}$ for all $x \in X$. For $X = \mathbb{R}^n$ this is $(x_1^2 + \cdots + x_n^2)^{1/2}$.
$\angle(u, v)$    For unit vectors $u, v \in \mathbb{R}^n$: angle $\varphi$ between $u$ and $v$, $\cos \varphi = u \cdot v$, $\varphi \in [0, \pi]$.
$L^\perp$    For a subspace $L$ of $X$: orthogonal complement of $L$, the subspace $\{x \in X \mid x \cdot y = 0 \ \forall y \in L\}$.
$M^\top$    For a linear function $M : X \to Y$: transpose, the linear function $Y \to X$ with recipe characterized by $\langle M^\top(y), x \rangle = \langle y, M(x) \rangle$ for all $x \in X$ and $y \in Y$.
$\preceq$    Preference relation on $X$.
$\mathrm{span}(S)$    For a subset $S \subseteq X$: the span of $S$, the minimal subspace of $X$ containing $S$ (where minimal means 'minimal with respect to inclusion').
$\mathrm{aff}(S)$    For a subset $S \subseteq X$: the affine hull of $S$, the minimal affine subset of $X$ containing $S$ (where minimal means 'minimal with respect to inclusion').
$\ker(L)$    For a linear function $L : X \to Y$, where $X, Y$ are f.d. inner product spaces: the kernel or nullspace of $L$, the subspace $\{x \in X \mid L(x) = 0\}$ of $X$.
$N^*$    For a norm $N$ on $X$: the dual norm on $X$, $N^*(y) = \max_{x \ne 0} \langle x, y \rangle / N(x)$ for all $y \in X$.
$\|\cdot\|_p$    For $p \in [1, +\infty]$: the $p$-norm on $\mathbb{R}^n$, $\|x\|_p = [|x_1|^p + \cdots + |x_n|^p]^{1/p}$ if $p \ne +\infty$, and $\|x\|_\infty = \max(|x_1|, \dots, |x_n|)$. So $\|x\|_2$ is the Euclidean norm.
$x \circ y$    For $x, y \in \mathbb{R}^n$: Hadamard product of $x$ and $y$, the pointwise product $(x_1 y_1, \dots, x_n y_n)^\top$.
A.1.4 Analysis
$f'(x)$    For a function $f : X \to Y$ and a point $x \in X$ of differentiability of $f$: derivative of $f$ at $x$, the linear function $f'(x) : X \to Y$ for which $\|f(x + h) - f(x) - f'(x)(h)\| / \|h\| \to 0$ for $h \to 0$.
$f''(x)$    For a function $f : X \to Y$ and a point $x \in X$ of twice differentiability: second derivative or Hessian of $f$ at $x$, $f'' = (f')'$.
$\Gamma_f$    For a set $T$ and a function $f : T \to \mathbb{R}$: graph of $f$, the set $\{(t, f(t)) \mid t \in T\}$.
$T_x R$    For a set $R \subseteq X$ and a point $x \in R$: the tangent cone of $R$ at $x$, the set of vectors $d \in X$ for which there exist sequences $(d_k)_k$ in $X$ and $(t_k)_k$ in $\mathbb{R}_{++}$ such that $d_k \to d$, $t_k \to 0$ for $k \to \infty$, and $x + t_k d_k \in R$ for all $k$.
A.1.5 Topology
$\mathrm{cl}(S)$    For a subset $S \subset X$: the closure of $S$, the minimal closed subset of $X$ that contains $S$ (minimal means minimal with respect to inclusion).
$\mathrm{int}(S)$    For a subset $S \subset X$: the interior of $S$, the maximal open subset of $X$ that is contained in $S$ (maximal means maximal with respect to inclusion).
$B_X(a, r) = B_n(a, r) = B(a, r)$    For $a \in X = \mathbb{R}^n$ and $r \in \mathbb{R}_{++}$: the closed ball with radius $r$ and center $a$, $\{x \in X \mid \|x - a\| \le r\}$.
$U_X(a, r) = U_n(a, r) = U(a, r)$    For $a \in X = \mathbb{R}^n$ and $r \in \mathbb{R}_{++}$: the open ball with radius $r$ and center $a$, $\{x \in X \mid \|x - a\| < r\}$.
$S_X = S_n$    For $X = \mathbb{R}^n$: the standard unit sphere in $X$, $\{x \in X \mid x_1^2 + \cdots + x_n^2 = 1\} = \{x \in X \mid \|x\| = 1\}$.
$B_X = B_n$    For $X = \mathbb{R}^n$: the standard closed unit ball in $X$, $B(0_n, 1) = \{x \in X \mid x_1^2 + \cdots + x_n^2 \le 1\} = \{x \in X \mid \|x\| \le 1\}$.
$U_X = U_n$    For $X = \mathbb{R}^n$: the standard open unit ball in $X$, $U(0_n, 1) = \{x \in X \mid x_1^2 + \cdots + x_n^2 < 1\} = \{x \in X \mid \|x\| < 1\}$.
A.1.6 Convex Sets
$A$    Often used for a convex set.
$R_A$    Recession cone for a nonempty convex set $A$.
$\mathrm{co}(S)$    For a set $S \subseteq X$: the convex hull of $S$, the minimal convex set containing $S$ (minimal means 'minimal under inclusion').
$\mathrm{cone}(S)$    For a set $S \subseteq X$: the conic hull of $S$, the minimal convex cone containing $S \cup \{0_X\}$ (minimal means 'minimal under inclusion').
$MA$    For a convex set $A \subseteq X$ and a linear function $M : X \to Y$, where $Y$ is also a f.d. inner product space: image of $A$ under $M$, the convex set $\{M(a) \mid a \in A\}$ of $Y$.
$AN$    For f.d. inner product spaces $X, Y$, a linear function $N : X \to Y$, and a convex set $A \subseteq Y$: inverse image of $A$ under $N$, the convex set $\{x \in X \mid N(x) \in A\}$.
$A_1 + A_2$    For convex sets $A_1, A_2 \subseteq X$: Minkowski sum of $A_1$ and $A_2$, $\{a_1 + a_2 \mid a_1 \in A_1, a_2 \in A_2\}$.
$A_1 \cap A_2$    For convex sets $A_1, A_2 \subseteq X$: intersection of $A_1$ and $A_2$, $\{x \in X \mid x \in A_1 \ \&\ x \in A_2\}$.
$A_1 \,\mathrm{co}\!\cup A_2$    For convex sets $A_1, A_2 \subseteq X$: convex hull of the union of $A_1$ and $A_2$, $\mathrm{co}(A_1 \cup A_2)$.
$A_1 \# A_2$    For convex sets $A_1, A_2 \subseteq X$: inverse sum of $A_1$ and $A_2$, $\cup_{\alpha \in [0,1]} (\alpha A_1 \cap (1 - \alpha) A_2)$.
$L_A$    For a nonempty convex set $A$: the lineality space of $A$, the maximal subspace of $X$ that consists of recession vectors of $A$, $R_A \cap R_{-A}$ (maximal means maximal with respect to inclusion).
$\mathrm{ri}(A)$    For a convex set $A \subseteq X$: the relative interior of $A$, the maximal relatively open subset of $\mathrm{aff}(A)$ contained in $A$ (maximal means maximal with respect to inclusion).
$\mathrm{extr}(A)$    For a convex set $A$: the set of extreme points of $A$, points that are not the average of two different points of $A$.
$\mathrm{extrr}(A)$    For a convex set $A$: the set of extreme recession directions of $A$, recession directions $R_u$ for which the vector $u$ is not the average of two linearly independent vectors $v, w$ for which $R_v$ and $R_w$ are recession directions of $A$.
$r_A^-$    For a convex set $A \subseteq X$: the minimal improper convex function for $A$ (where minimal means minimal with respect to the pointwise ordering for functions), the improper convex function $X \to \overline{\mathbb{R}}$ with recipe $x \mapsto -\infty$ if $x \in \mathrm{cl}(A)$ and $x \mapsto +\infty$ otherwise.
$r_A^+$    For a convex set $A \subseteq X$: the maximal improper convex function for $A$ (where maximal means maximal with respect to the pointwise ordering for functions), the improper convex function $X \to \overline{\mathbb{R}}$ with recipe $x \mapsto -\infty$ if $x \in \mathrm{ri}(A)$ and $x \mapsto +\infty$ otherwise.
$B$    Often used for a convex set that contains the origin.
$B^\circ$    For a convex set $B \subseteq X$ that contains the origin: the polar set, the convex set containing the origin $\{y \in X \mid \langle x, y \rangle \le 1 \ \forall x \in B\}$.
$B^{\circ\circ}$    For a convex set $B \subseteq X$ that contains the origin: the bipolar set, the closed convex set that contains the origin $(B^\circ)^\circ$.
$B \overset{d}{\longleftrightarrow} \tilde B$    For convex sets containing the origin $B, \tilde B \subseteq X$: $B$ and $\tilde B$ are in duality, $B^\circ = \tilde B$ and $\tilde B^\circ = \mathrm{cl}(B)$. Warning: $\tilde B$ is closed, but $B$ is not necessarily closed.
$C$    Often used for a convex cone.
$C^\circ$    For a convex cone $C \subseteq X$: polar cone, the closed convex cone $\{y \in X \mid \langle x, y \rangle \le 0 \ \forall x \in C\}$.
$C^{\circ\circ}$    For a convex cone $C \subseteq X$: bipolar cone, the closed convex cone $(C^\circ)^\circ$.
$C^*$    For a convex cone $C \subseteq X$: the dual cone, the closed convex cone $-C^\circ = \{y \in X \mid \langle x, y \rangle \ge 0 \ \forall x \in C\}$.
$C \overset{d}{\longleftrightarrow} \tilde C$    For convex cones $C, \tilde C \subseteq X$: $C$ and $\tilde C$ are in duality, $C^\circ = \tilde C$ and $\tilde C^\circ = \mathrm{cl}(C)$. Warning: $\tilde C$ is closed, but $C$ is not necessarily closed.
A.1.7 Convex Sets: Conification
$\Delta_X$    Diagonal map for $X$, function $X \to X \times X$ with recipe $x \mapsto (x, x)$.
$+_X$    Sum map for $X$, function $X \times X \to X$ with recipe $(x, y) \mapsto x + y$.
$d(C)$    For a convex cone $C \subseteq X \times \mathbb{R}_+$: deconification/dehomogenization of $C$, the convex set $A = \{a \in X \mid (a, 1) \in C\}$.
$c_{\min}(A)$    For a convex set $A \subseteq X$: the minimal conification of $A$ (where minimal means 'minimal under inclusion'), $\mathbb{R}_{++}(A \times \{1\}) = \{\rho(a, 1) \mid \rho \in \mathbb{R}_{++}, a \in A\}$ (the conic hull of $A \times \{1\}$ with the origin of $X \times \mathbb{R}$ deleted).
$c_{\max}(A)$    For a convex set $A \subseteq X$: the maximal conification of $A$ (where maximal means 'maximal under inclusion'), $c_{\min}(A) \cup (R_A \times \{0\})$ (this is a union of two disjoint sets).
$c(A)$    For a convex set $A \subseteq X$: the conification of $A$, $c_{\min}(A)$.
$\circ_{(+-)}$    Minkowski sum.
$\circ_{(--)}$    Intersection.
$\circ_{(++)}$    Convex hull of the union.
$\circ_{(+-)}$    Inverse sum.
A.1.8 Convex Functions
$\mathrm{dom}(f)$    For a proper convex function $f$ on $X$: effective domain of $f$, the set $\{x \in X \mid f(x) < +\infty\}$.
$\partial f(\hat x)$    For a convex function $f$ on $X$ and a point $\hat x$ in its effective domain: subdifferential of $f$ at $\hat x$, the set of slopes of those affine functions on $X$ that have the same value as $f$ at the point $\hat x$ and that moreover minorize $f$, $\{y \in X \mid f(\hat x) + \langle y, x - \hat x \rangle \le f(x) \ \forall x \in X\}$.
$f_1 \vee f_2 = \max(f_1, f_2)$    For convex functions $f_1, f_2$ on $X$: the pointwise maximum of $f_1$ and $f_2$, the convex function $X \to \overline{\mathbb{R}}$ with recipe $x \mapsto \max(f_1(x), f_2(x))$.
$f_1 + f_2$    For convex functions $f_1, f_2$ on $X$: pointwise sum of $f_1$ and $f_2$, the convex function $X \to \overline{\mathbb{R}}$ with recipe $x \mapsto f_1(x) + f_2(x)$.
$f_1 \,\mathrm{co}\!\wedge f_2$    For proper convex functions $f_1, f_2$ on $X$: convex hull of the (pointwise) minimum of $f_1$ and $f_2$, the largest convex underestimator of $f_1$ and $f_2$; that is, the strict epigraph of $f_1 \,\mathrm{co}\!\wedge f_2$ is the convex hull of the strict epigraph of $f_1 \wedge f_2 = \min(f_1, f_2)$, the pointwise minimum of $f_1$ and $f_2$.
$f_1 \oplus f_2$    For proper convex functions $f_1, f_2$ on $X$: infimal convolution of $f_1$ and $f_2$, the convex function on $X$ with recipe $x \mapsto \inf\{f_1(x_1) + f_2(x_2) \mid x = x_1 + x_2\}$.
$f \# g$    For convex functions $f, g$ on $X$: Kelley's sum, the convex function on $X$ with recipe $x \mapsto \inf\{f(x_1) \vee g(x_2) \mid x = x_1 + x_2\}$ (where $a \vee b = \max(a, b)$ for $a, b \in \mathbb{R}$).
$\mathrm{cl}(f)$    For a convex function $f$ on $X$: the closure of $f$, the maximal closed convex function that minorizes $f$, characterized by $\mathrm{epi}(\mathrm{cl}(f)) = \mathrm{cl}(\mathrm{epi}(f))$.
$fM$    For two f.d. inner product spaces $X, Y$, a convex function $f$ on $Y$, and a linear function $M : X \to Y$: the inverse image of $f$ under $M$, a convex function on $X$ with recipe $x \mapsto f(M(x))$.
$Mg$    For two f.d. inner product spaces $X, Y$, a convex function $g$ on $X$, and a linear function $M : X \to Y$: the image of $g$ under $M$, a convex function on $Y$ with recipe $y \mapsto \inf\{g(x) \mid M(x) = y\}$.
$R_f$    For a convex function $f$ on $X$: the recession cone of $f$, the recession cone of a nonempty level set of $f$.
$f^*$    For a convex function $f$ on $X$: conjugate function, the convex function on $X$ with recipe $y \mapsto \sup_x (\langle x, y \rangle - f(x))$.
$f^{**}$    For a convex function $f$ on $X$: biconjugate function of $f$, $(f^*)^*$.
$f \overset{d}{\longleftrightarrow} \tilde f$    For convex functions $f, \tilde f$ on $X$: $f$ and $\tilde f$ are in duality, $f^* = \tilde f$ and $\tilde f^* = \mathrm{cl}(f)$. Warning: $\tilde f$ is closed, but $f$ is not necessarily closed.
$\partial p$    For a proper sublinear function $p$ on $X$: subdifferential of $p$, the nonempty closed convex set $\{y \in X \mid \langle x, y \rangle \le p(x) \ \forall x \in X\}$.
$s_A$    For a nonempty convex set $A \subseteq X$: support function of $A$, the proper closed sublinear function on $X$ with recipe $y \mapsto \sup\{\langle x, y \rangle \mid x \in A\}$.
A.1.9 Convex Functions: Conification
$d(C)$    For a convex cone $C \subseteq X \times \mathbb{R} \times \mathbb{R}$ for which $\mathbb{R}_+(0_X, 1, 0) \subseteq \mathrm{cl}(C) \subseteq X \times \mathbb{R} \times \mathbb{R}_+$: deconification/dehomogenization of $C$, the convex function $f$ on $X$ with recipe $x \mapsto f(x) = \inf\{\rho \mid (x, \rho, 1) \in C\}$.
$c_{\min}(f)$    For a convex function $f$ on $X$: the minimal conification of $f$ (minimal means minimal with respect to inclusion), the convex cone $\mathbb{R}_{++}(\mathrm{epi}_s(f) \times \{1\})$ in $X \times \mathbb{R} \times \mathbb{R}$.
$c_{\max}(f)$    For a convex function $f$ on $X$: the maximal conification of $f$ (maximal means maximal with respect to inclusion), the closed convex cone that is the union of the disjoint sets $\mathbb{R}_{++}(\mathrm{epi}(f) \times \{1\})$ and $R_{\mathrm{epi}(f)} \times \{0\}$.
$c(f)$    For a convex function $f$ on $X$: the conification of $f$, $c_{\min}(f)$.
$\circ_{(-+-)}$    Pointwise sum.
$\circ_{(---)}$    Pointwise maximum.
$\circ_{(+++)}$    Convex hull of the pointwise minimum.
$\circ_{(++-)}$    Infimal convolution.
A.1.10 Optimization
$\min R$    For a set $R \subset \mathbb{R}$: minimum of $R$, the element $\hat r \in R$ for which $r \ge \hat r$ for all $r \in R$. Not every set $R$ has a minimal element; if it exists, then it is unique.
$\inf R$    For a set $R \subset \mathbb{R}$: infimum of $R$, the largest lower bound of $R$ in $\overline{\mathbb{R}}$. Every set $R$ has an infimum, and it is unique.
$\max R$    For a set $R \subset \mathbb{R}$: maximum of $R$, the element $\hat r \in R$ for which $r \le \hat r$ for all $r \in R$. Not every set $R$ has a maximal element; if it exists, then it is unique. It can be expressed in terms of the concept of minimum: $\max R = -\min(-R)$.
$\sup R$    For a set $R \subset \mathbb{R}$: supremum of $R$, the smallest upper bound of $R$ in $\overline{\mathbb{R}}$. Every set $R$ has a supremum, and it is unique. It can be expressed in terms of the concept of infimum: $\sup R = -\inf(-R)$.
$L$    For a perturbation function $F : X \times Y \to \overline{\mathbb{R}}$: Lagrange function, the function $X \times ((Y \times [0, +\infty)) \setminus \{0_{Y \times \mathbb{R}}\}) \to \overline{\mathbb{R}}$ with recipe $(x, \eta, \eta_0) \mapsto \inf_{y \in Y} (\eta_0 F(x, y) - \langle \eta, y \rangle)$.
$S$    For a perturbation function $F : X \times Y \to \overline{\mathbb{R}}$: optimal value function, the function $Y \to \overline{\mathbb{R}}$ with recipe $y \mapsto \inf_{x \in X} F(x, y)$.
$\varphi$    For a perturbation function $F : X \times Y \to \overline{\mathbb{R}}$: objective function of the dual problem, the function $Y \to \overline{\mathbb{R}}$ with recipe $\eta \mapsto \inf_y (\langle \eta, y \rangle - S(y))$.
Appendix B
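B.1 Calculus Formulas
B.1.1 Convex Sets Containing the Origin
$B^{\circ\circ} = B$ if $B$ is closed.
$A \,\mathrm{co}\!\cup B \overset{d}{\longleftrightarrow} A^\circ \cap B^\circ$.
$A + B \overset{d}{\longleftrightarrow} A^\circ \# B^\circ$.
$MB \overset{d}{\longleftrightarrow} B^\circ M^\top$ if $M$ is a linear function.
B.1.2 Nonempty Convex Cones
$C^{\circ\circ} = C$ if $C$ is closed.
$C + D \overset{d}{\longleftrightarrow} C^\circ \cap D^\circ$.
$MC \overset{d}{\longleftrightarrow} C^\circ M^\top$ if $M$ is a linear function.
B.1.3 Subspaces
$L^{\perp\perp} = L$.
$K + L \overset{d}{\longleftrightarrow} K^\perp \cap L^\perp$.
$ML \overset{d}{\longleftrightarrow} L^\perp M^\top$ if $M$ is a linear function.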
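B.1.4 Convex Functions
$f^{**} = f$ if $f$ is closed.
$f + g \overset{d}{\longleftrightarrow} f^* \oplus g^*$.
$f \vee g \overset{d}{\longleftrightarrow} f^* \,\mathrm{co}\!\wedge g^*$.
$Mg \overset{d}{\longleftrightarrow} g^* M^\top$.
B.1.5 Proper Sublinear Functions p, q and Nonempty Convex Sets A
$\partial s_A = A$ if $A$ is closed.
$s_{\partial p} = p$ if $p$ is closed.
$\partial(p + q) = \mathrm{cl}(\partial p + \partial q)$.
$\partial(p \vee q) = \mathrm{cl}((\partial p) \,\mathrm{co}\!\cup (\partial q))$.
$\partial(p \oplus q) = \mathrm{cl}(\partial p \cap \partial q)$.
$\partial(p \# q) = \mathrm{cl}(\partial p \# \partial q)$.
B.1.6 Subdifferentials of Convex Functions
$\partial(f + g)(x) = \mathrm{cl}(\partial f(x) + \partial g(x))$.
$\partial(f \vee g)(x) = \mathrm{cl}(\partial f(x) \,\mathrm{co}\!\cup \partial g(x))$ if $f(x) = g(x)$.
B.1.7 Norms
$N^{**} = N$.
$N_1 \oplus N_2 \overset{d}{\longleftrightarrow} N_1^* + N_2^*$.
$N_1 \,\mathrm{co}\!\wedge N_2 \overset{d}{\longleftrightarrow} N_1^* \vee N_2^*$.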
Index
Symbols BX = Bn closed standard unit ball in X = Rn , 17 C 1 function, 180 L⊥ orthogonal complement, 13 SX = Sn standard unit sphere in X = Rn , 15 R = [0, ∞), 19 ri(A) relative interior convex set A, 67 X diagonal map, 12 +X sum map, 12 lp norm, 94 pnorm, 94
A Affine function linear plus constant, 39 Affine hull, 67 Affinely independent, 34 Affine subspace, 67, 175, 238, 240 Angle, 74
B Barrier methods, 238 Basis, 55 Biconjugate function, 156 Bilinear mapping, 99 Binary operation for convex cones intersection, 56 Minkowski sum, 56 Binary operation for convex functions convex hull of the minimum, 139, 144
infimal convolution, 139, 144 Kelley’s sum, 139 maximum, 139, 143 sum, 139, 143 Binary operation for convex sets convex hull of the union, 60 intersection, 60 inverse sum, 61 Minkowski sum, 60 systematic construction, 60 Bipolar cone, 97 Bipolar set, 96 Blaschke’s theorem, 38 Boundary point, 66 Bounded set, 38, 66
C Calculus rules convex cones, 103, 104 convex functions, 157 convex sets, 161 convex sets containing the origin, 103 sublinear functions, 161 Carathéodory’s theorem, 40 Centerpoint theorem, 39 Certificate for insolubility, 81 unboundedness, 81 Child drawing, 112, 120 Closed convex function, 133 Closed halfspace, 88 Closed set, 38, 66 Closure, 66
Coercive, 177 Compactify, xvii Compact set, 38, 66 Concave function, 146, 154 Cone, 9, 181 Conic combination, 29 Conic hull, 27 Conic problem, 175 Conification method, 23 Conjugate function, 154 application, 166, 167 concave, 220 Continuous function, 41, 67 Continuously differentiable, 180 Convergent, 66 Convex combination, 28 Convex cone, 10 preference relation, 43 spacetime, 43 Convex function, 125 closed, 125 epigraph, 129 (im)proper, 128, 130 lower semicontinuous, 125 strict epigraph, 130 Convex hull, 27 Convex hull of the union convex sets, 60 Convex optimization problem, 174 Convex programming problem, 175, 210 Convex set, 7 dual description, 29 fair bargain, 5 primal description, 28 shape, 77 Coordinate-free crash course, 54
D Deconification convex set, 19 Dehomogenization convex set, 19 Derivative, 180 Diagonal map, 56 Differentiable, 180 Distance between two rays, 14 Dot product, 14 Dual cone, 97 Dual description convex cone, 97 convex function, 156
convex set, 87, 89, 160 convex set containing the origin, 96 sublinear function, 160 Duality the essence, 98 proof: ball throwing, 90 proof: Hahn–Banach, 115 proof: shortest distance, 114 Duality convex sets Black–Scholes, 88, 111 certificates of insolubility, 88 child drawing, 86, 112 Hahn–Banach, 94 manufacturing, 87 price mechanism, 113 separation theorem, 93 supporting hyperplane theorem, 92 theorems of the alternative, 108 Duality gap, 208 Duality operator conjugate function, 154 polar cone, 96 polar set, 95 subdifferential sublinear function, 160 support function, 160 Duality operator for convex cone, 96 convex function, 154 convex set, 160 convex set containing the origin, 95 sublinear function, 160 Duality scheme, 209 Duality theory illustration, 229, 234 oldest example, 229 Duality theory LP coordinate free, 234, 237 Dual norm, 94, 117 Dual problem, 206 saddle function, 218 Dual space normed space, 117 Dubovitskii–Milyutin theorem, 165
E Effective domain, 130 Epigraph, 128 Euclidean norm, 14 European call option, 111 Extreme point, 78 Extreme recession direction, 78
F Farkas' lemma, 110 Fenchel, xx Fenchel duality, 221 Fenchel–Moreau theorem, 157 Fermat's theorem (convex), 185 Fermat theorem, 182 Finite dimensional vector space, 54 First orthant, 26 Fourier–Motzkin elimination, 81 Four step method, 183 Functional analysis, 93
G Gauss, 74 Geodesic on standard unit sphere, 15 Geodesically convex set in standard unit sphere, 15 Golden convex cone first orthant, 26 Lorentz cone, 26 positive semidefinite cone, 26
H Hadamard’s inequality, 147 Helly’s theorem, 35 Hemisphere parametrization, 74 Hemisphere model convex cone, 16 convex set, 24 Hessian, 134 Hilbert projection theorem, 115 Homogenization, vii convex set, 18 Homogenization method, 23, 103 binary operations convex functions, 143 binary operations convex sets, 57, 60, 144 calculus convex functions, 158 subdifferential, 164 calculus sublinear functions, 161 Carathéodory’s theorem, 40 duality for convex functions, 157 finite convex combinations, 29 Helly’s theorem, 36 organization of proofs, 29 polar set, 99, 102 properties subdifferential convex functions, 164 Radon’s theorem, 33
subdifferential, 161 support function, 161 Hyperplane, 78, 88
I Ice cream cone, 26 Iff, 8 Image convex function, 138 Indicator function, 128 In duality convex cones, 103 convex sets containing the origin, 104 polyhedral sets, 107 Infimum, 129 Inner product, 55 Inner product space, 55 Interior, 66 Interior point, 66 Interior point methods, 238 Intersection convex cones, 56 convex sets, 60 Inverse image convex function, 138 Inverse sum convex sets, 61 Involution, 94
J Jensen’s inequality, 127 Jung’s theorem, 38
K Karush–Kuhn–Tucker theorem, 210 Kernel, 79 KKT in usual form, 210 Krasnoselski’s theorem, 39
L Lagrange function, 194, 210 Lagrange multipliers (convex), 192 Least squares, 225 Least squares problem, 176 Limit of a function at a point, 66 Limit of a sequence, 66 Lineality space of a convex cone, 48 of a convex set, 80 Linear function, 55 Linear programming, 175 Linear programming problem coordinate free, 235 Local minimum, 181, 184 Lorentz cone, 26 Lower semicontinuous convex function, 133
M Martingale, 111 Maximin, 217 Median, 39 Minimax, 217 Minimax theorem illustration, 228 Minimax theorem of von Neumann, 219 Minkowski function, 126 Minkowski sum convex cones, 56 convex sets, 60 Minkowski-Weyl representation, 106 Moreau-Rockafellar theorem, 165 Most likely matrix RAS method, 228
N Nash bargaining, 5 Nondegenerate bilinear mapping, 99 Norm, 94 Normed space, 94 Norm of a linear function, 94
O Open, 66 Open ball, 66 Open ray, 9, 14 Optimality conditions duality theory, 206 Fenchel duality, 220 KKT in subdifferential form, 213 KKT in usual form, 210 minimax and saddle points, 217 Optimal solution, 174 Optimal value, 174 Optimal value function, 192 Orthogonal complement, 13 of an affine subspace, 238, 240 Orthonormal basis, 55
P Pair in duality convex functions, 157 Penalty kick, 228 Perspective function, 47 Perturbation function, 189 Perturbed problem, 189 Platonic solids, 120 Pointed convex cone, 43 Points in the middle, 32 Polar cone, 96 with respect to a bilinear mapping, 100 Polar set, 94, 95 geometric construction, 103 homogenization, 102 homogenization method, 99 Polygon, 8 Polyhedral cone, 42 Polyhedral set, 42, 106 Pontryagin’s Maximum Principle, 46 Positive homogeneous, 9 Positive homogeneous function, 10 Positive semidefinite cone, 26 Positive semidefinite matrix, 26 Primal-dual LP standard form, 237 symmetric form, 237 Primal-dual problems conic form, 237 Primal problem, 206 saddle function, 218 Projection point on convex set, 115 Proper convex function Jensen’s inequality, 127 Protasov, 213
Q Quadratic programming, 175
R Radon point, 32 Radon’s theorem, 32 Ray model convex cone, 14 convex set, 24 Recession cone convex function, 144 convex set, 20 Recession direction, 20 Recession vector, 20 Relative boundary convex set, 67 Relative interior convex set, 67 Relatively open convex set, 67 Rockafellar, xx
S Saddle function, 217 Saddle point, 218 Saddle value, 218 Second welfare theorem in praise of price mechanisms, 230 Self-dual convex cone, 97 Semidefinite programming, 175 Separated by a hyperplane, 92 Separation theorem, 92 Shapley–Folkman lemma, 44 Shortest distance problem, 176 Slater condition, 193, 210 Slater point, 193, 210 Smooth local minimum, 181 Spaces in duality, 99 Sphere model, 15 convex cone, 15 Standard closed unit ball, 17 Standard unit sphere, 15 Strict epigraph, 130 Strictly convex, 134 Strictly convex set, 178 Strict sublevel set, 130 Strong duality, 208 Subdifferential convex function, 163 sublinear function, 160 Subgradient convex function, 162 Sublevel set, 130, 144 Sublinear function, 159 Subspace, 11, 12 Sum map, 56 Support function, 160 Supporting halfspace, 92 Supporting hyperplane, 92 Supporting hyperplane theorem, 92 Supporting line, 38 Surprising bijection, 238
T Tangent cone, 164, 181 Tangent space, 182 Tangent space theorem, 181 Tangent vector, 181 Theorem of Chebyshev for affine functions, 39 Theorem of Hahn–Banach, 93, 115 Theorem of Moreau, 115 Theorem of the alternative, 108 Gale, 110 Gordan, 120 Ky Fan, 109 Minkowski, Farkas, 109 Theorem of Weierstrass, 67 Topological notion, 66 Top-view model convex cone, 17 convex set, 25 Transpose of a linear function, 55
U Unbounded, 66 Uniqueness convex optimization, 177
W Waist of three lines, 178 Weak duality, 208 Weyl’s theorem, 43 Width, 38 Wlog, 36