An Easy Path to Convex Analysis and Applications [2 ed.] 3031264576, 9783031264573

This book examines the most fundamental parts of convex analysis and its applications to optimization and location problems.


English Pages 312 [313] Year 2023


Table of contents :
Preface to the Second Edition
Preface to the First Edition
Acknowledgments
Contents
Glossary of Notation and Acronyms
Operations and Symbols
Spaces
Sets
Functions
Set-Valued Mappings
Acronyms
List of Figures
1 Convex Sets and Functions
1.1 Preliminaries
1.2 Convex Sets
1.3 Convex Functions
1.4 Continuity and Lipschitz Continuity of Convex Functions
1.5 The Distance Function
1.6 Convex Set-Valued Mappings and Optimal Value Functions
1.7 Exercises for This Chapter
2 Convex Separation and Some Consequences
2.1 Affine Hulls and Relative Interiors of Convex Sets
2.2 Strict Separation of Convex Sets
2.3 Separation and Proper Separation Theorems
2.4 Relative Interiors of Convex Graphs
2.5 Normal Cones to Convex Sets
2.6 Exercises for This Chapter
3 Convex Generalized Differentiation
3.1 Subgradients of Convex Functions
3.2 Subdifferential Calculus Rules
3.3 Mean Value Theorems
3.4 Coderivative Calculus
3.5 Subgradients of Optimal Value Functions
3.6 Generalized Differentiation for Polyhedral Convex Functions and Multifunctions
3.7 Exercises for This Chapter
4 Fenchel Conjugate and Further Topics in Subdifferentiation
4.1 Fenchel Conjugates
4.2 Support Functions and Support Intersection Rules
4.3 Conjugate Calculus Rules
4.4 Directional Derivatives
4.5 Subgradients of Supremum Functions
4.6 Exercises for This Chapter
5 Remarkable Consequences of Convexity
5.1 Characterizations of Differentiability
5.2 Carathéodory Theorem
5.3 Farkas Lemma
5.4 Radon Theorem and Helly Theorem
5.5 Extreme Points and Minkowski Theorem
5.6 Tangents to Convex Sets
5.7 Exercises for This Chapter
6 Minimal Time Functions and Related Issues
6.1 Horizon Cones
6.2 Minimal Time Functions and Their Properties
6.3 Minkowski Gauge Functions
6.4 Subgradients of Minimal Time Functions with Bounded Dynamics
6.5 Subgradients of Minimal Time Functions with Unbounded Dynamics
6.6 Exercises for This Chapter
7 Applications to Problems of Optimization and Equilibrium
7.1 Lower Semicontinuity and Existence of Minimizers
7.2 Optimality Conditions in Convex Minimization
7.3 Fenchel Duality in Optimization
7.4 Lagrangian Duality in Convex Optimization
7.5 Subgradient Methods in Nonsmooth Convex Optimization
7.6 Nash Equilibrium via Convex Analysis
7.7 Exercises for This Chapter
8 Applications to Location Problems
8.1 The Fermat-Torricelli Problem
8.2 The Weiszfeld Algorithm
8.3 Generalized Fermat-Torricelli Problems
8.4 Solving Planar Fermat-Torricelli Problems for Euclidean Balls
8.5 Generalized Sylvester Problems
8.6 Solving Planar Smallest Intersection Ball Problems
8.7 Exercises for This Chapter
A Solutions and Hints for Selected Exercises
References
Index


Synthesis Lectures on Mathematics & Statistics

Boris Mordukhovich Nguyen Mau Nam

An Easy Path to Convex Analysis and Applications Second Edition

Synthesis Lectures on Mathematics & Statistics Series Editor Steven G. Krantz, Department of Mathematics, Washington University, Saint Louis, MO, USA

This series includes titles in applied mathematics and statistics for cross-disciplinary STEM professionals, educators, researchers, and students. The series focuses on new and traditional techniques to develop mathematical knowledge and skills, an understanding of core mathematical reasoning, and the ability to utilize data in specific applications.

Boris Mordukhovich · Nguyen Mau Nam

An Easy Path to Convex Analysis and Applications Second Edition

Boris Mordukhovich Wayne State University Detroit, MI, USA

Nguyen Mau Nam Department of Mathematics and Sciences Portland State University Portland, OR, USA

ISSN 1938-1743 ISSN 1938-1751 (electronic) Synthesis Lectures on Mathematics & Statistics ISBN 978-3-031-26457-3 ISBN 978-3-031-26458-0 (eBook) https://doi.org/10.1007/978-3-031-26458-0 1st edition: © Morgan & Claypool Publishers 2014 2nd edition: © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

To the Loving Memory of Our Parents

Preface to the Second Edition

Seven years have passed since the first edition of this book was published. During these years, the book has been used for teaching undergraduate and graduate classes in our universities and in many other institutions worldwide. Moreover, the developed applications of convex analysis and optimization to facility location, machine learning, and data science have attracted serious attention of researchers and practitioners working in these fascinating areas. Our teaching experience, as well as that of our colleagues in different institutions, calls for updating and extending the first edition of the book, which is now presented to the reader’s attention. The second edition also benefits from recent developments in convex and variational analysis with their applications to optimization and related topics. We particularly mention the books “Convex Analysis and Monotone Operator Theory in Hilbert Spaces” by Bauschke and Combettes [1], “Lectures on Convex Optimization” by Nesterov [28], “Variational Analysis and Applications” by the first author [16], and our book “Convex Analysis and Beyond” [18]. In comparison with the first edition with four chapters, the second edition contains eight chapters. Among the most important additional topics in the second edition, we mention conjugate calculus, convex generalized differentiation of polyhedral functions and multifunctions, extreme points and the Minkowski theorem, Fenchel and Lagrangian duality in optimization, etc. We added new exercises to each chapter with selected solutions/hints presented in a separate chapter. New figures are also added to the second edition. The book is fully written in finite-dimensional spaces, which makes it accessible for teaching undergraduate and lower graduate classes. Some historical remarks and commentaries are given in the text. More commentaries for basic theoretical aspects of convex analysis can be found in the recent monograph [18], which contains advanced material in infinite-dimensional spaces.


Besides its use in teaching classes on convex analysis and optimization, this book should be of interest to doctoral students, researchers, and practitioners in the areas of convex and variational analysis, optimization-related problems, facility location, and their applications.

Ann Arbor, MI, USA
Portland, OR, USA
December 2022

Boris Mordukhovich Nguyen Mau Nam

Preface to the First Edition

Some geometric properties of convex sets and, to a lesser extent, convex functions had been studied before the 1960s by many outstanding mathematicians, first of all by Hermann Minkowski and Werner Fenchel. At the beginning of the 1960s, convex analysis was greatly developed in the works of R. Tyrrell Rockafellar and Jean-Jacques Moreau, who initiated a systematic study of this new field. As a fundamental part of variational analysis, convex analysis contains a generalized differentiation theory that can be used to study a large class of mathematical models with no differentiability assumptions on their initial data. The importance of convex analysis has been well recognized for many applications, among which convex optimization is the first to name. The presence of convexity makes it possible not only to comprehensively investigate qualitative properties of optimal solutions and derive necessary and sufficient conditions for optimality but also to develop effective numerical algorithms to solve convex optimization problems, even with nondifferentiable data. Convex analysis and optimization have an increasing impact on many areas of mathematics and applications including control systems, estimation and signal processing, communications and networks, electronic circuit design, data analysis and modeling, statistics, economics and finance, etc.

There are several fundamental books devoted to different aspects of convex analysis and optimization. Among them we mention "Convex Analysis" by Rockafellar [30], "Convex Analysis and Nonlinear Optimization" by Borwein and Lewis [3], "Conjugate Duality in Convex Optimization" by Boţ [5], "Convex Optimization" by Boyd and Vandenberghe [6], "Convex Analysis and Minimization Algorithms" by Hiriart-Urruty and Lemaréchal [9] and its abridged version [10], "Introductory Lectures on Convex Optimization" by Nesterov [26], "Convex Functions, Monotone Operators and Differentiability" by Phelps, "Convex Analysis in General Vector Spaces" by Zălinescu [38], as well as other books listed in the bibliography below.

In this big picture of convex analysis and optimization, our book serves as a bridge for students and researchers who have just started using convex analysis to reach deeper topics in the field. We give detailed proofs for most of the results presented in the book and also include many figures and exercises for a better understanding of the material. The powerful geometric approach developed in modern variational analysis is adopted and simplified in the convex case in order to provide the reader with an easy path to access generalized differentiation of convex objects in finite dimensions.


In this way, the book also serves as a starting point for the interested reader to continue the study of nonconvex variational analysis and applications. It can be of interest from this viewpoint to experts in convex and variational analysis. Finally, the application part of this book not only concerns the classical topics of convex optimization related to optimality conditions and subgradient algorithms but also presents some recent yet easily understandable qualitative and numerical results for important location problems.

The book consists of four chapters and is organized as follows. In Chapter 1, we study fundamental properties of convex sets and functions while paying particular attention to classes of convex functions important in optimization. Chapter 2 is mainly devoted to developing basic calculus rules for normals to convex sets and subgradients of convex functions that are in the mainstream of convex theory. Chapter 3 concerns some additional topics of convex analysis that are largely used in applications. Chapter 4 is fully devoted to applications of basic results of convex analysis to problems of convex optimization and selected location problems considered from both qualitative and numerical viewpoints. Finally, we present at the end of the book Solutions and Hints to selected exercises. Exercises are given at the end of each chapter, while figures and examples are provided throughout the whole text. The list of references contains books and selected papers that are closely related to the topics considered in the book and may be helpful to the reader for advanced studies of convex analysis, its applications, and further extensions.

Since only elementary knowledge of linear algebra and basic calculus is required, this book can be used as a textbook for both undergraduate- and graduate-level courses in convex analysis and its applications. In fact, the authors have used these lecture notes for teaching such classes in their universities as well as while visiting some other schools. We hope that the book will make convex analysis more accessible to large groups of undergraduate and graduate students, researchers in different disciplines, and practitioners. The first author acknowledges continued support by grants from the National Science Foundation, and the second author acknowledges support by the Simons Foundation.

Ann Arbor, Michigan and Portland, Oregon
November 2013

Boris Mordukhovich Nguyen Mau Nam

Acknowledgments

Our first thanks go to the many students who attended our classes on convex analysis and optimization based on the first edition of this book. Their active participation, interest, and criticism of some material presented in the first edition played a crucial role in the writing of the second edition. We are also grateful to our collaborators on the papers whose results are used in the book, particularly in applications to facility location problems. Over the years, various aspects of the book have been discussed with our colleagues and friends. We especially mention Heinz Bauschke, Dimitri Bertsekas, Jon Borwein, Radu Boţ, René Henrion, Jean-Baptiste Hiriart-Urruty, Marco López, Juan Enrique Martínez-Legaz, Yurii Nesterov, Michael Overton, Diethard Pallaschke, Boris Polyak, Terry Rockafellar, Stephen Simons, Christiane Tammer, Nguyen Dong Yen, and Constantin Zălinescu. Our deep gratitude goes to all of them. Finally, we are grateful to the National Science Foundation for its support during this project.

Boris Mordukhovich
Nguyen Mau Nam


Contents

1 Convex Sets and Functions  1
  1.1 Preliminaries  1
  1.2 Convex Sets  7
  1.3 Convex Functions  14
  1.4 Continuity and Lipschitz Continuity of Convex Functions  25
  1.5 The Distance Function  28
  1.6 Convex Set-Valued Mappings and Optimal Value Functions  33
  1.7 Exercises for This Chapter  40
2 Convex Separation and Some Consequences  45
  2.1 Affine Hulls and Relative Interiors of Convex Sets  45
  2.2 Strict Separation of Convex Sets  58
  2.3 Separation and Proper Separation Theorems  61
  2.4 Relative Interiors of Convex Graphs  67
  2.5 Normal Cones to Convex Sets  70
  2.6 Exercises for This Chapter  77
3 Convex Generalized Differentiation  81
  3.1 Subgradients of Convex Functions  81
  3.2 Subdifferential Calculus Rules  91
  3.3 Mean Value Theorems  98
  3.4 Coderivative Calculus  100
  3.5 Subgradients of Optimal Value Functions  106
  3.6 Generalized Differentiation for Polyhedral Convex Functions and Multifunctions  109
  3.7 Exercises for This Chapter  116
4 Fenchel Conjugate and Further Topics in Subdifferentiation  121
  4.1 Fenchel Conjugates  121
  4.2 Support Functions and Support Intersection Rules  127
  4.3 Conjugate Calculus Rules  132
  4.4 Directional Derivatives  137
  4.5 Subgradients of Supremum Functions  142
  4.6 Exercises for This Chapter  147
5 Remarkable Consequences of Convexity  151
  5.1 Characterizations of Differentiability  151
  5.2 Carathéodory Theorem  153
  5.3 Farkas Lemma  156
  5.4 Radon Theorem and Helly Theorem  159
  5.5 Extreme Points and Minkowski Theorem  160
  5.6 Tangents to Convex Sets  163
  5.7 Exercises for This Chapter  167
6 Minimal Time Functions and Related Issues  171
  6.1 Horizon Cones  171
  6.2 Minimal Time Functions and Their Properties  174
  6.3 Minkowski Gauge Functions  179
  6.4 Subgradients of Minimal Time Functions with Bounded Dynamics  183
  6.5 Subgradients of Minimal Time Functions with Unbounded Dynamics  185
  6.6 Exercises for This Chapter  187
7 Applications to Problems of Optimization and Equilibrium  191
  7.1 Lower Semicontinuity and Existence of Minimizers  191
  7.2 Optimality Conditions in Convex Minimization  198
  7.3 Fenchel Duality in Optimization  204
  7.4 Lagrangian Duality in Convex Optimization  209
  7.5 Subgradient Methods in Nonsmooth Convex Optimization  222
  7.6 Nash Equilibrium via Convex Analysis  229
  7.7 Exercises for This Chapter  233
8 Applications to Location Problems  237
  8.1 The Fermat-Torricelli Problem  237
  8.2 The Weiszfeld Algorithm  242
  8.3 Generalized Fermat-Torricelli Problems  246
  8.4 Solving Planar Fermat-Torricelli Problems for Euclidean Balls  256
  8.5 Generalized Sylvester Problems  262
  8.6 Solving Planar Smallest Intersection Ball Problems  269
  8.7 Exercises for This Chapter  274
Solutions and Hints for Selected Exercises  277
References  295
Index  297

Glossary of Notation and Acronyms

Operations and Symbols

:=  Equal by definition
≤  Less than or equal to componentwise
≥  Greater than or equal to componentwise
A^T  Transpose of matrix A
⟨·, ·⟩  Inner product
‖·‖  Euclidean norm
x → x̄  x converges to x̄
lim inf  Lower limit for real numbers
lim sup  Upper limit for real numbers
dim  Dimension
Ω1 × Ω2  Cartesian product of Ω1 and Ω2
Ω1 ⊂ Ω2  Ω1 is included in or is equal to Ω2
f1 □ f2  Infimal convolution of two functions
□  End of proof

Spaces

R := (−∞, ∞)  Real line
R+  Collection of nonnegative numbers
R̄ := (−∞, ∞]  Extended real line
Rn  n-dimensional Euclidean space
Rn+ and Rn−  Nonnegative and nonpositive orthants of Rn

Sets

∅  Empty set
N  Set of natural numbers
Ωc  Complement of Ω
Ω°  Polar set of Ω
Ωr  r-expansion of Ω
B(x; r)  Open ball centered at x with radius r with respect to the Euclidean norm
B̄(x; r)  Closed ball centered at x with radius r with respect to the Euclidean norm
B̄  Closed unit ball with respect to the Euclidean norm
int Ω or int(Ω)  Interior of Ω
ri Ω or ri(Ω)  Relative interior of Ω
Ω̄ or cl Ω  Closure of Ω
bd Ω  Boundary of Ω
co Ω or co(Ω)  Convex hull of Ω
c̄o Ω  Closure of co Ω
cone Ω or cone(Ω)  Conic hull of Ω
KΩ  Convex cone generated by Ω
Ω∞  Horizon/asymptotic cone of the set Ω
aff Ω or aff(Ω)  Affine hull of Ω
Π(x; Ω) = ΠΩ(x)  Euclidean projection from x to Ω
ΠF(x; Ω) = ΠΩF(x)  Generalized projection from x to Ω
N(x̄; Ω)  Normal cone to a convex set Ω at x̄
T(x̄; Ω)  Tangent/contingent cone to a convex set Ω at x̄

Functions

δ(·; Ω) = δΩ(·)  Indicator function of Ω
σ(·; Ω) = σΩ(·)  Support function of Ω
d(·; Ω) = dΩ(·)  Distance function for Ω
ρΩ  Minkowski gauge function associated with the set Ω
T_F(·; Ω) = T_FΩ  Minimal time function with dynamics F and target Ω
dom f or dom(f)  Domain of f : Rn → R̄
epi f or epi(f)  Epigraph of f
hypo f or hypo(f)  Hypograph of f
f* and f**  Fenchel conjugate and biconjugate of f
f′+(x) and f′−(x)  Right and left derivatives of f : R → R at x
f′(x̄) = f′_F(x̄) = ∇f(x̄)  Fréchet derivative/gradient of f at x̄
f′_G(x̄)  Gâteaux derivative of f at x̄
f′(x̄; v)  Directional derivative of f at x̄ in direction v
∂f(x̄)  Subdifferential of f at x̄
∂∞f(x̄)  Singular/horizon subdifferential of f at x̄
Lα(f)  Sublevel set of f

Set-Valued Mappings

F : Rn ⇒ Rp  Set-valued mapping/multifunction from Rn to Rp
dom F or dom(F)  Domain of F
rge F or rge(F)  Range of F
gph F or gph(F)  Graph of F
ker F or ker(F)  Kernel of F
F⁻¹ : Rp ⇒ Rn  Inverse mapping of F : Rn ⇒ Rp
F(Ω) and F⁻¹(Ω)  Image and inverse image/preimage of Ω under F
F1 + F2  Sum of mappings
G ∘ F  Composition of mappings
E_f  Epigraphical multifunction associated with the function f
D*F(x̄, ȳ)  Coderivative of F at (x̄, ȳ) ∈ gph F

Acronyms

l.s.c.  Lower semicontinuous (functions)
u.s.c.  Upper semicontinuous (functions)

List of Figures

Fig. 1.1  Convex set and nonconvex set  7
Fig. 1.2  Nonconvex set and its convex hull  12
Fig. 1.3  Convex function and nonconvex function  16
Fig. 1.4  Epigraphs of convex function and nonconvex function  19
Fig. 1.5  Distance function  29
Fig. 1.6  Euclidean projection  32
Fig. 1.7  Graph of set-valued mapping  34
Fig. 2.1  Illustration of convex separation  58
Fig. 2.2  Geometric illustration of Lemma 2.37  61
Fig. 2.3  Epigraph of the absolute value function and the normal cone to it at 0  71
Fig. 2.4  Normal cone to set intersection  74
Fig. 3.1  Subdifferential of the absolute value function at the origin  82
Fig. 3.2  Singular subgradients  90
Fig. 3.3  Subdifferential mean value theorem  99
Fig. 4.1  Fenchel conjugate  122
Fig. 5.1  Convex cone generated by a set  154
Fig. 5.2  Carathéodory theorem  156
Fig. 5.3  Tangent cone and normal cone  165
Fig. 6.1  Horizon cone  172
Fig. 6.2  Minimal time functions  175
Fig. 7.1  Which function is l.s.c.?  192
Fig. 8.1  Construction of the Torricelli point  241
Fig. 8.2  Three sets that satisfy assumptions of Proposition 8.16  249
Fig. 8.3  Minimal time function with square dynamics  255
Fig. 8.4  A generalized Fermat-Torricelli problem with |I(x)| = 2  259
Fig. 8.5  A generalized Fermat-Torricelli problem with infinite solutions  262
Fig. 8.6  Generalized Fermat-Torricelli problem for three balls  262
Fig. 8.7  Minimal time function for strictly convex sets  265
Fig. 8.8  Smallest intersecting ball problem for three balls  270
Fig. 8.9  Euclidean projection to a square  272
Fig. 8.10 Minimal time function with diamond-shape dynamics  273

1 Convex Sets and Functions

This chapter presents definitions, examples, and basic properties of convex sets and functions in the finite-dimensional Euclidean space Rn and also contains some related material.

1.1 Preliminaries

We start with reviewing classical notions and properties of the Euclidean space Rn. The proofs of the results presented in this section can be found in standard books on mathematical analysis and linear algebra. Denote by Rn the set of all n-tuples of real numbers x = (x1, ..., xn). Then Rn is a linear space with the following operations:

(x1, ..., xn) + (y1, ..., yn) = (x1 + y1, ..., xn + yn),
λ(x1, ..., xn) = (λx1, ..., λxn),

where (x1, ..., xn), (y1, ..., yn) ∈ Rn and λ ∈ R. The zero element of Rn and the number zero of the real line R are often denoted by the same notation 0 if no confusion arises. Given any x = (x1, ..., xn) ∈ Rn, we identify it with the column vector x = [x1, ..., xn]^T, where the symbol "T" stands for vector transposition. For any x = (x1, ..., xn) ∈ Rn and y = (y1, ..., yn) ∈ Rn, the inner product of x and y is defined by

⟨x, y⟩ := ∑_{i=1}^{n} xi yi.

The following proposition lists some important properties of the inner product in the space Rn.

Proposition 1.1 For any x, y, z ∈ Rn and λ ∈ R, we have:

(i) ⟨x, x⟩ ≥ 0, and ⟨x, x⟩ = 0 if and only if x = 0.
(ii) ⟨x, y⟩ = ⟨y, x⟩.
(iii) ⟨λx, y⟩ = λ⟨x, y⟩.
(iv) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩.

The Euclidean norm of x = (x1, ..., xn) ∈ Rn is defined by

‖x‖ := √(x1² + · · · + xn²).

It follows directly from the definition that ‖x‖ = √⟨x, x⟩.

Proposition 1.2 For any x, y ∈ Rn and λ ∈ R, we have:

(i) ‖x‖ ≥ 0, and ‖x‖ = 0 if and only if x = 0.
(ii) ‖λx‖ = |λ| · ‖x‖.
(iii) ‖x + y‖ ≤ ‖x‖ + ‖y‖ (the triangle inequality).
(iv) |⟨x, y⟩| ≤ ‖x‖ · ‖y‖ (the Cauchy-Schwarz inequality).

Using the Euclidean norm allows us to introduce the notions of open and closed balls, which can be used to define topological notions in Rn.

Definition 1.3 Let x̄ ∈ Rn, and let r be a positive real number.

(i) The open ball centered at x̄ with radius r and the open unit ball of Rn are defined, respectively, by

B(x̄; r) := {x ∈ Rn | ‖x − x̄‖ < r} and B := {x ∈ Rn | ‖x‖ < 1}.

(ii) The closed ball centered at x̄ with radius r and the closed unit ball of Rn are defined, respectively, by

B̄(x̄; r) := {x ∈ Rn | ‖x − x̄‖ ≤ r} and B̄ := {x ∈ Rn | ‖x‖ ≤ 1}.

It is easy to see that B = B(0; 1) and B̄ = B̄(0; 1).
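To make the norm properties of Proposition 1.2 concrete, here is a small numerical check (an illustrative sketch, not part of the book) that samples random vectors with NumPy and confirms absolute homogeneity, the triangle inequality, and the Cauchy-Schwarz inequality.

```python
import numpy as np

rng = np.random.default_rng(0)

for _ in range(1000):
    n = rng.integers(1, 10)
    x, y = rng.normal(size=n), rng.normal(size=n)
    lam = rng.normal()

    # (ii) absolute homogeneity: ||lam * x|| = |lam| * ||x||
    assert np.isclose(np.linalg.norm(lam * x), abs(lam) * np.linalg.norm(x))
    # (iii) triangle inequality: ||x + y|| <= ||x|| + ||y||
    assert np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y) + 1e-12
    # (iv) Cauchy-Schwarz: |<x, y>| <= ||x|| * ||y||
    assert abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y) + 1e-12

print("Proposition 1.2 (ii)-(iv) hold on all sampled vectors")
```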


Definition 1.4 Let Ω ⊂ Rn, and let x ∈ Rn. Then x is an interior point of Ω if there is δ > 0 such that B(x; δ) ⊂ Ω. The set of all interior points of Ω is denoted by int Ω. Moreover, Ω is said to be open if every point of Ω is an interior point.

We get that Ω is an open set if and only if for every x ∈ Ω there exists δ > 0 such that B(x; δ) ⊂ Ω. Obviously, the empty set ∅ and the whole space Rn are open. Furthermore, any open ball is an open set.

Definition 1.5 A set Ω ⊂ Rn is closed if its complement Ωc := Rn \ Ω is an open set in Rn.

It follows from the definition that the empty set, the whole space, and any closed ball B̄(x̄; r) are closed subsets of Rn.

Proposition 1.6 We have the following assertions:

(i) The union of any collection of open sets in Rn is open.
(ii) The intersection of any finite collection of open sets in Rn is open.
(iii) The intersection of any collection of closed sets in Rn is closed.
(iv) The union of any finite collection of closed sets in Rn is closed.

Definition 1.7 Let {xk} be a sequence in Rn, and let x ∈ Rn. We say that {xk} converges to x if ‖xk − x‖ → 0 as k → ∞. In this case, we write lim_{k→∞} xk = x.

This notion allows us to define the following important topological concepts for sets.

Definition 1.8 Let Ω be a subset of Rn. Then

(i) The closure of Ω, denoted by Ω̄ or cl Ω, is the collection of limits of all convergent sequences belonging to Ω.
(ii) The boundary of Ω, denoted by bd Ω, is the set Ω̄ \ int Ω.

We can see that the closure of Ω is the intersection of all closed sets containing Ω, and that the interior of Ω is the union of all open sets contained in Ω. It follows from the definition that x ∈ Ω̄ if and only if for any δ > 0 we have B(x; δ) ∩ Ω ≠ ∅. Furthermore, x ∈ bd Ω if and only if for any δ > 0 the ball B(x; δ) intersects both sets Ω and its complement Ωc.


Definition 1.9 Let {xk} be a sequence in Rn, and let {kl} be a strictly increasing sequence of positive integers. Then the new sequence {xkl} is called a subsequence of {xk}.

We say that a set Ω is bounded if it is contained in a ball centered at the origin with some radius r > 0, i.e., Ω ⊂ B(0; r). A sequence {xk} is bounded if the set {xk | k ∈ N} is bounded. Thus a sequence {xk} is bounded if there exists r > 0 such that ‖xk‖ ≤ r for all k ∈ N. The following important result is known as the Bolzano-Weierstrass theorem.

Theorem 1.10 Any bounded sequence in Rn has a convergent subsequence.

Using subsequences and convergence, we can define the notion of compactness, which plays a very significant role in analysis and optimization.

Definition 1.11 We say that a set Ω is compact in Rn if every sequence in Ω contains a subsequence converging to some point in Ω.

The following result is a consequence of the Bolzano-Weierstrass theorem.

Theorem 1.12 A subset Ω of Rn is compact if and only if it is simultaneously closed and bounded.

For subsets Ω, Ω1, and Ω2 of Rn and for λ ∈ R, we define the operations

Ω1 + Ω2 := {x + y | x ∈ Ω1, y ∈ Ω2} and λΩ := {λx | x ∈ Ω}

(see the short code illustration below). Recall the notions of lower and upper bounds for subsets of the real line.

Definition 1.13 Let D be a subset of the real line. A number m ∈ R is a lower bound of D if we have m ≤ x for all x ∈ D. If the set D has a lower bound, then it is bounded below. Similarly, a number M ∈ R is an upper bound of D if x ≤ M for all x ∈ D, and D is bounded above if it has an upper bound. Furthermore, we say that the set D is bounded if it is simultaneously bounded below and above.
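As a quick illustration of the set addition and scalar multiplication defined above (a sketch with made-up finite sets, not from the book), the following code forms Ω1 + Ω2 and λΩ for small finite subsets of R².

```python
def set_sum(omega1, omega2):
    """Minkowski sum {x + y | x in omega1, y in omega2} for finite sets of points."""
    return {tuple(xi + yi for xi, yi in zip(x, y)) for x in omega1 for y in omega2}

def set_scale(lam, omega):
    """Scalar multiple {lam * x | x in omega} for a finite set of points."""
    return {tuple(lam * xi for xi in x) for x in omega}

omega1 = {(0.0, 0.0), (1.0, 0.0)}   # two points in R^2
omega2 = {(0.0, 0.0), (0.0, 1.0)}

print(set_sum(omega1, omega2))      # the four vertices of the unit square
print(set_scale(2.0, omega1))       # {(0.0, 0.0), (2.0, 0.0)}
```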


Now we are ready to define the concepts of infimum and supremum of subsets of the real line.

Definition 1.14 Let D ⊂ R be nonempty and bounded below. A real number α is a greatest lower bound of D if the following properties are satisfied:

(i) α is a lower bound of D.
(ii) If α′ is a lower bound of D, then α′ ≤ α.

It follows from the definition that a greatest lower bound of D (if it exists) is unique. The greatest lower bound of D is also called the infimum of D, denoted by inf D.

Definition 1.15 Let D ⊂ R be nonempty and bounded above. A real number β is a least upper bound of D if the following properties are satisfied:

(i) β is an upper bound of D.
(ii) If β′ is an upper bound of D, then β ≤ β′.

Similarly to the case of the greatest lower bound, a least upper bound of D (if it exists) is unique. The least upper bound of D is also called the supremum of D, denoted by sup D. The following fundamental axiom of real analysis ensures that these notions are well-defined.

Completeness Axiom. For every nonempty subset D of R that is bounded below, the greatest lower bound of D exists as a real number.

Using the Completeness Axiom, it is easy to see that if a nonempty set D of the real line is bounded above, then its least upper bound exists as a real number and sup D = − inf(−D), where −D := {−x | x ∈ D}. We consider the extended real number system, which consists of R and the two symbols −∞ and ∞. Along with the usual properties of R, the conventions of extended arithmetic and order are as follows:

a + ∞ = ∞, a + (−∞) = a − ∞ = −∞ for a ∈ R,
∞ + ∞ = ∞, (−∞) + (−∞) = −∞,
t · ∞ = ∞, t · (−∞) = −∞ for t ∈ (0, ∞),
t · ∞ = −∞, t · (−∞) = ∞ for t ∈ (−∞, 0),
0 · ∞ = 0 · (−∞) = 0,
−∞ < a < ∞ for a ∈ R.     (1.1)

In the extended real number system, −∞ is a lower bound of every subset, and every nonempty subset has a greatest lower bound. If D is nonempty and not bounded below, then inf D = −∞. We also use the convention that inf ∅ = ∞. Similarly, ∞ is an upper bound of every subset of the extended real number system, and every nonempty subset has a least upper bound. If D is nonempty and not bounded above, then sup D = ∞. We also use the convention that sup ∅ = −∞.

Throughout the entire book, we mostly consider for convenience extended-real-valued functions f : Rn → R̄, which take values in R̄ := (−∞, ∞] = R ∪ {∞}. This allows us to avoid undefined operations on the extended real number system such as −∞ + ∞. Whenever a letter such as ε, δ, or λ is used, we mean that it is a real number unless otherwise stated. However, f(x) may take the value ∞ when x ∈ Rn.

Definition 1.16 Let f : Ω → Θ be a function, where Ω is a nonempty subset of Rn and Θ is a nonempty subset of Rp. The function f is continuous at x̄ ∈ Ω if for any ε > 0 there exists δ > 0 such that ‖f(x) − f(x̄)‖ < ε whenever x ∈ B(x̄; δ) ∩ Ω. If f is continuous at every x ∈ Ω, then we say that it is continuous on Ω.

Next we formulate a fundamental result of mathematical analysis and optimization known as the Weierstrass existence theorem.

Theorem 1.17 Let f : Ω → R be a continuous function, where Ω is a nonempty compact subset of Rn. Then there exist points x̄ ∈ Ω and ū ∈ Ω, which realize the infimum/minimum and supremum/maximum, respectively:

f(x̄) = min{f(x) | x ∈ Ω} and f(ū) = max{f(x) | x ∈ Ω}.

Recall that a nonempty subset L of Rn is a linear subspace of Rn if x + y ∈ L and λx ∈ L whenever x, y ∈ L and λ ∈ R. Given a linear subspace L of Rn and a linear subspace W of Rp, a mapping A : L → W is linear if A(x + y) = A(x) + A(y) and A(λx) = λA(x) for all x, y ∈ L and λ ∈ R. Throughout the book, the notation Rp×n stands for the collection of all p × n matrices with real entries. If A ∈ Rp×n, its transpose is denoted by A^T. It follows from the definition that ⟨Ax, y⟩ = ⟨x, A^T y⟩ for all x ∈ Rn and y ∈ Rp. We also use the following symbols:

im A := {Ax | x ∈ Rn},  A⁻¹(y) := {x ∈ Rn | Ax = y} for y ∈ Rp,  ker A := A⁻¹(0).

1.2 Convex Sets

We start the study of convexity with sets and then proceed to functions. Geometric ideas play an underlying role in convex analysis, its extensions, and applications. Thus we implement the geometric approach in this book. Given two elements a and b in Rn, define the interval/line segment

[a, b] := {λa + (1 − λ)b | 0 ≤ λ ≤ 1}.

Besides [a, b], we also use other kinds of intervals, namely

(a, b) := {λa + (1 − λ)b | 0 < λ < 1},
[a, b) := {λa + (1 − λ)b | 0 < λ ≤ 1},
(a, b] := {λa + (1 − λ)b | 0 ≤ λ < 1}.

Note that if a = b, then these intervals reduce to the single-point set {a}.

Definition 1.18 A subset Ω of Rn is convex if [a, b] ⊂ Ω whenever a, b ∈ Ω. Equivalently, Ω is convex if λa + (1 − λ)b ∈ Ω for all a, b ∈ Ω and 0 ≤ λ ≤ 1 (Fig. 1.1).

Example 1.19 (i) The empty set ∅ and the whole space Rn are both convex.
(ii) A ball in Rn is a convex set. Indeed, consider the closed ball B̄(x̄; r), where x̄ ∈ Rn and r > 0. For any a, b ∈ B̄(x̄; r) and 0 ≤ λ ≤ 1 we have ‖a − x̄‖ ≤ r and ‖b − x̄‖ ≤ r, which imply

Fig. 1.1 Convex set and nonconvex set


‖λa + (1 − λ)b − x̄‖ = ‖λ(a − x̄) + (1 − λ)(b − x̄)‖ ≤ ‖λ(a − x̄)‖ + ‖(1 − λ)(b − x̄)‖ = λ‖a − x̄‖ + (1 − λ)‖b − x̄‖ ≤ λr + (1 − λ)r = r.

By definition, B̄(x̄; r) is a convex set.
(iii) Given any x, y ∈ Rn, the line segment [x, y] is a convex set. To see this, take a, b ∈ [x, y] and 0 ≤ λ ≤ 1. Then there exist 0 ≤ λ1, λ2 ≤ 1 such that a = λ1 x + (1 − λ1)y and b = λ2 x + (1 − λ2)y. It follows that

λa + (1 − λ)b = λ(λ1 x + (1 − λ1)y) + (1 − λ)(λ2 x + (1 − λ2)y)
             = (λλ1 + (1 − λ)λ2)x + (λ(1 − λ1) + (1 − λ)(1 − λ2))y.

Let α = λλ1 + (1 − λ)λ2 and β = λ(1 − λ1) + (1 − λ)(1 − λ2). It is not hard to check that α, β ≥ 0, α + β = 1, and λa + (1 − λ)b = αx + βy. By definition, λa + (1 − λ)b ∈ [x, y], which justifies the convexity of [x, y].

The proposition below provides a characterization of set convexity.

Proposition 1.20 Let Ω be a subset of Rn. Then Ω is convex if and only if

αΩ + βΩ = (α + β)Ω whenever α, β ≥ 0.     (1.2)

Proof Suppose that Ω is convex. Fix any α, β ≥ 0 and get (α + β)Ω ⊂ αΩ + βΩ. The reverse inclusion is obviously satisfied if α + β = 0, so we only consider the case where α + β > 0. Taking any x ∈ αΩ + βΩ, we have

x = αa + βb = (α + β)( (α/(α + β)) a + (β/(α + β)) b ) for some a ∈ Ω, b ∈ Ω.

It follows from the convexity of Ω that

(α/(α + β)) a + (β/(α + β)) b ∈ Ω,

which yields x ∈ (α + β)Ω. Thus αΩ + βΩ ⊂ (α + β)Ω and (1.2) holds.


For the converse implication, suppose that (1.2) is satisfied and get λΩ + (1 − λ)Ω = Ω whenever 0 ≤ λ ≤ 1. The inclusion “⊂” in this set equality clearly implies the convexity of Ω.
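As a quick numerical companion to Definition 1.18 and Example 1.19 (an illustrative sketch only, not from the book), the code below samples pairs of points of a closed ball in R² and checks that all of their convex combinations stay in the ball.

```python
import numpy as np

def in_closed_ball(x, center, r):
    """Membership test for the closed ball centered at `center` with radius r."""
    return np.linalg.norm(x - center) <= r + 1e-12

rng = np.random.default_rng(1)
center, r = np.array([1.0, -2.0]), 3.0

for _ in range(1000):
    # draw two points of the ball and a coefficient lambda in [0, 1]
    a = center + rng.uniform(-1, 1, size=2) * r / np.sqrt(2)
    b = center + rng.uniform(-1, 1, size=2) * r / np.sqrt(2)
    lam = rng.uniform()
    assert in_closed_ball(a, center, r) and in_closed_ball(b, center, r)
    assert in_closed_ball(lam * a + (1 - lam) * b, center, r)

print("all sampled convex combinations remain in the ball")
```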



Given ω1, ..., ωm ∈ Rn, the element x = ∑_{i=1}^{m} λi ωi, where λi ≥ 0 for i = 1, ..., m and ∑_{i=1}^{m} λi = 1, is called a convex combination of ω1, ..., ωm.

m+1 

λi ωi , ωi ∈ Ω, λi ≥ 0,

i=1

m+1 

λi = 1.

i=1

Observe that if λm+1 = 1, then λ1 = λ2 = · · · = λm = 0, so y = ωm+1 ∈ Ω. In the case where λm+1 < 1, we get the representations m 

λi = 1 − λm+1 and

i=1

m  i=1

λi = 1. 1 − λm+1

The inductive hypothesis gives us the inclusion z=

m  i=1

λi ωi ∈ Ω, 1 − λm+1

which yields by the definition of set convexity the relationships y = (1 − λm+1 )

m  i=1

λi ωi + λm+1 ωm+1 = (1 − λm+1 )z + λm+1 ωm+1 ∈ Ω 1 − λm+1

and hence completes the proof of the proposition.



Proposition 1.22 Let Ω1 be a convex subset of Rn , and let Ω2 be a convex subset of R p . Then the Cartesian product Ω1 × Ω2 is a convex subset of Rn+ p .

10

1 Convex Sets and Functions

Proof Fix a = (a1 , a2 ), b = (b1 , b2 ) ∈ Ω1 × Ω2 , and 0 ≤ λ ≤ 1. Then we have a1 , b1 ∈ Ω1 and a2 , b2 ∈ Ω2 . The convexity of Ω1 and Ω2 tells us that λai + (1 − λ)bi ∈ Ωi for i = 1, 2, which implies therefore that   λa + (1 − λ)b = λa1 + (1 − λ)b1 , λa2 + (1 − λ)b2 ∈ Ω1 × Ω2 . Thus the Cartesian product Ω1 × Ω2 is convex.



Let us continue with the definition of affine mappings. Definition 1.23 A mapping B : Rn → R p is affine if there exist a linear mapping A : Rn → R p and an element b ∈ R p such that B(x) = A(x) + b for all x ∈ Rn . It follows from the definition that B : Rn → R p is an affine mapping if and only if there exist a matrix A ∈ R p×n and b ∈ R p such that B(x) = Ax + b for all x ∈ Rn . Proposition 1.24 A mapping B : Rn → R p is affine if and only if B(λx + (1 − λ)y) = λB(x) + (1 − λ)B(y) for all x, y ∈ Rn and λ ∈ R.

(1.3)

Proof Suppose that B : Rn → R p is an affine mapping. Then there exist a linear mapping A : Rn → R p and an element b ∈ R p such that B(x) = A(x) + b for all x ∈ Rn . For any x, y ∈ Rn and λ ∈ R, we have B(λx + (1 − λ)y) = A(λx + (1 − λ)y) + b = λA(x) + (1 − λ)A(y) + λb + (1 − λ)b = λ(A(x) + b) + (1 − λ)(A(y) + b) = λB(x) + (1 − λ)B(y), which justifies the necessity of condition (1.3). Now suppose that B : Rn → R p satisfies (1.3). Let b = B(0), and let A(x) = B(x) − b for x ∈ Rn . Then A(0) = B(0) − b = 0, and it follows from (1.3) that A(λx + (1 − λ)y) = λA(x) + (1 − λ)A(y) for all x, y ∈ Rn and λ ∈ R. Then for x, y ∈ Rn and λ ∈ R, we have A(λx) = A(λx + (1 − λ)0) = λA(x) + (1 − λ)A(0) = λA(x),

1.2

Convex Sets

11

which implies in turn that



x+y 1 1 = 2A A(x + y) = A 2 x+ y 2 2 2

1 1 =2 A(x) + A(y) = A(x) + A(y). 2 2 Thus A is a linear mapping for which B(x) = A(x) + b whenever x ∈ Rn . This verifies that B is an affine mapping.  Next we show that set convexity is preserved under affine operations. Proposition 1.25 Let B : Rn → R p be an affine mapping. Suppose that Ω is a convex subset of Rn , and that Θ is a convex subset of R p . Then B(Ω) is a convex subset of R p , and B −1 (Θ) is a convex subset of Rn . Proof Fix any a, b ∈ B(Ω) and 0 ≤ λ ≤ 1. Then a = B(x) and b = B(y) for some x, y ∈ Ω. Since Ω is convex, we have λx + (1 − λ)y ∈ Ω and therefore λa + (1 − λ)b = λB(x) + (1 − λ)B(y) = B(λx + (1 − λ)y) ∈ B(Ω), which justifies the convexity of the image B(Ω). Taking further any x, y ∈ B −1 (Θ) and 0 ≤ λ ≤ 1, we get B(x), B(y) ∈ Θ. This gives us the inclusion   B λx + (1 − λ)y = λB(x) + (1 − λ)B(y) ∈ Θ by the convexity of Θ. Thus we have λx + (1 − λ)y ∈ B −1 (Θ), which verifies the convexity  of the inverse image B −1 (Θ). Corollary 1.26 Let Ω1 , Ω2 ⊂ Rn be convex, and let λ ∈ R. Then both sets Ω1 + Ω2 and λΩ1 are also convex in Rn . Proof Observe that the mapping A : Rn × Rn → Rn given by A(x, y) = x + y for (x, y) ∈ Rn × Rn is a linear mapping, and so it is an affine mapping. Then the set Ω1 + Ω2 can be represented as Ω1 + Ω2 = A(Ω1 × Ω2 ), which is a convex set by Propositions 1.22 and 1.25. The convexity of λΩ1 can be justified in a similar way.  Next we proceed with intersections of convex sets. Proposition 1.27 Let {Ωα }α∈I be a collection of convex subsets of Rn . Then also a convex subset of Rn .



α∈I

Ωα is

12

1 Convex Sets and Functions

Fig. 1.2 Nonconvex set and its convex hull

Proof Taking any a, b ∈ α∈I Ωα and 0 ≤ λ ≤ 1, we get that a, b ∈ Ωα for all α ∈ I . The convexity of each Ωα ensures that λa + (1 − λ)b ∈ Ωα . Thus λa + (1 − λ)b ∈ α∈I Ωα and the intersection α∈I Ωα is convex.  Definition 1.28 Let Ω be a subset of Rn . The convex hull of Ω is 

  co Ω := C  C is convex and Ω ⊂ C . The next proposition follows directly from definition and Proposition 1.27 (Fig. 1.2). Proposition 1.29 The convex hull co Ω of Ω is the smallest convex set containing the set Ω. Proof The convexity of the set co Ω ⊃ Ω follows from Proposition 1.27. On the other hand, for any convex set C that contains Ω we clearly have that co Ω ⊂ C, which verifies the claimed statement.  Proposition 1.30 For any subset Ω of Rn , its convex hull admits the following representation: m m      co Ω = λi ai  λi = 1, λi ≥ 0, ai ∈ Ω, m ∈ N . i=1

i=1

Proof Denoting by C the right-hand side of the representation to prove, we obviously have the inclusion Ω ⊂ C. Let us check that the set C is convex. Take any a, b ∈ C and get by the definition that p q   a= αi ai , b = βjbj, i=1

j=1

1.2

Convex Sets

13

p q where ai , b j ∈ Ω, αi , β j ≥ 0 with i=1 αi = j=1 β j = 1, and p, q ∈ N. It easily follows that for every number λ ∈ (0, 1), we have λa + (1 − λ)b =

p 

λαi ai +

i=1

q 

(1 − λ)β j b j .

j=1

Then the resulting equality p  i=1

λαi +

q  j=1

(1 − λ)β j = λ

p  i=1

αi + (1 − λ)

q 

βj = 1

j=1

ensures that λa + (1 − λ)b ∈ C, which justifies the convexity of C. Thus co Ω ⊂ C by the m λi ai ∈ C with ai ∈ Ω ⊂ co Ω, λi ≥ 0 for i = definition of co Ω. Fix finally any a = i=1 m 1, . . . , m, and i=1 λi = 1. Since the set co Ω is convex, we deduce from Proposition 1.21 that a ∈ co Ω and thus arrive at the inclusion C ⊂ co Ω, which completes the proof of the proposition.  Proposition 1.31 The interior int Ω and closure Ω of a convex set Ω ⊂ Rn are also convex sets. Proof Fix a, b ∈ int Ω and λ ∈ (0, 1). Then find an open set V such that a ∈ V ⊂ Ω and so λa + (1 − λ)b ∈ λV + (1 − λ)b ⊂ Ω. Since λV + (1 − λ)b is open, we get λa + (1 − λ)b ∈ int Ω, and thus the interior int Ω of Ω is a convex set. To verify the convexity of Ω, fix a, b ∈ Ω and λ ∈ (0, 1) and then find by the definition sequences {ak } and {bk } in Ω converging to a and b, respectively. By the convexity of Ω, the sequence {λak + (1 − λ)bk } lies entirely in Ω and converges to λa + (1 − λ)b. This ensures the inclusion λa + (1 − λ)b ∈ Ω and thus justifies the convexity of the closure Ω.  Lemma 1.32 For a convex set Ω ⊂ Rn with nonempty interior, take any a ∈ int Ω and b ∈ Ω. Then we have [a, b) ⊂ int Ω. Proof Since b ∈ Ω, it follows that b ∈ Ω + εB for any ε > 0. Pick now a real number λ such that 0 < λ ≤ 1, and let xλ = λa + (1 − λ)b. Choosing ε > 0 such that a + ε 2−λ λ B ⊂ Ω and using Proposition 1.20 give us the relationships

14

1 Convex Sets and Functions

xλ + εB = λa + (1 − λ)b + εB   ⊂ λa + (1 − λ) Ω + εB + εB = λa + (1 − λ)Ω + (1 − λ)εB + εB  2−λ  B + (1 − λ)Ω =λ a+ε λ ⊂ λΩ + (1 − λ)Ω = Ω. This shows that xλ ∈ int Ω and thus verifies the inclusion [a, b) ⊂ int Ω.



The final result of this section establishes interconnections between taking the interior and closure of convex sets. Proposition 1.33 Let Ω ⊂ Rn be a convex set with nonempty interior. Then we have the relationships (i) int Ω = Ω. (ii) int Ω = int Ω. Proof (i) Obviously, int Ω ⊂ Ω. To verify the reverse inclusion, for any b ∈ Ω and a ∈ int Ω define the sequence {xk } by xk =

 1 1 b, k ∈ N. a+ 1− k k

Lemma 1.32 ensures that xk ∈ int Ω for every k ∈ N. Since xk → b as k → ∞, we have b ∈ int Ω and thus justify (i). (ii) Since the inclusion int Ω ⊂ int Ω is obvious, it remains to prove the reverse inclusion int Ω ⊂ int Ω. To proceed, fix any b ∈ int Ω and a ∈ int Ω. If ε > 0 is sufficiently small, then c = b + ε(b − a) ∈ Ω. Employing Lemma 1.32, we get b=

ε 1 a+ c ∈ (a, c) ⊂ int Ω, 1+ε 1+ε

which verifies that int Ω ⊂ int Ω and thus completes the proof.

1.3



Convex Functions

This section presents basic facts about general (extended-real-valued) convex functions including their analytic and geometric characterizations, important properties as well as specifications for particular subclasses.

1.3

Convex Functions

15

Definition 1.34 Let f : Ω → R := (−∞, ∞] be an extended-real-valued function defined on a convex set Ω ⊂ Rn . Then f is convex on Ω if   f λx + (1 − λ)y ≤ λ f (x) + (1 − λ) f (y) for all x, y ∈ Ω and 0 < λ < 1.

(1.4)

The reader is referred to Fig. 1.3 for an illustration of function convexity. Remark 1.35 (i) Inequality (1.4) is known as the Jensen inequality. With the conventions from (1.1), one can easily check that (1.4) holds if and only if we have   f λx + (1 − λ)y ≤ λ f (x) + (1 − λ) f (y) for all x, y ∈ Ω and 0 ≤ λ ≤ 1. Indeed, if λ = 0, then   f λx + (1 − λ)y = f (y), λ f (x) + (1 − λ) f (y) = 0 f (x) + f (y) = f (y) even if f (x) = ∞. The justification is similar if λ = 1. (ii) Given a function f : Ω → R defined on a convex set Ω ⊂ Rn , the extension of f to Rn is defined by  f (x) if x ∈ Ω,  f (x) := ∞ otherwise. Obviously, if f is convex on Ω, then  f is convex on Rn . Furthermore, if f : Rn → R is a convex function, then for any convex set Ω ⊂ Rn the restriction f |Ω : Ω → R is also convex. This allows us to consider without loss of generality extended-real-valued convex functions on the whole space Rn . We associate with an extended-real-valued function the two geometric objects, which play an important role in what follows. Definition 1.36 The effective domain (or domain for short) and epigraph of f : Rn → R are defined, respectively, by    dom f := x ∈ Rn  f (x) < ∞ and    epi f := (x, t) ∈ Rn+1  x ∈ Rn , f (x) ≤ t . We say that f is proper if dom f  = ∅. Let us illustrate the convexity of functions by examples.

16

1 Convex Sets and Functions

Fig. 1.3 Convex function and nonconvex function

Example 1.37 Given a subset Ω of Rn , define the indicator function associated with the set Ω by  0 if x ∈ Ω, δΩ (x) := ∞ otherwise. It is easy to check that dom δΩ = Ω, epi δΩ = Ω × [0, ∞), and that δΩ is a convex function if and only if Ω is a convex set. Example 1.38 The following functions are convex: (i) f (x) = a, x + b for x ∈ Rn , where a ∈ Rn and b ∈ R. (ii) g(x) = x for x ∈ Rn . (iii) h(x) = x2 for x ∈ Rn . Indeed, the function f in (i) is convex since   f λx + (1 − λ)y = a, λx + (1 − λ)y + b = λa, x + (1 − λ)a, y + b = λ(a, x + b) + (1 − λ)(a, y + b) = λ f (x) + (1 − λ) f (y) for all x, y ∈ Rn and λ ∈ (0, 1). The function g in (ii) is convex since for x, y ∈ Rn and λ ∈ (0, 1), we have   g λx + (1 − λ)y = λx + (1 − λ)y ≤ λx + (1 − λ)y = λg(x) + (1 − λ)g(y),

1.3

Convex Functions

17

which follows from the triangle inequality and the fact that αu = |α| · u whenever α ∈ R and u ∈ Rn . The convexity of the function h in (iii) is a consequence of the more general result for the class of quadratic functions on Rn given in the next example. Example 1.39 Let A be an n × n symmetric matrix with real entries. It is called positive semidefinite and is written as A  0 if we have Au, u ≥ 0 for all u ∈ Rn . Let us check that A is positive semidefinite if and only if the function f : Rn → R defined by f (x) =

1 Ax, x, x ∈ Rn , 2

is convex. Indeed, a direct calculation shows that   1 λ f (x) + (1 − λ) f (y) − f λx + (1 − λ)y = λ(1 − λ)A(x − y), x − y 2

(1.5)

whenever x, y ∈ Rn and 0 < λ < 1. If the matrix A is positive semidefinite, then A(x − y), x − y ≥ 0, so the function f is convex by (1.5). Conversely, assume that f is convex and take any u ∈ Rn . Then using equality (1.5) for x = u, y = 0, and 0 < λ < 1 verifies that A is positive semidefinite. We show now that to determine whether an extended-real-valued function is convex, it suffices to verify the Jensen inequality on its domain. Proposition 1.40 Let f : Rn → R be an extended-real-valued function. Then f is a convex function if and only if the Jensen inequality (1.4) is satisfied for all x, y ∈ dom f and 0 < λ < 1. Proof Suppose that f is convex. Since dom f is a subset of Rn , it is obvious that the Jensen inequality is satisfied for all x, y ∈ dom f and 0 < λ < 1. The converse implication is also obvious because if x ∈ / dom f or y ∈ / dom f , then f (x) = ∞ or f (y) = ∞. Thus the Jensen inequality holds in this case.  Next we introduce the notion of strict convexity for extended-real-valued functions. This notion is important, in particular, to establish the uniqueness of an absolute minimizer. Definition 1.41 Let f : Rn → R be an extended-real-valued function. We say that f is strictly convex if   f λx + (1 − λ)y < λ f (x) + (1 − λ) f (y)

18

1 Convex Sets and Functions

for all x, y ∈ dom f with x  = y and 0 < λ < 1. It follows directly from the definition that if f is strictly convex, then it is convex. Note that the Jensen inequality always holds as an equality if x = y. The following theorem provides a major geometric characterization of the function convexity via the convexity of the associated epigraphical set (Fig. 1.4). Theorem 1.42 A function f : Rn → R is convex if and only if its epigraph epi f is a convex subset of Rn+1 . Proof Assuming that f is convex, fix any elements (x1 , t1 ), (x2 , t2 ) ∈ epi f and a number λ ∈ (0, 1). Then we have f (xi ) ≤ ti for i = 1, 2. Thus the convexity of f ensures that   f λx1 + (1 − λ)x2 ≤ λ f (x1 ) + (1 − λ) f (x2 ) ≤ λt1 + (1 − λ)t2 . This implies therefore that λ(x1 , t1 ) + (1 − λ)(x2 , t2 ) = (λx1 + (1 − λ)x2 , λt1 + (1 − λ)t2 ) ∈ epi f , and thus the epigraph epi f is a convex subset of Rn × R. Conversely, suppose that the epigraphical set epi f is convex. Fix any elements x1 , x2 ∈ dom f and a real number λ ∈ (0, 1). Then (xi , f (xi )) ∈ epi f for i = 1, 2. This tells us that   λx1 + (1 − λ)x2 , λ f (x1 ) + (1 − λ) f (x2 )     = λ x1 , f (x1 ) + (1 − λ) x2 , f (x2 ) ∈ epi f , and hence we have the inequality   f λx1 + (1 − λ)x2 ≤ λ f (x1 ) + (1 − λ) f (x2 ). Using Proposition 1.40, the convexity of f is justified.

(1.6) 

The next characterization of convexity owned for extended-real-valued functions is known as the generalized Jensen inequality. Theorem 1.43 An extended-real-valued function f : Rn → R is convex if and only if for m λi = 1 and for any elements xi ∈ Rn as i = 1, . . . , m it any numbers λi > 0 with i=1 holds that m m    f λi xi ≤ λi f (xi ). (1.7) i=1

i=1

1.3

Convex Functions

19

Fig. 1.4 Epigraphs of convex function and nonconvex function

m Proof Suppose that f : Rn → R is convex. Taking any numbers λi > 0 with i=1 λi = 1 and any elements xi ∈ Rn as i = 1, . . . , m, we need to prove that the generalized Jensen inequality (1.7) is satisfied. It suffices to consider the case where xi ∈ dom f for i = 1, . . . , m. Then (xi , f (xi )) ∈ epi f , which is a convex set by Theorem 1.42. It follows from Proposition 1.21 that  m  m m    λi (xi , f (xi )) = λxi , λi f (xi ) ∈ epi f , i=1

i=1

i=1

which yields (1.7) and thus completes the proof of the theorem since the converse implication is obvious.  Now we show that convexity is preserved under some important operations. Proposition 1.44 Let f and f i : Rn → R be convex functions for all i = 1, . . . , m. Then the following functions are convex as well: (i) The multiplication by scalars α f for any α ≥ 0. m (ii) The sum function i=1 fi . (iii) The maximum function max1≤i≤m f i . Proof (i) The convexity of α f follows directly from the definition. (ii) It is sufficient to consider the case where m = 2 since the general case easily follows by induction. Fix any x, y ∈ Rn and λ ∈ (0, 1). Then we have        f 1 + f 2 λx + (1 − λ)y = f 1 λx + (1 − λ)y + f 2 λx + (1 − λ)y ≤ λ f 1 (x) + (1 − λ) f 1 (y) + λ f 2 (x) + (1 − λ) f 2 (y) = λ( f 1 + f 2 )(x) + (1 − λ)( f 1 + f 2 )(y), which verifies the convexity of the sum function f 1 + f 2 . (iii) Similar to the proof of (ii), we only consider the case where m = 2. Denote g =


max{ f1, f2} and get that epi g = epi f1 ∩ epi f2, which is a convex set in Rn+1 since both epi f1 and epi f2 are convex. By Theorem 1.42, the maximum function g is convex. □

The next proposition provides sufficient conditions ensuring the preservation of convexity under function compositions.

Proposition 1.45 Let f : Rn → R be a real-valued convex function, and let ϕ : R → R be nondecreasing and convex on an interval I of R containing the range of the function f. Then the composition ϕ ◦ f is convex.

Proof Take any x1, x2 ∈ Rn and 0 < λ < 1. Employing the convexity of the inner function f tells us that

f (λx1 + (1 − λ)x2) ≤ λ f (x1) + (1 − λ) f (x2).

Observe that f (λx1 + (1 − λ)x2), f (x1), f (x2) ∈ I for the convex set I ⊂ R. Since the outer function ϕ is nondecreasing and convex on I, we get that

(ϕ ◦ f )(λx1 + (1 − λ)x2) = ϕ( f (λx1 + (1 − λ)x2))
≤ ϕ(λ f (x1) + (1 − λ) f (x2))
≤ λϕ( f (x1)) + (1 − λ)ϕ( f (x2))
= λ(ϕ ◦ f )(x1) + (1 − λ)(ϕ ◦ f )(x2),

which verifies the convexity of the composition ϕ ◦ f.



Now we consider the composition of convex functions and affine mappings. Proposition 1.46 Let B : Rn → R p be an affine mapping, and let f : R p → R be a convex function. Then the composition f ◦ B is convex. Proof Taking any x, y ∈ Rn and 0 < λ < 1, we have      f ◦ B λx + (1 − λ)y = f (B(λx + (1 − λ)y)) = f λB(x) + (1 − λ)B(y)     ≤ λ f B(x) + (1 − λ) f B(y) = λ( f ◦ B)(x) + (1 − λ)( f ◦ B)(y) and thus justify the convexity of the composition f ◦ B.



Example 1.47 Given A ∈ R p×n and b ∈ R p, define the function g(x) = ‖Ax − b‖² for x ∈ Rn. Consider the affine mapping B(x) = Ax − b on Rn and the convex function f (x) = ‖x‖² on R p. Then g = f ◦ B is convex by Proposition 1.46.
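A quick way to see Example 1.47 in action is to inspect the constant Hessian 2AᵀA of g and to sample the Jensen inequality. The following sketch (Python/NumPy with arbitrary random data) does both; it is only an illustration of the statement above.

# Example 1.47: g(x) = ||Ax - b||^2 is convex; its Hessian 2 A^T A is positive semidefinite.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)

def g(x):
    r = A @ x - b
    return r @ r

hessian = 2 * A.T @ A
print("smallest Hessian eigenvalue:", np.linalg.eigvalsh(hessian).min())  # nonnegative up to rounding

x, y, lam = rng.standard_normal(3), rng.standard_normal(3), 0.3
print(g(lam * x + (1 - lam) * y) <= lam * g(x) + (1 - lam) * g(y) + 1e-9)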


The next consequence of Proposition 1.46 is useful in applications. Corollary 1.48 Let f : Rn → R be a convex function. Then for any x, d ∈ Rn , the function ϕ : R → R defined by ϕ(t) = f (x + td) for t ∈ R is convex as well. Conversely, if for any x, d ∈ Rn the function ϕ defined above is convex, then f is also convex. Proof Given x, d ∈ Rn , define B : R → Rn by B(t) = x + td for t ∈ R. Since B is an affine mapping and ϕ = f ◦ B, the convexity of ϕ immediately follows from Proposition 1.46. To prove the converse implication, take any x1 , x2 ∈ Rn , λ ∈ (0, 1), and let x = x2 , d = x1 − x2 . Since the function ϕ : R → R given by ϕ(t) = f (x + td) for t ∈ R is convex, we have       f λx1 + (1 − λ)x2 = f x2 + λ(x1 − x2 ) = ϕ(λ) = ϕ λ(1) + (1 − λ)(0) ≤ λϕ(1) + (1 − λ)ϕ(0) = λ f (x1 ) + (1 − λ) f (x2 ), which verifies the convexity of the function f .



The proof of the statement below is straightforward. Proposition 1.49 Let f : Rn × R p → R be convex, and let (x, y) ∈ Rn × R p be an arbitrary fixed pair. Then the functions ϕ(y) = f (x, y) for y ∈ R p and ψ(x) = f (x, y) for x ∈ Rn are also convex. Now we present an important extension of Proposition 1.44(iii). Proposition 1.50 Let f i : Rn → R for i ∈ I be a collection of convex functions with a nonempty index set I . Then the supremum function f : Rn → R given by f (x) = supi∈I f i (x) for x ∈ Rn is convex. Proof Fix x1 , x2 ∈ Rn and 0 < λ < 1. For every i ∈ I , we have   f i λx1 + (1 − λ)x2 ≤ λ f i (x1 ) + (1 − λ) f i (x2 ) ≤ λ f (x1 ) + (1 − λ) f (x2 ), which implies in turn that     f λx1 + (1 − λ)x2 = sup f i λx1 + (1 − λ)x2 ≤ λ f (x1 ) + (1 − λ) f (x2 ). i∈I

This justifies the convexity of the supremum function.


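A typical use of Proposition 1.50 is the pointwise maximum of finitely many affine functions. The short check below (Python/NumPy, arbitrary data) samples the Jensen inequality for such a maximum; it is an illustration only.

# Pointwise maximum of affine pieces f_i(x) = <a_i, x> + b_i is convex.
import numpy as np

rng = np.random.default_rng(2)
a = rng.standard_normal((6, 3))   # six affine pieces in R^3
b = rng.standard_normal(6)

def f(x):
    return np.max(a @ x + b)

for _ in range(1000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    lam = rng.uniform(0.0, 1.0)
    assert f(lam * x + (1 - lam) * y) <= lam * f(x) + (1 - lam) * f(y) + 1e-9
print("pointwise maximum of affine functions passes the Jensen test")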

Our next intention is to characterize the convexity of smooth functions of one variable. To proceed, we begin with the following lemma.


Lemma 1.51 Given a convex function f : R → R, assume that I ⊂ dom f for an interval I. If a, b ∈ I and a < x < b, then we have

( f (x) − f (a))/(x − a) ≤ ( f (b) − f (a))/(b − a) ≤ ( f (b) − f (x))/(b − x).

Proof Fix a, b, x as above and form the number t = (x − a)/(b − a) ∈ (0, 1). Then

f (x) = f (a + (x − a)) = f (a + ((x − a)/(b − a))(b − a)) = f (a + t(b − a)) = f (tb + (1 − t)a).

This gives us the inequalities f (x) ≤ t f (b) + (1 − t) f (a) and

f (x) − f (a) ≤ t f (b) + (1 − t) f (a) − f (a) = t( f (b) − f (a)) = ((x − a)/(b − a))( f (b) − f (a)),

which can be equivalently rewritten as

( f (x) − f (a))/(x − a) ≤ ( f (b) − f (a))/(b − a).

Similarly, we have the estimate

f (x) − f (b) ≤ (t − 1)( f (b) − f (a)) = ((x − b)/(b − a))( f (b) − f (a)),

which finally implies that

( f (b) − f (a))/(b − a) ≤ ( f (b) − f (x))/(b − x)

and thus completes the proof of the lemma. □



Theorem 1.52 Suppose that f : R → R is differentiable on a nonempty open interval I ⊂ dom f. Then f is convex on I if and only if the derivative f ′ is nondecreasing on I.

Proof Suppose that f is convex and fix a < b with a, b ∈ I. By Lemma 1.51, we have the inequality

( f (x) − f (a))/(x − a) ≤ ( f (b) − f (a))/(b − a)

for any x ∈ (a, b). This implies by the derivative definition that

f ′(a) ≤ ( f (b) − f (a))/(b − a).

Similarly, we arrive at the estimate

( f (b) − f (a))/(b − a) ≤ f ′(b)

and conclude that f ′(a) ≤ f ′(b). Thus f ′ is nondecreasing on I. To prove the converse, suppose that f ′ is nondecreasing on I and fix x1 < x2 with x1, x2 ∈ I and t ∈ (0, 1). Then x1 < xt < x2 for xt = t x1 + (1 − t)x2. By the mean value theorem, there are c1, c2 with x1 < c1 < xt < c2 < x2 and

f (xt) − f (x1) = f ′(c1)(xt − x1) = f ′(c1)(1 − t)(x2 − x1),
f (xt) − f (x2) = f ′(c2)(xt − x2) = f ′(c2)t(x1 − x2).

This can be equivalently rewritten as

t f (xt) − t f (x1) = f ′(c1)t(1 − t)(x2 − x1),
(1 − t) f (xt) − (1 − t) f (x2) = f ′(c2)t(1 − t)(x1 − x2).

Since f ′(c1) ≤ f ′(c2), summing up these equalities yields f (xt) ≤ t f (x1) + (1 − t) f (x2), which justifies the convexity of the function f.


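Theorem 1.52 can also be observed numerically: for a smooth convex function the derivative values computed on a grid form a nondecreasing sequence, while for a nonconvex function they do not. The sketch below (Python/NumPy; the functions exp and sin are arbitrary choices) illustrates this and is not part of the proof.

# Monotonicity of the derivative on a grid: exp is convex, sin is not.
import numpy as np

t = np.linspace(-3.0, 3.0, 400)
fprime_convex = np.exp(t)            # derivative of exp(x) is exp(x)
fprime_nonconvex = np.cos(t)         # derivative of sin(x) is cos(x)
print(np.all(np.diff(fprime_convex) >= 0))      # True: nondecreasing derivative
print(np.all(np.diff(fprime_nonconvex) >= 0))   # False: sin is not convex on this interval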

Corollary 1.53 Let f : R → R be twice differentiable on a nonempty open interval I ⊂ dom f. Then f is convex on I if and only if f ″(x) ≥ 0 for all x ∈ I.

Proof Since f ″(x) ≥ 0 for all x ∈ I if and only if the derivative function f ′ is nondecreasing on this interval, the claimed assertion follows directly from the result of Theorem 1.52. □

Example 1.54 Consider the function f : R → R given by f (x) = 1/x if x > 0 and f (x) = ∞ otherwise. To verify its convexity, we get that f ″(x) = 2/x³ > 0 for all x belonging to the domain of f, which is the interval I = (0, ∞). Thus this function is convex on I and hence on R by Corollary 1.53.

Lemma 1.55 Let G be a nonempty open convex subset of Rn. Then for any x ∈ G and any d ∈ Rn, the set

I = {t ∈ R | x + td ∈ G}    (1.8)

is an open interval in R that contains 0. Proof Pick any x ∈ G, d ∈ Rn and define the mapping B : R → Rn by B(t) = x + td for t ∈ R. Then B is an affine mapping with I = B −1 (G). Thus I ⊂ R is convex by Proposition 1.25, so it is an interval in R. The obvious continuity of B implies the openness of I , which is the inverse image of an open set. Since x ∈ G, we see that 0 ∈ I and thus complete the proof.  Theorem 1.56 Let f : Rn → R be a twice continuously differentiable function on a nonempty open convex set G ⊂ dom f . Then f is convex on G if and only if we have ∇ 2 f (x)  0 for all x ∈ G.

(1.9)

Proof Take any x ∈ G, d ∈ Rn and define the function ϕ : R → R by ϕ(t) = f (x + td) for t ∈ R. The open interval I defined in (1.8) is a subset of dom ϕ. The classical chain rule gives us

ϕ″(t) = dᵀ∇² f (x + td) d whenever t ∈ I.

Suppose that f is convex on G. Then ϕ is convex on I by Corollary 1.48. Using Corollary 1.53 tells us that

ϕ″(t) = dᵀ∇² f (x + td) d ≥ 0 for all t ∈ I.

Since 0 ∈ I, we have the condition ϕ″(0) = dᵀ∇² f (x)d ≥ 0, which readily implies that ∇² f (x) ⪰ 0. To verify the converse implication, suppose that (1.9) holds. Then ϕ″(t) ≥ 0 for all t ∈ I, which ensures the convexity of ϕ on I. Employing Corollary 1.48 again yields the convexity of f on G due to the arbitrary choices of x ∈ G and d ∈ Rn. □

Example 1.57 Consider the function f : R2 → R given by

f (x1, x2) = −√(x1 x2) if x1 ≥ 0 and x2 ≥ 0, and f (x1, x2) = ∞ otherwise.

Direct calculations give us the expression


∇² f (x) =
( √x2 / (4 x1 √x1)      −1 / (4 √(x1 x2)) )
( −1 / (4 √(x1 x2))     √x1 / (4 x2 √x2) )

whenever x = (x1 , x2 ) ∈ G = (0, ∞) × (0, ∞). Since the trace of the matrix ∇ 2 f (x) is nonnegative and the determinant of ∇ 2 f (x) is zero for all x ∈ G, condition (1.9) is satisfied. Thus f is a convex function on G by Theorem 1.56. Then we can easily verify the convexity of f on the entire plane R2 .
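The conclusion of Example 1.57 can be double-checked numerically by evaluating the Hessian entries above at sampled points of the open quadrant and computing their eigenvalues. The sketch below (Python/NumPy, arbitrary sample points) does exactly this; it is only an illustration.

# Example 1.57: the Hessian of -sqrt(x1*x2) is positive semidefinite on (0, oo) x (0, oo).
import numpy as np

rng = np.random.default_rng(3)
for _ in range(100):
    x1, x2 = rng.uniform(0.1, 5.0, size=2)
    h11 = np.sqrt(x2) / (4 * x1 * np.sqrt(x1))
    h12 = -1.0 / (4 * np.sqrt(x1 * x2))
    h22 = np.sqrt(x1) / (4 * x2 * np.sqrt(x2))
    eigenvalues = np.linalg.eigvalsh(np.array([[h11, h12], [h12, h22]]))
    assert eigenvalues.min() >= -1e-12      # determinant is zero, trace is positive
print("Hessian of -sqrt(x1*x2) is positive semidefinite on the open quadrant")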

1.4

Continuity and Lipschitz Continuity of Convex Functions

In this section, we study the fundamental notions of continuity and Lipschitz continuity for convex functions. The Lipschitz continuity can be treated as “continuity at a linear rate” being highly important for many aspects of convex analysis and its applications.

Definition 1.58 Let f : Rn → R be an extended-real-valued function, and let x̄ ∈ Rn be a given reference point.

(i) The function f is continuous at x̄ if x̄ ∈ int(dom f ) and for any ε > 0 there exists δ > 0 such that B(x̄; δ) ⊂ dom f and | f (x) − f (x̄)| < ε for all x ∈ B(x̄; δ).
(ii) The function f is Lipschitz continuous on a subset Ω of Rn if Ω ⊂ dom f and there is a constant ℓ ≥ 0 such that

| f (x) − f (y)| ≤ ℓ‖x − y‖ for all x, y ∈ Ω.    (1.10)

(iii) The function f is locally Lipschitz continuous around x̄ ∈ Rn if there exists δ > 0 such that f is Lipschitz continuous on B(x̄; δ). Furthermore, the function f is locally Lipschitz continuous on an open subset G of Rn if it is locally Lipschitz continuous around every point of this set.

To proceed with the study of Lipschitz continuity of convex functions, we need two lemmas presented below, which are of their own interest.

Lemma 1.59 Let {ei | i = 1, . . . , n} be the standard orthonormal basis of Rn. Consider the set

A = {x̄ ± εei | i = 1, . . . , n}, ε > 0.

Then we have the following properties:


(i) x̄ + γei ∈ co A for i = 1, . . . , n and γ ∈ R with |γ| ≤ ε.
(ii) B(x̄; ε/n) ⊂ co A.

Proof (i) For any number γ ∈ R with |γ| ≤ ε, find t ∈ [0, 1] such that γ = t(−ε) + (1 − t)ε. Then the conditions x̄ ± εei ∈ A give us

x̄ + γei = t(x̄ − εei) + (1 − t)(x̄ + εei) ∈ co A, i = 1, . . . , n.

(ii) For any x ∈ B(x̄; ε/n), we have x = x̄ + (ε/n)u with ‖u‖ ≤ 1. Represent u via the basis {ei} by

u = ∑_{i=1}^{n} λi ei, where |λi| ≤ √(∑_{i=1}^{n} λi²) = ‖u‖ ≤ 1 for every i = 1, . . . , n.

This gives us

x = x̄ + (ε/n) u = x̄ + ∑_{i=1}^{n} (ελi/n) ei = x̄ + (1/n) ∑_{i=1}^{n} ελi ei.

Denoting γi = ελi yields |γi| ≤ ε. It follows from (i) that x̄ + ελi ei = x̄ + γi ei ∈ co A. Thus by Proposition 1.21 we see that x ∈ co A since it is a convex combination of elements in the convex hull co A. □

Lemma 1.60 If a convex function f : Rn → R is bounded above on B(x̄; δ) for some element x̄ ∈ dom f and δ > 0, then f is bounded on B(x̄; δ).

Proof Denote m = f (x̄) and take M > 0 with f (x) ≤ M for all x ∈ B(x̄; δ). Picking any u ∈ B(x̄; δ), define x = 2x̄ − u. Then x ∈ B(x̄; δ) and

m = f (x̄) = f ((x + u)/2) ≤ (1/2) f (x) + (1/2) f (u),

which shows that f (u) ≥ 2m − f (x) ≥ 2m − M. Therefore, f is bounded on B(x̄; δ) as claimed in the lemma. □

The next important result shows that the local boundedness of a convex function ensures its local Lipschitz continuity.

Theorem 1.61 Let f : Rn → R be convex with x̄ ∈ dom f. If f is bounded above on B(x̄; δ) for some δ > 0, then f is Lipschitz continuous on B(x̄; δ/2).

Proof Fix x, y ∈ B(x̄; δ/2) with x ≠ y and consider the element


u = x + (δ/(2‖x − y‖))(x − y).

Since e = (x − y)/‖x − y‖ ∈ B, we have u = x + (δ/2)e ∈ x̄ + (δ/2)B + (δ/2)B ⊂ x̄ + δB. Denoting further α = δ/(2‖x − y‖) gives us u = x + α(x − y), and thus

x = (1/(α + 1)) u + (α/(α + 1)) y.

It follows from the convexity of f that

f (x) ≤ (1/(α + 1)) f (u) + (α/(α + 1)) f (y),

which implies in turn the inequalities

f (x) − f (y) ≤ (1/(α + 1))( f (u) − f (y)) ≤ 2M (1/(α + 1))
= 2M (2‖x − y‖/(δ + 2‖x − y‖)) ≤ (4M/δ)‖x − y‖,

where M = sup{| f (x)| | x ∈ B(x̄; δ)} < ∞ by Lemma 1.60. Interchanging the roles of x and y above, we arrive at the estimate

| f (x) − f (y)| ≤ (4M/δ)‖x − y‖

and thus verify the Lipschitz continuity of f on B(x̄; δ/2).



The following two consequences of Theorem 1.61 ensure the automatic local Lipschitz continuity of a convex function on appropriate sets.

Corollary 1.62 Any extended-real-valued convex function f : Rn → R is locally Lipschitz continuous on the interior of its domain int(dom f ).

Proof Pick x̄ ∈ int(dom f ) and choose ε > 0 such that x̄ ± εei ∈ dom f for every i = 1, . . . , n. Considering the set A from Lemma 1.59, we get B(x̄; ε/n) ⊂ co A by assertion (ii) of that lemma. Denote M = max{ f (a) | a ∈ A} < ∞ over the finite set A. For any x ∈ B(x̄; ε/n), using the representation

x = ∑_{i=1}^{m} λi ai with λi ≥ 0, ∑_{i=1}^{m} λi = 1, ai ∈ A

brings us to the estimates

f (x) ≤ ∑_{i=1}^{m} λi f (ai) ≤ ∑_{i=1}^{m} λi M = M,

and so f is bounded above on B(x̄; ε/n). Then Theorem 1.61 tells us that f is Lipschitz continuous on B(x̄; ε/2n) and therefore it is locally Lipschitz continuous on int(dom f ). □

It immediately follows from Corollary 1.62 that any finite convex function on Rn is always locally Lipschitz continuous on the whole space.

Corollary 1.63 If f : Rn → R is a real-valued convex function, then it is locally Lipschitz continuous on the space Rn.

The final result of this section provides characterizations of the local Lipschitz continuity of extended-real-valued convex functions.

Theorem 1.64 Let f : Rn → R be a convex function, and let x̄ ∈ Rn. The following properties are equivalent:

(i) f is continuous at x̄.
(ii) x̄ ∈ int(dom f ).
(iii) f is locally Lipschitz continuous around x̄.

Proof Implication (i) =⇒ (ii) follows directly from the definition of continuity. Implication (ii) =⇒ (iii) is a consequence of Corollary 1.62 while (iii) =⇒ (i) is obvious. Thus assertions (i), (ii), and (iii) are equivalent. □
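Corollary 1.63 can be illustrated with a concrete finite convex function. For f (x) = max_i (⟨ai, x⟩ + bi), the constant max_i ‖ai‖ is a valid global Lipschitz constant, which the following sketch verifies on random samples (Python/NumPy; the data are arbitrary and the check is not part of the theory).

# A finite convex function is Lipschitz continuous; here an explicit constant is available.
import numpy as np

rng = np.random.default_rng(4)
a = rng.standard_normal((5, 4))
b = rng.standard_normal(5)
ell = np.linalg.norm(a, axis=1).max()    # a valid Lipschitz constant for the maximum below

def f(x):
    return np.max(a @ x + b)

for _ in range(1000):
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    assert abs(f(x) - f(y)) <= ell * np.linalg.norm(x - y) + 1e-9
print("Lipschitz estimate verified with constant", ell)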

1.5

The Distance Function

This section is mainly devoted to the study of the class of distance functions associated with convex sets. This class of functions belongs to one of the most interesting and important subjects of convex analysis and its extensions. Functions of this type are intrinsically nondifferentiable while they naturally and frequently appear in convex analysis and applications (Fig. 1.5). Given a nonempty set Ω ⊂ Rn, the distance function associated with this set is defined by the formula

d(x; Ω) := inf{‖x − ω‖ | ω ∈ Ω}, x ∈ Rn.    (1.11)

Proposition 1.65 Let Ω be a nonempty subset of Rn, and let x ∈ Rn. Then the associated distance function possesses the following properties:


Fig. 1.5 Distance function

(i) d(x; Ω) = 0 if and only if x ∈ Ω̄.
(ii) The function d(·; Ω) is Lipschitz continuous with constant ℓ = 1 on the entire space Rn.

Proof (i) Suppose that d(x; Ω) = 0. For each k ∈ N, find ωk ∈ Ω such that

0 = d(x; Ω) ≤ ‖x − ωk‖ < d(x; Ω) + 1/k = 1/k,

which ensures that the sequence {ωk} converges to x, and hence x ∈ Ω̄. Conversely, pick any vector x ∈ Ω̄ and find a sequence {ωk} ⊂ Ω converging to x. Then we have 0 ≤ d(x; Ω) ≤ ‖x − ωk‖ for all k ∈ N, which implies that d(x; Ω) = 0 since ‖x − ωk‖ → 0 as k → ∞.
(ii) Take any x, y ∈ Rn. For any ω ∈ Ω, we have the estimate d(x; Ω) ≤ ‖x − ω‖ ≤ ‖x − y‖ + ‖y − ω‖, which implies in turn that

d(x; Ω) ≤ ‖x − y‖ + inf{‖y − ω‖ | ω ∈ Ω} = ‖x − y‖ + d(y; Ω).

Similarly, we get d(y; Ω) ≤ ‖y − x‖ + d(x; Ω) and thus |d(x; Ω) − d(y; Ω)| ≤ ‖x − y‖. Therefore, d(·; Ω) is Lipschitz continuous on Rn with constant ℓ = 1. □

For each x ∈ Rn, the Euclidean projection from x to Ω is defined by

Π(x; Ω) := {ω ∈ Ω | ‖x − ω‖ = d(x; Ω)}.    (1.12)
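For a concrete set the quantities in (1.11) and (1.12) can be written down explicitly. For the closed unit ball of Rn, the projection is x/max(1, ‖x‖) and the distance is max(0, ‖x‖ − 1); the sketch below (Python/NumPy, arbitrary sample points) checks these formulas and the Lipschitz property of Proposition 1.65(ii). It is only an illustration and is not used later.

# Distance and projection for the closed unit ball, plus the 1-Lipschitz property of d(.; B).
import numpy as np

rng = np.random.default_rng(5)

def project_ball(x):
    return x / max(1.0, np.linalg.norm(x))

def dist_ball(x):
    return max(0.0, np.linalg.norm(x) - 1.0)

for _ in range(1000):
    x, y = 3 * rng.standard_normal(3), 3 * rng.standard_normal(3)
    # the projection realizes the distance ...
    assert abs(np.linalg.norm(x - project_ball(x)) - dist_ball(x)) < 1e-12
    # ... and d(.; B) is 1-Lipschitz, as in Proposition 1.65(ii)
    assert abs(dist_ball(x) - dist_ball(y)) <= np.linalg.norm(x - y) + 1e-12
print("ball projection and distance checks passed")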


Proposition 1.66 Let Ω be a nonempty closed subset of Rn. Then for any x ∈ Rn, the Euclidean projection Π(x; Ω) is nonempty.

Proof By definition (1.12), for each k ∈ N there exists ωk ∈ Ω such that

d(x; Ω) ≤ ‖x − ωk‖ < d(x; Ω) + 1/k.

It is clear that {ωk} is a bounded sequence. Thus it has a subsequence {ωkl} that converges to some ω as l → ∞. Since Ω is closed, we get ω ∈ Ω. Note that {kl} is a strictly increasing sequence of positive integers, so one can show that l ≤ kl for every l ∈ N by induction. In particular, kl → ∞ as l → ∞. Letting l → ∞ in the inequality

d(x; Ω) ≤ ‖x − ωkl‖ < d(x; Ω) + 1/kl,

we arrive at d(x; Ω) = ‖x − ω‖, which ensures that ω ∈ Π(x; Ω).



An interesting and useful consequence of convexity is the following unique projection property.

Corollary 1.67 If Ω is a nonempty closed convex subset of Rn, then for each x ∈ Rn the Euclidean projection Π(x; Ω) is a singleton.

Proof The nonempty property of the projection Π(x; Ω) follows from Proposition 1.66. To prove the uniqueness, suppose on the contrary that ω1, ω2 ∈ Π(x; Ω) with ω1 ≠ ω2. It is written as ‖x − ω1‖ = ‖x − ω2‖ = d(x; Ω). By the classical parallelogram equality, we have

2‖x − ω1‖² = ‖x − ω1‖² + ‖x − ω2‖² = 2‖x − (ω1 + ω2)/2‖² + ‖ω1 − ω2‖²/2.

This directly implies that

‖x − (ω1 + ω2)/2‖² = ‖x − ω1‖² − ‖ω1 − ω2‖²/4 < ‖x − ω1‖² = (d(x; Ω))²,

which is a contradiction due to the inclusion (ω1 + ω2)/2 ∈ Ω.



Now we show that the convexity of an arbitrary nonempty set Ω implies the convexity of its distance function.


Proposition 1.68 Let Ω be a nonempty subset of Rn. If Ω is convex, then d(·; Ω) is convex. The converse implication also holds if we assume in addition that the set Ω is closed.

Proof Suppose that Ω is convex. Take any x1, x2 ∈ Rn and 0 < λ < 1. Then for any ε > 0, there exist elements ωi ∈ Ω such that ‖xi − ωi‖ < d(xi; Ω) + ε for i = 1, 2. The convexity of Ω ensures that λω1 + (1 − λ)ω2 ∈ Ω. This yields

d(λx1 + (1 − λ)x2; Ω) ≤ ‖λx1 + (1 − λ)x2 − (λω1 + (1 − λ)ω2)‖
≤ λ‖x1 − ω1‖ + (1 − λ)‖x2 − ω2‖
≤ λ(d(x1; Ω) + ε) + (1 − λ)(d(x2; Ω) + ε) = λd(x1; Ω) + (1 − λ)d(x2; Ω) + ε.

Letting ε → 0+ gives us

d(λx1 + (1 − λ)x2; Ω) ≤ λd(x1; Ω) + (1 − λ)d(x2; Ω),

which justifies the convexity of d(·; Ω). To prove the converse implication, suppose that Ω is closed and d(·; Ω) is convex. Fix any ωi ∈ Ω for i = 1, 2 and λ ∈ (0, 1). Then we have

d(λω1 + (1 − λ)ω2; Ω) ≤ λd(ω1; Ω) + (1 − λ)d(ω2; Ω) = 0.

Since Ω is closed, this ensures by Proposition 1.65(i) that λω1 + (1 − λ)ω2 ∈ Ω and therefore justifies the convexity of the set Ω. □

Next we characterize the Euclidean projection to convex sets in Rn. In the proposition below and in what follows, we often identify the projection Π(x; Ω) with its unique element if Ω is a nonempty closed convex set (Fig. 1.6).

Proposition 1.69 Let Ω be a nonempty convex subset of Rn with ω̄ ∈ Ω. Then we have ω̄ ∈ Π(x; Ω) if and only if

⟨x − ω̄, ω − ω̄⟩ ≤ 0 for all ω ∈ Ω.    (1.13)

Proof Suppose that ω̄ ∈ Π(x; Ω) and get for any ω ∈ Ω and 0 < λ < 1 that ω̄ + λ(ω − ω̄) ∈ Ω. Thus

‖x − ω̄‖² = (d(x; Ω))² ≤ ‖x − (ω̄ + λ(ω − ω̄))‖² = ‖x − ω̄‖² − 2λ⟨x − ω̄, ω − ω̄⟩ + λ²‖ω − ω̄‖².


Fig. 1.6 Euclidean projection

This readily implies that 2⟨x − ω̄, ω − ω̄⟩ ≤ λ‖ω − ω̄‖². Letting λ → 0+, we arrive at (1.13). To verify the reverse implication, suppose that (1.13) is satisfied. Then for any ω ∈ Ω, we have the relationships

‖x − ω‖² = ‖x − ω̄ + ω̄ − ω‖² = ‖x − ω̄‖² + ‖ω̄ − ω‖² + 2⟨x − ω̄, ω̄ − ω⟩
= ‖x − ω̄‖² + ‖ω̄ − ω‖² − 2⟨x − ω̄, ω − ω̄⟩ ≥ ‖x − ω̄‖².

Hence ‖x − ω̄‖ ≤ ‖x − ω‖ for all ω ∈ Ω, which implies that ω̄ ∈ Π(x; Ω) and completes the proof of the proposition. □

We know from the above that for any nonempty closed convex set Ω in Rn the Euclidean projection Π(x; Ω) is a singleton. Now we show that the projection mapping is in fact nonexpansive, i.e., it satisfies the Lipschitz property with constant ℓ = 1.

Proposition 1.70 Let Ω be a nonempty closed convex subset of Rn. Then for any elements x1, x2 ∈ Rn, we have the estimate

‖Π(x1; Ω) − Π(x2; Ω)‖² ≤ ⟨Π(x1; Ω) − Π(x2; Ω), x1 − x2⟩,

which yields the nonexpansive property of the Euclidean projection mapping:

‖Π(x1; Ω) − Π(x2; Ω)‖ ≤ ‖x1 − x2‖ for all x1, x2 ∈ Rn.

Proof Fix any x1, x2 ∈ Rn. It follows from the preceding proposition that

⟨Π(x2; Ω) − Π(x1; Ω), x1 − Π(x1; Ω)⟩ ≤ 0.

Changing the roles of x1, x2 in the inequality above gives us

⟨Π(x1; Ω) − Π(x2; Ω), x2 − Π(x2; Ω)⟩ ≤ 0.

Summing these inequalities up yields

⟨Π(x1; Ω) − Π(x2; Ω), x2 − x1 + Π(x1; Ω) − Π(x2; Ω)⟩ ≤ 0.

This implies the first estimate in the proposition. Finally, the nonexpansive property of the Euclidean projection mapping follows directly from

‖Π(x1; Ω) − Π(x2; Ω)‖² ≤ ⟨Π(x1; Ω) − Π(x2; Ω), x1 − x2⟩ ≤ ‖Π(x1; Ω) − Π(x2; Ω)‖ · ‖x1 − x2‖,

which completes the proof of the proposition. □
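The characterization (1.13) and the nonexpansiveness of Proposition 1.70 are easy to test numerically for a set whose projection is known in closed form, for instance a box, where the projection is componentwise clipping (cf. Exercise 1.22). The following sketch (Python/NumPy, arbitrary data) performs this test; it is an illustration only.

# Check (1.13) and nonexpansiveness for the box [0,1]^n, whose projection is clipping.
import numpy as np

rng = np.random.default_rng(6)
n = 4

def project_box(x):
    return np.clip(x, 0.0, 1.0)

for _ in range(500):
    x1, x2 = 3 * rng.standard_normal(n), 3 * rng.standard_normal(n)
    p1, p2 = project_box(x1), project_box(x2)
    omega = rng.uniform(0.0, 1.0, size=n)           # an arbitrary point of the box
    assert (x1 - p1) @ (omega - p1) <= 1e-12        # variational characterization (1.13)
    assert np.linalg.norm(p1 - p2) <= np.linalg.norm(x1 - x2) + 1e-12
print("projection characterization and nonexpansiveness verified on samples")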

1.6

Convex Set-Valued Mappings and Optimal Value Functions

In this section, we first introduce the notion of set-valued mappings (or multifunctions), which plays an important role in many aspects of modern convex analysis and its applications (Fig. 1.7).

Definition 1.71 We say that F is a set-valued mapping/multifunction from Rn to R p, and denote it by F : Rn ⇉ R p, if F(x) is a subset of R p for every x ∈ Rn. The domain, range, and graph of F are defined, respectively, by

dom F := {x ∈ Rn | F(x) ≠ ∅},
rge F := {y ∈ R p | ∃x ∈ Rn such that y ∈ F(x)},
gph F := {(x, y) ∈ Rn × R p | y ∈ F(x)}.

The set-valued mapping F is convex if its graph is a convex set in Rn × R p.

It follows from the definition that any single-valued mapping f : Rn → R p can be considered as a set-valued mapping F : Rn ⇉ R p, where F(x) = { f (x)} for x ∈ Rn. Let us provide examples of two remarkable set-valued mappings.

Example 1.72 Consider the set-valued mapping F : R p ⇉ Rn defined by

F(w) = B⁻¹(w) = {x ∈ Rn | B(x) = w}, w ∈ R p,

where B : Rn → R p is an affine mapping. Then F is convex.


Fig. 1.7 Graph of set-valued mapping

Example 1.73 For any extended-real-valued function f : Rn → R, define the epigraphical multifunction associated with f by

E f (x) := [ f (x), ∞) = {λ ∈ R | f (x) ≤ λ}, x ∈ Rn.

Then E f : Rn ⇉ R is a set-valued mapping for which gph E f = epi f and dom E f = dom f. Thus f is a convex function if and only if E f is a convex set-valued mapping.

Proposition 1.74 Let F : Rn ⇉ R p be a convex set-valued mapping. Consider the single-valued mapping P : Rn × R p → Rn defined by

P (x, y) = x for (x, y) ∈ Rn × R p . Then P (gph F) = dom F. In particular, dom F is a convex set. Proof Take any (x, y) ∈ gph F and get y ∈ F(x), which implies that F(x)  = ∅ and hence P (x, y) = x ∈ dom F. Thus we have P (gph F) ⊂ dom F. To prove the reverse inclusion, take any x ∈ dom F, i.e., F(x)  = ∅, and choose y ∈ F(x). Then (x, y) ∈ gph F and x = P (x, y) ∈ P (gph F), so dom F ⊂ P (gph F). Since P is a linear mapping and gph F is convex, the convexity of dom F follows from Proposition 1.25.  The next proposition provides an analytic description of convex multifunctions that is similar to Jensen’s inequality for convex functions.


→ R p be a set-valued mapping. Then F is convex if and Proposition 1.75 Let F : Rn → only if   λF(x) + (1 − λ)F(y) ⊂ F λx + (1 − λ)y (1.14) whenever x, y ∈ Rn and 0 < λ < 1. Proof Suppose that F is a convex set-valued mapping, i.e., gph F is a convex set in Rn × R p . Fix x, y ∈ Rn and 0 < λ < 1. For any z ∈ λF(x) + (1 − λ)F(y), we have the representation z = λa + (1 − λ)b, where a ∈ F(x) and b ∈ F(y). It is obvious that (x, a) ∈ gph F and (y, b) ∈ gph F. The convexity of the set gph F yields   λ(x, a) + (1 − λ)(y, b) = λx + (1 − λ)y, λa + (1 − λ)b ∈ gph F, which implies in turn that   z = λa + (1 − λ)b ∈ F λx + (1 − λ)y . Thus the inclusion in (1.14) is satisfied. To show conversely that F is convex under the fulfillment of (1.14) for all x, y ∈ Rn and 0 < λ < 1, take any (x, a), (y, b) ∈ gph F and 0 < λ < 1, and then get a ∈ F(x) and b ∈ F(y).The assumed inclusion (1.14) tells us that   λa + (1 − λ)b ∈ F λx + (1 − λ)y , which implies therefore that   λ(x, a) + (1 − λ)(y, b) = λx + (1 − λ)y, λa + (1 − λ)b ∈ gph F. The latter justifies the convexity of gph F and thus completes the proof.



Now we define some operations over set-valued mappings and verify the preservation of convexity under these operations. → R p be set-valued mappings. Definition 1.76 Let F1 , F2 : Rn → (i) The sum of F1 and F2 is the set-valued mapping from Rn to R p defined by (F1 + F2 )(x) := F1 (x) + F2 (x), x ∈ Rn . (ii) The intersection of F1 and F2 is the set-valued mapping from Rn to R p defined by (F1 ∩ F2 )(x) := F1 (x) ∩ F2 (x), x ∈ Rn .


It is not difficult to check that both operations from Definition 1.76 preserve the convexity of set-valued mappings. → R p be convex set-valued mappings. Then both setProposition 1.77 Let F1 , F2 : Rn → valued mappings F1 + F2 and F1 ∩ F2 are convex. → R p are convex set-valued mappings. Take any x, y ∈ Rn Proof Suppose that F1 , F2 : Rn → and 0 < λ < 1. Proposition 1.75 tells us that   λFi (x) + (1 − λ)Fi (y) ⊂ Fi λx + (1 − λ)y for i = 1, 2. It follows therefore that     λF1 (x) + (1 − λ)F1 (y) + λF2 (x) + (1 − λ)F2 (y)     ⊂ F1 λx + (1 − λ)y + F2 λx + (1 − λ)y . Rearranging the terms and using the definition yield     λ F1 + F2 )(x) + (1 − λ) F1 + F2 )(y) ⊂ (F1 + F2 ) λx + (1 − λ)y , which readily verifies the convexity of the sum F1 + F2 . The convexity of the intersection F1 ∩ F2 follows from the fact that gph(F1 ∩ F2 ) = gph F1 ∩ gph F2 , where the latter is a convex set in Rn × R p .



Next we proceed with compositions of two set-valued mappings. → R p and G : R p → → Rq be set-valued mappings. The comDefinition 1.78 Let F : Rn → n q → position G ◦ F : R → R is defined by $   G ◦ F (x) := G(y), x ∈ Rn . y∈F(x)

We show now that the convexity of set-valued mappings is preserved under general compositions from Definition 1.78. → R p and G : R p → → Rq be convex set-valued mappings. Proposition 1.79 Let F : Rn →  n q → R is also convex. Then the mapping (G ◦ F : R →


Proof To verify this, fix x, y ∈ Rn and 0 < λ < 1. It suffices to show that        λ G ◦ F (x) + (1 − λ) G ◦ F (y) ⊂ G ◦ F λx + (1 − λ)(y) .

(1.15)

Take any z from the set on the left-hand side of this inclusion and get z = λa + (1 − λ)b, where a ∈ G(u) for some u ∈ F(x), and b ∈ G(v) for some v ∈ F(y). Using the convexity of F, we have   z = λa + (1 − λ)b ∈ λG(u) + (1 − λ)G(v) ⊂ G λu + (1 − λ)v . The convexity of G tells us that   λu + (1 − λ)v ∈ λF(x) + (1 − λ)F(y) ⊂ F λx + (1 − λ)y .   It follows from the definition of compositions that z ∈ (G ◦ F) λx + (1 − λ)y . This yields (1.15) and completes the proof.  Now we discuss inverses of set-valued mappings. → R p be a set-valued mapping. The inverse of F is the setDefinition 1.80 Let F : Rn → −1 p → valued mapping F : R → Rn given by    F −1 (y) := x ∈ Rn  y ∈ F(x) , y ∈ R p . The next proposition on the convexity of inverses of convex set-valued mappings can be derived from the definition; see Exercise 1.24. → R p be a convex set-valued mapping. Then F −1 : R p → → Rn Proposition 1.81 Let F : Rn → is also a convex set-valued mapping. Consider further a class of functions, which plays a highly important role in convex analysis, optimization, and various applications. → R p , and let ϕ : Rn × R p → R. The optimal value or Definition 1.82 Let F : Rn → marginal function associated with F and ϕ is defined by    μ(x) := inf ϕ(x, y)  y ∈ F(x) , x ∈ Rn . (1.16) Throughout this section, we assume that μ(x) > −∞ for every x ∈ Rn .


Proposition 1.83 Suppose that ϕ : Rn × R p → R is a convex function, and that → R p is a convex set-valued mapping. Then the optimal value function μ in (1.16) F : Rn → is convex. Proof Take x1 , x2 ∈ dom μ, λ ∈ (0, 1). For any ε > 0, find yi ∈ F(xi ) with ϕ(xi , yi ) < μ(xi ) + ε for i = 1, 2. This directly implies the inequalities λϕ(x1 , y1 ) < λμ(x1 ) + λε, (1 − λ)ϕ(x2 , y2 ) < (1 − λ)μ(x2 ) + (1 − λ)ε. Summing these inequalities up and employing the convexity of ϕ yield   ϕ λx1 + (1 − λ)x2 , λy1 + (1 − λ)y2 ≤ λϕ(x1 , y1 ) + (1 − λ)ϕ(x2 , y2 ) < λμ(x1 ) + (1 − λ)μ(x2 ) + ε. Furthermore, the convexity of gph F gives us 

 λx1 + (1 − λ)x2 , λy1 + (1 − λ)y2 = λ(x1 , y1 ) + (1 − λ)(x2 , y2 ) ∈ gph F,

and therefore λy1 + (1 − λ)y2 ∈ F(λx1 + (1 − λ)x2 ). This implies that     μ λx1 + (1 − λ)x2 ≤ ϕ λx1 + (1 − λ)x2 , λy1 + (1 − λ)y2 < λμ(x1 ) + (1 − λ)μ(x2 ) + ε. Letting finally ε → 0+ ensures the convexity of μ.



Using Proposition 1.83, we can verify the convexity of functions in many situations. For instance, consider two convex functions fi : Rn → R for i = 1, 2. Define ϕ(x, y) = f1(x) + y and F(x) = [ f2(x), ∞) for x ∈ Rn and y ∈ R. Then ϕ is a convex function and F is a convex set-valued mapping. Thus we justify the convexity of the sum

μ(x) = inf_{y∈F(x)} ϕ(x, y) = f1(x) + f2(x), x ∈ Rn.

Another example concerns compositions. Let f : R p → R be convex, and let B : Rn → R p be affine. Define ϕ(x, y) = f (y) and F(x) = {B(x)} for x ∈ Rn and y ∈ R p. Observe that ϕ is a convex function while F is a convex set-valued mapping. We have the representation

μ(x) = inf_{y∈F(x)} ϕ(x, y) = f (B(x)), x ∈ Rn,


which justifies the convexity of f ◦ B. The examples presented above recover the results obtained previously by direct proofs. Now we establish via Proposition 1.83 the convexity of three new classes of functions.

Proposition 1.84 Let φ : R p → R be a convex function, and let B : R p → Rn be an affine mapping. Consider the inverse image mapping B⁻¹ : Rn ⇉ R p from Example 1.72, define

f (x) = inf{φ(y) | y ∈ B⁻¹(x)}, x ∈ Rn,

and suppose that f (x) > −∞ for all x ∈ Rn. Then f is a convex function.

Proof Let ϕ(x, y) = φ(y) and F(x) = B⁻¹(x) for (x, y) ∈ Rn × R p. Then ϕ is a convex function and F is a convex set-valued mapping. Since we have

f (x) = inf_{y∈F(x)} ϕ(x, y), x ∈ Rn,



Proposition 1.85 For two extended-real-valued convex functions f 1 , f 2 : Rn → R, define the infimal convolution    ( f 1  f 2 )(x) := inf f 1 (x1 ) + f 2 (x2 )  x1 + x2 = x , x ∈ Rn , and suppose that ( f 1  f 2 )(x) > −∞ for all x ∈ Rn . Then f 1  f 2 is also convex. Proof Define φ : Rn × Rn → R by φ(x1 , x2 ) = f 1 (x1 ) + f 2 (x2 ) and B : Rn × Rn → Rn by B(x1 , x2 ) = x1 + x2 for (x1 , x2 ) ∈ Rn × Rn . We have    inf φ(x1 , x2 )  (x1 , x2 ) ∈ B −1 (x) = ( f 1  f 2 )(x) for all x ∈ Rn , which implies the convexity of ( f 1  f 2 ) by Proposition 1.84.



Finally in this section, we deduce from Proposition 1.83 sufficient conditions ensuring the preservation of convexity under the composition of a convex function with a mapping that is nondecreasing componentwise. Definition 1.86 An extended-real-valued function g : R p → R is called nondecreasing componentwise if we have ! ! xi ≤ yi for all i = 1, . . . , p =⇒ g(x1 , . . . , x p ) ≤ g(y1 , . . . , y p ) . Here is the aforementioned preservation result.
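The infimal convolution of Proposition 1.85 can be explored numerically by replacing the infimum with a minimum over a fine grid. The sketch below (Python/NumPy) does this for f1(u) = |u| and f2(u) = u²; the grid, the test points, and the tolerance are arbitrary choices made only for illustration.

# Brute-force infimal convolution of |u| and u**2, tested against the Jensen inequality.
import numpy as np

u = np.linspace(-6.0, 6.0, 4001)          # discretization of the inner variable

def inf_conv(x):
    return np.min(np.abs(u) + (x - u) ** 2)

rng = np.random.default_rng(7)
for _ in range(200):
    x, y = rng.uniform(-3, 3, size=2)
    lam = rng.uniform(0.0, 1.0)
    lhs = inf_conv(lam * x + (1 - lam) * y)
    rhs = lam * inf_conv(x) + (1 - lam) * inf_conv(y)
    assert lhs <= rhs + 1e-3              # tolerance covers the discretization error
print("infimal convolution of |u| and u**2 passes the (approximate) Jensen test")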

40

1 Convex Sets and Functions

Proposition 1.87 Define h : Rn → R p by h(x) = ( f 1 (x), . . . , f p (x)) for x ∈ Rn , where f i : Rn → R for i = 1, . . . , p are real-valued convex functions. Suppose that g : R p → R is convex and nondecreasing componentwise. Then the composition (g ◦ h) : Rn → R is a convex function. → R p be the set-valued mapping defined by Proof Let F : Rn → F(x) = [ f 1 (x), ∞) × [ f 2 (x), ∞) × . . . × [ f p (x), ∞), x ∈ Rn . Then the graph of F is represented by    gph F = (x, t1 , . . . , t p ) ∈ Rn × R p  ti ≥ f i (x) . Since all functions f i are convex, the set gph F is convex as well. Define further ϕ : Rn × R p → R by ϕ(x, y) = g(y) for (x, y) ∈ Rn × R p . Observe, since g is increasing componentwise, that      inf ϕ(x, y)  y ∈ F(x) = g f 1 (x), . . . , f p (x) = (g ◦ h)(x), which ensures the convexity of the composition g ◦ h by Proposition 1.83.

1.7



Exercises for This Chapter

Exercise 1.1 Let Ω be a subset of Rn . We say that Ω is a cone if λx ∈ Ω whenever λ ≥ 0 and x ∈ Ω. Show that the following assertions are equivalent: (i) Ω is a convex cone. (ii) x + y ∈ Ω whenever x, y ∈ Ω, and λx ∈ Ω whenever x ∈ Ω and λ ≥ 0. Exercise 1.2 (i) Let Ω be a convex set that contains 0, and let 0 ≤ λ1 ≤ λ2 . Show that λ1 Ω ⊂ λ2 Ω. (ii) Give an example of a set Ω in Rn and two real numbers α, β > 0 such that αΩ + βΩ  = (α + β)Ω. Exercise 1.3 (i) Let Ωi for i = 1, . . . , m be nonempty convex sets in Rn . Show that x ∈ %m Ω if and only if there exist elements ωi ∈ Ωi and λi ≥ 0 for i = 1, . . . , m such co i=1 m m i λi = 1 and x = i=1 λi ωi . that i=1 (ii) Let Ωi for i = 1, . . . , m be nonempty convex cones in Rn . Show that

1.7

Exercises for This Chapter

41 m  i=1

Ωi = co

m $

Ωi .

i=1

Exercise 1.4 Prove that if Ω is an open set in Rn , then co Ω is an open set as well. Is it true that if F is a closed set in Rn , then co F is closed? Exercise 1.5 Let Ω1 be a subset of Rn , and let Ω2 be a subset of R p . Prove that co(Ω1 × Ω2 ) = co Ω1 × co Ω2 . Exercise 1.6 Show that the following functions are convex on the given domains: (i) (ii) (iii) (iv)

f (x) = eax , x ∈ R, where a is a constant. f (x) = x q , x ∈ [0, ∞), where q ≥ 1 is a constant. f (x) = − ln(x), x ∈ (0, ∞). f (x) = x ln(x), x ∈ (0, ∞).

Exercise 1.7 Show that the following functions are convex on Rn : (i) (ii) (iii) (iv)

f (x) = αx, where α ≥ 0. f (x) = x − a2 , where a ∈ Rn . f (x) = Ax − b, where A is an p × n matrix and b ∈ R p . f (x) = xq , where q ≥ 1.

Exercise 1.8 (i) Give an example of a function f : R2 → R, which is convex with respect to each variable but not convex with respect to both. (ii) Let f i : Rn → R for i = 1, 2 be convex functions. Can we conclude that the minimum function min{ f 1 , f 2 }(x) = min{ f 1 (x), f 2 (x)}, x ∈ Rn , is convex? Exercise 1.9 Prove the generalized Jensen inequality in Theorem 1.43 by using induction arguments. Exercise 1.10 Give an example showing that the product of two convex functions is not necessarily convex. Exercise 1.11 Verify that the set {(x, t) ∈ Rn × R | t ≥ x} is closed and convex. Exercise 1.12 Let δ(·; Ω) be the indicator function of a set Ω ⊂ Rn . (i) Find δ(·; Ω) for Ω = ∅ and Ω = Rn . (ii) Show that Ω is convex if and only if the indicator function δ(·; Ω) is convex.

42

1 Convex Sets and Functions

Exercise 1.13 Show that if f : Rn → [0, ∞) is a convex function, then its q-power f q (x) = ( f (x))q , x ∈ Rn , is also convex for any q ≥ 1. Exercise 1.14 We say that f : Rn → R is positively homogeneous if f (αx) = α f (x) for all α > 0, and that f is subadditive if f (x + y) ≤ f (x) + f (y) for all x, y ∈ Rn . Prove that a positively homogeneous function f is convex if and only if it is subadditive. Exercise 1.15 Given a set Ω ⊂ Rn , define    $ cone Ω := λω  λ ≥ 0, ω ∈ Ω = λΩ.

(1.17)

λ≥0

(i) Show that cone Ω is a cone. (ii) Show that cone Ω is the smallest cone containing Ω. (iii) Show that if Ω is convex, then the set cone Ω is convex as well. Exercise 1.16 Give an example of a closed convex set Ω ⊂ R2 with a point x ∈ Ω such that the set cone(Ω − x) is not closed.

Exercise 1.17 Let f : Rn → R be a convex function. (i) Show that for every α ∈ R the sublevel set

Lα ( f ) := {x ∈ Rn | f (x) ≤ α}

(1.18)

is a convex subset of Rn . (ii) Let Ω ⊂ R be a convex set. Is it true in general that the inverse image f −1 (Ω) is a convex subset of Rn ? Exercise 1.18 We say that f : Rn → R is quasi-convex if    f λx + (1 − λ)y) ≤ max f (x), f (y) for all x, y ∈ Rn and λ ∈ (0, 1). (i) Show that a function f is quasi-convex if and only if for any α ∈ R the sublevel Lα ( f ) defined in (1.18) is a convex set in Rn . (ii) Show that any convex function f : Rn → R is quasi-convex. Give an example demonstrating that the converse is not true.

1.7

Exercises for This Chapter

43

Exercise 1.19 Let g : Rn → [−∞, ∞). We say that g is a concave function if −g : Rn → R = (−∞, ∞] is convex. Given a family of concave functions {gα }α∈I with I  = ∅, prove that the function g : Rn → [−∞, ∞) given by g(x) = inf gα (x), x ∈ Rn α∈I

is also a concave function. Exercise 1.20 Find the explicit formulas for the distance d(x; Ω) and the Euclidean projection Π (x; Ω) in the following cases: (i) Ω = {x ∈ Rn | x = of Rn and x ∈ Rn .  1} is the unit sphere   2 (ii) Ω = (x1 , x2 ) ∈ R  |x1 | + |x2 | ≤ 1 and x ∈ R2 . (iii) Ω = [−1, 1] × [−1, 1] and x ∈ R2 . Exercise 1.21 Find the projection Π (x; Ω), x ∈ Rn , in the following cases:    (i) Ω = x ∈ Rn  a, x = b , where 0  = a ∈ Rn and b ∈ R. (ii) Ω is the nonnegative orthant Ω := Rn+ . Exercise 1.22 For ai ≤ bi with i = 1, . . . , n, define    Ω = x = (x1 , . . . , xn ) ∈ Rn  ai ≤ xi ≤ bi for all i = 1, . . . , n . Show that the projection Π (x; Ω) for x ∈ Rn admits representation    Π (x; Ω) = u = (u 1 , . . . , u n ) ∈ Rn  u i = max{ai , min{bi , xi }} for i = 1, . . . , n . Exercise 1.23 Let f i : Rn → R for i = 1, 2 be convex functions. Define    Ω1 = (x, λ1 , λ2 ) ∈ Rn × R × R  λ1 ≥ f 1 (x) ,    Ω2 = (x, λ1 , λ2 ) ∈ Rn × R × R  λ2 ≥ f 2 (x) . (i) Show that the sets Ω1 , Ω2 are convex. → R2 by F(x) = [ f 1 (x), ∞) × [ f 2 (x), ∞) and (ii) Define the set-valued mapping F : Rn → verify that the graph of F is Ω1 ∩ Ω2 . → R p be a set-valued mapping. Prove that Exercise 1.24 Let F : Rn → (i) dom F −1 = rge F. (ii) rge F −1 = dom F. (iii) F is convex if and only if F −1 is convex.

44

1 Convex Sets and Functions

Exercise 1.25 Let C be a convex subset of Rn+1 . (i) For x ∈ Rn , define F(x) = {λ ∈ R | (x, λ) ∈ C}. Show that F is a set-valued mapping with a convex graph and give an explicit formula for its graph. (ii) For x ∈ Rn , define    f C (x) := inf λ ∈ R  (x, λ) ∈ C . (1.19) Find an explicit formula for f C when C is the closed unit ball of R2 . (iii) Show that if f C (x) > −∞ as x ∈ Rn , then f C from (ii) is a convex function. Exercise 1.26 Let f : Rn → R be bounded below. Define its convexification (co f )(x) := inf

m  i=1

m m      λi f (xi )  λi ≥ 0, λi = 1, λi xi = x, m ∈ N . i=1

(1.20)

i=1

Verify that (1.20) is convex with co f = f C , C = co(epi f ), and f C defined in (1.19). Exercise 1.27 Let ϕ : Rn × R p → R be a convex function, and let K  = ∅ be a convex subset of R p . Suppose that ϕ(x, ·) is bounded below on K for each x ∈ Rn . Verify that the function f : Rn → R given by f (x) = inf{ϕ(x, y) | y ∈ K }, x ∈ Rn , is convex.

2

Convex Separation and Some Consequences

This chapter mainly concerns convex separation theorems that play a fundamental role in convex analysis and its numerous applications. We also study here the notion of the relative interior of convex sets and its connection to proper convex separation. Among the major applications of convex separation and relative interior results, we establish the normal cone intersection rule for convex sets, which is crucial for developing calculus rules of generalized differentiation for convex functions and set-valued mappings via the geometric approach of convex analysis.

2.1

Affine Hulls and Relative Interiors of Convex Sets

We begin this section with the definition and properties of affine sets. Given two elements a and b in Rn , the line connecting them is    L(a, b) := λa + (1 − λ)b  λ ∈ R . (2.1) Note that if a = b, then L(a, b) = {a}. Definition 2.1 A subset Ω of Rn is affine if for any elements a, b ∈ Ω we have the inclusion L(a, b) ⊂ Ω. For instance, any point, line, and plane in R3 are affine sets. The empty set and the whole space are always affine. It follows from the definition that the intersection of any collection of affine sets is affine. This leads us to the construction of the affine hull of a set.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 B. Mordukhovich and N. M. Nam, An Easy Path to Convex Analysis and Applications, Synthesis Lectures on Mathematics & Statistics, https://doi.org/10.1007/978-3-031-26458-0_2

45

46

2 Convex Separation and Some Consequences

Definition 2.2 The affine hull of a set Ω ⊂ Rn is    aff Ω := C  C is affine and Ω ⊂ C . Example 2.3 (i) For any two elements a, b ∈ Rn , the affine hull of the line segment [a, b] is the line L(a, b). (ii) If Ω ⊂ Rn itself is an affine set, then aff Ω = Ω. (iii) If Ω is a subset of Rn with nonempty interior, then aff Ω = Rn . Given ω1 , . . . , ωm ∈ Rn , an element x in Rn of the form x=

m 

λi ωi with λi ∈ R,

i=1

m 

λi = 1, m ∈ N,

i=1

is called an affine combination of ω1 , . . . , ωm . Proposition 2.4 The following assertions hold: (i) A set Ω in Rn is affine if and only if Ω contains all affine combinations of its elements. (ii) Let B : Rn → R p be an affine mapping. If Ω is an affine subset of Rn and Θ is an affine subset of R p , then the direct image B(Ω) is an affine subset of R p and the inverse image B −1 (Θ) is an affine subset of Rn . (iii) Let Ω, Ω1 , and Ω2 be affine subsets of Rn . Then the sum Ω1 + Ω2 and the scalar multiplication αΩ for any α ∈ R are affine subsets of Rn . (iv) Given Ω ⊂ Rn , its affine hull is the smallest affine set containing Ω. In addition, we have the representation aff Ω =

m  i=1

m     λi ωi  λi = 1, λi ∈ R, ωi ∈ Ω, m ∈ N . i=1

(v) A set Ω in Rn is a linear subspace if and only if Ω is an affine set containing the origin. Proof Assertion (i) can be proved by induction following the proof of Proposition 1.21 for the convex hull. Then we can use the arguments in the proof of Proposition 1.25 to prove (ii), which implies (iii). The proof of (iv) is similar to that of Proposition 1.30. Finally, let us prove (v). It is obvious that if Ω is a linear subspace of R, then it is an affine set containing the origin. For the converse implication, take any x, y ∈ Ω and λ ∈ R. Since 0 ∈ Ω, we have λx = λx + (1 − λ)0 ∈ Ω.

2.1

Affine Hulls and Relative Interiors of Convex Sets

It follows therefore that



1 1 x+ y x+y=2 2 2

47

∈ Ω.

Thus Ω is a linear subspace of Rn and the proof is complete.



Next we consider relationships between affine sets and linear subspaces. Lemma 2.5 Let Ω be a nonempty subset of Rn . The following three statements are equivalent: (i) Ω is an affine set. (ii) Ω − ω is a linear subspace of Rn for every ω ∈ Ω. (iii) Ω − ω is a linear subspace of Rn for some ω ∈ Ω. Proof Suppose that a nonempty set Ω ⊂ Rn is affine. Take any ω ∈ Ω and observe that Ω − ω is an affine set that contains 0. It follows from Proposition 2.4(v) that Ω − ω is a linear subspace of Rn . This verifies the fulfillment of implication (i)=⇒(ii). Implication (ii)=⇒(iii) is obvious. Finally, suppose that Ω − ω is a linear subspace denoted by L for some ω ∈ Ω. Then the set Ω = ω + L is obviously affine as it is the sum of two affine sets. This completes the proof of (iii)=⇒(i) and all the equivalences.  The preceding lemma leads us to the following notion. Definition 2.6 A nonempty affine set Ω is parallel to a linear subspace L if Ω = ω + L for some ω ∈ Ω. The next proposition justifies the precise form of the parallel linear subspace. Proposition 2.7 Let Ω be a nonempty affine subset of Rn . Then it is parallel to a unique linear subspace L of Rn given by L = Ω − Ω. Proof Given a nonempty affine set Ω, fix ω ∈ Ω and define L = Ω − ω. Then L is a linear subspace of Rn that is parallel to Ω. To justify the uniqueness of such L, suppose that L 1 , L 2 ⊂ Rn are linear subspaces parallel to Ω. Then Ω = ω1 + L 1 = ω2 + L 2 for some ω1 , ω2 ∈ Ω, which implies that L 1 = ω2 − ω1 + L 2 . Since 0 ∈ L 1 , we have ω1 − ω2 ∈ L 2 and thus ω2 − ω1 ∈ L 2 . It follows that L 1 = ω2 − ω1 + L 2 ⊂ L 2 due to the fact that L 2 is closed under addition. Similarly, we get L 2 ⊂ L 1 , which justifies the equality L 1 = L 2 . It remains to verify the representation L = Ω − Ω. Let Ω = ω + L with a unique linear subspace L and some ω ∈ Ω. Then L = Ω − ω ⊂ Ω − Ω. Fix any x = u − ω with u, ω ∈ Ω and observe that Ω − ω is a linear subspace parallel to Ω. Hence Ω − ω = L by the uniqueness of L proved above. This ensures that x ∈ Ω − ω = L and thus Ω − Ω ⊂ L. 

48

2 Convex Separation and Some Consequences

The uniqueness of the parallel linear subspace to a nonempty affine set shows that the following notion is well defined. Definition 2.8 The dimension of a nonempty affine set Ω ⊂ Rn is the dimension of the linear subspace parallel to Ω. The dimension of a nonempty convex set Ω ⊂ Rn is the dimension of its affine hull aff Ω. To proceed further, we need to define yet another important notion. Definition 2.9 The elements v0 , . . . , vm in Rn , m ≥ 1, are affinely independent if we have the implication m 

λi vi = 0,

i=0

m 



 λi = 0 =⇒ λi = 0 for all i = 0, . . . , m

i=0

whenever λ0 , . . . , λm ∈ R. It is easy to observe the following relationship between affine independence and linear independence. Proposition 2.10 The elements v0 , . . . , vm in Rn , m ≥ 1, are affinely independent if and only if v1 − v0 , . . . , vm − v0 are linearly independent. Proof Let v0 , . . . , vm be affinely independent. Suppose that m 

λi (vi − v0 ) = 0,

i=1

where λi ∈ R for i = 1, . . . , m. It follows that λ0 v 0 +

m 

λi vi = 0,

i=1

m where λ0 = − i=1 λi . Since the elements v0 , . . . , vm are affinely independent and m λ = 0, we have λi = 0 for all i = 0, . . . , m. Thus v1 − v0 , . . . , vm − v0 are linearly i i=0 independent. The proof of the converse implication is straightforward.  Recall that the span of a set C, denoted by span C, is the linear subspace generated by this set.

2.1

Affine Hulls and Relative Interiors of Convex Sets

49

Lemma 2.11 Let Ω = aff {v0 , . . . , vm }, where vi ∈ Rn for i = 0, . . . , m, m ≥ 1. Then the span of the set {v1 − v0 , . . . , vm − v0 } is the linear subspace parallel to Ω. Proof Denote by L the linear subspace parallel to Ω. Then Ω − v0 = L and therefore vi − v0 ∈ L for all i = 1, . . . , m. This gives us    span vi − v0  i = 1, . . . , m ⊂ L. To prove the reverse inclusion, fix any v ∈ L and get v + v0 ∈ Ω. Thus we have the representation m m   v + v0 = λi vi , where λi = 1. i=0

i=0

This implies the relationship v=

m 

   λi (vi − v0 ) ∈ span vi − v0  i = 1, . . . , m ,

i=1

which justifies the reverse inclusion and hence completes the proof.



The proof of the next proposition is rather straightforward. Proposition 2.12 The elements v0 , . . . , vm , m ≥ 1, are affinely independent in Rn if and only if its affine hull Ω = aff {v0 , . . . , vm } is m-dimensional. Proof Suppose that v0 , . . . , vm are affinely independent. It follows from Lemma 2.11 that the linear subspace L = span {vi − v0 | i = 1, . . . , m} is parallel to Ω. The linear independence of v1 − v0 , . . . , vm − v0 by Proposition 2.10 ensures that the linear subspace L is m-dimensional and so is Ω. The proof of the converse implication is also straightforward.  Affinely independent systems lead us to the construction of simplices. Definition 2.13 Let v0 , . . . , vm , m ≥ 1, be affinely independent elements in Rn . Then the set    Δm = co vi  i = 0, . . . , m is called an m-simplex or a simplex of dimension m in Rn with the vertices vi , i = 0, . . . , m.

50

2 Convex Separation and Some Consequences

By convention, a single-point set {v0 }, where v0 ∈ Rn , is affinely independent. Thus Δ0 = co {v0 } is a simplex of dimension 0. An important role of simplex vertices is revealed below. Proposition 2.14 Consider an m-simplex Δm with vertices vi as i = 0, . . . , m and m ≥ 1. m+1 such that we Then for every v ∈ Δm , there exists a unique element (λ0 , . . . , λm ) ∈ R+ have m m   v= λi vi and λi = 1. i=0

i=0

m+1 m+1 and (μ0 , . . . , μm ) ∈ R+ satisfy Proof Let (λ0 , . . . , λm ) ∈ R+

v=

m  i=0

λi vi =

m  i=0

μi vi ,

m 

λi =

i=0

m 

μi = 1.

i=0

This immediately implies the equalities m  i=0

m  (λi − μi )vi = 0, (λi − μi ) = 0. i=0

Since v0 , . . . , vm are affinely independent, we get λi = μi for i = 0, . . . , m and complete the proof of the proposition.  Now we are ready to define the notion of relative interior for subsets of Rn . Although we deal mostly with relative interiors of convex sets in what follows, the definition below does not require the convexity of the set involved. Definition 2.15 Let Ω be a subset of Rn (not necessarily convex). We say that an element x0 ∈ Ω belongs to the relative interior ri Ω of Ω if there exists δ > 0 such that B(x0 ; δ) ∩ aff Ω ⊂ Ω.

(2.2)

It follows directly from the definition that    ri Ω = x ∈ Ω  ∃δ > 0 such that B(x; δ) ∩ aff Ω ⊂ Ω . Example 2.16 (i) Consider the line segment Ω = [a, b] for a, b ∈ Rn . Then aff Ω is the line L(a, b). It is not hard to see that ri Ω = (a, b). This implies that the relative interior of a single-point set is the set itself. (ii) Let Ω be an affine subset of Rn . Then aff Ω = Ω and ri Ω = Ω. (iii) Let Ω be a subset of Rn with nonempty interior. Then aff Ω = Rn and thus ri Ω = int Ω.

2.1

Affine Hulls and Relative Interiors of Convex Sets

51

The following proposition shows that the relative interior of a subset Ω of Rn is the interior of Ω relatively to its affine hull aff Ω considered as a metric subspace of Rn . Proposition 2.17 Let Ω be a subset of Rn . Then we have the representation    ri Ω = x ∈ aff Ω  ∃δ > 0 such that B(x; δ) ∩ aff Ω ⊂ Ω . Proof If x ∈ ri Ω, then we get by definition that x ∈ Ω ⊂ aff Ω and there exists δ > 0 such that B(x; δ) ∩ aff Ω ⊂ Ω. This justifies the inclusion “⊂”. To prove the reverse inclusion, take any x ∈ aff Ω such that there is δ > 0 with B(x; δ) ∩ aff Ω ⊂ Ω. Since x belongs to B(x; δ) ∩ aff Ω, we see that x ∈ Ω. It follows from the definition that x ∈ ri Ω, which completes the proof.  The next result is known as the prolongation property which states that if x0 ∈ ri Ω, then from any x ∈ Ω we can prolong the line segment [x, x0 ] beyond x0 while staying in the set Ω. Note that the convexity of the set under consideration is not required. Proposition 2.18 Let Ω be a subset of Rn . If x0 ∈ ri Ω, then for any x ∈ Ω there is t > 0 such that x0 + t(x0 − x) ∈ Ω, i.e., we get u ∈ Ω with x0 ∈ (x, u). Proof Suppose that x0 ∈ ri Ω. By definition, x0 ∈ Ω and there exists δ > 0 such that (2.2) is satisfied. Then there is t > 0 with x0 + t(x0 − x) ∈ B(x0 ; δ). Observe further that x0 + t(x0 − x) = (1 + t)x0 + (−t)x is an affine combination of two elements of Ω, namely x0 and x. Thus we have x0 + t(x0 − x) ∈ B(x0 ; δ) ∩ aff Ω ⊂ Ω. Letting u = x0 + t(x0 − x) yields x0 ∈ (x, u) and completes the proof.



We continue the study of relative interiors with the following lemma. Lemma 2.19 Any linear mapping A : Rn → R p is continuous. More generally, any linear mapping from a linear subspace of Rn to R p is continuous. Proof Let {e1 , . . . , en } be the standard orthonormal basis of Rn , and let vi = A(ei ), i = 1, . . . , n. For any x ∈ Rn with x = (x1 , . . . , xn ), we have A(x) = A

n  i=1

n n    xi ei = xi A(ei ) = xi vi . i=1

i=1

52

2 Convex Separation and Some Consequences

Then the triangle inequality and the Cauchy-Schwarz inequality give us    n  n n    |xi | vi ≤  |xi |2  vi 2 = M x , A(x) ≤ i=1

where M =



n 2 i=1 vi .

i=1

i=1

It follows furthermore that

A(x) − A(y) = A(x − y) ≤ M x − y for all x, y ∈ Rn , which implies the continuity of the mapping A. The proof for the second assertion of the lemma is left as an exercise for the reader.  The next proposition plays an essential role to prove the main result below on the nonempty property of the relative interior of any nonempty convex set in Rn . Proposition 2.20 Let Δm be an m-simplex in Rn for some m ≥ 1. Then ri Δm = ∅. Proof Consider the vertices v0 , . . . , vm of the simplex Δm and denote 1  vi . m+1 m

v=

i=0

We prove the proposition by showing that v ∈ ri Δm . Define    L = span vi − v0  i = 1, . . . , m and observe that L is the m-dimensional linear subspace of Rn parallel to aff Δm = aff {v0 , . . . , vm }. It is not hard to see that for every x ∈ L there exists a unique element (λ0 , . . . , λm ) ∈ Rm+1 with m m   x= λi vi , λi = 0. i=0

i=0

Rm+1 ,

which maps x to the corresponding coefficients Consider the mapping A : L → (λ0 , . . . , λm ) ∈ Rm+1 as above. Then A is linear, and hence it is continuous by Lemma 2.19. Since A(0) = 0, we can choose δ > 0 such that A(u)
0 such that B(x; δ) ∩ aff Ω1 ⊂ Ω1 and B(x; δ) ∩ aff Ω2 ⊂ Ω2 , which readily imply that   B(x; δ) ∩ aff Ω1 ∩ aff Ω2 ⊂ Ω1 ∩ Ω2 . Since aff(Ω1 ∩ Ω2 ) ⊂ aff Ω1 ∩ aff Ω2 , we have B(x; δ) ∩ aff(Ω1 ∩ Ω2 ) ⊂ Ω1 ∩ Ω2 . Finally, it follows from the relative interior definition that x ∈ ri(Ω1 ∩ Ω2 ). Thus ri Ω1 ∩  ri Ω2 ⊂ ri(Ω1 ∩ Ω2 ), which completes the proof. The next corollary can be deduced from Theorem 2.26 by induction. Corollary 2.27 Let Ωi for i = 1, . . . , m be convex sets in Rn . Suppose that m 

ri Ωi = ∅.

i=1

Then we have the equalities m m Ω = i=1 Ω. (i) i=1 m i m i ri Ωi . (ii) ri i=1 Ωi = i=1 The following useful result reveals the behavior of relative interiors under applying affine mappings. Theorem 2.28 Let B : Rn → R p be an affine mapping, and let Ω ⊂ Rn be a convex set. Then we have the equality     B ri Ω = ri B(Ω) .

2.1 Affine Hulls and Relative Interiors of Convex Sets

57

Proof Pick y ∈ B(ri Ω) and find x ∈ ri Ω such that y = B(x). Then take y ∈ ri(B(Ω)) ⊂ x ∈ Ω such that x ∈ ( x , x) B(Ω) and x ∈ Ω with y = B(x). By Proposition 2.18, we find  and then define  y = B( x ) ∈ B(Ω). Since B is an affine mapping, it follows that y = B(x) ∈ y, y), and so y ∈ ri(B(Ω)) by Theorem 2.22(ii). Thus we have (B( x ), B(x)) = (     B ri Ω ⊂ ri B(Ω) . To proceed further, let us show that B(Ω) = B(ri Ω) and then obtain by using Proposition 2.23(i) the reverse inclusion       ri B(Ω) = ri B(ri Ω) ⊂ B ri Ω . Indeed, the continuity of B, which follows from Lemma 2.19 and Proposition 2.23(i), tells us that     B(Ω) ⊂ B Ω = B(ri Ω) ⊂ B ri Ω . Taking into account the obvious inclusion B(ri Ω) ⊂ B(Ω), we have that     B(Ω) ⊂ B Ω = B(ri Ω) ⊂ B ri Ω . This yields B(Ω) ⊂ B(ri Ω) and thus completes the proof of the theorem.



As a consequence of the above theorem, we establish the behavior of relative interiors under the summation and scalar multiplication of convex sets. Corollary 2.29 Let Ω, Ω1 , and Ω2 be convex subsets of Rn , and let α ∈ R. Then we have the equalities (i) ri(Ω1 + Ω2 ) = ri Ω1 + ri Ω2 . (ii) ri(αΩ) = α ri Ω. Proof Define B : Rn × Rn → Rn by B(x, y) = x + y and form the Cartesian product Θ = Ω1 × Ω2 . Then B(Θ) = Ω1 + Ω2 , which implies that     ri(Ω1 + Ω2 ) = ri(B(Θ)) = B ri Θ = B ri(Ω1 × Ω2 )   = B ri Ω1 × ri Ω2 = ri Ω1 + ri Ω2 by using Theorem 2.28 and the fact that ri(Ω1 × Ω2 ) = ri Ω1 × ri Ω2 . This completes the proof of (i). The proof of (ii) is similar. 

58

2.2

2 Convex Separation and Some Consequences

Strict Separation of Convex Sets

In this section, we study the notion of strict separation of convex sets by hyperplanes in Rn . Recall that a subset H of Rn is a hyperplane if it is affine with dim H = n − 1. Proposition 2.30 A subset H of Rn is a hyperplane if and only if there exist a nonzero vector v ∈ Rn and a real number α such that   H = x ∈ Rn | v, x = α .

(2.4)

Proof Suppose that H is a hyperplane in Rn with the representation H = x0 + L, where x0 ∈ H and L is a linear subspace of Rn with dim L = n − 1. Let {u 1 , . . . , u n−1 } be an orthonormal basis of L and choose v ∈ Rn such that {u 1 , . . . , u n−1 , v} is an orthonormal basis of Rn . Then it is obvious that v = 0. Letting α = v, x0 , we now show that (2.4) is satisfied. Indeed, if x ∈ H , then x − x0 ∈ L, so v, x − x0  = 0 and we have v, x = v, x0  = α. This justifies the inclusion “⊂” in (2.4). Take further any x ∈ Rn such that v, x = α = v, x0 . Then v, x − x0  = 0, which implies that x − x0 ∈ L and so x ∈ x0 + L = H . This justifies the reverse inclusion in (2.4). For the converse implication, suppose that H admits representation (2.4) with 0 = v ∈ Rn and α ∈ R. Let x0 = αv/ v 2 and get v, x0  = α, i.e., x0 ∈ H . Defining the set    L = x ∈ Rn  v, x = 0 , we have that L is a linear subspace of Rn with dim L = n − 1. It is obvious that H = x0 + L,  so H is a hyperplane in Rn .

Fig. 2.1 Illustration of convex separation

2.2

Strict Separation of Convex Sets

59

Definition 2.31 Two nonempty subsets Ω1 and Ω2 of Rn can be strictly separated by a hyperplane (Fig. 2.1) if there exists a (nonzero) element v ∈ Rn together with α ∈ R and ε > 0 such that v, x ≤ α − ε < α + ε ≤ v, y for all x ∈ Ω1 , y ∈ Ω2 .

(2.5)

In the setting of Definition 2.31, consider the set    H = x ∈ Rn  v, x = α . Since v is nonzero, H is a hyperplane in Rn . Then Ω1 lies strictly on one side and Ω2 lies strictly on the other side of H in the sense that v, x ≤ α − ε for all x ∈ Ω1 and α + ε ≤ v, y for all y ∈ Ω2 . Proposition 2.32 Two nonempty subsets Ω1 and Ω2 of Rn can be strictly separated by a hyperplane if and only if there exists v ∈ Rn such that       sup v, x  x ∈ Ω1 < inf v, y  y ∈ Ω2 . (2.6) Proof Suppose that Ω1 and Ω2 can be strictly separated by a hyperplane. Using the notation of Definition 2.31, we have       sup v, x  x ∈ Ω1 ≤ α − ε and α + ε ≤ inf v, y  y ∈ Ω2 , which readily yield (2.6). To verify the converse implication, suppose that (2.6) is satisfied. Then we can find α ∈ R and ε > 0 such that       sup v, x  x ∈ Ω1 ≤ α − ε < α + ε ≤ inf v, y  y ∈ Ω2 , which clearly ensures the fulfillment of (2.5). Therefore, the sets Ω1 and Ω2 can be strictly separated by a hyperplane.  The following result concerns the strict separation of a nonempty closed convex set and an out-of-set point. / Ω. Then Ω Proposition 2.33 Let Ω be a nonempty closed convex subset of Rn , and let x ∈ and {x} can be strictly separated by a hyperplane, which means the existence of an element v ∈ Rn such that    (2.7) sup v, x  x ∈ Ω < v, x.


Proof Let w = Π (x; Ω) be the Euclidean projection from x to Ω, and let v = x − w = 0. Applying Proposition 1.69 gives us v, x − w = x − w, x − w ≤ 0 for every x ∈ Ω. Then for any x ∈ Ω, we have v, x − (x − v) = v, x − w ≤ 0, which implies in turn that v, x ≤ v, x − v 2 and thus verifies (2.7).



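The proof of Proposition 2.33 is constructive, and it can be checked numerically in simple cases. The sketch below is only an illustration (the set, the point, and all names are chosen arbitrarily): it projects a point onto a box and tests inequality (2.7) on sample points of the box.

import numpy as np

# Closed convex set: the box [0, 1]^2; a point outside of it.
lo, hi = np.zeros(2), np.ones(2)
x_bar = np.array([2.0, 1.5])

# Euclidean projection onto a box is a componentwise clip.
w = np.clip(x_bar, lo, hi)          # w = Pi(x_bar; Omega)
v = x_bar - w                       # separating vector from the proof

# Check sup_{x in Omega} <v, x> < <v, x_bar> on a grid of points of the box.
grid = np.stack(np.meshgrid(np.linspace(0, 1, 21),
                            np.linspace(0, 1, 21)), axis=-1).reshape(-1, 2)
sup_value = np.max(grid @ v)
print(sup_value, v @ x_bar)         # here 1.5 < 2.75
assert sup_value < v @ x_bar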
Lemma 2.34 Let Ω1 and Ω2 be two subsets of Rn . If Ω1 is compact and Ω2 is closed, then the difference Ω1 − Ω2 is a closed set. Proof Take an arbitrary sequence {ak } ⊂ Ω1 − Ω2 and suppose that {ak } converges to a. Then we have the representation ak = bk − ck for all k ∈ N, where {bk } ⊂ Ω1 and {ck } ⊂ Ω2 . The compactness of Ω1 allows us to extract a subsequence {bkl } that converges to some b ∈ Ω1 . Then ckl = bkl − akl → b − a as l → ∞. The closedness of Ω2 implies that b − a ∈ Ω2 , and so a ∈ b − Ω2 ⊂ Ω1 − Ω2 . Therefore,  the set Ω1 − Ω2 is closed. The next theorem extends the result of Proposition 2.33 to the case of two convex sets. Theorem 2.35 Let Ω1 and Ω2 be nonempty convex subsets of Rn with Ω1 ∩ Ω2 = ∅. If Ω1 is compact and Ω2 is closed, then Ω1 and Ω2 can be strictly separated by a hyperplane. Proof Denote Ω = Ω1 − Ω2 . By Corollary 1.26, the set Ω is convex. Since Ω1 is compact and Ω2 is closed, it follows from Lemma 2.34 that Ω is closed. Thus Ω is a nonempty closed convex set with 0 ∈ / Ω. Applying Proposition 2.33 to Ω and x = 0, we have the conditions    γ = sup v, x  x ∈ Ω < 0 = v, x for some v ∈ Rn . If x ∈ Ω1 and y ∈ Ω2 , then x − y ∈ Ω and hence v, x − y ≤ γ, which implies that v, x ≤ γ + v, y. Therefore, we arrive at          sup v, x  x ∈ Ω1 ≤ γ + inf v, y  y ∈ Ω2 < inf v, y  y ∈ Ω2 and thus complete the proof of the theorem.



2.3 Separation and Proper Separation Theorems

In this section, we introduce and study the notions of convex separation and proper convex separation of sets by hyperplanes. These notions can be characterized by using the relative interiors of sets and via set extremality. Definition 2.36 Let Ω1 and Ω2 be nonempty subsets of Rn . It is said that Ω1 and Ω2 can be separated by a hyperplane if there exists a nonzero element v ∈ Rn such that we have       sup v, x  x ∈ Ω1 ≤ inf v, x  x ∈ Ω2 . (2.8) If it holds in addition that       inf v, x  x ∈ Ω1 < sup v, x  x ∈ Ω2 ,

(2.9)

which means that there exist x1 ∈ Ω1 and x2 ∈ Ω2 with ⟨v, x1⟩ < ⟨v, x2⟩, then the sets Ω1 and Ω2 can be properly separated by a hyperplane. The following lemma is useful for the proofs of the convex separation theorems given below; see Fig. 2.2 for its geometric illustration. Lemma 2.37 Let Ω be a nonempty convex subset of Rn , and let 0 ∈ Ω \ ri Ω. Then the affine hull aff Ω of Ω is actually a linear subspace of Rn , and there exists a sequence {xk} ⊂ aff Ω with xk ∉ Ω and xk → 0 as k → ∞.

Fig. 2.2 Geometric illustration of Lemma 2.37


Proof Pick any point x0 ∈ ri Ω, which exists by Theorem 2.22(i). Since 0 ∈ Ω \ ri Ω, we see that −t x0 ∉ Ω for all t > 0. Indeed, suppose by contradiction that −t x0 ∈ Ω for some t > 0 and then deduce from Theorem 2.22(ii) that

0 = ( t/(1 + t) ) x0 + ( 1/(1 + t) ) (−t x0) ∈ ri Ω,

/Ω which contradicts the assumption 0 ∈ / ri Ω. Letting now xk = −x0 /k implies that xk ∈ for every k ∈ N with xk → 0 as k → ∞. Furthermore, we have 0 ∈ Ω ⊂ aff Ω = aff Ω due to the closedness of aff Ω. It follows from Proposition 2.4(v) that the set aff Ω is a linear subspace of Rn and that xk ∈ aff Ω for all k ∈ N.  To proceed, we need another lemma on the strict separation in linear subspaces of Rn . Lemma 2.38 Let L be a linear subspace of Rn , and let Ω ⊂ L be a nonempty convex set / Ω. Then there exists v ∈ L such that with x ∈ L and x ∈ sup {v, x | x ∈ Ω} < v, x. Proof Employing Proposition 2.33 gives us a vector w ∈ Rn such that sup {w, x | x ∈ Ω} ≤ sup {w, x | x ∈ Ω} < w, x. It is well known that Rn can be represented as the direct sum Rn = L ⊕ L ⊥ , where    L ⊥ := u ∈ Rn  u, x = 0 for all x ∈ L . Thus w = u + v with u ∈ L ⊥ and v ∈ L. This yields u, x = 0 for any x ∈ Ω ⊂ L and v, x = u, x + v, x = u + v, x = w, x ≤ sup {w, x | x ∈ Ω} < w, x = u + v, x = u, x + v, x = v, x, which shows that sup {v, x | x ∈ Ω} < v, x.



The next result establishes the proper separation of a nonempty convex set and the origin in Rn . / ri Ω if and only if the sets Theorem 2.39 Let Ω be a nonempty convex set in Rn . Then 0 ∈ Ω and {0} can be properly separated by a hyperplane, i.e., there exists a vector v ∈ Rn such that       sup v, x  x ∈ Ω ≤ 0 and inf v, x  x ∈ Ω < 0.
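As a simple instance of this result, let Ω = [0, 1] × {0} ⊂ R2 , so that 0 ∈ Ω \ ri Ω. The vector v = (−1, 0) properly separates Ω and {0}: we have ⟨v, x⟩ = −x1 ≤ 0 for every x = (x1 , 0) ∈ Ω, while ⟨v, (1, 0)⟩ = −1 < 0.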


Proof We split the proof into the following two cases. Case 1: 0 ∈ / Ω. Using the strict separation from Proposition 2.33 for the set Ω and x = 0, find v ∈ Rn such that       sup v, x  x ∈ Ω ≤ sup v, x  x ∈ Ω < v, x = 0, which implies that the sets Ω and {0} can be properly separated. Case 2: 0 ∈ Ω \ ri Ω. Letting L = aff Ω and employing Lemma 2.37 tell us that L is a linear subspace of Rn and there exists a sequence {xk } ⊂ L with xk ∈ / Ω and xk → 0 as k → ∞. By Lemma 2.38, there exists a sequence {vk } ⊂ L with vk = 0 and    sup vk , x  x ∈ Ω < vk , xk  for all k ∈ N. Denoting wk = vk / vk shows that wk = 1 for all k ∈ N and that wk , x < wk , xk  for all x ∈ Ω.

(2.10)

Letting k → ∞ in (2.10) and assuming without loss of generality that wk → v ∈ L with some v = 1 along the entire sequence, we arrive at    sup v, x  x ∈ Ω ≤ 0 by using |wk , xk | ≤ wk · xk = xk → 0. To verify the inequality    inf v, x  x ∈ Ω < 0, it suffices to show that there exists x ∈ Ω with v, x < 0. Suppose on the contrary that v, x ≥ 0 for all x ∈ Ω and deduce from sup{v, x | x ∈ Ω} ≤ 0 that v, x = 0 for all x ∈ Ω. Since v ∈ L = aff Ω, we get the representation v=

∑_{i=1}^{m} λi ωi  with  ∑_{i=1}^{m} λi = 1 and ωi ∈ Ω for i = 1, . . . , m,

which readily implies the equalities

‖v‖² = ⟨v, v⟩ = ∑_{i=1}^{m} λi ⟨v, ωi⟩ = 0.

This contradicts the condition v = 1 and so justifies the proper separation. To prove the converse implication of the theorem, assume that Ω and {0} can be properly separated by a hyperplane and thus find v ∈ Rn such that    sup v, x  x ∈ Ω ≤ 0 while v, x0  < 0 for some x0 ∈ Ω.


If on the contrary 0 ∈ ri Ω, it follows from Proposition 2.18 that 0 + t(0 − x0 ) = −t x0 ∈ Ω for some t > 0. This immediately yields the inequalities    v, −t x0  ≤ sup v, x  x ∈ Ω ≤ 0 and thus v, x0  ≥ 0. We arrive at a contradiction, which justifies 0 ∈ / ri Ω.



Now we are ready to establish the main proper separation theorem for convex sets in finite-dimensional convex analysis. Theorem 2.40 Let Ω1 and Ω2 be two nonempty convex subsets of Rn . Then the sets Ω1 and Ω2 can be properly separated by a hyperplane if and only if we have the condition ri Ω1 ∩ ri Ω2 = ∅. Proof Define Ω = Ω1 − Ω2 and observe by Corollary 2.29 that ri Ω1 ∩ ri Ω2 = ∅ if and only if 0∈ / ri(Ω1 − Ω2 ) = ri Ω1 − ri Ω2 . / ri(Ω1 − Ω2 ) = ri Ω. To proceed, suppose first that ri Ω1 ∩ ri Ω2 = ∅, which implies 0 ∈ By Theorem 2.39, the sets Ω and {0} can be properly separated by a hyperplane. Thus there exists v ∈ Rn such that v, x ≤ 0 for all x ∈ Ω, and v, x < 0 for some x ∈ Ω. For any ω1 ∈ Ω1 and ω2 ∈ Ω2 , denote x = ω1 − ω2 ∈ Ω and get that v, ω1 − ω2  = v, x ≤ 0, i.e., v, ω1  ≤ v, ω2 . Choose ω 1 ∈ Ω1 and ω 2 ∈ Ω2 with x = ω 1 − ω 2 . Then v, ω 1 − ω 2  = v, x < 0, which implies that v, ω 1  < v, ω 2 . Thus the sets Ω1 and Ω2 can be properly separated by a hyperplane. To verify now the converse implication, suppose that Ω1 and Ω2 can be properly separated by a hyperplane. Then the sets Ω = Ω1 − Ω2 and {0} can be properly separated by a hyperplane as well. Theorem 2.39 yields 0∈ / ri Ω = ri(Ω1 − Ω2 ) = ri Ω1 − ri Ω2 , which implies that ri Ω1 ∩ ri Ω2 = ∅ and thus completes the proof.




We continue this section with several characterizations of the relative interior. For a subset Ω of Rn , recall that the cone generated by Ω is

cone Ω = { tω | t ≥ 0, ω ∈ Ω } = ⋃_{t≥0} tΩ.

Theorem 2.41 Let Ω be a nonempty convex set in Rn with x0 ∈ Ω. Then the following assertions are equivalent:

(i) x0 ∈ ri Ω.
(ii) For any x ∈ Ω, there exists u ∈ Ω such that x0 ∈ (x, u).
(iii) cone(Ω − x0) is a linear subspace of Rn .
(iv) The closure of cone(Ω − x0) is a linear subspace of Rn .
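For a quick illustration of the cone criterion, let Ω = [0, 1] × {0} ⊂ R2 . At the point x0 = (1/2, 0) ∈ ri Ω we get cone(Ω − x0) = R × {0}, which is a linear subspace; at the endpoint x0 = (0, 0) ∉ ri Ω we only get the ray cone(Ω − x0) = [0, ∞) × {0}, which is not.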

Proof (i)=⇒(ii): This implication follows directly from Proposition 2.18. Note that this argument applies for the case where x = x0 , in which we have u = x0 and x0 ∈ (x, u) = (x0 , x0 ) = {x0 }. (ii)=⇒(iii): Let us show that K = cone(Ω − x0 ) is a linear subspace of Rn given that (ii) is satisfied. Since K is a convex cone, it suffices to show that if x ∈ K , then −x ∈ K . Taking any x ∈ K , we find t ≥ 0 and ω ∈ Ω such that x = t(w − x0 ). By (ii), there exists u ∈ Ω with x0 ∈ (w, u). Thus we have the representation w = x0 + λ(x0 − u) for some λ > 0. Then x = tλ(x0 − u), and so −x = tλ(u − x0 ) ∈ cone(Ω − x0 ), which readily justifies assertion (iii). (iii)=⇒(iv): This implication is obvious since the closure of a linear subspace is itself due to the fact that any linear subspace in Rn is closed. (iv)=⇒(i): Let cone(Ω − x0 ) be a linear subspace of Rn . Suppose on the contrary that / ri Ω. By Theorem 2.40 on the proper separation, there exists v ∈ Rn such that x0 ∈ v, x ≤ v, x0  for all x ∈ Ω.

(2.11)

In addition, we have ⟨v, x̃⟩ < ⟨v, x0⟩ for some x̃ ∈ Ω.

(2.12)

It follows from (2.11) that ⟨v, x − x0⟩ ≤ 0 and hence ⟨v, t(x − x0)⟩ ≤ 0 for all x ∈ Ω and t ≥ 0. Employing the definition and a limiting process gives us ⟨v, c⟩ ≤ 0 for all c in the closure of cone(Ω − x0).


It follows therefore that v, c = 0 for all c ∈ cone(Ω − x0 ) since cone(Ω − x0 ) is a linear subspace of Rn . Observing that c =  x − x0 ∈ cone(Ω − x0 ), we get v, c = 0. This yields v,  x  = v, x0 , which contradicts (2.11) and thus completes the proof of the theorem.  As seen above, the proper separation of nonempty convex sets Ω1 and Ω2 can be characterized by the relative interior condition ri Ω1 ∩ ri Ω2 = ∅. Now we show that the separation property can be characterized by the following notion of set extremality. Definition 2.42 Two nonempty sets Ω1 and Ω2 form an extremal system if for any ε > 0 there exists a ∈ Rn such that a < ε and   Ω1 − a ∩ Ω2 = ∅. Let us present some important examples of extremal systems of sets. The reader is referred to Exercise 2.14 for another example related to optimization. Example 2.43 (i) If Ω1 and Ω2 are two nonempty disjoint sets, then they obviously form an extremal system. (ii) Let Ω be a convex subset of Rn with x ∈ Ω. Then x ∈ Ω is a boundary point of Ω if and only if the sets Ω and {x} form an extremal system. Indeed, if x ∈ bd Ω, then for any real number ε > 0 we get B(x; ε/2) ∩ Ω c = ∅. Thus choosing b ∈ B(x; ε/2) ∩ Ω c and denoting a = b − x yield a < ε and (Ω − a) ∩ {x} = ∅. The converse implication is left as an exercise; see Exercise 2.15. The following result is a separation (while not proper separation) theorem for extremal systems of convex sets. It admits a far-going extension to nonconvex settings, which is known as the extremal principle and is at the heart of variational analysis and its applications; see [15]. It is worth mentioning that, in contrast to general settings of variational analysis, we do not impose here any closedness assumptions on the sets in question. Theorem 2.44 Let Ω1 and Ω2 be two nonempty convex subsets of Rn . Then Ω1 and Ω2 form an extremal system if and only if the sets Ω1 and Ω2 can be separated by a hyperplane. Proof Suppose that Ω1 and Ω2 form an extremal system in Rn . Then there exists a sequence {ak } in Rn with ak → 0 as k → ∞ and ri(Ω1 − ak ) ∩ ri Ω2 ⊂ (Ω1 − ak ) ∩ Ω2 = ∅ for all k ∈ N.


Taking any k ∈ N, apply Theorem 2.40 to the convex sets Ω1 − ak and Ω2 and find a sequence of nonzero elements vk ∈ Rn satisfying vk , x − ak  ≤ vk , y for all x ∈ Ω1 and y ∈ Ω2 . The normalization u k = vk / vk with u k = 1 gives us u k , x − ak  ≤ u k , y for all x ∈ Ω1 and y ∈ Ω2 .

(2.13)

Thus we find v ∈ Rn with v = 1 such that u k → v as k → ∞ along a subsequence. Since u k , ak  → 0 as k → ∞, passing to the limit in (2.13) verifies the validity of the separation property in (2.8). For the converse implication, suppose that there exists a nonzero element v ∈ Rn which separates the sets Ω1 and Ω2 in the sense of (2.8). Then it is not hard to check that we have  1  Ω1 − v ∩ Ω2 = ∅ for all k ∈ N. k Thus Ω1 and Ω2 form an extremal system of sets in Rn .

2.4 Relative Interiors of Convex Graphs

This section is devoted to representing the relative interior of the graph of a given convex set-valued mapping using the relative interiors of its domain and function values. The convex separation theorems from the preceding section are crucial for deriving the main result below. → R p be a convex set-valued mapping. Then we have the repTheorem 2.45 Let F : Rn → resentation        (2.14) ri gph F = (x, y) ∈ Rn × R p  x ∈ ri dom F , y ∈ ri F(x) . Proof First, we verify the inclusion “⊂” in (2.14). Considering the mapping P : Rn × R p → Rn given by P (x, y) = x for (x, y) ∈ Rn × R p , deduce from Proposition 1.74 and Theorem 2.28 that     P ri(gph F) = ri P (gph F) = ri(dom F).

(2.15)

Take further any (x, y) ∈ ri(gph F) and get from (2.15) that x ∈ ri(dom F). Since (x, y) ∈ ri(gph F) ⊂ gph F, we have y ∈ F(x). Fix any y ∈ F(x) and get (x, y) ∈ gph F. By the equivalence between (i) and (ii) in Theorem 2.41, there exist (u, z) ∈ gph F and t ∈ (0, 1) such that


(x, y) = t(x, y) + (1 − t)(u, z). Hence x = t x + (1 − t)u, which tells us that (1 − t)x = (1 − t)u and so x = u. In addition, we have y = t y + (1 − t)z ∈ (y, z), where z ∈ F(x). Using again the equivalence between (i) and (ii) in Theorem 2.41 yields y ∈ ri F(x). To verify next the reverse inclusion in (2.14), fix x ∈ ri(dom F) and y ∈ ri F(x). Arguing / ri(gph F) and then find by Theorem 2.40 a pair by contradiction, suppose that (x, y) ∈ (u, v) ∈ Rn × R p such that u, x + v, y ≤ u, x + v, y whenever (x, y) ∈ gph F.

(2.16)

Furthermore, there exists (x0 , y0 ) ∈ gph F satisfying u, x0  + v, y0  < u, x + v, y.

(2.17)

Letting x = x in (2.16) gives us the inequality v, y ≤ v, y for all y ∈ F(x).

(2.18)

Since x ∈ ri(dom F) and x0 ∈ dom F, we deduce from Theorem 2.41 that there exists  x∈ x for some t ∈ (0, 1). Choose  y ∈ F( x ) and consider the dom F such that x = t x0 + (1 − t) convex combination y, y  = t y0 + (1 − t) x,  y) ∈ gph F, we use where y  ∈ F(x) by the convexity of the graphical set gph F. Since ( (2.16) and (2.17) to get the inequalities u,  x  + v,  y ≤ u, x + v, y, u, x0  + v, y0  < u, x + v, y. Multiplying the first inequality above by 1 − t and the second one by t, and then summing them up lead us to the inequality u, x + v, y   < u, x + v, y, which yields v, y   < v, y. Using the latter together with (2.18), we conclude that the sets {y} and F(x) can be properly separated by a hyperplane. Applying Theorem 2.40 tells / ri F(x), a contradiction that verifies (x, y) ∈ ri(gph F) and thus completes the us that y ∈ proof of the theorem.  As a consequence of Theorem 2.45, we obtain the following representation for relative interiors of epigraphs of extended-real-valued convex functions.


Corollary 2.46 Let f : Rn → R be a convex function. Then we have

ri(epi f ) = { (x, t) ∈ Rn+1 | x ∈ ri(dom f ), f (x) < t }.

(2.19)
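For example, for f (x) = |x| on R one has dom f = R, and formula (2.19) yields ri(epi f ) = {(x, t) ∈ R2 | t > |x|}, the region strictly above the graph of the absolute value function.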

Proof Consider the epigraphical multifunction E f defined in Example 1.73. Then gph E f = epi f and dom E f = dom f . In addition, for any x ∈ dom E f we get ri E f (x) = ri [ f (x), ∞) = ( f (x), ∞). Thus representation (2.19) follows from the one for ri(gph E f ) obtained in (2.14).  Finally in this section, we employ Theorem 2.45 to obtain a relative interior representation for graphs of multifunctions that are defined as products of epigraphical ones, i.e., F(x) = E f1 (x) × E f2 (x) × · · · × E fm (x), x ∈ Rn ,

(2.20)

where each f i : Rn → R is a convex function, and where the epigraphical multifunction E fi is defined in Example 1.73.

Corollary 2.47 Let f i : Rn → R, i = 1, . . . , m, be convex functions satisfying ⋂_{i=1}^{m} ri(dom f i ) ≠ ∅. Then we have the following relative interior representation for the graph of the multifunction F : Rn →→ Rm defined in (2.20):

ri(gph F) = { (x, t1 , . . . , tm ) ∈ Rn × Rm | x ∈ ⋂_{i=1}^{m} ri(dom f i ), f i (x) < ti for all i = 1, . . . , m }.

Proof By definition (2.20), we have that dom F = ⋂_{i=1}^{m} dom f i . Under the assumption that ⋂_{i=1}^{m} ri(dom f i ) ≠ ∅, we employ Corollary 2.27 and get

ri(dom F) = ri( ⋂_{i=1}^{m} dom f i ) = ⋂_{i=1}^{m} ri(dom f i ).

For any x ∈ dom F, it readily follows that   ri F(x) = ri [ f 1 (x), ∞) × · · · × [ f m (x), ∞) = ( f 1 (x), ∞) × · · · × ( f m (x), ∞). Applying now Theorem 2.45 completes the proof of the corollary.



2.5 Normal Cones to Convex Sets

In this section, we apply the separation results obtained above to the study of normal cones to convex sets and derive basic rules of normal cone calculus. Definition 2.48 Let Ω ⊂ Rn be a nonempty convex set with x ∈ Ω. The normal cone to Ω at x is    (2.21) N (x; Ω) := v ∈ Rn  v, x − x ≤ 0 for all x ∈ Ω . / Ω. By convention, we let N (x; Ω) = ∅ if x ∈ Here are some simple normal cone calculations. Example 2.49 (i) Let Ω = B ⊂ Rn be the closed unit ball. Then ⎧ ⎪ ⎪ ⎨{t x | t ≥ 0} if x = 1, N (x; Ω) = {0} if x < 1, ⎪ ⎪ ⎩∅ if x > 1. (ii) Let Ω = {x} for x ∈ Rn be a singleton. Then N (x; Ω) = Rn . (iii) Let Ω = [0, 1] × [0, 1] ⊂ R2 , and let x = (1, 1). Then    N (x; Ω) = (v1 , v2 ) ∈ R2  v1 ≥ 0, v2 ≥ 0 . (iv) Let f (x) = |x| for x ∈ R, let Ω = epi f ⊂ R2 , and let x = 0. Then    N ((x, f (x)); epi f ) = (x, λ) ∈ R2  λ ≤ −|x| , which is illustrated in Fig. 2.3. Next we present some elementary properties of the normal cone. Proposition 2.50 Let x ∈ Ω for a convex subset Ω of Rn . Then we have (i) N (x; Ω) is a closed convex cone containing the origin. (ii) If x ∈ int Ω, then N (x; Ω) = {0}. Proof (i) Fix vi ∈ N (x; Ω) and λi ≥ 0 for i = 1, 2. We get by definition that 0 ∈ N (x; Ω) and vi , x − x ≤ 0 for all x ∈ Ω and i = 1, 2. This implies that λ1 v1 + λ2 v2 , x − x ≤ 0 whenever x ∈ Ω.


Fig. 2.3 Epigraph of the absolute value function and the normal cone to it at 0

Thus λ1 v1 + λ2 v2 ∈ N (x; Ω), and hence N (x; Ω) is a convex cone. To check its closedness, fix a sequence {vk } ⊂ N (x; Ω) that converges to v. Then passing to the limit in the inequalities vk , x − x ≤ 0 for all x ∈ Ω as k → ∞ tells us that v, x − x ≤ 0 whenever x ∈ Ω, and so v ∈ N (x; Ω). (ii) Pick v ∈ N (x; Ω) with x ∈ int Ω and find δ > 0 such that x + δB ⊂ Ω. Thus for every e ∈ B we have the inequality v, x + δe − x ≤ 0, which shows that δv, e ≤ 0, and hence v, e ≤ 0. Choose t > 0 so small that tv ∈ B.  Then v, tv ≤ 0 and therefore t v 2 ≤ 0, i.e., v = 0. The following proposition presents a normal cone characterization of relative interior points of convex sets. Proposition 2.51 Let Ω be a convex set in Rn with x ∈ Ω. Then x ∈ ri Ω if and only if N (x; Ω) is a linear subspace of Rn . Proof Take x ∈ ri Ω and suppose on the contrary that N (x; Ω) is not a linear subspace of / N (x; Ω). By the definition of the normal Rn . Then there exists v ∈ N (x; Ω) such that −v ∈ cone, we have v, x − x ≤ 0 for all x ∈ Ω, and there exists x0 ∈ Ω such that −v, x0 − x > 0. It follows that v, x ≤ v, x for all x ∈ Ω with v, x0  < v, x. Thus the sets Ω and {x} can be properly separated by a hyperplane. / ri Ω, which is a contradiction. To verify the converse Using Theorem 2.40 brings us to x ∈


implication, it suffices to assume by contradiction that x ∈ / ri Ω and employ Theorem 2.40 / N (x; Ω). This contradicts the again to see that there exists v ∈ N (x; Ω) such that −v ∈  assumption that N (x; Ω) is a linear subspace of Rn and thus completes the proof. The next proposition allows us to characterize the extremality and separation of convex sets in terms of their normal cones. Proposition 2.52 Let Ω1 and Ω2 be convex sets in Rn with x ∈ Ω1 ∩ Ω2 . Then the following assertions are equivalent: (i) {Ω1 , Ω2 } forms an extremal system. (ii) The sets Ω1 and Ω2 can be separated by a hyperplane.   (iii) N (x; Ω1 ) ∩ − N (x; Ω2 ) = {0}. Proof The equivalence of (i) and (ii) follows from Theorem 2.44. Suppose that (ii) is satisfied. By definition, there exits v = 0 such that v, x ≤ v, y whenever x ∈ Ω1 , y ∈ Ω2 . Since x ∈ Ω2 , we have v, x ≤ v, x, and thus v, x − x ≤ 0 for all x ∈ Ω1 , i.e., v ∈ N (x; Ω1 ). Similarly, we can verify the inclusion −v ∈ N (x; Ω2 ) and hence arrive at (iii).   Suppose further that (iii) is satisfied, and let 0 = v ∈ N (x; Ω1 ) ∩ − N (x; Ω2 ) . Then (2.8) follows directly from the definition of the normal cone. Indeed, for any x ∈ Ω1 and y ∈ Ω2 , we have v, x − x ≤ 0 and −v, y − x ≤ 0, which imply that v, x − x ≤ 0 ≤ v, y − x, and hence v, x ≤ v, y. Therefore, we obtain (ii) and complete the proof of the proposition.  Now we provide characterizations of the normal cone nontriviality. Corollary 2.53 Let Ω be a convex subset of Rn with x ∈ Ω. Then we have the following equivalent statements: (i) (ii) (iii) (iv)

(i) x is a boundary point of Ω.
(ii) The sets Ω and {x} form an extremal system.
(iii) The sets Ω and {x} can be separated by a hyperplane.
(iv) N (x; Ω) ≠ {0}.


Proof The equivalence between (i) and (ii) follows from Example 2.43(ii). Let Ω1 = Ω and Ω2 = {x}. Since N (x; Ω2 ) = Rn , the rest of the proof is a consequence of Proposition 2.52    since N (x; Ω1 ) ∩ − N (x; Ω2 ) = N (x; Ω). The next corollary gives us a sufficient condition for the nontriviality of the normal cone via relative interior. / ri Ω, then we have the nontriviality Corollary 2.54 Let Ω be a convex set with x ∈ Ω. If x ∈ condition N (x; Ω) = {0}. Proof Suppose that x ∈ / ri Ω. Theorem 2.40 tells us that the sets Ω and {x} can be properly separated by a hyperplane. Thus they can be separated by a hyperplane, so N (x; Ω) = {0} by Corollary 2.53.  We start calculus rules by deriving simple albeit useful results. Proposition 2.55 (i) Let Ω1 and Ω2 be convex subsets of Rn and R p , respectively. For (x 1 , x 2 ) ∈ Ω1 × Ω2 , we have   N (x 1 , x 2 ); Ω1 × Ω2 = N (x 1 ; Ω1 ) × N (x 2 ; Ω2 ).

(2.22)

(ii) Let Ω1 and Ω2 be convex subsets of Rn with x i ∈ Ωi for i = 1, 2. Then N (x 1 + x 2 ; Ω1 + Ω2 ) = N (x 1 ; Ω1 ) ∩ N (x 2 ; Ω2 ). Proof To verify (i), fix (v1 , v2 ) ∈ N ((x 1 , x 2 ); Ω1 × Ω2 ) and get that ! (v1 , v2 ), (x1 , x2 ) − (x 1 , x 2 ) = v1 , x1 − x 1  + v2 , x2 − x 2  ≤ 0

(2.23)

whenever (x1 , x2 ) ∈ Ω1 × Ω2 . Putting x2 = x 2 in (2.23) gives us v1 , x1 − x 1  ≤ 0 for all x1 ∈ Ω1 meaning that v1 ∈ N (x 1 ; Ω1 ). Similarly, we obtain v2 ∈ N (x 2 ; Ω2 ) and justify the inclusion “⊂” in (2.22). The proof of the reverse one is straightforward. To verify (ii), fix v ∈ N (x 1 + x 2 ; Ω1 + Ω2 ) and get by definition that ! v, x1 + x2 − (x 1 + x 2 ) ≤ 0 whenever x1 ∈ Ω1 , x2 ∈ Ω2 . Putting there x1 = x 1 first and then x = x2 gives us v ∈ N (x 1 ; Ω1 ) ∩ N (x 2 ; Ω2 ), which justifies the inclusion “⊂” in (ii). The proof of reverse inclusion is also straightforward. 
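The defining inequality (2.21) is easy to test numerically. The following sketch is purely illustrative (the set, the reference point, and all names are chosen arbitrarily); it checks which vectors belong to the normal cone to the unit square at the corner (1, 1), in the spirit of Example 2.49(iii).

import numpy as np

def in_normal_cone(v, x_bar, points, tol=1e-12):
    """Check <v, x - x_bar> <= 0 over sampled points x of the convex set."""
    return bool(np.all((points - x_bar) @ v <= tol))

# Omega = [0, 1]^2 sampled on a grid, reference point x_bar = (1, 1).
xs = np.linspace(0.0, 1.0, 21)
points = np.array([(a, b) for a in xs for b in xs])
x_bar = np.array([1.0, 1.0])

print(in_normal_cone(np.array([1.0, 2.0]), x_bar, points))    # True:  v has nonnegative components
print(in_normal_cone(np.array([1.0, -0.5]), x_bar, points))   # False: second component is negative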


Fig. 2.4 Normal cone to set intersection

Now we are ready to derive the major representation of the normal cone to intersections of convex sets; see an illustration in Fig. 2.4. Note that the proof of this result, as well as the subsequent calculus rules of generalized differentiation for convex functions and set-valued mappings, follows the geometric pattern of variational analysis as in [15], while the specific features of convexity allow us to essentially simplify the proof and to avoid the closedness requirement on sets and the corresponding lower semicontinuity assumptions on functions in subdifferential calculus. Theorem 2.56 Let Ω1 and Ω2 be convex sets in Rn satisfying the relative interior qualification condition ri Ω1 ∩ ri Ω2 ≠ ∅.

(2.24)

Then we have the normal cone intersection rule N(x; Ω1 ∩ Ω2) = N(x; Ω1) + N(x; Ω2) for all x ∈ Ω1 ∩ Ω2 .

(2.25)

Proof Fixing x ∈ Ω1 ∩ Ω2 , it suffices to verify the inclusion “⊂” since the reverse inclusion is obvious. Taking any v ∈ N (x; Ω1 ∩ Ω2 ), we get by the normal cone definition that v, x − x ≤ 0 for all x ∈ Ω1 ∩ Ω2 . Consider the two convex sets Θ1 = Ω1 × [0, ∞) and    Θ2 = (x, λ) ∈ Rn × R  x ∈ Ω2 , λ ≤ v, x − x . Then ri Θ1 = ri Ω1 × (0, ∞) and it follows from Theorem 2.45 (see Exercise 2.19) that    ri Θ2 = (x, λ) ∈ Rn × R  x ∈ ri Ω2 , λ < v, x − x .


Assuming the contrary, it is easy to check that ri Θ1 ∩ ri Θ2 = ∅; see Exercise 2.24. Then applying Theorem 2.40 to these convex sets in Rn+1 gives us a nonzero pair (w, γ) ∈ Rn × R such that w, x − γλ1 ≤ w, y − γλ2 for all (x, λ1 ) ∈ Θ1 , (y, λ2 ) ∈ Θ2 .

(2.26)

Moreover, there exist ( x, λ1 ) ∈ Θ1 and ( y,  λ2 ) ∈ Θ2 satisfying w,  x  − γ λ1 < w,  y − γ λ2 . Observe that γ ≥ 0. Indeed, it follows from (2.26) with (x, 1) ∈ Θ1 and (x, 0) ∈ Θ2 that −γ ≤ 0, so γ ≥ 0. Let us show by using (2.24) and (2.26) that γ > 0. Supposing again on the contrary that γ = 0, we get the conditions w, x ≤ w, y for all x ∈ Ω1 , y ∈ Ω2 , y ∈ Ω2 . w,  x  < w,  y with  x ∈ Ω1 ,  This means that Ω1 and Ω2 can be properly separated, which tells us by Theorem 2.40 that ri Ω1 ∩ ri Ω2 = ∅, a contradiction verifying that γ > 0. Deduce from (2.26), by taking into account that (x, 0) ∈ Θ1 if x ∈ Ω1 and that (x, 0) ∈ Θ2 , the inequality w, x ≤ w, x for all x ∈ Ω1 . This implies therefore that w ∈ N (x; Ω1 ) and so w/γ ∈ N (x; Ω1 ). Moreover, we get from (2.26), due to (x, 0) ∈ Θ1 and (y, α) ∈ Θ2 for all y ∈ Ω2 with α = v, y − x, that ! ! w, x ≤ w, y − γv, y − x whenever y ∈ Ω2 . Dividing both sides therein by γ, we arrive at the relationship " # w v − , y − x ≤ 0 for all y ∈ Ω2 , γ and thus v −

w/γ ∈ N (x; Ω2 ). This gives us

v ∈ w/γ + N (x; Ω2 ) ⊂ N (x; Ω1 ) + N (x; Ω2 )

and completes therefore the proof of (2.25).



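As a simple illustration of Theorem 2.56, let Ω1 = {x ∈ R2 | x1 ≤ 0} and Ω2 = {x ∈ R2 | x2 ≤ 0} with x = (0, 0). Here ri Ω1 ∩ ri Ω2 ≠ ∅, and indeed N (x; Ω1 ∩ Ω2) = [0, ∞) × [0, ∞) = ([0, ∞) × {0}) + ({0} × [0, ∞)) = N (x; Ω1) + N (x; Ω2).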
The following example shows that the intersection rule (2.25) does not hold generally without assuming the qualification condition (2.24).


Example 2.57 Let Ω1 = {(x, λ) ∈ R2 | λ ≥ x 2 }, and let Ω2 = {(x, λ) ∈ R2 | λ ≤ −x 2 }. Then for x = (0, 0), we have N (x; Ω1 ∩ Ω2 ) = R2 while N (x; Ω1 ) + N (x; Ω2 ) = {0} × R. Next we present more easily verifiable conditions ensuring the fulfillment of the relative interior qualification condition (2.24). Proposition 2.58 Let Ω1 and Ω2 be two convex sets in Rn with nonempty intersection. Consider the following conditions:   (i) N (u; Ω1 ) ∩ − N (u; Ω2 ) = {0} for some u ∈ Ω1 ∩ Ω2 .   (ii) N (u; Ω1 ) ∩ − N (u; Ω2 ) = {0} for every u ∈ Ω1 ∩ Ω2 . (iii) (int Ω1 ) ∩ Ω2 = ∅ or Ω1 ∩ (int Ω2 ) = ∅. Then we have the relationships (iii) =⇒ (ii) ⇐⇒ (i) =⇒ (2.24). Proof Suppose that the condition (int Ω1 ) ∩ Ω2 = ∅ in (iii) is satisfied. Then we show that (ii) is satisfied. Indeed, suppose on the contrary that (ii) fails. Then N (u; Ω1 ) ∩   − N (u; Ω2 ) = {0} for some u ∈ Ω1 ∩ Ω2 . It follows from Proposition 2.52 that Ω1 and Ω2 can be separated by a hyperplane. Let v ∈ Rn be a nonzero element such that v, x ≤ v, y whenever x ∈ Ω1 , y ∈ Ω2 .   Choosing x ∈ int Ω1 ∩ Ω2 gives us δ > 0 with B(x; δ) ⊂ Ω1 and x ∈ Ω2 . Thus we have the inequality v, x + δx ≤ v, x whenever x ≤ 1. Then δv, x ≤ 0 whenever x ≤ 1, so taking x = v/ v yields δ v ≤ 0 and hence v = 0. This contradiction shows that (ii) is satisfied. Implication (ii)=⇒(i) is obvious.   Suppose now that (i) is satisfied, i.e., N (u; Ω1 ) ∩ − N (u; Ω2 ) = {0} for some u ∈ Ω1 ∩ Ω2 . If on the contrary ri Ω1 ∩ ri Ω2 = ∅, then Ω1 and Ω2 can be properly separated by a hyperplane. Thus these sets can be separated by a hyperplane in the sense of (2.8). Using   Proposition 2.52 yields N (u; Ω1 ) ∩ − N (u; Ω2 ) = {0}, which is a contradiction. Thus the relative interior qualification condition (2.24) holds under (i). Finally, the implication (i)=⇒(ii) is left as an exercise for the reader.  Employing induction arguments allows us to derive a counterpart of Theorem 2.56 for finitely many sets.


Corollary 2.59 Let Ωi ⊂ Rn for i = 1, . . . , m be convex sets with x ∈ ⋂_{i=1}^{m} Ωi . Assume the fulfillment of one of the following conditions:

(i) The relative interior qualification condition: ⋂_{i=1}^{m} ri Ωi ≠ ∅.
(ii) The normal cone qualification condition:
[ ∑_{i=1}^{m} vi = 0, vi ∈ N (x; Ωi ) ] =⇒ [ vi = 0 for all i = 1, . . . , m ].

Then we have the intersection formula

N( x; ⋂_{i=1}^{m} Ωi ) = ∑_{i=1}^{m} N (x; Ωi ).

2.6 Exercises for This Chapter

Exercise 2.1 Let the set {v1 , . . . , vm }, m ≥ 1, consist of affinely independent elements, and let v ∉ aff{v1 , . . . , vm }. Show that v1 , . . . , vm , v are affinely independent.

Exercise 2.2 Suppose that Ω is a convex subset of Rn with dim Ω = m, m ≥ 1. Let the set {v1 , . . . , vm } ⊂ Ω consist of affinely independent elements, and let Δm = co{v1 , . . . , vm }. Prove that aff Ω = aff Δm = aff{v1 , . . . , vm }.

Exercise 2.3 Give detailed proofs of assertions (i) and (ii) in Proposition 2.4.

Exercise 2.4 Prove the general statement of Lemma 2.19.

Exercise 2.5 Let Ω be a nonempty convex subset of Rn .
(i) Prove that the affine hull of Ω coincides with the affine hull of its closure.
(ii) Prove that for any x̄ ∈ ri Ω and x ∈ Ω there is t > 0 such that x̄ + t(x̄ − x) ∈ Ω.
(iii) Prove Proposition 2.23(ii).

Exercise 2.6 Let f : Rn → R be a function. Prove that aff(epi f ) = aff(dom f ) × R.

Exercise 2.7 (i) Find two convex sets Ω1 and Ω2 in Rn such that Ω1 ⊂ Ω2 while ri Ω1 ⊄ ri Ω2 .


(ii) Prove that if Ω1 and Ω2 are two convex sets in Rn such that Ω1 ⊂ Ω2 and Ω1 ∩ ri Ω2 = ∅, then ri Ω1 ⊂ ri Ω2 . Exercise 2.8 Let Ω1 be a subset of Rn , and let Ω2 be a subset of R p . (i) Prove that aff(Ω1 × Ω2 ) = aff Ω1 × aff Ω2 . (ii) Prove that ri(Ω1 × Ω2 ) = ri Ω1 × ri Ω2 . Exercise 2.9 Prove Corollary 2.27. Exercise 2.10 Give an example of two nonempty closed convex sets Ω1 and Ω2 for which Ω1 − Ω2 is not closed. Exercise 2.11 Let Ω1 and Ω2 be nonempty convex sets in Rn . Prove that Ω1 and Ω2 can be strictly separated if and only if 0 ∈ / Ω1 − Ω2 . / Ω, show that Exercise 2.12 (i) Let Ω ⊂ Rn be a nonempty closed convex cone. Given x ∈ there exists an element v ∈ Rn such that v, x ≤ 0 for all x ∈ Ω and that v, x > 0. / Ω. Prove that there exists an element (ii) Let Ω be a linear subspace of Rn , and let x ∈ n v ∈ R such that v, x = 0 for all x ∈ Ω and v, x > 0. Exercise 2.13 Let f : Rn → R be a convex function attaining its minimum at some point x ∈ dom f . Prove that the sets Ω1 and Ω2 , which are defined by Ω1 = epi f and Ω2 = Rn × { f (x)}, form an extremal system. Exercise 2.14 Consider the constrained optimization problem: minimize ϕ(x) subject to x ∈ Ω, where ϕ : Rn → R is finite at x ∈ Ω. Assume that x is a minimizer of the formulated problem, i.e., ϕ(x) ≤ ϕ(x) for all x ∈ Ω. Show that {Ω1 , Ω2 } forms an extremal system in Rn+1 , where the sets Ω1 , Ω2 are defined by   Ω1 = epi ϕ and Ω2 = Ω × ϕ(x) . Exercise 2.15 Let Ω be a convex set with x ∈ Ω. (i) Prove that if the sets Ω and {x} form an extremal system, then x ∈ bd Ω. (ii) Provide a counterexample to show that the converse implication in Corollary 2.54 is not true in general.


Exercise 2.16 Let f : Rn → R be a convex function, and let α ∈ R. Consider the convex sublevel set    Lα ( f ) = x ∈ Rn  f (x) ≤ α . (i) Suppose that there exists x0 ∈ ri(dom f ) such that f (x0 ) < α. Prove that    ri Lα ( f ) = x ∈ ri(dom f )  f (x) < α .

(2.27)

(ii) Use a counterexample to show that equality (2.27) does not hold in general. → R p be a convex set-valued mapping such that Exercise 2.17 (i) Let F : Rn → int(gph F) = ∅. Prove that    int(gph F) = (x, y) ∈ Rn × R p  x ∈ int(dom F), y ∈ int F(x) . Does this equality still hold without assuming that int(gph F) = ∅? (ii) Let f : Rn → R be a convex function such that int(epi f ) = ∅. Prove that    int(epi f ) = (x, t) ∈ Rn+1  x ∈ int(dom f ), f (x) < t . Does this equality still hold without assuming that int(epi f ) = ∅? → R p be a convex set-valued mapping. Prove that if (x, y) ∈ Exercise 2.18 Let F : Rn → ri(gph F), then y ∈ ri(rge F). Exercise 2.19 Let Ω ⊂ Rn be a nonempty convex set. Define the set    Θ = (x, λ) ∈ Rn+1  x ∈ Ω, λ ≤ a, x + b , where a ∈ Rn and b ∈ R. Prove that    ri Θ = (x, λ) ∈ Rn+1  x ∈ ri Ω, λ < a, x + b .  Exercise 2.20 Let Ω = {x ∈ Rn  x  0}. Prove that    N (x; Ω) = v ∈ Rn  v  0, v, x = 0 for every x ∈ Ω. Exercise 2.21 Let f : Rn → R be a C 1 convex function, and let x ∈ Rn . Prove that       N (x, f (x)); epi f = λ ∇ f (x), −1  λ ≥ 0 . Exercise 2.22 Prove implication (i)=⇒(ii) in Proposition 2.58.


Exercise 2.23 Let Ωi for i = 1, . . . , m (m ≥ 2) be convex subsets of Rn with nonempty intersection and such that ⋂_{i=1}^{m−1} ri Ωi ≠ ∅ and ⋂_{i=1}^{m} ri Ωi = ∅. Prove that there are elements vi in Rn , not equal to zero simultaneously, and real numbers αi for i = 1, . . . , m such that

⟨vi , x⟩ ≤ αi for all x ∈ Ωi , i = 1, . . . , m, ∑_{i=1}^{m} vi = 0, and ∑_{i=1}^{m} αi = 0.

Exercise 2.24 In the setting of Theorem 2.56 and its proof, show that ri Θ1 ∩ ri Θ2 = ∅. Exercise 2.25 Use induction to prove Corollary 2.59.

3 Convex Generalized Differentiation

This chapter is devoted to basic constructions of convex generalized differentiation. Using the separation theorems and the intersection rule for normals to convex sets, we develop basic calculus results for subdifferentials of extended-real-valued convex functions and coderivatives of convex set-valued mappings/multifunctions.

3.1 Subgradients of Convex Functions

In this section, we define and study the concept of subdifferential (or the collection of subgradients) for extended-real-valued convex functions. This notion of generalized derivative is one of the most fundamental in convex analysis with a variety of important applications. The revolutionary idea behind the subdifferential concept, which distinguishes it from other notions of generalized derivatives in mathematics, is its set-valuedness as the indication of nonsmoothness of a function around the reference individual point. It not only provides many advantages and flexibility but also creates certain difficulties in developing its calculus and applications. The major techniques of subdifferential analysis revolve around convex separation of sets. Definition 3.1 Let f : Rn → R be a convex function with x ∈ dom f . An element v ∈ Rn is called a subgradient of f at x if v, x − x ≤ f (x) − f (x) for all x ∈ Rn .

(3.1)

The collection of all the subgradients of f at x is called the subdifferential of the function at this point and is denoted by ∂ f (x).
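Inequality (3.1) can be checked directly in simple cases. The sketch below is only a numerical illustration (the function, sample points, and names are chosen arbitrarily); it tests candidate subgradients of f (x) = |x| at x = 0, for which ∂ f (0) = [−1, 1] (cf. Example 3.11 below).

import numpy as np

def is_subgradient(v, f, x_bar, samples, tol=1e-12):
    """Test the subgradient inequality v*(x - x_bar) <= f(x) - f(x_bar) on samples."""
    return all(v * (x - x_bar) <= f(x) - f(x_bar) + tol for x in samples)

f = abs
samples = np.linspace(-5.0, 5.0, 201)

print(is_subgradient(0.5, f, 0.0, samples))   # True:  0.5 lies in [-1, 1]
print(is_subgradient(1.2, f, 0.0, samples))   # False: (3.1) fails, e.g. at x = 1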


Fig. 3.1 Subdifferential of the absolute value function at the origin

The following proposition shows that the subdifferential ∂ f (x) can be geometrically defined via the normal cone to the epigraph of f at (x, f (x)); see an illustration in Fig. 3.1. Proposition 3.2 For any convex function f : Rn → R and any x ∈ dom f , we have the subdifferential representation     ∂ f (x) = v ∈ Rn  (v, −1) ∈ N (x, f (x)); epi f . (3.2) Proof Fix v ∈ ∂ f (x) and (x, λ) ∈ epi f . Since λ ≥ f (x), by (3.1) we have   (v, −1), (x, λ) − (x, f (x)) = v, x − x − (λ − f (x)) ≤ v, x − x − ( f (x) − f (x)) ≤ 0. This readily implies that (v, −1) ∈ N ((x, f (x)); epi f ). To verify the reverse inclusion in (3.2), fix v ∈ Rn with (v, −1) ∈ N ((x, f (x)); epi f ). It suffices to verify the inequality in (3.1) when x ∈ dom f since this inequality holds obviously if x ∈ / dom f , i.e., f (x) = ∞. For any x ∈ dom f , we have (x, f (x)) ∈ epi f . Thus   v, x − x − ( f (x) − f (x)) = (v, −1), (x, f (x)) − (x, f (x)) ≤ 0, which shows that v ∈ ∂ f (x) and so justifies representation (3.2).



The next result not only ensures the subdifferentiability, i.e., nonemptiness of the subdifferential, of convex functions but also justifies the compactness of the subdifferential at interior points of the domain. Proposition 3.3 For any convex function f : Rn → R and for any x ∈ ri(dom f ), we have that ∂ f (x) ≠ ∅. If x ∈ int(dom f ), then the subdifferential ∂ f (x) is compact in Rn .


Proof Suppose that x ∈ ri(dom f ). Since (x, f (x)) ∈ / ri(epi f ) by Corollary 2.46, we can properly separate (x, f (x)) and epi f by a hyperplane. This gives us a nonzero pair (v, −μ) ∈ Rn × R such that v, x − μλ ≤ v, x − μ f (x) whenever (x, λ) ∈ epi f .

(3.3)

Moreover, there exists (x0 , λ0 ) ∈ epi f for which v, x0  − μλ0 < v, x − μ f (x). Using (3.3) with x = x and λ = f (x) + 1 yields μ ≥ 0. Let us now show that μ > 0. Assuming on the contrary that μ = 0, we get the conditions v, x ≤ v, x for all x ∈ dom f , and v, x0  < v, x, which mean that x and dom f can be properly separated by a hyperplane, and therefore x∈ / ri(dom f ) by Theorem 2.40. This contradiction shows that μ > 0. The latter allows us to derive from (3.3), by dividing both sides of the inequality by μ and rearranging the terms, that  v , x − x ≤ λ − f (x) whenever (x, λ) ∈ epi f . μ Since (x, f (x)) ∈ epi f for x ∈ dom f , this implies by the definition of subgradients that v/μ ∈ ∂ f (x). Thus ∂ f (x) is nonempty in this case and also in the case where x ∈ int(dom f ) because ri(dom f ) = int(dom f ) if int(dom f ) = ∅. Let us now verify the boundedness of the subdifferential. It follows from Corollary 1.62 that f is locally Lipschitz continuous around x with some constant , i.e., there exists a positive number δ such that | f (x) − f (y)| ≤  x − y for all x, y ∈ B(x; δ). Thus for any subgradient v ∈ ∂ f (x) and any element h ∈ Rn with h ≤ δ, we have the inequality v, h ≤ f (x + h) − f (x) ≤  h , which implies that v ≤  by choosing h = δv/ v . The closedness of ∂ f (x) follows directly from the definition, so we get the compactness of the subdifferential.  Next we recall standard notions of local and global minimizers of functions. Definition 3.4 Let f : Rn → R be an extended-real-valued function. (i) We say that f has a local minimum at x ∈ Rn if there is δ > 0 with f (x) ≤ f (x) for all x ∈ B(x; δ).


(ii) We say that f has a global/absolute minimum at x ∈ Rn if f (x) ≤ f (x) for all x ∈ Rn . A remarkable feature of convex functions is that every local minimizer for a convex function gives also the global/absolute minimum to the function. Proposition 3.5 Let f : Rn → R be a convex function. If f has a local minimum at x ∈ Rn , then f has a global minimum at this point. Proof If x is a local minimizer of f , then f (x) ≤ f (x) for all x ∈ B(x; δ) for some δ > 0. Fix x ∈ Rn and construct the sequence

xk = (1 − 1/k) x̄ + (1/k) x, k ∈ N.

Observe that xk ∈ B(x̄; δ) when k is sufficiently large. Taking such a number k, it follows from the convexity of f that

f (x̄) ≤ f (xk) ≤ (1 − 1/k) f (x̄) + (1/k) f (x),

which readily implies that (1/k) f (x̄) ≤ (1/k) f (x), and hence f (x̄) ≤ f (x). Therefore, f has a global minimum at x̄.

The Fermat stationary rule of classical analysis tells us that if f : Rn → R is Fréchet differentiable at x̄ and attains its local minimum at this point, then ∇ f (x̄) = 0. The following proposition gives a subdifferential counterpart of this result for general convex functions. Its proof is a direct consequence of the definitions and Proposition 3.5.

Proposition 3.6 Let f : Rn → R be a convex function with x̄ ∈ dom f . Then f has a local/global minimum at x̄ if and only if 0 ∈ ∂ f (x̄).

Proof Suppose that f has a global minimum at x̄. Then f (x̄) ≤ f (x) for all x ∈ Rn , which can be rewritten as follows: 0 = ⟨0, x − x̄⟩ ≤ f (x) − f (x̄) for all x ∈ Rn .


This holds if and only if 0 ∈ ∂ f (x) by the subdifferential definition.
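For instance, the function f (x) = |x| attains its global minimum at x = 0, and indeed 0 ∈ ∂ f (0) = [−1, 1] as computed in Example 3.11 below, while 0 ∉ ∂ f (x) = {x/|x|} for every x ≠ 0.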




Along with representing the subdifferential via the normal cone in (3.2), there is the following simple way to express the normal cone to a convex set via the subdifferential of the indicator function. Example 3.7 Given a convex set Ω ⊂ Rn with x ∈ Ω, recall that the indicator function associated with Ω is given by 0 if x ∈ Ω, f (x) = δ(x; Ω) = (3.4) ∞ otherwise. Then epi f = Ω × [0, ∞), and we have by Proposition 2.55(i) that   N (x, f (x)); epi f = N (x; Ω) × (−∞, 0], which readily implies by Proposition 3.2 the relationship ∂ f (x) = N (x; Ω).

(3.5)

Next we recall the classical derivative notions for functions on Rn .

Definition 3.8 Let f : Rn → R be a function.

(i) We say that f is Gâteaux differentiable at x̄ ∈ Rn if x̄ ∈ int(dom f ) and there is v ∈ Rn such that for any d ∈ Rn we have

lim_{t→0} ( f (x̄ + td) − f (x̄) − t⟨v, d⟩ ) / t = 0.

In this case, the element v is unique and is called the Gâteaux derivative of f at x̄ denoted by f G (x̄).

(ii) We say that f is Fréchet differentiable at x̄ ∈ Rn if x̄ ∈ int(dom f ) and there exists an element v ∈ Rn such that

lim_{x→x̄} ( f (x) − f (x̄) − ⟨v, x − x̄⟩ ) / ‖x − x̄‖ = 0.

In this case, the element v is unique and is called the Fréchet derivative of f at x̄ denoted by f F (x̄) or ∇ f (x̄). It follows from the definitions that if f is Fréchet differentiable at x̄, then it is Gâteaux differentiable at this point. The converse implication holds for convex functions defined on Rn as we will see in Sect. 5.1.
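As a small numerical illustration of Definition 3.8 (the function, the point, and the step sizes below are chosen arbitrarily), the difference quotients of the smooth convex function f (x) = ‖x‖² approach ⟨v, d⟩ with v = 2x̄:

import numpy as np

f = lambda x: float(np.dot(x, x))        # f(x) = ||x||^2, Gateaux derivative v = 2*x_bar
x_bar = np.array([1.0, -2.0])
v = 2.0 * x_bar

rng = np.random.default_rng(1)
d = rng.normal(size=2)
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    quotient = (f(x_bar + t * d) - f(x_bar)) / t
    print(t, quotient - np.dot(v, d))    # the gap shrinks like t * ||d||^2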


Now we show that the subdifferential (3.1) is in fact a singleton for any Gâteaux differentiable function and is reduced to the Gâteaux derivative at the reference point. Proposition 3.9 Let f : Rn → R be convex and Gâteaux differentiable at x ∈ Rn . Then we have ∂ f (x) = { f G (x)} and  f G (x), x − x ≤ f (x) − f (x) for all x ∈ Rn .

(3.6)

Proof Suppose that f is Gâteaux differentiable at x and denote v = f G (x). Since f is convex, for any x ∈ Rn and t ∈ (0, 1) we have f (x + t(x − x)) = f (t x + (1 − t)x) ≤ t f (x) + (1 − t) f (x) = f (x) + t( f (x) − f (x)). Thus using the Gâteaux differentiability of f at x gives us v, x − x = lim

_{t→0+} ( f (x̄ + t(x − x̄)) − f (x̄) ) / t ≤ f (x) − f (x̄) for all x ∈ Rn ,

which yields (3.6) and v ∈ ∂ f (x). Picking w ∈ ∂ f (x), we get w, x − x ≤ f (x) − f (x) for all x ∈ Rn . This readily implies that w, d =

⟨w, (x̄ + td) − x̄⟩ / t ≤ ( f (x̄ + td) − f (x̄) ) / t for all d ∈ Rn , t > 0.

Letting t → 0+ tells us that w, d ≤ v, d, i.e., w − v, d ≤ 0 for all d ∈ Rn . Using the latter with d = w − v verifies that w = v, which yields the equality ∂ f (x) = {v} = { f G (x)} and thus completes the proof of the proposition.  Corollary 3.10 Let f : Rn → R be a convex function. Suppose that f is Gâteaux differentiable at x ∈ Rn . Then the strict convexity of f implies that  f G (x), x − x < f (x) − f (x) whenever x = x. Proof Since f is convex, we get from Proposition 3.9 that  f G (x), u − x ≤ f (u) − f (x) for all u ∈ Rn . Fix x = x with x ∈ dom f , and let u = (x + x)/2. It follows that

⟨ f G (x̄), (x + x̄)/2 − x̄ ⟩ ≤ f ( (x + x̄)/2 ) − f (x̄) < (1/2) f (x) + (1/2) f (x̄) − f (x̄).

(3.7)


This brings us to the strict inequality (3.7) (valid even if x ∈ / dom f ) and thus completes the proof of the corollary.  A simple while very important example of a nonsmooth convex function is the norm function on Rn . Here is the calculation of its subdifferential. Example 3.11 Let p(x) = x , x ∈ Rn , be the Euclidean norm function on Rn . Then we have the formula ⎧ ⎨B if x = 0, ∂ p(x) =  x  ⎩ otherwise. x To verify this, observe first that the Euclidean norm function p is Fréchet differentiable at any nonzero point with ∇ p(x) = x/ x for x = 0. Thus  x  for x = 0. ∂ p(x) = x It remains to calculate its subdifferential at x = 0. To proceed by (3.1), we have v ∈ ∂ p(0) if and only if v, x = v, x − 0 ≤ p(x) − p(0) = x for all x ∈ Rn . Letting x = v gives us v, v ≤ v , which implies that v ≤ 1, i.e., v ∈ B. Now take v ∈ B and deduce from the Cauchy-Schwarz inequality that v, x − 0 = v, x ≤ v · x ≤ x = p(x) − p(0) for all x ∈ Rn yielding v ∈ ∂ p(0), which shows that ∂ p(0) = B. The next theorem calculates the subdifferential of the distance function. Theorem 3.12 Let Ω be a nonempty closed convex subset of Rn . Then we have the subdifferential formula ⎧ ⎨ N (x; Ω) ∩ B if x ∈ Ω, ∂d(x; Ω) =  x − Π (x; Ω)  ⎩ otherwise. d(x; Ω) Proof Consider the case where x ∈ Ω and fix any element w ∈ ∂d(x; Ω). It follows from the definition of the subdifferential that w, x − x ≤ d(x; Ω) − d(x; Ω) = d(x; Ω) for all x ∈ Rn .

(3.8)

Since the distance function d(·; Ω) is Lipschitz continuous with constant  = 1, it immediately follows that


w, x − x ≤ x − x for all x ∈ Rn . This implies that w ≤ 1, i.e., w ∈ B. We also deduce from (3.8) that w, x − x ≤ 0 for all x ∈ Ω. Thus w ∈ N (x; Ω), which verifies that w ∈ N (x; Ω) ∩ B. To prove the reverse inclusion, pick w ∈ N (x; Ω) ∩ B. Then w ≤ 1 and w, u − x ≤ 0 for all u ∈ Ω. Thus for every x ∈ Rn and every u ∈ Ω we have w, x − x = w, x − u + u − x = w, x − u + w, u − x ≤ w, x − u ≤ w · x − u ≤ x − u . Since u is taken arbitrarily in Ω, this implies that w, x − x ≤ d(x; Ω) = d(x; Ω) − d(x; Ω), and hence w ∈ ∂d(x; Ω). / Ω, take z = Π (x; Ω), and fix w ∈ ∂d(x; Ω). Then Now consider the case where x ∈ w, x − x ≤ d(x; Ω) − d(x; Ω) = d(x; Ω) − x − z ≤ x − z − x − z for all x ∈ Rn . Denoting p(x) = x − z , x ∈ Rn , tells us that w, x − x ≤ p(x) − p(x) for all x ∈ Rn , which ensures that w ∈ ∂ p(x) = {(x − z)/ x − z }. Let us show next that w = (x − z)/ x − z is a subgradient of d(·; Ω) at x. Indeed, for any x ∈ Rn denote px = Π (x; Ω) and get w, x − x = w, x − z + w, z − x = w, x − z − x − z = w, x − px  + w, px − z − x − z . Since z = Π (x; Ω), it follows from Proposition 1.69 that x − z, px − z ≤ 0, and so we have w, px − z ≤ 0. Using the fact that w = 1 and the Cauchy-Schwarz inequality gives us w, x − x = w, x − px  + w, px − z − x − z ≤ w · x − px − x − z = x − px − x − z = d(x; Ω) − d(x; Ω) for all x ∈ Rn . Thus we arrive at w ∈ ∂d(x; Ω) and complete the proof of the theorem.




Now we introduce the notion of the singular (or horizon) subdifferential of extendedreal-valued convex functions. Note that, in contrast to the subdifferential of convex analysis defined earlier, the singular subdifferential is not related to the classical derivative for differentiable functions. As we will see below, this construction is nontrivial only for functions that are not Lipschitz continuous around the reference point. Definition 3.13 Let f : Rn → R be a convex function, and let x ∈ dom f . The singular subdifferential of the function f at x is defined by     ∂ ∞ f (x) := v ∈ Rn  (v, 0) ∈ N (x, f (x)); epi f . (3.9) The following example illustrates the calculation of ∂ ∞ f (x) for the case of the set indicator function. Example 3.14 Consider the indicator function in Example 3.7. Then ∂ ∞ f (x) = N (x; Ω) for x ∈ Ω. In particular, for the function f (x) =

0 if |x| ≤ 1 and ∞ otherwise, we have epi f = [−1, 1] × [0, ∞). If x = 1, then N((x, f (x)); epi f ) = [0, ∞) × (−∞, 0], which shows that ∂∞ f (x) = [0, ∞); see Fig. 3.2. The next proposition significantly simplifies the calculation of the singular subdifferential ∂∞ f (x) and has important consequences.

Proposition 3.15 For a convex function f : Rn → R with x ∈ dom f , we have the representation ∂ ∞ f (x) = N (x; dom f ). Proof Fix any v ∈ ∂ ∞ f (x) with x ∈ dom f and observe that (x, f (x)) ∈ epi f . Then using (3.9) and the normal cone construction (2.21) gives us   v, x − x = v, x − x + 0 f (x) − f (x) ≤ 0,


Fig. 3.2 Singular subgradients

which shows that v ∈ N (x; dom f ). Conversely, suppose that v ∈ N (x; dom f ) and fix any (x, λ) ∈ epi f . Then we have f (x) ≤ λ, and so x ∈ dom f . Thus v, x − x + 0(λ − f (x)) = v, x − x ≤ 0, which implies that (v, 0) ∈ N ((x, f (x)); epi f ), i.e., v ∈ ∂ ∞ f (x).



Note that the indicator function f = δ(·; Ω) for which ∂ ∞ f (x) = {0} in Example 3.14 is highly non-Lipschitzian. Let us now show that in general the nontriviality condition ∂ ∞ f (x) = {0} fully characterizes the local Lipschitzian property of f around the reference point x. Proposition 3.16 Let f : Rn → R be a convex function with x ∈ dom f . Then f is locally Lipschitzian around x if and only if ∂ ∞ f (x) = {0}. Proof It is proved in Theorem 1.64 that the local Lipschitz continuity of f around x is equivalent to the interiority condition x ∈ int(dom f ). The latter condition clearly implies by Proposition 3.15 that ∂ ∞ f (x) = N (x; dom f ) = {0}. To verify the converse impli/ int(dom f ). Then x ∈ bd(dom f ), and hence cation, suppose on the contrary that x ∈ we have N (x; dom f ) = {0} by Corollary 2.53. This tells us that ∂ ∞ f (x) = {0} due to Proposition 3.15, which yields a contradiction and thus completes the proof. 

3.2 Subdifferential Calculus Rules

In this section, we derive basic calculus rules for the convex subdifferential (3.1). First observe the obvious subdifferential rule for scalar multiplication. Proposition 3.17 Let f : Rn → R be a convex function with x ∈ dom f , and let α > 0. Then we have the equality ∂(α f )(x) = α∂ f (x). Let us now establish the fundamental subdifferential sum rule that plays a crucial role in convex analysis and its applications. We give a geometric proof for this result reducing it to the intersection rule for normals to convex sets, which in turn is based on convex separation. Theorem 3.18 Let f i : Rn → R for i = 1, 2 be convex functions. Impose the relative interior qualification condition ri(dom f 1 ) ∩ ri(dom f 2 ) ≠ ∅.

(3.10)

Then we have the subdifferential sum rule ∂( f 1 + f 2 )(x) = ∂ f 1 (x) + ∂ f 2 (x) for all x ∈ dom f 1 ∩ dom f 2 .

(3.11)
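Before turning to the proof, here is a one-dimensional illustration: for f 1 (x) = |x| and f 2 (x) = max{x, 0} we have ∂ f 1 (0) = [−1, 1] and ∂ f 2 (0) = [0, 1], and the sum rule gives ∂( f 1 + f 2 )(0) = [−1, 1] + [0, 1] = [−1, 2], which can also be verified directly from Definition 3.1.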

Proof To prove the inclusion “⊂”, fix an arbitrary subgradient v ∈ ∂( f 1 + f 2 )(x), where x ∈ dom f 1 ∩ dom f 2 , and define two convex sets    Ω1 = (x, λ1 , λ2 ) ∈ Rn × R × R  λ1 ≥ f 1 (x) ,    Ω2 = (x, λ1 , λ2 ) ∈ Rn × R × R  λ2 ≥ f 2 (x) . Let us first verify the inclusion   (v, −1, −1) ∈ N (x, f 1 (x), f 2 (x)); Ω1 ∩ Ω2 .

(3.12)

For any (x, λ1 , λ2 ) ∈ Ω1 ∩ Ω2 , we have λ1 ≥ f 1 (x) and λ2 ≥ f 2 (x). Then v, x − x + (−1)(λ1 − f 1 (x)) + (−1)(λ2 − f 2 (x))   ≤ v, x − x − f 1 (x) + f 2 (x) − f 1 (x) − f 2 (x) ≤ 0, where the last estimate holds due to v ∈ ∂( f 1 + f 2 )(x). Thus (3.12) is satisfied. To proceed with employing the normal cone intersection rule in (3.12), let us first check that the assumed qualification condition (3.10) yields


ri Ω1 ∩ ri Ω2 ≠ ∅,

(3.13)

which is the qualification condition needed for the application of Theorem 2.56. Indeed, we have from Ω1 = epi f 1 × R and Corollary 2.46 that    ri Ω1 = (x, λ1 , λ2 ) ∈ Rn × R × R  x ∈ ri(dom f 1 ), λ1 > f 1 (x) . and observe similarly the representation    ri Ω2 = (x, λ1 , λ2 ) ∈ Rn × R × R  x ∈ ri(dom f 2 ), λ2 > f 2 (x) . Using the qualification condition (3.10), fix x0 ∈ ri(dom f 1 ) ∩ ri(dom f 2 ) and observe that (x0 , f 1 (x0 ) + 1, f 2 (x0 ) + 1) ∈ ri Ω1 ∩ ri Ω2 , which verifies (3.13). Applying Theorem 2.56 in (3.12) shows that     N (x, f 1 (x), f 2 (x)); Ω1 ∩ Ω2 = N (x, f 1 (x), f 2 (x)); Ω1   +N (x, f 1 (x), f 2 (x)); Ω2 , which implies in turn that (v, −1, −1) = (v1 , −γ1 , 0) + (v2 , 0, −γ2 ), where (v1 , −γ1 ) ∈ N ((x, f 1 (x)); epi f 1 ) and (v2 , −γ2 ) ∈ N ((x, f 1 (x)); epi f 2 ). Then we get γ1 = γ2 = 1 and v = v1 + v2 , where vi ∈ ∂ f i (x) for i = 1, 2. This verifies the inclusion “⊂” in the subdifferential sum rule of (3.11). The reverse inclusion therein follows easily from the definition of the subdifferential. Indeed, any subgradient v ∈ ∂ f 1 (x) + ∂ f 2 (x) can be represented as v = v1 + v2 , where vi ∈ ∂ f i (x) for i = 1, 2. Then we have v, x − x = v1 , x − x + v2 , x − x ≤ f 1 (x) − f 1 (x) + f 2 (x) − f 2 (x) = ( f 1 + f 1 )(x) − ( f 1 + f 2 )(x) for all x ∈ Rn , which readily implies that v ∈ ∂( f 1 + f 2 )(x) and therefore completes the proof of the theorem.  The corollary below follows directly from Proposition 2.58 applied to the sets Ω1 = dom f 1 and Ω2 = dom f 2 . Corollary 3.19 The subdifferential sum rule (3.11) is satisfied under each of the qualification conditions:   (i) N (u; dom f 1 ) ∩ − N (u; dom f 2 ) = {0} for some u ∈ dom f 1 ∩ dom f 2 . (ii) Either int(dom f 1 ) ∩ dom f 2 = ∅, or dom f 1 ∩ int(dom f 2 ) = ∅.


Let us present some useful consequences of Theorem 3.18. The fundamental sum rule for the (basic) subdifferential (3.1) in the following corollary is known as the MoreauRockafellar theorem. Corollary 3.20 Let f i : Rn → R for i = 1, 2 be convex functions. Assume that there exists u ∈ dom f 1 ∩ dom f 2 such that either f 1 is continuous at u, or f 2 is continuous at u. Then we have the equality (3.14) ∂( f 1 + f 2 )(x) = ∂ f 1 (x) + ∂ f 2 (x) for all x ∈ dom f 1 ∩ dom f 2 . Consequently, if both functions f i : Rn → R are real-valued convex functions on Rn , then the sum rule (3.14) holds for all x ∈ Rn . Proof Suppose for definiteness that f 1 is continuous at u ∈ dom f 1 ∩ dom f 2 , which implies by Theorem 1.64 that u ∈ int(dom f 1 ) ∩ dom f 2 . Thus the conclusion follows directly from Corollary 3.19.  The next corollary provides the subdifferential sum rule of finitely many functions. It can be proved by induction. Corollary 3.21 Consider finitely many convex functions f i : Rn → R for i = 1, . . . , m. Then we have the equality ∂

( ∑_{i=1}^{m} f i )(x) = ∑_{i=1}^{m} ∂ f i (x) for every x ∈ ⋂_{i=1}^{m} dom f i

under either one of the following qualification conditions:

(i) ⋂_{i=1}^{m} ri(dom f i ) ≠ ∅.
(ii) There exists u ∈ ⋂_{i=1}^{m} dom f i such that [ ∑_{i=1}^{m} vi = 0, vi ∈ N (u; dom f i ) ] =⇒ [ vi = 0 for all i = 1, . . . , m ].
(iii) There exists u ∈ ⋂_{i=1}^{m} dom f i such that all (except possibly one) functions f i are continuous at u.

To proceed further with deriving chain rules, we present first the following lemma concerning normal cones to graphs of affine mappings. Lemma 3.22 Let B : Rn → R p be an affine mapping given by B(x) = Ax + b, x ∈ Rn , where A ∈ R p×n and b ∈ R p . Then for any (x, y) ∈ gph B, we have


     N (x, y); gph B = (u, v) ∈ Rn × R p  u = −A T v . Proof It is clear that the set gph B is convex and that the inclusion (u, v) ∈ N ((x, y); gph B) is satisfied if and only if u, x − x + v, B(x) − B(x) ≤ 0 for all x ∈ Rn .

(3.15)

It follows directly from the definitions that

⟨u, x − x̄⟩ + ⟨v, B(x) − B(x̄)⟩ = ⟨u, x − x̄⟩ + ⟨v, Ax − Ax̄⟩ = ⟨u, x − x̄⟩ + ⟨Aᵀv, x − x̄⟩ = ⟨u + Aᵀv, x − x̄⟩.

This tells us that condition (3.15) holds if and only if ⟨u + Aᵀv, x − x̄⟩ ≤ 0 for all x ∈ Rn . We finally observe that the latter is equivalent to the equality u = −Aᵀv and thus complete the proof of the lemma.

Here is the basic subdifferential chain rule of convex analysis.

Theorem 3.23 Let B : Rn → R p be given as in Lemma 3.22, and let f : R p → R be a convex function. Suppose that B(Rn ) ∩ ri(dom f ) ≠ ∅. Then for any x ∈ Rn such that y = B(x) ∈ dom f we have

∂( f ◦ B)(x) = Aᵀ ∂ f (y) := { Aᵀv | v ∈ ∂ f (y) }.

(3.16)

(3.17)

Proof Pick v ∈ ∂( f ◦ B)(x) and form two convex subsets of Rn × R p × R by Ω1 = gph B × R and Ω2 = Rn × epi f . It follows from the definitions of subdifferential and normal cone that (v, 0, −1) ∈ N ((x, y, z); Ω1 ∩ Ω2 ), where z = f (y). Indeed, fix any (x, y, λ) ∈ Ω1 ∩ Ω2 . Then y = B(x) and λ ≥ f (y), so λ ≥ f (B(x)). Thus   v, x − x + 0(y − y) + (−1)(λ − z) ≤ v, x − x − f (B(x)) − f (B(x)) ≤ 0. Since Ω1 is an affine set, we have ri Ω1 = Ω1 . Then Corollary 2.46 implies the relative interior representation


   ri Ω2 = (x, y, λ) ∈ Rn × R p × R  y ∈ ri (dom f ), λ > f (y) . Using (3.16), pick  x ∈ Rn such that  y = B( x ) ∈ ri(dom f ). Then ( x,  y,  λ) ∈ ri Ω1 ∩ ri Ω2 ,  where λ = f ( y) + 1. Thus ri Ω1 ∩ ri Ω2 = ∅. Employing now the intersection rule from Theorem 2.56 yields     (v, 0, −1) ∈ N (x, y, z); Ω1 + N (x, y, z); Ω2 , which tells us that (v, 0, −1) = (v, −w, 0) + (0, w, −1) with (v, −w) ∈ N ((x, y); gph B) and (w, −1) ∈ N ((y, z); epi f ). Then we deduce from Lemma 3.22 and Proposition 3.2 the conditions v = A T w and w ∈ ∂ f (y), which imply in turn that v ∈ A T (∂ f (y)) and thus verify the inclusion “⊂” in (3.17). The reverse inclusion follows directly from the definition of the subdifferential; see Exercise 3.6.  Now we derive a dual version of the qualification condition (3.16). Corollary 3.24 Let B : Rn → R p be given as in Lemma 3.22, and let f : R p → R be a convex function. Assume that ker A T ∩ N ( y; dom f ) = {0} for some  y ∈ B(Rn ) ∩ dom f .

(3.18)

Then for any x ∈ Rn such that y = B(x) ∈ dom f we have the subdifferential chain rule (3.17). Proof It suffices to show that (3.18) implies (3.16). On the contrary, suppose that (3.16) is not satisfied. By the separation of these sets, we can find a nonzero element v ∈ R p such that (3.19) v, y ≤ v, z for all y ∈ dom f and z ∈ B(Rn ). Since  y ∈ B(Rn ), it follows from (3.19) that v, y ≤ v,  y for all y ∈ dom f , which yields v ∈ N ( y; dom f ). Using  y = A x + b ∈ dom f for some  x ∈ Rn in (3.19) gives us v,  y = v, A x + b ≤ v, Ax + b for all x ∈ Rn . Hence 0 ≤ v, A(x −  x ) for all x ∈ Rn , which implies that 0 ≤ v, Au for all u ∈ Rn . By the linearity of A, we have A T v, u = 0 for all u ∈ Rn , and so A T v = 0, i.e., v ∈


ker A T . This is a contradiction due to the imposed qualification condition (3.18), and thus we complete the proof.  The next corollary presents sufficient conditions for the fulfillment of (3.18). Corollary 3.25 In the setting of Corollary 3.24, suppose that either A is surjective, or f is continuous at some point  y ∈ B(Rn ); the latter assumption is automatic when f : R p → R is a real-valued convex function. Then the subdifferential chain rule (3.17) holds. Proof It suffices to verify the qualification condition (3.18). The surjectivity of A gives y ∈ B(Rn ), we have us ker A T = {0}. In the case where f is continuous at some point   y ∈ int(dom f ), so N ( y; dom f ) = {0}. Thus it remains to apply Corollary 3.24.  The last consequence of Theorem 3.23 provides a representation for the normal cone to inverse images of convex sets under affine mappings. Corollary 3.26 Let Ω ⊂ R p be a convex set, and let B : Rn → R p be an affine mapping as in Lemma 3.22 satisfying y = B(x) ∈ Ω. Then we have     N x; B −1 (Ω) = A T N (y; Ω) under the fulfillment of either one of the following qualification conditions: (i) B(Rn ) ∩ ri Ω = ∅. y; Ω) = {0} for some  y ∈ Ω. (ii) ker A T ∩ N ( Proof This follows from Corollary 3.24 with f (y) = δ(y; Ω), y ∈ R p .
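Since the norm f(y) = ||y||_1 is a real-valued convex function, Corollary 3.25 guarantees the chain rule (3.17) for any affine mapping B. The following Python sketch is an illustration of ours with arbitrarily chosen data: it compares the subgradient A^T sign(Ax̄ + b) predicted by Theorem 3.23 at a point where all components of Ax̄ + b are nonzero (so that f is differentiable there) with a finite-difference gradient of f ∘ B.

import numpy as np

# chain rule of Theorem 3.23 for f(y) = ||y||_1 and B(x) = Ax + b
A = np.array([[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]])
b = np.array([0.5, -1.0, 2.0])
x_bar = np.array([1.0, 1.0])
y_bar = A @ x_bar + b                 # (3.5, -2.0, 6.0): all components nonzero

v_chain = A.T @ np.sign(y_bar)        # subgradient predicted by (3.17)

g = lambda x: np.linalg.norm(A @ x + b, 1)
eps = 1e-6
v_fd = np.array([(g(x_bar + eps * e) - g(x_bar - eps * e)) / (2 * eps)
                 for e in np.eye(2)]) # finite-difference gradient of f o B
print(v_chain, v_fd)                  # the two vectors agree closely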



The final result of this section concerns another major subdifferential rule for the maximum of finitely many convex functions. Given f_i : Rn → R for i = 1, . . . , m, define the maximum function

f(x) := max{f_i(x) | i = 1, . . . , m}, x ∈ Rn.    (3.20)

Theorem 3.27 Let f_i : Rn → R, i = 1, . . . , m, be convex functions with x̄ ∈ ⋂_{i=1}^m dom f_i. Assume that all functions f_i for i = 1, . . . , m are continuous at x̄. Then we have the subdifferential maximum rule

∂f(x̄) = co ⋃_{i∈I(x̄)} ∂f_i(x̄)

for the maximum function f defined in (3.20), where


I(x̄) := {i = 1, . . . , m | f_i(x̄) = f(x̄)}.

Proof Considering the epigraphs of the functions f and f_i, we get

epi f = ⋂_{i=1}^m epi f_i.

Since f_i(x̄) < f(x̄) = ᾱ for any i ∉ I(x̄), there exist a neighborhood U of x̄ and δ > 0 such that f_i(x) < α whenever (x, α) ∈ U × (ᾱ − δ, ᾱ + δ). It follows that (x̄, ᾱ) ∈ int(epi f_i), and so N((x̄, ᾱ); epi f_i) = {(0, 0)} for such indices i. Let us show that the qualification condition from Corollary 2.59(ii) is satisfied for the sets Ω_i = epi f_i, i = 1, . . . , m, at (x̄, ᾱ). Fix pairs (v_i, −λ_i) ∈ N((x̄, ᾱ); epi f_i) such that

∑_{i=1}^m (v_i, −λ_i) = 0.

Then (v_i, −λ_i) = (0, 0) for i ∉ I(x̄), and hence

∑_{i∈I(x̄)} (v_i, −λ_i) = 0.

Note that f_i(x̄) = f(x̄) = ᾱ for i ∈ I(x̄). The proof of Proposition 3.3 tells us that λ_i ≥ 0 for i ∈ I(x̄). Thus the equality ∑_{i∈I(x̄)} λ_i = 0 yields λ_i = 0, and so v_i ∈ ∂∞f_i(x̄) = {0} for every i ∈ I(x̄) by Theorem 1.64 and Proposition 3.15. This verifies that (v_i, −λ_i) = (0, 0) for i = 1, . . . , m, and hence we arrive at

N((x̄, f(x̄)); epi f) = N((x̄, ᾱ); ⋂_{i=1}^m epi f_i) = ∑_{i∈I(x̄)} N((x̄, f_i(x̄)); epi f_i).

For any v ∈ ∂f(x̄), we have (v, −1) ∈ N((x̄, f(x̄)); epi f), which allows us to find (v_i, −λ_i) ∈ N((x̄, f_i(x̄)); epi f_i) for i ∈ I(x̄) such that

(v, −1) = ∑_{i∈I(x̄)} (v_i, −λ_i).

This yields ∑_{i∈I(x̄)} λ_i = 1 and v = ∑_{i∈I(x̄)} v_i. If λ_i = 0, then v_i ∈ ∂∞f_i(x̄) = {0}. Hence we assume without loss of generality that λ_i > 0 for every i ∈ I(x̄). Since N((x̄, f_i(x̄)); epi f_i) is a cone, it gives us (v_i/λ_i, −1) ∈ N((x̄, f_i(x̄)); epi f_i), and so v_i ∈ λ_i ∂f_i(x̄). We get the representation v = ∑_{i∈I(x̄)} λ_i u_i, where u_i ∈ ∂f_i(x̄) and ∑_{i∈I(x̄)} λ_i = 1. Thus

v ∈ co ⋃_{i∈I(x̄)} ∂f_i(x̄).

Note that the reverse inclusion follows directly from


∂ f i (x) ⊂ ∂ f (x) for all i ∈ I (x), which is a consequence of the definition. Indeed, for any i ∈ I (x) and v ∈ ∂ f i (x), we have the relationships v, x − x ≤ f i (x) − f i (x) = f i (x) − f (x) ≤ f (x) − f (x) whenever x ∈ Rn , which imply that v ∈ ∂ f (x) and thus complete the proof.



The following example illustrates the application of the subdifferential maximum rule from Theorem 3.27. Example 3.28 Let f (x) = max{x, −x, 1} for x ∈ R. Define f 1 (x) = x, f 2 (x) = −x, and f 3 (x) = 1 for x ∈ R. If x = 1, then I (x) = {1, 3} and thus   ∂ f (x) = co ∂ f 1 (x) ∪ ∂ f 3 (x) = co {1, 0} = [0, 1]. If x = −1, then I (x) = {2, 3}, and we get   ∂ f (x) = co ∂ f 2 (x) ∪ ∂ f 3 (x) = co {−1, 0} = [−1, 0]. If x = 0, then I (x) = {3} and hence ∂ f (x) = ∂ f 3 (x) = {0}. The reader can easily find ∂ f (x) for all x ∈ R.
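In simple one-dimensional situations the maximum rule can also be automated. The following Python sketch is an illustration with our own choice of data: it recomputes the subdifferentials found in Example 3.28 by forming the active index set I(x) and taking the interval spanned by the active derivatives, which is exactly the convex hull appearing in Theorem 3.27 when every active piece is smooth.

import numpy as np

# maximum rule of Theorem 3.27 in one dimension with smooth pieces:
# the subdifferential is the interval between the smallest and largest active derivatives
pieces = [lambda x: x, lambda x: -x, lambda x: 1.0]   # f_1, f_2, f_3 of Example 3.28
derivs = [1.0, -1.0, 0.0]                              # their (constant) derivatives

def subdiff_max(x, tol=1e-12):
    vals = [p(x) for p in pieces]
    f_val = max(vals)
    active = [derivs[i] for i, v in enumerate(vals) if abs(v - f_val) <= tol]
    return (min(active), max(active))                  # convex hull of active gradients

for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x, subdiff_max(x))
# expected output: (-1,-1), (-1,0), (0,0), (0,1), (1,1)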

3.3 Mean Value Theorems

The mean value theorem for differentiable functions is one of the most fundamental results of classical real analysis. Now we derive its subdifferential counterparts for convex functions that are not necessarily differentiable (Fig. 3.3). The first result concerns the mean value theorem for convex functions of one variable.

Theorem 3.29 Let f : R → R be a convex function, let a < b, and assume that [a, b] ⊂ int(dom f). Then there exists c ∈ (a, b) such that

(f(b) − f(a))/(b − a) ∈ ∂f(c).    (3.21)


Fig. 3.3 Subdifferential mean value theorem

Proof Define the real-valued function g : [a, b] → R by

g(x) = f(x) − [((f(b) − f(a))/(b − a))(x − a) + f(a)], x ∈ [a, b],

for which we have g(a) = g(b). This implies, by the convexity and hence the continuity of f on the interior of its domain, that g has a local minimum at some c ∈ (a, b). Consider further the linear function

h(x) = −[((f(b) − f(a))/(b − a))(x − a) + f(a)], x ∈ [a, b].

Extending g and h to functions on R by setting g(x) = h(x) = ∞ for x ∉ [a, b], we obviously have the equalities

∂h(c) = {h′(c)} = {−(f(b) − f(a))/(b − a)}.

Using now the subdifferential Fermat rule and the subdifferential sum rule derived earlier in this chapter gives us the condition

0 ∈ ∂g(c) = ∂f(c) − {(f(b) − f(a))/(b − a)},

which implies (3.21) and completes the proof of the theorem.



Next we proceed with deriving the main result of this section that gives us the mean value theorem for nondifferentiable convex functions of many variables. The following lemma is useful in our derivation.


Lemma 3.30 Let f : Rn → R be a convex function, let a, b ∈ Rn with a ≠ b, and assume that [a, b] ⊂ int(dom f). Define the function ϕ : R → R of one variable by

ϕ(t) := f(tb + (1 − t)a), t ∈ R.    (3.22)

Then for any number t_0 ∈ (0, 1), we have

∂ϕ(t_0) = {⟨v, b − a⟩ | v ∈ ∂f(c_0)} with c_0 := t_0 b + (1 − t_0)a.

Proof Define the vector-valued mapping B : R → Rn by B(t) = tb + (1 − t)a = t(b − a) + a, t ∈ R. It is clear that ϕ(t) = (f ◦ B)(t) whenever t ∈ R. By the subdifferential chain rule from Theorem 3.23, we have

∂ϕ(t_0) = {⟨v, b − a⟩ | v ∈ ∂f(c_0)},

which verifies the result claimed in the lemma.



Here is the main mean value theorem of this section.

Theorem 3.31 Let f : Rn → R be a convex function, let a, b ∈ Rn with a ≠ b, and assume that [a, b] ⊂ int(dom f). Then there exists an element c ∈ (a, b) such that

f(b) − f(a) ∈ ⟨∂f(c), b − a⟩ := {⟨v, b − a⟩ | v ∈ ∂f(c)}.

Proof Consider the function ϕ : R → R from (3.22) for which ϕ(0) = f(a) and ϕ(1) = f(b). Applying first Theorem 3.29 and then Lemma 3.30 to this function, we find t_0 ∈ (0, 1) such that

f(b) − f(a) = ϕ(1) − ϕ(0) ∈ ∂ϕ(t_0) = {⟨v, b − a⟩ | v ∈ ∂f(c)},

where c = t_0 b + (1 − t_0)a. This justifies our statement.
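The conclusion of Theorem 3.31 is easy to probe numerically in dimension one. The Python sketch below is our own illustration (the function and the grid are chosen only for this test): it verifies for f(x) = |x| on [a, b] = [−1, 2] that the mean value element (f(b) − f(a))/(b − a) = 1/3 is a subgradient of f at c = 0 but not at c = 1, in agreement with ∂f(0) = [−1, 1] and ∂f(1) = {1}.

import numpy as np

f = lambda x: np.abs(x)
a, b = -1.0, 2.0
slope = (f(b) - f(a)) / (b - a)       # 1/3, the mean value element of Theorem 3.29/3.31

grid = np.linspace(-5.0, 5.0, 4001)
def is_subgradient(c, v):
    # finite check of f(x) >= f(c) + v*(x - c) over the grid
    return bool(np.all(f(grid) >= f(c) + v * (grid - c) - 1e-9))

print(is_subgradient(0.0, slope))     # True:  1/3 lies in the subdifferential [-1, 1] at 0
print(is_subgradient(1.0, slope))     # False: the subdifferential at 1 is {1}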

3.4 Coderivative Calculus

This section addresses generalized differentiation of convex set-valued mappings. We introduce and study the coderivative notion for convex set-valued mappings, which is broadly used in convex analysis and plays a crucial role in many aspects of the theory and applications of set-valued mappings.


Definition 3.32 Let F : Rn ⇉ Rp be a convex set-valued mapping with (x̄, ȳ) ∈ gph F. The coderivative of F at (x̄, ȳ) is the set-valued mapping D∗F(x̄, ȳ) : Rp ⇉ Rn with the values

D∗F(x̄, ȳ)(v) := {u ∈ Rn | (u, −v) ∈ N((x̄, ȳ); gph F)}, v ∈ Rp.    (3.23)

Now we present coderivative calculations for some structured mappings.

Proposition 3.33 Let F : Rn ⇉ Rp be given by F(x) = {Ax + b} for x ∈ Rn, where A ∈ Rp×n and b ∈ Rp. For any (x̄, ȳ) ∈ Rn × Rp such that ȳ = Ax̄ + b, we have the representation D∗F(x̄, ȳ)(v) = {A^T v} whenever v ∈ Rp.

Proof Lemma 3.22 tells us that

N((x̄, ȳ); gph F) = {(u, v) ∈ Rn × Rp | u = −A^T v}.

Thus u ∈ D∗F(x̄, ȳ)(v) if and only if u = A^T v.
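Proposition 3.33 can also be checked directly from Definition 3.32. The following Python sketch is an illustration with randomly generated data (not part of the development above): it tests the defining inequality of the normal cone to gph F at sample points of the graph and confirms that u = A^T v is accepted while a perturbation of it is rejected.

import numpy as np

# for F(x) = {Ax + b}, the pair (u, -v) is normal to gph F at (x_bar, y_bar)
# exactly when <u, x - x_bar> - <v, A(x - x_bar)> <= 0 for all x, i.e. u = A^T v
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2)); b = rng.standard_normal(3)
x_bar = rng.standard_normal(2);  v = rng.standard_normal(3)

def max_violation(u):
    xs = rng.standard_normal((1000, 2))                     # sample points x
    vals = (xs - x_bar) @ u - (xs - x_bar) @ A.T @ v        # normal cone test values
    return vals.max()

print(max_violation(A.T @ v))        # essentially zero (up to rounding)
print(max_violation(A.T @ v + 0.1))  # strictly positive, so this u is not a coderivative value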



Proposition 3.34 For a convex function f : Rn → R with x ∈ dom f, consider the epigraphical multifunction E_f : Rn ⇉ R defined in Example 1.73. Denoting y = f(x), we have

D∗E_f(x, y)(λ) = λ∂f(x) if λ > 0,  ∂∞f(x) if λ = 0,  and ∅ if λ < 0.

In particular, if x ∈ int(dom f ), then D ∗ E f (x, y)(λ) = λ∂ f (x) for all λ ≥ 0. Proof It follows from the definition of E f that    gph E f = (x, λ) ∈ Rn × R  λ ∈ E f (x) = epi f . Thus v ∈ D ∗ E f (x, y)(λ) if and only if (v, −λ) ∈ N ((x, y); epi f ). If λ > 0, then (v, −λ) ∈ N ((x, y); epi f ) if and only if (v/λ, −1) ∈ N ((x, y); epi f ). By Proposition 3.2, this holds if and only if v/λ ∈ ∂ f (x), i.e., v ∈ λ∂ f (x). The claimed formula in the case where λ = 0 follows from the definition of the singular subdifferential. If λ < 0, then D ∗ F(x, y)(λ) = ∅ because λ ≥ 0 whenever (v, −λ) ∈ N ((x, y); epi f ) by the proof of Proposition 3.3. To verify the last part of the proposition, it suffices to consider the case where λ = 0. Observe that if x ∈ int(dom f ), then ∂ f (x) = ∅ and D ∗ E f (x, y)(0) = ∂ ∞ f (x) = {0} =  λ∂ f (x) with λ = 0 by Proposition 3.15. Next we calculate the coderivative of a constant set-valued mapping.


Example 3.35 Given a nonempty convex subset K of R p , define the set-valued map→ R p by F(x) = K for every x ∈ Rn . Then gph F = Rn × K is convex, and ping F : Rn → Proposition 2.55(i) tells us that N ((x, y); gph F) = {0} × N (y; K ) for any (x, y) ∈ Rn × K . Thus {0} if − v ∈ N (y; K ), ∗ D F(x, y)(v) = ∅ otherwise. In particular, for the case where Θ = R p we have {0} if v = 0, ∗ D F(x, y)(v) = ∅ otherwise. Yet another structured multifunction is known as the indicator mapping associated with a convex set. Example 3.36 Given a nonempty convex subset Θ of Rn , define the set-valued mapping → R p by ΔΘ : Rn → 0 if x ∈ Θ, ΔΘ (x) := ∅ if x ∈ / Θ. Then gph ΔΘ = Θ × {0} is convex. For any fixed x ∈ Θ we get N ((x, 0); gph ΔΘ ) = N (x; Θ) × R p , and therefore D ∗ ΔΘ (x, 0)(v) = N (x; Θ) for all v ∈ R p . → R p , recall from Proposition 1.77 Given further two set-valued mappings F1 , F2 : Rn → that their sum F1 + F2 defined in (1.76) is a convex set-valued mapping with dom(F1 + F2 ) = dom F1 ∩ dom F2 . Our first calculus result concerns representing the coderivative of F1 + F2 at a given point (x, y) ∈ gph(F1 + F2 ). To formulate this result, consider the nonempty set   S(x, y) := (y 1 , y 2 ) ∈ R p × R p | y = y 1 + y 2 , y i ∈ Fi (x), i = 1, 2 .

(3.24)

The following theorem gives us the coderivative sum rule for convex set-valued mappings under the relative interior qualification condition.

Theorem 3.37 Let F_1, F_2 : Rn ⇉ Rp be convex set-valued mappings with (x̄, ȳ) ∈ gph(F_1 + F_2). Imposing the qualification condition

ri(dom F_1) ∩ ri(dom F_2) ≠ ∅,    (3.25)

we have the coderivative sum rule


D∗(F_1 + F_2)(x̄, ȳ)(v) = D∗F_1(x̄, ȳ_1)(v) + D∗F_2(x̄, ȳ_2)(v)    (3.26)

whenever (y 1 , y 2 ) ∈ S(x, y) and v ∈ R p . Proof Fix any u ∈ D ∗ (F1 + F2 )(x, y)(v) and (y 1 , y 2 ) ∈ S(x, y) for which we have the inclusion (u, −v) ∈ N ((x, y); gph(F1 + F2 )). Consider the convex sets    Ω1 = (x, y1 , y2 ) ∈ Rn × R p × R p  y1 ∈ F1 (x) ,    Ω2 = (x, y1 , y2 ) ∈ Rn × R p × R p  y2 ∈ F2 (x) and deduce from the normal cone definition that (u, −v, −v) ∈ N ((x, y 1 , y 2 ); Ω1 ∩ Ω2 ). We intend to verify the inclusion     (u, −v, −v) ∈ N (x, y 1 , y 2 ); Ω1 + N (x, y 1 , y 2 ); Ω2 .

(3.27)

Indeed, it follows from (3.25) that there exists  x ∈ ri(dom F1 ) ∩ ri(dom F2 ), and hence x ) = ∅ and ri F2 ( x ) = ∅. Furthermore, Theorem 2.45 Theorem 2.22(i) implies that ri F1 ( ensures that    ri Ω1 = (x, y1 , y2 ) ∈ Rn × R p × R p  x ∈ ri(dom F1 ), y1 ∈ ri F1 (x) ,    ri Ω2 = (x, y1 , y2 ) ∈ Rn × R p × R p  x ∈ ri(dom F2 ), y2 ∈ ri F2 (x) . Choosing  y1 ∈ ri F1 ( x ) and  y2 ∈ ri F1 ( x ) shows that ( x,  y1 ,  y2 ) ∈ ri Ω1 ∩ ri Ω2 , which tells us by Theorem 2.56 that (3.27) is satisfied. Therefore, we get (u, −v, −v) = (u 1 , −v, 0) + (u 2 , 0, −v), where (u i , −v) ∈ N ((x, y i ); (gph Fi ) for i = 1, 2. The latter readily implies by the coderivative definition that u = u 1 + u 2 ∈ D ∗ F1 (x, y 1 )(v) + D ∗ F2 (x, y 2 )(v). The reverse inclusion is obvious, and thus we verify the sum rule (3.26).



→ R p and Now we consider the composition of two set-valued mappings F : Rn → p q → R taken from Definition 1.78. It follows from Proposition 1.79 that G ◦ F G: R →

is convex provided that both F and G have this property. Given z ∈ (G ◦ F)(x), define the set (3.28) M(x, z) := F(x) ∩ G −1 (z).


The following theorem establishes the coderivative chain rule for convex set-valued mappings. → R p and G : R p → → Rq be convex set-valued mappings with Theorem 3.38 Let F : Rn → (x, z) ∈ gph(G ◦ F). Imposing the qualification condition     ri rge F ∩ ri dom G = ∅,

(3.29)

we have the coderivative chain rule D ∗ (G ◦ F)(x, z)(w) = D ∗ F(x, y) ◦ D ∗ G(y, z)(w)

(3.30)

whenever y ∈ M(x, z) and w ∈ Rq . Proof Picking u ∈ D ∗ (G ◦ F)(x, z)(w) and y ∈ M(x, z) gives us the inclusion (u, −w) ∈ N ((x, z); gph(G ◦ F)), which means that u, x − x − w, z − z ≤ 0 for all (x, z) ∈ gph(G ◦ F). Define two convex subsets of Rn × R p × Rq by Ω1 = (gph F) × Rq and Ω2 = Rn × (gph G)

(3.31)

for which we can show that (see Exercise 3.19)   Ω1 − Ω2 = Rn × rge F − dom G × Rq .

(3.32)

Using (3.32), we have the representation   ri(Ω1 − Ω2 ) = Rn × ri rge F − dom G × Rq . It follows from (3.29), Corollary 2.29, and (3.31) that 0 ∈ ri(Ω1 − Ω2 ), and so ri Ω1 ∩ ri Ω2 = ∅. We can directly deduce from the definitions that (u, 0, −w) ∈ N ((x, y, z); Ω1 ∩ Ω2 ). Applying Theorem 2.56 with the qualification condition (3.33) tells us that (u, 0, −w) ∈ N ((x, y, z); Ω1 ∩ Ω2 ) = N ((x, y, z); Ω1 ) + N ((x, y, z); Ω2 ), and thus there exists a vector v ∈ Rm such that we have the representation

(3.33)


(u, 0, −w) = (u, −v, 0) + (0, v, −w) with (u, −v) ∈ N ((x, y); gph F) and (v, −w) ∈ N ((y, z); gph G). This shows by the coderivative definition (3.23) that u ∈ D ∗ F(x, y)(v) and v ∈ D ∗ G(y, z)(w), and so we verify the inclusion “⊂” in (3.30). The proof of the reverse inclusion therein is straightforward.  → R p be a set-valued mapping, and let Θ ⊂ R p be a given set. Recall that Let F : Rn → the preimage, or inverse image, of the set Θ under the mapping F is defined by    F −1 (Θ) = x ∈ Rn  F(x) ∩ Θ = ∅ . The corollary below gives us a representation of the normal cone to F −1 (Θ) via the normal cone to Θ and the coderivative of F. → R p be a convex set-valued mapping, and let Θ ⊂ R p be a Corollary 3.39 Let F : Rn → convex set. Suppose that   ri rge F ∩ ri Θ = ∅. (3.34) Then for any x ∈ F −1 (Θ) and y ∈ F(x) ∩ Θ, we have the representation   N (x; F −1 (Θ)) = D ∗ F(x, y) N (y; Θ) . → Rq Proof In the setting of Theorem 3.38, consider the indicator mapping G = ΔΘ : R p → given in Example 3.36. It is easy to check that Δ F −1 (Θ) (x) = (ΔΘ ◦ F)(x) for all x ∈ Rn , → Rq . Then the representation N (x; F −1 (Θ)) follows from where Δ F −1 (Θ) : Rn → Theorem 3.38 with G = ΔΘ by taking into account the coderivative calculation in Example 3.36 and Theorem 3.38.  Given an extended-real-valued convex function f : Rn → R, for each x ∈ dom f we define the operation λ∂ f (x) if λ > 0, λ  ∂ f (x) := ∂ ∞ f (x) if λ = 0. The last result of this section provides a precise representation formula for the normal cone to sublevel sets of extended-real-valued convex functions.


Corollary 3.40 Let f : Rn → R be a convex function. For α ∈ R, consider the sublevel set    Lα ( f ) = x ∈ Rn  f (x) ≤ α . Assume that f (x) = α, and that there exists  x ∈ ri (dom f ) such that f ( x ) < α. Then we have the representation  λ  ∂ f (x). N (x; Lα ( f )) = λ≥0

Proof Let F(x) = E f (x) = [ f (x), ∞) for x ∈ Rn , and let Θ = (−∞, α]. Then Lα = x ∈ ri (dom ( f )) and f ( x ) < f (x) = α, we can show that F −1 (Θ). Since    ri rge (F) ∩ ri (Θ) = ∅. Indeed, choose γ ∈ R such that f (x) < γ < α. By Corollary 2.46, we see that (x, γ) ∈   ri (epi f ) = ri (gph F), so γ ∈ ri (rge F); see Exercise 2.18. Thus γ ∈ ri rge F ∩ ri Θ. Since N ( f (x); Θ) = N (α; Θ) = [0, ∞), by Proposition 3.34 and Corollary 3.39 we have   y) N (α; Θ) N (x; Lα ( f )) = N (x; F −1 (Θ)) = D ∗ F(x,     D ∗ F(x, f (x))(λ) = λ  ∂ f (x), = D ∗ F(x, f (x)) [0, ∞) = λ≥0

which completes the proof of the corollary.

3.5

λ≥0



Subgradients of Optimal Value Functions

This section is mainly devoted to generalized differentiation for the class of optimal value/marginal functions introduced in Definition 1.82 in the form    μ(x) = inf ϕ(x, y)  y ∈ F(x) , to obtain a formula for calculating the subdifferential of the marginal function μ via the coderivative of the set-valued mapping F and the subdifferential of the extended-real-valued function ϕ. For simplicity of presentation, we assume throughout this section that μ(x) > −∞ for all x ∈ Rn . Proposition 3.41 Let μ be the optimal value function defined in (1.16) generated by a convex → R p . For any x ∈ dom μ, function ϕ : Rn × R p → R and a convex multifunction F : Rn → consider the solution set    S(x) := y ∈ F(x)  μ(x) = ϕ(x, y) .

3.5

Subgradients of Optimal Value Functions

Assuming that S(x) = ∅, for any y ∈ S(x) we have the inclusion    u + D ∗ F(x, y)(v) ⊂ ∂μ(x).


(3.35)

(u,v)∈∂ϕ(x,y)

Proof Pick w from the left-hand side of (3.35) and find (u, v) ∈ ∂ϕ(x, y) with w − u ∈ D ∗ F(x, y)(v). This gives us (w − u, −v) ∈ N ((x, y); gph F) and thus w − u, x − x − v, y − y ≤ 0 for all (x, y) ∈ gph F, which shows that whenever y ∈ F(x) we have w, x − x ≤ u, x − x + v, y − y ≤ ϕ(x, y) − ϕ(x, y) = ϕ(x, y) − μ(x). This yields in turn the estimate w, x − x ≤ inf ϕ(x, y) − μ(x) = μ(x) − μ(x), y∈F(x)

which justifies the inclusion w ∈ ∂μ(x), and hence verifies (3.35).



Here is the aforementioned calculation formula for the subdifferential of the optimal value function. Theorem 3.42 Consider the optimal value function (1.16) under the assumptions of Proposition 3.41. For any y ∈ S(x), we have the equality    ∂μ(x) = u + D ∗ F(x, y)(v) (3.36) (u,v)∈∂ϕ(x,y)

provided the fulfillment of the qualification condition ri(dom ϕ) ∩ ri(gph F) = ∅.

(3.37)

Proof Taking Proposition 3.41 into account, it is only required to verify the inclusion “⊂” in (3.36). To proceed, pick w ∈ ∂μ(x) and y ∈ S(x). For any x ∈ Rn , we have the inequalities w, x − x ≤ μ(x) − μ(x) = μ(x) − ϕ(x, y) ≤ ϕ(x, y) − ϕ(x, y) for all y ∈ F(x). This implies that, whenever (x, y) ∈ Rn × R p , the following inequality holds:     w, x − x + 0, y − y ≤ ϕ(x, y) + δ (x, y); gph F) − ϕ(x, y) + δ (x, y); gph F .


Define further the functions   f (x, y) = ϕ(x, y) + δ (x, y); gph F , (x, y) ∈ Rn × R p , and deduce from the subdifferential sum rule in Theorem 3.18 under the qualification condition (3.37) that the inclusion   (w, 0) ∈ ∂ f (x, y) = ∂ϕ(x, y) + N (x, y); gph F is satisfied. This shows that (w, 0) = (u 1 , v1 ) + (u 2 , v2 ) with (u 1 , v1 ) ∈ ∂ϕ(x, y) and (u 2 , v2 ) ∈ N ((x, y); gph F). Thus v2 = −v1 , so (u 2 , −v1 ) ∈ N ((x, y); gph F). Finally, we get u 2 ∈ D ∗ F(x, y)(v1 ) and therefore w = u 1 + u 2 ∈ u 1 + D ∗ F(x, y)(v1 ), 

which completes the proof of the theorem.

Let us now present some important consequences of Theorem 3.42. The first one concerns the following chain rule for convex compositions. Corollary 3.43 Let f : Rn → R be a real-valued convex function, and let φ : R → R be a nondecreasing convex function. Take x ∈ Rn and denote y = f (x) while assuming that y ∈ dom φ. Then  λ∂ f (x) ∂(φ ◦ f )(x) = λ∈∂φ(y)

under the assumption that ( f ( x ), ∞) ∩ ri(dom φ) = ∅ for some  x ∈ Rn . Proof The composition φ ◦ f is a convex function by Proposition 1.45. Define ϕ(x, y) = φ(y) for (x, y) ∈ Rn × R and define the set-valued mapping F(x) = [ f (x), ∞) for x ∈ Rn . Observe that (φ ◦ f )(x) = inf φ(y) = inf ϕ(x, y) y∈F(x)

y∈F(x)

since ϕ is nondecreasing. In addition, ri(dom ϕ) = Rn × ri(dom φ),    ri(gph F) = ri(epi f ) = (x, λ) ∈ Rn × R  λ > f (x) . because f has real values. Thus the assumption ( f ( x ), ∞) ∩ ri(dom φ) = ∅ for some  x∈ Rn ensures the fulfillment of the qualification condition (3.37). It follows from Theorem 3.42 that  D ∗ F(x, y)(λ). ∂(φ ◦ f )(x) = λ∈∂φ(y)

3.6

Generalized Differentiation for Polyhedral Convex Functions and Multifunctions

109

Taking again into account that φ is nondecreasing yields λ ≥ 0 for every λ ∈ ∂φ(y). This tells us by Proposition 3.34 that   ∂(φ ◦ f )(x) = D ∗ F(x, y)(λ) = λ∂ f (x), λ∈∂φ(y)

λ∈∂φ(y)

which verifies the claimed chain rule of this corollary.



The second corollary addresses calculating subgradients of the optimal value functions over parameter-independent constraints. Corollary 3.44 Let ϕ : Rn × R p → R, where the function ϕ(x, ·) is bounded below on R p for every x ∈ Rn . Given x ∈ Rn , let       μ(x) = inf ϕ(x, y)  y ∈ R p , S(x) = y ∈ R p  ϕ(x, y) = μ(x) . Then for any x ∈ dom μ and y ∈ S(x), we have    ∂μ(x) = u ∈ Rn  (u, 0) ∈ ∂ϕ(x, y) . Proof Follows directly from Theorem 3.42 with F(x) ≡ R p and Example 3.35. Note that the qualification condition (3.37) is satisfied in this setting. 

3.6

Generalized Differentiation for Polyhedral Convex Functions and Multifunctions

This section collects some results of convex generalized differential calculus concerning normals to sets, subgradients of extended-real valued functions, and coderivatives of setvalued mappings with polyhedral convex components. The obtained results are mainly based on a version of convex separation for two sets one of which is polyhedral convex. We start with the definition of polyhedral sets. Definition 3.45 A subset P of Rn is called a polyhedral convex set, or a convex polyhedron, if there exist vectors v1 , v2 , . . . , vm ∈ Rn and numbers α1 , α2 , . . . , αm ∈ R such that    (3.38) P = x ∈ Rn  vi , x ≤ αi for all i = 1, . . . , m . Thus polyhedral sets are those which are described by systems of finitely many linear inequalities. It is easy to see that we can reduce to form (3.38) seemingly more general systems containing additional linear equalities given by {x ∈ Rn | Ax = c} with an m × n matrix A and a vector c ∈ Rm . Systems of linear inequalities and equalities play a very

110

3 Convex Generalized Differentiation

important role in various aspects of linear algebra, analysis, and optimization. In particular, they describe sets of feasible solutions in problems of linear programming. Let us now introduce yet another version of convex separation and establish an appropriate separation theorem used in what follows. Definition 3.46 Let Ω1 and Ω2 be a nonempty convex sets in Rn . We say that Ω1 and Ω2 can be separated by a hyperplane that does not contain Ω2 if there exist v ∈ Rn and α ∈ R such that v, x ≤ α ≤ v, y whenever x ∈ Ω1 and y ∈ Ω2 , and there exists  y ∈ Ω2 such that α < v,  y. In the setting of Definition 3.46, consider the sets    H = x ∈ Rn  v, x = α ,       H+ = x ∈ Rn  v, x ≥ α , H− = x ∈ Rn  v, x ≤ α . Since v is obviously nonzero, H is a hyperplane while H+ and H− are the corresponding halfspaces. We have Ω1 ⊂ H− , Ω2 ⊂ H+ , and Ω2 ⊂ H. Here is the convex separation theorem invoking polyhedrality. Theorem 3.47 Let P be a nonempty convex polyhedron, and let Θ be a nonempty convex set in Rn . Then P and Θ can be separated by a hyperplane that does not contain Θ if and only if P ∩ ri Θ = ∅. Proof Suppose that P and Θ can be separated by a hyperplane that does not contain Θ. By the definition, there exist v ∈ Rn and α ∈ R such that v, x ≤ α ≤ v, y whenever x ∈ P and y ∈ Θ, and there exists  y ∈ Θ such that α < v,  y. On the contrary, suppose that P ∩ ri Θ = ∅. Pick x0 ∈ P ∩ ri Θ and get v, x0  ≤ v, y for all y ∈ Θ. By the normal cone definition, we have −v ∈ N (x0 ; Θ). Since x0 ∈ ri Θ, Proposition 2.51 tells us that the set N (x0 ; Θ) is a linear subspace of Rn . Thus v ∈ N (x0 ; Θ) meaning that v, y − x0  ≤ 0 for all y ∈ Θ.

3.6

Generalized Differentiation for Polyhedral Convex Functions and Multifunctions

111

This implies that v, y ≤ v, x0  for all y ∈ Θ, and hence we have the equality v, y = y = v, x0 . Thus we arrive at a conv, x0  as x ∈ Θ. In particular, this implies that v,  tradiction since y. v, x0  ≤ α < v,  The obtained contradiction shows that P ∩ ri Θ = ∅. The verification of the converse implication is rather involved. The reader is referred to [18, Theorem 3.84] for a detailed proof.  Now we apply Theorem 3.47 to derive the normal cone intersection rule for two convex sets one if which is polyhedral convex. In contrast to the intersection rule for arbitrary convex sets from Theorem 2.56, the qualification condition below does not use the relative interior of the convex polyhedron. Theorem 3.48 Let P be a convex polyhedron, and let Ω be an arbitrary convex set in Rn . Imposing the qualification condition P ∩ ri Ω = ∅, we have the normal cone intersection rule N (x; P ∩ Ω) = N (x; P) + N (x; Ω) for every x ∈ P ∩ Ω.

(3.39)

Proof It suffices to verify the inclusion “⊂” in (3.39) since the reverse inclusion is obvious. Fix any v ∈ N (x; P ∩ Ω) and get v, x − x ≤ 0 for all x ∈ P ∩ Ω. Define the two subsets of Rn+1 by    Q = (x, λ) ∈ Rn × R  x ∈ P, λ ≤ v, x − x , Θ = Ω × [0, ∞). Both sets are convex, while Q is also polyhedral in Rn+1 . Furthermore, we have by construction that Q ∩ ri Θ = ∅. By Theorem 3.47, there exist w ∈ Rn , γ ∈ R, and α ∈ R such that w, x + γλ1 ≤ α ≤ w, y + γλ2 whenever (x, λ1 ) ∈ Q and (y, λ2 ) ∈ Θ. In addition, there is ( y,  λ2 ) ∈ Θ with α < w,  y + γ λ2 .

(3.40)

112

3 Convex Generalized Differentiation

It follows from (3.40) that γ ≥ 0. Indeed, suppose on the contrary that γ < 0. Using (x, k) ∈ Θ for any k ∈ N, we have α ≤ w, x + γk, which brings us to a contradiction by letting k → ∞. Observe further that γ > 0 since assuming γ = 0 yields w, x ≤ α ≤ w, y whenever x ∈ P, y ∈ Ω, and that α < w,  y with  y ∈ Ω. This implies by Theorem 3.47 that P ∩ ri Ω = ∅, which is a contradiction. Using now x = x, λ1 = 0, y ∈ Ω, and λ2 = 0 in (3.40) gives us the condition w, x ≤ w, y for all y ∈ Ω, which shows that −w ∈ N (x; Ω). Letting further x ∈ P, λ1 = v, x − x, y = x ∈ Ω, and λ2 = 0 leads us to w, x + γv, x − x ≤ w, x for all x ∈ P. Dividing both sides of this inequality by γ and rearranging the terms yield v +

w , x − x ≤ 0 for all x ∈ P, γ

which ensures in turn that v+

w ∈ N (x; P). γ

Therefore, v ∈ −w/γ + N (x; P) ⊂ N (x; Ω) + N (x; P). This verifies the inclusion “⊂” in (3.39) and thus completes the proof of the theorem.  Next we proceed with deriving subdifferential calculus rules involving polyhedral convex functions. Definition 3.49 We say that g : Rn → R is a polyhedral convex function if epi g is a convex polyhedron in Rn+1 . Here is the subdifferential sum rule with a relaxed qualification condition. Theorem 3.50 Consider the sum of convex functions h, f : Rn → R, where h is a polyhedral convex one. Imposing the qualification condition dom h ∩ ri(dom f ) = ∅, we have the subdifferential sum rule

3.6

Generalized Differentiation for Polyhedral Convex Functions and Multifunctions

113

∂(h + f )(x) = ∂h(x) + ∂ f (x) for all x ∈ dom h ∩ dom f . Proof Fixing any x ∈ dom h ∩ dom f , we only need to show that ∂(h + f )(x) ⊂ ∂h(x) + ∂ f (x)

(3.41)

by taking into account that the reverse inclusion is obvious. To verify (3.41), pick v ∈ ∂(h + f )(x) and define the sets    P = (x, λ1 , λ2 ) ∈ Rn × R × R  h(x) ≤ λ1 = epi h × R,    Ω = (x, λ1 , λ2 ) ∈ Rn × R × R  f (x) ≤ λ2 . It is easy to see that (v, −1, −1) ∈ N ((x, h(x), f (x)); P ∩ Ω). To apply Theorem 3.48 to this intersection, we need to check the qualification condition therein. Indeed, Corollary 2.46 tells us that    ri Ω = (x, λ1 , λ2 ) ∈ Rn × R × R  x ∈ ri(dom f ), f (x) < λ2 . Choose x0 ∈ dom h ∩ ri(dom f ) and get (x0 , h(x0 ), f (x0 ) + 1) ∈ P ∩ ri Ω. Thus P ∩ ri Ω = ∅. Employing now Theorem 3.48, we have     (v, −1, −1) ∈ N (x, h(x), f (x)); P + N (x, h(x), f (x)); Ω . Following the proof of Theorem 3.18 leads us to the inclusion v ∈ ∂h(x) + ∂ f (x), which verifies (3.41) and thus completes the proof of the theorem.



The subdifferential chain rule in the next theorem is based on the application of the normal cone intersection rule under polyhedrality. Note that no qualification condition is imposed in this theorem. Theorem 3.51 Let h : R p → R be a polyhedral convex function, let A ∈ R p×n , and let b ∈ R p . Consider the function g(x) = h(Ax + b) for x ∈ Rn . Then we have the subdifferential chain rule ∂g(x) = A T ∂h(Ax + b) for every x ∈ dom g.

114

3 Convex Generalized Differentiation

Proof Fix any x ∈ dom g, and let B(x) = Ax + b for x ∈ Rn . Then y = B(x) ∈ dom h. Pick a subgradient v ∈ ∂g(x) and define the sets P = Rn × epi h and Ω = gph B × R. Following the proof of Theorem 3.23 gives us the inclusion   (v, 0, −1) ∈ N (x, y, h(y)); P ∩ Ω . Since Ω is an affine set, we have ri Ω = Ω. Thus P ∩ ri Ω = ∅ because (x, y, h(y)) belongs to this intersection. Applying Theorem 3.48 tells us that       N (x, y, h(y)); P ∩ Ω = N (x, y, h(y)); P + N (x, y, h(y)); Ω . It remains to follow the proof of Theorem 3.23 and complete in this way the proof of the claimed result.  Finally in this section, we derive refined coderivative calculus rules for convex set-valued mappings under polyhedrality assumptions. → R p is said to be polyhedral convex Definition 3.52 A set-valued mapping F : Rn → n if its graph gph F is a convex polyhedron in R × R p . As always, we start with the corresponding sum rule, now for coderivatives. → R p be convex set-valued mappings, where F1 is polyheTheorem 3.53 Let F1 , F2 : Rn → dral convex. Impose the qualification condition dom F1 ∩ ri(dom F2 ) = ∅. Then we have the coderivative sum rule D ∗ (F1 + F2 )(x, y)(v) = D ∗ F1 (x, y 1 )(v) + D ∗ F2 (x, y 2 )(v) for all (y 1 , y 2 ) ∈ S(x, y) and v ∈ R p , where S is defined in (3.24). Proof Following the proof of Theorem 3.37, fix any u ∈ D ∗ (F1 + F2 )(x, y)(v) and (y 1 , y 2 ) ∈ S(x, y) for which we have the inclusion (u, −v) ∈ N ((x, y); gph(F1 + F2 )).

3.6

Generalized Differentiation for Polyhedral Convex Functions and Multifunctions

115

Consider the convex sets    Ω1 = (x, y1 , y2 ) ∈ Rn × R p × R p  y1 ∈ F1 (x) ,    Ω2 = (x, y1 , y2 ) ∈ Rn × R p × R p  y2 ∈ F2 (x) and deduce from the normal cone definition that (u, −v, −v) ∈ N ((x, y 1 , y 2 ); Ω1 ∩ Ω2 ).   Observe that Ω1 = gph F1 × R p , and hence this set is a convex polyhedron in Rn × R p × x ∈ dom F1 ∩ ri(dom F2 ) and select some vectors  y1 ∈ F1 ( x ) and  y2 ∈ ri F2 ( x ). R p . Fix  Using the representation of ri Ω2 from the proof of Theorem 3.37, we arrive at the inclusion ( x,  y1 ,  y2 ) ∈ Ω1 ∩ ri Ω2 . Employing now Theorem 3.48 gives us the relationships (u, −v, −v) ∈ N ((x, y 1 , y 2 ); Ω1 ∩ Ω2 ) = N ((x, y 1 , y 2 ); Ω1 ) + N ((x, y 1 , y 2 ); Ω2 ). To finish the proof, we just follow the proof of Theorem 3.37.



The next theorem chain rule for polyhedral mappings provides the refined coderivative chain rule for compositions of convex set-valued mappings G ◦ F, where the inner mapping F is a polyhedral convex multifunction. → Theorem 3.54 Consider the composition G ◦ F of convex set-valued mappings F : Rn → p p q → R and G : R → R , where F is polyhedral convex. Assume that fulfillment of the qualification condition   rge F ∩ ri dom G = ∅. Then for any (x, z) ∈ gph (G ◦ F), y ∈ M(x, z) and w ∈ Rq we have the coderivative chain rule D ∗ (G ◦ F)(x, z)(w) = D ∗ F(x, y) ◦ D ∗ G(y, z)(w) where the multifunction M is defined in (3.28). Proof Similarly to the proof of Theorem 3.38. Picking u ∈ D ∗ (G ◦ F)(x, z)(w) and y ∈ M(x, z) gives us the inclusion (u, −w) ∈ N ((x, z); gph(G ◦ F)). Define the two convex subsets of Rn × R p × Rq by Ω1 = (gph F) × Rq and Ω2 = Rn × (gph G). Then Ω1 is a convex polyhedron and Ω2 is a convex set. We can directly deduce from the definitions that

116

3 Convex Generalized Differentiation

(u, 0, −w) ∈ N ((x, y, z); Ω1 ∩ Ω2 ). Using Theorem 2.45 tells us that    ri Ω2 = (x, y, z) ∈ Rn × R p × Rq  y ∈ ri(dom G), z ∈ ri G(y) . Select  y ∈ rge F ∩ ri(dom G) and then choose  x ∈ Rn such that  y ∈ F( x ) and choose  z∈ ri G( y). This gives us the inclusion ( x,  y, z) ∈ Ω1 ∩ ri Ω2 . It follows therefore from Theorem 3.48 that (u, 0, −w) ∈ N ((x, y, z); Ω1 ∩ Ω2 ) = N ((x, y, z); Ω1 ) + N ((x, y, z); Ω2 ). The rest of the proof repeats the steps in the proof of Theorem 3.26.

3.7



Exercises for This Chapter

Exercise 3.1 Calculate the subdifferential and singular subdifferential of each of the following convex functions at every point of their domains: f (x) = |x − 1| + |x − 2|, x ∈ R. f (x) = e |x| , x ∈ R. x 2 − 1 if |x| ≤ 1, (iii) f (x) = ∞ otherwise on R. √ − 1 − x 2 if |x| ≤ 1, (iv) f (x) = ∞ otherwise on R. (v) f (x1 , x2 ) = |x1 | + |x2 |, (x1 , x2 ) ∈ R2 .   (vi) f (x1 , x2 ) = max |x1 |, |x2 | , (x1 , x2 ) ∈ R2 .   (vii) f (x) = max a, x − b, 0 , x ∈ Rn , where a ∈ Rn and b ∈ R. (i) (ii)

Exercise 3.2 Let f , g : Rn → R be convex functions with x ∈ dom f ∩ dom g. Suppose that f (x) = g(x) and f (x) ≤ g(x) for all x ∈ Rn . Verify that ∂ f (x) ⊂ ∂g(x). Exercise 3.3 Let f : R → R be a nondecreasing convex function, and let x ∈ R. Show that v ≥ 0 for v ∈ ∂ f (x).

3.7

Exercises for This Chapter

117

Exercise 3.4 Let ai for i = 1, . . . , m be real numbers such that a1 < a2 < . . . < am . Define the function m  f (x) = |x − ai | for x ∈ R. i=1

Find ∂ f (x) at every x ∈ R. Exercise 3.5 Use induction to prove Corollary 3.21. Exercise 3.6 Prove the reverse inclusion “⊃” in (3.17) of Theorem 3.23. Exercise 3.7 Let A ∈ R p×n , and let b ∈ R p . Find the subdifferential of each of the following convex functions at every point x ∈ Rn : (i) f (x) = Ax − b 2 for x ∈ Rn . (ii) f (x) = Ax − b for x ∈ Rn . Exercise 3.8 Let Ω be a convex subset of R p , and let B : Rn → R p be an affine mapping given by B(x) = Ax + b, where A ∈ R p×n and b ∈ R p . Consider the indicator function f (y) = δΩ (y) of Ω for y ∈ R p . (i) Prove that f ◦ B = δΘ , where Θ = B −1 (Ω). (ii) Provide a detailed proof for Corollary 3.26. Exercise 3.9 Find ∂ f (x) for all x ∈ R for the function f defined in Example 3.28. Exercise 3.10 Consider the maximum functions f (x) = max{x1 , . . . , xn } and g(x) = max{|x1 |, . . . , |xn |} for x = (x1 , . . . , xn ) ∈ Rn . Prove that ∂ f (0) = co {e1 , . . . , en } and ∂g(0) = co {±e1 , . . . , ±en }, where e1 = (1, 0, . . . 0), . . . , en = (0, 0, . . . , 0, 1). Exercise 3.11 Let f : Rn → R be a convex function. For any number α ∈ R, consider the sublevel set    Lα ( f ) = x ∈ Rn  f (x) ≤ α Assume that x ∈ int(dom f ), f (x) = α, and 0 ∈ / ∂ f (x). Prove that    N x; Lα ( f ) = λ∂ f (x). λ≥0

118

3 Convex Generalized Differentiation

Exercise 3.12 Let f : Rn → R be a real-valued convex function satisfying the condition ∂ f (x) ⊂  B as x ∈ Rn for some  ≥ 0. Show that f is Lipschitz continuous with constant  on the entire space Rn . Exercise 3.13 Let f : R → R be a real-valued convex function. Prove that f is nondecreasing if and only if ∂ f (x) ⊂ [0, ∞) for all x ∈ R. Exercise 3.14 Derive counterparts of the (basic) subdifferential calculus rules obtained in Sects. 3.2, 3.5, and 3.6 for the case of singular subdifferentials. → R p be a convex set-valued mapping with (x, y) ∈ gph F. Exercise 3.15 Let F : Rn → p Prove that for any v ∈ R we have      D ∗ F(x, y)(v) = u ∈ Rn  u, x − v, y = max u, x − v, y . (x,y)∈gph F

Exercise 3.16 Let F :

Rn

→ →

Rp

be a convex set-valued mapping.

(i) Prove that the dom F is a convex set in Rn . (ii) Prove that for any (x, y) ∈ gph F we have D ∗ F(x, y)(0) = N (x; dom F). Exercise 3.17 (i) In the setting of Theorem 3.37 and its proof, show that Ω1 − Ω2 = (dom F1 − dom F2 ) × R p × R p . Then use this representation to show that ri Ω1 ∩ ri Ω2 = ∅. (ii) In the setting of Theorem 3.37, prove the reverse inclusion in (3.26) without assuming (3.25). Exercise 3.18 Use Theorem 3.37 with Fi = E fi for i = 1, 2 to verify the subdifferential sum rule in Theorem 3.18. Exercise 3.19 (i) Verify representation (3.32) in the proof of Theorem 3.38. (ii) In the setting of Theorem 3.38, prove the reverse inclusion in (3.30). Exercise 3.20 Provide a detailed proof of Corollary 3.39.

3.7

Exercises for This Chapter

119

Exercise 3.21 In the setting of Theorem 3.48, verify that the set Q is a convex polyhedron, and that we have Q ∩ ri Ω = ∅. Exercise 3.22 Formulate and prove a counterpart of Theorem 3.54 for the case where the outer mapping G in the composition G ◦ F is polyhedral convex.

4

Fenchel Conjugate and Further Topics in Subdifferentiation

Duality is one of the central themes of convex analysis and its applications. A large part of this chapter is devoted to Fenchel conjugates, their calculus, and duality relationships. We also consider here some other related topics of generalized differentiation addressing directional derivatives and convex subdifferentiation for a broad class of supremum functions.

4.1

Fenchel Conjugates

Many important issues of convex analysis and its applications (in particular, to optimization) are based on duality. The following notion plays a crucial role in duality considerations (Fig. 4.1). Definition 4.1 Given a function f : Rn → R (not necessarily convex), its Fenchel conjugate f ∗ : Rn → [−∞, ∞] is    f ∗ (v) := sup v, x − f (x)  x ∈ Rn , v ∈ Rn . (4.1) Note that the case where f ∗ (v) = −∞ is not excluded in (4.1), while we have f ∗ (v) > −∞ for all v ∈ Rn if dom f  = ∅. The following examples illustrate calculations of Fenchel conjugates for some functions.

Example 4.2 (i) Given a ∈ Rn and b ∈ R, consider the affine function f (x) = a, x + b, x ∈ Rn . © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 B. Mordukhovich and N. M. Nam, An Easy Path to Convex Analysis and Applications, Synthesis Lectures on Mathematics & Statistics, https://doi.org/10.1007/978-3-031-26458-0_4

121

122

4 Fenchel Conjugate and Further Topics in Subdifferentiation

Fig. 4.1 Fenchel conjugate

Then it can be seen directly from the definition that  −b if v = a, ∗ f (v) = ∞ otherwise. (ii) Given any p > 1, consider the function ⎧ p ⎨x if x ≥ 0, p f (x) = ⎩ ∞ otherwise. For any v ∈ R, the conjugate of this function is given by 

xp

x p   f ∗ (v) = sup vx − − vx  x ≥ 0 .  x ≥ 0 = − inf p p It is clear that f ∗ (v) = 0 if v ≤ 0, since in this case we have vx − p −1 x p ≤ 0 when x ≥ 0. Considering the case of v > 0, we see that the function ψv (x) = p −1 x p − vx is convex and differentiable on (0, ∞) with ψv (x) = x p−1 − v. Thus ψv (x) = 0 if and only if x = v 1/( p−1) , and so ψv attains its minimum on [0, ∞) at x = v 1/( p−1) . Therefore, the conjugate function of f is calculated by 1 p/( p−1) , v ∈ Rn . v f ∗ (v) = 1 − p

4.1

Fenchel Conjugates

123

Taking q from q −1 = 1 − p −1 , we express the conjugate function as ⎧ ⎨0 if v ≤ 0, f ∗ (v) = v q ⎩ otherwise. q Now we show that the conjugate of a proper function is always convex. Proposition 4.3 Let f : Rn → R be a proper function, not necessarily convex. Then its Fenchel conjugate f ∗ : Rn → R is convex on Rn . Proof Given any v ∈ Rn , it follows from (4.1) that       f ∗ (v) = sup v, x − f (x)  x ∈ Rn = sup v, x − f (x)  x ∈ dom f . Fixing x0 ∈ dom f , we clearly get that −∞ < v, x0  − f (x0 ) ≤ f ∗ (v). For any x ∈ dom f , the function ϕx Rn → R given by ϕx (v) = v, x − f (x), v ∈ Rn , is obviously convex. Thus Proposition 1.50 tells us that the Fenchel conjugate f ∗ is convex as the supremum of a family of convex functions.  The next proposition verifies a certain monotonicity property under the conjugacy correspondence. Proposition 4.4 Let f , g : Rn → R be such that f (x) ≤ g(x) for all x ∈ Rn . Then we have f ∗ (v) ≥ g ∗ (v) for all v ∈ Rn . Proof Taking any v ∈ Rn , it follows from (4.1) that v, x − f (x) ≥ v, x − g(x) for all x ∈ Rn . This gives us by the definition that       f ∗ (v) = sup v, x − f (x)  x ∈ Rn ≥ sup v, x − g(x)  x ∈ Rn = g ∗ (v), which implies that f ∗ (v) ≥ g ∗ (v) for all v ∈ Rn .



To continue, define the biconjugate of a proper function f : Rn → R as the conjugate of f ∗ , i.e., f ∗∗ (x) := ( f ∗ )∗ (x). It follows from the definition that f ∗∗ : Rn → [−∞, ∞] is given by    f ∗∗ (x) = sup x, v − f ∗ (v)  v ∈ Rn , x ∈ Rn .

124

4 Fenchel Conjugate and Further Topics in Subdifferentiation

Let us calculate the conjugate and biconjugate functions for the Euclidean norm function on Rn . Example 4.5 Consider the norm function f (x) = x on the space Rn . Its Fenchel conjugate is given by    (4.2) f ∗ (v) = sup v, x − x  x ∈ Rn , v ∈ Rn . Examine separately the following two cases: Case 1: v ∈ B. In this case, v, x − x ≤ v · x − x ≤ 0 whenever x ∈ Rn with v, 0 − 0 = 0. This shows that f ∗ (v) = 0 for all v ∈ B. Case 2: v ∈ / B. Plugging x = tv into (4.2) tells us that f ∗ (v) ≥ v, tv − tv = t v ( v − 1) for all t > 0. Since v > 1, this yields f ∗ (v) = ∞ by letting t → ∞. Combining these two cases, we get f ∗ (v) = δB (v) for all v ∈ Rn . Then it is easy to deduce from the definition of the biconjugate function that f ∗∗ (x) =

x = f (x) whenever x ∈ Rn . Note that the calculation in Example 4.2(ii) shows that if 1/ p + 1/q = 1 for positive real numbers p and q, then vq xp + whenever x, v ≥ 0. vx ≤ p q Assertion (i) of the next proposition demonstrates that such a relationship holds in a more general setting. Similarly to the previous two propositions, we do not require the convexity of the function involved in the proposition below. Proposition 4.6 Let f : Rn → R be a proper function. Then we have (i) v, x ≤ f (x) + f ∗ (v) for all x, v ∈ Rn . (ii) f ∗∗ (x) ≤ f (x) for all x ∈ Rn . Proof Fix any elements x, v ∈ Rn and observe first that both (i) and (ii) are obvious when x∈ / dom f , i.e., f (x) = ∞. If x ∈ dom f , we get from (4.1) that    v, x − f (x) ≤ sup v, x − f (x)  x ∈ dom f = f ∗ (v), which verifies (i). This implies in turn that

4.1

Fenchel Conjugates

125

   f ∗∗ (v) = sup v, x − f ∗ (v)  v ∈ Rn ≤ f (x), which justifies (ii) and thus completes the proof.



The following important result reveals a close relationship between subgradients and Fenchel conjugates of convex functions. Theorem 4.7 For any proper convex function f : Rn → R and any x ∈ dom f , we have that v ∈ ∂ f (x) if and only if (4.3) f (x) + f ∗ (v) = v, x. Proof Taking any v ∈ ∂ f (x) and using (3.1) give us f (x) + v, x − f (x) ≤ v, x for all x ∈ Rn . This readily implies the inequality    f (x) + f ∗ (v) = f (x) + sup v, x − f (x)  x ∈ Rn ≤ v, x. Since the reverse inequality holds by Proposition 4.6(i), we arrive at (4.3). Conversely, suppose that f (x) + f ∗ (v) = v, x and get f ∗ (v) = v, x − f (x). Applying Proposition 4.6(i), we get the estimate v, x − f (x) ≤ f ∗ (v) for every x ∈ Rn . This yields v, x − f (x) ≤ v, x − f (x) for every x ∈ Rn , which shows that v ∈ ∂ f (x).



The result obtained in the previous theorem allows us to find conditions ensuring that the biconjugate f ∗∗ of a convex function agrees with the function itself at a given point. This is a manifestation of duality in convex analysis. Proposition 4.8 Let f : Rn → R be a convex function with x ∈ dom f . Suppose that ∂ f (x)  = ∅. Then we have the equality f ∗∗ (x) = f (x). Proof By Proposition 4.6(ii), it suffices to verify the reverse inequality therein. Fix v ∈ ∂ f (x) and get v, x = f (x) + f ∗ (v) by Theorem 4.7. This gives us the relationships    f (x) = v, x − f ∗ (v) ≤ sup x, v − f ∗ (v)  v ∈ Rn = f ∗∗ (x), which thus complete the proof of this proposition. Taking into account Proposition 3.3, we arrive at the following corollary.



126

4 Fenchel Conjugate and Further Topics in Subdifferentiation

Corollary 4.9 Let f : Rn → R be a convex function with x ∈ ri(dom f ). Then we have the equality f ∗∗ (x) = f (x) . Proof Suppose that x ∈ ri(dom f ). Then we can apply Proposition 3.3 and get that the subgradient set ∂ f (x) is nonempty. Therefore, the claimed equality follows directly from Proposition 4.8.  Finally in this section, we prove a necessary and sufficient condition for the fulfillment of the biconjugacy equality f = f ∗∗ known as the Fenchel-Moreau theorem, which is a fundamental result of convex analysis. Theorem 4.10 Let f : Rn → R be a proper function, and let A be the collection of all affine functions of the form ϕ(x) = a, x + b for x ∈ Rn , where a ∈ Rn and b ∈ R. Denote    A( f ) := ϕ ∈ A  ϕ(x) ≤ f (x) for all x ∈ Rn . Then A( f )  = ∅ whenever epi f is closed and convex. Moreover, the following properties are equivalent: (i) epi f is closed and convex. (ii) f (x) = supϕ∈A( f ) ϕ(x) for all x ∈ Rn . (iii) f ∗∗ (x) = f (x) for all x ∈ Rn . Proof Since f is proper, we have epi f  = ∅. Let us first show that A( f )  = ∅ if epi f is / epi f . By closed and convex. Fix any x0 ∈ dom f and choose λ0 < f (x0 ). Then (x0 , λ0 ) ∈ Proposition 2.33, there exist a pair (v, γ) ∈ Rn × R and a positive number  ensuring the strict inequality v, x + γλ < v, x0  + γλ0 −  whenever (x, λ) ∈ epi f .

(4.4)

Since (x0 , f (x0 ) + α) ∈ epi f for all α ≥ 0, we get γ( f (x0 ) + α) < γλ0 −  whenever α ≥ 0. This yields γ < 0 since if not, we can let α → ∞ and arrive at a contradiction. For any x ∈ dom f , it follows from (4.4) as (x, f (x)) ∈ epi f that v, x + γ f (x) < v, x0  + γλ0 − , x ∈ dom f . This allows us to conclude that

v   f (x) > , x0 − x + λ0 − if x ∈ dom f . γ γ

4.2

Support Functions and Support Intersection Rules

Define now the affine function

v   ϕ(x) = , x0 − x + λ0 − whenever x ∈ Rn γ γ

127

(4.5)

for which we have ϕ ∈ A( f ), and therefore A( f )  = ∅. Let us next prove that (i)=⇒(ii). Taking any x0 ∈ Rn , we intend to show that (ii) is satisfied when x = x0 . By the definition of supremum, this reduces to show that for any / epi f , we apply λ0 < f (x0 ) there exists ϕ ∈ A( f ) such that λ0 < ϕ(x0 ). Since (x0 , λ0 ) ∈ again Proposition 2.33 to obtain (4.4). In the case where x0 ∈ dom f , consider the function ϕ defined in (4.5) and get ϕ ∈ A( f ). Moreover, we have ϕ(x0 ) = λ0 − γ > λ0 since γ < 0. / dom f . It follows from (4.4) by taking any x ∈ dom f Consider further the case where x0 ∈ and letting λ → ∞ that γ ≤ 0. If γ < 0, we use the same process and arrive at the conclusion. Hence it remains to consider the case where γ = 0. In this case, we have that v, x − x0  +  < 0 whenever x ∈ dom f . Since A( f )  = ∅, choose ϕ0 ∈ A( f ) and define ϕk (x) = ϕ0 (x) + k(v, x − x0  + ) for x ∈ Rn , k ∈ N. It is obvious that ϕk ∈ A( f ) and ϕk (x0 ) = ϕ0 (x0 ) + k > λ0 for large k. This readily justifies assertion (ii). Let us now verify implication (ii)=⇒(iii). Fix any ϕ ∈ A( f ). Then ϕ ≤ f , and hence ϕ∗∗ ≤ f ∗∗ . Applying Proposition 4.8 ensures that ϕ = ϕ∗∗ ≤ f ∗∗ . It follows therefore that   f (x) = sup ϕ(x)  ϕ ∈ A( f )} ≤ f ∗∗ for every x ∈ Rn . The reverse inequality f ∗∗ ≤ f holds by Proposition 4.6(ii), and thus f ∗∗ = f . The last implication (iii)=⇒(i) is obvious because the set epi g ∗ is always closed and convex for any proper function g Rn → R. 

4.2

Support Functions and Support Intersection Rules

In this section, we investigate one more class of functions associated with given sets together with their subgradients. Here is the basic definition. Definition 4.11 Given a nonempty subset  of Rn , the support function associated with  is defined by    (4.6) σ (v) := sup v, x  x ∈  , v ∈ Rn .

128

4 Fenchel Conjugate and Further Topics in Subdifferentiation

It is easy to deduce from the definition that the support function (4.6), which is always convex regardless of the convexity of the set Ω in question, is the conjugate function of the indicator one associated with this set. Proposition 4.12 Given a nonempty subset of Rn , we have ∗ δΩ (v) = σΩ (v) for all v ∈ Rn .

Proof Taking any v ∈ Rn and using the definition, we have    ∗ (v) = sup v, x − δ (x)  x ∈ dom δ = Ω δΩ Ω Ω    = sup v, x  x ∈ Ω = σΩ (v), which therefore completes the proof.



Let us now list simple albeit useful properties of the support function that follow from its definition. Proposition 4.13 The following assertions hold: (i) For any nonempty subset Ω of Rn , the support function σΩ is subadditive and positively homogeneous. (ii) For any nonempty subsets Ω1 , Ω2 ⊂ Rn and any v ∈ Rn , we have   σΩ1 +Ω2 (v) = σΩ1 (v) + σΩ2 (v), σΩ1 ∪Ω2 (v) = max σΩ1 (v), σΩ2 (v) . The next result calculates subgradients of support functions by using the strict convex separation theorem. Theorem 4.14 Let Ω be a nonempty closed convex subset of Rn , and let    S(v) = x ∈ Ω  v, x = σΩ (v) , v ∈ Rn . For any v ∈ dom σΩ , we have the following subdifferential representation for the support function (4.6): ∂σΩ (v) = S(v). Proof Fix p ∈ ∂σΩ (v) and get from (3.1) that  p, v − v ≤ σΩ (v) − σΩ (v) whenever v ∈ Rn . Taking this into account, for any u ∈ Rn we have

(4.7)

4.2

Support Functions and Support Intersection Rules

 p, u =  p, u + v − v ≤ σΩ (u + v) − σΩ (v) ≤ σΩ (u) + σΩ (v) − σΩ (v) = σΩ (u).

129

(4.8)

Assuming that p ∈ / Ω, we use the strict separation result of Proposition 2.33 and find u ∈ Rn such that    σΩ (u) = sup u, x  x ∈ Ω <  p, u, which contradicts (4.8) and so yields p ∈ Ω. From (4.8), we also get  p, v ≤ σΩ (v). Letting v = 0 in (4.7) gives us σΩ (v) ≤  p, v. Hence  p, v = σΩ (v) and thus we obtain p ∈ S(v). Now fix p ∈ S(v) and get from p ∈ Ω and  p, v = σΩ (v) that  p, v − v =  p, v −  p, v ≤ σΩ (v) − σΩ (v) for all v ∈ Rn . This shows that p ∈ ∂σΩ (v), which completes the proof. Note that Theorem 4.14 immediately implies that    ∂σΩ (0) = S(0) = x ∈ Ω  0, x = σΩ (0) = 0 = Ω



(4.9)

for any nonempty closed convex set Ω ⊂ Rn . The proposition below justifies a certain monotonicity property of support functions with respect to set inclusions. Proposition 4.15 Let Ω1 and Ω2 be two nonempty closed convex subsets of Rn . Then the inclusion Ω1 ⊂ Ω2 holds if and only if we have σΩ1 (v) ≤ σΩ2 (v) for all v ∈ Rn . Proof Taking Ω1 ⊂ Ω2 , for any v ∈ Rn we get       σΩ1 (v) = sup v, x  x ∈ Ω1 ≤ sup v, x  x ∈ Ω2 = σΩ2 (v). Conversely, suppose that σΩ1 (v) ≤ σΩ2 (v) whenever v ∈ Rn . Since σΩ1 (0) = σΩ2 (0) = 0, it follows from definition (3.1) of the subdifferential that ∂σΩ1 (0) ⊂ ∂σΩ2 (0), which yields Ω1 ⊂ Ω2 by formula (4.9).



The next theorem allows us to represent the support function associated with the intersection of two convex sets in terms of the support functions associated with the sets involved. It plays a central role in our geometric approach to Fenchel conjugate calculus in the next section. To proceed, for f 1 , f 2 : Rn → R define the infimal convolution of these functions by    (4.10) ( f 1  f 2 )(x) := inf f 1 (x1 ) + f 2 (x2 )  x1 + x2 = x , x ∈ Rn .

130

4 Fenchel Conjugate and Further Topics in Subdifferentiation

Theorem 4.16 Let 1 and 2 be convex subsets of Rn . Suppose that ri 1 ∩ ri 2  = ∅. Then we have the relationship σ1 ∩2 (v) = (σ1 σ2 )(v) for all v ∈ Rn . Furthermore, for any v ∈ dom(σ1 ∩2 ) there exist vectors v1 , v2 ∈ Rn such that v = v1 + v2 and we have the representation σ1 ∩2 (v) = σ1 (v1 ) + σ2 (v2 ). Proof Take any v ∈ Rn and v1 , v2 ∈ Rn such that v = v1 + v2 . Then σ1 (v1 ) + σ2 (v2 ) = supx∈1 v1 , x + supx∈2 v2 , x ≥ supx∈1 ∩2 v1 , x + supx∈1 ∩2 v2 , x ≥ supx∈1 ∩2 v1 + v2 , x = supx∈1 ∩2 v, x = σ1 ∩2 (v). It follows by taking the infimum with respect to all v1 , v2 ∈ Rn satisfying v = v1 + v2 due to definition (4.10) of the infimal convolution that (σ1 σ2 )(v) ≥ σ1 ∩2 (v).

(4.11)

It remains to prove the reverse inequality (σ1 σ2 )(v) ≤ σ1 ∩2 (v). Since this inequality is obvious if σ1 ∩2 (v) = ∞, it suffices to consider the case where v ∈ dom(σ1 ∩2 ), i.e., α = σ1 ∩2 (v) ∈ R. Then we have the inequality v, x ≤ α for all x ∈ 1 ∩ 2 . Define the two nonempty convex sets by    1 = (x, λ) ∈ Rn+1  x ∈ 1 , λ ≤ v, x , 2 = Ω2 × [α, ∞). It is easy to see that    ri 1 = (x, λ) ∈ Rn+1  x ∈ ri 1 , λ < v, x , ri 2 = (ri Ω2 ) × (α, ∞).

4.2

Support Functions and Support Intersection Rules

131

Let us show that ri 1 ∩ ri 2 = ∅. Suppose on the contrary that this is not the case and find (x0 , λ0 ) ∈ Rn+1 such that x0 ∈ ri 1 , λ0 < v, x0 , x0 ∈ ri 2 , α < λ0 . Then x0 ∈ ri 1 ∩ ri 2 ⊂ 1 ∩ 2 , and hence α < λ0 < v, x0  ≤ α, which gives us a contradiction. By the proper convex separation from Theorem 2.40, there exists a pair (w, γ) ∈ Rn+1 such that (4.12) w, x + γλ1 ≤ w, y + γλ2 x, λ1 ) ∈ 1 and whenever (x, λ1 ) ∈ 1 and (y, λ2 ) ∈ 2 . In addition, there exist pairs ( ( y,  λ2 ) ∈ 2 for which y + γ λ2 . (4.13) w,  x  + γ λ1 < w,  Taking x ∈ Ω1 ∩ Ω2 and plugging x = x, λ1 = v, x, y = x, and λ2 = α + k with k ∈ N into (4.12), leads us to γv, x ≤ γ(α + k) for all k ∈ N. This implies that γ ≥ 0 since k is taken arbitrarily in N. We intend to show that γ > 0. Suppose on the contrary that γ = 0 and then deduce from (4.12) and (4.13) that w, x ≤ w, y whenever x ∈ 1 , y ∈ 2 . We also have in addition that w,  x  < w,  y with  x ∈ 1 ,  y ∈ 2 . Employing again the strict convex separation from Theorem 2.40 tells us that ri 1 ∩ ri 2 = ∅, which is a contradiction. This shows that γ > 0, and hence from (4.12) with x ∈ 1 , λ1 = v, x, y ∈ 2 , and λ2 = α we deduce that w, x + γv, x ≤ w, y + γα. This readily implies that

w  w  v + , x + − , y ≤ α whenever x ∈ 1 , y ∈ 2 . γ γ

(4.14)

Letting further v1 = v + w/γ and v2 = −w/γ gives us v = v1 + v2 , and it follows from (4.14) that

132

4 Fenchel Conjugate and Further Topics in Subdifferentiation

σ1 (v1 ) + σ2 (v2 ) ≤ α = σ1 ∩2 (v). Thus definition (4.10) of the infimal convolution ensures that (σ1 σ2 )(v) ≤ σ1 (v1 ) + σ2 (v2 ) ≤ α = σ1 ∩2 (v). Recalling finally (4.11) brings us to the equalities (σ1 σ2 )(v) = σ1 (v1 ) + σ2 (v2 ) = σ1 ∩2 (v), which therefore complete the proof of the theorem.

4.3



Conjugate Calculus Rules

This section develops conjugate calculus for extended-real-valued functions that provides rules for representing Fenchel conjugates under various operations over convex functions via conjugates of the functions involved. We start with simple rules, which follow directly from the definitions. Proposition 4.17 Let f : Rn → R be an arbitrary function with λ > 0, c ∈ R, and a ∈ Rn . Then for any v ∈ Rn we have: v . λ ∗ ∗ (ii) ( f + c) (v) = f (v) − c. (iii) ( f a )∗ (v) = f ∗ (v) − v, a, where f a (x) := f (x + a) for x ∈ Rn . (i) (λ f )∗ (v) = λ f ∗

Proof To verify (i), we get by definition that    n  (λ f )∗ (v) = sup v, v vx − λ f (x) x ∈ R

 . , x − f (x)  x ∈ Rn = λ f ∗ = λ sup λ λ The proofs of (ii) and (iii) are also straightforward.



The next proposition expresses the conjugate of the infimal convolution via the sum of the conjugates of the functions involved. Proposition 4.18 Let f 1 , f 2 : Rn → R be proper functions such that ( f 1  f 2 )(x) > −∞ for all x ∈ Rn .

4.3

Conjugate Calculus Rules

Then we have

133

( f 1  f 2 )∗ (v) = f 1∗ (v) + f 2∗ (v) for all v ∈ Rn .

Proof Fix v ∈ Rn and for any x, y ∈ Rn get the relationships    ( f 1  f 2 )∗ (v) = sup v, x − ( f 1  f 2 )(x)  x ∈ Rn ≥ v, x − f 1 (y) − f 2 (x − y) = v, y − f 1 (y) + v, x − y − f 2 (x − y). Taking the supremum with respect to x ∈ Rn gives us ( f 1  f 2 )∗ (v) ≥ v, y − f 1 (y) + f 2∗ (v) whenever y ∈ Rn . Taking further the supremum with respect to y ∈ Rn brings us to ( f 1  f 2 )∗ (v) ≥ f 1∗ (v) + f 2∗ (v). To verify the reverse inequality, we first fix x ∈ Rn and take any x1 , x2 ∈ Rn such that x1 + x2 = x. In this way we get v, x − ( f 1 (x1 ) + f 2 (x2 )) = v, x1  − f 1 (x1 ) + v, x2  − f 2 (x2 ) ≤ f 1∗ (v) + f 2∗ (v). Taking the infimum with respect to all x1 , x2 ∈ Rn such that x1 + x2 = x gives us the inequality v, x − ( f 1  f 2 )(x) ≤ f 1∗ (v) + f 2∗ (v). Finally, we take the supremum with respect to x ∈ Rn and arrive at ( f 1  f 2 )∗ (v) ≤ f 1∗ (v) + f 2∗ (v), which therefore completes the proof of the proposition.


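For example, on R take f1(x) = |x| and f2(x) = x²/2. Their infimal convolution (f1 □ f2)(x) = inf{|y| + (x − y)²/2 | y ∈ R} is the Huber function, which is finite everywhere, so the proposition applies and gives

(f1 □ f2)*(v) = f1*(v) + f2*(v) = δ[−1,1](v) + v²/2,

that is, the conjugate equals v²/2 on [−1, 1] and ∞ otherwise.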

Note that we do not assume the convexity of the functions involved in Proposition 4.18. Since f 1 and f 2 are proper, their conjugate functions f 1∗ and f 2∗ are always convex with values in R. Thus the sum f 1∗ (v) + f 2∗ (v) is well-defined for v ∈ Rn . The condition ( f 1  f 2 )(x) > −∞ for all x ∈ Rn is imposed in this proposition since the Fenchel conjugate is defined only for functions with values in R, i.e., not equal to −∞. Our next goal is to derive a conjugate sum rule. To do it, we are going to use the support intersection rule from Theorem 4.16 and the following lemma, which represents the Fenchel conjugate of the function in question via the support function of its epigraph.


Lemma 4.19 Let f : Rn → R be a function. Then f ∗ (v) = σepi f (v, −1) for all v ∈ Rn . Proof Fix any v ∈ Rn . Since (x, f (x)) ∈ epi f whenever x ∈ dom f , we have f ∗ (v) = sup{v, x − f (x) | x ∈ dom f } = sup{(v, −1), (x, f (x)) | x ∈ dom f } ≤ sup{(v, −1), (x, λ) | (x, λ) ∈ epi f } = σepi f (v, −1). For any (x, λ) ∈ epi f , it follows that f (x) ≤ λ and hence (v, −1), (x, λ) = v, x − λ ≤ v, x − f (x) ≤ f ∗ (v). Taking the supremum with respect to (x, λ) ∈ epi f yields σepi f (v, −1) ≤ f ∗ (v), which completes the proof of the lemma.



Here is the sum rule for Fenchel conjugates of convex functions, while the upper estimate of conjugates for sums holds with no convexity assumption.

Theorem 4.20 Let f1, f2 : Rn → R be proper functions. Then

(f1 + f2)*(v) ≤ (f1* □ f2*)(v) for all v ∈ Rn.

(4.15)

If in addition the functions f1 and f2 are convex and satisfy ri(dom f1) ∩ ri(dom f2) ≠ ∅, then we have the conjugate sum rule

(f1 + f2)*(v) = (f1* □ f2*)(v) for all v ∈ Rn.

(4.16)

Moreover, the infimum in ( f 1∗  f 2∗ )(v) above is attained, i.e., for any v ∈ dom( f 1 + f 2 )∗ there exist v1 , v2 ∈ Rn such that v = v1 + v2 and ( f 1 + f 2 )∗ (v) = f 1∗ (v1 ) + f 2∗ (v2 ).

(4.17)


Proof Fixing any v1 , v2 ∈ Rn with v1 + v2 = v, we get   f 1∗ (v1 ) + f 2∗ (v2 ) = sup v1 , x − f 1 (x) | x ∈ Rn   + sup v2 , x − f 2 (x) | x ∈ Rn    ≥ sup v1 , x − f 1 (x) + v2 ,x − f 2 (x)  x ∈ Rn   = sup v, x − ( f 1 + f 2 )(x)  x ∈ Rn = ( f 1 + f 2 )∗ (v). Thus we obtain (4.15), merely under the proper assumption, by taking the infimum with respect to all v1 , v2 ∈ Rn with v1 + v2 = v. Let us now verify that ( f 1∗  f 2∗ )(v) ≤ ( f 1 + f 2 )∗ (v) holds under the additional assumptions imposed. We only need to consider the case where ( f 1 + f 2 )∗ (v) < ∞, i.e., v ∈ dom ( f 1 + f 2 )∗ . Define two convex sets    Ω1 = (x, λ1 , λ2 ) ∈ Rn × R × R  λ1 ≥ f 1 (x) = epi( f 1 ) × R, (4.18)    Ω2 = (x, λ1 , λ2 ) ∈ Rn × R × R  λ2 ≥ f 2 (x) . It can be checked similarly to the proof of Theorem 3.18 that ri 1 ∩ ri 2  = ∅. Then we have the representation ( f 1 + f 2 )∗ (v) = σΩ1 ∩Ω2 (v, −1, −1).

(4.19)

Applying Theorem 4.16 on the support function intersection rule in (4.19) gives us two triples (v1, −α1, −α2), (v2, −β1, −β2) ∈ Rn × R × R such that (v, −1, −1) = (v1, −α1, −α2) + (v2, −β1, −β2) and

(f1 + f2)*(v) = σΩ1∩Ω2(v, −1, −1) = σΩ1(v1, −α1, −α2) + σΩ2(v2, −β1, −β2).

If α2 ≠ 0, then σΩ1(v1, −α1, −α2) = ∞, which is not possible. Thus α2 = 0 and similarly β1 = 0. Employing Lemma 4.19 and taking into account the structures of the sets Ω1 and Ω2 in (4.18) imply that

(f1 + f2)*(v) = σΩ1∩Ω2(v, −1, −1) = σΩ1(v1, −1, 0) + σΩ2(v2, 0, −1) = σepi f1(v1, −1) + σepi f2(v2, −1) = f1*(v1) + f2*(v2) ≥ (f1* □ f2*)(v).

This justifies the conjugate sum rule (4.16) together with the last statement of the theorem. □

Our next result presents a conjugate chain rule whose proof is also based on the support function representation for set intersection.
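Before turning to the chain rule, it is instructive to check (4.16) and (4.17) in a simple one-dimensional situation: let f1(x) = (x − 1)²/2 and f2(x) = (x + 1)²/2, so that f1*(v) = v²/2 + v and f2*(v) = v²/2 − v. Here (f1 + f2)(x) = x² + 1 and a direct computation gives (f1 + f2)*(v) = v²/4 − 1, while minimizing f1*(v1) + f2*(v2) over v1 + v2 = v yields the same value v²/4 − 1, attained at v1 = v/2 − 1 and v2 = v/2 + 1, in agreement with (4.17).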


Theorem 4.21 Let A ∈ R p×n , and let g : R p → R be a proper function. Then    (g ◦ A)∗ (v) ≤ inf g ∗ (w)  A T w = v , v ∈ Rn .

(4.20)

If in addition g is a convex function and the qualification condition im A ∩ ri(dom g) = A(Rn ) ∩ ri(dom g)  = ∅ is satisfied, then we have the conjugate chain rule    (g ◦ A)∗ (v) = inf g ∗ (w)  A T w = v , v ∈ Rn .

(4.21)

Furthermore, for any v ∈ dom(g ◦ A)∗ there exists w ∈ (A T )−1 (v) such that (g ◦ A)∗ (v) = g ∗ (w). Proof Picking w ∈ R p such that A T w = v gives us by definition that    g ∗ (w) = sup w, y − g(y)  y ∈ R p   ≥ sup w, Ax − g(Ax)  x ∈ Rn   T  x ∈ Rn = sup A w, x − (g ◦ A)(x)    = sup v, x − (g ◦ A)(x)  x ∈ Rn = (g ◦ A)∗ (v), which therefore verifies the upper estimate (4.20). Let us now show that the reverse estimate is also satisfied under the additional assumptions made. It suffices to consider the case where v ∈ dom(g ◦ A)∗ . Pick any v ∈ dom(g ◦ A)∗ and then define the two convex sets Ω1 = gph(A) × R and Ω2 = Rn × epi g ⊂ Rn × R p × R.

(4.22)

It follows directly from the above constructions that (g ◦ A)∗ (v) = σΩ1 ∩Ω2 (v, 0, −1) < ∞. Since Ω1 is an affine set, we have ri Ω1 = Ω1 . Using Corollary 2.46 gives us    ri 2 = (x, y, λ) ∈ Rn × R p × R  y ∈ ri (dom g), λ > g(y) . Observe that ri 1 ∩ ri 2  = ∅ for the sets Ω1 , Ω2 from (4.22). Then the support intersection rule from Theorem 4.16 tells us that there exist some triples (v1 , w1 , α1 ) and (v2 , w2 , α2 ) in the space Rn × R p × R satisfying (v, 0, −1) = (v1 , w1 , α1 ) + (v2 , w2 , α2 ) and σ1 ∩2 (v, 0, −1) = σ1 (v1 , w1 , α1 ) + σ2 (v2 , w2 , α2 ).


We readily deduce from the structures of Ω1 and Ω2 in (4.22) that α1 = 0 and v2 = 0. This gives us the representation

σΩ1∩Ω2(v, 0, −1) = σΩ1(v, −w2, 0) + σΩ2(0, w2, −1).

Thus we arrive at the equalities

σΩ1∩Ω2(v, 0, −1) = sup{⟨v, x⟩ − ⟨w2, Ax⟩ | x ∈ Rn} + σepi g(w2, −1) = sup{⟨v − Aᵀw2, x⟩ | x ∈ Rn} + g*(w2),

which allow us to conclude that v = Aᵀw2 and therefore

σΩ1∩Ω2(v, 0, −1) = g*(w2) ≥ inf{g*(w) | w ∈ (Aᵀ)⁻¹(v)}.

This justifies (4.21) and the concluding statement of the theorem.
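The chain rule can also be checked numerically. The short Python sketch below does so for the quadratic g(y) = ‖y‖²/2 and an invertible matrix A (the data are arbitrary sample values), in which case both sides of (4.21) are available in closed form.

```python
# Numerical sanity check of the conjugate chain rule (4.21) for
# g(y) = 0.5*||y||^2, whose conjugate is g*(w) = 0.5*||w||^2.
# For an invertible A both sides of (4.21) have closed forms:
#   (g o A)*(v) = 0.5 * v^T (A^T A)^{-1} v,
#   inf{ g*(w) : A^T w = v } = 0.5 * ||A^{-T} v||^2.
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])        # any invertible matrix will do
v = np.array([1.0, -2.0])

lhs = 0.5 * v @ np.linalg.inv(A.T @ A) @ v    # (g o A)*(v)
w = np.linalg.solve(A.T, v)                   # unique w with A^T w = v
rhs = 0.5 * w @ w                             # g*(w)

print(lhs, rhs)   # the two values agree up to rounding
```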

4.4 Directional Derivatives

Our next topic is the directional differentiability of convex functions and its relationships with subdifferentiation. In contrast to classical analysis, directional derivative constructions in convex analysis are one-sided and related to directions with no classical plus-minus symmetry.

Definition 4.22 Let f : Rn → R be an extended-real-valued function with x ∈ dom f. The directional derivative of the function f at the point x in the direction d ∈ Rn is the following limit

f′(x; d) := lim_{t→0⁺} [f(x + td) − f(x)]/t    (4.23)

if the limit exists as either a real number or ±∞.

Note that construction (4.23) is sometimes called the right directional derivative of f at x in the direction d. Its left counterpart is defined by

f′−(x; d) := lim_{t→0⁻} [f(x + td) − f(x)]/t.

It is easy to see from the definitions that f − (x; d) = − f (x; −d) for all d ∈ Rn ,


and thus properties of the left directional derivative f′−(x; d) reduce to those of the right one (4.23), which is studied in what follows. First we derive a useful lemma based on convexity.

Lemma 4.23 Given a convex function f : Rn → R with x ∈ dom f and given a direction d ∈ Rn, define the function

ϕ(t) = [f(x + td) − f(x)]/t for t > 0.

Then the function ϕ : (0, ∞) → R is nondecreasing.

Proof Fixing any real numbers 0 < t1 < t2, we get the representation

x + t1 d = (t1/t2)(x + t2 d) + (1 − t1/t2) x.

It follows from the convexity of f that

f(x + t1 d) ≤ (t1/t2) f(x + t2 d) + (1 − t1/t2) f(x),

which implies in turn the inequality

ϕ(t1) = [f(x + t1 d) − f(x)]/t1 ≤ [f(x + t2 d) − f(x)]/t2 = ϕ(t2).

This verifies that ϕ is nondecreasing on (0, ∞).



The next proposition establishes the directional differentiability of an arbitrary proper convex function.

Proposition 4.24 For any convex function f : Rn → R and any x ∈ dom f, the directional derivative f′(x; d) (and hence its left counterpart) exists in every direction d ∈ Rn. Furthermore, it admits the following representation via the function ϕ defined in Lemma 4.23:

f′(x; d) = inf_{t>0} ϕ(t) for all d ∈ Rn.

Proof Lemma 4.23 tells us that the function ϕ is nondecreasing. Thus

f′(x; d) = lim_{t→0⁺} [f(x + td) − f(x)]/t = lim_{t→0⁺} ϕ(t) = inf_{t>0} ϕ(t),

which readily verifies the results claimed in the proposition.


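To see this formula at work, let f(x) = |x| on R, x = 1, and d = −1. Then ϕ(t) = (|1 − t| − 1)/t equals −1 for 0 < t ≤ 1 and 1 − 2/t for t > 1, so ϕ is indeed nondecreasing on (0, ∞) and f′(1; −1) = inf_{t>0} ϕ(t) = −1.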


Corollary 4.25 If f : Rn → R is a convex function, then f′(x; d) is a real number for any x ∈ int(dom f) and any direction d ∈ Rn.

Proof It follows from Theorem 1.64 that f is locally Lipschitz continuous around x. Therefore, there exists ℓ ≥ 0 such that

|f(x + td) − f(x)|/t ≤ ℓ ‖td‖/t = ℓ ‖d‖ for all small t > 0,

which shows that |f′(x; d)| ≤ ℓ ‖d‖ < ∞.



To establish relationships between directional derivatives and subgradients of convex functions, we need the following useful observation.

Lemma 4.26 For any convex function f : Rn → R and x ∈ dom f, we have

f′(x; d) ≤ f(x + d) − f(x) whenever d ∈ Rn.

Proof Lemma 4.23 tells us that the function ϕ therein satisfies the condition ϕ(t) ≤ ϕ(1) = f(x + d) − f(x) for all t ∈ (0, 1). This justifies the claimed property due to f′(x; d) = inf_{t>0} ϕ(t) ≤ ϕ(1).



Here are the aforementioned relationships.

Theorem 4.27 Let f : Rn → R be a convex function with x ∈ dom f. The following assertions are equivalent:
(i) v ∈ ∂f(x).
(ii) ⟨v, d⟩ ≤ f′(x; d) for all d ∈ Rn.
(iii) f′−(x; d) ≤ ⟨v, d⟩ ≤ f′(x; d) for all d ∈ Rn.

Proof Picking any v ∈ ∂f(x) and t > 0, we get ⟨v, td⟩ ≤ f(x + td) − f(x) whenever d ∈ Rn, which verifies implication (i) =⇒ (ii) by dividing both sides of the inequality by t and taking the limit as t → 0⁺. Assuming now that assertion (ii) holds, we get by Lemma 4.26 that

⟨v, d⟩ ≤ f′(x; d) ≤ f(x + d) − f(x) for all d ∈ Rn.


This ensures by (3.1) that v ∈ ∂ f (x), and thus verifies the equivalence between assertions (i) and (ii). Further, it is obvious that (iii) yields (ii). Conversely, if (ii) is satisfied, then for d ∈ Rn we have v, −d ≤ f (x; −d), and thus f − (x; d) = − f (x; −d) ≤ v, d, which justifies the fulfillment of (iii) and completes the proof.


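For instance, let f(x) = |x| on R and x = 0. Then f′(0; d) = |d| and f′−(0; d) = −|d|, so condition (ii) reads vd ≤ |d| for all d ∈ R, which holds exactly when |v| ≤ 1. Theorem 4.27 thus recovers the familiar fact that ∂f(0) = [−1, 1], and (iii) becomes −|d| ≤ vd ≤ |d|.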

Let us next list some properties of (4.23) as a function of the direction.

Proposition 4.28 For any convex function f : Rn → R with x ∈ dom f, define the directional function ψ(d) = f′(x; d). Then this directional function satisfies the following properties:
(i) ψ(0) = 0.
(ii) ψ(d1 + d2) ≤ ψ(d1) + ψ(d2) for all d1, d2 ∈ Rn provided that the right-hand side is well-defined.
(iii) ψ(αd) = αψ(d) whenever d ∈ Rn and α > 0.
(iv) If furthermore x ∈ int(dom f), then ψ is finite on Rn.

Proof It is straightforward to deduce properties (i)–(iii) directly from definition (4.23). For instance, (ii) is satisfied due to the relationships

ψ(d1 + d2) = lim_{t→0⁺} [f(x + t(d1 + d2)) − f(x)]/t
= lim_{t→0⁺} [f((x + 2td1 + x + 2td2)/2) − f(x)]/t
≤ lim_{t→0⁺} [f(x + 2td1) − f(x)]/(2t) + lim_{t→0⁺} [f(x + 2td2) − f(x)]/(2t)
= ψ(d1) + ψ(d2).

Thus it remains to check that ψ(d) is a real number for every d ∈ Rn when x ∈ int(dom f). To proceed, choose α > 0 so small that x + αd ∈ dom f. It follows from Lemma 4.26 that ψ(αd) = f′(x; αd) ≤ f(x + αd) − f(x) < ∞. Employing (iii) gives us ψ(d) < ∞. Further, we have from (i) and (ii) that

0 = ψ(0) = ψ(d + (−d)) ≤ ψ(d) + ψ(−d), d ∈ Rn,


which implies that ψ(d) ≥ −ψ(−d) > −∞ and so verifies (iv).



To derive the second major result of this section, we need one more lemma.

Lemma 4.29 Let f : Rn → R be a convex function, and let the directional function ψ be taken from Proposition 4.28 with x ∈ int(dom f). Then we have:
(i) ∂f(x) = ∂ψ(0).
(ii) ψ*(v) = δΩ(v) for all v ∈ Rn, where Ω = ∂ψ(0).

Proof It follows from Theorem 4.27 that v ∈ ∂f(x) if and only if

⟨v, d − 0⟩ = ⟨v, d⟩ ≤ f′(x; d) = ψ(d) = ψ(d) − ψ(0) for all d ∈ Rn.

This is equivalent to v ∈ ∂ψ(0), and hence (i) holds. To verify (ii), let us first show that ψ*(v) = 0 for all v ∈ Ω = ∂ψ(0). Indeed, we obviously have the estimate

ψ*(v) = sup{⟨v, d⟩ − ψ(d) | d ∈ Rn} ≥ ⟨v, 0⟩ − ψ(0) = 0.

Picking now any v ∈ ∂ψ(0) tells us that

⟨v, d⟩ = ⟨v, d − 0⟩ ≤ ψ(d) − ψ(0) = ψ(d), d ∈ Rn,

which implies therefore the inequality

ψ*(v) = sup{⟨v, d⟩ − ψ(d) | d ∈ Rn} ≤ 0

and hence ensures the fulfillment of ψ*(v) = 0 for any v ∈ ∂ψ(0). It remains to show that ψ*(v) = ∞ if v ∉ ∂ψ(0). For such an element v, find d0 ∈ Rn with ⟨v, d0⟩ > ψ(d0). Since ψ is positively homogeneous by Proposition 4.28(iii), it follows that

ψ*(v) = sup{⟨v, d⟩ − ψ(d) | d ∈ Rn} ≥ sup_{t>0} (⟨v, td0⟩ − ψ(td0)) = sup_{t>0} t(⟨v, d0⟩ − ψ(d0)) = ∞,

which thus completes the proof of the lemma.



Now we are ready to establish a major relationship between the directional derivative and the subdifferential of an arbitrary convex function. Theorem 4.30 Given a convex function f : Rn → R and a point x ∈ int(dom f ), we have the relationship


f′(x; d) = max{⟨v, d⟩ | v ∈ ∂f(x)} for every d ∈ Rn.

Proof It follows from Proposition 4.8, with taking into account the properties of the function ψ(d) = f′(x; d), d ∈ Rn, listed in Proposition 4.28, that f′(x; d) = ψ(d) = ψ**(d) whenever d ∈ Rn. Note that ψ is a real-valued convex function in this setting. Employing now Lemma 4.29 tells us that ψ*(v) = δΩ(v), where Ω = ∂ψ(0) = ∂f(x). Hence we have by Proposition 4.12 that

ψ**(d) = δΩ*(d) = σΩ(d) = sup{⟨v, d⟩ | v ∈ Ω}.

Since the subgradient set Ω = ∂f(x) is compact by Proposition 3.3, we thus complete the proof of the theorem. □

4.5 Subgradients of Supremum Functions

This section concerns subdifferentiation of a broad class of convex functions playing a highly important role in convex analysis and a variety of applications, particularly to problems of optimization. Let T be a nonempty subset of Rp, and let g : T × Rn → R be a real-valued function. For convenience, we also use the notation gt(x) = g(t, x) for (t, x) ∈ T × Rn. The supremum function f : Rn → R for gt over T is

f(x) = sup_{t∈T} g(t, x) = sup_{t∈T} gt(x), x ∈ Rn.    (4.24)

If the supremum in (4.24) is attained (this happens, in particular, when the index set T is compact and g(·, x) is continuous), then (4.24) reduces to the maximum function, which can be written in form (3.20) when T is a finite set. The main goal of this section is to calculate the subdifferential (3.1) of (4.24) when the functions gt are convex. Note that in this case the supremum function (4.24) is also convex by Proposition 1.50. In what follows, we assume without mentioning it that the functions gt : Rn → R are convex for all t ∈ T . For any point x ∈ Rn , define the active index set    S(x) = t ∈ T  gt (x) = f (x) , (4.25) which is empty if the supremum in (4.24) is not attained for x = x. We first present a simple lower estimate for the subgradient set associated with the supremum function (4.24).


Proposition 4.31 Consider the supremum function (4.24) over a nonempty index set T. For any x̄ ∈ dom f, we have the inclusion

cl co ⋃_{t∈S(x̄)} ∂gt(x̄) ⊂ ∂f(x̄).    (4.26)

Proof This inclusion obviously holds if S(x̄) = ∅. If S(x̄) ≠ ∅, pick t ∈ S(x̄) and v ∈ ∂gt(x̄). Then we get gt(x̄) = f(x̄) and hence

⟨v, x − x̄⟩ ≤ gt(x) − gt(x̄) = gt(x) − f(x̄) ≤ f(x) − f(x̄) for all x ∈ Rn,

which shows that v ∈ ∂f(x̄). Since the subgradient set ∂f(x̄) is closed and convex, we arrive at (4.26) and thus complete the proof. □

The function ϕ is said to be upper semicontinuous on Ω if it is upper semicontinuous at every point of this set. The next proposition can be treated as a one-sided version of the Weierstrass existence theorem. Proposition 4.33 Let T be a nonempty compact subset of R p , and let g(·, x) be upper semicontinuous on T for every x ∈ Rn . Then the function f defined in (4.24) is a realvalued convex function. Proof The convexity of f Rn → R has been already mentioned. We need to check that f (x) < ∞ for every x ∈ Rn . Assume on the contrary that f (x) = ∞ for some x ∈ Rn . Then there exists a sequence {tk } ⊂ T such that g(tk , x) → ∞. Since T is compact, suppose without loss of generality that tk → t ∈ T . It follows from the upper semicontinuity of g(·, x) on T that ∞ = lim sup g(tk , x) ≤ g(t, x) < ∞, k→∞

which is a contradiction that completes the proof of the proposition.




The following proposition is a major step to derive the main result of this section on subdifferentiation of supremum functions. Proposition 4.34 Let the index set ∅  = T ⊂ R p be compact, and let g(·, x) be upper semicontinuous on T for every x ∈ Rn . Then the active index set S(x) ⊂ T is nonempty and compact for every x ∈ Rn . Furthermore, we have the compactness in Rn of the convex hull  C = co ∂gt (x). t∈S(x)

Proof Fix x ∈ Rn and get from Proposition 4.33 that f (x) ∈ R for the supremum function (4.24). Consider an arbitrary sequence {tk } ⊂ T with g(tk , x) → f (x) as k → ∞. By the compactness of T , we find t ∈ T such that tk → t along some subsequence. It follows from the imposed upper semicontinuity of the functions g(·, x) on T that f (x) = lim sup g(tk , x) ≤ g(t, x) ≤ f (x), k→∞

which ensures that t ∈ S(x) by the construction of S in (4.25), and thus S(x)  = ∅. Since g(t, x) ≤ f (x), for any t ∈ T we get the representation    S(x) = t ∈ T  g(t, x) ≥ f (x) implying the closedness of S(x) (and hence its compactness since S(x) ⊂ T ), which is an immediate consequence of the upper semicontinuity of g(·, x). It remains to show that the subgradient set  Q= ∂gt (x) t∈S(x)

is compact in Rn . This yields the claimed compactness of C = co Q, since the convex hull of a compact set is compact as follows from the classical Carathéodory theorem; see Corollary 5.7 in the next chapter. To proceed with verifying the compactness of Q, we recall first that Q ⊂ ∂ f (x) by Proposition 4.31. This implies that the set Q is bounded in Rn , since the subgradient set of a real-valued convex function is compact by Proposition 3.3. To check the closedness of Q, take a sequence {vk } ⊂ Q converging to some v and find tk ∈ S(x) such that vk ∈ ∂gtk (x), k ∈ N. Since S(x) is compact, we may assume that tk → t ∈ S(x). Then g(tk , x) = g(t, x) = f (x) and vk , x − x ≤ g(tk , x) − g(tk , x) = g(tk , x) − g(t, x) for all x ∈ Rn by (3.1). This gives us the relationships


v, x − x = lim supvk , x − x k→∞

≤ lim sup g(tk , x) − g(t, x) k→∞

≤ g(t, x) − g(t, x) = gt (x) − gt (x) for all x ∈ Rn , which verify that v ∈ ∂gt (x) ⊂ Q and thus complete the proof.



The next result is of its own interest while being crucial for the proof of the main subdifferential formula of this section. Theorem 4.35 In the setting of Proposition 4.34, we have f (x; v) ≤ σC (v) for all x ∈ Rn , v ∈ Rn via the support function of the set C defined therein. Proof It follows from Propositions 4.24 and 4.25 that −∞ < f (x; v) = inf ϕ(λ) < ∞, λ>0

where the function ϕ is defined in Lemma 4.23 and is proved there to be nondecreasing on (0, ∞). Thus there exists a strictly decreasing sequence λk ↓ 0 as k → ∞ along which we have f (x + λk v) − f (x) , k ∈ N. f (x; v) = lim k→∞ λk For every k, we select tk ∈ T such that f (x + λk v) = g(tk , x + λk v). The compactness of T allows us to find t ∈ T such that tk → t ∈ T along some subsequence. Let us show that t ∈ S(x). Since g(tk , ·) is convex and λk < 1 for large k, we get the relationships   g(tk , x + λk v) = g tk , λk (x + v) + (1 − λk )x (4.27) ≤ λk g(tk , x + v) + (1 − λk )g(tk , x), which imply by the continuity of f (since any real-valued convex function is continuous) and the upper semicontinuity of g(·; x) for x ∈ Rn that f (x) = lim f (x + λk v) = lim g(tk , x + λk v) ≤ g(t, x). k→∞

k→∞

This tells us that f (x) = g(t, x), and therefore t ∈ S(x). Moreover, for any  > 0 and large k, we get the estimate g(tk , x + v) ≤ γ +  with γ = g(t, x + v)


by the upper semicontinuity of g(·, x + v). It follows from (4.27) that g(tk , x) ≥

g(tk , x + λk v) − λk (γ + ) → f (x), 1 − λk

which ensures that limk→∞ g(tk , x) = f (x) due to f (x) ≤ lim inf g(tk , x) ≤ lim sup g(tk , x) ≤ g(t, x) = f (x). k→∞

k→∞

Fix further any λ > 0 and see that λk < λ for all k sufficiently large. Thus we can deduce from Lemma 4.23 that g(tk , x + λk v) − g(t, x) f (x + λk v) − f (x) = λk λk g(tk , x + λk v) − g(tk , x) ≤ λk g(tk , x + λv) − g(tk , x) ≤ for large k. λ This implies in turn the conditions lim sup k→∞

f (x + λk v) − f (x) g(tk , x + λv) − g(tk , x) ≤ lim sup λk λ k→∞ g(t, x + λv) − g(t, x) ≤ λ gt (x + λv) − gt (x) = , λ

which clearly yield the relationships

f′(x; v) ≤ inf_{λ>0} [gt(x + λv) − gt(x)]/λ = g′t(x; v).

Using finally Theorem 4.30 and the inclusion ∂gt(x) ⊂ C obtained in Proposition 4.31 shows that

f′(x; v) ≤ g′t(x; v) = sup_{w∈∂gt(x)} ⟨w, v⟩ ≤ sup_{w∈C} ⟨w, v⟩ = σC(v),

and justifies therefore the statement of the theorem.



Now we are ready to establish the main result of this section. Theorem 4.36 Let f be defined in (4.24), where the index set ∅  = T ⊂ R p is compact, and where the function g(t, ·) is convex on Rn for every t ∈ T . Suppose in addition that


the function g(·, x) is upper semicontinuous on T for every x ∈ Rn. Then we have the subdifferential formula

∂f(x) = co ⋃_{t∈S(x)} ∂gt(x).

Proof Taking any w ∈ ∂f(x), we deduce from Theorems 4.30 and 4.35 that

⟨w, v⟩ ≤ f′(x; v) ≤ σC(v) for all v ∈ Rn,

where C is defined in Proposition 4.34. Thus w ∈ ∂σC(0) = C by formula (4.9) for the set Ω = C, which is a nonempty closed convex subset of Rn. The reverse inclusion follows from Proposition 4.31. □
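When T is finite and each gt is affine, the theorem says that ∂f(x) is the convex hull of the active gradients, and by Theorem 4.30 the directional derivative is the maximum of ⟨at, d⟩ over the active indices. The following Python sketch (with arbitrarily chosen sample data) compares that maximum with a finite-difference approximation of f′(x; d).

```python
# Illustration of Theorem 4.36 for a maximum of finitely many affine
# functions g_t(x) = <a_t, x> + b_t on R^2 (sample data).  Here S(xbar)
# collects the active indices and, by Theorems 4.30 and 4.36,
# f'(xbar; d) = max{<a_t, d> : t in S(xbar)} for every direction d.
import numpy as np

a = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # gradients a_t
b = np.array([0.0, 0.0, -1.0])                       # offsets  b_t
f = lambda x: np.max(a @ x + b)

xbar = np.array([1.0, 1.0])                 # all three functions are active here
active = np.isclose(a @ xbar + b, f(xbar))

for d in [np.array([1.0, 0.0]), np.array([-1.0, 2.0])]:
    fd = (f(xbar + 1e-8 * d) - f(xbar)) / 1e-8   # numerical f'(xbar; d)
    formula = np.max((a @ d)[active])            # max over active gradients
    print(round(fd, 6), round(formula, 6))       # the two values agree
```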

4.6 Exercises for This Chapter

Exercise 4.1 Find the Fenchel conjugate of each of the following functions on R:
(i) f(x) = e^x.
(ii) f(x) = |x|.
(iii) f(x) = ax² + bx + c, where a > 0.
(iv) f(x) = x ln(x) if x > 0, and f(x) = ∞ if x ≤ 0.
(v) f(x) = −ln(x) if x > 0, and f(x) = ∞ if x ≤ 0.
(vi) f(x) = −√x if x ≥ 0, and f(x) = ∞ if x < 0.


Exercise 4.3 Let f (x) = e−x for x ∈ R. Find f ∗ and f ∗∗ . 2

Exercise 4.4 Consider the function f : R2 → R defined by  √ − x1 x2 if x1 ≥ 0, x2 ≥ 0, f (x) = ∞ otherwise for x = (x1 , x2 ) ∈ R2 . Find f ∗ and f ∗∗ . Exercise 4.5 (i) Let f : R → R be such that f (0) = 0 and f (t) ≥ 0 for all t ∈ R. Define ϕ(x) = f ( x ), x ∈ Rn . Prove that ϕ∗ (v) = f ∗ ( v ) for all v ∈ Rn . (ii) Let ϕ(x) = x p / p for x ∈ Rn , where p > 1. Find ϕ∗ and ϕ∗∗ . Exercise 4.6 (i) Let f : Rn → R be a convex and positively homogeneous function. Show that f ∗ (v) = δΩ (v) for all v ∈ Rn , where Ω = ∂ f (0). n |xi |, and let g(x) = max{|xi | | i = 1, . . . , n} for x ∈ Rn . Find (ii) Let f (x) = x 1 = i=1 ∗ ∗ ∗∗ ∗∗ f , g and f , g . ∗∗ . Exercise 4.7 Let ∅  = Ω ⊂ Rn . Find the biconjugate function δΩ

Exercise 4.8 Give the detailed proofs of Proposition 4.17(ii) and (iii). Exercise 4.9 Let f : Rn → R be a function with dom f = ∅. Show that f ∗∗ (x) = f (x) for all x ∈ Rn . Exercise 4.10 Give an example of a convex function f : Rn → R such that f ∗∗  = f . Exercise 4.11 Let f : Rn → R be an extended-real-valued function such that epi f is nonempty, closed, and convex. Prove that for x, v ∈ Rn we have the equivalence v ∈ ∂ f (x) if and only if x ∈ ∂ f ∗ (v). Exercise 4.12 Let f 1 , f 2 : Rn → R be proper convex functions. (i) Suppose that f 1 and f 2 share an affine minorant, i.e., there exist a vector v ∈ Rn and a number b ∈ R such that v, x + b ≤ f i (x) for all x ∈ Rn , i = 1, 2. Prove that f 1  f 2 : Rn → R is a proper convex function. (ii) Prove that if dom f 1∗ ∩ dom f 2∗  = ∅, then f 1 and f 2 share an affine minorant.


Exercise 4.13 Let f : R → R be convex. Prove that ∂ f (x) = [ f − (x), f + (x)], where f − (x) and f + (x) stand for the left and right derivative of f at x, respectively. Exercise 4.14 Define the function f : R → R by  √ − 1 − x 2 if |x| ≤ 1, f (x) = ∞ otherwise. (i) Find the Fenchel conjugate f ∗ . (ii) Show that f is a convex function and calculate its directional derivatives f (−1; d) and f (1; d) in the direction d = 1. Exercise 4.15 Let f : Rn → R be convex with x ∈ dom f . For any d ∈ Rn , consider the function ϕ defined in Lemma 4.23 and show that ϕ is nondecreasing on (−∞, 0). Exercise 4.16 Let f : Rn → R be convex with x ∈ dom f . Verify the following: (i) f − (x; d) ≤ f (x; d) for all d ∈ Rn . (ii) f (x) − f (x − d) ≤ f − (x; d) ≤ f (x; d) ≤ f (x + d) − f (x) for all d ∈ Rn . Exercise 4.17 Prove the assertions of Proposition 4.13. Exercise 4.18 Let Ω ⊂ Rn be a nonempty compact set. Using Theorem 4.36, calculate the subdifferential of the support function ∂σΩ (0). Exercise 4.19 Let Ω be a nonempty and compact subset of Rn . Define the function    μΩ (x) = sup x − ω  ω ∈ Ω , x ∈ Rn . Prove that μΩ is convex and find its subdifferential ∂μΩ (x) at any x ∈ Rn . Exercise 4.20 Using the representation   

‖x‖ = sup{⟨v, x⟩ | ‖v‖ ≤ 1} and Theorem 4.36, calculate the subdifferential ∂p(x) of the norm function p(x) = ‖x‖ at any point x ∈ Rn.

Exercise 4.21 Let d(·; Ω) be the distance function (1.11) associated with some nonempty closed convex subset Ω of Rn.


(i) Prove the representation

d(x; Ω) = sup{⟨x, v⟩ − σΩ(v) | ‖v‖ ≤ 1}

via the support function σΩ of Ω.
(ii) Calculate the subdifferential ∂d(x; Ω) at any point x ∈ Rn.

5 Remarkable Consequences of Convexity

This chapter is devoted to remarkable topics of convex analysis that, together with subdifferential calculus, Fenchel conjugates and related results of previous chapters, constitute major achievements of this discipline highly important for many applications—first of all, to convex optimization.

5.1 Characterizations of Differentiability

Here we study relationships between the Gâteaux and Fréchet derivatives of functions introduced in Definition 3.8 in the general framework. Then we show that for convex functions, these notions can be characterized via a single-element subdifferential of convex analysis. It follows from Definition 4.22 of the directional derivative that f′−(x; d) = −f′(x; −d). Thus f′G(x) = v if and only if the directional derivative f′(x; d) exists and is linear with respect to d, i.e., f′(x; d) = ⟨v, d⟩ for all d ∈ Rn; see Exercise 5.1 and its solution. Moreover, if f is Gâteaux differentiable at x, then

⟨f′G(x), d⟩ = lim_{t→0} [f(x + td) − f(x)]/t for all d ∈ Rn.

Proposition 5.1 Let f : Rn → R be a function (not necessarily convex), which is locally Lipschitz continuous around x ∈ Rn . Then the following assertions are equivalent: (i) f is Fréchet differentiable at x. (ii) f is Gâteaux differentiable at x.


Proof Suppose that f is Fréchet differentiable at x with v = f′F(x). Fixing a nonzero element d ∈ Rn and taking into account that ‖x + td − x‖ = |t| · ‖d‖ → 0 as t → 0, we have

lim_{t→0} [f(x + td) − f(x) − t⟨v, d⟩]/t = ‖d‖ lim_{t→0} [f(x + td) − f(x) − ⟨v, td⟩]/‖td‖ = 0,

which yields the Gâteaux differentiability of f at x with f′G(x) = v. Note that the local Lipschitz continuity of f around x is not required for this implication. Assuming now that f is Gâteaux differentiable at x with v = f′G(x), let us show that it is Fréchet differentiable at this point, i.e.,

lim_{h→0} [f(x + h) − f(x) − ⟨v, h⟩]/‖h‖ = 0.

On the contrary, suppose that this is not the case. Then we can find 0 > 0 and a sequence of h k → 0 with h k  = 0 as k ∈ N such that   f (x + h k ) − f (x) − v, h k | 0 ≤ for all k ∈ N. (5.1) h k  Denote tk = h k  and dk = h k /h k , k ∈ N. Then tk → 0, and we get without loss of generality that dk → d with d = 1. This gives us the estimate   f (x + tk dk ) − f (x) − v, tk dk | for all k ∈ N. 0 ≤ tk Let  ≥ 0 be a Lipschitz constant of f around x. The triangle inequality and elementary transformations tell us that   f (x + tk dk ) − f (x) − v, tk dk | 0 ≤ tk   f (x + tk dk ) − f (x + tk d)| + |v, tk d − v, tk dk | ≤ tk   f (x + tk d) − f (x) − v, tk d| + tk ≤ dk − d + v · dk − d   f (x + tk d) − f (x) − v, tk d| → 0 as k → ∞, + tk which is a contradiction verifying the Fréchet differentiability of f at x.




The next theorem provides a complete subdifferential characterization of both notions of differentiability that are equivalent for convex functions.

Theorem 5.2 Let f : Rn → R be a convex function with x ∈ int(dom f). The following assertions are equivalent:
(i) f is Fréchet differentiable at x.
(ii) f is Gâteaux differentiable at x.
(iii) ∂f(x) is a singleton.

Proof It follows from Theorem 1.64 that f is locally Lipschitzian around x, and hence (i) and (ii) are equivalent by Proposition 5.1. Suppose now that (ii) is satisfied, and let v = f′G(x). Then f′−(x; d) = f′(x; d) = ⟨v, d⟩ for all d ∈ Rn. Observe that ∂f(x) ≠ ∅ since x ∈ int(dom f). Fix any w ∈ ∂f(x). It follows from Theorem 4.27 that f′−(x; d) ≤ ⟨w, d⟩ ≤ f′(x; d) whenever d ∈ Rn, and hence ⟨w, d⟩ = ⟨v, d⟩; so ⟨w − v, d⟩ = 0 for all d. Plugging there d = w − v gives us w = v and thus justifies that ∂f(x) is a singleton. Suppose finally that (iii) is satisfied with ∂f(x) = {v} and then show that (ii) holds. By Theorem 4.30, we have the representation

f′(x; d) = max_{w∈∂f(x)} ⟨w, d⟩ = ⟨v, d⟩ for all d ∈ Rn.

This implies that f is Gâteaux and hence Fréchet differentiable at x.
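The function f(x) = |x| on R illustrates the theorem: at x = 0 the subdifferential ∂f(0) = [−1, 1] is not a singleton, and f is indeed not differentiable there, while at any x ≠ 0 we have ∂f(x) = {sign x}, in accordance with the fact that f is differentiable at such points with derivative sign x.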

5.2 Carathéodory Theorem

We begin with some preliminary statements on conic and convex hulls of sets that lead us to one of the most remarkable results of convex geometry established by Constantin Carathéodory in 1911. Recall that a set Ω is a cone if λx ∈ Ω for all λ ≥ 0 and all x ∈ Ω. It follows from the definition that 0 ∈ Ω if Ω is a nonempty cone.

Definition 5.3 Let Ω ⊂ Rn be a nonempty set.
(i) The cone generated by Ω is the set defined in (1.17), namely

cone Ω = R₊Ω = ⋃_{t≥0} tΩ.


Fig. 5.1 Convex cone generated by a set

(ii) We say that KΩ is the convex cone generated by Ω, or the convex conic hull of Ω, if it is the intersection of all the convex cones containing Ω; see an illustration in Fig. 5.1.

Here is a convenient description of convex cones generated by sets.

Proposition 5.4 Let Ω be a nonempty subset of Rn. Then the convex cone generated by Ω admits the following representation:

KΩ = {∑_{i=1}^m λi ai | λi ≥ 0, ai ∈ Ω, m ∈ N}.    (5.2)

If, in particular, the set Ω is convex, then KΩ = cone Ω.

Proof Denote by C the set on the right-hand side of (5.2) and observe that it is a convex cone which contains Ω. It remains to show that C ⊂ K for any convex cone K containing Ω. To see this, fix such a cone and form the combination x = ∑_{i=1}^m λi ai with λi ≥ 0, ai ∈ Ω, and m ∈ N. If λi = 0 for all i, then x = 0 ∈ K. Otherwise, suppose that λi > 0 for some i and denote λ = ∑_{i=1}^m λi > 0. Since ai ∈ Ω ⊂ K and K is a convex cone, this yields

x = λ ∑_{i=1}^m (λi/λ) ai ∈ K,

which ensures that C ⊂ K , and thus K Ω = C. The last assertion of the proposition obviously follows from the definitions. 


The next proposition provides a powerful design, which plays a crucial role in deriving many results of finite-dimensional analysis and optimization. Besides its applications presented below, this design applies, e.g., to establishing fundamental theorems of linear programming.

Proposition 5.5 For any nonempty set Ω ⊂ Rn and any x ∈ KΩ \ {0}, we have the following representation:

x = ∑_{i=1}^m λi ai with λi > 0, ai ∈ Ω as i = 1,...,m and m ≤ n.

Proof Let x ∈ KΩ \ {0}. It follows from Proposition 5.4 that x = ∑_{i=1}^m μi ai with μi > 0 and ai ∈ Ω for i = 1,...,m and m ∈ N. If the elements a1,...,am are linearly dependent, then there are numbers γi ∈ R, not all zeros, such that

∑_{i=1}^m γi ai = 0.

Denote I = {i = 1,...,m | γi > 0} ≠ ∅ and get for any ε > 0 that

x = ∑_{i=1}^m μi ai = ∑_{i=1}^m μi ai − ε ∑_{i=1}^m γi ai = ∑_{i=1}^m (μi − εγi) ai.

Letting ε = min{μi/γi | i ∈ I} = μ_{i0}/γ_{i0} with i0 ∈ I and βi = μi − εγi for i = 1,...,m, we have the relationships

x = ∑_{i=1}^m βi ai with β_{i0} = 0 and βi ≥ 0 for any i = 1,...,m.

Continuing this process, after a finite number of steps we represent x as a positive linear combination of linearly independent elements {a j | j ∈ J }, where J ⊂ {1, . . . , m}. Thus the index set J cannot contain more than n elements, which completes the proof of the proposition.  The following fundamental result of finite-dimensional geometry is known as the Carathéodory theorem. It relates convex hulls of sets to the dimension of the space; see an illustration in Fig. 5.2. Theorem 5.6 Let Ω be a nonempty subset of Rn . Then every x ∈ co Ω can be represented as a convex combination of no more than n + 1 elements of Ω.


Fig. 5.2 Carathéodory theorem

Proof Denoting A = {1} × Ω ⊂ Rn+1, it is easy to observe that co A = {1} × co Ω and that co A ⊂ K_A. Take now any x ∈ co Ω and get (1, x) ∈ co A. Proposition 5.5 ensures the existence of λi ≥ 0 and (1, ai) ∈ A for i = 0,...,m such that m ≤ n and we have the representation

(1, x) = ∑_{i=0}^m λi (1, ai).

This implies that x = ∑_{i=0}^m λi ai with ∑_{i=0}^m λi = 1, λi ≥ 0, and m ≤ n.


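Carathéodory's theorem also has a simple computational reading: expressing x ∈ co Ω amounts to a feasible linear program with n + 1 equality constraints, and a basic feasible solution of that program uses at most n + 1 points of Ω. The Python sketch below (with randomly generated sample points) illustrates this in R².

```python
# A small illustration of the Caratheodory theorem in R^2 (sample data).
# Writing x as a convex combination of p_1,...,p_m is an LP with n+1 = 3
# equality constraints; a basic feasible solution uses at most 3 points.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
P = rng.random((8, 2))               # m = 8 points in R^2
x = P.mean(axis=0)                   # certainly lies in co(P)

A_eq = np.vstack([P.T, np.ones(8)])  # rows: two coordinates and sum of weights
b_eq = np.append(x, 1.0)
res = linprog(c=np.zeros(8), A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 8, method="highs")

support = np.flatnonzero(res.x > 1e-9)
print(support, res.x[support])       # typically at most 3 active points
```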

The next useful assertion is a direct consequence of Theorem 5.6; see Exercise 5.10 and the hint to its solution. Corollary 5.7 Suppose that a subset Ω of Rn is compact. Then the set co Ω is also compact in Rn .

5.3 Farkas Lemma

This section concerns another remarkable result of convex analysis known as the Farkas lemma, which is highly important in various applications, particularly those related to optimization. First we present two propositions of their own interest. Note that the design given in Proposition 5.5 of the preceding section plays a crucial role in the proofs of these propositions and hence of the Farkas lemma.


Proposition 5.8 Let a1,...,am be linearly dependent elements in Rn, where ai ≠ 0 for all i = 1,...,m. Define the sets

Ω = {a1,...,am} and Ωi = Ω \ {ai} for i = 1,...,m.

Then the convex cones generated by these sets are related as

KΩ = ⋃_{i=1}^m KΩi.    (5.3)

Proof We obviously have that ⋃_{i=1}^m KΩi ⊂ KΩ. To verify the reverse inclusion, pick any x ∈ KΩ and get

x = ∑_{i=1}^m μi ai with μi ≥ 0, i = 1,...,m.

If x = 0, then clearly x ∈ ⋃_{i=1}^m KΩi. Otherwise, the linear dependence of the ai allows us to find γi ∈ R, not all zeros, such that

∑_{i=1}^m γi ai = 0.

Following the design of Proposition 5.5, we represent x as

x = ∑_{i=1, i≠i0}^m λi ai for some i0 ∈ {1,...,m},

which shows that x ∈ ⋃_{i=1}^m KΩi and thus verifies (5.3).

K Ωi and thus verifies (5.3).



Proposition 5.9 Let a1 , . . . , am be elements of Rn . Then the convex cone K Ω generated by Ω = {a1 , . . . , am } is a closed set in Rn . Proof Assume first that the vectors a1 , . . . , am are linearly independent and take any sequence {xk } ⊂ K Ω converging to x. By the construction of K Ω , find numbers αki ≥ 0 for each i = 1, . . . , m and k ∈ N such that xk =

m 

αki ai .

i=1

Letting αk = (αk1 , . . . , αkm ) ∈ Rm and arguing by contradiction, it is easy to check that the sequence {αk } is bounded. This allows us to suppose without loss of generality that αk → (α1 , . . . , αm ) as k → ∞ with αi ≥ 0 for all i = 1, . . . , m. Thus we get by passing m αi ai ∈ K Ω , which verifies the closedness of K Ω . to a limit that xk → x = i=1

158

5 Remarkable Consequences of Convexity

Considering next arbitrary elements a1 , . . . , am , assume without loss of generality that ai  = 0 for all i = 1, . . . , m. By the above, it remains to examine the case where a1 , . . . , am are linearly dependent. Then by Proposition 5.8 we have the cone representation (5.3), where each set Ωi contains m − 1 elements. If at least one of the sets Ωi consists of linearly dependent elements, we can represent the corresponding cone K Ωi in a similar way via the sets of m − 2 elements. This gives us, after a finite number of steps, that KΩ =

p 

KC j ,

j=1

where each set C j ⊂ Ω contains only linear independent elements, and so the cones K C j are closed. Thus the cone K Ω under consideration is closed as well as it is the union of a finite number of closed sets.  Now we are ready to derive a fundamental result of convex analysis and optimization established by Gyula Farkas in 1894. Theorem 5.10 Let Ω = {a1 , . . . , am } with a1 , . . . , am ∈ Rn , and let b ∈ Rn . The following assertions are equivalent: (i) b ∈ K Ω . (ii) For any x ∈ Rn , we have the implication



 ai , x ≤ 0, i = 1, . . . , m =⇒ b, x ≤ 0 .

Proof To verify implication (i) =⇒ (ii), take b ∈ K Ω and find λi ≥ 0 for i = 1, . . . , m such that m  b= λi ai . i=1

Then, given x ∈

Rn ,

the inequalities ai , x ≤ 0 for all i = 1, . . . , m imply that b, x =

m 

λi ai , x ≤ 0,

i=1

which is stated in (ii). To prove the converse implication, suppose that b ∈ / K Ω and show that (ii) does not hold in this case. Indeed, since the convex cone K Ω is closed by Proposition 5.9, the strict separation result of Proposition 2.33 yields the existence of x ∈ Rn satisfying 

sup u, x  u ∈ K Ω } < b, x. By using the conic structure of K Ω , we have 0 = 0, x < b, x and tu ∈ K Ω for all t > 0, u ∈ K Ω . This tells us that

5.4

Radon Theorem and Helly Theorem

159



t sup u, x  u ∈ K Ω < b, x whenever t > 0. Dividing both sides of this inequality by t > 0 and letting t → ∞ give us 

sup u, x  u ∈ K Ω ≤ 0. It follows that ai , x ≤ 0 for all i = 1, . . . , m while b, x > 0. This is a contradiction, which therefore completes the proof of the theorem. 

5.4

Radon Theorem and Helly Theorem

In this section, we introduce two important theorems of convex geometry. Lemma 5.11 Consider arbitrary elements w1 , . . . , wm in Rn with m ≥ n + 2. Then these elements are affinely dependent. Proof Form the elements w2 − w1 , . . . , wm − w1 in Rn and observe that these vectors are linearly dependent since m − 1 > n. Thus the vectors w1 , . . . , wm are affinely dependent by Proposition 2.10.  We continue with a theorem by Radon established in 1923 and used by himself to give a simple proof of a celebrated theorem of convex geometry discovered by Helly in 1913. Theorem 5.12 Let w1 , . . . , wm of Rn with m ≥ n + 2, and let I = {1, . . . , m}. Then there are two nonempty disjoint subsets I1 and I2 of I with I = I1 ∪ I2 such that for Ω1 = {wi | i ∈ I1 }, Ω2 = {wi | i ∈ I2 } we have co Ω1 ∩ co Ω2  = ∅. Proof Since m ≥ n + 2, it follows from Lemma 5.11 that the vectors w1 , . . . , wm are affinely dependent. Thus there exist real numbers λ1 , . . . , λm , not equal to zero simultaneously, such that m m   λi wi = 0 and λi = 0. i=1

i=1

Consider the index sets I1 = {i = 1, . . . , m | λi > 0} and I2 = {i = 1, . . . , m | λi ≤ 0}.   m λi = 0, both sets I1 and I2 are nonempty and i∈I1 λi = − i∈I2 λi . Denoting Since i=1  λ = i∈I1 λi , we have the equalities  i∈I1

λi wi = −

 i∈I2

λi wi and

 λi i∈I1

λ

wi =

 −λi i∈I2

λ

wi .

160

5 Remarkable Consequences of Convexity

Define finally the sets Ω1 = {wi | i ∈ I1 }, Ω2 = {wi | i ∈ I2 } and observe that co Ω1 ∩ co Ω2  = ∅ since  −λi  λi wi = wi ∈ co Ω1 ∩ co Ω2 . λ λ i∈I1

i∈I2



This completes the proof of the Radon theorem. Now we are ready to formulate and prove the fundamental Helly theorem.



Theorem 5.13 Let O = Ω1 , . . . , Ωm be a collection of convex sets in Rn with m ≥ n + 1. Suppose that the intersection of any subcollection of n + 1 sets from O is nonempty. Then m Ωi  = ∅. we have i=1 Proof Let us prove the theorem by using induction with respect to the number of elements in O. The conclusion of the theorem is obvious if m = n + 1. Suppose that the conclusion

holds for any collection of m convex sets of Rn with m ≥ n + 1. Let Ω1 , . . . , Ωm , Ωm+1 be a collection of convex sets in Rn such that the intersection of any subcollection of n + 1 sets is nonempty. For i = 1, . . . , m + 1, define i =

m+1 

Ωj

j=1, j=i

and observe that i ⊂ Ω j whenever j  = i and i, j = 1, . . . , m + 1. It follows from the induction hypothesis that i  = ∅, and so we can pick wi ∈ i for every i = 1, . . . , m + 1. Applying the above Radon theorem for the elements w1 , . . . , wm+1 , we find two nonempty disjoint index sets I1 and I2 of I = {1, . . . , m + 1} with I = I1 ∪ I2 such that for W1 = {wi | i ∈ I1 } and W2 = {wi | i ∈ I2 } it holds co W1 ∩ co W2  = ∅. This allows us to pick m+1 Ωi and thus arrive at the an element w ∈ co W1 ∩ co W2 . We finally show that w ∈ i=1 conclusion. Since i  = j for i ∈ I1 and j ∈ I2 , it follows that i ⊂ Ω j whenever i ∈ I1 and j ∈ I2 . Fix any i = 1, . . . , m + 1 and consider the case where i ∈ I1 . Then w j ∈  j ⊂ Ωi for every j ∈ I2 . Hence w ∈ co W2 = co {w j | j ∈ I2 } ⊂ Ωi due to the convexity of Ωi . It shows that w ∈ Ωi for every i ∈ I1 . Similarly, we can verify that w ∈ Ωi for every i ∈ I2 and thus complete the proof of the theorem. 

5.5

Extreme Points and Minkowski Theorem

Here we present one more classical theorem of convex analysis, which was proved by Hermann Minkowski for n = 3 as a consequence of his convex separation theorem in his posthumously published book of 1911. The major notion studied in this section is the following.

5.5

Extreme Points and Minkowski Theorem

161

Definition 5.14 Let Ω be a nonempty subset of Rn . An element x ∈ Ω is an extreme point of Ω if there do not exist a, b ∈ Ω with a  = b such that x ∈ (a, b). The set of all extreme points of Ω is denoted by ext Ω. This introduced notion is illustrated by the simple planar examples below. Example 5.15 (i) Let Ω = {(x, y) ∈ R2 | |x| ≤ 1 and |y| ≤ 1}, i.e., it is the unit square in R2 . Then the set ext Ω consists of the four points: (1, 1), (1, −1), (−1, 1), (−1, −1). (ii) Let Ω = {(x, y) ∈ R2 | x 2 + y 2 ≤ 1}, the closed unit ball in R2 . Then the set ext Ω is the unit sphere S = {(x, y) ∈ R2 | x 2 + y 2 = 1}. (iii) Let Ω = {(x, y) ∈ R2 | y < 0} ∪ {(0, 0)}. Then (0, 0) is the only extreme point of the set Ω. The next notions are closely related to convex separation theorems. Definition 5.16 Let Ω be a nonempty subset of Rn with x ∈ bd Ω. (i) If there exists a nonzero element v ∈ Rn such that v, x ≤ v, x for all x ∈ Ω, then we say that there is a supporting hyperplane to Ω at x and the corresponding supporting hyperplane is given by 

(5.4) H = x ∈ Rn  v, x = v, x . (ii) If there exists a nonzero element v ∈ Rn such that v, x ≤ v, x for all x ∈ Ω and there exists  x ∈ Ω such that v,  x  < v, x , then we say that there is a proper supporting hyperplane to Ω at x defined by (5.4). Yet another definition is used in what follows. Definition 5.17 Let Ω be a nonempty subset of Rn . The relative boundary of Ω is defined and denoted by rb Ω := Ω \ ri Ω. Now we present the existence results for supporting hyperplanes and proper supporting hyperplanes to convex sets at their boundary points that are consequences of the corresponding convex separation theorems. Proposition 5.18 Let Ω be a nonempty convex subset of Rn . We have (i) If x ∈ bd Ω, then there is a supporting hyperplane to Ω at x. (ii) If x ∈ rb Ω, then there is a proper supporting hyperplane to Ω at x.

162

5 Remarkable Consequences of Convexity

Proof Suppose that Ω is a nonempty convex subset of Rn with x ∈ bd Ω. In the case where x ∈ Ω, it follows from the definition that there is a supporting hyperplane to Ω at x if and only if N (x; Ω)  = {0}, which holds by Proposition 2.53. Consider further the case where x∈ / Ω, and so x ∈ / ri Ω. Applying Theorem 2.40 gives us a nonzero element v ∈ Rn such that v, x ≤ v, x for all x ∈ Rn . Therefore, in any case there is a supporting hyperplane / ri Ω, and so to Ω at x, which completes the proof of (i). Observe that if x ∈ rb Ω, then x ∈ (ii) follows directly from the proper separation in Theorem 2.40.  The following two lemmas can be viewed as important steps in the proof of the main result of this section. Lemma 5.19 Let Ω be a nonempty convex subset of Rn with x ∈ bd Ω, and let H be a supporting hyperplane to Ω at x. Then we have the inclusion ext(Ω ∩ H ) ⊂ ext Ω.

(5.5)

Proof Suppose that the supporting hyperplane H is generated by the vector v as in (5.4). Fix any x ∈ ext(Ω ∩ H ) and assume on the contrary that x ∈ / ext Ω. Then there exist a, b ∈ Ω with a  = b and 0 < t < 1 such that x = ta + (1 − t)b. Since a, b ∈ Ω and x ∈ H , we get the relationships v, a ≤ v, x, v, b ≤ v, x, and v, x = tv, a + (1 − t)v, b = v, x, which yield v, a = v, x and v, b = v, x. Therefore, a, b ∈ Ω ∩ H and x ∈ (a, b), a contradiction which thus verifies (5.5).  Lemma 5.20 Let Ω be a nonempty compact convex subset of Rn that contains at least two points. Then we have the representation Ω = co(rb Ω). Proof Fix any x ∈ Ω and first consider the case where x ∈ ri Ω. Choose x0 ∈ Ω with x0  = x and define a subset of R by 

I = t ∈ R  x + t(x0 − x) ∈ Ω . We can check that I is a bounded interval in R. Denote α = inf I , β = sup I , a = x + α(x0 − x), and b = x + β(x0 − x). It is not hard to show that a, b ∈ rb Ω and x ∈ [a, b] ⊂ co(rb Ω). In the remaining case where x ∈ / ri Ω. Since Ω = Ω, we have by the closedness of Ω that x ∈ Ω \ ri Ω = rb Ω ⊂ co(rb Ω), which tells us that Ω ⊂ co(rb Ω). The reverse inclusion follows from the fact that rb Ω ⊂ bd Ω ⊂ Ω, and thus we are done with the proof. 

5.6 Tangents to Convex Sets

163

Now we are in a position to establish the main result, the Minkowski theorem. Note that an important and nontrivial extension of this result to infinite-dimensional spaces is known as the Krein-Milman theorem. Theorem 5.21 Let Ω be a nonempty compact convex set in Rn . Then Ω = co(ext Ω). Proof We verify this result by using induction with respect to the dimension of the convex set Ω. The assertion obviously holds when dim Ω = 0 because in this case Ω is a singleton. Suppose that the assertion holds for any nonempty compact convex set  with dim  < dim Ω. Fix any x ∈ rb Ω and get x ∈ Ω due to the closedness of Ω. Let H be a proper supporting hyperplane to Ω at x, which exists by Proposition 5.18(ii). Denoting  = Ω ∩ H , it is easy to see that  is a nonempty, compact, and convex subset of Rn such that dim  < dim Ω (see Exercise 5.17 and the hint to its solution) and that x ∈ . Then by Lemma 5.19 and the inductive hypothesis, we have the relationships x ∈  = co(ext ) ⊂ co(ext Ω). It follows that rb Ω ⊂ co(ext Ω), which implies by Lemma 5.20 that Ω = co(rb Ω) ⊂ co(ext Ω). Since the reverse inclusion is obvious, the proof is complete.

5.6



Tangents to Convex Sets

In Chap. 2, we studied the normal cone to convex sets and learned that the very definition and major properties of it were closely related to convex separation. The goal of this section is to define the tangent cone to a convex set independently while observing that it can be constructed via duality from the normal cone to the set in question. As the reader can see, the Farkas lemma is crucial to prove the main result given below. Given a nonempty set Ω ⊂ Rn , the polar of Ω is defined by 

Ω ◦ := v ∈ Rn  v, x ≤ 1 for all x ∈ Ω . Note that the polar of Ω is a nonempty closed convex set. The polar of Ω ◦ denoted by Ω ◦◦ is called the bipolar of Ω. Next we provide a simpler representation for the polar of a cone. Proposition 5.22 Let C be a nonempty cone in Rn . Then 

C ◦ = v ∈ Rn  v, c ≤ 0 for all c ∈ C .

164

5 Remarkable Consequences of Convexity

Proof Fix any v ∈ C ◦ . For any c ∈ C we have v, tc ≤ 1 whenever t > 0. This yields v, c ≤ 1/t for all t > 0. Letting t → ∞ gives us v, c ≤ 0, which justifies the inclusion “⊂”. The reverse inclusion is obvious.  Using convex separation shows that the bipolar of a nonempty convex cone reduces to the closure of the set C. Proposition 5.23 For any nonempty convex cone C ⊂ Rn , we have C ◦◦ = cl C. Proof The inclusion C ⊂ C ◦◦ follows directly from the definition, and hence cl C ⊂ C ◦◦ since C ◦◦ is a closed set. To verify the reverse inclusion, take any c ∈ C ◦◦ and suppose on the contrary that c ∈ / cl C. Since C is a cone, by the strict convex separation from Proposition 2.33 we find v  = 0 such that v, x ≤ 0 for all x ∈ C and v, c > 0, which yield v ∈ C ◦ . This contradicts the fact that c ∈ C ◦◦ due to the second inequality v, c > 0 above.  Next we define the tangent cone to an arbitrary convex set. Definition 5.24 Let Ω be a convex subset of Rn with x ∈ Ω. The tangent cone to Ω at x is given by T (x; Ω) := cl cone(Ω − x). The following theorem reveals the duality correspondence between normals and tangents to convex sets; see an illustration in Fig. 5.3. Theorem 5.25 Let Ω be a convex set in Rn with x ∈ Ω. Then we have

◦

◦ N (x; Ω) = T (x; Ω) and T (x; Ω) = N (x; Ω) .

(5.6)

Proof To verify first the inclusion N (x; Ω) ⊂ [T (x; Ω)]◦ , pick any v ∈ N (x; Ω) and get by definition that v, x − x ≤ 0 for all x ∈ Ω. This implies that 

 v, t(x − x) ≤ 0 whenever t ≥ 0 and x ∈ Ω.

5.6 Tangents to Convex Sets

165

Fig. 5.3 Tangent cone and normal cone

Fixing w ∈ T (x; Ω), find tk ≥ 0 and xk ∈ Ω with tk (xk − x) → w as k → ∞ such that we have

◦ v, w = lim v, tk (xk − x) ≤ 0, and hence v ∈ T (x; Ω) . k→∞

To verify the reverse inclusion in (5.6), pick v ∈ [T (x; Ω)]◦ and deduce from the relations v, w ≤ 0 as w ∈ T (x; Ω) and x − x ∈ T (x; Ω) as x ∈ Ω that v, x − x ≤ 0 for all x ∈ Ω. This yields v ∈ N (x; Ω) and thus completes the proof of the first equality in (5.6). The second equality in (5.6) follows from the first one due to

N (x; Ω)

◦

◦◦ = T (x; Ω) = cl T (x; Ω) = T (x; Ω)

by Proposition 5.23 and the closedness of T (x; Ω) from Definition 5.24.



The next theorem, which is partly based on the Farkas lemma from Theorem 5.10, provides useful representations of the tangent and normal cones to a convex polyhedron. Theorem 5.26 Let ai ∈ Rn , let bi ∈ R for i = 1, . . . , m, and let 

Ω = x ∈ Rn  ai , x ≤ bi for all i = 1, . . . , m . Suppose that x ∈ Ω and define the active index set I (x) = {i = 1, . . . , m | ai , x = bi }. Then we have the relationships

166

5 Remarkable Consequences of Convexity



T (x; Ω) = v ∈ Rn  ai , v ≤ 0 for all i ∈ I (x) ,

 N (x; Ω) = cone ai  i ∈ I (x)

(5.7) (5.8)

with the convention that cone ∅ = {0}. Proof To verify (5.7), consider the convex cone 

K = v ∈ Rn  ai , v ≤ 0 for all i ∈ I (x) . It is easy to see that for any x ∈ Ω, any i ∈ I (x), and any t ≥ 0 we get     ai , t(x − x) = t ai , x − ai , x ≤ t(bi − bi ) = 0, and hence cone(Ω − x) ⊂ K . Thus it follows from the closedness of K that T (x; Ω) = cl cone(Ω − x) ⊂ K . To justify the reverse inclusion in (5.7), observe that ai , x < bi for i ∈ / I (x). Using this and picking an arbitrary element v ∈ K give us ai , x + tv ≤ bi for all i ∈ / I (x) and sufficiently small t > 0. Since ai , v ≤ 0 as i ∈ I (x), it tells us that ai , x + tv ≤ bi for all i = 1, . . . , m and sufficiently small t > 0, and so x + tv ∈ Ω, or v ∈

 1 Ω − x ⊂ T (x; Ω). t

This verifies the tangent cone representation (5.7). To prove finally representation (5.8) of the normal cone, observe that the first equality in (5.6) ensures the equivalence w ∈ N (x; Ω) if and only if w, v ≤ 0 for all v ∈ T (x; Ω). Then we see from (5.7) that w ∈ N (x; Ω) if and only if



 ai , v ≤ 0 for all i ∈ I (x) =⇒ w, v ≤ 0

whenever v ∈ Rn . Then the Farkas lemma from Theorem 5.10 tells us that w ∈ N (x; Ω) if  and only if w ∈ cone {ai | i ∈ I (x)}, and thus (5.8) holds.

5.7

5.7

Exercises for This Chapter

167

Exercises for This Chapter

Exercise 5.1 Let f : Rn → R be a function with x ∈ int(dom f ). Show that f is Gâteaux differentiable at x with f G (x) = v if and only if the directional derivative f  (x; d) exists and f  (x; d) = v, d for all d ∈ Rn . Exercise 5.2 Let Ω be a nonempty closed convex subset of Rn .  2 (i) Prove that the function f (x) = d(x; Ω) , x ∈ Rn , is Fréchet differentiable on Rn and its gradient is calculated by   ∇ f (x) = 2 x − (x; Ω) for every x ∈ Rn . (ii) Prove that the function d(·; Ω) is Fréchet differentiable on Rn \ Ω. Exercise 5.3 Let F be a nonempty compact convex subset of R p , and let A ∈ R p×n . Define the function   1 f (x) = sup Ax, u − u2 , x ∈ Rn , 2 u∈F and show that f is convex and Fréchet differentiable on Rn . Exercise 5.4 Give an example of f : Rn → R, which is Gâteaux differentiable but not Fréchet differentiable. Can such a function be convex? Exercise 5.5 Let f : Rn → R be a convex function with x ∈ int(dom f ). Show that if at x exists for all i = 1, . . . , n, then f is Fréchet differentiable at x.

∂f ∂xi

Exercise 5.6 Let f : Rn → R be Fréchet differentiable on Rn . Show that f is strictly convex if and only if it satisfies the condition ∇ f (x), x − x < f (x) − f (x) for all x ∈ Rn and x ∈ Rn , x  = x. Exercise 5.7 Let f : Rn → R be twice continuously differentiable (C 2 − smooth) on Rn . Prove that if the Hessian matrix ∇ 2 f (x) is positive definite for all x ∈ Rn , then f is strictly convex. Give an example showing that the converse implication does not hold in general. Exercise 5.8 A convex function f : Rn → R is called strongly convex with modulus c > 0 if for all x1 , x2 ∈ Rn and λ ∈ (0, 1) we have   1 f λx1 + (1 − λ)x2 ≤ λ f (x1 ) + (1 − λ) f (x2 ) − cλ(1 − λ)x1 − x2 2 . 2

168

5 Remarkable Consequences of Convexity

Prove that the following properties are equivalent for any twice continuously differentiable convex function f : Rn → R: (i) f strongly convex. c (ii) f −  · 2 is convex. 2 (iii) ∇ 2 f (x)d, d ≥ cd2 for all x, d ∈ Rn . → Rn is said to be monotone if Exercise 5.9 A set-valued mapping F: Rn → v2 − v1 , x2 − x1  ≥ 0 whenever vi ∈ F(xi ), i = 1, 2. F is called strictly monotone if v2 − v1 , x2 − x1  > 0 whenever vi ∈ F(xi ), x1  = x2 , i = 1, 2. It is called strongly monotone with modulus  > 0 if v2 − v1 , x2 − x1  ≥ x1 − x2 2 whenever vi ∈ F(xi ), i = 1, 2. (i) Prove that for any convex function f : Rn → R, the subgradient mapping F(x) = ∂ f (x) is monotone on Rn . (ii) Prove that for any convex function f : Rn → R the subgradient mapping ∂ f (x) is strictly monotone if and only if f is strictly convex, and it is strongly monotone if and only if f is strongly convex. Exercise 5.10 Prove Corollary 5.7. Exercise 5.11 Let A be an p × n matrix, and let b ∈ R p . Use the Farkas lemma to show that the system Ax = b, x  0 has a feasible solution x ∈ Rn if and only if the system A T y ≤ 0, b T y > 0 has no feasible solution y ∈ R p . Exercise 5.12 Deduce Carathéodory’s theorem from the proof of Radon’s theorem. Exercise 5.13 Use Carathéodory’s theorem to prove Helly’s theorem. Exercise 5.14 Let Ω be a nonempty convex subset of Rn . Prove that x ∈ ext Ω if and only if x ∈ Ω and Ω \ {x} is convex. Exercise 5.15 Let Ω ⊂ Rn be a nonempty convex set. We say that F ⊂ Ω is a face of Ω if F is a nonempty convex set such that for any x, y ∈ Ω and λ ∈ (0, 1) we have



 x, y ∈ Ω, 0 < λ < 1, λx + (1 − λ)y ∈ F =⇒ x ∈ F, y ∈ F .

5.7

Exercises for This Chapter

169

Verify the following assertions: (i) If E is a face of F and F is a face of Ω, then E is a face of Ω. (ii) If F is a face of Ω, then ext F = (ext Ω) ∩ F. In particular, ext F ⊂ ext Ω. Exercise 5.16 Provide further details for the proof of Lemma 5.20. Exercise 5.17 Let Ω be a nonempty convex set in Rn with x ∈ bd Ω. Prove that if H is a proper supporting hyperplane to Ω at x, then dim(Ω ∩ H ) < dim Ω. Exercise 5.18 Let C be a linear subspace of Rn . Show that the polar of C admits the following representation: 

C ◦ = C ⊥ = v ∈ Rn  v, c = 0 for all c ∈ C . Exercise 5.19 Let Ω be a nonempty convex set in Rn . Show that Ω ◦◦ = co (Ω ∪ {0}). Exercise 5.20 Let Ω1 and Ω2 be convex sets with ri Ω1 ∩ ri Ω2  = ∅. Show that T (x; Ω1 ∩ Ω2 ) = T (x; Ω1 ) ∩ T (x; Ω2 ) for any x ∈ Ω1 ∩ Ω2 . Exercise 5.21 Let A be p × n matrix, let 

Ω = x ∈ Rn  Ax = 0, x  0 , and let x = (x 1 , . . . , x n ) ∈ Ω. Verify the tangent and normal cone formulas:  T (x; Ω) = {u = (u 1 , . . . , u n ) ∈ Rn  Au = 0, u i ≥ 0 for any i with x i = 0 , 

N (x; Ω) = v ∈ Rn  v = A T y − z, y ∈ R p , z ∈ Rn , z i ≥ 0 for all i, z, x = 0 .

6

Minimal Time Functions and Related Issues

Our main attention in this chapter is drawn to the study of minimal time functions, which belong to a rather new class of nondifferentiable functions that are challenging theoretically and important for various applications. We reveal the remarkable properties of such functions and establish formulas for their convex subdifferentiation. Related issues on horizon cones and Minkowski gauges are also addressed in this chapter.

6.1

Horizon Cones

Here we study the behavior of unbounded convex sets at infinity. Such behavior can be largely captured by the notion and properties of horizon cones associated with convex sets. This notion is important for its own sake while being broadly used in the subsequent sections. Definition 6.1 Given a nonempty closed convex subset F of Rn and a point x ∈ F, the horizon cone of F at x is defined by    F∞ (x) := d ∈ Rn  x + td ∈ F for all t > 0 . Note that the horizon cone is also known as the asymptotic cone of F at the point in question. Another equivalent definition of F∞ (x) is given by F∞ (x) =

 F−x , t

t>0

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 B. Mordukhovich and N. M. Nam, An Easy Path to Convex Analysis and Applications, Synthesis Lectures on Mathematics & Statistics, https://doi.org/10.1007/978-3-031-26458-0_6

171

172

6 Minimal Time Functions and Related Issues

Fig. 6.1 Horizon cone

which clearly implies that F∞ (x) is a closed convex cone in Rn . The next proposition shows that F∞ (x) is the same for every x ∈ F, so it can be simply denoted by F∞ ; see an illustration in Fig. 6.1. Proposition 6.2 Let F be a nonempty closed convex subset of Rn . Then we have the equality F∞ (x1 ) = F∞ (x2 ) for any x1 , x2 ∈ F. Proof It suffices to verify that F∞ (x1 ) ⊂ F∞ (x2 ) whenever x1 , x2 ∈ F. Taking any direction d ∈ F∞ (x1 ) and any real number t > 0, we show that x2 + td ∈ F. Consider the sequence   1 1 xk = x1 + ktd + 1 − x2 , k ∈ N. k k Then xk ∈ F for every k because d ∈ F∞ (x1 ) and F is convex. We also have xk → x2 + td,  so x2 + td ∈ F since F is closed. Thus d ∈ F∞ (x2 ). Corollary 6.3 Let 0 ∈ F for a closed convex subset of Rn . Then  F∞ = t F. t>0

Proof It follows from the construction of F∞ and Proposition 6.2 that F∞ = F∞ (0) =

 F −0  1  t F, = F= t t

t>0

t>0

t>0

6.1

Horizon Cones

173



which therefore justifies the claim. The next proposition provides a sequential description of the horizon cone.

Proposition 6.4 Let F be a nonempty closed convex subset of Rn . For a given element d ∈ Rn , the following assertions are equivalent: (i) d ∈ F∞ . (ii) There are sequences {tk } ⊂ [0, ∞) and { f k } ⊂ F such that tk → 0 and tk f k → d as k → ∞. Proof To verify implication (i)=⇒(ii), take d ∈ F∞ and fix x ∈ F. It follows from the definition that x + kd ∈ F for all k ∈ N. Fixing k ∈ N, we find f k ∈ F for which we get x + kd = f k , or equivalently,

1 1 x + d = fk . k k

Letting tk = 1/k shows that tk f k → d as k → ∞. To verify the converse implication (ii)=⇒(i), suppose that there exist sequences {tk } ⊂ [0, ∞) and { f k } ⊂ F such that tk → 0 and tk f k → d as k → ∞. Fix x ∈ F and get d ∈ F∞ by showing that x + td ∈ F for all t > 0. Indeed, for any fixed t > 0, we have 0 ≤ t · tk < 1 when k is large. Thus (1 − t · tk )x + t · tk f k → x + td as k → ∞. It follows from the convexity of F that every element (1 − t · tk )x + t · tk f k belongs to F. Hence x + td ∈ F by the closedness of the set F, and thus we arrive at d ∈ F∞ and complete the proof.  Finally in this section, we present the expected characterization of the set boundedness in terms of its horizon cone. Theorem 6.5 Let F be a nonempty closed convex subset of Rn . Then the set F is bounded if and only if its horizon cone is trivial, i.e., F∞ = {0}. Proof Suppose that F is bounded and take any d ∈ F∞ . By Proposition 6.4, there exist sequences {tk } ⊂ [0, ∞) with tk → 0 and { f k } ⊂ F such that tk f k → d as k → ∞. It follows from the boundedness of F that tk f k → 0, which shows that d = 0.

174

6 Minimal Time Functions and Related Issues

To prove the converse implication, assume on the contrary that F is unbounded while F∞ = {0}. Then there exists a sequence {xk } ⊂ F with xk  = 0 and xk → ∞, which allows us to form the sequence of (unit) directions dk =

xk , k ∈ N. xk

Without loss of generality, we get that dk → d as k → ∞ with d = 1. Fix x ∈ F and observe that for all t > 0 and large k ∈ N it follows that  t t x+ xk ∈ F uk = 1 − xk xk due to the convexity of F. Furthermore, we get that x + td ∈ F since u k → x + td and F is closed. Thus 0  = d ∈ F∞ , which is a contradiction that completes the proof of the theorem. 

6.2

Minimal Time Functions and Their Properties

The main object of this section is a remarkable class of extended-real-valued functions known as the minimal time functions. This class of functions is relatively new in analysis and highly important in applications. Some of their recent applications to facility location problems are given in Chap. 8. Definition 6.6 Let F be a nonempty closed convex subset of Rn , and let Ω ⊂ Rn be a nonempty set (not necessarily convex). The minimal time function associated with the sets F and Ω is defined by    TΩF (x) := inf t ≥ 0  (x + t F) ∩ Ω = ∅ , x ∈ Rn . (6.1) The set F is called dynamic set and the set Ω is called the target set. The name comes from the following interpretation: TΩF (x) from (6.1) signifies the minimal time needed for the point x to reach the target set Ω along the constant dynamics F; see Fig. 6.2. Note that when F = B, the closed unit ball in Rn , the minimal time function (6.1) reduces to the distance function d(·; Ω) considered in Sect. 1.5. In the case where F = {0}, the minimal time function (6.1) is the indicator function of Ω. If Ω = {0}, we get TΩF (x) = ρ F (−x), where ρ F is the Minkowski gauge function defined in the next section. The minimal time function (6.1) also covers another interesting class of functions called

6.2

Minimal Time Functions and Their Properties

175

Fig. 6.2 Minimal time functions

the directional minimal time function, which corresponds to the case where F is a nonzero vector. In this section, we study the general properties of the minimal time function (6.1) used in what follows. These properties are certainly of own interest. Observe that TΩF (x) = 0 if x ∈ Ω, and that TΩF (x) is finite if there exists t ≥ 0 ensuring the condition (x + t F) ∩ Ω  = ∅, which holds, in particular, when 0 ∈ int F. If no such t exists, then we have TΩF (x) = ∞ by definition. Thus the minimal time function (6.1) is an extended-real-valued function in general. The sets F and Ω in (6.1) are not assumed to be bounded. The next proposition reveals the behavior of TΩF (x) in the case where F is bounded. Proposition 6.7 Let F be a nonempty compact convex subset of Rn , and let ∅  = Ω ⊂ Rn . Then we have the implication



x ∈ Ω =⇒ TΩF (x) = 0 .

The converse implication holds if we assume in addition that Ω is closed. Proof If x ∈ Ω, then it is obvious that TΩF (x) = 0. Suppose now that TΩF (x) = 0 under the additional assumption that Ω is closed. Then we find a sequence {tk } ⊂ [0, ∞) such that lim tk = 0 and (x + tk F) ∩ Ω  = ∅.

k→∞

176

6 Minimal Time Functions and Related Issues

Choose sequences { f k } ⊂ F and {wk } ⊂ Ω such that x + tk f k = wk for all k ∈ N. Since F is bounded, {wk } converges to x. Thus we have x ∈ Ω due to the closedness of Ω and complete the proof.  The next theorem involves the case where F is not necessarily bounded. Theorem 6.8 Let F be a nonempty closed convex subset of Rn , and let the target set Ω ⊂ Rn be merely nonempty. (i) If 0 ∈ F, then we have the implication



x ∈ Ω − F∞ =⇒ TΩF (x) = 0 .

(ii) If Ω is closed and bounded, then we have the converse implication



TΩF (x) = 0 =⇒ x ∈ Ω − F∞ .

Consequently, if 0 ∈ F and Ω is closed and bounded, then we have TΩF (x) = 0 if and only if x ∈ Ω − F∞ . Proof (i) Suppose that 0 ∈ F. To verify the claimed implication, pick any x ∈ Ω − F∞ and find q ∈ Ω and d ∈ F∞ such that x = q − d. Since 0 ∈ F and d ∈ F∞ , we deduce from Corollary 6.3 the inclusions k(q − x) = kd ∈ F for all k ∈ N. This shows that q−x ∈

 1  1 F, and so x + F ∩ Ω  = ∅, k ∈ N. k k

The latter ensures that 0 ≤ TΩF (x) ≤ 1/k for all k ∈ N, which implies TΩF (x) = 0. (ii) Suppose that Ω is closed and bounded. If TΩF (x) = 0, then we find tk → 0 as k → ∞ with tk ≥ 0 such that (x + tk F) ∩ Ω  = ∅ for all k ∈ N. This gives us qk ∈ Ω and f k ∈ F with x + tk f k = qk for each k. By the closedness and boundedness of Ω, we can select a subsequence of {qk } (no relabeling) that converges to some q ∈ Ω, and thus get tk f k → q − x. It follows from Proposition 6.4 that q − x ∈ F∞ ,  which justifies x ∈ Ω − F∞ and completes the proof of the theorem. The following example illustrates the result of Theorem 6.8. Example 6.9 Let F = R × [−1, 1] ⊂ R2 , and let Ω be the closed ball centered at (1, 0) with radius r = 1. Using the representation of F∞ from Corollary 6.3, we get that F∞ = R × {0} and Ω − F∞ = R × [−1, 1]. This tells us that TΩF (x) = 0 if and only if x ∈ R × [−1, 1].

6.2

Minimal Time Functions and Their Properties

177

The next proposition guarantees the convexity of the minimal time function (6.1) under the convexity of the target set Ω in a rather general setting. Proposition 6.10 Let F be a nonempty closed convex subset of Rn . If ∅  = Ω ⊂ Rn is convex, then the minimal time function (6.1) is convex. Proof Take any x1 , x2 ∈ dom TΩF , and let λ ∈ (0, 1). We now show that



TΩF λx1 + (1 − λ)x2 ≤ λTΩF (x1 ) + (1 − λ)TΩF (x2 ).

(6.2)

Denote γi = TΩF (xi ) for i = 1, 2. Given ε > 0, select real numbers ti with γi ≤ ti < γi + ε and (xi + ti F) ∩ Ω  = ∅ for i = 1, 2. It follows from the convexity of F and Ω that

λx1 + (1 − λ)x2 + λt1 + (1 − λ)t2 F ∩ Ω  = ∅, which implies in turn the estimates

TΩF λx1 + (1 − λ)x2 ≤ λt1 + (1 − λ)t2 ≤ λTΩF (x1 ) + (1 − λ)TΩF (x2 ) + ε. Letting ε → 0+ , we arrive at (6.2) and thus verify the convexity of TΩF . Consider the enlargement of the target set Ω via minimal time function    Ωr := x ∈ Rn  TΩF (x) ≤ r , r > 0.



(6.3)

The next proposition establishes a relationship between the minimal time function associated with the target set Ω and the minimal time function associated with its enlargement Ωr . Proposition 6.11 Let F be a nonempty closed convex subset of Rn . Then for any nonempty set Ω and any x ∈ / Ωr with TΩF (x) < ∞, we have

TΩF (x) = TΩFr (x) + r . Proof Since Ω ⊂ Ωr , we get TΩFr (x) ≤ TΩF (x) < ∞. Pick any t ≥ 0 with (x + t F) ∩ Ωr  = ∅ and find f 1 ∈ F and u ∈ Ωr such that x + t f 1 = u. We have TΩF (u) ≤ r , so for any ε > 0 there is s ≥ 0 such that s < r + ε and (u + s F) ∩ Ω  = ∅. This yields the existence of ω ∈ Ω and f 2 ∈ F such that u + s f 2 = ω. Thus

178

6 Minimal Time Functions and Related Issues

ω = u + s f 2 = (x + t f 1 ) + s f 2 ∈ x + (t + s)F by the convexity of F. This tells us that

TΩF (x) ≤ t + s < t + r + ε, and so TΩF (x) ≤ t + r due to the arbitrary choice of ε > 0. Therefore, we arrive at

TΩF (x) ≤ TΩFr + r . To verify the reverse inequality, denote γ = TΩF (x) and observe that r < γ since x ∈ / Ωr . For any ε > 0, find t ≥ 0, f ∈ F, and ω ∈ Ω such that γ ≤ t < γ + ε and x + t f = ω. This implies that ω = x + t f = x + (t − r ) f + r f ∈ x + (t − r ) f + r F, and so TΩF (x + (t − r ) f ) ≤ r . Then x + (t − r ) f ∈ Ωr and x + (t − r ) f ∈ x + (t − r )F. Thus TΩFr (x) ≤ t − r ≤ γ − r + ε. This brings us to r + TΩFr (x) ≤ γ = TΩF (x) by the arbitrary choice of ε > 0 and hence completes the proof.



We conclude this section with yet another property of minimal time functions used in what follows. Proposition 6.12 Let F be a nonempty closed convex subset of Rn . Then for any nonempty set Ω, any x ∈ dom TΩF , any t ≥ 0, and any f ∈ F, we have

TΩF (x − t f ) ≤ TΩF (x) + t. Proof Fix any ε > 0 and select s ≥ 0 such that

TΩF (x) ≤ s < TΩF (x) + ε and (x + s F) ∩ Ω = ∅. Then (x − t f + t F + s F) ∩ Ω  = ∅, and hence (x − t f + (t + s)F) ∩ Ω  = ∅. It shows therefore that TΩF (x − t f ) ≤ t + s < t + TΩF (x) + ε, which verifies the claim of the proposition by letting ε → 0+ .



6.3

6.3

Minkowski Gauge Functions

179

Minkowski Gauge Functions

This section addresses the class of Minkowski gauge functions, which plays a very important role in many aspects of convex analysis and its applications. We study here the remarkable properties of Minkowski gauge functions and establish relationships between them and minimal time functions that are used in the next section for the subsequent subdifferential study. Definition 6.13 Let F be a nonempty closed convex subset of Rn . The Minkowski gauge function associated with the set F is defined by    ρ F (x) := inf t ≥ 0  x ∈ t F , x ∈ Rn . (6.4) The following theorem summarizes the main properties of the Minkowski gauge function with the usage of the horizon cone of the underlying set. Theorem 6.14 Let F be a nonempty closed convex set in Rn . Then the Minkowski gauge ρ F is an extended-real-valued function, which is positively homogeneous and subadditive. If we assume in addition that 0 ∈ F, then ρ F (x) = 0 if and only if x ∈ F∞ . Proof We first prove that ρ F is subadditive meaning that ρ F (x1 + x2 ) ≤ ρ F (x1 ) + ρ F (x2 ) for all x1 , x2 ∈ Rn .

(6.5)

This obviously holds if the right-hand side of (6.5) is equal to infinity, i.e., we have that either x1 ∈ / dom ρ F , or x2 ∈ / dom ρ F . Suppose now that x1 , x2 ∈ dom ρ F and fix any ε > 0. Then there exist numbers t1 , t2 ≥ 0 such that ρ F (x1 ) ≤ t1 < ρ F (x1 ) + ε, ρ F (x2 ) ≤ t2 < ρ F (x2 ) + ε and x1 ∈ t1 F, x2 ∈ t2 F. It follows from the convexity of F that x1 + x2 ∈ t1 F + t2 F = (t1 + t2 )F. This readily implies the inequalities ρ F (x1 + x2 ) ≤ t1 + t2 < ρ F (x1 ) + ρ F (x2 ) + 2ε, which justify (6.5) since ε > 0 was chosen arbitrarily. Next we check the positive homogeneous property of ρ F meaning that ρ F (αx) = αρ F (x) for all α > 0 and x ∈ Rn . Taking any α > 0 and x ∈ Rn , we get

180

6 Minimal Time Functions and Related Issues

      t   ρ F (αx) = inf t ≥ 0 αx ∈ t F = inf t ≥ 0 x ∈ F α  t t   ≥ 0  x ∈ F = αρ F (x). = α inf α α Thus it remains to verify the last statement of the theorem. Observe that ρ F (x) = TΩ−F (x) is a special case of the minimal time function with Ω = {0}. Since 0 ∈ F, Theorem 6.8 tells us that ρ F (x) = 0 if and only if x ∈ Ω − (−F)∞ = F∞ , which therefore completes the proof.  Next we proceed with some important results of their own interest while being useful to establish a major relationship between the minimal time function and the Minkowski gauge. Lemma 6.15 Let F be a nonempty closed convex subset of Rn . Then 0 ∈ int F if and only if its polar F ◦ is bounded. Proof Suppose that 0 ∈ int F and find r > 0 such that B(0; r ) ⊂ F. Taking any v ∈ F ◦ tells us that v, x ≤ 1 for all x ∈ F, and so v, x ≤ 1 for all x ∈ B(0; r ). Then we can see that v ≤ 1/r , which ensures that F is bounded. To verify the converse implication, suppose on the contrary that F ◦ is bounded but 0∈ / int F. For any k ∈ N, we have B(0; 1/k)  ⊂ F, and so there exists xk ∈ Rn such that / F. The strict separation guarantees the existence of vk ∈ Rn such that xk ≤ 1/k and xk ∈ vk , x < vk , xk  for all x ∈ F. Define u k = vk / vk and assume without loss of generality that u k → u with u = 1 as k → ∞. It follows by passing to a limit that u, x ≤ 0 for all x ∈ F, which implies that ku, x ≤ 0 for all x ∈ F. Thus ku ∈ F ◦ , which yields a contradiction because ku → ∞ as k → ∞.  The following lemma relates the Minkowski gauge function associated with F and the support function associated with F ◦ . Lemma 6.16 Let F be a closed convex set in Rn with 0 ∈ F. Then we have the relationship ρ F (x) = σ F ◦ (x) for all x ∈ Rn . Proof Fix any x ∈ Rn and consider any t ≥ 0 such that x ∈ t F. Then there exists u ∈ F such that x = tu. For any v ∈ F ◦ , we have

6.3

Minkowski Gauge Functions

181

v, x = v, tu = tv, u ≤ t, which implies the inequality σ F ◦ (x) = supv∈F ◦ v, x ≤ t. Taking the infimum with respect to all such t ≥ 0 gives us σ F ◦ (x) ≤ ρ F (x). Let us now prove that the equality holds. Suppose on the contrary that σ F ◦ (x) < ρ F (x). Then observing the inclusion 0 ∈ F ◦ , we can find λ ∈ R such that (6.6) 0 ≤ σ F ◦ (x) < λ < ρ F (x). The definition of the Minkowski gauge function yields x ∈ / λF, which is a nonempty, closed, and convex set. Taking into account that 0 ∈ F, the strict separation theorem ensures that there exist v ∈ Rn and γ ∈ R such that 0 ≤ sup{v, λu | u ∈ F} < γ < v, x. Let w = λv/γ. Then σ F (w) ≤ 1, and so w ∈ F ◦ . However, w, x = λv/γ, x > λ, which contradicts (6.6) and completes the proof.  The equality in the corollary below does not require the condition 0 ∈ F. Corollary 6.17 Let F be a nonempty closed convex set in Rn . Then ρ F ◦ (x) = σ F (x) for all x ∈ Rn . Proof Since F ◦ is closed and convex with 0 ∈ F ◦ , by Lemma 6.16 we have ρ F ◦ (x) = σ F ◦◦ (x) for all x ∈ Rn . It remains to use the equality F ◦◦ = co(F ∪ {0}) from Exercise 5.19 to complete the proof of this statement.  We say that ϕ : Rn → R is -Lipschitz continuous on Rn if it is Lipschitz continuous there with Lipschitz constant  ≥ 0. The next result calculates a constant  of the Lipschitz continuity of the Minkowski gauge function on Rn . Proposition 6.18 Let F be a closed convex set in Rn with 0 ∈ int F. Then the Minkowski gauge ρ F is -Lipschitz continuous on Rn with the Lipschitz constant calculated by     = F ◦ := sup v  v ∈ F ◦ . (6.7) In particular, we have that ρ F (x) ≤  x for all x ∈ Rn . Proof The condition 0 ∈ int F ensures by Lemma 6.15 that  < ∞ for the constant  from (6.7). By Lemma 6.16 and the definition of the support function, we have

182

6 Minimal Time Functions and Related Issues

ρ F (x) = σ F ◦ (x) = sup v, x ≤ F ◦ · x =  x for all x ∈ Rn . v∈F ◦

Then the subadditivity of ρ F gives us ρ F (x) − ρ F (y) ≤ ρ F (x − y) ≤  x − y for all x, y ∈ Rn , and thus ρ F is -Lipschitz continuous on Rn with constant (6.7).



Our next result relates the minimal time function to the Minkowski gauge. Theorem 6.19 Let F be a nonempty closed convex set in Rn . Then for any nonempty set Ω, we have the formula    TΩF (x) = inf ρ F (ω − x)  ω ∈ Ω , x ∈ Rn . (6.8) Proof Consider first the case where TΩF (x) = ∞. In this case, we have    t ≥ 0  (x + t F) ∩ Ω  = ∅ = ∅, which implies that {t ≥ 0 | ω − x ∈ t F} = ∅ for every ω ∈ Ω. Thus ρ F (ω − x) = ∞ as ω ∈ Ω, and hence the right-hand side of (6.8) is also infinity. Considering further the case where TΩF (x) < ∞, fix any t ≥ 0 such that (x + t F) ∩ Ω  = ∅. This gives us f ∈ F and ω ∈ Ω with x + t f = ω. Thus ω − x ∈ t F, and so ρ F (ω − x) ≤ t, which yields inf ω∈Ω ρ F (ω − x) ≤ t. It follows from Definition 6.13 that inf ρ F (ω − x) ≤ TΩF (x) < ∞.

ω∈Ω

To verify the reverse inequality in (6.8), denote γ = inf ω∈Ω ρ F (ω − x) and for any ε > 0 find ω ∈ Ω with ρ F (ω − x) < γ + ε. By the definition of the Minkowski gauge function (6.4), we get t ≥ 0 such that t < γ + ε and ω − x ∈ t F. This gives us ω ∈ x + t F and therefore ensures the estimate TΩF (x) ≤ t < γ + ε. Since ε > 0 is arbitrary, we arrive at

TΩF (x) ≤ γ = inf ρ F (ω − x), ω∈Ω

which yields the remaining inequality in (6.8) and so finishes the proof.



As a consequence of the obtained result and Proposition 6.18, we establish now the Lipschitz continuity of the minimal time function with the Lipschitz constant  calculated in (6.7).

6.4

Subgradients of Minimal Time Functions with Bounded Dynamics

183

Corollary 6.20 Let F be a closed convex set in Rn with 0 ∈ int F. Then for any nonempty set Ω, the function TΩF is -Lipschitz with the Lipschitz constant  taken from (6.7). Proof It follows from the subadditivity of ρ F and Proposition 6.18 that ρ F (ω − x) ≤ ρ F (ω − y) + ρ F (y − x) ≤ ρ F (ω − y) +  y − x for all ω ∈ Ω whenever x, y ∈ Rn . Then Theorem 6.19 tells us that

TΩF (x) ≤ TΩF (y) +  x − y , and so TΩF is -Lipschitz continuous due to the arbitrary choice of x, y.

6.4



Subgradients of Minimal Time Functions with Bounded Dynamics

In this section, we obtain subdifferential formulas for the minimal time function (6.1) in a special case where F is a nonempty compact convex set. The theorem below provides the subdifferential formula for the case where the reference point belongs to the target set Ω. Theorem 6.21 Let F be a nonempty compact convex set, and let Ω be a nonempty convex set in Rn . Then for any x ∈ Ω, we have ∂ TΩF (x) = N (x; Ω) ∩ C ∗ ,

(6.9)

where C ∗ := {v ∈ Rn | σ F (−v) ≤ 1} is defined via the support function (4.6). Proof Fix any v ∈ ∂ TΩF (x). Using (3.1) and the fact that TΩF (x) = 0 for all x ∈ Ω, we get the inequalities v, x − x ≤ TΩF (x) − TΩF (x) ≤ 0 for all x ∈ Ω. This yields v ∈ N (x; Ω). Fix any f ∈ F and define x = x − f . Then (x + F) ∩ Ω  = ∅, and so v, − f  = v, x − x ≤ TΩF (x) − TΩF (x) = TΩF (x) ≤ 1, which implies that v ∈ C ∗ and therefore verifies the inclusion ∂ TΩF (x) ⊂ N (x; Ω) ∩ C ∗ . To prove the reverse inclusion in (6.9), fix any v ∈ N (x; Ω) ∩ C ∗ and any x ∈ dom TΩF . For every ε > 0, there are t ∈ [0, ∞), ω ∈ Ω, and f ∈ F with

184

6 Minimal Time Functions and Related Issues

TΩF (x) ≤ t < TΩF (x) + ε and x + t f = ω. Then we have the relationships v, x − x = v, ω − x + tv, − f  ≤ t < TΩF (x) + ε = TΩF (x) − TΩF (x) + ε, which show that v ∈ ∂ TΩF (x) since ε > 0 was chosen arbitrarily.



The next theorem concerns the out-of-set case. Theorem 6.22 Let F be a nonempty compact convex set, and let Ω be a nonempty closed / Ω with x ∈ dom TΩF , we have the subdifferential repreconvex set in Rn . Then for any x ∈ sentation ∂ TΩF (x) = N (x; Ωr ) ∩ S ∗ , where the enlargement Ωr is defined in (6.3) with r = TΩF (x) > 0 and S ∗ := {v ∈ Rn | σ F (−v) = 1}. Proof Observe that r > 0 by Theorem 6.7. Pick any v ∈ ∂ TΩF (x) and get by the subgradient the definition that v, x − x ≤ TΩF (x) − TΩF (x) for all x ∈ Rn .

(6.10)

Then TΩF (x) ≤ r = TΩF (x), and hence v, x − x ≤ 0 for every x ∈ Ωr . This yields v ∈ N (x; Ωr ). Using (6.10) for x = x − f with f ∈ F and Proposition 6.12 ensures that σ F (−v) ≤ 1. Fix further ε ∈ (0, r ) and select t ∈ R, f ∈ F, and ω ∈ Ω satisfying the conditions r ≤ t < r + ε2 and ω = x + t f . We can write ω = x + ε f + (t − ε) f and then see that TΩF (x + ε f ) ≤ t − ε. Applying (6.10) to x = x + ε f gives us the estimates v, ε f  ≤ TΩF (x + ε f ) − TΩF (x) ≤ t − ε − r ≤ ε2 − ε, which imply in turn that 1 − ε ≤ −v, f  ≤ σ F (−v). Letting now ε → 0+ tells us that σ F (−v) ≥ 1, i.e., σ F (−v) = 1 in this case. Thus v ∈ S ∗ , which proves the inclusion ∂ TΩF (x) ⊂ N (x; Ωr ) ∩ S ∗ . To verify the reverse inclusion in (6.13), take any v ∈ N (x; Ωr ) with σ F (−v) = 1 and show that (6.10) is satisfied. It follows from Theorem 6.23 that v ∈ ∂ TΩFr (x), and hence v, x − x ≤ TΩFr (x) for all x ∈ Rn .

6.5

Subgradients of Minimal Time Functions with Unbounded Dynamics

185

Fix x ∈ dom TΩF and deduce from Proposition 6.11 that TΩF (x) − r = TΩFr (x) in the case t = TΩF (x) > r , which gives us (6.10). Suppose now that t ≤ r and for any ε > 0 choose f ∈ F such that v, − f  > 1 − ε. Then Proposition 6.12 tells us that TΩF (x − (r − t) f ) ≤ r , and so x − (r − t) f ∈ Ωr . Since v ∈ N (x; Ωr ), we have v, x − (r − t) f − x ≤ 0, which implies that

v, x − x ≤ v, f (r − t) ≤ (1 − ε)(t − r ) = (1 − ε) TΩF (x) − TΩF (x) . The arbitrary choice of ε > 0 yields (6.10). Therefore, the inclusion v ∈ ∂ TΩF (x) holds, which completes the proof of the theorem. 

6.5

Subgradients of Minimal Time Functions with Unbounded Dynamics

Here we address calculating the subdifferential of the minimal time function (6.1) in the case where the constant dynamic set F is unbounded. The results below present precise formulas for calculating the subdifferential ∂ TΩF (x) depending on the position of x with respect to the set Ω − F∞ . Theorem 6.23 Let 0 ∈ F ⊂ Rn be a closed convex set, and let ∅  = Ω ⊂ Rn be a convex set. For any point x ∈ Ω − F∞ , the subdifferential of the minimal time function TΩF at x is calculated by the formula ∂ TΩF (x) = N (x; Ω − F∞ ) ∩ C ∗ ,

(6.11)

where C ∗ = {v ∈ Rn | σ F (−v) ≤ 1} is defined in Theorem 6.21. In particular, we have the representation (6.12) ∂ TΩF (x) = N (x; Ω) ∩ C ∗ for x ∈ Ω. Proof Fix any v ∈ ∂ TΩF (x). Using (3.1) and the fact that TΩF (x) = 0 for all x ∈ Ω − F∞ (see Theorem 6.8(i)), we get v, x − x ≤ TΩF (x) − TΩF (x) ≤ 0 for all x ∈ Ω − F∞ . This yields v ∈ N (x; Ω − F∞ ). Write x ∈ Ω − F∞ as x = ω − d with ω ∈ Ω and d ∈ F∞ . Fix any f ∈ F and define xt = (x + d) − t f = ω − t f as t > 0. Then (xt + t F) ∩ Ω  = ∅, and so  d  v, xt − x≤TΩF (xt ) ≤ t, or equivalently v, − f ≤ 1 for all t > 0. t

186

6 Minimal Time Functions and Related Issues

Letting t → ∞ implies that v, − f  ≤ 1. Hence we get v ∈ C ∗ and therefore verify the inclusion ∂ TΩF (x) ⊂ N (x; Ω − F∞ ) ∩ C ∗ . To furnish the proof of the reverse inclusion in (6.11), fix any v ∈ N (x; Ω − F∞ ) ∩ C ∗ and any u ∈ dom TΩF . For every ε > 0, there exist t ∈ [0, ∞), ω ∈ Ω, and f ∈ F such that

TΩF (u) ≤ t < TΩF (u) + ε and u + t f = ω. Since Ω ⊂ Ω − F∞ , we have the relationships v, u − x = v, ω − x + tv, − f  ≤ t < TΩF (u) + ε = TΩF (u) − TΩF (x) + ε, which show that v ∈ ∂ TΩF (x) since ε > 0 is arbitrary. In this way, we get N (x; Ω − F∞ ) ∩ C ∗ ⊂ ∂ TΩF (x) and thus justify the subdifferential formula (6.11) in the general case where x ∈ Ω − F∞ . It remains to verify the simplified representation (6.12) in the case where x ∈ Ω. Since Ω ⊂ Ω − F∞ , we obviously have the inclusion N (x; Ω − F∞ ) ∩ C ∗ ⊂ N (x; Ω) ∩ C ∗ . To show the reverse inclusion, pick v ∈ N (x; Ω) ∩ C ∗ and fix x ∈ Ω − F∞ . Taking ω ∈ Ω and d ∈ F∞ with x = ω − d gives us td ∈ F due to 0 ∈ F, and so v, −td ≤ 1 for all t > 0. Thus v, −d ≤ 0, which readily implies that v, x − x = v, ω − d − x = v, ω − x + v, −d ≤ 0. Therefore, we arrive at v ∈ N (x; Ω − F∞ ) ∩ C ∗ and complete the proof.



The next theorem provides the subdifferential calculation for the minimal time function in the case complemented to the one considered in Theorem 6.23, i.e., when x ∈ / Ω − F∞ . Theorem 6.24 Let both sets 0 ∈ F ⊂ Rn and Ω ⊂ Rn be nonempty, closed, and convex, and let x ∈ dom TΩF for the minimal time function (6.1). Suppose that Ω is bounded. Then / Ω − F∞ , we have for x ∈ ∂ TΩF (x) = N (x; Ωr ) ∩ S ∗ ,

(6.13)

 where the enlargement Ωr with r = TΩF (x) > 0 is defined in (6.3), and where S ∗ = v ∈   Rn  σ F (−v) = 1 is defined in Theorem 6.22. Proof Observe that r > 0 by Theorem 6.8. The rest of the proof follows the proof of Theorem 6.22. 

6.6

6.6

Exercises for This Chapter

187

Exercises for This Chapter

6.1 Find the horizon cones of the following sets: (i) F = {(x, y) ∈ R2 | y ≥ x 2 }. (ii) F = {(x, y) ∈ R2 | y ≥ |x|}. (iii) F = {(x, y) ∈ R2 | x > 0, √ y ≥ 1/x}.  (iv) F = (x, y) ∈ R2 | y ≥ x 2 + 1}. 6.2 Let vi ∈ Rn and bi > 0 for i = 1, . . . , m. Consider the set   F = x ∈ Rn  vi , x ≤ bi for all i = 1, . . . , m}. Prove the following representation of the horizon cone for F:    F∞ = x ∈ Rn  vi , x ≤ 0 for all i = 1, . . . , m . 6.3 Let Ω be a nonempty closed convex set in Rn . Prove that    Ω∞ = d ∈ Rn  Ω + d ⊂ Ω . 6.4 Let A, B be nonempty closed convex subsets of Rn . Show that (A × B)∞ = A∞ × B∞ . 6.5 Let A, B be nonempty closed convex subsets of Rn . Show that if A∞ ∩ B∞ = {0}, then A + B is closed and (A + B)∞ = A∞ + B∞ . 6.6 Let Ω j for j ∈ J be nonempty closed convex subsets of Rn which have a common point. Verify the equality    j j Ω = Ω∞ . j∈J



j∈J

6.7 Let ∅  = Ω ⊂ Rn be closed and convex, and let A : Rn → R p be linear. (i) Prove that if

A−1 (0) ∩ Ω∞ = {0},

then AΩ is closed and we have the equality AΩ∞ = (AΩ)∞ . (ii) Construct an example showing that the assumption in (i) is essential. 6.8 Prove that if 0 ∈ int F, then the minimal time function TΩF defined in (6.1) is a realvalued function, i.e., TΩF (x) ∈ R for all x ∈ Rn .

188

6 Minimal Time Functions and Related Issues

6.9 Consider the minimal time function TΩF under the general conditions on the sets F and Ω imposed in Definition 6.6. Prove the following assertions: (i) If F = B, the closed unit ball in Rn , then the minimal time function (6.1) reduces to the distance function d(·; Ω) considered in Sect. 1.5. (ii) If F = {0}, then the minimal time function (6.1) is the indicator function of Ω. (iii) If Ω = {0}, then TΩF (x) = ρ F (−x) for all x ∈ Rn , where ρ F is the Minkowski gauge function defined in Definition 6.13. 6.10 Let F ⊂ Rn be a closed convex set with 0 ∈ int F, and let Ω be a nonempty subset of Rn . Assume that the complement Ω c of the target set is convex. Prove that the minimal time function TΩF is concave on Ω c . 6.11 Let F be a nonempty closed convex set in Rn . Prove that F is bounded if and only if 0 ∈ int F ◦ . 6.12 Consider the Minkowski gauge function ρ F under the general condition on the set F imposed in Definition 6.13. Prove that if 0 ∈ F, then    ρ F (x) = inf t > 0  x ∈ t F for all x ∈ Rn , i.e., the condition t ≥ 0 in Definition 6.13 can be replaced by t > 0 if 0 ∈ F. 6.13 Consider the Minkowski gauge function ρ F , where F ⊂ Rn is compact and convex with 0 ∈ F. Prove that    x ∈ Rn  ρ F (x) ≤ 1 = F. 6.14 Consider the Minkowski gauge function ρ F , where F ⊂ Rn is compact and convex with 0 ∈ F. Prove that x, v ≤ ρ F (x)ρ F ◦ (v) for all x, v ∈ Rn . 6.15 Consider the Minkowski gauge function ρ F , where F ⊂ Rn is a compact convex set with 0 ∈ F. Prove that (i) {x ∈ Rn | ρ F (x) ≤ 1} = F. (ii) ∂ρ F (0) = F ◦ . (iii) If x  = 0, then    ∂ρ F (x) = v ∈ Rn  ρ F (x) = v, x, ρ F ◦ (v) = 1 . 6.16 Consider the Minkowski gauge function ρ F under the general condition on the set F imposed in Definition 6.13 and suppose that 0 ∈ F, where F is not necessarily bounded.

6.6

Exercises for This Chapter

189

Perform the following: (i) Prove that {x ∈ Rn | ρ F (x) ≤ 1} = F ∪ F∞ . (ii) Calculate ∂ρ F (0) and ∂ ∞ ρ F (0). (iii) Calculate ∂ρ F ◦ (0) and ∂ ∞ ρ F ◦ (0). 6.17 Let F  = ∅ be a compact convex set, while Ω  = ∅ is a closed convex set. Prove that / Ω, then if x ∈

F ∂ TΩF (x) = − ∂ρ F (ω − x) ∩ N (ω; Ω) for any ω ∈ Ω (x) in terms of the generalized projection to Ω associated with F defined by    F Ω (x) := w ∈ Ω  TΩF (x) = ρ F (w − x) .

(6.14)

6.18 Let F be a closed convex set in Rn with 0 ∈ int F, and let Ω be a nonempty set. Define the generalized distance function associated with F and Ω by F dΩ (x) := inf σ F (w − x), x ∈ Rn , w∈Ω

via the support function σ F given in (4.6). Prove the following assertions: F (x) = 0, while the converse implication holds if we assume in addition (i) If x ∈ Ω, then dΩ that Ω is closed. F is convex. (ii) If Ω is convex, then dΩ F is Lipschitz continuous with constant  = F := (iii) If in addition F is bounded, then dΩ sup f ∈F f . (iv) If Ω is convex with x ∈ Ω, then F ∂dΩ (x) = N (x; Ω) ∩ D ∗ ,

where D ∗ := {v ∈ Rn | ρ F (−v) ≤ 1}. / Ω, then (v) If Ω is closed and convex with x ∈ F ∂dΩ (x) = N (x; Ωr ) ∩ E ∗ , F (x) > 0 and E ∗ := {v ∈ Rn | ρ (−v) = 1}. where r = dΩ F

6.19 Provide further details for the proof of Theorem 6.24.

7

Applications to Problems of Optimization and Equilibrium

This chapter addresses applications of convex analysis to various aspects of optimization and equilibria. They include the existence of global minimizers in optimization problems and of Nash equilibria in two-person games, necessary and sufficient optimality conditions in convex constrained optimization, and subgradient methods to solve such problems numerically in both unconstrained and constrained frameworks. The convex calculus rules for normals and subgradients developed in Chap. 3 play an underlying role in deriving the main results established in this chapter in simplified and unified ways.

7.1

Lower Semicontinuity and Existence of Minimizers

First we discuss the notion of lower semicontinuity for extended-real-valued functions that is crucial in verifying the existence of global minimizers and related topics. The general results obtained here concern nonconvex optimization with further specifications in convex frameworks. Definition 7.1 An extended-real-valued function f : Rn → R is lower semicontinuous (l.s.c.) at x ∈ Rn if for every α ∈ R with α < f (x) there exists δ > 0 ensuring the inequality α < f (x) for all x ∈ B(x; δ).

(7.1)

We say that f is lower semicontinuous on Rn if this function is lower semicontinuous at every point of the space. In the case where f is lower semicontinuous on Rn , we also say that f is lower semicontinuous.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 B. Mordukhovich and N. M. Nam, An Easy Path to Convex Analysis and Applications, Synthesis Lectures on Mathematics & Statistics, https://doi.org/10.1007/978-3-031-26458-0_7

191

192

7 Applications to Problems of Optimization and Equilibrium

The following simple examples illustrate the given definition (Fig. 7.1).

Fig. 7.1 Which function is l.s.c.?

Example 7.2 (i) Define the function f : R → R by  x 2 if x  = 0, f (x) = −1 if x = 0. It is easy to see that the defined function is l.s.c. at x = 0 and in fact on the whole real line. On the other hand, the replacement of f (0) = −1 by f (0) = 1 destroys the lower semicontinuity at x = 0. (ii) Consider the indicator function of the interval [−1, 1] ⊂ R given by  0 if |x| ≤ 1, f (x) = ∞ otherwise. Then f is an l.s.c. function on R. However, the modified function  0 if |x| < 1, g(x) = ∞ otherwise is not lower semicontinuous on R. In fact, g is not l.s.c. at x = ±1, while it is lower semicontinuous at any other point of the real line. (iii) Consider the function f : R → R given by

7.1

Lower Semicontinuity and Existence of Minimizers

193

⎧ ⎨ 1 if x  = 0, f (x) = |x| ⎩ ∞ if x = 0. Take any real number α < f (0) = ∞ and consider first the case where α > 0. Letting δ = 1/α, we can verify that α < f (x) whenever |x| < δ. If α ≤ 0, then any real number δ > 0 ensures that α < f (x) whenever |x| < δ. Thus f is l.s.c. at 0 by definition. In fact, f is l.s.c. on R. The four propositions presented below contain useful characterizations of lower semicontinuity for extended-real-valued functions. Proposition 7.3 Let f : Rn → R with x ∈ dom f . Then f is l.s.c. at x if and only if for any real number  > 0 there exists δ > 0 such that f (x) −  < f (x) whenever x ∈ B(x; δ).

(7.2)

Proof Suppose that f is l.s.c. at x and take any real number  > 0. Since λ = f (x) −  < f (x) and λ is a real number due to x ∈ dom f , we find by the definition of lower semicontinuity δ > 0 such that f (x) −  = λ < f (x) for all x ∈ B(x; δ). Conversely, suppose that for any real number  > 0 there exists δ > 0 such that (7.2) is satisfied. Fix any real number λ < f (x) and choose  > 0 for which λ < f (x) − . Then for some δ > 0 we arrive at the inequalities λ < f (x) −  < f (x) whenever x ∈ B(x; δ). This implies by the definition the lower semicontinuity of f at x therefore completes the proof of the proposition.  Proposition 7.4 A function f : Rn → R is l.s.c. if and only if for any number λ ∈ R the sublevel set Lλ ( f ) = {x ∈ Rn | f (x) ≤ λ} is closed. Proof Suppose that f is l.s.c. and fix λ ∈ R. If x ∈ / Lλ ( f ), then we get λ < f (x). It follows from the l.s.c. definition that there exists δ > 0 such that λ < f (x) for all x ∈ B(x; δ), which yields the inclusion B(x; δ) ⊂ Lcλ ( f ) for the complement of Lλ ( f ). Thus Lcλ ( f ) is open, so the sublevel set Lλ ( f ) is closed.

194

7 Applications to Problems of Optimization and Equilibrium

To verify the converse implication, suppose that the set Lλ ( f ) is closed for every λ ∈ R. / Lλ ( f ), there exists Fix x ∈ Rn and λ ∈ R with λ < f (x). Since Lλ ( f ) is closed and x ∈ δ > 0 such that B(x; δ) ⊂ Lcλ ( f ). This tells us that λ < f (x) for all x ∈ B(x; δ) and thus completes the proof of the proposition.



Proposition 7.5 A function f : Rn → R is l.s.c. if and only if its epigraphical set epi f is closed in Rn+1 . Proof To verify the “only if” part, fix (x, λ) ∈ / epi f meaning that (x, λ) ∈ Rn × R and λ < f (x). Then for any  > 0 with λ < λ +  < f (x), there exists δ > 0 such that λ +  < f (x) whenever x ∈ B(x; δ). Hence B(x; δ) × (λ − , λ + ) ⊂ (epi f )c , and therefore the epigraph epi f is closed. Conversely, suppose that epi f is closed. Employing Proposition 7.4, it suffices to verify that for any λ ∈ R the sublevel set Lλ ( f ) is closed. To see this, fix a sequence {xk } ⊂ Lλ ( f ) converging to x. Since f (xk ) ≤ λ, we have (xk , λ) ∈ epi f for all k ∈ N. It follows from (xk , λ) → (x, λ) as k → ∞ that (x, λ) ∈ epi f , and so f (x) ≤ λ. Thus x ∈ Lλ ( f ), which  justifies the closedness of the sublevel set Lλ ( f ). Proposition 7.6 A function f : Rn → R is l.s.c. at x if and only if for any sequence xk → x as k → ∞ we have lim inf k→∞ f (xk ) ≥ f (x). Proof Suppose that f is l.s.c. at x and take an arbitrary sequence xk → x as k → ∞. By the definition of lower semicontinuity, for any real number λ < f (x) there exists δ > 0 such that (7.1) holds. Then xk ∈ B(x; δ), and so λ < f (xk ) for large k. It follows that λ ≤ lim inf k→∞ f (xk ), and hence f (x) ≤ lim inf k→∞ f (xk ) by the arbitrary choice of λ < f (x). To verify the converse implication, suppose on the contrary that f is not l.s.c. at x. Then there exists a real number λ < f (x) such that for every δ > 0 we have xδ ∈ B(x; δ) with λ ≥ f (xδ ). Applying this to δk = 1/k for k ∈ N gives us a sequence {xk } ⊂ Rn converging to x with λ ≥ f (xk ). This yields f (x) > λ ≥ lim inf f (xk ), k→∞

which is a contradiction that completes the proof of the proposition.



7.1

Lower Semicontinuity and Existence of Minimizers

195

The following two propositions indicate two cases of preserving lower semicontinuity under important operations performed on functions. Proposition 7.7 Let f i : Rn → R, i = 1, . . . , m, be l.s.c. functions. Then the sum is also an l.s.c. function.

m

i=1 f i

Proof It suffices to verify this for m = 2. Taking any x ∈ Rn and any sequence {xk } ⊂ Rn such that xk → x as k → ∞, we get lim inf f i (xk ) ≥ f (x) for i = 1, 2, k→∞

which immediately implies that   lim inf f 1 (xk ) + f 2 (xk ) ≥ lim inf f 1 (xk ) + lim inf f 2 (xk ) ≥ f 1 (x) + f 2 (x), k→∞

k→∞

k→∞

and thus f 1 + f 2 is l.s.c. at x by Proposition 7.6. Since x ∈ Rn is arbitrary, this justifies the  lower semicontinuity of f on Rn . Proposition 7.8 Let I be any nonempty index set, and let f i : Rn → R be l.s.c. functions for all i ∈ I . Then the supremum function f : Rn → R given by f (x) = supi∈I f i (x) for x ∈ Rn is also an l.s.c. function. Proof For any number λ ∈ R, we deduce directly from the definition that



x ∈ Rn f i (x) ≤ λ = Lλ ( f ) = x ∈ Rn sup f i (x) ≤ λ = Lλ ( f i ). i∈I

i∈I

i∈I

Thus all the sublevel sets of f are closed as intersections of closed sets. This ensures the lower semicontinuity of f by Proposition 7.4.  The next result provides two interconnected unilateral versions of the classical Weierstrass existence theorem presented in Theorem 1.17. The crucial difference is that now we assume the lower semicontinuity versus continuity while ensuring the existence of only minima, not together with maxima as in Theorem 1.17. Recall that a function f : Rn → R has a global/absolute minimum on Ω at x if x ∈ Ω and we have f (x) ≤ f (x) for all x ∈ Ω. Theorem 7.9 Let f : Rn → R be an l.s.c. function. The following hold:

196

7 Applications to Problems of Optimization and Equilibrium

(i) The function f attains its global minimum on any nonempty compact subset Ω ⊂ Rn . (ii) Assume that there exists a real number λ for which the sublevel set n L< λ ( f ) := {x ∈ R | f (x) < λ}

of f is nonempty and bounded. Then f attains its global minimum on Rn . Proof To verify (i), we first show that f is bounded below on Ω. Suppose on the contrary that this is not the case. Then for any k ∈ N there is xk ∈ Ω such that f (xk ) ≤ −k. Since Ω is compact, we can assume without loss of generality that xk → u ∈ Ω as k → ∞. It follows from the lower semicontinuity that f (u) ≤ lim inf f (xk ) = −∞, k→∞

a contradiction showing that f is bounded below on Ω. This implies further that γ = inf x∈Ω f (x) ∈ R = (−∞, ∞]. Take {xk } ⊂ Ω such that f (xk ) → γ as k → ∞. By the compactness of Ω, we can assume again that xk → x ∈ Ω as k → ∞. This yields f (x) ≤ lim inf f (xk ) = γ, k→∞

which shows that f (x) = γ ≤ f (x) for all x ∈ Rn , so f achieves its absolute minimum on Ω at x. It remains to prove assertion (ii). It is not hard to check that f is bounded below. Thus γ = inf x∈Rn f (x) is a real number such that γ < λ due to the nonemptiness of the sublevel set under consideration. Let {xk } ⊂ Rn for which f (xk ) → γ as k → ∞, and let k0 ∈ N be such that f (xk ) < λ for all k ≥ k0 . Since the sublevel set {x ∈ Rn | f (x) < λ} is bounded, we select a subsequence of {xk } (no relabeling) that converges to some x ∈ Rn . Then f (x) ≤ lim inf f (xk ) = γ, k→∞

which shows that x is a global minimizer of f on Rn .



The next corollary is very useful for verifying that a given l.s.c. function has an absolute minimum. Corollary 7.10 Let f : Rn → R be an l.s.c. function. The function f attains its global minimum on Rn under either one of the following conditions: (i) All sublevel sets Lλ ( f ) = {x ∈ Rn | f (x) ≤ λ} for λ ∈ R are bounded. (ii) lim x →∞ f (x) = ∞.

7.1

Lower Semicontinuity and Existence of Minimizers

197

Proof Note that if dom f = ∅, then f (x) = ∞ for all x ∈ Rn , so the conclusion is obvious. Thus it suffices to consider the case where the function f is proper, i.e., dom f  = ∅. Pick x0 ∈ dom f and choose a real number λ such that f (x0 ) < λ. Then ∅  = L< λ ( f ) ⊂ Lλ ( f ) is bounded under (i). Thus the conclusion follows from Theorem 7.9(ii). Now suppose that (ii) is satisfied. By definition, there exists a real number r > 0 such that λ ≤ f (x) whenever r < x . This implies that ∅  = L< λ ( f ) ⊂ B(0; r ), which is a bounded set. Therefore, the conclusion follows from Theorem 7.9(ii) again.  In contrast to all the results given above in this section, the next theorem requires the convexity of the function in question. It provides effective characterizations of the sublevel set boundedness imposed in Theorem 7.9(ii). Theorem 7.11 Let f : Rn → R be a proper convex function. The following conditions are equivalent: (i) There exists a real number β for which the sublevel set n L< β ( f ) = {x ∈ R | f (x) < β}

is nonempty and bounded. n (ii) All the sublevel sets L< λ ( f ) = {x ∈ R | f (x) < λ} for λ ∈ R of f are bounded (while may be empty). (iii) lim x →∞ f (x) = ∞. f (x) (iv) lim inf x →∞ > 0. x Proof Let γ = inf{ f (x) | x ∈ Rn }, which is either a real number or −∞ by the proper property of f . We first verify (i)=⇒(ii). Since (i) is satisfied, we see that γ < β. Suppose on the contrary that there is λ ∈ R such that the sublevel set L< λ ( f ) is not bounded. Then there ( f ) with x → ∞ as k → ∞. Fix a real number α such that exists a sequence {xk } ⊂ L< k λ ( f ) and then define the convex combination γ < α < β, fix x ∈ L< α 

1 1 1  xk − x = √ xk + 1 − √ x, k ∈ N. yk = x + √ xk xk xk By the convexity of f , we have the relationships     1 1 1 λ f (xk ) + 1 − √ f (x) < √ + 1− √ α → α < β, f (yk ) ≤ √ xk xk xk xk

198

7 Applications to Problems of Optimization and Equilibrium

and hence yk ∈ L< β ( f ) for k sufficiently large. This verifies the boundedness of {yk }. It follows from the triangle inequality that   x

 1  1  k   x = xk − 1 − √ x → ∞, yk ≥  √ − 1− √ xk xk xk which is a contradiction. Thus (ii) is satisfied. Next we show that (ii)=⇒(iii). Indeed, the violation of (iii) yields the existence of a constant M > 0 and a sequence {xk } ⊂ Rn such that xk → ∞ as k → ∞ and f (xk ) < M for all k ∈ N. However, this obviously contradicts the boundedness of the sublevel set {x ∈ Rn | f (x) < M}. Finally, we justify (iii)=⇒(i) leaving the proof of (iii)⇐⇒(iv) as an exercise. Suppose that (iii) is satisfied. Taking a real number M > γ, we can find a real number α > 0 such that f (x) ≥ M whenever x > α. Then

n

f (x) < M ⊂ B(0; α). L< M( f ) = x ∈ R Thus this set is nonempty and bounded, so (i) holds.



The corollary below follows directly from Theorems 7.9 and 7.11. Corollary 7.12 Let f : Rn → R be l.s.c. and convex. Then f achieves its absolute minimum on Rn under each of the equivalent conditions (i)–(iv) of Theorem 7.11.

7.2

Optimality Conditions in Convex Minimization

This section is devoted to deriving necessary and sufficient conditions for optimal solutions to convex minimization problems. We address first the following optimization problem with a geometric constraint: minimize f (x) subject to x ∈ Ω, (7.3) where the constraint set Ω is nonempty and convex, and where the objective function f : Rn → R is convex as well. Recall the standard notions of optimality for the constrained problem (7.3). Definition 7.13 A point x ∈ Ω is a local optimal solution to problem (7.3) if there exists a real number γ > 0 such that f (x) ≤ f (x) for all x ∈ B(x; γ) ∩ Ω.

(7.4)

If x ∈ Ω and we have f (x) ≤ f (x) for all feasible points x ∈ Ω, then x is a global optimal solution to problem (7.3).

7.2

Optimality Conditions in Convex Minimization

199

The next result shows, in particular, that there is no difference between local and global solutions to convex minimization problems. Proposition 7.14 Let x be an element of Ω. The following properties are equivalent in the convex setting of (7.3): (i) x is a local optimal solution to the constrained problem (7.3). (i) x is a local optimal solution to the unconstrained problem minimize g(x) = f (x) + δ(x; Ω) on Rn .

(7.5)

(iii) x is a global optimal solution to the unconstrained problem (7.5). (iv) x is a global optimal solution to the constrained problem (7.3). Proof To verify (i)=⇒(ii), select a real number γ > 0 such that (7.4) is satisfied. Since g(x) = f (x) for x ∈ Ω ∩ B(x; γ) and since g(x) = ∞ for x ∈ B(x; γ) with x ∈ / Ω, we obviously have g(x) ≤ g(x) for all x ∈ B(x; γ), which yields (ii). Implication (ii)=⇒(iii) is a consequence of Proposition 3.5. To verify (iii)=⇒(iv), let x is a global solution to (7.5), i.e., g(x) = f (x) + δ(x; Ω) ≤ f (x) + δ(x; Ω) = g(x) for all x ∈ Rn . It follows therefore that f (x) ≤ f (x) for all x ∈ Ω due to x ∈ Ω and δ(x; Ω) = 0 whenever x ∈ Ω. This justifies (iii). Finally, implication (iv)=⇒ (i) is obvious.  In what follows we use the term “optimal solutions” for problems of convex minimization, without mentioning whether it is global or local. Observe that the objective function g of the unconstrained problem (7.5) associated with the constrained one (7.3) is always extendedreal-valued when Ω  = Rn . This reflects the “infinite penalty” for the constraint violation. The next theorem presents basic necessary and sufficient optimality conditions for optimal solutions to the minimization problem (7.3) with a convex objective function and a convex constraint. Theorem 7.15 Consider problem (7.3), where f : Rn → R is convex function and Ω is a convex set with x ∈ Ω ∩ dom f . Suppose that ri(dom f ) ∩ ri Ω  = ∅.

(7.6)

200

7 Applications to Problems of Optimization and Equilibrium

Then the following assertions are equivalent: (i) x is an optimal solution to (7.3). (ii) 0 ∈ ∂ f (x) + N (x; Ω), i.e., ∂ f (x) ∩ [−N (x; Ω)]  = ∅. (iii) f  (x; d) ≥ 0 for all d ∈ T (x; Ω). Proof To verify implication (i)=⇒(ii), we use the unconstrained form (7.5) of problem (7.3) due to Proposition 7.14. Then Proposition 3.6 provides the optimal solution characterization 0 ∈ ∂g(x). Applying the subdifferential sum rule from Theorem 3.18 under the qualification condition (7.6) gives us the optimality condition in (ii). To show that implication (ii)=⇒(iii) holds, we have by the tangent-normal duality (5.6) the equivalence   d ∈ T (x; Ω) ⇐⇒ v, d ≤ 0 for all v ∈ N (x; Ω) . Fix a subgradient w ∈ ∂ f (x) such that −w ∈ N (x; Ω). Then w, d ≥ 0 for all d ∈ T (x; Ω), so we arrive at f  (x; d) ≥ w, d ≥ 0 by Theorem 4.27. Thus assertion (iii) is verified. It remains to justify (iii)=⇒(i). It follows from Lemma 4.26 that f (x + d) − f (x) ≥ f  (x; d) ≥ 0 for all d ∈ T (x; Ω).   By the tangent cone definition, we get T (x; Ω) = cl cone(Ω − x) , which tells us that d = w − x ∈ T (x; Ω) if w ∈ Ω. This yields f (w) − f (x) = f (w − x + x) − f (x) = f (x + d) − f (x) ≥ 0 whenever w ∈ Ω. Thus assertion (i) holds, and the proof of the theorem is complete.



Corollary 7.16 Assume that in the framework of Theorem 7.15 the cost function f : Rn → R is Fréchet differentiable at x. Then we have the following equivalent assertions: (i) x is an optimal solution to problem (7.3). (ii) −∇ f (x) ∈ N (x; Ω), which reduces to ∇ f (x) = 0 if x ∈ int Ω. Proof Since ∂ f (x) = {∇ f (x)} in this case by Theorem 5.2, we deduce this corollary directly from Theorem 7.15.  Next we consider convex optimization problems with functional constraints of the inequality type given by finitely many real-valued convex functions gi : Rn → R, i =

7.2

Optimality Conditions in Convex Minimization

201

1, . . . , m, along with a convex set/geometric constraint ∅  =  ⊂ Rn . For simplicity, suppose that f : Rn → R is a real-valued convex function and formulate problem (P) as follows: minimize f (x) subject to gi (x) ≤ 0 for i = 1, . . . , m, x ∈ . This problem can be written in form (7.3) with

 = x ∈ Rn gi (x) ≤ 0 for all i = 1, . . . , m ∩ .

(7.7)

Define the Lagrange function, or Lagrangian, for (P) via the objective and functional constraints by m  L(x, λ) := f (x) + λi gi (x), x ∈ Rn , i=1

where λi ∈ R, i = 1, . . . , m. For each feasible solution x of (P), i.e., x belonging to the set Ω from (7.7), consider the collection of active indices at x given by

I (x) = i ∈ {1, . . . , m} gi (x) = 0 . The next theorem presents necessary optimality conditions for (P). Theorem 7.17 Let x be an optimal solution to the convex problem (P). Then there exist nonnegative multipliers λ0 , λ1 , . . . , λm , not equal to zero simultaneously, such that we have 0 ∈ λ0 ∂ f (x) +

m 

λi ∂gi (x) + N (x; ),

(7.8)

i=1

and λi gi (x) = 0 for all i = 1, . . . , m. Proof Consider the convex maximum function

ϕ(x) = max f (x) − f (x), gi (x) i = 1, . . . , m , x ∈ Rn , and observe that x solves the following problem with no functional constraints: minimize ϕ(x) subject to x ∈ . Since ϕ : Rn → R is a real-valued convex function, we get from Theorem 7.15 that 0 ∈ ∂ϕ(x) + N (x; ). The subdifferential maximum rule from Theorem 3.27 ensures the inclusion

   ∂gi (x) + N (x; ). 0 ∈ co ∂ f (x) ∪ i∈I (x)

202

7 Applications to Problems of Optimization and Equilibrium

This gives us λ0 ≥ 0 and λi ≥ 0 for i ∈ I (x) with λ0 + 

0 ∈ λ0 ∂ f (x) +



i∈I (x) λi

= 1 and

λi ∂gi (x) + N (x; ).

i∈I (x)

/ I (x) and get λi gi (x) = 0, i = 1, . . . , m, with (λ0 , λ)  = 0. Let λi = 0 as i ∈



Many applications including numerical algorithms to solve problem (P) require effective constraint qualification conditions ensuring that λ0  = 0 in Theorem 7.17. The following one is classical in convex optimization. Definition 7.18 Slater’s constraint qualification (C Q) holds for (P) if there exists u ∈  such that gi (u) < 0 for all i = 1, . . . , m. The next theorem provides necessary and sufficient conditions for convex optimization in the form known as the qualified, normal, or Karush-Kuhn-Tucker (K K T ) ones. Theorem 7.19 Suppose that Slater’s CQ holds for (P), and let x be a feasible solution. Then x is an optimal solution to (P) if and only if there exist nonnegative Lagrange multipliers λ1 , . . . , λm such that m  0 ∈ ∂ f (x) + λi ∂gi (x) + N (x; ), (7.9) i=1

and λi gi (x) = 0 for all i = 1, . . . , m. Proof To verify the necessity part, we only need to apply Theorem 7.17 and show that λ0  = 0 in (7.8). Suppose on the contrary that λ0 = 0 and then find 0  = (λ1 , . . . , λm ) ∈ Rm +, vi ∈ ∂gi (x) for i = 1, . . . , m, and v ∈ N (x; ) satisfying the equality 0=

m 

λi vi + v.

i=1

This readily yields the fulfillment of 0=

m 

λi vi , x − x + v, x − x ≤

i=1

m  i=1





λi gi (x) − gi (x) =

m 

λi gi (x)

i=1

for all x ∈ , which is a contradiction since the Slater condition ensures the existence of u ∈  such that m  λi gi (u) < 0. i=1

7.2

Optimality Conditions in Convex Minimization

203

To check the sufficiency of (7.9), take v0 ∈ ∂ f (x), vi ∈ ∂gi (x), and v ∈ N (x; ) such m λi vi + v with λi gi (x) = 0 and λi ≥ 0 for all i = 1, . . . , m. Then the that 0 = v0 + i=1 optimality of x in (P) follows directly from the definitions of the subdifferential and normal cone. Indeed, for any x ∈  with gi (x) ≤ 0 for i = 1, . . . , m we have the relationships 0 = v0 +

m 

λi vi + v, x − x

i=1

= v0 , x − x +

m 

λi vi , x − x + v, x − x

i=1 m 

≤ f (x) − f (x) +

  λi gi (x) − gi (x)

i=1

= f (x) − f (x) +

m 

λi gi (x) ≤ f (x) − f (x),

i=1

which therefore complete the proof of the theorem.



Finally in this section, we consider the following convex optimization problem (Q) with convex inequality and affine equality constraints: minimize f (x) subject to gi (x) ≤ 0 for i = 1, . . . , m, Ax = b, where f , gi : Rn → R are real-valued convex functions, A ∈ R p×n , and b ∈ R p . Corollary 7.20 Suppose that Slater’s CQ holds for problem (Q) with the set  = {x ∈ Rn | Ax − b = 0} in Definition 7.18, and let x be a feasible solution to (Q). Then x is an optimal solution to this problem if and only if there exist nonnegative multipliers λ1 , . . . , λm such that m  0 ∈ ∂ f (x) + λi ∂gi (x) + im A T , i=1

and λi gi (x) = 0 for all i = 1, . . . , m. Proof The result follows from Theorem 7.17 and the fact that

N (x; ) = im A T = A T u u ∈ R p . This completes the proof of the corollary.



The concluding result presents a classical version of the Lagrange multiplier rule for convex problems with differentiable data.

204

7 Applications to Problems of Optimization and Equilibrium

Corollary 7.21 Let f and gi for i = 1, . . . , m be real-valued convex functions that are Fréchet differentiable at a feasible solution x, and let {∇gi (x) | i ∈ I (x)} be linearly independent. Then x is an optimal solution to (P) with  = Rn if and only if there exist nonnegative multipliers λ1 , . . . , λm such that 0 = ∇ f (x) +

m 

λi ∇gi (x) = ∇x L(x, λ)

i=1

and λi gi (x) = 0 for all i = 1, . . . , m. Proof In this case we have that N (x; ) = {0}, and it follows from (7.8) that λ0  = 0. Indeed, if λ0 = 0, then we get a contradiction due to the imposed linear independence of the gradient vectors {∇gi (x) | i ∈ I (x)}. The rest of the proof is now straightforward. 

7.3

Fenchel Duality in Optimization

This section and the next one are devoted to basic developments on duality in optimization, which belong to the most fundamental issues in convex analysis, convex optimization, and their numerous applications. The main force of duality is in the construction, starting with the given (primal) optimization problem, a dual one in such a way that the optimal values in the primal and dual problems coincide under appropriate qualification conditions. It often happens, as was first revealed in linear programming, that the dual problem is simpler— in one or another aspect—than the primal formulation, and thus solving the dual problem essentially benefits the original, primal framework. We discuss two major duality schemes in this and next sections, respectively. This section concerns Fenchel duality, where the dual problem is constructed via Fenchel conjugates studied in Chap. 4. Consider the following primal unconstrained minimization problem: minimize ϕ(x) = f (x) + g(Ax), x ∈ Rn ,

(7.10)

where f : Rn → R, g : R p → R, and A ∈ R p×n . Throughout this section, we use the standing assumption: there is  x ∈ dom f such that A x ∈ dom g, i.e.,   dom g ∩ A dom f  = ∅,

(7.11)

which guarantees that the function ϕ defined in (7.10) is a proper function. Using Fenchel conjugates, define the Fenchel dual problem of (7.10) by maximize ψ(x) = − f ∗ (A T u) − g ∗ (−u), u ∈ R p .

(7.12)

7.3

Fenchel Duality in Optimization

205

First we establish a relationship between the optimal values of the primal minimization problem (7.10) and the dual maximization problem (7.12), known as Fenchel weak duality. Note that the weak duality theorem does not impose any convexity assumptions. Prior to the theorem, let us formulate the following simple albeit useful lemma.

Lemma 7.22 Given a function $\varphi\colon\mathbb{R}^n\to\overline{\mathbb{R}}$, we always have
\[
\inf\{\varphi(x)\mid x\in\mathbb{R}^n\}=-\varphi^*(0).
\]

Proof It follows directly from the definitions that
\[
\inf\{\varphi(x)\mid x\in\mathbb{R}^n\}=-\sup\{-\varphi(x)\mid x\in\mathbb{R}^n\}=-\sup\{\langle 0,x\rangle-\varphi(x)\mid x\in\mathbb{R}^n\}=-\varphi^*(0),
\]
which therefore completes the proof of the claimed relationship. $\square$

Here is the Fenchel weak duality theorem.

Theorem 7.23 Consider problem (7.10) along with its Fenchel dual problem (7.12) under the standing assumption (7.11). Denote the corresponding optimal values of the cost functions in (7.10) and (7.12) by
\[
V:=\inf\{\varphi(x)\mid x\in\mathbb{R}^n\},\qquad V_d:=\sup\{\psi(u)\mid u\in\mathbb{R}^p\}. \tag{7.13}
\]
Then we have the relationship $V_d\le V$.

Proof Define $h(x)=g(Ax)$ for $x\in\mathbb{R}^n$. For any $v\in\mathbb{R}^n$, we have
\[
\varphi^*(0)=(f+h)^*(0)\le(f^*\oplus h^*)(0)\le f^*(v)+h^*(-v)
\]
due to the conjugate sum inequality (4.15) in Theorem 4.16. Next we use the chain inequality (4.20) in Theorem 4.21 and get
\[
h^*(-v)=(g\circ A)^*(-v)\le g^*(w)\ \text{ whenever } A^Tw=-v.
\]
Fix any $u\in\mathbb{R}^p$ and employ the above inequalities with $w=-u$ and $v=A^Tu$. This tells us that
\[
\varphi^*(0)\le f^*(A^Tu)+g^*(-u).
\]
Using the latter together with Lemma 7.22, we get
\[
-f^*(A^Tu)-g^*(-u)\le-\varphi^*(0)=V\ \text{ for all } u\in\mathbb{R}^p.
\]


Taking finally the supremum on the left-hand side above over $u\in\mathbb{R}^p$ verifies that $V_d\le V$ and thus completes the proof of the theorem. $\square$

The next theorem imposes convexity assumptions on the initial data of the primal problem (7.10) and provides sufficient conditions ensuring that $V=V_d$, which is known as Fenchel strong duality.

Theorem 7.24 In the setting of Theorem 7.23, suppose in addition that $f$ and $g$ are convex functions satisfying the qualification condition
\[
\operatorname{ri}(\operatorname{dom}g)\cap A\big(\operatorname{ri}(\operatorname{dom}f)\big)\ne\emptyset. \tag{7.14}
\]
Then we have the strong duality relationship $V=V_d$.

Proof Due to the weak duality Theorem 7.23, it remains to show that $V\le V_d$ under the imposed additional assumptions. We only need to consider the case where $-\infty<V$. Note that $\varphi$ is a proper convex function under (7.14), and so $V\in\mathbb{R}$. It follows from Lemma 7.22 that $-V=\varphi^*(0)\in\mathbb{R}$. It is easy to see in the notation of Theorem 7.23 that (7.14) yields
\[
\operatorname{ri}(\operatorname{dom}f)\cap\operatorname{ri}(\operatorname{dom}h)\ne\emptyset,
\]
since there exists $\widehat{x}\in\operatorname{ri}(\operatorname{dom}f)$ with $A\widehat{x}\in\operatorname{ri}(\operatorname{dom}g)$. Using the conjugate sum rule from Theorem 4.16 gives us $v\in\mathbb{R}^n$ such that
\[
-V=\varphi^*(0)=(f+h)^*(0)=f^*(v)+h^*(-v).
\]
Since $-v\in\operatorname{dom}h^*$ by the above, it follows from the conjugate chain rule in Theorem 4.21 that
\[
h^*(-v)=(g\circ A)^*(-v)=g^*(w)\ \text{ for some } w\in\mathbb{R}^p \text{ with } A^Tw=-v.
\]
Letting $u=-w$ gives us $-V=f^*(A^Tu)+g^*(-u)$. This implies in turn that
\[
V=-f^*(A^Tu)-g^*(-u)\le V_d,
\]
which completes the proof of the theorem. $\square$




Now we deduce from Theorem 7.24 the following strong duality relationship, which addresses a more general version of the primal problem (7.10) with an affine mapping under the composition. The corresponding weak duality relationship for this problem can be obtained similarly.

Corollary 7.25 Given functions $f\colon\mathbb{R}^n\to\overline{\mathbb{R}}$ and $g\colon\mathbb{R}^p\to\overline{\mathbb{R}}$, a matrix $A\in\mathbb{R}^{p\times n}$, and a vector $b\in\mathbb{R}^p$, consider the composite optimization problem
\[
\text{minimize } \varphi(x)=f(x)+g(Ax-b),\quad x\in\mathbb{R}^n, \tag{7.15}
\]
and the associated Fenchel dual counterpart
\[
\text{maximize } \psi(u)=-f^*(A^Tu)-g^*(-u)+\langle b,u\rangle,\quad u\in\mathbb{R}^p. \tag{7.16}
\]
Define the corresponding optimal values $V$ and $V_d$ as in (7.13). Then the Fenchel strong duality $V_d=V$ holds provided that $f$ and $g$ are convex functions satisfying the qualification condition
\[
\operatorname{ri}(\operatorname{dom}g)\cap B\big(\operatorname{ri}(\operatorname{dom}f)\big)\ne\emptyset, \tag{7.17}
\]
where $B(x)=Ax-b$ for all $x\in\mathbb{R}^n$.

Proof Define the function $g_1\colon\mathbb{R}^p\to\overline{\mathbb{R}}$ by $g_1(u)=g(u-b)$ for $u\in\mathbb{R}^p$. Then we have the representation $\varphi(x)=f(x)+g_1(Ax)$ for $x\in\mathbb{R}^n$. It follows from Proposition 4.17(iii) that
\[
g_1^*(u)=g^*(u)+\langle b,u\rangle\ \text{ for } u\in\mathbb{R}^p.
\]
Due to the construction of the function $g_1$, we get the equality
\[
-f^*(A^Tu)-g_1^*(-u)=-f^*(A^Tu)-g^*(-u)+\langle b,u\rangle\ \text{ for all } u\in\mathbb{R}^p,
\]
which tells us that the Fenchel dual problem of (7.15) is represented by (7.16). The rest of the proof follows by applying Theorems 7.23 and 7.24. $\square$

To conclude this section, we give an example showing that the Fenchel strong duality allows us to calculate the optimal value in a difficult problem by reducing it to solving an easier dual counterpart.


Example 7.26 Take $b\in\mathbb{R}^p$ and $A\in\mathbb{R}^{p\times n}$ with $p\le n$, and assume that $A$ has full rank, i.e., the row vectors of $A$ are linearly independent in $\mathbb{R}^n$. Consider the affine set $\Omega=\{x\in\mathbb{R}^n\mid Ax=b\}$. Given $\bar{x}\in\mathbb{R}^n$, we aim at verifying that the distance function associated with this set is calculated by
\[
d(\bar{x};\Omega)=\big\|A^T(AA^T)^{-1}(A\bar{x}-b)\big\|. \tag{7.18}
\]
To proceed, consider the minimization problem
\[
\text{minimize } \tfrac{1}{2}\|x-\bar{x}\|^2\ \text{ subject to } Ax=b, \tag{7.19}
\]
which can be written in the unconstrained form
\[
\text{minimize } \varphi(x)=\tfrac{1}{2}\|x-\bar{x}\|^2+\delta_{\{0\}}(Ax-b),\quad x\in\mathbb{R}^n.
\]
Defining the convex functions
\[
f(x)=\tfrac{1}{2}\|x-\bar{x}\|^2\ \text{ and }\ g(u)=\delta_{\{0\}}(u)\ \text{ for } x\in\mathbb{R}^n,\ u\in\mathbb{R}^p,
\]
we readily compute their conjugates:
\[
f^*(y)=\tfrac{1}{2}\|y\|^2+\langle\bar{x},y\rangle\ \text{ and }\ g^*(v)=\sigma_{\{0\}}(v)=0\ \text{ for } y\in\mathbb{R}^n,\ v\in\mathbb{R}^p.
\]
The Fenchel dual problem (7.16) associated with (7.19) is
\[
\text{maximize } \psi(u)=-\tfrac{1}{2}\|A^Tu\|^2-\langle\bar{x},A^Tu\rangle+\langle b,u\rangle,\quad u\in\mathbb{R}^p.
\]
The objective function $\psi$ of the dual problem can be represented by
\[
\psi(u)=-\tfrac{1}{2}\langle AA^Tu,u\rangle-\langle A\bar{x}-b,u\rangle\ \text{ for } u\in\mathbb{R}^p.
\]
The latter problem admits a unique optimal solution $\bar{u}$ satisfying $AA^T\bar{u}=-(A\bar{x}-b)$, i.e., $\bar{u}=-(AA^T)^{-1}(A\bar{x}-b)$. Observe further the equalities
\[
-\langle A\bar{x}-b,\bar{u}\rangle=\langle AA^T\bar{u},\bar{u}\rangle=\langle A^T\bar{u},A^T\bar{u}\rangle=\|A^T\bar{u}\|^2
\]
and thus get that the optimal value of the Fenchel dual problem is
\[
V_d=\tfrac{1}{2}\|A^T\bar{u}\|^2=\tfrac{1}{2}\big\|A^T(AA^T)^{-1}(A\bar{x}-b)\big\|^2.
\]


It is easy to check that the qualification condition (7.17) is satisfied. Applying now Corollary 7.25 tells us that $V=V_d$. Therefore,
\[
d(\bar{x};\Omega)=\sqrt{2V}=\big\|A^T(AA^T)^{-1}(A\bar{x}-b)\big\|,
\]
which verifies formula (7.18). Note that in the particular case of the hyperplane $\Omega=\{x\in\mathbb{R}^n\mid\langle a,x\rangle=b\}$ with $0\ne a\in\mathbb{R}^n$ and $b\in\mathbb{R}$, we have
\[
d(\bar{x};\Omega)=\frac{|\langle a,\bar{x}\rangle-b|}{\|a\|}.
\]
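As a quick numerical illustration of Example 7.26, the following sketch compares the closed-form distance (7.18) with the value obtained by solving the dual problem along the lines of the example; the random data and variable names are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 6
A = rng.standard_normal((p, n))          # rows independent with probability one
b = rng.standard_normal(p)
x_bar = rng.standard_normal(n)

# Closed-form distance (7.18).
r = A @ x_bar - b
d_closed = np.linalg.norm(A.T @ np.linalg.solve(A @ A.T, r))

# Dual route of the example: u_bar solves A A^T u = -(A x_bar - b),
# and psi(u_bar) equals the optimal dual value V_d, so d = sqrt(2 V_d).
u_bar = np.linalg.solve(A @ A.T, -r)
psi = -0.5 * np.linalg.norm(A.T @ u_bar) ** 2 - x_bar @ (A.T @ u_bar) + b @ u_bar
d_dual = np.sqrt(2.0 * psi)

# The corresponding primal minimizer x_bar + A^T u_bar is feasible for Ax = b.
x_star = x_bar + A.T @ u_bar
print(d_closed, d_dual, np.allclose(A @ x_star, b))
```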

7.4 Lagrangian Duality in Convex Optimization

This section presents another type of duality for convex optimization problems that does not involve Fenchel conjugates while being based on a certain Lagrangian function depending on primal and dual variables. Such a duality is labeled the Lagrangian duality. The main thrust of the Lagrangian duality is to find efficient conditions ensuring that the Lagrangian function attains its saddle/equilibrium point at the joint solutions of the primal minimization problem and the associated dual problem of concave maximization.

Given real-valued functions $f\colon\mathbb{R}^n\to\mathbb{R}$ and $g_i\colon\mathbb{R}^n\to\mathbb{R}$ for $i=1,\dots,m$ together with a matrix $A\in\mathbb{R}^{p\times n}$, a vector $b\in\mathbb{R}^p$, and a nonempty set $\Theta\subset\mathbb{R}^n$, consider the following primal minimization problem:
\[
\text{minimize } f(x)\ \text{ subject to } g_i(x)\le 0\ \text{ for } i=1,\dots,m,\ Ax=b,\ x\in\Theta. \tag{7.20}
\]
Unless otherwise stated, we assume in this section that the functions $f,g_i$ and the set $\Theta$ are convex. The Lagrange function (or simply Lagrangian) associated with (7.20) is defined by
\[
L(x,u,v):=f(x)+\sum_{i=1}^{m}u_i g_i(x)+\langle v,Ax-b\rangle,\quad x\in\Theta, \tag{7.21}
\]
where $u=[u_1,\dots,u_m]^T\in\mathbb{R}^m_+$ and $v\in\mathbb{R}^p$. We also define the Lagrange dual function (or dual Lagrangian) $L^*\colon\mathbb{R}^m_+\times\mathbb{R}^p\to[-\infty,\infty)$ by
\[
L^*(u,v):=\inf\{L(x,u,v)\mid x\in\Theta\}. \tag{7.22}
\]
The latter function is concave, as shown below.

Proposition 7.27 The dual Lagrangian $L^*\colon\mathbb{R}^m_+\times\mathbb{R}^p\to[-\infty,\infty)$ is a concave function.


Proof Since the Lagrangian function (7.21) is affine with respect to the variable pair $(u,v)$, the claimed result follows from the concavity of the infimum of a family of concave functions in (7.22); see Exercise 1.19. $\square$

The next statement is useful in what follows.

Lemma 7.28 Let $C$ and $D$ be nonempty subsets, and let $F\colon C\times D\to[-\infty,\infty]$. Then we always have
\[
\sup_{y\in D}\,\inf_{x\in C}F(x,y)\ \le\ \inf_{x\in C}\,\sup_{y\in D}F(x,y). \tag{7.23}
\]
Assume in addition that there exists a pair $(x^*,y^*)\in C\times D$ such that
\[
F(x^*,y)\le F(x^*,y^*)\le F(x,y^*)\ \text{ for all } x\in C,\ y\in D, \tag{7.24}
\]
i.e., $(x^*,y^*)$ is a saddle point of $F$ on $C\times D$. Then (7.23) holds as an equality.

Proof For any $(x,y)\in C\times D$, we obviously have
\[
F(x,y)\le\sup_{y\in D}F(x,y).
\]
It follows therefore that
\[
\inf_{x\in C}F(x,y)\le\inf_{x\in C}\,\sup_{y\in D}F(x,y)\ \text{ for all } y\in D,
\]
which yields (7.23) by taking the supremum with respect to $y\in D$. Using the first inequality in (7.24) gives us
\[
\sup_{y\in D}F(x^*,y)\le F(x^*,y^*),
\]
which implies that
\[
\inf_{x\in C}\,\sup_{y\in D}F(x,y)\le\sup_{y\in D}F(x^*,y)\le F(x^*,y^*).
\]
Similarly, the second inequality yields
\[
F(x^*,y^*)\le\sup_{y\in D}\,\inf_{x\in C}F(x,y).
\]
Therefore, we obtain the equality in (7.23) under (7.24). $\square$



Having in hand the dual Lagrangian (7.22), we formulate the following dual problem of concave maximization:


\[
\text{maximize } L^*(u,v)\ \text{ subject to } u\in\mathbb{R}^m_+,\ v\in\mathbb{R}^p. \tag{7.25}
\]
To proceed further, we denote the constraint sets of the primal problem (7.20) and the dual problem (7.25), respectively, by
\[
\Omega=\{x\in\Theta\mid g_i(x)\le 0\ \text{for } i=1,\dots,m,\ Ax=b\},\qquad \Omega_d=\mathbb{R}^m_+\times\mathbb{R}^p, \tag{7.26}
\]
and then define the optimal values in problems (7.20) and (7.25) by
\[
V:=\inf\{f(x)\mid x\in\Omega\}\quad\text{and}\quad V_d:=\sup\{L^*(u,v)\mid (u,v)\in\Omega_d\}. \tag{7.27}
\]

The following theorem is known as the Lagrangian weak duality.

Theorem 7.29 Considering the Lagrangian (7.21) and the optimal values (7.27) of the primal (7.20) and dual (7.25) problems, we have the relationships
\[
\sup_{(u,v)\in\Omega_d}\,\inf_{x\in\Theta}L(x,u,v)=V_d\quad\text{and}\quad\inf_{x\in\Theta}\,\sup_{(u,v)\in\Omega_d}L(x,u,v)=V.
\]
Consequently, the Lagrangian weak duality $V_d\le V$ holds.

Proof For any $x\in\Theta$, it is easy to see that
\[
\sup_{(u,v)\in\Omega_d}L(x,u,v)=\begin{cases} f(x) & \text{if } x\in\Omega,\\[2pt] \infty & \text{if } x\notin\Omega. \end{cases}
\]
It follows therefore that
\[
\inf_{x\in\Theta}\,\sup_{(u,v)\in\Omega_d}L(x,u,v)=\inf_{x\in\Omega}f(x)=V.
\]
Furthermore, we have by the corresponding definitions that
\[
\sup_{(u,v)\in\Omega_d}\,\inf_{x\in\Theta}L(x,u,v)=\sup_{(u,v)\in\Omega_d}L^*(u,v)=V_d.
\]
The claimed relationships follow now from Lemma 7.28. $\square$



Let us now discuss some sufficient conditions which guarantee the Lagrangian strong duality. We first concentrate on a particular case of the general problem (7.20) with no equality constraints, i.e., consider the following optimization problem:


\[
\text{minimize } f(x)\ \text{ subject to } g_i(x)\le 0\ \text{ for all } i=1,\dots,m,\ x\in\Theta. \tag{7.28}
\]
The Slater condition/constraint qualification for (7.28) postulates the existence of $\widehat{x}\in\Theta$ such that $g_i(\widehat{x})<0$ for all $i=1,\dots,m$. Our goal in what follows is to obtain a strong Lagrangian duality theorem for (7.28) under the Slater constraint qualification.

To proceed, consider the constraint system with respect to the vector $x$:
\[
\begin{cases} f(x)<0,\\ g_i(x)\le 0\ \text{for all } i=1,\dots,m,\\ x\in\Theta. \end{cases} \tag{7.29}
\]
Define also another system with respect to the multipliers $u_1,\dots,u_m$:
\[
\begin{cases} f(x)+\displaystyle\sum_{i=1}^{m}u_i g_i(x)\ge 0\ \text{for all } x\in\Theta,\\ u_i\ge 0\ \text{for all } i=1,\dots,m. \end{cases} \tag{7.30}
\]

The next lemma reveals relationships between the constraint systems (7.29) and (7.30) under the Slater constraint qualification.

Lemma 7.30 Let the Slater condition be satisfied for problem (7.28). Then (7.29) has a solution if and only if (7.30) has no solution.

Proof Suppose that (7.29) admits a solution $x_0$ and show that (7.30) is not solvable. Assume on the contrary that a solution $u=(u_1,\dots,u_m)$ exists for (7.30). Then for the solution $x_0$ to (7.29) we have
\[
f(x_0)+\sum_{i=1}^{m}u_i g_i(x_0)\le f(x_0)<0,
\]
which is a clear contradiction. Note that in this part of the proof, the Slater condition is not required.

Now suppose that (7.30) has no solution and then show that (7.29) has a solution. Assuming again the contrary that (7.29) has no solution, consider the set
\[
\Gamma=\big\{(\gamma_0,\gamma)\in\mathbb{R}\times\mathbb{R}^m\ \big|\ f(x)<\gamma_0,\ g(x)\preceq\gamma\ \text{for some } x\in\Theta\big\},
\]
where $\gamma_0\in\mathbb{R}$, $\gamma\in\mathbb{R}^m$, and $g(x)=[g_1(x),\dots,g_m(x)]^T$ for $x\in\mathbb{R}^n$, and where '$\preceq$' indicates the componentwise vector inequality. It is easy to see that $\Gamma$ is a nonempty convex set with $0\notin\Gamma$. Separating $\Gamma$ and $\{0\}$ gives us multipliers $u_0,\dots,u_m$, not all zeros, such that
\[
\gamma_0 u_0+\sum_{i=1}^{m}\gamma_i u_i\ge 0 \tag{7.31}
\]
whenever $(\gamma_0,\dots,\gamma_m)\in\Gamma$. By passing to a limit, we see that this inequality also holds for all $(\gamma_0,\dots,\gamma_m)\in\overline{\Gamma}$. It clearly follows from (7.31) that $u_i\ge 0$ for all $i=0,\dots,m$. Taking now any $x\in\Theta$ tells us that $(f(x),g_1(x),\dots,g_m(x))\in\overline{\Gamma}$, and thus we arrive at the inequality
\[
u_0 f(x)+\sum_{i=1}^{m}u_i g_i(x)\ge 0. \tag{7.32}
\]

If $u_0=0$, then we use in (7.32) the vector $\widehat{x}\in\Theta$ from the Slater condition such that $g_i(\widehat{x})<0$ for all $i=1,\dots,m$, which brings us to a contradiction. Therefore, $u_0>0$, and dividing both sides of (7.32) by $u_0$ shows that (7.30) has a solution. This is a contradiction, which completes the proof. $\square$

Next we show that the Slater condition ensures the Lagrangian strong duality for the optimization problem (7.28) with no equality constraints.

Theorem 7.31 If the Slater condition is satisfied for problem (7.28), then we have the Lagrangian strong duality $V=V_d$.

Proof In the case where $V=-\infty$, we get that $V_d=-\infty$ by the Lagrangian weak duality from Theorem 7.29, and thus the strong duality holds. Consider the case where $V\in\mathbb{R}$. Then the system
\[
\begin{cases} f(x)-V<0,\\ g_i(x)\le 0\ \text{for all } i=1,\dots,m,\\ x\in\Theta \end{cases}
\]
has no solution. By Lemma 7.30, there is $u=[u_1,\dots,u_m]^T\in\mathbb{R}^m_+$ with
\[
f(x)-V+\sum_{i=1}^{m}u_i g_i(x)\ge 0\ \text{ for all } x\in\Theta.
\]
It follows therefore that
\[
V_d\ge\inf_{x\in\Theta}\Big\{f(x)+\sum_{i=1}^{m}u_i g_i(x)\Big\}\ge V.
\]
Combining the latter estimate with the Lagrangian weak duality from Theorem 7.29 verifies the claimed Lagrangian strong duality in (7.28). $\square$


Now we return to the general class of problems (7.20) in convex optimization with the presence of inequality, equality, and geometric constraints. Our goal here is to establish the Lagrangian strong duality under the following relative interior Slater condition: there exists $\widehat{x}\in\operatorname{ri}\Theta$ such that
\[
g_i(\widehat{x})<0\ \text{ for all } i=1,\dots,m,\quad A\widehat{x}=b.
\]
To proceed, consider the constraint system with respect to $x$:
\[
\begin{cases} f(x)<0,\\ g_i(x)\le 0\ \text{for all } i=1,\dots,m,\\ Ax=b,\\ x\in\Theta \end{cases} \tag{7.33}
\]
together with the dual constraint system with respect to $u_1,\dots,u_m,v$:
\[
\begin{cases} f(x)+\displaystyle\sum_{i=1}^{m}u_i g_i(x)+\langle v,Ax-b\rangle\ge 0\ \text{for all } x\in\Theta,\\ u_i\ge 0\ \text{for all } i=1,\dots,m,\ v\in\mathbb{R}^p. \end{cases} \tag{7.34}
\]

Let us present a counterpart of Lemma 7.30 for the case under consideration, where the proof is technically more involved.

Lemma 7.32 Under the fulfillment of the relative interior Slater condition, system (7.33) has a solution if and only if (7.34) has no solution.

Proof Assuming that (7.33) has a solution $x_0$, suppose on the contrary that (7.34) has a solution as well. Denote the latter by $u=[u_1,\dots,u_m]^T\in\mathbb{R}^m_+$ and $v\in\mathbb{R}^p$, and therefore arrive at the strict inequality
\[
f(x_0)+\sum_{i=1}^{m}u_i g_i(x_0)+\langle v,Ax_0-b\rangle\le f(x_0)<0,
\]
which is a clear contradiction that verifies the "if" part of the lemma.

To prove the "only if" part, suppose that (7.34) has no solution and show that (7.33) has a solution in this case. Assuming on the contrary that (7.33) does not have a solution, consider the set
\[
\Gamma=\big\{(\gamma_0,\gamma,\beta)\in\mathbb{R}\times\mathbb{R}^m\times\mathbb{R}^p\ \big|\ f(x)<\gamma_0,\ g(x)\preceq\gamma,\ Ax-b=\beta\ \text{for some } x\in\Theta\big\},
\]
where $\gamma_0\in\mathbb{R}$, $\gamma\in\mathbb{R}^m$, $\beta\in\mathbb{R}^p$, and $g(x)=[g_1(x),\dots,g_m(x)]^T$ for $x\in\mathbb{R}^n$, and where '$\preceq$' indicates the componentwise vector inequality. Since (7.34) is not solvable, we can fix $\bar{u}=[\bar{u}_1,\dots,\bar{u}_m]^T\in\mathbb{R}^m_+$ and $\bar{v}\in\mathbb{R}^p$ together with $x_0\in\Theta$ such that
\[
f(x_0)+\sum_{i=1}^{m}\bar{u}_i g_i(x_0)+\langle\bar{v},Ax_0-b\rangle<0.
\]
Denote $\bar{\beta}=Ax_0-b$, $\bar{\gamma}=g(x_0)$, and
\[
\bar{\gamma}_0=-\sum_{i=1}^{m}\bar{u}_i g_i(x_0)-\langle\bar{v},Ax_0-b\rangle.
\]
We have $\Gamma\ne\emptyset$ due to $(\bar{\gamma}_0,\bar{\gamma},\bar{\beta})\in\Gamma$. Since (7.33) has no solution, it follows that $0\notin\Gamma$. Furthermore, the convexity of $\Gamma$ is implied by the convexity of $f,g_i$ and the linearity of $A$. Thus we can properly separate $\Gamma$ and $\{0\}$, which gives us $[u_0,\dots,u_m]^T\in\mathbb{R}^{m+1}$ and $v\in\mathbb{R}^p$, not all zeros, such that
\[
u_0\gamma_0+\sum_{i=1}^{m}u_i\gamma_i+\langle v,\beta\rangle\ge 0 \tag{7.35}
\]
whenever $(\gamma_0,\dots,\gamma_m,\beta)\in\Gamma$. In addition, there exists $(\widehat{\gamma}_0,\widehat{\gamma},\widehat{\beta})\in\Gamma$ satisfying the strict inequality
\[
u_0\widehat{\gamma}_0+\sum_{i=1}^{m}u_i\widehat{\gamma}_i+\langle v,\widehat{\beta}\rangle>0. \tag{7.36}
\]
By passing to the limit, observe that the inequality in (7.35) also holds for any $(\gamma_0,\dots,\gamma_m,\beta)\in\overline{\Gamma}$.

Next we are going to check that $u_i\ge 0$ for all $i=0,\dots,m$. The imposed Slater condition gives us $\widehat{x}\in\operatorname{ri}\Theta\subset\Theta$ with $g_i(\widehat{x})<0$ for all $i=1,\dots,m$ and $A\widehat{x}=b$. Fix a real number $\gamma_0>0$ such that $\gamma_0>f(\widehat{x})$. We first see that $(\gamma_0,0,\dots,0)\in\Gamma$, which shows by (7.35) that $u_0\gamma_0\ge 0$, and so $u_0\ge 0$. To verify that $u_1\ge 0$, we also suppose on the contrary that $u_1<0$. If $u_0=0$, then for any $\gamma_1>0>g_1(\widehat{x})$ it follows that $(\gamma_0,\gamma_1,0,\dots,0)\in\Gamma$ while (7.35) fails. If $u_0>0$, choosing $\gamma_1=-\frac{u_0\gamma_0}{u_1}+1>0>g_1(\widehat{x})$ ensures that $(\gamma_0,\gamma_1,0,\dots,0)\in\Gamma$ but (7.35) is not satisfied. This contradiction verifies that $u_1\ge 0$. Similarly we check that $u_i\ge 0$ for $i=2,\dots,m$ and hence get $u_i\ge 0$ for all $i=0,\dots,m$.

To proceed further, fix any $x\in\Theta$ and see that $(f(x),g_1(x),\dots,g_m(x),Ax-b)\in\overline{\Gamma}$. Thus we have the inequality


\[
u_0 f(x)+\sum_{i=1}^{m}u_i g_i(x)+\langle v,Ax-b\rangle\ge 0\ \text{ for all } x\in\Theta. \tag{7.37}
\]
If $u_0=0$, then using the relative interior Slater condition gives us $\widehat{x}\in\operatorname{ri}\Theta$ such that $g_i(\widehat{x})<0$ for all $i=1,\dots,m$ and $A\widehat{x}=b$. This yields
\[
\sum_{i=1}^{m}u_i g_i(\widehat{x})+\langle v,A\widehat{x}-b\rangle=\sum_{i=1}^{m}u_i g_i(\widehat{x})\ge 0
\]
and implies that $u_i=0$ for all $i=1,\dots,m$, telling us that $\langle v,Ax-b\rangle\ge 0$ for all $x\in\Theta$. Since $(\widehat{\gamma}_0,\widehat{\gamma},\widehat{\beta})\in\Gamma$, there exists $\widetilde{x}\in\Theta\subset\mathbb{R}^n$ with $\widehat{\beta}=A\widetilde{x}-b$, and hence (7.36) brings us to the strict inequality $\langle v,A\widetilde{x}-b\rangle>0$. By definition, we see that the sets $A(\Theta)$ and $\{b\}$ can be properly separated, which yields $b\notin\operatorname{ri}(A(\Theta))$. Since $A(\operatorname{ri}\Theta)=\operatorname{ri}(A(\Theta))$, we arrive at a contradiction due to $\widehat{x}\in\operatorname{ri}\Theta$ and $A\widehat{x}=b$. Therefore, $u_0>0$, which allows us to divide both sides of (7.37) by $u_0$ to show that (7.34) has a solution. The obtained final contradiction completes the proof of the lemma. $\square$

The established lemma leads us to the following refined theorem on the Lagrangian strong duality in the general convex optimization problem (7.20).

Theorem 7.33 Consider the constrained minimization problem (7.20) under the standing convexity assumptions on $f$, $g_i$, and $\Theta$. If the relative interior Slater condition holds, then we have the Lagrangian strong duality $V=V_d$.

Proof Examine the two possible cases. If $V=-\infty$, then $V_d=-\infty$ by the Lagrangian weak duality theorem, and we are done. Considering now the case where $V\in\mathbb{R}$, we get that the constraint system
\[
\begin{cases} f(x)-V<0,\\ g_i(x)\le 0,\ i=1,\dots,m,\\ Ax=b,\\ x\in\Theta \end{cases}
\]
has no solution. By Lemma 7.32, there exist $u=[u_1,\dots,u_m]^T\in\mathbb{R}^m_+$ and $v\in\mathbb{R}^p$ satisfying the condition
\[
f(x)-V+\sum_{i=1}^{m}u_i g_i(x)+\langle v,Ax-b\rangle\ge 0\ \text{ for all } x\in\Theta.
\]

This implies therefore the relationship
\[
V_d\ge L^*(u,v)=\inf_{x\in\Theta}\Big\{f(x)+\sum_{i=1}^{m}u_i g_i(x)+\langle v,Ax-b\rangle\Big\}\ge V.
\]

Combining the latter with the Lagrangian weak duality from Theorem 7.29 verifies that $V=V_d$ and thus completes the proof. $\square$

Let us now continue with the study of relationships between the Lagrangian strong duality and the KKT conditions. Similarly to (7.24), we say that the triple $(\bar{x},\bar{u},\bar{v})\in\Theta\times\mathbb{R}^m_+\times\mathbb{R}^p$ is a saddle/equilibrium point of the Lagrange function $L$ from (7.21) if
\[
L(\bar{x},u,v)\le L(\bar{x},\bar{u},\bar{v})\le L(x,\bar{u},\bar{v})\ \text{ whenever } x\in\Theta,\ (u,v)\in\mathbb{R}^m_+\times\mathbb{R}^p.
\]

Theorem 7.34 Consider the primal problem (7.20) and its Lagrangian dual counterpart (7.25) under the standing convexity assumptions of this section. We have the following assertions:

(i) Under the Lagrangian strong duality $V=V_d$, if $\bar{x}$ is an optimal solution to the primal problem (7.20) and $(\bar{u},\bar{v})$ is an optimal solution to the dual problem (7.25), then the KKT conditions are satisfied, i.e.,
\[
0\in\partial f(\bar{x})+\sum_{i=1}^{m}\bar{u}_i\,\partial g_i(\bar{x})+A^T\bar{v}+N(\bar{x};\Theta)
\]
and $A\bar{x}=b$, together with the complementary slackness relationships $\bar{u}_i g_i(\bar{x})=0$ for all $i=1,\dots,m$.

(ii) Conversely, if the KKT conditions formulated in (i) hold for $\bar{x}\in\Omega$ and $(\bar{u},\bar{v})\in\Omega_d$ given in (7.26), then we have the Lagrangian strong duality $V=V_d$. In this case, $\bar{x}$ is an optimal solution to the primal problem (7.20), and the pair $(\bar{u},\bar{v})$ provides an optimal solution to the dual problem (7.25).

Proof To prove (i) first, suppose that $\bar{x}$ is an optimal solution to the primal problem and $(\bar{u},\bar{v})$ is an optimal solution to the dual problem (7.25). Then $\bar{x}\in\Omega$ and $(\bar{u},\bar{v})\in\mathbb{R}^m_+\times\mathbb{R}^p$ in the notation of (7.26) are such that
\[
f(\bar{x})=\inf\{f(x)\mid x\in\Omega\}=V\quad\text{and}\quad L^*(\bar{u},\bar{v})=\sup\{L^*(u,v)\mid (u,v)\in\Omega_d\}=V_d.
\]
Under the Lagrangian strong duality $V=V_d$, we have that


\[
f(\bar{x})=V=V_d=L^*(\bar{u},\bar{v})=\inf_{x\in\Theta}L(x,\bar{u},\bar{v})
\le L(\bar{x},\bar{u},\bar{v})=f(\bar{x})+\sum_{i=1}^{m}\bar{u}_i g_i(\bar{x})+\langle\bar{v},A\bar{x}-b\rangle
=f(\bar{x})+\sum_{i=1}^{m}\bar{u}_i g_i(\bar{x})\le f(\bar{x}).
\]
Note that $A\bar{x}=b$ and $g_i(\bar{x})\le 0$ for all $i=1,\dots,m$ due to the inclusion $\bar{x}\in\Omega$. Thus $\sum_{i=1}^{m}\bar{u}_i g_i(\bar{x})=0$, which yields the complementary slackness conditions $\bar{u}_i g_i(\bar{x})=0$ for all $i=1,\dots,m$. Since $f(\bar{x})=\inf_{x\in\Theta}L(x,\bar{u},\bar{v})$, using Theorem 7.15 and remembering that $f,g_i$ are real-valued functions imply that
\[
0\in\partial_x L(\bar{x},\bar{u},\bar{v})+N(\bar{x};\Theta)=\partial f(\bar{x})+\sum_{i=1}^{m}\bar{u}_i\,\partial g_i(\bar{x})+A^T\bar{v}+N(\bar{x};\Theta),
\]
which completes the proof of assertion (i).

To verify (ii), suppose that the triple $(\bar{x},\bar{u},\bar{v})$ satisfies the KKT conditions. Defining the convex function $h(x)=L(x,\bar{u},\bar{v})$ for $x\in\mathbb{R}^n$ gives us
\[
0\in\partial f(\bar{x})+\sum_{i=1}^{m}\bar{u}_i\,\partial g_i(\bar{x})+A^T\bar{v}+N(\bar{x};\Theta)\subset\partial h(\bar{x})+N(\bar{x};\Theta).
\]
Using Theorem 7.15, this implies that $h(\bar{x})\le h(x)$ for all $x\in\Theta$, and so $L(\bar{x},\bar{u},\bar{v})\le L(x,\bar{u},\bar{v})$ for all $x\in\Theta$. It follows from $A\bar{x}=b$ and the complementary slackness conditions that
\[
\sup\{L(\bar{x},u,v)\mid (u,v)\in\mathbb{R}^m_+\times\mathbb{R}^p\}\le f(\bar{x})=L(\bar{x},\bar{u},\bar{v}).
\]
This tells us that $(\bar{x},\bar{u},\bar{v})$ is a saddle point of the Lagrangian $L$, and thus we have the Lagrangian strong duality $V=V_d$ by Lemma 7.28. Combining all of this, we arrive at the relationships
\[
f(\bar{x})=L(\bar{x},\bar{u},\bar{v})=\inf\{L(x,\bar{u},\bar{v})\mid x\in\Theta\}
\le\inf\{L(x,\bar{u},\bar{v})\mid x\in\Theta,\ g_i(x)\le 0\ \text{for all } i=1,\dots,m,\ Ax=b\}
\le\inf\{f(x)\mid x\in\Theta,\ g_i(x)\le 0\ \text{for all } i=1,\dots,m,\ Ax=b\}=V
\]
and thus conclude that $\bar{x}$ is a solution to the primal problem (7.20).

Invoking the dual Lagrangian (7.22), for any $(u,v)\in\mathbb{R}^m_+\times\mathbb{R}^p$ we have
\[
L^*(u,v)=\inf\{L(x,u,v)\mid x\in\Theta\}\le L(\bar{x},u,v),
\]
which implies therefore that


\[
\sup\{L^*(u,v)\mid (u,v)\in\mathbb{R}^m_+\times\mathbb{R}^p\}\le\sup\{L(\bar{x},u,v)\mid (u,v)\in\mathbb{R}^m_+\times\mathbb{R}^p\}=L(\bar{x},\bar{u},\bar{v})\le\inf\{L(x,\bar{u},\bar{v})\mid x\in\Theta\}=L^*(\bar{u},\bar{v}),
\]
and thus verifies the optimality of $(\bar{u},\bar{v})$ in the dual problem (7.25). $\square$



We complete this section by considering three examples, which illustrate applications of the Lagrangian strong duality to optimization problems of their own interest. The first example addresses calculating the distance to a hyperplane in $\mathbb{R}^n$. This problem, as well as the one in the next Example 7.36, has already been examined in Example 7.26 by using the Fenchel strong duality approach.

Example 7.35 Consider the hyperplane
\[
\Omega=\{x\in\mathbb{R}^n\mid\langle a,x\rangle=b\},
\]
where $a\in\mathbb{R}^n$ with $a\ne 0$, and where $b\in\mathbb{R}$. Given $\bar{x}\in\mathbb{R}^n$, our goal is to find the distance from $\bar{x}$ to the hyperplane $\Omega$ by solving the problem
\[
\text{minimize } f(x)=\tfrac{1}{2}\|x-\bar{x}\|^2\ \text{ subject to } \langle a,x\rangle=b,\ x\in\mathbb{R}^n,
\]
by employing the Lagrangian strong duality. The Lagrange function here is
\[
L(x,\lambda)=\tfrac{1}{2}\|x-\bar{x}\|^2+\lambda(\langle a,x\rangle-b)
\]
for $x\in\mathbb{R}^n$ and $\lambda\in\mathbb{R}$. Then the dual Lagrangian is
\[
L^*(\lambda)=\inf_{x\in\mathbb{R}^n}L(x,\lambda),\quad\lambda\in\mathbb{R}.
\]
Since $L(\cdot,\lambda)$ is convex on $\mathbb{R}^n$ with $\nabla_x L(x,\lambda)=x-\bar{x}+\lambda a$, solving $\nabla_x L(x,\lambda)=0$ gives us the global minimizer of $L(\cdot,\lambda)$ as $x=\bar{x}-\lambda a$. Thus
\[
L^*(\lambda)=\inf_{x\in\mathbb{R}^n}L(x,\lambda)=-\frac{\|a\|^2}{2}\lambda^2+\lambda\big(\langle a,\bar{x}\rangle-b\big).
\]
Solving further the equation
\[
(L^*)'(\lambda)=-\lambda\|a\|^2+\langle a,\bar{x}\rangle-b=0,
\]
we obtain the expression for the multiplier $\bar{\lambda}$:
\[
\bar{\lambda}=\frac{\langle a,\bar{x}\rangle-b}{\|a\|^2}.
\]
Since the Lagrangian strong duality holds in this case, we have
\[
V=V_d=\frac{1}{2}\,\frac{\big(\langle a,\bar{x}\rangle-b\big)^2}{\|a\|^2}.
\]
Therefore, the distance from $\bar{x}$ to the hyperplane $\Omega$ is calculated by
\[
d(\bar{x};\Omega)=\sqrt{2V}=\frac{|\langle a,\bar{x}\rangle-b|}{\|a\|}.
\]

Observe that we arrive at the same formula for the distance function as in Example 7.26, which was resolved by using the Fenchel strong duality.

The second example is an extension of the preceding one obtained by replacing the hyperplane with a general affine set in $\mathbb{R}^n$. We show now that the Lagrangian strong duality provides an elegant way to calculate the distance function associated with the affine set under consideration.

Example 7.36 Let $A$ be a $p\times n$ matrix with $p\le n$ and full rank $p$, and let $b\in\mathbb{R}^p$. Consider the affine set
\[
\Omega=\{x\in\mathbb{R}^n\mid Ax=b\}.
\]
Our goal is to find the distance from any point $\bar{x}\in\mathbb{R}^n$ to $\Omega$. We reduce this to solving the optimization problem
\[
\text{minimize } \tfrac{1}{2}\|x-\bar{x}\|^2\ \text{ subject to } Ax=b,\ x\in\mathbb{R}^n.
\]
The Lagrange function associated with this problem is
\[
L(x,\lambda)=\tfrac{1}{2}\|x-\bar{x}\|^2+\langle\lambda,Ax-b\rangle=\tfrac{1}{2}\|x-\bar{x}\|^2+\langle A^T\lambda,x\rangle-\langle\lambda,b\rangle
\]
for all $x\in\mathbb{R}^n$ and $\lambda\in\mathbb{R}^p$. The corresponding dual Lagrangian is given by
\[
L^*(\lambda)=\inf_{x\in\mathbb{R}^n}L(x,\lambda)=\inf_{x\in\mathbb{R}^n}\Big\{\tfrac{1}{2}\|x-\bar{x}\|^2+\langle\lambda,Ax-b\rangle\Big\}.
\]
We find $L^*$ by calculating the partial gradient of the Lagrangian
\[
\nabla_x L(x,\lambda)=x-\bar{x}+A^T\lambda
\]
and solving the equation $\nabla_x L(x,\lambda)=0$, which gives us $x=\bar{x}-A^T\lambda$. Thus
\[
L^*(\lambda)=-\tfrac{1}{2}\langle AA^T\lambda,\lambda\rangle+\langle A\bar{x}-b,\lambda\rangle.
\]
Considering further the equation
\[
\nabla L^*(\lambda)=-AA^T\lambda+A\bar{x}-b=0
\]
tells us that $\bar{\lambda}=(AA^T)^{-1}(A\bar{x}-b)$. Using the Lagrangian strong duality yields
\[
V=V_d=\tfrac{1}{2}\big\|A^T(AA^T)^{-1}(A\bar{x}-b)\big\|^2.
\]

Therefore, the distance from $\bar{x}$ to $\Omega$ is calculated by
\[
d(\bar{x};\Omega)=\sqrt{2V}=\big\|A^T(AA^T)^{-1}(A\bar{x}-b)\big\|
\]
in accordance with Example 7.26.

The last example addresses a standard linear programming problem with inequality constraints and constructs a Lagrangian dual problem written in the canonical form of the most popular Dantzig simplex method. The obtained duality form agrees with the classical dual problem in linear programming.

Example 7.37 Consider the linear program
\[
\text{minimize } \langle c,x\rangle\ \text{ subject to } Ax\succeq b,
\]
where $c\in\mathbb{R}^n$, $b\in\mathbb{R}^p$, and $A$ is a $p\times n$ matrix. Write this problem as
\[
\text{minimize } \langle c,x\rangle\ \text{ subject to } -(Ax-b)\preceq 0,
\]
for which the Lagrange function is
\[
L(x,\lambda)=\langle c,x\rangle-\langle\lambda,Ax-b\rangle=\langle c-A^T\lambda,x\rangle+\langle\lambda,b\rangle
\]
whenever $x\in\mathbb{R}^n$ and $\lambda\in\mathbb{R}^p_+$. The dual Lagrangian is now given by
\[
L^*(\lambda)=\inf_{x\in\mathbb{R}^n}L(x,\lambda)=\begin{cases}\langle b,\lambda\rangle & \text{if } A^T\lambda=c,\\ -\infty & \text{otherwise}.\end{cases}
\]
Thus the Lagrangian dual problem is represented as
\[
\text{maximize } \langle b,\lambda\rangle\ \text{ subject to } A^T\lambda=c,\ \lambda\succeq 0,
\]
which is the canonical form for applying the simplex algorithm.
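The pair of problems in Example 7.37 can be checked numerically with any LP solver. The sketch below uses scipy.optimize.linprog on randomly generated data chosen so that both problems are feasible and bounded; the data-generation recipe and variable names are assumptions made only for this illustration.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
p, n = 3, 5
A = rng.standard_normal((p, n))
x_feas = rng.random(n)
b = A @ x_feas - rng.random(p)          # ensures A x_feas >= b (primal feasible)
lam_feas = rng.random(p)
c = A.T @ lam_feas                      # ensures the dual is feasible as well

# Primal: minimize <c, x> subject to A x >= b  (rewritten as -A x <= -b).
primal = linprog(c, A_ub=-A, b_ub=-b, bounds=[(None, None)] * n, method="highs")

# Dual: maximize <b, lam> subject to A^T lam = c, lam >= 0.
dual = linprog(-b, A_eq=A.T, b_eq=c, bounds=[(0, None)] * p, method="highs")

print("primal value:", primal.fun)
print("dual value:  ", -dual.fun)       # sign flip because linprog minimizes
```

With both problems feasible, the two printed values agree, in line with the Lagrangian strong duality discussed above.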

7.5 Subgradient Methods in Nonsmooth Convex Optimization

This section is devoted to algorithmic aspects of convex optimization and presents two versions of the classical subgradient method: one for unconstrained and the other for constrained optimization problems. First we study the unconstrained problem formulated as
\[
\text{minimize } f(x)\ \text{ subject to } x\in\mathbb{R}^n, \tag{7.38}
\]
where $f\colon\mathbb{R}^n\to\mathbb{R}$ is a real-valued convex function. Let $\{\alpha_k\}$ be a sequence of positive numbers. The subgradient algorithm generated by $\{\alpha_k\}$ is designed as follows. Given a starting point $x_1\in\mathbb{R}^n$, consider the iterative procedure constructed by
\[
x_{k+1}=x_k-\alpha_k v_k\ \text{ with some } v_k\in\partial f(x_k),\quad k\in\mathbb{N}. \tag{7.39}
\]

The standing assumptions of this section for (7.38) are the following: this problem admits an optimal solution, and the cost function $f$ is convex and Lipschitz continuous on $\mathbb{R}^n$. Our main goal is to find conditions that ensure the convergence of iterations constructed by the subgradient algorithm (7.39) and its counterpart for problems of constrained optimization. We split the proof of the main result in Theorem 7.44 into several propositions, which are of their own interest. The first one gives us an important subgradient property of convex Lipschitzian functions.

Proposition 7.38 Let $\ell\ge 0$ be a Lipschitz constant of $f$ on $\mathbb{R}^n$. For any $x\in\mathbb{R}^n$, we have the subgradient estimate $\|v\|\le\ell$ whenever $v\in\partial f(x)$.

Proof Fix $x\in\mathbb{R}^n$ and take any subgradient $v\in\partial f(x)$. It follows from the subgradient definition that
\[
\langle v,u-x\rangle\le f(u)-f(x)\ \text{ for all } u\in\mathbb{R}^n.
\]
Combining this with the Lipschitz property of $f$ gives us
\[
\langle v,u-x\rangle\le\ell\|u-x\|\ \text{ for all } u\in\mathbb{R}^n,
\]


which yields in turn that $\langle v,y+x-x\rangle\le\ell\|y+x-x\|$, i.e., $\langle v,y\rangle\le\ell\|y\|$ for all $y\in\mathbb{R}^n$. Using $y=v$, we arrive at the claimed conclusion $\|v\|\le\ell$. $\square$



The second proposition provides a relation for the subsequent iterations in (7.39). Its verification is based on the Euclidean norm properties.

Proposition 7.39 Let $\ell\ge 0$ be a Lipschitz constant of $f$ on $\mathbb{R}^n$. For the sequence $\{x_k\}$ generated by algorithm (7.39), we have
\[
\|x_{k+1}-x\|^2\le\|x_k-x\|^2-2\alpha_k\big(f(x_k)-f(x)\big)+\alpha_k^2\ell^2 \tag{7.40}
\]
for all $x\in\mathbb{R}^n$ and $k\in\mathbb{N}$.

Proof Fix $x\in\mathbb{R}^n$, $k\in\mathbb{N}$ and get from (7.39) that
\[
\|x_{k+1}-x\|^2=\|x_k-\alpha_k v_k-x\|^2=\|x_k-x-\alpha_k v_k\|^2=\|x_k-x\|^2-2\alpha_k\langle v_k,x_k-x\rangle+\alpha_k^2\|v_k\|^2.
\]
Since $v_k\in\partial f(x_k)$ and $\|v_k\|\le\ell$ by Proposition 7.38, it follows that
\[
\langle v_k,x-x_k\rangle\le f(x)-f(x_k).
\]
This verifies (7.40) and completes the proof of the proposition. $\square$

The next proposition gives us an estimate of the closeness of iterations in (7.39) to the optimal value of (7.38). Denote
\[
V=\min\{f(x)\mid x\in\mathbb{R}^n\}\quad\text{and}\quad V_k=\min\{f(x_1),\dots,f(x_k)\}. \tag{7.41}
\]

Proposition 7.40 Let $\ell\ge 0$ be a Lipschitz constant of $f$ on $\mathbb{R}^n$, and let $S$ be the set of optimal solutions to (7.38). For all $k\in\mathbb{N}$, we have
\[
0\le V_k-V\le\frac{d(x_1;S)^2+\ell^2\sum_{i=1}^{k}\alpha_i^2}{2\sum_{i=1}^{k}\alpha_i}. \tag{7.42}
\]


Proof Substituting any $\bar{x}\in S$ into estimate (7.40) tells us that
\[
\|x_{k+1}-\bar{x}\|^2\le\|x_k-\bar{x}\|^2-2\alpha_k\big(f(x_k)-f(\bar{x})\big)+\alpha_k^2\ell^2.
\]
Since $V_k\le f(x_i)$ for all $i=1,\dots,k$ and $V=f(\bar{x})$, we get
\[
\|x_{k+1}-\bar{x}\|^2\le\|x_1-\bar{x}\|^2-2\sum_{i=1}^{k}\alpha_i\big(f(x_i)-f(\bar{x})\big)+\ell^2\sum_{i=1}^{k}\alpha_i^2
\le\|x_1-\bar{x}\|^2-2\sum_{i=1}^{k}\alpha_i\,(V_k-V)+\ell^2\sum_{i=1}^{k}\alpha_i^2.
\]
It follows from $\|x_{k+1}-\bar{x}\|^2\ge 0$ that
\[
2\sum_{i=1}^{k}\alpha_i\,(V_k-V)\le\|x_1-\bar{x}\|^2+\ell^2\sum_{i=1}^{k}\alpha_i^2.
\]
Since $V_k\ge V$ for every $k$, we deduce the estimate
\[
0\le V_k-V\le\frac{\|x_1-\bar{x}\|^2+\ell^2\sum_{i=1}^{k}\alpha_i^2}{2\sum_{i=1}^{k}\alpha_i}
\]
and hence arrive at (7.42) since $\bar{x}\in S$ is arbitrary. $\square$



As a consequence of the latter proposition, we present now an appropriate error estimate for the optimal value in the case of constant step sizes.

Corollary 7.41 Suppose that $\alpha_k=\alpha$ for all $k\in\mathbb{N}$ in the setting of Proposition 7.40. Then there exists a positive constant $C$ such that
\[
0\le V_k-V<\ell^2\alpha\ \text{ whenever } k>C/\alpha^2. \tag{7.43}
\]

Proof It follows from (7.42) that
\[
0\le V_k-V\le\frac{a+k\alpha^2\ell^2}{2k\alpha}=\frac{a}{2k\alpha}+\frac{\alpha\ell^2}{2}\quad\text{with } a=d(x_1;S)^2.
\]
Since $a/(2k\alpha)+\alpha\ell^2/2<\ell^2\alpha$ as $k>a/(\ell^2\alpha^2)$, we get (7.43) with $C=a/\ell^2$. $\square$




The following proposition reveals conditions on the sequence $\{\alpha_k\}$ in (7.39) ensuring the convergence of the sequence $\{V_k\}$ in (7.41) to the optimal value $V$ of the optimization problem (7.38).

Proposition 7.42 In the setting of Proposition 7.40, suppose in addition that
\[
\sum_{k=1}^{\infty}\alpha_k=\infty\quad\text{and}\quad\lim_{k\to\infty}\alpha_k=0. \tag{7.44}
\]
Then we have the convergence $V_k\to V$ as $k\to\infty$.

Proof Given any $\varepsilon>0$, choose $K\in\mathbb{N}$ such that $\alpha_i\ell^2<\varepsilon$ for all $i\ge K$. For every fixed $k>K$, using (7.42) ensures the fulfillment of the estimates
\[
0\le V_k-V\le\frac{d(x_1;S)^2+\ell^2\sum_{i=1}^{k}\alpha_i^2}{2\sum_{i=1}^{k}\alpha_i}
\le\frac{d(x_1;S)^2+\ell^2\sum_{i=1}^{K}\alpha_i^2}{2\sum_{i=1}^{k}\alpha_i}+\frac{\ell^2\sum_{i=K}^{k}\alpha_i^2}{2\sum_{i=1}^{k}\alpha_i}
\le\frac{d(x_1;S)^2+\ell^2\sum_{i=1}^{K}\alpha_i^2}{2\sum_{i=1}^{k}\alpha_i}+\frac{\varepsilon\sum_{i=K}^{k}\alpha_i}{2\sum_{i=1}^{k}\alpha_i}
\le\frac{d(x_1;S)^2+\ell^2\sum_{i=1}^{K}\alpha_i^2}{2\sum_{i=1}^{k}\alpha_i}+\frac{\varepsilon}{2}.
\]
The imposed assumption $\sum_{k=1}^{\infty}\alpha_k=\infty$ yields $\lim_{k\to\infty}\sum_{i=1}^{k}\alpha_i=\infty$, and so
\[
0\le\liminf_{k\to\infty}(V_k-V)\le\limsup_{k\to\infty}(V_k-V)\le\varepsilon/2.
\]
The arbitrary choice of $\varepsilon>0$ ensures that $\lim_{k\to\infty}(V_k-V)=0$, and thus the claimed convergence $V_k\to V$ as $k\to\infty$ is verified. $\square$

The next result gives us important information about the value convergence of the subgradient algorithm (7.39).

Proposition 7.43 In the setting of Proposition 7.42, we have $\displaystyle\liminf_{k\to\infty}f(x_k)=V$.


Proof Since $V$ is the optimal value in (7.38), it follows that $f(x_k)\ge V$ for any $k\in\mathbb{N}$. Therefore, we have
\[
\liminf_{k\to\infty}f(x_k)\ge V.
\]
To prove the reverse inequality, suppose on the contrary that
\[
\liminf_{k\to\infty}f(x_k)>V
\]
and then find a small number $\varepsilon>0$ such that
\[
\liminf_{k\to\infty}f(x_k)-2\varepsilon>V=\min_{x\in\mathbb{R}^n}f(x)=f(\bar{x}),
\]
where $\bar{x}$ is an optimal solution to problem (7.38). This tells us that
\[
\liminf_{k\to\infty}f(x_k)-\varepsilon>f(\bar{x})+\varepsilon.
\]
In the case where $\liminf_{k\to\infty}f(x_k)<\infty$, it follows from the definition of the limit inferior that there exists $k_0\in\mathbb{N}$ such that for every natural number $k\ge k_0$ we have the lower estimate
\[
f(x_k)>\liminf_{k\to\infty}f(x_k)-\varepsilon,
\]
and hence $f(x_k)-f(\bar{x})>\varepsilon$ as $k\ge k_0$. This conclusion also holds if $\liminf_{k\to\infty}f(x_k)=\infty$. We deduce from (7.40) with $x=\bar{x}$ that
\[
\|x_{k+1}-\bar{x}\|^2\le\|x_k-\bar{x}\|^2-2\alpha_k\big(f(x_k)-f(\bar{x})\big)+\alpha_k^2\ell^2 \tag{7.45}
\]
for each $k\ge k_0$. Let $k_1$ be such that $\alpha_k\ell^2<\varepsilon$ for all $k\ge k_1$, and let $\bar{k}_1=\max\{k_1,k_0\}$. For any $k>\bar{k}_1$, it easily follows from (7.45) that
\[
\|x_{k+1}-\bar{x}\|^2\le\|x_k-\bar{x}\|^2-\alpha_k\varepsilon\le\cdots\le\|x_{\bar{k}_1}-\bar{x}\|^2-\varepsilon\sum_{j=\bar{k}_1}^{k}\alpha_j,
\]
which contradicts the assumption $\sum_{k=1}^{\infty}\alpha_k=\infty$ in (7.44) and therefore completes the proof. $\square$

Using the results established above, we are now ready to justify the convergence of the subgradient algorithm (7.39).

Theorem 7.44 Suppose that the generating sequence $\{\alpha_k\}$ in (7.39) satisfies
\[
\sum_{k=1}^{\infty}\alpha_k=\infty\quad\text{and}\quad\sum_{k=1}^{\infty}\alpha_k^2<\infty. \tag{7.46}
\]


Then the sequence $\{V_k\}$ converges to the optimal value $V$, and the sequence of iterates $\{x_k\}$ in (7.39) converges to some optimal solution $\bar{x}$ of (7.38).

Proof Since (7.46) yields (7.44), the sequence $\{V_k\}$ converges to $V$ by Proposition 7.42. It remains to verify that $\{x_k\}$ converges to some optimal solution $\bar{x}$ to problem (7.38). Let $\ell\ge 0$ be a Lipschitz constant of $f$ on $\mathbb{R}^n$. Since the set $S$ of optimal solutions to (7.38) is assumed to be nonempty, we pick any $\widehat{x}\in S$ and plug $x=\widehat{x}$ into estimate (7.40) of Proposition 7.39. Thus
\[
\|x_{k+1}-\widehat{x}\|^2\le\|x_k-\widehat{x}\|^2-2\alpha_k\big(f(x_k)-V\big)+\alpha_k^2\ell^2\ \text{ for all } k\in\mathbb{N}. \tag{7.47}
\]
Since $f(x_k)-V\ge 0$, estimate (7.47) ensures that
\[
\|x_{k+1}-\widehat{x}\|^2\le\|x_k-\widehat{x}\|^2+\alpha_k^2\ell^2.
\]
Then we get by induction that
\[
\|x_{k+1}-\widehat{x}\|^2\le\|x_1-\widehat{x}\|^2+\ell^2\sum_{i=1}^{k}\alpha_i^2. \tag{7.48}
\]
The right-hand side of (7.48) is finite by $\sum_{k=1}^{\infty}\alpha_k^2<\infty$, and hence
\[
\sup_{k\in\mathbb{N}}\|x_{k+1}-\widehat{x}\|<\infty,
\]
i.e., the sequence $\{x_k\}$ is bounded. Furthermore, Proposition 7.43 tells us that
\[
\liminf_{k\to\infty}f(x_k)=V.
\]
Take now a subsequence $\{x_{k_j}\}$ of $\{x_k\}$ such that
\[
\lim_{j\to\infty}f(x_{k_j})=V. \tag{7.49}
\]
Then $\{x_{k_j}\}$ is also bounded and contains a cluster point $\bar{x}$. Without loss of generality, assume that $x_{k_j}\to\bar{x}$ as $j\to\infty$. We directly deduce from (7.49) and the continuity of $f$ that $\bar{x}$ is an optimal solution to (7.38). Fix any $j\in\mathbb{N}$ and any $k\ge k_j$. Thus we get from (7.47) with $\widehat{x}=\bar{x}$ that
\[
\|x_{k+1}-\bar{x}\|^2\le\|x_k-\bar{x}\|^2+\alpha_k^2\ell^2\le\cdots\le\|x_{k_j}-\bar{x}\|^2+\ell^2\sum_{i=k_j}^{k}\alpha_i^2.
\]


Taking first the limit as $k\to\infty$ and then the limit as $j\to\infty$ in the resulting inequality above gives us the estimate
\[
\limsup_{k\to\infty}\|x_{k+1}-\bar{x}\|^2\le\lim_{j\to\infty}\|x_{k_j}-\bar{x}\|^2+\ell^2\lim_{j\to\infty}\sum_{i=k_j}^{\infty}\alpha_i^2.
\]
Since $x_{k_j}\to\bar{x}$ and $\sum_{k=1}^{\infty}\alpha_k^2<\infty$, this implies that
\[
\limsup_{k\to\infty}\|x_{k+1}-\bar{x}\|^2=0,
\]
and thus justifies the claimed convergence $x_k\to\bar{x}$ as $k\to\infty$. $\square$
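As a hedged illustration of Theorem 7.44 (a sketch, not an implementation prescribed by the text), the following Python fragment runs iteration (7.39) with the step sizes $\alpha_k = 1/k$, which satisfy (7.46); the test objective, the point data, and the helper names are assumptions chosen only for this example.

```python
import numpy as np

def subgradient_method(f, subgrad, x1, steps):
    """Iteration (7.39): x_{k+1} = x_k - alpha_k * v_k with v_k in the subdifferential."""
    x, best_val, best_x = x1.astype(float), np.inf, x1.astype(float)
    for alpha in steps:
        val = f(x)
        if val < best_val:              # track V_k = min_{i <= k} f(x_i)
            best_val, best_x = val, x.copy()
        x = x - alpha * subgrad(x)      # the method is not a descent method
    return best_x, best_val

# Illustrative objective (an assumption): f(x) = sum_i ||x - a_i||, the
# Fermat-Torricelli function revisited in Chapter 8.
a = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])
f = lambda x: sum(np.linalg.norm(x - ai) for ai in a)

def subgrad(x):
    g = np.zeros_like(x)
    for ai in a:
        r = x - ai
        nrm = np.linalg.norm(r)
        if nrm > 0:                     # at x = a_i the zero vector is a valid choice
            g += r / nrm
    return g

steps = [1.0 / (k + 1) for k in range(5000)]   # satisfies (7.46)
x_best, v_best = subgradient_method(f, subgrad, np.array([10.0, 10.0]), steps)
print(x_best, v_best)
```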



Finally in this section, we present a version of the subgradient algorithm for convex problems of constrained optimization given by
\[
\text{minimize } f(x)\ \text{ subject to } x\in\Omega, \tag{7.50}
\]
where $f\colon\mathbb{R}^n\to\mathbb{R}$ is a convex function that is Lipschitz continuous on $\mathbb{R}^n$ with some constant $\ell\ge 0$, while, in contrast to (7.38), we now have the constraint on $x$ given by a nonempty closed convex set $\Omega\subset\mathbb{R}^n$. The following counterpart of the subgradient algorithm (7.39) for constrained problems is known as the projected subgradient method. Given a sequence of positive numbers $\{\alpha_k\}$ and a starting point $x_1\in\Omega$, the sequence of iterates $\{x_k\}$ is constructed by
\[
x_{k+1}=\Pi(x_k-\alpha_k v_k;\Omega)\ \text{ with some } v_k\in\partial f(x_k),\quad k\in\mathbb{N}, \tag{7.51}
\]
where $\Pi(x;\Omega)$ stands for the (unique) Euclidean projection of $x$ onto $\Omega$. The next result, which is a counterpart of Proposition 7.39 for algorithm (7.51), plays a major role in justifying the convergence of iterates in the projected subgradient method.

Proposition 7.45 In the general setting of (7.51), we have the estimate
\[
\|x_{k+1}-x\|^2\le\|x_k-x\|^2-2\alpha_k\big(f(x_k)-f(x)\big)+\alpha_k^2\ell^2\ \text{ for all } x\in\Omega,\ k\in\mathbb{N}.
\]

Proof Define $z_{k+1}=x_k-\alpha_k v_k$ for $k\in\mathbb{N}$. Then fix any $x\in\Omega$ and any $k\in\mathbb{N}$. Employing Proposition 7.39, we have the estimate
\[
\|z_{k+1}-x\|^2\le\|x_k-x\|^2-2\alpha_k\big(f(x_k)-f(x)\big)+\alpha_k^2\ell^2. \tag{7.52}
\]
Projecting now $z_{k+1}$ onto $\Omega$ and using Proposition 1.70 lead us to
\[
\|x_{k+1}-x\|=\|\Pi(z_{k+1};\Omega)-\Pi(x;\Omega)\|\le\|z_{k+1}-x\|.
\]


Combining this with estimate (7.52), we arrive at
\[
\|x_{k+1}-x\|^2\le\|x_k-x\|^2-2\alpha_k\big(f(x_k)-f(x)\big)+\alpha_k^2\ell^2
\]
and thus complete the proof of the proposition. $\square$



Based on the obtained proposition and proceeding in the same way as for the subgradient algorithm (7.39) above, we can derive similar convergence results for the projected subgradient algorithm under the assumptions (7.44) on the generating sequence $\{\alpha_k\}$ in (7.51).
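A minimal sketch of the projected subgradient iteration (7.51), again with $\alpha_k = 1/k$, might look as follows; the $\ell_1$ objective, the unit-ball constraint, and the function names are assumptions introduced only to make the example self-contained.

```python
import numpy as np

def projected_subgradient(subgrad, project, x1, steps):
    """Iteration (7.51): x_{k+1} = Pi(x_k - alpha_k * v_k; Omega)."""
    x = project(x1.astype(float))
    for alpha in steps:
        x = project(x - alpha * subgrad(x))
    return x

# Illustrative instance (an assumption): minimize f(x) = ||x - c||_1 over the
# Euclidean unit ball Omega = {x : ||x|| <= 1}, whose projection is explicit.
c = np.array([2.0, -1.0])
subgrad = lambda x: np.sign(x - c)                   # a subgradient of the l1 term
project = lambda x: x / max(1.0, np.linalg.norm(x))  # projection onto the unit ball

steps = [1.0 / (k + 1) for k in range(2000)]
print(projected_subgradient(subgrad, project, np.array([0.0, 0.0]), steps))
```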

7.6 Nash Equilibrium via Convex Analysis

This section gives us a brief introduction to noncooperative game theory and presents a simple proof of the existence of Nash equilibrium as a consequence of convexity and Brouwer's fixed point theorem. For simplicity, we only consider two-person games, while more general games can be treated similarly. Note that some equilibrium issues were addressed in Sect. 7.4 on the Lagrangian duality, which was treated via saddle points, i.e., a Pareto-type equilibrium. Such an approach can also be implemented for games. The notion of Nash equilibrium is very different from the Pareto one, while we employ basic tools of convex analysis and optimization for the study of both of them. Let us start with the definition of a two-person noncooperative game.

Definition 7.46 Let $\Omega_1$ and $\Omega_2$ be nonempty subsets of the spaces $\mathbb{R}^n$ and $\mathbb{R}^p$, respectively. A noncooperative game in the case of two players I and II consists of two strategy sets $\Omega_i$ and two real-valued functions $u_i\colon\Omega_1\times\Omega_2\to\mathbb{R}$ for $i=1,2$ called the payoff functions. Then we refer to this game as $\{\Omega_i,u_i\}$, $i=1,2$, in what follows.

The key notion of Nash equilibrium was introduced by John Forbes Nash, Jr., in 1950, who proved the existence of such an equilibrium and was awarded the Nobel Memorial Prize in Economics in 1994.

Definition 7.47 Given a noncooperative two-person game $\{\Omega_i,u_i\}$, $i=1,2$, a Nash equilibrium is a pair $(\bar{x}_1,\bar{x}_2)\in\Omega_1\times\Omega_2$ satisfying the conditions
\[
u_1(x_1,\bar{x}_2)\le u_1(\bar{x}_1,\bar{x}_2)\ \text{ for all } x_1\in\Omega_1,\qquad
u_2(\bar{x}_1,x_2)\le u_2(\bar{x}_1,\bar{x}_2)\ \text{ for all } x_2\in\Omega_2.
\]
The conditions formulated in Definition 7.47 mean that $(\bar{x}_1,\bar{x}_2)$ is a pair of best strategies that both players agree with in the sense that $\bar{x}_1$ is a best response of Player I when Player II


chooses the strategy $\bar{x}_2$, and $\bar{x}_2$ is a best response of Player II when Player I chooses the strategy $\bar{x}_1$. Let us now consider two examples to illustrate this concept.

Example 7.48 In a two-person game, suppose that each player can only choose either strategy A or B. If both players choose strategy A, they both get 4 points. If Player I chooses strategy A and Player II chooses strategy B, then Player I gets 1 point and Player II gets 3 points. If Player I chooses strategy B and Player II chooses strategy A, then Player I gets 3 points and Player II gets 1 point. If both players choose strategy B, each player gets 2 points. The payoff function of each player is represented in the payoff matrix:

                         Player II
                         A          B
    Player I     A       4, 4       1, 3
                 B       3, 1       2, 2

In this example, a Nash equilibrium occurs when both players choose strategy A, and it also occurs when both players choose strategy B. Let us consider, for instance, the case where both players choose strategy B. In this case, given that Player II chooses strategy B, Player I also wants to keep strategy B because a change of strategy would lead to a reduction of the payoff from 2 to 1. Similarly, given that Player I chooses strategy B, Player II wants to keep strategy B because a change of strategy would also lead to a reduction of the payoff from 2 to 1.

Example 7.49 Here we consider another simple two-person game called matching pennies. Suppose that Player I and Player II each has a penny. Each player must secretly turn the penny to heads or tails and then reveal their choices simultaneously. Player I wins the game and gets Player II's penny if both coins show the same face (heads-heads or tails-tails). In the other case, where the coins show different faces (heads-tails or tails-heads), Player II wins the game and gets Player I's penny. The payoff function of each player is represented in the payoff matrix below:

                         Player II
                         Heads       Tails
    Player I     Heads   1, −1       −1, 1
                 Tails   −1, 1       1, −1

In this game, it is not hard to see that no Nash equilibrium exists.


Denote by $A$ the matrix that represents the payoff function of Player I, and denote by $B$ the matrix that represents the payoff function of Player II:
\[
A=\begin{bmatrix} 1 & -1\\ -1 & 1 \end{bmatrix}\quad\text{and}\quad B=\begin{bmatrix} -1 & 1\\ 1 & -1 \end{bmatrix}.
\]

Now suppose that Player I is playing with a mind reader who knows Player I's choice of faces. If Player I decides to turn heads, then Player II knows about it and chooses to turn tails and thus wins the game. In the case where Player I decides to turn tails, Player II chooses to turn heads and again wins the game. Thus, to have a fair game, Player I decides to randomize the strategy by, e.g., tossing the coin instead of putting the coin down. We describe the new game in the following table:

                             Player II
                             Heads, q1       Tails, q2
    Player I     Heads, p1   1, −1           −1, 1
                 Tails, p2   −1, 1           1, −1

In this new game, Player I uses a coin randomly with probability of coming up heads $p_1$ and probability of coming up tails $p_2$, where $p_1+p_2=1$. Similarly, Player II uses another coin randomly with probability of coming up heads $q_1$ and probability of coming up tails $q_2$, where $q_1+q_2=1$. The new strategies are called mixed strategies, while the original ones are called pure strategies. Suppose that Player II uses the mixed strategy $\{q_1,q_2\}$. Then Player I's expected payoff for playing the pure strategy heads is
\[
u_H(q_1,q_2)=q_1-q_2=2q_1-1.
\]
Similarly, Player I's expected payoff for playing the pure strategy tails is
\[
u_T(q_1,q_2)=-q_1+q_2=-2q_1+1.
\]
Thus Player I's expected payoff for playing the mixed strategy $\{p_1,p_2\}$ is
\[
u_1(p,q)=p_1 u_H(q_1,q_2)+p_2 u_T(q_1,q_2)=p_1(q_1-q_2)+p_2(-q_1+q_2)=p^TAq,
\]
where $p=[p_1,p_2]^T$ and $q=[q_1,q_2]^T$. By the same arguments, if Player I chooses the mixed strategy $\{p_1,p_2\}$, then Player II's expected payoff for playing the mixed strategy $\{q_1,q_2\}$ is
\[
u_2(p,q)=p^TBq=-u_1(p,q).
\]


For this new game, an element $(\bar{p},\bar{q})\in\Delta\times\Delta$ is a Nash equilibrium if
\[
u_1(p,\bar{q})\le u_1(\bar{p},\bar{q})\ \text{ for all } p\in\Delta,\qquad
u_2(\bar{p},q)\le u_2(\bar{p},\bar{q})\ \text{ for all } q\in\Delta,
\]
where $\Delta=\{(p_1,p_2)\mid p_1\ge 0,\ p_2\ge 0,\ p_1+p_2=1\}$ is a nonempty, compact, and convex subset of $\mathbb{R}^2$.

Nash [24, 25] proved the existence of his equilibrium in the class of mixed strategies in a more general setting. His proof was based on Brouwer's fixed point theorem with rather involved arguments. Now we present, following Rockafellar [31], a much simpler proof of the Nash equilibrium theorem that also uses Brouwer's fixed point theorem while applying in addition just elementary tools of convex analysis and optimization.

Theorem 7.50 Consider a two-person game $\{\Omega_i,u_i\}$, $i=1,2$, where $\Omega_1\subset\mathbb{R}^n$ and $\Omega_2\subset\mathbb{R}^p$ are nonempty compact convex sets. Let the payoff functions $u_i\colon\Omega_1\times\Omega_2\to\mathbb{R}$ be given by
\[
u_1(p,q)=p^TAq\quad\text{and}\quad u_2(p,q)=p^TBq,
\]
where $A$ and $B$ are $n\times p$ matrices. Then this game admits a Nash equilibrium.

Proof It follows from the definition that a pair $(\bar{p},\bar{q})\in\Omega_1\times\Omega_2$ is a Nash equilibrium of the game under consideration if and only if
\[
-u_1(p,\bar{q})\ge-u_1(\bar{p},\bar{q})\ \text{ for all } p\in\Omega_1,\qquad
-u_2(\bar{p},q)\ge-u_2(\bar{p},\bar{q})\ \text{ for all } q\in\Omega_2.
\]
It is a simple exercise to verify (see also the elementary convex optimization result of Corollary 7.16) that the above inequalities hold if and only if we have the normal cone inclusions
\[
\nabla_p u_1(\bar{p},\bar{q})\in N(\bar{p};\Omega_1),\qquad \nabla_q u_2(\bar{p},\bar{q})\in N(\bar{q};\Omega_2). \tag{7.53}
\]
By using the structures of $u_i$ and defining $\Omega=\Omega_1\times\Omega_2$, conditions (7.53) can be expressed in one of the following two equivalent ways:
\[
A\bar{q}\in N(\bar{p};\Omega_1)\ \text{ and }\ B^T\bar{p}\in N(\bar{q};\Omega_2),
\]
\[
(A\bar{q},B^T\bar{p})\in N(\bar{p};\Omega_1)\times N(\bar{q};\Omega_2)=N((\bar{p},\bar{q});\Omega_1\times\Omega_2)=N((\bar{p},\bar{q});\Omega).
\]
By Proposition 1.69 on the Euclidean projection, this is equivalent also to
\[
(\bar{p},\bar{q})=\Pi\big((\bar{p},\bar{q})+(A\bar{q},B^T\bar{p});\Omega\big). \tag{7.54}
\]


Defining now the mapping $F\colon\Omega\to\Omega$ by
\[
F(p,q)=\Pi\big((p,q)+(Aq,B^Tp);\Omega\big)\quad\text{with } \Omega=\Omega_1\times\Omega_2
\]
and employing another elementary projection property from Proposition 1.70 allow us to conclude that the mapping $F$ is continuous and the set $\Omega$ is nonempty, compact, and convex. By the classical Brouwer fixed point theorem (see, e.g., [39]), the mapping $F$ has a fixed point $(\bar{p},\bar{q})\in\Omega$, which satisfies (7.54), and thus it is a Nash equilibrium of the game. $\square$
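To connect Theorem 7.50 with the matching pennies game above, the following sketch checks the equilibrium inequalities of Definition 7.47 numerically for the candidate mixed strategies $\bar{p}=\bar{q}=(1/2,1/2)$; treating this pair as the equilibrium is an assumption of the illustration (Exercise 7.27 asks for its derivation).

```python
import numpy as np

# Matching pennies in mixed strategies: payoff matrices from the example.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
B = -A
p_bar = np.array([0.5, 0.5])      # candidate equilibrium strategies (assumed here)
q_bar = np.array([0.5, 0.5])

def u1(p, q):                     # Player I's expected payoff p^T A q
    return p @ A @ q

def u2(p, q):                     # Player II's expected payoff p^T B q
    return p @ B @ q

# Check the Nash inequalities over a grid of mixed strategies (t, 1 - t).
ts = np.linspace(0.0, 1.0, 101)
mixed = np.stack([ts, 1.0 - ts], axis=1)
ok1 = all(u1(p, q_bar) <= u1(p_bar, q_bar) + 1e-12 for p in mixed)
ok2 = all(u2(p_bar, q) <= u2(p_bar, q_bar) + 1e-12 for q in mixed)
print(ok1, ok2)   # both True: no unilateral deviation improves the payoff
```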

7.7 Exercises for This Chapter

Exercise 7.1 Given a function $f\colon\mathbb{R}^n\to\overline{\mathbb{R}}$, show that its lower semicontinuity is equivalent to the openness of the set
\[
U_\lambda^>(f)=\{x\in\mathbb{R}^n\mid f(x)>\lambda\}
\]
for every real number $\lambda$.

Exercise 7.2 Given a lower semicontinuous function $f\colon\mathbb{R}^n\to\overline{\mathbb{R}}$, show that its multiplication $(\alpha f)(x):=\alpha f(x)$ by any scalar $\alpha>0$ is lower semicontinuous as well.

Exercise 7.3 Let $A\in\mathbb{R}^{p\times n}$, let $b\in\mathbb{R}^p$, and let $f\colon\mathbb{R}^p\to\overline{\mathbb{R}}$ be an l.s.c. function. Prove that the function $g\colon\mathbb{R}^n\to\overline{\mathbb{R}}$ defined by $g(x)=f(Ax+b)$, $x\in\mathbb{R}^n$, is also lower semicontinuous on $\mathbb{R}^n$.

Exercise 7.4 Let $f\colon\mathbb{R}^n\to\mathbb{R}$ be a real-valued function, and let $g\colon\mathbb{R}\to\overline{\mathbb{R}}$ be an extended-real-valued function.
(i) Prove that if $f$ is convex and $g$ is l.s.c., then $g\circ f$ is l.s.c.
(ii) Is it true that if both $f$ and $g$ are l.s.c., then the composition $g\circ f$ is l.s.c. as well? Justify your answer.

Exercise 7.5 Construct an example of two lower semicontinuous real-valued functions whose product is not lower semicontinuous.

Exercise 7.6 Verify the equivalence between (iii) and (iv) in Theorem 7.11.

Exercise 7.7 Let $f\colon\mathbb{R}^n\to[-\infty,\infty)$ be a function. Recall that $f$ is upper semicontinuous (u.s.c.) at $\bar{x}\in\mathbb{R}^n$ if $-f\colon\mathbb{R}^n\to(-\infty,\infty]$ is lower semicontinuous at this point. Prove the following assertions:


(i) $f$ is u.s.c. at $\bar{x}$ if and only if $\limsup_{k\to\infty}f(x_k)\le f(\bar{x})$ for every sequence $x_k\to\bar{x}$.
(ii) $f$ is u.s.c. at $\bar{x}$ if and only if we have $\limsup_{x\to\bar{x}}f(x)\le f(\bar{x})$.
(iii) $f$ is u.s.c. on $\mathbb{R}^n$ if and only if the set $U_\lambda(f)=\{x\in\mathbb{R}^n\mid f(x)\ge\lambda\}$ is closed for every real number $\lambda$.
(iv) $f$ is u.s.c. on $\mathbb{R}^n$ if and only if its hypographical set (or simply hypograph) $\operatorname{hypo}f:=\{(x,\lambda)\in\mathbb{R}^n\times\mathbb{R}\mid\lambda\le f(x)\}$ is closed.

Exercise 7.8 Show that a real-valued function $f\colon\mathbb{R}^n\to\mathbb{R}$ is continuous at $\bar{x}\in\mathbb{R}^n$ if and only if it is both lower semicontinuous and upper semicontinuous at this point.

Exercise 7.9 Use assertion (ii) in Theorem 7.9 to prove assertion (i) therein.

Exercise 7.10 Show that each function below attains its absolute minimum on $\mathbb{R}^n$:
(i) $f(x)=\sum_{i=1}^{m}\|x-a_i\|$, where $a_1,\dots,a_m$ are $m$ given points in $\mathbb{R}^n$.
(ii) $g(x)=\max_{i=1,\dots,m}\|x-a_i\|$, where $a_1,\dots,a_m$ are $m$ given points in $\mathbb{R}^n$.
(iii) $h(x)=\|Ax-b\|^2+\lambda\|x\|$, where $A\in\mathbb{R}^{p\times n}$, $b\in\mathbb{R}^p$, and $\lambda>0$.

Exercise 7.11 Let $f\colon\mathbb{R}^n\to\overline{\mathbb{R}}$ be an l.s.c. function. Prove that if there exists $\lambda\in\mathbb{R}$ such that the set $\mathcal{L}_\lambda(f)=\{x\in\mathbb{R}^n\mid f(x)\le\lambda\}$ is nonempty and bounded, then $f$ attains its absolute minimum on $\mathbb{R}^n$.

Exercise 7.12 Give an example of a convex function $f\colon\mathbb{R}\to\mathbb{R}$ that is bounded from below but does not attain its absolute minimum on $\mathbb{R}$.

Exercise 7.13 Consider the real-valued function
\[
f(x)=\sum_{i=1}^{m}\|x-a_i\|^2,\quad x\in\mathbb{R}^n,
\]
where $a_i\in\mathbb{R}^n$ for $i=1,\dots,m$. Show that $f$ is a convex function and attains its unique global minimum at the mean point $\bar{x}=(a_1+\dots+a_m)/m$.

Exercise 7.14 Consider the problem: minimize $\|x\|^2$ subject to $Ax=b$, where $A$ is a $p\times n$ matrix with $\operatorname{rank}A=p$, and where $b\in\mathbb{R}^p$. Find the unique optimal solution to this problem.

Exercise 7.15 Give a direct proof of Corollary 7.16 without using Theorem 7.15.


Exercise 7.16 Derive the result of Corollary 7.21 from Theorem 7.15(ii).

Exercise 7.17 Formulate and prove the weak duality result for the primal and Fenchel dual problems in Corollary 7.25.

Exercise 7.18 Consider the set
\[
\Omega=\{x\in\mathbb{R}^n\mid\langle a,x\rangle\le b\},
\]
where $0\ne a\in\mathbb{R}^n$ and $b\in\mathbb{R}$. Given a point $\bar{x}\in\mathbb{R}^n$, find the distance $d(\bar{x};\Omega)$ by using the Fenchel strong duality.

Exercise 7.19 Let $\Omega$ be a nonempty convex set in $\mathbb{R}^n$. Prove the following relationship between the distance and support functions associated with $\Omega$:
\[
d(x;\Omega)=\max_{v\in\mathbb{B}}\big(\langle v,x\rangle-\sigma_\Omega(v)\big),\quad x\in\mathbb{R}^n.
\]

Exercise 7.20 Establish relationships between Fenchel duality and Lagrangian duality for convex optimization problems with inequality constraints.

Exercise 7.21 Given $a_i\in\mathbb{R}^n$ for $i=1,\dots,m$, write MATLAB/PYTHON programs to implement the subgradient algorithm for minimizing the functions:
(i) $f(x)=\sum_{i=1}^{m}\|x-a_i\|$, $x\in\mathbb{R}^n$.
(ii) $f(x)=\sum_{i=1}^{m}\|x-a_i\|_1$, $x\in\mathbb{R}^n$.
(iii) $f(x)=\sum_{i=1}^{m}\|x-a_i\|_\infty$, $x\in\mathbb{R}^n$.

Exercise 7.22 Write a MATLAB/PYTHON program to implement the projected subgradient algorithm for solving the problem: minimize $\|x\|_1$ subject to $Ax=b$, where $A$ is a $p\times n$ matrix with $\operatorname{rank}A=p$, and where $b\in\mathbb{R}^p$.

Exercise 7.23 Consider the optimization problem (7.38) and the iterative procedure (7.39) under the standing assumptions imposed in Sect. 7.5. Suppose that the optimal value $V$ is known and that the sequence $\{\alpha_k\}$ is given by


\[
\alpha_k=\begin{cases} 0 & \text{if } 0\in\partial f(x_k),\\[2pt] \dfrac{f(x_k)-V}{\|v_k\|^2} & \text{otherwise}, \end{cases} \tag{7.55}
\]
where $v_k\in\partial f(x_k)$. Prove that:
(i) $f(x_k)\to V$ as $k\to\infty$.
(ii) $0\le V_k-V\le\dfrac{\ell\,d(x_1;S)}{\sqrt{k}}$ for all $k\in\mathbb{N}$.

Exercise 7.24 Let $\Omega_i$ for $i=1,\dots,m$ be nonempty closed convex subsets of $\mathbb{R}^n$ having a common point. Use the iterative procedure (7.39) for the function
\[
f(x)=\max\{d(x;\Omega_i)\mid i=1,\dots,m\},\quad x\in\mathbb{R}^n,
\]
with the sequence $\{\alpha_k\}$ given in (7.55) to derive an algorithm for finding a common point $\bar{x}\in\bigcap_{i=1}^{m}\Omega_i$.

Exercise 7.25 Formulate and prove a counterpart of Theorem 7.44 for the projected subgradient algorithm (7.51) to solve constrained optimization problems.

Exercise 7.26 Give a direct proof of characterizations (7.53) of Nash equilibrium.

Exercise 7.27 In Example 7.49 with mixed strategies, find $(\bar{p},\bar{q})$ that provides a Nash equilibrium in this game.

8 Applications to Location Problems

This chapter addresses the usage of convex analysis in the study of various facility location problems, which are highly challenging mathematically and the importance of which in numerous applications is difficult to overstate. The classical problems of this type go back to Fermat, Torricelli, Sylvester along with other mathematicians and applied scientists, but there are generalized versions of such problems that are also formulated and investigated in what follows from the viewpoint of convex analysis.

8.1 The Fermat-Torricelli Problem

In 1643, Pierre de Fermat proposed to Evangelista Torricelli the following optimization problem: Given three points in the plane, find another point such that the sum of its distances to the given points is minimal. This problem was solved by Torricelli and was named the Fermat-Torricelli problem. A more general version of the Fermat-Torricelli problem asks for a point that minimizes the sum of the distances to a finite number of given points in Rn . This is one of the main problems in facility location, which has been studied by many researchers. The first numerical algorithm for solving the general Fermat-Torricelli problem was introduced by Endre Weiszfeld in 1937. Assumptions that guarantee the convergence of the Weiszfeld algorithm were given by Harold Kuhn in 1972. Kuhn also pointed out an example in which the Weiszfeld algorithm fails to converge. Several new algorithms have been introduced recently to improve the Weiszfeld algorithm. The goal of this section is to revisit the Fermat-Torricelli problem from both theoretical and numerical viewpoints by using techniques of convex analysis and optimization.


Throughout this section, we consider $m$ distinct points $a_i\in\mathbb{R}^n$ for $i=1,\dots,m$ and define the convex function
\[
\varphi(x)=\sum_{i=1}^{m}\|x-a_i\|,\quad x\in\mathbb{R}^n. \tag{8.1}
\]
Then the mathematical description of the Fermat-Torricelli problem is
\[
\text{minimize } \varphi(x)\ \text{ subject to } x\in\mathbb{R}^n. \tag{8.2}
\]

We see that (8.2) is a convex unconstrained optimization problem with a nondifferentiable objective function. Our goal is to apply theoretical and algorithmic results of convex analysis and optimization developed in the previous chapters to investigating and solving the Fermat-Torricelli problem by taking into account its specific features. As conventional in optimization theory, let us start with the existence and uniqueness issues for optimal solutions. The first proposition addresses the existence of optimal solutions to the Fermat-Torricelli problem (8.2).

Proposition 8.1 The Fermat-Torricelli problem formulated in (8.2) admits at least one optimal solution.

Proof This follows directly from Theorem 7.9; see Exercise 7.10(i). $\square$



The next proposition reveals a simple condition ensuring the uniqueness of optimal solutions to (8.2).

Proposition 8.2 Let the points $a_i$ for $i=1,\dots,m$ be not collinear, i.e., they do not belong to the same line. Then the function $\varphi$ in (8.1) is strictly convex, and hence the Fermat-Torricelli problem has a unique optimal solution.

Proof Since each function $\varphi_i(x)=\|x-a_i\|$, $x\in\mathbb{R}^n$, for $i=1,\dots,m$ is obviously convex, their sum $\varphi=\sum_{i=1}^{m}\varphi_i$ is convex as well, i.e., for any $x,y\in\mathbb{R}^n$ and $\lambda\in(0,1)$ we have the Jensen inequality
\[
\varphi\big(\lambda x+(1-\lambda)y\big)\le\lambda\varphi(x)+(1-\lambda)\varphi(y). \tag{8.3}
\]
To verify its strict convexity, suppose the contrary and then find $x,y\in\mathbb{R}^n$ with $x\ne y$ and $\lambda\in(0,1)$ such that (8.3) holds as an equality. It follows that
\[
\varphi_i\big(\lambda x+(1-\lambda)y\big)=\lambda\varphi_i(x)+(1-\lambda)\varphi_i(y)\ \text{ for all } i=1,\dots,m,
\]


which ensures therefore that
\[
\|\lambda(x-a_i)+(1-\lambda)(y-a_i)\|=\lambda\|x-a_i\|+(1-\lambda)\|y-a_i\|\ \text{ for all } i=1,\dots,m.
\]
If $x\ne a_i$ and $y\ne a_i$, then for each $i\in\{1,\dots,m\}$ there exists $t_i>0$ such that
\[
t_i\lambda(x-a_i)=(1-\lambda)(y-a_i),
\]
and hence
\[
x-a_i=\gamma_i(y-a_i)\quad\text{with } \gamma_i=\frac{1-\lambda}{t_i\lambda}.
\]
Since $x\ne y$, we have $\gamma_i\ne 1$. Thus
\[
a_i=\frac{1}{1-\gamma_i}\,x-\frac{\gamma_i}{1-\gamma_i}\,y\in L(x,y),
\]
where $L(x,y)$ is the line connecting $x$ and $y$ as defined in (2.1). Both cases where $x=a_i$ and $y=a_i$ also give us $a_i\in L(x,y)$, and therefore $a_i\in L(x,y)$ for all $i=1,\dots,m$. This is a contradiction, which completes the proof. $\square$

We proceed further with solving the Fermat-Torricelli problem for three points in $\mathbb{R}^n$ and first present two elementary lemmas.

Lemma 8.3 Let $a_1,a_2$ be two vectors in $\mathbb{R}^n$ with $\|a_1\|=\|a_2\|=1$. Then we have $-a_1-a_2\in\mathbb{B}$ if and only if $\langle a_1,a_2\rangle\le-1/2$.

Proof If $-a_1-a_2\in\mathbb{B}$, then $a_1+a_2\in\mathbb{B}$ and $\|a_1+a_2\|\le 1$. By squaring both sides, we get the inequality $\|a_1+a_2\|^2\le 1$, which subsequently implies that
\[
\|a_1\|^2+2\langle a_1,a_2\rangle+\|a_2\|^2=2+2\langle a_1,a_2\rangle\le 1,
\]
and so $\langle a_1,a_2\rangle\le-1/2$. The converse implication is obvious. $\square$



Lemma 8.4 Let $a_1,a_2,a_3$ be three vectors in $\mathbb{R}^n$ with $\|a_1\|=\|a_2\|=\|a_3\|=1$. Then $a_1+a_2+a_3=0$ if and only if we have the equalities
\[
\langle a_1,a_2\rangle=\langle a_2,a_3\rangle=\langle a_3,a_1\rangle=-\frac{1}{2}. \tag{8.4}
\]


Proof It follows from the condition $a_1+a_2+a_3=0$ that
\[
\|a_1\|^2+\langle a_1,a_2\rangle+\langle a_1,a_3\rangle=0, \tag{8.5}
\]
\[
\langle a_2,a_1\rangle+\|a_2\|^2+\langle a_2,a_3\rangle=0, \tag{8.6}
\]
\[
\langle a_3,a_1\rangle+\langle a_3,a_2\rangle+\|a_3\|^2=0. \tag{8.7}
\]
Subtracting (8.7) from (8.5) gives us $\langle a_1,a_2\rangle=\langle a_2,a_3\rangle$, and similarly we get $\langle a_2,a_3\rangle=\langle a_3,a_1\rangle$. Substituting these equalities into (8.5) ensures the fulfillment of the claimed assertion (8.4).

Conversely, assume that (8.4) holds. Then we have
\[
\|a_1+a_2+a_3\|^2=\sum_{i=1}^{3}\|a_i\|^2+2\langle a_1,a_2\rangle+2\langle a_2,a_3\rangle+2\langle a_1,a_3\rangle=0,
\]
which yields the equality $a_1+a_2+a_3=0$ and thus completes the proof. $\square$



Now we use subdifferentiation of convex functions to solve the Fermat-Torricelli problem for three points. Given nonzero vectors $u,v\in\mathbb{R}^n$, define
\[
\cos(u,v)=\frac{\langle u,v\rangle}{\|u\|\cdot\|v\|}.
\]
The next theorem fully describes optimal solutions to the Fermat-Torricelli problem for three points in two distinct cases. In the statement of the theorem and its proof, we consider, whenever $\bar{x}\ne a_i$, the three vectors
\[
v_i=\frac{\bar{x}-a_i}{\|\bar{x}-a_i\|},\quad i=1,2,3. \tag{8.8}
\]
Geometrically, each vector $v_i$ is a unit vector pointing in the direction from the vertex $a_i$ to $\bar{x}$. Observe that the classical (three-point) Fermat-Torricelli problem always has a unique solution even if the given points belong to the same line. Indeed, in the latter case the middle point solves the problem.

Theorem 8.5 Consider the Fermat-Torricelli problem (8.2) for the three distinct points $a_1,a_2,a_3\in\mathbb{R}^n$. The following assertions hold:

(i) Let $\bar{x}\notin\{a_1,a_2,a_3\}$. Then $\bar{x}$ is the solution to the problem if and only if
\[
\cos(v_1,v_2)=\cos(v_2,v_3)=\cos(v_3,v_1)=-1/2.
\]
(ii) Let $\bar{x}\in\{a_1,a_2,a_3\}$, say $\bar{x}=a_1$. Then $\bar{x}$ is the solution to the problem if and only if we have $\cos(v_2,v_3)\le-1/2$.

8.1 The Fermat-Torricelli Problem

241

Proof In case (i), function (8.1) is Fréchet differentiable at x. By the classical Fermat rule, x solves the optimization problem (8.2) if and only if ∇ϕ(x) = v1 + v2 + v3 = 0, where v1 , v2 , v3 are the unit vectors defined in (8.8). By Lemma 8.4, the latter is satisfied if and only if we get   vi , v j  = cos(vi , v j ) = −1/2 for any i, j ∈ 1, 2, 3 with i  = j. To verify (ii) in the nonsmooth case therein, we employ the subdifferential Fermat rule (3.6) and the subdifferential sum rule from Corollary 3.20, which tell us that x = a1 solves the Fermat-Torricelli problem if and only if 0 ∈ ∂ϕ(a1 ) = v2 + v3 + B. Since v2 and v3 are unit vectors, it follows from Lemma 8.3 that v2 , v3  = cos(v2 , v3 ) ≤ −1/2, which completes the proof of the theorem.



Example 8.6 This example illustrates geometrically the solution construction for the Fermat-Torricelli problem on the plane. Consider (8.2) with the three given distinct points A, B, and C in R2 as shown in Fig. 8.1. If one of the angles of the triangle ABC is greater than or equal to 120◦ , then the corresponding vertex is the solution to (8.2) by Theorem 8.5(ii). Consider now the case where none of the angles of the triangle is greater than or equal to 120◦ . Construct the two equilateral triangles AB D and AC E, and let S be the intersection of DC and B E as in the figure. Two quadrilaterals AD BC and ABC E are convex, and hence S lies inside the triangle ABC. It is clear that the two triangles D AC and B AE are congruent. A rotation of 60◦ about A maps the triangle D AC to the triangle B AE. This rotation

Fig. 8.1 Construction of the Torricelli point

242

8 Applications to Location Problems

also maps C D to B E, and therefore ∠DS B = 60◦ by elementary properties of angles in triangles. Let T be the image of S through this rotation. Then T belongs to B E. It follows hence that ∠AST = ∠AS E = 60◦ . Moreover, ∠DS A = 60◦ , and thus ∠B S A = 120◦ . It is now clear that we have ∠ASC = 120◦ and ∠B SC = 120◦ . By Theorem 8.5(i), the point S is the solution to the Fermat-Torricelli problem under consideration.

8.2

The Weiszfeld Algorithm

Next we present a numerical algorithm to solve the Fermat-Torricelli problem (8.2) in the general case of m distinct points in Rn that are not collinear, which is our standing assumption in the rest of this section. This algorithm was proposed in 1937 by Weiszfeld [37] and is known as the Weiszfeld algorithm. To proceed, observe that the gradient of ϕ from (8.1) is calculated by ∇ϕ(x) =

m    x − ai for x ∈ / a1 , . . . , am . x − ai  i=1

The gradient equation ∇ϕ(x) = 0 gives us m 

x=

i=1 m  i=1

ai x − ai  1 x − ai 

  for x ∈ / a1 , . . . , am .

Define the function m 

f (x) =

i=1 m  i=1

ai x − ai  1 x − ai 

  for x ∈ / a1 , . . . , am ,

(8.9)

with extending f to the whole space Rn by f (x) = x for x ∈ {a1 , . . . , am }. Weiszfeld’s algorithm consists of choosing an arbitrary starting point x1 ∈ Rn , taking f from (8.9), and defining recurrently xk+1 = f (xk ) for k ∈ N,

(8.10)

where f is called the algorithm mapping. In what follows, we present the precise conditions and the proof of convergence for Weiszfeld’s algorithm in the way suggested by Kuhn [11]

8.2 The Weiszfeld Algorithm

243

who corrected some errors from [37]. In contrast to [11], we employ below subdifferential rules of convex analysis, which allow us to simplify the proof of convergence. The next proposition ensures decreasing the function value of ϕ after each iteration, i.e., the descent property of the algorithm. Proposition 8.7 Let x ∈ Rn . If f (x)  = x, then we have ϕ( f (x)) < ϕ(x). Proof The imposed assumption f (x)  = x tells us that x is not a vertex from the set {a1 , . . . , am }. Observe also that the point f (x) is the unique minimizer of the strictly convex function m  z − ai 2 g(z) = , z ∈ Rn , x − ai  i=1

and that g( f (x)) < g(x) = ϕ(x) since f (x)  = x. We have furthermore that m     f (x) − ai 2 g f (x) = x − ai  i=1 m  (x − ai  +  f (x) − ai  − x − ai )2 = x − ai  i=1 m    ( f (x) − ai  − x − ai )2 = ϕ(x) + 2 ϕ( f (x)) − ϕ(x) + . x − ai  i=1

This clearly yields the inequality m    ( f (x) − ai  − x − ai )2 2ϕ f (x) + < 2ϕ(x). x − ai  i=1





  Then 2ϕ f (x) < 2ϕ(x). Therefore, we arrive at ϕ f (x) < ϕ(x) and complete the proof of the proposition.  The next two propositions reveal how the algorithm mapping f behaves near a vertex that solves the Fermat-Torricelli problem. First we present a necessary and sufficient optimality condition for a vertex in (8.2). Define Rj =

m  i=1,i= j

ai − a j , j = 1, . . . , m. ai − a j 

Proposition 8.8 The vertex a j solves problem (8.2) if and only if R j  ≤ 1.

244

8 Applications to Location Problems

Proof Employing the Fermat rule from Proposition 3.6 and the subdifferential sum rule from Corollary 3.20 shows that a j solves (8.2) if and only if 0 ∈ ∂ϕ(a j ) = −R j + B, which is clearly equivalent to the condition R j  ≤ 1.



Proposition 8.8 from convex analysis allows us to essentially simplify the proof of the next result established by Kuhn. We use the notation   f s (x) = f f s−1 (x) for s ∈ N with f 0 (x) = x, x ∈ Rn . Proposition 8.9 Suppose that the vertex a j does not solve problem (8.2). Then there exists δ > 0 such that the condition 0 < x − a j  ≤ δ yields the existence of a positive integer s for which we have (8.11)  f s (x) − a j  > δ and  f s−1 (x) − a j  ≤ δ. Proof Pick any x ∈ / {a1 , . . . , am } and deduce from (8.9) that m 

f (x) − a j =

i=1,i= j m  i=1

f (x) − a j = lim lim x→a j x − a j  x→a j

ai − a j x − ai 

1 x − ai 

m

i=1,i= j

, and so

ai − a j x − ai 

m  x − a j  1+ x − ai 

= R j , j = 1, . . . , m.

i=1,i= j

It follows from Proposition 8.8 that lim

x→a j

 f (x) − a j  = R j  > 1, j = 1, . . . , m, x − a j 

(8.12)

which allows us to select ε > 0 and δ > 0 ensuring that  f (x) − a j  > 1 + ε whenever 0 < x − a j  ≤ δ. x − a j  Thus for every s ∈ N, the fulfillment of 0 <  f q−1 (x) − a j  ≤ δ with any q = 1, . . . , s implies that

8.2 The Weiszfeld Algorithm

245

 f s (x) − a j  ≥ (1 + ε) f s−1 (x) − a j  ≥ · · · ≥ (1 + ε)s x − a j . Since (1 + ε)s x − a j  → ∞ as s → ∞, this yields (8.11) as claimed.



We finally present the main result of this section on the convergence of the Weiszfeld algorithm to solve the Fermat-Torricelli problem. Theorem 8.10 Let {xk } be the sequence generated by the Weiszfeld algorithm (8.10), and / {a1 , . . . , am } for all k ∈ N. Then the iterative sequence {xk } converges to the unique let xk ∈ solution x of (8.2). Proof If xk = xk+1 for some k = k0 ∈ N, then {xk } is a constant sequence for k ≥ k0 , and hence it converges to xk0 . Since f (xk0 ) = xk0 and xk0 is not a vertex, xk0 solves the problem. Thus we can assume that xk+1  = xk for all k ∈ N. It follows from Proposition 8.7 that the sequence {ϕ(xk )} of nonnegative numbers is decreasing, so it converges. This gives us   lim ϕ(xk ) − ϕ(xk+1 ) = 0.

k→∞

(8.13)

By the construction of {xk }, we see that each xk belongs to the compact set co {a1 , . . . , am }. This allows us to select from {xk } a subsequence {xkν } converging to some point z as ν → ∞. We will show that z = x, i.e., it is the unique solution to (8.2). Indeed, we get from (8.13) that    lim ϕ(xkν ) − ϕ f (xkν ) = 0. ν→∞

This implies by the continuity of ϕ and f that ϕ(z) = ϕ( f (z)), which yields f (z) = z by Proposition 8.7. Thus z solves (8.2) if it is not a vertex. It remains to consider the case where z is a vertex, say a1 . Suppose on the contrary that z = a1  = x and select δ > 0 so small that (8.11) holds for some s ∈ N and that B(a1 ; δ) does not contain x and ai for i = 2, . . . , m. Since xkν → z = a1 , we assume without loss of generality that this sequence is entirely contained in B(a1 ; δ). For x = xk1 , select l1 with / B(a1 ; δ). Choosing now kν > l1 and applying Proposition 8.9 xl1 ∈ B(a1 ; δ) and f (xl1 ) ∈ / B(a1 ; δ). Repeating this procedure, find give us l2 > l1 with xl2 ∈ B(a1 ; δ) and f (xl2 ) ∈ a subsequence {xlν } such that xlν ∈ B(a1 ; δ) while f (xlν ) is not in this ball. Extracting a further subsequence if needed, we can assume that {xlν } converges to some point q satisfying f (q) = q. If q is not a vertex, then it solves (8.2) by the above, which is a contradiction because the solution x is not in B(a1 ; δ). Thus q is a vertex, which must be a1 since the other vertexes are also not in the ball. Then we have  f (xlν ) − a1  = ∞, ν→∞ xlν − a1  lim

which contradicts (8.12) in Proposition 8.9 and completes the proof.



246

8.3

8 Applications to Location Problems

Generalized Fermat-Torricelli Problems

There are several generalized versions of the Fermat-Torricelli problem known in the literature under different names; see, e.g., our papers [17, 19, 20] and the references therein. Note that the constrained version of the problem below is also called a “generalized Heron problem” after Heron from Alexandria who formulated its simplest version in the plane in the 1st century AD. Following [17, 19], we consider here a generalized Fermat-Torricelli problem defined via the minimal time functions (6.1) for finitely many nonempty, closed, and convex target sets in Rn with a constraint set of the same structure. In [8, 12, 33] and their references, the reader can find some particular versions of this problem and related material on location problems. We also refer the reader to [17, 20] to nonconvex problems of this type. Let the target sets Ωi for i = 1, . . . , m and the constraint set Ω0 be distinct nonempty, closed, and convex subsets of Rn . Consider the minimal time functions T F (x; Ωi ) ≡ TΩFi (x) from (6.1) for Ωi , i = 1, . . . , m, with the constant dynamics defined by a compact and convex set F ⊂ Rn that contains the origin as an interior point. Recall that the generalized projection from a point x ∈ Rn to a set Ω ⊂ Rn is defined by    Π F (x; Ω) ≡ ΠΩF (x) = w ∈ Ω  T F (x; Ω) = ρ F (w − x) . Then the generalized Fermat-Torricelli problem is formulated as follows: minimize H(x) =

m 

T F (x; Ωi ) subject to x ∈ Ω0 .

(8.14)

i=1

Our first goal in this section is to establish the existence and uniqueness of solutions to the optimization problem (8.14). These issues are not so simple to resolve as in the case of the classical Fermat-Torricelli problem (8.2). We split the proof into several propositions that are of their own interest. Let us start with the following property of the Minkowski gauge function (6.4). Proposition 8.11 Assume that F is a compact and convex subset of Rn with 0 ∈ int F. Then ρ F (x) = 1 if and only if x ∈ bd F. Proof Suppose that ρ F (x) = 1. Then there exists a sequence of real numbers {tk } that converges to t = 1 while being such that x ∈ tk F for every k. Since F is closed, this implies that x ∈ F. To show further that x ∈ / int F, suppose on the contrary that x ∈ int F and then choose t > 0 so small that x + t x ∈ F, which ensures that x ∈ (1 + t)−1 F. This tells us that ρ F (x) ≤ (1 + t)−1 < 1, a contradiction that verifies the “only if” implication. Now let x ∈ bd F. Since F is closed, we have x ∈ F, and hence ρ F (x) ≤ 1. Suppose on the contrary that γ = ρ F (x) < 1. Then there exists a sequence of real numbers {tk } that converges to γ with x ∈ tk F for every k ∈ N. Then x ∈ γ F = γ F + (1 − γ)0 ⊂ F. By Propo-

8.3

Generalized Fermat-Torricelli Problems

247

sition 6.18, the Minkowski gauge function ρ F is continuous, so the set {u ∈ Rn | ρ F (u) < 1} is an open subset of F containing x. Thus x ∈ int F, which completes the proof.  Recall further that a set Q ⊂ Rn is said to be strictly convex if for any x, y ∈ Q with x  = y and for any t ∈ (0, 1) we have t x + (1 − t)y ∈ int Q. Some examples of strictly convex sets are given below; see also Exercise 8.6. Example 8.12 It is easy to see the following: (i) The Euclidean ball B(x; r ) is strictly convex for any x ∈ Rn and r > 0. (ii) If f : Rn → R is a real-valued function that is strictly convex, then epi f is a strictly convex set in Rn+1 . (iii) The empty set and any singleton in Rn are strictly convex. The next proposition reveals a useful property of the Minkowski gauge function associated with a constant dynamics set F that is strictly convex. It extends a well-known property of the Euclidean norm stating that for two nonzero elements x, y ∈ Rn , we have x + y = x + y if and only if x = λy for some λ > 0. Proposition 8.13 Assume that F is a compact and strictly convex subset of Rn with 0 ∈ int F. Then we have the equality ρ F (x + y) = ρ F (x) + ρ F (y) with x, y  = 0

(8.15)

if and only if x = λy for some λ > 0. Proof Denote α = ρ F (x), β = ρ F (y) and observe that α, β > 0 by Proposition 6.7. The positively homogeneous property of ρ F gives us ρ F (x/α) = ρ F (y/β) = 1, so x/α and y/β belong to F by Proposition 8.11. If the linearity property (8.15) holds, then ρF

x + y

= 1, α+β

which implies in turn that ρF

x

α y β

+ = 1. αα+β β α+β

This allows us to use Proposition 8.11 and conclude that x α y β + ∈ bd F. αα+β β α+β

248

8 Applications to Location Problems

Since x/α ∈ F, y/β ∈ F, and α/(α + β) ∈ (0, 1), it follows from the strict convexity of the constant dynamics F that x y ρ F (x) = , i.e., x = λy with λ = . α β ρ F (y) The converse implication in the proposition is obvious.



The following proposition extends the standard singleton property of the Euclidean projections associated with closed and convex sets. Proposition 8.14 Let F be a compact and strictly convex set with 0 ∈ int F, and let Ω be a nonempty, closed, convex subset of Rn . Then for any x ∈ Rn the generalized projection set Π F (x; Ω) is a singleton. Proof We only need to consider the case where x ∈ / Ω. It follows from the classical Weierstrass theorem that Π F (x; Ω)  = ∅. Suppose on the contrary that there exist elements u 1 , u 2 ∈ Π F (x; Ω) with u 1  = u 2 . Then

T F (x; Ω) = ρ F (u 1 − x) = ρ F (u 2 − x) = r > 0, which implies in turn that (u 1 − x)/r ∈ bd F ⊂ F and (u 2 − x)/r ∈ bd F ⊂ F due to the closedness of F. Since F is strictly convex, we have the inclusion

1 u2 − x 1 u1 + u2 1 u1 − x + = − x ∈ int F, 2 r 2 r r 2 and therefore ρ F (u − x) < r = T F (x; Ω) with u = (u 1 + u 2 )/2 ∈ Ω. The obtained contradiction completes the proof of the proposition.  In what follows, we identify the generalized projection Π F (x; Ω) with its unique element when both sets F and Ω are strictly convex. Proposition 8.15 Let F be a compact and convex subset of Rn such that 0 ∈ int F, and let / Ω. If ω ∈ Π F (x; Ω), then we get ω ∈ bd Ω. Ω be a nonempty closed set in Rn with x ∈ Proof Take ω ∈ Π F (x; Ω) and suppose on the contrary that ω ∈ int Ω. Then there exists a real number ε > 0 such that B(ω; ε) ⊂ Ω. Define the point z =ω+ε

x −ω ∈ B(ω; ε) ⊂ Ω. x − ω

8.3

Generalized Fermat-Torricelli Problems

249

Fig. 8.2 Three sets that satisfy assumptions of Proposition 8.16

We observe from the construction of the Minkowski gauge function (6.4) that x −ω ε ρ F (ω − x) −x = 1− ρ F (z − x) = ρ F ω + ε x − ω x − ω < ρ F (ω − x) = T F (x; Ω) for small ε > 0. This is a contradiction by the definition of the generalized projection  Π F (x; Ω), and thus ω must be a boundary point of Ω. The next result marries the properties of the minimal time function to the strict convexity of the target sets Ωi from (8.14)(Fig. 8.2). Proposition 8.16 In the setting of problem (8.14), suppose that the sets Ωi as i = 1, . . . , m are strictly convex. If for any x, y ∈ Ω0 with x  = y there exists an index i 0 ∈ {1, . . . , m} such that L(x, y) ∩ Ωi0 = ∅ for the line L(x, y) from (2.1) passing through x and y, then the function

H(x) =

m 

T F (x; Ωi )

i=1

defined in (8.14) is strictly convex on the constraint set Ω0 . Proof Observe that the function H is convex on Ω0 . Suppose on the contrary that H is not strictly convex on this set. Then there exist x, y ∈ Ω0 with x  = y and a real number t ∈ (0, 1) such that   H t x + (1 − t)y = t H(x) + (1 − t)H(y).

250

8 Applications to Location Problems

Using the convexity of each function T F (·; Ωi ), we have   T F t x + (1 − t)y; Ωi = t T F (x; Ωi ) + (1 − t)T F (y; Ωi ), i = 1, . . . , m.

(8.16)

Suppose further that L(x, y) ∩ Ωi0 = ∅ for some i 0 ∈ {1, . . . , m} and define u = Π F (x; Ωi0 ), v = Π F (y; Ωi0 ). Then it follows from (8.16), Theorem 6.19, and the convexity of the Minkowski gauge function ρ F that tρ F (u − x) + (1 − t)ρ F (v − y) = t T F (x; Ωi0 ) + (1 − t)T F (y; Ωi0 )   = T F t x + (1 − t)y; Ωi0   ≤ ρ F tu + (1 − t)v − (t x + (1 − t)y) ≤ tρ F (u − x) + (1 − t)ρ F (v − y), which shows that tu + (1 − t)v = Π F (t x + (1 − t)y; Ωi0 ). Thus u = v, since otherwise tu + (1 − t)v ∈ int Ωi0 , a contradiction to Proposition 8.15. This gives us u = v = Π F (t x + (1 − t)y; Ωi0 ). The estimates above also tell us that       ρ F u − (t x + (1 − t)y = ρ F t(u − x) + ρ F (1 − t)(u − y) . Since u − x, u − y  = 0 due to x, y ∈ / Ωi0 , by Proposition 8.13 we find λ > 0 such that t(u − x) = λ(1 − t)(u − y), which yields u − x = γ(u − y) for γ =

λ(1 − t)  = 1. t

This clearly justifies the inclusion u=

γ 1 x− y ∈ L(x, y), 1−γ 1−γ

which is a contradiction that completes the proof of the proposition. Now we establish a major existence and uniqueness theorem for (8.14). Theorem 8.17 Impose the following assumptions on the data of (8.14): (i) The sets Ωi , i = 1, . . . , m, are strictly convex. (ii) If x, y ∈ Ω0 with x  = y, then there is i 0 ∈ {1, . . . , m} with

L(x, y) ∩ Ωi0 = ∅. (iii) At least one of the sets Ωi , i = 0, . . . , m, is bounded.



8.3

Generalized Fermat-Torricelli Problems

251

Then the generalized Fermat-Torricelli problem formulated in (8.14) admits a unique optimal solution. Proof By the strict convexity of H from Proposition 8.16, it suffices to show that problem (8.14) admits an optimal solution. If Ω0 is bounded, then it is compact and the conclusion follows from the (Weierstrass) Theorem 7.9 since the function H in (8.14) Lipschitz continuous by Corollary 6.20. Consider further the case where one of the sets Ωi for i = 1, . . . , m is bounded; say it is Ω1 . In this case, for any α ≥ 0 we get       Lα = x ∈ Ω0  H(x) ≤ α ⊂ x ∈ Ω0  T F (x; Ω1 ) ≤ α ⊂ Ω0 ∩ (Ω1 − αF), which yields the compactness of Lα , and so problems (8.14) admits an optimal solution by Theorem 7.9.  To continue, we consider further specifications of the generalized Fermat-Torricelli problems from the preceding section and obtain constructing results leading us to their solving in some particular settings. Furthermore, the application of the projected subgradient method from Sect. 7.5 is specified to design an efficient numerical algorithm to solving a broad class of the generalized Fermat-Torricelli problems. Illustrative examples implementing the obtained results and solving procedures are also presented below. First we examine a particular case of problem (8.14), where the constant dynamics set F is given by the closed unit ball. Consider the problem minimize D(x) =

m 

d(x; Ωi ) subject to x ∈ Ω0 ,

(8.17)

i=1

where Ωi for i = 0, . . . , m are nonempty closed convex sets. Given x ∈ Rn , define the index subsets of {1, . . . , m} by    I (x) := i ∈ {1, . . . , m}  x ∈ Ωi , (8.18)    J (x) := i ∈ {1, . . . , m}  x ∈ / Ωi . It is obvious that I (x) ∪ J (x) = {1, . . . , m} and that I (x) ∩ J (x) = ∅ for all x ∈ Rn . The next theorem fully characterizes an optimal solution to (8.17) via the Euclidean projections to the target sets Ωi for i = 1, . . . , m. Theorem 8.18 The following assertions hold for problem (8.17): (i) Let x ∈ Ω0 be an optimal solution to (8.17). Then we have   − Ai (x) ∈ Ai (x) + N (x; Ω0 ), i∈J (x)

i∈I (x)

(8.19)

252

8 Applications to Location Problems

where each set Ai (x), i = 1, . . . , m, is calculated by ⎧ x − Π (x; Ωi )  ⎪ ⎪ / Ωi , if x ∈ ⎨ d(x; Ωi ) Ai (x) = ⎪ ⎪ ⎩ N (x; Ωi ) ∩ B if x ∈ Ωi

(8.20)

with Ai (x) being a singleton for all i ∈ J (x). (ii) Conversely, if x ∈ Ω0 satisfies condition (8.19), then x is an optimal solution to problem (8.17). Proof To verify (i), fix an optimal solution x to problem (8.17), which is obviously an optimal solution to the unconstrained problem minimize D(x) + δ(x; Ω0 ), x ∈ Rn .

(8.21)

Applying the subdifferential Fermat rule to (8.21), we characterize x by 0∈∂

m 

d(·; Ωi ) + δ(·; Ω0 ) (x).

(8.22)

i=1

Since all the functions d(·; Ωi ) for i = 1, . . . , m are real-valued convex functions, by the subdifferential sum rule of Theorem 3.18 and the results of Theorem 3.12 we have   0∈ ∂d(x; Ωi ) + d(x; Ωi ) + N (x; Ω0 ) =

i∈J (x) 

i∈I (x) Ai (x) + N (x; Ω0 ),

Ai (x) +

i∈J (x)

i∈I (x)

which justifies inclusion (8.19). Assertion (ii) can be verified similarly.



Let us specifically examine problem (8.17) in the case where the intersections of the constraint set Ω0 with the target sets Ωi are empty. Corollary 8.19 Consider problem (8.17) with Ω0 ∩ Ωi = ∅ for i = 1, . . . , m. Let x ∈ Ω0 and define the vectors ai (x) =

x − Π (x; Ωi )  = 0, i = 1, . . . , m. d(x; Ωi )

(8.23)

Then x is an optimal solution to (8.17) if and only if −

m  i=1

ai (x) ∈ N (x; Ω0 ).

(8.24)

8.3

Generalized Fermat-Torricelli Problems

253

Proof In this case, we have x ∈ / Ωi for all i = 1, . . . , m, i.e., I (x) = ∅ and J (x) = {1, . . . , m}. Thus the result follows from Theorem 8.18.  For more detailed solution to (8.17), consider a special case where m = 3. Theorem 8.20 Let m = 3 in the framework of Theorem 8.18, where the target sets Ω1 , Ω2 , and Ω3 are pairwise disjoint, and where Ω0 = Rn . Then either one of the following conclusions holds for an optimal solution x of (8.17) with ai (x) taken from (8.23): (i) x belongs to one of the sets Ωi , say Ω1 , and we have  a2 , a3  ≤ −1/2 and − a2 − a3 ∈ N (x; Ω1 ). (ii) x does not belong to any of the three sets Ω1 , Ω2 , and Ω3 , and we have   ai , a j  = −1/2 for i  = j as i, j ∈ 1, 2, 3 . Conversely, if x ∈ Rn satisfies either (i) or (ii), then it solves (8.17). Proof Suppose that x ∈ Rn is an optimal solution to (8.17). First consider the case where x belongs to one of the sets Ωi , say Ω1 . We deduce from Theorem 8.18 that if x is an optimal solution to (8.17) if and only if 0 ∈ ∂d(x; Ω1 ) + ∂d(x; Ω2 ) + ∂d(x; Ω3 ) = ∂d(x; Ω1 ) + a2 + a3 = N (x; Ω1 ) ∩ B + a2 + a3 . Thus −a2 − a3 ∈ N (x; Ω1 ) ∩ B, which is equivalent to the fulfillment of −a2 − a3 ∈ B and −a2 − a3 ∈ N (x; Ω1 ). By Lemma 8.3, this is also equivalent to  a2 , a3  ≤ −1/2 and − a2 − a3 ∈ N (x; Ω1 ), which give us (i). We similarly conclude that in the case where x ∈ / Ωi for i = 1, 2, 3, this element solves (8.17) if and only if 0 ∈ ∂d(x; Ω1 ) + ∂d(x; Ω2 ) + ∂d(x; Ω3 ) = a1 + a2 + a3 . Then Lemma 8.4 tells us that the latter is equivalent to   ai , a j  = −1/2 for i  = j as i, j ∈ 1, 2, 3 . Conversely, the fulfillment of either (i) or (ii) and the convexity of the sets Ωi ensure the relationships 0 ∈ ∂d(x; Ω1 ) + ∂d(x; Ω2 ) + ∂d(x; Ω3 ) = ∂ D(x), and thus x is an optimal solution to the problem under consideration.



254

8 Applications to Location Problems

Finally in this section, we discuss the application of the projected subgradient algorithm from Sect. 7.5 to the numerical solution to the generalized Fermat-Torricelli problem (8.14). Theorem 8.21 Consider problem (8.14) and assume that S  = ∅ for its solution set. Picking a sequence {αk } of positive numbers and a starting point x1 ∈ Ω0 , form the iterative sequence m

 vik ; Ω0 , k ∈ N, xk+1 = Π xk − αk

(8.25)

i=1

with an arbitrary choice of subgradient elements vik ∈ −∂ρ F (ωik − xk ) ∩ N (ωik ; Ωi ) / Ωi for some ωik ∈ Π F (xk ; Ωi ) if xk ∈

(8.26)

and vik ∈ N (xk ; Ωi ) ∩ C ∗ otherwise, where Π F (x; Ω) stands for the generalized projection operator, where ρ F is the Minkowski gauge function (6.4), and where C ∗ = {v ∈ Rn | σ F (−v) ≤ 1}. Defining the sequence of values    Vk = min H(x j )  j = 1, . . . , k , the following assertions are fulfilled: (i) If the sequence {αk } satisfies conditions (7.44), then {Vk } converges to the optimal value V of problem (8.14). Furthermore, we have the estimate  2 d(x1 ; S)2 + 2 ∞ k=1 αk 0 ≤ Vk − V ≤ , k 2 i=1 αk where  ≥ 0 is a Lipschitz constant of cost function H on Rn . (ii) If the sequence {αk } satisfies the conditions in (7.46), then {Vk } converges to the optimal value V and {xk } in (8.25) converges to an optimal solution x ∈ S of the generalized FermatTorricelli problem (8.14). Proof This is a direct application of the projected subgradient algorithm described in Proposition 7.45 to the constrained optimization problem (8.14) by taking into account the results for the subgradient method in convex optimization presented in Sect. 7.5.  In the case where F = B, we have a remarkable specification of the subgradient algorithm from Theorem 8.21 to solve (8.14). Corollary 8.22 Let F = B ⊂ Rn in (8.14), let {αk } be a sequence of positive numbers, and let x1 ∈ Ω0 be a starting point. Consider algorithm (8.25) with

8.3

Generalized Fermat-Torricelli Problems

vik =

255

xk − Π (xk ; Ωi ) / Ωi , if xk ∈ d(xk ; Ωi )

and vik ∈ N (x; Ωi ) ∩ B if xk ∈ Ωi , e.g., vik = 0. Then all the conclusions of Theorem 8.21 hold for this specification. Proof Immediately follows from Theorem 8.21 with ρ F (x) = x.



In conclusion, we illustrate applications of the subgradient algorithm of Theorem 8.21 to solving the generalized Fermat-Torricelli problem (8.14). Note that to implement the subgradient algorithm (8.25), it is sufficient to calculate just one subgradient from the set on the right-hand side of (8.26) for each target set. In the following example, we consider for definiteness the dynamics F given by the square [−1, 1] × [−1, 1] in the plane. In this case, the corresponding Minkowski gauge function (6.4) reduces to (Fig. 8.3)   ρ F (x1 , x2 ) = max |x1 |, |x2 | , (x1 , x2 ) ∈ R2 .

(8.27)

Example 8.23 Consider the unconstrained problem (8.14) in R2 with the constant dynamics set F = [−1, 1] × [−1, 1].

Fig. 8.3 Minimal time function with square dynamics

256

8 Applications to Location Problems

The target sets are the square Ωi of right position (sides parallel to the axes). These squares are centered at (ai , bi ) with radii ri (half of the side) for i = 1, . . . , m. Given a sequence of positive numbers {αk } and a starting point x1 , construct the iterations xk = (x1k , x2k ) by the projected subgradient algorithm (8.25). Subgradients vik can be chosen by ⎧ (1, 0) ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ (−1, 0) ⎪ ⎪ ⎪ ⎪ ⎨ vik = (0, 1) ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ (0, −1) ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩ (0, 0)

8.4

if |x2k − bi | ≤ x1k − ai and x1k > ai + ri , if |x2k − bi | ≤ ai − x1k and x1k < ai − ri , if |x1k − ai | ≤ x2k − bi and x2k > bi + ri , if |x1k − ai | ≤ bi − x2k and x2k < bi − ri , otherwise.

Solving Planar Fermat-Torricelli Problems for Euclidean Balls

In this section, we completely solve by means of convex analysis and optimization the following simplified version of problem (8.14): minimize H(x) =

3 

d(x; Ωi ), x ∈ R2 ,

(8.28)

i=1

where the target sets Ωi = B(ci ; ri ) for i = 1, 2, 3 are the closed balls of R2 whose centers ci are supposed to be distinct from each other in order to avoid triviality. Throughout this section, we denote by S the set of all optimal solutions to the formulated problem. Let us first reveal some properties of optimal solutions to problem (8.28). Proposition 8.24 The optimal solution set S of (8.28) is a nonempty compact convex subset of R2 . Proof The existence of solutions (8.28) follows from the proof of Theorem 8.17. It is not hard to see that S is closed and bounded. The convexity of S can be deduced from the convexity of the distance functions d(·; Ωi ) for any i = 1, 2, 3; see Exercise 8.5.  To proceed, fix x ∈ R2 and define the index set    I (x) := i ∈ {1, 2, 3}  x ∈ Ωi , which is broadly used in what follows.

(8.29)

8.4

Solving Planar Fermat-Torricelli Problems for Euclidean Balls

257

Proposition 8.25 Suppose that x ∈ R2 does not belong to any of the sets Ωi , i = 1, 2, 3, i.e., I (x) = ∅. Then x is an optimal solution to (8.28) if and only if x solves the classical Fermat-Torricelli problem (8.2) generated by the centers of the balls ai = ci for i = 1, 2, 3 in (8.1). Proof Assume that x is an optimal solution to (8.28) with x ∈ / Ωi for i = 1, 2, 3. Choose δ > 0 such that B(x; δ) ∩ Ωi = ∅ as i = 1, 2, 3 and get the following relationships for every x ∈ B(x; δ): 3  i=1

x − ci  −

3  i=1

ri = inf H(x) = H(x) ≤ H(x) = x∈R2

3 

x − ci  −

i=1

3 

ri .

i=1

This shows that x is a local optimal solution to the problem minimize F (x) =

3 

x − ci , x ∈ R2 ,

(8.30)

i=1

so it is a global optimal solution to this problem due to the convexity of F . Thus x solves the classical Fermat-Torricelli problem (8.2) generated by c1 , c2 , c3 . The converse implication is straightforward.  The next result provides a direct condition ensuring the uniqueness of solutions to (8.28) in terms of the index set from (8.29). Proposition 8.26 Let x be a solution to (8.28) such that I (x) = ∅ in (8.29). Then x is the unique solution to (8.28), i.e., S = {x}. Proof Suppose on the contrary that there exists u ∈ S such that u  = x. Since I (x) = ∅, / Ωi for i = 1, 2, 3. Then Proposition (8.25) tells us that x solves the classical we have x ∈ Fermat-Torricelli problem generated by c1 , c2 , c3 . It follows from the convexity of S in / Ωi for i = 1, 2, 3. Proposition 8.24 that [x, u] ⊂ S. Thus we find v  = x with v ∈ S and v ∈ It shows that v also solves the Fermat-Torricelli problem (8.30). This contradicts the uniqueness of optimal solutions to (8.30) and hence justifies the claim.  The following proposition verifies yet another important geometric property of an ellipse. Proposition 8.27 Given α > 0 and a, b ∈ R2 with a  = b, consider the set    E = x ∈ R2  x − a + x − b = α

258

8 Applications to Location Problems

and suppose that a − b < α. For x, y ∈ E with x  = y, we have x + y  x + y      − a +  − b < α,  2 2 which ensures, in particular, that (x + y)/2 ∈ / E. Proof Take x, y ∈ E with x  = y. It follows from the triangle inequality that  x + y  1 x + y      − a +  − b = x − a + y − a + x − b + y − b  2 2 2  1 x − a + y − a + x − b + y − b 2 = α.



Suppose on the contrary that x + y  x + y      − a +  − b = α,  2 2

(8.31)

i.e., (x + y)/2 ∈ E. Since our norm is Euclidean and x  = y, this gives us x − a = β(y − a) and x − b = γ(y − b) for some β, γ ∈ (0, ∞) \ {1}. The latter yields a=

1 1 β γ x− y ∈ L(x, y) and b = x− y ∈ L(x, y). 1−β 1−β 1−γ 1−γ

Since x, y ∈ E and a − b < α, we see that x, y ∈ L(a, b) \ [a, b]. Consider first the case where (x + y)/2 ∈ / [a, b] and, by taking into account that all these points belong to the same straight line, assume without loss of generality the following order: x, (x + y)/2, a, b, and y. Then we get  x + y  x + y     − a +  − b < x − a + x − b = α,  2 2 which contradicts (8.31). Now consider the case where (x + y)/2 ∈ [a, b] and get therefore the conditions x + y  x + y      − a +  − b = a − b < α  2 2 bringing us to a contradiction. Thus we have (x + y)/2 ∈ / E, which completes the proof of the proposition.  The next result ensures the uniqueness of solutions to (8.28) via the index set (8.29) in a setting different from Proposition 8.26.

8.4

Solving Planar Fermat-Torricelli Problems for Euclidean Balls

259

Proposition 8.28 Let S be the solution set for problem (8.28), and let x ∈ S satisfy the / [c j , ck ], where i, j, k are distinct indices in {1, 2, 3}. Then conditions: I (x) = {i} and x ∈ S is a singleton, namely S = {x}. / Ω2 , x ∈ / Ω3 , Proof Suppose for definiteness that I (x) = {1}. Then we have x ∈ Ω1 , x ∈ / [c2 , c3 ]. Denote and x ∈ α = x − c2  + x − c3  = inf H(x) + r2 + r3 x∈R2

and observe that c2 − c3  < α. Consider the ellipse    E = x ∈ R2  x − c2  + x − c3  = α for which we get x ∈ E ∩ Ω1 . Suppose on the contrary that S is not a singleton, i.e., there exists u ∈ S with u  = x. The convexity of S implies that [x, u] ⊂ S. We can choose v ∈ [x, u] sufficiently close to but different from x so that [x, v] ∩ Ω2 = ∅ and [x, v] ∩ Ω3 = ∅. Let us first show that v ∈ / Ω1 . Assuming the contrary gives us d(v; Ω1 ) = 0, and hence

H(v) = inf H(x) = d(v; Ω2 ) + d(v; Ω3 ) = v − c2  − r2 + v − c3  − r3 , x∈R2

which implies that v ∈ E. It follows from the convexity of S and Ω1 that (x + v)/2 ∈ / Ω2 and (x + v)/2 ∈ / Ω3 , which tell us that S ∩ Ω1 . We have furthermore that (x + v)/2 ∈ (x + v)/2 ∈ E. However, this cannot happen due to Proposition 8.27. Therefore, v ∈ S and I (v) = ∅, which contradict the result of Proposition 8.26 and thus justify that S is a singleton.  Now we verify the boundary property of optimal solutions to (8.28)(Fig. 8.4). Proposition 8.29 Let x ∈ S, and let I (x) = {i, j} ⊂ {1, 2, 3}. Then x is a boundary point of the set intersection Ωi ∩ Ω j .

Fig. 8.4 A generalized Fermat-Torricelli problem with |I (x)| = 2

260

8 Applications to Location Problems

Proof Assume for definiteness that I (x) = {1, 2}, i.e., x ∈ Ω1 ∩ Ω2 and x ∈ / Ω3 . Arguing by contradiction, suppose that x ∈ int(Ω1 ∩ Ω2 ), and therefore there exists δ > 0 such that B(x; δ) ⊂ Ω1 ∩ Ω2 . Denote p = Π (x; Ω3 ), γ = x − p > 0 and take q ∈ [x, p] ∩ B(x, δ) satisfying the condition q − p < x − p = d(x; Ω3 ). Then we arrive at the relationships 3  H(q) = d(q; Ωi ) = d(q; Ω3 ) ≤ q − p i=1

< x − p = d(x; Ω3 ) =

3 

d(x; Ωi ) = H(x),

i=1

which contradict the fact that x ∈ S and thus complete the proof.



The following theorem gives us verifiable necessary and sufficient conditions for the solution non-uniqueness of (8.28). 3 Ωi = ∅. Then problem (8.28) has more than one solutions if and Theorem 8.30 Let i=1 only if there exists a set [ci , c j ] ∩ Ωk for distinct indices i, j, k ∈ {1, 2, 3} that contains a point u ∈ R2 belonging to the interior of Ωk and such that the index set I (u) is a singleton. Proof Let u ∈ [c1 , c2 ] ∩ Ω3 with u ∈ int Ω3 satisfy the conditions in the “if” part of the / Ω2 and we choose v  = u with v ∈ [c1 , c2 ], v ∈ / Ω1 , v ∈ / Ω2 , theorem. Then u ∈ / Ω1 , u ∈ and v ∈ int Ω3 . This shows that the set I (v) is a singleton. To verify that u is a solution to (8.28), observe first that

H(u) = d(u; Ω1 ) + d(u; Ω2 ) + d(u; Ω3 ) = c1 − c2  − r1 − r2 . Choose δ > 0 such that B(u; δ) ∩ Ω1 = ∅, B(u; δ) ∩ Ω2 = ∅, and B(u; δ) ⊂ Ω3 . For every x ∈ B(u; δ), we get the relationships

H(x) = d(x; Ω2 ) + d(x; Ω3 ) = x − c1  + x − c2  − r1 − r2 ≥ c1 − c2  − r1 − r2 = H(u), which imply that u is a local (and hence global) minimizer of H. Similar arguments show that v ∈ S, and so the solution set S is not a singleton. To justify the “only if” part, suppose that S is not a singleton and pick two distinct elements x, y ∈ S, which yields [x, y] ⊂ S by Proposition 8.24. If there exists an optimal solution to (8.28) not belonging to Ωi for every i ∈ {1, 2, 3}, then S reduces to a singleton due to Proposition 8.26, a contradiction. Thus we can assume by the above that Ω1 contains infinitely many solutions. Then its interior int Ω1 also contains infinitely many optimal

8.4

Solving Planar Fermat-Torricelli Problems for Euclidean Balls

261

solutions by the strict convexity of the ball Ω1 . If there is such an optimal solution u ∈ / Ω2 as well as u ∈ / Ω3 . This implies that int Ω1 with I (u) = {1}, then u ∈ int Ω1 and u ∈ u ∈ [c2 , c3 ], since the opposite yields the solution uniqueness by Proposition 8.28. Consider the case where the set I (u) contains two distinct elements for every optimal solution u belonging to int Ω1 . Then there are infinitely many optimal solutions belonging to the intersection of these two sets, which is strictly convex in this case. Hence there exists an optimal solution to (8.28) that belongs to the interior of this intersection, which contradicts Proposition 8.29 and thus finishes the proof of the theorem (Fig. 8.6).  Here is an illustrative example with complete calculations. Example 8.31 Let the sets Ω1 , Ω2 , and Ω3 in problem (8.28) be closed balls in R2 of radius r = 1 centered at the points (0, 2), (−2, 0), and (2, 0), respectively. We can easily see that the point (0, 1) ∈ Ω1 satisfies all the conditions in Theorem 8.20(i), and hence it is an optimal solution (in fact the unique one) of this problem. More generally, consider problem (8.28) in R2 generated by three pairwise disjoint Euclidean closed balls denoted by Ωi , i = 1, 2, 3. Let c1 , c2 , and c3 be the centers of the balls. Assume first that either the line segment [c2 , c3 ] ⊂ R2 intersects Ω1 , or [c1 , c3 ] intersects Ω2 , or [c1 , c2 ] intersects Ω3 . It is not hard to check that any point of the intersections (say, of the sets Ω1 and [c2 , c3 ] for definiteness) is an optimal solution to the problem under consideration, since it satisfies the necessary and sufficient optimality conditions of Theorem 8.20(i). Indeed, using the notation of this theorem, we see that if x is such a point, then a2 and a3 are unit vectors with a2 , a3  = −1 and −a2 − a3 = 0 ∈ N (x; Ω1 ). If the above intersection assumptions are violated, we define the three points q1 , q2 , and q3 as follows. Let u and v be intersection points of [c1 , c2 ] and [c1 , c3 ] with the boundary of the ball centered at c1 , respectively. Then we see that there is a unique point q1 on the minor curve generated by u and v such that the measures of angle c1 q1 c2 and c1 q1 c3 are equal. The points q2 and q3 are defined similarly. Theorem 8.20 yields that whenever the angle c2 q1 c3 , or c1 q2 c3 , or c2 q3 c1 equals or exceeds 120◦ (say, the angle c2 q1 c3 does), then the point x = q1 is an optimal solution to the problem under consideration. Indeed, in this case we have that the points a2 and a3 from (8.23) are unit vectors with a2 , a3  ≤ −1/2 and −a2 − a3 ∈ N (x; Ω1 ) because the vector −a2 − a3 is orthogonal to Ω1 . If none of these angles equals or exceeds 120◦ , then there exists a point q not belonging to Ωi as i = 1, 2, 3 and such that the angles c1 qc2 = c2 qc3 = c3 qc1 are of 120◦ and that q is an optimal solution to the problem under consideration. Observe that in this case we have that the point q is also a unique optimal solution to the classical Fermat-Torricelli problem determined by the points c1 , c2 , and c3 ; see Fig. 8.5.

262

8 Applications to Location Problems

Fig. 8.5 A generalized Fermat-Torricelli problem with infinite solutions

Fig. 8.6 Generalized Fermat-Torricelli problem for three balls

8.5

Generalized Sylvester Problems

In [35], James Joseph Sylvester proposed the smallest enclosing circle problem formulated as follows. Given finitely many points in the plane, find the smallest circle that encloses all of these points. After more than a century, location problems of this type remain active as they are mathematically beautiful and have meaningful real-world applications. This section is devoted to the study by means of convex analysis and optimization of the following generalized version of the Sylvester problem formulated in [21]; see also [22, 23] for further developments. Let F ⊂ Rn be a compact convex set containing the origin as an interior point. Define the generalized ball centered at x ∈ Rn with radius r ≥ 0 by B F (x; r ) := x + r F. It is obvious if for F = B, then the generalized ball B F (x; r ) reduces to the closed ball of radius r centered at x. Given nonempty closed convex sets Ωi for i = 0, . . . , m with m ≥ 2, the generalized Sylvester problem is formulated by: find a point x ∈ Ω0 and the smallest number r ≥ 0 such that B F (x; r ) ∩ Ωi  = ∅ for all i = 1, . . . , m.

8.5

Generalized Sylvester Problems

263

Observe that when F = B, Ω0 = R2 , and the sets Ωi , i = 1, . . . , m, are singletons in R2 , the generalized Sylvester problem becomes the classical Sylvester enclosing circle problem. The sets Ωi are often called the target sets. To study the generalized Sylvester problem, we introduce the cost function    T (x) = max T F (x; Ωi )  i = 1, . . . , m , x ∈ Rn , via the minimal time function (6.1) and consider the following constrained optimization problem of minimax type: minimize T (x) subject to x ∈ Ω0 .

(8.32)

Observe that the generalized ball B F (x; r ) with x ∈ Ω0 is an optimal solution to the generalized Sylvester problem if and only if x solves the optimization problem (8.32) with r = T (x). First we present sufficient conditions for the existence of solutions of (8.32). Proposition 8.32 The generalized Sylvester problem (8.32) admits an optimal solution if  m  the set Ω0 ∩ i=1 (Ωi − αF) is bounded for all α ≥ 0. This holds, in particular, if there is i 0 ∈ {0, . . . , m} such that the set Ωi0 is bounded. Proof For any α ≥ 0, consider the sublevel set    Lα (T ) = x ∈ Ω0  T (x) ≤ α and then show that in fact we have

Lα (T ) = Ω0 ∩

 m

(Ωi − αF) .

(8.33)

i=1

Indeed, it holds for any closed and convex set Q  = ∅ that T F (x; Q) ≤ α if and only if (x + αF) ∩ Q  = ∅, which is equivalent to x ∈ Q − αF. If x ∈ Lα (T ), then x ∈ Ω0 and T F (x; Ωi ) ≤ α for i = 1, . . . , m. Thus x ∈ Ωi − αF for i = 1, . . . , m, which verifies the inclusion “⊂” in (8.33). The proof of the reverse inclusion is also straightforward. It is clear  m  furthermore that the boundedness of the set Ω0 ∩ i=1 (Ωi − αF) for all α ≥ 0 ensures that the sublevel sets for the cost function in (8.32) are bounded. Thus it follows from Theorem 7.11 that problem (8.32) has an optimal solution. The last statement is obvious.  Now we proceed with the study of the uniqueness of optimal solutions to (8.32) while splitting the proof of the uniqueness result into several propositions. The first proposition in this vein concerns an abstract situation, which involves general convex functions and sets.

264

8 Applications to Location Problems

Proposition 8.33 Consider a convex function g : Ω → R, where Ω ⊂ Rn is a nonempty convex set and where g(x) ≥ 0 for all x ∈ Ω. Then the function h : Ω → R given by h(x) = (g(x))2 ≡ g 2 (x), x ∈ Ω, is strictly convex on Ω if and only if g is not constant on any line segment [a, b] ⊂ Ω with a  = b. Proof Let h be strictly convex on Ω and suppose on the contrary that g is constant on a line segment [a, b] with a  = b. Then h is also constant on this line segment, so it cannot be strictly convex. Conversely, assume that g is not constant on any segment [a, b] ⊂ Ω with a  = b. Observe that h is a convex function on Ω since it is a composition of a quadratic function, which is a nondecreasing convex function on [0, ∞), and a convex function whose range is a subset of [0, ∞). To verify that h is strictly convex on Ω, suppose on the contrary that there exist t ∈ (0, 1) and x, y ∈ Ω, x  = y, with h(z) = th(x) + (1 − t)h(y) with z = t x + (1 − t)y, and hence g 2 (z) = tg 2 (x) + (1 − t)g 2 (y). It follows from the convexity of g that g(z) ≤ tg(x) + (1 − t)g(y) and therefore g 2 (z) ≤ t 2 g 2 (x) + (1 − t)2 g 2 (y) + 2t(1 − t)g(x)g(y), which readily implies that tg 2 (x) + (1 − t)g 2 (y) ≤ t 2 g 2 (x) + (1 − t)2 g 2 (y) + 2t(1 − t)g(x)g(y). Thus (g(x) − g(y))2 ≤ 0, and so g(x) = g(y). This tells us that h(x) = h(z) = h(y) and would bring us to a contradiction by showing that h is constant on the line segment [z, y]. To verify the latter, fix any u ∈ (z, y) and observe that h(u) ≤ νh(z) + (1 − ν)h(y) = h(z) for some ν ∈ (0, 1). On the other hand, since z lies between x and u, we have h(z) ≤ μh(x) + (1 − μ)h(u) ≤ μh(z) + (1 − μ)h(z) = h(z) for some μ ∈ (0, 1), which leads us to h(z) = μh(x) + (1 − μ)h(u) = μh(z) + (1 − μ)h(u) and hence to h(u) = √ h(z). This contradicts the assumption that g(u) = h(u) is not constant on any [a, b] ⊂ Ω with a  = b and thus completes the proof.  The next proposition reveals conditions ensuring that the minimal time function (6.1) is not constant on line segments in Rn . Proposition 8.34 Let both sets F and Ω in Rn be nonempty, closed, and strictly convex, and let in addition F be a bounded set containing 0 as an interior point. Then the convex

8.5

Generalized Sylvester Problems

265

function T F (·; Ω) is not constant on any line segment [a, b] ⊂ Rn such that a  = b and [a, b] ∩ Ω = ∅. Proof To prove the statement of the proposition, suppose on the contrary that there exists [a, b] ⊂ Rn with a  = b such that [a, b] ∩ Ω = ∅ and T F (x; Ω) = r for all x ∈ [a, b]. Choosing u ∈ Π F (a; Ω) and v ∈ Π F (b; Ω), we get ρ F (u − a) = ρ F (v − b) = r . Let us verify that u  = v. Indeed, the opposite gives us ρ F (u − a) = ρ F (u − b) = r , which implies by r > 0 that u − a

u − b

= ρF = 1, ρF r r u−a u−b and thus ∈ bd F and ∈ bd F. Since F is strictly convex, it yields r r a + b

1 u − a 1 u − b 1 + = u− ∈ int F. 2 r 2 r r 2 Hence we arrive at a contradiction by having

TF

a + b 2



a + b

; Ω ≤ ρF u − < r. 2

To finish the proof of the proposition, fix t ∈ (0, 1) and get     r = T F ta + (1 − t)b; Ω ≤ ρ F tu + (1 − t)v − (ta + (1 − t)b) ≤ tρ F (u − a) + (1 − t)ρ F (v − b) = r , which implies that tu + (1 − t)v ∈ Π F (ta + (1 − t)b; Ω). Hence we have by Proposition 8.15 that tu + (1 − t)v ∈ bd Ω, which contradicts the strict convexity of Ω and thus completes the proof (Fig. 8.7). 

Fig. 8.7 Minimal time function for strictly convex sets

266

8 Applications to Location Problems

The following proposition concerns behavior of the class of maximum functions over nonnegative and convex ones that encompasses the setting of the generalized Sylvester problem to establish the uniqueness result below. Proposition 8.35 Let ∅  = Ω ⊂ Rn be a closed convex set, and let h i : Ω → R be nonnegative continuous convex functions for i = 1, . . . , m. Define the constrained maximum function   φ(x) = max h 1 (x), . . . , h m (x) , x ∈ Ω, and suppose that φ(x) = r > 0 on [a, b] for some line segment [a, b] ⊂ Ω with a  = b. Then there exists a line segment [α, β] ⊂ [a, b] with α  = β and an index i 0 ∈ {1, . . . , m} such that we have h i0 (x) = r for all x ∈ [α, β]. Proof There is nothing to prove for m = 1. When m = 2, define   φ(x) = max h 1 (x), h 2 (x) , x ∈ Ω, and observe that the conclusion is obvious if h 1 (x) = r for all x ∈ [a, b]. Otherwise, we find x0 ∈ [a, b] with h 1 (x0 ) < r . By the continuity of h 1 , there exists [α, β] ⊂ [a, b], α  = β, with h 1 (x) < r for all x ∈ [α, β]. Thus h 2 (x) = r for all x in this subinterval. To proceed further by induction, assume that the conclusion holds for a positive integer m ≥ 2 and show that it also the holds for m + 1 functions. Defining   φ(x) = max h 1 (x), . . . , h m (x), h m+1 (x) , x ∈ Ω, we get φ(x) = max{h 1 (x), ψ1 (x)} with υ1 (x) = max{h 2 (x), . . . , h m+1 (x)}, and thus reduces the statement to the case of two functions. The rest of the proof using induction is straightforward.  Now we are ready to establish the uniqueness of optimal solutions for the generalized Sylvester problem (8.32). Theorem 8.36 In the setting of problem (8.32), suppose that the sets F and Ωi for i = 1, . . . , m are strictly convex. Then problem (8.32) has at most one optimal solution if and m Ωi contains at most one element. only if the set i=0 Proof Define K(x) = T 2 (x), x ∈ Rn , and consider the optimization problem minimize K(x) subject to x ∈ Ω0 .

(8.34)

8.5

Generalized Sylvester Problems

267

It is obvious that x ∈ Ω0 is an optimal solution to (8.32) if and only if it is an optimal m Ωi contains at most one solution to problem (8.34). We are going to prove that if i=0 element, then K is strictly convex on Ω0 , and hence problem (8.32) has at most one optimal solution. Indeed, suppose that K is not strictly convex on Ω0 . By Proposition 8.33, we find a line segment [a, b] ⊂ Ω0 with a  = b and a real number r ≥ 0 so that    T (x) = max T F (x; Ωi )  i = 1, . . . , m = r for all x ∈ [a, b]. m It is clear that r > 0, since otherwise [a, b] ⊂ i=0 Ωi , which contradicts the assumption that this set contains at most one element. Proposition 8.35 tells us that there exist a line segment [α, β] ⊂ [a, b] with α  = β and an index i 0 ∈ {1, . . . , m} such that

T F (x; Ωi0 ) = r for all x ∈ [α, β]. Since r > 0, we get that x ∈ / Ωi0 for all x ∈ [α, β], which contradicts the statement of Proposition 8.34. m Ωi Conversely, let problem (8.32) have at most one solution. Suppose that the set i=0 m contains more than one element. It is clear that i=0 Ωi is the solution set of (8.32) with the optimal value equal to zero. Thus this problem has more than one solution, a contradiction that completes the proof.  The following corollary combines the previous results and gives us sufficient conditions for the existence and uniqueness of optimal solutions to the generalized Sylvester problem (8.32). Corollary 8.37 In the setting of (8.32), suppose that the sets F and Ωi for i = 1, . . . , m m Ωi contains at most one element and if one of the sets among are strictly convex. If i=0 Ωi for i = 0, . . . , m is bounded, then problem (8.32) has a unique optimal solution. m Ωi contains at most one element, problem (8.32) has at most one optimal Proof Since i=0 solution by Theorem 8.36. Thus it has exactly one solution because the solution set is nonempty by Proposition 8.32.  Next we consider a special unconstrained case of (8.32) given by    minimize M(x) = max d(x; Ωi )  i = 1, . . . , m , x ∈ Rn ,

(8.35)

m for nonempty closed convex sets Ωi under the assumption that i=1 Ωi = ∅. Problem (8.35) corresponds to (8.32) with F = B ⊂ Rn . Denote the set of active indices for (8.35) by    I (x) := i ∈ {1, . . . , m}  M(x) = d(x; Ωi ) , x ∈ Rn .

268

8 Applications to Location Problems

Further, for each x ∈ / Ωi and i ∈ {1, . . . , m}, define the vectors ai (x) =

x − Π (x; Ωi ) d(x; Ωi )

(8.36)

where the projection Π (x; Ωi ) is unique. We clearly have M(x) > 0 for all i ∈ I (x) due m Ωi = ∅. This ensures that x ∈ / Ωi automatically for any to the imposed assumption i=1 i ∈ I (x). Let us now present necessary and sufficient optimality conditions for (8.35) based on general results of convex analysis and optimization developed above. Theorem 8.38 The vector x ∈ Rn is an optimal solution to problem (8.35) under the assumptions made if and only if we have the inclusion    x ∈ co Π (x; Ωi )  i ∈ I (x) . (8.37) In particular, for the case where Ωi = {ai }, i = 1, . . . , m, problem (8.35) has a unique solution x if and only if    x ∈ co ai  i ∈ I (x) . (8.38) Proof It follows from Propositions 3.6 and 3.27 that the condition  ∂d(x; Ωi ) 0 ∈ co i∈I (x)

is necessary and sufficient for the optimality of x in (8.35). The distance function subdifferentiation from Theorem 3.12 gives us the characterization    0 ∈ co ai (x)  i ∈ I (x) (8.39) via the singletons ai (x) calculated in (8.36). It is easy to observe that (8.39) holds if and  only if there are λi ≥ 0 for i ∈ I (x) such that i∈I (x) λi = 1 and 0=

 i∈I (x)

λi

x − ωi with ωi = Π (x; Ωi ). M(x)

This equation is clearly equivalent to   0= λi (x − ω i ), i.e., x = λi ω i , i∈I (x)

i∈I (x)

which in turn is equivalent to inclusion (8.37). Finally, for Ωi = {ai } as i = 1, . . . , m, the latter inclusion obviously reduces to (8.38). In this case, the problem has a unique optimal solution by Corollary 8.37. 

8.6

Solving Planar Smallest Intersection Ball Problems

269

To formulate the next result, we say that the ball B(x; r ) touches the set Ωi if the intersection set Ωi ∩ B(x; r ) is a singleton. Corollary 8.39 Consider problem (8.35) under the assumptions imposed above. Then any smallest intersecting ball touches at least two targets among the sets Ωi as i = 1, . . . , m. Proof First we verify that there exist at least two active indices in I (x) provided that x is a solution to (8.35). Suppose the contrary, i.e., I (x) = {i 0 }. Then it follows from (8.37) that x ∈ Π (x; Ωi0 ) and d(x; Ωi0 ) = M(x) > 0, a contradiction. Next we show that B(x; r ) with r = M(x) touches Ωi when i ∈ I (x). Indeed, if on the contrary there are u, v ∈ Ωi such that u, v ∈ B(x; r ) ∩ Ωi with u  = v, then, by taking into account that d(x; Ωi ) = r , we get u − x ≤ r = d(x; Ωi ) and v − x ≤ r = d(x; Ωi ), and so u, v ∈ Π (x; Ωi ). This contradicts the fact that the projector Π (x; Ωi ) is a singleton and thus completes the proof of the corollary. 

8.6

Solving Planar Smallest Intersection Ball Problems

The first goal of this section is to illustrate efficient applications of the results obtained in the preceding section to solving two nontrivial planar examples. Then we present a specification of the projected subgradient algorithm to the case of the generalized Sylvester problem with their subsequent implementations to solving two other planar examples. To begin with, consider problem (8.35) for three Euclidean balls in R2 . Example 8.40 Let x solve the smallest enclosing ball/circle problem generated by the three points ai ∈ R2 , i = 1, 2, 3. Corollary 8.39 tells us that there are either two or three active indices at x. If I (x) consists of two indices, say I (x) = {2, 3}, then x ∈ co {a2 , a3 } and x − a2  = x − a3  > x − a1  by Theorem 8.38. In this case, we have the conditions x=

a2 + a3 and a2 − a1 , a3 − a1  < 0. 2

If conversely a2 − a1 , a3 − a1  < 0, then the angle of the triangle formed by a1 , a2 , and a3 at the vertex a1 is obtuse; we include here the case where all the three points ai lie on a straight line. Then I ((a2 + a3 )/2) = {2, 3} and the point x = (a2 + a3 )/2 satisfies the

270

8 Applications to Location Problems

Fig. 8.8 Smallest intersecting ball problem for three balls

assumption of Theorem 8.38 and therefore solves the problem. Observe that in the latter case, the solution to (8.35) is the midpoint of the side opposite to the obtuse vertex. If none of the angles of the triangle formed by a1 , a2 , a3 is obtuse, then the active index set I (x) consists of three elements. In this case, x is the unique point satisfying the conditions   x ∈ co a1 , a2 , a3 and x − a1  = x − a2  = x − a3 . In the other words, x is the center of the circumscribing circle of the triangle (Fig. 8.8). The next example relates to the celebrated problem of Apollonius. Example 8.41 Let Ωi = B(ci , ri ) with ri > 0 and i = 1, 2, 3 be disjoint balls in R2 , and let x be the unique solution to the corresponding problem (8.35). Consider first the case where one of the line segments connecting two of the centers intersects the other balls, e.g., the line segment connecting c2 and c3 intersects Ω1 . Denote u 2 = c2 c3 ∩ bd Ω2 and u 3 = c2 c3 ∩ bd Ω3 , and let x be the midpoint of u 2 u 3 . Then I (x) = {2, 3} and we can apply Theorem 8.38 to see that x is the solution to the problem. It remains to consider the case where any line segment connecting two centers of the balls does not intersect the remaining ball. In this case, denote u 1 = c1 c2 ∩ bd Ω1 , v1 = c1 c3 ∩ bd Ω1 , u 2 = c2 c3 ∩ bd Ω2 , v2 = c2 c1 ∩ bd Ω2 , u 3 = c3 c1 ∩ bd Ω3 , v3 = c3 c2 ∩ bd Ω3 , a1 = c1 m 1 ∩ bd Ω1 , a2 = c2 m 2 ∩ bd Ω2 , a3 = c3 m 3 ∩ bd Ω3 , where m 1 is the midpoint of u 2 v3 , m 2 is the midpoint of u 3 v1 , and m 3 is the midpoint ◦ of u 1 v2 . Suppose that one of the angles u 2 a1 v3 , u 3 a2 v1 , u 1 a3 v2 is greater than 90 ; e.g.,

8.6

Solving Planar Smallest Intersection Ball Problems

271

◦ u 2 a1 v3 is greater than 90 . Then I (m 1 ) = {2, 3}, and thus x = m 1 is the unique solution to this problem by Theorem 8.38. If now we suppose that all of the aforementioned angles are less than or equal to 90◦ , then I (x) = 3 and the smallest ball we are looking for is the unique ball that touches three other balls. The given construction of this disc solves the classical problem of Apollonius.

Finally in this section, we present and illustrate the projected subgradient algorithm to solve numerically the generalized Sylvester problem (8.32). Theorem 8.42 In the setting of problem (8.32), suppose that the solution set S is nonempty. Fix x1 ∈ Ω0 and define the sequences of iterates by   xk+1 = Π xk − αk vk ; Ω0 , k ∈ N, where {αk } is a sequence of positive numbers, and where  if xk ∈ Ωi , N (x ; Ω ) ∩ C ∗  vk ∈  k i / Ωi − ∂ρ F (ωk − xk ) ∩ N (ωk ; Ωi ) if xk ∈ with ωk ∈ Π F (xk ; Ωi ) for some i ∈ I (xk ) and C ∗ := {v ∈ Rn | σ F (−v) ≤ 1}. Define the value sequence    Vk = min T (x j )  j = 1, . . . , k . Then the following assertions hold: (i) If the sequence {αk } satisfies the conditions in (7.44), then {Vk } converges to the optimal value V of the problem. Furthermore, we have the estimate  2 d(x1 ; S)2 + 2 ∞ k=1 αk 0 ≤ Vk − V ≤ , k 2 i=1 αk where  ≥ 0 is a Lipschitz constant of cost function T on Rn . (ii) If the sequence {αk } satisfies conditions (7.46), then {Vk } converges to the optimal value V and the sequence of iterates {xk } in (8.25) converges to an optimal solution x ∈ S of this problem. Proof Follows from the results of Sect. 4.3 applied to problem (8.32).



The two concluding examples illustrate the application of the projected subgradient algorithm of Theorem 8.42 in particular situations in the plane.


Example 8.43 Consider problem (8.35) in R2 with m target sets given by the planar squares

Ωi = S(ωi; ri) = [ω1i − ri, ω1i + ri] × [ω2i − ri, ω2i + ri], i = 1, . . . , m.

Denote the vertices of the ith square by v1i = (ω1i + ri, ω2i + ri), v2i = (ω1i − ri, ω2i + ri), v3i = (ω1i − ri, ω2i − ri), and v4i = (ω1i + ri, ω2i − ri), and write xk = (x1k, x2k). The Euclidean projection onto Ωi is illustrated in Fig. 8.9. Fix an index i ∈ I(xk) and choose the vectors vk in the subgradient algorithm of Theorem 8.42 by

vk = (xk − v1i)/‖xk − v1i‖ if x1k − ω1i > ri and x2k − ω2i > ri,
vk = (xk − v2i)/‖xk − v2i‖ if x1k − ω1i < −ri and x2k − ω2i > ri,
vk = (xk − v3i)/‖xk − v3i‖ if x1k − ω1i < −ri and x2k − ω2i < −ri,
vk = (xk − v4i)/‖xk − v4i‖ if x1k − ω1i > ri and x2k − ω2i < −ri,
vk = (0, 1) if |x1k − ω1i| ≤ ri and x2k − ω2i > ri,
vk = (0, −1) if |x1k − ω1i| ≤ ri and x2k − ω2i < −ri,
vk = (1, 0) if x1k − ω1i > ri and |x2k − ω2i| ≤ ri,
vk = (−1, 0) if x1k − ω1i < −ri and |x2k − ω2i| ≤ ri.

This allows us to run the subgradient algorithm for solving problem (8.32); one way to implement this choice of vk is sketched after Fig. 8.9.

Fig. 8.9 Euclidean projection to a square
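The selection of vk in Example 8.43 reduces to a simple componentwise test. The following Python fragment is a hedged illustration (the function name is ours): it returns vk for a given iterate xk, square center ωi, and radius ri, and can be plugged into the loop sketched after Theorem 8.42.

import numpy as np

def select_v_square(xk, omega_i, r_i):
    # Direction v_k for the square S(omega_i; r_i) from Example 8.43.
    xk = np.asarray(xk, dtype=float)
    omega_i = np.asarray(omega_i, dtype=float)
    d = xk - omega_i
    out1, out2 = abs(d[0]) > r_i, abs(d[1]) > r_i
    if out1 and out2:
        # Corner region: normalize the direction from the nearest vertex to x_k.
        vertex = omega_i + r_i * np.sign(d)
        w = xk - vertex
        return w / np.linalg.norm(w)
    if out2:
        return np.array([0.0, np.sign(d[1])])    # strip above or below the square
    if out1:
        return np.array([np.sign(d[0]), 0.0])    # strip to the right or left
    return np.zeros(2)                           # x_k lies in the square itself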


In the last example, we address the case of a non-Euclidean norm in the minimal time function appearing in the construction of (8.32).

Example 8.44 Consider problem (8.32) with F = {(x1, x2) ∈ R2 | |x1| + |x2| ≤ 1} and a nonempty closed convex target Ω. The generalized ball BF(x̄; t) centered at x̄ = (x̄1, x̄2) with radius t ≥ 0 has the diamond shape

BF(x̄; t) = {(x1, x2) ∈ R2 | |x1 − x̄1| + |x2 − x̄2| ≤ t}.

The distance from x̄ to Ω and the corresponding projection are given by

TF(x̄; Ω) = inf{t ≥ 0 | BF(x̄; t) ∩ Ω ≠ ∅},   (8.40)
ΠF(x̄; Ω) = BF(x̄; t) ∩ Ω with t = TF(x̄; Ω).

Let Ωi be the squares S(ωi; ri), i = 1, . . . , m, given in Example 8.43. Then problem (8.35) can be interpreted as follows: find a diamond-shaped set of smallest radius in R2 that intersects all m given squares. Using the same notation as in Example 8.43, we see that the vectors vk in Theorem 8.42 can be chosen by

Fig. 8.10 Minimal time function with diamond-shape dynamics


vk = (1, 1) if x1k − ω1i > ri and x2k − ω2i > ri,
vk = (−1, 1) if x1k − ω1i < −ri and x2k − ω2i > ri,
vk = (−1, −1) if x1k − ω1i < −ri and x2k − ω2i < −ri,
vk = (1, −1) if x1k − ω1i > ri and x2k − ω2i < −ri,
vk = (0, 1) if |x1k − ω1i| ≤ ri and x2k − ω2i > ri,
vk = (0, −1) if |x1k − ω1i| ≤ ri and x2k − ω2i < −ri,
vk = (1, 0) if x1k − ω1i > ri and |x2k − ω2i| ≤ ri,
vk = (−1, 0) if x1k − ω1i < −ri and |x2k − ω2i| ≤ ri.

This is illustrated by Fig. 8.10.
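In this diamond case the selection above is just the componentwise sign of the coordinates of xk − ωi that exceed ri in absolute value. A minimal Python fragment illustrating this observation (the function name is ours):

import numpy as np

def select_v_diamond(xk, omega_i, r_i):
    # Direction v_k from Example 8.44: sign of each coordinate of x_k - omega_i
    # whose absolute value exceeds r_i, and 0 in the remaining coordinates.
    d = np.asarray(xk, dtype=float) - np.asarray(omega_i, dtype=float)
    return np.where(np.abs(d) > r_i, np.sign(d), 0.0)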

8.7 Exercises for This Chapter

Exercise 8.1 Prove the converse implication in Lemma 8.3.

Exercise 8.2 Write a MATLAB/PYTHON program to implement the Weiszfeld algorithm.

Exercise 8.3 Let ai ∈ R for i = 1, . . . , m be m distinct real numbers. Solve the following optimization problem:

minimize f(x) = ∑_{i=1}^{m} |x − ai|, x ∈ R.

Exercise 8.4 Let Ωi = [ai, bi] ⊂ R for i = 1, . . . , m be m disjoint intervals. Solve the following optimization problem:

minimize f(x) = ∑_{i=1}^{m} d(x; Ωi), x ∈ R.

Exercise 8.5 In the setting of problem (8.14), prove that if at least one of the sets Ωi for i = 0, . . . , m is bounded, then the set of all optimal solutions of this problem is a nonempty compact convex set. Exercise 8.6 (i) Prove that any Euclidean ball B(x; r ), where x ∈ Rn and r > 0, is a strictly convex set. (ii) Prove that if f : Rn → R is a real-valued function that is strictly convex, then epi f is a strictly convex set in Rn+1 . (iii) Give an example of a convex set that is not strictly convex.


Exercise 8.7 Prove assertion (ii) of Theorem 8.18.

Exercise 8.8 Give a complete proof of Theorem 8.21.

Exercise 8.9 Write a MATLAB/PYTHON program to implement the subgradient algorithm for solving the problem: minimize f(x), x ∈ Rn, where f(x) = ∑_{i=1}^{m} d(x; Ωi) with Ωi = B(ci; ri) for i = 1, . . . , m.

Exercise 8.10 Complete the solution to the problem in Example 8.23. Write a MATLAB/PYTHON program to implement the algorithm.

Exercise 8.11 Give a simplified proof of Theorem 8.36 in the case where F is the closed unit ball of Rn.

Exercise 8.12 Write a MATLAB/PYTHON program to implement the subgradient algorithm for solving the problem: minimize f(x), x ∈ Rn, where f(x) = max{d(x; Ωi) | i = 1, . . . , m} with Ωi = B(ci; ri) for i = 1, . . . , m.

Exercise 8.13 Give a detailed proof of Theorem 8.42.

Exercise 8.14 Complete the solution to the problem in Example 8.43. Write a MATLAB/PYTHON program to implement the algorithm in this example.

Exercise 8.15 Complete the solution to the problem in Example 8.44. Write a MATLAB/PYTHON program to implement the algorithm in this example.

A Solutions and Hints for Selected Exercises

This final part of the book contains solutions/hints for selected exercises. Rather detailed solutions are given for several problems.

Exercises for Chap. 1

Exercise 1.2. (ii) Let Z be the set of all integers in R. Then Z + Z = Z ≠ 2Z.

Exercise 1.3. (i) Use Proposition 1.20. (ii) We only consider the case where m = 2 since the general case is similar. From the definition of cones, both sets Ω1 and Ω2 contain 0, so Ω1 ∪ Ω2 ⊂ Ω1 + Ω2. Then we have the inclusion co{Ω1 ∪ Ω2} ⊂ Ω1 + Ω2. To verify the reverse inclusion, fix any x ∈ Ω1 + Ω2 and get x = w1 + w2, where w1 ∈ Ω1 and w2 ∈ Ω2. It follows therefore that x = (2w1 + 2w2)/2 ∈ co{Ω1 ∪ Ω2}.


Exercise 1.4. Since Ω ⊂ co Ω and Ω is open, we see that Ω ⊂ int(co Ω). By Proposition 1.31, the set int(co Ω) is convex. Thus co Ω ⊂ int(co Ω), which shows that co Ω is open. Consider the set

Ω = {(x, y) ∈ R2 | y ≥ 1/|x|, x ≠ 0}.

Then Ω is closed, but co Ω is not closed.

Exercise 1.6. Compute the first derivative of each function. Then apply Theorem 1.52.

Exercise 1.8. (i) Use f(x, y) = xy, (x, y) ∈ R2. (ii) Take f1(x) = x and f2(x) = −x, x ∈ R.

Exercise 1.10. Let f1(x) = x and f2(x) = x^2, x ∈ R.

Exercise 1.13. Use Proposition 1.45. Note that the function f(x) = x^q, q ≥ 1, is convex on the interval [0, ∞).

Exercise 1.16. Consider the set Ω = {(x, y) ∈ R2 | y ≥ x^2} and x = 0.

Exercise 1.17. (i) Use the definition. (ii) Take f(x) = √|x|, x ∈ R.

Exercise 1.21. (i) Π(x; Ω) = x + ((b − ⟨a, x⟩)/‖a‖^2) a. (ii) Π(x; Ω) = max{x, 0} (componentwise maximum).

Exercise 1.25. (i) gph F = C. (ii) Consider the function f(x) = −√(1 − x^2) if |x| ≤ 1 and f(x) = ∞ otherwise. (iii) Use Proposition 1.83.


Exercise 1.27. Define F : Rn ⇒ R p by F(x) = K , x ∈ Rn . Then gph F is a convex subset of Rn × R p . The conclusion follows from Proposition 1.83.

Exercises for Chap. 2

Exercise 2.1. Suppose that

λv + ∑_{i=1}^{m} λi vi = 0, where λ + ∑_{i=1}^{m} λi = 0.

If λ ≠ 0, then −∑_{i=1}^{m} (λi/λ) = 1. It follows that

v = −∑_{i=1}^{m} (λi/λ) vi ∈ aff{v1, . . . , vm},

which is a contradiction. Thus λ = 0, and hence λi = 0 for i = 1, . . . , m because the elements v1, . . . , vm are affinely independent.

Exercise 2.2. Since Δm ⊂ Ω ⊂ aff Ω, we have aff Δm ⊂ aff Ω. To prove the reverse inclusion, it remains to show that Ω ⊂ aff Δm. Arguing by contradiction, suppose that there exists v ∈ Ω such that v ∉ aff Δm. It follows from Exercise 2.1 that the vectors v, v1, . . . , vm are affinely independent and all of them belong to Ω. This contradicts the condition dim Ω = m. The second equality can be proved similarly.

Exercise 2.5. (i) We obviously have aff Ω ⊂ aff Ω̄. Observe that the set aff Ω is closed and contains Ω̄, since aff Ω = w0 + L, where w0 ∈ Ω and L is the linear subspace parallel to aff Ω, which is a closed set. Thus Ω̄ ⊂ aff Ω and the conclusion follows. (ii) Choose δ > 0 so that B(x̄; δ) ∩ aff Ω ⊂ Ω. Since x̄ + t(x̄ − x) = (1 + t)x̄ + (−t)x is an affine combination of elements of aff Ω̄ = aff Ω, we have x̄ + t(x̄ − x) ∈ aff Ω. Thus u = x̄ + t(x̄ − x) ∈ Ω when t is small. (iii) It follows from (i) that ri Ω ⊂ ri Ω̄. Fix any x ∈ ri Ω̄ and x̄ ∈ ri Ω. By (ii), we have z = x + t(x − x̄) ∈ Ω̄ when t > 0 is small, and so x = z/(1 + t) + t x̄/(1 + t) ∈ (z, x̄) ⊂ ri Ω.

Exercise 2.6. Let us show that

aff(epi f) = aff(dom f) × R.   (A.1)

Fix any (x, γ) ∈ aff(epi f). Then there exist λi ∈ R and (xi, γi) ∈ epi f for i = 1, . . . , m with ∑_{i=1}^{m} λi = 1 such that

(x, γ) = ∑_{i=1}^{m} λi (xi, γi).


Since xi ∈ dom f, i = 1, . . . , m, we have x = ∑_{i=1}^{m} λi xi ∈ aff(dom f), and so

aff(epi f) ⊂ aff(dom f) × R.

To verify the reverse inclusion, fix any (x, γ) ∈ aff(dom f) × R and find xi ∈ dom f and λi ∈ R for i = 1, . . . , m with ∑_{i=1}^{m} λi = 1 and x = ∑_{i=1}^{m} λi xi. Define further αi = f(xi) ∈ R and α = ∑_{i=1}^{m} λi αi, for which (xi, αi) ∈ epi f and (xi, αi + 1) ∈ epi f. Thus we get

∑_{i=1}^{m} λi (xi, αi) = (x, α) ∈ aff(epi f) and ∑_{i=1}^{m} λi (xi, αi + 1) = (x, α + 1) ∈ aff(epi f).

Letting λ = α − γ + 1 gives us (x, γ) = λ(x, α) + (1 − λ)(x, α + 1) ∈ aff(epi f), which justifies the reverse inclusion in (A.1).

Exercise 2.10. Let Ω1 = {(x, y) ∈ R2 | y ≥ 1/x, x > 0} and Ω2 = {(x, y) ∈ R2 | y ≤ −1/x, x > 0}. Then Ω1 and Ω2 are nonempty closed convex sets, while their difference Ω1 − Ω2 = {(x, y) ∈ R2 | y > 0} is not closed.

Exercise 2.13. Observe that Ω1 ∩ [Ω2 − (0, λ)] = ∅ for any λ > 0.

Exercise 2.16. (i) Define the sets Ω1 = epi f and Ω2 = Rn × (−∞, α]. Then Lα(f) = P(Ω1 ∩ Ω2), where P : Rn × R → Rn is the projection mapping P(x, λ) = x for (x, λ) ∈ Rn × R. Using Corollary 2.46, the imposed assumption yields ri Ω1 ∩ ri Ω2 ≠ ∅. By Theorem 2.26(ii) and Theorem 2.28, we have ri Lα(f) = P(ri Ω1 ∩ ri Ω2), which therefore ensures the fulfillment of (2.27). (ii) Consider the constant function f(x) = α for x ∈ Rn.

Exercise 2.19. Consider the affine function ϕ : Rn → R given by ϕ(x) = ⟨a, x⟩ + b, x ∈ Rn, and define the set-valued mapping F : Rn ⇒ R by

F(x) = (−∞, ϕ(x)] if x ∈ Ω, and F(x) = ∅ if x ∉ Ω.

Then F is a convex set-valued mapping with dom F = Ω and gph F = Θ. It remains to apply Theorem 2.45.

Exercise 2.20. Fix x̄ ≤ 0 and v ∈ N(x̄; Ω). Then ⟨v, x − x̄⟩ ≤ 0 for all x ∈ Ω. Using this inequality for 0 ∈ Ω and 2x̄ ∈ Ω, we get ⟨v, x̄⟩ = 0. Since Ω = (−∞, 0] × · · · × (−∞, 0], it follows from Proposition 2.55(i) that v ≥ 0. The reverse inclusion is a direct consequence of the definition.

Exercise 2.23. Pick x̄ ∈ ∩_{i=1}^{m} Ωi. Define the set Ω = ∩_{i=1}^{m−1} Ωi. Then Ω is a nonempty convex set such that ri Ω ∩ ri Ωm = ∅. By Theorem 2.40 and Proposition 2.52, there exists 0 ≠ v ∈ N(x̄; Ω) ∩ (−N(x̄; Ωm)). Let vm = −v, α = ⟨v, x̄⟩, and αm = ⟨vm, x̄⟩. Then ⟨v, x⟩ ≤ α for all x ∈ Ω, ⟨vm, x⟩ ≤ αm for all x ∈ Ωm, v + vm = 0, and α + αm = 0. By the normal cone intersection rule, we have the representation v = ∑_{i=1}^{m−1} vi, where vi ∈ N(x̄; Ωi) for i = 1, . . . , m − 1. It remains to set αi = ⟨vi, x̄⟩ for i = 1, . . . , m − 1 to complete the proof.

Exercise 2.24. Suppose on the contrary that ri Θ1 ∩ ri Θ2 ≠ ∅ and find (x0, λ0) ∈ ri Θ1 ∩ ri Θ2. Since (x0, λ0) ∈ ri Θ1, we see that x0 ∈ ri Ω1 ⊂ Ω1 and 0 < λ0. Similarly, the representation of ri Θ2 gives us x0 ∈ ri Ω2 ⊂ Ω2 and λ0 < ⟨v, x0 − x̄⟩. Then x0 ∈ Ω1 ∩ Ω2, which implies that 0 < λ0 < ⟨v, x0 − x̄⟩ ≤ 0. This contradiction shows that ri Θ1 ∩ ri Θ2 = ∅.

Exercise 2.25. We prove the intersection formula under (ii). The conclusion holds for m = 2 due to Proposition 2.58 under (i) and u = x̄. Suppose that this holds for any k convex sets with k ≤ m and m ≥ 2. Consider the sets Ωi for i = 1, . . . , m + 1 satisfying the condition


[∑_{i=1}^{m+1} vi = 0, vi ∈ N(x̄; Ωi)] ⟹ [vi = 0 for all i = 1, . . . , m + 1].

Since 0 ∈ N(x̄; Ωm+1), this condition also holds for the first m sets. Thus

N(x̄; ∩_{i=1}^{m} Ωi) = ∑_{i=1}^{m} N(x̄; Ωi).

Then apply the induction hypothesis to Θ1 = ∩_{i=1}^{m} Ωi and Θ2 = Ωm+1.

Exercises for Chap. 3

Exercise 3.1. (i) Applying the subdifferential sum rule gives us

∂f(x) = {−2} if x < 1, [−2, 0] if x = 1, {0} if 1 < x < 2, [0, 2] if x = 2, {2} if x > 2.

Since dom f = R, we see that ∂^∞ f(x) = {0} for all x ∈ R.
(ii) We can show that

∂f(x) = {−e^{−x}} if x < 0, [−1, 1] if x = 0, {e^x} if x > 0,

and that ∂^∞ f(x) = {0} for all x ∈ R.
(iii) We can show that

∂f(x) = {2x} if |x| < 1, [2, ∞) if x = 1, (−∞, −2] if x = −1,
∂^∞ f(x) = {0} if |x| < 1, [0, ∞) if x = 1, (−∞, 0] if x = −1.

(vii) Applying the subdifferential maximum rule gives us

∂f(x) = {0} if ⟨a, x⟩ < b, {a} if ⟨a, x⟩ > b, [0, a] if ⟨a, x⟩ = b,

where [0, a] stands for the line segment connecting 0 and a. Since dom f = Rn, we see that ∂^∞ f(x) = {0} for all x ∈ Rn.


Exercise 3.3. Since v ∈ ∂f(x̄), we have v(x − x̄) ≤ f(x) − f(x̄) for all x ∈ R. Then pick any x1 < x̄ and use the inequality above.

Exercise 3.7. (i) ∂f(x) = {2A^T(Ax − b)} for all x ∈ Rn. (ii) By using the subdifferential chain rule, we have ∂f(x) = A^T B if Ax = b (with B the closed unit ball), and ∂f(x) = {A^T(Ax − b)/‖Ax − b‖} if Ax ≠ b.

Exercise 3.11. Take any v ∈ N(x̄; Lα(f)) and get ⟨v, x − x̄⟩ ≤ 0 whenever f(x) ≤ α = f(x̄). Define the two convex sets Ω1 = epi f and Ω2 = Rn × (−∞, α]. It is not hard to see that (v, 0) ∈ N((x̄, α); Ω1 ∩ Ω2). Since 0 ∉ ∂f(x̄), there exists x0 ∈ Rn such that f(x0) < f(x̄) = α. Since f is continuous on the interior of its domain, we see that (x0, α) ∈ (int Ω1) ∩ Ω2. It follows from Theorem 2.56 and Proposition 2.58 that (v, 0) = (v, −λ) + (0, λ), where (v, −λ) ∈ N((x̄, α); Ω1) and (0, λ) ∈ N((x̄, α); Ω2). We can check that λ ≥ 0. If λ = 0, then (v, 0) ∈ N((x̄, α); epi f), so v ∈ N(x̄; dom f) = {0}. Thus v = 0 ∈ ∪_{λ≥0} λ∂f(x̄). In the case where λ > 0, we have (v/λ, −1) ∈ N((x̄, f(x̄)); epi f), so v ∈ λ∂f(x̄). We have completed the proof of the inclusion "⊂". The reverse inclusion is derived directly from the definition. The reader can also use Corollary 3.40 to derive the result.

Exercise 3.12. Fix any a, b ∈ Rn with a ≠ b. By Theorem 3.31 there exists c ∈ (a, b) such that f(b) − f(a) = ⟨v, b − a⟩ for some v ∈ ∂f(c). Then |f(b) − f(a)| ≤ ‖v‖ · ‖b − a‖ ≤ ℓ‖b − a‖.

Exercise 3.15. Use (3.23) and Definition 2.48 of the normal cone.

Exercise 3.16. (i) Take any x1, x2 ∈ dom F and 0 < λ < 1. Then pick y1 ∈ F(x1) and y2 ∈ F(x2). Since F is a convex set-valued mapping, we get


λ(x1, y1) + (1 − λ)(x2, y2) ∈ gph F. This implies λy1 + (1 − λ)y2 ∈ F(λx1 + (1 − λ)x2). Thus λx1 + (1 − λ)x2 ∈ dom F, which justifies the convexity of dom F. (ii) Take any v ∈ D*F(x̄, ȳ)(0). Then (v, 0) ∈ N((x̄, ȳ); gph F). By the definition, ⟨v, x − x̄⟩ + ⟨0, y − ȳ⟩ ≤ 0 whenever (x, y) ∈ gph F. Taking any x ∈ dom F and choosing y ∈ F(x) tell us that ⟨v, x − x̄⟩ = ⟨v, x − x̄⟩ + ⟨0, y − ȳ⟩ ≤ 0. This implies that v ∈ N(x̄; dom F) and completes verifying the inclusion "⊂". The proof of the reverse inclusion is also straightforward.

Exercise 3.18. Define the mappings Fi(x) = E_{fi}(x) = [fi(x), ∞) for x ∈ Rn and i = 1, 2. Then F1 + F2 = E_{f1+f2}, dom Fi = dom fi, and D*Fi(x, fi(x))(1) = ∂fi(x) for all x ∈ dom fi. Take any x̄ ∈ dom f1 ∩ dom f2 and let ȳ = f1(x̄) + f2(x̄) ∈ F1(x̄) + F2(x̄). We can easily check the representation S(x̄, ȳ) = {(f1(x̄), f2(x̄))} for the mapping S defined in (3.24). Assuming that ri(dom f1) ∩ ri(dom f2) ≠ ∅, it follows from Theorem 3.37 that

∂(f1 + f2)(x̄) = D*(F1 + F2)(x̄, ȳ)(1) = D*F1(x̄, f1(x̄))(1) + D*F2(x̄, f2(x̄))(1) = ∂f1(x̄) + ∂f2(x̄),

which completes the proof of Theorem 3.18.

Exercise 3.19. (i) Pick any (x, y, z) ∈ Ω1 − Ω2 and get the representation (x, y, z) = (x1, y1, z1) − (x2, y2, z2), where (x1, y1, z1) ∈ Ω1 and (x2, y2, z2) ∈ Ω2. By the definition of the set Ω1, we have y1 ∈ F(x1), so y1 ∈ rge F. Similarly, y2 ∈ dom G. Thus y1 − y2 ∈ rge F − dom G, which implies that (x, y, z) ∈ Rn × (rge F − dom G) × Rq. This justifies the inclusion "⊂" in (3.32). To prove the reverse inclusion, pick any (x, y, z) ∈ Rn × (rge F − dom G) × Rq and get y = y1 − y2, where y1 ∈ rge F and y2 ∈ dom G. Pick x1 ∈ Rn such that y1 ∈ F(x1), and pick z2 ∈ Rq such that z2 ∈ G(y2). Then we have the representation


(x, y, z) = (x1, y1, z + z2) − (x1 − x, y2, z2). Since (x1, y1, z + z2) ∈ Ω1 and (x1 − x, y2, z2) ∈ Ω2, this justifies the reverse inclusion in (3.32).

Exercise 3.21. Since P is a convex polyhedron, there exist v1, . . . , vm ∈ Rn and α1, . . . , αm ∈ R such that

P = {x ∈ Rn | ⟨vi, x⟩ ≤ αi for all i = 1, . . . , m}.

Observe that λ ≤ ⟨v, x − x̄⟩ if and only if ⟨(−v, 1), (x, λ)⟩ ≤ ⟨v, −x̄⟩. Letting α = ⟨v, −x̄⟩, we have the representation

Q = {(x, λ) ∈ Rn+1 | ⟨(vi, 0), (x, λ)⟩ ≤ αi, ⟨(−v, 1), (x, λ)⟩ ≤ α for all i = 1, . . . , m}.

This clearly shows that Q is a convex polyhedron in Rn+1. Furthermore, suppose on the contrary that Q ∩ ri Θ ≠ ∅. Then there exists (x0, λ0) ∈ Q ∩ ri Θ. This tells us that x0 ∈ P ∩ Ω, λ0 > 0, and therefore 0 < λ0 ≤ ⟨v, x0 − x̄⟩ ≤ 0, which brings us to a contradiction and thus completes the proof.

Exercises for Chap. 4

Exercise 4.3. We can show that f*(v) = 0 if v = 0 and f*(v) = ∞ if v ≠ 0. Then f**(x) = 0 for all x ∈ R.

Exercise 4.6. (i) Since f : Rn → R is convex and positively homogeneous, we have ∂f(0) ≠ ∅ and f(0) = 0. In the case where v ∈ Ω = ∂f(0), we have ⟨v, x⟩ ≤ f(x) for all x ∈ Rn, where the equality holds if x = 0. Thus we calculate the Fenchel conjugate by

f*(v) = sup{⟨v, x⟩ − f(x) | x ∈ Rn} = 0.

If v ∉ Ω = ∂f(0), there exists x ∈ Rn satisfying ⟨v, x⟩ > f(x). This gives us by the homogeneity of f that

f*(v) = sup{⟨v, x⟩ − f(x) | x ∈ Rn} ≥ sup{⟨v, tx⟩ − f(tx) | t > 0} = ∞.

Exercise 4.7. Use first the representations

(δΩ)*(v) = sup{⟨v, x⟩ − δ(x; Ω) | x ∈ Rn} = sup{⟨v, x⟩ | x ∈ Ω} = σΩ(v), v ∈ Rn.


Applying then the Fenchel conjugate one more time leads us to

(σΩ)*(x) = sup{⟨x, v⟩ − σΩ(v) | v ∈ Rn} = δ(x; co Ω).

Exercise 4.11. Suppose that v ∈ ∂f(x̄). Then by Theorems 4.7 and 4.10, we have f(x̄) + f*(v) = ⟨v, x̄⟩, which yields f*(v) + (f*)*(x̄) = ⟨v, x̄⟩. Applying Theorem 4.7 again gives us x̄ ∈ ∂f*(v). The converse implication can be justified in the same way.

Exercise 4.14. (ii) f′(−1; d) = −∞ and f′(1; d) = ∞.

Exercise 4.15. Use the representation

x̄ + t2 d = (t2/t1)(x̄ + t1 d) + (1 − t2/t1) x̄ for fixed t1 < t2 < 0.

Exercise 4.16. (i) For d ∈ Rn, consider the function ϕ defined in Lemma 4.23. Then show that the inequalities λ < 0 < t imply that ϕ(λ) ≤ ϕ(t). (ii) It follows from (i) that ϕ(t) ≤ ϕ(1) for all t ∈ (0, 1). Similarly, ϕ(−1) ≤ ϕ(t) for all t ∈ (−1, 0). This easily yields the conclusion.

Exercise 4.19. Fix x̄ ∈ Rn and define P(x̄; Ω) = {w ∈ Ω | ‖x̄ − w‖ = μΩ(x̄)}. For any w ∈ Ω define the function fw(x) = ‖x − w‖, x ∈ Rn. Then we get from Theorem 4.36 that

∂μΩ(x̄) = co{∂fw(x̄) | w ∈ P(x̄; Ω)}.

If Ω is not a singleton, for any w ∈ P(x̄; Ω) we have ∂fw(x̄) = {(x̄ − w)/μΩ(x̄)}, and so

∂μΩ(x̄) = (x̄ − co P(x̄; Ω))/μΩ(x̄).

The case where Ω is a singleton is trivial.

Exercise 4.21. (i) Fix any v ∈ Rn with ‖v‖ ≤ 1 and any w ∈ Ω. Then

⟨x, v⟩ − σΩ(v) ≤ ⟨x, v⟩ − ⟨w, v⟩ = ⟨x − w, v⟩ ≤ ‖x − w‖ · ‖v‖ ≤ ‖x − w‖.

It follows therefore that

sup_{‖v‖≤1} {⟨x, v⟩ − σΩ(v)} ≤ d(x; Ω).


To verify the reverse inequality, observe first that it holds trivially if x ∈ Ω. In the case where x ∉ Ω, let w = Π(x; Ω) and u = x − w; thus ‖u‖ = d(x; Ω). Following the proof of Proposition 2.33, we obtain the estimate ⟨u, z⟩ ≤ ⟨u, x⟩ − ‖u‖^2 for all z ∈ Ω. Dividing both sides by ‖u‖ and letting v = u/‖u‖ gives σΩ(v) ≤ ⟨v, x⟩ − d(x; Ω), and so we arrive at the claimed inequality

d(x; Ω) ≤ sup_{‖v‖≤1} {⟨x, v⟩ − σΩ(v)}.

Exercises for Chap. 5

Exercise 5.1. Let f be Gâteaux differentiable at x̄ with f′_G(x̄) = v. Then

⟨f′_G(x̄), d⟩ = lim_{t→0} (f(x̄ + td) − f(x̄))/t = lim_{t→0+} (f(x̄ + td) − f(x̄))/t for all d ∈ Rn.

This implies by the definition of the directional derivative that f′(x̄; d) = ⟨v, d⟩ for all d ∈ Rn. Conversely, suppose that f′(x̄; d) = ⟨v, d⟩ for all d ∈ Rn. Then

f′(x̄; d) = ⟨v, d⟩ = lim_{t→0+} (f(x̄ + td) − f(x̄))/t for all d ∈ Rn.

Moreover, f′_−(x̄; d) = −f′(x̄; −d) = −⟨v, −d⟩ = ⟨v, d⟩ for all d ∈ Rn. Thus

f′_−(x̄; d) = ⟨v, d⟩ = lim_{t→0−} (f(x̄ + td) − f(x̄))/t for all d ∈ Rn.

This implies that f is Gâteaux differentiable at x̄ with f′_G(x̄) = v.

Exercise 5.2. (i) Since the function ϕ(t) = t^2 is convex and nondecreasing on [0, ∞), the function g : Rn → [0, ∞) is a convex function. We can show that ∂(g^2)(x) = 2g(x)∂g(x) for every x ∈ Rn, where g^2(x) = (g(x))^2, x ∈ Rn. It follows from Theorem 3.12 that

∂f(x) = 2d(x; Ω)∂d(x; Ω) = {0} if x ∈ Ω, and ∂f(x) = {2(x − Π(x; Ω))} otherwise.

Thus in any case ∂f(x) = {2(x − Π(x; Ω))} is a singleton, which yields the Fréchet differentiability of f at x by Theorem 5.2 with ∇f(x) = 2(x − Π(x; Ω)). (ii) Use Theorem 3.12 in the out-of-set case and Theorem 5.2.


Exercise 5.3. For every u ∈ F, the function fu(x) = ⟨Ax, u⟩ − (1/2)‖u‖^2, x ∈ Rn, is convex, and hence f is a convex function by Proposition 1.50. It follows from the equality ⟨Ax, u⟩ = ⟨x, A^T u⟩ that ∇fu(x) = A^T u, x ∈ Rn. Fix x ∈ Rn and observe that the function gx(u) = ⟨Ax, u⟩ − (1/2)‖u‖^2, u ∈ Rp, has a unique maximizer on F denoted by u(x). By Theorem 4.36, we have ∂f(x) = {A^T u(x)}, and thus f is Fréchet differentiable at x by Theorem 5.2.

Exercise 5.5. Consider the set Y = {d ∈ Rn | f′(x̄; d) = f′_−(x̄; d)} and verify that Y is a linear subspace of Rn containing the standard orthogonal basis {ei | i = 1, . . . , n}. Thus Y = Rn, which yields the Fréchet differentiability of f at x̄. Indeed, fix any v ∈ ∂f(x̄) and get by Theorem 4.27(iii) that f′(x̄; d) = ⟨v, d⟩ for all d ∈ Rn. Hence ∂f(x̄) is a singleton, which ensures that f is Fréchet differentiable at x̄ by applying Theorem 5.2.

Exercise 5.6. Use Corollary 3.10 and its proof.

Exercise 5.7. As an example, consider the function f(x) = x^4 on R.

Exercise 5.10. Fix any sequence {xk} ⊂ co Ω. By the Carathéodory theorem, find λk,i ≥ 0 and wk,i ∈ Ω for i = 0, . . . , n with

xk = ∑_{i=0}^{n} λk,i wk,i and ∑_{i=0}^{n} λk,i = 1 for every k ∈ N.

Then use the boundedness of {λk,i}k for i = 0, . . . , n and the compactness of Ω to extract a subsequence of {xk} converging to some element of co Ω.

Exercise 5.17. By definition, there exists x̂ ∈ Ω such that x̂ ∉ H. Observe that aff(Ω ∩ H) ⊂ aff Ω. Arguing by contradiction, suppose that dim(Ω ∩ H) = dim Ω. Then aff(Ω ∩ H) = aff Ω. This is a contradiction since x̂ ∈ aff Ω while x̂ ∉ aff(Ω ∩ H).

Exercise 5.20. By Corollary 2.58, we have N(x̄; Ω1 ∩ Ω2) = N(x̄; Ω1) + N(x̄; Ω2). Applying Theorem 5.25 tells us that

T(x̄; Ω1 ∩ Ω2) = [N(x̄; Ω1 ∩ Ω2)]° = [N(x̄; Ω1) + N(x̄; Ω2)]° = [N(x̄; Ω1)]° ∩ [N(x̄; Ω2)]° = T(x̄; Ω1) ∩ T(x̄; Ω2).


Exercises for Chap. 6

Exercise 6.1. (i) F∞ = {0} × R+. (ii) F∞ = F. (iii) F∞ = R+ × R+. (iv) F∞ = {(x, y) | y ≥ |x|}.

Exercise 6.3. It is obvious that Ω∞ ⊂ {d ∈ Rn | Ω + d ⊂ Ω}. To prove the reverse inclusion, fix any d ∈ Rn such that Ω + d ⊂ Ω. Fix further x0 ∈ Ω and get x0 + kd ∈ Ω for all k ∈ N ∪ {0}. Then for any λ > 0, we can find two integers k1, k2 ≥ 0 and 0 ≤ t ≤ 1 such that λ = tk1 + (1 − t)k2. Then x0 + λd = t(x0 + k1 d) + (1 − t)(x0 + k2 d) ∈ Ω by the convexity of Ω. Therefore, we arrive at d ∈ Ω∞.

Exercise 6.8. Choose γ > 0 such that B(0; γ) ⊂ F and find t > 0 sufficiently large with B(x; tγ) ∩ Ω ≠ ∅ (e.g., t = 1 + ‖x − x0‖/γ for some x0 ∈ Ω). Since B(x; tγ) = x + tB(0; γ) ⊂ x + tF, we see that (x + tF) ∩ Ω ≠ ∅ and thus verify that T_Ω^F(x) ≤ t < ∞.

Exercise 6.11. Suppose that F is bounded and choose γ > 0 such that F ⊂ B(0; γ). Using the Cauchy-Schwarz inequality yields B(0; 1/γ) ⊂ F°, and so 0 ∈ int F°. Now suppose that 0 ∈ int F° and get by Exercise 5.19 and Lemma 6.15 that F°° = co(F ∪ {0}) is bounded. This ensures that F is bounded.

Exercise 6.13. Suppose that t = ρF(x) ≤ 1. Then there exists a sequence {tk} ⊂ [0, ∞) such that tk → t as k → ∞ and x ∈ tk F. Then we can show that x ∈ tF by considering the two cases t = 0 and t > 0. Since 0 ∈ F and 0 ≤ t ≤ 1, we have x ∈ tF ⊂ F. Thus {x ∈ Rn | ρF(x) ≤ 1} ⊂ F. The reverse inclusion follows directly from the definition.

Exercise 6.14. It suffices to consider the case where x ≠ 0. Then x/ρF(x) ∈ F by Exercise 6.13. Thus we have ⟨v, x/ρF(x)⟩ ≤ σF(v). The rest of the proof follows from Lemma 6.16.

Exercise 6.15. (i) Observe that ρF(−x) = T_{{0}}^F(x) for all x ∈ Rn. Thus ∂ρF(x) = −∂T_{{0}}^F(−x) for all x ∈ Rn. Using Theorem 6.21 gives us ∂ρF(0) = F°.
(ii) Let r = ρF(x̄), and let Br = {x ∈ Rn | ρF(x) ≤ r}. Using Theorem 6.22 and the fact that σF = ρ_{F°} from Corollary 6.17, we have

∂ρF(x̄) = {v ∈ Rn | v ∈ N(x̄; Br), ρ_{F°}(v) = 1}.


It follows from Exercise 6.13 that Br = rF, and so N(x̄; Br) = N(x̄; rF) = N(x̄/r; F). Thus by the definition of the normal cone, we see that v ∈ N(x̄; Br) if and only if rσF(v) = ρF(x̄)ρ_{F°}(v) ≤ ⟨v, x̄⟩. Since the reverse inequality holds by Exercise 6.14, we have v ∈ N(x̄; Br) if and only if ρF(x̄)ρ_{F°}(v) = ⟨v, x̄⟩. The representation of ∂ρF(x̄) is now straightforward.

Exercise 6.18. Observe that

d_Ω^F(x) = inf_{w∈Ω} σF(w − x) = inf_{w∈Ω} ρ_{F°}(w − x) = T_Ω^{F°}(x) for all x ∈ Rn.

It remains to apply the related results for the minimal time function.

Exercises for Chap. 7

Exercise 7.1. Use the relationship U_λ^>(f) = [L_λ(f)]^c and Proposition 7.4.

Exercise 7.5. Consider the l.s.c. function f(x) = 0 if x ≠ 0 and f(x) = −1 if x = 0. Then the product f · f is not l.s.c.

Exercise 7.10. Consider the function f defined in (i). By the triangle inequality we have

f(x) = ∑_{i=1}^{m} ‖x − ai‖ ≥ ∑_{i=1}^{m} (‖x‖ − ‖ai‖) = m‖x‖ − γ, where γ = ∑_{i=1}^{m} ‖ai‖.

It follows that lim_{‖x‖→∞} f(x) = ∞, which yields the conclusion by Corollary 7.10 under (ii).

Exercise 7.13. Solve the equation ∇g(x) = 0, where ∇g(x) = ∑_{i=1}^{m} 2(x − ai), x ∈ Rn.


Exercise 7.14. Let Ω = {x ∈ Rn | Ax = b}. Then N(x; Ω) = {A^T λ | λ ∈ Rp} for x ∈ Ω. Corollary 7.16 tells us that x ∈ Ω is an optimal solution to this problem if and only if there exists λ ∈ Rp such that −2x = A^T λ and Ax = b. Solving this system of equations yields x = A^T(AA^T)^{-1} b.

Exercise 7.22. The MATLAB function sign(x) gives us a subgradient of the function f(x) = ‖x‖_1 for all x ∈ Rn. Considering the set Ω = {x ∈ Rn | Ax = b}, we have the projection formula

Π(x; Ω) = x + A^T(AA^T)^{-1}(b − Ax).

This can be found by minimizing the quadratic function ϕ(u) = ‖x − u‖^2 subject to u ∈ Ω.

Exercise 7.23. If 0 ∈ ∂f(x_{k0}) for some k0 ∈ N, then α_{k0} = 0, and hence xk = x_{k0} ∈ S and f(xk) = V for all k ≥ k0. Thus we assume without loss of generality that 0 ∉ ∂f(xk) for all k ∈ N. It follows from the proof of Proposition 7.40 that

0 ≤ ‖x_{k+1} − x‖^2 ≤ ‖x1 − x‖^2 − 2 ∑_{i=1}^{k} αi [f(xi) − V] + ∑_{i=1}^{k} αi^2 ‖vi‖^2

for any x ∈ S. Substituting here the formula for αk gives us

0 ≤ ‖x_{k+1} − x‖^2 ≤ ‖x1 − x‖^2 − 2 ∑_{i=1}^{k} [f(xi) − V]^2/‖vi‖^2 + ∑_{i=1}^{k} [f(xi) − V]^2/‖vi‖^2
= ‖x1 − x‖^2 − ∑_{i=1}^{k} [f(xi) − V]^2/‖vi‖^2 ≤ ‖x1 − x‖^2.

Since ‖vi‖ ≤ ℓ for every i, we get

∑_{i=1}^{k} [f(xi) − V]^2/ℓ^2 ≤ ∑_{i=1}^{k} [f(xi) − V]^2/‖vi‖^2 ≤ ‖x1 − x‖^2,

which implies in turn that

∑_{i=1}^{k} [f(xi) − V]^2 ≤ ℓ^2 ‖x1 − x‖^2.   (A.2)

Then the series ∑_{k=1}^{∞} [f(xk) − V]^2 is convergent. Hence we get f(xk) → V as k → ∞, which justifies (i). From (A.2) and the fact that Vk ≤ f(xi) for i = 1, . . . , k, we obtain the estimate k(Vk − V)^2 ≤ ℓ^2 ‖x1 − x‖^2. Thus

0 ≤ Vk − V ≤ ℓ‖x1 − x‖/√k.

Since x was taken arbitrarily in the set S, we arrive at assertion (ii).
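For Exercise 7.22, here is a hedged Python sketch combining the projection formula above with sign(x) as a subgradient of the ℓ1 norm. The function name, the step sizes c/k, and the iteration budget are our own choices rather than the book's.

import numpy as np

def l1_min_over_affine(A, b, num_iters=5000, c=1.0):
    # Projected subgradient method for minimizing ||x||_1 subject to Ax = b.
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    AAT_inv = np.linalg.inv(A @ A.T)
    project = lambda x: x + A.T @ (AAT_inv @ (b - A @ x))   # Pi(x; {Ax = b})
    x = project(np.zeros(A.shape[1]))                       # feasible starting point
    best = x.copy()
    for k in range(1, num_iters + 1):
        # sign(x) is an element of the subdifferential of ||.||_1 at x.
        x = project(x - (c / k) * np.sign(x))
        if np.linalg.norm(x, 1) < np.linalg.norm(best, 1):
            best = x.copy()
    return best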

Exercises for Chap. 8

Exercise 8.2. Define the function

f(x) = (∑_{i=1}^{m} ai/‖x − ai‖) / (∑_{i=1}^{m} 1/‖x − ai‖) if x ∉ {a1, . . . , am},

and f(x) = x if x ∈ {a1, . . . , am}. Use a FOR loop with a stopping criterion to find xk+1 = f(xk), where x0 is a starting point.

Exercise 8.3. Use the subdifferential Fermat rule. Consider the cases where m is odd and where m is even. In the first case, the problem has a unique optimal solution. In the second case, the problem has infinitely many solutions.

Exercise 8.4. For the interval Ω = [a, b] ⊂ R, the subdifferential formula for the distance function in Theorem 3.12 gives us

∂d(x; Ω) = {0} if x ∈ (a, b), [0, 1] if x = b, {1} if x > b, [−1, 0] if x = a, {−1} if x < a.

Then use this calculation and the subdifferential Fermat rule.
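A minimal Python sketch of this iteration for Exercise 8.2; the tolerance, iteration cap, and names are our own choices and not the book's code.

import numpy as np

def weiszfeld(points, x0, tol=1e-9, max_iter=10000):
    # Weiszfeld iteration x_{k+1} = f(x_k) for the Fermat-Torricelli problem.
    pts = np.asarray(points, dtype=float)
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = np.linalg.norm(pts - x, axis=1)
        if np.any(d < tol):                  # x coincides with one of the points a_i
            return x
        w = 1.0 / d
        x_new = (pts * w[:, None]).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < tol:  # stopping criterion
            return x_new
        x = x_new
    return x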


Exercise 8.9. For Ω = B(c; r), a subgradient of the function d(·; Ω) at x is

v(x) = 0 if x ∈ Ω, and v(x) = (x − c)/‖x − c‖ if x ∉ Ω.

Then a subgradient of the function f can be found by the subdifferential sum rule from Corollary 3.21.

Exercise 8.12. Use the subgradient formula in the hint of Exercise 8.9 together with Proposition 3.27. It is easy to find an index i ∈ I(x), and then see that any subgradient of the function fi(x) = d(x; Ωi) at x belongs to ∂f(x).
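Based on the subgradient formula in this hint, here is a hedged Python sketch of the program requested in Exercise 8.9 (the function name and the diminishing steps c/k are our own choices).

import numpy as np

def sum_dist_to_balls(centers, radii, x0, num_iters=5000, c=1.0):
    # Subgradient method for f(x) = sum_i d(x; B(c_i, r_i)).
    C = np.asarray(centers, dtype=float)
    r = np.asarray(radii, dtype=float)
    x = np.asarray(x0, dtype=float)
    best, best_val = x.copy(), np.inf
    for k in range(1, num_iters + 1):
        diff = x - C
        d = np.linalg.norm(diff, axis=1)
        val = np.sum(np.maximum(d - r, 0.0))
        if val < best_val:
            best, best_val = x.copy(), val
        outside = d > r
        # Sum of the subgradients (x - c_i)/||x - c_i|| over the balls not containing x.
        g = (diff[outside] / d[outside, None]).sum(axis=0)
        x = x - (c / k) * g
    return best, best_val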

References

1. H.H. Bauschke, P.L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces (Springer, Berlin, 2011); 2nd edition (Springer, 2017)
2. D.P. Bertsekas, A. Nedić, A.E. Ozdaglar, Convex Analysis and Optimization (Athena Scientific, 2003)
3. J.M. Borwein, A.S. Lewis, Convex Analysis and Nonlinear Optimization (Springer, Berlin, 2000)
4. J.M. Borwein, Q.J. Zhu, Techniques of Variational Analysis (Springer, Berlin, 2005)
5. R.I. Boţ, Conjugate Duality in Convex Optimization (Springer, Berlin, 2010)
6. S. Boyd, L. Vandenberghe, Convex Optimization (Cambridge University Press, Cambridge, 2004)
7. G. Colombo, P.R. Wolenski, Variational analysis for a class of minimal time functions in Hilbert spaces. J. Convex Anal. 11, 335–361 (2004)
8. Z. Drezner, H.W. Hamacher (eds.), Facility Location: Applications and Theory (Springer, Berlin, 2004)
9. J.-B. Hiriart-Urruty, C. Lemaréchal, Convex Analysis and Minimization Algorithms, vols. I and II (Springer, Berlin, 1993)
10. J.-B. Hiriart-Urruty, C. Lemaréchal, Fundamentals of Convex Analysis (Springer, Berlin, 2001)
11. H.W. Kuhn, A note on Fermat-Torricelli problem. Math. Program. 54, 98–107 (1973)
12. Y.S. Kupitz, H. Martini, M. Spirova, The Fermat-Torricelli problem, part I: a discrete gradient-method approach. Optim. Theory Appl. 158, 305–327 (2013)
13. S.R. Lay, Convex Sets and Their Applications (Wiley, 1982)
14. G.G. Magaril-Il'yaev, V.M. Tikhomirov, Convex Analysis: Theory and Applications (American Mathematical Society, 2003)
15. B.S. Mordukhovich, Variational Analysis and Generalized Differentiation, vols. I and II (Springer, Berlin, 2006)
16. B.S. Mordukhovich, Variational Analysis and Applications (Springer, Berlin, 2018)
17. B.S. Mordukhovich, N.M. Nam, Applications of variational analysis to a generalized Fermat-Torricelli problem. Optim. Theory Appl. 148, 431–454 (2011)
18. B.S. Mordukhovich, N.M. Nam, Convex Analysis and Beyond, vol. I (Springer, Berlin, 2022)
19. B.S. Mordukhovich, N.M. Nam, J. Salinas, Solving a generalized Heron problem by means of convex analysis. Amer. Math. Monthly 119, 87–99 (2012)


20. B.S. Mordukhovich, N.M. Nam, J. Salinas, Applications of variational analysis to a generalized Heron problem. Applic. Anal. 91, 1915–1942 (2012)
21. B.S. Mordukhovich, N.M. Nam, C. Villalobos, The smallest enclosing ball problem and the smallest intersecting ball problem: existence and uniqueness of solutions. Optim. Lett. 154, 768–791 (2012)
22. N.M. Nam, N. Hoang, A generalized Sylvester problem and a generalized Fermat-Torricelli problem. J. Convex Anal. 20, 669–687 (2013)
23. N.M. Nam, N. Hoang, N.T. An, Constructions of solutions to generalized Sylvester and Fermat-Torricelli problems for Euclidean balls. Optim. Theory Appl. 160, 483–509 (2014)
24. J.F. Nash, Noncooperative Games, Doctoral thesis (Princeton University, Princeton, 1950)
25. J.F. Nash, Equilibrium points in N-person games. Proc. Nat. Acad. Sci. 36, 48–49 (1950)
26. Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course (Kluwer Academic Publishers, 2004)
27. Y. Nesterov, Smooth minimization of nonsmooth functions. Math. Program. 103, 127–152 (2005)
28. Y. Nesterov, Lectures on Convex Optimization (Springer, Berlin, 2018)
29. B.T. Polyak, An Introduction to Optimization (Optimization Software, 1987)
30. R.T. Rockafellar, Convex Analysis (Princeton University Press, Princeton, 1970)
31. R.T. Rockafellar, Applications of convex variational analysis to Nash equilibrium, in Proceedings of the 7th International Conference on Nonlinear Analysis and Convex Analysis (Busan, Korea, 2011), pp. 173–183
32. R.T. Rockafellar, R.J.-B. Wets, Variational Analysis (Springer, Berlin, 1998)
33. A. Ruszczynski, Nonlinear Optimization (Princeton University Press, Princeton, 2006)
34. N.Z. Shor, Minimization Methods for Nondifferentiable Functions (Springer, Berlin, 1985)
35. J.J. Sylvester, A question in the geometry of situation. Quart. J. Math. 1, 79 (1857)
36. J. van Tiel, Convex Analysis: An Introductory Text (Wiley, 1984)
37. E. Weiszfeld, Sur le point pour lequel la somme des distances de n points donnés est minimum. Tôhoku Math. J. 43, 355–386 (1937)
38. C. Zălinescu, Convex Analysis in General Vector Spaces (World Scientific, 2002)
39. E. Zeidler, Applied Functional Analysis: Applications to Mathematical Physics (Springer, Berlin, 1997)

Index

A Affine combination, 46 Affine equality constraints, 203 Affine hull, 45 Affinely independent, 48 Affine mapping, 10 Affine set, 45 Apollonius problem, 270 Asymptotic cone, 171

B Biconjugacy equality, 126 Biconjugate function, 123 Bipolar of set, 163 Bolzano-Weierstrass theorem, 4 Boundary of set, 3 Brouwer fixed point theorem, 229

C Canonical form, 222 Carathéodory theorem, 155 Cauchy-Schwarz inequality, 2 Closed ball, 2 Closure of set, 3 Coderivative, 100 Coderivative chain rule, 104 Coderivative chain rule for polyhedral mappings, 115

Coderivative sum rule, 102 Collinearity condition, 238 Compact set, 4 Complementary slackness conditions, 217 Complement of set, 3 Completeness axiom, 4 Composite optimization, 207 Composition of multifunctions, 36 Concave function, 42 Concave maximization, 209 Cone, 40, 153 Conjugate calculus, 132 Conjugate chain rule, 135 Conjugate sum rule, 134 Constant dynamic set, 174 Convex combination, 9, 155 Convex cone, 40 Convex cone generated by a set, 154 Convex conic hull, 154 Convex function, 15 Convex hull, 12, 155 Convexification of function, 44 Convexity of composition, 19 Convex optimization, 198 Convex polyhedron, 109 Convex separation, 61 Convex separation for polyhedra, 110 Convex set, 7 Cost/objective function, 191



D Derivative, 85 Descent property, 243 Dimension of convex set, 48 Directional derivative, 137 Directional function, 140 Distance function, 28 Domain, 15 Dual Lagrangian, 209

E Enlargement of set, 177 Epigraph, 15, 194 Epigraphical multifunction, 34 Equilibrium point, 209 Euclidean projection, 29, 228 Existence of Nash equilibrium, 232 Existence of optimal solutions, 195 Extended-real-valued function, 191 Extremal principle, 66 Extremal system of sets, 66 Extreme point, 161

F Face of set, 168 Facility location problems, 237 Farkas lemma, 158, 165 Feasible solution, 201 Fenchel conjugate, 121 Fenchel duality in optimization, 204 Fenchel dual problem, 204 Fenchel-Moreau theorem, 126 Fenchel strong duality, 206 Fenchel weak duality, 205 Fermat rule, 84 Fermat-Torricelli problem, 237 Fixed point, 232 Fréchet derivative, 85 Fréchet differentiability, 151 Full rank condition, 208 Functional constraints, 201

G Gâteaux derivative, 85 Gâteaux differentiability, 151 Generalized ball, 262

Index Generalized distance function, 189 Generalized Fermat-Torricelli problem, 246 Generalized Heron problem, 246 Generalized Jensen inequality, 18 Generalized planar Fermat-Torricelli problem, 256 Generalized projection, 189, 246 Generalized Sylvester problem, 262 Geometric constraints, 201 Graph of multifunction, 33

H Helly theorem, 159 Horizon cone, 171 Horizon subdifferential, 89 Hyperplane, 58 Hypograph, 234

I Indicator function, 85 Indicator mapping, 102 Inequality constraints, 200 Infimal convolution, 39, 129 Infimum, 5 Interior of set, 3 Inverse image, 39 Inverse of set-valued mapping, 37

J Jensen inequality, 14

K Karush-Kuhn-Tucker conditions, 202 Krein-Milman theorem, 163

L Lagrange dual function, 209 Lagrange function, 201 Lagrange multiplier rule, 203 Lagrangian, 201, 209 Lagrangian duality in optimization, 209 Lagrangian dual problem, 211 Lagrangian strong duality, 211 Lagrangian weak duality, 211 Linear programming, 221

Index Linear subspace, 47 Lipschitz continuity, 181 Locally Lipschitz function, 25 Lower semicontinuous function, 191 M Marginal function, 37 Maximum function, 19 Mean value theorem, 23 Minimal time function, 174 Minimax problem, 263 Minkowski gauge, 179, 249 Minkowski theorem, 163 Mixed strategy, 229 Monotone mapping, 168 Moreau-Rockafellar theorem, 93 N Nash equilibrium, 229 Necessary optimality conditions, 198 Noncooperative, 229 Normal cone, 70 Normal cone intersection rule, 74 Normal cone intersection rule for polyhedral sets, 111 Normal cone to inverse image, 105 O Open ball, 2 Optimal solution, 201 Optimal value function, 37, 106 P Parallel subspace, 47 Pareto equilibrium, 229 Payoff function, 229 Payoff matrix, 230 Planar smallest intersection ball problems, 269 Polar of set, 163 Polyhedral function, 112 Polyhedral mapping, 114 Polyhedral multifunction, 109 Polyhedral set, 109 Positively homogeneous function, 42 Positive semidefinite matrix, 17 Preimage, 105

299 Primal minimization problem, 204, 209 Projected subgradient method, 228 Proper function, 15 Proper separation, 61, 64 Proper supporting hyperplane, 161 Pure strategy, 231 Q Quasi-convex function, 42 R Radon theorem, 159 Relative boundary, 161 Relative interior, 50 Relative interior qualification condition, 91 Relative interior Slater condition, 214 Right position, 256 S Saddle point, 209 Set extremality, 61 Set separation, 59 Set-valued mapping/multifunction, 33 Simplex, 49 Simplex method, 221 Singular subdifferential, 89 Slater constraint qualification, 202 Smallest enclosing circle problem, 262 Span, 48 Strictly convex function, 17, 238, 264 Strictly convex set, 247 Strictly monotone mapping, 168 Strict separation, 59 Strongly convex function, 167 Strongly monotone mapping, 168 Subadditive function, 42 Subdifferential, 81 Subdifferential chain rule, 94, 108 Subdifferential chain rule for polyhedral functions, 113 Subdifferential maximum rule, 96 Subdifferential mean value theorem, 98 Subdifferential of optimal value function, 106 Subdifferential of supremum function, 146 Subdifferential sum rule, 91 Subdifferential sum rule for polyhedral functions, 112

300 Subdifferentiation of minimal time functions, 183, 185 Subgradient, 81 Subgradient algorithm, 223 Subgradients of Lipschitz functions, 222 Sublevel set, 42, 79, 193 Sufficient optimality conditions, 198 Summation of sets, 4 Support function, 127 Supporting hyperplane, 161 Support intersection rule, 129 Supremum, 5 Supremum function, 21, 142 Sylvester problem, 262 T Tangent cone, 164

Index Target set, 174 Torricelli point, 241 Two-person games, 229

U Unconstrained optimization, 198 Upper semicontinuous function, 143

V Value convergence, 225

W Weierstrass existence theorem, 6 Weiszfeld algorithm, 241, 274