Introduction to the Theory of Optimization in Euclidean Space is intended to provide students with a robust introduction to the subject.
English, 334 pages, 2019
Introduction to the Theory of Optimization in Euclidean Space
Series in Operations Research
Series Editors: Malgorzata Sterna, Marco Laumanns

About the Series
The CRC Press Series in Operations Research encompasses books that contribute to the methodology of Operations Research and apply advanced analytical methods to help make better decisions. The scope of the series is wide, including innovative applications of Operations Research that describe novel ways to solve real-world problems, with examples drawn from industrial, computing, engineering, and business applications. The series explores the latest developments in Theory and Methodology, and presents original research results contributing to the methodology of Operations Research and to its theoretical foundations. Featuring a broad range of reference works, textbooks and handbooks, the books in this series will appeal not only to researchers, practitioners and students in the mathematical community, but also to engineers, physicists, and computer scientists. The inclusion of real examples and applications is highly encouraged in all of our books.

Rational Queueing, Refael Hassin
Introduction to the Theory of Optimization in Euclidean Space, Samia Challal

For more information about this series please visit: https://www.crcpress.com/Chapman--HallCRC-Series-in-Operations-Research/book-series/CRCOPSRES
Introduction to the Theory of Optimization in Euclidean Space
Samia Challal Glendon College-York University Toronto, Canada
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2020 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works
Printed on acid-free paper
International Standard Book Number-13: 978-0-367-19557-1 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com
To my parents
Contents

Preface ix
Acknowledgments xi
Symbol Description xiii
Author xv

1 Introduction 1
  1.1 Formulation of Some Optimization Problems 1
  1.2 Particular Subsets of Rⁿ 8
  1.3 Functions of Several Variables 20

2 Unconstrained Optimization 49
  2.1 Necessary Condition 49
  2.2 Classification of Local Extreme Points 71
  2.3 Convexity/Concavity and Global Extreme Points 93
    2.3.1 Convex/Concave Several Variable Functions 93
    2.3.2 Characterization of Convex/Concave C¹ Functions 95
    2.3.3 Characterization of Convex/Concave C² Functions 98
    2.3.4 Characterization of a Global Extreme Point 102
  2.4 Extreme Value Theorem 117

3 Constrained Optimization-Equality Constraints 135
  3.1 Tangent Plane 137
  3.2 Necessary Condition for Local Extreme Points-Equality Constraints 151
  3.3 Classification of Local Extreme Points-Equality Constraints 167
  3.4 Global Extreme Points-Equality Constraints 187

4 Constrained Optimization-Inequality Constraints 203
  4.1 Cone of Feasible Directions 204
  4.2 Necessary Condition for Local Extreme Points-Inequality Constraints 220
  4.3 Classification of Local Extreme Points-Inequality Constraints 251
  4.4 Global Extreme Points-Inequality Constraints 271
  4.5 Dependence on Parameters 292

Bibliography 315
Index 317
Preface
The book is intended to provide students with a useful background in optimization in Euclidean space. Its primary goal is to demystify the theoretical aspect of the subject.
In presenting the material, we refer first to the intuitive idea in one dimension, then make the jump to n dimensions as naturally as possible. This approach allows the reader to focus on understanding the idea, postpone the proofs for later, and learn to apply the theorems through examples and problem solving. A detailed solution follows each problem, providing both an illustration and a deepening of the theory. These solved problems offer a repetition of the basic principles, a review of some difficult concepts and a further development of some ideas.
Students are taken progressively through the development of the proofs, where they have the occasion to practice the tools of differentiation (chain rule, Taylor formula) for functions of several variables in abstract situations. They learn to apply important results established in advanced algebra and analysis courses, such as the Farkas-Minkowski lemma, the implicit function theorem and the extreme value theorem.
The book starts, in Chapter 1, with a short introduction to mathematical modeling leading to the formulation of optimization problems. Each formulation involves a function and a set of points. Thus, basic properties of open, closed and convex subsets of Rⁿ are discussed. Then, the usual topics of differential calculus for functions of several variables are reviewed.
In the following chapters, the study is devoted to the optimization of a function of several variables f over a subset S of Rⁿ. Depending on the particularity of this set, three situations are identified. In Chapter 2, the set S has a nonempty interior; in Chapter 3, S is described by an equation g(x) = 0; and in Chapter 4, by an inequality g(x) ≤ 0, where g is a function of several variables. In each case, we try to answer the following questions:
– If an extreme point exists, then where is it located in S? Here, we look for necessary conditions that yield candidate points for optimality. We make the distinction between local and global points.
– Among the local candidate points, which of them are local maximum or local minimum points? Here, we establish sufficient conditions to identify a local candidate point as an extreme point.
– Now, among the local extreme points found, which ones are global extreme points? Here, the convexity/concavity property intervenes for a positive answer.
Finally, we explore how the extreme value of the objective function f is affected when some parameters involved in the definition of the functions f or g change slightly.
Acknowledgments
I am very grateful to my colleagues David Spring, Mario Roy and Alexander Nenashev for introducing the course on optimization to our math program for the first time and for giving me the opportunity to teach it. I especially thank Professor Vincent Hildebrand, Chair of the Economics Department, for the useful discussions during the planning of the course content to support students majoring in Economics. My thanks are also due to Sarfraz Khan and Callum Fraser from the Taylor and Francis Group, to the reviewers for their invaluable help, and to Shashi Kumar for the expert technical support. I have relied on the various authors cited in the bibliography, and I am grateful to all of them. Many exercises are drawn or adapted from the cited references for their aptitude to reinforce the understanding of the material.
Symbol Description

∀  For all, or for each
∃  There exists
∃!  There exists a unique
∅  The empty set
s.t.  Subject to
S̊  Interior of the set S
S̄  Closure of the set S
∂S  Boundary of the set S
∁S  The complement of S
i, j, k  i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1), the standard basis of R³
rank A  Rank of the matrix A
det A  Determinant of the matrix A
Ker A  = {x : Ax = 0}, kernel of the matrix A
‖A‖  = (Σ_{i,j=1}^{n} a_{ij}²)^{1/2}, norm of the matrix A = (a_{ij})_{i,j=1,...,n}
ᵗh  = (h₁ ... hₙ), transpose of the column vector h with components h₁, ..., hₙ
B_r(x₀)  Ball centered at x₀ with radius r
B_r(x₀)  Bordered Hessian of order r at x₀
⟨· , ·⟩ or [· , ·]  Brackets for vectors
∇f  Gradient of f
x*  Column vector with components x₁*, ..., xₙ*, sometimes identified with the point (x₁*, ..., xₙ*)
‖x‖  = (x₁² + x₂² + ... + xₙ²)^{1/2}, norm of the vector x
M_{m n}  Set of matrices of m rows and n columns
A  = (a_{ij}), i = 1, ..., m, j = 1, ..., n, an m × n matrix
ᵗh.x*  = Σ_{k=1}^{n} h_k x_k*, dot product of the vectors h and x*
C¹(D)  Set of continuously differentiable functions on D
Cᵏ(D)  Set of continuously differentiable functions on D up to the order k
C^∞(D)  Set of continuously differentiable functions on D for any order k
Hf(x)  = (f_{x_i x_j})_{n×n}, Hessian of f
D_k(x)  Leading minor of order k of the Hessian Hf, the determinant of the k × k matrix (f_{x_i x_j})_{i,j=1,...,k}
Author
Samia Challal is an assistant professor of Mathematics at Glendon College, the bilingual campus of York University. Her research interests include homogenization, optimization, free boundary problems, partial differential equations and problems arising from mechanics.
Chapter 1 Introduction
Optimization problems arise in different domains. In Section 1.1 of this chapter, we introduce some applications and learn how to model a situation as an optimization problem. The points where an optimal quantity is attained are sought in subsets that can be one dimensional, multi-dimensional, open, closed, bounded or unbounded. We devote Section 1.2 to the study of some topological properties of such subsets of Rⁿ. Finally, since the phenomena analyzed are often complex, because of the many parameters involved, an introduction to functions of several variables is required; these are studied in Section 1.3.
1.1 Formulation of Some Optimization Problems

The purpose of this short section is to show, through some examples, the main elements involved in an optimization problem.

Example 1. Different ways of modeling a problem. To minimize the material used in manufacturing a closed can with a volume capacity of V units, we need to choose a suitable radius for the container.
i) Show how to make this choice without finding the exact radius.
ii) How should the radius be chosen if the volume V may vary from one liter to two liters?
Solution: Denote by h and r the height and the radius of the can, respectively. Then, the area and the volume of the can are given by

area = A = 2πr² + 2πrh,    volume = V = πr²h.

i) * Since h = V/(πr²), the area can be expressed as a function of r alone, and the problem is reduced to finding r ∈ (0, +∞) for which A is minimum:

minimize A = A(r) = 2πr² + 2V/r over the set S,
S = (0, +∞) = {r ∈ R : r > 0}.

Note that the set S, as shown in Figure 1.1, is an open unbounded interval of R.
FIGURE 1.1: S = (0, +∞) ⊂ R
** We can also express the problem as follows:

minimize A(r, h) = 2πr² + 2πrh over the set S,
S = {(r, h) ∈ R⁺ × R⁺ : πr²h = V}.

Here, the set S is a curve in R² and is illustrated in Figure 1.2 below:

FIGURE 1.2: S is the curve h = π⁻¹/r² in the plane (V = 1 liter)
ii) If we allow more possibilities for the volume, for example 1 ≤ V ≤ 2, then we can formulate the problem as a two-dimensional problem:

minimize A(r, h) = 2πr² + 2πrh over the set S,
S = {(r, h) ∈ R⁺ × R⁺ : 1/(πr²) ≤ h ≤ 2/(πr²)}.

The set S is the plane region, in the first quadrant, between the curves h = 2/(πr²) and h = 1/(πr²) (see Figure 1.3).

FIGURE 1.3: S is a plane region between two curves
A three-dimensional formulation of the same problem is

minimize A(r, h, V) = 2πr² + 2V/r over the set S,
S = {(r, h, V) ∈ R⁺ × R⁺ × R⁺ : πr²h = V, 1 ≤ V ≤ 2},

where the set S ⊂ R³ is the part of the surface V = πr²h located between the planes V = 1 and V = 2 in the first octant; see Figure 1.4.

FIGURE 1.4: S is a surface in the space
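The one-variable formulation can be checked numerically. The sketch below (in Python; it is not part of the book, and the value V = 1 and the search grid are assumptions for illustration) compares the critical point of A(r) = 2πr² + 2V/r, obtained from A′(r) = 4πr − 2V/r² = 0, with a brute-force search over a truncation of S = (0, +∞):

```python
import math

def can_area(r, V=1.0):
    """Surface area A(r) = 2*pi*r^2 + 2*V/r of a closed can of volume V."""
    return 2 * math.pi * r ** 2 + 2 * V / r

def critical_radius(V=1.0):
    """Solve A'(r) = 4*pi*r - 2*V/r^2 = 0, i.e. r^3 = V/(2*pi)."""
    return (V / (2 * math.pi)) ** (1 / 3)

r_star = critical_radius()
# Brute-force search over a truncation (0.01, 3] of S = (0, +infinity).
grid = [k / 100 for k in range(1, 301)]
r_grid = min(grid, key=can_area)
print(r_star, r_grid)   # the two radii agree to within the grid step
```

For V = 1 both give r ≈ 0.54; the corresponding height is h = V/(πr²) = 2r, the familiar proportion for the material-minimizing can.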
Example 2. Too many variables and linear inequalities. Diet Problem. * One can buy four types of food, where the nutritional content per unit weight of each food and its price are shown in Table 1.1 [5]. The diet problem consists of obtaining, at the minimum cost, at least twelve calories and seven vitamins.

            type1  type2  type3  type4
calories      2      1      0      1
vitamins      3      4      3      5
price         2      2      1      8

TABLE 1.1: A diet problem with four variables
Solution: Let ui be the weight of the food of type i. The total price of the four foods consumed is given by

f(u1, u2, u3, u4) = 2u1 + 2u2 + u3 + 8u4.

To ensure that at least twelve calories and seven vitamins are included, we can express these conditions by writing

2u1 + u2 + u4 ≥ 12    and    3u1 + 4u2 + 3u3 + 5u4 ≥ 7.

Hence, the problem would be

minimize f(u1, u2, u3, u4) over the set
S = {(u1, u2, u3, u4) ∈ R⁴ : 2u1 + u2 + u4 ≥ 12, 3u1 + 4u2 + 3u3 + 5u4 ≥ 7}.
** The above problem becomes more complex if more factors (fat, proteins) and types of food (steak, potatoes, fish, ...) were to be considered. For example, from Table 1.2, we deduce that the total price of the seven foods consumed is

p(u1, u2, u3, u4, u5, u6, u7) = 2u1 + 2u2 + u3 + 8u4 + 12u5 + 10u6 + 8u7.

            type1  type2  type3  type4  type5  type6  type7
protein       3      1      2      7      8      5     10
fat           0      1      0      8     15     10      6
calories      2      1      0      1      5      7      9
vitamins      3      4      3      5      1      2      5
price         2      2      1      8     12     10      8

TABLE 1.2: A diet problem with seven variables

To ensure that at least twelve calories, seven vitamins and twenty proteins are included, and less than fifteen fats are consumed, the problem would be formulated as

minimize p(u1, u2, u3, u4, u5, u6, u7) over the set
S = {(u1, u2, u3, u4, u5, u6, u7) ∈ R⁷ :
  3u1 + u2 + 2u3 + 7u4 + 8u5 + 5u6 + 10u7 ≥ 20,
  u2 + 8u4 + 15u5 + 10u6 + 6u7 ≤ 15,
  2u1 + u2 + u4 + 5u5 + 7u6 + 9u7 ≥ 12,
  3u1 + 4u2 + 3u3 + 5u4 + u5 + 2u6 + 5u7 ≥ 7}.
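The four-variable diet problem is a linear program, and a linear programming solver would normally be used. Since the feasible weights here are small, however, a coarse brute-force search already locates the minimizer. The sketch below is not from the book; the grid range and step are assumptions for illustration:

```python
from itertools import product

def cost(u):
    u1, u2, u3, u4 = u
    return 2 * u1 + 2 * u2 + u3 + 8 * u4

def feasible(u):
    u1, u2, u3, u4 = u
    return (2 * u1 + u2 + u4 >= 12                    # at least twelve calories
            and 3 * u1 + 4 * u2 + 3 * u3 + 5 * u4 >= 7)  # at least seven vitamins

# Brute-force search over a coarse grid of weights 0, 0.5, ..., 10.
grid = [0.5 * k for k in range(0, 21)]
best = min((u for u in product(grid, repeat=4) if feasible(u)), key=cost)
print(best, cost(best))   # (6.0, 0.0, 0.0, 0.0) 12.0
```

The minimizer puts all the weight on type 1, the cheapest source of calories; the calorie constraint is binding while the vitamin constraint is slack.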
Example 3. Too many variables and nonlinearities. * A company uses x units of capital and y units of labor to produce xy units of a manufactured good. Capital can be purchased at $3/unit and labor can be purchased at $2/unit. A total of $6 is available to purchase capital and labor. How can the firm maximize the quantity of the good that can be manufactured?

Solution: We need to maximize the quantity xy on the set of points (see Figure 1.5)

S = {(x, y) ∈ R² : 3x + 2y ≤ 6, x ≥ 0, y ≥ 0}.

FIGURE 1.5: S is a triangular region in the plane
The set S is the triangular plane region bounded by the sides L1, L2 and L3, defined by:

L1 = {(x, 0) : 0 ≤ x ≤ 2},  L2 = {(0, y) : 0 ≤ y ≤ 3},  L3 = {(x, (6 − 3x)/2) : 0 ≤ x ≤ 2}.

Here, the objective function f(x, y) = xy is nonlinear and the set S is described by linear inequalities.

** Such a model may work for a certain production process. However, it may not reflect the situation when other factors involved in the production process cannot be ignored. Therefore, new models have to be considered. For example [7]:

- Production in the Canadian manufacturing industries for 1927 is estimated by P(l, k) = 33 l^0.46 k^0.52, where P is product, l is labor and k is capital.

- The production P for dairy farming in Iowa (1939) is estimated by P(A, B, C, D, E, F) = A^0.27 B^0.01 C^0.01 D^0.23 E^0.09 F^0.27, where A is land, B is labor, C is improvements, D is liquid assets, E is working assets and F is cash operating expenses.

Each of these nonlinear production functions P is optimized over a suitable set S that describes the elements involved.
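In the two-variable problem, since xy increases in each variable, the maximum must occur on the budget line 3x + 2y = 6, which reduces the problem to one variable. A quick numerical confirmation (a sketch in Python, not part of the book):

```python
def product_on_budget(x):
    """xy along the budget line y = (6 - 3x)/2, for 0 <= x <= 2."""
    return x * (6 - 3 * x) / 2

# The derivative (6 - 6x)/2 vanishes at x = 1; a grid search agrees.
xs = [k / 1000 for k in range(0, 2001)]
x_best = max(xs, key=product_on_budget)
y_best = (6 - 3 * x_best) / 2
print(x_best, y_best, product_on_budget(x_best))   # 1.0 1.5 1.5
```

The firm buys one unit of capital and one and a half units of labor, exhausting the budget, and produces 1.5 units of the good.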
As seen above, the main purpose of this study is to find a solution to the following optimization problems:

find u ∈ S such that f(u) = min_{v ∈ S} f(v)

or

find u ∈ S such that f(u) = max_{v ∈ S} f(v),

where f : S ⊂ Rⁿ → R is a given function and S a given subset of Rⁿ. It is obvious that establishing existence and uniqueness results for the extreme points depends on properties satisfied by the set S and the function f. So, we need to know some categories of subsets of Rⁿ, as well as some calculus of multi-variable functions. But first, look at the following remark:
Remark 1.1.1 The extreme point may not exist on the set S. In our study, we will explore the situations where min_S f and max_S f are attained in S.

For example, min_{(0,1)} x² does not exist. Indeed, suppose there exists x₀ ∈ (0, 1) such that f(x₀) = min_{(0,1)} f(x). Then 0 < x₀/2 < x₀ < 1 and f(x₀/2) = x₀²/4 < x₀² = f(x₀), which contradicts the minimality of f(x₀).

To include these limit cases, usually, instead of looking for a minimum or a maximum, we look for

inf_S f(x) = inf{f(x) : x ∈ S}    and    sup_S f(x) = sup{f(x) : x ∈ S},

where inf E and sup E of a nonempty subset E of R are defined by [2]

sup E = the least number greater than or equal to all numbers in E,
inf E = the greatest number less than or equal to all numbers in E.

If E is not bounded below, we write inf E = −∞. If E is not bounded above, we write sup E = +∞. By convention, we write sup ∅ = −∞ and inf ∅ = +∞. For the previous example, we have

inf_{(0,1)} x² = 0    and    sup_{(0,1)} x² = 1.
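The non-attainment in the remark above can be illustrated numerically (a sketch, not part of the book): sampling x² at the points 1/n of (0, 1) produces values approaching the infimum 0, yet every sampled value stays strictly positive.

```python
f = lambda x: x * x

# Sample f at the points 1/n of the open interval (0, 1).
values = [f(1 / n) for n in range(1, 10001)]
print(min(values) > 0)          # True: 0 is never attained on (0, 1)
print(f(1 / 10 ** 6) < 1e-11)   # True: yet f gets arbitrarily close to 0
```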
1.2 Particular Subsets of Rⁿ
We list here the main categories of sets that we will encounter and give the main tools that allow them to be identified easily. Even though the purpose is not a topological study of these sets, it is important to be aware of the precise definitions and how to apply them accurately [18], [13].

Open and Closed Sets

In one dimension, the distance between two real numbers x and y is measured by the absolute value function and is given by d(x, y) = |x − y|. d satisfies, for any x, y, z, the properties

d(x, y) ≥ 0, with d(x, y) = 0 ⟺ x = y,
d(y, x) = d(x, y)    (symmetry),
d(x, z) ≤ d(x, y) + d(y, z)    (triangle inequality).

These three properties induce on R a metric topology where a set O is said to be open if and only if, at each point x₀ ∈ O, we can insert a small interval centered at x₀ that remains included in O; that is,

O is open ⟺ ∀x₀ ∈ O, ∃ε > 0 such that (x₀ − ε, x₀ + ε) ⊂ O.

In higher dimensions, these tools are generalized as follows. The distance between two points x = (x₁, ..., xₙ) and y = (y₁, ..., yₙ) is measured by the quantity

d(x, y) = ‖x − y‖ = √((x₁ − y₁)² + ... + (xₙ − yₙ)²).

d is called the Euclidean distance and satisfies the three properties above. A set O ⊂ Rⁿ is said to be open if and only if, at each point x₀ ∈ O, we can insert a small ball

B_ε(x₀) = {x ∈ Rⁿ : ‖x − x₀‖ < ε}
centered at x₀ that remains included in O; that is,

O is open ⟺ ∀x₀ ∈ O, ∃ε > 0 such that B_ε(x₀) ⊂ O.
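The Euclidean distance and its three metric properties transcribe directly into code. The sketch below (not part of the book) spot-checks positivity, symmetry and the triangle inequality on random points of R³:

```python
import math, random

def d(x, y):
    """Euclidean distance between two points of R^n, given as tuples."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

random.seed(0)
pts = [tuple(random.uniform(-5, 5) for _ in range(3)) for _ in range(30)]
for x in pts:
    for y in pts:
        assert d(x, y) >= 0 and (d(x, y) == 0) == (x == y)   # positivity
        assert abs(d(x, y) - d(y, x)) < 1e-12                # symmetry
        for z in pts:
            assert d(x, z) <= d(x, y) + d(y, z) + 1e-9       # triangle inequality
```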
The point x₀ is said to be an interior point of O.

Example 1. As n varies, the ball takes different shapes; see Figure 1.6.

n = 1, a ∈ R:  B_r(a) = (a − r, a + r), an open interval.
n = 2, a = (a₁, a₂):  B_r(a) = {(x₁, x₂) : (x₁ − a₁)² + (x₂ − a₂)² < r²}, an open disk.
n = 3, a = (a₁, a₂, a₃):  B_r(a) = {(x₁, x₂, x₃) : (x₁ − a₁)² + (x₂ − a₂)² + (x₃ − a₃)² < r²}, the set of points delimited by the sphere centered at a with radius r.
n > 3, a = (a₁, ..., aₙ):  B_r(a) is the set of points delimited by the hypersphere of points x satisfying d(a, x) = r.

FIGURE 1.6: Shapes of balls in R, R² and R³

Using the distance d, we define
Definition 1.2.1 Let S be a subset of Rⁿ.
– S̊ is the interior of S, the set of all interior points of S.
– S is a neighborhood of a if a is an interior point of S.
– S is a closed set ⟺ ∁S is open.
– ∂S is the boundary of S, the set of boundary points of S, where
  x₀ ∈ ∂S ⟺ ∀r > 0, B_r(x₀) ∩ S ≠ ∅ and B_r(x₀) ∩ ∁S ≠ ∅.
– S̄ = S ∪ ∂S is the closure of S.
– S is bounded ⟺ ∃M > 0 such that ‖x‖ ≤ M, ∀x ∈ S.
– S is unbounded if it is not bounded.
Example 2. For the sets

S₁ = [−2, 2] ⊂ R,  S₂ = {(x, y) : x² + y² ≤ 4} ⊂ R²,  S₃ = {(x, y, z) : x² + y² + z² < 4} ⊂ R³,

we have

S     S̊             ∂S               S̄
S₁    (−2, 2)        {−2, 2}          S₁
S₂    B₂(0)          C₂(0) : circle   S₂
S₃    S₃ = B₂(0)     S₂(0) : sphere   S₃ ∪ S₂(0)

where C₂(0) = {(x, y) : x² + y² = 4} and S₂(0) = {(x, y, z) : x² + y² + z² = 4}.
We have the following properties:

Remark 1.2.1
– Rⁿ and ∅ are open and closed sets.
– The union (resp. intersection) of arbitrary open (resp. closed) sets is open (resp. closed).
– The finite intersection (resp. union) of open (resp. closed) sets is open (resp. closed).
– S is open ⟺ S̊ = S.
– S is closed ⟺ S̄ = S.
– If f is continuous on an open subset Ω ⊂ Rⁿ (see Section 1.3), then
  f⁻¹((−∞, a]) = [f ≤ a],  [f ≥ a],  [f = a]  are closed sets in Rⁿ,
  f⁻¹((−∞, a)) = [f < a],  [f > a]  are open sets in Rⁿ.
Example 3. Sketch the set S in the xy-plane and determine whether it is open, closed, bounded or unbounded. Give S̊, ∂S and S̄.

S = {(x, y) : x ≥ 0, y ≥ 0, xy ≥ 1}

FIGURE 1.7: An unbounded closed subset of R²

* Note that the set S, sketched in Figure 1.7, doesn't contain the points on the x and y axes. So

S = {(x, y) : x > 0, y > 0, xy ≥ 1}

and can be described using the continuous function f : (x, y) ⟼ xy on the open set Ω = {(x, y) : x > 0, y > 0} as

S = {(x, y) ∈ Ω : f(x, y) ≥ 1} = f⁻¹([1, +∞)).

Therefore, S is a closed subset of R². Thus S̄ = S.

** The set is unbounded since it contains the points (x(t), y(t)) = (t, t) for t ≥ 1 (xy = t·t = t² ≥ 1), and

‖(x(t), y(t))‖ = ‖(t, t)‖ = √(t² + t²) = √2 t → +∞ as t → +∞.
*** We have

S̊ = {(x, y) : x > 0, y > 0, xy > 1}, the region in the 1st quadrant above the hyperbola y = 1/x;
∂S = {(x, y) : x > 0, y > 0, xy = 1}, the arc of the hyperbola in the 1st quadrant.

Example 4. A person can afford any commodities x ≥ 0 and y ≥ 0 that satisfy the budget inequality x + 3y ≤ 7. Sketch the set S described by these inequalities in the xy-plane and determine whether it is open, closed, bounded or unbounded. Give S̊, ∂S and S̄.

FIGURE 1.8: Closed set as intersection of three closed sets of R²
* Figure 1.8 shows that S is the triangular region formed by all the points in the first quadrant below the line x + 3y = 7:

S = {(x, y) : x + 3y ≤ 7, x ≥ 0, y ≥ 0}

and can be described using the continuous functions

f₁ : (x, y) ⟼ x + 3y,  f₂ : (x, y) ⟼ x,  f₃ : (x, y) ⟼ y

on R² as

S = {(x, y) ∈ R² : f₁(x, y) ≤ 7, f₂(x, y) ≥ 0, f₃(x, y) ≥ 0}
  = f₁⁻¹((−∞, 7]) ∩ f₂⁻¹([0, +∞)) ∩ f₃⁻¹([0, +∞)).

Therefore, S is a closed subset of R², as the intersection of three closed subsets of R². Thus S̄ = S.
** The set S is bounded since

x + 3y ≤ 7, x ≥ 0, y ≥ 0  ⟹  0 ≤ x ≤ 7, 0 ≤ y ≤ 7/3,

from which we deduce

‖(x, y)‖ = √(x² + y²) ≤ √(7² + (7/3)²) = 7√10/3,  ∀(x, y) ∈ S.

*** We have

S̊ = {(x, y) : x > 0, y > 0, x + 3y < 7}, the region S excluding its three sides;
∂S = the three sides of the triangular region.
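The bound ‖(x, y)‖ ≤ 7√10/3 from Example 4 can be spot-checked on a grid (a sketch, not part of the book; the sampling step is an assumption):

```python
import math

def in_S(x, y):
    """Membership in S = {(x, y) : x + 3y <= 7, x >= 0, y >= 0}."""
    return x >= 0 and y >= 0 and x + 3 * y <= 7

bound = 7 * math.sqrt(10) / 3
norms = [math.hypot(i / 50, j / 50)
         for i in range(0, 400) for j in range(0, 150)
         if in_S(i / 50, j / 50)]
print(max(norms) <= bound)   # True: every sampled point obeys the bound
```

The largest sampled norm is 7, attained at the vertex (7, 0); the bound 7√10/3 ≈ 7.38 is not sharp but suffices to show boundedness.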
Convex sets

The category of convex sets deals with sets S ⊂ Rⁿ in which any two points x, y ∈ S can be joined by a line segment that remains entirely inside the set. Such sets have no holes and do not bend inwards. Thus,

S is convex ⟺ (1 − t)x + ty ∈ S, ∀x, y ∈ S, ∀t ∈ [0, 1].

We have the following properties:

Remark 1.2.2
– Rⁿ and ∅ are convex sets.
– A finite intersection of convex sets is a convex set.
Example 5. "Well known convex sets" (see Figure 1.9)

* A line segment joining two points x and y is convex. It is described by

[x, y] = {z ∈ Rⁿ : ∃t ∈ [0, 1] such that z = x + t(y − x) = (1 − t)x + ty}.

** A line passing through two points x₀ and x₁ is convex. It is described by

L = {x ∈ Rⁿ : ∃t ∈ R such that x = x₀ + t(x₁ − x₀)}.
FIGURE 1.9: Convex sets in R²
*** A ball B_r(x₀) = {x ∈ Rⁿ : ‖x − x₀‖ < r} is convex.

Indeed, let a and b be in B_r(x₀) and t ∈ [0, 1]. We have

‖[(1 − t)a + tb] − x₀‖ = ‖(1 − t)(a − x₀) + t(b − x₀)‖
  ≤ (1 − t)‖a − x₀‖ + t‖b − x₀‖ < (1 − t)r + tr = r,

since ‖a − x₀‖ < r and ‖b − x₀‖ < r. Hence (1 − t)a + tb ∈ B_r(x₀) for any t ∈ [0, 1]; that is, [a, b] ⊂ B_r(x₀).

FIGURE 1.10: A closed ball is convex

**** A closed ball B̄_r(x₀) = {x ∈ Rⁿ : ‖x − x₀‖ ≤ r} is convex. For example, in the plane, the set in Figure 1.10, defined by

{(x, y) : x² + y² ≤ 4} = B̄₂((0, 0)),

is convex.
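The convexity argument for balls can also be spot-checked numerically (a sketch, not part of the book): draw random pairs of points in B₂((0, 0, 0)) and verify that every sampled convex combination stays inside.

```python
import math, random

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def in_ball(x, center, r):
    return norm([xi - ci for xi, ci in zip(x, center)]) < r

random.seed(1)
center, r = (0.0, 0.0, 0.0), 2.0

def random_point_in_ball():
    while True:  # rejection sampling inside the ball
        p = tuple(random.uniform(-r, r) for _ in range(3))
        if in_ball(p, center, r):
            return p

# Every convex combination of two points of the ball stays in the ball.
for _ in range(1000):
    a, b = random_point_in_ball(), random_point_in_ball()
    t = random.random()
    z = tuple((1 - t) * ai + t * bi for ai, bi in zip(a, b))
    assert in_ball(z, center, r)
```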
The set is the closed disk with center (0, 0) and radius 2. It is closed since it includes its boundary points, located on the circle with center (0, 0) and radius 2. This set is bounded since ‖(x, y)‖ ≤ 2, ∀(x, y) ∈ B̄₂((0, 0)).

Example 6. "Convex sets described by linear expressions"

* For a = (a₁, ..., aₙ) ∈ Rⁿ and b ∈ R, the set of points x = (x₁, ..., xₙ) ∈ Rⁿ such that

a₁x₁ + a₂x₂ + ... + aₙxₙ = a·x = b
is convex and is called a hyperplane. Indeed, consider x₁, x₂ in the hyperplane and t ∈ [0, 1]; then

a·[(1 − t)x₁ + tx₂] = (1 − t)a·x₁ + t a·x₂ = (1 − t)b + tb = b,

thus (1 − t)x₁ + tx₂ belongs to the hyperplane. As illustrated in Figure 1.11, the graph of a hyperplane reduces to the point x₁ = b/a₁ when n = 1, to the line a₁x₁ + a₂x₂ = b in the plane when n = 2, and to the plane a₁x₁ + a₂x₂ + a₃x₃ = b in space when n = 3.

FIGURE 1.11: Hyperplane in R, R² and R³
** The set of points x = (x₁, ..., xₙ) ∈ Rⁿ defined by a linear inequality

a₁x₁ + a₂x₂ + ... + aₙxₙ = a·x ≤ b    (resp. ≥, =)

is convex.

Indeed, as above, consider x₁, x₂ in the region [a·x ≤ b] and t ∈ [0, 1]; then

a·x₁ ≤ b  ⟹  (1 − t)a·x₁ ≤ (1 − t)b, since (1 − t) ≥ 0,
a·x₂ ≤ b  ⟹  t a·x₂ ≤ tb, since t ≥ 0.

Adding the two inequalities, we get

a·[(1 − t)x₁ + tx₂] = (1 − t)a·x₁ + t a·x₂ ≤ (1 − t)b + tb = b,

thus (1 − t)x₁ + tx₂ belongs to the region [a·x ≤ b]. The set [a·x ≤ b] describes the region of points located below the hyperplane a·x = b.

*** A set of points in Rⁿ described by linear equalities and inequalities is convex, as it can be seen as the intersection of convex sets described by equalities and inequalities. For example, in Figure 1.12, the set

S = {(x, y) : 2x + 3y ≤ 19, −3x + 2y ≤ 4, x + y ≤ 8, 0 ≤ x ≤ 6, x + 6y ≥ 0}

can be described as S = S₁ ∩ S₂ ∩ S₃ ∩ S₄ ∩ S₅ ∩ S₆, where

S₁ = {(x, y) ∈ R² : x + 6y ≥ 0},  S₂ = {(x, y) ∈ R² : x ≤ 6},
S₃ = {(x, y) ∈ R² : x + y ≤ 8},  S₄ = {(x, y) ∈ R² : 2x + 3y ≤ 19},
S₅ = {(x, y) ∈ R² : −3x + 2y ≤ 4},  S₆ = {(x, y) ∈ R² : x ≥ 0}.
FIGURE 1.12: A convex set described by linear inequalities
S is the region of the xy-plane bounded by the lines

L₁ : x + 6y = 0,  L₂ : x = 6,  L₃ : x + y = 8,
L₄ : 2x + 3y = 19,  L₅ : −3x + 2y = 4,  L₆ : x = 0.

Often, such sets are described using matrices and vectors:

S = { (x, y)ᵗ ∈ R² : A (x, y)ᵗ ≤ b },   where

    ⎡  2    3 ⎤         ⎡ 19 ⎤
    ⎢ −3    2 ⎥         ⎢  4 ⎥
A = ⎢  1    1 ⎥,    b = ⎢  8 ⎥.
    ⎢  1    0 ⎥         ⎢  6 ⎥
    ⎢ −1   −6 ⎥         ⎢  0 ⎥
    ⎣ −1    0 ⎦         ⎣  0 ⎦
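The matrix description translates directly into a membership test, with each row of A giving one inequality. A sketch (not part of the book), using plain dot products:

```python
# Rows of A and entries of b as in the matrix description of S above.
A = [[2, 3], [-3, 2], [1, 1], [1, 0], [-1, -6], [-1, 0]]
b = [19, 4, 8, 6, 0, 0]

def in_S(point):
    """Check A p <= b componentwise for a point p = (x, y)."""
    return all(sum(a_ij * p_j for a_ij, p_j in zip(row, point)) <= b_i
               for row, b_i in zip(A, b))

print(in_S((2, 2)), in_S((7, 0)))   # True False: (7, 0) violates x <= 6
```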
Example 7. "Well-known non convex sets"

* The hypersphere (see Figure 1.13 for an illustration in the plane)

∂B_r(x*) = {x ∈ Rⁿ : ‖x − x*‖ = r}

is not convex.

FIGURE 1.13: The circle ∂B₂((1, 1)) is not convex

Indeed, the points (x₁*, ..., xₙ* ± r) belong to ∂B_r(x*) since ‖(0, ..., ±r)‖ = r, while

½(x₁*, ..., xₙ* + r) + (1 − ½)(x₁*, ..., xₙ* − r) = x*,  with ‖x* − x*‖ = 0 ≠ r,

so this midpoint does not belong to ∂B_r(x*).

** The domain located outside the hypersphere, described by

S = {x ∈ Rⁿ : ‖x − x*‖ > r} = Rⁿ \ B̄_r(x*),

is not convex.
18
Introduction to the Theory of Optimization in Euclidean Space y 4
x2 y2 4 2
4
2
2
4
x
2
4
FIGURE 1.14: An unbounded open non convex set of R2
Indeed, we have (x∗1 , . . . , x∗n ± 2r) ∈ S
since
(0, . . . , ±2r) = 2r > r
1 1 ∗ (x1 , . . . , x∗n + 2r) + (1 − )(x∗1 , . . . , x∗n − 2r) 2 2 1 (2x∗1 , . . . , 2x∗n + 2r − 2r) = x∗ ∈ S. 2 For example, in the plane, the set =
{(x, y) :
x2 + y 2 > 4} = R2 \ B2 ((0, 0))
is not convex.
Moreover, the set is open since it is the complementary of the closed disk with center (0, 0) and radius 2 (see Figure 1.14). It is not bounded since for t 2, the points (0, t2 ) belong to the set, but (0, t2 ) = t2 −→ +∞ as t −→ +∞. ∗ ∗ ∗ The region located outside the hyper-sphere, including the hyper-sphere, described by S = {x ∈ Rn :
x − x0 r} = Rn \ Br (x∗ )
is not convex.
Example 8. "The union of convex sets is not necessarily convex"

* The union of the disk and the line in Figure 1.9 is not convex.

** The set E = {(x, y) ∈ R² : xy + x − y − 1 > 0}, graphed in Figure 1.15, is not convex.

Indeed, we have

xy + x − y − 1 > 0 ⟺ (x − 1)(y + 1) > 0 ⟺ (x > 1 and y > −1) or (x < 1 and y < −1).

Set

E₁ = {(x, y) ∈ R² : x > 1 and y > −1},  E₂ = {(x, y) ∈ R² : x < 1 and y < −1}.

E₁ and E₂ are convex since they are described by linear inequalities. However, E = E₁ ∪ E₂ is not convex since, for example, (2, 0) and (0, −2) are points of E, but

½(2, 0) + (1 − ½)(0, −2) = (1, −1)

doesn't belong to the set E.

FIGURE 1.15: Union of convex sets
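The counterexample of Example 8 can be verified in a few lines (a sketch, not part of the book):

```python
def in_E(x, y):
    """Membership in E: xy + x - y - 1 > 0, i.e. (x - 1)(y + 1) > 0."""
    return x * y + x - y - 1 > 0

a, b = (2, 0), (0, -2)
mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)   # the midpoint (1, -1)
print(in_E(*a), in_E(*b), in_E(*mid))          # True True False: E is not convex
```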
1.3 Functions of Several Variables
We refer the reader to any book of calculus [1], [3], [21], [23] for details on the points introduced in this section.
Definition 1.3.1 A function f of n variables x1 , · · · , xn is a rule that assigns to each n-vector x = (x1 , . . . , xn ) in the domain of f , denoted by Df , a unique number f (x) = f (x1 , . . . , xn ).
Example 1. Formulas may be used to model problems from different fields.

– Linear function: f(x1, . . . , xn) = a1x1 + a2x2 + . . . + anxn.

– The body mass index is described by the function

B(w, h) = w/h2

where w is the weight in kilograms and h is the height measured in meters.

– The distance of a point P(x, y, z) to a given point P0(x0, y0, z0) is a function of three variables:

d(x, y, z) = √((x − x0)2 + (y − y0)2 + (z − z0)2).

– The Cobb-Douglas function, or production function, describes the relationship between the output (the product Q) and the inputs x1, . . . , xn (capital, labor, . . .) involved in the production process:

Q(x1, · · · , xn) = C x1^{a1} x2^{a2} . . . xn^{an},

where C, a1, . . . , an are constants, C > 0.

– The electric potential function for two positive charges, one at (0, 1) with twice the magnitude of the charge at (0, −1), is given by

ϕ(x, y) = 2/√(x2 + (y − 1)2) + 1/√(x2 + (y + 1)2).
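As an optional illustration (our addition, not from the book), two of these model functions translate directly into code; the sample inputs below are arbitrary:

```python
import math

def bmi(w, h):
    # body mass index B(w, h) = w / h**2: weight w in kg, height h in meters
    return w / h**2

def cobb_douglas(x, a, C=1.0):
    # Q(x1,...,xn) = C * x1**a1 * ... * xn**an
    return C * math.prod(xi**ai for xi, ai in zip(x, a))

print(bmi(70, 1.75))                      # ≈ 22.86
print(cobb_douglas((8, 27), (1/3, 2/3)))  # ≈ 8**(1/3) * 27**(2/3) = 2 * 9 = 18
```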
Example 2. When given a formula of a function, first identify its domain of definition before any other calculation. The domains of definition of the functions given by the formulas

f(x) = √x,   g(x, y) = √x,   h(x, y, z) = √x

are:

Df = {x ∈ R / x ≥ 0};

Dg = {(x, y) ∈ R2 / x ≥ 0} : the half plane bounded by the y axis, including the axis and the points located in the 1st and 4th quadrants;

Dh = {(x, y, z) ∈ R3 / x ≥ 0} : the half space bounded by the yz plane, including this plane and the points with positive 1st coordinate x ≥ 0.

The three domains Df, Dg, Dh are closed, convex, unbounded subsets of R, R2 and R3 respectively; see Figure 1.16.
FIGURE 1.16: Domains of definition
Graphs and Level Curves

With the aid of monotonicity and convexity, the graph of a function of one real variable can be sketched by plotting a few points. This is not possible in dimension 3. To get familiar with some sets in R3, we describe the traces method used for plotting graphs of functions of two variables. The method consists in sketching the intersections of the graph (or surface) with well-chosen planes, usually planes parallel to the coordinate planes:

xy-plane : z = 0    xz-plane : y = 0    yz-plane : x = 0.

These intersections are called traces.
Definition 1.3.2 The graph of a function f : x = (x1, . . . , xn) ∈ Df ⊂ Rn −→ z = f(x) ∈ R is the set

Gf = {(x, f(x)) ∈ Rn+1 : x ∈ Df}.
The set of points x in Rn satisfying f (x) = k is called a level surface of f .
When n = 2, a level surface f(x, y) = k is called a level curve. It is the projection of the trace Gf ∩ [z = k] onto the xy-plane. Drawing level curves of f is another way to picture the values of f. The following examples illustrate how to proceed to graph some surfaces and level curves.

Example 3. A cylinder is a surface that consists of all lines that are parallel to a given line and that pass through a given plane curve. Let

E = {(x, y, z) : x = y2}.

The set E cannot be the graph of a function z = f(x, y) since (1, 1, z) ∈ E for any z, and then (1, 1) would have an infinite number of images. However, we can look at E as the graph of the function x = f(y, z) = y2. Moreover, we have

E = ⋃_{z∈R} {(x, y, z) : x = y2, (x, y) ∈ R2}.
This means that any horizontal plane z = k (// to the xy plane) intersects the graph in a curve with equation x = y 2 . So these horizontal traces E ∩ [z = k], k ∈ R are parabolas. The graph is formed by taking the parabola x = y 2 in the xy-plane and moving it in the direction of the z-axis. The graph is a parabolic cylinder as it can be seen as formed by parallel lines passing through the parabola x = y 2 in the xy-plane (see Figure 1.17). Note that for any k ∈ R, the level curve z = k is the parabola x = y 2 in the xy plane.
FIGURE 1.17: Parabolic cylinder

Example 4. An Elliptic Paraboloid, in its standard form, is the graph of the function

f(x, y) = z = x2/a2 + y2/b2    with a > 0, b > 0.

The graph

Gf = ⋃_{z∈[0,+∞)} {(x, y, z) : x2/a2 + y2/b2 = z}

can be seen as the union of the ellipses x2/a2 + y2/b2 = k in the planes z = k, k ≥ 0. By choosing the traces in Table 1.3, we can shape the graph in the space (see Figure 1.18 for a = 2, b = 3):
TABLE 1.3: Traces to sketch a paraboloid

plane          trace
xy (z = 0)     point : (0, 0)
xz (y = 0)     parabola : z = x2/a2
yz (x = 0)     parabola : z = y2/b2
z = 1          ellipse : x2/a2 + y2/b2 = 1
FIGURE 1.18: Elliptic paraboloid z = x2/4 + y2/9 and its level curves x2/4 + y2/9 = k
Note that for any k < 0, the level curve z = k is not defined. For k > 0, the level curves are ellipses x2/(a√k)2 + y2/(b√k)2 = 1 centered at the origin. For k = 0, the level curve is reduced to the point (0, 0).

Example 5. The Elliptic Cone, in its standard form, is described by the equation

z2 = x2/a2 + y2/b2    with a > 0, b > 0.

It is the union of the graphs of the functions z = ±√(x2/a2 + y2/b2). To sketch the cone, one can make the choice of traces in Table 1.4 (see Figure 1.19 for a = 2, b = 3):
TABLE 1.4: Traces to sketch a cone

plane          trace
xy (z = 0)     point : (0, 0)
xz (y = 0)     lines : z = ± x/a
yz (x = 0)     lines : z = ± y/b
z = ±1         ellipse : x2/a2 + y2/b2 = 1
FIGURE 1.19: Elliptic cone z2 = x2/4 + y2/9 and its level curves
Note that for any k ≠ 0, the level curves z = ±k are ellipses x2/(|k|a)2 + y2/(|k|b)2 = 1 centered at the origin. For k = 0, the level curve is reduced to the point (0, 0).
Example 6. The Elliptic Ellipsoïd, in its standard form, is described by the equation

x2/a2 + y2/b2 + z2/c2 = 1    with a > 0, b > 0, c > 0.

It is the union of the graphs of the functions z = ±c√(1 − x2/a2 − y2/b2), which one can sketch by making the choice of traces in Table 1.5 (see Figure 1.20 for a = 2, b = 3, c = 4):
TABLE 1.5: Traces to sketch an ellipsoïd

plane          trace
xy (z = 0)     ellipse : x2/a2 + y2/b2 = 1
xz (y = 0)     ellipse : x2/a2 + z2/c2 = 1
yz (x = 0)     ellipse : y2/b2 + z2/c2 = 1
FIGURE 1.20: An elliptic ellipsoïd x2/4 + y2/9 + z2/16 = 1
For k ∈ (−c, c), the level curves z = ±k are ellipses centered at the origin, with vertices (±a√(1 − k2/c2), 0) and (0, ±b√(1 − k2/c2)) in the xy plane.
Limits and Continuity

For the local study of a function, the concept of limit is generalized to functions of several variables as follows.
Definition 1.3.3 Let x0 ∈ Rn and let f be a function defined on Df ∩ (Br(x0) \ {x0}). We write

lim_{x→x0} f(x) = L ⇐⇒ ∀ε > 0, ∃δ > 0 such that ∀x : 0 < ‖x − x0‖ < δ =⇒ |f(x) − L| < ε.
Remark 1.3.1 i) The definition above supposes that f is defined in a neighborhood of x0 , except possibly at x0 . It includes points x0 located at the boundary of the domain of f . ii) One can establish, using similar tools in one dimension [2], that the standard properties of limits hold for limits of functions of n variables. iii) If the limit of f (x) fails to exist as x −→ x0 along some smooth curve, or if f (x) has different limits as x −→ x0 along two different smooth curves, then the limit of f (x) does not exist as x −→ x0 .
Example 7.

• lim_{x→a} xi = ai,   i = 1, · · · , n,   a = (a1, · · · , an) ∈ Rn.

Indeed, for ε > 0, choose δ = ε > 0. Then, for x satisfying ‖x − a‖ < δ, we have

|xi − ai| ≤ ‖x − a‖ < δ = ε.
• Algebraic operations on limits.

lim_{(x,y,z)→(1,2,3)} (3xy2 + z − 5)
  = lim_{(x,y,z)→(1,2,3)} [3xy2] + lim_{(x,y,z)→(1,2,3)} z − lim_{(x,y,z)→(1,2,3)} 5
  = 3 [lim_{(x,y,z)→(1,2,3)} x] . [lim_{(x,y,z)→(1,2,3)} y]2 + 3 − 5 = 3(1)(2)2 + 3 − 5 = 10.

• The limit

lim_{(x,y)→(0,0)} 2x2y/(x4 + y2)

does not exist.
Indeed, if we consider the smooth curves C1 and C2 with equations y = x2 and y = x respectively, we find that

lim_{(x,y)→(0,0), (x,y)∈C1} 2x2y/(x4 + y2) = lim_{x→0} 2x2x2/(x4 + (x2)2) = lim_{x→0} 2x4/(2x4) = 1,

lim_{(x,y)→(0,0), (x,y)∈C2} 2x2y/(x4 + y2) = lim_{x→0} 2x2x/(x4 + x2) = lim_{x→0} 2x/(x2 + 1) = 0,

so the limits have different values along C1 and C2 (see Figure 1.21).
FIGURE 1.21: Behavior of f(x, y) = 2x2y/(x4 + y2) near (0, 0)
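The two-path argument can be checked numerically. This small sketch (our addition) evaluates f along y = x2 and along y = x as x approaches 0:

```python
def f(x, y):
    # f(x, y) = 2*x**2*y / (x**4 + y**2), not defined at the origin
    return 2 * x**2 * y / (x**4 + y**2)

for x in (0.1, 0.01, 0.001):
    # along y = x**2 the value stays at 1; along y = x it tends to 0
    print(f(x, x**2), f(x, x))
```

Since the values approach different numbers along the two curves, the limit at (0, 0) cannot exist.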
Definition 1.3.4 Let f be a function defined on Df ⊂ Rn. Then

f is continuous at x0 ⇐⇒ f(x0) is defined and lim_{x→x0} f(x) = f(x0).
If f is continuous at every point in an open set O, then we say that f is continuous on O.
Remark 1.3.2 A function of n variables that can be constructed from continuous functions by combining the operations of addition, subtraction, multiplication, division and composition is continuous wherever it is defined.
Example 8. Give the largest region where f is continuous:

f(x, y) = 1/(e^{xy} − 1).

Solution: f is continuous on its domain of definition

Df = R2 \ {(x, y) ∈ R2 / x = 0 or y = 0}.
More precisely, we have:

∗ (x, y) −→ xy is continuous on R2 as the product of the function (x, y) −→ x and the function (x, y) −→ y;

∗∗ (x, y) −→ 1/(e^{xy} − 1) is continuous on Df as the composition of the C0 function (x, y) −→ xy on R2 and the C0 function t −→ 1/(e^t − 1) on R \ {0}:

(x, y) ∈ Df −→ xy = t ∈ R \ {0} −→ 1/(e^t − 1).
First-order Partial Derivatives

Our purpose, now, is to generalize the concept of differentiability to functions of several variables. More precisely, we will show that the existence of a tangent line for a differentiable function of one real variable at a point x0 extends to the existence of a tangent hyperplane for a differentiable function of several variables. First, we introduce some tools:
Definition 1.3.5 If z = f(x) = f(x1, · · · , xn), then the quantity

∂f/∂xi (x) = lim_{h→0} [f(x1, · · · , xi + h, · · · , xn) − f(x1, · · · , xi, · · · , xn)]/h

is the partial derivative of f(x1, · · · , xn) with respect to xi when all the other variables xj (j ≠ i, j = 1, . . . , n) are held constant.
Remark 1.3.3 - The partial derivative

∂f/∂xi (a) = d/dxi [f(a1, . . . , xi, . . . , an)] |_{xi = ai},   i = 1, . . . , n,

can be viewed as the slope of the line tangent to the curve Ci : z = f(a1, . . . , xi, . . . , an) at the point a, or the rate of change of z with respect to xi along the curve Ci at a.
- Other notations are:

∂z/∂xi = ∂f/∂xi = fxi = zxi,   i = 1, · · · , n.

- We call gradient of f the vector

∇f(x) = ⟨fx1, fx2, · · · , fxn⟩.
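The gradient can be approximated numerically by central differences. The following sketch is our addition (the helper name grad, the step h, and the test function are illustrative choices) and mirrors the definition above:

```python
def grad(f, x, h=1e-6):
    # numerical gradient of f: R^n -> R via central differences
    g = []
    for i in range(len(x)):
        xp = list(x); xm = list(x)
        xp[i] += h; xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

# example: f(x, y) = x**2 * y has gradient (2*x*y, x**2)
print(grad(lambda v: v[0]**2 * v[1], [1.0, 2.0]))  # ≈ [4.0, 1.0]
```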
Example 9. Let f(w, x, y, z) = x e^{yw} sin z. Find

fx(1, 2, 3, π/2),   fy(1, 2, 3, π/2),   fz(1, 2, 3, π/2)   and   fw(1, 2, 3, π/2).

Solution: We have

fx = e^{yw} sin z      fy = xw e^{yw} sin z
fz = x e^{yw} cos z    fw = xy e^{yw} sin z

fx(1, 2, 3, π/2) = e^{yw} sin z |_{(w,x,y,z)=(1,2,3,π/2)} = e3
fy(1, 2, 3, π/2) = xw e^{yw} sin z |_{(w,x,y,z)=(1,2,3,π/2)} = 2e3
fz(1, 2, 3, π/2) = x e^{yw} cos z |_{(w,x,y,z)=(1,2,3,π/2)} = 0
fw(1, 2, 3, π/2) = xy e^{yw} sin z |_{(w,x,y,z)=(1,2,3,π/2)} = 6e3.
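A quick finite-difference check of these four values (our addition; the step h and the helper pd are illustrative):

```python
import math

def f(w, x, y, z):
    return x * math.exp(y * w) * math.sin(z)

h = 1e-6
p = (1.0, 2.0, 3.0, math.pi / 2)

def pd(i):
    # central-difference partial derivative in the i-th variable at p
    q1 = list(p); q2 = list(p)
    q1[i] += h; q2[i] -= h
    return (f(*q1) - f(*q2)) / (2 * h)

e3 = math.exp(3)
print(pd(0), 6 * e3)  # fw ≈ 6e^3
print(pd(1), e3)      # fx ≈ e^3
print(pd(2), 2 * e3)  # fy ≈ 2e^3
print(pd(3))          # fz ≈ 0
```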
Example 10. The rate of change of the body mass index function B(w, h) = w/h2 with respect to the weight w at a constant height h is

∂B/∂w = 1/h2 > 0.

Thus, at constant height, the BMI increases with the weight, at the rate 1/h2. The rate of change of the BMI with respect to the height h at a constant weight w is

∂B/∂h = −2w/h3 < 0.

Therefore, at constant weight, the BMI is a decreasing function of the height.
Higher Order Partial Derivatives

• Each partial derivative is also a function of n variables. These functions may themselves have partial derivatives, called second-order derivatives. For each i, j = 1, . . . , n, we have

∂2f/∂xj∂xi = ∂/∂xj (∂f/∂xi) = fxixj.

The n second-order partial derivatives fxixi are called direct second-order partials; the others, fxixj with i ≠ j, are called mixed second-order partials. Usually these second-order partial derivatives are displayed in an n × n matrix named the Hessian:

Hf(x) = (fxixj)n×n = ⎡ fx1x1  fx1x2  . . .  fx1xn ⎤
                     ⎢ fx2x1  fx2x2  . . .  fx2xn ⎥
                     ⎢   ..     ..    . .     ..  ⎥
                     ⎣ fxnx1  fxnx2  . . .  fxnxn ⎦
• The mixed derivatives are equal in the following situation [15]
Theorem 1.3.1 Clairaut's theorem Let f(x) = f(x1, x2, · · · , xn). If fxixj and fxjxi, i ≠ j, for i, j ∈ {1, · · · , n}, are defined on a neighborhood of a point a ∈ Rn and are continuous at a, then fxixj(a) = fxjxi(a).
• Third-order, fourth-order and higher-order partial derivatives can be obtained by successive differentiation. Clairaut's theorem reduces the steps of calculation when the continuity assumption is satisfied.

Example 11. Write the Hessian of the Cobb-Douglas function

Q(L, K) = c L^a K^b    (c, a, b are positive constants)

where the two inputs are labor L and capital K.

Solution: For L, K > 0, we have ln Q = ln c + a ln L + b ln K, so
∂(ln Q)/∂L = QL/Q = a/L   =⇒   QL = (a/L) Q

∂(ln Q)/∂K = QK/Q = b/K   =⇒   QK = (b/K) Q

QLL = (a/L) QL + (−a/L2) Q = [(a/L)(a/L) − a/L2] Q = [a(a − 1)/L2] Q

QKK = (b/K) QK + (−b/K2) Q = [(b/K)(b/K) − b/K2] Q = [b(b − 1)/K2] Q

QKL = QLK = (a/L) QK = [ab/(LK)] Q.

The Hessian matrix of Q is given by:

HQ(L, K) = ⎡ QLL  QLK ⎤ = Q ⎡ a(a − 1)/L2    ab/(LK)     ⎤
           ⎣ QKL  QKK ⎦     ⎣ ab/(LK)       b(b − 1)/K2 ⎦
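The Hessian formula can be sanity-checked with second-order central differences. In this sketch (our addition) the values c, a, b, L, K are arbitrary illustrative choices:

```python
c, a, b = 1.0, 0.3, 0.7
L, K = 2.0, 3.0

def Q(L, K):
    # Cobb-Douglas production function
    return c * L**a * K**b

h = 1e-4
# central second differences for the three distinct Hessian entries
QLL = (Q(L + h, K) - 2 * Q(L, K) + Q(L - h, K)) / h**2
QKK = (Q(L, K + h) - 2 * Q(L, K) + Q(L, K - h)) / h**2
QLK = (Q(L + h, K + h) - Q(L + h, K - h) - Q(L - h, K + h) + Q(L - h, K - h)) / (4 * h**2)

q = Q(L, K)
print(QLL, a * (a - 1) / L**2 * q)  # direct entry in L
print(QKK, b * (b - 1) / K**2 * q)  # direct entry in K
print(QLK, a * b / (L * K) * q)     # mixed entry
```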
Example 12. Laplace's equation for a function u = u(x1, . . . , xn) is

Δu = ∂2u/∂x12 + ∂2u/∂x22 + . . . + ∂2u/∂xn2 = 0.

For which value of k does the function u = (x12 + x22 + . . . + xn2)^k satisfy Laplace's equation?

Solution: We have

∂u/∂xi = 2xi k (x12 + x22 + . . . + xn2)^{k−1}

∂2u/∂xi2 = 2k (x12 + x22 + . . . + xn2)^{k−1} + 4xi2 k(k − 1)(x12 + x22 + . . . + xn2)^{k−2}

Δu = 2kn (x12 + . . . + xn2)^{k−1} + 4k(k − 1)(x12 + . . . + xn2)^{k−2} × Σ_{i=1}^{n} xi2
   = 2k[n + 2(k − 1)](x12 + x22 + . . . + xn2)^{k−1}.

Thus Δu = 0 if k = 0 or if n + 2(k − 1) = 0, i.e. for k = 1 − n/2.
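For n = 3, the condition n + 2(k − 1) = 0 gives k = −1/2, i.e. u = 1/r, the classical harmonic function. A numerical check (our addition; the test point is arbitrary):

```python
def u(x, y, z, k=-0.5):
    # u = (x^2 + y^2 + z^2)^k; k = -1/2 should be harmonic in R^3 \ {0}
    return (x**2 + y**2 + z**2)**k

def laplacian(f, p, h=1e-4):
    # sum of central second differences in each coordinate
    total = 0.0
    for i in range(3):
        q1 = list(p); q2 = list(p)
        q1[i] += h; q2[i] -= h
        total += (f(*q1) - 2 * f(*p) + f(*q2)) / h**2
    return total

print(laplacian(u, (1.0, 2.0, 2.0)))  # ≈ 0 for k = -1/2
print(laplacian(lambda x, y, z: u(x, y, z, k=1.0), (1.0, 2.0, 2.0)))  # ≈ 6, not 0
```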
Differentiability

While the existence of a derivative of a one-variable function at a point guarantees the continuity of the function at this point, the existence of partial derivatives for a function of several variables doesn't. Indeed, for example, the function

f(x, y) = 2 if x > 0 and y > 0,   f(x, y) = 0 otherwise,

has partial derivatives at (0, 0) since

fx(0, 0) = lim_{h→0} [f(h, 0) − f(0, 0)]/h = lim_{h→0} (0 − 0)/h = lim_{h→0} 0 = 0,

fy(0, 0) = lim_{h→0} [f(0, h) − f(0, 0)]/h = 0,

but f is not continuous at (0, 0) since

lim_{t→0+} f(t, t) = lim_{t→0+} 2 = 2 ≠ 0 = f(0, 0).

This motivates the following definition.
Definition 1.3.6 A function f of n variables is said to be differentiable at a = (a1, . . . , an) provided that fxi(a), i = 1, . . . , n, exist and that there exists a function ε : R+ −→ R such that

f(x) = f(a) + fx1(a)(x1 − a1) + . . . + fxn(a)(xn − an) + ‖x − a‖ ε(‖x − a‖)

with lim_{x→a} ε(‖x − a‖) = 0.
Remark 1.3.4 The definition extends the concept of differentiability of functions of one variable to functions of n variables in such a way that we preserve properties like: - f continuous at a; - the values of f at points near a can be very closely approximated by the values of a linear function: f (x) ≈ f (a) + fx1 (a)(x1 − a1 ) + . . . + fxn (a)(xn − an ).
The next theorem provides particular conditions for a function f to be differentiable.
Theorem 1.3.2 If all first-order partial derivatives of f exist and are continuous at a point, then f is differentiable at that point.
If f has continuous first-order partial derivatives in a domain D, we call f continuously differentiable in D. In this case, f is also called a C1 function on D. If all partial derivatives up to order k exist and are continuous, f is called a C^k function.

Example 13. Use the linear approximation to estimate the change of the Cobb-Douglas production function Q(L, K) = L^{1/3} K^{2/3} from (20, 10) to (20.6, 10.3).

Solution: We have

QL(L, K) = (1/(3L)) Q,   QK(L, K) = (2/(3K)) Q,

Q(20, 10) = 20^{1/3} 10^{2/3} = 10 (2^{1/3}),   QL(20, 10) = (1/60) Q(20, 10),   QK(20, 10) = (2/30) Q(20, 10).

Thus, close to (20, 10), we have

Q(L, K) ≈ Q(20, 10) + QL(20, 10)(L − 20) + QK(20, 10)(K − 10)
        = [1 + (1/60)(L − 20) + (2/30)(K − 10)] Q(20, 10),

from which we deduce the estimate

Q(20.6, 10.3) ≈ [1 + (1/60)(20.6 − 20) + (2/30)(10.3 − 10)] Q(20, 10) = 1.03 Q(20, 10).

Another consequence of differentiability is the chain rule for differentiating compositions.
Theorem 1.3.3 Chain rule 1 If f is differentiable at x = (x1, x2, . . . , xn) and each xj = xj(t), j = 1, . . . , n, is a differentiable function of a variable t, then z = f(x(t)) is differentiable at t and

dz/dt = (∂z/∂x1)(dx1/dt) + (∂z/∂x2)(dx2/dt) + . . . + (∂z/∂xn)(dxn/dt).
Proof. Since f is differentiable at the point a, then, for x(t) close to a = x(t0), we have

f(x(t)) − f(a) = fx1(a)(x1(t) − a1) + · · · + fxn(a)(xn(t) − an) + ‖x(t) − a‖ ε(‖x(t) − a‖)

with lim_{x→a} ε(‖x − a‖) = 0. Dividing each side of the equality by Δt = t − t0, we obtain

[f(x(t)) − f(a)]/Δt = fx1(a) [x1(t) − a1]/Δt + . . . + fxn(a) [xn(t) − an]/Δt + [‖x(t) − a‖/Δt] ε(‖x(t) − a‖).

Then, letting t −→ t0 and using the fact that each xj = xj(t), j = 1, . . . , n, is a differentiable function of the variable t and that lim_{x→a} ε(‖x − a‖) = 0, we get

lim_{t→t0} [f(x(t)) − f(a)]/Δt = fx1(a) lim_{t→t0} [x1(t) − a1]/Δt + . . . + fxn(a) lim_{t→t0} [xn(t) − an]/Δt + lim_{t→t0} [‖x(t) − a‖/Δt] · lim_{t→t0} ε(‖x(t) − a‖),

from which we deduce that

d(f(x(t)))/dt |_{t=t0} = fx1(a) (dx1/dt)(t0) + . . . + fxn(a) (dxn/dt)(t0) + ‖(dx/dt)(t0)‖ · 0,

and the result follows.

In the general situation, each variable xi is a function of m independent variables t1, t2, . . . , tm. Then z = f(x(t1, t2, . . . , tm)) is a function of t1, t2, . . . , tm. To compute ∂z/∂tj, we hold ti with i ≠ j fixed and compute the ordinary derivative of z with respect to tj. The result is given by the following theorem:
Theorem 1.3.4 Chain rule 2 If f is differentiable at x = (x1, x2, . . . , xn) and each xj = xj(t1, t2, · · · , tm), j = 1, · · · , n, is a differentiable function of the m variables t1, t2, . . . , tm, then z = f(x(t1, t2, . . . , tm)) is differentiable at (t1, t2, . . . , tm) and

∂z/∂ti = (∂z/∂x1)(∂x1/∂ti) + (∂z/∂x2)(∂x2/∂ti) + . . . + (∂z/∂xn)(∂xn/∂ti).
Example 14. Let

f(x, y) = x2 − 2xy + 2y3,   x = s ln t,   y = st.

Use the chain rule formula to find

∂f/∂s,   ∂f/∂t,   ∂f/∂s |_{s=1,t=1}   and   ∂f/∂t |_{s=1,t=1}.

Solution: i) We have

∂f/∂x = 2x − 2y,   ∂f/∂y = −2x + 6y2,

x = x(s, t):   ∂x/∂s = ln t,   ∂x/∂t = s/t,
y = y(s, t):   ∂y/∂s = t,     ∂y/∂t = s.

Hence the partial derivatives of f at (s, t) are:

∂f/∂s = (∂f/∂x)(∂x/∂s) + (∂f/∂y)(∂y/∂s) = (2x − 2y) ln t + (−2x + 6y2) t
      = (2s ln t − 2st) ln t + (−2s ln t + 6s2t2) t

∂f/∂t = (∂f/∂x)(∂x/∂t) + (∂f/∂y)(∂y/∂t) = (2x − 2y)(s/t) + (−2x + 6y2) s
      = (2s ln t − 2st)(s/t) + (−2s ln t + 6s2t2) s.

ii) When s = 1 and t = 1, we have x(1, 1) = (1) ln(1) = 0 and y(1, 1) = 1. Thus the partial derivatives of f at (s, t) = (1, 1) are:

∂f/∂s |_{s=1,t=1} = [(2x − 2y) ln t + (−2x + 6y2) t] |_{s=1,t=1} = 6

∂f/∂t |_{s=1,t=1} = [(2x − 2y)(s/t) + (−2x + 6y2) s] |_{s=1,t=1} = 4.
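These two values can be confirmed by differencing the composite function directly (our addition; the step h is an arbitrary small choice):

```python
import math

def f(x, y):
    return x**2 - 2*x*y + 2*y**3

def g(s, t):
    # composite: x = s*ln(t), y = s*t
    return f(s * math.log(t), s * t)

h = 1e-6
dfds = (g(1 + h, 1) - g(1 - h, 1)) / (2 * h)
dfdt = (g(1, 1 + h) - g(1, 1 - h)) / (2 * h)
print(dfds, dfdt)  # ≈ 6, 4
```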
Solved Problems
1. – Sketch the domains of definition of the functions given by the following formulas:

i) f(x, y) = e^{2x} √(y − x2)    ii) f(x, y, z) = z √((1 − x2)(y2 − 4))    iii) H(x, y, z) = √(z − x2 − y2).
Solution:

FIGURE 1.22: Domains of definition

i) f(x, y) = e^{2x} √(y − x2):

Df = {(x, y) ∈ R2 : y − x2 ≥ 0},

the plane region located above the parabola y = x2, including the parabola.

ii) f(x, y, z) = z √((1 − x2)(y2 − 4)):

Df = {(x, y, z) ∈ R3 : (1 − x2)(y2 − 4) ≥ 0}.

Since 1 − x2 ≥ 0 ⇐⇒ −1 ≤ x ≤ 1 and y2 − 4 ≥ 0 ⇐⇒ y ≤ −2 or y ≥ 2, the product (1 − x2)(y2 − 4) is nonnegative exactly when the two factors have the same sign, so

Df = ([−1, 1] × ((−∞, −2] ∪ [2, +∞)) × R) ∪ (((−∞, −1] ∪ [1, +∞)) × [−2, 2] × R).

iii) H(x, y, z) = √(z − x2 − y2):

DH = {(x, y, z) ∈ R3 : z − x2 − y2 ≥ 0},

the set of points bounded by the paraboloid z = x2 + y2, including the paraboloid. The three domains are illustrated in Figure 1.22.
2. – Match the functions with their graphs in Figure 1.23.

a. y − z2 = 0              b. x + y + z = 0           c. 4x2 + y2/9 + z2 = 1
d. x2 + y2/9 − z2 = 1      e. x2 + y2/9 = z2          f. z − y2 = 0
FIGURE 1.23: Surfaces in R3 (graphs A–F)
Solution:

equation of the surface       its graph   why?
a. y − z2 = 0                 (D)         parabolic cylinder in the direction of the x-axis, located in y ≥ 0
b. x + y + z = 0              (A)         a plane
c. 4x2 + y2/9 + z2 = 1        (E)         ellipsoid centered at (0, 0, 0)
d. x2 + y2/9 − z2 = 1         (F)         the traces at z = −1, 0, 1 are ellipses
e. x2 + y2/9 = z2             (B)         elliptic cone
f. z − y2 = 0                 (C)         parabolic cylinder in the direction of the x-axis, located in z ≥ 0
3. – Sketch the graphs of the following functions:

i) f(x, y) = √(81 − x2)    ii) f(x, y) = 3    iii) f(x, y) = −√(x2 + y2).
Solution: i)

FIGURE 1.24: Domain and graph of z = √(81 − x2)
Domain of f: Df = {(x, y) ∈ R2 : 81 − x2 ≥ 0} = {(x, y) ∈ R2 : |x| ≤ 9}.

Graph of f:

Gf = {(x, y, z) ∈ R3 : (x, y) ∈ Df and z = √(81 − x2)}
   = {(x, y, z) ∈ R3 : x2 + z2 = 81, z ≥ 0}.

It is the half circular cylinder located in the region z ≥ 0, with radius 9 and axis the y axis (see Figure 1.24).

ii) Domain of f: Df = {(x, y) ∈ R2 : f(x, y) = 3 ∈ R} = R2.

Graph of f: Gf = {(x, y, z) ∈ R3 : (x, y) ∈ Df and z = 3}.

It is the plane passing through (0, 0, 3) with normal vector k = ⟨0, 0, 1⟩ (see Figure 1.25).

iii) Domain of f: Df = {(x, y) ∈ R2 : x2 + y2 ≥ 0} = R2.

Graph of f:

Gf = {(x, y, z) ∈ R3 : (x, y) ∈ R2 and z = −√(x2 + y2)}
   = {(x, y, z) ∈ R3 : z2 = x2 + y2, z ≤ 0}.

The graph is the part of the circular cone z2 = x2 + y2 located in the region [z ≤ 0]; see Figure 1.25.
FIGURE 1.25: Graph of z = 3 and graph of z = −√(x2 + y2)
4. – Match the surfaces with the level curves in Figure 1.26.

FIGURE 1.26: Surfaces (A–F) and their level curves (1–6)

Solution:
level curves   (1)   (2)   (3)   (4)   (5)   (6)
surface        (E)   (F)   (C)   (A)   (D)   (B)
5. – Draw a set of level curves for the following functions:

i) z = x2 + y    ii) f(x, y, z) = (x − 2)2 + y2 + z2.
Solution: i) We have Df = {(x, y) ∈ R2 : x2 + y ∈ R} = R2, and

z = x2 + y = k ⇐⇒ y = k − x2 : a parabola with vertex (0, k) and axis the line Oy; see Figure 1.27.

ii) The level surface (see the 2nd graph in Figure 1.27)

(x − 2)2 + y2 + z2 = k

is reduced to:

– the point (2, 0, 0) if k = 0;
– the sphere centered at (2, 0, 0) with radius √k if k > 0;
– no points if k < 0.
FIGURE 1.27: Level curves x2 +y = k and level surfaces (x−2)2 +y 2 +z 2 = k
6. – Sketch the largest region on which the function is continuous. Explain why the function is continuous.

f(x, y, z) = √(y − x2) ln z.
Solution: f is continuous on its domain of definition

Df = {(x, y, z) ∈ R3 / y − x2 ≥ 0 and z > 0}

because it is the product of the two continuous functions:

∗ u : (x, y, z) −→ ln z, continuous on D1 = {(x, y, z) : z > 0} with values in R as the composite of the polynomial function (x, y, z) ∈ D1 −→ z ∈ R+ \ {0} and the function t −→ ln t continuous on R+ \ {0}; we have

(x, y, z) ∈ D1 −→ z = t ∈ R+ \ {0} −→ ln t;

∗∗ v : (x, y, z) −→ √(y − x2), continuous on D2 = {(x, y, z) : y − x2 ≥ 0} as the composite of the polynomial function (x, y, z) ∈ D2 −→ y − x2 ∈ R+ and the function t −→ √t continuous on R+; we have

(x, y, z) ∈ D2 −→ y − x2 = t ∈ R+ −→ √t.

∗∗∗ f = u·v is continuous on D1 ∩ D2 = Df, the set in Figure 1.28.

FIGURE 1.28: Domain of continuity of f(x, y, z) = √(y − x2) ln z
7. – Let f(x, y, z) = x2y2 − y3 + 3x4 + x e^{−2z} sin(πy) + 5. Find

(a) fxy    (b) fyz    (c) fxz    (d) fzz
(e) fzyy   (f) fxxy   (g) fzyx   (h) fxxyz.
Solution: Note that f is indefinitely differentiable. Therefore, we can change the order of differentiation with respect to the variables by using Clairaut's theorem.

fx = 2xy2 + 12x3 + e^{−2z} sin(πy)
fy = 2x2y − 3y2 + πx e^{−2z} cos(πy)
fz = −2x e^{−2z} sin(πy)

(a) fxy = (fx)y = 4xy + π e^{−2z} cos(πy)
(b) fyz = (fy)z = −2πx e^{−2z} cos(πy)
(c) fxz = (fx)z = −2 e^{−2z} sin(πy)
(d) fzz = (fz)z = 4x e^{−2z} sin(πy)
(e) fzyy = (fzy)y = (fyz)y = 2π2 x e^{−2z} sin(πy)
(f) fxxy = (fx)xy = (fx)yx = (fxy)x = 4y
(g) fzyx = (fzy)x = (fyz)x = −2π e^{−2z} cos(πy)
(h) fxxyz = (fxxy)z = (4y)z = 0.
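One of these mixed partials can be spot-checked numerically (our addition; the test point and the step h are arbitrary illustrative choices):

```python
import math

def f(x, y, z):
    return x**2*y**2 - y**3 + 3*x**4 + x*math.exp(-2*z)*math.sin(math.pi*y) + 5

h = 1e-4
x, y, z = 0.7, 0.4, 0.2

# mixed second partial f_xy via a central 2D difference
fxy = (f(x+h, y+h, z) - f(x+h, y-h, z) - f(x-h, y+h, z) + f(x-h, y-h, z)) / (4*h**2)
expected = 4*x*y + math.pi*math.exp(-2*z)*math.cos(math.pi*y)
print(fxy, expected)  # the two values agree to several digits
```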
8. – Show that u = ln(x2 + y2) satisfies Laplace's equation

∂2u/∂x2 + ∂2u/∂y2 = 0.

Show, without calculation, that ∂2u/∂x∂y = ∂2u/∂y∂x.
Solution: We have

∂u/∂x = 2x/(x2 + y2),   ∂u/∂y = 2y/(x2 + y2)

∂2u/∂x2 = 2 [(1)(x2 + y2) − x(2x)]/(x2 + y2)2 = 2 (y2 − x2)/(x2 + y2)2

∂2u/∂y2 = 2 (x2 − y2)/(x2 + y2)2

∂2u/∂x2 + ∂2u/∂y2 = 2 (y2 − x2)/(x2 + y2)2 + 2 (x2 − y2)/(x2 + y2)2 = 0.
Note that ∂u/∂x is a rational function, so ∂2u/∂y∂x is also a rational function; as a consequence, ∂2u/∂y∂x is continuous on R2 \ {(0, 0)}. In the same way, ∂u/∂y is a rational function, so ∂2u/∂x∂y is also a rational function, and therefore ∂2u/∂x∂y is continuous on R2 \ {(0, 0)}.

From Clairaut's Theorem, the two mixed second derivatives uxy and uyx are equal on R2 \ {(0, 0)}.
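A numerical spot-check of the Laplace equation (our addition; the test points are arbitrary nonzero points):

```python
import math

u = lambda x, y: math.log(x**2 + y**2)

def lap(x, y, h=1e-4):
    # central second differences in x and y
    uxx = (u(x+h, y) - 2*u(x, y) + u(x-h, y)) / h**2
    uyy = (u(x, y+h) - 2*u(x, y) + u(x, y-h)) / h**2
    return uxx + uyy

print(lap(0.5, 1.5))  # ≈ 0
```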
9. – Find the value of dw/ds |_{s=0} if

w = x2 e^{2y} cos(3z);   x = cos s,   y = ln(s + 2),   z = s.
Solution: We have x = x(s), y = y(s), z = z(s) and w = w(x, y, z). Then

dx/ds = − sin s,   dy/ds = 1/(s + 2),   dz/ds = 1

∂w/∂x = 2x e^{2y} cos(3z),   ∂w/∂y = 2x2 e^{2y} cos(3z),   ∂w/∂z = −3x2 e^{2y} sin(3z)

x(0) = 1,   y(0) = ln 2,   z(0) = 0

dw/ds = (∂w/∂x)(dx/ds) + (∂w/∂y)(dy/ds) + (∂w/∂z)(dz/ds)
      = [2x e^{2y} cos(3z)](− sin s) + [2x2 e^{2y} cos(3z)] (1/(s + 2)) + [−3x2 e^{2y} sin(3z)]

dw/ds |_{s=0} = e^{2 ln 2} = 4.
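A one-line numerical confirmation (our addition): difference the composite w(s) directly at s = 0.

```python
import math

def w(s):
    # composite of w = x^2 e^{2y} cos(3z) with x = cos s, y = ln(s+2), z = s
    x, y, z = math.cos(s), math.log(s + 2), s
    return x**2 * math.exp(2*y) * math.cos(3*z)

h = 1e-6
dwds = (w(h) - w(-h)) / (2*h)
print(dwds)  # ≈ 4
```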
10. – Let

R = ln(u2 + v2 + w2),   u = x + 2y,   v = 2x − y,   w = 2xy.

Find ∂R/∂x |_{x=1,y=0} and ∂R/∂y |_{x=1,y=0}.
Solution: We have

∂R/∂u = 2u/(u2 + v2 + w2),   ∂R/∂v = 2v/(u2 + v2 + w2),   ∂R/∂w = 2w/(u2 + v2 + w2)

∂u/∂x = 1,   ∂v/∂x = 2,   ∂w/∂x = 2y
∂u/∂y = 2,   ∂v/∂y = −1,   ∂w/∂y = 2x.

The partial derivatives of R are:

∂R/∂x = (∂R/∂u)(∂u/∂x) + (∂R/∂v)(∂v/∂x) + (∂R/∂w)(∂w/∂x) = (2u + 4v + 4wy)/(u2 + v2 + w2)

∂R/∂y = (∂R/∂u)(∂u/∂y) + (∂R/∂v)(∂v/∂y) + (∂R/∂w)(∂w/∂y) = (4u − 2v + 4wx)/(u2 + v2 + w2).

When x = 1 and y = 0, we have

u = 1,   v = 2,   w = 0,   u2 + v2 + w2 = 5.

Thus

∂R/∂x = (2(1) + 4(2) + 4(0))/5 = 2,   ∂R/∂y = (4(1) − 2(2) + 4(0))/5 = 0.
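The same values come out of differencing the composite directly (our addition):

```python
import math

def R(x, y):
    # composite: u = x + 2y, v = 2x - y, w = 2xy
    u, v, w = x + 2*y, 2*x - y, 2*x*y
    return math.log(u**2 + v**2 + w**2)

h = 1e-6
Rx = (R(1+h, 0) - R(1-h, 0)) / (2*h)
Ry = (R(1, h) - R(1, -h)) / (2*h)
print(Rx, Ry)  # ≈ 2, 0
```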
11. – Use the linear approximation of f(x, y, z) = x3 √(y2 + z2) at the point (2, 3, 4) to estimate the number (1.98)3 √((3.01)2 + (3.97)2).

Solution: Since f is differentiable at the point (2, 3, 4), the linear approximation L(x, y, z) at the point (2, 3, 4) is given by:

L(x, y, z) = f(2, 3, 4) + fx(2, 3, 4)(x − 2) + fy(2, 3, 4)(y − 3) + fz(2, 3, 4)(z − 4).

We have

fx = 3x2 √(y2 + z2),   fy = yx3/√(y2 + z2),   fz = zx3/√(y2 + z2)

and

f(2, 3, 4) = 40,   fx(2, 3, 4) = 60,   fy(2, 3, 4) = 24/5,   fz(2, 3, 4) = 32/5.

Thus

L(x, y, z) = 40 + 60(x − 2) + (24/5)(y − 3) + (32/5)(z − 4).

Using this approximation, one obtains the estimate

(1.98)3 √((3.01)2 + (3.97)2) ≈ L(1.98, 3.01, 3.97)
  = 40 + 60(−0.02) + (24/5)(0.01) + (32/5)(−0.03) = 38.656.
12. – Determine whether the limit exists. If so, find its value.

lim_{(x,y)→(0,0)} (x4 − x + y − x3y)/(x − y),   lim_{(x,y)→(0,0)} cos(xy)/(x + y),   lim_{(x,y)→(1,1)} (x − y4)/(x3 − y4).

Solution: We have

i) lim_{(x,y)→(0,0)} (x4 − x + y − x3y)/(x − y) = lim_{(x,y)→(0,0)} [x3(x − y) − (x − y)]/(x − y) = lim_{(x,y)→(0,0)} (x3 − 1) = −1.

ii) lim_{(x,y)→(0,0)} cos(xy)/(x + y) doesn't exist since

lim_{(x,y)=(t,t), t>0→(0,0)} cos(xy)/(x + y) = lim_{t→0+} cos(t2)/(2t) = +∞

and

lim_{(x,y)=(t,t), t<0→(0,0)} cos(xy)/(x + y) = lim_{t→0−} cos(t2)/(2t) = −∞.
– a local maximum (resp. minimum) of f if ∃r > 0 such that

f(x) ≤ f(x∗) (resp. ≥) ∀x ∈ Br(x∗) ∩ S.
– a strict local maximum (resp. minimum) of f if ∃r > 0 such that

f(x) < f(x∗) (resp. >) ∀x ∈ Br(x∗) ∩ S, x ≠ x∗.

– a global maximum (resp. minimum) of f if

f(x) ≤ f(x∗) (resp. ≥) ∀x ∈ S.

– a strict global maximum (resp. minimum) of f if

f(x) < f(x∗) (resp. >) ∀x ∈ S, x ≠ x∗.
Remark 2.1.1 Note that a global extreme point is also a local extreme point when S is an open set, but the converse is not always true.
Indeed, suppose, for example, that x∗ is such that min_S f(x) = f(x∗); then

f(x) ≥ f(x∗) ∀x ∈ S.

Because S is an open set and x∗ ∈ S, there exists a ball Br(x∗) such that Br(x∗) ⊂ S, and then, in particular,

f(x) ≥ f(x∗) ∀x ∈ Br(x∗),

which shows that x∗ is a local minimum.

To show that the converse is not true, consider the function f(x) = x3 − 3x. The study of the variations of f, in Table 2.1, and its graph, in Figure 2.1, show that f has a local minimum at x = 1 and a local maximum at x = −1, but neither of them is a global maximum or a global minimum, as we have

f′(x) = 3x2 − 3,   f″(x) = 6x,   lim_{x→+∞} f(x) = +∞,   lim_{x→−∞} f(x) = −∞.
Now, here is a characterization of a local extreme point for a regular objective function.
Unconstrained Optimization

TABLE 2.1: Study of f(x) = x3 − 3x

x        −∞           −1            0            1           +∞
f′(x)          +             −            −            +
f″(x)          −             −            +            +
f(x)     −∞   ↗       2      ↘      0      ↘     −2     ↗    +∞
f is        concave       concave      convex       convex
FIGURE 2.1: Graph of y = x3 − 3x: local extreme points but not global ones
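A quick numerical illustration of this example (our addition): the critical points of f(x) = x3 − 3x solve f′(x) = 3x2 − 3 = 0, i.e. x = −1 and x = 1, yet f takes both larger and smaller values elsewhere.

```python
f = lambda x: x**3 - 3*x

# f(-1) = 2 is a local maximum and f(1) = -2 a local minimum ...
print(f(-1), f(1))
# ... but neither is global: f exceeds 2 and drops below -2 elsewhere
print(f(3), f(-3))  # 18, -18
```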
Theorem 2.1.1 Necessary condition for local extreme points
Let S ⊂ Rn and f : S −→ R be a differentiable function at an interior point x∗ of S. Then

x∗ is a local extreme point =⇒ ∇f(x∗) = 0.
Proof. Suppose f has a local minimum at x∗. Since f is differentiable at x∗ = (x∗1, x∗2, . . . , x∗n), its first derivatives exist. From the definition of the partial derivative, we have, for j ∈ {1, . . . , n},

∂f/∂xj (x∗) = lim_{t→0} [f(x∗1, . . . , x∗j + t, . . . , x∗n) − f(x∗1, . . . , x∗j, . . . , x∗n)]/t.

Because f has an interior local minimum at x∗, there is an ε > 0 such that

∀x ∈ Bε(x∗) ⊂ S =⇒ f(x) ≥ f(x∗).

In particular, for |t| < ε, we have

‖(x∗1, . . . , x∗j + t, . . . , x∗n) − x∗‖ = ‖(0, . . . , 0, t, 0, . . . , 0)‖ = √(02 + . . . + t2 + . . . + 02) = |t| < ε.

Thus the points (x∗1, . . . , x∗j + t, . . . , x∗n) remain inside the ball Bε(x∗) and therefore satisfy

f(x∗1, . . . , x∗j + t, . . . , x∗n) ≥ f(x∗) ⇐⇒ f(x∗1, . . . , x∗j + t, . . . , x∗n) − f(x∗1, . . . , x∗j, . . . , x∗n) ≥ 0.

Thus, if t is positive,

[f(x∗1, . . . , x∗j + t, . . . , x∗n) − f(x∗1, . . . , x∗j, . . . , x∗n)]/t ≥ 0,

and letting t → 0+, we deduce that

lim_{t→0+} [f(x∗1, . . . , x∗j + t, . . . , x∗n) − f(x∗1, . . . , x∗j, . . . , x∗n)]/t ≥ 0.

In the same way, if t is negative,

[f(x∗1, . . . , x∗j + t, . . . , x∗n) − f(x∗1, . . . , x∗j, . . . , x∗n)]/t ≤ 0,

and letting t → 0−, we deduce that

lim_{t→0−} [f(x∗1, . . . , x∗j + t, . . . , x∗n) − f(x∗1, . . . , x∗j, . . . , x∗n)]/t ≤ 0.

Because both one-sided limits equal ∂f/∂xj (x∗), we have ∂f/∂xj (x∗) ≥ 0 and ∂f/∂xj (x∗) ≤ 0, and we deduce that ∂f/∂xj (x∗) = 0. This holds for each j ∈ {1, . . . , n}. Hence ∇f(x∗) = 0.

A similar argument applies if f has a local maximum at x∗.
Remark 2.1.2 Note that a local extremum can also occur at a point where a function is not differentiable.
[Graph of y = |x|]
FIGURE 2.2: A minimum point where f(x) = |x| is not differentiable
• For example, the one-variable function f(x) = |x|, illustrated in Figure 2.2, has a local minimum at 0, but f is not differentiable at 0 since we have

lim_{x→0⁺} (f(x) − f(0))/x = lim_{x→0⁺} x/x = 1   and   lim_{x→0⁻} (f(x) − f(0))/x = lim_{x→0⁻} (−x − 0)/x = −1.
Moreover, 0 is a global minimum since we have

f(x) = |x| ≥ 0 = f(0)   ∀x ∈ R.

[Graph of z = √(x² + y²)]
FIGURE 2.3: A minimum point where f(x, y) = √(x² + y²) is not differentiable
• The two-variables function f(x, y) = √(x² + y²), graphed in Figure 2.3, attains its minimum value at (0, 0) because we can see that

f(x, y) = √(x² + y²) ≥ 0 = f(0, 0)   ∀(x, y) ∈ R².

But f is not differentiable at (0, 0) since, for example, fx(0, 0) doesn't exist. Indeed, we have

(f(0 + h, 0) − f(0, 0))/h = √(h²)/h = |h|/h −→ 1 if h → 0⁺, and −1 if h → 0⁻.
The above remark leads to the following definition.
Definition 2.1.2 Critical point An interior point x∗ of the domain of a function f is a critical point of f if it is a stationary point where ∇f (x∗ ) = 0 or a point where f is not differentiable.
Example 1. (0, 0) is the only stationary point for the functions
i) f(x, y) = x² + y²     ii) g(x, y) = 1 − x² − y².
It is a local and absolute minimum for f and a local and absolute maximum for g. The values of the level curves are increasing in Figure 2.4, while they are decreasing in Figure 2.5.

[Graph and level curves of z = x² + y²]
FIGURE 2.4: Local minimum point that is a global one

Indeed, we have for any (x, y) ∈ R²

f(x, y) = x² + y² ≥ 0 = f(0, 0)   and   g(x, y) = 1 − (x² + y²) ≤ 1 = g(0, 0).
Example 2. In economics, one is interested in maximizing the total profit P(x) in the sale of x units of some product. If C(x) is the total cost of production and R(x) is the revenue function, then P(x) = R(x) − C(x).

[Graph and level curves of z = 1 − x² − y²]
FIGURE 2.5: Local maximum point that is a global one

The maximum profit occurs when P′(x) = 0, or R′(x) = C′(x). From the linear approximation, we have for Δx = 1

R(x + 1) − R(x) ≈ R′(x)Δx = R′(x),   C(x + 1) − C(x) ≈ C′(x)Δx = C′(x),

so that

C(x + 1) − C(x) ≈ R(x + 1) − R(x);

that is, the cost of manufacturing an additional unit of the product is approximately equal to the revenue generated by that unit. P′(x), R′(x), C′(x) are interpreted respectively as the additional profit, revenue and cost that result from producing one additional unit when the production and sales levels are at x units.
Remark 2.1.3 A function need not have a local extremum at every critical point.
• For example, the one-variable function f(x) = x³ has a critical point since

f′(x) = 3x² = 0   ⇐⇒   x = 0.

But 0 is not a local extremum (see Figure 2.6). Indeed, we have

f(x) = x³ > 0 = f(0)   ∀x > 0   and   f(x) = x³ < 0 = f(0)   ∀x < 0.

The point 0 is called an inflection point.
• The two-variables function f(x, y) = y² − x², graphed in Figure 2.7, has a critical point at (0, 0) since we have

∇f(x, y) = ⟨−2x, 2y⟩ = ⟨0, 0⟩   ⇐⇒   (x, y) = (0, 0).
[Graph of y = x³]
FIGURE 2.6: The critical point x = 0 is an inflection point for f(x) = x³
However, the function f has neither a relative maximum nor a relative minimum at (0, 0). Indeed, along the x and y axes, we have

f(x, 0) = −x² ≤ 0 = f(0, 0)   ∀x ∈ R   and   f(0, y) = y² ≥ 0 = f(0, 0)   ∀y ∈ R.

The point (0, 0) is called a saddle point. Figure 2.7 shows how the values of the level curves are increasing on one side and decreasing on the other.

[Graph and level curves of z = y² − x²]
FIGURE 2.7: (0, 0) is a saddle point for f(x, y) = y² − x²
Definition 2.1.3 Saddle point A differentiable function f (x) has a saddle point at a critical point x∗ if in every open ball centered at x∗ there are domain points x where f (x) > f (x∗ ) and domain points x where f (x) < f (x∗ ).
Remark 2.1.4 In two dimensions, the projection of horizontal traces shows circular curves around (x∗ , y ∗ ) when it is a local extreme point, and hyperbolas when the point is a saddle point.
Now, we give a necessary condition when the extreme point is not necessarily an interior point [5].
Theorem 2.1.2 Necessary condition for a relative extreme point on a convex set
Let S ⊂ Ω ⊂ Rⁿ, Ω an open set, S a convex set, and f : Ω −→ R be differentiable at a point x∗ ∈ S. Then

f(x) ≥ f(x∗) ∀x ∈ S (resp. ≤)   =⇒   ∇f(x∗).(x − x∗) ≥ 0 ∀x ∈ S (resp. ≤ 0).
Proof. Let x ∈ S, x ≠ x∗. Since S is convex, θx + (1 − θ)x∗ = x∗ + θ(x − x∗) ∈ S for θ ∈ [0, 1]. Suppose f has a relative minimum at x∗. Since f is differentiable at x∗, we can write

f(x∗ + θ(x − x∗)) − f(x∗) = θ[∇f(x∗).(x − x∗) + ε(θ)],   lim_{θ→0} ε(θ) = 0.

If ∇f(x∗).(x − x∗) < 0, then

∃θ0 ∈ (0, 1) :   ∀θ ∈ (0, θ0),   |ε(θ)| < −(1/2)∇f(x∗).(x − x∗).

Hence, for all θ ∈ (0, θ0),

f(x∗ + θ(x − x∗)) − f(x∗) ≤ θ[∇f(x∗).(x − x∗) + |ε(θ)|] < (θ/2)∇f(x∗).(x − x∗) < 0,

which contradicts the fact that x∗ is a relative minimum point. Therefore ∇f(x∗).(x − x∗) ≥ 0 for all x ∈ S. The case of a relative maximum is treated similarly.

Example. Consider minimizing f(x1, x2) = x1² − x1 + x2 + x1x2 over S = {(x1, x2) : x1 ≥ 0, x2 ≥ 0}. f has no stationary point in the interior S̊ = {(x1, x2) : x1 > 0, x2 > 0}, since

∇f(x1, x2) = ⟨2x1 − 1 + x2, 1 + x1⟩ = ⟨0, 0⟩   ⇐⇒   (x1, x2) = (−1, 3) ∉ S̊.
So the minimum value, if it exists, must be attained on the boundary of S. Note that

f(x1, 0) = x1² − x1 = (x1 − 1/2)² − 1/4 ≥ −1/4 = f(1/2, 0)   ∀x1 ≥ 0

and

f(0, x2) = x2 ≥ 0   ∀x2 ≥ 0.

Since −1/4 < 0, the point (1/2, 0) is the global minimum point of f on S, as shown in Figure 2.9. At this point,

∇f(x1, x2)|_{x1 = 1/2, x2 = 0} = ⟨2x1 − 1 + x2, 1 + x1⟩|_{x1 = 1/2, x2 = 0} = ⟨0, 3/2⟩ ≠ ⟨0, 0⟩
[Graph of f(x1, x2) on the quadrant x1 ≥ 0, x2 ≥ 0]
FIGURE 2.9: Min f attained at the boundary of x1 ≥ 0, x2 ≥ 0

and

∇f(1/2, 0).⟨x1 − 1/2, x2 − 0⟩ = (3/2)x2 ≥ 0   ∀(x1, x2) ∈ S = R⁺ × R⁺.
Remark 2.1.5 * Note that it is not easy to find the candidate points by solving an inequality ∇f(x∗).(x − x∗) ≥ 0 (resp. ≤ 0). However, the information gained is useful for establishing other results. ** Solving the equation ∇f(x) = 0 is not that easy either! It induces nonlinear equations, or large linear systems when the number of variables is large. To overcome this difficulty, we resort to approximate methods. Newton's method is one of the well-known approximate methods for approaching a root of the equation F(x) = 0. In Exercise 5, the method is described and applied to solving a nonlinear equation in one dimension. The steepest descent method, conjugate gradient methods and many other methods have been developed for approaching the solution [22], [5].
∗∗∗ Finally, the following example, in dimension 2, shows the necessity of using such methods for finding the critical points. Indeed, the graph, Figure 2.10, of

z = f(x, y) = 10e^{−(x² + y²)} + 5e^{−[(x+5)² + (y−3)²]/10} + 4e^{−2[(x−4)² + (y+1)²]},

on the window [−10, 8] × [−10, 8] × [−1, 12], shows three peaks. Thus, we have at least three local maxima points. These points are solutions of the system

fx = −20x e^{−(x²+y²)} − (x + 5) e^{−[(x+5)² + (y−3)²]/10} − 16(x − 4) e^{−2[(x−4)² + (y+1)²]} = 0,
fy = −20y e^{−(x²+y²)} − (y − 3) e^{−[(x+5)² + (y−3)²]/10} − 16(y + 1) e^{−2[(x−4)² + (y+1)²]} = 0,
a nonlinear system for which it is not evident to find an explicit solution by algebraic manipulations. The following Maple commands search for a solution near (0, 0) using an approximate method:

f := (x, y) -> 10*exp(-x^2 - y^2) + 5*exp(-((x+5)^2 + (y-3)^2)*(1/10)) + 4*exp(-2*((x-4)^2 + (y+1)^2));
with(Optimization):
NLPSolve(f(x, y), x = -8..8, y = -8..8, initialpoint = {x = 0, y = 0}, maximize);

The result is

[10.1678223807097599, [x = -0.842598632890276e-2, y = 0.505559179745079e-2]].

Thus, (−0.0084, 0.0051) is an approximate critical point, where f takes the approximate local maximal value 10.1678. A search near (−5, 3) and (4, −1) yields the other approximate local maxima points:

NLPSolve(f(x, y), x = -8..8, y = -8..8, initialpoint = {x = -4, y = 3}, maximize);
[5.00000000000001688, [x = -5.00000000010854, y = 2.99999999999990]]

NLPSolve(f(x, y), x = -8..8, y = -8..8, initialpoint = {x = 4, y = -1}, maximize);
[4.00030684298145278, [x = 3.99996531847993, y = -0.999984626392930]]
[Surface plot of z = f(x, y) on the window [−10, 8] × [−10, 8] × [−1, 12]]
FIGURE 2.10: Location of mountains
Solved Problems
1. – A suitable choice of the objective function. Find a point on the curve y = x2 that is closest to the point (3, 0).
Solution: When formulating an optimization problem, one can sometimes avoid technical difficulties by replacing the direct objective function with an auxiliary one. This situation is illustrated by the two choices below.

[Graph of y = x² and the point (3, 0)]
FIGURE 2.11: Closest point

• 1st choice. Let D = the distance between (3, 0) and any point (x, y). Since (x, y) lies on the curve y = x², the distance D must satisfy

D = D(x) = √((x − 3)² + (y − 0)²) = √((x − 3)² + x⁴).

We need to solve the problem (see Figure 2.11)

min_{x∈R} D(x).
Since R is an open set, the minimum must occur at a critical point, i.e., since D is differentiable, at a point where
dD/dx = (2(x − 3) + 4x³) / (2√((x − 3)² + x⁴)) = 0
⇐⇒   2(x − 3) + 4x³ = 2(x − 1)(2x² + 2x + 3) = 0   ⇐⇒   x = 1.

Since D ∈ C⁰(R) and

lim_{x→−∞} D(x) = +∞   and   lim_{x→+∞} D(x) = +∞,

the minimum exists and it must be at x = 1 [1]. The variations of D are given in Table 2.2.

x        −∞               1               +∞
D′(x)           −         0         +
D(x)     +∞     ↘        D(1)       ↗     +∞

TABLE 2.2: Variations of D(x) = √((x − 3)² + x⁴)

Thus min_{x∈R} D(x) = D(1) = √5.
• 2nd choice. Note that, for any x0, x ∈ R, we have

0 ≤ D²(x0) ≤ D²(x)   ⇐⇒   0 ≤ D(x0) = √(D²(x0)) ≤ √(D²(x)) = D(x),

since t ↦ √t is an increasing function on the interval [0, +∞). It suffices, then, to minimize on R the function

F(x) = D²(x) = (x − 3)² + x⁴.

Since R is an open set, the minimum must occur at a critical point, i.e., since F is differentiable, at a point where

dF/dx = 2(x − 3) + 4x³ = 0   ⇐⇒   2(x − 1)(2x² + 2x + 3) = 0   ⇐⇒   x = 1.

Since F ∈ C⁰(R) and

lim_{x→−∞} F(x) = +∞   and   lim_{x→+∞} F(x) = +∞,

the minimum exists and it must be at x = 1. The variations of F are given in Table 2.3. The point (1, 1) is the closest point on the curve [y = x²] to the point (3, 0).

x                                  −∞           1           +∞
F′(x) = 2(x − 1)(2x² + 2x + 3)           −      0      +
F(x)                               +∞    ↘     F(1)    ↗    +∞

TABLE 2.3: Variations of F(x) = (x − 3)² + x⁴
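Either choice leads to the same root of 2(x − 3) + 4x³ = 0; a bisection sketch (the bracket [0, 2] is chosen by inspection of the signs, not taken from the text):

```python
def dF(x):
    """F'(x) = 2(x - 3) + 4x^3 for F(x) = (x - 3)^2 + x^4."""
    return 2 * (x - 3) + 4 * x**3

lo, hi = 0.0, 2.0            # dF(0) = -6 < 0 and dF(2) = 30 > 0
for _ in range(60):
    mid = (lo + hi) / 2
    if dF(mid) < 0:
        lo = mid
    else:
        hi = mid
x_star = (lo + hi) / 2                         # root of F'
dist = ((x_star - 3)**2 + x_star**4) ** 0.5    # D(1) = sqrt(5)
```

Sixty halvings shrink the bracket far below machine precision, so x_star matches the exact root x = 1.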
2. – To minimize the material in manufacturing a closed can with volume capacity of V units, we need to choose a suitable radius for the container. Find the radius if the container is cylindrical.
Solution: From Section 1.1, Example 1, we are led to solve the minimization problem

minimize A = A(r) = 2πr² + 2V/r over the set S = (0, +∞) = {r ∈ R : r > 0}.

Since S is an open set, the minimum must occur at a critical point, i.e., since A(r) is differentiable, at a point where

dA/dr = 4πr − 2V/r² = 0   =⇒   r = (V/2π)^{1/3} ∈ S.

Since A ∈ C⁰(S) and

lim_{r→0⁺} A(r) = +∞   and   lim_{r→+∞} A(r) = +∞,

the minimum exists and it must be at r = (V/2π)^{1/3}. Indeed, the variations of A are as shown in Table 2.4.

r                          0           (V/2π)^{1/3}           +∞
A′(r) = 4πr − 2V/r²              −          0           +
A(r)                      +∞     ↘    A((V/2π)^{1/3})   ↗    +∞

TABLE 2.4: Variations of A(r) = 2πr² + 2V/r

So we should choose for the can a radius r = (V/2π)^{1/3} and a height h = V/(πr²) = 2(V/2π)^{1/3} = 2r.
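The critical radius is easy to check numerically; in the sketch below, V = 1000 is an arbitrary sample volume, not a value from the text:

```python
import math

V = 1000.0                               # sample volume
r = (V / (2 * math.pi)) ** (1 / 3)       # critical point of A(r) = 2*pi*r^2 + 2V/r
h = V / (math.pi * r**2)                 # height of the can; equals 2r
dA = 4 * math.pi * r - 2 * V / r**2      # A'(r) vanishes at the critical radius
```

The check confirms both facts used above: A′(r) = 0 at r = (V/2π)^{1/3}, and the optimal can is exactly as tall as it is wide (h = 2r).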
3. – Locate all absolute maxima and minima, if any, for each function:
i) f(x, y) = 1 − (x + 1)² − (y − 5)²
ii) g(x, y) = 3x − 2y + 5
iii) h(x, y) = x² − xy + y² − 3y.
Solution: i)

[Graph and level curves of z = 1 − (x + 1)² − (y − 5)²]
FIGURE 2.12: Graph and level curves of z = f(x, y)

Since f is differentiable on R², its absolute extreme points, which are also local extreme points (if they exist), are stationary points, i.e., solutions of

∇f = ⟨−2(x + 1), −2(y − 5)⟩ = ⟨0, 0⟩   ⇐⇒   (x, y) = (−1, 5).
So, there is only one critical point. It satisfies

f(−1, 5) = 1 ≥ 1 − (x + 1)² − (y − 5)² = f(x, y)   ∀(x, y) ∈ R².

Hence, it is a global maximum of f in R²; see Figure 2.12. However, f does not have a global minimum, since the following hold:

(x + 1)² + (y − 5)² = ‖(x, y) − (−1, 5)‖²,
‖(x, y) − (−1, 5)‖² ≤ (‖(x, y)‖ + ‖(−1, 5)‖)² = (√(x² + y²) + √26)²,
‖(x, y) − (−1, 5)‖² ≥ (‖(x, y)‖ − ‖(−1, 5)‖)² = (√(x² + y²) − √26)².

Then

1 − (√(x² + y²) + √26)² ≤ f(x, y) ≤ 1 − (√(x² + y²) − √26)²

and we deduce that

lim_{‖(x,y)‖→+∞} f(x, y) = −∞.

It suffices also to show that f takes large negative values on a subset of its domain R², like

f(x, 5) = 1 − (x + 1)² −→ −∞   as   x −→ ±∞.
ii) Since g is differentiable on R², its absolute extreme points, which are also local extreme points (if they exist), are stationary points, i.e., solutions of ∇g = ⟨0, 0⟩. But ∇g = ⟨3, −2⟩ ≠ ⟨0, 0⟩. So, there is no critical point, and g has no local or global extreme point. The graph z = g(x, y) is a plane in R³ which spreads in the space, taking large values as x or y −→ ±∞; see Figure 2.13. For example,

g(0, y) = −2y + 5 −→ ∓∞   as   y −→ ±∞,
g(x, 0) = 3x + 5 −→ ±∞   as   x −→ ±∞.

[Graph and level curves of z = 3x − 2y + 5]
FIGURE 2.13: Graph and level curves of g(x, y) = 3x − 2y + 5
iii) Since h is differentiable on R², its absolute extreme points, which are also local extreme points (if they exist), are stationary points, i.e., solutions of

∇h = ⟨2x − y, −x + 2y − 3⟩ = ⟨0, 0⟩   ⇐⇒   (x, y) = (1, 2).

So, there is only one critical point. It satisfies h(1, 2) = 1 − 2 + 4 − 6 = −3 and

h(x, y) − h(1, 2) = x² − xy + y² − 3y + 3 = (x − y/2)² + (3/4)(y − 2)² ≥ 0   ∀(x, y) ∈ R².

Hence, the point (1, 2) is a global minimum of h in R². Here also, one can see that h takes large values; for example, along the x axis, we have

h(x, 0) = x² −→ +∞   when   x −→ ±∞.

So h has no global maximum (see Figure 2.14).
[Graph and level curves of z = x² − xy + y² − 3y]
FIGURE 2.14: Graph and level curves of h(x, y) = x² − xy + y² − 3y
4. – Consider the problem

min_S f(x, y) = y   where   S = {(x, y) : x² + y² ≤ 1}.

i) Does f have local minimum points?
ii) Where may the minimum points be located if they exist?
iii) Solve the inequality

∇f(a, b).(x − a, y − b) ≥ 0   ∀(x, y) ∈ S

to find the candidate points (a, b) and solve the problem.
iv) Can you proceed as in iii) if S = {(x, y) : x² + y² ≥ 1}? What is the solution in this case?
Solution: i) Since f is differentiable on R², a local minimum point would be a critical point, i.e., a solution of ∇f = ⟨0, 0⟩. But

∇f = ⟨0, 1⟩ ≠ ⟨0, 0⟩   ∀(x, y) ∈ S̊ = {(x, y) : x² + y² < 1}.

So, there is no critical point, and f has no local minimum point.

ii) If the minimum points exist, they must be located on the unit circle, the boundary of S: ∂S = {(x, y) : x² + y² = 1}.

[Graph of f(x, y) = y on the set x² + y² ≤ 1]
FIGURE 2.15: Graph of f(x, y) = y on the set x² + y² ≤ 1

iii) Since S is convex and f is differentiable on R², a solution (a, b) of the problem, if it exists, must satisfy

a² + b² = 1   and   ∇f(a, b).(x − a, y − b) ≥ 0   ∀(x, y) ∈ S
⇐⇒   a² + b² = 1   and   ⟨0, 1⟩.⟨x − a, y − b⟩ = y − b ≥ 0   ∀(x, y) ∈ S
⇐⇒   a² + b² = 1   and   y ≥ b   ∀(x, y) ∈ S.

Thus b = −1 and a = 0. So the only candidate point is (a, b) = (0, −1). In fact, it is the minimum point (see Figure 2.15), since we have

f(x, y) = y ≥ −1 = f(0, −1)   ∀(x, y) ∈ S.
iv) We cannot proceed as in iii) because the set S = {(x, y) : x² + y² ≥ 1} is not convex. And because this set is not bounded, f can take arbitrarily large negative values on it; therefore, it doesn't attain a minimum value. For example, on the negative y axis, we have

f(0, y) = y −→ −∞   as   y −→ −∞.
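A quick numerical sanity check of iii), not part of the text: sampling the boundary circle confirms that the minimum of f(x, y) = y over S is attained near (0, −1).

```python
import math

# Parameterize the boundary circle as (cos t, sin t) and minimize f(x, y) = y.
N = 100000
best_k = min(range(N), key=lambda k: math.sin(2 * math.pi * k / N))
t = 2 * math.pi * best_k / N
a, b = math.cos(t), math.sin(t)      # numerical minimizer on the boundary
# (a, b) is close to (0, -1), in agreement with iii)
```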
5. – Newton's Method [2]
Let I = [a, b] and let F : I −→ R be twice differentiable on I. Suppose that

∃m, M ∈ R⁺ : |F′(x)| ≥ m > 0 and |F″(x)| ≤ M   ∀x ∈ I,
F(a).F(b) < 0,   K = M/(2m).

Then there exists a subinterval I∗ containing a root r of F such that, for any x1 ∈ I∗, the sequence (xn) defined by

x_{n+1} = xn − F(xn)/F′(xn)   ∀n ∈ N

belongs to I∗ and converges to r. Moreover,

|x_{n+1} − r| ≤ K|xn − r|²   ∀n ∈ N.

Application. Let f(x) = x³ − 2x − 5.
i) Show that f has a root on the interval I = [2, 2.2].
ii) If x1 = 2 and if (xn) is the sequence obtained by Newton's method, show that

|x_{n+1} − r| ≤ (0.7)|xn − r|².

iii) Show that x4 is exact up to 6 decimals.
Solution: i) We have

f(2) = 8 − 4 − 5 = −1 < 0   and   f(2.2) = (2.2)³ − 2(2.2) − 5 = 1.248 > 0,

f is continuous on [2, 2.2], and 0 is between f(2) and f(2.2). From the intermediate value theorem, there exists x0 ∈ (2, 2.2) such that f(x0) = 0.

ii) The sequence (xn) obtained by Newton's method is:

x1 = 2,   x_{n+1} = xn − f(xn)/f′(xn) = xn − (xn³ − 2xn − 5)/(3xn² − 2) = (2xn³ + 5)/(3xn² − 2),

with

f′(x) = 3x² − 2,   f″(x) = 6x.

Because the functions f′ and f″ are increasing on [2, 2.2], we have

10 = f′(2) ≤ f′(x) ≤ f′(2.2) = 12.52,   12 = f″(2) ≤ f″(x) ≤ f″(2.2) = 13.2.

In particular,

|f′(x)| ≥ 10 = m   and   |f″(x)| ≤ 13.2 = M   ∀x ∈ [2, 2.2].

We deduce that the sequence (xn) converges to a root r of f(x) = 0 in [2, 2.2] and satisfies

|x_{n+1} − r| ≤ 0.7|xn − r|²,   since   K = M/(2m) = 0.66 < 0.7.
iii) Denote by en = xn − r the approximation error of the root r. Then

|Ke_{n+1}| ≤ K²|en|² = |Ken|²   =⇒   |Ke_{n+1}| ≤ |Ke1|^{2ⁿ}   by induction,

where

|e1| = |x1 − r| < (2.2 − 2) = 0.2   since   x1, r ∈ [2, 2.2].

Thus

|Ke_{n+1}| ≤ |Ke1|^{2ⁿ} ≤ ((0.7)(0.2))^{2ⁿ} = (0.14)^{2ⁿ} ≤ (0.0196)ⁿ.

To obtain an accuracy up to 6 decimals, it suffices to choose n such that

|e_{n+1}| ≤ (0.0196)ⁿ/0.66 ≤ 10⁻⁶.

We have

(0.0196)²/0.66 = 0.000582061,   (0.0196)³/0.66 ≈ 0.0000114084,   (0.0196)⁴/0.66 ≈ 0.0000002236 < 10⁻⁶.

The desired accuracy is obtained for n = 4. The approximate values of the root are:

x2 = (2x1³ + 5)/(3x1² − 2) = (2(8) + 5)/(3(2²) − 2) = 21/10 = 2.1
x3 = (2x2³ + 5)/(3x2² − 2) = (2(2.1)³ + 5)/(3(2.1)² − 2) = 23.522/11.23 = 2.0945681...
x4 = (2x3³ + 5)/(3x3² − 2) ≈ 2.09455148
x5 = (2x4³ + 5)/(3x4² − 2) ≈ 23.3782059/11.1614377 ≈ 2.0945514841.

We can see that x4 is exact up to six decimals; see Figure 2.16.
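The iterates above can be reproduced with a direct implementation of the scheme (a minimal sketch):

```python
def newton(x, n_steps):
    """Newton iterates for f(x) = x^3 - 2x - 5, with f'(x) = 3x^2 - 2."""
    xs = [x]
    for _ in range(n_steps):
        x = x - (x**3 - 2 * x - 5) / (3 * x**2 - 2)   # same as (2x^3 + 5)/(3x^2 - 2)
        xs.append(x)
    return xs

xs = newton(2.0, 4)        # produces x1 = 2, then x2, x3, x4, x5
# xs[1] = 2.1 and xs[3] agrees with the root to six decimals
```

The quadratic shrinking of the error is visible in the digits: each iterate roughly doubles the number of correct decimals.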
5
y 1.0 3
2
1
1
2
3
x
0.5
5
1
1
2
3
x
0.5
10
1.0
FIGURE 2.16: Approximate position of the root of f (x) = x3 − 2x − 5
2.2 Classification of Local Extreme Points

For a C² function f of one variable, in a neighborhood of a critical point x∗, one can write, using the second-order Taylor formula,

f(x) = f(x∗) + (f′(x∗)/1!)(x − x∗) + (f″(c)/2!)(x − x∗)²

for some number c between x∗ and x. Then, since f′(x∗) = 0, we have

f(x) = f(x∗) + (x − x∗)² f″(c)/2!.
Now, if we have f″(x∗) > 0, then by continuity of f″, we deduce that for x close to x∗ (x ∈ (x∗ − ε, x∗ + ε)), we will have

f″(c) > 0   =⇒   f(x) > f(x∗)   ∀x ∈ (x∗ − ε, x∗ + ε) \ {x∗}.

This means that x∗ is a strict local minimum point. Similarly, we show that

f″(x∗) < 0   =⇒   x∗ is a strict local maximum point.

This classification of critical points, into minima and maxima points, where the sign of the second derivative intervenes, is generalized to C² functions of several variables in the theorem below, following the definition:

Definition 2.2.1 Let Hf(x) = (f_{xixj}(x))_{n×n} be the Hessian of a C² function f. Then the n leading minors of Hf are defined by

Dk(x) = det( f_{xixj}(x) )_{i,j = 1, ..., k},   k = 1, ..., n.
Theorem 2.2.1 Second derivatives test – Sufficient conditions for a strict local extreme point
Let S ⊂ Rⁿ and f : S −→ R be a C² function in a neighborhood of a critical point x∗ ∈ S (∇f(x∗) = 0). Then

(i) Dk(x∗) > 0 ∀k = 1, ..., n   =⇒   x∗ is a strict local minimum point;
(ii) (−1)^k Dk(x∗) > 0 ∀k = 1, ..., n   =⇒   x∗ is a strict local maximum point;
(iii) if Dn(x∗) ≠ 0 and neither of the conditions in (i) and (ii) is satisfied, then x∗ is a saddle point.
Before proving the theorem, we will see its application through some examples.

Example 1. Profit in selling one commodity
A commodity is sold at 5$ per unit. The total cost for producing x units is given by

C(x) = x³ − 10x² + 17x + 66.

Find the most profitable level of production.

Solution: The total revenue for selling x units is R(x) = 5x. Thus, the profit P(x) on x units is

P(x) = R(x) − C(x) = 5x − (x³ − 10x² + 17x + 66) = −x³ + 10x² − 12x − 66.

The profit, illustrated in Figure 2.17, will be at its maximum at points where

dP/dx = −3x² + 20x − 12 = −3(x − 6)(x − 2/3) = 0.

We deduce that we have two critical points, x = 6 and x = 2/3. The Hessian of P is

HP(x) = [d²P/dx²] = [−6x + 20].

Applying the second derivatives test, we obtain:
∗ at x = 6, (−1)¹D1(6) = (−1) d²P/dx² (6) = (−1)(−6(6) + 20) = 16 > 0. Thus, x = 6 is a local maximum.
∗∗ at x = 2/3, D1(2/3) = d²P/dx² (2/3) = −6(2/3) + 20 = 16 > 0. Thus, x = 2/3 is a local minimum.

Thus six units is a candidate point for optimality. We have to check that it is the point at which the profit is largest. This can be done by comparing P(x) and P(6). Indeed, we have

P(x) − P(6) = −(x − 6)²(x + 2) ≤ 0   ∀x > 0,
=⇒   P(x) < P(6)   ∀x ∈ (0, +∞) \ {6}.
[Graph of y = −x³ + 10x² − 12x − 66]
FIGURE 2.17: Graph of P and the maximum profit at x = 6
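The computation above can be confirmed with a short script (the function names are illustrative, not from the text):

```python
def P(x):
    """Profit P(x) = -x^3 + 10x^2 - 12x - 66 from the example."""
    return -x**3 + 10*x**2 - 12*x - 66

def dP(x):
    return -3*x**2 + 20*x - 12        # P'(x)

def d2P(x):
    return -6*x + 20                  # P''(x)

critical = [6.0, 2.0/3.0]             # roots of -3(x - 6)(x - 2/3)
for c in critical:
    assert abs(dP(c)) < 1e-9          # both are stationary points
kind = {c: ("max" if d2P(c) < 0 else "min") for c in critical}
# d2P(6) = -16 < 0 (strict local max), d2P(2/3) = 16 > 0 (strict local min)
```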
Example 2. Profit in selling two commodities
The cost to produce x units of a commodity A and y units of a commodity B is

C(x, y) = 0.2x² + 0.05xy + 0.05y² + 20x + 10y + 2500.

If each unit from A and B is sold for 75 and 45 respectively, find the daily production levels x and y that maximize the profit per day.

Solution: The daily profit is given by

P(x, y) = 75x + 45y − C(x, y) = −0.2x² − 0.05xy − 0.05y² + 55x + 35y − 2500.

Since P is differentiable (because it is a polynomial), the points that maximize the profit are critical ones, i.e., solutions of

∇P(x, y) = ⟨−0.4x − 0.05y + 55, −0.05x − 0.1y + 35⟩ = ⟨0, 0⟩   ⇐⇒   x = 100, y = 300.

We deduce that (100, 300) is the only critical point of P; see Figure 2.18. Now, we apply the second derivatives test to classify that point. We have

HP(x, y) = [ Pxx  Pxy ; Pyx  Pyy ] = [ −0.4  −0.05 ; −0.05  −0.1 ],
D1(100, 300) = Pxx = −0.4 < 0,
D2(100, 300) = Pxx Pyy − (Pxy)² = (−0.4)(−0.1) − (−0.05)² = 0.0375 > 0.

So (100, 300) is a local maximum point. In fact, it is a global maximum point where P attains the optimal value P(100, 300) = 5500. This is true because P is concave in R². Indeed, we have

D1(x, y) < 0   and   D2(x, y) > 0   ∀(x, y) ∈ R²   (see next section).
[Graph and level curves of z = −0.2x² − 0.05xy − 0.05y² + 55x + 35y − 2500]
FIGURE 2.18: Profit function P(x, y) and maximum point (100, 300)
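These numbers are easy to verify in code; the sketch below solves ∇P = ⟨0, 0⟩ by Cramer's rule and checks the two leading minors (variable names are illustrative):

```python
# grad P = 0  <=>  0.4x + 0.05y = 55  and  0.05x + 0.1y = 35
a11, a12, b1 = 0.4, 0.05, 55.0
a21, a22, b2 = 0.05, 0.1, 35.0
det = a11 * a22 - a12 * a21           # 0.0375
x = (b1 * a22 - b2 * a12) / det       # 100.0
y = (a11 * b2 - a21 * b1) / det       # 300.0

def P(x, y):
    return -0.2*x**2 - 0.05*x*y - 0.05*y**2 + 55*x + 35*y - 2500

D1 = -0.4                             # P_xx < 0
D2 = (-0.4) * (-0.1) - (-0.05)**2     # det H_P = 0.0375 > 0
```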
x = −1 y = 1 or y = −1 ⎧ ⎨ x=1 and ⎩ y = −1 ⎧ ⎨ x = −1 and or ⎩ y = −1.
We deduce that (1, 0), (1, 1), (1, −1), (−1, 0), (−1, 1) and (−1, −1) are the critical points of f . The level curves, graphed in Figure 2.19, show the nature of these points.
[Graph and level curves of z = y⁴ − 2y² − x³ + 3x]
FIGURE 2.19: Local extreme points of f(x, y) = 3x − x³ − 2y² + y⁴

∗ Classification of the critical points: We have

fxx(x, y) = −6x,   fxy(x, y) = 0,   fyy(x, y) = 12y² − 4,
Hf(x, y) = [ −6x  0 ; 0  12y² − 4 ],
D1(x, y) = fxx = −6x,   D2(x, y) = fxx fyy − (fxy)² = −24x(3y² − 1).

Applying the second derivative test, we obtain:
(x, y)       D1(x, y)     D2(x, y)           type
(1, 0)       −6           (−6)(−4) = 24      local maximum
(−1, 1)      6            (6)(8) = 48        local minimum
(−1, −1)     6            (6)(8) = 48        local minimum
(1, 1)       −6           (−6)(8) = −48      saddle point
(1, −1)      −6           (−6)(8) = −48      saddle point
(−1, 0)      6            (6)(−4) = −24      saddle point

TABLE 2.5: Critical points’ classification for f(x, y) = 3x − x³ − 2y² + y⁴
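Table 2.5 can be reproduced mechanically from the two leading minors; a minimal sketch:

```python
def minors(x, y):
    """Leading minors of the Hessian of f(x, y) = 3x - x^3 - 2y^2 + y^4."""
    fxx, fxy, fyy = -6 * x, 0.0, 12 * y**2 - 4
    return fxx, fxx * fyy - fxy**2            # D1, D2

def classify(x, y):
    d1, d2 = minors(x, y)
    if d2 > 0:
        return "local minimum" if d1 > 0 else "local maximum"
    return "saddle point" if d2 < 0 else "inconclusive"

points = [(1, 0), (1, 1), (1, -1), (-1, 0), (-1, 1), (-1, -1)]
results = {p: classify(*p) for p in points}
```

For two variables, D2 > 0 forces D1 and fyy to share a sign, so checking D1 alone suffices to separate minima from maxima.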
The proof of Theorem 2.2.1 uses Taylor’s formula for a function of several variables and a characterization of symmetric quadratic forms (see the end of this section). Taylor’s formula will be used several times through out the next chapters. It is therefore important to understand its proof.
Theorem 2.2.2 2nd-order Taylor formula for a function of n variables
Suppose f is C² in an open set of Rⁿ containing the line segment [x∗, x∗ + h]. Then

f(x∗ + h) = f(x∗) + Σ_{i=1}^n (∂f/∂xi)(x∗) hi + (1/2) Σ_{i=1}^n Σ_{j=1}^n (∂²f/∂xi∂xj)(x∗ + ch) hi hj,

or

f(x∗ + h) = f(x∗) + ∇f(x∗).h + (1/2) ᵗh Hf(x∗ + ch) h,

for some c ∈ (0, 1), where x∗ = ᵗ(x∗1, ..., x∗n), h = ᵗ(h1, ..., hn) and ᵗh = (h1 ... hn). Here, we identify the column vector x∗ + th with the point (x∗1 + th1, ..., x∗n + thn), t ∈ R.
Proof. Define the function g(t) = f(x∗1 + th1, ..., x∗n + thn) = f(x∗ + th). Note that

g(t) = f(x1(t), x2(t), ..., xn(t))   with   xj(t) = x∗j + thj,   j = 1, ..., n.

Since the real functions xj, j = 1, ..., n, are differentiable with xj′(t) = hj, g is differentiable, and we have by the chain rule formula

g′(t) = (∂f/∂x1)(∂x1/∂t) + (∂f/∂x2)(∂x2/∂t) + ... + (∂f/∂xn)(∂xn/∂t)
      = fx1(x∗ + th) h1 + fx2(x∗ + th) h2 + ... + fxn(x∗ + th) hn = ∇f(x∗ + th).h.

Because f is C², g is also C², and we have

g″(t) = (d/dt)fx1(x∗ + th) h1 + (d/dt)fx2(x∗ + th) h2 + ... + (d/dt)fxn(x∗ + th) hn.

For each i = 1, ..., n, we have fxi(x∗ + th) = fxi(x1(t), x2(t), ..., xn(t)). Then

(d/dt)fxi(x∗ + th) = (∂fxi/∂x1)(∂x1/∂t) + ... + (∂fxi/∂xn)(∂xn/∂t) = Σ_{j=1}^n fxixj(x∗ + th) hj.

Hence

g″(t) = Σ_{i=1}^n Σ_{j=1}^n fxixj(x∗ + th) hi hj.
Now, since f is defined on the segment [x∗, x∗ + h], g is defined on the interval [0, 1], and by using the 2nd-order Taylor formula for real functions [1], [2], we get

g(1) = g(0) + (g′(0)/1!)(1 − 0) + (g″(c)/2!)(1 − 0)² = g(0) + g′(0) + (1/2)g″(c)   for some c ∈ (0, 1),

or equivalently

f(x∗ + h) = f(x∗) + Σ_{i=1}^n fxi(x∗) hi + (1/2) Σ_{i=1}^n Σ_{j=1}^n fxixj(x∗ + ch) hi hj.
Proof. (Theorem 2.2.1) Since x∗ is an interior point of S and a stationary point of f, we have ∇f(x∗) = 0. For h ∈ Rⁿ such that x∗ + h ∈ S, we have from the 2nd-order Taylor formula

f(x∗ + h) = f(x∗) + (1/2) ᵗh Hf(x∗ + ch) h   for some c ∈ (0, 1).
Suppose that Dk (x∗ ) > 0 for all k = 1, . . . , n.
By continuity of the second-order partial derivatives of f , there exists r > 0 such that Dk (x) > 0
∀x ∈ Br (x∗ )
∀k = 1, . . . , n.
As a consequence, the quadratic form Q(h)(x) =
n n
fxi xj (x)hi hj =t hHf (x)h
i=1 j=1
with the associated symmetric matrix Hf (x) = fxi xj (x) n×n is definitely positive. Since x∗ + ch ∈ Br (x∗ ), then Q(h)(x∗ + ch) =t hHf (x∗ + ch)h > 0. Therefore, we have for x∗ + h ∈ Br (x∗ ) 1 Q(h)(x∗ + ch) > 0 2 which shows that the stationary point x∗ is a strict local minimum point for f in S. f (x∗ + h) − f (x∗ ) =
Situation (ii). Suppose that (−1)^k Dk(x∗) > 0 for all k = 1, ..., n. By continuity of the second-order partial derivatives of f, there exists r > 0 such that

(−1)^k Dk(x) > 0   ∀x ∈ Br(x∗),   ∀k = 1, ..., n.

From the properties of determinants, we can write

(−1)^k Dk(x∗) = det( (−f)xixj(x∗) )_{i,j = 1, ..., k}.

As a consequence, the quadratic form

ᵗh H−f(x) h = Σ_{i=1}^n Σ_{j=1}^n (−f)xixj(x) hi hj,

with the associated symmetric matrix H−f(x) = ((−f)xixj(x))_{n×n}, is positive definite. Therefore, we have for x∗ + h ∈ B(x∗, r)

(−f)(x∗ + h) − (−f)(x∗) = (1/2) ᵗh H−f(x∗ + ch) h > 0
=⇒   (−f)(x∗ + h) > (−f)(x∗)   ⇐⇒   f(x∗) > f(x∗ + h),

which shows that the stationary point x∗ is a strict local maximum point for f in S.
Situation (iii). Assume Dn(x∗) ≠ 0 and neither of the conditions (i) and (ii) holds. Note that situation (i) (resp. (ii)) means also that the matrix A = (fxixj(x∗))_{n×n} is positive (resp. negative) definite, which is equivalent to each of its eigenvalues λi being positive (resp. negative). So, if neither (i) nor (ii) holds, there exist i0, j0 ∈ {1, ..., n} such that

Dn(x∗) = Π_{i=1}^n λi ≠ 0   with   λ_{i0} > 0 and λ_{j0} < 0.

Now, since A is symmetric, there exists an orthogonal matrix O = (pij)_{n×n} (O⁻¹ = ᵗO) such that

A = O D ᵗO,   D = diag(λ1, ..., λn).

Then the quadratic form Q(h) can be written as

Q(h)(x∗) = ᵗh A h = ᵗ[ᵗOh] D [ᵗOh] = Σ_{i=1}^n λi ( Σ_{j=1}^n pji hj )².

Choose hs and h′s such that, for s > 0,

ᵗO hs = (s/√λ_{i0}) e_{i0} + (2s/√(−λ_{j0})) e_{j0},   ᵗO h′s = (2s/√λ_{i0}) e_{i0} + (s/√(−λ_{j0})) e_{j0},

which is possible since ᵗO is invertible. Then we have

Q(hs)(x∗) = λ_{i0}(s/√λ_{i0})² + λ_{j0}(2s/√(−λ_{j0}))² = s² − 4s² = −3s² < 0,
Q(h′s)(x∗) = λ_{i0}(2s/√λ_{i0})² + λ_{j0}(s/√(−λ_{j0}))² = 4s² − s² = 3s² > 0.

We deduce, by continuity of Q(h)(x), the existence of δ > 0 such that, for all s ∈ (0, δ),

f(x∗ + hs) − f(x∗) = (1/2)Q(hs)(x∗ + chs) < 0,
f(x∗ + h′s) − f(x∗) = (1/2)Q(h′s)(x∗ + ch′s) > 0.

Thus f takes values greater and less than f(x∗) in every neighborhood of x∗. Therefore x∗ is a saddle point.
The following theorem shows that the Hessian matrix of a C² function at a local minimum (resp. maximum) point is necessarily positive (resp. negative) semi-definite. However, this condition is not sufficient, as we can see in a suggested exercise where the origin is neither a local minimum nor a local maximum.
Theorem 2.2.3 Necessary conditions for a local extreme point
Let $S \subset \mathbb{R}^n$ and $f : S \longrightarrow \mathbb{R}$ be a $C^2$ function in a neighborhood of a critical point $x^* \in \overset{\circ}{S}$ ($\nabla f(x^*) = 0$). Then

(i) $x^*$ is a local minimum point $\;\Longrightarrow\; \Delta_k(x^*) \geq 0 \quad \forall k = 1, \ldots, n$

(ii) $x^*$ is a local maximum point $\;\Longrightarrow\; (-1)^k \Delta_k(x^*) \geq 0 \quad \forall k = 1, \ldots, n$

where $\Delta_k(x^*)$ denotes a principal minor of order $k$ of the Hessian matrix $H_f(x^*)$; that is, the determinant of a matrix obtained by deleting $n - k$ rows and $n - k$ columns such that if the $i$th row (column) is selected, then so is the $i$th column (row).
Proof. (i) Suppose that $x^*$ is an interior local minimum point for $f$. There exists $r > 0$ such that

$f(x^*) \leq f(x) \qquad \forall x \in B_r(x^*).$
Unconstrained Optimization
81
In particular, for $t \in (-r, r)$ and $h \in \mathbb{R}^n$ with $\|h\| = 1$, we have $x^* + th \in B_r(x^*)$ since $\|x^* + th - x^*\| = |t|\,\|h\| = |t| < r$. Then

$g(0) = f(x^*) \leq f(x^* + th) = g(t) \qquad \forall t \in (-r, r).$

So $g$ is a one-variable function that has an interior local minimum at $t = 0$. Consequently, it satisfies

$g'(0) = 0 \qquad \text{and} \qquad g''(0) \geq 0.$

From previous calculations, we have

$g''(t) = \sum_{i=1}^{n}\sum_{j=1}^{n} f_{x_i x_j}(x^* + th)\, h_i h_j.$

Hence

$g''(0) = \sum_{i=1}^{n}\sum_{j=1}^{n} f_{x_i x_j}(x^*)\, h_i h_j = {}^t h\, H_f(x^*)\, h \geq 0.$

The above inequality remains true for $h = 0$ and for $h \neq 0$: for this last case it suffices to consider $h/\|h\|$, which is a unit vector. Hence the Hessian matrix of $f$ at $x^*$ is positive semidefinite by the result below from Algebra (see [10]).

(ii) is proved similarly.
Quadratic forms

Consider the quadratic form in $n$ variables

$Q(h) = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} h_i h_j = {}^t h A h, \qquad {}^t h = \begin{bmatrix} h_1 & \ldots & h_n \end{bmatrix},$

associated to the symmetric matrix

$A = (a_{ij})_{i,j=1,\ldots,n} = \begin{bmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \ldots & a_{nn} \end{bmatrix} \qquad (a_{ij} = a_{ji}).$
Definition. $Q$ is positive (resp. negative) definite if $Q(h) > 0$ (resp. $< 0$) for all $h \neq 0$. $Q$ is positive (resp. negative) semidefinite if $Q(h) \geq 0$ (resp. $\leq 0$) for all $h \in \mathbb{R}^n$.

We have the following necessary and sufficient conditions for a quadratic form $Q$ to be positive (negative), definite or semidefinite.
Theorem.
$Q$ is positive definite $\;\Longleftrightarrow\; D_r > 0, \quad r = 1, \ldots, n$
$Q$ is negative definite $\;\Longleftrightarrow\; (-1)^r D_r > 0, \quad r = 1, \ldots, n$

where $D_r$ is the leading principal minor of order $r$ of the matrix $A$:

$D_r = \begin{vmatrix} a_{11} & \ldots & a_{1r} \\ \vdots & \ddots & \vdots \\ a_{r1} & \ldots & a_{rr} \end{vmatrix} \qquad \text{for } r = 1, \ldots, n.$
Theorem.
$Q$ is positive semidefinite $\;\Longleftrightarrow\; \Delta_r \geq 0, \quad r = 1, \ldots, n$
$Q$ is negative semidefinite $\;\Longleftrightarrow\; (-1)^r \Delta_r \geq 0, \quad r = 1, \ldots, n$

where $\Delta_r$ denotes a principal minor of order $r$ of the matrix $A$; that is, the determinant of a matrix obtained from $A$ by deleting $n - r$ rows and $n - r$ columns with the same numbers (if the $i$th row (column) is selected, then so is the $i$th column (row)). The inequalities must hold for every principal minor of each order.
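As a quick numerical illustration of the semidefiniteness criterion above, the following sketch (Python, written for this edition and not part of the book) enumerates every principal minor of a small symmetric matrix and checks the sign condition:

```python
from itertools import combinations

def det(M):
    # Laplace expansion along the first row (adequate for the small matrices here)
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def principal_minors(A):
    # All principal minors: keep the SAME index set for rows and columns
    n = len(A)
    return [det([[A[i][j] for j in idx] for i in idx])
            for r in range(1, n + 1)
            for idx in combinations(range(n), r)]

def is_positive_semidefinite(A):
    # Every principal minor (not only the leading ones) must be >= 0
    return all(d >= -1e-12 for d in principal_minors(A))

# The zero matrix (the Hessian of x^4 + y^4 at the origin) passes the test:
print(is_positive_semidefinite([[0.0, 0.0], [0.0, 0.0]]))   # True
# A matrix with a negative diagonal entry fails (principal minor -1 < 0):
print(is_positive_semidefinite([[0.0, 0.0], [0.0, -1.0]]))  # False
```

Note that checking only the leading minors $D_r$ would wrongly accept the second matrix ($D_1 = 0$, $D_2 = 0$), which is why the semidefinite test requires all principal minors.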
Solved Problems
1. – Use the following functions to show that positive or negative semidefiniteness of the Hessian of the objective function at a critical point is not a sufficient condition for local optimality:

$f(x, y) = x^4 + y^4, \qquad g(x, y) = -(x^4 + y^4), \qquad h(x, y) = x^4 - y^4.$
Solution:

FIGURE 2.20: Graphs of f, g, h
We have

$\nabla f(x, y) = \langle 4x^3, 4y^3 \rangle, \qquad \nabla g(x, y) = \langle -4x^3, -4y^3 \rangle, \qquad \nabla h(x, y) = \langle 4x^3, -4y^3 \rangle.$

So $(0, 0)$ is the only stationary point for $f$, $g$ and $h$. But we cannot conclude anything about its nature by using the second derivatives test, since the Hessian matrix at $(0, 0)$ of each function is equal to the zero matrix:

$H_f = \begin{bmatrix} 12x^2 & 0 \\ 0 & 12y^2 \end{bmatrix}, \qquad H_g = -H_f, \qquad H_h = \begin{bmatrix} 12x^2 & 0 \\ 0 & -12y^2 \end{bmatrix},$

$H_f(0, 0) = H_g(0, 0) = H_h(0, 0) = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$
$\Delta_1^1(0, 0) = \Delta_1^2(0, 0) = \Delta_2(0, 0) = 0,$

where $\Delta_1^l$ is the principal minor of order 1 obtained by deleting the $l$th row and $l$th column, $l = 1, 2$. Thus the Hessian matrices of $f$, $g$ and $h$ are both positive and negative semidefinite at $(0, 0)$. However, this doesn't imply that $(0, 0)$ is a local minimum or maximum point. Indeed, by looking at the functions directly, we can classify the point. The three situations are shown in Figure 2.20.

First, note that $(0, 0)$ is a global minimum for $f$ since we have

$f(x, y) = x^4 + y^4 \geq 0 = f(0, 0)$
∀(x, y) ∈ R2 .
Next, note that $(0, 0)$ is a global maximum for $g$. Indeed, we have

$g(x, y) = -(x^4 + y^4) \leq 0 = g(0, 0)$
∀(x, y) ∈ R2 .
Finally, $(0, 0)$ is a saddle point for $h$ since we have

$h(x, 0) = x^4 \geq 0 = h(0, 0) \quad \forall x \in \mathbb{R}, \qquad h(0, y) = -y^4 \leq 0 = h(0, 0) \quad \forall y \in \mathbb{R}.$
Thus, for any disk centered at (0, 0), h takes values greater and lower than h(0, 0).
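The saddle behaviour of $h$ can be confirmed in two lines; this is an editorial Python sketch, not part of the original solution:

```python
def h(x, y):
    # h(x, y) = x^4 - y^4 from the problem above
    return x ** 4 - y ** 4

# All second partial derivatives of h vanish at (0, 0) (12x^2 and -12y^2 are 0),
# yet h takes both signs in every disk around the origin:
assert h(0.0, 0.0) == 0.0
assert h(0.1, 0.0) > 0.0   # along the x-axis, h > h(0, 0)
assert h(0.0, 0.1) < 0.0   # along the y-axis, h < h(0, 0)
```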
2. – Classify the stationary points of $f(x_1, x_2, x_3, x_4) = 20x_2 + 48x_3 + 6x_4 + 8x_1 x_2 - 4x_1^2 - 12x_3^2 - x_4^2 - 4x_2^3$. Does $f$ attain its global extreme values on $\mathbb{R}^4$?
Solution: Since the function $f$ is differentiable (because it is a polynomial), the local extreme points are critical points, i.e., solutions of

$\nabla f(x_1, x_2, x_3, x_4) = \langle 8x_2 - 8x_1,\; 20 + 8x_1 - 12x_2^2,\; 48 - 24x_3,\; 6 - 2x_4 \rangle = 0_{\mathbb{R}^4}$

$\Longleftrightarrow\quad x_2 = x_1, \qquad 5 + 2x_1 - 3x_2^2 = 0, \qquad x_3 = 2, \qquad x_4 = 3.$

We deduce that $(-1, -1, 2, 3)$ and $\big(\frac{5}{3}, \frac{5}{3}, 2, 3\big)$ are the critical points of $f$.
• Classification of the critical points: The Hessian matrix of $f$ is

$H_f(x_1, x_2, x_3, x_4) = \begin{bmatrix} -8 & 8 & 0 & 0 \\ 8 & -24x_2 & 0 & 0 \\ 0 & 0 & -24 & 0 \\ 0 & 0 & 0 & -2 \end{bmatrix}.$

The leading principal minors at the point $(-1, -1, 2, 3)$ are

$D_1 = -8 < 0, \quad D_2 = -256 < 0, \quad D_3 = -24 D_2 > 0, \quad D_4 = -2 D_3 < 0.$

Then $(-1, -1, 2, 3)$ is a saddle point. The leading principal minors at the point $\big(\frac{5}{3}, \frac{5}{3}, 2, 3\big)$ are

$D_1 = -8 < 0, \quad D_2 = 256 > 0, \quad D_3 = -24 D_2 < 0, \quad D_4 = -2 D_3 > 0.$

Then $\big(\frac{5}{3}, \frac{5}{3}, 2, 3\big)$ is a local maximum point.

• Global optimal points: Note that

$f(0, x_2, 0, 0) = 20x_2 - 4x_2^3 \longrightarrow \mp\infty \qquad \text{as} \qquad x_2 \longrightarrow \pm\infty.$
Thus $f$ takes large negative and positive values. Therefore $f$ doesn't attain its global optimal values on $\mathbb{R}^4$.
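The leading principal minors computed above can be reproduced with a short script; the block-diagonal structure of the Hessian makes the expansion explicit. This is a Python sketch added for this edition, not from the book:

```python
def leading_minors(x2):
    # Leading principal minors of Hf at a critical point with second
    # coordinate x2. The Hessian is block diagonal, so D3 and D4 follow
    # from D2 by multiplying in the diagonal entries -24 and -2.
    d1 = -8.0
    d2 = (-8.0) * (-24.0 * x2) - 8.0 * 8.0   # det of the top-left 2x2 block
    d3 = -24.0 * d2
    d4 = -2.0 * d3
    return [d1, d2, d3, d4]

# At (-1, -1, 2, 3): signs (-, -, +, -) satisfy neither test -> saddle point
print(leading_minors(-1.0))        # [-8.0, -256.0, 6144.0, -12288.0]
# At (5/3, 5/3, 2, 3): signs alternate (-, +, -, +) -> local maximum
print(leading_minors(5.0 / 3.0))   # signs: -, +, -, +
```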
3. – Let $f(x, y) = \ln(1 + x^2 y)$.
i) Find and sketch the domain of definition of f.
ii) Find the stationary points and show that the second-derivatives test is inconclusive at these points.
iii) Describe the behavior of f at these points.
Solution: i) The domain of $f$ is given by:

$D_f = \{(x, y) \in \mathbb{R}^2 : 1 + x^2 y > 0\} = \{(0, y) : y \in \mathbb{R}\} \cup \Big\{(x, y) \in \mathbb{R}^* \times \mathbb{R} : y > -\frac{1}{x^2}\Big\}.$

The domain of $f$ is the region located above the curve $y = -\frac{1}{x^2}$, including the $y$ axis; see Figure 2.21.
FIGURE 2.21: Domain of f(x, y) = ln(1 + x²y)
ii) $f$ is differentiable on its open domain $D_f$ because $f = v \circ u$ with $v(t) = \ln t$ and $u(x, y) = 1 + x^2 y$: $u$ is differentiable in $\mathbb{R}^2$, hence in particular in $D_f$; $u(D_f) \subset \mathbb{R}^*_+$; and $v$ is differentiable in $\mathbb{R}^*_+$.

The stationary points are solutions of

$\nabla f(x, y) = \Big\langle \frac{2xy}{1 + x^2 y}, \frac{x^2}{1 + x^2 y} \Big\rangle = \langle 0, 0 \rangle \;\Longleftrightarrow\; xy = 0 \;\text{ and }\; x^2 = 0 \;\Longleftrightarrow\; x = 0, \quad y \in \mathbb{R}.$

We deduce that the points located on the $y$ axis are the critical points of $f$. The Hessian matrix of $f$ is

$H_f(x, y) = \frac{1}{(1 + x^2 y)^2} \begin{bmatrix} 2y(1 - x^2 y) & 2x \\ 2x & -x^4 \end{bmatrix}.$

At the stationary points, we have

$H_f(0, y) = \begin{bmatrix} 2y & 0 \\ 0 & 0 \end{bmatrix}.$
FIGURE 2.22: Graph and level curves of f(x, y) = ln(1 + x²y)
The leading minor $D_2(0, y) = \det(H_f(0, y)) = 0$, so the second derivatives test fails at these points. The behaviour of the function is illustrated in Figure 2.22.

Classification of these points:

• The points $(0, y_0)$ with $y_0 > 0$ are local minimum points for $f$. Indeed, since the logarithm function is increasing, we have

$f(x, y) = \ln(1 + x^2 y) \geq \ln(1) = \ln(1 + 0^2 y_0) = 0 = f(0, y_0)$

$\forall x \in \mathbb{R}, \quad \forall y \in \Big(y_0 - \frac{y_0}{2},\; y_0 + \frac{y_0}{2}\Big) = \Big(\frac{y_0}{2}, \frac{3y_0}{2}\Big).$

Thus, $f$ takes values greater than $f(0, y_0)$ in a neighborhood of $(0, y_0)$ with $y_0 > 0$.

• The points $(0, y_0)$ with $y_0 < 0$ are local maximum points for $f$. Indeed, since $\ln$ is an increasing function, we have

$f(x, y) = \ln(1 + x^2 y) \leq \ln(1) = \ln(1 + 0^2 y_0) = 0 = f(0, y_0)$

$\forall y \in \Big(y_0 + \frac{y_0}{2},\; y_0 - \frac{y_0}{2}\Big) = \Big(\frac{3y_0}{2}, \frac{y_0}{2}\Big), \quad \forall x \text{ such that } 0 < 1 + x^2 y.$

$f$ takes values lower than $f(0, y_0)$ in a neighborhood of $(0, y_0)$ with $y_0 < 0$.

• The point $(0, 0)$ is a saddle point for $f$. Indeed, we have

$f(x, y) = \ln(1 + x^2 y) \geq \ln(1) = 0 = f(0, 0) \qquad \forall y \in \mathbb{R}_+$
$f(x, y) = \ln(1 + x^2 y) \leq \ln(1) = 0 = f(0, 0) \qquad \forall y \in \mathbb{R}_- \text{ such that } 0 < 1 + x^2 y.$
For any disk centered at (0, 0), f takes values greater and lower than f (0, 0).
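The sign change of $f$ near the origin is easy to verify numerically; the following is an editorial Python sketch, not part of the book:

```python
import math

def f(x, y):
    # f(x, y) = ln(1 + x^2 y), defined where 1 + x^2 y > 0
    return math.log(1 + x * x * y)

# (0, 0) is a saddle point: f vanishes there but takes both signs nearby
assert f(0.0, 0.0) == 0.0
assert f(0.1, 0.5) > 0.0     # y > 0 side: f > f(0, 0)
assert f(0.1, -0.5) < 0.0    # y < 0 side (here 1 + x^2 y = 0.995 > 0): f < f(0, 0)
```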
4. – Find and classify all stationary points of $f(x, y) = x^2 y + y^3 x - xy$. Are there global minimum and maximum values of $f$ on $\mathbb{R}^2$?
Solution:
FIGURE 2.23: Graph and level curves of f(x, y) = x²y + y³x − xy

The function $f$ is differentiable on its open domain $\mathbb{R}^2$ since it is a polynomial. So the local extreme points are critical, i.e., solutions of

$\nabla f(x, y) = \langle 2xy + y^3 - y,\; x^2 + 3y^2 x - x \rangle = \langle 0, 0 \rangle$

$\Longleftrightarrow\quad y(y^2 + 2x - 1) = 0 \quad \text{and} \quad x(3y^2 + x - 1) = 0$

$\Longleftrightarrow\quad [y = 0 \text{ and } x = 0] \;\text{ or }\; [y = 0 \text{ and } 3y^2 + x - 1 = 0] \;\text{ or }\; [y^2 + 2x - 1 = 0 \text{ and } x = 0] \;\text{ or }\; [y^2 + 2x - 1 = 0 \text{ and } 3y^2 + x - 1 = 0]$

$\Longleftrightarrow\quad [x = 0 \text{ and } y = 0] \;\text{ or }\; [y = 0 \text{ and } x = 1] \;\text{ or }\; [x = 0 \text{ and } y^2 = 1] \;\text{ or }\; \Big[y^2 = \frac{1}{5} \text{ and } x = \frac{2}{5}\Big].$
We deduce that $(0, 0)$, $(1, 0)$, $(0, 1)$, $(0, -1)$, $\big(\frac{2}{5}, \frac{1}{\sqrt{5}}\big)$ and $\big(\frac{2}{5}, -\frac{1}{\sqrt{5}}\big)$ are the critical points of $f$. Reading the level curves in Figure 2.23, one can locate four saddle points and two local extrema.

Classification of the critical points: Applying the second derivatives test, we obtain:
critical point    D1(x, y)   D2(x, y)   classification
(0, 0)            0          −1         saddle point
(1, 0)            0          −1         saddle point
(0, 1)            2          −4         saddle point
(0, −1)           −2         −4         saddle point
(2/5, 1/√5)       2/√5       4/5        local minimum point
(2/5, −1/√5)      −2/√5      4/5        local maximum point

TABLE 2.6: Critical points' classification for f(x, y) = x²y + y³x − xy

where

$f_{xx}(x, y) = 2y, \qquad f_{xy}(x, y) = 2x + 3y^2 - 1, \qquad f_{yy}(x, y) = 6xy,$

$H_f(x, y) = \begin{bmatrix} 2y & 2x + 3y^2 - 1 \\ 2x + 3y^2 - 1 & 6xy \end{bmatrix}, \qquad D_1(x, y) = f_{xx} = 2y,$
$D_2(x, y) = \begin{vmatrix} f_{xx} & f_{xy} \\ f_{xy} & f_{yy} \end{vmatrix} = \begin{vmatrix} 2y & 2x + 3y^2 - 1 \\ 2x + 3y^2 - 1 & 6xy \end{vmatrix} = 12xy^2 - [2x + 3y^2 - 1]^2.$

Finally, note that $f$ takes large positive and negative values since we have

$f(1, y) = y^3 \longrightarrow \pm\infty \qquad \text{as} \qquad y \longrightarrow \pm\infty.$

Therefore, $f$ doesn't attain a global maximal value nor a minimal one.
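The six critical points can be checked directly against the gradient; this is a Python sketch added for this edition, not from the book:

```python
def grad_f(x, y):
    # Gradient of f(x, y) = x^2 y + y^3 x - xy
    return (2 * x * y + y ** 3 - y, x * x + 3 * y * y * x - x)

r5 = 5 ** 0.5
critical = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, -1.0),
            (2 / 5, 1 / r5), (2 / 5, -1 / r5)]
for p in critical:
    gx, gy = grad_f(*p)
    assert abs(gx) < 1e-12 and abs(gy) < 1e-12   # stationary point

# f(1, y) = y^3 is unbounded in both directions, so no global extrema:
f = lambda x, y: x * x * y + y ** 3 * x - x * y
assert f(1.0, 100.0) > 0.0 > f(1.0, -100.0)
```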
5. – A power substation must be located at a point closest to three houses located at the points (0, 0), (1, 1), (0, 2). Find the optimal location by minimizing the sum of the squares of the distances between the houses and the substation.

Solution: Let $(x, y)$ be the position of the power substation (see Figure 2.24).

FIGURE 2.24: The closest power station to three houses

Then we have to look for $(x, y)$ as the point that minimizes the function

$f(x, y) = d^2((x, y), (0, 0)) + d^2((x, y), (1, 1)) + d^2((x, y), (0, 2)),$

which can be written as

$f(x, y) = [(x - 0)^2 + (y - 0)^2] + [(x - 1)^2 + (y - 1)^2] + [(x - 0)^2 + (y - 2)^2].$

Because $f$ is polynomial, it is differentiable on the open set $\mathbb{R}^2$. Thus a global minimum point is also a local one. Therefore, it is a solution of
$\nabla f(x, y) = \langle 2x + 2(x - 1) + 2x,\; 2y + 2(y - 1) + 2(y - 2) \rangle = \langle 0, 0 \rangle$

$\Longleftrightarrow\quad 6x - 2 = 0 \;\text{ and }\; 6y - 6 = 0 \quad\Longleftrightarrow\quad (x, y) = \Big(\frac{1}{3}, 1\Big).$

Thus we have one critical point, and by applying the second derivatives test, we obtain:

$H_f(x, y) = \begin{bmatrix} 6 & 0 \\ 0 & 6 \end{bmatrix}, \qquad D_1\Big(\frac{1}{3}, 1\Big) = 6 > 0, \qquad D_2\Big(\frac{1}{3}, 1\Big) = \begin{vmatrix} 6 & 0 \\ 0 & 6 \end{vmatrix} = 36 > 0.$
So $(\frac{1}{3}, 1)$ is a local minimum; see Figure 2.24 for the position of the point and the three houses. To show that it is the point that minimizes $f$ globally, we proceed by comparing the values of $f$ and completing squares:

$f(x, y) - f\Big(\frac{1}{3}, 1\Big) = 3x^2 - 2x + 1 + 3y^2 - 6y + 5 - \Big(\frac{2}{3} + 2\Big) = 3\Big(x - \frac{1}{3}\Big)^2 + 3(y - 1)^2 \geq 0 \qquad \forall (x, y) \in \mathbb{R}^2.$

6. – Based on the level curves that are visible in Figures 2.25 and 2.26, identify the approximate position of the local maxima, local minima and saddle points.
FIGURE 2.25: Level curves of f(x, y) = −xy e^{−(x²+y²)/2} on [−2, 2] × [−2, 2]
Solution: i) From the level curves' plot, one can locate:
- a saddle point at (0, 0)
FIGURE 2.26: Level curves of g(x, y) = sin(x) + sin(y) − cos(x + y) for x, y in [0, 3π]
- two local maxima at (−1, 1), (1, −1)
- two local minima at (−1, −1), (1, 1).

Using Maple software, one can check these observations by applying the second derivatives test using the coding:

with(Student[MultivariateCalculus])
LagrangeMultipliers(-x*y*exp(-(x^2+y^2)*(1/2)), [], [x, y], output = detailed)
  [x = 0, y = 0, -x*y*exp(-(1/2)*x^2-(1/2)*y^2) = 0],
  [x = 1, y = 1, -x*y*exp(-(1/2)*x^2-(1/2)*y^2) = -exp(-1)],
  [x = 1, y = -1, -x*y*exp(-(1/2)*x^2-(1/2)*y^2) = exp(-1)],
  [x = -1, y = 1, -x*y*exp(-(1/2)*x^2-(1/2)*y^2) = exp(-1)],
  [x = -1, y = -1, -x*y*exp(-(1/2)*x^2-(1/2)*y^2) = -exp(-1)]
SecondDerivativeTest(-x*y*exp(-(x^2+y^2)*(1/2)), [x, y] = [0, 0])
  LocalMin = [], LocalMax = [], Saddle = [[0, 0]]
SecondDerivativeTest(-x*y*exp(-(x^2+y^2)*(1/2)), [x, y] = [1, 1])
  LocalMin = [[1, 1]], LocalMax = [], Saddle = []
  ...
ii) For the second figure, the exact points found, using Maple, are:

- 5 saddle points at $\big(\frac{3\pi}{2}, \frac{3\pi}{2}\big)$, $\big(\frac{\pi}{2}, \frac{3\pi}{2}\big)$, $\big(\frac{3\pi}{2}, \frac{\pi}{2}\big)$, $\big(\frac{3\pi}{2}, \frac{5\pi}{2}\big)$, $\big(\frac{5\pi}{2}, \frac{3\pi}{2}\big)$
- 4 local maxima at $\big(\frac{\pi}{2}, \frac{\pi}{2}\big)$, $\big(\frac{\pi}{2}, \frac{5\pi}{2}\big)$, $\big(\frac{5\pi}{2}, \frac{\pi}{2}\big)$, $\big(\frac{5\pi}{2}, \frac{5\pi}{2}\big)$
- 2 local minima at $\big(\frac{7\pi}{6}, \frac{7\pi}{6}\big)$, $\big(\frac{11\pi}{6}, \frac{11\pi}{6}\big)$.
2.3 Convexity/Concavity and Global Extreme Points
In dimension 1, when a $C^2$ function $f$ is convex on its domain $D_f$ and $x^*$ is a local minimum of $f$, then $x^*$ is a global minimum. Indeed, the convexity of $f$ is characterized by $f''(x) \geq 0$ [2], [1]. Then, using Taylor's formula, the values $f(x)$ and $f(x^*)$ can be compared as follows:

$f(x) = f(x^*) + (x - x^*) f'(x^*) + \frac{(x - x^*)^2}{2} f''(c) \qquad \text{for some } c \text{ between } x^* \text{ and } x.$

Because $f'(x^*) = 0$, then

$f(x) - f(x^*) = \frac{(x - x^*)^2}{2} f''(c) \geq 0.$

As $x$ is arbitrarily chosen in the domain of $f$, then

$f(x) \geq f(x^*) \qquad \forall x \in D_f,$

which shows that $x^*$ is a global minimum point for $f$.
In this section, we want to generalize the convexity property to functions of several variables in order to establish, later, results of global optimality.
2.3.1 Convex/Concave Several Variable Functions
Definition 2.3.1 Let $S$ be a convex set of $\mathbb{R}^n$ and let $f$ be a real function

$f : S \longrightarrow \mathbb{R}, \qquad x = (x_1, \cdots, x_n) \longmapsto f(x).$

Then,

$f$ is convex $\;\Longleftrightarrow\; f(ta + (1 - t)b) \leq t f(a) + (1 - t) f(b)$

$f$ is strictly convex $\;\Longleftrightarrow\; f(ta + (1 - t)b) < t f(a) + (1 - t) f(b), \quad a \neq b, \; t \neq 0, 1$

$f$ is concave $\;\Longleftrightarrow\; f(ta + (1 - t)b) \geq t f(a) + (1 - t) f(b)$

$f$ is strictly concave $\;\Longleftrightarrow\; f(ta + (1 - t)b) > t f(a) + (1 - t) f(b), \quad a \neq b, \; t \neq 0, 1.$

These equivalences must hold $\forall a, b \in S$, $\forall t \in [0, 1]$.
• Using the definition, one can check that the functions

i) $f(x) = ax + b$ \qquad ii) $f(x, y) = ax + by + c$

are simultaneously concave and convex in $\mathbb{R}$ and $\mathbb{R}^2$ respectively. Their respective graphs represent a line $y = ax + b$ and a plane $z = ax + by + c$.

• A convex/concave function is not necessarily differentiable at every point. For instance,

i) $f(x) = |x|$ \qquad ii) $f(x, y) = \sqrt{x^2 + y^2} = \|(x, y)\|.$

Each function is not differentiable at the origin and represents the Euclidean distance in $\mathbb{R}$ and $\mathbb{R}^2$ respectively. We use the triangular inequality to verify that they are convex.

• One can form new convex/concave functions using algebraic operations. For example [25], if $f$, $g$ are functions defined on a convex set $S \subset \mathbb{R}^n$ and $s, t \geq 0$, then: $f$ and $g$ concave (resp. convex) $\Longrightarrow$ $sf + tg$ is concave (resp. convex), and $\min(f, g)$ (resp. $\max(f, g)$) is concave (resp. convex).
Remark 2.3.1 The geometrical interpretation of the convexity of $f$ expresses that the graph of $f$ remains under the line segment $[AB]$ joining any two points $A(a, f(a))$ and $B(b, f(b))$ of the graph of $f$. Indeed,

$[A, B] = \big\{ (x, y) \in \mathbb{R}^n \times \mathbb{R} : \; x = a + t(b - a), \; y = f(a) + t(f(b) - f(a)), \; t \in [0, 1] \big\}$

is located above the part of the graph of $f$

$\big\{ (x, y) \in \mathbb{R}^n \times \mathbb{R} : \; x = a + t(b - a), \; y = f(a + t(b - a)), \; t \in [0, 1] \big\}$

since we have, $\forall t \in [0, 1]$,

$f(a + t(b - a)) = f(tb + (1 - t)a) \leq t f(b) + (1 - t) f(a) = f(a) + t(f(b) - f(a)).$

Similarly, the geometrical interpretation of the concavity of $f$ expresses that the graph of $f$ remains above the line segment $[AB]$ joining any two points $A(a, f(a))$ and $B(b, f(b))$ of the graph of $f$; see Figure 2.27.
FIGURE 2.27: Shape of convex functions
Remark 2.3.2 There is a connection between the convexity/concavity of a function $f$ defined on a convex set $S \subset \mathbb{R}^n$ and the convexity of particular sets described by $f$ [25]. Indeed, we have

$f$ is convex $\;\Longleftrightarrow\;$ the set $\{(x, y) \in S \times \mathbb{R} : \; y \geq f(x)\}$ is convex

$f$ is concave $\;\Longleftrightarrow\;$ the set $\{(x, y) \in S \times \mathbb{R} : \; y \leq f(x)\}$ is convex.

2.3.2 Characterization of Convex/Concave C¹ Functions
When $n = 1$, the following theorem expresses that the graph of a convex (resp. concave) $C^1$ function remains above (resp. below) its tangent lines.

Theorem 2.3.1 Let $S$ be a convex open set of $\mathbb{R}^n$ and let $f : S \longrightarrow \mathbb{R}$ be $C^1$. Then, for any $x, a \in S$, the following equivalences hold:

$f$ is convex in $S$ $\;\Longleftrightarrow\; f(x) - f(a) \geq \nabla f(a) \cdot (x - a)$

$f$ is strictly convex in $S$ $\;\Longleftrightarrow\; f(x) - f(a) > \nabla f(a) \cdot (x - a), \quad x \neq a$

$f$ is concave in $S$ $\;\Longleftrightarrow\; f(x) - f(a) \leq \nabla f(a) \cdot (x - a)$

$f$ is strictly concave in $S$ $\;\Longleftrightarrow\; f(x) - f(a) < \nabla f(a) \cdot (x - a), \quad x \neq a.$
Proof. We prove the first assertion. The other assertions can be established similarly.

$\Longrightarrow$) If $f$ is convex in $S$, then, by definition, we have for $a, b \in S$,

$f(tb + (1 - t)a) \leq t f(b) + (1 - t) f(a) \qquad \forall t \in [0, 1],$

from which we deduce

$f(b) - f(a) \geq \frac{f(tb + (1 - t)a) - f(a)}{t} = \frac{f(a + t(b - a)) - f(a)}{t} \qquad \forall t \in (0, 1].$

Since $f \in C^1(S)$, we obtain

$f(b) - f(a) \geq \lim_{t \to 0^+} \frac{g(t) - g(0)}{t - 0} = g'(0),$

where

$g(t) = f(a + t(b - a)), \qquad g'(t) = \nabla f(a + t(b - a)) \cdot (b - a), \qquad g'(0) = \nabla f(a) \cdot (b - a).$

Indeed, $g(t) = f(a_1 + t(b_1 - a_1), \ldots, a_n + t(b_n - a_n)) = f(x_1(t), x_2(t), \ldots, x_n(t))$. Each function $x_j(t) = a_j + t(b_j - a_j)$, $j = 1, \ldots, n$, is differentiable with $x_j'(t) = b_j - a_j$. So $g$ is differentiable and we obtain, by the chain rule formula,

$g'(t) = \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial t} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial t} + \ldots + \frac{\partial f}{\partial x_n}\frac{\partial x_n}{\partial t}$

$= f_{x_1}(a + t(b - a))(b_1 - a_1) + f_{x_2}(a + t(b - a))(b_2 - a_2) + \ldots + f_{x_n}(a + t(b - a))(b_n - a_n) = \nabla f(a + t(b - a)) \cdot (b - a).$

$\Longleftarrow$) Assume that

$f(x) - f(u) \geq \nabla f(u) \cdot (x - u) \qquad \forall x, u \in S.$

Let $a, b \in S$ and $t \in [0, 1]$. Choosing $x = a$ and $u = ta + (1 - t)b$ in the above inequality, we obtain

$f(a) - f(ta + (1 - t)b) \geq \nabla f(ta + (1 - t)b) \cdot (a - [ta + (1 - t)b]) = (1 - t)\, \nabla f(ta + (1 - t)b) \cdot (a - b). \qquad (*)$

Now, choose $x = b$ and $u = ta + (1 - t)b$ in the same inequality. We get

$f(b) - f(ta + (1 - t)b) \geq \nabla f(ta + (1 - t)b) \cdot (b - [ta + (1 - t)b]) = -t\, \nabla f(ta + (1 - t)b) \cdot (a - b). \qquad (**)$

Multiply the inequality $(*)$ by $t > 0$ and the inequality $(**)$ by $(1 - t) > 0$, then add the resulting inequalities. This gives

$t f(a) + (1 - t) f(b) - (t + (1 - t)) f(ta + (1 - t)b) \geq [t(1 - t) - (1 - t)t]\, \nabla f(ta + (1 - t)b) \cdot (a - b) = 0.$

Therefore $f$ is convex.

Example 1. Show that $f(x, y) = x^2 + y^2$ is convex on $\mathbb{R}^2$.

Solution: We have

$f(x, y) - f(s, t) - \nabla f(s, t) \cdot \langle x - s, y - t \rangle = x^2 + y^2 - (s^2 + t^2) - 2s(x - s) - 2t(y - t) = (s - x)^2 + (t - y)^2 \geq 0 \qquad \forall (x, y), (s, t) \in \mathbb{R}^2.$

Thus $f$ is convex on $\mathbb{R}^2$. Note that by taking $(s, t) = (0, 0)$, the critical point of $f$, we deduce that $f(x, y) - f(0, 0) \geq 0$ $\forall (x, y) \in \mathbb{R}^2$. Hence, $(0, 0)$ is a global minimum of $f$.
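The first-order inequality for this example can be probed at random points; a minimal Python sketch, added for this edition:

```python
import random

def f(x, y):
    return x * x + y * y

def grad_f(x, y):
    return (2 * x, 2 * y)

# First-order characterization of convexity (Theorem 2.3.1):
# f(x) - f(a) >= grad f(a) . (x - a) for all x, a.
random.seed(0)
for _ in range(1000):
    x, y, s, t = (random.uniform(-5, 5) for _ in range(4))
    gx, gy = grad_f(s, t)
    gap = f(x, y) - f(s, t) - (gx * (x - s) + gy * (y - t))
    # As computed in the solution, the gap equals (x-s)^2 + (y-t)^2 >= 0
    assert gap >= -1e-9
```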
As we can expect from the above example, it will not always be easy to check the convexity or concavity of a function through solving inequalities. Next, we show a more practical characterization, requiring more regularity on the function.

2.3.3 Characterization of Convex/Concave C² Functions
Theorem 2.3.2 Strict convexity/concavity
Let $S$ be a convex open set of $\mathbb{R}^n$ and let $f : S \longrightarrow \mathbb{R}$, $f \in C^2(S)$. Then

(i) $D_k(x) > 0 \quad \forall x \in S, \; k = 1, \ldots, n \quad \Longrightarrow \quad f$ is strictly convex in $S$.

(ii) $(-1)^k D_k(x) > 0 \quad \forall x \in S, \; k = 1, \ldots, n \quad \Longrightarrow \quad f$ is strictly concave in $S$.

$D_k(x)$, $k = 1, \ldots, n$, are the $n$ leading principal minors of the Hessian matrix $H_f(x) = (f_{x_i x_j}(x))_{n \times n}$ of $f$.
Proof. i) For $a, b \in S$, $a \neq b$, and $t \in [0, 1]$, define the function

$g(t) = f(tb + (1 - t)a) = f(a + t(b - a)) = f(x_1(t), \ldots, x_n(t)) \quad \text{with} \quad x_j(t) = a_j + t(b_j - a_j), \; j = 1, \ldots, n.$

By the chain rule theorem, we have

$g'(t) = \nabla f(a + t(b - a)) \cdot (b - a).$

Since $f$ is $C^2$, $g$ is also $C^2$ and we have

$g''(t) = \frac{d}{dt} f_{x_1}(a + t(b - a))\, (b_1 - a_1) + \ldots + \frac{d}{dt} f_{x_n}(a + t(b - a))\, (b_n - a_n).$

For each $i = 1, \ldots, n$, we have $f_{x_i}(a + t(b - a)) = f_{x_i}(x_1(t), x_2(t), \ldots, x_n(t))$. Then

$\frac{d}{dt} f_{x_i}(a + t(b - a)) = \frac{\partial f_{x_i}}{\partial x_1}\frac{\partial x_1}{\partial t} + \frac{\partial f_{x_i}}{\partial x_2}\frac{\partial x_2}{\partial t} + \ldots + \frac{\partial f_{x_i}}{\partial x_n}\frac{\partial x_n}{\partial t} = \sum_{j=1}^{n} f_{x_i x_j}(a + t(b - a))\, (b_j - a_j).$

Hence

$g''(t) = \sum_{i=1}^{n}\sum_{j=1}^{n} f_{x_i x_j}(a + t(b - a))\, (b_i - a_i)(b_j - a_j).$

Now, by assumption, we have $D_k(z) > 0$ for all $z \in S$ and for all $k = 1, \ldots, n$; then the quadratic form

$Q(h) = \sum_{i=1}^{n}\sum_{j=1}^{n} f_{x_i x_j}(a + t(b - a))\, h_i h_j,$

with the associated symmetric matrix $\big(f_{x_i x_j}(a + t(b - a))\big)_{n \times n}$, is positive definite. As a consequence, $g''(t) > 0$ and $g$ is strictly convex. In particular

$f(tb + (1 - t)a) = g(t) = g(t \cdot 1 + (1 - t) \cdot 0) < t g(1) + (1 - t) g(0) = t f(b) + (1 - t) f(a)$

and the strict convexity of $f$ follows.

ii) Under the assumptions of (ii), the quadratic form

$Q^*(h) = \sum_{i=1}^{n}\sum_{j=1}^{n} (-f)_{x_i x_j}(a + t(b - a))\, h_i h_j = {}^t h\, H_{-f}(a + t(b - a))\, h$

$= {}^t h \begin{bmatrix} (-f)_{x_1 x_1} & (-f)_{x_1 x_2} & \ldots & (-f)_{x_1 x_n} \\ (-f)_{x_2 x_1} & (-f)_{x_2 x_2} & \ldots & (-f)_{x_2 x_n} \\ \vdots & \vdots & & \vdots \\ (-f)_{x_n x_1} & (-f)_{x_n x_2} & \ldots & (-f)_{x_n x_n} \end{bmatrix} h$

is positive definite by assumption. As a consequence, $(-g)''(t) > 0$ and $-g$ is strictly convex. In particular

$-f(tb + (1 - t)a) = (-g)(t) = -g(t \cdot 1 + (1 - t) \cdot 0) < t(-g)(1) + (1 - t)(-g)(0) = t(-f)(b) + (1 - t)(-f)(a)$

$\Longleftrightarrow \quad f(tb + (1 - t)a) > t f(b) + (1 - t) f(a) \qquad \forall a \neq b, \; \forall t \in (0, 1),$

and the strict concavity of $f$ follows.

We also have the following characterization.
Theorem 2.3.3 Convexity/concavity
Let $S$ be a convex open set of $\mathbb{R}^n$ and let $f : S \longrightarrow \mathbb{R}$ be $C^2$. Then

$f$ is convex in $S$ $\;\Longleftrightarrow\; \Delta_k(x) \geq 0 \quad \forall x \in S, \; \forall k = 1, \ldots, n.$

$f$ is concave in $S$ $\;\Longleftrightarrow\; (-1)^k \Delta_k(x) \geq 0 \quad \forall x \in S, \; \forall k = 1, \ldots, n.$

A principal minor $\Delta_r(x)$ of order $r$ in the Hessian $[f_{x_i x_j}(x)]$ of $f$ is the determinant obtained by deleting $n - r$ rows and the $n - r$ columns with the same numbers (if the $i$th row (column) is selected, then so is the $i$th column (row)).
Proof. We prove only the first assertion. The second one is established by replacing $f$ by $-f$.

$\Longleftarrow$) We proceed as in the proof of the previous theorem. We conclude that $Q(h)$ is positive semidefinite. As a consequence, $g''(t) \geq 0$ and $g$ is convex. In particular

$f(tb + (1 - t)a) = g(t) = g(t \cdot 1 + (1 - t) \cdot 0) \leq t g(1) + (1 - t) g(0) = t f(b) + (1 - t) f(a),$

and the convexity of $f$ follows.

$\Longrightarrow$) Suppose $f$ convex in $S$. It suffices to show that the quadratic form $Q(h)$ satisfies

$Q(h) = \sum_{i=1}^{n}\sum_{j=1}^{n} f_{x_i x_j}(a)\, h_i h_j \geq 0 \qquad \forall a \in S.$

So, let $a \in S$. Since $S$ is an open set, there exists $\epsilon > 0$ such that $B_\epsilon(a) \subset S$. In particular, for $h \in \mathbb{R}^n$, $h \neq 0$, we have

$a + th \in B_\epsilon(a) \;\Longleftrightarrow\; \|a + th - a\| = |t|\,\|h\| < \epsilon \;\Longleftrightarrow\; |t| < \frac{\epsilon}{\|h\|}.$
$(-1)^k D_k > 0 \qquad \forall (x, y, z, t) \in \mathbb{R}^4 \quad \text{for} \quad k = 1, 2, 3, 4.$

Therefore, $f$ is strictly concave on $\mathbb{R}^4$ and the point $(12, 16, 12, 12)$ is the only global maximum point. Note that $\min_{\mathbb{R}^4} f$ doesn't exist since $f$ takes large negative values. Indeed, we have, for example,

$f(x, 0, 0, 0) = 24x - x^2 \longrightarrow -\infty \qquad \text{as} \qquad x \longrightarrow \pm\infty.$
Solved Problems
1. – A power substation must be located at a point closest to $m$ houses located at distinct points $(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)$. Find the optimal location by minimizing the sum of the squares of the distances between the houses and the substation.

Solution: Let $(x, y)$ be the position of the power substation. Then we have to look for $(x, y)$ as the point that minimizes the function

$f(x, y) = d^2((x, y), (x_1, y_1)) + d^2((x, y), (x_2, y_2)) + \ldots + d^2((x, y), (x_m, y_m)),$

which can be written as

$f(x, y) = [(x - x_1)^2 + (y - y_1)^2] + [(x - x_2)^2 + (y - y_2)^2] + \ldots + [(x - x_m)^2 + (y - y_m)^2].$

Because $f$ is polynomial, it is differentiable on the open set $\mathbb{R}^2$. Thus a global minimum point is also a local one. Therefore, it is a solution of

$\nabla f(x, y) = \langle 2(x - x_1) + \ldots + 2(x - x_m),\; 2(y - y_1) + \ldots + 2(y - y_m) \rangle = \langle 0, 0 \rangle$

$\Longleftrightarrow\quad m x - \sum_{k=1}^{m} x_k = 0 \;\text{ and }\; m y - \sum_{k=1}^{m} y_k = 0 \quad\Longleftrightarrow\quad x = \frac{1}{m} \sum_{k=1}^{m} x_k \;\text{ and }\; y = \frac{1}{m} \sum_{k=1}^{m} y_k.$

We have only one critical point. The Hessian matrix of $f$ is

$H_f(x, y) = \begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix} = \begin{bmatrix} 2m & 0 \\ 0 & 2m \end{bmatrix}.$
The leading principal minors satisfy

$D_1(x, y) = 2m > 0, \qquad D_2(x, y) = \begin{vmatrix} 2m & 0 \\ 0 & 2m \end{vmatrix} = 4m^2 > 0.$
So f is strictly convex on R2 . Then, the critical point is the global minimum of f and describes the optimal location of the substation.
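The optimal location found above is the coordinate-wise mean of the house positions. A minimal Python sketch (an editorial addition, not from the book), applied to the three houses of Problem 5 of the previous section:

```python
def centroid(points):
    # The unique critical point found above: the coordinate-wise mean
    m = len(points)
    return (sum(x for x, _ in points) / m, sum(y for _, y in points) / m)

def total_sq_dist(q, points):
    return sum((q[0] - x) ** 2 + (q[1] - y) ** 2 for x, y in points)

houses = [(0.0, 0.0), (1.0, 1.0), (0.0, 2.0)]   # the three houses of Problem 5, Section 2.2
c = centroid(houses)
print(c)   # (0.3333333333333333, 1.0)

# Strict convexity: any other location does strictly worse
for dx, dy in [(0.5, 0.0), (-0.3, 0.2), (0.0, -1.0)]:
    moved = (c[0] + dx, c[1] + dy)
    assert total_sq_dist(moved, houses) > total_sq_dist(c, houses)
```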
2. – Let $f$ be a function of two variables given by $f(x, y) = x^2 + y^4 - 4xy$ for all $x$ and $y$.
i) Calculate the first and second order partial derivatives of f.
ii) Find all the stationary points of f and classify them by means of the second derivatives test.
iii) Does f have any global extreme points?
iv) Use software to graph f.
Solution: i) and ii) Since the function $f$ is differentiable (because it is a polynomial), the local extreme points are critical, i.e., solutions of

$\nabla f(x, y) = \langle 2x - 4y,\; 4y^3 - 4x \rangle = \langle 0, 0 \rangle \;\Longleftrightarrow\; x = 2y \;\text{ and }\; y^3 - 2y = 0 \;\Longleftrightarrow\; x = 2y \;\text{ and }\; y(y^2 - 2) = 0$

$\Longleftrightarrow\quad [x = 2y \text{ and } y = 0] \;\text{ or }\; [x = 2y \text{ and } y = \sqrt{2}] \;\text{ or }\; [x = 2y \text{ and } y = -\sqrt{2}].$

We deduce that $(0, 0)$, $(2\sqrt{2}, \sqrt{2})$ and $(-2\sqrt{2}, -\sqrt{2})$ are the critical points.

Classification of the critical points: The Hessian matrix of $f$ is

$H_f(x, y) = \begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix} = \begin{bmatrix} 2 & -4 \\ -4 & 12y^2 \end{bmatrix}.$
(x, y)          D1(x, y)   D2(x, y)   type
(0, 0)          2          −16        saddle point
(2√2, √2)       2          32         local minimum
(−2√2, −√2)     2          32         local minimum

TABLE 2.7: Critical points classification of f(x, y) = x² + y⁴ − 4xy
2 D2 (x, y) = −4
−4 . 12y 2
An application of the second derivatives test gives the characterization in Table 2.7. iii) and iv) The first graphing in Figure 2.28 shows a form of a saddle. On the second graphing, there are two families of circulaire curves and a hyperbola which confirm the previous classification of the critical points. z y4 4 x y x2
2 4
1 2
z
0
0
2 1
4 5 0
2
x 5
4
2
0
2
4
FIGURE 2.28: Graph and level curves of f

Global extreme points. We cannot conclude about the concavity/convexity of $f$ on $\mathbb{R}^2$ since the signs of the principal minors of the Hessian are as follows:

$\Delta_1^{11}(x, y) = 12y^2 \geq 0, \qquad \Delta_1^{22}(x, y) = 2 \geq 0, \qquad \Delta_2(x, y) = 24y^2 - 16,$

and $\Delta_2$ depends on $y$. Thus, $f$ is neither convex nor concave on $\mathbb{R}^2$.
However, we remark that, on the $y$ axis, we have

$f(0, y) = y^4 \longrightarrow +\infty \qquad \text{as} \qquad y \longrightarrow \pm\infty.$

So $f$ cannot attain a maximum value in $\mathbb{R}^2$.

Moreover, by completing the squares, we compare the values of $f$ with its value at the local minima points, $f(2\sqrt{2}, \sqrt{2}) = f(-2\sqrt{2}, -\sqrt{2}) = -4$, and obtain

$f(x, y) + 4 = (x - 2y)^2 + (y^2 - 2)^2 \geq 0 \qquad \forall (x, y) \in \mathbb{R}^2.$
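This global lower bound can be checked numerically; an editorial Python sketch, not part of the original solution:

```python
import random

def f(x, y):
    return x * x + y ** 4 - 4 * x * y

r2 = 2 ** 0.5
# The two global minimizers obtained by completing squares:
assert abs(f(2 * r2, r2) + 4) < 1e-9
assert abs(f(-2 * r2, -r2) + 4) < 1e-9

# f(x, y) + 4 = (x - 2y)^2 + (y^2 - 2)^2 >= 0 everywhere:
random.seed(1)
for _ in range(1000):
    x, y = random.uniform(-3, 3), random.uniform(-3, 3)
    assert f(x, y) >= -4 - 1e-9
```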
Thus, $f$ attains its global minimal value $-4$ at these two points.

3. – Let $f(x, y) = x^2$.
i) Show that f has infinitely many critical points and that the second derivatives test fails for these points.
ii) Show that f is convex on R².
iii) What is the minimum value of f? Give the minima points.
iv) Does f have any local or global maxima? Justify your answer.
Solution:

FIGURE 2.29: Graph and level curves of f

i) Since $f$ is a differentiable function (because it is a polynomial), the local extreme points are critical ones, i.e., solutions of
⇐⇒
x = 0.
We deduce that the points on the y axis are all critical points of f .
Classification of the critical points: The Hessian matrix of $f$ is

$H_f(x, y) = \begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix} = \begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix}.$

The leading principal minors at the critical points $(0, y)$ of $f$ ($y \in \mathbb{R}$) are

$D_1(0, y) = 2 > 0, \qquad D_2(0, y) = \begin{vmatrix} 2 & 0 \\ 0 & 0 \end{vmatrix} = 0.$

So the second derivative test is inconclusive.

ii) The principal minors are

$\Delta_1^{11}(x, y) = 0, \qquad \Delta_1^{22}(x, y) = 2, \qquad \Delta_2(x, y) = \begin{vmatrix} 2 & 0 \\ 0 & 0 \end{vmatrix} = 0,$

and satisfy $\Delta_k(x, y) \geq 0$ for $k = 1, 2$, $\forall (x, y) \in \mathbb{R}^2$. Therefore $f$ is convex in $\mathbb{R}^2$.
iii) Note that

$f(x, y) = x^2 \geq 0 = f(0, y) \qquad \forall (x, y) \in \mathbb{R}^2.$
We deduce that the critical points are global minima for $f$ in $\mathbb{R}^2$.

iv) Since $f$ is infinitely differentiable (because it is polynomial) in the open set $\mathbb{R}^2$, an absolute maximum of $f$ would be a local maximum, and therefore a critical point. But all the critical points are minima points for $f$. Hence, $f$ has no local nor absolute maxima; see Figure 2.29. In fact, on the $x$-axis, we have

$f(x, 0) = x^2 \longrightarrow +\infty \qquad \text{as} \qquad x \longrightarrow +\infty.$

So $f$ cannot attain a maximum value $M$ in $\mathbb{R}^2$. Indeed, if not, we would have $f(x, y) \leq M$ $\forall (x, y) \in \mathbb{R}^2$. Then we have

$f(x, 0) = x^2 \leq M \qquad \forall x \in \mathbb{R},$

which is not possible: for example, $f(M, 0) = M^2 > M$ for $M > 1$.
4. – Discuss the convexity/concavity on $\mathbb{R}^2$ of

$f(x, y) = 4xy - x^2 - y^2 - 6x.$

Are there global extreme points?

Solution:

FIGURE 2.30: Graph and level curves of f
We have

$\nabla f(x, y) = \langle 4y - 2x - 6,\; 4x - 2y \rangle.$

Since $f$ is $C^\infty$, $f$ is convex $\Longleftrightarrow$ $H_f$ is positive semidefinite, where the Hessian matrix of $f$ is

$H_f(x, y) = \begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix} = \begin{bmatrix} -2 & 4 \\ 4 & -2 \end{bmatrix}.$

The principal minors of $H_f$ are

$\Delta_1^{11}(x, y) = -2, \qquad \Delta_1^{22}(x, y) = -2, \qquad \Delta_2(x, y) = \begin{vmatrix} -2 & 4 \\ 4 & -2 \end{vmatrix} = -12.$
So $f$ is neither convex nor concave on $\mathbb{R}^2$; see Figure 2.30. Remark that

$f(0, y) = -y^2 \longrightarrow -\infty \qquad \text{as} \qquad y \longrightarrow \pm\infty.$

$f$ takes large negative values and doesn't attain its minimal value.
On the other hand, when looking for the critical points of $f$, we obtain

$\nabla f(x, y) = \langle 4y - 2x - 6,\; 4x - 2y \rangle = \langle 0, 0 \rangle \;\Longleftrightarrow\; x = 1 \;\text{ and }\; y = 2x = 2.$

This point is a saddle point. It will help us to find a direction of increase of the values of $f$. Indeed, by completing the squares, we obtain

$f(x, y) - f(1, 2) = -(2x - y)^2 + 3(x - 1)^2,$

from which we deduce that

$f(x, 2x) = f(1, 2) + 3(x - 1)^2 \longrightarrow +\infty \qquad \text{as} \qquad x \longrightarrow \pm\infty.$

So $f$ takes large positive values and doesn't attain its maximal value either.
5. – Let $f$ be the function defined by $f(x, y) = x^4 - 2x^2 + y^2 - 6y$.
i) Find the critical points of f.
ii) Use the second derivative test to classify the critical points of f.
iii) Find the global minimum value of f on R² by completing squares.
iv) Is there a global maximum value of f on R²?
v) Show that f is convex on each of the open convex sets
   S1 = {(x, y) : x < −1/√3}  and  S2 = {(x, y) : x > 1/√3}.
vi) Sketch these sets and plot the critical points.
vii) Find min_{S1} f(x, y) and min_{S2} f(x, y) (justify).
viii) Set S = S1 ∪ S2. Find m0 = min_{R²\S} (x⁴ − 2x²).
ix) Use (x⁴ − 2x²) ≥ m0 on R²\S to deduce min_{R²\S} f(x, y).
Solution: The shape of the surface in Figure 2.31 shows that the function is neither convex nor concave.

FIGURE 2.31: Graph and level curves of f

i) Since $f$ is a differentiable function (because it is a polynomial), the local extreme points are critical points, solutions of

$\nabla f(x, y) = \langle 4x^3 - 4x,\; 2y - 6 \rangle = \langle 0, 0 \rangle \;\Longleftrightarrow\; 4x(x + 1)(x - 1) = 0 \;\text{ and }\; 2(y - 3) = 0$
⇐⇒
⇐⇒
⎧ ⎨ x=0
or
x+1=0
or
x−1=0
⎩
y=3 ⎧ x = 0 and y = 3 ⎪ ⎪ ⎪ ⎪ ⎨ or [x = −1 and y = 3] ⎪ ⎪ ⎪ ⎪ ⎩ or [x = 1 and y = 3].
We deduce that (−1, 3), (0, 3) and (1, 3) are the critical points of f . ii) Classification of the critical points: The Hessian matrix of f is Hf (x, y) =
fxx fyx
The leading principal minors are D1 (x, y) = 12x2 − 4
fxy fyy
=
12x2 − 4 0
0 2
12x2 − 4 0 . D2 (x, y) = 0 2
(x, y)     D₁(x, y)    D₂(x, y)                        type
(−1, 3)    8           det[ 8  0 ; 0  2 ] = 16         local minimum
(0, 3)     −4          det[ −4  0 ; 0  2 ] = −8        saddle point
(1, 3)     8           det[ 8  0 ; 0  2 ] = 16         local minimum

TABLE 2.8: Classifying critical points of f(x, y) = x⁴ − 2x² + y² − 6y

The second derivative test gives the characterization of the points shown in Table 2.8.

iii) Global minimum value of f: We have

f(x, y) = (x² − 1)² + (y − 3)² − 10 ≥ −10 = f(1, 3) = f(−1, 3)   ∀(x, y) ∈ R².

Thus

min_{(x,y)∈R²} f(x, y) = −10 = f(1, 3) = f(−1, 3).
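The completed-square identity and the global minimum can be spot-checked numerically (a sketch only, not part of the original solution):

```python
import random

def f(x, y):
    return x**4 - 2 * x**2 + y**2 - 6 * y

# The completed-square form (x**2 - 1)**2 + (y - 3)**2 - 10 agrees with f
# everywhere and shows f >= -10, with equality at (1, 3) and (-1, 3).
random.seed(0)
for _ in range(1000):
    x, y = random.uniform(-5, 5), random.uniform(-5, 5)
    assert abs(f(x, y) - ((x**2 - 1) ** 2 + (y - 3) ** 2 - 10)) < 1e-8
    assert f(x, y) >= -10 - 1e-12

assert f(1, 3) == -10 == f(-1, 3)
```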
iv) Global maximum value of f: We have

f(x, 3) = (x² − 1)² − 10   and   lim_{x→±∞} f(x, 3) = +∞.

So f doesn't attain its maximum value on R².

v) The principal minors of the Hessian of f are

Δ₁¹¹ = fxx = 12x² − 4 ≥ 0  ⟺  |x| ≥ 1/√3,      Δ₁²² = fyy = 2 ≥ 0,

Δ₂ = det[ 12x² − 4  0 ; 0  2 ] = 8(3x² − 1) ≥ 0  ⟺  |x| ≥ 1/√3.

So

Δₖ ≥ 0,  k = 1, 2   ⟺   |x| ≥ 1/√3.

Hence Hf is positive semi-definite on each open convex set S₁ and S₂. Hence f is convex on each of the open convex sets S₁ and S₂.
vi) Sketch of the sets S₁ and S₂ in Figure 2.32.

FIGURE 2.32: The convex sets S₁ and S₂
vii) Since f is convex on S₁ = [x < −1/√3] and the critical point (−1, 3) is in S₁ with f(−1, 3) = −10, then

min_{S₁} f(x, y) = f(−1, 3) = −10.

f is also convex on S₂ = [x > 1/√3] and the critical point (1, 3) is in S₂ with f(1, 3) = −10, then

min_{S₂} f(x, y) = f(1, 3) = −10.
viii) We have

ϕ(x) = x⁴ − 2x²,      ϕ′(x) = 4x³ − 4x = 4x(x − 1)(x + 1).

Using Table 2.9, we find that

min_{x∈[−1/√3, 1/√3]} ϕ(x) = ϕ(−1/√3) = ϕ(1/√3) = −5/9.

x        −1            0            1
ϕ′(x)     0     +      0      −     0
ϕ(x)     −1     ↗      0      ↘    −1
TABLE 2.9: Variations of ϕ(x) = x⁴ − 2x²

ix) We deduce that

x⁴ − 2x² ≥ −5/9   ∀x ∈ [−1/√3, 1/√3],

so

f(x, y) ≥ −5/9 + y² − 6y = (y − 3)² − 9 − 5/9 ≥ −9 − 5/9 = f(±1/√3, 3)   ∀(x, y) ∈ R² \ S.

Hence,

min_{R²\S} f(x, y) = f(−1/√3, 3) = f(1/√3, 3) = −9 − 5/9 = −86/9.

Remark. Note that

f(x, y) ≥ −9 − 5/9 ≥ −10   ∀(x, y) ∈ R² \ S.

We also have

f(x, y) ≥ −10 = f(−1, 3) = f(1, 3)   ∀(x, y) ∈ S.

Hence,   min_{R²} f(x, y) = f(−1, 3) = f(1, 3) = −10.
2.4   Extreme Value Theorem
The first main result of this section is

Theorem 2.4.1 (Extreme value theorem) Let S be a closed bounded set of Rⁿ and let f ∈ C⁰(S). Then f attains both its maximal and minimal values in S; that is, max_S f and min_S f exist.

The proof of the extreme value theorem uses the fact that the image of a closed bounded set S of Rⁿ by a real valued continuous function f : S −→ R is a closed bounded set of R [18]. Thus f(S) is a closed bounded interval [a, b]. Therefore

∃ xₘ, x_M ∈ S   such that   f(xₘ) = a,   f(x_M) = b.

Since f(S) = [f(xₘ), f(x_M)], then

f(xₘ) ≤ f(x) ≤ f(x_M)   ∀x ∈ S.

Therefore,   f(xₘ) = min_S f(x)   and   f(x_M) = max_S f(x).
Remark 2.4.1 When f is a continuous function on a closed and bounded set S, the extreme value theorem guarantees the existence of an absolute maximum and an absolute minimum of f on S. These absolute extreme points can occur either on the boundary of S or in the interior of S. As a consequence, to look for these points, we can proceed as follows:

– find the critical points of f that lie in the interior of S;
– find the boundary points where f takes its absolute values on the boundary;
– compare the values of f taken at the critical and boundary points found.

The largest of the values of f at these points is the absolute maximum and the smallest is the absolute minimum.
Example 1. Find the extreme values of f(x) = (1/3)x³ − (1/2)x² − 2x + 3 on the intervals [−1, 1] and [−2, 2].

Solution: We have

f′(x) = x² − x − 2 = (x − 2)(x + 1),      f′(x) = 0  ⟺  x = 2 or x = −1.

We deduce that 2 and −1 are the critical points of f; see Figure 2.33.

• The values max_{x∈[−1,1]} f(x) and min_{x∈[−1,1]} f(x) exist by the extreme value theorem because f is continuous on the closed bounded interval [−1, 1]. Now, since there are no critical points in the interior of the interval (−1, 1), these values must be in {f(−1), f(1)}. Comparing these two values, we conclude that

max_{x∈[−1,1]} f(x) = f(−1) = 25/6   and   min_{x∈[−1,1]} f(x) = f(1) = 5/6.

• The values max_{x∈[−2,2]} f(x) and min_{x∈[−2,2]} f(x) exist by the extreme value theorem because f is continuous on the closed bounded interval [−2, 2]. The critical point −1 is in the interior of the interval (−2, 2), so the absolute values must be in {f(−2), f(−1), f(2)} = {7/3, 25/6, −1/3}. Comparing these three values, we conclude that

max_{x∈[−2,2]} f(x) = f(−1) = 25/6   and   min_{x∈[−2,2]} f(x) = f(2) = −1/3.
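The candidate-evaluation procedure of the example can be mirrored with exact rational arithmetic (a small sketch; the helper `extremes` is ours, not the book's):

```python
from fractions import Fraction as F

def f(x):
    x = F(x)
    return x**3 / 3 - x**2 / 2 - 2 * x + 3

def extremes(candidates):
    # candidates = endpoints of the interval plus its interior critical points
    vals = [f(c) for c in candidates]
    return max(vals), min(vals)

# On [-1, 1]: no interior critical point, only the endpoints.
assert extremes([-1, 1]) == (F(25, 6), F(5, 6))
# On [-2, 2]: the critical point -1 is interior; 2 is also an endpoint.
assert extremes([-2, -1, 2]) == (F(25, 6), F(-1, 3))
```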
FIGURE 2.33: Absolute values on a closed interval
Example 2. Find the absolute maximum and minimum values of f(x, y) = 4xy − x² − y² − 6x on the closed triangle

S = {(x, y) : 0 ≤ x ≤ 2,  0 ≤ y ≤ 3x}.

Solution: f is continuous (because it is a polynomial) on the triangle S, which is a bounded and closed subset of R². So f attains its absolute extreme points on S, at stationary points lying in the interior of S or at points located on the boundary of S (see Figure 2.34).

∗ Interior stationary points of f: We have

∇f = ⟨4y − 2x − 6, 4x − 2y⟩ = ⟨0, 0⟩   ⟺   (x, y) = (1, 2).

The point (1, 2) is the only critical point of f and f(1, 2) = −3.
FIGURE 2.34: Extreme values of f on the triangular plane region S

∗ Extreme values of f at the boundary of S: Let L₁, L₂ and L₃ be the three sides of the triangle, defined by:

L₁ = {(x, 0), 0 ≤ x ≤ 2},   L₂ = {(2, y), 0 ≤ y ≤ 6},   L₃ = {(x, 3x), 0 ≤ x ≤ 2}.

– On L₁, we have: f(x, 0) = −x² − 6x = g(x), g′(x) = −2x − 6.

x        0           2
g′(x)         −
g(x)     0    ↘     −16

TABLE 2.10: Variations of g(x) = −x² − 6x on [0, 2]

We deduce from the monotony of g (see Table 2.10) that

max_{L₁} f = f(0, 0) = 0   and   min_{L₁} f = f(2, 0) = −16.
– On L₂, we have: f(2, y) = −y² + 8y − 16 = h(y), h′(y) = −2y + 8.

y         0           4          6
h′(y)          +      0     −
h(y)     −16    ↗     0    ↘    −4

TABLE 2.11: Variations of h(y) = −y² + 8y − 16 on [0, 6]

Then, from Table 2.11, we obtain

max_{L₂} f = f(2, 4) = 0   and   min_{L₂} f = f(2, 0) = −16.
– On L₃, we have: f(x, 3x) = 2x² − 6x = l(x), l′(x) = 4x − 6.

x        0           3/2           2
l′(x)         −       0      +
l(x)     0    ↘     −9/2     ↗    −4

TABLE 2.12: Variations of l(x) = 2x² − 6x on [0, 2]

Using Table 2.12, we deduce that

max_{L₃} f = f(0, 0) = 0   and   min_{L₃} f = f(3/2, 9/2) = −9/2.
∗ Conclusion: We list, in Table 2.13, the values of f at the interior critical points and at the boundary points where an absolute extreme value occurs on the considered side of the boundary.

(x, y)      (1, 2)   (0, 0)   (2, 0)   (2, 4)   (3/2, 9/2)
f(x, y)     −3       0        −16      0        −9/2

TABLE 2.13: Values of f at the points

We conclude that the absolute maximum value of f is f(0, 0) = f(2, 4) = 0 and the absolute minimum value is f(2, 0) = −16. Now, here is a version of an extreme value theorem for a continuous function on an unbounded domain.
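The conclusion for the triangle can be cross-checked by brute force over a dense grid (a numerical sketch only; the exact values come from the boundary analysis above):

```python
def f(x, y):
    return 4 * x * y - x**2 - y**2 - 6 * x

# Sample the triangle {0 <= x <= 2, 0 <= y <= 3x} on a dense grid.
n = 300
vals = []
for i in range(n + 1):
    x = 2 * i / n
    for j in range(n + 1):
        vals.append(f(x, 3 * x * j / n))

assert abs(max(vals) - 0) < 1e-9     # absolute maximum 0, at (0, 0) and (2, 4)
assert abs(min(vals) + 16) < 1e-9    # absolute minimum -16, at (2, 0)
```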
Theorem 2.4.2 Let f(x) be a continuous function on an unbounded set S of Rⁿ such that

lim_{‖x‖→+∞} f(x) = +∞   (resp. −∞).

Then, there exists an element x* ∈ S such that

f(x*) = min_{x∈S} f(x)   (resp. max_{x∈S} f(x)).

Proof. Let x⁰ ∈ S. There exists R₀ > 0 such that

∀x ∈ S :   ‖x‖ > R₀   ⟹   f(x) > f(x⁰).

So the optimization problem min_{x∈S} f(x) is equivalent to

min_{x∈S₀} f(x)   with   S₀ = S ∩ {x ∈ Rⁿ : ‖x‖ ≤ R₀}.

Indeed, we have

S₀ ⊂ S   ⟹   min_{x∈S₀} f(x) = inf_{x∈S₀} f(x) ≥ inf_{x∈S} f(x).

Moreover, we have (note that x⁰ ∈ S₀)

f(x) > f(x⁰) ≥ min_{z∈S₀} f(z)   if x ∈ S \ S₀,

f(x) ≥ min_{z∈S₀} f(z)   if x ∈ S₀,

then

f(x) ≥ min_{z∈S₀} f(z)   ∀x ∈ S.

Hence   inf_{x∈S} f(x) ≥ min_{z∈S₀} f(z).
Note that the minimum min_{z∈S₀} f(z) is attained by the extreme value theorem since S₀ is a closed bounded set of Rⁿ. Therefore

∃ x* ∈ S₀   such that   min_{x∈S₀} f(x) = f(x*).

Now, since we have inf_{x∈S} f(x) ≥ f(x*), and since x* ∈ S gives the reverse inequality inf_{x∈S} f(x) ≤ f(x*), we deduce that

f(x*) = inf_{x∈S} f(x) = min_{x∈S₀} f(x).
Example 3. Let f(x) = 3x⁴ + 4x³ − 12x² + 2.

i) Show that f has an absolute minimum on R.
ii) Find the minimal value of f on R.

Solution: The graph in Figure 2.35 shows three local extrema.

FIGURE 2.35: Absolute minimum of f

i) f is continuous on R since f is a polynomial. Moreover, we have lim_{|x|→+∞} f(x) = +∞. Then f attains its minimum value at some point x* ∈ R.

ii) Since R is an open set and x* ∈ R, x* must be a critical point of f. We have

f′(x) = 12x³ + 12x² − 24x = 12x(x + 2)(x − 1),      f′(x) = 0  ⟺  x = 0, x = −2 or x = 1.

We deduce that

min_{x∈R} f(x) = min{f(0), f(−2), f(1)} = min{2, −30, −3} = −30 = f(−2).
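The combination of coercivity (Theorem 2.4.2) and the critical-point search can be mirrored in a few lines (sketch):

```python
def f(x):
    return 3 * x**4 + 4 * x**3 - 12 * x**2 + 2

# f'(x) = 12x(x + 2)(x - 1): the critical points are 0, -2 and 1.
critical = [0, -2, 1]
assert [f(c) for c in critical] == [2, -30, -3]

# Since f -> +infinity as |x| -> infinity, the smallest critical value
# is the global minimum on R.
assert min(f(c) for c in critical) == -30 == f(-2)
assert f(-10) > -30 and f(10) > -30
```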
Example 4. Let f(x) = p(x) = aₙxⁿ + aₙ₋₁xⁿ⁻¹ + · · · + a₁x + a₀ be a polynomial with n ≥ 1 and aₙ ≠ 0. If n is odd, then lim_{x→+∞} p(x) and lim_{x→−∞} p(x) have opposite signs (one is +∞ and the other is −∞), so f has no absolute extreme points. If n is even, then the limits above have the same sign. When they are both equal to +∞ (aₙ > 0), f has an absolute minimum but no absolute maximum. When the limits are both equal to −∞ (aₙ < 0), f has an absolute maximum but no absolute minimum.
Solved Problems

1. – Define the function f(x, y) = (1/4)x² − (1/9)y² on the closed unit disk. Find

i) the critical points
ii) the local extreme values
iii) the absolute extreme values.

Solution:

FIGURE 2.36: Graph of f on the unit disk and level curves

i) Since f is differentiable, the critical points are solutions of ∇f(x, y) = ⟨0, 0⟩. That is,

∇f(x, y) = ⟨x/2, −2y/9⟩ = ⟨0, 0⟩   ⟺   (x, y) = (0, 0).

So (0, 0), the center of the unit disk, is the unique critical point of f.
ii) Nature of the local extreme point. We have

fxx = 1/2,   fyy = −2/9,   fxy = 0.

Then D₂(0, 0) = [fxx fyy − fxy²](0, 0) = −1/9 < 0 and (0, 0) is a saddle point; see Figure 2.36.

iii) Global extreme points. Since the unit disk is a bounded closed subset of R², f attains its global extreme points on this set since it is continuous (because it is a polynomial function). These extreme points are interior critical points or points on the boundary of the disk.

∗ Extreme values of f on the boundary of the disk: On the unit circle, f takes the values (see Table 2.14)

f(cos t, sin t) = (1/4)cos²t − (1/9)sin²t = g(t),   t ∈ [0, 2π].

We have

g′(t) = −(1/2)cos t sin t − (2/9)sin t cos t = −(13/18)sin t cos t.

t         0            π/2            π            3π/2           2π
g′(t)          −        0       +     0      −      0       +
g(t)     1/4    ↘     −1/9     ↗     1/4    ↘     −1/9     ↗     1/4

TABLE 2.14: Variations of g(t) = (1/4)cos²t − (1/9)sin²t
∗ Conclusion: We list, in Table 2.15, the values of f at the critical point and at the boundary points where f attains its absolute values on that boundary.

(x, y)      (0, 0)   (1, 0)   (0, 1)   (−1, 0)   (0, −1)
f(x, y)     0        1/4      −1/9     1/4       −1/9

TABLE 2.15: Values of f(x, y) = (1/4)x² − (1/9)y² at candidate points

The absolute maximal value of f on the disk is 1/4 and is attained at the points (1, 0) and (−1, 0). The absolute minimal value of f on the disk is −1/9 and is attained at the points (0, 1) and (0, −1).
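The boundary analysis can be verified by sampling the parametrized circle (a numerical sketch; the exact extrema come from Table 2.14):

```python
import math

def f(x, y):
    return x**2 / 4 - y**2 / 9

# Sample the boundary x = cos t, y = sin t of the unit disk;
# the sample includes t = 0, pi/2, pi, 3*pi/2 where the extrema occur.
ts = [2 * math.pi * k / 8 for k in range(8)]
vals = [f(math.cos(t), math.sin(t)) for t in ts]
assert abs(max(vals) - 1 / 4) < 1e-12    # at (1, 0) and (-1, 0)
assert abs(min(vals) + 1 / 9) < 1e-12    # at (0, 1) and (0, -1)

# The interior critical point (0, 0) is a saddle: its value 0 lies
# strictly between the boundary extremes.
assert min(vals) < f(0, 0) < max(vals)
```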
2. – Find the absolute extreme points of the function f(x, y) = (4x − x²)cos y on the rectangular region

1 ≤ x ≤ 3,   −π/4 ≤ y ≤ π/4.

Solution:

FIGURE 2.37: The plane region R
f is continuous (because it is the product of a polynomial function and the cosine function) on the rectangle R = [1, 3] × [−π/4, π/4] (see Figure 2.37), which is a closed bounded set of R²; then f attains its absolute extreme points on R. These points are attained at critical points of f located in the interior of R or at points located on ∂R.

∗ Interior stationary points of f. We have

∇f = ⟨(4 − 2x)cos y, −(4x − x²)sin y⟩ = ⟨0, 0⟩

⟺   [x = 2 or cos y = 0]  and  [x = 0 or x = 4 or sin y = 0]

⟺   (x, y) = (2, 0),

since cos y ≠ 0 on [−π/4, π/4] and x ∈ [1, 3].
The point (2, 0) is the only critical point of f, as shown in Figure 2.38, and f(2, 0) = 4.

∗ Extreme values of f at the boundary of R: Let L₁, L₂, L₃ and L₄ be the four sides of the rectangle R, defined by:

L₁ = {(x, −π/4), 1 ≤ x ≤ 3},      L₂ = {(3, y), −π/4 ≤ y ≤ π/4},
L₃ = {(x, π/4), 1 ≤ x ≤ 3},       L₄ = {(1, y), −π/4 ≤ y ≤ π/4}.

FIGURE 2.38: Values of f on R and level curves
– On L₁, we have: f(x, −π/4) = (√2/2)(4x − x²) = g(x), g′(x) = √2(2 − x).

x         1             2            3
g′(x)           +       0      −
g(x)     3√2/2    ↗    2√2    ↘    3√2/2

TABLE 2.16: Variations of g(x) = (√2/2)(4x − x²)

We deduce from the monotony of g, described in Table 2.16, that

max_{L₁} f = f(2, −π/4) = 2√2   and   min_{L₁} f = f(1, −π/4) = f(3, −π/4) = 3√2/2.

– On L₂, we have: f(3, y) = 3cos y = h(y), h′(y) = −3sin y. From the monotony of h (see Table 2.17), we have

max_{L₂} f = f(3, 0) = 3   and   min_{L₂} f = f(3, −π/4) = f(3, π/4) = 3√2/2.
y         −π/4            0           π/4
h′(y)             +       0      −
h(y)     3√2/2     ↗      3     ↘    3√2/2

TABLE 2.17: Variations of h(y) = 3cos y

– On L₃, we have: f(x, π/4) = (√2/2)(4x − x²) = l(x), l′(x) = √2(2 − x).

x         1             2            3
l′(x)           +       0      −
l(x)     3√2/2    ↗    2√2    ↘    3√2/2

TABLE 2.18: Variations of l(x) = (√2/2)(4x − x²)

As a consequence of Table 2.18, we have

max_{L₃} f = f(2, π/4) = 2√2   and   min_{L₃} f = f(1, π/4) = f(3, π/4) = 3√2/2.
– On L₄, we have: f(1, y) = 3cos y = m(y), m′(y) = −3sin y.

y         −π/4            0           π/4
m′(y)             +       0      −
m(y)     3√2/2     ↗      3     ↘    3√2/2

TABLE 2.19: Variations of m(y) = 3cos y

From the behaviour described in Table 2.19, we deduce that

max_{L₄} f = f(1, 0) = 3   and   min_{L₄} f = f(1, −π/4) = f(1, π/4) = 3√2/2.
∗ Conclusion: We list the particular points found above in Table 2.20. The maximal value of f on R is 4 and it is attained at the point (2, 0), which is an interior critical point. The minimal value of f on R is 3√2/2 and it is attained at the points (1, −π/4), (1, π/4), (3, −π/4) and (3, π/4).

(x, y)      (2, 0)   (2, ±π/4)   (1, ±π/4)   (3, ±π/4)   (3, 0)   (1, 0)
f(x, y)     4        2√2         3√2/2       3√2/2       3        3

TABLE 2.20: Values of f(x, y) = (4x − x²)cos y at candidate points

3. – Find the points on the surface z² = xy + 4 that are closest to the origin.
Solution: The distance of a point (x, y, z) to the origin is given by d = √(x² + y² + z²). The problem is equivalent to minimizing d² = x² + y² + z² on the set z² = xy + 4, or equivalently to looking for

min_{R²} x² + y² + (xy + 4) = f(x, y).

Note that the function f is continuous on the unbounded set R² and satisfies

f(x, y) ≥ x² + y² − (1/2)(x² + y²) + 4 = (1/2)(x² + y²) + 4 = (1/2)‖(x, y)‖² + 4

since |xy| ≤ (1/2)(x² + y²). Thus

lim_{‖(x,y)‖→+∞} f(x, y) = +∞.

Hence, the minimization problem has a solution. Note that a global minimum of the problem is also a local minimum, i.e., a solution of

∇f = ⟨2x + y, 2y + x⟩ = ⟨0, 0⟩   ⟺   (x, y) = (0, 0)   since   det[ 2  1 ; 1  2 ] = 3 ≠ 0.

The point (0, 0) is the only critical point of f and f(0, 0) = 4. Since the global minimum exists, (0, 0) is the global minimum, and the corresponding points on the surface z² = 4 + xy that are closest to (0, 0, 0) are (0, 0, ±2).

We can also verify that (0, 0) is a local minimum by applying the second derivatives test. Indeed, we have

fxx = 2,   fxy = 1,   fyy = 2,      D₁(x, y) = fxx = 2,   D₂(x, y) = det[ fxx  fxy ; fxy  fyy ] = det[ 2  1 ; 1  2 ] = 3.

Since D₁(0, 0) > 0 and D₂(0, 0) > 0, (0, 0) is a strict local minimum.
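The substitution argument can be replayed numerically (a sketch; the inequality below is the one used in the solution):

```python
import random

# After substituting z**2 = x*y + 4, the squared distance becomes
# f(x, y) = x**2 + y**2 + x*y + 4.
def f(x, y):
    return x**2 + y**2 + x * y + 4

random.seed(1)
for _ in range(1000):
    x, y = random.uniform(-10, 10), random.uniform(-10, 10)
    # f >= (1/2)(x**2 + y**2) + 4 since |x*y| <= (x**2 + y**2)/2,
    # so in particular f >= f(0, 0) = 4.
    assert f(x, y) >= 0.5 * (x**2 + y**2) + 4 - 1e-9
    assert f(x, y) >= f(0, 0) - 1e-9

# The minimizer (0, 0) lifts to the surface points (0, 0, 2) and
# (0, 0, -2), both at distance 2 from the origin.
assert f(0, 0) == 4 == 2**2
```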
4. – i) Find the quantities x, y that should be produced to maximize the total profit function f(x, y) = x + 4y subject to

2x + 3y ≤ 19,   −3x + 2y ≤ 4,   x + y ≤ 8,   0 ≤ x ≤ 6,   y ≥ 0.

ii) Use level curves to solve the problem geometrically.

Solution:

FIGURE 2.39: Hexagonal plane region S

i) Set S = {(x, y) : 2x + 3y ≤ 19, −3x + 2y ≤ 4, x + y ≤ 8, 0 ≤ x ≤ 6, y ≥ 0}. The set S is the region of the xy plane located in the first quadrant and bounded by the lines 2x + 3y = 19, −3x + 2y = 4, x + y = 8; see Figure 2.39. It is a closed bounded convex set of R². Since f is continuous (because it is a
polynomial), it attains its absolute extreme points on S at stationary points lying in the interior of S or at points located on the boundary of S.

∗ Interior stationary points of f. We have

∇f = ⟨1, 4⟩ ≠ ⟨0, 0⟩   ∀(x, y) ∈ R².

There is no critical point of f.

∗ Extreme values of f at the boundary of S: Let L₁, · · · , L₆ be the six sides of the hexagon S, defined by:

L₁ = {(x, 0), 0 ≤ x ≤ 6},              L₂ = {(6, y), 0 ≤ y ≤ 2},
L₃ = {(x, 8 − x), 5 ≤ x ≤ 6},          L₄ = {(x, (19 − 2x)/3), 2 ≤ x ≤ 5},
L₅ = {(x, (4 + 3x)/2), 0 ≤ x ≤ 2},     L₆ = {(0, y), 0 ≤ y ≤ 2}.

On L₁, we have: f(x, 0) = x,   max_{L₁} f = f(6, 0) = 6,   min_{L₁} f = f(0, 0) = 0.

On L₂, we have: f(6, y) = 6 + 4y,   max_{L₂} f = f(6, 2) = 14,   min_{L₂} f = f(6, 0) = 6.

On L₃, we have: f(x, 8 − x) = x + 4(8 − x) = 32 − 3x,   max_{L₃} f = f(5, 3) = 17,   min_{L₃} f = f(6, 2) = 14.

On L₄, we have: f(x, (19 − 2x)/3) = x + (4/3)(19 − 2x) = (1/3)(76 − 5x),   max_{L₄} f = f(2, 5) = 22,   min_{L₄} f = f(5, 3) = 17.

On L₅, we have: f(x, (4 + 3x)/2) = x + 2(4 + 3x) = 8 + 7x,   max_{L₅} f = f(2, 5) = 22,   min_{L₅} f = f(0, 2) = 8.
On L₆, we have: f(0, y) = 4y,   max_{L₆} f = f(0, 2) = 8,   min_{L₆} f = f(0, 0) = 0.

∗ Conclusion: We list, in Table 2.21 below, the values of f at the boundary points where f takes absolute values on each side of the set S. We conclude that the absolute maximum value of f is f(2, 5) = 22 and the absolute minimum value is f(0, 0) = 0.

(x, y)      (0, 0)   (6, 0)   (6, 2)   (5, 3)   (2, 5)   (0, 2)
f(x, y)     0        6        14       17       22       8

TABLE 2.21: Values of f(x, y) = x + 4y at candidate points
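The vertex-enumeration idea behind Table 2.21 can be sketched in exact arithmetic: intersect every pair of constraint lines, keep the feasible intersections (the vertices), and evaluate the profit there. This is an illustrative brute-force sketch, not a general linear-programming method:

```python
from fractions import Fraction as F
from itertools import combinations

# Constraints written as a*x + b*y <= c, including 0 <= x <= 6 and y >= 0.
cons = [(F(2), F(3), F(19)), (F(-3), F(2), F(4)), (F(1), F(1), F(8)),
        (F(1), F(0), F(6)), (F(-1), F(0), F(0)), (F(0), F(-1), F(0))]

def feasible(x, y):
    return all(a * x + b * y <= c for a, b, c in cons)

# Enumerate intersections of constraint pairs; the feasible ones are vertices.
verts = set()
for (a1, b1, c1), (a2, b2, c2) in combinations(cons, 2):
    det = a1 * b2 - a2 * b1
    if det != 0:
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        if feasible(x, y):
            verts.add((x, y))

profits = {v: v[0] + 4 * v[1] for v in verts}
best = max(profits, key=profits.get)
assert len(verts) == 6                      # the six vertices of the hexagon
assert best == (2, 5) and profits[best] == 22
```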
ii) To solve the problem geometrically, we sketch the level curves x + 4y = k. The profit k is attained if the line has common points with the region S. The profit 0 is attained at the point (0, 0) on the level curve x + 4y = 0. When the profit k increases, the lines x + 4y = k stay parallel and move farther out, until reaching the point (2, 5) where the highest profit is attained; see Figure 2.40.

FIGURE 2.40: Level curve of highest profit
Remark 2.4.2 Note that the points that appear in the above table are the vertices of the hexagon S. The extreme points are attained at two of these vertices. This is true in the more general problem

min_S p·x = p₁x₁ + · · · + pₙxₙ   or   max_S p·x

with

S = {x = (x₁, · · · , xₙ) ∈ R₊ⁿ : Ax ≤ b},

where A = (aᵢⱼ), 1 ≤ i ≤ m, 1 ≤ j ≤ n, b = ᵗ(b₁, · · · , bₘ) and p = ᵗ(p₁, · · · , pₙ). We look for the extreme points on the polyhedron U = {x = (x₁, · · · , xₙ) ∈ R₊ⁿ : Ax = b}. We establish, when it exists, that an extreme point is attained at a corner of the polyhedron U. However, when m and n take large values, the number of corners becomes very large, and linear programming develops various methods to approach these optimal values of the objective function [19], [5], [29].
Chapter 3
Constrained Optimization-Equality Constraints

In this chapter, we are interested in optimizing functions f : Ω ⊂ Rⁿ −→ R over subsets described by equations

g(x) = (g₁(x), g₂(x), . . . , gₘ(x)) = c ∈ Rᵐ   with   m < n,   x ∈ Rⁿ.

Denote the set of the constraints by

S = [g(x) = c] = [g₁(x) = c₁] ∩ [g₂(x) = c₂] ∩ . . . ∩ [gₘ(x) = cₘ].

In dimension n = 3, when m = 1, the equation g₁(x, y, z) = c₁ may describe a surface, while when m = 2, the equations

g₁(x, y, z) = c₁   and   g₂(x, y, z) = c₂

may describe a curve as the intersection of two surfaces. Thus, the set [g = c] can be seen as a set of dimension less than 3. For m = 3, the set [g = c] may be reduced to some points or to the empty set. For this reason, we will not consider these situations and assume always m < n.

FIGURE 3.1: S = [g₁ = 9], n = 3, m = 1

Example. ∗ S = [g₁(x, y, z) = x² + y² + z² = 9] is a surface (the sphere centered at the origin with radius 3; see Figure 3.1). Here n = 3, m = 1.
∗∗ S = [g₁(x, y, z) = x² + y² + z² = 9] ∩ [g₂(x, y, z) = z = 2] is the intersection of the previous sphere with the plane z = 2; see Figure 3.2.

FIGURE 3.2: S = [g₁ = 9] ∩ [g₂ = 2], n = 3, m = 2
∗∗∗ S = [g₁(x, y, z) = x² + y² + z² = 9] ∩ [g₂(x, y, z) = z = 2] ∩ [g₃(x, y, z) = y = 1] = {(2, 1, 2), (−2, 1, 2)} is the intersection of the sphere with the two planes z = 2 and y = 1. It is reduced to two points; see Figure 3.3.

FIGURE 3.3: S = [g₁ = 9] ∩ [g₂ = 2] ∩ [g₃ = 1], n = 3, m = 3
As in the case of unconstrained optimization, we will need to reduce our set of searches of the extreme points by looking for some necessary conditions. A local study for such points x* cannot be done by considering balls centered at these points, because the points x* + th, with |t| small, do not necessarily remain inside the set [g = c]. This situation prevents us from comparing the values f(x* + th) with f(x*). In order to remain close to x* through points of the set [g = c], an idea is to consider all the curves passing through x* included in the constraint set. We will consider curves t −→ x(t) such that the sets {x(t) : t ∈ [−a, a], x(0) = x*}, for some a > 0, are included in [g = c]. So, if x* is a local maximum of f, then we have

f(x(t)) ≤ f(x*)   ∀t ∈ [−a, a].

Thus, 0 is a local maximum point for the function t −→ f(x(t)). Hence

(d/dt) f(x(t))|ₜ₌₀ = ∇f(x(t))·x′(t)|ₜ₌₀ = 0   ⟹   ∇f(x*)·x′(0) = 0.

x′(0) is a tangent vector to the curve x(t) at the point x(0) = x*. This equality mustn't depend on a particular curve x(t). So, we must have

∇f(x*)·x′(0) = 0   for any curve x(t) such that g(x(t)) = c.

In this chapter, first, we will characterize, in Section 3.1, the set of tangent vectors to such curves, then establish, in Section 3.2, the equations satisfied by a local extreme point x*. In Section 3.3, we identify the candidate points for optimality, and in Section 3.4, we explore the global optimality of a constrained local candidate point. Finally, we establish the dependence of the optimal value with respect to certain of its parameters.
3.1   Tangent Plane

Let x* ∈ S = [g(x) = c].

Definition 3.1.1 The set

T = {x′(0) :  t −→ x(t) ∈ S,  x ∈ C¹(−a, a),  a > 0,  x(0) = x*}

of all tangent vectors at x* to differentiable curves included in S is called the tangent plane at x* to the surface [g = c].

We have the following characterization of the tangent plane T at a regular point x* of S.
Definition 3.1.2 A point x* ∈ S = [g = c] is said to be a regular point of the constraints if the gradient vectors ∇g₁(x*), . . ., ∇gₘ(x*) are linearly independent (LI). That is, the m × n matrix

g′(x*) = [ ∂gᵢ/∂xⱼ ],   i = 1, . . . , m,   j = 1, . . . , n,

has rank m.

Recall that v₁, . . . , vₘ ∈ Rⁿ are LI  ⟺  [α₁v₁ + . . . + αₘvₘ = 0 ⟹ α₁ = . . . = αₘ = 0], and that the rank of a matrix equals the rank of its transpose [10].
Theorem 3.1.1 At a regular point x* ∈ S = [g = c], where g is C¹ in a neighborhood of x*, the tangent plane T is equal to the subspace

M = {y ∈ Rⁿ :  g′(x*)y = 0}.

The proof of this theorem is an application of the implicit function theorem.

Proof. T ⊂ M: Indeed, let y ∈ T; then ∃ x ∈ C¹(−a, a), for some a > 0, such that

g(x(t)) = c   ∀t ∈ (−a, a),      x(0) = x*,   x′(0) = y.

Differentiating the relation g(x(t)) = c, we obtain

g′(x(t))x′(t) = 0   ∀t ∈ (−a, a)   ⟹   g′(x(0))x′(0) = 0   ⟺   g′(x*)y = 0.

Hence y ∈ M.

M ⊂ T: ∗ Indeed, let y ∈ M \ {0} and consider the vectorial equation F(t, u) = g(x* + ty + ᵗg′(x*)u) − c = 0, where, for fixed t, the vector u ∈ Rᵐ is the unknown.
Note that F is well defined on an open subset of R × Rᵐ. Indeed, if g is C¹ on B_δ(x*) ⊂ Rⁿ, then, for all (t, u) ∈ (−δ₀, δ₀) × B_{δ₀}(0) with δ₀ = min(δ/(2‖y‖), δ/(2‖g′(x*)‖)),

‖(x* + ty + ᵗg′(x*)u) − x*‖ ≤ |t|‖y‖ + ‖u‖‖g′(x*)‖ < δ₀‖y‖ + δ₀‖g′(x*)‖ ≤ δ/2 + δ/2 = δ

⟹   [x* + ty + ᵗg′(x*)u] ∈ B_δ(x*).

Setting X(t, u) = x* + ty + ᵗg′(x*)u and F(t, u) = g(X(t, u)) − c, we have

Xⱼ(t, u) = x*ⱼ + t yⱼ + Σ_{l=1..m} (∂g_l/∂xⱼ)(x*) u_l,      ∂Xⱼ/∂uᵢ = (∂gᵢ/∂xⱼ)(x*),

(∂Fₖ/∂uᵢ)(t, u) = Σ_{j=1..n} (∂gₖ/∂Xⱼ)(X(t, u)) (∂Xⱼ/∂uᵢ) = Σ_{j=1..n} (∂gₖ/∂xⱼ)(X(t, u)) (∂gᵢ/∂xⱼ)(x*),

that is,

[∂Fₖ/∂uᵢ]_{k,i=1,··· ,m} = g′(X(t, u)) ᵗg′(x*).

By hypotheses, we have:

– F is a C¹ function in the open set A = (−δ₀, δ₀) × B_{δ₀}(0);
– F(0, 0) = g(x*) − c = 0;
– (0, 0) ∈ (−δ₀, δ₀) × B_{δ₀}(0), so (0, 0) is an interior point;
– det(∇ᵤF(0, 0)) = ∂(F₁, · · · , Fₘ)/∂(u₁, · · · , uₘ) = det(g′(x*) ᵗg′(x*)) ≠ 0, as rank g′(x*) = m.

Then, by the implicit function theorem, there exist open balls B_ε(0) ⊂ (−δ₀, δ₀), B_η(0) ⊂ B_{δ₀}(0), ε, η > 0 with B_ε(0) × B_η(0) ⊆ A, and such that

det(∇ᵤF(t, u)) ≠ 0   in   B_ε(0) × B_η(0),

∀t ∈ B_ε(0),   ∃! u ∈ B_η(0) :   F(t, u) = 0,

u : (−ε, ε) −→ B_η(0);   t −→ u(t)   is a C¹ function.

The curve

x(t) = X(t, u(t)) = x* + ty + ᵗg′(x*)u(t)

is thus, by construction, a curve on S. By differentiating both sides of F(t, u(t)) = g(x(t)) − c = 0 with respect to t, we get at t = 0

0 = (d/dt) g(x(t))|ₜ₌₀ = g′(x*)y + g′(x*) ᵗg′(x*) u′(0).

Since y ∈ M \ {0}, we have g′(x*)y = 0. Moreover, since g′(x*) ᵗg′(x*) is nonsingular, we conclude that u′(0) = 0. Hence

x′(0) = y + ᵗg′(x*)u′(0) = y + 0 = y,

and y is a tangent vector to the curve x(t) included in S, so y ∈ T.

∗∗ If y = 0, the constant curve x(t) = x* is included in S and x′(0) = 0 = y, so 0 ∈ T.

It is easy to show that M is a subspace of Rⁿ. Indeed, 0 ∈ M and for y₁, y₂ ∈ M, κ ∈ R, we have g′(x*)(y₁ + κy₂) = g′(x*)y₁ + κg′(x*)y₂ = 0.
Theorem 3.1.2 (Implicit function theorem [15], [20]) Let A in Rⁿ × Rᵐ be an open set. Let F = (F₁, . . . , Fₘ) be a C¹(A) function. Consider the vector equation F(x, y) = 0. If

∃ (x⁰, y⁰) ∈ Å = A,   F(x⁰, y⁰) = 0   and   det F_y(x⁰, y⁰) ≠ 0,

then ∃ ε, η > 0 such that

det F_y(x, y) ≠ 0   ∀(x, y) ∈ B_ε(x⁰) × B_η(y⁰) ⊂ A,

∀x ∈ B_ε(x⁰),   ∃! y ∈ B_η(y⁰) :   F(x, y) = 0,

ϕ : B_ε(x⁰) −→ B_η(y⁰);   x −→ ϕ(x) = y   is C¹(B_ε(x⁰)),

ϕ′(x) = −[F_y(x, y)]⁻¹ F_x(x, y),

where F_y(x, y) = ∇_y F(x, y) = [∂Fᵢ/∂yⱼ]_{i,j=1,...,m} is the gradient of F with respect to y, and det(F_y(x, y)) = ∂(F₁, . . . , Fₘ)/∂(y₁, . . . , yₘ) is the Jacobian of F with respect to y.
Remark 3.1.1 Denote by T(x*) the translation of T by the vector x*:

T(x*) = x* + M = {x* + h ∈ Rⁿ : g′(x*)·h = 0} = {x ∈ Rⁿ : g′(x*)·(x − x*) = 0}.

T(x*) is the tangent plane to the surface [g(x) = c] passing through x*.
FIGURE 3.4: Horizontal tangent line at local extreme points
Remark 3.1.2 Tangent plane at a point of a surface z = f(x)

* Suppose x* is an interior point of a surface z = f(x) where f is a C¹ function. Then, the tangent plane at (x*, f(x*)) is given by

z = f(x*) + f′(x*)·(x − x*).

Indeed, setting g(x, z) = z − f(x) = 0, we have g′(x, z) = ⟨−f′(x), 1⟩ ≠ 0, and then rank(g′(x*, f(x*))) = 1. The tangent plane at (x*, f(x*)) is characterized by

g′(x*, f(x*))·⟨x − x*, z − f(x*)⟩ = ⟨−f′(x*), 1⟩·⟨x − x*, z − f(x*)⟩ = 0.

** If x* is an interior stationary point, then ∇f(x*) = 0, and the tangent plane T(x*, f(x*)) is the horizontal plane z = f(x*).

*** The graph of the tangent plane is the graph of the linear approximation L(x) = f(x*) + f′(x*)·(x − x*). Thus, we have f(x) ≈ L(x) for x close to x*.

Example 1. The tangent plane to a curve y = f(x) at a point (x₀, f(x₀)) corresponds to the tangent line to the curve at that point, described by the equation

y = f(x₀) + f′(x₀)(x − x₀).

The following examples, in Table 3.1, show that the tangent line is horizontal at local extreme points and separates the graph into two parts at an inflection point; see Figure 3.4 and Figure 3.5.
f(x)             point x₀                 f′(x₀)                   tangent line
(x − 1)² − 1     1 : global minimum       2(x − 1) at x = 1: 0     y = −1
1 − x²           0 : global maximum       −2x at x = 0: 0          y = 1
(x − 1)³ + 1     1 : inflection point     3(x − 1)² at x = 1: 0    y = 1
ln x             e                        1/x at x = e: 1/e        y − 1 = (1/e)(x − e)

TABLE 3.1: Tangent planes in one dimension
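The last row of Table 3.1 can be confirmed with a finite-difference slope (a numerical sketch; `tangent` is our helper for the line y − 1 = (1/e)(x − e)):

```python
import math

# Tangent line to y = ln x at x0 = e: slope 1/e, value 1 at x0.
x0 = math.e
h = 1e-6
central_slope = (math.log(x0 + h) - math.log(x0 - h)) / (2 * h)
assert abs(central_slope - 1 / math.e) < 1e-9

def tangent(x):
    return 1 + (x - x0) / math.e

# The line touches the curve at x0 and stays first-order close nearby.
assert abs(tangent(x0) - math.log(x0)) < 1e-12
assert abs(tangent(x0 + 0.01) - math.log(x0 + 0.01)) < 1e-4
```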
FIGURE 3.5: Tangent lines at an inflection point and at an ordinary point

Example 2. The tangent plane to a surface z = f(x, y) at a point (x₀, y₀, f(x₀, y₀)) corresponds to the usual tangent plane to the surface at that point, described by the equation

z = f(x₀, y₀) + fx(x₀, y₀)(x − x₀) + fy(x₀, y₀)(y − y₀).

A normal vector to this plane is

n = ⟨fx(x₀, y₀), fy(x₀, y₀), −1⟩ = fx(x₀, y₀)i + fy(x₀, y₀)j − k.

A normal line to the surface z = f(x, y) at (x₀, y₀, f(x₀, y₀)) is the line parallel to the vector n. The examples, given in Table 3.2 and graphed in Figures 3.6 and 3.7, show that the tangent plane is horizontal at local extreme points and separates the graph into two parts at a saddle point.
FIGURE 3.6: Horizontal tangent planes at local extreme points
The corresponding tangent planes at (x₀, y₀) are respectively

a) z = −1,   b) z = 4,   c) z = 0,   d) z = 2x + 2y − 3.
FIGURE 3.7: Tangent planes at a saddle and ordinary points
Example 3. Find the tangent plane at the point (0, 1, 0) to the set g = (g₁, g₂) = ⟨1, 1⟩ with

g₁(x, y, z) = x + y + z   and   g₂(x, y, z) = x² + y² + z².

Solution: The surface g(x, y, z) = ⟨1, 1⟩ is the intersection of the two surfaces g₁(x, y, z) = 1 and g₂(x, y, z) = 1. So, it is a curve in the space R³. We have

g′(x, y, z) = [ ∂g₁/∂x  ∂g₁/∂y  ∂g₁/∂z ; ∂g₂/∂x  ∂g₂/∂y  ∂g₂/∂z ] = [ 1  1  1 ; 2x  2y  2z ].
z = f(x, y)                    ∇f(x₀, y₀)
a)  (x − 1)² + (y + 1)² − 1    ⟨2(x − 1), 2(y + 1)⟩ at (1, −1): ⟨0, 0⟩
b)  4 − x² − y²                ⟨−2x, −2y⟩ at (0, 0): ⟨0, 0⟩
c)  y² − x²                    ⟨−2x, 2y⟩ at (0, 0): ⟨0, 0⟩
d)  (x − 1)² + (y + 1)² − 1    ⟨2(x − 1), 2(y + 1)⟩ at (2, 0): ⟨2, 2⟩

TABLE 3.2: Examples in two dimensions
g′(0, 1, 0) = [ 1  1  1 ; 0  2  0 ]   has rank 2.

The tangent plane is the set of points (x, y, z) such that

g′(0, 1, 0)·⟨x − 0, y − 1, z − 0⟩ = [ 1  1  1 ; 0  2  0 ] ᵗ(x, y − 1, z) = ⟨0, 0⟩

⟺   x + y − 1 + z = 0   and   2(y − 1) = 0.

A parametrization of the tangent plane to the two surfaces at (0, 1, 0) is the line (see Figure 3.8)

x = t,   y = 1,   z = −t,   t ∈ R.
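The parametrized line can be checked directly: its direction must be orthogonal to both constraint gradients at (0, 1, 0), and the point itself must satisfy both constraints (a small sketch with our helper `dot`):

```python
# Check that the direction (1, 0, -1) of the line x = t, y = 1, z = -t
# is tangent to both constraint surfaces at p = (0, 1, 0).
p = (0, 1, 0)
grad_g1 = (1, 1, 1)                        # gradient of x + y + z
grad_g2 = (2 * p[0], 2 * p[1], 2 * p[2])   # gradient of x**2 + y**2 + z**2 at p
d = (1, 0, -1)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

assert dot(grad_g1, d) == 0
assert dot(grad_g2, d) == 0
# The point itself satisfies both constraints g1 = 1 and g2 = 1.
assert sum(p) == 1 and sum(c * c for c in p) == 1
```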
FIGURE 3.8: Tangent plane at (0, 1, 0) to [g = 1, 1]
Remark 3.1.3 Note that the representation of the tangent plane obtained in the theorem used the fact that the point was regular. When this hypothesis is omitted, the representation is not necessarily true. Indeed, if S is the set defined by

g(x, y) = 0    with    g(x, y) = x²,

then S is the y-axis. No point of S is regular since we have

g′(x, y) = ⟨2x, 0⟩    and    g′(0, y) = ⟨0, 0⟩    on the y-axis.

We deduce that at each point (0, y₀) ∈ S, we have

M = {h = (h₁, h₂) : g′(0, y₀).h = 0} = R².

However, the line

x(t) = 0,    y(t) = y₀ + t

passes through the point (0, y₀) at t = 0 with direction ⟨x′(0), y′(0)⟩ = ⟨0, 1⟩ and remains included in S. Hence, the true tangent plane at (0, y₀) is S itself, which is strictly smaller than M = R²; the representation by M fails at non-regular points.
Solved Problems
1. – Find an equation of the tangent plane to the ellipsoid x² + 4y² + z² = 18 at the point (1, 2, 1).

Solution: Set g(x, y, z) = x² + 4y² + z² = 18. Then,

g′(x, y, z) = 2x i + 8y j + 2z k,    g′(1, 2, 1) = 2i + 16j + 2k ≠ 0    ⟹    rank(g′(1, 2, 1)) = 1.

The tangent plane (see Figure 3.9) is the set of points (x, y, z) such that

$$g'(1, 2, 1).\langle x - 1,\ y - 2,\ z - 1\rangle = \begin{bmatrix} 2 & 16 & 2 \end{bmatrix}\begin{bmatrix} x - 1 \\ y - 2 \\ z - 1 \end{bmatrix} = 0$$

⟺ 2(x − 1) + 16(y − 2) + 2(z − 1) = 0 ⟺ x + 8y + z − 18 = 0.
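The result admits an easy numerical check (a sketch, not from the text): the point (1, 2, 1) lies on the ellipsoid, the gradient there is ⟨2, 16, 2⟩, and the point satisfies the plane equation.

```python
# Check the point, the gradient, and the tangent-plane equation above.
x, y, z = 1, 2, 1
on_ellipsoid = x**2 + 4*y**2 + z**2     # 18, so (1, 2, 1) is on the ellipsoid
grad = (2*x, 8*y, 2*z)                  # g'(1, 2, 1) = (2, 16, 2)
on_plane = x + 8*y + z - 18             # 0, so the point satisfies the plane equation
print(on_ellipsoid, grad, on_plane)     # 18 (2, 16, 2) 0
```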
FIGURE 3.9: Tangent plane at (1, 2, 1) to the ellipsoid
2. – Find all points on the surface 2x² + 3y² + 4z² = 9 at which the tangent plane is parallel to the plane x − 2y + 3z = 5.

Solution: Set g(x, y, z) = 2x² + 3y² + 4z² = 9. We have

g′(x, y, z) = ⟨4x, 6y, 8z⟩ ≠ 0 on [g = 9]    ⟹    rank(g′(x, y, z)) = 1,

since g′(x, y, z) = 0 ⟺ (x, y, z) = (0, 0, 0), and g(0, 0, 0) = 0 ≠ 9.

The tangent plane to the surface g(x, y, z) = 9 at a point (x₀, y₀, z₀) is the set of points (x, y, z) such that

$$g'(x_0, y_0, z_0).\langle x - x_0,\ y - y_0,\ z - z_0\rangle = \begin{bmatrix} 4x_0 & 6y_0 & 8z_0 \end{bmatrix}\begin{bmatrix} x - x_0 \\ y - y_0 \\ z - z_0 \end{bmatrix} = 0$$

⟺ 4x₀(x − x₀) + 6y₀(y − y₀) + 8z₀(z − z₀) = 0.

This tangent plane will be parallel to the plane x − 2y + 3z = 5 if the two planes have their respective normals g′(x₀, y₀, z₀) and ⟨1, −2, 3⟩ parallel. So, we have to solve the following system:

$$\text{find } t \in \mathbb{R} : \ \begin{cases} g'(x_0, y_0, z_0) = t\langle 1, -2, 3\rangle \\ g(x_0, y_0, z_0) = 9 \end{cases} \iff \begin{cases} 4x_0 = t \\ 6y_0 = -2t \\ 8z_0 = 3t \\ 2x_0^2 + 3y_0^2 + 4z_0^2 = 9 \end{cases}$$

$$\Longrightarrow \quad 2\Bigl(\frac{t}{4}\Bigr)^2 + 3\Bigl(-\frac{t}{3}\Bigr)^2 + 4\Bigl(\frac{3t}{8}\Bigr)^2 = 9 \quad \Longrightarrow \quad t = \pm\frac{12\sqrt{3}}{7}.$$

The needed points on the surface are

$$\Bigl(\frac{3\sqrt{3}}{7},\ -\frac{4\sqrt{3}}{7},\ \frac{9\sqrt{3}}{14}\Bigr) \quad \text{and} \quad \Bigl(-\frac{3\sqrt{3}}{7},\ \frac{4\sqrt{3}}{7},\ -\frac{9\sqrt{3}}{14}\Bigr).$$

The equations of the tangent planes to the surface (see Figure 3.10) at these points are

$$\Bigl(x - \frac{3\sqrt{3}}{7}\Bigr) - 2\Bigl(y + \frac{4\sqrt{3}}{7}\Bigr) + 3\Bigl(z - \frac{9\sqrt{3}}{14}\Bigr) = 0,$$
$$\Bigl(x + \frac{3\sqrt{3}}{7}\Bigr) - 2\Bigl(y - \frac{4\sqrt{3}}{7}\Bigr) + 3\Bigl(z + \frac{9\sqrt{3}}{14}\Bigr) = 0.$$

FIGURE 3.10: Parallel tangent planes to an ellipsoid
3. – Show that the surfaces

$$z = \sqrt{x^2 + y^2} \qquad \text{and} \qquad z = \frac{1}{10}(x^2 + y^2) + \frac{5}{2}$$

intersect at (3, 4, 5) and have a common tangent plane at that point.

Solution: Set

$$g_1(x, y, z) = z - \sqrt{x^2 + y^2}, \qquad g_2(x, y, z) = z - \frac{1}{10}(x^2 + y^2) - \frac{5}{2}.$$

Since g₁(3, 4, 5) = 0 and g₂(3, 4, 5) = 0, the point (3, 4, 5) is a common point of the surfaces g₁(x, y, z) = 0 and g₂(x, y, z) = 0. We have

$$g_1'(x, y, z) = -\frac{x}{\sqrt{x^2 + y^2}}\,\mathbf{i} - \frac{y}{\sqrt{x^2 + y^2}}\,\mathbf{j} + \mathbf{k}, \qquad g_2'(x, y, z) = -\frac{x}{5}\,\mathbf{i} - \frac{y}{5}\,\mathbf{j} + \mathbf{k},$$

g₁′(3, 4, 5) = −(3/5)i − (4/5)j + k ≠ 0,    rank(g₁′(3, 4, 5)) = 1,
g₂′(3, 4, 5) = −(3/5)i − (4/5)j + k ≠ 0,    rank(g₂′(3, 4, 5)) = 1.

Note that the normal vectors g₁′(3, 4, 5) and g₂′(3, 4, 5) of the tangent planes to the surfaces g₁(x, y, z) = 0 and g₂(x, y, z) = 0 respectively are the same. Hence, the two surfaces have a common tangent plane at this point, with the equation

$$-\frac{3}{5}(x - 3) - \frac{4}{5}(y - 4) + (z - 5) = 0.$$
4. – Find two unit vectors that are normal to the surface sin(xz) − 4 cos(yz) = 4 at the point P(π, π, 1).

Solution: A vector that is normal to the surface g(x, y, z) = sin(xz) − 4 cos(yz) = 4 at P is normal to the tangent plane to this surface at this point, and we have

g′(x, y, z) = z cos(xz) i + 4z sin(yz) j + (x cos(xz) + 4y sin(yz)) k,

g′(π, π, 1) = −i − πk ≠ 0    ⟹    rank(g′(π, π, 1)) = 1.

A normal vector to the tangent plane is g′(π, π, 1) = −i − πk, and two unit vectors that are normal to the surface sin(xz) − 4 cos(yz) = 4 at the point P(π, π, 1) are

$$\pm\frac{g'(\pi, \pi, 1)}{\|g'(\pi, \pi, 1)\|} = \pm\Bigl\langle -\frac{1}{\sqrt{1 + \pi^2}},\ 0,\ -\frac{\pi}{\sqrt{1 + \pi^2}}\Bigr\rangle.$$
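A numeric check of this gradient and its normalization (a sketch, not part of the book's solution):

```python
import math

x, y, z = math.pi, math.pi, 1.0
grad = (z * math.cos(x*z),
        4 * z * math.sin(y*z),
        x * math.cos(x*z) + 4 * y * math.sin(y*z))
norm = math.sqrt(sum(c * c for c in grad))
unit = tuple(c / norm for c in grad)

# up to floating-point noise in sin(pi), grad is (-1, 0, -pi)
assert abs(grad[0] + 1) < 1e-12
assert abs(grad[1]) < 1e-12
assert abs(grad[2] + math.pi) < 1e-12
assert abs(norm - math.sqrt(1 + math.pi**2)) < 1e-12
print(unit)
```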
3.2    Necessary Condition for Local Extreme Points-Equality Constraints
Before setting the results rigorously, we will try to give an intuitive approach to the comparison of the values of f close to a local maximum value f(x*) under the constraints g(x) = c. We treat the unconstrained case in parallel.

• Unconstrained case: We compare values of f taken in a neighborhood of x* in all directions:

$$f(x^* + th) \le f(x^*) \qquad \text{for } h \in \mathbb{R}^n,\ |t| < \delta,$$

or equivalently, for each i = 1, ..., n,

$$f(x^* + te_i) \le f(x^*), \qquad |t| < \delta.$$

Then for |t| < δ, we have

$$\frac{f(x^* + te_i) - f(x^*)}{t} \le 0 \ \text{ if } t > 0, \qquad \frac{f(x^* + te_i) - f(x^*)}{t} \ge 0 \ \text{ if } t < 0.$$

Since f is differentiable, we obtain as t → 0⁺ and t → 0⁻ respectively

$$f_{x_i}(x^*) \le 0 \quad \text{and} \quad f_{x_i}(x^*) \ge 0. \qquad \text{So} \quad f_{x_i}(x^*) = 0 \quad \text{for each } i = 1, \ldots, n.$$

• Constrained case: We cannot choose points around x* in any direction, because we need to remain on the set [g = c]. A way to do that is to consider curves t ↦ x(t) satisfying x(t) ∈ [g = c] for t ∈ (−a, a) and x(0) = x*. Then, we have

$$f(x(t)) \le f(x^*) \qquad \forall t \in (-a, a),$$

and 0 is a local maximum point for the function t ↦ f(x(t)). Hence, for regular functions, we have

$$\frac{d}{dt} f(x(t))\Big|_{t=0} = f'(x(t)).x'(t)\Big|_{t=0} = 0 \quad \Longrightarrow \quad f'(x^*).x'(0) = 0.$$
x′(0) is a tangent vector to the curve x(t) at the point x(0) = x*. This equality must not depend on a particular curve. Thus, it must be satisfied for any y = x′(0) ∈ M, which is summarized below:

Lemma 3.2.1 Let f and g = (g₁, ..., g_m) be C¹ functions in a neighborhood of x* ∈ [g = c]. If x* is a regular point and a local extreme point of f subject to these constraints, then we have

∀y ∈ Rⁿ :    g′(x*)y = 0    ⟹    f′(x*)y = 0.

The lemma says that f′(x*) is orthogonal to the plane tangent at x* to the surface g(x) = c. As a consequence, we will see that f′(x*) is a linear combination of g₁′(x*), ..., g_m′(x*).

Theorem 3.2.1 Let f and g = (g₁, ..., g_m) be C¹ functions in a neighborhood of x* ∈ [g = c]. If x* is a regular point and a local extreme point of f subject to these constraints, then there exist unique numbers λ₁*, ..., λ_m* such that

$$\frac{\partial f}{\partial x_i}(x^*) - \sum_{j=1}^{m} \lambda_j^* \frac{\partial g_j}{\partial x_i}(x^*) = 0, \qquad i = 1, \ldots, n.$$
Proof. The proof uses a simple argument of linear algebra. Indeed, set

$$A = g'(x^*) = \begin{bmatrix} \frac{\partial g_1}{\partial x_1} & \frac{\partial g_1}{\partial x_2} & \cdots & \frac{\partial g_1}{\partial x_n} \\ \frac{\partial g_2}{\partial x_1} & \frac{\partial g_2}{\partial x_2} & \cdots & \frac{\partial g_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial g_m}{\partial x_1} & \frac{\partial g_m}{\partial x_2} & \cdots & \frac{\partial g_m}{\partial x_n} \end{bmatrix}, \qquad \operatorname{rank}(A) = m,$$

$$b = f'(x^*) = \Bigl\langle \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \Bigr\rangle \in \mathbb{R}^n.$$

From the previous lemma, we have

∀y ∈ Rⁿ :    Ay = 0    ⟹    b.y = 0.

In other words, we have

$$\operatorname{Ker}A = \operatorname{Ker}\begin{bmatrix} A \\ b \end{bmatrix},$$

where Ker N denotes the kernel [10] of the linear transformation induced by the matrix N. Since we have [10]

$$\dim \mathbb{R}^n = \dim(\operatorname{Ker}A) + \operatorname{rank}(A) = \dim\Bigl(\operatorname{Ker}\begin{bmatrix} A \\ b \end{bmatrix}\Bigr) + \operatorname{rank}\Bigl(\begin{bmatrix} A \\ b \end{bmatrix}\Bigr),$$

then

$$\operatorname{rank}(A) = \operatorname{rank}\Bigl(\begin{bmatrix} A \\ b \end{bmatrix}\Bigr),$$

which means that the vector b is linearly dependent on the row vectors of A. So there exists a unique vector λ* = (λ₁*, ..., λ_m*) ∈ Rᵐ such that

$${}^t b = {}^t A\,\lambda^* \iff \frac{\partial f}{\partial x_i}(x^*) = \sum_{j=1}^m \lambda_j^* \frac{\partial g_j}{\partial x_i}(x^*), \qquad i = 1, \ldots, n.$$
Finally, to look for extreme points of f subject to the constraint g(x) = c, we are led to solve the system

$$\frac{\partial f}{\partial x_i}(x) - \sum_{j=1}^m \lambda_j \frac{\partial g_j}{\partial x_i}(x) = 0, \quad i = 1, \ldots, n, \qquad g_j(x) - c_j = 0, \quad j = 1, \ldots, m.$$

These equations suggest introducing the function

L(x, λ) = f(x) − λ₁(g₁(x) − c₁) − ... − λ_m(g_m(x) − c_m),

called the Lagrange function or Lagrangian, with λ₁, ..., λ_m the Lagrange multipliers. The necessary conditions can then be expressed in the form

$$\frac{\partial L}{\partial x_i}(x, \lambda) = \frac{\partial f}{\partial x_i}(x) - \sum_{j=1}^m \lambda_j \frac{\partial g_j}{\partial x_i}(x) = 0, \quad i = 1, \ldots, n,$$
$$\frac{\partial L}{\partial \lambda_j}(x, \lambda) = -(g_j(x) - c_j) = 0, \quad j = 1, \ldots, m,$$

or simply ∇L(x, λ) = 0.
We may reformulate the previous theorem as follows:
Theorem 3.2.2 Let f and g = (g₁, ..., g_m) be C¹ functions in a neighborhood of x* ∈ [g = c]. Then:

x* is a regular point and a local extreme point of f    ⟹    ∃! λ* ∈ Rᵐ such that ∇L(x*, λ*) = 0.
Remark 3.2.1 When m = 1, the necessary condition reduces to

∃! λ* ∈ R :    ∇f = λ*∇g    ⟹    ∇f ∥ ∇g.

The vectors g′(x*) and f′(x*) are respectively normal to the level curves g(x) = c and f(x) = f(x*). When the extreme point is attained, the two vectors g′(x*) and f′(x*) are parallel. Thus the two level curves have a common tangent plane at x*. When using a graphing utility, the constrained extreme points may therefore be located where the level curves are tangent.

Example 1. At what points on the circle x² + y² = 1 does f(x, y) = xy have its maximum and minimum?

Solution: Set

g(x, y) = x² + y²,    S = {(x, y) : g(x, y) = x² + y² = 1}.

By the extreme-value theorem, f attains its maximum and minimum values on S since f is continuous on the closed and bounded unit circle S; see Figure 3.11.
FIGURE 3.11: Graph of f(x, y) = xy on the unit disk [x² + y² ≤ 1]
Next, the functions f and g are C¹ around each point (x, y) ∈ R², and in particular each point of S is relatively interior to S and is a regular point, since we have

g′(x, y) = 2x i + 2y j ≠ 0 on S    ⟹    rank(g′(x, y)) = 1.

Thus, introducing the Lagrangian

L(x, y, λ) = xy − λ(x² + y² − 1),

we can apply the Lagrange multiplier method to look for the interior extreme points as solutions of the system

$$\begin{cases} L_x = f_x - \lambda g_x = 0 \\ L_y = f_y - \lambda g_y = 0 \\ L_\lambda = -(g - 1) = 0 \end{cases} \iff \begin{cases} y - 2\lambda x = 0 \\ x - 2\lambda y = 0 \\ x^2 + y^2 - 1 = 0 \end{cases} \iff \begin{cases} y - 2\lambda x = 0 \\ x(1 - 4\lambda^2) = 0 \\ x^2 + y^2 - 1 = 0 \end{cases} \iff \begin{cases} y - 2\lambda x = 0 \\ x = 0 \ \text{ or } \ \lambda = \pm\tfrac12 \\ x^2 + y^2 - 1 = 0. \end{cases}$$

∗ x = 0 leads to y = 0, and (0, 0) is not a point on the constraint curve.

∗∗ λ = 1/2 leads to y = x, and from the constraint equation we deduce that x = ±1/√2.

∗∗∗ λ = −1/2 leads to y = −x, and from the constraint equation we deduce that x = ±1/√2.

So, the stationary points for the Lagrangian are the four points

(1/√2, 1/√2),    (−1/√2, −1/√2),    (1/√2, −1/√2),    (−1/√2, 1/√2),

at which f takes its maximum and minimum values respectively:

f(1/√2, 1/√2) = f(−1/√2, −1/√2) = 1/2,    f(1/√2, −1/√2) = f(−1/√2, 1/√2) = −1/2.

The problem can be solved graphically, as illustrated in Figure 3.12.
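The four stationary points can be verified numerically against the Lagrange system (a sketch; the candidate list and tolerances are mine, not the book's):

```python
import math

s = 1 / math.sqrt(2)
# (x, y, lambda) candidates found with the Lagrange system above
candidates = [(s, s, 0.5), (-s, -s, 0.5), (s, -s, -0.5), (-s, s, -0.5)]
for x, y, lam in candidates:
    assert abs(y - 2 * lam * x) < 1e-12   # L_x = 0
    assert abs(x - 2 * lam * y) < 1e-12   # L_y = 0
    assert abs(x*x + y*y - 1) < 1e-12     # the constraint x^2 + y^2 = 1

values = sorted(x * y for x, y, _ in candidates)
print(values[0], values[-1])              # approximately -1/2 and 1/2
```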
FIGURE 3.12: The constraint [x² + y² = 1] and the level curves xy = −1/2, 1/2 are tangent
Remark 3.2.2 Note that Lagrange's method does not transform a constrained optimization problem into one of finding an unconstrained extreme point of the Lagrangian.
Example 2. Consider the problem

max xy    subject to    x + y = 2,  x ≥ 0,  y ≥ 0.

Using the Lagrange multiplier method, prove that (x, y) = (1, 1) solves the problem with λ = 1. Prove also that (1, 1, 1) does not maximize the Lagrangian L.

Solution: Since x and y must be nonnegative and satisfy x + y = 2, we may look for the extreme points in the set [0, 2] × [0, 2]. Let us denote

f(x, y) = xy,    g(x, y) = x + y,    Ω = [0, 2] × [0, 2].

First, the optimization problem has a solution by the extreme-value theorem. Indeed, f is continuous on the line segment (see Figure 3.13)

S = {(x, y) : g(x, y) = 2,  x ≥ 0,  y ≥ 0},

which is a closed and bounded subset of R². Next, the functions f and g are C¹ around each point (x, y) ∈ (0, 2) × (0, 2), and each such point is a regular point since we have

g′(x, y) = i + j ≠ 0    ⟹    rank(g′(x, y)) = 1.
FIGURE 3.13: Set of the constraints
So, by applying the method of Lagrange multipliers, we introduce the Lagrangian

L(x, y, λ) = f(x, y) − λ(g(x, y) − 2) = xy − λ(x + y − 2)

and look for the interior extreme points as solutions of the system

$$\begin{cases} L_x = f_x - \lambda g_x = 0 \\ L_y = f_y - \lambda g_y = 0 \\ L_\lambda = -(g - 2) = 0 \end{cases} \iff \begin{cases} y - \lambda = 0 \\ x - \lambda = 0 \\ x + y - 2 = 0 \end{cases} \iff x = y = \lambda = 1.$$

So, the point (1, 1, 1) is a stationary point for the Lagrangian L. But it is not an extreme point for L. Indeed, the second derivative test gives the Hessian matrix of L, in the variable order (x, y, λ):

$$H_L(x, y, \lambda) = \begin{bmatrix} 0 & 1 & -1 \\ 1 & 0 & -1 \\ -1 & -1 & 0 \end{bmatrix}.$$

The leading principal minors of H_L at (1, 1, 1) are

$$D_1 = 0, \qquad D_2 = \begin{vmatrix} 0 & 1 \\ 1 & 0 \end{vmatrix} = -1, \qquad D_3 = \det H_L = 2 \neq 0.$$

Since D₂ < 0, the quadratic form is indefinite. Hence, (1, 1, 1) is a saddle point. It remains to show that the point (1, 1) is the maximum point for the problem; see Figure 3.14 for a graphical solution using level curves. Indeed, since it is the only interior point of the segment, it suffices to compare the value of f at (1, 1) with its values at the end points of the segment. We have

f(1, 1) = 1,    f(2, 0) = 0,    f(0, 2) = 0.
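The indefiniteness of this Hessian can be exhibited directly by evaluating the quadratic form hᵀH_L h in two directions (a sketch; the chosen directions are mine, not the book's):

```python
# Hessian of L(x, y, lam) = xy - lam*(x + y - 2) at (1, 1, 1),
# in the variable order (x, y, lam)
H = [[0, 1, -1],
     [1, 0, -1],
     [-1, -1, 0]]

def quad(h):
    """Quadratic form h^T H h."""
    return sum(H[i][j] * h[i] * h[j] for i in range(3) for j in range(3))

# one direction gives a positive value, another a negative one:
# H is indefinite, so (1, 1, 1) is a saddle point of L
print(quad((1, 1, 0)), quad((1, -1, 0)))   # 2 -2
```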
FIGURE 3.14: The constraint x + y = 2 and the level curve f = xy = 1 are tangent
So f attains its maximum value at (1, 1) under the constraint g(x, y) = 2.
Remark 3.2.3 A function subject to a constraint need not have a local extremum at every stationary point of the associated Lagrangian. The Lagrange multiplier method transforms a constrained optimization problem into one of finding the appropriate stationary points of the Lagrangian.
Example 3. Consider the problem

min xy    subject to    x + y = 2,  x ≥ 0,  y ≥ 0.

Using the Lagrange multiplier method, prove that (x, y) = (1, 1) does not solve the problem with λ = 1.

Solution: Arguing as in Example 2, the problem has a solution by the extreme-value theorem. But, by applying the method of Lagrange multipliers, we found the only candidate point (1, 1), and it realizes the maximum of f. So the minimum point of f is not necessarily a stationary point of L. In fact, f attains its minimum value 0, under the constraint g(x, y) = 2, at (2, 0) and (0, 2).
Solved Problems
1. – i) Show that the Lagrange equations for

max (min) f(x, y) = x + y + 3    subject to    g(x, y) = x − y = 0

have no solution.
ii) Show that any point of the constraint set is a regular point.
iii) What can you conclude about the minimum and maximum values of f subject to g = 0? Show this directly.

Solution: i) Set L(x, y, λ) = f(x, y) − λ(g(x, y) − 0) = x + y + 3 − λ(x − y). By applying the Lagrange multiplier method, we look for the interior extreme points as solutions of the system

$$\begin{cases} L_x = f_x - \lambda g_x = 0 \\ L_y = f_y - \lambda g_y = 0 \\ L_\lambda = -(g - 0) = 0 \end{cases} \iff \begin{cases} 1 - \lambda = 0 \\ 1 + \lambda = 0 \\ x - y = 0, \end{cases}$$

which leads to a contradiction, with λ = 1 and λ = −1. So the system has no solution.

ii) Any point of the constraint set is a regular point since we have

g′(x, y) = i − j ≠ 0    ⟹    rank(g′(x, y)) = 1.

iii) We can conclude that f has no maximum nor minimum on the set of the constraints since, if they existed, they would be solutions of the above
FIGURE 3.15: No solution for the constrained optimization problem
system. Indeed, all conditions of the theorem on the necessary conditions for a constrained candidate point are satisfied. The problem is equivalent to optimizing

F(x) = f(x, x) = 2x + 3    for x ∈ R.

We can see that

$$\lim_{x \to -\infty} F(x) = -\infty, \qquad \lim_{x \to +\infty} F(x) = +\infty.$$

Therefore, f cannot reach a finite lower or upper bound on the set of the constraints. The graph of f is a plane; see Figure 3.15. The level curves x + y + 3 = k are parallel lines that intersect the constraint line x − y = 0 at the points ((k − 3)/2, (k − 3)/2). This shows that f takes arbitrarily large values (see Figure 3.16).
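The unboundedness of the reduced function F(x) = 2x + 3 is immediate to check (a trivial sketch, not from the text):

```python
# On the constraint x = y, f(x, y) = x + y + 3 reduces to F(x) = 2x + 3,
# which takes arbitrarily large and arbitrarily small values.
def F(x):
    return 2 * x + 3

print(F(-10**6), F(10**6))   # -1999997 2000003: no finite min or max exists
```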
FIGURE 3.16: ∇f = ⟨1, 1⟩ ∦ ⟨1, −1⟩ = ∇g
2. – Consider the problem of minimizing f(x, y) = y + 1 subject to g(x, y) = x⁴ − (y − 2)⁵ = 0.

i) Show, without using calculus, that the minimum occurs at (0, 2). Is it a regular point?
ii) Show that the Lagrange condition ∇f = λ∇g is not satisfied for any value of λ.
iii) Does this contradict the theorem on the necessary conditions for a constrained candidate point?

Solution: i) Note that we have

g(x, y) = x⁴ − (y − 2)⁵ = 0    ⟺    (y − 2)⁵ = x⁴ ≥ 0    ⟹    y ≥ 2.

So, on the constraint set (see Figure 3.17), we have

f(x, y) = y + 1 ≥ 3 = f(x, 2)    ∀(x, y) ∈ [g = 0].

Since g(0, 2) = 0, then (0, 2) ∈ [g = 0]. Thus

f(x, y) ≥ f(0, 2)    ∀(x, y) ∈ [g = 0],

and (0, 2) is a global minimum point.
FIGURE 3.17: Minimal value of f on the constraint set g = 0
ii) Let L(x, y, λ) = f(x, y) − λ(g(x, y) − 0) = y + 1 − λ(x⁴ − (y − 2)⁵). An interior extreme point, if it exists, is a solution of the system

$$\begin{cases} L_x = f_x - \lambda g_x = 0 \\ L_y = f_y - \lambda g_y = 0 \\ L_\lambda = -(g - 0) = 0 \end{cases} \iff \begin{cases} -4\lambda x^3 = 0 \\ 1 + 5\lambda(y - 2)^4 = 0 \\ x^4 - (y - 2)^5 = 0. \end{cases}$$

Note that λ = 0 is not possible by the second equation. So, we deduce that x = 0 from the first equation, and then y = 2 from the third equation. But this leads to a contradiction in the second equation. So the system has no solution. No level curve is tangent to the constraint set in Figure 3.18.
FIGURE 3.18: No solution with Lagrange method
iii) This does not contradict the theorem on the necessary conditions for a constrained candidate point, since the theorem holds only when all its assumptions are satisfied, which is not the case here: the regularity of the point (0, 2) fails. Indeed, we have

g′(x, y) = 4x³ i − 5(y − 2)⁴ j,    g′(0, 2) = ⟨0, 0⟩    ⟹    rank(g′(0, 2)) = 0 ≠ 1.
3. – At what points on the curve g(x, y) = x⁴ + y⁴ = 1 does f(x, y) = x² + y² have its maximum and minimum values? Give a geometric interpretation of the problem.
Solution: Note that the optimization problem has a solution by the extreme-value theorem, since f is continuous on the closed and bounded subset [g = 1] = g⁻¹({1}) of R². Next, the functions f and g are C¹ around each point (x, y) ∈ R². In particular, each point of [g = 1] is relatively interior to [g = 1]: indeed, if (x₀, y₀) ∈ [g = 1], then the point (x₀², y₀²) is on the unit circle, and we conclude by using the fact that the preimage of an open ball by the continuous map (x, y) ↦ (x², y²) is an open set. Moreover, each point of [g = 1] is a regular point since we have

g′(x, y) = 4x³ i + 4y³ j ≠ 0 on [g = 1]    ⟹    rank(g′(x, y)) = 1.

So, by setting

L(x, y, λ) = x² + y² − λ(x⁴ + y⁴ − 1),

we are led to solve the system

$$\begin{cases} L_x = 2x - 4\lambda x^3 = 0 \\ L_y = 2y - 4\lambda y^3 = 0 \\ L_\lambda = -(x^4 + y^4 - 1) = 0 \end{cases} \iff \begin{cases} 2x(1 - 2\lambda x^2) = 0 \\ 2y(1 - 2\lambda y^2) = 0 \\ x^4 + y^4 = 1 \end{cases} \iff \begin{cases} x = 0 \ \text{ or } \ 2\lambda x^2 = 1 \\ y = 0 \ \text{ or } \ 2\lambda y^2 = 1 \\ x^4 + y^4 = 1 \end{cases}$$

$$\iff \begin{cases} x = 0 \\ y = \pm 1 \\ \lambda = 1/2 \end{cases} \quad \text{or} \quad \begin{cases} y = 0 \\ x = \pm 1 \\ \lambda = 1/2 \end{cases} \quad \text{or} \quad \begin{cases} x^2 = y^2 \\ x^4 = 1/2 \\ \lambda = 1/(2x^2). \end{cases}$$

So, the stationary points for the Lagrangian are

$$(0, \pm 1), \quad (\pm 1, 0), \quad \Bigl(\frac{1}{2^{1/4}}, \pm\frac{1}{2^{1/4}}\Bigr), \quad \Bigl(-\frac{1}{2^{1/4}}, \pm\frac{1}{2^{1/4}}\Bigr),$$

at which f takes its maximum and minimum values respectively:

$$\max_{g=1} f = f\Bigl(\frac{1}{2^{1/4}}, \pm\frac{1}{2^{1/4}}\Bigr) = f\Bigl(-\frac{1}{2^{1/4}}, \pm\frac{1}{2^{1/4}}\Bigr) = \sqrt{2},$$
$$\min_{g=1} f = f(\pm 1, 0) = f(0, \pm 1) = 1.$$
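These stationary points can be checked numerically against the Lagrange system (a sketch; the sample of candidates and the tolerance are mine):

```python
import math

a = 2 ** (-0.25)               # positive x with x^4 = 1/2
# a sample of the (x, y, lambda) candidates found above
points = [(0.0, 1.0, 0.5), (1.0, 0.0, 0.5), (a, a, 2 ** (-0.5))]
for x, y, lam in points:
    assert abs(2*x - 4*lam*x**3) < 1e-12   # L_x = 0
    assert abs(2*y - 4*lam*y**3) < 1e-12   # L_y = 0
    assert abs(x**4 + y**4 - 1) < 1e-12    # the constraint x^4 + y^4 = 1

print([x*x + y*y for x, y, _ in points])   # f values: 1, 1 and approximately sqrt(2)
```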
FIGURE 3.19: The constraint [x⁴ + y⁴ = 1] and the level curves f = 1, √2 are tangent

Since f(x, y) = ‖(x, y) − (0, 0)‖², the problem looks for the points (x, y) on the curve x⁴ + y⁴ = 1 that are closest to and farthest from the origin; see Figure 3.19.
4. – Figures A and B (see Figure 3.20) show the level curves of f and the constraint curve g(x, y) = 0 graphed thickly. Estimate the maximum and minimum values of f subject to the constraint. Locate the point(s), if any, where an extreme value occurs.

Solution: Figure A. Two level curves of f are tangent to the constraint curve g = 0. Comparing the values of f taken at these level curves, we deduce that

local max_{g=0} f ≈ 15 ≈ f(−1.5, 1.5),    local min_{g=0} f ≈ 3.64 ≈ f(−1.5, 1.5).

Figure B. One level curve of f is tangent to the constraint curve g = 0. Comparing the values of f taken at different level curves, we remark that f keeps taking large values on the constraint set. Therefore, we deduce that

local max_{g=0} f does not exist,    local min_{g=0} f ≈ 19.2 ≈ f(3, −2).
FIGURE 3.20: Level curves of f and the constraint curve g = 0
5. – Find the points on the sphere x2 + y 2 + z 2 = 1 that are closest to and farthest from the point (1, 2, 2).
Solution: The distance of a point (x, y, z) to the point (1, 2, 2) is given by

$$D(x, y, z) = \sqrt{(x - 1)^2 + (y - 2)^2 + (z - 2)^2}.$$

Looking for the shortest and farthest distance when (x, y, z) remains on the unit sphere is equivalent to optimizing D²(x, y, z) under the constraint x² + y² + z² = 1. So, let us denote

f(x, y, z) = (x − 1)² + (y − 2)² + (z − 2)²,    g(x, y, z) = x² + y² + z²,    S = [g = 1].

First, the optimization problem has a solution by the extreme-value theorem since f is continuous on the unit sphere S, which is a closed and bounded subset of R³. Next, f and g are C^∞ around each point (x, y, z) ∈ R³. In particular, each point of S is a relatively interior point and is a regular point since we have

g′(x, y, z) = 2x i + 2y j + 2z k ≠ 0 on S    ⟹    rank(g′(x, y, z)) = 1.

So, consider the Lagrangian

L(x, y, z, λ) = (x − 1)² + (y − 2)² + (z − 2)² − λ(x² + y² + z² − 1)

and apply the Lagrange multiplier method to look for the interior extreme points by solving the system

$$\begin{cases} L_x = 2(x - 1) - 2\lambda x = 0 \\ L_y = 2(y - 2) - 2\lambda y = 0 \\ L_z = 2(z - 2) - 2\lambda z = 0 \\ L_\lambda = -(x^2 + y^2 + z^2 - 1) = 0. \end{cases}$$

Note that x = 0 is impossible by the first equation (it would give −2 = 0); similarly y ≠ 0 and z ≠ 0. So, we deduce from the system that

$$\lambda = 1 - \frac{1}{x} = 1 - \frac{2}{y} = 1 - \frac{2}{z},$$

from which we deduce

$$\begin{cases} y = z = 2x \\ \lambda = 1 - 1/x \\ x^2 + y^2 + z^2 = 1 \end{cases} \iff \begin{cases} y = z = 2x \\ \lambda = 1 - 1/x \\ x^2 + 4x^2 + 4x^2 - 1 = 0 \end{cases} \iff x = \pm\frac{1}{3}.$$
So, the stationary points (x, y, z, λ) for the Lagrangian are the two points

(1/3, 2/3, 2/3, −2)    and    (−1/3, −2/3, −2/3, 4),

and f takes its minimum and maximum values respectively:

f(1/3, 2/3, 2/3) = 4    and    f(−1/3, −2/3, −2/3) = 16.

The level curves passing through these points are spheres tangent to the constraint, as shown in Figure 3.21.

FIGURE 3.21: The constraint [g = 1] and the level curves f = 4, 16 are tangent
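Geometrically, the closest and farthest points on the unit sphere to p = (1, 2, 2) are ±p/‖p‖; a quick numeric check (a sketch, not from the book) that this agrees with the Lagrange computation:

```python
import math

p = (1.0, 2.0, 2.0)
norm_p = math.sqrt(sum(c * c for c in p))   # |p| = 3
near = tuple(c / norm_p for c in p)         # (1/3, 2/3, 2/3), the closest point
far = tuple(-c / norm_p for c in p)         # (-1/3, -2/3, -2/3), the farthest point

def f(q):
    """Squared distance from q to p."""
    return sum((qi - pi) ** 2 for qi, pi in zip(q, p))

print(f(near), f(far))   # approximately 4 and 16, matching the values above
```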
3.3    Classification of Local Extreme Points-Equality Constraints
To classify a local extreme point x* in the case of an unconstrained optimization problem, we compared the values f(x* + h) with f(x*) using Taylor's formula and the fact that ∇f(x*) = 0. In the constrained case, we also need to make this comparison, but we have to take into account the presence of the constraints. The Lagrangian function links the values of f to those of g. Therefore, we will apply Taylor's formula to compare the values L(x* + h, λ*) with L(x*, λ*), using the fact that ∇L(x*, λ*) = 0. More precisely, we establish a second derivative test under specific assumptions.

Consider the optimization problem with equality constraints,

local max (min) f(x)    subject to    g(x) = c,

where

g(x) = ⟨g₁(x), ..., g_m(x)⟩,    c = ⟨c₁, ..., c_m⟩    (m < n).

The associated Lagrangian is

L(x, λ) = f(x) − λ₁(g₁(x) − c₁) − λ₂(g₂(x) − c₂) − ... − λ_m(g_m(x) − c_m).
Theorem 3.3.1 (Sufficient conditions for a strict local constrained extreme point)

Let f and g = (g₁, ..., g_m) be C² functions in a neighborhood of x* in Rⁿ such that:

g(x*) = c,    rank(g′(x*)) = m,    ∇L(x*, λ*) = 0 for a unique vector λ* = ⟨λ₁*, ..., λ_m*⟩.

Then

(i) (−1)^m B_r(x*) > 0  ∀r = m + 1, ..., n    ⟹    x* is a strict local minimum point;

(ii) (−1)^r B_r(x*) > 0  ∀r = m + 1, ..., n    ⟹    x* is a strict local maximum point.

For r = m + 1, ..., n, B_r(x*) is the bordered Hessian determinant defined by

$$B_r(x^*) = \begin{vmatrix}
0 & \cdots & 0 & \frac{\partial g_1}{\partial x_1}(x^*) & \cdots & \frac{\partial g_1}{\partial x_r}(x^*) \\
\vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & \frac{\partial g_m}{\partial x_1}(x^*) & \cdots & \frac{\partial g_m}{\partial x_r}(x^*) \\
\frac{\partial g_1}{\partial x_1}(x^*) & \cdots & \frac{\partial g_m}{\partial x_1}(x^*) & L_{x_1 x_1}(x^*, \lambda^*) & \cdots & L_{x_1 x_r}(x^*, \lambda^*) \\
\vdots & & \vdots & \vdots & \ddots & \vdots \\
\frac{\partial g_1}{\partial x_r}(x^*) & \cdots & \frac{\partial g_m}{\partial x_r}(x^*) & L_{x_r x_1}(x^*, \lambda^*) & \cdots & L_{x_r x_r}(x^*, \lambda^*)
\end{vmatrix}.$$

The variables are renumbered in order to make the first m columns of the matrix g′(x*) linearly independent.
Remark 3.3.1 If we introduce the notations:

$$Q(h) = Q(h_1, \ldots, h_n) = \sum_{i=1}^{n}\sum_{j=1}^{n} L_{x_i x_j}(x^*, \lambda^*)\, h_i h_j,$$

the (m + n) × (m + n) bordered matrix

$$\begin{bmatrix} 0_{m \times m} & g'(x^*) \\ {}^t g'(x^*) & [L_{x_i x_j}(x^*, \lambda^*)]_{n \times n} \end{bmatrix},$$

and

M = {h ∈ Rⁿ : g′(x*).h = 0},

the theorem says that

Q(h) > 0  ∀h ∈ M, h ≠ 0    ⟹    x* is a strict local minimum,
Q(h) < 0  ∀h ∈ M, h ≠ 0    ⟹    x* is a strict local maximum.

It suffices then to study the positive (negative) definiteness of the quadratic form on the tangent plane M to the constraint g = c at the point x* (see the reminder at the end of this section).
Before proving the theorem, we will see its application through some examples.
Example 1. Consider the problem

local max f(x, y) = xy    subject to    g(x, y) = x + y = 2,  x ≥ 0,  y ≥ 0.

The Lagrange multiplier method shows that (1, 1) is a regular candidate point. Prove that it is a local maximum of the constrained optimization problem.

Solution: Considering the Lagrangian

L(x, y, λ) = f(x, y) − λ(g(x, y) − 2) = xy − λ(x + y − 2),

we can study the nature of the point (1, 1) using the second derivative test. Here, we have n = 2 and m = 1. The first column vector of g′(1, 1) = ⟨1, 1⟩ is linearly independent, so we keep the matrix g′(1, 1) without renumbering the variables. Then, we have to consider the sign of the bordered Hessian determinant (r = m + 1 = 2 = n):

$$(-1)^2 B_2(1, 1) = \begin{vmatrix} 0 & g_x(1, 1) & g_y(1, 1) \\ g_x(1, 1) & L_{xx}(1, 1, 1) & L_{xy}(1, 1, 1) \\ g_y(1, 1) & L_{xy}(1, 1, 1) & L_{yy}(1, 1, 1) \end{vmatrix} = \begin{vmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{vmatrix} = 2 > 0.$$

We conclude that the point (1, 1) is a local maximum of the problem.

Example 2. Solve the problem

local max f(x, y, z) = xy + yz + xz    subject to    g(x, y, z) = x + y + z = 3.
Solution: Note that f and g are C¹ in R³ and

g′(x, y, z) = i + j + k ≠ 0    ⟹    rank(g′(x, y, z)) = 1.

Thus, any point, interior to the constraint set [g = 3] (see Figure 3.22), is a regular point. Consider the Lagrangian

L(x, y, z, λ) = f(x, y, z) − λ(g(x, y, z) − 3) = xy + yz + xz − λ(x + y + z − 3)

and let us look for its stationary points, solutions of the system

$$\nabla L(x, y, z, \lambda) = \langle 0, 0, 0, 0\rangle \iff \begin{cases} L_x = y + z - \lambda = 0 \\ L_y = x + z - \lambda = 0 \\ L_z = y + x - \lambda = 0 \\ L_\lambda = -(x + y + z - 3) = 0. \end{cases}$$
FIGURE 3.22: The constraint set [g = 3]
From the first three equations, we deduce that λ/2 = x = y = z, which, inserted into the last equation, gives

x = y = z = 1,    λ = 2.

Now, let us study the nature of the point (1, 1, 1). For this we use the second derivative test, since f and g are C² around this point. The first column vector of g′(1, 1, 1) is linearly independent, so we keep the matrix g′(1, 1, 1) without renumbering the variables. As n = 3 and m = 1, we have to consider the signs of the following bordered Hessian determinants, with all derivatives evaluated at (1, 1, 1, 2):

$$(-1)^2 B_2(1, 1, 1) = \begin{vmatrix} 0 & g_x & g_y \\ g_x & L_{xx} & L_{xy} \\ g_y & L_{yx} & L_{yy} \end{vmatrix} = \begin{vmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{vmatrix} = 2 > 0,$$

$$(-1)^3 B_3(1, 1, 1) = -\begin{vmatrix} 0 & g_x & g_y & g_z \\ g_x & L_{xx} & L_{xy} & L_{xz} \\ g_y & L_{yx} & L_{yy} & L_{yz} \\ g_z & L_{zx} & L_{zy} & L_{zz} \end{vmatrix} = -\begin{vmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{vmatrix} = 3 > 0.$$

We conclude that the point (1, 1, 1) is a local maximum of the constrained maximization problem.
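The two bordered determinants can be recomputed with a small cofactor expansion (a sketch, not the book's code):

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

# bordered Hessians of L = xy + yz + xz - lam*(x + y + z - 3) at (1, 1, 1, 2)
B2 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
B3 = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
print(det(B2), det(B3))   # 2 -3, so (-1)^2*B2 = 2 > 0 and (-1)^3*B3 = 3 > 0: local max
```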
Proof. We will prove assertion (i); assertion (ii) can be established similarly. We follow the proof in [25], with more details in the steps involved.

Step 1: Let Ω be a neighborhood of x*. For h ∈ Rⁿ such that x* + h ∈ Ω̊, we have from Taylor's formula, for some τ ∈ (0, 1),

$$L(x^* + h, \lambda^*) = L(x^*, \lambda^*) + \sum_{i=1}^n L_{x_i}(x^*, \lambda^*)\, h_i + \frac{1}{2}\sum_{i=1}^n\sum_{j=1}^n L_{x_i x_j}(x^* + \tau h, \lambda^*)\, h_i h_j.$$

Since x* ∈ Ω̊ and (x*, λ*) is a stationary point of L, we have, in particular,

L_{x_i}(x*, λ*) = 0,    i = 1, ..., n.

Moreover, we have

g₁(x*) − c₁ = g₂(x*) − c₂ = ... = g_m(x*) − c_m = 0,
L(x*, λ*) = f(x*) − λ₁*(g₁(x*) − c₁) − ... − λ_m*(g_m(x*) − c_m) = f(x*),
L(x* + h, λ*) = f(x* + h) − λ₁*(g₁(x* + h) − c₁) − ... − λ_m*(g_m(x* + h) − c_m),

from which we deduce

$$f(x^* + h) - f(x^*) = \sum_{k=1}^m \lambda_k^*\,[g_k(x^* + h) - c_k] + \frac{1}{2}\sum_{i=1}^n\sum_{j=1}^n L_{x_i x_j}(x^* + \tau h, \lambda^*)\, h_i h_j.$$

Using Taylor's formula for each g_k, k = 1, ..., m, we obtain

$$g_k(x^* + h) - c_k = g_k(x^* + h) - g_k(x^*) = \sum_{j=1}^n \frac{\partial g_k}{\partial x_j}(x^* + \tau_k h)\, h_j, \qquad \tau_k \in (0, 1).$$
Step 2: Now consider the (m + n) × (m + n) bordered Hessian matrix

$$B(x^0,x^1,\dots,x^m)=\begin{bmatrix} 0 & G(x^1,\dots,x^m)\\ {}^tG(x^1,\dots,x^m) & H_{L(\cdot,\lambda^*)}(x^0)\end{bmatrix}$$
Introduction to the Theory of Optimization in Euclidean Space
where x⁰, x¹, …, xᵐ are arbitrary vectors in some open ball around x∗,

$$G(x^1,\dots,x^m)=\Big(\frac{\partial g_i}{\partial x_j}(x^i)\Big)_{m\times n}=\begin{bmatrix}\dfrac{\partial g_1}{\partial x_1}(x^1)&\dots&\dfrac{\partial g_1}{\partial x_n}(x^1)\\ \vdots&\ddots&\vdots\\ \dfrac{\partial g_m}{\partial x_1}(x^m)&\dots&\dfrac{\partial g_m}{\partial x_n}(x^m)\end{bmatrix}$$

and H_{L(·,λ∗)}(x⁰) is the Hessian matrix of L with respect to x evaluated at x⁰.

For r = m + 1, …, n, let det B_r(x⁰, x¹, …, xᵐ) be the (m + r) × (m + r) leading principal minor of the matrix B(x⁰, x¹, …, xᵐ). Suppose that (−1)^m B_r(x∗) > 0 for all r = m + 1, …, n. Then, by continuity of the second-order partial derivatives of f and g, and since det B_r(x∗, x∗, …, x∗) = B_r(x∗), there exists ρ > 0 such that, for all r = m + 1, …, n,

$$(-1)^m\det B_r(x^0,x^1,\dots,x^m)>0\qquad \forall\, x^0,x^1,\dots,x^m\in B_\rho(x^*).$$

As a consequence, for x⁰, x¹, …, xᵐ ∈ B_ρ(x∗), the quadratic form

$$Q(t)=Q(t_1,\dots,t_n)=\sum_{i=1}^n\sum_{j=1}^n L_{x_ix_j}(x^0,\lambda^*)\,t_it_j,$$

with associated symmetric matrix (L_{x_i x_j}(x⁰))_{n×n}, is positive definite subject to the constraints

$$G(x^1,\dots,x^m)\,t=0\iff \sum_{j=1}^n\frac{\partial g_k}{\partial x_j}(x^k)\,t_j=0\qquad k=1,\dots,m.$$
Step 3: Because τ, τ_k ∈ (0, 1), we have, for x∗ + h ∈ B_ρ(x∗),

$$x^0=x^*+\tau h,\qquad x^1=x^*+\tau_1 h,\ \dots,\ x^m=x^*+\tau_m h\in B_\rho(x^*).$$

Then

$$\sum_{i=1}^n\sum_{j=1}^n L_{x_ix_j}(x^*+\tau h,\lambda^*)\,t_it_j>0\qquad\forall\,t\neq0\ \text{such that}\ \sum_{j=1}^n\frac{\partial g_k}{\partial x_j}(x^*+\tau_k h)\,t_j=0,\quad k=1,\dots,m.$$

In particular, for t = h such that

$$\sum_{j=1}^n\frac{\partial g_k}{\partial x_j}(x^*+\tau_k h)\,h_j=0\qquad k=1,\dots,m,\tag{1}$$

we have

$$f(x^*+h)-f(x^*)=\frac12\sum_{i=1}^n\sum_{j=1}^n L_{x_ix_j}(x^*+\tau h,\lambda^*)\,h_ih_j>0.\tag{2}$$

This shows that the stationary point x∗ is a strict local minimum point for f subject to the constraint g(x) = c in particular directions.

Step 4: Suppose that x∗ is not a strict relative minimum point. Then there exists a sequence of points y_l satisfying

$$y_l\longrightarrow x^*,\qquad g(y_l)=c,\qquad f(y_l)\le f(x^*).$$

Write each y_l in the form

$$y_l=x^*+\delta_l s_l\neq0,\qquad s_l\in\mathbb R^n,\quad \|s_l\|=1,\quad \delta_l>0\quad\forall l.$$

Note that we have

$$\delta_l=\|\delta_l s_l\|=\|y_l-x^*\|\longrightarrow 0.$$

Hence, there exists l₀ > 1 such that for all l ≥ l₀, y_l ∈ B_ρ(x∗). Choose in Steps 1 and 3, h = δ_l s_l = y_l − x∗. Then

$$g(x^*+h)-g(x^*)=g(y_l)-g(x^*)=c-c=0$$
$$g_k(x^*+h)-g_k(x^*)=\sum_{j=1}^n\frac{\partial g_k}{\partial x_j}(x^*+\tau_k h)\,h_j=0,\qquad \tau_k\in(0,1),\quad k=1,\dots,m,$$

and we should have, from (1) and (2),

$$0\ge f(y_l)-f(x^*)=f(x^*+h)-f(x^*)=\frac12\sum_{i=1}^n\sum_{j=1}^n L_{x_ix_j}(x^*+\tau h,\lambda^*)\,h_ih_j>0,$$

which is a contradiction. ∎
Theorem 3.3.2 Necessary conditions for local extreme points
Let f and g = (g₁, …, g_m) be C² functions in a neighborhood of x∗ in ℝⁿ such that

$$g(x^*)=c,\qquad \operatorname{rank}(g'(x^*))=m,\qquad \nabla L(x^*,\lambda^*)=0\ \text{ for a unique vector }\ \lambda^*=(\lambda_1^*,\dots,\lambda_m^*).$$

Then:

(i) x∗ is a local minimum point ⟹ H_L = (L_{x_ix_j}(x∗, λ∗))_{n×n} is positive semidefinite on M: ᵗy H_L y ≥ 0 for all y ∈ M;

(ii) x∗ is a local maximum point ⟹ H_L = (L_{x_ix_j}(x∗, λ∗))_{n×n} is negative semidefinite on M: ᵗy H_L y ≤ 0 for all y ∈ M;

where M = {h ∈ ℝⁿ : g′(x∗).h = 0} is the tangent plane to the surface g(x) = c at the point x∗.
Proof. We prove (i); (ii) can be established similarly. Let x(t) be a twice differentiable curve on the constraint surface g(x) = c with x(0) = x∗, and set f̂(t) = f(x(t)). Suppose that x∗ is a local minimum point for f subject to the constraint g(x) = c. Then there exists r > 0 such that

$$\hat f(0)=f(x^*)\le f(x(t))=\hat f(t)\qquad\forall t\in(-r,r).$$

So f̂ is a one-variable function that has an interior minimum at t = 0. Consequently, it satisfies f̂′(0) = 0 and f̂″(0) ≥ 0, or equivalently

$$\nabla f(x^*).x'(0)=0\qquad\text{and}\qquad \frac{d^2}{dt^2}f(x(t))\Big|_{t=0}\ge0.$$

We have

$$\frac{d^2}{dt^2}f(x(t))={}^tx'(t)\,H_f(x(t))\,x'(t)+\nabla f(x(t)).x''(t)$$
$$\Longrightarrow\qquad \frac{d^2}{dt^2}f(x(t))\Big|_{t=0}={}^tx'(0)\,H_f(x^*)\,x'(0)+\nabla f(x^*).x''(0).$$

Moreover, differentiating the relation g(x(t)) = c twice, we obtain

$${}^tx'(t)\,H_g(x(t))\,x'(t)+\nabla g(x(t)).x''(t)=0\quad\Longrightarrow\quad {}^tx'(0)\,H_g(x^*)\,x'(0)+\nabla g(x^*).x''(0)=0.$$

Hence

$$0\le \frac{d^2}{dt^2}f(x(t))\Big|_{t=0}=\big[{}^tx'(0)H_f(x^*)x'(0)+\nabla f(x^*).x''(0)\big]-{}^t\lambda^*\big[{}^tx'(0)H_g(x^*)x'(0)+\nabla g(x^*).x''(0)\big]$$
$$={}^tx'(0)\big[H_f(x^*)-{}^t\lambda^*H_g(x^*)\big]x'(0)+\big[\nabla f(x^*)-{}^t\lambda^*\nabla g(x^*)\big].x''(0)={}^tx'(0)\big[H_L(x^*)\big]x'(0)$$

since ∇f(x∗) − ᵗλ∗∇g(x∗) = 0, and the result follows since x′(0) is an arbitrary element of M. ∎
Quadratic Forms with Linear Constraints. Consider the symmetric quadratic form in n variables

$$Q(h)=\sum_{i=1}^n\sum_{j=1}^n a_{ij}h_ih_j\qquad(a_{ij}=a_{ji})$$

subject to m linear homogeneous constraints

$$b_{11}h_1+\dots+b_{1n}h_n=0,\qquad\dots\qquad b_{m1}h_1+\dots+b_{mn}h_n=0.$$

Set

$$A=\begin{bmatrix}a_{11}&\dots&a_{1n}\\\vdots&\ddots&\vdots\\a_{n1}&\dots&a_{nn}\end{bmatrix}\qquad B=\begin{bmatrix}b_{11}&\dots&b_{1n}\\\vdots&\ddots&\vdots\\b_{m1}&\dots&b_{mn}\end{bmatrix}\qquad h=\begin{bmatrix}h_1\\\vdots\\h_n\end{bmatrix}.$$

Definition. Q(h) = ᵗhAh is positive (resp. negative) definite subject to the linear constraints Bh = 0 if Q(h) > 0 (resp. < 0) for all h ≠ 0 that satisfy Bh = 0. We have the following necessary and sufficient condition for a quadratic form Q to be positive (resp. negative) definite subject to linear constraints.
Theorem: Assume the first m columns of the matrix B = (b_{ij}) are linearly independent. Then

Q is positive definite subject to the constraints Bh = 0 ⟺ (−1)^m B_r > 0, r = m + 1, …, n;

Q is negative definite subject to the constraints Bh = 0 ⟺ (−1)^r B_r > 0, r = m + 1, …, n;

where B_r are the symmetric determinants

$$B_r=\begin{vmatrix} 0&\cdots&0&b_{11}&\cdots&b_{1r}\\ \vdots&\ddots&\vdots&\vdots&&\vdots\\ 0&\cdots&0&b_{m1}&\cdots&b_{mr}\\ b_{11}&\cdots&b_{m1}&a_{11}&\cdots&a_{1r}\\ \vdots&&\vdots&\vdots&\ddots&\vdots\\ b_{1r}&\cdots&b_{mr}&a_{r1}&\cdots&a_{rr}\end{vmatrix}\qquad\text{for }\ r=m+1,\dots,n.$$
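The sign test in this theorem is mechanical enough to implement directly. Below is a sketch in Python with NumPy; `constrained_definiteness` is an illustrative helper name, and, as in the theorem, it presumes the first m columns of B are linearly independent:

```python
import numpy as np

def constrained_definiteness(A, B):
    """Classify Q(h) = h^T A h subject to B h = 0 using the bordered
    determinants B_r of the theorem (first m columns of B independent)."""
    m, n = B.shape
    dets = []
    for r in range(m + 1, n + 1):
        Br = np.block([[np.zeros((m, m)), B[:, :r]],
                       [B[:, :r].T, A[:r, :r]]])
        dets.append(np.linalg.det(Br))
    if all((-1) ** m * d > 0 for d in dets):
        return "positive definite on Bh = 0"
    if all((-1) ** (m + i) * d > 0 for i, d in enumerate(dets, start=1)):
        return "negative definite on Bh = 0"   # here the factor is (-1)^r, r = m + i
    return "test inconclusive"

# Solved Problem 1 below, point (1, 0) with lambda = 1: A = H_L, B = g'(1, 0)
A = np.array([[0.0, 0.0], [0.0, 2.0]])
B = np.array([[2.0, 0.0]])
print(constrained_definiteness(A, B))  # positive definite on Bh = 0
```

For the points (0, ±1) of that problem, the variables must be renumbered first, exactly as in the worked solution, so that the first column of B is nonzero.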
Solved Problems
1. – Consider the problem max(min) f (x, y) = x2 + 2y 2
subject to
g(x, y) = x2 + y 2 = 1.
i) Find the four points that satisfy the first-order conditions. ii) Classify them by using the second derivative test. iii) Graph some level curves of f and the graph of g = 1. Explain where the extreme points occur.
Solution: i) First, each of the optimization problems has a solution by the extreme-value theorem; see Figure 3.23. Indeed, f is continuous on the unit circle S = {(x, y) : g(x, y) = 1}, which is a closed and bounded subset of ℝ².

FIGURE 3.23: Graph of f on the set [x² + y² ≤ 1]
Next, the functions f and g are C¹ in ℝ² and any point on the unit circle is regular since, for each (x, y) ∈ S, we have

$$g'(x,y)=(2x,2y)\neq(0,0)\quad\Longrightarrow\quad \operatorname{rank}(g'(x,y))=1.$$

Thus, if we introduce the Lagrangian

$$L(x,y,\lambda)=f(x,y)-\lambda(g(x,y)-1)=x^2+2y^2-\lambda(x^2+y^2-1),$$

then, by applying the Lagrange multiplier method, the interior extreme point candidates are solutions of the system

$$\nabla L(x,y,\lambda)=(0,0,0)\iff\begin{cases}L_x=2x-\lambda(2x)=0\\ L_y=4y-\lambda(2y)=0\\ L_\lambda=-(x^2+y^2-1)=0\end{cases}\iff\begin{cases}x=0\ \text{or}\ \lambda=1\\ y=0\ \text{or}\ \lambda=2\\ x^2+y^2-1=0.\end{cases}$$

We cannot have x = y = 0 since then the constraint is not satisfied. If x = 0, then λ = 2 and we deduce from the third equation y = ±1. If y = 0, then λ = 1 and we get x = ±1. So the four points that satisfy the necessary conditions are

$$(1,0)\qquad(-1,0)\qquad(0,1)\qquad(0,-1).$$

ii) Now, because f and g are C², we may study the nature of the four points by using the second derivative test. Here n = 2 and m = 1, so we have to consider the sign of the bordered Hessian determinant B₂ at each point.

Nature of the points (±1, 0), where λ = 1: First, we have

$$g'(x,y)=(2x,2y),\qquad g'(\pm1,0)=(\pm2,0),\qquad \operatorname{rank}(g'(\pm1,0))=1,$$

and the first column vector of g′(±1, 0) is linearly independent. We have

$$B_2(x,y)=\begin{vmatrix}0&g_x&g_y\\ g_x&L_{xx}&L_{xy}\\ g_y&L_{xy}&L_{yy}\end{vmatrix}=\begin{vmatrix}0&2x&2y\\ 2x&2-2\lambda&0\\ 2y&0&4-2\lambda\end{vmatrix}$$

$$B_2(1,0)=\begin{vmatrix}0&2&0\\2&0&0\\0&0&2\end{vmatrix}=-8\qquad B_2(-1,0)=\begin{vmatrix}0&-2&0\\-2&0&0\\0&0&2\end{vmatrix}=-8.$$

For m = 1, we have

$$(-1)^1B_2(1,0)=8>0\qquad (-1)^1B_2(-1,0)=8>0$$

and the points (±1, 0) are local minima.
Nature of the points (0, ±1), where λ = 2: We have

$$g'(x,y)=(2x,2y),\qquad g'(0,\pm1)=(0,\pm2)\quad\Longrightarrow\quad \operatorname{rank}(g'(0,\pm1))=1.$$

Note that the first column vector of g′(0, ±1) is the zero vector and the second column vector is linearly independent. So we renumber the variables so that the second column vector of g′(0, ±1) is in the first position. Hence B₂ will be written as

$$B_2(x,y)=\begin{vmatrix}0&g_y&g_x\\ g_y&L_{yy}&L_{yx}\\ g_x&L_{xy}&L_{xx}\end{vmatrix}=\begin{vmatrix}0&2y&2x\\ 2y&4-2\lambda&0\\ 2x&0&2-2\lambda\end{vmatrix}$$

$$B_2(0,1)=\begin{vmatrix}0&2&0\\2&0&0\\0&0&-2\end{vmatrix}=8\qquad B_2(0,-1)=\begin{vmatrix}0&-2&0\\-2&0&0\\0&0&-2\end{vmatrix}=8.$$

For r = m + 1 = 2 = n, we have

$$(-1)^2B_2(0,1)=8>0\qquad (-1)^2B_2(0,-1)=8>0$$

and the points (0, ±1) are local maxima.
FIGURE 3.24: Level curves f = 1 and f = 2 are tangent to the constraint g=1
iii) Conclusion: We have

$$f(\pm1,0)=1\qquad f(0,\pm1)=2.$$

Subject to the constraint g(x, y) = 1, f attains its maximum value 2 at the points (0, ±1) and its minimum value 1 at the points (±1, 0). At these points,
the level curves x2 + 2y 2 = 1, x2 + 2y 2 = 2 and the constraint x2 + y 2 = 1, sketched in Figure 3.24, are tangent.
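On the unit circle the objective reduces to f(cos t, sin t) = cos²t + 2 sin²t = 1 + sin²t, which makes the conclusion easy to confirm numerically (a quick sketch in Python with NumPy):

```python
import numpy as np

# f(x, y) = x^2 + 2 y^2 on the unit circle x = cos t, y = sin t
t = np.linspace(0.0, 2.0 * np.pi, 100001)
f = np.cos(t) ** 2 + 2.0 * np.sin(t) ** 2   # equals 1 + sin^2 t

print(round(f.min(), 6), round(f.max(), 6))  # 1.0 2.0
```

The minimum 1 is attained at t = 0, π (the points (±1, 0)) and the maximum 2 at t = π/2, 3π/2 (the points (0, ±1)), as found with the multipliers.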
2. – Consider the problem

$$\min f(x,y,z)=(x-x_0)^2+(y-y_0)^2+(z-z_0)^2\qquad\text{subject to}\qquad g(x,y,z)=ax+by+cz+d=0$$

for (x₀, y₀, z₀) ∈ ℝ³, d ∈ ℝ and (a, b, c) ≠ (0, 0, 0).
i) Find the points that satisfy the first-order conditions.
ii) Show that the second-order conditions for a local minimum are satisfied.
iii) Give a geometric argument for the existence of a minimum solution.
iv) Does the maximization problem have any solution?
v) Solve

$$\min\ x^2+y^2+z^2\qquad\text{subject to}\qquad x+y+z=1.$$
Solution: i) Note that f and g are C¹ in ℝ³. In particular, each point of [g = 0] is a relatively interior and regular point since we have

$$g'(x,y,z)=a\mathbf i+b\mathbf j+c\mathbf k\neq0\quad\Longrightarrow\quad\operatorname{rank}(g'(x,y,z))=1.$$

So, by applying the Lagrange multiplier method, we will look for the candidate extreme points as stationary points of the Lagrangian

$$L(x,y,z,\lambda)=(x-x_0)^2+(y-y_0)^2+(z-z_0)^2-\lambda(ax+by+cz+d).$$

These points are solutions of the system

$$\nabla L(x,y,z,\lambda)=(0,0,0,0)\iff\begin{cases}L_x=2(x-x_0)-\lambda a=0\\ L_y=2(y-y_0)-\lambda b=0\\ L_z=2(z-z_0)-\lambda c=0\\ L_\lambda=-(ax+by+cz+d)=0\end{cases}$$

from which we deduce

$$x=\frac\lambda2 a+x_0\qquad y=\frac\lambda2 b+y_0\qquad z=\frac\lambda2 c+z_0$$
$$a\Big(\frac\lambda2 a+x_0\Big)+b\Big(\frac\lambda2 b+y_0\Big)+c\Big(\frac\lambda2 c+z_0\Big)+d=0$$

and that

$$\frac\lambda2=\frac{\lambda^*}2=-\frac{ax_0+by_0+cz_0+d}{a^2+b^2+c^2}.$$

Thus, we have only one critical point, denoted (x∗, y∗, z∗), with λ = λ∗.

ii) First, note that g′(x∗, y∗, z∗) = (a, b, c) ≠ (0, 0, 0) and discuss:

Case a ≠ 0. The first column vector of g′(x∗, y∗, z∗) is linearly independent, and because n = 3 and m = 1, we have to consider the signs of the following bordered Hessian determinants (the partial derivatives of g are taken at (x∗, y∗, z∗) and those of L at (x∗, y∗, z∗, λ∗)):

$$B_2(x^*,y^*,z^*)=\begin{vmatrix}0&g_x&g_y\\ g_x&L_{xx}&L_{xy}\\ g_y&L_{xy}&L_{yy}\end{vmatrix}=\begin{vmatrix}0&a&b\\ a&2&0\\ b&0&2\end{vmatrix}=-2(a^2+b^2)<0,$$

$$B_3=\begin{vmatrix}0&g_x&g_y&g_z\\ g_x&L_{xx}&L_{xy}&L_{xz}\\ g_y&L_{yx}&L_{yy}&L_{yz}\\ g_z&L_{zx}&L_{zy}&L_{zz}\end{vmatrix}=\begin{vmatrix}0&a&b&c\\ a&2&0&0\\ b&0&2&0\\ c&0&0&2\end{vmatrix}=-4(a^2+b^2+c^2)<0.$$
Case a = 0 and b ≠ 0. The first column vector of g′(x∗, y∗, z∗) is the zero vector and the second is linearly independent. We renumber the variables in the order y, x, z and obtain

$$B_2=\begin{vmatrix}0&b&a\\ b&2&0\\ a&0&2\end{vmatrix}=-2(a^2+b^2)\qquad B_3=\begin{vmatrix}0&b&a&c\\ b&2&0&0\\ a&0&2&0\\ c&0&0&2\end{vmatrix}=-4(a^2+b^2+c^2).$$
Case a = 0, b = 0, and c ≠ 0. The first and second column vectors of g′(x∗, y∗, z∗) are zero and the third is linearly independent. We renumber the variables in the order z, x, y and obtain

$$B_2=\begin{vmatrix}0&c&a\\ c&2&0\\ a&0&2\end{vmatrix}=-2(a^2+c^2)\qquad B_3=\begin{vmatrix}0&c&a&b\\ c&2&0&0\\ a&0&2&0\\ b&0&0&2\end{vmatrix}=-4(a^2+b^2+c^2).$$

Conclusion. In each case, we have, with m = 1,

$$(-1)^mB_2(x^*,y^*,z^*)>0\qquad (-1)^mB_3(x^*,y^*,z^*)=4(a^2+b^2+c^2)>0.$$
We conclude that the point (x∗, y∗, z∗) is a local minimum of the constrained minimization problem.

iii) Geometric interpretation of the minimization problem: If M(x, y, z), M₀(x₀, y₀, z₀) ∈ ℝ³, then

$$f(x,y,z)=(x-x_0)^2+(y-y_0)^2+(z-z_0)^2=\|M_0M\|^2$$

is the square of the distance from the point M to the point M₀. The constraint surface g(x, y, z) = ax + by + cz + d = 0 is the plane with normal ⟨a, b, c⟩. The minimization problem consists in finding a point M in the plane that is located at the shortest distance from M₀. Such a point exists and is obtained by considering the intersection of the plane with the line passing through the point M₀ and perpendicular to the plane. A direction of this line is given by the normal to the plane ⟨a, b, c⟩. Therefore, parametric equations of the line are

$$x=x_0+ta\qquad y=y_0+tb\qquad z=z_0+tc\qquad t\in\mathbb R.$$

Clearly, the intersection of the line with the plane gives

$$a(x_0+ta)+b(y_0+tb)+c(z_0+tc)+d=0\iff t=-\frac{ax_0+by_0+cz_0+d}{a^2+b^2+c^2}=\frac{\lambda^*}2.$$

f takes its minimum value

$$f\Big(\frac{\lambda^*}2a+x_0,\ \frac{\lambda^*}2b+y_0,\ \frac{\lambda^*}2c+z_0\Big)=\Big(\frac{\lambda^*}2a\Big)^2+\Big(\frac{\lambda^*}2b\Big)^2+\Big(\frac{\lambda^*}2c\Big)^2=\frac{\lambda^{*2}}4(a^2+b^2+c^2).$$
The shortest distance from M₀ to the plane g = 0 is

$$D=\sqrt{\frac{\lambda^{*2}}4(a^2+b^2+c^2)}=\frac{|\lambda^*|}2\sqrt{a^2+b^2+c^2}=\frac{|ax_0+by_0+cz_0+d|}{\sqrt{a^2+b^2+c^2}}.$$

iv) The maximization problem doesn't have a solution: suppose that there exists a solution (x_m, y_m, z_m) to the maximization problem, and pick a nonzero vector (u, v, w) orthogonal to the normal (a, b, c). Then the points (x_m + tu, y_m + tv, z_m + tw), t ∈ ℝ, are located in the plane and satisfy

$$f(x_m+tu,\,y_m+tv,\,z_m+tw)\longrightarrow+\infty\qquad\text{as }\ t\longrightarrow+\infty,$$

contradicting the maximality of (x_m, y_m, z_m).

v) From the previous study, choose (a, b, c) = (1, 1, 1), d = −1, (x₀, y₀, z₀) = (0, 0, 0). Then

$$\frac\lambda2=\frac13\qquad\text{and}\qquad (x^*,y^*,z^*)=\Big(\frac13,\frac13,\frac13\Big).$$

We conclude that the point (1/3, 1/3, 1/3) is a local minimum of the constrained minimization problem. At this point, the two level surfaces

$$x^2+y^2+z^2=\frac13=f\Big(\frac13,\frac13,\frac13\Big)\qquad\text{and}\qquad x+y+z=1$$

are tangent, as described in Figure 3.25.
FIGURE 3.25: The level surface and the plane are tangent
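The closed-form minimizer and distance of Problem 2 can be spot-checked numerically. The sketch below (Python with NumPy; the plane and point are arbitrary illustrative choices) verifies that x∗ lies on the plane, that its distance to M₀ matches the formula for D, and that no sampled point of the plane does better:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c, d = 1.0, -2.0, 2.0, 5.0          # plane a x + b y + c z + d = 0
x0 = np.array([3.0, 1.0, -4.0])           # the point M0
n = np.array([a, b, c])

half_lam = -(n @ x0 + d) / (n @ n)        # lambda*/2 from the first-order system
xstar = x0 + half_lam * n                 # the unique critical point
D = abs(n @ x0 + d) / np.linalg.norm(n)   # closed-form shortest distance

assert abs(n @ xstar + d) < 1e-12                      # x* lies on the plane
assert abs(np.linalg.norm(xstar - x0) - D) < 1e-12     # its distance equals D

# brute-force comparison over random points projected onto the plane
pts = rng.normal(size=(10000, 3))
pts -= ((pts @ n + d) / (n @ n))[:, None] * n
assert np.linalg.norm(pts - x0, axis=1).min() >= D - 1e-9
print(round(D, 4))  # 0.6667
```

For this plane, D = |3 − 2 − 8 + 5| / 3 = 2/3, in agreement with the formula derived above.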
184
Introduction to the Theory of Optimization in Euclidean Space
3. – The planes x + y + z = 3 and x − y = 2 intersect in a straight line. Find the point on that line that is closest to the origin.
Solution: i) We formulate the problem as follows:

$$\min f(x,y,z)=x^2+y^2+z^2\qquad\text{subject to}\qquad\begin{cases}g_1(x,y,z)=x+y+z=3\\ g_2(x,y,z)=x-y=2.\end{cases}$$

Note that f, g₁ and g₂ are C¹ in ℝ³ and any point of the constraint set, sketched in Figure 3.26 and defined by g = (g₁, g₂) = (3, 2), is an interior point and regular since we have

$$g'(x,y,z)=\begin{bmatrix}1&1&1\\ 1&-1&0\end{bmatrix}\qquad\operatorname{rank}(g'(x,y,z))=2.$$
FIGURE 3.26: The constraints, the origin and the minimum point
Consider the Lagrangian L(x, y, z, λ1 , λ2 ) = f (x, y, z) − λ1 (g1 (x, y, z) − 3) − λ2 (g2 (x, y, z) − 2) = x2 + y 2 + z 2 − λ1 (x + y + z − 3) − λ2 (x − y − 2)
and look for the stationary points, solutions of ∇L(x, y, z, λ₁, λ₂) = 0_{ℝ⁵}:

$$\begin{cases}(1)\ L_x=2x-\lambda_1-\lambda_2=0\\ (2)\ L_y=2y-\lambda_1+\lambda_2=0\\ (3)\ L_z=2z-\lambda_1=0\\ (4)\ L_{\lambda_1}=-(x+y+z-3)=0\\ (5)\ L_{\lambda_2}=-(x-y-2)=0.\end{cases}$$

From equations (1), (2) and (3), we deduce that

$$x=\tfrac12(\lambda_1+\lambda_2)\qquad y=\tfrac12(\lambda_1-\lambda_2)\qquad z=\tfrac12\lambda_1;$$

then, substituting these values into equations (4) and (5), we obtain

$$\begin{cases}\tfrac12(\lambda_1+\lambda_2)+\tfrac12(\lambda_1-\lambda_2)+\tfrac12\lambda_1=3\\[2pt] \tfrac12(\lambda_1+\lambda_2)-\tfrac12(\lambda_1-\lambda_2)=2\end{cases}\quad\Longrightarrow\quad(\lambda_1,\lambda_2)=(2,2).$$

The only critical point of L is

$$(x^*,y^*,z^*,\lambda_1^*,\lambda_2^*)=(2,0,1,2,2).$$
ii) Note that the first two column vectors of g′(x, y, z) are linearly independent. We can therefore keep the matrix without renumbering the variables, and consider the sign of the following bordered Hessian determinant (n = 3, m = 2, r = m + 1 = 3):

$$B_3(2,0,1)=\begin{vmatrix}0&0&\frac{\partial g_1}{\partial x}&\frac{\partial g_1}{\partial y}&\frac{\partial g_1}{\partial z}\\ 0&0&\frac{\partial g_2}{\partial x}&\frac{\partial g_2}{\partial y}&\frac{\partial g_2}{\partial z}\\ \frac{\partial g_1}{\partial x}&\frac{\partial g_2}{\partial x}&L_{xx}&L_{xy}&L_{xz}\\ \frac{\partial g_1}{\partial y}&\frac{\partial g_2}{\partial y}&L_{yx}&L_{yy}&L_{yz}\\ \frac{\partial g_1}{\partial z}&\frac{\partial g_2}{\partial z}&L_{zx}&L_{zy}&L_{zz}\end{vmatrix}=\begin{vmatrix}0&0&1&1&1\\ 0&0&1&-1&0\\ 1&1&2&0&0\\ 1&-1&0&2&0\\ 1&0&0&0&2\end{vmatrix}=12.$$

We have

$$(-1)^mB_3(2,0,1)=(-1)^2B_3(2,0,1)=12>0.$$
We conclude that the point (2, 0, 1) is a local minimum of the constrained optimization problem.

iii) To show that the point is the global minimum point, we use the following parametrization of the constraint set (see Figure 3.26):

$$x=t+2,\qquad y=t,\qquad z=1-2t,\qquad t\in\mathbb R.$$

So the optimization problem reduces to

$$\min_{t\in\mathbb R}F(t)=f(t+2,t,1-2t)=(t+2)^2+t^2+(2t-1)^2.$$

We have

$$F'(t)=2(t+2)+2t+2(2t-1)(2)=12t=0\iff t=0\qquad\text{and}\qquad F''(t)=12>0\quad\forall t\in\mathbb R.$$

Hence t = 0 is a global minimum for F; that is, the point (2, 0, 1) is the solution to the minimization problem.

In Section 3.4, we will see, using the convexity of the Lagrangian in (x, y, z) when (λ₁, λ₂) = (2, 2), that the local minimum point (2, 0, 1) is the global minimum point and therefore solves the problem. The advantage of arguing in this way is that it spares us from exploring the geometry of the constraint set.
3.4 Global Extreme Points-Equality Constraints
The following theorem gives sufficient conditions for a critical point of the Lagrangian to be a global extreme point for the associated constrained optimization problem.
Theorem 3.4.1 Let Ω ⊂ ℝⁿ be an open set, f, g₁, …, g_m : Ω → ℝ be C¹ functions, S ⊂ Ω be convex, x∗ an interior point of S, and L be the Lagrangian

$$L(x,\lambda)=f(x)-\lambda_1(g_1(x)-c_1)-\dots-\lambda_m(g_m(x)-c_m).$$

Then we have

$$\left.\begin{array}{l}\exists\,\lambda^*=(\lambda_1^*,\dots,\lambda_m^*):\ \nabla_{x,\lambda}L(x^*,\lambda^*)=0\\[3pt] L(\cdot,\lambda^*)\ \text{is concave (resp. convex) in }x\in S\end{array}\right\}\ \Longrightarrow\ f(x^*)=\max_{\{x\in S:\ g(x)=c\}}f(x)\quad(\text{resp. }\min).$$

Proof. Suppose that the Lagrangian L(·, λ∗) is concave in x and that

$$\frac{\partial L}{\partial x_i}(x^*,\lambda^*)=\frac{\partial f}{\partial x_i}(x^*)-\sum_{j=1}^m\lambda_j^*\frac{\partial g_j}{\partial x_i}(x^*)=0\qquad i=1,\dots,n;$$

then x∗ is a stationary point of L(·, λ∗). Therefore x∗ is a global maximum of L(·, λ∗) in S (by Theorem 2.3.4) and we have, for all x ∈ S,

$$L(x^*,\lambda^*)=f(x^*)-\lambda_1^*(g_1(x^*)-c_1)-\dots-\lambda_m^*(g_m(x^*)-c_m)$$
$$\ge f(x)-\lambda_1^*(g_1(x)-c_1)-\dots-\lambda_m^*(g_m(x)-c_m)=L(x,\lambda^*).$$

Since we have

$$\frac{\partial L}{\partial\lambda_j}(x^*,\lambda^*)=-(g_j(x^*)-c_j)=0\qquad j=1,\dots,m,$$

then

$$g_1(x^*)-c_1=g_2(x^*)-c_2=\dots=g_m(x^*)-c_m=0.$$
So the previous inequality reduces to

$$f(x^*)\ge f(x)-\lambda_1^*(g_1(x)-c_1)-\dots-\lambda_m^*(g_m(x)-c_m)\qquad\forall x\in S.$$

In particular, we have

$$f(x^*)\ge f(x)\qquad\forall x\in\{x\in S:\ g(x)=c\}.$$
Thus x∗ solves the constrained maximization problem. The minimization case can be established similarly.
Remark 3.4.1 * Note that there is no regularity assumption on the point x∗ in the theorem. The proof uses the characterization of a C¹ concave/convex function on a convex set. ** The concavity/convexity hypothesis is a sufficient condition: we may have a global extreme point with a Lagrangian that is neither concave nor convex (see Example 3).
Example 1. Economy. If the cost of capital K and labor L is r and w dollars per unit respectively, find the values of K and L that minimize the cost of producing the output Q = cKᵃLᵇ, where c, a and b are positive parameters satisfying a + b < 1.

Solution: The inputs K and L minimizing the cost must solve the problem

$$\min\ rK+wL\qquad\text{subject to}\qquad cK^aL^b=Q.$$

We look for the extreme points in the set Ω = (0, +∞) × (0, +∞), since K and L must satisfy cKᵃLᵇ = Q. Denote

$$f(K,L)=rK+wL\qquad g(K,L)=cK^aL^b\qquad S=\Omega.$$

Note that f and g are C¹ in the open convex set Ω. Consider the Lagrangian

$$L(K,L,\lambda)=f(K,L)-\lambda(g(K,L)-Q)=rK+wL-\lambda(cK^aL^b-Q)$$

and Lagrange's necessary conditions

$$\nabla L(K,L,\lambda)=(0,0,0)\iff\begin{cases}L_K=r-\lambda caK^{a-1}L^b=0\\ L_L=w-\lambda cbK^aL^{b-1}=0\\ L_\lambda=-(cK^aL^b-Q)=0.\end{cases}$$

Multiplying each side of the first equality by K and each side of the second equality by L, we obtain

$$rK=\lambda caK^aL^b=\lambda aQ\qquad wL=\lambda cbK^aL^b=\lambda bQ;$$

then, using the third equality, we deduce the unique solution of the system

$$K^*=\lambda^*\frac{aQ}r\qquad L^*=\lambda^*\frac{bQ}w\qquad \lambda^*=\Big(\frac Qc\Big)^{\frac1{a+b}}\Big(\frac r{aQ}\Big)^{\frac a{a+b}}\Big(\frac w{bQ}\Big)^{\frac b{a+b}}.$$

Convexity of L in (K, L). The Hessian matrix of L is

$$H_{L(\cdot,\cdot,\lambda^*)}=\begin{bmatrix}-\lambda^*ca(a-1)K^{a-2}L^b&-\lambda^*cabK^{a-1}L^{b-1}\\ -\lambda^*cabK^{a-1}L^{b-1}&-\lambda^*cb(b-1)K^aL^{b-2}\end{bmatrix}.$$

The leading principal minors are

$$D_1(K,L)=-\lambda^*ca(a-1)K^{a-2}L^b>0\qquad\text{since }0<a<a+b<1,$$
$$D_2(K,L)=\begin{vmatrix}-\lambda^*ca(a-1)K^{a-2}L^b&-\lambda^*cabK^{a-1}L^{b-1}\\ -\lambda^*cabK^{a-1}L^{b-1}&-\lambda^*cb(b-1)K^aL^{b-2}\end{vmatrix}=(\lambda^*)^2c^2ab\,K^{2a-2}L^{2b-2}\big(1-(a+b)\big)>0.$$

Hence L(·, ·, λ∗) is strictly convex in (K, L) on Ω, and we conclude that the point (K∗, L∗) is the solution to the constrained minimization problem.

Example 2. Two-constraint problem. Solve the problem

$$\min\ (\max)\ f(x,y,z)=x-z\qquad\text{subject to}\qquad\begin{cases}g_1(x,y,z)=x^2+y^2=1\\ g_2(x,y,z)=x^2+z^2=1.\end{cases}$$

Solution: i) Consider the Lagrangian

$$L(x,y,z,\lambda_1,\lambda_2)=f-\lambda_1(g_1-1)-\lambda_2(g_2-1)=x-z-\lambda_1(x^2+y^2-1)-\lambda_2(x^2+z^2-1)$$
and look for its stationary points, solutions of the system
$$\nabla L(x,y,z,\lambda_1,\lambda_2)=0_{\mathbb R^5}\iff\begin{cases}(1)\ L_x=1-2x\lambda_1-2x\lambda_2=0\\ (2)\ L_y=-2y\lambda_1=0\\ (3)\ L_z=-1-2z\lambda_2=0\\ (4)\ L_{\lambda_1}=-(x^2+y^2-1)=0\\ (5)\ L_{\lambda_2}=-(x^2+z^2-1)=0.\end{cases}$$

From equation (2), we deduce that λ₁ = 0 or y = 0.

∗ If y = 0, then from (4) and (5) we deduce that x = ±1 and z = 0; but then (3) cannot hold.

∗ If λ₁ = 0, then (1) and (3) reduce to

$$1-2x\lambda_2=0\qquad\text{and}\qquad -1-2z\lambda_2=0.$$

Since λ₂ cannot be equal to zero, we deduce that

$$x=-z=\frac1{2\lambda_2}.$$

Inserting x = −z into (5), we obtain

$$2x^2=1\iff x=\pm1/\sqrt2.$$

Then, from (4), we get

$$\frac12+y^2=1\iff y=\pm1/\sqrt2.$$

So, the critical points of L are

$$\Big(\tfrac1{\sqrt2},\pm\tfrac1{\sqrt2},-\tfrac1{\sqrt2},\lambda_1^*,\lambda_2^*\Big)\quad\text{with}\quad(\lambda_1^*,\lambda_2^*)=\Big(0,\tfrac1{\sqrt2}\Big),$$
$$\Big(-\tfrac1{\sqrt2},\pm\tfrac1{\sqrt2},\tfrac1{\sqrt2},\lambda_1^*,\lambda_2^*\Big)\quad\text{with}\quad(\lambda_1^*,\lambda_2^*)=\Big(0,-\tfrac1{\sqrt2}\Big).$$
The values taken by f at these points are

$$f\Big(\tfrac1{\sqrt2},\pm\tfrac1{\sqrt2},-\tfrac1{\sqrt2}\Big)=\sqrt2\qquad f\Big(-\tfrac1{\sqrt2},\pm\tfrac1{\sqrt2},\tfrac1{\sqrt2}\Big)=-\sqrt2.$$

ii) To study the convexity of L in (x, y, z), consider the Hessian matrix

$$H_{L(x,y,z,\lambda_1,\lambda_2)}=\begin{bmatrix}L_{xx}&L_{xy}&L_{xz}\\ L_{yx}&L_{yy}&L_{yz}\\ L_{zx}&L_{zy}&L_{zz}\end{bmatrix}=\begin{bmatrix}-2(\lambda_1+\lambda_2)&0&0\\ 0&-2\lambda_1&0\\ 0&0&-2\lambda_2\end{bmatrix}.$$

∗ With (λ₁∗, λ₂∗) = (0, 1/√2), the Hessian is

$$H_{L(x,y,z,0,\frac1{\sqrt2})}=\begin{bmatrix}-\sqrt2&0&0\\ 0&0&0\\ 0&0&-\sqrt2\end{bmatrix}$$

and its principal minors are

$$\Delta_1^1=-\sqrt2,\quad\Delta_1^2=0,\quad\Delta_1^3=-\sqrt2,\qquad\Delta_2^1=0,\quad\Delta_2^2=2,\quad\Delta_2^3=0,\qquad\Delta_3=0,$$

so that

$$(-1)^k\Delta_k\ge0\qquad k=1,2,3.$$
Thus L(·, 0, 1/√2) is concave in ℝ³ and the points (1/√2, ±1/√2, −1/√2) are maximum points.

∗∗ Similarly, we show that L(·, 0, −1/√2) is convex and the points (−1/√2, ±1/√2, 1/√2) are minimum points.

iii) Comments. The constraint set, illustrated in Figure 3.27, is the intersection of two cylinders. A parametrization of this set is described by the equations

$$x(t)=\pm\sqrt{1-t^2},\qquad y(t)=t,\qquad z(t)=t\ \text{ or }\ z(t)=-t,\qquad t\in[-1,1].$$

The set is closed since g₁ and g₂ are continuous on ℝ³ and
$$[(g_1,g_2)=(1,1)]=g_1^{-1}(\{1\})\cap g_2^{-1}(\{1\}).$$

It is bounded since, for any (x, y, z) ∈ [(g₁, g₂) = (1, 1)], we have

$$\|(x,y,z)\|^2=x^2+y^2+z^2\le(x^2+y^2)+(x^2+z^2)=1+1=2.$$
FIGURE 3.27: The constraint set
As f is continuous on the closed bounded constraint set [(g1 , g2 ) = (1, 1)], it attains its maximum and minimum values on this set by the extreme value theorem. Thus, the solution of the problem is found by comparing the values of f taken at the candidate points obtained in i). Example 3. No concavity nor convexity. Consider the problem max f (x, y, z) = xy + yz + xz
subject to
g(x, y, z) = x + y + z = 3
and the associated Lagrangian

$$L(x,y,z,\lambda)=f(x,y,z)-\lambda(g(x,y,z)-3)=xy+yz+xz-\lambda(x+y+z-3).$$

Show that the local maximum point (1, 1, 1) of the constrained optimization problem, with λ = 2, is a global maximum, but L(·, 2) is not concave.

Solution: We have

$$L_x=y+z-\lambda\qquad L_y=x+z-\lambda\qquad L_z=y+x-\lambda\qquad L_\lambda=-(x+y+z-3).$$

To study the concavity of L in (x, y, z) when λ = 2, consider the Hessian matrix

$$H_{L(x,y,z,2)}=\begin{bmatrix}L_{xx}&L_{xy}&L_{xz}\\ L_{yx}&L_{yy}&L_{yz}\\ L_{zx}&L_{zy}&L_{zz}\end{bmatrix}=\begin{bmatrix}0&1&1\\ 1&0&1\\ 1&1&0\end{bmatrix}.$$
The principal minors are

$$\Delta_1^1=\Delta_1^2=\Delta_1^3=0,\qquad \Delta_2^1=\Delta_2^2=\Delta_2^3=-1,\qquad \Delta_3=\begin{vmatrix}0&1&1\\ 1&0&1\\ 1&1&0\end{vmatrix}=2.$$

So L(·, 2) is neither concave nor convex in (x, y, z). Thus we cannot conclude, by using the theorem, whether the point (1, 1, 1) is a global maximum or not.

Now, to show that (1, 1, 1) is a global maximum point, we can proceed as follows. Consider the values of f taken on the plane g(x, y, z) = 3:

$$f(x,y,3-(x+y))=xy+(y+x)[3-(x+y)]=xy+3(x+y)-(x+y)^2=\theta(x,y).$$

The maximization problem is equivalent to solving the following unconstrained problem:

$$\max_{(x,y)\in\mathbb R^2}\theta(x,y).$$

Since θ is C¹, the critical points are solutions of

$$\nabla\theta(x,y)=\big(3-2x-y,\ 3-x-2y\big)=(0,0)\iff\begin{cases}2x+y=3\\ x+2y=3\end{cases}\iff(x,y)=(1,1).$$

So (1, 1) is the only critical point of θ. Moreover, we have

$$H_\theta(x,y)=\begin{bmatrix}\theta_{xx}&\theta_{xy}\\ \theta_{yx}&\theta_{yy}\end{bmatrix}=\begin{bmatrix}-2&-1\\ -1&-2\end{bmatrix}$$

$$D_1(x,y)=-2\ \Longrightarrow\ (-1)^1D_1(x,y)=2>0\qquad D_2(x,y)=\begin{vmatrix}-2&-1\\ -1&-2\end{vmatrix}=3\ \Longrightarrow\ (-1)^2D_2(x,y)=3>0.$$

Hence θ is strictly concave on ℝ². Thus (1, 1) is a global maximum of θ on ℝ², and therefore (1, 1, 1) is a global maximum of f on [g = 3].
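The global-maximum claim of Example 3 is easy to probe numerically: eliminating z through the constraint and sampling θ(x, y) over a large grid never produces a value above f(1, 1, 1) = 3 (a quick sketch in Python with NumPy):

```python
import numpy as np

# theta(x, y) = f(x, y, 3 - x - y): the objective restricted to the plane
x, y = np.meshgrid(np.linspace(-10, 10, 801), np.linspace(-10, 10, 801))
theta = x * y + 3 * (x + y) - (x + y) ** 2

print(round(theta.max(), 6))  # 3.0, attained near (x, y) = (1, 1)
```

The sampled maximum matches f(1, 1, 1) = 1 + 1 + 1 = 3, consistent with the strict concavity of θ established above.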
Solved Problems
Part 1. – A constrained optimization problem. [29]

i) Solve the following constrained minimization problem

$$\min\ x_1^2+x_2^2+\dots+x_n^2\qquad\text{subject to}\qquad x_1+x_2+\dots+x_n=c$$

where c ∈ ℝ.

ii) Use part (i) to show that if x₁, x₂, …, x_n are given numbers, then

$$n\sum_{i=1}^n x_i^2\ \ge\ \Big(\sum_{i=1}^n x_i\Big)^2.$$
When does equality hold?

Solution: i) Denote by f and g the C^∞ functions on ℝⁿ:

$$f(x_1,\dots,x_n)=x_1^2+x_2^2+\dots+x_n^2\qquad g(x_1,\dots,x_n)=x_1+x_2+\dots+x_n.$$

Consider the Lagrangian

$$L(x_1,\dots,x_n,\lambda)=f(x_1,\dots,x_n)-\lambda(g(x_1,\dots,x_n)-c)=x_1^2+\dots+x_n^2-\lambda(x_1+\dots+x_n-c).$$

Note that any point of the hyperplane g = c is a regular point since we have

$$g'(x_1,\dots,x_n)=(1,\dots,1)\quad\Longrightarrow\quad\operatorname{rank}(g'(x_1,\dots,x_n))=1.$$

The stationary points of the Lagrangian are solutions of the system

$$\nabla L(x_1,\dots,x_n,\lambda)=(0,\dots,0,0)\iff\begin{cases}L_{x_1}=2x_1-\lambda=0\\ \qquad\vdots\\ L_{x_n}=2x_n-\lambda=0\\ L_\lambda=-(x_1+x_2+\dots+x_n-c)=0.\end{cases}$$
We deduce, from the first n equations, that

$$\lambda=2x_1=\dots=2x_i=\dots=2x_n\quad\Longrightarrow\quad x_i=\frac\lambda2\qquad i=1,\dots,n,$$

which, inserted into the last equation, gives n(λ/2) = c. Hence the unique solution of the system is

$$\lambda=\frac{2c}n\qquad x_i=\frac cn\qquad i=1,\dots,n.$$

Now, let us study the convexity of L in (x₁, …, x_n) when λ = 2c/n. The corresponding Hessian matrix is

$$\begin{bmatrix}2&\cdots&0\\ \vdots&\ddots&\vdots\\ 0&\cdots&2\end{bmatrix}.$$

The leading principal minors are

$$D_1=2>0,\quad D_2=2^2>0,\ \dots,\ D_i=2^i,\ \dots,\ D_n=2^n>0.$$

Hence L is strictly convex in (x₁, …, x_n), and we conclude that the point

$$\Big(\frac cn,\dots,\frac cn\Big)$$

is the solution to the constrained minimization problem.

ii) Let x₁, x₂, …, x_n be given numbers and denote by c their sum. From part i), we have

$$f\Big(\frac cn,\dots,\frac cn\Big)\le f(t_1,\dots,t_n)\qquad\forall(t_1,\dots,t_n)\in[t_1+\dots+t_n=c].$$

In particular, for the given x_i, we can write

$$\Big(\frac cn\Big)^2+\dots+\Big(\frac cn\Big)^2\le x_1^2+x_2^2+\dots+x_n^2\iff n\,\frac{c^2}{n^2}=\frac{c^2}n\le x_1^2+\dots+x_n^2$$
$$\iff c^2=(x_1+x_2+\dots+x_n)^2\le n(x_1^2+x_2^2+\dots+x_n^2).$$

The equality holds only at the minimum point, i.e., when all the coordinates x_i are equal to (x₁ + x₂ + … + x_n)/n.
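Part 1 (ii) is the Cauchy-Schwarz inequality applied to the vectors (x₁, …, x_n) and (1, …, 1); a quick randomized spot-check (a sketch in Python with NumPy):

```python
import numpy as np

rng = np.random.default_rng(1)
for _ in range(1000):
    x = rng.normal(size=int(rng.integers(1, 20)))
    n = x.size
    # part (ii): n * sum(x_i^2) >= (sum x_i)^2
    assert n * np.sum(x ** 2) >= np.sum(x) ** 2 - 1e-12

# equality case: all coordinates equal
x = np.full(7, 3.5)
assert abs(7 * np.sum(x ** 2) - np.sum(x) ** 2) < 1e-9
print("inequality verified")
```

The equality case confirms the characterization above: equality holds exactly when every x_i equals the common mean.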
Part 2. – Method of least squares. [1] Consider n points (x₁, y₁), …, (x_n, y_n) such that x₁, …, x_n are not all equal. Find the slope m and the y-intercept b of the line y = mx + b that minimize the quantity

$$D(m,b)=\sum_{i=1}^n(mx_i+b-y_i)^2=(mx_1+b-y_1)^2+\dots+(mx_n+b-y_n)^2,$$

which represents the sum of the squares of the vertical distances d_i = y_i − (mx_i + b) from these points to the line. This line is called the regression line, or the least squares line of best fit. (Hint: find the candidate point and check its global optimality by using Part 1 (ii).)

Solution: Consider the following unconstrained minimization problem:

$$\min_{(m,b)}D(m,b)=[y_1-(mx_1+b)]^2+\dots+[y_n-(mx_n+b)]^2.$$
Since D is C¹, the local extreme points are stationary points of D, i.e., solutions of

$$\nabla D(m,b)=(0,0)\iff\begin{cases}\dfrac{\partial D}{\partial m}=-2[y_1-(mx_1+b)]x_1-\dots-2[y_n-(mx_n+b)]x_n=0\\[6pt]\dfrac{\partial D}{\partial b}=-2[y_1-(mx_1+b)]-\dots-2[y_n-(mx_n+b)]=0\end{cases}$$

$$\iff\begin{cases}\displaystyle\sum_{i=1}^n x_iy_i=m\Big[\sum_{i=1}^n x_i^2\Big]+b\Big[\sum_{i=1}^n x_i\Big]\\[8pt]\displaystyle\sum_{i=1}^n y_i=m\Big[\sum_{i=1}^n x_i\Big]+b\,n.\end{cases}$$

The determinant of this 2 × 2 linear system is

$$\begin{vmatrix}\sum_{i=1}^n x_i^2&\sum_{i=1}^n x_i\\[2pt] \sum_{i=1}^n x_i&n\end{vmatrix}=n\sum_{i=1}^n x_i^2-\Big(\sum_{i=1}^n x_i\Big)^2\neq0$$
since x₁, …, x_n are not all equal (see Part 1 (ii)). Therefore, there exists a unique solution to the system. It remains to show that it is the minimum point. For this, we study the convexity of D, whose Hessian matrix is given by

$$H_D(m,b)=\begin{bmatrix}2\sum_{i=1}^n x_i^2&2\sum_{i=1}^n x_i\\[4pt]2\sum_{i=1}^n x_i&2n\end{bmatrix}.$$

The leading principal minors are

$$D_1(m,b)=2\sum_{i=1}^n x_i^2>0\qquad D_2(m,b)=4\Big[n\sum_{i=1}^n x_i^2-\Big(\sum_{i=1}^n x_i\Big)^2\Big]>0.$$

So D is strictly convex and the unique critical point (m∗, b∗) is the global minimum. By Cramer's rule, the regression line equation is y = m∗x + b∗ with

$$m^*=\frac{\begin{vmatrix}\sum x_iy_i&\sum x_i\\ \sum y_i&n\end{vmatrix}}{\begin{vmatrix}\sum x_i^2&\sum x_i\\ \sum x_i&n\end{vmatrix}}=\frac{n\sum x_iy_i-\sum x_i\sum y_i}{n\sum x_i^2-\big(\sum x_i\big)^2}\qquad b^*=\frac{\begin{vmatrix}\sum x_i^2&\sum x_iy_i\\ \sum x_i&\sum y_i\end{vmatrix}}{\begin{vmatrix}\sum x_i^2&\sum x_i\\ \sum x_i&n\end{vmatrix}}=\frac{\sum x_i^2\sum y_i-\sum x_i\sum x_iy_i}{n\sum x_i^2-\big(\sum x_i\big)^2}.$$

Part 3. – Students' scores. In a math course, Table 3.3 lists the scores x_i of 14 students on the midterm exam and their scores y_i on the final exam.
i) Plot the data. Do the data appear to lie along a straight line?
ii) Find the least squares line of best fit of y as a function of x.
iii) Plot the points and the regression line on the same graph.
iv) Use your answer from ii) to predict the final exam score of a student whose midterm score was 41 and who dropped the course.
x_i : 100  95  81  71  83  48  92  100  85  63  78  58  73  60
y_i :  95  88  53  58  80  31  91   78  85  52  78  74  60  60
TABLE 3.3: Students' scores

Solution: i) The plot, in Figure 3.28, shows that 10 points are close to a line. The plot is obtained using the Mathematica coding below:

fp = {{100, 95}, {95, 88}, {81, 53}, {71, 58}, {83, 80}, {48, 31}, {92, 91}, {100, 78}, {85, 85}, {63, 52}, {78, 78}, {58, 74}, {73, 60}, {60, 60}};
gp = ListPlot[fp]
FIGURE 3.28: The data shows an alignment
ii) Using the results from Part 2, we have

$$\sum_{i=1}^{14}x_i=1087\qquad \sum_{i=1}^{14}x_i^2=87855\qquad \sum_{i=1}^{14}y_i=983\qquad \sum_{i=1}^{14}x_iy_i=79428$$

so

$$m^*=\frac{1499}{1669}\approx0.8981426\qquad b^*=\frac{801}{1669}\approx0.4799281$$

and the regression line will be

$$y=0.8981426\,x+0.4799281.$$
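The sums and coefficients above can be reproduced with the closed-form solution of Part 2 (a sketch in Python with NumPy):

```python
import numpy as np

x = np.array([100, 95, 81, 71, 83, 48, 92, 100, 85, 63, 78, 58, 73, 60], float)
y = np.array([95, 88, 53, 58, 80, 31, 91, 78, 85, 52, 78, 74, 60, 60], float)

n = x.size
det = n * np.sum(x * x) - np.sum(x) ** 2                           # Cramer denominator
m = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / det              # slope m*
b = (np.sum(x * x) * np.sum(y) - np.sum(x) * np.sum(x * y)) / det  # intercept b*

print(int(np.sum(x)), int(np.sum(x * x)), int(np.sum(y)), int(np.sum(x * y)))
print(round(m, 7), round(b, 7))   # 0.8981426 0.4799281
print(round(m * 41 + b, 4))       # 37.3038, the prediction used in iv) below
```

The computed sums (1087, 87855, 983, 79428) and coefficients match the hand calculation.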
To sketch the line (see Figure 3.29) with the data, we add the following Mathematica coding:

gl = Plot[line, {x, 25, 110}]; Show[gl, gp]
FIGURE 3.29: Data and line y = 0.479928 + 0.898143 x
iv) The student who dropped the course would have obtained at the final exam the approximate mark

$$y(41)\approx0.8981426\,(41)+0.4799281\approx37.30.$$

The student would have failed if he didn't improve his understanding of the material studied. However, this is only a relative prediction that doesn't take into account other factors involving the learning experience of the student.

Part 4. – University tuition. [12] The following, in Table 3.4, are the tuition fees that were charged at Vanderbilt University from 1982 to 1991.
i) Plot the data.
ii) To fit these data with a model of the form y = β₀e^{β₁x}, find the least squares line of best fit of ln y as a function of x. Deduce approximate values of β₀ and β₁.
iii) Sketch the curve in ii) with the data plot in i).
iv) Suppose the exponential model remains accurate for a period of time. In which year would the tuition reach $40,000?
Solution: i) The data, of points (x, y), appear to lie along a straight line. The plot, shown in Figure 3.30, is obtained using the Mathematica coding below:
200
Introduction to the Theory of Optimization in Euclidean Space
year   year after 1981, x   tuition (in thousands $), y
1982   1                    6.1
1983   2                    6.8
1984   3                    7.5
1985   4                    8.5
1986   5                    9.3
1987   6                    10.5
1988   7                    11.5
1989   8                    12.625
1990   9                    13.975
1991   10                   14.975

TABLE 3.4: University tuition
fp1 = {{1, 6.1}, {2, 6.8}, {3, 7.5}, {4, 8.5}, {5, 9.3}, {6, 10.5}, {7, 11.5}, {8, 12.625}, {9, 13.975}, {10, 14.975}};
gp1 = ListPlot[fp1]
FIGURE 3.30: The data (xi , yi ) lie along a straight line
The plot of the data, of points (xi, ln yi), appears also to lie along a straight line (see Figure 3.31).

fp2 = {{1, Log[6.1]}, {2, Log[6.8]}, {3, Log[7.5]}, {4, Log[8.5]}, {5, Log[9.3]}, {6, Log[10.5]}, {7, Log[11.5]}, {8, Log[12.625]}, {9, Log[13.975]}, {10, Log[14.975]}};
gp2 = ListPlot[fp2]
FIGURE 3.31: The data (xi , ln yi ) are positioned along a straight line
ii) Using the results from Part 2, the least squares' line of best fit is given by ln(y) = ln(β0) + β1 x, where b∗ = ln(β0) and m∗ = β1 are the solution of the linear system

q = A·m∗ + B·b∗,  p = B·m∗ + 10·b∗

where

B = Σ_{i=1}^{10} xi = 55,  A = Σ_{i=1}^{10} xi² = 385,  p = Σ_{i=1}^{10} ln(yi) ≈ 22.7832,  q = Σ_{i=1}^{10} xi ln(yi) ≈ 133.6865.

Solving,

m∗ = (10q − Bp)/(10A − B²) ≈ 0.10156,  b∗ = (pA − Bq)/(10A − B²) ≈ 1.71975,

and the regression line will be

ln y = 0.10156 x + 1.71975.

Thus

β0 = e^{b∗} ≈ 5.583112,  β1 = m∗ ≈ 0.10156.
iii) Using Mathematica, we find the equation of the line of best fit

line = Fit[fp2, {1, x}, x]
1.71975 + 0.10156 x

We sketch the line with the data (xi, ln(yi)), in Figure 3.32, using the coding: gl = Plot[line, {x, 1/2, 11}]; Show[gl, gp2]
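As before, here is an illustrative Python cross-check of these values (the text itself works in Mathematica; this sketch is not part of the original):

```python
import math

# Log-linear least squares fit ln(y) = b + m*x for the tuition data of Table 3.4.
tuition = [6.1, 6.8, 7.5, 8.5, 9.3, 10.5, 11.5, 12.625, 13.975, 14.975]
xs = range(1, 11)
n = 10
B = sum(xs)                                            # 55
A = sum(x * x for x in xs)                             # 385
p = sum(math.log(y) for y in tuition)                  # ≈ 22.7832
q = sum(x * math.log(y) for x, y in zip(xs, tuition))  # ≈ 133.6865
m = (n * q - B * p) / (n * A - B ** 2)                 # ≈ 0.10156
b = (p * A - B * q) / (n * A - B ** 2)                 # ≈ 1.71975
beta0, beta1 = math.exp(b), m                          # ≈ 5.5831, 0.10156
```

The fitted exponential model is then y ≈ 5.5831 e^{0.10156 x}, matching the `Fit` output.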
FIGURE 3.32: Data (xi , ln(yi )) and line y = 1.71975 + 0.10156 x
Finally, we sketch, in Figure 3.33, the curve y = f(x) = β0 e^{β1 x} with the original data (xi, yi):

curve = Plot[5.583112 Exp[0.10156 x], {x, 1/2, 11}]; Show[curve, gp1]
FIGURE 3.33: Data (xi , yi ) and curve model f (x) = 5.583112e0.10156x
iv) Using the model for prediction, we need to solve the equation

1000 f(x) = 40000  ⟺  x = (1/0.10156) ln(40/5.583112) ≈ 19.3889.

Thus, in the year 1981 + 19 = 2000, the tuition fees reach the rate of $40,000.
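An illustrative Python computation of this prediction (assuming the fitted values β0 ≈ 5.583112 and β1 ≈ 0.10156 from part ii); this sketch is not part of the text):

```python
import math

# Solve 1000 * beta0 * exp(beta1 * x) = 40000 for x (tuition in dollars).
beta0, beta1 = 5.583112, 0.10156   # fitted values from part ii)
x = math.log(40 / beta0) / beta1   # ≈ 19.39 years after 1981
year = 1981 + round(x)             # rounds to 19 -> year 2000
```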
Chapter 4 Constrained Optimization-Inequality Constraints
In this chapter, we are interested in optimizing functions f : Ω ⊂ Rn −→ R over subsets described by inequalities

g(x) = (g1(x), g2(x), . . . , gm(x)) ≤ b ∈ Rm  ⟺  g1(x) ≤ b1, . . . , gm(x) ≤ bm,  x ∈ Rn.

Denote the set of the constraints

S = [g(x) ≤ b] = [g1(x) ≤ b1] ∩ [g2(x) ≤ b2] ∩ . . . ∩ [gm(x) ≤ bm].
Example.

∗ S = [g1(x, y) = x² + y² ≤ 1] ∩ [g2(x, y) = x − y ≤ 0] is the plane region inside the unit disk and above the line y = x. Here (n = 2, m = 2).

∗∗ S = [g1(x, y, z) = 9 − (x² + y² + z²) ≤ 0] = [x² + y² + z² ≥ 9] is the domain outside the sphere centered at the origin with radius 3. Here (n = 3, m = 1).

∗∗∗ S = [g(x, y) = x² ≤ 0] = {(0, y) : y ∈ R} is the y-axis. Here (n = 2, m = 1).

Note that sets defined by inequalities contain interior points and boundary points. So, for comparing the values of a function f taken around an extreme point x∗, it will be suitable to consider curves x(t) passing through x∗ and included in the constraint set [g ≤ b]. We will consider, this time, curves t −→ x(t) such that the set {x(t) : t ∈ [0, a], x(0) = x∗}, for some a > 0, is included in [g ≤ b]. Then, if x∗ is a local maximum of f, we have

f(x(t)) ≤ f(x∗)  ∀t ∈ [0, a].
Thus, 0 is a local maximum point for the function t −→ f(x(t)). Hence

d/dt f(x(t))|_{t=0} = f′(x(t)).x′(t)|_{t=0} ≤ 0  ⟹  f′(x∗).x′(0) ≤ 0.

x′(0) is a tangent vector to the curve x(t) at the point x(0) = x∗. This inequality mustn't depend on a particular curve x(t). So, we should have

f′(x∗).x′(0) ≤ 0  for any curve x(t) such that g(x(t)) ≤ b.

In this chapter, we will first characterize, in Section 4.1, the set of tangent vectors to such curves, then establish, in Section 4.2, the equations satisfied by a local extreme point x∗. In Section 4.3, we identify the candidate points for optimality, and in Section 4.4, we explore the global optimality of a constrained local candidate point. Finally, we establish, in Section 4.5, the dependence of the optimal value of the objective function on certain parameters involved in the problem.
4.1 Cone of Feasible Directions
Let x∗ ∈ S = [g(x) ≤ b].

Definition 4.1.1 The set defined by

T = { x′(0) : t −→ x(t) ∈ S ∀t ∈ [0, a], x ∈ C¹[0, a], a > 0, x(0) = x∗ }

of all tangent vectors at x∗ to differentiable curves included in S, is called the cone of feasible directions at x∗ to the set [g ≤ b].
We have the following characterization of the cone T at an interior point x∗ of S.
Remark 4.1.1 We have

g continuous on Ω and x∗ ∈ [g(x) < b]  ⟹  T = Rn.

That is, when x∗ is an interior point of S, the cone at x∗ coincides with the whole space.

Indeed, we have T ⊂ Rn. Let us prove that Rn ⊂ T. Let y ∈ Rn.

∗ If y = 0, then the constant curve x(t) = x∗ with t ∈ [0, 1] satisfies: x ∈ C¹[0, 1], x(0) = x∗, x′(t) = 0, x′(0) = 0 = y, x(t) = x∗ ∈ S ∀t ∈ [0, 1]. So y = 0 ∈ T.

∗∗ Suppose y ≠ 0. We have x∗ ∈ ∩_{j=1}^{m} [gj(x) < bj], which is an open subset of Rn. So there exists δ > 0 such that

Bδ(x∗) ⊂ ∩_{j=1}^{m} [gj(x) < bj].

Now x(t) = x∗ + ty ∈ Bδ(x∗) ∀t ∈ [−δ/(2|y|), δ/(2|y|)], since

|x(t) − x∗| = |t||y| ≤ (δ/(2|y|))|y| = δ/2 < δ.

We deduce that y ∈ T since the curve satisfies: x ∈ C¹[0, δ/(2|y|)], x(0) = x∗, x′(t) = y, x′(0) = y, and x(t) = x∗ + ty ∈ S ∀t ∈ [0, δ/(2|y|)].
Example 1. Find and sketch the cone of feasible directions at the point (−1/2, 1/2) belonging to the set

S = {(x, y) ∈ R² : g1(x, y) = x² + y² − 1 ≤ 0 and g2(x, y) = x − y ≤ 0}.
Solution: The set S is the part of the unit disk located above the line y = x. The point (−1/2, 1/2) is an interior point of S; see Figure 4.1. Thus T = R2 .
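A minimal numeric check (illustrative Python, not from the text) that both constraints are strict at (−1/2, 1/2), so the point is interior and the cone is all of R²:

```python
# g1(x,y) = x^2 + y^2 - 1 and g2(x,y) = x - y at the point (-1/2, 1/2).
x, y = -0.5, 0.5
g1 = x ** 2 + y ** 2 - 1   # -0.5 < 0: strictly inside the unit disk
g2 = x - y                 # -1.0 < 0: strictly above the line y = x
interior = g1 < 0 and g2 < 0   # interior point => T = R^2
```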
FIGURE 4.1: S and the cone at (−1/2, 1/2)
We know a representation of the cone T when x∗ is a regular point of S.
Definition 4.1.2 A point x∗ ∈ S = [g ≤ b] is said to be a regular point of the constraints if the gradient vectors ∇gi(x∗), i ∈ I(x∗), are linearly independent, where

I(x∗) = {i ∈ {1, . . . , m} : gi(x∗) = bi}.
Theorem 4.1.1 At a regular point x∗ ∈ S = [g ≤ b], where g is C¹ in a neighborhood of x∗, the cone of feasible directions T is equal to the convex cone

C = {y ∈ Rn : g′i(x∗).y ≤ 0, i ∈ I(x∗)}.
Before giving the proof, we give some remarks and identify some cones.
Remark 4.1.2 The cone of feasible directions at a point x∗ ∈ S with vertex x∗ is the translation of C by the vector x∗, given by

C(x∗) = x∗ + C = x∗ + {h ∈ Rn : g′i(x∗).h ≤ 0, i ∈ I(x∗)}
= {x∗ + h ∈ Rn : g′i(x∗).h ≤ 0, i ∈ I(x∗)}
= {x ∈ Rn : g′i(x∗).(x − x∗) ≤ 0, i ∈ I(x∗)}.

C(x∗) is the cone of feasible directions to the constraint set [g(x) ≤ b] passing through x∗.
Example 2. Find and sketch the cone of feasible directions C(x, y) with vertex (x, y) = (−1/2, −1/2), (0, 1) and (1/√2, 1/√2). The points belong to the set

S = {(x, y) ∈ R² : g1(x, y) = x² + y² − 1 ≤ 0 and g2(x, y) = x − y ≤ 0}.
Solution: Note that the three points belong to ∂S; see Figure 4.2.

FIGURE 4.2: Location of the points on S and C(−1/2, −1/2)

To determine the cone of feasible directions at each point (see Figures 4.2 and 4.3), we need to discuss the regularity of each point. First, we will need:

g′(x, y) = [∂g1/∂x  ∂g1/∂y ; ∂g2/∂x  ∂g2/∂y] = [2x  2y ; 1  −1].
∗ At (−1/2, −1/2), only the equality constraint g2 = 0 is satisfied and the point is regular. We have

g′2(x, y) = [1  −1] and rank(g′2(−1/2, −1/2)) = 1.
C(−1/2, −1/2) = {(x, y) ∈ R² : [1  −1] · ᵗ(x + 1/2, y + 1/2) ≤ 0} = {(x, y) ∈ R² : x − y ≤ 0}.
∗∗ At (0, 1), only the equality constraint g1 = 0 is satisfied and the point is regular. We have

g′1(0, 1) = [0  2] and rank(g′1(0, 1)) = 1.

C(0, 1) = {(x, y) ∈ R² : [0  2] · ᵗ(x − 0, y − 1) ≤ 0} = {(x, y) ∈ R² : y ≤ 1}.
FIGURE 4.3: C(0, 1) = [y ≤ 1] and C(1/√2, 1/√2) = [x + y ≤ √2] ∩ [x ≤ y]
∗∗∗ At (1/√2, 1/√2), the two equality constraints g1 = g2 = 0 are satisfied and the point is regular. We have

g′(1/√2, 1/√2) = [√2  √2 ; 1  −1] and rank(g′(1/√2, 1/√2)) = 2.

C(1/√2, 1/√2) = {(x, y) ∈ R² : [√2  √2 ; 1  −1] · ᵗ(x − 1/√2, y − 1/√2) ≤ 0}
= {(x, y) ∈ R² : x + y − √2 ≤ 0 and x − y ≤ 0}.
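As an illustrative sketch in Python (the text contains no code here), one can tabulate which constraints are active at each of the three points; the active set is what determines the cone inequalities found above:

```python
import math

# Active constraints of Example 2 at a point (x, y):
# g1(x, y) = x^2 + y^2 - 1 <= 0 and g2(x, y) = x - y <= 0.
def active(x, y, tol=1e-12):
    acts = []
    if abs(x * x + y * y - 1) <= tol:
        acts.append("g1")
    if abs(x - y) <= tol:
        acts.append("g2")
    return acts

s = 1 / math.sqrt(2)
a1 = active(-0.5, -0.5)  # only g2 active: cone is x - y <= 0
a2 = active(0.0, 1.0)    # only g1 active: cone is y <= 1
a3 = active(s, s)        # both active: cone is x + y <= sqrt(2) and x <= y
```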
Remark 4.1.3 The conclusion of the theorem is also true when the point x∗ satisfies any one of the following regularity conditions [5]:

i) Each constraint gj(x) is affine for j ∈ I(x∗).

ii) There exists x̄ such that, for all j ∈ I(x∗), gj(x̄) ≤ bj, and gj(x̄) < bj if gj is not affine.
Example 3. Suppose that all the constraints are affine and that the set S is described by

S = {x ∈ Rn : Σ_{j=1}^{n} aij xj ≤ bi, i = 1, . . . , m} = {x ∈ Rn : Ax ≤ b},

where A = (aij) is an m × n matrix and b ∈ Rm. Here g(x) = Ax, g′(x) = A and g′i(x) = [ai1  ai2  . . .  ain]. Thus, from the previous remark, any point of S is a regular point, and the cone of feasible directions at a point x∗ ∈ S with vertex x∗ is given by the polyhedron

C(x∗) = {x ∈ Rn : g′i(x∗).(x − x∗) = Σ_{j=1}^{n} aij(xj − x∗j) ≤ 0, i ∈ I(x∗)}
= {x ∈ Rn : Σ_{j=1}^{n} aij xj ≤ Σ_{j=1}^{n} aij x∗j = bi, i ∈ I(x∗)}.
Example 4. Suppose f is a C¹ function and

S = [z ≤ f(x)] = {(x, z) ∈ Ω × R : z ≤ f(x)},  Ω ⊂ Rn.

Let x∗ be a relative interior point of the surface z = f(x). Find the cone at x∗.

Solution: If we set g(x, z) = z − f(x), then the set S can be described by

S = [g(x, z) ≤ 0] = {(x, z) ∈ Ω × R : g(x, z) ≤ 0},
and the point (x∗, f(x∗)) is a regular point since we have

g′(x∗, f(x∗)) = [−f′(x∗)  1] ≠ 0,  rank(g′(x∗, f(x∗))) = 1.
The cone of feasible directions at the point (x∗, f(x∗)) with vertex (x∗, f(x∗)) is given by

C(x∗, f(x∗)) = {(x, z) ∈ Rn × R : g′(x∗, f(x∗)) · ᵗ(x − x∗, z − f(x∗)) ≤ 0}.

We have

g′(x∗, f(x∗)) · ᵗ(x − x∗, z − f(x∗)) = [−f′(x∗)  1] · ᵗ(x − x∗, z − f(x∗))
= −f′(x∗).(x − x∗) + z − f(x∗) ≤ 0  ⟺  z ≤ f(x∗) + f′(x∗).(x − x∗).

Hence

C(x∗, f(x∗)) = {(x, z) ∈ Rn × R : z ≤ f(x∗) + f′(x∗).(x − x∗)}.

The cone is the region below the hyperplane z = f(x∗) + f′(x∗).(x − x∗), which is also the tangent plane to the surface z = f(x) at x∗. In particular, when x∗ is a stationary point, i.e. f′(x∗) = 0, the cone of feasible directions at x∗ is the region below the horizontal tangent plane z = f(x∗).
Remark 4.1.4 Note that the representation of the cone of feasible directions obtained in the theorem used the fact that the point was regular. When this hypothesis is omitted, the representation is not necessarily valid. Indeed, if we consider the set S defined by

g(x, y) ≤ 0 with g(x, y) = x²,

then S is reduced to the y-axis. No point of S is regular since we have

g′(x, y) = [2x  0] and g′(0, y) = [0  0] on the y-axis.

We deduce that at each point (0, y0), we have

C(0, y0) = {(x, y) : g′(0, y0) · ᵗ(x − 0, y − y0) ≤ 0} = {(x, y) : [0  0] · ᵗ(x − 0, y − y0) ≤ 0} = R².

However, the line x(t) = 0, y(t) = y0 + t remains included in S, passes through the point (0, y0) at t = 0, and has the direction ᵗ(x′(0), y′(0)) = ᵗ(0, 1). Hence, the cone of feasible directions at each point of S is equal to S. Note that it also coincides with the tangent plane at each point, since

g(x, y) = x² ≤ 0  ⟺  g(x, y) = x² = 0.
Proof. We have:

T ⊂ C: Indeed, let y ∈ T, y ≠ 0. Then ∃x(t) differentiable such that

g(x(t)) ≤ b ∀t ∈ [0, a] for some a > 0,  x(0) = x∗,  x′(0) = y.

So 0 is a maximum for the function φi(t) = gi(x(t)) − bi, (i ∈ I(x∗)), over the interval [0, a] since we have

φi(t) = gi(x(t)) − bi ≤ 0 = φi(0),  φi(0) = gi(x∗) − bi = 0 because i ∈ I(x∗).

Since gi and x(.) are C¹, φi is C¹ and Taylor's formula gives

φi(t) − φi(0) = φ′i(0)t + tα(t) = t(φ′i(0) + α(t)) with lim_{t→0+} α(t) = 0.

If φ′i(0) > 0, then there exists a0 ∈ (0, a) such that

α(t) ≥ −φ′i(0)/2 ∀t ∈ (0, a0).

We deduce that

φi(t) − φi(0) ≥ t(φ′i(0) − φ′i(0)/2) = t φ′i(0)/2 > 0 ∀t ∈ (0, a0),

which contradicts that 0 is a maximum for φi on [0, a]. So y ∈ C since we have

φ′i(0) = d/dt(gi(x(t)))|_{t=0} = ∇gi(x(t)).x′(t)|_{t=0} = g′i(x∗).y ≤ 0.
C ⊂ T: Let y ∈ C \ {0}. We distinguish between two situations:

First case. Suppose that g′i(x∗).y < 0 ∀i ∈ I(x∗).

Since x∗ ∈ [gj(x) < bj] for j ∉ I(x∗) and g is continuous, there exists δ > 0 such that

Bδ(x∗) ⊂ ∩_{j∉I(x∗)} [gj(x) < bj].

Consider the curve x(t) = x∗ + ty, t ≥ 0, for which x(0) = x∗ and x′(0) = y. We claim that

∃δ0 ∈ (0, min(δ, δ/|y|)) such that x(t) ∈ S = [g(x) ≤ b] ∀t ∈ [0, δ0].

Indeed, for j ∈ I(x∗), we have

gj(x(t)) = gj(x∗ + ty) = gj(x∗) + t g′j(x∗).y + t εj(t) with lim_{t→0} εj(t) = 0.

Since gj(x∗) = bj and g′j(x∗).y < 0, we deduce, for each j ∈ I(x∗), the existence of δ0j ∈ (0, min(δ, δ/|y|)) such that

|εj(t)| < −(1/2) g′j(x∗).y ∀t ∈ (0, δ0j).

Consequently, for δ0 = min_{j∈I(x∗)} δ0j, we have, ∀t ∈ (0, δ0),

gj(x(t)) < bj + t g′j(x∗).y − (t/2) g′j(x∗).y = bj + (t/2) g′j(x∗).y < bj.

Second case. Suppose that

g′i(x∗).y = 0 ∀i ∈ {i1, i2, . . . , ip} ⊂ I(x∗) and g′i(x∗).y < 0 ∀i ∈ I(x∗) \ {i1, i2, . . . , ip}, with p < n.
Consider the system of equations

F(t, u) = G(x∗ + ty + ᵗG′(x∗)u) − B = 0,

where, for t fixed, u ∈ Rp is the unknown, and where

G = (gi1, gi2, . . . , gip),  B = (bi1, bi2, . . . , bip),  rank(G′(x∗)) = p.

Note that F is well defined on an open subset of R × Rp. Indeed, if g is C¹ on

Bδ(x∗) ⊂ {x ∈ Rn : gj(x) < bj, j ∉ I(x∗)},

then, ∀(t, u) ∈ (−δ0, δ0) × Bδ0(0) with δ0 = min(δ/(2‖y‖), δ/(2‖G′(x∗)‖)), we have

‖(x∗ + ty + ᵗG′(x∗)u) − x∗‖ ≤ |t|‖y‖ + ‖u‖‖G′(x∗)‖ < δ.

Moreover, F(0, 0) = G(x∗) − B = 0, and ∇uF(0, 0) = G′(x∗) ᵗG′(x∗) is nonsingular, so det(∇uF(t, u)) ≠ 0 on some neighborhood Bε(0) × Bη(0) of (0, 0). By the implicit function theorem,

∀t ∈ (−ε, ε), ∃!u ∈ Bη(0) : F(t, u) = 0, and u : (−ε, ε) −→ Bη(0), t −→ u(t), is a C¹ function.

Thus, the curve x(t) = X(t, u(t)) = x∗ + ty + ᵗG′(x∗)u(t) is, by construction, a curve in S since we have, for each t ∈ (−ε, ε),

G(x(t)) − B = 0 ⟺ gj(x(t)) − bj = 0 ∀j ∈ {i1, i2, . . . , ip} ⊂ I(x∗),

x(t) ∈ Bδ(x∗) ⊂ {x ∈ Rn : gj(x) < bj, j ∉ I(x∗)} ⟺ gj(x(t)) − bj < 0 ∀j ∉ I(x∗).

By differentiating both sides of F(t, u(t)) = G(x(t)) − B = G(X(t, u(t))) − B = 0 with respect to t at t = 0, where X(t, u) = x∗ + ty + ᵗG′(x∗)u, the chain rule gives

0 = d/dt G(x(t))|_{t=0} = G′(x∗)y + G′(x∗) ᵗG′(x∗) u′(0).

Since G′(x∗)y = 0 and G′(x∗) ᵗG′(x∗) is nonsingular (it is positive definite), we conclude that

G′(x∗) ᵗG′(x∗) u′(0) = −G′(x∗)y = 0  ⟹  u′(0) = 0.
Hence

x′(0) = y + ᵗG′(x∗) u′(0) = y.
Now, for j ∈ I(x∗) \ {i1, i2, . . . , ip}, we have

gj(x(t)) = gj(x(0)) + t g′j(x∗).x′(0) + t η(t) = bj + t g′j(x∗).y + t η(t) with lim_{t→0} η(t) = 0.

Then, from the first case, there exists ε0 ∈ (0, ε) such that

gj(x(t)) < bj ∀t ∈ (0, ε0);

thus x(t) ∈ [gj(x) ≤ bj] for all j ∈ I(x∗) \ {i1, i2, . . . , ip}. Finally, y is a tangent vector to the curve x(t) included in S for t ∈ [0, ε0/2], so y ∈ T.

∗ C is a cone of Rn since, for y ∈ C and κ ∈ R+, we have g′i(x∗)(κy) = κ g′i(x∗)y ≤ 0 for i ∈ I(x∗). Thus κy ∈ C.

∗ C is a convex subset of Rn since, for y, y′ ∈ C and s ∈ [0, 1], we have

g′i(x∗)(sy + (1 − s)y′) = s g′i(x∗)y + (1 − s) g′i(x∗)y′ ≤ s·0 + (1 − s)·0 = 0

for i ∈ I(x∗). Thus sy + (1 − s)y′ ∈ C.
Solved Problems
1. – Find and draw the cone of feasible directions at the point (0, 3, 0) belonging to the set x² + y² + z² ≥ 9.

Solution: Set g(x, y, z) = 9 − (x² + y² + z²).

FIGURE 4.4: The set [g ≤ 0] and the cone C(0, 3, 0) = [y ≥ 3]

We have

g′(x, y, z) = −2x i − 2y j − 2z k,  g′(0, 3, 0) = −6j ≠ 0,  rank(g′(0, 3, 0)) = 1.

So (0, 3, 0) is a regular point, and the cone of feasible directions to [g ≤ 0], with vertex at this point (see Figure 4.4), is given by

C(0, 3, 0) = {(x, y, z) ∈ R³ : g′(0, 3, 0) · ᵗ(x − 0, y − 3, z − 0) ≤ 0}.

We have

[0  −6  0] · ᵗ(x − 0, y − 3, z − 0) ≤ 0  ⟺  0(x − 0) − 6(y − 3) + 0(z − 0) ≤ 0  ⟺  y ≥ 3.

Hence C(0, 3, 0) = [y ≥ 3].
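An illustrative Python check of this computation (the gradient and the resulting half-space; this sketch is not part of the text):

```python
# g(x, y, z) = 9 - (x^2 + y^2 + z^2); gradient at (0, 3, 0) and the cone test.
def grad_g(x, y, z):
    return (-2 * x, -2 * y, -2 * z)

g = grad_g(0, 3, 0)   # (0, -6, 0): nonzero, so (0, 3, 0) is a regular point

# g . (x - 0, y - 3, z - 0) <= 0  <=>  -6*(y - 3) <= 0  <=>  y >= 3.
def in_cone(x, y, z):
    return g[0] * x + g[1] * (y - 3) + g[2] * z <= 0
```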
2. – Find the cone of feasible directions at the point (0, 1, 0) to the set

g(x, y, z) = (g1(x, y, z), g2(x, y, z)) ≤ (1, 1), where g1(x, y, z) = x + y + z and g2(x, y, z) = x² + y² + z².
Solution: The set S = [g ≤ (1, 1)], as illustrated in Figure 4.5, is the part of the unit ball located below the plane x + y + z = 1.

FIGURE 4.5: [g ≤ (1, 1)] and C(0, 1, 0)
The point (0, 1, 0) ∈ S satisfies the two constraints g1(x, y, z) = g2(x, y, z) = 1 and is a regular point since we have:

g′(x, y, z) = [∂g1/∂x  ∂g1/∂y  ∂g1/∂z ; ∂g2/∂x  ∂g2/∂y  ∂g2/∂z] = [1  1  1 ; 2x  2y  2z],

g′(0, 1, 0) = [1  1  1 ; 0  2  0], which has rank 2.
The cone of feasible directions to the set S at the point (0, 1, 0), with vertex this point, is the set of points (x, y, z) such that

g′(0, 1, 0) · ᵗ(x − 0, y − 1, z − 0) = [1  1  1 ; 0  2  0] · ᵗ(x, y − 1, z) ≤ 0
⟺ x + y − 1 + z ≤ 0 and 2(y − 1) ≤ 0.

Thus

C(0, 1, 0) = {(x, y, z) ∈ R³ : x + y + z ≤ 1 and y ≤ 1}.

3. – Show that the sets

z ≤ √(x² + y²) and z ≤ (1/10)(x² + y²) + 5/2
have a common cone of feasible directions at the point (3, 4, 5).

Solution: Set

g1(x, y, z) = z − √(x² + y²),  g2(x, y, z) = z − (1/10)(x² + y²) − 5/2.

We have g1(3, 4, 5) = g2(3, 4, 5) = 0 and

g′1(x, y, z) = −(x/√(x² + y²)) i − (y/√(x² + y²)) j + k,  g′2(x, y, z) = −(x/5) i − (y/5) j + k,

g′1(3, 4, 5) = g′2(3, 4, 5) = −(3/5) i − (4/5) j + k ≠ 0,  rank(g′1(3, 4, 5)) = rank(g′2(3, 4, 5)) = 1.
So (3, 4, 5) is a regular point for the two constraints g1(x, y, z) = 0 and g2(x, y, z) = 0. Therefore, the cones of feasible directions at the point (3, 4, 5) for the sets [g1(x, y, z) ≤ 0] and [g2(x, y, z) ≤ 0], with vertex (3, 4, 5), are given respectively by:

C1(3, 4, 5) = {(x, y, z) ∈ R³ : g′1(3, 4, 5) · ᵗ(x − 3, y − 4, z − 5) ≤ 0},
C2(3, 4, 5) = {(x, y, z) ∈ R³ : g′2(3, 4, 5) · ᵗ(x − 3, y − 4, z − 5) ≤ 0}.

Clearly, since g′1(3, 4, 5) = g′2(3, 4, 5), the two sets are equal, and we have for i = 1, 2

g′i(3, 4, 5) · ᵗ(x − 3, y − 4, z − 5) ≤ 0  ⟺  −(3/5)(x − 3) − (4/5)(y − 4) + (z − 5) ≤ 0.

Hence, the two given sets have a common cone of feasible directions at this point (see the illustrations in Figure 4.6), characterized by the inequality

−(3/5)(x − 3) − (4/5)(y − 4) + (z − 5) ≤ 0.
FIGURE 4.6: Sets [g1 ≤ 0], [g2 ≤ 0] and C(3, 4, 5)
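A short illustrative Python check (not part of the text) that the two gradients coincide at (3, 4, 5), which is the key fact behind the common cone:

```python
import math

# Gradients of g1 = z - sqrt(x^2 + y^2) and g2 = z - (x^2 + y^2)/10 - 5/2 at (3, 4, 5).
x, y, z = 3, 4, 5
r = math.sqrt(x * x + y * y)          # 5.0
grad1 = (-x / r, -y / r, 1.0)         # (-3/5, -4/5, 1)
grad2 = (-x / 5, -y / 5, 1.0)         # (-3/5, -4/5, 1)
same = all(math.isclose(a, b) for a, b in zip(grad1, grad2))
```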
4.2 Necessary Condition for Local Extreme Points/Inequality Constraints

In what follows, we will be interested in the study of the maximization problem

max f(x1, . . . , xn) subject to g1(x1, . . . , xn) ≤ b1, . . . , gm(x1, . . . , xn) ≤ bm.

The results established are strongly related to the fact that we are maximizing a function f under the inequality constraint g(x) ≤ b. To solve a minimization problem min f(x), we can maximize −f(x), and if a constraint is given in the form gj(x) ≥ bj, we can transform it into −gj(x) ≤ −bj. An equality constraint gj(x) = bj can be equivalently written as gj(x) ≤ bj and −gj(x) ≤ −bj.
We have the following preliminary lemma.

Lemma 4.2.1 Let f and g = (g1, . . . , gm) be C¹ functions in a neighborhood of x∗ ∈ [g(x) ≤ b]. If x∗ is a regular point and a local maximum point of f subject to these constraints, then we have

∀y ∈ Rn : [g′i(x∗).y ≤ 0, i ∈ I(x∗)] ⟹ f′(x∗).y ≤ 0.

Proof. Let y ∈ Rn such that g′i(x∗).y ≤ 0 for i ∈ I(x∗). Because x∗ is a regular point of the set [g(x) ≤ b], y ∈ C(x∗), the cone of feasible directions at x∗ to the set [g(x) ≤ b]. So ∃a > 0, ∃x ∈ C¹[0, a] such that

g(x(t)) ≤ b ∀t ∈ [0, a],  x(0) = x∗,  x′(0) = y.
Now, since x∗ is a local maximum point of f on the set [g(x) ≤ b], there exists δ ∈ (0, a) such that

∀t ∈ (0, δ), f(x(t)) ≤ f(x∗) = f(x(0))  ⟺  (f(x(t)) − f(x(0)))/(t − 0) ≤ 0,

from which we deduce

f′(x∗).y = f′(x(t)).x′(t)|_{t=0} = d/dt f(x(t))|_{t=0} = lim_{t→0+} (f(x(t)) − f(x(0)))/(t − 0) ≤ 0.
Remark 4.2.1 The lemma generalizes the necessary condition for a local maximum point x∗ in a convex set S:

f′(x∗).(x − x∗) ≤ 0 ∀x ∈ S.

Without assuming the set S = [g(x) ≤ b] is convex, the local maximum point must satisfy an inequality on the convex cone C(x∗):

f′(x∗).(x − x∗) ≤ 0 ∀x ∈ C(x∗).
As a consequence of the lemma, we have the following characterization of a constrained local maximum point.
Theorem 4.2.1 Let f and g = (g1, . . . , gm) be C¹ functions in a neighborhood of x∗ ∈ [g(x) ≤ b]. If x∗ is a regular point and a local maximum point of f subject to these constraints, then

∃λ∗j ≥ 0, j ∈ I(x∗) = {k ∈ {1, . . . , m} : gk(x∗) = bk},

such that

∂f/∂xi(x∗) − Σ_{j∈I(x∗)} λ∗j ∂gj/∂xi(x∗) = 0,  i = 1, . . . , n.
The proof uses an argument of linear algebra called "Farkas-Minkowski's Lemma" [5], which says:

Farkas-Minkowski's Lemma. Let A be a p × n real matrix and c ∈ Rn. Then, the inclusion

{x ∈ Rn : Ax ≥ 0} ⊂ {x ∈ Rn : c.x ≥ 0}

is satisfied if and only if

∃λ = (λ1, . . . , λp) ∈ Rp, λ ≥ 0, such that c = ᵗAλ.

Proof. Set I(x∗) = {i1, i2, . . . , ip},
A = −[g′j(x∗)]_{j∈I(x∗)} = −[∂gi1/∂x1 . . . ∂gi1/∂xn ; ∂gi2/∂x1 . . . ∂gi2/∂xn ; . . . ; ∂gip/∂x1 . . . ∂gip/∂xn],

c = −ᵗf′(x∗) = −ᵗ(∂f/∂x1, . . . , ∂f/∂xn).

From Farkas-Minkowski's Lemma, since the inclusion

{y ∈ Rn : Ay ≥ 0} = ∩_{i∈I(x∗)} {y ∈ Rn : g′i(x∗).y ≤ 0} ⊂ {y ∈ Rn : f′(x∗).y ≤ 0} = {y ∈ Rn : c.y = −f′(x∗).y ≥ 0}

is satisfied, then ∃λ∗ = (λ∗1, . . . , λ∗p) ∈ Rp, λ∗ ≥ 0, such that

−ᵗf′(x∗) = c = ᵗAλ∗ = −Σ_{k=1}^{p} λ∗k ᵗg′ik(x∗)  ⟺  f′(x∗) = Σ_{k=1}^{p} λ∗k g′ik(x∗).
So we are led to solve the system

∂f/∂xi(x) − Σ_{j∈I(x∗)} λj ∂gj/∂xi(x) = 0, i = 1, . . . , n,  λj ≥ 0, j ∈ I(x∗),

gj(x) − bj = 0 ∀j ∈ I(x∗),  gj(x) − bj < 0 ∀j ∉ I(x∗).

To find a practical way to solve the system, we introduce the complementary slackness conditions

λj ≥ 0, with λj = 0 if gj(x) < bj,  j = 1, . . . , m.
When gj(x∗) = bj, we say that the constraint gj(x) ≤ bj is active or binding at x∗. When gj(x∗) < bj, we say that the constraint gj(x) ≤ bj is inactive or slack at x∗. We introduce the Lagrangian function

L(x, λ) = f(x) − λ1(g1(x) − b1) − . . . − λm(gm(x) − bm),

where λ1, . . . , λm are the generalized Lagrange multipliers. Then, we reformulate the previous theorem as follows:

Theorem 4.2.2 Let f and g = (g1, . . . , gm) be C¹ functions in a neighborhood of x∗ ∈ [g(x) ≤ b]. If x∗ is a regular point and a local maximum point of f subject to these constraints, then ∃!λ∗ = (λ∗1, . . . , λ∗m) such that the following Karush-Kuhn-Tucker (KKT) conditions hold at (x∗, λ∗):

∂L/∂xi(x∗, λ∗) = ∂f/∂xi(x∗) − Σ_{j=1}^{m} λ∗j ∂gj/∂xi(x∗) = 0,  i = 1, . . . , n,

λ∗j ≥ 0, with λ∗j = 0 if gj(x∗) < bj,  j = 1, . . . , m.
Remark 4.2.2 The numbers λ∗j, j ∈ I(x∗), are unique. Indeed, suppose there exist λ = (λ1, . . . , λp) and λ′ = (λ′1, . . . , λ′p), solutions of

c = ᵗAλ and c = ᵗAλ′;

then ᵗA(λ − λ′) = 0, which we can write

Σ_{j∈I(x∗)} (λj − λ′j) g′j(x∗) = 0.

Since the vectors g′j(x∗) are linearly independent, we deduce that λj − λ′j = 0 for each j ∈ I(x∗).
Remark 4.2.3 If I(x∗) = ∅, then the Karush-Kuhn-Tucker conditions reduce to ∇f(x∗) = 0, which is expected since then the point x∗ belongs to the interior of the set of the constraints. On the other hand, this shows that the Kuhn-Tucker conditions are not sufficient for optimality. In fact, when x∗ is an interior point, it could be a local maximum, a local minimum or a saddle point.

First, let us practice writing the KKT conditions through simple examples.

Example 1. Solve the problem

max (x − 2)³ subject to g(x) = −x ≤ 0.

Solution: Since f, g are C¹ in R, consider the Lagrangian L(x, α) = (x − 2)³ − α(−x) and write the KKT conditions:

(1) ∂L/∂x = 3(x − 2)² + α = 0
(2) α ≥ 0, with α = 0 if −x < 0.

Given a² + b² > 1, solve the problem

min f(x, y) = ‖(x, y) − (a, b)‖² subject to g(x, y) = x² + y² ≤ 1.
Solution: i) The problem describes the shortest distance of the point (a, b) to the unit disk (here (a, b) is located outside the unit disk). This distance is attained, by the extreme value theorem, since f is continuous on the constraint set [g ≤ 1], which is a closed and bounded subset of R². The case (a, b) = (2, 3) is illustrated in Figure 4.7, and a graphical solution is described in Figure 4.8 using level curves.

FIGURE 4.7: Graph of z = (x − 2)² + (y − 3)² on x² + y² ≤ 1
ii) KKT conditions. f and g being C¹, introduce, for the corresponding maximization problem, the Lagrangian

L(x, y, λ) = −(x − a)² − (y − b)² − λ(x² + y² − 1).

The necessary conditions to satisfy are:

(i) Lx = −2(x − a) − 2λx = 0 ⟺ x(1 + λ) = a
(ii) Ly = −2(y − b) − 2λy = 0 ⟺ y(1 + λ) = b
(iii) λ ≥ 0, with λ = 0 if x² + y² < 1.

∗ If x² + y² < 1, then λ = 0, and then (i) and (ii) yield (x, y) = (a, b), which leads to a contradiction since a² + b² > 1.
∗∗ If x² + y² = 1, then from (i) and (ii) we deduce that (x, y) = (a/(1 + λ), b/(1 + λ)). By substitution in x² + y² = 1, we get

(a/(1 + λ))² + (b/(1 + λ))² = 1, λ ≥ 0  ⟺  λ = √(a² + b²) − 1.

Thus, the only solution of the system is the point

(x∗, y∗) = (a/√(a² + b²), b/√(a² + b²)),

where the constraint is active. Finally, the point is regular since we have g′(x, y) = [2x  2y] and rank(g′(x∗, y∗)) = 1. Therefore, the point is a candidate for optimality.

Conclusion. Now, since it is guaranteed that the minimum of f is attained, it must be at the candidate point found. Hence,

min_{x²+y²≤1} f(x, y) = f(x∗, y∗) = (√(a² + b²) − 1)².
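For the concrete case (a, b) = (2, 3), the candidate point and multiplier can be checked numerically (an illustrative Python sketch, not part of the text):

```python
import math

# KKT candidate for min (x-2)^2 + (y-3)^2 subject to x^2 + y^2 <= 1.
a, b = 2.0, 3.0
lam = math.sqrt(a * a + b * b) - 1       # λ = sqrt(13) - 1 > 0: constraint active
xs, ys = a / (1 + lam), b / (1 + lam)    # candidate (a, b)/sqrt(a^2 + b^2)
fval = (xs - a) ** 2 + (ys - b) ** 2     # squared distance to the disk
```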
FIGURE 4.8: Minimal value of z = (x − 2)² + (y − 3)² on x² + y² ≤ 1
Remark 4.2.4 The conclusion of the Karush-Kuhn-Tucker theorem is also true when the extreme point x∗ satisfies any one of the following regularity conditions (see [14], [5]):

i) Linear constraints: gj(x) is linear, j = 1, . . . , m.

ii) Slater's condition: gj(x) is convex and there exists x̄ such that gj(x̄) < bj, j = 1, . . . , m (with f concave).
iii) Concave programming (with f concave): gj(x) is convex and there exists x̄ such that, for any j = 1, . . . , m, gj(x̄) ≤ bj, and gj(x̄) < bj if gj is not linear.

iv) The rank condition: the constraints gi1, . . . , gip (p ≤ m) are binding, and the rank of the matrix

[g′i1(x∗) ; . . . ; g′ip(x∗)]

is equal to p. This last case is the one we consider here in our study. These four conditions are not equivalent to one another. For example, the uniqueness of the Lagrange multipliers is established under the rank condition iv).
Example 4. (Non-uniqueness of Lagrange multipliers). Solve the problem

max f(x, y) = x^{1/2} y^{1/4} subject to 2x + y ≤ 3, x + 2y ≤ 3, x + y ≤ 2, with x ≥ 0, y ≥ 0.
Solution: To simplify calculations, we will transform the problem to an equivalent one, as we did for distance problems, where the square of the distance is considered instead of the distance itself. Here, to avoid the powers, we will use the logarithmic function.

i) The constraint set, sketched in Figure 4.9, and defined by

S = {(x, y) ∈ R+ × R+ : 2x + y ≤ 3, x + 2y ≤ 3, x + y ≤ 2},

is a closed bounded subset of R². f is continuous on S; then, by the extreme value theorem,

∃(x∗, y∗) ∈ S such that f(x∗, y∗) = max_{(x,y)∈S} f(x, y).

Note that we have

f(0, y) = f(x, 0) = 0 ∀x ≥ 0, y ≥ 0, and f(x, y) > 0 ∀x > 0, y > 0.
So f(x∗, y∗) = max_{(x,y)∈S} f(x, y) > 0. Therefore, at the maximum point, the constraints x ≥ 0 and y ≥ 0 cannot be binding.

FIGURE 4.9: Graph of z = x^{1/2} y^{1/4} on S
ii) Set Ω = (0, +∞) × (0, +∞). As a consequence of i), we have

max_{(x,y)∈S} f(x, y) = max_{(x,y)∈S∗} f(x, y),

where

S∗ = {(x, y) ∈ Ω : 2x + y ≤ 3, x + 2y ≤ 3, x + y ≤ 2}.

Set

F(x, y) = ln f(x, y) = (1/2) ln(x) + (1/4) ln(y),  (x, y) ∈ Ω.

F is well defined, and we have

max_{(x,y)∈S} f(x, y) = max_{(x,y)∈S∗} f(x, y) = max_{(x,y)∈S∗} e^{F(x,y)},

max_{(x,y)∈S∗} F(x, y) = ln( max_{(x,y)∈S∗} f(x, y) ) = ln f(x∗, y∗),

since the functions t −→ ln t and t −→ e^t are increasing. Note that S∗ is a bounded subset of R² but not closed. Thus, we cannot apply the extreme value theorem to conclude about the existence of a solution to the problem
(x,y)∈S ∗
iii) Since F and the constraints are C 1 in Ω, to solve the problem, we write the KKT conditions for the associated Lagrangian L(x, y, λ1 , λ2 , λ3 ) =
1 1 ln x+ ln y−λ1 (2x+y−3)−λ2 (x+2y−3)−λ3 (x+y−2), 2 4
The necessary conditions to satisfy are: ⎧ 1 ⎪ ⎪ − 2λ1 − λ2 − λ3 = 0 (i) Lx = ⎪ ⎪ 2x ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ 1 ⎪ ⎪ ⎪ − λ1 − 2λ2 − λ3 = 0 (ii) Ly = ⎪ ⎪ 4y ⎨ ⎪ (iii) λ1 0 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ (iv) λ2 0 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩ (v) λ3 0
with
λ1 = 0
if
2x + y < 3
with
λ2 = 0
if
x + 2y < 3
with
λ3 = 0
if
x+y 0 4x 4y
=⇒
y = x.
Because λ1 > 0 then 2x + y = 3. Thus, we have (x, y) = (1, 1). But x + y = 1 + 1 = 2 : contradiction. ◦◦ if x + 2y = 3, then −
if
2x + y < 3, then λ1 = 0, and λ2 =
1 1 = >0 2x 8y
=⇒
x = 4y.
Because x + 2y = 3, then, we have (x, y) = (2, 1/2). But x + y = 2 + 1/2 > 2: contradiction. − contradiction.
if 2x + y = 3, then (x, y) = (1, 1). But x + y = 1 + 1 = 2:
∗∗ If x + y = 2, then, by drawing the constraint set, we see that the only point satisfying x + y = 2 is (x, y) = (1, 1), for which we have also 2x + y = 3 and x + 2y = 3, with

2λ1 + λ2 + λ3 = 1, λ1 + 2λ2 + λ3 = 1  ⟺  λ1 = λ2, λ3 = 1 − 3λ1.

iv) Conclusion. The only candidate point is (1, 1), and it is the maximum point since we know that such a point exists. However, we see that we do not have uniqueness of the Lagrange multipliers; still, we can apply the KKT conditions since the constraints are linear. Note also that the rank condition is not satisfied, since we have

g(x, y) = (2x + y, x + 2y, x + y),  g(1, 1) = (3, 3, 3),

g′(x, y) = [2  1 ; 1  2 ; 1  1],  rank(g′(1, 1)) = 2 ≠ 3,

and the three constraints are active at (1, 1); see Figure 4.10.
0.704
1.312
1.024
0.416
1.44
1.152 0.864
1.5 0.288
1.28
0.576
1.536 1.6 1.472
1.056 1.184
0.768 0.48
1.344
1.504 1.408
0.928 1.088
1.00.16
1.248
0.64 0.384
1.376
0.832 0.96
0.5 0.224
1.568
1.12
1.216
0.544 0.672
0.8
0.896
0.992
0.32 0.0 032 0.0
0.192 0.5
0.448
0 128 1.0
0.512
0.608 0 256 1.5
0.7 0 352 0 0 x 2.0
FIGURE 4.10: Maximal value of z = x^(1/2) y^(1/4) on S
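The candidate found above can be cross-checked numerically. The sketch below is not part of the original text; the grid step 1/100 is an arbitrary choice. It scans the feasible region and confirms that no feasible point beats F(1, 1) = 1:

```python
# Grid scan of the feasible region of max x^(1/2) y^(1/4) subject to
# 2x + y <= 3, x + 2y <= 3, x + y <= 2, x > 0, y > 0.

def F(x, y):
    return x ** 0.5 * y ** 0.25

def feasible(x, y):
    return 2*x + y <= 3 and x + 2*y <= 3 and x + y <= 2 and x > 0 and y > 0

best = max(
    (F(i/100, j/100), i/100, j/100)
    for i in range(1, 200) for j in range(1, 200)
    if feasible(i/100, j/100)
)
print(best)  # the grid maximum is attained at (1.0, 1.0) with value 1.0
```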
Mixed Constraints

Some maximization problems take the form

max f(x)  subject to  gj(x) = bj, j = 1, …, r (r < n);  hk(x) ≤ ck, k = 1, …, s.

We have:
Theorem 4.2.3 Let f, g = (g1, …, gr) and h = (h1, …, hs) be C¹ functions in a neighborhood of x* ∈ [g(x) = b] ∩ [h(x) ≤ c]. If x* is a regular point and a local maximum point of f subject to these constraints, then there exists a unique pair (λ*, μ*), λ* = (λ*1, …, λ*r), μ* = (μ*1, …, μ*s), such that the following Karush-Kuhn-Tucker (KKT) conditions hold at (x*, λ*, μ*):

∂L/∂xi (x*, λ*, μ*) = ∂f/∂xi (x*) − Σ_{j=1}^{r} λ*j ∂gj/∂xi (x*) − Σ_{k=1}^{s} μ*k ∂hk/∂xi (x*) = 0,  i = 1, …, n

∂L/∂λj (x*, λ*, μ*) = −(gj(x*) − bj) = 0,  j = 1, …, r

μ*k ≥ 0, with μ*k = 0 if hk(x*) < ck,  k = 1, …, s,

where

L(x, λ, μ) = f(x) − Σ_{j=1}^{r} λj (gj(x) − bj) − Σ_{k=1}^{s} μk (hk(x) − ck).
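The conditions of Theorem 4.2.3 can be checked mechanically at a known solution. The sketch below uses a toy problem of our own (not from the text) and verifies stationarity and complementary slackness:

```python
# Toy mixed-constraint problem (our own example, not from the text):
#   max f(x, y) = -(x^2 + y^2)   s.t.   g(x, y) = x + y = 1,   h(x, y) = x <= 2.
# The maximizer is (1/2, 1/2) with multipliers lambda* = -1 (equality) and
# mu* = 0 (the inequality is slack), which we verify below.

def grad_f(x, y): return (-2*x, -2*y)
def grad_g(x, y): return (1.0, 1.0)
def grad_h(x, y): return (1.0, 0.0)

xs, ys = 0.5, 0.5
lam, mu = -1.0, 0.0

# Stationarity: grad f - lambda*grad g - mu*grad h = 0 at (x*, y*).
residual = [grad_f(xs, ys)[i] - lam*grad_g(xs, ys)[i] - mu*grad_h(xs, ys)[i]
            for i in (0, 1)]
slack = 2 - xs               # h(x*, y*) < 2, so complementarity forces mu* = 0
print(residual, mu >= 0, mu * slack == 0)
```

Note that the equality multiplier λ* is free to be negative, while μ* must be nonnegative, exactly as in the theorem.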
Proof. The maximization problem is equivalent to

max f(x)  subject to  gj(x) ≤ bj, j = 1, …, r;  −gj(x) ≤ −bj, j = 1, …, r;  hk(x) ≤ ck, k = 1, …, s.
By applying the KKT conditions with the Lagrangian

L*(x, τ, κ, μ) = f(x) − Σ_{j=1}^{r} τj (gj(x) − bj) − Σ_{j=1}^{r} κj (−gj(x) + bj) − Σ_{k=1}^{s} μk (hk(x) − ck),

there exist unique multipliers τ*j, κ*j, μ*k such that the necessary conditions are satisfied:

∂L*/∂xi (x*, τ*, κ*, μ*) = ∂f/∂xi (x*) − Σ_{j=1}^{r} τ*j ∂gj/∂xi (x*) + Σ_{j=1}^{r} κ*j ∂gj/∂xi (x*) − Σ_{k=1}^{s} μ*k ∂hk/∂xi (x*) = 0,  i = 1, …, n

τ*j ≥ 0, with τ*j = 0 if gj(x*) < bj,  j = 1, …, r
κ*j ≥ 0, with κ*j = 0 if −gj(x*) < −bj,  j = 1, …, r
μ*k ≥ 0, with μ*k = 0 if hk(x*) < ck,  k = 1, …, s.
Setting

λ* = τ* − κ*  and  L(x, λ, μ) = L*(x, τ, κ, μ) = f(x) − Σ_{j=1}^{r} (τj − κj)(gj(x) − bj) − Σ_{k=1}^{s} μk (hk(x) − ck),

we deduce that (x*, λ*, μ*) is also a solution of the KKT conditions corresponding to the Lagrangian L. Moreover, for j = 1, …, r, λj may take either sign:

λj = τj − κj = −κj ≤ 0  if gj(x) < bj,
λj = τj − κj = τj ≥ 0  if gj(x) > bj.
Uniqueness of λ* and μ*. Suppose λ* and μ* are not uniquely defined; then we would have, for some λ′ ≠ λ and μ′ ≠ μ,

∂f/∂xi (x*) − Σ_{j=1}^{r} λj ∂gj/∂xi (x*) − Σ_{k=1}^{s} μk ∂hk/∂xi (x*) = 0,

∂f/∂xi (x*) − Σ_{j=1}^{r} λ′j ∂gj/∂xi (x*) − Σ_{k=1}^{s} μ′k ∂hk/∂xi (x*) = 0.

Subtracting the two equalities and using the fact that x* is a regular point, we obtain a contradiction:

Σ_{j=1}^{r} (λj − λ′j) ∇gj(x*) + Σ_{k=1}^{s} (μk − μ′k) ∇hk(x*) = 0  ⇒  (λ′, μ′) = (λ, μ).
Nonnegativity constraints

Some maximization problems take the form

max f(x)  subject to  gj(x) ≤ bj, j = 1, …, m;  x1 ≥ 0, …, xn ≥ 0.

We introduce the following n new constraints:

g_{m+1}(x) = −x1 ≤ 0,  ……,  g_{m+n}(x) = −xn ≤ 0.

The maximization problem is equivalent to

max f(x)  subject to  gj(x) ≤ bj, j = 1, …, m;  gj(x) ≤ 0, j = m + 1, …, m + n.

By applying the KKT conditions, for a regular point x, with the Lagrangian

L*(x, λ, μ) = f(x) − Σ_{j=1}^{m} λj (gj(x) − bj) − Σ_{k=1}^{n} μk (−xk),
there exist unique multipliers λj, μk such that

∂L*/∂xi (x, λ, μ) = ∂f/∂xi (x) − Σ_{j=1}^{m} λj ∂gj/∂xi (x) + μi = 0,  i = 1, …, n
λj ≥ 0, with λj = 0 if gj(x) < bj,  j = 1, …, m
μk ≥ 0, with μk = 0 if xk > 0,  k = 1, …, n.
We deduce then:

Theorem 4.2.4 Let f and g = (g1, …, gm) be C¹ functions in a neighborhood of x* ∈ [g(x) ≤ b] ∩ [x ≥ 0]. If x* is a regular point and a local maximum point of f subject to these constraints, then there exists a unique λ* = (λ*1, …, λ*m) such that the following Karush-Kuhn-Tucker (KKT) conditions hold at (x*, λ*):

∂L/∂xi (x*, λ*) = ∂f/∂xi (x*) − Σ_{j=1}^{m} λ*j ∂gj/∂xi (x*) ≤ 0  (= 0 if x*i > 0),  i = 1, …, n
λ*j ≥ 0, with λ*j = 0 if gj(x*) < bj,  j = 1, …, m,

where the Lagrangian is L(x, λ) = f(x) − Σ_{j=1}^{m} λj (gj(x) − bj).
Solved Problems
1. – Importance of KKT hypotheses. Show that the KKT conditions fail to hold at the optimal solution of the problem

max f(x, y) = x² + y  subject to  g1(x, y) = (x − 2)² = 0,  g2(x, y) = (y + 1)³ ≤ 0.
Solution: i) The set of constraints, graphed in Figure 4.11, is

S = {(x, y) : g1(x, y) = 0 and g2(x, y) ≤ 0} = {(x, y) : x = 2 and y ≤ −1}.
FIGURE 4.11: Constraint set S
ii) The Karush-Kuhn-Tucker conditions for the Lagrangian

L(x, y, α, β) = x² + y − α(x − 2)² − β(y + 1)³

are

(1) Lx = 2x − 2α(x − 2) = 0
(2) Ly = 1 − 3β(y + 1)² = 0
(3) β ≥ 0, with β = 0 if (y + 1)³ < 0.

∗ If (y + 1)³ < 0, then β = 0. We get a contradiction with (2), which leads to 1 = 0.

∗ If (y + 1)³ = 0, then y = −1, and by (2) again we obtain 1 = 0, which is not possible.

Thus, the KKT conditions have no solution.

iii) The problem nevertheless has a solution at (2, −1) since, on S, x = 2 and y ≤ −1, so

f(x, y) = x² + y = 2² + y ≤ 2² + (−1) = f(2, −1)  ∀(x, y) ∈ S.

Thus max_S f(x, y) = f(2, −1) = 3.

Note that the point is not a candidate for the KKT conditions. This is because it doesn't satisfy the constraint qualification under which the KKT conditions are established. In particular, the rank condition is not satisfied. Indeed, the two constraints are active at (2, −1), but

g1′(x, y) = (2(x − 2), 0),  g2′(x, y) = (0, 3(y + 1)²),

so g1′(2, −1) = g2′(2, −1) = (0, 0), and the matrix with rows g1′(2, −1), g2′(2, −1) has rank 0.
2. – KKT conditions are not sufficient. Consider the problem

min f(x, y) = 2 − y − (x − 1)²  subject to  y − x = 0,  x + y − 2 ≤ 0.

i) Sketch the feasible set and write down the necessary KKT conditions.
ii) Find the point(s) solution of the KKT conditions and check their regularity.
iii) What can you conclude about the solution of the minimization problem?
iv) Does this contradict the theorem on the necessary conditions for a constrained candidate point?
Solution: i) The set of the constraints is the set of points on the line y = x included in the region below the line y = 2 − x, as shown in Figure 4.12.
FIGURE 4.12: Constraint set S
Writing the Karush-Kuhn-Tucker conditions. First, transform the problem into a maximization one:

max −f(x, y) = y − 2 + (x − 1)²  subject to  y − x = 0,  x + y − 2 ≤ 0.

Note that f and the constraints g1 and g2 are C∞ in R², where

g1(x, y) = y − x,  g2(x, y) = x + y − 2.

Thus, the associated Lagrangian is

L(x, y, α, β) = y − 2 + (x − 1)² − α(y − x) − β(x + y − 2)

and the Karush-Kuhn-Tucker conditions are

(1) Lx = 2(x − 1) + α − β = 0
(2) Ly = 1 − α − β = 0
(3) Lα = −(y − x) = 0
(4) β ≥ 0, with β = 0 if x + y − 2 < 0.

ii) Solving the KKT conditions.

∗ If x + y − 2 < 0, then β = 0 and
2(x − 1) + α = 0,  1 − α = 0,  y − x = 0  ⇒  (x, y) = (1/2, 1/2)  and  (α, β) = (1, 0).

∗∗ If x + y − 2 = 0, then

2(x − 1) + α − β = 0,  1 − α − β = 0,  y − x = 0,  x + y − 2 = 0  ⇒  (x, y) = (1, 1)  and  (α, β) = (1/2, 1/2).
So, there are two solutions of the KKT conditions: (1/2, 1/2) and (1, 1).

Regularity of the point (1/2, 1/2). Only the constraint g1(x, y) = y − x is active at (1/2, 1/2), and we have

g1′(x, y) = (−1, 1),  rank(g1′(1/2, 1/2)) = rank(−1, 1) = 1.

The point (1/2, 1/2) is a regular point.

Regularity of the point (1, 1). The two constraints are active at (1, 1). We have, for the matrix with rows g1′ and g2′,

(g1′; g2′)(x, y) = [[−1, 1], [1, 1]],  rank((g1′; g2′)(1, 1)) = 2.
Thus the point (1, 1) is a regular point.

iii) Conclusion. The two points are candidates for optimality. Comparing the values taken by f at these points gives:

f(1, 1) = 1,  f(1/2, 1/2) = 2 − 1/2 − 1/4 = 5/4 > 1,

so only (1, 1) is the candidate for minimality. However, it is not the minimum point. Indeed, we have

f(x, x) = 2 − x − (x − 1)² → −∞  as  x → −∞.

Therefore, f doesn't attain its minimal value.
iv) This doesn't contradict the theorem, since the KKT conditions only indicate where the possible extreme points are to be found when they exist.
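The unboundedness claimed in iii) is easy to illustrate numerically; the following sketch (not part of the original text) evaluates f along the feasible ray:

```python
# Along the feasible ray y = x (with x <= 1), f(x, x) = 2 - x - (x - 1)^2
# is unbounded below, so the KKT point (1, 1) is not a global minimizer.

def f(x, y):
    return 2 - y - (x - 1) ** 2

vals = [f(x, x) for x in (0.0, -10.0, -100.0, -1000.0)]
print(vals)
assert all(a > b for a, b in zip(vals, vals[1:]))   # strictly decreasing
```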
3. – Positivity constraints. Solve the problem by two methods:

max f(x, y) = 3 + x − y + xy  subject to  y − x² ≥ 0,  y ≤ 4,  x ≥ 0,  y ≥ 0,

i) using the extreme value theorem;  ii) using the KKT conditions.
Solution: i) EVT method. The constraint set, graphed in Figure 4.13, is

S = {(x, y) : 0 ≤ x ≤ 2,  x² ≤ y ≤ 4}.
FIGURE 4.13: Graph of z = 3 + x − y + xy on S
f is continuous (because it is a polynomial) on the set S, which is a bounded and closed subset of R². So f attains its absolute extreme values on S (by the extreme value theorem), either at the critical points located in the interior of S or on ∂S.
∗ Critical points of f: f has no critical point in the interior of S because

∇f(x, y) = (1 + y, −1 + x) = (0, 0)  ⇐⇒  (x, y) = (1, −1) ∉ S.

∗∗ Extreme values on ∂S: Let L1, L2 and L3 be the three parts of the boundary of S defined by:

L1 = {(x, x²), 0 ≤ x ≤ 2},  L2 = {(x, 4), 0 ≤ x ≤ 2},  L3 = {(0, y), 0 ≤ y ≤ 4}.

– On L1, we have f(x, x²) = 3 + x − x² + x³ = g(x), with g′(x) = 3x² − 2x + 1 > 0 on [0, 2], so g increases from g(0) = 3 to g(2) = 9 (Table 4.1).

TABLE 4.1: Variations of g(x) = 3 + x − x² + x³ on [0, 2]

Then, using Table 4.1, we deduce that

max_{L1} f = f(2, 4) = 9,  min_{L1} f = f(0, 0) = 3.
– On L2, we have f(x, 4) = 5x − 1 = h(x), with h′(x) = 5 > 0, so h increases from h(0) = −1 to h(2) = 9 (Table 4.2).

TABLE 4.2: Variations of h(x) = 5x − 1 on [0, 2]

From Table 4.2, the extreme values on this side are

max_{L2} f = f(2, 4) = 9,  min_{L2} f = f(0, 4) = −1.
– On L3, we have f(0, y) = 3 − y = φ(y), with φ′(y) = −1 < 0, so φ decreases from φ(0) = 3 to φ(4) = −1 (Table 4.3). Hence

max_{L3} f = f(0, 0) = 3,  min_{L3} f = f(0, 4) = −1.

TABLE 4.3: Variations of φ(y) = 3 − y on [0, 4]

∗∗∗ Conclusion: The maximal value of f on S is 9 and is attained at the point (2, 4). The minimal value of f on S is −1 and is attained at the point (0, 4).

ii) KKT conditions. Consider the Lagrangian

L(x, y, λ, μ) = 3 + x − y + xy − λ(x² − y) − μ(y − 4).
The Karush-Kuhn-Tucker conditions are

(1) Lx = 1 + y − 2λx ≤ 0  (= 0 if x > 0)
(2) Ly = −1 + x + λ − μ ≤ 0  (= 0 if y > 0)
(3) λ ≥ 0, with λ = 0 if x² < y
(4) μ ≥ 0, with μ = 0 if y < 4.

∗ If y < 4, then μ = 0.

◦ If x > 0, then (1) gives 1 + y = 2λx, so λ > 0 and hence y = x² > 0. From (1) and (2), we get

1 + x² − 2λx = 0  and  λ = 1 − x  ⇒  3x² − 2x + 1 = 0,

with no solution.

◦ If x = 0, then (1) gives 1 + y ≤ 0, which is impossible since y ≥ 0.
∗∗ If y = 4:

– Suppose x² < y; then λ = 0 and, by (1), we get 1 + 4 ≤ 0, which is not possible.

– Suppose y = x². We deduce that x = 2 or x = −2. The second value is not possible since x ≥ 0. For x = 2 > 0, we insert the values x = 2 and y = 4 into (1) and (2) and obtain

5 − 4λ = 0,  1 + λ − μ = 0  ⇒  (λ, μ) = (5/4, 9/4).

Note that both constraints are active at (2, 4), and if g(x, y) = (x² − y, y − 4), then

g′(x, y) = [[2x, −1], [0, 1]],  rank(g′(2, 4)) = rank([[4, −1], [0, 1]]) = 2.

Thus, (2, 4) is a regular point and, therefore, a candidate point. Moreover, (2, 4) solves the problem since the maximal value of f is attained on S by the EVT; see Figure 4.14.
FIGURE 4.14: Maximal value of z = 3 + x − y + xy on S
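Both methods above can be cross-checked by a crude grid search; the sketch below (not part of the original text; the grid step 1/50 is arbitrary) confirms the extreme values found:

```python
# Grid check on S = {0 <= x <= 2, x^2 <= y <= 4}: the maximum of
# f(x, y) = 3 + x - y + x*y should be 9 at (2, 4), the minimum -1 at (0, 4).

def f(x, y):
    return 3 + x - y + x * y

vals = [f(i/50, j/50) for i in range(101) for j in range(201)
        if (i/50) ** 2 <= j/50 <= 4]
print(max(vals), min(vals))
```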
4. – Application. Find the point of S = {(x, y) : x + y ≤ 0, x² − 4 ≤ 0} that lies closest to the point (2, 3) by following the steps below:
i) Formulate the problem as an optimization problem.
ii) Illustrate the problem graphically (Hint: use level curves).
iii) Write down the KKT conditions.
iv) Find all points that satisfy the KKT conditions. Check whether or not each point is regular.
v) What can you conclude about the solution of the problem?
Solution: i) The square of the distance between (x, y) and (2, 3) is given by (x − 2)² + (y − 3)². To find the point (x, y) ∈ S that lies closest to (2, 3), it is equivalent to solve the minimization problem

min (x − 2)² + (y − 3)²  subject to  g1(x, y) = x + y ≤ 0,  g2(x, y) = x² − 4 ≤ 0,

or to maximize the objective function f below subject to the two constraints:

max f(x, y) = −(x − 2)² − (y − 3)²  subject to  g1(x, y) = x + y ≤ 0,  g2(x, y) = x² − 4 ≤ 0.
ii) The feasible set, graphed in Figure 4.15, is also described by

S = {(x, y) : y ≤ −x, −2 ≤ x ≤ 2}.

The level curves of f, with equations (x − 2)² + (y − 3)² = k, where k ≥ 0, are circles centered at (2, 3) with radius √k; see Figure 4.16. If we increase the values of the radius, the values of f decrease. The first circle that intersects the set S will be the circle with radius equal to the distance of the point (2, 3) to the line y = −x. So, only the first constraint will be active in solving the optimization problem.

iii) Writing the KKT conditions. Consider the Lagrangian

L(x, y, λ, β) = −(x − 2)² − (y − 3)² − λ(x + y) − β(x² − 4).
FIGURE 4.15: Graph of z = (x − 2)² + (y − 3)² on S
The KKT conditions are

(1) ∂L/∂x = −2(x − 2) − λ − 2βx = 0
(2) ∂L/∂y = −2(y − 3) − λ = 0
(3) λ ≥ 0, with λ = 0 if x + y < 0
(4) β ≥ 0, with β = 0 if x² − 4 < 0.

iv) Solving the KKT conditions.

∗ If x + y < 0, then λ = 0. From (2), y = 3, and from (1), x(1 + β) = 2, so

x = 2/(1 + β) > 0.

Then x + y > 0, which contradicts x + y < 0.

∗∗ If x + y = 0, then:

– Suppose x² − 4 < 0; then β = 0 and

−2(x − 2) − λ = 0,  −2(y − 3) − λ = 0  ⇒  y = x + 1.

With x + y = 0, we deduce that (x, y) = (−1/2, 1/2). Note that (−1/2)² − 4 < 0 is satisfied and λ = 5 > 0.

– Suppose x² − 4 = 0. We deduce that x = 2 or x = −2. Then, inserting into (1) and (2), we obtain

(x, y) = (2, −2)  ⇒  λ + 4β = 0, 10 − λ = 0  ⇒  (λ, β) = (10, −5/2),

(x, y) = (−2, 2)  ⇒  8 − λ + 4β = 0, 2 − λ = 0  ⇒  (λ, β) = (2, −3/2),

both contradicting β ≥ 0.

So, the only solution of the system is (x*, y*) = (−1/2, 1/2) with (λ, β) = (5, 0).
Regularity of the candidate point (−1/2, 1/2). Note that only the constraint g1(x, y) = x + y is active at (−1/2, 1/2). We have

g1′(x, y) = (1, 1),  rank(g1′(−1/2, 1/2)) = rank(1, 1) = 1.

Thus the point (−1/2, 1/2) is a regular point.

v) Conclusion. The constraint set is an unbounded, closed, convex set, and the minimized function is coercive:

(x − 2)² + (y − 3)² = ‖(x, y) − (2, 3)‖² → +∞  as  ‖(x, y)‖ → +∞.

By Theorem 2.4.2, the distance function attains a minimum on S. Thus, the candidate found solves the problem.
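The solution can be cross-checked numerically; the sketch below (not part of the original text; the grid step 1/100 is arbitrary) scans the feasible region:

```python
# Grid check: the point of S = {x + y <= 0, -2 <= x <= 2} closest to (2, 3)
# should be (-1/2, 1/2), at squared distance 12.5.

def d2(x, y):
    return (x - 2) ** 2 + (y - 3) ** 2

best = min(
    (d2(i/100, j/100), i/100, j/100)
    for i in range(-200, 201) for j in range(-400, 201)
    if i/100 + j/100 <= 0
)
print(best)
```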
5. – Mixed constraints. Solve the problem

max x² + y² + z²  subject to  2x² + y² + z² = 1,  x + y + z ≤ 0.
Solution: Set U(x, y, z) = x² + y² + z², g(x, y, z) = x + y + z, and h(x, y, z) = 2x² + y² + z² − 1.

First, the maximization problem has a solution by the extreme value theorem. Indeed, U is continuous on the set

S = {(x, y, z) : x + y + z ≤ 0, 2x² + y² + z² = 1},

which is a closed and bounded subset of R³, as the intersection of the ellipsoid x²/(1/√2)² + y² + z² = 1 with the region below the plane x + y + z = 0. The plane passes through the center of the ellipsoid; see Figure 4.17.
FIGURE 4.17: S is the part of the ellipsoid below the plane

Next, the functions U, g and h are C¹ around each point (x, y, z) ∈ R³. We may then deduce the solution by using the Karush-Kuhn-Tucker conditions. The Lagrangian is given by

L(x, y, z, λ, μ) = x² + y² + z² − λ(x + y + z) − μ(2x² + y² + z² − 1),

and the necessary conditions to satisfy are:

(i)   Lx = 2x − λ − 4μx = 0  ⇐⇒  2x(1 − 2μ) = λ
(ii)  Ly = 2y − λ − 2μy = 0  ⇐⇒  2y(1 − μ) = λ
(iii) Lz = 2z − λ − 2μz = 0  ⇐⇒  2z(1 − μ) = λ
(iv)  Lμ = −(2x² + y² + z² − 1) = 0
(v)   λ ≥ 0, with λ = 0 if x + y + z < 0.
∗ If x + y + z < 0, then λ = 0, and from (i), (ii), (iii) and (iv), we deduce that

x = 0 or μ = 1/2;  y = 0 or μ = 1;  z = 0 or μ = 1;  2x² + y² + z² = 1.

We obtain the points

(0, 0, −1) and (0, −1, 0) with μ = 1,  and  (−1/√2, 0, 0) with μ = 1/2.

The active constraint at these points satisfies h′(x, y, z) = (4x, 2y, 2z), with

rank h′(0, −1, 0) = rank h′(0, 0, −1) = rank h′(−1/√2, 0, 0) = 1.

Thus, the points are regular and candidates for optimality.

∗ If x + y + z = 0, then:

• Suppose x = 0. We deduce from (iv) that

y² + z² = 1  and  y + z = 0,
and deduce the two candidate points

(x, y, z) = (0, 1/√2, −1/√2)  or  (0, −1/√2, 1/√2),  with (λ, μ) = (0, 1).

The two constraints are active at these points, and the matrix with rows g′ and h′,

(g′; h′)(x, y, z) = [[1, 1, 1], [4x, 2y, 2z]],

satisfies

rank (g′; h′)(0, 1/√2, −1/√2) = rank [[1, 1, 1], [0, √2, −√2]] = 2,
rank (g′; h′)(0, −1/√2, 1/√2) = rank [[1, 1, 1], [0, −√2, √2]] = 2.

The points satisfy the constraint qualification. They are regular points and candidates for optimality.
250
Introduction to the Theory of Optimization in Euclidean Space • Suppose x = 0. Then
– if μ = 1/2, then λ = 0. By (ii) and (iii), we have y = z = 0. Thus, from x + y + z = 0, we deduce x = 0 : contradiction with x = 0. – if μ = 1/2, then from (i), we have λ = 0. Moreover, by (ii) and (iii), we have μ = 1. So, by dividing each side of (ii) by each side of (iii), we obtain y = z. Then we deduce that x + 2y = 0
2x2 + 2y 2 = 1
2y(1 − μ) = λ = 2(−2y)(1 − 2μ)
1 y = ±√ 10 3 μ= . 5
=⇒ =⇒
With λ 0, the only possible point is: (x, y, z) =
1 1
2 −√ ,√ ,√ 10 10 10
4 3 (λ, μ) = ( √ , ). 5 10 5
with
It is clear also that the constraint qualification condition is satisfied, so the point is regular. g − rank ⎣ h − ⎡
⎤
√2 , √1 , √1 10 10 10
√2 , √1 , √1 10 10 10
⎦ = rank
1
− √810
1
1
√2 10
√2 10
= 2.
Conclusion: Finally, comparing the values of U at the candidate points,

U(−2/√10, 1/√10, 1/√10) = 3/5,  U(−1/√2, 0, 0) = 1/2,

U(0, 0, −1) = U(0, −1, 0) = U(0, 1/√2, −1/√2) = U(0, −1/√2, 1/√2) = 1,

we deduce that U attains its maximum value subject to the constraints at

(0, 0, −1),  (0, −1, 0),  (0, 1/√2, −1/√2)  and  (0, −1/√2, 1/√2).
4.3 Classification of Local Extreme Points-Inequality Constraints
To classify a candidate point x* for optimality of the problem

local max (min) f(x)  subject to  g(x) ≤ b,

with g = (g1, …, gm) and b = (b1, …, bm), we proceed as in the case of equality constraints by comparing the values taken by the Lagrangian

L(x, λ) = f(x) − λ1(g1(x) − b1) − ⋯ − λm(gm(x) − bm)

at points close to x*. Then, since x* ∈ [g(x) ≤ b] means that

x* ∈ ∩_{j ∉ I(x*)} [gj(x) < bj] = O  and  x* ∈ ∩_{j ∈ I(x*)} [gj(x) = bj],

we remark that, by working in a neighborhood of x* included in the open set O, we bring ourselves to solving a local optimization problem of equality-constraint type:

local max (min) f(x)  subject to  gj(x) = bj,  j ∈ I(x*).
Consequently, we can apply the second derivative test established for equality constraints by considering in the test only the constraints active at that point. In what follows, suppose we have:

Hypothesis (H): f and g = (g1, …, gm) are C² functions in a neighborhood of x* in R^n such that:

gj(x*) = bj if j ∈ I(x*) = {i1, …, ip},  λ*j = 0 if gj(x*) < bj,
rank(G′(x*)) = p,  ∇x L(x*, λ*) = 0,  p < n.

Then:

(−1)^p Br(x*) > 0  ∀r = p + 1, …, n  ⇒  x* is a strict local minimum point;
(−1)^r Br(x*) > 0  ∀r = p + 1, …, n  ⇒  x* is a strict local maximum point.
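The sign test lends itself to a small computational helper. The sketch below is hypothetical code of our own (the name `bordered_minors` is not from the text), and it assumes the variables are already ordered so that the leading bordered minors are the relevant ones:

```python
# Hypothetical helper implementing the sign test above: build the leading
# bordered minors B_r, r = p+1..n, of the matrix [[0, G], [G^T, H_L]] and
# inspect their signs.

def det(M):
    # Laplace expansion along the first row (fine for these small sizes).
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([r[:j] + r[j+1:] for r in M[1:]])
               for j in range(len(M)))

def bordered_minors(G, H):
    # G: p x n Jacobian of active constraints; H: n x n Hessian of L.
    p, n = len(G), len(H)
    minors = {}
    for r in range(p + 1, n + 1):
        B = ([[0.0] * p + row[:r] for row in G] +
             [[G[k][i] for k in range(p)] + H[i][:r] for i in range(r)])
        minors[r] = det(B)
    return minors

# Toy instance: f = xy on x + y <= 2 at the candidate (1, 1) with lambda = 1,
# so G = g'(1, 1) = (1, 1) and H_L = Hessian of xy.
minors = bordered_minors(G=[[1.0, 1.0]], H=[[0.0, 1.0], [1.0, 0.0]])
print(minors)
```

Here p = 1, n = 2 and (−1)² B₂ = 2 > 0, so the sufficient condition for a strict local maximum holds at (1, 1) in this toy instance.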
Proof. The proof follows the one seen for the case of equality constraints. We outline here the key modification that allows us to conclude with the previous proof. We assume that I(x*) ≠ {1, …, m}, to avoid the case of equality constraints. Note that the positivity of λ is not assumed in Hypothesis (H), in order to include both the maximization and minimization problems, as explained below. The Lagrangian introduced is used to link the values of f and g for comparison. Then, depending on its positivity or negativity on the tangent plane of the active constraints at that point, we identify whether we have a minimum or a maximum point.

Step 0: Suppose that we assign to the problems

max f:  L(x, α) = f(x) − α·(g(x) − b),  α ≥ 0,
min f:  L(x, β) = −f(x) − β·(g(x) − b),  β ≥ 0;

then

−L(x, β) = f(x) − (−β)·(g(x) − b),  with −β ≤ 0.

So, to consider the two problems simultaneously, we can introduce the Lagrangian

L(x, λ) = f(x) − λ·(g(x) − b)

with λ ≥ 0 (resp. ≤ 0) for the maximization (resp. minimization) problem.
Step 1: We have

x* ∈ ∩_{j ∉ I(x*)} [gj(x) < bj] ∩ ∩_{j ∈ I(x*)} [gj(x) = bj] ⊂ O.

Thus x* belongs to the open set O, so one can find ρ0 > 0 such that B_{ρ0}(x*) ⊂ O. Then, for h ∈ R^n such that x* + h ∈ B_{ρ0}(x*), we have from Taylor's formula, for some τ ∈ (0, 1),

L(x* + h, λ*) = L(x*, λ*) + Σ_{i=1}^{n} L_{xi}(x*, λ*) hi + (1/2) Σ_{i=1}^{n} Σ_{j=1}^{n} L_{xi xj}(x* + τh, λ*) hi hj.

By assumptions, we have

L_{xi}(x*, λ*) = 0,  i = 1, …, n,
L(x, λ) = f(x) − Σ_{j ∈ I(x*)} λj (gj(x) − bj),
g_{i1}(x*) − b_{i1} = g_{i2}(x*) − b_{i2} = ⋯ = g_{ip}(x*) − b_{ip} = 0;

then we have

L(x*, λ*) = f(x*) − λ*_{i1}(g_{i1}(x*) − b_{i1}) − ⋯ − λ*_{ip}(g_{ip}(x*) − b_{ip}) = f(x*),
L(x* + h, λ*) = f(x* + h) − λ*_{i1}(g_{i1}(x* + h) − b_{i1}) − ⋯ − λ*_{ip}(g_{ip}(x* + h) − b_{ip}),

from which we deduce

f(x* + h) − f(x*) = Σ_{k ∈ I(x*)} λ*k [gk(x* + h) − bk] + (1/2) Σ_{i=1}^{n} Σ_{j=1}^{n} L_{xi xj}(x* + τh, λ*) hi hj.

Using Taylor's formula for each gk, k ∈ I(x*), we obtain

gk(x* + h) − bk = gk(x* + h) − gk(x*) = Σ_{j=1}^{n} ∂gk/∂xj (x* + τk h) hj,  τk ∈ (0, 1).
Step 2: Consider the (p + n) × (p + n) bordered Hessian matrix B(x⁰, x¹, …, x^p) with

G(x¹, …, x^p) = [∂g_{ik}/∂xj (x^k)]_{p×n},

that is, the p × n matrix whose k-th row is (∂g_{ik}/∂x1 (x^k), …, ∂g_{ik}/∂xn (x^k)).

The remaining steps of the equality-constraints proof now carry over with the above notations.
Remark 4.3.1 If we introduce the notations:

Q(h) = Q(h1, …, hn) = Σ_{i=1}^{n} Σ_{j=1}^{n} L_{xi xj}(x*, λ*) hi hj,

the (p + n) × (p + n) bordered matrix

[[0_{p×p}, G′(x*)], [ᵗG′(x*), [L_{xi xj}(x*, λ*)]_{n×n}]],

M = {h ∈ R^n : G′(x*)·h = 0},

the theorem says that

Q(h) > 0 ∀h ∈ M, h ≠ 0  ⇒  x* is a strict local constrained minimum,
Q(h) < 0 ∀h ∈ M, h ≠ 0  ⇒  x* is a strict local constrained maximum.

It suffices then to study the positivity (negativity) of the quadratic form on the tangent plane M to the constraints gk(x) = bk, k ∈ I(x*), at x*.
Example 1. Solve the problem

local max (min) f(x, y) = xy  subject to  g(x, y) = x + y ≤ 2.

Solution: Consider the Lagrangian

L(x, y, λ) = f(x, y) − λ(g(x, y) − 2) = xy − λ(x + y − 2)
and the system

(i) Lx = y − λ = 0,  (ii) Ly = x − λ = 0,  (iii) λ = 0 if x + y < 2.

From (i) and (ii), we deduce that λ = x = y.

∗ If x + y < 2, then λ = 0. Thus (0, 0) is a candidate point, which is an interior point of [g ≤ 2]. To explore its nature, we use the second derivatives test for unconstrained problems. We have

Hf(x, y) = [[0, 1], [1, 0]],  D1(0, 0) = 0 and D2(0, 0) = −1 < 0.

Then (0, 0) is a saddle point.

∗∗ If x + y = 2, then (x, y) = (1, 1) is a candidate point with λ = 1.

First, (1, 1) is a regular point, since g′(x, y) = (1, 1) and rank[g′(1, 1)] = 1. Next, since n = 2 and p = 1, we have to consider the sign of the bordered Hessian determinant:

B2(1, 1) = det [[0, gx(1, 1), gy(1, 1)], [gx(1, 1), Lxx, Lxy], [gy(1, 1), Lyx, Lyy]] = det [[0, 1, 1], [1, 0, 1], [1, 1, 0]] = 2.

Since (−1)² B2(1, 1) = 2 > 0, we conclude that the point (1, 1) is a local maximum for the problem.

Finally, we also have:
Theorem 4.3.2 Necessary conditions for local constrained extreme points. If assumptions (H) hold, then:

(i) x* is a local minimum point ⇒ H_L = (L_{xi xj}(x*, λ*))_{n×n} is positive semi-definite on M:  ᵗy H_L y ≥ 0 ∀y ∈ M;

(ii) x* is a local maximum point ⇒ H_L = (L_{xi xj}(x*, λ*))_{n×n} is negative semi-definite on M:  ᵗy H_L y ≤ 0 ∀y ∈ M;

where M = {h ∈ R^n : G′(x*)·h = 0} is the tangent plane to the constraints gk(x) = bk, k ∈ I(x*), at x*.
Proof. Let x(t) ∈ C²[0, a], a > 0, be a curve on the constraint set g(x) ≤ b passing through x* at t = 0. Suppose that x* is a local maximum point for f subject to the constraint g(x) ≤ b. Then

f(x*) ≥ f(x(t))  ∀t ∈ [0, a),

or

f̂(0) = f(x*) ≥ f(x(t)) = f̂(t)  ∀t ∈ [0, a).

So f̂ is a one-variable function that has a local maximum at t = 0. Consequently, it satisfies f̂′(0) ≤ 0 and f̂′′(0) ≤ 0, or equivalently

∇f(x*)·x′(0) ≤ 0  and  d²/dt² f(x(t))|_{t=0} ≤ 0.

We have

d²/dt² f(x(t)) = ᵗx′(0) Hf(x*) x′(0) + ∇f(x*)·x′′(0).

Moreover, differentiating the relations gk(x(t)) = bk, k ∈ I(x*), twice, and denoting Λ* = (λ*_{i1}, …, λ*_{ip}), we obtain

ᵗx′(0) H_G(x*) x′(0) + ∇G(x*) x′′(0) = 0  ⇒  ᵗx′(0) ᵗΛ* H_G(x*) x′(0) + ᵗΛ* ∇G(x*) x′′(0) = 0.

Hence

0 ≥ d²/dt² f(x(t))|_{t=0} = [ᵗx′(0) Hf(x*) x′(0) + ∇f(x*)·x′′(0)] − [ᵗx′(0) ᵗΛ* H_G(x*) x′(0) + ᵗΛ* ∇G(x*) x′′(0)]
= ᵗx′(0) [Hf(x*) − ᵗΛ* H_G(x*)] x′(0) + [∇f(x*) − ᵗΛ* ∇G(x*)]·x′′(0)
= ᵗx′(0) [H_L(x*)] x′(0),  since ∇f(x*) − ᵗΛ* ∇G(x*) = 0,

and the result follows since x′(0) is an arbitrary element of M.

Example 2. Suppose that (4, 0) is a candidate satisfying the KKT conditions, where only the constraint g is active, with

g′(4, 0) = (−1, 0)

and the Hessian of the associated Lagrangian

H_{L(·, −8)}(4, 0) = [[−2, 0], [0, 14]].

Can (4, 0) be a local maximum or minimum of the constrained optimization problem?

Solution: The point (4, 0) is regular since rank(g′(4, 0)) = 1.
We have p = 1 < 2 = n. Then we can consider the following determinant (r = p + 1 = 2). (Note that the first component of g′(4, 0) is nonzero, so we do not have to renumber the variables.)

B2(4, 0) = det [[0, −1, 0], [−1, −2, 0], [0, 0, 14]] = −14.

We have (−1)¹ B2(4, 0) = 14 > 0 and λ = −8 ≤ 0. So the second derivatives test is satisfied and (4, 0) is a strict local minimum.

This also shows that the Hessian is positive definite under the constraint. Indeed, we can check this directly:

g′(4, 0)·(h, k) = −h + (0)k = 0  ⇒  h = 0.

Thus

M = {(0, k) : k ∈ R}  and  (0, k) [[−2, 0], [0, 14]] ᵗ(0, k) = 14k² ≥ 0  ∀(0, k) ∈ M.
Solved Problems
1. – Solve the problem

local min f(x, y) = x² + y²  s.t.  x + 2y ≥ 3,  2x − y ≥ 1.

Solution: The problem is equivalent to the maximization problem

local max −f(x, y) = −(x² + y²)  s.t.  −(x + 2y) ≤ −3,  −(2x − y) ≤ −1.
Consider the Lagrangian

L(x, y, λ1, λ2) = −f(x, y) − λ1(−(x + 2y) + 3) − λ2(−(2x − y) + 1) = −(x² + y²) + λ1(x + 2y − 3) + λ2(2x − y − 1).

The constraints are linear, so we can look for the candidate points by writing the Karush-Kuhn-Tucker conditions:

(i)   Lx = −2x + λ1 + 2λ2 = 0
(ii)  Ly = −2y + 2λ1 − λ2 = 0
(iii) λ1 ≥ 0, with λ1 = 0 if x + 2y > 3
(iv)  λ2 ≥ 0, with λ2 = 0 if 2x − y > 1.

We distinguish several cases:

• If 2x − y > 1, then λ2 = 0. From (i) and (ii), we deduce that λ1 = 2x = y. But then 2x − y = 2x − 2x = 0 ≯ 1. So, no solution.
• If 2x − y = 1, then:

– If x + 2y > 3, then λ1 = 0. From (i) and (ii), we deduce that λ2 = x = −2y. With 2x − y = 1, we deduce (x, y) = (2/5, −1/5). But x + 2y = 0 ≯ 3. So, no solution.

– If x + 2y = 3, then with 2x − y = 1 we have (x, y) = (1, 1), and (λ1, λ2) are such that

λ1 + 2λ2 = 2,  2λ1 − λ2 = 2  ⇐⇒  (λ1, λ2) = (6/5, 2/5).

Hence, the only solution point is (x, y) = (1, 1) with (λ1, λ2) = (6/5, 2/5).

Regularity of the point. The two constraints are active at the point. We have g = (g1, g2) = (−(x + 2y), −(2x − y)), with Jacobian

g′(x, y) = [[−1, −2], [−2, 1]]  ⇒  rank(g′(1, 1)) = 2.

Classification of the point. Since n = 2, p = 2, and p ≮ n, we can't apply the second derivatives test. Let us use a comparison argument to conclude. We have

L(x, y, 6/5, 2/5) = −(x² + y²) + (6/5)(x + 2y − 3) + (2/5)(2x − y − 1)
= −x² − y² + 2x + 2y − 4 = −(x − 1)² − (y − 1)² − 2 ≤ −2,

and, on the set of the constraints, we have

−f(x, y) ≤ −f(x, y) + (6/5)(x + 2y − 3) + (2/5)(2x − y − 1) = L(x, y, 6/5, 2/5).

Thus,

−f(x, y) ≤ −2 = −f(1, 1)  ∀(x, y) with x + 2y ≥ 3 and 2x − y ≥ 1.

Hence, (1, 1) is the minimum point; see Figure 4.18 for a geometric interpretation of the solution.
FIGURE 4.18: Local minimum of z = x² + y² on x + 2y ≥ 3 and 2x − y ≥ 1
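The comparison argument above can be cross-checked by a grid search; the sketch below (not part of the original text; the grid step 1/50 is arbitrary) scans the feasible region:

```python
# Grid check: f(x, y) = x^2 + y^2 on x + 2y >= 3, 2x - y >= 1 should be
# minimized at (1, 1) with value 2.

def f(x, y):
    return x*x + y*y

best = min(
    (f(i/50, j/50), i/50, j/50)
    for i in range(-200, 201) for j in range(-200, 201)
    if i/50 + 2*(j/50) >= 3 and 2*(i/50) - j/50 >= 1
)
print(best)
```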
2. – Classify the solutions of the problem

local max (min) f(x, y) = x²y + 3y − 4  s.t.  g(x, y) = 4 − xy ≤ 0.

Solution: i) Consider the Lagrangian

L(x, y, λ) = f(x, y) − λ(g(x, y) − 0) = x²y + 3y − 4 − λ(4 − xy)

and write the conditions

(1) Lx = 2xy − λ(−y) = 0  ⇐⇒  y(2x + λ) = 0
(2) Ly = x² + 3 − λ(−x) = 0
(3) λ = 0 if xy > 4.
∗ If xy > 4, then λ = 0, and with (2) we have x² + 3 = 0, which has no solution.

∗∗ If xy = 4, then x ≠ 0 and y ≠ 0. By (1), we deduce that λ = −2x which, inserted in (2), gives 3 − x² = 0. Thus, we have two solutions:

(x, y) = (√3, 4/√3)  with  λ = −2√3,
(x, y) = (−√3, −4/√3)  with  λ = 2√3.
ii) Constraint qualification. Note that g is C¹ in R², and every point of the constraint curve [g = 0] is regular; see Figure 4.19. Indeed, we have
FIGURE 4.19: Graph of z = x²y + 3y − 4 on xy ≥ 4
g′(x, y) = (−y, −x),  rank(g′(x, y)) = 1 for (x, y) ∈ [g = 0],

since g′(x, y) = (0, 0) ⇐⇒ (x, y) = (0, 0) and (0, 0) ∉ [g = 0]. In particular, (√3, 4/√3) and (−√3, −4/√3) are regular points of [g = 0]. Therefore, they are candidate points; see in Figure 4.20 the variations of the values of the function close to these points.

iii) Classification. With p = 1 (the number of active constraints) and n = 2 (the dimension of the space), r, taking values from p + 1 to n, must be equal to 2. So, consider the following determinant:

B2(x, y) = det [[0, gx, gy], [gx, Lxx, Lxy], [gy, Lyx, Lyy]] = det [[0, −y, −x], [−y, 2y, 2x + λ], [−x, 2x + λ, 0]].

∗ At (√3, 4/√3), we have λ = −2√3, so 2x + λ = 0 and

B2(√3, 4/√3) = det [[0, −4/√3, −√3], [−4/√3, 8/√3, 0], [−√3, 0, 0]] = −8√3.

Because λ = −2√3 ≤ 0 and (−1)¹ B2(√3, 4/√3) = 8√3 > 0, we deduce that (√3, 4/√3) is a local minimum.

∗ At (−√3, −4/√3), we have λ = 2√3, so 2x + λ = 0 and

B2(−√3, −4/√3) = det [[0, 4/√3, √3], [4/√3, −8/√3, 0], [√3, 0, 0]] = 8√3.

Because λ = 2√3 ≥ 0 and (−1)² B2(−√3, −4/√3) = 8√3 > 0, we deduce that (−√3, −4/√3) is a local maximum.
FIGURE 4.20: Local extrema of z = x²y + 3y − 4 on xy ≥ 4
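As a one-dimensional cross-check of this classification (a sketch, not part of the original text, under the simplifying assumption that it suffices to look along the active constraint curve), substitute y = 4/x into f:

```python
# On the active constraint xy = 4, substitute y = 4/x into f = x^2*y + 3y - 4
# to get phi(x) = 4x + 12/x - 4; its critical points are x = +-sqrt(3),
# matching the classification above (local min at sqrt(3), max at -sqrt(3)).

from math import sqrt

def phi(x):
    return 4*x + 12/x - 4

a, eps = sqrt(3.0), 1e-3
assert phi(a) <= phi(a - eps) and phi(a) <= phi(a + eps)       # local min
assert phi(-a) >= phi(-a - eps) and phi(-a) >= phi(-a + eps)   # local max
print(phi(a), phi(-a))
```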
3. – Solve the problem

local min f(x, y, z) = x² + y² + z²  s.t.  g1(x, y, z) = x + 2y + z ≥ 30,  g2(x, y, z) = 2x − y − 3z ≤ 10.

Solution: Note that f, g1 and g2 are C¹ in R³. The problem is equivalent to the maximization problem

local max −f = −(x² + y² + z²)  s.t.  −g1 = −(x + 2y + z) ≤ −30,  g2 = 2x − y − 3z ≤ 10.
Consider the Lagrangian

L(x, y, z, λ1, λ2) = −(x² + y² + z²) + λ1(x + 2y + z − 30) − λ2(2x − y − 3z − 10).

Because the constraints are linear, the local candidate points satisfy the KKT conditions:

(i)   Lx = −2x + λ1 − 2λ2 = 0
(ii)  Ly = −2y + 2λ1 + λ2 = 0
(iii) Lz = −2z + λ1 + 3λ2 = 0
(iv)  λ1 ≥ 0, with λ1 = 0 if x + 2y + z > 30
(v)   λ2 ≥ 0, with λ2 = 0 if 2x − y − 3z < 10.
From the first three equations, we deduce that
   x = (1/2)λ1 − λ2,   y = λ1 + (1/2)λ2,   z = (1/2)λ1 + (3/2)λ2.
We distinguish several cases:

* If x + 2y + z = 30 and 2x − y − 3z = 10, then inserting the expressions of x, y and z above into the two equations gives

   3λ1 + (3/2)λ2 = 30,   −(3/2)λ1 − 7λ2 = 10   ⟺   λ1 = 12, λ2 = −4,

which contradicts λ2 ≥ 0.
** If x + 2y + z = 30 and 2x − y − 3z < 10, then λ2 = 0 and

   (x, y, z) = λ1 (1/2, 1, 1/2),

which inserted into the equation x + 2y + z = 30 gives λ1 = 10 and (x, y, z) = (5, 10, 5). We have 2x − y − 3z = 2(5) − 10 − 3(5) = −15 < 10. So the point

   (x, y, z) = (5, 10, 5)   with   λ1 = 10, λ2 = 0
is a candidate for optimality.

Now, let us study the nature of the point (5, 10, 5). For this we use the second derivatives test, since f, g1 and g2 are C² around this point. Here n = 3 and p = 1 (only the constraint g1 is active), so r takes the values p + 1 = 2 to n = 3. First, consider the matrix

   g′(x, y, z) = (−∂g1/∂x, −∂g1/∂y, −∂g1/∂z) = (−1, −2, −1).

Then rank(g′(x, y, z)) = 1. Moreover, the first entry of g′(5, 10, 5) is nonzero, so we don't have to renumber the variables. Next, we consider the signs of the bordered Hessian determinants:

   B2(5, 10, 5) = det [ 0, −∂g1/∂x, −∂g1/∂y ; −∂g1/∂x, Lxx, Lxy ; −∂g1/∂y, Lyx, Lyy ]
                = det [ 0, −1, −2 ; −1, −2, 0 ; −2, 0, −2 ] = 10.
   B3(5, 10, 5) = det [ 0, −∂g1/∂x, −∂g1/∂y, −∂g1/∂z ;
                        −∂g1/∂x, Lxx, Lxy, Lxz ;
                        −∂g1/∂y, Lyx, Lyy, Lyz ;
                        −∂g1/∂z, Lzx, Lzy, Lzz ]
                = det [ 0, −1, −2, −1 ; −1, −2, 0, 0 ; −2, 0, −2, 0 ; −1, 0, 0, −2 ] = −24.

Here, the partial derivatives of g1 are evaluated at the point (5, 10, 5) and the second partial derivatives of L are evaluated at the point (5, 10, 5, 10, 0). We have

   (−1)² B2(5, 10, 5) = 10 > 0   and   (−1)³ B3(5, 10, 5) = 24 > 0.

We conclude that the point (5, 10, 5) is a local maximum of the maximization problem, or equivalently, a local minimum of the minimization problem.
*** If x + 2y + z > 30 and 2x − y − 3z = 10, then λ1 = 0 and

   (x, y, z) = λ2 (−1, 1/2, 3/2),

which inserted into the equation 2x − y − 3z = 10 gives −7λ2 = 10, i.e. λ2 = −10/7 < 0: contradiction.

**** If x + 2y + z > 30 and 2x − y − 3z < 10, then λ1 = λ2 = 0. So (x, y, z) = (0, 0, 0), which contradicts the first inequality.

Conclusion: The minimization problem has one local minimum, at the point (5, 10, 5).
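The candidate can be cross-checked numerically; a sketch using SciPy (SLSQP handles the two linear inequality constraints):

```python
from scipy.optimize import minimize

# min x^2 + y^2 + z^2  s.t.  x + 2y + z >= 30  and  2x - y - 3z <= 10
res = minimize(
    lambda v: v[0]**2 + v[1]**2 + v[2]**2,
    x0=[1.0, 1.0, 1.0],
    constraints=[
        {"type": "ineq", "fun": lambda v: v[0] + 2 * v[1] + v[2] - 30},     # x+2y+z-30 >= 0
        {"type": "ineq", "fun": lambda v: 10 - (2 * v[0] - v[1] - 3 * v[2])},  # 10-(2x-y-3z) >= 0
    ],
)
print([round(t, 3) for t in res.x])  # close to [5.0, 10.0, 5.0], with f = 150
```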
4. – Classify the candidates of the problem

   local max (min) f(x, y, z) = x + y + z   s.t.   g1 = x² + y² + z² = 1,   g2 = x − y − z ≥ 1.
Solution: i) Note that f, g1 and g2 are C^∞ in R³ and consider the Lagrangian

   L(x, y, z, λ1, λ2) = f(x, y, z) − λ1(g1(x, y, z) − 1) − λ2(1 − g2(x, y, z))
                      = x + y + z − λ1(x² + y² + z² − 1) + λ2(x − y − z − 1),

and let us look for the solutions of the system

   (1) Lx = 1 − 2xλ1 + λ2 = 0
   (2) Ly = 1 − 2yλ1 − λ2 = 0
   (3) Lz = 1 − 2zλ1 − λ2 = 0
   (4) Lλ1 = −(x² + y² + z² − 1) = 0
   (5) λ2 ≥ 0, with λ2 = 0 if x − y − z > 1.
From the first three equations, we deduce that

   −λ2 = 1 − 2xλ1,   λ1(x + y) = 1,   λ1(x + z) = 1.

Note that λ1 = 0 is not possible: we would then have λ2 = −1 from (1) and λ2 = 1 from (2). So λ1 ≠ 0 and

   x + y = x + z = 1/λ1   ⟹   y = z.

* If x − y − z > 1, then λ2 = 0. From (1), 1 = 2xλ1, thus x + y = 1/λ1 = 2x. So x = y = z, which inserted into (4) gives 3x² = 1. Hence, we have two points

   (x, y, z) = (1/√3, 1/√3, 1/√3)   or   (−1/√3, −1/√3, −1/√3).
But they do not satisfy x − y − z > 1, so both are rejected.

* If x − y − z = 1, then with y = z and (4), we have

   x = 1 + 2y   and   2y(3y + 2) = 0   ⟺   (x, y) = (1, 0)   or   (x, y) = (−1/3, −2/3).

We deduce the candidates

   (x, y, z) = (1, 0, 0)            with   λ1 = 1,  λ2 = 1;
   (x, y, z) = (−1/3, −2/3, −2/3)   with   λ1 = −1, λ2 = −1/3.
ii) Regularity of the points. We have

   g′(x, y, z) = [ ∂g1/∂x, ∂g1/∂y, ∂g1/∂z ; −∂g2/∂x, −∂g2/∂y, −∂g2/∂z ] = [ 2x, 2y, 2z ; −1, 1, 1 ],

so

   g′(1, 0, 0) = [ 2, 0, 0 ; −1, 1, 1 ],   g′(−1/3, −2/3, −2/3) = [ −2/3, −4/3, −4/3 ; −1, 1, 1 ].

Then
   rank(g′(1, 0, 0)) = rank(g′(−1/3, −2/3, −2/3)) = 2.

The two points are regular. Moreover, the first two column vectors of each matrix are linearly independent, so we will not renumber the variables.

iii) Classification of the points. Now, let us study the nature of the points (1, 0, 0) and (−1/3, −2/3, −2/3). For this we use the second derivatives test, since f, g1 and g2 are C² around these points. With n = 3 and p = 2 active constraints, only r = 3 occurs, and we examine the sign of the bordered Hessian determinant

   B3(x, y, z) = det [ 0, 0, ∂g1/∂x, ∂g1/∂y, ∂g1/∂z ;
                       0, 0, −∂g2/∂x, −∂g2/∂y, −∂g2/∂z ;
                       ∂g1/∂x, −∂g2/∂x, Lxx, Lxy, Lxz ;
                       ∂g1/∂y, −∂g2/∂y, Lyx, Lyy, Lyz ;
                       ∂g1/∂z, −∂g2/∂z, Lzx, Lzy, Lzz ]
               = det [ 0, 0, 2x, 2y, 2z ; 0, 0, −1, 1, 1 ; 2x, −1, −2λ1, 0, 0 ; 2y, 1, 0, −2λ1, 0 ; 2z, 1, 0, 0, −2λ1 ].

The first partial derivatives of g1 and g2 are evaluated at (x, y, z); the second partial derivatives of L are evaluated at (x, y, z, λ1, λ2).

* At (1, 0, 0), with λ1 = 1 and λ2 = 1, we have

   B3(1, 0, 0) = det [ 0, 0, 2, 0, 0 ; 0, 0, −1, 1, 1 ; 2, −1, −2, 0, 0 ; 0, 1, 0, −2, 0 ; 0, 1, 0, 0, −2 ] = −16,

so (−1)³ B3 = 16 > 0.
We conclude that the point (1, 0, 0) is a local maximum of the constrained optimization problem (λ2 ≥ 0 and (−1)³ B3 > 0).

** At (−1/3, −2/3, −2/3), with λ1 = −1 and λ2 = −1/3, we have

   B3(−1/3, −2/3, −2/3) = det [ 0, 0, −2/3, −4/3, −4/3 ; 0, 0, −1, 1, 1 ; −2/3, −1, 2, 0, 0 ; −4/3, 1, 0, 2, 0 ; −4/3, 1, 0, 0, 2 ] = 16,

so (−1)² B3 = 16 > 0.
We conclude that the point (−1/3, −2/3, −2/3) is a local minimum of the constrained optimization problem (λ2 ≤ 0 and (−1)² B3 > 0).

iv) The constraint set is a closed bounded subset of R³, as it is the intersection of the unit sphere [g1 = 1] and the closed region [g2 ≥ 1]. By the extreme value theorem, f attains its extreme values on this set, so the local extreme points found above are also the global extreme points. Hence

   max over {g1 = 1, g2 ≥ 1} of f = f(1, 0, 0) = 1,   min over {g1 = 1, g2 ≥ 1} of f = f(−1/3, −2/3, −2/3) = −5/3.
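Both global values can be cross-checked numerically; a sketch with SciPy, treating the sphere as an equality constraint:

```python
from scipy.optimize import minimize

cons = [
    {"type": "eq",   "fun": lambda v: v[0]**2 + v[1]**2 + v[2]**2 - 1},  # g1 = 1
    {"type": "ineq", "fun": lambda v: v[0] - v[1] - v[2] - 1},           # g2 >= 1
]
f = lambda v: v[0] + v[1] + v[2]
hi = minimize(lambda v: -f(v), x0=[0.8, -0.3, -0.3], constraints=cons)  # maximize f
lo = minimize(f, x0=[-0.2, -0.6, -0.6], constraints=cons)               # minimize f
print(round(-hi.fun, 4), round(lo.fun, 4))  # close to 1.0 and -1.6667
```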
5. – Classify the candidates of the problem

   local max (min) f(x, y, z) = x + y + z   s.t.   g1 = x² + y² + z² = 1,   g2 = x − y − z ≤ 1.
Solution: i) Note that f, g1 and g2 are C^∞ in R³ and consider the Lagrangian

   L(x, y, z, λ1, λ2) = f(x, y, z) − λ1(g1(x, y, z) − 1) − λ2(g2(x, y, z) − 1)
                      = x + y + z − λ1(x² + y² + z² − 1) − λ2(x − y − z − 1).

We look for the solutions of the system

   (1) Lx = 1 − 2xλ1 − λ2 = 0
   (2) Ly = 1 − 2yλ1 + λ2 = 0
   (3) Lz = 1 − 2zλ1 + λ2 = 0
   (4) Lλ1 = −(x² + y² + z² − 1) = 0
   (5) λ2 ≥ 0, with λ2 = 0 if x − y − z < 1.
From the first three equations, we deduce that

   λ2 = 1 − 2xλ1,   λ1(x + y) = 1,   λ1(x + z) = 1.

Note that λ1 = 0 is not possible: we would then have λ2 = 1 from (1) and λ2 = −1 from (2). So λ1 ≠ 0 and

   x + y = x + z = 1/λ1   ⟹   y = z.

* If x − y − z < 1, then λ2 = 0. We deduce that 1 = 2xλ1, thus x + y = 1/λ1 = 2x. So x = y = z, which inserted into (4) gives 3x² = 1. Hence, we have two solutions:

   (x, y, z) = (1/√3, 1/√3, 1/√3)      with   λ1 = √3/2,  λ2 = 0;
   (x, y, z) = (−1/√3, −1/√3, −1/√3)   with   λ1 = −√3/2, λ2 = 0.

Both satisfy x − y − z = ∓1/√3 < 1, so both are accepted.
* If x − y − z = 1, then with y = z and (4), we have

   x = 1 + 2y   and   2y(3y + 2) = 0   ⟺   (x, y) = (1, 0)   or   (x, y) = (−1/3, −2/3).

We deduce the candidates

   (x, y, z) = (1, 0, 0)            with   λ1 = 1,  λ2 = −1;
   (x, y, z) = (−1/3, −2/3, −2/3)   with   λ1 = −1, λ2 = 1/3.

ii) Regularity of the points. At (±1/√3, ±1/√3, ±1/√3), only g1 is active, and

   g1′(1/√3, 1/√3, 1/√3) = (2/√3, 2/√3, 2/√3) = −g1′(−1/√3, −1/√3, −1/√3),

so rank(g1′) = 1 at both points. At (1, 0, 0) and (−1/3, −2/3, −2/3), both constraints are active, and

   g′(x, y, z) = [ ∂g1/∂x, ∂g1/∂y, ∂g1/∂z ; ∂g2/∂x, ∂g2/∂y, ∂g2/∂z ] = [ 2x, 2y, 2z ; 1, −1, −1 ],

   g′(1, 0, 0) = [ 2, 0, 0 ; 1, −1, −1 ],   g′(−1/3, −2/3, −2/3) = [ −2/3, −4/3, −4/3 ; 1, −1, −1 ].
Then

   rank(g′(1, 0, 0)) = rank(g′(−1/3, −2/3, −2/3)) = 2.

The four points are regular. Moreover, we will not have to renumber the variables, since the first two column vectors of each derivative above are linearly independent.

iii) Classification of the points (±1/√3, ±1/√3, ±1/√3). Here n = 3 and p = 1, so we have to consider the signs of the bordered Hessian determinants

   B2 = det [ 0, 2x, 2y ; 2x, −2λ1, 0 ; 2y, 0, −2λ1 ],
   B3 = det [ 0, 2x, 2y, 2z ; 2x, −2λ1, 0, 0 ; 2y, 0, −2λ1, 0 ; 2z, 0, 0, −2λ1 ].

We have

   B2(1/√3, 1/√3, 1/√3) = 8/√3,   B3(1/√3, 1/√3, 1/√3) = −12,

so (−1)^r Br(1/√3, 1/√3, 1/√3) > 0 for r = 2, 3. Thus the point is a local maximum, since λ1 = √3/2 > 0 and (−1)² B2 > 0, (−1)³ B3 > 0. Similarly,

   B2(−1/√3, −1/√3, −1/√3) = −8/√3,   B3(−1/√3, −1/√3, −1/√3) = −12,

so (−1)¹ Br(−1/√3, −1/√3, −1/√3) > 0 for r = 2, 3. Thus the point is a local minimum, since λ1 = −√3/2 < 0 and (−1)¹ B2 > 0, (−1)¹ B3 > 0.

iv) Classification of the points (1, 0, 0) and (−1/3, −2/3, −2/3). Here n = 3 and p = 2, so we have to consider the sign of the bordered Hessian determinant:
   B3(x, y, z) = det [ 0, 0, 2x, 2y, 2z ; 0, 0, 1, −1, −1 ; 2x, 1, −2λ1, 0, 0 ; 2y, −1, 0, −2λ1, 0 ; 2z, −1, 0, 0, −2λ1 ].

* At (1, 0, 0), with λ1 = 1 and λ2 = −1, we have B3(1, 0, 0) = −16. We conclude that the point cannot be a local maximum, because λ2 = −1 ≤ 0. It cannot be a local minimum either, because the Hessian of L is not positive semidefinite on the tangent plane

   M = { (h, k, l) : 2h = 0, h − k − l = 0 } = { k(0, 1, −1) : k ∈ R },

since

   (0, k, −k) · diag(−2, −2, −2) · (0, k, −k)ᵀ = −4k² ≤ 0 on M.
** At (−1/3, −2/3, −2/3), with λ1 = −1 and λ2 = 1/3, we have B3(−1/3, −2/3, −2/3) = 16. We conclude that the point cannot be a local minimum, because λ2 = 1/3 ≥ 0. It cannot be a local maximum either, because the Hessian of L is not negative semidefinite on the tangent plane

   M = { (h, k, l) : −(2/3)h − (4/3)k − (4/3)l = 0, h − k − l = 0 } = { k(0, 1, −1) : k ∈ R },

since

   (0, k, −k) · diag(2, 2, 2) · (0, k, −k)ᵀ = 4k² ≥ 0 on M.
v) The constraint set is a closed bounded subset of R³, as it is the intersection of the unit sphere [g1 = 1] and the closed region [g2 ≤ 1]. By the extreme value theorem, f attains its extreme values on this set. Hence

   max over {g1 = 1, g2 ≤ 1} of f = f(1/√3, 1/√3, 1/√3) = √3   and   min over {g1 = 1, g2 ≤ 1} of f = f(−1/√3, −1/√3, −1/√3) = −√3.
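A quick Monte Carlo sanity check of these global values (a sketch; the sample maximum approaches √3 from below):

```python
import numpy as np

# Sample the feasible set {x^2 + y^2 + z^2 = 1, x - y - z <= 1} and compare with +-sqrt(3).
rng = np.random.default_rng(0)
pts = rng.normal(size=(200000, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)      # uniform points on the unit sphere
feas = pts[pts[:, 0] - pts[:, 1] - pts[:, 2] <= 1.0]   # keep the feasible part
vals = feas.sum(axis=1)                                # f = x + y + z
print(round(vals.max(), 2), round(vals.min(), 2))      # close to 1.73 and -1.73
```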
4.4 Global Extreme Points-Inequality Constraints
When the Lagrangian is concave/convex on a convex constraint set, a solution of the Karush-Kuhn-Tucker conditions is a global maximum/minimum point.
Theorem 4.4.1 Let Ω ⊂ Rⁿ be an open set, let f, g1, . . . , gm : Ω → R be C¹ functions, let S ⊂ Ω be convex, let x* ∈ S, and set

   L(x, λ) = f(x) − λ1(g1(x) − b1) − . . . − λm(gm(x) − bm).

Suppose there exists λ* = (λ1*, . . . , λm*) such that

   ∇x L(x*, λ*) = 0,   with   λj* = 0 if gj(x*) < bj,   j = 1, . . . , m.

Then we have:

   λ* ≥ 0 and L(·, λ*) concave in x ∈ S   ⟹   f(x*) = max of f(x) over S ∩ {x ∈ Ω : g(x) ≤ b};
   λ* ≤ 0 and L(·, λ*) convex in x ∈ S    ⟹   f(x*) = min of f(x) over S ∩ {x ∈ Ω : g(x) ≤ b}.
Proof. i) First implication. The point x* is a critical point of the Lagrangian L(·, λ*) (∇x L(x*, λ*) = 0) and L(·, λ*) is concave on the convex set S, so x* is a global maximum of L(·, λ*) on S (by Theorem 2.3.4). Thus, for all x ∈ S,

   L(x*, λ*) = f(x*) − λ1*(g1(x*) − b1) − . . . − λm*(gm(x*) − bm)
             ≥ f(x) − λ1*(g1(x) − b1) − . . . − λm*(gm(x) − bm) = L(x, λ*).

At x*, we have λj* ≥ 0 with λj* = 0 if gj(x*) < bj, j = 1, . . . , m, so

   −λj*(gj(x*) − bj) = 0,   j = 1, . . . , m,

and the previous inequality reduces to

   L(x*, λ*) = f(x*) ≥ f(x) − λ1*(g1(x) − b1) − . . . − λm*(gm(x) − bm) = L(x, λ*).

For each j = 1, . . . , m and each feasible x, we also have λj* ≥ 0 and gj(x) − bj ≤ 0, so −λj*(gj(x) − bj) ≥ 0. Therefore,

   L(x*, λ*) = f(x*) ≥ L(x, λ*) ≥ f(x)   for all x ∈ S ∩ {x ∈ Ω : g(x) ≤ b}.
Hence x* solves the constrained problem.

ii) Second implication. This part is deduced similarly. Moreover, it suggests, when looking for candidates for a maximization problem for example, that we keep the points with negative Lagrange multipliers and check whether they are global minimum points, without maximizing (−f) and introducing another Lagrangian.

Example 1. Solve the problem

   min (max) f(x, y, z) = x² + y² + z²   s.t.   g(x, y, z) = x − 2z ≤ −5.

Solution: Form the Lagrangian using the C^∞ functions f and g on R³:

   L(x, y, z, λ) = x² + y² + z² − λ(x − 2z + 5).

Let us solve the system

   (i)   Lx = 2x − λ = 0
   (ii)  Ly = 2y = 0
   (iii) Lz = 2z + 2λ = 0
   (iv)  λ = 0 if x − 2z + 5 < 0.

* If x − 2z + 5 < 0, then λ = 0. From (i), (ii) and (iii), we deduce (x, y, z) = (0, 0, 0); but then the inequality x − 2z + 5 < 0 is not satisfied.

** If x − 2z + 5 = 0, then using (i), (ii) and (iii), we obtain

   λ = 2x = −z,   y = 0,   x − 2z + 5 = 0   ⟺   (x, y, z) = (−1, 0, 2) with λ = −2,

which is the only candidate point.
Now, we study the convexity/concavity of L in (x, y, z) when λ = −2. We have

   H_{L(·,−2)}(x, y, z) = diag(2, 2, 2).

The leading principal minors satisfy, for all (x, y, z) ∈ R³,

   D1 = 2 > 0,   D2 = 4 > 0,   D3 = 8 > 0.

Hence L(·, −2) is strictly convex in (x, y, z), and we conclude that the point (−1, 0, 2) is the solution of the constrained minimization problem. The maximization problem has no solution, since there is only one solution of the system and it is a global minimum point.

Interpretation. The problem looks for the shortest and farthest distances from the origin to the region of space where x − 2z + 5 ≤ 0. The shortest distance is attained on the boundary plane; the farthest does not exist.
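Since the origin is infeasible, the minimizer is simply the projection of the origin onto the boundary plane; a quick numerical sketch:

```python
import numpy as np

a = np.array([1.0, 0.0, -2.0])   # constraint: a . (x, y, z) <= -5
b = -5.0
# The origin violates a . 0 = 0 <= -5, so the minimizer of ||x||^2 over the
# half-space is the projection of the origin onto the plane a . x = b.
x_star = (b / a.dot(a)) * a
print([round(v, 4) for v in x_star], round(np.linalg.norm(x_star), 4))
```

This returns the point (−1, 0, 2) at distance √5 ≈ 2.2361 from the origin, matching the candidate above.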
Remark 4.4.1 The rank condition, at the point x∗ , is not assumed in the theorem. The proof uses the characterization of a C 1 convex function on a convex set only.
Example 2. In Example 4, Section 4.2, the point (1, 1) doesn't satisfy the rank condition. It solves the KKT conditions related to the problem with linear constraints:

   max F(x, y) = ln x + ln y   subject to   2x + y ≤ 3,   x + 2y ≤ 3,   x + y ≤ 2,   with   x > 0, y > 0.
Use concavity to show that (1, 1) solves the problem.

Solution: i) With the Lagrangian

   L(x, y, λ1, λ2, λ3) = ln x + ln y − λ1(2x + y − 3) − λ2(x + 2y − 3) − λ3(x + y − 2),

the Hessian with respect to (x, y),

   H_{L(·,λ1,λ2,λ3)}(x, y) = diag(−1/x², −1/y²),

is negative definite, since the leading principal minors satisfy

   D1(x, y) = −1/x² < 0,   D2(x, y) = 1/(x²y²) > 0

for (x, y) ∈ Ω = (0, +∞) × (0, +∞). So the Lagrangian is strictly concave in (x, y) ∈ Ω, and (1, 1) is the maximum point.
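The conclusion can also be confirmed by handing the problem to a solver (a sketch; the bounds keep the iterates in Ω):

```python
import numpy as np
from scipy.optimize import minimize

res = minimize(
    lambda v: -(np.log(v[0]) + np.log(v[1])),   # maximize F = ln x + ln y
    x0=[0.5, 0.5],
    bounds=[(1e-6, None), (1e-6, None)],        # x > 0, y > 0
    constraints=[
        {"type": "ineq", "fun": lambda v: 3 - 2 * v[0] - v[1]},
        {"type": "ineq", "fun": lambda v: 3 - v[0] - 2 * v[1]},
        {"type": "ineq", "fun": lambda v: 2 - v[0] - v[1]},
    ],
)
print([round(t, 3) for t in res.x])  # close to [1.0, 1.0]
```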
Remark 4.4.2 The concavity/convexity hypothesis is a sufficient condition. We may have a global extreme point with a Lagrangian that is neither concave nor convex (see Exercise 3).
Example 3. In Exercise 2, Section 4.3, the points

   (√3, 4/√3) with λ = −2√3   and   (−√3, −4/√3) with λ = 2√3

solve respectively the local min and local max problems

   local max (min) f(x, y) = x²y + 3y − 4   s.t.   g(x, y) = 4 − xy ≤ 0.

Are there global extreme points?

Solution: i) Let us explore the concavity and convexity, with respect to (x, y), of

   L(x, y, λ) = x²y + 3y − 4 + λ(xy − 4).

The Hessian matrix of L in (x, y) is

   H_L = [ Lxx, Lxy ; Lyx, Lyy ] = [ 2y, 2x + λ ; 2x + λ, 0 ].

When λ = 2√3, the principal minors are

   Δ1¹ = Lyy = 0,   Δ1² = Lxx = 2y   and   Δ2 = −(2x + 2√3)²,

so L is neither concave nor convex in (x, y) ∈ R². Similarly, when λ = −2√3, the principal minors are

   Δ1¹ = Lyy = 0,   Δ1² = Lxx = 2y   and   Δ2 = −(2x − 2√3)²,

and L is again neither concave nor convex in (x, y). Therefore, we cannot use this sufficient condition to conclude anything about the global optimality of the candidate points.
ii) Note that, on the boundary [xy = 4] of the constraint set, we have y = 4/x and f takes the values

   f(x, 4/x) = 4x + 12/x − 4,

with

   lim as x → +∞ of f(x, 4/x) = +∞   and   lim as x → −∞ of f(x, 4/x) = −∞.

Hence f attains neither an absolute maximum nor an absolute minimum value on the constraint set.

Remark. Note that

   f(√3, 4/√3) = 24/√3 − 4   and   f(−√3, −4/√3) = −24/√3 − 4.

With f(√3, 4/√3) > f(−√3, −4/√3), (√3, 4/√3) being a local minimum and (−√3, −4/√3) being a local maximum, we can see directly that (√3, 4/√3) cannot be a global minimum and (−√3, −4/√3) cannot be a global maximum. A constrained global extreme point would necessarily be a local one, since every point of the boundary xy = 4 of the constraint set is regular.

Example 4. Quadratic programming. The general quadratic program (QP) can be formulated as

   min (1/2) xᵀQx + dᵀx   s.t.   Ax ≤ b,

where Q is a symmetric n × n matrix, d ∈ Rⁿ, b ∈ R^m and A is an m × n matrix. Introduce the Lagrangian

   L(x, λ) = −((1/2) xᵀQx + dᵀx) − λ(Ax − b)

and write the KKT conditions

   ∇x L = −Qx − d − Aᵀλ = 0,   λi ≥ 0 with λi = 0 if (Ax)i < bi.

If (x*, λ*) is a solution of the KKT conditions, and x* is a candidate point where p constraints are active, (Ax)_{i_k} = b_{i_k}, k = 1, . . . , p, then the second derivatives test at the point decides whether the point is a solution or not, since the Hessian of L in x is the constant matrix −Q and the constraints are linear. Testing the definiteness of the Hessian subject to these constraints amounts to testing the bordered determinants formed from the matrix

   [ 0, A_p ; A_pᵀ, Q ],

where A_p is the p × n matrix whose rows are the active rows a_{i_1}, . . . , a_{i_p} of A.
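For a small QP, the KKT conditions can even be solved directly by enumerating active sets; a minimal sketch (the data Q, d, A, b at the bottom are an illustrative example, not taken from the text):

```python
import itertools
import numpy as np

def solve_qp(Q, d, A, b):
    """Brute-force KKT solver for min 1/2 x'Qx + d'x s.t. Ax <= b (Q positive definite)."""
    n, m = Q.shape[0], A.shape[0]
    best = None
    for k in range(m + 1):
        for S in itertools.combinations(range(m), k):   # candidate active set
            As = A[list(S)]
            # Stationarity Qx + d + As' mu = 0 together with the active rows As x = b_S.
            K = np.block([[Q, As.T], [As, np.zeros((k, k))]]) if k else Q
            rhs = np.concatenate([-d, b[list(S)]]) if k else -d
            try:
                sol = np.linalg.solve(K, rhs)
            except np.linalg.LinAlgError:
                continue
            x, mu = sol[:n], sol[n:]
            if np.all(A @ x <= b + 1e-9) and np.all(mu >= -1e-9):  # feasible, multipliers >= 0
                val = 0.5 * x @ Q @ x + d @ x
                if best is None or val < best[0]:
                    best = (val, x)
    return best[1]

Q = np.diag([2.0, 2.0]); d = np.array([0.0, 0.0])
A = np.array([[-1.0, -1.0]]); b = np.array([-2.0])   # encodes x + y >= 2
print(solve_qp(Q, d, A, b))   # closest point of {x + y >= 2} to the origin: [1. 1.]
```

The enumeration is exponential in m, so this is only a pedagogical check of the KKT system; practical QP codes use active-set or interior-point iterations instead.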
Remark 4.4.3 * To sum up, solving an unconstrained or constrained optimization problem leads to solving a nonlinear system F(x, λ) = 0 that appears in different forms:

   no constraints: ∇f(x) = 0;
   equality constraints: ∇x,λ L(x, λ) = 0;
   inequality constraints: ∇x L(x, λ) = 0, λ·(g(x) − b) = 0.

On the other hand, solving a nonlinear equation is not easy, even when F is a polynomial of degree 3 in one variable.

** The importance of the theorems studied comes from
- locating the possible candidates;
- showing how to compare the values of f along the feasible directions.
These two points are the starting point for the development of numerical methods that approach the solution with accuracy (see [17], [19], [8], [4]).

*** The proofs we studied for optimization problems in Euclidean space constitute a natural step toward the more complex ones developed in the calculus of variations, where the maximum and minimum are searched for in a class of functions and where the objective function is defined on that class (see [16], [6], [9]).
Solved Problems
1. – Distance to a hyperplane. Let a ∈ Rⁿ, a ≠ 0, and b ∈ R, b > 0.

i) Solve

   min ‖x‖²   subject to   ᵗa·x ≥ b.

ii) Deduce the solution of the following problems:

   α) min of 5 + √(x² + y²) over −x + y ≥ 2;   β) max of −6x² − 6y² − 6z² + 4 over 2x − y + 2z ≤ −1.

Solution: i) Let ᵗa = (a1 . . . an) and ᵗx = (x1 . . . xn). The minimization problem looks for the points in the region above the hyperplane

   ᵗa·x = b   ⟺   a1x1 + a2x2 + . . . + anxn = b

that are closest to the origin. It is a nonlinear minimization problem with inequality constraints. We introduce the Lagrangian

   L(x, λ) = −(x1² + x2² + . . . + xn²) + λ(a1x1 + a2x2 + . . . + anxn − b)

and write the KKT conditions:

   Lx1 = −2x1 + λa1 = 0, . . . , Lxn = −2xn + λan = 0,
   λ ≥ 0, with λ = 0 if a1x1 + a2x2 + . . . + anxn > b.
Finding a candidate.

* If a1x1 + a2x2 + . . . + anxn > b, then λ = 0 and x1 = . . . = xn = 0, which contradicts a1(0) + . . . + an(0) = 0 ≥ b > 0.

* If a1x1 + a2x2 + . . . + anxn = b, then xi = (λ/2) ai, i = 1, . . . , n, which inserted in the equation of the hyperplane gives

   a1 (λ/2) a1 + a2 (λ/2) a2 + . . . + an (λ/2) an = b   ⟺   λ/2 = b/‖a‖².

Hence, the solution of the system is

   xi = (b/‖a‖²) ai, i = 1, . . . , n   ⟺   x = (b/‖a‖²) a.

Finding the solution. To study the concavity of L in x when λ/2 = b/‖a‖², consider the Hessian matrix

   H_{L(·,λ)}(x) = diag(−2, . . . , −2).
The leading principal minors are Dk(x) = (−2)^k, k = 1, . . . , n, so the matrix is negative definite and L(·, λ) is concave in x. Thus the point maximizes −‖x‖² subject to the constraint ᵗa·x ≥ b; hence it solves the minimization problem, and the minimal distance from the origin is

   ‖x‖ = (b/‖a‖²) ‖a‖ = b/‖a‖.

ii) α) Note that

   min over −x+y≥2 of (5 + √(x² + y²)) = 5 + min over −x+y≥2 of ‖(x, y)‖.

The constraint reads ᵗ(−1, 1)·(x, y) ≥ 2, so by i),

   min over −x+y≥2 of √(x² + y²) = 2/‖(−1, 1)‖ = 2/√2 = √2,

attained at (x*, y*) = (2/‖(−1, 1)‖²)(−1, 1) = (−1, 1). Hence

   min over −x+y≥2 of (5 + √(x² + y²)) = 5 + √2.

β) Writing the constraint 2x − y + 2z ≤ −1 as ᵗ(−2, 1, −2)·(x, y, z) ≥ 1, we have

   max over 2x−y+2z≤−1 of (−6x² − 6y² − 6z² + 4) = 4 − 6 · min over −2x+y−2z≥1 of ‖(x, y, z)‖².

By i), the minimum of ‖(x, y, z)‖ is 1/‖(−2, 1, −2)‖ = 1/3, attained at (x*, y*, z*) = (1/9)(−2, 1, −2), so

   max over 2x−y+2z≤−1 of (−6x² − 6y² − 6z² + 4) = 4 − 6 (1/3)² = 4 − 2/3 = 10/3.
2. – Distance to a hyperplane with positivity constraints.

i) Let a ∈ Rⁿ, a ≠ 0, and b ∈ R, b > 0. Solve

   min ‖x‖²   subject to   ᵗa·x ≥ b,   x ≥ 0.

ii) Minimize x² + y² over the following sets:

   α) −y ≤ −2, x ≥ 0;   β) x − y ≥ 2, x ≥ 0, y ≥ 0;   γ) −x + y ≥ 2, x ≥ 0, y ≥ 0;   δ) x + y ≥ 2, x ≥ 0, y ≥ 0.

Sketch graphs to check the solution.
Solution: i) Let ᵗa = (a1 . . . an) and ᵗx = (x1 . . . xn). The minimization problem looks for the points with positive coordinates in the region above the hyperplane

   ᵗa·x = b   ⟺   a1x1 + a2x2 + . . . + anxn = b

that are closest to the origin. It is a nonlinear minimization problem with inequality constraints. We introduce the Lagrangian

   L(x, λ) = −(x1² + x2² + . . . + xn²) + λ(a1x1 + a2x2 + . . . + anxn − b)

and write the KKT conditions:

   Lx1 = −2x1 + λa1 ≤ 0 (= 0 if x1 > 0), . . . , Lxn = −2xn + λan ≤ 0 (= 0 if xn > 0),
   λ ≥ 0, with λ = 0 if a1x1 + a2x2 + . . . + anxn > b.
λaj 0
for j = i0 , i1 , . . . , ip
⇐⇒
⇐⇒
xi0 =
aj 0
for j = i0 , i1 , . . . , ip
since λ > 0. Hence, we can write xj =
λ λ max(aj , 0) = (aj )+ 2 2
for j = i0 , i1 , . . . , ip
and get a unified formula for the candidate point x∗ =
λ + a 2
t +
a =
a+ 1
...
a+ n
Constrained Optimization-Inequality Constraints
281
Inserting the expression of x∗ in the equation of the hyperplane, we obtain a1
λ
λ
λ
a+ + a2 a+ + . . . + a n an + = b 1 2 2 2 2
λ
⇐⇒
2
=
b . a+ 2
Hence, a solution to the system is xi =
b a+ , a+ 2 i
i = 1, . . . , n
⇐⇒
x=
b a+ . a+ 2
Finding the solution. To study the concavity of L in x when λ = 2 matrix
⎡
HL(.,λ) (x)
=
Lx 1 x 1 ⎢ .. ⎣ . Lxn x1
b , consider the Hessian a+ 2
⎤ Lx 1 x n ⎥ .. ⎦ . Lxn xn
... .. . ...
⎡ =
−2 ⎢ .. ⎣ . 0
... .. . ...
⎤ 0 .. ⎥ . ⎦ −2
The leading minor principals are equal to Dk (x) = (−2)k , k = 1, . . . , n. The matrix is semi-definite negative. Thus, the point maximizes −x2 subject to the constraint t ax b and to the positivity constraint x 0. Hence, the point solves the minimization problem and the minimal distance of the origin to this point is equal to b b + 2 a+ = + . a a ii) Here, in filling Table 4.4, we have b = 2. a
t +
a
a+
(x∗ , y ∗ )
α
(0, 1)
(0, 1)
1
(0, 2)
β
(1, −1)
(1, 0)
1
(2, 0)
γ
(−1, 1)
(0, 1)
1
(0, 2)
δ
(1, 1)
(1, 1)
√
√ √ ( 2, 2)
set
t
2
TABLE 4.4: Minima points for x2 + y 2 on the four sets One can easily check the minimal distance of the origin to the given sets from the graphics in Figure 4.21.
282
Introduction to the Theory of Optimization in Euclidean Space y
y
2.5
5
y x2
2.0 4 1.5 x0, y2
3 2
y2
1.0
xy2
0.5
x0, y0
1 x0 0.5
0.5
1.0
1.5
2.0
x
1
2
3
4
5
x
0.5 y
y
2.5
5 x0, y0 xy2
4
2.0
xy2
1.5
x0, y0
3 1.0 2 0.5 1
0.5
0.5
1.0
1.5
2.0
x
1
2
3
4
x
0.5
FIGURE 4.21: Closest point of the constraint set to the origin
3. – L not convex nor concave Consider the following minimization problem: ⎧ 2 ⎪ ⎪ y 4−x ⎪ ⎪ ⎨ y 3x min x2 + y 2 s.t ⎪ ⎪ ⎪ ⎪ ⎩ y −3x i) Sketch the feasible set. ii) Write the problem as a maximization problem in the standard form, and write down the necessary KKT conditions for a point (x∗ , y ∗ ) to be a solution of the problem. iii) Find the points that satisfy the KKT conditions. Check whether or not each point is regular. iv) Determine whether or not the point(s) in part ii) satisfy the secondorder sufficient condition. v) Explore the concavity of the Lagrangian in (x, y) ∈ R2 . vi) What can you conclude about the solution of the problem? vii) Give a geometric interpretation of the problem that confirms the solution you have found (Hint: use level curves).
Solution: i) The feasible set is the plane region located above the curve and the two lines, as described in Figure 4.22.
Constrained Optimization-Inequality Constraints
283
y
6
S
y 3 x
y3x
4
2
y 4 x2
3
2
1
1
2
x
3
2
FIGURE 4.22: The constraint set S
ii) Writing the KKT conditions. The problem is equivalent to the following maximization problem:

   max (−x² − y²)   subject to   g1(x, y) = 4 − x² − y ≤ 0,   g2(x, y) = 3x − y ≤ 0,   g3(x, y) = −3x − y ≤ 0.

Consider the Lagrangian

   L(x, y, λ, β, γ) = −x² − y² − λ(4 − x² − y) − β(3x − y) − γ(−3x − y).

The conditions are:

   (1) Lx = −2x + 2λx − 3β + 3γ = 0
   (2) Ly = −2y + λ + β + γ = 0
   (3) λ ≥ 0, with λ = 0 if 4 − x² − y < 0
   (4) β ≥ 0, with β = 0 if 3x − y < 0
   (5) γ ≥ 0, with γ = 0 if −3x − y < 0.
iii) Solving the equations satisfying the KKT conditions.

• If 4 − x² − y < 0, then λ = 0 and

   −2x − 3β + 3γ = 0,   −2y + β + γ = 0.

* Suppose 3x − y < 0; then β = 0, and −2x + 3γ = 0, −2y + γ = 0 give 2x = 3γ = 6y, i.e. x = 3y. But then 3x − y = 8y < 0 forces y < 0, hence γ = 2y < 0, which contradicts γ ≥ 0.

* Suppose 3x − y = 0, so y = 3x. Then −2x − 3β + 3γ = 0 and −6x + β + γ = 0 give 6γ = 20x and 3β = 8x, so x ≥ 0. If x > 0, then −3x − y = −6x < 0 forces γ = 0, hence x = 0: contradiction. So x = y = 0, which contradicts 4 − 0² − 0 = 4 < 0.

•• If 4 − x² − y = 0:

* Suppose 3x − y < 0; then β = 0 and

   −2x + 2λx + 3γ = 0,   −2y + λ + γ = 0.

– Suppose −3x − y < 0; then γ = 0 and

   2x(λ − 1) = 0,   λ = 2y ≥ 0,   so x = 0 or λ = 1.

◦ λ = 1 leads to y = 1/2 and x = ±√(7/2). But for (x, y) = (√(7/2), 1/2) the inequality 3x − y < 0 is not satisfied, and for (x, y) = (−√(7/2), 1/2) the inequality −3x − y < 0 is not satisfied. So we cannot have λ = 1.
285
◦ x = 0 leads to y = 4 and λ = 8 > 0. The two inequalities 3x − y < 0 and −3x − y < 0 are satisfied at this point. Hence, the following point is a solution: (x∗ , y ∗ ) = (0, 4)
(λ∗ , β ∗ , γ ∗ ) = (8, 0, 0)
with
←−
– Suppose −3x − y = 0 then y = −3x and ⎧ ⎨ −2x + 2λx + 3γ = 0
⎧ ⎨ −2x + 2λx + 3γ = 0 ⎩
=⇒
−2(−3x) + λ + γ = 0
⎩
λ + γ = −6x
From 4 − x2 − y = 0, we have y = −3x
and
4 − x2 − y = 0
⇐⇒
(x, y) = (−1, 3)
or
(4, −12).
The point (4, −12) doesn’t satisfy the inequality 3x − y < 0, so it cannot be a solution. The point (−1, 3) satisfies the inequality 3x − y < 0, and we have ⎧ ⎨ −2λ + 3γ = −2 =⇒ (λ, γ) = (4, 2). ⎩ λ+γ =6 Thus, we have another candidate point: (x∗ , y ∗ ) = (−1, 3)
with
(λ∗ , β ∗ , γ ∗ ) = (4, 0, 2)
←−
∗∗ Suppose 3x − y = 0 then y = 3x. We have y = 3x
and
4 − x2 − y = 0
⇐⇒
(x, y) = (−4, −12)
or
(1, 3).
The points (−4, −12) doesn’t satisfy the inequality −3x − y 0, so it cannot be a candidate. The point (1, 3) satisfies the inequality −3x − y < 0, thus γ = 0, and we have ⎧ ⎨ 2λ − 3β = 2 =⇒ (λ, β) = (4, 2). ⎩ λ+β =6
Introduction to the Theory of Optimization in Euclidean Space Thus, we have another candidate point: (x∗ , y ∗ ) = (1, 3)
with
(λ∗ , β ∗ , γ ∗ ) = (4, 2, 0)
←−
Regularity of the candidate point (0, 4). Only the constraint g1 (x, y) = 4 − x2 − y is active at (0, 4) and we have g1 (0, 4) = 0 −1 rank(g1 (0, 4)) = 1. g1 (x, y) = −2x −1 Thus the point (0, 4) is a regular point.
Regularity of the candidate point (−1, 3). Only the constraints g1 (x, y) = 4 − x2 − y and g3 (x, y) = −3x − y are active at (−1, 3) and we have g1 (x, y) −2x −1 g1 (−1, 3) 2 −1 = = . −3 −1 −3 −1 g3 (x, y) g3 (−1, 3) Thus the point (1, −3) is a regular point since rank(
g1 (−1, 3) g3 (−1, 3)
) = 2.
Regularity of the candidate point (1, 3). Only the constraints g1 (x, y) = 4 − x2 − y and g2 (x, y) = 3x − y are active at (1, 3) and we have g1 (x, y) −2x −1 g1 (1, 3) −2 −1 = = . 3 −1 3 −1 g2 (x, y) g2 (1, 3) g1 (1, 3) ) = 2. Thus the point (1, 3) is a regular point since rank( g2 (1, 3) iv) With p = 2 (the number of active constraints) at the points (3, −1) and (3, 1), n = 2 (the dimension of the space), then p = n. The second derivatives test cannot be applied since it is established for p < n. For the point (0, 4), we have p = 1 < 2 = n. We consider the following determinant (r = p + 1 = 2) (Note that the first column vector of [g1 (x, y)] is linearly dependent, so we have to renumber the variables) 0 1 B2 (x, y) = ∂g ∂y ∂g 1 ∂x
∂g1 ∂y
Lyy Lxy
∂g1 ∂x
Lyx Lxx
=
0 −1 −1 −2 −2x 0
−2x 0 −2 + 2λ
Constrained Optimization-Inequality Constraints
287
∗ At (0, 4), we have λ = 8, 0 −1 B2 (0, 4) = −1 −2 0 0
0 0 14
= −14.
We have (−1)2 B2 (0, 4) = −14 < 0. So the second derivatives test is not satisfied at (0, 4). v) Let us explore the concavity and convexity of L with respect to (x, y) where the Hessian matrix of L in (x, y) is −2 0 Lxx Lxy = HL = 0 −2 + 2λ Lyx Lyy When λ = 8 or 4, the principal minors are Δ11 = Lyy = −2 + 2λ > 0
Δ21 = Lxx = −2 < 0
Δ2 = 4(1 − λ) < 0.
Therefore, L is neither concave, nor concave in (x, y). vi) We have a situation where the theorems studied remain inconclusive. To conclude, we proceed by comparison. Since, the candidate points are on the boundary of the constraint set, let us study directly the values of the objective function on these points. On the lines y = ±3x, with |x| 1, the function f (x, y) = x2 + y 2 takes the values f (x, ±3x) = x2 + (±3x)2 = 10x2 10 = f (1, ±3)
∀ |x| 1.
On the parabola x2 = 4 − y, with |x| 1,we have f (x, 4 − x2 ) = x2 + (4 − x2 )2 = x4 − 8x2 + 16 + x2 = x4 − 7x2 + 16 = ϕ(x) ϕ (x) = 4x3 − 14x = 2x(2x2 − 7) = 0 ⇐⇒ x = 0, ± 7/2. By the extreme value theorem, ϕ attains its extreme values on the closed bounded interval [−1, 1] at the critical points inside the interval (−1, 1) or at the end points. Therefore, we have min ϕ(x) = min{ϕ(−1), ϕ(0), ϕ(1)} = min{10, 16, 10} = 10.
[−1,1]
Thus, f (x, 4 − x2 ) = ϕ(x) 10 = f (±1, 3)
∀ |x| 1.
288
Introduction to the Theory of Optimization in Euclidean Space
So we can conclude that the minimum value attained by f on the set of the constraints is 10.
vii) The feasible set is S = {(x, y) :
4 − x2 − y 0,
3x − y 0, 2
−3x − y 0}
2
The level curves of f , with equations : x + y = k where k 0, are circles √ centered at (0, 0) with radius k; see Figure 4.23. If we increase the values of the radius, the values of f increase. The value k = 10 is the first one at which the level curve intersects the constraints g1 = g2 = 0 and g1 = g3 = 0. Thus the value 10 is the minimal value of f reached at (±1, 3). Moreover, the objective function f (x, y) = x2 +y 2 is the square of the distance between (x, y) and (0, 0). So our problem is to find the point(s) in the feasible region that are closest to (0, 0). y
6
S
4
2
4
2
2
4
x
2
4
FIGURE 4.23: Level curves of f and the closest points of S to the origin
4. – The data in Table 4.5 can be found in [8]. Here we consider boundary conditions to illustrate an inequality constrained problem.

The Body Fat Index (BFI) measures the fitness of an individual. It is a function of the body density ρ (in units of kilograms per liter) according to Brozek's formula,

   BFI = 457/ρ − 414.2.

However, the accurate measurement of ρ is costly. An alternative is to describe the dependence of the BFI on five variables x1, x2, x3, x4, x5 in the form

   f : x ⟼ BFI = y = f(x) = a1x1 + a2x2 + a3x3 + a4x4 + a5x5.

The variables are easier to measure and represent
x1 = weight (lb.)    x2 = height (in.)    x3 = abdomen (cm.)
x4 = wrist (cm.)     x5 = neck (cm.)      y = BFI
Using the following table of measurements, we require the average x̄i of each category of measurements to satisfy

a1 x̄1 + a2 x̄2 + a3 x̄3 + a4 x̄4 + a5 x̄5 ≤ ȳ,    (∗)

hoping to find a model with BFI ≤ ȳ = 15.23.

i) Use software to find a linear function f which best fits the given data in the sense of least squares, i.e., find a that minimizes the sum of the squared errors

Σ_{i=1}^{10} (f(xᵢ) − yᵢ)² = Σ_{i=1}^{10} (a1 xi1 + a2 xi2 + a3 xi3 + a4 xi4 + a5 xi5 − yi)²   s.t. (∗),

where xᵢ = (xi1, xi2, xi3, xi4, xi5) are the measurements of the i-th individual.

ii) Formulate the constrained problem using matrices. Use Maple to check that the Hessian of the resulting objective function is positive definite on the convex set described by (∗).
Solution: We use Maple software for solving the problem.

i) Finding the linear regression of best fit. We solve the least-squares problem ("LS") with ten linear residuals, built from the data:

x1      x2     x3    x4    x5    y
154.25  67.75  85.2  17.1  36.2  12.6
173.25  72.25  83    18.2  38.5  6.9
154     66.25  87.9  16.6  34    24.6
184.75  72.25  86.4  18.2  37.4  10.9
184.25  71.25  100   17.7  34.4  27.8
210.25  74.75  94.4  18.8  39    20.6
181     69.75  90.7  17.7  36.4  19
176     72.5   88.5  18.8  37.8  12.8
191     74     82.5  18.2  38.1  5.1
198.25  73.5   88.6  19.2  42.1  12

TABLE 4.5: Measurements involved in BFI

The objective function is

ϕ(a1, a2, a3, a4, a5) = (1/2) [ (154.25a1 + 67.75a2 + 85.2a3 + 17.1a4 + 36.2a5 − 12.6)²
 + (173.25a1 + 72.25a2 + 83a3 + 18.2a4 + 38.5a5 − 6.9)²
 + (154a1 + 66.25a2 + 87.9a3 + 16.6a4 + 34a5 − 24.6)²
 + (184.75a1 + 72.25a2 + 86.4a3 + 18.2a4 + 37.4a5 − 10.9)²
 + (184.25a1 + 71.25a2 + 100a3 + 17.7a4 + 34.4a5 − 27.8)²
 + (210.25a1 + 74.75a2 + 94.4a3 + 18.8a4 + 39a5 − 20.6)²
 + (181a1 + 69.75a2 + 90.7a3 + 17.7a4 + 36.4a5 − 19)²
 + (176a1 + 72.5a2 + 88.5a3 + 18.8a4 + 37.8a5 − 12.8)²
 + (191a1 + 74a2 + 82.5a3 + 18.2a4 + 38.1a5 − 5.1)²
 + (198.25a1 + 73.5a2 + 88.6a3 + 19.2a4 + 42.1a5 − 12)² ]
with(Optimization):
LSSolve([154.25a1 + 67.75a2 + 85.2a3 + 17.1a4 + 36.2a5 − 12.6, 173.25a1 + 72.25a2 + 83a3 + 18.2a4 + 38.5a5 − 6.9, 154a1 + 66.25a2 + 87.9a3 + 16.6a4 + 34a5 − 24.6, 184.75a1 + 72.25a2 + 86.4a3 + 18.2a4 + 37.4a5 − 10.9, 184.25a1 + 71.25a2 + 100a3 + 17.7a4 + 34.4a5 − 27.8, 210.25a1 + 74.75a2 + 94.4a3 + 18.8a4 + 39a5 − 20.6, 181a1 + 69.75a2 + 90.7a3 + 17.7a4 + 36.4a5 − 19, 176a1 + 72.5a2 + 88.5a3 + 18.8a4 + 37.8a5 − 12.8, 191a1 + 74a2 + 82.5a3 + 18.2a4 + 38.1a5 − 5.1, 198.25a1 + 73.5a2 + 88.6a3 + 19.2a4 + 42.1a5 − 12], {180.7a1 + 71.425a2 + 88.72a3 + 18.05a4 + 37.39a5 ≤ 15.23})

[15.0549945448635683, [a1 = 0.0474753096134219, a2 = −1.03634130223772, a3 = 1.22920301075594, a4 = −1.86308283592359, a5 = 0.140089140413700]]

Thus

f(x1, x2, x3, x4, x5) ≈ 0.047 x1 − 1.036 x2 + 1.229 x3 − 1.863 x4 + 0.140 x5.

f can be used to predict an individual's body fat index, based upon the five measurement types.

Comments.
– Least-squares problems are solved by the LSSolve command.
– When the residuals in the objective function and the constraints are all linear, which is the case here, an active-set method is used; see [19], [22], [17].
– The LSSolve command uses various methods implemented in a built-in library provided by the Numerical Algorithms Group (NAG).

ii) Finding the linear regression of best fit using matrices. Let G = (xi1, xi2, xi3, xi4, xi5)_{i=1,...,10} ∈ M10,5 be the matrix whose rows are the vectors xᵢ, or equivalently, the matrix whose columns are the first five columns of the table. Let
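The Maple run can be cross-checked outside Maple. The sketch below redoes the solve in NumPy (an illustration, not the book's method): with a single linear inequality constraint and a strictly convex quadratic objective, one exact active-set step suffices — solve the unconstrained normal equations, and only if the constraint is violated re-solve with it active via the KKT system.

```python
import numpy as np

# Data of Table 4.5: rows are individuals, columns are x1..x5; c is the y column.
G = np.array([
    [154.25, 67.75, 85.2, 17.1, 36.2], [173.25, 72.25, 83.0, 18.2, 38.5],
    [154.00, 66.25, 87.9, 16.6, 34.0], [184.75, 72.25, 86.4, 18.2, 37.4],
    [184.25, 71.25, 100.0, 17.7, 34.4], [210.25, 74.75, 94.4, 18.8, 39.0],
    [181.00, 69.75, 90.7, 17.7, 36.4], [176.00, 72.50, 88.5, 18.8, 37.8],
    [191.00, 74.00, 82.5, 18.2, 38.1], [198.25, 73.50, 88.6, 19.2, 42.1],
])
c = np.array([12.6, 6.9, 24.6, 10.9, 27.8, 20.6, 19.0, 12.8, 5.1, 12.0])
A, b = G.mean(axis=0), c.mean()          # the constraint (*): A.a <= b, b = 15.23

a = np.linalg.solve(G.T @ G, G.T @ c)    # unconstrained normal equations
if A @ a > b:                            # constraint active: equality KKT system
    K = np.block([[G.T @ G, A[:, None]], [A[None, :], np.zeros((1, 1))]])
    a = np.linalg.solve(K, np.append(G.T @ c, b))[:-1]

phi = 0.5 * np.sum((G @ a - c) ** 2)     # the value Maple's LSSolve reports
print(np.round(a, 4), round(phi, 4))
```

Here the constraint turns out to be inactive at the optimum, so the unconstrained least-squares solution already satisfies (∗) and coincides with LSSolve's answer.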
c be the last column of the table. Denote

a = ᵗ(a1, a2, a3, a4, a5),    A = (180.7, 71.425, 88.72, 18.05, 37.39)  (the row of column means),    b = 15.23,

then

ϕ(a) = ϕ(a1, a2, a3, a4, a5) = (1/2)[((G.a − c)₁)² + ... + ((G.a − c)₁₀)²] = (1/2)‖G.a − c‖²

and the problem can be expressed as

min (1/2)‖G.a − c‖²   subject to   A.a ≤ b.

Following Maple's instructions, we enter the data using matrices
with(Optimization):
c := Vector([12.6, 6.9, 24.6, 10.9, 27.8, 20.6, 19, 12.8, 5.1, 12], datatype = float):
G := Matrix([[154.25, 67.75, 85.2, 17.1, 36.2], [173.25, 72.25, 83, 18.2, 38.5], [154, 66.25, 87.9, 16.6, 34], [184.75, 72.25, 86.4, 18.2, 37.4], [184.25, 71.25, 100, 17.7, 34.4], [210.25, 74.75, 94.4, 18.8, 39], [181, 69.75, 90.7, 17.7, 36.4], [176, 72.5, 88.5, 18.8, 37.8], [191, 74, 82.5, 18.2, 38.1], [198.25, 73.5, 88.6, 19.2, 42.1]], datatype = float):
with(Statistics): A := Mean(G): b := Mean(c):
A := Matrix([[180.7, 71.425, 88.72, 18.05, 37.39]], datatype = float):
b := Vector([15.23], datatype = float):
lc := [A, b]:
LSSolve([c, G], lc)

[15.0549945448635683, [0.0474753096134219, −1.03634130223772, 1.22920301075594, −1.86308283592359, 0.140089140413700]]

Hence, we obtain the same coefficients ai. For the Hessian of ϕ, write

ϕ(a) = (1/2)‖G.a − c‖² = (1/2)(G.a − c).(G.a − c) = (1/2)(‖G.a‖² − 2 ᵗc.G.a + ‖c‖²) = (1/2)(ᵗa ᵗG G.a − 2 ᵗc.G.a + ‖c‖²),

ϕ′(a) = ᵗG G.a − ᵗG.c,    ϕ″(a) = ᵗG G.
Checking that the Hessian is positive definite:

with(LinearAlgebra):
H := Multiply(Transpose(G), G):
IsDefinite(H)

true
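Maple's IsDefinite test can be mirrored in NumPy (a sketch with the same data, not part of the original session): a symmetric matrix is positive definite iff its Cholesky factorization exists, equivalently iff all its eigenvalues are positive; for H = ᵗG G this holds exactly when the columns of G are linearly independent.

```python
import numpy as np

# H = G^T G is positive definite iff np.linalg.cholesky(H) succeeds.
G = np.array([
    [154.25, 67.75, 85.2, 17.1, 36.2], [173.25, 72.25, 83.0, 18.2, 38.5],
    [154.00, 66.25, 87.9, 16.6, 34.0], [184.75, 72.25, 86.4, 18.2, 37.4],
    [184.25, 71.25, 100.0, 17.7, 34.4], [210.25, 74.75, 94.4, 18.8, 39.0],
    [181.00, 69.75, 90.7, 17.7, 36.4], [176.00, 72.50, 88.5, 18.8, 37.8],
    [191.00, 74.00, 82.5, 18.2, 38.1], [198.25, 73.50, 88.6, 19.2, 42.1],
])
H = G.T @ G

try:
    np.linalg.cholesky(H)        # raises LinAlgError if H is not positive definite
    definite = True
except np.linalg.LinAlgError:
    definite = False
print(definite)
```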
4.5 Dependence on Parameters
The cost to produce an output Q is equal to rK + wL, where r and w are respectively the prices of the input capital K and labor L. The firm would like the output to obey the Cobb-Douglas production function Q = cKᵃLᵇ (r > 0, w > 0, c > 0, a + b < 1). Thus, to minimize the cost of production, the problem is expressed as:

min rK + wL   subject to   cKᵃLᵇ = Q

with (K, L) ∈ (0, +∞) × (0, +∞). Using Lagrange's multiplier method, the unique solution is (see Example 1, Section 3.4)

K* = λ* (aQ/r),    L* = λ* (bQ/w),    λ* = (Q/c)^{1/(a+b)} (r/(aQ))^{a/(a+b)} (w/(bQ))^{b/(a+b)}.

One can see the dependence of the extreme point on the parameters r, w, c, a, b. In general, it is not easy to express the solution explicitly in terms of many parameters. On the other hand, changing the parameters and solving a new optimization problem is costly or difficult. An alternative is to estimate how much the optimal value changes compared to an initial situation.

To set the main result of this section, we suppose the objective function f and the constraint function g depend on a parameter r ∈ Rᵏ, i.e.

f(x, r) = f(x1, ..., xn, r1, ..., rk),    g = (g1, ..., gm),    g(x, r) = g(x1, ..., xn, r1, ..., rk),

I(x(r)) = {i ∈ {1, ..., m} : gi(x(r), r) < 0}.

Consider the problem

(Pr)    f*(r) = local max f(x, r)   (resp. local min)   s.t.   g(x, r) ≤ 0
and introduce the Lagrangian L(x, λ, r) = f (x, r) − λ1 g1 (x, r) − . . . − λm gm (x, r).
Hypothesis (Hr). f and g are C² functions in a neighborhood of x*, and for each r ∈ Bδ(r̄) ⊆ Rᵏ:

B₁(x*) × B₂(λ*p) × Bη(r̄) ⊆ A,    det(∇_{x,λp} F(x, λp, r)) ≠ 0   in   B₁(x*) × B₂(λ*p) × Bη(r̄),

such that, ∀r ∈ Bη(r̄),

∃!(x, λp) ∈ B₁(x*) × B₂(λ*p) :   F(x, λp, r) = 0,

and the maps

(x, λp) : Bη(r̄) → B₁(x*) × B₂(λ*p),    r ↦ (x(r), λp(r))

are C¹ functions.
Remark 4.5.2 * In the theorem above, the local max (min) problem can be replaced by the max (min) problem, provided we assume, for example,

∀r ∈ B(r̄, δ),   x ↦ L(x, λ*, r) is strictly concave (resp. convex).

* For the unconstrained case, L reduces to f, F(x, r) = ∇x f(x, r), and det(∇x F(x*, r̄)) = det Hf(x*, r̄).
Example 1. Suppose that when a firm produces and sells x units of a commodity, it has revenue R(x) = x, while the cost is C(x) = x². i) Find the optimal number of units of the commodity that maximizes profit. ii) Find the approximate change of the optimal profit if the revenue changes to 0.99x.

Solution: i) The profit is given by

P(x) = R(x) − C(x) = x − x²   with x > 0.

Since the constraint set S = (0, +∞) is an open set and the profit function is regular, the optimal point, if it exists, is a critical point, solution of the equation

dP/dx = 1 − 2x = 0   ⟺   x = 1/2.
Moreover, we have

d²P/dx² = −2 < 0,

so P is strictly concave on S and x* = 1/2 is the global maximum point, with maximal profit P(1/2) = 1/4.

ii) Consider now the revenue R(x) = rx with r > 0:

P(x, r) = rx − C(x) = rx − x²   with x > 0.

Proceeding as in i), one can verify that:

1. For r close to 1, d²P/dx²(x, r) = −2 < 0. Thus P(., r) is concave in x.

2. The second-order condition for strict maximality is satisfied when r = 1.
3. P(1/2, 1) = max_S P(x, 1) = 1/2 − 1/4 = 1/4.
As a consequence,

– ∃η > 0 such that the function P*(r) = max_{x∈S} P(x, r) is defined for any r ∈ (1 − η, 1 + η);

– P* is C¹ and

dP*/dr(1) = ∂P/∂r(x, r)|_{x=1/2, r=1} = x|_{x=1/2, r=1} = 1/2.

We can write the approximation

P*(r) ≈ P*(1) + dP*/dr(1)(r − 1) = 1/4 + (1/2)(r − 1)   for r close to 1.

In particular, for r = 0.99, the objective function P* takes the approximate value

P*(0.99) ≈ 0.25 + 0.5(0.99 − 1) = 0.25 − 0.5(0.01) = 0.245,

and the approximate change in the maximum value of the profit function is

P*(0.99) − P*(1) ≈ −0.5(0.01) = −0.005.

* Note that, for this example, we easily have the exact value of the objective function P*; see Figure 4.24. Indeed,

P*(r) = P(x*(r), r) = P(r/2, r) = r(r/2) − (r/2)² = r²/4,

from which we deduce

P*(0.99) = (0.99)²/4 = 0.245025.

We also have the equality

dP*/dr = r/2 = ∂P/∂r(x, r)|_{x=x*(r)} = x*(r).
FIGURE 4.24: Highest profit for r = 1 and r = 0.99
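The agreement between the linear approximation and the exact value function P*(r) = r²/4 can be verified directly (a small check, not part of the original solution):

```python
# Exact value function P*(r) = r^2/4 (maximizer x*(r) = r/2 of rx - x^2),
# compared with the envelope-theorem approximation P*(1) + (1/2)(r - 1).
def P_star(r):
    return r * r / 4.0

approx = P_star(1.0) + 0.5 * (0.99 - 1.0)   # = 0.245
exact = P_star(0.99)                        # = 0.245025
print(approx, exact)
```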
Remark 4.5.3 In particular, when r = b, f(x, r) = f(x) and g(x, r) = g(x) − b, we have

∂f*/∂bj(b̄) = ∂L/∂bj(x, λ, b)|_{x=x*, λ=λ*, b=b̄} = λj(b̄),   j = 1, ..., m.

This tells us that the Lagrange multiplier λj = λj(b̄) for the j-th constraint is the rate at which the optimal value function changes with respect to the parameter bj at the point b̄. Using the linear approximation formula,

f*(b) − f*(b̄) ≈ ∂f*/∂b1(b̄)(b1 − b̄1) + ... + ∂f*/∂bm(b̄)(bm − b̄m) = λ1(b̄)(b1 − b̄1) + ... + λm(b̄)(bm − b̄m),

the change in the optimal value function is estimated when one or more components of the resource vector are slightly changed.
Example 2. For b close to 3, estimate

f*(b) = local max f(x, y, z) = xy + yz + xz   subject to   x + y + z = b,

knowing that (see Example 2, Section 3.3)

f*(3) = f(1, 1, 1) = 3,    λ(3) = 2,    (−1)ʳ Br(1, 1, 1) > 0 for r = 2, 3.
Solution: We can deduce that f* ∈ C¹(3 − η, 3 + η) for some η > 0, and write the linear approximation

f*(b) ≈ f*(3) + ∂f*/∂b(3)(b − 3)   for b close to 3.

If we denote by L(x, y, z, λ, b) = xy + yz + xz − λ(x + y + z − b) the Lagrangian associated with the new constrained maximization problem, then we have

∂f*/∂b(3) = ∂L/∂b(x(b), y(b), z(b), λ(b), b)|_{b=3} = λ(b)|_{b=3} = 2,

so

f*(b) ≈ 3 + 2(b − 3)   for b close to 3.
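For this example the exact value function is also available, which lets us check the multiplier λ(3) = 2 (a side computation, not from the original text): with x + y + z = b one has xy + yz + xz = (b² − (x² + y² + z²))/2, maximized at x = y = z = b/3, so f*(b) = b²/3 and df*/db(3) = 2·3/3 = 2.

```python
# Exact value function f*(b) = b^2/3 versus the approximation 3 + 2(b - 3).
def f_star(b):
    return b * b / 3.0

b = 3.1
approx = 3.0 + 2.0 * (b - 3.0)   # envelope / linear approximation
print(f_star(b), approx)         # 3.2033... versus 3.2
```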
Solved Problems
1. – Irregular value function. i) Show that the value function

f*(r) = max_{x∈[−1,1]} (x − r)²

is not differentiable on R. Is there a contradiction with the theorem? ii) Can you expect regularity for the value function

g*(r) = min_{x∈[−1,1]} (x − r)²?

Solution: This example shows that the optimal value function is not necessarily regular. Indeed, set

y = f(x, r) = (x − r)²,    f*(r) = max_{x∈[−1,1]} f(x, r).

We have
y′ = dy/dx = fx(x, r) = 2(x − r).

We distinguish different cases:

∗ r ∈ (−1, 1): From Table 4.6, we deduce the maximum value.

x             | −1       →   r   →   1
y′ = 2(x − r) |      −       0       +
y = (x − r)²  | (1 + r)² ↘   0   ↗   (1 − r)²

TABLE 4.6: Variations of y = (x − r)² when r ∈ (−1, 1)

max_{x∈[−1,1]} (x − r)² = max{(1 + r)², (1 − r)²} = f*(r).
x             | −1       →   1
y′ = 2(x − r) |      −
y = (x − r)²  | (1 + r)² ↘   (1 − r)²

TABLE 4.7: Variations of y = (x − r)² when r ∈ (1, +∞)

x             | −1       →   1
y′ = 2(x − r) |      +
y = (x − r)²  | (1 + r)² ↗   (1 − r)²

TABLE 4.8: Variations of y = (x − r)² when r ∈ (−∞, −1)

∗∗ r ∈ (1, +∞): Using Table 4.7, we obtain

max_{x∈[−1,1]} (x − r)² = (1 + r)² = f*(r).

∗∗ r ∈ (−∞, −1): Table 4.8 shows that

max_{x∈[−1,1]} (x − r)² = (1 − r)² = f*(r).
Conclusion: Note that (1 + r)² − (1 − r)² = 4r, so

f*(r) = (1 − r)²  if r < 0,    f*(r) = 1  if r = 0,    f*(r) = (1 + r)²  if r > 0.

For r ≠ 0, f* is differentiable since it is a polynomial. For r = 0, we have

(f*(r) − f*(0))/(r − 0) = ((1 − r)² − 1)/r = −(2 − r)   if r < 0,
(f*(r) − f*(0))/(r − 0) = ((1 + r)² − 1)/r = 2 + r   if r > 0.

Hence

lim_{r→0⁻} (f*(r) − f*(0))/(r − 0) = −2    and    lim_{r→0⁺} (f*(r) − f*(0))/(r − 0) = 2,

and f* is not differentiable at 0.
This does not contradict the theorem, since the regularity of f* was proved when x* is an interior point, which is not the case here with x* = ±1. Indeed, we have f(x) = f(x, 0) = x² and f*(0) = f(±1, 0) = 1.
ii) We have

min_{x∈[−1,1]} x² = 0 = f(0) = g*(0),    0 ∈ (−1, 1),    f″(x) = 2 > 0.

So f attains its minimal value at the interior point 0, where the second derivative test is satisfied. Moreover, f is convex on [−1, 1], which makes 0 the global minimum point. Therefore, g* is regular for r close to 0; that is, ∃η > 0 such that g* ∈ C¹(−η, η). In fact, for r ∈ (−1, 1) the minimum is attained at x = r, so g*(r) = 0 exactly, which is a regular function.
2. – Find an approximate value of

max_{R²} (1.05)² x + 5y sin(0.01) − 2x² − 3y².

Solution: Since we are looking for an estimate of the maximal value, we proceed using the linear approximation of a suitable value function. First, remark that 1.05 ≈ 1 and sin(0.01) ≈ sin(0) = 0. So, if we introduce the function

f(x, y, r, s) = r² x + 5y sin(s) − 2x² − 3y²,

where r and s are parameters, then the given problem max f(x, y, 1.05, 0.01) appears as a perturbation of the simpler problem max f(x, y, 1, 0) = max (x − 2x² − 3y²).

Solving max_{R²} x − 2x² − 3y².
Since R² is an open set, a global extreme point of f(x, y) = f(x, y, 1, 0) is also a local extreme point. Therefore, it is a stationary point of f(x, y) = x − 2x² − 3y² (f is a polynomial, so it is C^∞). We have

∇f(x, y) = ⟨1 − 4x, −6y⟩ = ⟨0, 0⟩   ⟺   (x, y) = (1/4, 0).

The only stationary point is (1/4, 0). The Hessian matrix is

Hf(x, y) = | −4  0 |
           |  0 −6 |
The leading principal minors are D1(x, y) = −4 < 0 and D2(x, y) = 24 > 0. Hence, f is strictly concave on R² and we conclude that (x*, y*) = (1/4, 0) is the global maximum point, and the only one.

Linear approximation. We have:

1. Hf(., r, s)(x, y) = | −4 0 ; 0 −6 |  ⟹  f(., r, s) is concave on R² for any (r, s) ∈ R².

2. f(1/4, 0) = max_{R²} f(x, y, 1, 0) = 1/8.

3. The second-order condition for strict maximality is satisfied when (r, s) = (1, 0) at the point (x, y) = (1/4, 0).

As a consequence,

– ∃η > 0 such that the function f*(r, s) = max_{(x,y)∈R²} f(x, y, r, s) is defined for any (r, s) ∈ Bη(1, 0);

– f* is C¹(Bη(1, 0)) and

∂f*/∂r(1, 0) = ∂f/∂r|_{(x,y)=(1/4,0), (r,s)=(1,0)} = 2rx|_{(x,y)=(1/4,0), (r,s)=(1,0)} = 1/2,

∂f*/∂s(1, 0) = ∂f/∂s|_{(x,y)=(1/4,0), (r,s)=(1,0)} = 5y cos(s)|_{(x,y)=(1/4,0), (r,s)=(1,0)} = 0.

We can write, for (r, s) close to (1, 0),

f*(r, s) ≈ f*(1, 0) + ∂f*/∂r(1, 0)(r − 1) + ∂f*/∂s(1, 0)(s − 0) = 1/8 + (1/2)(r − 1).

In particular, for (r, s) = (1.05, 0.01), the objective function f* takes the approximate value

f*(1.05, 0.01) ≈ 0.125 + (1/2)(1.05 − 1) = 0.125 + 0.025 = 0.15,

and the approximate change in the maximum value is

f*(1.05, 0.01) − f*(1, 0) ≈ (1/2)(1.05 − 1) = 0.025.
3. – Consider the problem

min(max) f(x, y, z) = eˣ + y + z   s.t.   g1 = x + y + z = 1,   g2 = x² + y² + z² = 1.

i) Apply Lagrange's theorem to the problem to show that there are four points satisfying the necessary conditions.
ii) Show that each point is a regular point.
iii) What can you conclude about the global minimal and maximal values of f subject to g1 = g2 = 1? Justify your answer.
iv) Replace the constraints by x + y + z = a and x² + y² + z² = b with (a, b) close to (1, 1) (a > 0, b > 0).
– What is the approximate change in the optimal value function f*(a, b) = min_{g1=a, g2=b} f(x, y, z)?
– What is the approximate change in the optimal value function F*(a, b) = max_{g1=a, g2=b} f(x, y, z)?
Solution: i) Note that f, g1 and g2 are C^∞ in R³. Consider the Lagrangian

L(x, y, z, λ1, λ2) = eˣ + y + z − λ1(x + y + z − 1) − λ2(x² + y² + z² − 1)

and look for its stationary points, solutions of the system ∇L(x, y, z, λ1, λ2) = 0_{R⁵}:

(1) Lx = eˣ − λ1 − 2xλ2 = 0
(2) Ly = 1 − λ1 − 2yλ2 = 0
(3) Lz = 1 − λ1 − 2zλ2 = 0
(4) Lλ1 = −(x + y + z − 1) = 0
(5) Lλ2 = −(x² + y² + z² − 1) = 0.
From equations (2) and (3), we deduce that (z − y)λ2 = 0, so z = y or λ2 = 0.
∗ If λ2 = 0, we deduce from equation (2) that λ1 = 1, and then from equation (1) that x = 0. Hence, equations (4) and (5) give 2y² − 2y = 0, so y = 0 or y = 1. Therefore, we have the two points

(0, 1, 0),  (0, 0, 1)   with   (λ1, λ2) = (1, 0).
∗∗ If z = y, then equations (4) and (5) give

x = 1 − 2y   and   6y² − 4y = 0   ⟹   y = 0 or y = 2/3.

Therefore, we have the two points

(1, 0, 0)   with   λ1 = 1   and   λ2 = (1/2)(e − 1),

(−1/3, 2/3, 2/3)   with   λ1 = (1/3)(1 + 2e^{−1/3})   and   λ2 = (1/2)(1 − e^{−1/3}).
ii) Consider the matrix

g′(x, y, z) = | ∂g1/∂x ∂g1/∂y ∂g1/∂z ; ∂g2/∂x ∂g2/∂y ∂g2/∂z | = | 1 1 1 ; 2x 2y 2z |,

g′(0, 1, 0) = | 1 1 1 ; 0 2 0 |,    g′(0, 0, 1) = | 1 1 1 ; 0 0 2 |,
g′(1, 0, 0) = | 1 1 1 ; 2 0 0 |,    g′(−1/3, 2/3, 2/3) = | 1 1 1 ; −2/3 4/3 4/3 |.

Each critical point is regular. We remark that the first two column vectors are linearly independent in the matrices g′(−1/3, 2/3, 2/3), g′(0, 1, 0) and g′(1, 0, 0), while they are linearly dependent in g′(0, 0, 1). Therefore, we can keep the variables in their original order when applying the second derivatives test at the first three points, and must renumber the variables at the last one.

iii) Now, f is continuous on the constraint set, which is a closed and bounded curve of R³, being the intersection of the unit sphere x² + y² + z² = 1 and the plane x + y + z − 1 = 0. So f attains its optimal values, by the extreme value theorem, at points that are also critical points of the Lagrangian. Comparing the values of f at these points, we obtain

2 = f(0, 1, 0) = f(0, 0, 1) < f(−1/3, 2/3, 2/3) = e^{−1/3} + 4/3 ≈ 2.0498 < e = f(1, 0, 0).
Hence

min_{g1=1, g2=1} f(x, y, z) = f(0, 1, 0) = f(0, 0, 1) = 2   and   max_{g1=1, g2=1} f(x, y, z) = f(1, 0, 0) = e.
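These two values can be confirmed numerically. The constraint set g1 = g2 = 1 is the circle where the plane x + y + z = 1 cuts the unit sphere: its center is (1/3, 1/3, 1/3), its radius is √(2/3), and it lies in the plane spanned by the orthonormal vectors u, v below. The sketch scans this circle (a verification aid, not part of the original solution):

```python
import numpy as np

# Parametrize the circle {x + y + z = 1} ∩ {x^2 + y^2 + z^2 = 1} and evaluate
# f(x, y, z) = e^x + y + z along it; the extremes should be 2 and e.
t = np.linspace(0.0, 2.0 * np.pi, 200001)
center = np.array([1.0, 1.0, 1.0]) / 3.0
u = np.array([1.0, -1.0, 0.0]) / np.sqrt(2.0)
v = np.array([1.0, 1.0, -2.0]) / np.sqrt(6.0)
r = np.sqrt(2.0 / 3.0)

p = center[:, None] + r * (u[:, None] * np.cos(t) + v[:, None] * np.sin(t))
f = np.exp(p[0]) + p[1] + p[2]
print(f.min(), f.max())  # approximately 2 and e = 2.71828...
```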
iv) ∗ If we denote f*(a, b) = min_{g1=a, g2=b} f(x, y, z), then f* is regular for (a, b) close to (1, 1) because:

1. for (a, b) close to (1, 1), there exists a solution to the constrained minimization problem by the extreme value theorem (f is continuous on the closed bounded set x + y + z = a, x² + y² + z² = b);

2. (0, 1, 0) and (0, 0, 1) are solutions to the constrained minimization problem when (a, b) = (1, 1) and are regular points;

3. the second-order condition for minimality is satisfied when (a, b) = (1, 1) at (0, 1, 0) and (0, 0, 1). Indeed, n = 3 and m = 2, so we have to consider the sign of the following bordered Hessian determinant:
B3(x, y, z) = det | 0       0       ∂g1/∂x  ∂g1/∂y  ∂g1/∂z |
                 | 0       0       ∂g2/∂x  ∂g2/∂y  ∂g2/∂z |
                 | ∂g1/∂x  ∂g2/∂x  Lxx     Lxy     Lxz    |
                 | ∂g1/∂y  ∂g2/∂y  Lyx     Lyy     Lyz    |
                 | ∂g1/∂z  ∂g2/∂z  Lzx     Lzy     Lzz    |

            = det | 0  0   1         1     1     |
                 | 0  0   2x        2y    2z    |
                 | 1  2x  eˣ − 2λ2  0     0     |
                 | 1  2y  0         −2λ2  0     |
                 | 1  2z  0         0     −2λ2  |

B3(0, 1, 0) = det | 0 0 1 1 1 |
                 | 0 0 0 2 0 |
                 | 1 0 1 0 0 |
                 | 1 2 0 0 0 |
                 | 1 0 0 0 0 |
            = 4   ⟹   (−1)² B3(0, 1, 0) = 4 > 0.

We renumber the variables in the order (x, z, y) to compute B3(0, 0, 1) and obtain the same determinant:

B3(0, 0, 1) = 4   ⟹   (−1)² B3(0, 0, 1) = 4 > 0.
Consequently, with the new Lagrangian

La,b(x, y, z, λ1, λ2) = eˣ + y + z − λ1(x + y + z − a) − λ2(x² + y² + z² − b),

we have f*(1, 1) = f(0, 1, 0) = f(0, 0, 1) = 2 with λ1(1, 1) = 1 and λ2(1, 1) = 0, and

∂f*/∂a(1, 1) = ∂La,b/∂a|_{(x,y,z,λ1,λ2)=(0,1,0,λ1(1,1),λ2(1,1))} = λ1(1, 1) = 1,
∂f*/∂b(1, 1) = ∂La,b/∂b|_{(x,y,z,λ1,λ2)=(0,1,0,λ1(1,1),λ2(1,1))} = λ2(1, 1) = 0,

f*(a, b) ≈ f*(1, 1) + ∂f*/∂a(1, 1)(a − 1) + ∂f*/∂b(1, 1)(b − 1) = 2 + (a − 1) + 0·(b − 1) = a + 1.

∗∗ If we denote
F*(a, b) = max_{g1=a, g2=b} f(x, y, z), then F* is regular for (a, b) close to (1, 1) because:

1. for (a, b) close to (1, 1), there exists a solution to the constrained maximization problem by the extreme value theorem (f is continuous on the closed bounded set x + y + z = a, x² + y² + z² = b);

2. (1, 0, 0) is the solution to the constrained maximization problem when (a, b) = (1, 1) and it is a regular point;

3. the second-order condition for maximality is satisfied when (a, b) = (1, 1) at (1, 0, 0). Indeed, n = 3 and m = 2, so we consider the sign of the bordered Hessian determinant

B3(1, 0, 0) = det | 0 0 1 1     1     |
                 | 0 0 2 0     0     |
                 | 1 2 1 0     0     |
                 | 1 0 0 1 − e 0     |
                 | 1 0 0 0     1 − e |
            = 8(1 − e) < 0   ⟹   (−1)³ B3(1, 0, 0) = 8(e − 1) > 0.

Consequently, we have F*(1, 1) = f(1, 0, 0) = e with λ1(1, 1) = 1 and λ2(1, 1) = (1/2)(e − 1), and

∂F*/∂a(1, 1) = ∂La,b/∂a|_{(x,y,z,λ1,λ2)=(1,0,0,λ1(1,1),λ2(1,1))} = λ1(1, 1) = 1,
∂F*/∂b(1, 1) = ∂La,b/∂b|_{(x,y,z,λ1,λ2)=(1,0,0,λ1(1,1),λ2(1,1))} = λ2(1, 1) = (1/2)(e − 1),

F*(a, b) ≈ F*(1, 1) + ∂F*/∂a(1, 1)(a − 1) + ∂F*/∂b(1, 1)(b − 1) = e + (a − 1) + (1/2)(e − 1)(b − 1).
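The multiplier interpretation ∂F*/∂b(1, 1) = λ2(1, 1) = (e − 1)/2 can be checked by finite differences (a numerical aside, not part of the original solution). For g1 = a, g2 = b the constraint set is a circle of center (a/3)(1, 1, 1) and radius √(b − a²/3) in the plane x + y + z = a, so F* can be computed by scanning that circle:

```python
import numpy as np

# F*(a, b) = max of e^x + y + z over the circle {x+y+z=a} ∩ {x^2+y^2+z^2=b},
# computed by a dense parametric scan (valid for b > a^2/3).
def F_star(a, b):
    t = np.linspace(0.0, 2.0 * np.pi, 200001)
    center = np.array([a, a, a]) / 3.0
    u = np.array([1.0, -1.0, 0.0]) / np.sqrt(2.0)
    v = np.array([1.0, 1.0, -2.0]) / np.sqrt(6.0)
    r = np.sqrt(b - a * a / 3.0)
    p = center[:, None] + r * (u[:, None] * np.cos(t) + v[:, None] * np.sin(t))
    return np.max(np.exp(p[0]) + p[1] + p[2])

h = 1e-4
dFdb = (F_star(1.0, 1.0 + h) - F_star(1.0, 1.0 - h)) / (2.0 * h)
print(dFdb, (np.e - 1.0) / 2.0)  # both approximately 0.8591
```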
4. – Consider the problem

min(max) f(x, y) = 1 − (x − 2)² − y²   s.t.   x² + y² ≤ 8,   x − y ≤ 0.

i) Sketch the feasible set and write down the necessary KKT conditions.
ii) Find the candidate solutions of the necessary KKT conditions.
iii) Use the second derivatives test to classify the points.
iv) Explore the concavity and convexity of the associated Lagrangian in (x, y).
v) What can you conclude about the solution of the maximization problem?
vi) Determine the approximate values of each problem

min(max) 1 − (0.98)³(x − 2)² − e^{−0.01} y²   s.t.   x² + √1.04 y² ≤ 8,   (1.04)² x − y ≤ 0.

Solution: i) Figure 4.25 describes the constraint set and locates the extreme points, approximately, following the variation of the objective function along the level curves. Consider the Lagrangian

L(x, y, λ, β) = 1 − (x − 2)² − y² − λ(x² + y² − 8) − β(x − y).

We look simultaneously for the possible minimum and maximum candidates. Thus, the Karush-Kuhn-Tucker conditions are
FIGURE 4.25: Level curve of highest profit
(1) Lx = −2(x − 2) − 2λx − β = 0
(2) Ly = −2y − 2λy + β = 0
(3) λ = 0 if x² + y² < 8
(4) β = 0 if x − y < 0.

∗ If x² + y² < 8, then λ = 0.

– Suppose x − y < 0. Then β = 0 by (4), and equations (1), (2) give (x, y) = (2, 0), with x − y = 2 > 0, which contradicts x − y < 0.

– Suppose x − y = 0. We then have y = x and, from (1) and (2) with λ = 0, y = −x + 2. Thus, we have a candidate point for optimality:

(x, y) = (1, 1)   with   (λ, β) = (0, 2).
∗ If x² + y² = 8:

– Suppose x − y < 0. Then β = 0 and

x − 2 + λx = 0,   −2y(1 + λ) = 0   ⟺   y = 0  or  λ = −1.

λ = −1 is not possible by x − 2 + λx = 0. Thus y = 0. With x² + y² = 8, we deduce that x = √8, which contradicts x < y = 0, or x = −√8. Inserting the value x = −√8 into x − 2 + λx = 0 gives λ = −1 − 1/√2. So, we have another candidate:

(x, y) = (−√8, 0)   with   (λ, β) = (−1 − 1/√2, 0).
– Suppose x − y = 0. With x² + y² = 8, we deduce that x = 2 or x = −2. Then, inserting in (1) and (2), we obtain

(x, y) = (2, 2)   ⟹   −4λ − β = 0,  −4λ + β = 4   ⟺   (λ, β) = (−1/2, 2),

contradicting the common sign of λ and β;

(x, y) = (−2, −2)   ⟹   4λ − β = −8,  4λ + β = −4   ⟺   (λ, β) = (−3/2, 2),

contradicting the common sign of λ and β.

Regularity of the candidate point (1, 1). Note that the constraints g1(x, y) = x² + y² and g2(x, y) = x − y are C¹ in R² and that only the constraint g2 is active at (1, 1). We have

g2′(x, y) = ⟨1, −1⟩,    rank(g2′(1, 1)) = 1.

Thus the point (1, 1) is a regular point.

Regularity of the candidate point (−√8, 0). Only the constraint g1 is active at (−√8, 0). We have

g1′(x, y) = ⟨2x, 2y⟩,    rank(g1′(−√8, 0)) = 1.

Thus the point (−√8, 0) is a regular point.
iii) Second derivatives test at (1, 1). With p = 1 (the number of active constraints) and n = 2 (the dimension of the space), r ranges over p + 1 = 2, ..., n = 2, so r = 2 and we consider the determinant

B2(x, y) = det | 0       ∂g2/∂x  ∂g2/∂y |
              | ∂g2/∂x  Lxx     Lxy    |
              | ∂g2/∂y  Lyx     Lyy    |
         = det | 0   1        −1       |
              | 1   −2 − 2λ  0        |
              | −1  0        −2 − 2λ  |

∗ At (1, 1), we have λ = 0, so

B2(1, 1) = det | 0 1 −1 ; 1 −2 0 ; −1 0 −2 | = 4   ⟹   (−1)² B2(1, 1) > 0,

and (1, 1) is a local maximum.

Second derivatives test at (−√8, 0). We consider the determinant

B2(x, y) = det | 0       ∂g1/∂x  ∂g1/∂y |
              | ∂g1/∂x  Lxx     Lxy    |
              | ∂g1/∂y  Lyx     Lyy    |
         = det | 0   2x       2y       |
              | 2x  −2 − 2λ  0        |
              | 2y  0        −2 − 2λ  |

∗ At (−√8, 0), we have λ = −1 − 1/√2, so −2 − 2λ = √2 and

B2(−√8, 0) = det | 0 −2√8 0 ; −2√8 √2 0 ; 0 0 √2 | = −32√2   ⟹   (−1)¹ B2(−√8, 0) > 0,

and (−√8, 0) is a local minimum.
iv) and v) Let us explore the concavity and convexity of L with respect to (x, y), where the Hessian of L in (x, y) is

HL = | Lxx Lxy ; Lyx Lyy | = | −2 − 2λ  0 ; 0  −2 − 2λ |.

• When λ = 0, the principal minors are Δ1¹ = Lyy = −2 < 0, Δ1² = Lxx = −2 < 0 and Δ2 = 4 > 0. So (−1)ᵏ Δk ≥ 0 for k = 1, 2. Therefore, L is concave in (x, y), and then (1, 1) is a global maximum for the constrained maximization problem.

• When λ = −1 − 1/√2, the principal minors are Δ1¹ = Lyy = √2 > 0, Δ1² = Lxx = √2 > 0 and Δ2 = 2 > 0. So Δk ≥ 0 for k = 1, 2. Therefore, L is convex in (x, y), and then (−√8, 0) is a global minimum for the constrained minimization problem.
vi) Note that 0.98 ≈ 1, 1.04 ≈ 1 and e^{−0.01} ≈ 1, so the new problems appear as perturbations of the original problem. We therefore use the linear approximation at r = 0.98, s = 1.04 and t = −0.01. Introduce the Lagrangian associated with the new constrained optimization problem

L(x, y, λ, β, r, s, t) = 1 − r³(x − 2)² − eᵗ y² − λ(x² + √s y² − 8) − β(s² x − y).

• Set f(x, y, r, s, t) = 1 − r³(x − 2)² − eᵗ y², and the value function

f*(r, s, t) = min f(x, y, r, s, t)   s.t.   x² + √s y² ≤ 8,   s² x − y ≤ 0.

Then f* is well defined and differentiable when (r, s, t) is close to (1, 1, 0). Indeed:

1. There is a unique solution (x, y) = (−√8, 0) to the constrained minimization problem when (r, s, t) = (1, 1, 0), and (−√8, 0) is a regular point.

2. For (r, s, t) close to (1, 1, 0), there exists a solution to the constrained minimization problem by the extreme value theorem, since the constraint set is closed and bounded and the function is continuous.

3. The second-order condition for minimality is satisfied at (−√8, 0) when (r, s, t) = (1, 1, 0).

As a consequence, evaluating everything at (x, y, λ, β) = (−√8, 0, −1 − 1/√2, 0) and (r, s, t) = (1, 1, 0),

∂f*/∂r(1, 1, 0) = ∂L/∂r = −3r²(x − 2)² = −3(√8 + 2)²,

∂f*/∂s(1, 1, 0) = ∂L/∂s = −λ y²/(2√s) − 2βsx = 0,

∂f*/∂t(1, 1, 0) = ∂L/∂t = −eᵗ y² = 0.
Hence, for (r, s, t) close to (1, 1, 0),

f*(r, s, t) ≈ f*(1, 1, 0) + ∂f*/∂r(1, 1, 0)(r − 1) + ∂f*/∂s(1, 1, 0)(s − 1) + ∂f*/∂t(1, 1, 0)(t − 0)

with f*(1, 1, 0) = f(−√8, 0) = 1 − (√8 + 2)², so

f*(r, s, t) ≈ 1 − (√8 + 2)² − 3(√8 + 2)²(r − 1),

f*(0.98, 1.04, −0.01) ≈ 1 − (√8 + 2)² + 3(√8 + 2)²(0.02) ≈ −20.91.

• Set the value function

F*(r, s, t) = max f(x, y, r, s, t)   s.t.   x² + √s y² ≤ 8,   s² x − y ≤ 0.
Then F* is well defined and differentiable when (r, s, t) is close to (1, 1, 0). Indeed:

1. There is a unique solution (x, y) = (1, 1) to the constrained maximization problem when (r, s, t) = (1, 1, 0), and (1, 1) is a regular point.

2. For (r, s, t) close to (1, 1, 0), there exists a solution to the constrained maximization problem by the extreme value theorem, since the constraint set is closed and bounded and the function is continuous.

3. The second-order condition for maximality is satisfied at (1, 1) when (r, s, t) = (1, 1, 0).

As a consequence, evaluating everything at (x, y, λ, β) = (1, 1, 0, 2) and (r, s, t) = (1, 1, 0),

∂F*/∂r(1, 1, 0) = ∂L/∂r = −3r²(x − 2)² = −3,

∂F*/∂s(1, 1, 0) = ∂L/∂s = −λ y²/(2√s) − 2βsx = −4,

∂F*/∂t(1, 1, 0) = ∂L/∂t = −eᵗ y² = −1.
Hence, for (r, s, t) close to (1, 1, 0),

F*(r, s, t) ≈ F*(1, 1, 0) + ∂F*/∂r(1, 1, 0)(r − 1) + ∂F*/∂s(1, 1, 0)(s − 1) + ∂F*/∂t(1, 1, 0)(t − 0)

F*(r, s, t) ≈ −1 − 3(r − 1) − 4(s − 1) − (t − 0)

F*(0.98, 1.04, −0.01) ≈ −1 − 3(−0.02) − 4(0.04) − (−0.01) = −1 + 0.06 − 0.16 + 0.01 = −1.09.
Remark. The set of feasible solutions S = {(x, y) : x² + y² ≤ 8, x − y ≤ 0} is a closed bounded set of R² and f is continuous on S. Therefore, the extreme values are attained on this set by the extreme value theorem. Moreover, such points must occur either at points satisfying the KKT conditions or at points where the constraint qualification fails. Since (1, 1) and (−√8, 0) are the only two solution points and they are regular, they solve the problem.

For more practice, we refer the reader to [11], [27], [28], [26], [25], [24], [4].
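The classification of the two candidate points can also be confirmed by a brute-force scan of the feasible set (a numerical check, not part of the original solution): the maximum of f should be −1 at (1, 1), and the minimum 1 − (2 + √8)² ≈ −22.31 at (−√8, 0).

```python
import numpy as np

# Grid scan of S = {x^2 + y^2 <= 8, x <= y}; infeasible points are masked
# with NaN so nanmax/nanmin only see feasible values.
n = 1200
x, y = np.meshgrid(np.linspace(-3.0, 3.0, n), np.linspace(-3.0, 3.0, n))
feasible = (x**2 + y**2 <= 8.0) & (x <= y)
f = np.where(feasible, 1.0 - (x - 2.0) ** 2 - y**2, np.nan)

print(np.nanmax(f), np.nanmin(f))  # near -1 and near 1 - (2 + sqrt(8))^2
```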
Bibliography
[1] H. Anton, I. Bivens, and S. Davis. Calculus. Early Transcendentals. John Wiley & Sons, Inc., New York, NY, USA, 2005.
[2] R. G. Bartle and D. R. Sherbert. Introduction to Real Analysis. John Wiley & Sons, Inc., 2011.
[3] W. Briggs, L. Cochran, and B. Gillett. Calculus. Early Transcendentals. Addison-Wesley, Pearson, 2011.
[4] E. K. P. Chong and S. H. Żak. An Introduction to Optimization. Wiley, 2013.
[5] P. G. Ciarlet. Introduction à l'analyse numérique matricielle et à l'optimisation. Masson, 1985.
[6] B. Dacorogna. Introduction au calcul des variations. Presses polytechniques et universitaires romandes, Lausanne, 1992.
[7] E. F. Haeussler, Jr., R. S. Paul, and R. J. Wood. Introductory Mathematical Analysis for Business, Economics, and the Life and Social Sciences. Pearson, Prentice Hall, 2008.
[8] P. E. Fishback. Linear and Nonlinear Programming with Maple™. An Interactive, Applications-Based Approach. CRC Press, Taylor and Francis Group, 2010.
[9] A. S. Gupta. Calculus of Variations with Applications. Prentice-Hall of India, 2006.
[10] W. Keith Nicholson. Linear Algebra with Applications. McGraw-Hill Ryerson, 2014.
[11] D. Koo. Elements of Optimisation with Applications in Economics and Business. Springer-Verlag, 1977.
[12] R. J. Larsen and M. L. Marx. An Introduction to Mathematical Statistics and its Applications. Prentice Hall, 2001.
[13] S. Lipschutz. Topologie, cours et problèmes. McGraw-Hill, 1983.
[14] D. G. Luenberger. Introduction to Linear and Nonlinear Programming. Addison Wesley, 1973.
[15] J. E. Marsden. Elementary Classical Analysis. W. H. Freeman and Company, 1974.
[16] M. Mesterton-Gibbons. A Primer on the Calculus of Variations and Optimal Control Theory. Student Mathematical Library vol. 50, American Mathematical Society, 2009.
[17] M. Minoux. Mathematical Programming: Theory and Algorithms. John Wiley and Sons, 1986.
[18] J. R. Munkres. Topology: A First Course. Prentice Hall, 1975.
[19] J. Nocedal and S. J. Wright. Numerical Optimization. Springer, 1999.
[20] M. H. Protter and C. B. Morrey. A First Course in Real Analysis. Springer, 2000.
[21] S. L. Salas, E. Hille, and G. J. Etgen. Calculus. One and Several Variables. Tenth Edition. John Wiley & Sons, Inc., 2007.
[22] J. A. Snyman. Practical Mathematical Optimization: An Introduction to Basic Optimization Theory and Classical and New Gradient-Based Algorithms. Springer, 2005.
[23] J. Stewart. Essential Calculus. Brooks/Cole, 2013.
[24] K. Sydsæter and P. Hammond. Mathematics for Economic Analysis. FT Prentice Hall, 1995.
[25] K. Sydsæter, P. Hammond, A. Seierstad, and A. Strøm. Further Mathematics for Economic Analysis. FT Prentice Hall, 2008.
[26] K. Sydsæter, P. Hammond, A. Seierstad, and A. Strøm. Instructor's Manual: Further Mathematics for Economic Analysis. Pearson, 2008. 2nd Edition.
[27] K. Sydsæter, A. Strøm, and P. Hammond. Instructor's Manual: Essential Mathematics for Economic Analysis. Pearson, 2008. 3rd Edition.
[28] K. Sydsæter, A. Strøm, and P. Hammond. Instructor's Manual: Essential Mathematics for Economic Analysis. Pearson, 2014. 4th Edition.
[29] W. L. Winston. Operations Research: Applications and Algorithms. Brooks/Cole, 2004.
Index
absolute maximum, 54, 117
absolute minimum, 54, 117
active, 223
affine, 209
approximate method, 60
approximation, 293
ball, 8
binding, 223
bordered Hessian determinant, 252
boundary, 10, 27, 117
bounded, 10, 117
chain rule, 34, 96, 294
Clairaut, 31
closed, 10, 117
closure, 10
Cobb-Douglas, 20, 292
columns, 80
concave, 93
cone, 24, 206
cone of feasible directions, 204
constraint function, 292
continuous, 28, 117
continuously differentiable, 34
convex, 13, 93
critical point, 54
critical points, 117
cylinder, 22
dependence, 292
determinant, 80
differentiability, 29
differentiable, 33
dimension, 22
domain, 21
eigenvalue, 79
ellipse, 23
ellipsoid, 25
extreme-value theorem, 117
Farkas-Minkowski, 222
generalized Lagrange multipliers, 223
global extreme points, 117
global maximum, 50
global minimum, 50
gradient, 30, 141
graph, 22
Hessian, 31, 71, 98
hyperplane, 29
implicit function theorem, 138, 139, 214, 295
inactive, 223
inflection point, 55
interior, 9, 117, 139
interior point, 9, 54, 204
intermediate value theorem, 69
Jacobian, 141
Karush-Kuhn-Tucker, 223, 232, 235
Lagrange, 153
Lagrange multipliers, 153
Lagrangian, 153, 223
Laplace, 32
leading minors, 71, 98
level curve, 22
level surface, 22
line, 23
line tangent, 29
linear, 33
linear combination, 152
linear constraints, 175
linear programming, 133
linearly independent, 138, 176, 206, 252
local extreme point, 50
local maximum, 50
local minimum, 50
negative definite, 175
negative semidefinite, 82, 255
neighborhood, 9, 27, 80
normal line, 143
normal vector, 143
objective function, 50, 292
open, 9
optimal value function, 293
orthogonal, 152
orthogonal matrix, 79
parabola, 23
paraboloid, 23
parallel, 23, 143
parameters, 292
partial derivative, 29
plane tangent, 152
polyhedra, 133
positive definite, 82, 99, 175
positive semidefinite, 82, 255
principal minor, 80, 82, 100
production, 20
quadratic form, 76, 78, 81, 99, 254
radius, 9
rank, 139
rate of change, 29
regular point, 137, 206
relative maximum, 56
relative minimum, 56
rows, 80
saddle point, 56, 80
second derivatives test, 72, 293
semidefinite, 80
several variables, 26, 29
slack, 223
slope, 29
stationary point, 54
strictly concave, 93
strictly convex, 93, 99
subspace, 138, 140
surface, 22
symmetric, 76, 79
symmetric matrix, 81
tangent line, 95, 142
tangent plane, 137, 254
Taylor's formula, 76, 253
traces, 22
triangular inequality, 8, 94
unbounded, 10, 121
unit vectors, 150
vertices, 26