Mathematics for Economists with Applications provides detailed coverage of the mathematical techniques essential for undergraduate and introductory graduate work in economics, business and finance.
English · 690 pages [713] · 2015
Table of contents:
Cover......Page 1
Title Page......Page 4
Copyright Page......Page 5
Table of Contents......Page 6
List of figures......Page 16
Preface......Page 22
1.2 A brief summary by chapter......Page 24
2.1 Introduction......Page 42
2.1.1 Some motivating examples......Page 43
2.2 Vectors......Page 46
2.3 Matrices......Page 47
2.3.1 Matrix addition and subtraction......Page 48
2.3.2 Matrix multiplication......Page 49
2.3.3 Matrix transpose......Page 52
2.3.4 Matrix inversion......Page 53
2.3.5 Additional remarks on determinants......Page 63
2.3.6 Cramer’s rule......Page 66
2.3.7 Gaussian elimination......Page 67
2.4 Linear dependence and independence......Page 73
2.4.1 Matrix rank and linear independence......Page 78
2.5.1 Solving systems of equations: a geometric view......Page 80
2.5.2 Solutions of equation systems: rank and linear independence......Page 84
2.6 Special matrices......Page 87
Exercises for Chapter 2......Page 94
3.2 The market model......Page 100
3.3 Adjacency matrices......Page 101
3.4 Input–output analysis......Page 104
3.5 Inflation and unemployment dynamics......Page 111
3.6 Stationary (invariant) distributions......Page 112
3.6.1 Convergence to a stationary distribution......Page 114
3.6.2 Computation of the invariant distribution......Page 116
3.7 Econometrics......Page 117
3.7.1 Derivation of the least squares estimator......Page 120
Exercises for Chapter 3......Page 124
4.1 Introduction......Page 128
4.1.1 Formulation......Page 129
4.1.2 The feasible region......Page 132
4.1.3 Finding an optimal solution......Page 133
4.1.4 Dual prices and slack variables......Page 136
4.1.5 Some examples......Page 139
4.1.6 Duality reconsidered......Page 143
4.2.1 Equality-constrained programs......Page 146
4.2.2 Definition and identification of basic solutions......Page 148
4.3 Duality principles......Page 154
4.3.1 Duality and dual prices......Page 157
Exercises for Chapter 4......Page 160
5.1 Introduction......Page 168
5.2.1 Demand functions......Page 169
5.2.2 Present value and the interest rate......Page 170
5.2.3 Taxation......Page 171
5.2.4 Options......Page 172
5.3.1 Preliminaries......Page 174
5.3.2 Inequalities and absolute value......Page 176
5.3.3 Continuity......Page 177
5.3.4 Differentiability......Page 179
5.4 Types of function......Page 181
5.4.1 Plots of various functions......Page 182
5.4.2 The log and exponential functions......Page 184
5.5 Growth rates......Page 186
5.5.1 Additivity of continuous growth rates......Page 187
5.5.2 The functions ln x (log_e x) and e^kx......Page 188
5.6 Some rules of calculus......Page 189
5.6.1 l’Hospital’s rule......Page 192
5.6.2 Higher derivatives......Page 193
Exercises for Chapter 5......Page 194
6.1 Introduction......Page 198
6.2 Optimization......Page 199
6.2.1 The first-order condition: issues......Page 200
6.2.2 The second-order condition......Page 204
6.2.3 Applications: profit maximization and related problems......Page 206
6.3 Concavity and optimization......Page 210
6.4.1 The free-rider problem......Page 216
6.5 Comparative statics......Page 219
6.6 Elasticity of demand......Page 223
6.6.1 Elasticity, revenue and profit maximization......Page 225
6.7 Taxation and monopoly......Page 226
6.8 Taxation incidence in supply and demand......Page 228
6.8.1 A unit tax......Page 229
6.8.2 An ad valorem tax......Page 230
6.8.3 Elasticities and taxation......Page 232
6.9 Ramsey pricing......Page 234
Exercises for Chapter 6......Page 238
7.1 Introduction......Page 242
7.2 Partial derivatives......Page 246
7.3 Level contours......Page 248
7.4 The elasticity of substitution......Page 250
7.5 Market equilibrium: an application......Page 253
7.5.1 Oligopoly: linear demand......Page 255
7.5.2 Oligopoly: nonlinear demand......Page 256
7.5.3 Industry concentration indices......Page 257
7.6 Total differentials......Page 258
7.6.1 The impact of parameter changes......Page 263
7.7 Solutions to systems of equations......Page 265
7.7.1 Existence and properties of solutions......Page 267
7.8 The implicit function theorem......Page 270
7.8.1 General systems of equations......Page 276
Exercises for Chapter 7......Page 286
8.1 Introduction......Page 292
8.1.1 Taylor series approximations: an example......Page 293
8.1.2 Taylor series expansions......Page 295
8.1.3 Approximations and accuracy......Page 298
8.2 Applications of Taylor series expansions......Page 299
8.2.1 Concavity and the second derivative......Page 300
8.2.2 Roots of a function......Page 301
8.2.3 Numerical optimization......Page 302
8.3.1 Expected utility theory......Page 308
8.3.2 Risk-averse preferences and welfare loss......Page 310
8.3.3 Risk aversion and the cost of small risk......Page 313
8.4.1 Diversification......Page 316
8.4.2 Portfolio selection......Page 317
8.5 The multivariate version of Taylor’s theorem......Page 319
8.6 Proof of Taylor’s theorem......Page 320
Exercises for Chapter 8......Page 324
9.2 Vectors: length and distance......Page 330
9.3 Vectors: direction and angles......Page 333
9.4 Hyperplanes and direction of increase......Page 336
9.5 Farkas’ lemma......Page 339
Exercises for Chapter 9......Page 342
10.1 Introduction......Page 344
10.2 Quadratic forms......Page 345
10.3 Positive and negative definite matrices......Page 348
10.3.1 Symmetric matrices......Page 350
10.3.2 Criteria for positive and negative definiteness......Page 351
10.3.3 Positive and negative semidefiniteness......Page 352
10.4 Definiteness with equality constraints......Page 353
Exercises for Chapter 10......Page 356
11.2 The two-variable case......Page 358
11.2.1 Motivation for the second-order conditions......Page 362
11.2.2 Failure of the second-order conditions......Page 370
11.2.3 The envelope theorem......Page 372
11.3 Optimization: n variables......Page 380
11.3.1 Motivation for the Hessian conditions......Page 383
11.3.2 Definiteness and second-order conditions......Page 384
11.4 Concavity, convexity and the Hessian matrix......Page 390
Exercises for Chapter 11......Page 392
12.1 Introduction......Page 398
12.1.1 A motivating example......Page 399
12.2 The two-variable case......Page 401
12.3 Motivation for first- and second-order conditions......Page 407
12.3.2 The second-order condition......Page 408
12.3.3 A geometric perspective......Page 410
12.4 The n-variable case......Page 413
12.4.1 Quadratic forms and bordered matrices......Page 414
12.4.2 Bordered Hessians and optimization......Page 415
12.4.3 The second-order condition reconsidered......Page 418
12.5 Interpretation of the Lagrange multiplier......Page 419
12.6 Optimization: concavity and convexity......Page 425
12.6.1 Quasiconcavity and bordered Hessians......Page 432
12.7 Optimization with many constraints......Page 434
Exercises for Chapter 12......Page 438
13.1 Introduction......Page 442
13.1.1 A motivating example......Page 443
13.1.2 Overview......Page 445
13.2 Optimization......Page 446
13.2.1 The constraint qualification......Page 453
13.2.2 Complementary slackness......Page 457
13.2.4 Global and local optima......Page 459
13.2.5 Nonnegativity constraints......Page 460
13.2.6 Necessary conditions and nonconvexities......Page 465
13.2.8 Equality and inequality constraints......Page 467
Exercises for Chapter 13......Page 470
14.1 Introduction......Page 472
14.2 The integral......Page 473
14.4.1 Consumer surplus......Page 475
14.4.2 Producer surplus......Page 476
14.4.3 Consumer surplus and welfare......Page 477
14.5 Cost, supply and profit......Page 479
14.5.1 Average and marginal cost......Page 481
14.6 Taxation and consumer surplus......Page 484
14.7 Present value......Page 488
14.8 Leibnitz’s rule......Page 490
14.9 Inequality measures......Page 492
14.10 Ramsey pricing......Page 494
14.11.1 Consumer surplus......Page 497
14.11.2 Compensating variation......Page 499
14.11.3 Equivalent variation......Page 505
14.11.4 Comparison of welfare measures......Page 508
Exercises for Chapter 14......Page 514
15.1 Introduction......Page 516
15.2 Definitions and basic properties......Page 517
15.3 An application: dynamic models......Page 523
15.3.1 Supply and demand dynamics......Page 525
15.3.2 A model of population growth......Page 527
15.4 The Perron–Frobenius theorem......Page 531
15.5.1 Webpage ranking......Page 534
15.5.2 Leslie matrices......Page 538
15.6 Eigenvalues and eigenvectors of symmetric matrices......Page 542
15.7 Real and complex eigenvalues and eigenvectors......Page 546
15.8 Diagonalization......Page 552
15.9 Properties of eigenvalues and eigenvectors......Page 553
15.9.1 Largest eigenvalues and eigenvalue multiplicity......Page 557
Exercises for Chapter 15......Page 560
16.1 Introduction......Page 564
16.1.1 Preliminary discussion......Page 565
16.2.1 Constant coefficients......Page 567
16.2.2 Variable coefficients......Page 571
16.3.1 The logistic model......Page 575
16.3.2 The Bernoulli equation......Page 577
16.4 First-order differential equations: existence of solutions......Page 580
16.5.1 Constant coefficients......Page 581
16.5.2 Derivation of solution......Page 582
16.6 Systems of differential equations......Page 586
16.7 Stability......Page 589
Exercises for Chapter 16......Page 592
17.1 Introduction......Page 596
17.2 Motivating examples......Page 597
17.3 First-order linear difference equations......Page 599
17.3.2 The constant forcing function......Page 600
17.3.3 The cobweb model......Page 602
17.3.4 Equations with variable forcing function......Page 604
17.4 Second-order linear difference equations......Page 605
17.4.2 The homogeneous equation......Page 606
17.4.3 The general solution......Page 607
17.4.5 Examples......Page 608
17.4.6 The Samuelson multiplier-accelerator model......Page 609
17.5.1 The particular solution......Page 612
17.5.2 The complementary solution......Page 613
17.6 The n-variable case......Page 615
17.7.1 Various forcing functions: solutions......Page 616
17.7.2 Roots of the characteristic equation......Page 618
17.7.3 Stability......Page 622
Exercises for Chapter 17......Page 626
18.1 Introduction......Page 628
18.2 Random variables......Page 629
18.3 The distribution function and density......Page 631
18.3.1 The conditional distribution......Page 633
18.3.2 Joint distributions......Page 634
18.4.2 The binomial distribution......Page 636
18.4.3 The Poisson distribution......Page 639
18.4.4 The Poisson process......Page 640
18.5.1 The uniform distribution......Page 642
18.5.3 The normal distribution......Page 643
18.5.5 The chi-square distribution......Page 646
18.5.6 The t distribution......Page 648
18.5.7 The F distribution......Page 654
Exercises for Chapter 18......Page 656
19.2 Estimation......Page 662
19.2.1 Unbiasedness and efficiency......Page 663
19.3.1 Law of large numbers......Page 665
19.3.2 The central limit theorem......Page 668
19.4 Hypothesis testing......Page 670
19.4.2 Types of error......Page 671
19.4.3 Simple null and alternative hypotheses......Page 675
19.4.4 Uniformly most powerful tests......Page 677
19.4.5 Two-sided tests......Page 683
19.4.6 Unbiased tests......Page 685
19.4.7 Use of chi-square and F tests......Page 686
19.5.1 Matrix computations......Page 687
19.5.2 Ordinary least squares......Page 689
19.5.3 Distributions and hypothesis testing......Page 692
19.5.4 Test statistics......Page 694
19.5.5 Hypothesis testing for the linear model......Page 695
Exercises for Chapter 19......Page 700
Bibliography......Page 704
Index......Page 706
Mathematics for Economists with Applications
Mathematics for Economists with Applications provides detailed coverage of the mathematical techniques essential for undergraduate and introductory graduate work in economics, business and finance. Beginning with linear algebra and matrix theory, the book develops the techniques of univariate and multivariate calculus used in economics, proceeding to discuss the theory of optimization in detail. Integration, differential and difference equations are considered in subsequent chapters. Uniquely, the book also features a discussion of statistics and probability, including a study of the key distributions and their role in hypothesis testing. Throughout the text, large numbers of new and insightful examples and an extensive use of graphs explain and motivate the material. Each chapter develops from an elementary level and builds to more advanced topics, providing logical progression for the student, and enabling instructors to prescribe material to the required level of the course. With coverage substantial in depth as well as breadth, and including a companion website at www.routledge.com/cw/bergin, containing exercises related to the worked examples from each chapter of the book, Mathematics for Economists with Applications contains everything needed to understand and apply the mathematical methods and practices fundamental to the study of economics.

James Bergin is Professor of Economics at Queen’s University, Canada.
Mathematics for Economists with Applications
James Bergin
First published 2015 by Routledge, 2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN, and by Routledge, 711 Third Avenue, New York, NY 10017.

Routledge is an imprint of the Taylor & Francis Group, an informa business.

© 2015 James Bergin

The right of James Bergin to be identified as author of this work has been asserted by him in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data
Bergin, James.
Mathematics for economists with applications / by James Bergin. – 1 Edition.
pages cm
1. Economics, Mathematical. I. Title.
HB135.B47 2015
510–dc23
2014020372

ISBN: 978-0-415-63827-2 (hbk)
ISBN: 978-0-415-63828-9 (pbk)
ISBN: 978-1-315-81953-2 (ebk)

Typeset in New Century Schoolbook by Cenveo Publisher Services
Contents
List of figures
xv
Preface
xxi
1 Introduction
1
1.1 Introduction
1
1.2 A brief summary by chapter
1
2 Matrices and systems of equations 2.1 Introduction
19 19
2.1.1 Some motivating examples
20
2.2 Vectors
23
2.3 Matrices
24
2.3.1 Matrix addition and subtraction
25
2.3.2 Matrix multiplication
26
2.3.3 Matrix transpose
29
2.3.4 Matrix inversion
30
2.3.5 Additional remarks on determinants
40
2.3.6 Cramer’s rule
43
2.3.7 Gaussian elimination
44
2.4 Linear dependence and independence 2.4.1 Matrix rank and linear independence 2.5 Solutions of systems of equations
50 55 57
CONTENTS
vi
2.5.1 Solving systems of equations: a geometric view
57
2.5.2 Solutions of equation systems: rank and linear independence
61
2.6 Special matrices
64
Exercises for Chapter 2
71
3 Linear algebra: applications
77
3.1 Introduction
77
3.2 The market model
77
3.3 Adjacency matrices
78
3.4 Input–output analysis
81
3.5 Inflation and unemployment dynamics
88
3.6 Stationary (invariant) distributions
89
3.6.1 Convergence to a stationary distribution
91
3.6.2 Computation of the invariant distribution
93
3.7 Econometrics 3.7.1 Derivation of the least squares estimator
94 97
Exercises for Chapter 3
101
4 Linear programming
105
4.1 Introduction
105
4.1.1 Formulation
106
4.1.2 The feasible region
109
4.1.3 Finding an optimal solution
110
4.1.4 Dual prices and slack variables
113
4.1.5 Some examples
116
4.1.6 Duality reconsidered
120
4.2 Basic solutions
123
4.2.1 Equality-constrained programs
123
4.2.2 Definition and identification of basic solutions
125
4.3 Duality principles 4.3.1 Duality and dual prices Exercises for Chapter 4 5 Functions of one variable
131 134 137 145
5.1 Introduction
145
5.2 Some examples
146
5.2.1 Demand functions
146
5.2.2 Present value and the interest rate
147
5.2.3 Taxation
148
5.2.4 Options
149
5.3 Continuity and differentiability
151
5.3.1 Preliminaries
151
5.3.2 Inequalities and absolute value
153
5.3.3 Continuity
154
5.3.4 Differentiability
156
5.4 Types of function
158
5.4.1 Plots of various functions
159
5.4.2 The log and exponential functions
161
5.5 Growth rates
163
5.5.1 Additivity of continuous growth rates
5.5.2 The functions ln x (log_e x) and e^kx
164 165
5.6 Some rules of calculus
166
5.6.1 l’Hospital’s rule
169
5.6.2 Higher derivatives
170
Exercises for Chapter 5 6 Functions of one variable: applications
171 175
6.1 Introduction
175
6.2 Optimization
176
6.2.1 The first-order condition: issues
177
6.2.2 The second-order condition
181
6.2.3 Applications: profit maximization and related problems
183
6.3 Concavity and optimization
187
6.3.1 Summary 6.4 Externalities 6.4.1 The freerider problem
193 193 193
6.5 Comparative statics
196
6.6 Elasticity of demand
200
6.6.1 Elasticity, revenue and profit maximization
202
6.7 Taxation and monopoly
203
6.8 Taxation incidence in supply and demand
205
6.8.1 A unit tax
206
6.8.2 An ad valorem tax
207
6.8.3 Elasticities and taxation
209
6.9 Ramsey pricing
211
Exercises for Chapter 6
215
7 Systems of equations, differentials and derivatives
219
7.1 Introduction
219
7.2 Partial derivatives
223
7.3 Level contours
225
7.4 The elasticity of substitution
227
7.5 Market equilibrium: an application
230
7.5.1 Oligopoly: linear demand
232
7.5.2 Oligopoly: nonlinear demand
233
7.5.3 Industry concentration indices
234
7.6 Total differentials 7.6.1 The impact of parameter changes 7.7 Solutions to systems of equations 7.7.1 Existence and properties of solutions 7.8 The implicit function theorem 7.8.1 General systems of equations Exercises for Chapter 7 8 Taylor series 8.1 Introduction
235 240 242 244 247 253 263 269 269
8.1.1 Taylor series approximations: an example
270
8.1.2 Taylor series expansions
272
8.1.3 Approximations and accuracy
275
8.2 Applications of Taylor series expansions
276
8.2.1 Concavity and the second derivative
277
8.2.2 Roots of a function
278
8.2.3 Numerical optimization
279
8.3 Expected utility theory and behavior towards risk
285
8.3.1 Expected utility theory
285
8.3.2 Risk-averse preferences and welfare loss
287
8.3.3 Risk aversion and the cost of small risk
290
8.4 Diversification and portfolio selection
293
8.4.1 Diversification
293
8.4.2 Portfolio selection
294
8.5 The multivariate version of Taylor’s theorem
296
8.6 Proof of Taylor’s theorem
297
Exercises for Chapter 8
301
9 Vectors
307
9.1 Introduction
307
9.2 Vectors: length and distance
307
9.3 Vectors: direction and angles
310
9.4 Hyperplanes and direction of increase
313
9.5 Farkas’ lemma
316
Exercises for Chapter 9
319
10 Quadratic forms
321
10.1 Introduction
321
10.2 Quadratic forms
322
10.3 Positive and negative definite matrices
325
10.3.1 Symmetric matrices
327
10.3.2 Criteria for positive and negative definiteness
328
10.3.3 Positive and negative semidefiniteness
329
10.4 Definiteness with equality constraints
330
Exercises for Chapter 10
333
11 Multivariate optimization
335
11.1 Introduction
335
11.2 The two-variable case
335
11.2.1 Motivation for the second-order conditions
339
11.2.2 Failure of the second-order conditions
347
11.2.3 The envelope theorem
349
11.3 Optimization: n variables
357
11.3.1 Motivation for the Hessian conditions
360
11.3.2 Definiteness and second-order conditions
361
11.4 Concavity, convexity and the Hessian matrix
367
Exercises for Chapter 11
369
12 Equality-constrained optimization 12.1 Introduction
375 375
12.1.1 A motivating example
376
12.2 The two-variable case
378
12.3 Motivation for first- and second-order conditions
384
12.3.1 The first-order condition
385
12.3.2 The second-order condition
385
12.3.3 A geometric perspective
387
12.4 The n-variable case
390
12.4.1 Quadratic forms and bordered matrices
391
12.4.2 Bordered Hessians and optimization
392
12.4.3 The second-order condition reconsidered
395
12.5 Interpretation of the Lagrange multiplier
396
12.6 Optimization: concavity and convexity
402
12.6.1 Quasiconcavity and bordered Hessians
409
12.7 Optimization with many constraints
411
Exercises for Chapter 12
415
13 Inequality-constrained optimization 13.1 Introduction
419 419
13.1.1 A motivating example
420
13.1.2 Overview
422
13.2 Optimization
423
13.2.1 The constraint qualification
430
13.2.2 Complementary slackness
434
13.2.3 Minimization and maximization
436
13.2.4 Global and local optima
436
13.2.5 Nonnegativity constraints
437
13.2.6 Necessary conditions and nonconvexities
442
13.2.7 The Lagrange multiplier
444
13.2.8 Equality and inequality constraints
444
Exercises for Chapter 13 14 Integration
447 449
14.1 Introduction
449
14.2 The integral
450
14.3 Common integrals and rules of integration
452
14.4 Measuring welfare: surplus
452
14.4.1 Consumer surplus
452
14.4.2 Producer surplus
453
14.4.3 Consumer surplus and welfare
454
14.5 Cost, supply and profit 14.5.1 Average and marginal cost
456 458
14.6 Taxation and consumer surplus
461
14.7 Present value
465
14.8 Leibnitz’s rule
467
14.9 Inequality measures
469
14.10 Ramsey pricing
471
14.11 Welfare measures
474
14.11.1 Consumer surplus
474
14.11.2 Compensating variation
476
14.11.3 Equivalent variation
482
14.11.4 Comparison of welfare measures
485
Exercises for Chapter 14 15 Eigenvalues and eigenvectors
491 493
15.1 Introduction
493
15.2 Definitions and basic properties
494
15.3 An application: dynamic models
500
15.3.1 Supply and demand dynamics
502
15.3.2 A model of population growth
504
15.4 The Perron–Frobenius theorem
508
15.5 Some applications
511
15.5.1 Webpage ranking
511
15.5.2 Leslie matrices
515
15.6 Eigenvalues and eigenvectors of symmetric matrices
519
15.7 Real and complex eigenvalues and eigenvectors
523
15.8 Diagonalization
529
15.9 Properties of eigenvalues and eigenvectors
530
15.9.1 Largest eigenvalues and eigenvalue multiplicity
534
Exercises for Chapter 15
537
16 Differential equations
541
16.1 Introduction 16.1.1 Preliminary discussion 16.2 Firstorder linear differential equations
541 542 544
16.2.1 Constant coefficients
544
16.2.2 Variable coefficients
548
16.3 Some nonlinear firstorder differential equations
552
16.3.1 The logistic model
552
16.3.2 The Bernoulli equation
554
16.4 First-order differential equations: existence of solutions
557
16.5 Second- and higher-order differential equations
558
16.5.1 Constant coefficients
558
16.5.2 Derivation of solution
559
16.6 Systems of differential equations
563
16.7 Stability
566
Exercises for Chapter 16
569
17 Linear difference equations
573
17.1 Introduction
573
17.2 Motivating examples
574
17.3 First-order linear difference equations
576
17.3.1 Solving first-order linear difference equations
577
17.3.2 The constant forcing function
577
17.3.3 The cobweb model
579
17.3.4 Equations with variable forcing function
581
17.4 Second-order linear difference equations
582
17.4.1 The inhomogeneous equation
583
17.4.2 The homogeneous equation
583
17.4.3 The general solution
584
17.4.4 Stability
585
17.4.5 Examples
585
17.4.6 The Samuelson multiplier-accelerator model
586
17.5 Vector difference equations
589
17.5.1 The particular solution
589
17.5.2 The complementary solution
590
17.6 The n-variable case
592
17.7 Miscellaneous calculations
593
17.7.1 Various forcing functions: solutions
593
17.7.2 Roots of the characteristic equation
595
17.7.3 Stability
599
Exercises for Chapter 17 18 Probability and distributions
603 605
18.1 Introduction
605
18.2 Random variables
606
18.3 The distribution function and density
608
18.3.1 The conditional distribution
610
18.3.2 Joint distributions
611
18.4 Discrete distributions
613
18.4.1 The Bernoulli distribution
613
18.4.2 The binomial distribution
613
18.4.3 The Poisson distribution
616
18.4.4 The Poisson process
617
18.5 Continuous distributions
619
18.5.1 The uniform distribution
619
18.5.2 The exponential distribution
620
18.5.3 The normal distribution
620
18.5.4 Moment-generating functions
623
18.5.5 The chi-square distribution
623
18.5.6 The t distribution
625
18.5.7 The F distribution
631
Exercises for Chapter 18 19 Estimation and hypothesis testing
633 639
19.1 Introduction
639
19.2 Estimation
639
19.2.1 Unbiasedness and efficiency 19.3 Large-sample behavior
640 642
19.3.1 Law of large numbers
642
19.3.2 The central limit theorem
645
19.4 Hypothesis testing
647
19.4.1 A hypothesis test
648
19.4.2 Types of error
648
19.4.3 Simple null and alternative hypotheses
652
19.4.4 Uniformly most powerful tests
654
19.4.5 Two-sided tests
660
19.4.6 Unbiased tests
662
19.4.7 Use of chi-square and F tests 19.5 Econometric applications
663 664
19.5.1 Matrix computations
664
19.5.2 Ordinary least squares
666
19.5.3 Distributions and hypothesis testing
669
19.5.4 Test statistics
671
19.5.5 Hypothesis testing for the linear model
672
Exercises for Chapter 19
677
Bibliography
681
Index
683
Figures
1.1 Cost flows
6
1.2 Reaction curves
9
1.3 Direction of steepest increase
12
1.4 Indifference curves and optimal choices
13
1.5 Nonoptimal, optimal and nonfeasible choice
14
2.1 Combinations of basis vectors
53
2.2 Intersecting lines: a unique solution
59
2.3 Combinations of basis vectors
60
2.4 Overlapping or parallel lines: many or no solutions
61
3.1 Fitting a line
95
4.1 Basic solutions
129
5.1 Choices determine demand
146
5.2 The present value function
149
5.3 The impact of taxation in a market
150
5.4 Options
151
5.5 Giffen good; a noninjective demand function
152
5.6 The derivative
157
5.7 Plot of f(x) = min{x, 1 − x}
158
5.8 f(x) = x^a, for various values of a
160
5.9 Plot of f(x) = a^x, for various values of a
160
FIGURES
xvi
5.10 Plot of sin(x) and cos(x) for −2π ≤ x ≤ 2π
5.11 Plot of sin(x) and sin(x − π/2) for −2π ≤ x ≤ 2π
161 161
5.12 The functions e^x and ln x (≡ log_e(x))
165
6.1 Changing slopes
177
6.2 Optimization: some possibilities I
178
6.3 Optimization: some possibilities II
179
6.4 Minimization and inflection points
6.5 Optimization: f(x) = 3√x − x
181 182
6.6 Revenue, cost and profit
183
6.7 Marginal cost and marginal revenue
184
6.8 Concavity
188
6.9 Concave and convex functions
189
6.10 Plot of derivatives f′(x), g′(x)
190
6.11 Strictly concave function, f, with f″ = 0 at x = 1
192
6.12 Congestion externalities
195
6.13 Impact of a parameter change on choice variable
197
6.14 Demand elasticity
201
6.15 The impact of taxation on price and quantity
205
6.16 Large and small changes in quantity
210
6.17 Taxation with elastic demand or supply
211
6.18 Ramsey pricing
212
7.1 The impact of price variations on demand
222
7.2 The function f(x, y) = x^(1/2) y^(1/2)
225
7.3 Isoquant slope
226
7.4 Elasticity of substitution
228
7.5 Plots of level contours of u = [x1^ρ + x2^ρ]^(1/ρ)
229
7.6 Optimal income–work combination
238
7.7 A parameter change
241
7.8 No solution
243
7.9 Multiple solutions
243
7.10 A system with two solutions
245
7.11 The impact of varying output on x and y
256
7.12 The IS–LM curves
257
7.13 The impact of increasing distance
260
8.1 Taylor series approximation of f(x) = ln(x + 1)e^(−x/2)
271
8.2 Firstorder Taylor series expansion
274
8.3 Taylor series approximations
276
8.4 Roots of a function: f(x) = −1/2 + 4x − 4x^2 + x^3
279
8.5 Risk aversion
288
8.6 Utility loss associated with risk
289
8.7 The cost of risk
291
9.1 Vectors and length
308
9.2 Vectors
309
9.3 Length
309
9.4 Orthogonality
311
9.5 Angles between vectors
311
9.6 Projections
312
9.7 Trigonometric functions
313
9.8 Orthogonal vectors
313
9.9 The normal to a hyperplane
314
9.10 The normal to a hyperplane
315
9.11 The sign of an inner product
315
9.12 Proof of Farkas’ lemma
317
9.13 Farkas’ lemma: a geometric view
318
11.1 The function f(x, y) = 1 − x^2 − y^2
336
11.2 The function f(x, y) = x^2 − y^2
348
11.3 The function f(x, y) = x^2 y^2
348
11.4 The impact of parameter variation on the objective value
350
12.1 Constrained optimal choice
377
12.2 Optimal leisure choice
377
12.3 Orthogonal lines and normal vectors
388
12.4 Utility maximization: the indifference curve
389
12.5 Utility maximization: optimization
390
12.6 Tangency condition fails
393
12.7 Tangencies at the solution
394
12.8 Quasiconcavity
404
12.9 A linear constraint
406
13.1 Sign of the Lagrange multiplier
426
14.1 The integral of f from a to b
451
14.2 The integral of f : definition
451
14.3 Consumer surplus
453
14.4 Producer surplus
453
14.5 Consumer and producer surplus
454
14.6 Cost and profit
455
14.7 The supply curve
457
14.8 Profit and cost
458
14.9 Average cost curves and marginal cost
459
14.10 Cost curves
460
14.11 Cost curves
461
14.12 Supply
461
14.13 Taxation and consumer surplus
462
14.14 The Lorenz curve
470
14.15 Utility maximization and consumer surplus
475
14.16 Compensating variation
477
14.17 The optimal cost-minimizing choice
478
14.18 The impact of price changes on quantities
481
14.19 Compensating variation
482
14.20 Equivalent variation
483
14.21 Equivalent variation
484
14.22 Demand variation
486
14.23 Different welfare measures
489
15.1 Price dynamics
505
15.2 Internet nodes
512
15.3 Adjacency matrix and corresponding transition process
512
15.4 Rotation of 90°
527
16.1 The logistic curve
554
16.2 Euler’s formula
560
17.1 Plot for y_t + (3/4)y_{t−1}, c = 7/2, y_0 = 7/2
578
17.2 Cobweb dynamics
581
17.3 Stability of second-order system
585
17.4 A converging process
587
17.5 Dynamic reaction functions
592
17.6 Plot of a1^2 − 4a2 = 0, 1 + a1 + a2 = 0 and a1 + 2a2 = 0
597
17.7 Stable regions: real, repeated and complex roots
601
18.1 Probability density
607
18.2 Cumulative distribution functions
609
18.3 Density functions
610
18.4 The function F(x, y) = xy/(2 − xy)
611
18.5 The binomial distribution
616
18.6 The Poisson distribution: P(X = k) = e^(−λ) λ^k / k!
617
18.7 Poisson process: density at t
619
18.8 The normal density
621
18.9 Distribution function of standard normal random variable
622
18.10 The chi-square distribution with three degrees of freedom
626
18.11 The region {(y, z) | y ≤ t√(z/k)}
627
18.12 The t distribution with 4 and 30 degrees of freedom
628
18.13 The F distribution: F3,12 and F5,3 distributions
631
19.1 Tradeoff between Type I and Type II error
650
19.2 Distributions at µ = 30 and µ = 30.5
658
19.3 Distributions at µ = 31 and µ = 31.5
658
19.4 Power function; one-sided test
659
19.5 Power functions for sample sizes of 50 and 225
660
19.6 Power functions for sample sizes of 50 and 225
662
Preface
Courses in undergraduate mathematical economics are taught at various levels, from the second to fourth year, and as preparatory material for graduate students. As a result, the diversity in the backgrounds of students taking such courses is substantial, with large variations in training and knowledge. This diverse readership makes it difficult to present the mathematics used in economics at a suitable level. Nevertheless, there is a basic set of mathematical techniques considered prerequisite for many third- and fourth-year courses, and which should be discussed in any course on introductory mathematical economics. While the material in this book varies substantially in terms of level and difficulty, it is possible to identify a subset of the material that may be used for a first course on mathematical techniques in economics. The following outline suggests a suitable collection of essential material at the undergraduate level.

Chapter 2 on matrices and systems of equations provides a standard discussion of linear equation systems and rules on matrix manipulation. In particular, the material from Section 2.1 to Section 2.3.5 provides these rules, along with various techniques for solving such systems. Following this, Chapter 3 describes a variety of applications of matrix algebra, including input–output models, adjacency matrices and invariant distributions. These are discussed in Section 3.1 to Section 3.6.

Proceeding to functions and calculus, Chapter 5 considers functions of a single variable. This chapter introduces continuity and differentiability of a function, and develops the basic techniques and rules of calculus and differentiation. In addition, the standard functions, including the log and exponential functions, are discussed in detail.
Chapter 6 provides a range of calculus applications, including optimization and the use of calculus techniques to determine elasticities and taxation incidence. In Chapter 7, the study of functions of many variables begins. Section 7.1 to Section 7.6 develop partial derivatives along with a variety of applications, including the study of contours of functions, the elasticity of substitution and comparative statics calculations. In Chapter 8, Section 8.1 and Section 8.2 cover a broad range of applications of Taylor series expansions, including finding the roots of an equation, the development of numerical optimization routines and the calculation of measures of risk aversion. Multivariate optimization is discussed in Chapter 11 for the unconstrained case. Sections 11.1 and 11.2 examine the two-variable problem, motivate the first- and second-order conditions and describe the envelope theorem. Chapter 12 develops optimization with equality constraints. Section 12.2 considers the case of two variables, while motivations for the necessary and sufficient conditions for an optimum are given in Section 12.3. Integration is introduced in Chapter 14. Section 14.2 defines the integral, and some basic rules of integration are presented in Section 14.3. Common applications, such as the measurement of welfare surplus and the calculation of present value, are described in Sections 14.4 through 14.7. In Chapter 16, differential equations are studied. Section 16.1 introduces the topic and Section 16.2 develops the basic analysis for the first-order linear differential equation. Chapter 17 considers difference equations. Section 17.2 provides some motivating examples and Section 17.3 provides the mathematical techniques for solving these equations. Summarizing this suggested course outline:
Chapter   Sections
2         2.1–2.3.5
3         3.1–3.6
5         5.1–5.6
6         6.1–6.8
7         7.1–7.6
8         8.1–8.2
11        11.1–11.2
12        12.1–12.3
14        14.1–14.7
16        16.1–16.2
17        17.1–17.3
1 Introduction

1.1 Introduction

This introduction provides a brief overview of the book, describing the focus of each chapter.

1.2 A brief summary by chapter
Chapter 2 introduces matrices and the corresponding linear algebra. A matrix is a rectangular array of numbers, which may be used to represent succinctly sets of equations, network connections, evolutionary dynamic systems and so on. Manipulation of matrices becomes essential to the efficient study of large systems. The following example illustrates the benefit of matrices when moving from single-equation problems to systems of equations. In the supply-and-demand model, demand (D) for some good is negatively related to price, and supply (S) positively related. Thus, D = α − βp, where p is the price of the good. The letters α and β denote parameters of the model, with β > 0 capturing the fact that an increase in price reduces demand. The formulation of supply is similar, except that supply ought to be positively related to price, so we postulate that δ > 0 in the relation S = γ + δp. Combining these two relationships gives the supply-and-demand model:

D = α − β p,  α > 0, β > 0
S = γ + δ p,  γ < 0, δ > 0.   (1.1)

At what price does the good sell and how much of the good is sold? The market clearing assumption implies that the market clearing price will prevail. Denote by
p^e the price that equates supply and demand. Then D^e = α − β p^e = γ + δ p^e = S^e. Rearranging, α − β p^e = γ + δ p^e, so (α − γ) = (β + δ) p^e. Divide both sides by (β + δ) to get the equilibrium price p^e = (α − γ)/(β + δ). Since at this price the quantity demanded is equal to the quantity supplied, we can determine the equilibrium quantity, denoted Q^e. We find Q^e by substituting p^e into either the supply or demand equation: Q^e = α − β p^e = α − β[(α − γ)/(β + δ)] = (δα + βγ)/(β + δ). Summarizing:

p^e = (α − γ)/(β + δ),   Q^e = (δα + βγ)/(β + δ).   (1.2)
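As a quick numerical check, the closed form (1.2) can be evaluated in a few lines of Python; the parameter values below are illustrative assumptions chosen only to satisfy the sign restrictions in (1.1), not values from the text.

```python
# Equilibrium of the single-market supply-and-demand model (1.1)-(1.2).
# Parameter values are illustrative assumptions satisfying
# alpha > 0, beta > 0, gamma < 0, delta > 0.
alpha, beta, gamma, delta = 10.0, 2.0, -2.0, 4.0

# Closed-form solution from (1.2).
p_eq = (alpha - gamma) / (beta + delta)                 # equilibrium price
q_eq = (delta * alpha + beta * gamma) / (beta + delta)  # equilibrium quantity

# Check: at p_eq, quantity demanded equals quantity supplied.
demand = alpha - beta * p_eq
supply = gamma + delta * p_eq
assert abs(demand - supply) < 1e-12
assert abs(demand - q_eq) < 1e-12
print(p_eq, q_eq)  # → 2.0 6.0
```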
With multiple markets, the computations become more complex and motivate the use of matrices. With two (or more) markets, there may be cross-price effects: the price of bread may affect the demand for margarine; the price of tea may affect the demand for coffee. As a result, the computations to determine the market clearing prices and quantities become more complex. To capture these cross-price effects, write the demand and supply in both markets:

Market 1:  D1 = a0 + a1 p1 + a2 p2,   S1 = b0 + b1 p1 + b2 p2
Market 2:  D2 = α0 + α1 p1 + α2 p2,   S2 = β0 + β1 p1 + β2 p2.

Equilibrium occurs when prices equate supply and demand in both markets (D1 = S1 and D2 = S2). To find the equilibrium level of quantity and price in each market, equate supply and demand in each market:

Market 1:  D1 = S1  or  a0 + a1 p1 + a2 p2 = b0 + b1 p1 + b2 p2
Market 2:  D2 = S2  or  α0 + α1 p1 + α2 p2 = β0 + β1 p1 + β2 p2.
Rearranging these equations gives:

(a1 − b1) p1 + (a2 − b2) p2 = (b0 − a0)
(α1 − β1) p1 + (α2 − β2) p2 = (β0 − α0).

These equations can be solved by substitution. However, the process is tedious. From the first equation,

p1 = (b0 − a0)/(a1 − b1) − [(a2 − b2)/(a1 − b1)] p2.

Substitute into the second equation to get:

(α1 − β1)[(b0 − a0)/(a1 − b1) − [(a2 − b2)/(a1 − b1)] p2] + (α2 − β2) p2 = (β0 − α0).

Collecting terms in p2 gives:

[(α2 − β2) − (α1 − β1)(a2 − b2)/(a1 − b1)] p2 = (β0 − α0) − (α1 − β1)(b0 − a0)/(a1 − b1).

Multiplying both sides by (a1 − b1) gives:

[(α2 − β2)(a1 − b1) − (α1 − β1)(a2 − b2)] p2 = (β0 − α0)(a1 − b1) − (α1 − β1)(b0 − a0).

Thus,

p2 = [(β0 − α0)(a1 − b1) − (α1 − β1)(b0 − a0)] / [(α2 − β2)(a1 − b1) − (α1 − β1)(a2 − b2)].
A similar calculation gives p1. With three or more equations, the procedure becomes progressively more complicated. The use of matrix algebra simplifies problems of this sort significantly. In matrix form:

[ (a1 − b1)  (a2 − b2) ] [ p1 ]   [ b0 − a0 ]
[ (α1 − β1)  (α2 − β2) ] [ p2 ] = [ β0 − α0 ]

or

[ p1 ]   [ (a1 − b1)  (a2 − b2) ]⁻¹ [ b0 − a0 ]
[ p2 ] = [ (α1 − β1)  (α2 − β2) ]   [ β0 − α0 ].

The matrix

[ (a1 − b1)  (a2 − b2) ]⁻¹
[ (α1 − β1)  (α2 − β2) ]

is called the inverse of

[ (a1 − b1)  (a2 − b2) ]
[ (α1 − β1)  (α2 − β2) ].

The inverse can be written (details later):

[ (a1 − b1)  (a2 − b2) ]⁻¹           [  (α2 − β2)   −(a2 − b2) ]
[ (α1 − β1)  (α2 − β2) ]    = (1/d)  [ −(α1 − β1)    (a1 − b1) ],

where d = (a1 − b1)(α2 − β2) − (α1 − β1)(a2 − b2). Thus,

[ p1 ]           [  (α2 − β2)   −(a2 − b2) ] [ b0 − a0 ]
[ p2 ]  = (1/d)  [ −(α1 − β1)    (a1 − b1) ] [ β0 − α0 ]

                 [  (α2 − β2)(b0 − a0) − (a2 − b2)(β0 − α0) ]
        = (1/d)  [ −(α1 − β1)(b0 − a0) + (a1 − b1)(β0 − α0) ].
So,

p1 = [(α2 − β2)(b0 − a0) − (a2 − b2)(β0 − α0)] / [(a1 − b1)(α2 − β2) − (α1 − β1)(a2 − b2)]
p2 = [−(α1 − β1)(b0 − a0) + (a1 − b1)(β0 − α0)] / [(a1 − b1)(α2 − β2) − (α1 − β1)(a2 − b2)].
The early part of Chapter 2 describes the basic rules for matrix manipulation and for solving systems of equations. Not all systems have a solution, and some equation systems may have many solutions. This leads to the task of identifying when equation systems are soluble and the circumstances under which there is a unique solution. The key ideas of rank and linear independence are introduced to study this matter. The chapter concludes with some additional observations on matrix algebra.

Chapter 3 uses matrix techniques to illustrate the many applications of matrix algebra. These include linear demand systems, input–output techniques, Markov processes and invariant distributions, and least squares estimation. To illustrate, consider input–output analysis, where the economy is viewed as a number of interlinked industries (n industries). The output of an industry is used for final consumption and in the production of other goods. The total output of an industry is allocated between that used in production in other industries (interindustry demand) and consumption or final demand. Planning to achieve some (target) level of consumption or final demand indirectly implies production levels in each industry. Input–output analysis is concerned with determining the industry production levels required to sustain or achieve target levels across the entire range of final demands. A simple model is developed next to illustrate the procedure and the use of matrix methods to determine the necessary level of industry output. Let a_ij = the amount of good i required to produce one unit of good j; let x_i = the output of industry i and f_i = the final demand for good i. Thus, if x_j is the amount of good j produced, a_ij x_j is the amount of i required in the production of j. (If one kilo of good j is produced, a_ij is the fraction of one kilo of i used in the production of the kilo of j.)
The output of industry i is allocated to production in other industries (a_ij x_j) and final demand (f_i). Thus,

x_i = a_i1 x1 + a_i2 x2 + · · · + a_in x_n + f_i,  i = 1, ..., n.

This can be rearranged:

−a_i1 x1 − a_i2 x2 − · · · + (1 − a_ii) x_i − · · · − a_in x_n = f_i,  i = 1, ..., n,
or, in matrix notation:

[ (1 − a11)    −a12     ···    −a1n    ] [ x1 ]   [ f1 ]
[   −a21    (1 − a22)   ···    −a2n    ] [ x2 ]   [ f2 ]
[     ⋮          ⋮       ⋱        ⋮    ] [ ⋮  ] = [ ⋮  ]
[   −an1      −an2      ···  (1 − ann) ] [ xn ]   [ fn ]

This expression can be written succinctly as (I − A)x = f, where

      [ a11  a12  ···  a1n ]        [ 1  0  ···  0 ]
A  =  [ a21  a22  ···  a2n ],  I =  [ 0  1  ···  0 ]
      [  ⋮    ⋮    ⋱    ⋮  ]        [ ⋮  ⋮   ⋱   ⋮ ]
      [ an1  an2  ···  ann ]        [ 0  0  ···  1 ].
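A minimal sketch of the input–output calculation, solving (I − A)x = f for a hypothetical two-industry economy; the technology coefficients and final demands are assumptions for illustration, not data from the text.

```python
# Input-output sketch: find industry outputs x solving (I - A)x = f.
# The 2x2 technology matrix and final demands are illustrative assumptions.
a = [[0.2, 0.3],
     [0.4, 0.1]]          # a[i][j]: amount of good i per unit of good j
f = [100.0, 50.0]         # target final demands

# Form I - A for the 2x2 case and invert it directly.
m11, m12 = 1 - a[0][0], -a[0][1]
m21, m22 = -a[1][0], 1 - a[1][1]
d = m11 * m22 - m12 * m21
x1 = (m22 * f[0] - m12 * f[1]) / d
x2 = (-m21 * f[0] + m11 * f[1]) / d

# Check: each industry's output covers interindustry use plus final demand.
assert abs(x1 - (a[0][0] * x1 + a[0][1] * x2 + f[0])) < 1e-9
assert abs(x2 - (a[1][0] * x1 + a[1][1] * x2 + f[1])) < 1e-9
```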
Chapter 4 develops the theory of linear programming. Optimization arises in many contexts. For example, utility maximization involves determining the consumption bundle that maximizes utility subject to a linear constraint, the budget constraint. While the objective function, the utility function, is nonlinear, the set of feasible choices is defined by a linear constraint. Chapter 4 introduces constrained optimization in the linear context: both the objective and the constraints are linear. Linear programming is widely used in industry and so has immediate practical usefulness. The following transportation problem shows how a linear programming problem arises. A firm has two sources of input, two processing plants and three markets to supply. The shipping costs (per ton) are given in the following table.

          Source 1   Source 2   Market 1   Market 2   Market 3
Plant A      2          3          5          4          5
Plant B      3          2          3          4          6
The chart in Figure 1.1 depicts this graphically. Let x_iAj = # tons from source i through plant A to market j, and x_iBj = # tons from source i through plant B to market j. Inputs are shipped from source to plant, and then to market. The cost of shipping one ton from source 1 through plant A to market 3 is 2 + 5 = 7 (the cost of shipping from source 1 to plant A plus the cost of shipping from plant A to market 3). Thus, the cost of shipping x_1A3 tons on this route is 7 x_1A3. Other route costs are calculated in the same way. Finally, sources 1 and 2 have, respectively, 10 and 20 units of raw material available.
Figure 1.1: Cost flows
Market demand is 10 in market 1, 15 in market 2 and 5 in market 3. This leads to the program:

min C = 7x_1A1 + 6x_1A2 + 7x_1A3 + 6x_1B1 + 7x_1B2 + 9x_1B3
      + 8x_2A1 + 7x_2A2 + 8x_2A3 + 5x_2B1 + 6x_2B2 + 8x_2B3,

subject to:

x_1A1 + x_1A2 + x_1A3 + x_1B1 + x_1B2 + x_1B3 ≤ 10
x_2A1 + x_2A2 + x_2A3 + x_2B1 + x_2B2 + x_2B3 ≤ 20
x_1A1 + x_1B1 + x_2A1 + x_2B1 ≥ 10
x_1A2 + x_1B2 + x_2A2 + x_2B2 ≥ 15
x_1A3 + x_1B3 + x_2A3 + x_2B3 ≥ 5
x_iAj, x_iBj ≥ 0, ∀ i, j.

From a theoretical point of view, the subject introduces many key ideas in economics: optimization, constraints and feasible choices, dual or shadow prices of constraints, the notion of a slack variable and corresponding constraint, and so on. The first part of Chapter 4 introduces the basic concepts used to define a linear program, formulating objectives, constraints and the feasible region. Then a graphical approach is used to illustrate the general solution technique. One key idea is that a solution to a linear program can always be found at a corner or vertex of the feasible region. Since there are a finite (although possibly large)
number of vertices, this fact greatly simplifies the search for a solution. Examples illustrate this fact and utilize it to locate the optimal solution. In addition, computation of shadow prices may be done directly for simple problems where the graphical technique is used. The second part of the chapter considers the general structure of these problems. Basic solutions are introduced, and vertices are identified as basic solutions. The computation of basic solutions is straightforward, so this observation provides a means of locating the vertices of the feasible region. These observations provide a basis for algorithms to determine the solution. The concluding part of the chapter discusses duality theory and the theory underlying dual or shadow prices.

Chapter 5 is concerned with functions of one variable and begins by showing how various functions (demand functions, present value functions, and so on) arise in economics. A demand function is written p(q), relating price to quantity demanded. To measure price–demand responsiveness, one must consider the price p(q′) at which some alternative demand q′ would be forthcoming. This naturally leads to the calculation p(q′) − p(q), or [p(q′) − p(q)]/(q′ − q), giving the per-unit price variation associated with a quantity variation. Similarly, if a single firm supplies the market, the total revenue to the firm is r(q) = q·p(q), and if the cost of output level q is c(q), the firm's profit is π(q) = r(q) − c(q). In this way, functions arise routinely in economic problems, and calculations such as the increase in cost per unit increase in output, [c(q′) − c(q)]/(q′ − q), the marginal cost, lead naturally to calculus and derivatives.
Working with functions requires an understanding of the basic concepts of continuity and differentiability, which are introduced in the early part of the chapter. Following this, a variety of functions are described. The log and exponential functions are discussed in detail since they appear regularly in economic calculations, in particular in calculations concerning growth and present value. Various rules of calculus are described for combinations of functions (sums, products and ratios). Finally, higher-order derivatives are introduced.

In Chapter 6, a number of applications illustrate the use of functions in economics. Optimization for functions of one variable is discussed in detail by considering the behavior of the first and second derivatives. Subsequently, the determination of optima is related to concavity or convexity of the function, and this in turn to properties of the derivatives. Following the discussion of optimization with a single variable, a range of applications is considered. A model of externalities is developed to describe the free-rider problem and market inefficiency. Then the elasticity function is described, and this is later applied to the study of taxation and deadweight loss. For example, if the pre-tax equilibrium is at Q^e with price p^e and the post-tax output and price are Q^t, p^t, the impact of the tax is a
quantity reduction of Q^e − Q^t and a price increase of p^t − p^e. The deadweight loss is the area between Q^t and Q^e above the supply curve and under the demand curve. In the linear case, this is easy to compute, giving an expression for deadweight loss. In this chapter, comparative statics are introduced by considering the impact of parameter changes on the equilibrium or the optimal level of a variable. The incidence of a product tax is considered in detail. Finally, the chapter concludes with a discussion of Ramsey pricing.

Chapter 7 introduces functions of many variables and systems of nonlinear equations. The chapter begins with the definition of a partial derivative and the notion of a level contour of a function. This is illustrated with the familiar isoquant, where curvature is described by the marginal rate of substitution. Partial derivatives are used to develop the notion of a Cournot equilibrium in the context of oligopoly. The market inverse demand function is given by the equation p = f(Q), where p is the market price and Q the aggregate quantity. There are two firms, labeled 1 and 2. If firm i supplies quantity q_i, the aggregate output and supply is Q = q1 + q2. The cost per unit of output for firm i is given by c_i, so that if firm i produces q_i, its cost of production is c_i q_i. When aggregate output is Q = q1 + q2, the revenue of firm i is q_i f(Q). Thus, the total profit of firm i is π_i(q1, q2) = q_i f(q1 + q2) − c_i q_i.
This notation makes explicit the dependence of profit on the output of each firm and illustrates how multivariate functions may arise. The analysis of firm behavior in the linear demand case p = a − bQ may be taken further here. In this case, the profit of firm i is π_i(q_i, q_j) = p q_i − c_i q_i, where i ≠ j. Since price depends on the quantities, π_i(q_i, q_j) = [a − b(q1 + q2)] q_i − c_i q_i, where i ≠ j. This can be rearranged: π_i(q_i, q_j) = [(a − c_i) − b(q1 + q2)] q_i; again i ≠ j. How a firm decides on its level of output depends on how the other firm is expected to behave. A myopic assumption for a firm, say firm 1, is that firm 2 will not adjust its output in response to the choice of firm 1. In general, when one firm adjusts its output, the other firm will also wish to do so. In the particular case where both firms' quantity levels are such that neither firm wants to adjust its output level, the output levels are equilibrium levels. The fact that a firm does not wish to change its level of output is the relevant issue; that this decision is based on the naive assumption that the other firm would not respond is irrelevant. If firm i takes as given the output of firm j and maximizes profit (by differentiating with respect to q_i), this gives: (a − c_i) − 2b q_i − b q_j = 0, i ≠ j. Solving for q_i gives q_i = [(a − c_i) − b q_j]/(2b). In particular, q1 = [(a − c1) − b q2]/(2b) and q2 = [(a − c2) − b q1]/(2b). Denote these expressions q1(q2) and q2(q1), respectively.
These are depicted in Figure 1.2: the interpretation is that when firm 2 supplies the quantity q2, firm 1 would wish to choose the output level q1(q2), and vice versa.

Figure 1.2: Reaction curves

Note that at the quantity pair (q1*, q2*), firm 1 wishes to choose q1* given that firm 2 is supplying q2*, and similarly for firm 2. Substituting q2(q1) into q1(q2):

q1 = (a − c1)/(2b) − (1/2) q2 = (a − c1)/(2b) − (1/2){(a − c2)/(2b) − (1/2) q1}
   = [(a − c1) − (1/2)(a − c2)]/(2b) + (1/4) q1.   (1.3)

Thus, the solution values for q1 and q2, (q1*, q2*), are:

q1* = [4/(6b)][(a − c1) − (1/2)(a − c2)],   q2* = [4/(6b)][(a − c2) − (1/2)(a − c1)].   (1.4)
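The equilibrium formulas (1.4) can be checked against the reaction functions numerically; the demand and cost parameters below are illustrative assumptions.

```python
# Cournot duopoly with linear demand p = a - b*Q. Parameter values are
# illustrative assumptions. Verifies the closed form (1.4) against the
# best-response (reaction) functions q_i = ((a - c_i) - b*q_j) / (2b).
a, b = 100.0, 2.0
c1, c2 = 10.0, 16.0

# Closed-form equilibrium quantities from (1.4).
q1 = (4 / (6 * b)) * ((a - c1) - 0.5 * (a - c2))
q2 = (4 / (6 * b)) * ((a - c2) - 0.5 * (a - c1))

def br(ci, qj):
    """Best response of a firm with unit cost ci to rival output qj."""
    return ((a - ci) - b * qj) / (2 * b)

# At equilibrium, each firm's output is a best response to the other's.
assert abs(q1 - br(c1, q2)) < 1e-9
assert abs(q2 - br(c2, q1)) < 1e-9
```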
Further development of the Cournot equilibrium allows a characterization of the price markup over marginal cost in terms of the elasticity. In addition, the Herfindahl–Hirschman index of industry concentration may be defined and related to the equilibrium markup in Cournot competition. This chapter also develops the techniques of comparative statics and provides a discussion of the implicit function theorem. Given two functions f and g of two variables (x and y), suppose that for some parameter α the functions satisfy α = f (x, y) and 0 = g(x, y). Then, at a solution (x, y) the total differentials (with a
variation in α, dα) give dα = f_x(x, y)dx + f_y(x, y)dy and 0 = g_x(x, y)dx + g_y(x, y)dy, where, for example, f_x(x, y) is the partial derivative of f with respect to x at (x, y). Rearranging in matrix form:

[ dα ]   [ f_x(x, y)  f_y(x, y) ] [ dx ]
[ 0  ] = [ g_x(x, y)  g_y(x, y) ] [ dy ].

This sets up the x and y variations naturally in terms of the partial derivatives and dα:

[ dx ]   [ f_x(x, y)  f_y(x, y) ]⁻¹ [ dα ]
[ dy ] = [ g_x(x, y)  g_y(x, y) ]   [ 0  ].
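A small numerical sketch of this comparative statics calculation, using the assumed functions f(x, y) = xy and g(x, y) = x − y², for which the solution is known in closed form and so can be differentiated directly as a check:

```python
# Comparative statics sketch for the system alpha = f(x, y), 0 = g(x, y),
# with illustrative (assumed) choices f(x, y) = x*y and g(x, y) = x - y**2.
# The solution is explicit here: y = alpha**(1/3), x = alpha**(2/3).
alpha = 4.0
y = alpha ** (1 / 3)
x = alpha ** (2 / 3)

# Partial derivatives at the solution.
fx, fy = y, x           # from f = x*y
gx, gy = 1.0, -2.0 * y  # from g = x - y**2

# (dx, dy) = inverse of [[fx, fy], [gx, gy]] applied to (d_alpha, 0).
det = fx * gy - fy * gx
d_alpha = 1.0
dx = (gy * d_alpha) / det    # first component of the inverse times (d_alpha, 0)
dy = (-gx * d_alpha) / det   # second component

# Compare with derivatives of the explicit solution x(alpha), y(alpha).
assert abs(dx - (2 / 3) * alpha ** (-1 / 3)) < 1e-9
assert abs(dy - (1 / 3) * alpha ** (-2 / 3)) < 1e-9
```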
The formulation makes use of matrix algebra and highlights the fact that the matrix of partial derivatives must be invertible. The chapter proceeds to discuss the existence of solutions of nonlinear systems of equations and their behavior when parameters change. An informal discussion leads to the implicit function theorem. Following this, a variety of examples, including familiar macroeconomic multiplier models, are developed to illustrate the application of the theorem.

Chapter 8 introduces Taylor series expansions. A Taylor series expansion involves approximating a continuous function by a polynomial function, with higher-order polynomials giving better approximations or greater accuracy. The value of this technique is substantial since polynomial functions are relatively simple and, if the approximation is good, it provides important information on the properties of the original function. The chapter begins with an example showing a quadratic approximation of a function and then proceeds to describe expansions of order n. Some further discussion shows how accuracy improves as the order of the expansion increases. Following this, the chapter develops some applications of Taylor series expansions, beginning with a proof that concavity implies a negative second derivative. This application shows the usefulness of the expansion. A second application concerns numerical optimization. Much of the logic for numerical optimization routines derives from the fact that a smooth function may be approximated locally by a quadratic function. This observation forms the basis for many numerical routines. The ideas behind this are developed in the chapter, and a variety of examples illustrate how the process may be used to locate a maximum or minimum of a function f(x). Exactly the same process may be used to locate a root of a function (a value of x at which the function equals zero, f(x) = 0).
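The root-finding idea can be sketched with Newton's method, which repeatedly replaces f by its first-order Taylor approximation and steps to the root of that line; the function and starting point below are illustrative assumptions.

```python
# Newton's method: approximate f near x by its first-order Taylor
# expansion f(x) + f'(x)(z - x) and step to the root of that line.
# The function f and the starting guess are illustrative assumptions.
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Example: f(x) = x**2 - 2 has root sqrt(2).
root = newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
assert abs(root * root - 2) < 1e-10
```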
Multidimensional generalizations are described. The chapter then discusses risk aversion using Taylor series expansions
to provide measures of risk, of risk premia, and of welfare loss resulting from small risks. Finally, a proof of Taylor's theorem is given for both the univariate and multivariate cases.

Vectors are discussed in Chapter 9. A vector can be viewed from a few different perspectives. In one, as a rectangular array of numbers, it is a matrix with either one row or one column. A second interpretation views a vector as a direction, each coordinate representing a length in the corresponding direction. Thus x = (x1, x2) is viewed as x1 units in direction 1 and x2 units in direction 2. In this view, it makes sense to describe vectors in terms of direction, perpendicularity (or orthogonality), length in a given direction, and so on. In particular, using vectors it is possible to calculate the direction in which a function increases (or decreases) most rapidly. This perspective facilitates the study of many problems, including optimization. Specifically, considering a function f(x1, x2), those values of x = (x1, x2) that give a constant value for the function f determine a "level contour" of the function f. So, the set F = {(x1, x2) | f(x1, x2) = c} is the level contour of the function f at level c. Such curves appear in economics as, for example, isocost curves or indifference curves. If f is a utility function, the set F is the set of consumption combinations that give the same level of utility. It is common to consider the slope of the indifference curve by considering the variations of x1 and x2 that leave utility constant: 0 = f_x1(x1, x2)dx1 + f_x2(x1, x2)dx2, with the marginal rate of substitution given by dx2/dx1 = −f_x1(x1, x2)/f_x2(x1, x2). But this information is also contained in the vector (f_x1(x1, x2), f_x2(x1, x2)), together with the absolute magnitudes of the partial derivatives. It turns out that, viewed as a direction, ∇f = (f_x1(x1, x2), f_x2(x1, x2)) points in the direction of steepest increase in the function f. Figure 1.3 illustrates this observation. Starting from x, the value of the function f increases most rapidly in the direction ∇f, and ∇f is perpendicular to
the level surface of f at x. This fact is used extensively in the discussion of constrained optimization. Finally, Chapter 9 states and proves Farkas' lemma. This useful result may be used to prove the main theorems of linear programming and provides valuable insight into the logic behind important results in constrained optimization theory.

Figure 1.3: Direction of steepest increase

Chapter 10 discusses quadratic forms. A quadratic form is an expression x′Ax, where x is a vector and A a matrix. In the univariate case, this just amounts to ax². An important question concerns the circumstances in which such an expression is always positive or always negative: for example, what properties of A imply that x′Ax > 0 for all x ≠ 0? Such a matrix is called a positive definite matrix. When x′Ax < 0 for all x ≠ 0, the matrix is called negative definite. The standard characterization is given in terms of the leading principal minors of the matrix (the leading principal submatrix of order k in an n × n matrix is obtained by deleting the last n − k rows and n − k columns). Positive and negative semidefiniteness correspond to the cases where x′Ax ≥ 0 (or x′Ax ≤ 0) for all x. An important related question concerns positive or negative definiteness when the admissible x are constrained: under what conditions is x′Ax < 0 or x′Ax > 0 for all nonzero x satisfying a set of linear restrictions, Bx = 0? This question arises in constrained optimization. Here, a full characterization is given in terms of an augmented (bordered) matrix and its leading principal minors.

Chapter 11 is the first of three chapters on optimization, and considers multivariate optimization where there are no constraints on the choice variables. The chapter begins with a discussion of the two-variable case, that of maximization or minimization of a function f(x1, x2). For example, the standard profit-maximization problem writes profit as π(x1, x2) = p·h(x1, x2) − c1 x1 − c2 x2, where h is a production function, p the price of output and c_i the unit cost of input i.
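The leading-principal-minor test described above can be sketched for a symmetric 2 × 2 matrix (the matrices below are illustrative assumptions): A is positive definite exactly when both leading principal minors are positive.

```python
# Leading-principal-minor (Sylvester) test for a symmetric 2x2 matrix:
# positive definite iff the order-1 minor a11 and the order-2 minor
# det(A) are both positive. The test matrices are illustrative assumptions.
def is_positive_definite_2x2(a11, a12, a22):
    minor1 = a11                       # leading principal minor of order 1
    minor2 = a11 * a22 - a12 * a12     # leading principal minor of order 2
    return minor1 > 0 and minor2 > 0

assert is_positive_definite_2x2(2.0, 1.0, 2.0)      # x'Ax = 2x1^2 + 2x1x2 + 2x2^2 > 0
assert not is_positive_definite_2x2(1.0, 3.0, 1.0)  # det = 1 - 9 < 0: indefinite
```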
The first- and second-order conditions for a local maximum or minimum are developed in detail, motivated using Taylor series expansions and the Hessian matrix (of second-order partial derivatives). Following this, the envelope theorem is described at length: given a function f(x, α) depending on a choice variable x and a parameter α, if x(α) maximizes f(x, α), how does f*(α) = f(x(α), α) vary with
α? This is the analog of a shadow or dual price in linear programming and can be viewed as a measure of the value of the resource α. The chapter considers the n-variable case, generalizing the initial discussion. The chapter concludes by establishing the connection between concavity, convexity and the properties of the Hessian matrix.

In Chapter 12, maximizing or minimizing a function subject to equality constraints is considered. Familiar problems are the maximization of utility u(x1, ..., xn) subject to a budget constraint Σ p_i x_i = y, and minimization of the cost, c = Σ w_i x_i, of achieving a level of output y with a production function f(x1, ..., xn), so that f(x1, ..., xn) = y. The following discussion describes utility maximization, to motivate subsequent discussion.
An illustrative example concerns an individual's optimization problem. A consumer's utility depends on the consumption of food, f, and drink, w, according to the formula u(f, w) = f·w. The respective prices of food and drink are p_f and p_w, and the consumer's income is I. The problem for the consumer is to maximize utility subject to spending no more than I. Thus, the consumer is restricted to choosing those pairs (f, w) that satisfy p_f f + p_w w = I. This equation can also be written w = I/p_w − (p_f/p_w) f, a straight-line relation between w and f with slope −(p_f/p_w). See Figure 1.4. The set of pairs (f, w) that yield a given level of utility ū is given by the equation ū = f·w. This determines a relation between f and w, w = ū/f, which gives for any value of f the level of w (= ū/f) that achieves utility ū. When ū = 1, f·w = 1 so that w = 1/f; when ū = 2, f·w = 2
Figure 1.4: Indifference curves and optimal choices
so that w = 2/f; and when ū = 3, f·w = 3 so that w = 3/f. These calculations indicate that higher levels of utility correspond to higher curves, as in Figure 1.4.
Thus, the highest utility attainable is reached by going to the highest curve having some point in common with the budget line p_f f + p_w w = I.
Figure 1.5: Non-optimal, optimal and non-feasible choices
Figure 1.5 illustrates a number of possibilities regarding indifference curves and the budget constraint: (a) feasible points on the indifference curve are not optimal; (b) this is the only possible optimal situation, and there is a unique optimal point (f*, w*); (c) there are no affordable points giving the utility ū. At the optimal point the slope of the curve is −w*/f*, while the slope of the straight line is constant at −p_f/p_w. Since these slopes are equal at the tangency, −w*/f* = −p_f/p_w, and so p_w w* = p_f f*: the same amount is spent on food and drink. Since p_w w* + p_f f* = I and p_w w* = p_f f*, I = 2 p_w w* = 2 p_f f*. Thus w* = I/(2 p_w) and f* = I/(2 p_f).
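The closed form w* = I/(2 p_w), f* = I/(2 p_f) can be verified numerically; the prices and income below are illustrative assumptions.

```python
# Consumer problem sketch: maximize u(f, w) = f*w subject to the budget
# p_f*f + p_w*w = I. Prices and income are illustrative assumptions.
# Closed form derived in the text: w* = I/(2*p_w), f* = I/(2*p_f).
p_f, p_w, income = 2.0, 5.0, 100.0

f_star = income / (2 * p_f)
w_star = income / (2 * p_w)

# Check the tangency condition: MRS (= w/f for u = f*w) equals p_f/p_w.
assert abs(w_star / f_star - p_f / p_w) < 1e-12

# Crude optimality check: no budget-feasible point on a grid does better.
grid_best = max(
    f * (income - p_f * f) / p_w
    for f in (income / p_f * k / 100 for k in range(1, 100))
)
assert f_star * w_star >= grid_best - 1e-9
```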
Mathematically, this can be viewed as follows. The utility function is a function of food and drink, u(f, w). The extra utility from increasing f a small amount is ∂u/∂f, the marginal utility of food at the consumption level (f, w). Likewise, the marginal utility of drink at the consumption level is denoted ∂u/∂w. If at (f, w) the level of food consumption is changed a small amount, df, the impact on utility is (∂u/∂f)df. If at (f, w) the levels of food and drink consumption are changed by small amounts, df and dw, the impact on utility is (∂u/∂f)df + (∂u/∂w)dw. If the change in the pair (f, w) to (f + df, w + dw) is such that no change in utility occurs, then it must be that 0 = (∂u/∂f)df + (∂u/∂w)dw. Since the marginal utilities are positive (more food or more drink raises utility), an increase in one must be balanced by a reduction in the other to maintain the same utility level. When f changes and w changes to keep utility constant, it must be that:

dw = −[1/(∂u/∂w)](∂u/∂f) df   or   dw/df = −(∂u/∂f)/(∂u/∂w).

This gives the slope of the indifference curve at (f, w). From the previous discussion, this must equal the slope of the budget constraint at the optimal point where utility is maximized subject to the budget constraint. So, at the optimal point,

(∂u/∂f)/(∂u/∂w) = p_f/p_w.
The ratio of the marginal utilities is called the marginal rate of substitution, so at the optimal point the marginal rate of substitution is equal to the price ratio. However, this procedure is ad hoc; the chapter develops systematic procedures for locating the optima. The initial part of Chapter 12 deals with the two-variable case, giving necessary and sufficient conditions for local maxima and local minima. For the case of just two variables, the constraint (subject to regularity conditions) determines one variable as a function of the other, so that one variable may be eliminated. For example, if the objective is f(x1, x2) and the constraint is g(x1, x2) = c, then using the constraint to solve for x2 as a function of x1, x2(x1), the objective may be maximized as a function of one unconstrained variable, max f(x1, x2(x1)). This observation may be used to justify the second-order conditions. Following this, the n-variable case is considered. Here, the second-order conditions make use of the bordered Hessian matrix and properties derived from the theory of quadratic forms. The value of the Lagrange multiplier is interpreted as a shadow price, and a general version of the envelope theorem is given for the constrained optimization problem. In utility maximization, these calculations identify the Lagrange multiplier as the marginal value of income; in the cost-minimization case, the Lagrange multiplier is seen to be the marginal cost of additional output. Finally, this chapter discusses optimization in terms of the shape of the objective function and the constraints. The key idea is quasiconcavity (or quasiconvexity) of a function. This property is related to the bordered Hessian and used to provide a sufficient condition for a constrained global maximum.

Chapter 13 considers optimization with inequality constraints. Thus, the task is to maximize or minimize a function f(x1, ..., xn) subject to a constraint g(x1, ..., xn) ≤ c or a collection of constraints g_k(x1, ..., xn) ≤ c_k, k = 1, ..., m. The chapter begins with some motivating examples. Perhaps the central idea in the chapter is the observation that the direction of improvement in the objective
must lie between the gradients of the constraints in the nonfeasible direction.
[Figure 1.6: Inequality-constrained optimization. The figure plots the constraint curves g1(x, y) = y + x³ − 1 = 0 and g2(x, y) = y + x⁴ − 2 = 0 together with a level curve of the objective f, and shows the gradients ∇g1, ∇f and ∇g2 at the optimum, with ∇f lying between ∇g1 and ∇g2.]

Let the feasible region be defined as those pairs (x1, x2) satisfying g1(x1, x2) ≤ 0 and g2(x1, x2) ≤ 0. From the figure one may see that there are non-negative numbers λ1 and λ2 such that
∇ f = λ1 ∇ g 1 + λ2 ∇ g 2 ,
or, alternatively,
∂f/∂xi = λ1 ∂g1/∂xi + λ2 ∂g2/∂xi, for i = 1, 2. Although non-negativity of the choice variables (x1, x2) is not imposed in this example, adding non-negativity constraints is straightforward and done later in the chapter. It turns out that this condition, ∇f = λ1∇g1 + λ2∇g2, is, in general, a necessary condition for a local optimum.
The benefit of this is obvious: it provides a system of equations that helps to locate the solution (note, though, that there are more unknowns than equations once the λ variables are included). However, one problem with this insight is that in certain (exceptional) circumstances this equation fails to be a necessary condition for an optimum. This arises, for example, when the constraints have the same gradient at the candidate solution. Recognizing this issue, the constraint qualification is a requirement on the constraints ensuring that the expression ∇f = λ1∇g1 + λ2∇g2 is in fact necessary at a solution. The chapter motivates the constraint qualification with a number of examples. Following this, the non-negativity constraints are introduced, leading to the familiar Kuhn–Tucker conditions. Chapter 14 discusses integration – the integral of a function. Most of the chapter is spent on applications, beginning with consumer and producer surplus. The
chapter works through a range of problems: recovering total cost from marginal cost, calculating deadweight loss from taxation, inequality measures, Ramsey pricing and welfare measures in consumer theory. In Chapter 15, the theory of eigenvalues is developed. Many problems involving the use of matrices in economics have a dynamic aspect where a system evolves according to a rule of the form x_{t+1} = Ax_t, where A is a square matrix. Systems of this form arise in evolutionary models of population change, in Markov processes, matrix difference equations and so on. The theory of eigenvectors and eigenvalues facilitates the study of such systems. An eigenvector–eigenvalue pair for a matrix A is a vector v and scalar λ such that Av = λv. The vector v is called an eigenvector and the scalar λ an eigenvalue. Typically, an n × n matrix A has n distinct eigenvalues, λ1, ..., λn, and a corresponding set of linearly independent eigenvectors, v1, ..., vn. Arranging the eigenvectors in a matrix P = [v1, ..., vn], the set of equations Avi = λi vi may be written AP = PΛ, where Λ is a diagonal matrix with λ1, λ2, ..., λn on the main diagonal and 0 elsewhere. Thus, Λ is a diagonal matrix with the eigenvalues on the diagonal. With P non-singular, this may be written A = PΛP⁻¹. Considering A² = PΛP⁻¹PΛP⁻¹ = PΛ²P⁻¹, A³ = PΛ³P⁻¹, and in general Aᵗ = PΛᵗP⁻¹, higher powers of A are expressible in terms of powers of Λ. Therefore, the long-run behavior of Aᵗ is determined by Λ. This property has many applications, such as in the study of the long-run behavior of Markov processes, in the study of the dynamics of population growth, and so on. In this chapter, eigenvalues are used to study the evolution of different age cohorts in a population and to provide a model of webpage ranking based on the long-run dynamics of a Markov process describing the transition dynamics between webpages. Dynamic models in economics are frequently developed in terms of differential or difference equations. Chapter 16 develops the theory of differential equations. In the simplest case, a differential equation expresses the derivative of some variable, y′(t), as a function of its current value y(t) and the independent variable t: y′(t) = f(y(t), t). (Such an equation is called a first-order differential equation. Regarding terminology, a second-order differential equation has the form y′′(t) = f(y′, y, t), and so on.) The problem is to determine the function y(t) from this equation. For example, if the value of a loan p grows according to the formula p′(t) = rp(t) = f(p(t), t), where r is the rate of interest, what is p(t)? Here, one can see that p(t) = p(0)e^{rt} is the required function, where p(0) is the initial value of the loan. In general, determining the function y that satisfies y′(t) = f(y(t), t) is very difficult. (For continuous functions f, a solution always exists; the solution is unique if, in addition, the function f satisfies a "Lipschitz" condition.) This difficulty leads to a focus on tractable problems. The chapter begins with
(first-order) linear differential equations, y′(t) = b(t)y(t) + c(t), or y′(t) = by(t) + c when the coefficients are constant. These equations are solved using standard techniques in the early part of the chapter. Then two useful, but non-linear, first-order differential equations are considered – the logistic and Bernoulli equations. Following this, second- and higher-order differential equations are discussed. A second-order linear differential equation with constant coefficients has the form ay′′(t) + by′(t) + cy(t) + d = 0. Such a differential equation is again solved by standard techniques, but the study of stability (the behavior of the solution y(t) as a function of t) is more involved. The chapter concludes with a discussion of matrix differential equations and the stability of such systems. In Chapter 17, linear difference equations are discussed. A linear difference equation has the form y_t = a y_{t−1} + b. In contrast to a differential equation, in a difference equation variables move in discrete increments. A kth-order linear difference equation has the form y_t = a1 y_{t−1} + a2 y_{t−2} + ··· + ak y_{t−k} + b. The solutions of such systems are relatively straightforward. A slight generalization occurs when the parameter b varies, say as b_t – this is called the variable forcing function case and is discussed for a variety of models relating b to t. The second-order difference equation, y_t = a1 y_{t−1} + a2 y_{t−2}, is studied in detail, in particular the stability and dynamic behavior of y_t. The conditions described here apply to autoregressive processes of order 2: y_t = a1 y_{t−1} + a2 y_{t−2} + ε_t, where ε_t is a random error term. The chapter concludes with a discussion of systems of difference equations.
Chapter 18 introduces some basic material on probability and random variables. The density and distribution function of a random variable are defined. The chapter then develops a few of the fundamental discrete distributions: the Bernoulli, Poisson and binomial distributions. Following this, continuous distributions are introduced. The chapter discusses in detail four important distributions that arise routinely in statistical theory and are the key distributions that appear in hypothesis testing. These are the normal, chi-square, t and F distributions. The relations between these distributions are explored in detail. In particular, the chi-square density is derived from the normal density; the t distribution is derived from the normal and chi-square distributions; and the F distribution is derived from the chi-square distribution. The concluding chapter, Chapter 19, introduces estimation and some basic properties of estimators. The law of large numbers and the central limit theorem are described. The chapter then proceeds to review confidence intervals and develop the fundamentals of hypothesis testing, including the role of various distributions. Least squares estimation is reviewed and the use of major test statistics (t and F) is explained.
2 Matrices and systems of equations
2.1 Introduction
In linear algebra, a rectangular array of numbers is called a matrix. For many problems in mathematics, economics, and other fields, arranging numbers in such blocks turns out to be very useful – for example, in solving systems of equations, analyzing the evolution of dynamic systems and so on. Working with blocks of numbers requires a methodology for the manipulation of such arrays, so that addition, subtraction, multiplication and other algebraic procedures are defined. Sections 2.2 and 2.3 introduce arrays of numbers (vectors and matrices) and provide the basic rules of manipulation (addition, subtraction and multiplication). Addition, subtraction and multiplication are natural extensions of the scalar rules. In contrast, matrix inversion is more involved, requiring a number of steps, described in Section 2.3.4. Matrix inversion is defined for square matrices but not all square matrices are invertible. For square matrices, a summary statistic, the determinant, identifies key properties of a matrix. In particular, a square matrix can be inverted if and only if it has a non-zero determinant. The determinant of a matrix is discussed in Section 2.3.5. Inverting a matrix often arises in solving systems of equations, but it is not the only approach: Sections 2.3.6 and 2.3.7 provide alternative methods. In Section 2.4, linear dependence, a key concept in linear algebra, and independence are discussed. From this, the rank of a matrix is defined and matrix rank is used to discriminate between equation systems with many, one or no solutions. Section 2.5 provides a geometric perspective on the task of solving equation systems and proceeds to give necessary and sufficient conditions in terms of rank and linear independence for a system to be soluble. Section 2.6
concludes with an introduction to some special matrices. The remainder of this section provides some elementary motivation for the use of matrices.
2.1.1 Some motivating examples

The following examples illustrate how matrices arise naturally in a variety of areas and where the matrix formulation greatly simplifies subsequent analysis. The first example introduces adjacency matrices, a simple way to represent "closeness" or adjacency between entities – in this case cities, where closeness is represented by the minimum number of flights necessary to travel from one city to another.

EXAMPLE 2.1: Suppose there are four airports, A, B, C and D. The figure shows the flights available from each airport. The only flight from A is to B; from C, flights are available to A and D – and so on. These flight options may be represented in an array, with rows identified with flight origin and columns identified with flight destination. Thus, in the first row (corresponding to A) and the second column (corresponding to B), there is a "1", indicating that there is a flight from A to B. In the third row (corresponding to C), there is a "1" in columns 1 and 4, corresponding to destinations A and D: it is possible to take a flight from C to either A or D.
[Figure: the flight network between the four airports A, B, C and D.]

      A  B  C  D
  A   0  1  0  0
  B   0  0  1  0
  C   1  0  0  1
  D   0  0  1  0
Write M or M1 for this array or matrix of numbers. Consider now the destinations reached if exactly two flights are taken. From A there is one possible destination, C , reached by a flight to B and then to C . From B there are two possible destinations: A and D with corresponding flight paths
B → C → A and B → C → D. In matrix M2, these calculations are represented. Looking at the first row, there is just one possible destination, C,
with two flights from A. This is indicated by "1" in column 3 of row 1 of M2. Similarly, from B there are two possible destinations if exactly two flights are taken: A and D. In matrix M2, this is represented by a "1" in row 2 (corresponding to B) column 1 (corresponding to A), and in row 2 column 4 (corresponding to D). Matrix M3 indicates the destinations with exactly three flights. Finally, the matrix M(2) identifies the destinations reachable with two or fewer flights (two or one flights). Simple matrix rules may be used to generate matrices such as M3 or M(2).

M = M1 =        M2 =
  0 1 0 0         0 0 1 0
  0 0 1 0         1 0 0 1
  1 0 0 1         0 1 1 0
  0 0 1 0         1 0 0 1

M3 =            M(2) =
  1 0 0 1         0 1 1 0
  0 1 1 0         1 0 1 1
  1 0 1 1         1 1 1 1
  0 1 1 0         1 0 1 1
♦
The next example describes a standard demand system. Demand for a good typically depends not just on the good's own price, but also on the price of related goods. For example, if two goods are substitutes, such as butter and margarine, one expects the price of one good to affect the demand for the other. But, with such cross-price effects, it becomes more difficult to determine the equilibrium price in each market. The next example illustrates this issue.
EXAMPLE 2.2: This example considers equilibrium price determination. To illustrate, consider the case where there are two markets, 1 and 2, defined by the supply and demand conditions in each market. Suppose the demand and supply in the pair of markets is described by the following equations:

D1 = 20 − 3p1 + p2,    S1 = 2 + 2p1
D2 = 10 + 2p1 − 4p2,   S2 = 4 + 3p2.

So, for example, a unit increase in the price of good 1 reduces the demand for good 1 by 3 units, while an increase in the price of good 2 raises the demand for good 1 (possibly because good 1 is a substitute for good 2, with the higher price for good 2 causing consumers to switch to good 1).
Setting supply equal to demand in each market gives:

20 − 3p1 + p2 = 2 + 2p1
10 + 2p1 − 4p2 = 4 + 3p2,

and rearranging,

5p1 − p2 = 18
−2p1 + 7p2 = 6.
These equations determine the price pair (p1, p2) at which both markets clear (supply equals demand in each market). The solution is (p1, p2) = (4, 2). However, finding such solutions is more difficult when the number of markets is large, with correspondingly large numbers of prices to be determined. This pair of equations is rearranged in matrix form in Example 2.5. The procedure generalizes directly to any number of equations and variables and provides a systematic way of dealing with and solving large systems of equations. ♦
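As a quick arithmetic check, a 2 × 2 system like this can be solved directly with Cramer's rule. The short script below is an illustrative sketch (the variable names are ours, not the text's):

```python
# Market-clearing system from the example:
#    5*p1 -   p2 = 18
#   -2*p1 + 7*p2 =  6
# For a*x + b*y = e and c*x + d*y = f with a*d - b*c != 0, Cramer's rule
# gives x = (e*d - b*f)/(a*d - b*c) and y = (a*f - e*c)/(a*d - b*c).
a, b, e = 5.0, -1.0, 18.0
c, d, f = -2.0, 7.0, 6.0

det = a * d - b * c           # 35 - 2 = 33
p1 = (e * d - b * f) / det    # (126 + 6) / 33 = 4
p2 = (a * f - e * c) / det    # (30 + 36) / 33 = 2
print(p1, p2)                 # 4.0 2.0
```

Both markets clear at (p1, p2) = (4, 2), matching the solution above.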
The final example focuses on the dynamic evolution of economic variables. At any point in time, suppose that the current state of the economy evolves to a future state according to a law of motion. Analyzing the longrun behavior of such economic variables is simplified by matrix techniques. The next example illustrates such a dynamic model.
EXAMPLE 2.3: The Phillips curve asserts that the change in inflation is proportional to the gap between the level of unemployment (u) and a natural rate of unemployment (ū). If π denotes the rate of inflation, with subscripts for time periods, then π_t − π_{t−1} = −α(u_t − ū). Okun's law connects changes in unemployment to output changes, which are in turn connected to the inflation rate. Assume that this relation has the form u_t − u_{t−1} = −β(g − π_t), where g is the growth in the money supply. Rearranging these equations:

π_t = π_{t−1} − α(u_t − ū)
u_t = u_{t−1} − β(g − π_t).
Suppose that α = β = 1/2, so the equations are π_t = π_{t−1} − (1/2)(u_t − ū) and u_t = u_{t−1} − (1/2)(g − π_t). Substituting the expression for u_t into the first equation and substituting the expression for π_t into the second equation, some rearrangement gives:

π_t = (4/5)π_{t−1} − (2/5)u_{t−1} + [(1/5)g + (2/5)ū]
u_t = (2/5)π_{t−1} + (4/5)u_{t−1} + [−(2/5)g + (1/5)ū].

From these equations, the pair (π_t, u_t) is determined by (π_{t−1}, u_{t−1}), so the evolution of the system is determined by the coefficients on π_{t−1} and u_{t−1}. This system can be arranged as a matrix system where the dynamics are determined by the matrix terms (see Example 2.5).
♦
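The behavior of such a system can be seen by iterating the two equations from an arbitrary starting point. The values chosen below for g, ū and the initial pair are illustrative assumptions, not from the text:

```python
# Iterate  pi_t = (4/5)pi_{t-1} - (2/5)u_{t-1} + (1/5)g + (2/5)u_bar
#          u_t  = (2/5)pi_{t-1} + (4/5)u_{t-1} - (2/5)g + (1/5)u_bar
g, u_bar = 0.05, 0.06     # assumed money growth and natural unemployment rate
pi, u = 0.02, 0.09        # assumed starting values

for _ in range(300):
    pi, u = (0.8 * pi - 0.4 * u + 0.2 * g + 0.4 * u_bar,
             0.4 * pi + 0.8 * u - 0.4 * g + 0.2 * u_bar)

print(round(pi, 6), round(u, 6))   # approaches (g, u_bar) = (0.05, 0.06)
```

The pair (π, u) settles at (g, ū), the fixed point of the system; how quickly it settles is governed by the coefficient matrix, a point taken up again in Chapter 15.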
The following discussion focuses on two types of array – vectors, where the array has one row or one column, and matrices, where the number of rows and columns is arbitrary.
2.2 Vectors

Definition 2.1: A VECTOR in Rⁿ is an ordered collection (row or column) of n real numbers.

Throughout, lower case bold letters, such as a, b and c, will denote vectors. To highlight the dimension of a vector, say that a is an n-vector. For example, a = (1, 3) and b = (5, 2) are 2-vectors, c = (3, 4, 1, 7) is a 4-vector, and so on. If two vectors have the same number of rows or columns they may be multiplied, to form the inner product.
Definition 2.2: Let f and g be two n-vectors with f = (f1, f2, ..., fn) and g = (g1, g2, ..., gn). The INNER PRODUCT of f and g is denoted f · g and defined:

f · g = f1 g1 + f2 g2 + ··· + fn gn.

So the entries of the vectors are multiplied pairwise and added together. For example, with a and b given above, a · b = 1 × 5 + 3 × 2 = 11. Note that the order does not matter since f · g = g · f. Equations may be represented using this notation. The equations x + y = 5 and 2x + 3y = 12 may be written as (1, 1) · (x, y) = 5 and (2, 3) · (x, y) = 12.
There is a convenient way to represent f1 g1 + f2 g2 + ··· + fn gn. Suppose there are 20 numbers, labeled x1, x2, ..., x20. Adding the first ten can be written x1 + x2 + ··· + x10, but there is also a shorthand notation for this, Σ_{i=1}^{10} x_i, which can be read as saying "add up the numbers labeled from 1 to 10" or "sum from 1 to 10" the numbers x_i. Similarly, to illustrate, Σ_{i=7}^{15} x_i = Σ_{i=1}^{15} x_i − Σ_{i=1}^{6} x_i. The sign Σ is called the summation sign. With this notation: f · g = Σ_{i=1}^{n} f_i g_i = f1 g1 + f2 g2 + ··· + fn gn.
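The definition translates into a one-line computation; a small illustrative sketch (the function name inner is ours):

```python
# Inner product f . g = f1*g1 + f2*g2 + ... + fn*gn
def inner(f, g):
    assert len(f) == len(g), "vectors must have the same length"
    return sum(fi * gi for fi, gi in zip(f, g))

a, b = (1, 3), (5, 2)
print(inner(a, b))                 # 1*5 + 3*2 = 11
print(inner(a, b) == inner(b, a))  # order does not matter: True
print(inner((1, 1), (3, 2)))       # the equation x + y = 5 holds at (3, 2)
```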
2.3 Matrices

Definition 2.3: A MATRIX is a rectangular array of numbers.

Throughout, upper case bold letters, such as A, B and C, will denote matrices. A matrix B is an m × n matrix if it has m rows and n columns (this is sometimes denoted by B ∼ m × n).
If B is an m × n matrix we may write it as:

B =
  b11 b12 ··· b1n
  b21 b22 ··· b2n
   ⋮   ⋮   ⋱   ⋮
  bm1 bm2 ··· bmn

For example, C is a 2 × 3 matrix:

C =
  2 11 4
  1  0 6
Sometimes, the matrix B may be denoted {b_ij}, i ∈ I, j ∈ J, where I and J are index sets, or just as {b_ij}. An element b_ij of the matrix B represents the element in the position given by the ith row and jth column, the (i, j)th element of B. There are some special matrices, such as the identity (I), null (0) and diagonal matrices (D):
I =                0 =                D =
  1 0 ··· 0          0 0 ··· 0          a11  0  ···  0
  0 1 ··· 0          0 0 ··· 0           0  a22 ···  0         (2.1)
  ⋮ ⋮  ⋱  ⋮          ⋮ ⋮  ⋱  ⋮           ⋮   ⋮   ⋱   ⋮
  0 0 ··· 1          0 0 ··· 0           0   0  ··· ann
In the diagonal matrix (D), the diagonal elements may be arbitrary numbers but the off-diagonal elements must be zero. When a matrix has just one row or one column, it is a vector.
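The special matrices of equation (2.1) are easy to construct programmatically; a small sketch using nested lists (the function names are ours):

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def null(m, n):
    return [[0] * n for _ in range(m)]

def diagonal(entries):
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]

print(identity(3))       # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(null(2, 2))        # [[0, 0], [0, 0]]
print(diagonal([4, 7]))  # [[4, 0], [0, 7]]
```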
The next sections discuss matrix algebra, in particular addition, subtraction, multiplication and inversion. The rules for addition, subtraction, multiplication are straightforward. Understanding matrix inversion requires some discussion, and the computation of the inverse involves a number of steps.
2.3.1 Matrix addition and subtraction
Let A and B be two m × n matrices, so that both A and B have the same number of rows and columns.

MATRIX ADDITION: The sum of A and B is obtained by adding the entries in the two matrices pairwise:

A + B =
  a11 + b11   a12 + b12   ···   a1n + b1n
  a21 + b21   a22 + b22   ···   a2n + b2n
      ⋮           ⋮        ⋱        ⋮
  am1 + bm1   am2 + bm2   ···   amn + bmn

MATRIX SUBTRACTION: The difference of A and B is obtained by subtracting entries pairwise:

A − B =
  a11 − b11   a12 − b12   ···   a1n − b1n
  a21 − b21   a22 − b22   ···   a2n − b2n
      ⋮           ⋮        ⋱        ⋮
  am1 − bm1   am2 − bm2   ···   amn − bmn
So the entry in the ith row and jth column of A + B is a_ij + b_ij, and the entry in the ith row and jth column of A − B is a_ij − b_ij. For example, if

A =            B =
  1 3 4          2 4 1
  5 8 1          9 0 3

then

A + B =                       A − B =
  1+2 3+4 4+1     3  7 5        1−2 3−4 4−1     −1 −1  3
  5+9 8+0 1+3  =  14 8 4        5−9 8−0 1−3  =  −4  8 −2
Note from the definition that A + B = B + A . Also, ( A + B ) + C = A + (B + C ). For
two matrices A and B , the operations of addition and subtraction are defined only if A and B have the same number of rows and columns.
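Entrywise addition and subtraction take only a few lines of code; this sketch reuses the matrices A and B from the example above (the helper names are ours):

```python
def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[1, 3, 4], [5, 8, 1]]
B = [[2, 4, 1], [9, 0, 3]]
print(add(A, B))               # [[3, 7, 5], [14, 8, 4]]
print(sub(A, B))               # [[-1, -1, 3], [-4, 8, -2]]
print(add(A, B) == add(B, A))  # A + B = B + A: True
```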
2.3.2 Matrix multiplication

Let A and B be two matrices:

A =                           B =
  a11 a12 a13 ··· a1n           b11 b12 b13 ··· b1p
  a21 a22 a23 ··· a2n           b21 b22 b23 ··· b2p
  a31 a32 a33 ··· a3n           b31 b32 b33 ··· b3p
   ⋮   ⋮   ⋮   ⋱   ⋮             ⋮   ⋮   ⋮   ⋱   ⋮
  am1 am2 am3 ··· amn           bn1 bn2 bn3 ··· bnp
So A is an m × n matrix and B is an n × p matrix. If the number of columns in A, n, equals the number of rows in B, then the matrices can be multiplied to form the matrix C = AB, where C is defined componentwise as

c_ij = a_i1 b_1j + a_i2 b_2j + ··· + a_in b_nj = Σ_{k=1}^{n} a_ik b_kj.
In words, take the i th row of the A matrix and the jth column of the B matrix and form the inner product. Note that this is possible since A has the same number of columns as B has rows so that both the i th row of the A matrix and the jth column of the B matrix are vectors with the same number of elements, n, and the inner product can be formed. For example:
A =            B =          AB =
  1 3 2          1 5 2        13 24  9
  4 3 6          2 5 1        28 47 23
                 3 2 2
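The componentwise formula translates directly into code; the sketch below multiplies the two matrices of the example (the helper name matmul is ours):

```python
def matmul(A, B):
    n = len(B)  # rows of B; must equal the number of columns of A
    assert all(len(row) == n for row in A), "shapes do not conform"
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 3, 2], [4, 3, 6]]             # 2 x 3
B = [[1, 5, 2], [2, 5, 1], [3, 2, 2]]  # 3 x 3
print(matmul(A, B))                    # [[13, 24, 9], [28, 47, 23]]
```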
EXAMPLE 2.4: Recall the adjacency matrix in Example 2.1. Write f_ij = 1 if there is a flight from city i to city j; and write f_ij = 0 if there is no flight from i to j. For notational convenience, label the cities A, B, C and D as 1, 2, 3 and 4. Then the adjacency matrix may be written:
M =
  f11 f12 f13 f14     0 1 0 0
  f21 f22 f23 f24     0 0 1 0
  f31 f32 f33 f34  =  1 0 0 1
  f41 f42 f43 f44     0 0 1 0
Consider the matrices M² = MM, obtained by multiplying M with itself, and M³ = MMM, obtained by multiplying M by itself three times. These calculations give:

M² =            M³ =
  0 0 1 0         1 0 0 1
  1 0 0 1         0 1 1 0
  0 1 1 0         1 0 1 1
  1 0 0 1         0 1 1 0
Let f²_ij be the entry in the ith row and jth column of M². From its definition:

f²_ij = f_i1 f_1j + f_i2 f_2j + f_i3 f_3j + f_i4 f_4j.

If f_ik f_kj > 0, then there is a flight from i to k and a flight from k to j; and if f²_ij = 0, then f_ik f_kj = 0 for each k and there is no airport k with a connecting flight from i to j. In this example, f²₁₃ = 1, indicating that there is exactly one route with two flights from airport 1 to airport 3. (If f²_ij = 2 this would mean that there are exactly two distinct routes connecting i and j, each involving two flights.) Thus, f²_ij is the number of distinct routes from i to j involving exactly two flights. Going further, let f³_ij be the (i, j)th entry in M³. This gives the number of distinct routes from i to j with exactly three flights. Finally, if we add M and M² together:
M + M² =
  0 1 0 0     0 0 1 0     0 1 1 0
  0 0 1 0     1 0 0 1     1 0 1 1
  1 0 0 1  +  0 1 1 0  =  1 1 1 1  = M(2).
  0 0 1 0     1 0 0 1     1 0 1 1
Thus, if f_ij(2) is the entry in the ith row and jth column of M + M², then f_ij(2) gives the number of distinct routes from i to j with either one or two flights. ♦
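The route-counting interpretation can be verified directly by computing powers of M in code (an illustrative sketch; the helper name matmul is ours):

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

#        A  B  C  D
M = [[0, 1, 0, 0],   # A
     [0, 0, 1, 0],   # B
     [1, 0, 0, 1],   # C
     [0, 0, 1, 0]]   # D

M2 = matmul(M, M)
M3 = matmul(M2, M)
M_2 = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(M, M2)]  # M + M^2

print(M2[0][2])  # 1: one two-flight route from A to C (A -> B -> C)
print(M3)        # routes using exactly three flights
print(M_2)       # routes using one or two flights
```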
With matrix multiplication, there are some basic points of caution to note. Even when the product AB can be formed, this does not imply that BA can be formed. The expression BA is not defined unless p = m (i.e., B has the same number of columns as A has rows). For example:

A =         B =         gives  AB =
  1 3         2 1                5  4
  5 1         1 1                11 6
  2 1                            5  3
but the expression B A is not defined since B has fewer columns than A has rows. If both AB and B A exist, they are generally not equal and may not be the same dimension. For example:
B =         C =         so  BC =        while  CB =
  2 1         3 0             7 2               6 3
  1 1         1 2             4 2               4 3

and
A =            B =         so  AB =           and  BA =
  1 3 5          2 1             32 27               4  10 16
  2 4 6          5 7             42 36               19 43 67
                 3 1                                 5  13 21
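The failure of commutativity is easy to confirm numerically, here with the 2 × 2 matrices B and C above (the helper name matmul is ours):

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

B = [[2, 1], [1, 1]]
C = [[3, 0], [1, 2]]
print(matmul(B, C))                  # [[7, 2], [4, 2]]
print(matmul(C, B))                  # [[6, 3], [4, 3]]
print(matmul(B, C) == matmul(C, B))  # False: BC != CB
```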
A matrix may be multiplied by a real number by multiplying each element by the number. Take a real number, λ ∈ R (where R is the set of real numbers), then λ A is defined componentwise. The ( i, j)th element of λ A is λa i j . For example:
D =         λD =             and, if λ = 2,  2D =
  1 3         λ·1 λ·3                          2 6
  4 2         λ·4 λ·2                          8 4
The operations of matrix addition and multiplication have the following properties:

A(B + C) = AB + AC
λ(A + B) = λA + λB,   λ a scalar
(λ + µ)A = λA + µA,   λ, µ scalars.

Matrices provide a succinct way of expressing systems of equations. The system of m equations in n unknowns, {x1, x2, ..., xn}:
a11 x1 + a12 x2 + ··· + a1n xn = b1
a21 x1 + a22 x2 + ··· + a2n xn = b2
   ⋮        ⋮               ⋮      ⋮
am1 x1 + am2 x2 + ··· + amn xn = bm
may be written:
  a11 a12 a13 ··· a1n     x1     b1
  a21 a22 a23 ··· a2n     x2     b2
   ⋮   ⋮   ⋮   ⋱   ⋮   ·   ⋮  =   ⋮
  am1 am2 am3 ··· amn     xn     bm
To illustrate, consider the examples introduced earlier.
EXAMPLE 2.5: Considering Example 2.2 shows how a system of equations may be arranged in matrix form using matrix multiplication:

5p1 − p2 = 18
−2p1 + 7p2 = 6

or

   5 −1     p1     18
  −2  7  ·  p2  =   6
Similarly, the Phillips curve equations in Example 2.3 may be written in matrix form:

π_t = (4/5)π_{t−1} − (2/5)u_{t−1} + [(1/5)g + (2/5)ū]
u_t = (2/5)π_{t−1} + (4/5)u_{t−1} + [−(2/5)g + (1/5)ū]

or

  π_t     4/5 −2/5     π_{t−1}      (1/5)g + (2/5)ū
  u_t  =  2/5  4/5  ·  u_{t−1}  +  −(2/5)g + (1/5)ū

♦

2.3.3 Matrix transpose
Let A be a matrix, not necessarily square. The transpose of A is obtained from
A by "rotating" A so that the ith row becomes the ith column. Write A′ for the transpose of A. The (i, j)th element of A′ is a_ji, so the (i, j)th element of the transpose is the (j, i)th element of the original matrix. In general, if A is an m × n matrix, then A′ is an n × m matrix. For example,

B =            ⇒  B′ =         D =         ⇒  D′ =
  2 4 6             2 1          3 0            3 1
  1 5 3             4 5          1 2            0 2
                    6 3

When using vectors in matrix computations, dimensions must be assigned – since matrix multiplication depends on correct dimensions. To illustrate, let x and y be the two 3 × 1 vectors:

x =       y =
  1         4
  2         5
  3         6

Then, for example, x′ = (1, 2, 3), and x′y = 1·4 + 2·5 + 3·6 = 32 = x · y. Similarly,

Bx =
  2 4 6     1     28
  1 5 3  ·  2  =  20
            3

and y′B′ = (4, 5, 6)B′ = (64, 47).
However, the expressions xB or B′y are not defined, since the matrices do not conform for matrix multiplication. The transpose operation has the following properties:

(A′)′ = A
(A + B)′ = A′ + B′
(AB)′ = B′A′
(ABC)′ = C′B′A′.

Note, for example, that if AB exists then A has the same number of columns as B has rows. Say A is an m × n matrix and B an n × p matrix: then A′ is an n × m matrix and B′ a p × n matrix, so that B′A′ exists.
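The reversal rule (AB)′ = B′A′ can be checked on small matrices; a sketch (the helper names, and the 2 × 2 matrices P and Q used for the check, are our own illustrative choices):

```python
def transpose(A):
    return [[A[i][j] for i in range(len(A))] for j in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

B = [[2, 4, 6], [1, 5, 3]]
print(transpose(B))          # [[2, 1], [4, 5], [6, 3]]

P = [[1, 2], [3, 4]]
Q = [[5, 6], [7, 8]]
lhs = transpose(matmul(P, Q))
rhs = matmul(transpose(Q), transpose(P))
print(lhs == rhs)            # True: (PQ)' = Q'P'
```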
2.3.4 Matrix inversion

When considering a single variable, division is straightforward. Given a number a, provided a ≠ 0, one can divide 1 by a to get 1/a. This term, 1/a, or in alternative notation, a⁻¹, is called the inverse of a or "a inverse". It routinely arises when solving equations. If 2x = 4, then multiplying both sides by 1/2 gives (1/2)2x = (1/2)4, and since (1/2)2 = 1 and (1/2)4 = 2, we get 1x = 2 or x = 2. The same calculation may be expressed as 2⁻¹2x = 2⁻¹4, and since 2⁻¹2 = 1, x = 2⁻¹4 = 2. For the expression ax = b, where a and b are real numbers and a ≠ 0, exactly the same procedure applies: a⁻¹ax = a⁻¹b, and with a⁻¹a = 1, x = a⁻¹b. We wish to apply exactly the same logic to matrices.
Considering the equation system Az = b, if there is a matrix, A⁻¹, such that A⁻¹A = I, then we can multiply the equation to get A⁻¹Az = A⁻¹b, or Iz = A⁻¹b, or z = A⁻¹b. For example, given the pair of equations

x + y = 5        (2.2)
2x + 3y = 12,    (2.3)
we wish to find a pair ( x∗ , y∗ ) that satisfies both of these equations. One can see that (0,4) satisfies the second equation, but not the first, while (5,0) satisfies the first but not the second. It is straightforward to find a solution by substitution. Take Equation 2.2 and express x in terms of y : x = 5 − y. This can be substituted
into Equation 2.3 to find: 2(5 − y) + 3 y = 12 or 10 − 2 y + 3 y = 12 or 10 + y = 12, so that
y = 2. Since x + y = 5, this implies that x = 3 and so the solution is ( x∗ , y∗ ) = (3, 2).
However, this solution procedure is not satisfactory for large systems of equations
and does not allow one to easily investigate circumstances under which solutions
exist or are unique. Writing the pair of equations in matrix form gives:

  1 1     x      5
  2 3  ·  y  =  12
Taking the matrix system on the left, consider multiplying both sides by an appropriate matrix:

   3 −1     1 1     x      3 −1      5
  −2  1  ·  2 3  ·  y  =  −2  1  ·  12
Carrying out the matrix multiplications on both sides gives:

  1 0     x     3           x     3
  0 1  ·  y  =  2    or     y  =  2
This "appropriate" matrix is in fact the inverse of A:

A⁻¹ =
   3 −1
  −2  1
Observe that A⁻¹A = I:

A⁻¹A =
   3 −1     1 1     1 0
  −2  1  ·  2 3  =  0 1
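For 2 × 2 matrices there is a simple closed form for the inverse, which reproduces the matrix used above; the sketch below is illustrative (the function name inv2 is ours – the general computation is developed in the sections that follow):

```python
# For A = [[a, b], [c, d]] with ad - bc != 0:
#   A^{-1} = (1/(ad - bc)) * [[d, -b], [-c, a]]
def inv2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    assert det != 0, "matrix is not invertible"
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 1], [2, 3]]
Ainv = inv2(A)
print(Ainv)            # [[3.0, -1.0], [-2.0, 1.0]]

# Solve A z = b for b = (5, 12): z = A^{-1} b
b = (5, 12)
z = [Ainv[0][0] * b[0] + Ainv[0][1] * b[1],
     Ainv[1][0] * b[0] + Ainv[1][1] * b[1]]
print(z)               # [3.0, 2.0]
```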
The following discussion considers matrix inversion, developing the computation of A⁻¹. In general, given a system of n equations with n unknowns, Az = b, where A is an n × n square matrix, assuming we can invert A, we may solve for z according to z = A⁻¹b. However, this is not always possible. Not all square matrices are invertible (have an inverse). Returning to the single equation, ax = b, there are three distinct possibilities. The first is that there is a unique solution when a ≠ 0, so a⁻¹ is defined, and the solution is given by x = a⁻¹b. A second possibility is that there is no solution – if a = 0 and b ≠ 0 it is impossible to find x satisfying ax = b. Finally, there may be an infinite number of solutions – if a = b = 0, then any number x satisfies ax = b.
The situation is essentially the same in the matrix case. Consider the following three equation systems:
x + y = 5          x + y = 5          x + y = 5
2x + 3y = 12       3x + 3y = 12       2x + 2y = 10.
The first system has a unique solution, (x∗, y∗) = (3, 2). The second system requires that x + y = 5 and 3(x + y) = 12, or x + y = 4: there is no (x, y) satisfying both equations. The third system requires that x + y = 5 and 2x + 2y = 10, or x + y = 5 – so any (x, y) satisfying the first equation also satisfies the second, and there are infinitely many solutions. If the matrix A has an inverse, A⁻¹, then A⁻¹A = I, the identity matrix, and given Az = b, we can multiply both sides by A⁻¹ to obtain A⁻¹Az = A⁻¹b or z = A⁻¹b. In this case, there is a unique solution. When A does not have an inverse, there is either no solution or an infinite number of solutions. That issue is considered in detail in Section 2.5, along with the more general case where the number of equations and the number of variables may differ. A square matrix has an inverse if and only if its determinant (defined next) is non-zero. An alternative criterion for invertibility may be given in terms of matrix rank (developed in Section 2.5.2).

REMARK 2.1: A square matrix has a non-zero determinant if and only if the columns of the matrix are linearly independent, in which case the matrix is said to be of full rank. Rank and linear independence are introduced in Section 2.4.1. The concept of linear independence provides geometric insight into the circumstances in which a matrix is invertible, but is not required to define the inverse of a matrix.
When a matrix has an inverse, the inverse can be found in a number of ways. The method described here involves finding in turn a number of matrices, leading to the inverse. Briefly, the steps are to find the determinant along with the minor, cofactor and adjoint matrices; the inverse matrix is the adjoint matrix scaled by the reciprocal of the determinant (for which reason, it must be non-zero). These are developed in turn.

REMARK 2.2: As noted, the method discussed next is not the only way to invert a matrix, but it serves the purpose of introducing determinants, minors and cofactors. Later, alternative methods of inversion are discussed.
2.3.4.1 The determinant
Let A be a square matrix. The determinant of A is denoted |A|. The determinant is defined inductively. If A = (a), so that A is a 1 × 1 matrix, then |A| = a (the determinant is equal to the number, not the absolute value). For example, if A = (−2), then |A| = −2. In the 2 × 2 case,

A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \;\Rightarrow\; |A| \stackrel{def}{=} \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} − a_{21}a_{12}.

In words, multiply the pair a_{11} and a_{22} and subtract from that the product of a_{21} and a_{12}. For the 3 × 3 case the determinant is defined:

A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}, \quad |A| = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} − a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}. \quad (2.4)
For example,

A = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 5 \\ 1 & 5 & 12 \end{pmatrix}, \quad |A| = 1(36 − 25) − 2(12 − 5) + 3(5 − 3) = 11 − 14 + 6 = 3. \quad (2.5)
Notice that the definition of the determinant for a 3 × 3 matrix makes use of determinants of 2 × 2 matrices. Similarly, the definition of the determinant of a 4 × 4 matrix utilizes 3 × 3 determinants, and the definition of the determinant of an n × n matrix utilizes (n − 1) × (n − 1) determinants. So the definition of the determinant is given inductively. The general process is as follows. Given the square matrix A, let A_{ij} be the matrix obtained from A by deleting the ith row and jth column. If A is an n × n matrix, A_{ij} is an (n − 1) × (n − 1) matrix. For the n × n matrix A, the determinant may be calculated according to the formula

|A| = a_{11}|A_{11}| − a_{12}|A_{12}| + a_{13}|A_{13}| + · · · + (−1)^{1+n} a_{1n}|A_{1n}|. \quad (2.6)
(The pattern may be seen in the computation of the determinant in Equation 2.5.) Observe that in this calculation we are required to compute determinants of the A_{ij} matrices. However, if A is an n × n matrix, then A_{ij} is an (n − 1) × (n − 1) matrix. Therefore, if we have defined the determinant of a 2 × 2 matrix, we can define the determinant of a 3 × 3 matrix. In turn, this allows us to define the determinant of a 4 × 4 matrix, and so on.

Note that the a_{1j} coefficients are drawn from a chosen row, row 1, but in fact any row may be chosen for the calculation. In the example given above, if the second row were chosen, the computation would proceed as:

|A| = −1\begin{vmatrix} 2 & 3 \\ 5 & 12 \end{vmatrix} + 3\begin{vmatrix} 1 & 3 \\ 1 & 12 \end{vmatrix} − 5\begin{vmatrix} 1 & 2 \\ 1 & 5 \end{vmatrix}
= −1(2 · 12 − 3 · 5) + 3(1 · 12 − 1 · 3) − 5(1 · 5 − 1 · 2)
= −1(9) + 3(9) − 5(3) = 3.
Note that in Equation 2.6 there is a "−" sign before a_{12} while there is a "+" sign before a_{13}. The rule is that if i + j is even, a "+" sign is placed before a_{ij}, and if i + j is odd, a "−" sign is placed before a_{ij}. Finally, note that (−1)² = (−1)(−1) = 1, (−1)³ = (−1)(−1)(−1) = −1 and, in general, (−1)^k = 1 if k is even and (−1)^k = −1 if k is odd. Thus
|A| = (−1)^{1+1} a_{11}|A_{11}| + (−1)^{1+2} a_{12}|A_{12}| + (−1)^{1+3} a_{13}|A_{13}| + · · · + (−1)^{1+n} a_{1n}|A_{1n}|
    = \sum_{k=1}^{n} (−1)^{1+k} a_{1k}|A_{1k}|. \quad (2.7)
In fact, the determinant can be calculated by expanding along any row. Expanding along the ith row:

|A| = \sum_{j=1}^{n} (−1)^{i+j} a_{ij}|A_{ij}| = (−1)^{i+1} a_{i1}|A_{i1}| + (−1)^{i+2} a_{i2}|A_{i2}| + · · · + (−1)^{i+n} a_{in}|A_{in}|. \quad (2.8)
While these calculations obtain the determinant by expanding along rows, the determinant may also be calculated by expansion along columns. The point is clarified in the discussion of cofactors below.
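The inductive definition lends itself to a short recursive implementation. The sketch below (Python with NumPy is an assumed tool here; the text itself uses none) expands along the first row, as in Equation 2.6, and reproduces the determinant of the matrix in Equation 2.5:

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row (Equation 2.6)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        # A_1j: the submatrix with row 1 and column j+1 deleted
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        # (-1)**j equals (-1)^{1+(j+1)} in the book's 1-indexed notation
        total += (-1) ** j * A[0, j] * det_cofactor(minor)
    return total

A = [[1, 2, 3], [1, 3, 5], [1, 5, 12]]
print(det_cofactor(A))  # 3.0, matching Equation 2.5
```

Recursive expansion costs on the order of n! operations, so it is only illustrative; numerical libraries compute determinants by factorization instead.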
2.3.4.2 The minor matrix
Given an n × n matrix A, there is a corresponding matrix, M, called the matrix of minors. Recall that A_{ij} is the matrix obtained from A by deleting the ith row and jth column of A. The minor in the ith row and jth column of the minor matrix is the determinant of A_{ij}. That is, m_{ij} = |A_{ij}|. Let M be the matrix with m_{ij} in the ith row and jth column.
For example, in the 3 × 3 case:

m_{11} = \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}.

Consider the matrix A in Equation 2.5. The minors are:

m_{11} = \begin{vmatrix} 3 & 5 \\ 5 & 12 \end{vmatrix} = 36 − 25 = 11; \quad m_{12} = \begin{vmatrix} 1 & 5 \\ 1 & 12 \end{vmatrix} = 12 − 5 = 7; \quad m_{13} = \begin{vmatrix} 1 & 3 \\ 1 & 5 \end{vmatrix} = 5 − 3 = 2
m_{21} = \begin{vmatrix} 2 & 3 \\ 5 & 12 \end{vmatrix} = 24 − 15 = 9; \quad m_{22} = \begin{vmatrix} 1 & 3 \\ 1 & 12 \end{vmatrix} = 12 − 3 = 9; \quad m_{23} = \begin{vmatrix} 1 & 2 \\ 1 & 5 \end{vmatrix} = 5 − 2 = 3
m_{31} = \begin{vmatrix} 2 & 3 \\ 3 & 5 \end{vmatrix} = 10 − 9 = 1; \quad m_{32} = \begin{vmatrix} 1 & 3 \\ 1 & 5 \end{vmatrix} = 5 − 3 = 2; \quad m_{33} = \begin{vmatrix} 1 & 2 \\ 1 & 3 \end{vmatrix} = 3 − 2 = 1.
Together, A and its matrix of minors M are:

A = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 5 \\ 1 & 5 & 12 \end{pmatrix} \quad and \quad M = \begin{pmatrix} 11 & 7 & 2 \\ 9 & 9 & 3 \\ 1 & 2 & 1 \end{pmatrix}.
Finally, note that since m_{ij} = |A_{ij}|, the determinant may also be written as:

|A| = \sum_{j=1}^{n} (−1)^{i+j} a_{ij} m_{ij}. \quad (2.9)

2.3.4.3 The cofactor matrix
The cofactor matrix is derived directly from the minor matrix by changing the sign of minors in odd locations of the minor matrix. The ijth cofactor c_{ij} is defined as

c_{ij} = (−1)^{i+j} m_{ij}, \quad (2.10)

noting that (−1)^{i+j} = 1 if i + j is even and (−1)^{i+j} = −1 if i + j is odd. In words, if i + j is even, the cofactor c_{ij} equals the minor m_{ij}; and c_{ij} = −m_{ij} if i + j is odd. Let C be the matrix of cofactors.
For example, in the 3 × 3 case:

If M = \begin{pmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \\ m_{31} & m_{32} & m_{33} \end{pmatrix}, then C = \begin{pmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \\ c_{31} & c_{32} & c_{33} \end{pmatrix} = \begin{pmatrix} m_{11} & −m_{12} & m_{13} \\ −m_{21} & m_{22} & −m_{23} \\ m_{31} & −m_{32} & m_{33} \end{pmatrix}.
Continuing with the earlier example, the matrix A and its minor and cofactor matrices are given by:

A = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 5 \\ 1 & 5 & 12 \end{pmatrix}, \quad M = \begin{pmatrix} 11 & 7 & 2 \\ 9 & 9 & 3 \\ 1 & 2 & 1 \end{pmatrix}, \quad C = \begin{pmatrix} 11 & −7 & 2 \\ −9 & 9 & −3 \\ 1 & −2 & 1 \end{pmatrix}.
Note that, in view of Equations 2.9 and 2.10,

|A| = \sum_{j=1}^{n} a_{ij} c_{ij}, \quad (2.11)

and this is called cofactor expansion along the ith row. Cofactor expansion will be discussed further.
2.3.4.4 The adjoint matrix
The adjoint matrix of A is denoted adj(A) and defined as the transpose of the cofactor matrix: adj(A) = C′. Continuing with the example, the matrices are given by:

A = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 5 \\ 1 & 5 & 12 \end{pmatrix}, \quad M = \begin{pmatrix} 11 & 7 & 2 \\ 9 & 9 & 3 \\ 1 & 2 & 1 \end{pmatrix}, \quad C = \begin{pmatrix} 11 & −7 & 2 \\ −9 & 9 & −3 \\ 1 & −2 & 1 \end{pmatrix}, \quad C′ = \begin{pmatrix} 11 & −9 & 1 \\ −7 & 9 & −2 \\ 2 & −3 & 1 \end{pmatrix}.
The inverse matrix is defined directly from the adjoint matrix.
2.3.4.5 The inverse matrix

Definition 2.4: The inverse of A is denoted A^{−1} and defined:

A^{−1} = \frac{1}{|A|}\,adj(A).
Continuing with the example (recall |A| = 3):

adj(A) = \begin{pmatrix} 11 & −9 & 1 \\ −7 & 9 & −2 \\ 2 & −3 & 1 \end{pmatrix}, \quad A^{−1} = \begin{pmatrix} \frac{11}{3} & −\frac{9}{3} & \frac{1}{3} \\ −\frac{7}{3} & \frac{9}{3} & −\frac{2}{3} \\ \frac{2}{3} & −\frac{3}{3} & \frac{1}{3} \end{pmatrix}.
The inverse matrix satisfies A A^{−1} = I = A^{−1} A:

\begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 5 \\ 1 & 5 & 12 \end{pmatrix} \begin{pmatrix} \frac{11}{3} & −\frac{9}{3} & \frac{1}{3} \\ −\frac{7}{3} & \frac{9}{3} & −\frac{2}{3} \\ \frac{2}{3} & −\frac{3}{3} & \frac{1}{3} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
A property of the adjoint (which follows from the cofactor expansion discussed above) is the following:

A\,adj(A) = (adj A)A = \begin{pmatrix} |A| & 0 & \cdots & 0 \\ 0 & |A| & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & |A| \end{pmatrix} = |A| · I.

So, given a system of n equations in n unknowns, such a system can be written in the form Ax = b. Provided A has full rank (is nonsingular), we can solve for x: x = A^{−1}b.
Example 2.6: Consider the system Ax = b, where A and b are given by:

A = \begin{pmatrix} 1 & 2 & 4 & 2 \\ 3 & 5 & 9 & 2 \\ 1 & 1 & 6 & 0 \\ 4 & 1 & 3 & 2 \end{pmatrix}, \quad b = \begin{pmatrix} 5 \\ 7 \\ 1 \\ 2 \end{pmatrix},

and we require the solution x. Then

A^{−1} = \frac{1}{61}\begin{pmatrix} −162 & 35 & 6 & 47 \\ 116 & 51 & 7 & −57 \\ −33 & −10 & −13 & 22 \\ 41 & −32 & −5 & −7 \end{pmatrix}, \quad so that \quad x = A^{−1}b = \frac{1}{61}\begin{pmatrix} −328 \\ 268 \\ −51 \\ 80 \end{pmatrix}. ♦
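For systems of this kind, forming A^{−1} explicitly is rarely necessary in practice; a solver applied directly to the 3 × 3 matrix of Equation 2.5 illustrates the point. The right-hand side b below is made up for illustration, and NumPy is an assumed tool:

```python
import numpy as np

A = np.array([[1, 2, 3], [1, 3, 5], [1, 5, 12]], dtype=float)
b = np.array([6, 9, 18], dtype=float)  # hypothetical right-hand side

x_inv = np.linalg.inv(A) @ b      # x = A^{-1} b, as in the text
x_solve = np.linalg.solve(A, b)   # preferred: solve without forming A^{-1}

print(np.allclose(x_inv, x_solve))   # True
print(np.allclose(A @ x_solve, b))   # True: the solution satisfies Ax = b
```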
The next example develops the two-market model of Example 2.2 for arbitrary supply and demand parameters.
Example 2.7: A two-market model with cross-price effects (the price of butter may affect the demand for margarine; the price of tea may affect the demand for coffee) has the price of both goods enter the demand for each good. This observation leads to the market structure:
Market 1: D_1 = a_0 + a_1 p_1 + a_2 p_2, \quad S_1 = b_0 + b_1 p_1 + b_2 p_2
Market 2: D_2 = α_0 + α_1 p_1 + α_2 p_2, \quad S_2 = β_0 + β_1 p_1 + β_2 p_2.
(One might plausibly expect that the supply functions depend only on own price (b_2 = β_1 = 0), but that is unnecessary for the computations.) To find the equilibrium level of quantity and price in each market, equate supply and demand in each market:

Market 1: D_1 = S_1, or a_0 + a_1 p_1 + a_2 p_2 = b_0 + b_1 p_1 + b_2 p_2
Market 2: D_2 = S_2, or α_0 + α_1 p_1 + α_2 p_2 = β_0 + β_1 p_1 + β_2 p_2.
Rearranging these equations gives:

(a_1 − b_1) p_1 + (a_2 − b_2) p_2 = (b_0 − a_0)
(α_1 − β_1) p_1 + (α_2 − β_2) p_2 = (β_0 − α_0).

These equations can be solved by substitution. From the first equation,

p_1 = \frac{(b_0 − a_0)}{(a_1 − b_1)} − \frac{(a_2 − b_2)}{(a_1 − b_1)} p_2.

Substitute into the second equation to get:

(α_1 − β_1)\left[\frac{(b_0 − a_0)}{(a_1 − b_1)} − \frac{(a_2 − b_2)}{(a_1 − b_1)} p_2\right] + (α_2 − β_2) p_2 = (β_0 − α_0).

Collecting terms in p_2 gives:

\left[(α_2 − β_2) − (α_1 − β_1)\frac{(a_2 − b_2)}{(a_1 − b_1)}\right] p_2 = (β_0 − α_0) − (α_1 − β_1)\frac{(b_0 − a_0)}{(a_1 − b_1)}.

Multiplying both sides by (a_1 − b_1) gives:

[(α_2 − β_2)(a_1 − b_1) − (α_1 − β_1)(a_2 − b_2)] p_2 = (β_0 − α_0)(a_1 − b_1) − (α_1 − β_1)(b_0 − a_0).

Thus,

p_2 = \frac{(β_0 − α_0)(a_1 − b_1) − (α_1 − β_1)(b_0 − a_0)}{(α_2 − β_2)(a_1 − b_1) − (α_1 − β_1)(a_2 − b_2)}.
A similar calculation gives p1 .
♦
With three or more equations, the procedure in Example 2.7 becomes progressively more complicated and unmanageable. The use of matrix algebra simplifies the problem significantly, and the advantage of matrix algebra becomes more pronounced as the number of equations increases. Example 2.8 reconsiders the problem formulated in matrix terms and shows that a direct generalization to the n-variable case is straightforward.
Example 2.8: In matrix form, the system here may be written

\begin{pmatrix} (a_1 − b_1) & (a_2 − b_2) \\ (α_1 − β_1) & (α_2 − β_2) \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \end{pmatrix} = \begin{pmatrix} b_0 − a_0 \\ β_0 − α_0 \end{pmatrix}

or

\begin{pmatrix} p_1 \\ p_2 \end{pmatrix} = \begin{pmatrix} (a_1 − b_1) & (a_2 − b_2) \\ (α_1 − β_1) & (α_2 − β_2) \end{pmatrix}^{−1} \begin{pmatrix} b_0 − a_0 \\ β_0 − α_0 \end{pmatrix}.

Using the procedure above, the inverse of the coefficient matrix is:

\begin{pmatrix} (a_1 − b_1) & (a_2 − b_2) \\ (α_1 − β_1) & (α_2 − β_2) \end{pmatrix}^{−1} = \frac{1}{d} \begin{pmatrix} (α_2 − β_2) & −(a_2 − b_2) \\ −(α_1 − β_1) & (a_1 − b_1) \end{pmatrix},

where d = (a_1 − b_1)(α_2 − β_2) − (α_1 − β_1)(a_2 − b_2). Thus,

\begin{pmatrix} p_1 \\ p_2 \end{pmatrix} = \frac{1}{d} \begin{pmatrix} (α_2 − β_2) & −(a_2 − b_2) \\ −(α_1 − β_1) & (a_1 − b_1) \end{pmatrix} \begin{pmatrix} (b_0 − a_0) \\ (β_0 − α_0) \end{pmatrix} = \frac{1}{d} \begin{pmatrix} (α_2 − β_2)(b_0 − a_0) − (a_2 − b_2)(β_0 − α_0) \\ −(α_1 − β_1)(b_0 − a_0) + (a_1 − b_1)(β_0 − α_0) \end{pmatrix}.

So

p_1 = \frac{(α_2 − β_2)(b_0 − a_0) − (a_2 − b_2)(β_0 − α_0)}{(a_1 − b_1)(α_2 − β_2) − (α_1 − β_1)(a_2 − b_2)}, \quad p_2 = \frac{−(α_1 − β_1)(b_0 − a_0) + (a_1 − b_1)(β_0 − α_0)}{(a_1 − b_1)(α_2 − β_2) − (α_1 − β_1)(a_2 − b_2)}.
From this solution one can, for example, calculate the change in the equilibrium price of good 1 resulting from a change in α_0:

\frac{Δp_1}{Δα_0} = \frac{(a_2 − b_2)}{d} \left(= −\frac{Δp_1}{Δβ_0}\right). ♦
The following discussion elaborates on various aspects of determinants, cofactors and the inverse.
2.3.5 Additional remarks on determinants

The following sections observe that the calculation of determinants may be obtained by cofactor expansion along any row or column, as described next. In addition, the determinant and inverse of a matrix satisfy a number of important properties, summarized in Sections 2.3.5.2 and 2.3.5.3.
2.3.5.1 Determinants and cofactor expansion
Let

A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \quad and \quad C = \begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1n} \\ c_{21} & c_{22} & \cdots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \cdots & c_{nn} \end{pmatrix},
where C is the cofactor matrix derived from A. Recall that the determinant of A may be calculated:

|A| = a_{11} c_{11} + a_{12} c_{12} + · · · + a_{1n} c_{1n} = \sum_{j=1}^{n} a_{1j} c_{1j},
where c_{ij} is the ijth cofactor of A. From the definition, the determinant is computed by taking the first rows of both A and C, multiplying them pairwise, and summing. Put differently, the determinant of A equals the inner product of row 1 from matrix A with row 1 from matrix C. In fact, the same answer is obtained from the inner product of any matching rows of A and C. Furthermore, the determinant may also be computed by taking the inner product of matching columns: the inner product of the ith column of A and the ith column of C gives the determinant. For this reason, the determinant of A satisfies the following:

|A| = \sum_{j=1}^{n} a_{ij} c_{ij}, for any i,
    = \sum_{i=1}^{n} a_{ij} c_{ij}, for any j.

The expression \sum_{j=1}^{n} a_{ij} c_{ij} is called "row expansion" since the summation is along a row (the ith). The expression \sum_{i=1}^{n} a_{ij} c_{ij} is called "column expansion" since the summation is along a column (the jth). The following terms give the expansions associated with each row (for the matrix in Equation 2.5):
a_{11} c_{11} + a_{12} c_{12} + a_{13} c_{13} = (1)(11) + 2(−7) + 3(2) = 3
a_{21} c_{21} + a_{22} c_{22} + a_{23} c_{23} = (1)(−9) + 3(9) + (5)(−3) = 3
a_{31} c_{31} + a_{32} c_{32} + a_{33} c_{33} = (1)(1) + 5(−2) + 12(1) = 3.

Similarly, the column expansions are:

a_{11} c_{11} + a_{21} c_{21} + a_{31} c_{31} = (1)(11) + (1)(−9) + (1)(1) = 3
a_{12} c_{12} + a_{22} c_{22} + a_{32} c_{32} = (2)(−7) + 3(9) + (5)(−2) = 3
a_{13} c_{13} + a_{23} c_{23} + a_{33} c_{33} = (3)(2) + (5)(−3) + (12)(1) = 3.

So, in the 3 × 3 case, the determinant is computed by any of the expansions given above. This sometimes provides an easy way to compute a determinant.
Example 2.9: In the following example the determinant is computed most easily by cofactor expansion along the row with the most 0s:

B = \begin{pmatrix} 3 & 8 & 5 \\ 6 & 3 & 4 \\ 1 & 0 & 0 \end{pmatrix} \;\Rightarrow\; |B| = b_{31} c_{31} + b_{32} c_{32} + b_{33} c_{33} = 1 · \begin{vmatrix} 8 & 5 \\ 3 & 4 \end{vmatrix} = 32 − 15 = 17. ♦
The inner product of the ith row of the A matrix and the kth row of the cofactor matrix, with k ≠ i, or the inner product of the jth column of the A matrix and the kth column of the cofactor matrix, with k ≠ j, is called expansion by alien cofactors. These expansions have the property that the inner products are equal to zero. Thus, \sum_{j=1}^{n} a_{ij} c_{kj} = 0 if k ≠ i (expansion by ith row of A, kth row of C). Similarly, \sum_{i=1}^{n} a_{ij} c_{ik} = 0 if j ≠ k (expansion by jth column of A, kth column of C).
To illustrate with the example above:

\sum_{j=1}^{3} a_{1j} c_{2j} = a_{11} c_{21} + a_{12} c_{22} + a_{13} c_{23} = (1)(−9) + 2(9) + 3(−3) = 0
\sum_{i=1}^{3} a_{i1} c_{i2} = a_{11} c_{12} + a_{21} c_{22} + a_{31} c_{32} = (1)(−7) + (1)(9) + (1)(−2) = 0.
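Row expansion, column expansion, and expansion by alien cofactors can all be verified in a few lines; a sketch assuming NumPy, using the matrix of Equation 2.5:

```python
import numpy as np

A = np.array([[1, 2, 3], [1, 3, 5], [1, 5, 12]], dtype=float)
n = A.shape[0]
C = np.zeros((n, n))  # cofactor matrix
for i in range(n):
    for j in range(n):
        minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
        C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)

detA = np.linalg.det(A)
# Every row and column expansion gives |A|
for i in range(n):
    assert np.isclose(A[i, :] @ C[i, :], detA)   # row expansion
    assert np.isclose(A[:, i] @ C[:, i], detA)   # column expansion
# Expansion by alien cofactors gives 0
assert np.isclose(A[0, :] @ C[1, :], 0)
assert np.isclose(A[:, 0] @ C[:, 1], 0)
print("all expansions agree; |A| =", round(detA))  # |A| = 3
```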
2.3.5.2 Properties of determinants

1. |A| = |A′|. The determinant of matrix A equals the determinant of the transpose of matrix A.
2. Interchange of two rows or two columns reverses the sign of a determinant.
3. Multiplying any one row or column by a scalar k alters the determinant k-fold.
4. The addition or subtraction of a multiple of any row (column) to or from any other row (column) leaves the value of the determinant unaltered.
5. If one row or column is a multiple of another, the determinant is 0.
6. If A and B are two square matrices of order n, then |AB| = |A||B|. For example,
A = \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix}, \quad B = \begin{pmatrix} 5 & 7 \\ 6 & 8 \end{pmatrix} \;\Rightarrow\; AB = \begin{pmatrix} 23 & 31 \\ 34 & 46 \end{pmatrix},

so |A| = −2, |B| = −2 and |A||B| = (−2)(−2) = 4. Also, |AB| = 1058 − 1054 = 4.
7. In general |A + B| ≠ |A| + |B|. Any of the following can hold: (1) |A + B| = |A| + |B|; (2) |A + B| > |A| + |B|; (3) |A + B| < |A| + |B|.

A square matrix A is lower triangular if a_{ij} = 0 for i < j, and upper triangular if a_{ij} = 0 for i > j. Thus, a lower triangular matrix has the form:

A = \begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}.
From the definition, the determinant of an upper or lower triangular matrix is given by |A| = \prod_{i=1}^{n} a_{ii} = a_{11} × · · · × a_{nn}. Therefore, a triangular matrix is nonsingular if and only if all diagonal entries are nonzero.
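Property 6 (|AB| = |A||B|) and the triangular-determinant rule are easy to confirm numerically; a sketch assuming NumPy (the triangular matrix L below is made up for illustration):

```python
import numpy as np

A = np.array([[1, 3], [2, 4]], dtype=float)
B = np.array([[5, 7], [6, 8]], dtype=float)
# |AB| equals |A||B| (both are 4 here, up to floating-point rounding)
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))  # True

# Triangular matrix: determinant is the product of the diagonal entries
L = np.array([[2, 0, 0], [5, 3, 0], [1, 4, 7]], dtype=float)
print(np.isclose(np.linalg.det(L), 2 * 3 * 7))  # True
```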
Theorem 2.9: If a lower (upper) triangular matrix is invertible, its inverse is lower (upper) triangular.

Proof: Consider a lower triangular matrix L and an entry l_{ij} with i > j. The minor matrix associated with this entry is obtained by deleting the ith row and jth column of L. Deleting the jth column deletes l_{jj} on the diagonal, so the minor matrix has at entry (j, j) the element of the L matrix in position (j, j + 1), which is 0. Therefore, the minor matrix is a lower triangular matrix with a 0 on the diagonal and hence has determinant m_{ij} = 0, so the cofactor c_{ij} = 0 for i > j. The adjoint matrix, being the transpose of the cofactor matrix, therefore has its (i, j)th entry equal to 0 for i < j: the entries above the main diagonal in the adjoint matrix are 0, and so the inverse is lower triangular. A similar calculation applies for an upper triangular matrix.
Definition 2.17: A square matrix A admits an LU (lower-upper) decomposition if A may be written A = LU, where L is a lower triangular and U an upper triangular matrix.

Not all square matrices A admit an LU decomposition.

Example 2.26: Let A be a 2 × 2 matrix that admits an LU decomposition, so that A = LU:
A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = \begin{pmatrix} l_{11} & 0 \\ l_{21} & l_{22} \end{pmatrix} \begin{pmatrix} u_{11} & u_{12} \\ 0 & u_{22} \end{pmatrix} = \begin{pmatrix} l_{11}u_{11} & l_{11}u_{12} \\ l_{21}u_{11} & l_{21}u_{12} + l_{22}u_{22} \end{pmatrix}.
If A is nonsingular, then so are L and U, so that both have all diagonal entries nonzero. Therefore, a_{11} = l_{11}u_{11} ≠ 0. However, consider

A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad with \quad A^{−1} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},

so A is nonsingular with a_{11} = 0, while any LU decomposition of a nonsingular matrix requires a_{11} ≠ 0. There is no LU decomposition of A. ♦
Example 2.27: Let

A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}.
Observe that, provided a_{11} ≠ 0, subtracting \frac{a_{21}}{a_{11}} × row 1 from row 2 creates a matrix with a 0 in the lower left corner, and the second row becomes:

(a_{21}, a_{22}) − \frac{a_{21}}{a_{11}}(a_{11}, a_{12}) = \left(a_{21} − \frac{a_{21}}{a_{11}}a_{11},\; a_{22} − \frac{a_{21}}{a_{11}}a_{12}\right) = \left(0,\; a_{22} − \frac{a_{21}}{a_{11}}a_{12}\right).
A matrix transformation that achieves this while leaving row 1 unchanged is as follows:

\begin{pmatrix} 1 & 0 \\ −\frac{a_{21}}{a_{11}} & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ 0 & a_{22} − \frac{a_{21}}{a_{11}}a_{12} \end{pmatrix}.
Observe that

\begin{pmatrix} 1 & 0 \\ −\frac{a_{21}}{a_{11}} & 1 \end{pmatrix}^{−1} = \begin{pmatrix} 1 & 0 \\ \frac{a_{21}}{a_{11}} & 1 \end{pmatrix}.
So,

\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ \frac{a_{21}}{a_{11}} & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} \\ 0 & a_{22} − \frac{a_{21}}{a_{11}}a_{12} \end{pmatrix},

which expresses A in the form A = LU. ♦
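The row-operation construction above is the first step of Doolittle elimination; a small sketch assuming NumPy and no pivoting (so it presumes nonzero leading principal minors):

```python
import numpy as np

def lu_nopivot(A):
    """LU decomposition without pivoting; assumes nonzero leading principal minors."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L, U = np.eye(n), A.copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]    # multiplier, e.g. a21/a11 in the 2x2 case
            U[i, :] -= L[i, k] * U[k, :]   # subtract multiple of row k from row i
    return L, U

A = np.array([[2.0, 3.0], [4.0, 7.0]])
L, U = lu_nopivot(A)
print(np.allclose(L @ U, A))  # True: L = [[1, 0], [2, 1]], U = [[2, 3], [0, 1]]
```

Library routines (for instance SciPy's LU) add row pivoting, which also handles matrices such as the permutation matrix of Example 2.26 that have no plain LU decomposition.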
Definition 2.18: Given an n × n matrix A, a principal minor is obtained from A by removing 0 ≤ k < n rows and the corresponding columns. If A_{jj} is the matrix obtained from A by deleting the last n − j rows and columns, this matrix is called the leading principal minor of order j, for j = 1, ..., n.

Theorem 2.10: An invertible matrix admits an LU decomposition if and only if the determinants of all leading principal minors are nonzero. A non-invertible matrix may have an LU decomposition (such as a triangular matrix with some 0s on the diagonal).

Definition 2.19: A square matrix A is idempotent if A = AA.

The identity matrix is an idempotent matrix. Other examples: with a and b any numbers,
A = \begin{pmatrix} 1 & a \\ 0 & 0 \end{pmatrix} \quad or \quad B = \begin{pmatrix} 0 & a \\ 0 & 1 \end{pmatrix}
are idempotent matrices, as is

A = \begin{pmatrix} 1 − a & \frac{a(1−a)}{b} \\ b & a \end{pmatrix}, \quad since \quad AA = \begin{pmatrix} (1 − a)^2 + a(1 − a) & \frac{a(1−a)^2}{b} + \frac{a^2(1−a)}{b} \\ b(1 − a) + ab & a(1 − a) + a^2 \end{pmatrix},

with AA simplifying to equal A. In the 3 × 3 case the matrix

A = \begin{pmatrix} \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{1}{3} & \frac{5}{6} & −\frac{1}{6} \\ \frac{1}{3} & −\frac{1}{6} & \frac{5}{6} \end{pmatrix}

is idempotent.
If A is idempotent, its determinant is either 0 or 1: since A = AA,

|A| = |AA| = |A| · |A| = |A|^2,

using the rule that the determinant of the product of two square matrices is the product of the determinants. And, since |A| = |A|^2, it must be that |A| is equal to 0 or 1. If A is idempotent, so is I − A, since (I − A)(I − A) = II − IA − AI + AA = I − 2A + A = I − A.
Idempotent matrices arise naturally in statistics. The following discussion introduces a matrix that is used to compute deviations around the mean and the sum of squared deviations – the main calculation in computing the variance of sample data. Let ι be an n × 1 vector of 1s, so that ι′ = (1, 1, ..., 1). Then ι′ι = n, so that (ι′ι)^{−1} = \frac{1}{n}. Define Q \stackrel{def}{=} ι(ι′ι)^{−1}ι′:

Q = ι(ι′ι)^{−1}ι′ = ι\left(\frac{1}{n}\right)ι′ = \frac{1}{n}ιι′ = \frac{1}{n}\begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix}.
Notice that Q is an idempotent matrix. It is symmetric and, since ι′ι(ι′ι)^{−1} = 1,

QQ = ι(ι′ι)^{−1}ι′ι(ι′ι)^{−1}ι′ = ι(ι′ι)^{−1}ι′ = Q.
Given an n × 1 vector x, so that x′ = (x_1, x_2, ..., x_n):

Qx = \frac{1}{n}\begin{pmatrix} 1 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} \frac{1}{n}\sum_{i=1}^{n} x_i \\ \vdots \\ \frac{1}{n}\sum_{i=1}^{n} x_i \end{pmatrix} = \begin{pmatrix} \bar{x} \\ \vdots \\ \bar{x} \end{pmatrix}.
Thus, Qx is an n × 1 vector with the mean of the data (x_1, x_2, ..., x_n) in every position. Deviations from the mean may be computed by the matrix I − Q, where I is the n × n identity matrix:

(I − Q)x = Ix − Qx = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} − \begin{pmatrix} \bar{x} \\ \bar{x} \\ \vdots \\ \bar{x} \end{pmatrix} = \begin{pmatrix} x_1 − \bar{x} \\ x_2 − \bar{x} \\ \vdots \\ x_n − \bar{x} \end{pmatrix}.
Let D = I − Q. From the previous discussion, D is symmetric and idempotent, so that:

(Dx)′(Dx) = x′D′Dx = x′Dx = \sum_{i=1}^{n} (x_i − \bar{x})^2.
More generally, suppose that X is an n × k matrix with n ≥ k and X of rank k. Then X ( X ′ X )−1 X ′ is a symmetric idempotent matrix, as is I − X ( X ′ X )−1 X ′ .
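The matrices Q and D = I − Q can be formed directly and their properties checked; a sketch assuming NumPy (the data vector x is made up):

```python
import numpy as np

n = 5
iota = np.ones((n, 1))
Q = iota @ iota.T / n          # Q = iota (iota' iota)^{-1} iota'
D = np.eye(n) - Q              # deviations-from-the-mean matrix

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])  # hypothetical sample data
print(np.allclose(Q @ x, x.mean()))                  # True: Qx has the mean everywhere
print(np.allclose(Q @ Q, Q), np.allclose(D @ D, D))  # True True: both idempotent
# (Dx)'(Dx) equals the sum of squared deviations from the mean
print(np.isclose((D @ x) @ (D @ x), ((x - x.mean()) ** 2).sum()))  # True
```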
Given a square n × n matrix A, the terms a_{11}, a_{22}, ..., a_{nn} form the main diagonal – the entries along the diagonal from the top left corner to the bottom right corner. If a matrix has all entries that are not on the main diagonal equal to 0, the matrix is called a diagonal matrix. Given a collection of numbers d_1, d_2, ..., d_n, diag(d_1, d_2, ..., d_n) is the n × n matrix formed by arranging d_1, d_2, ..., d_n in the main diagonal and placing 0s elsewhere:
diag(d_1, d_2, ..., d_n) = \begin{pmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{pmatrix}.
Definition 2.20: The trace of a square n × n matrix A is defined as the sum of the terms on the main diagonal and is denoted tr(A). Thus:

tr(A) = \sum_{i=1}^{n} a_{ii}.
The trace operation has a number of useful properties.

• If A and B are two square n × n matrices, then tr(A + B) = tr(A) + tr(B).
• tr(A) = tr(A′).
• tr(cA) = c · tr(A), c a scalar.
• If A is idempotent, then the trace of A equals its rank: tr(A) = r(A).
• If A is m × n and B is n × m, then tr(AB) = tr(BA). This follows because:

tr(AB) = \sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij} b_{ji} = \sum_{j=1}^{n}\sum_{i=1}^{m} b_{ji} a_{ij} = tr(BA).

To see this more explicitly, let C = AB and D = BA. Then:

c_{11} = a_{11}b_{11} + a_{12}b_{21} + · · · + a_{1n}b_{n1}
c_{22} = a_{21}b_{12} + a_{22}b_{22} + · · · + a_{2n}b_{n2}
⋮
c_{mm} = a_{m1}b_{1m} + a_{m2}b_{2m} + · · · + a_{mn}b_{nm}

d_{11} = b_{11}a_{11} + b_{12}a_{21} + · · · + b_{1m}a_{m1}
d_{22} = b_{21}a_{12} + b_{22}a_{22} + · · · + b_{2m}a_{m2}
⋮
d_{nn} = b_{n1}a_{1n} + b_{n2}a_{2n} + · · · + b_{nm}a_{mn}

The diagonal sum of C = AB is \sum_{i=1}^{m} c_{ii} and the diagonal sum of D = BA is \sum_{i=1}^{n} d_{ii}. From the expressions above, \sum_{i=1}^{m} c_{ii} = \sum_{i=1}^{n} d_{ii}. The following example illustrates:
Example 2.28:

A = \begin{pmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{pmatrix}, \quad B = \begin{pmatrix} 2 & 1 \\ 5 & 7 \\ 3 & 1 \end{pmatrix}, \quad so \quad AB = \begin{pmatrix} 32 & 27 \\ 42 & 36 \end{pmatrix} \quad and \quad BA = \begin{pmatrix} 4 & 10 & 16 \\ 19 & 43 & 67 \\ 5 & 13 & 21 \end{pmatrix},

and observe that tr(AB) = 32 + 36 = 68 = 4 + 43 + 21 = tr(BA). ♦
Exercises for Chapter 2
Exercise 2.1 Consider the following three matrices:

A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 2 & 8 \\ 3 & 7 & 1 \\ 2 & 5 & 4 \end{pmatrix}, \quad C = \begin{pmatrix} 6 & 6 & 9 & 7 \\ 1 & 3 & 4 & 3 \\ 4 & 2 & 2 & 1 \\ 5 & 4 & 3 & 2 \end{pmatrix}.
For each matrix, find (a) the determinant, (b) the matrix of minors, (c) the matrix of cofactors, (d) the adjoint matrix and (e) the inverse matrix.

Exercise 2.2 Consider the following matrix:

D = \begin{pmatrix} 1 & 2 \\ 4 & 3 \\ 5 & 1 \end{pmatrix}.

What is the rank of D, D′D and DD′? Find (a) D′ (D transpose), (b) D′D, (c) (D′D)^{−1}, (d) D(D′D)^{−1}D′. Call the last matrix you have calculated E (i.e., E = D(D′D)^{−1}D′). Find ED and DE. Check that ED = D and D′E = D′ and explain why this is so.
Exercise 2.3 Given A = \begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix}, find

(a) A² = AA, (b) A³ = AAA, (c) A^{−1}, (d) 2A³ + 3A² + 5I.

Verify that (e) (A + A′)′ = A′ + A and (f) (AA′)′ = AA′.
Exercise 2.4 (a) For each of the following matrices A, B, C, D and E, find the determinant and the inverse if it exists, where

E = \begin{pmatrix} t − 2 & −3 \\ −4 & t − 1 \end{pmatrix}.
(b) Find the value of t such that the determinant of matrix E is zero.

Exercise 2.5 Solve the following systems of equations by matrix inversion or Cramer's rule:

(a) 3x + 2y = 16, x + 2y = 4
(b) 2x − 3y = 18, x + 3y = 9
(c) x + 2y = 2, 3x + 4y = 12
(d) −2x + 3y = 3, −5x + 7y = 6
(e) Q_s = −c + dP, Q_d = a − bP, Q_d = Q_s
(f) −x + 2y − 3z = 2, 2x + y = 1, 4x − 2y + 5z = 3
(g) x − y + z = 4, −x + 2y − z = 5, 2x − y + z = 2.
Exercise 2.6 Solve the following systems of equations by Cramer's rule:

(a) 2y + 4z = 8, 2x + 4y + 2z = 8, 3x + 3y + z = 4
(b) x + 2y + z = 6, −x + y + 2z = 3, x + z = 2
(c) x + y − z = 1, −2y + 4z = 1, 2x + 4y − 5z = 2.
Exercise 2.7 Consider the following:

X is an n × k matrix
e is an n × 1 vector of 1s (i.e., a column vector with n ones)
A is an n × n symmetric matrix
P = X(X′X)^{−1}X′
M = I_n − P
Q = I_n − \left(\frac{1}{n}\right)ee′
F = X(X′AX)^{−1}X′A.

(a) Determine the order of P, M, Q, F (i.e., the number of rows and columns).
(b) Determine which of these are idempotent (B is idempotent if B² = BB = B).
(c) Determine which are symmetric (B = B′).
(d) Show that for an n-vector x,

Qx = \begin{pmatrix} x_1 − \bar{x} \\ x_2 − \bar{x} \\ \vdots \\ x_n − \bar{x} \end{pmatrix}, \quad where \; \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.
Exercise 2.8 The holdings of stock of individuals I and II in companies A, B, C, D are given by the matrix:

\begin{matrix} & A & B & C & D \\ I & 2000 & 500 & 1000 & 0 \\ II & 1000 & 5000 & 2000 & 500 \end{matrix}

Denote the stock price of stock A by p_A (and similarly for B, C and D). If the vector of stock prices (per share) is (p_A, p_B, p_C, p_D) = (24, 47, 150, 14), calculate the value of the stock holdings of both individuals.
Exercise 2.9 (a) Plot the following pairs of equations:

(1) x + 3y = 9, 4x + 2y = 8
(2) 3x + 3y = 12, 4x + y = 8
(3) 4x + 3y = 4, 8x + 6y = 8
(4) 2x + 5y = 9, 4x + 10y = 10.

If possible, determine the solution of each equation pair.
(b) Write the four pairs of equations in matrix form. Thus, for example,

ax + by = α
cx + dy = γ

is written:

\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} α \\ γ \end{pmatrix}.

(c) Find the determinant of each of the four coefficient matrices in part (b).

Exercise 2.10 For the following matrices:
A = \begin{pmatrix} 1 & 4 \\ 2 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 3 & 2 \\ 1 & 6 \end{pmatrix}, \quad C = \begin{pmatrix} 1 & 3 & 1 \\ 3 & 2 & 1 \end{pmatrix}, \quad D = \begin{pmatrix} 2 & 1 \\ 4 & 4 \\ 6 & 2 \end{pmatrix},

(a) Find AB, BA, |A|, |B|, |AB| and compare |AB| with |A| × |B|.
(b) Find DC, CD, |CD| and |DC|.
(c) Find A′ and C′.
Exercise 2.11 Use Cramer's rule to solve the following two equation systems:

\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 5 \\ 6 \end{pmatrix} \quad and \quad \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \\ 2 & 3 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 2 \\ 4 \\ 6 \end{pmatrix}.

Exercise 2.12 Given:
A = \begin{pmatrix} 3 & 5 \\ 2 & 3 \end{pmatrix}, \quad B = \begin{pmatrix} 2 & 4 \\ 4 & 2 \end{pmatrix}, \quad C = \begin{pmatrix} 8 & −1 & 2 \\ −5 & 1 & 3 \\ 10 & −1 & −3 \end{pmatrix}, \quad D = \begin{pmatrix} 0 & −4 & 6 \\ 0 & 3 & −1 \\ 2 & 3 & 0 \end{pmatrix},

y = \begin{pmatrix} −7 \\ 3 \\ 2 \end{pmatrix}, \quad x = (1 \; 4 \; 7),

find (a) A + B, (b) C − D, (c) D′, (d) 5A − 6B, (e) A^{−1}, (f) C^{−1}, (g) CD, (h) DC, (i) (x′x), (j) (x′x)y, (k) y′C, (l) the inverse of D.
Exercise 2.13 In matrix notation, a linear regression equation can be written y = Xβ + u, where

y is an n × 1 vector of observations on the dependent variable,
X is an n × k matrix of observations on the exogenous variables,
β is a k × 1 vector of coefficients,
u is an n × 1 vector of error terms.

The (ordinary least squares, OLS) estimator b is defined: b = (X′X)^{−1}X′y. Suppose that X is a 4 × 2 matrix whose first column is a column of 1s and y is a corresponding 4 × 1 vector of observations.

(a) Calculate b.
(b) Calculate P = X(X′X)^{−1}X′ and M = I_n − P (where I_n is an n × n identity matrix).
(c) Show or prove that X′M = 0 and explain why this is so.

Exercise 2.14 For the matrix

D = \begin{pmatrix} 1 & 2 & 4 \\ 2 & 3 & 2 \\ 1 & 1 & 1 \end{pmatrix},

find the determinant and the matrices of minors and cofactors. Give the adjoint matrix and then the inverse matrix, D^{−1}.

Exercise 2.15 Given the matrices
A = \begin{pmatrix} 2 & 5 \\ 4 & 3 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 4 \\ 3 & 2 \end{pmatrix}, \quad C = \begin{pmatrix} 1 & 3 & 5 \\ 4 & 2 & 1 \end{pmatrix}, \quad D = \begin{pmatrix} 3 & 4 \\ 6 & 2 \\ 1 & 3 \end{pmatrix},

E = \begin{pmatrix} 4 & 3 & 5 \\ 6 & 2 & 5 \\ 4 & 6 & 9 \end{pmatrix}, \quad F = \begin{pmatrix} 2 & 3 & 5 \\ 6 & 2 & 9 \\ 4 & 6 & 2 \end{pmatrix}:

(a) Find A′, B′, B′A′ and (AB)′. Confirm that B′A′ equals (AB)′. Is it also true that D′C′ = (CD)′?
(b) Calculate the matrix H, defined H = I − D(D′D)^{−1}D′. Confirm that H′ = H, HD = 0 and that HH = H. Let the matrix J be defined: J = I − C′(CC′)^{−1}C. Show that JC′ = 0.
(c) Find the rank of A, E, F, CC′ and DD′. Also find the determinants of A, B, E and F.
(d) Using Cramer's rule, solve:

\begin{pmatrix} 2 & 5 \\ 4 & 3 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 4 \\ 7 \end{pmatrix}, \quad \begin{pmatrix} 4 & 3 & 5 \\ 6 & 2 & 5 \\ 4 & 6 & 9 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 2 \\ 4 \\ 6 \end{pmatrix}.

Note that the coefficient matrices are A and E.

Exercise 2.16 Let
A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}, \quad B = \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1k} \\ b_{21} & b_{22} & \cdots & b_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nk} \end{pmatrix}.
Thus, A is an m × n matrix and B an n × k matrix. Show that (A′)′ = A (the transpose of A′ is A), and that (AB)′ = B′A′. For the second part, the (j, i)th element of (AB)′ is the (i, j)th element of AB. Thus, show that the (i, j)th element of AB is equal to the (j, i)th element of B′A′.
3 Linear algebra: applications

3.1 Introduction

This chapter discusses applications of matrix algebra to illustrate techniques developed earlier. Section 3.2 reviews the basic supply and demand model with two goods. Sections 3.4 to 3.7 consider various applications, illustrating techniques such as the use of matrix polynomials and convergence of matrix expansions. These techniques have widespread use (for example, in the study of dynamic systems) and are developed further in later chapters.
3.2 The market model
The two-market model has demand and supply in each market given by:

Market 1: D_1 = a_0 + a_1 p_1 + a_2 p_2, \quad S_1 = b_0 + b_1 p_1 + b_2 p_2
Market 2: D_2 = α_0 + α_1 p_1 + α_2 p_2, \quad S_2 = β_0 + β_1 p_1 + β_2 p_2.
Equating supply and demand in each market gives the system of equations determining equilibrium prices:

\begin{pmatrix} (a_1 − b_1) & (a_2 − b_2) \\ (α_1 − β_1) & (α_2 − β_2) \end{pmatrix}\begin{pmatrix} p_1 \\ p_2 \end{pmatrix} = \begin{pmatrix} (b_0 − a_0) \\ (β_0 − α_0) \end{pmatrix}.

Thus,

\begin{pmatrix} p_1 \\ p_2 \end{pmatrix} = \begin{pmatrix} (a_1 − b_1) & (a_2 − b_2) \\ (α_1 − β_1) & (α_2 − β_2) \end{pmatrix}^{−1}\begin{pmatrix} (b_0 − a_0) \\ (β_0 − α_0) \end{pmatrix}.

This is of the form discussed in the previous chapter with

A = \begin{pmatrix} (a_1 − b_1) & (a_2 − b_2) \\ (α_1 − β_1) & (α_2 − β_2) \end{pmatrix}, \quad x = \begin{pmatrix} p_1 \\ p_2 \end{pmatrix}, \quad b = \begin{pmatrix} (b_0 − a_0) \\ (β_0 − α_0) \end{pmatrix}.
Example 3.1: Consider the following two-market model:

Market 1: D_1 = 6 + 5p_1 + 4p_2, \quad S_1 = 12 + 2p_1 + 8p_2
Market 2: D_2 = 5 + 4p_1 + 7p_2, \quad S_2 = 9 + 6p_1 + 4p_2

To find the equilibrium prices, equate supply and demand. In the first market this gives:

6 + 5p_1 + 4p_2 = 12 + 2p_1 + 8p_2 \;\Leftrightarrow\; 3p_1 − 4p_2 = 6.

Similarly, for the second market:

5 + 4p_1 + 7p_2 = 9 + 6p_1 + 4p_2 \;\Leftrightarrow\; −2p_1 + 3p_2 = 4.

This gives the equation system (in matrix form):

\begin{pmatrix} 3 & −4 \\ −2 & 3 \end{pmatrix}\begin{pmatrix} p_1 \\ p_2 \end{pmatrix} = \begin{pmatrix} 6 \\ 4 \end{pmatrix},

so that

\begin{pmatrix} p_1 \\ p_2 \end{pmatrix} = \begin{pmatrix} 3 & −4 \\ −2 & 3 \end{pmatrix}^{−1}\begin{pmatrix} 6 \\ 4 \end{pmatrix} = \begin{pmatrix} 3 & 4 \\ 2 & 3 \end{pmatrix}\begin{pmatrix} 6 \\ 4 \end{pmatrix} = \begin{pmatrix} 34 \\ 24 \end{pmatrix}.

The example generalizes directly to the case of many markets. ♦
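Example 3.1 can be replicated in a few lines (NumPy assumed):

```python
import numpy as np

# Equating demand and supply gives: 3 p1 - 4 p2 = 6 and -2 p1 + 3 p2 = 4
A = np.array([[3.0, -4.0], [-2.0, 3.0]])
b = np.array([6.0, 4.0])

p = np.linalg.solve(A, b)
print(np.allclose(p, [34, 24]))  # True: equilibrium prices (p1, p2) = (34, 24)
```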
3.3 Adjacency matrices

A network of connections can be expressed as an adjacency matrix that summarizes connections between different nodes in a network. If there is a direct link from node i to node j, write m_{ij} = 1. If, in addition, there is a direct link back from j to i, then also m_{ji} = 1. The network of links is directed in the sense that a link from i to j does not necessarily imply a link from j to i. The adjacency matrix is:

M = \begin{pmatrix} m_{11} & m_{12} & \cdots & m_{1n} \\ m_{21} & m_{22} & \cdots & m_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ m_{n1} & m_{n2} & \cdots & m_{nn} \end{pmatrix}.
Just as M identifies those pairs ( i, j) such that j can be reached from i in one step (m i j = 1), it is easy to identify those nodes connected by exactly two steps.
Consider M² = MM. The entry in the (i, j)th position of M² is

m^2_{ij} = \sum_{k=1}^{n} m_{ik} m_{kj}.
Note that m_{ik}m_{kj} = 1 if and only if there is a link from i to k and from k to j. Thus, m^2_{ij} is the number of distinct ways to go from i to j in two steps (using exactly two links).

This insight carries over to steps of greater length. Consider M³ = MM². The element in the (i, j)th position is m^3_{ij} = \sum_{r=1}^{n} m_{ir} m^2_{rj}. If there is a path from i to r, then m_{ir} = 1 and there are m^2_{rj} two-step paths (or m_{ir}m^2_{rj} paths, since m_{ir} = 1) from i through r to j. If there is no path from i to r, then m_{ir} = 0 and there are m_{ir}m^2_{rj} = 0 paths from i through r to j. In total, there are m_{ir}m^2_{rj} paths of length 3 from i to j that pass through r. Summing over all r, there are \sum_{r=1}^{n} m_{ir}m^2_{rj} paths of length 3 that go from i to j. Thus, the entry m^3_{ij} in M³ gives the number of paths of length 3 that go from i to j. If the matrix M is multiplied k times, the same reasoning implies that the (i, j)th entry m^k_{ij} gives the number of paths of length k going from i to j. Define the sum:
S_k = M + M^2 + M^3 + · · · + M^k = \sum_{j=1}^{k} M^j.
The element s^k_{ij} in the (i, j)th position gives the number of paths of length k or less from i to j.

Example 3.2: Consider the three-node network with links 1 → 2, 2 → 3 and 3 → 1, with adjacency matrix and powers:

M = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}, \quad M^2 = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \quad M^3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.

The matrices M, M², M³ give the nodes reachable in one, two and three steps from any initial node.

S_1 = M, \quad S_2 = M + M^2 = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}, \quad S_3 = M + M^2 + M^3 = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix},
CHAPTER 3. LINEAR ALGEBRA: APPLICATIONS
80
so, for example, considering row 1 of S2 , there is one way to go from node 1 to node 2 using one or two steps. But there is no way to go from node 1 (back) to node 1 in one or two steps. From S3 , there is one way to go from
♦
any node to any other in three or fewer steps.
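The path counts in Example 3.2 are easy to verify numerically. A minimal sketch (assuming the three-node cycle 1 → 2 → 3 → 1 described in the example), using NumPy:

```python
import numpy as np

# Adjacency matrix of the three-node cycle 1 -> 2 -> 3 -> 1.
M = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])

M2 = M @ M        # entry (i, j) counts two-step paths from node i+1 to node j+1
M3 = M2 @ M       # three-step paths; for the 3-cycle this is the identity matrix
S2 = M + M2       # paths of length 1 or 2
S3 = M + M2 + M3  # paths of length 3 or less: every entry equals 1

print(S2)
print(S3)
```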
The next example illustrates similar points with a larger set of nodes.
EXAMPLE 3.3: In this example, the network consists of six nodes; the corresponding adjacency matrix is shown alongside the network diagram.

[Figure: a directed network on six nodes, with its 6 × 6 adjacency matrix M; the matrix M² and the sum S₆ = Σ_{j=1}^6 M^j are then computed.]
Thus, for example, from M² there is exactly one path of length 2 from node 1 to node 5, (1 → 2 → 5), indicated by m²_15 = 1. Similarly, there are exactly two paths of length 2 from node 2 to node 1, (2 → 3 → 1) and (2 → 5 → 1), indicated by m²_21 = 2.

Considering S₆, the (i, j)th entry s⁶_ij gives the number of paths with 6 or fewer steps from i to j. So, for example, s⁶_24 = 11 indicates that there are 11 paths from 2 to 4 with 6 steps or less. ♦
REMARK 3.1: One issue concerns the possibility of moving from one node to another. The main observation in this regard is that, with n nodes, if it is not possible to reach node j from node i in n steps, then it is impossible to reach j from i in any number of steps. This is easy to see. Let W_k(i) ⊆ {1, 2, ..., n} be the set of nodes reachable from i in k or fewer steps, and write #W_k(i) for the number of nodes in W_k(i). Then W_k(i) ⊆ W_{k+1}(i): if a node l is in W_k(i), it is also in W_{k+1}(i). This follows from the observation that if it is possible to reach j from i in k or fewer steps, it is certainly possible to reach j in no more than k + 1 steps. Next, if W_k(i) = W_{k+1}(i), then the same set of nodes is reached in k steps as in k + 1 steps – additional steps reach no extra nodes – and then W_{k+j}(i) = W_k(i) for j = 1, 2, .... So, at each iteration in k, either extra nodes are reached or the process ceases to add nodes. Suppose that W_n(i) ≠ {1, 2, ..., n}. Then for some k < n it must be that W_k(i) = W_{k+1}(i); otherwise the set W_k(i) would have increased at each round and, after n steps, would contain all n nodes. But W_k(i) = W_{k+1}(i) implies that W_k(i) = W_{k+j}(i) for all j ≥ 1. At j = n − k, W_k(i) = W_n(i), and for any r ≥ 0, with j = n − k + r, W_k(i) = W_{n+r}(i). So, for any r, W_k(i) = W_n(i) = W_{n+r}(i): if node l is not in W_n(i), it can never be reached from i. Note that this includes i itself: if i is not in W_n(i), there is no way to return to i.
The possibility of moving from one node to any other is closely related to the concept of irreducibility, which arises in the study of dynamic processes.

Definition 3.1: A matrix A is irreducible if and only if, for any i, j, there is a sequence i_1, i_2, ..., i_k with i_1 = i, i_k = j and a_{i_l i_{l+1}} ≠ 0 for l = 1, ..., k − 1.
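Remark 3.1 translates directly into a computational test: with n nodes, j is reachable from i exactly when the (i, j) entry of S_n = M + M² + ··· + Mⁿ is positive, and M is irreducible when every entry of S_n is positive. A minimal sketch (the four-node network below is a hypothetical example, chosen only for illustration):

```python
import numpy as np

def reachability(M):
    """Return S_n = M + M^2 + ... + M^n for an n-node adjacency matrix M.
    Entry (i, j) is positive exactly when j is reachable from i."""
    n = M.shape[0]
    S = np.zeros_like(M)
    P = np.eye(n, dtype=M.dtype)
    for _ in range(n):
        P = P @ M
        S = S + P
    return S

# Hypothetical network: the cycle 1 -> 2 -> 3 -> 1, plus an edge 4 -> 1.
M = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0]])
S = reachability(M)
print(S > 0)   # column 4 is all False: no node can reach node 4, so M is not irreducible
```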
3.4 Input–output analysis

In input–output analysis, the economy is viewed as a collection of n interlinked industries. The output of an industry is used for final consumption and in the production of other goods. For example, steel is used to produce trucks and transportation equipment, roads and bridges, and so on. Moreover, industries may be interlinked in complex ways: trucks are produced using steel but also transport the iron ore used to make steel, giving rise to "circular" relationships in the production process – achieving a certain level of steel output implies an appropriate level of transportation capacity. Apart from output going from one industry to others, some steel is "consumed" directly – for example, people buy garden tools, kitchen utensils and so on. So, the total output of steel is allocated between that used in
production in other industries (such as the transportation and housing industries) – an intermediate or "inter-industry" demand – and consumption or final demand. Planning to achieve some target level of consumption or final demand indirectly implies production levels in each industry. Input–output analysis is concerned with determining the industry production levels required to sustain or achieve target levels across the entire range of final demands. A simple model is developed next to illustrate the procedure.

Let a_ij = amount of i required to produce 1 unit of j, x_i = output of i and f_i = final demand for i. Thus, if x_j is the amount of good j produced, a_ij x_j is the amount of i required in the production of j. (If one kilo of good j is produced, a_ij is the fraction of a kilo of i used in the production of that kilo of j.) Let all measurements be in dollars. (So, for example, a_ij is the number of dollars of good i required to produce one dollar in value of good j.) The output of industry i is allocated to production in other industries (the a_ij x_j) and final demand (f_i). Thus,

    x_i = a_i1 x_1 + a_i2 x_2 + ··· + a_in x_n + f_i,   i = 1, ..., n.

This can be rearranged:

    −a_i1 x_1 − a_i2 x_2 − ··· + (1 − a_ii)x_i − ··· − a_in x_n = f_i,   i = 1, ..., n,
or, in matrix notation:

    | (1 − a_11)    −a_12     ···     −a_1n    | | x_1 |   | f_1 |
    |   −a_21     (1 − a_22)  ···     −a_2n    | | x_2 |   | f_2 |
    |    ...         ...       ·       ...     | | ... | = | ... |
    |   −a_n1       −a_n2     ···  (1 − a_nn)  | | x_n |   | f_n |

This expression can be written succinctly as (I − A)x = f, where the matrices are:

        | 1  0  ···  0 |        | a_11  a_12  ···  a_1n |            | 1 − a_11    −a_12    ···     −a_1n    |
    I = | 0  1  ···  0 |,   A = | a_21  a_22  ···  a_2n |,   I − A = |  −a_21    1 − a_22   ···     −a_2n    |
        | ...  ...  ...|        |  ...   ...    ·   ...  |            |   ...        ...      ·       ...     |
        | 0  0  ···  1 |        | a_n1  a_n2  ···  a_nn |            |  −a_n1      −a_n2    ···   1 − a_nn   |
EXAMPLE 3.4: Let

        | 0.7  0.4  0.3 |
    A = | 0.1  0.2  0.3 |
        | 0.1  0.3  0.2 |

The matrix A corresponds to the system:

    x_1 = 0.7x_1 + 0.4x_2 + 0.3x_3 + f_1
    x_2 = 0.1x_1 + 0.2x_2 + 0.3x_3 + f_2
    x_3 = 0.1x_1 + 0.3x_2 + 0.2x_3 + f_3.

So, producing one dollar of x_1 requires 70 cents of x_1 as input, 10 cents of x_2 and 10 cents of x_3. Thus (I − A)x = f is:

    |  0.3  −0.4  −0.3 | | x_1 |   | f_1 |
    | −0.1   0.8  −0.3 | | x_2 | = | f_2 |.
    | −0.1  −0.3   0.8 | | x_3 |   | f_3 |

For a fixed value of f, the unique solution to these equations is x = (I − A)⁻¹f. For example,

        | 90 |        | x_1 |   |  0.3  −0.4  −0.3 |⁻¹ | 90 |   | 1239.20 |
    f = | 75 |   ⇒   | x_2 | = | −0.1   0.8  −0.3 |   | 75 | = |  400.57 |.
        | 80 |        | x_3 |   | −0.1  −0.3   0.8 |   | 80 |   |  405.11 |

Thus, if the economy must meet the level of final demand given by the f vector, the required level of output in the industries is given by the x vector, x′ = (1239.20, 400.57, 405.11).
♦
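Example 3.4's figures can be checked by solving the linear system numerically (a sketch; solving (I − A)x = f directly is preferable to forming the inverse):

```python
import numpy as np

A = np.array([[0.7, 0.4, 0.3],
              [0.1, 0.2, 0.3],
              [0.1, 0.3, 0.2]])
f = np.array([90.0, 75.0, 80.0])

# Required industry outputs: the solution of (I - A) x = f.
x = np.linalg.solve(np.eye(3) - A, f)
print(np.round(x, 2))   # approximately (1239.20, 400.57, 405.11)
```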
Provided ( I − A ) is nonsingular, the industry output vector is x = ( I − A )−1 f .
However, not only must x satisfy x = ( I − A )−1 f , it must also be the case that x
is nonnegative for it to be a feasible level of production; negative output cannot be produced. So, it must be that x i ≥ 0 for all i . Whether x is nonnegative or not
depends on the A matrix.
The following discussion shows that a sufficient condition for a nonnegative solution to x = (I − A)⁻¹f is that the sum of the elements in each column of A is less than 1. In this case, it turns out that all entries in the matrix (I − A)⁻¹ are nonnegative, so that (I − A)⁻¹f is nonnegative, since f is nonnegative. When all column sums of A are less than 1, Aᵏ, the kth power of A, approaches a matrix of 0s as k becomes large, and this fact is sufficient to ensure that (I − A)⁻¹ has all nonnegative entries. The next example illustrates this property; the key result is then developed.
EXAMPLE 3.5: To illustrate, consider the following input–output coefficient matrix A and final demand vector f:

    A = | a_11  a_12 | = | 0.2  0.5 |,   f = | f_1 | = | 30 |.
        | a_21  a_22 |   | 0.3  0.1 |        | f_2 |   | 60 |

Then

    I − A = | 1 − a_11   −a_12  | = |  0.8  −0.5 |,   so   (I − A)⁻¹ = | 30/19  50/57 |,
            |  −a_21   1 − a_22 |   | −0.3   0.9 |                    | 10/19  80/57 |

which implies

    | x_1 | = | 30/19  50/57 | | 30 | = | 100 |.
    | x_2 |   | 10/19  80/57 | | 60 |   | 100 |

Notice that the solution is positive – both x_1 and x_2 are positive. This is not accidental: it is guaranteed by the fact that the column sums of the A matrix are both less than 1. With a view to motivating the later discussion, consider multiplying A by itself multiple times: A² = AA, A³ = AAA and A⁴ = AAAA. Note that the column sums of the A matrix are 0.5 = 0.2 + 0.3 (first column) and 0.6 = 0.5 + 0.1 (second column), and that the larger of these two numbers is 0.6. Observe now that

    A² = | 19/100  3/20 |,   A³ = | 83/1000  11/100  |,   A⁴ = | 31/625   21/400    |,
         |  9/100  4/25 |        | 33/500   61/1000 |         | 63/2000  391/10000 |

and these matrices have the following column sums. The column sums of A² are (28/100, 31/100) = (0.28, 0.31); the column sums of A³ are (149/1000, 171/1000) = (0.149, 0.171); and the column sums of A⁴ are (811/10000, 916/10000) = (0.0811, 0.0916). Finally, note that (a) the largest column sum of A² is no larger than (0.6)² = 0.36, (b) the largest column sum of A³ is no larger than (0.6)³ = 0.216 and (c) the largest column sum of A⁴ is no larger than (0.6)⁴ = 0.1296. There is a good reason for this pattern, and it has one particularly useful implication – that the solution, x, is positive. These matters are discussed next and relate to the nonnegativity of the solution to the input–output problem.
♦
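A quick numerical check of Example 3.5: the solution is positive, and the column sums of the powers Aᵏ are bounded by rᵏ, where r = 0.6 is the largest column sum of A:

```python
import numpy as np

A = np.array([[0.2, 0.5],
              [0.3, 0.1]])
f = np.array([30.0, 60.0])

x = np.linalg.solve(np.eye(2) - A, f)
print(x)   # both components positive (here (100, 100))

r = A.sum(axis=0).max()   # largest column sum: 0.6
P = A.copy()
for k in range(2, 5):
    P = P @ A   # P is now A^k
    print(k, P.sum(axis=0), r ** k)   # column sums of A^k never exceed r^k
```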
Before proving the result (that Aᵏ → 0 when all column sums of A are less than 1), a simple formula for calculating certain sums of scalars will provide useful insight. This formula is then generalized to matrices. Let α be a real number (α ∈ R) with |α| < 1. Then:

    (1 − α)(1 + α + α² + α³ + ··· + αⁿ) = (1 + α + α² + ··· + αⁿ) − (α + α² + ··· + αⁿ⁺¹) = 1 − αⁿ⁺¹.

Thus,

    1 + α + α² + ··· + αⁿ = (1 − αⁿ⁺¹)/(1 − α) = (1 − α)⁻¹(1 − αⁿ⁺¹).

Since |α| < 1, αⁿ⁺¹ → 0 as n becomes large. Consequently,

    lim_{n→∞} (1 + α + α² + ··· + αⁿ) = lim_{n→∞} (1 − α)⁻¹(1 − αⁿ⁺¹) = (1 − α)⁻¹.

Thus, because αⁿ → 0 as n → ∞, 1 + α + α² + ··· + αⁿ + ··· = 1/(1 − α).
This discussion applies to real numbers, but there is an analogous result for matrices. Let A² = AA, A³ = AAA and so on. Then,

    (I − A)(I + A + A² + A³ + ··· + Aⁿ) = (I + A + A² + ··· + Aⁿ) − (A + A² + ··· + Aⁿ⁺¹) = I − Aⁿ⁺¹.

Thus,

    I + A + A² + ··· + Aⁿ = (I − A)⁻¹(I − Aⁿ⁺¹).

These calculations show that, provided Aⁿ → 0 as n → ∞, (I + A + A² + A³ + ··· + Aⁿ) → (I − A)⁻¹. Thus, I + A + A² + ··· + Aⁿ + ··· = (I − A)⁻¹ if Aⁿ → 0 as n becomes large.
Theorem 3.1: A sufficient condition for Aⁿ → 0 is that the sum of the elements in every column of A is less than one: Σ_i a_ij < 1 for all j.

PROOF: If this condition holds, then there is some number r less than 1 that is at least as large as every column sum: ∃ r < 1 with Σ_i a_ij ≤ r < 1, ∀ j. Denote the (i, j)th element of A² by a⁽²⁾_ij, the ith row of A by a_i·, and the jth column of A by a_·j. Then a⁽²⁾_ij = a_i· a_·j = Σ_k a_ik a_kj. Thus,

    Σ_i a⁽²⁾_ij = Σ_i Σ_k a_ik a_kj = Σ_k (Σ_i a_ik) a_kj ≤ r Σ_k a_kj ≤ r²,

using the fact that Σ_i a_ik ≤ r. This implies that a⁽²⁾_ij ≤ r². Proceeding in this way gives a⁽ⁿ⁾_ij ≤ rⁿ, and since r < 1, a⁽ⁿ⁾_ij → 0 for all (i, j).
To see these calculations more clearly, write out A²:

         | Σ_k a_1k a_k1   Σ_k a_1k a_k2   ···   Σ_k a_1k a_kn |
    A² = | Σ_k a_2k a_k1   Σ_k a_2k a_k2   ···   Σ_k a_2k a_kn |
         |      ...             ...         ·         ...      |
         | Σ_k a_nk a_k1   Σ_k a_nk a_k2   ···   Σ_k a_nk a_kn |

Consider the first column of A². Writing out all the terms:

    a_11 a_11 + a_12 a_21 + ··· + a_1n a_n1
    a_21 a_11 + a_22 a_21 + ··· + a_2n a_n1
        ...
    a_n1 a_11 + a_n2 a_21 + ··· + a_nn a_n1.

Adding all of these together gives

    a_11 (a_11 + a_21 + ··· + a_n1) + a_21 (a_12 + a_22 + ··· + a_n2) + ··· + a_n1 (a_1n + a_2n + ··· + a_nn),

where the bracketed terms are, respectively, the column 1 sum, the column 2 sum, ..., the column n sum of A. Since each column sum is no larger than r, this sum is no larger than a_11 r + a_21 r + ··· + a_n1 r = (a_11 + a_21 + ··· + a_n1) r, which for the same reason is no larger than r². The same reasoning applies to the other columns of A².
Returning to the discussion: lim_{n→∞} Aⁿ⁺¹ = 0 when the column sums of A are all less than 1. Thus (I + A + A² + ···) = (I − A)⁻¹ and so x = (I − A)⁻¹f = (I + A + A² + ···)f. Since all the terms in the sum f + Af + A²f + A³f + ··· = (I + A + A² + ···)f are nonnegative, x is nonnegative.
EXAMPLE 3.6: This example illustrates how the polynomial expansion approximates the inverse. Let

    A  = | 0.2   0.35 |,   A² = | 0.1275  0.175  |,   A³ = | 0.0693  0.0971 |,   A⁴ = | 0.0381  0.0534 |,
         | 0.25  0.3  |        | 0.125   0.1775 |         | 0.0694  0.0970 |        | 0.0381  0.0534 |

    A⁵ = | 0.0210  0.0294 |,   A⁶ = | 0.0115  0.0161 |,   A⁷ = | 0.0063  0.0089 |,   A⁸ = | 0.0035  0.0049 |.
         | 0.0210  0.0294 |        | 0.0115  0.0161 |         | 0.0063  0.0089 |        | 0.0035  0.0049 |

So,

    I + A + A² + A³ + A⁴ + A⁵ + A⁶ + A⁷ + A⁸ = | 1.4772  0.7348 |
                                               | 0.5248  1.6872 |

and

    (I − A)⁻¹ = | 1.4815  0.7407 |.
                | 0.5291  1.6931 |

Thus,

    (I − A)⁻¹ − (I + A + A² + A³ + A⁴ + A⁵ + A⁶ + A⁷ + A⁸) = | 0.0043  0.0059 |,
                                                            | 0.0043  0.0060 |

which is very small. So I + A + A² + ··· + A⁸ approximates (I − A)⁻¹ accurately to the first two decimal places.
♦
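Example 3.6's partial sums are easily reproduced; a sketch accumulating the series I + A + A² + ··· + A⁸ and comparing it with the inverse:

```python
import numpy as np

A = np.array([[0.2, 0.35],
              [0.25, 0.3]])

S = np.eye(2)   # running partial sum, starting from I
P = np.eye(2)   # running power of A
for _ in range(8):
    P = P @ A
    S = S + P   # after the loop, S = I + A + A^2 + ... + A^8

inv = np.linalg.inv(np.eye(2) - A)
print(np.round(S, 4))
print(np.round(inv, 4))
print(np.round(inv - S, 4))   # the remainder of the series: small
```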
REMARK 3.2: Given f ≥ 0, the key requirement for x ≥ 0 is that (I − A)⁻¹ has all nonnegative entries. This condition is also necessary: if the (i, j)th element of (I − A)⁻¹ is negative, then the final consumption vector f with 1 in the jth position (and 0 elsewhere) produces an x vector with a negative entry in the ith position. Consider the condition: ∃ x* ≥ 0 with x* ≫ Ax*, where a ≫ b means that a_i > b_i for all i. Note that this implies that x* ≫ 0. If x* ≫ Aᵏx* for every k, the elements of Aᵏx* are bounded, so Aᵏx* (or some subsequence) converges to a vector z with x* ≥ z ≥ 0. Since x* ≫ Ax*, there is a λ < 1 with λx* ≫ Ax*. Because all entries of x* and Ax* are nonnegative, A(λx*) ≥ A(Ax*), or λAx* ≥ A²x* (with λAx* ≫ A²x* if each row of A is strictly positive). And, since λx* ≫ Ax*, λ²x* ≫ λAx*. Therefore λ²x* ≫ A²x*. Multiplying this expression by A, with the same observations, gives λ³x* ≫ A³x*. After k iterations, λᵏx* ≫ Aᵏx*, and since λᵏx* → 0, Aᵏx* → 0. Because x* ≫ 0, it follows that Aᵏ → 0.

Finally, suppose that the row sums of A are all less than 1. Let x* be a vector of 1s, so that Ax* is the vector of row sums of A. Since the row sums are all less than 1, x* ≫ Ax*, and the previous discussion implies that Aᵏ → 0.
This may be seen directly. Recall that in A² the first row consists of the terms {Σ_k a_1k a_kj}, j = 1, ..., n. Expanding these:

    Σ_k a_1k a_k1 = a_11 a_11 + a_12 a_21 + a_13 a_31 + ··· + a_1n a_n1
    Σ_k a_1k a_k2 = a_11 a_12 + a_12 a_22 + a_13 a_32 + ··· + a_1n a_n2
        ...
    Σ_k a_1k a_kn = a_11 a_1n + a_12 a_2n + a_13 a_3n + ··· + a_1n a_nn.

Adding these terms to obtain the first row sum of A²:

    a_11 (a_11 + a_12 + ··· + a_1n) + a_12 (a_21 + a_22 + ··· + a_2n) + ··· + a_1n (a_n1 + a_n2 + ··· + a_nn).

Letting d_i be the ith row sum of A, this is equal to a_11 d_1 + a_12 d_2 + ··· + a_1n d_n. And letting d = max_i d_i < 1, the sum is no larger than (a_11 + a_12 + ··· + a_1n)d, which in turn is no larger than d². Thus, the row sums of Aᵏ are no larger than dᵏ, which converges to 0, so that Aᵏ → 0.
3.5 Inflation and unemployment dynamics

Earlier calculations using a model of the Phillips curve and Okun's law led to the equation system:

    π_t = (4/5)π_{t−1} − (2/5)u_{t−1} + [(1/5)g + (2/5)ū]
    u_t = (2/5)π_{t−1} + (4/5)u_{t−1} + [−(2/5)g + (1/5)ū].

Rewriting in matrix form,

    | π_t |   | 4/5  −2/5 | | π_{t−1} |   |  (1/5)g + (2/5)ū |
    | u_t | = | 2/5   4/5 | | u_{t−1} | + | −(2/5)g + (1/5)ū |.

We can write this system succinctly as v_t = Av_{t−1} + δ. At a steady state, v_t = v_{t−1} = v*, and v* = Av* + δ, so that (I − A)v* = δ or v* = (I − A)⁻¹δ. Here, (I − A)v* = δ is

    |  1/5  2/5 | | π* |   |  (1/5)g + (2/5)ū |
    | −2/5  1/5 | | u* | = | −(2/5)g + (1/5)ū |

or

    (1/5)π* + (2/5)u* = (1/5)g + (2/5)ū
    −(2/5)π* + (1/5)u* = −(2/5)g + (1/5)ū,

with solution π* = g and u* = ū.

Returning to the equation v_t = Av_{t−1} + δ, whether v_t converges or not depends on the matrix A. Substituting for v_{t−1},

    v_t = A[Av_{t−2} + δ] + δ = A²v_{t−2} + Aδ + δ.

Continuing in this way, v_t = Aᵗv_0 + [I + A + A² + ··· + Aᵗ⁻¹]δ. If Aᵗ → 0, then [I + A + A² + ··· + Aᵗ⁻¹] → (I − A)⁻¹ and v_t → v* = (I − A)⁻¹δ. (The convergence of Aᵗ is discussed in Chapter 15, which deals with eigenvalues.) In this case, starting from any point (π_0, u_0), the system converges over time to (π*, u*) = (g, ū).
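The convergence of v_t = Av_{t−1} + δ can be simulated directly. A sketch with illustrative values g = 2 and ū = 5 (these numbers are assumptions, chosen only for the demonstration):

```python
import numpy as np

A = np.array([[0.8, -0.4],
              [0.4,  0.8]])
g, u_bar = 2.0, 5.0   # hypothetical values of g and u-bar, for illustration
delta = np.array([0.2 * g + 0.4 * u_bar,
                  -0.4 * g + 0.2 * u_bar])

v = np.array([10.0, 1.0])   # arbitrary starting point (pi_0, u_0)
for _ in range(200):
    v = A @ v + delta       # v_t = A v_{t-1} + delta

print(np.round(v, 6))       # converges to the steady state (g, u_bar)
```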
3.6 Stationary (invariant) distributions

The notion of a stationary distribution is most easily motivated by an example. Suppose that a worker currently employed has a 95% probability of being employed next period, and a worker currently unemployed has a 25% probability of being employed next period. Let E denote "employed" and U denote "unemployed", and write P(U|E) for the probability of being unemployed next period given that one is currently employed. With this notation, P(E|E) = 0.95, P(U|E) = 0.05, P(E|U) = 0.25 and P(U|U) = 0.75. This defines a transition matrix or Markov matrix, P:

    P = | P(E|E)  P(U|E) | = | 0.95  0.05 |.
        | P(E|U)  P(U|U) |   | 0.25  0.75 |

We are interested in the evolution of employment and unemployment in the workforce. Suppose that initially (currently) the fraction π_E of the workforce is employed and the fraction π_U is unemployed. Of the workers currently employed, the fraction P(E|E) will continue to be employed next period – so the probability of being currently employed and continuing to be employed next period is π_E P(E|E). Of the workers currently unemployed, the fraction P(E|U) will find employment next period – so the probability of being currently unemployed and finding employment next period is π_U P(E|U). The fraction of the workforce employed next period, π̂_E, is equal to the fraction of those currently employed who retain employment, π_E P(E|E), plus the fraction of those currently unemployed who
gain employment, π_U P(E|U):

    π̂_E = π_E P(E|E) + π_U P(E|U).

Similarly, the fraction of the workforce unemployed next period is π̂_U = π_E P(U|E) + π_U P(U|U). For example, suppose that π_E = 0.7 and π_U = 0.3. Then π̂_E = (0.7)(0.95) + (0.3)(0.25) = 0.74 and π̂_U = (0.7)(0.05) + (0.3)(0.75) = 0.26. Notice that:

    (π̂_E, π̂_U) = (π_E, π_U) | P(E|E)  P(U|E) |.
                             | P(E|U)  P(U|U) |

Writing π = (π_E, π_U), this gives π̂ = πP. Because this process proceeds over time, let π(t) = (π_E(t), π_U(t)) denote the distribution of states (employment and unemployment) in period t, so that the evolution of the process proceeds according to the formula π(t + 1) = π(t)P. Proceeding step by step, if π(0) denotes the current distribution of employed and unemployed, next period the distribution is determined by π(1) = π(0)P. Repeating this reasoning, in the third period the distribution is determined by π(2) = π(1)P = π(0)PP = π(0)P². In the (t + 1)th period, this becomes π(t) = π(0)Pᵗ.

In the case where the matrix P has all positive entries, Pᵗ converges to a matrix P* where, in any given column, all entries are the same or, equivalently, all rows are identical (see Section 3.6.1). In the two-state case,

    P* = | α  β |,
         | α  β |

where α and β are positive and sum to 1 (α + β = 1). Notice that

    π(0)P* = (π_E(0), π_U(0)) | α  β | = (α, β).
                              | α  β |

The important point to observe is that π(0)P* = (α, β) regardless of the particular value of π(0). Write π* = π(0)P* to denote this distribution, the long-run stationary or invariant distribution. Another way of viewing this is to observe that π* is the only distribution that satisfies the equation π = πP*: π* = π*P*, and π̂ ≠ π* implies that π̂ ≠ π̂P*, since π̂P* = π*.
The following calculations illustrate these observations using the example:

    P   = | 0.95  0.05 |,   P²  = | 0.915  0.085 |,   P³ = | 0.8905  0.1095 |,
          | 0.25  0.75 |         | 0.425  0.575 |         | 0.5475  0.4525 |

    P¹⁰ = | 0.8380  0.1620 |,   P²⁰ = | 0.8335  0.1665 |,   P⁵⁰ = P¹⁰⁰ = | 0.8333  0.1667 |.
          | 0.8098  0.1902 |         | 0.8327  0.1673 |                 | 0.8333  0.1667 |
If the initial distribution is π(0) = (0.7, 0.3), then

    π(1)   = π(0)P              = (0.740, 0.260),
    π(2)   = π(1)P  = π(0)P²    = (0.7680, 0.2320),
    π(3)   = π(2)P  = π(0)P³    = (0.7876, 0.2124),
    π(10)  = π(9)P  = π(0)P¹⁰   = (0.8296, 0.1704),
    π(50)  = π(49)P = π(0)P⁵⁰   = (0.8333, 0.1667),
    π(100) = π(99)P = π(0)P¹⁰⁰  = (0.8333, 0.1667).

Therefore, regardless of the initial employment–unemployment distribution, the long-run distribution is π* = (0.8333, 0.1667) = (5/6, 1/6). The fraction of the workforce employed in the long run is 5/6.
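The long-run calculation can be replicated by iterating π(t + 1) = π(t)P; a minimal sketch:

```python
import numpy as np

P = np.array([[0.95, 0.05],
              [0.25, 0.75]])

pi = np.array([0.7, 0.3])   # initial distribution (pi_E, pi_U)
for _ in range(100):
    pi = pi @ P             # pi(t + 1) = pi(t) P

print(np.round(pi, 4))      # approaches the invariant distribution (5/6, 1/6)
```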
Although this discussion has centered on the two-state case (the states being employed and unemployed), the principles apply generally. If there are n states, labeled s_1, ..., s_n, with probability p_ij of going from state s_i to state s_j, then the corresponding transition matrix is:

        | p_11  p_12  ···  p_1n |
    P = | p_21  p_22  ···  p_2n |
        |  ...   ...   ·    ... |
        | p_n1  p_n2  ···  p_nn |

where each row sums to 1: for each i, Σ_{j=1}^n p_ij = 1. If p_ij > 0 for all (i, j), then Pᵗ converges to a matrix P* as t becomes large, where P* has all rows equal or, equivalently, the numbers in each column are the same. If π* is the invariant distribution of P, so that π* = π*P, then

         | π* |
    P* = | ...|,
         | π* |

a matrix with n rows, each equal to the invariant distribution. This raises the question: why does this convergence take place?
3.6.1 Convergence to a stationary distribution

Let P be a transition matrix with representative entry p_ij: p_ij is the probability that state j will be reached next period, given that the current state is i. Let M_j be the largest number in column j, and m_j the smallest number in column j. The matrix P² = PP has entry p⁽²⁾_ij = Σ_k p_ik p_kj in the (i, j)th position. Say that P is strictly positive if there is some (small) strictly positive number ǫ such that p_ij ≥ ǫ for every i, j.

Theorem 3.2: Suppose that P is strictly positive. Then Pᵗ converges to a matrix P* where all rows of P* are identical.

PROOF: Consider column j of P. Since p_kj ≥ m_j for all k, and p_{k*j} = M_j for some k, say k*, we see that

    p⁽²⁾_ij = Σ_k p_ik p_kj = p_{ik*} p_{k*j} + Σ_{k≠k*} p_ik p_kj = p_{ik*} M_j + Σ_{k≠k*} p_ik p_kj.

Every element in column j is at least as large as m_j (p_kj ≥ m_j), so that

    Σ_{k≠k*} p_ik p_kj ≥ Σ_{k≠k*} p_ik m_j = m_j Σ_{k≠k*} p_ik = m_j (1 − p_{ik*}).

(In this calculation, because each row sums to 1, Σ_{k≠k*} p_ik + p_{ik*} = 1, so Σ_{k≠k*} p_ik = 1 − p_{ik*}.) Therefore,

    p⁽²⁾_ij = Σ_k p_ik p_kj ≥ p_{ik*} M_j + (1 − p_{ik*}) m_j ≥ ǫ M_j + (1 − ǫ) m_j = m_j + ǫ(M_j − m_j),

where the second inequality follows because p_{ik*} ≥ ǫ. Exactly the same reasoning shows that:

    p⁽²⁾_ij = Σ_k p_ik p_kj ≤ (1 − ǫ) M_j + ǫ m_j = M_j − ǫ(M_j − m_j).

Therefore, every entry of the jth column of P² is at least as large as m_j + ǫ(M_j − m_j), and no entry of the jth column of P² is larger than M_j − ǫ(M_j − m_j). This shows that the elements in the columns of P² are "closer" together than the elements in the columns of P. If M²_j and m²_j are the largest and smallest entries in column j of P², then

    m_j + ǫ(M_j − m_j) ≤ m²_j ≤ M²_j ≤ M_j − ǫ(M_j − m_j),

so

    M²_j − m²_j ≤ {M_j − ǫ(M_j − m_j)} − {m_j + ǫ(M_j − m_j)} = (1 − 2ǫ)(M_j − m_j).

The same reasoning shows that, with Mᵗ_j and mᵗ_j the largest and smallest entries in column j of Pᵗ,

    Mᵗ⁺¹_j − mᵗ⁺¹_j ≤ (1 − 2ǫ)ᵗ (M_j − m_j) → 0 as t becomes large.

The intuition is simple. In the jth column of P², entry (i, j) is p⁽²⁾_ij = Σ_k p_ik p_kj. Therefore, p⁽²⁾_ij is a weighted average of the elements of column j of P. Since all weights are strictly positive, this weighted average must be smaller than the largest entry in column j and larger than the smallest entry in this column (assuming not all entries of column j are equal). It is worth remarking that the assumption that every entry of P is positive gives a sufficient condition for Pᵗ to converge to a matrix P* with all rows equal. But it is not necessary – weaker conditions on P are sufficient to ensure convergence, but they are more complex to describe.
3.6.2 Computation of the invariant distribution

Finding the long-run stationary distribution would be tedious if computed by calculating Pᵗ for large values of t. There is an easier way to find the stationary distribution. Let I be the identity matrix, and define Q = I − P. Let c_ii be the cofactor of the ith diagonal entry of Q, and let c = Σ_{i=1}^n c_ii. Then π* = (π*_1, π*_2, ..., π*_n) is given by

    π*_i = c_ii / c,   i = 1, ..., n.

Applying this to the example discussed earlier,

    Q = I − P = | 1 − 0.95    −0.05   | = |  0.05  −0.05 |.
                |  −0.25    1 − 0.75  |   | −0.25   0.25 |

Thus, c_11 = 0.25, c_22 = 0.05 and c = 0.25 + 0.05 = 0.3, so that π*_1 = 0.25/0.3 = 5/6 and π*_2 = 0.05/0.3 = 1/6.
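The cofactor recipe extends to any n-state chain; a sketch computing each diagonal cofactor of Q = I − P as the determinant of the corresponding minor:

```python
import numpy as np

def invariant_distribution(P):
    """Stationary distribution via the diagonal cofactors of Q = I - P.
    The (i, i) cofactor is the determinant of Q with row i and column i
    deleted (the sign factor (-1)^(i+i) is +1)."""
    n = P.shape[0]
    Q = np.eye(n) - P
    cof = np.array([np.linalg.det(np.delete(np.delete(Q, i, 0), i, 1))
                    for i in range(n)])
    return cof / cof.sum()

P = np.array([[0.95, 0.05],
              [0.25, 0.75]])
print(invariant_distribution(P))   # (5/6, 1/6), matching the example
```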
EXAMPLE 3.7: Suppose that the following statistics for a baseball player are known. When batting, there is a 3/5 chance that he will hit the ball (on a swing), given that he hit the ball on his most recent swing. Write this as P(H|H) = 3/5. Thus, the probability that he will miss on a swing, given that he hit the ball on his most recent swing, is P(M|H) = 2/5. The probabilities of hit and miss, given that he missed on the most recent swing, are P(H|M) = 1/5 and P(M|M) = 4/5. The transition matrix is then:

    T = | P(H|H)  P(M|H) | = | 3/5  2/5 |.
        | P(H|M)  P(M|M) |   | 1/5  4/5 |

What is the (unique) stationary or invariant distribution of this transition matrix? On average (over the season), how often does the batter miss the ball on a swing? First:

    I − T = | 1 − 3/5    −2/5  | = |  2/5  −2/5 |.
            |  −1/5    1 − 4/5 |   | −1/5   1/5 |

The cofactors of I − T are c_11 = 1/5 and c_22 = 2/5, so that c = c_11 + c_22 = 3/5. The invariant distribution is (1/3, 2/3): on average, the batter misses on 2/3 of his swings. ♦
EXAMPLE 3.8: Students never sleep – they spend all their time in the restaurant (R) or the library (L). At any given time, a student in the library has a 75% chance of being in the library an hour later (and a 25% chance of being in the restaurant). A student in the restaurant has an 80% chance of being in the restaurant an hour later (and a 20% chance of being in the library). These probabilities determine the following transition matrix:

    P = | P(L|L)  P(R|L) | = | 3/4  1/4 |.
        | P(L|R)  P(R|R) |   | 1/5  4/5 |

On "average", what percentage of the student population is in the library (i.e., find the stationary distribution of the transition matrix)? In this case,

    Q = I − P = |  1/4  −1/4 |.
                | −1/5   1/5 |

Thus, the cofactors are c_11 = 1/5 and c_22 = 1/4, so that c = c_11 + c_22 = 9/20. Therefore π_1 = 4/9 and π_2 = 5/9. So, on average, 4/9 or 44.4% of students are in the library. ♦
3.7 Econometrics

A common problem in statistics and economics is the measurement of the relationship between different variables. Suppose there is a collection of observations on a variable y_t, (y_1, y_2, ..., y_T), and on a second variable x_t, (x_1, x_2, ..., x_T). Suppose also that these variables are linearly related. The discussion here describes a statistical procedure for measuring this relationship.
When the data sets consist of just two points, (y_1, y_2) and (x_1, x_2), these may be arranged in pairs, (x_1, y_1) and (x_2, y_2), and placed on a graph as in Figure 3.1(A).

[Figure 3.1: Fitting a line. Panels A and B plot the (x_i, y_i) pairs, with the third point marked ⋆ and further points marked ◦.]

Because there are only two points, one can draw a straight line through both points as indicated. The slope is given by b = (y_2 − y_1)/(x_2 − x_1). Since the equation satisfies y_i = a + bx_i, the intercept is given by a = y_1 − bx_1 = y_2 − bx_2. Thus, the relationship between y and x is fully described by the intercept and slope, (a, b). Suppose now there is a third point (x_3, y_3), located at ⋆ on the graph. This point does not lie on the line, and it is not possible to draw a straight line through all three points. And one could have still more points – such as those indicated by ◦. In the graph, it seems reasonable to say that x and y are positively related – when x increases, so does y. Also, the relationship appears "linear". From a different perspective, one can imagine starting with the data as in Figure 3.1(B) and trying to draw a straight line as close to all the points as possible.

This problem arises in econometric theory, where there is a presumed relationship between the y and x variables of the form:

    y_i = β_0 + β_1 x_i + ε_i.

If the x variable takes on the value x_i, the corresponding y variable is determined by multiplying x_i by a constant β_1, and adding an intercept β_0 and a random error term ε_i. The theory is that y is determined linearly from x, plus a small random shock. Given a finite collection of observations, (y_1, x_1), ..., (y_n, x_n), how should one estimate β_0 and β_1? Suppose b_0 and b_1 are the estimates of β_0 and β_1. Then the ith observation has a gap between y_i and its estimated value b_0 + b_1 x_i (according to the estimated line with intercept b_0 and slope b_1) equal to

    e_i = y_i − b_0 − b_1 x_i.
Ordinary least squares requires minimizing the sum of squares of these errors by choice of b_0 and b_1:

    S = Σ_{i=1}^n (y_i − b_0 − b_1 x_i)².        (3.1)

In matrix form it is easy to describe (b_0, b_1). Define the matrices X and y:

        | 1  x_1 |        | y_1 |
        | 1  x_2 |        | y_2 |
    X = | .   .  |,   y = |  .  |
        | 1  x_n |        | y_n |

It turns out that the choice of b_0 and b_1 that minimizes the sum of squared errors in Equation 3.1 is given by the following expression:

    | b_0 |
    | b_1 | = b = (X′X)⁻¹X′y.
EXAMPLE 3.9: Suppose that there are five observations on (x, y) pairs: x = (3, 4, 6, 2, 5) and y = (7, 9, 12, 5, 11). Thus:

        | 1  3 |        |  7 |
        | 1  4 |        |  9 |
    X = | 1  6 |,   y = | 12 |
        | 1  2 |        |  5 |
        | 1  5 |        | 11 |

and

    X′X = | 5   20 |,   (X′X)⁻¹ = |  9/5   −2/5 |,   X′y = |  44 |.
          | 20  90 |              | −2/5   1/10 |          | 194 |

Therefore,

    b = | b_0 | = (X′X)⁻¹X′y = |  9/5   −2/5 | |  44 | = | 8/5 |.
        | b_1 |                | −2/5   1/10 | | 194 |   | 9/5 |

♦

REMARK 3.3: This calculation assumes that (X′X)⁻¹ exists – that X′X has an inverse. What if (X′X)⁻¹ does not exist? If r(X′X) < 2, then since
♦ R EMARK 3.3: This calculation assumes that ( X ′ X )−1 exists – that X ′ X has an inverse. What if ( X ′ X )−1 does not exist? If r( X ′ X ) < 2 then since
3.7. ECONOMETRICS
97
r( X ′ X ) = min{r( X ′ ), r( X )} and r( X ′ ) = r( X ), r( X ) < 2. This implies that the
columns of X are linearly dependent so that ( x1 ,... , x n ) = α(1,... , 1), with α 6= 0. P P P Thus, x i = α, ∀ i and so x i = nα, x¯ = 1n x i = α, so x¯ = x i , ∀ i and ( x i − x¯ )2 = 0. In
this case, b1 cannot be computed. Thus, a sufficient condition for the estimator
to exist is that x 6= α(1,... , 1) for any α. The general case with many explanatory variables is similar.
    y_i = β_0 + β_1 x_1i + β_2 x_2i + ··· + β_k x_ki + ǫ_i.

In vector notation,

    y = Xβ + ǫ,

where y is n × 1, X is n × (k + 1) and ǫ is n × 1. The least squares estimator is, as before,

    b = (X′X)⁻¹X′y,

where X′X is (k + 1) × (k + 1). For (X′X)⁻¹ to exist, we require r(X′X) = k + 1. Since r(X′X) = r(X) ≤ min(k + 1, n), this requires in particular that n ≥ k + 1: the number of parameters to be estimated, k + 1, can be no greater than the number of observations.

The following discussion clarifies some details of the computation of the least squares estimator; it requires knowledge of calculus.
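Both the rank condition and the estimator itself can be checked numerically on the data of Example 3.9; a sketch:

```python
import numpy as np

x = np.array([3.0, 4.0, 6.0, 2.0, 5.0])
y = np.array([7.0, 9.0, 12.0, 5.0, 11.0])

X = np.column_stack([np.ones_like(x), x])
print(np.linalg.matrix_rank(X.T @ X))   # 2: (X'X)^(-1) exists

b = np.linalg.solve(X.T @ X, X.T @ y)   # solves X'X b = X'y
print(b)                                # (8/5, 9/5), as in Example 3.9

# If the regressor is constant, the columns of X are linearly dependent
# and X'X is singular (rank 1), so the estimator cannot be computed.
Xc = np.column_stack([np.ones_like(x), np.full_like(x, 2.0)])
print(np.linalg.matrix_rank(Xc.T @ Xc))   # 1
```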
3.7.1 Derivation of the least squares estimator

The sum of squared errors is given by:

    S = Σ_{i=1}^n e_i² = Σ_{i=1}^n (y_i − b_0 − b_1 x_i)²
                       = Σ_{i=1}^n (y_i² − 2b_0 y_i − 2b_1 x_i y_i + b_0² + 2b_0 b_1 x_i + b_1² x_i²).

Minimization of this expression by choice of b_0 and b_1 requires calculus – discussed in a later chapter. To minimize the expression, differentiate S with respect to b_0 and b_1 in turn and set the partial derivatives to 0. This gives:

    ∂S/∂b_0 = −2 Σ_{i=1}^n y_i + 2nb_0 + 2b_1 Σ_{i=1}^n x_i = 0
    ∂S/∂b_1 = −2 Σ_{i=1}^n x_i y_i + 2b_0 Σ_{i=1}^n x_i + 2b_1 Σ_{i=1}^n x_i² = 0.
CHAPTER 3. LINEAR ALGEBRA: APPLICATIONS
98
Dividing both equations by 2 and rearranging:

$$\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} b_0 + b_1 \sum_{i=1}^{n} x_i = nb_0 + b_1 \sum_{i=1}^{n} x_i$$
$$\sum_{i=1}^{n} x_i y_i = b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2.$$

So, the least squares estimators $b_0$ and $b_1$ satisfy the pair of equations:

$$\sum_{i=1}^{n} y_i = nb_0 + b_1 \sum_{i=1}^{n} x_i, \qquad \sum_{i=1}^{n} x_i y_i = b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2,$$

or, in matrix form:

$$\begin{pmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i y_i \end{pmatrix} = \begin{pmatrix} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{pmatrix} \begin{pmatrix} b_0 \\ b_1 \end{pmatrix}.$$
Then

$$X'X = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{pmatrix} \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix} = \begin{pmatrix} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{pmatrix}$$

and

$$X'y = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i y_i \end{pmatrix}.$$

So, the least squares estimators $b' = (b_0, b_1)$ satisfy the equations:

$$X'Xb = X'y.$$

Rearranging, on the assumption that $(X'X)$ is nonsingular,

$$b = (X'X)^{-1}X'y.$$

To compute $(X'X)^{-1}$ we need the determinant, minors, cofactors and adjoint. The determinant is

$$|X'X| = n\sum x_i^2 - \left(\sum x_i\right)^2 = n\sum x_i^2 - n^2\bar{x}^2 = n\left(\sum x_i^2 - n\bar{x}^2\right).$$
Minors:

$$\begin{pmatrix} \sum x_i^2 & \sum x_i \\ \sum x_i & n \end{pmatrix}$$

Cofactors:

$$\begin{pmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{pmatrix}$$

Adjoint:

$$\begin{pmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{pmatrix}$$

Thus,

$$(X'X)^{-1} = \frac{1}{n\left(\sum x_i^2 - n\bar{x}^2\right)} \begin{pmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{pmatrix}, \qquad X'y = \begin{pmatrix} 1 & \cdots & 1 \\ x_1 & \cdots & x_n \end{pmatrix} \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} \sum y_i \\ \sum x_i y_i \end{pmatrix}.$$

Thus,

$$(X'X)^{-1}X'y = \frac{1}{n\left(\sum x_i^2 - n\bar{x}^2\right)} \begin{pmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{pmatrix} \begin{pmatrix} \sum y_i \\ \sum x_i y_i \end{pmatrix} = \frac{1}{n\left(\sum x_i^2 - n\bar{x}^2\right)} \begin{pmatrix} (\sum x_i^2)(\sum y_i) - (\sum x_i)(\sum x_i y_i) \\ -(\sum x_i)(\sum y_i) + n(\sum x_i y_i) \end{pmatrix}.$$

So

$$b_0 = \frac{(\sum x_i^2)(n\bar{y}) - (n\bar{x})(\sum x_i y_i)}{n\left(\sum x_i^2 - n\bar{x}^2\right)} = \frac{(\sum x_i^2)\bar{y} - \bar{x}\sum x_i y_i}{\sum x_i^2 - n\bar{x}^2}$$
$$b_1 = \frac{n\sum x_i y_i - n^2\bar{x}\bar{y}}{n\left(\sum x_i^2 - n\bar{x}^2\right)} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}.$$

Now,

$$\left(\sum x_i^2\right)\bar{y} - \bar{x}\sum x_i y_i = \left(\sum x_i^2\right)\bar{y} - n\bar{x}^2\bar{y} - \bar{x}\left(\sum x_i y_i - n\bar{x}\bar{y}\right) = \bar{y}\left(\sum x_i^2 - n\bar{x}^2\right) - \bar{x}\left(\sum x_i y_i - n\bar{x}\bar{y}\right),$$

so

$$b_0 = \frac{(\sum x_i^2)\bar{y} - \bar{x}\sum x_i y_i}{\sum x_i^2 - n\bar{x}^2} = \bar{y} - \bar{x}b_1.$$

Thus, without the use of matrices, we can summarize:

$$b_0 = \bar{y} - \bar{x}b_1, \qquad b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}.$$
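The matrix-free summary can be evaluated directly on the data of Example 3.9 as a check (a hedged sketch; the variable names are illustrative):

```python
# Least squares without matrices, using the closed-form summary above
# on the data of Example 3.9.
x = [3, 4, 6, 2, 5]
y = [7, 9, 12, 5, 11]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n

# b1 = sum (x - xbar)(y - ybar) / sum (x - xbar)^2,  b0 = ybar - xbar * b1
sxy_dev = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx_dev = sum((xi - xbar) ** 2 for xi in x)
b1 = sxy_dev / sxx_dev
b0 = ybar - xbar * b1

print(b0, b1)   # approximately 1.6 and 1.8, i.e. 8/5 and 9/5 as before
```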
Exercises for Chapter 3

Exercise 3.1: For the (supply-and-demand) system of equations given below, find the equilibrium prices:

$$2p_1 + p_2 + 6p_3 = 26$$
$$p_1 + p_2 + p_3 = 12$$
$$3p_1 + 2p_2 - 2p_3 = 2.$$

Exercise 3.2: Given

$$A = \begin{pmatrix} 0.7 & 0.4 & 0.3 \\ 0.1 & 0.3 & 0.3 \\ 0.1 & 0.2 & 0.2 \end{pmatrix}, \qquad f = \begin{pmatrix} f_1 \\ f_2 \\ f_3 \end{pmatrix} = \begin{pmatrix} 90 \\ 75 \\ 80 \end{pmatrix},$$

find the vector of interindustry outputs.

Exercise 3.3: For the input–output model with technology matrix $A = \begin{pmatrix} 0.2 & 0.3 \\ 0.2 & 0.25 \end{pmatrix}$, suppose that the final demand vector is $f = \begin{pmatrix} 350 \\ 250 \end{pmatrix}$.

(a) Find the appropriate industry output levels (the $x$ vector).
(b) Find $A^2 (= AA)$, $A^3 (= AAA)$, $A^4 (= AAAA)$ and $A^5 (= AAAAA)$. Then calculate the matrix $G = (I + A + A^2 + A^3 + A^4 + A^5)$ and compute $Gf$.
(c) Observe that $G$ is close to $(I - A)^{-1}$ and your answer in (a) is close to $Gf$.
Exercise 3.4: Given the input–output model with technology matrix

$$A = \begin{pmatrix} 0.4 & 0.0 & 0.2 \\ 0.2 & 0.1 & 0.0 \\ 0.2 & 0.2 & 0.3 \end{pmatrix}$$

and final demand $f = \begin{pmatrix} 30 \\ 40 \\ 50 \end{pmatrix}$, find the equilibrium industry outputs.
Exercise 3.5: A simple economy consists of two sectors: agriculture ($A$) and transportation ($T$). The input–output matrix for this economy (rows and columns ordered $A$, $T$) is given by

$$\begin{pmatrix} 0.4 & 0.1 \\ 0.2 & 0.2 \end{pmatrix}.$$

(a) Find the gross output of agricultural products needed to satisfy a consumer demand for $50 million worth of agriculture products and $10 million worth of transportation.
(b) Find the value of agriculture products and transportation consumed in the internal process of production in order to meet the gross output.
Exercise 3.6: Given the input–output matrix

$$A = \begin{pmatrix} 0.2 & 0.2 & 0.1 \\ 0.2 & 0.1 & 0.2 \\ 0.1 & 0.4 & 0.3 \end{pmatrix}$$

and final demand $f = \begin{pmatrix} 100 \\ 80 \\ 50 \end{pmatrix}$, compute the interindustry flows necessary to sustain this final demand.
Exercise 3.7: Given an input–output matrix $A$, if $x$ is the vector of industry outputs and $f$ the vector of final demands, then $x$ and $f$ satisfy $x = Ax + f$. In the case where

$$A = \begin{pmatrix} \frac{3}{12} & \frac{4}{12} & \frac{6}{12} \\ \frac{1}{12} & 0 & \frac{2}{12} \\ 0 & 0 & \frac{4}{12} \end{pmatrix}, \qquad f = \begin{pmatrix} 20 \\ 30 \\ 40 \end{pmatrix},$$

find the solution value of $x$.

Suppose that each column sum of $A$ is less than 1: for each $j$, $\sum_{i=1}^{n} a_{ij} < 1$. Show that this implies that $(I - A)^{-1}$ exists and equals $(I + A + A^2 + A^3 + \cdots) = I + \sum_{t=1}^{\infty} A^t$. Explain why this implies that for a given final demand $f$ (a vector with nonnegative entries), the solution value of $x$ is nonnegative.
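The series claim in Exercise 3.7 can be illustrated numerically (an illustration, not a proof). The sketch below uses the $2 \times 2$ technology matrix of Exercise 3.3, whose column sums 0.4 and 0.55 are both below 1, and compares the truncated series applied to $f$ with the exact solution of $(I - A)x = f$:

```python
# When every column sum of A is below 1, (I - A)^{-1} = I + A + A^2 + ...
# Here A is the 2x2 technology matrix of Exercise 3.3.
A = [[0.2, 0.3],
     [0.2, 0.25]]
f = [350, 250]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matvec(P, v):
    return [sum(P[i][k] * v[k] for k in range(2)) for i in range(2)]

# Partial sum G = I + A + ... + A^50.
G = [[1.0, 0.0], [0.0, 1.0]]   # running sum, starts at I
P = [[1.0, 0.0], [0.0, 1.0]]   # running power A^t
for _ in range(50):
    P = matmul(P, A)
    G = [[G[i][j] + P[i][j] for j in range(2)] for i in range(2)]

x_series = matvec(G, f)

# Exact solution of (I - A) x = f by Cramer's rule, for comparison.
a, b_, c, d = 1 - A[0][0], -A[0][1], -A[1][0], 1 - A[1][1]
det = a * d - b_ * c
x_exact = [(d * f[0] - b_ * f[1]) / det, (a * f[1] - c * f[0]) / det]

print(x_series, x_exact)   # both are (625, 500) to many decimal places
```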
Exercise 3.8: Consider the transition matrix:

$$T = \begin{pmatrix} \frac{1}{2} & \frac{1}{4} & \frac{1}{4} \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{1}{4} & \frac{1}{2} & \frac{1}{4} \end{pmatrix}.$$

(a) Find the unique stationary distribution (using the cofactors) of $T$, i.e., find the $1 \times 3$ vector $q$ satisfying $qT = q$.
(b) Find $T^2$ and compare with $T$.
(c) Find the determinant of $T$. Is $T$ invertible? Why?

Exercise 3.9: Let

$$x = \begin{pmatrix} 2 \\ 3 \\ 9 \\ 3 \\ 6 \\ 11 \\ 4 \end{pmatrix}, \qquad y = \begin{pmatrix} 7 \\ 12 \\ 26 \\ 13 \\ 18 \\ 36 \\ 13 \end{pmatrix}.$$

Find the least squares estimator.
4 Linear programming

4.1 Introduction

Linear programming is an optimization technique for finding optimal solutions to problems where the objective and constraints are linear. In what follows, the general linear programming problem and methods of solution are described, along with various features of this class of problem. A linear program is characterized by a linear objective, linear constraints and restrictions on the signs of choice variables. Linearity of the constraint set leads to a particular geometric structure on the set of possible choices. For low-dimensional programs this permits a graphical treatment of the problem and, more generally, the geometric structure leads naturally to an algorithmic method for finding solutions. Section 4.1 develops the basic features of a linear program. This consists of formulation of the programming problem, description of key issues in solving these programs and the valuation of resource constraints through the dual prices. A broader understanding of linear programming requires knowledge of basic solutions, which are discussed in Section 4.2. Here the geometric view described in Section 4.1 is reinterpreted in terms of equation systems. Section 4.3 describes classic results in linear programming where the original program (the primal) is related to a second program (the dual). Before developing the discussion, a simple example will illustrate how these programs arise.
EXAMPLE 4.1: A firm produces two goods, 1 and 2. Production requires the application of two techniques: processing and finishing. A unit of good 1 requires two hours of processing time and four hours of finishing time; a unit of good 2 requires three hours of processing and three hours of finishing time. The total amount of time that the processing machine can run for each day is 13 hours, while the finishing machine can run for 17 hours. Each unit of good 1 sells for $3 and each unit of good 2 sells for $4. The problem is to decide how much of each good should be produced to maximize revenue – a problem that may be formulated as a linear program.

Let $x$ denote the number of units of good 1 produced, and let $y$ denote the number of units of good 2 produced. Then the revenue is equal to $3x + 4y$, since each unit of 1 generates $3 of revenue and each unit of 2 generates $4 of revenue. This defines the linear objective. Next, $x$ units of good 1 require $2x$ hours of processing time, while $y$ units of good 2 require $3y$ hours of processing time: altogether, producing $x$ units of 1 and $y$ units of 2 requires $2x + 3y$ hours of processing time. Similarly, producing $x$ units of 1 and $y$ units of 2 requires $4x + 3y$ hours of finishing time. The total amount of processing time is 13 hours, so the production plan must satisfy $2x + 3y \le 13$, the first linear constraint, and likewise $4x + 3y \le 17$ defines the finishing constraint. Finally, negative amounts cannot be produced: $x, y \ge 0$. Putting all these conditions together, the full problem may be summarized:

$$\max 3x + 4y$$
subject to:
$$2x + 3y \le 13 \quad \text{(processing)}$$
$$4x + 3y \le 17 \quad \text{(finishing)}$$
$$x, y \ge 0. \qquad \diamond$$
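The formulation in Example 4.1 can be written directly as data $(c, A, b)$. The following sketch (illustrative, not part of the text) checks candidate production plans against the constraints:

```python
# Example 4.1 as data: maximize c.(x, y) subject to A(x, y) <= b, x, y >= 0.
c = [3, 4]                      # revenue per unit of goods 1 and 2
A = [[2, 3],                    # processing hours used per unit
     [4, 3]]                    # finishing hours used per unit
b = [13, 17]                    # machine hours available per day

def feasible(plan):
    return all(v >= 0 for v in plan) and all(
        sum(a * v for a, v in zip(row, plan)) <= bi
        for row, bi in zip(A, b))

def revenue(plan):
    return sum(ci * v for ci, v in zip(c, plan))

print(feasible([2, 3]), revenue([2, 3]))   # True 18
print(feasible([3, 3]))                    # False: finishing needs 21 > 17
```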
4.1.1 Formulation

In a linear-programming problem a collection of variables $(x_1, x_2, \dots, x_n)$ must be chosen to maximize (or minimize) a linear function,

$$f(x_1, x_2, \dots, x_n) = c_1 x_1 + c_2 x_2 + \cdots + c_n x_n,$$

subject to a finite number of linear constraints of the form:

$$a_1 x_1 + a_2 x_2 + \cdots + a_n x_n \le b, \qquad a_1 x_1 + a_2 x_2 + \cdots + a_n x_n \ge b, \qquad \text{or} \qquad a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = b.$$

In addition, the $x_i$ variables are required to be chosen nonnegative. A typical linear program may be written:

$$\max c_1 x_1 + c_2 x_2 + \cdots + c_n x_n$$
subject to:
$$a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n \le b_1$$
$$a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n \le b_2$$
$$\vdots$$
$$a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n \le b_m$$
$$x_1, x_2, \dots, x_n \ge 0.$$

This is a linear program with $n$ variables and $m$ constraints. If we write $c = (c_1, c_2, \dots, c_n)$,

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}, \qquad b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}, \qquad x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix},$$

then the program may be written in matrix form:

$$\max c \cdot x \quad \text{subject to:} \quad Ax \le b, \quad x \ge 0. \tag{4.1}$$

In parallel with the maximization problem, there is a minimization formulation:

$$\min c_1 x_1 + c_2 x_2 + \cdots + c_n x_n$$
subject to:
$$a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n \ge b_1$$
$$a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n \ge b_2$$
$$\vdots$$
$$a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n \ge b_m$$
$$x_1, x_2, \dots, x_n \ge 0.$$

In matrix form:

$$\min c \cdot x \quad \text{subject to:} \quad Ax \ge b, \quad x \ge 0. \tag{4.2}$$
4.1.1.1 Mixed constraints and unrestricted variable signs

The formulations for maximization and minimization given in Equations 4.1 and 4.2 have a standardized form with nonnegativity constraints. In general, maximization and minimization problems can be cast in this way. The following discussion illustrates. Consider the program:

$$\max c_1 x_1 + c_2 x_2$$
subject to:
$$a_{11} x_1 + a_{12} x_2 \le b_1$$
$$a_{21} x_1 + a_{22} x_2 \ge b_2$$
$$x_1 \ge 0.$$

In this program, both types of inequality appear, and only $x_1$ is required to be nonnegative. Replacing $x_2$ with $y - z$ and requiring $y$ and $z$ to be nonnegative overcomes the second issue – picking $y = 0$ and $z$ any desired positive number gives $x_2$ equal to the negative of that number:

$$\max c_1 x_1 + c_2 y - c_2 z$$
subject to:
$$a_{11} x_1 + a_{12} y - a_{12} z \le b_1$$
$$a_{21} x_1 + a_{22} y - a_{22} z \ge b_2$$
$$x_1, y, z \ge 0.$$

The first issue is solved by multiplying the relevant constraint by $-1$ to reverse the inequality: for example, $5 \ge 4$ is equivalent to $-5 \le -4$, and $ax \ge b$ is equivalent to $-ax \le -b$. In this example, multiplying $a_{21} x_1 + a_{22} y - a_{22} z \ge b_2$ by $-1$ gives:

$$\max c_1 x_1 + c_2 y - c_2 z$$
subject to:
$$a_{11} x_1 + a_{12} y - a_{12} z \le b_1$$
$$-a_{21} x_1 - a_{22} y + a_{22} z \le -b_2$$
$$x_1, y, z \ge 0.$$

Similarly, an equality constraint $\sum a_i x_i = b_i$ is equivalent to the two constraints $\sum a_i x_i \le b_i$ and $\sum a_i x_i \ge b_i$. Note that the convention is to formulate the minimization program with the inequalities having the "$\ge$" form: in matrix terms, $Ax \ge b$. In the linear program, constraints of the form $Ax \le b$ or $Ax \ge b$ are called feasibility constraints; the constraints that $x_i \ge 0$ for all $i$ are called nonnegativity constraints. Taking both types of constraint together, refer to them as the program constraints or constraints.
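The two conversions just described – negating a "$\ge$" row and splitting an unrestricted variable into a difference of nonnegative ones – can be sketched as data transformations. The numbers here are hypothetical, chosen only to illustrate:

```python
# A program stored as (c, A, b): after conversion every row reads "<=" and
# every variable is ">= 0".  (Hypothetical coefficients for illustration.)
c = [5, 7]
A = [[1, 2],                    # row 0:  x1 + 2*x2 <= 10
     [3, -1]]                   # row 1:  3*x1 - x2 >= 4   (a ">=" row)
b = [10, 4]
ge_rows = {1}                   # rows originally of ">=" type
free_vars = {1}                 # x2 is unrestricted in sign

# 1. Multiply each ">=" row by -1 so it reads "<=".
for i in ge_rows:
    A[i] = [-a for a in A[i]]
    b[i] = -b[i]

# 2. Replace each free variable x_j by y - z: append its column, negated.
for j in sorted(free_vars):
    c.append(-c[j])
    for row in A:
        row.append(-row[j])

print(c)   # [5, 7, -7]
print(A)   # [[1, 2, -2], [-3, 1, -1]]
print(b)   # [10, -4]
```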
The next sections consider a number of important concepts in linear programming. Section 4.1.2 discusses the feasible region – those choices that satisfy all the constraints of the problem. Following this, Section 4.1.3 considers the problem of finding an optimal solution. In Section 4.1.4, dual prices and slack variables are introduced. Dual prices attach a price or valuation to each constraint, while slack variables indicate whether constraints are binding or not. Finally, the dual program associated with a linear program (called “the primal”) is introduced (Section 4.1.6). The dual program is derived from the primal program, is closely related and provides additional perspective on the primal program.
4.1.2 The feasible region

A vector $x = (x_1, \dots, x_n)$ is called feasible if it satisfies the linear constraint inequalities and the nonnegativity constraints.
Definition 4.1: The set of x vectors that satisfy the constraints Ax ≤ b and x ≥ 0 is called the feasible set or feasible region.
Geometrically, the feasible region has a special shape – it is a convex polyhedron. The following examples give the feasible regions defined by inequality constraints, and the corresponding figures show the feasible regions as shaded areas.

EXAMPLE 4.2: In this example, the feasible region is defined by two inequality constraints and the nonnegativity constraints:

$$2x + y \le 6$$
$$3x + 4y \le 12$$
$$x, y \ge 0.$$

[Figure: the feasible region is the shaded area bounded by the axes and the lines $2x + y = 6$ and $3x + 4y = 12$.] ♦
The next example has three constraints. The additional constraint adds corners to the feasible region, and indicates the impact of additional constraints.
EXAMPLE 4.3: The feasible region is defined by:

$$2x + 2y \le 10$$
$$3x + 6y \le 18$$
$$2x + 12y \le 24$$
$$x, y \ge 0.$$

[Figure: the feasible region is the shaded area bounded by the axes and the lines $2x + 2y = 10$, $3x + 6y = 18$ and $2x + 12y = 24$.] ♦
The feasible region is defined by linear equations, but this region may be quite complex. The next example illustrates increasing complexity. In this example, the second constraint is written as $2x + 6y \ge 12$, which is equivalent to $-2x - 6y \le -12$.

EXAMPLE 4.4: The feasible region is defined by:

$$3x + 4y \le 12$$
$$2x + 6y \ge 12$$
$$2x + 8y \le 20$$
$$x, y \ge 0.$$

[Figure: the feasible region is the shaded area bounded by the lines $3x + 4y = 12$, $2x + 6y = 12$ and $2x + 8y = 20$.] ♦
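A pointwise feasibility check for a region mixing "$\le$" and "$\ge$" constraints can be sketched using the constraints of Example 4.4 (a minimal sketch, not from the text):

```python
# Feasibility of points against the constraints of Example 4.4.
# Each constraint is stored as (coefficients, sense, right-hand side).
constraints = [([3, 4], "<=", 12),
               ([2, 6], ">=", 12),
               ([2, 8], "<=", 20)]

def feasible(x, y):
    if x < 0 or y < 0:
        return False
    for (a1, a2), sense, rhs in constraints:
        lhs = a1 * x + a2 * y
        if sense == "<=" and lhs > rhs:
            return False
        if sense == ">=" and lhs < rhs:
            return False
    return True

print(feasible(1, 2))   # True: all three constraints hold at (1, 2)
print(feasible(0, 0))   # False: 2x + 6y >= 12 fails at the origin
print(feasible(4, 0))   # False: 2x + 6y = 8 < 12
```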
4.1.3 Finding an optimal solution

Once the feasible region is identified, one can begin the search for an optimal solution: a point in the feasible region such that there is no other point in the feasible region that improves the objective value.

Definition 4.2: Consider the linear program: $\max c \cdot x$ subject to $Ax \le b$ and $x \ge 0$. If $x^*$ is feasible and $c \cdot x^* \ge c \cdot x$ for all feasible $x$, then $x^*$ is called an optimal solution. Similarly, in the program $\min c \cdot x$ subject to $Ax \ge b$ and $x \ge 0$, if $x^*$ is feasible and $c \cdot x^* \le c \cdot x$ for all feasible $x$, then $x^*$ is called an optimal solution.
When it is possible to graph the feasible region (with two choice variables), and there are not too many constraints, the problem can be solved by inspection. To illustrate, consider Example 4.1 again.
EXAMPLE 4.5: Recall the program from Example 4.1:

$$\max 3x + 4y$$
subject to:
$$2x + 3y \le 13$$
$$4x + 3y \le 17$$
$$x, y \ge 0.$$

Consider the objective function, $\pi(x, y) = 3x + 4y$. An equation such as $\pi(x, y) = 2$ defines a line giving those combinations of $(x, y)$ such that $\pi(x, y) = 2$. For example, because $3 \cdot 0 + 4 \cdot \frac{1}{2} = 2$ and $3 \cdot \frac{2}{3} + 4 \cdot 0 = 2$, the points $(x, y) = \left(0, \frac{1}{2}\right)$ and $(x, y) = \left(\frac{2}{3}, 0\right)$ are on this line. Referring to $\pi$ as profit, the set $\{(x, y) \mid \pi(x, y) = k\}$ is called an "isoprofit" line with profit $k$. Isoprofit lines for the objective function, along with the feasible region, may easily be graphed.

[Figure: the feasible region bounded by $2x + 3y = 13$ and $4x + 3y = 17$, with isoprofit lines drawn for $k = 8$, 12 and 18; the line $\pi(x, y) = 18$ touches the feasible region only at the corner $(2, 3)$.]

From the figure, the highest value for the objective, 18, is achieved at $(2, 3)$, the corner of the feasible region determined by the equations $2x + 3y = 13$ and $4x + 3y = 17$. Using Cramer's rule:

$$\begin{vmatrix} 2 & 3 \\ 4 & 3 \end{vmatrix} = 6 - 12 = -6, \qquad \begin{vmatrix} 13 & 3 \\ 17 & 3 \end{vmatrix} = 39 - 51 = -12, \qquad \begin{vmatrix} 2 & 13 \\ 4 & 17 \end{vmatrix} = 34 - 52 = -18.$$

Thus, the solution is $x^* = (x^*, y^*) = \left(\frac{-12}{-6}, \frac{-18}{-6}\right) = \left(\frac{12}{6}, \frac{18}{6}\right) = (2, 3)$. Note that the feasible region is defined by its corners. Determining the objective value at each of the corners gives the table:

Corner $(x, y)$             Objective value
$(0, 0)$                    $0 = 3(0) + 4(0)$
$\left(0, \frac{13}{3}\right)$    $17\frac{1}{3} = 3(0) + 4\left(\frac{13}{3}\right)$
$(2, 3)^*$                  $18 = 3(2) + 4(3)$
$\left(\frac{17}{4}, 0\right)$    $12\frac{3}{4} = 3\left(\frac{17}{4}\right) + 4(0)$

By comparing the objective value at each corner, observe that the highest value occurs at the corner $(2, 3)$. This is not accidental. An optimal solution can always be found at a corner. ♦
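The corner-comparison method used in Example 4.5 can be automated for two-variable problems: intersect every pair of boundary lines by Cramer's rule, discard infeasible intersections, and take the best remaining point. This is a sketch of that idea, not the simplex method:

```python
from itertools import combinations

# Corner search for "max c.x s.t. Ax <= b, x >= 0" in two variables,
# applied to the program of Example 4.5.
c = [3, 4]
A = [[2, 3], [4, 3]]
b = [13, 17]

# Boundary lines: the two constraints plus the axes x = 0 and y = 0.
lines = [(A[0], b[0]), (A[1], b[1]), ([1, 0], 0), ([0, 1], 0)]

def feasible(p):
    x, y = p
    eps = 1e-9
    return x >= -eps and y >= -eps and all(
        r[0] * x + r[1] * y <= bi + eps for r, bi in zip(A, b))

corners = []
for (r1, b1), (r2, b2) in combinations(lines, 2):
    det = r1[0] * r2[1] - r1[1] * r2[0]
    if det == 0:
        continue                      # parallel lines: no intersection
    p = ((b1 * r2[1] - r1[1] * b2) / det,   # Cramer's rule
         (r1[0] * b2 - b1 * r2[0]) / det)
    if feasible(p):
        corners.append(p)

best = max(corners, key=lambda p: c[0] * p[0] + c[1] * p[1])
print(best)   # (2.0, 3.0), with objective value 18
```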
An important theorem in linear programming (see Theorem 4.2, later) says that if a linear program has an optimal solution, then it has an optimal solution at a corner. This does not say that every optimal solution is at a corner – just that there is an optimal solution at a corner (assuming the problem has an optimal solution). The key benefit of the theorem is that in searching for an optimal solution we can restrict attention to the corners of the feasible region. When there are many variables and a graphical solution is not possible, being able to restrict attention to corners is very useful. One method for solving a linear program, the simplex method, compares corners, moving from one to another to improve the objective and stopping when no further improvement is possible. (Newer methods move inside the feasible set during the search.) In more than two dimensions, the meaning of "corner" may be less clear. In higher dimensions, the appropriate concept is a vertex or extreme point of the feasible region. Note in the figures that no corner is a (strict) convex combination of other points in the feasible region. We may define a corner of the feasible region to be a point, $x^*$, in the feasible region with the property that we cannot find two other points in the feasible region, say $y$, $z$ (different from $x^*$), such that $x^*$ lies on a straight line connecting these two points. The only points in the figures satisfying this criterion are the corners. This criterion is independent of the dimension of the feasible region (the number of elements in the $x$ vector) and defines a vertex or extreme point of the feasible set in general. This is discussed further in Section 4.2.
4.1.4 Dual prices and slack variables

In a linear program, the inequality constraints identify resource constraints: usage of a particular resource may not exceed a particular level. Subject to these constraints, the objective is optimized. Therefore, when a resource constraint is varied (some right-hand side parameter $b_i$ is altered), this generally has an impact on the solution and the corresponding objective value. This change in the objective value is measured by the dual price.

4.1.4.1 Dual prices

The dual or shadow price of the $i$th constraint is defined as the change in the value of the optimized objective due to a unit change in the right-hand side of the $i$th constraint. For example, suppose that the program has been solved and the solution is at a corner $x^*$ with associated objective value $c \cdot x^*$. Then $x^*$ is located at the intersection of some set of constraints. Changing the right-hand side (RHS) of one of those constraints slightly will change the intersection point slightly, to say $x'$. Suppose that the RHS of the $i$th constraint is changed from $b_i$ to $b_i'$ and that the corner at which the solution is located is defined by the same set of constraints (the solution occurs at $x'$). In Example 4.6, figure A, the original solution is at $(2, 3)$; if the RHS of constraint 1 is changed from 13 to 14, the solution moves to $\left(\frac{9}{6}, \frac{22}{6}\right)$ – still defined by the intersection of constraints 1 and 2. (This is the typical situation. Formally, the solution basis does not change. See Example 4.6, Remark 4.1 and the discussion of basic solutions in Section 4.2.) Let the new solution be $x'$. Then the dual price of the $i$th constraint is defined:

$$DP_i = \frac{\text{Change in objective}}{\text{Change in RHS of constraint } i} = \frac{c \cdot x' - c \cdot x^*}{b_i' - b_i}.$$
EXAMPLE 4.6: To illustrate, continuing with the previous example, consider the impact of changing the right-hand side ($RHS_1$) of the first constraint from 13 to 14. In the notation above, $b_1 = 13$, $b_1' = 14$. This defines a new program:

$$\max 3x + 4y \ \text{ s.t. } \ 2x + 3y \le 13 \ (1), \quad 4x + 3y \le 17 \ (2), \quad x, y \ge 0$$
$$\Longrightarrow \quad \max 3x + 4y \ \text{ s.t. } \ 2x + 3y \le 14 \ (1'), \quad 4x + 3y \le 17 \ (2), \quad x, y \ge 0.$$
When the RHS of constraint 1 changes from 13 to 14, the new solution using Cramer's rule is:

$$\begin{vmatrix} 2 & 3 \\ 4 & 3 \end{vmatrix} = 6 - 12 = -6, \qquad \begin{vmatrix} 14 & 3 \\ 17 & 3 \end{vmatrix} = 42 - 51 = -9, \qquad \begin{vmatrix} 2 & 14 \\ 4 & 17 \end{vmatrix} = 34 - 56 = -22.$$

Then the new solution is at $x' = (x', y') = \left(\frac{9}{6}, \frac{22}{6}\right)$ (see figure A).

[Figures A and B: the feasible region of the original program with its corner at $(2, 3)$. In figure A, relaxing constraint 1 to $2x + 3y \le 14$ moves the solution to $\left(\frac{9}{6}, \frac{22}{6}\right)$; in figure B, relaxing constraint 2 to $4x + 3y \le 18$ moves the solution to $\left(\frac{15}{6}, \frac{16}{6}\right)$.]
The objective value at the new solution is:

$$c \cdot x' = (3, 4) \cdot \left(\tfrac{9}{6}, \tfrac{22}{6}\right) = \tfrac{1}{6}\left[3(9) + 4(22)\right] = \tfrac{115}{6} = 19.1667 = 19\tfrac{1}{6}.$$

Therefore,

$$DP_1 = \frac{\Delta \text{objective}}{\Delta RHS_1} = \frac{c \cdot x' - c \cdot x^*}{b_1' - b_1} = \frac{19.1667 - 18}{14 - 13} = 1.1667 = 1\tfrac{1}{6}.$$

The same reasoning applies to constraint 2. Consider raising the RHS ($RHS_2$) of constraint 2 from 17 to 18. This is depicted in figure B:

$$\max 3x + 4y \ \text{ s.t. } \ 2x + 3y \le 13 \ (1), \quad 4x + 3y \le 17 \ (2), \quad x, y \ge 0$$
$$\Longrightarrow \quad \max 3x + 4y \ \text{ s.t. } \ 2x + 3y \le 13 \ (1), \quad 4x + 3y \le 18 \ (2'), \quad x, y \ge 0.$$

Using Cramer's rule, the calculations for determining the new solution are:

$$\begin{vmatrix} 2 & 3 \\ 4 & 3 \end{vmatrix} = -6, \qquad \begin{vmatrix} 13 & 3 \\ 18 & 3 \end{vmatrix} = 39 - 54 = -15, \qquad \begin{vmatrix} 2 & 13 \\ 4 & 18 \end{vmatrix} = 36 - 52 = -16,$$

giving $x^\dagger = \frac{-15}{-6} = \frac{15}{6}$ and $y^\dagger = \frac{-16}{-6} = \frac{16}{6}$. The new solution $x^\dagger = (x^\dagger, y^\dagger) = \left(\frac{15}{6}, \frac{16}{6}\right)$ has objective value:

$$c \cdot x^\dagger = (3, 4) \cdot \left(\tfrac{15}{6}, \tfrac{16}{6}\right) = \tfrac{1}{6}\left[3(15) + 4(16)\right] = \tfrac{109}{6} = 18.1667.$$

Therefore,

$$DP_2 = \frac{\Delta \text{objective}}{\Delta RHS_2} = \frac{c \cdot x^\dagger - c \cdot x^*}{b_2' - b_2} = \frac{18.1667 - 18}{18 - 17} = 0.1667 = \tfrac{1}{6}.$$
Note: One can also consider the impact of changes in other parameters of the problem. For example, suppose that the objective were $8x + 6y$ instead of $3x + 4y$. In this case, the objective would run parallel to the second constraint and any point on the line between $(x, y) = (2, 3)$ and $(x, y) = \left(\frac{17}{4}, 0\right)$ would be optimal, including the two optimal corners. ♦

4.1.4.2 Slack variables
Given the constraints $Ax \le b$ and $x \ge 0$, suppose that the optimal solution has been found and call it $x^*$. The slack variables are defined as the vector $s^*$ that satisfies $Ax^* + s^* = b$, or $s^* = b - Ax^*$. (In the case where the constraints have the form $Ax \ge b$ and $x \ge 0$, the slack variables are defined by $Ax^* - s^* = b$, or $s^* = Ax^* - b$. This convention ensures that the slack variables are nonnegative.)

Continuing with the example, $s^* = (s_1^*, s_2^*)$ is defined by the equations:

$$2x^* + 3y^* + s_1^* = 13 \qquad \text{or} \qquad 2(2) + 3(3) + s_1^* = 13 \qquad \text{or} \qquad 13 + s_1^* = 13$$
$$4x^* + 3y^* + s_2^* = 17 \qquad \text{or} \qquad 4(2) + 3(3) + s_2^* = 17 \qquad \text{or} \qquad 17 + s_2^* = 17,$$

so that $s^* = (s_1^*, s_2^*) = (0, 0)$. When a slack variable is 0, the corresponding constraint is binding. Here both slack variables are 0, and the constraints bind. In general, changing the RHS of a binding constraint changes the solution and the value of the objective. Therefore, in general, if the slack variable associated with a constraint is 0, the corresponding dual price is nonzero. Conversely, if the slack variable associated with a constraint is positive, then the constraint is not binding: changing it has no effect on the solution and so the dual price is zero.
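Slack variables and dual prices for the running example can be computed together by re-solving the corner system after a unit change in each right-hand side (a sketch in exact rational arithmetic):

```python
from fractions import Fraction as F

# Running example: max 3x + 4y s.t. 2x + 3y <= 13, 4x + 3y <= 17.
c = [F(3), F(4)]
A = [[F(2), F(3)], [F(4), F(3)]]
b = [F(13), F(17)]

def corner(b0, b1):
    """Intersection of 2x + 3y = b0 and 4x + 3y = b1, by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    x = (b0 * A[1][1] - A[0][1] * b1) / det
    y = (A[0][0] * b1 - b0 * A[1][0]) / det
    return x, y

def value(p):
    return c[0] * p[0] + c[1] * p[1]

star = corner(b[0], b[1])
slacks = [bi - (row[0] * star[0] + row[1] * star[1]) for row, bi in zip(A, b)]
dp1 = value(corner(b[0] + 1, b[1])) - value(star)   # unit change in RHS 1
dp2 = value(corner(b[0], b[1] + 1)) - value(star)   # unit change in RHS 2

print(star[0], star[1])      # 2 3: the optimal corner
print(slacks[0], slacks[1])  # 0 0: both constraints bind
print(dp1, dp2)              # 7/6 1/6: the dual prices found in the text
```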
4.1.5 Some examples

In this section, two examples are considered in detail to summarize the discussion so far.

EXAMPLE 4.7: Consider the following linear program:

$$\max 4x + 7y$$
subject to:
$$2x + 6y \le 28 \quad (1)$$
$$3x + 4y \le 22 \quad (2)$$
$$5x + 4y \le 34 \quad (3)$$
$$x, y \ge 0.$$
• Plot: [Figure: the three constraint lines, with the feasible-region corners $(2, 4)$, $\left(\frac{92}{22}, \frac{72}{22}\right)$ and $(6, 1)$ marked.]
• Solution: Constraints 1 and 2 intersect at $(x, y) = (2, 4)$; constraints 1 and 3 at $(x, y) = \left(\frac{92}{22}, \frac{72}{22}\right)$; and constraints 2 and 3 at $(x, y) = (6, 1)$. These are found by solving each pair of equations in turn. For example, $2x + 6y = 28$ and $3x + 4y = 22$ solve to give $(x, y) = (2, 4)$, which turns out to be the optimal solution:

$$\begin{pmatrix} 2 & 6 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 28 \\ 22 \end{pmatrix}.$$

Thus

$$\begin{vmatrix} 2 & 6 \\ 3 & 4 \end{vmatrix} = -10, \qquad \begin{vmatrix} 28 & 6 \\ 22 & 4 \end{vmatrix} = -20, \qquad \begin{vmatrix} 2 & 28 \\ 3 & 22 \end{vmatrix} = -40.$$

This gives $(x, y) = \left(\frac{-20}{-10}, \frac{-40}{-10}\right) = (2, 4)$. To see that it is optimal, recall from the earlier discussion that comparing the objective value at each feasible corner is one way to solve the program. Another way is to compare the slopes of the constraints and the objective:

Constraint 1: slope $-\frac{1}{3}$; Constraint 2: slope $-\frac{3}{4}$; Constraint 3: slope $-\frac{5}{4}$; Objective: slope $-\frac{4}{7}$.

The slope of the objective lies between the slopes of constraints 1 and 2, so the optimal solution is where these constraints intersect – at $(2, 4)$. The value of the objective at the solution is $4(2) + 7(4) = 36$.

• Parameter changes: If, for example, the objective were changed to $6x + 7y$, with slope $-\frac{6}{7}$, this would be steeper than constraint 2 and the solution would be located at $(x, y) = (6, 1)$. If the objective were altered to be parallel to the second constraint (for example, $6x + 8y$), there would be an infinite number of solutions ($(x, y) = (2, 4)$, $(x, y) = (6, 1)$ and all points on the line connecting them).

• Slack variables: Since both constraints 1 and 2 are satisfied with equality, $s_1 = s_2 = 0$, while $s_3 = 34 - 5(2) - 4(4) = 34 - 26 = 8$.

• Dual prices:

1. To calculate the dual price of constraint 1, replace 28 by 29. Then:
$$\begin{vmatrix} 29 & 6 \\ 22 & 4 \end{vmatrix} = -16, \qquad \begin{vmatrix} 2 & 29 \\ 3 & 22 \end{vmatrix} = -43,$$
and the new solution is $(x, y) = \left(\frac{16}{10}, \frac{43}{10}\right)$. The value of the objective is $4\left(\frac{16}{10}\right) + 7\left(\frac{43}{10}\right) = \frac{365}{10} = 36\frac{1}{2}$. Consequently, $DP_1 = \frac{1}{2}$.

2. To calculate the dual price of constraint 2, replace 22 by 23. Then:
$$\begin{vmatrix} 28 & 6 \\ 23 & 4 \end{vmatrix} = -26, \qquad \begin{vmatrix} 2 & 28 \\ 3 & 23 \end{vmatrix} = -38,$$
and the new solution is $(x, y) = \left(\frac{26}{10}, \frac{38}{10}\right)$. The value of the objective is $4\left(\frac{26}{10}\right) + 7\left(\frac{38}{10}\right) = \frac{370}{10} = 37$. Consequently, $DP_2 = 1$.

3. Since changing the right-hand side of constraint 3 does not affect the solution, the associated dual price is 0. ♦
EXAMPLE 4.8: Consider the following linear program:

$$\min 4x + 2y$$
subject to:
$$4x + 8y \ge 40 \quad (1)$$
$$3x + 2y \ge 24 \quad (2)$$
$$6x + 2y \ge 30 \quad (3)$$
$$x, y \ge 0.$$
• Plot: [Figure: the three constraint lines, with intersection points $(2, 9)$, $(4, 3)$ and $\left(7, \frac{3}{2}\right)$ marked; the feasible region lies above the constraint lines.]
• Solution: Comparing the slopes of the objective and constraints gives the solution at the intersection of constraints 2 and 3:

$$\begin{vmatrix} 3 & 2 \\ 6 & 2 \end{vmatrix} = -6, \qquad \begin{vmatrix} 24 & 2 \\ 30 & 2 \end{vmatrix} = -12, \qquad \begin{vmatrix} 3 & 24 \\ 6 & 30 \end{vmatrix} = -54.$$

Thus, $(x, y) = (2, 9)$. The value of the objective at the solution is $4(2) + 2(9) = 26$.

• Slack variables: The values of the slack variables at the optimal solution $(x^*, y^*)$ are defined by the equations:

$$4x^* + 8y^* - s_1^* = 40$$
$$3x^* + 2y^* - s_2^* = 24$$
$$6x^* + 2y^* - s_3^* = 30,$$

giving $s_1^* = 4x^* + 8y^* - 40 = 4(2) + 8(9) - 40 = 80 - 40 = 40$, and $s_2^* = 0$, $s_3^* = 0$.

• Dual prices:

1. Because constraint 1 is not binding, its shadow price is 0: $DP_1 = 0$.

2. To determine the dual price of the second constraint, increase $RHS_2$ by 1, from 24 to 25. Solving for the intersection of this new constraint 2 and the original constraint 3:
$$\begin{vmatrix} 25 & 2 \\ 30 & 2 \end{vmatrix} = -10, \qquad \begin{vmatrix} 3 & 25 \\ 6 & 30 \end{vmatrix} = -60.$$
This gives a new solution $(x, y) = \left(\frac{10}{6}, 10\right)$ and the objective value is $4\left(\frac{10}{6}\right) + 2(10) = 26.6667$. The original objective value was 26, so a unit increase in the RHS of constraint 2 has raised the value of the objective by $\frac{2}{3}$: $DP_2 = 0.6667$.

3. Finally, to determine $DP_3$, raise the RHS of constraint 3 by 1 unit and find the new intersection with the original constraint 2:
$$\begin{vmatrix} 24 & 2 \\ 31 & 2 \end{vmatrix} = -14, \qquad \begin{vmatrix} 3 & 24 \\ 6 & 31 \end{vmatrix} = -51.$$
This gives a new solution $(x, y) = \left(\frac{14}{6}, \frac{51}{6}\right)$ and the objective value is $4\left(\frac{14}{6}\right) + 2\left(\frac{51}{6}\right) = \frac{158}{6} = 26.3333$. Therefore $DP_3 = 0.3333 = \frac{1}{3}$. ♦
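Because the decimals in Example 4.8 are rounded, the dual prices are worth re-deriving exactly (a sketch; fractions avoid rounding error):

```python
from fractions import Fraction as F

# Example 4.8: min 4x + 2y, solution at the intersection of
# constraints 2 (3x + 2y = b2) and 3 (6x + 2y = b3).
def corner(b2, b3):
    """Intersection of 3x + 2y = b2 and 6x + 2y = b3, by Cramer's rule."""
    det = F(3 * 2 - 2 * 6)              # -6
    x = (b2 * 2 - 2 * b3) / det
    y = (3 * b3 - b2 * 6) / det
    return x, y

def cost(p):
    return 4 * p[0] + 2 * p[1]

base = cost(corner(F(24), F(30)))           # 26, at (2, 9)
dp2 = cost(corner(F(25), F(30))) - base     # raise RHS of constraint 2
dp3 = cost(corner(F(24), F(31))) - base     # raise RHS of constraint 3

print(base, dp2, dp3)   # 26 2/3 1/3
```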
4.1.6 Duality reconsidered

The following discussion provides a different perspective on dual prices and their computation. An example will motivate the discussion. The next example takes an earlier example and rearranges the parameters and objective to give a program called the dual program. Every program has a dual program, giving the terminology primal and dual to refer to the initial and transformed programs. It turns out that the optimal objective values are the same in both programs, the dual prices in the primal are the solution values in the dual, and the solution values in the primal are the dual prices in the dual program.

EXAMPLE 4.9: Consider the following linear program:

Program A:
$$\min 13w + 17z$$
subject to:
$$2w + 4z \ge 3 \quad (1)$$
$$3w + 3z \ge 4 \quad (2)$$
$$w, z \ge 0.$$

Observe that this program bears a close relation to one discussed earlier (see Examples 4.5 and 4.6):

Program B:
$$\max 3x + 4y$$
subject to:
$$2x + 3y \le 13 \quad (1)$$
$$4x + 3y \le 17 \quad (2)$$
$$x, y \ge 0.$$

Program A is called the dual of program B. Notice that A is obtained from B by exchanging the objective coefficients and RHS constraint parameters, and taking the transpose of the coefficient matrix. The remarkable fact relating these programs is the following. Let $(x^*, y^*)$ solve program B, with objective value $\pi_B$ and dual prices $(DP_1^B, DP_2^B)$. Similarly, let $(w^*, z^*)$ solve program A, with objective value $\pi_A$ and dual prices $(DP_1^A, DP_2^A)$. Then

(1) $\pi_B = \pi_A$,  (2) $(x^*, y^*) = (DP_1^A, DP_2^A)$,  (3) $(w^*, z^*) = (DP_1^B, DP_2^B)$.

Recall from Examples 4.5 and 4.6 that the solution to program B has $\pi_B = 18$, $(x^*, y^*) = (2, 3)$ and $(DP_1, DP_2) = \left(1\frac{1}{6}, \frac{1}{6}\right)$. To solve program A, plot the constraints:

[Figure: the constraints of program A, $2w + 4z = 3$ and $3w + 3z = 4$, with the solution at their intersection, $(w, z) = \left(\frac{7}{6}, \frac{1}{6}\right)$.]
The solution is at the intersection of the two constraints, and using Cramer's rule:

$$\begin{vmatrix} 2 & 4 \\ 3 & 3 \end{vmatrix} = -6, \qquad \begin{vmatrix} 3 & 4 \\ 4 & 3 \end{vmatrix} = -7, \qquad \begin{vmatrix} 2 & 3 \\ 3 & 4 \end{vmatrix} = -1$$

gives the solution $(w^*, z^*) = \left(\frac{7}{6}, \frac{1}{6}\right) = \left(1\frac{1}{6}, \frac{1}{6}\right)$. The objective value is $\pi_A = 13\left(\frac{7}{6}\right) + 17\left(\frac{1}{6}\right) = \frac{108}{6} = 18$.

To calculate the dual prices, consider first an increase of $\frac{1}{4}$ in the RHS of constraint 1 and solve for the new intersection:

$$\begin{vmatrix} 2 & 4 \\ 3 & 3 \end{vmatrix} = -6, \qquad \begin{vmatrix} 3\frac{1}{4} & 4 \\ 4 & 3 \end{vmatrix} = -6\tfrac{1}{4}, \qquad \begin{vmatrix} 2 & 3\frac{1}{4} \\ 3 & 4 \end{vmatrix} = -1\tfrac{3}{4}.$$

Thus, the new solution is $(w', z') = \left(\frac{6\frac{1}{4}}{6}, \frac{1\frac{3}{4}}{6}\right) = \left(\frac{25}{24}, \frac{7}{24}\right)$. The new objective value is then $13\left(\frac{25}{24}\right) + 17\left(\frac{7}{24}\right) = \frac{444}{24} = 18\frac{1}{2}$. Thus, $DP_1^A = \frac{1/2}{1/4} = 2$.

For constraint 2, to determine $DP_2^A$, change the RHS to $4\frac{1}{4}$:

$$\begin{vmatrix} 2 & 4 \\ 3 & 3 \end{vmatrix} = -6, \qquad \begin{vmatrix} 3 & 4 \\ 4\frac{1}{4} & 3 \end{vmatrix} = -8, \qquad \begin{vmatrix} 2 & 3 \\ 3 & 4\frac{1}{4} \end{vmatrix} = -\tfrac{1}{2}.$$

The new solution is now $(w^\dagger, z^\dagger) = \left(\frac{8}{6}, \frac{1}{12}\right)$. The new objective value is then $13\left(\frac{16}{12}\right) + 17\left(\frac{1}{12}\right) = \frac{225}{12} = 18\frac{3}{4}$. Thus, $DP_2^A = \frac{3/4}{1/4} = 3$. Summarizing: $DP_1^A = 2$, $DP_2^A = 3$. Thus, $(x^*, y^*) = (DP_1^A, DP_2^A) = (2, 3)$.

Recall that the dual prices for program B were $DP_1^B = 1.1667 = \frac{7}{6}$ and $DP_2^B = 0.1667 = \frac{1}{6}$, and $\pi_B = 18$. Thus, $(w^*, z^*) = (DP_1^B, DP_2^B) = \left(\frac{7}{6}, \frac{1}{6}\right)$ and $\pi_A = \pi_B = 18$. ♦
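The equality $\pi_A = \pi_B$ in Example 4.9 reflects weak duality: every feasible value of the max program is at most every feasible value of the min program, so equal objective values at a pair of feasible points certify that both are optimal. A quick check (illustrative function names):

```python
from fractions import Fraction as F

# Feasibility in program B (the primal/max) and program A (the dual/min).
def B_feasible(x, y):
    return x >= 0 and y >= 0 and 2*x + 3*y <= 13 and 4*x + 3*y <= 17

def A_feasible(w, z):
    return w >= 0 and z >= 0 and 2*w + 4*z >= 3 and 3*w + 3*z >= 4

x, y = F(2), F(3)          # claimed solution of program B
w, z = F(7, 6), F(1, 6)    # claimed solution of program A

print(B_feasible(x, y), A_feasible(w, z))   # True True
print(3*x + 4*y, 13*w + 17*z)               # 18 18: equal, so both optimal
```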
However, there are some qualifications to these observations, as the next remark illustrates.

REMARK 4.1: The tight connection between objective variables and dual variables depends on regularity of the programs. Consider the following program with associated dual:

Primal:
$$\max 2x + 2y$$
subject to:
$$x \le 4 \quad (1)$$
$$y \le 4 \quad (2)$$
$$x + y \le 6 \quad (3)$$
$$x, y \ge 0.$$

Dual:
$$\min 4w + 4v + 6z$$
subject to:
$$w + z \ge 2 \quad (1)$$
$$v + z \ge 2 \quad (2)$$
$$w, v, z \ge 0.$$

The primal has an objective parallel to the third constraint. Any point on the line connecting $(2, 4)$ and $(4, 2)$, such as $(3, 3)$, is an optimal solution to the linear program. The value of the objective is 12. Furthermore, for either constraint 1 or 2, raising the RHS by 1 or reducing it by 1 leaves $(3, 3)$ both feasible and optimal, so that there is no impact on the objective. The dual prices of constraints 1 and 2 are 0. The third constraint in the primal has a dual price of 2. In the dual program, $w = v = 0$ and $z = 2$ is the unique optimal solution. To see this, consider changing these values. If $z$ is reduced by $\Delta$, then both $w$ and $v$ must increase by $\Delta$ to satisfy the constraints. The cost reduction from the change in $z$, $6\Delta$, is outweighed by the added cost from the increase in $w$ and $v$: $8\Delta = 4\Delta + 4\Delta$. Turning to shadow prices, if the RHS of the first constraint in the dual is raised to $2 + \gamma$, $\gamma$ small, then it is optimal to raise $w$ from 0 to $\gamma$: raising $w$ raises the objective by $4\gamma$, whereas increasing $z$ would raise the cost by $6\gamma$. The change in the objective divided by the change in the RHS of constraint 1 is 4, suggesting $DP_1 = 4$ in the dual. The same reasoning suggests that $DP_2 = 4$ in the dual, but the pair $(4, 4)$ corresponds to no solution in the primal.
4.2 Basic solutions

The previous discussion dealt with programs having constraints of the form Ax ≤ b or Ax ≥ b. In what follows, it is shown that such inequality-constrained problems may be reformulated as equality-constrained programs. The equality-constrained formulation is then used to discuss basic solutions.
4.2.1 Equality-constrained programs
Consider the linear program: max c · x subject to Ax ≤ b and x ≥ 0. Here c is n × 1, x is n × 1, A is m × n, s is m × 1 and I is m × m, the identity matrix. The program may be written in a variety of equivalent ways:

I: max c · x subject to: Ax ≤ b, x ≥ 0

II: max c · x + 0 · s subject to: Ax + Is = b, x, s ≥ 0

III: max (c, 0) · (x, s) subject to: (A : I)(x, s) = b, (x, s) ≥ 0

IV: max c̄ · z subject to: Āz = b, z ≥ 0.

Formulation II can be expressed succinctly as follows. Define new matrices and vectors: c̄ = (c, 0), Ā = (A : I), z = (x, s). In the initial problem, the dimensions of the matrices and vectors were c – n × 1, x – n × 1, A – m × n. Since A is an m × n matrix, there are m constraints (leaving aside the nonnegativity constraints on the x). The equality-constrained model is obtained by adding "slack" variables – s – one for each constraint. So there are m slack variables added to give formulation II. Thus c̄ is a lengthened vector with zeros added to c; likewise, z involves lengthening the x vector with s, and Ā is a matrix formed by placing A and I, the m × m identity matrix, side by side. Therefore, z is (n + m) × 1, c̄ is (n + m) × 1 and Ā is m × (n + m). Programs II, III and IV are equivalent problems – they involve only relabeling. Furthermore, I and II are equivalent problems, so that the inequality-constrained program in I may be replaced by the equality-constrained program, IV.

Theorem 4.1: The linear programs I and II have the same solutions: if x∗ solves I then for some s∗ ≥ 0, (x∗, s∗) solves II, and if (x̃, s̃) solves II then x̃ solves I.

PROOF: Let x∗ solve I and suppose that there is no s∗ ≥ 0 such that (x∗, s∗) solves II. If this is the case, then for any s∗ ≥ 0, ∃(x̃, s̃) with:

    c · x̃ = c · x̃ + 0 · s̃ > c · x∗ + 0 · s∗ = c · x∗ and Ax̃ + Is̃ = b, (x̃, s̃) ≥ 0.

However, from the previous equation,

    0 ≤ s̃ = Is̃ = b − Ax̃ implies Ax̃ ≤ b.

Therefore, x̃ is feasible for the first problem and c · x̃ > c · x∗, contradicting the assumption that x∗ solves problem I.

Conversely, suppose that (x̃, s̃) solves II but x̃ does not solve I. Then ∃x∗ ≥ 0 such that c · x∗ > c · x̃ and Ax∗ ≤ b. Set s∗ = b − Ax∗ so that s∗ ≥ 0. Then (x∗, s∗) is feasible and improving over (x̃, s̃) in II: c · x∗ + 0 · s∗ > c · x̃ + 0 · s̃. This contradicts the assumption that (x̃, s̃) solves II. Therefore, solving I and solving II are equivalent. This completes the proof.
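The construction in Theorem 4.1 is mechanical and easy to sketch in code. The snippet below (Python with NumPy; the A, b, c data are borrowed from Example 4.10 later in the chapter) forms Ā = (A : I) and c̄ = (c, 0), and confirms that a feasible x for I extends to a feasible z = (x, s) for II with the same objective value.

```python
import numpy as np

def to_equality_form(A, b, c):
    """max c.x s.t. Ax <= b, x >= 0  becomes  max c_bar.z s.t. A_bar z = b,
    z >= 0, where A_bar = (A : I) and c_bar = (c, 0)."""
    m, n = A.shape
    return np.hstack([A, np.eye(m)]), np.concatenate([c, np.zeros(m)])

# Constraint data (the system of Example 4.10 below).
A = np.array([[4.0, 1.0], [1.0, 3.0], [1.0, 1.0]])
b = np.array([24.0, 21.0, 9.0])
c = np.array([2.0, 5.0])

A_bar, c_bar = to_equality_form(A, b, c)

# A feasible x for I extends to a feasible z = (x, s) for II with s = b - Ax.
x = np.array([3.0, 6.0])
s = b - A @ x                         # slack in each constraint
z = np.concatenate([x, s])
print(s)                              # [6. 0. 0.]: constraints 2 and 3 bind
assert np.all(s >= 0) and np.allclose(A_bar @ z, b)
assert np.isclose(c_bar @ z, c @ x)   # objective values agree (Theorem 4.1)
```

The zero entries of s identify binding constraints, which is why the slack variables reappear in the discussion of basic solutions below.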
REMARK 4.2: The minimization problem is analogous.

I: min c · x subject to: Ax ≥ b, x ≥ 0

II: min c · x + 0 · s subject to: Ax − Is = b, x, s ≥ 0.

This can be expressed:

    min c̄ · z subject to: Āz = b, z ≥ 0,

where Ā = (A : −I), c̄ = (c, 0), and z = (x, s). Again, solving I and II yields the same solution.
4.2.2 Definition and identification of basic solutions

In this section, basic solutions are introduced and related to the vertices of the feasible region. In view of Theorem 4.1, we can focus on the equality-constrained formulation for the following discussion (and discard the "bar" notation). Thus, the linear program will be written:

    max c · x subject to: Ax = b, x ≥ 0.
Recall that a feasible solution in the inequality-constrained program is any vector x satisfying the constraints: Ax ≤ b and x ≥ 0. A feasible solution x̂ is an optimal feasible solution if there is no other feasible x̃ such that c · x̃ > c · x̂ in the case of a maximization problem, or c · x̃ < c · x̂ in the case of a minimization problem. (Alternatively, the definitions of feasible solution and optimal feasible solution can be given in terms of the equality-constrained program – it makes no difference.) With the equality-constrained program, a vector x satisfying Ax = b
and x ≥ 0 is said to be feasible. Let m be the number of rows in A and recall that
A has more columns than rows.
We assume that A has full rank and since A has more columns than rows this means that the rank of A equals the number of rows: m. This is called the
full rank assumption: the m rows of A are linearly independent. This assumption involves no loss of generality. If A does not have full rank then there are two possibilities: either some of the constraints in Ax = b are redundant (when some subset of the constraints is satisfied, so are the remainder), or the constraints are inconsistent – there is no x such that Ax = b. To see this more clearly, let a_i denote the ith row of A. Suppose that A does not have full rank. Then some row of A, say row i, can be written as a linear combination of the other rows: a_i = Σ_{j≠i} θ_j a_j. Now, suppose a_j · x = b_j for j ≠ i, so that Σ_{j≠i} θ_j (a_j · x) = Σ_{j≠i} θ_j b_j. Since Σ_{j≠i} (θ_j a_j) · x = a_i · x, this implies that a_i · x = Σ_{j≠i} θ_j b_j. However, the ith equation specifies that x must satisfy a_i · x = b_i. Thus, there are two possibilities: (a) b_i = Σ_{j≠i} θ_j b_j, in which case any x satisfying all but the ith constraint will also satisfy the ith constraint; or (b) b_i ≠ Σ_{j≠i} θ_j b_j, so that any x satisfying all but the ith constraint cannot satisfy the ith constraint. In case (a), the ith constraint is redundant, while in case (b) the ith constraint is inconsistent with the other constraints. Therefore, in case (b) the problem has no solution, while in case (a) the ith constraint may be dropped from the problem and the problem remains unchanged. From now on we assume that A has full rank – equal to the number of rows (m) in A.

Suppose we select m linearly independent columns from the matrix A: say these columns together form a matrix B. Such a set of columns is called a basis. It is possible that not all sets of m columns of A are linearly independent, but since A has full rank there is at least one set of m linearly independent columns. Since B is nonsingular, the equation system BxB = b has a unique solution.
For example, suppose that the first m columns of A are linearly independent and
let B be the matrix formed by these first m columns. Let the remaining columns of A form some matrix C , so that A = (B : C ). Then if xB satisfies BxB = b, let
x = (xB, 0) and observe that Ax = BxB + C · 0 = BxB = b. Thus x = (xB, 0) satisfies Ax = b. However, this x may not satisfy x ≥ 0 – some elements of the vector x
may be negative. This problem is taken up later. Note that when the B matrix is selected, a corresponding set of x is selected. In the example above, selection of the first m columns of A implicitly led to the selection of the first m elements of
x, with the remainder set to 0. Given Ax = b, let B be any nonsingular m × m submatrix of columns of A .
If all elements of x not associated with columns of B are set to 0, and the resulting set of equations solved, the unique x determined in this way is called a basic solution of the system Ax = b with respect to the basis B . The components of x associated with columns of B are called basic variables. Thus, in the
discussion above where the first m columns of A are chosen (and are linearly
independent), x = ( xB , 0) is a basic solution (with respect to the basis B ) and xB the
basic variables.
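The recipe just described – choose m linearly independent columns, solve BxB = b, and set the remaining components to zero – can be sketched directly. The 2 × 4 system below is an assumed illustration, not taken from the text.

```python
import numpy as np

# An assumed system Ax = b with m = 2 rows and 4 columns.
A = np.array([[2.0, 1.0, 1.0, 0.0],
              [1.0, 3.0, 0.0, 1.0]])
b = np.array([8.0, 9.0])

cols = [0, 1]                  # pick 2 linearly independent columns: the basis B
B = A[:, cols]
x_B = np.linalg.solve(B, b)    # unique solution of B x_B = b (B nonsingular)

x = np.zeros(A.shape[1])
x[cols] = x_B                  # components not associated with B are set to 0
print(x)                       # [3. 2. 0. 0.]: a basic solution
assert np.allclose(A @ x, b)
```

Choosing a different set of columns (say the last two, the identity columns) yields a different basic solution of the same system.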
If one or more of the basic variables in a basic solution has a value of 0, that solution is said to be a degenerate basic solution. Therefore, in a nondegenerate basic solution there are m nonzero components (and the remaining components are 0). In a degenerate basic solution, there are fewer than m components nonzero. Recall that if x satisfies both Ax = b and x ≥ 0, x is said to be feasible (for
these constraints). If x is a basic solution that also satisfies x ≥ 0, it is called a basic feasible solution. If this solution is also degenerate, it is called a degenerate
basic feasible solution. The following result, called the fundamental theorem of linear programming, is central in linear programming.

Theorem 4.2: Given a linear program of the form I or II,

I: max c · x subject to: Ax ≤ b, x ≥ 0

II: max c · x subject to: Ax = b, x ≥ 0,

1. If there is a feasible solution, there is a basic feasible solution.

2. If there is an optimal feasible solution, there is an optimal basic feasible solution.
The importance of this theorem (from the computational point of view) is that the search for an optimal solution can be restricted to the basic solutions. Since there are only finitely many of these, only finitely many x need to be considered in the search for an optimal solution. The following example illustrates the use of basic solutions in locating an optimal solution.
EXAMPLE 4.10:

max 2x + 5y
subject to:
    4x + y ≤ 24
    x + 3y ≤ 21
    x + y ≤ 9
    x, y ≥ 0.

This is rearranged in equality constraint form as:

max 2x + 5y + 0s1 + 0s2 + 0s3
subject to:
    4x + y + s1 = 24
    x + 3y + s2 = 21
    x + y + s3 = 9
    x, y, s1, s2, s3 ≥ 0.

In matrix form, let:

    Ā = [ 4 1 1 0 0
          1 3 0 1 0
          1 1 0 0 1 ],   b = (24, 21, 9),   c̄ = (2, 5, 0, 0, 0),   and z = (x, y, s1, s2, s3).

The problem may then be written:

    max c̄ · z subject to: Āz = b, z ≥ 0. ♦
In general, given a matrix with m rows and p columns (p > m), the number of m × m submatrices that can be formed is p!/(m!(p − m)!) (the binomial coefficient). In the present case, the coefficient matrix has m = 3 and p = 5, and so the maximum number of distinct 3 × 3 matrices (giving the maximum number of bases) is 5!/(3!(5 − 3)!) = 10. Consider the Ā matrix; the constraints are plotted in Figure 4.1 (the numbering identifies the basic solutions). To find the first basic solution, begin with the first three columns of Ā. Provided these three columns are linearly independent (which they are), set the coefficients on the last two columns (s2 and s3) to 0 and solve for the coefficients of the first three columns (x, y, s1).

          x  y  s1 s2 s3
    Ā = [ 4  1  1  0  0
          1  3  0  1  0
          1  1  0  0  1 ].
Figure 4.1: Basic solutions (the labeled points [1]–[10] mark the ten basic solutions at intersections of the constraint lines and axes)
So, suppose s2 = s3 = 0, and solve for the remaining variables:

    B1 = [ 4 1 1
           1 3 0
           1 1 0 ],   and B1 (x, y, s1)′ = (24, 21, 9)′.

The solution is (x, y, s1) = (3, 6, 6). In the terminology introduced earlier, the variables (x, y, s1) are called basic variables and (3, 6, 6, 0, 0) is a basic solution. The basis B1 was obtained by setting s2 and s3 to 0. To identify how the basis was obtained, denote this situation by B1 : (s2, s3 = 0). The notation is:

    B1 : (s2, s3 = 0)
    [ 4 1 1
      1 3 0
      1 1 0 ]
    (3, 6, 6)
Each entry below gives the elements set to 0 to determine the associated basis, the basis matrix (rows separated by semicolons), and the solution values of the basic variables:

    B1 : (s2, s3 = 0): [4 1 1; 1 3 0; 1 1 0], solution (3, 6, 6)
    B2 : (s1, s3 = 0): [4 1 0; 1 3 1; 1 1 0], solution (5, 4, 4)
    B3 : (s1, s2 = 0): [4 1 0; 1 3 0; 1 1 1], solution (4.636, 5.455, −1.09)
    B4 : (y, s3 = 0): [4 1 0; 1 0 1; 1 0 0], solution (9, −12, 12)
    B5 : (y, s2 = 0): [4 1 0; 1 0 0; 1 0 1], solution (21, −60, −12)
    B6 : (y, s1 = 0): [4 0 0; 1 1 0; 1 0 1], solution (6, 15, 3)
    B7 : (x, s3 = 0): [1 1 0; 3 0 1; 1 0 0], solution (9, 15, −6)
    B8 : (x, s2 = 0): [1 1 0; 3 0 0; 1 0 1], solution (7, 17, 2)
    B9 : (x, s1 = 0): [1 0 0; 3 1 0; 1 0 1], solution (24, −51, −15)
    B10 : (x, y = 0): [1 0 0; 0 1 0; 0 0 1], solution (24, 21, 9).
The ten basic solutions are in the form z = (x, y, s1, s2, s3):

    z1 = (3, 6, 6, 0, 0)    (4.3)
    z2 = (5, 4, 0, 4, 0)    (4.4)
    z3 = (4.636, 5.455, 0, 0, −1.09)    (4.5)
    z4 = (9, 0, −12, 12, 0)    (4.6)
    z5 = (21, 0, −60, 0, −12)    (4.7)
    z6 = (6, 0, 0, 15, 3)    (4.8)
    z7 = (0, 9, 15, −6, 0)    (4.9)
    z8 = (0, 7, 17, 0, 2)    (4.10)
    z9 = (0, 24, 0, −51, −15)    (4.11)
    z10 = (0, 0, 24, 21, 9).    (4.12)
This list of z is the list of basic solutions. Note now that there are some basic solutions that have negative entries. Once we eliminate those with negative entries, we are left with the basic feasible solutions. Recall that all basic solutions are chosen to satisfy Az = b and the constraint z ≥ 0 is ignored. Eliminating those
basic solutions with negative entries amounts to eliminating those basic solutions, z, that fail to satisfy z ≥ 0. The nonfeasible basic solutions are: z3 , z4 , z5 , z7
and z9. Observe that on the graph these points lie at intersections of constraint lines outside the feasible region. The basic feasible solutions are
z1 , z2 , z6 , z8 , z10 . Denoting the objective value at z by π( z), π( z1 ) = 2(3) + 5(6) = 36,
π(z2) = 2(5) + 5(4) = 30, π(z6) = 2(6) + 5(0) = 12, π(z8) = 2(0) + 5(7) = 35, π(z10) = 2(0) + 5(0) = 0. So, the solution is at z1, with x = 3 and y = 6.
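The whole enumeration can be automated. A sketch in Python with NumPy: loop over the C(5, 3) = 10 column triples of Ā, skip any singular basis, solve for the basic variables, and keep the nonnegative solutions as basic feasible solutions.

```python
import itertools
import numpy as np

# Equality form of Example 4.10: A_bar z = b with z = (x, y, s1, s2, s3).
A_bar = np.array([[4, 1, 1, 0, 0],
                  [1, 3, 0, 1, 0],
                  [1, 1, 0, 0, 1]], dtype=float)
b = np.array([24, 21, 9], dtype=float)
c_bar = np.array([2, 5, 0, 0, 0], dtype=float)

basic, feasible = [], []
for cols in itertools.combinations(range(5), 3):    # all 10 column triples
    B = A_bar[:, cols]
    if abs(np.linalg.det(B)) < 1e-12:
        continue                                    # columns not independent
    z = np.zeros(5)
    z[list(cols)] = np.linalg.solve(B, b)           # basic solution
    basic.append(z)
    if np.all(z >= -1e-12):
        feasible.append(z)                          # basic feasible solution

best = max(feasible, key=lambda z: c_bar @ z)
print(len(basic), len(feasible))    # 10 basic solutions, 5 feasible
print(best[:2], c_bar @ best)       # optimum (x, y) = (3, 6), value 36
```

This reproduces the list above: ten basic solutions, five of them feasible, with the optimum at z1.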
4.3 Duality principles

There is a general principle behind the close connection between the primal and dual programs. It turns out that both have the same optimal objective values. The following discussion formalizes these observations. Consider the primal (P) and dual (D):

P: max c′x subject to: Ax ≤ b, x ≥ 0

D: min b′y subject to: A′y ≥ c, y ≥ 0.
The dual is formed from the primal by exchanging the objective coefficients with the RHS vector, reversing the inequalities in the linear constraints, and transposing the constraint matrix. Taking the dual program associated with the dual gives the primal.

Theorem 4.3: For any feasible x and any feasible y, b′y ≥ c′x.

PROOF: Taking any y feasible in the dual and premultiplying Ax ≤ b (from the primal) gives y′Ax ≤ y′b = b′y. Similarly, taking any x feasible in the primal and premultiplying A′y ≥ c from the dual gives x′A′y ≥ x′c = c′x. So, for any feasible x and y,

    b′y ≥ y′Ax = x′A′y ≥ c′x.    (4.13)

Therefore, for any feasible solution, the value of the primal is no larger than the value of the dual: b′y ≥ c′x. Perhaps surprisingly, when there is equality, the corresponding x and y are optimal in their respective programs.

Theorem 4.4: If x∗ and y∗ are feasible and satisfy c′x∗ = b′y∗, then x∗ is optimal in the primal and y∗ is optimal in the dual.
P ROOF : To see this, observe that from the inequalities above in Equation 4.13, for all feasible x, c′ x ≤ b′ y∗ = c′ x∗ , so that x∗ gives at least as high a payoff
as any feasible x. Similarly, for all feasible y, b′ y ≥ c′ x∗ = b′ y∗ , so that y∗ is optimal for the dual.
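Theorems 4.3 and 4.4 are straightforward to check numerically. The sketch below uses the primal-dual pair from Remark 4.1 (max 2x + 2y versus min 4w + 4v + 6z) and a couple of hand-picked feasible points.

```python
import numpy as np

# Primal/dual pair from Remark 4.1: max 2x + 2y  vs  min 4w + 4v + 6z.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([4.0, 4.0, 6.0])
c = np.array([2.0, 2.0])

def primal_feasible(x):
    return np.all(A @ x <= b + 1e-12) and np.all(x >= 0)

def dual_feasible(y):
    return np.all(A.T @ y >= c - 1e-12) and np.all(y >= 0)

x = np.array([2.0, 3.0])         # an arbitrary feasible primal point
y = np.array([0.0, 0.0, 2.0])    # the optimal dual point from the remark
assert primal_feasible(x) and dual_feasible(y)
assert b @ y >= c @ x            # weak duality (Theorem 4.3): 12 >= 10

x_star = np.array([3.0, 3.0])    # an optimal primal point
assert primal_feasible(x_star)
assert np.isclose(c @ x_star, b @ y)   # equal values: both optimal (Theorem 4.4)
print(c @ x_star, b @ y)               # 12.0 12.0
```

Any feasible primal point gives a value no larger than any feasible dual value, and equality certifies optimality on both sides.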
For programs that have a solution, the equality c′x∗ = b′y∗ always holds, as stated in the next theorem, Theorem 4.5. The proof makes use of Farkas' lemma. Farkas' lemma asserts that if Q is an m × n matrix and α an m × 1 vector then either:

1. [Qv ≤ α and v ≥ 0] has a solution, or

2. [Q′z ≥ 0, α′z < 0, and z ≥ 0] has a solution,

but not both.

Theorem 4.5: If x∗ and y∗ are solutions to the primal and dual respectively, then c′x∗ = b′y∗.

PROOF: From Theorem 4.3, b′y∗ ≥ c′x∗. Let δ = b′y∗, so it is necessary to confirm that c′x∗ = δ. For ε > 0, there is no solution to the system:

    b′y ≤ δ − ε,  A′y ≥ c,  y ≥ 0,

otherwise, y∗ would not be optimal in the dual program. Rearranging, there is no y ≥ 0 with:

    [ b′ ; −A′ ] y ≤ [ δ − ε ; −c ]

(the matrix on the left stacks the row b′ on top of −A′, and the vector on the right stacks δ − ε on top of −c). In the statement of Farkas' lemma (point 1), let

    Q = [ b′ ; −A′ ],   α = [ δ − ε ; −c ],

so that from Farkas' lemma (point 2), this implies that there is ẑ ≥ 0 with (b : −A)ẑ ≥ 0 and (δ − ε, −c′)ẑ < 0. Let ẑ = (z0, z), so that these expressions say that there is a (z0, z) with:

    b z0 − Az ≥ 0,  (δ − ε)z0 − c′z < 0,  (z0, z) ≥ 0.

Next, observe that z0 > 0. For if not (z0 = 0), then −Az ≥ 0 and −c′z < 0, or equivalently, Az ≤ 0 and c′z > 0. And if this were so, x∗ would not be optimal for the primal (since x∗ + z would be feasible, A(x∗ + z) ≤ b, (x∗ + z) ≥ 0, and would yield a higher payoff than x∗: c′(x∗ + z) > c′x∗). Therefore, since z0 > 0, in the expressions b z0 − Az ≥ 0 and (δ − ε)z0 − c′z < 0, divide by z0 to get:

    b − Az̃ ≥ 0 and (δ − ε) − c′z̃ < 0,

where z̃ = z(1/z0). Rearranging, there is z̃ ≥ 0 with

    Az̃ ≤ b and c′z̃ > δ − ε.

Hence, the solution to the primal program, x∗, satisfies c′x∗ > δ − ε for all ε > 0, so that c′x∗ ≥ δ. Therefore, δ = b′y∗ ≥ c′x∗ ≥ δ, so that b′y∗ = c′x∗.
For the next theorem, let FP = {x | Ax ≤ b, x ≥ 0} and FD = {y | A′y ≥ c, y ≥ 0}, the sets of feasible points for the primal and dual. Theorem 4.6 asserts that if there is no feasible solution for the primal (FP = ∅, the empty set), then there is no optimal solution for the dual: if FD ≠ ∅, then the dual program is unbounded.

Theorem 4.6: Suppose that FP = ∅, so that there is no x feasible for the primal. Then there is no ŷ ∈ FD satisfying b′ŷ ≤ b′y for all y ∈ FD.

PROOF: Suppose that FP = ∅, so Ax ≤ b for some x ≥ 0 has no solution. This implies, by Farkas' lemma, that there is a ỹ with A′ỹ ≥ 0, ỹ ≥ 0 and β = b′ỹ < 0. If FD = ∅, there is nothing to prove. If FD ≠ ∅, then A′y ≥ c and y ≥ 0 has a solution, ŷ: A′ŷ ≥ c and ŷ ≥ 0. Then, for any δ > 0, ŷ + δỹ is feasible: A′(ŷ + δỹ) ≥ c and ŷ + δỹ ≥ 0. Furthermore, b′(ŷ + δỹ) = b′ŷ + δb′ỹ = b′ŷ + δβ. This can be made arbitrarily small (a large negative value) by choosing δ sufficiently large.

The argument is symmetric if FD = ∅: then either FP = ∅, or FP ≠ ∅ and the primal program, P, has no upper bound on the objective. It is useful to summarize the possible cases in terms of feasibility:

1. FP = ∅ and FD = ∅.
2. FP = ∅ and FD ≠ ∅.
3. FP ≠ ∅ and FD = ∅.
4. FP ≠ ∅ and FD ≠ ∅.

In case 1, there is nothing to add. In case 2, according to Theorem 4.6, the dual program is unbounded. Case 3 is the same as case 2, except now the primal program is unbounded. Finally, considering case 4, since b′y ≥ c′x for all x ∈ FP and y ∈ FD, both programs are bounded.

Thus, if either program has an optimal solution, then so does the other, and the objective values are equal: if x∗ solves the primal and y∗ solves the dual, then c · x∗ = b · y∗. If one program is not feasible, the other has no optimal solution.
EXAMPLE 4.11: Let

    A = [ 1  0
          0 −1 ],   b = ( b1, b2 ),   c = ( c1, c2 ).

Then Ax ≤ b has no nonnegative solution x if b1 < 0; and A′y ≥ c has no nonnegative solution y if c2 > 0. In this case, FP = FD = ∅. If b1, b2 and c2 are all positive, then FP ≠ ∅, while FD = ∅. ♦
REMARK 4.3: Complementary slackness follows from the following considerations. Recall that y feasible for the dual and Ax ≤ b give y′Ax ≤ y′b, and x feasible for the primal and A′y ≥ c give x′A′y ≥ x′c, so b′y ≥ y′Ax = x′A′y ≥ c′x. Since, at a solution pair (x̂, ŷ), c′x̂ = b′ŷ, the previous inequalities imply b′ŷ = ŷ′Ax̂ = x̂′A′ŷ = c′x̂. From this, ŷ′(b − Ax̂) = 0 and x̂′(c − A′ŷ) = 0.
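Both conditions are easy to verify at an optimal pair. The sketch below reuses the pair from Remark 4.1 (our choice of example): a positive dual price pairs with a binding primal constraint, and a slack constraint carries a zero dual price.

```python
import numpy as np

# Optimal pair from Remark 4.1: x_hat solves the primal, y_hat the dual.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([4.0, 4.0, 6.0])
c = np.array([2.0, 2.0])
x_hat = np.array([3.0, 3.0])
y_hat = np.array([0.0, 0.0, 2.0])

primal_slack = b - A @ x_hat    # (1, 1, 0): only constraint 3 binds
dual_slack = A.T @ y_hat - c    # (0, 0): both dual constraints bind

# y_hat'(b - A x_hat) = 0: a slack primal constraint has zero dual price.
assert np.isclose(y_hat @ primal_slack, 0.0)
# x_hat'(c - A' y_hat) = 0: a positive primal variable has a binding dual constraint.
assert np.isclose(x_hat @ dual_slack, 0.0)
print(primal_slack, dual_slack)
```

Either inner product being zero forces each term to vanish, since all factors are nonnegative.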
4.3.1 Duality and dual prices

Theorem 4.4 may be used to provide insight on dual prices. Note that in examples such as Example 4.9 in Section 4.1.6, changing the right-hand sides a "small" amount leaves the solution at the intersection of the same constraints – with the solution point moved to the new intersection. For example, changing the RHS of constraint 2 in program B from b2 = 17 to b2 + ∆b2 = 18 moved the solution from (2, 3) (defined by the intersection of 2x + 3y = 13 and 4x + 3y = 17) to (15/6, 16/6) (defined by the intersection of 2x + 3y = 13 and 4x + 3y = 18). Furthermore, considering the dual program, program A, the slopes of the constraints are −4/2 = −2 and −3/3 = −1, while the slope of the objective is −17/13. So changing the coefficient on the objective (from b2 = 17 to b2 = 18) leaves the objective slope between −1 and −2 and the solution unchanged at (7/6, 1/6).

More generally, if the parameters b in the dual objective do not define objective isocost lines that are parallel to any of the constraints, then small changes in b lead to no change in the optimal solution of the dual. So, given b with solution y∗ to the dual program, if some bi is changed to bi + ∆bi, then the solution to the dual remains unchanged at y∗ and the value of the dual objective at the optimal solution to the new program is Σ_{j≠i} bj yj∗ + (bi + ∆bi) yi∗. Turning to the primal, with initial RHS, b, the optimal solution is x∗. Changing the RHS of the ith equation to bi + ∆bi determines a new solution with each xj∗ shifted by a small amount – so that xj∗ moves to xj∗ + ∆xj, say. From Theorem 4.4,

    c1 x1∗ + c2 x2∗ + ··· + cn xn∗ = b1 y1∗ + b2 y2∗ + ··· + bm ym∗

and after the RHS change:

    c1 (x1∗ + ∆x1) + c2 (x2∗ + ∆x2) + ··· + cn (xn∗ + ∆xn) = b1 y1∗ + b2 y2∗ + ··· + (bi + ∆bi) yi∗ + ··· + bm ym∗.

Using the pre-change equality:

    c1 ∆x1 + c2 ∆x2 + ··· + cn ∆xn = ∆bi yi∗.

Dividing by ∆bi:

    c1 (∆x1/∆bi) + c2 (∆x2/∆bi) + ··· + cn (∆xn/∆bi) = yi∗.

Writing the left side as ∆π/∆bi, this gives

    ∆π/∆bi = yi∗.

So, for the "regular" case, the change in the objective resulting from a change in the RHS is given by the corresponding dual price.
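The relation ∆π/∆bi = yi∗ can be illustrated numerically with program B (coefficients as read from the example here: max 3x + 4y subject to 2x + 3y ≤ 13 and 4x + 3y ≤ 17, both binding at the optimum; treat the data as an assumption). Perturbing each RHS by one unit reproduces the dual prices 7/6 and 1/6.

```python
def solve2(a11, a12, a21, a22, b1, b2):
    """Solve the 2x2 system [[a11, a12], [a21, a22]] (x, y)' = (b1, b2)'."""
    D = a11 * a22 - a12 * a21
    return (b1 * a22 - a12 * b2) / D, (a11 * b2 - b1 * a21) / D

# Program B (assumed data): max 3x + 4y s.t. 2x + 3y <= 13, 4x + 3y <= 17.
pi = lambda x, y: 3 * x + 4 * y

x0, y0 = solve2(2, 3, 4, 3, 13.0, 17.0)   # both constraints bind at the optimum
print((x0, y0), pi(x0, y0))               # (2, 3) with objective 18

# Raise each RHS by one unit: the objective change is the dual price.
x1, y1 = solve2(2, 3, 4, 3, 14.0, 17.0)
x2, y2 = solve2(2, 3, 4, 3, 13.0, 18.0)
print(pi(x1, y1) - pi(x0, y0))            # 7/6 = DP1B
print(pi(x2, y2) - pi(x0, y0))            # 1/6 = DP2B
```

The perturbed solutions stay at the intersection of the same two constraints, which is exactly the "regular" case assumed in the derivation above.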
Exercises for Chapter 4
Exercise 4.1
You have a choice of two foods to eat per day. Food A contains 3
grams of vitamin A per ounce, 4 grams of vitamin C and 1 gram of vitamin E. Food
B contains 1 gram of vitamin A, 5 grams of vitamin C and 3 grams of vitamin E. The minimum daily requirements of vitamins A, C and E are 22, 66 and 20 grams respectively. You are also on a diet, so you want to minimize your calorie intake. An ounce of food A has 30 calories and an ounce of food B has 20. (a) Formulate the linear programming problem. (b) What are the vertices of the convex set bounded by the constraints? (c) At what point do you minimize your calorie intake subject to the nutritional requirements? (d) Assume that each calorie you consume costs you $2. What would you pay for a vitamin pill that would reduce the amount of vitamin A you need from food to 20 grams? Exercise 4.2
For the following systems of constraints, determine and plot the
feasible regions.
(a) 4x + 3y ≥ 12
    x − y ≤ 0
    x ≥ 0, y ≥ 0

(b) x + y − 6 ≤ 0
    2x + y − 8 ≤ 0

(c) 2x + y ≥ 50
    x + 2y ≥ 40
    x ≥ 0, y ≥ 0

(d) x + 2y ≤ 12
    6x + 4y ≤ 48
    −x + 2y ≤ 8
    x ≥ 0, y ≥ 0

(e) 3x + y ≥ 12
    6x + 5y ≥ 30
    x + 2y ≥ 14
    x ≥ 0, y ≥ 0

(f) 20x + 15y ≤ 60,000
    4x + 9y ≤ 25,200
    4x + 6y ≤ 18,000
    30x + 20y ≤ 87,000

Exercise 4.3 Solve the following linear programming problems:

max 5x + 4y
such that:
    x + 2y ≤ 12
    6x + 4y ≤ 48
    −x + 2y ≤ 8
    x, y ≥ 0,

min x + 3y
such that:
    3x + y ≥ 12
    6x + 5y ≥ 30
    x + 2y ≥ 14
    x, y ≥ 0.
Part I. In the maximization problem, calculate the change in the value of the objective as a result of: (a) Changing the RHS (righthand side) from 12 to 10, and from this deduce the dual price associated with the first constraint. (b) Changing the RHS from 48 to 40, and from this deduce the dual price associated with the second constraint. (c) Changing the RHS from 8 to 7, and from this deduce the dual price associated with the third constraint. Part II. In the minimization problem, calculate the change in the value of the objective as a result of: (d) Changing the RHS from 12 to 11, and from this deduce the dual price associated with the first constraint.
(e) Changing the RHS from 30 to 32, and from this deduce the dual price associated with the second constraint. (f) Changing the RHS from 14 to 15, and from this deduce the dual price associated with the third constraint. Part III. For both problems, introduce slack variables and write the programs as equality-constrained problems. In both cases: (g) Identify all the basic solutions. (h) Identify all the basic feasible solutions.

Exercise 4.4 Consider the following problem:

max x + y
subject to:
    3x + 5y + 8z ≤ 30
    4x + 5y − 10z ≤ 20
    x, y, z ≥ 0.
(a) Reformulate the problem using equalities. (b) What are all the basic solutions? (c) What are all the basic feasible solutions? (d) What is the optimal basic feasible solution? Exercise 4.5 Consider the following linear program: max
x+ y
subject to: 2 x + 4 y ≤ 12
(4.1)
4 x + 2 y ≤ 12
(4.2)
x, y ≥ 0. (a) Plot the constraints and solve the program. (b) Write the program in equality constraint form ( Az = b) by introducing slack variables, s1 and s2 .
(c) The "A" matrix in (b) has two rows and four columns. Write down all the submatrices having two rows and two columns. There are six such matrices: A1, A2, ..., A6. Find the six basic solutions by setting two of the four variables (x, y, s1, s2) to zero (e.g. for A1 = [2 4; 4 2], solve A1 (x, y)′ = (12, 12)′, corresponding to s1 = s2 = 0).
(d) Locate each of the six solutions found in (c) on a graph. Each solution has the form ( x∗ , y∗ , s∗1 , s∗2 ), but in all six cases the ( x∗ , y∗ ) differ – so you can see where each solution is on your x– y graph. Now eliminate those solutions that contain a negative term and relate the remaining solutions to the feasible region. Exercise 4.6 Consider the following linear program: max
Π = 2x + 2 y
subject to:
2x + 4y ≤ 16
3x + y ≤ 9
x ≥ 0, y ≥ 0. (a) Find the optimal values of x and y. (b) What is the value of the objective at the (optimal) solution? (c) What are the values of the slack (surplus) variables at the solution? (d) Suppose that in the second constraint, 9 is replaced by 10. How are your answers in (a) and (b) affected? Exercise 4.7 For the following linear program: min
C = 4x + 3 y
subject to:
3 x + 2 y ≥ 158
5 x + 2 y ≥ 200
x + 3 y ≥ 90
x ≥ 0, y ≥ 0. (a) Find the optimal values of x and y.
(b) What is the value of the objective at the (optimal) solution? (c) What are the values of the slack (surplus) variables at the solution? (d) Calculate the dual prices associated with each constraint. Exercise 4.8 For the following linear program: max subject to:
Π = 10 x + 24 y
x + 2 y ≤ 120
x + 4 y ≤ 180
x ≥ 0, y ≥ 0. (a) Find the optimal values of x and y.
(b) What is the value of the objective at the (optimal) solution? (c) What are the values of the slack (surplus) variables at the solution? (d) Calculate the dual prices associated with each constraint.
Exercise 4.9 Consider the following linear program: max 2 x + 5 y subject to: 4 x + 1 y ≤ 24
(4.1)
1 x + 3 y ≤ 21
(4.2)
1x + 1 y ≤ 9
(4.3)
x, y ≥ 0. (a) Solve the linear program – find the ( x, y) pair that solves the problem. (b) Find the slack (surplus) values. (c) What is the effect on the value of the objective if the third constraint is replaced by 1 x + 1 y ≤ 8? From this, give the dual price associated with the third constraint. (d) Calculate the dual prices for the other two constraints, (1) and (2). (e) Would the optimal values of the variables ( x, y) be different if the objective were 1 x + 5 y? Exercise 4.10 Consider the linear program: max subject to:
3x + 2 y
4 x + 2 y ≤ 14
4 x + 3 y ≤ 17
6 x + y ≤ 19
x ≥ 0, y ≥ 0. (a) Plot the constraints. (b) Find the optimal solution, ( x, y).
(c) Calculate the dual prices associated with each constraint. (d) Find all the basic solutions. (e) Identify all the basic feasible solutions. Exercise 4.11 Consider the linear program: max subject to:
3x + 4 y
3 x + y ≤ 16
x + 3 y ≤ 16 x+ y ≤6
x ≥ 0, y ≥ 0. (a) Plot the constraints. (b) Find the optimal solution, ( x, y).
(c) Calculate the dual prices associated with each constraint. (d) Find all the basic solutions. (e) Identify all the basic feasible solutions. Exercise 4.12 Consider the following linear program: max subject to:
3x + 4 y 2 x + 3 y ≤ 13
(4.1)
3 x + 4 y ≤ 17
(4.2)
x, y ≥ 0. Plot the program and solve, giving the slack variables, objective value and dual prices. Exercise 4.13 Consider the following linear programming problem. max
3x + 2 y
subject to:
x + y ≤ 12 2 x + y ≤ 20
x ≥ 0, y ≥ 0. (a) Plot the constraints and the feasible region for this linear programming problem. (b) Solve for the optimal choice of x and y. Exercise 4.14 Consider the following linear program: max subject to:
x+ y 7 x + 2 y ≤ 28
(4.1)
4 x + 3 y ≤ 24
(4.2)
2 x + 6 y ≤ 42
(4.3)
x, y ≥ 0. Plot the program and solve, giving the slack variables, objective value and dual prices. Exercise 4.15 Consider the problem of maximizing x y subject to the constraint
x2 + y2 ≤ 2 and x ≥ 0, y ≥ 0. Let the feasible region be the ( x, y) pairs satisfying x2 + y2 ≤ 2 and x ≥ 0, y ≥ 0.
(a) Plot the feasible region. (b) Plot the set of ( x, y) pairs such that (i) x y = 1, (ii) x y = 2 and (iii) x y = 3.
(c) Indicate the solution on a graph and find the solution. (d) How does your answer change if the constraint changes to x2 + y2 ≤ 3? Exercise 4.16 (a) Consider the linear program: max 3 x + 5 y subject to:
x + 2 y ≤ 10
(4.1)
2 x + 3 y ≤ 17
(4.2)
3 x + 2 y ≤ 24
(4.3)
x ≥ 0, y ≥ 0. (i) Plot the constraints. (ii) Find the optimal solution, ( x, y). (iii) Calculate the dual prices associated with each constraint. (b) Consider the linear program: max subject to:
x+ y 2x + y ≤ 6
x + 2y ≤ 6 x ≥ 0, y ≥ 0. (i) Find all the basic solutions. (ii) Identify all the basic feasible solutions.
5 Functions of one variable
5.1 Introduction
Functions arise naturally in the study of economic problems. For example, a demand function relates quantity demanded to price. Properties of such functions are routinely studied – the slope and elasticity measure responsiveness of demand to price variations. The techniques presented here introduce the tools required for such calculations. Section 5.2 describes a variety of functions arising in different areas (taxation, finance and present value calculations). Section 5.3 introduces the basic notions of continuity and differentiability of a function. Certain families of functions, such as logarithmic, polynomial and exponential functions, appear frequently in economic applications; some of these are introduced in Section 5.4 and their shapes described. Growth rates and the algebra of growth calculations are developed in Section 5.5. Finally, Section 5.6 provides rules for differentiation when the function is itself formed from other functions (such as the ratio or product of functions).

A function, f, from a set X to a set Y is a rule associating with each element of the set X some element in the set Y. This is written f : X → Y. In words, to each
point in X , f associates a point, y = f (x) in Y . The set X is called the domain
and the set Y is called the range. The inverse demand function, relating price to
quantity demanded, p(Q), is a familiar example. The domain is the set of nonnegative numbers, R+ (quantity must be nonnegative), and the range is also the set of nonnegative numbers (price is nonnegative). Thus, p : R+ → R+.
5.2 Some examples The following examples show a variety of ways in which functions arise in economics.
5.2.1 Demand functions Consumer demand theory describes an individual in terms of utility derived from consumption of a vector of goods (x1 , x2 ,... , x n ). With a fixed income y and price p i for good i, utility maximization leads to choices for the goods that depend on prices and income: x i (p1 , p2 ,... , p n , y). See Figure 5.1 for the twogood case. In the twogood case, utility maximization leads to the selection of (x1∗ , x2∗ ). The values of x1∗ and x2∗ depend on the prices (p1 , p2 ) and income, y. These are called the demand functions and are written x1 (p1 , p2 , y) and x2 (p1 , p2 , y).
Figure 5.1: Choices determine demand
With J individuals, write the demand for good i by individual j as x_i^j(p_1, p_2, ..., p_n, y^j), where y^j is the income of individual j. Then the aggregate demand for good i is:

$$X_i(p_1, \dots, p_n, y^1, \dots, y^J) = \sum_{j=1}^{J} x_i^j(p_1, \dots, p_n, y^j).$$
The function X_i depends on the prices and the incomes of each individual. If we consider the case where all prices other than that of good i are fixed, p_j = p̄_j for j ≠ i, and all incomes are fixed, y^j = ȳ^j, then the aggregate demand is:

$$D_i(p_i) = X_i(\bar p_1, \dots, \bar p_{i-1},\, p_i,\, \bar p_{i+1}, \dots, \bar p_n,\, \bar y^1, \dots, \bar y^J).$$

For simplicity, ignore the subscripts and write D(p) for the aggregate demand as a function of own price. With this, one may study the behavior of aggregate demand as own price varies. Writing Q for the quantity demanded, Q = D(p).
This gives aggregate demand for the good as a function of the good's price. With this, the responsiveness of demand to price variation may be considered. Consider a price p′ ≠ p with corresponding demand Q′: the variation in quantity resulting from the price variation is Q′ − Q = D(p′) − D(p). Thus, for example, the change in aggregate quantity demanded per unit change in price is given by

$$\frac{Q' - Q}{p' - p} = \frac{D(p') - D(p)}{p' - p},$$

the slope of the demand function. Similarly, the percentage change in quantity resulting from a percentage change in price is given by:

$$\frac{(Q' - Q)/Q}{(p' - p)/p} = \frac{[D(p') - D(p)]/D(p)}{(p' - p)/p} = \frac{p}{D(p)} \cdot \frac{D(p') - D(p)}{p' - p}.$$

The negative of this term gives the elasticity of demand. Note that the elasticity of demand is computed at a point on the demand curve: at different points, the value of the elasticity will vary. Thus, we may write:

$$\epsilon(p) = -\left[\frac{p}{D(p)}\right]\left[\frac{D(p') - D(p)}{p' - p}\right].$$
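As an illustration, the slope and elasticity formulas can be evaluated numerically. The sketch below uses a hypothetical linear demand function, D(p) = 100 − 2p, which is not from the text and is chosen only so the answers are easy to verify by hand.

```python
# A numerical sketch of the slope and elasticity formulas, using a
# hypothetical linear demand function D(p) = 100 - 2p (not from the text).
def D(p):
    return 100.0 - 2.0 * p

def slope(D, p, dp=1e-6):
    """(D(p') - D(p)) / (p' - p), with p' = p + dp."""
    return (D(p + dp) - D(p)) / dp

def elasticity(D, p, dp=1e-6):
    """epsilon(p) = -(p / D(p)) * [D(p') - D(p)] / (p' - p)."""
    return -(p / D(p)) * slope(D, p, dp)

p = 10.0
print(round(slope(D, p), 4))        # -2.0: the demand curve's slope
print(round(elasticity(D, p), 4))   # 0.25 = (10 / 80) * 2
```

At p = 10 demand is 80, so the elasticity is (10/80) × 2 = 0.25; at other points on the same curve it takes different values, as the text notes.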
5.2.2 Present value and the interest rate
The present value of a cash flow may be viewed as a function of the interest rate. A bond is characterized by a face value P, an interest rate ρ and a maturity date T. It pays the holder a fixed payment f = ρP each time period (such as every six months or every year), and the face value P at the end of the Tth period. Thus, the cash flow generated by the bond is (f, f, f, ..., f, P), where f is paid at the end of periods 1, 2, up to T − 1. What is the value of this cash flow?

If the market interest rate is r, x dollars invested today for one period is worth y = x(1 + r) at the end of the period. Put differently, to have y dollars at the end of the period, x = y/(1 + r) dollars should be invested now. How much should one pay
for a payoff of y dollars one period from now? If the cost is c < x, borrowing c will lead to a payoff y next period and a debt of c(1 + r). Since c(1 + r) < x(1 + r) = y, this generates a positive net payoff. Similarly, if c > x the loan repayment in one period will exceed the payoff: c(1 + r) > y. So, c < x undervalues the return y in one period, and c > x overvalues it. For market equilibrium, c = x = y/(1 + r). This is the current value of y dollars one period from now: the market value of the return y in one period is y/(1 + r). The same reasoning shows that the current value of a payoff y two periods from now is y/(1 + r)², and t periods from now it is y/(1 + r)^t. Therefore, the flow (f, f, f, ..., f, P) has current market value of:

$$V = \frac{f}{1+r} + \frac{f}{(1+r)^2} + \cdots + \frac{f}{(1+r)^{T-1}} + \frac{P}{(1+r)^T} = f\left[\frac{1}{1+r} + \frac{1}{(1+r)^2} + \cdots + \frac{1}{(1+r)^{T-1}}\right] + \frac{P}{(1+r)^T}.$$
One simple case occurs when the payments continue indefinitely (and there is no end period at which P would be paid). In this case,

$$V = f\left[\frac{1}{1+r}\right]\left[1 + \frac{1}{1+r} + \frac{1}{(1+r)^2} + \cdots\right].$$

Using the fact that for 0 < α < 1, 1 + α + α² + ··· = 1/(1 − α), with α = 1/(1 + r) the infinite sum becomes

$$\frac{1}{1 - \frac{1}{1+r}} = \frac{1+r}{r}.$$

Therefore, V = f · [1/(1 + r)] · [(1 + r)/r] = f/r (Figure 5.2). At the time the bond is issued, the rate on the bond, ρ, will typically equal the market rate r. Afterwards, while the bond rate is fixed, the market rate will vary, causing the value of the cash flow provided by the bond to vary. Write V(r) to denote this dependence, the value of the bond as a function of the current market interest rate: V(r) = f/r.
From this, one may, for example, calculate the impact on the value of the bond of a change in the market interest rate as:

$$\Delta V = V(r') - V(r) = \frac{f}{r'} - \frac{f}{r} = f\left[\frac{1}{r'} - \frac{1}{r}\right] = f\left[\frac{r - r'}{r\,r'}\right].$$
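The finite-horizon and perpetuity valuation formulas are easy to sketch numerically. The parameter values below (face value, bond rate, maturity and the market rates) are hypothetical, chosen only for illustration.

```python
# Sketch of the bond-valuation formulas above; all parameter values are
# hypothetical (P = 100, rho = 5%, T = 10).
P, rho, T = 100.0, 0.05, 10        # face value, bond rate, maturity
f = rho * P                         # the fixed payment each period

def bond_value(f, P, T, r):
    """V = f/(1+r) + ... + f/(1+r)^(T-1) + P/(1+r)^T."""
    return sum(f / (1 + r) ** s for s in range(1, T)) + P / (1 + r) ** T

def perpetuity_value(f, r):
    """With payments continuing indefinitely, V(r) = f / r."""
    return f / r

print(round(perpetuity_value(f, 0.05), 6))                    # 100.0
# A rise in the market rate lowers the value of the fixed cash flow:
print(bond_value(f, P, T, 0.08) < bond_value(f, P, T, 0.05))  # True
```

When the market rate equals the bond rate, the perpetuity is worth its face value; as r rises above ρ, V(r) = f/r falls, matching the ΔV expression above.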
5.2.3 Taxation

In a market with supply and demand curves S(p) and D(p), the introduction of a sales tax creates a wedge between the price paid and the price received. How do these changes relate to the pretax price and quantity? With demand D(p) and supply S(p), the market clearing price is p_0, determined as the solution to D(p_0) = S(p_0).
Figure 5.2: The present value function
Suppose now that an ad valorem tax is introduced, so that if the seller receives p_t, the buyer pays (1 + t)p_t. Let t_0 = 0 be the initial tax (no tax) and t the tax imposed, so the tax increment is Δt = t − t_0 = t. Now, market clearing requires that the amount supplied at price p_t equals the amount demanded at price (1 + t)p_t: D((1 + Δt)p_t) = S(p_t). With Q(0) = D(p_0) = S(p_0) and Q(t) = D((1 + Δt)p_t) = S(p_t), the variation in quantity traded as a result of the tax increment is Q(t) − Q(0) < 0, and the rate of change of quantity traded relative to the tax variation is given by [Q(t) − Q(0)]/Δt.

From Figure 5.3, the magnitude of this depends on the slopes of the supply and demand curves. Similarly, the extent to which p_t lies below p_0 and (1 + t)p_t lies above p_0 depends on the slopes of the curves and determines the incidence of the tax (which party, supplier or demander, pays most of the tax).
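The market-clearing condition with a tax wedge, D((1 + t)p_t) = S(p_t), can be solved numerically. The sketch below uses hypothetical linear curves, D(p) = 100 − p and S(p) = 4p (not from the text), and a simple bisection on the seller's price.

```python
# Solving D((1+t) * p_t) = S(p_t) by bisection, for hypothetical linear
# curves D(p) = 100 - p and S(p) = 4p.
def D(p):
    return 100.0 - p

def S(p):
    return 4.0 * p

def seller_price(t, lo=0.0, hi=100.0):
    """Bisect on p_t until demand at the buyer price (1+t)p_t equals supply."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if D((1 + t) * mid) > S(mid):
            lo = mid          # excess demand: the price must rise
        else:
            hi = mid
    return (lo + hi) / 2

p0 = seller_price(0.0)            # no-tax price: 100 - p = 4p gives p0 = 20
pt = seller_price(0.10)           # with a 10% tax the seller receives less
print(round(p0, 4))                           # 20.0
print(pt < p0 < (1 + 0.10) * pt)              # True: p_t < p0 < (1+t)p_t
```

The seller's price falls below the pretax price while the buyer's price rises above it, and how the burden splits depends on the two slopes, as the text describes.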
5.2.4 Options
At any point in time, a stock trades at some price p, which varies over time. Derivative assets can be created as functions of the stock price: f(p). For example, consider a contract offering f(p) dollars if the stock price is p dollars or more at the end of trading, and 0 otherwise; if f(p) = 5, the contract pays $5 if the underlying stock has a price of at least p at the end of trading. Such derivative assets are called "options". In finance, an option gives the owner the right to buy or sell an asset. An option giving the right to purchase an asset at some specified price, p*, called the
Figure 5.3: The impact of taxation in a market
exercise or strike price, at (or before) a specified date T, is a call option. An option giving the right to sell is called a put option. When the right can be exercised only at the date T, the option is a European option. When the right can be exercised on or before the date T, the option is an American option. Such options define assets that are functions of the underlying assets and so are called derivative assets.

Suppose that the strike price of a European call option is p̄ and that p̃ is the (as yet unknown) stock price at period T (in the future). Then, if the price p̃ turns out to be above p̄, the owner of the call option can exercise the right to purchase one unit of the stock at price p̄. That unit of stock can be sold on the market at price p̃, so the net gain is p̃ − p̄. If the price p̃ turns out to be below p̄, the owner of the call option will choose not to exercise the purchase right, and the net benefit is 0. So, the owner of the option will get a return of p̃ − p̄ if p̃ > p̄, and 0 otherwise. Therefore, the return per call option owned is φ(p̃) = max{p̃ − p̄, 0}, the larger of p̃ − p̄ and 0 (max{x, y} equals the larger of the two numbers x and y). If the individual owns k such options, then k units of the stock may be purchased at price p̄ and so the return is then kφ(p̃). Figure 5.4 depicts the return of a call and of a put option.

In contrast, a put option gives the right to sell a unit of the asset at a (strike) price, p̂ say. In this case, net benefit arises when the price turns out to be low, so that the asset may be sold at the (higher) strike price p̂. This has a return of θ(p̃) = max{p̂ − p̃, 0}. Such options may be combined to provide more complex return streams.
Figure 5.4: Payoffs of a call option, φ(p̃), with strike p̄, and a put option, θ(p̃), with strike p̂
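The two payoff functions are simple enough to compute directly. The strike prices below are hypothetical, chosen only to make the payoffs concrete.

```python
# The call and put payoff functions phi and theta from the text, evaluated at
# hypothetical strike prices (50 for both options).
def call_payoff(p_tilde, p_bar):
    """phi(p~) = max{p~ - p_bar, 0}: exercise only if the price exceeds the strike."""
    return max(p_tilde - p_bar, 0.0)

def put_payoff(p_tilde, p_hat):
    """theta(p~) = max{p_hat - p~, 0}: exercise only if the price is below the strike."""
    return max(p_hat - p_tilde, 0.0)

print(call_payoff(60.0, 50.0))      # 10.0
print(call_payoff(40.0, 50.0))      # 0.0: not exercised
print(put_payoff(40.0, 50.0))       # 10.0
k = 3                                # holding k options scales the return
print(k * call_payoff(60.0, 50.0))  # 30.0
```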
5.3 Continuity and differentiability

5.3.1 Preliminaries
Before discussing continuity and differentiability, it is necessary to give some background definitions, describe types of function and introduce the notion of distance between numbers.

Definition 5.1: Functions may be categorized in terms of surjectivity, injectivity and bijectivity.

• If f(X) = Y, where f(X) = {y : there is an x ∈ X with f(x) = y}, we say f is onto Y, or surjective. In words, for any point y ∈ Y, there is some x ∈ X with f(x) = y.

• The function f is called 1-1 (one-to-one), or injective, if f(x_1) = f(x_2) implies that x_1 = x_2; a 1-1 function cannot have the same value at two different points in the domain.

• If f is both 1-1 and onto, it is called bijective. If f is a bijective mapping, there exists a function g : Y → X, called the inverse of f, with g(f(x)) = x and f(g(y)) = y. The function g associates a unique point in X to each point in Y, and is written f⁻¹.
Example 5.1: With X = Y = [0, 1], f(x) = x² is surjective since for y ∈ Y there is an x with f(x) = y (namely x = √y); and injective because x ≠ x̄ and x, x̄ ∈ [0, 1] imply f(x) = x² ≠ x̄² = f(x̄). This function is bijective with inverse g(y) = f⁻¹(y) = √y. The function f(x) = 2x(1 − x) is not surjective because for y ∈ [0, 1] with y > 1/2 there is no x ∈ [0, 1] with f(x) = y; and it is not injective because, for example, f(1/3) = f(2/3) = 4/9. ♦
In the economic context, some of these ideas can play an important role. Consider the case where there are two goods, x_1 and x_2. Demand for good 1 is x_1(p_1, p_2). For the discussion, let p_2 be fixed (at say p̄_2) and let D(p_1) = x_1(p_1, p̄_2), giving the demand for good 1 as its price varies.

In Figure 5.5, an increase in the price of good 1 from p_1 to p_1′ raises the demand for good 1 from x_1 to x_1′. Such a good is called a "Giffen" good, and has the odd feature that the demand curve slopes upward in some regions. The second panel illustrates such a demand curve. Here, there are multiple prices that lead to the same level of demand: x_1 = D(p_1°) = D(p_1) = D(p_1″), so the demand equals x_1 at three different prices, p_1°, p_1 and p_1″. The demand function
Figure 5.5: Giffen good; a non-injective demand function
is not injective and we cannot associate a unique price to a demand level such as x1 . Technically, the demand function does not have an inverse – there is no inverse demand function, p(x1 ). Such a function cannot be defined, since, at x1 for example, there are multiple prices associated with this demand level (p◦1 , p1 and p′′1 ).
5.3.2 Inequalities and absolute value
Informally, a function f is continuous if when x is close to x′, f(x) is close to f(x′). Formally defining continuity requires a notion of closeness, that is, of the distance between different points or numbers. For example, considering the numbers 4, 6 and 9, 4 is closer to 6 than 9 is, since 6 − 4 = 2 while 9 − 6 = 3. Here, the subtraction is done so that the result is positive: if in the second calculation we write 6 − 9 = −3, this is smaller than 2 (−3 < 2), but 9 is not closer to 6 than 4 is. Clearly, it is the magnitude that matters here, and this is measured by the absolute value of a number. Comparing 9 − 6 = 3 and 6 − 9 = −3, the magnitude of the difference is 3; both 3 and −3 indicate that 9 and 6 are separated by three units.
When comparing two numbers, one can always say which one is larger or smaller, or else they are equal. The expressions "≥" and "≤" denote "greater than or equal to" and "less than or equal to", respectively. Thus, "a is at least as large as b" is written a ≥ b, while "a is (strictly) greater than b" is written a > b. Note that if a ≥ b, then −a ≤ −b: multiplying both sides of an inequality by −1 reverses the inequality (consider 7 > 5 and −7 < −5).
For a real number a, the expression |a| is called the absolute value of a, and is defined by |a| = a if a ≥ 0 and |a| = −a if a < 0:

$$|a| = \begin{cases} -a & \text{if } a < 0, \\ a & \text{if } a \ge 0. \end{cases}$$

In words, |a| equals a if a is nonnegative, and equals −a if a is negative. So, |a| is always nonnegative, and strictly positive unless a = 0.
Proposition 5.1: The absolute value operation satisfies a number of simple rules. For real numbers a, b and x:

1. If a ≥ 0, then |x| ≤ a if and only if −a ≤ x ≤ a;
2. |a + b| = |b + a|;
3. |a × b| = |a| |b|;
4. |a + b| ≤ |a| + |b|;
5. |a + b| ≥ |a| − |b|;
6. |a/b| = |a|/|b|, if b ≠ 0. ♦
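Before turning to the proof, these rules can be spot-checked numerically; the sample values in the sketch below are arbitrary.

```python
# Numerical spot-check of the absolute-value rules in Proposition 5.1,
# on arbitrary sample values a and b.
a, b = -7.5, 3.0
assert abs(a + b) == abs(b + a)                    # rule 2
assert abs(a * b) == abs(a) * abs(b)               # rule 3
assert abs(a + b) <= abs(a) + abs(b)               # rule 4 (triangle inequality)
assert abs(a + b) >= abs(a) - abs(b)               # rule 5
assert abs(a / b) == abs(a) / abs(b)               # rule 6
# Rule 1: for a bound >= 0, |x| <= bound iff -bound <= x <= bound.
x, bound = 2.5, 4.0
assert (abs(x) <= bound) == (-bound <= x <= bound)
print("all rules hold for these values")
```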
Proof: (1) From the definition of absolute value, −|x| ≤ x ≤ |x|, since x equals |x| or −|x|. If we assume |x| ≤ a, then −a ≤ −|x| ≤ x ≤ |x| ≤ a. Conversely, if we assume −a ≤ x ≤ a, then x ≥ 0 implies |x| = x ≤ a. On the other hand, if x < 0, then −a ≤ x implies −x ≤ a, and |x| = −x, so |x| ≤ a. Either way, |x| ≤ a. Thus, |x| ≤ a is equivalent to −a ≤ x ≤ a. Points (2) and (3) follow from the definition. Concerning (4), to see that |a + b| ≤ |a| + |b|, observe that −|a| ≤ a ≤ |a| and −|b| ≤ b ≤ |b|: these inequalities imply −(|a| + |b|) ≤ a + b ≤ (|a| + |b|), and so |a + b| ≤ |a| + |b|. To see that |a + b| ≥ |a| − |b|, using rule (4), note that |x| + |y| ≥ |x + y| and, rearranging, |x| ≥ |x + y| − |y|. Set x = a + b and y = −b in this expression to give the result |a + b| ≥ |a| − |b|, for any a, b. Finally, (6) follows from the definition.

Rules 4 and 5 generalize directly. Consider |∑_{i=1}^n x_i| ≤ ∑_{i=1}^n |x_i|. To confirm this, put a = x_1 and b = ∑_{i=2}^n x_i, so that |∑_{i=1}^n x_i| ≤ |x_1| + |∑_{i=2}^n x_i|, using rule 4. Repeat with |∑_{i=2}^n x_i| to give |∑_{i=1}^n x_i| ≤ |x_1| + |x_2| + |∑_{i=3}^n x_i|. Proceeding in this way gives |∑_{i=1}^n x_i| ≤ |x_1| + |x_2| + ··· + |x_n|.

Similarly, |∑_{i=1}^n x_i| ≥ |x_1| − |x_2| − |x_3| − ··· − |x_n|. Given ∑_{i=1}^n x_i, set a = x_1 and b = ∑_{i=2}^n x_i to get |∑_{i=1}^n x_i| ≥ |x_1| − |∑_{i=2}^n x_i|, using rule 5. The previous discussion gives |∑_{i=2}^n x_i| ≤ |x_2| + |x_3| + ··· + |x_n|, so −|∑_{i=2}^n x_i| ≥ −|x_2| − |x_3| − ··· − |x_n|. Therefore, |∑_{i=1}^n x_i| ≥ |x_1| − |∑_{i=2}^n x_i| ≥ |x_1| − |x_2| − |x_3| − ··· − |x_n|.
5.3.3 Continuity

Continuity of a function is the requirement that when points are close in the domain, the function should take those points to the same neighborhood in the range. This is formalized next.

Definition 5.2: Let x_1, x_2, ..., denoted {x_n}_{n≥1}, be a sequence of real numbers. We say that x_n converges to a real number c if, given any positive number ε > 0, however small, there is an integer n̄ such that |x_n − c| ≤ ε whenever n ≥ n̄.

This is written |x_n − c| → 0 as n → ∞, or x_n → c, or lim_{n→∞} x_n = c. Intuitively, |x_n − c| → 0 means that |x_n − c| gets "closer and closer" to 0 as n gets "larger and larger" (n → ∞).
Example 5.2: The following sequences illustrate convergence:

1. x_n = c + 1/n, so that |x_n − c| = 1/n → 0 as n → ∞; x_n converges to c.

2. x_n = c + (−1)^n. Thus x_n is equal to c + 1 when n is even and c − 1 when n is odd. In this case |x_n − c| = 1 for all n, and so x_n does not converge to c.

3. x_n = c + n, so that |x_n − c| = n, and so x_n does not converge to c. ♦
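The three sequences can be sampled at a large n to see the contrast; the sketch below takes c = 0 and n = 1000.

```python
# The three sequences of Example 5.2, with c = 0: only the first converges.
c = 0.0
n = 1000
x1 = c + 1.0 / n           # x_n = c + 1/n
x2 = c + (-1.0) ** n       # x_n = c + (-1)^n
x3 = c + n                 # x_n = c + n
print(abs(x1 - c))   # 0.001: shrinks toward 0 as n grows
print(abs(x2 - c))   # 1.0: |x_n - c| never falls below 1
print(abs(x3 - c))   # 1000.0: diverges
```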
Definition 5.3: A function f is continuous at c if, given ε > 0, there exists δ > 0 such that |f(x) − f(c)| < ε whenever |x − c| < δ.

In this case, write lim_{x→c} f(x) = f(c). As x gets "closer and closer" to c, f(x) gets "closer and closer" to f(c). Call f continuous if f is continuous at every c ∈ X.
Example 5.3: The following functions illustrate continuity:

1. X = (0, ∞), f(x) = 1/x is continuous; but f is not continuous if the domain is X = [0, ∞).

2. X = (−∞, ∞), f(x) = 2x³ + x² + 5 is continuous.

3. X = [0, 1], f(x) = x, if x

5.3.4 Differentiability

Consider an increasing function: if x′ > x, then f(x′) > f(x). If x′ is slightly larger than x, then f(x′) is slightly larger than f(x), and f(x′) − f(x) measures the increase in f resulting from the increase from x to x′. More generally, whether f is increasing or not, f(x′) − f(x) measures the change in f resulting from the change in x.
Definition 5.4: The function f is differentiable at x if, as x′ converges to x, [f(x′) − f(x)]/(x′ − x) converges to a number. Formally, f is differentiable at x if the following limit exists:

$$\lim_{x' \to x} \frac{f(x') - f(x)}{x' - x} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x} = \lim_{\epsilon \to 0} \frac{f(x + \epsilon) - f(x)}{\epsilon}.$$

If the limit exists it is denoted

$$\frac{d f(x)}{dx} \quad \text{or} \quad f'(x)$$

and is called the derivative of f. In Figure 5.6, f′(x) is equal to the slope of the tangent to the curve; [f(x + Δx) − f(x)]/Δx is equal to the slope of the hypotenuse of the triangle. As Δx becomes smaller, the two slopes get closer.

When the derivative is equal to 0 at some point x̄, the function is flat at that point. This is important in optimization, where one wishes to locate the value of x that makes f(x) as large as possible. The function f(x) = x(1 − x) has a plot similar to Figure 5.6 and has f′(x) = 0 at x = 1/2.
The following functions illustrate differentiability.
Example 5.4: If f(x) = x², then f(x + Δx) = (x + Δx)² = x² + 2xΔx + (Δx)², so that f(x + Δx) − f(x) = 2xΔx + (Δx)². Therefore

$$\frac{f(x + \Delta x) - f(x)}{\Delta x} = \frac{2x\Delta x + (\Delta x)^2}{\Delta x} = 2x + \Delta x.$$
Figure 5.6: The derivative
Since the limit of this as Δx goes to 0 is 2x, this shows that f′(x) = 2x. For example, take x = 1/2 = 0.5 and Δx = 0.05, so that x + Δx = 0.55. Then f(x + Δx) = f(0.55) = (0.55)² = 0.3025 and f(x) = f(0.5) = 0.25, so that

$$\frac{f(x + \Delta x) - f(x)}{\Delta x} = \frac{f(0.55) - f(0.5)}{0.05} = \frac{0.3025 - 0.25}{0.05} = \frac{0.0525}{0.05} = 1.05.$$

At x = 0.5 the derivative is f′(x) = 2x = 2(0.5) = 1, so the approximation is good at Δx = 0.05. ♦
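The difference-quotient calculation of this example is straightforward to reproduce, and shrinking Δx shows the quotient approaching the derivative.

```python
# The difference-quotient computation of Example 5.4 for f(x) = x^2 at x = 0.5.
def diff_quotient(f, x, dx):
    return (f(x + dx) - f(x)) / dx

f = lambda x: x * x
print(round(diff_quotient(f, 0.5, 0.05), 6))    # 1.05, as in the text
for dx in (0.005, 0.0005):
    print(round(diff_quotient(f, 0.5, dx), 6))  # 1.005, then 1.0005: nearing f'(0.5) = 1
```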
The next example gives a continuous function that does not have a derivative at x = 1/2 (see Figure 5.7).

Example 5.5: Let f(x) = min{x, 1 − x}, so that f(x) is the smaller of x and 1 − x. Take x = 1/2 and let x′ < 1/2, so that Δx = x′ − x < 0, f(x) = f(1/2) = 1/2 = x and f(x′) = x′ since x′ < 1/2. Now, observe that f(x + Δx) − f(x) = (x + Δx) − x = Δx, so that

$$\frac{f(x') - f(x)}{\Delta x} = \frac{\Delta x}{\Delta x} = 1.$$

Thus lim_{Δx→0} [f(x + Δx) − f(x)]/Δx = 1 when we take x′ < x (Δx < 0).

Now take x′ > x, so that x′ = x + Δx with Δx > 0. Then f(x′) = 1 − x′ = 1 − (x + Δx) = (1 − x) − Δx. Since f(x) = x = 1/2 = 1 − x,

$$\frac{f(x') - f(x)}{\Delta x} = \frac{f(x + \Delta x) - f(x)}{\Delta x} = \frac{[(1 - x) - \Delta x] - (1 - x)}{\Delta x} = \frac{-\Delta x}{\Delta x} = -1.$$
So the limit of [f(x′) − f(x)]/Δx is −1 when we take x′ > x. Another example of this sort is f(x) = |x|. (Check that f is not differentiable at x = 0.) ♦
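The two one-sided limits of the example can be seen numerically: the left and right difference quotients settle at different values, so no single limit exists at the kink.

```python
# One-sided difference quotients for f(x) = min(x, 1 - x) at the kink x = 1/2.
# The left quotient settles at +1 and the right at -1, so f'(1/2) does not exist.
f = lambda x: min(x, 1.0 - x)
x = 0.5
for dx in (0.1, 0.01, 0.001):
    left = (f(x - dx) - f(x)) / (-dx)    # take x' < x
    right = (f(x + dx) - f(x)) / dx      # take x' > x
    print(round(left, 6), round(right, 6))   # 1.0 -1.0 at every step
```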
Figure 5.7: Plot of f(x) = min{x, 1 − x}

Remark 5.1: It is always the case that differentiability of a function implies that the function is continuous. To see this, suppose that lim_{Δx→0} [f(x₀ + Δx) − f(x₀)]/Δx exists. As the denominator Δx approaches 0, for the ratio [f(x₀ + Δx) − f(x₀)]/Δx to have a limit requires that f(x₀ + Δx) − f(x₀) also approach 0. Therefore, lim_{Δx→0} [f(x₀ + Δx) − f(x₀)] = 0. The converse is not true. For example, |x| is continuous but not differentiable at 0.
Remark 5.2: Differentiability plays an important role in locating the maximum or minimum of a function. Consider f(x) = x(1 − x) = x − x², so f′(x) = 1 − 2x. Thus f′(x) is positive when x < 1/2, negative when x > 1/2 and equal to 0 at x = 1/2. The derivative of f measures the slope of f: the slope is positive to the left of 1/2, 0 at 1/2, and negative to the right of 1/2. For certain functions (such as this one), there is a unique point at which the derivative is 0 (x = 1/2 for the function f(x) = x(1 − x)). Note that this occurs at the highest point of f, so by solving f′(x) = 1 − 2x = 0 we find the value of x that maximizes f: 1 − 2x = 0 ⇒ x = 1/2. This suggests that the use of derivatives will be helpful in finding maxima (and minima) of functions.
5.4 Types of function

In principle, a function f is a rule that associates to each point x in the domain a point y in the range. In practice, it is useful to use specific functional forms that reflect specific patterns in the relationship between x and y.
Exponential functions: The exponential function is defined by f(x) = e^x, where e is a constant (known as Euler's number, e ≈ 2.718), discussed later. More generally, exponential functions have the functional form f(x) = a^x or f(x) = b a^{cx}, where a, b and c are constants.

Polynomial functions: A polynomial function is a function of the form f(x) = a_n x^n + a_{n−1} x^{n−1} + ··· + a_1 x + a_0, where {a_k}_{k=0}^n are constants.
Trigonometric functions: The three primary trigonometric functions are the sine, cosine and tangent functions: f(x) = sin(x), f(x) = cos(x), f(x) = tan(x). Trigonometric functions are functions of an angle x, which may be measured in degrees or radians. When measured in degrees, a 360° angle represents one full rotation in a circle. A radian is the angle such that 2π radians make up a full rotation in the circle. It is common to plot these functions from −π to π, after which they repeat, so in this case the unit of measure is radians.
Logarithmic functions: The logarithm of a number is defined relative to some given base, and is said to be the logarithm of the number to that base. The logarithm of x to the base b is the number y such that x = b^y, that is, the power to which the base must be raised to equal that number. So, for example, the logarithm of 100 to the base 10 is 2, because 100 = 10². The log of x to the base b is written log_b x. One special case is b = e, Euler's number. Routinely, log_e x is written ln x.
5.4.1 Plots of various functions
The following table gives some well-known functions and their derivatives.

f(x):   x^a        a^x          log_e x   e^{ax}    sin(x)   cos(x)    tan(x)       e^{g(x)}
f′(x):  a x^{a−1}  a^x · ln(a)  1/x       a e^{ax}  cos(x)   −sin(x)   1/cos²(x)    g′(x) · e^{g(x)}
For some functions, parameter variation may change the plot of the function substantially. In Figure 5.8, different values of a vary the appearance of the function significantly. Note that the function f(x) = x^a may only be defined for some values of x. For example, when a = 1/2, then f(x) = √x is defined only for real x ≥ 0, and when a = −1 it is defined for all x ≠ 0.
Figure 5.8: f(x) = x^a, for various values of a

The function f(x) = a^x (Figure 5.9) is defined for all real x when a > 0. When a < 0, a^x may or may not be defined. For example, (−8)^{1/3} = −2, but (−1)^{1/2} is not a real number.
Figure 5.9: Plot of f(x) = a^x, for various values of a

The trigonometric functions are typified by repeating patterns. For any angle x, measured in degrees, this angle is the same angle as x + 360°, so that any function of the angle will have the same value at x and x + 360°. Alternatively, angles may be measured in radians. A radian is the angle such that 2π radians equal one rotation in the circle (or 360°). In Figure 5.10 the angles are measured in radians.

Figure 5.10: Plot of sin(x) and cos(x) for −2π ≤ x ≤ 2π

These functions may be modified (Figure 5.11). Consider the function f(x) = A sin(bx − c) + d = A sin(b[x − c/b]) + d. The parameter A is called the amplitude: if A = 2, the height of the fluctuation is doubled. The parameter c/b measures horizontal shift. Comparing f(x) with g(x) = A sin(bx) + d, f(c/b) = g(0). The ratio c/b is called the phase shift (the horizontal displacement of the function from its initial position). Finally, the period of a function is the distance between repeating points and is given by 2π/b.
Figure 5.11: Plot of sin(x) and sin(x − π/2) for −2π ≤ x ≤ 2π
5.4.2 The log and exponential functions
Both log and exponential functions are widely used in economics because, for example, growth formulas and growth calculations utilize these functions. The log of x to the base b is denoted log_b x, giving the number a such that x = b^a. The log function is defined only for positive numbers (for x > 0). Two particularly important bases are b = 10 and b = e, where e is Euler's number with e ≈ 2.718. Because the function log_e x is widely used, for notational convenience it is often written as ln x: ln x = log_e x.
5.4.2.1 The log function
Definition 5.5: The log of a number x to the base b is the power to which b must be raised to equal that number. Thus, a = log_b x means x = b^a. So, log_b x is the number that satisfies x = b^{log_b x}.
The log function has a number of useful properties.

Proposition 5.2: The log function satisfies:

1. The log of x^a to base b is equal to a times the log of x to the base b: log_b x^a = a log_b x.

2. The log of a product is equal to the sum of the logs: log_b xy = log_b x + log_b y.

3. The log of a ratio is equal to the difference of the logs: log_b (x/y) = log_b x − log_b y.

4. Change of base. The log of x to base c equals the log of x to base b times the log of b to base c: log_c x = log_c b × log_b x, or (log_c x)/(log_b x) = log_c b. ♦
Proof: The proofs are straightforward manipulations of the definition.

1. Let c = log_b x^a, so x^a = b^c, using the definition of a log. Therefore (x^a)^{1/a} = (b^c)^{1/a}, or x = b^{c/a}. And x = b^{c/a} means log_b x = c/a. Multiplying by a gives a log_b x = c, or a log_b x = log_b x^a.

2. Suppose d = log_b xy, f = log_b x and g = log_b y. Then (i) xy = b^d, (ii) x = b^f and (iii) y = b^g. Using (ii) and (iii), xy = b^f b^g = b^{f+g}. Since (i) gives xy = b^d, it must be that d = f + g, or log_b xy = log_b x + log_b y.

3. Suppose d = log_b (x/y), f = log_b x and g = log_b y. Then (i) x/y = b^d, (ii) x = b^f and (iii) y = b^g. Using (ii) and (iii), x/y = b^f / b^g = b^{f−g}. Since (i) gives x/y = b^d, it must be that d = f − g, or log_b (x/y) = log_b x − log_b y.

4. Let d = log_b x, so that x = b^d. Then log_c x = log_c b^d. From rule 1, log_c b^d = d log_c b. So, log_c x = d log_c b = log_b x · log_c b, and thus (log_c x)/(log_b x) = log_c b.

5.4.2.2 Choice of base for the log function
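The four rules of Proposition 5.2 can be checked numerically for any base; the sketch below uses arbitrary positive values and the bases 10 and e.

```python
import math

# Checking the four log rules of Proposition 5.2 on arbitrary positive numbers.
x, y, a, b, c = 5.0, 3.0, 2.0, 10.0, math.e
logb = lambda z, base: math.log(z) / math.log(base)   # log of z to an arbitrary base

assert abs(logb(x ** a, b) - a * logb(x, b)) < 1e-12            # rule 1
assert abs(logb(x * y, b) - (logb(x, b) + logb(y, b))) < 1e-12  # rule 2
assert abs(logb(x / y, b) - (logb(x, b) - logb(y, b))) < 1e-12  # rule 3
assert abs(logb(x, c) - logb(b, c) * logb(x, b)) < 1e-12        # rule 4 (change of base)
print("all four rules hold")
```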
There are two commonly used bases: 10 and e. For example, if α = log_10 x then 10^α = x. Thus, log_10 10 = 1, log_10 100 = 2, log_10 1000 = 3, and so on. The number e is defined as e = lim_{x→∞} (1 + 1/x)^x ≈ 2.71828. It arises naturally in differentiation and integration. The functions are related by the following equations: log_e x = (log_10 x)/(log_10 e) and log_10 x = (log_e x)/(log_e 10). For example, if α = log_10 x, then 10^α = x and log_e(10^α) = log_e x, and log_e(10^α) = α · log_e 10 = log_10 x · log_e 10, giving the second relation.
It is common practice to write ln x for log_e x. One useful feature of the ln function is that the derivative of ln x is 1/x. If y = log_e x, then e^y = e^{log_e x} = x, so that e^y (dy/dx) = 1, or x (dy/dx) = 1, or dy/dx = 1/x. For the formulae above, log_10 e · ln x = log_10 x, so the derivative of log_10 x can be written as log_10 e · (1/x) and as [1/(log_e 10)] · (1/x).
Remark 5.3: The mathematical constant e ≈ 2.71828, Euler's number, may be defined in a number of ways. Define e to be that number that makes the area under the curve 1/x between 1 and e equal to 1. The number e may also be defined as:

$$e = 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots = \sum_{j=0}^{\infty} \frac{1}{j!}.$$

Yet another equivalent definition gives e as the limit of a sequence of numbers:

$$e = \lim_{x \to \infty} \left(1 + \frac{1}{x}\right)^x.$$

The latter formula makes the connection with continuous-time discounting.
5.5 Growth rates

Suppose an investor invests one dollar for a year at a 100% interest rate, r = 1. If the interest is calculated once at the end of the year, the loan is worth 1 × (1 + r) = 1 × 2 = 2. Suppose instead interest is compounded twice a year, with the half-yearly rate set at r/2 = 1/2. Then after six months the loan is worth 1 × (1 + 1/2), and at the end of twelve months it is worth 1 × (1 + 1/2)(1 + 1/2) = (1 + 1/2)² = 2.25. Similarly, if the interest is calculated every four months at the four-monthly rate r/3 = 1/3, then at the end of the year the value of the loan is (1 + 1/3)³ ≈ 2.37. Calculating interest on a monthly basis, the value of the loan would be (1 + 1/12)¹² = (1.08333)¹² ≈ 2.613. In general, with n rounds of interest calculation, the value of the loan at the end of the year is (1 + 1/n)^n. As n becomes large, this approaches e:

$$\left(1 + \frac{1}{n}\right)^n \longrightarrow e.$$

With continuous interest accumulation (from one instant to the next, a tiny amount of interest is added to the principal), the value of the loan at the end of the year is e.
Suppose now that the interest rate is r instead of 1 and note that:

$$\left(1 + \frac{r}{n}\right)^n = \left(1 + \frac{1}{n/r}\right)^n = \left(1 + \frac{1}{n/r}\right)^{(n/r)\,r} = \left[\left(1 + \frac{1}{n/r}\right)^{n/r}\right]^r \to e^r.$$

(Recall some rules of manipulation: x^a x^b = x^{a+b}, (x^a)^b = x^{a·b}.) With t periods, and an interest rate r each period, the value after t periods is (1 + r)^t. With interest calculated twice each period, the final value is [(1 + r/2)²]^t, and with interest calculated n times each period the final value is [(1 + r/n)^n]^t. From the previous calculations:

$$\left[\left(1 + \frac{r}{n}\right)^n\right]^t \to (e^r)^t = e^{rt}.$$

Thus, the value of P dollars invested now with continuous interest accumulation is V = P e^{rt}. Conversely, V dollars in t periods with continuous discounting is worth V e^{−rt} (= P) now.
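The compounding calculations above can be reproduced directly; the second part of the sketch uses hypothetical values P = 100, r = 0.05 and t = 10 to illustrate P e^{rt}.

```python
import math

# Compounding one dollar at rate r = 1 (100%): (1 + 1/n)^n approaches e.
for n in (1, 2, 3, 12, 365, 100000):
    print(n, round((1 + 1.0 / n) ** n, 5))
# n = 1 gives 2.0, n = 2 gives 2.25, n = 3 about 2.37, ..., approaching e = 2.71828...

# With rate r per period for t periods, continuous compounding gives P * e^(r*t);
# P, r and t here are hypothetical illustration values.
P, r, t = 100.0, 0.05, 10
n = 10 ** 6
assert abs(P * (1 + r / n) ** (n * t) - P * math.exp(r * t)) < 0.01
print(round(P * math.exp(r * t), 2))   # 164.87
```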
5.5.1 Additivity of continuous growth rates

One advantage of using continuous compound rates is that growth rates are additive. Consider three time periods, s < t < u, with some variable (such as income) observed at those three dates: y_s, y_t, y_u. In the discrete calculation, the growth rate from s to t is the number r_st determined by y_t/y_s = 1 + r_st, and the growth rate from t to u is the number r_tu determined by y_u/y_t = 1 + r_tu. Over the period from s to u, the growth rate is r_su, defined from y_u/y_s = 1 + r_su. However,

$$(1 + r_{st})(1 + r_{tu}) = 1 + r_{st} + r_{tu} + r_{st} r_{tu} = 1 + r_{su},$$

so the overall growth is not the sum of the growths in the two periods: r_st + r_tu ≠ r_su. By contrast, using the continuous growth formula:

$$\frac{y_t}{y_s} = e^{\rho_{st}}, \qquad \frac{y_u}{y_t} = e^{\rho_{tu}}, \qquad \frac{y_u}{y_s} = e^{\rho_{su}},$$

where ρ_ij is the continuous growth rate from time i to time j. Then, multiplying,

$$e^{\rho_{su}} = \frac{y_u}{y_s} = \frac{y_t}{y_s}\,\frac{y_u}{y_t} = e^{\rho_{st}} e^{\rho_{tu}} = e^{\rho_{st} + \rho_{tu}},$$

so that ρ_su = ρ_st + ρ_tu. Thus, the growth rate over the full period from s to u is equal to the sum of the growth rates in the two subperiods.
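The contrast between the two conventions shows up immediately on sample data; the income figures below are hypothetical.

```python
import math

# Continuous growth rates add across subperiods; discrete ones do not.
ys, yt, yu = 100.0, 110.0, 126.5     # hypothetical income at dates s < t < u

# Discrete rates:
r_st, r_tu = yt / ys - 1, yu / yt - 1
r_su = yu / ys - 1
print(round(r_st + r_tu, 4), round(r_su, 4))   # 0.25 vs 0.265: not equal

# Continuous rates:
p_st, p_tu = math.log(yt / ys), math.log(yu / yt)
p_su = math.log(yu / ys)
assert abs((p_st + p_tu) - p_su) < 1e-12       # rho_su = rho_st + rho_tu
print("continuous rates are additive")
```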
5.5.2 The functions ln x (log_e x) and e^{kx}
Figure 5.12 shows the functions e x and log e x. Both are increasing in x, but e x increases rapidly (at an increasing rate), whereas log e x increases at a decreasing rate. The following section provides additional discussion on these important functions.
Figure 5.12: The functions e^x and ln x (≡ log_e x)
5.5.2.1 The derivatives of e^x and ln x
Given a constant k, the derivative of e^{kx} is given by lim_{Δ→0} [e^{k(x+Δ)} − e^{kx}]/Δ. Consider:

$$\frac{e^{k(x+\Delta)} - e^{kx}}{\Delta} = \frac{e^{kx} e^{k\Delta} - e^{kx}}{\Delta} = e^{kx}\,\frac{e^{k\Delta} - 1}{\Delta}.$$

Recalling that one definition of e^z is e^z = 1 + z/1! + z²/2! + ···,

$$e^{k\Delta} = 1 + \frac{k\Delta}{1!} + \frac{k^2\Delta^2}{2!} + \frac{k^3\Delta^3}{3!} + \cdots = 1 + k\Delta + \Delta^2 k^2\left[\frac{1}{2!} + \frac{k\Delta}{3!} + \frac{k^2\Delta^2}{4!} + \cdots\right].$$

Therefore,

$$\frac{e^{k\Delta} - 1}{\Delta} = k + \Delta k^2\left[\frac{1}{2!} + \frac{k\Delta}{3!} + \frac{k^2\Delta^2}{4!} + \cdots\right].$$

Since the bracketed term is bounded for kΔ < 1,

$$\lim_{\Delta \to 0} \frac{e^{k\Delta} - 1}{\Delta} = k.$$

(The bracketed term is bounded because |1/2! + kΔ/3! + k²Δ²/4! + ···| ≤ 1 + kΔ + (kΔ)² + ··· = 1/(1 − kΔ).) Therefore:

$$\frac{d}{dx}\{e^{kx}\} = k e^{kx}.$$

The derivative of ln x is given by lim_{Δ→0} [ln(x + Δ) − ln x]/Δ.
Consider:

[ln(x+∆) − ln x]/∆ = (1/∆) ln((x+∆)/x) = (1/∆) ln(1 + ∆/x) = ln[(1 + ∆/x)^{1/∆}].

Rearranging brackets,

ln[(1 + ∆/x)^{1/∆}] = ln[{(1 + ∆/x)^{x/∆}}^{1/x}].

The ln x function is continuous for x > 0. (Recall, if f is continuous at x₀ then lim_{x→x₀} f(x) = f(x₀).) Therefore,

lim_{∆→0} [ln(x+∆) − ln x]/∆ = lim_{∆→0} ln[{(1 + ∆/x)^{x/∆}}^{1/x}] = ln[{lim_{∆→0}(1 + ∆/x)^{x/∆}}^{1/x}] = ln[e^{1/x}] = 1/x,

and so,

d/dx{ln x} = 1/x.
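Both limits can be checked by finite differences; a minimal sketch (the point x, constant k and step size are illustrative choices):

```python
import math

# Finite-difference check of the two derivative limits derived above.
k, x, d = 3.0, 2.0, 1e-6

# d/dx e^{kx} = k e^{kx}
fd_exp = (math.exp(k * (x + d)) - math.exp(k * x)) / d
print(fd_exp, k * math.exp(k * x))   # close

# d/dx ln x = 1/x
fd_log = (math.log(x + d) - math.log(x)) / d
print(fd_log, 1 / x)                 # close
```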
5.6 Some rules of calculus

Some basic rules of calculus apply for the differentiation of the product, ratio and composition of continuous functions.

1. The derivative of a product of functions:

d/dx{f(x)g(x)} = f(x)g′(x) + g(x)f′(x).
For example, if f(x) = x^α and g(x) = cos x, then

d/dx{x^α cos(x)} = −x^α sin(x) + αx^{α−1} cos x.
2. The derivative of a ratio of functions (g(x) ≠ 0):

d/dx{f(x)/g(x)} = [g(x)f′(x) − f(x)g′(x)]/[g(x)]².
3. The chain rule (differentiating the composition of functions):

d/dx{f(g(x))} = f′(g(x))g′(x).
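The three rules can be spot-checked numerically against a central finite difference; in this sketch the functions f(x) = x² and g(x) = cos x and the evaluation point are illustrative choices:

```python
import math

# Numeric spot-check of the product, ratio and chain rules at one point.
x, d = 1.3, 1e-6

f = lambda t: t ** 2         # f(x) = x^2,  f'(x) = 2x
g = lambda t: math.cos(t)    # g(x) = cos x, g'(x) = -sin x

fd = lambda h: (h(x + d) - h(x - d)) / (2 * d)   # central difference

# 1. Product rule: f g' + g f'
prod_rule = f(x) * (-math.sin(x)) + g(x) * (2 * x)
print(fd(lambda t: f(t) * g(t)), prod_rule)

# 2. Ratio rule: (g f' - f g') / g^2
ratio_rule = (g(x) * 2 * x - f(x) * (-math.sin(x))) / g(x) ** 2
print(fd(lambda t: f(t) / g(t)), ratio_rule)

# 3. Chain rule: d/dx f(g(x)) = f'(g(x)) g'(x) = 2 cos(x) * (-sin x)
chain_rule = 2 * g(x) * (-math.sin(x))
print(fd(lambda t: f(g(t))), chain_rule)
```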
EXAMPLE 5.6: Consider f(x) = ln x and g(x) = x². (Note that the ln function is defined only for x > 0.) Then

f′(x) = d/dx{ln(x)} = 1/x and g′(x) = d/dx{x²} = 2x.

1. Product rule: If h(x) = f(x)g(x), then h′(x) = (ln x)·(2x) + (1/x)·x².
2. Ratio rule: If h(x) = f(x)/g(x), then h′(x) = [x²·(1/x) − (ln x)·(2x)]/(x²)².
3. Chain rule: Put h(x) = g(f(x)) = (ln x)². Differentiating directly, h′(x) = (2 ln x)(1/x). Or, from the chain rule, h′(x) = g′(f(x))f′(x) = (2 ln x)(1/x). ♦

EXAMPLE 5.7 (RATIO RULE): The function tan x is defined tan x = sin x/cos x, and using the ratio formula:

d/dx tan(x) = [cos(x)cos(x) + sin(x)sin(x)]/(cos(x))² = [cos²(x) + sin²(x)]/cos²(x) = 1/cos²(x),

utilizing the fact that the sine of an angle is the opposite (a) over the hypotenuse (h) and the cosine is the base (b) divided by the hypotenuse (in a right-angled triangle). Thus, cos x = b/h, sin x = a/h and tan x = sin x/cos x = a/b.
The figure illustrates a right-angled triangle with hypotenuse h, opposite side a, base b and angle x. ♦

EXAMPLE 5.8 (CHAIN RULE): Let g(y) = sin y and f(x) = √x. Then

g′(y) = cos y and f′(x) = ½x^{−1/2}.

If h(x) = g(f(x)) = sin(√x), then h′(x) = cos(√x)·½x^{−1/2} = cos(√x)/(2√x).

Let g(y) = α^y and f(x) = x², so that g(f(x)) = α^{x²}. This may be written e^{ln α·x²}, so that the derivative is 2x ln α·e^{ln α·x²} = 2x ln α·α^{x²}. ♦
In the context of the exponential and log functions, the chain rule gives:

d/dx e^{f(x)} = f′(x)e^{f(x)} and d/dx ln f(x) = f′(x)/f(x), given f(x) > 0.
The next example illustrates a more abstract calculation using the chain rule.

EXAMPLE 5.9: Suppose that f(α) = g(α)^{h(α)}, where g(α) > 0 for all α. To find the derivative of f, take the ln of f and differentiate it: ln f(α) = h(α) ln g(α). Thus, differentiating both sides:

d/dα ln f(α) = [1/f(α)]f′(α) = h′(α) ln g(α) + h(α)[1/g(α)]g′(α).

So,

f′(α) = f(α)h′(α) ln g(α) + f(α)h(α)[1/g(α)]g′(α) = g(α)^{h(α)}h′(α) ln g(α) + g(α)^{h(α)}h(α)[1/g(α)]g′(α).

Alternatively, observe that f(α) = e^{h(α) ln g(α)}, so that using the rule d/dx e^{r(x)} = e^{r(x)}r′(x),

f′(α) = e^{h(α) ln g(α)}[h′(α) ln g(α) + h(α)[1/g(α)]g′(α)]. ♦
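The logarithmic-differentiation formula above can be compared against a finite difference; the functions g(a) = 1 + a² and h(a) = sin a below are hypothetical choices with g > 0 everywhere:

```python
import math

# Check f'(a) for f(a) = g(a)^h(a) using the formula from Example 5.9.
a, d = 0.7, 1e-6

g  = lambda t: 1 + t * t      # g > 0 everywhere
gp = lambda t: 2 * t          # g'
h  = lambda t: math.sin(t)
hp = lambda t: math.cos(t)    # h'

f = lambda t: g(t) ** h(t)

# Formula: f'(a) = g(a)^h(a) * [h'(a) ln g(a) + h(a) g'(a)/g(a)]
formula = f(a) * (hp(a) * math.log(g(a)) + h(a) * gp(a) / g(a))

# Central finite difference for comparison
fd = (f(a + d) - f(a - d)) / (2 * d)
print(formula, fd)   # close
```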
5.6.1 l'Hospital's rule
Sometimes it is necessary to consider the behavior of the ratio of functions, f(x)/g(x), in the neighborhood of a point where the ratio is not defined. Consider the behavior of the ratio f(x)/g(x) in the neighborhood of a point a. Suppose that x → a; how does f(x)/g(x) behave?

There are a few cases where the answer is obvious. If f(x) → c and g(x) → d with d ≠ 0, then f(x)/g(x) → c/d. Similarly, if f(x) → c and g(x) → d with d = 0 and c ≠ 0, then the ratio tends to +∞ or −∞.

However, there are two specific cases where the answer is not obvious: when both f(x) and g(x) go to 0, and when both f(x) and g(x) go to ±∞. In such cases,
determining the limit may be tedious. However, l'Hospital's rule provides a means of evaluating the limit.

Theorem 5.1: Suppose that (i) f and g are differentiable on (a−δ, a+δ)\{a} for some δ > 0 (where the notation A\{y} means the set A with the point y removed), (ii) lim_{x→a} f(x) = 0 and lim_{x→a} g(x) = 0, (iii) g′(x) ≠ 0 on (a−δ, a+δ)\{a}. Then,

lim_{x→a} f′(x)/g′(x) = k implies lim_{x→a} f(x)/g(x) = k.

Similarly, suppose that (i) f and g are differentiable on (a−δ, a+δ)\{a} for some δ > 0, (ii) lim_{x→a} f(x) = ∞ and lim_{x→a} g(x) = ∞, (iii) g′(x) ≠ 0 on (a−δ, a+δ)\{a}. Then,

lim_{x→a} f′(x)/g′(x) = k implies lim_{x→a} f(x)/g(x) = k.

Thus, under general conditions, the limit of the ratio is equal to the limit of the ratio of the derivatives.
EXAMPLE 5.10 (L'HOSPITAL'S RULE): Let f(x) = x² and g(x) = e^x − 1. As x approaches 0, the values of the functions approach 0: lim_{x→0} f(x) = lim_{x→0} g(x) = 0. How does f(x)/g(x) behave as x → 0? Considering the derivatives of both functions, f′(x) = 2x and g′(x) = e^x. Since 2x goes to 0 as x approaches 0 and e^x goes to 1 as x goes to 0,

lim_{x→0} x²/(e^x − 1) = lim_{x→0} 2x/e^x = 0. ♦
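The limit in Example 5.10 can also be observed numerically by evaluating the ratio at points approaching 0; a minimal sketch:

```python
import math

# x^2/(e^x - 1) is 0/0 at x = 0, but evaluating near 0 shows it tends to 0,
# as l'Hospital's rule predicts (lim 2x/e^x = 0).
ratio = lambda x: x ** 2 / (math.exp(x) - 1)

for x in (0.1, 0.01, 0.001):
    print(x, ratio(x))   # the ratio shrinks toward 0 with x
```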
5.6.2 Higher derivatives

For notational convenience, the second derivative is denoted f′′(x). Technically, the second derivative is defined in exactly the same way as the first derivative:

f′′(x) = lim_{∆→0} [f′(x+∆) − f′(x)]/∆.

Thus, the second derivative is the derivative of the first derivative. Likewise, higher-order derivatives may be determined step by step. As noted earlier, a continuous function may not be differentiable. Similarly, a function may have a first derivative, but not a second derivative; and so on. Properties of the second derivative are central in the discussion of optimization. If the second derivative is positive, the slope of the function is increasing at that point; if the second derivative is negative, the slope of the function is decreasing. Third-order derivatives are calculated similarly:

f′′′(x) = lim_{∆→0} [f′′(x+∆) − f′′(x)]/∆.

Higher-order derivatives are calculated by repeated differentiation.
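The step-by-step construction of higher derivatives can be mimicked with nested finite differences; the function f(x) = x³ and the step size below are illustrative:

```python
# Second derivative as the derivative of the first derivative, approximated
# by nested finite differences, for f(x) = x^3 (so f''(x) = 6x).
d = 1e-4

f = lambda x: x ** 3
fp = lambda x: (f(x + d) - f(x)) / d       # approximate f'
fpp = lambda x: (fp(x + d) - fp(x)) / d    # approximate f'': derivative of f'

print(fpp(2.0))   # close to 6 * 2 = 12
```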
Exercises for Chapter 5
Exercise 5.1 Consider the function F, defined: f(x) = 2x², h(x) = 4x. Find the derivative of f with respect to x, and show that the derivative is negative for x < √5 and positive for x > √5.

Exercise 5.8 Consider the ratio ln(t)/t: show that the limit as t → ∞ is 0: lim_{t→∞} ln(t)/t = 0. Next, consider the function f(t) = t^k e^{−at}, t > 0, where a > 0. Show that for any positive number k, lim_{t→∞} f(t) = 0.
Exercise 5.9 Given a function f ( x), the first derivative is a function f ′ ( x). Differentiating this function f ′ ( x) gives a function denoted f ′′ ( x), the second derivative of f . Plot the function f ( x) = 8[ x3 − 2 x2 + x] and its first and second derivatives. Exercise 5.10 Suppose that revenue as a function of output is given by R (Q ) =
ln(Q) − ¼Q². Output, in turn, depends on input K according to the equation Q(K) = 3 − e^{−aK}. Thus, r(K) = R(Q(K)) gives revenue in terms of K. Find dr/dK.
Exercise 5.11 Suppose the inverse demand for a good is given by the function
P (Q ) where P ′ (Q ) < 0. This identifies the price that corresponds to a given level
of demand. Conversely, the demand function Q (P ) associates a level of demand to
each price. So, given a quantity Q , P (Q ) is the price that would lead to that level of demand. Mathematically, Q = Q (P (Q )) for each Q . The elasticity of demand is
defined, at P = P(Q), as

ǫ(Q) = −(P/Q)(dQ/dP) = −[P/Q(P)]·dQ(P)/dP.
Use the chain rule to show that

1/ǫ(Q) = −[Q/P(Q)]·dP(Q)/dQ.
Exercise 5.12 Suppose that output Q is produced at cost C (Q ) = ln(Q + 1) + e−Q . Therefore, the average cost function is:
A(Q) = C(Q)/Q = [ln(Q+1) + e^{−Q}]/Q.

Calculate A′(Q).

Exercise 5.13 Suppose that output Q is produced at cost C(Q). The average cost of the output is A(Q) = C(Q)/Q.
Show that average cost is declining if marginal cost
is less than average cost. Exercise 5.14 A firm pays tax on gross revenue R (Q ) at a rate that depends on output volume t(Q ). Suppose that R (Q ) = k ln(Q +1)− aQ α and t(Q ) = 1− e−bQ , then total tax paid is T (Q ) = R (Q ) t(Q ) = [k ln(Q + 1) − aQ α ](1 − e−bQ ). Use the product rule to find T ′ (Q ).
6 Functions of one variable: applications

6.1 Introduction
In this chapter, calculus techniques are applied to the study of a variety of problems, beginning with optimization in Section 6.2. In an optimization problem, given a function f of one variable x, the problem is to find the value of x, x∗, that maximizes f. Such an x∗ is characterized by the condition that f(x∗) ≥ f(x) for all x. If the choice of x is restricted to some set X, then the requirement is that f(x∗) ≥ f(x) for all x ∈ X. Calculus techniques are used to find x∗ by considering how f varies with x. These techniques are local in nature and so locate "local" maxima (or minima), where there is no alternative choice close to the candidate optimum that yields a higher value of the function f (or lower value if the task is to minimize the value of f). In contrast, x∗ is a global maximizer if f(x∗) is as large as f(x) for any possible x. Identifying whether a value x∗ is a local or global maximizer is considered in Section 6.3. This discussion focuses on the shape of the function f, and the use of calculus techniques to identify the shape.

Following this discussion of optimization, a number of applications are examined. The discussion of externalities in Section 6.4 illustrates the use of these techniques to study the welfare implications of traffic congestion. Comparative statics, introduced in Section 6.5, concerns the change in a variable resulting from an underlying parameter change. For example, a change in the intercept of a demand curve will change the intersection of the supply and demand curves and hence the equilibrium quantity and price. In the optimization context, if the function f depends on x and a fixed parameter α, then the optimal choice of x to maximize f(x, α) will depend on α: x(α). Therefore, any change in α will result in a change in the value of x(α). The study of how x(α) changes as α changes is considered in this section.
Subsequent sections consider comparative static calculations in a taxation context – in Sections 6.7 and 6.8. A key measurement in economics is the elasticity, measuring the response of a dependent variable to a change in another variable – for example, measuring how demand responds to a price change. In the context of demand, this is introduced in Section 6.6. Finally, Section 6.9 considers Ramsey pricing, using variational arguments to study efficient taxation.
6.2 Optimization

Making optimal choices involves maximization or minimization of some objective. In the present context, given a function f where f : X → R, suppose that we wish to find a value for x that maximizes (or minimizes) f on X. For most of this discussion, X will be a subset of R.

Definition 6.1: Say that x̄ is a global maximum if f(x̄) ≥ f(x) for all x ∈ X, and a global minimum if f(x̄) ≤ f(x) for all x ∈ X. If f(x̄) ≥ f(x) for all x ∈ X, say that x̄ maximizes f. If f(x̄) ≤ f(x) for all x ∈ X, say that x̄ minimizes f.
The discussion here focuses on the problem of maximizing f (to minimize f , maximize − f ). Furthermore, throughout the discussion only unconstrained optimization is considered: at any point x, one can move in any direction, to alternative
choices. That fact allows one to use derivatives to characterize optimal choices. (An example of a constrained optimization problem is: max f (x) = x2 subject to the constraint that 0 ≤ x ≤ 1. In that case, the maximum occurs at x = 1, where f ′ (x) = 2.)
To motivate some of the key ideas, consider Figure 6.1, where the function f is increasing to the left of b (at point a, for example), is flat at b and decreasing to the right of b (such as at c). Corresponding to these movements, the derivative is positive at a, 0 at b and negative at c. The function is highest at b (and so is maximized at b) and this corresponds to the point where the derivative is 0. When the derivative is not 0, such as at a or c, rightward or leftward movements raise the value of the function. So, a first derivative equal to 0 is a necessary condition for a point to be a candidate for maximizing the function. Note also that, in this example, the derivative is decreasing. If the derivative is decreasing, then the slope of the derivative is negative – the derivative of the first derivative is negative or, equivalently, the second derivative is negative. So, in this case, a point maximizes the function if the derivative of the function at that point is 0, and in addition, the second derivative is negative.
f′(a) > 0, f′(b) = 0, f′(c) < 0

Figure 6.1: Changing slopes

A necessary condition for a point x̄ to be an optimum is that f′(x̄) = 0, so that the function is flat at x̄: if f′(x̄) > 0, increasing x raises the value of the function, and if f′(x̄) < 0, reducing x raises the value of the function. This observation may be expressed mathematically.
Theorem 6.1: Let f be a differentiable function. Then if x̄ maximizes f(x), f′(x̄) = 0.

PROOF: Suppose that f′(x̄) = k ≠ 0. The derivative of f at x̄ is lim_{∆x→0} [f(x̄+∆x) − f(x̄)]/∆x = k. If k > 0, taking ∆x > 0 and small implies that f(x̄+∆x) − f(x̄) > 0; and if k < 0, taking ∆x < 0 and small implies that f(x̄+∆x) − f(x̄) > 0. In either case, x̄ is not a maximizer of f.
If no point x̄ exists with f′(x̄) = 0, then the function has no maximum. Of course, if the function is not differentiable, then it may have a maximum at a point where the derivative is not defined. For example, f(x) = min{x, 1−x} is a tent-shaped function with a maximum at ½, but the function is not differentiable at ½. While f′(x̄) = 0 is a necessary condition for a maximum of a differentiable function, it is not sufficient. Figure 6.1 gives a case where the function has a unique point, x̄, where the first derivative is 0 and this point is the point that maximizes f. The condition f′(x̄) = 0, that the slope is zero, is called the first-order condition for an optimum.
6.2.1 The first-order condition: issues

The first-order condition is a necessary condition for a point x̄ to be either a maximum or a minimum, for if f′(x̄) ≠ 0 it means that the function is either
increasing or decreasing at x̄. However, this is only a necessary condition. Consider the function:

f(x) = −x² if x < 0; x² if x ≥ 0.
For this function, f ′ (x) = −2x, x < 0 and f ′ (x) = 2x, x ≥ 0. Therefore, f is differentiable and f ′ (0) = 0. However, there are points arbitrarily close to 0 (to the right)
with f (x) > 0 = f (0) and points arbitrarily close to 0 (to the left) with f (x) < 0.
(The function f (x) = x3 has similar properties.) Thus, while the condition f ′ (x) = 0 locates an x such that the function f is flat at that point, additional conditions must be checked to determine if x is optimal.
Figure 6.2: Optimization: some possibilities I
Figure 6.2 illustrates some of the many possibilities where a solution to f′(x) = 0 may or may not yield a maximum. In Figure 6.2, case (a) is "well-behaved", in that there is a unique point where f′(x) = 0, and this corresponds to a global maximum. In case (b), at x̄, f′(x̄) = 0, but while x̄ is a local maximum, it is not a global maximum. In case (c), the point x̄ is called a point of inflection, where the function is increasing, then flattens and then continues increasing. The actual functions plotted in Figure 6.2 are:

(a) f(x) = 3x − x²,
(b) f(x) = 1 − 4x(x−1)(x−2)²(x−3),
(c) f(x) = 3x − 6x² + 4x³.
REMARK 6.1: It may be worth noting that f′(x) = 0 is consistent with the function being strictly increasing at x (f is strictly increasing if y > x implies f(y) > f(x)). Considering case (c), f(x) = 3x − 6x² + 4x³, f′(x̄) = 0 but x̄ is not a local maximum. In fact, this function is strictly increasing, despite the fact that f′(x̄) = 0. To see this, observe that f′(x) = 3 − 12x + 12x² = 3 − 12x(1−x), so that at x̄ = ½, f′(½) = 3 − 12·¼ = 0. Observe that at x̄ = ½, f(x̄) = ½, while for x < x̄, f(x) < ½ and
for x > x̄, f(x) > ½, so f is a strictly increasing function. To see this more clearly,

f(x+z) − f(x) = [3(x+z) − 6(x+z)² + 4(x+z)³] − [3x − 6x² + 4x³]
= 3z − 12xz − 6z² + 12x²z + 12xz² + 4z³
= z[3 − 12x + 12x² − 6z + 12xz + 4z²]
= z[3 − 12x(1−x) + z(−6 + 12x + 4z)],

so

f(½ + z) − f(½) = 4z³.

Thus, z > 0 implies f(½+z) − f(½) > 0, and z < 0 implies f(½+z) − f(½) < 0, so the function f is strictly increasing.
The next group of functions illustrates additional possibilities.

Figure 6.3: Optimization: some possibilities II
The functions plotted in Figure 6.3 are:

(d) f(x) = 2√x − x + 1 for 0 ≤ x ≤ 1, f(x) = 2 for 1 ≤ x ≤ 2, and f(x) = 2x(1 − ¼x) for 2 ≤ x.

... f′(x) > f′(x + ∆), so that the graph of f′ appears as in Figure 6.5(ii). These ideas can be formalized using Taylor series expansions.
Figure 6.5: Optimization: f(x) = 3√x − x
Theorem 6.2: Suppose that f is a continuous function with continuous first and second derivatives, f′(x) and f′′(x) (these are continuous functions of x). Then

f(x̄ + ∆x) = f(x̄) + f′(x̄)∆x + ½f′′(p)(∆x)²   (6.1)

for some p between x̄ and x̄ + ∆x. (That is, p satisfies x̄ − ∆x ≤ p ≤ x̄ + ∆x.)

(This theorem is discussed in detail later.) Because ∆x may be positive or negative, p between x̄ and x̄ + ∆x means that x̄ − ∆x ≤ p ≤ x̄ + ∆x. With this result, necessary and sufficient conditions may be given for a local optimum.
Theorem 6.3: Let f be a continuous function with continuous first and second derivatives. Suppose that f′(x̄) = 0. Then:

1. f′′(x̄) < 0 implies that x̄ is a local maximum.
2. f′′(x̄) > 0 implies that x̄ is a local minimum.

PROOF: Consider the first case and suppose that f′(x̄) = 0 and f′′(x̄) < 0. Since f′′ is continuous, there is an ǫ > 0 such that f′′(z) < 0 for all z with x̄ − ǫ < z < x̄ + ǫ. Pick ∆x so that |∆x| < ǫ. Then, from Theorem 6.2,

f(x̄ + ∆x) = f(x̄) + f′(x̄)∆x + ½f′′(p)(∆x)² = f(x̄) + ½f′′(p)(∆x)², since f′(x̄) = 0,

with p satisfying x̄ − ǫ < x̄ − ∆x < p < x̄ + ∆x < x̄ + ǫ. Because x̄ − ǫ < p < x̄ + ǫ, f′′(p) < 0, and therefore ½f′′(p)(∆x)² < 0. Consequently, f(x̄ + ∆x) < f(x̄) when |∆x| < ǫ. The second part of the theorem is proved in the same way.
The condition f′′(x̄) < 0 is called a second-order condition.

REMARK 6.2: Suppose that f′(x̄) = 0 but f′′ (only) satisfies f′′(x) ≤ 0 in a neighborhood of x̄ (rather than f′′(x) < 0). That is, for some ǫ, if x̄ − ǫ < z < x̄ + ǫ, then f′′(z) ≤ 0. If |∆x| < ǫ, then according to the Taylor series expansion, f(x̄ + ∆x) = f(x̄) + ½f′′(p)(∆x)², with f′′(p) ≤ 0, so that f(x̄ + ∆x) ≤ f(x̄). But the inequality may not be strict. In this case x̄ is a local maximum, but it may not be unique (in any neighborhood of x̄ there may be x′ ≠ x̄ with f(x′) = f(x̄)).

Conversely, if f′(x̄) = 0 and f′′ satisfies f′′(x) ≥ 0 in a neighborhood of x̄, then x̄ is a local minimum. Again, if f′′(x̄) > 0, then continuity of f′′ implies that f′′(p) > 0 for p sufficiently close to x̄, and then f(x̄ + ∆x) > f(x̄), implying that x̄ is a unique local minimum. If, in fact, f′′(x) ≤ 0 for all x, then f′(x̄) = 0 implies that x̄ is a global maximum. This observation is developed in Section 6.3. The property f′′(x) ≤ 0 for all x is closely related to concavity (see Section 6.3 and Theorem 6.6).
6.2.3 Applications: profit maximization and related problems

A simple application of these results is the monopoly revenue optimization problem, where inverse demand is given by p = p(Q) and the cost by c(Q). Thus, total revenue is R(Q) = Qp(Q) and profit is π(Q) = R(Q) − c(Q). These functions are depicted in Figure 6.6.

Figure 6.6: Revenue, cost and profit

The first-order condition for profit maximization is: π′(Q) = R′(Q) − c′(Q) = 0.
If we assume that R′′(Q) ≤ 0 and c′′(Q) ≥ 0 for all Q, then the profit function has a nonpositive second derivative: π′′(Q) = R′′(Q) − c′′(Q) ≤ 0. In this case, to find a maximum of π it is sufficient to find a value Q̄ such that π′(Q̄) = 0. The intuition is simple: increase output to the point where the increase in revenue matches the increase in cost. (Beyond that point the increase in cost is greater.) This yields the profit-maximizing rule: set marginal revenue equal to marginal cost. Figure 6.7 depicts the profit-maximizing choice.

Figure 6.7: Marginal cost and marginal revenue

The calculations may be taken a step further. With π(Q) = R(Q) − c(Q) = Qp(Q) − c(Q), differentiating gives

π′(Q) = [p(Q) + Qp′(Q)] − c′(Q),

and rearranging:

π′(Q) = p(Q)[1 + (Q/p)(dp/dQ)] − c′(Q).

This expression can be simplified, using the elasticity of demand (discussed in Section 6.6).
EXAMPLE 6.3: In the linear case p(Q) = a − bQ and c(Q) = cQ, where c is the constant marginal cost. Then

R(Q) = Qp(Q) = Q(a − bQ) = aQ − bQ² and π(Q) = aQ − bQ² − cQ = (a−c)Q − bQ².

Also, π′(Q) = (a−c) − 2bQ = R′(Q) − c′(Q) = a − 2bQ − c. Setting π′(Q) = (a−c) − 2bQ = 0, the solution for Q is Q_m = (a−c)/2b. The second-order condition is π′′(Q) ≤ 0, and in the linear case π′′(Q) = −2b < 0. Therefore, Q_m = (a−c)/2b maximizes π. This may be compared with the competitive model, where quantity (Q_c) is such that price equals marginal cost: p(Q) = c′(Q). With linearity, a − bQ = c, or Q_c = (a−c)/b for the competitive output. ♦
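The solution Q_m = (a−c)/2b can be confirmed with a simple grid search over the profit function; the parameter values below are illustrative:

```python
# Grid check of the linear monopoly solution Q_m = (a - c) / (2b).
a, b, c = 10.0, 2.0, 4.0   # illustrative demand/cost parameters

profit = lambda Q: (a - c) * Q - b * Q * Q

Q_m = (a - c) / (2 * b)                   # closed-form solution: 1.5
grid = [i / 1000 for i in range(5001)]    # Q in [0, 5], step 0.001
Q_best = max(grid, key=profit)            # brute-force maximizer

print(Q_m, Q_best)   # both 1.5
```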
EXAMPLE 6.4: Let f(x) = β ln(α + x) − (α + x)², where α, β > 0. Then

f′(x) = β/(α+x) − 2(α+x) and f′′(x) = −β/(α+x)² − 2 < 0.

Solving the first-order condition 0 = f′(x) = β/(α+x) − 2(α+x) gives β/2 = (α+x)², and so √(β/2) = α + x and x = √(β/2) − α (assume √(β/2) − α > 0). ♦
The next example is a variation of the monopoly problem.

EXAMPLE 6.5: A monopolist has demand function p(Q) = 1/Q^α = Q^{−α}, 0 < α < 1, and cost function c(Q) = cQ^β, β > 1. The firm maximizes profit (revenue minus cost), giving the objective:

π(Q) = p(Q)Q − c(Q) = Q·Q^{−α} − cQ^β = Q^{1−α} − cQ^β.

Then

π′(Q) = (1−α)Q^{−α} − βcQ^{β−1},
π′′(Q) = −α(1−α)Q^{−α−1} − β(β−1)cQ^{β−2}.

Since 0 < α < 1 and β > 1, −α(1−α) < 0 and −β(β−1) < 0, and for Q > 0, Q^{−α−1} > 0 and Q^{β−2} > 0. Therefore, for all Q > 0, π′′(Q) < 0. Consequently, the solution to π′(Q) = 0 gives a local optimum (in fact, a global optimum). And,

π′(Q) = (1−α)Q^{−α} − βcQ^{β−1} = 0 implies (1−α) − βcQ^{α+β−1} = 0.

Rearranging:

Q^{α+β−1} = (1−α)/(βc), so Q = [(1−α)/(βc)]^{1/(α+β−1)}. ♦
EXAMPLE 6.7: Consider a tree harvesting problem. Suppose that the cost of planting a tree is c. If the tree is sold when harvested at age t, its value is f(t). Therefore, the present discounted value (PDV) of a tree harvested at age t is P(t) = e^{−rt}f(t) − c, where r is the interest rate. The first-order condition for present value maximization is

P′(t) = e^{−rt}f′(t) − re^{−rt}f(t) = 0

(using the product rule). This may be written P′(t) = e^{−rt}[f′(t) − rf(t)] = 0, and since e^{−rt} ≠ 0 this gives f′(t) − rf(t) = 0 as the first-order condition. Since no functional form has been specified (yet) for f(t), this cannot be solved.
The second-order condition, which ensures that the solution above gives a maximum, is P′′(t) < 0, and here this requires that

P′′(t) = −re^{−rt}f′(t) + e^{−rt}f′′(t) + r²e^{−rt}f(t) − re^{−rt}f′(t) < 0,

which simplifies to e^{−rt}{−rf′(t) + f′′(t) + r²f(t) − rf′(t)} < 0, or, since e^{−rt} ≠ 0,

{r²f(t) − 2rf′(t) + f′′(t)} < 0.

To consider a specific example, suppose that f(t) = √t = t^{1/2}. In this case

0 = [f′(t) − rf(t)] = ½t^{−1/2} − rt^{1/2}.

Multiply by t^{1/2} to obtain ½ − rt = 0, or t = 1/(2r). Since f(t) = √t > 0, f′(t) = 1/(2√t) > 0 and f′′(t) = −¼t^{−3/2} < 0, the second-order condition is:

r²t^{1/2} − rt^{−1/2} − ¼t^{−3/2} < 0.

Multiplying this by t^{1/2} leaves the sign unchanged, so the condition is equivalent to:

r²t − r − ¼t^{−1} < 0.

And, substituting t = 1/(2r) gives the condition:

(r/2) − r − (r/2) = −r < 0.

The second-order condition is satisfied. ♦
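The solution t = 1/(2r) can be checked by evaluating the present value on a grid; the interest rate and planting cost below are illustrative:

```python
import math

# With f(t) = sqrt(t), the first-order condition gives t* = 1/(2r).
# c is a sunk cost, so it does not affect the maximizing t.
r, c = 0.05, 1.0   # illustrative values

P = lambda t: math.exp(-r * t) * math.sqrt(t) - c   # present value

t_star = 1 / (2 * r)                       # closed-form solution: 10.0
grid = [i / 100 for i in range(1, 5001)]   # t in (0, 50], step 0.01
t_best = max(grid, key=P)                  # brute-force maximizer

print(t_star, t_best)   # both 10.0
```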
The circumstances under which the second-order conditions hold are closely related to the shape of the function.
6.3 Concavity and optimization

One issue with the use of second-order conditions is that such conditions do not determine whether x̄ is a local or a global maximum. In cases (a) and (b) of Figure 6.2, the conditions f′(x̄) = 0 and f′′(x̄) < 0 are satisfied, but case (a) corresponds to a global maximum while case (b) corresponds to a local maximum. As the figures plotted earlier show, the overall shape of the function f is crucial in determining whether these properties of the derivatives of the function identify local or global maxima. In Figure 6.2, the function depicted in (a) has a shape that leads to the solution to f′(x) = 0 identifying a global maximum. This function is an example of a concave function.
Definition 6.2: A function f is concave if, for any x, y and any θ ∈ [0,1],

f(θx + (1−θ)y) ≥ θf(x) + (1−θ)f(y).

A concave function is "dome-shaped", as in Figure 6.8.

Figure 6.8: Concavity

When the shape is "reversed" the function is convex. Convexity is defined:

Definition 6.3: A function g is convex if, for any x, y and any θ ∈ [0,1],

g(θx + (1−θ)y) ≤ θg(x) + (1−θ)g(y).

A convex function is "bowl-shaped" – the reverse of a concave function. Intuitively, a concave function is shaped like a hill; a convex function is shaped like a valley. Figure 6.9 depicts both types of function. A more extensive discussion of concavity and convexity is given in Chapter 8, when Taylor series expansions are considered in detail. (The functions plotted in Figure 6.9 are f(x) = 5 ln(1+x) − x and g(x) = −15x + 8e^x − 5.) The properties of concavity and convexity are closely related to global optima.
Theorem 6.4: If f is concave and f′(x̄) = 0, then x̄ is a global maximum. If f is convex and f′(x̄) = 0, then x̄ is a global minimum.
Figure 6.9: Concave and convex functions

PROOF: Considering the concave case, suppose that f is concave and x̄ satisfies f′(x̄) = 0 but x̄ is not a global maximum. Then there is some y such that f(y) > f(x̄). Using concavity of f:

f(θy + (1−θ)x̄) ≥ θf(y) + (1−θ)f(x̄), θ ∈ [0,1].

Rearrange this expression as

f(θy + (1−θ)x̄) − f(x̄) ≥ θ[f(y) − f(x̄)], θ ∈ [0,1].

Since θy + (1−θ)x̄ = x̄ + θ(y−x̄), f(x̄ + θ(y−x̄)) − f(x̄) ≥ θ[f(y) − f(x̄)], θ ∈ [0,1], or

[f(x̄ + θ(y−x̄)) − f(x̄)]/θ ≥ [f(y) − f(x̄)],

and, multiplying and dividing the left side by y − x̄, this may be written

(y − x̄)·[f(x̄ + θ(y−x̄)) − f(x̄)]/[θ(y−x̄)] ≥ [f(y) − f(x̄)].

Finally, letting θ → 0 gives

(y − x̄)f′(x̄) ≥ [f(y) − f(x̄)] > 0

(here ∆x = θ(y−x̄)). This contradicts the fact that f′(x̄) = 0. Therefore, concavity of f and the condition f′(x̄) = 0 imply that x̄ is a global maximum. A similar argument shows that with convexity, the condition f′(x̄) = 0 is sufficient for a global minimum.
In Figure 6.9, the concave function, f, has a maximum at x̄; the convex function, g, has a minimum at x̄. A key feature of either a concave function or a convex function is that the second derivative never changes sign: it is always nonnegative or always nonpositive. In Figure 6.10, the derivatives of f and g are plotted, given the functional forms: f(x) = 5 ln(1+x) − x with f′(x) = 5/(x+1) − 1, and g(x) = −5 − 15x + 8e^x with g′(x) = −15 + 8e^x.
Considering the concave function f, its derivative or slope to the left of x̄ is positive, 0 at x̄, and negative to the right of x̄. The derivative decreases steadily, from a positive number to the left of x̄, to a negative number to the right of x̄. Thus, the derivative is decreasing: f′′(x) ≤ 0 for all x. For the convex function g, the opposite observation applies: the derivative is negative to the left of x̄ and positive to the right of x̄. And so g′′(x) ≥ 0 for all x.

Figure 6.10: Plot of derivatives f′(x), g′(x)
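The sufficiency of the first-order condition under concavity can be illustrated with the concave function f(x) = 5 ln(1+x) − x plotted in Figure 6.9: the critical point solves 5/(1+x) − 1 = 0, so x̄ = 4, and a grid search (over an illustrative range) confirms it is a global maximum there. A minimal sketch:

```python
import math

# Concave function from Figure 6.9; FOC: 5/(1+x) - 1 = 0 gives x_bar = 4.
f = lambda x: 5 * math.log(1 + x) - x

x_bar = 4.0
grid = [i / 100 for i in range(5001)]   # x in [0, 50], step 0.01
x_best = max(grid, key=f)               # brute-force maximizer

print(x_bar, x_best, f(x_bar))
```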
Notice that with concavity, f′(x̄) = 0 is sufficient for a global maximum: whether f′′(x̄) ≤ 0 or f′′(x̄) < 0 is irrelevant. Recall that in Figure 6.3(d), the function depicted there is concave, but any number x in the interval [1,2] is an optimum. While the function is concave, it is not strictly concave.

Definition 6.4: A function f is strictly concave if, for x ≠ y,

f(θx + (1−θ)y) > θf(x) + (1−θ)f(y), θ ∈ (0,1).

Strict concavity guarantees a unique optimum. (The definition of strict convexity reverses these strict inequalities.)

Theorem 6.5: If f is a strictly concave function then f has a unique maximum.

PROOF: Suppose that x̄₁ and x̄₂ are two maxima. Then f(θx̄₁ + (1−θ)x̄₂) > θf(x̄₁) + (1−θ)f(x̄₂), θ ∈ (0,1), and this contradicts the assumption that x̄₁ and x̄₂ are maxima.
REMARK 6.3: To summarize the discussion so far: if f′(x) = 0 and f′′(x) < 0, or f′′(x) ≤ 0 on a neighborhood of x, then x is a local maximum. And if f is concave, f′(x) = 0 implies that x is a global maximum, while strict concavity implies that x is a unique global maximum.

The following discussion clarifies the relation between concavity and the second derivative. The main conclusions are that concavity implies that f′′(x) ≤ 0 for all x (proved in a later chapter); and f′′(x) ≤ 0 for all x implies concavity. If f′′(x) < 0 for all x then f is strictly concave, but strict concavity of f does not imply that f′′(x) < 0 for all x. (Note that the proof of this theorem requires integration, a subject not developed until later.)
Theorem 6.6: Given the function f, suppose that f′′(x) < 0 everywhere; then the function is strictly concave. If f′′(x) ≤ 0 for all x, the function is concave.

PROOF: With f′′(x) < 0 for all x, if y > x, f′(y) − f′(x) = ∫ₓ^y f′′(ξ)dξ < 0, so that y > x implies f′(y) < f′(x), and f′(·) is a strictly decreasing function. So, for y > x,

f(y) − f(x) = ∫ₓ^y f′(ξ)dξ < f′(x)[y − x]   (6.2)
f(y) − f(x) = ∫ₓ^y f′(ξ)dξ > f′(y)[y − x],   (6.3)

since f′(x) > f′(z) > f′(y) for all z with x < z < y. Taking x < y, define z(θ) = θx + (1−θ)y, 0 < θ < 1. From Equation 6.3, since x < z(θ) < y,

f(z(θ)) − f(x) > f′(z(θ))[z(θ) − x].

From Equation 6.2,

f(y) − f(z(θ)) < f′(z(θ))[y − z(θ)].

Rearranging these expressions:

f(z(θ)) > f(x) + f′(z(θ))[z(θ) − x]   (6.4)
f(z(θ)) > f(y) + f′(z(θ))[z(θ) − y].   (6.5)

Multiplying these expressions by θ and 1−θ, respectively, and adding gives:

[θ + (1−θ)]f(z(θ)) > θf(x) + (1−θ)f(y) + f′(z(θ)){θ[z(θ) − x] + (1−θ)[z(θ) − y]}.   (6.6)

Since [θ + (1−θ)] = 1 and θ[z(θ) − x] + (1−θ)[z(θ) − y] = z(θ) − [θx + (1−θ)y] = 0,

f(z(θ)) > θf(x) + (1−θ)f(y).   (6.7)

Therefore, f is strictly concave if f′′(x) < 0 for all x. If f satisfies the weaker condition f′′(x) ≤ 0 for all x, the previous arguments go through with weak inequalities replacing strict inequalities everywhere to yield: f(z(θ)) ≥ θf(x) + (1−θ)f(y).
REMARK 6.4: Because f′′(x) ≤ 0 for all x implies concavity, the conditions f′(x̄) = 0 and f′′(x) ≤ 0 for all x are sufficient to ensure that x̄ is a global maximum. However, strict concavity of f does not imply an everywhere strictly negative second derivative.

REMARK 6.5: Consideration of the strictly concave function f(x) = −(x − 1)⁴ (Figure 6.11) shows that strict concavity does not imply that f′′(x) < 0 everywhere, since f′(x) = −4(x − 1)³ and f′′(x) = −12(x − 1)², so that f′′(1) = 0. (That this function is strictly concave is shown in a later chapter.)
[Figure 6.11: Strictly concave function, f, with f′′ = 0 at x = 1 – the graph of −(x − 1)⁴]

The previous observations may be gathered to give:

Theorem 6.7:
1. A twice differentiable function f is concave if and only if f′′(x) ≤ 0, ∀x.
2. The function f is convex if and only if f′′(x) ≥ 0, ∀x.
3. Strict concavity of f implies that f′′(x) ≤ 0 for all x, and f′′(x) < 0 for all x implies strict concavity.
4. Strict convexity of f implies that f′′(x) ≥ 0 for all x, and f′′(x) > 0 for all x implies strict convexity.
There is an analogous result in higher dimensions.
REMARK 6.6: For the multidimensional case, similar results apply. A function f on Rⁿ is concave if and only if the matrix of second partial derivatives (the Hessian matrix) is negative semidefinite everywhere; strict concavity does not imply negative definiteness, but negative definiteness everywhere implies strict concavity.
6.3.1 Summary

The following points summarize the previous discussion on optimization.

• f′(x̄) = 0 is necessary for an interior maximum or minimum.
• f′′(x̄) < 0 and f′(x̄) = 0 is sufficient for x̄ to be a strict local maximum.
• f′′(x) ≤ 0 on a neighborhood of x̄ and f′(x̄) = 0 is sufficient for x̄ to be a local maximum.
• f′′(x) ≤ 0 for all x and f′(x̄) = 0 implies that x̄ is a global maximum.
• f(x) concave implies that f′′(x) ≤ 0 for all x.
• f′′(x) < 0 for all x implies f is strictly concave, which implies that any maximum of f is unique.
6.4 Externalities

In many circumstances, the actions of one individual affect the welfare of another, sometimes positively and sometimes negatively – but the individual taking the action does not take this into account. Because the overall impact on welfare is ignored when the choice is made, the result is an inefficient outcome. One example of this type of issue arises in the use of public goods. When individuals can use a public facility for free, there is either overuse or underprovision of the facility. This overuse is called the free-rider problem.
6.4.1 The free-rider problem

Consider a public bridge that drivers can use for free. For each individual driver, the benefit from using the bridge is b. With n drivers using the bridge, the cost (in terms of delay) to each driver is c(n), so when there are n drivers in total crossing the bridge, the net benefit or utility to each of those drivers is

u = b − c(n).

A driver who does not cross the bridge does not pay the cost or get any benefit, and has a utility of 0. Assume that c(0) = 0, c′(n) > 0 and c′′(n) > 0 for all n ≥ 0. Thus, there is little delay with few drivers on the bridge, but delay increases with the number of drivers, getting progressively worse as congestion increases. (Although in reality n must be an integer, assume for the calculations that n is a real number. If c′(n) is small until n is very large, rounding to the nearest integer will not be a significant issue.)
6.4.1.1 Equilibrium behavior

The marginal social benefit (MSB) from increasing the number of drivers crossing the bridge is b, the extra utility from an extra person making the journey. The social cost from bridge usage is the cost per user times the number of users,

s(n) = nc(n),

and the total benefit is nb, the benefit per user times the number of users. Therefore, the social welfare is:

w(n) = nb − s(n) = nb − nc(n).

Maximizing this:

w′(n) = b − s′(n) = b − [nc′(n) + c(n)].

The intuition for this is that an extra user gets benefit b. There are two components to marginal social cost. The extra user incurs cost c(n) and, in addition, the marginal increase in congestion, measured by c′(n), is incurred by all n users, for a total cost nc′(n). This gives a total marginal social cost of MSC = s′(n) = nc′(n) + c(n).

Socially efficient usage of the bridge occurs when the number of drivers crossing the bridge maximizes total welfare: the sum of utilities less costs of those crossing the bridge. This occurs when w′(n) = 0. Let the solution to this equation be n∗. The choice of n∗ is the value of n satisfying:

[b − c(n)] − nc′(n) = 0
or

b = c(n∗) + n∗c′(n∗).   (6.8)

But drivers independently decide whether to use the bridge or not, and usage increases as long as b > c(n), so the equilibrium usage level n̂ is determined as the solution to b − c(n) = 0.

[Figure 6.12: Congestion externalities – the curves c(n) and nc′(n) + c(n) plotted against the benefit level b, with n∗ < n̂]

Comparing these, the market equilibrium level of usage n̂ is larger than the socially optimal level, n∗ (see Figure 6.12). The welfare implications of the difference between n∗ and n̂ are clear – usage is excessive from a social point of view.
6.4.1.2 A policy solution

Suppose the government could charge a toll for bridge usage, so that a driver must pay t to cross the bridge. In particular, suppose the bridge manager solves the program maxₙ w(n) and determines that optimal usage is n∗. The manager then charges a toll of t = n∗c′(n∗). So, if n people use the bridge, each individual compares b with c(n) + n∗c′(n∗), the travel cost plus the toll charge. If b > c(n) + n∗c′(n∗), the benefit outweighs the cost and usage increases. By contrast, if b < c(n) + n∗c′(n∗), the opposite is true, and usage will decrease. So, with the toll, equilibrium bridge usage is determined by the value of n that satisfies b = c(n) + n∗c′(n∗), which is n∗.

EXAMPLE 6.8: Consider the case where c(n) = an². Then c′(n) = 2an and c′′(n) = 2a. The solution for n∗, the socially optimal level of usage, is determined by

b = nc′(n) + c(n) = n·2an + an² = 3an²,

or n∗ = √(b/3a) = (1/√3)·√(b/a). The solution for n̂ is given by b − an² = 0, or n̂ = √(b/a) = √3·n∗. ♦
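The conclusion of Example 6.8 can be checked numerically. The sketch below assumes illustrative parameter values a = 0.5 and b = 12 (these are not from the text); the over-use factor √3 is independent of the choice.

```python
import math

# Numerical check of Example 6.8 (free-rider problem with c(n) = a n^2).
# The parameter values a and b are illustrative assumptions, not from the text.
a, b = 0.5, 12.0

n_star = math.sqrt(b / (3 * a))   # socially optimal usage: solves b = 3 a n^2
n_hat = math.sqrt(b / a)          # free-rider equilibrium: solves b = a n^2

# The equilibrium over-uses the bridge by a factor of sqrt(3), whatever a and b.
print(n_hat / n_star)

# Cross-check n_star by maximizing social welfare w(n) = n b - a n^3 on a grid.
w = lambda n: n * b - a * n ** 3
n_best = max((i / 1000 for i in range(1, 10001)), key=w)
assert abs(n_best - n_star) < 1e-2
```

The grid search confirms that the closed-form n∗ maximizes w(n) = nb − an³.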
6.5 Comparative statics

Comparative statics studies the impact on a solution variable of changes in the underlying parameters. Consider an optimization problem where the function to be maximized depends explicitly on some parameter or parameters. For example, consider the maximization of the function f(x; a, c) = a·ln(x + 1) − cx. Writing the function this way makes explicit the dependence of the problem on both a and c. Maximizing gives:

a·(1/(x + 1)) − c = 0,

and the second-order condition for a maximum, −a·(1/(x + 1)²) < 0, is satisfied. From the first-order condition, the solution is x(a, c) = a/c − 1. So, for example, taking c to be a constant, dx/da = 1/c, giving the impact on the solution of a change in the parameter a.

In Figure 6.13, the impact of an increase in a to a′ is depicted. At parameter value a, the optimal choice of x is at x∗; when a is increased to a′, the optimal choice of x moves from x∗ to x′. Thus, the change in x per unit change in a is given by:

Δx/Δa = (x′ − x∗)/(a′ − a),

measuring the change in the optimal choice resulting from the parameter change.
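The comparative static dx/da = 1/c can be checked numerically: maximize f for two nearby values of a and form the difference quotient Δx/Δa. The parameter values below (a = 2, c = 0.5) are illustrative assumptions, not from the text.

```python
# Hedged numerical check of dx/da = 1/c for f(x; a, c) = a*ln(x+1) - c*x.

def maximizer(a, c):
    """Maximize f by bisecting on f'(x) = a/(x+1) - c, which is strictly decreasing."""
    lo, hi = 0.0, 1e6
    for _ in range(200):
        mid = (lo + hi) / 2
        if a / (mid + 1) - c > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

a, c = 2.0, 0.5                              # illustrative parameter values
x_star = maximizer(a, c)
assert abs(x_star - (a / c - 1)) < 1e-6      # closed form: x(a, c) = a/c - 1

da = 1e-4
slope = (maximizer(a + da, c) - maximizer(a, c)) / da
assert abs(slope - 1 / c) < 1e-3             # comparative static: dx/da = 1/c
```

The bisection mirrors the logic of the figure: as a rises, the root of the first-order condition shifts right at the rate 1/c.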
[Figure 6.13: Impact of a parameter change on choice variable – f(x) for a ln(x + 1) − cx and a′ ln(x + 1) − cx, with optima x∗ and x′]

The following examples illustrate these ideas.
EXAMPLE 6.9: Consider the monopoly profit maximization (Example 6.5). Note that the solution depends on the parameters c and α, Q(α, c) = ((1 − α)/c)^(1/α), so that one may consider the impact of variations in α or c on the value of Q. To calculate the derivative dQ/dα, we view Q as a function of α alone, taking marginal cost to be a constant. First, to simplify matters, take the ln of Q and use the fact that d ln Q/dα = (1/Q)·(dQ/dα), so that dQ/dα = Q·(d ln Q/dα). Since ln Q = (1/α) ln((1 − α)/c),

d ln Q/dα = (−1/α²) ln((1 − α)/c) + (1/α)·(−1/(1 − α))
          = (−1/α²) ln((1 − α)/c) − 1/(α(1 − α)).

Multiplying this expression by Q = ((1 − α)/c)^(1/α) gives dQ/dα = Q·(d ln Q/dα):

dQ/dα = ((1 − α)/c)^(1/α) · [(−1/α²) ln((1 − α)/c) − 1/(α(1 − α))],

giving the impact of a change in α on the optimal choice of Q. ♦
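The derivative in Example 6.9 can be verified by comparing the analytic expression with a central finite difference. The values α = 0.5 and c = 0.2 below are illustrative assumptions, not from the text.

```python
import math

# Finite-difference check of dQ/dα for Q(α, c) = ((1-α)/c)^(1/α) (Example 6.9).
alpha, c = 0.5, 0.2                     # illustrative parameter values
Q = lambda a: ((1 - a) / c) ** (1 / a)

# Analytic derivative: dQ/dα = Q * [-(1/α^2) ln((1-α)/c) - 1/(α(1-α))]
dlnQ = -(1 / alpha ** 2) * math.log((1 - alpha) / c) - 1 / (alpha * (1 - alpha))
analytic = Q(alpha) * dlnQ

h = 1e-6
numeric = (Q(alpha + h) - Q(alpha - h)) / (2 * h)
assert abs(analytic - numeric) < 1e-4 * abs(numeric)
```

The agreement confirms the logarithmic-differentiation shortcut used in the example.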
The next example generalizes the cost function:
EXAMPLE 6.10: This example modifies Example 6.5 by considering an alternative cost function. Suppose now that the cost function is c(Q) = Q^β, β > 1. In this case, the profit function is π(Q) = Q^(1−α) − Q^β. Therefore, setting π′(Q) = (1 − α)Q^(−α) − βQ^(β−1) = 0 gives

(1 − α)Q^(−α) − βQ^(β−1) = 0  ⇒  (1 − α) = βQ^(α+β−1)  ⇒  (1 − α)/β = Q^(α+β−1).

This implies that

Q = Q(α, β) = ((1 − α)/β)^(1/(α+β−1)).

Then ln Q = (1/(α + β − 1)) ln((1 − α)/β). If we view Q(α, β) as a function of just α, so that β is held fixed for the entire discussion, then we can consider dQ/dα:

(1/Q)·(dQ/dα) = −(1/(α + β − 1))² ln((1 − α)/β) − (1/(α + β − 1))·(1/(1 − α)),

so that dQ/dα is Q times the right-hand side, calculated in the same way as in Example 6.9. ♦
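The closed-form solution in Example 6.10 can be checked against the first-order condition directly. The values α = 0.4 and β = 2 are illustrative assumptions (with β > 1, as the example requires).

```python
# Check that Q(α, β) = ((1-α)/β)^(1/(α+β-1)) from Example 6.10 satisfies the
# first-order condition π'(Q) = (1-α)Q^(-α) - βQ^(β-1) = 0.
alpha, beta = 0.4, 2.0                  # illustrative parameter values
Q = ((1 - alpha) / beta) ** (1 / (alpha + beta - 1))
marginal_profit = (1 - alpha) * Q ** (-alpha) - beta * Q ** (beta - 1)
assert abs(marginal_profit) < 1e-9      # π'(Q) vanishes at the closed form
```

Because the closed form is obtained by rearranging π′(Q) = 0, the residual is zero up to floating-point rounding.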
Example 6.11 applies the same procedure to the tree harvesting problem (Example 6.7) to consider how the optimal time to harvest varies with the interest rate.

EXAMPLE 6.11: Continuing with the tree harvesting problem of Example 6.7, one may also consider, in the abstract, the effect of variations in the interest rate on the solution value of t. The condition f′(t) − r f(t) = 0 must hold for any r – so that as r varies, so must t. Therefore,

f′′(t)dt − f(t)dr − r f′(t)dt = [f′′(t) − r f′(t)]dt − f(t)dr = 0.

Thus

dt/dr = f(t)/[f′′(t) − r f′(t)],   (6.9)

and if we assume that f > 0 and f′ > 0, so that the value of the tree increases with age, and at a diminishing rate so that f′′ < 0, then this expression is negative.

To present a different perspective, this result may also be derived by approximation. The first-order condition is f′(t) − r f(t) = 0. If r varies to r + Δr, then t must vary to, say, t + Δt so that the first-order condition is satisfied at the new value of r:

f′(t + Δt) − (r + Δr)f(t + Δt) = 0.

Thus,

[f′(t + Δt) − (r + Δr)f(t + Δt)] − [f′(t) − r f(t)] = 0

{f′(t + Δt) − f′(t)} − r{f(t + Δt) − f(t)} − Δr f(t + Δt) = 0.

Multiplying the last expression by (1/Δt)(Δt/Δr) = 1/Δr gives:

[(f′(t + Δt) − f′(t))/Δt]·(Δt/Δr) − r[(f(t + Δt) − f(t))/Δt]·(Δt/Δr) − f(t + Δt) = 0.

As Δr → 0 (and hence Δt → 0), with t varying smoothly in r,

f′′(t)·(dt/dr) − r f′(t)·(dt/dr) − f(t) = 0.

For example, let f(t) = kt^α, so that P(t) = kt^α e^(−rt) − c and

P′(t) = αkt^(α−1)e^(−rt) − rkt^α e^(−rt) = kt^α e^(−rt)[αt^(−1) − r].

Setting this to 0 gives αt^(−1) − r = 0, or t = α/r. Then dt/dr = −α/r².

To illustrate numerically, consider the case where k = 20, α = 0.9, c = 70 and r = 0.05, so that f(t) = 20t^0.9. The objective is then P(t) = 20t^0.9 e^(−0.05t) − 70. Therefore, f′(t) = (0.9)20t^(−0.1) and f′′(t) = −(0.1)(0.9)20t^(−1.1). According to the formula given earlier,

dt/dr = f(t)/[f′′(t) − r f′(t)] = 20t^0.9 / [−(0.1)(0.9)20t^(−1.1) − 0.05(0.9)20t^(−0.1)]
      = t^0.9 / [−(0.1)(0.9)t^(−1.1) − 0.05(0.9)t^(−0.1)]
      = −t/[(0.09)t^(−1) + 0.045].

Using the fact that the solution is t = α/r = 0.9/0.05 = 18, this formula gives dt/dr = −360. (Notice that this is the same result as would be obtained from dt/dr = −α/r² = −0.9/0.0025 = −360.) ♦
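The numerical result dt/dr = −360 can be confirmed by a finite difference on the closed-form harvest time t(r) = α/r. The sketch uses the worked example's values α = 0.9 and r = 0.05; k plays no role, since it cancels out of the first-order condition.

```python
# Check dt/dr = -α/r² for the tree-harvesting example with f(t) = k t^α.
alpha, r = 0.9, 0.05                    # values from the worked example
t = lambda r_: alpha / r_               # root of f'(t) - r f(t) = 0

h = 1e-7
numeric = (t(r + h) - t(r - h)) / (2 * h)
assert abs(numeric - (-alpha / r ** 2)) < 1e-3   # dt/dr = -α/r² = -360
```

Both routes to the derivative, the implicit-function formula (6.9) and direct differentiation of t = α/r, agree.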
6.6 Elasticity of demand

The elasticity of demand measures the responsiveness of demand to price changes. It is defined as the percentage change in quantity demanded due to a percentage change in price, and is denoted ǫ:

ǫ = (% change in Q)/(% change in P) = (ΔQ/Q)/(ΔP/P) = (P/Q)·(ΔQ/ΔP) ≃ (P/Q)·(dQ/dP) = (dQ/dP)/(Q/P).
REMARK 6.7: Often, because demand is downward sloping, the elasticity is defined as:

−(% change in Q)/(% change in P) ≃ −(P/Q)·(dQ/dP),

so that the elasticity is expressed as a positive number for downward sloping demand.

The elasticity definition consists of a slope term dQ/dP and the relative magnitude of the variables, P/Q. For any given dQ/dP, the larger is Q relative to P, the smaller is the elasticity. For example, if demand is equal to 10 units and the slope is −1, a unit price increase causes a unit drop in demand, which is 10% of total demand. If, however, demand is 100 units, and the slope is unchanged, a unit price increase again will reduce demand by 1 unit, which is 1% of demand. The demand elasticity reflects the importance of the levels of the variables, whereas the slope does not.

The calculations above use the fact that (a/b)/(c/d) = ad/bc. The elasticity approximation "≃" becomes exact as the change in price becomes small. The elasticity of demand has a graphical interpretation as the ratio of the slope of the demand curve to the slope of the line from the origin to the point on the demand curve (see Figure 6.14).

REMARK 6.8: A point worth noting is that the reciprocal of the elasticity satisfies:

1/ǫ = (Q/P)·(dP/dQ).

This is the case if dP/dQ = 1/(dQ/dP), since in that case, from the definition of ǫ,

1/ǫ = 1/[(P/Q)·(dQ/dP)] = (Q/P)·[1/(dQ/dP)] = (Q/P)·(dP/dQ).
[Figure 6.14: Demand elasticity – the demand curve Q(P), its slope dQ/dP, and the ray from the origin with slope Q/P]

Because the price function is strictly decreasing (larger Q corresponds to lower P, a downward sloping demand), we can write the relation between quantity and price as Q = Q(P). This gives two alternative representations of the demand function: P = P(Q) and Q = Q(P). If we substitute one into the other we have an identity: Q ≡ Q(P(Q)). In words, for any Q, the price at which this quantity will be purchased, P(Q), is the price such that the quantity demanded at this price is Q. Because this equation holds for every Q, we can differentiate to get

dQ/dQ = (dQ/dP)·(dP/dQ),

and since dQ/dQ = 1, this implies that dQ/dP = 1/(dP/dQ). Since ǫ = (P/Q)·(dQ/dP),

1/ǫ = 1/[(P/Q)·(dQ/dP)] = (Q/P)·(dP/dQ).
EXAMPLE 6.12: Let P(Q) = ln(1 + 1/Q). Then dP/dQ = P′(Q) is:

dP/dQ = (1 + 1/Q)^(−1)·(−1/Q²) = −1/((Q + 1)Q).

Since P(Q) = ln(1 + 1/Q), 1 + 1/Q = e^P, or 1/Q = e^P − 1, or Q = 1/(e^P − 1). Thus dQ/dP = Q′(P) is:

dQ/dP = −(1/(e^P − 1)²)·e^P = −Q²·(1 + 1/Q) = −Q(1 + Q),

therefore dQ/dP = 1/(dP/dQ). ♦
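The reciprocal relation in Example 6.12 can be confirmed numerically at an illustrative point (Q = 2 is an assumed value, not from the text):

```python
import math

# Check dQ/dP = 1/(dP/dQ) for the demand curve P(Q) = ln(1 + 1/Q) of Example 6.12.
Q = 2.0                                 # illustrative evaluation point
dP_dQ = -1 / ((Q + 1) * Q)              # P'(Q) from the example
dQ_dP = -Q * (1 + Q)                    # Q'(P) from the example, in terms of Q
assert abs(dQ_dP - 1 / dP_dQ) < 1e-12

# Independent finite-difference check of Q'(P), using Q(P) = 1/(e^P - 1).
P = math.log(1 + 1 / Q)
h = 1e-7
numeric = (1 / (math.exp(P + h) - 1) - 1 / (math.exp(P - h) - 1)) / (2 * h)
assert abs(numeric - dQ_dP) < 1e-4
```

At Q = 2 both derivatives are exact reciprocals: dP/dQ = −1/6 and dQ/dP = −6.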
6.6.1 Elasticity, revenue and profit maximization

Note that, in general, the elasticity will depend on the point on the demand curve at which it is evaluated. For example, in the linear case P = a − bQ, Q = a/b − (1/b)P, so dQ/dP = −1/b. Therefore,

ǫ = (P/Q)·(dQ/dP) = [(a − bQ)/Q]·(−1/b) = 1 − a/(bQ).

Sometimes it is useful to make explicit the dependence of the elasticity on the quantity by writing ǫ(Q). Note that ǫ < 0 (as long as price is non-negative: a − bQ ≥ 0). When ǫ = −1, 1 − a/(bQ) = −1, or a/(bQ) = 2, or Q = a/(2b), the revenue-maximizing quantity.

Recall that the profit function of a monopoly was written:

π(Q) = P(Q)Q − c(Q) = R(Q) − c(Q).

From this expression:

π′(Q) = R′(Q) − c′(Q)
      = P(Q)·[1 + (Q/P)·(dP/dQ)] − c′(Q)
      = P(Q)·[1 + 1/ǫ] − c′(Q).

Since a profit-maximizing monopolist sets π′(Q) = 0 and solves for Q, this implies that the solution satisfies:

P(Q)·[1 + 1/ǫ(Q)] = c′(Q),

and since ǫ(Q) < 0, this implies that P(Q) > c′(Q) at the solution, so that the profit-maximizing monopolist sets a price greater than marginal cost. Rearranging this expression,

[P(Q) − c′(Q)]/P(Q) = −1/ǫ(Q),

so that the markup of price over marginal cost is equal to the reciprocal of the elasticity (in absolute value). Finally, note that marginal revenue is given by

MR(Q) = R′(Q) = P(Q)·[1 + 1/ǫ] = P(Q) + (1/ǫ)P(Q) < P(Q),

and the marginal revenue curve lies below the demand curve P(Q) by the (positive) amount −(1/ǫ)P(Q).
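The linear-demand results above, revenue maximized at Q = a/(2b) and unit elasticity there, can be checked numerically. The values a = 10 and b = 2 are illustrative assumptions:

```python
# For linear demand P = a - bQ, revenue R(Q) = (a - bQ)Q is maximized where
# the elasticity ε(Q) = 1 - a/(bQ) equals -1, i.e. at Q = a/(2b).
a, b = 10.0, 2.0                        # illustrative parameter values
R = lambda q: (a - b * q) * q

# Grid-search the revenue maximizer over (0, a/b).
q_best = max((i * (a / b) / 100000 for i in range(1, 100000)), key=R)
assert abs(q_best - a / (2 * b)) < 1e-3

eps = 1 - a / (b * q_best)              # elasticity at the revenue maximum
assert abs(eps - (-1)) < 1e-3
```

At the grid maximizer the elasticity is (numerically) −1, matching the unit-elastic condition for maximum revenue.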
6.7 Taxation and monopoly

Suppose that the government wishes to induce a monopolist to produce the quantity at which price equals marginal cost. To achieve this outcome, suppose that the government gives a per-unit subsidy of s and imposes a lump sum tax of T. The profit of the firm is then:

π_g(Q) = Q(P(Q) + s) − cQ − T.

Taking s and T as given, the firm maximizes profit – the first-order condition is

π′_g(Q) = P(Q) + QP′(Q) + s − c = 0.

By an earlier calculation, this is equivalent to: P(Q)·[1 + 1/ǫ(Q)] = c − s. Write the solution to this equation as Q(s), to denote dependence on the level of the subsidy. For any given s, this equation has a unique solution, provided R′′(Q) < 0, where R(Q) = QP(Q).

In the particular case where s = 0, the solution is the monopoly level of output, Q_m: Q(0) = Q_m. Suppose that the subsidy level s∗ is such that Q(s∗) is the competitive quantity, Q(s∗) = Q_c, so that price equals marginal cost, P(Q(s∗)) = c. Then the first-order condition at s∗ is

π′_g(Q(s∗)) = P(Q(s∗)) + Q(s∗)P′(Q(s∗)) + s∗ − c = 0,

and because P(Q(s∗)) = c, this reduces to

π′_g(Q(s∗)) = Q(s∗)P′(Q(s∗)) + s∗ = 0.

Now, recall that 1/ǫ = (Q/P)·P′(Q), so this condition may be written:

s∗ = −Q(s∗)P′(Q(s∗)) = −P(Q(s∗))·[Q(s∗)/P(Q(s∗))]·P′(Q(s∗)) = −P(Q(s∗))/ǫ(Q(s∗)) = −P(Q_c)/ǫ(Q_c).

So, if the subsidy is chosen equal to the price–elasticity ratio at the competitive output, the monopolist's optimal choice is the competitive output. Unraveling the expression: since ǫ(Q_c) = (P(Q_c)/Q_c)·(dQ/dP), the condition s∗ǫ(Q_c) = −P(Q_c) may be written

s∗·(P(Q_c)/Q_c)·(dQ/dP) = −P(Q_c).

Multiply both sides by Q_c/P(Q_c) to get Q_c = −s∗(dQ/dP). In the case of linear demand, dQ/dP = −1/b, so that s∗ should be chosen to satisfy Q_c = s∗/b, or s∗ = bQ_c. Since Q_c = (a − c)/b, the unit subsidy s∗ = a − c leads the monopolist to produce at the competitive output level.

In this case, the total subsidy is s∗Q_c = (a − c)·((a − c)/b) = (a − c)²/b. Because the first-order condition for a maximum does not depend on the lump sum tax T, the output of the firm depends only on the policy parameter s. So, the subsidy may be recouped by a lump sum tax, T, chosen to equal s∗Q_c = (a − c)²/b.
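The linear-demand subsidy result can be checked directly from the first-order condition. The sketch below assumes illustrative values a = 10, b = 2 and c = 4, which are not from the text.

```python
# With linear demand P(Q) = a - bQ, marginal cost c and per-unit subsidy s,
# the monopolist's FOC a - 2bQ + s - c = 0 gives Q(s) = (a + s - c)/(2b).
# The subsidy s* = a - c then induces the competitive quantity Q_c = (a - c)/b.
a, b, c = 10.0, 2.0, 4.0                # illustrative parameter values
Q_of_s = lambda s: (a + s - c) / (2 * b)
Q_c = (a - c) / b

s_star = a - c
assert abs(Q_of_s(s_star) - Q_c) < 1e-12    # subsidy hits the competitive output
assert abs(Q_of_s(0.0) - Q_c / 2) < 1e-12   # without subsidy: half that quantity

# Lump-sum tax recouping the subsidy: T = s* Q_c = (a - c)^2 / b.
T = s_star * Q_c
assert abs(T - (a - c) ** 2 / b) < 1e-12
```

The check also displays the familiar linear-demand fact that the unsubsidized monopoly output is exactly half the competitive output.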
Since the solution depends on government-set parameters, comparative statics calculations arise naturally in considering the impact of government policies. The next example discusses the impact of various tax changes on a monopolist's choice of quantity.
EXAMPLE 6.13: Suppose that a government imposes a revenue tax t and an output subsidy s. Thus, if the firm receives revenue R and produces quantity Q, the tax paid is tR and the subsidy received is sQ. With demand given by P(Q) = 1/Q^α = Q^(−α), 0 < α < 1, and linear cost (c(Q) = cQ), the profit function is

π(Q) = (1 − t)R(Q) − cQ + sQ = (1 − t)Q^(1−α) − (c − s)Q.

Thus, the first-order condition is π′(Q) = (1 − t)(1 − α)Q^(−α) − (c − s) = 0. Rearranging,

Q^(−α) = (c − s)/[(1 − t)(1 − α)]  ⇒  Q^α = (1 − t)(1 − α)/(c − s)  ⇒  Q = [(1 − t)(1 − α)/(c − s)]^(1/α).

It is of interest to know the impact of changes in t and s on output. Some calculation gives:

dQ/dt = (1/α)·[(1 − t)(1 − α)/(c − s)]^((1/α)−1)·(−(1 − α)/(c − s))

and

dQ/ds = (1/α)·[(1 − t)(1 − α)/(c − s)]^((1/α)−1)·[(1 − t)(1 − α)/(c − s)²]
      = (1/α)·[(1 − t)(1 − α)/(c − s)]^(1/α)·(1/(c − s)). ♦
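The dQ/dt expression in Example 6.13 can be verified by a finite difference. The parameter values α = 0.5, c = 2, s = 0.5 and t = 0.2 are illustrative assumptions, not from the text.

```python
# Finite-difference check of dQ/dt in Example 6.13, where
# Q(t) = ((1-t)(1-α)/(c-s))^(1/α).
alpha, c, s, t = 0.5, 2.0, 0.5, 0.2     # illustrative parameter values
base = lambda t_: (1 - t_) * (1 - alpha) / (c - s)
Q = lambda t_: base(t_) ** (1 / alpha)

analytic = (1 / alpha) * base(t) ** (1 / alpha - 1) * (-(1 - alpha) / (c - s))
h = 1e-6
numeric = (Q(t + h) - Q(t - h)) / (2 * h)
assert abs(analytic - numeric) < 1e-6
assert analytic < 0                      # a higher revenue tax lowers output
```

As expected, output falls in the revenue tax t and (by the companion formula) rises in the subsidy s.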
6.8 Taxation incidence in supply and demand
When a tax is imposed on a good, this generally impacts both the buyer and the seller. The price paid by the buyer will typically rise, and the price received by the seller will typically fall – with the gap equal to the tax. Relative to the pre-tax price P₀, the extent to which the after-tax buyer price, P_t^b, exceeds P₀, and the extent to which the seller price received, P_t^s, falls below P₀, measures the tax incidence. With a unit tax, P_t^b = P_t^s + t. If, for example, P_t^b − P₀ is much larger than P₀ − P_t^s, then the tax incidence is borne mostly by the buyer. This is discussed next.
Suppose that in some market the government charges a tax t per unit sold. Thus the price paid by the purchaser exceeds the price received by the seller by the amount t: P_d(Q) = P_s(Q) + t. Here P_d denotes the price paid by the demander and P_s the price received by the seller. Prior to the imposition of the tax, the equilibrium quantity Q_e is determined by the equation P_d(Q_e) = P_s(Q_e). Demand is downward sloping and supply upward sloping, so P′_d(Q_e) < 0 and P′_s(Q_e) > 0, and there is a unique point of intersection. The situation before and after tax is depicted in Figure 6.15.

[Figure 6.15: The impact of taxation on price and quantity – supply S and demand D, the pre-tax equilibrium at Q_e with P_d(Q_e) = P_s(Q_e), and the after-tax quantity Q_t with P_s(Q_t) + t = P_d(Q_t)]

When a unit tax is imposed on a good, the amount of the tax is independent of the selling price, so that the gap between P_s and P_d is equal to the tax: P_d(Q) = P_s(Q) + t. In contrast, with an ad-valorem tax, the amount of the tax depends on the price. In this case, the selling price is scaled up by the addition of a tax that is a percentage of the price, and then P_d(Q) = (1 + t)P_s(Q). In either case, the result is to reduce the equilibrium amount traded. Let Q(t) (or Q_t) be the amount traded at tax level t. This is a function of t, whether the tax is a unit or ad-valorem tax. What is dQ/dt? This question may be answered directly by solving for Q as a function of t and then computing the derivative. It is also possible to derive a general formula for dQ/dt in terms of elasticities, and that approach provides some insight into how elasticities affect the responsiveness of quantity to variation in the tax. These matters are discussed next. The analyses for the unit and ad-valorem taxes are provided separately, although the two cases are quite similar.
6.8.1 A unit tax

The impact of the imposition of a unit tax on quantity can be determined as follows. Given an initial value for t, the corresponding quantity, Q(t), satisfies

P_d(Q(t)) = P_s(Q(t)) + t,  (Q(0) = Q_e).

Differentiating gives

P′_d(Q(t))·(dQ(t)/dt) − 1 = P′_s(Q(t))·(dQ(t)/dt).

This may be rearranged:

−1 = [P′_s(Q(t)) − P′_d(Q(t))]·(dQ(t)/dt).

At t = 0, Q(t) = Q(0) = Q_e and P_s(Q(0)) = P_d(Q(0)) = P_e, so considering this equation at t = 0,

−1 = [(P_e/Q_e)·(Q_e/P_e)·P′_s(Q_e) − (P_e/Q_e)·(Q_e/P_e)·P′_d(Q_e)]·(dQ(0)/dt)
   = (P_e/Q_e)·[1/η_s − 1/η_d]·(dQ(0)/dt).   (6.10)

The terms η_s and η_d denote the elasticities of supply and demand respectively. Since P′_d(Q) < 0 and P′_s(Q) > 0, η_d < 0 and η_s > 0. Solving for dQ/dt (evaluated at t = 0, dQ(0)/dt) gives

dQ/dt = −1 / {(P_e/Q_e)·[1/η_s − 1/η_d]}.

Since η_s > 0 and η_d < 0, dQ/dt < 0. Notice that when the elasticities are small, the impact on Q of a change in t is large. This calculation gives the impact on Q of a change in t starting from the initial point where t = 0.
EXAMPLE 6.14: Consider a one-market model for good Q, where demand is given by P_d(Q) = Q^(−α), 0 < α < 1, and supply is given by P_s(Q) = bQ, b > 0. The market-clearing quantity is given by the solution to Q^(−α) = bQ, or 1 = bQ^(1+α), or Q^(1+α) = 1/b, or Q = (1/b)^(1/(1+α)) = b^(−1/(1+α)).

With a unit tax, market clearing becomes Q^(−α) = bQ + t. This expression is difficult to work with. However, the elasticities formula above allows a direct computation of dQ/dt at the market-clearing level. The elasticities of demand and supply may be computed: P′_d(Q) = −αQ^(−(1+α)), so that

1/η_d = (Q/P_d)·P′_d(Q) = (Q/Q^(−α))·(−α)Q^(−(1+α)) = Q^(1+α)·Q^(−(1+α))·(−α) = −α.

Similarly, P_s(Q) = bQ, so that P′_s(Q) = b, and

1/η_s = (Q/P_s)·P′_s(Q) = (Q/(bQ))·b = 1.

Summarizing, η_d = −1/α and 1/η_d = −α; similarly, η_s = 1 and 1/η_s = 1. Finally, observe that P_e/Q_e = b, since P_e = bQ_e, so:

dQ/dt = −1/(b(1 + α)). ♦
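Although the market-clearing equation Q^(−α) = bQ + t has no convenient closed form, it can be solved numerically and the elasticity formula checked directly. The values α = 0.5 and b = 2 are illustrative assumptions.

```python
# Check dQ/dt = -1/(b(1+α)) at t = 0 for Example 6.14 (unit tax, demand Q^{-α},
# supply bQ). Market clearing Q^{-α} = bQ + t is solved by bisection on Q,
# since the excess-price gap is strictly decreasing in Q.
alpha, b = 0.5, 2.0                     # illustrative parameter values

def clearing_q(t):
    lo, hi = 1e-9, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid ** (-alpha) - b * mid - t > 0:   # demand price still above supply + t
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

h = 1e-5
numeric = (clearing_q(h) - clearing_q(-h)) / (2 * h)
assert abs(numeric - (-1 / (b * (1 + alpha)))) < 1e-4
```

The finite difference across t = 0 reproduces the elasticity-formula value −1/(b(1 + α)).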
As mentioned, a unit tax is independent of the selling price and is simply added to it, so that the market clears when P_d(Q_t) = P_s(Q_t) + t. In contrast, with the ad-valorem tax, the tax paid is a percentage of the sale price.
6.8.2 An ad-valorem tax

Suppose that the government imposes an ad-valorem tax t, so that in the after-tax equilibrium (1 + t)P_s(Q(t)) = P_d(Q(t)), where Q(t) denotes the after-tax quantity traded. Note that Q(0) = Q_e.
With an ad-valorem tax, the elasticity formula is slightly different. Differentiate (1 + t)P_s(Q(t)) = P_d(Q(t)) with respect to t, to get

P_s(Q(t)) + (1 + t)P′_s(Q(t))·(dQ(t)/dt) = P′_d(Q(t))·(dQ(t)/dt),

or

[(1 + t)P′_s(Q(t)) − P′_d(Q(t))]·(dQ(t)/dt) = −P_s(Q(t)),

or

[(1 + t)·(P_s(Q(t))/Q(t))·(Q(t)/P_s(Q(t)))·P′_s(Q(t)) − (P_d(Q(t))/Q(t))·(Q(t)/P_d(Q(t)))·P′_d(Q(t))]·(dQ(t)/dt) = −P_s(Q(t)).

Let 1/η_s(Q(t)) = (Q(t)/P_s(Q(t)))·P′_s(Q(t)) and 1/η_d(Q(t)) = (Q(t)/P_d(Q(t)))·P′_d(Q(t)) be the reciprocals of the elasticities of supply and demand at Q(t). Then the expression may be written as:

[(1 + t)·(P_s(Q(t))/Q(t))·(1/η_s(Q(t))) − (P_d(Q(t))/Q(t))·(1/η_d(Q(t)))]·(dQ(t)/dt) = −P_s(Q(t)).

This expression is valid at each value of t. However, it is simpler at t = 0, since P_s(Q(0)) = P_d(Q(0)) = P_e, so that P_s(Q(t))/Q(t) equals P_s(Q(0))/Q(0), or P_e/Q_e. Likewise, at t = 0, P_d(Q(t))/Q(t) equals P_d(Q(0))/Q(0), or P_e/Q_e. In this case, the expression becomes:

(P_e/Q_e)·[1/η_s(Q_e) − 1/η_d(Q_e)]·(dQ(0)/dt) = −P_s(Q(0)) = −P_e.

Rearranging:

dQ(0)/dt = −P_e / {(P_e/Q_e)·[1/η_s − 1/η_d]} = −Q_e / [1/η_s − 1/η_d],

where the elasticities are evaluated at Q_e and all functions at t = 0. Applying this formula to the demand and supply functions of Example 6.14 gives:
dQ/dt = −(1/(1 + α))·(1/b)^(1/(1+α)) = −(1/(1 + α))·b^(−1/(1+α)).

This calculation may be confirmed directly by solving for Q(t) and differentiating. With an ad-valorem tax, the equation Q(t)^(−α) = (1 + t)bQ(t) sets demand equal to supply. Rearranging,

1/((1 + t)b) = Q(t)^(1+α), or Q(t) = [1/((1 + t)b)]^(1/(1+α)) = ((1 + t)b)^(−1/(1+α)).

Thus,

dQ(t)/dt = −(1/(1 + α))·[(1 + t)b]^(−1/(1+α)−1)·b.

At t = 0,

dQ(t)/dt = −(1/(1 + α))·b^(−1/(1+α)).

So, the two calculations yield the same result.
EXAMPLE 6.15: As a final example, these calculations may be repeated, replacing P_s(Q) = bQ, b > 0, with P_s(Q) = Q^β, β > 0. In this case η_d = −1/α and η_s = 1/β, so 1/η_d = −α and 1/η_s = β, and so 1/η_s − 1/η_d = α + β. The equilibrium quantity satisfies Q^(−α) = Q^β, or 1 = Q^(α+β), or Q = 1^(1/(α+β)) = 1. Likewise, P = 1 is the equilibrium price. Then, using the formula,

dQ/dt = −1/(α + β).

By contrast, solving P_d(Q(t)) = Q(t)^(−α) = (1 + t)Q(t)^β gives Q(t)^(α+β) = (1 + t)^(−1) and Q(t) = (1 + t)^(−1/(α+β)). Differentiating:

dQ/dt = −(1/(α + β))·(1 + t)^(−1/(α+β)−1),

and this equals −1/(α + β) at t = 0. ♦
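The closed-form check in Example 6.15 can be confirmed numerically. The values α = 0.5 and β = 1.5 are illustrative assumptions, not from the text.

```python
# Ad-valorem tax with P_d = Q^{-α} and P_s = Q^{β} (Example 6.15):
# Q(t) = (1+t)^{-1/(α+β)}, so dQ/dt at t = 0 is -1/(α+β).
alpha, beta = 0.5, 1.5                  # illustrative parameter values
Q = lambda t: (1 + t) ** (-1 / (alpha + beta))

h = 1e-6
numeric = (Q(h) - Q(-h)) / (2 * h)
assert abs(numeric - (-1 / (alpha + beta))) < 1e-6
```

The finite difference across t = 0 matches the elasticity-formula value −1/(α + β).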
6.8.3 Elasticities and taxation

Considering the formula associated with the unit tax,

dQ/dt = −1 / {(P/Q)·[1/η_s − 1/η_d]},

the following observations highlight some aspects of elasticities and their relationship to the responsiveness of demand and price to changes in taxation. If η_s is very large, the quantity supplied is very responsive to price changes, so the supply curve is flat. When η_d is also very large (in absolute value), then the demand curve is also flat. In this case, both 1/η_s and 1/η_d are very small, so that dQ/dt is a large negative number (−1 divided by a small positive number). Therefore, large elasticities lead to a large change in quantity traded as a result of the tax. (See Figure 6.16.)
[Figure 6.16: Large and small changes in quantity – two panels with supply S, demand D, the same price change ΔP, and the quantities Q_t and Q_e]
When both η_s and η_d are small, this corresponds to the situation where neither supply nor demand is responsive to price changes – the curves are both steep. In this case, 1/η_s and 1/η_d are large, so that dQ/dt is a small negative number. (See Figure 6.16.)

Other possibilities arise when only one of the elasticities is relatively small. If η_s is very large (so that 1/η_s ≃ 0) and η_d is very small, then the supply curve is close to horizontal and the demand curve steep. (See Figure 6.17.) When η_d is very large (so that 1/η_d ≃ 0) and η_s is very small, the supply curve is steep and the demand curve flat. (See Figure 6.17.)

While the price change is the same in both cases in Figure 6.16, the quantity change is much smaller when the elasticities are small. In Figure 6.16, the demand and supply curves are similarly sloped; in one case they are relatively elastic, so that large quantity changes occur as a result of the tax, and in the other a less significant change occurs. Because they have similar slopes, the incidence of the tax is equally borne by demanders and suppliers – the rise in price for one is roughly equal to the drop in price for the other. In Figure 6.17, the demand and supply curves have dissimilar slopes, so the taxation incidence is unequally distributed.
[Figure 6.17: Taxation with elastic demand or supply – two panels with supply S, demand D, the price change ΔP, and the quantities Q_t and Q_e]
6.9 Ramsey pricing

Consider the efficiency problem that arises when there is a fixed cost in production, so that pricing at marginal cost generates a loss. In the absence of a subsidy to production, price must be set above marginal cost. Consider the task of recovering fixed cost by pricing above marginal cost in different markets where the fixed cost is jointly shared in production, while minimizing the loss of consumer and producer surplus. The issue then is to set prices in each of the markets to recover the fixed cost, but in a way that minimizes overall deadweight loss.

Consider two markets with demands p_x(x) = a − bx and p_y(y) = α − βy, where
both items are produced by a firm with cost function c(x, y) = F + c[x + y], so that there is a common constant marginal cost c and a fixed cost F. Pricing at marginal cost is efficient but generates a loss for the firm. If prices are set above marginal cost so that the fixed cost F is recouped, what is the optimal pricing scheme (minimizing loss of surplus, subject to raising the fixed cost F)? In Figure 6.18, to simplify, the parameters are chosen so that the quantity, q, setting price equal to marginal cost is the same in both markets.

Consider reducing the quantity in each market by the same amount, from q to q′. In market X this leads to price d and revenue R_x = (d − c)q′. The deadweight loss is given by the triangle DWL_x = ½(d − c)(q − q′). In market Y, the price is e and the revenue raised is R_y = (e − c)q′. Also, the deadweight loss is DWL_y = ½(e − c)(q − q′).
Figure 6.18: Ramsey pricing
Now, consider the revenue generated per unit of deadweight loss in each market, ρ_x and ρ_y:

ρ_x = R_x/DWL_x = (d − c)q′ / [½(d − c)(q − q′)] = 2q′/(q − q′),

and

ρ_y = R_y/DWL_y = (e − c)q′ / [½(e − c)(q − q′)] = 2q′/(q − q′).
Let ρ = ρ_x = ρ_y. Viewed as a function of (q′, q), ρ(q′, q) falls as q′ falls, so the revenue gain per unit of deadweight loss is greatest when q′ is close to q. And the optimal quantity reduction is the same in both markets: if, for example, the reductions were such that q′_x < q′_y, then the revenue gain per unit of deadweight loss would be smaller in market X than in market Y.
In the case where marginal costs are constant but different and the price-equals-marginal-cost conditions give different quantities, the calculations are similar. When quantities are reduced from q_α to q′_α and the corresponding prices following the quantity reductions are p′_α, α = x, y, the revenue gains in markets x and y are R_x = q′_x(p′_x − c_x) and R_y = q′_y(p′_y − c_y), and the deadweight losses are DWL_x = ½(p′_x − c_x)(q_x − q′_x) and DWL_y = ½(p′_y − c_y)(q_y − q′_y). The corresponding
ratios are:

q′_x(p′_x − c_x) / [½(p′_x − c_x)(q_x − q′_x)] = 2q′_x/(q_x − q′_x),

q′_y(p′_y − c_y) / [½(p′_y − c_y)(q_y − q′_y)] = 2q′_y/(q_y − q′_y).

Setting these equal:

q′_x/(q_x − q′_x) = q′_y/(q_y − q′_y),

so that q′_x(q_y − q′_y) = q′_y(q_x − q′_x), or q′_x q_y = q′_y q_x, or q′_x/q_x = q′_y/q_y. Therefore, the quantities should be reduced in the same proportion.
Exercises for Chapter 6
Exercise 6.1 Let C(q) denote the total cost of producing q units of a good. Show that the average cost is minimized at the point where average and marginal costs coincide.

Exercise 6.2 The price–cost margin (markup), m, for a monopolist is defined as

m = (price − marginal cost)/price.
Find the relationship between ε, the elasticity of demand, and m.

Exercise 6.3 Let demand for a good be given by p = a − bq or, equivalently, by q = (a/b) − (1/b)p = α − βp, where α = a/b and β = 1/b. The cost of production of the good is c(q) = cq, where c is a positive constant. Assume there is only one firm producing the good.
(a) Is the firm’s profit function concave?
(b) Find the profit-maximizing level of output of the good.
(c) Suppose that the firm is taxed, paying $τ per unit of output. Give the firm’s new objective function and find the firm’s profit-maximizing level of output.
(d) Suppose that the firm is taxed, such that on each dollar of sales the firm pays the fraction τ in taxes. Give the firm’s new objective function and find the profit-maximizing output level.
(e) In cases (c) and (d), analyze the effect of changing τ on the optimal choice of q.
Exercise 6.4 A monopolist faces a demand curve P(Q) = kQ^(−α) (0 < α < 1) and cost function c(Q) = cQ², c > 0.

(a) Find the profit-maximizing level of output, Q*. Confirm that the second-order condition for a maximum is satisfied.
(b) Now suppose that the cost function is c(Q) = c1·Q + c2·Q², where c1, c2 > 0 (the demand function is still P(Q) = kQ^(−α)). Confirm that the profit function is concave.
Exercise 6.5 Consider the following supply and demand model: demand: P^d(Q) = α − βQ; supply: P^s(Q) = γ − δQ. The equilibrium quantity Q^o solves P^d(Q) = P^s(Q), i.e., P^d(Q^o) = P^s(Q^o).

(a) Find the elasticities of the demand and supply functions, η_d and η_s.
(b) Calculate the impact of a small (per) unit tax, t, on the quantity sold.
(c) Relate your answer, dQ/dt, to the elasticity formula:

dQ/dt = −1 / [ (p/Q) { 1/η_s − 1/η_d } ].

(d) Replace the supply function with P^s(Q) = γ − ½δQ² and repeat the calculations for the pair: P^d(Q) = α − βQ and P^s(Q) = γ − ½δQ².
Exercise 6.6 Suppose that instead of a unit tax, a proportional tax is imposed so that the after-tax relation between the selling and buying price is given by P^d(Q) = (1 + t)P^s(Q). Calculate dQ/dt at t = 0.
Exercise 6.7 In a city, electricity is produced by one firm. This firm is given the sole right to produce the city’s electricity because it can produce the electricity needed more efficiently than could any other firm or firms. The firm is thus a natural monopoly. The total revenue function for the firm is R(Q) = P(Q)Q = 24Q − 5Q² (so P(Q) = 24 − 5Q) and its total cost is 25 − Q².

One of the problems that the city faces is that since the firm is a natural monopoly, and since electricity has a very inelastic demand, the firm can make large profits. To control these large profits, the city is considering a revenue tax (t_R) as a means of regulating the firm. Assuming that the firm is a profit maximizer:

(a) Calculate the optimal price and quantity before regulation, and after the revenue tax is applied.
(b) If the tax rate is set at 20%, will the profits of the firm be decreased? Will the city benefit from this regulation, i.e., will it face lower prices and higher quantities?
(c) Calculate the tax revenue the city will receive.
Exercise 6.8 To control the use of tobacco, the federal government can impose many different types of tax on the producers of tobacco. Assume that there is only one producer (a monopolist), who faces the demand curve P(Q) = 56 − 8Q, and whose total cost curve is C(Q) = 6Q² − 10. If the producer is a profit maximizer:

(a) Determine the profit-maximizing output of the monopolist and the associated price.
(b) Assume now that the government levies a per-unit tax of t on the producer in order to reduce the use of tobacco. What effect does this have on the profit-maximizing price and output?
(c) Alternatively, assume the government levies a sales tax of 50% on the producer. What effect does this have on the profit-maximizing price and output?
(d) If the per-unit tax is set at 7, and the sales tax is set at 50%, which tax is more effective?

Exercise 6.9 Assume that a perfectly competitive producer has a production function for a homogeneous good given by Q(L) = 20L^(1/2). The costs of production are given by a fixed cost of $50 plus a wage rate of $8 for each unit of labour employed. If the market price of the good is $4:

(a) Find the amount of labour needed to maximize profits, and the associated level of profits.
(b) Assume that a payroll tax is imposed, which causes the wage rate to become w̄ = w(1 + t). Determine the new amount of labour needed to maximize profits.
(c) If t = 0.25, what is the amount of labour needed to maximize profits? What is the associated level of profits?
Exercise 6.10 A competitive firm faces the market price of $9 per unit. Its total cost is given by Q³ − 9Q² + 33Q + 10.

(a) Determine the level of output that maximizes the firm’s profits, and the value of profits at this level. Will the firm continue to produce?
(b) Determine the minimum price at which the firm will stay in the industry.

Exercise 6.11 Consider a firm with cost function C = 6 + Q² facing the market demand function Q = 18 − 2P.

(a) If the firm behaves as a monopolist, what will be the price and quantity in equilibrium?
(b) If the firm behaves competitively, what will be the equilibrium price and quantity?
(c) Suppose the monopolist were offered a subsidy S per unit produced. What level of subsidy would induce the monopolist to produce at the competitive equilibrium?

Exercise 6.12 Consider a firm with the following cost and demand functions: C(Q) = Q² + 2Q + 1, Q = 16 − 2P.

(a) Calculate the equilibrium price and quantities: (i) if the firm acts as a monopolist; (ii) if the firm acts competitively.
(b) Calculate the profits the firm makes: (i) as a monopolist; (ii) as a competitor.
(c) Suppose the government offered the monopolist a per-unit subsidy, S. What level of S will bring about the competitive solution?
(d) What is the subsidized monopolist’s level of profit?

Exercise 6.13 Consider a monopolist with a demand function P = P(Q) and cost function c = c(Q), c′(Q) > 0. Show that if the monopolist maximizes profit, the monopolist will operate on the portion of the demand curve where the elasticity exceeds 1.

Exercise 6.14 Consider a monopolist with a demand function P = P(Q) and cost function c = c(Q), c′(Q) > 0. Show that with profit maximization, the firm will operate on the portion of the demand curve where the elasticity exceeds 1.

Exercise 6.15 In some market, the demand function is given by p^d(q) = a/ln(q) and the supply function is given by p^s(q) = bq.

(a) Calculate the elasticities of demand and supply.
(b) The equilibrium quantity satisfies p^d(q) = p^s(q). Starting from a point where there is no tax on the good, suppose that a small tax t is imposed – so that p^d(q) = p^s(q) + t. Calculate dq/dt, dp^d/dt and dp^s/dt, all at t = 0.
Exercise 6.16 Suppose that the demand and supply functions in a market are given by P^d(Q) = Q^(−α) and P^s(Q) = e^(kQ) − 1. The equilibrium quantity depends on the parameters α and k: Q(α, k).

(a) Show that ∂Q/∂α < 0 and ∂Q/∂k < 0.
(b) Suppose now that the government imposes a “goods and services” style tax at rate t (formally, a value added tax), so that the equilibrium quantity after tax is Q(t) (ignore the parameters α and k from now on). Thus, Q(t) satisfies P^d(Q(t)) = (1 + t)P^s(Q(t)). Find dQ(0)/dt in terms of elasticities.
7 Systems of equations, differentials and derivatives

7.1 Introduction
In many economic problems, functions of more than one variable arise naturally. Preferences are represented by a utility function that depends on the goods consumed, u(x1, x2, ..., xn); output in production is represented as a function of inputs, f(k1, k2, ..., kn); and so on. As with the single-variable case, differential techniques play an important role in a variety of applications, such as comparative statics and optimization. For example, the impact of variation in one variable on the value of a function routinely arises in considering its marginal impact. In the production case, the marginal productivity of input k1 is given by [f(k1 + ∆, k2, ..., kn) − f(k1, k2, ..., kn)]/∆, and measures the impact on output of a small change in input k1. Such a calculation leads to a partial derivative, since only variation in one of a number of variables is considered. Developing these techniques leads to a number of applications and provides the necessary tools to develop some important results – in particular the implicit function theorem.

Following some introductory discussion, partial derivatives are defined in Section 7.2. This is the natural extension of the derivative of a function of one variable. After this, level contours are introduced (Section 7.3). A level contour identifies the set of x that yields a specific value for the function: in microeconomic theory, indifference curves and isoquants are level contours of utility and production functions respectively. In two dimensions particularly, level contours provide a useful way to visualize the shape of a function. Concepts such as the marginal rate of substitution between goods in consumption or production are explained naturally in terms of level contours and are defined using partial derivatives.
One important use of partial derivatives is in locating maxima and minima of a function: values of (x1, ..., xn) that make the function as large or as small as possible. This technique is illustrated in Section 7.5. The remaining sections consider (nonlinear) systems of equations. Section 7.7 outlines some of the basic issues relating to nonlinear systems, such as the existence and uniqueness of solutions. Following this, the implicit function theorem is introduced in Section 7.8. This theorem studies how variation of one or more parameters in a system of equations affects each variable in the system.

For a linear function of many variables, the impact of parameter variation is easy to determine. The function f(x1, x2) = ax1 + bx2 is a linear function of two variables and f(x1, x2, ..., xn) = a1x1 + a2x2 + · · · + anxn is a linear function of n variables, (x1, x2, ..., xn). The impact of a change in x1 on the value of f(x1, x2) = ax1 + bx2 may be computed directly:
f(x1 + ∆x1, x2) − f(x1, x2) = (a[x1 + ∆x1] + bx2) − (ax1 + bx2) = a∆x1.

From this, the change in f per unit change in x1 may be calculated as:

[f(x1 + ∆x1, x2) − f(x1, x2)]/∆x1 = a.

Similar calculations may be applied for nonlinear functions to compute the impact of variation in some variable on the function.

EXAMPLE 7.1: Consider f(x1, x2) = (x1x2)² + cos(x1) sin(x2), and suppose that x1 is increased to x1 + ∆x1:

f(x1 + ∆x1, x2) = ([x1 + ∆x1] · x2)² + cos(x1 + ∆x1) sin(x2)
= [x1² + 2x1∆x1 + (∆x1)²] · x2² + cos(x1 + ∆x1) sin(x2)
= x1² · x2² + [2x1∆x1 + (∆x1)²] · x2² + cos(x1 + ∆x1) sin(x2).

Subtracting f(x1, x2), then

f(x1 + ∆x1, x2) − f(x1, x2) = [2x1∆x1 + (∆x1)²] · x2² + [cos(x1 + ∆x1) − cos(x1)] sin(x2).

Dividing by ∆x1,

[f(x1 + ∆x1, x2) − f(x1, x2)]/∆x1 = 2x1x2² + (∆x1) · x2² + {[cos(x1 + ∆x1) − cos(x1)]/∆x1} sin(x2).
As ∆x1 goes to 0, the term (∆x1) · x2² goes to 0 and [cos(x1 + ∆x1) − cos(x1)]/∆x1 approaches the derivative of cos(x1), which is −sin(x1). Therefore, as ∆x1 approaches 0,

[f(x1 + ∆x1, x2) − f(x1, x2)]/∆x1 → 2x1x2² − sin(x1) sin(x2).
♦

This example shows that calculating the impact of parameter or variable changes on a function of many variables or parameters is not significantly different from similar computations in the one-variable case. The need for such calculations comes up routinely in economics. Consider maximizing utility u(x1, x2) subject to a budget constraint, p1x1 + p2x2 = y, where p_i is the price of good i and y is income. Rearranging the budget constraint:

x2 = y/p2 − (p1/p2) x1.

Substituting this into the utility function gives

û(x1, y, p1, p2) = u(x1, y/p2 − (p1/p2) x1).

Taking (y, p1, p2) as fixed (beyond the control of the consumer), maximizing utility amounts to choosing x1 to maximize û(x1, y, p1, p2). The impact of variations in x1 on û requires calculation of the direct effect via x1 and the indirect effect via x2 = y/p2 − (p1/p2) x1. When x1 is chosen optimally, the maximized value of û gives the utility of the individual from choosing optimally – let u*(y, p1, p2) = max_{x1} û(x1, y, p1, p2). The function u* gives the utility of the individual as a function of (y, p1, p2), so one may consider the impact of income and price variations on the individual’s welfare. The impact on utility of an increase in income from y to y + ∆y is given by u*(y + ∆y, p1, p2) − u*(y, p1, p2), and the per-unit change is given by

[u*(y + ∆y, p1, p2) − u*(y, p1, p2)]/∆y.

This expression is a partial derivative – calculating the impact on u* of a small change in y, with the other parameters affecting u* held fixed. The partial derivative is the analogous concept, for functions of more than one variable, to the derivative of a function of one variable. In this example, partial derivatives appear both in the choice of x1 and the subsequent calculation of the marginal utility of income.

In production theory, a production function f(k, l) associates a level of output Q = f(k, l) to inputs of capital (k) and labor (l). If capital is increased from k to k + ∆, then output increases to f(k + ∆, l) and the change in output is f(k + ∆, l) − f(k, l).
Dividing this by ∆ gives [f(k + ∆, l) − f(k, l)]/∆, the change in output per unit change in the capital input. As ∆ goes to 0, assuming f is differentiable with respect to k, the limiting value is called the marginal product of capital.

In demand theory, when a profile of goods (x1, ..., xn) is chosen to maximize utility subject to a budget constraint p1x1 + · · · + pnxn = y, the resulting demands depend on the profile of prices and income. Thus, the demand for good i is x_i(p1, ..., pn, y). When prices other than p_i and income y are constant, changes in p_i lead to changes in x_i, and this connection between x_i and p_i gives the demand function for good i. As p_i varies, the resulting change in x_i per unit change in p_i is

[x_i(p1, ..., p_{i−1}, p_i + ∆p_i, p_{i+1}, ..., pn, y) − x_i(p1, ..., p_{i−1}, p_i, p_{i+1}, ..., pn, y)]/∆p_i.

The partial derivative with respect to one variable depends on the values of the other variables entering the function. For example, the impact of a price change on demand depends on the prices of other goods and income. In Figure 7.1, the partial derivative with respect to p_i at (p1, ..., p_i, ..., pn, y) is different from the partial derivative at (p̃1, ..., p_i, ..., p̃n, ỹ), since |∆x̃_i| > |∆x_i|.

The next section defines the partial derivative of a function – a natural extension of the derivative of a function of a single variable to functions of many variables. Following this, level contours are introduced and then, using partial derivatives, the equilibrium output in a duopoly is determined.
Figure 7.1: The impact of price variations on demand
7.2 Partial derivatives

Let f : X → R, where X ⊆ R^n, so f(x) = f(x1, x2, ..., xn). If all the x’s except one are held fixed, we can view f as a function of one variable, and then consider the derivative of f with respect to this variable. This is called the partial derivative. The partial derivative of f with respect to x_i is denoted ∂f/∂x_i and defined as:

∂f/∂x_i = lim_{∆x_i→0} [f(x1, x2, ..., x_{i−1}, x_i + ∆x_i, x_{i+1}, ..., xn) − f(x1, x2, ..., xn)]/∆x_i.

Thus, ∂f/∂x_i is defined provided this limit exists as ∆x_i → 0. The partial derivative of f with respect to x_i measures the change in f due to a small change in x_i, when the other variables are held constant.
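The definition translates directly into a finite-difference approximation: perturb one coordinate, hold the others fixed, and divide by the step. A small sketch (the helper `partial`, the step size and the evaluation point are illustrative, not from the text):

```python
# Finite-difference sketch of the partial derivative definition.
import math

def partial(f, x, i, h=1e-6):
    """Approximate df/dx_i at the point x by a forward difference."""
    x_step = list(x)       # copy so other coordinates stay fixed
    x_step[i] += h
    return (f(x_step) - f(x)) / h

# f(x1, x2) = (x1*x2)**2 + cos(x1)*sin(x2), the function of Example 7.2.
f = lambda x: (x[0] * x[1]) ** 2 + math.cos(x[0]) * math.sin(x[1])

x = [1.0, 2.0]
d1 = partial(f, x, 0)   # approximates 2*x1*x2**2 - sin(x1)*sin(x2)
d2 = partial(f, x, 1)   # approximates 2*x1**2*x2 + cos(x1)*cos(x2)
```

The approximations agree with the analytic partial derivatives computed in Example 7.2 to within the truncation error of the forward difference.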
EXAMPLE 7.2: With f(x1, x2) = (x1x2)² + cos(x1) sin(x2),

∂f/∂x1 = 2(x1x2)x2 − sin(x1) sin(x2) = 2x1x2² − sin(x1) sin(x2),
∂f/∂x2 = 2(x1x2)x1 + cos(x1) cos(x2) = 2x1²x2 + cos(x1) cos(x2).
♦

REMARK 7.1: Often, for notational convenience, the partial derivative of f with respect to x_i is denoted f_xi, or f_i when no confusion can arise. Also, where there are just two variables in consideration, these may be labeled x and y, with partial derivatives f_x and f_y. With more than two variables, it is often more convenient to label by index, x_i, i = 1, ..., n.

Note that for f(x1, x2, ..., xn), ∂f/∂x_i is a function of all the x_i. One can take the partial derivative of this function – the partial derivative of the partial derivative. Thus, the partial derivative of ∂f/∂x_i with respect to x_i is ∂/∂x_i {∂f/∂x_i}, and the partial derivative of ∂f/∂x_i with respect to x_j is ∂/∂x_j {∂f/∂x_i}. For ease of notation, these are sometimes written as ∂²f/∂x_i² and ∂²f/∂x_j∂x_i, or as f_xixi and f_xjxi. In the two-variable case with f(x, y), the partial derivatives are f_x = ∂f/∂x and f_y = ∂f/∂y, and the second partial derivatives are written as f_xx = ∂/∂x {∂f/∂x}, f_yy = ∂/∂y {∂f/∂y}, f_xy = ∂/∂x {∂f/∂y} and f_yx = ∂/∂y {∂f/∂x}.
EXAMPLE 7.3: To illustrate, consider the two-variable case with f(x, y) = x^α y^β. Then:

f_x(x, y) = ∂f/∂x = αx^(α−1)y^β = (α/x) x^α y^β = (α/x) f(x, y),
f_y(x, y) = ∂f/∂y = βx^α y^(β−1) = (β/y) x^α y^β = (β/y) f(x, y).

Notice that f_x(x, y) = ∂f(x, y)/∂x = αx^(α−1)y^β depends on x and y – it is a function of (x, y). The function f_x(x, y) has a partial derivative with respect to x, which is ∂/∂x {∂f/∂x} = α(α − 1)x^(α−2)y^β. Similarly, f_x(x, y) has a partial derivative with respect to y, given by ∂/∂y {∂f/∂x} = αβx^(α−1)y^(β−1).

In Example 7.2, the second partial derivatives are given by:

f_x1x1 = ∂/∂x1 {∂f/∂x1} = 2x2² − cos(x1) sin(x2),
f_x2x1 = ∂/∂x2 {∂f/∂x1} = 4x1x2 − sin(x1) cos(x2),
f_x2x2 = ∂/∂x2 {∂f/∂x2} = 2x1² − cos(x1) sin(x2),
f_x1x2 = ∂/∂x1 {∂f/∂x2} = 4x1x2 − sin(x1) cos(x2).

♦
Note that in Example 7.3, f_x2x1 = f_x1x2. This is not accidental. When the partial derivatives exist and are continuous, then the second partial derivatives satisfy a special relation known as Young’s theorem.

Theorem 7.1: Suppose the function f(x1, ..., xn) has continuous partial derivatives, f_xi(x1, ..., xn), in a neighborhood of (x̄1, ..., x̄n), and suppose they are differentiable – ∂/∂x_i {∂f/∂x_j} exists for each i, j; then the second partial derivatives are equal:

f_xixj = ∂²f/∂x_i∂x_j = ∂/∂x_i {∂f/∂x_j} = ∂/∂x_j {∂f/∂x_i} = ∂²f/∂x_j∂x_i = f_xjxi.
EXAMPLE 7.4: Consider f(x, y) = ln(xy) · sin y, so that f_x = (y/xy) sin y = (1/x) sin y and f_yx = (1/x) cos y. Differentiating first with respect to y gives: f_y = (x/xy) sin y + ln(xy) cos y = (1/y) sin y + ln(xy) cos y. And then differentiating with respect to x: f_xy = (y/xy) cos y = (1/x) cos y. So, f_xy = f_yx.

♦
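Young's theorem can also be checked numerically for this function, by estimating the mixed partial with a central difference and comparing it with f_xy = (1/x) cos y derived above (the step size and evaluation point are illustrative):

```python
# Numerical check of Young's theorem for f(x, y) = ln(x*y)*sin(y).
import math

def f(x, y):
    return math.log(x * y) * math.sin(y)

def mixed(g, x, y, h=1e-4):
    """Central-difference estimate of the mixed partial d2g/dxdy."""
    return (g(x + h, y + h) - g(x + h, y - h)
            - g(x - h, y + h) + g(x - h, y - h)) / (4 * h * h)

fxy_numeric = mixed(f, 2.0, 1.5)
fxy_exact = math.cos(1.5) / 2.0   # f_xy = (1/x)*cos(y) at x = 2, y = 1.5
```

The symmetric difference stencil makes no distinction between differentiating in x first or y first, which is exactly the content of the theorem for smooth f.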
7.3 Level contours

Level contours arise frequently in economics, for example, as isoquants in production theory or as indifference curves in consumer theory. Given a function of n variables, f(x1, ..., xn), the level contour of f at level (or height) f̄ is the set of values of (x1, ..., xn) such that f(x1, ..., xn) = f̄. If this set is denoted L_f̄, then

L_f̄ = {(x1, ..., xn) | f(x1, ..., xn) = f̄}.

Different values of f̄ determine different level contours. The level contour is a way of visualizing a function of more than one variable.
Figure 7.2: The function f(x, y) = x^(1/2) y^(1/2)

In Figure 7.2, the function f(x, y) = x^(1/2) y^(1/2) is plotted, and three level contours graphed, giving the (x, y) pairs associated with the level contours {(x, y) | f(x, y) = 1}, {(x, y) | f(x, y) = 2} and {(x, y) | f(x, y) = 3}.
Given a utility function u(x, y), the indifference curve corresponding to utility level ū is defined as the set of (x, y) combinations that provide the level of utility ū: I_ū = {(x, y) | u(x, y) = ū}. The isoquant corresponding to a production function f is the set of input combinations (x, y) that yield a given level of output Q: I_Q = {(x, y) | f(x, y) = Q}. Considering the production function f, since a small change in x has impact f_x on the value of output and a small change in y has impact f_y, the total change in output from small changes (dx, dy) in (x, y) is df = f_x(x, y)dx + f_y(x, y)dy. This is called the total derivative of f at (x, y). Note that df depends on (x, y) and (dx, dy). An isoquant is a level contour of the function f. Starting from a point (x, y) on the contour, small variations in x and y that leave the value of
Figure 7.3: Isoquant slope

f unchanged satisfy df = 0. This determines the rate of change in y required to offset a change in x so as to leave the output level unchanged (see Figure 7.3):

dy = −[f_x(x, y)/f_y(x, y)] dx.
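A quick numerical illustration of this formula, using the function f(x, y) = x^(1/2) y^(1/2) from Figure 7.2 (the point and step size are illustrative): moving by (dx, dy) with dy = −(f_x/f_y)dx leaves f unchanged to first order.

```python
# Checking the isoquant-slope formula dy = -(f_x/f_y)*dx numerically.

def f(x, y):
    return x ** 0.5 * y ** 0.5

x, y = 4.0, 1.0
fx = 0.5 * x ** -0.5 * y ** 0.5     # partial of f with respect to x
fy = 0.5 * x ** 0.5 * y ** -0.5     # partial of f with respect to y

dx = 1e-4
dy = -(fx / fy) * dx                # offsetting change in y
change = f(x + dx, y + dy) - f(x, y)
```

The residual `change` is second order in dx, far smaller than the change f_x·dx that would result from moving in x alone.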
EXAMPLE 7.5: This type of calculation may be used in other contexts. Consider a government that imposes an ad valorem tax on all goods at the rate τ. Let s_i be the total sales of good i and let p_i be the price of good i, both constant over time. Total tax revenue is T = Σ_{i=1}^n (1 + τ)s_i p_i. Suppose now that prices increase at the rate π(t), so that at time t the price of good i is p_i(1 + π(t)). Suppose also that the government adjusts the tax rate τ over time to keep the total tax revenue constant: T = Σ_{i=1}^n (1 + τ(t))(1 + π(t))s_i p_i = (1 + τ(t))(1 + π(t)) Σ_{i=1}^n s_i p_i. Then, differentiating with respect to t,

0 = { [dτ(t)/dt](1 + π(t)) + (1 + τ(t))[dπ(t)/dt] } Σ_{i=1}^n s_i p_i.

Therefore,

0 = [dτ(t)/dt](1 + π(t)) + (1 + τ(t))[dπ(t)/dt].

Thus, taxes should be adjusted over time according to the formula:

dτ(t)/dt = −[(1 + τ(t))/(1 + π(t))] dπ(t)/dt.

♦
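The adjustment rule can be verified with a small discrete simulation (step size and initial rates are illustrative): stepping τ by dτ = −[(1 + τ)/(1 + π)]dπ holds the product (1 + τ)(1 + π), and hence total revenue, essentially constant.

```python
# Discrete check of the tax-adjustment rule from Example 7.5.

tau, pi = 0.10, 0.00            # initial tax rate and cumulative inflation
target = (1 + tau) * (1 + pi)   # proportional to total tax revenue

d_pi = 1e-4
for _ in range(1000):           # inflate prices in many small steps
    d_tau = -((1 + tau) / (1 + pi)) * d_pi   # the rule derived above
    tau += d_tau
    pi += d_pi

drift = abs((1 + tau) * (1 + pi) - target)
```

After prices have risen by 10%, the tax rate has fallen to roughly offset the increase, and the product (1 + τ)(1 + π) drifts from its initial value only by the accumulated second-order terms dτ·dπ.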
7.4 The elasticity of substitution
The elasticity of demand measures the slope dQ/dP of a curve (the demand function) relative to the ratio of the variables at that point on the curve, Q/P. This is the slope of the curve at a point, divided by the slope of the ray from the origin to that point. The elasticity of substitution measures how that ray varies as the slope varies (all in percentage terms). In principle, this measure can be applied to the level surface of any function, but two particularly important functions in economics are the utility and production functions.

Consider Figure 7.4, where the indifference curve u is given and a change in slope from a to b is depicted. The marginal rates of substitution at a and b are denoted MRS(a) and MRS(b) and denote the slope of the indifference curve at the respective points. The slope of the ray from the origin to a is the ratio of consumption of x2 relative to x1, x2^a/x1^a, when the marginal rate of substitution is MRS(a) = u_x1(a)/u_x2(a).
When the marginal rate of substitution changes to MRS(b), the slope of the ray becomes x2^b/x1^b. Therefore, the percentage change in the ratio of consumption resulting from the percentage change in the MRS is given by the following expression, called the elasticity of substitution:

η = [ (x2^b/x1^b − x2^a/x1^a) / (x2^a/x1^a) ] / [ (MRS(b) − MRS(a)) / MRS(a) ]
  = [ MRS(a) / (x2^a/x1^a) ] · [ (x2^b/x1^b − x2^a/x1^a) / (MRS(b) − MRS(a)) ].   (7.1)

In Equation 7.1, with u given, the ratio x2/x1 is viewed as a function of MRS, with the elasticity tracking the change in the ratio resulting from the change in MRS. Considering this measure at a point corresponds to the difference between MRS(a) and MRS(b) becoming small (approaching 0 in the limit). Here, this leads to an expression in terms of the ln(x) function. Recall that the derivative of ln(x) is 1/x, so that 1/x = d ln(x)/dx ≈ [ln(x + ∆) − ln(x)]/∆. Thus, ln(x + ∆) − ln(x) ≈ ∆/x = (x′ − x)/x, where x′ = x + ∆. From this perspective, the numerator in Equation 7.1 is d ln(x2/x1) and the denominator is d ln(MRS). In differential terms:

η = d ln(x2/x1) / d ln(MRS) = d ln(x2/x1) / d ln(u_x1/u_x2),   (7.2)

recalling the fact that MRS is obtained from du = 0 = u_x1 dx1 + u_x2 dx2, giving dx2/dx1 = −u_x1/u_x2, and so MRS = u_x1/u_x2 (measuring the slope in absolute terms).

Regarding the sign of η, note that, in the movement from a to b, the slope of the indifference curve is always negative but increases towards 0. So, in absolute
Figure 7.4: Elasticity of substitution
terms, the slope ( MRS ) declines (is getting smaller). Likewise, the ratio of x2 to
x1 gets smaller, moving from a to b. Therefore, in moving from a to b both the ratio and MRS decline. Conversely, in the movement from b to a both increase. Consequently, as defined, the elasticity of substitution is nonnegative, provided the level surface of the function is convex, as depicted.
EXAMPLE 7.6: The constant elasticity of substitution function is:

u(x1, x2) = [αx1^ρ + (1 − α)x2^ρ]^(1/ρ).

Therefore,

u_x1 = (1/ρ)[αx1^ρ + (1 − α)x2^ρ]^(1/ρ − 1) αρx1^(ρ−1),
u_x2 = (1/ρ)[αx1^ρ + (1 − α)x2^ρ]^(1/ρ − 1) (1 − α)ρx2^(ρ−1).

Therefore, the MRS is:

u_x1/u_x2 = αρx1^(ρ−1) / [(1 − α)ρx2^(ρ−1)] = [α/(1 − α)] (x1/x2)^(ρ−1).

Taking the ln of both sides,

ln(u_x1/u_x2) = ln(α/(1 − α)) + (ρ − 1) ln(x1/x2) = ln(α/(1 − α)) + (1 − ρ) ln(x2/x1).

Therefore,

ln(x2/x1) = −[1/(1 − ρ)] ln(α/(1 − α)) + [1/(1 − ρ)] ln(u_x1/u_x2)

and

η = d ln(x2/x1) / d ln(u_x1/u_x2) = 1/(1 − ρ).

♦
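The result η = 1/(1 − ρ) can be checked numerically: compute the MRS at two nearby bundles and form the ratio of log changes (the parameter values and bundles below are illustrative):

```python
# Finite-difference check of eta = 1/(1 - rho) for the CES function.
import math

alpha, rho = 0.4, 0.5   # illustrative CES parameters

def mrs(x1, x2):
    """MRS of the CES function: (alpha/(1-alpha)) * (x1/x2)**(rho - 1)."""
    return (alpha / (1 - alpha)) * (x1 / x2) ** (rho - 1)

# Two nearby bundles whose consumption ratio x2/x1 differs slightly.
x1a, x2a = 1.0, 2.0
x1b, x2b = 1.0, 2.0 * 1.0001

eta = ((math.log(x2b / x1b) - math.log(x2a / x1a))
       / (math.log(mrs(x1b, x2b)) - math.log(mrs(x1a, x2a))))
```

Because ln(x2/x1) is exactly linear in ln(MRS) for the CES function, the finite-difference ratio reproduces 1/(1 − ρ) = 2 here, not just approximately.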
The constant elasticity function can exhibit substantial variation in shape, depending on the value of ρ. Figure 7.5 illustrates three possible values of ρ. The figure suggests some properties of this function as ρ varies. When ρ = 1, the function is linear. As ρ decreases to 0, the function approaches the Cobb–Douglas function, x1^α x2^(1−α), while, as ρ goes to −∞, the function approaches the Leontief function, min{x1, x2} (and as ρ tends to ∞, the function tends to max{x1, x2}).

Figure 7.5: Plots of level contours of u = [x1^ρ + x2^ρ]^(1/ρ), for ρ = 2, ρ = 1/2 and ρ = −4
Theorem 7.2: Given u(x1, x2) = [αx1^ρ + (1 − α)x2^ρ]^(1/ρ), then:

• [αx1^ρ + (1 − α)x2^ρ]^(1/ρ) → x1^α x2^(1−α) as ρ → 0;
• [αx1^ρ + (1 − α)x2^ρ]^(1/ρ) → min{x1, x2} as ρ → −∞.

PROOF: Write u_ρ to make the dependence on ρ explicit. Assume at least one of {x1, x2} is strictly positive, so that αx1^ρ + (1 − α)x2^ρ is strictly positive. Let

U_ρ(x1, x2) = f(u_ρ(x1, x2)) = ln u_ρ(x1, x2) = ln[αx1^ρ + (1 − α)x2^ρ]/ρ = q(ρ)/ρ.

So, f is a continuous monotonic transformation of u. Observe that αx1^ρ + (1 − α)x2^ρ → 1 as ρ → 0, so ln[αx1^ρ + (1 − α)x2^ρ] → 0, and therefore both numerator and denominator go to 0 as ρ goes to 0. Using l’Hôpital’s rule, the limit of this ratio is the limit of the ratio of derivatives:

lim_{ρ→0} ln[αx1^ρ + (1 − α)x2^ρ]/ρ = lim_{ρ→0} [αx1^ρ ln x1 + (1 − α)x2^ρ ln x2]/[αx1^ρ + (1 − α)x2^ρ] = α ln x1 + (1 − α) ln x2,

using the fact that (d/dρ)z^ρ = (d/dρ)e^(ρ ln z) = ln z · e^(ρ ln z) = ln z · z^ρ, and lim_{ρ→0} z^ρ = 1. The function f^(−1)(z) = e^z is continuous, so that u_ρ = f^(−1)(U_ρ) is continuous. Since U_ρ → α ln x1 + (1 − α) ln x2,

u_ρ(x1, x2) → e^(α ln x1 + (1−α) ln x2) = e^(α ln x1) e^((1−α) ln x2) = x1^α x2^(1−α).

So, as ρ tends to 0, the utility function tends to the Cobb–Douglas function.

In the second case, suppose that x1 ≤ x2. Write

[αx1^ρ + (1 − α)x2^ρ]^(1/ρ) = x1 [α + (1 − α)(x2/x1)^ρ]^(1/ρ).

Since x2/x1 ≥ 1 and ρ → −∞: if x2 = x1, then (x2/x1)^ρ = 1 for all ρ; and if x2 > x1, then (x2/x1)^ρ → 0. Therefore α + (1 − α)(x2/x1)^ρ → d, where α ≤ d ≤ 1. Since ρ → −∞, 1/ρ → 0, so that [α + (1 − α)(x2/x1)^ρ]^(1/ρ) → 1. So, for x1 ≤ x2, the limit is x1, and conversely, when x2 ≤ x1, the limit is x2. Therefore, as ρ → −∞,

[αx1^ρ + (1 − α)x2^ρ]^(1/ρ) → min{x1, x2}.
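Both limits in Theorem 7.2 are easy to see numerically (the bundle and parameter values below are illustrative, with ρ = −200 standing in for ρ → −∞):

```python
# Numerical look at the CES limits in Theorem 7.2.
alpha = 0.3
x1, x2 = 2.0, 5.0

def ces(rho):
    """CES value at the fixed bundle (x1, x2) for a given rho != 0."""
    return (alpha * x1 ** rho + (1 - alpha) * x2 ** rho) ** (1.0 / rho)

cobb_douglas = x1 ** alpha * x2 ** (1 - alpha)

near_zero = ces(1e-6)        # rho close to 0: near the Cobb-Douglas value
very_negative = ces(-200.0)  # rho very negative: near min(x1, x2)
```

For very negative ρ the value sits slightly above min{x1, x2} by the factor α^(1/ρ), which tends to 1 as ρ → −∞, consistent with the proof.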
7.5 Market equilibrium: an application

In the following discussion, partial derivatives are used to find the equilibrium output in an oligopolistic market with n firms. Consider a market where the (inverse) demand is given by P(Q), giving the price at which the quantity Q will sell. Suppose there are n firms, where firm i produces output q_i. Let Q denote aggregate output: Q = Σ_{i=1}^n q_i. Suppose also that firm i has cost function c(q_i), giving the total cost of producing q_i units for firm i. Sales of q_i units at price p generate a profit of pq_i − c(q_i) for firm i, and suppose the objective of the firm is to maximize profit. The problem in this environment is that the price depends on total output Q, and Q depends on q_i and the other firms’ output levels, {q_j}_{j≠i}.
So, the profit of firm i depends on the output levels of all the firms. Making this explicit:

π_i(q1, ..., qn) = q_i P(Q) − c(q_i) = q_i P( Σ_{j=1}^n q_j ) − c(q_i).
Given the profile of output levels (q1, ..., qn), a firm, say firm i, may consider the possibility of changing output to some other level, say q′_i. It is in the firm’s interest to do so if:

π_i(q1, ..., q_{i−1}, q′_i, q_{i+1}, ..., qn) > π_i(q1, ..., q_{i−1}, q_i, q_{i+1}, ..., qn).

In that case, changing output from q_i to q′_i raises profit. When firms behave in this way (competing by choice of quantity and picking the profit-maximizing output, taking the actions of others as given), the form of competition is called Cournot competition. From the previous expression, if firm i chooses q_i to maximize π_i, at the chosen level of q_i it must be that either increasing or reducing q_i by a small amount provides no increase in firm i’s profit:

∂π_i(q1, ..., q_{i−1}, q_i, q_{i+1}, ..., qn)/∂q_i = 0.
Definition 7.1: Call an output profile $(q_1^*,\ldots,q_n^*)$ a Cournot equilibrium if no firm has an incentive to change output level.

This condition (that no firm has an incentive to change output) will be true if, for each $i$,
$$\pi_i(q_1^*,\ldots,q_{i-1}^*,q_i,q_{i+1}^*,\ldots,q_n^*)$$
is maximized at $q_i = q_i^*$, taking $q_j^*$ constant for $j \ne i$. A necessary condition for maximization of $\pi_i(q_1^*,\ldots,q_{i-1}^*,q_i,q_{i+1}^*,\ldots,q_n^*)$ by choice of $q_i$ is that the (partial) derivative with respect to $q_i$ equals 0 at $q_i = q_i^*$:
$$\frac{\partial}{\partial q_i}\pi_i(q_1^*,\ldots,q_{i-1}^*,q_i^*,q_{i+1}^*,\ldots,q_n^*) = 0.$$
(Here, $\frac{\partial}{\partial q_i}\pi_i(q_1^*,\ldots,q_{i-1}^*,q_i^*,q_{i+1}^*,\ldots,q_n^*) = 0$ is understood to mean the derivative $\frac{\partial}{\partial q_i}\pi_i(q_1^*,\ldots,q_{i-1}^*,q_i,q_{i+1}^*,\ldots,q_n^*)$ evaluated at $q_i = q_i^*$.)
CHAPTER 7. SYSTEMS OF EQUATIONS, DIFFERENTIALS AND DERIVATIVES
232
Whether this condition is sufficient for the output vector to be an equilibrium is taken up later. The next section applies these calculations in the case where demand and cost functions are linear.
7.5.1 Oligopoly: linear demand

To illustrate the calculations, consider the case of linear demand $P(Q) = a - bQ$, and let firm $i$'s cost be given by $c(q_i) = cq_i$, so that every firm faces the same constant marginal cost, $c$, and 0 fixed cost. Then
$$\pi_i = q_i\left(a - b\sum_{j=1}^{n} q_j\right) - cq_i = aq_i - bq_i\sum_{j=1}^{n} q_j - cq_i = aq_i - bq_i\sum_{j \ne i} q_j - bq_i^2 - cq_i.$$
Taking the partial derivative of $\pi_i$ with respect to $q_i$ and setting to zero gives:
$$\frac{\partial \pi_i}{\partial q_i} = a - c - 2bq_i - b\sum_{j \ne i} q_j = 0,$$
and rearranging the equality:
$$\frac{(a-c)}{b} = 2q_i + \sum_{j \ne i} q_j.$$
This gives a system of equations:
$$\begin{pmatrix} 2 & 1 & \cdots & 1 \\ 1 & 2 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 2 \end{pmatrix}\begin{pmatrix} q_1 \\ q_2 \\ \vdots \\ q_n \end{pmatrix} = \begin{pmatrix} \frac{a-c}{b} \\ \frac{a-c}{b} \\ \vdots \\ \frac{a-c}{b} \end{pmatrix},$$
which has the solution: $(q_1^*,\ldots,q_n^*) = \left(\frac{a-c}{(n+1)b},\ldots,\frac{a-c}{(n+1)b}\right)$. (To see this, note that if the $q_i$ are all equal and satisfy any equation, then they satisfy all equations, because with $q_i = q^*$, each equation requires $(n+1)q^* = \frac{a-c}{b}$.) Thus, $q_i^* = q^* = \frac{1}{n+1}\frac{a-c}{b}$ for each $i$, and
$$\frac{\partial}{\partial q_i}\pi_i(q_1^*,\ldots,q_{i-1}^*,q_i^*,q_{i+1}^*,\ldots,q_n^*) = 0$$
is satisfied. This is called a symmetric equilibrium, since each firm makes the same choice. In fact, the solution might have been determined more easily. From the condition
$$\frac{\partial \pi_i}{\partial q_i} = a - b\sum_{j \ne i} q_j - 2bq_i - c = 0,$$
assuming that the equilibrium is symmetric (each firm makes the same choice), this will be satisfied by the symmetric choice $q_i = q^*$, so that $\sum_{j \ne i} q_j = (n-1)q^*$, giving $a - b(n-1)q^* - 2bq^* - c = 0$, or $a - c - b(n+1)q^* = 0$, or $q^* = \frac{1}{n+1}\frac{a-c}{b}$. In this case, total output is $Q_o = nq^* = \frac{n}{n+1}\frac{a-c}{b}$.
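As a numerical sanity check of these formulas, the sketch below (our own; the parameter values are arbitrary) solves the linear system of first-order conditions directly and compares with the closed form $q^* = \frac{1}{n+1}\frac{a-c}{b}$:

```python
def cournot_linear(n, a, b, c):
    """Solve the n-firm linear Cournot system by Gaussian elimination.

    The system has 2 on the diagonal and 1 elsewhere, with
    right-hand side (a - c)/b in every row.
    """
    A = [[2.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    rhs = [(a - c) / b] * n
    # Forward elimination (no pivoting needed: A is diagonally dominant).
    for k in range(n):
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            rhs[i] -= m * rhs[k]
    # Back substitution.
    q = [0.0] * n
    for i in reversed(range(n)):
        q[i] = (rhs[i] - sum(A[i][j] * q[j] for j in range(i + 1, n))) / A[i][i]
    return q

n, a, b, c = 5, 10.0, 1.0, 2.0
q = cournot_linear(n, a, b, c)
q_star = (a - c) / ((n + 1) * b)           # closed form
print(q[0], q_star)

# Firm 1's first-order condition a - c - 2b q_1 - b sum_{j!=1} q_j = 0 holds.
foc = a - c - 2 * b * q[0] - b * sum(q[1:])
print(abs(foc))                             # ~0
```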
Remark 7.2: Recall that the competitive quantity, $Q_c$, is the quantity that leads to price equaling marginal cost: $P(Q_c) = c$. In the linear case, $Q_c = \frac{(a-c)}{b}$, so that $Q_o$, the equilibrium oligopoly output, is related to $Q_c$ according to $Q_o = \frac{n}{n+1}Q_c$. The price associated with the oligopoly output is $P(Q_o) = a - bQ_o = a - b\frac{n}{n+1}Q_c$. Substituting in the value of $Q_c$:
$$P(Q_o) = a - b\frac{n}{n+1}\frac{(a-c)}{b} = a - \frac{n}{n+1}(a-c) = \frac{a(n+1) - n(a-c)}{n+1} = \frac{a + nc}{n+1} = \frac{a}{n+1} + c\frac{n}{n+1}.$$
To summarize:
$$Q_o = \frac{n}{n+1}Q_c, \quad \text{and} \quad P(Q_o) = \frac{1}{n+1}a + \frac{n}{n+1}c.$$
As the number of firms becomes large ($n \to \infty$), the quantity converges to the competitive quantity, $Q_o \to Q_c$, and the price converges to the competitive price (marginal cost), since $\frac{a}{n+1} + \frac{n}{n+1}c \to c$.
7.5.2 Oligopoly: nonlinear demand

In the general case of nonlinear price and cost functions, the profit function is:
$$\pi_i(q_1,\ldots,q_n) = q_i P\left(\sum_{i=1}^{n} q_i\right) - c(q_i).$$
Repeating the earlier calculations:
$$\frac{\partial \pi_i}{\partial q_i} = P\left(\sum_{i=1}^{n} q_i\right) + q_i\cdot\frac{dP}{dQ} - c'(q_i) = 0.$$
In a symmetric equilibrium, each firm produces the same quantity, say $q^*$, and this must satisfy the equation:
$$P(nq^*) + q^* P'(nq^*) - c'(q^*) = 0$$
$$P(nq^*)\left\{1 + \frac{1}{n}\left[\frac{nq^*}{P(nq^*)}P'(nq^*)\right]\right\} = c'(q^*),$$
or, using $Q^* = nq^*$,
$$P(Q^*)\left\{1 + \frac{1}{n}\left[\frac{Q^*}{P(Q^*)}P'(Q^*)\right]\right\} = c'(q^*),$$
and with $\eta(Q)$ denoting the elasticity of demand at $Q$, $\eta(Q) = -\frac{P}{Q}\frac{dQ}{dP}$,
$$P(Q^*)\left[1 - \frac{1}{n}\frac{1}{\eta(Q^*)}\right] = c'(q^*).$$
Thus, again price converges to marginal cost as n → ∞. This is a common theme
in economic models: competition (a large number of firms) leads to efficient levels of output.
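To see the condition at work, the sketch below (our own illustration; the demand curve $P(Q) = a - bQ^2$, linear cost $c(q) = cq$ and the parameter values are arbitrary choices) solves the symmetric first-order condition $P(nq) + qP'(nq) - c'(q) = 0$ by bisection and then checks the markup condition $P(Q^*)\left[1 - \frac{1}{n\eta(Q^*)}\right] = c$:

```python
a, b, c, n = 10.0, 1.0, 2.0, 4

def P(Q):                 # assumed nonlinear inverse demand
    return a - b * Q**2

def Pprime(Q):
    return -2 * b * Q

def foc(q):
    """Symmetric first-order condition P(nq) + q P'(nq) - c, for cost c*q."""
    Q = n * q
    return P(Q) + q * Pprime(Q) - c

# Bisection: foc > 0 near q = 0, foc < 0 where the price reaches zero.
lo, hi = 1e-9, (a / b) ** 0.5 / n
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if foc(mid) > 0 else (lo, mid)
q = 0.5 * (lo + hi)

Q = n * q
eta = P(Q) / (2 * b * Q**2)   # elasticity -(P/Q)(dQ/dP) for this demand
print(P(Q) * (1 - 1 / (n * eta)), c)   # markup condition: both sides equal
```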
7.5.3 Industry concentration indices

The previous discussion suggests that when a small number of firms are supplying a market, there is less competition, output is lower and prices higher. In industrial organization, a market is said to be "concentrated" if there are a small number of suppliers or if a small number of firms dominate the market. A natural question is: when is a market too concentrated, in the sense that market dominance of a few firms leads to large inefficiencies? One measure of concentration is the Herfindahl–Hirschman index. This index is a simple function of market shares of firms. The discussion here relates that index to the Cournot model using the techniques developed earlier. When there are a small number of firms in an industry, some firms will have a larger share of industry sales. If one or two firms have a large share of total sales, the industry is considered to be concentrated. The Herfindahl–Hirschman index ($HHI$) of concentration in an industry is defined:
$$HHI = (100s_1)^2 + (100s_2)^2 + \cdots + (100s_n)^2 = 100^2\left(\sum_{i=1}^{n} s_i^2\right),$$
where $s_i$ is the share of firm $i$ of industry sales. This sum is minimized when the shares are all equal, so that larger values of $HHI$ suggest greater industry concentration. (For example, with just two firms, $s_1 + s_2 = 1$, so that $s_1^2 + s_2^2 = s_1^2 + (1-s_1)^2$. Differentiating with respect to $s_1$ and setting to 0 gives $2s_1 - 2(1-s_1) = 4s_1 - 2 = 0$. So $s_1 = \frac{1}{2}$, and this corresponds to a minimum, since the second derivative of $s_1^2 + (1-s_1)^2$ equals 4, which is positive.) With $n$ firms, the minimum occurs at $s_i = \frac{1}{n}$ for all $i$. In this case, $s_i^2 = \frac{1}{n^2}$ and $\sum_{i=1}^{n} s_i^2 = n\frac{1}{n^2} = \frac{1}{n}$. Consequently, the smallest possible value for $HHI$ is $100^2\frac{1}{n}$, when there are $n$ firms. (In fact, viewing the $HHI$ index as a function of the shares, $HHI(s_1,\ldots,s_n)$, the total change is $d(HHI) = 100^2 \cdot 2\cdot[s_1 ds_1 + s_2 ds_2 + \cdots + s_n ds_n]$, where the share changes must add to 0: $\sum_{i=1}^{n} ds_i = 0$. If, for some $j, k$, $s_j > s_k$, then setting $ds_k = -ds_j$ with $ds_j < 0$ decreases the $HHI$. Thus, $s_j = s_k$ is necessary for a minimum.)
Consider now the firm's optimizing condition in choosing $q_i$. In what follows, allow different firms to have different cost functions: the cost function of firm $i$ is $c_i(q_i)$. Maximizing profit for $i$ gives the first-order condition:
$$P\left(\sum_{i=1}^{n} q_i\right) + q_i\cdot\frac{dP}{dQ} - c_i'(q_i) = 0.$$
Write $Q = \sum_i q_i$, so that $s_i = \frac{q_i}{Q}$. Rearrange the condition as:
$$P(Q) - c_i'(q_i) + q_i\cdot P'(Q) = 0.$$
Divide by $P(Q)$:
$$\frac{P(Q) - c_i'(q_i)}{P(Q)} + q_i\cdot\frac{P'(Q)}{P(Q)} = 0.$$
Multiply by $\frac{q_i}{Q}$:
$$\frac{q_i}{Q}\,\frac{P(Q) - c_i'(q_i)}{P(Q)} + Q\cdot\frac{P'(Q)}{P(Q)}\,\frac{q_i^2}{Q^2} = 0.$$
Recalling that $s_i = \frac{q_i}{Q}$ and
$$\eta(Q) = -\frac{P}{Q}\frac{dQ}{dP} \;\Rightarrow\; \frac{1}{\eta(Q)} = -\frac{Q}{P}\frac{dP}{dQ} = -\frac{Q}{P}P'(Q) = -Q\frac{P'(Q)}{P(Q)},$$
this expression may be written:
$$s_i\,\frac{P(Q) - c_i'(q_i)}{P(Q)} - \frac{1}{\eta(Q)}s_i^2 = 0.$$
So,
$$\sum_{i=1}^{n} s_i\,\frac{P(Q) - c_i'(q_i)}{P(Q)} = \frac{1}{\eta(Q)}\sum_{i=1}^{n} s_i^2 = \frac{1}{\eta(Q)}\frac{1}{100^2}\,100^2\sum_{i=1}^{n} s_i^2 = \frac{1}{\eta(Q)}\frac{HHI}{100^2}.$$
Since $\frac{P(Q) - c_i'(q_i)}{P(Q)}$ is the price over cost margin of firm $i$, $\sum_{i=1}^{n} s_i\,\frac{P(Q) - c_i'(q_i)}{P(Q)}$ is the weighted average of these margins. So, the higher is the $HHI$, the higher is the average price–cost margin or markup.
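This relationship can be checked numerically. The sketch below is our own construction (linear demand $P(Q) = a - bQ$ and arbitrary firm-specific marginal costs $c_i$): it solves the asymmetric Cournot first-order conditions in closed form and verifies that the share-weighted margin equals $\frac{1}{\eta(Q)}\frac{HHI}{100^2}$:

```python
a, b = 20.0, 1.0
costs = [2.0, 3.0, 4.0]            # firm-specific marginal costs c_i
n = len(costs)

# With P = a - bQ, firm i's FOC is a - bQ - b*q_i - c_i = 0.
# Summing over firms gives Q = (n*a - sum(c_i)) / ((n+1)*b).
Q = (n * a - sum(costs)) / ((n + 1) * b)
q = [(a - c) / b - Q for c in costs]

P = a - b * Q
shares = [qi / Q for qi in q]
hhi = 100**2 * sum(s**2 for s in shares)
eta = P / (b * Q)                   # elasticity of demand at Q for P = a - bQ

weighted_margin = sum(s * (P - c) / P for s, c in zip(shares, costs))
print(weighted_margin, (hhi / 100**2) / eta)   # equal
```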
7.6 Total differentials

The partial derivative of a function $f(x_1,\ldots,x_n)$ with respect to a variable $x_i$ measures the impact of a change in $x_i$ on the value of the function $f$. If $f_{x_i}$ is the partial derivative of $f$ with respect to $x_i$, then the impact of a small change in $x_i$, $\Delta x_i$, on $f$ is approximately $f_{x_i}\Delta x_i$. One may consider the impact of multiple changes in all the $x_i$ at the same time, to consider a total variation in $f$ resulting from a set of small changes in all the $x_i$. Given a function $f(x_1,\ldots,x_n)$, the total differential of $f$, denoted $df$, is defined:
$$df = \sum_{i=1}^{n}\frac{\partial f}{\partial x_i}dx_i = \sum_{i=1}^{n} f_{x_i}dx_i. \qquad (7.3)$$
Note that $df$ depends on $x = (x_1,\ldots,x_n)$ and $dx = (dx_1,\ldots,dx_n)$. Put $\frac{\partial f}{\partial x} = \left(\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\ldots,\frac{\partial f}{\partial x_n}\right)$ and $dx = (dx_1, dx_2,\ldots,dx_n)$. Then $df = \frac{\partial f}{\partial x}\cdot dx$. In some branches of mathematics, this sum is referred to as a "1-form", consisting of the inner product of $(f_{x_1},\ldots,f_{x_n})$ and the vector direction $(dx_1,\ldots,dx_n)$. Given $(x_1,\ldots,x_n)$, the partial derivatives $f_{x_i}(x_1,\ldots,x_n)$ are determined, and Equation 7.3 is an equation in the changes, the $dx_i$.
Example 7.7: The following functions illustrate the total differential:

1. Let $f(x,y) = x^2\ln(y)$. Then
$$df = 2x\ln(y)\,dx + \frac{x^2}{y}\,dy.$$
2. Let $f(x,y) = \sin(x)\cos(y)\ln(xy)$. The total differential is:
$$df = \left[\cos(x)\cos(y)\ln(xy) + \sin(x)\cos(y)\frac{1}{x}\right]dx + \left[-\sin(x)\sin(y)\ln(xy) + \sin(x)\cos(y)\frac{1}{y}\right]dy.$$
3. Let $f(x,y) = e^{\tan(xy)}$. The total differential is:
$$df = e^{\tan(xy)}\left[\left(\frac{y}{\cos^2(xy)}\right)dx + \left(\frac{x}{\cos^2(xy)}\right)dy\right],$$
using the fact that $\frac{d}{dz}\tan(z) = \sec^2(z) = \frac{1}{\cos^2(z)}$.
4. Let $f(x,y) = \left(\frac{\ln(y)}{e^x}\right)^{\alpha}$. The total differential is:
$$df = \alpha\left(\frac{\ln(y)}{e^x}\right)^{\alpha-1}\left[\left(\frac{-\ln y}{e^x}\right)dx + \left(\frac{1}{ye^x}\right)dy\right].$$
♦
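These differentials can be verified against finite differences. A quick sketch (our own, for the first function in the example; the point and step sizes are arbitrary):

```python
import math

def f(x, y):
    return x**2 * math.log(y)

def df(x, y, dx, dy):
    """Total differential of f(x,y) = x^2 ln(y): f_x dx + f_y dy."""
    return 2 * x * math.log(y) * dx + (x**2 / y) * dy

x, y, dx, dy = 1.5, 2.0, 1e-5, -2e-5
actual = f(x + dx, y + dy) - f(x, y)
print(actual, df(x, y, dx, dy))   # agree to first order
```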
The partial derivative of a function $f(x,y)$, say with respect to $x$, $f_x(x,y)$, measures the change in $f(x,y)$ due to a small change in $x$, $dx$. This is approximately
$$f_x \approx \frac{f(x+dx, y) - f(x,y)}{dx}.$$
Thus, $f_x(x,y)dx \approx f(x+dx, y) - f(x,y)$ is the change in $f$ resulting from the change in $x$. So, $df = f_x(x,y)dx + f_y(x,y)dy$ measures the overall change in $f$ from the combined effects of changing $x$ and $y$. When the overall change is 0, $df = 0$, and so $0 = f_x(x,y)dx + f_y(x,y)dy$. This is an equation in $dx$ and $dy$, given $(x,y)$, which may be solved to give $dy = -\frac{f_x(x,y)}{f_y(x,y)}dx$, the change in $y$ necessary to keep the value of the function constant, given the change in $x$ of $dx$.
For example, consider the function $f(x,y) = ax + by$. Then $df = a\,dx + b\,dy$, since $f_x = a$ and $f_y = b$. If $df = 0$, then $0 = a\,dx + b\,dy$ and $dy = -\frac{a}{b}dx$. If $x$ is increased by $dx$, $y$ must be reduced by $dy = -\frac{a}{b}dx$ to maintain the value of the function constant. The following example utilizes this "trade-off" computation in an economics application (using the marginal rate of substitution). Sometimes, partial derivatives may be used in the simplification of a problem. The following example reduces a two-variable constrained optimization problem to a one-variable problem, where partial derivatives are then used to locate the optimum.

Utility is given as a function of income ($y$) and leisure ($l$): $u = u(y,l)$. If current income and leisure levels are $(y,l)$, then small changes in income and leisure, $dy$ and $dl$, produce a change in utility given by $du = u_y(y,l)dy + u_l(y,l)dl$, the total differential of $u$ at $(y,l)$. To simplify notation, write $du = u_y dy + u_l dl$. When this is set to 0, the equation $0 = u_y dy + u_l dl$ relates small changes in $y$ to small changes in $l$ that leave utility unchanged: $dy = -\frac{u_l}{u_y}dl$ or $dl = -\frac{u_y}{u_l}dy$. Thus, $\frac{dy}{dl} = -\frac{u_l}{u_y}$, giving the rate at which $y$ must be substituted for $l$ at the point $(y,l)$ to keep utility constant: as $l$ (leisure) increases, $y$ must be reduced at the rate $-\frac{u_l}{u_y}$ to keep utility constant. The partial derivatives $u_y$ and $u_l$ are the marginal utility of $y$ and $l$ respectively.
With work defined as available time (24) minus leisure ($l$), the time spent working is $24 - l$ and, at wage $\omega$, income is $y = \omega(24-l)$. Therefore, $dy = -\omega\,dl$. Substitute into the total differential: $du = u_y(-\omega\,dl) + u_l\,dl = [-\omega\cdot u_y + u_l]dl$. If utility is maximized, then it should not be possible for small changes in $l$ to increase utility. For this to be the case requires that $[-\omega\cdot u_y + u_l] = 0$, or $\omega\cdot u_y = u_l$, or $\omega = \frac{u_l}{u_y}$. Figure 7.6 depicts the solution. The same condition is obtained directly by optimization. The income constraint, $y = \omega(24-l)$, can be substituted into the utility function: $u(\omega(24-l), l) = u^*(l)$. The problem of maximizing utility reduces to maximizing $u^*$ with respect to $l$.
Figure 7.6: Optimal income–work combination
$$\max_{l} u^* \;\text{ gives }\; \frac{du^*}{dl} = 0 = \frac{\partial u}{\partial y}\frac{dy}{dl} + \frac{\partial u}{\partial l}.$$
Now, $\frac{dy}{dl} = -\omega$, from the budget constraint, so the first-order condition becomes $-u_y\omega + u_l = 0$, or
$$\omega = \frac{u_l}{u_y} = \frac{MU_l}{MU_y},$$
where $\frac{MU_l}{MU_y}$ is the ratio of marginal utilities. For a numerical example, suppose that $u(y,l) = y^{\frac{1}{2}}l^{\frac{1}{2}}$, and so:
$$u_y = \frac{1}{2}y^{-\frac{1}{2}}l^{\frac{1}{2}} = \frac{\frac{1}{2}y^{\frac{1}{2}}l^{\frac{1}{2}}}{y}; \qquad u_l = \frac{1}{2}y^{\frac{1}{2}}l^{-\frac{1}{2}} = \frac{\frac{1}{2}y^{\frac{1}{2}}l^{\frac{1}{2}}}{l}.$$
This gives $\omega = \frac{u_l}{u_y} = \frac{y}{l}$, or $\omega l = y$, as the optimizing condition. Recalling that $y = \omega(24-l)$, the optimizing condition may be written $\omega l = \omega(24-l)$. Thus, $2\omega l = 24\omega$, or $l = \frac{24}{2} = 12$ and $y = \omega l = 12\omega$.
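The reduction to one variable can be checked by maximizing $u^*(l) = u(\omega(24-l), l)$ directly. A sketch (our own; the wage value is arbitrary), using a simple grid search over leisure:

```python
def u_star(l, w=3.0):
    """Utility along the budget line: u(w*(24 - l), l) with u = sqrt(y*l)."""
    y = w * (24 - l)
    return (y * l) ** 0.5

# Grid search over leisure l in (0, 24).
grid = [i / 1000 * 24 for i in range(1, 1000)]
best = max(grid, key=u_star)
print(best)   # ~12, independent of the wage w
```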
Example 7.8: This example considers a standard profit maximization problem. Let a firm's profit be given by:
$$\pi(k,l) = Pf(k,l) - rk - wl.$$
The total derivative of $\pi$ is:
$$d\pi = \pi_k\,dk + \pi_l\,dl = (Pf_k - r)dk + (Pf_l - w)dl.$$
For profit to be maximized, it must be that there is no variation in $k$ or $l$ that raises profit. This implies that $(Pf_k - r) = 0$ and $(Pf_l - w) = 0$. If, for example, $(Pf_k - r) < 0$, then reducing $k$ ($dk < 0$) would raise profit, while if $(Pf_k - r) > 0$, then raising $k$ ($dk > 0$) would raise profit. This gives the first-order conditions:
$$\frac{\partial \pi}{\partial k} = Pf_k - r = 0, \quad f_k = \frac{\partial f}{\partial k}, \qquad \frac{\partial \pi}{\partial l} = Pf_l - w = 0, \quad f_l = \frac{\partial f}{\partial l}.$$
These conditions imply that $f_k = \frac{r}{P}$ and $f_l = \frac{w}{P}$.
Consider a specific functional form: $f(k,l) = k^{\alpha}l^{\beta}$. Then:
$$f_k = \alpha k^{\alpha-1}l^{\beta} = \alpha\frac{f}{k}, \quad \text{and} \quad f_l = \beta k^{\alpha}l^{\beta-1} = \beta\frac{f}{l}.$$
Rearranging, $\frac{f_k}{f_l} = \frac{\alpha}{\beta}\frac{l}{k} = \frac{r}{w}$, so that $\alpha wl = \beta rk$ and $k = \frac{\alpha w}{\beta r}l$. This determines $k$ as a function of $l$. To solve for $l$, use either of the first-order conditions. Taking $f_k = \alpha k^{\alpha-1}l^{\beta} = \frac{r}{P}$ and substituting for $k$ yields:
$$\alpha\left(\frac{\alpha w}{\beta r}\right)^{\alpha-1}l^{\alpha-1}l^{\beta} = \frac{r}{P}, \qquad \alpha\left(\frac{\alpha w}{\beta r}\right)^{\alpha-1}l^{(\beta+\alpha-1)} = \frac{r}{P}.$$
Solving for $l$:
$$l = \left\{\frac{r}{P}\frac{1}{\alpha}\left(\frac{\alpha w}{\beta r}\right)^{1-\alpha}\right\}^{1/(\alpha+\beta-1)}.$$
Similarly, the value of $k$ is then determined:
$$k = \frac{\alpha w}{\beta r}\left\{\frac{r}{P}\frac{1}{\alpha}\left(\frac{\alpha w}{\beta r}\right)^{1-\alpha}\right\}^{1/(\alpha+\beta-1)} = \left\{\left(\frac{\alpha w}{\beta r}\right)^{\alpha+\beta-1}\frac{r}{P}\frac{1}{\alpha}\left(\frac{\alpha w}{\beta r}\right)^{1-\alpha}\right\}^{1/(\alpha+\beta-1)}$$
$$= \left\{\left(\frac{\alpha w}{\beta r}\right)^{\beta}\frac{r}{P}\frac{1}{\alpha}\right\}^{1/(\alpha+\beta-1)} = \left\{\left(\frac{\alpha w}{\beta r}\right)^{\beta-1}\left(\frac{w}{\beta}\right)\frac{1}{P}\right\}^{1/(\alpha+\beta-1)} = \left\{\frac{w}{P}\frac{1}{\beta}\left(\frac{\beta r}{\alpha w}\right)^{1-\beta}\right\}^{1/(\alpha+\beta-1)}.$$
♦
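As a check on the algebra, the sketch below (our own; arbitrary parameter values with $\alpha + \beta < 1$) plugs the closed-form $l$ and $k$ back into the first-order conditions $f_k = r/P$ and $f_l = w/P$:

```python
alpha, beta = 0.3, 0.5          # decreasing returns: alpha + beta < 1
P, r, w = 2.0, 1.0, 1.5

e = 1.0 / (alpha + beta - 1.0)
l = ((r / P) * (1 / alpha) * (alpha * w / (beta * r)) ** (1 - alpha)) ** e
k = (alpha * w / (beta * r)) * l

f_k = alpha * k ** (alpha - 1) * l ** beta
f_l = beta * k ** alpha * l ** (beta - 1)
print(f_k - r / P, f_l - w / P)   # both ~0
```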
7.6.1 The impact of parameter changes

The following discussion illustrates the impact of parameter shifts on the solution to a pair of equations. A production function $q = f(x,y)$ gives the output, $q$, associated with inputs $x$ and $y$. Assume that raising input of either $x$ or $y$ raises output, $f_x > 0$ and $f_y > 0$. An isoquant identifies different combinations of $x$ and $y$ that give the same level of output. For example, $I_q = \{(x,y) \mid f(x,y) = q\}$ is the set of $(x,y)$ pairs that give output $q$. Under standard assumptions on the function $f$, an isoquant is convex to the origin, as depicted in Figure 7.7. Changing $x$ and $y$ by small amounts, $dx$ and $dy$, changes output according to $dq = df = f_x dx + f_y dy$. If the changes in $x$ and $y$ leave output unchanged, $0 = f_x dx + f_y dy$, then $(x+dx, y+dy)$ is on the same isoquant as $(x,y)$, as indicated in Figure 7.7, where $a = (x,y)$ and $b = (x+dx, y+dy)$. Now, suppose that the producer must use inputs $x$ and $y$ in the proportion $k$, so that the choice of $(x,y)$ must satisfy $y = kx$. Thus, output level $q$ is achieved with input usage $(x,y)$ at location $a$. Implicitly, the choice of $q$ and $k$ determines the values of $x$ and $y$: $(k,q) \to (x,y)$. To make this explicit, write $x = s(k,q)$ and $y = t(k,q)$. So $f(x,y) = q$ and $y = kx$ implicitly determine $(x,y)$. Consider a variation in $k$, $dk$. The total differential of $y = kx$ is $dy = x\,dk + k\,dx$, giving small variations in $y$, $k$, $x$ that permit the equation $y = kx$ to continue to be satisfied. Next, if $x$ and $y$ must change so as to leave output unchanged, $0 = f_x dx + f_y dy$, and this with the requirement $dy = x\,dk + k\,dx$ determines the way in which $x$ and $y$ change. Substituting the requirement $dy = x\,dk + k\,dx$ into $0 = f_x dx + f_y dy$ gives:
$$0 = f_x dx + f_y(x\,dk + k\,dx) = (f_x + f_y k)dx + f_y x\,dk.$$
Thus, $(f_x + f_y k)dx = -f_y x\,dk$, or $dx = -\frac{f_y x}{(f_x + f_y k)}dk$. This measures the change in $x$ due to a change in $k$, with $q$ unchanged. Dividing through by $dk$, it has the
Figure 7.7: A parameter change
interpretation as a partial derivative (since $q$ is held fixed):
$$\frac{\partial x}{\partial k} = -\frac{f_y x}{(f_x + f_y k)}. \qquad (7.4)$$
In a similar way, we can determine the impact of a change in $k$ on $y$. Recall that
$$dy = x\,dk + k\,dx = x\,dk + k\left[-\frac{f_y x}{(f_x + f_y k)}\right]dk = \left(x + k\left[-\frac{f_y x}{(f_x + f_y k)}\right]\right)dk.$$
Thus,
$$\frac{\partial y}{\partial k} = \left(x + k\left[-\frac{f_y x}{(f_x + f_y k)}\right]\right). \qquad (7.5)$$
In these two Equations, 7.4 and 7.5, the values of $x$ and $y$ are understood to be the solution values, located at $a$. Recall that when $x$ and $y$ are solved in terms of $(k,q)$, we have $x = s(k,q)$ and $y = t(k,q)$, so the calculation in Equation 7.4 corresponds to $\frac{\partial x}{\partial k} = \frac{\partial s}{\partial k}$. In general, determining the functions $s$ and $t$ may be difficult, depending on the functional form of $f$. Therefore, the calculations in Equations 7.4 and 7.5 provide insight into the impact of parameter variations, even though we have not solved for $x$ and $y$. In the case where the production function has the Cobb–Douglas form, $f(x,y) = x^{\alpha}y^{\beta}$, the functions may be found easily. Since $x^{\alpha}y^{\beta} = q$ and $y = kx$, substituting gives $x^{\alpha}(kx)^{\beta} = q$, or $k^{\beta}x^{\alpha+\beta} = q$, or $x^{\alpha+\beta} = \frac{q}{k^{\beta}}$. Thus,
$$x = s(k,q) = \left[\frac{q}{k^{\beta}}\right]^{\frac{1}{\alpha+\beta}}, \qquad y = t(k,q) = k\left[\frac{q}{k^{\beta}}\right]^{\frac{1}{\alpha+\beta}}.$$
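For the Cobb–Douglas case, Equation 7.4 can be checked against a finite difference of $s(k,q)$. A sketch (our own; arbitrary parameter values):

```python
alpha, beta = 0.4, 0.3
q, k = 5.0, 2.0

def s(k, q):
    """x = s(k, q) = (q / k^beta)^(1/(alpha+beta)) for f = x^alpha y^beta."""
    return (q / k**beta) ** (1 / (alpha + beta))

x = s(k, q)
y = k * x
f_x = alpha * x**(alpha - 1) * y**beta      # partials of f = x^a y^b
f_y = beta * x**alpha * y**(beta - 1)

dxdk_formula = -f_y * x / (f_x + f_y * k)   # Equation 7.4
h = 1e-6
dxdk_numeric = (s(k + h, q) - s(k - h, q)) / (2 * h)
print(dxdk_formula, dxdk_numeric)           # agree
```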
Remark 7.3: Considering the Cobb–Douglas function $q = x^{\alpha}y^{\beta}$, one might also consider this expression as having three parameters $(q, \alpha, \beta)$. With $y = kx$, the pair of equations $q = x^{\alpha}y^{\beta}$, $y = kx$ determine $(x,y)$ as functions of four parameters $(k, q, \alpha, \beta)$, in which case we might write $x = s^*(k, q, \alpha, \beta)$ and $y = t^*(k, q, \alpha, \beta)$. This changes none of the calculations above. However, with just two parameters $(k,q)$ there is a 1-1 connection between parameters and endogenous variables $(x,y)$, whereas with four parameters this is not possible.
7.7 Solutions to systems of equations

Consider a system of linear equations $Ay = x$, where $A$ is an $n \times n$ matrix, $y$ a vector of variables and $x$ a vector of parameters. Provided $A$ is nonsingular, we can solve for $y$: there is a unique solution $y = A^{-1}x$. Furthermore, the impact of variations in $x$ on the solution value of $y$ is easy to determine. If $x$ changes to $x + \Delta x$, the new solution value of $y$ is given by $y' = A^{-1}(x + \Delta x) = A^{-1}x + A^{-1}\Delta x = y + A^{-1}\Delta x$, so the change in $y$ is $\Delta y = y' - y = A^{-1}\Delta x$. In the nonlinear case, similar issues arise. In the two-equation, two-unknown system,
$$f_1(y_1, y_2) = x_1$$
$$f_2(y_1, y_2) = x_2,$$
where $(y_1, y_2)$ are the variables and $(x_1, x_2)$ the parameters, the following questions arise:

1. Does the equation system have a solution, $(y_1^*, y_2^*)$?
2. Is the solution unique?
3. Assuming there is a solution, how does it vary with the parameters $(x_1, x_2)$?
These questions are considered in detail next, but it is possible to indicate the main insights with a brief discussion. To address the first question (existence of a solution), consider the system:
$$f_1(y_1, y_2) = 0y_1 + 1y_2 = 1 \;(= x_1)$$
$$f_2(y_1, y_2) = e^{-y_1} + y_2 = 1 \;(= x_2).$$
Rearranging, the first equation gives $y_2 = 1$; the second, $y_2 = 1 - e^{-y_1}$. Regardless of how large $y_1$ is, $1 - e^{-y_1} < 1$, so the pair of equations has no solutions (Figure 7.8). However, if $x_2 = 1$ is changed to $x_2 = a$, as depicted, then there is a solution and it is unique. Finally, in this case, small variations in $(x_1, x_2)$ from the point $(x_1, x_2) = (1, a)$ lead to small variations in the solution, answering the third question.

Figure 7.8: No solution
Figure 7.9: Multiple solutions

The impact of parameter variation on a nonlinear system of equations is complicated by the fact that there may be multiple solutions. Even in the univariate
case, the issue arises, since an equation, $k = f(x)$, may have multiple solutions. Consider the function $f(x)$ depicted in Figure 7.9 and the equation $k = f(x)$, which has multiple solutions, at $x'$, $x$ and $x''$. However, if we identify a particular solution, $x$, then as we vary $k$, the solution should vary smoothly with $k$. As $k$ varies in the region $A = (a,b)$, the solution $x$ varies in the region $(c,d)$. If $k$ increases to $b$, the solution $x$ moves to $c$: with $\Delta k = b - k$ and $\Delta x = c - x$, $\frac{\Delta x}{\Delta k} = \frac{c-x}{b-k}$. Noting that $dk = f'(x)dx$, with $f'(x) \ne 0$, $\frac{dx}{dk} = \frac{1}{f'(x)}$. Note that this "works" because $f'(x) \ne 0$, and we have a well-defined (implicit) function $g: (a,b) \to (c,d)$ with $g(k) = x$. The next section explores this idea in the multivariate case.
7.7.1 Existence and properties of solutions

To motivate the discussion, recall that in the linear case, if $A$ is a square $n \times n$ nonsingular matrix, then the equation system $Ay = x$ has a unique solution, $y = A^{-1}x$. If $A$ is singular, there are two possibilities – the system has no solution or an infinite number of solutions. Observe that if both $y^*$ and $\bar{y}$ solve $Ay = x$, then for any number $\theta$,
$$A(\theta y^* + (1-\theta)\bar{y}) = \theta Ay^* + (1-\theta)A\bar{y} = \theta x + (1-\theta)x = x.$$
So, if $y^*$ and $\bar{y}$ solve the equation $Ay = x$, then so does $(\theta y^* + (1-\theta)\bar{y})$ for any value of $\theta$. So, when there are at least two solutions, there are an infinite number of solutions. Notice also that if $\theta$ is close to 1, $(1-\theta)$ is close to 0 and $(\theta y^* + (1-\theta)\bar{y})$ is close to $y^*$. So, when there is more than one solution, every solution has other solutions arbitrarily close. These properties contrast with the situation in the nonlinear case. Consider the equation pair:
$$y_1^2 + y_2^2 = x_1 \qquad (7.6)$$
$$y_1 + y_2 = x_2. \qquad (7.7)$$
The first equation is a circle (of radius $\sqrt{x_1}$); the second is a linear equation. When $x_1 = 1$ and $x_2 = 1$, both equations pass through the points $(y_1, y_2) = (1,0)$ and $(y_1, y_2) = (0,1)$, points $b$ and $a$ respectively. The pair of equations has those two points as solutions and no others, as depicted in Figure 7.10. If the diameter of the circle were smaller, defined by an equation $y_1^2 + y_2^2 = \hat{x}_1$ with $\hat{x}_1 < \frac{1}{2}$, as indicated by the dashed line, there would be no solution: the pair of equations is inconsistent. This example suggests two observations: it is difficult to determine whether or not a nonlinear system of equations has a solution; and
Figure 7.10: A system with two solutions

"typically", if a nonlinear system has more than one solution, the solutions are a distance apart: here at $(y_1, y_2) = (1,0)$ and $(y_1, y_2) = (0,1)$. This local uniqueness is
exploited in the implicit function theorem. Given the equation pair,
$$f_1(y_1, y_2) = x_1$$
$$f_2(y_1, y_2) = x_2,$$
one may ask if, given $x = (x_1, x_2)$, the value of $y = (y_1, y_2)$ is uniquely determined, assuming there is a solution, $y$. In the two-equation example, $y_1^2 + y_2^2 = x_1$ and $y_1 + y_2 = x_2$, when $x_1 = 1 = x_2$, there are two possible solutions for the $y$ values: $(y_1, y_2) = (1,0)$, at $b$, and $(y_1, y_2) = (0,1)$, at $a$. However, there is a unique solution if we restrict attention to a neighborhood of $a$ or of $b$. In Figure 7.10, if $(x_1', x_2')$ is close to $(x_1, x_2)$, then there is a unique point close to $a$ that satisfies the pair of equations $y_1^2 + y_2^2 = x_1'$ and $y_1 + y_2 = x_2'$: the point $c$. This observation may be used to determine the local variation in $(y_1, y_2)$ resulting from a variation in $(x_1, x_2)$. To see this, consider the pair of equations
$y_1^2 + y_2^2 = x_1$, $y_1 + y_2 = x_2$, and take total differentials:
$$2y_1\,dy_1 + 2y_2\,dy_2 = dx_1$$
$$1dy_1 + 1dy_2 = dx_2.$$
Writing in matrix form and solving for $(dy_1, dy_2)$:
$$\begin{pmatrix} 2y_1 & 2y_2 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} dy_1 \\ dy_2 \end{pmatrix} = \begin{pmatrix} dx_1 \\ dx_2 \end{pmatrix}, \qquad \begin{pmatrix} dy_1 \\ dy_2 \end{pmatrix} = \begin{pmatrix} 2y_1 & 2y_2 \\ 1 & 1 \end{pmatrix}^{-1}\begin{pmatrix} dx_1 \\ dx_2 \end{pmatrix}.$$
Thus,
$$\begin{pmatrix} dy_1 \\ dy_2 \end{pmatrix} = \begin{pmatrix} \frac{1}{2(y_1-y_2)} & -\frac{y_2}{(y_1-y_2)} \\ -\frac{1}{2(y_1-y_2)} & \frac{y_1}{(y_1-y_2)} \end{pmatrix}\begin{pmatrix} dx_1 \\ dx_2 \end{pmatrix}.$$
At point $a$ in Figure 7.10, $(y_1, y_2) = (0,1)$, and substituting this into the coefficient matrix gives:
$$\begin{pmatrix} dy_1 \\ dy_2 \end{pmatrix} = \begin{pmatrix} -\frac{1}{2} & 1 \\ \frac{1}{2} & 0 \end{pmatrix}\begin{pmatrix} dx_1 \\ dx_2 \end{pmatrix}.$$
This may be compared with Figure 7.10 at $a$. First suppose that $dx_2 = 0$ and $dx_1 > 0$ – this corresponds to a dilation of the circle. Then $dy_1 = -\frac{1}{2}dx_1$ and $dy_2 = \frac{1}{2}dx_1$, indicated by the move from point $a$ to point $d$. If $dx_1$ is set to 0 and $dx_2 > 0$, this corresponds to a right shift in the straight line. Then (approximately) $dy_1 = dx_2$ and $dy_2 = 0$, which corresponds to a direct move to the right from $a$ along the circle to point $e$. Note that this is approximate, with $dx_2$ taken to be very small (so the movement from $a$ to $e$ is horizontal). In considering the general case, the question of whether a nonlinear system has a solution, or not, will not be considered. The entire discussion focuses on the properties of solutions, on the assumption that the equation system has at least one solution. Let $x, y \in \mathbb{R}^n$, and suppose that $x_j = f_j(y_1, y_2, \ldots, y_n)$, $j = 1, \ldots, n$. This gives an equation system:
$$x_1 = f_1(y_1, y_2, \ldots, y_n)$$
$$x_2 = f_2(y_1, y_2, \ldots, y_n)$$
$$\vdots$$
$$x_n = f_n(y_1, y_2, \ldots, y_n). \qquad (7.8)$$
Writing the vector of functions as $f = (f_1, \ldots, f_n)$, this can be written succinctly as $x = f(y)$, with $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$. Given the equation system $x = f(y)$, consider a variation of $y$ to some point $\tilde{y}$ in some neighborhood of $y$, $N(y)$. This determines a new value for $x$: $\tilde{x} = f(\tilde{y})$. Provided $f$ is continuous, $\tilde{x}$ will be close to $x$ because $\tilde{y}$ is close to $y$. Now, suppose that for any $\tilde{y} \in N(y)$, with $\tilde{y} \ne y$, $\tilde{x} = f(\tilde{y}) \ne f(y) = x$. In this case, the function $f$ is 1-1 in the neighborhood of $y$: different values of $y$ determine different values of $x$. Therefore, when $\tilde{x}$ is close to $x$, there is a unique $\tilde{y}$ in the neighborhood of $y$ with $\tilde{x} = f(\tilde{y})$. In this case, $y$ is implicitly defined as a function of $x$, and so one can compute the impact on any given solution value $y$ of a variation in $x$. Referring to Figure 7.10, at $x = (x_1, x_2) = (1,1)$ there are two values of $y$ that satisfy $x = f(y)$: $y = a$ and $y = b$. Considering the point $y = a$, when $x$ changes to $\tilde{x} = (x_1', x_2')$, the corresponding value of $y$ moves from $y = a$ to $\tilde{y} = c$, the solution in the neighborhood of $a$. Thus, $y$ is implicitly defined as a function of $x$ in this
neighborhood. In this way, we can associate to each point in a neighborhood of $(x_1, x_2)$ a unique point in a neighborhood of $a$. And, in this case, we can study how $y$ varies in response to variations in $x$. Considering equation $j$, $x_j = f_j(y_1, \ldots, y_n)$, in Equation system 7.8,
$$dx_j = \frac{\partial f_j}{\partial y_1}dy_1 + \cdots + \frac{\partial f_j}{\partial y_n}dy_n,$$
and the entire system may be written:
$$\begin{pmatrix} dx_1 \\ \vdots \\ dx_n \end{pmatrix} = \begin{pmatrix} \frac{\partial f_1}{\partial y_1} & \cdots & \frac{\partial f_1}{\partial y_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial y_1} & \cdots & \frac{\partial f_n}{\partial y_n} \end{pmatrix}\begin{pmatrix} dy_1 \\ \vdots \\ dy_n \end{pmatrix}.$$
In abbreviated form: $dx = J\,dy$, and provided $J$ is nonsingular, $dy = J^{-1}dx$. The study of such systems is taken up next.
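The relation $dy = J^{-1}dx$ can be illustrated with the circle–line system above. The sketch below (our own) perturbs $(x_1, x_2)$ near the solution $a = (0,1)$, solves the system exactly on that branch, and compares the exact change in $y$ with $J^{-1}dx$:

```python
def solve_near_a(x1, x2):
    """Solution branch of y1^2 + y2^2 = x1, y1 + y2 = x2 through a = (0, 1)."""
    y1 = (x2 - (2 * x1 - x2**2) ** 0.5) / 2
    return y1, x2 - y1

y1, y2 = solve_near_a(1.0, 1.0)    # the point a = (0, 1)

# Inverse of J = [[2*y1, 2*y2], [1, 1]], computed by hand for 2x2.
det = 2 * y1 - 2 * y2
Jinv = [[1 / det, -2 * y2 / det], [-1 / det, 2 * y1 / det]]

dx1, dx2 = 1e-4, -2e-4
dy1_lin = Jinv[0][0] * dx1 + Jinv[0][1] * dx2
dy2_lin = Jinv[1][0] * dx1 + Jinv[1][1] * dx2

z1, z2 = solve_near_a(1.0 + dx1, 1.0 + dx2)
print(z1 - y1, dy1_lin)   # agree to first order
print(z2 - y2, dy2_lin)
```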
7.8 The implicit function theorem
For the square linear system A y = x, if A has a nonzero determinant (is nonsingular), there are two implications: (1) the system has a solution and (2) the
solution is unique. Then as we vary x to x′ , y must vary to y′ , to satisfy A y′ = x′ .
(Because y = A −1 x = Bx uniquely determines y as a function of x.) The key requirement is that A is nonsingular. For the nonlinear equation system, there
is an analogous criterion based on a Jacobian determinant. This criterion, a nonzero Jacobian determinant, provides information about the local uniqueness of a solution, but not about the existence of a solution. The Jacobian determinant also provides information about how solutions vary with parameters. Given the system of equations $x = f(y)$, where the $x_i$ are fixed constants, let $y^*$ be a solution: $x = f(y^*)$. Given a solution $y^*$, under what circumstances is this the only solution in a neighborhood of $y^*$? And, if $x = f(y^*)$, under what circumstances does a small change in $x$ lead to a small change in the solution? For both questions, a sufficient condition is that the Jacobian determinant (introduced below) is nonzero at $y^*$. The Jacobian of the equation system is defined as $J = \left(\frac{\partial x}{\partial y}\right) = \left\{\frac{\partial x_i}{\partial y_j}\right\}$, $i, j = 1, \ldots, n$:
$$J = \begin{pmatrix}
\frac{\partial f_1}{\partial y_1} & \frac{\partial f_1}{\partial y_2} & \cdots & \frac{\partial f_1}{\partial y_i} & \cdots & \frac{\partial f_1}{\partial y_n} \\
\frac{\partial f_2}{\partial y_1} & \frac{\partial f_2}{\partial y_2} & \cdots & \frac{\partial f_2}{\partial y_i} & \cdots & \frac{\partial f_2}{\partial y_n} \\
\vdots & \vdots & \ddots & \vdots & & \vdots \\
\frac{\partial f_i}{\partial y_1} & \frac{\partial f_i}{\partial y_2} & \cdots & \frac{\partial f_i}{\partial y_i} & \cdots & \frac{\partial f_i}{\partial y_n} \\
\vdots & \vdots & & \vdots & \ddots & \vdots \\
\frac{\partial f_n}{\partial y_1} & \frac{\partial f_n}{\partial y_2} & \cdots & \frac{\partial f_n}{\partial y_i} & \cdots & \frac{\partial f_n}{\partial y_n}
\end{pmatrix}$$
Before discussing the Jacobian in detail, an example will illustrate its role.

Example 7.9: Consider the pair of equations:
$$x_1 = 1 = xy = f_1(x,y)$$
$$x_2 = 0 = y - x^2 = f_2(x,y).$$
The Jacobian of the system is:
$$J = \begin{pmatrix} \frac{\partial f_1}{\partial x} & \frac{\partial f_1}{\partial y} \\ \frac{\partial f_2}{\partial x} & \frac{\partial f_2}{\partial y} \end{pmatrix} = \begin{pmatrix} y & x \\ -2x & 1 \end{pmatrix}.$$
So, the Jacobian determinant is $|J| = y + 2x^2$, which is nonzero at the solution. Note that the two curves "cut", providing a well-defined solution. The nonzero Jacobian determinant confirms this. Furthermore, variations in one of the parameters, $(x_1, x_2)$, will vary the solution a small amount. This also is an implication of the nonzero Jacobian at the solution point $(x^*, y^*)$. ♦
A key property of $J$ is that when there is dependency between the equations $x_i = f_i(y_1, \ldots, y_n)$, $i = 1, \ldots, n$, the determinant of $J$ is zero. ($|J|$ is called the Jacobian determinant.) To see why this is so, suppose that the $n$th equation can be written as a function of the other $n-1$ equations: $x_n = \varphi(f_1, f_2, \ldots, f_{n-1})$. Differentiate this equation to get:
$$\frac{\partial x_n}{\partial y'} = \sum_{i=1}^{n-1}\frac{\partial \varphi}{\partial f_i}\frac{\partial f_i}{\partial y'} = \sum_{i=1}^{n-1}\varphi_i\frac{\partial f_i}{\partial y'} = \sum_{i=1}^{n-1}\varphi_i\frac{\partial x_i}{\partial y'}, \qquad \varphi_i = \frac{\partial \varphi}{\partial f_i}.$$
This shows that the $n$th row of $J$ is a linear combination of the first $(n-1)$ rows, and this implies that $|J| = 0$.
Example 7.10: For example, let $x_1 = f_1(y_1, y_2) = y_1 + y_2$ and $x_2 = f_2(y_1, y_2) = y_1^2 + 2y_1y_2 + y_2^2$. Observe that $y_1^2 + 2y_1y_2 + y_2^2 = (f_1(y_1, y_2))^2$, so that the equations are dependent (nonlinearly). In the notation above, $x_2 = \varphi(f_1)$ or $f_2 = \varphi(f_1)$.
$$J = \begin{pmatrix} \frac{\partial x_1}{\partial y_1} & \frac{\partial x_1}{\partial y_2} \\ \frac{\partial x_2}{\partial y_1} & \frac{\partial x_2}{\partial y_2} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 2y_1 + 2y_2 & 2y_1 + 2y_2 \end{pmatrix} = \begin{pmatrix} \frac{\partial f_1}{\partial y_1} & \frac{\partial f_1}{\partial y_2} \\ 2f_1\frac{\partial f_1}{\partial y_1} & 2f_1\frac{\partial f_1}{\partial y_2} \end{pmatrix}.$$
Thus, $|J| = 2f_1\left(\frac{\partial f_1}{\partial y_1}\right)\left(\frac{\partial f_1}{\partial y_2}\right) - 2f_1\left(\frac{\partial f_1}{\partial y_2}\right)\left(\frac{\partial f_1}{\partial y_1}\right) = 0$. If $x_2 = x_1^2$, there are an infinite number of solutions and $(y_1, y_2)$ is not determined in terms of $(x_1, x_2)$. If $x_2 \ne x_1^2$, there is no solution. ♦
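A quick numerical illustration (our own) of the vanishing determinant for this dependent pair:

```python
def jacobian_det(y1, y2):
    """|J| for f1 = y1 + y2 and f2 = y1^2 + 2*y1*y2 + y2^2 = f1^2."""
    a, b = 1.0, 1.0               # row of partials of f1
    c = d = 2 * y1 + 2 * y2       # row of partials of f2
    return a * d - b * c

print(jacobian_det(0.7, -2.3))    # 0.0 at any point
```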
The condition $|J| \ne 0$ is necessary for a system of equations to be independent. Note, from the example, that this condition detects linear and nonlinear dependence. However, the Jacobian determinant does not provide information about the existence of a solution.
Example 7.11: Consider again the equation system:
$$y_1^2 + y_2^2 = x_1$$
$$y_1 + y_2 = x_2.$$
The Jacobian is:
$$J = \begin{pmatrix} 2y_1 & 2y_2 \\ 1 & 1 \end{pmatrix}.$$
When $x_1$ is small relative to $x_2$ (for example, $x_2 = 1$ and $x_1 < \frac{1}{2}$), the equation system has no solution, but, for example, at $y_1 = 0$, $y_2 = 1$, $|J| = -2 \ne 0$. In this equation system, whether there is a solution or not depends on $(x_1, x_2)$, but the Jacobian depends only on $(y_1, y_2)$, so it cannot provide information on the existence of a solution. ♦
In the linear case, x = Ay, so that J = A and |J| = |A|. Hence |J| ≠ 0 implies that A is invertible and y = A⁻¹x. In this case, the solution is unique and the impact of variations in x on y is easy to determine: ∆y = A⁻¹∆x.
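As a quick numerical illustration of the linear case (the matrix and vectors below are my own choices, not from the text):

```python
# Linear case x = A y: if |J| = |A| != 0, then y = A^{-1} x and a parameter
# change dx produces the solution change dy = A^{-1} dx.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
assert np.linalg.det(A) != 0       # A is invertible

x = np.array([1.0, 2.0])
y = np.linalg.solve(A, x)          # y = A^{-1} x

dx = np.array([0.1, -0.05])        # a small variation in the parameters
dy = np.linalg.solve(A, dx)        # corresponding variation in the solution

# y + dy solves the perturbed system exactly:
assert np.allclose(A @ (y + dy), x + dx)
```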
In the nonlinear case, J will generally depend on the value of the vector y – see the first row of the example above. To make this explicit, one may write J(y). The following theorem connects the value of |J| to the function f, and considers the impact of a variation in x_j on the values of the y_i. For the latter, one additional piece of notation is required. Let e_j denote a column vector with 1 in the jth row and zeros elsewhere. Write J_i^j for the matrix obtained from J by replacing the ith column with e_j.

Theorem 7.3: If f is a continuous function with continuous partial derivatives, f : R^n → R^n, and |J(ỹ)| ≠ 0 for all ỹ in a neighborhood of y, then f is 1–1 in a neighborhood of y: if y′ is close to y, there is a unique x′ = f(y′) close to x = f(y). Furthermore, at the point y,

∂y_i/∂x_j = |J_i^j| / |J|.
R EMARK 7.4: The motivation for the theorem is relatively easy to see. With x = f(y), and f 1–1 on a neighborhood of y, each x′ close to x is associated with a unique point y′ close to y: there is a function g associating points in the neighborhood of x to points in the neighborhood of y, (y1, ..., yn) = (g1(x1, ..., xn), ..., gn(x1, ..., xn)), such that x = f(g(x)), because at x, g gives the value y = g(x) with x = f(y).

7.8. THE IMPLICIT FUNCTION THEOREM

Then the partial derivatives of the g functions give the impact of variations in the x_i on the y_i:

∂y1/∂x1 = ∂g1/∂x1   and   ∂y1/∂x2 = ∂g1/∂x2.
Considering the two-variable case:

x1 = f1(g1(x1, x2), g2(x1, x2))
x2 = f2(g1(x1, x2), g2(x1, x2)).

Consider a small variation in x1, holding x2 fixed. From the equation x1 = f1(g1(x1, x2), g2(x1, x2)):

dx1 = (∂f1/∂y1)(∂g1/∂x1)dx1 + (∂f1/∂y2)(∂g2/∂x1)dx1.

From the equation x2 = f2(g1(x1, x2), g2(x1, x2)):

0 = (∂f2/∂y1)(∂g1/∂x1)dx1 + (∂f2/∂y2)(∂g2/∂x1)dx1.

Dividing each of these equations by dx1:

1 = (∂f1/∂y1)(∂g1/∂x1) + (∂f1/∂y2)(∂g2/∂x1)
0 = (∂f2/∂y1)(∂g1/∂x1) + (∂f2/∂y2)(∂g2/∂x1).

In matrix form:

( 1 )   ( ∂f1/∂y1  ∂f1/∂y2 ) ( ∂g1/∂x1 )
( 0 ) = ( ∂f2/∂y1  ∂f2/∂y2 ) ( ∂g2/∂x1 ),

so, from Cramer's rule:

∂y1/∂x1 = ∂g1/∂x1 = det( 1  ∂f1/∂y2 ; 0  ∂f2/∂y2 ) / det( ∂f1/∂y1  ∂f1/∂y2 ; ∂f2/∂y1  ∂f2/∂y2 ).
The next example applies the theorem to the pair of equations x1 = y12 + y22 , x2 = y1 + y2 .
E XAMPLE 7.12: Consider again the equation pair (depicted in Figure 7.10):

x1 = f1(y1, y2) = y1² + y2²
x2 = f2(y1, y2) = y1 + y2,

with x1 = x2 = 1. Then

J = ( ∂f1/∂y1  ∂f1/∂y2 ; ∂f2/∂y1  ∂f2/∂y2 ) = ( 2y1  2y2 ; 1  1 );
J_1^1 = ( 1  2y2 ; 0  1 );   J_2^1 = ( 2y1  1 ; 1  0 ).   (7.9)

So:

∂y1/∂x1 = |J_1^1|/|J| = 1/(2(y1 − y2))   and   ∂y2/∂x1 = |J_2^1|/|J| = −1/(2(y1 − y2)).   (7.10)

At (y1, y2) = (0, 1), ∂y1/∂x1 = −1/2 and ∂y2/∂x1 = 1/2. The impact of dilating the circle (depicted in Figure 7.10 by the circle formed with dotted lines) is to move the solution left, from a to d. ♦
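The determinant-ratio values in Example 7.12 can be checked against finite differences on the explicit solution derived in Remark 7.6 below; the script is a sketch of mine, not part of the text:

```python
# Verify Example 7.12 at (y1, y2) = (0, 1) (so x1 = x2 = 1): Cramer's rule
# gives dy1/dx1 = -1/2 and dy2/dx1 = 1/2, and finite differences on the
# explicit solution branch should agree.
import math
import numpy as np

def solution(x1, x2):
    # branch of y1^2 + y2^2 = x1, y1 + y2 = x2 passing through (0, 1)
    y2 = 0.5 * (x2 + math.sqrt(2*x1 - x2**2))
    return x2 - y2, y2

y1, y2 = 0.0, 1.0
J  = np.array([[2*y1, 2*y2], [1.0, 1.0]])
J1 = np.array([[1.0, 2*y2], [0.0, 1.0]])   # first column replaced by e1
J2 = np.array([[2*y1, 1.0], [1.0, 0.0]])   # second column replaced by e1

dy1_dx1 = np.linalg.det(J1) / np.linalg.det(J)
dy2_dx1 = np.linalg.det(J2) / np.linalg.det(J)
assert abs(dy1_dx1 + 0.5) < 1e-12 and abs(dy2_dx1 - 0.5) < 1e-12

h = 1e-6
a1, b1 = solution(1 + h, 1)
a0, b0 = solution(1, 1)
assert abs((a1 - a0)/h - dy1_dx1) < 1e-4   # dilating the circle moves y1 left
assert abs((b1 - b0)/h - dy2_dx1) < 1e-4   # ... and y2 up
```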
R EMARK 7.5: Note that when the matrices in Equation 7.9 are evaluated at a point (y1, y2), the implicit assumption is that the point satisfies the equations. For example, the point (y1, y2) = (4, 0) lies on neither equation (with x1 = x2 = 1); one may nevertheless compute the matrices of partial derivatives there, but the resulting numbers carry no comparative-statics meaning: these partial derivatives are informative only when computed at solution points of the system of equations.
R EMARK 7.6: The calculations in Example 7.12 do not give an explicit solution for (y1, y2). However, this is straightforward here. From y1 + y2 = x2, y1 = x2 − y2, which may be substituted into y1² + y2² = x1 to give (x2 − y2)² + y2² = x1, or x2² − 2x2y2 + y2² + y2² = x1. Collecting terms, 2y2² − 2x2y2 + x2² − x1 = 0. This quadratic in y2 has two roots, y2^a and y2^b:

y2^a, y2^b = [2x2 ± √((2x2)² − 4(2)(x2² − x1))]/4
           = [2x2 ± √(4x2² − 8x2² + 8x1)]/4
           = [2x2 ± √(4(2x1 − x2²))]/4
           = [2x2 ± 2√(2x1 − x2²)]/4
           = [x2 ± √(2x1 − x2²)]/2.

At x1 = x2 = 1, this gives y2^a, y2^b = (1 ± 1)/2, so y2^a = 1 and y2^b = 0. Taking the first root, y2^a(x1, x2) = (1/2)x2 + (1/2)√(2x1 − x2²), and differentiating with respect to x1 (dilating the circle) gives

∂y2^a/∂x1 = (1/2)(1/2)(2x1 − x2²)^(−1/2)(2) = (1/2)(2x1 − x2²)^(−1/2) = 1/(2√(2x1 − x2²)).

Notice that

y2^a = (1/2)x2 + (1/2)√(2x1 − x2²),   y1^a = (1/2)x2 − (1/2)√(2x1 − x2²).

Thus, y1^a − y2^a = −√(2x1 − x2²), so:

∂y2^a/∂x1 = 1/(2√(2x1 − x2²)) = −1/(2(y1^a − y2^a)),

which compares with Equation 7.10.
In concluding this section, recall the general system of Equation 7.8, which may be written in the form x = f(y). The total differential may then be written, in matrix notation, as dx = J(y)dy. If J(y) is nonsingular, we may write dy = J(y)⁻¹dx, giving the variations in all the y as a result of a variation in the vector x.
7.8.1 General systems of equations
In the previous discussion, equations had the form x_i = f_i(y1, ..., yn), with the interpretation that the x_i are parameters determining the values of the y_i, just as the values of the x_i in x = Ay determine the y_i. In either case, x_i = f_i(y1, ..., yn) or b_i = Σ_{j=1}^n a_{ij} y_j, the parameters appear "nicely" on one side. If we write F_i(y1, y2, ..., yn, x_i) = x_i − f_i(y1, ..., yn), then x_i = f_i(y1, ..., yn) for i = 1, ..., n is equivalent to F_i(y1, y2, ..., yn, x_i) = 0, for i = 1, ..., n.
More generally, each parameter x_i may enter all equations: F_i(y1, y2, ..., yn, x1, x2, ..., xm) = 0. For example,

F1(y1, y2, x1, x2, x3) = x3 − x1y1² − x2y2²
F2(y1, y2, x1, x2, x3) = 1 − x3x1y1² − x2y2²

gives a pair of equations determining two variables, (y1, y2), with three parameters, (x1, x2, x3). The general system of equations has the form:

F1(y1, ..., yn; x1, ..., xm) = 0
F2(y1, ..., yn; x1, ..., xm) = 0
  ⋮
Fn(y1, ..., yn; x1, ..., xm) = 0.

In this system, the y variables are viewed as endogenous variables (variables to be determined) and the x variables are viewed as exogenous variables (variables whose values are fixed – parameters). Given the values for the x_i, this is a system of n equations, (F1, F2, ..., Fn), in n unknowns, (y1, y2, ..., yn). Note that the number of x_i (m) need not equal the number of equations (n), but the number of y_i (n) must equal the number of equations. This is because the y variables are the variables that must be solved from the n equations, given some fixed set of values for the x variables. The impact of a parameter change (say, a change in x_i) on some variable (say y_j) is measured by ∂y_j/∂x_i.
As in the previous discussion, this may be computed in terms of the ratio of the determinants of two matrices. The following discussion describes the procedure. For this system, the Jacobian J and an additional matrix J_i^j are defined:

J = ( ∂F1/∂y1  ∂F1/∂y2  ···  ∂F1/∂yn )
    ( ∂F2/∂y1  ∂F2/∂y2  ···  ∂F2/∂yn )
    (    ⋮        ⋮      ⋱      ⋮    )
    ( ∂Fn/∂y1  ∂Fn/∂y2  ···  ∂Fn/∂yn )

and

J_i^j = ( ∂F1/∂y1  ···  −∂F1/∂x_i  ···  ∂F1/∂yn )
        ( ∂F2/∂y1  ···  −∂F2/∂x_i  ···  ∂F2/∂yn )
        (    ⋮              ⋮              ⋮    )
        ( ∂Fn/∂y1  ···  −∂Fn/∂x_i  ···  ∂Fn/∂yn ),

where the column (−∂F1/∂x_i, ..., −∂Fn/∂x_i) appears in the jth position. That is, the matrix J_i^j is obtained from J by replacing the jth column of J with the vector of partial derivatives of (−F1, −F2, ..., −Fn) with respect to x_i.

Theorem 7.4: Suppose that each F_i has continuous partial derivatives and |J| ≠ 0 at (ȳ, x̄). Then there exist continuous functions f1, ..., fn such that y_j = f_j(x), j = 1, 2, ..., n, for x in a neighborhood of x̄. Furthermore, the effect on y_j of a change in x_i is given by

∂y_j/∂x_i = |J_i^j| / |J|.
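The rule in Theorem 7.4 is mechanical enough to sketch in code. The helper below (the function name and the test matrix are my own) forms J_i^j by column replacement; as a sanity check, for the linear system F(y, x) = Ay − x = 0 the multipliers ∂y_j/∂x_i are simply the entries of A⁻¹:

```python
# Determinant-ratio rule of Theorem 7.4: dy_j/dx_i = |J_i^j| / |J|, where
# J_i^j replaces column j of J with the vector -dF/dx_i.
import numpy as np

def multiplier(J, minus_dF_dxi, j):
    """dy_j/dx_i via the determinant ratio (j is 0-based)."""
    Jij = J.copy()
    Jij[:, j] = minus_dF_dxi
    return np.linalg.det(Jij) / np.linalg.det(J)

# Sanity check on F(y, x) = A y - x = 0: here dF/dy = A, -dF/dx_i = e_i,
# and since y = A^{-1} x, the multiplier dy_j/dx_i must equal inv(A)[j, i].
A = np.array([[3.0, 1.0], [2.0, 4.0]])
for i in range(2):                       # parameter x_i
    e = np.zeros(2); e[i] = 1.0          # -dF/dx_i = e_i
    for j in range(2):                   # variable y_j
        assert abs(multiplier(A, e, j) - np.linalg.inv(A)[j, i]) < 1e-9
```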
To illustrate, return to the production problem discussed in Section 7.6.1 with a different specification of the production and constraint technology. Let demand, Q, be exogenously given, determining the required output. Suppose that the technology relating inputs x and y to output is Q = x^α y^β. In addition, a technology constraint requires that y = kx^γ. We wish to consider the impact of changes in the required production level Q on x and y. The equations relating the variables may be written:

F1(x, y; Q, α, β, γ, k) = Q − x^α y^β = 0   (1)
F2(x, y; Q, α, β, γ, k) = y − kx^γ = 0.     (2)

These equations are depicted in Figure 7.11, where the impact of varying Q is illustrated. Evaluated at a solution (so that x^α y^β = Q and kx^γ = y),

J = ( −αQ/x  −βQ/y ; −γ(y/x)  1 )   and   ( −∂F1/∂Q ; −∂F2/∂Q ) = ( −1 ; 0 ).

Thus, |J| = −(Q/x)(α + βγ), and

∂x/∂Q = (1/|J|) det( −1  −βQ/y ; 0  1 ) = 1/((α + βγ)(Q/x)),
∂y/∂Q = (1/|J|) det( −αQ/x  −1 ; −γ(y/x)  0 ) = γ(y/x)/((α + βγ)(Q/x)).
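Because y = kx^γ can be substituted into Q = x^α y^β, this particular system also solves in closed form, x = (Q k^{−β})^{1/(α+βγ)}, which gives an independent check on ∂x/∂Q and ∂y/∂Q. The parameter values in the sketch below are my own:

```python
# Check dx/dQ = x/((a + b*g)*Q) and dy/dQ = g*y/((a + b*g)*Q) against
# finite differences on the explicit solution of Q = x^a y^b, y = k x^g.
a, b, g, k = 0.3, 0.4, 1.5, 2.0   # alpha, beta, gamma, k: illustrative only

def solve(Q):
    x = (Q * k**(-b)) ** (1.0 / (a + b*g))
    return x, k * x**g

Q = 5.0
x, y = solve(Q)
assert abs(x**a * y**b - Q) < 1e-9          # the technology relation holds

h = 1e-6
x1, y1 = solve(Q + h)
dx_dQ = x / ((a + b*g) * Q)
dy_dQ = g * y / ((a + b*g) * Q)
assert abs((x1 - x)/h - dx_dQ) < 1e-5
assert abs((y1 - y)/h - dy_dQ) < 1e-5
```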
Figure 7.11: The impact of varying output on x and y

E XAMPLE 7.13: Consider a macroeconomic model with two equations:

y = c(y − T(y), r) + I(y, r) + G
M = L(y, r).

The first equation is called the IS curve and the second the LM curve in macroeconomics. In this widely studied model, a common task is to determine the impact of changes in policy parameters (government expenditure G and the money supply M) on income y and the interest rate r. These are called policy multipliers and are computed next. For example, ∂y/∂G is the policy multiplier relating government expenditure to output: it measures the impact on output of an increase in government expenditure. Write c1 for the partial derivative of c with respect to its first argument, and write c_r = ∂c/∂r, I_y = ∂I/∂y, I_r = ∂I/∂r, L_r = ∂L/∂r and L_y = ∂L/∂y. Then, differentiating the first equation:
dy = c1(1 − T′)dy + c_r dr + I_y dy + I_r dr.

Rearranging (this gives the variation in r resulting from a variation in y when restricted to stay on this curve):

(1 − c1(1 − T′) − I_y)dy = (c_r + I_r)dr,

so

(1 − c1(1 − T′) − I_y)/(c_r + I_r) = dr/dy.

A common economic assumption is that 0 < 1 − c1(1 − T′) − I_y < 1, along with c_r < 0 and I_r < 0, so that dr/dy < 0. Similar calculations for the LM curve yield dr/dy = −L_y/L_r, which is positive under the assumptions L_y > 0 and L_r < 0. The curves are depicted in Figure 7.12.
Figure 7.12: The IS–LM curves
In this model, the exogenous variables are government expenditure (G) and the money supply (M). The endogenous variables are income (y) and the rate of interest (r). (As a numerical example, one could take T(y) = (1/2)y, with simple functional forms for c, I and L, e.g. L(y, r) = y^{1/2} r^{−1}.) The first equation asserts that national income (y) equals consumption (c), plus investment (I), plus government expenditure (G). The second says that money supply (M) equals money demand (L), where demand depends on the interest rate (r) and the level of income. These equations may be rearranged in the general form given previously:

F1(y, r, G, M) = y − c(y − T(y), r) − I(y, r) − G = 0
F2(y, r, G, M) = L(y, r) − M = 0.

Provided the Jacobian has full rank, we may view y and r as functions of G and M, y = f1(G, M), r = f2(G, M), and compute the corresponding partial derivatives according to the rule above. (In the theorem above, y and r correspond to y1 and y2, while G and M correspond to x1 and x2.) With this rearrangement, the Jacobian is obtained:

J = ( ∂F1/∂y  ∂F1/∂r ; ∂F2/∂y  ∂F2/∂r ) = ( 1 − c1(1 − T_y) − I_y   −(c_r + I_r) ; L_y   L_r ).
The Jacobian has determinant  J = [1 − c1 (1 − T y) − I y ]L r + L y ( c r + I r ). To y
y
r calculate JG , JM , JGr and JM (where these correspond to J ji , with i = y, r
and j = G, M ), observe
Ã ! 1 − ∂∂FG +1 = , ∂F 2 0 − ∂G
1
− ∂∂FM
2
− ∂∂FM
=
Ã
!
0 +1
.
Thus, y JG
JGr
= =
Ã
1
−( c r + I r )
0 Ã
Lr
!
y JM
,
1 − c1 (1 − T y ) − I y
1 0
Ly
!
r JM
,
= =
Ã 0
! −( c r + I r )
1 Lr Ã 1 − c1 (1 − T y ) − I y
Ly
0
!
1
.
Applying the theorem: ∂y ∂G
y
=
 JG  J
=
Lr , J
∂y
and
∂M
y
=
 JM  J
=
cr + I r . J
(7.11)
Similarly, ∂r ∂G
=
 JGr  J
=
−L y J
,
and
∂r ∂M
=
r  JM 
J
=
−[1 − c1 (1 − T y ) − I y ] J
.
(7.12)
♦
R EMARK 7.7: It is worth noting that, for a slightly different perspective, these results may be obtained directly by totally differentiating the original equation system.
d y = c1 (1 − T y )d y + c r dr + I y d y + I r dr + dG L y d y + L r dr = dM, [1 − c1 (1 − T y ) − I y ]d y − ( c r + I r )dr = dG
L y d y + L r dr = dM Ã
[1 − c1 (1 − T y ) − I y ]
Ly
!Ã ! −( c r + I r ) d y
Lr
dr
=
Ã
dy
Ã
dG
!
dM
.
This system has the form: Ax = b where
A=
Ã
[1 − c1 (1 − T y ) − I y ]
Ly
−( c r + I r )
Lr
!
, x=
dr
!
, b=
Ã
dG dM
!
.
7.8. THE IMPLICIT FUNCTION THEOREM
259
Note that A = J and that  A = [1 − c1 (1 − T y ) − I y ]L r + L y (G r + I r ). Using Cramer’s rule:
¯ 1 ¯¯ dG dy = ¯  A  ¯ dM
¯ −( c r + I r )¯¯ ¯ ¯ Lr
L r dG + ( c r + I r )dM A µ ¶ µ ¶ Lr (cr + I r ) = dG + dM, A A
=
and ¯ 1 ¯¯[1 − c1 (1 − T y ) − I y ] dr = ¯ A¯ Ly
= =
¯ dG ¯¯ ¯ dM ¯
[1 − c1 (1 − T y ) − I y ]dM − L y dG
µ
−L y A
¶
dG +
µ
A
[1 − c1 (1 − T y ) − I y ] A
¶
dM.
Considering the expression ¶ µ ¶ (cr + I r ) Lr dG + dM, dy = A A µ
if dM = 0, the change in y, d y, results from a change in G , dG , with M held
fixed. This measures the partial derivative³ of y´ with respect to G , and is given in the expression by the coefficient on dG : LAr . Thus: ∂y ∂G
=
·
¸ Lr , A
∂r ∂M
=
·
[1 − c1 (1 − T y ) − I y ] A
¸
,
and so on. This is the same result as appears in Equations 7.11 and 7.12.
The following discussion provides a graphical perspective on the calculations. A highway starts from point ( x, y) = (0, 0) and follows the path y = x + β x2 − γ x3
(given the x coordinate, there is a unique y coordinate determined by this equation). The direction x is east and the direction y is north. The point ( x, y) on this highway that is a distance r from the starting point satisfies x2 + y2 = r2 . (We assume that
2β 3γ
> r.) This point is uniquely defined by these two equations.
Suppose that we move slightly further along the highway by increasing the distance r; is the direction more northerly or more easterly? From the starting point at radius r, consider an increase in the radius to ′
r with r′ − r > 0 small. This changes the coordinates ( x, y) slightly to ( x′ , y′ ).
CHAPTER 7. SYSTEMS OF EQUATIONS, DIFFERENTIALS AND DERIVATIVES
260
y
highway
dy
•
•
r
dr
x
x∗
dx
x∗ =
β+
p
β+3γ 3γ
Figure 7.13: The impact of increasing distance
The eastward movement is x′ − x and the northward movement is y′ − y. Then
x′ − x ∂ x ≈ r′ − r ∂r
y′ − y ∂ y ≈ . r′ − r ∂r
and
Thus, the northward movement is greater than the eastward movement if:
y′ − y = x′ − x
y′ − y r′ −r x′ − x r′ −r
≈
∂y ∂r ∂x ∂r
>1
Note that, along the highway, µ ¶ 2β dy = 1 + 2β x − 3γ x2 = 1 + (2β − 3γ x) x = 1 + 3γ − x x. dx 3γ
So,
dy dx
> 1 if
³
2β 3γ
´ − x > 0 or (2β − 3γ x) > 0. At any point ( x, y) on the circle x2 + y2 =
r2 , and since y2 ≥ 0, x2 ≤ r2 , and since the solution is positive, x < r, so that
2β − 3γ x > 2β − 3γ r or
2β 3γ
−x >
2β 3γ
− r. Since
2β 3γ 2
≥ r,
2β 3γ
3
− x ≥ 0. Therefore, at the
point on the circle where the curve y = x + β x − γ x cuts, the slope of the curve
is greater than 1: at the solution, than the eastward movement.
dy dx
> 1, and the northward movement is greater
7.8. THE IMPLICIT FUNCTION THEOREM
261
The following example reconsiders this discussion by calculating the partial derivatives of y and x with respect to r, and comparing them.
E XAMPLE 7.14: Recall:
x 2 + y2 = r 2
y − x − β x 2 + γ x 3 = 0.
2 xdx + 2 yd y = 2rdr
d y − dx − 2β xdx + 3γ x2 dx = 0 Ã
2x
2y
1 + 2β x − 3γ x 2
−1
!Ã
dx
!
dy
=
Ã
2rdr
!
0
,
or, since only r is varied, Ã
2x
2y
1 + 2β x − 3γ x 2
−1
! ∂x
∂r = ∂y ∂r
Ã ! 2r
0
.
Using Cramer’s rule, the coefficient determinant is ∆ = −2 x − 2 y(1 + 2β x −
3γ x2 ), while ∆1 = −2r and ∆2 = −(1 + 2β x − 3γ x2 )2r. Then ∂x ∂r
=
∆1 , ∆
∂y ∂r
=
∆2 and ∆
∂y ∂r ∂x ∂r
=
∆2 = (1 + 2β x − 3γ x2 ) = (1 + (2β − 3γ x) x). ∆1 ♦
This page intentionally left blank
Exercises for Chapter 7
Exercise 7.1
The following simple macroeconomic model has four equations
relating government policy variables to aggregate income and investment. Aggregate income equals consumption plus investment and government expenditure:
y = c + i + g. Consumption is positively related to the real rate of interest:
c = α + β y, α, β > 0. Investment is negatively related to the rate of interest, r:
i = γ − δr, γ, δ > 0. The money supply, m s , equals the money demand, m d , and this in turn is positively related to the level of income (ρ > 0) and negatively related to the rate of interest (ϕ > 0):
m s = m d = ρ y − ϕ r, ϕ, k > 0.
Find the impact on y of (a) a small change in g, ∆ g, and (b) a small change in m,
∆ m. Exercise 7.2
Consider a market defined by the following supply and demand
curves.
q d = a + bP d + cY¯ q s = d + eP s ,
where a, c, d , e > 0, and b < 0, a > d . 263
CHAPTER 7. SYSTEMS OF EQUATIONS, DIFFERENTIALS AND DERIVATIVES
264
(a) (i) Compute the equilibrium price P ∗ and quantity q∗ . (ii) Sign the comparative statics partial derivatives
∗ ∂P ∗ ∂ q , ∂Y¯ . ∂Y¯
(b) Suppose a tax t > 0 is levied on producers: (i) Compute the new equilibrium price and quantity. (ii) Sign the derivatives
∂ q∗ ∂t ,
∂P d ∂t
∂P s ∂t .
,
Exercise 7.3 Consider the following macroeconomic model of national income:
C = α + βY ,
α > 0, 0 < β < 1
Y = C + I +G I = γY ,
0 < γ < 1.
What are the exogenous and endogenous variables? Write this system in matrix form and solve for the reduced form (i.e., write the system in the form Ax = b,
where x is the vector of endogenous variables and b the vector of exogenous variables and solve as x = A −1 b). Using the reduced form expressions, find the following comparative statics derivatives: ∂c , ∂G
∂Y ∂G
,
∂I ∂G
.
Exercise 7.4 Consider the following macroeconomic model:
y = c( y, r) + i (r) + G m = l (r, y). (a) Identify the exogenous and endogenous variables. (b) (i) What is the effect of an increase in government expenditure G , on output y and the interest rate r? (ii) What is the effect of an increase in the money supply M , on output and the interest rate? (c) Consider what happens if we replace the above money market equation with the following: m = µ y. Repeat part (b) with this assumption. (d) Represent the two different systems in ( y, r) space and show the above effect diagrammatically. Exercise 7.5 Consider the following macroeconomic model:
S (Y , r) + T (Y ) + IM = I (r) + G + x( e) L(Y , r) = M,
EXERCISES FOR CHAPTER 7
265
where Y is national income, S is savings, which is a function of income and the rate of interest, IM is imports, x is exports, I is investment, G is government expenditure, M is money supply and e is the exchange rate. The first equation is the equilibrium condition for the goods market and the second equation is the equilibrium condition for the money market. Using this model, determine how the equilibrium values of Y and r change with a change in the exchange rate e and with a change in the exogenous level of imports (here imports is an exogenous variable, just like G ). Make the usual assumptions: S y > 0, S r > 0, T y > 0, I r < 0, x e < 0, L y > 0, L r < 0. Exercise 7.6 Consider a production process using input x to produce output y. The relationship between inputs and outputs is somewhat odd: for technological reasons the input–output levels must satisfy two conditions:
x2 + y2 = α, 0 < α < 1 β y = x(1 − x), 0 < β.
This pair of equations has a unique nonnegative solution ( x, y ≥ 0). Plot these
functions in ( x, y) space. At this solution, calculate
dx d y dx dα , dα , dβ
and
dy dβ .
In relation
to your graph, interpret the signs of the derivatives (they are positive or negative). Exercise 7.7
Repeat the previous question, with input and output required to
satisfy the following two conditions:
y − γ ln( x + 1) = 0 x α yβ = γ.
Exercise 7.8 Consider the following IS –LM model with an import sector:
y = c ( y) + I ( i ) + G m = L( y, i ) x(π) = z( y, π). The first two equations are the IS and LM curves, respectively. The third equation asserts that exports ( x) depend only on the exchange rate, π, while imports ( z) depend on the exchange rate and the level of domestic income. (a) Calculate the slopes of the IS and LM curves. (b) Find
d y di dG , dG
and
dπ dG .
(Differentiate each of the three equations to give a 3 × 3
equation system in (d y, di, d π) and use Cramer’s rule.)
CHAPTER 7. SYSTEMS OF EQUATIONS, DIFFERENTIALS AND DERIVATIVES
266
An individual has utility function u( x, y) = xα yβ , where 0 < α
0. Find
∂ y ∂x ∂ y ∂x , , , . ∂Q ∂Q ∂ k ∂ k
∂ y ∂x , . ∂α ∂α
Also, try finding
Exercise 7.14 Consider the following two equations: x2 + y2 = r2 and y = k ln( x + 1),
with r > 0 and k > 0. Plot both equations and indicate on your graph the impact of an increase in r and k on the solution. Find
Exercise 7.15
d y dx d y dr , dr dk
and
dx dk .
Consider the following two equations: x y = k and y = 1 − e−γ x ,
k > 0 and γ > 0. Plot both equations and indicate on your graph the impact of an
increase in k and γ on the solution. Find
d y dx d y dk , dk d γ
and
dx dγ .
Exercise 7.16 An output technology is given by x + k y2 = a, where k and a are
positive constants. Government regulations require that outputs of x and y satisfy the relation eβ y − eα x = c, where α, β and c are positive constants. (We assume that q ln( c+1) < ak to ensure there is a solution to the pair of equations.) β (a) The government has decided that it is desirable to raise x and reduce y, by a policy of increasing c. Will the policy have the desired effect? Calculate the impact on x and y of an increase in c. (b) An economic consultant recommends instead that the government raise β. Would this policy have the desired effect of increasing x and reducing y? Exercise 7.17 An individual has income or wealth, y, to allocate between present,
c, and future, f , consumption. Suppose that the individual has utility function u( c, f ) = c · f . Income not consumed in the current period is invested at interest
rate r to yield the amount available for future consumption: f = ( y − c)(1 + r).
Since y is given, the choice of c determines f . This is a budget constraint relating current and future consumption. Find the impact of a change in income, y, on consumption c.
CHAPTER 7. SYSTEMS OF EQUATIONS, DIFFERENTIALS AND DERIVATIVES
268
Exercise 7.18 An individual has preferences given by indifference curves of the form x y = k: if the agent is consuming x units of one good, the consumption of
k y
units of the other good leads to utility level k. The problem facing the individual is that the goods x and y are bundled so that they cannot be bought in any combination. Specifically, x units of one good come in a package with y = 1 − e−α x units
of the other good. Suppose that the individual currently is at utility level k, consuming at levels x and y according to the packaging constraint. Find the impact of an increase in utility on the consumption levels x and y: Exercise 7.19
∂x ∂k
and
∂y . ∂k
According to a macroeconomic model, the aggregate level of in
come Y equals consumption C , investment I and government expenditures G (Y = C + I + G ). Consumption depends on income C = C (Y ) while investment de
pends on the rate of interest r, so I = I (r). Thus, Y = C (Y ) + I (r) + G . The money
supply M equals money demand, which is a function of the level of income and
the rate of interest L = L(Y , r). Thus, M = L(Y , r). Finally, the exchange rate π
is assumed to depend on income and the interest rate π = π(Y , r). Together, this
gives three equations. The level of government expenditure (G ) and the money supply ( M ) are assumed to be controlled by the government. Given values for G and M , the three equations determine the values of Y , r and π. Suppose that these functions can be written in the form:
Y = cY α − γ r + G M=
Y r
π = Y β rδ .
In this model, there are two government policy parameters, government expenditure, G , and money supply M . The impact of changes in these policy variables on output, interest rates and the exchange rate are given by such expressions as ∂r ∂Y ∂G , ∂ M ,
and
∂π ∂G .
Determine these policy “multipliers” and confirm that
β = −δ.
∂π ∂G
= 0 if
Exercise 7.20 Suppose that the supply function for a particular good is given by the equation p = α + δ qβ , α, δ, β > 0. The demand function for the good is given by ln( pq) = k. Shifts in the supply function are caused by changes in α, δ and β. Plot
the supply and demand equations and calculate the impact on (equilibrium) price and quantity of supply shifts.
8 Taylor series
8.1
Introduction
Problems such as maximization of a function f or finding the root of an equation f (x) = 0 arise frequently in economics. However, often the function may be difficult
to work with, motivating the use of approximations to the function f that are easier to study and provide insight regarding f . One relatively simple class of approximating functions are the polynomials. A function h is a polynomial in x if it has the form: h(x) = a0 + a1 x + a2 x2 + · · · + a n x n , where each a i is a constant. So, h is the sum of terms where each term is a constant multiplied by x to the power of some integer, such as a3 x3 , where a3 is the constant and x is raised to the power of 3. The function h is called an nth order polynomial because the highest power is n (assuming a n 6= 0). For example, h(x) = 2 + x + 5x2 + 8x3 is a thirdorder polynomial, as is g(x) = 2x3 . Under suitable
circumstances, a large class of functions can be approximated by a polynomial, so that given the function f , one may find a polynomial such as h approximating f . This makes it possible to discover properties of f by study of the simpler function h and that fact is a primary motivation for studying these approximations. Taylor’s theorem concerns the approximation of functions with polynomials and provides the theoretical basis for the polynomial approximations. Section 8.1.1 motivates the approach by illustrating how a quadratic function may be chosen to approximate a continuous function. Following this, in Section 8.1.2 the main theorem is stated. This asserts that any continuous differentiable function may be approximated by a polynomial function – the Taylor series 269
CHAPTER 8. TAYLOR SERIES
270
expansion. Because Taylor series expansions involve approximation, there is a gap between the approximation and the actual function. However, the quality of the approximation improves as the length of the Taylor series expansion increases – the higher the order of the polynomial, the better the approximation. The improvement from lengthening the approximation is considered with an example in Section 8.1.3. Sections 8.2 to 8.4 provide a number of applications. Section 8.2 shows how Taylor series expansions may be used to address a variety of problems – such as relating concavity to the derivatives of a function, finding the roots of an equation, and generating numerical optimization routines to find maxima and minima of a function. In Section 8.3, expected utility is introduced. In this section, the theory is used to provide an approximate measure of utility loss from risk and a monetary measure of the cost incurred from bearing risk. Section 8.4 uses Taylor series to find the approximately optimal solution to a portfolio selection problem. Finally, Section 8.5 describes the multivariate version of Taylor’s theorem and Section 8.6 provides a proof of the theorem.
8.1.1 Taylor series approximations: an example To illustrate the way in which Taylor series may be used to approximate a function, consider the function: x
f (x) = ln(x + 1)e− 2 . Although simple to describe mathematically, the function has reasonably complex behavior as x varies. Starting from x = 0, the function increases initially and then declines, “flattening out” to 0 as x becomes large. Often, functions such as this are
difficult to study. For example, the function has a unique maximum, x∗ , defined by f ′ (x∗ ) = 0, but this expression is very difficult to solve. One way to address
this difficulty is to approximate f by another function, g, which is “close” to f but simpler to manipulate. Consider the function, g(x), which is given by the following formula: g(x) = 0.1524 + 0.44285x − 0.17489x2 . Observe that the function g “tracks” the function f (x) to some degree – see Figure 8.1. Both functions, f and g, have the same value at x = 1 and are close when x is close to 1. So, g is a good approximation to f when x is close to 1 and g has a simple form.
8.1. INTRODUCTION
271
0.6
0.4 x
f ( x) = ln( x + 1) e− 2 0.2
g( x) = 0.1524 + 0.44285 x − 0.17489 x2 x 1
2
x
Figure 8.1: Taylor series approximation of f ( x) = ln( x + 1) e− 2
In fact, the function g is a secondorder Taylor series expansion of f , defined according to the formula:
g( x) = f (1) + f ′ (1)( x − 1) +
1 ′′ f (1)( x − 1)2 . 2
(8.1)
The logic for this is discussed later. The following calculations confirm that this formula in Equation 8.1 yields the expression for g given above. Observe that: 1
1 1 e− 2 x − ln( x + 1) e− 2 x ( x + 1) 2 1 1 1 1 f ′ (1) = e− 2 − ln(2) e− 2 , 2 2
f ′ ( x) =
and 1
1
1 e− 2 x 1 e− 2 x + ln( x + 1) e− 2 x − 2 ( x + 1) 4 ( x + 1) 1 x+2 −1 x 1 =− e 2 + ln( x + 1) e− 2 x 4 ( x + 1)2 1 3 1 1 f ′′ (1) = − e− 2 + ln(2) e− 2 . 4 4
f ′′ ( x) = −
With these calculations in place, g may be obtained: 1 g( x) = f (1) + f ′ (1)( x − 1) + f ′′ (1)( x − 1)2 2 · µ ¶ ¸ · ¸ 1 1 1 1 −1 1 3 1 1 e 2− ln(2) e− 2 − e− 2 ( x − 1)2 ln(2) e− 2 ( x − 1) + = ln(2) e− 2 + 2 2 8 8
CHAPTER 8. TAYLOR SERIES
272
1 1 1 −1 e 2 [1 − ln(2)]( x − 1) − e− 2 [3 − ln(2)]( x − 1)2 2 8 ¤ 1 −1 £ = e 2 13 ln(2) − 6 ln(2) x + 10 x − 7 + ln(2) x2 − 3 x2 8 ¤ 1 1£ = e− 2 (13 ln(2) − 7) + (10 − 6 ln(2)) x − (3 − ln(2)) x2 8 1
= ln(2) e− 2 +
= 0.1524 + 0.44285 x − 0.17489 x2 , 1
using the fact that ln(2) = 0.69314 and e− 2 = 0.60653. The function g is called a secondorder Taylor series expansion of f around 1 since we used f (1), f ′ (1) and
f ′′ (1) to obtain g. The ability to approximate a function (such as f ), by a polynomial function, such as the quadratic function g, turns out to have many uses. One application is optimization. If the maximum of f is difficult to determine but g is close to f , then finding the maximum of g may hbe a good approximation to maximizing f . Note i 1 1 that f ′ ( x) may be written f ′ ( x) = ( x+1 1) − 12 ln( x + 1) e− 2 x . Since for any x, e− 2 x 6= 0, h i h i 1 1 1 1 ∗ ∗ f ′ ( x) = 0 implies ( x+ 1) − 2 ln( x + 1) = 0: x satisfies ( x∗ +1) − 2 ln( x + 1) = 0. The
solution for x∗ is with x∗ = 1.34575 and f ′ ( x∗ ) = 0 is satisfied. The function f has a unique maximum at x∗ .
The function g has a maximum at xˆ = 1.266. Assume that, since g approxi
mates f , the maximum of g might be close to the maximum of f , so if we now expand f around xˆ , we obtain a “new” function, gˆ ; this should be a better approximation of f close to the maximum:
gˆ ( x) = f (1.266) + f ′ (1.266)( x − 1.266) +
1 ′′ f (1.266)( x − 1.266)2 2
= 0.22901 + 0.307258 x − 0.114577 x2 .
Observe that gˆ is maximized at x′ = 1.341, which is very close to x∗ . These calcu
lations suggest a simple procedure for locating the maximum of a function using a simple algorithm, and illustrate one important use of Taylor series expansions.
8.1.2 Taylor series expansions

Taylor's theorem asserts that a continuous function with continuous derivatives may be approximated by a polynomial function. For notational convenience, write f^(k)(x) = d^k f/dx^k for the kth derivative of the function f. (As usual, the lower derivatives are often written f′(x) = df/dx, f′′(x) = d²f/dx² and f′′′(x) = d³f/dx³ instead of f^(1)(x), f^(2)(x) and f^(3)(x).)
8.1. INTRODUCTION
Theorem 8.1: Let f be a continuous function on an interval [a, b] with continuous derivatives f′, f′′, ..., f^(n). Then, given x, x₀ ∈ (a, b):

f(x) = f(x₀) + f′(x₀)(x − x₀) + (1/2!) f′′(x₀)(x − x₀)² + (1/3!) f′′′(x₀)(x − x₀)³ + ··· + (1/n!) f^(n)(p)(x − x₀)^n      (8.2)

for some p between x and x₀, and

f(x) ≈ f(x₀) + f′(x₀)(x − x₀) + (1/2!) f′′(x₀)(x − x₀)² + (1/3!) f′′′(x₀)(x − x₀)³ + ··· + (1/n!) f^(n)(x₀)(x − x₀)^n,      (8.3)

when x is close to x₀.
In Theorem 8.1, the expression on the right side is called a Taylor series expansion of f of order n around x₀. Note that in the first expansion, Equation 8.2, the last term is (1/n!) f^(n)(p)(x − x₀)^n – the point p is chosen to give equality of the expansion with the function (and depends on both x and x₀). The second expansion, Equation 8.3, has last term (1/n!) f^(n)(x₀)(x − x₀)^n, and the expansion is only approximately equal to f(x). The term n! is "n factorial", defined n! = n(n − 1)(n − 2)···3·2·1, so, for example, 3! = 3 × 2 × 1 = 6. By convention, 0! = 1. For example, a third-order expansion is:

f(x) = f(x₀) + f′(x₀)(x − x₀) + (1/2!) f′′(x₀)(x − x₀)² + (1/3!) f′′′(p)(x − x₀)³.
To illustrate the formula, consider the quadratic function f(x) = x². The first and second derivatives are f′(x) = 2x and f′′(x) = 2 for all x, so that for any x₀ and any number p, f′′(x₀) = 2 = f′′(p): the second derivative is independent of x. An exact first-order expansion has the form f(x) = f(x₀) + f′(p)(x − x₀) for some p, so that for the function f(x) = x²:

f(x) = f(x₀) + f′(p)·(x − x₀)
x² = x₀² + 2p·(x − x₀).

This equality requires that p = (x₀ + x)/2. To see this, rearrange the equation:

x² − x₀² = 2p·(x − x₀)
(x − x₀)(x + x₀) = 2p·(x − x₀)
x₀ + x = 2p
(x₀ + x)/2 = p.
Or, with approximation, f ( x) ≈ f ( x0 ) + f ′ ( x0 )( x − x0 ) = [ f ( x0 ) − f ′ ( x0 ) x0 ] + f ′ ( x0 ) · x, a linear equation with intercept [ f ( x0 ) − f ′ ( x0 ) x0 ] and slope f ′ ( x0 ).
Figure 8.2: First-order Taylor series expansion
One advantage of the approximate expansion is that there is no need to calculate p, since the expansion does not have to be exact. A second-order Taylor series expansion of x² gives:

f(x) = x₀² + 2x₀(x − x₀) + (1/2)·2·(x − x₀)² = x₀² + 2x₀x − 2x₀² + x² − 2xx₀ + x₀² = x².

The expansion is exact because f′′(x) = 2 for all values of x: the second derivative is a constant. Put differently, if f′′′(x) = 0, so the third derivative is 0, then the second derivative must be constant, and so a second-order expansion is exact if the third derivative is 0.
EXAMPLE 8.1: Consider the function f(x) = ln(x + 1)·x². Note that f′(x) = x²/(x + 1) + 2x ln(x + 1) and f′′(x) = −x²/(x + 1)² + 4x/(x + 1) + 2 ln(x + 1). At x = 2, f(x) = 4 ln(3), f′(x) = 4/3 + 4 ln(3), and f′′(x) = −4/9 + 8/3 + 2 ln(3) = 20/9 + 2 ln(3). This function has a first-order Taylor series expansion around 2 of f(x) ≈ 4 ln(3) + (4 ln(3) + 4/3)(x − 2), and a second-order expansion f(x) ≈ 4 ln(3) + (4 ln(3) + 4/3)(x − 2) + (ln(3) + 10/9)(x − 2)². Since ln(3) = 1.0986, 4 ln(3) = 4.3944, 4 ln(3) + 4/3 = 5.7277 and ln(3) + 10/9 = 1.0986 + 10/9 = 2.2097, we have f(x) ≈ 4.3944 + 5.7277(x − 2) + 2.2097(x − 2)².
[Figure: f(x) = ln(x + 1)x² and the quadratic approximation 4.3944 + 5.7277(x − 2) + 2.2097(x − 2)².]

From the figure, the approximation is tangent to f at x = 2 and provides a good approximation on the interval (1, 3). ♦
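The quality of the expansion in Example 8.1 is easy to check numerically. A small sketch comparing f with the quadratic (the exact coefficients 4 ln(3) + 4/3 and ln(3) + 10/9 are from the example; the sample points are arbitrary):

```python
import math

def f(x):
    return math.log(x + 1) * x ** 2

def taylor2(x):
    # Second-order expansion of f around 2, coefficients from Example 8.1.
    ln3 = math.log(3)
    return 4 * ln3 + (4 * ln3 + 4 / 3) * (x - 2) + (ln3 + 10 / 9) * (x - 2) ** 2

for x in (1.5, 2.0, 2.5):
    print(x, f(x), taylor2(x))   # the two columns agree closely near 2
```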
8.1.3 Approximations and accuracy

As the length of the Taylor series expansion increases, the accuracy of the approximation increases, but the complexity of the approximating polynomial also increases. So there is a trade-off between increasing accuracy and increasing complexity. To see this, consider f(x) = x³ and expansions around x₀ = 1. The first-order Taylor series expansion is TSE₁(x) = f(1) + f′(1)(x − 1) = 1 + 3(x − 1); the second-order Taylor series expansion is TSE₂(x) = f(1) + f′(1)(x − 1) + (1/2) f′′(1)(x − 1)² = 1 + 3(x − 1) + 3(x − 1)². The error associated with the first-order expansion is E₁(x) = |f(x) − TSE₁(x)|, and with the second-order expansion, E₂(x) = |f(x) − TSE₂(x)|. The following table compares the errors.
x      x³      TSE₁(x)   TSE₂(x)   E₁(x)   E₂(x)
0.4    0.064   −0.8      0.28      0.864   0.216
0.5    0.125   −0.5      0.25      0.625   0.125
0.6    0.216   −0.2      0.28      0.416   0.064
0.7    0.343   0.1       0.37      0.243   0.027
0.8    0.512   0.4       0.52      0.112   0.008
0.9    0.729   0.7       0.73      0.029   0.001
1      1       1         1         0       0
1.1    1.331   1.3       1.33      0.031   0.001
1.2    1.728   1.6       1.72      0.128   0.008
1.3    2.197   1.9       2.17      0.297   0.027
1.4    2.744   2.2       2.68      0.544   0.064
1.5    3.375   2.5       3.25      0.875   0.125
1.6    4.096   2.8       3.88      1.296   0.216
Figure 8.3: Taylor series approximations: f(x) = x³, TSE₁(x) = 1 + 3(x − 1) and TSE₂(x) = 1 + 3(x − 1) + 3(x − 1)².

In Figure 8.3, f(0) = 0, TSE₁(0) = −2 and TSE₂(0) = 1.
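The table above can be reproduced directly. A sketch computing the two expansions and their absolute errors (the grid of x values is from the table):

```python
# Error table for f(x) = x^3 expanded around x0 = 1.
def f(x):
    return x ** 3

def tse1(x):
    return 1 + 3 * (x - 1)                      # first-order expansion

def tse2(x):
    return 1 + 3 * (x - 1) + 3 * (x - 1) ** 2   # second-order expansion

rows = [(x, round(abs(f(x) - tse1(x)), 3), round(abs(f(x) - tse2(x)), 3))
        for x in [i / 10 for i in range(4, 17)]]
for row in rows:
    print(row)   # (x, E1(x), E2(x)) as in the table
```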
8.2 Applications of Taylor series expansions

The following sections consider some applications of Taylor series expansions. The first application is a proof that concavity of a function implies that the second derivative is non-positive. Following this, numerical optimization routines based on Taylor series are developed. Then some applications to decision-making under risk are described.
8.2.1 Concavity and the second derivative
The following theorem shows that concavity implies a non-positive second derivative; a similar argument may be used to show that convex functions have non-negative second derivatives.

Theorem 8.2: If f is concave and f′′(x) is continuous, then for any x, f′′(x) ≤ 0.

PROOF: Pick x and let Δx = x̃ − x, with x̃ close to x. Note that (1/2)(x + Δx) + (1/2)(x − Δx) = x, and f concave gives:

f(x) = f((1/2)(x + Δx) + (1/2)(x − Δx)) ≥ (1/2) f(x + Δx) + (1/2) f(x − Δx).

Therefore 2f(x) ≥ f(x + Δx) + f(x − Δx). Now, Taylor's theorem gives:

f(y) = f(y₀) + f′(y₀)(y − y₀) + (1/2!) f′′(ỹ)(y − y₀)²,

where ỹ is between y and y₀. First, take y = x + Δx and y₀ = x, so y − y₀ = Δx, and then

f(x + Δx) = f(x) + f′(x)Δx + (1/2) f′′(x*)(Δx)², x* between x and x + Δx.

Next, take y = x − Δx and y₀ = x, so y − y₀ = −Δx, and then

f(x − Δx) = f(x) − f′(x)Δx + (1/2) f′′(x̂)(Δx)², x̂ between x and x − Δx.

Adding these together:

f(x + Δx) + f(x − Δx) = 2f(x) + (1/2)[f′′(x*) + f′′(x̂)](Δx)².

Recall from concavity that 2f(x) ≥ f(x + Δx) + f(x − Δx), so

0 ≥ f(x + Δx) + f(x − Δx) − 2f(x) = (1/2)[f′′(x*) + f′′(x̂)](Δx)².

Since (Δx)² > 0, this implies that 0 ≥ (1/2)[f′′(x*) + f′′(x̂)]. Since f′′ is continuous, as Δx → 0, x*, x̂ → x, and so f′′(x*), f′′(x̂) → f′′(x). Therefore, 0 ≥ f′′(x).
8.2.2 Roots of a function

Given a function f, a common problem is to find a root of f: a value x̄ such that f(x̄) = 0. Finding a root involves solving the equation f(x) = 0, and in practice this can be difficult. However, using Taylor series there is a simple procedure that may be used to search for the roots of a function. Consider a first-order Taylor series expansion of f around some x₀: f(x) ≈ f(x₀) + f′(x₀)(x − x₀). Given x₀, suppose x is chosen to set f(x₀) + f′(x₀)(x − x₀) = 0, with solution x₁. Then f(x₁) ≈ f(x₀) + f′(x₀)(x₁ − x₀) = 0, so that x₁ is "approximately" a root of f. If this process is repeated with x₁ replacing x₀, a revised estimate, x₂, for a root is obtained: f(x₂) ≈ f(x₁) + f′(x₁)(x₂ − x₁) = 0. And so on. This process is easy to implement in practice. Assuming f′(x₀) ≠ 0, the expression can be rearranged to give an algorithm:

x₁ − x₀ = −f(x₀)/f′(x₀)
x₁ = x₀ − f(x₀)/f′(x₀)
x₂ = x₁ − f(x₁)/f′(x₁)
⋮
x_{n+1} = x_n − f(x_n)/f′(x_n).
To illustrate, consider the function f(x) = −1/2 + 4x − 4x² + x³ (Figure 8.4). Applying this iterative process:

x_i      f(x_i)    f′(x_i)   x_i − f(x_i)/f′(x_i)
1        0.5       −1        1.5
1.5      −0.125    −1.25     1.4
1.4      0.004     −1.32     1.403
1.403    0.000     −1.318    1.403
Figure 8.4: Roots of a function: f(x) = −1/2 + 4x − 4x² + x³
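The root-finding iteration above can be written as a short program. A sketch for the cubic of Figure 8.4, starting at x₀ = 1 as in the table (the iteration count is an arbitrary choice):

```python
# Newton's method for a root of f(x) = -1/2 + 4x - 4x^2 + x^3 (Figure 8.4).
def f(x):
    return -0.5 + 4 * x - 4 * x ** 2 + x ** 3

def fp(x):  # f'(x)
    return 4 - 8 * x + 3 * x ** 2

x = 1.0                      # starting point, as in the table
for _ in range(10):
    x = x - f(x) / fp(x)
# x is now the root near 1.403, agreeing with the table above.
```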
8.2.3 Numerical optimization

Suppose we wish to maximize some differentiable function f(x). A maximum has the property that f′(x) = 0. How do we solve f′(x) = 0? One approach uses Taylor series expansions:

f(x) ≈ f(x₀) + f′(x₀)(x − x₀) + (1/2) f′′(x₀)(x − x₀)².

When x₀ is "close" to x, a value of x that maximizes the right-hand side (RHS) will also (approximately) maximize the left-hand side (LHS). Differentiate with respect to x and set the result to 0. Since the RHS is a quadratic function of x, the derivative is linear in x. This yields

f′(x₀) + f′′(x₀)(x − x₀) = 0.

Solving for x (call the solution x₁):

x₁ = x₀ − f′(x₀)/f′′(x₀),

assuming that f′′(x₀) ≠ 0. Presumably, x₁ will be closer to a maximum of f(x) than x₀. We can continue the iteration:

x₂ = x₁ − f′(x₁)/f′′(x₁),

assuming f′′(x₁) ≠ 0, and, having defined x₁, x₂, ..., x_{n−1}:

x_n = x_{n−1} − f′(x_{n−1})/f′′(x_{n−1}),
assuming f′′(x_{n−1}) ≠ 0. If {x_n} converges, it will solve the problem f′(x) = 0.

EXAMPLE 8.2: Consider f(x) = sin x, so that f′(x) = cos x and f′′(x) = −sin x. The sine function, sin(x), is equal to 0 at x = 0, rising to a value of 1 at x = π/2 and falling back to 0 at x = π. Thus, on the interval [0, π], sin x has a maximum at π/2. Starting the algorithm at x₀ = 1/2 begins an iteration to estimate π/2. Using the formula, the iteration has the form:

x_n = x_{n−1} + cos(x_{n−1})/sin(x_{n−1}).

Beginning with x₀ = 1/2:

x₁ = 1/2 + cos(1/2)/sin(1/2) = 1/2 + 0.878/0.479 = 2.330
x₂ = 2.330 + cos(2.330)/sin(2.330) = 2.330 + (−0.688)/0.725 = 1.381
x₃ = 1.381 + cos(1.381)/sin(1.381) = 1.381 + 0.189/0.982 = 1.573
x₄ = 1.573 + cos(1.573)/sin(1.573) = 1.573 + (−2.20 × 10⁻³)/0.999 = 1.5707963.

[Figure: sin(x), with the successive iterates marked on the interval (0, π).]

Since π/2 = 1.5707963, even at the fourth iteration we are getting an accurate estimate of π/2. Starting at 1/2, observe that the algorithm moves rightward in the direction of increase of the function given by the derivative (to 2.330), overshooting. From this point, again moving in the direction of increase of the function, now leftward, the algorithm moves to 1.381. The next move is rightward, since the direction of increase is rightward – to 1.573. ♦
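The iteration of Example 8.2 can be reproduced as follows (a minimal sketch; the iteration count is an arbitrary choice):

```python
import math

# The iteration x_n = x_{n-1} + cos(x_{n-1})/sin(x_{n-1}) of Example 8.2.
x = 0.5
path = [x]
for _ in range(8):
    x = x + math.cos(x) / math.sin(x)
    path.append(x)
# path begins 0.5, 2.330..., 1.380..., 1.573... and settles at pi/2.
```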
EXAMPLE 8.3: Consider the task of maximizing profit, where the profit function is:

π(x) = 10 ln(x + 1) − 0.1 e^(0.1x) − x².

First, observe that the derivatives are π′(x) = 10/(x + 1) − (0.1)² e^(0.1x) − 2x and π′′(x) = −10/(x + 1)² − (0.1)³ e^(0.1x) − 2. Thus π′′ < 0 and the profit function is concave. Therefore, if x* satisfies π′(x*) = 0, then x* maximizes π. Using a second-order Taylor series expansion:

π(x) ≈ π(x₀) + π′(x₀)(x − x₀) + (1/2) π′′(x₀)(x − x₀)²,

so that

0 = π′(x) ≈ π′(x₀) + π′′(x₀)(x − x₀)  ⇒  x ≈ x₀ − π′(x₀)/π′′(x₀).

There is no obvious point at which to start the iteration, so choose x₀ = 0. The algorithm proceeds according to:

x_{i+1} = x_i − π′(x_i)/π′′(x_i) = x_i − [10/(x_i + 1) − (0.1)² e^(0.1x_i) − 2x_i] / [−10/(x_i + 1)² − (0.1)³ e^(0.1x_i) − 2].

The following table gives the sequence of iterations. From it, it is clear that if x_{i+1} = x_i and π′′(x_i) ≠ 0, then π′(x_i) = 0, and this implies that x_i maximizes π:

x_i        π′(x_i)        π′′(x_i)    x_i − π′(x_i)/π′′(x_i)
0          9.99           −12.0010    0.832431
0.832431   3.781503       −4.97923    1.591887
1.591887   0.662694       −3.48974    1.781785
1.781785   0.019295       −3.29346    1.787643
1.787643   1.59 × 10⁻⁵    −3.28804    1.787648
1.787648   1.08 × 10⁻¹¹   −3.28803    1.787648
1.787648   2.25 × 10⁻¹⁶   −3.28803    1.787648

The profit-maximizing choice of x is x* = 1.787648. ♦
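A sketch reproducing the iteration of Example 8.3 (starting at x₀ = 0, as in the text; the iteration count is an arbitrary choice):

```python
import math

# Newton iteration for Example 8.3: pi(x) = 10 ln(x+1) - 0.1 e^(0.1x) - x^2.
def dpi(x):   # pi'(x)
    return 10 / (x + 1) - 0.01 * math.exp(0.1 * x) - 2 * x

def d2pi(x):  # pi''(x)
    return -10 / (x + 1) ** 2 - 0.001 * math.exp(0.1 * x) - 2

x = 0.0
for _ in range(10):
    x = x - dpi(x) / d2pi(x)
# x converges to approximately 1.787648, the profit-maximizing choice.
```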
EXAMPLE 8.4: A production function has isoquants given by xy = k: if inputs are used at the levels x* and y*, the output level is x*y*. A technological constraint requires that the inputs x and y satisfy the relation y = 1 − e^(−αx). Suppose that α = 2, and the firm decides to produce an output of 5 units, so k = 5. In this case, the values of x* and y* are determined by the two equations xy = 5 and y = 1 − e^(−2x). The following calculations show how the values may be obtained using a Taylor series expansion.

The first equation can be written x = 5/y, and when this is substituted into the second equation we get y = 1 − e^(−10/y), or y − 1 + e^(−10/y) = 0. Let f(y) = y − 1 + e^(−10/y), so we wish to find y* such that f(y*) = 0. Now, f(y*) ≈ f(y₀) + f′(y₀)(y* − y₀) when y* is "close" to y₀, and since f(y*) = 0, f(y₀) + f′(y₀)(y* − y₀) ≈ 0, so that y* ≈ y₀ − f(y₀)/f′(y₀). This provides the basis for the iteration y_{i+1} = y_i − f(y_i)/f′(y_i). Since f(y) = y − 1 + e^(−10/y), f′(y) = 1 + (10/y²) e^(−10/y):

y_{i+1} = y_i − [y_i − 1 + e^(−10/y_i)] / [1 + (10/y_i²) e^(−10/y_i)].

Use a starting value for the iteration of y₀ = 2:

y_i        f(y_i)          f′(y_i)     y_i − f(y_i)/f′(y_i)
2          1.006738        1.016845    1.009939
1.009939   0.00999         1.000496    0.999955
0.999955   2.37 × 10⁻⁷     1.000454    0.999955
0.999955   −4.7 × 10⁻¹⁵    1.000454    0.999955
0.999955   −5.1 × 10⁻¹⁷    1.000454    0.999955
0.999955   −5.1 × 10⁻¹⁷    1.000454    0.999955

Thus, y* = 0.999955 and x = 5/0.999955 is the required solution. ♦
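A sketch reproducing Example 8.4's iteration (starting at y₀ = 2, as in the text):

```python
import math

# Newton iteration for Example 8.4: find y with f(y) = y - 1 + exp(-10/y) = 0.
def f(y):
    return y - 1 + math.exp(-10 / y)

def fp(y):  # f'(y)
    return 1 + (10 / y ** 2) * math.exp(-10 / y)

y = 2.0
for _ in range(10):
    y = y - f(y) / fp(y)
x = 5 / y   # the other input level, from x*y = 5
```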
The numerical optimization procedure generalizes directly to multivariate optimization. The second-order Taylor series approximation in the multivariate case is:

f(x) ≈ f(x₀) + ∇f(x₀)·(x − x₀) + (1/2!)(x − x₀)′ H(x₀)(x − x₀),

where ∇f(x₀) is the vector of partial derivatives of f evaluated at x₀, and where H(x₀) is the Hessian matrix of second-order partial derivatives, also evaluated at x₀. Maximizing the right side by choice of the vector x gives:

∇f(x₀) + H(x₀)(x − x₀) = 0.

Assuming the Hessian matrix is nonsingular, the solution x₁ satisfies

x₁ − x₀ = −H(x₀)⁻¹ ∇f(x₀),

and the resulting iteration is:

x_{i+1} − x_i = −H(x_i)⁻¹ ∇f(x_i).
EXAMPLE 8.5: Consider the objective function:

π(x, y) = 2x^(1/2) + 4y^(1/2) + xy − x² − y².

Using a second-order expansion, it is straightforward to develop an algorithm to locate the values of x and y that maximize π. First, observe that a second-order Taylor series expansion of π(x, y) around (x₀, y₀) is:

π(x, y) ≈ π(x₀, y₀) + π_x(x₀, y₀)(x − x₀) + π_y(x₀, y₀)(y − y₀)
  + (1/2){π_xx(x₀, y₀)(x − x₀)² + 2π_xy(x₀, y₀)(x − x₀)(y − y₀) + π_yy(x₀, y₀)(y − y₀)²}.

Maximize the right side by choice of x and y. Differentiating with respect to x and y, respectively, and setting the derivatives to 0 gives the solution equations. For x:

0 = π_x(x₀, y₀) + (1/2){2π_xx(x₀, y₀)(x − x₀) + 2π_xy(x₀, y₀)(y − y₀)}
  = π_x(x₀, y₀) + π_xx(x₀, y₀)(x − x₀) + π_xy(x₀, y₀)(y − y₀).

Similarly, for y:

0 = π_y(x₀, y₀) + π_xy(x₀, y₀)(x − x₀) + π_yy(x₀, y₀)(y − y₀).

In matrix terms:

(0)   (π_x(x₀, y₀))   (π_xx(x₀, y₀)  π_xy(x₀, y₀)) (x − x₀)
(0) = (π_y(x₀, y₀)) + (π_xy(x₀, y₀)  π_yy(x₀, y₀)) (y − y₀).

Rearranging:

(x − x₀)     (π_xx(x₀, y₀)  π_xy(x₀, y₀))⁻¹ (π_x(x₀, y₀))
(y − y₀) = − (π_xy(x₀, y₀)  π_yy(x₀, y₀))   (π_y(x₀, y₀)).

The solution has the form (with z = (x, y)):

z = z₀ − H_π(x₀, y₀)⁻¹ ∇π(x₀, y₀),

where H_π is the matrix of second-order partial derivatives and ∇π = (π_x, π_y). For the specific functional form:

π(x, y) = 2x^(1/2) + 4y^(1/2) + xy − x² − y²
π_x(x, y) = x^(−1/2) + y − 2x
π_y(x, y) = 2y^(−1/2) + x − 2y
π_xx(x, y) = −(1/2)x^(−3/2) − 2
π_yy(x, y) = −y^(−3/2) − 2
π_xy(x, y) = 1.

Thus,

H(x, y) = (−(1/2)x^(−3/2) − 2          1        )
          (        1           −y^(−3/2) − 2),

and so the algorithm, beginning at an initial value (x₀, y₀), is:

(x_{i+1})   (x_i)   (−(1/2)x_i^(−3/2) − 2          1        )⁻¹ (x_i^(−1/2) + y_i − 2x_i)
(y_{i+1}) = (y_i) − (        1           −y_i^(−3/2) − 2)      (2y_i^(−1/2) + x_i − 2y_i).

[Figure: a plot of the function π(x, y).] ♦
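The algorithm of Example 8.5 can be sketched with the 2×2 inverse written out by hand (the starting point (1, 1) and the iteration count are arbitrary choices, not from the text):

```python
# Newton iteration for Example 8.5:
# pi(x, y) = 2*sqrt(x) + 4*sqrt(y) + x*y - x^2 - y^2.
def grad(x, y):
    return (x ** -0.5 + y - 2 * x, 2 * y ** -0.5 + x - 2 * y)

def hess(x, y):
    # Returns (pi_xx, pi_xy, pi_yy); the Hessian is [[a, b], [b, d]].
    return (-0.5 * x ** -1.5 - 2, 1.0, -y ** -1.5 - 2)

x, y = 1.0, 1.0            # assumed starting point
for _ in range(20):
    gx, gy = grad(x, y)
    a, b, d = hess(x, y)
    det = a * d - b * b
    # (x, y) <- (x, y) - H^{-1} grad, with H^{-1} = (1/det)[[d, -b], [-b, a]].
    x -= (d * gx - b * gy) / det
    y -= (a * gy - b * gx) / det
# At convergence the gradient is (approximately) zero.
```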
8.3 Expected utility theory and behavior towards risk

The presence of risk (variability) lowers the value of an asset: other things equal, comparing two assets, the one with greater risk will have a lower value. In this section, Taylor series expansions are used to determine approximate welfare and monetary measures of the negative impact of risk. Before providing these measures, some introduction to utility and risk is necessary.
8.3.1 Expected utility theory
The usual consumer theory model has u(x₁, x₂, ..., x_n) = utility from consumption of the (x₁, ..., x_n) bundle of goods. In this model, the individual receives and consumes x₁ of good 1, x₂ of good 2, and so on. Suppose that one receives some random return instead of a sure consumption bundle. Then this formulation is inappropriate: when a person receives a random return X, we cannot attach a utility to the return in the same way. Let X be a random return:

X     Probability
x₁    p₁
x₂    p₂
⋮     ⋮
x_n   p_n

where p_i is the probability that X takes on the value x_i. With this random return, the individual receives income x_i with probability p_i. When you hold a lottery ticket that pays a million dollars if you win (and you win with probability p, say), the situation is described:

X           Probability
0           (1 − p)
1 million   p

Note that you either get 0 or 1 million – you do not get both! Above, the person receives some return (some x_i); only one x_i is actually received. Note also that we often have to make a decision about whether to purchase a random return or not, prior to knowing the outcome. We cannot choose to buy the lottery ticket after we know whether we have won or not. At the time of purchase, we face the possibility of getting either 0 or one million dollars – and have to make a decision. Similarly, when one buys a stock today, the value of the stock tomorrow is unknown.

In situations such as these, we need a way to compare the utility associated with different uncertain outcomes, so that we can choose the most preferred one. Expected utility theory is such a way. According to expected utility theory, an individual has a utility function, u, defined on income. To calculate the utility associated with the random income or return stream X given previously, calculate the utility of income in each state i, u(x_i), multiply by the probability p_i to get p_i u(x_i), and sum these terms: Σ p_i u(x_i). This is called expected utility and denoted E{u(X)}. Here, u(x_i) is utility from x_i. (Some outcome x_i will occur – and this is what the individual actually receives.) Now we can compare two random returns X and Y in expected utility terms:

X     Probability     Y     Probability
x₁    p₁              y₁    q₁
x₂    p₂              y₂    q₂
⋮     ⋮               ⋮     ⋮
x_n   p_n             y_m   q_m

Then the individual prefers the random return Y to X if:

Σ_{j=1}^{m} q_j u(y_j) > Σ_{i=1}^{n} p_i u(x_i).
EXAMPLE 8.6: Let X and Y be two random variables with the following distributions:

X     Probability     Y     Probability
100   1/3             200   1/4
196   1/3             220   1/4
400   1/3             244   1/4
                      264   1/4

Suppose that the utility function is the square root function: u(·) = √·. Then

E{u(X)} = Σ p_i u(x_i) = (1/3)[10 + 14 + 20] = 44/3 = 14.67
E{u(Y)} = Σ q_i u(y_i) = (1/4)[14.14 + 14.83 + 15.62 + 16.25] = 60.84/4 = 15.21.

Thus, Y is preferred to X. Why should this be? Some insight is given by the mean and variance:

E(X) = Σ p_i x_i = 696/3 = 232
E(Y) = Σ q_i y_i = 928/4 = 232
Var(X) = (1/3)[(100 − 232)² + (196 − 232)² + (400 − 232)²] = 46944/3 = 15648
Var(Y) = (1/4)[(200 − 232)² + (220 − 232)² + (244 − 232)² + (264 − 232)²] = 2336/4 = 584.

Thus, while both have the same mean return, Y has a smaller variance – it is "less risky". ♦
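The calculations in Example 8.6 can be reproduced as follows (the distributions and the square-root utility are from the example):

```python
import math

# Example 8.6: expected utility, mean and variance with u = square root.
X = [(100, 1 / 3), (196, 1 / 3), (400, 1 / 3)]
Y = [(200, 1 / 4), (220, 1 / 4), (244, 1 / 4), (264, 1 / 4)]

def expected_utility(dist):
    return sum(p * math.sqrt(v) for v, p in dist)

def mean(dist):
    return sum(p * v for v, p in dist)

def variance(dist):
    m = mean(dist)
    return sum(p * (v - m) ** 2 for v, p in dist)
```

Both returns have mean 232, but Y has a higher expected utility because its variance is much smaller.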
8.3.2 Risk-averse preferences and welfare loss

The reason that Y is preferred to X in Example 8.6 is related to the shape of u. In general, u(·) is assumed to be concave: this is said to represent risk-averse preferences. An individual is said to be risk averse if u is concave. It is easy to check that √· is a concave function. Loosely speaking, risk-averse individuals prefer low variance for a given mean return. To understand the role of concavity, consider the case where the random variable X has only two possible values, x₁ and x₂, as depicted in Figure 8.5:

X     Probability
x₁    p
x₂    1 − p

The individual prefers the sure amount μ = px₁ + (1 − p)x₂ to the uncertain return which gives x₁ with probability p and x₂ with probability (1 − p). Thus concavity means that the individual prefers the mean return μ, for sure, rather than the uncertain return X (with mean μ). One can check that the opposite is the case when the utility function is convex. This can be taken further to clarify the role of the variance.
Figure 8.5: Risk aversion

Suppose x₁, x₂ are close to μ, so that we can use a Taylor series approximation:

u(x₁) = u(μ) + u′(μ)(x₁ − μ) + (1/2)u′′(μ)(x₁ − μ)² + (1/3!)u′′′(μ₁)(x₁ − μ)³
u(x₂) = u(μ) + u′(μ)(x₂ − μ) + (1/2)u′′(μ)(x₂ − μ)² + (1/3!)u′′′(μ₂)(x₂ − μ)³.

We can use these equations to calculate the expected utility, E(u) = Σ p_i u(x_i), as follows:

E(u) = p₁u(x₁) + p₂u(x₂)
= [p₁u(μ) + p₂u(μ)] + [p₁u′(μ)(x₁ − μ) + p₂u′(μ)(x₂ − μ)]
+ (1/2)[p₁u′′(μ)(x₁ − μ)² + p₂u′′(μ)(x₂ − μ)²]
+ (1/3!)[p₁u′′′(μ₁)(x₁ − μ)³ + p₂u′′′(μ₂)(x₂ − μ)³].

If (x_i − μ) is "small", then the last term is relatively small and can be ignored. Then,

E{u(X)} ≈ [p₁u(μ) + p₂u(μ)] + [p₁u′(μ)(x₁ − μ) + p₂u′(μ)(x₂ − μ)] + (1/2)[p₁u′′(μ)(x₁ − μ)² + p₂u′′(μ)(x₂ − μ)²].
Adding terms:

E{u(X)} ≈ [p₁ + p₂]u(μ) + [p₁(x₁ − μ) + p₂(x₂ − μ)]u′(μ) + (1/2)[p₁(x₁ − μ)² + p₂(x₂ − μ)²]u′′(μ).

Note that [p₁ + p₂] = 1, and [p₁(x₁ − μ) + p₂(x₂ − μ)] = 0 from the definition of the mean; while [p₁(x₁ − μ)² + p₂(x₂ − μ)²] = σ², from the definition of the variance. Thus, we have the expression:

E{u(X)} ≈ u(μ) + (1/2)u′′(μ)σ².      (8.4)

If we write the last expression as E{u(X)} − u(μ) ≈ (1/2)u′′(μ)σ², we see that when u is concave, u′′ is negative – so that expected utility is less than u(μ). (See Figure 8.6.) Note also that the difference depends on the variance of the random return. Thus, the welfare loss due to small risk is:

(1/2)u′′(μ)σ².

So, (1/2)u′′(μ)σ² is the loss in utility, or the disutility, associated with the risk. Figure 8.6 illustrates the idea. The disutility associated with risk is measured in utility terms, but we can also give a monetary measure – discussed next.
Figure 8.6: Utility loss associated with risk
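Approximation (8.4) is easy to check numerically. A sketch using u = ln and a 50-50 gamble of 95 or 105, so μ = 100 and σ² = 25 (this is the same distribution used in Example 8.7 below):

```python
import math

# Check E{u(X)} ~ u(mu) + (1/2) u''(mu) sigma^2 for u = ln,
# X = 95 or 105 with probability 1/2 each (mu = 100, sigma^2 = 25).
exact = 0.5 * math.log(95) + 0.5 * math.log(105)
approx = math.log(100) + 0.5 * (-1 / 100 ** 2) * 25
gap = abs(exact - approx)   # the two agree to roughly 1e-6 here
```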
8.3.3 Risk aversion and the cost of small risk

Let income be represented by a random variable Y, where Y takes on the value y_i with probability p_i (i.e., Prob(Y = y_i) = p_i). Thus μ_y = Σ p_i y_i and σ_y² = Σ p_i(y_i − μ_y)². Instead of asking how risk affects the level of utility, one can focus on the monetary cost of bearing risk. How much would a person pay to avoid facing the risk and receive a sure income μ_y instead? If the individual receives μ_y and makes a payment of c, the amount received is y* = μ_y − c, and c is chosen so that the individual is indifferent between the sure income y* and facing the random return Y. The amount c is given as the solution of the equation u(μ_y − c) = Σ p_i u(y_i). In general, solving this explicitly for c is very difficult. However, we can use Taylor series expansions to find an approximate solution. This is done now, expanding each side separately.

First, consider an expansion of u(μ_y − c) around μ_y:

u(μ_y − c) = u(μ_y) − u′(μ*)c, for some μ* ∈ [μ_y − c, μ_y]
= u(μ_y) − u′(μ_y)c + [u′(μ_y) − u′(μ*)]c
≈ u(μ_y) − u′(μ_y)c, when c is small, so that [u′(μ_y) − u′(μ*)] is small.

By taking an expansion to the first order for u(μ_y − c), the approximation obtained is linear in c. (A better approximation is given by u(μ_y − c) ≈ u(μ_y) − u′(μ_y)c + (1/2)u′′(μ_y)c², but that is quadratic in c. Here, the linear approximation gives simplicity at the cost of less accuracy.)

Next, consider the right side of the expression u(μ_y − c) = Σ p_i u(y_i). An expansion of u(y_i) gives:

u(y_i) = u(μ_y) + u′(μ_y)(y_i − μ_y) + (1/2)u′′(μ_y)(y_i − μ_y)² + (1/2)[u′′(ξ) − u′′(μ_y)](y_i − μ_y)²,

where ξ is between μ_y and y_i. If y_i is close to μ_y, the last term is relatively small, and so:

u(y_i) ≈ u(μ_y) + u′(μ_y)(y_i − μ_y) + (1/2)u′′(μ_y)(y_i − μ_y)².

Multiplying by p_i and summing over i gives:

Σ p_i u(y_i) ≈ u(μ_y) + u′(μ_y) Σ p_i(y_i − μ_y) + (1/2)u′′(μ_y) Σ p_i(y_i − μ_y)²,

or, since Σ p_i(y_i − μ_y) = Σ p_i y_i − Σ p_i μ_y = μ_y − μ_y = 0,

Σ p_i u(y_i) ≈ u(μ_y) + (1/2)u′′(μ_y)σ_y².
Figure 8.7: The cost of risk
Combining these two approximations:

u(μ_y) − u′(μ_y)c ≈ u(μ_y − c) = Σ p_i u(y_i) ≈ u(μ_y) + (1/2)u′′(μ_y)σ_y²,

and so

u(μ_y) − u′(μ_y)c ≈ u(μ_y) + (1/2)u′′(μ_y)σ_y².

Solving for c (approximately) gives:

c ≈ (1/2) [−u′′(μ_y)/u′(μ_y)] σ_y².

Thus the "cost" of the risk is proportional to σ_y². The proportionality factor −u′′(μ_y)/u′(μ_y) is called the "degree of absolute risk aversion". Figure 8.7 indicates how c is determined.
EXAMPLE 8.7: Let u(x) = ln(x), so that u′(x) = 1/x and u′′(x) = −1/x². Suppose the random return is:

X     Probability
95    1/2
105   1/2

Therefore u′′(x)/u′(x) = −1/x and σ_x² = (1/2)[5² + 5²] = 25. From these, using the formula c ≈ (1/2)[1/μ]σ_x² with μ = 100, we calculate c = (1/2)[1/100]25 = 1/8. Similarly, the utility loss associated with the risk is |(1/2)u′′(μ)σ_x²| = (1/2)[1/100²]25 = 25/20,000. ♦
EXAMPLE 8.8: This example illustrates the computation of the cost of risk and the loss of utility using the approximations based on Taylor series expansions. Exact calculations (rather than approximations) are also given, to provide comparison.

Suppose that an individual has the utility function u(x) = x^(1/3) and that the individual faces a 50-50 gamble of $45 or $55 (i.e., with probability 1/2 the individual receives $45 and with probability 1/2 the individual receives $55), so that μ = 50 and σ² = (1/2)(45 − 50)² + (1/2)(55 − 50)² = 25. Since E{u(X)} ≈ u(50) + (1/2)u′′(50)σ², and u′′(x) = −(2/9)x^(−5/3),

u(50) − E{u(X)} ≈ −(1/2)u′′(50)σ² = (1/9)·50^(−5/3) × 25 = (0.00016375)25 = 0.004094.

Thus, the approximate loss of utility from facing the risk, relative to receiving the sure amount of 50, is 0.004094. Similarly, the approximate cost associated with the risk is given by c ≈ −(1/2)(u′′(50)/u′(50))σ². To compute this, observe that u(x) = x^(1/3), so that u′(x) = (1/3)x^(−2/3) and u′′(x) = −(2/9)x^(−5/3). Therefore,

−u′′(x)/u′(x) = [(2/9)x^(−5/3)] / [(1/3)x^(−2/3)] = (2/3)x⁻¹ = 2/(3x).

Since σ² = 25,

c ≈ −(1/2)(u′′(50)/u′(50))σ² = (1/2) × (2/(3 × 50)) × 25 = 1/6 = 0.16667.

Now, observe that in this example the exact value of expected utility is:

E{u(X)} = (1/2)u(45) + (1/2)u(55) = (1/2)45^(1/3) + (1/2)55^(1/3) = 3.679923.

Recall that c is chosen to satisfy u(50 − c) = E{u(X)}, so that c must satisfy (50 − c)^(1/3) = 3.679923, or 50 − c = 3.679923³ = 49.8329, which gives c = 50 − 49.8329 = 0.167101. Summarizing, the exact cost of the risk is 0.167101, and the approximation of this using the Taylor series expansion is 0.16667. ♦
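The exact and approximate costs of risk in Example 8.8 can be computed directly:

```python
# Example 8.8: exact vs. approximate cost of risk, u(x) = x^(1/3),
# 50-50 gamble of 45 or 55 (mu = 50, sigma^2 = 25).
u = lambda x: x ** (1 / 3)

expected_u = 0.5 * u(45) + 0.5 * u(55)
c_exact = 50 - expected_u ** 3            # solves u(50 - c) = E{u(X)}
c_approx = 0.5 * (2 / (3 * 50)) * 25      # (1/2)(-u''(50)/u'(50)) sigma^2
```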
8.4 Diversification and portfolio selection

Diversification of risk involves combining different random returns that are independent or have low correlation. In that way, the probability of all returns being low (or high) at the same time is small, so that extreme variation in return is less likely. This is briefly discussed and related to the cost of risk. Then approximately optimal portfolio selection using Taylor series expansions is considered.

8.4.1 Diversification

This section illustrates how portfolio diversification reduces risk. Suppose that we have n projects, and assume that the return on each project is independent of the returns on the other projects. Project j has return Y(j), a random variable with n_j possible values, (Y₁(j), Y₂(j), ..., Y_{n_j}(j)), and associated probabilities (p₁(j), p₂(j), ..., p_{n_j}(j)). The probability that Y(j) takes the value Y_i(j) is p_i(j):

μ(j) = Σ_{i=1}^{n_j} p_i(j) Y_i(j) = E(Y(j)),
σ²(j) = Σ_{i=1}^{n_j} p_i(j) [Y_i(j) − μ(j)]² = Var{Y(j)}.
Let μ(j) = μ and σ²(j) = σ² for all j – that is, we assume that each of the random variables has the same mean and variance. If an individual undertakes project j alone, the cost of risk is:

c = (1/2) [−u′′(μ)/u′(μ)] σ².

Suppose instead the individual forms a portfolio containing 1/n of each project. Denote this portfolio Z = (1/n) Σ_{j=1}^{n} Y(j), a random variable. The mean and variance are:

E(Z) = (1/n) Σ_{j=1}^{n} μ(j) = (1/n) nμ = μ

and Var(Z) = (1/n)σ². To see the latter:

Var(Z) = Var((1/n) Σ_{j=1}^{n} Y(j)) = (1/n²) Σ_{j=1}^{n} Var(Y(j)),

assuming the Y(j) are independent. This is then equal to

(1/n²) Σ_{j=1}^{n} σ² = (1/n²)[nσ²] = (1/n)σ².

Using our earlier results, we see that Z is a portfolio that has cost of risk to the individual given by:

c = (1/2) [−u′′(μ)/u′(μ)] (σ²/n).

We observe that as the number (n) of projects increases, the cost of risk goes to 0. You can easily see what is happening here – the portfolio Z has the same mean return as each of the Y(j) assets, but the variance of Z is 1/n times the variance of each of the Y(j).
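The variance reduction can also be illustrated by simulation. A sketch (the two-point return distribution 95 or 105, the portfolio size n = 25 and the number of draws are assumed example values, not from the text):

```python
import random

# Diversification: a portfolio averaging n independent returns has
# variance sigma^2/n. Assumed example: each project returns 95 or 105
# with probability 1/2 (mean 100, variance 25); with n = 25 projects
# the portfolio variance should be close to 25/25 = 1.
random.seed(0)
n, draws = 25, 20000

def portfolio():
    return sum(random.choice((95, 105)) for _ in range(n)) / n

sample = [portfolio() for _ in range(draws)]
m = sum(sample) / draws
var = sum((z - m) ** 2 for z in sample) / draws
```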
8.4.2 Portfolio selection

This section studies the optimal division of a portfolio into a risky and a safe asset. The purchase of the risky asset yields an end-of-year gross return Z per unit of risky asset purchased; let p_i = Prob(Z = z_i). The asset price (at the beginning of the year) is P, and the individual's wealth at the beginning of the year is W. Thus, if the individual purchases α units of the asset and if at the end of the year the risky asset return turns out to be z_i, then the individual's end-of-year wealth is W − αP + αz_i = W + α(z_i − P). Therefore, the (end-of-year) expected utility is given by E{u(W + α(Z − P))} = Σ p_i u(W + α(z_i − P)).

The individual's problem is to choose α to solve max_α Σ p_i u(W + α(z_i − P)). Differentiating with respect to α gives

Σ p_i u′(W + α(z_i − P))(z_i − P) = 0.

Solving this gives the optimal solution for α. However, this may be difficult to solve, so that finding an approximate solution via a Taylor series approximation is useful. If z_i − P is small,

u′(W + α(z_i − P)) ≈ u′(W) + u′′(W)α(z_i − P).
Note that by using a firstorder Taylor series expansion of u′ (W + α( z i − P )), the
expression above is linear in α. So X
p i {u′ (W ) + u′′ (W )α( z i − P )}( z i − P ) ≃ 0
or X
p i u′ (W )( z i − P ) +
X
p i u′′ (W )α( z i − P )2 ≃ 0.
To simplify this expression, observe that X
X p i u′ (W )( z i − P ) + p i u′′ (W )α( z i − P )2 X X = u′ (W ) p i ( z i − P ) + u′′ (W )α p i ( z i − P )2 .
Focus on the terms Σ p_i (z_i − P) and Σ p_i (z_i − P)². We see that:

Σ p_i (z_i − P) = Σ p_i z_i − Σ p_i P = µ − P.

Note that σ² = Σ p_i (z_i − µ)² is the variance of Z, but the expression above involves Σ p_i (z_i − P)². Writing z_i − P = (z_i − µ) + (µ − P),

Σ p_i (z_i − P)² = Σ p_i [(z_i − µ) + (µ − P)]²
  = Σ p_i (z_i − µ)² + 2(µ − P) Σ p_i (z_i − µ) + (µ − P)² Σ p_i
  = σ² + (µ − P)².

(Using the fact that Σ p_i = 1 and Σ p_i (z_i − µ) = 0.) This calculation gives Σ p_i (z_i − P)² = σ² + (µ − P)². So,
u′(W) Σ p_i (z_i − P) + u″(W) α Σ p_i (z_i − P)² = u′(W)(µ − P) + u″(W)α[σ² + (µ − P)²].

Therefore, the first-order condition is approximately:

u′(W)(µ − P) + u″(W)α[σ² + (µ − P)²] ≃ 0.

Solving for α:

α ≈ [1/(−u″(W)/u′(W))] · (µ − P)/(σ² + (µ − P)²).
EXAMPLE 8.9: Let u(ξ) = ln(ξ), W = 100, P = 2. Suppose that the random variable Z has the distribution Prob(Z = 1) = Prob(Z = 5) = 1/2. Thus, µ = 3,
CHAPTER 8. TAYLOR SERIES
σ² = 4. With these parameters:

−u″(100)/u′(100) = (1/100²)/(1/100) = 1/100 ⇒ 1/(−u″(W)/u′(W)) = 100.

Using the formula for α: α ≈ 100 · (3 − 2)/(4 + 1) = 100/5 = 20, so the approximate solution is to purchase roughly 20 units of the asset.
♦
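As a numerical check of Example 8.9 (a sketch; the bisection solver is just one convenient way to find the exact root), the code below computes the approximate α from the Taylor formula and compares it with the exact solution of the first-order condition:

```python
# Example 8.9 check: u = ln, W = 100, P = 2, Z in {1, 5} with probability 1/2 each.
W, P = 100.0, 2.0
outcomes = [(1.0, 0.5), (5.0, 0.5)]
mu = sum(z * p for z, p in outcomes)                    # 3
sigma2 = sum(p * (z - mu) ** 2 for z, p in outcomes)    # 4

# Taylor formula: alpha ~ [1/(-u''(W)/u'(W))] (mu - P)/(sigma2 + (mu - P)^2);
# for u = ln, 1/(-u''(W)/u'(W)) = W.
alpha_approx = W * (mu - P) / (sigma2 + (mu - P) ** 2)  # 100 * 1/5 = 20

def foc(a):
    # Exact first-order condition: sum_i p_i u'(W + a(z_i - P))(z_i - P), u'(x) = 1/x.
    return sum(p * (z - P) / (W + a * (z - P)) for z, p in outcomes)

# Solve foc(a) = 0 by bisection (here it can also be done by hand: alpha = 100/3).
lo, hi = 0.0, 50.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if foc(lo) * foc(mid) <= 0.0:
        hi = mid
    else:
        lo = mid
alpha_exact = 0.5 * (lo + hi)

assert abs(alpha_approx - 20.0) < 1e-12
assert abs(alpha_exact - 100.0 / 3.0) < 1e-6
```

The gap between 20 and 100/3 ≈ 33.3 reflects the fact that z_i − P is not small here, so the linearization of u′ is rough.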
8.5 The multivariate version of Taylor's theorem

Continuous functions of many variables can also be approximated by Taylor series expansions. Let f(x): R^n → R be a function of n variables, where R is the real number line and R^n = R × ··· × R, the n-times product of R. Recall that the partial derivative of f with respect to x_i is denoted f_{x_i}, and the second partial is f_{x_i x_j} = ∂²f/∂x_i ∂x_j. Write:

∇f = (f_{x_1}, f_{x_2}, ..., f_{x_n})′,

H = [ f_{x_1 x_1}  f_{x_1 x_2}  ···  f_{x_1 x_n}
      f_{x_2 x_1}  f_{x_2 x_2}  ···  f_{x_2 x_n}
      ···
      f_{x_n x_1}  f_{x_n x_2}  ···  f_{x_n x_n} ]

and

x − x_0 = (x_1 − x_{01}, x_2 − x_{02}, ..., x_n − x_{0n})′.

Also, write ∇f(x) and H(x) to denote these arrays of derivatives evaluated at x. A Taylor series expansion of f to the second order around x_0 is

f(x) = f(x_0) + ∇f(x_0)·(x − x_0) + (1/2!)(x − x_0)′ H(p)(x − x_0).
Theorem 8.3: Let f: R^n → R be continuous with continuous first and second partial derivatives. Then given x, x_0:

f(x) = f(x_0) + ∇f(x_0)·(x − x_0) + (1/2!)(x − x_0)′ H(p)(x − x_0),

where p is on the line between x and x_0: for some θ ∈ (0, 1), p = θx_0 + (1 − θ)x.
EXAMPLE 8.10: Let f(x, y) = x²y³. So, f_x(x, y) = 2xy³, f_y(x, y) = 3x²y², f_xx = 2y³, f_yy = 6x²y and f_xy = 6xy². Then a first-order Taylor series expansion of
f around x = 1, y = 1 is:

f(x, y) ≈ f(1, 1) + f_x(1, 1)(x − 1) + f_y(1, 1)(y − 1) = 1 + 2(x − 1) + 3(y − 1) = −4 + 2x + 3y.

The second-order Taylor series expansion has the form:

f(x, y) ≈ f(1, 1) + f_x(1, 1)(x − 1) + f_y(1, 1)(y − 1) + (1/2)[f_xx(1, 1)(x − 1)² + 2f_xy(1, 1)(x − 1)(y − 1) + f_yy(1, 1)(y − 1)²].

Substituting the values for the function:

x²y³ ≈ 1 + 2(x − 1) + 3(y − 1) + (1/2)[2(x − 1)² + 2·6(x − 1)(y − 1) + 6(y − 1)²]
     = 1 + 2(x − 1) + 3(y − 1) + (x − 1)² + 6(x − 1)(y − 1) + 3(y − 1)²
     = 6 − 6x − 9y + x² + 3y² + 6xy.
♦
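A quick numerical check of Example 8.10 (the evaluation point (1.05, 0.98) is an arbitrary choice near (1, 1)): the second-order expansion should track f = x²y³ much more closely than the first-order one.

```python
# Example 8.10 check: compare f(x, y) = x^2 y^3 with its Taylor expansions around (1, 1).
def f(x, y):
    return x ** 2 * y ** 3

def taylor1(x, y):
    # First-order expansion around (1, 1).
    return 1 + 2 * (x - 1) + 3 * (y - 1)

def taylor2(x, y):
    # Second-order expansion around (1, 1).
    return taylor1(x, y) + (x - 1) ** 2 + 6 * (x - 1) * (y - 1) + 3 * (y - 1) ** 2

x, y = 1.05, 0.98                     # an arbitrary point near (1, 1)
e1 = abs(f(x, y) - taylor1(x, y))
e2 = abs(f(x, y) - taylor2(x, y))
assert e2 < e1                        # the quadratic terms improve the fit
assert e2 < 1e-3                      # remaining error is third order in the step
```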
8.6 Proof of Taylor's theorem

The proof of Taylor's theorem makes use of Rolle's theorem:

Theorem 8.4: Let f be a continuous, differentiable function on the interval [a, b]. Suppose that f(a) = f(b) = 0. Then there is a point c ∈ (a, b) with f′(c) = 0.

PROOF: If f(x) = 0 for all x ∈ [a, b], then f′(x) = 0 on [a, b]. Otherwise, there is some point x ∈ [a, b] with f(x) > 0 or f(x) < 0. Suppose the former, so f(x) > f(a) = f(b). Since f is continuous, it has a maximum on [a, b], and this cannot be at a or b. Hence there is an interior maximum at some point c with f′(c) = 0. If instead f(x) < f(a) = f(b), there is an interior minimum on [a, b] at some point c with f′(c) = 0.
Theorem 8.5: Let f be a continuous function on an interval [a, b] with continuous derivatives f′, f″, ..., f⁽ⁿ⁾. Then given x, x_0 ∈ (a, b):

f(x) = f(x_0) + f′(x_0)(x − x_0) + (1/2!)f″(x_0)(x − x_0)² + (1/3!)f‴(x_0)(x − x_0)³ + ··· + (1/n!)f⁽ⁿ⁾(p)(x − x_0)ⁿ,
where p is between x and x_0, and

f(x) ≈ f(x_0) + f′(x_0)(x − x_0) + (1/2!)f″(x_0)(x − x_0)² + (1/3!)f‴(x_0)(x − x_0)³ + ··· + (1/n!)f⁽ⁿ⁾(x_0)(x − x_0)ⁿ

when x is close to x_0.

PROOF: Consider the second-order expansion – the generalization to the nth-order case is straightforward. Let

g(z) = f(x) − f(z) − f′(z)(x − z) − (K/2)(x − z)².

Observe that g(x) = 0 and g(x_0) = f(x) − f(x_0) − f′(x_0)(x − x_0) − (K/2)(x − x_0)². Choose K to make this 0, so that g(x_0) = 0. Then g(x) = g(x_0), so that by Rolle's theorem, there is a point ξ between x_0 and x with g′(ξ) = 0 (min{x_0, x} < ξ < max{x_0, x}). Differentiating g(z):

g′(z) = −f′(z) − f″(z)(x − z) + f′(z) + K(x − z) = −f″(z)(x − z) + K(x − z).

At ξ, g′(ξ) = −f″(ξ)(x − ξ) + K(x − ξ) = 0. Since ξ ≠ x, f″(ξ) = K. For the nth-order case:
g(z) = f(x) − f(z) − f′(z)(x − z) − (1/2!)f″(z)(x − z)² − ··· − (1/(n−1)!)f⁽ⁿ⁻¹⁾(z)(x − z)ⁿ⁻¹ − (K/n!)(x − z)ⁿ.

Again, g(x) = 0, and K is chosen so that g(x_0) = 0. Rolle's theorem implies that there is some ξ between x_0 and x with g′(ξ) = 0. In the derivative of g, all but two terms cancel:

g′(z) = −(1/(n−1)!)f⁽ⁿ⁾(z)(x − z)ⁿ⁻¹ + (K/(n−1)!)(x − z)ⁿ⁻¹.

At ξ,

g′(ξ) = −(1/(n−1)!)f⁽ⁿ⁾(ξ)(x − ξ)ⁿ⁻¹ + (K/(n−1)!)(x − ξ)ⁿ⁻¹ = 0.

Since x ≠ ξ, f⁽ⁿ⁾(ξ) = K. For the multivariate version, the proof is similar.
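The mean-value point ξ in the remainder can be exhibited concretely. A minimal sketch for f = exp, x_0 = 0, x = 1, using the first-order expansion (so the remainder involves f″(ξ)):

```python
import math

# For f = exp, x0 = 0, x = 1, Theorem 8.5 (to first order plus remainder) gives
#   e = f(0) + f'(0)(1 - 0) + (1/2!) f''(xi) (1 - 0)^2   for some xi in (0, 1).
# Since f'' = exp, xi solves e^xi = 2(e - 2), i.e. xi = ln(2(e - 2)).
xi = math.log(2.0 * (math.e - 2.0))
assert 0.0 < xi < 1.0                     # xi lies strictly between x0 and x

lhs = math.e
rhs = 1.0 + 1.0 + 0.5 * math.exp(xi)      # f(0) + f'(0) + (1/2) f''(xi)
assert abs(lhs - rhs) < 1e-12
```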
Theorem 8.6: Let f: R^n → R be continuous with continuous first and second partial derivatives. Then given x, x_0:

f(x) = f(x_0) + ∇f(x_0)·(x − x_0) + (1/2!)(x − x_0)′ H(p)(x − x_0),

where p is on the line between x and x_0: for some θ ∈ (0, 1), p = θx_0 + (1 − θ)x.

PROOF: To see why this is so, given z, ∆z, let g(t) = f(z + t∆z). Then g is a function of one variable, and a second-order Taylor series expansion around 0 gives:

g(t) = g(0) + g′(0)t + (1/2)g″(s)t², 0 < s < t.

Since

g′(t) = Σ_i f_i(z + t∆z)∆z_i ⇒ g′(0) = Σ_i f_i(z)∆z_i

and

g″(t) = Σ_j Σ_i f_{ij}(z + t∆z)∆z_i ∆z_j ⇒ g″(s) = Σ_j Σ_i f_{ij}(z + s∆z)∆z_i ∆z_j,

this gives

f(z + t∆z) = f(z) + Σ_i f_i(z) t∆z_i + (1/2) Σ_j Σ_i f_{ij}(z + s∆z) t²∆z_i ∆z_j.

If z = x_0 and ∆z = (x − x_0)/t, then t∆z = (x − x_0) = ∆x and s∆z = (s/t)(x − x_0) = α(x − x_0), with α = s/t ∈ (0, 1). Thus

f(x) = f(x_0) + Σ_i f_i(x_0)∆x_i + (1/2) Σ_j Σ_i f_{ij}(x_0 + α∆x)∆x_i ∆x_j.

Writing p = x_0 + α(x − x_0) = (1 − α)x_0 + αx gives:

f(x) = f(x_0) + ∇f(x_0)·(x − x_0) + (1/2!)(x − x_0)′ H(p)(x − x_0).
Exercises for Chapter 8
Exercise 8.1  Consider the following functions:

f(x) = eˣ − 1,       (8.5)
g(x) = ln(x + 1),    (8.6)
h(x) = x²,           (8.7)
u(x) = x^{1/2},      (8.8)
v(x) = 1/x,          (8.9)
w(x) = sin(x).       (8.10)

In all cases take x > 0. (a) Plot each function and determine whether each function is concave, convex or neither. (b) For each function, give the second-order Taylor series expansion.

Exercise 8.2
Give the second-order Taylor series expansions for the following functions,

f(x) ≈ f(x̄) + f′(x̄)(x − x̄) + (1/2)f″(x̄)(x − x̄)²:

f(x) = x^{1/3}    (8.11)
f(x) = ln(x)      (8.12)
f(x) = sin(x)     (8.13)
f(x) = cos(x)     (8.14)
f(x) = e^{−rx}    (8.15)
f(x) = x^α        (8.16)
f(x) = aˣ.        (8.17)
Exercise 8.3  Consider the average cost function c(Q) = e^{−Q} + (1/2)Q². Show that the cost function is strictly convex. Find the minimum of the average cost function c: find the value of Q that minimizes c. To do this, first observe that we can write a Taylor series approximation of c:

c(Q) ≅ c(Q_0) + c′(Q_0)(Q − Q_0) + (1/2)c″(Q_0)(Q − Q_0)².

When Q_0 is close to the value of Q that minimizes c, then minimizing c(Q) should give approximately the same answer as minimizing

c(Q_0) + c′(Q_0)(Q − Q_0) + (1/2)c″(Q_0)(Q − Q_0)².

If we minimize the latter expression by differentiating with respect to Q, we get the first-order condition: c′(Q_0) + c″(Q_0)(Q − Q_0) = 0. Solving for Q gives Q = Q_0 − [c′(Q_0)/c″(Q_0)]. This is the basis of an iteration: guess an initial Q_0; this gives Q_1 = Q_0 − [c′(Q_0)/c″(Q_0)]. With Q_1 we can obtain another estimate Q_2 = Q_1 − [c′(Q_1)/c″(Q_1)], with Q_2 we can obtain Q_3, and so on. At the nth iteration, Q_{n+1} = Q_n − [c′(Q_n)/c″(Q_n)]. The process converges when the change from Q_n to Q_{n+1} becomes small. In this case, we end up with (approximately) Q_n = Q_{n+1}. Looking at the iteration rule Q_{n+1} = Q_n − [c′(Q_n)/c″(Q_n)] yields 0 = Q_{n+1} − Q_n = −[c′(Q_n)/c″(Q_n)]. Thus c′(Q_n)/c″(Q_n) = 0, and since c″(Q_n) ≠ 0, we get c′(Q_n) = 0. Thus, since c is convex, Q_n minimizes c.
Exercise 8.4  Harvey's brewery has a monopoly in the beer market. The demand function is P(Q) = 100 − (1/3)Q² and the cost function is C(Q) = Q² + e^{0.01Q²}. (a) Is the profit function concave? (b) Using numerical optimization, find the profit-maximizing level of output. (Hint: start with Q_0 = 8 and perform four iterations.)

Exercise 8.5
A monopolist is faced with the following demand curve and cost function:

P(Q) = 154 − 2e^Q,   C(Q) = 4Q + 6Q².

(a) Confirm that the profit function is concave. (b) Find the profit-maximizing level of output using numerical optimization. Write down the algorithm. (Start iterations at q_0 = 2 and perform five iterations.)
Exercise 8.6  A manufacturer of jackets faces the following demand and cost functions:

P(q) = 1000 − 10q²,   C(q) = 100q + q ln q.

(a) Confirm that the profit function is concave. (b) Find the profit-maximizing level of output by numerical optimization.

Exercise 8.7  Faced with the following demand and cost functions
P(q) = 1000 − q²,   C(q) = 25 + q² + q(ln q)²,

a widget firm wishes to find the profit-maximizing level of output. Find the profit-maximizing level of output.

Exercise 8.8  Suppose a firm's profit function (as a function of output) is given by π(q) = 2 ln(q + 1) + √q − q. Find the profit-maximizing value of q.

Exercise 8.9  Suppose a firm's revenue function (as a function of output) is given by r(q) = 3(1 − e^{−q}) + 3 ln(q + 1) − 2q. Find the revenue-maximizing value of q.

Exercise 8.10
Consider the function f(x) = 1 + x + 4x² − 3x³. This function has two points at which the first derivative of f is 0. Denote these x* and x̄, so that f′(x*) = f′(x̄) = 0. It is easy to see that one of these is 1; say x̄ = 1. Find the value of x* using a Taylor series expansion.

Exercise 8.11
An individual has utility defined on income according to u(y) = ln(y + 1). The individual has initial income ȳ and accepts a bet on a coin toss: if heads, the individual receives a dollar; if tails, the individual pays a dollar. The coin is fair (50/50 chance of heads or tails), so the average utility from the bet is

(1/2)u(ȳ − 1) + (1/2)u(ȳ + 1).

If the individual had not accepted the bet, the utility obtained is u(ȳ) for sure. Compare the average utility of the bet with the sure utility from not betting. Should the individual have accepted the bet?

Exercise 8.12  An individual has the utility function u(y) = ln(y + 1). This gives
the utility of having y dollars. The individual faces a random income next period:
current income is $100 and next period income will either fall by $5 to $95 or increase by $5 to $105. There is a 50% chance that income will rise and a 50% chance that income will fall, so that average income next period is 100 = (1/2)·95 + (1/2)·105. The risk premium, c, is defined as the amount of money less than $100 that makes the individual indifferent between accepting that sum for sure and facing the risk
associated with receiving $95 or $105. That is, the individual is indifferent between $100 − c for sure and the risky return of $95 with 50% chance and $105 with 50% chance. Thus,

u(100 − c) = (1/2)u(95) + (1/2)u(105), or ln(101 − c) = (1/2)ln(101 − 5) + (1/2)ln(101 + 5).

Find the approximate value of c using Taylor series expansions. Plot the utility function and illustrate how c is calculated on your graph.
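The kind of computation involved here can be sketched numerically (a check of the method, not a substitute for the requested plot):

```python
import math

# u(y) = ln(y + 1); income is 95 or 105 with probability 1/2 each (mean 100).
# Exact risk premium: ln(101 - c) = (1/2) ln(96) + (1/2) ln(106),
# so 101 - c = sqrt(96 * 106).
c_exact = 101.0 - math.sqrt(96.0 * 106.0)

# Taylor approximation: c ~ (1/2)(-u''(mu)/u'(mu)) sigma^2 at mu = 100.
# Here u'(y) = 1/(y + 1), u''(y) = -1/(y + 1)^2, so -u''(100)/u'(100) = 1/101.
sigma2 = 25.0
c_approx = 0.5 * (1.0 / 101.0) * sigma2

assert 0.12 < c_exact < 0.13
assert abs(c_exact - c_approx) < 1e-3    # both are about 0.124
```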
Exercise 8.13  Repeat the previous question using the utility function u(y) = e^y − 1. In this case, note that the utility function is convex (second derivative positive), so that your picture will change significantly!

Exercise 8.14  A student has just received a present of a new computer. The student's yearly income is $50 and the student has a utility function u(y) = y^{1/2}. Unfortunately, there is a 10% chance that the computer will be stolen and the price of a new computer is $5, so in the event of theft, the replacement will leave the student with only $45 income for the year. Thus, in the absence of theft, the student's income is $50, while if the computer is stolen, the income drops to $45. Computer insurance is available at a cost of $1. Use the calculations from a Taylor series expansion to decide whether the student should buy insurance or not. Note that

u′(y) = (1/2)y^{−1/2} and u″(y) = −(1/4)y^{−3/2}, so that u″(y)/u′(y) = −(1/4)y^{−3/2}·[2y^{1/2}] = −(1/2)y^{−1}.
Exercise 8.15  Consider an individual with the following utility function defined over wealth: u(w) = √w. Suppose the person has the opportunity to take the following gamble: draw a ticket from a hat; a blue ticket yields a prize of $200, a yellow ticket yields $50 and an orange ticket yields $100. There are four orange tickets, two blue, and two yellow. (a) Calculate the expected payoff from the bet.
(b) Calculate the approximate cost of risk to this individual (using a Taylor series expansion). (c) How much is the person willing to pay to partake in this gamble? (d) Suppose instead the utility function is u(w) = w. Rework your answers to parts (b) and (c). (e) Suppose now the utility function is u(w) = w². Rework your answers to parts (b) and (c). (f) Explain intuitively why your answers differ in parts (c), (d) and (e).

Exercise 8.16  A risk-averse individual has the following utility function defined over income: u(w) = √w. The person is faced with the following lottery with
possible outcomes denoted x_i and their associated probabilities P(x_i):

  x     P(x)
  10    1/2
  15    1/8
  1     1/8
  7     1/4

(a) What is the expected payoff from this lottery? (b) Calculate this person's measure of absolute risk aversion. (c) If tickets for this lottery cost $7.50, will he buy a ticket? (Use a Taylor series expansion.) (d) Another individual has utility function v(x) = kx, k > 0. Will this individual buy a ticket?
Exercise 8.17  An individual owns a house worth $200,000 that is located next to a river. There is a 1% chance that the river will flood in the spring and cause $60,000 worth of damage. The utility function of the home owner is u(w) = ln w. (a) How much is the individual willing to pay for house insurance? (b) Suppose instead that the homeowner's utility function is u(w) = w. How much
would he now be willing to pay for insurance? Why is the amount in part (a) larger?

Exercise 8.18  A group of graduate students have recently pooled together their life's savings and bought a ski cabin. Apart from its inaccessibility, the major drawback with this cabin is that it is located near a hill that is somewhat prone to avalanches. The cabin is worth $100,000. There are two types of avalanche. The first, which occurs with 20% probability, will cause $30,000 worth of damage; the second occurs with 2% probability but causes $90,000 worth of damage. Suppose there are three students in this group with utility functions
u_1(w) = w^{1/2},   u_2(w) = w²,   u_3(w) = w.
How much will each student be willing to pay to insure the cabin? What is the relationship between these amounts and how is this related to their attitudes towards risk? Is it true that a risk-loving individual will never purchase insurance?

Exercise 8.19  According to expected utility theory, faced with uncertain outcomes x_1 with probability p and x_2 with probability (1 − p), an individual associates utility E(u) = pu(x_1) + (1 − p)u(x_2). The risk premium, c, associated with the risk is defined by u(µ − c) = E(u) = pu(x_1) + (1 − p)u(x_2), where µ = px_1 + (1 − p)x_2,
the mean of the random variable x. In the case where u(x) = √x, x_1 = 90, x_2 = 110 and p = 1/2, find c. An approximation for c is given by

c ≈ −(1/2) [u″(µ)/u′(µ)] σ²,  where σ² = p(x_1 − µ)² + (1 − p)(x_2 − µ)².

Estimate c using this approximation and compare it with the exact solution.

Exercise 8.20
(a) Let a risk-averse agent have utility function u(x) = ln(x + 1). Suppose that this individual faces the gamble: win $95 with probability 1/2 and win $105 with probability 1/2. How much less than $100 would the individual accept instead of the gamble?

(b) Let a random variable have two possible outcomes x_1 and x_2, with probabilities p_1 and p_2, respectively. Thus, the mean is µ = p_1x_1 + p_2x_2. Let the agent's utility function be u. Suppose that p_1u(x_1) + p_2u(x_2) = u(z) (and both x_1 and x_2 are close to z). Show that:

(µ − z) / [p_1(x_1 − z)² + p_2(x_2 − z)²] ≈ −(1/2) u″(z)/u′(z).
9 Vectors

9.1 Introduction

A vector is an array of numbers, x = (x_1, x_2, ..., x_n), and may be interpreted in a number of ways: as the values of a set of variables, as a coordinate position in R^n, as a direction (move x_i units in direction i, for i = 1, ..., n), and so on. Depending on context, different interpretations are appropriate. In what follows (Sections 9.2 and 9.3), some of the key summarizing features of a vector are considered: the length of a vector, the distance between vectors and the angle between vectors. As mentioned, from a geometric perspective, vectors can be interpreted as having direction – in subsequent chapters on optimization, the direction of improvement of a function is central to the characterization of an optimal choice. As it turns out, this direction is represented by a vector of partial derivatives of the function, the gradient vector. Vectors implicitly define hyperplanes and half-spaces, key concepts in economic theory – Section 9.4 discusses these concepts. Finally, Section 9.5 describes Farkas' lemma, an old and important result in optimization theory.
9.2 Vectors: length and distance

The vector x = (x_1, x_2, ..., x_n) is a point in R^n, a list of n real numbers. A vector may also be viewed in terms of direction: x_1 units along axis 1, x_2 units along axis 2, and so on, as illustrated in Figure 9.1. Expressed that way, the vector x describes a direction from 0 to a point.
Figure 9.1: Vectors and length
What is the length of the vector x, labeled c in Figure 9.1? To answer this, from the vector x form a right-angled triangle where the hypotenuse has length c and the adjacent and opposite sides have lengths a and b, respectively. The length of the vector x is the length of the hypotenuse, c, and the value of c can be determined from the opposite and adjacent sides. On the right side of the figure, the triangle is arranged around a square with side length c. This square with side length c is inscribed inside a square of side length a + b. The area of the larger square consists of four occurrences of the triangle and one of the square with side length c. The area of the triangle is (1/2)ab (half the base times the perpendicular height). Thus, (a + b)² = c² + 4(½ab) = c² + 2ab. Since (a + b)² = a² + b² + 2ab, a² + b² = c², so that c = √(a² + b²), which is Pythagoras' theorem. Thus, the length of the vector x, denoted ∥x∥, is the length of the line c:

∥x∥ = √(x_1² + x_2²).

This extends to n dimensions directly. The length of a vector x = (x_1, x_2, ..., x_n) is denoted ∥x∥ and defined

∥x∥ = √(x_1² + x_2² + ··· + x_n²) = √(Σ_{i=1}^n x_i²) = √(x′x).

From the definition, the length of x and −x are the same: ∥x∥ = ∥−x∥. The sum and difference of two vectors x and y define the vectors x + y and x − y, as depicted in Figure 9.2.
The distance between two vectors, x, y, is the length of the vector x − y:

∥x − y∥ = √( Σ_{i=1}^n (x_i − y_i)² ).
Figure 9.2: Vectors

Given a vector x, its length defines a circle or sphere with x on the boundary: a circle with center 0 has radius r = ∥x∥. Thus, the smaller the length of a vector, the smaller the circle on whose boundary it lies, as illustrated in Figure 9.3, where x and x′ have the same length (and y, y′ also have the same length). Given a vector z, points equidistant from z all lie on a circle centered on z of some fixed radius. In the second diagram in Figure 9.3, the point w, closest to y and on the horizontal axis, has the property that y − w is a vector perpendicular (orthogonal) to the horizontal axis. The circle centered on y and tangent to the horizontal axis has only the point w in common with the horizontal axis. The figure suggests that y − w is perpendicular to the horizontal axis and this is
dicular (orthogonal) to the horizontal axis. The circle centered on y and tangent to the horizontal axis has only the point w in common with the horizontal axis. The figure suggests that y − w is perpendicular to the horizontal axis and this is x2
x2
y
y− w
x
y
x′ 0
x1 y′
Figure 9.3: Length
e1 w
x1
in fact the case. To see this, let e_1 = (1, 0). Then ce_1 is the vector e_1 scaled by the factor c. Any point, such as w, on the horizontal axis may be written as ce_1 for some value of c. Consider the problem of choosing c to minimize the distance from y to the horizontal axis. This requires minimizing ∥y − ce_1∥, or equivalently, minimizing:

∥y − ce_1∥² = (y − ce_1)·(y − ce_1) = y·y − 2cy·e_1 + c²e_1·e_1.

Choosing c to minimize this expression gives −2y·e_1 + 2ce_1·e_1 = 0, so that

c̄ = (y·e_1)/(e_1·e_1).

So, w = c̄e_1 is the point on the horizontal axis closest to y. Observe that

(y − w)·e_1 = (y − c̄e_1)·e_1 = y·e_1 − c̄e_1·e_1 = y·e_1 − y·e_1 = 0.

Since any point, v, on the horizontal axis may be written v = ce_1 for some c, (y − w)·v = (y − w)·ce_1 = c(y − w)·e_1 = 0. In particular, (y − w)·w = 0. The explanation is simple. The choice of w yields a vector y − w with 0 in the first coordinate. Every vector on the horizontal axis has 0 in the second coordinate, so (y − w)·x = 0 for any x on the horizontal axis.
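The projection calculation above can be sketched directly (the point y = (3, 4) is an arbitrary illustration):

```python
# Projection of y onto the span of u: choose c to minimize ||y - cu||,
# giving c = (y . u)/(u . u); the residual y - cu is orthogonal to u.
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def project(y, u):
    c = dot(y, u) / dot(u, u)
    return c, [c * ui for ui in u]

y = (3.0, 4.0)                   # arbitrary illustrative point
e1 = (1.0, 0.0)                  # horizontal-axis direction, as in the text
c, w = project(y, e1)
assert c == 3.0 and w == [3.0, 0.0]      # closest point on the horizontal axis

residual = [yi - wi for yi, wi in zip(y, w)]
assert dot(residual, e1) == 0.0          # (y - w) . e1 = 0
```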
9.3 Vectors: direction and angles

Since vectors indicate direction, we may compare the relative directions of different vectors. One particularly important comparison occurs when the vectors are orthogonal or "perpendicular".

Definition 9.1: Two vectors a and b are said to be orthogonal if their inner product is 0: a·b = 0.

In two dimensions, a·b = a_1b_1 + a_2b_2, so that if the vectors are orthogonal, then a_1b_1 + a_2b_2 = 0; rearranging, a_2/a_1 = −b_1/b_2, provided a_1 and b_2 are both non-zero. From the definition, if a and b are orthogonal, then so are the vectors ca and db for any constants c and d, since ca·db = (cd)a·b = 0. The vectors u and v,
u = (4, 2)′  and  v = (−1, 2)′,
have inner product u·v = (4)(−1) + (2)(2) = −4 + 4 = 0, so these vectors are orthogonal (Figure 9.4).

Figure 9.4: Orthogonality

Consider two vectors, v and u, depicted in Figure 9.5. Two cases are depicted – one where the angle between them is acute and one where it is obtuse. In either case, drop a perpendicular line from the point v onto a point, cu, lying on the vector u. Orthogonality requires that this line, v − cu, be orthogonal to u: (v − cu)·u = 0. This implies that v·u − cu·u = 0, so that

c = (v·u)/(u·u) = (v·u)/∥u∥².

When the angle between v and u is acute, c > 0 and v·u > 0; when the angle is obtuse, c < 0 and v·u < 0.
Notice that when v·u = 0, cos(θ) = 0, corresponding to a 90° angle. For a different perspective consider Figure 9.8, where β and z are orthogonal: β·z = 0, or β_1z_1 + β_2z_2 = 0, so that z_2 = −(β_1/β_2)z_1. Thus, the angle between z and the horizontal
Figure 9.7: Trigonometric functions (cos(θ) = a/h, sin(θ) = b/h, tan(θ) = b/a)
Figure 9.8: Orthogonal vectors
axis is (in absolute terms) the same as the angle between β and the vertical axis. Since this angle and γ combine to form a 90° angle, the angle between β and z is 90°. Alternatively, from the second diagram in Figure 9.8, note that 2α + δ = 90°, so the angle between (β_1, β_2) and (β_2, −β_1) is 90° and (β_1, β_2)·(β_2, −β_1) = 0.
9.4 Hyperplanes and direction of increase

Let β = (β_1, ..., β_n) ≠ 0 be a vector of constants, c a constant, and consider the equation:

β·x = Σ_{i=1}^n β_i x_i = c.
The set of all x satisfying this equation defines a hyperplane: H(β, c) = {x | β·x = c}. If y and z are on this hyperplane, then β·y = β·z = c, so that β·(y − z) = 0. Thus, β and (y − z) are orthogonal, as depicted in Figure 9.9. Movement along the hyperplane defined by β and c is orthogonal to β.

Figure 9.9: The normal to a hyperplane

A hyperplane H(β, c) = {x | β·x = c} separates R^n into three regions:

H = {x | β·x = c},   H₊ = {x | β·x > c},   H₋ = {x | β·x < c}.
On which sides of H do H+ and H− lie? Consider the point y and move from y along the direction β a fraction δ > 0 to y + δβ, as depicted in Figure 9.10. Observe that
c′ = β·(y + δβ) = β·y + δβ·β = c + δβ·β > c.

Therefore, β points into the region H₊: starting at y and moving from y to w = y + δβ, w is in H₊. Equivalently, let f(x) = β·x; then β points in the direction in which f(x) increases. Furthermore, the direction β gives the greatest increase among all directions of equal length. For example, r has the same length as δβ (since they both lie on the same circle around y), but c″ = β·(y + r) < β·(y + δβ) = c′.
Figure 9.10: The normal to a hyperplane
As noted earlier, the inner product of β with any vector ξ may be positive, zero or negative. If β·ξ = 0, they are orthogonal. Informally, if β·ξ > 0, the angle between β and ξ is less than 90°; and if β·ξ < 0, the angle between β and ξ is greater than 90°. To see this from a different perspective, consider Figure 9.11, where z is chosen so that β·z > c, say β·z = c′ > c.
Figure 9.11: The sign of an inner product
Notice from the figure that the vector ξ satisfies β·ξ = β·(x − z) = c − c′ < 0, and ξ forms an angle greater than 90° with β. Similarly, if z were chosen with β·z = ĉ < c, then β·ξ = β·(x − z) > 0.

If β is normal (perpendicular) to H, then for any α ≠ 0, αβ is normal to the hyperplane. Dividing β by ∥β∥ gives the unit normal to the hyperplane H: β* = β/∥β∥. The unit normal is orthogonal to H and has length 1, since β*′β* = β′β/∥β∥² = 1.
9.5 Farkas' lemma

These ideas may be used to develop a well-known theorem in linear algebra and linear inequalities called Farkas' lemma. This result is particularly useful in optimization theory. Let A be an m × n matrix and b an m × 1 vector.

Theorem 9.1: Either (1) {Ax = b, x ≥ 0} or (2) {A′y ≤ 0 and y′b > 0} has a solution, but not both.
PROOF: Suppose that (1) has no solution: b ∉ K ≡ {Ax | x ≥ 0}. Let p ∈ K solve min_{z∈K} ∥b − z∥. Since K is convex, p is unique. Let w ≥ 0 satisfy p = Aw. The hyperplane perpendicular to b − p separates b and K: for all z ∈ K, (b − p)·(z − p) ≤ 0 (see Figure 9.12). Equivalently:

(b − p)·(Ax − Aw) ≤ 0, for all x ≥ 0.

Put y = b − p, so y·(Ax − Aw) ≤ 0 for all x ≥ 0. Let x = w + e_i, where e_i is the vector with 1 in the ith position and zeros elsewhere. Then A(x − w) = Ae_i = a_i, the ith column of A. Thus, y·a_i ≤ 0 for all i, or y′A ≤ 0. Taking x = 0 in y·(Ax − Aw) ≤ 0 and recalling Aw = p gives −y·p ≤ 0. Substituting b − y for p, −y·(b − y) ≤ 0, or −y·b + y·y ≤ 0, so y·y ≤ y·b; and y·y > 0 since y ≠ 0, so y·b > 0. Thus, if (1) does not hold, (2) must hold. Finally, (1) and (2) cannot both be satisfied: if there is x ≥ 0 with Ax = b and A′y ≤ 0, then y′b = y′Ax = (A′y)′x ≤ 0, contradicting y′b > 0. This completes the proof.
REMARK 9.1: Farkas' lemma has a simple geometric interpretation. In Figure 9.13, b is in the cone generated by the vectors a_1 and a_2: b = α_1a_1 + α_2a_2 with α_i ≥ 0. The shaded region is the set of y satisfying a_1·y ≤ 0 and a_2·y ≤ 0, or A′y ≤ 0, where A is the matrix with columns a_1 and a_2, and A′ is the transpose of A, with rows a_1′ and a_2′. As the figure illustrates, the region {y | A′y ≤ 0} is a subset of the region {y | b′y ≤ 0}. Thus, Ax = b has a non-negative solution, and A′y ≤ 0 implies b′y ≤ 0.
Figure 9.12: Proof of Farkas' lemma
The lemma may be rearranged as follows:

LEMMA 1: The pair A′y ≤ 0 and b′y > 0 has a solution if and only if the pair A′y ≥ 0 and b′y < 0 has a solution. Consequently, Farkas' lemma may be written as: either (1) {Ax = b, x ≥ 0} or (2) {A′y ≥ 0 and b′y < 0} has a solution, but not both. ♦

PROOF: If z satisfies A′z ≤ 0 and b′z > 0, then −A′z ≥ 0 and −b′z < 0, i.e. A′(−z) ≥ 0 and b′(−z) < 0, so y = −z satisfies A′y ≥ 0 and b′y < 0. The converse is symmetric.
Farkas' lemma may be expressed in other ways. One corollary is:

LEMMA 2: Either

Ax ≤ b, x ≥ 0 has a solution,

or

A′y ≥ 0, b′y < 0, y ≥ 0 has a solution,

but not both. ♦

PROOF: In Farkas' lemma, let Ā = (A  I), so that Ā′ is the matrix with A′ stacked above I. Then, applying Lemma 1, either [Āz = b, z ≥ 0] has a solution or [Ā′y ≥ 0 and b′y < 0] has a
Figure 9.13: Farkas' lemma: a geometric view

solution. Let z = (x, w), so that the first expression is equivalent to [Ax + Iw = b
and (x, w) ≥ 0]. There is an (x, w) satisfying this if and only if there is an x ≥ 0 with Ax ≤ b. The second expression is, in this context, Ā′y ≥ 0 and b′y < 0, that is, A′y ≥ 0, Iy ≥ 0 and b′y < 0, which is equivalent to there being a y with A′y ≥ 0, b′y < 0 and y ≥ 0. Summarizing, either there is an x ≥ 0 with Ax ≤ b, or there is a y ≥ 0 with A′y ≥ 0 and b′y < 0, but not both.
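Theorem 9.1's either/or alternative can be illustrated on a tiny made-up example (the matrix, vector and certificate below are illustrative, not from the text):

```python
# Farkas alternative on a made-up example: A = I (2x2), b = (1, -1).
# (1) Ax = b, x >= 0 would force x = (1, -1), which is not >= 0: no solution.
# (2) A certificate y = (0, -1) satisfies A'y <= 0 and y.b > 0, as the lemma predicts.
A = [[1, 0],
     [0, 1]]
b = [1, -1]

# System (1): since A is the identity, the only candidate is x = b itself.
x = b
assert not all(xi >= 0 for xi in x)      # (1) has no non-negative solution

# System (2): exhibit the certificate y.
y = [0, -1]
Aty = [sum(A[i][j] * y[i] for i in range(2)) for j in range(2)]  # A'y
assert all(v <= 0 for v in Aty)                                  # A'y <= 0
assert sum(yi * bi for yi, bi in zip(y, b)) > 0                  # y.b = 1 > 0
```

Geometrically, y is the normal of a hyperplane separating b from the cone generated by the columns of A.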
Exercises for Chapter 9
Exercise 9.1  Find the distance between the vectors x = (5, 3) and y = (4, 6).

Exercise 9.2  Let x = (1, 3), y = (2, 2) and z = (1, −1). Find the angle between x and y, and that between x and z.

Exercise 9.3  Let x = (4, 2) and y = (3, 3). Find c such that (y − cx)·(y − cx) ≤ (y − v)·(y − v) for all v = c̃x, for any c̃. Confirm that (y − cx)·x = 0.

Exercise 9.4
(a) Let β = (1, 2). Find the hyperplane β·x = 2. Noting that (2, 0) is on this hyperplane, plot (2, 0) + β. Identify the region {x | β·x ≥ 2}.
(b) Let β = (−1, 1). Find the hyperplane β·x = 2. Noting that (−1, 1) is on this hyperplane, plot (−1, 1) + β. Identify the region {x | β·x ≥ 2}.
(c) Let β = (−1, −1). Find the hyperplane β·x = 2. Noting that (−1, −1) is on this hyperplane, plot (−1, −1) + β. Identify the region {x | β·x ≥ 2}.
Exercise 9.5  Consider the matrix

R(θ) = [ cos(θ)  −sin(θ)
         sin(θ)   cos(θ) ],

where θ is an angle measured in degrees. For example:

R(90) = [ cos(90)  −sin(90)      [ 0  −1
          sin(90)   cos(90) ]  =   1   0 ].

The transformation R(θ)x moves the vector x through an angle of θ degrees.

(a) Letting x = (1, 1), find x′ = R(45)x. Using the formulae c = (x·x′)/∥x∥² and cos(θ) = c∥x∥/∥x′∥, confirm that cos(45) = 1/√2 = c∥x∥/∥x′∥.

(b) Letting y = (1, 3), find y′ = R(30)y. Using the formulae c = (y·y′)/∥y∥² and cos(θ) = c∥y∥/∥y′∥, confirm that cos(30) = √3/2 = c∥y∥/∥y′∥.
Exercise 9.6  Show that the system

[ 1  2  4 ]   [ y_1 ]    [ 0 ]
[ 3  1  2 ]   [ y_2 ]  ≥ [ 0 ],    ( 3  4  3 ) (y_1, y_2, y_3)′ < 0
[ 5  3  1 ]   [ y_3 ]    [ 0 ]

has a solution.

Exercise 9.7
Consider the following observations on two random variables, x
and y:

x = (9, 7, 3, 5)′,      y = (12, 8, 2, 6)′.

The correlation coefficient is defined:

r = Σᵢ(xᵢ − x̄)(yᵢ − ȳ) / [ √Σᵢ(xᵢ − x̄)²  √Σᵢ(yᵢ − ȳ)² ],

where x̄ and ȳ are the respective means. Find the correlation coefficient and relate it to the angle between the deviations around x and y (determine the angle between the vectors of deviations).
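The computation asked for in Exercise 9.7 can be sketched numerically. The following is an illustrative Python sketch (the helper name `correlation` is ours, not the text's):

```python
import math

def correlation(xs, ys):
    # r = sum((xi - xbar)(yi - ybar)) / [sqrt(sum dx^2) * sqrt(sum dy^2)]
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    dx = [xi - xbar for xi in xs]   # deviations around the mean of x
    dy = [yi - ybar for yi in ys]   # deviations around the mean of y
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    return num / den

x = [9, 7, 3, 5]
y = [12, 8, 2, 6]
r = correlation(x, y)
# r equals the cosine of the angle between the two deviation vectors
theta = math.degrees(math.acos(r))
print(round(r, 4), round(theta, 1))
```

Since r is exactly the cosine of the angle between the deviation vectors, a correlation close to 1 corresponds to a small angle between them.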
10 Quadratic forms
10.1 Introduction

A quadratic form is given by the expression x′Ax, where x is an n × 1 vector and A an n × n matrix. Quadratic forms arise in various branches of economics, and particularly in optimization theory, to identify a candidate solution as a maximum
or minimum. For example, in the study of quadratic functions, ax² + bx + c = (ax + b)x + c, the function is either "u-shaped" or the opposite, depending on whether a is positive or negative. So, the turning point locates either a minimum or a maximum, depending on the sign of a. If, for example, a > 0, then large values of x correspond with large values of the function and the turning point corresponds to a minimum value of the function.

Essentially the same situation occurs when the function is a quadratic in many variables: x′Ax + b′x + c. In this case, the matrix A plays a role analogous to a in the scalar case: whether a "turning point" is a maximum or minimum depends on properties of A. Letting

f(x) = x′Ax + b′x + c = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ aᵢⱼ xᵢ xⱼ + Σₖ₌₁ⁿ bₖ xₖ + c,

in the context of optimization, positive and negative definiteness are the central concepts. Section 10.2 introduces quadratic forms and shows how, in the two-variable case, properties of the A matrix determine the shape of the function f. In Section 10.3, positive and negative definiteness are defined and the properties of positive and negative definite matrices are characterized in terms of determinant conditions. When the function f(x) is arbitrary (not necessarily quadratic), the criteria developed here extend directly by using a second-order quadratic approximation
of the function f to identify local maxima or local minima. Finally, Section 10.4 considers the case where there are linear restrictions on x. A matrix A is positive definite if x′Ax > 0 for all x ≠ 0, and negative definite when x′Ax < 0 for all nonzero x. Suppose instead that the requirement x′Ax > 0 applies only to those nonzero x satisfying an additional condition, B′x = 0: under what circumstances is x′Ax > 0 for all nonzero x satisfying B′x = 0? This issue is considered in Section 10.4. The criteria developed there appear in the study of constrained optimization problems.
10.2 Quadratic forms

A quadratic form in n variables is given as:

x′Ax = (x1, x2, …, xn) ( a11  · · ·  a1n ) ( x1 )
                       (  ⋮    ⋱     ⋮  ) (  ⋮ )   =   Σᵢ Σⱼ aᵢⱼ xᵢ xⱼ,
                       ( an1  · · ·  ann ) ( xn )

where aᵢⱼ is the element in the ith row and jth column of A. In the case of two variables:

x = ( x1 ),      A = ( a11  a12 )
    ( x2 )           ( a21  a22 )

and

x′Ax = (x1  x2) ( a11  a12 ) ( x1 )   =   a11 x1² + (a12 + a21) x1 x2 + a22 x2²,
                ( a21  a22 ) ( x2 )

which is a quadratic form in the variables x1 and x2. An important question concerns the sign of x′Ax. Under what conditions is the quadratic form always positive or always negative, regardless of the value of x? The question arises in a variety of problems, and in particular in the area of optimization.
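The double sum Σᵢ Σⱼ aᵢⱼ xᵢ xⱼ translates directly into code. A minimal sketch (the helper name `quad_form` is our own, and the sample coefficients a11 = 3, a22 = 4, a12 + a21 = 3 are merely illustrative):

```python
def quad_form(A, x):
    # x'Ax = sum over i, j of a_ij * x_i * x_j, with A stored as a list of rows
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

A = [[3, 2], [1, 4]]           # a11 = 3, a22 = 4, a12 + a21 = 3
print(quad_form(A, [1, 1]))    # 3 + 4 + 3 = 10
print(quad_form(A, [1, -1]))   # 3 + 4 - 3 = 4
```

Evaluating the form at a few vectors, as here, is often the quickest way to see whether its sign can change with x.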
E XAMPLE 10.1: For example, if a11 = 3, a22 = 4 and a12 + a21 = 3, then the associated quadratic form is q(x1, x2) = 3x1² + 4x2² + 3x1x2. Let f(x) be the quadratic function plotted in the figure:

z = f(x) = x1 + x2 + q(x1, x2) = x1 + x2 + 3x1² + 4x2² + 3x1x2.

[Figure: graph of z = f(x1, x2), axes x1, x2, z]

This function is smallest when (x1, x2) = (−5/22, −2/11) = (x1∗, x2∗). That is:

f(x1∗, x2∗) ≤ f(x1, x2) for all (x1, x2).

Notice that the "bowl" shape of the function is crucial: around (x1∗, x2∗) the function curves up. This shape property is determined by the quadratic form q(x1, x2), which has the property that q(x1, x2) ≥ 0 for all (x1, x2), and illustrates the relationship between quadratic forms and optimization.
♦
In general, the sign of the quadratic form x′ Ax depends on both A and the vector
x, as Example 10.2 illustrates.
E XAMPLE 10.2: Let

A = ( a    0  ),
    ( 0   −â )

where a > 0 and â > 0. Then

x′Ax = (x1, x2) ( a    0  ) ( x1 )   =   a x1² − â x2².
                ( 0   −â ) ( x2 )

Then

x′Ax > 0  if  a x1² − â x2² > 0,  or  x1² > (â/a) x2²;
x′Ax < 0  if  a x1² − â x2² < 0,  or  x1² < (â/a) x2².

So, the sign of x′Ax depends on the value of x and the parameters of A. If, instead,

A = ( a   0 ),
    ( 0   â )

then x′Ax = a x1² + â x2², which is never negative, since a and â are positive and xᵢ² ≥ 0.
♦
Thus, as Example 10.2 illustrates, under suitable restrictions on A, the sign of a quadratic form, such as x′Ax = a x1² + â x2², may be completely determined by the parameters of the A matrix.
Consider again the two-variable quadratic form:

(x1  x2) ( a11  a12 ) ( x1 )   =   a11 x1² + (a12 + a21) x1 x2 + a22 x2²
         ( a21  a22 ) ( x2 )
                               =   a11 x1² + 2ā x1 x2 + a22 x2²,      ā = (1/2)(a12 + a21)
                               =   (x1  x2) ( a11   ā  ) ( x1 ).                          (10.1)
                                            ( ā    a22 ) ( x2 )

If a11 = 0, then x′ = (1, 0) gives x′Ax = 0; so, for a positive or negative definite matrix, aᵢᵢ ≠ 0 for all i. The expression a11 x1² + 2ā x1 x2 + a22 x2² may be rearranged (assume a11 ≠ 0; if instead a22 ≠ 0, a symmetric calculation applies):

a11 x1² + 2ā x1 x2 + a22 x2² = a11 [ x1 + (ā/a11) x2 ]² + [ a22 − ā²/a11 ] x2²
                             = a11 [ x1 + (ā/a11) x2 ]² + (1/a11)[ a11 a22 − ā² ] x2².

If a11 > 0 and a11 a22 − ā² > 0, then this expression is positive if (x1, x2) ≠ (0, 0). Conversely, if a11 < 0 and a11 a22 − ā² > 0, then the expression is negative for all (x1, x2) ≠ (0, 0). In either of the other two cases, [a11 > 0, a11 a22 − ā² < 0] and [a11 < 0, a11 a22 − ā² < 0], the sign of x′Ax can be positive or negative depending on the values of (x1, x2). Consequently, the quadratic form is strictly positive for all (x1, x2) ≠ (0, 0) if and only if a11 > 0 and a11 a22 − ā² > 0; and strictly negative for all (x1, x2) ≠ (0, 0) if and only if a11 < 0 and a11 a22 − ā² > 0.
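The two conditions just derived, on a11 and a11 a22 − ā², are easy to check mechanically. A sketch (the function name `classify_2x2` is ours):

```python
def classify_2x2(a11, a12, a21, a22):
    # abar = (a12 + a21) / 2; the sign of the form is decided by
    # a11 together with the quantity a11*a22 - abar**2
    abar = (a12 + a21) / 2
    d = a11 * a22 - abar ** 2
    if a11 > 0 and d > 0:
        return "positive definite"
    if a11 < 0 and d > 0:
        return "negative definite"
    return "indefinite or semidefinite"

print(classify_2x2(3, 1.5, 1.5, 4))    # positive definite
print(classify_2x2(-3, 0, 0, -4))      # negative definite
print(classify_2x2(4, 4, 4, 2))        # sign depends on x
```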
E XAMPLE 10.3: This example illustrates how different values of the aᵢⱼ parameters in the A matrix affect the sign of the quadratic form, Equation 10.1:

A₁ = ( 4  3 )   ⇒   x′A₁x = 4x² + 4xy + 2y²
     ( 1  2 )

A₂ = ( 4  3 )   ⇒   x′A₂x = 4x² + 8xy + 2y².
     ( 5  2 )

Considering A₁, a11 = 4 > 0 and a11 a22 − ā² = 4 · 2 − 2² = 4 > 0. Similarly, for A₂, a11 = 4 > 0 and a11 a22 − ā² = 4 · 2 − 4² = −8 < 0. The pair of conditions a11 > 0 and
a11 a22 − ā² > 0 are satisfied by the matrix A₁, but not by A₂.

[Figure: graphs of z = x′A₁x and z = x′A₂x over (x1, x2)]

In the graphs, the quadratic form based on A₁ is always nonnegative, and strictly positive for (x1, x2) ≠ (0, 0), whereas the quadratic form based on A₂ is negative for some values of (x1, x2). For example, at (x1, x2) = (−1, 1), 4x² + 8xy + 2y² = 4(−1)² + 8(−1)(1) + 2(1)² = −2.
♦
The next section describes a test to determine if a matrix is positive or negative definite.
10.3 Positive and negative definite matrices

For the following discussion let A be an n × n square matrix. Any vector x gives a quadratic form, x′Ax. Depending on the properties of the matrix A, it may be that the expression x′Ax is always positive or always negative (whenever x ≠ 0). Definition 10.1 formalizes this concept and the discussion provides criteria for checking when a matrix is positive or negative definite.
CHAPTER 10. QUADRATIC FORMS
326
Definition 10.1: Definite and semidefinite matrices:

• A is positive definite if x′Ax > 0, ∀x ≠ 0;
• A is positive semidefinite if x′Ax ≥ 0 for all x;
• A is negative definite if x′Ax < 0, ∀x ≠ 0;
• A is negative semidefinite if x′Ax ≤ 0 for all x.

Observe that if A is positive (semi)definite then −A is negative (semi)definite. If A is positive definite then A is positive semidefinite, since x′Ax > 0 ⇒ x′Ax ≥ 0. The converse is not true:

A = ( 1  1 ),      x = ( x1 ),
    ( 1  1 )           ( x2 )
so x′Ax = (x1 + x2)² ≥ 0, but setting x1 = −x2 ≠ 0 gives x′Ax = 0.

How can one determine if a matrix is positive or negative definite? Sometimes the nature of the problem provides an answer. For example, suppose that B is an n × k matrix and that A = B′B. Then x′Ax = x′B′Bx = z′z = z1² + · · · + zn² ≥ 0, where z = Bx. So, whenever A can be expressed as A = B′B, x′Ax ≥ 0 for all x. Furthermore, if the rank of B is k, then for any x ≠ 0, z = Bx ≠ 0, so that x′Ax = z1² + · · · + zn² > 0 whenever x ≠ 0.
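The B′B construction can be checked numerically. A sketch, with an arbitrary illustrative B of full column rank (the helper names are ours):

```python
def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    # naive product of matrices stored as lists of rows
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def quad_form(A, x):
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

B = [[1, 0], [1, 1], [0, 2]]       # 3 x 2 with rank 2 (illustrative choice)
A = matmul(transpose(B), B)        # A = B'B, so x'Ax = (Bx)'(Bx)
print(A)                           # [[2, 1], [1, 5]]
for x in [[1, 0], [0, 1], [1, -1], [-2, 3]]:
    assert quad_form(A, x) > 0     # full column rank makes A positive definite
```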
In general, the matrix A will have no obvious structure implying the sign of the corresponding quadratic form, which motivates the need for a systematic procedure to investigate definiteness of the matrix. The systematic procedure is based on the properties of "minors" derived from the matrix A.

Definition 10.2: Given a square matrix A, the determinant of any k × k matrix (0 < k ≤ n) obtained by deleting any n − k rows and n − k columns is called a minor (or a kth-order minor, to identify the dimensions of the matrix on which the determinant is derived).

• The (i, j) minor of a square matrix A, denoted mᵢⱼ, is the determinant of the matrix obtained from A by deleting the ith row and jth column.
• A minor obtained from a square matrix A as the determinant of a k × k matrix given by deleting n − k matching rows and columns is called a (kth-order) principal minor.
• A principal minor is called a leading principal minor if it is the determinant of the submatrix obtained from A by deleting the last n − k rows and columns, 0 < k ≤ n.
Definition 10.3: The trace, tr(A), of a square matrix A is the sum of the diagonal elements of the matrix: tr(A) = a11 + a22 + · · · + ann.

P ROPOSITION 10.1: The following observations are useful:

1. If A is positive definite then aᵢᵢ > 0, ∀i; and if A is positive semidefinite then aᵢᵢ ≥ 0, ∀i.
2. If A is positive definite then rank(A) = n.
3. If A is positive definite then tr(A) > 0.
4. The determinant of a symmetric positive definite matrix is positive. ♦

P ROOF: Let x = (0, 0, …, 1, …, 0), a vector with 1 in the ith position and 0 elsewhere. Then x′Ax = aᵢᵢ. If A is positive definite then aᵢᵢ > 0, ∀i; and if A is positive semidefinite then aᵢᵢ ≥ 0, ∀i. The third result follows by summing these inequalities. To show the second result, note that if A is positive definite, then for all x ≠ 0, x′Ax ≠ 0. Therefore, for all x ≠ 0, Ax ≠ 0, which implies linear independence of the columns of A, so that A has full rank, n. The positive determinant property follows from Theorem 10.1.
E XAMPLE 10.4: Consider the matrix A:

A = ( 1  −4 )
    ( 0   1 ).

This matrix has positive trace, determinant and leading principal minors, but is not positive definite. (With x′ = (2, 1), x′Ax = −3.) Notice that the matrix A is not symmetric. ♦

10.3.1 Symmetric matrices
Recall that a matrix A is symmetric if A = A ′ (or a i j = a ji for all i, j). The following considerations show that there is no loss of generality in considering only symmetric matrices when checking for positive definiteness. Consider any nonsymmetric matrix B . Noting that x′ Bx = x′ B ′ x, if B is positive definite then so is B + B ′ since 2 x′ Bx = x′ Bx + x′ B ′ x = x′ (B + B ′) x, so that x′ Bx > 0 if and only if x′ (B + B ′) x > 0. A matrix B is positive definite (semidefinite) if and only if the
matrix B + B ′ is positive definite (semidefinite). Note that B + B ′ is symmetric.
Therefore, to determine if B is positive definite (positive semidefinite) etc., one
need only confirm that (B + B ′ ) is positive definite (positive semidefinite). In discussing (semi)definite matrices we need only consider symmetric matrices.
10.3.2 Criteria for positive and negative definiteness
In general, it is difficult to check positive or negative definiteness by inspecting a matrix. An alternative criterion is based on considering a sequence of determinants. Let A be a symmetric n × n matrix and let Dᵢ be the i × i submatrix of A obtained by deleting the last n − i rows and the last n − i columns. Thus |Dᵢ| is the ith leading principal minor. In words, for positive definiteness, the leading principal minors {|Dᵢ|}, i = 1, …, n, must all be positive; for negative definiteness they must alternate in sign, starting negative:

A = ( a11  a12  · · ·  a1n )
    ( a21  a22  · · ·  a2n )
    (  ⋮    ⋮    ⋱     ⋮  )
    ( an1  an2  · · ·  ann ),

Dᵢ = ( a11  a12  · · ·  a1i )
     ( a21  a22  · · ·  a2i )
     (  ⋮    ⋮    ⋱     ⋮  )
     ( ai1  ai2  · · ·  aii ).
With this notation:

Theorem 10.1: Let A be a symmetric matrix. The matrix A is positive definite if and only if |Dᵢ| > 0, ∀i = 1, …, n. That is:

|D₁| > 0, |D₂| > 0, |D₃| > 0, …, |Dₙ| > 0.

The matrix A is negative definite if and only if (−1)ⁱ|Dᵢ| > 0, ∀i = 1, …, n. That is:

|D₁| < 0, |D₂| > 0, |D₃| < 0, …, (−1)ⁿ|Dₙ| > 0.
Thus, with positive definiteness, |Dᵢ| is positive for all i, whereas with negative definiteness, |Dᵢ| alternates in sign, beginning negative.
In the 2 × 2 case, because the matrix is symmetric, a12 = a21 and (assuming) a11 ≠ 0, the quadratic form is:

x′Ax = a11 x1² + 2 a12 x1 x2 + a22 x2² = a11 [ x1 + (a12/a11) x2 ]² + [ a22 − a12²/a11 ] x2².

From this expression, it is clear that if a11 > 0 and a11 a22 − a12² > 0, then provided x1 ≠ 0 or x2 ≠ 0, x′Ax > 0. Note that a11 is the determinant of the matrix D₁, and a11 a22 − a12² is the determinant of the matrix D₂. From these calculations, note that to check positive or negative definiteness, there are at most n determinants to evaluate.
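Theorem 10.1's leading-principal-minor test can be sketched with a small cofactor-expansion determinant (all helper names here are our own, and the sample matrix is merely illustrative):

```python
def det(M):
    # determinant by cofactor expansion along the first row
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(n))

def leading_minors(A):
    # |D_1|, ..., |D_n|: determinants of the top-left k x k submatrices
    n = len(A)
    return [det([row[:k] for row in A[:k]]) for k in range(1, n + 1)]

def is_positive_definite(A):
    return all(d > 0 for d in leading_minors(A))

def is_negative_definite(A):
    # signs must alternate, starting negative: (-1)^k |D_k| > 0
    return all((-1) ** k * d > 0 for k, d in enumerate(leading_minors(A), 1))

A = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]   # illustrative symmetric matrix
print(leading_minors(A))                # [2, 3, 4]
print(is_positive_definite(A))          # True
```

Cofactor expansion is adequate for the small matrices in this chapter; for large matrices one would use a factorization instead.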
10.3.3 Positive and negative semidefiniteness
The next theorem gives similar conditions for positive and negative semidefiniteness and uses the following notation. Let D̃ᵢ be any submatrix of A containing i rows and columns, obtained by deleting any n − i rows and the corresponding columns (if the kth row is deleted, then the kth column is also deleted). A typical D̃ᵢ has the form:

D̃ᵢ = ( akk  akj  · · ·  akl  · · ·  akr )
     ( ajk  ajj  · · ·  ajl  · · ·  ajr )
     (  ⋮    ⋮    ⋱     ⋮          ⋮  )
     ( alk  alj  · · ·  all  · · ·  alr )
     (  ⋮    ⋮          ⋮    ⋱     ⋮  )
     ( ark  arj  · · ·  arl  · · ·  arr ).

The determinant of any D̃ᵢ is an ith-order principal minor. For example, if A is a 3 × 3 matrix:
A = ( a11  a12  a13 )
    ( a21  a22  a23 )
    ( a31  a32  a33 ),

then there are three D̃₂ matrices:

( a11  a12 ),    ( a22  a23 ),    ( a11  a13 ),
( a21  a22 )     ( a32  a33 )     ( a31  a33 )

and hence three principal minors of order 2.

Theorem 10.2: Let A be a symmetric matrix. The matrix A is positive semidefinite if and only if:

all |D̃₁| ≥ 0, all |D̃₂| ≥ 0, …, all |D̃ᵢ| ≥ 0, …, all |D̃ₙ| ≥ 0.
The matrix A is negative semidefinite if and only if:

all |D̃₁| ≤ 0, all |D̃₂| ≥ 0, …, all (−1)ⁱ|D̃ᵢ| ≥ 0, …, all (−1)ⁿ|D̃ₙ| ≥ 0.

Observe that the statements for positive and negative semidefiniteness are not obtained from the definiteness criteria simply by replacing strict inequalities with non-strict inequalities: every principal minor, not just every leading principal minor, must be checked. To check for positive semidefiniteness, it is necessary to check that all 1st-order principal minors are nonnegative, all 2nd-order principal minors are nonnegative, all 3rd-order principal minors are nonnegative, and so on. There are n = n!/[(n − 1)!1!] 1st-order principal minors, n!/[(n − 2)!2!] 2nd-order principal minors, n!/[(n − 3)!3!] 3rd-order principal minors, and so on, giving Σᵢ₌₁ⁿ n!/[(n − i)!i!] = 2ⁿ − 1 such minors in total.
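Enumerating all 2ⁿ − 1 principal minors is mechanical with itertools. A sketch of the positive semidefiniteness test (helper names are ours):

```python
from itertools import combinations

def det(M):
    # determinant by cofactor expansion along the first row
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(n))

def principal_minors(A):
    # keep the same index set S of rows and columns, for every nonempty S
    n = len(A)
    return [det([[A[i][j] for j in S] for i in S])
            for k in range(1, n + 1) for S in combinations(range(n), k)]

def is_positive_semidefinite(A):
    return all(m >= 0 for m in principal_minors(A))

A = [[1, 1], [1, 1]]                  # x'Ax = (x1 + x2)^2
print(principal_minors(A))            # [1, 1, 0] -- 2**2 - 1 = 3 minors
print(is_positive_semidefinite(A))    # True
```

This recovers the earlier observation: the matrix with all entries 1 is positive semidefinite but not positive definite, since one principal minor is 0.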
10.4 Definiteness with equality constraints

In constrained optimization problems, checking for maxima or minima leads to considering definiteness of a quadratic form subject to linear constraints: for example, that x′Ax > 0 for all x satisfying Bx = 0, for some m × n matrix B, where

B = ( b11  b12  · · ·  b1n )
    ( b21  b22  · · ·  b2n )
    (  ⋮    ⋮    ⋱     ⋮  )
    ( bm1  bm2  · · ·  bmn ).
First, consider the simplest case, where there is one constraint, so Bx = 0 may be written b′x = 0 with b′ = (b1, b2, …, bn). Form the bordered matrix:

Mkk = ( 0    b1   b2   · · ·  bk  )
      ( b1   a11  a12  · · ·  a1k )
      ( b2   a21  a22  · · ·  a2k )
      ( ⋮     ⋮    ⋮    ⋱     ⋮  )
      ( bk   ak1  ak2  · · ·  akk ),

or, writing bₖ for the first k elements of b and Akk for the matrix obtained from A by deleting rows and columns k + 1, k + 2, …, n, Mkk may be written

Mkk = ( 0    b′ₖ  )
      ( bₖ   Akk ).
Theorem 10.3: Let A be a symmetric matrix and b1 ≠ 0. Then,

1. x′Ax > 0 for all x ≠ 0 with b′x = 0 if and only if |Mkk| < 0, k ≥ 2;
2. x′Ax < 0 for all x ≠ 0 with b′x = 0 if and only if (−1)ᵏ|Mkk| > 0, k ≥ 2.
In the case where m > 1, similar computations apply. Then,

Mkk = ( 0     0    · · ·  0     b11  b12  · · ·  b1k )
      ( 0     0    · · ·  0     b21  b22  · · ·  b2k )
      ( ⋮     ⋮    ⋱     ⋮      ⋮    ⋮    ⋱     ⋮  )
      ( 0     0    · · ·  0     bm1  bm2  · · ·  bmk )
      ( b11   b21  · · ·  bm1   a11  a12  · · ·  a1k )
      ( b12   b22  · · ·  bm2   a21  a22  · · ·  a2k )
      ( ⋮     ⋮    ⋱     ⋮      ⋮    ⋮    ⋱     ⋮  )
      ( b1k   b2k  · · ·  bmk   ak1  ak2  · · ·  akk ).

The matrix Mkk has m + k rows and columns. Writing Akk and Bmk for the component matrices,

Mkk = ( 0      Bmk )
      ( B′mk   Akk ).
Theorem 10.4: Let A be a symmetric matrix and let |Bmm| ≠ 0. Then,

1. x′Ax > 0 for all x ≠ 0 with Bx = 0 if and only if: (−1)ᵐ|Mkk| > 0, k = m + 1, …, n.
2. x′Ax < 0 for all x ≠ 0 with Bx = 0 if and only if: (−1)ᵏ|Mkk| > 0, k = m + 1, …, n.
R EMARK 10.1: This characterization may be expressed in different ways. Let I_p be an identity matrix of dimension p. Then:

( 0    I_k )( 0      Bmk )( 0    I_m )   =   ( B′mk   Akk )( 0    I_m )   =   ( Akk   B′mk )
( I_m  0   )( B′mk   Akk )( I_k  0   )       ( 0      Bmk )( I_k  0   )       ( Bmk   0    ).

From this, noting that

( 0    I_m )′   =   ( 0    I_k ),
( I_k  0   )        ( I_m  0   )

and that the determinant of this permutation matrix is ±1, so its square is 1,

|Mkk| = | ( 0    I_k ) |²  ×  | ( Akk   B′mk ) |   =   | ( Akk   B′mk ) |.
        | ( I_m  0   ) |      | ( Bmk   0    ) |       | ( Bmk   0    ) |

Therefore, the determinantal conditions given previously can also be expressed in terms of matrices of the form:

( Akk   B′mk )
( Bmk   0    ).
R EMARK 10.2: If there is just one constraint, m = 1, then constrained positive definiteness requires |Mkk| < 0, k = m + 1, …, n. More generally, with m > 1, m odd gives |Mkk| < 0, k = m + 1, …, n, while with m even the inequality is reversed, so that |Mkk| > 0, k = m + 1, …, n. In contrast, for constrained negative definiteness, |Mkk| alternates in sign, starting positive at k = m + 1 if m is odd and starting negative if m is even.
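The bordered determinant condition of Theorem 10.3 can be sketched numerically. The matrix and constraint below are hypothetical illustrative data, chosen so that A is indefinite yet the form is positive on the constraint set (helper names are ours):

```python
def det(M):
    # determinant by cofactor expansion along the first row
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(n))

def bordered(A, b, k):
    # M_kk: border the top-left k x k block of A with the constraint vector b
    M = [[0] + list(b[:k])]
    for i in range(k):
        M.append([b[i]] + A[i][:k])
    return M

# With b = (1, 2) the constraint b'x = 0 forces x1 = -2*x2, and then
# x'Ax = x1**2 - x2**2 = 3*x2**2 > 0, even though A itself is indefinite.
A = [[1, 0], [0, -1]]
b = (1, 2)
print(det(bordered(A, b, 2)))   # -3 < 0, as Theorem 10.3(1) requires
```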
Exercises for Chapter 10

Exercise 10.1 Consider the following symmetric matrices:

A = (  3  −1   1 ),      B = ( 3   2   1 )
    ( −1   3  −2 )           ( 2  12   4 )
    (  1  −2   2 )           ( 1   4   2 ).

Confirm that both A and B are positive definite.

Exercise 10.2 Consider the following matrices:

A = ( −3   2    5 ),     B = ( −5   2   1 )
    (  2  −6    1 )          (  2  −1  −1 )
    (  5   1  −13 )          (  1  −1  −7 ).
Confirm that both A and B are negative definite.

Exercise 10.3 Consider the matrix:

A = (  1/4   −1/3 )
    ( −1/3    2/9 ).

Show that A is positive definite for all x = (x1, x2) satisfying (1/2, 2/3) · (x1, x2) = 0.

Exercise 10.4 Consider the matrix:

A = (  1/4   −3/4  )
    ( −3/4    32/9 ).

Show that A is positive definite for all x = (x1, x2) satisfying (2, 3/8) · (x1, x2) = 0.
Exercise 10.5 Consider the matrix:

Q(α, β, γ) = ( α(α − 1)   αβ         αγ       )
             ( αβ          β(β − 1)  βγ       )
             ( αγ          βγ         γ(γ − 1) ).

Let x′Q(α, β, γ)x be a quadratic form. Also, let b = (w, r, p) be a vector of strictly positive entries. Suppose that α = β = γ = 1/4. Show that x′Q(α, β, γ)x < 0 for all x satisfying b′x = 0.
11 Multivariate optimization
11.1 Introduction

This chapter is concerned with multivariate optimization: finding values for (x1, x2, …, xn) to make the function f(x1, x2, …, xn) as large or small as possible. In Section 11.2, the two-variable case is described; Section 11.3 considers the n-variable case. The extension from two to n variables is natural. Section 11.2 provides sufficient conditions for a local optimum and provides motivation for those conditions. Also, in this section, the envelope theorem is described. Section 11.3 provides sufficient conditions for both local and global optima. These conditions are developed in terms of concavity of the objective function and the properties of Hessian matrices in Section 11.4.
11.2 The two-variable case

Consider the problem of choosing (x, y) to maximize z = f(x, y), a function of two variables. The first requirement is that no unilateral variations in x or y improve the value of f, the first-order conditions:

∂f(x, y)/∂x = f_x(x, y) = 0
∂f(x, y)/∂y = f_y(x, y) = 0.        (11.1)

The total differential is dz = f_x dx + f_y dy. This gives the change in the value of the function due to the variation (dx, dy) in x and y from the value (x, y).
Note that dz depends on (x, y), since dz = f_x(x, y)dx + f_y(x, y)dy. When the choice of x and y maximizes f, there can be no changes dx and dy that raise the value of f, as required by the partial derivative conditions above: that f_x and f_y are both 0. Consider the function plotted in Figure 11.1, f(x, y) = 1 − x² − y².

[Figure 11.1: The function f(x, y) = 1 − x² − y²]

There, at (0, 0), f_x = f_y = 0, and it is clear from the figure that this point locates the highest point of the function. Confirming that a point is the maximizing value requires a comparison similar to the univariate test f″(x) < 0. The next theorem summarizes the key result for local optimality of a point (x̄, ȳ).

Theorem 11.1: Suppose that (x̄, ȳ) satisfy

1. (a) f_x(x̄, ȳ) = f_y(x̄, ȳ) = 0
   (b) f_xx(x̄, ȳ) < 0, f_yy(x̄, ȳ) < 0 and [ f_xx(x̄, ȳ) f_yy(x̄, ȳ) − f_xy²(x̄, ȳ) ] > 0;
   then (x̄, ȳ) is a local maximum.

2. (a) f_x(x̄, ȳ) = f_y(x̄, ȳ) = 0
   (b) f_xx(x̄, ȳ) > 0, f_yy(x̄, ȳ) > 0 and [ f_xx(x̄, ȳ) f_yy(x̄, ȳ) − f_xy²(x̄, ȳ) ] > 0;
   then (x̄, ȳ) is a local minimum.

The following examples illustrate the use of the theorem.
E XAMPLE 11.1: Suppose that a firm has a product that is produced at unit cost c(x, y), to be minimized:

min c(x, y) = e^x e^y − ln x − ln y.

What are the unit-cost-minimizing levels of x and y? The first-order conditions are:

c_x = e^x e^y − 1/x = 0
c_y = e^x e^y − 1/y = 0.

Thus, 1/x = 1/y and x = y. From this, e^(2x) = 1/x, so that solving x e^(2x) = 1 gives the solution x = 0.4263. The second-order conditions are:

c_xx = e^x e^y + 1/x²
c_yy = e^x e^y + 1/y²
c_xy = e^x e^y.

So, c_xx > 0 and

c_xx c_yy − c_xy² = [ e^x e^y + 1/x² ][ e^x e^y + 1/y² ] − [ e^x e^y ]²
                  = e^x e^y (1/y²) + e^x e^y (1/x²) + 1/(x² y²) > 0.
♦
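The equation x e^(2x) = 1 in Example 11.1 has no closed-form solution. A bisection sketch (the bracketing interval (0, 1) is our own choice) recovers the quoted value:

```python
import math

def g(x):
    # the first-order conditions reduce to x * exp(2x) = 1 (with y = x)
    return x * math.exp(2 * x) - 1

lo, hi = 0.0, 1.0          # g(0) = -1 < 0 and g(1) = e**2 - 1 > 0
for _ in range(60):        # bisection: halve the bracket 60 times
    mid = (lo + hi) / 2
    if g(mid) > 0:
        hi = mid
    else:
        lo = mid
print(round(lo, 4))        # ≈ 0.4263, matching the text
```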
The next example considers a profit maximization problem.

E XAMPLE 11.2: Suppose that a firm has profit function

f(x, y) = √x + ln(y + 1) − x − (1/2)y.

What values of x and y maximize profit? In this example, the computations are simplified because the function may be written f(x, y) = f₁(x) + f₂(y): f(x, y) = [√x − x] + [ln(y + 1) − (1/2)y], and the cross-partial derivatives vanish.

[Figure: graph of z = f(x, y) = √x + ln(y + 1) − x − (1/2)y]

The first-order conditions are:

f_x = (1/2) x^(−1/2) − 1 = 0
f_y = 1/(y + 1) − 1/2 = 0.

So, x^(1/2) = 1/2 and 2 = y + 1, giving x = 1/4, y = 1. The second derivatives are: f_xx = −(1/4) x^(−3/2), f_yy = −(y + 1)^(−2) and f_xy = 0. So, f_xx and f_yy are both negative in a neighborhood of (x̄, ȳ) = (1/4, 1), and f(x, y) < f(1/4, 1) for (x, y) ≠ (1/4, 1). ♦
R EMARK 11.1: In Example 11.2, the function f(x, y) has cross-partial derivatives that are 0, so that f_xy(x, y) = 0 for all (x, y). Later, in Equation 11.2, with f_x(x̄, ȳ) = f_y(x̄, ȳ) = 0 and f_xy(x̄, ȳ) = f_yx(x̄, ȳ) = 0, the Taylor series expansion reduces to:

f(x, y) = f(x̄, ȳ) + (1/2)[ f_xx(x∗, y∗)(Δx)² + f_yy(x∗, y∗)(Δy)² ].

So, if f_xx(x∗, y∗) < 0 and f_yy(x∗, y∗) < 0, since (Δx)² and (Δy)² are nonnegative (and not both zero when (x, y) ≠ (x̄, ȳ)), f(x, y) < f(x̄, ȳ). If f_xx(x̄, ȳ) < 0 and f_yy(x̄, ȳ) < 0, then this will also hold for (x∗, y∗) close to (x̄, ȳ), so that for all (x, y) close to (x̄, ȳ), f(x, y) < f(x̄, ȳ).
E XAMPLE 11.3: A problem in physics leads to the following objective. Minimize f(x1, x2), where f is defined:

f(x1, x2) = (1/2)(γ1 x1² + γ3 (x2 − x1)² + γ2 x2²) − b x2,    γᵢ > 0, ∀i = 1, 2, 3.

Differentiating with respect to x1 and x2 gives:

f_x1 = γ1 x1 − γ3 (x2 − x1) = 0
f_x2 = γ3 (x2 − x1) + γ2 x2 − b = 0.

The second derivatives are: f_x1x1 = γ1 + γ3, f_x1x2 = −γ3, f_x2x2 = γ2 + γ3. This gives the matrix H as

H = ( f_x1x1   f_x1x2 )   =   ( γ1 + γ3    −γ3     )
    ( f_x1x2   f_x2x2 )       ( −γ3        γ2 + γ3 ).

To solve the first-order conditions, observe that they can be rearranged as:

( γ1 + γ3    −γ3     ) ( x1 )   =   ( 0 )
( −γ3        γ2 + γ3 ) ( x2 )       ( b ).

Cramer's rule may be used to solve for x1 and x2. The determinant of H is

|H| = (γ1 + γ3)(γ2 + γ3) − γ3² = γ1γ2 + γ1γ3 + γ2γ3 + γ3² − γ3² = γ1γ2 + γ1γ3 + γ2γ3.

So,

x1 = (1/|H|) | 0     −γ3     |,      x2 = (1/|H|) | γ1 + γ3    0 |.
             | b     γ2 + γ3 |                    | −γ3        b |

Or,

x1 = b γ3 / (γ1γ2 + γ1γ3 + γ2γ3),      x2 = b (γ1 + γ3) / (γ1γ2 + γ1γ3 + γ2γ3).

Since

f_x1x1 = γ1 + γ3 > 0,    f_x1x1 f_x2x2 − f_x1x2² = γ1γ2 + γ1γ3 + γ2γ3 > 0,

the second-order condition for a minimum is satisfied.

[Figure: graph of f(x, y) = x² + (y − x)² + y² − y, the case γᵢ = 2, ∀i, b = 1]
♦
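The closed-form solution by Cramer's rule in Example 11.3 can be coded directly (the function name `solve_foc` is ours):

```python
def solve_foc(g1, g2, g3, b):
    # H x = (0, b)' with H = [[g1 + g3, -g3], [-g3, g2 + g3]];
    # |H| = g1*g2 + g1*g3 + g2*g3, then Cramer's rule gives x1 and x2
    detH = g1 * g2 + g1 * g3 + g2 * g3
    return b * g3 / detH, b * (g1 + g3) / detH

x1, x2 = solve_foc(2, 2, 2, 1)     # the plotted case: gamma_i = 2, b = 1
# check the first-order conditions at the solution
assert abs(2 * x1 - 2 * (x2 - x1)) < 1e-12
assert abs(2 * (x2 - x1) + 2 * x2 - 1) < 1e-12
print(x1, x2)
```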
11.2.1 Motivation for the second-order conditions
Recall that in the univariate case, a Taylor series expansion gave:

f(x) = f(x̄) + f′(x̄)(x − x̄) + (1/2!) f″(p)(x − x̄)²

for some p between x and x̄. So, with f′(x̄) = 0, f(x) = f(x̄) + (1/2) f″(p)(x − x̄)², so that if f″(p) < 0 for all p close to x̄, then f(x) < f(x̄) and x̄ is a local maximum. Essentially
the same ideas apply to the multivariate case. In the two-variable case, a Taylor series expansion of f around (x̄, ȳ), with (x, y) = (x̄ + Δx, ȳ + Δy), is given by:

f(x, y) = f(x̄, ȳ) + [ f_x(x̄, ȳ)Δx + f_y(x̄, ȳ)Δy ]
          + (1/2)[ f_xx(x∗, y∗)(Δx)² + 2 f_xy(x∗, y∗)ΔxΔy + f_yy(x∗, y∗)(Δy)² ]     (11.2)
        ≈ f(x̄, ȳ) + [ f_x(x̄, ȳ)Δx + f_y(x̄, ȳ)Δy ]
          + (1/2)[ f_xx(x̄, ȳ)(Δx)² + 2 f_xy(x̄, ȳ)ΔxΔy + f_yy(x̄, ȳ)(Δy)² ],        (11.3)

where (x∗, y∗) is on the line segment joining the points (x, y) and (x̄, ȳ):

(x∗, y∗) ∈ { (x̃, ỹ) | (x̃, ỹ) = θ(x, y) + (1 − θ)(x̄, ȳ), θ ∈ [0, 1] },

and where Δx = x − x̄ and Δy = y − ȳ. The Taylor series expansion is exact with (x∗, y∗) (Equation 11.2) and approximate with (x̄, ȳ) (Equation 11.3). The expression may be written in matrix terms:

f(x, y) = f(x̄, ȳ) + [ f_x(x̄, ȳ)Δx + f_y(x̄, ȳ)Δy ] + (1/2) (Δx  Δy) H(x∗, y∗) ( Δx )
                                                                               ( Δy )     (11.4)
        ≈ f(x̄, ȳ) + [ f_x(x̄, ȳ)Δx + f_y(x̄, ȳ)Δy ] + (1/2) (Δx  Δy) H(x̄, ȳ) ( Δx ),
                                                                              ( Δy )      (11.5)

where

H(x, y) = ( f_xx(x, y)   f_xy(x, y) )
          ( f_yx(x, y)   f_yy(x, y) )

is called the Hessian matrix. In the case where f(x, y) = 1 − x² − y²,

H(x, y) = ( −2    0 ),
          (  0   −2 )

so the expression in Equation 11.4 reduces to

f(x, y) = f(x̄, ȳ) + [ f_x(x̄, ȳ)Δx + f_y(x̄, ȳ)Δy ] − [ (Δx)² + (Δy)² ],

and at (x̄, ȳ), if f_x(x̄, ȳ) = f_y(x̄, ȳ) = 0, the expression reduces to f(x, y) = f(x̄, ȳ) − [(Δx)² + (Δy)²]. Since the second term is less than or equal to 0 for all values of Δx and Δy, the value of f at (x̄, ȳ) is at least as large as at any other (x, y) value. In this case, and in general, the properties of the Hessian matrix are crucial in determining whether a point satisfying the first-order conditions (Equation 11.1) determines a local maximum (or minimum).
Note that the entries (partial derivatives) in the Hessian matrix depend on x and y: writing H(x, y) emphasizes that the entries are evaluated at (x, y). If the first-order conditions hold at (x̄, ȳ), then f_x(x̄, ȳ) = f_y(x̄, ȳ) = 0. In this case, Equations 11.2 and 11.3 become:

f(x, y) = f(x̄, ȳ) + (1/2)[ f_xx(x∗, y∗)(Δx)² + 2 f_xy(x∗, y∗)ΔxΔy + f_yy(x∗, y∗)(Δy)² ]     (11.6)
        ≈ f(x̄, ȳ) + (1/2)[ f_xx(x̄, ȳ)(Δx)² + 2 f_xy(x̄, ȳ)ΔxΔy + f_yy(x̄, ȳ)(Δy)² ],        (11.7)

where (x∗, y∗) is on the line segment joining (x, y) and (x̄, ȳ). In terms of the Hessian matrix:

f(x, y) = f(x̄, ȳ) + (1/2) (Δx  Δy) H(x∗, y∗) ( Δx )
                                              ( Δy )        (11.8)
        ≈ f(x̄, ȳ) + (1/2) (Δx  Δy) H(x̄, ȳ) ( Δx ).
                                             ( Δy )         (11.9)

The sign of this expression depends on properties of the matrix:

H(x, y) = ( f_xx(x, y)   f_xy(x, y) )
          ( f_xy(x, y)   f_yy(x, y) ).

The key property is called negative definiteness.

Definition 11.1: A square symmetric matrix

A = ( a   b )
    ( b   c )

is said to be negative definite if for all z = (z1, z2) ≠ (0, 0), z′Az < 0. The matrix is said to be positive definite if z′Az > 0 for all such z.
Recall the criteria for negative and positive definiteness:

P ROPOSITION 11.1: In terms of the elements of the matrix, positive definiteness and negative definiteness are characterized as follows:

1. A is negative definite if and only if a < 0 and ac − b² > 0, and
2. A is positive definite if and only if a > 0 and ac − b² > 0.
♦
R EMARK 11.2: This implies that if A is negative (positive) definite and Ã is (sufficiently) close to A, then Ã is negative (positive) definite. For example, if a < 0 and ac − b² > 0, and ã, b̃, c̃ are close to a, b, c, respectively, then ã < 0 and ãc̃ − b̃² > 0, so that Ã is negative definite.
Consequently, H(x, y) is negative definite at (x, y) if:

f_xx(x, y) < 0 and f_xx(x, y) f_yy(x, y) − f_xy²(x, y) > 0.

Likewise, H(x, y) is positive definite at (x, y) if:

f_xx(x, y) > 0 and f_xx(x, y) f_yy(x, y) − f_xy²(x, y) > 0.
f xx ( x, y) > 0 and f xx ( x, y) f yy ( x, y) − f x2y ( x, y) > 0. ¯ y¯ ) < 0, then for [ f xx ( x, ¯ y¯ ) f yy ( x, ¯ y¯ ) − R EMARK 11.3: As noted earlier, if f xx ( x,
¯ y¯ )] > 0 to hold, it must be that f yy ( x, ¯ y¯ ) < 0, since f x2y ( x, ¯ y¯ ) ≡ ( f x y ( x, ¯ y¯ ))2 ≥ 0. f x2y ( x, £ ¯ y¯ ) < 0 and f xx ( x, ¯ y¯ ) f yy So, condition 1(b) in Theorem 11.1 is equivalent to f xx ( x, ¤ 2 ¯ y¯ ) − f x y ( x, ¯ y¯ ) > 0. An analogous equivalence applies in the local minimum ( x,
case.
Given the previous discussion, the main theorem, Theorem 11.1, is straightforward to prove.

Theorem 11.1 Suppose that (x̄, ȳ) satisfy

1. (a) f_x(x̄, ȳ) = f_y(x̄, ȳ) = 0
   (b) f_xx(x̄, ȳ) < 0, f_yy(x̄, ȳ) < 0 and [ f_xx(x̄, ȳ) f_yy(x̄, ȳ) − f_xy²(x̄, ȳ) ] > 0;
   then (x̄, ȳ) is a local maximum.

2. (a) f_x(x̄, ȳ) = f_y(x̄, ȳ) = 0
   (b) f_xx(x̄, ȳ) > 0, f_yy(x̄, ȳ) > 0 and [ f_xx(x̄, ȳ) f_yy(x̄, ȳ) − f_xy²(x̄, ȳ) ] > 0;
   then (x̄, ȳ) is a local minimum.

P ROOF: Suppose that at (x̄, ȳ), f_x(x̄, ȳ) = f_y(x̄, ȳ) = 0, and suppose also that condition 1(b) is satisfied, so that H(x̄, ȳ) is negative definite. Given a small number η > 0, let −η < Δx < η and −η < Δy < η. Since (x∗, y∗) lies between (x̄, ȳ) and (x̄ + Δx, ȳ + Δy) in Equation 11.8, (x∗, y∗) is within η of (x̄, ȳ). In view of the preceding observation (Remark 11.2), it must be that H(x∗, y∗) is negative definite for small η (since H(x∗, y∗) is close to H(x̄, ȳ)). Therefore,

f(x, y) = f(x̄, ȳ) + (1/2) (Δx  Δy) H(x∗, y∗) ( Δx )      and      (Δx  Δy) H(x∗, y∗) ( Δx )  <  0,
                                              ( Δy )                                  ( Δy )

so it must be that f(x, y) < f(x̄, ȳ). Similar reasoning applies to the case of a local minimum.
As mentioned in Remark 11.3, the conditions f_xx(x, y) < 0 and f_xx(x, y) f_yy(x, y) − f_xy²(x, y) > 0 together imply that f_yy(x, y) < 0, so that the pair of conditions [ f_xx(x, y) < 0 and f_xx(x, y) f_yy(x, y) − f_xy²(x, y) > 0 ] is equivalent to [ f_xx(x, y) < 0, f_yy(x, y) < 0 and f_xx(x, y) f_yy(x, y) − f_xy²(x, y) > 0 ]. However, as a practical matter, it is often useful to check the simpler conditions f_xx(x, y) < 0 and f_yy(x, y) < 0 first: if exactly one of them fails, then the condition f_xx(x, y) f_yy(x, y) − f_xy²(x, y) > 0 cannot be satisfied.
¯ y) ¯ + R EMARK 11.4: In since the Taylor series expansion f (x, y) = f ( x, Ã fact, ! ∆ x ¯ y) ¯ with f x ( x, ¯ y) ¯ = (∆ x ∆ y)H(x∗ , y∗ ) holds generally (given any (x, y), and ( x, ∆y ¯ y) ¯ = 0, there is (x∗ , y∗ ) such that this equation is satisfied). So, for examf y ( x, ¯ y) ¯ for all ple, if H(x, y) were negative definite for all (x, y), then f (x, y) < f ( x, ¯ y) ¯ and the firstorder conditions give a global maximum. (x, y) 6= ( x,
With a view to generalization to the many-variable case, the second-order conditions can be expressed directly in terms of submatrices of the Hessian matrix $H$. Write $H_1$ for the matrix consisting of just the top left entry of $H$, $H_1 = (f_{xx})$, and $H_2$ for the matrix $H$ itself. Because the determinant of a matrix with one row and column is just the single entry of that matrix (if $A = [a]$, $|A| = a$),

$$|H_1| = f_{xx} \quad\text{and}\quad |H_2| = f_{xx} f_{yy} - f_{xy}^2.$$

Consequently:

REMARK 11.5: The second-order conditions for a maximum are $|H_1| < 0$ and $|H_2| > 0$. The second-order conditions for a minimum are $|H_1| > 0$ and $|H_2| > 0$.

In the case of two variables, the benefit from expressing the second-order conditions in terms of determinants derived from the Hessian matrix is not significant. However, with three or more variables, the matrix presentation is more succinct. (See Section 11.3.)
EXAMPLE 11.4: Let $f(x,y) = -2x^2 - 7y^2 + xy + 12x + 37y$. Then $f_x = -4x + y + 12$, $f_{xx} = -4$, $f_y = -14y + x + 37$, $f_{yy} = -14$, $f_{xy} = 1$. Solving $f_x = 0$, $f_y = 0$ gives $x = \frac{41}{11}$, $y = \frac{32}{11}$.

$$H = \begin{pmatrix} -4 & 1 \\ 1 & -14 \end{pmatrix}, \qquad |H_1| = -4 < 0, \qquad |H_2| = \begin{vmatrix} -4 & 1 \\ 1 & -14 \end{vmatrix} = 55 > 0,$$

so the Hessian is negative definite. Consequently, the solution to the first-order conditions is a local maximum. ♦

The next example involves somewhat more complex computation.

EXAMPLE 11.5: Consider the problem:

$$\max f(x,y) = \sqrt{x}\,[1 + \ln(y+1)] - x - y^2.$$

[graph of $f(x,y) = \sqrt{x}\,[1+\ln(y+1)] - x - y^2$]

The first-order conditions are:

$$f_x = \tfrac{1}{2} x^{-\frac{1}{2}}[1 + \ln(y+1)] - 1 = 0, \qquad f_y = \frac{\sqrt{x}}{y+1} - 2y = 0.$$

From the first-order conditions, $x^{\frac{1}{2}} = \tfrac{1}{2}[1 + \ln(y+1)]$. Substituting into the equation $f_y = 0$ gives $\tfrac{1}{2}[1 + \ln(y+1)] = 2y(y+1)$. Solving for $y$ gives $y = 0.2448129$. Thus, $x = \tfrac{1}{4}[1 + \ln(1.2448129)]^2$. Next, consider the second-order conditions:

$$f_{xx} = -\tfrac{1}{4} x^{-\frac{3}{2}}[1 + \ln(y+1)] < 0, \qquad f_{yy} = -\frac{\sqrt{x}}{(y+1)^2} - 2, \qquad f_{xy} = \tfrac{1}{2} x^{-\frac{1}{2}} \frac{1}{y+1}.$$

Thus,

$$f_{xx} f_{yy} - f_{xy}^2 = \left(-\tfrac{1}{4} x^{-\frac{3}{2}}[1+\ln(y+1)]\right)\left(-\frac{\sqrt{x}}{(y+1)^2} - 2\right) - \tfrac{1}{4}\, x^{-1} \frac{1}{(y+1)^2}$$
$$= \frac{1}{4x(y+1)^2}\left([1+\ln(y+1)] - 1\right) + \tfrac{1}{2} x^{-\frac{3}{2}}(1+\ln(y+1))$$
$$= \frac{\ln(y+1)}{4x(y+1)^2} + \tfrac{1}{2} x^{-\frac{3}{2}}(1+\ln(y+1)) > 0.$$

Since $\ln(y+1) = \ln(1.2448129) > 0$, the solution is a local maximum. ♦
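Both examples are easy to confirm numerically. The sketch below (an illustration, not part of the text) checks the critical point of Example 11.4 with exact rational arithmetic and plugs the approximate solution of Example 11.5 back into its first-order conditions.

```python
import math
from fractions import Fraction

# Example 11.4: f(x, y) = -2x^2 - 7y^2 + xy + 12x + 37y, critical point (41/11, 32/11).
x4, y4 = Fraction(41, 11), Fraction(32, 11)
foc_x4 = -4 * x4 + y4 + 12          # f_x, exactly zero at the critical point
foc_y4 = -14 * y4 + x4 + 37         # f_y, exactly zero at the critical point
H2_ex4 = (-4) * (-14) - 1 * 1       # |H_2| = f_xx f_yy - f_xy^2 = 55 > 0, with f_xx < 0

# Example 11.5: y* solves (1/2)[1 + ln(y+1)] = 2y(y+1); x* = (1/4)[1 + ln(1+y*)]^2.
ystar = 0.2448129
xstar = 0.25 * (1 + math.log(1 + ystar)) ** 2
foc_x5 = 0.5 * xstar ** -0.5 * (1 + math.log(ystar + 1)) - 1
foc_y5 = math.sqrt(xstar) / (ystar + 1) - 2 * ystar
```

The Example 11.4 conditions vanish exactly; those of Example 11.5 vanish to roughly the precision of the quoted root.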
The final example in this section considers a problem with general parameter specification.

EXAMPLE 11.6: A firm uses two inputs, gas ($x$) and electricity ($z$), to heat a building during winter. The problem is to decide on the appropriate combination of the inputs to minimize cost. The cost function is given by:

$$c(x,z) = p x^{-\alpha} + \beta \frac{x}{f(z)} + \gamma z, \qquad \alpha, \beta, \gamma > 0,$$

where $f(z)$ is an increasing function ($f'(z) > 0$). The first-order conditions are:

$$c_x(x,z) = -\alpha p x^{-\alpha-1} + \frac{\beta}{f(z)} = 0, \qquad c_z(x,z) = -\beta \frac{x f'(z)}{f(z)^2} + \gamma = 0.$$

And the second derivatives are given by:

$$c_{xx}(x,z) = \alpha(\alpha+1) p x^{-\alpha-2}, \qquad c_{zz}(x,z) = -\beta x \left[-2\frac{(f'(z))^2}{f(z)^3} + \frac{f''(z)}{f(z)^2}\right], \qquad c_{xz}(x,z) = -\frac{\beta}{f(z)^2} f'(z).$$

Thus, the second-order conditions are:

$$c_{xx} = \alpha(\alpha+1) p x^{-\alpha-2} > 0, \qquad c_{zz} = -\beta x \left[-2\frac{(f'(z))^2}{f(z)^3} + \frac{f''(z)}{f(z)^2}\right] > 0,$$

$$c_{xx} c_{zz} - c_{xz}^2 = -\left(\alpha(\alpha+1) p x^{-\alpha-2}\right) \beta x \left[-2\frac{(f'(z))^2}{f(z)^3} + \frac{f''(z)}{f(z)^2}\right] - \frac{\beta^2}{f(z)^4} (f'(z))^2 > 0.$$

These conditions are impossible to confirm without a specific functional form for $f$. Consider the case where $f(z) = e^z$, so that $f(z) = f'(z) = f''(z) = e^z$. Then $\frac{(f'(z))^2}{f(z)^3} = \frac{1}{f(z)} = \frac{f''(z)}{f(z)^2}$, so that

$$c_{zz}(x,z) = -\beta x \left[-\frac{2}{f(z)} + \frac{1}{f(z)}\right] = -\beta x \frac{1}{f(z)}(-1) = \frac{\beta x}{f(z)}.$$

Also, $c_{xz}(x,z) = -\frac{\beta}{f(z)^2} f'(z) = -\frac{\beta}{f(z)}$. Thus, the second-order conditions reduce to:

$$c_{xx} = \alpha(\alpha+1) p x^{-\alpha-2} > 0, \qquad c_{zz} = \frac{\beta x}{f(z)} > 0, \qquad c_{xx} c_{zz} - c_{xz}^2 = \left[\alpha(\alpha+1) p x^{-\alpha-1}\right] \frac{\beta}{f(z)} - \left(\frac{\beta}{f(z)}\right)^2 > 0.$$

The first-order condition gives $-\alpha p x^{-\alpha-1} + \frac{\beta}{f(z)} = 0$, or $\alpha p x^{-\alpha-1} = \frac{\beta}{f(z)}$. Substitute $\frac{\beta}{f(z)}$ for $\alpha p x^{-\alpha-1}$ in the last condition to obtain

$$\left[(\alpha+1)\frac{\beta}{f(z)}\right]\left[\frac{\beta}{f(z)}\right] - \left(\frac{\beta}{f(z)}\right)^2 = \alpha \left(\frac{\beta}{f(z)}\right)^2 > 0,$$

which verifies that the second-order condition is satisfied. To solve for $x$ and $z$, recall the first-order conditions (with $f(z) = e^z$):

$$c_x(x,z) = -\alpha p x^{-\alpha-1} + \frac{\beta}{f(z)} = 0, \qquad c_z(x,z) = -\beta \frac{x}{f(z)} + \gamma = 0.$$

From $c_x(x,z) = 0$, $\alpha p x^{-\alpha-1} = \frac{\beta}{f(z)}$, so $x^{-(\alpha+1)} = \frac{\beta}{\alpha p f(z)}$. Rearranging,

$$x = \left(\frac{1}{\alpha p}\right)^{\frac{-1}{\alpha+1}} \left(\frac{\beta}{f(z)}\right)^{\frac{-1}{\alpha+1}} = (\alpha p)^{\frac{1}{\alpha+1}} \left(\frac{f(z)}{\beta}\right)^{\frac{1}{\alpha+1}},$$

and from $c_z(x,z) = 0$, $\beta \frac{x}{f(z)} = \gamma$, so that, substituting for $x$,

$$\left(\frac{\beta}{f(z)}\right) \left(\frac{1}{\alpha p}\right)^{\frac{-1}{\alpha+1}} \left(\frac{\beta}{f(z)}\right)^{\frac{-1}{\alpha+1}} = \gamma.$$

Rearranging,

$$\gamma = \left(\frac{1}{\alpha p}\right)^{\frac{-1}{\alpha+1}} \left(\frac{\beta}{f(z)}\right)^{1-\frac{1}{\alpha+1}} = \left(\frac{1}{\alpha p}\right)^{\frac{-1}{\alpha+1}} \left(\frac{\beta}{f(z)}\right)^{\frac{\alpha}{\alpha+1}}.$$

So,

$$\left(\frac{\beta}{f(z)}\right)^{\frac{\alpha}{\alpha+1}} = \gamma \left(\frac{1}{\alpha p}\right)^{\frac{1}{\alpha+1}} \quad\text{and then}\quad \frac{\beta}{f(z)} = \gamma^{\frac{\alpha+1}{\alpha}} \left(\frac{1}{\alpha p}\right)^{\frac{1}{\alpha}}.$$

Thus,

$$f(z) = \beta\, \gamma^{\frac{-(\alpha+1)}{\alpha}} \left(\frac{1}{\alpha p}\right)^{\frac{-1}{\alpha}} \quad\text{or}\quad z = f^{-1}\!\left(\beta\, \gamma^{\frac{-(\alpha+1)}{\alpha}} \left(\frac{1}{\alpha p}\right)^{\frac{-1}{\alpha}}\right).$$

When $f(z) = e^z$, $q = f(z)$ gives $z = f^{-1}(q) = \ln(q)$. Thus,

$$z = \ln\!\left[\beta\, \gamma^{\frac{-(\alpha+1)}{\alpha}} \left(\frac{1}{\alpha p}\right)^{\frac{-1}{\alpha}}\right]. \qquad ♦$$
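The closed-form solution of Example 11.6 with $f(z) = e^z$ can be substituted back into the first-order conditions. The parameter values below ($\alpha = 1$, $\beta = 2$, $\gamma = 0.5$, $p = 3$) are illustrative choices, not taken from the text.

```python
import math

alpha, beta, gamma, p = 1.0, 2.0, 0.5, 3.0

# Closed form from the example: f(z*) = beta * gamma^(-(alpha+1)/alpha) * (alpha*p)^(1/alpha).
fz = beta * gamma ** (-(alpha + 1) / alpha) * (alpha * p) ** (1 / alpha)
z = math.log(fz)                                       # f(z) = e^z, so z* = ln(f(z*))
x = (alpha * p) ** (1 / (alpha + 1)) * (fz / beta) ** (1 / (alpha + 1))

# First-order conditions of c(x, z) = p x^(-alpha) + beta x / e^z + gamma z.
c_x = -alpha * p * x ** (-alpha - 1) + beta / fz
c_z = -beta * x / fz + gamma
```

With these values $f(z^*) = 24$ and $x^* = 6$, and both derivatives vanish.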
11.2.2 Failure of the second-order conditions
The second-order conditions confirm that at a candidate optimum, in a neighborhood of that point the function "bends down" in all directions in the case of maximization, and "bends up" in all directions in the case of minimization. There are a variety of ways in which this condition may fail. The following examples illustrate two common cases.

Consider the function $f(x,y) = x^2 - y^2$. The function is depicted in Figure 11.2 and is sometimes called a saddle function in view of its shape. As the figure indicates, the point $(0,0)$ satisfies the first-order conditions, but from there variations in $x$ raise $f$ while variations in $y$ reduce $f$. Observe that $f_x = 2x$, $f_y = -2y$, $f_{xx} = 2$, $f_{yy} = -2$ and $f_{xy} = 0$, so the Hessian matrix is:

$$H = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}.$$

Thus, $|H_1| = 2$ and $|H_2| = -4$, so that neither the conditions for a local maximum nor those for a local minimum are satisfied.

[Figure 11.2: The function $f(x,y) = x^2 - y^2$]

As a second example, consider the function $f(x,y) = x^2 y$, plotted in Figure 11.3. Again, the point $(0,0)$ satisfies the first-order conditions, since $f_x = 2xy$ and $f_y = x^2$, but the function twists at $(0,0)$. The Hessian matrix is:

$$H = \begin{pmatrix} 2y & 2x \\ 2x & 0 \end{pmatrix}.$$

So $|H_1| = 2y$ and $|H_2| = -4x^2$. Both of these are 0 at $(x,y) = (0,0)$.

[Figure 11.3: The function $f(x,y) = x^2 y$]
11.2.3 The envelope theorem
Consider the problem of maximizing a function of $x$ when the function depends on some exogenous parameter $\alpha$: $f(x,\alpha)$. For example: $f(x,\alpha) = x^{\frac{1}{2}} - \alpha x$. The usual procedure is to set the first derivative to zero: $f_x(x,\alpha) = 0$. This gives the solution $x^* = x(\alpha)$ as a function of $\alpha$, chosen to satisfy $f_x(x(\alpha),\alpha) = 0$. As $\alpha$ varies, so will the optimal choice of $x$, to maintain the identity $f_x(x(\alpha),\alpha) \equiv 0$. In the example, $f(x,\alpha) = x^{\frac{1}{2}} - \alpha x$ gives $f_x(x,\alpha) = \tfrac{1}{2} x^{-\frac{1}{2}} - \alpha = 0$. Rearranging, $2\alpha = x^{-\frac{1}{2}}$, or $x(\alpha) = \left(\frac{1}{2\alpha}\right)^2$. If the optimal choice of $x$ given $\alpha$, $x(\alpha)$, is substituted into the function $f$, then $f(x(\alpha),\alpha)$ is the value of the function being maximized, at the solution. Note that $f(x(\alpha),\alpha)$ depends only on $\alpha$; the value of the optimized function may be written $f^*(\alpha) \equiv f(x(\alpha),\alpha)$. In the example:

$$f^*(\alpha) = f(x(\alpha),\alpha) = \left[\left(\frac{1}{2\alpha}\right)^2\right]^{\frac{1}{2}} - \alpha\left(\frac{1}{2\alpha}\right)^2 = \left(\frac{1}{2\alpha}\right) - \left(\frac{1}{4\alpha}\right) = \left(\frac{1}{4\alpha}\right).$$

Now consider the following question. If $\alpha$ varies, how much will the value of the maximized function vary? That is, what is the value of $\frac{df^*(\alpha)}{d\alpha}$? Differentiating:

$$\frac{df^*(\alpha)}{d\alpha} = f_x(x(\alpha),\alpha)\frac{dx(\alpha)}{d\alpha} + f_\alpha(x(\alpha),\alpha).$$

Note now that from the first-order condition, $f_x(x(\alpha),\alpha) = 0$ at each value of $\alpha$, so that for each $\alpha$,

$$\frac{df^*(\alpha)}{d\alpha} = f_\alpha(x(\alpha),\alpha).$$

In words, the change in the maximized value of the function, $f^*(\alpha)$, as the parameter $\alpha$ changes, is equal to the derivative of the (unoptimized) function, $f(x,\alpha)$, with respect to the parameter, with the derivative evaluated at the solution $x(\alpha)$: $f_\alpha(x^*(\alpha),\alpha)$. In the example, $f^*(\alpha) = \left(\frac{1}{4\alpha}\right)$, so that

$$\frac{df^*(\alpha)}{d\alpha} = -\frac{1}{4\alpha^2} = -\left(\frac{1}{2\alpha}\right)^2 = -x(\alpha).$$

If we differentiate $f(x,\alpha) = x^{\frac{1}{2}} - \alpha x$ with respect to $\alpha$ and evaluate at the solution $x(\alpha)$, we get $-x(\alpha)$. To maximize $f(x,\alpha) = x^{\frac{1}{2}} - \alpha x$, the optimal choice of $x$ is that which makes the gap between $x^{\frac{1}{2}}$ and $\alpha x$ as large as possible. In Figure 11.4, this is illustrated with the gap maximized at $x(\alpha)$ and with resulting value $x(\alpha)^{\frac{1}{2}} - \alpha x(\alpha) = a - b$.

[Figure 11.4: The impact of parameter variation on the objective value]

In the figure, profit at $\alpha$ is $(a-b)$ and at $\alpha'$ profit is $(d-f)$. Observe that

$$\pi(\alpha') = d - f \approx (a - c) = (a - b) - (c - b) = \pi(\alpha) - (\alpha' - \alpha)x(\alpha),$$

so

$$\frac{\pi(\alpha') - \pi(\alpha)}{\alpha' - \alpha} \approx -x(\alpha).$$
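The approximation above can be checked with a finite difference. This sketch (illustrative, not from the text) compares a central difference of $f^*(\alpha) = \frac{1}{4\alpha}$ with $-x(\alpha)$.

```python
# f(x, a) = x^(1/2) - a*x has x(a) = (1/(2a))^2 and optimized value f*(a) = 1/(4a).
def f_star(a):
    return 1 / (4 * a)

a, h = 0.5, 1e-6
numeric = (f_star(a + h) - f_star(a - h)) / (2 * h)   # approximates df*/da
x_of_a = (1 / (2 * a)) ** 2                           # envelope theorem: df*/da = -x(a)
```

At $\alpha = 0.5$, both the finite difference and $-x(\alpha)$ equal $-1$.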
This observation extends to the general case. Let $f(x_1,\dots,x_n,\alpha_1,\dots,\alpha_m)$. For ease of notation write $x = (x_1,\dots,x_n)$ and $\alpha = (\alpha_1,\dots,\alpha_m)$. The first-order conditions are $f_{x_i}(x,\alpha) = 0$, $i = 1,\dots,n$. Solving these gives $x(\alpha) = (x_1(\alpha),\dots,x_n(\alpha))$. Then the optimized value of the function is $f^*(\alpha) = f(x(\alpha),\alpha)$. To calculate the change in $f^*$ due to a change in one parameter, say $\alpha_i$, take the partial derivative of $f^*(\alpha)$ with respect to $\alpha_i$. This gives

$$\frac{\partial f^*(\alpha)}{\partial \alpha_i} = \sum_{j=1}^{n} \frac{\partial f(x(\alpha),\alpha)}{\partial x_j}\frac{\partial x_j(\alpha)}{\partial \alpha_i} + \frac{\partial f(x(\alpha),\alpha)}{\partial \alpha_i}.$$

Note now that from the first-order conditions, the partial derivative of $f(x,\alpha)$ with respect to $x_j$, evaluated at $x = x(\alpha)$, is equal to 0 for each $j = 1,\dots,n$: $\frac{\partial f(x(\alpha),\alpha)}{\partial x_j} = 0$, $\forall j$. Thus,

$$\frac{\partial f^*(\alpha)}{\partial \alpha_i} = \frac{\partial f(x(\alpha),\alpha)}{\partial \alpha_i}.$$
The following examples illustrate this result.
EXAMPLE 11.7: The profit function of a firm depends on output $Q$ and advertising $A$. Profit is given by $\pi(Q,A) = 500 - 4Q^2 - 8Q + 6AQ - 10A^2 + 100A$. So, the first-order conditions are:

$$\pi_Q = -8Q + 6A - 8 = 0, \qquad \pi_A = 6Q - 20A + 100 = 0.$$

Thus,

$$\begin{pmatrix} -8 & 6 \\ 6 & -20 \end{pmatrix} \begin{pmatrix} Q \\ A \end{pmatrix} = \begin{pmatrix} 8 \\ -100 \end{pmatrix}.$$

Solving these gives $Q = \frac{110}{31}$ and $A = \frac{188}{31}$. Since $\pi_{QQ} = -8$, $\pi_{AA} = -20$ and $\pi_{QA} = 6$, $\pi_{QQ}\pi_{AA} - \pi_{QA}^2 = 160 - 36 = 124$. So, the second-order conditions for a maximum are satisfied. ♦
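A quick check of Example 11.7 (illustrative, not from the text): the candidate $(Q,A) = (110/31, 188/31)$ satisfies both first-order conditions exactly.

```python
from fractions import Fraction

Q, A = Fraction(110, 31), Fraction(188, 31)

pi_Q = -8 * Q + 6 * A - 8       # first-order condition for Q
pi_A = 6 * Q - 20 * A + 100     # first-order condition for A

det_H = (-8) * (-20) - 6 * 6    # |H_2| = 124 > 0, and pi_QQ = -8 < 0: a maximum
```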
Example 11.8 develops Example 11.7 to illustrate the envelope theorem.

EXAMPLE 11.8: Suppose that the profit function is now $\pi(Q,A) = 500 - \alpha Q^2 - \beta Q + \gamma AQ - 5A^2 + 50A$, where $\alpha$, $\beta$ and $\gamma$ are parameters. In this case, the first-order conditions become:

$$\pi_Q = -2\alpha Q - \beta + \gamma A = 0, \qquad \pi_A = \gamma Q - 10A + 50 = 0.$$

Rearranging:

$$\begin{pmatrix} -2\alpha & \gamma \\ \gamma & -10 \end{pmatrix} \begin{pmatrix} Q \\ A \end{pmatrix} = \begin{pmatrix} \beta \\ -50 \end{pmatrix}.$$

Using Cramer's rule:

$$\Delta = \begin{vmatrix} -2\alpha & \gamma \\ \gamma & -10 \end{vmatrix} = 20\alpha - \gamma^2, \qquad \Delta_1 = \begin{vmatrix} \beta & \gamma \\ -50 & -10 \end{vmatrix} = -10\beta + 50\gamma, \qquad \Delta_2 = \begin{vmatrix} -2\alpha & \beta \\ \gamma & -50 \end{vmatrix} = 100\alpha - \gamma\beta.$$

Thus,

$$Q(\alpha,\beta,\gamma) = \frac{\Delta_1}{\Delta} = \frac{-10\beta + 50\gamma}{20\alpha - \gamma^2}, \qquad A(\alpha,\beta,\gamma) = \frac{\Delta_2}{\Delta} = \frac{100\alpha - \gamma\beta}{20\alpha - \gamma^2}.$$

For the second-order conditions: $\pi_{QQ} = -2\alpha < 0$, and $\pi_{QQ}\pi_{AA} - \pi_{QA}^2 = 20\alpha - \gamma^2 > 0$ $\left(\text{if } \alpha > \frac{\gamma^2}{20}\right)$.

Finally, using the envelope theorem, the impact of parameter changes on the value of the objective may be calculated:

$$\frac{\partial \pi^*}{\partial \alpha} = -Q(\alpha,\beta,\gamma)^2, \qquad \frac{\partial \pi^*}{\partial \beta} = -Q(\alpha,\beta,\gamma), \qquad \frac{\partial \pi^*}{\partial \gamma} = Q(\alpha,\beta,\gamma)\,A(\alpha,\beta,\gamma).$$

For example, $\frac{\partial \pi^*}{\partial \alpha}$ is the change in the firm's profit (at the optimal solution) due to a change in the parameter $\alpha$. ♦
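The envelope formula $\partial\pi^*/\partial\alpha = -Q^2$ can be tested with a finite difference. The parameter values below ($\alpha = 4$, $\beta = 8$, $\gamma = 6$) are illustrative choices, not from the text.

```python
def solution(a, b, g):
    # Cramer's-rule solution from the example.
    d = 20 * a - g * g
    return (-10 * b + 50 * g) / d, (100 * a - g * b) / d

def profit(Q, A, a, b, g):
    return 500 - a * Q**2 - b * Q + g * A * Q - 5 * A**2 + 50 * A

def pi_star(a, b, g):
    Q, A = solution(a, b, g)
    return profit(Q, A, a, b, g)

a, b, g, h = 4.0, 8.0, 6.0, 1e-6
numeric = (pi_star(a + h, b, g) - pi_star(a - h, b, g)) / (2 * h)
Q, A = solution(a, b, g)        # here Q = 5, A = 8; expect numeric close to -Q^2 = -25
```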
The next example considers a standard economic model: profit maximization by a price-taking firm with two inputs. Profit maximization leads to optimal choices, which in turn determine the firm's profit. How does profit vary as input prices vary? The envelope theorem provides a simple way to answer this question. In addition, the example illustrates the impact of parameter changes on the choice variables.
EXAMPLE 11.9: A firm produces output, $Q$, with two inputs, $x$ and $y$, according to the production function $Q = x^\alpha y^\alpha$, $0 < \alpha < \frac{1}{2}$. The cost of input $x$ is $r > 0$ and the cost of input $y$ is $w > 0$. The output $Q$ sells at unit price $p$. The following discussion considers the problem of profit maximization and the impact of parameter changes on the optimal values of the choice variables and on the level of profit at the solution. The profit function is

$$\pi(x,y) = p x^\alpha y^\alpha - rx - wy.$$

Maximizing this determines solution values for $x$ and $y$, say $x^*$ and $y^*$, which depend on the parameters $(p,\alpha,r,w)$: $x^* = x(p,\alpha,r,w)$ and $y^* = y(p,\alpha,r,w)$. Substituting these into the profit function gives the level of profit at the optimal choices of $(x,y)$:

$$\pi^*(p,\alpha,r,w) = p(x^*)^\alpha (y^*)^\alpha - rx^* - wy^*,$$

with $x^*$ and $y^*$ functions of $(p,\alpha,r,w)$.

How does maximized profit vary with a small change in $r$? According to the envelope theorem, the answer may be obtained by differentiating the unoptimized profit function with respect to $r$ and evaluating the derivative at the solution. Here, the derivative of the unoptimized function with respect to $r$ is $-x$, and evaluating at the solution gives $-x^*$. (Likewise, the impact of an output price change is obtained by differentiating the unoptimized function with respect to $p$ and evaluating the derivative at the solution.) The following computations confirm these observations.

SOLVING FOR THE OPTIMAL SOLUTION

The first-order conditions for profit maximization are:

$$\pi_x(x,y) = \alpha p x^{\alpha-1} y^\alpha - r = \alpha p Q x^{-1} - r = 0,$$
$$\pi_y(x,y) = \alpha p x^\alpha y^{\alpha-1} - w = \alpha p Q y^{-1} - w = 0,$$

where $Q = x^\alpha y^\alpha$. Therefore $rx = wy$, or $x = \frac{w}{r} y$, and

$$0 = \alpha p x^{\alpha-1} y^\alpha - r = \alpha p \left(\frac{w}{r}\right)^{\alpha-1} y^{\alpha-1} y^\alpha - r = \alpha p \left(\frac{w}{r}\right)^{\alpha-1} y^{2\alpha-1} - r.$$

Therefore, $\frac{\alpha p}{r}\left(\frac{w}{r}\right)^{\alpha-1} = y^{1-2\alpha}$, so that

$$y = \left(\frac{\alpha p}{r}\right)^{\frac{1}{1-2\alpha}} \left(\frac{w}{r}\right)^{\frac{\alpha-1}{1-2\alpha}} = (\alpha p)^{\frac{1}{1-2\alpha}}\, w^{\frac{\alpha-1}{1-2\alpha}}\, r^{\frac{-\alpha}{1-2\alpha}}. \tag{11.10}$$

Similarly,

$$x = (\alpha p)^{\frac{1}{1-2\alpha}}\, r^{\frac{\alpha-1}{1-2\alpha}}\, w^{\frac{-\alpha}{1-2\alpha}}. \tag{11.11}$$

The second-order conditions are satisfied, since:

$$\pi_{xx} = \alpha(\alpha-1) p Q x^{-2} < 0, \qquad \pi_{yy} = \alpha(\alpha-1) p Q y^{-2} < 0,$$
$$\pi_{xx}\pi_{yy} - \pi_{xy}^2 = \alpha^2(\alpha-1)^2 p^2 \frac{Q^2}{x^2 y^2} - \alpha^2 \alpha^2 p^2 \frac{Q^2}{x^2 y^2} = \alpha^2 p^2 \frac{Q^2}{x^2 y^2}\left[(\alpha-1)^2 - \alpha^2\right] > 0,$$

using $\pi_{xy} = \alpha\alpha p Q x^{-1} y^{-1}$ and the fact that $[(\alpha-1)^2 - \alpha^2] = [\alpha^2 - 2\alpha + 1 - \alpha^2] = (1 - 2\alpha) > 0$.

THE IMPACT OF PARAMETER CHANGES ON CHOICE VARIABLES

From the solutions, the impact of parameter changes on the optimal values can be computed directly. For example:

$$\frac{\partial x}{\partial p} = \left(\frac{1}{1-2\alpha}\right)\alpha\,(\alpha p)^{\frac{1}{1-2\alpha}-1}\, r^{\frac{\alpha-1}{1-2\alpha}}\, w^{\frac{-\alpha}{1-2\alpha}} = \left(\frac{1}{1-2\alpha}\right)\left(\frac{1}{p}\right) x$$

and

$$\frac{\partial y}{\partial r} = \left(\frac{-\alpha}{1-2\alpha}\right)(\alpha p)^{\frac{1}{1-2\alpha}}\, w^{\frac{\alpha-1}{1-2\alpha}}\, r^{\frac{-\alpha}{1-2\alpha}-1} = \left(\frac{-\alpha}{1-2\alpha}\right)\left(\frac{1}{r}\right) y.$$

THE IMPACT OF PARAMETER CHANGES ON PROFIT (ENVELOPE CALCULATIONS)

At the optimal solution, the profit is $\pi^*(p,\alpha,r,w) = \pi(x(p,\alpha,r,w), y(p,\alpha,r,w))$. According to the envelope theorem, the impact of a parameter change on the objective may be computed by differentiating the unoptimized objective with respect to the parameter and evaluating the derivative at the solution. In the present context, for example:

$$\frac{\partial \pi^*}{\partial r} = \frac{\partial \pi(x(p,\alpha,r,w), y(p,\alpha,r,w))}{\partial r} = -x(p,\alpha,r,w), \qquad \frac{\partial \pi^*}{\partial w} = \frac{\partial \pi(x(p,\alpha,r,w), y(p,\alpha,r,w))}{\partial w} = -y(p,\alpha,r,w).$$

The following discussion confirms this. Recall that

$$\pi(x(p,\alpha,r,w), y(p,\alpha,r,w)) = p\, x(p,\alpha,r,w)^\alpha\, y(p,\alpha,r,w)^\alpha - r\, x(p,\alpha,r,w) - w\, y(p,\alpha,r,w).$$

Then, collecting terms,

$$\frac{\partial \pi(x(p,\alpha,r,w), y(p,\alpha,r,w))}{\partial r} = \left[\alpha p\, x(p,\alpha,r,w)^{\alpha-1} y(p,\alpha,r,w)^\alpha - r\right]\frac{\partial x(p,\alpha,r,w)}{\partial r} + \left[\alpha p\, x(p,\alpha,r,w)^\alpha y(p,\alpha,r,w)^{\alpha-1} - w\right]\frac{\partial y(p,\alpha,r,w)}{\partial r} - x(p,\alpha,r,w).$$

We know that $x(p,\alpha,r,w)$ and $y(p,\alpha,r,w)$ were chosen to satisfy the first-order conditions, so that

$$\alpha p\, x(p,\alpha,r,w)^{\alpha-1} y(p,\alpha,r,w)^\alpha - r = 0, \qquad \alpha p\, x(p,\alpha,r,w)^\alpha y(p,\alpha,r,w)^{\alpha-1} - w = 0.$$

Therefore,

$$\frac{\partial \pi(x(p,\alpha,r,w), y(p,\alpha,r,w))}{\partial r} = -x(p,\alpha,r,w).$$

So, the derivative of the optimized function with respect to the parameter $r$ is equal to the derivative of the unoptimized function with respect to the parameter, with the derivative evaluated at the optimal solution. Similarly,

$$\frac{\partial \pi(x(p,\alpha,r,w), y(p,\alpha,r,w))}{\partial w} = -y(p,\alpha,r,w).$$

These calculations are closely related to dual prices in linear programming. The partial derivative of $\pi(x(p,\alpha,r,w), y(p,\alpha,r,w))$ with respect to $r$ measures the change in the value of the objective at the optimal solution due to a change in the cost of one of the inputs ($x$).

THE IMPACT OF PARAMETER CHANGES ON PROFIT (DIRECT CALCULATIONS)

Finally, the optimized function can be differentiated directly. These tedious calculations confirm the result obtained from the envelope theorem. Observe, using Equations 11.11 and 11.10,

$$x^\alpha = (\alpha p)^{\frac{\alpha}{1-2\alpha}}\, r^{\frac{(\alpha-1)\alpha}{1-2\alpha}}\, w^{\frac{-\alpha^2}{1-2\alpha}}, \tag{11.12}$$
$$y^\alpha = (\alpha p)^{\frac{\alpha}{1-2\alpha}}\, w^{\frac{(\alpha-1)\alpha}{1-2\alpha}}\, r^{\frac{-\alpha^2}{1-2\alpha}}. \tag{11.13}$$

Because $\frac{(\alpha-1)\alpha}{1-2\alpha} + \frac{-\alpha^2}{1-2\alpha} = \frac{-\alpha}{1-2\alpha}$,

$$x^\alpha y^\alpha = (\alpha p)^{\frac{2\alpha}{1-2\alpha}}\, r^{\frac{-\alpha}{1-2\alpha}}\, w^{\frac{-\alpha}{1-2\alpha}} = \alpha^{\frac{2\alpha}{1-2\alpha}}\, p^{\frac{2\alpha}{1-2\alpha}}\, r^{\frac{-\alpha}{1-2\alpha}}\, w^{\frac{-\alpha}{1-2\alpha}}$$

and

$$p x^\alpha y^\alpha = \alpha^{\frac{2\alpha}{1-2\alpha}}\, p^{\frac{1}{1-2\alpha}}\, r^{\frac{-\alpha}{1-2\alpha}}\, w^{\frac{-\alpha}{1-2\alpha}},$$

since $\frac{2\alpha}{1-2\alpha} + 1 = \frac{2\alpha + (1-2\alpha)}{1-2\alpha} = \frac{1}{1-2\alpha}$. From the solution values of $x$ and $y$,

$$rx = (\alpha p)^{\frac{1}{1-2\alpha}}\, r^{\left(\frac{\alpha-1}{1-2\alpha}+1\right)}\, w^{\frac{-\alpha}{1-2\alpha}} = \alpha^{\frac{1}{1-2\alpha}}\, p^{\frac{1}{1-2\alpha}}\, r^{\frac{-\alpha}{1-2\alpha}}\, w^{\frac{-\alpha}{1-2\alpha}},$$
$$wy = (\alpha p)^{\frac{1}{1-2\alpha}}\, w^{\left(\frac{\alpha-1}{1-2\alpha}+1\right)}\, r^{\frac{-\alpha}{1-2\alpha}} = \alpha^{\frac{1}{1-2\alpha}}\, p^{\frac{1}{1-2\alpha}}\, w^{\frac{-\alpha}{1-2\alpha}}\, r^{\frac{-\alpha}{1-2\alpha}}.$$

Thus,

$$rx + wy = 2\alpha^{\frac{1}{1-2\alpha}}\, p^{\frac{1}{1-2\alpha}}\, w^{\frac{-\alpha}{1-2\alpha}}\, r^{\frac{-\alpha}{1-2\alpha}}.$$

So, $\pi^* = p(x^*)^\alpha(y^*)^\alpha - (rx^* + wy^*)$ gives

$$\pi^* = \left[\alpha^{\frac{2\alpha}{1-2\alpha}} - 2\alpha^{\frac{1}{1-2\alpha}}\right] p^{\frac{1}{1-2\alpha}}\, r^{\frac{-\alpha}{1-2\alpha}}\, w^{\frac{-\alpha}{1-2\alpha}} = \alpha^{\frac{1}{1-2\alpha}}\left[\alpha^{-1} - 2\right] p^{\frac{1}{1-2\alpha}}\, r^{\frac{-\alpha}{1-2\alpha}}\, w^{\frac{-\alpha}{1-2\alpha}} = \alpha^{\frac{1}{1-2\alpha}}\alpha^{-1}(1-2\alpha)\, p^{\frac{1}{1-2\alpha}}\, r^{\frac{-\alpha}{1-2\alpha}}\, w^{\frac{-\alpha}{1-2\alpha}},$$

using the fact that $\frac{2\alpha}{1-2\alpha} = \frac{1}{1-2\alpha} - 1$. Differentiating with respect to $r$ gives:

$$\frac{\partial \pi^*}{\partial r} = \frac{-\alpha}{1-2\alpha}\,\alpha^{\frac{1}{1-2\alpha}}\alpha^{-1}(1-2\alpha)\, p^{\frac{1}{1-2\alpha}}\, r^{\frac{-\alpha}{1-2\alpha}-1}\, w^{\frac{-\alpha}{1-2\alpha}} = -(\alpha p)^{\frac{1}{1-2\alpha}}\, r^{\frac{\alpha-1}{1-2\alpha}}\, w^{\frac{-\alpha}{1-2\alpha}} = -x^*,$$

using $\frac{-\alpha}{1-2\alpha} - 1 = \frac{-\alpha-(1-2\alpha)}{1-2\alpha} = \frac{\alpha-1}{1-2\alpha}$ and $\left[\frac{-\alpha}{1-2\alpha}\right]\alpha^{-1}(1-2\alpha) = -1$. ♦
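The closed forms in Example 11.9 can be verified numerically. The sketch below (with illustrative values $p = 2$, $\alpha = 0.25$, $r = w = 1$, not from the text) confirms $\partial\pi^*/\partial r = -x^*$ by finite difference.

```python
def inputs(p, a, r, w):
    # Equations 11.10 and 11.11.
    e = 1 / (1 - 2 * a)
    x = (a * p) ** e * r ** ((a - 1) * e) * w ** (-a * e)
    y = (a * p) ** e * w ** ((a - 1) * e) * r ** (-a * e)
    return x, y

def pi_star(p, a, r, w):
    x, y = inputs(p, a, r, w)
    return p * x ** a * y ** a - r * x - w * y

p, a, r, w, h = 2.0, 0.25, 1.0, 1.0, 1e-6
numeric = (pi_star(p, a, r + h, w) - pi_star(p, a, r - h, w)) / (2 * h)
x_star, y_star = inputs(p, a, r, w)   # envelope theorem: numeric close to -x_star
```

With these values $x^* = y^* = 0.25$, and the finite difference matches $-x^*$.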
11.3 Optimization: n variables

Consider the problem of maximizing or minimizing a function $f$: $\max f(x_1,\dots,x_n)$ or $\min f(x_1,\dots,x_n)$. For convenience, write $f(x)$, where $x = (x_1,\dots,x_n)$. The first derivatives of $f$ are $f_i = \frac{\partial f}{\partial x_i}$, $i = 1,\dots,n$. Set $\nabla f = (f_1, f_2, \dots, f_n)$. If at a point $x^*$ the partial derivative of $f$ with respect to $x_i$ is not equal to 0, say $f_i(x^*) = \frac{\partial f(x^*)}{\partial x_i} > 0$, this means that at the point $x^*$ the value of $f$ is increased if $x_i$ is increased from $x_i^*$. This implies that for $x^*$ to be a local maximum of the function $f$, it must be the case that every partial derivative is 0. A necessary condition for $x^*$ to be a local maximum is that the partial derivatives at $x^*$ are 0:

$$\frac{\partial f(x^*)}{\partial x_i} = 0, \quad i = 1,\dots,n.$$

Under what conditions can one be sure that $x^*$ is a local maximum? In other words, what are necessary and sufficient conditions for $x^*$ to be a local maximum? Let $H(p)$ be the Hessian matrix evaluated at $p$:
$$H = \left\{\frac{\partial^2 f}{\partial x_i \partial x_j}\right\}_{ij} = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n \partial x_n} \end{pmatrix} = \begin{pmatrix} f_{11} & f_{12} & \cdots & f_{1n} \\ f_{21} & f_{22} & \cdots & f_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ f_{n1} & f_{n2} & \cdots & f_{nn} \end{pmatrix}.$$

In general, $H$ will depend on $x$. This dependence is made explicit by writing $H(x)$. Determining whether $H$ is positive or negative definite is possible by checking a set of determinantal conditions. Define $H_j$ to be the $j \times j$ matrix obtained from $H$ by deleting the last $n-j$ rows and columns:

$$H_j = \begin{pmatrix} f_{11} & f_{12} & \cdots & f_{1j} \\ f_{21} & f_{22} & \cdots & f_{2j} \\ \vdots & \vdots & \ddots & \vdots \\ f_{j1} & f_{j2} & \cdots & f_{jj} \end{pmatrix}.$$
Theorem 11.2: Suppose that $x^*$ satisfies:

$$\frac{\partial f(x^*)}{\partial x_i} = 0, \quad i = 1,\dots,n.$$

If $|H_1| < 0$, $|H_2| > 0$, $|H_3| < 0, \dots$, or $(-1)^j |H_j| > 0$, $j = 1,\dots,n$, then $x^*$ is a local maximum. If $|H_1| > 0$, $|H_2| > 0$, $|H_3| > 0, \dots$, or $|H_j| > 0$, $j = 1,\dots,n$, then $x^*$ is a local minimum.

The following example illustrates the calculations.

EXAMPLE 11.10: Let $f(x,y,z) = 3x^2 + 12y^2 + 5z^2 - 13x + 12y + 2xz - 3yz$. Then: $f_x = 6x - 13 + 2z$, $f_{xx} = 6$, $f_{xy} = 0$, $f_{xz} = 2$, $f_y = 24y + 12 - 3z$, $f_{yy} = 24$, $f_{yz} = -3$, $f_z = 10z + 2x - 3y$, $f_{zz} = 10$, and

$$H = \begin{pmatrix} 6 & 0 & 2 \\ 0 & 24 & -3 \\ 2 & -3 & 10 \end{pmatrix}.$$

The first-order conditions solve to give $x = \frac{205}{86}$, $y = -\frac{25}{43}$ and $z = -\frac{28}{43}$. Since

$$|H_1| = 6 > 0, \qquad |H_2| = \begin{vmatrix} 6 & 0 \\ 0 & 24 \end{vmatrix} = 144 > 0, \qquad |H_3| = \begin{vmatrix} 6 & 0 & 2 \\ 0 & 24 & -3 \\ 2 & -3 & 10 \end{vmatrix} = 1290 > 0,$$

the Hessian is positive definite and, by Theorem 11.2, the solution is a local minimum. ♦
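A check of Example 11.10 (illustrative, not from the text): the stated critical point satisfies the first-order conditions exactly, and the leading principal minors are $6$, $144$, $1290$.

```python
from fractions import Fraction

x, y, z = Fraction(205, 86), Fraction(-25, 43), Fraction(-28, 43)

f_x = 6 * x - 13 + 2 * z        # first-order conditions
f_y = 24 * y + 12 - 3 * z
f_z = 10 * z + 2 * x - 3 * y

H = [[6, 0, 2], [0, 24, -3], [2, -3, 10]]
H1 = H[0][0]
H2 = H[0][0] * H[1][1] - H[0][1] * H[1][0]
H3 = (H[0][0] * (H[1][1] * H[2][2] - H[1][2] * H[2][1])
      - H[0][1] * (H[1][0] * H[2][2] - H[1][2] * H[2][0])
      + H[0][2] * (H[1][0] * H[2][1] - H[1][1] * H[2][0]))
```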
The next example is the standard profit-maximization problem, now with three inputs.

EXAMPLE 11.11: Consider the production function $q = f(x,y,z) = x^\alpha y^\beta z^\gamma$ (with $\alpha,\beta,\gamma > 0$ and $\alpha + \beta + \gamma < 1$). A price-taking firm maximizes $\pi(x,y,z) = p f(x,y,z) - rx - sy - tz$. The first-order conditions are $\pi_x = p f_x - r = 0$, $\pi_y = p f_y - s = 0$ and $\pi_z = p f_z - t = 0$. The first- and second-order partial derivatives are:

$$f_x = \alpha x^{\alpha-1} y^\beta z^\gamma, \qquad f_y = \beta x^\alpha y^{\beta-1} z^\gamma, \qquad f_z = \gamma x^\alpha y^\beta z^{\gamma-1}$$

and

$$f_{xx} = \alpha(\alpha-1)\frac{q}{x^2}, \qquad f_{yy} = \beta(\beta-1)\frac{q}{y^2}, \qquad f_{zz} = \gamma(\gamma-1)\frac{q}{z^2},$$
$$f_{xy} = \alpha\beta\frac{q}{xy}, \qquad f_{xz} = \alpha\gamma\frac{q}{xz}, \qquad f_{yz} = \beta\gamma\frac{q}{yz}.$$

From these calculations, the Hessian is:

$$H = \begin{pmatrix} \pi_{xx} & \pi_{xy} & \pi_{xz} \\ \pi_{yx} & \pi_{yy} & \pi_{yz} \\ \pi_{zx} & \pi_{zy} & \pi_{zz} \end{pmatrix} = p \begin{pmatrix} \alpha(\alpha-1)\frac{q}{x^2} & \alpha\beta\frac{q}{xy} & \alpha\gamma\frac{q}{xz} \\ \alpha\beta\frac{q}{xy} & \beta(\beta-1)\frac{q}{y^2} & \beta\gamma\frac{q}{yz} \\ \alpha\gamma\frac{q}{xz} & \beta\gamma\frac{q}{yz} & \gamma(\gamma-1)\frac{q}{z^2} \end{pmatrix}.$$

Since $p > 0$, the signs of the leading principal minors are determined by the matrix of $f$-derivatives, so in what follows the factor $p$ is suppressed. Here $H_1 = \left(\alpha(\alpha-1)\frac{q}{x^2}\right)$, $H_2$ is the top-left $2 \times 2$ submatrix and $H_3 = H$. Thus, $|H_1| = \alpha(\alpha-1)\frac{q}{x^2} < 0$. Also,

$$|H_2| = \alpha(\alpha-1)\beta(\beta-1)\frac{q^2}{x^2 y^2} - \alpha^2\beta^2\frac{q^2}{x^2 y^2} = \left[\alpha(\alpha-1)\beta(\beta-1) - \alpha^2\beta^2\right]\frac{q^2}{x^2 y^2} = \alpha\beta\left[1 - \alpha - \beta\right]\frac{q^2}{x^2 y^2}.$$

Since $1 - \alpha - \beta > 0$, this expression is positive. Finally, expanding $|H_3|$ along the first row,

$$|H_3| = \alpha(\alpha-1)\frac{q}{x^2}\left[\beta(\beta-1)\gamma(\gamma-1) - \beta^2\gamma^2\right]\frac{q^2}{y^2 z^2} - \alpha\beta\frac{q}{xy}\left[\alpha\beta\gamma(\gamma-1) - \alpha\gamma\beta\gamma\right]\frac{q^2}{xyz^2} + \alpha\gamma\frac{q}{xz}\left[\alpha\beta\beta\gamma - \alpha\gamma\beta(\beta-1)\right]\frac{q^2}{xy^2 z} = -\alpha\beta\gamma(1-\alpha-\beta-\gamma)\frac{q^3}{x^2 y^2 z^2},$$

so $|H_3| < 0$, since $(1-\alpha-\beta-\gamma) > 0$. ♦
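The minor signs in Example 11.11 can be spot-checked numerically. The values below ($\alpha = \beta = \gamma = 0.2$, $x = y = z = 1$ so $q = 1$, and $p$ set to 1) are illustrative choices, not from the text.

```python
a, b, g = 0.2, 0.2, 0.2
x = y = z = 1.0
q = x ** a * y ** b * z ** g      # q = 1 at this point

# Hessian of f (the factor p = 1 is suppressed, as in the example).
H = [[a * (a - 1) * q / x**2, a * b * q / (x * y),    a * g * q / (x * z)],
     [a * b * q / (x * y),    b * (b - 1) * q / y**2, b * g * q / (y * z)],
     [a * g * q / (x * z),    b * g * q / (y * z),    g * (g - 1) * q / z**2]]

H1 = H[0][0]
H2 = H[0][0] * H[1][1] - H[0][1] ** 2
H3 = (H[0][0] * (H[1][1] * H[2][2] - H[1][2] * H[2][1])
      - H[0][1] * (H[1][0] * H[2][2] - H[1][2] * H[2][0])
      + H[0][2] * (H[1][0] * H[2][1] - H[1][1] * H[2][0]))
```

The computed minors alternate in sign and match the closed forms $\alpha\beta(1-\alpha-\beta)q^2$ and $-\alpha\beta\gamma(1-\alpha-\beta-\gamma)q^3$.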
11.3.1 Motivation for the Hessian conditions

To motivate the previous discussion, consider a Taylor series expansion of $f$. The matrix version of Taylor's theorem gives:

$$f(x) = f(x^*) + \nabla f(x^*) \cdot (x - x^*) + (x - x^*)' H(p)(x - x^*),$$

where $p$ is between $x$ and $x^*$. If $x^*$ satisfies $f_i(x^*) = 0$, $\forall i$, then

$$f(\tilde x) = f(x^*) + (\tilde x - x^*)' H(p)(\tilde x - x^*). \tag{11.14}$$

From this expression, whether $f(\tilde x)$ is larger or smaller than $f(x^*)$ depends entirely on $H(p)$. If, for all vectors $z \ne 0$, $z' H(p) z < 0$, then $H(p)$ is negative definite; from Equation 11.14 this implies $f(\tilde x) < f(x^*)$. If, for all vectors $z \ne 0$, $z' H(p) z > 0$, the matrix $H(p)$ is positive definite and Equation 11.14 implies $f(\tilde x) > f(x^*)$. So the sufficiency conditions depend on properties of the Hessian matrix $H$, and the associated quadratic form, characterized in terms of positive and negative definiteness. Theorem 11.3 gives criteria for negative and positive definiteness.

Theorem 11.3: The matrix $H(x)$ is negative definite if: $|H_1(x)| < 0$, $|H_2(x)| > 0$, $|H_3(x)| < 0, \dots$, or $(-1)^j |H_j(x)| > 0$, for $j = 1,\dots,n$. The matrix $H(x)$ is positive definite if: $|H_1(x)| > 0$, $|H_2(x)| > 0$, $|H_3(x)| > 0, \dots$, or $|H_j(x)| > 0$, for $j = 1,\dots,n$.

Note that if $H(x^*)$ satisfies the conditions for positive or negative definiteness, then provided $p$ is close to $x^*$, $H(p)$ will also satisfy those conditions. If $\tilde x$ is close to $x^*$ in Equation 11.14, then so is $p$. So, for example, if at $x^*$, $f_i(x^*) = 0$ for all $i$, and $H(x^*)$ is negative definite, then so is $H(p)$, and $f(\tilde x) < f(x^*)$.
11.3.2 Definiteness and second-order conditions

Recall that the Hessian matrix is the matrix of second partial derivatives of the function. Because the order of differentiation does not matter, $f_{x_i x_j} = f_{x_j x_i}$, the Hessian matrix is symmetric and the expression $z' H(p) z$ is a quadratic form. The main theorem giving second-order conditions in terms of the Hessian matrix is:

Theorem 11.4: Let $x^*$ satisfy $f_i(x^*) = 0$, $i = 1,\dots,n$. Then

1. $x^*$ is a local maximum if $H(\tilde x)$ is negative semidefinite, $\forall \tilde x$ close to $x^*$.
2. $x^*$ is a unique local maximum if $H(x^*)$ is negative definite.
3. $x^*$ is a global maximum if $H(x)$ is negative semidefinite $\forall x$.
4. $x^*$ is a unique global maximum if $H(x)$ is negative definite $\forall x$.
5. $x^*$ is a local minimum if $H(\tilde x)$ is positive semidefinite, $\forall \tilde x$ close to $x^*$.
6. $x^*$ is a unique local minimum if $H(x^*)$ is positive definite.
7. $x^*$ is a global minimum if $H(x)$ is positive semidefinite $\forall x$.
8. $x^*$ is a unique global minimum if $H(x)$ is positive definite $\forall x$.

PROOF: These results are easy to verify by considering the Taylor series expansion:

$$f(x) = f(x^*) + \nabla f(x^*) \cdot (x - x^*) + (x - x^*)' H(p)(x - x^*);$$

if at $x^*$, $f_i(x^*) = 0$ so $\nabla f(x^*) = 0$, then the expansion gives:

$$f(x) = f(x^*) + (x - x^*)' H(p)(x - x^*).$$

For example, if $H(\tilde x)$ is negative semidefinite $\forall \tilde x$ close to $x^*$, then this will also be satisfied by $p$ in the Taylor series expansion, since $p$ is between $\tilde x$ and $x^*$. In this case, $(\tilde x - x^*)' H(p)(\tilde x - x^*) \le 0$, so that $f(\tilde x) \le f(x^*)$. This confirms that $x^*$ is a local maximum. A similar discussion applies to the other results.
The next example illustrates three-variable optimization; a variation of it is then used to illustrate the envelope theorem.

EXAMPLE 11.12: A monopolist can sell a product in three different countries. The demand in each country is $Q_1 = 45 - \frac{1}{2}P_1$, $Q_2 = 50 - \frac{1}{3}P_2$ and $Q_3 = 75 - \frac{1}{4}P_3$. The cost function for the monopolist is $C = 30 + Q^2$, where $Q = Q_1 + Q_2 + Q_3$. The profit function for the monopolist is:

$$\pi = TR - TC = P_1 Q_1 + P_2 Q_2 + P_3 Q_3 - 30 - (Q_1 + Q_2 + Q_3)^2.$$

Using the inverse demand functions, $P_1 = 90 - 2Q_1$, $P_2 = 150 - 3Q_2$, $P_3 = 300 - 4Q_3$, profit is:

$$\pi = (90 - 2Q_1)Q_1 + (150 - 3Q_2)Q_2 + (300 - 4Q_3)Q_3 - 30 - (Q_1 + Q_2 + Q_3)^2.$$

The first-order conditions for profit-maximizing output in each country are:

$$\pi_{Q_1} = 90 - 4Q_1 - 2(Q_1 + Q_2 + Q_3) = 0$$
$$\pi_{Q_2} = 150 - 6Q_2 - 2(Q_1 + Q_2 + Q_3) = 0$$
$$\pi_{Q_3} = 300 - 8Q_3 - 2(Q_1 + Q_2 + Q_3) = 0.$$

In matrix form, these conditions give:

$$\begin{pmatrix} 90 \\ 150 \\ 300 \end{pmatrix} - \begin{pmatrix} 6 & 2 & 2 \\ 2 & 8 & 2 \\ 2 & 2 & 10 \end{pmatrix} \begin{pmatrix} Q_1 \\ Q_2 \\ Q_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},$$

and solving:

$$\begin{pmatrix} Q_1 \\ Q_2 \\ Q_3 \end{pmatrix} = \begin{pmatrix} 6 & 2 & 2 \\ 2 & 8 & 2 \\ 2 & 2 & 10 \end{pmatrix}^{-1} \begin{pmatrix} 90 \\ 150 \\ 300 \end{pmatrix} = \begin{pmatrix} \frac{19}{100} & -\frac{1}{25} & -\frac{3}{100} \\ -\frac{1}{25} & \frac{7}{50} & -\frac{1}{50} \\ -\frac{3}{100} & -\frac{1}{50} & \frac{11}{100} \end{pmatrix} \begin{pmatrix} 90 \\ 150 \\ 300 \end{pmatrix} = \begin{pmatrix} \frac{21}{10} \\[2pt] \frac{57}{5} \\[2pt] \frac{273}{10} \end{pmatrix}.$$

Since $\pi_{Q_1 Q_1} = -6$, $\pi_{Q_2 Q_2} = -8$, $\pi_{Q_3 Q_3} = -10$ and $\pi_{Q_1 Q_2} = \pi_{Q_1 Q_3} = \pi_{Q_2 Q_3} = -2$,

$$H = \begin{pmatrix} -6 & -2 & -2 \\ -2 & -8 & -2 \\ -2 & -2 & -10 \end{pmatrix}, \qquad H = H_3;$$

$$|H_1| = -6 < 0, \qquad |H_2| = \begin{vmatrix} -6 & -2 \\ -2 & -8 \end{vmatrix} = 44 > 0, \qquad |H_3| = \begin{vmatrix} -6 & -2 & -2 \\ -2 & -8 & -2 \\ -2 & -2 & -10 \end{vmatrix} = -400 < 0,$$

and the solution is a maximum. ♦
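A quick check of Example 11.12 (illustrative, not from the text): the reported outputs satisfy the three first-order conditions exactly.

```python
from fractions import Fraction

Q1, Q2, Q3 = Fraction(21, 10), Fraction(57, 5), Fraction(273, 10)
S = Q1 + Q2 + Q3                    # total output Q

foc1 = 90 - 4 * Q1 - 2 * S          # first-order conditions from the example
foc2 = 150 - 6 * Q2 - 2 * S
foc3 = 300 - 8 * Q3 - 2 * S
```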
EXAMPLE 11.13: Suppose that in Example 11.12 the cost function is $C = 30 + \alpha Q^2$, where $\alpha$ is some positive constant. Then the profit function becomes

$$\pi = (90 - 2Q_1)Q_1 + (150 - 3Q_2)Q_2 + (300 - 4Q_3)Q_3 - 30 - \alpha(Q_1 + Q_2 + Q_3)^2,$$

and the first-order conditions are:

$$\pi_{Q_1} = 90 - 4Q_1 - 2\alpha(Q_1 + Q_2 + Q_3) = 0$$
$$\pi_{Q_2} = 150 - 6Q_2 - 2\alpha(Q_1 + Q_2 + Q_3) = 0$$
$$\pi_{Q_3} = 300 - 8Q_3 - 2\alpha(Q_1 + Q_2 + Q_3) = 0.$$

Thus,

$$\begin{pmatrix} 90 \\ 150 \\ 300 \end{pmatrix} - \begin{pmatrix} 4+2\alpha & 2\alpha & 2\alpha \\ 2\alpha & 6+2\alpha & 2\alpha \\ 2\alpha & 2\alpha & 8+2\alpha \end{pmatrix} \begin{pmatrix} Q_1 \\ Q_2 \\ Q_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$

Some calculation yields:

$$\begin{pmatrix} 4+2\alpha & 2\alpha & 2\alpha \\ 2\alpha & 6+2\alpha & 2\alpha \\ 2\alpha & 2\alpha & 8+2\alpha \end{pmatrix}^{-1} = \begin{pmatrix} \frac{7\alpha+12}{4(13\alpha+12)} & -\frac{\alpha}{13\alpha+12} & -\frac{3\alpha}{4(13\alpha+12)} \\ -\frac{\alpha}{13\alpha+12} & \frac{3\alpha+4}{2(13\alpha+12)} & -\frac{\alpha}{2(13\alpha+12)} \\ -\frac{3\alpha}{4(13\alpha+12)} & -\frac{\alpha}{2(13\alpha+12)} & \frac{5\alpha+6}{4(13\alpha+12)} \end{pmatrix}$$

and, applying this inverse to $(90, 150, 300)'$,

$$\begin{pmatrix} Q_1 \\ Q_2 \\ Q_3 \end{pmatrix} = \begin{pmatrix} \frac{15}{2}\cdot\frac{36-29\alpha}{13\alpha+12} \\[2pt] 15\cdot\frac{20-\alpha}{13\alpha+12} \\[2pt] \frac{15}{2}\cdot\frac{31\alpha+60}{13\alpha+12} \end{pmatrix}.$$

For the second-order conditions, the Hessian matrix is:

$$H = \begin{pmatrix} -(4+2\alpha) & -2\alpha & -2\alpha \\ -2\alpha & -(6+2\alpha) & -2\alpha \\ -2\alpha & -2\alpha & -(8+2\alpha) \end{pmatrix},$$

so $|H_1| = -(4+2\alpha) < 0$, $|H_2| = (4+2\alpha)(6+2\alpha) - (2\alpha)^2 = 24 + 20\alpha > 0$, and $|H_3| = -(208\alpha + 192) < 0$.

This example may be used to illustrate the envelope theorem. If the optimized function is denoted $\pi^*(\alpha)$, then

$$\frac{d\pi^*}{d\alpha} = -(Q_1^* + Q_2^* + Q_3^*)^2,$$

where $Q_i^*$ is the optimal value of $Q_i$. The following calculations confirm this. Observe that with the cost function $C = 30 + \alpha Q^2$, the optimal values of the $Q_i$ depend on $\alpha$, as calculated above. Write these functions $Q_1(\alpha)$, $Q_2(\alpha)$, $Q_3(\alpha)$. Then

$$R_1(Q_1(\alpha)) = [90 - 2Q_1(\alpha)]Q_1(\alpha), \qquad R_2(Q_2(\alpha)) = [150 - 3Q_2(\alpha)]Q_2(\alpha), \qquad R_3(Q_3(\alpha)) = [300 - 4Q_3(\alpha)]Q_3(\alpha)$$

and

$$\pi^*(\alpha) = R_1(Q_1(\alpha)) + R_2(Q_2(\alpha)) + R_3(Q_3(\alpha)) - 30 - \alpha(Q_1(\alpha) + Q_2(\alpha) + Q_3(\alpha))^2.$$

Thus,

$$\frac{d\pi^*}{d\alpha} = \sum_{i=1}^{3}\left[\frac{dR_i(Q_i(\alpha))}{dQ_i} - 2\alpha(Q_1(\alpha) + Q_2(\alpha) + Q_3(\alpha))\right]\frac{dQ_i}{d\alpha} - (Q_1(\alpha) + Q_2(\alpha) + Q_3(\alpha))^2.$$

Since, by the first-order conditions,

$$\frac{dR_i(Q_i(\alpha))}{dQ_i} - 2\alpha(Q_1(\alpha) + Q_2(\alpha) + Q_3(\alpha)) = 0, \quad i = 1,2,3,$$

it follows that $\frac{d\pi^*}{d\alpha} = -(Q_1(\alpha) + Q_2(\alpha) + Q_3(\alpha))^2$. Alternatively, recall that the profit function (before optimization) is:

$$\pi = (90 - 2Q_1)Q_1 + (150 - 3Q_2)Q_2 + (300 - 4Q_3)Q_3 - 30 - \alpha(Q_1 + Q_2 + Q_3)^2.$$

Differentiating with respect to the parameter $\alpha$ gives $-(Q_1 + Q_2 + Q_3)^2$, and evaluating this at the solution yields $-(Q_1(\alpha) + Q_2(\alpha) + Q_3(\alpha))^2$. Thus, the derivative of the unoptimized function with respect to the parameter $\alpha$, evaluated at the solution, is the same as the derivative of the optimized function with respect to the parameter – an illustration of the envelope theorem. ♦
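The envelope result $d\pi^*/d\alpha = -(Q_1 + Q_2 + Q_3)^2$ in Example 11.13 can be confirmed with a finite difference (a sketch, not from the text), evaluated at $\alpha = 1$, where the solution is $(2.1, 11.4, 27.3)$.

```python
def outputs(a):
    # Closed-form solution Q_i(alpha) from the example.
    d = 13 * a + 12
    return (7.5 * (36 - 29 * a) / d,
            15 * (20 - a) / d,
            7.5 * (31 * a + 60) / d)

def pi_star(a):
    Q1, Q2, Q3 = outputs(a)
    S = Q1 + Q2 + Q3
    return ((90 - 2 * Q1) * Q1 + (150 - 3 * Q2) * Q2
            + (300 - 4 * Q3) * Q3 - 30 - a * S ** 2)

a, h = 1.0, 1e-6
numeric = (pi_star(a + h) - pi_star(a - h)) / (2 * h)
S = sum(outputs(a))                 # at alpha = 1, S = 2.1 + 11.4 + 27.3 = 40.8
```

The central difference matches $-S^2$ to several decimal places.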
R EMARK 11.6: When f is additively separable, the second order conditions are P simpler. The function f is additively separable if f ( x1 ,... , x n ) = ni=1 f i ( x i ) =
f 1 ( x1 ) + · · · + f n ( x n ). The Hessian has the form: ∂2 f 1
H =
∂2 x12
0
0
∂2 f 2 ∂2 x22
.. .
.. .
0
0
Then f is negative definite if and only if
E XAMPLE 11.14 (P RICE
··· ···
..
.
0 ∂2 f i ∂ x2i
0
0 .. . . 2 n
∂ f ∂2 x2n
< 0, ∀ i .
DISCRIMINATION IN THREE MARKETS ):
Price and
revenue in market 1 are given by P1 = P1 (Q 1 ) and R1 (Q 1 ) = P1 (Q 1 )Q 1 ,
respectively. Similarly, in markets 2 and 3, price and revenue are,
CHAPTER 11. MULTIVARIATE OPTIMIZATION
366
respectively, P2 = P2 (Q 2 ), P3 = P3 (Q 3 ), R2 (Q 2 ) = P2 (Q 2 )Q 2 and R3 (Q 3 ) =
P3 (Q 3 )Q 3 . Production cost is given by c(Q 1 ,Q 2 ,Q 3 ). The profit function is
then: π(Q 1 ,Q 2 ,Q 3 ) = R1 (Q 1 ) + R2 (Q 2 ) + R3 (Q 3 ) − c(Q 1 ,Q 2 ,Q 3 ).
(If the cost function is additively separable, so that c(Q1, Q2, Q3) = γ1(Q1) + γ2(Q2) + γ3(Q3), then the problem of maximizing profit reduces to choosing Q1, Q2 and Q3 independently. In general, if the level of production of one good affects the cost of producing the other goods, then the cost function is not additively separable.) The first-order conditions are:

π1 = R1′(Q1) − c1 = 0
π2 = R2′(Q2) − c2 = 0
π3 = R3′(Q3) − c3 = 0,

where πi = ∂π/∂Qi and ci = ∂c/∂Qi.
The second derivatives are π11 = R1′′(Q1) − c11, π22 = R2′′(Q2) − c22 and π33 = R3′′(Q3) − c33. The cross derivatives are πij = −cij when i ≠ j (where πij = ∂²π/∂Qi∂Qj and cij = ∂²c/∂Qi∂Qj). The first-order conditions give:
MR1(Q1) = R1′(Q1) = P1(Q1) + Q1P1′(Q1) = c1(Q1, Q2, Q3) = MC1(Q1, Q2, Q3). So

P1(Q1)[1 + (Q1/P1)(dP1/dQ1)] = c1

or

P1(Q1)[1 + 1/η1] = c1,   η1 < 0.

Similarly, P2(Q2)[1 + 1/η2] = c2, with η2 < 0, and P3(Q3)[1 + 1/η3] = c3, with η3 < 0. If ηi is a large negative number, then market i is sensitive to price and the price in this market should be set close to marginal cost.
If the cost is linear, c(Q1, Q2, Q3) = γ1Q1 + γ2Q2 + γ3Q3, then the Hessian matrix has a simple form, since cij = 0 for i ≠ j. Take each market to have diminishing marginal revenue: Ri′′ < 0, i = 1, 2, 3. For the second-order conditions, observe that the Hessian matrix is:
H = diag(π11(Q1), π22(Q2), π33(Q3)) = diag(R1′′(Q1), R2′′(Q2), R3′′(Q3)).

Therefore, the second-order conditions are: π11 = R1′′(Q1) < 0, ∀Q1; π11π22 = R1′′(Q1)R2′′(Q2) > 0; and π11π22π33 = R1′′(Q1)R2′′(Q2)R3′′(Q3) < 0.
♦
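With linear demands and a linear (separable) cost, each first-order condition can be solved market by market, and the diagonal Hessian makes the second-order check immediate. A minimal sketch (the demand intercepts, slopes and marginal costs below are hypothetical numbers chosen for illustration):

```python
# Price discrimination with separable linear cost: each market is solved alone.
# Hypothetical demands P_i = a_i - b_i*Q_i and constant marginal costs g_i.
a = [100.0, 120.0, 80.0]
b = [2.0, 3.0, 1.0]
g = [20.0, 20.0, 20.0]

# R_i(Q) = (a_i - b_i*Q)*Q, so MR_i = a_i - 2*b_i*Q = g_i  =>  Q_i = (a_i - g_i)/(2*b_i)
Q = [(a[i] - g[i]) / (2 * b[i]) for i in range(3)]

# Second-order conditions: H = diag(R_i'') = diag(-2*b_i), so the leading
# principal minors alternate in sign (-, +, -), as in the text.
minors = [-2*b[0], (-2*b[0]) * (-2*b[1]), (-2*b[0]) * (-2*b[1]) * (-2*b[2])]
print(Q)        # [20.0, 16.66..., 30.0]
print(minors)   # [-4.0, 24.0, -48.0]
```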
11.4 Concavity, convexity and the Hessian matrix

The next theorem relates concavity and convexity to the “definiteness” of the Hessian.

Theorem 11.5:
1. If f is strictly concave, then f is concave; and concavity of f is equivalent to negative semidefiniteness of the Hessian matrix H(x), for all x.
2. If the Hessian matrix H(x) is negative definite for all x, then f is strictly concave.
3. If f is strictly convex, then f is convex; and convexity of f implies that the Hessian matrix H(x) is positive semidefinite for all x.
4. If the Hessian matrix H(x) is positive definite for all x, then f is strictly convex.

Proof: The following discussion proves 1. From the definitions, strict concavity implies concavity. The remaining calculations show that concavity is equivalent to negative semidefiniteness. Recall that a function, f, is concave if given
x and y, f (θ x + (1 − θ) y) ≥ θ f ( x) + (1 − θ) f ( y), θ ∈ [0, 1]. Concavity is determined
by comparing any two points with a third point on the line defined by those two points. Given any two points x, y the line defined by these two points is
L_xy = {w : w = (1 − θ)x + θy, some θ ∈ R}. Let f_Lxy be the function f defined on L_xy by: f_Lxy(θ) = f(θx + (1 − θ)y), ∀θ. Observe that for every line L_xy, f_Lxy is
concave in θ if and only if f is concave.
f_Lxy(λθ + (1 − λ)θ′) = f((λθ + (1 − λ)θ′)x + [1 − (λθ + (1 − λ)θ′)]y)
  = f(λθx + (1 − λ)θ′x + [(λ + (1 − λ)) − (λθ + (1 − λ)θ′)]y)
  = f(λθx + (1 − λ)θ′x + λ(1 − θ)y + (1 − λ)(1 − θ′)y)
  = f(λ[θx + (1 − θ)y] + (1 − λ)[θ′x + (1 − θ′)y])
  ≥ λf(θx + (1 − θ)y) + (1 − λ)f(θ′x + (1 − θ′)y) = λf_Lxy(θ) + (1 − λ)f_Lxy(θ′),

where the inequality holds if and only if f is concave. Thus, f_Lxy(θ) = f(θx + (1 − θ)y) = f(y + θ(x − y)) is concave for any x, y. Let z = x − y, so f_Lxy(θ) = f(y + θz). By assumption, f_Lxy is concave, so that f_Lxy,θθ ≤ 0. Now,

f_Lxy,θ = Σ_i f_i(y + θz)z_i

and

f_Lxy,θθ(θ) = Σ_j Σ_i f_ji(y + θz)z_j z_i = z′H(y + θz)z.

So, concavity of f_Lxy in θ for every L_xy is equivalent to z′H(y + θz)z ≤ 0 for all y, z and θ.
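The identity between the second derivative of f along a line and z′Hz can be illustrated numerically. A sketch, assuming the concave quadratic f(x1, x2) = −x1² − x2² + x1x2 (Hessian [[−2, 1], [1, −2]], which is negative definite):

```python
# f(x1, x2) = -x1**2 - x2**2 + x1*x2 is concave.  Restricted to any line
# y + theta*z, its second derivative in theta equals z'Hz, which is <= 0.
def f(x):
    return -x[0]**2 - x[1]**2 + x[0]*x[1]

def f_line(theta, y, z):
    return f([y[0] + theta*z[0], y[1] + theta*z[1]])

y, z, h = [0.3, -1.2], [2.0, 0.7], 1e-4   # arbitrary base point and direction
exact = -2*z[0]**2 + 2*z[0]*z[1] - 2*z[1]**2   # z'Hz computed by hand
for theta in (-1.0, 0.0, 0.5, 2.0):
    d2 = (f_line(theta + h, y, z) - 2*f_line(theta, y, z)
          + f_line(theta - h, y, z)) / h**2   # finite-difference 2nd derivative
    assert abs(d2 - exact) < 1e-3 and d2 <= 0
print("second derivative along the line =", exact)   # negative, so concave
```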
Remark 11.7: Note that strict concavity of a function f does not imply that H(x) is negative definite for all x; and strict convexity does not imply that H(x) is positive definite for all x. For example, consider f(x) = −(x − 1)⁴, a strictly concave function. Differentiating twice gives f′′(x) = −12(x − 1)². Note that f′′(1) = 0, so that f′′ is only negative semidefinite despite the fact that the function f is strictly concave.
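A quick numerical illustration of the remark (the finite-difference step is an assumption of the sketch):

```python
# f(x) = -(x - 1)**4 is strictly concave, yet f''(1) = 0: negative definiteness
# of the Hessian is sufficient, not necessary, for strict concavity.
def f(x):
    return -(x - 1.0)**4

def fpp(x, h=1e-4):
    # finite-difference second derivative
    return (f(x + h) - 2*f(x) + f(x - h)) / h**2

print(fpp(1.0))   # approximately 0: only negative semidefinite at x = 1
# Strict concavity still holds: any chord lies strictly below the graph.
x, y, t = 0.0, 2.0, 0.5
assert f(t*x + (1 - t)*y) > t*f(x) + (1 - t)*f(y)
```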
Exercises for Chapter 11
Exercise 11.1 The total weekly revenue (in dollars) that a company obtains from selling goods x and y is:

R(x, y) = −(1/4)x² − (3/8)y² − (1/4)xy + 300x + 240y.
The total weekly cost attributed to production is

C(x, y) = 180x + 140y + 5000.

Find the profit-maximizing levels of x and y and check the second-order conditions.

Exercise 11.2 A television relay station will serve three towns, A, B and C, located in the (x, y) plane at coordinates A = (30, 20), B = (−20, 10) and C = (10, −10). Plot the location of the three towns in the (x, y) plane and find the location of a transmitter, chosen to minimize the sum of the squares of the distances of the transmitter from each of the towns. For two points (x1, y1) and (x2, y2), the distance between them is [(x1 − x2)² + (y1 − y2)²]^{1/2}.

Exercise 11.3 Consider the following duopoly model. There are two firms supplying a market where demand is given by p(Q) = a − bQ. Firm i produces qi units of output and so the total level of production is Q = q1 + q2. Both firms face the same constant marginal cost, so the cost of producing qi for firm i is cqi. Thus the profit functions of firms 1 and 2, respectively, are given by:

π1(q1, q2) = p(q1 + q2)q1 − cq1 = (a − b[q1 + q2])q1 − cq1
π2(q1, q2) = p(q1 + q2)q2 − cq2 = (a − b[q1 + q2])q2 − cq2.
(a) Suppose that each firm takes the output of the other firm as fixed, and calculates its own best choice of quantity. In this case, firm i solves ∂πi/∂qi = 0, taking as given the value of qj. Calculate for the linear demand case the equations:

(∂π1/∂q1, ∂π2/∂q2) = (0, 0).

Solve for q1 and q2 and denote the solutions q1d and q2d, where d denotes duopoly. Compare the value of Qd = q1d + q2d with the monopoly level of output.

(b) Suppose now that we have n firms instead of two. This case is called oligopoly. Let the output of the ith firm be qio, where o denotes oligopoly, and see if you can solve for Qo = q1o + q2o + ··· + qno.

(c) Suppose now that a government tax per unit of output is imposed on each firm, with tax ti per unit on firm i. Then the profit functions become:

π1(q1, q2; t1) = p(q1 + q2)q1 − cq1 − t1q1 = (a − b[q1 + q2])q1 − cq1 − t1q1
π2(q1, q2; t2) = p(q1 + q2)q2 − cq2 − t2q2 = (a − b[q1 + q2])q2 − cq2 − t2q2.

Repeat the exercise in (a), and solve for q1 and q2. In this case, q1 and q2 will depend on t1 and t2 (and the other parameters a, b, c), so we can write q1(t1, t2) and q2(t1, t2). The total tax revenue of the government is T(t1, t2) = t1q1(t1, t2) + t2q2(t1, t2). Maximize T with respect to t1 and t2, and check that the second-order conditions for a maximum are satisfied.

Exercise 11.4 Consider a firm that sells its output, y, at a fixed price p. In the production process, the firm creates pollution. To control pollution, the firm can engage in pollution abatement, a. The technology is as follows: the cost of producing y units of output when operating at abatement level a is given by c(y, a), and the level of pollution emission is e(y, a). We assume that for the cost function,
cy > 0, cyy > 0, ca > 0 and caa > 0. Similarly, for the emission process, e, we assume that ey > 0, eyy > 0, ea < 0 and eaa > 0. In addition, we assume that cay (= cya) and eay (= eya) are both positive. Finally, the government imposes a (unit) tax, τ, on the level of pollution emission. The profit function for the firm is then given by:

π(y, a) = py − c(y, a) − τe(y, a).

(a) Find the first-order conditions for profit maximization with respect to choice of (y, a).

(b) Determine the second-order conditions.
(c) In the case where c(y, a) = y²a + γay and e(y, a) = y − a² + βay, with γ, β parameters, find the profit-maximizing levels of y and a. Check the second-order conditions. From the solution to the first-order conditions, calculate ∂y/∂τ and ∂a/∂τ.

Exercise 11.5 A firm’s costs of plant operation are a function of two variables, x and y. The cost function is given by:

c(x, y) = 3x² − 2xy + y² + 4x + 3y.

Give the solution to the first-order conditions and confirm that the solution minimizes the cost function.

Exercise 11.6 A box with edge lengths x, y and z has volume V = xyz. The surface area of a box with these dimensions (open at the top) is: S = xy + 2yz + 2xz. (The base of the box has length x and width y. Since there is no top, xy occurs only once in the definition of S.) Suppose you have to choose a box size with volume 32; what choice of x, y and z is optimal to minimize surface area? Note that, in this question, there are only two choice variables once the volume is fixed. The variable z can be substituted out: z = 32/(xy). Find the optimal choices of the variables to minimize surface area S, and check that the second-order conditions for a minimum are satisfied.

Exercise 11.7 Consider a firm that sells its output, y, at a fixed price p. In the production process, the firm creates pollution. To control pollution, the firm can engage in pollution abatement, a. The technology is as follows: the cost of producing y units of output when operating at abatement level a is given by c(y, a), and the level of pollution emission is e(y, a). Finally, the government imposes a (unit) tax, τ, on the level of pollution emission. The profit function for the firm is then given by:

π(y, a) = py − c(y, a) − τe(y, a).
Give the first- and second-order conditions for a maximum. Find the profit-maximizing levels of output and abatement when the cost function is given by c(y, a) = y² + (1/2)a², and e(y, a) = y(10 − a). Check that the second-order conditions are satisfied. We assume that the tax rate is less than 1 (0 < τ < 1), and p > 10.
Exercise 11.8 Find the maximizing values of (x, y) for the following function, and confirm that the second-order conditions are satisfied:

f(x, y) = √x + ln y − x² − 3y.
Exercise 11.9 Find the maximizing values of (x, y) for the following function, and confirm that the second-order conditions are satisfied:

f(x, y) = a ln x + b ln y − (1/2)c(x² + y²).

Exercise 11.10 Find the maximizing values of (x, y) for the following function, and confirm that the second-order conditions are satisfied:

f(x, y) = ln(ax + by) − (1/2)c(x² + y²).

Exercise 11.11 Consider the problem of maximizing a function, f(x, y; a), by
choice of x and y, where a is some parameter. Suppose that the solution values for
x and y are x(a) and y(a), respectively. The optimized value of the function is then f∗(a) = f(x(a), y(a); a).

(a) Explain why df∗(a)/da = ∂f(x(a), y(a); a)/∂a.

(b) Illustrate this in the one-variable case, where f(x; a) = a ln(x) − (1/a)x.

Exercise 11.12 Consider the following problem. Minimize f(x, y) = (x − a)² + (y − b)² − (1/2)cxy.

(a) Find the minimizing values for x and y. (Assume that c < 1.)

(b) Confirm that the second-order conditions are satisfied. Let f∗(a, b, c) be the value of the function at the solution.

(c) Find ∂f∗(a, b, c)/∂a and ∂f∗(a, b, c)/∂c.
Exercise 11.13 A firm maximizes profit, where profit is equal to revenue minus cost. Output (Q ) depends on two inputs, K and L, related according to the production function Q = f (K, L). Output sells at price p and the costs of the inputs K and
L are r and w, respectively. So the profit maximization problem is: max π(K, L) = p f (K, L) − rK − wL.
Give the first- and second-order conditions for profit maximization. In the particular case where f(K, L) = K^α L^β with α > 0, β > 0 and 0 < α + β < 1, verify that the second-order conditions are satisfied and solve for the optimal choice of (K, L).

Exercise 11.14
A standard econometric problem is determining the ordinary
least squares estimator (OLS). In the underlying model, the i th observation on one variable y is denoted yi and the corresponding observation on another variable x is denoted x i . It is postulated that y and x are linearly related with some
error, yi = β0 + β1xi + εi. If a and b are chosen as the intercept and slope, respectively, then the estimated error associated with the ith observation is ei = yi − a − bxi. The OLS estimator is obtained by minimizing the sum of squares of the estimated errors, Σ_{i=1}^n e_i², by choice of a and b. Thus, the OLS estimator is obtained by solving the problem:

min_{a,b} S = Σ_{i=1}^n e_i² = min_{a,b} Σ_{i=1}^n (yi − a − bxi)².
Find the OLS estimator and confirm the second-order conditions for a minimum.

Exercise 11.15 A monopolist can sell a product in three different countries. The demand in each country is:

Q1 = 50 − (1/2)P1
Q2 = 50 − (1/3)P2
Q3 = 75 − (1/4)P3.

The cost function for the monopolist is C(Q) = 30 + Q², where Q = Q1 + Q2 + Q3. Calculate the amount of the product that the profit-maximizing monopolist should sell in each country and check the second-order conditions for a maximum.

Exercise 11.16 For the following functions, calculate the Hessians and determine if they are positive or negative definite:

(a) f(x, y) = −2x² − 7y² + xy + 12x + 37y
(b) f(x, y) = 10x² + 5y² + 3x − 12y + 9xy
(c) f(x, y, z) = 3x² + 12y² + 5z² − 13x + 12y + 2xz − 3yz
(d) f(x, y, z) = 2x + 3y − 5xz + 2xy − 6x² − 8y² − 11z².
Exercise 11.17 For the following function, find all extreme points and check if they are maxima or minima:

f(x, y) = 3y³ + 2x² + 4xy + 8y² + 2y.
Exercise 11.18 Given the three-input Cobb–Douglas production function,

Y = AL^α K^β M^γ,

where L is the amount of labour, K the amount of capital and M the amount of other materials used, prove that the second-order conditions for profit maximization hold if and only if α + β + γ < 1 (α, β, γ > 0). The good sells for a price of a dollar per unit. The costs of labour, capital and materials are, respectively, w, r and p. Find the Hessian matrix and confirm that it is negative definite.

Exercise 11.19 The geometric average of three (positive) numbers x, y and z is given by (xyz)^{1/3}. Suppose that x + y + z = k. Show that the geometric average is maximized when x = y = z.

Exercise 11.20
A firm uses two inputs x, y to produce Q according to Q = A(x^α + y^β), with 0 < α, β < 1. The production cost is given by c(x, y), so that profit is π(x, y) = pA(x^α + y^β) − c(x, y), where p is the price of Q. Give the first- and second-order conditions for a maximum. In the case where c(x, y) = ax + by + dxy, a, b, d > 0, find the maximizing values of x, y and show that the conditions for a maximum are satisfied.
Exercise 11.21 For the function f(x, y) = 10x² + 10y² + 3x − 12y + 10xy, show that the Hessian is positive definite.

Exercise 11.22 For the function f(x, y, z) = 2x + 3y − 5xz + 2xy − 6x² − 9y² − 12z², show that the Hessian is negative definite.
12 Equality-constrained optimization
12.1 Introduction

Optimization of a function f(x1, ..., xn) involves choosing the xi variables to make the function f as large or as small as possible. With constrained optimization, permissible choices for the xi are restricted or constrained. In what follows, such constraints are represented by a function, g, whereby the xi variables must satisfy a condition of the form g(x1, ..., xn) = c. For example, in the standard utility maximization problem, the task is to maximize utility u(x1, ..., xn) subject to a budget constraint p1x1 + ··· + pnxn = c, where pi is the price of good i and c is the income available.
Constraints limit the choices that are feasible. In the case of two goods, the budget constraint is p1x1 + p2x2 = c, so that the feasible pairs lie on a line. In this case, there is one constraint with two choice variables. The choice of one variable determines the choice of the other: given x1, x2 is determined as x2 = c/p2 − (p1/p2)x1. Suppose that in addition to the budget constraint there is a second constraint – that the two goods are complements that must be consumed in a set ratio: x2/x1 = r, or x2 = rx1. Given this, from p1x1 + p2x2 = c one has p1x1 + p2rx1 = c, or (p1 + p2r)x1 = c, so x1 = c/(p1 + p2r) and x2 = cr/(p1 + p2r). Thus, there is a unique pair satisfying the two constraints. This observation is generally true; if the number of constraints equals the number of choice variables, there are generally at most a finite number of solutions to the set of constraints. For this reason, the number of constraints is generally smaller than the number of choice variables. The next section provides a motivating example showing how the Lagrangian technique relates to the familiar first-order condition (marginal rate of substitution equals the price ratio) for an optimum in the consumer choice problem.
Section 12.2 considers the case of two variables and one constraint. The case of n variables and one constraint is discussed in Section 12.4. It turns out that the Lagrange multiplier appearing in the definition of the Lagrangian has a significant economic interpretation: it provides a measure of the value of the constraint. Specifically, it measures the marginal change in the value of the objective resulting from a small change in the constraint. This is examined in Section 12.5. In the case of unconstrained optimization, identification of sufficient conditions for a global optimum is simplified when the objective has a specific shape, particularly concavity or convexity. Similarly, with constrained optimization, under suitable conditions, discussed in Section 12.6, the local conditions for an optimum are sufficient for the solution to be a global optimum.
12.1.1 A motivating example
The optimization techniques described next are based on the use of Lagrangian functions. The discussion here shows how that formulation may be related to optimization ideas introduced earlier. Consider the problem of maximizing u(y, l) where y is income and l is leisure. Work is time not used for leisure, 24 − l, and
income is given by the wage w times the amount of time spent working, so income and leisure are connected according to the equation or constraint y = w(24 − l). Assume that the marginal benefit of both income and leisure is positive: u_y > 0 and u_l > 0. Maximizing u(y, l) subject to the constraint y = w(24 − l) is an equality-constrained optimization problem. Preferences and the leisure–income constraint are depicted in Figure 12.1. Note that the condition y = w(24 − l) implies that there is only one variable to be chosen – once l is chosen, the value of y is determined. Using this fact, y can be eliminated from the problem – the constraint can be substituted into the objective to give u∗(l) = u(w(24 − l), l), and the function u∗ can be maximized with respect to l as an unconstrained problem. The function u∗ is depicted in Figure 12.2. Differentiating and setting to 0:

u∗′ = 0 ⇒ −u_y w + u_l = 0 ⇒ u_l/u_y = w.
For example, if u(y, l) = ln y + ln l, then u∗(l) = ln(w(24 − l)) + ln(l). Differentiating u∗(l) and setting the derivative to 0:

u∗′(l) = −w/(w(24 − l)) + 1/l = 0 ⇒ −l + (24 − l) = 0 = 24 − 2l.

This has solution l = 12, y = w(24 − 12) = 12w. The second-order condition for a maximum is easy to check: u∗′′(l) = −(1/(24 − l))² − (1/l)² < 0.

[Figure 12.1: Constrained optimal choice – indifference curves and the income–leisure constraint y = w(24 − l)]

[Figure 12.2: Optimal leisure choice – the substituted objective u∗(·)]

Thus, using the constraint to solve for one variable in terms of the other leads to an unconstrained problem in one variable that can be solved as the unconstrained maximization of a function of one variable. With just two choice variables, this approach is often satisfactory, but becomes less manageable as the number of choice variables increases. An alternative approach to constrained optimization is to define a function called a Lagrangian. The Lagrangian is denoted L. It is the original function plus the constraint multiplied by a “Lagrange multiplier”. The arguments of the function L are the choice variables and a Lagrange multiplier, λ. In the present
example, the Lagrangian is defined:

L(y, l, λ) = u(y, l) + λ(y − w(24 − l)).

This is now viewed as the objective and maximized with respect to all variables, including the Lagrange multiplier:

∂L/∂y = u_y + λ = 0   (12.1)
∂L/∂l = u_l + λw = 0   (12.2)
∂L/∂λ = y − w(24 − l) = 0.   (12.3)

The first two conditions give u_l/u_y = w, and this condition imposes the tangency of the indifference curve to the income–leisure constraint in Figure 12.1. The third equation gives y − w(24 − l) = 0. Solving these determines the optimal values for l and y.
To illustrate, consider the case where u(y, l) = ln y + ln l, so that u_y = 1/y, u_l = 1/l, and so w = u_l/u_y = (1/l)/(1/y) = y/l ⇒ wl = y. Since y = w(24 − l) = wl, this gives wl = 24w − lw, or 24w = 2lw. Therefore l = 12 and y = wl = 12w. This is the same solution as was obtained by substituting the constraint into the objective function. From Equation 12.2, u_l + λw = 0. This equation determines λ: λ = −u_l/w = −1/(wl) = −1/(12w). Thus, the solution is the same as that obtained by substitution, using the constraint to solve for one variable in terms of the other and substituting that into the function to be maximized. In this example, there were just two variables and y was given as an explicit function of l. More generally, with many variables, more complicated constraint functions and possibly multiple constraints, solution by substitution becomes infeasible and use of Lagrangians is the standard approach. Before discussing the general case, the two-variable case is considered in more detail.
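The substitution solution above can be confirmed with a direct grid search over the substituted objective u∗(l) (a sketch; the wage w = 2 is an illustrative assumption):

```python
import math

# Maximize u(y, l) = ln y + ln l subject to y = w(24 - l): the text finds
# l = 12, y = 12w.  Grid-search the substituted objective (w = 2 assumed).
w = 2.0

def u_star(l):
    return math.log(w * (24 - l)) + math.log(l)

# grid over (0, 24), avoiding the endpoints where the logs blow up
grid = [0.01 + 23.98 * i / 100000 for i in range(100001)]
l_best = max(grid, key=u_star)
print(l_best, w * (24 - l_best))   # close to l = 12 and y = 12w = 24
```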
12.2 The two-variable case

The following discussion considers in general terms the case where there are two choice variables and one equality constraint. Let the objective function be f(x, y) and let the constraint be c = g(x, y). Throughout, assume that (x, y) ∈ Z, where Z is an open set in R². Thus, if (x̄, ȳ) is in Z, all points (x′, y′) sufficiently close to (x̄, ȳ) are in Z. Furthermore, f and g are assumed to be continuously differentiable
(continuous with continuous derivatives). The task is to maximize or minimize f(x, y) subject to (x, y) satisfying the constraint c = g(x, y).

Definition 12.1: A point (x∗, y∗) is a constrained local maximum if g(x∗, y∗) = c and f(x∗, y∗) ≥ f(x, y) for all (x, y) in a neighborhood of (x∗, y∗) satisfying g(x, y) = c.

A constrained local minimum is similarly defined. To identify necessary and sufficient conditions for (x∗, y∗) to be a constrained local maximum or minimum, it is useful to formulate the Lagrangian, L:
(12.4)
In addition, define the bordered Hessian, H¯ :
0
H¯ = gx
gy
gx
gy
f xx − λ g xx
f xy − λgxy
f xy − λgxy .
(12.5)
f yy − λ g yy
Sometimes, the bordered Hessian is written with a negative sign before the partial derivatives of the constraint (− g x and − g y ). This is equivalent to multiplying the ¯ first row and column by −1, a transformation that leaves the determinant of H unchanged. With this notation:
Theorem 12.1: Consider the problem of maximizing or minimizing the function f(x, y) subject to a constraint g(x, y) = c. Let L(x, y, λ) = f(x, y) + λ[c − g(x, y)]. Suppose (x∗, y∗, λ∗) satisfies:

L_x = f_x(x∗, y∗) − λ∗g_x(x∗, y∗) = 0   (12.6)
L_y = f_y(x∗, y∗) − λ∗g_y(x∗, y∗) = 0   (12.7)
L_λ = c − g(x∗, y∗) = 0.   (12.8)

Then,
• if |H̄| > 0, (x∗, y∗) is a constrained local maximum;
• if |H̄| < 0, (x∗, y∗) is a constrained local minimum.

Assuming the derivatives are non-zero, conditions 12.6 and 12.7 imply that

f_x(x∗, y∗)/f_y(x∗, y∗) = λ∗g_x(x∗, y∗)/(λ∗g_y(x∗, y∗)) = g_x(x∗, y∗)/g_y(x∗, y∗).
This, along with the constraint, c − g(x∗, y∗) = 0, yields two equations in the two unknowns, (x∗, y∗):

f_x(x∗, y∗)/f_y(x∗, y∗) = g_x(x∗, y∗)/g_y(x∗, y∗)   (12.9)
g(x∗, y∗) = c.   (12.10)

These may be solved for (x∗, y∗), and λ∗ solved residually from Equations 12.6 or 12.7.

Example 12.1: A cylinder has circumference of length 2πr, where r is the radius. The area of a circle of radius r is πr². Thus, the surface area of a cylinder of radius r and height h is a(r, h) = 2πr² + 2πrh (top and bottom + wall). The cylinder has a volume of v(r, h) = πr²h.
Consider the problem of maximizing volume subject to total surface area being A. This leads to the Lagrangian:

L(r, h, λ) = v(r, h) + λ(A − a(r, h)) = πr²h + λ(A − 2πr² − 2πrh):

∂L/∂r = 2πrh − λ4πr − λ2πh = 0   (1)
∂L/∂h = πr² − λ2πr = 0   (2)
∂L/∂λ = A − 2πr² − 2πrh = 0.   (3)

Condition (2), ∂L/∂h = 0, gives λ = (1/2)r. Substituting into (1): 2πrh − 2πr² − πrh = 0, or πr(2h − 2r − h) = 0, or h = 2r. Substituting into (3): A − 2πr² − 4πr² = 0, which gives A = 6πr², or:

r² = A/(6π),  r = √(A/(6π)),  h = 2r = 2√(A/(6π)) = √(4A/(6π)).

Thus

L∗ = πr²h = π × (A/(6π)) × √(4A/(6π)) = √(4A³/(36 × 6π)) = √(A³/(54π)).
For example, if A = 24π then r = 2, h = 4, λ = 1, L∗ = 16π. To check the second-order condition (with g(r, h) = a(r, h), the constraint function being the area):

       | 0     g_r    g_h  |   | 0            4πr + 2πh    2πr        |
|H̄| =  | g_r   L_rr   L_rh | = | 4πr + 2πh    2πh − λ4π    2πr − λ2π  |
       | g_h   L_hr   L_hh |   | 2πr          2πr − λ2π    0          |

|H̄| = 32π³r³ − 16π³r²λ + 8π³r²h − 16π³rhλ.

With λ = (1/2)r, this reduces to:

|H̄| = 24π³r³.

This is sufficient for a (local) maximum.

[Figure: the area constraint A = a(r, h) and iso-volume curves V = v(r, h) in the (r, h) plane]

♦
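The bordered Hessian calculation can be verified numerically at the solution for A = 24π:

```python
import math

# Example 12.1 with A = 24*pi: solution r = 2, h = 4, lambda = 1 = r/2.
A = 24 * math.pi
r, h, lam = 2.0, 4.0, 1.0
assert abs(2*math.pi*r**2 + 2*math.pi*r*h - A) < 1e-9   # constraint holds

# Entries of the bordered Hessian (g is the surface-area constraint):
g_r = 4*math.pi*r + 2*math.pi*h
g_h = 2*math.pi*r
L_rr = 2*math.pi*h - 4*math.pi*lam
L_rh = 2*math.pi*r - 2*math.pi*lam
L_hh = 0.0

# det of [[0, g_r, g_h], [g_r, L_rr, L_rh], [g_h, L_rh, L_hh]], expanded
# along the first row:
det = -g_r*(g_r*L_hh - L_rh*g_h) + g_h*(g_r*L_rh - L_rr*g_h)
print(det, 24 * math.pi**3 * r**3)   # both 24*pi^3*r^3 > 0: a local maximum
```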
Remark 12.1: In fact, the problem in Example 12.1 is easily solved by substitution. For fixed surface area A, radius and height are connected by the formula A = 2πrh + 2πr², so that

h(r) = (A − 2πr²)/(2πr) = A/(2πr) − r.

Volume is given as a function of r and h by v(r, h) = πr²h. Substituting for h gives

v∗(r) = v(r, h(r)) = πr²h(r) = πr²[A/(2πr) − r] = (A/2)r − πr³.

From this, v∗′(r) = A/2 − 3πr² and v∗′′(r) = −6πr, so the function v∗ is concave, and solving for r in v∗′(r) = A/2 − 3πr² = 0 gives r² = A/(6π) and r = √(A/(6π)).
Example 12.2: A production technology gives combinations of x and y that yield a fixed level of output Q. The technology is described by the equation y = Q + e^{2x}. The cost of using these inputs is given by the equation C = y − 3x (so that raising x reduces cost!). The task is to minimize cost subject to the production constraint.

[Figure: the constraint y = Q + e^{2x} and an iso-cost line y = c + 3x in the (x, y) plane]

The Lagrangian for the problem is: L = y − 3x + λ[y − Q − e^{2x}]. Therefore,

L_x = −3 − λ2e^{2x} = 0
L_y = 1 + λ = 0
L_λ = y − Q − e^{2x} = 0.

From the second equation, λ = −1, and substituting into the first equation gives 3 = 2e^{2x}. Solving for x, x = (1/2)ln(3/2). Since y = Q + e^{2x}, we get y = Q + e^{2·(1/2)ln(3/2)} = Q + e^{ln(3/2)} = Q + 3/2.

For the second-order condition, L_xx = −λ4e^{2x}, L_yy = 0, L_xy = 0. The constraint is 0 = g(x, y), where g(x, y) = −y + Q + e^{2x}, so that g_x = 2e^{2x} and g_y = −1. The bordered Hessian is

      | 0         2e^{2x}      −1 |
H̄ =  | 2e^{2x}   −λ4e^{2x}    0  |  ⇒  |H̄| = λ4e^{2x} = −4e^{2x} < 0.
      | −1        0            0  |

Thus, the solution to the first-order conditions gives a local minimum. ♦
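The solution can be checked by substituting the constraint and scanning the cost along it (a sketch; Q = 5 is an illustrative assumption, and Q only shifts the cost level, not the minimizer):

```python
import math

# Example 12.2: on the constraint y = Q + e^(2x), cost is C(x) = Q + e^(2x) - 3x.
# The text's solution is x* = (1/2)ln(3/2) ~ 0.2027.
Q = 5.0
x_star = 0.5 * math.log(1.5)

def cost(x):
    return Q + math.exp(2 * x) - 3 * x

grid = [-2.0 + 4.0 * i / 100000 for i in range(100001)]
x_best = min(grid, key=cost)
print(x_best, x_star)   # the grid minimizer is close to (1/2)ln(3/2)
```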
Example 12.3: In this example, the task is to find a pair (x, y) that minimizes the radius of a circle, subject to (x, y) being on a line: min f(x, y) = x² + (y − 1)², subject to α = y − kx, 0 < α < 1, 0 < k. The Lagrangian is

L = x² + (y − 1)² + λ(α − y + kx).

This leads to first-order conditions:

L_x = 2x + λk = 0
L_y = 2(y − 1) − λ = 0
L_λ = α − y + kx = 0.

From these, y = α + kx, y − 1 = (α − 1) + kx, so 2[(α − 1) + kx] − λ = 0. Then 2x + 2[(α − 1) + kx]k = 0, or (1 + k²)x = (1 − α)k, so x = (1 − α)k/(1 + k²). Solving for y: y = α + (1 − α)k²/(1 + k²) = (α + k²)/(1 + k²), and λ = −(2/k)x = −2(1 − α)/(1 + k²). Next, L_xx = 2, L_yy = 2, L_xy = 0, and g_x = −k, g_y = 1. The bordered Hessian is

      | 0    −k   1 |
H̄ =  | −k   2    0 |
      | 1    0    2 |

and |H̄| = k(−2k) + 1(−2) = −2(1 + k²) < 0. A graph of the problem is given in the figure.

[Figure: the circle centred at (0, 1) tangent to the line y = α + kx]

♦
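Geometrically, the solution is the foot of the perpendicular from (0, 1) to the line, which gives an independent check of the closed form (α = 0.5 and k = 2 are illustrative assumptions):

```python
# Example 12.3: the solution x* = (1-alpha)k/(1+k^2), y* = (alpha+k^2)/(1+k^2)
# is the point of the line y = alpha + k*x nearest (0, 1), so the vector from
# (0, 1) to (x*, y*) must be perpendicular to the line's direction (1, k).
alpha, k = 0.5, 2.0      # illustrative parameter values
x_star = (1 - alpha) * k / (1 + k**2)
y_star = (alpha + k**2) / (1 + k**2)

assert abs(y_star - (alpha + k * x_star)) < 1e-12   # (x*, y*) lies on the line
dot = x_star * 1 + (y_star - 1) * k                 # orthogonality check
print(x_star, y_star, dot)                          # 0.2, 0.9, 0 (up to rounding)
```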
Remark 12.2: Theorem 12.1 gives sufficient conditions for a local maximum. But a local (or global) maximum may fail to satisfy these conditions. Suppose that f(x, y) = xy and g(x, y) = (x + y − 1)². Consider the problem of maximizing f subject to the constraint g(x, y) = 0. The constraint is satisfied if and only if x + y − 1 = 0, or x + y = 1, or y = 1 − x. Substituting y = 1 − x into the objective and maximizing f̂(x) = x(1 − x) gives f̂′(x) = 1 − 2x and f̂′′(x) = −2, so that the local and global maxima are found at x∗ = y∗ = 1/2. But observe that g_x(x, y) = g_y(x, y) = 2(x + y − 1), so that at the solution (x∗, y∗) = (1/2, 1/2), g_x(x∗, y∗) = g_y(x∗, y∗) = 0. However, (f_x, f_y) = (y, x), so that (f_x(x∗, y∗), f_y(x∗, y∗)) = (1/2, 1/2). So, for example, in Theorem 12.1, the condition f_x(x∗, y∗) = λ∗g_x(x∗, y∗) fails, although (x∗, y∗) is the global optimum. Because (g_x(x∗, y∗), g_y(x∗, y∗)) = (0, 0), it is impossible to write f_x(x∗, y∗) = λ∗g_x(x∗, y∗). The issue here is that at the solution, (g_x(x∗, y∗), g_y(x∗, y∗)) = (0, 0). Theorem 12.2 confirms this observation.
Theorem 12.2: Let f and g be continuously differentiable functions on an open set Z in R². Suppose that (x∗, y∗) is a constrained local maximum or minimum of f, subject to the constraint g(x, y) = c. Suppose that (g_x(x∗, y∗), g_y(x∗, y∗)) ≠ (0, 0). Then there is a λ∗ such that

(f_x(x∗, y∗), f_y(x∗, y∗)) = λ∗(g_x(x∗, y∗), g_y(x∗, y∗)).

Therefore, when (g_x(x∗, y∗), g_y(x∗, y∗)) ≠ (0, 0), the conditions f_x = λg_x, f_y = λg_y and g = c give three equations in three unknowns, (x, y, λ), which may be solved to identify a candidate solution to the optimization problem.
The following discussion motivates the first-order conditions and derives the bordered Hessian second-order condition using familiar arguments for the unconstrained optimization problem. Then the problem is described from a geometric perspective. Following that, the general result for the n-variable case is given, in Theorem 12.4.
12.3 Motivation for first- and second-order conditions

Observe that the constraint g(x, y) = c may be used to solve for y as a function of x. Assume that either ∂g/∂x ≠ 0 or ∂g/∂y ≠ 0, so that we can write one variable as a function of the other. Recall the discussion on implicit functions. Let F(x, y) = 0 = g(x, y) − c and assume that F has continuous partial derivatives. If ∂F/∂y ≠ 0, then y can be written as a function, h, of x, while if ∂F/∂x ≠ 0, then x can be written as a function, φ, of y. Since ∂F/∂y = ∂g/∂y, if ∂g/∂y ≠ 0, then there is a function with y = h(x). Similarly, since ∂F/∂x = ∂g/∂x, if ∂g/∂x ≠ 0, then there is a function φ with x = φ(y). (These are local
properties.) For example, if g(x, y) = xy = c, then y = c/x = h(x); if g(x, y) = ln(xy) = c, then xy = e^c, and y = (1/x)e^c = h(x); if g(x, y) = px + qy = c, then y = c/q − (p/q)x = h(x). Note that the presumption here is that g_y ≠ 0. (In particular, this must hold at the candidate solution to the optimization problem, so that the issue raised in Remark 12.2 is assumed not to arise.)
12.3.1 The first-order condition
Assume that we can write y as a function of x: y = h(x). Then, by substituting for y, the problem reduces to max_x f(x, h(x)) (or max_y f(ϕ(y), y) if x is substituted out). The original problem, with two choice variables and a constraint, has been reduced to a one-variable unconstrained problem – maximizing the function f̂(x) = f(x, h(x)). With just two variables, this procedure is, in principle, straightforward. With more than two variables, the approach is more complex and in general it is more convenient to use Lagrangians. The following calculations show that solving max_x f(x, h(x)) and then determining y from x (y = h(x)) gives the same solution as the Lagrangian method, solving max f(x, y) subject to g(x, y) = c. Specifically, solving max_x f(x, h(x)) leads to the same first-order conditions for (x, y) as obtained earlier, and the second-order condition is shown to be equivalent to the bordered Hessian condition. Maximizing f(x, h(x)) with respect to x, the first-order condition is
0 = df̂/dx = f_x + f_y h′, where h′ = dy/dx = −g_x/g_y.

Thus,

0 = f_x + f_y(−g_x/g_y), so that f_x/g_x = f_y/g_y.
This condition and c = g(x, y) give the same conditions as obtained from the Lagrangian first-order conditions.
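As a quick check of this equivalence, the sketch below works through a toy example of my own (not from the text): f(x, y) = xy subject to x + y = 10, so that h(x) = 10 − x. The substitution method and the Lagrangian ratio condition f_x/g_x = f_y/g_y pick out the same point.

```python
# Sketch (hypothetical example): compare the substitution method with the
# Lagrangian first-order condition for f(x, y) = x*y subject to x + y = 10.

def f(x, y):
    return x * y

# Substitution: maximize fhat(x) = f(x, h(x)) = x*(10 - x); fhat'(x) = 10 - 2x = 0.
x_star = 10 / 2.0          # = 5.0
y_star = 10 - x_star       # h(x_star) = 5.0

# Lagrangian condition: f_x / g_x = f_y / g_y at the optimum.
f_x, f_y = y_star, x_star  # partials of x*y
g_x, g_y = 1.0, 1.0        # partials of x + y
assert abs(f_x / g_x - f_y / g_y) < 1e-12

# The common ratio is the multiplier lambda* (= 5 here).
lam = f_x / g_x
```

The same λ∗ would appear as the multiplier in the Lagrangian formulation of this problem.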
12.3.2 The second-order condition
Maximizing the function f̂(x) = f(x, h(x)) gives a first-order condition for an optimum. The second-order condition, f̂″(x) < 0 for a maximum or f̂″(x) > 0 for a minimum, may also be related to the bordered Hessian condition: the discussion to follow shows that f̂″ < 0 or f̂″ > 0 is equivalent to the corresponding bordered Hessian conditions for a local constrained optimum.
CHAPTER 12. EQUALITY-CONSTRAINED OPTIMIZATION

For a maximum we require that d²f̂/dx² < 0, which is

d/dx{df̂/dx} = d/dx{f_x + f_y h′} < 0.

Evaluating this,

d/dx{f_x + f_y h′} = d/dx{f_x(x, h(x)) + f_y(x, h(x)) h′(x)}
= f_xx + f_xy h′ + f_xy h′ + f_yy(h′)² + f_y h″
= f_xx + 2 f_xy h′ + f_yy(h′)² + f_y h″.
Since

h′(x) = −g_x(x, h(x))/g_y(x, h(x)),

differentiating again,

h″(x) = −[g_y(g_xx + g_xy h′) − g_x(g_xy + g_yy h′)]/g_y²
= −[g_xx g_y − g_xy g_x − g_xy g_x + g_yy g_x²/g_y]/g_y²   (using h′ = −g_x/g_y)
= −[g_xx g_y² − 2 g_xy g_x g_y + g_yy g_x²]/g_y³.

Using h′ = −g_x/g_y, h″ as above, and λ = f_y/g_y gives the second derivative as:

d²f̂/dx² = f_xx + 2 f_xy(−g_x/g_y) + f_yy(g_x²/g_y²) + f_y h″
= (1/g_y²){f_xx g_y² − 2 f_xy g_x g_y + f_yy g_x² − (f_y/g_y)(g_xx g_y² − 2 g_xy g_x g_y + g_yy g_x²)}
= (1/g_y²){(f_xx − λg_xx) g_y² − 2(f_xy − λg_xy) g_x g_y + (f_yy − λg_yy) g_x²},

so

(d²f̂/dx²) × g_y² = (f_xx − λg_xx) g_y² − 2(f_xy − λg_xy) g_x g_y + (f_yy − λg_yy) g_x².

Now, consider the determinant:
      | 0     g_x            g_y           |
|H̄| = | g_x   f_xx − λg_xx   f_xy − λg_xy  |
      | g_y   f_xy − λg_xy   f_yy − λg_yy  |
Evaluating the determinant:

|H̄| = −g_x{g_x(f_yy − λg_yy) − g_y(f_xy − λg_xy)} − g_y{−g_x(f_xy − λg_xy) + g_y(f_xx − λg_xx)}
    = −{(f_xx − λg_xx) g_y² − 2(f_xy − λg_xy) g_x g_y + (f_yy − λg_yy) g_x²}
    = −(d²f̂/dx²) × g_y².

Therefore, |H̄| has the opposite sign to d²f̂/dx². For a maximum, the condition d²f̂/dx² < 0 is sufficient and this is equivalent to the condition:

| 0     g_x            g_y           |
| g_x   f_xx − λg_xx   f_xy − λg_xy  |  > 0.
| g_y   f_xy − λg_xy   f_yy − λg_yy  |

Since multiplying a row and a column by −1 does not change either the value or sign of the determinant, the determinant may also be written:

| 0      −g_x           −g_y          |
| −g_x   f_xx − λg_xx   f_xy − λg_xy  |
| −g_y   f_xy − λg_xy   f_yy − λg_yy  |

Conversely, in a minimization problem the determinant should be negative, i.e.,

| 0     g_x            g_y           |
| g_x   f_xx − λg_xx   f_xy − λg_xy  |  < 0.
| g_y   f_xy − λg_xy   f_yy − λg_yy  |

Finally, since L_xx = f_xx − λg_xx, etc., the determinant may be written succinctly:

      | 0     g_x    g_y  |
|H̄| = | g_x   L_xx   L_xy |
      | g_y   L_yx   L_yy |

12.3.3 A geometric perspective
With just one constraint, g, and one objective, f, constrained optimization involves locating a point where the objective is tangent to the constraint and then confirming that this is an optimum. The following discussion provides a different perspective. The slope of the constraint may also be measured in terms of a line perpendicular to the constraint. Figure 12.3 shows the perpendicular, P, to the line L and the curve f(x, y) = c in each panel. In fact, the perpendicular to a curve f(x, y) = c is proportional to the vector of partial derivatives, (f_x(x, y), f_y(x, y)). Furthermore, this vector P points in the direction of greatest local increase in the function and therefore has a direct connection to the optimization problem. To motivate this perspective, first consider the case where the curve is a line such as L in Figure 12.3A. From the figure, the slope of the line is −y/x and the angle between L and the horizontal axis is α. With the vertical dashed line as the baseline, the slope of the angle β is x/y. Because α and β are in two corners of a right-angled triangle, these two angles sum to 90°. Consider the line P, drawn perpendicular to L. Note that the angles α and α∗ are equal, and because the angles α∗ and β∗ sum to 90°, β∗ is the same angle as β, and so has the same slope, x/y. Noting that the slope associated with α times the slope associated with β is (−y/x) × (x/y) = −1, two lines, L and P, are perpendicular if and only if their slopes multiply to −1. Now, consider
[Figure 12.3: Orthogonal lines and normal vectors. Panel A shows the line L, its perpendicular P, and the angles α, β, α∗, β∗; panel B shows the level curve f(x, y) = c through the point (x̄, ȳ) with the normal vector (f_x, f_y).]
Figure 12.3B. Along the level surface f(x, y) = c, totally differentiating and setting to 0 gives: 0 = f_x(x, y)dx + f_y(x, y)dy, so that the slope along the level surface is −f_x/f_y. The slope at (x̄, ȳ), given by the dashed line, is dy/dx = −f_x(x̄, ȳ)/f_y(x̄, ȳ). The line P is chosen perpendicular to the dashed line. Since the slope of P, s_P, multiplied by the slope of the dashed line equals −1, −1 = −[f_x(x̄, ȳ)/f_y(x̄, ȳ)] × s_P, so s_P = f_y(x̄, ȳ)/f_x(x̄, ȳ). Thus, the direction P, starting from (x̄, ȳ), is proportional to (f_x(x̄, ȳ), f_y(x̄, ȳ)): the line perpendicular to the tangent to the level surface is proportional to (f_x(x̄, ȳ), f_y(x̄, ȳ)).
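This orthogonality is easy to spot-check numerically. The sketch below uses a function of my own choosing (f(x, y) = x² + 3y², not from the text) and finite differences to confirm that the gradient is perpendicular to the tangent direction of the level curve.

```python
# Sketch (illustrative f chosen here): verify numerically that the gradient
# (f_x, f_y) is perpendicular to the tangent of the level curve f(x, y) = c.

def f(x, y):
    return x**2 + 3 * y**2

x0, y0 = 1.0, 2.0
eps = 1e-6

# Central-difference gradient at (x0, y0).
fx = (f(x0 + eps, y0) - f(x0 - eps, y0)) / (2 * eps)
fy = (f(x0, y0 + eps) - f(x0, y0 - eps)) / (2 * eps)

# Tangent direction of the level curve: slope dy/dx = -f_x/f_y.
tangent = (1.0, -fx / fy)

# The dot product of gradient and tangent should vanish.
dot = fx * tangent[0] + fy * tangent[1]
assert abs(dot) < 1e-6
```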
To illustrate these ideas in the economic context, consider utility maximization where the utility function is u(x, y) = x^α y^β. The partial derivatives of u with respect to x and y are u_x = αx^{α−1}y^β = αu/x and u_y = βx^α y^{β−1} = βu/y. Thus, ∇u(x, y) = (u_x, u_y) = (αu/x, βu/y). Notice, for example, that at (x₁, y₁), x₁ is small relative to y₁, so that αu/x₁ is large relative to βu/y₁, and so the vector ∇u(x₁, y₁) = (αu/x₁, βu/y₁) has first component large relative to the second. In Figure 12.4, the dashed line
[Figure 12.4: Utility maximization: the indifference curve u = ū, with the gradient vectors ∇u(x₁, y₁) and ∇u(x₂, y₂) drawn from the points (x₁, y₁) and (x₂, y₂).]

starting from (x₁, y₁) is the vector ∇u(x₁, y₁). Similarly, the dashed line starting
from (x₂, y₂) is the vector ∇u(x₂, y₂). Notice that these dashed lines are perpendicular to the level surface of the function u (the set of points {(x, y) | u(x, y) = ū}). Also, notice that for i = 1, 2, starting from (x_i, y_i) and moving in the direction ∇u(x_i, y_i) increases the value of the function. For utility maximization, the constraint has the form g(x, y) = px + qy = c, where p and q are the prices of x and y, respectively. In this case, g_x = p and g_y = q, so that for all (x, y), ∇g(x, y) = (p, q) and the gradient of g is constant, independent of (x, y). This is depicted in Figure 12.5a. In Figure 12.5b, at z₁ = (x₁, y₁) there is no number λ such that ∇u(z₁) = λ∇g(z₁), and the indifference curve cuts the budget constraint. The situation is similar at z₂. Neither z₁ nor z₂ is optimal. Whenever this is the case (∇u(z) ≠ λ∇g(z) for any λ), the slope of the indifference curve and the slope of the budget constraint differ, so that z cannot be optimal. In contrast, at the point z₃, ∇u(z₃) = λ∇g(z₃): ∇u(z₃) is proportional to ∇g(z₃), and λ scales ∇g(z₃) to equal ∇u(z₃). This is the first-order condition.
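The tangency condition can be checked directly at the Cobb-Douglas demands. The sketch below uses the standard closed-form demands x∗ = αc/((α+β)p), y∗ = βc/((α+β)q) with illustrative numbers of my own, and confirms that ∇u is proportional to ∇g = (p, q) there.

```python
# Sketch (hypothetical numbers): for u(x, y) = x**a * y**b with budget
# p*x + q*y = c, the demands are x* = a*c/((a+b)*p), y* = b*c/((a+b)*q).
# Check that grad u(x*, y*) is proportional to grad g = (p, q).

a, b = 0.3, 0.7
p, q, c = 2.0, 5.0, 100.0

x_s = a * c / ((a + b) * p)   # = 15.0
y_s = b * c / ((a + b) * q)   # = 14.0

u = x_s**a * y_s**b
ux = a * u / x_s              # u_x = a*x**(a-1)*y**b = a*u/x
uy = b * u / y_s              # u_y = b*x**a*y**(b-1) = b*u/y

# Proportionality: u_x / p == u_y / q; the common value is lambda.
lam1 = ux / p
lam2 = uy / q
assert abs(lam1 - lam2) < 1e-12
```

At any non-optimal affordable bundle, the two ratios above would differ, which is exactly the "indifference curve cuts the budget line" case in the text.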
[Figure 12.5: Utility maximization: optimization. Panel (a) shows the budget line px + qy = c with the constant gradient ∇g(x_i, y_i) = (p, q) drawn at the points (x₁, y₁) and (x₂, y₂); panel (b) shows points z₁ and z₂, where ∇u(z_i) is not proportional to ∇g(z_i), and the tangency point z₃ on the indifference curve u = ū.]
12.4 The n-variable case

This section considers the problem of maximizing or minimizing a function f(x₁, x₂, ..., x_n) subject to the constraint g(x₁, x₂, ..., x_n) = c. The procedure is similar to the two-variable case: obtain first-order conditions relating the partial derivatives of the objective to those of the constraint, and then confirm that the solution to these equations actually is a local maximum or local minimum. The generalization of the Lagrangian to n variables is straightforward. Let

L = f(x₁, x₂, ..., x_n) + λ(c − g(x₁, ..., x_n)).

The first-order conditions are:

∂L/∂x_i = L_i = f_i(x) − λg_i(x) = 0, i = 1, ..., n
∂L/∂λ = L_λ = c − g(x) = 0.

Let (x∗, λ∗) be a solution to these n + 1 equations. Whether this is a local maximum or local minimum depends on second-order conditions. These in turn are derived in terms of bordered matrices and associated quadratic forms, developed in the next section. After that, the main result is developed. This gives sufficiency conditions for a solution to the first-order conditions to be a local maximum or minimum.
12.4.1 Quadratic forms and bordered matrices

This section recalls some material on quadratic forms and matrix definiteness. A matrix A determines a quadratic form according to the expression x′Ax. The matrix A is said to be positive or negative definite if x′Ax > 0 or x′Ax < 0, respectively, for all nonzero vectors x. The positive and negative definiteness criteria play a central role in identifying local optima in the study of unconstrained optimization problems. In constrained optimization, there is a comparable criterion based on bordered matrices. The following theorem presents a result on quadratic forms. Its application to constrained optimization is developed subsequently.

Theorem 12.3: Let A be an n × n symmetric matrix and let b be an n × 1 vector with b ≠ 0. Also, let C = C_n = ( 0  b′ ; b  A ); that is,

    | 0     b_1    b_2    ···   b_n  |            | 0     b_1    ···   b_j  |
    | b_1   a_11   a_12   ···   a_1n |            | b_1   a_11   ···   a_1j |
C = | b_2   a_21   a_22   ···   a_2n |  and  C_j = | ⋮     ⋮             ⋮   |
    | ⋮     ⋮      ⋮      ⋱     ⋮    |            | b_j   a_j1   ···   a_jj |
    | b_n   a_n1   a_n2   ···   a_nn |

Then

• ξ′Aξ < 0, for all ξ ≠ 0 with b′ξ = 0, if and only if (−1)^j |C_j| > 0, for all j = 2, 3, ..., n.
• ξ′Aξ > 0, for all ξ ≠ 0 with b′ξ = 0, if and only if |C_j| < 0, for all j = 2, 3, ..., n.

The logic for this can be seen in the two-variable case. Let
b = ( α ; β ),   A = | a_11   a_12 |,   C = | 0    α      β    |
                     | a_21   a_22 |       | α    a_11   a_12 |
                                            | β    a_21   a_22 |

Consider a quadratic form in A, with x′ = (x₁, x₂): x′Ax = x₁²a_11 + 2a_12 x₁x₂ + a_22 x₂² (note a_12 = a_21). Suppose that the constraint (α, β)·(x₁, x₂) = 0 is imposed. Then x₂ = −(α/β)x₁, and imposing this condition on the quadratic form gives:

x′Ax = x₁²a_11 + 2a_12 x₁(−(α/β)x₁) + a_22(−(α/β)x₁)²
     = x₁²[a_11 + 2a_12(−α/β) + a_22(α/β)²]
     = x₁²(1/β²)[β²a_11 − 2a_12αβ + a_22α²].

This expression is positive if and only if [β²a_11 − 2a_12αβ + a_22α²] is positive. Now, consider the determinant of C:

|C| = −α(αa_22 − βa_12) + β(αa_21 − βa_11)
    = −α²a_22 + αβa_12 + βαa_21 − β²a_11
    = −α²a_22 + 2αβa_12 − β²a_11
    = −[β²a_11 − 2a_12αβ + a_22α²].

For x ≠ 0, x′Ax > 0 for all x satisfying (α, β)·x = 0 if and only if |C| < 0.
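The two-variable equivalence just derived can be checked mechanically. The sketch below (matrix entries are illustrative numbers I chose) evaluates |C| from the formula above and compares its sign with the sign of the restricted quadratic form.

```python
# Sketch (illustrative matrices): in the two-variable case, x'Ax restricted to
# b'x = 0 is positive for all x != 0 iff |C| < 0, where C borders
# A = [[a11, a12], [a12, a22]] with b = (alpha, beta).

def detC(a11, a12, a22, alpha, beta):
    # |C| = -(beta^2*a11 - 2*a12*alpha*beta + a22*alpha^2), as derived above.
    return -(beta**2 * a11 - 2 * a12 * alpha * beta + a22 * alpha**2)

def restricted_form(a11, a12, a22, alpha, beta):
    # Coefficient on x1^2 after imposing x2 = -(alpha/beta)*x1.
    return (beta**2 * a11 - 2 * a12 * alpha * beta + a22 * alpha**2) / beta**2

cases = [
    (1.0, 2.0, 1.0, 1.0, 1.0),  # form is negative on the constraint line
    (1.0, 0.0, 1.0, 1.0, 1.0),  # form is positive on the constraint line
]
results = []
for a11, a12, a22, alpha, beta in cases:
    positive = restricted_form(a11, a12, a22, alpha, beta) > 0
    results.append(positive == (detC(a11, a12, a22, alpha, beta) < 0))

assert all(results)
```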
12.4.2 Bordered Hessians and optimization
In the present context of n-variable optimization, (x₁, ..., x_n) are chosen to maximize the objective f(x₁, ..., x_n), subject to the constraint g(x₁, ..., x_n) = c. Define the bordered Hessian matrices:

     | 0     g_1    g_2    ···   g_n  |             | 0     g_1    ···   g_j  |
     | g_1   L_11   L_12   ···   L_1n |             | g_1   L_11   ···   L_1j |
H̄ =  | g_2   L_21   L_22   ···   L_2n |  and  H̄_j = | ⋮     ⋮             ⋮   |
     | ⋮     ⋮      ⋮      ⋱     ⋮    |             | g_j   L_j1   ···   L_jj |
     | g_n   L_n1   L_n2   ···   L_nn |

Comparing notation, note that b corresponds to ∂g/∂x, the matrix A corresponds to the matrix of second derivatives of the Lagrangian, and C_j corresponds to the bordered Hessian principal minor H̄_j. With this notation, the main result on constrained optimization is:

Theorem 12.4: Let L(x₁, ..., x_n, λ) = f(x₁, ..., x_n) + λ(c − g(x₁, ..., x_n)) and suppose that at (x∗, λ∗) the following are satisfied:

∂L/∂x_i = L_i = 0, i = 1, ..., n
∂L/∂λ = L_λ = 0.
If, in addition, the bordered Hessian H̄ satisfies (1) at x∗, then x∗ is a local maximum, and if H̄ satisfies (2) at x∗, then x∗ is a local minimum, where

1. |H̄₂| > 0, |H̄₃| < 0, |H̄₄| > 0, ..., (−1)ⁿ|H̄_n| > 0
2. |H̄₂| < 0, |H̄₃| < 0, ..., |H̄_n| < 0.

Note that the theorem asserts that x∗ is a local maximum or minimum; x∗ may not be a global maximum or minimum. Some motivation for this result is as follows. The Lagrangian first-order necessary conditions for an optimum are:
∂L/∂x_i = f_i(x) − λg_i(x) = 0, i = 1, ..., n    (12.11)
∂L/∂λ = c − g(x) = 0.    (12.12)

Figure 12.6 illustrates failure of the tangency conditions (Equation 12.11). The figure shows that when there is no λ such that f_i = λg_i for all i, then generally there is an objective-improving change that satisfies the constraint. There is an n-variable analog of Theorem 12.2.
[Figure 12.6: Tangency condition fails. At x̄, g(x̄) = c, but there is no λ ≠ 0 such that ∇f(x̄) = λ∇g(x̄). Thus, the level curves of f and g cross at x̄, so there is a point x̂ with g(x̂) = c and f(x̂) > f(x̄).]
Theorem 12.5: Let f and g be continuously differentiable functions on an open set Z in Rⁿ. Suppose that x∗ is a constrained local maximum or minimum of f, subject to the constraint g(x) = c. Suppose that ∇g(x∗) ≠ 0. Then there is a λ∗ such that

∇f(x∗) = λ∗∇g(x∗).
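The condition is easy to verify at a known solution. The sketch below uses a hypothetical three-variable problem of my own (maximize x₁ + x₂ + x₃ on the sphere x₁² + x₂² + x₃² = 3, whose symmetric maximizer is (1, 1, 1)) and checks ∇f = λ∗∇g componentwise.

```python
# Sketch (hypothetical problem): maximize f(x) = x1 + x2 + x3 subject to
# g(x) = x1**2 + x2**2 + x3**2 = 3; by symmetry the maximizer is x* = (1, 1, 1).

x_star = [1.0, 1.0, 1.0]

grad_f = [1.0, 1.0, 1.0]              # f is linear
grad_g = [2 * xi for xi in x_star]    # gradient of the sphere constraint

# Recover lambda* from the first component, then verify the rest.
lam = grad_f[0] / grad_g[0]           # = 0.5
for fi, gi in zip(grad_f, grad_g):
    assert abs(fi - lam * gi) < 1e-12

# The constraint holds at x*.
assert abs(sum(xi**2 for xi in x_star) - 3.0) < 1e-12
```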
12.4.2.1 The sign of the Lagrange multiplier
Sometimes it is useful to know the sign of the Lagrange multiplier at the solution. However, with equality-constrained optimization, the multiplier may be positive or negative, depending on the way the constraint is expressed. Figure 12.7 illustrates the case where the first-order tangency conditions are satisfied. There are two possibilities. In the first, ∇f and ∇g "point" in the same direction, so that for some λ > 0, ∇f = λ∇g. This occurs when the upper contour sets {x | g(x) ≥ g(x̄)} and {x | f(x) ≥ f(x̄)} are on the same side of the g and f curves through x̄. This is depicted in (b). When ∇f and ∇g "point" in opposite directions, then for some λ < 0, ∇f = λ∇g. This occurs when the upper contour sets {x | g(x) ≥ g(x̄)} and {x | f(x) ≥ f(x̄)} are on opposite sides of the g and f curves through x̄. This is depicted in (a). These observations show that the sign of the Lagrange multiplier, λ, is indeterminate in equality-constrained optimization.
[Figure 12.7: Tangencies at the solution. Two panels show the upper contour set {x | f(x) ≥ f(x̄)} and the constraint curve g = c through x̄; the vectors x̄ + ∇f and x̄ + ∇g point in opposite directions in panel (a) and in the same direction in panel (b).]
To illustrate, suppose that f(x₁, x₂) = x₁x₂ and g(x₁, x₂) = x₁² + x₂² = c. Then

∇f = (f_x₁, f_x₂) = (x₂, x₁)  and  ∇g = (g_x₁, g_x₂) = (2x₁, 2x₂),

so that at the solution (where x₁ = x₂), ∇f = ½∇g, and λ = ½. Both ∇f and ∇g point into the positive orthant, with the vector ∇g twice the length of ∇f. Now, suppose that ĝ(x₁, x₂) = −x₁² − x₂², so that the constraint is unchanged if ĝ(x₁, x₂) = −c. Since ∇ĝ = (−2x₁, −2x₂) = −∇g, ∇f = −½∇ĝ and λ̂ = −½. The sign of the Lagrange multiplier depends on the way in which the constraint is expressed.
12.4.3 The second-order condition reconsidered
The following discussion elaborates on the connection between the bordered Hessian conditions and the optimality of x∗, obtained as a solution to the first-order conditions. In particular, the discussion provides additional perspective on the connection between the bordered Hessian and the second-order condition. The Lagrangian equals the objective function at any x satisfying the constraint: L(x, λ) = f(x) + λ[c − g(x)] = f(x).
At a constrained maximum, (x∗, λ∗), f(x∗ + v) − f(x∗) ≤ 0 for any v with g(x∗ + v) − c = 0. Consider a Taylor series expansion of L(x∗ + v, λ) around x∗. This gives:

L(x∗ + v, λ) = L(x∗, λ) + Σ_i [∂L(x∗, λ)/∂x_i] v_i + ½ v′H_L(x̂, λ)v
L(x∗ + v, λ) = L(x∗, λ) + ½ v′H_L(x̂, λ)v,

where H_L(x̂, λ) is the matrix {∂²L/∂x_i∂x_j} evaluated at (x̂, λ), x̂ is a point between x∗ and x∗ + v, and the second equality follows from the first-order conditions, ∂L/∂x_i = 0, ∀i.

Since x∗ and x∗ + v are feasible (c − g(x∗) = 0, c − g(x∗ + v) = 0), L(x∗ + v, λ) = f(x∗ + v) and L(x∗, λ) = f(x∗), so from the previous expression

f(x∗ + v) = f(x∗) + ½ v′H_L(x̂, λ)v.

Therefore f(x∗ + v) < f(x∗) if v′H_L(x̂, λ)v < 0.

The constraint function satisfies, using a first-order Taylor series, 0 = g(x∗ + v) − c = g(x∗) − c + g_x(x̃)·v = g_x(x̃)·v, where x̃ is (on the line) between x∗ and x∗ + v.
Recall that, given a vector b and matrix A, z′Az < 0 for all z satisfying b′z = 0 if and only if the matrix

Q = ( 0  b′ ; b  A )

satisfies the conditions (−1)^j |Q_j| > 0, j = 2, ..., n. Equivalently, |Q₂| > 0, |Q₃| < 0, and so on. (Here, Q_j is the northwest principal minor of Q, of order j + 1.)
Returning to the optimization problem, consider the matrices:

Q∗ = ( 0  g_x(x∗)′ ; g_x(x∗)  H_L(x∗, λ) )  and  Q(x̂, x̃) = ( 0  g_x(x̃)′ ; g_x(x̃)  H_L(x̂, λ) ).

Since these functions are continuous, for v small, Q(x̂, x̃) ≈ Q∗, so that if Q∗ satisfies (−1)^j |Q∗_j| > 0 for j ≥ 2, then (−1)^j |Q_j(x̂, x̃)| > 0 for j ≥ 2 also. Therefore, since g_x(x̃)·v = 0, v′H_L(x̂, λ)v < 0, and so f(x∗ + v) < f(x∗).
12.5 Interpretation of the Lagrange multiplier

The Lagrangian is L = f(x₁, ..., x_n) + λ(c − g(x₁, ..., x_n)). This has first-order conditions:

f_i(x₁∗, ..., x_n∗) − λg_i(x₁∗, ..., x_n∗) = 0, i = 1, ..., n
c − g(x₁∗, ..., x_n∗) = 0.

Let (x∗, λ) be a solution. Note that x∗ and λ depend on c: as c varies, so do they. Thus x∗ and λ may be written as functions of c: x∗(c), λ(c). These functions give a solution at each value of c. In particular, the first-order conditions hold at each value of c:

f_i(x₁∗(c), ..., x_n∗(c)) − λ(c)g_i(x₁∗(c), ..., x_n∗(c)) = 0
c − g(x₁∗(c), ..., x_n∗(c)) = 0.

The value of the objective at c is: L(c) = f(x₁∗(c), ..., x_n∗(c)) + λ(c)[c − g(x₁∗(c), ..., x_n∗(c))]. This function may be differentiated with respect to c to give:

dL(c)/dc = f₁(dx₁∗/dc) + f₂(dx₂∗/dc) + ··· + f_n(dx_n∗/dc) − λ(c)g₁(dx₁∗/dc) − λ(c)g₂(dx₂∗/dc) − ··· − λ(c)g_n(dx_n∗/dc)
         + (dλ/dc)[c − g(x₁∗(c), ..., x_n∗(c))] + λ(c).
Rearranging:

dL(c)/dc = Σ_{i=1}^n [f_i(x₁∗(c), ..., x_n∗(c)) − λ(c)g_i(x₁∗(c), ..., x_n∗(c))](dx_i∗/dc)
         + (dλ/dc)[c − g(x₁∗(c), ..., x_n∗(c))] + λ(c)
         = λ(c).

This follows using the first-order conditions. Thus, dL/dc = λ: the marginal impact on the value of the objective due to changing the constraint "a little" is equal to the Lagrange multiplier. The next example uses the Lagrange multiplier to calculate a derivative: the change in a function due to a small change in its argument.
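Before that example, the dL/dc = λ property can be checked numerically. The sketch below uses a toy problem of my own (maximize xy subject to x + y = c, with optimal value V(c) = c²/4 and multiplier λ(c) = c/2) and compares λ with a finite-difference derivative of the value function.

```python
# Sketch (own example): maximize f(x, y) = x*y subject to x + y = c. The
# solution is x = y = c/2, value V(c) = c**2/4, multiplier lambda(c) = c/2.
# Check dV/dc (finite differences) against lambda.

def V(c):
    return (c / 2.0) ** 2   # optimal value c**2/4

c0 = 10.0
lam = c0 / 2.0              # lambda(c0) = c0/2 from the first-order conditions

eps = 1e-6
dV_dc = (V(c0 + eps) - V(c0 - eps)) / (2 * eps)
assert abs(dV_dc - lam) < 1e-6
```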
EXAMPLE 12.4: Consider the problem: max f(x) = x² subject to x = c. The Lagrangian is:

L(x, λ) = x² + λ(c − x),

with first-order conditions:

L_x = 2x − λ = 0 ⇒ x = λ/2
L_λ = c − x = 0 ⇒ x = c,

so λ(c) = 2c, and x(c) = c.
[Figure: the function f(x) = x², showing the rise from c² to (c + Δc)² as the constraint moves from c to c + Δc.]
Thus, L∗(c), defined as L(x(c), λ(c)) = x(c)² = c², satisfies dL∗/dc = λ = 2c. The figure illustrates this. Note that varying c "a little" amounts to calculating the slope of the function x², which at c is equal to 2c. Since L∗(c) = L(x(c), λ(c)) = f(x(c)), the Lagrangian at the solution is identically equal to the value of the objective at the solution: L∗′(c) = f′(x(c))(dx/dc) = 2x(c)(dx/dc) = 2c. ♦
Example 12.5 shows that in the standard utility maximization problem, the Lagrange multiplier measures the marginal utility of income: the utility gain an optimizing individual would achieve from a small increase in income.

EXAMPLE 12.5: Consider the standard utility maximization problem: max u(x, y) = ln x + ln y, subject to px + qy = I. The Lagrangian is: L = ln x + ln y + λ(I − px − qy). So, the first-order conditions are:

L_x = 1/x − λp = 0
L_y = 1/y − λq = 0
L_λ = I − px − qy = 0.

Solving, x = I/(2p), y = I/(2q), and λ = 2/I. Also, L_xx = −1/x², L_yy = −1/y² and L_xy = 0. The bordered Hessian is
     | 0   p       q     |
H̄ =  | p   −1/x²   0     |
     | q   0       −1/y² |

Thus, |H̄| = p²/y² + q²/x² > 0. Denote the solution values as functions of I, x(I) = I/(2p) and y(I) = I/(2q):

u(x(I), y(I)) = ln(I/(2p)) + ln(I/(2q)) = ln(I²/(4pq)).
Thus,

du(x(I), y(I))/dI = (4pq/I²)·(2I/(4pq)) = 2/I = λ.

Observe that if we differentiate L with respect to I we get λ, and evaluating at the solution value for λ gives 2/I. The same reasoning allows us to conclude that the marginal impact on maximum utility of a change in the price p is given by −λ(I)x(I) = −(2/I)(I/(2p)) = −1/p: differentiate the Lagrangian with respect to p and evaluate at the solution. Differentiating u(x(I), y(I)) with respect to p directly gives −(4pq/I²)·(I²/(4p²q)) = −1/p, so that the marginal impact of an increase in the price of x is −1/p.
♦

EXAMPLE 12.6 (COST MINIMIZATION SUBJECT TO A PRODUCTION QUOTA): The cost function is c = rK + wL and the production function is Q = K^α L^β. The problem is:
min L = rK + wL + λ(Q − K^α L^β).

The first-order conditions are:

L_K = r − λα(K^α L^β)/K = 0 ⇒ r − λα(Q/K) = 0    (1)
L_L = w − λβ(K^α L^β)/L = 0 ⇒ w − λβ(Q/L) = 0    (2)
L_λ = Q − K^α L^β = 0.
Solving using (1) and (2),

r/w = (α/β)(L/K),    (3)

which implies that K = (αw/(βr))L. Then

Q = K^α L^β = (αw/(βr))^α L^α L^β = (αw/(βr))^α L^{α+β}.

Thus

L = (αw/(βr))^{−α/(α+β)} Q^{1/(α+β)}.

So

K = (αw/(βr))L = (αw/(βr))(αw/(βr))^{−α/(α+β)} Q^{1/(α+β)} = (αw/(βr))^{β/(α+β)} Q^{1/(α+β)}.

The solutions can also be written:

K = (r/α)^{−β/(α+β)} (w/β)^{β/(α+β)} Q^{1/(α+β)} = K(Q)
L = (r/α)^{α/(α+β)} (w/β)^{−α/(α+β)} Q^{1/(α+β)} = L(Q).
As required, these satisfy K(Q)^α L(Q)^β = Q, ∀Q. To solve for λ, observe from (1) that

λ = (r/α)(K/Q) = (r/α)(r/α)^{−β/(α+β)} (w/β)^{β/(α+β)} Q^{1/(α+β) − 1}
  = (r/α)^{α/(α+β)} (w/β)^{β/(α+β)} Q^{(1−(α+β))/(α+β)}.

Compare this with dc/dQ. Note that c(Q) = rK(Q) + wL(Q), so

c(Q) = rK(Q) + wL(Q)
     = α(r/α)K(Q) + β(w/β)L(Q)
     = α(r/α)^{α/(α+β)}(w/β)^{β/(α+β)} Q^{1/(α+β)} + β(r/α)^{α/(α+β)}(w/β)^{β/(α+β)} Q^{1/(α+β)}
     = (α + β)(r/α)^{α/(α+β)}(w/β)^{β/(α+β)} Q^{1/(α+β)}.

Therefore,

dc/dQ = (r/α)^{α/(α+β)}(w/β)^{β/(α+β)} Q^{(1−(α+β))/(α+β)} = λ(Q).

♦
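The closed forms in Example 12.6 can be sanity-checked numerically. The sketch below plugs in illustrative parameter values of my own, verifies that the quota is met, and confirms λ equals the marginal cost dc/dQ computed by finite differences.

```python
# Sketch using the closed forms from Example 12.6 (alpha, beta, r, w, Q are
# illustrative numbers): check K(Q)**alpha * L(Q)**beta = Q and that the
# multiplier equals the marginal cost dc/dQ.

alpha, beta = 0.3, 0.6
r, w, Q = 2.0, 1.5, 10.0
s = alpha + beta

L_s = (alpha * w / (beta * r)) ** (-alpha / s) * Q ** (1 / s)
K_s = (alpha * w / (beta * r)) * L_s

# Feasibility: the quota is met.
assert abs(K_s**alpha * L_s**beta - Q) < 1e-9

# lambda = (r/alpha) * K(Q) / Q, from first-order condition (1).
lam = (r / alpha) * K_s / Q

def cost(q):
    l = (alpha * w / (beta * r)) ** (-alpha / s) * q ** (1 / s)
    return r * (alpha * w / (beta * r)) * l + w * l

eps = 1e-6
mc = (cost(Q + eps) - cost(Q - eps)) / (2 * eps)
assert abs(mc - lam) < 1e-4
```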
EXAMPLE 12.7: A production technology relating output Q to inputs x and y is given by the equation y = Q + e^{kx}. The cost function is given by C = y − δx, where δ > k. Consider minimizing cost by choice of x and y.

In this case the Lagrangian is: L = y − δx + λ[y − Q − e^{kx}]. Therefore,

L_x = −δ − λk e^{kx} = 0
L_y = 1 + λ = 0
L_λ = y − Q − e^{kx} = 0.

From the second equation, λ = −1, and substituting into the first equation gives: δ = k e^{kx}. Solving for x, x = (1/k) ln(δ/k). Since y = Q + e^{kx}, we get y = Q + e^{k·(1/k)ln(δ/k)} = Q + e^{ln(δ/k)} = Q + δ/k.
For the second-order condition, L_xx = −λk²e^{kx}, L_yy = 0, L_xy = 0. The constraint is 0 = g(x, y), where g(x, y) = −y + Q + e^{kx}, so that g_x = k e^{kx} and g_y = −1. The bordered Hessian is

     | 0         k e^{kx}     −1 |
H̄ =  | k e^{kx}  −λk²e^{kx}   0  |
     | −1        0            0  |

⇒ |H̄| = −1 · | k e^{kx}   −λk²e^{kx} |
             | −1          0          |
   = λk²e^{kx} = −k²e^{kx} < 0.

Thus, the solution to the first-order conditions gives a local minimum. Write the solution values of x and y as x(k, δ) and y(k, δ), and the solution value of the Lagrange multiplier as λ(k, δ). Then the cost is given by C(k, δ) = y(k, δ) − δx(k, δ). Using the envelope theorem,

∂C/∂k = −λ(k, δ) x(k, δ) e^{k x(k,δ)},   ∂C/∂δ = −x(k, δ).

Note the answers are equal to the derivative of the Lagrangian with respect to the appropriate variable, evaluated at the solution. ♦
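The envelope result ∂C/∂δ = −x(k, δ) from Example 12.7 can be verified numerically. The sketch below uses illustrative numbers of my own with δ > k, solves for x and y from the closed forms above, and differentiates the minimized cost by finite differences.

```python
# Sketch (numbers are illustrative, with delta > k as in Example 12.7): check
# dC/d(delta) = -x(k, delta) for x = (1/k)*ln(delta/k), y = Q + delta/k,
# C = y - delta*x.

import math

k, delta, Q = 0.5, 2.0, 10.0

def solution(d):
    x = (1 / k) * math.log(d / k)
    y = Q + d / k
    return x, y

x0, y0 = solution(delta)

def C(d):
    x, y = solution(d)
    return y - d * x

eps = 1e-6
dC = (C(delta + eps) - C(delta - eps)) / (2 * eps)
assert abs(dC - (-x0)) < 1e-6
```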
EXAMPLE 12.8: Recall Example 12.3: max f(x, y) = x² + (y − 1)² subject to α = y − kx, 0 < α < 1 and k > 0. The solution: x∗ = (1 − α)k/(1 + k²), y∗ = (α + k²)/(1 + k²), and λ∗ = −2(1 − α)/(1 + k²). Thus, y∗ − 1 = [α + k² − (1 + k²)]/(1 + k²) = (α − 1)/(1 + k²), so that

L∗ = (x∗)² + (y∗ − 1)²
   = [(1 − α)k/(1 + k²)]² + [(α − 1)/(1 + k²)]²
   = [1/(1 + k²)]²[((1 − α)k)² + (α − 1)²]
   = [1/(1 + k²)]²[(1 − α)²(1 + k²)]
   = (1 − α)²/(1 + k²).
Therefore,

∂L∗/∂α = −[1/(1 + k²)]·2(1 − α) = λ∗.

Thus, the impact on the constrained objective is obtained by differentiating the Lagrangian with respect to the parameter and evaluating the derivative at the solution. ♦
EXAMPLE 12.9: Recall the optimal design of the cylinder in Example 12.1. There L∗ = [A³/(54π)]^{1/2}. Differentiating with respect to A,

∂L∗/∂A = (1/2)[A³/(54π)]^{−1/2}(3A²/(54π)).

Using the fact that r² = A/(6π),

∂L∗/∂A = r/2 = λ (or λ∗).

Again, this is the derivative of the Lagrangian with respect to A, evaluated at the solution. ♦
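The identity in Example 12.9 can be confirmed numerically. The sketch below picks an arbitrary surface area A and compares a finite-difference derivative of L∗(A) = [A³/(54π)]^{1/2} with r/2, where r² = A/(6π).

```python
# Sketch: check numerically that d/dA of (A**3/(54*pi))**0.5 equals r/2,
# where r**2 = A/(6*pi), as in Example 12.9 (A is chosen arbitrarily).

import math

A = 7.0

def Lstar(a):
    return (a**3 / (54 * math.pi)) ** 0.5

r = (A / (6 * math.pi)) ** 0.5

eps = 1e-6
deriv = (Lstar(A + eps) - Lstar(A - eps)) / (2 * eps)
assert abs(deriv - r / 2) < 1e-6
```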
The discussion thus far has focused on the identification of a local optimum. Section 12.6 provides conditions under which a local optimum is also a global optimum.
12.6 Optimization: concavity and convexity

In a constrained optimization problem, the shape of the objective function and the constraints play an important role in characterizing global optima – just as in the unconstrained case. For this discussion, assume that the domain of the function is convex.

Definition 12.2: A set X is convex if x, y ∈ X implies θx + (1 − θ)y ∈ X, ∀θ ∈ [0, 1].
The main result below relies on quasiconcavity of the objective function. Definition 12.3 introduces a variety of notions of quasiconcavity.

Definition 12.3: Concavity and quasiconcavity.

1. A function f is concave if f(θx + (1 − θ)y) ≥ θf(x) + (1 − θ)f(y), ∀x, y ∈ X, θ ∈ [0, 1].
2. A function f is strictly concave if f(θx + (1 − θ)y) > θf(x) + (1 − θ)f(y), ∀x ≠ y ∈ X, θ ∈ (0, 1).

3. A function f is quasiconcave if f(θx + (1 − θ)y) ≥ min{f(x), f(y)}, ∀x, y ∈ X, ∀θ ∈ [0, 1].

4. A function f is strictly quasiconcave if f(θx + (1 − θ)y) > min{f(x), f(y)}, ∀x ≠ y, θ ∈ (0, 1).

5. A function f is explicitly quasiconcave if it is quasiconcave and f(x) > f(y) ⇒ f(θx + (1 − θ)y) > f(y), ∀θ ∈ (0, 1).
Concavity is a more demanding requirement than quasiconcavity.

PROPOSITION 12.1: If f is (strictly) concave then f is (strictly) quasiconcave. ♦

PROOF: If f is concave then f(θx + (1 − θ)y) ≥ θf(x) + (1 − θ)f(y) ≥ min{f(x), f(y)}. With strict concavity and x ≠ y, f(θx + (1 − θ)y) > θf(x) + (1 − θ)f(y) ≥ min{f(x), f(y)}.

Furthermore,

PROPOSITION 12.2: If f is strictly quasiconcave then f is explicitly quasiconcave. ♦

PROOF: If f is strictly quasiconcave and f(x) > f(y) then f(θx + (1 − θ)y) > min{f(x), f(y)} = f(y), and this confirms the proposition.
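The inequality chain in the proof of Proposition 12.1 can be sampled directly. The sketch below checks it for a concave function of my own choosing, f(x) = −x², at a handful of points and mixing weights.

```python
# Sketch: sample-check Proposition 12.1 for the concave function f(x) = -x**2:
# concavity gives f(t*x + (1-t)*y) >= t*f(x) + (1-t)*f(y) >= min(f(x), f(y)).

def f(x):
    return -x**2

pts = [(-3.0, 1.0), (0.5, 2.0), (-1.0, -4.0)]
for x, y in pts:
    for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
        z = f(t * x + (1 - t) * y)
        assert z >= t * f(x) + (1 - t) * f(y) - 1e-12
        assert z >= min(f(x), f(y)) - 1e-12
```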
The relations are depicted in Figure 12.8. Notice that explicit quasiconcavity allows flat sections at the top of the function (a property excluded by strict quasiconcavity). Recall that concavity is related to the Hessian matrix H(x) = {∂²f(x)/∂x∂x′}:

       | f_11(x)   f_12(x)   ···   f_1n(x) |
H(x) = | f_21(x)   f_22(x)   ···   f_2n(x) |
       | ⋮         ⋮         ⋱     ⋮       |
       | f_n1(x)   f_n2(x)   ···   f_nn(x) |
[Figure 12.8: Quasiconcavity. Three panels of f(x): a quasiconcave function with a flat section, an explicitly quasiconcave function with a flat section only on top, and a strictly quasiconcave function with no flat section.]
where the dependence of the partial derivatives on x = ( x1 , x2 ,... , x n ) is made explicit.
As discussed earlier, if f is concave then the Hessian matrix H ( x) is negative semidefinite for all x. In fact, f is concave if and only if H ( x) is negative semidefinite for all x. If f is strictly concave then H ( x) is negative semidefinite for all
x; and, conversely, if H ( x) is negative definite for all x then f is strictly concave. Notice that strict concavity of f does not imply that H ( x) is negative definite for all x. This observation is clarified in the following remark.
REMARK 12.3: While an everywhere negative definite Hessian matrix implies strict concavity of the associated function, the converse is not the case: a strictly concave function need not have an everywhere negative definite Hessian matrix. The following example confirms the latter observation. Consider f(x) = −x⁴, with f′(x) = −4x³ and f″(x) = −12x². Thus, f″(x) ≤ 0 with f″(0) = 0, so, viewing f″(x) as the Hessian matrix (with just one entry), this is not negative definite for all x, since the requirement fails at x = 0. However, this function is strictly concave, as the following discussion demonstrates. To do so, it is necessary to show that when 0 < θ < 1 and y ≠ x, then f(θx + (1 − θ)y) > θf(x) + (1 − θ)f(y). With f(x) = −x⁴, let

δ = f(θx + (1 − θ)y) − [θf(x) + (1 − θ)f(y)] = −(θx + (1 − θ)y)⁴ + θx⁴ + (1 − θ)y⁴.

The function f is strictly concave if δ > 0 whenever θ ∈ (0, 1) and x ≠ y. Expanding the quartic and factoring gives

δ = θ(1 − θ)(x − y)²[θ²y² + 2θ(1 − θ)xy + θ²x² + 3(1 − θ)y² + θx² + 2xy + x²].
From this expression, it is clear that with x ≠ y and θ ∈ (0, 1), δ > 0 if xy ≥ 0. It remains to show that δ > 0 for xy < 0. Suppose that xy < 0. Take x < 0 and y > 0 (the reasoning is the same for the case y < 0 and x > 0), and set z = θx + (1 − θ)y, so that x < z < y. Regarding z there are two possibilities: z < 0 or z > 0. The analysis is the same in both cases, so consider the case where z < 0. Then there is a β, 0 < β < 1, with z = βx + (1 − β)·0 = βx. So, z = βx + (1 − β)·0 = θx + (1 − θ)y, and thus (β − θ)x = (1 − θ)y. Since y > 0 and x < 0, β < θ. Therefore,

f(z) = f(βx + (1 − β)·0) > βf(x) + (1 − β)f(0) > θf(x) + (1 − θ)f(0) > θf(x) + (1 − θ)f(y),

where the first inequality follows because f(βx) = −β⁴x⁴ > −βx⁴ = βf(x) and f(0) = 0; the second because β < θ and f(0) = 0; and the third because f(0) > f(y). An identical argument applies when z > 0.
So far the results apply only to finding local maxima or minima. For a global result it is necessary to impose additional assumptions. The following result assumes that the constraint set is convex: { x  g(x) = c} is convex. Observe that { x  g(x) = c} convex implies g(x) = a0 + a1 x1 + · · · + a n x n , (g(θ x + (1 − θ)y) = θ g(x) +
(1 − θ)g(y), ∀(x, y).) In Figure 12.9, the set of x i that satisfy g(x) = c lie on the straight line.
Theorem 12.6: Let L(x1, ..., xn, λ) = f(x1, ..., xn) + λ(c − g(x1, ..., xn)), where (a) f is explicitly quasiconcave and (b) {x | g(x) = c} is convex.
CHAPTER 12. EQUALITY-CONSTRAINED OPTIMIZATION
[Figure 12.9: A linear constraint. The constraint set {x̃ | g(x̃) = c} is a straight line through x∗ and x̄; the point θx∗ + (1 − θ)x̄ lies on this line, and the level set {x̃ | f(x̃) = f(x̄)} passes through x̄.]
If (x∗, λ∗) satisfies:

1. Li(x∗, λ∗) = 0, i = 1, ..., n; Lλ(x∗, λ∗) = 0
2. H̄2(x∗) > 0, H̄3(x∗) < 0, ..., (−1)ⁿ H̄n(x∗) > 0,

then x∗ is a global maximum of the constrained program.

PROOF: Suppose (x∗, λ∗) satisfies conditions (1) and (2) but is not a global maximum. Since x∗ is not a global maximum, ∃y, g(y) = c and f(y) > f(x∗). Since g is linear, g(θy + (1 − θ)x∗) = c, or g(x∗ + θ(y − x∗)) = c. So θy + (1 − θ)x∗ = x∗ + θ(y − x∗) is feasible and, since f is explicitly quasiconcave, f(x∗ + θ(y − x∗)) > f(x∗), ∀θ ∈ (0, 1). However, since (x∗, λ∗) satisfies (1) and (2), from Theorem 12.4, x∗ is a local maximum, so that in some neighborhood of x∗, all x̃ satisfying g(x̃) = c also satisfy f(x̃) ≤ f(x∗). However, θ can be chosen arbitrarily small and f(x∗ + θ(y − x∗)) > f(x∗). Thus, there are feasible points arbitrarily close to x∗ with a higher value of f; a contradiction.
The theorem is not valid without the assumption that (2) holds, as the next example illustrates.
E XAMPLE 12.10: Define the function f 1 ( x):
f 1 ( x) = ( x − 1)3 , x ≤ 2
= ( x − 1)3 − ( x − 2)4 , x > 2.
[Graph of f1(x) against x.]
Set f(x, y) = f1(x), g(x, y) = x − y and c = 0. Let (x∗, y∗, λ∗) = (1, 1, 0). Then fx(x, y) = 3(x − 1)² if x < 2 and fx(x, y) = 3(x − 1)² − 4(x − 2)³ if x > 2. Thus fx(1, 1) = 0, fy(x, y) = 0, gx(x, y) = 1 and gy = −1. Therefore,

fx(1, 1) − 0 · gx(1, 1) = 0
fy(1, 1) − 0 · gy(1, 1) = 0
0 − (1 − 1) = 0,

so (1) is satisfied at (x∗, y∗, λ∗) = (1, 1, 0). Next,

$$\left|\bar H_2(x^*, y^*, \lambda^*)\right| = \begin{vmatrix} 0 & g_x & g_y \\ g_x & f_{xx} & f_{xy} \\ g_y & f_{yx} & f_{yy} \end{vmatrix} = 0.$$

Thus (2) is not satisfied. Observe from the graph that (x∗, y∗, λ∗) = (1, 1, 0) does not correspond to a maximum.
♦
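The vanishing bordered-Hessian determinant in Example 12.10 can be confirmed numerically. The following sketch (added here as an illustration, assuming NumPy) builds |H̄2| at the candidate point:

```python
import numpy as np

# Bordered Hessian for L = f + lam*(c - g) at the candidate point, with
# f(x, y) = (x - 1)**3 (the x <= 2 branch), g(x, y) = x - y, and lam = 0.
def bordered_hessian(x, y, lam):
    gx, gy = 1.0, -1.0                 # gradient of g(x, y) = x - y
    fxx = 6.0 * (x - 1.0)              # f_xx on the x < 2 branch
    fxy = fyx = fyy = 0.0              # f does not depend on y
    # L_ij = f_ij since the constraint is linear (and here lam = 0 anyway)
    return np.array([[0.0, gx,  gy],
                     [gx,  fxx, fxy],
                     [gy,  fyx, fyy]])

H = bordered_hessian(1.0, 1.0, 0.0)
print(np.linalg.det(H))   # approximately 0: the second-order condition (2) fails
```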
REMARK 12.4: A variation on the proof of Theorem 12.6 may be given using Theorem 12.3 directly, as follows. Since {x | g(x) = c} is convex, g(x) = a0 + a1 x1 + · · · + an xn, because if x, y are in this set then so is θx + (1 − θ)y. Then g(θx + (1 − θ)y) = c = θg(x) + (1 − θ)g(y), so that g is linear (affine). Therefore
gi = ai and Lij = fij. Thus

$$\bar H = \begin{pmatrix}
0 & a_1 & a_2 & \cdots & a_n \\
a_1 & f_{11} & f_{12} & \cdots & f_{1n} \\
a_2 & f_{21} & f_{22} & \cdots & f_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_n & f_{n1} & f_{n2} & \cdots & f_{nn}
\end{pmatrix}.$$

Write H̄(x) to denote the matrix evaluated at x. Let a′ = (a1, a2, ..., an), so
g(x) = a0 + a′x. Suppose x∗ is not a global maximum, so ∃y, g(y) = c, f(y) > f(x∗). Explicit quasiconcavity implies that f(θy + (1 − θ)x∗) > f(x∗), ∀θ ∈ (0, 1), or f[x∗ + θ(y − x∗)] > f(x∗), ∀θ ∈ (0, 1). Since g(y) = a0 + a′y = c = g(x∗) = a0 + a′x∗, g(y) − g(x∗) = a′y − a′x∗ = c − c = 0, or a′(y − x∗) = 0. Also a′θ(y − x∗) = 0. Recall the notation in Theorem 12.3 and let
$$A(x) = \begin{pmatrix}
f_{11}(x) & f_{12}(x) & \cdots & f_{1n}(x) \\
f_{21}(x) & f_{22}(x) & \cdots & f_{2n}(x) \\
\vdots & \vdots & \ddots & \vdots \\
f_{n1}(x) & f_{n2}(x) & \cdots & f_{nn}(x)
\end{pmatrix}, \qquad b = a, \qquad C_j(x) = \bar H_j(x).$$
Given (2) in the statement of Theorem 12.6, (−1)ʲ|Cj(x∗)| > 0, j = 2, ..., n, so from Theorem 12.3, ξ′A(x∗)ξ < 0 for all ξ ≠ 0 with a′ξ = 0 (= b′ξ).

As f(x∗ + θ(y − x∗)) > f(x∗), take θ sufficiently small for a Taylor series expansion:

$$f(x^* + \theta(y - x^*)) = f(x^*) + f'(x^*)\theta(y - x^*) + \tfrac{1}{2}\theta^2 (y - x^*)' A(\tilde x)(y - x^*) > f(x^*), \tag{12.13}$$

where x̃ is close to x∗. Cancel f(x∗) to get θf′(x∗)(y − x∗) + ½θ²(y − x∗)′A(x̃)(y − x∗) > 0. Consider the term (y − x∗)′A(x̃)(y − x∗). At x∗, (y − x∗)′A(x∗)(y − x∗) < 0, since a′(y − x∗) = 0 and (y − x∗) ≠ 0. For x̃ sufficiently close to x∗, (−1)ʲ|Cj(x̃)| > 0, j = 2, ..., n, and since a′(y − x∗) = 0 and (y − x∗) ≠ 0, it must be that (y − x∗)′A(x̃)(y − x∗) < 0. Thus, to satisfy the inequality in expression 12.13, θf′(x∗)(y − x∗) > 0, so f′(x∗)(y − x∗) > 0. However,

$$0 < f'(x^*)(y - x^*) = \lim_{\theta \to 0} \left\{ \frac{f(x^* + \theta(y - x^*)) - f(x^*)}{\theta} \right\}.$$
This implies that for θ small, f ( x∗ + θ( y − x∗ )) > f ( x∗ ) and since x∗ + θ( y − x∗ ) is feasible ( g( x∗ + θ( y − x∗ )) = c) this contradicts the fact that x∗ satisfies the
conditions for a local maximum.
12.6.1 Quasiconcavity and bordered Hessians
Just as there is a close connection between concavity (and convexity) and the Hessian matrix, there is a similar connection for quasiconcavity (and quasiconvexity), except that the Hessian matrix is replaced by a bordered Hessian matrix. Augment the Hessian matrix with a border of first derivatives as follows. Let
$$Q = \begin{pmatrix}
0 & f_1 & f_2 & \cdots & f_n \\
f_1 & f_{11} & f_{12} & \cdots & f_{1n} \\
f_2 & f_{21} & f_{22} & \cdots & f_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
f_n & f_{n1} & f_{n2} & \cdots & f_{nn}
\end{pmatrix}, \qquad
Q_j = \begin{pmatrix}
0 & f_1 & f_2 & \cdots & f_j \\
f_1 & f_{11} & f_{12} & \cdots & f_{1j} \\
f_2 & f_{21} & f_{22} & \cdots & f_{2j} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
f_j & f_{j1} & f_{j2} & \cdots & f_{jj}
\end{pmatrix}.$$
Write Q(x) and Qj(x) to denote these matrices evaluated at x. These matrices are used to characterize quasiconcavity.

Theorem 12.7:

1. If f is quasiconcave, then |Q2(x)| ≥ 0, |Q3(x)| ≤ 0, ..., (−1)ⁿ|Qn(x)| ≥ 0, ∀x. (|Q1(x)| ≤ 0 always holds.)

2. If |Q1(x)| < 0, |Q2(x)| > 0, ..., (−1)ⁿ|Qn(x)| > 0, ∀x, then f is quasiconcave.

3. If f is quasiconvex, then |Q2(x)| ≤ 0, |Q3(x)| ≤ 0, ..., |Qn(x)| ≤ 0, ∀x.

4. If |Q1(x)| < 0, |Q2(x)| < 0, ..., |Qn(x)| < 0, ∀x, then f is quasiconvex.
These results can be related to constrained optimization as follows. Consider the problem:

max f(x1, x2, ..., xn) subject to g(x1, x2, ..., xn) = c.

Suppose that g(x1, ..., xn) = a0 + a1 x1 + a2 x2 + · · · + an xn. Then gi = ai. Let

L = f(x1, ..., xn) + λ(c − g(x1, ..., xn)).

Then Li = fi − λgi and Lij = fij. The first-order conditions are Li = 0, ∀i, or fi = λgi, ∀i, or gi = (1/λ)fi, ∀i. Theorem 12.4 gives second-order conditions in terms of the Hessian matrix. In the present context,
$$H_n = H = \begin{pmatrix}
0 & -g_1 & -g_2 & \cdots & -g_n \\
-g_1 & L_{11} & L_{12} & \cdots & L_{1n} \\
-g_2 & L_{21} & L_{22} & \cdots & L_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
-g_n & L_{n1} & L_{n2} & \cdots & L_{nn}
\end{pmatrix} = \begin{pmatrix}
0 & -f_1/\lambda & -f_2/\lambda & \cdots & -f_n/\lambda \\
-f_1/\lambda & L_{11} & L_{12} & \cdots & L_{1n} \\
-f_2/\lambda & L_{21} & L_{22} & \cdots & L_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
-f_n/\lambda & L_{n1} & L_{n2} & \cdots & L_{nn}
\end{pmatrix}.$$
Thus, factoring −1/λ from the first row and the first column of the determinant,

$$\bar H_j = \frac{1}{\lambda^2}\begin{vmatrix}
0 & f_1 & f_2 & \cdots & f_j \\
f_1 & f_{11} & f_{12} & \cdots & f_{1j} \\
f_2 & f_{21} & f_{22} & \cdots & f_{2j} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
f_j & f_{j1} & f_{j2} & \cdots & f_{jj}
\end{vmatrix} = \frac{1}{\lambda^2}|Q_j|,$$

since Lij = fij in the linear constraint case. Because 1/λ² > 0, H̄j has the same sign as |Qj|, so H̄j corresponds to Qj.

Theorem 12.8: If the bordered Hessian satisfies the conditions for f to be quasiconcave, then the bordered Hessian satisfies the sufficient conditions for a local maximum.
EXAMPLE 12.11: Consider the problem: max U(x1, ..., xn) subject to p1 x1 + p2 x2 + · · · + pn xn = Y, or p · x = Y = g(x). Thus, −gi = −pi and Lij = Uij. Therefore, a sufficient condition for x∗ to be a local maximum is

$$\begin{vmatrix}
0 & -p_1 & -p_2 \\
-p_1 & U_{11} & U_{12} \\
-p_2 & U_{21} & U_{22}
\end{vmatrix} > 0, \qquad
\begin{vmatrix}
0 & -p_1 & -p_2 & -p_3 \\
-p_1 & U_{11} & U_{12} & U_{13} \\
-p_2 & U_{21} & U_{22} & U_{23} \\
-p_3 & U_{31} & U_{32} & U_{33}
\end{vmatrix} < 0, \quad \text{etc.}$$

♦
12.7 Optimization with many constraints

Consider a square n × n matrix A, and let B be an m × n matrix with m < n. Thus,

$$A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}, \qquad
B = \begin{pmatrix}
b_{11} & b_{12} & \cdots & b_{1n} \\
\vdots & \vdots & \ddots & \vdots \\
b_{m1} & b_{m2} & \cdots & b_{mn}
\end{pmatrix}.$$

Let Bm denote the square matrix obtained from B by deleting the last n − m columns. Define the matrix:

$$C = \begin{pmatrix} 0 & B \\ B' & A \end{pmatrix}.$$
The matrix C and a secondary family of matrices (Cr) may be defined more explicitly:

$$C = \begin{pmatrix}
0 & \cdots & 0 & b_{11} & b_{12} & \cdots & b_{1n} \\
\vdots & \ddots & \vdots & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & b_{m1} & b_{m2} & \cdots & b_{mn} \\
b_{11} & \cdots & b_{m1} & a_{11} & a_{12} & \cdots & a_{1n} \\
b_{12} & \cdots & b_{m2} & a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & & \vdots & \vdots & \vdots & \ddots & \vdots \\
b_{1n} & \cdots & b_{mn} & a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix},$$

$$C_r = \begin{pmatrix}
0 & \cdots & 0 & b_{11} & b_{12} & \cdots & b_{1r} \\
\vdots & \ddots & \vdots & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & b_{m1} & b_{m2} & \cdots & b_{mr} \\
b_{11} & \cdots & b_{m1} & a_{11} & a_{12} & \cdots & a_{1r} \\
b_{12} & \cdots & b_{m2} & a_{21} & a_{22} & \cdots & a_{2r} \\
\vdots & & \vdots & \vdots & \vdots & \ddots & \vdots \\
b_{1r} & \cdots & b_{mr} & a_{r1} & a_{r2} & \cdots & a_{rr}
\end{pmatrix}.$$
Recall the matrices C and Cr from the earlier discussion on quadratic forms. These are used in Theorem 12.9 to provide the key result for the multi-constraint optimization problem.

Theorem 12.9: Suppose that A is symmetric and Bm is nonsingular.

1. Then x′Ax > 0 for every x ≠ 0 with Bx = 0, if and only if: (−1)ᵐ|Cr| > 0, r = m + 1, ..., n.
2. Then x′Ax < 0 for every x ≠ 0 with Bx = 0, if and only if: (−1)ʳ|Cr| > 0, r = m + 1, ..., n.
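Theorem 12.9 can be spot-checked numerically. In the sketch below (an added illustration; A, B and the helper are our choices, assuming NumPy), A = −I is negative definite, so x′Ax < 0 on the null space of B, and the alternating determinant condition of part 2 should hold:

```python
import numpy as np

def C_r(B, A, r):
    """Bordered matrix of Theorem 12.9: first r columns of B, leading r x r block of A."""
    m = B.shape[0]
    Br = B[:, :r]
    top = np.hstack([np.zeros((m, m)), Br])
    bottom = np.hstack([Br.T, A[:r, :r]])
    return np.vstack([top, bottom])

B = np.array([[1.0, 1.0, 1.0]])      # m = 1 constraint, n = 3
A = -np.eye(3)                       # x'Ax < 0 for every x != 0, hence also on Bx = 0
m, n = B.shape

# determinant test of part 2: (-1)^r |C_r| > 0 for r = m+1, ..., n
signs_ok = all((-1)**r * np.linalg.det(C_r(B, A, r)) > 0 for r in range(m + 1, n + 1))

# direct check on a vector projected onto the null space of B
rng = np.random.default_rng(0)
z = rng.standard_normal(3)
x = z - B[0] * (B[0] @ z) / (B[0] @ B[0])
print(signs_ok, float(x @ A @ x) < 0)   # True True
```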
Notice that condition 1 requires that all determinants have the same sign (negative if m is odd, positive if m is even), whereas in condition 2 the sign alternates with the size of the matrix. This result may be used to give a sufficiency condition for local optimality of a solution x∗ to a constrained optimization problem. Let f(x1, ..., xn) be the objective function and gk(x1, ..., xn) = ck, k = 1, ..., m, a collection of m constraints. Assume that m < n. Define the Lagrangian:

L = f(x1, ..., xn) + λ1[c1 − g1(x1, ..., xn)] + · · · + λm[cm − gm(x1, ..., xn)].
Define the matrix Dg as:

$$D_g = \begin{pmatrix}
\frac{\partial g_1}{\partial x_1} & \frac{\partial g_1}{\partial x_2} & \cdots & \frac{\partial g_1}{\partial x_n} \\
\frac{\partial g_2}{\partial x_1} & \frac{\partial g_2}{\partial x_2} & \cdots & \frac{\partial g_2}{\partial x_n} \\
\vdots & \vdots & & \vdots \\
\frac{\partial g_m}{\partial x_1} & \frac{\partial g_m}{\partial x_2} & \cdots & \frac{\partial g_m}{\partial x_n}
\end{pmatrix},$$

and define

$$C = \begin{pmatrix} 0 & D_g \\ D_g' & \nabla^2 L \end{pmatrix}, \qquad
C_r = \begin{pmatrix}
0 & \cdots & 0 & \frac{\partial g_1}{\partial x_1} & \frac{\partial g_1}{\partial x_2} & \cdots & \frac{\partial g_1}{\partial x_r} \\
\vdots & \ddots & \vdots & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & \frac{\partial g_m}{\partial x_1} & \frac{\partial g_m}{\partial x_2} & \cdots & \frac{\partial g_m}{\partial x_r} \\
\frac{\partial g_1}{\partial x_1} & \cdots & \frac{\partial g_m}{\partial x_1} & L_{11} & L_{12} & \cdots & L_{1r} \\
\frac{\partial g_1}{\partial x_2} & \cdots & \frac{\partial g_m}{\partial x_2} & L_{21} & L_{22} & \cdots & L_{2r} \\
\vdots & & \vdots & \vdots & \vdots & \ddots & \vdots \\
\frac{\partial g_1}{\partial x_r} & \cdots & \frac{\partial g_m}{\partial x_r} & L_{r1} & L_{r2} & \cdots & L_{rr}
\end{pmatrix},$$
where ∇²L is the matrix {Lij}, 1 ≤ i ≤ r, 1 ≤ j ≤ r, as indicated in the expression. Assume that the rank of Dg is m, and that the xi are labeled such that the first m columns of Dg are linearly independent at x∗. Thus:

Theorem 12.10: Suppose that x∗ is feasible (satisfies all the constraints). If (x∗, λ1∗, ..., λm∗) = (x∗, λ∗) satisfies:
1. Li(x∗, λ∗) = 0, i = 1, ..., n; Lλk(x∗, λ∗) = 0, k = 1, ..., m

2. (−1)ʳ|Cr| > 0, r = m + 1, ..., n,

then x∗ is a constrained local maximum. If condition 2 is replaced by (−1)ᵐ|Cr| > 0, r = m + 1, ..., n, then x∗ is a constrained local minimum.

So, for example, if there is one constraint, so that m = 1, the local maximum condition requires that (−1)²|C2| > 0, or |C2| > 0; (−1)³|C3| > 0, or |C3| < 0; and so on. Similarly, the local minimum condition requires (−1)¹|C2| > 0, or |C2| < 0; (−1)¹|C3| > 0, or |C3| < 0; and so on. This is the same set of conditions as appears in Theorem 12.4.
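Theorem 12.10 can be applied concretely. The sketch below (an added illustration with an example problem of our own, assuming NumPy) checks the local-maximum condition for maximizing f(x) = −(x1² + x2² + x3²) subject to x1 + x2 + x3 = 1, whose first-order conditions give x∗ = (1/3, 1/3, 1/3) and λ∗ = −2/3:

```python
import numpy as np

Dg = np.array([[1.0, 1.0, 1.0]])   # m = 1 constraint gradient (constant here)
L_hess = -2.0 * np.eye(3)          # Hessian of L = f + lam*(c - g): L_ij = f_ij
m, n = Dg.shape

def C_r(r):
    """Bordered matrix of Theorem 12.10, using the variables x_1, ..., x_r."""
    top = np.hstack([np.zeros((m, m)), Dg[:, :r]])
    bottom = np.hstack([Dg[:, :r].T, L_hess[:r, :r]])
    return np.vstack([top, bottom])

# condition 2 of Theorem 12.10: (-1)^r |C_r| > 0 for r = m+1, ..., n
is_local_max = all((-1)**r * np.linalg.det(C_r(r)) > 0 for r in range(m + 1, n + 1))
print(is_local_max)   # True: x* is a constrained local maximum
```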
Exercises for Chapter 12

Exercise 12.1 A firm has the production function Y = 30L^{1/2}K^{2/3}.

(a) What is the cost-minimizing level of L and K if the firm must produce 1000 units? Check the second-order conditions.
(b) By how much will cost increase if the firm must increase output by one unit?
(c) Given this production function, can we find a profit-maximizing level of output? Why or why not?

Exercise 12.2
Assume that a monopolist producing one good can sell to two distinct markets and price discriminate. The demand function in each market is

D1 = 300 − (1/3)P1
D2 = 300 − (1/2)P2.

The total cost to the monopolist is C = 2D1² + 3D1D2 + D2². Also, owing to a government restriction, the total output of the firm must be 100 units. What is the profit-maximizing output in each market? Check the second-order conditions.

Exercise 12.3
Consider a firm that produces beer and pretzels. It must sell a
bag of pretzels for every four bottles of beer. If its profit function is π = 150B − 4B² + 150P − 3P² + 10BP + 100
(P = number of bags of pretzels, B = number of beers), what is the profit-maximizing level of output?
Exercise 12.4 The geometric average of two positive numbers, x and y, is (xy)^{1/2}. Consider the task of maximizing the geometric average, subject to the condition x + y = 1. Show, using the method of Lagrange multipliers, that the optimal choice of (x, y) is (1/2, 1/2) and confirm that this gives the constrained maximum.
Exercise 12.5
A swimming pool is 25 meters by 100 meters. In coordinates, a position (x, y) gives the distances traveled from (0,0) along the two axes. Thus (15, 20) denotes the position of a swimmer who swam 15 meters along the x axis and 20 meters along the y axis from (0,0). There are four corners: (0,0), (0,25), (100,0) and (100,25). The diagonal line connecting corners (0,25) and (100,0) is given by 4y + x = 100. The distance from corner (0,0) to coordinate (x, y) is (x² + y²)^{1/2}. A swimmer wishes to swim from the (0,0) corner and cross the diagonal line in the shortest distance possible. What coordinate position should the person swim to? Use the method of Lagrange multipliers to determine the answer.

Exercise 12.6
A firm wishes to produce a level of output Q at minimum cost.
Two inputs, capital (K) and labour (L), are used to produce Q with the production function given by Q = K^α + L^α, where 0 < α < 1. The wage rate is ω and the cost of capital (per unit) is r, so that total cost given (K, L) is C = rK + ωL.

(a) Set up the Lagrangian to solve the problem of finding the cost-minimizing levels of K and L required to produce Q.
(b) Solve for K, L and the Lagrange multiplier, λ.
(c) Check that the second-order conditions for a minimum are satisfied.
(d) Observe that in your solution to part (b), both K and L depend on Q. To emphasize this dependency, write K(Q) and L(Q). Thus the cost of producing the output level Q is C(Q) = rK(Q) + ωL(Q). Find dC(Q)/dQ and compare your answer with λ.

Exercise 12.7
Dr. Who lives at coordinates (x, y) = (3, 12). A road in the (x, y) plane follows the path k = x² − 6x + y, where 0 < k < 3. On the bus journey home, each traveller must give the driver "drop-off" coordinates. Unfortunately for Dr. Who, the bus does not travel by his house (since (x, y) = (3, 12) is not on the curve k = x² − 6x + y).

(a) What drop-off coordinates should Dr. Who give the driver so that Dr. Who is as close to his house as possible? Hint: the distance between two points, (x1, y1) and (x2, y2), is d = [(x1 − x2)² + (y1 − y2)²]^{1/2}. However, since minimizing this is equivalent to minimizing d∗ = (x1 − x2)² + (y1 − y2)², take d∗ as the measure of distance.
(b) Confirm that the second-order conditions are satisfied.
(c) The solution depends on k. Let d∗(k) be the distance at the solution from the drop-off point to Dr. Who's house. Calculate d[d∗(k)]/dk and relate it to the Lagrange multiplier in (a).

Exercise 12.8
A firm sets both price and quantity in a market. It is known that the demand function for this market is P = k/Q. By law, the firm is required to choose a price-quantity combination lying on the demand curve. The cost of setting (P, Q) combinations is C(P, Q) = P + αQ².

(a) Minimize C(P, Q) subject to the demand constraint.
(b) Confirm that the second-order conditions are satisfied.
(c) The optimal choices of P and Q depend on (α, k): P(α, k), Q(α, k), so that the cost at the solution is C∗(α, k) = C(P(α, k), Q(α, k)). Find ∂C∗(α, k)/∂k and compare this with the Lagrange multiplier obtained in (a).

Exercise 12.9
Maximize xy subject to the constraint x² + y = k, where k is some positive constant. Confirm that the second-order conditions for a constrained optimum are satisfied. Interpret the Lagrange multiplier.

Exercise 12.10 Minimize (x − 1)² + y² subject to the constraint x² + k = y, where k is some positive constant. Confirm that the second-order conditions for a constrained optimum are satisfied. Interpret the Lagrange multiplier.

Exercise 12.11 Consider the following constrained optimization problem: maximize f(x, y) = y − bx² subject to the constraint y = k + a ln x. Here k, a and b are positive constants.
(a) Plot the problem.
(b) Find the solution values for x and y.
(c) Confirm that the second-order conditions are satisfied.
(d) Let the solution be x(a, b, k) and y(a, b, k). Then the value of the objective at the solution is f∗(a, b, k) = f(x(a, b, k), y(a, b, k)). Find ∂f∗(a, b, k)/∂a. Find ∂f∗(a, b, k)/∂k and interpret this in terms of the Lagrange multiplier.
Exercise 12.12 An individual faces the problem of minimizing expenditure subject to reaching some fixed level of utility u∗. There are two goods, x and y, and the utility function is given by u(x, y) = x^α y^α, 0 < 2α < 1. The cost of x (per unit) is p and the cost of y (per unit) is q, so the cost of the bundle (x, y) is px + qy.

(a) Set up the Lagrangian for this problem. Find the optimal values of x and y and the value of the Lagrange multiplier. Check the second-order conditions.
(b) The solution depends on p, q and u∗, so write x(p, q, u∗), y(p, q, u∗), λ(p, q, u∗). The expenditure required to achieve utility level u∗ given prices p and q is therefore e(p, q, u∗) = px(p, q, u∗) + qy(p, q, u∗). Then

∂e(p, q, u∗)/∂p and ∂e(p, q, u∗)/∂q

give the change in minimum income required to reach utility level u∗ when the price of x and y change. Show that

∂e(p, q, u∗)/∂p = x(p, q, u∗) and ∂e(p, q, u∗)/∂q = y(p, q, u∗).
13 Inequality-constrained optimization
13.1 Introduction

This chapter considers the problem of maximizing or minimizing a function f: Rⁿ → R subject to inequality constraints such as g(x) ≤ c or g(x) ≥ c. This type of problem arises frequently in economics. For example, maximizing output y = f(x1, x2) subject to an expenditure constraint, g(x1, x2) = w1 x1 + w2 x2 ≤ c, is a typical problem.
In Section 13.2, the constrained optimization problem is formally defined and the main result (Theorem 13.1) is given. All the results in this chapter are derived from this theorem and the constraint qualification. Section 13.2.1 discusses the constraint qualification, a (mild) condition that the derivatives of the constraints must satisfy to guarantee that the "standard" first-order conditions are necessary for a local optimum. In Section 13.2.2, the complementary slackness condition is discussed: on nonbinding constraints we can take the Lagrange multipliers to be 0. Section 13.2.3 discusses the logical equivalence of the maximization and minimization programs. In Section 13.2.4, conditions are given under which a local optimum is also a global optimum. In Section 13.2.5, problems with nonnegativity constraints are considered. In this case, the necessary conditions for a local optimum may be expressed in terms of the Kuhn–Tucker conditions, and this is done in Theorems 13.5 and 13.6, which are derived directly from the main result, Theorem 13.1. The remaining sections deal with a few miscellaneous topics.
First, Section 13.1.1 gives an introductory example that motivates the key characterizing condition for a solution: at the solution, the gradient of the objective may be written as a linear combination of the gradients of the constraints.
13.1.1 A motivating example

To develop some of the issues, consider the following optimization example.
EXAMPLE 13.1:

max x1 x2 subject to: x1² + x2² ≤ 1, x2 ≤ αx1.

Thus, the objective is f(x1, x2) = x1 x2, and these constraints may be written g1(x1, x2) = x1² + x2² − 1 ≤ 0 and g2(x1, x2) = −αx1 + x2 ≤ 0. The figure depicts the problem for two different values of the parameter α.

[Figure: two panels. Top panel: the line x2 = α̃x1 and the circle x1² + x2² = 1, with the solution xᵃ on the circle at the level curve x1 x2 = 0.5. Bottom panel: the line x2 = α̂x1 and the circle x1² + x2² = 1, with the solution xᵇ at the intersection of the two constraints, on the level curve x1 x2 = 0.3448.]
In the figures, constraint g2 is depicted for two possible values for α: α̃ = 2 > 1 > 4/10 = α̂. At parameter value α̃, only the first constraint is binding and the solution occurs at xᵃ = (√(1/2), √(1/2)). At parameter value α̂, both constraints are binding and the solution occurs at xᵇ. Let g̃2(x1, x2) = −α̃x1 + x2, and ĝ2(x1, x2) = −α̂x1 + x2.

At parameter value α̃, the constraints are g1(x1, x2) ≤ 0 and g̃2(x1, x2) ≤ 0, and the solution, xᵃ, satisfies g1(xᵃ) = 0, g̃2(xᵃ) < 0. Constraint g1 is active or binding at xᵃ, but constraint g̃2 is not active at xᵃ, because the constraint is not binding there. At parameter value α̂, the constraints are g1(x1, x2) ≤ 0 and ĝ2(x1, x2) ≤ 0, and, at the solution, xᵇ satisfies g1(xᵇ) = 0, ĝ2(xᵇ) = 0. Both constraints g1 and ĝ2 are active at xᵇ. Depending on circumstance, a constraint may or may not be binding, so the equations defining the solution must reflect this possibility.

In the case where α = α̃, observe that the slope of the objective is equal to the slope of the active constraint. In the case where α = α̂, the slope of the objective lies between the slopes of the constraints. The point xᵇ is located at the solution to the two constraints x2 = 0.4x1 and x1² + x2² = 1: xᵇ = (x1, x2) = (0.9285, 0.3714). The slope of constraint 2 is 0.4 and the slope of constraint 1 is −x1/x2 = −0.9285/0.3714 = −2.5. The slope of the objective is −x2/x1 = −0.3714/0.9285 = −0.4. Thus, since the slope of the objective lies between the slopes of the two constraints, the slope of the objective may be written as a weighted average of the slopes of the constraints.
♦
That the slope of the objective lies between the slopes of the constraints is an important point. Remark 13.1 clarifies this point.

REMARK 13.1: Rather than express conditions in terms of slopes, it is standard practice to use the normal to a curve to represent the slope. Thus, with f(x1, x2) = x1 x2 evaluated at xᵇ,

∇f ≝ (f_{x1}, f_{x2}) = (x2, x1) = (0.3714, 0.9285)

is the normal to the curve f(x1, x2) at (x1, x2). As a vector, this has slope x1/x2, which is the negative of the reciprocal of the slope of the objective f(x1, x2) = x1 x2, given by dx2/dx1 = −x2/x1. Similarly, the normal of the linear constraint g2 is ∇g2 = (∂g2/∂x1, ∂g2/∂x2) = (−0.4, 1) and the normal of the quadratic constraint g1 is ∇g1 = (∂g1/∂x1, ∂g1/∂x2) = (2x1, 2x2) = (1.857, 0.7428).
[Figure: at xᵇ, the vectors xᵇ + ∇g1, xᵇ + ∇f and xᵇ + ∇g2 all point outward from the feasible region, with the normal to f lying between the normals to g1 and g2.]

The slopes of the normals of the constraints are, respectively, x2/x1 = 0.4 (g1) and 1/(−0.4) = −2.5 (g2). The slope of the objective normal is x1/x2 = 2.5. An arc starting with 0 slope first reaches the normal to g1, then the normal to f, and finally the normal to g2. Thus, the normal of the objective may be expressed as a positive linear combination of the constraint normals:

$$\nabla f = \lambda_1 \nabla g_1 + \lambda_2 \nabla g_2:
\qquad
\begin{pmatrix} 0.3714 \\ 0.9285 \end{pmatrix} = 0.3448\begin{pmatrix} 1.8570 \\ 0.7428 \end{pmatrix} + 0.6724\begin{pmatrix} -0.4 \\ 1 \end{pmatrix}. \tag{13.1}$$
The observations in Example 13.1 apply for the general case.
13.1.2 Overview

Given a function h(x) = h(x1, ..., xn), the normal to the function at x is the gradient of the function at x, ∇h = (h_{x1}, h_{x2}, ..., h_{xn}). The key characterizing condition for an optimum is that the gradient (normal) of the objective may be written as a weighted sum of the gradients of the constraints. With k constraints {g_j}, the condition is ∇f = Σ_{j=1}^{k} λj∇gj, with λj ≥ 0 for all j, as in Theorem 13.1.

Provided a constraint qualification is satisfied, this condition, ∇f = Σ_{j=1}^{k} λj∇gj with λj ≥ 0 for all j, along with complementary slackness conditions, determines a system of equations that may be solved to provide necessary conditions for a local optimum. All results here are derived from these conditions. The constraint qualification is discussed in Section 13.2.1; complementary slackness is discussed in Section 13.2.2. Sufficiency conditions are discussed in Section 13.2.4: concavity of the objective and convexity of the set of feasible points imply that, given the constraint qualification, a solution to the necessary conditions yields a global maximum for the constrained problem. Finally, the necessary conditions can be specialized to include nonnegativity conditions, and this is done in Section 13.2.5.
13.2 Optimization

Since a constraint of the form g(x) ≤ c may be written as g(x) − c ≤ 0, and a constraint of the form g(x) ≥ c may be written as g(x) − c ≥ 0 or as −g(x) ≤ −c, we can express all such constraints in the form g(x) ≤ 0 (or g(x) ≥ 0). Equality constraints may be dealt with in the same way, since g(x) = c is equivalent to g(x) ≤ c and g(x) ≥ c, but at the cost of having two constraints instead of one. Similarly, nonnegativity constraints may be expressed in this way; setting g(x) = −x, the condition g(x) ≤ 0 implies that x ≥ 0.
An inequality-constrained maximization problem is given as:

max f(x) subject to gj(x) ≤ 0, j = 1, ..., r.   (13.2)

The function f(x) is the objective function and gj, j = 1, ..., r, the set of constraints. Feasible points and active constraints are defined next.
Definition 13.1: A point x∗ is feasible if it satisfies all of the constraints: gj(x∗) ≤ 0, ∀j. A constraint gj is active at a feasible point x∗ if gj(x∗) = 0, and otherwise it is inactive. Given the feasible point x∗, let A(x∗) be the set of active constraints at x∗: j ∈ A(x∗) if and only if gj(x∗) = 0.

A point x∗ is a constrained local maximum if there is no alternative point "close" to x∗ that satisfies the constraints and gives a higher value for the objective. Formally:

Definition 13.2: A point x∗ is a constrained local maximum of f if there is a neighborhood of x∗ (an open set in Rⁿ containing x∗), N(x∗), such that f(x∗) ≥ f(x̄) for all x̄ ∈ N(x∗) ∩ {x | g_l(x) ≤ 0, l = 1, ..., r}.
The following discussion identifies necessary conditions for a point x∗ to be a constrained local maximum. The discussion makes use of Farkas' lemma (see Remark 13.2), applied to the following equation system:

$$\begin{pmatrix}
\frac{\partial g_1(x^*)}{\partial x_1} & \frac{\partial g_2(x^*)}{\partial x_1} & \cdots & \frac{\partial g_r(x^*)}{\partial x_1} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial g_1(x^*)}{\partial x_n} & \frac{\partial g_2(x^*)}{\partial x_n} & \cdots & \frac{\partial g_r(x^*)}{\partial x_n}
\end{pmatrix}
\begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_r \end{pmatrix}
= \begin{pmatrix} \frac{\partial f(x^*)}{\partial x_1} \\ \vdots \\ \frac{\partial f(x^*)}{\partial x_n} \end{pmatrix}. \tag{13.3}$$

Each of the i rows may be read as:

$$\frac{\partial f(x^*)}{\partial x_i} = \sum_{j=1}^{r} \lambda_j \frac{\partial g_j(x^*)}{\partial x_i}, \quad i = 1, \ldots, n. \tag{13.4}$$
This system of equations is central to the characterization of necessary conditions for an optimum. For ease of notation, write ∇f(x∗) for the term on the right side of equality 13.3, ∇g for the n × r matrix on the left side and ∇g_l(x∗) for the l-th column of ∇g(x∗). With λ = (λ1, ..., λr), equation system 13.3 may be written:

∇g(x∗) · λ = ∇f(x∗).   (13.5)

This equation system plays a central role in inequality-constrained optimization. Subject to a qualification, if the equation system does not have a nonnegative solution, then the point x∗ is not a local optimum. Theorem 13.1 gives Equation 13.5 as a necessary condition for optimality, provided a "constraint qualification" (∇gj(x∗) · z < 0 for some z and every binding constraint j) is satisfied.

Theorem 13.1: Given the program 13.2, suppose that at x∗ there is some z with ∇gj(x∗) · z < 0 for each active constraint j. If x∗ is a constrained local maximum, then

$$\frac{\partial f(x^*)}{\partial x_i} = \sum_{j=1}^{r} \lambda_j \frac{\partial g_j(x^*)}{\partial x_i}, \quad i = 1, \ldots, n,$$

must have a nonnegative solution, λ = (λ1, ..., λr), λj ≥ 0, j = 1, ..., r.
PROOF: If Equation 13.3 has no nonnegative solution, then from Farkas' lemma (Remark 13.2), ∃z∗ ∈ Rⁿ such that:

$$\sum_{i=1}^{n} \frac{\partial g_j(x^*)}{\partial x_i} z_i^* \le 0 \ \text{ for } \ j = 1, \ldots, r, \quad \text{and} \quad \sum_{i=1}^{n} \frac{\partial f(x^*)}{\partial x_i} z_i^* > 0. \tag{13.6}$$

More succinctly, ∇gj(x∗) · z∗ ≤ 0, ∀j = 1, ..., r, and ∇f(x∗) · z∗ > 0. By assumption, there is some z with ∇gj(x∗) · z < 0 for each j ∈ A(x∗) (that is, Σᵢ [∂gj(x∗)/∂xᵢ] zᵢ < 0 for all j ∈ A(x∗)). Let zα = (1 − α)z + αz∗, so that ∇gj(x∗) · zα < 0 for each j ∈ A(x∗). Consider x∗ + δzα where δ > 0 is small, and note that lim_{δ→0} (1/δ)[f(x∗ + δzα) − f(x∗)] = ∇f(x∗) · zα > 0 when α is close to 1. Similarly, for each j ∈ A(x∗), lim_{δ→0} (1/δ)[gj(x∗ + δzα) − gj(x∗)] = ∇gj(x∗) · zα < 0. For the inactive constraints, j ∉ A(x∗), gj(x∗) < 0, so that for δ close to 0, gj(x∗ + δzα) < 0, j = 1, ..., r. Consequently, there is an α close to 1 and a δ close to 0 such that f(x∗ + δzα) − f(x∗) > 0 and gj(x∗ + δzα) < 0, j = 1, ..., r. Summarizing: given that at x∗ there is some z with ∇gj(x∗) · z < 0 for each active constraint j, if Equation 13.3 does not have a nonnegative solution, then x∗ cannot be a constrained local maximum.
REMARK 13.2: According to Farkas' lemma, if A is an n × r matrix, then exactly one of the following conditions holds:

1. Ax = b has a solution x ≥ 0, or
2. there exists y with A′y ≤ 0 and b′y > 0.

That is, if (1) is satisfied, (2) cannot be satisfied; and if (1) is not satisfied, then (2) must be satisfied.
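The two branches of Farkas' lemma can be illustrated with a non-negative least squares solve: a zero residual exhibits case (1), while a positive residual means case (2) must hold, and a certificate y can then be exhibited. A hedged sketch (added here; it assumes SciPy's nnls, and the matrices are illustrative choices of ours):

```python
import numpy as np
from scipy.optimize import nnls

A = np.array([[1.0, 0.0],
              [0.0, 1.0]])

# Case (1): Ax = b has a nonnegative solution
x, resid = nnls(A, np.array([1.0, 2.0]))
print(resid < 1e-10, (x >= 0).all())                 # True True

# Case (2): no nonnegative solution, so a separating y must exist
b = np.array([-1.0, 2.0])
_, resid = nnls(A, b)
y = np.array([-1.0, 0.0])                            # certificate: A'y <= 0, b'y > 0
print(resid > 0, (A.T @ y <= 0).all(), b @ y > 0)    # True True True
```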
The following remarks clarify a few points regarding this result, Theorem 13.1. Remark 13.3 explains how nonnegativity constraints are imposed; Remark 13.4 discusses the determinacy of the system (in terms of equations and unknowns) and Remark 13.5 provides some insight into the sign of the multiplier.

REMARK 13.3: Observe that the program given by expression 13.2, max f(x) subject to gj(x) ≤ 0, j = 1, ..., r, includes the possibility of imposing nonnegativity constraints. For example, if gk(x) = −xk, then gk(x) ≤ 0 is equivalent to −xk ≤ 0, or xk ≥ 0. In Section 13.2.5, a special case is considered in which some of the constraints are nonnegativity conditions. There, the Lagrangian together with complementary slackness conditions are used to define the necessary conditions for an optimum. This gives the Kuhn–Tucker conditions for constrained optimization with inequality constraints.

REMARK 13.4: Regarding the solution of these equation systems, note that in Equation 13.3 (or Equations 13.4 or 13.5), there are n equations in n + r variables (x, λ). In addition, there are r inequalities gj(x) ≤ 0, j = 1, ..., r, to be satisfied at a solution. Complementary slackness, discussed in Section 13.2.2, asserts that the λj may be chosen to satisfy λj gj = 0 for each j, giving an additional r equations. Provided the constraint qualification is satisfied, a necessary condition for an optimum is that the system ∇g(x) · λ = ∇f(x) and λj gj(x) = 0, j = 1, ..., r, of n + r equations in n + r unknowns has a solution (x1, ..., xn; λ1, ..., λr) with λj ≥ 0, ∀j = 1, ..., r.
REMARK 13.5: A motivation for the sign of the Lagrange multiplier may be given by considering Figure 13.1. At the solution, x1∗ = x2∗, ∇f(x∗) = (x2∗, x1∗), and ∇g(x∗) = (2x1∗, 2x2∗), so that ∇f(x∗) = λ∇g(x∗) with λ = 1/2. Notice that ∇g points out from the feasible region, as does ∇f. Thus, the direction of improvement of f is the same as the direction of infeasible choices. This is necessary for the candidate solution to be an optimum. This fact is reflected in the expression ∇f = λ∇g with λ > 0.

[Figure 13.1: Sign of the Lagrange multiplier. At x∗ = (x1∗, x2∗) on the constraint g(x1, x2) = x1² + x2² = c, both x∗ + ∇g and x∗ + ∇f point out of the feasible region; the level curve f(x1, x2) = x1 x2 = ū is tangent to the constraint at x∗.]
The following example illustrates the application of Theorem 13.1 by showing how the necessary condition for an optimum is satisfied at the solution.

EXAMPLE 13.2: Consider the following problem:

max f(x1, x2) = x1 x2 subject to:
g1(x1, x2) = x2 + x1² − c ≤ 0
g2(x1, x2) = x2 − α√x1 ≤ 0.

[Figure: the constraint curves g1 = 0 and g2 = 0, the level curve f = k, the gradients ∇f, ∇g1 and ∇g2 at the intersection of the constraints, and a direction z satisfying the constraint qualification.]

Taking the parameter values as c = 4 and α = 1, the constraints intersect at (x1, x2) = (1.648, 1.284). Let f(x1, x2) = x1 x2, so that at (x1, x2) = (1.648, 1.284),

$$\nabla f = \begin{pmatrix} x_2 \\ x_1 \end{pmatrix} = \begin{pmatrix} 1.284 \\ 1.648 \end{pmatrix}; \quad
\nabla g_1 = \begin{pmatrix} 2x_1 \\ 1 \end{pmatrix} = \begin{pmatrix} 3.296 \\ 1 \end{pmatrix}; \quad
\nabla g_2 = \begin{pmatrix} -\frac{1}{2}\frac{1}{\sqrt{x_1}} \\ 1 \end{pmatrix} = \begin{pmatrix} -0.389 \\ 1 \end{pmatrix}.$$

Therefore,

∇f = λ1∇g1 + λ2∇g2 = 0.5224∇g1 + 1.1256∇g2.

So, (x1, x2; λ1, λ2) = (1.648, 1.284, 0.5224, 1.1256) satisfies the necessary conditions for a maximum. ♦
REMARK 13.6: Note that the vector z′ = (−0.3, −0.6) satisfies ∇g1 · z = −1.589 and ∇g2 · z = −0.483. Thus, the constraint qualification is satisfied. The constraint qualification, if satisfied, ensures that the requirement that ∇g(x*)·λ = ∇f(x*) have a nonnegative solution λ is a necessary condition for a local optimum.
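The numbers in Example 13.2 and the inner products in Remark 13.6 are easy to reproduce. The following Python sketch (illustrative only; the bisection helper and variable names are not from the text) locates the intersection of the two constraints and recovers the multipliers by Cramer's rule:

```python
import math

def bisect(f, lo, hi, tol=1e-12):
    """Bisection root-finder; assumes f(lo) and f(hi) have opposite signs."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

c, alpha = 4.0, 1.0
# intersection of g1 = 0 and g2 = 0: c - x1^2 = alpha*sqrt(x1)
x1 = bisect(lambda t: (c - t * t) - alpha * math.sqrt(t), 1.0, 2.0)
x2 = alpha * math.sqrt(x1)                       # ~ (1.648, 1.284)

grad_f = (x2, x1)                                # f = x1*x2
grad_g1 = (2.0 * x1, 1.0)                        # g1 = x2 + x1^2 - c
grad_g2 = (-alpha / (2.0 * math.sqrt(x1)), 1.0)  # g2 = x2 - alpha*sqrt(x1)

# solve grad_f = lam1*grad_g1 + lam2*grad_g2 by Cramer's rule
det = grad_g1[0] * grad_g2[1] - grad_g2[0] * grad_g1[1]
lam1 = (grad_f[0] * grad_g2[1] - grad_g2[0] * grad_f[1]) / det
lam2 = (grad_g1[0] * grad_f[1] - grad_f[0] * grad_g1[1]) / det
# lam1 ~ 0.5224 and lam2 ~ 1.1256, both nonnegative, as required

# Remark 13.6: z = (-0.3, -0.6) has a negative inner product with both gradients
z = (-0.3, -0.6)
cq1 = grad_g1[0] * z[0] + grad_g1[1] * z[1]      # ~ -1.589
cq2 = grad_g2[0] * z[0] + grad_g2[1] * z[1]      # ~ -0.483
```

Both multipliers come out nonnegative and agree with the text's values to the stated precision, and both inner products with z are strictly negative, confirming the constraint qualification.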
The next example, Example 13.3, considers the quasilinear utility function and maximization of utility on a constraint set that is not convex. The example illustrates the key necessary condition, satisfied at the solution to the program. However, this example is reconsidered later (Example 13.11), where the nonconvexity permits the necessary conditions to be satisfied at a nonoptimal point.

EXAMPLE 13.3: Consider the problem of maximizing the function f(x, y) = α ln x + y, subject to the constraints g1(x, y) = y² + (x − 1)² ≤ 2 and g2(x, y) = xy ≤ k = 1.12. For the calculations, take α = 6/10.

[Figure: the circle (x − 1)² + y² = 2 and the curve xy = k, with the gradients ∇f, ∇g1 and ∇g2 at the solution, and the level curve (6/10)·ln x + y = β.]

The solution occurs at x = 8/10 and y = 14/10. At these values, xy = k = 1.12 and (6/10)ln(x) + y = β = 1.266. Note that

∇f(x, y) = ((6/10)(1/x), 1),  ∇g1(x, y) = (2(x − 1), 2y),  ∇g2(x, y) = (y, x).

At x = 8/10, y = 14/10, forming Equation 13.3 gives:

(6/8, 1) = λ1(−4/10, 28/10) + λ2(14/10, 4/10),

which has solution λ1 = 55/204 and λ2 = 125/204. This example is considered further in Section 13.2.6, where a second tangency point is identified, which fails to be a local optimum. ♦
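The fractions 55/204 and 125/204 can be recovered exactly. A short sketch using Python's fractions module (an assumed, illustrative calculation, not part of the text) solves the two stationarity equations by Cramer's rule in exact rational arithmetic:

```python
from fractions import Fraction as F

# gradients at (x, y) = (8/10, 14/10) with alpha = 6/10
grad_f = (F(6, 8), F(1))          # ((6/10)(1/x), 1) at x = 8/10
grad_g1 = (F(-4, 10), F(28, 10))  # (2(x - 1), 2y)
grad_g2 = (F(14, 10), F(4, 10))   # (y, x)

# solve grad_f = lam1*grad_g1 + lam2*grad_g2 exactly (Cramer's rule)
det = grad_g1[0] * grad_g2[1] - grad_g2[0] * grad_g1[1]
lam1 = (grad_f[0] * grad_g2[1] - grad_g2[0] * grad_f[1]) / det
lam2 = (grad_g1[0] * grad_f[1] - grad_f[0] * grad_g1[1]) / det
# lam1 == Fraction(55, 204) and lam2 == Fraction(125, 204), both positive
```

Exact arithmetic confirms the text's multipliers with no rounding, and both are strictly positive, so the necessary conditions hold at (8/10, 14/10).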
One feature of quasilinear preferences is that demand for one of the goods (the one entering the utility function nonlinearly) depends only on the price ratio. If income is not sufficiently large, the expenditure this demand requires may exceed income. This in turn implies that demand for the other good is negative. To avoid this outcome, it is necessary to impose nonnegativity constraints on the variable choices, as is done in Example 13.4.
EXAMPLE 13.4: Consider the problem of maximizing f(x, y) = x^(1/2) + y subject to g1(x, y) = px + qy − m ≤ 0, g2(x, y) = −x ≤ 0 and g3(x, y) = −y ≤ 0. The gradients are:

∇f = ((1/2)(1/√x), 1),  ∇g1 = (p, q),  ∇g2 = (−1, 0),  ∇g3 = (0, −1).

From the graph, the solution is at (x*, y*) = (m/p, 0). Consider the expression:

∇f = λ1∇g1 + λ2∇g2 + λ3∇g3,

[Figure: the budget line px + qy = m with intercepts m/q and m/p; at the solution (m/p, 0), the vectors ∇f, λ1∇g1 and λ3∇g3 are shown, together with the constraint gradients ∇g1, ∇g2, ∇g3.]

or

((1/2)(1/√(m/p)), 1) = λ1(p, q) + λ2(−1, 0) + λ3(0, −1).

Because the normal to the budget constraint is steeper than the normal to the indifference curve, the second coordinate of λ1·(p, q) will exceed the second coordinate of ∇f (which is 1), as depicted in the figure. Choose λ3 to reduce the second coordinate to 1. Thus, (1/2)(1/√(m/p)) = λ1 p, or λ1 = (1/2)(1/p)(1/√(m/p)) = (1/2)(1/√(mp)). Then λ1 q = (1/2)(q/√(mp)). Ignoring the nonnegativity condition, the optimal x value would be (1/4)(q²/p²), and this is greater than m/p. Thus, (1/4)(q²/p) > m, and so (1/2)(q/√(pm)) > 1. Putting λ3 = (1/2)(q/√(pm)) − 1 and λ2 = 0 completes the determination of the λ_i. This example is considered further in Example 13.10. ♦
The constraint qualification is discussed next. If the gradient of the objective is to be written as a weighted average of the gradients of the constraints, ∇f = Σ λ_j ∇g_j, this requires sufficient "variability" across the different constraints. The constraint qualification concerns this variation.
13.2.1 The constraint qualification
In Theorem 13.1, the constraint qualification requires that there exists some z with ∇g_j(x*)·z < 0 for each active constraint j. An alternative constraint qualification is given in terms of the linear independence of the gradients of the active constraints. Let ∇_A(x*) g(x*) be the matrix obtained from the matrix ∇g(x*) by deleting the columns corresponding to inactive constraints. Suppose that ∇_A(x*) g(x*) has rank equal to the number of active constraints – the gradients of the active constraints are linearly independent (so the number of active constraints must be no greater than n). Then there exists some z with ∇g_j(x*)·z < 0 for each active constraint j. To see this, let γ = (−1, −1,..., −1), a vector with the same number of entries as there are active constraints. Since ∇_A(x*) g(x*) has full column rank, there is some z ∈ R^n with z′∇_A(x*) g(x*) = γ. Thus, for each active constraint j, ∇g_j(x*)·z = −1 < 0.
The following examples illustrate the constraint qualification.

EXAMPLE 13.5: Consider the pair of constraints:

g1(x1, x2) = x2 − √x1 ≤ 0
g2(x1, x2) = x1² − 1 − x2 ≤ 0.

The two curves, g1(x1, x2) = 0 and g2(x1, x2) = 0, intersect at (x1, x2) = (1.49, 1.22).

[Figure: the curves g1 = 0 and g2 = 0 with the gradients ∇g1, ∇g2 and ∇f at their intersection; the shaded feasible region lies between the curves in the nonnegative orthant.]

The shaded area depicts the set of points in the nonnegative orthant satisfying the two constraints. The gradient of f, ∇f, lies in the cone determined by ∇g1 and ∇g2, so that ∇f may be written ∇f = λ1∇g1 + λ2∇g2, with λ1, λ2 nonnegative. ♦
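The construction described above – solving z′∇_A g(x*) = γ with γ = (−1, −1) – can be carried out concretely for the gradients of Example 13.5 at the intersection point. A hedged sketch (the point (1.49, 1.22) is taken from the example; everything else is illustrative):

```python
import math

x1 = 1.49                                  # intersection point from Example 13.5
g1 = (-1.0 / (2.0 * math.sqrt(x1)), 1.0)   # gradient of x2 - sqrt(x1)
g2 = (2.0 * x1, -1.0)                      # gradient of x1^2 - 1 - x2

# solve g1 . z = -1 and g2 . z = -1 (i.e. z' [g1 g2] = (-1, -1)) by Cramer's rule
det = g1[0] * g2[1] - g1[1] * g2[0]
z = ((g1[1] - g2[1]) / det, (g2[0] - g1[0]) / det)

# both inner products equal -1, strictly negative: the qualification holds
d1 = g1[0] * z[0] + g1[1] * z[1]
d2 = g2[0] * z[0] + g2[1] * z[1]
```

Because the two gradients are linearly independent (det ≠ 0), the linear system always has a solution, and the resulting z makes both inner products equal to −1, exactly as the rank argument predicts.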
13.2.1.1 Failure of the constraint qualification
When the constraint qualification fails, the representation in Equation 13.4 (or 13.5) may be impossible to satisfy at the solution to the problem, x*, even though that solution is an optimum. The following example illustrates.
EXAMPLE 13.6: The figure plots the constraints determined by two circles of radius √8:

g1(x) = (x1 − 3)² + (x2 − 3)² ≤ 8
g2(x) = (x1 − 7)² + (x2 − 7)² ≤ 8.

They have only one point in common: x = (5, 5).

[Figure: the two circles (x1 − 3)² + (x2 − 3)² = 8 and (x1 − 7)² + (x2 − 7)² = 8, tangent at (5, 5); the vectors (1/2)∇g1(5, 5) and (1/2)∇g2(5, 5) point in opposite directions, perpendicular to the common tangent line L.]

The constraints have gradients:

∇g1(x) = (2(x1 − 3), 2(x2 − 3)),  ∇g2(x) = (2(x1 − 7), 2(x2 − 7)).

The figure depicts these for x = (5, 5):

∇g1(5, 5) = (4, 4),  ∇g2(5, 5) = (−4, −4).

Thus, for any λ1, λ2,

λ1∇g1(5, 5) + λ2∇g2(5, 5) = (λ1 − λ2)(4, 4) = β(4, 4).

The line L passes through (5, 5). From the figure, note that {(x1, x2) | ∇g1(5, 5)·(x1, x2) ≤ 0} is the set of points on or below the line L. Likewise, {(x1, x2) | ∇g2(5, 5)·(x1, x2) ≤ 0} is the set of points on or above the line L. There is no point z with ∇g_i(5, 5)·z < 0 for i = 1, 2. Note also that ∇g1(5, 5) and ∇g2(5, 5) are linearly dependent: ∇g1(5, 5) = (4, 4) and ∇g2(5, 5) = (−4, −4). Unless ∇f(5, 5) = (α, α) for some number α, it is impossible to write ∇f(5, 5) = λ1∇g1(5, 5) + λ2∇g2(5, 5), even though the maximization problem has a solution at (x1*, x2*) = (5, 5), the only feasible point. ♦
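The failure can be confirmed by brute force: with ∇g1(5, 5) = (4, 4) and ∇g2(5, 5) = (−4, −4), no direction makes both inner products negative, since one gradient is the exact negative of the other. A small illustrative sketch:

```python
import math

g1 = (4.0, 4.0)      # gradient of g1 at (5, 5)
g2 = (-4.0, -4.0)    # gradient of g2 at (5, 5)

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

# linear dependence: the 2x2 determinant of [g1; g2] vanishes
det = g1[0] * g2[1] - g1[1] * g2[0]

# sweep 360 unit directions: none satisfies both g1.z < 0 and g2.z < 0,
# because g2 = -g1 forces the two inner products to have opposite signs
bad = [a for a in range(360)
       if dot(g1, (math.cos(math.radians(a)), math.sin(math.radians(a)))) < 0
       and dot(g2, (math.cos(math.radians(a)), math.sin(math.radians(a)))) < 0]
```

The sweep returns an empty list: the constraint qualification fails at (5, 5), exactly as the example argues geometrically.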
Considering Equation 13.5 in the context of Example 13.6, the matrix ∇g(x*) is singular and solving for (nonnegative) λ in ∇g(x*)·λ = ∇f(x*) is impossible. The next example illustrates the same point – singularity of ∇g(x*) makes it impossible to satisfy ∇g(x*)·λ = ∇f(x*). However, in the example, the shape of the feasible region seems "regular"; the problem arises because one of the gradients vanishes at the candidate solution point.
EXAMPLE 13.7: Let g1(x1, x2) = (x1 + x2 − 1)³, g2(x1, x2) = x2 and f(x1, x2) = x1x2. Consider the two constraints g1(x1, x2) ≤ 0 and g2(x1, x2) ≤ 1/2. Satisfying constraint 1 requires that (x1 + x2 − 1)³ ≤ 0, or (x1 + x2 − 1) ≤ 0, or x1 + x2 ≤ 1.

The gradients are:

∇f(x1, x2) = (x2, x1),  ∇g1(x1, x2) = (3(x1 + x2 − 1)², 3(x1 + x2 − 1)²),  ∇g2(x1, x2) = (0, 1).

[Figure: the feasible region below the line x1 + x2 = 1 and below the line x2 = 1/2, with ∇g2 and ∇f drawn at the solution (x̄1, x̄2).]

So, the set of (x1, x2) pairs satisfying both constraints are those pairs with x1 + x2 ≤ 1 and with x2 ≤ 1/2. The solution occurs at (x̄1, x̄2) = (1/2, 1/2), where the gradients are

∇f(1/2, 1/2) = (1/2, 1/2),  ∇g1(1/2, 1/2) = (0, 0),  ∇g2(1/2, 1/2) = (0, 1).

It is impossible to write ∇f(1/2, 1/2) as a linear combination of ∇g1(1/2, 1/2) and ∇g2(1/2, 1/2). Here, again, the matrix ∇g(x*) is singular at the candidate solution point. One might remark that a respecification of the constraint g1(x1, x2) ≤ 0 to the equivalent ĝ1(x1, x2) = x1 + x2 − 1 ≤ 0 provides two linearly independent constraint gradient vectors. ♦
EXAMPLE 13.8: The following example illustrates yet another case where the necessary conditions are not satisfied. The constraints are g1(x, y) = y − (1 − x)³ ≤ 0 and g2(x, y) = −y ≤ 0. Thus,

∇g1 = (∂g1/∂x, ∂g1/∂y) = (3(1 − x)², 1)  and  ∇g2 = (∂g2/∂x, ∂g2/∂y) = (0, −1).

[Figure: the curve y = (1 − x)³ and the feasible region between it and the horizontal axis, with ∇g1 and ∇g2 drawn near the point (1, 0).]

So, at (x, y) = (1, 0), ∇g1 = (0, 1), and unless ∇f = α(0, 1) for some real number α, there is no solution to the expression ∇f = λ1∇g1 + λ2∇g2. ♦
From Theorem 13.1, subject to the constraint qualification, a necessary condition for x* to be a solution is that

∂f(x*)/∂x_i = Σ_{j=1}^r λ_j ∂g_j(x*)/∂x_i, i = 1,..., n,

must have a nonnegative solution λ = (λ1,..., λ_r), λ_j ≥ 0, j = 1,..., r. Remark 13.4 notes that the Lagrange multipliers and associated constraints may be chosen to satisfy λ_j g_j = 0: at the solution, if the multiplier is positive, the corresponding constraint is binding (λ_j > 0 implies g_j = 0); and, if the constraint is not binding, the corresponding multiplier is 0 (g_j < 0 implies λ_j = 0). Imposing this requirement yields a system of n + k equations (∇f = Σ_{j=1}^k λ_j∇g_j gives n equations and λ_j g_j = 0, j = 1,..., k, gives k equations) in n + k unknowns, ({x_i}_{i=1}^n, {λ_j}_{j=1}^k). The following discussion confirms the validity of imposing the complementary slackness conditions.
13.2.2 Complementary slackness
Consider the program given in Equation 13.2 and let x* be a constrained local maximum satisfying the necessary conditions of Theorem 13.1 with Lagrange multipliers {λ_j}_{j=1}^k. At x*, the active constraints are A(x*) = {l | g_l(x*) = 0} and the inactive constraints A(x*)^c = {l | g_l(x*) < 0}. There is an open neighborhood of x*, O(x*), with g_l(x) < 0 for any l ∈ A(x*)^c and x ∈ O(x*).

Consider the modified program obtained from Equation 13.2 by deleting the constraints not binding at x*:

max f(x) subject to g_j(x) ≤ 0, j ∈ A(x*).  (13.7)

For this program, x* is a constrained local maximum – otherwise, in any neighborhood of x* there is an x′ satisfying the constraints A(x*) and with f(x′) > f(x*), violating the assumption that x* is a constrained local maximum of the original program. Furthermore, for this program the constraint qualification is satisfied at x* (if the qualification was satisfied at x* for the original program). Therefore (applying Theorem 13.1 to the modified program), there exists λ̃_j ≥ 0, j ∈ A(x*), with:

∂f(x*)/∂x_i = Σ_{j∈A(x*)} λ̃_j ∂g_j(x*)/∂x_i.

For j ∉ A(x*), let λ̃_j = 0 so that:

∂f(x*)/∂x_i = Σ_{j=1}^r λ̃_j ∂g_j(x*)/∂x_i.

The condition λ̃_j g_j = 0 for all j = 1,..., r is called the complementary slackness condition. Consequently, we can refine Theorem 13.1 with complementary slackness conditions:

Theorem 13.2: Given the program 13.2, suppose that at x* there is some z with ∇g_l(x*)·z < 0 for each active constraint l. Then, if x* is a constrained local maximum, there exist λ_j ≥ 0, j = 1,..., r, with:

∂f(x*)/∂x_i = Σ_{j=1}^r λ_j ∂g_j(x*)/∂x_i, i = 1,..., n,  (13.8)

and where λ_j = 0 if g_j(x*) < 0 (equivalently, λ_j g_j(x*) = 0, ∀j = 1,..., r). So, in expression 13.8 there are n equations, and the equations λ_j g_j(x*) = 0, ∀j = 1,..., r, give n + r equations in n + r unknowns (see Remark 13.4).
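The requirements of Theorem 13.2 – stationarity, feasibility, nonnegativity of the multipliers and complementary slackness – can be bundled into a small checking routine. The sketch below is illustrative (the function name and tolerance are assumptions, not from the text); it accepts the candidate from Example 13.2 and rejects arbitrary multipliers:

```python
def satisfies_theorem_13_2(grad_f, grads_g, g_vals, lams, tol=1e-2):
    """Check stationarity, feasibility, lam >= 0 and lam_j * g_j = 0 at a candidate."""
    n, r = len(grad_f), len(g_vals)
    stationary = all(
        abs(grad_f[i] - sum(lams[j] * grads_g[j][i] for j in range(r))) <= tol
        for i in range(n))
    feasible = all(g <= tol for g in g_vals)
    nonneg = all(l >= -tol for l in lams)
    comp_slack = all(abs(l * g) <= tol for l, g in zip(lams, g_vals))
    return stationary and feasible and nonneg and comp_slack

# data from Example 13.2, rounded to the text's figures: both constraints bind
ok = satisfies_theorem_13_2((1.284, 1.648), [(3.296, 1.0), (-0.389, 1.0)],
                            [0.0, 0.0], [0.5224, 1.1256])
bad = satisfies_theorem_13_2((1.284, 1.648), [(3.296, 1.0), (-0.389, 1.0)],
                             [0.0, 0.0], [1.0, 1.0])
```

The loose tolerance absorbs the rounding in the text's decimals; the second call fails because stationarity does not hold with the wrong multipliers.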
13.2.3 Minimization and maximization
The case where the objective is to minimize a function subject to constraints may be considered by adapting Theorem 13.2. Let f(x) be the function to be minimized, and specify the constraints in the form g_j(x) ≥ 0, j = 1,..., r. Observe that minimizing f(x) is the same as maximizing f̂(x) = −f(x); and that the constraint g_j(x) ≥ 0 is equivalent to −g_j(x) ≤ 0, so let ĝ_j(x) = −g_j(x) for each j. Then the program:

(A)  min f(x) subject to g_j(x) ≥ 0, j = 1,..., r

is equivalent to the program:

(B)  max f̂(x) subject to ĝ_j(x) ≤ 0, j = 1,..., r.

Considering program B, according to Theorem 13.2, if at x*, ∇ĝ_l(x*)·z < 0 for each active constraint l, then if x* is a local maximum of f̂, there are λ_j ≥ 0, j = 1,..., r, such that:

∂f̂(x*)/∂x_i = Σ_{j=1}^r λ_j ∂ĝ_j(x*)/∂x_i, i = 1,..., n,  (13.9)

and λ_j ĝ_j(x*) = 0 for all j = 1,..., r.

These conditions may be written in terms of f and g_j. Since f = −f̂ and g_j = −ĝ_j, Equation 13.9 is unchanged by multiplying by −1 to give

∂f(x*)/∂x_i = Σ_{j=1}^r λ_j ∂g_j(x*)/∂x_i, i = 1,..., n,  (13.10)

and λ_j ĝ_j(x*) = 0 is equivalent to λ_j g_j(x*) = 0. Finally, note that ∇ĝ_l(x*)·z < 0 is equivalent to ∇g_l(x*)·z > 0. Consequently,

Theorem 13.3: Suppose that at x*, there is some z such that ∇g_l(x*)·z > 0 for each active constraint l. Then, if x* is a local minimum in program A, there exist λ_j ≥ 0, j = 1,..., r, such that Equation 13.10 is satisfied and λ_j g_j(x*) = 0 for all j.
13.2.4 Global and local optima
Theorem 13.1 gives necessary conditions for a constrained local optimum. This leaves open the possibility that there is some alternative feasible x′ satisfying the constraints, for which f ( x′ ) > f ( x∗ ).
Theorem 13.4: Suppose that f is concave and the feasible region is convex. Then, given the constraint qualification, any x* satisfying Equation 13.3 solves the optimization problem 13.2.
PROOF: To see this, observe that if f is concave then for any α ∈ [0, 1],

f((1 − α)x* + αx′) ≥ (1 − α)f(x*) + αf(x′).

Rearranging gives

f(x* + α(x′ − x*)) ≥ f(x*) + α[f(x′) − f(x*)],

or

f(x* + α(x′ − x*)) − f(x*) ≥ α[f(x′) − f(x*)].

Dividing by α and taking limits gives ∇f(x*)·(x′ − x*) ≥ f(x′) − f(x*). If there is some feasible x′ with f(x′) > f(x*), then ∇f(x*)·(x′ − x*) > 0. Because the feasible set is convex, for every binding constraint, g_j(x* + α(x′ − x*)) = g_j((1 − α)x* + αx′) ≤ g_j(x*) = 0, so that ∇g_j(x*)·(x′ − x*) ≤ 0 for every binding constraint. Put z = (x′ − x*). So, 0 < ∇f(x*)·z and, for every binding constraint j, ∇g_j(x*)·z ≤ 0. According to the necessary condition for an optimum, ∇f(x*) = Σ_{j=1}^r λ_j ∇g_j(x*) with λ_j = 0 for every nonbinding constraint. Combining these inequalities:

0 < ∇f(x*)·z = [Σ_{j=1}^r λ_j ∇g_j(x*)]·z = Σ_{j=1}^r λ_j (∇g_j(x*)·z) ≤ 0.

This contradiction implies there is no such x′. An analogous result holds in the minimization case with f convex.
13.2.5 Nonnegativity constraints
Nonnegativity constraints may be accommodated in this framework by adapting Theorems 13.1 and 13.2. These calculations lead to the well-known Kuhn–Tucker conditions. The constraint x_i ≥ 0 may be written g°_i(x) = −x_i ≤ 0. Consider

max f(x) subject to g_j(x) ≤ 0, j = 1,..., r; x_i ≥ 0, i = 1,..., n,  (13.11)

or equivalently,

max f(x) subject to g_j(x) ≤ 0, j = 1,..., r; g°_i(x) ≤ 0, i = 1,..., n.  (13.12)
With β_i denoting the multiplier on constraint g°_i, Equation 13.4 becomes:

∂f(x*)/∂x_i = Σ_{j=1}^r λ_j ∂g_j(x*)/∂x_i + Σ_{k=1}^n β_k ∂g°_k(x*)/∂x_i, i = 1,..., n,  (13.13)

with λ_j ≥ 0 and β_k ≥ 0. Noting that ∂g°_i(x*)/∂x_i = −1 and ∂g°_k(x*)/∂x_i = 0 for k ≠ i, this is

∂f(x*)/∂x_i = Σ_{j=1}^r λ_j ∂g_j(x*)/∂x_i − β_i, i = 1,..., n.  (13.14)

The complementary slackness conditions give λ_j g_j(x*) = 0 and β_i g°_i(x*) = −β_i x*_i = 0. From Equation 13.14, the conditions β_i ≥ 0 and β_i x*_i = 0, along with x*_i ≥ 0, give:

∂f(x*)/∂x_i − Σ_{j=1}^r λ_j ∂g_j(x*)/∂x_i ≤ 0,  x*_i (∂f(x*)/∂x_i − Σ_{j=1}^r λ_j ∂g_j(x*)/∂x_i) = 0,  x*_i ≥ 0, ∀i.  (13.15)

Similarly,

g_j(x*) ≤ 0,  λ_j g_j(x*) = 0,  λ_j ≥ 0, ∀j.  (13.16)

Provided the constraint qualification is satisfied, the conditions in Equations 13.15 and 13.16 are necessary for a local maximum of Equation 13.11. These conditions may be expressed in terms of a Lagrangian function.

Theorem 13.5: Consider the program:

max f(x) subject to g_j(x) ≤ 0, j = 1,..., r; x_i ≥ 0, i = 1,..., n.  (13.17)

Let

L = f(x) − Σ_{j=1}^r λ_j g_j(x).

Provided the constraint qualification is satisfied, a local maximum in Equation 13.17 satisfies:

∂L/∂x_i ≤ 0;  x_i ∂L/∂x_i = 0;  x_i ≥ 0, for i = 1,..., n;
∂L/∂λ_j ≥ 0;  λ_j ∂L/∂λ_j = 0;  λ_j ≥ 0, for j = 1,..., r.  (13.18)
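A quick, self-contained illustration of the conditions in Equation 13.18 on a hypothetical program (assumed for illustration, not from the text): max f(x, y) = ln(1 + x) + y subject to x + y ≤ 1, x ≥ 0, y ≥ 0. Because ∂f/∂y = 1 ≥ ∂f/∂x = 1/(1 + x) everywhere on the feasible set, the candidate (x, y, λ) = (0, 1, 1) satisfies every condition:

```python
# L = ln(1 + x) + y - lam*(x + y - 1); candidate x = 0, y = 1, lam = 1
x, y, lam = 0.0, 1.0, 1.0

Lx = 1.0 / (1.0 + x) - lam     # dL/dx = 0 here, so x * Lx = 0 holds with x = 0
Ly = 1.0 - lam                 # dL/dy = 0, consistent with y > 0
Llam = -(x + y - 1.0)          # dL/dlam = 0: the budget-type constraint binds

conditions_13_18 = (Lx <= 0 and x * Lx == 0 and x >= 0 and
                    Ly <= 0 and y * Ly == 0 and y >= 0 and
                    Llam >= 0 and lam * Llam == 0 and lam >= 0)
```

Each line of (13.18) maps to one clause of the boolean expression, which evaluates to True at this candidate.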
For a local minimum, consider the program:

min f(x) subject to g_j(x) ≥ 0, j = 1,..., r; x_i ≥ 0, i = 1,..., n.  (13.19)

Using the earlier reasoning, the constraint qualification is modified (as in Theorem 13.3). Noting that minimizing f is equivalent to maximizing −f, and that the constraint g_j(x) ≥ 0 is equivalent to −g_j(x) ≤ 0, the minimization problem may be written as:

max [−f(x)] subject to −g_j(x) ≤ 0, j = 1,..., r; x_i ≥ 0, i = 1,..., n,  (13.20)

so the Lagrangian has the form L = −f − Σ_{j=1}^r λ_j[−g_j]. With L̂ = −L, ∂L̂/∂x_i = −∂L/∂x_i and ∂L̂/∂λ_j = −∂L/∂λ_j, so the conditions in Equation 13.18 translate directly to identify necessary conditions for x* to be a local solution in Equation 13.20.

Theorem 13.6: Consider the program:

min f(x) subject to g_j(x) ≥ 0, j = 1,..., r; x_i ≥ 0, i = 1,..., n.  (13.21)

Let

L = f(x) − Σ_{j=1}^r λ_j g_j(x).

Provided the constraint qualification is satisfied, a local minimum in Equation 13.21 satisfies:

∂L/∂x_i ≥ 0;  x_i ∂L/∂x_i = 0;  x_i ≥ 0, for i = 1,..., n;
∂L/∂λ_j ≤ 0;  λ_j ∂L/∂λ_j = 0;  λ_j ≥ 0, for j = 1,..., r.  (13.22)
The following example revisits an earlier example with inequality constraints, using complementary slackness explicitly.

EXAMPLE 13.9: Recall Example 13.2: max_x f(x1, x2) = x1x2 subject to the constraints g1(x1, x2) = x2² + x1² − 4 ≤ 0 and g2(x1, x2) = x2 − √x1 ≤ 0. Formulating this problem with the Lagrangian and imposing the nonnegativity constraints gives:

L = x1x2 − λ1[x2² + x1² − 4] − λ2[x2 − √x1]

∂L/∂x1 = x2 − 2λ1x1 + λ2(1/2)(1/√x1)  (13.23)
∂L/∂x2 = x1 − 2λ1x2 − λ2  (13.24)
∂L/∂λ1 = −[x2² + x1² − 4]  (13.25)
∂L/∂λ2 = −[x2 − √x1].  (13.26)

Since (x1, x2) = (1, 1) satisfies the constraints, in any solution, x_i > 0; otherwise x1x2 = 0 < 1·1. If both x1 and x2 are strictly positive, complementary slackness implies that x2 − 2λ1x1 + λ2(1/2)(1/√x1) = 0 and x1 − 2λ1x2 − λ2 = 0.

Next, observe that λ1 and λ2 are strictly positive. To see this, suppose that λ1 = 0. Then Equation 13.23 implies that x2 = −λ2(1/2)(1/√x1), and Equation 13.24 gives x1 = λ2. Since x1 > 0, λ2 > 0, so that x2 = −λ2(1/2)(1/√x1) implies that x1 and x2 have opposite signs. Hence λ1 > 0. If λ2 = 0, then Equation 13.23 gives x2 = 2λ1x1 and Equation 13.24 gives x1 = 2λ1x2. Therefore, x2/x1 = (2λ1x1)/(2λ1x2) = x1/x2, and so x2² = x1², or x1 = x2, since both are positive. Equation 13.25 then implies that x1 > 1 and x2 > 1, but Equation 13.26 requires x2 = √x1, which is impossible. Therefore, in the solution, x1, x2 are located at the intersection of the constraints and the multipliers are determined as in Example 13.2. ♦
EXAMPLE 13.10: A utility function that arises frequently in economics is the quasilinear function:

u(x, y) = v(x) + y,

where x and y are the two goods consumed. Denote income by m, so that the budget constraint is px + qy = m. Consider the problem of maximizing utility subject to the budget constraint. Utility is maximized at the tangency of the indifference curve and the budget constraint. Suppose that v(x) = √x, so that v′(x) = (1/2)(1/√x). The indifference curve has slope dy/dx = −(1/2)(1/√x) and the budget constraint has slope −p/q. Setting these equal, (1/2)(1/√x) = p/q, or √x = (1/2)(q/p), or x = (1/4)(q²/p²). Thus, expenditure on good x is px = (1/4)(q²/p). Since px + qy = m, if y ≥ 0, then px = (1/4)(q²/p) ≤ m. Otherwise, the situation is as depicted, with a solution where y < 0.

[Figure: the budget line px + qy = m and the indifference curve √x + y = ū, tangent at x = (1/4)(q²/p²) to the right of m/p, so that the tangency has y < 0.]

Write the budget constraint as g(x, y) = px + qy − m ≤ 0. The problem is then to maximize f(x, y) = √x + y subject to g(x, y) = px + qy − m ≤ 0, x ≥ 0 and y ≥ 0. Assume that (1/4)(q²/p) > m, so that, in the absence of nonnegativity constraints, the value of the x demanded would exceed m, as depicted.

The Lagrangian for this problem is L = √x + y − λ[px + qy − m]. Applying the conditions in Equation 13.18:

∂L/∂x = (1/2)(1/√x) − λp ≤ 0;  x ∂L/∂x = 0;  x ≥ 0
∂L/∂y = 1 − λq ≤ 0;  y ∂L/∂y = 0;  y ≥ 0
∂L/∂λ = −[px + qy − m] ≥ 0;  λ ∂L/∂λ = 0;  λ ≥ 0.

To solve this system, first observe that ∂L/∂y ≤ 0 implies that 1 − λq ≤ 0, so that λ ≥ 1/q. Next, ∂L/∂x ≤ 0 implies that (1/2)(1/√x) − λp ≤ 0, and rearranging gives √x ≥ (1/2)(1/(λp)). So, x > 0, and therefore x ∂L/∂x = 0 implies ∂L/∂x = 0, giving √x = (1/2)(1/(λp)), or x = (1/4)(1/(λ²p²)). To see that y = 0, suppose otherwise, so that y ∂L/∂y = 0 implies that ∂L/∂y = 0 and so 1 − λq = 0, implying that λ = 1/q. In that case, √x = (1/2)(1/(λp)) gives √x = (1/2)(q/p), and x = (1/4)(q²/p²), implying that px = (1/4)(q²/p) > m, which is inconsistent with px + qy ≤ m and y > 0. Hence y = 0.
Since λ > 0, λ ∂L/∂λ = 0 implies that [px + qy − m] = 0 and, with y = 0, px = m. And, since x = (1/4)(1/(λ²p²)), px = (1/4)(1/(λ²p)) = m, or 1/λ² = 4pm, or λ = 1/√(4pm). Substituting this into x = (1/4)(1/(λ²p²)) gives x = m/p. Thus, the solution values are: x = m/p, y = 0, λ = 1/√(4pm). ♦
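The closed-form solution can be checked against the Kuhn–Tucker conditions for concrete parameter values satisfying (1/4)(q²/p) > m (the values below are chosen only for illustration):

```python
import math

p, q, m = 1.0, 4.0, 1.0
assert q * q / (4.0 * p) > m          # the case depicted: the tangency has y < 0

x, y = m / p, 0.0                     # claimed solution: all income spent on x
lam = 1.0 / math.sqrt(4.0 * p * m)    # claimed multiplier, 1/sqrt(4pm)

Lx = 0.5 / math.sqrt(x) - lam * p     # = 0, consistent with x > 0
Ly = 1.0 - lam * q                    # < 0, consistent with y = 0
budget = p * x + q * y - m            # = 0: the budget constraint binds
```

The stationarity condition in x holds with equality, the condition in y holds as a strict inequality with y = 0 (complementary slackness), and the binding budget is consistent with λ > 0.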
13.2.6 Necessary conditions and nonconvexities
The necessary conditions for a local optimum are not sufficient for a local optimum. The next example presents an earlier example in terms of the Lagrangian and shows how the necessary conditions may be satisfied at a point that is not a local optimum.
EXAMPLE 13.11: Recall Example 13.3: maximize the function f(x, y) = α ln x + y, subject to the constraints g1(x, y) = y² + (x − 1)² ≤ 2 and g2(x, y) = xy ≤ k = 1.12. The Lagrangian is:

L = α ln x + y − λ1(y² + (x − 1)² − 2) − λ2(xy − k).

So, the necessary conditions become:

∂L/∂x = α(1/x) − 2λ1(x − 1) − λ2y ≤ 0;  x ∂L/∂x = 0;  x ≥ 0
∂L/∂y = 1 − 2λ1y − λ2x ≤ 0;  y ∂L/∂y = 0;  y ≥ 0
∂L/∂λ1 = −(y² + (x − 1)² − 2) ≥ 0;  λ1 ∂L/∂λ1 = 0;  λ1 ≥ 0
∂L/∂λ2 = −(xy − k) ≥ 0;  λ2 ∂L/∂λ2 = 0;  λ2 ≥ 0.

Suppose that there is a solution with λ1 = 0. Then ∂L/∂y ≤ 0 implies λ2x > 0 (since 1 − λ2x ≤ 0). With λ1 = 0, ∂L/∂x ≤ 0 gives α(1/x) − λ2y ≤ 0, which implies that y > 0. Next, λ2 ∂L/∂λ2 = 0 gives xy = k, so 1/x = y/k. Substituting into ∂L/∂x = 0 (α(1/x) − λ2y = 0) gives α(y/k) − λ2y = 0, or (α/k − λ2)y = 0, and since y > 0, λ2 = α/k. Turning to ∂L/∂y = 0 gives 1 − λ2x = 0, or 1 − (α/k)x = 0, so that x = k/α, and xy = k then implies y = α. Thus, the pair (x′, y′) = (k/α, α) satisfies the necessary conditions for an optimum. Since α = 0.6 and k = 1.12 in the discussion, here y′ = 0.6 and x′ = 1.12/0.6 = 1.867. The value of the objective at (x′, y′) is β′ = 0.6 ln(1.867) + 0.6 = 0.9745.

[Figure: the circle (x − 1)² + y² = 2 and the curve xy = k, with the solution (x*, y*) and the second point (x′, y′) where the necessary conditions also hold; the level curves (6/10)·ln x + y = β and (6/10)·ln x + y = β′ pass through these points.]

The solution to the constrained optimization problem is at (x*, y*), where the necessary conditions for a constrained maximum are satisfied. But the necessary conditions are also satisfied at (x′, y′), and this point is not a local maximum. As the figure illustrates, there are points in the neighborhood of (x′, y′) satisfying the constraint and yielding higher payoff. Along the constraint curve xy = k, y = k/x, and so the value of the objective along the constraint is f̄(x) = α ln x + k/x. A second-order Taylor series expansion gives:

f̄(x) = f̄(k/α) + f̄′(k/α)(x − k/α) + (1/2) f̄″(p)(x − k/α)²,

where p is between x and k/α. Since f̄′(x) = α(1/x) − k(1/x²), at x = k/α, f̄′(k/α) = α(α/k) − k(α/k)² = α²/k − α²/k = 0. Also, f̄″(x) = −α(1/x²) + 2k(1/x³) = [−α + 2k(1/x)](1/x²), so f̄″(p) = [−α + 2k(1/p)](1/p²). When x is close to k/α, p ≈ k/α and −α + 2k(1/p) ≈ −α + 2α = α > 0, so f̄″(p) ≈ α(α/k)² > 0 when x is close to k/α. Therefore,

f̄(x) ≈ f̄(k/α) + (1/2) α(α/k)² (x − k/α)².

So, f̄(x) > f̄(k/α) for x ≠ k/α in a neighborhood of x′ = k/α. Finally, observe that the constraint set is not convex (so Theorem 13.4 does not apply). ♦
13.2.7 The Lagrange multiplier
Consider a maximization program with objective f and a single constraint g. Since ∇f points in the direction of increase of f, at a candidate solution x* on the boundary of the constraint set defined by g, ∇f should point out from the feasible region. The gradient ∇g gives the direction of increase of the constraint function g; if ∇g also points out from the feasible set, then at a tangency, ∇f = λ∇g for some positive λ. For ∇g to point out from the feasible set, the feasible set must be defined as {x | g(x) ≤ 0}. Conversely, if the objective is to minimize f, then the direction of increase of f should point into the feasible region at the candidate solution x* on the boundary of the feasible region. If we are to write ∇f = λ∇g with λ > 0, then ∇g must point into the feasible region. For this to be so, we must define the feasible set as {x | g(x) ≥ 0}.
13.2.8 Equality and inequality constraints
Finally, consider the case where there are both equality and inequality constraints. Given the family of constraints

h_l(x) = 0, l = 1,..., m;  g_j(x) ≤ 0, j = 1,..., r,

consider the problem of minimizing f(x) subject to these constraints.

Theorem 13.7: Suppose that f, h_l, g_j are continuous with continuous partial derivatives. Let X be an open set in R^n and x* ∈ X, and suppose that f(x*) ≤ f(x) for all x ∈ X satisfying the constraints. Then there are numbers λ, µ1, µ2,..., µ_m, λ1, λ2,..., λ_r, not all 0, such that:

λ ∂f(x*)/∂x_i + Σ_{l=1}^m µ_l ∂h_l(x*)/∂x_i + Σ_{j=1}^r λ_j ∂g_j(x*)/∂x_i = 0, i = 1,..., n.  (13.27)

In addition,

1. λ ≥ 0 and λ_j ≥ 0
2. For each j such that g_j(x*) < 0, λ_j = 0
3. If the g_j and h_l satisfy a constraint qualification – the gradients of all h_l and the active g_j (those g_j with g_j(x*) = 0) are linearly independent vectors – then λ can be chosen equal to 1: λ = 1.
A result similar to Theorem 13.4 also holds with both equality and inequality constraints.

Theorem 13.8: Suppose that f, h_l, g_j are continuous with continuous partial derivatives. Suppose that at x*, Equations 13.27 are satisfied, the constraint qualification is satisfied, f is convex and the constraint set is convex. Then x* minimizes f subject to the constraints.
Exercises for Chapter 13
Exercise 13.1 Let

f(x, y) = y − αx − x^β,  α, β > 1
g(x, y) = y² − 1 − ln(x + 1).

Solve the problem max f(x, y) subject to g(x, y) ≤ 0 and x, y ≥ 0, using the Kuhn–Tucker conditions.

Exercise 13.2 Let

f(x, y) = xy
g1(x, y) = ax + by − 2a,  a = b > 0
g2(x, y) = y + cx^γ − 1,  c = 1/10, γ = 2.

Solve the problem max f(x, y) subject to g_i(x, y) ≤ 0 for i = 1, 2 and x, y ≥ 0, using the Kuhn–Tucker conditions.

Exercise 13.3 Let

f(x, y) = x²y
g1(x, y) = ax + by − 2a,  a = b > 0
g2(x, y) = y + cx^γ − 1,  c = 1/10, γ = 2.

Solve the problem max f(x, y) subject to g_i(x, y) ≤ 0 for i = 1, 2 and x, y ≥ 0, using the Kuhn–Tucker conditions.

Exercise 13.4 Let

f(x, y) = 14 − x² − y² − 2x + 2y
g(x, y) = x + y − 4.

Solve the problem max f(x, y) subject to g(x, y) ≤ 0 and x, y ≥ 0, using the Kuhn–Tucker conditions.

Exercise 13.5 Let

f(x, y) = xy
g1(x, y) = 1 − y + (x − 1)³
g2(x, y) = 2y − 2 + 4(x − 1)³.

Consider the program: max f(x, y) subject to g_i(x, y) ≤ 0, i = 1, 2. Confirm that the solution to this program occurs at (x, y) = (1, 1), but at this point there is no nonnegative solution ((λ1, λ2) pair) to the equation:

∇f = λ1∇g1 + λ2∇g2.

Explain why the Kuhn–Tucker conditions fail.
14 Integration
14.1 Introduction

The integral of a function f defines a new function, which may have the interpretation of an area or as the "reverse" of differentiation. For example, the computation of consumer surplus and deadweight loss involves the calculation of area. Given a demand function P(Q) and market price p, the consumer surplus is defined as the area under the demand curve above price and is used as a measure of the welfare generated by the existence of that market. The technique of integration provides a means of determining this area. In a second type of problem, it is useful to identify a function as the derivative of another function. For example, the rate of return on an asset – the change in the value of the asset per unit of time – may be viewed as the derivative of the value of the asset. Thus, knowing the derivative of the value of the asset with respect to time, say v′(t), it is necessary to determine the value of the asset, v(t), from this derivative – given v′(t), v(t) must be found. These problems are solved by integration.

In what follows, the integral of a function is formally defined. Following this, a variety of applications are developed: Section 14.4 considers the welfare measures of producer and consumer surplus. Section 14.5 considers the calculation of cost and profit for a firm. Taxation and deadweight loss are examined in Section 14.6. In the remaining sections (Sections 14.7, 14.8, 14.9 and 14.10), present-value calculations, Leibnitz's rule, measures of inequality and Ramsey pricing are discussed. Finally, Section 14.11 explores in detail various measures of welfare using these techniques.
14.2 The integral

Let f be a continuous function defined on an interval [a, b]. The integral of f over the interval [a, b] is denoted ∫_a^b f(x)dx. Considering this expression as the area under the function f: when a and b are very close, the value of the function f does not change much on the interval [a, b] (f(b) ≈ f(a)), so the area is roughly f(a)(b − a). To extend this idea to any interval (where b is not close to a), partition the interval [a, b] into subintervals [x1, x2], [x2, x3],..., [x_{k−1}, x_k], with a = x1 and b = x_k. A partition may be defined by the points determining the intervals: P = {x1,..., x_k}, with x_i < x_{i+1}. The norm or mesh of the partition is the length of the largest of these subintervals. Write |P| for the norm of the partition, defined |P| = max_i (x_{i+1} − x_i), giving a measure of the widest interval in the partition. For each interval [x_i, x_{i+1}], let s_i be a point in [x_i, x_{i+1}], x_i ≤ s_i ≤ x_{i+1}, i = 1,..., k − 1, write s = (s1,..., s_{k−1}), and consider the expression:

R(P, s) = Σ_{i=1}^{k−1} f(s_i)(x_{i+1} − x_i).  (14.1)

Definition 14.1: The function f is (Riemann) integrable on [a, b] if there is a number I such that for every ε > 0 there is a δ > 0 such that, if |P| < δ, then

|R(P, s) − I| < ε,

for any s = (s1,..., s_{k−1}), x_i ≤ s_i ≤ x_{i+1}, i = 1,..., k − 1. In this case, I is said to be the integral of f over [a, b], and the integral is denoted ∫_a^b f(x)dx = I.

Assuming the integral exists, the expression ∫_a^b f(x)dx denotes the area between the function f and the horizontal axis. Figure 14.1 depicts the function f(x) = sin(π·x) + ln(x + 1) + 1, where x is measured in radians.

In Figure 14.2, the Riemann approximation is illustrated. As the mesh size becomes smaller, the rectangles better approximate the area under the curve. So, the integral is defined as the limit of a sequence of approximations to the area under the curve between a and b. In the figure, the area represented by the rectangles is Σ_{j=1}^6 f(s_j)(x_{j+1} − x_j). Note that if a < c < d, then

∫_a^c f(x)dx + ∫_c^d f(x)dx = ∫_a^d f(x)dx.
A different perspective views the integral as the "reverse" of differentiation. With a fixed, varying b varies the integral, and we may write the integral as a function of b.

[Figure 14.1: The integral of f from a to b]

[Figure 14.2: The integral of f: definition]

Define

$$F(b) = \int_a^b f(x)\,dx.$$

Then $F(b+\epsilon) - F(b) = \int_b^{b+\epsilon} f(x)\,dx$. If $\epsilon$ is small, with f continuous, f is approximately constant on $[b, b+\epsilon]$ and equal to f(b). Then

$$F(b+\epsilon) - F(b) \cong f(b)[(b+\epsilon) - b] = f(b)\epsilon.$$

Thus $\frac{F(b+\epsilon) - F(b)}{\epsilon} \cong f(b)$. As $\epsilon \to 0$, $\frac{F(b+\epsilon) - F(b)}{\epsilon} \to F'(b)$ and the approximation becomes exact, so that $F'(b) = f(b)$. From this perspective, integration recovers a function F whose derivative is f.
14.3 Common integrals and rules of integration

It is common to write $\int f(x)\,dx$ to denote a function F(x) with the property that $F'(x) = f(x)$. The function F is not unique, since adding a constant leaves the derivative unchanged. However, the definite integral may be obtained directly from F: $F(b) - F(a) = \int_a^b f(x)\,dx$. Thus, it makes sense to have a record of some standard indefinite integrals. The following table summarizes some of the more frequently used integral formulae.

$\int f(x)\,dx$ | $F(x)$
$\int x^n\,dx$ | $= \frac{1}{n+1}x^{n+1}$, $n \ne -1$
$\int \frac{1}{x}\,dx$ | $= \ln x$, $x > 0$
$\int e^{ax}\,dx$ | $= \frac{1}{a}e^{ax}$
$\int \sin(ax)\,dx$ | $= -\frac{1}{a}\cos(ax)$
$\int \cos(ax)\,dx$ | $= \frac{1}{a}\sin(ax)$
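As a quick sanity check, each entry of the table can be verified numerically: the centered finite difference of the antiderivative F should match the integrand f. The constants below (the test point and the value of a) are arbitrary choices for illustration.

```python
import math

# Check each antiderivative F against its integrand f: the centered
# difference (F(x+h) - F(x-h)) / (2h) should approximate f(x).
a, h = 2.0, 1e-6
pairs = [
    (lambda x: x**3,          lambda x: x**4 / 4),          # x^n with n = 3
    (lambda x: 1 / x,         lambda x: math.log(x)),       # 1/x, x > 0
    (lambda x: math.exp(a*x), lambda x: math.exp(a*x) / a), # e^(ax)
    (lambda x: math.sin(a*x), lambda x: -math.cos(a*x)/a),  # sin(ax)
    (lambda x: math.cos(a*x), lambda x: math.sin(a*x)/a),   # cos(ax)
]
for f, F in pairs:
    x = 0.7
    deriv = (F(x + h) - F(x - h)) / (2 * h)
    assert abs(deriv - f(x)) < 1e-5
print("all antiderivatives verified")
```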
14.4 Measuring welfare: surplus

Two standard measures of welfare are consumer and producer surplus. Consumer surplus is defined as the area under the demand curve above price; producer surplus is revenue less variable cost. These are discussed next, and illustrate the use of integration in economics.

14.4.1 Consumer surplus
Suppose that there are six consumers of a good. Each consumer requires one unit of the good. Let w p i denote willingness of i to pay for a unit and suppose that, w p1 = 8, w p2 = 6, w p3 = 5, w p4 = 4, w p5 = 3 and w p6 = 1. With this information
we can plot aggregate demand as a function of price (Figure 14.3).
Each consumer pays the price $p^* = 3$. Consumer 1 is willing to pay 8, which is 5 more than consumer 1 must pay, and so has $5 of "surplus". Similarly, consumer 2 has $3 of surplus, and so on. The total surplus for the consumers is:

$$CS = (8 - 3) + (6 - 3) + (5 - 3) + (4 - 3) + (3 - 3) = 11.$$
If D(p) is the number of units demanded at price p (demand), and p(q) is the inverse demand, then

$$CS = \int_3^8 D(p)\,dp = \int_0^4 [p(q) - 3]\,dq.$$
[Figure 14.3: Consumer surplus]
14.4.2 Producer surplus

Firm supply is given by marginal cost. Let $c_i$ denote the extra cost of producing the ith unit, and set $c_1 = \frac{1}{2}$, $c_2 = \frac{2}{2}$, $c_3 = \frac{3}{2}$, $c_4 = \frac{4}{2}$, $c_5 = \frac{5}{2}$ and $c_6 = \frac{6}{2}$. The total producer surplus (Figure 14.4) is the sum of price less the cost for each additional unit:

$$PS = \left(3 - \frac{1}{2}\right) + \left(3 - \frac{2}{2}\right) + \left(3 - \frac{3}{2}\right) + \left(3 - \frac{4}{2}\right) + \left(3 - \frac{5}{2}\right) = 7\frac{1}{2}.$$
[Figure 14.4: Producer surplus]
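The two discrete surplus calculations above can be reproduced in a few lines. The willingness-to-pay and marginal cost values are those given in the text; the price is 3.

```python
# Discrete surplus computation for the six-consumer / six-unit example:
# willingness to pay wp_i, marginal costs c_i = i/2, price p = 3.
wp = [8, 6, 5, 4, 3, 1]            # willingness to pay per consumer
mc = [i / 2 for i in range(1, 7)]  # marginal cost of the i-th unit
p = 3

# Consumers buy while their willingness to pay covers the price.
cs = sum(w - p for w in wp if w >= p)
# The firm supplies units while price covers marginal cost.
ps = sum(p - c for c in mc if c <= p)

print(cs)  # 11, as in the text
print(ps)  # 7.5, i.e., 7 1/2
```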
Combining these gives Figure 14.5.
[Figure 14.5: Consumer and producer surplus]

With S(p) the supply function, we can write total surplus as

$$\int_{1/2}^{8} [D(p) - S(p)]\,dp.$$

Alternatively, if we write the demand and supply curves in inverse form, $p_d(q)$ and $p_s(q)$, the total surplus may be evaluated as:

$$\int_0^{q^*} [p_d(q) - p_s(q)]\,dq,$$

where $q^*$ is defined by the equation $p_d(q^*) = p_s(q^*)$. In this formulation, each individual buys only one unit of the good. Suppose instead that the individual's demand for the good depends on price. The following discussion shows how consumer surplus measures the net benefit to the consumer of having the good available.
14.4.3 Consumer surplus and welfare
Suppose that utility is determined by the level of income y and the amount of good x consumed. The individual has income y and, in the absence of consumption of the good, has utility $u^0 = y$, so that a dollar of income gives one unit of utility. The unit price of x is p, so that x units cost px and the remaining income is then $y - px$. The utility from x is increasing, but marginal utility from consumption of x is decreasing. To represent this, let the utility from gross income y and consumption x be given according to the formula

$$u(x, y) = x^{\frac{1}{2}} + y - px.$$

Maximizing utility by choice of x gives $\frac{1}{2}x^{-\frac{1}{2}} - p = 0$, or $x^{\frac{1}{2}} = \frac{1}{2p}$, giving demand for x, $x(p) = \frac{1}{4}\frac{1}{p^2}$, and the inverse demand is $p(x) = \frac{1}{2\sqrt{x}}$. In the absence of any purchase of good x, the individual has welfare $u^0 = y$. With x available at price p per unit, demand is $x(p) = \frac{1}{4}\frac{1}{p^2}$, and so total expenditure on x is $px(p) = p \cdot \frac{1}{4}\frac{1}{p^2} = \frac{1}{4}\frac{1}{p}$. Welfare with x available is

$$u^* = x(p)^{\frac{1}{2}} + y - px(p) = \frac{1}{2p} + y - \frac{1}{4}\frac{1}{p} = y + \frac{1}{4}\frac{1}{p}.$$

Comparing with $u^0$, the gain from the presence of market x is $u^* - u^0 = \frac{1}{4}\frac{1}{p}$: the individual would be willing to give up (or pay) income equal to $\frac{1}{4}\frac{1}{p}$ for access to the market for good x at price p.

Consider now the computation of consumer surplus associated with good x. From the demand curve x(p) (Figure 14.6), consumer surplus is given by:

$$CS = \int_p^\infty x(t)\,dt = \frac{1}{4}\int_p^\infty t^{-2}\,dt = \frac{1}{4}\left.\frac{1}{-1}t^{-1}\right|_p^\infty = \left.-\frac{1}{4}t^{-1}\right|_p^\infty = \frac{1}{4}\frac{1}{p}.$$
[Figure 14.6: Cost and profit]
Thus,

$$u^* - u^0 = CS.$$

In this example, there is an exact connection between welfare measured by variation in utility and welfare measured by consumer surplus. This arises because the utility function is quasilinear and, as a result, the level of income y does not appear in the demand function for x. If, for example, $u(x, y) = x^{\frac{1}{2}}y^{\frac{1}{2}}$, then the exact connection breaks down, but consumer surplus may still be considered a welfare measure. This is discussed further later.

In general, with income level y and m goods available to purchase at prices $p_1, \dots, p_m$, demand for good j will have the form $x_j(p_1, \dots, p_m, y)$. In that case, to compute consumer surplus, we take the prices of other goods as constant and let $\hat{x}_j(p_j) = x_j(p_1, \dots, p_{j-1}, p_j, p_{j+1}, \dots, p_m)$. Then, for example, the consumer surplus lost as a result of a price increase from $p'_j$ to $p''_j$ is given by $\int_{p'_j}^{p''_j} \hat{x}_j(p_j)\,dp_j$. Aggregation to many consumers is straightforward, using the aggregate demand function $X_j(p_j) = \sum_{i=1}^n \hat{x}^i_j(p_j)$, where $\hat{x}^i_j(p_j)$ is the demand for good j by individual i.
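The quasilinear example above can be checked numerically: with demand $x(p) = 1/(4p^2)$, the consumer surplus from price p upward should equal the utility gain $1/(4p)$. The price value and the simple midpoint integrator below are illustrative choices, with the improper integral truncated at a large upper limit.

```python
# Numerical check of the quasilinear example: CS = ∫_p^∞ x(t) dt = 1/(4p).
def x_demand(p):
    return 1.0 / (4.0 * p * p)

def integrate(f, lo, hi, n=100_000):
    # simple midpoint rule
    w = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * w) for i in range(n)) * w

p = 2.0
cs = integrate(x_demand, p, 1e4)   # truncate the upper limit at 10^4
gain = 1.0 / (4.0 * p)             # u* - u0 from the text
assert abs(cs - gain) < 1e-3
print(cs, gain)
```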
Consider the constant elasticity of substitution (CES) utility function $u(x_1, x_2) = (x_1^\rho + x_2^\rho)^{\frac{1}{\rho}}$, where $0 < \rho < 1$. Suppose the individual has income y and faces prices $p_i$, $p_j$. Then, with utility maximization, the demand for good i is:

$$x_i(p_i, p_j, y) = \frac{p_i^{r-1}y}{p_i^r + p_j^r}, \qquad r = \frac{\rho}{\rho - 1}.$$

For example, letting $\rho = \frac{1}{2}$, so that $r = -1$, this becomes

$$x_i(p_i, p_j, y) = \frac{p_i^{-2}y}{p_i^{-1} + p_j^{-1}} = \frac{y}{p_i + p_i^2 p_j^{-1}}.$$

Suppose $p_j = 1$ and $y = 100$; this then becomes $\hat{x}_i(p_i) = x_i(p_i, 1, 100) = \frac{100}{p_i + p_i^2}$. Then the consumer surplus loss due to a price increase from $p_i$ to $p'_i$ is:

$$\int_{p_i}^{p'_i} \frac{100}{\tilde{p}_i + \tilde{p}_i^2}\,d\tilde{p}_i = 100\left[\ln\left(\frac{p'_i}{1 + p'_i}\right) - \ln\left(\frac{p_i}{1 + p_i}\right)\right].$$

14.5 Cost, supply and profit

When a firm is a price taker, profit at price p is given by:

$$\pi(q) = pq - c(q),$$
where c(q) is the total cost (fixed and variable) of producing q units of output. Thus, $F = c(0)$ denotes the fixed cost, incurred whether or not output is produced. Define the variable cost function as $c_v(q) = c(q) - c(0)$; thus, $c'_v(q) = c'(q)$. Taking price as given, maximizing profit gives

$$\pi'(q) = p - c'(q) = 0,$$
or $p = c'(q) = mc(q)$. Thus, quantity is chosen to set marginal cost equal to price. Assuming $c'(q)$ is increasing, this equation defines the firm's supply curve (Figure 14.7). At price p, to maximize profit, q should be chosen to satisfy $p = c'(q) = mc(q)$, or $q = mc^{-1}(p) = s(p)$, where $s(p)$ is the firm's supply curve.

[Figure 14.7: The supply curve]

At price p, the firm will supply the quantity $mc^{-1}(p)$ to maximize profit. On the assumption that marginal cost is increasing, supply is upward sloping. Total cost is the sum of variable and fixed cost: $c(q) = c_v(q) + F$, and so $c'(q) = c'_v(q)$. The area under the marginal cost curve gives variable cost:

$$\int_0^{q^*} c'(q)\,dq = \int_0^{q^*} c'_v(q)\,dq,$$

which restates to:

$$c(q^*) - c(0) = c_v(q^*) - c_v(0) = c_v(q^*),$$
since $c_v(0) = 0$. Profit for the firm is $\pi(q^*) = p^*q^* - c(q^*) = p^*q^* - c_v(q^*) - F$. Rearranging:

$$\pi(q^*) + F = p^*q^* - c_v(q^*) = p^*q^* - \int_0^{q^*} c'_v(q)\,dq.$$

Thus,

$$[\pi(q^*) + F] + \int_0^{q^*} c'_v(q)\,dq = p^*q^*.$$

[Figure 14.8: Profit and cost]

So, producer surplus equals profit plus fixed cost (Figure 14.8). Producer surplus depends only on the marginal cost curve, and profit may be negative if F is large.
14.5.1 Average and marginal cost

These observations may also be seen in terms of average cost:

$$AC(q) = \frac{c(q)}{q} = \frac{c_v(q) + F}{q} = \frac{c_v(q)}{q} + \frac{F}{q}.$$

Then,

$$AC'(q) = \frac{qc'(q) - c(q)}{q^2} = \frac{1}{q}[c'(q) - AC(q)].$$
From this equation, one sees two things. First, average cost is minimized at the point where marginal cost equals average cost ($AC'(q) = 0$ gives $c'(q) = AC(q)$). Second, average cost is declining when $MC < AC$, and increasing when $MC > AC$. These observations are depicted in Figure 14.9. Note that, as depicted, $p^* = MC(q^*)$ at the supplied quantity, and average cost lies strictly above average variable cost, where the strict inequality follows provided $c(0) > 0$.
[Figure 14.9: Average cost curves and marginal cost]
Figure 14.9: Average cost curves and marginal cost Similarly, considering average variable cost AV C(q) =
AV C ′ (q) =
qc′v (q)− c v (q) , q2
c v (q) q ,
it has derivative
so that the minimum AV C occurs where average variable
cost coincides with marginal cost. In the case of increasing marginal cost, average variable cost is also increasing and equals marginal cost at 0. More generally, observe that, since c v (0) = 0: lim
q→0
AV C(q) c v (q) c v (q) − c v (0) = lim = lim = c′v (0) = MC(0), q→0 q q→0 q q
so that average variable cost and marginal cost always coincide at q = 0.
Figure 14.10 depicts the case where marginal cost declines initially.

[Figure 14.10: Cost curves]

Consider the total cost function $c(q) = F + 4q - \frac{7}{4}q^2 + \frac{1}{3}q^3$, where $F \ge 0$ is a constant, the fixed cost. The marginal cost function is

$$c'(q) = 4 - \frac{7}{2}q + q^2$$

and the average variable cost function is

$$AVC(q) = 4 - \frac{7}{4}q + \frac{1}{3}q^2.$$

At the minimum, $q^* = 2\frac{5}{8}$ and the value of either function is $c'\left(2\frac{5}{8}\right) = AVC\left(2\frac{5}{8}\right) = \frac{109}{64} = p^*$. At $p^*$, variable costs are (just) covered at output level $q^*$:

$$p^* = AVC(q^*) = \frac{c_v(q^*)}{q^*}.$$

Thus, in Figure 14.11,

$$B + D = p^*q^* = c_v(q^*) = \int_0^{q^*} c'(q)\,dq = A + D.$$
The logic for this is clear: marginal cost is high initially and, at price $p^*$, exceeds unit revenue. As output increases beyond $q'$, price exceeds marginal cost, and at $q^*$ revenue from sales fully covers variable cost. There are no sales (or production) below price $p^*$. Above that price, supply follows the marginal cost curve. The supply curve is given by the heavy line (Figure 14.12). No output level between 0 and $q^*$ is ever chosen: output is either 0 or at least as large as $q^*$.
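The shutdown point for the cubic cost example can be confirmed numerically: the minimum of AVC should occur at $q^* = 2\frac{5}{8}$, where AVC equals marginal cost with common value $\frac{109}{64}$. The grid search below is a simple illustrative check.

```python
# Shutdown point for c(q) = F + 4q - (7/4)q^2 + (1/3)q^3:
# minimum AVC is at q* = 2 5/8, where AVC = MC = 109/64.
def mc(q):
    return 4 - 3.5 * q + q * q

def avc(q):
    return 4 - 1.75 * q + q * q / 3

# Grid search for the minimizer of AVC on (0, 6].
qs = [i / 1000 for i in range(1, 6001)]
q_star = min(qs, key=avc)

assert abs(q_star - 2.625) < 1e-3          # q* = 2 5/8
assert abs(avc(q_star) - 109 / 64) < 1e-6  # p* = 109/64
assert abs(mc(q_star) - avc(q_star)) < 1e-3
print(q_star, avc(q_star))
```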
[Figure 14.11: Cost curves]
[Figure 14.12: Supply]
14.6 Taxation and consumer surplus

While taxation is necessary to raise government revenue, it imposes a burden on the consumer and changes consumer behavior. This change in behavior leads to inefficiency and welfare loss, measured in terms of the deadweight loss of consumer surplus. The discussion here explains how this arises. A standard demand function is depicted in Figure 14.13, along with the impact of taxation on price and quantity. Denote the demand function by P(Q), and let $P_t = P_0(1 + t)$, where t is the ad valorem tax rate.
[Figure 14.13: Taxation and consumer surplus. The left panel depicts willingness to pay; the right panel depicts the tax revenue, the deadweight loss, and the points A, B and C.]
The loss due to distortion in the market is the shaded triangle marked as deadweight loss. This is sometimes called the "excess burden" of the tax, since raising the revenue incurs this additional cost. The deadweight loss is calculated:

$$D(t) = \int_{Q_t}^{Q_0} P(Q)\,dQ - P_0(Q_0 - Q_t). \qquad (14.2)$$

In Figure 14.13, willingness to pay measures the overall benefit provided to consumers by this market and is measured by the region $\bar{P}OQ_0A$. At price $P_0$, consumers pay $P_0 \times Q_0$ (the area $P_0OQ_0A$). So, the surplus provided by the market is given by the region $\bar{P}P_0A$ (or $\bar{P}OQ_0A - P_0OQ_0A$). The imposition of the tax raises price to $P_t$, reducing surplus by $P_tP_0AB$ and generating revenue $(P_t - P_0) \times Q_t$. The surplus reduction exceeds the revenue raised by the amount $P_tP_0AB$ less $(P_t - P_0) \times Q_t$, which is the region ABC, the deadweight loss associated with the tax.
EXAMPLE 14.1: Let $P(Q) = \frac{1}{Q}$ and suppose that $P_0$ is given exogenously (horizontal supply). Consider the impact of an ad valorem tax. One may use the formula in Equation 14.2 to determine the deadweight loss, D. The area under the demand curve between $Q_0$ and $Q_t$ is given by:

$$\int_{Q_t}^{Q_0} P(Q)\,dQ = \int_{Q_t}^{Q_0} \frac{1}{Q}\,dQ = \left[\ln(Q)\right]_{Q_t}^{Q_0} = \ln(Q_0) - \ln(Q_t) = \ln\left[\frac{Q_0}{Q_t}\right].$$
Next, $Q_0$ and $Q_t$ are determined by the initial price and the tax:

$$P_0 = \frac{1}{Q_0} \text{ so that } Q_0 = \frac{1}{P_0}; \quad \text{and} \quad P_0(1+t) = \frac{1}{Q_t} \text{ so that } Q_t = \frac{1}{P_0(1+t)}.$$

These give $\frac{Q_0}{Q_t} = 1 + t$, so that $\int_{Q_t}^{Q_0} P(Q)\,dQ = \ln\left[\frac{Q_0}{Q_t}\right] = \ln(1+t)$. Next, observe that

$$Q_0 - Q_t = \frac{1}{P_0}\left(1 - \frac{1}{1+t}\right) = \frac{1}{P_0}\frac{t}{1+t}, \quad \text{so } P_0(Q_0 - Q_t) = \frac{t}{1+t}.$$

Therefore,

$$D(t) = \int_{Q_t}^{Q_0} P(Q)\,dQ - P_0(Q_0 - Q_t) \qquad (14.3)$$
$$= \ln(1+t) - \frac{t}{1+t}. \qquad (14.4)$$

With this formula, the impact of varying the tax on the deadweight loss can be calculated:

$$D'(t) = \frac{1}{1+t} - \frac{(1+t) - t}{(1+t)^2} = \left[\frac{1}{1+t}\right]\left[1 - \frac{1}{1+t}\right] = \left[\frac{1}{1+t}\right]\left[\frac{t}{1+t}\right] = \frac{t}{(1+t)^2} > 0,$$

$$D''(t) = \frac{(1+t)^2 - 2t(1+t)}{(1+t)^4} = \frac{(1+t) - 2t}{(1+t)^3} = \frac{1-t}{(1+t)^3}.$$

Therefore, the deadweight loss is increasing ($D'(t) > 0$), at an increasing rate ($D''(t) > 0$ for $t < 1$).
♦
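The closed form $D(t) = \ln(1+t) - t/(1+t)$ can be checked against a direct numerical evaluation of Equation 14.2. The value of $P_0$ and the midpoint integrator below are illustrative choices.

```python
import math

# Example 14.1: with P(Q) = 1/Q and exogenous P0, the deadweight loss
# of an ad valorem tax t is D(t) = ln(1 + t) - t/(1 + t).
def dwl_closed(t):
    return math.log(1 + t) - t / (1 + t)

def dwl_numeric(t, p0=2.0, n=100_000):
    q0, qt = 1 / p0, 1 / (p0 * (1 + t))
    w = (q0 - qt) / n
    area = sum(1 / (qt + (i + 0.5) * w) for i in range(n)) * w  # ∫ P(Q) dQ
    return area - p0 * (q0 - qt)

t = 0.25
assert abs(dwl_closed(t) - dwl_numeric(t)) < 1e-6
# D(t) is increasing in t
assert dwl_closed(0.3) > dwl_closed(0.2) > dwl_closed(0.1)
print(dwl_closed(t))
```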
E XAMPLE 14.2: In Example 14.1, the supply price was exogenous. This example considers the calculation of producer and consumer surplus where supply is upward sloping. Consider the following supply and demand functions:
$$P_d(Q) = kQ^{-\alpha},\ 0 < \alpha < 1, \qquad P_s(Q) = Q^{\beta}.$$
Together these determine the market-clearing quantity, $Q_0$, and both the consumer and producer surplus. The following discussion shows that the sum of producer and consumer surplus (overall surplus) is:

$$\left[\frac{\alpha + \beta}{(1-\alpha)(1+\beta)}\right] k^{\frac{\beta+1}{\alpha+\beta}}.$$

Equating supply and demand, $P_d(Q) = kQ^{-\alpha} = Q^{\beta} = P_s(Q)$ gives $k = Q^{\alpha+\beta}$, so the solution is $Q_0 = k^{\frac{1}{\alpha+\beta}}$. The associated price is $P_0 = k^{\frac{\beta}{\alpha+\beta}}$. Note, therefore, that $R_0 = P_0Q_0 = k^{\frac{\beta+1}{\alpha+\beta}}$. The consumer and producer surplus (CS and PS) are

$$CS = \int_0^{Q_0} P_d(Q)\,dQ - R_0 = \int_0^{Q_0} kQ^{-\alpha}\,dQ - R_0 = \frac{k}{1-\alpha}(Q_0)^{1-\alpha} - R_0,$$

$$PS = R_0 - \int_0^{Q_0} P_s(Q)\,dQ = R_0 - \int_0^{Q_0} Q^{\beta}\,dQ = R_0 - \frac{1}{1+\beta}(Q_0)^{\beta+1}.$$

Since $R_0$ cancels out when consumer and producer surplus are combined:

$$CS + PS = \int_0^{Q_0} [P_d(Q) - P_s(Q)]\,dQ.$$

Considering CS, from the solution for $Q_0$, $(Q_0)^{1-\alpha} = k^{\frac{1-\alpha}{\alpha+\beta}}$, so that $\frac{k}{1-\alpha}(Q_0)^{1-\alpha} = \frac{1}{1-\alpha}k^{1+\frac{1-\alpha}{\alpha+\beta}} = \frac{1}{1-\alpha}k^{\frac{\beta+1}{\alpha+\beta}}$, so that

$$CS = \frac{1}{1-\alpha}k^{\frac{\beta+1}{\alpha+\beta}} - k^{\frac{\beta+1}{\alpha+\beta}} = \frac{\alpha}{1-\alpha}k^{\frac{\beta+1}{\alpha+\beta}}.$$

Note that $1 + \frac{1-\alpha}{\alpha+\beta} = \frac{\alpha+\beta+1-\alpha}{\alpha+\beta} = \frac{\beta+1}{\alpha+\beta}$.

For the producer surplus, $(Q_0)^{\beta+1} = k^{\frac{\beta+1}{\alpha+\beta}}$, so that

$$PS = k^{\frac{\beta+1}{\alpha+\beta}} - \frac{1}{1+\beta}k^{\frac{\beta+1}{\alpha+\beta}} = \frac{\beta}{1+\beta}k^{\frac{\beta+1}{\alpha+\beta}}.$$
Therefore,

$$CS + PS = \left[\frac{\alpha}{1-\alpha} + \frac{\beta}{1+\beta}\right]k^{\frac{\beta+1}{\alpha+\beta}} = \left[\frac{\alpha+\beta}{(1-\alpha)(1+\beta)}\right]k^{\frac{\beta+1}{\alpha+\beta}},$$

using

$$\frac{\alpha}{1-\alpha} + \frac{\beta}{1+\beta} = \frac{\alpha(1+\beta)}{(1-\alpha)(1+\beta)} + \frac{\beta(1-\alpha)}{(1+\beta)(1-\alpha)} = \frac{\alpha(1+\beta) + \beta(1-\alpha)}{(1+\beta)(1-\alpha)} = \frac{\alpha+\beta}{(1+\beta)(1-\alpha)}.$$

♦
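The closed-form total surplus in Example 14.2 can be checked against direct numerical integration of $P_d(Q) - P_s(Q)$ from 0 to $Q_0$. The parameter values below are arbitrary illustrative choices; the tolerance is loose because the integrand has a (mild, integrable) singularity at $Q = 0$.

```python
# Example 14.2: with P_d(Q) = k Q^(-alpha) and P_s(Q) = Q^beta, total
# surplus is [(alpha+beta)/((1-alpha)(1+beta))] * k^((beta+1)/(alpha+beta)).
k, alpha, beta = 2.0, 0.5, 1.0

q0 = k ** (1 / (alpha + beta))   # market-clearing quantity

def gap(q):                      # P_d(Q) - P_s(Q)
    return k * q ** (-alpha) - q ** beta

n = 400_000
w = q0 / n
numeric = sum(gap((i + 0.5) * w) for i in range(n)) * w  # midpoint rule

closed = (alpha + beta) / ((1 - alpha) * (1 + beta)) * k ** ((beta + 1) / (alpha + beta))
assert abs(numeric - closed) < 1e-2
print(numeric, closed)
```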
EXAMPLE 14.3: Suppose that a value added tax t is imposed on the market given by:

$$P_d(Q) = kQ^{-\alpha},\ 0 < \alpha < 1, \qquad P_s(Q) = Q^{\beta}.$$

The following calculations determine the deadweight loss associated with the tax. The after-tax equilibrium quantity is determined by: $P_d(Q) = kQ^{-\alpha} = (1+t)Q^{\beta} = (1+t)P_s(Q)$. This can be rearranged to give $\frac{k}{1+t}Q^{-\alpha} = Q^{\beta}$, and letting $k' = \frac{k}{1+t}$ gives $k'Q^{-\alpha} = Q^{\beta}$. Repeating earlier calculations gives the after-tax equilibrium quantity $Q_t = (k')^{\frac{1}{\alpha+\beta}}$, while the before-tax equilibrium is at $Q^* = k^{\frac{1}{\alpha+\beta}}$. The after-tax equilibrium price is $P_t = (k')^{\frac{\beta}{\alpha+\beta}}$. The deadweight loss is:

$$DWL = \int_{Q_t}^{Q^*} [P_d(Q) - P_s(Q)]\,dQ = \int_0^{Q^*} [P_d(Q) - P_s(Q)]\,dQ - \int_0^{Q_t} [P_d(Q) - P_s(Q)]\,dQ$$
$$= \left[\frac{\alpha+\beta}{(1-\alpha)(1+\beta)}\right]k^{\frac{\beta+1}{\alpha+\beta}} - \left[\frac{\alpha+\beta}{(1-\alpha)(1+\beta)}\right](k')^{\frac{\beta+1}{\alpha+\beta}}$$
$$= \left[\frac{\alpha+\beta}{(1-\alpha)(1+\beta)}\right]\left(k^{\frac{\beta+1}{\alpha+\beta}} - (k')^{\frac{\beta+1}{\alpha+\beta}}\right) = \left[\frac{\alpha+\beta}{(1-\alpha)(1+\beta)}\right]k^{\frac{\beta+1}{\alpha+\beta}}\left(1 - \left[\frac{1}{1+t}\right]^{\frac{\beta+1}{\alpha+\beta}}\right).$$

♦
14.7 Present value

A firm operating a plant has revenue flow R(t) at time t. The cost of operating the plant in present-value terms to time T is c(T) and the discount rate is r.
From these functions, the present value (profit) from operating to period T is

$$\pi(T) = \int_0^T R(t)e^{-rt}\,dt - c(T).$$

As time goes on (T increases), cost c(T) increases and the increment to revenue decreases if $R(t)e^{-rt}$ is declining. For how long should the firm operate: what value of T maximizes $\pi(T)$? A special case of this problem occurs when $R(t) = kt$ and $c(T) = ce^{rT}$, with $k, c > 0$. In this case

$$\pi(T) = k\int_0^T te^{-rt}\,dt - ce^{rT}.$$

To calculate $\int_0^T te^{-rt}\,dt$, use the integration by parts rule, $\int_0^T u\,dv = [uv]_0^T - \int_0^T v\,du$. In the present case, let $u = t$, $v = e^{-rt}$, so $dv = (-r)e^{-rt}\,dt$. Rewrite $\int_0^T te^{-rt}\,dt$ as $\left(-\frac{1}{r}\right)\int_0^T t(-r)e^{-rt}\,dt$ and consider $\int_0^T t(-r)e^{-rt}\,dt$. Integrating by parts,

$$\int_0^T t(-r)e^{-rt}\,dt = \left[te^{-rt}\right]_0^T - \int_0^T e^{-rt}\,dt = Te^{-rT} - \int_0^T e^{-rt}\,dt.$$

The term $\int_0^T e^{-rt}\,dt$ can be integrated:

$$\int_0^T e^{-rt}\,dt = \left(-\frac{1}{r}\right)\int_0^T (-r)e^{-rt}\,dt = \left(-\frac{1}{r}\right)\left[e^{-rt}\right]_0^T = \left(-\frac{1}{r}\right)[e^{-rT} - 1].$$

Thus,

$$\int_0^T te^{-rt}\,dt = -\frac{1}{r}\left[Te^{-rT} + \frac{1}{r}e^{-rT} - \frac{1}{r}\right] = \frac{1}{r^2} - \frac{1}{r^2}e^{-rT} - \frac{1}{r}Te^{-rT},$$

so

$$\pi(T) = k\left[\frac{1}{r^2} - \frac{1}{r^2}e^{-rT} - \frac{1}{r}Te^{-rT}\right] - ce^{rT}.$$

The first-order condition is:

$$\pi'(T) = k\left[\frac{1}{r}e^{-rT} - \frac{1}{r}e^{-rT} + Te^{-rT}\right] - cre^{rT} = kTe^{-rT} - cre^{rT} = 0.$$

The second-order sufficient condition for a maximum is:

$$\pi''(T) = k\left[e^{-rT} - rTe^{-rT}\right] - cr^2e^{rT} < 0.$$

One feature of this example is that the variable T appears in the range of the integral. For the particular functional form, completing the integration and then computing the derivative was simple. Sometimes it is more convenient to differentiate directly, and Leibnitz's rule provides a means to do so.
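Before turning to that rule, the special case above can be checked numerically: a grid search over the profit function should locate a maximizer at which the first-order condition $kTe^{-rT} = cre^{rT}$ holds. The parameter values are arbitrary illustrative choices.

```python
import math

# Present-value example: R(t) = k t, c(T) = c e^(rT), so
# pi(T) = k[1/r^2 - e^(-rT)/r^2 - T e^(-rT)/r] - c e^(rT),
# with first-order condition k T e^(-rT) - c r e^(rT) = 0.
k, c, r = 10.0, 1.0, 0.5

def pi(T):
    return k * (1/r**2 - math.exp(-r*T)/r**2 - T*math.exp(-r*T)/r) - c*math.exp(r*T)

def foc(T):
    return k * T * math.exp(-r*T) - c * r * math.exp(r*T)

# Grid-search the maximizer, then confirm the FOC is (approximately) zero there.
grid = [i / 1000 for i in range(1, 10001)]   # T in (0, 10]
T_star = max(grid, key=pi)
assert abs(foc(T_star)) < 1e-2
# pi is locally concave at the optimum
assert pi(T_star) > pi(T_star - 0.5) and pi(T_star) > pi(T_star + 0.5)
print(T_star, pi(T_star))
```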
14.8 Leibnitz's rule

Let $f(x, k)$ be a continuous function of x and k, and let

$$g(k) = \int_{a(k)}^{b(k)} f(x, k)\,dx, \qquad c \le k \le d. \qquad (14.5)$$

Leibnitz's rule provides a formula for differentiating g.

Theorem 14.1: Suppose that $f(x, k)$ and $\frac{\partial f(x,k)}{\partial k}$ are continuous functions of $(x, k)$ on a region $[a, b] \times [c, d]$, and suppose that $a(k)$ and $b(k)$ are continuous with continuous derivatives on $[c, d]$ (continuously differentiable on $[c, d]$), both with range in $[a, b]$. Then

$$\frac{dg}{dk} = \int_{a(k)}^{b(k)} \frac{\partial f}{\partial k}\,dx + f(b(k), k)\frac{db(k)}{dk} - f(a(k), k)\frac{da(k)}{dk}.$$

Applied to the present-value calculation:

$$\pi(T) = \int_0^T R(t)e^{-rt}\,dt - c(T) \Rightarrow \frac{d\pi}{dT} = \frac{d}{dT}\left[\int_0^T R(t)e^{-rt}\,dt\right] - c'(T).$$

Here $f(t, T) = R(t)e^{-rt}$. Applying Leibnitz's rule:

$$\frac{d\pi}{dT} = \left[\int_0^T \frac{\partial f}{\partial T}\,dt + f(T, T)\frac{dT}{dT} - f(0, T)\frac{d0}{dT}\right] - c'(T).$$

Since $\frac{\partial f}{\partial T} = 0$, $\frac{d\pi}{dT} = f(T, T) - c'(T) = R(T)e^{-rT} - c'(T)$. Setting this to zero gives $R(T)e^{-rT} - c'(T) = 0$ (in the example, $kTe^{-rT} - cre^{rT} = 0$).
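Leibnitz's rule can be verified numerically on a small hypothetical example (the functions $f(x, k) = e^{-kx}$, $a(k) = k$, $b(k) = k^2$ below are arbitrary choices): the formula should agree with a finite-difference derivative of $g(k)$.

```python
import math

# Numerical check of Leibnitz's rule on a hypothetical example:
# f(x, k) = exp(-k x), a(k) = k, b(k) = k^2, so
# dg/dk = ∫ (-x) e^(-kx) dx + f(b(k), k) * 2k - f(a(k), k) * 1.
def f(x, k):
    return math.exp(-k * x)

def integrate(h, lo, hi, n=20_000):
    w = (hi - lo) / n
    return sum(h(lo + (i + 0.5) * w) for i in range(n)) * w  # midpoint rule

def g(k):
    return integrate(lambda x: f(x, k), k, k * k)

k = 1.5
# Leibnitz's rule:
leibnitz = (integrate(lambda x: -x * f(x, k), k, k * k)  # ∫ ∂f/∂k dx
            + f(k * k, k) * 2 * k                        # f(b(k), k) b'(k)
            - f(k, k) * 1.0)                             # f(a(k), k) a'(k)
# Finite-difference derivative of g:
eps = 1e-5
fd = (g(k + eps) - g(k - eps)) / (2 * eps)
assert abs(leibnitz - fd) < 1e-4
print(leibnitz, fd)
```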
EXAMPLE 14.4: A student has discovered the function that relates exam score to time spent studying. The key components of the function are as follows. Time spent studying improves exam performance at the rate $a^t$. However, with more study, fatigue reduces the ability to comprehend interesting math quickly, so effort over time should be discounted at some rate: $e^{-r(t+1)}$. (Note $e^{-r(t+1)}$ and not $e^{-rt}$, because it takes some time to "get going".) Time spent studying is deducted from sleeping time. One hour of sleep is worth c dollars, so studying for T hours costs cT. Putting all this together, the "profit from study" function is given by
$$\pi(T) = \int_0^T e^{-r(t+1)}a^t\,dt - cT,$$

where T is the number of hours spent studying. Assume that $0 < c < e^{-r}$ and $0 < a < e^r$. A natural question is to find the optimal choice of T. Here, using Leibnitz's rule will help. The first-order condition is:

$$\pi'(T) = e^{-r(T+1)}a^T - c = 0.$$

Now, $a^T = e^{T\ln a}$, so the first-order condition may be written $e^{-r(T+1)}e^{T\ln a} = c$, or $e^{-r}[e^{T(\ln a - r)}] = c$, or $[e^{T(\ln a - r)}] = e^rc$, or $T(\ln a - r) = r + \ln c$. Thus,

$$T^* = \frac{r + \ln c}{\ln a - r}.$$

Now, $0 < c < e^{-r}$, so $\ln c < -r$, or $r + \ln c < 0$; and $0 < a < e^r$ gives $\ln a < r$, or $\ln a - r < 0$. Therefore, $T^* > 0$. Finally, to check the second-order condition,

$$\pi'(T) = e^{-r(T+1)}a^T - c = e^{-r}e^{T(\ln a - r)} - c, \qquad \pi''(T) = (\ln a - r)e^{-r}[e^{T(\ln a - r)}] = (\ln a - r)e^{-r(T+1)}a^T < 0.$$
A more complex version of this has a depend on T: $a(T)$. In this case, the integral becomes:

$$\pi(T) = \int_0^T e^{-r(t+1)}a(T)^t\,dt - cT.$$

In this case, applying Leibnitz's rule:

$$\pi'(T) = \int_0^T \frac{\partial}{\partial T}\{e^{-r(t+1)}a(T)^t\}\,dt + e^{-r(T+1)}a(T)^T - c$$
$$= \int_0^T e^{-r(t+1)}\frac{\partial}{\partial T}\{a(T)^t\}\,dt + e^{-r(T+1)}a(T)^T - c$$
$$= \int_0^T e^{-r(t+1)}t\left\{\frac{1}{a(T)}a'(T)a(T)^t\right\}\,dt + e^{-r(T+1)}a(T)^T - c$$
$$= \frac{1}{a(T)}a'(T)\int_0^T te^{-r(t+1)}a(T)^t\,dt + e^{-r(T+1)}a(T)^T - c.$$

Setting this to 0 gives the first-order condition for T. ♦
14.9 Inequality measures

The Lorenz curve is often used as an indicator of the distribution of income in a population. A Lorenz curve L identifies the share of national income belonging to different percentiles of the population, ordered by relative income. In particular, it is often used as an indicator of income inequality. The Lorenz curve is a function attaching a value L(p) to each percentile p. The following discussion defines these curves.

Let F be a distribution function of income in a population: F(y) = proportion of the population with income $\le y$. Assume F is differentiable and let $F'(y) = f(y)$, where f is called the density of the distribution function. Assume $f(y) > 0$ for all y. For $\hat{y} > y$, $\hat{y}$ close to y,

$$F(\hat{y}) - F(y) = \int_y^{\hat{y}} f(s)\,ds \cong f(y)(\hat{y} - y)$$

gives the proportion of the population with income between y and $\hat{y}$.
gives the proportion of the population with income between y and yˆ . Given y, if p = F(y) ≤ 1, then 100p% is the percentage of the population with income ≤ y. Alternatively, given the fraction p, if y is defined by p = F(y) then y is the
level of income such that 100p% have income ≤ y. To each p we can associate a y, y(p) such that 100p% of population have income in the range [0, y(p)]. Thus,
F(y(p)) = p. The Lorenz curve, L(p), gives for each fraction, p, the total income of
the poorest 100p% divided by the total income of the population (Figure 14.14).
The following discussion defines the Lorenz curve, first for the finite population case, and then for a continuous population. Consider first the case where there is a finite number of people in the population. Suppose that the incomes earned in the population are $\{y_1, y_2, \dots, y_k\}$ with $y_i < y_{i+1}$, and $n_i$ people have income $y_i$. The total number of people in the population is $n = \sum_{i=1}^k n_i$, the total number with income no greater than $y_q$ is $\sum_{j=1}^q n_j$, and the fraction of the population with income no larger than $y_q$ is $\frac{\sum_{j=1}^q n_j}{\sum_{j=1}^k n_j}$. The total income of people with income no larger than $y_q$ is $\sum_{i=1}^q y_in_i$. The total income in the population is $\sum_{i=1}^k y_in_i$. So, the fraction of income held by the population members with income no larger than $y_q$ is:

$$L = \frac{\sum_{i=1}^q y_in_i}{\sum_{i=1}^k y_in_i} = \frac{\frac{1}{n}\sum_{i=1}^q y_in_i}{\frac{1}{n}\sum_{i=1}^k y_in_i} = \frac{\sum_{i=1}^q y_i\left(\frac{n_i}{n}\right)}{\sum_{i=1}^k y_i\left(\frac{n_i}{n}\right)} = \frac{\sum_{i=1}^q y_if(y_i)}{\sum_{i=1}^k y_if(y_i)} = \frac{\sum_{y=y_1}^{y_q} yf(y)}{\sum_{y=y_1}^{y_k} yf(y)},$$

where $f(y_i) = \frac{n_i}{n}$, the fraction of the population with income $y_i$.
By analogy with this expression, in the continuous case,

$$L(p) = \frac{\int_0^{y(p)} xf(x)\,dx}{\int_0^{\infty} xf(x)\,dx} = \frac{\left[\int_0^{y(p)} x\frac{f(x)}{F(y(p))}\,dx\right]F(y(p))}{\int_0^{\infty} xf(x)\,dx} = p\,\frac{\mu_p}{\mu},$$

where $\mu_p$ is the mean income of the poorest fraction p of the population. Thus L(p) gives the percentage of total income held by the poorest 100p% of the population. Here $\mu$ is the mean income of the population. To clarify:

$$\int_0^{y(p)} xf(x)\,dx = F(y(p))\int_0^{y(p)} x\frac{f(x)}{F(y(p))}\,dx.$$

Note that $\int_0^{y(p)} \frac{f(x)}{F(y(p))}\,dx = 1$ (since $\int_0^{y(p)} f(x)\,dx = F(y(p))$), and thus $\frac{f(x)}{F(y(p))}$ is a distribution: the distribution of income in the group with income in the range $[0, y(p)]$. If $\hat{x} > x$ and $\hat{x}$ is close to x, $\frac{f(x)(\hat{x} - x)}{F(y(p))} \approx$ the "proportion" of the income group $[0, y(p)]$ whose income is in the range $[x, \hat{x}]$. Therefore, $\int_0^{y(p)} x\frac{f(x)}{F(y(p))}\,dx = \mu_p$, the mean income of the poorest 100p% (i.e., mean income to the pth percentile). Also $F(y(p)) = p$, by definition, so the Lorenz curve may be written $L(p) = p\,\frac{\mu_p}{\mu}$. Since $\mu_p \le \mu$, $L(p) \le p$.

The following calculations show that the Lorenz curve is convex, as depicted in Figure 14.14. We have

$$L(p) = \frac{1}{\mu}\int_0^{y(p)} xf(x)\,dx,$$
[Figure 14.14: The Lorenz curve]
so (using Leibnitz's rule),

$$L'(p) = \frac{1}{\mu}\left[y(p)f(y(p))\right]\frac{dy(p)}{dp}.$$

Now $p = F(y(p))$, so

$$1 = F'(y(p))\frac{dy(p)}{dp} = f(y(p))\frac{dy(p)}{dp} \Rightarrow \frac{dy(p)}{dp} = \frac{1}{f(y(p))}.$$

Thus,

$$L'(p) = \frac{y(p)}{\mu}, \qquad L''(p) = \frac{1}{\mu}\cdot\frac{dy(p)}{dp} = \frac{1}{\mu}\cdot\frac{1}{f(y(p))} > 0.$$
A second measure of inequality is given by the Gini coefficient. This is a single number, G, defined:

$$G = \frac{\frac{1}{2} - \int_0^1 L(p)\,dp}{\frac{1}{2}} = 1 - 2\int_0^1 L(p)\,dp = \frac{A}{A+B}. \qquad (14.6)$$

In Figure 14.14, this is the area A divided by the area A + B. The area A is $\frac{1}{2} - \int_0^1 L(p)\,dp$ (the area under the 45° line and above L(p)), while the area A + B is $\frac{1}{2}$. While the Gini coefficient contains less information than the Lorenz curve, it has the virtue of being a single number, so that Gini coefficients for different countries are comparable.
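For a discrete population, the finite-population formula gives Lorenz curve values at the population percentiles, and $G = 1 - 2\int_0^1 L(p)\,dp$ can be approximated by the trapezoidal rule. The income list below is a hypothetical illustration; this is a sketch of the idea, not an exact Gini estimator for grouped data.

```python
# Lorenz curve and (approximate) Gini coefficient for a small discrete
# population, following the finite-population formula in the text.
incomes = [10, 10, 20, 30, 30, 50, 80, 120]   # hypothetical incomes, sorted
n = len(incomes)
total = sum(incomes)

# L(p) at the population percentiles p = q/n, q = 0, ..., n:
# share of total income held by the poorest q people.
L = [sum(incomes[:q]) / total for q in range(n + 1)]
assert all(L[q] <= q / n + 1e-12 for q in range(n + 1))   # L(p) <= p

# Gini via G = 1 - 2 * ∫_0^1 L(p) dp, trapezoidal rule over the percentiles.
area = sum((L[q] + L[q + 1]) / 2 * (1 / n) for q in range(n))
gini = 1 - 2 * area
print(round(gini, 4))
```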
14.10 Ramsey pricing

A standard rule of thumb for efficient pricing is that price be set equal to marginal cost: output should be increased up to the point where the cost of providing an additional unit is equal to the amount a consumer is willing to pay. Where there are fixed costs of production, this is not viable in the long term, since marginal cost pricing does not cover fixed cost. In the case of a single good, price must be at least as large as average cost for long-run profit to be non-negative. With more than one good, where there are fixed costs and joint production, the problem is more complicated. The following discussion considers efficient pricing (to maximize welfare) where more than one market is supplied, and where a fixed cost must be recovered by pricing above marginal cost.

A firm supplies two markets, X and Y. The variable cost of joint production is c(x, y), fixed cost is F, and inverse demand in the markets is $p_x$ and $p_y$, respectively. At output levels $(x^*, y^*)$, surplus is given by:

$$S(x^*, y^*) = \int_0^{x^*} p_x(x)\,dx + \int_0^{y^*} p_y(y)\,dy - c(x^*, y^*).$$
The problem is to maximize surplus subject to revenue from sales covering cost. For example, $c(x, y) = c_xx + c_yy + F$ is a cost function with constant marginal costs $c_x$ and $c_y$ and a fixed cost F. Thus, prices must be set above marginal cost to cover fixed cost. Define the Lagrangian:

$$L = \int_0^{x^*} p_x(x)\,dx + \int_0^{y^*} p_y(y)\,dy - c(x^*, y^*) + \lambda[p_x(x^*)x^* + p_y(y^*)y^* - c(x^*, y^*) - F].$$
At a solution $(x^*, y^*)$,

$$\frac{\partial L}{\partial x^*} = p_x(x^*) - c_x(x^*, y^*) + \lambda[p_x(x^*) + p'_x(x^*)x^* - c_x(x^*, y^*)] = 0$$
$$\frac{\partial L}{\partial y^*} = p_y(y^*) - c_y(x^*, y^*) + \lambda[p_y(y^*) + p'_y(y^*)y^* - c_y(x^*, y^*)] = 0$$
$$\frac{\partial L}{\partial \lambda} = p_x(x^*)x^* + p_y(y^*)y^* - c(x^*, y^*) - F = 0.$$
Rearranging these equations,

$$[p_x(x^*) - c_x(x^*, y^*)](1 + \lambda) + \lambda p'_x(x^*)x^* = 0$$
$$[p_y(y^*) - c_y(x^*, y^*)](1 + \lambda) + \lambda p'_y(y^*)y^* = 0$$

or

$$[p_x(x^*) - c_x(x^*, y^*)] = -\frac{\lambda}{1+\lambda}p'_x(x^*)x^* = -p'_x(x^*)x^*k$$
$$[p_y(y^*) - c_y(x^*, y^*)] = -\frac{\lambda}{1+\lambda}p'_y(y^*)y^* = -p'_y(y^*)y^*k,$$

where $k = \frac{\lambda}{1+\lambda}$. Dividing by $p_x(x^*)$ and $p_y(y^*)$, respectively, and noting that $-\frac{p'_x(x^*)x^*}{p_x(x^*)} = \frac{1}{\eta_x(x^*)}$,

$$\frac{[p_x(x^*) - c_x(x^*, y^*)]}{p_x(x^*)} = k\frac{1}{\eta_x(x^*)}, \qquad \frac{[p_y(y^*) - c_y(x^*, y^*)]}{p_y(y^*)} = k\frac{1}{\eta_y(y^*)}.$$
Thus, the markup of price in excess of marginal cost in each market should be proportional to the reciprocal of the elasticity in that market (with the same constant of proportionality, k, for all markets).

REMARK 14.1: In this discussion, differentiation involved differentiating with respect to a variable appearing in the range of an integral. This can be done using Leibnitz's rule, but also by direct computation. The derivative of $g(x^*) = \int_0^{x^*} p_x(x)\,dx$ with respect to $x^*$ is $p_x(x^*)$, and this can be established with a simple variational argument:

$$\frac{1}{\Delta}[g(x^* + \Delta) - g(x^*)] = \frac{1}{\Delta}\left[\int_0^{x^*+\Delta} p_x(x)\,dx - \int_0^{x^*} p_x(x)\,dx\right] = \frac{1}{\Delta}\int_{x^*}^{x^*+\Delta} p_x(x)\,dx \approx \frac{1}{\Delta}\int_{x^*}^{x^*+\Delta} p_x(x^*)\,dx,$$

the approximation following with $p_x(x)$ approximately constant on $[x^*, x^* + \Delta]$. Since $\int_{x^*}^{x^*+\Delta} p_x(x^*)\,dx = p_x(x^*)\Delta$,

$$\frac{1}{\Delta}[g(x^* + \Delta) - g(x^*)] \approx \frac{1}{\Delta}p_x(x^*)\Delta = p_x(x^*).$$
EXAMPLE 14.5: Consider $p_x(x) = a - bx$, $p_y(y) = \alpha - \beta y$, and suppose that cost is $c(x, y) = c_xx + c_yy + F$. Since the reciprocal of the elasticity is $\frac{1}{\eta_x} = -\frac{x}{p_x(x)}p'_x(x)$, the formula

$$\frac{[p_x(x^*) - c_x(x^*, y^*)]}{p_x(x^*)} = k\frac{1}{\eta_x(x^*)}$$

becomes

$$\frac{p_x(x^*) - c_x}{p_x(x^*)} = k \times \left[-\frac{x^*}{p_x(x^*)}p'_x(x^*)\right].$$

Cancelling $p_x(x^*)$ and with $p'_x(x) = -b$,

$$a - bx^* - c_x = k \times bx^*.$$

Thus, $(a - c_x) = (1 + k)bx^*$, so that $x^* = \frac{1}{1+k}\frac{a - c_x}{b} = \delta\frac{a - c_x}{b}$, with $\delta = \frac{1}{1+k}$. Let $x_{mc}$ be the quantity at which price is equal to marginal cost: $x_{mc} = \frac{a - c_x}{b}$ and, with similar notation, $y_{mc} = \frac{\alpha - c_y}{\beta}$. Thus, $x^* = \delta x_{mc}$ and $y^* = \delta y_{mc}$, so that relative to marginal cost pricing, both goods are scaled down by the same proportion. Thus, when $x_{mc} = y_{mc}$, $x^* = y^*$. Revenue over marginal cost in market x is:

$$[p_x(x^*) - c_x]x^* = [(a - c_x) - bx^*]x^* = [(a - c_x) - \delta(a - c_x)]x^* = [(1 - \delta)(a - c_x)]x^*,$$

and substituting for $x^*$ gives revenue over marginal cost as $(1 - \delta)\delta\frac{(a - c_x)^2}{b}$. The analogous calculation in market y gives revenue over cost as $(1 - \delta)\delta\frac{(\alpha - c_y)^2}{\beta}$. Therefore, setting total revenue over marginal cost equal to fixed cost gives:

$$(1 - \delta)\delta\left[\frac{(a - c_x)^2}{b} + \frac{(\alpha - c_y)^2}{\beta}\right] = F.$$

This expression (assuming a solution) will have two roots; select the larger root, corresponding to the larger output of x and y. ♦
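The quadratic in $\delta$ can be solved directly. With the larger root in hand, the resulting prices should cover cost exactly. The parameter values below are hypothetical illustrative choices.

```python
import math

# Linear-demand Ramsey example with hypothetical parameters:
# p_x(x) = a - b x, p_y(y) = alpha - beta y, c(x, y) = cx*x + cy*y + F.
a, b, cx = 10.0, 1.0, 2.0
alpha, beta, cy = 8.0, 0.5, 1.0
F = 10.0

S = (a - cx) ** 2 / b + (alpha - cy) ** 2 / beta
# (1 - delta) * delta * S = F  =>  S*delta^2 - S*delta + F = 0
disc = S * S - 4 * S * F
assert disc > 0, "F too large to recover"
delta = (S + math.sqrt(disc)) / (2 * S)   # larger root
x_star = delta * (a - cx) / b
y_star = delta * (alpha - cy) / beta

# Check the constraint: revenue covers cost exactly.
px = a - b * x_star
py = alpha - beta * y_star
budget = px * x_star + py * y_star - (cx * x_star + cy * y_star + F)
assert abs(budget) < 1e-9
print(delta, x_star, y_star)
```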
14.11 Welfare measures

When the price of a good changes, the welfare of a consumer is affected. How should the welfare change be measured? There are a number of answers to this question; here three are considered: consumer surplus, compensating variation and equivalent variation.

14.11.1 Consumer surplus

Maximizing utility $u(x_1, \dots, x_n)$ subject to the budget constraint $\sum p_ix_i = m$ yields demand functions $x_i(p_1, \dots, p_n, m)$, $i = 1, \dots, n$. The program is written:

$$\max u(x_1, \dots, x_n) \quad \text{subject to} \quad \sum_k p_kx_k = m. \qquad (14.7)$$
For ease of notation, write the demand function as x i ( p i , p− i , m) to highlight
“own” price, p i . The optimal choice occurs at the tangency of the budget constraint
and the indifference curve as depicted in Figure 14.15 for the case of two goods, (x i , x j ). A natural question arises: how does a price change affect welfare? For example, if p i increases to p′i , how is the individual’s welfare affected? Considering Figure 14.15, utility has fallen from u0 to u1 so that the loss in welfare might be measured by u0 − u1 . However, utility is an ordinal measure of welfare (u(x1 ,... , x n ) and 5u(x1 ,... , x n ) have exactly the same indifference curves so the distance u0 − u1 is arbitrary). A natural monetary measure of welfare loss uses the consumer surplus associated with demand for good i, and this measure is commonly used. However, a change in the price of one good affects demand for other goods through crossprice effects, so that consumer surplus variations resulting from a change in consumption of good i may not fully describe the impact on welfare. When the price of good x i increases to p′i , demand for good x i decreases from x i (p i , p j , m) to x i (p′i , p j , m) (assuming that the good is not a “Giffen” good). This gives a downward sloping demand curve – as p i increases, x i decreases.
Figure 14.15: Utility maximization and consumer surplus

In Figure 14.15, a price increase from $p_i$ to $p_i'$ moves the budget constraint inward, leading to a tangency at $x_i(p_i', p_j, m)$ on the new budget constraint $\{(x_i, x_j) \mid p_i' x_i + p_j x_j = m\}$. Demand for good $x_i$ falls from $x_i = x_i(p_i, p_j, m)$ to $x_i' = x_i(p_i', p_j, m)$.
CHAPTER 14. INTEGRATION
The welfare cost measured in terms of consumer surplus is given by the area under the demand curve between $p_i$ and $p_i'$:

$$CS = \int_{p_i}^{p_i'} x_i(\tilde p_i, p_j, m)\, d\tilde p_i. \qquad (14.8)$$
The demand function $x_i = x_i(p_i, p_j, m)$ is sometimes referred to as the Marshallian demand function, to distinguish it from the Hicksian (or “compensated”) demand function that is discussed later.
EXAMPLE 14.6: Suppose that $u(x_i, x_j) = a \ln x_i + b \ln x_j$. Optimality involves setting the marginal rate of substitution equal to the price ratio. Thus,

$$\frac{a x_j}{b x_i} = \frac{p_i}{p_j},$$

so that $x_i = \frac{a}{b}\frac{p_j}{p_i} x_j$. Substituting this into the budget constraint $p_i x_i + p_j x_j = m$ gives $\left[\frac{a}{b} p_j + p_j\right] x_j = m$, or $\frac{a+b}{b} p_j x_j = m$. A similar calculation applies for $x_i$, so that

$$x_i(p_i, p_j, m) = \frac{a}{a+b}\,\frac{m}{p_i}, \qquad x_j(p_i, p_j, m) = \frac{b}{a+b}\,\frac{m}{p_j}.$$

So, for example, considering a price rise from $p_i$ to $p_i'$, the reduction in consumer surplus is given by:

$$CS = \int_{p_i}^{p_i'} x_i(\tilde p_i, p_j, m)\, d\tilde p_i = \int_{p_i}^{p_i'} \frac{a}{a+b}\,\frac{m}{\tilde p_i}\, d\tilde p_i = \frac{a}{a+b}\, m\left[\ln(p_i') - \ln(p_i)\right].$$

♦
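The consumer-surplus formula in Example 14.6 can be cross-checked numerically. The sketch below (not from the text; the values of $a$, $b$, $m$ and the prices are illustrative) integrates the Marshallian demand by the trapezoid rule and compares the result with the logarithmic closed form:

```python
import math

def cs_closed_form(a, b, m, p, p_new):
    # CS = (a/(a+b)) * m * [ln(p') - ln(p)]  (Example 14.6)
    return a / (a + b) * m * (math.log(p_new) - math.log(p))

def cs_numeric(a, b, m, p, p_new, steps=100_000):
    # Trapezoidal integration of x_i(p~) = (a/(a+b)) * m / p~ over [p, p']
    h = (p_new - p) / steps
    demand = lambda q: a / (a + b) * m / q
    total = 0.5 * (demand(p) + demand(p_new))
    for k in range(1, steps):
        total += demand(p + k * h)
    return total * h

# Illustrative (hypothetical) parameter values
a, b, m, p, p_new = 1.0, 1.0, 100.0, 2.0, 3.0
print(cs_closed_form(a, b, m, p, p_new))  # ≈ 20.27
print(cs_numeric(a, b, m, p, p_new))      # ≈ 20.27
```

With $a = b$ half of income is spent on good $i$, so the surplus loss is $50\ln(3/2) \approx 20.27$; the two computations agree to several decimal places.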
14.11.2 Compensating variation

Two important alternative measures of welfare variation are compensating and equivalent variation. In this section, compensating variation is discussed in detail. The measure is illustrated in Figure 14.16. Suppose there are two goods, $i$ and $j$, and the initial price–income profile is $(p_i, p_j, m)$. Consumption is at $A$. Suppose the price of good $i$ increases from $p_i$ to $p_i'$. Then, consumption moves
Figure 14.16: Compensating variation
from $A$ to $B$, and the person has a lower level of welfare. Suppose now that the individual is compensated for the price increase with an increase in income – just enough to restore the person's initial level of utility: increase income from $m$ to $m'$. The amount $m' - m$ is exactly the monetary compensation required to offset the welfare-reducing impact of the price increase. After this increase in income $m' - m$, the person ends up consuming at $C$. This is called the compensating variation:

$$CV = \text{compensating variation} = m' - m.$$

The following discussion develops an expression for the compensating variation using the expenditure function and the Hicksian demand curve. Given income $m$ and prices $(p_i, p_{-i})$, utility maximization leads to demand $x^* = x(p_i, p_{-i}, m)$ and utility $u^0 = u(x(p_i, p_{-i}, m))$. Now, consider the problem where $u^0$ is fixed and the task is minimizing the cost of achieving the level of utility $u^0$:

$$\min\ p_1 x_1 + p_2 x_2 \quad \text{subject to} \quad u(x_1, x_2) = u^0. \qquad (14.9)$$

This problem involves moving the budget constraint in as far as possible subject to maintaining tangency with the indifference curve associated with utility level $u^0$. The solution is depicted in Figure 14.17.
Figure 14.17: The optimal cost-minimizing choice

Notice that the solution to this program depends on the prices and the level of utility $u^0$: the demand function for good $i$ is $x_i^h(p_i, p_j, u^0)$. The superscript “h” denotes the fact that this is the Hicksian (or compensated) demand function, with demand dependent on prices and the level of utility, $u^0$. Example 14.7 computes these demand functions in a simple case.
EXAMPLE 14.7: With $u(x_i, x_j) = a \ln x_i + b \ln x_j$, minimizing cost subject to attaining some utility level $u$ requires that the marginal rate of substitution equals the price ratio. Thus, $\frac{a x_j}{b x_i} = \frac{p_i}{p_j}$, or $x_i = \frac{a}{b}\frac{p_j}{p_i} x_j$. The choice of $(x_i, x_j)$ must satisfy $u = a \ln x_i + b \ln x_j = \ln\left(x_i^a x_j^b\right)$, or $e^u = x_i^a x_j^b$. Substituting the value for $x_i$ in terms of $x_j$:

$$e^u = x_i^a x_j^b = \left(\frac{a}{b}\frac{p_j}{p_i}\right)^a x_j^a\, x_j^b = \left(\frac{a}{b}\frac{p_j}{p_i}\right)^a x_j^{a+b}.$$

So, $x_j^{a+b} = \left(\frac{b}{a}\frac{p_i}{p_j}\right)^a e^u$ and $x_i^{a+b} = \left(\frac{a}{b}\frac{p_j}{p_i}\right)^b e^u$. Thus,

$$x_i^h(p_i, p_j, u) = \left[\left(\frac{a}{b}\frac{p_j}{p_i}\right)^b e^u\right]^{\frac{1}{a+b}}, \qquad x_j^h(p_i, p_j, u) = \left[\left(\frac{b}{a}\frac{p_i}{p_j}\right)^a e^u\right]^{\frac{1}{a+b}}.$$

♦
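As a numerical sanity check on these formulas (a sketch, not part of the text; the values of $a$, $b$, the prices and $u$ are illustrative), one can confirm that the bundle $(x_i^h, x_j^h)$ attains utility $u$ exactly and costs no more than other bundles on the same indifference curve:

```python
import math

def hicksian(a, b, pi, pj, u):
    # Hicksian demands x_i^h, x_j^h from Example 14.7
    xi = ((a / b * pj / pi) ** b * math.exp(u)) ** (1 / (a + b))
    xj = ((b / a * pi / pj) ** a * math.exp(u)) ** (1 / (a + b))
    return xi, xj

# Illustrative (hypothetical) parameter values
a, b, pi, pj, u = 2.0, 1.0, 1.5, 2.5, 3.0
xi, xj = hicksian(a, b, pi, pj, u)

# The bundle attains utility u exactly...
assert abs(a * math.log(xi) + b * math.log(xj) - u) < 1e-9

# ...and costs no more than other bundles on the same indifference curve
# (x_j adjusted so that a*ln(x_i) + b*ln(x_j) = u holds)
cost = pi * xi + pj * xj
for xi_alt in [0.5 * xi, 0.9 * xi, 1.1 * xi, 2.0 * xi]:
    xj_alt = math.exp((u - a * math.log(xi_alt)) / b)
    assert pi * xi_alt + pj * xj_alt >= cost - 1e-9
print("Hicksian demands verified")
```

The cost comparison reflects the convexity of the indifference curve: any deviation along it weakly raises expenditure.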
This contrasts with the Marshallian demand curve, $x_i(p_i, p_j, m)$, where demand depends on prices and the level of income and is found by maximizing utility subject to a budget constraint. From Figure 14.17 one can see that minimizing expenditure at given prices defines a budget constraint that is tangent to the indifference curve determined by $u^0$. Conversely, if this budget level is set, utility maximization will give a solution at the same tangency. Thus,

$$x_i(p_i, p_{-i}, m) = x_i^h(p_i, p_j, u^0), \quad \text{for all } i. \qquad (14.10)$$

In words, starting with income $m$ and maximizing utility gives demand $x(p_i, p_{-i}, m)$ and corresponding utility level $u^0$. If this utility level, $u^0$, is then fixed and expenditure minimized subject to attaining this utility level, the resulting expenditure is $m$ and the corresponding (Hicksian) demand is $x_i^h(p_i, p_j, u^0)$. This connection is illustrated in Example 14.9.

Minimizing cost subject to achieving a given utility level leads to the expenditure function. If prices are $(p_i, p_{-i})$ and a utility level $u^0$ is given, write $e(p_i, p_{-i}, u^0)$ for the (minimum) expenditure required to reach that utility level at those prices. If the price of good $i$ rises to, say, $p_i'$, the relevant expenditure becomes $e(p_i', p_{-i}, u^0)$. Thus, to compensate for the price rise in a way that would allow the individual to maintain the same utility level, the individual would require the additional amount $e(p_i', p_{-i}, u^0) - e(p_i, p_{-i}, u^0)$. This is called the compensating variation.

The following discussion shows how the Hicksian demand function can be used to compute the compensating variation. The Lagrangian associated with the cost-minimizing program is:

$$\mathcal{L}(x_i, x_j, \lambda) = p_i x_i + p_j x_j + \lambda\left(u^0 - u(x_i, x_j)\right),$$

with first-order conditions:

$$\mathcal{L}_i = p_i - \lambda u_i(x_i, x_j) = 0, \quad i = 1, 2, \qquad \mathcal{L}_\lambda = u^0 - u(x_i, x_j) = 0,$$

where $u_i = \frac{\partial u}{\partial x_i}$. They determine a solution: $x^h(p_i, p_j, u^0) = (x_i^h(p_i, p_j, u^0), x_j^h(p_i, p_j, u^0))$. Define the expenditure function:

$$e(p_i, p_j, u^0) = p_i\, x_i^h(p_i, p_j, u^0) + p_j\, x_j^h(p_i, p_j, u^0).$$
Note that

$$\mathcal{L}(x_i^h(p_i, p_j, u^0),\, x_j^h(p_i, p_j, u^0),\, \lambda(p_i, p_j, u^0)) = e(p_i, p_j, u^0)$$

for all $(p_i, p_j, u^0)$, since at any $(p_i, p_j, u^0)$, $x^h(p_i, p_j, u^0)$ satisfies $u^0 - u(x^h(p_i, p_j, u^0)) = 0$. Therefore, we may differentiate with respect to prices or utility. In particular:

$$\frac{\partial e}{\partial p_i} = \frac{\partial \mathcal{L}}{\partial p_i} = x_i^h(p_i, p_j, u^0) + \sum_{k=1}^{2} p_k \frac{\partial x_k^h}{\partial p_i} - \lambda \sum_{k=1}^{2}\frac{\partial u}{\partial x_k}\frac{\partial x_k^h}{\partial p_i} = x_i^h(p_i, p_j, u^0) + \sum_{k=1}^{2}\left[p_k - \lambda\frac{\partial u}{\partial x_k}\right]\frac{\partial x_k^h}{\partial p_i}.$$

From the first-order conditions, the second term is zero. Therefore,

$$\frac{\partial e(p_i, p_j, u^0)}{\partial p_i} = x_i^h(p_i, p_j, u^0). \qquad (14.11)$$
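Equation 14.11 (Shephard's lemma) can be illustrated numerically with the log utility of Example 14.7. In this sketch (parameter values are illustrative, not from the text), a central finite difference of the expenditure function is compared with the Hicksian demand:

```python
import math

def hicksian_i(a, b, pi, pj, u):
    return ((a / b * pj / pi) ** b * math.exp(u)) ** (1 / (a + b))

def hicksian_j(a, b, pi, pj, u):
    return ((b / a * pi / pj) ** a * math.exp(u)) ** (1 / (a + b))

def expenditure(a, b, pi, pj, u):
    # e(p_i, p_j, u) = p_i x_i^h + p_j x_j^h
    return pi * hicksian_i(a, b, pi, pj, u) + pj * hicksian_j(a, b, pi, pj, u)

# Illustrative (hypothetical) parameter values
a, b, pi, pj, u = 2.0, 3.0, 1.2, 0.8, 1.0
h = 1e-6
de_dpi = (expenditure(a, b, pi + h, pj, u) - expenditure(a, b, pi - h, pj, u)) / (2 * h)
print(de_dpi, hicksian_i(a, b, pi, pj, u))  # the two values agree
```

The agreement holds to the accuracy of the finite difference, as the derivation above predicts.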
Thus, the derivative of the expenditure function with respect to the price of a good gives the Hicksian demand function for that good.

Suppose that $p_i$ rises to $p_i'$ with $p_j$ remaining constant. If the individual's income were set at $e(p_i, p_j, u^0)$, then optimal consumption would occur at $\bar x$ (see Figure 14.18), yielding the lower level of utility $u'$. How much compensation does the consumer need to be as well off as before (i.e., to maintain utility level $u^0$)? From the definition of the cost function, $e(p_i', p_j, u^0)$ is the amount of money required to achieve the level of utility $u^0$ at prices $(p_i', p_j)$, whereas $e(p_i, p_j, u^0)$ is the amount required at prices $(p_i, p_j)$. Therefore, after the price increase, the individual needs $e(p_i', p_j, u^0) - e(p_i, p_j, u^0)$ to be as well off as before. In Figure 14.18, the impact of the price rise is given by the inner dotted line; the outer dotted line results from the compensating income. Let $(x_i^h, x_j^h) = x^h(p_i, p_j, u^0)$ be the bundle chosen at prices $(p_i, p_j)$ to achieve utility $u^0$, and let $(x_i'^h, x_j'^h) = x^h(p_i', p_j, u^0)$ be the bundle chosen at prices $(p_i', p_j)$ to achieve utility $u^0$. Thus

$$e(p_i, p_j, u^0) = p_i x_i^h + p_j x_j^h, \qquad e(p_i', p_j, u^0) = p_i' x_i'^h + p_j x_j'^h. \qquad (14.12)$$
Figure 14.18: The impact of price changes on quantities

The compensation required to make the individual as well off as before the price increase is then:

$$CV = e(p_i', p_j, u^0) - e(p_i, p_j, u^0). \qquad (14.13)$$
This amount can be given an alternative representation. Recall that $F(b) - F(a) = \int_a^b F'(x)\,dx$. In the present context this can be applied to give:

$$CV = e(p_i', p_j, u^0) - e(p_i, p_j, u^0) = \int_{p_i}^{p_i'}\frac{\partial e}{\partial p_i}\,d\tilde p_i = \int_{p_i}^{p_i'} x_i^h(\tilde p_i, p_j, u^0)\,d\tilde p_i.$$

Figure 14.19 illustrates the calculation.
EXAMPLE 14.8: Let $u(x_1, x_2) = x_1^\alpha x_2^{1-\alpha}$. The Lagrangian for the cost-minimization problem is

$$\mathcal{L}(x_1, x_2, \lambda) = p_1 x_1 + p_2 x_2 + \lambda\left(u^0 - x_1^\alpha x_2^{1-\alpha}\right),$$

with first-order conditions:

$$\mathcal{L}_{x_1} = p_1 - \frac{\lambda\alpha x_1^\alpha x_2^{1-\alpha}}{x_1} = 0, \qquad \mathcal{L}_{x_2} = p_2 - \frac{\lambda(1-\alpha)x_1^\alpha x_2^{1-\alpha}}{x_2} = 0, \qquad \mathcal{L}_\lambda = u^0 - x_1^\alpha x_2^{1-\alpha} = 0. \qquad (14.14)$$
Figure 14.19: Compensating variation
Thus,

$$\frac{p_1}{p_2} = \frac{\alpha}{1-\alpha}\frac{x_2}{x_1} \quad\Rightarrow\quad x_1 = \frac{\alpha}{1-\alpha}\frac{p_2}{p_1}\, x_2.$$

The third equation implies

$$u^0 = x_1^\alpha x_2^{1-\alpha} = \left[\frac{\alpha p_2}{(1-\alpha)p_1}\right]^\alpha x_2 \quad\Rightarrow\quad x_2 = \left[\frac{(1-\alpha)p_1}{\alpha p_2}\right]^\alpha u^0.$$

Solving for $x_1$:

$$x_1 = \frac{\alpha p_2}{(1-\alpha)p_1}\, x_2 = \frac{\alpha p_2}{(1-\alpha)p_1}\left[\frac{(1-\alpha)p_1}{\alpha p_2}\right]^\alpha u^0 = u^0\left[\frac{(1-\alpha)p_1}{\alpha p_2}\right]^{-(1-\alpha)} = u^0\left[\frac{(1-\alpha)}{\alpha p_2}\right]^{-(1-\alpha)} p_1^{-(1-\alpha)}.$$

This is the Hicksian demand for good 1, so

$$CV = \int_{p_1}^{p_1'} x_1(\tilde p_1, p_2, u^0)\, d\tilde p_1 = u^0\left[\frac{(1-\alpha)}{\alpha p_2}\right]^{-(1-\alpha)}\int_{p_1}^{p_1'}\tilde p_1^{-(1-\alpha)}\, d\tilde p_1 = u^0\left[\frac{(1-\alpha)}{\alpha p_2}\right]^{-(1-\alpha)}\frac{1}{\alpha}\left[(p_1')^\alpha - p_1^\alpha\right].$$

♦
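The CV formula of Example 14.8 can be cross-checked against direct numerical integration of the Hicksian demand, a sketch with illustrative (hypothetical) values for $\alpha$, the prices and $u^0$:

```python
def x1_h(alpha, p1, p2, u0):
    # Hicksian demand for good 1 from Example 14.8
    return u0 * (alpha * p2 / ((1 - alpha) * p1)) ** (1 - alpha)

def cv_closed_form(alpha, p1, p1_new, p2, u0):
    # CV = u0 * (alpha p2 / (1-alpha))^(1-alpha) * (p1'^alpha - p1^alpha) / alpha
    return u0 * (alpha * p2 / (1 - alpha)) ** (1 - alpha) * (p1_new ** alpha - p1 ** alpha) / alpha

def cv_numeric(alpha, p1, p1_new, p2, u0, steps=100_000):
    # Trapezoidal integration of x1_h over [p1, p1']
    h = (p1_new - p1) / steps
    total = 0.5 * (x1_h(alpha, p1, p2, u0) + x1_h(alpha, p1_new, p2, u0))
    for k in range(1, steps):
        total += x1_h(alpha, p1 + k * h, p2, u0)
    return total * h

alpha, p1, p1_new, p2, u0 = 0.3, 1.0, 2.0, 1.5, 10.0
print(cv_closed_form(alpha, p1, p1_new, p2, u0))
print(cv_numeric(alpha, p1, p1_new, p2, u0))
```

Both computations return the same positive compensation for the price rise, confirming the antiderivative step.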
14.11.3 Equivalent variation

Where compensating variation asks what monetary compensation is required to “neutralize” a price increase from lowering utility, equivalent variation asks what income reduction would have the same impact on utility as the price increase. As with compensating variation, equivalent variation has a simple expression in terms of the expenditure function. This is explained next.

Suppose that the price of $i$ increases from $p_i$ to $p_i'$ (see Figure 14.20).

Figure 14.20: Equivalent variation

The impact of the price increase is to move consumption from $A$ to $B$. Welfare drops from $u^0$ to $u^1$. Suppose instead that price remains unchanged, but income is reduced to achieve the same loss of welfare. At initial prices, income is reduced from $m$ to $m''$. In this case, the individual ends up consuming at $C'$. The loss of income $m - m''$ has an effect on welfare equivalent to the price increase from $p_i$ to $p_i'$. This is called the equivalent variation:

$$EV = \text{equivalent variation} = m - m''.$$

The following discussion provides detailed calculations for $EV$. Suppose that income is $m$ and prices are $(p_1, \dots, p_n)$. Utility maximization leads to demand $x(p_i, p_{-i}, m)$ and corresponding utility $u^0$. If now, starting from utility $u^0$, the consumption bundle is chosen to minimize expenditure, this is achieved at consumption bundle $x(p_i, p_{-i}, m)$ at a cost of $m$. In terms of the expenditure function, $e(p_i, p_{-i}, u^0) = m$. See Figure 14.21. Now suppose that the price of good
Figure 14.21: Equivalent variation
$i$ increases to $p_i'$. With income unchanged, utility maximization leads to demand $x(p_i', p_{-i}, m)$, at $B$ in Figure 14.21. So,

$$p_i'\, x_i(p_i', p_{-i}, m) + \sum_{k\neq i} p_k\, x_k(p_i', p_{-i}, m) = m. \qquad (14.15)$$
Conversely, with $u^1$ given, at price profile $(p_i', p_{-i})$ expenditure minimization subject to attaining utility level $u^1$ leads to demand at $B$. That is, solving the program:

$$\min\ \Big\{ p_i' x_i + \sum_{j\neq i} p_j x_j \Big\} \quad \text{subject to} \quad u(x) \ge u^1 \qquad (14.16)$$

gives the compensated or Hicksian demand $x^h(p_i', p_{-i}, u^1)$, at $B$ in Figure 14.21, with corresponding expenditure $e(p_i', p_{-i}, u^1)$. So,

$$x_i^h(p_i', p_{-i}, u^1) = x_i(p_i', p_{-i}, m), \quad \text{for all } i. \qquad (14.17)$$
Thus,

$$e(p_i', p_{-i}, u^1) = p_i'\, x_i^h(p_i', p_{-i}, u^1) + \sum_{k\neq i} p_k\, x_k^h(p_i', p_{-i}, u^1) = p_i'\, x_i(p_i', p_{-i}, m) + \sum_{k\neq i} p_k\, x_k(p_i', p_{-i}, m) = m. \qquad (14.18)$$
Next, at the original prices $(p_i, p_{-i})$, expenditure minimization subject to achieving utility level $u^1$ leads to demand at $C$. Solving the program:

$$\min\ \Big\{ p_i x_i + \sum_{j\neq i} p_j x_j \Big\} \quad \text{subject to} \quad u(x) \ge u^1 \qquad (14.19)$$

gives the compensated or Hicksian demand $x^h(p_i, p_{-i}, u^1)$, at $C$ in Figure 14.21, with corresponding expenditure $e(p_i, p_{-i}, u^1)$. The drop in welfare, from $u^0$ to $u^1$, resulting from the price increase from $p_i$ to $p_i'$ generates the same utility loss as would a fall in income from $m$ to $e(p_i, p_{-i}, u^1)$, because both $B$ and $C$ are on the same indifference curve. Since $m = e(p_i', p_{-i}, u^1)$, the income equivalent of the price rise, called the equivalent variation, is given by:

$$EV = e(p_i', p_{-i}, u^1) - e(p_i, p_{-i}, u^1) = \int_{p_i}^{p_i'}\frac{\partial e(\tilde p_i, p_{-i}, u^1)}{\partial p_i}\, d\tilde p_i \qquad (14.20)$$
$$= \int_{p_i}^{p_i'} x_i^h(\tilde p_i, p_{-i}, u^1)\, d\tilde p_i. \qquad (14.21)$$
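For the log utility of Examples 14.6–14.7, the two characterizations of $EV$, the income drop $m - e(p_i, p_{-i}, u^1)$ and the integral (14.21) of the Hicksian demand at $u^1$, can be checked against each other numerically. A sketch with illustrative (hypothetical) parameter values:

```python
import math

# Illustrative values: price of good i rises from 1.0 to 1.5
a, b, m, pi, pi_new, pj = 1.0, 2.0, 90.0, 1.0, 1.5, 2.0

def hicksian_i(p, u):
    # x_i^h from Example 14.7
    return ((a / b * pj / p) ** b * math.exp(u)) ** (1 / (a + b))

def expenditure(p, u):
    xj = ((b / a * p / pj) ** a * math.exp(u)) ** (1 / (a + b))
    return p * hicksian_i(p, u) + pj * xj

# Utility after the price rise, income unchanged (Marshallian demands, Example 14.6)
u1 = a * math.log(a / (a + b) * m / pi_new) + b * math.log(b / (a + b) * m / pj)

# EV two ways: as the income drop m - e(p_i, p_-i, u^1), and via the integral (14.21)
ev_income = m - expenditure(pi, u1)
steps = 100_000
h = (pi_new - pi) / steps
total = 0.5 * (hicksian_i(pi, u1) + hicksian_i(pi_new, u1))
for k in range(1, steps):
    total += hicksian_i(pi + k * h, u1)
ev_integral = total * h
print(ev_income, ev_integral)  # the two values agree
```

The first computation uses $m = e(p_i', p_{-i}, u^1)$ from (14.18); the second is the area under the compensated demand curve at $u^1$.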
14.11.4 Comparison of welfare measures
These welfare measures ($CV$ and $EV$) may be compared using the Hicksian and Marshallian demand curves $x^h(p_i, p_{-i}, u)$ and $x_i(p_i, p_{-i}, m)$. Consider the demand for good $i$, $x_i(p_i, p_{-i}, m)$, evaluated at the price–income profile $(p_i, p_{-i}, m)$. At these prices and income, the level of utility is $u^0$. Conversely, at prices $(p_i, p_{-i})$, the minimum expenditure required to achieve utility $u^0$ is $e(p_i, p_{-i}, u^0) = m$, and this occurs at the consumption bundle $x_i^h(p_i, p_{-i}, u^0)$. As $p_i$ varies to, say, $p_i'$, the value of $e(p_i', p_{-i}, u^0) = m'$ gives the amount of money required to maintain utility $u^0$. At the profile $(p_i', p_{-i}, m')$ the same observations apply: maximizing utility gives Marshallian demand $x_i(p_i', p_{-i}, m')$ and utility $u^0$, while minimizing expenditure subject to achieving utility $u^0$ gives compensated demand $x_i^h(p_i', p_{-i}, u^0)$ and expenditure $m'$. These observations are depicted in Figure 14.22.
Consider the function $x_i(p_i, p_{-i}, e(p_i, p_{-i}, u^0))$ and the impact of a price change on its value. As price $p_i$ changes to $p_i'$, the initial impact is to push consumption toward the point $A$. As income is adjusted through $e(p_i, p_{-i}, u^0)$ to allow the individual to maintain the same utility level, demand moves to $x(p_i', p_j, m') = x^h(p_i', p_{-i}, u^0)$ (at price $p_i'$, $m' = e(p_i', p_j, u^0)$).
Figure 14.22: Demand variation
Notice that

$$x_i^h(p_i, p_{-i}, u^0) = x_i(p_i, p_j, e(p_i, p_j, u^0)) \qquad (14.22)$$

for each value of $p_i$. This expression is an identity in $p_i$ and connects the Hicksian and Marshallian demand curves.
EXAMPLE 14.9: This example illustrates the connection between Marshallian and Hicksian demand functions that appears in Equations 14.10 and 14.22. Recall the Marshallian demands in Example 14.6:

$$x_i(p_i, p_j, m) = \frac{a}{a+b}\frac{m}{p_i} \quad \text{and} \quad x_j(p_i, p_j, m) = \frac{b}{a+b}\frac{m}{p_j}.$$

The corresponding utility attained at this level of income is $u^0$, say:

$$u^0 = a\ln[x_i(p_i, p_j, m)] + b\ln[x_j(p_i, p_j, m)] = \ln\left\{\left[\frac{a}{a+b}\frac{m}{p_i}\right]^a\left[\frac{b}{a+b}\frac{m}{p_j}\right]^b\right\},$$

so that $e^{u^0} = \left[\frac{a}{a+b}\frac{m}{p_i}\right]^a\left[\frac{b}{a+b}\frac{m}{p_j}\right]^b$. Consider:

$$x_i^h(p_i, p_j, u^0) = \left[\left(\frac{a}{b}\frac{p_j}{p_i}\right)^b e^{u^0}\right]^{\frac{1}{a+b}} = \left\{\left(\frac{a}{b}\frac{p_j}{p_i}\right)^b\left[\frac{a}{a+b}\frac{m}{p_i}\right]^a\left[\frac{b}{a+b}\frac{m}{p_j}\right]^b\right\}^{\frac{1}{a+b}}.$$

Inside the braces the factors $p_j^b$ cancel, and $\left(\frac{a}{b}\right)^b\left(\frac{b}{a+b}\right)^b = \left(\frac{a}{a+b}\right)^b$, leaving

$$x_i^h(p_i, p_j, u^0) = \left[\left(\frac{a}{a+b}\right)^{a+b}\left(\frac{m}{p_i}\right)^{a+b}\right]^{\frac{1}{a+b}} = \frac{a}{a+b}\frac{m}{p_i} = x_i(p_i, p_j, m).$$

This gives the equality in Equation 14.10. Conversely, starting with a fixed utility level $u^0$, the corresponding Hicksian demand functions imply the expenditure:

$$m = p_i\, x_i^h(p_i, p_j, u^0) + p_j\, x_j^h(p_i, p_j, u^0) = p_i\left[\left(\frac{a}{b}\frac{p_j}{p_i}\right)^b e^{u^0}\right]^{\frac{1}{a+b}} + p_j\left[\left(\frac{b}{a}\frac{p_i}{p_j}\right)^a e^{u^0}\right]^{\frac{1}{a+b}}.$$

So,

$$m = \left[p_i\left(\frac{a}{b}\right)^{\frac{b}{a+b}}\left(\frac{p_j}{p_i}\right)^{\frac{b}{a+b}} + p_j\left(\frac{b}{a}\right)^{\frac{a}{a+b}}\left(\frac{p_i}{p_j}\right)^{\frac{a}{a+b}}\right]e^{\frac{u^0}{a+b}}.$$

The Marshallian demand is then:

$$x_i(p_i, p_j, m) = \frac{a}{a+b}\frac{m}{p_i} = \frac{a}{a+b}\left(\frac{p_j}{p_i}\right)^{\frac{b}{a+b}}\left[\left(\frac{a}{b}\right)^{\frac{b}{a+b}} + \left(\frac{b}{a}\right)^{\frac{a}{a+b}}\right]e^{\frac{u^0}{a+b}},$$

and since $\left(\frac{b}{a}\right)^{\frac{a}{a+b}} = \left(\frac{a}{b}\right)^{\frac{b}{a+b}}\frac{b}{a}$, the bracketed term equals $\left(\frac{a}{b}\right)^{\frac{b}{a+b}}\frac{a+b}{a}$. Therefore,

$$x_i(p_i, p_j, m) = \left(\frac{a}{b}\right)^{\frac{b}{a+b}}\left(\frac{p_j}{p_i}\right)^{\frac{b}{a+b}}e^{\frac{u^0}{a+b}} = \left[\left(\frac{a}{b}\frac{p_j}{p_i}\right)^b e^{u^0}\right]^{\frac{1}{a+b}} = x_i^h(p_i, p_j, u^0).$$

♦
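The algebra of Example 14.9 can be corroborated numerically (a sketch; the values of $a$, $b$, the prices and $m$ are illustrative): the Marshallian demand, the utility $u^0$ it attains, and the Hicksian demand at $u^0$ coincide, and expenditure at $u^0$ returns $m$:

```python
import math

# Illustrative (hypothetical) parameter values
a, b, pi, pj, m = 2.0, 5.0, 1.3, 0.7, 140.0

# Marshallian demands (Example 14.6) and the utility they attain
xi_m = a / (a + b) * m / pi
xj_m = b / (a + b) * m / pj
u0 = a * math.log(xi_m) + b * math.log(xj_m)

# Hicksian demands at that utility level (Example 14.7)
xi_h = ((a / b * pj / pi) ** b * math.exp(u0)) ** (1 / (a + b))
xj_h = ((b / a * pi / pj) ** a * math.exp(u0)) ** (1 / (a + b))

assert math.isclose(xi_m, xi_h, rel_tol=1e-9)               # Equations 14.10 / 14.22
assert math.isclose(pi * xi_h + pj * xj_h, m, rel_tol=1e-9)  # e(p, u^0) = m
print("duality identities verified")
```

Both identities hold to floating-point accuracy for any positive parameter choice, which is the content of the duality between utility maximization and expenditure minimization.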
This completes the calculations. Returning to Equation 14.22 and differentiating,

$$\frac{\partial x_i^h(p_i, p_{-i}, u^0)}{\partial p_i} = \frac{\partial x_i(p_i, p_j, e(p_i, p_j, u^0))}{\partial p_i} + \frac{\partial x_i(p_i, p_j, e(p_i, p_j, u^0))}{\partial m}\,\frac{\partial e(p_i, p_j, u^0)}{\partial p_i}.$$

Using Equation 14.11,

$$\frac{\partial x_i^h(p_i, p_{-i}, u^0)}{\partial p_i} = \frac{\partial x_i(p_i, p_j, e(p_i, p_j, u^0))}{\partial p_i} + \frac{\partial x_i(p_i, p_j, e(p_i, p_j, u^0))}{\partial m}\, x_i^h(p_i, p_{-i}, u^0), \qquad (14.23)$$

an expression called the Slutsky equation, which connects the slopes of the compensated and uncompensated demand functions. Rearranging the expression,

$$\frac{\partial x_i(p_i, p_j, e(p_i, p_j, u^0))}{\partial p_i} = \frac{\partial x_i^h(p_i, p_{-i}, u^0)}{\partial p_i} - \frac{\partial x_i(p_i, p_j, e(p_i, p_j, u^0))}{\partial m}\, x_i^h(p_i, p_{-i}, u^0). \qquad (14.24)$$
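The Slutsky decomposition (14.23) can be verified by finite differences for the utility function of Examples 14.6–14.7. A sketch with illustrative (hypothetical) parameter values:

```python
import math

# Illustrative parameters for u = a*ln(x_i) + b*ln(x_j)
a, b, pj, u0 = 2.0, 3.0, 1.1, 4.0
h = 1e-6

def x_h(pi, u):
    # Hicksian demand for good i (Example 14.7)
    return ((a / b * pj / pi) ** b * math.exp(u)) ** (1 / (a + b))

def e(pi, u):
    # expenditure function e(p_i, p_j, u)
    xj = ((b / a * pi / pj) ** a * math.exp(u)) ** (1 / (a + b))
    return pi * x_h(pi, u) + pj * xj

def x_m(pi, m):
    # Marshallian demand (Example 14.6)
    return a / (a + b) * m / pi

pi = 0.9
lhs = (x_h(pi + h, u0) - x_h(pi - h, u0)) / (2 * h)                # compensated slope
own = (x_m(pi + h, e(pi, u0)) - x_m(pi - h, e(pi, u0))) / (2 * h)  # own-price effect, income fixed
inc = (x_m(pi, e(pi, u0) + h) - x_m(pi, e(pi, u0) - h)) / (2 * h)  # income effect
rhs = own + inc * x_h(pi, u0)
print(lhs, rhs)  # equal, per the Slutsky equation (14.23)
```

Evaluating at income $m = e(p_i, p_j, u^0)$ ensures the compensated and uncompensated demands pass through the same point, as Equation 14.22 requires.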
For a normal good (demand increases when income increases), $\frac{\partial x_i}{\partial m} > 0$. Also, $\frac{\partial x_i^h}{\partial p_i}$ is negative, since it involves a rotation along the indifference curve. So, for normal goods, $\frac{\partial x_i}{\partial p_i} < 0$.

The study of each of these problems is facilitated through the use of eigenvalues and eigenvectors.
In Section 15.2, eigenvalues and eigenvectors are defined and some of their key properties introduced. Section 15.3 describes two dynamic models, of price and population growth, that illustrate the use of eigenvectors and eigenvalues in the study of the time path of these variables. A key theorem, the Perron–Frobenius theorem, is summarized in Section 15.4. Section 15.5 describes two applications in detail: a simplified model of Internet webpage ranking and a generational model of population evolution. When the underlying matrix is symmetric, the associated eigenvalues and eigenvectors have additional properties, and these features are discussed in Section 15.6. Eigenvalues are found as the roots of a polynomial equation and so may be real or complex; therefore, some discussion of complex numbers in this context is provided in Section 15.7. Section 15.8 considers the diagonalization of a matrix (so that a matrix $A$ may be written in the form $A = P\Lambda P^{-1}$, with $\Lambda$ a diagonal matrix). This key property simplifies the computation of powers of $A$. A number of important properties of eigenvalues are described in Section 15.9.
15.2 Definitions and basic properties

Consider a square $n\times n$ matrix $A$. Given a vector $x$, this is transformed by $A$ to the vector $y = Ax$ by multiplication. In general, $y$ and $x$ will bear no particular relation, but in the special case where $y$ is proportional to $x$, $y = \lambda x$, $x$ is called an eigenvector of the matrix $A$.

EXAMPLE 15.1: Let

$$A = \begin{pmatrix} 4 & 6 \\ 2 & 8 \end{pmatrix}, \quad x_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad x_2 = \begin{pmatrix} -3 \\ 1 \end{pmatrix}, \quad \lambda_1 = 10, \quad \lambda_2 = 2.$$

Notice that

$$A x_1 = \begin{pmatrix} 10 \\ 10 \end{pmatrix} = 10\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad A x_2 = \begin{pmatrix} -6 \\ 2 \end{pmatrix} = 2\begin{pmatrix} -3 \\ 1 \end{pmatrix}.$$
In the first case, applying $A$ to $x_1$ scales $x_1$ up by a factor of 10; applying $A$ to $x_2$ scales $x_2$ up by a factor of 2. For both vectors, applying $A$ has the effect of scaling the vector, but the direction is unchanged. ♦
Let $A$ be a square matrix of order $n$, let $x$ be an $n$-dimensional vector and $\lambda$ a scalar. If $Ax = \lambda x$, $x \neq 0$, then $\lambda$ is called a characteristic root or an eigenvalue of $A$ and $x$ its associated characteristic vector or eigenvector. Formally:

Definition 15.1: Given the square matrix $A$, a nonzero vector $x$ is called an eigenvector of $A$ if there is some number $\lambda$ such that $Ax = \lambda x$. The scalar $\lambda$ is called the eigenvalue associated with $x$ (and may be 0). Different eigenvectors may have the same eigenvalue.

Note that there is not a unique eigenvector associated with an eigenvalue: because $Ax = \lambda x \Leftrightarrow A(cx) = \lambda(cx)$, if $x$ is an eigenvector of $A$, then so is $cx$, where $c$ is any nonzero real number. Conventionally, this is resolved by some normalization. Since $Ax = \lambda x$ implies $(A - \lambda I)x = 0$, if $x \neq 0$, then $(A - \lambda I)$ must be singular, so that $|A - \lambda I| = 0$. The determinantal equation $|A - \lambda I| = 0$ defines an $n$th-order polynomial in $\lambda$ called the characteristic equation or characteristic polynomial.

Definition 15.2: The characteristic polynomial of $A$, $p(\lambda)$, is defined:

$$p(\lambda) = |A - \lambda I|.$$

In the case where $A$ is a $2\times 2$ matrix,

$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \qquad |A - \lambda I| = \begin{vmatrix} a_{11} - \lambda & a_{12} \\ a_{21} & a_{22} - \lambda \end{vmatrix} = (a_{11} - \lambda)(a_{22} - \lambda) - a_{21}a_{12}.$$

So,

$$|A - \lambda I| = a_{11}a_{22} - (a_{11} + a_{22})\lambda + \lambda^2 - a_{21}a_{12} = \lambda^2 - \operatorname{tr}(A)\lambda + \det(A), \qquad (15.1)$$

where $\operatorname{tr}(A)$ is the sum of the diagonal elements of $A$ and $\det(A)$ is the determinant of $A$. From this, in the $2\times 2$ case,

$$\lambda_1, \lambda_2 = \frac{\operatorname{tr}(A) \pm \sqrt{\operatorname{tr}(A)^2 - 4\det(A)}}{2}.$$
EXAMPLE 15.2: Let

$$A = \begin{pmatrix} 4 & 6 \\ 3 & 1 \end{pmatrix}, \qquad A - \lambda I = \begin{pmatrix} 4 & 6 \\ 3 & 1 \end{pmatrix} - \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix} = \begin{pmatrix} 4 - \lambda & 6 \\ 3 & 1 - \lambda \end{pmatrix},$$

so that $|A - \lambda I| = (4 - \lambda)(1 - \lambda) - 18 = \lambda^2 - 5\lambda - 14 = (\lambda - 7)(\lambda + 2)$. So, $\lambda_1 = 7$ and $\lambda_2 = -2$ are the two eigenvalues. Finding the eigenvector $x_2$ associated with, say, $\lambda_2$ requires solving:

$$\begin{pmatrix} 4 - \lambda_2 & 6 \\ 3 & 1 - \lambda_2 \end{pmatrix}\begin{pmatrix} x_{21} \\ x_{22} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \quad\Leftrightarrow\quad \begin{pmatrix} 6 & 6 \\ 3 & 3 \end{pmatrix}\begin{pmatrix} x_{21} \\ x_{22} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$

So, $x_{21} = -x_{22}$ solves this equation. With a normalization, $x_{21} = 1$ gives $x_{22} = -1$ and $x_2 = (1, -1)$. A similar calculation gives $x_1 = (2, 1)$. ♦
REMARK 15.1: Sometimes it is useful to define left eigenvalues and eigenvectors. If $y'A = \lambda y'$ for some $\lambda$ and $y \neq 0$, then $y$ is called a left eigenvector of $A$. With this terminology, $Ax = \lambda x$ defines a right eigenvector–eigenvalue pair. The left eigenvalues are defined as the characteristic roots of the polynomial $|A' - \lambda I|$, and since $|A' - \lambda I| = |A - \lambda I|$ (a matrix and its transpose have the same determinant), the eigenvalues are unchanged: the set of left eigenvalues is the same as the set of right eigenvalues. However, the eigenvectors generally differ. For example:

$$A = \begin{pmatrix} 4 & 8 \\ 4 & 4 \end{pmatrix}, \qquad \lambda_1 = 4(1 + \sqrt 2), \quad \lambda_2 = 4(1 - \sqrt 2),$$

$$\text{left eigenvectors: } y_1 = \begin{pmatrix} \tfrac12\sqrt 2 \\ 1 \end{pmatrix}, \quad y_2 = \begin{pmatrix} -\tfrac12\sqrt 2 \\ 1 \end{pmatrix}; \qquad \text{right eigenvectors: } x_1 = \begin{pmatrix} \sqrt 2 \\ 1 \end{pmatrix}, \quad x_2 = \begin{pmatrix} -\sqrt 2 \\ 1 \end{pmatrix}.$$

However, they satisfy orthogonality conditions. Consider a left eigenvalue–eigenvector pair $(y_i, \lambda_i)$ and a right pair $(x_j, \lambda_j)$. Then

$$y_i'A = \lambda_i y_i' \Rightarrow y_i'Ax_j = \lambda_i y_i'x_j \quad \text{and} \quad Ax_j = \lambda_j x_j \Rightarrow y_i'Ax_j = \lambda_j y_i'x_j.$$

Therefore $\lambda_i y_i'x_j = \lambda_j y_i'x_j$, or $(\lambda_i - \lambda_j)y_i'x_j = 0$, so that $y_i'x_j = 0$ whenever $\lambda_i \neq \lambda_j$. If all the eigenvalues are distinct, then writing $Y$ for a matrix of left eigenvectors, each eigenvector occupying a column of $Y$, and $X$ for the matrix consisting of columns of right eigenvectors, these calculations imply that $Y'X$ is a diagonal matrix.
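The orthogonality claim can be checked numerically for the matrix in this remark. A sketch confirming that $Y'X$ is diagonal (variable names are ours):

```python
import math

s = math.sqrt(2)
# Left eigenvectors y1, y2 and right eigenvectors x1, x2 as columns
Y = [[s / 2, -s / 2],
     [1.0, 1.0]]
X = [[s, -s],
     [1.0, 1.0]]

# (Y'X)[i][j] = y_i . x_j; off-diagonal entries vanish since lambda_1 != lambda_2
YtX = [[sum(Y[k][i] * X[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
assert abs(YtX[0][1]) < 1e-12 and abs(YtX[1][0]) < 1e-12
print(YtX)
```

The off-diagonal entries are $-\tfrac12\sqrt 2\cdot\sqrt 2 + 1 = 0$ up to floating-point rounding, while the diagonal entries are nonzero.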
For $A$ an $n\times n$ matrix, the equation $|A - \lambda I| = 0$ is a polynomial equation with $n$ roots, $\lambda_1, \dots, \lambda_n$. The roots may be real or complex and may be repeated. This polynomial, $p(\lambda)$, may be written:

$$p(\lambda) = a_n\lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0 = a_n(\lambda - \lambda_1)(\lambda - \lambda_2)\cdots(\lambda - \lambda_n).$$

(Take the second form, multiply out the right side and match the coefficients on powers of $\lambda$ to connect the roots to the coefficients of $p(\lambda)$.) For example,

$$p(\lambda) = \lambda^2 - 3\lambda + 2 = (\lambda - \lambda_1)(\lambda - \lambda_2) = \lambda^2 - (\lambda_1 + \lambda_2)\lambda + \lambda_1\lambda_2,$$

so that $\lambda_1 + \lambda_2 = 3$ and $\lambda_1\lambda_2 = 2$, giving $\lambda_1 = 1$, $\lambda_2 = 2$. From the definition, if $A$ is singular, then $\lambda = 0$ is an eigenvalue of $A$. The fact that $|A - \lambda I| = 0$ has $n$ roots follows from the fundamental theorem of algebra, which states:

Theorem 15.1: Let $p(z)$ be an $n$th-order polynomial:

$$p(z) = a_n z^n + a_{n-1}z^{n-1} + \cdots + a_1 z + a_0$$

with $a_n \neq 0$ (and where each $a_i$ may be real or complex). Then $p$ has $n$ roots, $z_1, z_2, \dots, z_n$: $p(z_i) = 0$, $i = 1, \dots, n$.

Note that even if the entries of the matrix are real numbers (or even positive real numbers), eigenvalues and eigenvectors may be complex.
EXAMPLE 15.3: Let

$$A = \begin{pmatrix} 1 & 2 \\ -3 & 2 \end{pmatrix}.$$

Then $|A - \lambda I| = (1 - \lambda)(2 - \lambda) + 6 = \lambda^2 - 3\lambda + 8$. From this, the roots are:

$$\lambda_1, \lambda_2 = \frac{3 \pm \sqrt{9 - 32}}{2} = \frac{3 \pm \sqrt{-23}}{2}.$$

Therefore, the eigenvalues are complex. ♦
REMARK 15.2: In the case of a $3\times 3$ matrix, the characteristic polynomial from the determinant of $A - \lambda I$ suggests some useful features of the function. The polynomial is:

$$p(\lambda) = -\lambda^3 + (a_{11} + a_{22} + a_{33})\lambda^2 + (-a_{11}a_{22} - a_{11}a_{33} - a_{22}a_{33} + a_{23}a_{32} + a_{21}a_{12} + a_{31}a_{13})\lambda$$
$$\quad + (a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} + a_{21}a_{32}a_{13} - a_{21}a_{12}a_{33} + a_{31}a_{12}a_{23} - a_{31}a_{13}a_{22}).$$

Notice that the coefficient on $\lambda^3$ is $(-1)^3$, the coefficient on $\lambda^2$ is $\operatorname{tr}(A)$ and the constant is $|A|$. For the $n$-dimensional case, this becomes:

$$p(\lambda) = (-1)^n\lambda^n + (-1)^{n-1}\operatorname{tr}(A)\lambda^{n-1} + [\text{terms in } \lambda^{n-2}, \lambda^{n-3}, \dots, \lambda] + \det(A).$$

For example, recalling that $p(\lambda) = |A - \lambda I|$, $p(0) = |A| = \det(A)$.
One important observation is that if the eigenvalues are all distinct, then the eigenvectors are linearly independent.

Theorem 15.2: Let $A$ have eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$, all distinct, and let $x_1, x_2, \dots, x_n$ be associated eigenvectors. Then $x_1, x_2, \dots, x_n$ are linearly independent.

PROOF: Consider the case where $A$ is a $3\times 3$ matrix. Suppose that $x_1, x_2, x_3$ are not linearly independent, so there exist $\alpha_1, \alpha_2$ and $\alpha_3$, at least one nonzero, such that

$$\alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3 = 0.$$

Multiplying by $(A - \lambda_2 I)(A - \lambda_3 I)$ gives:

$$(A - \lambda_2 I)(A - \lambda_3 I)[\alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3] = 0.$$

Using the fact that $(A - \lambda_i I)x_j = Ax_j - \lambda_i x_j = \lambda_j x_j - \lambda_i x_j = (\lambda_j - \lambda_i)x_j$ gives:

$$0 = \alpha_1(A - \lambda_2 I)(A - \lambda_3 I)x_1 + \alpha_2(A - \lambda_2 I)(A - \lambda_3 I)x_2 + \alpha_3(A - \lambda_2 I)(A - \lambda_3 I)x_3$$
$$= \alpha_1(\lambda_1 - \lambda_3)(A - \lambda_2 I)x_1 + \alpha_2(\lambda_2 - \lambda_3)(A - \lambda_2 I)x_2 + \alpha_3(\lambda_3 - \lambda_3)(A - \lambda_2 I)x_3$$
$$= \alpha_1(\lambda_1 - \lambda_3)(\lambda_1 - \lambda_2)x_1.$$

Since $x_1 \neq 0$ and $(\lambda_1 - \lambda_3)(\lambda_1 - \lambda_2) \neq 0$, it must be that $\alpha_1 = 0$. Similar computations show that $\alpha_2 = \alpha_3 = 0$. For the $n$-variable case, the calculations generalize directly.
REMARK 15.3: Thus, distinct eigenvalues imply independent eigenvectors. Two real vectors $x$ and $y$ are said to be orthogonal if $x'y = 0$. Although distinct eigenvalues imply that the eigenvectors are independent, they may not be orthogonal. For example, the matrix $A$ below has eigenvectors $v_1$ and $v_2$ and eigenvalues $\lambda_1$, $\lambda_2$. The eigenvectors are linearly independent but not orthogonal, $v_1\cdot v_2 = 9$:

$$A = \begin{pmatrix} 0 & 1 \\ -8 & 6 \end{pmatrix}, \quad v_1 = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad v_2 = \begin{pmatrix} 1 \\ 4 \end{pmatrix}, \quad \lambda_1 = 2, \quad \lambda_2 = 4.$$
REMARK 15.4: Even when the eigenvalues are not distinct, it may be possible to find linearly independent eigenvectors. For example, the identity matrix

$$A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

has characteristic equation $(1 - \lambda)^2 = 0$, so that $\lambda_1 = \lambda_2 = 1$. Nevertheless,

$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \lambda_1\begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \lambda_2\begin{pmatrix} 0 \\ 1 \end{pmatrix},$$

so one can find two linearly independent eigenvectors associated with the common eigenvalue $\lambda_1 = \lambda_2$. However, this is not always possible. Consider the matrix:

$$B = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}.$$

The matrix $B$ has eigenvalues $\lambda_1 = \lambda_2 = 1$. Thus, any eigenvector must satisfy:

$$B\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix},$$

and this requires that $x_1 + x_2 = x_2$, or $x_1 = 0$, so any eigenvector $x = (x_1, x_2)$ has $x_1 = 0$. All eigenvectors of $B$ have the form $x = (x_1, x_2) = (0, z)$.
REMARK 15.5: One interesting property of the characteristic equation $p(\lambda) = 0$ is that the matrix satisfies its own characteristic equation, in the sense that:

$$p(A) = a_n A^n + a_{n-1}A^{n-1} + \cdots + a_1 A + a_0 I = 0.$$

This is called the Cayley–Hamilton theorem. In the case where the eigenvalues are all distinct, this is easy to see. Let $x_i$ be an eigenvector. Then

$$p(A)x_i = (a_n A^n + a_{n-1}A^{n-1} + \cdots + a_1 A + a_0 I)x_i = (a_n\lambda_i^n + a_{n-1}\lambda_i^{n-1} + \cdots + a_1\lambda_i + a_0)x_i = p(\lambda_i)x_i = 0.$$

So, $p(A)x_i = 0$ for each eigenvector $x_i$. Since the eigenvalues are distinct, the eigenvectors are linearly independent, and with the matrix $X = (x_1, \dots, x_n)$, $p(A)X = 0$; inverting $X$, $p(A) = 0$.
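For a $2\times 2$ matrix the theorem says $A^2 - \operatorname{tr}(A)A + \det(A)I = 0$, which is easy to check in code. A sketch using the matrix of Example 15.2 (helper names are ours):

```python
def matmul2(A, B):
    # 2x2 matrix product
    return [[A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]],
            [A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]]]

def cayley_hamilton_residual(A):
    # For 2x2: p(lambda) = lambda^2 - tr(A) lambda + det(A), so
    # p(A) = A^2 - tr(A) A + det(A) I should be the zero matrix
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    A2 = matmul2(A, A)
    I = [[1.0, 0.0], [0.0, 1.0]]
    return [[A2[i][j] - tr * A[i][j] + det * I[i][j] for j in range(2)] for i in range(2)]

A = [[4.0, 6.0], [3.0, 1.0]]  # matrix of Example 15.2
print(cayley_hamilton_residual(A))  # → [[0.0, 0.0], [0.0, 0.0]]
```

Here $A^2 = \begin{pmatrix}34 & 30\\ 15 & 19\end{pmatrix}$, $\operatorname{tr}(A) = 5$ and $\det(A) = -14$, so each entry of the residual cancels exactly.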
Because $Ax_i = \lambda_i x_i$, if $\Lambda$ is the diagonal matrix with the eigenvalues on the diagonal and $X = (x_1, \dots, x_n)$ the matrix of eigenvectors, then $AX = X\Lambda$. If $X$ is nonsingular, then $X^{-1}AX = \Lambda$ and $A = X\Lambda X^{-1}$. In this case, $X$ is said to diagonalize $A$.

Theorem 15.3: A square $n\times n$ matrix $A$ is diagonalizable if and only if it has $n$ linearly independent eigenvectors. A sufficient condition for this is that the eigenvalues of $A$ all be distinct.
15.3 An application: dynamic models

Consider a vector $y_t$ that evolves over time according to the equation:

$$y_t = A y_{t-1}.$$

Then, starting from $t = 1$, this may be written as $y_t = A^t y_0$, so that powers of $A$ are central to describing the evolution over time. One particularly useful application of eigenvalues and eigenvectors is in computing powers of a matrix $A$. Suppose that $A$ has $n$ distinct eigenvalues, $\lambda_1, \dots, \lambda_n$, with eigenvectors $x_1, \dots, x_n$, so that $Ax_i = \lambda_i x_i$ for each $i$. Arranging these $n$ equations in matrix form:

$$AX = A(x_1, x_2, \cdots, x_n) = (x_1, x_2, \cdots, x_n)\Lambda = X\Lambda,$$
where $X$ is the square matrix with $x_i$ in column $i$ and $\Lambda$ is the diagonal matrix with $\lambda_i$ in the $i$th row and column position. Because $X$ consists of linearly independent vectors, $X^{-1}$ exists, so that

$$A = X\Lambda X^{-1} \quad \text{and} \quad X^{-1}AX = \Lambda.$$

Then,

$$A^2 = AA = X\Lambda X^{-1}X\Lambda X^{-1} = X\Lambda^2 X^{-1}, \qquad A^k = X\Lambda^k X^{-1}.$$

So, the behavior of $A^k$ can be considered in terms of $\Lambda^k$. Therefore, if a system evolves according to $y_{t+1} = Ay_t$, then

$$y_t = A^t y_0 = X\Lambda^t X^{-1}y_0.$$
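This identity can be checked directly. The sketch below (matrix and eigen-decomposition taken from Example 15.1; helper names are ours) computes $A^5$ both by repeated multiplication and as $X\Lambda^5 X^{-1}$:

```python
def matmul2(A, B):
    # 2x2 matrix product
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

# A from Example 15.1, with eigenvectors (1,1) and (-3,1), eigenvalues 10 and 2
A = [[4.0, 6.0], [2.0, 8.0]]
X = [[1.0, -3.0], [1.0, 1.0]]
Xinv = [[0.25, 0.75], [-0.25, 0.25]]  # det(X) = 4

def power_direct(A, k):
    P = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(k):
        P = matmul2(P, A)
    return P

def power_diag(k):
    # A^k = X Lambda^k X^{-1}: only the diagonal entries are powered
    Lk = [[10.0 ** k, 0.0], [0.0, 2.0 ** k]]
    return matmul2(matmul2(X, Lk), Xinv)

print(power_direct(A, 5))
print(power_diag(5))  # → [[25024.0, 74976.0], [24992.0, 75008.0]], same as above
```

The diagonalized route replaces $k$ matrix multiplications with two, which is what makes it attractive for the dynamic models that follow.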
EXAMPLE 15.4: Let

$$A = \begin{pmatrix} -\tfrac32 & 3 \\ -\tfrac32 & \tfrac{11}{4} \end{pmatrix}.$$

Then

$$X = \begin{pmatrix} 3 & 4 \\ 2 & 3 \end{pmatrix}, \qquad X^{-1} = \begin{pmatrix} 3 & -4 \\ -2 & 3 \end{pmatrix}, \qquad \Lambda = \begin{pmatrix} \tfrac12 & 0 \\ 0 & \tfrac34 \end{pmatrix}.$$

So, for example, $A^t = X\Lambda^t X^{-1}$, and

$$A^2 = \begin{pmatrix} -\tfrac94 & \tfrac{15}{4} \\ -\tfrac{15}{8} & \tfrac{49}{16} \end{pmatrix}, \qquad A^3 = \begin{pmatrix} -\tfrac94 & \tfrac{57}{16} \\ -\tfrac{57}{32} & \tfrac{179}{64} \end{pmatrix}