Mathematics for Economists with Applications provides detailed coverage of the mathematical techniques essential for undergraduate and introductory graduate work in economics, business and finance.
English · 690 pages [713] · 2015
Table of contents:
Cover......Page 1
Title Page......Page 4
Copyright Page......Page 5
Table of Contents......Page 6
List of figures......Page 16
Preface......Page 22
1.2 A brief summary by chapter......Page 24
2.1 Introduction......Page 42
2.1.1 Some motivating examples......Page 43
2.2 Vectors......Page 46
2.3 Matrices......Page 47
2.3.1 Matrix addition and subtraction......Page 48
2.3.2 Matrix multiplication......Page 49
2.3.3 Matrix transpose......Page 52
2.3.4 Matrix inversion......Page 53
2.3.5 Additional remarks on determinants......Page 63
2.3.6 Cramer’s rule......Page 66
2.3.7 Gaussian elimination......Page 67
2.4 Linear dependence and independence......Page 73
2.4.1 Matrix rank and linear independence......Page 78
2.5.1 Solving systems of equations: a geometric view......Page 80
2.5.2 Solutions of equation systems: rank and linear independence......Page 84
2.6 Special matrices......Page 87
Exercises for Chapter 2......Page 94
3.2 The market model......Page 100
3.3 Adjacency matrices......Page 101
3.4 Input–output analysis......Page 104
3.5 Inflation and unemployment dynamics......Page 111
3.6 Stationary (invariant) distributions......Page 112
3.6.1 Convergence to a stationary distribution......Page 114
3.6.2 Computation of the invariant distribution......Page 116
3.7 Econometrics......Page 117
3.7.1 Derivation of the least squares estimator......Page 120
Exercises for Chapter 3......Page 124
4.1 Introduction......Page 128
4.1.1 Formulation......Page 129
4.1.2 The feasible region......Page 132
4.1.3 Finding an optimal solution......Page 133
4.1.4 Dual prices and slack variables......Page 136
4.1.5 Some examples......Page 139
4.1.6 Duality reconsidered......Page 143
4.2.1 Equality-constrained programs......Page 146
4.2.2 Definition and identification of basic solutions......Page 148
4.3 Duality principles......Page 154
4.3.1 Duality and dual prices......Page 157
Exercises for Chapter 4......Page 160
5.1 Introduction......Page 168
5.2.1 Demand functions......Page 169
5.2.2 Present value and the interest rate......Page 170
5.2.3 Taxation......Page 171
5.2.4 Options......Page 172
5.3.1 Preliminaries......Page 174
5.3.2 Inequalities and absolute value......Page 176
5.3.3 Continuity......Page 177
5.3.4 Differentiability......Page 179
5.4 Types of function......Page 181
5.4.1 Plots of various functions......Page 182
5.4.2 The log and exponential functions......Page 184
5.5 Growth rates......Page 186
5.5.1 Additivity of continuous growth rates......Page 187
5.5.2 The functions ln x (log_e x) and e^kx......Page 188
5.6 Some rules of calculus......Page 189
5.6.1 l’Hospital’s rule......Page 192
5.6.2 Higher derivatives......Page 193
Exercises for Chapter 5......Page 194
6.1 Introduction......Page 198
6.2 Optimization......Page 199
6.2.1 The first-order condition: issues......Page 200
6.2.2 The second-order condition......Page 204
6.2.3 Applications: profit maximization and related problems......Page 206
6.3 Concavity and optimization......Page 210
6.4.1 The free-rider problem......Page 216
6.5 Comparative statics......Page 219
6.6 Elasticity of demand......Page 223
6.6.1 Elasticity, revenue and profit maximization......Page 225
6.7 Taxation and monopoly......Page 226
6.8 Taxation incidence in supply and demand......Page 228
6.8.1 A unit tax......Page 229
6.8.2 An ad valorem tax......Page 230
6.8.3 Elasticities and taxation......Page 232
6.9 Ramsey pricing......Page 234
Exercises for Chapter 6......Page 238
7.1 Introduction......Page 242
7.2 Partial derivatives......Page 246
7.3 Level contours......Page 248
7.4 The elasticity of substitution......Page 250
7.5 Market equilibrium: an application......Page 253
7.5.1 Oligopoly: linear demand......Page 255
7.5.2 Oligopoly: nonlinear demand......Page 256
7.5.3 Industry concentration indices......Page 257
7.6 Total differentials......Page 258
7.6.1 The impact of parameter changes......Page 263
7.7 Solutions to systems of equations......Page 265
7.7.1 Existence and properties of solutions......Page 267
7.8 The implicit function theorem......Page 270
7.8.1 General systems of equations......Page 276
Exercises for Chapter 7......Page 286
8.1 Introduction......Page 292
8.1.1 Taylor series approximations: an example......Page 293
8.1.2 Taylor series expansions......Page 295
8.1.3 Approximations and accuracy......Page 298
8.2 Applications of Taylor series expansions......Page 299
8.2.1 Concavity and the second derivative......Page 300
8.2.2 Roots of a function......Page 301
8.2.3 Numerical optimization......Page 302
8.3.1 Expected utility theory......Page 308
8.3.2 Risk-averse preferences and welfare loss......Page 310
8.3.3 Risk aversion and the cost of small risk......Page 313
8.4.1 Diversification......Page 316
8.4.2 Portfolio selection......Page 317
8.5 The multivariate version of Taylor’s theorem......Page 319
8.6 Proof of Taylor’s theorem......Page 320
Exercises for Chapter 8......Page 324
9.2 Vectors: length and distance......Page 330
9.3 Vectors: direction and angles......Page 333
9.4 Hyperplanes and direction of increase......Page 336
9.5 Farkas’ lemma......Page 339
Exercises for Chapter 9......Page 342
10.1 Introduction......Page 344
10.2 Quadratic forms......Page 345
10.3 Positive and negative definite matrices......Page 348
10.3.1 Symmetric matrices......Page 350
10.3.2 Criteria for positive and negative definiteness......Page 351
10.3.3 Positive and negative semidefiniteness......Page 352
10.4 Definiteness with equality constraints......Page 353
Exercises for Chapter 10......Page 356
11.2 The two-variable case......Page 358
11.2.1 Motivation for the second-order conditions......Page 362
11.2.2 Failure of the second-order conditions......Page 370
11.2.3 The envelope theorem......Page 372
11.3 Optimization: n variables......Page 380
11.3.1 Motivation for the Hessian conditions......Page 383
11.3.2 Definiteness and second-order conditions......Page 384
11.4 Concavity, convexity and the Hessian matrix......Page 390
Exercises for Chapter 11......Page 392
12.1 Introduction......Page 398
12.1.1 A motivating example......Page 399
12.2 The two-variable case......Page 401
12.3 Motivation for first- and second-order conditions......Page 407
12.3.2 The second-order condition......Page 408
12.3.3 A geometric perspective......Page 410
12.4 The n-variable case......Page 413
12.4.1 Quadratic forms and bordered matrices......Page 414
12.4.2 Bordered Hessians and optimization......Page 415
12.4.3 The second-order condition reconsidered......Page 418
12.5 Interpretation of the Lagrange multiplier......Page 419
12.6 Optimization: concavity and convexity......Page 425
12.6.1 Quasiconcavity and bordered Hessians......Page 432
12.7 Optimization with many constraints......Page 434
Exercises for Chapter 12......Page 438
13.1 Introduction......Page 442
13.1.1 A motivating example......Page 443
13.1.2 Overview......Page 445
13.2 Optimization......Page 446
13.2.1 The constraint qualification......Page 453
13.2.2 Complementary slackness......Page 457
13.2.4 Global and local optima......Page 459
13.2.5 Nonnegativity constraints......Page 460
13.2.6 Necessary conditions and nonconvexities......Page 465
13.2.8 Equality and inequality constraints......Page 467
Exercises for Chapter 13......Page 470
14.1 Introduction......Page 472
14.2 The integral......Page 473
14.4.1 Consumer surplus......Page 475
14.4.2 Producer surplus......Page 476
14.4.3 Consumer surplus and welfare......Page 477
14.5 Cost, supply and profit......Page 479
14.5.1 Average and marginal cost......Page 481
14.6 Taxation and consumer surplus......Page 484
14.7 Present value......Page 488
14.8 Leibnitz’s rule......Page 490
14.9 Inequality measures......Page 492
14.10 Ramsey pricing......Page 494
14.11.1 Consumer surplus......Page 497
14.11.2 Compensating variation......Page 499
14.11.3 Equivalent variation......Page 505
14.11.4 Comparison of welfare measures......Page 508
Exercises for Chapter 14......Page 514
15.1 Introduction......Page 516
15.2 Definitions and basic properties......Page 517
15.3 An application: dynamic models......Page 523
15.3.1 Supply and demand dynamics......Page 525
15.3.2 A model of population growth......Page 527
15.4 The Perron–Frobenius theorem......Page 531
15.5.1 Webpage ranking......Page 534
15.5.2 Leslie matrices......Page 538
15.6 Eigenvalues and eigenvectors of symmetric matrices......Page 542
15.7 Real and complex eigenvalues and eigenvectors......Page 546
15.8 Diagonalization......Page 552
15.9 Properties of eigenvalues and eigenvectors......Page 553
15.9.1 Largest eigenvalues and eigenvalue multiplicity......Page 557
Exercises for Chapter 15......Page 560
16.1 Introduction......Page 564
16.1.1 Preliminary discussion......Page 565
16.2.1 Constant coefficients......Page 567
16.2.2 Variable coefficients......Page 571
16.3.1 The logistic model......Page 575
16.3.2 The Bernoulli equation......Page 577
16.4 First-order differential equations: existence of solutions......Page 580
16.5.1 Constant coefficients......Page 581
16.5.2 Derivation of solution......Page 582
16.6 Systems of differential equations......Page 586
16.7 Stability......Page 589
Exercises for Chapter 16......Page 592
17.1 Introduction......Page 596
17.2 Motivating examples......Page 597
17.3 First-order linear difference equations......Page 599
17.3.2 The constant forcing function......Page 600
17.3.3 The cobweb model......Page 602
17.3.4 Equations with variable forcing function......Page 604
17.4 Second-order linear difference equations......Page 605
17.4.2 The homogeneous equation......Page 606
17.4.3 The general solution......Page 607
17.4.5 Examples......Page 608
17.4.6 The Samuelson multiplier-accelerator model......Page 609
17.5.1 The particular solution......Page 612
17.5.2 The complementary solution......Page 613
17.6 The n-variable case......Page 615
17.7.1 Various forcing functions: solutions......Page 616
17.7.2 Roots of the characteristic equation......Page 618
17.7.3 Stability......Page 622
Exercises for Chapter 17......Page 626
18.1 Introduction......Page 628
18.2 Random variables......Page 629
18.3 The distribution function and density......Page 631
18.3.1 The conditional distribution......Page 633
18.3.2 Joint distributions......Page 634
18.4.2 The binomial distribution......Page 636
18.4.3 The Poisson distribution......Page 639
18.4.4 The Poisson process......Page 640
18.5.1 The uniform distribution......Page 642
18.5.3 The normal distribution......Page 643
18.5.5 The chi-square distribution......Page 646
18.5.6 The t distribution......Page 648
18.5.7 The F distribution......Page 654
Exercises for Chapter 18......Page 656
19.2 Estimation......Page 662
19.2.1 Unbiasedness and efficiency......Page 663
19.3.1 Law of large numbers......Page 665
19.3.2 The central limit theorem......Page 668
19.4 Hypothesis testing......Page 670
19.4.2 Types of error......Page 671
19.4.3 Simple null and alternative hypotheses......Page 675
19.4.4 Uniformly most powerful tests......Page 677
19.4.5 Two-sided tests......Page 683
19.4.6 Unbiased tests......Page 685
19.4.7 Use of chi-square and F tests......Page 686
19.5.1 Matrix computations......Page 687
19.5.2 Ordinary least squares......Page 689
19.5.3 Distributions and hypothesis testing......Page 692
19.5.4 Test statistics......Page 694
19.5.5 Hypothesis testing for the linear model......Page 695
Exercises for Chapter 19......Page 700
Bibliography......Page 704
Index......Page 706
Mathematics for Economists with Applications
Mathematics for Economists with Applications provides detailed coverage of the mathematical techniques essential for undergraduate and introductory graduate work in economics, business and finance. Beginning with linear algebra and matrix theory, the book develops the techniques of univariate and multivariate calculus used in economics, proceeding to discuss the theory of optimization in detail. Integration, differential and difference equations are considered in subsequent chapters. Uniquely, the book also features a discussion of statistics and probability, including a study of the key distributions and their role in hypothesis testing. Throughout the text, large numbers of new and insightful examples and an extensive use of graphs explain and motivate the material. Each chapter develops from an elementary level and builds to more advanced topics, providing logical progression for the student, and enabling instructors to prescribe material to the required level of the course. With coverage substantial in depth as well as breadth, and including a companion website at www.routledge.com/cw/bergin, containing exercises related to the worked examples from each chapter of the book, Mathematics for Economists with Applications contains everything needed to understand and apply the mathematical methods and practices fundamental to the study of economics.

James Bergin is Professor of Economics at Queen’s University, Canada.
Mathematics for Economists with Applications
James Bergin
First published 2015 by Routledge, 2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN, and by Routledge, 711 Third Avenue, New York, NY 10017.

Routledge is an imprint of the Taylor & Francis Group, an informa business.

© 2015 James Bergin

The right of James Bergin to be identified as author of this work has been asserted by him in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data
Bergin, James.
Mathematics for economists with applications / by James Bergin. – 1 Edition.
pages cm
1. Economics, Mathematical. I. Title.
HB135.B47 2015
510–dc23
2014020372

ISBN: 978-0-415-63827-2 (hbk)
ISBN: 978-0-415-63828-9 (pbk)
ISBN: 978-1-315-81953-2 (ebk)

Typeset in New Century Schoolbook by Cenveo Publisher Services
Contents
List of figures
xv
Preface
xxi
1 Introduction
1
1.1 Introduction
1
1.2 A brief summary by chapter
1
2 Matrices and systems of equations 2.1 Introduction
19 19
2.1.1 Some motivating examples
20
2.2 Vectors
23
2.3 Matrices
24
2.3.1 Matrix addition and subtraction
25
2.3.2 Matrix multiplication
26
2.3.3 Matrix transpose
29
2.3.4 Matrix inversion
30
2.3.5 Additional remarks on determinants
40
2.3.6 Cramer’s rule
43
2.3.7 Gaussian elimination
44
2.4 Linear dependence and independence 2.4.1 Matrix rank and linear independence 2.5 Solutions of systems of equations
50 55 57
CONTENTS
vi
2.5.1 Solving systems of equations: a geometric view
57
2.5.2 Solutions of equation systems: rank and linear independence
61
2.6 Special matrices
64
Exercises for Chapter 2
71
3 Linear algebra: applications
77
3.1 Introduction
77
3.2 The market model
77
3.3 Adjacency matrices
78
3.4 Input–output analysis
81
3.5 Inflation and unemployment dynamics
88
3.6 Stationary (invariant) distributions
89
3.6.1 Convergence to a stationary distribution
91
3.6.2 Computation of the invariant distribution
93
3.7 Econometrics 3.7.1 Derivation of the least squares estimator
94 97
Exercises for Chapter 3
101
4 Linear programming
105
4.1 Introduction
105
4.1.1 Formulation
106
4.1.2 The feasible region
109
4.1.3 Finding an optimal solution
110
4.1.4 Dual prices and slack variables
113
4.1.5 Some examples
116
4.1.6 Duality reconsidered
120
4.2 Basic solutions
123
4.2.1 Equality-constrained programs
123
4.2.2 Definition and identification of basic solutions
125
4.3 Duality principles 4.3.1 Duality and dual prices Exercises for Chapter 4 5 Functions of one variable
131 134 137 145
5.1 Introduction
145
5.2 Some examples
146
5.2.1 Demand functions
146
5.2.2 Present value and the interest rate
147
5.2.3 Taxation
148
5.2.4 Options
149
5.3 Continuity and differentiability
151
5.3.1 Preliminaries
151
5.3.2 Inequalities and absolute value
153
5.3.3 Continuity
154
5.3.4 Differentiability
156
5.4 Types of function
158
5.4.1 Plots of various functions
159
5.4.2 The log and exponential functions
161
5.5 Growth rates
163
5.5.1 Additivity of continuous growth rates
5.5.2 The functions ln x (log_e x) and e^kx
164 165
5.6 Some rules of calculus
166
5.6.1 l’Hospital’s rule
169
5.6.2 Higher derivatives
170
Exercises for Chapter 5 6 Functions of one variable: applications
171 175
6.1 Introduction
175
6.2 Optimization
176
6.2.1 The first-order condition: issues
177
6.2.2 The second-order condition
181
6.2.3 Applications: profit maximization and related problems
183
6.3 Concavity and optimization
187
6.3.1 Summary 6.4 Externalities 6.4.1 The freerider problem
193 193 193
6.5 Comparative statics
196
6.6 Elasticity of demand
200
6.6.1 Elasticity, revenue and profit maximization
202
6.7 Taxation and monopoly
203
6.8 Taxation incidence in supply and demand
205
6.8.1 A unit tax
206
6.8.2 An ad valorem tax
207
6.8.3 Elasticities and taxation
209
6.9 Ramsey pricing
211
Exercises for Chapter 6
215
7 Systems of equations, differentials and derivatives
219
7.1 Introduction
219
7.2 Partial derivatives
223
7.3 Level contours
225
7.4 The elasticity of substitution
227
7.5 Market equilibrium: an application
230
7.5.1 Oligopoly: linear demand
232
7.5.2 Oligopoly: nonlinear demand
233
7.5.3 Industry concentration indices
234
7.6 Total differentials 7.6.1 The impact of parameter changes 7.7 Solutions to systems of equations 7.7.1 Existence and properties of solutions 7.8 The implicit function theorem 7.8.1 General systems of equations Exercises for Chapter 7 8 Taylor series 8.1 Introduction
235 240 242 244 247 253 263 269 269
8.1.1 Taylor series approximations: an example
270
8.1.2 Taylor series expansions
272
8.1.3 Approximations and accuracy
275
8.2 Applications of Taylor series expansions
276
8.2.1 Concavity and the second derivative
277
8.2.2 Roots of a function
278
8.2.3 Numerical optimization
279
8.3 Expected utility theory and behavior towards risk
285
8.3.1 Expected utility theory
285
8.3.2 Risk-averse preferences and welfare loss
287
8.3.3 Risk aversion and the cost of small risk
290
8.4 Diversification and portfolio selection
293
8.4.1 Diversification
293
8.4.2 Portfolio selection
294
8.5 The multivariate version of Taylor’s theorem
296
8.6 Proof of Taylor’s theorem
297
Exercises for Chapter 8
301
9 Vectors
307
9.1 Introduction
307
9.2 Vectors: length and distance
307
9.3 Vectors: direction and angles
310
9.4 Hyperplanes and direction of increase
313
9.5 Farkas’ lemma
316
Exercises for Chapter 9
319
10 Quadratic forms
321
10.1 Introduction
321
10.2 Quadratic forms
322
10.3 Positive and negative definite matrices
325
10.3.1 Symmetric matrices
327
10.3.2 Criteria for positive and negative definiteness
328
10.3.3 Positive and negative semidefiniteness
329
10.4 Definiteness with equality constraints
330
Exercises for Chapter 10
333
11 Multivariate optimization
335
11.1 Introduction
335
11.2 The two-variable case
335
11.2.1 Motivation for the second-order conditions
339
11.2.2 Failure of the second-order conditions
347
11.2.3 The envelope theorem
349
11.3 Optimization: n variables
357
11.3.1 Motivation for the Hessian conditions
360
11.3.2 Definiteness and second-order conditions
361
11.4 Concavity, convexity and the Hessian matrix
367
Exercises for Chapter 11
369
12 Equality-constrained optimization 12.1 Introduction
375 375
12.1.1 A motivating example
376
12.2 The two-variable case
378
12.3 Motivation for first- and second-order conditions
384
12.3.1 The first-order condition
385
12.3.2 The second-order condition
385
12.3.3 A geometric perspective
387
12.4 The n-variable case
390
12.4.1 Quadratic forms and bordered matrices
391
12.4.2 Bordered Hessians and optimization
392
12.4.3 The second-order condition reconsidered
395
12.5 Interpretation of the Lagrange multiplier
396
12.6 Optimization: concavity and convexity
402
12.6.1 Quasiconcavity and bordered Hessians
409
12.7 Optimization with many constraints
411
Exercises for Chapter 12
415
13 Inequality-constrained optimization 13.1 Introduction
419 419
13.1.1 A motivating example
420
13.1.2 Overview
422
13.2 Optimization
423
13.2.1 The constraint qualification
430
13.2.2 Complementary slackness
434
13.2.3 Minimization and maximization
436
13.2.4 Global and local optima
436
13.2.5 Nonnegativity constraints
437
13.2.6 Necessary conditions and nonconvexities
442
13.2.7 The Lagrange multiplier
444
13.2.8 Equality and inequality constraints
444
Exercises for Chapter 13 14 Integration
447 449
14.1 Introduction
449
14.2 The integral
450
14.3 Common integrals and rules of integration
452
14.4 Measuring welfare: surplus
452
14.4.1 Consumer surplus
452
14.4.2 Producer surplus
453
14.4.3 Consumer surplus and welfare
454
14.5 Cost, supply and profit 14.5.1 Average and marginal cost
456 458
14.6 Taxation and consumer surplus
461
14.7 Present value
465
14.8 Leibnitz’s rule
467
14.9 Inequality measures
469
14.10 Ramsey pricing
471
14.11 Welfare measures
474
14.11.1 Consumer surplus
474
14.11.2 Compensating variation
476
14.11.3 Equivalent variation
482
14.11.4 Comparison of welfare measures
485
Exercises for Chapter 14 15 Eigenvalues and eigenvectors
491 493
15.1 Introduction
493
15.2 Definitions and basic properties
494
15.3 An application: dynamic models
500
15.3.1 Supply and demand dynamics
502
15.3.2 A model of population growth
504
15.4 The Perron–Frobenius theorem
508
15.5 Some applications
511
15.5.1 Webpage ranking
511
15.5.2 Leslie matrices
515
15.6 Eigenvalues and eigenvectors of symmetric matrices
519
15.7 Real and complex eigenvalues and eigenvectors
523
15.8 Diagonalization
529
15.9 Properties of eigenvalues and eigenvectors
530
15.9.1 Largest eigenvalues and eigenvalue multiplicity
534
Exercises for Chapter 15
537
16 Differential equations
541
16.1 Introduction 16.1.1 Preliminary discussion 16.2 Firstorder linear differential equations
541 542 544
16.2.1 Constant coefficients
544
16.2.2 Variable coefficients
548
16.3 Some nonlinear firstorder differential equations
552
16.3.1 The logistic model
552
16.3.2 The Bernoulli equation
554
16.4 First-order differential equations: existence of solutions
557
16.5 Second- and higher-order differential equations
558
16.5.1 Constant coefficients
558
16.5.2 Derivation of solution
559
16.6 Systems of differential equations
563
16.7 Stability
566
Exercises for Chapter 16
569
17 Linear difference equations
573
17.1 Introduction
573
17.2 Motivating examples
574
17.3 First-order linear difference equations
576
17.3.1 Solving first-order linear difference equations
577
17.3.2 The constant forcing function
577
17.3.3 The cobweb model
579
17.3.4 Equations with variable forcing function
581
17.4 Second-order linear difference equations
582
17.4.1 The inhomogeneous equation
583
17.4.2 The homogeneous equation
583
17.4.3 The general solution
584
17.4.4 Stability
585
17.4.5 Examples
585
17.4.6 The Samuelson multiplier-accelerator model
586
17.5 Vector difference equations
589
17.5.1 The particular solution
589
17.5.2 The complementary solution
590
17.6 The n-variable case
592
17.7 Miscellaneous calculations
593
17.7.1 Various forcing functions: solutions
593
17.7.2 Roots of the characteristic equation
595
17.7.3 Stability
599
Exercises for Chapter 17 18 Probability and distributions
603 605
18.1 Introduction
605
18.2 Random variables
606
18.3 The distribution function and density
608
18.3.1 The conditional distribution
610
18.3.2 Joint distributions
611
18.4 Discrete distributions
613
18.4.1 The Bernoulli distribution
613
18.4.2 The binomial distribution
613
18.4.3 The Poisson distribution
616
18.4.4 The Poisson process
617
18.5 Continuous distributions
619
18.5.1 The uniform distribution
619
18.5.2 The exponential distribution
620
18.5.3 The normal distribution
620
18.5.4 Moment-generating functions
623
18.5.5 The chi-square distribution
623
18.5.6 The t distribution
625
18.5.7 The F distribution
631
Exercises for Chapter 18 19 Estimation and hypothesis testing
633 639
19.1 Introduction
639
19.2 Estimation
639
19.2.1 Unbiasedness and efficiency 19.3 Large-sample behavior
640 642
19.3.1 Law of large numbers
642
19.3.2 The central limit theorem
645
19.4 Hypothesis testing
647
19.4.1 A hypothesis test
648
19.4.2 Types of error
648
19.4.3 Simple null and alternative hypotheses
652
19.4.4 Uniformly most powerful tests
654
19.4.5 Two-sided tests
660
19.4.6 Unbiased tests
662
19.4.7 Use of chi-square and F tests 19.5 Econometric applications
663 664
19.5.1 Matrix computations
664
19.5.2 Ordinary least squares
666
19.5.3 Distributions and hypothesis testing
669
19.5.4 Test statistics
671
19.5.5 Hypothesis testing for the linear model
672
Exercises for Chapter 19
677
Bibliography
681
Index
683
Figures
1.1 Cost flows
6
1.2 Reaction curves
9
1.3 Direction of steepest increase
12
1.4 Indifference curves and optimal choices
13
1.5 Nonoptimal, optimal and nonfeasible choice
14
2.1 Combinations of basis vectors
53
2.2 Intersecting lines: a unique solution
59
2.3 Combinations of basis vectors
60
2.4 Overlapping or parallel lines: many or no solutions
61
3.1 Fitting a line
95
4.1 Basic solutions
129
5.1 Choices determine demand
146
5.2 The present value function
149
5.3 The impact of taxation in a market
150
5.4 Options
151
5.5 Giffen good; a noninjective demand function
152
5.6 The derivative
157
5.7 Plot of f(x) = min{x, 1 − x}
158
5.8 f(x) = x^a, for various values of a
160
5.9 Plot of f(x) = a^x, for various values of a
160
FIGURES
xvi
5.10 Plot of sin(x) and cos(x) for −2π ≤ x ≤ 2π
5.11 Plot of sin(x) and sin(x − π/2) for −2π ≤ x ≤ 2π
161 161
5.12 The functions e^x and ln x (≡ log_e(x))
165
6.1 Changing slopes
177
6.2 Optimization: some possibilities I
178
6.3 Optimization: some possibilities II
179
6.4 Minimization and inflection points
6.5 Optimization: f(x) = 3√x − x
181 182
6.6 Revenue, cost and profit
183
6.7 Marginal cost and marginal revenue
184
6.8 Concavity
188
6.9 Concave and convex functions
189
6.10 Plot of derivatives f′(x), g′(x)
190
6.11 Strictly concave function, f, with f″ = 0 at x = 1
192
6.12 Congestion externalities
195
6.13 Impact of a parameter change on choice variable
197
6.14 Demand elasticity
201
6.15 The impact of taxation on price and quantity
205
6.16 Large and small changes in quantity
210
6.17 Taxation with elastic demand or supply
211
6.18 Ramsey pricing
212
7.1 The impact of price variations on demand
222
7.2 The function f(x, y) = x^(1/2) y^(1/2)
225
7.3 Isoquant slope
226
7.4 Elasticity of substitution
228
7.5 Plots of level contours of u = [x1^ρ + x2^ρ]^(1/ρ)
229
7.6 Optimal income–work combination
238
7.7 A parameter change
241
7.8 No solution
243
7.9 Multiple solutions
243
7.10 A system with two solutions
245
7.11 The impact of varying output on x and y
256
7.12 The IS–LM curves
257
7.13 The impact of increasing distance
260
8.1 Taylor series approximation of f(x) = ln(x + 1)e^(−x/2)
271
8.2 Firstorder Taylor series expansion
274
8.3 Taylor series approximations
276
8.4 Roots of a function: f(x) = −1/2 + 4x − 4x^2 + x^3
279
8.5 Risk aversion
288
8.6 Utility loss associated with risk
289
8.7 The cost of risk
291
9.1 Vectors and length
308
9.2 Vectors
309
9.3 Length
309
9.4 Orthogonality
311
9.5 Angles between vectors
311
9.6 Projections
312
9.7 Trigonometric functions
313
9.8 Orthogonal vectors
313
9.9 The normal to a hyperplane
314
9.10 The normal to a hyperplane
315
9.11 The sign of an inner product
315
9.12 Proof of Farkas’ lemma
317
9.13 Farkas’ lemma: a geometric view
318
11.1 The function f(x, y) = 1 − x^2 − y^2
336
11.2 The function f(x, y) = x^2 − y^2
348
11.3 The function f(x, y) = x^2 y^2
348
11.4 The impact of parameter variation on the objective value
350
12.1 Constrained optimal choice
377
12.2 Optimal leisure choice
377
12.3 Orthogonal lines and normal vectors
388
12.4 Utility maximization: the indifference curve
389
12.5 Utility maximization: optimization
390
12.6 Tangency condition fails
393
12.7 Tangencies at the solution
394
12.8 Quasiconcavity
404
12.9 A linear constraint
406
13.1 Sign of the Lagrange multiplier
426
14.1 The integral of f from a to b
451
14.2 The integral of f : definition
451
14.3 Consumer surplus
453
14.4 Producer surplus
453
14.5 Consumer and producer surplus
454
14.6 Cost and profit
455
14.7 The supply curve
457
14.8 Profit and cost
458
14.9 Average cost curves and marginal cost
459
14.10 Cost curves
460
14.11 Cost curves
461
14.12 Supply
461
14.13 Taxation and consumer surplus
462
14.14 The Lorenz curve
470
14.15 Utility maximization and consumer surplus
475
14.16 Compensating variation
477
14.17 The optimal cost-minimizing choice
478
14.18 The impact of price changes on quantities
481
14.19 Compensating variation
482
14.20 Equivalent variation
483
14.21 Equivalent variation
484
14.22 Demand variation
486
14.23 Different welfare measures
489
15.1 Price dynamics
505
15.2 Internet nodes
512
15.3 Adjacency matrix and corresponding transition process
512
15.4 Rotation of 90°
527
16.1 The logistic curve
554
16.2 Euler’s formula
560
17.1 Plot for y_t + (3/4)y_{t−1}, c = 7/2, y_0 = 7/2
578
17.2 Cobweb dynamics
581
17.3 Stability of second-order system
585
17.4 A converging process
587
17.5 Dynamic reaction functions
592
17.6 Plot of a1^2 − 4a2 = 0, 1 + a1 + a2 = 0 and a1 + 2a2 = 0
597
17.7 Stable regions: real, repeated and complex roots
601
18.1 Probability density
607
18.2 Cumulative distribution functions
609
18.3 Density functions
610
18.4 The function F(x, y) = xy/(2 − xy)
611
18.5 The binomial distribution
616
18.6 The Poisson distribution: P(X = k) = e^(−λ) λ^k / k!
617
18.7 Poisson process: density at t
619
18.8 The normal density
621
18.9 Distribution function of standard normal random variable
622
18.10 The chi-square distribution with three degrees of freedom
626
18.11 The region {(y, z) | y ≤ t√(z/k)}
627
18.12 The t distribution with 4 and 30 degrees of freedom
628
18.13 The F distribution: F3,12 and F5,3 distributions
631
19.1 Tradeoff between Type I and Type II error
650
19.2 Distributions at µ = 30 and µ = 30.5
658
19.3 Distributions at µ = 31 and µ = 31.5
658
19.4 Power function; one-sided test
659
19.5 Power functions for sample sizes of 50 and 225
660
19.6 Power functions for sample sizes of 50 and 225
662
Preface
Courses in undergraduate mathematical economics are taught at various levels, from the second to fourth year, and as preparatory material for graduate students. As a result, the diversity in the backgrounds of students taking such courses is substantial, with large variations in training and knowledge. This diverse readership makes it difficult to present the mathematics used in economics at a suitable level. Nevertheless, there is a basic set of mathematical techniques considered prerequisite for many third- and fourth-year courses, and which should be discussed in any course on introductory mathematical economics. While the material in this book varies substantially in terms of level and difficulty, it is possible to identify a subset of the material that may be used for a first course on mathematical techniques in economics. The following outline suggests a suitable collection of essential material at the undergraduate level.

Chapter 2 on matrices and systems of equations provides a standard discussion of linear equation systems and rules on matrix manipulation. In particular, the material from Section 2.1 to Section 2.3.5 provides these rules, along with various techniques for solving such systems. Following this, Chapter 3 describes a variety of applications of matrix algebra, including input–output models, adjacency matrices and invariant distributions. These are discussed in Section 3.1 to Section 3.6.

Proceeding to functions and calculus, Chapter 5 considers functions of a single variable. This chapter introduces continuity and differentiability of a function, and develops the basic techniques and rules of calculus and differentiation. In addition, the standard functions, including the log and exponential functions, are discussed in detail.
Chapter 6 provides a range of calculus applications, including optimization and the use of calculus techniques to determine elasticities and taxation incidence. In Chapter 7, the study of functions of many variables begins. Section 7.1 to Section 7.6 develop partial derivatives along with a variety of applications, including the study of contours of functions, the elasticity of substitution and comparative statics calculations. In Chapter 8, Section 8.1 and Section 8.2 cover a broad range of applications of Taylor series expansions, including finding the roots of an equation, the development of numerical optimization routines and the calculation of measures of risk aversion. Multivariate optimization is discussed in Chapter 11 for the unconstrained case. Sections 11.1 and 11.2 examine the two-variable problem, motivate the first- and second-order conditions and describe the envelope theorem. Chapter 12 develops optimization with equality constraints. Section 12.2 considers the case of two variables, while motivations for the necessary and sufficient conditions for an optimum are given in Section 12.3. Integration is introduced in Chapter 14. Section 14.2 defines the integral, and some basic rules of integration are presented in Section 14.3. Common applications, such as the measurement of welfare surplus and the calculation of present value, are described in Sections 14.4 through 14.7. In Chapter 16, differential equations are studied. Section 16.1 introduces the topic and Section 16.2 develops the basic analysis for the first-order linear differential equation. Chapter 17 considers difference equations. Section 17.2 provides some motivating examples and Section 17.3 provides the mathematical techniques for solving these equations. Summarizing this suggested course outline:
Chapter   Sections
2         2.1–2.3.5
3         3.1–3.6
5         5.1–5.6
6         6.1–6.8
7         7.1–7.6
8         8.1–8.2
11        11.1–11.2
12        12.1–12.3
14        14.1–14.7
16        16.1–16.2
17        17.1–17.3
1 Introduction

1.1 Introduction

This introduction provides a brief overview of the book, describing the focus of each chapter.

1.2 A brief summary by chapter
Chapter 2 introduces matrices and the corresponding linear algebra. A matrix is a rectangular array of numbers, which may be used to represent succinctly sets of equations, network connections, evolutionary dynamic systems and so on. Manipulation of matrices becomes essential to the efficient study of large systems. The following example illustrates the benefit of matrices when moving from single-equation problems to systems of equations. In the supply-and-demand model, demand (D) for some good is negatively related to price, and supply (S) positively related. Thus, D = α − βp, where p is the price of the good. The letters α and β denote parameters of the model, with β > 0 capturing the fact that an increase in price reduces demand. The formulation of supply is similar, except that supply ought to be positively related to price, so we postulate that δ > 0 in the relation S = γ + δp. Combining these two relationships gives the supply-and-demand model:

D = α − β p,  α > 0, β > 0
S = γ + δ p,  γ < 0, δ > 0.   (1.1)

At what price does the good sell and how much of the good is sold? The market clearing assumption implies that the market clearing price will prevail. Denote by
p^e the price that equates supply and demand. Then D^e = α − β p^e = γ + δ p^e = S^e. Rearranging, α − β p^e = γ + δ p^e, so (α − γ) = (β + δ) p^e. Divide both sides by (β + δ) to get the equilibrium price p^e = (α − γ)/(β + δ). Since at this price the quantity demanded is equal to the quantity supplied, we can determine the equilibrium quantity, denoted Q^e. We find Q^e by substituting p^e into either the supply or demand equation: Q^e = α − β p^e = α − β[(α − γ)/(β + δ)] = (δα + βγ)/(β + δ). Summarizing:

p^e = (α − γ)/(β + δ),   Q^e = (δα + βγ)/(β + δ).   (1.2)
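As a quick numerical check, the closed form (1.2) can be evaluated in a few lines of Python; the parameter values below are illustrative assumptions chosen only to satisfy the sign restrictions in (1.1), not values from the text.

```python
# Equilibrium of the single-market supply-and-demand model (1.1)-(1.2).
# Parameter values are illustrative assumptions satisfying
# alpha > 0, beta > 0, gamma < 0, delta > 0.
alpha, beta, gamma, delta = 10.0, 2.0, -2.0, 4.0

# Closed-form solution from (1.2).
p_eq = (alpha - gamma) / (beta + delta)                 # equilibrium price
q_eq = (delta * alpha + beta * gamma) / (beta + delta)  # equilibrium quantity

# Check: at p_eq, quantity demanded equals quantity supplied.
demand = alpha - beta * p_eq
supply = gamma + delta * p_eq
assert abs(demand - supply) < 1e-12
assert abs(demand - q_eq) < 1e-12
print(p_eq, q_eq)  # → 2.0 6.0
```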
With multiple markets, the computations become more complex and motivate the use of matrices. With two (or more) markets, there may be cross-price effects: the price of bread may affect the demand for margarine; the price of tea may affect the demand for coffee. As a result, the computations to determine the market clearing prices and quantities become more complex. To capture these cross-price effects, write the demand and supply in both markets:

Market 1:  D1 = a0 + a1 p1 + a2 p2,   S1 = b0 + b1 p1 + b2 p2
Market 2:  D2 = α0 + α1 p1 + α2 p2,   S2 = β0 + β1 p1 + β2 p2.

Equilibrium occurs when prices equate supply and demand in both markets (D1 = S1 and D2 = S2). To find the equilibrium level of quantity and price in each market, equate supply and demand in each market:

Market 1:  D1 = S1  or  a0 + a1 p1 + a2 p2 = b0 + b1 p1 + b2 p2
Market 2:  D2 = S2  or  α0 + α1 p1 + α2 p2 = β0 + β1 p1 + β2 p2.
Rearranging these equations gives:

(a1 − b1) p1 + (a2 − b2) p2 = (b0 − a0)
(α1 − β1) p1 + (α2 − β2) p2 = (β0 − α0).

These equations can be solved by substitution. However, the process is tedious. From the first equation,

p1 = (b0 − a0)/(a1 − b1) − [(a2 − b2)/(a1 − b1)] p2.

Substitute into the second equation to get:

(α1 − β1)[(b0 − a0)/(a1 − b1) − [(a2 − b2)/(a1 − b1)] p2] + (α2 − β2) p2 = (β0 − α0).

Collecting terms in p2 gives:

[(α2 − β2) − (α1 − β1)(a2 − b2)/(a1 − b1)] p2 = (β0 − α0) − (α1 − β1)(b0 − a0)/(a1 − b1).

Multiplying both sides by (a1 − b1) gives:

[(α2 − β2)(a1 − b1) − (α1 − β1)(a2 − b2)] p2 = (β0 − α0)(a1 − b1) − (α1 − β1)(b0 − a0).

Thus,

p2 = [(β0 − α0)(a1 − b1) − (α1 − β1)(b0 − a0)] / [(α2 − β2)(a1 − b1) − (α1 − β1)(a2 − b2)].
A similar calculation gives p1. With three or more equations, the procedure becomes progressively more complicated. The use of matrix algebra simplifies problems of this sort significantly. In matrix form:

[ (a1 − b1)  (a2 − b2) ] [ p1 ]   [ b0 − a0 ]
[ (α1 − β1)  (α2 − β2) ] [ p2 ] = [ β0 − α0 ]

or

[ p1 ]   [ (a1 − b1)  (a2 − b2) ]⁻¹ [ b0 − a0 ]
[ p2 ] = [ (α1 − β1)  (α2 − β2) ]   [ β0 − α0 ].

The matrix

[ (a1 − b1)  (a2 − b2) ]⁻¹
[ (α1 − β1)  (α2 − β2) ]

is called the inverse of

[ (a1 − b1)  (a2 − b2) ]
[ (α1 − β1)  (α2 − β2) ].

The inverse can be written (details later):

[ (a1 − b1)  (a2 − b2) ]⁻¹           [  (α2 − β2)   −(a2 − b2) ]
[ (α1 − β1)  (α2 − β2) ]    = (1/d)  [ −(α1 − β1)    (a1 − b1) ],

where d = (a1 − b1)(α2 − β2) − (α1 − β1)(a2 − b2). Thus,

[ p1 ]           [  (α2 − β2)   −(a2 − b2) ] [ b0 − a0 ]
[ p2 ]  = (1/d)  [ −(α1 − β1)    (a1 − b1) ] [ β0 − α0 ]

                 [  (α2 − β2)(b0 − a0) − (a2 − b2)(β0 − α0) ]
        = (1/d)  [ −(α1 − β1)(b0 − a0) + (a1 − b1)(β0 − α0) ].
So,

p1 = [(α2 − β2)(b0 − a0) − (a2 − b2)(β0 − α0)] / [(a1 − b1)(α2 − β2) − (α1 − β1)(a2 − b2)]
p2 = [−(α1 − β1)(b0 − a0) + (a1 − b1)(β0 − α0)] / [(a1 − b1)(α2 − β2) − (α1 − β1)(a2 − b2)].
The early part of Chapter 2 describes the basic rules for matrix manipulation and for solving systems of equations. Not all systems have a solution, and some equation systems may have many solutions. This leads to the task of identifying when equation systems are soluble and the circumstances under which there is a unique solution. The key ideas of rank and linear independence are introduced to study this matter. The chapter concludes with some additional observations on matrix algebra.

Chapter 3 uses matrix techniques to illustrate the many applications of matrix algebra. These include linear demand systems, input–output techniques, Markov processes and invariant distributions, and least squares estimation. To illustrate, consider input–output analysis, where the economy is viewed as a number of interlinked industries (n industries). The output of an industry is used for final consumption and in the production of other goods. The total output of an industry is allocated between that used in production in other industries (interindustry demand) and consumption or final demand. Planning to achieve some (target) level of consumption or final demand indirectly implies production levels in each industry. Input–output analysis is concerned with determining the industry production levels required to sustain or achieve target levels across the entire range of final demands. A simple model is developed next to illustrate the procedure and the use of matrix methods to determine the necessary level of industry output. Let a_ij = the amount of good i required to produce one unit of good j; let x_i = the output of industry i and f_i = the final demand for good i. Thus, if x_j is the amount of good j produced, a_ij x_j is the amount of i required in the production of j. (If one kilo of good j is produced, a_ij is the fraction of one kilo of i used in the production of the kilo of j.)
The output of industry i is allocated to production in other industries (a_ij x_j) and final demand (f_i). Thus,

x_i = a_i1 x1 + a_i2 x2 + · · · + a_in x_n + f_i,  i = 1, ..., n.

This can be rearranged:

−a_i1 x1 − a_i2 x2 − · · · + (1 − a_ii) x_i − · · · − a_in x_n = f_i,  i = 1, ..., n,
or, in matrix notation:

[ (1 − a11)    −a12     ···    −a1n    ] [ x1 ]   [ f1 ]
[   −a21    (1 − a22)   ···    −a2n    ] [ x2 ]   [ f2 ]
[     ⋮          ⋮       ⋱        ⋮    ] [ ⋮  ] = [ ⋮  ]
[   −an1      −an2      ···  (1 − ann) ] [ xn ]   [ fn ]

This expression can be written succinctly as (I − A)x = f, where

      [ a11  a12  ···  a1n ]        [ 1  0  ···  0 ]
A  =  [ a21  a22  ···  a2n ],  I =  [ 0  1  ···  0 ]
      [  ⋮    ⋮    ⋱    ⋮  ]        [ ⋮  ⋮   ⋱   ⋮ ]
      [ an1  an2  ···  ann ]        [ 0  0  ···  1 ].
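A minimal sketch of the input–output calculation, solving (I − A)x = f for a hypothetical two-industry economy; the technology coefficients and final demands are assumptions for illustration, not data from the text.

```python
# Input-output sketch: find industry outputs x solving (I - A)x = f.
# The 2x2 technology matrix and final demands are illustrative assumptions.
a = [[0.2, 0.3],
     [0.4, 0.1]]          # a[i][j]: amount of good i per unit of good j
f = [100.0, 50.0]         # target final demands

# Form I - A for the 2x2 case and invert it directly.
m11, m12 = 1 - a[0][0], -a[0][1]
m21, m22 = -a[1][0], 1 - a[1][1]
d = m11 * m22 - m12 * m21
x1 = (m22 * f[0] - m12 * f[1]) / d
x2 = (-m21 * f[0] + m11 * f[1]) / d

# Check: each industry's output covers interindustry use plus final demand.
assert abs(x1 - (a[0][0] * x1 + a[0][1] * x2 + f[0])) < 1e-9
assert abs(x2 - (a[1][0] * x1 + a[1][1] * x2 + f[1])) < 1e-9
```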
Chapter 4 develops the theory of linear programming. Optimization arises in many contexts. For example, utility maximization involves determining the consumption bundle that maximizes utility subject to a linear constraint, the budget constraint. While the objective function, the utility function, is nonlinear, the set of feasible choices is defined by a linear constraint. Chapter 4 introduces constrained optimization in the linear context: both the objective and the constraints are linear. Linear programming is widely used in industry and so has immediate practical usefulness. The following transportation problem shows how a linear programming problem arises. A firm has two sources of input, two processing plants and three markets to supply. The shipping costs (per ton) are given in the following table.

          Source 1   Source 2   Market 1   Market 2   Market 3
Plant A      2          3          5          4          5
Plant B      3          2          3          4          6
The chart in Figure 1.1 depicts this graphically. Let x_iAj = # tons from source i through plant A to market j, and x_iBj = # tons from source i through plant B to market j. Inputs are shipped from source to plant, and then to market. The cost of shipping one ton from source 1 through plant A to market 3 is 2 + 5 = 7 (the cost of shipping from source 1 to plant A plus the cost of shipping from plant A to market 3). Thus, the cost of shipping x_1A3 tons on this route is 7 x_1A3. Other route costs are calculated in the same way. Finally, sources 1 and 2 have, respectively, 10 and 20 units of raw material available.
Figure 1.1: Cost flows
Market demand is 10 in market 1, 15 in market 2 and 5 in market 3. This leads to the program:

min C = 7x_1A1 + 6x_1A2 + 7x_1A3 + 6x_1B1 + 7x_1B2 + 9x_1B3
      + 8x_2A1 + 7x_2A2 + 8x_2A3 + 5x_2B1 + 6x_2B2 + 8x_2B3,

subject to:

x_1A1 + x_1A2 + x_1A3 + x_1B1 + x_1B2 + x_1B3 ≤ 10
x_2A1 + x_2A2 + x_2A3 + x_2B1 + x_2B2 + x_2B3 ≤ 20
x_1A1 + x_1B1 + x_2A1 + x_2B1 ≥ 10
x_1A2 + x_1B2 + x_2A2 + x_2B2 ≥ 15
x_1A3 + x_1B3 + x_2A3 + x_2B3 ≥ 5
x_iAj, x_iBj ≥ 0, ∀ i, j.

From a theoretical point of view, the subject introduces many key ideas in economics: optimization, constraints and feasible choices, dual or shadow prices of constraints, the notion of a slack variable and corresponding constraint, and so on. The first part of Chapter 4 introduces the basic concepts used to define a linear program, formulating objectives, constraints and the feasible region. Then a graphical approach is used to illustrate the general solution technique. One key idea is that a solution to a linear program can always be found at a corner or vertex of the feasible region. Since there are a finite (although possibly large)
number of vertices, this fact greatly simplifies the search for a solution. Examples illustrate this fact and utilize it to locate the optimal solution. In addition, computation of shadow prices may be done directly for simple problems where the graphical technique is used. The second part of the chapter considers the general structure of these problems. Basic solutions are introduced, and vertices are identified as basic solutions. The computation of basic solutions is straightforward, so this observation provides a means of locating the vertices of the feasible region. These observations provide a basis for algorithms to determine the solution. The concluding part of the chapter discusses duality theory and the theory underlying dual or shadow prices.

Chapter 5 is concerned with functions of one variable and begins by showing how various functions (demand functions, present value functions, and so on) arise in economics. A demand function is written p(q), relating price to quantity demanded. To measure price–demand responsiveness, one must consider the price p(q′) at which some alternative demand q′ would be forthcoming. This naturally leads to the calculation p(q′) − p(q), or [p(q′) − p(q)]/(q′ − q), giving the per-unit price variation associated with a quantity variation. Similarly, if a single firm supplies the market, the total revenue to the firm is r(q) = q·p(q), and if the cost of output level q is c(q), the firm's profit is π(q) = r(q) − c(q). In this way, functions arise routinely in economic problems, and calculations such as the increase in cost per unit increase in output, [c(q′) − c(q)]/(q′ − q), the marginal cost, lead naturally to calculus and derivatives.
Working with functions requires an understanding of the basic concepts of continuity and differentiability, which are introduced in the early part of the chapter. Following this, a variety of functions are described. The log and exponential functions are discussed in detail since they appear regularly in economic calculations, in particular in calculations concerning growth and present value. Various rules of calculus are described for combinations of functions (sums, products and ratios). Finally, higher-order derivatives are introduced.

In Chapter 6, a number of applications illustrate the use of functions in economics. Optimization for functions of one variable is discussed in detail by considering the behavior of the first and second derivatives. Subsequently, the determination of optima is related to concavity or convexity of the function, and this in turn to properties of the derivatives. Following the discussion of optimization with a single variable, a range of applications is considered. A model of externalities is developed to describe the free-rider problem and market inefficiency. Then the elasticity function is described, and this is later applied to the study of taxation and deadweight loss. For example, if the pre-tax equilibrium is at Q^e with price p^e and the post-tax output and price are Q^t, p^t, the impact of the tax is a
quantity reduction of Q^e − Q^t and a price increase of p^t − p^e. The deadweight loss is the area between Q^t and Q^e above the supply curve and under the demand curve. In the linear case, this is easy to compute, giving an expression for deadweight loss. In this chapter, comparative statics are introduced by considering the impact of parameter changes on the equilibrium or the optimal level of a variable. The incidence of a product tax is considered in detail. Finally, the chapter concludes with a discussion of Ramsey pricing.

Chapter 7 introduces functions of many variables and systems of nonlinear equations. The chapter begins with the definition of a partial derivative and the notion of a level contour of a function. This is illustrated with the familiar isoquant, where curvature is described by the marginal rate of substitution. Partial derivatives are used to develop the notion of a Cournot equilibrium in the context of oligopoly. The market inverse demand function is given by the equation p = f(Q), where p is the market price and Q the aggregate quantity. There are two firms, labeled 1 and 2. If firm i supplies quantity q_i, the aggregate output and supply is Q = q1 + q2. The cost per unit of output for firm i is given by c_i, so that if firm i produces q_i, its cost of production is c_i q_i. When aggregate output is Q = q1 + q2, the revenue of firm i is q_i f(Q). Thus, the total profit of firm i is π_i(q1, q2) = q_i f(q1 + q2) − c_i q_i.
This notation makes explicit the dependence of profit on the output of each firm and illustrates how multivariate functions may arise. The analysis of firm behavior in the linear demand case p = a − bQ may be taken further here. In this case, the profit of firm i is π_i(q_i, q_j) = p q_i − c_i q_i, where i ≠ j. Since price depends on the quantities, π_i(q_i, q_j) = [a − b(q1 + q2)] q_i − c_i q_i, where i ≠ j. This can be rearranged: π_i(q_i, q_j) = [(a − c_i) − b(q1 + q2)] q_i; again i ≠ j. How a firm decides on its level of output depends on how the other firm is expected to behave. A myopic assumption for a firm, say firm 1, is that firm 2 will not adjust its output in response to the choice of firm 1. In general, when one firm adjusts its output, the other firm will also wish to do so. In the particular case where both firms' quantity levels are such that neither firm wants to adjust its output level, the output levels are equilibrium levels. The fact that a firm does not wish to change its level of output is the relevant issue; that this decision is based on the naive assumption that the other firm would not respond is irrelevant. If firm i takes as given the output of firm j and maximizes profit (by differentiating with respect to q_i), this gives: (a − c_i) − 2b q_i − b q_j = 0, i ≠ j. Solving for q_i gives q_i = [(a − c_i) − b q_j]/(2b). In particular, q1 = [(a − c1) − b q2]/(2b) and q2 = [(a − c2) − b q1]/(2b). Denote these expressions q1(q2) and q2(q1), respectively.
These are depicted in Figure 1.2: the interpretation is that when firm 2 supplies the quantity q2, firm 1 would wish to choose the output level q1(q2), and vice versa.

Figure 1.2: Reaction curves

Note that at the quantity pair (q1*, q2*), firm 1 wishes to choose q1* given that firm 2 is supplying q2*, and similarly for firm 2. Substituting q2(q1) into q1(q2):

q1 = (a − c1)/(2b) − (1/2) q2 = (a − c1)/(2b) − (1/2){(a − c2)/(2b) − (1/2) q1}
   = [(a − c1) − (1/2)(a − c2)]/(2b) + (1/4) q1.   (1.3)

Thus, the solution values for q1 and q2, (q1*, q2*), are:

q1* = [4/(6b)][(a − c1) − (1/2)(a − c2)],   q2* = [4/(6b)][(a − c2) − (1/2)(a − c1)].   (1.4)
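The equilibrium formulas (1.4) can be checked against the reaction functions numerically; the demand and cost parameters below are illustrative assumptions.

```python
# Cournot duopoly with linear demand p = a - b*Q. Parameter values are
# illustrative assumptions. Verifies the closed form (1.4) against the
# best-response (reaction) functions q_i = ((a - c_i) - b*q_j) / (2b).
a, b = 100.0, 2.0
c1, c2 = 10.0, 16.0

# Closed-form equilibrium quantities from (1.4).
q1 = (4 / (6 * b)) * ((a - c1) - 0.5 * (a - c2))
q2 = (4 / (6 * b)) * ((a - c2) - 0.5 * (a - c1))

def br(ci, qj):
    """Best response of a firm with unit cost ci to rival output qj."""
    return ((a - ci) - b * qj) / (2 * b)

# At equilibrium, each firm's output is a best response to the other's.
assert abs(q1 - br(c1, q2)) < 1e-9
assert abs(q2 - br(c2, q1)) < 1e-9
```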
Further development of the Cournot equilibrium allows a characterization of the price markup over marginal cost in terms of the elasticity. In addition, the Herfindahl–Hirschman index of industry concentration may be defined and related to the equilibrium markup in Cournot competition. This chapter also develops the techniques of comparative statics and provides a discussion of the implicit function theorem. Given two functions f and g of two variables (x and y), suppose that for some parameter α the functions satisfy α = f (x, y) and 0 = g(x, y). Then, at a solution (x, y) the total differentials (with a
variation in α, dα) give dα = f_x(x, y)dx + f_y(x, y)dy and 0 = g_x(x, y)dx + g_y(x, y)dy, where, for example, f_x(x, y) is the partial derivative of f with respect to x at (x, y). Rearranging in matrix form:

[ dα ]   [ f_x(x, y)  f_y(x, y) ] [ dx ]
[ 0  ] = [ g_x(x, y)  g_y(x, y) ] [ dy ].

This sets up the x and y variations naturally in terms of the partial derivatives and dα:

[ dx ]   [ f_x(x, y)  f_y(x, y) ]⁻¹ [ dα ]
[ dy ] = [ g_x(x, y)  g_y(x, y) ]   [ 0  ].
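A small numerical sketch of this comparative statics calculation, using the assumed functions f(x, y) = xy and g(x, y) = x − y², for which the solution is known in closed form and so can be differentiated directly as a check:

```python
# Comparative statics sketch for the system alpha = f(x, y), 0 = g(x, y),
# with illustrative (assumed) choices f(x, y) = x*y and g(x, y) = x - y**2.
# The solution is explicit here: y = alpha**(1/3), x = alpha**(2/3).
alpha = 4.0
y = alpha ** (1 / 3)
x = alpha ** (2 / 3)

# Partial derivatives at the solution.
fx, fy = y, x           # from f = x*y
gx, gy = 1.0, -2.0 * y  # from g = x - y**2

# (dx, dy) = inverse of [[fx, fy], [gx, gy]] applied to (d_alpha, 0).
det = fx * gy - fy * gx
d_alpha = 1.0
dx = (gy * d_alpha) / det    # first component of the inverse times (d_alpha, 0)
dy = (-gx * d_alpha) / det   # second component

# Compare with derivatives of the explicit solution x(alpha), y(alpha).
assert abs(dx - (2 / 3) * alpha ** (-1 / 3)) < 1e-9
assert abs(dy - (1 / 3) * alpha ** (-2 / 3)) < 1e-9
```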
The formulation makes use of matrix algebra and highlights the fact that the matrix of partial derivatives must be invertible. The chapter proceeds to discuss the existence of solutions of nonlinear systems of equations and their behavior when parameters change. An informal discussion leads to the implicit function theorem. Following this, a variety of examples, including familiar macroeconomic multiplier models, are developed to illustrate the application of the theorem.

Chapter 8 introduces Taylor series expansions. A Taylor series expansion involves approximating a continuous function by a polynomial function, with higher-order polynomials giving better approximations or greater accuracy. The value of this technique is substantial since polynomial functions are relatively simple and, if the approximation is good, it provides important information on the properties of the original function. The chapter begins with an example showing a quadratic approximation of a function and then proceeds to describe expansions of order n. Some further discussion shows how accuracy improves as the order of the expansion increases. Following this, the chapter develops some applications of Taylor series expansions, beginning with a proof that concavity implies a negative second derivative. This application shows the usefulness of the expansion. A second application concerns numerical optimization. Much of the logic for numerical optimization routines derives from the fact that a smooth function may be approximated locally by a quadratic function. This observation forms the basis for many numerical routines. The ideas behind this are developed in the chapter, and a variety of examples illustrate how the process may be used to locate a maximum or minimum of a function f(x). Exactly the same process may be used to locate a root of a function (a value of x at which the function equals zero, f(x) = 0).
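The root-finding idea can be sketched with Newton's method, which repeatedly replaces f by its first-order Taylor approximation and steps to the root of that line; the function and starting point below are illustrative assumptions.

```python
# Newton's method: approximate f near x by its first-order Taylor
# expansion f(x) + f'(x)(z - x) and step to the root of that line.
# The function f and the starting guess are illustrative assumptions.
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Example: f(x) = x**2 - 2 has root sqrt(2).
root = newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
assert abs(root * root - 2) < 1e-10
```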
Multidimensional generalizations are described. The chapter then discusses risk aversion using Taylor series expansions
to provide measures of risk, of risk premia, and of welfare loss resulting from small risks. Finally, a proof of Taylor's theorem is given for both the univariate and multivariate cases.

Vectors are discussed in Chapter 9. A vector can be viewed from a few different perspectives. In one, as a rectangular array of numbers, it is a matrix with either one row or one column. A second interpretation views a vector as a direction, each coordinate representing a length in the corresponding direction. Thus x = (x1, x2) is viewed as x1 units in direction 1 and x2 units in direction 2. In this view, it makes sense to describe vectors in terms of direction, perpendicularity (or orthogonality), length in a given direction, and so on. In particular, using vectors it is possible to calculate the direction in which a function increases (or decreases) most rapidly. This perspective facilitates the study of many problems, including optimization. Specifically, considering a function f(x1, x2), those values of x = (x1, x2) that give a constant value for the function f determine a "level contour" of the function f. So, the set F = {(x1, x2) | f(x1, x2) = c} is the level contour of the function f at level c. Such curves appear in economics as, for example, isocost curves or indifference curves. If f is a utility function, the set F is the set of consumption combinations that give the same level of utility. It is common to consider the slope of the indifference curve by considering the variations of x1 and x2 that leave utility constant: 0 = f_x1(x1, x2)dx1 + f_x2(x1, x2)dx2, with the marginal rate of substitution given by dx2/dx1 = −f_x1(x1, x2)/f_x2(x1, x2). But this information is also contained in the vector (f_x1(x1, x2), f_x2(x1, x2)), together with the absolute magnitudes of the partial derivatives. It turns out that, viewed as a direction, ∇f = (f_x1(x1, x2), f_x2(x1, x2)) points in the direction of steepest increase in the function f. Figure 1.3 illustrates this observation. Starting from x, the value of the function f increases most rapidly in the direction ∇f, and ∇f is perpendicular to
the level surface of f at x. This fact is used extensively in the discussion of constrained optimization. Finally, Chapter 9 states and proves Farkas' lemma. This useful result may be used to prove the main theorems of linear programming and provides valuable insight into the logic behind important results in constrained optimization theory.

Figure 1.3: Direction of steepest increase

Chapter 10 discusses quadratic forms. A quadratic form is an expression x′Ax, where x is a vector and A a matrix. In the univariate case, this just amounts to ax². An important question concerns the circumstances in which such an expression is always positive or always negative: for example, what properties of A imply that x′Ax > 0 for all x ≠ 0? Such a matrix is called a positive definite matrix. When x′Ax < 0 for all x ≠ 0, the matrix is called negative definite. The standard characterization is given in terms of the leading principal minors of the matrix (the leading principal submatrix of order k in an n × n matrix is obtained by deleting the last n − k rows and n − k columns). Positive and negative semidefiniteness correspond to the cases where x′Ax ≥ 0 (or x′Ax ≤ 0) for all x. An important related question concerns positive or negative definiteness when the admissible x are constrained: under what conditions is x′Ax < 0 or x′Ax > 0 for all nonzero x satisfying a set of linear restrictions, Bx = 0? This question arises in constrained optimization. Here, a full characterization is given in terms of an augmented (bordered) matrix and its leading principal minors.

Chapter 11 is the first of three chapters on optimization, and considers multivariate optimization where there are no constraints on the choice variables. The chapter begins with a discussion of the two-variable case, that of maximization or minimization of a function f(x1, x2). For example, the standard profit-maximization problem writes profit as π(x1, x2) = p·h(x1, x2) − c1 x1 − c2 x2, where h is a production function, p the price of output and c_i the unit cost of input i.
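The leading-principal-minor test described above can be sketched for a symmetric 2 × 2 matrix (the matrices below are illustrative assumptions): A is positive definite exactly when both leading principal minors are positive.

```python
# Leading-principal-minor (Sylvester) test for a symmetric 2x2 matrix:
# positive definite iff the order-1 minor a11 and the order-2 minor
# det(A) are both positive. The test matrices are illustrative assumptions.
def is_positive_definite_2x2(a11, a12, a22):
    minor1 = a11                       # leading principal minor of order 1
    minor2 = a11 * a22 - a12 * a12     # leading principal minor of order 2
    return minor1 > 0 and minor2 > 0

assert is_positive_definite_2x2(2.0, 1.0, 2.0)      # x'Ax = 2x1^2 + 2x1x2 + 2x2^2 > 0
assert not is_positive_definite_2x2(1.0, 3.0, 1.0)  # det = 1 - 9 < 0: indefinite
```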
The first- and second-order conditions for a local maximum or minimum are developed in detail, motivated using Taylor series expansions and the Hessian matrix (of second-order partial derivatives). Following this, the envelope theorem is described at length: given a function f(x, α) depending on a choice variable x and a parameter α, if x(α) maximizes f(x, α), how does f*(α) = f(x(α), α) vary with
α? This is the analog of a shadow or dual price in linear programming and can be viewed as a measure of the value of the resource α. The chapter considers the n-variable case, generalizing the initial discussion. The chapter concludes by establishing the connection between concavity, convexity and the properties of the Hessian matrix.

In Chapter 12, maximizing or minimizing a function subject to equality constraints is considered. Familiar problems are the maximization of utility u(x1, ..., xn) subject to a budget constraint Σ p_i x_i = y, and minimization of the cost, c = Σ w_i x_i, of achieving a level of output y with a production function f(x1, ..., xn), so that f(x1, ..., xn) = y. The following discussion describes utility maximization, to motivate subsequent discussion.
An illustrative example concerns an individual's optimization problem. A consumer's utility depends on the consumption of food, f, and drink, w, according to the formula u(f, w) = f·w. The respective prices of food and drink are p_f and p_w, and the consumer's income is I. The problem for the consumer is to maximize utility subject to spending no more than I. Thus, the consumer is restricted to choosing those pairs (f, w) that satisfy p_f f + p_w w = I. This equation can also be written w = I/p_w − (p_f/p_w) f, a straight-line relation between w and f with slope −(p_f/p_w). See Figure 1.4. The set of pairs (f, w) that yield a given level of utility ū is given by the equation ū = f·w. This determines a relation between f and w, w = ū/f, which gives for any value of f the level of w (= ū/f) that achieves utility ū. When ū = 1, f·w = 1 so that w = 1/f; when ū = 2, f·w = 2
Figure 1.4: Indifference curves and optimal choices
so that w = 2/f; and when ū = 3, f·w = 3 so that w = 3/f. These calculations indicate that higher levels of utility correspond to higher curves, as in Figure 1.4.
Thus, the highest utility attainable is reached by going to the highest curve having some point in common with the budget line p_f f + p_w w = I.
Figure 1.5: Non-optimal, optimal and non-feasible choices
Figure 1.5 illustrates a number of possibilities regarding indifference curves and the budget constraint: (a) feasible points on the indifference curve are not optimal; (b) this is the only possible optimal situation, and there is a unique optimal point (f*, w*); (c) there are no affordable points giving the utility ū. At the optimal point the slope of the curve is −w*/f*, while the slope of the straight line is constant at −p_f/p_w. Since these slopes are equal at the tangency, −w*/f* = −p_f/p_w, and so p_w w* = p_f f*: the same amount is spent on food and drink. Since p_w w* + p_f f* = I and p_w w* = p_f f*, I = 2 p_w w* = 2 p_f f*. Thus w* = I/(2 p_w) and f* = I/(2 p_f).
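The closed form w* = I/(2 p_w), f* = I/(2 p_f) can be verified numerically; the prices and income below are illustrative assumptions.

```python
# Consumer problem sketch: maximize u(f, w) = f*w subject to the budget
# p_f*f + p_w*w = I. Prices and income are illustrative assumptions.
# Closed form derived in the text: w* = I/(2*p_w), f* = I/(2*p_f).
p_f, p_w, income = 2.0, 5.0, 100.0

f_star = income / (2 * p_f)
w_star = income / (2 * p_w)

# Check the tangency condition: MRS (= w/f for u = f*w) equals p_f/p_w.
assert abs(w_star / f_star - p_f / p_w) < 1e-12

# Crude optimality check: no budget-feasible point on a grid does better.
grid_best = max(
    f * (income - p_f * f) / p_w
    for f in (income / p_f * k / 100 for k in range(1, 100))
)
assert f_star * w_star >= grid_best - 1e-9
```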
Mathematically, this can be viewed as follows. The utility function is a function of food and drink, u(f, w). The extra utility from increasing f a small amount is ∂u/∂f, the marginal utility of food at the consumption level (f, w). Likewise, the marginal utility of drink at the consumption level is denoted ∂u/∂w. If at (f, w) the level of food consumption is changed a small amount, df, the impact on utility is (∂u/∂f)df. If at (f, w) the levels of food and drink consumption are changed by small amounts, df and dw, the impact on utility is (∂u/∂f)df + (∂u/∂w)dw. If the change in the pair (f, w) to (f + df, w + dw) is such that no change in utility occurs, then it must be that 0 = (∂u/∂f)df + (∂u/∂w)dw. Since the marginal utilities are positive (more food or more drink raises utility), an increase in one must be balanced by a reduction in the other to maintain the same utility level. When f changes and w changes to keep utility constant, it must be that:

dw = −[1/(∂u/∂w)](∂u/∂f) df   or   dw/df = −(∂u/∂f)/(∂u/∂w).

This gives the slope of the indifference curve at (f, w). From the previous discussion, this must equal the slope of the budget constraint at the optimal point where utility is maximized subject to the budget constraint. So, at the optimal point,

(∂u/∂f)/(∂u/∂w) = p_f/p_w.
The ratio of the marginal utilities is called the marginal rate of substitution, so at the optimal point the marginal rate of substitution is equal to the price ratio. However, this procedure is ad hoc; the chapter develops systematic procedures for locating the optima. The initial part of Chapter 12 deals with the two-variable case, giving necessary and sufficient conditions for local maxima and local minima. For the case of just two variables, the constraint (subject to regularity conditions) determines one variable as a function of the other, so that one variable may be eliminated. For example, if the objective is f(x1, x2) and the constraint is g(x1, x2) = c, then using the constraint to solve for x2 as a function of x1, x2(x1), the objective may be maximized as a function of one unconstrained variable, max f(x1, x2(x1)). This observation may be used to justify the second-order conditions. Following this, the n-variable case is considered. Here, the second-order conditions make use of the bordered Hessian matrix and properties derived from the theory of quadratic forms. The value of the Lagrange multiplier is interpreted as a shadow price, and a general version of the envelope theorem is given for the constrained optimization problem. In utility maximization, these calculations identify the Lagrange multiplier as the marginal value of income; in the cost-minimization case, the Lagrange multiplier is seen to be the marginal cost of additional output. Finally, this chapter discusses optimization in terms of the shape of the objective function and the constraints. The key idea is quasiconcavity (or quasiconvexity) of a function. This property is related to the bordered Hessian and used to provide a sufficient condition for a constrained global maximum.

Chapter 13 considers optimization with inequality constraints. Thus, the task is to maximize or minimize a function f(x1, ..., xn) subject to a constraint g(x1, ..., xn) ≤ c or a collection of constraints g_k(x1, ..., xn) ≤ c_k, k = 1, ..., m. The chapter begins with some motivating examples. Perhaps the central idea in the chapter is the observation that the direction of improvement in the objective
must lie between the gradients of the constraints in the nonfeasible direction.
[Figure 1.6: Inequality-constrained optimization. The figure plots the constraint curves g1(x, y) = y + x³ − 1 = 0 and g2(x, y) = y + x⁴ − 2 = 0 together with a level curve of the objective f, and shows the gradients ∇g1, ∇f and ∇g2 at the optimum, with ∇f lying between ∇g1 and ∇g2.]

Let the feasible region be defined as those pairs (x1, x2) satisfying g1(x1, x2) ≤ 0 and g2(x1, x2) ≤ 0. From the figure one may see that there are non-negative numbers λ1 and λ2 such that
∇ f = λ1 ∇ g 1 + λ2 ∇ g 2 ,
or, alternatively,
∂f/∂xi = λ1 ∂g1/∂xi + λ2 ∂g2/∂xi, for i = 1, 2. Although non-negativity of the choice variables (x1, x2) is not imposed in this example, adding non-negativity constraints is straightforward and done later in the chapter. It turns out that this condition, ∇f = λ1∇g1 + λ2∇g2, is, in general, a necessary condition for a local optimum.
The benefit of this is obvious: it provides a system of equations that helps to locate the solution (note, though, that there are more unknowns than equations once the λ variables are included). However, one problem with this insight is that in certain (exceptional) circumstances this equation fails to be a necessary condition for an optimum. This arises, for example, when the constraints have the same gradient at the candidate solution. Recognizing this issue, the constraint qualification is a requirement on the constraints ensuring that the expression ∇f = λ1∇g1 + λ2∇g2 is in fact necessary at a solution. The chapter motivates the constraint qualification with a number of examples. Following this, the non-negativity constraints are introduced, leading to the familiar Kuhn–Tucker conditions. Chapter 14 discusses integration – the integral of a function. Most of the chapter is spent on applications, beginning with consumer and producer surplus. The
chapter works through a range of problems: recovering total cost from marginal cost, calculating deadweight loss from taxation, inequality measures, Ramsey pricing and welfare measures in consumer theory. In Chapter 15, the theory of eigenvalues is developed. Many problems involving the use of matrices in economics have a dynamic aspect where a system evolves according to a rule of the form x_{t+1} = Ax_t, where A is a square matrix. Systems of this form arise in evolutionary models of population change, in Markov processes, matrix difference equations and so on. The theory of eigenvectors and eigenvalues facilitates the study of such systems. An eigenvector–eigenvalue pair for a matrix A is a vector v and scalar λ such that Av = λv. The vector v is called an eigenvector and the scalar λ an eigenvalue. Typically, an n × n matrix A has n distinct eigenvalues, λ1, ..., λn, and a corresponding set of linearly independent eigenvectors, v1, ..., vn. Arranging the eigenvectors in a matrix P = [v1, ..., vn], the set of equations Avi = λi vi may be written AP = PΛ, where Λ is a diagonal matrix with λ1, λ2, ..., λn on the main diagonal and 0 elsewhere. Thus, Λ is a diagonal matrix with the eigenvalues on the diagonal. With P non-singular, this may be written A = PΛP⁻¹. Considering A² = PΛP⁻¹PΛP⁻¹ = PΛ²P⁻¹, A³ = PΛ³P⁻¹, and in general Aᵗ = PΛᵗP⁻¹, higher powers of A are expressible in terms of powers of Λ. Therefore, the long-run behavior of Aᵗ is determined by Λ. This property has many applications, such as in the study of the long-run behavior of Markov processes, in the study of the dynamics of population growth, and so on. In this chapter, eigenvalues are used to study the evolution of different age cohorts in a population and to provide a model of webpage ranking based on the long-run dynamics of a Markov process describing the transition dynamics between webpages. Dynamic models in economics are frequently developed in terms of differential or difference equations. Chapter 16 develops the theory of differential equations. In the simplest case, a differential equation expresses the derivative of some variable, y′(t), as a function of its current value y(t) and the independent variable t: y′(t) = f(y(t), t). (Such an equation is called a first-order differential equation. Regarding terminology, a second-order differential equation has the form y′′(t) = f(y′, y, t), and so on.) The problem is to determine the function y(t) from this equation. For example, if the value of a loan p grows according to the formula p′(t) = rp(t) = f(p(t), t), where r is the rate of interest, what is p(t)? Here, one can see that p(t) = p(0)e^{rt} is the required function, where p(0) is the initial value of the loan. In general, determining the function y that satisfies y′(t) = f(y(t), t) is very difficult. (For continuous functions f, a solution always exists; the solution is unique if, in addition, the function f satisfies a "Lipschitz" condition.) This difficulty leads to a focus on tractable problems. The chapter begins with
(first-order) linear differential equations, y′(t) = b(t)y(t) + c(t), or y′(t) = by(t) + c when the coefficients are constant. These equations are solved using standard techniques in the early part of the chapter. Then two useful, but non-linear, first-order differential equations are considered – the logistic and Bernoulli equations. Following this, second- and higher-order differential equations are discussed. A second-order linear differential equation with constant coefficients has the form ay′′(t) + by′(t) + cy(t) + d = 0. Such a differential equation is again solved by standard techniques, but the study of stability (the behavior of the solution y(t) as a function of t) is more involved. The chapter concludes with a discussion of matrix differential equations and the stability of such systems. In Chapter 17, linear difference equations are discussed. A linear difference equation has the form y_t = a y_{t−1} + b. In contrast to a differential equation, in a difference equation variables move in discrete increments. A kth-order linear difference equation has the form y_t = a1 y_{t−1} + a2 y_{t−2} + ··· + ak y_{t−k} + b. The solutions of such systems are relatively straightforward. A slight generalization occurs when the parameter b varies, say as b_t – this is called the variable forcing function case and is discussed for a variety of models relating b to t. The second-order difference equation, y_t = a1 y_{t−1} + a2 y_{t−2}, is studied in detail, in particular the stability and dynamic behavior of y_t. The conditions described here apply to autoregressive processes of order 2: y_t = a1 y_{t−1} + a2 y_{t−2} + ε_t, where ε_t is a random error term. The chapter concludes with a discussion of systems of difference equations.
Chapter 18 introduces some basic material on probability and random variables. The density and distribution function of a random variable are defined. The chapter then develops a few of the fundamental discrete distributions: the Bernoulli, Poisson and binomial distributions. Following this, continuous distributions are introduced. The chapter discusses in detail four important distributions that arise routinely in statistical theory and are the key distributions that appear in hypothesis testing. These are the normal, chi-square, t and F distributions. The relations between these distributions are explored in detail. In particular, the chi-square density is derived from the normal density; the t distribution is derived from the normal and chi-square distributions; and the F distribution is derived from the chi-square distribution. The concluding chapter, Chapter 19, introduces estimation and some basic properties of estimators. The law of large numbers and the central limit theorem are described. The chapter then proceeds to review confidence intervals and develop the fundamentals of hypothesis testing, including the role of various distributions. Least squares estimation is reviewed and the use of major test statistics (t and F) is explained.
2 Matrices and systems of equations
2.1 Introduction
In linear algebra, a rectangular array of numbers is called a matrix. For many problems in mathematics, economics, and other fields, arranging numbers in such blocks turns out to be very useful – for example, in solving systems of equations, analyzing the evolution of dynamic systems and so on. Working with blocks of numbers requires a methodology for the manipulation of such arrays, so that addition, subtraction, multiplication and other algebraic procedures are defined. Sections 2.2 and 2.3 introduce arrays of numbers (vectors and matrices) and provide the basic rules of manipulation (addition, subtraction and multiplication). Addition, subtraction and multiplication are natural extensions of the scalar rules. In contrast, matrix inversion is more involved, requiring a number of steps, described in Section 2.3.4. Matrix inversion is defined for square matrices but not all square matrices are invertible. For square matrices, a summary statistic, the determinant, identifies key properties of a matrix. In particular, a square matrix can be inverted if and only if it has a non-zero determinant. The determinant of a matrix is discussed in Section 2.3.5. Inverting a matrix often arises in solving systems of equations, but it is not the only approach: Sections 2.3.6 and 2.3.7 provide alternative methods. In Section 2.4, linear dependence, a key concept in linear algebra, and independence are discussed. From this, the rank of a matrix is defined and matrix rank is used to discriminate between equation systems with many, one or no solutions. Section 2.5 provides a geometric perspective on the task of solving equation systems and proceeds to give necessary and sufficient conditions in terms of rank and linear independence for a system to be soluble. Section 2.6
concludes with an introduction to some special matrices. The remainder of this section provides some elementary motivation for the use of matrices.
2.1.1 Some motivating examples

The following examples illustrate how matrices arise naturally in a variety of areas and where the matrix formulation greatly simplifies subsequent analysis. The first example introduces adjacency matrices, a simple way to represent "closeness" or adjacency between entities – in this case cities, where closeness is represented by the minimum number of flights necessary to travel from one city to another.

EXAMPLE 2.1: Suppose there are four airports, A, B, C and D. The figure shows the flights available from each airport. The only flight from A is to B; from C, flights are available to A and D – and so on. These flight options may be represented in an array, with rows identified with flight origin and columns identified with flight destination. Thus, in the first row (corresponding to A) and the second column (corresponding to B), there is a "1", indicating that there is a flight from A to B. In the third row (corresponding to C), there is a "1" in columns 1 and 4, corresponding to destinations A and D: it is possible to take a flight from C to either A or D.
[Figure: the flight network between the four airports A, B, C and D.]

      A  B  C  D
  A   0  1  0  0
  B   0  0  1  0
  C   1  0  0  1
  D   0  0  1  0
Write M or M1 for this array or matrix of numbers. Consider now the destinations reached if exactly two flights are taken. From A there is one possible destination, C , reached by a flight to B and then to C . From B there are two possible destinations: A and D with corresponding flight paths
B → C → A and B → C → D. In matrix M2, these calculations are represented. Looking at the first row, there is just one possible destination, C,
with two flights from A. This is indicated by "1" in column 3 of row 1 of M2. Similarly, from B there are two possible destinations if exactly two flights are taken: A and D. In matrix M2, this is represented by a "1" in row 2 (corresponding to B) column 1 (corresponding to A), and in row 2 column 4 (corresponding to D). Matrix M3 indicates the destinations with exactly three flights. Finally, the matrix M(2) identifies the destinations reachable with two or fewer flights (two or one flights). Simple matrix rules may be used to generate matrices such as M3 or M(2).

M = M1 =        M2 =
  0 1 0 0         0 0 1 0
  0 0 1 0         1 0 0 1
  1 0 0 1         0 1 1 0
  0 0 1 0         1 0 0 1

M3 =            M(2) =
  1 0 0 1         0 1 1 0
  0 1 1 0         1 0 1 1
  1 0 1 1         1 1 1 1
  0 1 1 0         1 0 1 1
♦
The next example describes a standard demand system. Demand for a good typically depends not just on the good's own price, but also on the price of related goods. For example, if two goods are substitutes, such as butter and margarine, one expects the price of one good to affect the demand for the other. But, with such cross-price effects, it becomes more difficult to determine the equilibrium price in each market. The next example illustrates this issue.
EXAMPLE 2.2: This example considers equilibrium price determination. To illustrate, consider the case where there are two markets, 1 and 2, defined by the supply and demand conditions in each market. Suppose the demand and supply in the pair of markets is described by the following equations:

D1 = 20 − 3p1 + p2,    S1 = 2 + 2p1
D2 = 10 + 2p1 − 4p2,   S2 = 4 + 3p2.

So, for example, a unit increase in the price of good 1 reduces the demand for good 1 by 3 units, while an increase in the price of good 2 raises the demand for good 1 (possibly because good 1 is a substitute for good 2, with the higher price for good 2 causing consumers to switch to good 1).
Setting supply equal to demand in each market gives:

20 − 3p1 + p2 = 2 + 2p1
10 + 2p1 − 4p2 = 4 + 3p2,

and rearranging,

5p1 − p2 = 18
−2p1 + 7p2 = 6.
These equations determine the price pair (p1, p2) at which both markets clear (supply equals demand in each market). The solution is (p1, p2) = (4, 2). However, finding such solutions is more difficult when the number of markets is large, with correspondingly large numbers of prices to be determined. This pair of equations is rearranged in matrix form in Example 2.5. The procedure generalizes directly to any number of equations and variables and provides a systematic way of dealing with and solving large systems of equations. ♦
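As a quick arithmetic check, a 2 × 2 system like this can be solved directly with Cramer's rule. The short script below is an illustrative sketch (the variable names are ours, not the text's):

```python
# Market-clearing system from the example:
#    5*p1 -   p2 = 18
#   -2*p1 + 7*p2 =  6
# For a*x + b*y = e and c*x + d*y = f with a*d - b*c != 0, Cramer's rule
# gives x = (e*d - b*f)/(a*d - b*c) and y = (a*f - e*c)/(a*d - b*c).
a, b, e = 5.0, -1.0, 18.0
c, d, f = -2.0, 7.0, 6.0

det = a * d - b * c           # 35 - 2 = 33
p1 = (e * d - b * f) / det    # (126 + 6) / 33 = 4
p2 = (a * f - e * c) / det    # (30 + 36) / 33 = 2
print(p1, p2)                 # 4.0 2.0
```

Both markets clear at (p1, p2) = (4, 2), matching the solution above.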
The final example focuses on the dynamic evolution of economic variables. At any point in time, suppose that the current state of the economy evolves to a future state according to a law of motion. Analyzing the longrun behavior of such economic variables is simplified by matrix techniques. The next example illustrates such a dynamic model.
EXAMPLE 2.3: The Phillips curve asserts that the change in inflation is proportional to the gap between the level of unemployment (u) and a natural rate of unemployment (ū). If π denotes the rate of inflation, with subscripts for time periods, then π_t − π_{t−1} = −α(u_t − ū). Okun's law connects changes in unemployment to output changes, which are in turn connected to the inflation rate. Assume that this relation has the form u_t − u_{t−1} = −β(g − π_t), where g is the growth in the money supply. Rearranging these equations:

π_t = π_{t−1} − α(u_t − ū)
u_t = u_{t−1} − β(g − π_t).
Suppose that α = β = 1/2, so the equations are π_t = π_{t−1} − (1/2)(u_t − ū) and u_t = u_{t−1} − (1/2)(g − π_t). Substituting the expression for u_t into the first equation and substituting the expression for π_t into the second equation, some rearrangement gives:

π_t = (4/5)π_{t−1} − (2/5)u_{t−1} + [(1/5)g + (2/5)ū]
u_t = (2/5)π_{t−1} + (4/5)u_{t−1} + [−(2/5)g + (1/5)ū].

From these equations, the pair (π_t, u_t) is determined by (π_{t−1}, u_{t−1}), so the evolution of the system is determined by the coefficients on π_{t−1} and u_{t−1}. This system can be arranged as a matrix system where the dynamics are determined by the matrix terms (see Example 2.5).
♦
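The behavior of such a system can be seen by iterating the two equations from an arbitrary starting point. The values chosen below for g, ū and the initial pair are illustrative assumptions, not from the text:

```python
# Iterate  pi_t = (4/5)pi_{t-1} - (2/5)u_{t-1} + (1/5)g + (2/5)u_bar
#          u_t  = (2/5)pi_{t-1} + (4/5)u_{t-1} - (2/5)g + (1/5)u_bar
g, u_bar = 0.05, 0.06     # assumed money growth and natural unemployment rate
pi, u = 0.02, 0.09        # assumed starting values

for _ in range(300):
    pi, u = (0.8 * pi - 0.4 * u + 0.2 * g + 0.4 * u_bar,
             0.4 * pi + 0.8 * u - 0.4 * g + 0.2 * u_bar)

print(round(pi, 6), round(u, 6))   # approaches (g, u_bar) = (0.05, 0.06)
```

The pair (π, u) settles at (g, ū), the fixed point of the system; how quickly it settles is governed by the coefficient matrix, a point taken up again in Chapter 15.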
The following discussion focuses on two types of array – vectors, where the array has one row or one column, and matrices, where the number of rows and columns is arbitrary.
2.2 Vectors

Definition 2.1: A VECTOR in Rⁿ is an ordered collection (row or column) of n real numbers.

Throughout, lower case bold letters, such as a, b and c, will denote vectors. To highlight the dimension of a vector, say that a is an n-vector. For example, a = (1, 3) and b = (5, 2) are 2-vectors, c = (3, 4, 1, 7) is a 4-vector, and so on. If two vectors have the same number of rows or columns they may be multiplied, to form the inner product.
Definition 2.2: Let f and g be two n-vectors with f = (f1, f2, ..., fn) and g = (g1, g2, ..., gn). The INNER PRODUCT of f and g is denoted f · g and defined:

f · g = f1 g1 + f2 g2 + ··· + fn gn.

So the entries of the vectors are multiplied pairwise and added together. For example, with a and b given above, a · b = 1 × 5 + 3 × 2 = 11. Note that the order does not matter since f · g = g · f. Equations may be represented using this notation. The equations x + y = 5 and 2x + 3y = 12 may be written as (1, 1) · (x, y) = 5 and (2, 3) · (x, y) = 12.
There is a convenient way to represent f1 g1 + f2 g2 + ··· + fn gn. Suppose there are 20 numbers, labeled x1, x2, ..., x20. Adding the first ten can be written x1 + x2 + ··· + x10, but there is also a shorthand notation for this, Σ_{i=1}^{10} x_i, which can be read as saying "add up the numbers labeled from 1 to 10" or "sum from 1 to 10" the numbers x_i. Similarly, to illustrate, Σ_{i=7}^{15} x_i = Σ_{i=1}^{15} x_i − Σ_{i=1}^{6} x_i. The sign Σ is called the summation sign. With this notation: f · g = Σ_{i=1}^{n} f_i g_i = f1 g1 + f2 g2 + ··· + fn gn.
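The definition translates into a one-line computation; a small illustrative sketch (the function name inner is ours):

```python
# Inner product f . g = f1*g1 + f2*g2 + ... + fn*gn
def inner(f, g):
    assert len(f) == len(g), "vectors must have the same length"
    return sum(fi * gi for fi, gi in zip(f, g))

a, b = (1, 3), (5, 2)
print(inner(a, b))                 # 1*5 + 3*2 = 11
print(inner(a, b) == inner(b, a))  # order does not matter: True
print(inner((1, 1), (3, 2)))       # the equation x + y = 5 holds at (3, 2)
```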
2.3 Matrices

Definition 2.3: A MATRIX is a rectangular array of numbers.

Throughout, upper case bold letters, such as A, B and C, will denote matrices. A matrix B is an m × n matrix if it has m rows and n columns (this is sometimes denoted by B ∼ m × n).
If B is an m × n matrix we may write it as:

B =
  b11 b12 ··· b1n
  b21 b22 ··· b2n
   ⋮   ⋮   ⋱   ⋮
  bm1 bm2 ··· bmn

For example, C is a 2 × 3 matrix:

C =
  2 11 4
  1  0 6
Sometimes, the matrix B may be denoted {b_ij}, i ∈ I, j ∈ J, where I and J are index sets, or just as {b_ij}. An element b_ij of the matrix B represents the element in the position given by the ith row and jth column, the (i, j)th element of B. There are some special matrices, such as the identity (I), null (0) and diagonal matrices (D):
I =                0 =                D =
  1 0 ··· 0          0 0 ··· 0          a11  0  ···  0
  0 1 ··· 0          0 0 ··· 0           0  a22 ···  0         (2.1)
  ⋮ ⋮  ⋱  ⋮          ⋮ ⋮  ⋱  ⋮           ⋮   ⋮   ⋱   ⋮
  0 0 ··· 1          0 0 ··· 0           0   0  ··· ann
In the diagonal matrix (D), the diagonal elements may be arbitrary numbers but the off-diagonal elements must be zero. When a matrix has just one row or one column, it is a vector.
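The special matrices of equation (2.1) are easy to construct programmatically; a small sketch using nested lists (the function names are ours):

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def null(m, n):
    return [[0] * n for _ in range(m)]

def diagonal(entries):
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]

print(identity(3))       # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(null(2, 2))        # [[0, 0], [0, 0]]
print(diagonal([4, 7]))  # [[4, 0], [0, 7]]
```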
The next sections discuss matrix algebra, in particular addition, subtraction, multiplication and inversion. The rules for addition, subtraction, multiplication are straightforward. Understanding matrix inversion requires some discussion, and the computation of the inverse involves a number of steps.
2.3.1 Matrix addition and subtraction
Let A and B be two m × n matrices, so that both A and B have the same number of rows and columns.

MATRIX ADDITION: The sum of A and B is obtained by adding the entries in the two matrices pairwise:

A + B =
  a11 + b11   a12 + b12   ···   a1n + b1n
  a21 + b21   a22 + b22   ···   a2n + b2n
      ⋮           ⋮        ⋱        ⋮
  am1 + bm1   am2 + bm2   ···   amn + bmn

MATRIX SUBTRACTION: The difference of A and B is obtained by subtracting entries pairwise:

A − B =
  a11 − b11   a12 − b12   ···   a1n − b1n
  a21 − b21   a22 − b22   ···   a2n − b2n
      ⋮           ⋮        ⋱        ⋮
  am1 − bm1   am2 − bm2   ···   amn − bmn
So the entry in the ith row and jth column of A + B is a_ij + b_ij, and the entry in the ith row and jth column of A − B is a_ij − b_ij. For example, if

A =            B =
  1 3 4          2 4 1
  5 8 1          9 0 3

then

A + B =                       A − B =
  1+2 3+4 4+1     3  7 5        1−2 3−4 4−1     −1 −1  3
  5+9 8+0 1+3  =  14 8 4        5−9 8−0 1−3  =  −4  8 −2
Note from the definition that A + B = B + A . Also, ( A + B ) + C = A + (B + C ). For
two matrices A and B , the operations of addition and subtraction are defined only if A and B have the same number of rows and columns.
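Entrywise addition and subtraction take only a few lines of code; this sketch reuses the matrices A and B from the example above (the helper names are ours):

```python
def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[1, 3, 4], [5, 8, 1]]
B = [[2, 4, 1], [9, 0, 3]]
print(add(A, B))               # [[3, 7, 5], [14, 8, 4]]
print(sub(A, B))               # [[-1, -1, 3], [-4, 8, -2]]
print(add(A, B) == add(B, A))  # A + B = B + A: True
```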
2.3.2 Matrix multiplication

Let A and B be two matrices:

A =                           B =
  a11 a12 a13 ··· a1n           b11 b12 b13 ··· b1p
  a21 a22 a23 ··· a2n           b21 b22 b23 ··· b2p
  a31 a32 a33 ··· a3n           b31 b32 b33 ··· b3p
   ⋮   ⋮   ⋮   ⋱   ⋮             ⋮   ⋮   ⋮   ⋱   ⋮
  am1 am2 am3 ··· amn           bn1 bn2 bn3 ··· bnp
So A is an m × n matrix and B is an n × p matrix. If the number of columns in A, n, equals the number of rows in B, then the matrices can be multiplied to form the matrix C = AB, where C is defined componentwise as

c_ij = a_i1 b_1j + a_i2 b_2j + ··· + a_in b_nj = Σ_{k=1}^{n} a_ik b_kj.
In words, take the i th row of the A matrix and the jth column of the B matrix and form the inner product. Note that this is possible since A has the same number of columns as B has rows so that both the i th row of the A matrix and the jth column of the B matrix are vectors with the same number of elements, n, and the inner product can be formed. For example:
A =            B =          AB =
  1 3 2          1 5 2        13 24  9
  4 3 6          2 5 1        28 47 23
                 3 2 2
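The componentwise formula translates directly into code; the sketch below multiplies the two matrices of the example (the helper name matmul is ours):

```python
def matmul(A, B):
    n = len(B)  # rows of B; must equal the number of columns of A
    assert all(len(row) == n for row in A), "shapes do not conform"
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 3, 2], [4, 3, 6]]             # 2 x 3
B = [[1, 5, 2], [2, 5, 1], [3, 2, 2]]  # 3 x 3
print(matmul(A, B))                    # [[13, 24, 9], [28, 47, 23]]
```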
EXAMPLE 2.4: Recall the adjacency matrix in Example 2.1. Write f_ij = 1 if there is a flight from city i to city j; and write f_ij = 0 if there is no flight from i to j. For notational convenience, label the cities A, B, C and D as 1, 2, 3 and 4. Then the adjacency matrix may be written:
M =
  f11 f12 f13 f14     0 1 0 0
  f21 f22 f23 f24     0 0 1 0
  f31 f32 f33 f34  =  1 0 0 1
  f41 f42 f43 f44     0 0 1 0
Consider the matrices M² = MM, obtained by multiplying M with itself, and M³ = MMM, obtained by multiplying M by itself three times. These calculations give:

M² =            M³ =
  0 0 1 0         1 0 0 1
  1 0 0 1         0 1 1 0
  0 1 1 0         1 0 1 1
  1 0 0 1         0 1 1 0
Let f²_ij be the entry in the ith row and jth column of M². From its definition:

f²_ij = f_i1 f_1j + f_i2 f_2j + f_i3 f_3j + f_i4 f_4j.

If f_ik f_kj > 0, then there is a flight from i to k and a flight from k to j; and if f²_ij = 0, then f_ik f_kj = 0 for each k and there is no airport k with a connecting flight from i to j. In this example, f²₁₃ = 1, indicating that there is exactly one route with two flights from airport 1 to airport 3. (If f²_ij = 2 this would mean that there are exactly two distinct routes connecting i and j, each involving two flights.) Thus, f²_ij is the number of distinct routes from i to j involving exactly two flights. Going further, let f³_ij be the (i, j)th entry in M³. This gives the number of distinct routes from i to j with exactly three flights. Finally, if we add M and M² together:
M + M² =
  0 1 0 0     0 0 1 0     0 1 1 0
  0 0 1 0     1 0 0 1     1 0 1 1
  1 0 0 1  +  0 1 1 0  =  1 1 1 1  = M(2).
  0 0 1 0     1 0 0 1     1 0 1 1
Thus, if f_ij(2) is the entry in the ith row and jth column of M + M², then f_ij(2) gives the number of distinct routes from i to j with either one or two flights. ♦
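The route-counting interpretation can be verified directly by computing powers of M in code (an illustrative sketch; the helper name matmul is ours):

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

#        A  B  C  D
M = [[0, 1, 0, 0],   # A
     [0, 0, 1, 0],   # B
     [1, 0, 0, 1],   # C
     [0, 0, 1, 0]]   # D

M2 = matmul(M, M)
M3 = matmul(M2, M)
M_2 = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(M, M2)]  # M + M^2

print(M2[0][2])  # 1: one two-flight route from A to C (A -> B -> C)
print(M3)        # routes using exactly three flights
print(M_2)       # routes using one or two flights
```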
With matrix multiplication, there are some basic points of caution to note. Even when the product AB can be formed, this does not imply that BA can be formed. The expression BA is not defined unless p = m (i.e., B has the same number of columns as A has rows). For example:

A =         B =         gives  AB =
  1 3         2 1                5  4
  5 1         1 1                11 6
  2 1                            5  3
but the expression B A is not defined since B has fewer columns than A has rows. If both AB and B A exist, they are generally not equal and may not be the same dimension. For example:
B =         C =         so  BC =        while  CB =
  2 1         3 0             7 2               6 3
  1 1         1 2             4 2               4 3

and
A =            B =         so  AB =           and  BA =
  1 3 5          2 1             32 27               4  10 16
  2 4 6          5 7             42 36               19 43 67
                 3 1                                 5  13 21
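The failure of commutativity is easy to confirm numerically, here with the 2 × 2 matrices B and C above (the helper name matmul is ours):

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

B = [[2, 1], [1, 1]]
C = [[3, 0], [1, 2]]
print(matmul(B, C))                  # [[7, 2], [4, 2]]
print(matmul(C, B))                  # [[6, 3], [4, 3]]
print(matmul(B, C) == matmul(C, B))  # False: BC != CB
```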
A matrix may be multiplied by a real number by multiplying each element by the number. Take a real number, λ ∈ R (where R is the set of real numbers), then λ A is defined componentwise. The ( i, j)th element of λ A is λa i j . For example:
D =         λD =             and, if λ = 2,  2D =
  1 3         λ·1 λ·3                          2 6
  4 2         λ·4 λ·2                          8 4
The operations of matrix addition and multiplication have the following properties:

A(B + C) = AB + AC
λ(A + B) = λA + λB,   λ a scalar
(λ + µ)A = λA + µA,   λ, µ scalars.

Matrices provide a succinct way of expressing systems of equations. The system of m equations in n unknowns, {x1, x2, ..., xn}:
a11 x1 + a12 x2 + ··· + a1n xn = b1
a21 x1 + a22 x2 + ··· + a2n xn = b2
   ⋮        ⋮               ⋮      ⋮
am1 x1 + am2 x2 + ··· + amn xn = bm
may be written:
  a11 a12 a13 ··· a1n     x1     b1
  a21 a22 a23 ··· a2n     x2     b2
   ⋮   ⋮   ⋮   ⋱   ⋮   ·   ⋮  =   ⋮
  am1 am2 am3 ··· amn     xn     bm
To illustrate, consider the examples introduced earlier.
EXAMPLE 2.5: Considering Example 2.2 shows how a system of equations may be arranged in matrix form using matrix multiplication:

5p1 − p2 = 18
−2p1 + 7p2 = 6

or

   5 −1     p1     18
  −2  7  ·  p2  =   6
Similarly, the Phillips curve equations in Example 2.3 may be written in matrix form:

π_t = (4/5)π_{t−1} − (2/5)u_{t−1} + [(1/5)g + (2/5)ū]
u_t = (2/5)π_{t−1} + (4/5)u_{t−1} + [−(2/5)g + (1/5)ū]

or

  π_t     4/5 −2/5     π_{t−1}      (1/5)g + (2/5)ū
  u_t  =  2/5  4/5  ·  u_{t−1}  +  −(2/5)g + (1/5)ū

♦

2.3.3 Matrix transpose
Let A be a matrix, not necessarily square. The transpose of A is obtained from
A by "rotating" A so that the ith row becomes the ith column. Write A′ for the transpose of A. The (i, j)th element of A′ is a_ji, so the (i, j)th element of the transpose is the (j, i)th element of the original matrix. In general, if A is an m × n matrix, then A′ is an n × m matrix. For example,

B =            ⇒  B′ =         D =         ⇒  D′ =
  2 4 6             2 1          3 0            3 1
  1 5 3             4 5          1 2            0 2
                    6 3

When using vectors in matrix computations, dimensions must be assigned – since matrix multiplication depends on correct dimensions. To illustrate, let x and y be the two 3 × 1 vectors:

x =       y =
  1         4
  2         5
  3         6

Then, for example, x′ = (1, 2, 3), and x′y = 1·4 + 2·5 + 3·6 = 32 = x · y. Similarly,

Bx =
  2 4 6     1     28
  1 5 3  ·  2  =  20
            3

and y′B′ = (4, 5, 6)B′ = (64, 47).
However, the expressions xB or B′y are not defined, since the matrices do not conform for matrix multiplication. The transpose operation has the following properties:

(A′)′ = A
(A + B)′ = A′ + B′
(AB)′ = B′A′
(ABC)′ = C′B′A′.

Note, for example, that if AB exists then A has the same number of columns as B has rows. Say A is an m × n matrix and B an n × p matrix: then A′ is an n × m matrix and B′ a p × n matrix, so that B′A′ exists.
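The reversal rule (AB)′ = B′A′ can be checked on small matrices; a sketch (the helper names, and the 2 × 2 matrices P and Q used for the check, are our own illustrative choices):

```python
def transpose(A):
    return [[A[i][j] for i in range(len(A))] for j in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

B = [[2, 4, 6], [1, 5, 3]]
print(transpose(B))          # [[2, 1], [4, 5], [6, 3]]

P = [[1, 2], [3, 4]]
Q = [[5, 6], [7, 8]]
lhs = transpose(matmul(P, Q))
rhs = matmul(transpose(Q), transpose(P))
print(lhs == rhs)            # True: (PQ)' = Q'P'
```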
2.3.4 Matrix inversion

When considering a single variable, division is straightforward. Given a number a, provided a ≠ 0, one can divide 1 by a to get 1/a. This term, 1/a, or in alternative notation, a⁻¹, is called the inverse of a or "a inverse". It routinely arises when solving equations. If 2x = 4, then multiplying both sides by 1/2 gives (1/2)2x = (1/2)4, and since (1/2)2 = 1 and (1/2)4 = 2, we get 1x = 2 or x = 2. The same calculation may be expressed as 2⁻¹2x = 2⁻¹4, and since 2⁻¹2 = 1, x = 2⁻¹4 = 2. For the expression ax = b, where a and b are real numbers and a ≠ 0, exactly the same procedure applies: a⁻¹ax = a⁻¹b, and with a⁻¹a = 1, x = a⁻¹b. We wish to apply exactly the same logic to matrices.
Considering the equation system Az = b, if there is a matrix, A⁻¹, such that A⁻¹A = I, then we can multiply the equation to get A⁻¹Az = A⁻¹b, or Iz = A⁻¹b, or z = A⁻¹b. For example, given the pair of equations

x + y = 5        (2.2)
2x + 3y = 12,    (2.3)
we wish to find a pair ( x∗ , y∗ ) that satisfies both of these equations. One can see that (0,4) satisfies the second equation, but not the first, while (5,0) satisfies the first but not the second. It is straightforward to find a solution by substitution. Take Equation 2.2 and express x in terms of y : x = 5 − y. This can be substituted
into Equation 2.3 to find: 2(5 − y) + 3 y = 12 or 10 − 2 y + 3 y = 12 or 10 + y = 12, so that
y = 2. Since x + y = 5, this implies that x = 3 and so the solution is ( x∗ , y∗ ) = (3, 2).
However, this solution procedure is not satisfactory for large systems of equations
and does not allow one to easily investigate circumstances under which solutions
exist or are unique. Writing the pair of equations in matrix form gives:

  1 1     x      5
  2 3  ·  y  =  12
Taking the matrix system on the left, consider multiplying both sides by an appropriate matrix:

   3 −1     1 1     x      3 −1      5
  −2  1  ·  2 3  ·  y  =  −2  1  ·  12
Carrying out the matrix multiplications on both sides gives:

  1 0     x     3           x     3
  0 1  ·  y  =  2    or     y  =  2
This "appropriate" matrix is in fact the inverse of A:

A⁻¹ =
   3 −1
  −2  1
Observe that A⁻¹A = I:

A⁻¹A =
   3 −1     1 1     1 0
  −2  1  ·  2 3  =  0 1
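For 2 × 2 matrices there is a simple closed form for the inverse, which reproduces the matrix used above; the sketch below is illustrative (the function name inv2 is ours – the general computation is developed in the sections that follow):

```python
# For A = [[a, b], [c, d]] with ad - bc != 0:
#   A^{-1} = (1/(ad - bc)) * [[d, -b], [-c, a]]
def inv2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    assert det != 0, "matrix is not invertible"
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 1], [2, 3]]
Ainv = inv2(A)
print(Ainv)            # [[3.0, -1.0], [-2.0, 1.0]]

# Solve A z = b for b = (5, 12): z = A^{-1} b
b = (5, 12)
z = [Ainv[0][0] * b[0] + Ainv[0][1] * b[1],
     Ainv[1][0] * b[0] + Ainv[1][1] * b[1]]
print(z)               # [3.0, 2.0]
```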
The following discussion considers matrix inversion, developing the computation of A⁻¹. In general, given a system of n equations with n unknowns, Az = b, where A is an n × n square matrix, assuming we can invert A, we may solve for z according to z = A⁻¹b. However, this is not always possible. Not all square matrices are invertible (have an inverse). Returning to the single equation, ax = b, there are three distinct possibilities. The first is that there is a unique solution when a ≠ 0, so a⁻¹ is defined, and the solution is given by x = a⁻¹b. A second possibility is that there is no solution – if a = 0 and b ≠ 0 it is impossible to find x satisfying ax = b. Finally, there may be an infinite number of solutions – if a = b = 0, then any number x satisfies ax = b.
The situation is essentially the same in the matrix case. Consider the following three equation systems:
x + y = 5          x + y = 5          x + y = 5
2x + 3y = 12       3x + 3y = 12       2x + 2y = 10.
The first system has a unique solution, (x∗, y∗) = (3, 2). The second system requires that x + y = 5 and 3(x + y) = 12, or x + y = 4: there is no (x, y) satisfying both equations. The third system requires that x + y = 5 and 2x + 2y = 10, or x + y = 5 – so any (x, y) satisfying the first equation also satisfies the second, and there are infinitely many solutions. If the matrix A has an inverse, A⁻¹, then A⁻¹A = I, the identity matrix, and given Az = b, we can multiply both sides by A⁻¹ to obtain A⁻¹Az = A⁻¹b or z = A⁻¹b. In this case, there is a unique solution. When A does not have an inverse, there is either no solution or an infinite number of solutions. That issue is considered in detail in Section 2.5, along with the more general case where the number of equations and the number of variables may differ. A square matrix has an inverse if and only if its determinant (defined next) is non-zero. An alternative criterion for invertibility may be given in terms of matrix rank (developed in Section 2.5.2).

REMARK 2.1: A square matrix has a non-zero determinant if and only if the columns of the matrix are linearly independent, in which case the matrix is said to be of full rank. Rank and linear independence are introduced in Section 2.4.1. The concept of linear independence provides geometric insight into the circumstances in which a matrix is invertible, but is not required to define the inverse of a matrix.
When a matrix has an inverse, the inverse can be found in a number of ways. The method described here involves finding in turn a number of matrices, leading to the inverse. Briefly, the steps are to find the determinant along with the minor, cofactor and adjoint matrices; the inverse matrix is the adjoint matrix scaled by the reciprocal of the determinant (for which reason, it must be non-zero). These are developed in turn.

REMARK 2.2: As noted, the method discussed next is not the only way to invert a matrix, but it serves the purpose of introducing determinants, minors and cofactors. Later, alternative methods of inversion are discussed.
2.3.4.1 The determinant
Let A be a square matrix. The determinant of A is denoted |A|. The determinant is defined inductively. If A = (a), so that A is a 1 × 1 matrix, then |A| = a (the determinant is equal to the number, not the absolute value). For example, if A = (−2), then |A| = −2. In the 2 × 2 case,

A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \;\Rightarrow\; |A| \stackrel{def}{=} \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} − a_{21}a_{12}.

In words, multiply the pair a_{11} and a_{22} and subtract from that the product of a_{21} and a_{12}. For the 3 × 3 case the determinant is defined:

A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}, \quad |A| = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} − a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}. \quad (2.4)
For example,

A = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 5 \\ 1 & 5 & 12 \end{pmatrix}, \quad |A| = 1(36 − 25) − 2(12 − 5) + 3(5 − 3) = 11 − 14 + 6 = 3. \quad (2.5)
Notice that the definition of the determinant for a 3 × 3 matrix makes use of determinants of 2 × 2 matrices. Similarly, the definition of the determinant of a 4 × 4 matrix utilizes 3 × 3 determinants, and the definition of the determinant of an n × n matrix utilizes (n − 1) × (n − 1) determinants. So the definition of the determinant is given inductively. The general process is as follows. Given the square matrix A, let A_{ij} be the matrix obtained from A by deleting the ith row and jth column. If A is an n × n matrix, A_{ij} is an (n − 1) × (n − 1) matrix. For the n × n matrix A, the determinant may be calculated according to the formula

|A| = a_{11}|A_{11}| − a_{12}|A_{12}| + a_{13}|A_{13}| + · · · + (−1)^{1+n} a_{1n}|A_{1n}|. \quad (2.6)
(The pattern may be seen in the computation of the determinant in Equation 2.5.) Observe that in this calculation we are required to compute determinants of the A_{ij} matrices. However, if A is an n × n matrix, then A_{ij} is an (n − 1) × (n − 1) matrix. Therefore, if we have defined the determinant of a 2 × 2 matrix, we can define the determinant of a 3 × 3 matrix. In turn, this allows us to define the determinant of a 4 × 4 matrix, and so on.

Note that the a_{1j} coefficients are drawn from a chosen row, row 1, but in fact any row may be chosen for the calculation. In the example given above, if the second row were chosen, the computation would proceed as:

|A| = −1\begin{vmatrix} 2 & 3 \\ 5 & 12 \end{vmatrix} + 3\begin{vmatrix} 1 & 3 \\ 1 & 12 \end{vmatrix} − 5\begin{vmatrix} 1 & 2 \\ 1 & 5 \end{vmatrix}
= −1(2 · 12 − 3 · 5) + 3(1 · 12 − 1 · 3) − 5(1 · 5 − 1 · 2)
= −1(9) + 3(9) − 5(3) = 3.
Note that in Equation 2.6 there is a "−" sign before a_{12} while there is a "+" sign before a_{13}. The rule is that if i + j is even, a "+" sign is placed before a_{ij}, and if i + j is odd, a "−" sign is placed before a_{ij}. Finally, note that (−1)² = (−1)(−1) = 1, (−1)³ = (−1)(−1)(−1) = −1 and, in general, (−1)^k = 1 if k is even and (−1)^k = −1 if k is odd. Thus
|A| = (−1)^{1+1} a_{11}|A_{11}| + (−1)^{1+2} a_{12}|A_{12}| + (−1)^{1+3} a_{13}|A_{13}| + · · · + (−1)^{1+n} a_{1n}|A_{1n}|
    = \sum_{k=1}^{n} (−1)^{1+k} a_{1k}|A_{1k}|. \quad (2.7)
In fact, the determinant can be calculated by expanding along any row. Expanding along the ith row:

|A| = \sum_{j=1}^{n} (−1)^{i+j} a_{ij}|A_{ij}| = (−1)^{i+1} a_{i1}|A_{i1}| + (−1)^{i+2} a_{i2}|A_{i2}| + · · · + (−1)^{i+n} a_{in}|A_{in}|. \quad (2.8)
While these calculations obtain the determinant by expanding along rows, the determinant may also be calculated by expansion along columns. The point is clarified in the discussion of cofactors below.
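The inductive definition lends itself to a short recursive implementation. The sketch below (Python with NumPy is an assumed tool here; the text itself uses none) expands along the first row, as in Equation 2.6, and reproduces the determinant of the matrix in Equation 2.5:

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row (Equation 2.6)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        # A_1j: the submatrix with row 1 and column j+1 deleted
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        # (-1)**j equals (-1)^{1+(j+1)} in the book's 1-indexed notation
        total += (-1) ** j * A[0, j] * det_cofactor(minor)
    return total

A = [[1, 2, 3], [1, 3, 5], [1, 5, 12]]
print(det_cofactor(A))  # 3.0, matching Equation 2.5
```

Recursive expansion costs on the order of n! operations, so it is only illustrative; numerical libraries compute determinants by factorization instead.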
2.3.4.2 The minor matrix
Given an n × n matrix A, there is a corresponding matrix, M, called the matrix of minors. Recall that A_{ij} is the matrix obtained from A by deleting the ith row and jth column of A. The minor in the ith row and jth column of the minor matrix is the determinant of A_{ij}. That is, m_{ij} = |A_{ij}|. Let M be the matrix with m_{ij} in the ith row and jth column.
For example, in the 3 × 3 case:

m_{11} = \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}.

Consider the matrix A in Equation 2.5. The minors are:

m_{11} = \begin{vmatrix} 3 & 5 \\ 5 & 12 \end{vmatrix} = 36 − 25 = 11; \quad m_{12} = \begin{vmatrix} 1 & 5 \\ 1 & 12 \end{vmatrix} = 12 − 5 = 7; \quad m_{13} = \begin{vmatrix} 1 & 3 \\ 1 & 5 \end{vmatrix} = 5 − 3 = 2
m_{21} = \begin{vmatrix} 2 & 3 \\ 5 & 12 \end{vmatrix} = 24 − 15 = 9; \quad m_{22} = \begin{vmatrix} 1 & 3 \\ 1 & 12 \end{vmatrix} = 12 − 3 = 9; \quad m_{23} = \begin{vmatrix} 1 & 2 \\ 1 & 5 \end{vmatrix} = 5 − 2 = 3
m_{31} = \begin{vmatrix} 2 & 3 \\ 3 & 5 \end{vmatrix} = 10 − 9 = 1; \quad m_{32} = \begin{vmatrix} 1 & 3 \\ 1 & 5 \end{vmatrix} = 5 − 3 = 2; \quad m_{33} = \begin{vmatrix} 1 & 2 \\ 1 & 3 \end{vmatrix} = 3 − 2 = 1.
Together, A and its matrix of minors M are:

A = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 5 \\ 1 & 5 & 12 \end{pmatrix} \quad and \quad M = \begin{pmatrix} 11 & 7 & 2 \\ 9 & 9 & 3 \\ 1 & 2 & 1 \end{pmatrix}.
Finally, note that since m_{ij} = |A_{ij}|, the determinant may also be written as:

|A| = \sum_{j=1}^{n} (−1)^{i+j} a_{ij} m_{ij}. \quad (2.9)

2.3.4.3 The cofactor matrix
The cofactor matrix is derived directly from the minor matrix by changing the sign of minors in odd locations of the minor matrix. The ijth cofactor c_{ij} is defined as

c_{ij} = (−1)^{i+j} m_{ij}, \quad (2.10)

noting that (−1)^{i+j} = 1 if i + j is even and (−1)^{i+j} = −1 if i + j is odd. In words, if i + j is even, the cofactor c_{ij} equals the minor m_{ij}; and c_{ij} = −m_{ij} if i + j is odd. Let C be the matrix of cofactors.
For example, in the 3 × 3 case:

If M = \begin{pmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \\ m_{31} & m_{32} & m_{33} \end{pmatrix}, then C = \begin{pmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \\ c_{31} & c_{32} & c_{33} \end{pmatrix} = \begin{pmatrix} m_{11} & −m_{12} & m_{13} \\ −m_{21} & m_{22} & −m_{23} \\ m_{31} & −m_{32} & m_{33} \end{pmatrix}.
Continuing with the earlier example, the matrix A and its minor and cofactor matrices are given by:

A = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 5 \\ 1 & 5 & 12 \end{pmatrix}, \quad M = \begin{pmatrix} 11 & 7 & 2 \\ 9 & 9 & 3 \\ 1 & 2 & 1 \end{pmatrix}, \quad C = \begin{pmatrix} 11 & −7 & 2 \\ −9 & 9 & −3 \\ 1 & −2 & 1 \end{pmatrix}.
Note that, in view of Equations 2.9 and 2.10,

|A| = \sum_{j=1}^{n} a_{ij} c_{ij}, \quad (2.11)

and this is called cofactor expansion along the ith row. Cofactor expansion will be discussed further.
2.3.4.4 The adjoint matrix
The adjoint matrix of A is denoted adj(A) and defined as the transpose of the cofactor matrix: adj(A) = C′. Continuing with the example, the matrices are given by:

A = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 5 \\ 1 & 5 & 12 \end{pmatrix}, \quad M = \begin{pmatrix} 11 & 7 & 2 \\ 9 & 9 & 3 \\ 1 & 2 & 1 \end{pmatrix}, \quad C = \begin{pmatrix} 11 & −7 & 2 \\ −9 & 9 & −3 \\ 1 & −2 & 1 \end{pmatrix}, \quad C′ = \begin{pmatrix} 11 & −9 & 1 \\ −7 & 9 & −2 \\ 2 & −3 & 1 \end{pmatrix}.
The inverse matrix is defined directly from the adjoint matrix.
2.3.4.5 The inverse matrix

Definition 2.4: The inverse of A is denoted A^{−1} and defined:

A^{−1} = \frac{1}{|A|}\,adj(A).
Continuing with the example (recall |A| = 3):

adj(A) = \begin{pmatrix} 11 & −9 & 1 \\ −7 & 9 & −2 \\ 2 & −3 & 1 \end{pmatrix}, \quad A^{−1} = \begin{pmatrix} \frac{11}{3} & −\frac{9}{3} & \frac{1}{3} \\ −\frac{7}{3} & \frac{9}{3} & −\frac{2}{3} \\ \frac{2}{3} & −\frac{3}{3} & \frac{1}{3} \end{pmatrix}.
The inverse matrix satisfies A A^{−1} = I = A^{−1} A:

\begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 5 \\ 1 & 5 & 12 \end{pmatrix} \begin{pmatrix} \frac{11}{3} & −\frac{9}{3} & \frac{1}{3} \\ −\frac{7}{3} & \frac{9}{3} & −\frac{2}{3} \\ \frac{2}{3} & −\frac{3}{3} & \frac{1}{3} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
A property of the adjoint (which follows from the cofactor expansion discussed above) is the following:

A\,adj(A) = (adj A)A = \begin{pmatrix} |A| & 0 & \cdots & 0 \\ 0 & |A| & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & |A| \end{pmatrix} = |A| · I.

So, given a system of n equations in n unknowns, such a system can be written in the form Ax = b. Provided A has full rank (is nonsingular), we can solve for x: x = A^{−1}b.
Example 2.6: Consider the system Ax = b, where A and b are given by:

A = \begin{pmatrix} 1 & 2 & 4 & 2 \\ 3 & 5 & 9 & 2 \\ 1 & 1 & 6 & 0 \\ 4 & 1 & 3 & 2 \end{pmatrix}, \quad b = \begin{pmatrix} 5 \\ 7 \\ 1 \\ 2 \end{pmatrix},

and we require the solution x. Then

A^{−1} = \frac{1}{61}\begin{pmatrix} −162 & 35 & 6 & 47 \\ 116 & 51 & 7 & −57 \\ −33 & −10 & −13 & 22 \\ 41 & −32 & −5 & −7 \end{pmatrix}, \quad so that \quad x = A^{−1}b = \frac{1}{61}\begin{pmatrix} −328 \\ 268 \\ −51 \\ 80 \end{pmatrix}. ♦
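For systems of this kind, forming A^{−1} explicitly is rarely necessary in practice; a solver applied directly to the 3 × 3 matrix of Equation 2.5 illustrates the point. The right-hand side b below is made up for illustration, and NumPy is an assumed tool:

```python
import numpy as np

A = np.array([[1, 2, 3], [1, 3, 5], [1, 5, 12]], dtype=float)
b = np.array([6, 9, 18], dtype=float)  # hypothetical right-hand side

x_inv = np.linalg.inv(A) @ b      # x = A^{-1} b, as in the text
x_solve = np.linalg.solve(A, b)   # preferred: solve without forming A^{-1}

print(np.allclose(x_inv, x_solve))   # True
print(np.allclose(A @ x_solve, b))   # True: the solution satisfies Ax = b
```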
The next example develops the two-market model of Example 2.2 for arbitrary supply and demand parameters.
Example 2.7: A two-market model with cross-price effects (the price of butter may affect the demand for margarine; the price of tea may affect the demand for coffee) has the price of both goods enter the demand for each good. This observation leads to the market structure:
Market 1: D_1 = a_0 + a_1 p_1 + a_2 p_2, \quad S_1 = b_0 + b_1 p_1 + b_2 p_2
Market 2: D_2 = α_0 + α_1 p_1 + α_2 p_2, \quad S_2 = β_0 + β_1 p_1 + β_2 p_2.
(One might plausibly expect that the supply functions depend only on own price (b_2 = β_1 = 0), but that is unnecessary for the computations.) To find the equilibrium level of quantity and price in each market, equate supply and demand in each market:

Market 1: D_1 = S_1, or a_0 + a_1 p_1 + a_2 p_2 = b_0 + b_1 p_1 + b_2 p_2
Market 2: D_2 = S_2, or α_0 + α_1 p_1 + α_2 p_2 = β_0 + β_1 p_1 + β_2 p_2.
Rearranging these equations gives:

(a_1 − b_1) p_1 + (a_2 − b_2) p_2 = (b_0 − a_0)
(α_1 − β_1) p_1 + (α_2 − β_2) p_2 = (β_0 − α_0).

These equations can be solved by substitution. From the first equation,

p_1 = \frac{(b_0 − a_0)}{(a_1 − b_1)} − \frac{(a_2 − b_2)}{(a_1 − b_1)} p_2.

Substitute into the second equation to get:

(α_1 − β_1)\left[\frac{(b_0 − a_0)}{(a_1 − b_1)} − \frac{(a_2 − b_2)}{(a_1 − b_1)} p_2\right] + (α_2 − β_2) p_2 = (β_0 − α_0).

Collecting terms in p_2 gives:

\left[(α_2 − β_2) − (α_1 − β_1)\frac{(a_2 − b_2)}{(a_1 − b_1)}\right] p_2 = (β_0 − α_0) − (α_1 − β_1)\frac{(b_0 − a_0)}{(a_1 − b_1)}.

Multiplying both sides by (a_1 − b_1) gives:

[(α_2 − β_2)(a_1 − b_1) − (α_1 − β_1)(a_2 − b_2)] p_2 = (β_0 − α_0)(a_1 − b_1) − (α_1 − β_1)(b_0 − a_0).

Thus,

p_2 = \frac{(β_0 − α_0)(a_1 − b_1) − (α_1 − β_1)(b_0 − a_0)}{(α_2 − β_2)(a_1 − b_1) − (α_1 − β_1)(a_2 − b_2)}.
A similar calculation gives p1 .
♦
With three or more equations, the procedure in Example 2.7 becomes progressively more complicated and unmanageable. The use of matrix algebra simplifies the problem significantly, and the advantage of matrix algebra becomes more pronounced as the number of equations increases. Example 2.8 reconsiders the problem formulated in matrix terms and shows that a direct generalization to the n-variable case is straightforward.
Example 2.8: In matrix form, the system here may be written

\begin{pmatrix} (a_1 − b_1) & (a_2 − b_2) \\ (α_1 − β_1) & (α_2 − β_2) \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \end{pmatrix} = \begin{pmatrix} b_0 − a_0 \\ β_0 − α_0 \end{pmatrix}

or

\begin{pmatrix} p_1 \\ p_2 \end{pmatrix} = \begin{pmatrix} (a_1 − b_1) & (a_2 − b_2) \\ (α_1 − β_1) & (α_2 − β_2) \end{pmatrix}^{−1} \begin{pmatrix} b_0 − a_0 \\ β_0 − α_0 \end{pmatrix}.

Using the procedure above, the inverse of the coefficient matrix is:

\begin{pmatrix} (a_1 − b_1) & (a_2 − b_2) \\ (α_1 − β_1) & (α_2 − β_2) \end{pmatrix}^{−1} = \frac{1}{d} \begin{pmatrix} (α_2 − β_2) & −(a_2 − b_2) \\ −(α_1 − β_1) & (a_1 − b_1) \end{pmatrix},

where d = (a_1 − b_1)(α_2 − β_2) − (α_1 − β_1)(a_2 − b_2). Thus,

\begin{pmatrix} p_1 \\ p_2 \end{pmatrix} = \frac{1}{d} \begin{pmatrix} (α_2 − β_2) & −(a_2 − b_2) \\ −(α_1 − β_1) & (a_1 − b_1) \end{pmatrix} \begin{pmatrix} (b_0 − a_0) \\ (β_0 − α_0) \end{pmatrix} = \frac{1}{d} \begin{pmatrix} (α_2 − β_2)(b_0 − a_0) − (a_2 − b_2)(β_0 − α_0) \\ −(α_1 − β_1)(b_0 − a_0) + (a_1 − b_1)(β_0 − α_0) \end{pmatrix}.

So

p_1 = \frac{(α_2 − β_2)(b_0 − a_0) − (a_2 − b_2)(β_0 − α_0)}{(a_1 − b_1)(α_2 − β_2) − (α_1 − β_1)(a_2 − b_2)}, \quad p_2 = \frac{−(α_1 − β_1)(b_0 − a_0) + (a_1 − b_1)(β_0 − α_0)}{(a_1 − b_1)(α_2 − β_2) − (α_1 − β_1)(a_2 − b_2)}.
From this solution one can, for example, calculate the change in the equilibrium price of good 1 resulting from a change in α_0:

\frac{Δp_1}{Δα_0} = \frac{(a_2 − b_2)}{d} \left(= −\frac{Δp_1}{Δβ_0}\right). ♦
The following discussion elaborates on various aspects of determinants, cofactors and the inverse.
2.3.5 Additional remarks on determinants

The following sections observe that the calculation of determinants may be obtained by cofactor expansion along any row or column, as described next. In addition, the determinant and inverse of a matrix satisfy a number of important properties, summarized in Sections 2.3.5.2 and 2.3.5.3.
2.3.5.1 Determinants and cofactor expansion
Let

A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \quad and \quad C = \begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1n} \\ c_{21} & c_{22} & \cdots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \cdots & c_{nn} \end{pmatrix},
where C is the cofactor matrix derived from A. Recall that the determinant of A may be calculated:

|A| = a_{11} c_{11} + a_{12} c_{12} + · · · + a_{1n} c_{1n} = \sum_{j=1}^{n} a_{1j} c_{1j},
where c_{ij} is the ijth cofactor of A. From the definition, the determinant is computed by taking the first rows of both A and C, multiplying them pairwise, and summing. Put differently, the determinant of A equals the inner product of row 1 from matrix A with row 1 from matrix C. In fact, the same answer is obtained from the inner product of any matching rows of A and C. Furthermore, the determinant may also be computed by taking the inner product of matching columns: the inner product of the ith column of A and the ith column of C gives the determinant. For this reason, the determinant of A satisfies the following:

|A| = \sum_{j=1}^{n} a_{ij} c_{ij}, for any i,
    = \sum_{i=1}^{n} a_{ij} c_{ij}, for any j.

The expression \sum_{j=1}^{n} a_{ij} c_{ij} is called "row expansion" since the summation is along a row (the ith). The expression \sum_{i=1}^{n} a_{ij} c_{ij} is called "column expansion" since the summation is along a column (the jth). The following terms give the expansions associated with each row (for the matrix in Equation 2.5):
a_{11} c_{11} + a_{12} c_{12} + a_{13} c_{13} = (1)(11) + 2(−7) + 3(2) = 3
a_{21} c_{21} + a_{22} c_{22} + a_{23} c_{23} = (1)(−9) + 3(9) + (5)(−3) = 3
a_{31} c_{31} + a_{32} c_{32} + a_{33} c_{33} = (1)(1) + 5(−2) + 12(1) = 3.

Similarly, the column expansions are:

a_{11} c_{11} + a_{21} c_{21} + a_{31} c_{31} = (1)(11) + (1)(−9) + (1)(1) = 3
a_{12} c_{12} + a_{22} c_{22} + a_{32} c_{32} = (2)(−7) + 3(9) + (5)(−2) = 3
a_{13} c_{13} + a_{23} c_{23} + a_{33} c_{33} = (3)(2) + (5)(−3) + (12)(1) = 3.

So, in the 3 × 3 case, the determinant is computed by any of the expansions given above. This sometimes provides an easy way to compute a determinant.
Example 2.9: In the following example the determinant is computed most easily by cofactor expansion along the row with the most 0s:

B = \begin{pmatrix} 3 & 8 & 5 \\ 6 & 3 & 4 \\ 1 & 0 & 0 \end{pmatrix} \;\Rightarrow\; |B| = b_{31} c_{31} + b_{32} c_{32} + b_{33} c_{33} = 1 · \begin{vmatrix} 8 & 5 \\ 3 & 4 \end{vmatrix} = 32 − 15 = 17. ♦
The inner product of the ith row of the A matrix and the kth row of the cofactor matrix, with k ≠ i, or the inner product of the jth column of the A matrix and the kth column of the cofactor matrix, with k ≠ j, is called expansion by alien cofactors. These expansions have the property that the inner products are equal to zero. Thus, \sum_{j=1}^{n} a_{ij} c_{kj} = 0 if k ≠ i (expansion by ith row of A, kth row of C). Similarly, \sum_{i=1}^{n} a_{ij} c_{ik} = 0 if j ≠ k (expansion by jth column of A, kth column of C).
To illustrate with the example above:

\sum_{j=1}^{3} a_{1j} c_{2j} = a_{11} c_{21} + a_{12} c_{22} + a_{13} c_{23} = (1)(−9) + 2(9) + 3(−3) = 0
\sum_{i=1}^{3} a_{i1} c_{i2} = a_{11} c_{12} + a_{21} c_{22} + a_{31} c_{32} = (1)(−7) + (1)(9) + (1)(−2) = 0.
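Row expansion, column expansion, and expansion by alien cofactors can all be verified in a few lines; a sketch assuming NumPy, using the matrix of Equation 2.5:

```python
import numpy as np

A = np.array([[1, 2, 3], [1, 3, 5], [1, 5, 12]], dtype=float)
n = A.shape[0]
C = np.zeros((n, n))  # cofactor matrix
for i in range(n):
    for j in range(n):
        minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
        C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)

detA = np.linalg.det(A)
# Every row and column expansion gives |A|
for i in range(n):
    assert np.isclose(A[i, :] @ C[i, :], detA)   # row expansion
    assert np.isclose(A[:, i] @ C[:, i], detA)   # column expansion
# Expansion by alien cofactors gives 0
assert np.isclose(A[0, :] @ C[1, :], 0)
assert np.isclose(A[:, 0] @ C[:, 1], 0)
print("all expansions agree; |A| =", round(detA))  # |A| = 3
```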
2.3.5.2 Properties of determinants

1. |A| = |A′|. The determinant of matrix A equals the determinant of the transpose of matrix A.
2. Interchange of two rows or two columns reverses the sign of a determinant.
3. Multiplying any one row or column by a scalar k alters the determinant k-fold.
4. The addition or subtraction of a multiple of any row (column) to or from any other row (column) leaves the value of the determinant unaltered.
5. If one row or column is a multiple of another, the determinant is 0.
6. If A and B are two square matrices of order n, then |AB| = |A||B|. For example,
A = \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix}, \quad B = \begin{pmatrix} 5 & 7 \\ 6 & 8 \end{pmatrix} \;\Rightarrow\; AB = \begin{pmatrix} 23 & 31 \\ 34 & 46 \end{pmatrix},

so |A| = −2, |B| = −2 and |A||B| = (−2)(−2) = 4. Also, |AB| = 1058 − 1054 = 4.
7. In general |A + B| ≠ |A| + |B|. Any of the following can hold: (1) |A + B| = |A| + |B|; (2) |A + B| > |A| + |B|; (3) |A + B| < |A| + |B|.

A square matrix A is lower triangular if a_{ij} = 0 for i < j, and upper triangular if a_{ij} = 0 for i > j. Thus, a lower triangular matrix has the form:

A = \begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}.
From the definition, the determinant of an upper or lower triangular matrix is given by |A| = \prod_{i=1}^{n} a_{ii} = a_{11} × · · · × a_{nn}. Therefore, a triangular matrix is nonsingular if and only if all diagonal entries are nonzero.
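Property 6 (|AB| = |A||B|) and the triangular-determinant rule are easy to confirm numerically; a sketch assuming NumPy (the triangular matrix L below is made up for illustration):

```python
import numpy as np

A = np.array([[1, 3], [2, 4]], dtype=float)
B = np.array([[5, 7], [6, 8]], dtype=float)
# |AB| equals |A||B| (both are 4 here, up to floating-point rounding)
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))  # True

# Triangular matrix: determinant is the product of the diagonal entries
L = np.array([[2, 0, 0], [5, 3, 0], [1, 4, 7]], dtype=float)
print(np.isclose(np.linalg.det(L), 2 * 3 * 7))  # True
```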
Theorem 2.9: If a lower (upper) triangular matrix is invertible, its inverse is lower (upper) triangular.

Proof: Consider a lower triangular matrix L and an entry l_{ij} with i > j. The minor matrix associated with this entry is obtained by deleting the ith row and jth column of L. Deleting the jth column deletes l_{jj} on the diagonal, so the minor matrix has at entry (j, j) the element of the L matrix in position (j, j + 1), which is 0. Therefore, the minor matrix is a lower triangular matrix with a 0 on the diagonal and hence has determinant m_{ij} = 0, so the cofactor c_{ij} = 0 for i > j. The adjoint matrix, being the transpose of the cofactor matrix, therefore has its (i, j)th entry equal to 0 for i < j: the entries above the main diagonal in the adjoint matrix are 0, and so the inverse is lower triangular. A similar calculation applies for an upper triangular matrix.
Definition 2.17: A square matrix A admits an LU (lower-upper) decomposition if A may be written A = LU, where L is a lower triangular and U an upper triangular matrix.

Not all square matrices A admit an LU decomposition.

Example 2.26: Let A be a 2 × 2 matrix that admits an LU decomposition, so that A = LU:
A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = \begin{pmatrix} l_{11} & 0 \\ l_{21} & l_{22} \end{pmatrix} \begin{pmatrix} u_{11} & u_{12} \\ 0 & u_{22} \end{pmatrix} = \begin{pmatrix} l_{11}u_{11} & l_{11}u_{12} \\ l_{21}u_{11} & l_{21}u_{12} + l_{22}u_{22} \end{pmatrix}.
If A is nonsingular, then so are L and U, so that both have all diagonal entries nonzero. Therefore, a_{11} = l_{11}u_{11} ≠ 0. However, consider

A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad with \quad A^{−1} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},

so A is nonsingular with a_{11} = 0, while any LU decomposition of a nonsingular matrix requires a_{11} ≠ 0. There is no LU decomposition of A. ♦
Example 2.27: Let

A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}.
Observe that, provided a_{11} ≠ 0, subtracting \frac{a_{21}}{a_{11}} × row 1 from row 2 creates a matrix with a 0 in the lower left corner, and the second row becomes:

(a_{21}, a_{22}) − \frac{a_{21}}{a_{11}}(a_{11}, a_{12}) = \left(a_{21} − \frac{a_{21}}{a_{11}}a_{11},\; a_{22} − \frac{a_{21}}{a_{11}}a_{12}\right) = \left(0,\; a_{22} − \frac{a_{21}}{a_{11}}a_{12}\right).
A matrix transformation that achieves this while leaving row 1 unchanged is as follows:

\begin{pmatrix} 1 & 0 \\ −\frac{a_{21}}{a_{11}} & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ 0 & a_{22} − \frac{a_{21}}{a_{11}}a_{12} \end{pmatrix}.
Observe that

\begin{pmatrix} 1 & 0 \\ −\frac{a_{21}}{a_{11}} & 1 \end{pmatrix}^{−1} = \begin{pmatrix} 1 & 0 \\ \frac{a_{21}}{a_{11}} & 1 \end{pmatrix}.
So,

\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ \frac{a_{21}}{a_{11}} & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} \\ 0 & a_{22} − \frac{a_{21}}{a_{11}}a_{12} \end{pmatrix},

which expresses A in the form A = LU. ♦
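The row-operation construction above is the first step of Doolittle elimination; a small sketch assuming NumPy and no pivoting (so it presumes nonzero leading principal minors):

```python
import numpy as np

def lu_nopivot(A):
    """LU decomposition without pivoting; assumes nonzero leading principal minors."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L, U = np.eye(n), A.copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]    # multiplier, e.g. a21/a11 in the 2x2 case
            U[i, :] -= L[i, k] * U[k, :]   # subtract multiple of row k from row i
    return L, U

A = np.array([[2.0, 3.0], [4.0, 7.0]])
L, U = lu_nopivot(A)
print(np.allclose(L @ U, A))  # True: L = [[1, 0], [2, 1]], U = [[2, 3], [0, 1]]
```

Library routines (for instance SciPy's LU) add row pivoting, which also handles matrices such as the permutation matrix of Example 2.26 that have no plain LU decomposition.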
Definition 2.18: Given an n × n matrix A, a principal minor is obtained from A by removing 0 ≤ k < n rows and the corresponding columns. If A_{jj} is the matrix obtained from A by deleting the last n − j rows and columns, this matrix is called the leading principal minor of order j, for j = 1, ..., n.

Theorem 2.10: An invertible matrix admits an LU decomposition if and only if the determinants of all leading principal minors are nonzero. A non-invertible matrix may have an LU decomposition (such as a triangular matrix with some 0s on the diagonal).

Definition 2.19: A square matrix A is idempotent if A = AA.

The identity matrix is an idempotent matrix. Other examples: with a and b any numbers,
A = \begin{pmatrix} 1 & a \\ 0 & 0 \end{pmatrix} \quad or \quad B = \begin{pmatrix} 0 & a \\ 0 & 1 \end{pmatrix}
are idempotent matrices, as is

A = \begin{pmatrix} 1 − a & \frac{a(1−a)}{b} \\ b & a \end{pmatrix}, \quad since \quad AA = \begin{pmatrix} (1 − a)^2 + a(1 − a) & \frac{a(1−a)^2}{b} + \frac{a^2(1−a)}{b} \\ b(1 − a) + ab & a(1 − a) + a^2 \end{pmatrix},

with AA simplifying to equal A. In the 3 × 3 case the matrix

A = \begin{pmatrix} \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{1}{3} & \frac{5}{6} & −\frac{1}{6} \\ \frac{1}{3} & −\frac{1}{6} & \frac{5}{6} \end{pmatrix}

is idempotent.
If A is idempotent, its determinant is either 0 or 1: since A = AA,

|A| = |AA| = |A| · |A| = |A|^2,

using the rule that the determinant of the product of two square matrices is the product of the determinants. And, since |A| = |A|^2, it must be that |A| is equal to 0 or 1. If A is idempotent, so is I − A, since (I − A)(I − A) = II − IA − AI + AA = I − 2A + A = I − A.
Idempotent matrices arise naturally in statistics. The following discussion introduces a matrix that is used to compute deviations around the mean and the sum of squared deviations – the main calculation in computing the variance of sample data. Let ι be an n × 1 vector of 1s, so that ι′ = (1, 1, ..., 1). Then ι′ι = n, so that (ι′ι)^{−1} = \frac{1}{n}. Define Q \stackrel{def}{=} ι(ι′ι)^{−1}ι′:

Q = ι(ι′ι)^{−1}ι′ = ι\left(\frac{1}{n}\right)ι′ = \frac{1}{n}ιι′ = \frac{1}{n}\begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix}.
Notice that Q is an idempotent matrix. It is symmetric and, since ι′ι(ι′ι)^{−1} = 1,

QQ = ι(ι′ι)^{−1}ι′ι(ι′ι)^{−1}ι′ = ι(ι′ι)^{−1}ι′ = Q.
Given an n × 1 vector x, so that x′ = (x_1, x_2, ..., x_n):

Qx = \frac{1}{n}\begin{pmatrix} 1 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} \frac{1}{n}\sum_{i=1}^{n} x_i \\ \vdots \\ \frac{1}{n}\sum_{i=1}^{n} x_i \end{pmatrix} = \begin{pmatrix} \bar{x} \\ \vdots \\ \bar{x} \end{pmatrix}.
Thus, Qx is an n × 1 vector with the mean of the data (x_1, x_2, ..., x_n) in every position. Deviations from the mean may be computed by the matrix I − Q, where I is the n × n identity matrix:

(I − Q)x = Ix − Qx = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} − \begin{pmatrix} \bar{x} \\ \bar{x} \\ \vdots \\ \bar{x} \end{pmatrix} = \begin{pmatrix} x_1 − \bar{x} \\ x_2 − \bar{x} \\ \vdots \\ x_n − \bar{x} \end{pmatrix}.
Let D = I − Q. From the previous discussion, D is symmetric and idempotent, so that:

(Dx)′(Dx) = x′D′Dx = x′Dx = \sum_{i=1}^{n} (x_i − \bar{x})^2.
More generally, suppose that X is an n × k matrix with n ≥ k and X of rank k. Then X ( X ′ X )−1 X ′ is a symmetric idempotent matrix, as is I − X ( X ′ X )−1 X ′ .
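The matrices Q and D = I − Q can be formed directly and their properties checked; a sketch assuming NumPy (the data vector x is made up):

```python
import numpy as np

n = 5
iota = np.ones((n, 1))
Q = iota @ iota.T / n          # Q = iota (iota' iota)^{-1} iota'
D = np.eye(n) - Q              # deviations-from-the-mean matrix

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])  # hypothetical sample data
print(np.allclose(Q @ x, x.mean()))                  # True: Qx has the mean everywhere
print(np.allclose(Q @ Q, Q), np.allclose(D @ D, D))  # True True: both idempotent
# (Dx)'(Dx) equals the sum of squared deviations from the mean
print(np.isclose((D @ x) @ (D @ x), ((x - x.mean()) ** 2).sum()))  # True
```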
Given a square n × n matrix A, the terms a_{11}, a_{22}, ..., a_{nn} form the main diagonal – the entries along the diagonal from the top left corner to the bottom right corner. If a matrix has all entries that are not on the main diagonal equal to 0, the matrix is called a diagonal matrix. Given a collection of numbers d_1, d_2, ..., d_n, diag(d_1, d_2, ..., d_n) is the n × n matrix formed by arranging d_1, d_2, ..., d_n in the main diagonal and placing 0s elsewhere:
diag(d_1, d_2, ..., d_n) = \begin{pmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{pmatrix}.
Definition 2.20: The trace of a square n × n matrix A is defined as the sum of the terms on the main diagonal and is denoted tr(A). Thus:

tr(A) = \sum_{i=1}^{n} a_{ii}.
The trace operation has a number of useful properties.

• If A and B are two square n × n matrices, then tr(A + B) = tr(A) + tr(B).
• tr(A) = tr(A′).
• tr(cA) = c · tr(A), c a scalar.
• If A is idempotent, then the trace of A equals its rank: tr(A) = r(A).
• If A is m × n and B is n × m, then tr(AB) = tr(BA). This follows because:

tr(AB) = \sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij} b_{ji} = \sum_{j=1}^{n}\sum_{i=1}^{m} b_{ji} a_{ij} = tr(BA).

To see this more explicitly, let C = AB and D = BA. Then:

c_{11} = a_{11}b_{11} + a_{12}b_{21} + · · · + a_{1n}b_{n1}
c_{22} = a_{21}b_{12} + a_{22}b_{22} + · · · + a_{2n}b_{n2}
⋮
c_{mm} = a_{m1}b_{1m} + a_{m2}b_{2m} + · · · + a_{mn}b_{nm}

d_{11} = b_{11}a_{11} + b_{12}a_{21} + · · · + b_{1m}a_{m1}
d_{22} = b_{21}a_{12} + b_{22}a_{22} + · · · + b_{2m}a_{m2}
⋮
d_{nn} = b_{n1}a_{1n} + b_{n2}a_{2n} + · · · + b_{nm}a_{mn}

The diagonal sum of C = AB is \sum_{i=1}^{m} c_{ii} and the diagonal sum of D = BA is \sum_{i=1}^{n} d_{ii}. From the expressions above, \sum_{i=1}^{m} c_{ii} = \sum_{i=1}^{n} d_{ii}. The following example illustrates:
Example 2.28:

A = \begin{pmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{pmatrix}, \quad B = \begin{pmatrix} 2 & 1 \\ 5 & 7 \\ 3 & 1 \end{pmatrix}, \quad so \quad AB = \begin{pmatrix} 32 & 27 \\ 42 & 36 \end{pmatrix} \quad and \quad BA = \begin{pmatrix} 4 & 10 & 16 \\ 19 & 43 & 67 \\ 5 & 13 & 21 \end{pmatrix},

and observe that tr(AB) = 32 + 36 = 68 = 4 + 43 + 21 = tr(BA). ♦
Exercises for Chapter 2
Exercise 2.1 Consider the following three matrices:

A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 2 & 8 \\ 3 & 7 & 1 \\ 2 & 5 & 4 \end{pmatrix}, \quad C = \begin{pmatrix} 6 & 6 & 9 & 7 \\ 1 & 3 & 4 & 3 \\ 4 & 2 & 2 & 1 \\ 5 & 4 & 3 & 2 \end{pmatrix}.
For each matrix, find (a) the determinant, (b) the matrix of minors, (c) the matrix of cofactors, (d) the adjoint matrix and (e) the inverse matrix.

Exercise 2.2 Consider the following matrix:

D = \begin{pmatrix} 1 & 2 \\ 4 & 3 \\ 5 & 1 \end{pmatrix}.

What is the rank of D, D′D and DD′? Find (a) D′ (D transpose), (b) D′D, (c) (D′D)^{−1}, (d) D(D′D)^{−1}D′. Call the last matrix you have calculated E (i.e., E = D(D′D)^{−1}D′). Find ED and DE. Check that ED = D and D′E = D′ and explain why this is so.
Exercise 2.3 Given A = \begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix}, find

(a) A² = AA, (b) A³ = AAA, (c) A^{−1}, (d) 2A³ + 3A² + 5I.

Verify that (e) (A + A′)′ = A′ + A and (f) (AA′)′ = AA′.
Exercise 2.4 (a) For each of the following matrices A, B, C, D and E, find the determinant and the inverse if it exists, where

E = \begin{pmatrix} t − 2 & −3 \\ −4 & t − 1 \end{pmatrix}.
(b) Find the value of t such that the determinant of matrix E is zero.

Exercise 2.5 Solve the following systems of equations by matrix inversion or Cramer's rule:

(a) 3x + 2y = 16, x + 2y = 4
(b) 2x − 3y = 18, x + 3y = 9
(c) x + 2y = 2, 3x + 4y = 12
(d) −2x + 3y = 3, −5x + 7y = 6
(e) Q_s = −c + dP, Q_d = a − bP, Q_d = Q_s
(f) −x + 2y − 3z = 2, 2x + y = 1, 4x − 2y + 5z = 3
(g) x − y + z = 4, −x + 2y − z = 5, 2x − y + z = 2.
Exercise 2.6 Solve the following systems of equations by Cramer's rule:

(a) 2y + 4z = 8, 2x + 4y + 2z = 8, 3x + 3y + z = 4
(b) x + 2y + z = 6, −x + y + 2z = 3, x + z = 2
(c) x + y − z = 1, −2y + 4z = 1, 2x + 4y − 5z = 2.
Exercise 2.7 Consider the following:

X is an n × k matrix
e is an n × 1 vector of 1s (i.e., a column vector with n ones)
A is an n × n symmetric matrix
P = X(X′X)^{−1}X′
M = I_n − P
Q = I_n − \left(\frac{1}{n}\right)ee′
F = X(X′AX)^{−1}X′A.

(a) Determine the order of P, M, Q, F (i.e., the number of rows and columns).
(b) Determine which of these are idempotent (B is idempotent if B² = BB = B).
(c) Determine which are symmetric (B = B′).
(d) Show that for an n-vector x,

Qx = \begin{pmatrix} x_1 − \bar{x} \\ x_2 − \bar{x} \\ \vdots \\ x_n − \bar{x} \end{pmatrix}, \quad where \; \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.
Exercise 2.8 The holdings of stock of individuals I and II in companies A, B, C, D are given by the matrix:

\begin{matrix} & A & B & C & D \\ I & 2000 & 500 & 1000 & 0 \\ II & 1000 & 5000 & 2000 & 500 \end{matrix}

Denote the stock price of stock A by p_A (and similarly for B, C and D). If the vector of stock prices (per share) is (p_A, p_B, p_C, p_D) = (24, 47, 150, 14), calculate the value of the stock holdings of both individuals.
Exercise 2.9 (a) Plot the following pairs of equations:

(1) x + 3y = 9, 4x + 2y = 8
(2) 3x + 3y = 12, 4x + y = 8
(3) 4x + 3y = 4, 8x + 6y = 8
(4) 2x + 5y = 9, 4x + 10y = 10.

If possible, determine the solution of each equation pair.
(b) Write the four pairs of equations in matrix form. Thus, for example,

ax + by = α
cx + dy = γ

is written:

\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} α \\ γ \end{pmatrix}.

(c) Find the determinant of each of the four coefficient matrices in part (b).

Exercise 2.10 For the following matrices:
A = \begin{pmatrix} 1 & 4 \\ 2 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 3 & 2 \\ 1 & 6 \end{pmatrix}, \quad C = \begin{pmatrix} 1 & 3 & 1 \\ 3 & 2 & 1 \end{pmatrix}, \quad D = \begin{pmatrix} 2 & 1 \\ 4 & 4 \\ 6 & 2 \end{pmatrix},

(a) Find AB, BA, |A|, |B|, |AB| and compare |AB| with |A| × |B|.
(b) Find DC, CD, |CD| and |DC|.
(c) Find A′ and C′.
Exercise 2.11 Use Cramer's rule to solve the following two equation systems:

\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 5 \\ 6 \end{pmatrix} \quad and \quad \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \\ 2 & 3 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 2 \\ 4 \\ 6 \end{pmatrix}.

Exercise 2.12 Given:
A = \begin{pmatrix} 3 & 5 \\ 2 & 3 \end{pmatrix}, \quad B = \begin{pmatrix} 2 & 4 \\ 4 & 2 \end{pmatrix}, \quad C = \begin{pmatrix} 8 & −1 & 2 \\ −5 & 1 & 3 \\ 10 & −1 & −3 \end{pmatrix}, \quad D = \begin{pmatrix} 0 & −4 & 6 \\ 0 & 3 & −1 \\ 2 & 3 & 0 \end{pmatrix},

y = \begin{pmatrix} −7 \\ 3 \\ 2 \end{pmatrix}, \quad x = (1 \; 4 \; 7),

find (a) A + B, (b) C − D, (c) D′, (d) 5A − 6B, (e) A^{−1}, (f) C^{−1}, (g) CD, (h) DC, (i) (x′x), (j) (x′x)y, (k) y′C, (l) the inverse of D.
Exercise 2.13 In matrix notation, a linear regression equation can be written y = Xβ + u, where

y is an n × 1 vector of observations on the dependent variable,
X is an n × k matrix of observations on the exogenous variables,
β is a k × 1 vector of coefficients,
u is an n × 1 vector of error terms.

The (ordinary least squares, OLS) estimator b is defined: b = (X′X)^{−1}X′y. Suppose that X is a 4 × 2 matrix whose first column is a column of 1s and y is a corresponding 4 × 1 vector of observations.

(a) Calculate b.
(b) Calculate P = X(X′X)^{−1}X′ and M = I_n − P (where I_n is an n × n identity matrix).
(c) Show or prove that X′M = 0 and explain why this is so.

Exercise 2.14 For the matrix

D = \begin{pmatrix} 1 & 2 & 4 \\ 2 & 3 & 2 \\ 1 & 1 & 1 \end{pmatrix},

find the determinant and the matrices of minors and cofactors. Give the adjoint matrix and then the inverse matrix, D^{−1}.

Exercise 2.15 Given the matrices
A = \begin{pmatrix} 2 & 5 \\ 4 & 3 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 4 \\ 3 & 2 \end{pmatrix}, \quad C = \begin{pmatrix} 1 & 3 & 5 \\ 4 & 2 & 1 \end{pmatrix}, \quad D = \begin{pmatrix} 3 & 4 \\ 6 & 2 \\ 1 & 3 \end{pmatrix},

E = \begin{pmatrix} 4 & 3 & 5 \\ 6 & 2 & 5 \\ 4 & 6 & 9 \end{pmatrix}, \quad F = \begin{pmatrix} 2 & 3 & 5 \\ 6 & 2 & 9 \\ 4 & 6 & 2 \end{pmatrix}:

(a) Find A′, B′, B′A′ and (AB)′. Confirm that B′A′ equals (AB)′. Is it also true that D′C′ = (CD)′?
(b) Calculate the matrix H, defined H = I − D(D′D)^{−1}D′. Confirm that H′ = H, HD = 0 and that HH = H. Let the matrix J be defined: J = I − C′(CC′)^{−1}C. Show that JC′ = 0.
(c) Find the rank of A, E, F, CC′ and DD′. Also find the determinants of A, B, E and F.
(d) Using Cramer's rule, solve:

\begin{pmatrix} 2 & 5 \\ 4 & 3 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 4 \\ 7 \end{pmatrix}, \quad \begin{pmatrix} 4 & 3 & 5 \\ 6 & 2 & 5 \\ 4 & 6 & 9 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 2 \\ 4 \\ 6 \end{pmatrix}.

Note that the coefficient matrices are A and E.

Exercise 2.16 Let
A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}, \quad B = \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1k} \\ b_{21} & b_{22} & \cdots & b_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nk} \end{pmatrix}.
Thus, A is an m × n matrix and B an n × k matrix. Show that (A′)′ = A (the transpose of A′ is A), and that (AB)′ = B′A′. For the second part, the (j, i)th element of (AB)′ is the (i, j)th element of AB. Thus, show that the (i, j)th element of AB is equal to the (j, i)th element of B′A′.
3 Linear algebra: applications

3.1 Introduction

This chapter discusses applications of matrix algebra to illustrate techniques developed earlier. Section 3.2 reviews the basic supply and demand model with two goods. Sections 3.4 to 3.7 consider various applications, illustrating techniques such as the use of matrix polynomials and convergence of matrix expansions. These techniques have widespread use (for example, in the study of dynamic systems) and are developed further in later chapters.
3.2 The market model
The two-market model has demand and supply in each market given by:

Market 1: D_1 = a_0 + a_1 p_1 + a_2 p_2, \quad S_1 = b_0 + b_1 p_1 + b_2 p_2
Market 2: D_2 = α_0 + α_1 p_1 + α_2 p_2, \quad S_2 = β_0 + β_1 p_1 + β_2 p_2.
Equating supply and demand in each market gives the system of equations determining equilibrium prices:

\begin{pmatrix} (a_1 − b_1) & (a_2 − b_2) \\ (α_1 − β_1) & (α_2 − β_2) \end{pmatrix}\begin{pmatrix} p_1 \\ p_2 \end{pmatrix} = \begin{pmatrix} (b_0 − a_0) \\ (β_0 − α_0) \end{pmatrix}.

Thus,

\begin{pmatrix} p_1 \\ p_2 \end{pmatrix} = \begin{pmatrix} (a_1 − b_1) & (a_2 − b_2) \\ (α_1 − β_1) & (α_2 − β_2) \end{pmatrix}^{−1}\begin{pmatrix} (b_0 − a_0) \\ (β_0 − α_0) \end{pmatrix}.

This is of the form discussed in the previous chapter with

A = \begin{pmatrix} (a_1 − b_1) & (a_2 − b_2) \\ (α_1 − β_1) & (α_2 − β_2) \end{pmatrix}, \quad x = \begin{pmatrix} p_1 \\ p_2 \end{pmatrix}, \quad b = \begin{pmatrix} (b_0 − a_0) \\ (β_0 − α_0) \end{pmatrix}.
Example 3.1: Consider the following two-market model:

Market 1: D_1 = 6 + 5p_1 + 4p_2, \quad S_1 = 12 + 2p_1 + 8p_2
Market 2: D_2 = 5 + 4p_1 + 7p_2, \quad S_2 = 9 + 6p_1 + 4p_2

To find the equilibrium prices, equate supply and demand. In the first market this gives:

6 + 5p_1 + 4p_2 = 12 + 2p_1 + 8p_2 \;\Leftrightarrow\; 3p_1 − 4p_2 = 6.

Similarly, for the second market:

5 + 4p_1 + 7p_2 = 9 + 6p_1 + 4p_2 \;\Leftrightarrow\; −2p_1 + 3p_2 = 4.

This gives the equation system (in matrix form):

\begin{pmatrix} 3 & −4 \\ −2 & 3 \end{pmatrix}\begin{pmatrix} p_1 \\ p_2 \end{pmatrix} = \begin{pmatrix} 6 \\ 4 \end{pmatrix},

so that

\begin{pmatrix} p_1 \\ p_2 \end{pmatrix} = \begin{pmatrix} 3 & −4 \\ −2 & 3 \end{pmatrix}^{−1}\begin{pmatrix} 6 \\ 4 \end{pmatrix} = \begin{pmatrix} 3 & 4 \\ 2 & 3 \end{pmatrix}\begin{pmatrix} 6 \\ 4 \end{pmatrix} = \begin{pmatrix} 34 \\ 24 \end{pmatrix}.

The example generalizes directly to the case of many markets. ♦
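Example 3.1 can be replicated in a few lines (NumPy assumed):

```python
import numpy as np

# Equating demand and supply gives: 3 p1 - 4 p2 = 6 and -2 p1 + 3 p2 = 4
A = np.array([[3.0, -4.0], [-2.0, 3.0]])
b = np.array([6.0, 4.0])

p = np.linalg.solve(A, b)
print(np.allclose(p, [34, 24]))  # True: equilibrium prices (p1, p2) = (34, 24)
```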
3.3 Adjacency matrices

A network of connections can be expressed as an adjacency matrix that summarizes connections between different nodes in a network. If there is a direct link from node i to node j, write m_{ij} = 1. If, in addition, there is a direct link back from j to i, then also m_{ji} = 1. The network of links is directed in the sense that a link from i to j does not necessarily imply a link from j to i. The adjacency matrix is:

M = \begin{pmatrix} m_{11} & m_{12} & \cdots & m_{1n} \\ m_{21} & m_{22} & \cdots & m_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ m_{n1} & m_{n2} & \cdots & m_{nn} \end{pmatrix}.
Just as M identifies those pairs ( i, j) such that j can be reached from i in one step (m i j = 1), it is easy to identify those nodes connected by exactly two steps.
Consider M² = MM. The entry in the (i, j)th position of M² is

m^2_{ij} = \sum_{k=1}^{n} m_{ik} m_{kj}.
Note that m_{ik}m_{kj} = 1 if and only if there is a link from i to k and from k to j. Thus, m^2_{ij} is the number of distinct ways to go from i to j in two steps (using exactly two links).

This insight carries over to steps of greater length. Consider M³ = MM². The element in the (i, j)th position is m^3_{ij} = \sum_{r=1}^{n} m_{ir} m^2_{rj}. If there is a path from i to r, then m_{ir} = 1 and there are m^2_{rj} two-step paths (or m_{ir}m^2_{rj} paths, since m_{ir} = 1) from i through r to j. If there is no path from i to r, then m_{ir} = 0 and there are m_{ir}m^2_{rj} = 0 paths from i through r to j. In total, there are m_{ir}m^2_{rj} paths of length 3 from i to j that pass through r. Summing over all r, there are \sum_{r=1}^{n} m_{ir}m^2_{rj} paths of length 3 that go from i to j. Thus, the entry m^3_{ij} in M³ gives the number of paths of length 3 that go from i to j. If the matrix M is multiplied k times, the same reasoning implies that the (i, j)th entry m^k_{ij} gives the number of paths of length k going from i to j. Define the sum:
S_k = M + M^2 + M^3 + · · · + M^k = \sum_{j=1}^{k} M^j.
The element s^k_{ij} in the (i, j)th position gives the number of paths of length k or less from i to j.

Example 3.2: Consider the three-node network with links 1 → 2, 2 → 3 and 3 → 1, with adjacency matrix and powers:

M = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}, \quad M^2 = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \quad M^3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.

The matrices M, M², M³ give the nodes reachable in one, two and three steps from any initial node.

S_1 = M, \quad S_2 = M + M^2 = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}, \quad S_3 = M + M^2 + M^3 = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix},
CHAPTER 3. LINEAR ALGEBRA: APPLICATIONS
80
so, for example, considering row 1 of S2 , there is one way to go from node 1 to node 2 using one or two steps. But there is no way to go from node 1 (back) to node 1 in one or two steps. From S3 , there is one way to go from
♦
any node to any other in three or fewer steps.
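The path counts in Example 3.2 are easy to verify numerically. A minimal sketch (assuming the three-node cycle 1 → 2 → 3 → 1 described in the example), using NumPy:

```python
import numpy as np

# Adjacency matrix of the three-node cycle 1 -> 2 -> 3 -> 1.
M = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])

M2 = M @ M        # entry (i, j) counts two-step paths from node i+1 to node j+1
M3 = M2 @ M       # three-step paths; for the 3-cycle this is the identity matrix
S2 = M + M2       # paths of length 1 or 2
S3 = M + M2 + M3  # paths of length 3 or less: every entry equals 1

print(S2)
print(S3)
```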
The next example illustrates similar points with a larger set of nodes.
EXAMPLE 3.3: In this example, the network consists of six nodes; the corresponding adjacency matrix is shown alongside the network diagram.

[Figure: a directed network on six nodes, with its 6 × 6 adjacency matrix M; the matrix M² and the sum S₆ = Σ_{j=1}^6 M^j are then computed.]
Thus, for example, from M² there is exactly one path of length 2 from node 1 to node 5, (1 → 2 → 5), indicated by m²_15 = 1. Similarly, there are exactly two paths of length 2 from node 2 to node 1, (2 → 3 → 1) and (2 → 5 → 1), indicated by m²_21 = 2.

Considering S₆, the (i, j)th entry s⁶_ij gives the number of paths with 6 or fewer steps from i to j. So, for example, s⁶_24 = 11 indicates that there are 11 paths from 2 to 4 with 6 steps or less. ♦
REMARK 3.1: One issue concerns the possibility of moving from one node to another. The main observation in this regard is that, with n nodes, if it is not possible to reach node j from node i in n steps, then it is impossible to reach j from i in any number of steps. This is easy to see. Let W_k(i) ⊆ {1, 2, ..., n} be the set of nodes reachable from i in k or fewer steps, and write #W_k(i) for the number of nodes in W_k(i). Then W_k(i) ⊆ W_{k+1}(i): if a node l is in W_k(i), it is also in W_{k+1}(i). This follows from the observation that if it is possible to reach j from i in k or fewer steps, it is certainly possible to reach j in no more than k + 1 steps. Next, if W_k(i) = W_{k+1}(i), then the same set of nodes is reached in k steps as in k + 1 steps – additional steps reach no extra nodes – and then W_{k+j}(i) = W_k(i) for j = 1, 2, .... So, at each iteration in k, either extra nodes are reached or the process ceases to add nodes. Suppose that W_n(i) ≠ {1, 2, ..., n}. Then for some k < n it must be that W_k(i) = W_{k+1}(i); otherwise the set W_k(i) would have increased at each round and, after n steps, would contain all n nodes. But W_k(i) = W_{k+1}(i) implies that W_k(i) = W_{k+j}(i) for all j ≥ 1. At j = n − k, W_k(i) = W_n(i), and for any r ≥ 0, with j = n − k + r, W_k(i) = W_{n+r}(i). So, for any r, W_k(i) = W_n(i) = W_{n+r}(i): if node l is not in W_n(i), it can never be reached from i. Note that this includes i itself: if i is not in W_n(i), there is no way to return to i.
The possibility of moving from one node to any other is closely related to the concept of irreducibility, which arises in the study of dynamic processes.

Definition 3.1: A matrix A is irreducible if and only if, for any i, j, there is a sequence i_1, i_2, ..., i_k with i_1 = i, i_k = j and a_{i_l i_{l+1}} ≠ 0 for l = 1, ..., k − 1.
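Remark 3.1 translates directly into a computational test: with n nodes, j is reachable from i exactly when the (i, j) entry of S_n = M + M² + ··· + Mⁿ is positive, and M is irreducible when every entry of S_n is positive. A minimal sketch (the four-node network below is a hypothetical example, chosen only for illustration):

```python
import numpy as np

def reachability(M):
    """Return S_n = M + M^2 + ... + M^n for an n-node adjacency matrix M.
    Entry (i, j) is positive exactly when j is reachable from i."""
    n = M.shape[0]
    S = np.zeros_like(M)
    P = np.eye(n, dtype=M.dtype)
    for _ in range(n):
        P = P @ M
        S = S + P
    return S

# Hypothetical network: the cycle 1 -> 2 -> 3 -> 1, plus an edge 4 -> 1.
M = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0]])
S = reachability(M)
print(S > 0)   # column 4 is all False: no node can reach node 4, so M is not irreducible
```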
3.4 Input–output analysis

In input–output analysis, the economy is viewed as a collection of n interlinked industries. The output of an industry is used for final consumption and in the production of other goods. For example, steel is used to produce trucks and transportation equipment, roads and bridges, and so on. Moreover, industries may be interlinked in complex ways: trucks are produced using steel but also transport the iron ore used to make steel, giving rise to "circular" relationships in the production process – achieving a certain level of steel output implies an appropriate level of transportation capacity. Apart from output going from one industry to others, some steel is "consumed" directly – for example, people buy garden tools, kitchen utensils and so on. So, the total output of steel is allocated between that used in
production in other industries (such as the transportation and housing industries) – an intermediate or "inter-industry" demand – and consumption or final demand. Planning to achieve some target level of consumption or final demand indirectly implies production levels in each industry. Input–output analysis is concerned with determining the industry production levels required to sustain or achieve target levels across the entire range of final demands. A simple model is developed next to illustrate the procedure.

Let a_ij = amount of i required to produce 1 unit of j, x_i = output of i and f_i = final demand for i. Thus, if x_j is the amount of good j produced, a_ij x_j is the amount of i required in the production of j. (If one kilo of good j is produced, a_ij is the fraction of a kilo of i used in the production of that kilo of j.) Let all measurements be in dollars. (So, for example, a_ij is the number of dollars of good i required to produce one dollar in value of good j.) The output of industry i is allocated to production in other industries (the a_ij x_j) and final demand (f_i). Thus,

    x_i = a_i1 x_1 + a_i2 x_2 + ··· + a_in x_n + f_i,   i = 1, ..., n.

This can be rearranged:

    −a_i1 x_1 − a_i2 x_2 − ··· + (1 − a_ii)x_i − ··· − a_in x_n = f_i,   i = 1, ..., n,
or, in matrix notation:

    | (1 − a_11)    −a_12     ···     −a_1n    | | x_1 |   | f_1 |
    |   −a_21     (1 − a_22)  ···     −a_2n    | | x_2 |   | f_2 |
    |    ...         ...       ·       ...     | | ... | = | ... |
    |   −a_n1       −a_n2     ···  (1 − a_nn)  | | x_n |   | f_n |

This expression can be written succinctly as (I − A)x = f, where the matrices are:

        | 1  0  ···  0 |        | a_11  a_12  ···  a_1n |            | 1 − a_11    −a_12    ···     −a_1n    |
    I = | 0  1  ···  0 |,   A = | a_21  a_22  ···  a_2n |,   I − A = |  −a_21    1 − a_22   ···     −a_2n    |
        | ...  ...  ...|        |  ...   ...    ·   ...  |            |   ...        ...      ·       ...     |
        | 0  0  ···  1 |        | a_n1  a_n2  ···  a_nn |            |  −a_n1      −a_n2    ···   1 − a_nn   |
EXAMPLE 3.4: Let

        | 0.7  0.4  0.3 |
    A = | 0.1  0.2  0.3 |
        | 0.1  0.3  0.2 |

The matrix A corresponds to the system:

    x_1 = 0.7x_1 + 0.4x_2 + 0.3x_3 + f_1
    x_2 = 0.1x_1 + 0.2x_2 + 0.3x_3 + f_2
    x_3 = 0.1x_1 + 0.3x_2 + 0.2x_3 + f_3.

So, producing one dollar of x_1 requires 70 cents of x_1 as input, 10 cents of x_2 and 10 cents of x_3. Thus (I − A)x = f is:

    |  0.3  −0.4  −0.3 | | x_1 |   | f_1 |
    | −0.1   0.8  −0.3 | | x_2 | = | f_2 |.
    | −0.1  −0.3   0.8 | | x_3 |   | f_3 |

For a fixed value of f, the unique solution to these equations is x = (I − A)⁻¹f. For example,

        | 90 |        | x_1 |   |  0.3  −0.4  −0.3 |⁻¹ | 90 |   | 1239.20 |
    f = | 75 |   ⇒   | x_2 | = | −0.1   0.8  −0.3 |   | 75 | = |  400.57 |.
        | 80 |        | x_3 |   | −0.1  −0.3   0.8 |   | 80 |   |  405.11 |

Thus, if the economy must meet the level of final demand given by the f vector, the required level of output in the industries is given by the x vector, x′ = (1239.20, 400.57, 405.11).
♦
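Example 3.4's figures can be checked by solving the linear system numerically (a sketch; solving (I − A)x = f directly is preferable to forming the inverse):

```python
import numpy as np

A = np.array([[0.7, 0.4, 0.3],
              [0.1, 0.2, 0.3],
              [0.1, 0.3, 0.2]])
f = np.array([90.0, 75.0, 80.0])

# Required industry outputs: the solution of (I - A) x = f.
x = np.linalg.solve(np.eye(3) - A, f)
print(np.round(x, 2))   # approximately (1239.20, 400.57, 405.11)
```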
Provided ( I − A ) is nonsingular, the industry output vector is x = ( I − A )−1 f .
However, not only must x satisfy x = ( I − A )−1 f , it must also be the case that x
is nonnegative for it to be a feasible level of production; negative output cannot be produced. So, it must be that x i ≥ 0 for all i . Whether x is nonnegative or not
depends on the A matrix.
The following discussion shows that a sufficient condition for a nonnegative solution to x = (I − A)⁻¹f is that the sum of the elements in each column of A is less than 1. In this case, it turns out that all entries in the matrix (I − A)⁻¹ are nonnegative, so that (I − A)⁻¹f is nonnegative, since f is nonnegative. When all column sums of A are less than 1, Aᵏ, the kth power of A, approaches a matrix of 0s as k becomes large, and this fact is sufficient to ensure that (I − A)⁻¹ has all nonnegative entries. The next example illustrates this property; the key result is then developed.
EXAMPLE 3.5: To illustrate, consider the following input–output coefficient matrix A and final demand vector f:

    A = | a_11  a_12 | = | 0.2  0.5 |,   f = | f_1 | = | 30 |.
        | a_21  a_22 |   | 0.3  0.1 |        | f_2 |   | 60 |

Then

    I − A = | 1 − a_11   −a_12  | = |  0.8  −0.5 |,   so   (I − A)⁻¹ = | 30/19  50/57 |,
            |  −a_21   1 − a_22 |   | −0.3   0.9 |                    | 10/19  80/57 |

which implies

    | x_1 | = | 30/19  50/57 | | 30 | = | 100 |.
    | x_2 |   | 10/19  80/57 | | 60 |   | 100 |

Notice that the solution is positive – both x_1 and x_2 are positive. This is not accidental: it is guaranteed by the fact that the column sums of the A matrix are both less than 1. With a view to motivating the later discussion, consider multiplying A by itself multiple times: A² = AA, A³ = AAA and A⁴ = AAAA. Note that the column sums of the A matrix are 0.5 = 0.2 + 0.3 (first column) and 0.6 = 0.5 + 0.1 (second column), and that the larger of these two numbers is 0.6. Observe now that

    A² = | 19/100  3/20 |,   A³ = | 83/1000  11/100  |,   A⁴ = | 31/625   21/400    |,
         |  9/100  4/25 |        | 33/500   61/1000 |         | 63/2000  391/10000 |

and these matrices have the following column sums. The column sums of A² are (28/100, 31/100) = (0.28, 0.31); the column sums of A³ are (149/1000, 171/1000) = (0.149, 0.171); and the column sums of A⁴ are (811/10000, 916/10000) = (0.0811, 0.0916). Finally, note that (a) the largest column sum of A² is no larger than (0.6)² = 0.36, (b) the largest column sum of A³ is no larger than (0.6)³ = 0.216 and (c) the largest column sum of A⁴ is no larger than (0.6)⁴ = 0.1296. There is a good reason for this pattern, and it has one particularly useful implication – that the solution, x, is positive. These matters are discussed next and relate to the nonnegativity of the solution to the input–output problem.
♦
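A quick numerical check of Example 3.5: the solution is positive, and the column sums of the powers Aᵏ are bounded by rᵏ, where r = 0.6 is the largest column sum of A:

```python
import numpy as np

A = np.array([[0.2, 0.5],
              [0.3, 0.1]])
f = np.array([30.0, 60.0])

x = np.linalg.solve(np.eye(2) - A, f)
print(x)   # both components positive (here (100, 100))

r = A.sum(axis=0).max()   # largest column sum: 0.6
P = A.copy()
for k in range(2, 5):
    P = P @ A   # P is now A^k
    print(k, P.sum(axis=0), r ** k)   # column sums of A^k never exceed r^k
```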
Before proving the result (that Aᵏ → 0 when all column sums of A are less than 1), a simple formula for calculating certain sums of scalars will provide useful insight. This formula is then generalized to matrices. Let α be a real number (α ∈ R) with |α| < 1. Then:

    (1 − α)(1 + α + α² + α³ + ··· + αⁿ) = (1 + α + α² + ··· + αⁿ) − (α + α² + ··· + αⁿ⁺¹) = 1 − αⁿ⁺¹.

Thus,

    1 + α + α² + ··· + αⁿ = (1 − αⁿ⁺¹)/(1 − α) = (1 − α)⁻¹(1 − αⁿ⁺¹).

Since |α| < 1, αⁿ⁺¹ → 0 as n becomes large. Consequently,

    lim_{n→∞} (1 + α + α² + ··· + αⁿ) = lim_{n→∞} (1 − α)⁻¹(1 − αⁿ⁺¹) = (1 − α)⁻¹.

Thus, because αⁿ → 0 as n → ∞, 1 + α + α² + ··· + αⁿ + ··· = 1/(1 − α).
This discussion applies to real numbers, but there is an analogous result for matrices. Let A² = AA, A³ = AAA and so on. Then,

    (I − A)(I + A + A² + A³ + ··· + Aⁿ) = (I + A + A² + ··· + Aⁿ) − (A + A² + ··· + Aⁿ⁺¹) = I − Aⁿ⁺¹.

Thus,

    I + A + A² + ··· + Aⁿ = (I − A)⁻¹(I − Aⁿ⁺¹).

These calculations show that, provided Aⁿ → 0 as n → ∞, (I + A + A² + A³ + ··· + Aⁿ) → (I − A)⁻¹. Thus, I + A + A² + ··· + Aⁿ + ··· = (I − A)⁻¹ if Aⁿ → 0 as n becomes large.
Theorem 3.1: A sufficient condition for Aⁿ → 0 is that the sum of the elements in every column of A is less than one: Σ_i a_ij < 1 for all j.

PROOF: If this condition holds, then there is some number r less than 1 that is at least as large as every column sum: ∃ r < 1 with Σ_i a_ij ≤ r < 1, ∀ j. Denote the (i, j)th element of A² by a⁽²⁾_ij, the ith row of A by a_i·, and the jth column of A by a_·j. Then a⁽²⁾_ij = a_i· a_·j = Σ_k a_ik a_kj. Thus,

    Σ_i a⁽²⁾_ij = Σ_i Σ_k a_ik a_kj = Σ_k (Σ_i a_ik) a_kj ≤ r Σ_k a_kj ≤ r²,

using the fact that Σ_i a_ik ≤ r. This implies that a⁽²⁾_ij ≤ r². Proceeding in this way gives a⁽ⁿ⁾_ij ≤ rⁿ, and since r < 1, a⁽ⁿ⁾_ij → 0 for all (i, j).
To see these calculations more clearly, write out A²:

         | Σ_k a_1k a_k1   Σ_k a_1k a_k2   ···   Σ_k a_1k a_kn |
    A² = | Σ_k a_2k a_k1   Σ_k a_2k a_k2   ···   Σ_k a_2k a_kn |
         |      ...             ...         ·         ...      |
         | Σ_k a_nk a_k1   Σ_k a_nk a_k2   ···   Σ_k a_nk a_kn |

Consider the first column of A². Writing out all the terms:

    a_11 a_11 + a_12 a_21 + ··· + a_1n a_n1
    a_21 a_11 + a_22 a_21 + ··· + a_2n a_n1
        ...
    a_n1 a_11 + a_n2 a_21 + ··· + a_nn a_n1.

Adding all of these together gives

    a_11 (a_11 + a_21 + ··· + a_n1) + a_21 (a_12 + a_22 + ··· + a_n2) + ··· + a_n1 (a_1n + a_2n + ··· + a_nn),

where the bracketed terms are, respectively, the column 1 sum, the column 2 sum, ..., the column n sum of A. Since each column sum is no larger than r, this sum is no larger than a_11 r + a_21 r + ··· + a_n1 r = (a_11 + a_21 + ··· + a_n1) r, which for the same reason is no larger than r². The same reasoning applies to the other columns of A².
Returning to the discussion: lim_{n→∞} Aⁿ⁺¹ = 0 when the column sums of A are all less than 1. Thus (I + A + A² + ···) = (I − A)⁻¹ and so x = (I − A)⁻¹f = (I + A + A² + ···)f. Since all the terms in the sum f + Af + A²f + A³f + ··· = (I + A + A² + ···)f are nonnegative, x is nonnegative.
EXAMPLE 3.6: This example illustrates how the polynomial expansion approximates the inverse. Let

    A  = | 0.2   0.35 |,   A² = | 0.1275  0.175  |,   A³ = | 0.0693  0.0971 |,   A⁴ = | 0.0381  0.0534 |,
         | 0.25  0.3  |        | 0.125   0.1775 |         | 0.0694  0.0970 |        | 0.0381  0.0534 |

    A⁵ = | 0.0210  0.0294 |,   A⁶ = | 0.0115  0.0161 |,   A⁷ = | 0.0063  0.0089 |,   A⁸ = | 0.0035  0.0049 |.
         | 0.0210  0.0294 |        | 0.0115  0.0161 |         | 0.0063  0.0089 |        | 0.0035  0.0049 |

So,

    I + A + A² + A³ + A⁴ + A⁵ + A⁶ + A⁷ + A⁸ = | 1.4772  0.7348 |
                                               | 0.5248  1.6872 |

and

    (I − A)⁻¹ = | 1.4815  0.7407 |.
                | 0.5291  1.6931 |

Thus,

    (I − A)⁻¹ − (I + A + A² + A³ + A⁴ + A⁵ + A⁶ + A⁷ + A⁸) = | 0.0043  0.0059 |,
                                                            | 0.0043  0.0060 |

which is very small. So I + A + A² + ··· + A⁸ approximates (I − A)⁻¹ accurately to the first two decimal places.
♦
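Example 3.6's partial sums are easily reproduced; a sketch accumulating the series I + A + A² + ··· + A⁸ and comparing it with the inverse:

```python
import numpy as np

A = np.array([[0.2, 0.35],
              [0.25, 0.3]])

S = np.eye(2)   # running partial sum, starting from I
P = np.eye(2)   # running power of A
for _ in range(8):
    P = P @ A
    S = S + P   # after the loop, S = I + A + A^2 + ... + A^8

inv = np.linalg.inv(np.eye(2) - A)
print(np.round(S, 4))
print(np.round(inv, 4))
print(np.round(inv - S, 4))   # the remainder of the series: small
```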
REMARK 3.2: Given f ≥ 0, the key requirement for x ≥ 0 is that (I − A)⁻¹ has all nonnegative entries. This condition is also necessary: if the (i, j)th element of (I − A)⁻¹ is negative, then the final consumption vector f with 1 in the jth position (and 0 elsewhere) produces an x vector with a negative entry in the ith position. Consider the condition: ∃ x* ≥ 0 with x* ≫ Ax*, where a ≫ b means that a_i > b_i for all i. Note that this implies that x* ≫ 0. If x* ≫ Aᵏx* for every k, the elements of Aᵏx* are bounded, so Aᵏx* (or some subsequence) converges to a vector z with x* ≥ z ≥ 0. Since x* ≫ Ax*, there is a λ < 1 with λx* ≫ Ax*. Because all entries of x* and Ax* are nonnegative, A(λx*) ≥ A(Ax*), or λAx* ≥ A²x* (with λAx* ≫ A²x* if each row of A is strictly positive). And, since λx* ≫ Ax*, λ²x* ≫ λAx*. Therefore λ²x* ≫ A²x*. Multiplying this expression by A, with the same observations, gives λ³x* ≫ A³x*. After k iterations, λᵏx* ≫ Aᵏx*, and since λᵏx* → 0, Aᵏx* → 0. Because x* ≫ 0, it follows that Aᵏ → 0.

Finally, suppose that the row sums of A are all less than 1. Let x* be a vector of 1s, so that Ax* is the vector of row sums of A. Since the row sums are all less than 1, x* ≫ Ax*, and the previous discussion implies that Aᵏ → 0.
This may be seen directly. Recall that in A² the first row consists of the terms {Σ_k a_1k a_kj}, j = 1, ..., n. Expanding these:

    Σ_k a_1k a_k1 = a_11 a_11 + a_12 a_21 + a_13 a_31 + ··· + a_1n a_n1
    Σ_k a_1k a_k2 = a_11 a_12 + a_12 a_22 + a_13 a_32 + ··· + a_1n a_n2
        ...
    Σ_k a_1k a_kn = a_11 a_1n + a_12 a_2n + a_13 a_3n + ··· + a_1n a_nn.

Adding these terms to obtain the first row sum of A²:

    a_11 (a_11 + a_12 + ··· + a_1n) + a_12 (a_21 + a_22 + ··· + a_2n) + ··· + a_1n (a_n1 + a_n2 + ··· + a_nn).

Letting d_i be the ith row sum of A, this is equal to a_11 d_1 + a_12 d_2 + ··· + a_1n d_n. And letting d = max_i d_i < 1, the sum is no larger than (a_11 + a_12 + ··· + a_1n)d, which in turn is no larger than d². Thus, the row sums of Aᵏ are no larger than dᵏ, which converges to 0, so that Aᵏ → 0.
3.5 Inflation and unemployment dynamics

Earlier calculations using a model of the Phillips curve and Okun's law led to the equation system:

    π_t = (4/5)π_{t−1} − (2/5)u_{t−1} + [(1/5)g + (2/5)ū]
    u_t = (2/5)π_{t−1} + (4/5)u_{t−1} + [−(2/5)g + (1/5)ū].

Rewriting in matrix form,

    | π_t |   | 4/5  −2/5 | | π_{t−1} |   |  (1/5)g + (2/5)ū |
    | u_t | = | 2/5   4/5 | | u_{t−1} | + | −(2/5)g + (1/5)ū |.

We can write this system succinctly as v_t = Av_{t−1} + δ. At a steady state, v_t = v_{t−1} = v*, and v* = Av* + δ, so that (I − A)v* = δ or v* = (I − A)⁻¹δ. Here, (I − A)v* = δ is

    |  1/5  2/5 | | π* |   |  (1/5)g + (2/5)ū |
    | −2/5  1/5 | | u* | = | −(2/5)g + (1/5)ū |

or

    (1/5)π* + (2/5)u* = (1/5)g + (2/5)ū
    −(2/5)π* + (1/5)u* = −(2/5)g + (1/5)ū,

with solution π* = g and u* = ū.

Returning to the equation v_t = Av_{t−1} + δ, whether v_t converges or not depends on the matrix A. Substituting for v_{t−1},

    v_t = A[Av_{t−2} + δ] + δ = A²v_{t−2} + Aδ + δ.

Continuing in this way, v_t = Aᵗv_0 + [I + A + A² + ··· + Aᵗ⁻¹]δ. If Aᵗ → 0, then [I + A + A² + ··· + Aᵗ⁻¹] → (I − A)⁻¹ and v_t → v* = (I − A)⁻¹δ. (The convergence of Aᵗ is discussed in Chapter 15, which deals with eigenvalues.) In this case, starting from any point (π_0, u_0), the system converges over time to (π*, u*) = (g, ū).
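The convergence of v_t = Av_{t−1} + δ can be simulated directly. A sketch with illustrative values g = 2 and ū = 5 (these numbers are assumptions, chosen only for the demonstration):

```python
import numpy as np

A = np.array([[0.8, -0.4],
              [0.4,  0.8]])
g, u_bar = 2.0, 5.0   # hypothetical values of g and u-bar, for illustration
delta = np.array([0.2 * g + 0.4 * u_bar,
                  -0.4 * g + 0.2 * u_bar])

v = np.array([10.0, 1.0])   # arbitrary starting point (pi_0, u_0)
for _ in range(200):
    v = A @ v + delta       # v_t = A v_{t-1} + delta

print(np.round(v, 6))       # converges to the steady state (g, u_bar)
```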
3.6 Stationary (invariant) distributions

The notion of a stationary distribution is most easily motivated by an example. Suppose that a worker currently employed has a 95% probability of being employed next period, and a worker currently unemployed has a 25% probability of being employed next period. Let E denote "employed" and U denote "unemployed", and write P(U|E) for the probability of being unemployed next period given that one is currently employed. With this notation, P(E|E) = 0.95, P(U|E) = 0.05, P(E|U) = 0.25 and P(U|U) = 0.75. This defines a transition matrix or Markov matrix, P:

    P = | P(E|E)  P(U|E) | = | 0.95  0.05 |.
        | P(E|U)  P(U|U) |   | 0.25  0.75 |

We are interested in the evolution of employment and unemployment in the workforce. Suppose that initially (currently) the fraction π_E of the workforce is employed and the fraction π_U is unemployed. Of the workers currently employed, the fraction P(E|E) will continue to be employed next period – so the probability of being currently employed and continuing to be employed next period is π_E P(E|E). Of the workers currently unemployed, the fraction P(E|U) will find employment next period – so the probability of being currently unemployed and finding employment next period is π_U P(E|U). The fraction of the workforce employed next period, π̂_E, is equal to the fraction of those currently employed who retain employment, π_E P(E|E), plus the fraction of those currently unemployed who
gain employment, π_U P(E|U):

    π̂_E = π_E P(E|E) + π_U P(E|U).

Similarly, the fraction of the workforce unemployed next period is π̂_U = π_E P(U|E) + π_U P(U|U). For example, suppose that π_E = 0.7 and π_U = 0.3. Then π̂_E = (0.7)(0.95) + (0.3)(0.25) = 0.74 and π̂_U = (0.7)(0.05) + (0.3)(0.75) = 0.26. Notice that:

    (π̂_E, π̂_U) = (π_E, π_U) | P(E|E)  P(U|E) |.
                             | P(E|U)  P(U|U) |

Writing π = (π_E, π_U), this gives π̂ = πP. Because this process proceeds over time, let π(t) = (π_E(t), π_U(t)) denote the distribution of states (employment and unemployment) in period t, so that the evolution of the process proceeds according to the formula π(t + 1) = π(t)P. Proceeding step by step, if π(0) denotes the current distribution of employed and unemployed, next period the distribution is determined by π(1) = π(0)P. Repeating this reasoning, in the third period the distribution is determined by π(2) = π(1)P = π(0)PP = π(0)P². In the (t + 1)th period, this becomes π(t) = π(0)Pᵗ.

In the case where the matrix P has all positive entries, Pᵗ converges to a matrix P* where, in any given column, all entries are the same or, equivalently, all rows are identical (see Section 3.6.1). In the two-state case,

    P* = | α  β |,
         | α  β |

where α and β are positive and sum to 1 (α + β = 1). Notice that

    π(0)P* = (π_E(0), π_U(0)) | α  β | = (α, β).
                              | α  β |

The important point to observe is that π(0)P* = (α, β) regardless of the particular value of π(0). Write π* = π(0)P* to denote this distribution, the long-run stationary or invariant distribution. Another way of viewing this is to observe that π* is the only distribution that satisfies the equation π = πP*: π* = π*P*, and π̂ ≠ π* implies that π̂ ≠ π̂P*, since π̂P* = π*.
The following calculations illustrate these observations using the example:

    P   = | 0.95  0.05 |,   P²  = | 0.915  0.085 |,   P³ = | 0.8905  0.1095 |,
          | 0.25  0.75 |         | 0.425  0.575 |         | 0.5475  0.4525 |

    P¹⁰ = | 0.8380  0.1620 |,   P²⁰ = | 0.8335  0.1665 |,   P⁵⁰ = P¹⁰⁰ = | 0.8333  0.1667 |.
          | 0.8098  0.1902 |         | 0.8327  0.1673 |                 | 0.8333  0.1667 |
If the initial distribution is π(0) = (0.7, 0.3), then

    π(1)   = π(0)P              = (0.740, 0.260),
    π(2)   = π(1)P  = π(0)P²    = (0.7680, 0.2320),
    π(3)   = π(2)P  = π(0)P³    = (0.7876, 0.2124),
    π(10)  = π(9)P  = π(0)P¹⁰   = (0.8296, 0.1704),
    π(50)  = π(49)P = π(0)P⁵⁰   = (0.8333, 0.1667),
    π(100) = π(99)P = π(0)P¹⁰⁰  = (0.8333, 0.1667).

Therefore, regardless of the initial employment–unemployment distribution, the long-run distribution is π* = (0.8333, 0.1667) = (5/6, 1/6). The fraction of the workforce employed in the long run is 5/6.
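The long-run calculation can be replicated by iterating π(t + 1) = π(t)P; a minimal sketch:

```python
import numpy as np

P = np.array([[0.95, 0.05],
              [0.25, 0.75]])

pi = np.array([0.7, 0.3])   # initial distribution (pi_E, pi_U)
for _ in range(100):
    pi = pi @ P             # pi(t + 1) = pi(t) P

print(np.round(pi, 4))      # approaches the invariant distribution (5/6, 1/6)
```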
Although this discussion has centered on the two-state case (the states being employed and unemployed), the principles apply generally. If there are n states, labeled s_1, ..., s_n, with probability p_ij of going from state s_i to state s_j, then the corresponding transition matrix is:

        | p_11  p_12  ···  p_1n |
    P = | p_21  p_22  ···  p_2n |
        |  ...   ...   ·    ... |
        | p_n1  p_n2  ···  p_nn |

where each row sums to 1: for each i, Σ_{j=1}^n p_ij = 1. If p_ij > 0 for all (i, j), then Pᵗ converges to a matrix P* as t becomes large, where P* has all rows equal or, equivalently, the numbers in each column are the same. If π* is the invariant distribution of P, so that π* = π*P, then

         | π* |
    P* = | ...|,
         | π* |

a matrix with n rows, each equal to the invariant distribution. This raises the question: why does this convergence take place?
3.6.1 Convergence to a stationary distribution

Let P be a transition matrix with representative entry p_ij: p_ij is the probability that state j will be reached next period, given that the current state is i. Let M_j be the largest number in column j, and m_j the smallest number in column j. The matrix P² = PP has entry p⁽²⁾_ij = Σ_k p_ik p_kj in the (i, j)th position. Say that P is strictly positive if there is some (small) strictly positive number ǫ such that p_ij ≥ ǫ for every i, j.

Theorem 3.2: Suppose that P is strictly positive. Then Pᵗ converges to a matrix P* where all rows of P* are identical.

PROOF: Consider column j of P. Since p_kj ≥ m_j for all k, and p_{k*j} = M_j for some k, say k*, we see that

    p⁽²⁾_ij = Σ_k p_ik p_kj = p_{ik*} p_{k*j} + Σ_{k≠k*} p_ik p_kj = p_{ik*} M_j + Σ_{k≠k*} p_ik p_kj.

Every element in column j is at least as large as m_j (p_kj ≥ m_j), so that

    Σ_{k≠k*} p_ik p_kj ≥ Σ_{k≠k*} p_ik m_j = m_j Σ_{k≠k*} p_ik = m_j (1 − p_{ik*}).

(In this calculation, because each row sums to 1, Σ_{k≠k*} p_ik + p_{ik*} = 1, so Σ_{k≠k*} p_ik = 1 − p_{ik*}.) Therefore,

    p⁽²⁾_ij = Σ_k p_ik p_kj ≥ p_{ik*} M_j + (1 − p_{ik*}) m_j ≥ ǫ M_j + (1 − ǫ) m_j = m_j + ǫ(M_j − m_j),

where the second inequality follows because p_{ik*} ≥ ǫ. Exactly the same reasoning shows that:

    p⁽²⁾_ij = Σ_k p_ik p_kj ≤ (1 − ǫ) M_j + ǫ m_j = M_j − ǫ(M_j − m_j).

Therefore, every entry of the jth column of P² is at least as large as m_j + ǫ(M_j − m_j), and no entry of the jth column of P² is larger than M_j − ǫ(M_j − m_j). This shows that the elements in the columns of P² are "closer" together than the elements in the columns of P. If M²_j and m²_j are the largest and smallest entries in column j of P², then

    m_j + ǫ(M_j − m_j) ≤ m²_j ≤ M²_j ≤ M_j − ǫ(M_j − m_j),

so

    M²_j − m²_j ≤ {M_j − ǫ(M_j − m_j)} − {m_j + ǫ(M_j − m_j)} = (1 − 2ǫ)(M_j − m_j).

The same reasoning shows that, with Mᵗ_j and mᵗ_j the largest and smallest entries in column j of Pᵗ,

    Mᵗ⁺¹_j − mᵗ⁺¹_j ≤ (1 − 2ǫ)ᵗ (M_j − m_j) → 0 as t becomes large.

The intuition is simple. In the jth column of P², entry (i, j) is p⁽²⁾_ij = Σ_k p_ik p_kj. Therefore, p⁽²⁾_ij is a weighted average of the elements of column j of P. Since all weights are strictly positive, this weighted average must be smaller than the largest entry in column j and larger than the smallest entry in this column (assuming not all entries of column j are equal). It is worth remarking that the assumption that every entry of P is positive gives a sufficient condition for Pᵗ to converge to a matrix P* with all rows equal. But it is not necessary – weaker conditions on P are sufficient to ensure convergence, but they are more complex to describe.
3.6.2 Computation of the invariant distribution

Finding the long-run stationary distribution would be tedious if computed by calculating Pᵗ for large values of t. There is an easier way to find the stationary distribution. Let I be the identity matrix, and define Q = I − P. Let c_ii be the cofactor of the ith diagonal entry of Q, and let c = Σ_{i=1}^n c_ii. Then π* = (π*_1, π*_2, ..., π*_n) is given by

    π*_i = c_ii / c,   i = 1, ..., n.

Applying this to the example discussed earlier,

    Q = I − P = | 1 − 0.95    −0.05   | = |  0.05  −0.05 |.
                |  −0.25    1 − 0.75  |   | −0.25   0.25 |

Thus, c_11 = 0.25, c_22 = 0.05 and c = 0.25 + 0.05 = 0.3, so that π*_1 = 0.25/0.3 = 5/6 and π*_2 = 0.05/0.3 = 1/6.
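The cofactor recipe extends to any n-state chain; a sketch computing each diagonal cofactor of Q = I − P as the determinant of the corresponding minor:

```python
import numpy as np

def invariant_distribution(P):
    """Stationary distribution via the diagonal cofactors of Q = I - P.
    The (i, i) cofactor is the determinant of Q with row i and column i
    deleted (the sign factor (-1)^(i+i) is +1)."""
    n = P.shape[0]
    Q = np.eye(n) - P
    cof = np.array([np.linalg.det(np.delete(np.delete(Q, i, 0), i, 1))
                    for i in range(n)])
    return cof / cof.sum()

P = np.array([[0.95, 0.05],
              [0.25, 0.75]])
print(invariant_distribution(P))   # (5/6, 1/6), matching the example
```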
EXAMPLE 3.7: Suppose that the following statistics for a baseball player are known. When batting, there is a 3/5 chance that he will hit the ball (on a swing), given that he hit the ball on his most recent swing. Write this as P(H|H) = 3/5. Thus, the probability that he will miss on a swing, given that he hit the ball on his most recent swing, is P(M|H) = 2/5. The probabilities of hit and miss, given that he missed on the most recent swing, are P(H|M) = 1/5 and P(M|M) = 4/5. The transition matrix is then:

    T = | P(H|H)  P(M|H) | = | 3/5  2/5 |.
        | P(H|M)  P(M|M) |   | 1/5  4/5 |

What is the (unique) stationary or invariant distribution of this transition matrix? On average (over the season), how often does the batter miss the ball on a swing? First:

    I − T = | 1 − 3/5    −2/5  | = |  2/5  −2/5 |.
            |  −1/5    1 − 4/5 |   | −1/5   1/5 |

The cofactors of I − T are c_11 = 1/5 and c_22 = 2/5, so that c = c_11 + c_22 = 3/5. The invariant distribution is (1/3, 2/3): on average, the batter misses on 2/3 of his swings. ♦
EXAMPLE 3.8: Students never sleep – they spend all their time in the restaurant (R) or the library (L). At any given time, a student in the library has a 75% chance of being in the library an hour later (and a 25% chance of being in the restaurant). A student in the restaurant has an 80% chance of being in the restaurant an hour later (and a 20% chance of being in the library). These probabilities determine the following transition matrix:

    P = | P(L|L)  P(R|L) | = | 3/4  1/4 |.
        | P(L|R)  P(R|R) |   | 1/5  4/5 |

On "average", what percentage of the student population is in the library (i.e., find the stationary distribution of the transition matrix)? In this case,

    Q = I − P = |  1/4  −1/4 |.
                | −1/5   1/5 |

Thus, the cofactors are c_11 = 1/5 and c_22 = 1/4, so that c = c_11 + c_22 = 9/20. Therefore π_1 = 4/9 and π_2 = 5/9. So, on average, 4/9 or 44.4% of students are in the library. ♦
3.7 Econometrics

A common problem in statistics and economics is the measurement of the relationship between different variables. Suppose there is a collection of observations on a variable y_t, (y_1, y_2, ..., y_T), and on a second variable x_t, (x_1, x_2, ..., x_T). Suppose also that these variables are linearly related. The discussion here describes a statistical procedure for measuring this relationship.
When the data sets consist of just two points, (y_1, y_2) and (x_1, x_2), these may be arranged in pairs, (x_1, y_1) and (x_2, y_2), and placed on a graph as in Figure 3.1(A).

[Figure 3.1: Fitting a line. Panels A and B plot the (x_i, y_i) pairs, with the third point marked ⋆ and further points marked ◦.]

Because there are only two points, one can draw a straight line through both points as indicated. The slope is given by b = (y_2 − y_1)/(x_2 − x_1). Since the equation satisfies y_i = a + bx_i, the intercept is given by a = y_1 − bx_1 = y_2 − bx_2. Thus, the relationship between y and x is fully described by the intercept and slope, (a, b). Suppose now there is a third point (x_3, y_3), located at ⋆ on the graph. This point does not lie on the line, and it is not possible to draw a straight line through all three points. And one could have still more points – such as those indicated by ◦. In the graph, it seems reasonable to say that x and y are positively related – when x increases, so does y. Also, the relationship appears "linear". From a different perspective, one can imagine starting with the data as in Figure 3.1(B) and trying to draw a straight line as close to all the points as possible.

This problem arises in econometric theory, where there is a presumed relationship between the y and x variables of the form:

    y_i = β_0 + β_1 x_i + ε_i.

If the x variable takes on the value x_i, the corresponding y variable is determined by multiplying x_i by a constant β_1, and adding an intercept β_0 and a random error term ε_i. The theory is that y is determined linearly from x, plus a small random shock. Given a finite collection of observations, (y_1, x_1), ..., (y_n, x_n), how should one estimate β_0 and β_1? Suppose b_0 and b_1 are the estimates of β_0 and β_1. Then the ith observation has a gap between y_i and its estimated value b_0 + b_1 x_i (according to the estimated line with intercept b_0 and slope b_1) equal to

    e_i = y_i − b_0 − b_1 x_i.
Ordinary least squares requires minimizing the sum of squares of these errors by choice of b_0 and b_1:

    S = Σ_{i=1}^n (y_i − b_0 − b_1 x_i)².        (3.1)

In matrix form it is easy to describe (b_0, b_1). Define the matrices X and y:

        | 1  x_1 |        | y_1 |
        | 1  x_2 |        | y_2 |
    X = | .   .  |,   y = |  .  |
        | 1  x_n |        | y_n |

It turns out that the choice of b_0 and b_1 that minimizes the sum of squared errors in Equation 3.1 is given by the following expression:

    | b_0 |
    | b_1 | = b = (X′X)⁻¹X′y.
EXAMPLE 3.9: Suppose that there are five observations on (x, y) pairs: x = (3, 4, 6, 2, 5) and y = (7, 9, 12, 5, 11). Thus:

        | 1  3 |        |  7 |
        | 1  4 |        |  9 |
    X = | 1  6 |,   y = | 12 |
        | 1  2 |        |  5 |
        | 1  5 |        | 11 |

and

    X′X = | 5   20 |,   (X′X)⁻¹ = |  9/5   −2/5 |,   X′y = |  44 |.
          | 20  90 |              | −2/5   1/10 |          | 194 |

Therefore,

    b = | b_0 | = (X′X)⁻¹X′y = |  9/5   −2/5 | |  44 | = | 8/5 |.
        | b_1 |                | −2/5   1/10 | | 194 |   | 9/5 |

♦

REMARK 3.3: This calculation assumes that (X′X)⁻¹ exists – that X′X has an inverse. What if (X′X)⁻¹ does not exist? If r(X′X) < 2, then since
♦ R EMARK 3.3: This calculation assumes that ( X ′ X )−1 exists – that X ′ X has an inverse. What if ( X ′ X )−1 does not exist? If r( X ′ X ) < 2 then since
3.7. ECONOMETRICS
97
r( X ′ X ) = min{r( X ′ ), r( X )} and r( X ′ ) = r( X ), r( X ) < 2. This implies that the
columns of X are linearly dependent so that ( x1 ,... , x n ) = α(1,... , 1), with α 6= 0. P P P Thus, x i = α, ∀ i and so x i = nα, x¯ = 1n x i = α, so x¯ = x i , ∀ i and ( x i − x¯ )2 = 0. In
this case, b1 cannot be computed. Thus, a sufficient condition for the estimator
to exist is that x 6= α(1,... , 1) for any α. The general case with many explanatory variables is similar.
    y_i = β_0 + β_1 x_1i + β_2 x_2i + ··· + β_k x_ki + ǫ_i.

In vector notation,

    y = Xβ + ǫ,

where y is n × 1, X is n × (k + 1) and ǫ is n × 1. The least squares estimator is, as before,

    b = (X′X)⁻¹X′y,

where X′X is (k + 1) × (k + 1). For (X′X)⁻¹ to exist, we require r(X′X) = k + 1. Since r(X′X) = r(X) ≤ min(k + 1, n), this requires in particular that n ≥ k + 1: the number of parameters to be estimated, k + 1, can be no greater than the number of observations.

The following discussion clarifies some details of the computation of the least squares estimator; it requires knowledge of calculus.
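Both the rank condition and the estimator itself can be checked numerically on the data of Example 3.9; a sketch:

```python
import numpy as np

x = np.array([3.0, 4.0, 6.0, 2.0, 5.0])
y = np.array([7.0, 9.0, 12.0, 5.0, 11.0])

X = np.column_stack([np.ones_like(x), x])
print(np.linalg.matrix_rank(X.T @ X))   # 2: (X'X)^(-1) exists

b = np.linalg.solve(X.T @ X, X.T @ y)   # solves X'X b = X'y
print(b)                                # (8/5, 9/5), as in Example 3.9

# If the regressor is constant, the columns of X are linearly dependent
# and X'X is singular (rank 1), so the estimator cannot be computed.
Xc = np.column_stack([np.ones_like(x), np.full_like(x, 2.0)])
print(np.linalg.matrix_rank(Xc.T @ Xc))   # 1
```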
3.7.1 Derivation of the least squares estimator

The sum of squared errors is given by:

    S = Σ_{i=1}^n e_i² = Σ_{i=1}^n (y_i − b_0 − b_1 x_i)²
                       = Σ_{i=1}^n (y_i² − 2b_0 y_i − 2b_1 x_i y_i + b_0² + 2b_0 b_1 x_i + b_1² x_i²).

Minimization of this expression by choice of b_0 and b_1 requires calculus – discussed in a later chapter. To minimize the expression, differentiate S with respect to b_0 and b_1 in turn and set the partial derivatives to 0. This gives:

    ∂S/∂b_0 = −2 Σ_{i=1}^n y_i + 2nb_0 + 2b_1 Σ_{i=1}^n x_i = 0
    ∂S/∂b_1 = −2 Σ_{i=1}^n x_i y_i + 2b_0 Σ_{i=1}^n x_i + 2b_1 Σ_{i=1}^n x_i² = 0.
CHAPTER 3. LINEAR ALGEBRA: APPLICATIONS
98
Dividing both equations by 2 and rearranging:

$$\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} b_0 + b_1 \sum_{i=1}^{n} x_i = nb_0 + b_1 \sum_{i=1}^{n} x_i$$
$$\sum_{i=1}^{n} x_i y_i = b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2.$$

So, the least squares estimators $b_0$ and $b_1$ satisfy the pair of equations:

$$\sum_{i=1}^{n} y_i = nb_0 + b_1 \sum_{i=1}^{n} x_i, \qquad \sum_{i=1}^{n} x_i y_i = b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2,$$

or, in matrix form:

$$\begin{pmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i y_i \end{pmatrix} = \begin{pmatrix} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{pmatrix} \begin{pmatrix} b_0 \\ b_1 \end{pmatrix}.$$
Then

$$X'X = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{pmatrix} \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix} = \begin{pmatrix} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{pmatrix}$$

and

$$X'y = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i y_i \end{pmatrix}.$$

So, the least squares estimators $b' = (b_0, b_1)$ satisfy the equations:

$$X'Xb = X'y.$$

Rearranging, on the assumption that $(X'X)$ is nonsingular,

$$b = (X'X)^{-1}X'y.$$

To compute $(X'X)^{-1}$ we need the determinant, minors, cofactors and adjoint. The determinant is

$$|X'X| = n\sum x_i^2 - \left(\sum x_i\right)^2 = n\sum x_i^2 - n^2\bar{x}^2 = n\left(\sum x_i^2 - n\bar{x}^2\right).$$
Minors:

$$\begin{pmatrix} \sum x_i^2 & \sum x_i \\ \sum x_i & n \end{pmatrix}$$

Cofactors:

$$\begin{pmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{pmatrix}$$

Adjoint:

$$\begin{pmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{pmatrix}$$

Thus,

$$(X'X)^{-1} = \frac{1}{n\left(\sum x_i^2 - n\bar{x}^2\right)} \begin{pmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{pmatrix}, \qquad X'y = \begin{pmatrix} 1 & \cdots & 1 \\ x_1 & \cdots & x_n \end{pmatrix} \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} \sum y_i \\ \sum x_i y_i \end{pmatrix}.$$

Thus,

$$(X'X)^{-1}X'y = \frac{1}{n\left(\sum x_i^2 - n\bar{x}^2\right)} \begin{pmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{pmatrix} \begin{pmatrix} \sum y_i \\ \sum x_i y_i \end{pmatrix} = \frac{1}{n\left(\sum x_i^2 - n\bar{x}^2\right)} \begin{pmatrix} (\sum x_i^2)(\sum y_i) - (\sum x_i)(\sum x_i y_i) \\ -(\sum x_i)(\sum y_i) + n(\sum x_i y_i) \end{pmatrix}.$$

So

$$b_0 = \frac{(\sum x_i^2)(n\bar{y}) - (n\bar{x})(\sum x_i y_i)}{n\left(\sum x_i^2 - n\bar{x}^2\right)} = \frac{(\sum x_i^2)\bar{y} - \bar{x}\sum x_i y_i}{\sum x_i^2 - n\bar{x}^2}$$
$$b_1 = \frac{n\sum x_i y_i - n^2\bar{x}\bar{y}}{n\left(\sum x_i^2 - n\bar{x}^2\right)} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}.$$

Now,

$$\left(\sum x_i^2\right)\bar{y} - \bar{x}\sum x_i y_i = \left(\sum x_i^2\right)\bar{y} - n\bar{x}^2\bar{y} - \bar{x}\left(\sum x_i y_i - n\bar{x}\bar{y}\right) = \bar{y}\left(\sum x_i^2 - n\bar{x}^2\right) - \bar{x}\left(\sum x_i y_i - n\bar{x}\bar{y}\right),$$

so

$$b_0 = \frac{(\sum x_i^2)\bar{y} - \bar{x}\sum x_i y_i}{\sum x_i^2 - n\bar{x}^2} = \bar{y} - \bar{x}b_1.$$

Thus, without the use of matrices, we can summarize:

$$b_0 = \bar{y} - \bar{x}b_1, \qquad b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}.$$
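The matrix-free summary can be evaluated directly on the data of Example 3.9 as a check (a hedged sketch; the variable names are illustrative):

```python
# Least squares without matrices, using the closed-form summary above
# on the data of Example 3.9.
x = [3, 4, 6, 2, 5]
y = [7, 9, 12, 5, 11]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n

# b1 = sum (x - xbar)(y - ybar) / sum (x - xbar)^2,  b0 = ybar - xbar * b1
sxy_dev = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx_dev = sum((xi - xbar) ** 2 for xi in x)
b1 = sxy_dev / sxx_dev
b0 = ybar - xbar * b1

print(b0, b1)   # approximately 1.6 and 1.8, i.e. 8/5 and 9/5 as before
```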
Exercises for Chapter 3

Exercise 3.1: For the (supply-and-demand) system of equations given below, find the equilibrium prices:

$$2p_1 + p_2 + 6p_3 = 26$$
$$p_1 + p_2 + p_3 = 12$$
$$3p_1 + 2p_2 - 2p_3 = 2.$$

Exercise 3.2: Given

$$A = \begin{pmatrix} 0.7 & 0.4 & 0.3 \\ 0.1 & 0.3 & 0.3 \\ 0.1 & 0.2 & 0.2 \end{pmatrix}, \qquad f = \begin{pmatrix} f_1 \\ f_2 \\ f_3 \end{pmatrix} = \begin{pmatrix} 90 \\ 75 \\ 80 \end{pmatrix},$$

find the vector of interindustry outputs.

Exercise 3.3: For the input–output model with technology matrix $A = \begin{pmatrix} 0.2 & 0.3 \\ 0.2 & 0.25 \end{pmatrix}$, suppose that the final demand vector is $f = \begin{pmatrix} 350 \\ 250 \end{pmatrix}$.

(a) Find the appropriate industry output levels (the $x$ vector).
(b) Find $A^2 (= AA)$, $A^3 (= AAA)$, $A^4 (= AAAA)$ and $A^5 (= AAAAA)$. Then calculate the matrix $G = (I + A + A^2 + A^3 + A^4 + A^5)$ and compute $Gf$.
(c) Observe that $G$ is close to $(I - A)^{-1}$ and your answer in (a) is close to $Gf$.
Exercise 3.4: Given the input–output model with technology matrix

$$A = \begin{pmatrix} 0.4 & 0.0 & 0.2 \\ 0.2 & 0.1 & 0.0 \\ 0.2 & 0.2 & 0.3 \end{pmatrix}$$

and final demand $f = \begin{pmatrix} 30 \\ 40 \\ 50 \end{pmatrix}$, find the equilibrium industry outputs.
Exercise 3.5: A simple economy consists of two sectors: agriculture ($A$) and transportation ($T$). The input–output matrix for this economy (rows and columns ordered $A$, $T$) is given by

$$\begin{pmatrix} 0.4 & 0.1 \\ 0.2 & 0.2 \end{pmatrix}.$$

(a) Find the gross output of agricultural products needed to satisfy a consumer demand for $50 million worth of agriculture products and $10 million worth of transportation.
(b) Find the value of agriculture products and transportation consumed in the internal process of production in order to meet the gross output.
Exercise 3.6: Given the input–output matrix

$$A = \begin{pmatrix} 0.2 & 0.2 & 0.1 \\ 0.2 & 0.1 & 0.2 \\ 0.1 & 0.4 & 0.3 \end{pmatrix}$$

and final demand $f = \begin{pmatrix} 100 \\ 80 \\ 50 \end{pmatrix}$, compute the interindustry flows necessary to sustain this final demand.
Exercise 3.7: Given an input–output matrix $A$, if $x$ is the vector of industry outputs and $f$ the vector of final demands, then $x$ and $f$ satisfy $x = Ax + f$. In the case where

$$A = \begin{pmatrix} \frac{3}{12} & \frac{4}{12} & \frac{6}{12} \\ \frac{1}{12} & 0 & \frac{2}{12} \\ 0 & 0 & \frac{4}{12} \end{pmatrix}, \qquad f = \begin{pmatrix} 20 \\ 30 \\ 40 \end{pmatrix},$$

find the solution value of $x$.

Suppose that each column sum of $A$ is less than 1: for each $j$, $\sum_{i=1}^{n} a_{ij} < 1$. Show that this implies that $(I - A)^{-1}$ exists and equals $(I + A + A^2 + A^3 + \cdots) = I + \sum_{t=1}^{\infty} A^t$. Explain why this implies that for a given final demand $f$ (a vector with nonnegative entries), the solution value of $x$ is nonnegative.
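The series claim in Exercise 3.7 can be illustrated numerically (an illustration, not a proof). The sketch below uses the $2 \times 2$ technology matrix of Exercise 3.3, whose column sums 0.4 and 0.55 are both below 1, and compares the truncated series applied to $f$ with the exact solution of $(I - A)x = f$:

```python
# When every column sum of A is below 1, (I - A)^{-1} = I + A + A^2 + ...
# Here A is the 2x2 technology matrix of Exercise 3.3.
A = [[0.2, 0.3],
     [0.2, 0.25]]
f = [350, 250]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matvec(P, v):
    return [sum(P[i][k] * v[k] for k in range(2)) for i in range(2)]

# Partial sum G = I + A + ... + A^50.
G = [[1.0, 0.0], [0.0, 1.0]]   # running sum, starts at I
P = [[1.0, 0.0], [0.0, 1.0]]   # running power A^t
for _ in range(50):
    P = matmul(P, A)
    G = [[G[i][j] + P[i][j] for j in range(2)] for i in range(2)]

x_series = matvec(G, f)

# Exact solution of (I - A) x = f by Cramer's rule, for comparison.
a, b_, c, d = 1 - A[0][0], -A[0][1], -A[1][0], 1 - A[1][1]
det = a * d - b_ * c
x_exact = [(d * f[0] - b_ * f[1]) / det, (a * f[1] - c * f[0]) / det]

print(x_series, x_exact)   # both are (625, 500) to many decimal places
```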
Exercise 3.8: Consider the transition matrix:

$$T = \begin{pmatrix} \frac{1}{2} & \frac{1}{4} & \frac{1}{4} \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{1}{4} & \frac{1}{2} & \frac{1}{4} \end{pmatrix}.$$

(a) Find the unique stationary distribution (using the cofactors) of $T$, i.e., find the $1 \times 3$ vector $q$ satisfying $qT = q$.
(b) Find $T^2$ and compare with $T$.
(c) Find the determinant of $T$. Is $T$ invertible? Why?

Exercise 3.9: Let

$$x = \begin{pmatrix} 2 \\ 3 \\ 9 \\ 3 \\ 6 \\ 11 \\ 4 \end{pmatrix}, \qquad y = \begin{pmatrix} 7 \\ 12 \\ 26 \\ 13 \\ 18 \\ 36 \\ 13 \end{pmatrix}.$$

Find the least squares estimator.
4 Linear programming

4.1 Introduction

Linear programming is an optimization technique for finding optimal solutions to problems where the objective and constraints are linear. In what follows, the general linear programming problem and methods of solution are described, along with various features of this class of problem. A linear program is characterized by a linear objective, linear constraints and restrictions on the signs of choice variables. Linearity of the constraint set leads to a particular geometric structure on the set of possible choices. For low-dimensional programs this permits a graphical treatment of the problem and, more generally, the geometric structure leads naturally to an algorithmic method for finding solutions. Section 4.1 develops the basic features of a linear program. This consists of formulation of the programming problem, description of key issues in solving these programs and the valuation of resource constraints through the dual prices. A broader understanding of linear programming requires knowledge of basic solutions, which are discussed in Section 4.2. Here the geometric view described in Section 4.1 is reinterpreted in terms of equation systems. Section 4.3 describes classic results in linear programming where the original program (the primal) is related to a second program (the dual). Before developing the discussion, a simple example will illustrate how these programs arise.
EXAMPLE 4.1: A firm produces two goods, 1 and 2. Production requires the application of two techniques: processing and finishing. A unit of good 1 requires two hours of processing time and four hours of finishing time; a unit of good 2 requires three hours of processing and three hours of finishing time. The total amount of time that the processing machine can run for each day is 13 hours, while the finishing machine can run for 17 hours. Each unit of good 1 sells for $3 and each unit of good 2 sells for $4. The problem is to decide how much of each good should be produced to maximize revenue – a problem that may be formulated as a linear program.

Let $x$ denote the number of units of good 1 produced, and let $y$ denote the number of units of good 2 produced. Then the revenue is equal to $3x + 4y$, since each unit of 1 generates $3 of revenue and each unit of 2 generates $4 of revenue. This defines the linear objective. Next, $x$ units of good 1 require $2x$ hours of processing time, while $y$ units of good 2 require $3y$ hours of processing time: altogether, producing $x$ units of 1 and $y$ units of 2 requires $2x + 3y$ hours of processing time. Similarly, producing $x$ units of 1 and $y$ units of 2 requires $4x + 3y$ hours of finishing time. The total amount of processing time is 13 hours, so the production plan must satisfy $2x + 3y \le 13$, the first linear constraint, and likewise $4x + 3y \le 17$ defines the finishing constraint. Finally, negative amounts cannot be produced: $x, y \ge 0$. Putting all these conditions together, the full problem may be summarized:

$$\max 3x + 4y$$
subject to:
$$2x + 3y \le 13 \quad \text{(processing)}$$
$$4x + 3y \le 17 \quad \text{(finishing)}$$
$$x, y \ge 0. \qquad \diamond$$
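The formulation in Example 4.1 can be written directly as data $(c, A, b)$. The following sketch (illustrative, not part of the text) checks candidate production plans against the constraints:

```python
# Example 4.1 as data: maximize c.(x, y) subject to A(x, y) <= b, x, y >= 0.
c = [3, 4]                      # revenue per unit of goods 1 and 2
A = [[2, 3],                    # processing hours used per unit
     [4, 3]]                    # finishing hours used per unit
b = [13, 17]                    # machine hours available per day

def feasible(plan):
    return all(v >= 0 for v in plan) and all(
        sum(a * v for a, v in zip(row, plan)) <= bi
        for row, bi in zip(A, b))

def revenue(plan):
    return sum(ci * v for ci, v in zip(c, plan))

print(feasible([2, 3]), revenue([2, 3]))   # True 18
print(feasible([3, 3]))                    # False: finishing needs 21 > 17
```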
4.1.1 Formulation

In a linear-programming problem a collection of variables $(x_1, x_2, \dots, x_n)$ must be chosen to maximize (or minimize) a linear function,

$$f(x_1, x_2, \dots, x_n) = c_1 x_1 + c_2 x_2 + \cdots + c_n x_n,$$

subject to a finite number of linear constraints of the form:

$$a_1 x_1 + a_2 x_2 + \cdots + a_n x_n \le b, \qquad a_1 x_1 + a_2 x_2 + \cdots + a_n x_n \ge b, \qquad \text{or} \qquad a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = b.$$

In addition, the $x_i$ variables are required to be chosen nonnegative. A typical linear program may be written:

$$\max c_1 x_1 + c_2 x_2 + \cdots + c_n x_n$$
subject to:
$$a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n \le b_1$$
$$a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n \le b_2$$
$$\vdots$$
$$a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n \le b_m$$
$$x_1, x_2, \dots, x_n \ge 0.$$

This is a linear program with $n$ variables and $m$ constraints. If we write $c = (c_1, c_2, \dots, c_n)$,

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}, \qquad b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}, \qquad x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix},$$

then the program may be written in matrix form:

$$\max c \cdot x \quad \text{subject to:} \quad Ax \le b, \quad x \ge 0. \tag{4.1}$$

In parallel with the maximization problem, there is a minimization formulation:

$$\min c_1 x_1 + c_2 x_2 + \cdots + c_n x_n$$
subject to:
$$a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n \ge b_1$$
$$a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n \ge b_2$$
$$\vdots$$
$$a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n \ge b_m$$
$$x_1, x_2, \dots, x_n \ge 0.$$

In matrix form:

$$\min c \cdot x \quad \text{subject to:} \quad Ax \ge b, \quad x \ge 0. \tag{4.2}$$
4.1.1.1 Mixed constraints and unrestricted variable signs

The formulations for maximization and minimization given in Equations 4.1 and 4.2 have a standardized form with nonnegativity constraints. In general, maximization and minimization problems can be cast in this way. The following discussion illustrates. Consider the program:

$$\max c_1 x_1 + c_2 x_2$$
subject to:
$$a_{11} x_1 + a_{12} x_2 \le b_1$$
$$a_{21} x_1 + a_{22} x_2 \ge b_2$$
$$x_1 \ge 0.$$

In this program, both types of inequality appear, and only $x_1$ is required to be nonnegative. Replacing $x_2$ with $y - z$ and requiring $y$ and $z$ to be nonnegative overcomes the second issue – picking $y = 0$ and $z$ any desired positive number gives $x_2$ equal to the negative of that number:

$$\max c_1 x_1 + c_2 y - c_2 z$$
subject to:
$$a_{11} x_1 + a_{12} y - a_{12} z \le b_1$$
$$a_{21} x_1 + a_{22} y - a_{22} z \ge b_2$$
$$x_1, y, z \ge 0.$$

The first issue is solved by multiplying the relevant constraint by $-1$ to reverse the inequality: for example, $5 \ge 4$ is equivalent to $-5 \le -4$, and $ax \ge b$ is equivalent to $-ax \le -b$. In this example, multiplying $a_{21} x_1 + a_{22} y - a_{22} z \ge b_2$ by $-1$ gives:

$$\max c_1 x_1 + c_2 y - c_2 z$$
subject to:
$$a_{11} x_1 + a_{12} y - a_{12} z \le b_1$$
$$-a_{21} x_1 - a_{22} y + a_{22} z \le -b_2$$
$$x_1, y, z \ge 0.$$

Similarly, an equality constraint $\sum a_i x_i = b_i$ is equivalent to the two constraints $\sum a_i x_i \le b_i$ and $\sum a_i x_i \ge b_i$. Note that the convention is to formulate the minimization program with the inequalities having the "$\ge$" form: in matrix terms, $Ax \ge b$. In the linear program, constraints of the form $Ax \le b$ or $Ax \ge b$ are called feasibility constraints; the constraints that $x_i \ge 0$ for all $i$ are called nonnegativity constraints. Taking both types of constraint together, refer to them as the program constraints or constraints.
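The two conversions just described – negating a "$\ge$" row and splitting an unrestricted variable into a difference of nonnegative ones – can be sketched as data transformations. The numbers here are hypothetical, chosen only to illustrate:

```python
# A program stored as (c, A, b): after conversion every row reads "<=" and
# every variable is ">= 0".  (Hypothetical coefficients for illustration.)
c = [5, 7]
A = [[1, 2],                    # row 0:  x1 + 2*x2 <= 10
     [3, -1]]                   # row 1:  3*x1 - x2 >= 4   (a ">=" row)
b = [10, 4]
ge_rows = {1}                   # rows originally of ">=" type
free_vars = {1}                 # x2 is unrestricted in sign

# 1. Multiply each ">=" row by -1 so it reads "<=".
for i in ge_rows:
    A[i] = [-a for a in A[i]]
    b[i] = -b[i]

# 2. Replace each free variable x_j by y - z: append its column, negated.
for j in sorted(free_vars):
    c.append(-c[j])
    for row in A:
        row.append(-row[j])

print(c)   # [5, 7, -7]
print(A)   # [[1, 2, -2], [-3, 1, -1]]
print(b)   # [10, -4]
```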
The next sections consider a number of important concepts in linear programming. Section 4.1.2 discusses the feasible region – those choices that satisfy all the constraints of the problem. Following this, Section 4.1.3 considers the problem of finding an optimal solution. In Section 4.1.4, dual prices and slack variables are introduced. Dual prices attach a price or valuation to each constraint, while slack variables indicate whether constraints are binding or not. Finally, the dual program associated with a linear program (called “the primal”) is introduced (Section 4.1.6). The dual program is derived from the primal program, is closely related and provides additional perspective on the primal program.
4.1.2 The feasible region

A vector $x = (x_1, \dots, x_n)$ is called feasible if it satisfies the linear constraint inequalities and the nonnegativity constraints.
Definition 4.1: The set of x vectors that satisfy the constraints Ax ≤ b and x ≥ 0 is called the feasible set or feasible region.
Geometrically, the feasible region has a special shape – it is a convex polyhedron. The following examples give the feasible regions defined by inequality constraints, and the corresponding figures show the feasible regions as shaded areas.

EXAMPLE 4.2: In this example, the feasible region is defined by two inequality constraints and the nonnegativity constraints:

$$2x + y \le 6$$
$$3x + 4y \le 12$$
$$x, y \ge 0.$$

[Figure: the feasible region is the shaded area bounded by the axes and the lines $2x + y = 6$ and $3x + 4y = 12$.] ♦
The next example has three constraints. The additional constraint adds corners to the feasible region, and indicates the impact of additional constraints.
EXAMPLE 4.3: The feasible region is defined by:

$$2x + 2y \le 10$$
$$3x + 6y \le 18$$
$$2x + 12y \le 24$$
$$x, y \ge 0.$$

[Figure: the feasible region is the shaded area bounded by the axes and the lines $2x + 2y = 10$, $3x + 6y = 18$ and $2x + 12y = 24$.] ♦
The feasible region is defined by linear equations, but this region may be quite complex. The next example illustrates increasing complexity. In this example, the second constraint is written as $2x + 6y \ge 12$, which is equivalent to $-2x - 6y \le -12$.

EXAMPLE 4.4: The feasible region is defined by:

$$3x + 4y \le 12$$
$$2x + 6y \ge 12$$
$$2x + 8y \le 20$$
$$x, y \ge 0.$$

[Figure: the feasible region is the shaded area bounded by the lines $3x + 4y = 12$, $2x + 6y = 12$ and $2x + 8y = 20$.] ♦
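A pointwise feasibility check for a region mixing "$\le$" and "$\ge$" constraints can be sketched using the constraints of Example 4.4 (a minimal sketch, not from the text):

```python
# Feasibility of points against the constraints of Example 4.4.
# Each constraint is stored as (coefficients, sense, right-hand side).
constraints = [([3, 4], "<=", 12),
               ([2, 6], ">=", 12),
               ([2, 8], "<=", 20)]

def feasible(x, y):
    if x < 0 or y < 0:
        return False
    for (a1, a2), sense, rhs in constraints:
        lhs = a1 * x + a2 * y
        if sense == "<=" and lhs > rhs:
            return False
        if sense == ">=" and lhs < rhs:
            return False
    return True

print(feasible(1, 2))   # True: all three constraints hold at (1, 2)
print(feasible(0, 0))   # False: 2x + 6y >= 12 fails at the origin
print(feasible(4, 0))   # False: 2x + 6y = 8 < 12
```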
4.1.3 Finding an optimal solution

Once the feasible region is identified, one can begin the search for an optimal solution: a point in the feasible region such that there is no other point in the feasible region that improves the objective value.

Definition 4.2: Consider the linear program: $\max c \cdot x$ subject to $Ax \le b$ and $x \ge 0$. If $x^*$ is feasible and $c \cdot x^* \ge c \cdot x$ for all feasible $x$, then $x^*$ is called an optimal solution. Similarly, in the program $\min c \cdot x$ subject to $Ax \ge b$ and $x \ge 0$, if $x^*$ is feasible and $c \cdot x^* \le c \cdot x$ for all feasible $x$, then $x^*$ is called an optimal solution.
When it is possible to graph the feasible region (with two choice variables), and there are not too many constraints, the problem can be solved by inspection. To illustrate, consider Example 4.1 again.
EXAMPLE 4.5: Recall the program from Example 4.1:

$$\max 3x + 4y$$
subject to:
$$2x + 3y \le 13$$
$$4x + 3y \le 17$$
$$x, y \ge 0.$$

Consider the objective function, $\pi(x, y) = 3x + 4y$. An equation such as $\pi(x, y) = 2$ defines a line giving those combinations of $(x, y)$ such that $\pi(x, y) = 2$. For example, because $3 \cdot 0 + 4 \cdot \frac{1}{2} = 2$ and $3 \cdot \frac{2}{3} + 4 \cdot 0 = 2$, the points $(x, y) = \left(0, \frac{1}{2}\right)$ and $(x, y) = \left(\frac{2}{3}, 0\right)$ are on this line. Referring to $\pi$ as profit, the set $\{(x, y) \mid \pi(x, y) = k\}$ is called an "isoprofit" line with profit $k$. Isoprofit lines for the objective function, along with the feasible region, may easily be graphed.

[Figure: the feasible region bounded by $2x + 3y = 13$ and $4x + 3y = 17$, with isoprofit lines drawn for $k = 8$, 12 and 18; the line $\pi(x, y) = 18$ touches the feasible region only at the corner $(2, 3)$.]

From the figure, the highest value for the objective, 18, is achieved at $(2, 3)$, the corner of the feasible region determined by the equations $2x + 3y = 13$ and $4x + 3y = 17$. Using Cramer's rule:

$$\begin{vmatrix} 2 & 3 \\ 4 & 3 \end{vmatrix} = 6 - 12 = -6, \qquad \begin{vmatrix} 13 & 3 \\ 17 & 3 \end{vmatrix} = 39 - 51 = -12, \qquad \begin{vmatrix} 2 & 13 \\ 4 & 17 \end{vmatrix} = 34 - 52 = -18.$$

Thus, the solution is $x^* = (x^*, y^*) = \left(\frac{-12}{-6}, \frac{-18}{-6}\right) = \left(\frac{12}{6}, \frac{18}{6}\right) = (2, 3)$. Note that the feasible region is defined by its corners. Determining the objective value at each of the corners gives the table:

Corner $(x, y)$             Objective value
$(0, 0)$                    $0 = 3(0) + 4(0)$
$\left(0, \frac{13}{3}\right)$    $17\frac{1}{3} = 3(0) + 4\left(\frac{13}{3}\right)$
$(2, 3)^*$                  $18 = 3(2) + 4(3)$
$\left(\frac{17}{4}, 0\right)$    $12\frac{3}{4} = 3\left(\frac{17}{4}\right) + 4(0)$

By comparing the objective value at each corner, observe that the highest value occurs at the corner $(2, 3)$. This is not accidental. An optimal solution can always be found at a corner. ♦
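The corner-comparison method used in Example 4.5 can be automated for two-variable problems: intersect every pair of boundary lines by Cramer's rule, discard infeasible intersections, and take the best remaining point. This is a sketch of that idea, not the simplex method:

```python
from itertools import combinations

# Corner search for "max c.x s.t. Ax <= b, x >= 0" in two variables,
# applied to the program of Example 4.5.
c = [3, 4]
A = [[2, 3], [4, 3]]
b = [13, 17]

# Boundary lines: the two constraints plus the axes x = 0 and y = 0.
lines = [(A[0], b[0]), (A[1], b[1]), ([1, 0], 0), ([0, 1], 0)]

def feasible(p):
    x, y = p
    eps = 1e-9
    return x >= -eps and y >= -eps and all(
        r[0] * x + r[1] * y <= bi + eps for r, bi in zip(A, b))

corners = []
for (r1, b1), (r2, b2) in combinations(lines, 2):
    det = r1[0] * r2[1] - r1[1] * r2[0]
    if det == 0:
        continue                      # parallel lines: no intersection
    p = ((b1 * r2[1] - r1[1] * b2) / det,   # Cramer's rule
         (r1[0] * b2 - b1 * r2[0]) / det)
    if feasible(p):
        corners.append(p)

best = max(corners, key=lambda p: c[0] * p[0] + c[1] * p[1])
print(best)   # (2.0, 3.0), with objective value 18
```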
An important theorem in linear programming (see Theorem 4.2, later) says that if a linear program has an optimal solution, then it has an optimal solution at a corner. This does not say that every optimal solution is at a corner – just that there is an optimal solution at a corner (assuming the problem has an optimal solution). The key benefit of the theorem is that in searching for an optimal solution we can restrict attention to the corners of the feasible region. When there are many variables and a graphical solution is not possible, being able to restrict attention to corners is very useful. One method for solving a linear program, the simplex method, compares corners, moving from one to another to improve the objective and stopping when no further improvement is possible. (Newer methods move inside the feasible set during the search.) In more than two dimensions, the meaning of "corner" may be less clear. In higher dimensions, the appropriate concept is a vertex or extreme point of the feasible region. Note in the figures that no corner is a (strict) convex combination of other points in the feasible region. We may define a corner of the feasible region to be a point, $x^*$, in the feasible region with the property that we cannot find two other points in the feasible region, say $y$, $z$ (different from $x^*$), such that $x^*$ lies on a straight line connecting these two points. The only points in the figures satisfying this criterion are the corners. This criterion is independent of the dimension of the feasible region (the number of elements in the $x$ vector) and defines a vertex or extreme point of the feasible set in general. This is discussed further in Section 4.2.
4.1.4 Dual prices and slack variables

In a linear program, the inequality constraints identify resource constraints: usage of a particular resource may not exceed a particular level. Subject to these constraints, the objective is optimized. Therefore, when a resource constraint is varied (some right-hand side parameter $b_i$ is altered), this generally has an impact on the solution and the corresponding objective value. This change in the objective value is measured by the dual price.

4.1.4.1 Dual prices

The dual or shadow price of the $i$th constraint is defined as the change in the value of the optimized objective due to a unit change in the right-hand side of the $i$th constraint. For example, suppose that the program has been solved and the solution is at a corner $x^*$ with associated objective value $c \cdot x^*$. Then $x^*$ is located at the intersection of some set of constraints. Changing the right-hand side (RHS) of one of those constraints slightly will change the intersection point slightly, to say $x'$. Suppose that the RHS of the $i$th constraint is changed from $b_i$ to $b_i'$ and that the corner at which the solution is located is defined by the same set of constraints (the solution occurs at $x'$). In Example 4.6, figure A, the original solution is at $(2, 3)$; if the RHS of constraint 1 is changed from 13 to 14, the solution moves to $\left(\frac{9}{6}, \frac{22}{6}\right)$ – still defined by the intersection of constraints 1 and 2. (This is the typical situation. Formally, the solution basis does not change. See Example 4.6, Remark 4.1 and the discussion of basic solutions in Section 4.2.) Let the new solution be $x'$. Then the dual price of the $i$th constraint is defined:

$$DP_i = \frac{\text{Change in objective}}{\text{Change in RHS of constraint } i} = \frac{c \cdot x' - c \cdot x^*}{b_i' - b_i}.$$
EXAMPLE 4.6: To illustrate, continuing with the previous example, consider the impact of changing the right-hand side ($RHS_1$) of the first constraint from 13 to 14. In the notation above, $b_1 = 13$, $b_1' = 14$. This defines a new program:

$$\max 3x + 4y \ \text{ s.t. } \ 2x + 3y \le 13 \ (1), \quad 4x + 3y \le 17 \ (2), \quad x, y \ge 0$$
$$\Longrightarrow \quad \max 3x + 4y \ \text{ s.t. } \ 2x + 3y \le 14 \ (1'), \quad 4x + 3y \le 17 \ (2), \quad x, y \ge 0.$$
When the RHS of constraint 1 changes from 13 to 14, the new solution using Cramer's rule is:

$$\begin{vmatrix} 2 & 3 \\ 4 & 3 \end{vmatrix} = 6 - 12 = -6, \qquad \begin{vmatrix} 14 & 3 \\ 17 & 3 \end{vmatrix} = 42 - 51 = -9, \qquad \begin{vmatrix} 2 & 14 \\ 4 & 17 \end{vmatrix} = 34 - 56 = -22.$$

Then the new solution is at $x' = (x', y') = \left(\frac{9}{6}, \frac{22}{6}\right)$ (see figure A).

[Figures A and B: the feasible region of the original program with its corner at $(2, 3)$. In figure A, relaxing constraint 1 to $2x + 3y \le 14$ moves the solution to $\left(\frac{9}{6}, \frac{22}{6}\right)$; in figure B, relaxing constraint 2 to $4x + 3y \le 18$ moves the solution to $\left(\frac{15}{6}, \frac{16}{6}\right)$.]
The objective value at the new solution is:

$$c \cdot x' = (3, 4) \cdot \left(\tfrac{9}{6}, \tfrac{22}{6}\right) = \tfrac{1}{6}\left[3(9) + 4(22)\right] = \tfrac{115}{6} = 19.1667 = 19\tfrac{1}{6}.$$

Therefore,

$$DP_1 = \frac{\Delta \text{objective}}{\Delta RHS_1} = \frac{c \cdot x' - c \cdot x^*}{b_1' - b_1} = \frac{19.1667 - 18}{14 - 13} = 1.1667 = 1\tfrac{1}{6}.$$

The same reasoning applies to constraint 2. Consider raising the RHS ($RHS_2$) of constraint 2 from 17 to 18. This is depicted in figure B:

$$\max 3x + 4y \ \text{ s.t. } \ 2x + 3y \le 13 \ (1), \quad 4x + 3y \le 17 \ (2), \quad x, y \ge 0$$
$$\Longrightarrow \quad \max 3x + 4y \ \text{ s.t. } \ 2x + 3y \le 13 \ (1), \quad 4x + 3y \le 18 \ (2'), \quad x, y \ge 0.$$

Using Cramer's rule, the calculations for determining the new solution are:

$$\begin{vmatrix} 2 & 3 \\ 4 & 3 \end{vmatrix} = -6, \qquad \begin{vmatrix} 13 & 3 \\ 18 & 3 \end{vmatrix} = 39 - 54 = -15, \qquad \begin{vmatrix} 2 & 13 \\ 4 & 18 \end{vmatrix} = 36 - 52 = -16,$$

giving $x^\dagger = \frac{-15}{-6} = \frac{15}{6}$ and $y^\dagger = \frac{-16}{-6} = \frac{16}{6}$. The new solution $x^\dagger = (x^\dagger, y^\dagger) = \left(\frac{15}{6}, \frac{16}{6}\right)$ has objective value:

$$c \cdot x^\dagger = (3, 4) \cdot \left(\tfrac{15}{6}, \tfrac{16}{6}\right) = \tfrac{1}{6}\left[3(15) + 4(16)\right] = \tfrac{109}{6} = 18.1667.$$

Therefore,

$$DP_2 = \frac{\Delta \text{objective}}{\Delta RHS_2} = \frac{c \cdot x^\dagger - c \cdot x^*}{b_2' - b_2} = \frac{18.1667 - 18}{18 - 17} = 0.1667 = \tfrac{1}{6}.$$
Note: One can also consider the impact of changes in other parameters of the problem. For example, suppose that the objective were $8x + 6y$ instead of $3x + 4y$. In this case, the objective would run parallel to the second constraint and any point on the line between $(x, y) = (2, 3)$ and $(x, y) = \left(\frac{17}{4}, 0\right)$ would be optimal, including the two optimal corners. ♦

4.1.4.2 Slack variables
Given the constraints $Ax \le b$ and $x \ge 0$, suppose that the optimal solution has been found and call it $x^*$. The slack variables are defined as the vector $s^*$ that satisfies $Ax^* + s^* = b$, or $s^* = b - Ax^*$. (In the case where the constraints have the form $Ax \ge b$ and $x \ge 0$, the slack variables are defined by $Ax^* - s^* = b$, or $s^* = Ax^* - b$. This convention ensures that the slack variables are nonnegative.)

Continuing with the example, $s^* = (s_1^*, s_2^*)$ is defined by the equations:

$$2x^* + 3y^* + s_1^* = 13 \qquad \text{or} \qquad 2(2) + 3(3) + s_1^* = 13 \qquad \text{or} \qquad 13 + s_1^* = 13$$
$$4x^* + 3y^* + s_2^* = 17 \qquad \text{or} \qquad 4(2) + 3(3) + s_2^* = 17 \qquad \text{or} \qquad 17 + s_2^* = 17,$$

so that $s^* = (s_1^*, s_2^*) = (0, 0)$. When a slack variable is 0, the corresponding constraint is binding. Here both slack variables are 0, and the constraints bind. In general, changing the RHS of a binding constraint changes the solution and the value of the objective. Therefore, in general, if the slack variable associated with a constraint is 0, the corresponding dual price is nonzero. Conversely, if the slack variable associated with a constraint is positive, then the constraint is not binding: changing it has no effect on the solution and so the dual price is zero.
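Slack variables and dual prices for the running example can be computed together by re-solving the corner system after a unit change in each right-hand side (a sketch in exact rational arithmetic):

```python
from fractions import Fraction as F

# Running example: max 3x + 4y s.t. 2x + 3y <= 13, 4x + 3y <= 17.
c = [F(3), F(4)]
A = [[F(2), F(3)], [F(4), F(3)]]
b = [F(13), F(17)]

def corner(b0, b1):
    """Intersection of 2x + 3y = b0 and 4x + 3y = b1, by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    x = (b0 * A[1][1] - A[0][1] * b1) / det
    y = (A[0][0] * b1 - b0 * A[1][0]) / det
    return x, y

def value(p):
    return c[0] * p[0] + c[1] * p[1]

star = corner(b[0], b[1])
slacks = [bi - (row[0] * star[0] + row[1] * star[1]) for row, bi in zip(A, b)]
dp1 = value(corner(b[0] + 1, b[1])) - value(star)   # unit change in RHS 1
dp2 = value(corner(b[0], b[1] + 1)) - value(star)   # unit change in RHS 2

print(star[0], star[1])      # 2 3: the optimal corner
print(slacks[0], slacks[1])  # 0 0: both constraints bind
print(dp1, dp2)              # 7/6 1/6: the dual prices found in the text
```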
4.1.5 Some examples

In this section, two examples are considered in detail to summarize the discussion so far.

EXAMPLE 4.7: Consider the following linear program:

$$\max 4x + 7y$$
subject to:
$$2x + 6y \le 28 \quad (1)$$
$$3x + 4y \le 22 \quad (2)$$
$$5x + 4y \le 34 \quad (3)$$
$$x, y \ge 0.$$
• Plot: [Figure: the three constraint lines, with the feasible-region corners $(2, 4)$, $\left(\frac{92}{22}, \frac{72}{22}\right)$ and $(6, 1)$ marked.]
• Solution: Constraints 1 and 2 intersect at $(x, y) = (2, 4)$; constraints 1 and 3 at $(x, y) = \left(\frac{92}{22}, \frac{72}{22}\right)$; and constraints 2 and 3 at $(x, y) = (6, 1)$. These are found by solving each pair of equations in turn. For example, $2x + 6y = 28$ and $3x + 4y = 22$ solve to give $(x, y) = (2, 4)$, which turns out to be the optimal solution:

$$\begin{pmatrix} 2 & 6 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 28 \\ 22 \end{pmatrix}.$$

Thus

$$\begin{vmatrix} 2 & 6 \\ 3 & 4 \end{vmatrix} = -10, \qquad \begin{vmatrix} 28 & 6 \\ 22 & 4 \end{vmatrix} = -20, \qquad \begin{vmatrix} 2 & 28 \\ 3 & 22 \end{vmatrix} = -40.$$

This gives $(x, y) = \left(\frac{-20}{-10}, \frac{-40}{-10}\right) = (2, 4)$. To see that it is optimal, recall from the earlier discussion that comparing the objective value at each feasible corner is one way to solve the program. Another way is to compare the slopes of the constraints and the objective:

Constraint 1: slope $-\frac{1}{3}$; Constraint 2: slope $-\frac{3}{4}$; Constraint 3: slope $-\frac{5}{4}$; Objective: slope $-\frac{4}{7}$.

The slope of the objective lies between the slopes of constraints 1 and 2, so the optimal solution is where these constraints intersect – at $(2, 4)$. The value of the objective at the solution is $4(2) + 7(4) = 36$.

• Parameter changes: If, for example, the objective were changed to $6x + 7y$, with slope $-\frac{6}{7}$, this would be steeper than constraint 2 and the solution would be located at $(x, y) = (6, 1)$. If the objective were altered to be parallel to the second constraint (for example, $6x + 8y$), there would be an infinite number of solutions ($(x, y) = (2, 4)$, $(x, y) = (6, 1)$ and all points on the line connecting them).

• Slack variables: Since both constraints 1 and 2 are satisfied with equality, $s_1 = s_2 = 0$, while $s_3 = 34 - 5(2) - 4(4) = 34 - 26 = 8$.

• Dual prices:

1. To calculate the dual price of constraint 1, replace 28 by 29. Then:
$$\begin{vmatrix} 29 & 6 \\ 22 & 4 \end{vmatrix} = -16, \qquad \begin{vmatrix} 2 & 29 \\ 3 & 22 \end{vmatrix} = -43,$$
and the new solution is $(x, y) = \left(\frac{16}{10}, \frac{43}{10}\right)$. The value of the objective is $4\left(\frac{16}{10}\right) + 7\left(\frac{43}{10}\right) = \frac{365}{10} = 36\frac{1}{2}$. Consequently, $DP_1 = \frac{1}{2}$.

2. To calculate the dual price of constraint 2, replace 22 by 23. Then:
$$\begin{vmatrix} 28 & 6 \\ 23 & 4 \end{vmatrix} = -26, \qquad \begin{vmatrix} 2 & 28 \\ 3 & 23 \end{vmatrix} = -38,$$
and the new solution is $(x, y) = \left(\frac{26}{10}, \frac{38}{10}\right)$. The value of the objective is $4\left(\frac{26}{10}\right) + 7\left(\frac{38}{10}\right) = \frac{370}{10} = 37$. Consequently, $DP_2 = 1$.

3. Since changing the right-hand side of constraint 3 does not affect the solution, the associated dual price is 0. ♦
EXAMPLE 4.8: Consider the following linear program:

$$\min 4x + 2y$$
subject to:
$$4x + 8y \ge 40 \quad (1)$$
$$3x + 2y \ge 24 \quad (2)$$
$$6x + 2y \ge 30 \quad (3)$$
$$x, y \ge 0.$$
• Plot: [Figure: the three constraint lines, with intersection points $(2, 9)$, $(4, 3)$ and $\left(7, \frac{3}{2}\right)$ marked; the feasible region lies above the constraint lines.]
• Solution: Comparing the slopes of the objective and constraints gives the solution at the intersection of constraints 2 and 3:

$$\begin{vmatrix} 3 & 2 \\ 6 & 2 \end{vmatrix} = -6, \qquad \begin{vmatrix} 24 & 2 \\ 30 & 2 \end{vmatrix} = -12, \qquad \begin{vmatrix} 3 & 24 \\ 6 & 30 \end{vmatrix} = -54.$$

Thus, $(x, y) = (2, 9)$. The value of the objective at the solution is $4(2) + 2(9) = 26$.

• Slack variables: The values of the slack variables at the optimal solution $(x^*, y^*)$ are defined by the equations:

$$4x^* + 8y^* - s_1^* = 40$$
$$3x^* + 2y^* - s_2^* = 24$$
$$6x^* + 2y^* - s_3^* = 30,$$

giving $s_1^* = 4x^* + 8y^* - 40 = 4(2) + 8(9) - 40 = 80 - 40 = 40$, and $s_2^* = 0$, $s_3^* = 0$.

• Dual prices:

1. Because constraint 1 is not binding, its shadow price is 0: $DP_1 = 0$.

2. To determine the dual price of the second constraint, increase $RHS_2$ by 1, from 24 to 25. Solving for the intersection of this new constraint 2 and the original constraint 3:
$$\begin{vmatrix} 25 & 2 \\ 30 & 2 \end{vmatrix} = -10, \qquad \begin{vmatrix} 3 & 25 \\ 6 & 30 \end{vmatrix} = -60.$$
This gives a new solution $(x, y) = \left(\frac{10}{6}, 10\right)$ and the objective value is $4\left(\frac{10}{6}\right) + 2(10) = 26.6667$. The original objective value was 26, so a unit increase in the RHS of constraint 2 has raised the value of the objective by $\frac{2}{3}$: $DP_2 = 0.6667$.

3. Finally, to determine $DP_3$, raise the RHS of constraint 3 by 1 unit and find the new intersection with the original constraint 2:
$$\begin{vmatrix} 24 & 2 \\ 31 & 2 \end{vmatrix} = -14, \qquad \begin{vmatrix} 3 & 24 \\ 6 & 31 \end{vmatrix} = -51.$$
This gives a new solution $(x, y) = \left(\frac{14}{6}, \frac{51}{6}\right)$ and the objective value is $4\left(\frac{14}{6}\right) + 2\left(\frac{51}{6}\right) = \frac{158}{6} = 26.3333$. Therefore $DP_3 = 0.3333 = \frac{1}{3}$. ♦
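Because the decimals in Example 4.8 are rounded, the dual prices are worth re-deriving exactly (a sketch; fractions avoid rounding error):

```python
from fractions import Fraction as F

# Example 4.8: min 4x + 2y, solution at the intersection of
# constraints 2 (3x + 2y = b2) and 3 (6x + 2y = b3).
def corner(b2, b3):
    """Intersection of 3x + 2y = b2 and 6x + 2y = b3, by Cramer's rule."""
    det = F(3 * 2 - 2 * 6)              # -6
    x = (b2 * 2 - 2 * b3) / det
    y = (3 * b3 - b2 * 6) / det
    return x, y

def cost(p):
    return 4 * p[0] + 2 * p[1]

base = cost(corner(F(24), F(30)))           # 26, at (2, 9)
dp2 = cost(corner(F(25), F(30))) - base     # raise RHS of constraint 2
dp3 = cost(corner(F(24), F(31))) - base     # raise RHS of constraint 3

print(base, dp2, dp3)   # 26 2/3 1/3
```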
4.1.6 Duality reconsidered

The following discussion provides a different perspective on dual prices and their computation. An example will motivate the discussion. The next example takes an earlier example and rearranges the parameters and objective to give a program called the dual program. Every program has a dual program, giving the terminology primal and dual to refer to the initial and transformed programs. It turns out that the optimal objective values are the same in both programs, the dual prices in the primal are the solution values in the dual, and the solution values in the primal are the dual prices in the dual program.

EXAMPLE 4.9: Consider the following linear program:

Program A:
$$\min 13w + 17z$$
subject to:
$$2w + 4z \ge 3 \quad (1)$$
$$3w + 3z \ge 4 \quad (2)$$
$$w, z \ge 0.$$

Observe that this program bears a close relation to one discussed earlier (see Examples 4.5 and 4.6):

Program B:
$$\max 3x + 4y$$
subject to:
$$2x + 3y \le 13 \quad (1)$$
$$4x + 3y \le 17 \quad (2)$$
$$x, y \ge 0.$$

Program A is called the dual of program B. Notice that A is obtained from B by exchanging the objective coefficients and RHS constraint parameters, and taking the transpose of the coefficient matrix. The remarkable fact relating these programs is the following. Let $(x^*, y^*)$ solve program B, with objective value $\pi_B$ and dual prices $(DP_1^B, DP_2^B)$. Similarly, let $(w^*, z^*)$ solve program A, with objective value $\pi_A$ and dual prices $(DP_1^A, DP_2^A)$. Then

(1) $\pi_B = \pi_A$,  (2) $(x^*, y^*) = (DP_1^A, DP_2^A)$,  (3) $(w^*, z^*) = (DP_1^B, DP_2^B)$.

Recall from Examples 4.5 and 4.6 that the solution to program B has $\pi_B = 18$, $(x^*, y^*) = (2, 3)$ and $(DP_1, DP_2) = \left(1\frac{1}{6}, \frac{1}{6}\right)$. To solve program A, plot the constraints:

[Figure: the constraints of program A, $2w + 4z = 3$ and $3w + 3z = 4$, with the solution at their intersection, $(w, z) = \left(\frac{7}{6}, \frac{1}{6}\right)$.]
The solution is at the intersection of the two constraints, and using Cramer's rule:

$$\begin{vmatrix} 2 & 4 \\ 3 & 3 \end{vmatrix} = -6, \qquad \begin{vmatrix} 3 & 4 \\ 4 & 3 \end{vmatrix} = -7, \qquad \begin{vmatrix} 2 & 3 \\ 3 & 4 \end{vmatrix} = -1$$

gives the solution $(w^*, z^*) = \left(\frac{7}{6}, \frac{1}{6}\right) = \left(1\frac{1}{6}, \frac{1}{6}\right)$. The objective value is $\pi_A = 13\left(\frac{7}{6}\right) + 17\left(\frac{1}{6}\right) = \frac{108}{6} = 18$.

To calculate the dual prices, consider first an increase of $\frac{1}{4}$ in the RHS of constraint 1 and solve for the new intersection:

$$\begin{vmatrix} 2 & 4 \\ 3 & 3 \end{vmatrix} = -6, \qquad \begin{vmatrix} 3\frac{1}{4} & 4 \\ 4 & 3 \end{vmatrix} = -6\tfrac{1}{4}, \qquad \begin{vmatrix} 2 & 3\frac{1}{4} \\ 3 & 4 \end{vmatrix} = -1\tfrac{3}{4}.$$

Thus, the new solution is $(w', z') = \left(\frac{6\frac{1}{4}}{6}, \frac{1\frac{3}{4}}{6}\right) = \left(\frac{25}{24}, \frac{7}{24}\right)$. The new objective value is then $13\left(\frac{25}{24}\right) + 17\left(\frac{7}{24}\right) = \frac{444}{24} = 18\frac{1}{2}$. Thus, $DP_1^A = \frac{1/2}{1/4} = 2$.

For constraint 2, to determine $DP_2^A$, change the RHS to $4\frac{1}{4}$:

$$\begin{vmatrix} 2 & 4 \\ 3 & 3 \end{vmatrix} = -6, \qquad \begin{vmatrix} 3 & 4 \\ 4\frac{1}{4} & 3 \end{vmatrix} = -8, \qquad \begin{vmatrix} 2 & 3 \\ 3 & 4\frac{1}{4} \end{vmatrix} = -\tfrac{1}{2}.$$

The new solution is now $(w^\dagger, z^\dagger) = \left(\frac{8}{6}, \frac{1}{12}\right)$. The new objective value is then $13\left(\frac{16}{12}\right) + 17\left(\frac{1}{12}\right) = \frac{225}{12} = 18\frac{3}{4}$. Thus, $DP_2^A = \frac{3/4}{1/4} = 3$. Summarizing: $DP_1^A = 2$, $DP_2^A = 3$. Thus, $(x^*, y^*) = (DP_1^A, DP_2^A) = (2, 3)$.

Recall that the dual prices for program B were $DP_1^B = 1.1667 = \frac{7}{6}$ and $DP_2^B = 0.1667 = \frac{1}{6}$, and $\pi_B = 18$. Thus, $(w^*, z^*) = (DP_1^B, DP_2^B) = \left(\frac{7}{6}, \frac{1}{6}\right)$ and $\pi_A = \pi_B = 18$. ♦
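The equality $\pi_A = \pi_B$ in Example 4.9 reflects weak duality: every feasible value of the max program is at most every feasible value of the min program, so equal objective values at a pair of feasible points certify that both are optimal. A quick check (illustrative function names):

```python
from fractions import Fraction as F

# Feasibility in program B (the primal/max) and program A (the dual/min).
def B_feasible(x, y):
    return x >= 0 and y >= 0 and 2*x + 3*y <= 13 and 4*x + 3*y <= 17

def A_feasible(w, z):
    return w >= 0 and z >= 0 and 2*w + 4*z >= 3 and 3*w + 3*z >= 4

x, y = F(2), F(3)          # claimed solution of program B
w, z = F(7, 6), F(1, 6)    # claimed solution of program A

print(B_feasible(x, y), A_feasible(w, z))   # True True
print(3*x + 4*y, 13*w + 17*z)               # 18 18: equal, so both optimal
```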
However, there are some qualifications to these observations, as the next remark illustrates.

REMARK 4.1: The tight connection between objective variables and dual variables depends on regularity of the programs. Consider the following program with associated dual:

Primal:
$$\max 2x + 2y$$
subject to:
$$x \le 4 \quad (1)$$
$$y \le 4 \quad (2)$$
$$x + y \le 6 \quad (3)$$
$$x, y \ge 0.$$

Dual:
$$\min 4w + 4v + 6z$$
subject to:
$$w + z \ge 2 \quad (1)$$
$$v + z \ge 2 \quad (2)$$
$$w, v, z \ge 0.$$

The primal has an objective parallel to the third constraint. Any point on the line connecting $(2, 4)$ and $(4, 2)$, such as $(3, 3)$, is an optimal solution to the linear program. The value of the objective is 12. Furthermore, for either constraint 1 or 2, raising the RHS by 1 or reducing it by 1 leaves $(3, 3)$ both feasible and optimal, so that there is no impact on the objective. The dual prices of constraints 1 and 2 are 0. The third constraint in the primal has a dual price of 2. In the dual program, $w = v = 0$ and $z = 2$ is the unique optimal solution. To see this, consider changing these values. If $z$ is reduced by $\Delta$, then both $w$ and $v$ must increase by $\Delta$ to satisfy the constraints. The cost reduction from the change in $z$, $6\Delta$, is outweighed by the added cost from the increase in $w$ and $v$: $8\Delta = 4\Delta + 4\Delta$. Turning to shadow prices, if the RHS of the first constraint in the dual is raised to $2 + \gamma$, $\gamma$ small, then it is optimal to raise $w$ from 0 to $\gamma$: raising $w$ raises the objective by $4\gamma$, whereas increasing $z$ would raise the cost by $6\gamma$. The change in the objective divided by the change in the RHS of constraint 1 is 4, suggesting $DP_1 = 4$ in the dual. The same reasoning suggests that $DP_2 = 4$ in the dual, but the pair $(4, 4)$ corresponds to no solution in the primal.
4.2 Basic solutions

The previous discussion dealt with programs having constraints of the form Ax ≤ b or Ax ≥ b. In what follows, it is shown that such inequality-constrained problems may be reformulated as equality-constrained programs. The equality-constrained formulation is then used to discuss basic solutions.
4.2.1 Equality-constrained programs
Consider the linear program: max c · x subject to Ax ≤ b and x ≥ 0. Here c is n × 1, x is n × 1, A is m × n, s is m × 1 and I is m × m, the identity matrix. The program may be written in a variety of equivalent ways:

I: max c · x subject to: Ax ≤ b, x ≥ 0

II: max c · x + 0 · s subject to: Ax + Is = b, x, s ≥ 0

III: max (c, 0) · (x, s) subject to: (A : I)(x, s) = b, (x, s) ≥ 0

IV: max c̄ · z subject to: Āz = b, z ≥ 0.

Formulation II can be expressed succinctly as follows. Define new matrices and vectors: c̄ = (c, 0), Ā = (A : I), z = (x, s). In the initial problem, the dimensions of the matrices and vectors were c – n × 1, x – n × 1, A – m × n. Since A is an m × n matrix, there are m constraints (leaving aside the nonnegativity constraints on the x). The equality-constrained model is obtained by adding "slack" variables – s – one for each constraint. So there are m slack variables added to give formulation II. Thus c̄ is a lengthened vector with zeros added to c; likewise, z involves lengthening the x vector with s, and Ā is a matrix formed by placing A and I, the m × m identity matrix, side by side. Therefore, z is (n + m) × 1, c̄ is (n + m) × 1 and Ā is m × (n + m). Programs II, III and IV are equivalent problems – they involve only relabeling. Furthermore, I and II are equivalent problems, so that the inequality-constrained program in I may be replaced by the equality-constrained program, IV.

Theorem 4.1: The linear programs I and II have the same solutions: if x∗ solves I then for some s∗ ≥ 0, (x∗, s∗) solves II, and if (x̃, s̃) solves II then x̃ solves I.

PROOF: Let x∗ solve I and suppose that there is no s∗ ≥ 0 such that (x∗, s∗) solves II. If this is the case, then for any s∗ ≥ 0, ∃(x̃, s̃) with:

    c · x̃ = c · x̃ + 0 · s̃ > c · x∗ + 0 · s∗ = c · x∗ and Ax̃ + Is̃ = b, (x̃, s̃) ≥ 0.

However, from the previous equation,

    0 ≤ s̃ = Is̃ = b − Ax̃ implies Ax̃ ≤ b.

Therefore, x̃ is feasible for the first problem and c · x̃ > c · x∗, contradicting the assumption that x∗ solves problem I.

Conversely, suppose that (x̃, s̃) solves II but x̃ does not solve I. Then ∃x∗ ≥ 0 such that c · x∗ > c · x̃ and Ax∗ ≤ b. Set s∗ = b − Ax∗ so that s∗ ≥ 0. Then (x∗, s∗) is feasible and improving over (x̃, s̃) in II: c · x∗ + 0 · s∗ > c · x̃ + 0 · s̃. This contradicts the assumption that (x̃, s̃) solves II. Therefore, solving I and solving II are equivalent. This completes the proof.
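The construction in Theorem 4.1 is mechanical and easy to sketch in code. The snippet below (Python with NumPy; the A, b, c data are borrowed from Example 4.10 later in the chapter) forms Ā = (A : I) and c̄ = (c, 0), and confirms that a feasible x for I extends to a feasible z = (x, s) for II with the same objective value.

```python
import numpy as np

def to_equality_form(A, b, c):
    """max c.x s.t. Ax <= b, x >= 0  becomes  max c_bar.z s.t. A_bar z = b,
    z >= 0, where A_bar = (A : I) and c_bar = (c, 0)."""
    m, n = A.shape
    return np.hstack([A, np.eye(m)]), np.concatenate([c, np.zeros(m)])

# Constraint data (the system of Example 4.10 below).
A = np.array([[4.0, 1.0], [1.0, 3.0], [1.0, 1.0]])
b = np.array([24.0, 21.0, 9.0])
c = np.array([2.0, 5.0])

A_bar, c_bar = to_equality_form(A, b, c)

# A feasible x for I extends to a feasible z = (x, s) for II with s = b - Ax.
x = np.array([3.0, 6.0])
s = b - A @ x                         # slack in each constraint
z = np.concatenate([x, s])
print(s)                              # [6. 0. 0.]: constraints 2 and 3 bind
assert np.all(s >= 0) and np.allclose(A_bar @ z, b)
assert np.isclose(c_bar @ z, c @ x)   # objective values agree (Theorem 4.1)
```

The zero entries of s identify binding constraints, which is why the slack variables reappear in the discussion of basic solutions below.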
REMARK 4.2: The minimization problem is analogous.

I: min c · x subject to: Ax ≥ b, x ≥ 0

II: min c · x + 0 · s subject to: Ax − Is = b, x, s ≥ 0.

This can be expressed:

    min c̄ · z subject to: Āz = b, z ≥ 0,

where Ā = (A : −I), c̄ = (c, 0), and z = (x, s). Again, solving I and II yields the same solution.
4.2.2 Definition and identification of basic solutions

In this section, basic solutions are introduced and related to the vertices of the feasible region. In view of Theorem 4.1, we can focus on the equality-constrained formulation for the following discussion (and discard the "bar" notation). Thus, the linear program will be written:

    max c · x subject to: Ax = b, x ≥ 0.
Recall that a feasible solution in the inequality-constrained program is any vector x satisfying the constraints: Ax ≤ b and x ≥ 0. A feasible solution x̂ is an optimal feasible solution if there is no other feasible x̃ such that c · x̃ > c · x̂ in the case of a maximization problem, or c · x̃ < c · x̂ in the case of a minimization problem. (Alternatively, the definitions of feasible solution and optimal feasible solution can be given in terms of the equality-constrained program – it makes no difference.) With the equality-constrained program, a vector x satisfying Ax = b
and x ≥ 0 is said to be feasible. Let m be the number of rows in A and recall that
A has more columns than rows.
We assume that A has full rank and since A has more columns than rows this means that the rank of A equals the number of rows: m. This is called the
full rank assumption: the m rows of A are linearly independent. This assumption involves no loss of generality. If A does not have full rank then there are two possibilities: either some of the constraints in Ax = b are redundant (when some subset of the constraints is satisfied, so are the remainder), or the constraints are inconsistent – there is no x such that Ax = b. To see this more clearly, let a_i denote the ith row of A. Suppose that A does not have full rank. Then some row of A, say row i, can be written as a linear combination of the other rows: a_i = Σ_{j≠i} θ_j a_j. Now, suppose a_j · x = b_j for j ≠ i, so that Σ_{j≠i} θ_j (a_j · x) = Σ_{j≠i} θ_j b_j. Since Σ_{j≠i} (θ_j a_j) · x = a_i · x, this implies that a_i · x = Σ_{j≠i} θ_j b_j. However, the ith equation specifies that x must satisfy a_i · x = b_i. Thus, there are two possibilities: (a) b_i = Σ_{j≠i} θ_j b_j, in which case any x satisfying all but the ith constraint will also satisfy the ith constraint; or (b) b_i ≠ Σ_{j≠i} θ_j b_j, so that any x satisfying all but the ith constraint cannot satisfy the ith constraint. In case (a), the ith constraint is redundant, while in case (b) the ith constraint is inconsistent with the other constraints. Therefore, in case (b) the problem has no solution, while in case (a) the ith constraint may be dropped from the problem and the problem remains unchanged. From now on we assume that A has full rank – equal to the number of rows (m) in A.

Suppose we select m linearly independent columns from the matrix A: say these columns together form a matrix B. Such a set of columns is called a basis. It is possible that not all sets of m columns of A are linearly independent, but since A has full rank there is at least one set of m linearly independent columns. Since B is nonsingular, the equation system BxB = b has a unique solution.
For example, suppose that the first m columns of A are linearly independent and
let B be the matrix formed by these first m columns. Let the remaining columns of A form some matrix C , so that A = (B : C ). Then if xB satisfies BxB = b, let
x = (xB, 0) and observe that Ax = BxB + C · 0 = BxB = b. Thus x = (xB, 0) satisfies Ax = b. However, this x may not satisfy x ≥ 0 – some elements of the vector x
may be negative. This problem is taken up later. Note that when the B matrix is selected, a corresponding set of x is selected. In the example above, selection of the first m columns of A implicitly led to the selection of the first m elements of
x, with the remainder set to 0. Given Ax = b, let B be any nonsingular m × m submatrix of columns of A .
If all elements of x not associated with columns of B are set to 0, and the resulting set of equations solved, the unique x determined in this way is called a basic solution of the system Ax = b with respect to the basis B . The components of x associated with columns of B are called basic variables. Thus, in the
discussion above where the first m columns of A are chosen (and are linearly
independent), x = ( xB , 0) is a basic solution (with respect to the basis B ) and xB the
basic variables.
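The recipe just described – choose m linearly independent columns, solve BxB = b, and set the remaining components to zero – can be sketched directly. The 2 × 4 system below is an assumed illustration, not taken from the text.

```python
import numpy as np

# An assumed system Ax = b with m = 2 rows and 4 columns.
A = np.array([[2.0, 1.0, 1.0, 0.0],
              [1.0, 3.0, 0.0, 1.0]])
b = np.array([8.0, 9.0])

cols = [0, 1]                  # pick 2 linearly independent columns: the basis B
B = A[:, cols]
x_B = np.linalg.solve(B, b)    # unique solution of B x_B = b (B nonsingular)

x = np.zeros(A.shape[1])
x[cols] = x_B                  # components not associated with B are set to 0
print(x)                       # [3. 2. 0. 0.]: a basic solution
assert np.allclose(A @ x, b)
```

Choosing a different set of columns (say the last two, the identity columns) yields a different basic solution of the same system.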
If one or more of the basic variables in a basic solution has a value of 0, that solution is said to be a degenerate basic solution. Therefore, in a nondegenerate basic solution there are m nonzero components (and the remaining components are 0). In a degenerate basic solution, there are fewer than m components nonzero. Recall that if x satisfies both Ax = b and x ≥ 0, x is said to be feasible (for
these constraints). If x is a basic solution that also satisfies x ≥ 0, it is called a basic feasible solution. If this solution is also degenerate, it is called a degenerate
basic feasible solution. The following result, called the fundamental theorem of linear programming, is central in linear programming.

Theorem 4.2: Given a linear program of the form I or II,

I: max c · x subject to: Ax ≤ b, x ≥ 0

II: max c · x subject to: Ax = b, x ≥ 0,

1. If there is a feasible solution, there is a basic feasible solution.

2. If there is an optimal feasible solution, there is an optimal basic feasible solution.
The importance of this theorem (from the computational point of view) is that the search for an optimal solution can be restricted to the basic solutions. Since there are only finitely many of these, only finitely many x need to be considered in the search for an optimal solution. The following example illustrates the use of basic solutions in locating an optimal solution.
EXAMPLE 4.10:

max 2x + 5y
subject to:
    4x + y ≤ 24
    x + 3y ≤ 21
    x + y ≤ 9
    x, y ≥ 0.

This is rearranged in equality constraint form as:

max 2x + 5y + 0s1 + 0s2 + 0s3
subject to:
    4x + y + s1 = 24
    x + 3y + s2 = 21
    x + y + s3 = 9
    x, y, s1, s2, s3 ≥ 0.

In matrix form, let:

    Ā = [ 4 1 1 0 0
          1 3 0 1 0
          1 1 0 0 1 ],   b = (24, 21, 9),   c̄ = (2, 5, 0, 0, 0),   and z = (x, y, s1, s2, s3).

The problem may then be written:

    max c̄ · z subject to: Āz = b, z ≥ 0. ♦
In general, given a matrix with m rows and p columns (p > m), the number of m × m submatrices that can be formed is p!/(m!(p − m)!) (the binomial coefficient). In the present case, the coefficient matrix has m = 3 and p = 5, and so the maximum number of distinct 3 × 3 matrices (giving the maximum number of bases) is 5!/(3!(5 − 3)!) = 10. Consider the Ā matrix; the constraints are plotted in Figure 4.1 (the numbering identifies the basic solutions). To find the first basic solution, begin with the first three columns of Ā. Provided these three columns are linearly independent (which they are), set the coefficients on the last two columns (s2 and s3) to 0 and solve for the coefficients of the first three columns (x, y, s1).

          x  y  s1 s2 s3
    Ā = [ 4  1  1  0  0
          1  3  0  1  0
          1  1  0  0  1 ].
Figure 4.1: Basic solutions (the labeled points [1]–[10] mark the ten basic solutions at intersections of the constraint lines and axes)
So, suppose s2 = s3 = 0, and solve for the remaining variables:

    B1 = [ 4 1 1
           1 3 0
           1 1 0 ],   and B1 (x, y, s1)′ = (24, 21, 9)′.

The solution is (x, y, s1) = (3, 6, 6). In the terminology introduced earlier, the variables (x, y, s1) are called basic variables and (3, 6, 6, 0, 0) is a basic solution. The basis B1 was obtained by setting s2 and s3 to 0. To identify how the basis was obtained, denote this situation by B1 : (s2, s3 = 0). The notation is:

    B1 : (s2, s3 = 0)
    [ 4 1 1
      1 3 0
      1 1 0 ]
    (3, 6, 6)
Each entry below gives the elements set to 0 to determine the associated basis, the basis matrix (rows separated by semicolons), and the solution values of the basic variables:

    B1 : (s2, s3 = 0): [4 1 1; 1 3 0; 1 1 0], solution (3, 6, 6)
    B2 : (s1, s3 = 0): [4 1 0; 1 3 1; 1 1 0], solution (5, 4, 4)
    B3 : (s1, s2 = 0): [4 1 0; 1 3 0; 1 1 1], solution (4.636, 5.455, −1.09)
    B4 : (y, s3 = 0): [4 1 0; 1 0 1; 1 0 0], solution (9, −12, 12)
    B5 : (y, s2 = 0): [4 1 0; 1 0 0; 1 0 1], solution (21, −60, −12)
    B6 : (y, s1 = 0): [4 0 0; 1 1 0; 1 0 1], solution (6, 15, 3)
    B7 : (x, s3 = 0): [1 1 0; 3 0 1; 1 0 0], solution (9, 15, −6)
    B8 : (x, s2 = 0): [1 1 0; 3 0 0; 1 0 1], solution (7, 17, 2)
    B9 : (x, s1 = 0): [1 0 0; 3 1 0; 1 0 1], solution (24, −51, −15)
    B10 : (x, y = 0): [1 0 0; 0 1 0; 0 0 1], solution (24, 21, 9).
The ten basic solutions are in the form z = (x, y, s1, s2, s3):

    z1 = (3, 6, 6, 0, 0)    (4.3)
    z2 = (5, 4, 0, 4, 0)    (4.4)
    z3 = (4.636, 5.455, 0, 0, −1.09)    (4.5)
    z4 = (9, 0, −12, 12, 0)    (4.6)
    z5 = (21, 0, −60, 0, −12)    (4.7)
    z6 = (6, 0, 0, 15, 3)    (4.8)
    z7 = (0, 9, 15, −6, 0)    (4.9)
    z8 = (0, 7, 17, 0, 2)    (4.10)
    z9 = (0, 24, 0, −51, −15)    (4.11)
    z10 = (0, 0, 24, 21, 9).    (4.12)
This list of z is the list of basic solutions. Note now that there are some basic solutions that have negative entries. Once we eliminate those with negative entries, we are left with the basic feasible solutions. Recall that all basic solutions are chosen to satisfy Az = b and the constraint z ≥ 0 is ignored. Eliminating those
basic solutions with negative entries amounts to eliminating those basic solutions, z, that fail to satisfy z ≥ 0. The nonfeasible basic solutions are: z3 , z4 , z5 , z7
and z9. Observe that on the graph these points lie at intersections of constraint lines outside the feasible region. The basic feasible solutions are
z1 , z2 , z6 , z8 , z10 . Denoting the objective value at z by π( z), π( z1 ) = 2(3) + 5(6) = 36,
π(z2) = 2(5) + 5(4) = 30, π(z6) = 2(6) + 5(0) = 12, π(z8) = 2(0) + 5(7) = 35, π(z10) = 2(0) + 5(0) = 0. So, the solution is at z1, with x = 3 and y = 6.
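The whole enumeration can be automated. A sketch in Python with NumPy: loop over the C(5, 3) = 10 column triples of Ā, skip any singular basis, solve for the basic variables, and keep the nonnegative solutions as basic feasible solutions.

```python
import itertools
import numpy as np

# Equality form of Example 4.10: A_bar z = b with z = (x, y, s1, s2, s3).
A_bar = np.array([[4, 1, 1, 0, 0],
                  [1, 3, 0, 1, 0],
                  [1, 1, 0, 0, 1]], dtype=float)
b = np.array([24, 21, 9], dtype=float)
c_bar = np.array([2, 5, 0, 0, 0], dtype=float)

basic, feasible = [], []
for cols in itertools.combinations(range(5), 3):    # all 10 column triples
    B = A_bar[:, cols]
    if abs(np.linalg.det(B)) < 1e-12:
        continue                                    # columns not independent
    z = np.zeros(5)
    z[list(cols)] = np.linalg.solve(B, b)           # basic solution
    basic.append(z)
    if np.all(z >= -1e-12):
        feasible.append(z)                          # basic feasible solution

best = max(feasible, key=lambda z: c_bar @ z)
print(len(basic), len(feasible))    # 10 basic solutions, 5 feasible
print(best[:2], c_bar @ best)       # optimum (x, y) = (3, 6), value 36
```

This reproduces the list above: ten basic solutions, five of them feasible, with the optimum at z1.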
4.3 Duality principles

There is a general principle behind the close connection between the primal and dual programs. It turns out that both have the same optimal objective values. The following discussion formalizes these observations. Consider the primal (P) and dual (D):

P: max c′x subject to: Ax ≤ b, x ≥ 0

D: min b′y subject to: A′y ≥ c, y ≥ 0.
The dual is formed from the primal by exchanging the objective coefficients with the RHS vector, reversing the inequalities in the linear constraints, and transposing the constraint matrix. Taking the dual program associated with the dual gives the primal.

Theorem 4.3: For any feasible x and any feasible y, b′y ≥ c′x.

PROOF: Taking any y feasible in the dual and premultiplying Ax ≤ b (from the primal) gives y′Ax ≤ y′b = b′y. Similarly, taking any x feasible in the primal and premultiplying A′y ≥ c from the dual gives x′A′y ≥ x′c = c′x. So, for any feasible x and y,

    b′y ≥ y′Ax = x′A′y ≥ c′x.    (4.13)

Therefore, for any feasible solution, the value of the primal is no larger than the value of the dual: b′y ≥ c′x. Perhaps surprisingly, when there is equality, the corresponding x and y are optimal in their respective programs.

Theorem 4.4: If x∗ and y∗ are feasible and satisfy c′x∗ = b′y∗, then x∗ is optimal in the primal and y∗ is optimal in the dual.
P ROOF : To see this, observe that from the inequalities above in Equation 4.13, for all feasible x, c′ x ≤ b′ y∗ = c′ x∗ , so that x∗ gives at least as high a payoff
as any feasible x. Similarly, for all feasible y, b′ y ≥ c′ x∗ = b′ y∗ , so that y∗ is optimal for the dual.
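Theorems 4.3 and 4.4 are straightforward to check numerically. The sketch below uses the primal-dual pair from Remark 4.1 (max 2x + 2y versus min 4w + 4v + 6z) and a couple of hand-picked feasible points.

```python
import numpy as np

# Primal/dual pair from Remark 4.1: max 2x + 2y  vs  min 4w + 4v + 6z.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([4.0, 4.0, 6.0])
c = np.array([2.0, 2.0])

def primal_feasible(x):
    return np.all(A @ x <= b + 1e-12) and np.all(x >= 0)

def dual_feasible(y):
    return np.all(A.T @ y >= c - 1e-12) and np.all(y >= 0)

x = np.array([2.0, 3.0])         # an arbitrary feasible primal point
y = np.array([0.0, 0.0, 2.0])    # the optimal dual point from the remark
assert primal_feasible(x) and dual_feasible(y)
assert b @ y >= c @ x            # weak duality (Theorem 4.3): 12 >= 10

x_star = np.array([3.0, 3.0])    # an optimal primal point
assert primal_feasible(x_star)
assert np.isclose(c @ x_star, b @ y)   # equal values: both optimal (Theorem 4.4)
print(c @ x_star, b @ y)               # 12.0 12.0
```

Any feasible primal point gives a value no larger than any feasible dual value, and equality certifies optimality on both sides.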
For programs that have a solution, the equality c′x∗ = b′y∗ always holds, as stated in the next theorem, Theorem 4.5. The proof makes use of Farkas' lemma. Farkas' lemma asserts that if Q is an m × n matrix and α an m × 1 vector then either:

1. [Qv ≤ α and v ≥ 0] has a solution, or

2. [Q′z ≥ 0, α′z < 0, and z ≥ 0] has a solution,

but not both.

Theorem 4.5: If x∗ and y∗ are solutions to the primal and dual respectively, then c′x∗ = b′y∗.

PROOF: From Theorem 4.3, b′y∗ ≥ c′x∗. Let δ = b′y∗, so it is necessary to confirm that c′x∗ = δ. For ε > 0, there is no solution to the system:

    b′y ≤ δ − ε,  A′y ≥ c,  y ≥ 0,

otherwise, y∗ would not be optimal in the dual program. Rearranging, there is no y ≥ 0 with:

    [ b′ ; −A′ ] y ≤ [ δ − ε ; −c ]

(the matrix on the left stacks the row b′ on top of −A′, and the vector on the right stacks δ − ε on top of −c). In the statement of Farkas' lemma (point 1), let

    Q = [ b′ ; −A′ ],   α = [ δ − ε ; −c ],

so that from Farkas' lemma (point 2), this implies that there is ẑ ≥ 0 with (b : −A)ẑ ≥ 0 and (δ − ε, −c′)ẑ < 0. Let ẑ = (z0, z), so that these expressions say that there is a (z0, z) with:

    b z0 − Az ≥ 0,  (δ − ε)z0 − c′z < 0,  (z0, z) ≥ 0.

Next, observe that z0 > 0. For if not (z0 = 0), then −Az ≥ 0 and −c′z < 0, or equivalently, Az ≤ 0 and c′z > 0. And if this were so, x∗ would not be optimal for the primal (since x∗ + z would be feasible, A(x∗ + z) ≤ b, (x∗ + z) ≥ 0, and would yield a higher payoff than x∗: c′(x∗ + z) > c′x∗). Therefore, since z0 > 0, in the expressions b z0 − Az ≥ 0 and (δ − ε)z0 − c′z < 0, divide by z0 to get:

    b − Az̃ ≥ 0 and (δ − ε) − c′z̃ < 0,

where z̃ = z(1/z0). Rearranging, there is z̃ ≥ 0 with

    Az̃ ≤ b and c′z̃ > δ − ε.

Hence, the solution to the primal program, x∗, satisfies c′x∗ > δ − ε for all ε > 0, so that c′x∗ ≥ δ. Therefore, δ = b′y∗ ≥ c′x∗ ≥ δ, so that b′y∗ = c′x∗.
For the next theorem, let FP = {x | Ax ≤ b, x ≥ 0} and FD = {y | A′y ≥ c, y ≥ 0}, the sets of feasible points for the primal and dual. Theorem 4.6 asserts that if there is no feasible solution for the primal (FP = ∅, the empty set), then there is no optimal solution for the dual: if FD ≠ ∅, then the dual program is unbounded.

Theorem 4.6: Suppose that FP = ∅, so that there is no x feasible for the primal. Then there is no ŷ ∈ FD satisfying b′ŷ ≤ b′y for all y ∈ FD.

PROOF: Suppose that FP = ∅, so Ax ≤ b for some x ≥ 0 has no solution. This implies, by Farkas' lemma, that there is a ỹ with A′ỹ ≥ 0, ỹ ≥ 0 and β = b′ỹ < 0. If FD = ∅, there is nothing to prove. If FD ≠ ∅, then A′y ≥ c and y ≥ 0 has a solution, ŷ: A′ŷ ≥ c and ŷ ≥ 0. Then, for any δ > 0, ŷ + δỹ is feasible: A′(ŷ + δỹ) ≥ c and ŷ + δỹ ≥ 0. Furthermore, b′(ŷ + δỹ) = b′ŷ + δb′ỹ = b′ŷ + δβ. This can be made arbitrarily small (a large negative value) by choosing δ sufficiently large.

The argument is symmetric if FD = ∅: then either FP = ∅, or FP ≠ ∅ and the primal program, P, has no upper bound on the objective. It is useful to summarize the possible cases in terms of feasibility:

1. FP = ∅ and FD = ∅.
2. FP = ∅ and FD ≠ ∅.
3. FP ≠ ∅ and FD = ∅.
4. FP ≠ ∅ and FD ≠ ∅.

In case 1, there is nothing to add. In case 2, according to Theorem 4.6, the dual program is unbounded. Case 3 is the same as case 2, except now the primal program is unbounded. Finally, considering case 4, since b′y ≥ c′x for all x ∈ FP and y ∈ FD, both programs are bounded.

Thus, if either program has an optimal solution, then so does the other, and the objective values are equal: if x∗ solves the primal and y∗ solves the dual, then c · x∗ = b · y∗. If one program is not feasible, the other has no optimal solution.
EXAMPLE 4.11: Let

    A = [ 1  0
          0 −1 ],   b = ( b1, b2 ),   c = ( c1, c2 ).

Then Ax ≤ b has no nonnegative solution x if b1 < 0; and A′y ≥ c has no nonnegative solution y if c2 > 0. In this case, FP = FD = ∅. If b1, b2 and c2 are all positive, then FP ≠ ∅, while FD = ∅. ♦
REMARK 4.3: Complementary slackness follows from the following considerations. Recall that y feasible for the dual and Ax ≤ b give y′Ax ≤ y′b, and x feasible for the primal and A′y ≥ c give x′A′y ≥ x′c, so b′y ≥ y′Ax = x′A′y ≥ c′x. Since, at a solution pair (x̂, ŷ), c′x̂ = b′ŷ, the previous inequalities imply b′ŷ = ŷ′Ax̂ = x̂′A′ŷ = c′x̂. From this, ŷ′(b − Ax̂) = 0 and x̂′(c − A′ŷ) = 0.
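Both conditions are easy to verify at an optimal pair. The sketch below reuses the pair from Remark 4.1 (our choice of example): a positive dual price pairs with a binding primal constraint, and a slack constraint carries a zero dual price.

```python
import numpy as np

# Optimal pair from Remark 4.1: x_hat solves the primal, y_hat the dual.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([4.0, 4.0, 6.0])
c = np.array([2.0, 2.0])
x_hat = np.array([3.0, 3.0])
y_hat = np.array([0.0, 0.0, 2.0])

primal_slack = b - A @ x_hat    # (1, 1, 0): only constraint 3 binds
dual_slack = A.T @ y_hat - c    # (0, 0): both dual constraints bind

# y_hat'(b - A x_hat) = 0: a slack primal constraint has zero dual price.
assert np.isclose(y_hat @ primal_slack, 0.0)
# x_hat'(c - A' y_hat) = 0: a positive primal variable has a binding dual constraint.
assert np.isclose(x_hat @ dual_slack, 0.0)
print(primal_slack, dual_slack)
```

Either inner product being zero forces each term to vanish, since all factors are nonnegative.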
4.3.1 Duality and dual prices

Theorem 4.4 may be used to provide insight on dual prices. Note that in examples such as Example 4.9 in Section 4.1.6, changing the right-hand sides a "small" amount leaves the solution at the intersection of the same constraints – with the solution point moved to the new intersection. For example, changing the RHS of constraint 2 in program B from b2 = 17 to b2 + ∆b2 = 18 moved the solution from (2, 3) (defined by the intersection of 2x + 3y = 13 and 4x + 3y = 17) to (15/6, 16/6) (defined by the intersection of 2x + 3y = 13 and 4x + 3y = 18). Furthermore, considering the dual program, program A, the slopes of the constraints are −4/2 = −2 and −3/3 = −1, while the slope of the objective is −17/13. So changing the coefficient on the objective (from b2 = 17 to b2 = 18) leaves the objective slope between −1 and −2 and the solution unchanged at (7/6, 1/6).

More generally, if the parameters b in the dual objective do not define objective isocost lines that are parallel to any of the constraints, then small changes in b lead to no change in the optimal solution of the dual. So, given b with solution y∗ to the dual program, if some bi is changed to bi + ∆bi, then the solution to the dual remains unchanged at y∗ and the value of the dual objective at the optimal solution to the new program is Σ_{j≠i} bj yj∗ + (bi + ∆bi) yi∗. Turning to the primal, with initial RHS, b, the optimal solution is x∗. Changing the RHS of the ith equation to bi + ∆bi determines a new solution with each xj∗ shifted by a small amount – so that xj∗ moves to xj∗ + ∆xj, say. From Theorem 4.4,

    c1 x1∗ + c2 x2∗ + ··· + cn xn∗ = b1 y1∗ + b2 y2∗ + ··· + bm ym∗

and after the RHS change:

    c1 (x1∗ + ∆x1) + c2 (x2∗ + ∆x2) + ··· + cn (xn∗ + ∆xn) = b1 y1∗ + b2 y2∗ + ··· + (bi + ∆bi) yi∗ + ··· + bm ym∗.

Using the pre-change equality:

    c1 ∆x1 + c2 ∆x2 + ··· + cn ∆xn = ∆bi yi∗.

Dividing by ∆bi:

    c1 (∆x1/∆bi) + c2 (∆x2/∆bi) + ··· + cn (∆xn/∆bi) = yi∗.

Writing the left side as ∆π/∆bi, this gives

    ∆π/∆bi = yi∗.

So, for the "regular" case, the change in the objective resulting from a change in the RHS is given by the corresponding dual price.
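The relation ∆π/∆bi = yi∗ can be illustrated numerically with program B (coefficients as read from the example here: max 3x + 4y subject to 2x + 3y ≤ 13 and 4x + 3y ≤ 17, both binding at the optimum; treat the data as an assumption). Perturbing each RHS by one unit reproduces the dual prices 7/6 and 1/6.

```python
def solve2(a11, a12, a21, a22, b1, b2):
    """Solve the 2x2 system [[a11, a12], [a21, a22]] (x, y)' = (b1, b2)'."""
    D = a11 * a22 - a12 * a21
    return (b1 * a22 - a12 * b2) / D, (a11 * b2 - b1 * a21) / D

# Program B (assumed data): max 3x + 4y s.t. 2x + 3y <= 13, 4x + 3y <= 17.
pi = lambda x, y: 3 * x + 4 * y

x0, y0 = solve2(2, 3, 4, 3, 13.0, 17.0)   # both constraints bind at the optimum
print((x0, y0), pi(x0, y0))               # (2, 3) with objective 18

# Raise each RHS by one unit: the objective change is the dual price.
x1, y1 = solve2(2, 3, 4, 3, 14.0, 17.0)
x2, y2 = solve2(2, 3, 4, 3, 13.0, 18.0)
print(pi(x1, y1) - pi(x0, y0))            # 7/6 = DP1B
print(pi(x2, y2) - pi(x0, y0))            # 1/6 = DP2B
```

The perturbed solutions stay at the intersection of the same two constraints, which is exactly the "regular" case assumed in the derivation above.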
Exercises for Chapter 4
Exercise 4.1
You have a choice of two foods to eat per day. Food A contains 3
grams of vitamin A per ounce, 4 grams of vitamin C and 1 gram of vitamin E. Food
B contains 1 gram of vitamin A, 5 grams of vitamin C and 3 grams of vitamin E. The minimum daily requirements of vitamins A, C and E are 22, 66 and 20 grams respectively. You are also on a diet, so you want to minimize your calorie intake. An ounce of food A has 30 calories and an ounce of food B has 20. (a) Formulate the linear programming problem. (b) What are the vertices of the convex set bounded by the constraints? (c) At what point do you minimize your calorie intake subject to the nutritional requirements? (d) Assume that each calorie you consume costs you $2. What would you pay for a vitamin pill that would reduce the amount of vitamin A you need from food to 20 grams? Exercise 4.2
For the following systems of constraints, determine and plot the
feasible regions.
(a) 4x + 3y ≥ 12
    x − y ≤ 0
    x ≥ 0, y ≥ 0

(b) x + y − 6 ≤ 0
    2x + y − 8 ≤ 0

(c) 2x + y ≥ 50
    x + 2y ≥ 40
    x ≥ 0, y ≥ 0

(d) x + 2y ≤ 12
    6x + 4y ≤ 48
    −x + 2y ≤ 8
    x ≥ 0, y ≥ 0

(e) 3x + y ≥ 12
    6x + 5y ≥ 30
    x + 2y ≥ 14
    x ≥ 0, y ≥ 0

(f) 20x + 15y ≤ 60,000
    4x + 9y ≤ 25,200
    4x + 6y ≤ 18,000
    30x + 20y ≤ 87,000

Exercise 4.3 Solve the following linear programming problems:

max 5x + 4y
such that:
    x + 2y ≤ 12
    6x + 4y ≤ 48
    −x + 2y ≤ 8
    x, y ≥ 0,

min x + 3y
such that:
    3x + y ≥ 12
    6x + 5y ≥ 30
    x + 2y ≥ 14
    x, y ≥ 0.
Part I. In the maximization problem, calculate the change in the value of the objective as a result of: (a) Changing the RHS (righthand side) from 12 to 10, and from this deduce the dual price associated with the first constraint. (b) Changing the RHS from 48 to 40, and from this deduce the dual price associated with the second constraint. (c) Changing the RHS from 8 to 7, and from this deduce the dual price associated with the third constraint. Part II. In the minimization problem, calculate the change in the value of the objective as a result of: (d) Changing the RHS from 12 to 11, and from this deduce the dual price associated with the first constraint.
(e) Changing the RHS from 30 to 32, and from this deduce the dual price associated with the second constraint. (f) Changing the RHS from 14 to 15, and from this deduce the dual price associated with the third constraint. Part III. For both problems, introduce slack variables and write the programs as equality-constrained problems. In both cases: (g) Identify all the basic solutions. (h) Identify all the basic feasible solutions.

Exercise 4.4 Consider the following problem:

max x + y
subject to:
    3x + 5y + 8z ≤ 30
    4x + 5y − 10z ≤ 20
    x, y, z ≥ 0.
(a) Reformulate the problem using equalities. (b) What are all the basic solutions? (c) What are all the basic feasible solutions? (d) What is the optimal basic feasible solution? Exercise 4.5 Consider the following linear program: max
x+ y
subject to: 2 x + 4 y ≤ 12
(4.1)
4 x + 2 y ≤ 12
(4.2)
x, y ≥ 0. (a) Plot the constraints and solve the program. (b) Write the program in equality constraint form ( Az = b) by introducing slack variables, s1 and s2 .
(c) The "A" matrix in (b) has two rows and four columns. Write down all the submatrices having two rows and two columns. There are six such matrices: A1, A2, ..., A6. Find the six basic solutions by setting two of the four variables (x, y, s1, s2) to zero (e.g. for A1 = [2 4; 4 2], solve A1 (x, y)′ = (12, 12)′, corresponding to s1 = s2 = 0).
(d) Locate each of the six solutions found in (c) on a graph. Each solution has the form ( x∗ , y∗ , s∗1 , s∗2 ), but in all six cases the ( x∗ , y∗ ) differ – so you can see where each solution is on your x– y graph. Now eliminate those solutions that contain a negative term and relate the remaining solutions to the feasible region. Exercise 4.6 Consider the following linear program: max
Π = 2x + 2 y
subject to:
2x + 4y ≤ 16
3x + y ≤ 9
x ≥ 0, y ≥ 0. (a) Find the optimal values of x and y. (b) What is the value of the objective at the (optimal) solution? (c) What are the values of the slack (surplus) variables at the solution? (d) Suppose that in the second constraint, 9 is replaced by 10. How are your answers in (a) and (b) affected? Exercise 4.7 For the following linear program: min
C = 4x + 3 y
subject to:
3 x + 2 y ≥ 158
5 x + 2 y ≥ 200
x + 3 y ≥ 90
x ≥ 0, y ≥ 0. (a) Find the optimal values of x and y.
(b) What is the value of the objective at the (optimal) solution? (c) What are the values of the slack (surplus) variables at the solution? (d) Calculate the dual prices associated with each constraint. Exercise 4.8 For the following linear program: max subject to:
Π = 10 x + 24 y
x + 2 y ≤ 120
x + 4 y ≤ 180
x ≥ 0, y ≥ 0. (a) Find the optimal values of x and y.
(b) What is the value of the objective at the (optimal) solution? (c) What are the values of the slack (surplus) variables at the solution? (d) Calculate the dual prices associated with each constraint.
Exercise 4.9 Consider the following linear program: max 2 x + 5 y subject to: 4 x + 1 y ≤ 24
(4.1)
1 x + 3 y ≤ 21
(4.2)
1x + 1 y ≤ 9
(4.3)
x, y ≥ 0. (a) Solve the linear program – find the ( x, y) pair that solves the problem. (b) Find the slack (surplus) values. (c) What is the effect on the value of the objective if the third constraint is replaced by 1 x + 1 y ≤ 8? From this, give the dual price associated with the third constraint. (d) Calculate the dual prices for the other two constraints, (1) and (2). (e) Would the optimal values of the variables ( x, y) be different if the objective were 1 x + 5 y? Exercise 4.10 Consider the linear program: max subject to:
3x + 2 y
4 x + 2 y ≤ 14
4 x + 3 y ≤ 17
6 x + y ≤ 19
x ≥ 0, y ≥ 0. (a) Plot the constraints. (b) Find the optimal solution, ( x, y).
(c) Calculate the dual prices associated with each constraint. (d) Find all the basic solutions. (e) Identify all the basic feasible solutions. Exercise 4.11 Consider the linear program: max subject to:
3x + 4 y
3 x + y ≤ 16
x + 3 y ≤ 16 x+ y ≤6
x ≥ 0, y ≥ 0. (a) Plot the constraints. (b) Find the optimal solution, ( x, y).
(c) Calculate the dual prices associated with each constraint. (d) Find all the basic solutions. (e) Identify all the basic feasible solutions. Exercise 4.12 Consider the following linear program: max subject to:
3x + 4 y 2 x + 3 y ≤ 13
(4.1)
3 x + 4 y ≤ 17
(4.2)
x, y ≥ 0. Plot the program and solve, giving the slack variables, objective value and dual prices. Exercise 4.13 Consider the following linear programming problem. max
3x + 2 y
subject to:
x + y ≤ 12 2 x + y ≤ 20
x ≥ 0, y ≥ 0. (a) Plot the constraints and the feasible region for this linear programming problem. (b) Solve for the optimal choice of x and y. Exercise 4.14 Consider the following linear program: max subject to:
x+ y 7 x + 2 y ≤ 28
(4.1)
4 x + 3 y ≤ 24
(4.2)
2 x + 6 y ≤ 42
(4.3)
x, y ≥ 0. Plot the program and solve, giving the slack variables, objective value and dual prices. Exercise 4.15 Consider the problem of maximizing x y subject to the constraint
x2 + y2 ≤ 2 and x ≥ 0, y ≥ 0. Let the feasible region be the ( x, y) pairs satisfying x2 + y2 ≤ 2 and x ≥ 0, y ≥ 0.
(a) Plot the feasible region. (b) Plot the set of ( x, y) pairs such that (i) x y = 1, (ii) x y = 2 and (iii) x y = 3.
(c) Indicate the solution on a graph and find the solution. (d) How does your answer change if the constraint changes to x2 + y2 ≤ 3? Exercise 4.16 (a) Consider the linear program: max 3 x + 5 y subject to:
x + 2 y ≤ 10
(4.1)
2 x + 3 y ≤ 17
(4.2)
3 x + 2 y ≤ 24
(4.3)
x ≥ 0, y ≥ 0. (i) Plot the constraints. (ii) Find the optimal solution, ( x, y). (iii) Calculate the dual prices associated with each constraint. (b) Consider the linear program: max subject to:
x+ y 2x + y ≤ 6
x + 2y ≤ 6 x ≥ 0, y ≥ 0. (i) Find all the basic solutions. (ii) Identify all the basic feasible solutions.
5 Functions of one variable
5.1 Introduction
Functions arise naturally in the study of economic problems. For example, a demand function relates quantity demanded to price. Properties of such functions are routinely studied – the slope and elasticity measure responsiveness of demand to price variations. The techniques presented here introduce the tools required for such calculations. Section 5.2 describes a variety of functions arising in different areas (taxation, finance and present value calculations). Section 5.3 introduces the basic notions of continuity and differentiability of a function. Certain families of functions, such as logarithmic, polynomial and exponential functions, appear frequently in economic applications; some of these are introduced in Section 5.4 and their shapes described. Growth rates and the algebra of growth calculations are developed in Section 5.5. Finally, Section 5.6 provides rules for differentiation when the function is itself formed from other functions (such as the ratio or product of functions).

A function, f, from a set X to a set Y is a rule associating with each element of the set X some element in the set Y. This is written f : X → Y. In words, to each
point in X , f associates a point, y = f (x) in Y . The set X is called the domain
and the set Y is called the range. The inverse demand function, relating price to
quantity demanded, p(Q), is a familiar example. The domain is the set of nonnegative numbers, R+ (quantity must be nonnegative), and the range is also the set of nonnegative numbers (price is nonnegative). Thus, p : R+ → R+.
5.2 Some examples The following examples show a variety of ways in which functions arise in economics.
5.2.1 Demand functions Consumer demand theory describes an individual in terms of utility derived from consumption of a vector of goods (x1 , x2 ,... , x n ). With a fixed income y and price p i for good i, utility maximization leads to choices for the goods that depend on prices and income: x i (p1 , p2 ,... , p n , y). See Figure 5.1 for the twogood case. In the twogood case, utility maximization leads to the selection of (x1∗ , x2∗ ). The values of x1∗ and x2∗ depend on the prices (p1 , p2 ) and income, y. These are called the demand functions and are written x1 (p1 , p2 , y) and x2 (p1 , p2 , y).
Figure 5.1: Choices determine demand
With J individuals, write the demand for good i by individual j as x_i^j(p_1, p_2, ..., p_n, y^j), where y^j is the income of individual j. Then the aggregate demand for good i is:

$$X_i(p_1, \dots, p_n, y^1, \dots, y^J) = \sum_{j=1}^{J} x_i^j(p_1, \dots, p_n, y^j).$$
The function X_i depends on the prices and the incomes of each individual. If we consider the case where all prices other than that of good i are fixed, p_j = p̄_j for j ≠ i, and all incomes are fixed, y^j = ȳ^j, then the aggregate demand is:

$$D_i(p_i) = X_i(\bar p_1, \dots, \bar p_{i-1},\, p_i,\, \bar p_{i+1}, \dots, \bar p_n,\, \bar y^1, \dots, \bar y^J).$$

For simplicity, ignore the subscripts and write D(p) for the aggregate demand as a function of own price. With this, one may study the behavior of aggregate demand as own price varies. Writing Q for the quantity demanded, Q = D(p).
This gives aggregate demand for the good as a function of the good's price. With this, the responsiveness of demand to price variation may be considered. Consider a price p′ ≠ p with corresponding demand Q′: the variation in quantity resulting from the price variation is Q′ − Q = D(p′) − D(p). Thus, for example, the change in aggregate quantity demanded per unit change in price is given by

$$\frac{Q' - Q}{p' - p} = \frac{D(p') - D(p)}{p' - p},$$

the slope of the demand function. Similarly, the percentage change in quantity resulting from a percentage change in price is given by:

$$\frac{(Q' - Q)/Q}{(p' - p)/p} = \frac{[D(p') - D(p)]/D(p)}{(p' - p)/p} = \frac{p}{D(p)} \cdot \frac{D(p') - D(p)}{p' - p}.$$

The negative of this term gives the elasticity of demand. Note that the elasticity of demand is computed at a point on the demand curve: at different points, the value of the elasticity will vary. Thus, we may write:

$$\epsilon(p) = -\left[\frac{p}{D(p)}\right]\left[\frac{D(p') - D(p)}{p' - p}\right].$$
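As an illustration, the slope and elasticity formulas can be evaluated numerically. The sketch below uses a hypothetical linear demand function, D(p) = 100 − 2p, which is not from the text and is chosen only so the answers are easy to verify by hand.

```python
# A numerical sketch of the slope and elasticity formulas, using a
# hypothetical linear demand function D(p) = 100 - 2p (not from the text).
def D(p):
    return 100.0 - 2.0 * p

def slope(D, p, dp=1e-6):
    """(D(p') - D(p)) / (p' - p), with p' = p + dp."""
    return (D(p + dp) - D(p)) / dp

def elasticity(D, p, dp=1e-6):
    """epsilon(p) = -(p / D(p)) * [D(p') - D(p)] / (p' - p)."""
    return -(p / D(p)) * slope(D, p, dp)

p = 10.0
print(round(slope(D, p), 4))        # -2.0: the demand curve's slope
print(round(elasticity(D, p), 4))   # 0.25 = (10 / 80) * 2
```

At p = 10 demand is 80, so the elasticity is (10/80) × 2 = 0.25; at other points on the same curve it takes different values, as the text notes.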
5.2.2 Present value and the interest rate
The present value of a cash flow may be viewed as a function of the interest rate. A bond is characterized by a face value P, an interest rate ρ and a maturity date T. It pays the holder a fixed payment f = ρP each time period (such as every six months or every year), and the face value P at the end of the Tth period. Thus, the cash flow generated by the bond is (f, f, f, ..., f, P), where f is paid at the end of periods 1, 2, up to T − 1. What is the value of this cash flow?

If the market interest rate is r, x dollars invested today for one period is worth y = x(1 + r) at the end of the period. Put differently, to have y dollars at the end of the period, x = y/(1 + r) dollars should be invested now. How much should one pay
for a payoff of y dollars one period from now? If the cost is c < x, borrowing c will lead to a payoff y next period and a debt of c(1 + r). Since c(1 + r) < x(1 + r) = y, this generates a positive net payoff. Similarly, if c > x the loan repayment in one period will exceed the payoff: c(1 + r) > y. So, c < x undervalues the return y in one period, and c > x overvalues it. For market equilibrium, c = x = y/(1 + r). This is the current value of y dollars one period from now: the market value of the return y in one period is y/(1 + r). The same reasoning shows that the current value of a payoff y two periods from now is y/(1 + r)², and t periods from now it is y/(1 + r)^t. Therefore, the flow (f, f, f, ..., f, P) has current market value of:

$$V = \frac{f}{1+r} + \frac{f}{(1+r)^2} + \cdots + \frac{f}{(1+r)^{T-1}} + \frac{P}{(1+r)^T} = f\left[\frac{1}{1+r} + \frac{1}{(1+r)^2} + \cdots + \frac{1}{(1+r)^{T-1}}\right] + \frac{P}{(1+r)^T}.$$
One simple case occurs when the payments continue indefinitely (and there is no end period at which P would be paid). In this case,

$$V = f\left[\frac{1}{1+r}\right]\left[1 + \frac{1}{1+r} + \frac{1}{(1+r)^2} + \cdots\right].$$

Using the fact that for 0 < α < 1, 1 + α + α² + ··· = 1/(1 − α), with α = 1/(1 + r) the infinite sum becomes

$$\frac{1}{1 - \frac{1}{1+r}} = \frac{1+r}{r}.$$

Therefore, V = f · [1/(1 + r)] · [(1 + r)/r] = f/r (Figure 5.2). At the time the bond is issued, the rate on the bond, ρ, will typically equal the market rate r. Afterwards, while the bond rate is fixed, the market rate will vary, causing the value of the cash flow provided by the bond to vary. Write V(r) to denote this dependence, the value of the bond as a function of the current market interest rate: V(r) = f/r.
From this, one may, for example, calculate the impact on the value of the bond of a change in the market interest rate as:

$$\Delta V = V(r') - V(r) = \frac{f}{r'} - \frac{f}{r} = f\left[\frac{1}{r'} - \frac{1}{r}\right] = f\left[\frac{r - r'}{r\,r'}\right].$$
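The finite-horizon and perpetuity valuation formulas are easy to sketch numerically. The parameter values below (face value, bond rate, maturity and the market rates) are hypothetical, chosen only for illustration.

```python
# Sketch of the bond-valuation formulas above; all parameter values are
# hypothetical (P = 100, rho = 5%, T = 10).
P, rho, T = 100.0, 0.05, 10        # face value, bond rate, maturity
f = rho * P                         # the fixed payment each period

def bond_value(f, P, T, r):
    """V = f/(1+r) + ... + f/(1+r)^(T-1) + P/(1+r)^T."""
    return sum(f / (1 + r) ** s for s in range(1, T)) + P / (1 + r) ** T

def perpetuity_value(f, r):
    """With payments continuing indefinitely, V(r) = f / r."""
    return f / r

print(round(perpetuity_value(f, 0.05), 6))                    # 100.0
# A rise in the market rate lowers the value of the fixed cash flow:
print(bond_value(f, P, T, 0.08) < bond_value(f, P, T, 0.05))  # True
```

When the market rate equals the bond rate, the perpetuity is worth its face value; as r rises above ρ, V(r) = f/r falls, matching the ΔV expression above.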
5.2.3 Taxation

In a market with supply and demand curves S(p) and D(p), the introduction of a sales tax creates a wedge between the price paid and the price received. How do these changes relate to the pretax price and quantity? With demand D(p) and supply S(p), the market clearing price is p_0, determined as the solution to D(p_0) = S(p_0).
Figure 5.2: The present value function
Suppose now that an ad valorem tax is introduced, so that if the seller receives p_t, the buyer pays (1 + t)p_t. Let t_0 = 0 be the initial tax (no tax) and t the tax imposed, so the tax increment is Δt = t − t_0 = t. Now, market clearing requires that the amount supplied at price p_t equals the amount demanded at price (1 + t)p_t: D((1 + Δt)p_t) = S(p_t). With Q(0) = D(p_0) = S(p_0) and Q(t) = D((1 + Δt)p_t) = S(p_t), the variation in quantity traded as a result of the tax increment is Q(t) − Q(0) < 0, and the rate of change of quantity traded relative to the tax variation is given by [Q(t) − Q(0)]/Δt.

From Figure 5.3, the magnitude of this depends on the slopes of the supply and demand curves. Similarly, the extent to which p_t lies below p_0 and (1 + t)p_t lies above p_0 depends on the slopes of the curves and determines the incidence of the tax (which party, supplier or demander, pays most of the tax).
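The market-clearing condition with a tax wedge, D((1 + t)p_t) = S(p_t), can be solved numerically. The sketch below uses hypothetical linear curves, D(p) = 100 − p and S(p) = 4p (not from the text), and a simple bisection on the seller's price.

```python
# Solving D((1+t) * p_t) = S(p_t) by bisection, for hypothetical linear
# curves D(p) = 100 - p and S(p) = 4p.
def D(p):
    return 100.0 - p

def S(p):
    return 4.0 * p

def seller_price(t, lo=0.0, hi=100.0):
    """Bisect on p_t until demand at the buyer price (1+t)p_t equals supply."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if D((1 + t) * mid) > S(mid):
            lo = mid          # excess demand: the price must rise
        else:
            hi = mid
    return (lo + hi) / 2

p0 = seller_price(0.0)            # no-tax price: 100 - p = 4p gives p0 = 20
pt = seller_price(0.10)           # with a 10% tax the seller receives less
print(round(p0, 4))                           # 20.0
print(pt < p0 < (1 + 0.10) * pt)              # True: p_t < p0 < (1+t)p_t
```

The seller's price falls below the pretax price while the buyer's price rises above it, and how the burden splits depends on the two slopes, as the text describes.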
5.2.4 Options
At any point in time, a stock trades at some price p, which varies over time. Derivative assets can be created as functions of the stock price: f(p). For example, consider a contract offering f(p) dollars if the stock price is p dollars or more at the end of trading, and 0 otherwise; if f(p) = 5, the contract pays $5 if the underlying stock has a price of at least p at the end of trading. Such derivative assets are called "options". In finance, an option gives the owner the right to buy or sell an asset. An option giving the right to purchase an asset at some specified price, p*, called the
Figure 5.3: The impact of taxation in a market
exercise or strike price, at (or before) a specified date T, is a call option. An option giving the right to sell is called a put option. When the right can be exercised only at the date T, the option is a European option. When the right can be exercised on or before the date T, the option is an American option. Such options define assets that are functions of the underlying assets and so are called derivative assets.

Suppose that the strike price of a European call option is p̄ and that p̃ is the (as yet unknown) stock price at period T (in the future). Then, if the price p̃ turns out to be above p̄, the owner of the call option can exercise the right to purchase one unit of the stock at price p̄. That unit of stock can be sold on the market at price p̃, so the net gain is p̃ − p̄. If the price p̃ turns out to be below p̄, the owner of the call option will choose not to exercise the purchase right, and the net benefit is 0. So, the owner of the option will get a return of p̃ − p̄ if p̃ > p̄, and 0 otherwise. Therefore, the return per call option owned is φ(p̃) = max{p̃ − p̄, 0}, the larger of p̃ − p̄ and 0 (max{x, y} equals the larger of the two numbers x and y). If the individual owns k such options, then k units of the stock may be purchased at price p̄ and so the return is then kφ(p̃). Figure 5.4 depicts the return of a call and of a put option.

In contrast, a put option gives the right to sell a unit of the asset at a (strike) price, p̂ say. In this case, net benefit arises when the price turns out to be low, so that the asset may be sold at the (higher) strike price p̂. This has a return of θ(p̃) = max{p̂ − p̃, 0}. Such options may be combined to provide more complex return streams.
Figure 5.4: Payoffs of a call option, φ(p̃), with strike p̄, and a put option, θ(p̃), with strike p̂
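The two payoff functions are simple enough to compute directly. The strike prices below are hypothetical, chosen only to make the payoffs concrete.

```python
# The call and put payoff functions phi and theta from the text, evaluated at
# hypothetical strike prices (50 for both options).
def call_payoff(p_tilde, p_bar):
    """phi(p~) = max{p~ - p_bar, 0}: exercise only if the price exceeds the strike."""
    return max(p_tilde - p_bar, 0.0)

def put_payoff(p_tilde, p_hat):
    """theta(p~) = max{p_hat - p~, 0}: exercise only if the price is below the strike."""
    return max(p_hat - p_tilde, 0.0)

print(call_payoff(60.0, 50.0))      # 10.0
print(call_payoff(40.0, 50.0))      # 0.0: not exercised
print(put_payoff(40.0, 50.0))       # 10.0
k = 3                                # holding k options scales the return
print(k * call_payoff(60.0, 50.0))  # 30.0
```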
5.3 Continuity and differentiability

5.3.1 Preliminaries
Before discussing continuity and differentiability, it is necessary to give some background definitions, describe types of function and introduce the notion of distance between numbers.

Definition 5.1: Functions may be categorized in terms of surjectivity, injectivity and bijectivity.

• If f(X) = Y, where f(X) = {y : there is an x ∈ X with f(x) = y}, we say f is onto Y, or surjective. In words, for any point y ∈ Y, there is some x ∈ X with f(x) = y.

• The function f is called 1-1 (one-to-one), or injective, if f(x_1) = f(x_2) implies that x_1 = x_2; a 1-1 function cannot have the same value at two different points in the domain.

• If f is both 1-1 and onto, it is called bijective. If f is a bijective mapping, there exists a function g : Y → X, called the inverse of f, with g(f(x)) = x and f(g(y)) = y. The function g associates a unique point in X to each point in Y, and is written f⁻¹.
Example 5.1: With X = Y = [0, 1], f(x) = x² is surjective since for y ∈ Y there is an x with f(x) = y (namely x = √y); and injective because x ≠ x̄ and x, x̄ ∈ [0, 1] imply f(x) = x² ≠ x̄² = f(x̄). This function is bijective with inverse g(y) = f⁻¹(y) = √y. The function f(x) = 2x(1 − x) is not surjective because for y ∈ [0, 1] with y > 1/2 there is no x ∈ [0, 1] with f(x) = y; and it is not injective because, for example, f(1/3) = f(2/3) = 4/9. ♦
In the economic context, some of these ideas can play an important role. Consider the case where there are two goods, x_1 and x_2. Demand for good 1 is x_1(p_1, p_2). For the discussion, let p_2 be fixed (at say p̄_2) and let D(p_1) = x_1(p_1, p̄_2), giving the demand for good 1 as its price varies.

In Figure 5.5, an increase in the price of good 1 from p_1 to p_1′ raises the demand for good 1 from x_1 to x_1′. Such a good is called a "Giffen" good, and has the odd feature that the demand curve slopes upward in some regions. The second panel illustrates such a demand curve. Here, there are multiple prices that lead to the same level of demand: x_1 = D(p_1°) = D(p_1) = D(p_1″), so the demand equals x_1 at three different prices, p_1°, p_1 and p_1″. The demand function
Figure 5.5: Giffen good; a non-injective demand function
is not injective and we cannot associate a unique price to a demand level such as x1 . Technically, the demand function does not have an inverse – there is no inverse demand function, p(x1 ). Such a function cannot be defined, since, at x1 for example, there are multiple prices associated with this demand level (p◦1 , p1 and p′′1 ).
5.3.2 Inequalities and absolute value
Informally, a function f is continuous if when x is close to x′, f(x) is close to f(x′). Formally defining continuity requires a notion of closeness, that is, of the distance between different points or numbers. For example, considering the numbers 4, 6 and 9, 4 is closer to 6 than 9 is, since 6 − 4 = 2 while 9 − 6 = 3. Here, the subtraction is done so that the result is positive: if in the second calculation we write 6 − 9 = −3, this is smaller than 2 (−3 < 2), but 9 is not closer to 6 than 4 is. Clearly, it is the magnitude that matters here, and this is measured by the absolute value of a number. Comparing 9 − 6 = 3 and 6 − 9 = −3, the magnitude of the difference is 3; both 3 and −3 indicate that 9 and 6 are separated by three units.
When comparing two numbers, one can always say which one is larger or smaller, or else they are equal. The expressions "≥" and "≤" denote "greater than or equal to" and "less than or equal to", respectively. Thus, "a is at least as large as b" is written a ≥ b, while "a is (strictly) greater than b" is written a > b. Note that if a ≥ b, then −a ≤ −b: multiplying both sides of an inequality by −1 reverses the inequality (consider 7 > 5 and −7 < −5).
For a real number a, the expression |a| is called the absolute value of a, and is defined by |a| = a if a ≥ 0 and |a| = −a if a < 0:

$$|a| = \begin{cases} -a & \text{if } a < 0, \\ a & \text{if } a \ge 0. \end{cases}$$

In words, |a| equals a if a is nonnegative, and equals −a if a is negative. So, |a| is always nonnegative, and strictly positive unless a = 0.
Proposition 5.1: The absolute value operation satisfies a number of simple rules. For real numbers a, b and x:

1. If a ≥ 0, then |x| ≤ a if and only if −a ≤ x ≤ a;
2. |a + b| = |b + a|;
3. |a × b| = |a| |b|;
4. |a + b| ≤ |a| + |b|;
5. |a + b| ≥ |a| − |b|;
6. |a/b| = |a|/|b|, if b ≠ 0. ♦
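Before turning to the proof, these rules can be spot-checked numerically; the sample values in the sketch below are arbitrary.

```python
# Numerical spot-check of the absolute-value rules in Proposition 5.1,
# on arbitrary sample values a and b.
a, b = -7.5, 3.0
assert abs(a + b) == abs(b + a)                    # rule 2
assert abs(a * b) == abs(a) * abs(b)               # rule 3
assert abs(a + b) <= abs(a) + abs(b)               # rule 4 (triangle inequality)
assert abs(a + b) >= abs(a) - abs(b)               # rule 5
assert abs(a / b) == abs(a) / abs(b)               # rule 6
# Rule 1: for a bound >= 0, |x| <= bound iff -bound <= x <= bound.
x, bound = 2.5, 4.0
assert (abs(x) <= bound) == (-bound <= x <= bound)
print("all rules hold for these values")
```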
Proof: (1) From the definition of absolute value, −|x| ≤ x ≤ |x|, since x equals |x| or −|x|. If we assume |x| ≤ a, then −a ≤ −|x| ≤ x ≤ |x| ≤ a. Conversely, if we assume −a ≤ x ≤ a, then x ≥ 0 implies |x| = x ≤ a. On the other hand, if x < 0, then −a ≤ x implies −x ≤ a, and |x| = −x, so |x| ≤ a. Either way, |x| ≤ a. Thus, |x| ≤ a is equivalent to −a ≤ x ≤ a. Points (2) and (3) follow from the definition. Concerning (4), to see that |a + b| ≤ |a| + |b|, observe that −|a| ≤ a ≤ |a| and −|b| ≤ b ≤ |b|: these inequalities imply −(|a| + |b|) ≤ a + b ≤ (|a| + |b|), and so |a + b| ≤ |a| + |b|. To see that |a + b| ≥ |a| − |b|, using rule (4), note that |x| + |y| ≥ |x + y| and, rearranging, |x| ≥ |x + y| − |y|. Set x = a + b and y = −b in this expression to give the result |a + b| ≥ |a| − |b|, for any a, b. Finally, (6) follows from the definition.

Rules 4 and 5 generalize directly. Consider |∑_{i=1}^n x_i| ≤ ∑_{i=1}^n |x_i|. To confirm this, put a = x_1 and b = ∑_{i=2}^n x_i, so that |∑_{i=1}^n x_i| ≤ |x_1| + |∑_{i=2}^n x_i|, using rule 4. Repeat with |∑_{i=2}^n x_i| to give |∑_{i=1}^n x_i| ≤ |x_1| + |x_2| + |∑_{i=3}^n x_i|. Proceeding in this way gives |∑_{i=1}^n x_i| ≤ |x_1| + |x_2| + ··· + |x_n|.

Similarly, |∑_{i=1}^n x_i| ≥ |x_1| − |x_2| − |x_3| − ··· − |x_n|. Given ∑_{i=1}^n x_i, set a = x_1 and b = ∑_{i=2}^n x_i to get |∑_{i=1}^n x_i| ≥ |x_1| − |∑_{i=2}^n x_i|, using rule 5. The previous discussion gives |∑_{i=2}^n x_i| ≤ |x_2| + |x_3| + ··· + |x_n|, so −|∑_{i=2}^n x_i| ≥ −|x_2| − |x_3| − ··· − |x_n|. Therefore, |∑_{i=1}^n x_i| ≥ |x_1| − |∑_{i=2}^n x_i| ≥ |x_1| − |x_2| − |x_3| − ··· − |x_n|.
5.3.3 Continuity

Continuity of a function is the requirement that when points are close in the domain, the function should take those points to the same neighborhood in the range. This is formalized next.

Definition 5.2: Let x_1, x_2, ..., denoted {x_n}_{n≥1}, be a sequence of real numbers. We say that x_n converges to a real number c if, given any positive number ε > 0, however small, there is an integer n̄ such that |x_n − c| ≤ ε whenever n ≥ n̄.

This is written |x_n − c| → 0 as n → ∞, or x_n → c, or lim_{n→∞} x_n = c. Intuitively, |x_n − c| → 0 means that |x_n − c| gets "closer and closer" to 0 as n gets "larger and larger" (n → ∞).
Example 5.2: The following sequences illustrate convergence:

1. x_n = c + 1/n, so that |x_n − c| = 1/n → 0 as n → ∞; x_n converges to c.

2. x_n = c + (−1)^n. Thus x_n is equal to c + 1 when n is even and c − 1 when n is odd. In this case |x_n − c| = 1 for all n, and so x_n does not converge to c.

3. x_n = c + n, so that |x_n − c| = n, and so x_n does not converge to c. ♦
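The three sequences can be sampled at a large n to see the contrast; the sketch below takes c = 0 and n = 1000.

```python
# The three sequences of Example 5.2, with c = 0: only the first converges.
c = 0.0
n = 1000
x1 = c + 1.0 / n           # x_n = c + 1/n
x2 = c + (-1.0) ** n       # x_n = c + (-1)^n
x3 = c + n                 # x_n = c + n
print(abs(x1 - c))   # 0.001: shrinks toward 0 as n grows
print(abs(x2 - c))   # 1.0: |x_n - c| never falls below 1
print(abs(x3 - c))   # 1000.0: diverges
```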
Definition 5.3: A function f is continuous at c if, given ε > 0, there exists δ > 0 such that |f(x) − f(c)| < ε whenever |x − c| < δ.

In this case, write lim_{x→c} f(x) = f(c). As x gets "closer and closer" to c, f(x) gets "closer and closer" to f(c). Call f continuous if f is continuous at every c ∈ X.
Example 5.3: The following functions illustrate continuity:

1. X = (0, ∞), f(x) = 1/x is continuous; but f is not continuous if the domain is X = [0, ∞).

2. X = (−∞, ∞), f(x) = 2x³ + x² + 5 is continuous.

3. X = [0, 1], f(x) = x, if x

5.3.4 Differentiability

Consider an increasing function: if x′ > x, then f(x′) > f(x). If x′ is slightly larger than x, then f(x′) is slightly larger than f(x), and f(x′) − f(x) measures the increase in f resulting from the increase from x to x′. More generally, whether f is increasing or not, f(x′) − f(x) measures the change in f resulting from the change in x.
Definition 5.4: The function f is differentiable at x if, as x′ converges to x, [f(x′) − f(x)]/(x′ − x) converges to a number. Formally, f is differentiable at x if the following limit exists:

$$\lim_{x' \to x} \frac{f(x') - f(x)}{x' - x} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x} = \lim_{\epsilon \to 0} \frac{f(x + \epsilon) - f(x)}{\epsilon}.$$

If the limit exists it is denoted

$$\frac{d f(x)}{dx} \quad \text{or} \quad f'(x)$$

and is called the derivative of f. In Figure 5.6, f′(x) is equal to the slope of the tangent to the curve; [f(x + Δx) − f(x)]/Δx is equal to the slope of the hypotenuse of the triangle. As Δx becomes smaller, the two slopes get closer.

When the derivative is equal to 0 at some point x̄, the function is flat at that point. This is important in optimization, where one wishes to locate the value of x that makes f(x) as large as possible. The function f(x) = x(1 − x) has a plot similar to Figure 5.6 and has f′(x) = 0 at x = 1/2.
The following functions illustrate differentiability.
Example 5.4: If f(x) = x², then f(x + Δx) = (x + Δx)² = x² + 2xΔx + (Δx)², so that f(x + Δx) − f(x) = 2xΔx + (Δx)². Therefore

$$\frac{f(x + \Delta x) - f(x)}{\Delta x} = \frac{2x\Delta x + (\Delta x)^2}{\Delta x} = 2x + \Delta x.$$
Figure 5.6: The derivative
Since the limit of this as Δx goes to 0 is 2x, this shows that f′(x) = 2x. For example, take x = 1/2 = 0.5 and Δx = 0.05, so that x + Δx = 0.55. Then f(x + Δx) = f(0.55) = (0.55)² = 0.3025 and f(x) = f(0.5) = 0.25, so that

$$\frac{f(x + \Delta x) - f(x)}{\Delta x} = \frac{f(0.55) - f(0.5)}{0.05} = \frac{0.3025 - 0.25}{0.05} = \frac{0.0525}{0.05} = 1.05.$$

At x = 0.5 the derivative is f′(x) = 2x = 2(0.5) = 1, so the approximation is good at Δx = 0.05. ♦
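The difference-quotient calculation of this example is straightforward to reproduce, and shrinking Δx shows the quotient approaching the derivative.

```python
# The difference-quotient computation of Example 5.4 for f(x) = x^2 at x = 0.5.
def diff_quotient(f, x, dx):
    return (f(x + dx) - f(x)) / dx

f = lambda x: x * x
print(round(diff_quotient(f, 0.5, 0.05), 6))    # 1.05, as in the text
for dx in (0.005, 0.0005):
    print(round(diff_quotient(f, 0.5, dx), 6))  # 1.005, then 1.0005: nearing f'(0.5) = 1
```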
The next example gives a continuous function that does not have a derivative at x = 1/2 (see Figure 5.7).

Example 5.5: Let f(x) = min{x, 1 − x}, so that f(x) is the smaller of x and 1 − x. Take x = 1/2 and let x′ < 1/2, so that Δx = x′ − x < 0, f(x) = f(1/2) = 1/2 = x and f(x′) = x′ since x′ < 1/2. Now, observe that f(x + Δx) − f(x) = (x + Δx) − x = Δx, so that

$$\frac{f(x') - f(x)}{\Delta x} = \frac{\Delta x}{\Delta x} = 1.$$

Thus lim_{Δx→0} [f(x + Δx) − f(x)]/Δx = 1 when we take x′ < x (Δx < 0).

Now take x′ > x, so that x′ = x + Δx with Δx > 0. Then f(x′) = 1 − x′ = 1 − (x + Δx) = (1 − x) − Δx. Since f(x) = x = 1/2 = 1 − x,

$$\frac{f(x') - f(x)}{\Delta x} = \frac{f(x + \Delta x) - f(x)}{\Delta x} = \frac{[(1 - x) - \Delta x] - (1 - x)}{\Delta x} = \frac{-\Delta x}{\Delta x} = -1.$$
So the limit of [f(x′) − f(x)]/Δx is −1 when we take x′ > x. Another example of this sort is f(x) = |x|. (Check that f is not differentiable at x = 0.) ♦
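The two one-sided limits of the example can be seen numerically: the left and right difference quotients settle at different values, so no single limit exists at the kink.

```python
# One-sided difference quotients for f(x) = min(x, 1 - x) at the kink x = 1/2.
# The left quotient settles at +1 and the right at -1, so f'(1/2) does not exist.
f = lambda x: min(x, 1.0 - x)
x = 0.5
for dx in (0.1, 0.01, 0.001):
    left = (f(x - dx) - f(x)) / (-dx)    # take x' < x
    right = (f(x + dx) - f(x)) / dx      # take x' > x
    print(round(left, 6), round(right, 6))   # 1.0 -1.0 at every step
```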
Figure 5.7: Plot of f(x) = min{x, 1 − x}

Remark 5.1: It is always the case that differentiability of a function implies that the function is continuous. To see this, suppose that lim_{Δx→0} [f(x₀ + Δx) − f(x₀)]/Δx exists. As the denominator Δx approaches 0, for the ratio [f(x₀ + Δx) − f(x₀)]/Δx to have a limit requires that f(x₀ + Δx) − f(x₀) also approach 0. Therefore, lim_{Δx→0} [f(x₀ + Δx) − f(x₀)] = 0. The converse is not true. For example, |x| is continuous but not differentiable at 0.
Remark 5.2: Differentiability plays an important role in locating the maximum or minimum of a function. Consider f(x) = x(1 − x) = x − x², so f′(x) = 1 − 2x. Thus f′(x) is positive when x < 1/2, negative when x > 1/2 and equal to 0 at x = 1/2. The derivative of f measures the slope of f: the slope is positive to the left of 1/2, 0 at 1/2, and negative to the right of 1/2. For certain functions (such as this one), there is a unique point at which the derivative is 0 (x = 1/2 for the function f(x) = x(1 − x)). Note that this occurs at the highest point of f, so by solving f′(x) = 1 − 2x = 0 we find the value of x that maximizes f: 1 − 2x = 0 ⇒ x = 1/2. This suggests that the use of derivatives will be helpful in finding maxima (and minima) of functions.
5.4 Types of function

In principle, a function f is a rule that associates to each point x in the domain a point y in the range. In practice, it is useful to use specific functional forms that reflect specific patterns in the relationship between x and y.
Exponential functions: The exponential function is defined by f(x) = e^x, where e is a constant (known as Euler's number, e ≈ 2.718), discussed later. More generally, exponential functions have the functional form f(x) = a^x or f(x) = b a^{cx}, where a, b and c are constants.

Polynomial functions: A polynomial function is a function of the form f(x) = a_n x^n + a_{n−1} x^{n−1} + ··· + a_1 x + a_0, where {a_k}_{k=0}^n are constants.
Trigonometric functions: The three primary trigonometric functions are the sine, cosine and tangent functions: f(x) = sin(x), f(x) = cos(x), f(x) = tan(x). Trigonometric functions are functions of an angle x, which may be measured in degrees or radians. When measured in degrees, a 360° angle represents one full rotation in a circle. A radian is the angle such that 2π radians make up a full rotation in the circle. It is common to plot these functions from −π to π, after which they repeat, so in this case the unit of measure is radians.
Logarithmic functions: The logarithm of a number is defined relative to some given base, and is said to be the logarithm of the number to that base. The logarithm of x to the base b is the number y such that x = b^y, that is, the power to which the base must be raised to equal that number. So, for example, the logarithm of 100 to the base 10 is 2, because 100 = 10². The log of x to the base b is written log_b x. One special case is b = e, Euler's number. Routinely, log_e x is written ln x.
5.4.1 Plots of various functions
The following table gives some well-known functions and their derivatives.

f(x):   x^a        a^x          log_e x   e^{ax}    sin(x)   cos(x)    tan(x)       e^{g(x)}
f′(x):  a x^{a−1}  a^x · ln(a)  1/x       a e^{ax}  cos(x)   −sin(x)   1/cos²(x)    g′(x) · e^{g(x)}
For some functions, parameter variation may change the plot of the function substantially. In Figure 5.8, different values of a vary the appearance of the function significantly. Note that the function f(x) = x^a may only be defined for some values of x. For example, when a = 1/2, then f(x) = √x is defined only for real x ≥ 0, and when a = −1 it is defined for all x ≠ 0.
Figure 5.8: f(x) = x^a, for various values of a

The function f(x) = a^x (Figure 5.9) is defined for all real x when a > 0. When a < 0, a^x may or may not be defined. For example, (−8)^{1/3} = −2, but (−1)^{1/2} is not a real number.
Figure 5.9: Plot of f(x) = a^x, for various values of a

The trigonometric functions are typified by repeating patterns. For any angle x, measured in degrees, this angle is the same angle as x + 360°, so that any function of the angle will have the same value at x and x + 360°. Alternatively, angles may be measured in radians. A radian is the angle such that 2π radians equal one rotation in the circle (or 360°). In Figure 5.10 the angles are measured in radians.

Figure 5.10: Plot of sin(x) and cos(x) for −2π ≤ x ≤ 2π

These functions may be modified (Figure 5.11). Consider the function f(x) = A sin(bx − c) + d = A sin(b[x − c/b]) + d. The parameter A is called the amplitude: if A = 2, the height of the fluctuation is doubled. The parameter c/b measures horizontal shift. Comparing f(x) with g(x) = A sin(bx) + d, f(c/b) = g(0). The ratio c/b is called the phase shift (the horizontal displacement of the function from its initial position). Finally, the period of a function is the distance between repeating points and is given by 2π/b.
Figure 5.11: Plot of sin(x) and sin(x − π/2) for −2π ≤ x ≤ 2π
5.4.2 The log and exponential functions
Both log and exponential functions are widely used in economics because, for example, growth formulas and growth calculations utilize these functions. The log of x to the base b is denoted log_b x, giving the number a such that x = b^a. The log function is defined only for positive numbers (for x > 0). Two particularly important bases are b = 10 and b = e, where e is Euler's number with e ≈ 2.718. Because the function log_e x is widely used, for notational convenience it is often written as ln x: ln x = log_e x.
5.4.2.1 The log function
Definition 5.5: The log of a number x to the base b is the power to which b must be raised to equal that number. Thus, a = log_b x means x = b^a. So, log_b x is the number that satisfies x = b^{log_b x}.
The log function has a number of useful properties.

Proposition 5.2: The log function satisfies:

1. The log of x^a to base b is equal to a times the log of x to the base b: log_b x^a = a log_b x.

2. The log of a product is equal to the sum of the logs: log_b xy = log_b x + log_b y.

3. The log of a ratio is equal to the difference of the logs: log_b (x/y) = log_b x − log_b y.

4. Change of base. The log of x to base c equals the log of x to base b times the log of b to base c: log_c x = log_c b × log_b x, or (log_c x)/(log_b x) = log_c b. ♦
Proof: The proofs are straightforward manipulations of the definition.

1. Let c = log_b x^a, so x^a = b^c, using the definition of a log. Therefore (x^a)^{1/a} = (b^c)^{1/a}, or x = b^{c/a}. And x = b^{c/a} means log_b x = c/a. Multiplying by a gives a log_b x = c, or a log_b x = log_b x^a.

2. Suppose d = log_b xy, f = log_b x and g = log_b y. Then (i) xy = b^d, (ii) x = b^f and (iii) y = b^g. Using (ii) and (iii), xy = b^f b^g = b^{f+g}. Since (i) gives xy = b^d, it must be that d = f + g, or log_b xy = log_b x + log_b y.

3. Suppose d = log_b (x/y), f = log_b x and g = log_b y. Then (i) x/y = b^d, (ii) x = b^f and (iii) y = b^g. Using (ii) and (iii), x/y = b^f / b^g = b^{f−g}. Since (i) gives x/y = b^d, it must be that d = f − g, or log_b (x/y) = log_b x − log_b y.

4. Let d = log_b x, so that x = b^d. Then log_c x = log_c b^d. From rule 1, log_c b^d = d log_c b. So, log_c x = d log_c b = log_b x · log_c b, and thus (log_c x)/(log_b x) = log_c b.

5.4.2.2 Choice of base for the log function
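The four rules of Proposition 5.2 can be checked numerically for any base; the sketch below uses arbitrary positive values and the bases 10 and e.

```python
import math

# Checking the four log rules of Proposition 5.2 on arbitrary positive numbers.
x, y, a, b, c = 5.0, 3.0, 2.0, 10.0, math.e
logb = lambda z, base: math.log(z) / math.log(base)   # log of z to an arbitrary base

assert abs(logb(x ** a, b) - a * logb(x, b)) < 1e-12            # rule 1
assert abs(logb(x * y, b) - (logb(x, b) + logb(y, b))) < 1e-12  # rule 2
assert abs(logb(x / y, b) - (logb(x, b) - logb(y, b))) < 1e-12  # rule 3
assert abs(logb(x, c) - logb(b, c) * logb(x, b)) < 1e-12        # rule 4 (change of base)
print("all four rules hold")
```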
There are two commonly used bases: 10 and e. For example, if α = log_10 x then 10^α = x. Thus, log_10 10 = 1, log_10 100 = 2, log_10 1000 = 3, and so on. The number e is defined as e = lim_{x→∞} (1 + 1/x)^x ≈ 2.71828. It arises naturally in differentiation and integration. The functions are related by the following equations: log_e x = (log_10 x)/(log_10 e) and log_10 x = (log_e x)/(log_e 10). For example, if α = log_10 x, then 10^α = x and log_e(10^α) = log_e x, and log_e(10^α) = α · log_e 10 = log_10 x · log_e 10, giving the second relation.
It is common practice to write ln x for log_e x. One useful feature of the ln function is that the derivative of ln x is 1/x. If y = log_e x, then e^y = e^{log_e x} = x, so that e^y (dy/dx) = 1, or x (dy/dx) = 1, or dy/dx = 1/x. For the formulae above, log_10 e · ln x = log_10 x, so the derivative of log_10 x can be written as log_10 e · (1/x) and as [1/(log_e 10)] · (1/x).
Remark 5.3: The mathematical constant e ≈ 2.71828, Euler's number, may be defined in a number of ways. Define e to be that number that makes the area under the curve 1/x between 1 and e equal to 1. The number e may also be defined as:

$$e = 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots = \sum_{j=0}^{\infty} \frac{1}{j!}.$$

Yet another equivalent definition gives e as the limit of a sequence of numbers:

$$e = \lim_{x \to \infty} \left(1 + \frac{1}{x}\right)^x.$$

The latter formula makes the connection with continuous-time discounting.
5.5 Growth rates

Suppose an investor invests one dollar for a year at a 100% interest rate, r = 1. If the interest is calculated once at the end of the year, the loan is worth 1 × (1 + r) = 1 × 2 = 2. Suppose instead interest is compounded twice a year, with the half-yearly rate set at r/2 = 1/2. Then after six months the loan is worth 1 × (1 + 1/2), and at the end of twelve months it is worth 1 × (1 + 1/2)(1 + 1/2) = (1 + 1/2)² = 2.25. Similarly, if the interest is calculated every four months at the four-monthly rate r/3 = 1/3, then at the end of the year the value of the loan is (1 + 1/3)³ ≈ 2.37. Calculating interest on a monthly basis, the value of the loan would be (1 + 1/12)¹² = (1.08333)¹² ≈ 2.613. In general, with n rounds of interest calculation, the value of the loan at the end of the year is (1 + 1/n)^n. As n becomes large, this approaches e:

$$\left(1 + \frac{1}{n}\right)^n \longrightarrow e.$$

With continuous interest accumulation (from one instant to the next, a tiny amount of interest is added to the principal), the value of the loan at the end of the year is e.
Suppose now that the interest rate is r instead of 1 and note that:

$$\left(1 + \frac{r}{n}\right)^n = \left(1 + \frac{1}{n/r}\right)^n = \left(1 + \frac{1}{n/r}\right)^{(n/r)\,r} = \left[\left(1 + \frac{1}{n/r}\right)^{n/r}\right]^r \to e^r.$$

(Recall some rules of manipulation: x^a x^b = x^{a+b}, (x^a)^b = x^{a·b}.) With t periods, and an interest rate r each period, the value after t periods is (1 + r)^t. With interest calculated twice each period, the final value is [(1 + r/2)²]^t, and with interest calculated n times each period the final value is [(1 + r/n)^n]^t. From the previous calculations:

$$\left[\left(1 + \frac{r}{n}\right)^n\right]^t \to (e^r)^t = e^{rt}.$$

Thus, the value of P dollars invested now with continuous interest accumulation is V = P e^{rt}. Conversely, V dollars in t periods with continuous discounting is worth V e^{−rt} (= P) now.
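The compounding calculations above can be reproduced directly; the second part of the sketch uses hypothetical values P = 100, r = 0.05 and t = 10 to illustrate P e^{rt}.

```python
import math

# Compounding one dollar at rate r = 1 (100%): (1 + 1/n)^n approaches e.
for n in (1, 2, 3, 12, 365, 100000):
    print(n, round((1 + 1.0 / n) ** n, 5))
# n = 1 gives 2.0, n = 2 gives 2.25, n = 3 about 2.37, ..., approaching e = 2.71828...

# With rate r per period for t periods, continuous compounding gives P * e^(r*t);
# P, r and t here are hypothetical illustration values.
P, r, t = 100.0, 0.05, 10
n = 10 ** 6
assert abs(P * (1 + r / n) ** (n * t) - P * math.exp(r * t)) < 0.01
print(round(P * math.exp(r * t), 2))   # 164.87
```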
5.5.1 Additivity of continuous growth rates

One advantage of using continuous compound rates is that growth rates are additive. Consider three time periods, s < t < u, with some variable (such as income) observed at those three dates: y_s, y_t, y_u. In the discrete calculation, the growth rate from s to t is the number r_st determined by y_t/y_s = 1 + r_st, and the growth rate from t to u is the number r_tu determined by y_u/y_t = 1 + r_tu. Over the period from s to u, the growth rate is r_su, defined from y_u/y_s = 1 + r_su. However,

$$(1 + r_{st})(1 + r_{tu}) = 1 + r_{st} + r_{tu} + r_{st} r_{tu} = 1 + r_{su},$$

so the overall growth is not the sum of the growths in the two periods: r_st + r_tu ≠ r_su. By contrast, using the continuous growth formula:

$$\frac{y_t}{y_s} = e^{\rho_{st}}, \qquad \frac{y_u}{y_t} = e^{\rho_{tu}}, \qquad \frac{y_u}{y_s} = e^{\rho_{su}},$$

where ρ_ij is the continuous growth rate from time i to time j. Then, multiplying,

$$e^{\rho_{su}} = \frac{y_u}{y_s} = \frac{y_t}{y_s}\,\frac{y_u}{y_t} = e^{\rho_{st}} e^{\rho_{tu}} = e^{\rho_{st} + \rho_{tu}},$$

so that ρ_su = ρ_st + ρ_tu. Thus, the growth rate over the full period from s to u is equal to the sum of the growth rates in the two subperiods.
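The contrast between the two conventions shows up immediately on sample data; the income figures below are hypothetical.

```python
import math

# Continuous growth rates add across subperiods; discrete ones do not.
ys, yt, yu = 100.0, 110.0, 126.5     # hypothetical income at dates s < t < u

# Discrete rates:
r_st, r_tu = yt / ys - 1, yu / yt - 1
r_su = yu / ys - 1
print(round(r_st + r_tu, 4), round(r_su, 4))   # 0.25 vs 0.265: not equal

# Continuous rates:
p_st, p_tu = math.log(yt / ys), math.log(yu / yt)
p_su = math.log(yu / ys)
assert abs((p_st + p_tu) - p_su) < 1e-12       # rho_su = rho_st + rho_tu
print("continuous rates are additive")
```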
5.5.2 The functions ln x (log_e x) and e^{kx}
Figure 5.12 shows the functions e x and log e x. Both are increasing in x, but e x increases rapidly (at an increasing rate), whereas log e x increases at a decreasing rate. The following section provides additional discussion on these important functions.
Figure 5.12: The functions e^x and ln x (≡ log_e x)
5.5.2.1 The derivatives of e^x and ln x
Given a constant k, the derivative of e^{kx} is given by lim_{Δ→0} [e^{k(x+Δ)} − e^{kx}]/Δ. Consider:

$$\frac{e^{k(x+\Delta)} - e^{kx}}{\Delta} = \frac{e^{kx} e^{k\Delta} - e^{kx}}{\Delta} = e^{kx}\,\frac{e^{k\Delta} - 1}{\Delta}.$$

Recalling that one definition of e^z is e^z = 1 + z/1! + z²/2! + ···,

$$e^{k\Delta} = 1 + \frac{k\Delta}{1!} + \frac{k^2\Delta^2}{2!} + \frac{k^3\Delta^3}{3!} + \cdots = 1 + k\Delta + \Delta^2 k^2\left[\frac{1}{2!} + \frac{k\Delta}{3!} + \frac{k^2\Delta^2}{4!} + \cdots\right].$$

Therefore,

$$\frac{e^{k\Delta} - 1}{\Delta} = k + \Delta k^2\left[\frac{1}{2!} + \frac{k\Delta}{3!} + \frac{k^2\Delta^2}{4!} + \cdots\right].$$

Since the bracketed term is bounded for kΔ < 1,

$$\lim_{\Delta \to 0} \frac{e^{k\Delta} - 1}{\Delta} = k.$$

(The bracketed term is bounded because |1/2! + kΔ/3! + k²Δ²/4! + ···| ≤ 1 + kΔ + (kΔ)² + ··· = 1/(1 − kΔ).) Therefore:

$$\frac{d}{dx}\{e^{kx}\} = k e^{kx}.$$

The derivative of ln x is given by lim_{Δ→0} [ln(x + Δ) − ln x]/Δ.
Consider:

[ln(x+∆) − ln x]/∆ = (1/∆) ln((x+∆)/x) = (1/∆) ln(1 + ∆/x) = ln[(1 + ∆/x)^{1/∆}].

Rearranging brackets,

ln[(1 + ∆/x)^{1/∆}] = ln[{(1 + ∆/x)^{x/∆}}^{1/x}].

The ln x function is continuous for x > 0. (Recall, if f is continuous at x₀ then lim_{x→x₀} f(x) = f(x₀).) Therefore,

lim_{∆→0} [ln(x+∆) − ln x]/∆ = lim_{∆→0} ln[{(1 + ∆/x)^{x/∆}}^{1/x}] = ln[{lim_{∆→0}(1 + ∆/x)^{x/∆}}^{1/x}] = ln[e^{1/x}] = 1/x,

and so,

d/dx{ln x} = 1/x.
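Both limits can be checked by finite differences; a minimal sketch (the point x, constant k and step size are illustrative choices):

```python
import math

# Finite-difference check of the two derivative limits derived above.
k, x, d = 3.0, 2.0, 1e-6

# d/dx e^{kx} = k e^{kx}
fd_exp = (math.exp(k * (x + d)) - math.exp(k * x)) / d
print(fd_exp, k * math.exp(k * x))   # close

# d/dx ln x = 1/x
fd_log = (math.log(x + d) - math.log(x)) / d
print(fd_log, 1 / x)                 # close
```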
5.6 Some rules of calculus

Some basic rules of calculus apply for the differentiation of the product, ratio and composition of continuous functions.

1. The derivative of a product of functions:

d/dx{f(x)g(x)} = f(x)g′(x) + g(x)f′(x).
For example, if f(x) = x^α and g(x) = cos x, then

d/dx{x^α cos(x)} = −x^α sin(x) + αx^{α−1} cos x.
2. The derivative of a ratio of functions (g(x) ≠ 0):

d/dx{f(x)/g(x)} = [g(x)f′(x) − f(x)g′(x)]/[g(x)]².
3. The chain rule (differentiating the composition of functions):

d/dx{f(g(x))} = f′(g(x))g′(x).
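The three rules can be spot-checked numerically against a central finite difference; in this sketch the functions f(x) = x² and g(x) = cos x and the evaluation point are illustrative choices:

```python
import math

# Numeric spot-check of the product, ratio and chain rules at one point.
x, d = 1.3, 1e-6

f = lambda t: t ** 2         # f(x) = x^2,  f'(x) = 2x
g = lambda t: math.cos(t)    # g(x) = cos x, g'(x) = -sin x

fd = lambda h: (h(x + d) - h(x - d)) / (2 * d)   # central difference

# 1. Product rule: f g' + g f'
prod_rule = f(x) * (-math.sin(x)) + g(x) * (2 * x)
print(fd(lambda t: f(t) * g(t)), prod_rule)

# 2. Ratio rule: (g f' - f g') / g^2
ratio_rule = (g(x) * 2 * x - f(x) * (-math.sin(x))) / g(x) ** 2
print(fd(lambda t: f(t) / g(t)), ratio_rule)

# 3. Chain rule: d/dx f(g(x)) = f'(g(x)) g'(x) = 2 cos(x) * (-sin x)
chain_rule = 2 * g(x) * (-math.sin(x))
print(fd(lambda t: f(g(t))), chain_rule)
```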
EXAMPLE 5.6: Consider f(x) = ln x and g(x) = x². (Note that the ln function is defined only for x > 0.) Then

f′(x) = d/dx{ln(x)} = 1/x and g′(x) = d/dx{x²} = 2x.

1. Product rule: If h(x) = f(x)g(x), then h′(x) = (ln x)·(2x) + (1/x)·x².
2. Ratio rule: If h(x) = f(x)/g(x), then h′(x) = [x²·(1/x) − (ln x)·(2x)]/(x²)².
3. Chain rule: Put h(x) = g(f(x)) = (ln x)². Differentiating directly, h′(x) = (2 ln x)(1/x). Or, from the chain rule, h′(x) = g′(f(x))f′(x) = (2 ln x)(1/x). ♦

EXAMPLE 5.7 (RATIO RULE): The function tan x is defined tan x = sin x/cos x, and using the ratio formula:

d/dx tan(x) = [cos(x)cos(x) + sin(x)sin(x)]/(cos(x))² = [cos²(x) + sin²(x)]/cos²(x) = 1/cos²(x),

utilizing the fact that the sine of an angle is the opposite (a) over the hypotenuse (h) and the cosine is the base (b) divided by the hypotenuse (in a right-angled triangle). Thus, cos x = b/h, sin x = a/h and tan x = sin x/cos x = a/b.
The figure illustrates a right-angled triangle with hypotenuse h, opposite side a, base b and angle x. ♦

EXAMPLE 5.8 (CHAIN RULE): Let g(y) = sin y and f(x) = √x. Then

g′(y) = cos y and f′(x) = ½x^{−1/2}.

If h(x) = g(f(x)) = sin(√x), then h′(x) = cos(√x)·½x^{−1/2} = cos(√x)/(2√x).

Let g(y) = α^y and f(x) = x², so that g(f(x)) = α^{x²}. This may be written e^{ln α·x²}, so that the derivative is 2x ln α·e^{ln α·x²} = 2x ln α·α^{x²}. ♦
In the context of the exponential and log functions, the chain rule gives:

d/dx e^{f(x)} = f′(x)e^{f(x)} and d/dx ln f(x) = f′(x)/f(x), given f(x) > 0.
The next example illustrates a more abstract calculation using the chain rule.

EXAMPLE 5.9: Suppose that f(α) = g(α)^{h(α)}, where g(α) > 0 for all α. To find the derivative of f, take the ln of f and differentiate it: ln f(α) = h(α) ln g(α). Thus, differentiating both sides:

d/dα ln f(α) = [1/f(α)]f′(α) = h′(α) ln g(α) + h(α)[1/g(α)]g′(α).

So,

f′(α) = f(α)h′(α) ln g(α) + f(α)h(α)[1/g(α)]g′(α) = g(α)^{h(α)}h′(α) ln g(α) + g(α)^{h(α)}h(α)[1/g(α)]g′(α).

Alternatively, observe that f(α) = e^{h(α) ln g(α)}, so that using the rule d/dx e^{r(x)} = e^{r(x)}r′(x),

f′(α) = e^{h(α) ln g(α)}[h′(α) ln g(α) + h(α)[1/g(α)]g′(α)]. ♦
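The logarithmic-differentiation formula above can be compared against a finite difference; the functions g(a) = 1 + a² and h(a) = sin a below are hypothetical choices with g > 0 everywhere:

```python
import math

# Check f'(a) for f(a) = g(a)^h(a) using the formula from Example 5.9.
a, d = 0.7, 1e-6

g  = lambda t: 1 + t * t      # g > 0 everywhere
gp = lambda t: 2 * t          # g'
h  = lambda t: math.sin(t)
hp = lambda t: math.cos(t)    # h'

f = lambda t: g(t) ** h(t)

# Formula: f'(a) = g(a)^h(a) * [h'(a) ln g(a) + h(a) g'(a)/g(a)]
formula = f(a) * (hp(a) * math.log(g(a)) + h(a) * gp(a) / g(a))

# Central finite difference for comparison
fd = (f(a + d) - f(a - d)) / (2 * d)
print(formula, fd)   # close
```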
5.6.1 l'Hospital's rule
Sometimes it is necessary to consider the behavior of the ratio of functions, f(x)/g(x), in the neighborhood of a point where the ratio is not defined. Consider the behavior of the ratio f(x)/g(x) in the neighborhood of a point a. Suppose that x → a; how does f(x)/g(x) behave?

There are a few cases where the answer is obvious. If f(x) → c and g(x) → d with d ≠ 0, then f(x)/g(x) → c/d. Similarly, if f(x) → c and g(x) → d with d = 0 and c ≠ 0, then the ratio tends to +∞ or −∞.

However, there are two specific cases where the answer is not obvious: when both f(x) and g(x) go to 0, and when both f(x) and g(x) go to ±∞. In such cases,
determining the limit may be tedious. However, l'Hospital's rule provides a means of evaluating the limit.

Theorem 5.1: Suppose that (i) f and g are differentiable on (a−δ, a+δ)\{a} for some δ > 0 (where the notation A\{y} means the set A with the point y removed), (ii) lim_{x→a} f(x) = 0 and lim_{x→a} g(x) = 0, (iii) g′(x) ≠ 0 on (a−δ, a+δ)\{a}. Then,

lim_{x→a} f′(x)/g′(x) = k implies lim_{x→a} f(x)/g(x) = k.

Similarly, suppose that (i) f and g are differentiable on (a−δ, a+δ)\{a} for some δ > 0, (ii) lim_{x→a} f(x) = ∞ and lim_{x→a} g(x) = ∞, (iii) g′(x) ≠ 0 on (a−δ, a+δ)\{a}. Then,

lim_{x→a} f′(x)/g′(x) = k implies lim_{x→a} f(x)/g(x) = k.

Thus, under general conditions, the limit of the ratio is equal to the limit of the ratio of the derivatives.
EXAMPLE 5.10 (L'HOSPITAL'S RULE): Let f(x) = x² and g(x) = e^x − 1. As x approaches 0, the values of the functions approach 0: lim_{x→0} f(x) = lim_{x→0} g(x) = 0. How does f(x)/g(x) behave as x → 0? Considering the derivatives of both functions, f′(x) = 2x and g′(x) = e^x. Since 2x goes to 0 as x approaches 0 and e^x goes to 1 as x goes to 0,

lim_{x→0} x²/(e^x − 1) = lim_{x→0} 2x/e^x = 0. ♦
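The limit in Example 5.10 can also be observed numerically by evaluating the ratio at points approaching 0; a minimal sketch:

```python
import math

# x^2/(e^x - 1) is 0/0 at x = 0, but evaluating near 0 shows it tends to 0,
# as l'Hospital's rule predicts (lim 2x/e^x = 0).
ratio = lambda x: x ** 2 / (math.exp(x) - 1)

for x in (0.1, 0.01, 0.001):
    print(x, ratio(x))   # the ratio shrinks toward 0 with x
```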
5.6.2 Higher derivatives

For notational convenience, the second derivative is denoted f′′(x). Technically, the second derivative is defined in exactly the same way as the first derivative:

f′′(x) = lim_{∆→0} [f′(x+∆) − f′(x)]/∆.

Thus, the second derivative is the derivative of the first derivative. Likewise, higher-order derivatives may be determined step by step. As noted earlier, a continuous function may not be differentiable. Similarly, a function may have a first derivative, but not a second derivative; and so on. Properties of the second derivative are central in the discussion of optimization. If the second derivative is positive, the slope of the function is increasing at that point; if the second derivative is negative, the slope of the function is decreasing. Third-order derivatives are calculated similarly:

f′′′(x) = lim_{∆→0} [f′′(x+∆) − f′′(x)]/∆.

Higher-order derivatives are calculated by repeated differentiation.
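The step-by-step construction of higher derivatives can be mimicked with nested finite differences; the function f(x) = x³ and the step size below are illustrative:

```python
# Second derivative as the derivative of the first derivative, approximated
# by nested finite differences, for f(x) = x^3 (so f''(x) = 6x).
d = 1e-4

f = lambda x: x ** 3
fp = lambda x: (f(x + d) - f(x)) / d       # approximate f'
fpp = lambda x: (fp(x + d) - fp(x)) / d    # approximate f'': derivative of f'

print(fpp(2.0))   # close to 6 * 2 = 12
```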
Exercises for Chapter 5
Exercise 5.1 Consider the function F, defined: f(x) = 2x², h(x) = 4x. Find the derivative of f with respect to x, and show that the derivative is negative for x < √5 and positive for x > √5.

Exercise 5.8 Consider the ratio ln(t)/t: show that the limit as t → ∞ is 0: lim_{t→∞} ln(t)/t = 0. Next, consider the function f(t) = t^k e^{−at}, t > 0, where a > 0. Show that for any positive number k, lim_{t→∞} f(t) = 0.
Exercise 5.9 Given a function f ( x), the first derivative is a function f ′ ( x). Differentiating this function f ′ ( x) gives a function denoted f ′′ ( x), the second derivative of f . Plot the function f ( x) = 8[ x3 − 2 x2 + x] and its first and second derivatives. Exercise 5.10 Suppose that revenue as a function of output is given by R (Q ) =
ln(Q) − ¼Q². Output, in turn, depends on input K according to the equation Q(K) = 3 − e^{−aK}. Thus, r(K) = R(Q(K)) gives revenue in terms of K. Find dr/dK.
Exercise 5.11 Suppose the inverse demand for a good is given by the function
P (Q ) where P ′ (Q ) < 0. This identifies the price that corresponds to a given level
of demand. Conversely, the demand function Q (P ) associates a level of demand to
each price. So, given a quantity Q , P (Q ) is the price that would lead to that level of demand. Mathematically, Q = Q (P (Q )) for each Q . The elasticity of demand is
defined, at P = P(Q), as

ǫ(Q) = −(P/Q)(dQ/dP) = −[P/Q(P)]·dQ(P)/dP.
Use the chain rule to show that

1/ǫ(Q) = −[Q/P(Q)]·dP(Q)/dQ.
Exercise 5.12 Suppose that output Q is produced at cost C (Q ) = ln(Q + 1) + e−Q . Therefore, the average cost function is:
A(Q) = C(Q)/Q = [ln(Q+1) + e^{−Q}]/Q.

Calculate A′(Q).

Exercise 5.13 Suppose that output Q is produced at cost C(Q). The average cost of the output is A(Q) = C(Q)/Q.
Show that average cost is declining if marginal cost
is less than average cost. Exercise 5.14 A firm pays tax on gross revenue R (Q ) at a rate that depends on output volume t(Q ). Suppose that R (Q ) = k ln(Q +1)− aQ α and t(Q ) = 1− e−bQ , then total tax paid is T (Q ) = R (Q ) t(Q ) = [k ln(Q + 1) − aQ α ](1 − e−bQ ). Use the product rule to find T ′ (Q ).
6 Functions of one variable: applications

6.1 Introduction
In this chapter, calculus techniques are applied to the study of a variety of problems, beginning with optimization in Section 6.2. In an optimization problem, given a function f of one variable x, the problem is to find the value of x, x∗, that maximizes f. Such an x∗ is characterized by the condition that f(x∗) ≥ f(x) for all x. If the choice of x is restricted to some set X, then the requirement is that f(x∗) ≥ f(x) for all x ∈ X. Calculus techniques are used to find x∗ by considering how f varies with x. These techniques are local in nature and so locate "local" maxima (or minima), where there is no alternative choice close to the candidate optimum that yields a higher value of the function f (or lower value if the task is to minimize the value of f). In contrast, x∗ is a global maximizer if f(x∗) is as large as f(x) for any possible x. Identifying whether a value x∗ is a local or global maximizer is considered in Section 6.3. This discussion focuses on the shape of the function f, and the use of calculus techniques to identify the shape.

Following this discussion of optimization, a number of applications are examined. The discussion of externalities in Section 6.4 illustrates the use of these techniques to study the welfare implications of traffic congestion. Comparative statics, introduced in Section 6.5, concerns the change in a variable resulting from an underlying parameter change. For example, a change in the intercept of a demand curve will change the intersection of the supply and demand curves and hence the equilibrium quantity and price. In the optimization context, if the function f depends on x and a fixed parameter α, then the optimal choice of x to maximize f(x, α) will depend on α: x(α). Therefore, any change in α will result in a change in the value of x(α). The study of how x(α) changes as α changes is considered in this section.
Subsequent sections consider comparative static calculations in a taxation context – in Sections 6.7 and 6.8. A key measurement in economics is the elasticity, measuring the response of a dependent variable to a change in another variable – for example, measuring how demand responds to a price change. In the context of demand, this is introduced in Section 6.6. Finally, Section 6.9 considers Ramsey pricing, using variational arguments to study efficient taxation.
6.2 Optimization

Making optimal choices involves maximization or minimization of some objective. In the present context, given a function f where f : X → R, suppose that we wish to find a value for x that maximizes (or minimizes) f on X. For most of this discussion, X will be a subset of R.

Definition 6.1: Say that x̄ is a global maximum if f(x̄) ≥ f(x) for all x ∈ X, and a global minimum if f(x̄) ≤ f(x) for all x ∈ X. If f(x̄) ≥ f(x) for all x ∈ X, say that x̄ maximizes f. If f(x̄) ≤ f(x) for all x ∈ X, say that x̄ minimizes f.
The discussion here focuses on the problem of maximizing f (to minimize f , maximize − f ). Furthermore, throughout the discussion only unconstrained optimization is considered: at any point x, one can move in any direction, to alternative
choices. That fact allows one to use derivatives to characterize optimal choices. (An example of a constrained optimization problem is: max f (x) = x2 subject to the constraint that 0 ≤ x ≤ 1. In that case, the maximum occurs at x = 1, where f ′ (x) = 2.)
To motivate some of the key ideas, consider Figure 6.1, where the function f is increasing to the left of b (at point a, for example), is flat at b and decreasing to the right of b (such as at c). Corresponding to these movements, the derivative is positive at a, 0 at b and negative at c. The function is highest at b (and so is maximized at b) and this corresponds to the point where the derivative is 0. When the derivative is not 0, such as at a or c, rightward or leftward movements raise the value of the function. So, a first derivative equal to 0 is a necessary condition for a point to be a candidate for maximizing the function. Note also that, in this example, the derivative is decreasing. If the derivative is decreasing, then the slope of the derivative is negative – the derivative of the first derivative is negative or, equivalently, the second derivative is negative. So, in this case, a point maximizes the function if the derivative of the function at that point is 0, and in addition, the second derivative is negative.
f′(a) > 0, f′(b) = 0, f′(c) < 0

Figure 6.1: Changing slopes

A necessary condition for a point x̄ to be an optimum is that f′(x̄) = 0, so that the function is flat at x̄: if f′(x̄) > 0, increasing x raises the value of the function, and if f′(x̄) < 0, reducing x raises the value of the function. This observation may be expressed mathematically.
Theorem 6.1: Let f be a differentiable function. Then if x̄ maximizes f(x), f′(x̄) = 0.

PROOF: Suppose that f′(x̄) = k ≠ 0. The derivative of f at x̄ is lim_{∆x→0} [f(x̄+∆x) − f(x̄)]/∆x = k. If k > 0, taking ∆x > 0 and small implies that f(x̄+∆x) − f(x̄) > 0; and if k < 0, taking ∆x < 0 and small implies that f(x̄+∆x) − f(x̄) > 0. In either case, x̄ is not a maximizer of f.
If no point x̄ exists with f′(x̄) = 0, then the function has no maximum. Of course, if the function is not differentiable, then it may have a maximum at a point where the derivative is not defined. For example, f(x) = min{x, 1−x} is a tent-shaped function with a maximum at ½, but the function is not differentiable at ½. While f′(x̄) = 0 is a necessary condition for a maximum of a differentiable function, it is not sufficient. Figure 6.1 gives a case where the function has a unique point, x̄, where the first derivative is 0 and this point is the point that maximizes f. The condition f′(x̄) = 0, that the slope is zero, is called the first-order condition for an optimum.
6.2.1 The first-order condition: issues

The first-order condition is a necessary condition for a point x̄ to be either a maximum or a minimum, for if f′(x̄) ≠ 0 it means that the function is either
increasing or decreasing at x̄. However, this is only a necessary condition. Consider the function:

f(x) = −x² if x < 0; x² if x ≥ 0.
For this function, f ′ (x) = −2x, x < 0 and f ′ (x) = 2x, x ≥ 0. Therefore, f is differentiable and f ′ (0) = 0. However, there are points arbitrarily close to 0 (to the right)
with f (x) > 0 = f (0) and points arbitrarily close to 0 (to the left) with f (x) < 0.
(The function f (x) = x3 has similar properties.) Thus, while the condition f ′ (x) = 0 locates an x such that the function f is flat at that point, additional conditions must be checked to determine if x is optimal.
Figure 6.2: Optimization: some possibilities I
Figure 6.2 illustrates some of the many possibilities where a solution to f′(x) = 0 may or may not yield a maximum. In Figure 6.2, case (a) is "well-behaved", in that there is a unique point where f′(x) = 0, and this corresponds to a global maximum. In case (b), at x̄, f′(x̄) = 0, but while x̄ is a local maximum, it is not a global maximum. In case (c), the point x̄ is called a point of inflection, where the function is increasing, then flattens and then continues increasing. The actual functions plotted in Figure 6.2 are:

(a) f(x) = 3x − x²,
(b) f(x) = 1 − 4x(x−1)(x−2)²(x−3),
(c) f(x) = 3x − 6x² + 4x³.
REMARK 6.1: It may be worth noting that f′(x) = 0 is consistent with the function being strictly increasing at x (f is strictly increasing if y > x implies f(y) > f(x)). Considering case (c), f(x) = 3x − 6x² + 4x³, f′(x̄) = 0 but x̄ is not a local maximum. In fact, this function is strictly increasing, despite the fact that f′(x̄) = 0. To see this, observe that f′(x) = 3 − 12x + 12x² = 3 − 12x(1−x), so that at x̄ = ½, f′(½) = 3 − 12·¼ = 0. Observe that at x̄ = ½, f(x̄) = ½, while for x < x̄, f(x) < ½ and
for x > x̄, f(x) > ½, so f is a strictly increasing function. To see this more clearly,

f(x+z) − f(x) = [3(x+z) − 6(x+z)² + 4(x+z)³] − [3x − 6x² + 4x³]
= 3z − 12xz − 6z² + 12x²z + 12xz² + 4z³
= z[3 − 12x + 12x² − 6z + 12xz + 4z²]
= z[3 − 12x(1−x) + z(−6 + 12x + 4z)],

so

f(½ + z) − f(½) = 4z³.

Thus, z > 0 implies f(½+z) − f(½) > 0, and z < 0 implies f(½+z) − f(½) < 0, so the function f is strictly increasing.
The next group of functions illustrates additional possibilities.

Figure 6.3: Optimization: some possibilities II
The functions plotted in Figure 6.3 are:

(d) f(x) = 2√x − x + 1 for 0 ≤ x ≤ 1, f(x) = 2 for 1 ≤ x ≤ 2, and f(x) = 2x(1 − ¼x) for 2 ≤ x.

... f′(x) > f′(x + ∆), so that the graph of f′ appears as in Figure 6.5(ii). These ideas can be formalized using Taylor series expansions.
Figure 6.5: Optimization: f(x) = 3√x − x
Theorem 6.2: Suppose that f is a continuous function with continuous first and second derivatives, f′(x) and f′′(x) (these are continuous functions of x). Then

f(x̄ + ∆x) = f(x̄) + f′(x̄)∆x + ½f′′(p)(∆x)²   (6.1)

for some p between x̄ and x̄ + ∆x. (That is, p satisfies x̄ − ∆x ≤ p ≤ x̄ + ∆x.)

(This theorem is discussed in detail later.) Because ∆x may be positive or negative, p between x̄ and x̄ + ∆x means that x̄ − ∆x ≤ p ≤ x̄ + ∆x. With this result, necessary and sufficient conditions may be given for a local optimum.
Theorem 6.3: Let f be a continuous function with continuous first and second derivatives. Suppose that f′(x̄) = 0. Then:

1. f′′(x̄) < 0 implies that x̄ is a local maximum.
2. f′′(x̄) > 0 implies that x̄ is a local minimum.

PROOF: Consider the first case and suppose that f′(x̄) = 0 and f′′(x̄) < 0. Since f′′ is continuous, there is an ǫ > 0 such that f′′(z) < 0 for all z with x̄ − ǫ < z < x̄ + ǫ. Pick ∆x so that |∆x| < ǫ. Then, from Theorem 6.2,

f(x̄ + ∆x) = f(x̄) + f′(x̄)∆x + ½f′′(p)(∆x)² = f(x̄) + ½f′′(p)(∆x)², since f′(x̄) = 0,

with p satisfying x̄ − ǫ < x̄ − ∆x < p < x̄ + ∆x < x̄ + ǫ. Because x̄ − ǫ < p < x̄ + ǫ, f′′(p) < 0, and therefore ½f′′(p)(∆x)² < 0. Consequently, f(x̄ + ∆x) < f(x̄) when |∆x| < ǫ. The second part of the theorem is proved in the same way.
The condition f′′(x̄) < 0 is called a second-order condition.

REMARK 6.2: Suppose that f′(x̄) = 0 but f′′ (only) satisfies f′′(x) ≤ 0 in a neighborhood of x̄ (rather than f′′(x) < 0). That is, for some ǫ, if x̄ − ǫ < z < x̄ + ǫ, then f′′(z) ≤ 0. If |∆x| < ǫ, then according to the Taylor series expansion, f(x̄ + ∆x) = f(x̄) + ½f′′(p)(∆x)², with f′′(p) ≤ 0, so that f(x̄ + ∆x) ≤ f(x̄). But the inequality may not be strict. In this case x̄ is a local maximum, but it may not be unique (in any neighborhood of x̄ there may be x′ ≠ x̄ with f(x′) = f(x̄)).

Conversely, if f′(x̄) = 0 and f′′ satisfies f′′(x) ≥ 0 in a neighborhood of x̄, then x̄ is a local minimum. Again, if f′′(x̄) > 0, then continuity of f′′ implies that f′′(p) > 0 for p sufficiently close to x̄, and then f(x̄ + ∆x) > f(x̄), implying that x̄ is a unique local minimum. If, in fact, f′′(x) ≤ 0 for all x, then f′(x̄) = 0 implies that x̄ is a global maximum. This observation is developed in Section 6.3. The property f′′(x) ≤ 0 for all x is closely related to concavity (see Section 6.3 and Theorem 6.6).
6.2.3 Applications: profit maximization and related problems

A simple application of these results is the monopoly revenue optimization problem, where inverse demand is given by p = p(Q) and the cost by c(Q). Thus, total revenue is R(Q) = Qp(Q) and profit is π(Q) = R(Q) − c(Q). These functions are depicted in Figure 6.6.

Figure 6.6: Revenue, cost and profit

The first-order condition for profit maximization is: π′(Q) = R′(Q) − c′(Q) = 0.
If we assume that R′′(Q) ≤ 0 and c′′(Q) ≥ 0 for all Q, then the profit function has a nonpositive second derivative: π′′(Q) = R′′(Q) − c′′(Q) ≤ 0. In this case, to find a maximum of π it is sufficient to find a value Q̄ such that π′(Q̄) = 0. The intuition is simple: increase output to the point where the increase in revenue matches the increase in cost. (Beyond that point the increase in cost is greater.) This yields the profit-maximizing rule: set marginal revenue equal to marginal cost. Figure 6.7 depicts the profit-maximizing choice.

Figure 6.7: Marginal cost and marginal revenue

The calculations may be taken a step further. With π(Q) = R(Q) − c(Q) = Qp(Q) − c(Q), differentiating gives

π′(Q) = [p(Q) + Qp′(Q)] − c′(Q),

and rearranging:

π′(Q) = p(Q)[1 + (Q/p)(dp/dQ)] − c′(Q).

This expression can be simplified, using the elasticity of demand (discussed in Section 6.6).
EXAMPLE 6.3: In the linear case p(Q) = a − bQ and c(Q) = cQ, where c is the constant marginal cost. Then

R(Q) = Qp(Q) = Q(a − bQ) = aQ − bQ² and π(Q) = aQ − bQ² − cQ = (a−c)Q − bQ².

Also, π′(Q) = (a−c) − 2bQ = R′(Q) − c′(Q) = a − 2bQ − c. Setting π′(Q) = (a−c) − 2bQ = 0, the solution for Q is Q_m = (a−c)/2b. The second-order condition is π′′(Q) ≤ 0, and in the linear case π′′(Q) = −2b < 0. Therefore, Q_m = (a−c)/2b maximizes π. This may be compared with the competitive model, where quantity (Q_c) is such that price equals marginal cost: p(Q) = c′(Q). With linearity, a − bQ = c, or Q_c = (a−c)/b for the competitive output. ♦
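The solution Q_m = (a−c)/2b can be confirmed with a simple grid search over the profit function; the parameter values below are illustrative:

```python
# Grid check of the linear monopoly solution Q_m = (a - c) / (2b).
a, b, c = 10.0, 2.0, 4.0   # illustrative demand/cost parameters

profit = lambda Q: (a - c) * Q - b * Q * Q

Q_m = (a - c) / (2 * b)                   # closed-form solution: 1.5
grid = [i / 1000 for i in range(5001)]    # Q in [0, 5], step 0.001
Q_best = max(grid, key=profit)            # brute-force maximizer

print(Q_m, Q_best)   # both 1.5
```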
EXAMPLE 6.4: Let f(x) = β ln(α + x) − (α + x)², where α, β > 0. Then

f′(x) = β/(α+x) − 2(α+x) and f′′(x) = −β/(α+x)² − 2 < 0.

Solving the first-order condition 0 = f′(x) = β/(α+x) − 2(α+x) gives β/2 = (α+x)², and so √(β/2) = α + x and x = √(β/2) − α (assume √(β/2) − α > 0). ♦
The next example is a variation of the monopoly problem.

EXAMPLE 6.5: A monopolist has demand function p(Q) = 1/Q^α = Q^{−α}, 0 < α < 1, and cost function c(Q) = cQ^β, β > 1. The firm maximizes profit (revenue minus cost), giving the objective:

π(Q) = p(Q)Q − c(Q) = Q·Q^{−α} − cQ^β = Q^{1−α} − cQ^β.

Then

π′(Q) = (1−α)Q^{−α} − βcQ^{β−1},
π′′(Q) = −α(1−α)Q^{−α−1} − β(β−1)cQ^{β−2}.

Since 0 < α < 1 and β > 1, −α(1−α) < 0 and −β(β−1) < 0, and for Q > 0, Q^{−α−1} > 0 and Q^{β−2} > 0. Therefore, for all Q > 0, π′′(Q) < 0. Consequently, the solution to π′(Q) = 0 gives a local optimum (in fact, a global optimum). And,

π′(Q) = (1−α)Q^{−α} − βcQ^{β−1} = 0 implies (1−α) − βcQ^{α+β−1} = 0.

Rearranging:

Q^{α+β−1} = (1−α)/(βc), so Q = [(1−α)/(βc)]^{1/(α+β−1)}. ♦
EXAMPLE 6.7: Consider a tree harvesting problem. Suppose that the cost of planting a tree is c. If the tree is sold when harvested at age t, its value is f(t). Therefore, the present discounted value (PDV) of a tree harvested at age t is P(t) = e^{−rt}f(t) − c, where r is the interest rate. The first-order condition for present value maximization is

P′(t) = e^{−rt}f′(t) − re^{−rt}f(t) = 0

(using the product rule). This may be written P′(t) = e^{−rt}[f′(t) − rf(t)] = 0, and since e^{−rt} ≠ 0 this gives f′(t) − rf(t) = 0 as the first-order condition. Since no functional form has been specified (yet) for f(t), this cannot be solved.
The second-order condition, which ensures that the solution above gives a maximum, is P′′(t) < 0, and here this requires that

P′′(t) = −re^{−rt}f′(t) + e^{−rt}f′′(t) + r²e^{−rt}f(t) − re^{−rt}f′(t) < 0,

which simplifies to e^{−rt}{−rf′(t) + f′′(t) + r²f(t) − rf′(t)} < 0, or, since e^{−rt} ≠ 0,

{r²f(t) − 2rf′(t) + f′′(t)} < 0.

To consider a specific example, suppose that f(t) = √t = t^{1/2}. In this case

0 = [f′(t) − rf(t)] = ½t^{−1/2} − rt^{1/2}.

Multiply by t^{1/2} to obtain ½ − rt = 0, or t = 1/(2r). Since f(t) = √t > 0, f′(t) = 1/(2√t) > 0 and f′′(t) = −¼t^{−3/2} < 0, the second-order condition is:

r²t^{1/2} − rt^{−1/2} − ¼t^{−3/2} < 0.

Multiplying this by t^{1/2} leaves the sign unchanged, so the condition is equivalent to:

r²t − r − ¼t^{−1} < 0.

And, substituting t = 1/(2r) gives the condition:

(r/2) − r − (r/2) = −r < 0.

The second-order condition is satisfied. ♦
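The solution t = 1/(2r) can be checked by evaluating the present value on a grid; the interest rate and planting cost below are illustrative:

```python
import math

# With f(t) = sqrt(t), the first-order condition gives t* = 1/(2r).
# c is a sunk cost, so it does not affect the maximizing t.
r, c = 0.05, 1.0   # illustrative values

P = lambda t: math.exp(-r * t) * math.sqrt(t) - c   # present value

t_star = 1 / (2 * r)                       # closed-form solution: 10.0
grid = [i / 100 for i in range(1, 5001)]   # t in (0, 50], step 0.01
t_best = max(grid, key=P)                  # brute-force maximizer

print(t_star, t_best)   # both 10.0
```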
The circumstances under which the second-order conditions hold are closely related to the shape of the function.
6.3 Concavity and optimization

One issue with the use of second-order conditions is that such conditions do not determine whether x̄ is a local or a global maximum. In cases (a) and (b) of Figure 6.2, the conditions f′(x̄) = 0 and f′′(x̄) < 0 are satisfied, but case (a) corresponds to a global maximum while case (b) corresponds to a local maximum. As the figures plotted earlier show, the overall shape of the function f is crucial in determining whether these properties of the derivatives of the function identify local or global maxima. In Figure 6.2, the function depicted in (a) has a shape that leads to the solution to f′(x) = 0 identifying a global maximum. This function is an example of a concave function.
Definition 6.2: A function f is concave if, for any x, y and any θ ∈ [0,1],

f(θx + (1−θ)y) ≥ θf(x) + (1−θ)f(y).

A concave function is "dome-shaped", as in Figure 6.8.

Figure 6.8: Concavity

When the shape is "reversed" the function is convex. Convexity is defined:

Definition 6.3: A function g is convex if, for any x, y and any θ ∈ [0,1],

g(θx + (1−θ)y) ≤ θg(x) + (1−θ)g(y).

A convex function is "bowl-shaped" – the reverse of a concave function. Intuitively, a concave function is shaped like a hill; a convex function is shaped like a valley. Figure 6.9 depicts both types of function. A more extensive discussion of concavity and convexity is given in Chapter 8, when Taylor series expansions are considered in detail. (The functions plotted in Figure 6.9 are f(x) = 5 ln(1+x) − x and g(x) = −15x + 8e^x − 5.) The properties of concavity and convexity are closely related to global optima.
Theorem 6.4: If f is concave and f′(x̄) = 0, then x̄ is a global maximum. If f is convex and f′(x̄) = 0, then x̄ is a global minimum.
Figure 6.9: Concave and convex functions

PROOF: Considering the concave case, suppose that f is concave and x̄ satisfies f′(x̄) = 0 but x̄ is not a global maximum. Then there is some y such that f(y) > f(x̄). Using concavity of f:

f(θy + (1−θ)x̄) ≥ θf(y) + (1−θ)f(x̄), θ ∈ [0,1].

Rearrange this expression as

f(θy + (1−θ)x̄) − f(x̄) ≥ θ[f(y) − f(x̄)], θ ∈ [0,1].

Since θy + (1−θ)x̄ = x̄ + θ(y−x̄), f(x̄ + θ(y−x̄)) − f(x̄) ≥ θ[f(y) − f(x̄)], θ ∈ [0,1], or

[f(x̄ + θ(y−x̄)) − f(x̄)]/θ ≥ [f(y) − f(x̄)],

and, multiplying and dividing the left side by y − x̄, this may be written

(y − x̄)·[f(x̄ + θ(y−x̄)) − f(x̄)]/[θ(y−x̄)] ≥ [f(y) − f(x̄)].

Finally, letting θ → 0 gives

(y − x̄)f′(x̄) ≥ [f(y) − f(x̄)] > 0

(here ∆x = θ(y−x̄)). This contradicts the fact that f′(x̄) = 0. Therefore, concavity of f and the condition f′(x̄) = 0 imply that x̄ is a global maximum. A similar argument shows that with convexity, the condition f′(x̄) = 0 is sufficient for a global minimum.
In Figure 6.9, the concave function, f, has a maximum at x̄; the convex function, g, has a minimum at x̄. A key feature of either a concave function or a convex function is that the second derivative never changes sign: it is always nonnegative or always nonpositive. In Figure 6.10, the derivatives of f and g are plotted, given the functional forms: f(x) = 5 ln(1+x) − x with f′(x) = 5/(x+1) − 1, and g(x) = −5 − 15x + 8e^x with g′(x) = −15 + 8e^x.
Considering the concave function f, its derivative or slope to the left of x̄ is positive, 0 at x̄, and negative to the right of x̄. The derivative decreases steadily, from a positive number to the left of x̄, to a negative number to the right of x̄. Thus, the derivative is decreasing: f′′(x) ≤ 0 for all x. For the convex function g, the opposite observation applies: the derivative is negative to the left of x̄ and positive to the right of x̄. And so g′′(x) ≥ 0 for all x.

Figure 6.10: Plot of derivatives f′(x), g′(x)
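The sufficiency of the first-order condition under concavity can be illustrated with the concave function f(x) = 5 ln(1+x) − x plotted in Figure 6.9: the critical point solves 5/(1+x) − 1 = 0, so x̄ = 4, and a grid search (over an illustrative range) confirms it is a global maximum there. A minimal sketch:

```python
import math

# Concave function from Figure 6.9; FOC: 5/(1+x) - 1 = 0 gives x_bar = 4.
f = lambda x: 5 * math.log(1 + x) - x

x_bar = 4.0
grid = [i / 100 for i in range(5001)]   # x in [0, 50], step 0.01
x_best = max(grid, key=f)               # brute-force maximizer

print(x_bar, x_best, f(x_bar))
```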
Notice that with concavity, f′(x̄) = 0 is sufficient for a global maximum: whether f′′(x̄) ≤ 0 or f′′(x̄) < 0 is irrelevant. Recall that in Figure 6.3(d), the function depicted there is concave, but any number x in the interval [1,2] is an optimum. While the function is concave, it is not strictly concave.

Definition 6.4: A function f is strictly concave if, for x ≠ y,

f(θx + (1−θ)y) > θf(x) + (1−θ)f(y), θ ∈ (0,1).

Strict concavity guarantees a unique optimum. (The definition of strict convexity reverses these strict inequalities.)

Theorem 6.5: If f is a strictly concave function then f has a unique maximum.

PROOF: Suppose that x̄₁ and x̄₂ are two maxima. Then f(θx̄₁ + (1−θ)x̄₂) > θf(x̄₁) + (1−θ)f(x̄₂), θ ∈ (0,1), and this contradicts the assumption that x̄₁ and x̄₂ are maxima.
REMARK 6.3: To summarize the discussion so far: if f′(x) = 0 and f′′(x) < 0, or f′′(x) ≤ 0 on a neighborhood of x, then x is a local maximum. And if f is concave, f′(x) = 0 implies that x is a global maximum, while strict concavity implies that x is a unique global maximum.

The following discussion clarifies the relation between concavity and the second derivative. The main conclusions are that concavity implies that f′′(x) ≤ 0 for all x (proved in a later chapter); and f′′(x) ≤ 0 for all x implies concavity. If f′′(x) < 0 for all x then f is strictly concave, but strict concavity of f does not imply that f′′(x) < 0 for all x. (Note that the proof of this theorem requires integration, a subject not developed until later.)
Theorem 6.6: Given the function f, suppose that f′′(x) < 0 everywhere; then the function is strictly concave. If f′′(x) ≤ 0 for all x, the function is concave.

PROOF: With f′′(x) < 0 for all x, if y > x, f′(y) − f′(x) = ∫ₓ^y f′′(ξ)dξ < 0, so that y > x implies f′(y) < f′(x), and f′(·) is a strictly decreasing function. So, for y > x,

f(y) − f(x) = ∫ₓ^y f′(ξ)dξ < f′(x)[y − x]   (6.2)
f(y) − f(x) = ∫ₓ^y f′(ξ)dξ > f′(y)[y − x],   (6.3)

since f′(x) > f′(z) > f′(y) for all z with x < z < y. Taking x < y, define z(θ) = θx + (1−θ)y, 0 < θ < 1. From Equation 6.3, since x < z(θ) < y,

f(z(θ)) − f(x) > f′(z(θ))[z(θ) − x].

From Equation 6.2,

f(y) − f(z(θ)) < f′(z(θ))[y − z(θ)].

Rearranging these expressions:

f(z(θ)) > f(x) + f′(z(θ))[z(θ) − x]   (6.4)
f(z(θ)) > f(y) + f′(z(θ))[z(θ) − y].   (6.5)

Multiplying these expressions by θ and 1−θ, respectively, and adding gives:

[θ + (1−θ)]f(z(θ)) > θf(x) + (1−θ)f(y) + f′(z(θ)){θ[z(θ) − x] + (1−θ)[z(θ) − y]}.   (6.6)

Since [θ + (1−θ)] = 1 and θ[z(θ) − x] + (1−θ)[z(θ) − y] = z(θ) − [θx + (1−θ)y] = 0,

f(z(θ)) > θf(x) + (1−θ)f(y).   (6.7)

Therefore, f is strictly concave if f′′(x) < 0 for all x. If f satisfies the weaker condition f′′(x) ≤ 0 for all x, the previous arguments go through with weak inequalities replacing strict inequalities everywhere to yield: f(z(θ)) ≥ θf(x) + (1−θ)f(y).
REMARK 6.4: Because f′′(x) ≤ 0 for all x implies concavity, the conditions f′(x̄) = 0 and f′′(x) ≤ 0 for all x are sufficient to ensure that x̄ is a global maximum. However, strict concavity of f does not imply an everywhere strictly negative second derivative.

REMARK 6.5: Consideration of the strictly concave function f(x) = −(x − 1)⁴ (Figure 6.11) shows that strict concavity does not imply that f′′(x) < 0 everywhere, since f′(x) = −4(x − 1)³ and f′′(x) = −12(x − 1)², so that f′′(1) = 0. (That this function is strictly concave is shown in a later chapter.)
[Figure 6.11: Strictly concave function, f, with f′′ = 0 at x = 1 – the graph of −(x − 1)⁴]

The previous observations may be gathered to give:

Theorem 6.7:
1. A twice differentiable function f is concave if and only if f′′(x) ≤ 0, ∀x.
2. The function f is convex if and only if f′′(x) ≥ 0, ∀x.
3. Strict concavity of f implies that f′′(x) ≤ 0 for all x, and f′′(x) < 0 for all x implies strict concavity.
4. Strict convexity of f implies that f′′(x) ≥ 0 for all x, and f′′(x) > 0 for all x implies strict convexity.
There is an analogous result in higher dimensions.
REMARK 6.6: For the multidimensional case, similar results apply. A function f on Rⁿ is concave if and only if the matrix of second partial derivatives (the Hessian matrix) is negative semidefinite everywhere; strict concavity does not imply negative definiteness, but negative definiteness everywhere implies strict concavity.
6.3.1 Summary

The following points summarize the previous discussion on optimization.

• f′(x̄) = 0 is necessary for an interior maximum or minimum.
• f′′(x̄) < 0 and f′(x̄) = 0 is sufficient for x̄ to be a strict local maximum.
• f′′(x) ≤ 0 on a neighborhood of x̄ and f′(x̄) = 0 is sufficient for x̄ to be a local maximum.
• f′′(x) ≤ 0 for all x and f′(x̄) = 0 implies that x̄ is a global maximum.
• f(x) concave implies that f′′(x) ≤ 0 for all x.
• f′′(x) < 0 for all x implies f is strictly concave, which implies that any maximum of f is unique.
6.4 Externalities

In many circumstances, the actions of one individual affect the welfare of another, sometimes positively and sometimes negatively – but the individual taking the action does not take this into account. Because the overall impact on welfare is ignored when the choice is made, the result is an inefficient outcome. One example of this type of issue arises in the use of public goods. When individuals can use a public facility for free, there is either overuse or underprovision of the facility. This overuse is called the free-rider problem.
6.4.1 The free-rider problem

Consider a public bridge that drivers can use for free. For each individual driver, the benefit from using the bridge is b. With n drivers using the bridge, the cost (in terms of delay) to each driver is c(n), so when there are n drivers in total crossing the bridge, the net benefit or utility to each of those drivers is

u = b − c(n).

A driver who does not cross the bridge does not pay the cost or get any benefit, and has a utility of 0. Assume that c(0) = 0, c′(n) > 0 and c′′(n) > 0 for all n ≥ 0. Thus, there is little delay with few drivers on the bridge, but delay increases with the number of drivers, getting progressively worse as congestion increases. (Although in reality n must be an integer, assume for the calculations that n is a real number. If c′(n) is small until n is very large, rounding to the nearest integer will not be a significant issue.)
6.4.1.1 Equilibrium behavior

The marginal social benefit (MSB) from increasing the number of drivers crossing the bridge is b, the extra utility from an extra person making the journey. The social cost from bridge usage is the cost per user times the number of users,

s(n) = nc(n),

and the total benefit is nb, the benefit per user times the number of users. Therefore, the social welfare is:

w(n) = nb − s(n) = nb − nc(n).

Maximizing this:

w′(n) = b − s′(n) = b − [nc′(n) + c(n)].

The intuition for this is that an extra user gets benefit b. There are two components to marginal social cost. The extra user incurs cost c(n) and, in addition, the marginal increase in congestion, measured by c′(n), is incurred by all n users, for a total cost nc′(n). This gives a total marginal social cost of MSC = s′(n) = nc′(n) + c(n).

Socially efficient usage of the bridge occurs when the number of drivers crossing the bridge maximizes total welfare: the sum of utilities less costs of those crossing the bridge. This occurs when w′(n) = 0. Let the solution to this equation be n∗. The choice of n∗ is the value of n satisfying:

[b − c(n)] − nc′(n) = 0
or

b = c(n∗) + n∗c′(n∗).   (6.8)

But drivers independently decide whether to use the bridge or not, and usage increases as long as b > c(n), so the equilibrium usage level n̂ is determined as the solution to b − c(n) = 0.

[Figure 6.12: Congestion externalities – the curves c(n) and nc′(n) + c(n) plotted against the benefit level b, with n∗ < n̂]

Comparing these, the market equilibrium level of usage n̂ is larger than the socially optimal level, n∗ (see Figure 6.12). The welfare implications of the difference between n∗ and n̂ are clear – usage is excessive from a social point of view.
6.4.1.2 A policy solution

Suppose the government could charge a toll for bridge usage, so that a driver must pay t to cross the bridge. In particular, suppose the bridge manager solves the program maxₙ w(n) and determines that optimal usage is n∗. The manager then charges a toll of t = n∗c′(n∗). So, if n people use the bridge, each individual compares b with c(n) + n∗c′(n∗), the travel cost plus the toll charge. If b > c(n) + n∗c′(n∗), the benefit outweighs the cost and usage increases. By contrast, if b < c(n) + n∗c′(n∗), the opposite is true, and usage will decrease. So, with the toll, equilibrium bridge usage is determined by the value of n that satisfies b = c(n) + n∗c′(n∗), which is n∗.

EXAMPLE 6.8: Consider the case where c(n) = an². Then c′(n) = 2an and c′′(n) = 2a. The solution for n∗, the socially optimal level of usage, is determined by

b = nc′(n) + c(n) = n·2an + an² = 3an²,

or n∗ = √(b/3a) = (1/√3)·√(b/a). The solution for n̂ is given by b − an² = 0, or n̂ = √(b/a) = √3·n∗. ♦
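The conclusion of Example 6.8 can be checked numerically. The sketch below assumes illustrative parameter values a = 0.5 and b = 12 (these are not from the text); the over-use factor √3 is independent of the choice.

```python
import math

# Numerical check of Example 6.8 (free-rider problem with c(n) = a n^2).
# The parameter values a and b are illustrative assumptions, not from the text.
a, b = 0.5, 12.0

n_star = math.sqrt(b / (3 * a))   # socially optimal usage: solves b = 3 a n^2
n_hat = math.sqrt(b / a)          # free-rider equilibrium: solves b = a n^2

# The equilibrium over-uses the bridge by a factor of sqrt(3), whatever a and b.
print(n_hat / n_star)

# Cross-check n_star by maximizing social welfare w(n) = n b - a n^3 on a grid.
w = lambda n: n * b - a * n ** 3
n_best = max((i / 1000 for i in range(1, 10001)), key=w)
assert abs(n_best - n_star) < 1e-2
```

The grid search confirms that the closed-form n∗ maximizes w(n) = nb − an³.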
6.5 Comparative statics

Comparative statics studies the impact on a solution variable of changes in the underlying parameters. Consider an optimization problem where the function to be maximized depends explicitly on some parameter or parameters. For example, consider the maximization of the function f(x; a, c) = a·ln(x + 1) − cx. Writing the function this way makes explicit the dependence of the problem on both a and c. Maximizing gives:

a·(1/(x + 1)) − c = 0,

and the second-order condition for a maximum, −a·(1/(x + 1)²) < 0, is satisfied. From the first-order condition, the solution is x(a, c) = a/c − 1. So, for example, taking c to be a constant, dx/da = 1/c, giving the impact on the solution of a change in the parameter a.

In Figure 6.13, the impact of an increase in a to a′ is depicted. At parameter value a, the optimal choice of x is at x∗; when a is increased to a′, the optimal choice of x moves from x∗ to x′. Thus, the change in x per unit change in a is given by:

Δx/Δa = (x′ − x∗)/(a′ − a),

measuring the change in the optimal choice resulting from the parameter change.
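The comparative static dx/da = 1/c can be checked numerically: maximize f for two nearby values of a and form the difference quotient Δx/Δa. The parameter values below (a = 2, c = 0.5) are illustrative assumptions, not from the text.

```python
# Hedged numerical check of dx/da = 1/c for f(x; a, c) = a*ln(x+1) - c*x.

def maximizer(a, c):
    """Maximize f by bisecting on f'(x) = a/(x+1) - c, which is strictly decreasing."""
    lo, hi = 0.0, 1e6
    for _ in range(200):
        mid = (lo + hi) / 2
        if a / (mid + 1) - c > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

a, c = 2.0, 0.5                              # illustrative parameter values
x_star = maximizer(a, c)
assert abs(x_star - (a / c - 1)) < 1e-6      # closed form: x(a, c) = a/c - 1

da = 1e-4
slope = (maximizer(a + da, c) - maximizer(a, c)) / da
assert abs(slope - 1 / c) < 1e-3             # comparative static: dx/da = 1/c
```

The bisection mirrors the logic of the figure: as a rises, the root of the first-order condition shifts right at the rate 1/c.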
[Figure 6.13: Impact of a parameter change on choice variable – f(x) for a ln(x + 1) − cx and a′ ln(x + 1) − cx, with optima x∗ and x′]

The following examples illustrate these ideas.
EXAMPLE 6.9: Consider the monopoly profit maximization (Example 6.5). Note that the solution depends on the parameters c and α, Q(α, c) = ((1 − α)/c)^(1/α), so that one may consider the impact of variations in α or c on the value of Q. To calculate the derivative dQ/dα, we view Q as a function of α alone, taking marginal cost to be a constant. First, to simplify matters, take the ln of Q and use the fact that d ln Q/dα = (1/Q)·(dQ/dα), so that dQ/dα = Q·(d ln Q/dα). Since ln Q = (1/α) ln((1 − α)/c),

d ln Q/dα = (−1/α²) ln((1 − α)/c) + (1/α)·(−1/(1 − α))
          = (−1/α²) ln((1 − α)/c) − 1/(α(1 − α)).

Multiplying this expression by Q = ((1 − α)/c)^(1/α) gives dQ/dα = Q·(d ln Q/dα):

dQ/dα = ((1 − α)/c)^(1/α) · [(−1/α²) ln((1 − α)/c) − 1/(α(1 − α))],

giving the impact of a change in α on the optimal choice of Q. ♦
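The derivative in Example 6.9 can be verified by comparing the analytic expression with a central finite difference. The values α = 0.5 and c = 0.2 below are illustrative assumptions, not from the text.

```python
import math

# Finite-difference check of dQ/dα for Q(α, c) = ((1-α)/c)^(1/α) (Example 6.9).
alpha, c = 0.5, 0.2                     # illustrative parameter values
Q = lambda a: ((1 - a) / c) ** (1 / a)

# Analytic derivative: dQ/dα = Q * [-(1/α^2) ln((1-α)/c) - 1/(α(1-α))]
dlnQ = -(1 / alpha ** 2) * math.log((1 - alpha) / c) - 1 / (alpha * (1 - alpha))
analytic = Q(alpha) * dlnQ

h = 1e-6
numeric = (Q(alpha + h) - Q(alpha - h)) / (2 * h)
assert abs(analytic - numeric) < 1e-4 * abs(numeric)
```

The agreement confirms the logarithmic-differentiation shortcut used in the example.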
The next example generalizes the cost function:
EXAMPLE 6.10: This example modifies Example 6.5 by considering an alternative cost function. Suppose now that the cost function is c(Q) = Q^β, β > 1. In this case, the profit function is π(Q) = Q^(1−α) − Q^β. Therefore, setting π′(Q) = (1 − α)Q^(−α) − βQ^(β−1) = 0 gives

(1 − α)Q^(−α) − βQ^(β−1) = 0  ⇒  (1 − α) = βQ^(α+β−1)  ⇒  (1 − α)/β = Q^(α+β−1).

This implies that

Q = Q(α, β) = ((1 − α)/β)^(1/(α+β−1)).

Then ln Q = (1/(α + β − 1)) ln((1 − α)/β). If we view Q(α, β) as a function of just α, so that β is held fixed for the entire discussion, then we can consider dQ/dα:

(1/Q)·(dQ/dα) = −(1/(α + β − 1))² ln((1 − α)/β) − (1/(α + β − 1))·(1/(1 − α)),

so that dQ/dα is Q times the right-hand side, calculated in the same way as in Example 6.9. ♦
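The closed-form solution in Example 6.10 can be checked against the first-order condition directly. The values α = 0.4 and β = 2 are illustrative assumptions (with β > 1, as the example requires).

```python
# Check that Q(α, β) = ((1-α)/β)^(1/(α+β-1)) from Example 6.10 satisfies the
# first-order condition π'(Q) = (1-α)Q^(-α) - βQ^(β-1) = 0.
alpha, beta = 0.4, 2.0                  # illustrative parameter values
Q = ((1 - alpha) / beta) ** (1 / (alpha + beta - 1))
marginal_profit = (1 - alpha) * Q ** (-alpha) - beta * Q ** (beta - 1)
assert abs(marginal_profit) < 1e-9      # π'(Q) vanishes at the closed form
```

Because the closed form is obtained by rearranging π′(Q) = 0, the residual is zero up to floating-point rounding.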
Example 6.11 applies the same procedure to the tree harvesting problem (Example 6.7) to consider how the optimal time to harvest varies with the interest rate.

EXAMPLE 6.11: Continuing with the tree harvesting problem of Example 6.7, one may also consider, in the abstract, the effect of variations in the interest rate on the solution value of t. The condition f′(t) − r f(t) = 0 must hold for any r – so that as r varies, so must t. Therefore,

f′′(t)dt − f(t)dr − r f′(t)dt = [f′′(t) − r f′(t)]dt − f(t)dr = 0.

Thus

dt/dr = f(t)/[f′′(t) − r f′(t)],   (6.9)

and if we assume that f > 0 and f′ > 0, so that the value of the tree increases with age, and at a diminishing rate so that f′′ < 0, then this expression is negative.

To present a different perspective, this result may also be derived by approximation. The first-order condition is f′(t) − r f(t) = 0. If r varies to r + Δr, then t must vary to, say, t + Δt so that the first-order condition is satisfied at the new value of r:

f′(t + Δt) − (r + Δr)f(t + Δt) = 0.

Thus,

[f′(t + Δt) − (r + Δr)f(t + Δt)] − [f′(t) − r f(t)] = 0

{f′(t + Δt) − f′(t)} − r{f(t + Δt) − f(t)} − Δr f(t + Δt) = 0.

Multiplying the last expression by (1/Δt)(Δt/Δr) = 1/Δr gives:

[(f′(t + Δt) − f′(t))/Δt]·(Δt/Δr) − r[(f(t + Δt) − f(t))/Δt]·(Δt/Δr) − f(t + Δt) = 0.

As Δr → 0 (and hence Δt → 0), with t varying smoothly in r,

f′′(t)·(dt/dr) − r f′(t)·(dt/dr) − f(t) = 0.

For example, let f(t) = kt^α, so that P(t) = kt^α e^(−rt) − c and

P′(t) = αkt^(α−1)e^(−rt) − rkt^α e^(−rt) = kt^α e^(−rt)[αt^(−1) − r].

Setting this to 0 gives αt^(−1) − r = 0, or t = α/r. Then dt/dr = −α/r².

To illustrate numerically, consider the case where k = 20, α = 0.9, c = 70 and r = 0.05, so that f(t) = 20t^0.9. The objective is then P(t) = 20t^0.9 e^(−0.05t) − 70. Therefore, f′(t) = (0.9)20t^(−0.1) and f′′(t) = −(0.1)(0.9)20t^(−1.1). According to the formula given earlier,

dt/dr = f(t)/[f′′(t) − r f′(t)] = 20t^0.9 / [−(0.1)(0.9)20t^(−1.1) − 0.05(0.9)20t^(−0.1)]
      = t^0.9 / [−(0.1)(0.9)t^(−1.1) − 0.05(0.9)t^(−0.1)]
      = −t/[(0.09)t^(−1) + 0.045].

Using the fact that the solution is t = α/r = 0.9/0.05 = 18, this formula gives dt/dr = −360. (Notice that this is the same result as would be obtained from dt/dr = −α/r² = −0.9/0.0025 = −360.) ♦
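The numerical result dt/dr = −360 can be confirmed by a finite difference on the closed-form harvest time t(r) = α/r. The sketch uses the worked example's values α = 0.9 and r = 0.05; k plays no role, since it cancels out of the first-order condition.

```python
# Check dt/dr = -α/r² for the tree-harvesting example with f(t) = k t^α.
alpha, r = 0.9, 0.05                    # values from the worked example
t = lambda r_: alpha / r_               # root of f'(t) - r f(t) = 0

h = 1e-7
numeric = (t(r + h) - t(r - h)) / (2 * h)
assert abs(numeric - (-alpha / r ** 2)) < 1e-3   # dt/dr = -α/r² = -360
```

Both routes to the derivative, the implicit-function formula (6.9) and direct differentiation of t = α/r, agree.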
6.6 Elasticity of demand

The elasticity of demand measures the responsiveness of demand to price changes. It is defined as the percentage change in quantity demanded due to a percentage change in price, and is denoted ǫ:

ǫ = (% change in Q)/(% change in P) = (ΔQ/Q)/(ΔP/P) = (P/Q)·(ΔQ/ΔP) ≃ (P/Q)·(dQ/dP) = (dQ/dP)/(Q/P).
REMARK 6.7: Often, because demand is downward sloping, the elasticity is defined as:

−(% change in Q)/(% change in P) ≃ −(P/Q)·(dQ/dP),

so that the elasticity is expressed as a positive number for downward sloping demand.

The elasticity definition consists of a slope term dQ/dP and the relative magnitude of the variables, P/Q. For any given dQ/dP, the larger is Q relative to P, the smaller is the elasticity. For example, if demand is equal to 10 units and the slope is −1, a unit price increase causes a unit drop in demand, which is 10% of total demand. If, however, demand is 100 units, and the slope is unchanged, a unit price increase again will reduce demand by 1 unit, which is 1% of demand. The demand elasticity reflects the importance of the levels of the variables, whereas the slope does not.

The calculations above use the fact that (a/b)/(c/d) = ad/bc. The elasticity approximation "≃" becomes exact as the change in price becomes small. The elasticity of demand has a graphical interpretation as the ratio of the slope of the demand curve to the slope of the line from the origin to the point on the demand curve (see Figure 6.14).

REMARK 6.8: A point worth noting is that the reciprocal of the elasticity satisfies:

1/ǫ = (Q/P)·(dP/dQ).

This is the case if dP/dQ = 1/(dQ/dP), since in that case, from the definition of ǫ,

1/ǫ = 1/[(P/Q)·(dQ/dP)] = (Q/P)·[1/(dQ/dP)] = (Q/P)·(dP/dQ).
[Figure 6.14: Demand elasticity – the demand curve Q(P), its slope dQ/dP, and the ray from the origin with slope Q/P]

Because the price function is strictly decreasing (larger Q corresponds to lower P, a downward sloping demand), we can write the relation between quantity and price as Q = Q(P). This gives two alternative representations of the demand function: P = P(Q) and Q = Q(P). If we substitute one into the other we have an identity: Q ≡ Q(P(Q)). In words, for any Q, the price at which this quantity will be purchased, P(Q), is the price such that the quantity demanded at this price is Q. Because this equation holds for every Q, we can differentiate to get

dQ/dQ = (dQ/dP)·(dP/dQ),

and since dQ/dQ = 1, this implies that dQ/dP = 1/(dP/dQ). Since ǫ = (P/Q)·(dQ/dP),

1/ǫ = 1/[(P/Q)·(dQ/dP)] = (Q/P)·(dP/dQ).
EXAMPLE 6.12: Let P(Q) = ln(1 + 1/Q). Then dP/dQ = P′(Q) is:

dP/dQ = (1 + 1/Q)^(−1)·(−1/Q²) = −1/((Q + 1)Q).

Since P(Q) = ln(1 + 1/Q), 1 + 1/Q = e^P, or 1/Q = e^P − 1, or Q = 1/(e^P − 1). Thus dQ/dP = Q′(P) is:

dQ/dP = −(1/(e^P − 1)²)·e^P = −Q²·(1 + 1/Q) = −Q(1 + Q),

therefore dQ/dP = 1/(dP/dQ). ♦
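The reciprocal relation in Example 6.12 can be confirmed numerically at an illustrative point (Q = 2 is an assumed value, not from the text):

```python
import math

# Check dQ/dP = 1/(dP/dQ) for the demand curve P(Q) = ln(1 + 1/Q) of Example 6.12.
Q = 2.0                                 # illustrative evaluation point
dP_dQ = -1 / ((Q + 1) * Q)              # P'(Q) from the example
dQ_dP = -Q * (1 + Q)                    # Q'(P) from the example, in terms of Q
assert abs(dQ_dP - 1 / dP_dQ) < 1e-12

# Independent finite-difference check of Q'(P), using Q(P) = 1/(e^P - 1).
P = math.log(1 + 1 / Q)
h = 1e-7
numeric = (1 / (math.exp(P + h) - 1) - 1 / (math.exp(P - h) - 1)) / (2 * h)
assert abs(numeric - dQ_dP) < 1e-4
```

At Q = 2 both derivatives are exact reciprocals: dP/dQ = −1/6 and dQ/dP = −6.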
6.6.1 Elasticity, revenue and profit maximization

Note that, in general, the elasticity will depend on the point on the demand curve at which it is evaluated. For example, in the linear case P = a − bQ, Q = a/b − (1/b)P, so dQ/dP = −1/b. Therefore,

ǫ = (P/Q)·(dQ/dP) = [(a − bQ)/Q]·(−1/b) = 1 − a/(bQ).

Sometimes it is useful to make explicit the dependence of the elasticity on the quantity by writing ǫ(Q). Note that ǫ < 0 (as long as price is non-negative: a − bQ ≥ 0). When ǫ = −1, 1 − a/(bQ) = −1, or a/(bQ) = 2, or Q = a/(2b), the revenue-maximizing quantity.

Recall that the profit function of a monopoly was written:

π(Q) = P(Q)Q − c(Q) = R(Q) − c(Q).

From this expression:

π′(Q) = R′(Q) − c′(Q)
      = P(Q)·[1 + (Q/P)·(dP/dQ)] − c′(Q)
      = P(Q)·[1 + 1/ǫ] − c′(Q).

Since a profit-maximizing monopolist sets π′(Q) = 0 and solves for Q, this implies that the solution satisfies:

P(Q)·[1 + 1/ǫ(Q)] = c′(Q),

and since ǫ(Q) < 0, this implies that P(Q) > c′(Q) at the solution, so that the profit-maximizing monopolist sets a price greater than marginal cost. Rearranging this expression,

[P(Q) − c′(Q)]/P(Q) = −1/ǫ(Q),

so that the markup of price over marginal cost is equal to the reciprocal of the elasticity (in absolute value). Finally, note that marginal revenue is given by

MR(Q) = R′(Q) = P(Q)·[1 + 1/ǫ] = P(Q) + (1/ǫ)P(Q) < P(Q),

and the marginal revenue curve lies below the demand curve P(Q) by the (positive) amount −(1/ǫ)P(Q).
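The linear-demand results above, revenue maximized at Q = a/(2b) and unit elasticity there, can be checked numerically. The values a = 10 and b = 2 are illustrative assumptions:

```python
# For linear demand P = a - bQ, revenue R(Q) = (a - bQ)Q is maximized where
# the elasticity ε(Q) = 1 - a/(bQ) equals -1, i.e. at Q = a/(2b).
a, b = 10.0, 2.0                        # illustrative parameter values
R = lambda q: (a - b * q) * q

# Grid-search the revenue maximizer over (0, a/b).
q_best = max((i * (a / b) / 100000 for i in range(1, 100000)), key=R)
assert abs(q_best - a / (2 * b)) < 1e-3

eps = 1 - a / (b * q_best)              # elasticity at the revenue maximum
assert abs(eps - (-1)) < 1e-3
```

At the grid maximizer the elasticity is (numerically) −1, matching the unit-elastic condition for maximum revenue.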
6.7 Taxation and monopoly

Suppose that the government wishes to induce a monopolist to produce the quantity at which price equals marginal cost. To achieve this outcome, suppose that the government gives a per-unit subsidy of s and imposes a lump sum tax of T. The profit of the firm is then:

π_g(Q) = Q(P(Q) + s) − cQ − T.

Taking s and T as given, the firm maximizes profit – the first-order condition is

π′_g(Q) = P(Q) + QP′(Q) + s − c = 0.

By an earlier calculation, this is equivalent to: P(Q)·[1 + 1/ǫ(Q)] = c − s. Write the solution to this equation as Q(s), to denote dependence on the level of the subsidy. For any given s, this equation has a unique solution, provided R′′(Q) < 0, where R(Q) = QP(Q).

In the particular case where s = 0, the solution is the monopoly level of output, Q_m: Q(0) = Q_m. Suppose that the subsidy level s∗ is such that Q(s∗) is the competitive quantity, Q(s∗) = Q_c, so that price equals marginal cost, P(Q(s∗)) = c. Then the first-order condition at s∗ is

π′_g(Q(s∗)) = P(Q(s∗)) + Q(s∗)P′(Q(s∗)) + s∗ − c = 0,

and because P(Q(s∗)) = c, this reduces to

π′_g(Q(s∗)) = Q(s∗)P′(Q(s∗)) + s∗ = 0.

Now, recall that 1/ǫ = (Q/P)·P′(Q), so this condition may be written:

s∗ = −Q(s∗)P′(Q(s∗)) = −P(Q(s∗))·[Q(s∗)/P(Q(s∗))]·P′(Q(s∗)) = −P(Q(s∗))/ǫ(Q(s∗)) = −P(Q_c)/ǫ(Q_c).

So, if the subsidy is chosen equal to the price–elasticity ratio at the competitive output, the monopolist's optimal choice is the competitive output. Unraveling the expression: since ǫ(Q_c) = (P(Q_c)/Q_c)·(dQ/dP), the condition s∗ǫ(Q_c) = −P(Q_c) may be written

s∗·(P(Q_c)/Q_c)·(dQ/dP) = −P(Q_c).

Multiply both sides by Q_c/P(Q_c) to get Q_c = −s∗(dQ/dP). In the case of linear demand, dQ/dP = −1/b, so that s∗ should be chosen to satisfy Q_c = s∗/b, or s∗ = bQ_c. Since Q_c = (a − c)/b, the unit subsidy s∗ = a − c leads the monopolist to produce at the competitive output level.

In this case, the total subsidy is s∗Q_c = (a − c)·((a − c)/b) = (a − c)²/b. Because the first-order condition for a maximum does not depend on the lump sum tax T, the output of the firm depends only on the policy parameter s. So, the subsidy may be recouped by a lump sum tax, T, chosen to equal s∗Q_c = (a − c)²/b.
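The linear-demand subsidy result can be checked directly from the first-order condition. The sketch below assumes illustrative values a = 10, b = 2 and c = 4, which are not from the text.

```python
# With linear demand P(Q) = a - bQ, marginal cost c and per-unit subsidy s,
# the monopolist's FOC a - 2bQ + s - c = 0 gives Q(s) = (a + s - c)/(2b).
# The subsidy s* = a - c then induces the competitive quantity Q_c = (a - c)/b.
a, b, c = 10.0, 2.0, 4.0                # illustrative parameter values
Q_of_s = lambda s: (a + s - c) / (2 * b)
Q_c = (a - c) / b

s_star = a - c
assert abs(Q_of_s(s_star) - Q_c) < 1e-12    # subsidy hits the competitive output
assert abs(Q_of_s(0.0) - Q_c / 2) < 1e-12   # without subsidy: half that quantity

# Lump-sum tax recouping the subsidy: T = s* Q_c = (a - c)^2 / b.
T = s_star * Q_c
assert abs(T - (a - c) ** 2 / b) < 1e-12
```

The check also displays the familiar linear-demand fact that the unsubsidized monopoly output is exactly half the competitive output.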
Since the solution depends on government-set parameters, comparative statics calculations arise naturally in considering the impact of government policies. The next example discusses the impact of various tax changes on a monopolist's choice of quantity.
EXAMPLE 6.13: Suppose that a government imposes a revenue tax t and an output subsidy s. Thus, if the firm receives revenue R and produces quantity Q, the tax paid is tR and the subsidy received is sQ. With demand given by P(Q) = 1/Q^α = Q^(−α), 0 < α < 1, and linear cost (c(Q) = cQ), the profit function is

π(Q) = (1 − t)R(Q) − cQ + sQ = (1 − t)Q^(1−α) − (c − s)Q.

Thus, the first-order condition is π′(Q) = (1 − t)(1 − α)Q^(−α) − (c − s) = 0. Rearranging,

Q^(−α) = (c − s)/[(1 − t)(1 − α)]  ⇒  Q^α = (1 − t)(1 − α)/(c − s)  ⇒  Q = [(1 − t)(1 − α)/(c − s)]^(1/α).

It is of interest to know the impact of changes in t and s on output. Some calculation gives:

dQ/dt = (1/α)·[(1 − t)(1 − α)/(c − s)]^((1/α)−1)·(−(1 − α)/(c − s))

and

dQ/ds = (1/α)·[(1 − t)(1 − α)/(c − s)]^((1/α)−1)·[(1 − t)(1 − α)/(c − s)²]
      = (1/α)·[(1 − t)(1 − α)/(c − s)]^(1/α)·(1/(c − s)). ♦
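The dQ/dt expression in Example 6.13 can be verified by a finite difference. The parameter values α = 0.5, c = 2, s = 0.5 and t = 0.2 are illustrative assumptions, not from the text.

```python
# Finite-difference check of dQ/dt in Example 6.13, where
# Q(t) = ((1-t)(1-α)/(c-s))^(1/α).
alpha, c, s, t = 0.5, 2.0, 0.5, 0.2     # illustrative parameter values
base = lambda t_: (1 - t_) * (1 - alpha) / (c - s)
Q = lambda t_: base(t_) ** (1 / alpha)

analytic = (1 / alpha) * base(t) ** (1 / alpha - 1) * (-(1 - alpha) / (c - s))
h = 1e-6
numeric = (Q(t + h) - Q(t - h)) / (2 * h)
assert abs(analytic - numeric) < 1e-6
assert analytic < 0                      # a higher revenue tax lowers output
```

As expected, output falls in the revenue tax t and (by the companion formula) rises in the subsidy s.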
6.8 Taxation incidence in supply and demand
When a tax is imposed on a good, this generally impacts both the buyer and the seller. The price paid by the buyer will typically rise, and the price received by the seller will typically fall – with the gap equal to the tax. Relative to the pre-tax price P₀, the extent to which the after-tax buyer price, P_t^b, exceeds P₀, and the extent to which the seller price received, P_t^s, falls below P₀, measures the tax incidence. With a unit tax, P_t^b = P_t^s + t. If, for example, P_t^b − P₀ is much larger than P₀ − P_t^s, then the tax incidence is borne mostly by the buyer. This is discussed next.
Suppose that in some market the government charges a tax t per unit sold. Thus the price paid by the purchaser exceeds the price received by the seller by the amount t: P_d(Q) = P_s(Q) + t. Here P_d denotes the price paid by the demander and P_s the price received by the seller. Prior to the imposition of the tax, the equilibrium quantity Q_e is determined by the equation P_d(Q_e) = P_s(Q_e). Demand is downward sloping and supply upward sloping, so P′_d(Q_e) < 0 and P′_s(Q_e) > 0, and there is a unique point of intersection. The situation before and after tax is depicted in Figure 6.15.

[Figure 6.15: The impact of taxation on price and quantity – supply S and demand D, the pre-tax equilibrium at Q_e with P_d(Q_e) = P_s(Q_e), and the after-tax quantity Q_t with P_s(Q_t) + t = P_d(Q_t)]

When a unit tax is imposed on a good, the amount of the tax is independent of the selling price, so that the gap between P_s and P_d is equal to the tax: P_d(Q) = P_s(Q) + t. In contrast, with an ad-valorem tax, the amount of the tax depends on the price. In this case, the selling price is scaled up by the addition of a tax that is a percentage of the price, and then P_d(Q) = (1 + t)P_s(Q). In either case, the result is to reduce the equilibrium amount traded. Let Q(t) (or Q_t) be the amount traded at tax level t. This is a function of t, whether the tax is a unit or ad-valorem tax. What is dQ/dt? This question may be answered directly by solving for Q as a function of t and then computing the derivative. It is also possible to derive a general formula for dQ/dt in terms of elasticities, and that approach provides some insight into how elasticities affect the responsiveness of quantity to variation in the tax. These matters are discussed next. The analyses for the unit and ad-valorem taxes are provided separately, although the two cases are quite similar.
6.8.1 A unit tax

The impact of the imposition of a unit tax on quantity can be determined as follows. Given an initial value for t, the corresponding quantity, Q(t), satisfies

P_d(Q(t)) = P_s(Q(t)) + t,  (Q(0) = Q_e).

Differentiating gives

P′_d(Q(t))·(dQ(t)/dt) − 1 = P′_s(Q(t))·(dQ(t)/dt).

This may be rearranged:

−1 = [P′_s(Q(t)) − P′_d(Q(t))]·(dQ(t)/dt).

At t = 0, Q(t) = Q(0) = Q_e and P_s(Q(0)) = P_d(Q(0)) = P_e, so considering this equation at t = 0,

−1 = [(P_e/Q_e)·(Q_e/P_e)·P′_s(Q_e) − (P_e/Q_e)·(Q_e/P_e)·P′_d(Q_e)]·(dQ(0)/dt)
   = (P_e/Q_e)·[1/η_s − 1/η_d]·(dQ(0)/dt).   (6.10)

The terms η_s and η_d denote the elasticities of supply and demand respectively. Since P′_d(Q) < 0 and P′_s(Q) > 0, η_d < 0 and η_s > 0. Solving for dQ/dt (evaluated at t = 0, dQ(0)/dt) gives

dQ/dt = −1 / {(P_e/Q_e)·[1/η_s − 1/η_d]}.

Since η_s > 0 and η_d < 0, dQ/dt < 0. Notice that when the elasticities are small, the impact on Q of a change in t is large. This calculation gives the impact on Q of a change in t starting from the initial point where t = 0.
EXAMPLE 6.14: Consider a one-market model for good Q, where demand is given by P_d(Q) = Q^(−α), 0 < α < 1, and supply is given by P_s(Q) = bQ, b > 0. The market-clearing quantity is given by the solution to Q^(−α) = bQ, or 1 = bQ^(1+α), or Q^(1+α) = 1/b, or Q = (1/b)^(1/(1+α)) = b^(−1/(1+α)).

With a unit tax, market clearing becomes Q^(−α) = bQ + t. This expression is difficult to work with. However, the elasticities formula above allows a direct computation of dQ/dt at the market-clearing level. The elasticities of demand and supply may be computed: P′_d(Q) = −αQ^(−(1+α)), so that

1/η_d = (Q/P_d)·P′_d(Q) = (Q/Q^(−α))·(−α)Q^(−(1+α)) = Q^(1+α)·Q^(−(1+α))·(−α) = −α.

Similarly, P_s(Q) = bQ, so that P′_s(Q) = b, and

1/η_s = (Q/P_s)·P′_s(Q) = (Q/(bQ))·b = 1.

Summarizing, η_d = −1/α and 1/η_d = −α; similarly, η_s = 1 and 1/η_s = 1. Finally, observe that P_e/Q_e = b, since P_e = bQ_e, so:

dQ/dt = −1/(b(1 + α)). ♦
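Although the market-clearing equation Q^(−α) = bQ + t has no convenient closed form, it can be solved numerically and the elasticity formula checked directly. The values α = 0.5 and b = 2 are illustrative assumptions.

```python
# Check dQ/dt = -1/(b(1+α)) at t = 0 for Example 6.14 (unit tax, demand Q^{-α},
# supply bQ). Market clearing Q^{-α} = bQ + t is solved by bisection on Q,
# since the excess-price gap is strictly decreasing in Q.
alpha, b = 0.5, 2.0                     # illustrative parameter values

def clearing_q(t):
    lo, hi = 1e-9, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid ** (-alpha) - b * mid - t > 0:   # demand price still above supply + t
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

h = 1e-5
numeric = (clearing_q(h) - clearing_q(-h)) / (2 * h)
assert abs(numeric - (-1 / (b * (1 + alpha)))) < 1e-4
```

The finite difference across t = 0 reproduces the elasticity-formula value −1/(b(1 + α)).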
As mentioned, a unit tax is independent of the selling price and is simply added to it, so that the market clears when P_d(Q_t) = P_s(Q_t) + t. In contrast, with the ad-valorem tax, the tax paid is a percentage of the sale price.
6.8.2 An ad-valorem tax

Suppose that the government imposes an ad-valorem tax t, so that in the after-tax equilibrium (1 + t)P_s(Q(t)) = P_d(Q(t)), where Q(t) denotes the after-tax quantity traded. Note that Q(0) = Q_e.
With an ad-valorem tax, the elasticity formula is slightly different. Differentiate (1 + t)P_s(Q(t)) = P_d(Q(t)) with respect to t, to get

P_s(Q(t)) + (1 + t)P′_s(Q(t))·(dQ(t)/dt) = P′_d(Q(t))·(dQ(t)/dt),

or

[(1 + t)P′_s(Q(t)) − P′_d(Q(t))]·(dQ(t)/dt) = −P_s(Q(t)),

or

[(1 + t)·(P_s(Q(t))/Q(t))·(Q(t)/P_s(Q(t)))·P′_s(Q(t)) − (P_d(Q(t))/Q(t))·(Q(t)/P_d(Q(t)))·P′_d(Q(t))]·(dQ(t)/dt) = −P_s(Q(t)).

Let 1/η_s(Q(t)) = (Q(t)/P_s(Q(t)))·P′_s(Q(t)) and 1/η_d(Q(t)) = (Q(t)/P_d(Q(t)))·P′_d(Q(t)) be the reciprocals of the elasticities of supply and demand at Q(t). Then the expression may be written as:

[(1 + t)·(P_s(Q(t))/Q(t))·(1/η_s(Q(t))) − (P_d(Q(t))/Q(t))·(1/η_d(Q(t)))]·(dQ(t)/dt) = −P_s(Q(t)).

This expression is valid at each value of t. However, it is simpler at t = 0, since P_s(Q(0)) = P_d(Q(0)) = P_e, so that P_s(Q(t))/Q(t) equals P_s(Q(0))/Q(0), or P_e/Q_e. Likewise, at t = 0, P_d(Q(t))/Q(t) equals P_d(Q(0))/Q(0), or P_e/Q_e. In this case, the expression becomes:

(P_e/Q_e)·[1/η_s(Q_e) − 1/η_d(Q_e)]·(dQ(0)/dt) = −P_s(Q(0)) = −P_e.

Rearranging:

dQ(0)/dt = −P_e / {(P_e/Q_e)·[1/η_s − 1/η_d]} = −Q_e / [1/η_s − 1/η_d],

where the elasticities are evaluated at Q_e and all functions at t = 0. Applying this formula to the demand and supply functions of Example 6.14 gives:
dQ/dt = −(1/(1 + α))·(1/b)^(1/(1+α)) = −(1/(1 + α))·b^(−1/(1+α)).

This calculation may be confirmed directly by solving for Q(t) and differentiating. With an ad-valorem tax, the equation Q(t)^(−α) = (1 + t)bQ(t) sets demand equal to supply. Rearranging,

1/((1 + t)b) = Q(t)^(1+α), or Q(t) = [1/((1 + t)b)]^(1/(1+α)) = ((1 + t)b)^(−1/(1+α)).

Thus,

dQ(t)/dt = −(1/(1 + α))·[(1 + t)b]^(−1/(1+α)−1)·b.

At t = 0,

dQ(t)/dt = −(1/(1 + α))·b^(−1/(1+α)).

So, the two calculations yield the same result.
EXAMPLE 6.15: As a final example, these calculations may be repeated, replacing P_s(Q) = bQ, b > 0, with P_s(Q) = Q^β, β > 0. In this case η_d = −1/α and η_s = 1/β, so 1/η_d = −α and 1/η_s = β, and so 1/η_s − 1/η_d = α + β. The equilibrium quantity satisfies Q^(−α) = Q^β, or 1 = Q^(α+β), or Q = 1^(1/(α+β)) = 1. Likewise, P = 1 is the equilibrium price. Then, using the formula,

dQ/dt = −1/(α + β).

By contrast, solving P_d(Q(t)) = Q(t)^(−α) = (1 + t)Q(t)^β gives Q(t)^(α+β) = (1 + t)^(−1) and Q(t) = (1 + t)^(−1/(α+β)). Differentiating:

dQ/dt = −(1/(α + β))·(1 + t)^(−1/(α+β)−1),

and this equals −1/(α + β) at t = 0. ♦
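The closed-form check in Example 6.15 can be confirmed numerically. The values α = 0.5 and β = 1.5 are illustrative assumptions, not from the text.

```python
# Ad-valorem tax with P_d = Q^{-α} and P_s = Q^{β} (Example 6.15):
# Q(t) = (1+t)^{-1/(α+β)}, so dQ/dt at t = 0 is -1/(α+β).
alpha, beta = 0.5, 1.5                  # illustrative parameter values
Q = lambda t: (1 + t) ** (-1 / (alpha + beta))

h = 1e-6
numeric = (Q(h) - Q(-h)) / (2 * h)
assert abs(numeric - (-1 / (alpha + beta))) < 1e-6
```

The finite difference across t = 0 matches the elasticity-formula value −1/(α + β).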
6.8.3 Elasticities and taxation

Considering the formula associated with the unit tax,

dQ/dt = −1 / {(P/Q)·[1/η_s − 1/η_d]},

the following observations highlight some aspects of elasticities and their relationship to the responsiveness of demand and price to changes in taxation. If η_s is very large, the quantity supplied is very responsive to price changes, so the supply curve is flat. When η_d is also very large (in absolute value), then the demand curve is also flat. In this case, both 1/η_s and 1/η_d are very small, so that dQ/dt is a large negative number (−1 divided by a small positive number). Therefore, large elasticities lead to a large change in quantity traded as a result of the tax. (See Figure 6.16.)
[Figure 6.16: Large and small changes in quantity – two panels with supply S, demand D, the same price change ΔP, and the quantities Q_t and Q_e]
When both η_s and η_d are small, this corresponds to the situation where neither supply nor demand is responsive to price changes – the curves are both steep. In this case, 1/η_s and 1/η_d are large, so that dQ/dt is a small negative number. (See Figure 6.16.)

Other possibilities arise when only one of the elasticities is relatively small. If η_s is very large (so that 1/η_s ≃ 0) and η_d is very small, then the supply curve is close to horizontal and the demand curve steep. (See Figure 6.17.) When η_d is very large (so that 1/η_d ≃ 0) and η_s is very small, the supply curve is steep and the demand curve flat. (See Figure 6.17.)

While the price change is the same in both cases in Figure 6.16, the quantity change is much smaller when the elasticities are small. In Figure 6.16, the demand and supply curves are similarly sloped; in one case they are relatively elastic, so that large quantity changes occur as a result of the tax, and in the other a less significant change occurs. Because they have similar slopes, the incidence of the tax is equally borne by demanders and suppliers – the rise in price for one is roughly equal to the drop in price for the other. In Figure 6.17, the demand and supply curves have dissimilar slopes, so the taxation incidence is unequally distributed.
[Figure 6.17: Taxation with elastic demand or supply – two panels with supply S, demand D, the price change ΔP, and the quantities Q_t and Q_e]
6.9 Ramsey pricing

Consider the efficiency problem that arises when there is a fixed cost in production, so that pricing at marginal cost generates a loss. In the absence of a subsidy to production, price must be set above marginal cost. Consider the task of recovering fixed cost by pricing above marginal cost in different markets where the fixed cost is jointly shared in production, while minimizing the loss of consumer and producer surplus. The issue then is to set prices in each of the markets to recover the fixed cost, but in a way that minimizes overall deadweight loss.

Consider two markets with demands p_x(x) = a − bx and p_y(y) = α − βy, where
both items are produced by a firm with cost function c(x, y) = F + c[x + y], so that there is a common constant marginal cost c and a fixed cost F. Pricing at marginal cost is efficient but generates a loss for the firm. If prices are set above marginal cost so that the fixed cost F is recouped, what is the optimal pricing scheme (minimizing loss of surplus, subject to raising the fixed cost F)? In Figure 6.18, to simplify, the parameters are chosen so that the quantity, q, setting price equal to marginal cost is the same in both markets.

Consider reducing the quantity in each market by the same amount, from q to q′. In market X this leads to price d and revenue R_x = (d − c)q′. The deadweight loss is given by the triangle DWL_x = ½(d − c)(q − q′). In market Y, the price is e and the revenue raised is R_y = (e − c)q′. Also, the deadweight loss is DWL_y = ½(e − c)(q − q′).
Figure 6.18: Ramsey pricing
Now, consider the revenue generated per unit of deadweight loss in each market, ρ_x and ρ_y:

ρ_x = R_x/DWL_x = (d − c)q′ / [½(d − c)(q − q′)] = 2q′/(q − q′),

and

ρ_y = R_y/DWL_y = (e − c)q′ / [½(e − c)(q − q′)] = 2q′/(q − q′).
Let ρ = ρ_x = ρ_y. Viewed as a function of (q′, q), ρ(q′, q) falls as q′ falls, so the revenue gain per unit of deadweight loss is greatest when q′ is close to q. And the optimal quantity reduction is the same in both markets: if, for example, the reductions were such that q′_x < q′_y, then the revenue gain per unit of deadweight loss would be smaller in market X than in market Y.
In the case where marginal costs are constant but different and the price-equals-marginal-cost conditions give different quantities, the calculations are similar. When quantities are reduced from q_α to q′_α and the corresponding prices following the quantity reductions are p′_α, α = x, y, the revenue gains in markets x and y are R_x = q′_x(p′_x − c_x) and R_y = q′_y(p′_y − c_y), and the deadweight losses are DWL_x = ½(p′_x − c_x)(q_x − q′_x) and DWL_y = ½(p′_y − c_y)(q_y − q′_y). The corresponding
ratios are:

q′_x(p′_x − c_x) / [½(p′_x − c_x)(q_x − q′_x)] = 2q′_x/(q_x − q′_x),

q′_y(p′_y − c_y) / [½(p′_y − c_y)(q_y − q′_y)] = 2q′_y/(q_y − q′_y).

Setting these equal:

q′_x/(q_x − q′_x) = q′_y/(q_y − q′_y),

so that q′_x(q_y − q′_y) = q′_y(q_x − q′_x), or q′_x q_y = q′_y q_x, or q′_x/q_x = q′_y/q_y. Therefore, the quantities should be reduced in the same proportion.
Exercises for Chapter 6
Exercise 6.1 Let C(q) denote the total cost of producing q units of a good. Show that the average cost is minimized at the point where average and marginal costs coincide.

Exercise 6.2 The price–cost margin (markup), m, for a monopolist is defined as

m = (price − marginal cost)/price.
Find the relationship between ε, the elasticity of demand, and m.

Exercise 6.3 Let demand for a good be given by p = a − bq or, equivalently, by q = (a/b) − (1/b)p = α − βp, where α = a/b and β = 1/b. The cost of production of the good is c(q) = cq, where c is a positive constant. Assume there is only one firm producing the good.
(a) Is the firm’s profit function concave?
(b) Find the profit-maximizing level of output of the good.
(c) Suppose that the firm is taxed, paying $τ per unit of output. Give the firm’s new objective function and find the firm’s profit-maximizing level of output.
(d) Suppose that the firm is taxed, such that on each dollar of sales the firm pays the fraction τ in taxes. Give the firm’s new objective function and find the profit-maximizing output level.
(e) In cases (c) and (d), analyze the effect of changing τ on the optimal choice of q.
Exercise 6.4 A monopolist faces a demand curve P(Q) = kQ^(−α) (0 < α < 1) and cost function c(Q) = cQ², c > 0.

(a) Find the profit-maximizing level of output, Q*. Confirm that the second-order condition for a maximum is satisfied.
(b) Now suppose that the cost function is c(Q) = c1·Q + c2·Q², where c1, c2 > 0 (the demand function is still P(Q) = kQ^(−α)). Confirm that the profit function is concave.
Exercise 6.5 Consider the following supply and demand model: demand: P^d(Q) = α − βQ; supply: P^s(Q) = γ − δQ. The equilibrium quantity Q^o solves P^d(Q) = P^s(Q), i.e., P^d(Q^o) = P^s(Q^o).

(a) Find the elasticities of the demand and supply functions, η_d and η_s.
(b) Calculate the impact of a small (per) unit tax, t, on the quantity sold.
(c) Relate your answer, dQ/dt, to the elasticity formula:

dQ/dt = −1 / [ (p/Q) { 1/η_s − 1/η_d } ].

(d) Replace the supply function with P^s(Q) = γ − ½δQ² and repeat the calculations for the pair: P^d(Q) = α − βQ and P^s(Q) = γ − ½δQ².
Exercise 6.6 Suppose that instead of a unit tax, a proportional tax is imposed so that the after-tax relation between the selling and buying price is given by P^d(Q) = (1 + t)P^s(Q). Calculate dQ/dt at t = 0.
Exercise 6.7 In a city, electricity is produced by one firm. This firm is given the sole right to produce the city’s electricity because it can produce the electricity needed more efficiently than could any other firm or firms. The firm is thus a natural monopoly. The total revenue function for the firm is R(Q) = P(Q)Q = 24Q − 5Q² (so P(Q) = 24 − 5Q) and its total cost is 25 − Q².

One of the problems that the city faces is that since the firm is a natural monopoly, and since electricity has a very inelastic demand, the firm can make large profits. To control these large profits, the city is considering a revenue tax (t_R) as a means of regulating the firm. Assuming that the firm is a profit maximizer:

(a) Calculate the optimal price and quantity before regulation, and after the revenue tax is applied.
(b) If the tax rate is set at 20%, will the profits of the firm be decreased? Will the city benefit from this regulation, i.e., will it face lower prices and higher quantities?
(c) Calculate the tax revenue the city will receive.
Exercise 6.8 To control the use of tobacco, the federal government can impose many different types of tax on the producers of tobacco. Assume that there is only one producer (a monopolist), who faces the demand curve P(Q) = 56 − 8Q, and whose total cost curve is C(Q) = 6Q² − 10. If the producer is a profit maximizer:

(a) Determine the profit-maximizing output of the monopolist and the associated price.
(b) Assume now that the government levies a per-unit tax of t on the producer in order to reduce the use of tobacco. What effect does this have on the profit-maximizing price and output?
(c) Alternatively, assume the government levies a sales tax of 50% on the producer. What effect does this have on the profit-maximizing price and output?
(d) If the per-unit tax is set at 7, and the sales tax is set at 50%, which tax is more effective?

Exercise 6.9 Assume that a perfectly competitive producer has a production function for a homogeneous good given by Q(L) = 20L^(1/2). The costs of production are given by a fixed cost of $50 plus a wage rate of $8 for each unit of labour employed. If the market price of the good is $4:

(a) Find the amount of labour needed to maximize profits, and the associated level of profits.
(b) Assume that a payroll tax is imposed, which causes the wage rate to become w̄ = w(1 + t). Determine the new amount of labour needed to maximize profits.
(c) If t = 0.25, what is the amount of labour needed to maximize profits? What is the associated level of profits?
Exercise 6.10 A competitive firm faces the market price of $9 per unit. Its total cost is given by Q³ − 9Q² + 33Q + 10.

(a) Determine the level of output that maximizes the firm’s profits, and the value of profits at this level. Will the firm continue to produce?
(b) Determine the minimum price at which the firm will stay in the industry.

Exercise 6.11 Consider a firm with cost function C = 6 + Q² facing the market demand function Q = 18 − 2P.

(a) If the firm behaves as a monopolist, what will be the price and quantity in equilibrium?
(b) If the firm behaves competitively, what will be the equilibrium price and quantity?
(c) Suppose the monopolist were offered a subsidy S per unit produced. What level of subsidy would induce the monopolist to produce at the competitive equilibrium?

Exercise 6.12 Consider a firm with the following cost and demand functions: C(Q) = Q² + 2Q + 1, Q = 16 − 2P.

(a) Calculate the equilibrium price and quantities: (i) if the firm acts as a monopolist; (ii) if the firm acts competitively.
(b) Calculate the profits the firm makes: (i) as a monopolist; (ii) as a competitor.
(c) Suppose the government offered the monopolist a per-unit subsidy, S. What level of S will bring about the competitive solution?
(d) What is the subsidized monopolist’s level of profit?

Exercise 6.13 Consider a monopolist with a demand function P = P(Q) and cost function c = c(Q), c′(Q) > 0. Show that if the monopolist maximizes profit, the monopolist will operate on the portion of the demand curve where the elasticity exceeds 1.

Exercise 6.14 Consider a monopolist with a demand function P = P(Q) and cost function c = c(Q), c′(Q) > 0. Show that with profit maximization, the firm will operate on the portion of the demand curve where the elasticity exceeds 1.

Exercise 6.15 In some market, the demand function is given by p^d(q) = a/ln(q) and the supply function is given by p^s(q) = bq.

(a) Calculate the elasticities of demand and supply.
(b) The equilibrium quantity satisfies p^d(q) = p^s(q). Starting from a point where there is no tax on the good, suppose that a small tax t is imposed – so that p^d(q) = p^s(q) + t. Calculate dq/dt, dp^d/dt and dp^s/dt, all at t = 0.
Exercise 6.16 Suppose that the demand and supply functions in a market are given by P^d(Q) = Q^(−α) and P^s(Q) = e^(kQ) − 1. The equilibrium quantity depends on the parameters α and k: Q(α, k).

(a) Show that ∂Q/∂α < 0 and ∂Q/∂k < 0.
(b) Suppose now that the government imposes a “goods and services” style tax at rate t (formally, a value added tax), so that the equilibrium quantity after tax is Q(t) (ignore the parameters α and k from now on). Thus, Q(t) satisfies P^d(Q(t)) = (1 + t)P^s(Q(t)). Find dQ(0)/dt in terms of elasticities.
7 Systems of equations, differentials and derivatives

7.1 Introduction
In many economic problems, functions of more than one variable arise naturally. Preferences are represented by a utility function that depends on the goods consumed, u(x1, x2, ..., xn); output in production is represented as a function of inputs, f(k1, k2, ..., kn); and so on. As with the single-variable case, differential techniques play an important role in a variety of applications, such as comparative statics and optimization. For example, the impact of variation in one variable on the value of a function routinely arises in considering its marginal impact. In the production case, the marginal productivity of input k1 is given by [f(k1 + ∆, k2, ..., kn) − f(k1, k2, ..., kn)]/∆, and measures the impact on output of a small change in input k1. Such a calculation leads to a partial derivative, since only variation in one of a number of variables is considered. Developing these techniques leads to a number of applications and provides the necessary tools to develop some important results – in particular the implicit function theorem.

Following some introductory discussion, partial derivatives are defined in Section 7.2. This is the natural extension of the derivative of a function of one variable. After this, level contours are introduced (Section 7.3). A level contour identifies the set of x that yields a specific value for the function: in microeconomic theory, indifference curves and isoquants are level contours of utility and production functions respectively. In two dimensions particularly, level contours provide a useful way to visualize the shape of a function. Concepts such as the marginal rate of substitution between goods in consumption or production are explained naturally in terms of level contours and are defined using partial derivatives.
One important use of partial derivatives is in locating maxima and minima of a function: values of (x1, ..., xn) that make the function as large or as small as possible. This technique is illustrated in Section 7.5. The remaining sections consider (nonlinear) systems of equations. Section 7.7 outlines some of the basic issues relating to nonlinear systems, such as the existence and uniqueness of solutions. Following this, the implicit function theorem is introduced in Section 7.8. This theorem studies how variation of one or more parameters in a system of equations affects each variable in the system.

For a linear function of many variables, the impact of parameter variation is easy to determine. The function f(x1, x2) = ax1 + bx2 is a linear function of two variables and f(x1, x2, ..., xn) = a1x1 + a2x2 + · · · + anxn is a linear function of n variables, (x1, x2, ..., xn). The impact of a change in x1 on the value of f(x1, x2) = ax1 + bx2 may be computed directly:
f(x1 + ∆x1, x2) − f(x1, x2) = (a[x1 + ∆x1] + bx2) − (ax1 + bx2) = a∆x1.

From this, the change in f per unit change in x1 may be calculated as:

[f(x1 + ∆x1, x2) − f(x1, x2)]/∆x1 = a.

Similar calculations may be applied for nonlinear functions to compute the impact of variation in some variable on the function.

EXAMPLE 7.1: Consider f(x1, x2) = (x1x2)² + cos(x1) sin(x2), and suppose that x1 is increased to x1 + ∆x1:

f(x1 + ∆x1, x2) = ([x1 + ∆x1] · x2)² + cos(x1 + ∆x1) sin(x2)
= [x1² + 2x1∆x1 + (∆x1)²] · x2² + cos(x1 + ∆x1) sin(x2)
= x1² · x2² + [2x1∆x1 + (∆x1)²] · x2² + cos(x1 + ∆x1) sin(x2).

Subtracting f(x1, x2), then

f(x1 + ∆x1, x2) − f(x1, x2) = [2x1∆x1 + (∆x1)²] · x2² + [cos(x1 + ∆x1) − cos(x1)] sin(x2).

Dividing by ∆x1,

[f(x1 + ∆x1, x2) − f(x1, x2)]/∆x1 = 2x1x2² + (∆x1) · x2² + {[cos(x1 + ∆x1) − cos(x1)]/∆x1} sin(x2).
As ∆x1 goes to 0, the term (∆x1) · x2² goes to 0 and [cos(x1 + ∆x1) − cos(x1)]/∆x1 approaches the derivative of cos(x1), which is −sin(x1). Therefore, as ∆x1 approaches 0,

[f(x1 + ∆x1, x2) − f(x1, x2)]/∆x1 → 2x1x2² − sin(x1) sin(x2).
♦

This example shows that calculating the impact of parameter or variable changes on a function of many variables or parameters is not significantly different from similar computations in the one-variable case. The need for such calculations comes up routinely in economics. Consider maximizing utility u(x1, x2) subject to a budget constraint, p1x1 + p2x2 = y, where p_i is the price of good i and y is income. Rearranging the budget constraint:

x2 = y/p2 − (p1/p2) x1.

Substituting this into the utility function gives

û(x1, y, p1, p2) = u(x1, y/p2 − (p1/p2) x1).

Taking (y, p1, p2) as fixed (beyond the control of the consumer), maximizing utility amounts to choosing x1 to maximize û(x1, y, p1, p2). The impact of variations in x1 on û requires calculation of the direct effect via x1 and the indirect effect via x2 = y/p2 − (p1/p2) x1. When x1 is chosen optimally, the maximized value of û gives the utility of the individual from choosing optimally – let u*(y, p1, p2) = max_{x1} û(x1, y, p1, p2). The function u* gives the utility of the individual as a function of (y, p1, p2), so one may consider the impact of income and price variations on the individual’s welfare. The impact on utility of an increase in income from y to y + ∆y is given by u*(y + ∆y, p1, p2) − u*(y, p1, p2), and the per-unit change is given by

[u*(y + ∆y, p1, p2) − u*(y, p1, p2)]/∆y.

This expression is a partial derivative – calculating the impact on u* of a small change in y, with the other parameters affecting u* held fixed. The partial derivative is the analogous concept, for functions of more than one variable, to the derivative of a function of one variable. In this example, partial derivatives appear both in the choice of x1 and the subsequent calculation of the marginal utility of income.

In production theory, a production function f(k, l) associates a level of output Q = f(k, l) to inputs of capital (k) and labor (l). If capital is increased from k to k + ∆, then output increases to f(k + ∆, l) and the change in output is f(k + ∆, l) − f(k, l).
Dividing this by ∆ gives [f(k + ∆, l) − f(k, l)]/∆, the change in output per unit change in the capital input. As ∆ goes to 0, assuming f is differentiable with respect to k, the limiting value is called the marginal product of capital.

In demand theory, when a profile of goods (x1, ..., xn) is chosen to maximize utility subject to a budget constraint p1x1 + · · · + pnxn = y, the resulting demands depend on the profile of prices and income. Thus, the demand for good i is x_i(p1, ..., pn, y). When prices other than p_i and income y are constant, changes in p_i lead to changes in x_i, and this connection between x_i and p_i gives the demand function for good i. As p_i varies, the resulting change in x_i per unit change in p_i is

[x_i(p1, ..., p_{i−1}, p_i + ∆p_i, p_{i+1}, ..., pn, y) − x_i(p1, ..., p_{i−1}, p_i, p_{i+1}, ..., pn, y)]/∆p_i.

The partial derivative with respect to one variable depends on the values of the other variables entering the function. For example, the impact of a price change on demand depends on the prices of other goods and income. In Figure 7.1, the partial derivative with respect to p_i at (p1, ..., p_i, ..., pn, y) is different from the partial derivative at (p̃1, ..., p_i, ..., p̃n, ỹ), since |∆x̃_i| > |∆x_i|.

The next section defines the partial derivative of a function – a natural extension of the derivative of a function of a single variable to functions of many variables. Following this, level contours are introduced and then, using partial derivatives, the equilibrium output in a duopoly is determined.
Figure 7.1: The impact of price variations on demand
7.2 Partial derivatives

Let f : X → R, where X ⊆ R^n, so f(x) = f(x1, x2, ..., xn). If all the x’s except one are held fixed, we can view f as a function of one variable, and then consider the derivative of f with respect to this variable. This is called the partial derivative. The partial derivative of f with respect to x_i is denoted ∂f/∂x_i and defined as:

∂f/∂x_i = lim_{∆x_i→0} [f(x1, x2, ..., x_{i−1}, x_i + ∆x_i, x_{i+1}, ..., xn) − f(x1, x2, ..., xn)]/∆x_i.

Thus, ∂f/∂x_i is defined provided this limit exists as ∆x_i → 0. The partial derivative of f with respect to x_i measures the change in f due to a small change in x_i, when the other variables are held constant.
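The definition translates directly into a finite-difference approximation: perturb one coordinate, hold the others fixed, and divide by the step. A small sketch (the helper `partial`, the step size and the evaluation point are illustrative, not from the text):

```python
# Finite-difference sketch of the partial derivative definition.
import math

def partial(f, x, i, h=1e-6):
    """Approximate df/dx_i at the point x by a forward difference."""
    x_step = list(x)       # copy so other coordinates stay fixed
    x_step[i] += h
    return (f(x_step) - f(x)) / h

# f(x1, x2) = (x1*x2)**2 + cos(x1)*sin(x2), the function of Example 7.2.
f = lambda x: (x[0] * x[1]) ** 2 + math.cos(x[0]) * math.sin(x[1])

x = [1.0, 2.0]
d1 = partial(f, x, 0)   # approximates 2*x1*x2**2 - sin(x1)*sin(x2)
d2 = partial(f, x, 1)   # approximates 2*x1**2*x2 + cos(x1)*cos(x2)
```

The approximations agree with the analytic partial derivatives computed in Example 7.2 to within the truncation error of the forward difference.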
EXAMPLE 7.2: With f(x1, x2) = (x1x2)² + cos(x1) sin(x2),

∂f/∂x1 = 2(x1x2)x2 − sin(x1) sin(x2) = 2x1x2² − sin(x1) sin(x2),
∂f/∂x2 = 2(x1x2)x1 + cos(x1) cos(x2) = 2x1²x2 + cos(x1) cos(x2).
♦

REMARK 7.1: Often, for notational convenience, the partial derivative of f with respect to x_i is denoted f_xi, or f_i when no confusion can arise. Also, where there are just two variables in consideration, these may be labeled x and y, with partial derivatives f_x and f_y. With more than two variables, it is often more convenient to label by index, x_i, i = 1, ..., n.

Note that for f(x1, x2, ..., xn), ∂f/∂x_i is a function of all the x_i. One can take the partial derivative of this function – the partial derivative of the partial derivative. Thus, the partial derivative of ∂f/∂x_i with respect to x_i is ∂/∂x_i {∂f/∂x_i}, and the partial derivative of ∂f/∂x_i with respect to x_j is ∂/∂x_j {∂f/∂x_i}. For ease of notation, these are sometimes written as ∂²f/∂x_i² and ∂²f/∂x_j∂x_i, or as f_xixi and f_xjxi. In the two-variable case with f(x, y), the partial derivatives are f_x = ∂f/∂x and f_y = ∂f/∂y, and the second partial derivatives are written as f_xx = ∂/∂x {∂f/∂x}, f_yy = ∂/∂y {∂f/∂y}, f_xy = ∂/∂x {∂f/∂y} and f_yx = ∂/∂y {∂f/∂x}.
EXAMPLE 7.3: To illustrate, consider the two-variable case with f(x, y) = x^α y^β. Then:

f_x(x, y) = ∂f/∂x = αx^(α−1)y^β = (α/x) x^α y^β = (α/x) f(x, y),
f_y(x, y) = ∂f/∂y = βx^α y^(β−1) = (β/y) x^α y^β = (β/y) f(x, y).

Notice that f_x(x, y) = ∂f(x, y)/∂x = αx^(α−1)y^β depends on x and y – it is a function of (x, y). The function f_x(x, y) has a partial derivative with respect to x, which is ∂/∂x {∂f/∂x} = α(α − 1)x^(α−2)y^β. Similarly, f_x(x, y) has a partial derivative with respect to y, given by ∂/∂y {∂f/∂x} = αβx^(α−1)y^(β−1).

In Example 7.2, the second partial derivatives are given by:

f_x1x1 = ∂/∂x1 {∂f/∂x1} = 2x2² − cos(x1) sin(x2),
f_x2x1 = ∂/∂x2 {∂f/∂x1} = 4x1x2 − sin(x1) cos(x2),
f_x2x2 = ∂/∂x2 {∂f/∂x2} = 2x1² − cos(x1) sin(x2),
f_x1x2 = ∂/∂x1 {∂f/∂x2} = 4x1x2 − sin(x1) cos(x2).

♦
Note that in Example 7.3, f_x2x1 = f_x1x2. This is not accidental. When the partial derivatives exist and are continuous, then the second partial derivatives satisfy a special relation known as Young’s theorem.

Theorem 7.1: Suppose the function f(x1, ..., xn) has continuous partial derivatives, f_xi(x1, ..., xn), in a neighborhood of (x̄1, ..., x̄n), and suppose they are differentiable – ∂/∂x_i {∂f/∂x_j} exists for each i, j; then the second partial derivatives are equal:

f_xixj = ∂²f/∂x_i∂x_j = ∂/∂x_i {∂f/∂x_j} = ∂/∂x_j {∂f/∂x_i} = ∂²f/∂x_j∂x_i = f_xjxi.
EXAMPLE 7.4: Consider f(x, y) = ln(xy) · sin y, so that f_x = (y/xy) sin y = (1/x) sin y and f_yx = (1/x) cos y. Differentiating first with respect to y gives: f_y = (x/xy) sin y + ln(xy) cos y = (1/y) sin y + ln(xy) cos y. And then differentiating with respect to x: f_xy = (y/xy) cos y = (1/x) cos y. So, f_xy = f_yx.

♦
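Young's theorem can also be checked numerically for this function, by estimating the mixed partial with a central difference and comparing it with f_xy = (1/x) cos y derived above (the step size and evaluation point are illustrative):

```python
# Numerical check of Young's theorem for f(x, y) = ln(x*y)*sin(y).
import math

def f(x, y):
    return math.log(x * y) * math.sin(y)

def mixed(g, x, y, h=1e-4):
    """Central-difference estimate of the mixed partial d2g/dxdy."""
    return (g(x + h, y + h) - g(x + h, y - h)
            - g(x - h, y + h) + g(x - h, y - h)) / (4 * h * h)

fxy_numeric = mixed(f, 2.0, 1.5)
fxy_exact = math.cos(1.5) / 2.0   # f_xy = (1/x)*cos(y) at x = 2, y = 1.5
```

The symmetric difference stencil makes no distinction between differentiating in x first or y first, which is exactly the content of the theorem for smooth f.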
7.3 Level contours

Level contours arise frequently in economics, for example, as isoquants in production theory or as indifference curves in consumer theory. Given a function of n variables, f(x1, ..., xn), the level contour of f at level (or height) f̄ is the set of values of (x1, ..., xn) such that f(x1, ..., xn) = f̄. If this set is denoted L_f̄, then

L_f̄ = {(x1, ..., xn) | f(x1, ..., xn) = f̄}.

Different values of f̄ determine different level contours. The level contour is a way of visualizing a function of more than one variable.
Figure 7.2: The function f(x, y) = x^(1/2) y^(1/2)

In Figure 7.2, the function f(x, y) = x^(1/2) y^(1/2) is plotted, and three level contours graphed, giving the (x, y) pairs associated with the level contours {(x, y) | f(x, y) = 1}, {(x, y) | f(x, y) = 2} and {(x, y) | f(x, y) = 3}.
Given a utility function u(x, y), the indifference curve corresponding to utility level ū is defined as the set of (x, y) combinations that provide the level of utility ū: I_ū = {(x, y) | u(x, y) = ū}. The isoquant corresponding to a production function f is the set of input combinations (x, y) that yield a given level of output Q: I_Q = {(x, y) | f(x, y) = Q}. Considering the production function f, since a small change in x has impact f_x on the value of output and a small change in y has impact f_y, the total change in output from small changes (dx, dy) in (x, y) is df = f_x(x, y)dx + f_y(x, y)dy. This is called the total derivative of f at (x, y). Note that df depends on (x, y) and (dx, dy). An isoquant is a level contour of the function f. Starting from a point (x, y) on the contour, small variations in x and y that leave the value of
Figure 7.3: Isoquant slope

f unchanged satisfy df = 0. This determines the rate of change in y required to offset a change in x so as to leave the output level unchanged (see Figure 7.3):

dy = −[f_x(x, y)/f_y(x, y)] dx.
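A quick numerical illustration of this formula, using the function f(x, y) = x^(1/2) y^(1/2) from Figure 7.2 (the point and step size are illustrative): moving by (dx, dy) with dy = −(f_x/f_y)dx leaves f unchanged to first order.

```python
# Checking the isoquant-slope formula dy = -(f_x/f_y)*dx numerically.

def f(x, y):
    return x ** 0.5 * y ** 0.5

x, y = 4.0, 1.0
fx = 0.5 * x ** -0.5 * y ** 0.5     # partial of f with respect to x
fy = 0.5 * x ** 0.5 * y ** -0.5     # partial of f with respect to y

dx = 1e-4
dy = -(fx / fy) * dx                # offsetting change in y
change = f(x + dx, y + dy) - f(x, y)
```

The residual `change` is second order in dx, far smaller than the change f_x·dx that would result from moving in x alone.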
EXAMPLE 7.5: This type of calculation may be used in other contexts. Consider a government that imposes an ad valorem tax on all goods at the rate τ. Let s_i be the total sales of good i and let p_i be the price of good i, both constant over time. Total tax revenue is T = Σ_{i=1}^n (1 + τ)s_i p_i. Suppose now that prices increase at the rate π(t), so that at time t the price of good i is p_i(1 + π(t)). Suppose also that the government adjusts the tax rate τ over time to keep the total tax revenue constant: T = Σ_{i=1}^n (1 + τ(t))(1 + π(t))s_i p_i = (1 + τ(t))(1 + π(t)) Σ_{i=1}^n s_i p_i. Then, differentiating with respect to t,

0 = { [dτ(t)/dt](1 + π(t)) + (1 + τ(t))[dπ(t)/dt] } Σ_{i=1}^n s_i p_i.

Therefore,

0 = [dτ(t)/dt](1 + π(t)) + (1 + τ(t))[dπ(t)/dt].

Thus, taxes should be adjusted over time according to the formula:

dτ(t)/dt = −[(1 + τ(t))/(1 + π(t))] dπ(t)/dt.

♦
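The adjustment rule can be verified with a small discrete simulation (step size and initial rates are illustrative): stepping τ by dτ = −[(1 + τ)/(1 + π)]dπ holds the product (1 + τ)(1 + π), and hence total revenue, essentially constant.

```python
# Discrete check of the tax-adjustment rule from Example 7.5.

tau, pi = 0.10, 0.00            # initial tax rate and cumulative inflation
target = (1 + tau) * (1 + pi)   # proportional to total tax revenue

d_pi = 1e-4
for _ in range(1000):           # inflate prices in many small steps
    d_tau = -((1 + tau) / (1 + pi)) * d_pi   # the rule derived above
    tau += d_tau
    pi += d_pi

drift = abs((1 + tau) * (1 + pi) - target)
```

After prices have risen by 10%, the tax rate has fallen to roughly offset the increase, and the product (1 + τ)(1 + π) drifts from its initial value only by the accumulated second-order terms dτ·dπ.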
7.4 The elasticity of substitution
The elasticity of demand measures the slope dQ/dP of a curve (the demand function) relative to the ratio of the variables at that point on the curve, Q/P. This is the slope of the curve at a point, divided by the slope of the ray from the origin to that point. The elasticity of substitution measures how that ray varies as the slope varies (all in percentage terms). In principle, this measure can be applied to the level surface of any function, but two particularly important functions in economics are the utility and production functions.

Consider Figure 7.4, where the indifference curve u is given and a change in slope from a to b is depicted. The marginal rates of substitution at a and b are denoted MRS(a) and MRS(b) and denote the slope of the indifference curve at the respective points. The slope of the ray from the origin to a is the ratio of consumption of x2 relative to x1, x2^a/x1^a, when the marginal rate of substitution is MRS(a) = u_x1(a)/u_x2(a).
When the marginal rate of substitution changes to MRS(b), the slope of the ray becomes x2^b/x1^b. Therefore, the percentage change in the ratio of consumption resulting from the percentage change in the MRS is given by the following expression, called the elasticity of substitution:

η = [ (x2^b/x1^b − x2^a/x1^a) / (x2^a/x1^a) ] / [ (MRS(b) − MRS(a)) / MRS(a) ]
  = [ MRS(a) / (x2^a/x1^a) ] · [ (x2^b/x1^b − x2^a/x1^a) / (MRS(b) − MRS(a)) ].   (7.1)

In Equation 7.1, with u given, the ratio x2/x1 is viewed as a function of MRS, with the elasticity tracking the change in the ratio resulting from the change in MRS. Considering this measure at a point corresponds to the difference between MRS(a) and MRS(b) becoming small (approaching 0 in the limit). Here, this leads to an expression in terms of the ln(x) function. Recall that the derivative of ln(x) is 1/x, so that 1/x = d ln(x)/dx ≈ [ln(x + ∆) − ln(x)]/∆. Thus, ln(x + ∆) − ln(x) ≈ ∆/x = (x′ − x)/x, where x′ = x + ∆. From this perspective, the numerator in Equation 7.1 is d ln(x2/x1) and the denominator is d ln(MRS). In differential terms:

η = d ln(x2/x1) / d ln(MRS) = d ln(x2/x1) / d ln(u_x1/u_x2),   (7.2)

recalling the fact that MRS is obtained from du = 0 = u_x1 dx1 + u_x2 dx2, giving dx2/dx1 = −u_x1/u_x2, and so MRS = u_x1/u_x2 (measuring the slope in absolute terms).

Regarding the sign of η, note that, in the movement from a to b, the slope of the indifference curve is always negative but increases towards 0. So, in absolute
Figure 7.4: Elasticity of substitution
terms, the slope ( MRS ) declines (is getting smaller). Likewise, the ratio of x2 to
x1 gets smaller, moving from a to b. Therefore, in moving from a to b both the ratio and MRS decline. Conversely, in the movement from b to a both increase. Consequently, as defined, the elasticity of substitution is nonnegative, provided the level surface of the function is convex, as depicted.
EXAMPLE 7.6: The constant elasticity of substitution function is:

u(x1, x2) = [αx1^ρ + (1 − α)x2^ρ]^(1/ρ).

Therefore,

u_x1 = (1/ρ)[αx1^ρ + (1 − α)x2^ρ]^(1/ρ − 1) αρx1^(ρ−1),
u_x2 = (1/ρ)[αx1^ρ + (1 − α)x2^ρ]^(1/ρ − 1) (1 − α)ρx2^(ρ−1).

Therefore, the MRS is:

u_x1/u_x2 = αρx1^(ρ−1) / [(1 − α)ρx2^(ρ−1)] = [α/(1 − α)] (x1/x2)^(ρ−1).

Taking the ln of both sides,

ln(u_x1/u_x2) = ln(α/(1 − α)) + (ρ − 1) ln(x1/x2) = ln(α/(1 − α)) + (1 − ρ) ln(x2/x1).

Therefore,

ln(x2/x1) = −[1/(1 − ρ)] ln(α/(1 − α)) + [1/(1 − ρ)] ln(u_x1/u_x2)

and

η = d ln(x2/x1) / d ln(u_x1/u_x2) = 1/(1 − ρ).

♦
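The result η = 1/(1 − ρ) can be checked numerically: compute the MRS at two nearby bundles and form the ratio of log changes (the parameter values and bundles below are illustrative):

```python
# Finite-difference check of eta = 1/(1 - rho) for the CES function.
import math

alpha, rho = 0.4, 0.5   # illustrative CES parameters

def mrs(x1, x2):
    """MRS of the CES function: (alpha/(1-alpha)) * (x1/x2)**(rho - 1)."""
    return (alpha / (1 - alpha)) * (x1 / x2) ** (rho - 1)

# Two nearby bundles whose consumption ratio x2/x1 differs slightly.
x1a, x2a = 1.0, 2.0
x1b, x2b = 1.0, 2.0 * 1.0001

eta = ((math.log(x2b / x1b) - math.log(x2a / x1a))
       / (math.log(mrs(x1b, x2b)) - math.log(mrs(x1a, x2a))))
```

Because ln(x2/x1) is exactly linear in ln(MRS) for the CES function, the finite-difference ratio reproduces 1/(1 − ρ) = 2 here, not just approximately.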
The constant elasticity function can exhibit substantial variation in shape, depending on the value of ρ. Figure 7.5 illustrates three possible values of ρ. The figure suggests some properties of this function as ρ varies. When ρ = 1, the function is linear. As ρ decreases to 0, the function approaches the Cobb–Douglas function, x1^α x2^(1−α), while, as ρ goes to −∞, the function approaches the Leontief function, min{x1, x2} (and as ρ tends to ∞, the function tends to max{x1, x2}).

Figure 7.5: Plots of level contours of u = [x1^ρ + x2^ρ]^(1/ρ), for ρ = 2, ρ = 1/2 and ρ = −4
Theorem 7.2: Given u(x1, x2) = [αx1^ρ + (1 − α)x2^ρ]^(1/ρ), then:

• [αx1^ρ + (1 − α)x2^ρ]^(1/ρ) → x1^α x2^(1−α) as ρ → 0;
• [αx1^ρ + (1 − α)x2^ρ]^(1/ρ) → min{x1, x2} as ρ → −∞.

PROOF: Write u_ρ to make the dependence on ρ explicit. Assume at least one of {x1, x2} is strictly positive, so that αx1^ρ + (1 − α)x2^ρ is strictly positive. Let

U_ρ(x1, x2) = f(u_ρ(x1, x2)) = ln u_ρ(x1, x2) = ln[αx1^ρ + (1 − α)x2^ρ]/ρ = q(ρ)/ρ.

So, f is a continuous monotonic transformation of u. Observe that αx1^ρ + (1 − α)x2^ρ → 1 as ρ → 0, so ln[αx1^ρ + (1 − α)x2^ρ] → 0, and therefore both numerator and denominator go to 0 as ρ goes to 0. Using l’Hôpital’s rule, the limit of this ratio is the limit of the ratio of derivatives:

lim_{ρ→0} ln[αx1^ρ + (1 − α)x2^ρ]/ρ = lim_{ρ→0} [αx1^ρ ln x1 + (1 − α)x2^ρ ln x2]/[αx1^ρ + (1 − α)x2^ρ] = α ln x1 + (1 − α) ln x2,

using the fact that (d/dρ)z^ρ = (d/dρ)e^(ρ ln z) = ln z · e^(ρ ln z) = ln z · z^ρ, and lim_{ρ→0} z^ρ = 1. The function f^(−1)(z) = e^z is continuous, so that u_ρ = f^(−1)(U_ρ) is continuous. Since U_ρ → α ln x1 + (1 − α) ln x2,

u_ρ(x1, x2) → e^(α ln x1 + (1−α) ln x2) = e^(α ln x1) e^((1−α) ln x2) = x1^α x2^(1−α).

So, as ρ tends to 0, the utility function tends to the Cobb–Douglas function.

In the second case, suppose that x1 ≤ x2. Write

[αx1^ρ + (1 − α)x2^ρ]^(1/ρ) = x1 [α + (1 − α)(x2/x1)^ρ]^(1/ρ).

Since x2/x1 ≥ 1 and ρ → −∞: if x2 = x1, then (x2/x1)^ρ = 1 for all ρ; and if x2 > x1, then (x2/x1)^ρ → 0. Therefore α + (1 − α)(x2/x1)^ρ → d, where α ≤ d ≤ 1. Since ρ → −∞, 1/ρ → 0, so that [α + (1 − α)(x2/x1)^ρ]^(1/ρ) → 1. So, for x1 ≤ x2, the limit is x1, and conversely, when x2 ≤ x1, the limit is x2. Therefore, as ρ → −∞,

[αx1^ρ + (1 − α)x2^ρ]^(1/ρ) → min{x1, x2}.
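Both limits in Theorem 7.2 are easy to see numerically (the bundle and parameter values below are illustrative, with ρ = −200 standing in for ρ → −∞):

```python
# Numerical look at the CES limits in Theorem 7.2.
alpha = 0.3
x1, x2 = 2.0, 5.0

def ces(rho):
    """CES value at the fixed bundle (x1, x2) for a given rho != 0."""
    return (alpha * x1 ** rho + (1 - alpha) * x2 ** rho) ** (1.0 / rho)

cobb_douglas = x1 ** alpha * x2 ** (1 - alpha)

near_zero = ces(1e-6)        # rho close to 0: near the Cobb-Douglas value
very_negative = ces(-200.0)  # rho very negative: near min(x1, x2)
```

For very negative ρ the value sits slightly above min{x1, x2} by the factor α^(1/ρ), which tends to 1 as ρ → −∞, consistent with the proof.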
7.5 Market equilibrium: an application

In the following discussion, partial derivatives are used to find the equilibrium output in an oligopolistic market with n firms. Consider a market where the (inverse) demand is given by P(Q), giving the price at which the quantity Q will sell. Suppose there are n firms, where firm i produces output q_i. Let Q denote aggregate output: Q = Σ_{i=1}^n q_i. Suppose also that firm i has cost function c(q_i), giving the total cost of producing q_i units for firm i. Sales of q_i units at price p generate a profit of pq_i − c(q_i) for firm i, and suppose the objective of the firm is to maximize profit. The problem in this environment is that the price depends on total output Q, and Q depends on q_i and the other firms’ output levels, {q_j}_{j≠i}.
So, the profit of firm i depends on the output levels of all the firms. Making this explicit:

π_i(q1, ..., qn) = q_i P(Q) − c(q_i) = q_i P( Σ_{j=1}^n q_j ) − c(q_i).
Given the profile of output levels (q1, ..., qn), a firm, say firm i, may consider the possibility of changing output to some other level, say q′_i. It is in the firm’s interest to do so if:

π_i(q1, ..., q_{i−1}, q′_i, q_{i+1}, ..., qn) > π_i(q1, ..., q_{i−1}, q_i, q_{i+1}, ..., qn).

In that case, changing output from q_i to q′_i raises profit. When firms behave in this way (competing by choice of quantity and picking the profit-maximizing output, taking the actions of others as given), the form of competition is called Cournot competition. From the previous expression, if firm i chooses q_i to maximize π_i, at the chosen level of q_i it must be that either increasing or reducing q_i by a small amount provides no increase in firm i’s profit:

∂π_i(q1, ..., q_{i−1}, q_i, q_{i+1}, ..., qn)/∂q_i = 0.
Definition 7.1: Call an output profile $(q_1^*,\ldots,q_n^*)$ a Cournot equilibrium if no firm has an incentive to change output level.

This condition (that no firm has an incentive to change output) will be true if, for each $i$,
$$\pi_i(q_1^*,\ldots,q_{i-1}^*,q_i,q_{i+1}^*,\ldots,q_n^*)$$
is maximized at $q_i = q_i^*$, taking $q_j^*$ constant for $j \ne i$. A necessary condition for maximization of $\pi_i(q_1^*,\ldots,q_{i-1}^*,q_i,q_{i+1}^*,\ldots,q_n^*)$ by choice of $q_i$ is that the (partial) derivative with respect to $q_i$ equals 0 at $q_i = q_i^*$:
$$\frac{\partial}{\partial q_i}\pi_i(q_1^*,\ldots,q_{i-1}^*,q_i^*,q_{i+1}^*,\ldots,q_n^*) = 0.$$
(Here, $\frac{\partial}{\partial q_i}\pi_i(q_1^*,\ldots,q_{i-1}^*,q_i^*,q_{i+1}^*,\ldots,q_n^*) = 0$ is understood to mean the derivative $\frac{\partial}{\partial q_i}\pi_i(q_1^*,\ldots,q_{i-1}^*,q_i,q_{i+1}^*,\ldots,q_n^*)$ evaluated at $q_i = q_i^*$.)
CHAPTER 7. SYSTEMS OF EQUATIONS, DIFFERENTIALS AND DERIVATIVES
232
Whether this condition is sufficient for the output vector to be an equilibrium is taken up later. The next section applies these calculations in the case where demand and cost functions are linear.
7.5.1 Oligopoly: linear demand

To illustrate the calculations, consider the case of linear demand $P(Q) = a - bQ$, and let firm $i$'s cost be given by $c(q_i) = cq_i$, so that every firm faces the same constant marginal cost, $c$, and 0 fixed cost. Then
$$\pi_i = q_i\left(a - b\sum_{j=1}^{n} q_j\right) - cq_i = aq_i - bq_i\sum_{j=1}^{n} q_j - cq_i = aq_i - bq_i\sum_{j \ne i} q_j - bq_i^2 - cq_i.$$
Taking the partial derivative of $\pi_i$ with respect to $q_i$ and setting to zero gives:
$$\frac{\partial \pi_i}{\partial q_i} = a - c - 2bq_i - b\sum_{j \ne i} q_j = 0,$$
and rearranging the equality:
$$\frac{(a-c)}{b} = 2q_i + \sum_{j \ne i} q_j.$$
This gives a system of equations:
$$\begin{pmatrix} 2 & 1 & \cdots & 1 \\ 1 & 2 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 2 \end{pmatrix}\begin{pmatrix} q_1 \\ q_2 \\ \vdots \\ q_n \end{pmatrix} = \begin{pmatrix} \frac{a-c}{b} \\ \frac{a-c}{b} \\ \vdots \\ \frac{a-c}{b} \end{pmatrix},$$
which has the solution: $(q_1^*,\ldots,q_n^*) = \left(\frac{a-c}{(n+1)b},\ldots,\frac{a-c}{(n+1)b}\right)$. (To see this, note that if the $q_i$ are all equal and satisfy any equation, then they satisfy all equations, because with $q_i = q^*$, each equation requires $(n+1)q^* = \frac{a-c}{b}$.) Thus, $q_i^* = q^* = \frac{1}{n+1}\frac{a-c}{b}$ for each $i$, and
$$\frac{\partial}{\partial q_i}\pi_i(q_1^*,\ldots,q_{i-1}^*,q_i^*,q_{i+1}^*,\ldots,q_n^*) = 0$$
is satisfied. This is called a symmetric equilibrium, since each firm makes the same choice. In fact, the solution might have been determined more easily. From the condition
$$\frac{\partial \pi_i}{\partial q_i} = a - b\sum_{j \ne i} q_j - 2bq_i - c = 0,$$
assuming that the equilibrium is symmetric (each firm makes the same choice), this will be satisfied by the symmetric choice $q_i = q^*$, so that $\sum_{j \ne i} q_j = (n-1)q^*$, giving $a - b(n-1)q^* - 2bq^* - c = 0$, or $a - c - b(n+1)q^* = 0$, or $q^* = \frac{1}{n+1}\frac{a-c}{b}$. In this case, total output is $Q_o = nq^* = \frac{n}{n+1}\frac{a-c}{b}$.
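As a numerical sanity check of these formulas, the sketch below (our own; the parameter values are arbitrary) solves the linear system of first-order conditions directly and compares with the closed form $q^* = \frac{1}{n+1}\frac{a-c}{b}$:

```python
def cournot_linear(n, a, b, c):
    """Solve the n-firm linear Cournot system by Gaussian elimination.

    The system has 2 on the diagonal and 1 elsewhere, with
    right-hand side (a - c)/b in every row.
    """
    A = [[2.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    rhs = [(a - c) / b] * n
    # Forward elimination (no pivoting needed: A is diagonally dominant).
    for k in range(n):
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            rhs[i] -= m * rhs[k]
    # Back substitution.
    q = [0.0] * n
    for i in reversed(range(n)):
        q[i] = (rhs[i] - sum(A[i][j] * q[j] for j in range(i + 1, n))) / A[i][i]
    return q

n, a, b, c = 5, 10.0, 1.0, 2.0
q = cournot_linear(n, a, b, c)
q_star = (a - c) / ((n + 1) * b)           # closed form
print(q[0], q_star)

# Firm 1's first-order condition a - c - 2b q_1 - b sum_{j!=1} q_j = 0 holds.
foc = a - c - 2 * b * q[0] - b * sum(q[1:])
print(abs(foc))                             # ~0
```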
Remark 7.2: Recall that the competitive quantity, $Q_c$, is the quantity that leads to price equaling marginal cost: $P(Q_c) = c$. In the linear case, $Q_c = \frac{(a-c)}{b}$, so that $Q_o$, the equilibrium oligopoly output, is related to $Q_c$ according to $Q_o = \frac{n}{n+1}Q_c$. The price associated with the oligopoly output is $P(Q_o) = a - bQ_o = a - b\frac{n}{n+1}Q_c$. Substituting in the value of $Q_c$:
$$P(Q_o) = a - b\frac{n}{n+1}\frac{(a-c)}{b} = a - \frac{n}{n+1}(a-c) = \frac{a(n+1) - n(a-c)}{n+1} = \frac{a + nc}{n+1} = \frac{a}{n+1} + c\frac{n}{n+1}.$$
To summarize:
$$Q_o = \frac{n}{n+1}Q_c, \quad \text{and} \quad P(Q_o) = \frac{1}{n+1}a + \frac{n}{n+1}c.$$
As the number of firms becomes large ($n \to \infty$), the quantity converges to the competitive quantity, $Q_o \to Q_c$, and the price converges to the competitive price (marginal cost), since $\frac{a}{n+1} + \frac{n}{n+1}c \to c$.
7.5.2 Oligopoly: nonlinear demand

In the general case of nonlinear price and cost functions, the profit function is:
$$\pi_i(q_1,\ldots,q_n) = q_i P\left(\sum_{i=1}^{n} q_i\right) - c(q_i).$$
Repeating the earlier calculations:
$$\frac{\partial \pi_i}{\partial q_i} = P\left(\sum_{i=1}^{n} q_i\right) + q_i\cdot\frac{dP}{dQ} - c'(q_i) = 0.$$
In a symmetric equilibrium, each firm produces the same quantity, say $q^*$, and this must satisfy the equation:
$$P(nq^*) + q^* P'(nq^*) - c'(q^*) = 0$$
$$P(nq^*)\left\{1 + \frac{1}{n}\left[\frac{nq^*}{P(nq^*)}P'(nq^*)\right]\right\} = c'(q^*),$$
or, using $Q^* = nq^*$,
$$P(Q^*)\left\{1 + \frac{1}{n}\left[\frac{Q^*}{P(Q^*)}P'(Q^*)\right]\right\} = c'(q^*),$$
and with $\eta(Q)$ denoting the elasticity of demand at $Q$, $\eta(Q) = -\frac{P}{Q}\frac{dQ}{dP}$,
$$P(Q^*)\left[1 - \frac{1}{n}\frac{1}{\eta(Q^*)}\right] = c'(q^*).$$
Thus, again price converges to marginal cost as n → ∞. This is a common theme
in economic models: competition (a large number of firms) leads to efficient levels of output.
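To see the condition at work, the sketch below (our own illustration; the demand curve $P(Q) = a - bQ^2$, linear cost $c(q) = cq$ and the parameter values are arbitrary choices) solves the symmetric first-order condition $P(nq) + qP'(nq) - c'(q) = 0$ by bisection and then checks the markup condition $P(Q^*)\left[1 - \frac{1}{n\eta(Q^*)}\right] = c$:

```python
a, b, c, n = 10.0, 1.0, 2.0, 4

def P(Q):                 # assumed nonlinear inverse demand
    return a - b * Q**2

def Pprime(Q):
    return -2 * b * Q

def foc(q):
    """Symmetric first-order condition P(nq) + q P'(nq) - c, for cost c*q."""
    Q = n * q
    return P(Q) + q * Pprime(Q) - c

# Bisection: foc > 0 near q = 0, foc < 0 where the price reaches zero.
lo, hi = 1e-9, (a / b) ** 0.5 / n
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if foc(mid) > 0 else (lo, mid)
q = 0.5 * (lo + hi)

Q = n * q
eta = P(Q) / (2 * b * Q**2)   # elasticity -(P/Q)(dQ/dP) for this demand
print(P(Q) * (1 - 1 / (n * eta)), c)   # markup condition: both sides equal
```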
7.5.3 Industry concentration indices

The previous discussion suggests that when a small number of firms are supplying a market, there is less competition, output is lower and prices higher. In industrial organization, a market is said to be "concentrated" if there are a small number of suppliers or if a small number of firms dominate the market. A natural question is: when is a market too concentrated, in the sense that market dominance of a few firms leads to large inefficiencies? One measure of concentration is the Herfindahl–Hirschman index. This index is a simple function of market shares of firms. The discussion here relates that index to the Cournot model using the techniques developed earlier. When there are a small number of firms in an industry, some firms will have a larger share of industry sales. If one or two firms have a large share of total sales, the industry is considered to be concentrated. The Herfindahl–Hirschman index ($HHI$) of concentration in an industry is defined:
$$HHI = (100s_1)^2 + (100s_2)^2 + \cdots + (100s_n)^2 = 100^2\left(\sum_{i=1}^{n} s_i^2\right),$$
where $s_i$ is the share of firm $i$ of industry sales. This sum is minimized when the shares are all equal, so that larger values of $HHI$ suggest greater industry concentration. (For example, with just two firms, $s_1 + s_2 = 1$, so that $s_1^2 + s_2^2 = s_1^2 + (1-s_1)^2$. Differentiating with respect to $s_1$ and setting to 0 gives $2s_1 - 2(1-s_1) = 4s_1 - 2 = 0$. So $s_1 = \frac{1}{2}$, and this corresponds to a minimum, since the second derivative of $s_1^2 + (1-s_1)^2$ equals 4, which is positive.) With $n$ firms, the minimum occurs at $s_i = \frac{1}{n}$ for all $i$. In this case, $s_i^2 = \frac{1}{n^2}$ and $\sum_{i=1}^{n} s_i^2 = n\frac{1}{n^2} = \frac{1}{n}$. Consequently, the smallest possible value for $HHI$ is $100^2\frac{1}{n}$, when there are $n$ firms. (In fact, viewing the $HHI$ index as a function of the shares, $HHI(s_1,\ldots,s_n)$, the total change is $d(HHI) = 100^2 \cdot 2\cdot[s_1 ds_1 + s_2 ds_2 + \cdots + s_n ds_n]$, where the share changes must add to 0: $\sum_{i=1}^{n} ds_i = 0$. If, for some $j, k$, $s_j > s_k$, then setting $ds_k = -ds_j$ with $ds_j < 0$ decreases the $HHI$. Thus, $s_j = s_k$ is necessary for a minimum.)
Consider now the firm's optimizing condition in choosing $q_i$. In what follows, allow different firms to have different cost functions: the cost function of firm $i$ is $c_i(q_i)$. Maximizing profit for $i$ gives the first-order condition:
$$P\left(\sum_{i=1}^{n} q_i\right) + q_i\cdot\frac{dP}{dQ} - c_i'(q_i) = 0.$$
Write $Q = \sum_i q_i$, so that $s_i = \frac{q_i}{Q}$. Rearrange the condition as:
$$P(Q) - c_i'(q_i) + q_i\cdot P'(Q) = 0.$$
Divide by $P(Q)$:
$$\frac{P(Q) - c_i'(q_i)}{P(Q)} + q_i\cdot\frac{P'(Q)}{P(Q)} = 0.$$
Multiply by $\frac{q_i}{Q}$:
$$\frac{q_i}{Q}\,\frac{P(Q) - c_i'(q_i)}{P(Q)} + Q\cdot\frac{P'(Q)}{P(Q)}\,\frac{q_i^2}{Q^2} = 0.$$
Recalling that $s_i = \frac{q_i}{Q}$ and
$$\eta(Q) = -\frac{P}{Q}\frac{dQ}{dP} \;\Rightarrow\; \frac{1}{\eta(Q)} = -\frac{Q}{P}\frac{dP}{dQ} = -\frac{Q}{P}P'(Q) = -Q\frac{P'(Q)}{P(Q)},$$
this expression may be written:
$$s_i\,\frac{P(Q) - c_i'(q_i)}{P(Q)} - \frac{1}{\eta(Q)}s_i^2 = 0.$$
So,
$$\sum_{i=1}^{n} s_i\,\frac{P(Q) - c_i'(q_i)}{P(Q)} = \frac{1}{\eta(Q)}\sum_{i=1}^{n} s_i^2 = \frac{1}{\eta(Q)}\frac{1}{100^2}\,100^2\sum_{i=1}^{n} s_i^2 = \frac{1}{\eta(Q)}\frac{HHI}{100^2}.$$
Since $\frac{P(Q) - c_i'(q_i)}{P(Q)}$ is the price over cost margin of firm $i$, $\sum_{i=1}^{n} s_i\,\frac{P(Q) - c_i'(q_i)}{P(Q)}$ is the weighted average of these margins. So, the higher is the $HHI$, the higher is the average price–cost margin or markup.
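This relationship can be checked numerically. The sketch below is our own construction (linear demand $P(Q) = a - bQ$ and arbitrary firm-specific marginal costs $c_i$): it solves the asymmetric Cournot first-order conditions in closed form and verifies that the share-weighted margin equals $\frac{1}{\eta(Q)}\frac{HHI}{100^2}$:

```python
a, b = 20.0, 1.0
costs = [2.0, 3.0, 4.0]            # firm-specific marginal costs c_i
n = len(costs)

# With P = a - bQ, firm i's FOC is a - bQ - b*q_i - c_i = 0.
# Summing over firms gives Q = (n*a - sum(c_i)) / ((n+1)*b).
Q = (n * a - sum(costs)) / ((n + 1) * b)
q = [(a - c) / b - Q for c in costs]

P = a - b * Q
shares = [qi / Q for qi in q]
hhi = 100**2 * sum(s**2 for s in shares)
eta = P / (b * Q)                   # elasticity of demand at Q for P = a - bQ

weighted_margin = sum(s * (P - c) / P for s, c in zip(shares, costs))
print(weighted_margin, (hhi / 100**2) / eta)   # equal
```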
7.6 Total differentials

The partial derivative of a function $f(x_1,\ldots,x_n)$ with respect to a variable $x_i$ measures the impact of a change in $x_i$ on the value of the function $f$. If $f_{x_i}$ is the partial derivative of $f$ with respect to $x_i$, then the impact of a small change in $x_i$, $\Delta x_i$, on $f$ is approximately $f_{x_i}\Delta x_i$. One may consider the impact of multiple changes in all the $x_i$ at the same time, to consider a total variation in $f$ resulting from a set of small changes in all the $x_i$. Given a function $f(x_1,\ldots,x_n)$, the total differential of $f$, denoted $df$, is defined:
$$df = \sum_{i=1}^{n}\frac{\partial f}{\partial x_i}dx_i = \sum_{i=1}^{n} f_{x_i}dx_i. \qquad (7.3)$$
Note that $df$ depends on $x = (x_1,\ldots,x_n)$ and $dx = (dx_1,\ldots,dx_n)$. Put $\frac{\partial f}{\partial x} = \left(\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\ldots,\frac{\partial f}{\partial x_n}\right)$ and $dx = (dx_1, dx_2,\ldots,dx_n)$. Then $df = \frac{\partial f}{\partial x}\cdot dx$. In some branches of mathematics, this sum is referred to as a "1-form", consisting of the inner product of $(f_{x_1},\ldots,f_{x_n})$ and the vector direction $(dx_1,\ldots,dx_n)$. Given $(x_1,\ldots,x_n)$, the partial derivatives $f_{x_i}(x_1,\ldots,x_n)$ are determined, and Equation 7.3 is an equation in the changes, the $dx_i$.
Example 7.7: The following functions illustrate the total differential:

1. Let $f(x,y) = x^2\ln(y)$. Then
$$df = 2x\ln(y)\,dx + \frac{x^2}{y}\,dy.$$
2. Let $f(x,y) = \sin(x)\cos(y)\ln(xy)$. The total differential is:
$$df = \left[\cos(x)\cos(y)\ln(xy) + \sin(x)\cos(y)\frac{1}{x}\right]dx + \left[-\sin(x)\sin(y)\ln(xy) + \sin(x)\cos(y)\frac{1}{y}\right]dy.$$
3. Let $f(x,y) = e^{\tan(xy)}$. The total differential is:
$$df = e^{\tan(xy)}\left[\left(\frac{y}{\cos^2(xy)}\right)dx + \left(\frac{x}{\cos^2(xy)}\right)dy\right],$$
using the fact that $\frac{d}{dz}\tan(z) = \sec^2(z) = \frac{1}{\cos^2(z)}$.
4. Let $f(x,y) = \left(\frac{\ln(y)}{e^x}\right)^{\alpha}$. The total differential is:
$$df = \alpha\left(\frac{\ln(y)}{e^x}\right)^{\alpha-1}\left[\left(\frac{-\ln y}{e^x}\right)dx + \left(\frac{1}{ye^x}\right)dy\right].$$
♦
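These differentials can be verified against finite differences. A quick sketch (our own, for the first function in the example; the point and step sizes are arbitrary):

```python
import math

def f(x, y):
    return x**2 * math.log(y)

def df(x, y, dx, dy):
    """Total differential of f(x,y) = x^2 ln(y): f_x dx + f_y dy."""
    return 2 * x * math.log(y) * dx + (x**2 / y) * dy

x, y, dx, dy = 1.5, 2.0, 1e-5, -2e-5
actual = f(x + dx, y + dy) - f(x, y)
print(actual, df(x, y, dx, dy))   # agree to first order
```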
The partial derivative of a function $f(x,y)$, say with respect to $x$, $f_x(x,y)$, measures the change in $f(x,y)$ due to a small change in $x$, $dx$. This is approximately
$$f_x \approx \frac{f(x+dx, y) - f(x,y)}{dx}.$$
Thus, $f_x(x,y)dx \approx f(x+dx, y) - f(x,y)$ is the change in $f$ resulting from the change in $x$. So, $df = f_x(x,y)dx + f_y(x,y)dy$ measures the overall change in $f$ from the combined effects of changing $x$ and $y$. When the overall change is 0, $df = 0$, and so $0 = f_x(x,y)dx + f_y(x,y)dy$. This is an equation in $dx$ and $dy$, given $(x,y)$, which may be solved to give $dy = -\frac{f_x(x,y)}{f_y(x,y)}dx$, the change in $y$ necessary to keep the value of the function constant, given the change in $x$ of $dx$.
For example, consider the function $f(x,y) = ax + by$. Then $df = a\,dx + b\,dy$, since $f_x = a$ and $f_y = b$. If $df = 0$, then $0 = a\,dx + b\,dy$ and $dy = -\frac{a}{b}dx$. If $x$ is increased by $dx$, $y$ must be reduced by $dy = -\frac{a}{b}dx$ to maintain the value of the function constant. The following example utilizes this "trade-off" computation in an economics application (using the marginal rate of substitution). Sometimes, partial derivatives may be used in the simplification of a problem. The following example reduces a two-variable constrained optimization problem to a one-variable problem, where partial derivatives are then used to locate the optimum.

Utility is given as a function of income ($y$) and leisure ($l$): $u = u(y,l)$. If current income and leisure levels are $(y,l)$, then small changes in income and leisure, $dy$ and $dl$, produce a change in utility given by $du = u_y(y,l)dy + u_l(y,l)dl$, the total differential of $u$ at $(y,l)$. To simplify notation, write $du = u_y dy + u_l dl$. When this is set to 0, the equation $0 = u_y dy + u_l dl$ relates small changes in $y$ to small changes in $l$ that leave utility unchanged: $dy = -\frac{u_l}{u_y}dl$ or $dl = -\frac{u_y}{u_l}dy$. Thus, $\frac{dy}{dl} = -\frac{u_l}{u_y}$, giving the rate at which $y$ must be substituted for $l$ at the point $(y,l)$ to keep utility constant: as $l$ (leisure) increases, $y$ must be reduced at the rate $-\frac{u_l}{u_y}$ to keep utility constant. The partial derivatives $u_y$ and $u_l$ are the marginal utility of $y$ and $l$ respectively.
With work defined as available time (24) minus leisure ($l$), the time spent working is $24 - l$ and, at wage $\omega$, income is $y = \omega(24-l)$. Therefore, $dy = -\omega\,dl$. Substitute into the total differential: $du = u_y(-\omega\,dl) + u_l\,dl = [-\omega\cdot u_y + u_l]dl$. If utility is maximized, then it should not be possible for small changes in $l$ to increase utility. For this to be the case requires that $[-\omega\cdot u_y + u_l] = 0$, or $\omega\cdot u_y = u_l$, or $\omega = \frac{u_l}{u_y}$. Figure 7.6 depicts the solution. The same condition is obtained directly by optimization. The income constraint, $y = \omega(24-l)$, can be substituted into the utility function: $u(\omega(24-l), l) = u^*(l)$. The problem of maximizing utility reduces to maximizing $u^*$ with respect to $l$.
Figure 7.6: Optimal income–work combination
$$\max_{l} u^* \;\text{ gives }\; \frac{du^*}{dl} = 0 = \frac{\partial u}{\partial y}\frac{dy}{dl} + \frac{\partial u}{\partial l}.$$
Now, $\frac{dy}{dl} = -\omega$, from the budget constraint, so the first-order condition becomes $-u_y\omega + u_l = 0$, or
$$\omega = \frac{u_l}{u_y} = \frac{MU_l}{MU_y},$$
where $\frac{MU_l}{MU_y}$ is the ratio of marginal utilities. For a numerical example, suppose that $u(y,l) = y^{\frac{1}{2}}l^{\frac{1}{2}}$, and so:
$$u_y = \frac{1}{2}y^{-\frac{1}{2}}l^{\frac{1}{2}} = \frac{\frac{1}{2}y^{\frac{1}{2}}l^{\frac{1}{2}}}{y}; \qquad u_l = \frac{1}{2}y^{\frac{1}{2}}l^{-\frac{1}{2}} = \frac{\frac{1}{2}y^{\frac{1}{2}}l^{\frac{1}{2}}}{l}.$$
This gives $\omega = \frac{u_l}{u_y} = \frac{y}{l}$, or $\omega l = y$, as the optimizing condition. Recalling that $y = \omega(24-l)$, the optimizing condition may be written $\omega l = \omega(24-l)$. Thus, $2\omega l = 24\omega$, or $l = \frac{24}{2} = 12$ and $y = \omega l = 12\omega$.
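The reduction to one variable can be checked by maximizing $u^*(l) = u(\omega(24-l), l)$ directly. A sketch (our own; the wage value is arbitrary), using a simple grid search over leisure:

```python
def u_star(l, w=3.0):
    """Utility along the budget line: u(w*(24 - l), l) with u = sqrt(y*l)."""
    y = w * (24 - l)
    return (y * l) ** 0.5

# Grid search over leisure l in (0, 24).
grid = [i / 1000 * 24 for i in range(1, 1000)]
best = max(grid, key=u_star)
print(best)   # ~12, independent of the wage w
```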
Example 7.8: This example considers a standard profit maximization problem. Let a firm's profit be given by:
$$\pi(k,l) = Pf(k,l) - rk - wl.$$
The total derivative of $\pi$ is:
$$d\pi = \pi_k\,dk + \pi_l\,dl = (Pf_k - r)dk + (Pf_l - w)dl.$$
For profit to be maximized, it must be that there is no variation in $k$ or $l$ that raises profit. This implies that $(Pf_k - r) = 0$ and $(Pf_l - w) = 0$. If, for example, $(Pf_k - r) < 0$, then reducing $k$ ($dk < 0$) would raise profit, while if $(Pf_k - r) > 0$, then raising $k$ ($dk > 0$) would raise profit. This gives the first-order conditions:
$$\frac{\partial \pi}{\partial k} = Pf_k - r = 0, \quad f_k = \frac{\partial f}{\partial k}, \qquad \frac{\partial \pi}{\partial l} = Pf_l - w = 0, \quad f_l = \frac{\partial f}{\partial l}.$$
These conditions imply that $f_k = \frac{r}{P}$ and $f_l = \frac{w}{P}$.
Consider a specific functional form: $f(k,l) = k^{\alpha}l^{\beta}$. Then:
$$f_k = \alpha k^{\alpha-1}l^{\beta} = \alpha\frac{f}{k}, \quad \text{and} \quad f_l = \beta k^{\alpha}l^{\beta-1} = \beta\frac{f}{l}.$$
Rearranging, $\frac{f_k}{f_l} = \frac{\alpha}{\beta}\frac{l}{k} = \frac{r}{w}$, so that $\alpha wl = \beta rk$ and $k = \frac{\alpha w}{\beta r}l$. This determines $k$ as a function of $l$. To solve for $l$, use either of the first-order conditions. Taking $f_k = \alpha k^{\alpha-1}l^{\beta} = \frac{r}{P}$ and substituting for $k$ yields:
$$\alpha\left(\frac{\alpha w}{\beta r}\right)^{\alpha-1}l^{\alpha-1}l^{\beta} = \frac{r}{P}, \qquad \alpha\left(\frac{\alpha w}{\beta r}\right)^{\alpha-1}l^{(\beta+\alpha-1)} = \frac{r}{P}.$$
Solving for $l$:
$$l = \left\{\frac{r}{P}\frac{1}{\alpha}\left(\frac{\alpha w}{\beta r}\right)^{1-\alpha}\right\}^{1/(\alpha+\beta-1)}.$$
Similarly, the value of $k$ is then determined:
$$k = \frac{\alpha w}{\beta r}\left\{\frac{r}{P}\frac{1}{\alpha}\left(\frac{\alpha w}{\beta r}\right)^{1-\alpha}\right\}^{1/(\alpha+\beta-1)} = \left\{\left(\frac{\alpha w}{\beta r}\right)^{\alpha+\beta-1}\frac{r}{P}\frac{1}{\alpha}\left(\frac{\alpha w}{\beta r}\right)^{1-\alpha}\right\}^{1/(\alpha+\beta-1)}$$
$$= \left\{\left(\frac{\alpha w}{\beta r}\right)^{\beta}\frac{r}{P}\frac{1}{\alpha}\right\}^{1/(\alpha+\beta-1)} = \left\{\left(\frac{\alpha w}{\beta r}\right)^{\beta-1}\left(\frac{w}{\beta}\right)\frac{1}{P}\right\}^{1/(\alpha+\beta-1)} = \left\{\frac{w}{P}\frac{1}{\beta}\left(\frac{\beta r}{\alpha w}\right)^{1-\beta}\right\}^{1/(\alpha+\beta-1)}.$$
♦
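As a check on the algebra, the sketch below (our own; arbitrary parameter values with $\alpha + \beta < 1$) plugs the closed-form $l$ and $k$ back into the first-order conditions $f_k = r/P$ and $f_l = w/P$:

```python
alpha, beta = 0.3, 0.5          # decreasing returns: alpha + beta < 1
P, r, w = 2.0, 1.0, 1.5

e = 1.0 / (alpha + beta - 1.0)
l = ((r / P) * (1 / alpha) * (alpha * w / (beta * r)) ** (1 - alpha)) ** e
k = (alpha * w / (beta * r)) * l

f_k = alpha * k ** (alpha - 1) * l ** beta
f_l = beta * k ** alpha * l ** (beta - 1)
print(f_k - r / P, f_l - w / P)   # both ~0
```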
7.6.1 The impact of parameter changes

The following discussion illustrates the impact of parameter shifts on the solution to a pair of equations. A production function $q = f(x,y)$ gives the output, $q$, associated with inputs $x$ and $y$. Assume that raising input of either $x$ or $y$ raises output, $f_x > 0$ and $f_y > 0$. An isoquant identifies different combinations of $x$ and $y$ that give the same level of output. For example, $I_q = \{(x,y) \mid f(x,y) = q\}$ is the set of $(x,y)$ pairs that give output $q$. Under standard assumptions on the function $f$, an isoquant is convex to the origin, as depicted in Figure 7.7. Changing $x$ and $y$ by small amounts, $dx$ and $dy$, changes output according to $dq = df = f_x dx + f_y dy$. If the changes in $x$ and $y$ leave output unchanged, $0 = f_x dx + f_y dy$, then $(x+dx, y+dy)$ is on the same isoquant as $(x,y)$, as indicated in Figure 7.7, where $a = (x,y)$ and $b = (x+dx, y+dy)$. Now, suppose that the producer must use inputs $x$ and $y$ in the proportion $k$, so that the choice of $(x,y)$ must satisfy $y = kx$. Thus, output level $q$ is achieved with input usage $(x,y)$ at location $a$. Implicitly, the choice of $q$ and $k$ determines the values of $x$ and $y$: $(k,q) \to (x,y)$. To make this explicit, write $x = s(k,q)$ and $y = t(k,q)$. So $f(x,y) = q$ and $y = kx$ implicitly determine $(x,y)$. Consider a variation in $k$, $dk$. The total differential of $y = kx$ is $dy = x\,dk + k\,dx$, giving small variations in $y$, $k$, $x$ that permit the equation $y = kx$ to continue to be satisfied. Next, if $x$ and $y$ must change so as to leave output unchanged, $0 = f_x dx + f_y dy$, and this with the requirement $dy = x\,dk + k\,dx$ determines the way in which $x$ and $y$ change. Substituting the requirement $dy = x\,dk + k\,dx$ into $0 = f_x dx + f_y dy$ gives:
$$0 = f_x dx + f_y(x\,dk + k\,dx) = (f_x + f_y k)dx + f_y x\,dk.$$
Thus, $(f_x + f_y k)dx = -f_y x\,dk$, or $dx = -\frac{f_y x}{(f_x + f_y k)}dk$. This measures the change in $x$ due to a change in $k$, with $q$ unchanged. Dividing through by $dk$, it has the
Figure 7.7: A parameter change
interpretation as a partial derivative (since $q$ is held fixed):
$$\frac{\partial x}{\partial k} = -\frac{f_y x}{(f_x + f_y k)}. \qquad (7.4)$$
In a similar way, we can determine the impact of a change in $k$ on $y$. Recall that
$$dy = x\,dk + k\,dx = x\,dk + k\left[-\frac{f_y x}{(f_x + f_y k)}\right]dk = \left(x + k\left[-\frac{f_y x}{(f_x + f_y k)}\right]\right)dk.$$
Thus,
$$\frac{\partial y}{\partial k} = \left(x + k\left[-\frac{f_y x}{(f_x + f_y k)}\right]\right). \qquad (7.5)$$
In these two Equations, 7.4 and 7.5, the values of $x$ and $y$ are understood to be the solution values, located at $a$. Recall that when $x$ and $y$ are solved in terms of $(k,q)$, we have $x = s(k,q)$ and $y = t(k,q)$, so the calculation in Equation 7.4 corresponds to $\frac{\partial x}{\partial k} = \frac{\partial s}{\partial k}$. In general, determining the functions $s$ and $t$ may be difficult, depending on the functional form of $f$. Therefore, the calculations in Equations 7.4 and 7.5 provide insight into the impact of parameter variations, even though we have not solved for $x$ and $y$. In the case where the production function has the Cobb–Douglas form, $f(x,y) = x^{\alpha}y^{\beta}$, the functions may be found easily. Since $x^{\alpha}y^{\beta} = q$ and $y = kx$, substituting gives $x^{\alpha}(kx)^{\beta} = q$, or $k^{\beta}x^{\alpha+\beta} = q$, or $x^{\alpha+\beta} = \frac{q}{k^{\beta}}$. Thus,
$$x = s(k,q) = \left[\frac{q}{k^{\beta}}\right]^{\frac{1}{\alpha+\beta}}, \qquad y = t(k,q) = k\left[\frac{q}{k^{\beta}}\right]^{\frac{1}{\alpha+\beta}}.$$
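For the Cobb–Douglas case, Equation 7.4 can be checked against a finite difference of $s(k,q)$. A sketch (our own; arbitrary parameter values):

```python
alpha, beta = 0.4, 0.3
q, k = 5.0, 2.0

def s(k, q):
    """x = s(k, q) = (q / k^beta)^(1/(alpha+beta)) for f = x^alpha y^beta."""
    return (q / k**beta) ** (1 / (alpha + beta))

x = s(k, q)
y = k * x
f_x = alpha * x**(alpha - 1) * y**beta      # partials of f = x^a y^b
f_y = beta * x**alpha * y**(beta - 1)

dxdk_formula = -f_y * x / (f_x + f_y * k)   # Equation 7.4
h = 1e-6
dxdk_numeric = (s(k + h, q) - s(k - h, q)) / (2 * h)
print(dxdk_formula, dxdk_numeric)           # agree
```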
Remark 7.3: Considering the Cobb–Douglas function $q = x^{\alpha}y^{\beta}$, one might also consider this expression as having three parameters $(q, \alpha, \beta)$. With $y = kx$, the pair of equations $q = x^{\alpha}y^{\beta}$, $y = kx$ determine $(x,y)$ as functions of four parameters $(k, q, \alpha, \beta)$, in which case we might write $x = s^*(k, q, \alpha, \beta)$ and $y = t^*(k, q, \alpha, \beta)$. This changes none of the calculations above. However, with just two parameters $(k,q)$ there is a 1-1 connection between parameters and endogenous variables $(x,y)$, whereas with four parameters this is not possible.
7.7 Solutions to systems of equations

Consider a system of linear equations $Ay = x$, where $A$ is an $n \times n$ matrix, $y$ a vector of variables and $x$ a vector of parameters. Provided $A$ is nonsingular, we can solve for $y$: there is a unique solution $y = A^{-1}x$. Furthermore, the impact of variations in $x$ on the solution value of $y$ is easy to determine. If $x$ changes to $x + \Delta x$, the new solution value of $y$ is given by $y' = A^{-1}(x + \Delta x) = A^{-1}x + A^{-1}\Delta x = y + A^{-1}\Delta x$, so the change in $y$ is $\Delta y = y' - y = A^{-1}\Delta x$. In the nonlinear case, similar issues arise. In the two-equation, two-unknown system,
$$f_1(y_1, y_2) = x_1$$
$$f_2(y_1, y_2) = x_2,$$
where $(y_1, y_2)$ are the variables and $(x_1, x_2)$ the parameters, the following questions arise:

1. Does the equation system have a solution, $(y_1^*, y_2^*)$?
2. Is the solution unique?
3. Assuming there is a solution, how does it vary with the parameters $(x_1, x_2)$?
These questions are considered in detail next, but it is possible to indicate the main insights with a brief discussion. To address the first question (existence of a solution), consider the system:
$$f_1(y_1, y_2) = 0y_1 + 1y_2 = 1 \;(= x_1)$$
$$f_2(y_1, y_2) = e^{-y_1} + y_2 = 1 \;(= x_2).$$
Rearranging, the first equation gives $y_2 = 1$; the second, $y_2 = 1 - e^{-y_1}$. Regardless of how large $y_1$ is, $1 - e^{-y_1} < 1$, so the pair of equations has no solutions (Figure 7.8). However, if $x_2 = 1$ is changed to $x_2 = a$, as depicted, then there is a solution and it is unique. Finally, in this case, small variations in $(x_1, x_2)$ from the point $(x_1, x_2) = (1, a)$ lead to small variations in the solution, answering the third question.

Figure 7.8: No solution
Figure 7.9: Multiple solutions

The impact of parameter variation on a nonlinear system of equations is complicated by the fact that there may be multiple solutions. Even in the univariate
case, the issue arises, since an equation, $k = f(x)$, may have multiple solutions. Consider the function $f(x)$ depicted in Figure 7.9 and the equation $k = f(x)$, which has multiple solutions, at $x'$, $x$ and $x''$. However, if we identify a particular solution, $x$, then as we vary $k$, the solution should vary smoothly with $k$. As $k$ varies in the region $A = (a,b)$, the solution $x$ varies in the region $(c,d)$. If $k$ increases to $b$, the solution $x$ moves to $c$: with $\Delta k = b - k$ and $\Delta x = c - x$, $\frac{\Delta x}{\Delta k} = \frac{c-x}{b-k}$. Noting that $dk = f'(x)dx$, with $f'(x) \ne 0$, $\frac{dx}{dk} = \frac{1}{f'(x)}$. Note that this "works" because $f'(x) \ne 0$, and we have a well-defined (implicit) function $g: (a,b) \to (c,d)$ with $g(k) = x$. The next section explores this idea in the multivariate case.
7.7.1 Existence and properties of solutions

To motivate the discussion, recall that in the linear case, if $A$ is a square $n \times n$ nonsingular matrix, then the equation system $Ay = x$ has a unique solution, $y = A^{-1}x$. If $A$ is singular, there are two possibilities – the system has no solution or an infinite number of solutions. Observe that if both $y^*$ and $\bar{y}$ solve $Ay = x$, then for any number $\theta$,
$$A(\theta y^* + (1-\theta)\bar{y}) = \theta Ay^* + (1-\theta)A\bar{y} = \theta x + (1-\theta)x = x.$$
So, if $y^*$ and $\bar{y}$ solve the equation $Ay = x$, then so does $(\theta y^* + (1-\theta)\bar{y})$ for any value of $\theta$. So, when there are at least two solutions, there are an infinite number of solutions. Notice also that if $\theta$ is close to 1, $(1-\theta)$ is close to 0 and $(\theta y^* + (1-\theta)\bar{y})$ is close to $y^*$. So, when there is more than one solution, every solution has other solutions arbitrarily close. These properties contrast with the situation in the nonlinear case. Consider the equation pair:
$$y_1^2 + y_2^2 = x_1 \qquad (7.6)$$
$$y_1 + y_2 = x_2. \qquad (7.7)$$
The first equation is a circle (of radius $\sqrt{x_1}$); the second is a linear equation. When $x_1 = 1$ and $x_2 = 1$, both equations pass through the points $(y_1, y_2) = (1,0)$ and $(y_1, y_2) = (0,1)$, points $b$ and $a$ respectively. The pair of equations has those two points as solutions and no others, as depicted in Figure 7.10. If the diameter of the circle were smaller, defined by an equation $y_1^2 + y_2^2 = \hat{x}_1$ with $\hat{x}_1 < \frac{1}{2}$, as indicated by the dashed line, there would be no solution: the pair of equations is inconsistent. This example suggests two observations: it is difficult to determine whether or not a nonlinear system of equations has a solution; and
Figure 7.10: A system with two solutions

"typically", if a nonlinear system has more than one solution, the solutions are a distance apart: here at $(y_1, y_2) = (1,0)$ and $(y_1, y_2) = (0,1)$. This local uniqueness is
exploited in the implicit function theorem. Given the equation pair,
$$f_1(y_1, y_2) = x_1$$
$$f_2(y_1, y_2) = x_2,$$
one may ask if, given $x = (x_1, x_2)$, the value of $y = (y_1, y_2)$ is uniquely determined, assuming there is a solution, $y$. In the two-equation example, $y_1^2 + y_2^2 = x_1$ and $y_1 + y_2 = x_2$, when $x_1 = 1 = x_2$, there are two possible solutions for the $y$ values: $(y_1, y_2) = (1,0)$, at $b$, and $(y_1, y_2) = (0,1)$, at $a$. However, there is a unique solution if we restrict attention to a neighborhood of $a$ or of $b$. In Figure 7.10, if $(x_1', x_2')$ is close to $(x_1, x_2)$, then there is a unique point close to $a$ that satisfies the pair of equations $y_1^2 + y_2^2 = x_1'$ and $y_1 + y_2 = x_2'$: the point $c$. This observation may be used to determine the local variation in $(y_1, y_2)$ resulting from a variation in $(x_1, x_2)$. To see this, consider the pair of equations
$y_1^2 + y_2^2 = x_1$, $y_1 + y_2 = x_2$, and take total differentials:
$$2y_1\,dy_1 + 2y_2\,dy_2 = dx_1$$
$$1dy_1 + 1dy_2 = dx_2.$$
Writing in matrix form and solving for $(dy_1, dy_2)$:
$$\begin{pmatrix} 2y_1 & 2y_2 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} dy_1 \\ dy_2 \end{pmatrix} = \begin{pmatrix} dx_1 \\ dx_2 \end{pmatrix}, \qquad \begin{pmatrix} dy_1 \\ dy_2 \end{pmatrix} = \begin{pmatrix} 2y_1 & 2y_2 \\ 1 & 1 \end{pmatrix}^{-1}\begin{pmatrix} dx_1 \\ dx_2 \end{pmatrix}.$$
Thus,
$$\begin{pmatrix} dy_1 \\ dy_2 \end{pmatrix} = \begin{pmatrix} \frac{1}{2(y_1-y_2)} & -\frac{y_2}{(y_1-y_2)} \\ -\frac{1}{2(y_1-y_2)} & \frac{y_1}{(y_1-y_2)} \end{pmatrix}\begin{pmatrix} dx_1 \\ dx_2 \end{pmatrix}.$$
At point $a$ in Figure 7.10, $(y_1, y_2) = (0,1)$, and substituting this into the coefficient matrix gives:
$$\begin{pmatrix} dy_1 \\ dy_2 \end{pmatrix} = \begin{pmatrix} -\frac{1}{2} & 1 \\ \frac{1}{2} & 0 \end{pmatrix}\begin{pmatrix} dx_1 \\ dx_2 \end{pmatrix}.$$
This may be compared with Figure 7.10 at $a$. First suppose that $dx_2 = 0$ and $dx_1 > 0$ – this corresponds to a dilation of the circle. Then $dy_1 = -\frac{1}{2}dx_1$ and $dy_2 = \frac{1}{2}dx_1$, indicated by the move from point $a$ to point $d$. If $dx_1$ is set to 0 and $dx_2 > 0$, this corresponds to a right shift in the straight line. Then (approximately) $dy_1 = dx_2$ and $dy_2 = 0$, which corresponds to a direct move to the right from $a$ along the circle to point $e$. Note that this is approximate, with $dx_2$ taken to be very small (so the movement from $a$ to $e$ is horizontal). In considering the general case, the question of whether a nonlinear system has a solution, or not, will not be considered. The entire discussion focuses on the properties of solutions, on the assumption that the equation system has at least one solution. Let $x, y \in \mathbb{R}^n$, and suppose that $x_j = f_j(y_1, y_2, \ldots, y_n)$, $j = 1, \ldots, n$. This gives an equation system:
$$x_1 = f_1(y_1, y_2, \ldots, y_n)$$
$$x_2 = f_2(y_1, y_2, \ldots, y_n)$$
$$\vdots$$
$$x_n = f_n(y_1, y_2, \ldots, y_n). \qquad (7.8)$$
Writing the vector of functions as $f = (f_1, \ldots, f_n)$, this can be written succinctly as $x = f(y)$, with $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$. Given the equation system $x = f(y)$, consider a variation of $y$ to some point $\tilde{y}$ in some neighborhood of $y$, $N(y)$. This determines a new value for $x$: $\tilde{x} = f(\tilde{y})$. Provided $f$ is continuous, $\tilde{x}$ will be close to $x$ because $\tilde{y}$ is close to $y$. Now, suppose that for any $\tilde{y} \in N(y)$, with $\tilde{y} \ne y$, $\tilde{x} = f(\tilde{y}) \ne f(y) = x$. In this case, the function $f$ is 1-1 in the neighborhood of $y$: different values of $y$ determine different values of $x$. Therefore, when $\tilde{x}$ is close to $x$, there is a unique $\tilde{y}$ in the neighborhood of $y$ with $\tilde{x} = f(\tilde{y})$. In this case, $y$ is implicitly defined as a function of $x$, and so one can compute the impact on any given solution value $y$ of a variation in $x$. Referring to Figure 7.10, at $x = (x_1, x_2) = (1,1)$ there are two values of $y$ that satisfy $x = f(y)$: $y = a$ and $y = b$. Considering the point $y = a$, when $x$ changes to $\tilde{x} = (x_1', x_2')$, the corresponding value of $y$ moves from $y = a$ to $\tilde{y} = c$, the solution in the neighborhood of $a$. Thus, $y$ is implicitly defined as a function of $x$ in this
neighborhood. In this way, we can associate to each point in a neighborhood of $(x_1, x_2)$ a unique point in a neighborhood of $a$. And, in this case, we can study how $y$ varies in response to variations in $x$. Considering equation $j$, $x_j = f_j(y_1, \ldots, y_n)$, in Equation system 7.8,
$$dx_j = \frac{\partial f_j}{\partial y_1}dy_1 + \cdots + \frac{\partial f_j}{\partial y_n}dy_n,$$
and the entire system may be written:
$$\begin{pmatrix} dx_1 \\ \vdots \\ dx_n \end{pmatrix} = \begin{pmatrix} \frac{\partial f_1}{\partial y_1} & \cdots & \frac{\partial f_1}{\partial y_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial y_1} & \cdots & \frac{\partial f_n}{\partial y_n} \end{pmatrix}\begin{pmatrix} dy_1 \\ \vdots \\ dy_n \end{pmatrix}.$$
In abbreviated form: $dx = J\,dy$, and provided $J$ is nonsingular, $dy = J^{-1}dx$. The study of such systems is taken up next.
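The relation $dy = J^{-1}dx$ can be illustrated with the circle–line system above. The sketch below (our own) perturbs $(x_1, x_2)$ near the solution $a = (0,1)$, solves the system exactly on that branch, and compares the exact change in $y$ with $J^{-1}dx$:

```python
def solve_near_a(x1, x2):
    """Solution branch of y1^2 + y2^2 = x1, y1 + y2 = x2 through a = (0, 1)."""
    y1 = (x2 - (2 * x1 - x2**2) ** 0.5) / 2
    return y1, x2 - y1

y1, y2 = solve_near_a(1.0, 1.0)    # the point a = (0, 1)

# Inverse of J = [[2*y1, 2*y2], [1, 1]], computed by hand for 2x2.
det = 2 * y1 - 2 * y2
Jinv = [[1 / det, -2 * y2 / det], [-1 / det, 2 * y1 / det]]

dx1, dx2 = 1e-4, -2e-4
dy1_lin = Jinv[0][0] * dx1 + Jinv[0][1] * dx2
dy2_lin = Jinv[1][0] * dx1 + Jinv[1][1] * dx2

z1, z2 = solve_near_a(1.0 + dx1, 1.0 + dx2)
print(z1 - y1, dy1_lin)   # agree to first order
print(z2 - y2, dy2_lin)
```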
7.8 The implicit function theorem
For the square linear system A y = x, if A has a nonzero determinant (is nonsingular), there are two implications: (1) the system has a solution and (2) the
solution is unique. Then as we vary x to x′ , y must vary to y′ , to satisfy A y′ = x′ .
(Because y = A −1 x = Bx uniquely determines y as a function of x.) The key requirement is that A is nonsingular. For the nonlinear equation system, there
is an analogous criterion based on a Jacobian determinant. This criterion, a nonzero Jacobian determinant, provides information about the local uniqueness of a solution, but not about the existence of a solution. The Jacobian determinant also provides information about how solutions vary with parameters. Given the system of equations $x = f(y)$, where the $x_i$ are fixed constants, let $y^*$ be a solution: $x = f(y^*)$. Given a solution $y^*$, under what circumstances is this the only solution in a neighborhood of $y^*$? And, if $x = f(y^*)$, under what circumstances does a small change in $x$ lead to a small change in the solution? For both questions, a sufficient condition is that the Jacobian determinant (introduced below) is nonzero at $y^*$. The Jacobian of the equation system is defined as $J = \left(\frac{\partial x}{\partial y}\right) = \left\{\frac{\partial x_i}{\partial y_j}\right\}$, $i, j = 1, \ldots, n$:
$$J = \begin{pmatrix}
\frac{\partial f_1}{\partial y_1} & \frac{\partial f_1}{\partial y_2} & \cdots & \frac{\partial f_1}{\partial y_i} & \cdots & \frac{\partial f_1}{\partial y_n} \\
\frac{\partial f_2}{\partial y_1} & \frac{\partial f_2}{\partial y_2} & \cdots & \frac{\partial f_2}{\partial y_i} & \cdots & \frac{\partial f_2}{\partial y_n} \\
\vdots & \vdots & \ddots & \vdots & & \vdots \\
\frac{\partial f_i}{\partial y_1} & \frac{\partial f_i}{\partial y_2} & \cdots & \frac{\partial f_i}{\partial y_i} & \cdots & \frac{\partial f_i}{\partial y_n} \\
\vdots & \vdots & & \vdots & \ddots & \vdots \\
\frac{\partial f_n}{\partial y_1} & \frac{\partial f_n}{\partial y_2} & \cdots & \frac{\partial f_n}{\partial y_i} & \cdots & \frac{\partial f_n}{\partial y_n}
\end{pmatrix}$$
Before discussing the Jacobian in detail, an example will illustrate its role.

Example 7.9: Consider the pair of equations:
$$x_1 = 1 = xy = f_1(x,y)$$
$$x_2 = 0 = y - x^2 = f_2(x,y).$$
The Jacobian of the system is:
$$J = \begin{pmatrix} \frac{\partial f_1}{\partial x} & \frac{\partial f_1}{\partial y} \\ \frac{\partial f_2}{\partial x} & \frac{\partial f_2}{\partial y} \end{pmatrix} = \begin{pmatrix} y & x \\ -2x & 1 \end{pmatrix}.$$
So, the Jacobian determinant is $|J| = y + 2x^2$, which is nonzero at the solution. Note that the two curves "cut", providing a well-defined solution. The nonzero Jacobian determinant confirms this. Furthermore, variations in one of the parameters, $(x_1, x_2)$, will vary the solution a small amount. This also is an implication of the nonzero Jacobian at the solution point $(x^*, y^*)$. ♦
A key property of $J$ is that when there is dependency between the equations $x_i = f_i(y_1, \ldots, y_n)$, $i = 1, \ldots, n$, the determinant of $J$ is zero. ($|J|$ is called the Jacobian determinant.) To see why this is so, suppose that the $n$th equation can be written as a function of the other $n-1$ equations: $x_n = \varphi(f_1, f_2, \ldots, f_{n-1})$. Differentiate this equation to get:
$$\frac{\partial x_n}{\partial y'} = \sum_{i=1}^{n-1}\frac{\partial \varphi}{\partial f_i}\frac{\partial f_i}{\partial y'} = \sum_{i=1}^{n-1}\varphi_i\frac{\partial f_i}{\partial y'} = \sum_{i=1}^{n-1}\varphi_i\frac{\partial x_i}{\partial y'}, \qquad \varphi_i = \frac{\partial \varphi}{\partial f_i}.$$
This shows that the $n$th row of $J$ is a linear combination of the first $(n-1)$ rows, and this implies that $|J| = 0$.
Example 7.10: For example, let $x_1 = f_1(y_1, y_2) = y_1 + y_2$ and $x_2 = f_2(y_1, y_2) = y_1^2 + 2y_1y_2 + y_2^2$. Observe that $y_1^2 + 2y_1y_2 + y_2^2 = (f_1(y_1, y_2))^2$, so that the equations are dependent (nonlinearly). In the notation above, $x_2 = \varphi(f_1)$ or $f_2 = \varphi(f_1)$.
$$J = \begin{pmatrix} \frac{\partial x_1}{\partial y_1} & \frac{\partial x_1}{\partial y_2} \\ \frac{\partial x_2}{\partial y_1} & \frac{\partial x_2}{\partial y_2} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 2y_1 + 2y_2 & 2y_1 + 2y_2 \end{pmatrix} = \begin{pmatrix} \frac{\partial f_1}{\partial y_1} & \frac{\partial f_1}{\partial y_2} \\ 2f_1\frac{\partial f_1}{\partial y_1} & 2f_1\frac{\partial f_1}{\partial y_2} \end{pmatrix}.$$
Thus, $|J| = 2f_1\left(\frac{\partial f_1}{\partial y_1}\right)\left(\frac{\partial f_1}{\partial y_2}\right) - 2f_1\left(\frac{\partial f_1}{\partial y_2}\right)\left(\frac{\partial f_1}{\partial y_1}\right) = 0$. If $x_2 = x_1^2$, there are an infinite number of solutions and $(y_1, y_2)$ is not determined in terms of $(x_1, x_2)$. If $x_2 \ne x_1^2$, there is no solution. ♦
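A quick numerical illustration (our own) of the vanishing determinant for this dependent pair:

```python
def jacobian_det(y1, y2):
    """|J| for f1 = y1 + y2 and f2 = y1^2 + 2*y1*y2 + y2^2 = f1^2."""
    a, b = 1.0, 1.0               # row of partials of f1
    c = d = 2 * y1 + 2 * y2       # row of partials of f2
    return a * d - b * c

print(jacobian_det(0.7, -2.3))    # 0.0 at any point
```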
The condition $|J| \ne 0$ is necessary for a system of equations to be independent. Note, from the example, that this condition detects linear and nonlinear dependence. However, the Jacobian determinant does not provide information about the existence of a solution.
Example 7.11: Consider again the equation system:
$$y_1^2 + y_2^2 = x_1$$
$$y_1 + y_2 = x_2.$$
The Jacobian is:
$$J = \begin{pmatrix} 2y_1 & 2y_2 \\ 1 & 1 \end{pmatrix}.$$
When $x_1$ is small relative to $x_2$ (for example, $x_2 = 1$ and $x_1 < \frac{1}{2}$), the equation system has no solution, but, for example, at $y_1 = 0$, $y_2 = 1$, $|J| = -2 \ne 0$. In this equation system, whether there is a solution or not depends on $(x_1, x_2)$, but the Jacobian depends only on $(y_1, y_2)$, so it cannot provide information on the existence of a solution. ♦
In the linear case, x = Ay, so that J = A and |J| = |A|. Hence |J| ≠ 0 implies that A is invertible and y = A⁻¹x. In this case, the solution is unique and the impact of variations in x on y is easy to determine: ∆y = A⁻¹∆x.
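As a quick numerical illustration of the linear case (the matrix and vectors below are my own choices, not from the text):

```python
# Linear case x = A y: if |J| = |A| != 0, then y = A^{-1} x and a parameter
# change dx produces the solution change dy = A^{-1} dx.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
assert np.linalg.det(A) != 0       # A is invertible

x = np.array([1.0, 2.0])
y = np.linalg.solve(A, x)          # y = A^{-1} x

dx = np.array([0.1, -0.05])        # a small variation in the parameters
dy = np.linalg.solve(A, dx)        # corresponding variation in the solution

# y + dy solves the perturbed system exactly:
assert np.allclose(A @ (y + dy), x + dx)
```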
In the nonlinear case, J will generally depend on the value of the vector y – see the first row of the example above. To make this explicit, one may write J(y). The following theorem connects the value of |J| to the function f, and considers the impact of a variation in x_j on the values of the y_i. For the latter, one additional piece of notation is required. Let e_j denote a column vector with 1 in the jth row and zeros elsewhere. Write J_i^j for the matrix obtained from J by replacing the ith column with e_j.

Theorem 7.3: If f is a continuous function with continuous partial derivatives, f : R^n → R^n, and |J(ỹ)| ≠ 0 for all ỹ in a neighborhood of y, then f is 1–1 in a neighborhood of y: if y′ is close to y, there is a unique x′ = f(y′) close to x = f(y). Furthermore, at the point y,

∂y_i/∂x_j = |J_i^j| / |J|.
R EMARK 7.4: The motivation for the theorem is relatively easy to see. With x = f(y), and f 1–1 on a neighborhood of y, each x′ close to x is associated with a unique point y′ close to y: there is a function g associating points in the neighborhood of x to points in the neighborhood of y, (y1, ..., yn) = (g1(x1, ..., xn), ..., gn(x1, ..., xn)), such that x = f(g(x)), because at x, g gives the value y = g(x) with x = f(y).

7.8. THE IMPLICIT FUNCTION THEOREM

Then the partial derivatives of the g functions give the impact of variations in the x_i on the y_i:

∂y1/∂x1 = ∂g1/∂x1   and   ∂y1/∂x2 = ∂g1/∂x2.
Considering the two-variable case:

x1 = f1(g1(x1, x2), g2(x1, x2))
x2 = f2(g1(x1, x2), g2(x1, x2)).

Consider a small variation in x1, holding x2 fixed. From the equation x1 = f1(g1(x1, x2), g2(x1, x2)):

dx1 = (∂f1/∂y1)(∂g1/∂x1)dx1 + (∂f1/∂y2)(∂g2/∂x1)dx1.

From the equation x2 = f2(g1(x1, x2), g2(x1, x2)):

0 = (∂f2/∂y1)(∂g1/∂x1)dx1 + (∂f2/∂y2)(∂g2/∂x1)dx1.

Dividing each of these equations by dx1:

1 = (∂f1/∂y1)(∂g1/∂x1) + (∂f1/∂y2)(∂g2/∂x1)
0 = (∂f2/∂y1)(∂g1/∂x1) + (∂f2/∂y2)(∂g2/∂x1).

In matrix form:

( 1 )   ( ∂f1/∂y1  ∂f1/∂y2 ) ( ∂g1/∂x1 )
( 0 ) = ( ∂f2/∂y1  ∂f2/∂y2 ) ( ∂g2/∂x1 ),

so, from Cramer's rule:

∂y1/∂x1 = ∂g1/∂x1 = det( 1  ∂f1/∂y2 ; 0  ∂f2/∂y2 ) / det( ∂f1/∂y1  ∂f1/∂y2 ; ∂f2/∂y1  ∂f2/∂y2 ).
The next example applies the theorem to the pair of equations x1 = y12 + y22 , x2 = y1 + y2 .
E XAMPLE 7.12: Consider again the equation pair (depicted in Figure 7.10):

x1 = f1(y1, y2) = y1² + y2²
x2 = f2(y1, y2) = y1 + y2,

with x1 = x2 = 1. Then

J = ( ∂f1/∂y1  ∂f1/∂y2 ; ∂f2/∂y1  ∂f2/∂y2 ) = ( 2y1  2y2 ; 1  1 );
J_1^1 = ( 1  2y2 ; 0  1 );   J_2^1 = ( 2y1  1 ; 1  0 ).   (7.9)

So:

∂y1/∂x1 = |J_1^1|/|J| = 1/(2(y1 − y2))   and   ∂y2/∂x1 = |J_2^1|/|J| = −1/(2(y1 − y2)).   (7.10)

At (y1, y2) = (0, 1), ∂y1/∂x1 = −1/2 and ∂y2/∂x1 = 1/2. The impact of dilating the circle (depicted in Figure 7.10 by the circle formed with dotted lines) is to move the solution left, from a to d. ♦
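The determinant-ratio values in Example 7.12 can be checked against finite differences on the explicit solution derived in Remark 7.6 below; the script is a sketch of mine, not part of the text:

```python
# Verify Example 7.12 at (y1, y2) = (0, 1) (so x1 = x2 = 1): Cramer's rule
# gives dy1/dx1 = -1/2 and dy2/dx1 = 1/2, and finite differences on the
# explicit solution branch should agree.
import math
import numpy as np

def solution(x1, x2):
    # branch of y1^2 + y2^2 = x1, y1 + y2 = x2 passing through (0, 1)
    y2 = 0.5 * (x2 + math.sqrt(2*x1 - x2**2))
    return x2 - y2, y2

y1, y2 = 0.0, 1.0
J  = np.array([[2*y1, 2*y2], [1.0, 1.0]])
J1 = np.array([[1.0, 2*y2], [0.0, 1.0]])   # first column replaced by e1
J2 = np.array([[2*y1, 1.0], [1.0, 0.0]])   # second column replaced by e1

dy1_dx1 = np.linalg.det(J1) / np.linalg.det(J)
dy2_dx1 = np.linalg.det(J2) / np.linalg.det(J)
assert abs(dy1_dx1 + 0.5) < 1e-12 and abs(dy2_dx1 - 0.5) < 1e-12

h = 1e-6
a1, b1 = solution(1 + h, 1)
a0, b0 = solution(1, 1)
assert abs((a1 - a0)/h - dy1_dx1) < 1e-4   # dilating the circle moves y1 left
assert abs((b1 - b0)/h - dy2_dx1) < 1e-4   # ... and y2 up
```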
R EMARK 7.5: Note that when the matrices in Equation 7.9 are evaluated at a point (y1, y2), the implicit assumption is that the point satisfies the equations. For example, the point (y1, y2) = (4, 0) lies on neither equation (with x1 = x2 = 1); one may nevertheless compute the matrices of partial derivatives there, but the resulting numbers carry no comparative-statics meaning: these partial derivatives are informative only when computed at solution points of the system of equations.
R EMARK 7.6: The calculations in Example 7.12 do not give an explicit solution for (y1, y2). However, this is straightforward here. From y1 + y2 = x2, y1 = x2 − y2, which may be substituted into y1² + y2² = x1 to give (x2 − y2)² + y2² = x1, or x2² − 2x2y2 + y2² + y2² = x1. Collecting terms, 2y2² − 2x2y2 + x2² − x1 = 0. This quadratic in y2 has two roots, y2^a and y2^b:

y2^a, y2^b = [2x2 ± √((2x2)² − 4(2)(x2² − x1))]/4
           = [2x2 ± √(4x2² − 8x2² + 8x1)]/4
           = [2x2 ± √(4(2x1 − x2²))]/4
           = [2x2 ± 2√(2x1 − x2²)]/4
           = [x2 ± √(2x1 − x2²)]/2.

At x1 = x2 = 1, this gives y2^a, y2^b = (1 ± 1)/2, so y2^a = 1 and y2^b = 0. Taking the first root, y2^a(x1, x2) = (1/2)x2 + (1/2)√(2x1 − x2²), and differentiating with respect to x1 (dilating the circle) gives

∂y2^a/∂x1 = (1/2)(1/2)(2x1 − x2²)^(−1/2)(2) = (1/2)(2x1 − x2²)^(−1/2) = 1/(2√(2x1 − x2²)).

Notice that

y2^a = (1/2)x2 + (1/2)√(2x1 − x2²),   y1^a = (1/2)x2 − (1/2)√(2x1 − x2²).

Thus, y1^a − y2^a = −√(2x1 − x2²), so:

∂y2^a/∂x1 = 1/(2√(2x1 − x2²)) = −1/(2(y1^a − y2^a)),

which compares with Equation 7.10.
In concluding this section, recall the general system of Equation 7.8, which may be written in the form x = f(y). The total differential may then be written, in matrix notation, as dx = J(y)dy. If J(y) is nonsingular, we may write dy = J(y)⁻¹dx, giving the variations in all the y as a result of a variation in the vector x.
7.8.1 General systems of equations
In the previous discussion, equations had the form x_i = f_i(y1, ..., yn), with the interpretation that the x_i are parameters determining the values of the y_i, just as the values of the x_i in x = Ay determine the y_i. In either case, x_i = f_i(y1, ..., yn) or b_i = Σ_{j=1}^n a_{ij} y_j, the parameters appear "nicely" on one side. If we write F_i(y1, y2, ..., yn, x_i) = x_i − f_i(y1, ..., yn), then x_i = f_i(y1, ..., yn) for i = 1, ..., n is equivalent to F_i(y1, y2, ..., yn, x_i) = 0, for i = 1, ..., n.
More generally, each parameter x_i may enter all equations: F_i(y1, y2, ..., yn, x1, x2, ..., xm) = 0. For example,

F1(y1, y2, x1, x2, x3) = x3 − x1y1² − x2y2²
F2(y1, y2, x1, x2, x3) = 1 − x3x1y1² − x2y2²

gives a pair of equations determining two variables, (y1, y2), with three parameters, (x1, x2, x3). The general system of equations has the form:

F1(y1, ..., yn; x1, ..., xm) = 0
F2(y1, ..., yn; x1, ..., xm) = 0
  ⋮
Fn(y1, ..., yn; x1, ..., xm) = 0.

In this system, the y variables are viewed as endogenous variables (variables to be determined) and the x variables are viewed as exogenous variables (variables whose values are fixed – parameters). Given the values for the x_i, this is a system of n equations, (F1, F2, ..., Fn), in n unknowns, (y1, y2, ..., yn). Note that the number of x_i (m) need not equal the number of equations (n), but the number of y_i (n) must equal the number of equations. This is because the y variables are the variables that must be solved from the n equations, given some fixed set of values for the x variables. The impact of a parameter change (say, a change in x_i) on some variable (say y_j) is measured by ∂y_j/∂x_i.
As in the previous discussion, this may be computed in terms of the ratio of the determinants of two matrices. The following discussion describes the procedure. For this system, the Jacobian J and an additional matrix J_i^j are defined:

J = ( ∂F1/∂y1  ∂F1/∂y2  ···  ∂F1/∂yn )
    ( ∂F2/∂y1  ∂F2/∂y2  ···  ∂F2/∂yn )
    (    ⋮        ⋮      ⋱      ⋮    )
    ( ∂Fn/∂y1  ∂Fn/∂y2  ···  ∂Fn/∂yn )

and

J_i^j = ( ∂F1/∂y1  ···  −∂F1/∂x_i  ···  ∂F1/∂yn )
        ( ∂F2/∂y1  ···  −∂F2/∂x_i  ···  ∂F2/∂yn )
        (    ⋮              ⋮              ⋮    )
        ( ∂Fn/∂y1  ···  −∂Fn/∂x_i  ···  ∂Fn/∂yn ),

where the column (−∂F1/∂x_i, ..., −∂Fn/∂x_i) appears in the jth position. That is, the matrix J_i^j is obtained from J by replacing the jth column of J with the vector of partial derivatives of (−F1, −F2, ..., −Fn) with respect to x_i.

Theorem 7.4: Suppose that each F_i has continuous partial derivatives and |J| ≠ 0 at (ȳ, x̄). Then there exist continuous functions f1, ..., fn such that y_j = f_j(x), j = 1, 2, ..., n, for x in a neighborhood of x̄. Furthermore, the effect on y_j of a change in x_i is given by

∂y_j/∂x_i = |J_i^j| / |J|.
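The rule in Theorem 7.4 is mechanical enough to sketch in code. The helper below (the function name and the test matrix are my own) forms J_i^j by column replacement; as a sanity check, for the linear system F(y, x) = Ay − x = 0 the multipliers ∂y_j/∂x_i are simply the entries of A⁻¹:

```python
# Determinant-ratio rule of Theorem 7.4: dy_j/dx_i = |J_i^j| / |J|, where
# J_i^j replaces column j of J with the vector -dF/dx_i.
import numpy as np

def multiplier(J, minus_dF_dxi, j):
    """dy_j/dx_i via the determinant ratio (j is 0-based)."""
    Jij = J.copy()
    Jij[:, j] = minus_dF_dxi
    return np.linalg.det(Jij) / np.linalg.det(J)

# Sanity check on F(y, x) = A y - x = 0: here dF/dy = A, -dF/dx_i = e_i,
# and since y = A^{-1} x, the multiplier dy_j/dx_i must equal inv(A)[j, i].
A = np.array([[3.0, 1.0], [2.0, 4.0]])
for i in range(2):                       # parameter x_i
    e = np.zeros(2); e[i] = 1.0          # -dF/dx_i = e_i
    for j in range(2):                   # variable y_j
        assert abs(multiplier(A, e, j) - np.linalg.inv(A)[j, i]) < 1e-9
```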
To illustrate, return to the production problem discussed in Section 7.6.1 with a different specification of the production and constraint technology. Let demand, Q, be exogenously given, determining the required output. Suppose that the technology relating inputs x and y to output is Q = x^α y^β. In addition, a technology constraint requires that y = kx^γ. We wish to consider the impact of changes in the required production level Q on x and y. The equations relating the variables may be written:

F1(x, y; Q, α, β, γ, k) = Q − x^α y^β = 0   (1)
F2(x, y; Q, α, β, γ, k) = y − kx^γ = 0.     (2)

These equations are depicted in Figure 7.11, where the impact of varying Q is illustrated. Evaluated at a solution (so that x^α y^β = Q and kx^γ = y),

J = ( −αQ/x  −βQ/y ; −γ(y/x)  1 )   and   ( −∂F1/∂Q ; −∂F2/∂Q ) = ( −1 ; 0 ).

Thus, |J| = −(Q/x)(α + βγ), and

∂x/∂Q = (1/|J|) det( −1  −βQ/y ; 0  1 ) = 1/((α + βγ)(Q/x)),
∂y/∂Q = (1/|J|) det( −αQ/x  −1 ; −γ(y/x)  0 ) = γ(y/x)/((α + βγ)(Q/x)).
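Because y = kx^γ can be substituted into Q = x^α y^β, this particular system also solves in closed form, x = (Q k^{−β})^{1/(α+βγ)}, which gives an independent check on ∂x/∂Q and ∂y/∂Q. The parameter values in the sketch below are my own:

```python
# Check dx/dQ = x/((a + b*g)*Q) and dy/dQ = g*y/((a + b*g)*Q) against
# finite differences on the explicit solution of Q = x^a y^b, y = k x^g.
a, b, g, k = 0.3, 0.4, 1.5, 2.0   # alpha, beta, gamma, k: illustrative only

def solve(Q):
    x = (Q * k**(-b)) ** (1.0 / (a + b*g))
    return x, k * x**g

Q = 5.0
x, y = solve(Q)
assert abs(x**a * y**b - Q) < 1e-9          # the technology relation holds

h = 1e-6
x1, y1 = solve(Q + h)
dx_dQ = x / ((a + b*g) * Q)
dy_dQ = g * y / ((a + b*g) * Q)
assert abs((x1 - x)/h - dx_dQ) < 1e-5
assert abs((y1 - y)/h - dy_dQ) < 1e-5
```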
Figure 7.11: The impact of varying output on x and y

E XAMPLE 7.13: Consider a macroeconomic model with two equations:

y = c(y − T(y), r) + I(y, r) + G
M = L(y, r).

The first equation is called the IS curve and the second the LM curve in macroeconomics. In this widely studied model, a common task is to determine the impact of changes in policy parameters (government expenditure G and the money supply M) on income y and the interest rate r. These are called policy multipliers and are computed next. For example, ∂y/∂G is the policy multiplier relating government expenditure to output: it measures the impact on output of an increase in government expenditure. Write c1 for the partial derivative of c with respect to its first argument, and write c_r = ∂c/∂r, I_y = ∂I/∂y, I_r = ∂I/∂r, L_r = ∂L/∂r and L_y = ∂L/∂y. Then, differentiating the first equation:
dy = c1(1 − T′)dy + c_r dr + I_y dy + I_r dr.

Rearranging (this gives the variation in r resulting from a variation in y when restricted to stay on this curve):

(1 − c1(1 − T′) − I_y)dy = (c_r + I_r)dr,

so

(1 − c1(1 − T′) − I_y)/(c_r + I_r) = dr/dy.

A common economic assumption is that 0 < 1 − c1(1 − T′) − I_y < 1, along with c_r < 0 and I_r < 0, so that dr/dy < 0. Similar calculations for the LM curve yield dr/dy = −L_y/L_r, which is positive under the assumptions L_y > 0 and L_r < 0. The curves are depicted in Figure 7.12.
Figure 7.12: The IS–LM curves
In this model, the exogenous variables are government expenditure (G) and the money supply (M). The endogenous variables are income (y) and the rate of interest (r). (As a numerical example, one could take T(y) = (1/2)y, with simple functional forms for c, I and L, e.g. L(y, r) = y^{1/2} r^{−1}.) The first equation asserts that national income (y) equals consumption (c), plus investment (I), plus government expenditure (G). The second says that money supply (M) equals money demand (L), where demand depends on the interest rate (r) and the level of income. These equations may be rearranged in the general form given previously:

F1(y, r, G, M) = y − c(y − T(y), r) − I(y, r) − G = 0
F2(y, r, G, M) = L(y, r) − M = 0.

Provided the Jacobian has full rank, we may view y and r as functions of G and M, y = f1(G, M), r = f2(G, M), and compute the corresponding partial derivatives according to the rule above. (In the theorem above, y and r correspond to y1 and y2, while G and M correspond to x1 and x2.) With this rearrangement, the Jacobian is obtained:

J = ( ∂F1/∂y  ∂F1/∂r ; ∂F2/∂y  ∂F2/∂r ) = ( 1 − c1(1 − T_y) − I_y   −(c_r + I_r) ; L_y   L_r ).
The Jacobian has determinant  J = [1 − c1 (1 − T y) − I y ]L r + L y ( c r + I r ). To y
y
r calculate JG , JM , JGr and JM (where these correspond to J ji , with i = y, r
and j = G, M ), observe
Ã ! 1 − ∂∂FG +1 = , ∂F 2 0 − ∂G
1
− ∂∂FM
2
− ∂∂FM
=
Ã
!
0 +1
.
Thus, y JG
JGr
= =
Ã
1
−( c r + I r )
0 Ã
Lr
!
y JM
,
1 − c1 (1 − T y ) − I y
1 0
Ly
!
r JM
,
= =
Ã 0
! −( c r + I r )
1 Lr Ã 1 − c1 (1 − T y ) − I y
Ly
0
!
1
.
Applying the theorem: ∂y ∂G
y
=
 JG  J
=
Lr , J
∂y
and
∂M
y
=
 JM  J
=
cr + I r . J
(7.11)
Similarly, ∂r ∂G
=
 JGr  J
=
−L y J
,
and
∂r ∂M
=
r  JM 
J
=
−[1 − c1 (1 − T y ) − I y ] J
.
(7.12)
♦
R EMARK 7.7: It is worth noting that, for a slightly different perspective, these results may be obtained directly by totally differentiating the original equation system.
d y = c1 (1 − T y )d y + c r dr + I y d y + I r dr + dG L y d y + L r dr = dM, [1 − c1 (1 − T y ) − I y ]d y − ( c r + I r )dr = dG
L y d y + L r dr = dM Ã
[1 − c1 (1 − T y ) − I y ]
Ly
!Ã ! −( c r + I r ) d y
Lr
dr
=
Ã
dy
Ã
dG
!
dM
.
This system has the form: Ax = b where
A=
Ã
[1 − c1 (1 − T y ) − I y ]
Ly
−( c r + I r )
Lr
!
, x=
dr
!
, b=
Ã
dG dM
!
.
7.8. THE IMPLICIT FUNCTION THEOREM
259
Note that A = J and that  A = [1 − c1 (1 − T y ) − I y ]L r + L y (G r + I r ). Using Cramer’s rule:
¯ 1 ¯¯ dG dy = ¯  A  ¯ dM
¯ −( c r + I r )¯¯ ¯ ¯ Lr
L r dG + ( c r + I r )dM A µ ¶ µ ¶ Lr (cr + I r ) = dG + dM, A A
=
and ¯ 1 ¯¯[1 − c1 (1 − T y ) − I y ] dr = ¯ A¯ Ly
= =
¯ dG ¯¯ ¯ dM ¯
[1 − c1 (1 − T y ) − I y ]dM − L y dG
µ
−L y A
¶
dG +
µ
A
[1 − c1 (1 − T y ) − I y ] A
¶
dM.
Considering the expression ¶ µ ¶ (cr + I r ) Lr dG + dM, dy = A A µ
if dM = 0, the change in y, d y, results from a change in G , dG , with M held
fixed. This measures the partial derivative³ of y´ with respect to G , and is given in the expression by the coefficient on dG : LAr . Thus: ∂y ∂G
=
·
¸ Lr , A
∂r ∂M
=
·
[1 − c1 (1 − T y ) − I y ] A
¸
,
and so on. This is the same result as appears in Equations 7.11 and 7.12.
The following discussion provides a graphical perspective on the calculations. A highway starts from point ( x, y) = (0, 0) and follows the path y = x + β x2 − γ x3
(given the x coordinate, there is a unique y coordinate determined by this equation). The direction x is east and the direction y is north. The point ( x, y) on this highway that is a distance r from the starting point satisfies x2 + y2 = r2 . (We assume that
2β 3γ
> r.) This point is uniquely defined by these two equations.
Suppose that we move slightly further along the highway by increasing the distance r; is the direction more northerly or more easterly? From the starting point at radius r, consider an increase in the radius to ′
r with r′ − r > 0 small. This changes the coordinates ( x, y) slightly to ( x′ , y′ ).
CHAPTER 7. SYSTEMS OF EQUATIONS, DIFFERENTIALS AND DERIVATIVES
260
y
highway
dy
•
•
r
dr
x
x∗
dx
x∗ =
β+
p
β+3γ 3γ
Figure 7.13: The impact of increasing distance
The eastward movement is x′ − x and the northward movement is y′ − y. Then
x′ − x ∂ x ≈ r′ − r ∂r
y′ − y ∂ y ≈ . r′ − r ∂r
and
Thus, the northward movement is greater than the eastward movement if:
y′ − y = x′ − x
y′ − y r′ −r x′ − x r′ −r
≈
∂y ∂r ∂x ∂r
>1
Note that, along the highway, µ ¶ 2β dy = 1 + 2β x − 3γ x2 = 1 + (2β − 3γ x) x = 1 + 3γ − x x. dx 3γ
So,
dy dx
> 1 if
³
2β 3γ
´ − x > 0 or (2β − 3γ x) > 0. At any point ( x, y) on the circle x2 + y2 =
r2 , and since y2 ≥ 0, x2 ≤ r2 , and since the solution is positive, x < r, so that
2β − 3γ x > 2β − 3γ r or
2β 3γ
−x >
2β 3γ
− r. Since
2β 3γ 2
≥ r,
2β 3γ
3
− x ≥ 0. Therefore, at the
point on the circle where the curve y = x + β x − γ x cuts, the slope of the curve
is greater than 1: at the solution, than the eastward movement.
dy dx
> 1, and the northward movement is greater
7.8. THE IMPLICIT FUNCTION THEOREM
261
The following example reconsiders this discussion by calculating the partial derivatives of y and x with respect to r, and comparing them.
E XAMPLE 7.14: Recall:
x 2 + y2 = r 2
y − x − β x 2 + γ x 3 = 0.
2 xdx + 2 yd y = 2rdr
d y − dx − 2β xdx + 3γ x2 dx = 0 Ã
2x
2y
1 + 2β x − 3γ x 2
−1
!Ã
dx
!
dy
=
Ã
2rdr
!
0
,
or, since only r is varied, Ã
2x
2y
1 + 2β x − 3γ x 2
−1
! ∂x
∂r = ∂y ∂r
Ã ! 2r
0
.
Using Cramer’s rule, the coefficient determinant is ∆ = −2 x − 2 y(1 + 2β x −
3γ x2 ), while ∆1 = −2r and ∆2 = −(1 + 2β x − 3γ x2 )2r. Then ∂x ∂r
=
∆1 , ∆
∂y ∂r
=
∆2 and ∆
∂y ∂r ∂x ∂r
=
∆2 = (1 + 2β x − 3γ x2 ) = (1 + (2β − 3γ x) x). ∆1 ♦
This page intentionally left blank
Exercises for Chapter 7
Exercise 7.1
The following simple macroeconomic model has four equations
relating government policy variables to aggregate income and investment. Aggregate income equals consumption plus investment and government expenditure:
y = c + i + g. Consumption is positively related to the real rate of interest:
c = α + β y, α, β > 0. Investment is negatively related to the rate of interest, r:
i = γ − δr, γ, δ > 0. The money supply, m s , equals the money demand, m d , and this in turn is positively related to the level of income (ρ > 0) and negatively related to the rate of interest (ϕ > 0):
m s = m d = ρ y − ϕ r, ϕ, k > 0.
Find the impact on y of (a) a small change in g, ∆ g, and (b) a small change in m,
∆ m. Exercise 7.2
Consider a market defined by the following supply and demand
curves.
q d = a + bP d + cY¯ q s = d + eP s ,
where a, c, d , e > 0, and b < 0, a > d . 263
CHAPTER 7. SYSTEMS OF EQUATIONS, DIFFERENTIALS AND DERIVATIVES
264
(a) (i) Compute the equilibrium price P ∗ and quantity q∗ . (ii) Sign the comparative statics partial derivatives
∗ ∂P ∗ ∂ q , ∂Y¯ . ∂Y¯
(b) Suppose a tax t > 0 is levied on producers: (i) Compute the new equilibrium price and quantity. (ii) Sign the derivatives
∂ q∗ ∂t ,
∂P d ∂t
∂P s ∂t .
,
Exercise 7.3 Consider the following macroeconomic model of national income:
C = α + βY ,
α > 0, 0 < β < 1
Y = C + I +G I = γY ,
0 < γ < 1.
What are the exogenous and endogenous variables? Write this system in matrix form and solve for the reduced form (i.e., write the system in the form Ax = b,
where x is the vector of endogenous variables and b the vector of exogenous variables and solve as x = A −1 b). Using the reduced form expressions, find the following comparative statics derivatives: ∂c , ∂G
∂Y ∂G
,
∂I ∂G
.
Exercise 7.4 Consider the following macroeconomic model:
y = c( y, r) + i (r) + G m = l (r, y). (a) Identify the exogenous and endogenous variables. (b) (i) What is the effect of an increase in government expenditure G , on output y and the interest rate r? (ii) What is the effect of an increase in the money supply M , on output and the interest rate? (c) Consider what happens if we replace the above money market equation with the following: m = µ y. Repeat part (b) with this assumption. (d) Represent the two different systems in ( y, r) space and show the above effect diagrammatically. Exercise 7.5 Consider the following macroeconomic model:
S (Y , r) + T (Y ) + IM = I (r) + G + x( e) L(Y , r) = M,
EXERCISES FOR CHAPTER 7
265
where Y is national income, S is savings, which is a function of income and the rate of interest, IM is imports, x is exports, I is investment, G is government expenditure, M is money supply and e is the exchange rate. The first equation is the equilibrium condition for the goods market and the second equation is the equilibrium condition for the money market. Using this model, determine how the equilibrium values of Y and r change with a change in the exchange rate e and with a change in the exogenous level of imports (here imports is an exogenous variable, just like G ). Make the usual assumptions: S y > 0, S r > 0, T y > 0, I r < 0, x e < 0, L y > 0, L r < 0. Exercise 7.6 Consider a production process using input x to produce output y. The relationship between inputs and outputs is somewhat odd: for technological reasons the input–output levels must satisfy two conditions:
x2 + y2 = α, 0 < α < 1 β y = x(1 − x), 0 < β.
This pair of equations has a unique nonnegative solution ( x, y ≥ 0). Plot these
functions in ( x, y) space. At this solution, calculate
dx d y dx dα , dα , dβ
and
dy dβ .
In relation
to your graph, interpret the signs of the derivatives (they are positive or negative). Exercise 7.7
Repeat the previous question, with input and output required to
satisfy the following two conditions:
y − γ ln( x + 1) = 0 x α yβ = γ.
Exercise 7.8 Consider the following IS –LM model with an import sector:
y = c ( y) + I ( i ) + G m = L( y, i ) x(π) = z( y, π). The first two equations are the IS and LM curves, respectively. The third equation asserts that exports ( x) depend only on the exchange rate, π, while imports ( z) depend on the exchange rate and the level of domestic income. (a) Calculate the slopes of the IS and LM curves. (b) Find
d y di dG , dG
and
dπ dG .
(Differentiate each of the three equations to give a 3 × 3
equation system in (d y, di, d π) and use Cramer’s rule.)
CHAPTER 7. SYSTEMS OF EQUATIONS, DIFFERENTIALS AND DERIVATIVES
266
An individual has utility function u( x, y) = xα yβ , where 0 < α
0. Find
∂ y ∂x ∂ y ∂x , , , . ∂Q ∂Q ∂ k ∂ k
∂ y ∂x , . ∂α ∂α
Also, try finding
Exercise 7.14 Consider the following two equations: x2 + y2 = r2 and y = k ln( x + 1),
with r > 0 and k > 0. Plot both equations and indicate on your graph the impact of an increase in r and k on the solution. Find
Exercise 7.15
d y dx d y dr , dr dk
and
dx dk .
Consider the following two equations: x y = k and y = 1 − e−γ x ,
k > 0 and γ > 0. Plot both equations and indicate on your graph the impact of an
increase in k and γ on the solution. Find
d y dx d y dk , dk d γ
and
dx dγ .
Exercise 7.16 An output technology is given by x + k y2 = a, where k and a are
positive constants. Government regulations require that outputs of x and y satisfy the relation eβ y − eα x = c, where α, β and c are positive constants. (We assume that q ln( c+1) < ak to ensure there is a solution to the pair of equations.) β (a) The government has decided that it is desirable to raise x and reduce y, by a policy of increasing c. Will the policy have the desired effect? Calculate the impact on x and y of an increase in c. (b) An economic consultant recommends instead that the government raise β. Would this policy have the desired effect of increasing x and reducing y? Exercise 7.17 An individual has income or wealth, y, to allocate between present,
c, and future, f , consumption. Suppose that the individual has utility function u( c, f ) = c · f . Income not consumed in the current period is invested at interest
rate r to yield the amount available for future consumption: f = ( y − c)(1 + r).
Since y is given, the choice of c determines f . This is a budget constraint relating current and future consumption. Find the impact of a change in income, y, on consumption c.
CHAPTER 7. SYSTEMS OF EQUATIONS, DIFFERENTIALS AND DERIVATIVES
268
Exercise 7.18 An individual has preferences given by indifference curves of the form x y = k: if the agent is consuming x units of one good, the consumption of
k y
units of the other good leads to utility level k. The problem facing the individual is that the goods x and y are bundled so that they cannot be bought in any combination. Specifically, x units of one good come in a package with y = 1 − e−α x units
of the other good. Suppose that the individual currently is at utility level k, consuming at levels x and y according to the packaging constraint. Find the impact of an increase in utility on the consumption levels x and y: Exercise 7.19
∂x ∂k
and
∂y . ∂k
According to a macroeconomic model, the aggregate level of in
come Y equals consumption C , investment I and government expenditures G (Y = C + I + G ). Consumption depends on income C = C (Y ) while investment de
pends on the rate of interest r, so I = I (r). Thus, Y = C (Y ) + I (r) + G . The money
supply M equals money demand, which is a function of the level of income and
the rate of interest L = L(Y , r). Thus, M = L(Y , r). Finally, the exchange rate π
is assumed to depend on income and the interest rate π = π(Y , r). Together, this
gives three equations. The level of government expenditure (G ) and the money supply ( M ) are assumed to be controlled by the government. Given values for G and M , the three equations determine the values of Y , r and π. Suppose that these functions can be written in the form:
Y = cY α − γ r + G M=
Y r
π = Y β rδ .
In this model, there are two government policy parameters, government expenditure, G , and money supply M . The impact of changes in these policy variables on output, interest rates and the exchange rate are given by such expressions as ∂r ∂Y ∂G , ∂ M ,
and
∂π ∂G .
Determine these policy “multipliers” and confirm that
β = −δ.
∂π ∂G
= 0 if
Exercise 7.20 Suppose that the supply function for a particular good is given by the equation p = α + δ qβ , α, δ, β > 0. The demand function for the good is given by ln( pq) = k. Shifts in the supply function are caused by changes in α, δ and β. Plot
the supply and demand equations and calculate the impact on (equilibrium) price and quantity of supply shifts.
8 Taylor series
8.1
Introduction
Problems such as maximization of a function f or finding the root of an equation f (x) = 0 arise frequently in economics. However, often the function may be difficult
to work with, motivating the use of approximations to the function f that are easier to study and provide insight regarding f . One relatively simple class of approximating functions are the polynomials. A function h is a polynomial in x if it has the form: h(x) = a0 + a1 x + a2 x2 + · · · + a n x n , where each a i is a constant. So, h is the sum of terms where each term is a constant multiplied by x to the power of some integer, such as a3 x3 , where a3 is the constant and x is raised to the power of 3. The function h is called an nth order polynomial because the highest power is n (assuming a n 6= 0). For example, h(x) = 2 + x + 5x2 + 8x3 is a thirdorder polynomial, as is g(x) = 2x3 . Under suitable
circumstances, a large class of functions can be approximated by a polynomial, so that given the function f , one may find a polynomial such as h approximating f . This makes it possible to discover properties of f by study of the simpler function h and that fact is a primary motivation for studying these approximations. Taylor’s theorem concerns the approximation of functions with polynomials and provides the theoretical basis for the polynomial approximations. Section 8.1.1 motivates the approach by illustrating how a quadratic function may be chosen to approximate a continuous function. Following this, in Section 8.1.2 the main theorem is stated. This asserts that any continuous differentiable function may be approximated by a polynomial function – the Taylor series 269
CHAPTER 8. TAYLOR SERIES
270
expansion. Because Taylor series expansions involve approximation, there is a gap between the approximation and the actual function. However, the quality of the approximation improves as the length of the Taylor series expansion increases – the higher the order of the polynomial, the better the approximation. The improvement from lengthening the approximation is considered with an example in Section 8.1.3. Sections 8.2 to 8.4 provide a number of applications. Section 8.2 shows how Taylor series expansions may be used to address a variety of problems – such as relating concavity to the derivatives of a function, finding the roots of an equation, and generating numerical optimization routines to find maxima and minima of a function. In Section 8.3, expected utility is introduced. In this section, the theory is used to provide an approximate measure of utility loss from risk and a monetary measure of the cost incurred from bearing risk. Section 8.4 uses Taylor series to find the approximately optimal solution to a portfolio selection problem. Finally, Section 8.5 describes the multivariate version of Taylor’s theorem and Section 8.6 provides a proof of the theorem.
8.1.1 Taylor series approximations: an example To illustrate the way in which Taylor series may be used to approximate a function, consider the function: x
f (x) = ln(x + 1)e− 2 . Although simple to describe mathematically, the function has reasonably complex behavior as x varies. Starting from x = 0, the function increases initially and then declines, “flattening out” to 0 as x becomes large. Often, functions such as this are
difficult to study. For example, the function has a unique maximum, x∗ , defined by f ′ (x∗ ) = 0, but this expression is very difficult to solve. One way to address
this difficulty is to approximate f by another function, g, which is “close” to f but simpler to manipulate. Consider the function, g(x), which is given by the following formula: g(x) = 0.1524 + 0.44285x − 0.17489x2 . Observe that the function g “tracks” the function f (x) to some degree – see Figure 8.1. Both functions, f and g, have the same value at x = 1 and are close when x is close to 1. So, g is a good approximation to f when x is close to 1 and g has a simple form.
8.1. INTRODUCTION
271
0.6
0.4 x
f ( x) = ln( x + 1) e− 2 0.2
g( x) = 0.1524 + 0.44285 x − 0.17489 x2 x 1
2
x
Figure 8.1: Taylor series approximation of f ( x) = ln( x + 1) e− 2
In fact, the function g is a secondorder Taylor series expansion of f , defined according to the formula:
g( x) = f (1) + f ′ (1)( x − 1) +
1 ′′ f (1)( x − 1)2 . 2
(8.1)
The logic for this is discussed later. The following calculations confirm that this formula in Equation 8.1 yields the expression for g given above. Observe that: 1
1 1 e− 2 x − ln( x + 1) e− 2 x ( x + 1) 2 1 1 1 1 f ′ (1) = e− 2 − ln(2) e− 2 , 2 2
f ′ ( x) =
and 1
1
1 e− 2 x 1 e− 2 x + ln( x + 1) e− 2 x − 2 ( x + 1) 4 ( x + 1) 1 x+2 −1 x 1 =− e 2 + ln( x + 1) e− 2 x 4 ( x + 1)2 1 3 1 1 f ′′ (1) = − e− 2 + ln(2) e− 2 . 4 4
f ′′ ( x) = −
With these calculations in place, g may be obtained: 1 g( x) = f (1) + f ′ (1)( x − 1) + f ′′ (1)( x − 1)2 2 · µ ¶ ¸ · ¸ 1 1 1 1 −1 1 3 1 1 e 2− ln(2) e− 2 − e− 2 ( x − 1)2 ln(2) e− 2 ( x − 1) + = ln(2) e− 2 + 2 2 8 8
CHAPTER 8. TAYLOR SERIES
272
1 1 1 −1 e 2 [1 − ln(2)]( x − 1) − e− 2 [3 − ln(2)]( x − 1)2 2 8 ¤ 1 −1 £ = e 2 13 ln(2) − 6 ln(2) x + 10 x − 7 + ln(2) x2 − 3 x2 8 ¤ 1 1£ = e− 2 (13 ln(2) − 7) + (10 − 6 ln(2)) x − (3 − ln(2)) x2 8 1
= ln(2) e− 2 +
= 0.1524 + 0.44285 x − 0.17489 x2 , 1
using the fact that ln(2) = 0.69314 and e− 2 = 0.60653. The function g is called a secondorder Taylor series expansion of f around 1 since we used f (1), f ′ (1) and
f ′′ (1) to obtain g. The ability to approximate a function (such as f ), by a polynomial function, such as the quadratic function g, turns out to have many uses. One application is optimization. If the maximum of f is difficult to determine but g is close to f , then finding the maximum of g may hbe a good approximation to maximizing f . Note i 1 1 that f ′ ( x) may be written f ′ ( x) = ( x+1 1) − 12 ln( x + 1) e− 2 x . Since for any x, e− 2 x 6= 0, h i h i 1 1 1 1 ∗ ∗ f ′ ( x) = 0 implies ( x+ 1) − 2 ln( x + 1) = 0: x satisfies ( x∗ +1) − 2 ln( x + 1) = 0. The
solution for x∗ is with x∗ = 1.34575 and f ′ ( x∗ ) = 0 is satisfied. The function f has a unique maximum at x∗ .
The function g has a maximum at xˆ = 1.266. Assume that, since g approxi
mates f , the maximum of g might be close to the maximum of f , so if we now expand f around xˆ , we obtain a “new” function, gˆ ; this should be a better approximation of f close to the maximum:
gˆ ( x) = f (1.266) + f ′ (1.266)( x − 1.266) +
1 ′′ f (1.266)( x − 1.266)2 2
= 0.22901 + 0.307258 x − 0.114577 x2 .
Observe that gˆ is maximized at x′ = 1.341, which is very close to x∗ . These calcu
lations suggest a simple procedure for locating the maximum of a function using a simple algorithm, and illustrate one important use of Taylor series expansions.
8.1.2 Taylor series expansions

Taylor's theorem asserts that a continuous function with continuous derivatives may be approximated by a polynomial function. For notational convenience, write f^(k)(x) = d^k f/dx^k for the kth derivative of the function f. (As usual, the lower derivatives are often written f′(x) = df/dx, f′′(x) = d²f/dx² and f′′′(x) = d³f/dx³ instead of f^(1)(x), f^(2)(x) and f^(3)(x).)
8.1. INTRODUCTION
Theorem 8.1: Let f be a continuous function on an interval [a, b] with continuous derivatives f′, f′′, ..., f^(n). Then, given x, x₀ ∈ (a, b):

f(x) = f(x₀) + f′(x₀)(x − x₀) + (1/2!) f′′(x₀)(x − x₀)² + (1/3!) f′′′(x₀)(x − x₀)³ + ··· + (1/n!) f^(n)(p)(x − x₀)^n      (8.2)

for some p between x and x₀, and

f(x) ≈ f(x₀) + f′(x₀)(x − x₀) + (1/2!) f′′(x₀)(x − x₀)² + (1/3!) f′′′(x₀)(x − x₀)³ + ··· + (1/n!) f^(n)(x₀)(x − x₀)^n,      (8.3)

when x is close to x₀.
In Theorem 8.1, the expression on the right side is called a Taylor series expansion of f of order n around x₀. Note that in the first expansion, Equation 8.2, the last term is (1/n!) f^(n)(p)(x − x₀)^n – the point p is chosen to give equality of the expansion with the function (and depends on both x and x₀). The second expansion, Equation 8.3, has last term (1/n!) f^(n)(x₀)(x − x₀)^n, and the expansion is only approximately equal to f(x). The term n! is "n factorial", defined n! = n(n − 1)(n − 2)···3·2·1, so, for example, 3! = 3 × 2 × 1 = 6. By convention, 0! = 1. For example, a third-order expansion is:

f(x) = f(x₀) + f′(x₀)(x − x₀) + (1/2!) f′′(x₀)(x − x₀)² + (1/3!) f′′′(p)(x − x₀)³.
To illustrate the formula, consider the quadratic function f(x) = x². The first and second derivatives are f′(x) = 2x and f′′(x) = 2 for all x, so that for any x₀ and any number p, f′′(x₀) = 2 = f′′(p): the second derivative is independent of x. An exact first-order expansion has the form f(x) = f(x₀) + f′(p)(x − x₀) for some p, so that for the function f(x) = x²:

f(x) = f(x₀) + f′(p)·(x − x₀)
x² = x₀² + 2p·(x − x₀).

This equality requires that p = (x₀ + x)/2. To see this, rearrange the equation:

x² − x₀² = 2p·(x − x₀)
(x − x₀)(x + x₀) = 2p·(x − x₀)
x₀ + x = 2p
(x₀ + x)/2 = p.
Or, with approximation, f ( x) ≈ f ( x0 ) + f ′ ( x0 )( x − x0 ) = [ f ( x0 ) − f ′ ( x0 ) x0 ] + f ′ ( x0 ) · x, a linear equation with intercept [ f ( x0 ) − f ′ ( x0 ) x0 ] and slope f ′ ( x0 ).
Figure 8.2: First-order Taylor series expansion
One advantage of the approximate expansion is that there is no need to calculate p, since the expansion does not have to be exact. A second-order Taylor series expansion of x² gives:

f(x) = x₀² + 2x₀(x − x₀) + (1/2)·2·(x − x₀)² = x₀² + 2x₀x − 2x₀² + x² − 2xx₀ + x₀² = x².

The expansion is exact because f′′(x) = 2 for all values of x: the second derivative is a constant. Put differently, if f′′′(x) = 0, so the third derivative is 0, then the second derivative must be constant, and so a second-order expansion is exact if the third derivative is 0.
EXAMPLE 8.1: Consider the function f(x) = ln(x + 1)·x². Note that f′(x) = x²/(x + 1) + 2x ln(x + 1) and f′′(x) = −x²/(x + 1)² + 4x/(x + 1) + 2 ln(x + 1). At x = 2, f(x) = 4 ln(3), f′(x) = 4/3 + 4 ln(3), and f′′(x) = −4/9 + 8/3 + 2 ln(3) = 20/9 + 2 ln(3). This function has a first-order Taylor series expansion around 2 of f(x) ≈ 4 ln(3) + (4 ln(3) + 4/3)(x − 2), and a second-order expansion f(x) ≈ 4 ln(3) + (4 ln(3) + 4/3)(x − 2) + (ln(3) + 10/9)(x − 2)². Since ln(3) = 1.0986, 4 ln(3) = 4.3944, 4 ln(3) + 4/3 = 5.7277 and ln(3) + 10/9 = 1.0986 + 10/9 = 2.2097, we have f(x) ≈ 4.3944 + 5.7277(x − 2) + 2.2097(x − 2)².
[Figure: f(x) = ln(x + 1)x² and the quadratic approximation 4.3944 + 5.7277(x − 2) + 2.2097(x − 2)².]

From the figure, the approximation is tangent to f at x = 2 and provides a good approximation on the interval (1, 3). ♦
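The quality of the expansion in Example 8.1 is easy to check numerically. A small sketch comparing f with the quadratic (the exact coefficients 4 ln(3) + 4/3 and ln(3) + 10/9 are from the example; the sample points are arbitrary):

```python
import math

def f(x):
    return math.log(x + 1) * x ** 2

def taylor2(x):
    # Second-order expansion of f around 2, coefficients from Example 8.1.
    ln3 = math.log(3)
    return 4 * ln3 + (4 * ln3 + 4 / 3) * (x - 2) + (ln3 + 10 / 9) * (x - 2) ** 2

for x in (1.5, 2.0, 2.5):
    print(x, f(x), taylor2(x))   # the two columns agree closely near 2
```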
8.1.3 Approximations and accuracy

As the length of the Taylor series expansion increases, the accuracy of the approximation increases, but the complexity of the approximating polynomial also increases. So there is a trade-off between increasing accuracy and increasing complexity. To see this, consider f(x) = x³ and expansions around x₀ = 1. The first-order Taylor series expansion is TSE₁(x) = f(1) + f′(1)(x − 1) = 1 + 3(x − 1); the second-order Taylor series expansion is TSE₂(x) = f(1) + f′(1)(x − 1) + (1/2) f′′(1)(x − 1)² = 1 + 3(x − 1) + 3(x − 1)². The error associated with the first-order expansion is E₁(x) = |f(x) − TSE₁(x)|, and with the second-order expansion, E₂(x) = |f(x) − TSE₂(x)|. The following table compares the errors.
x      x³      TSE₁(x)   TSE₂(x)   E₁(x)   E₂(x)
0.4    0.064   −0.8      0.28      0.864   0.216
0.5    0.125   −0.5      0.25      0.625   0.125
0.6    0.216   −0.2      0.28      0.416   0.064
0.7    0.343   0.1       0.37      0.243   0.027
0.8    0.512   0.4       0.52      0.112   0.008
0.9    0.729   0.7       0.73      0.029   0.001
1      1       1         1         0       0
1.1    1.331   1.3       1.33      0.031   0.001
1.2    1.728   1.6       1.72      0.128   0.008
1.3    2.197   1.9       2.17      0.297   0.027
1.4    2.744   2.2       2.68      0.544   0.064
1.5    3.375   2.5       3.25      0.875   0.125
1.6    4.096   2.8       3.88      1.296   0.216
Figure 8.3: Taylor series approximations: f(x) = x³, TSE₁(x) = 1 + 3(x − 1) and TSE₂(x) = 1 + 3(x − 1) + 3(x − 1)².

In Figure 8.3, f(0) = 0, TSE₁(0) = −2 and TSE₂(0) = 1.
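The table above can be reproduced directly. A sketch computing the two expansions and their absolute errors (the grid of x values is from the table):

```python
# Error table for f(x) = x^3 expanded around x0 = 1.
def f(x):
    return x ** 3

def tse1(x):
    return 1 + 3 * (x - 1)                      # first-order expansion

def tse2(x):
    return 1 + 3 * (x - 1) + 3 * (x - 1) ** 2   # second-order expansion

rows = [(x, round(abs(f(x) - tse1(x)), 3), round(abs(f(x) - tse2(x)), 3))
        for x in [i / 10 for i in range(4, 17)]]
for row in rows:
    print(row)   # (x, E1(x), E2(x)) as in the table
```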
8.2 Applications of Taylor series expansions

The following sections consider some applications of Taylor series expansions. The first application is a proof that concavity of a function implies that the second derivative is non-positive. Following this, numerical optimization routines based on Taylor series are developed. Then some applications to decision-making under risk are described.
8.2.1 Concavity and the second derivative
The following theorem shows that concavity implies a non-positive second derivative; a similar argument may be used to show that convex functions have non-negative second derivatives.

Theorem 8.2: If f is concave and f′′(x) is continuous, then for any x, f′′(x) ≤ 0.

PROOF: Pick x and let Δx = x̃ − x, with x̃ close to x. Note that (1/2)(x + Δx) + (1/2)(x − Δx) = x, and f concave gives:

f(x) = f((1/2)(x + Δx) + (1/2)(x − Δx)) ≥ (1/2) f(x + Δx) + (1/2) f(x − Δx).

Therefore 2f(x) ≥ f(x + Δx) + f(x − Δx). Now, Taylor's theorem gives:

f(y) = f(y₀) + f′(y₀)(y − y₀) + (1/2!) f′′(ỹ)(y − y₀)²,

where ỹ is between y and y₀. First, take y = x + Δx and y₀ = x, so y − y₀ = Δx, and then

f(x + Δx) = f(x) + f′(x)Δx + (1/2) f′′(x*)(Δx)², x* between x and x + Δx.

Next, take y = x − Δx and y₀ = x, so y − y₀ = −Δx, and then

f(x − Δx) = f(x) − f′(x)Δx + (1/2) f′′(x̂)(Δx)², x̂ between x and x − Δx.

Adding these together:

f(x + Δx) + f(x − Δx) = 2f(x) + (1/2)[f′′(x*) + f′′(x̂)](Δx)².

Recall from concavity that 2f(x) ≥ f(x + Δx) + f(x − Δx), so

0 ≥ f(x + Δx) + f(x − Δx) − 2f(x) = (1/2)[f′′(x*) + f′′(x̂)](Δx)².

Since (Δx)² > 0, this implies that 0 ≥ (1/2)[f′′(x*) + f′′(x̂)]. Since f′′ is continuous, as Δx → 0, x*, x̂ → x, and so f′′(x*), f′′(x̂) → f′′(x). Therefore, 0 ≥ f′′(x).
8.2.2 Roots of a function

Given a function f, a common problem is to find a root of f: a value x̄ such that f(x̄) = 0. Finding a root involves solving the equation f(x) = 0, and in practice this can be difficult. However, using Taylor series there is a simple procedure that may be used to search for the roots of a function. Consider a first-order Taylor series expansion of f around some x₀: f(x) ≈ f(x₀) + f′(x₀)(x − x₀). Given x₀, suppose x is chosen to set f(x₀) + f′(x₀)(x − x₀) = 0, with solution x₁. Then f(x₁) ≈ f(x₀) + f′(x₀)(x₁ − x₀) = 0, so that x₁ is "approximately" a root of f. If this process is repeated with x₁ replacing x₀, a revised estimate, x₂, for a root is obtained: f(x₂) ≈ f(x₁) + f′(x₁)(x₂ − x₁) = 0. And so on. This process is easy to implement in practice. Assuming f′(x₀) ≠ 0, the expression can be rearranged to give an algorithm:

x₁ − x₀ = −f(x₀)/f′(x₀)
x₁ = x₀ − f(x₀)/f′(x₀)
x₂ = x₁ − f(x₁)/f′(x₁)
⋮
x_{n+1} = x_n − f(x_n)/f′(x_n).
To illustrate, consider the function f(x) = −1/2 + 4x − 4x² + x³ (Figure 8.4). Applying this iterative process:

x_i      f(x_i)    f′(x_i)   x_i − f(x_i)/f′(x_i)
1        0.5       −1        1.5
1.5      −0.125    −1.25     1.4
1.4      0.004     −1.32     1.403
1.403    0.000     −1.318    1.403
Figure 8.4: Roots of a function: f(x) = −1/2 + 4x − 4x² + x³
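The root-finding iteration above can be written as a short program. A sketch for the cubic of Figure 8.4, starting at x₀ = 1 as in the table (the iteration count is an arbitrary choice):

```python
# Newton's method for a root of f(x) = -1/2 + 4x - 4x^2 + x^3 (Figure 8.4).
def f(x):
    return -0.5 + 4 * x - 4 * x ** 2 + x ** 3

def fp(x):  # f'(x)
    return 4 - 8 * x + 3 * x ** 2

x = 1.0                      # starting point, as in the table
for _ in range(10):
    x = x - f(x) / fp(x)
# x is now the root near 1.403, agreeing with the table above.
```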
8.2.3 Numerical optimization

Suppose we wish to maximize some differentiable function f(x). A maximum has the property that f′(x) = 0. How do we solve f′(x) = 0? One approach uses Taylor series expansions:

f(x) ≈ f(x₀) + f′(x₀)(x − x₀) + (1/2) f′′(x₀)(x − x₀)².

When x₀ is "close" to x, a value of x that maximizes the right-hand side (RHS) will also (approximately) maximize the left-hand side (LHS). Differentiate with respect to x and set the result to 0. Since the RHS is a quadratic function of x, the derivative is linear in x. This yields

f′(x₀) + f′′(x₀)(x − x₀) = 0.

Solving for x (call the solution x₁):

x₁ = x₀ − f′(x₀)/f′′(x₀),

assuming that f′′(x₀) ≠ 0. Presumably, x₁ will be closer to a maximum of f(x) than x₀. We can continue the iteration:

x₂ = x₁ − f′(x₁)/f′′(x₁),

assuming f′′(x₁) ≠ 0, and, having defined x₁, x₂, ..., x_{n−1}:

x_n = x_{n−1} − f′(x_{n−1})/f′′(x_{n−1}),
assuming f′′(x_{n−1}) ≠ 0. If {x_n} converges, it will solve the problem f′(x) = 0.

EXAMPLE 8.2: Consider f(x) = sin x, so that f′(x) = cos x and f′′(x) = −sin x. The sine function, sin(x), is equal to 0 at x = 0, rising to a value of 1 at x = π/2 and falling back to 0 at x = π. Thus, on the interval [0, π], sin x has a maximum at π/2. Starting the algorithm at x₀ = 1/2 begins an iteration to estimate π/2. Using the formula, the iteration has the form:

x_n = x_{n−1} + cos(x_{n−1})/sin(x_{n−1}).

Beginning with x₀ = 1/2:

x₁ = 1/2 + cos(1/2)/sin(1/2) = 1/2 + 0.878/0.479 = 2.330
x₂ = 2.330 + cos(2.330)/sin(2.330) = 2.330 + (−0.688)/0.725 = 1.381
x₃ = 1.381 + cos(1.381)/sin(1.381) = 1.381 + 0.189/0.982 = 1.573
x₄ = 1.573 + cos(1.573)/sin(1.573) = 1.573 + (−2.20 × 10⁻³)/0.999 = 1.5707963.

[Figure: sin(x), with the successive iterates marked on the interval (0, π).]

Since π/2 = 1.5707963, even at the fourth iteration we are getting an accurate estimate of π/2. Starting at 1/2, observe that the algorithm moves rightward in the direction of increase of the function given by the derivative (to 2.330), overshooting. From this point, again moving in the direction of increase of the function, now leftward, the algorithm moves to 1.381. The next move is rightward, since the direction of increase is rightward – to 1.573. ♦
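The iteration of Example 8.2 can be reproduced as follows (a minimal sketch; the iteration count is an arbitrary choice):

```python
import math

# The iteration x_n = x_{n-1} + cos(x_{n-1})/sin(x_{n-1}) of Example 8.2.
x = 0.5
path = [x]
for _ in range(8):
    x = x + math.cos(x) / math.sin(x)
    path.append(x)
# path begins 0.5, 2.330..., 1.380..., 1.573... and settles at pi/2.
```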
EXAMPLE 8.3: Consider the task of maximizing profit, where the profit function is:

π(x) = 10 ln(x + 1) − 0.1 e^(0.1x) − x².

First, observe that the derivatives are π′(x) = 10/(x + 1) − (0.1)² e^(0.1x) − 2x and π′′(x) = −10/(x + 1)² − (0.1)³ e^(0.1x) − 2. Thus π′′ < 0 and the profit function is concave. Therefore, if x* satisfies π′(x*) = 0, then x* maximizes π. Using a second-order Taylor series expansion:

π(x) ≈ π(x₀) + π′(x₀)(x − x₀) + (1/2) π′′(x₀)(x − x₀)²,

so that

0 = π′(x) ≈ π′(x₀) + π′′(x₀)(x − x₀)  ⇒  x ≈ x₀ − π′(x₀)/π′′(x₀).

There is no obvious point at which to start the iteration, so choose x₀ = 0. The algorithm proceeds according to:

x_{i+1} = x_i − π′(x_i)/π′′(x_i) = x_i − [10/(x_i + 1) − (0.1)² e^(0.1x_i) − 2x_i] / [−10/(x_i + 1)² − (0.1)³ e^(0.1x_i) − 2].

The following table gives the sequence of iterations. From it, it is clear that if x_{i+1} = x_i and π′′(x_i) ≠ 0, then π′(x_i) = 0, and this implies that x_i maximizes π:

x_i        π′(x_i)        π′′(x_i)    x_i − π′(x_i)/π′′(x_i)
0          9.99           −12.0010    0.832431
0.832431   3.781503       −4.97923    1.591887
1.591887   0.662694       −3.48974    1.781785
1.781785   0.019295       −3.29346    1.787643
1.787643   1.59 × 10⁻⁵    −3.28804    1.787648
1.787648   1.08 × 10⁻¹¹   −3.28803    1.787648
1.787648   2.25 × 10⁻¹⁶   −3.28803    1.787648

The profit-maximizing choice of x is x* = 1.787648. ♦
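A sketch reproducing the iteration of Example 8.3 (starting at x₀ = 0, as in the text; the iteration count is an arbitrary choice):

```python
import math

# Newton iteration for Example 8.3: pi(x) = 10 ln(x+1) - 0.1 e^(0.1x) - x^2.
def dpi(x):   # pi'(x)
    return 10 / (x + 1) - 0.01 * math.exp(0.1 * x) - 2 * x

def d2pi(x):  # pi''(x)
    return -10 / (x + 1) ** 2 - 0.001 * math.exp(0.1 * x) - 2

x = 0.0
for _ in range(10):
    x = x - dpi(x) / d2pi(x)
# x converges to approximately 1.787648, the profit-maximizing choice.
```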
EXAMPLE 8.4: A production function has isoquants given by xy = k: if inputs are used at the levels x* and y*, the output level is x*y*. A technological constraint requires that the inputs x and y satisfy the relation y = 1 − e^(−αx). Suppose that α = 2, and the firm decides to produce an output of 5 units, so k = 5. In this case, the values of x* and y* are determined by the two equations xy = 5 and y = 1 − e^(−2x). The following calculations show how the values may be obtained using a Taylor series expansion.

The first equation can be written x = 5/y, and when this is substituted into the second equation we get y = 1 − e^(−10/y), or y − 1 + e^(−10/y) = 0. Let f(y) = y − 1 + e^(−10/y), so we wish to find y* such that f(y*) = 0. Now, f(y*) ≈ f(y₀) + f′(y₀)(y* − y₀) when y* is "close" to y₀, and since f(y*) = 0, f(y₀) + f′(y₀)(y* − y₀) ≈ 0, so that y* ≈ y₀ − f(y₀)/f′(y₀). This provides the basis for the iteration y_{i+1} = y_i − f(y_i)/f′(y_i). Since f(y) = y − 1 + e^(−10/y), f′(y) = 1 + (10/y²) e^(−10/y):

y_{i+1} = y_i − [y_i − 1 + e^(−10/y_i)] / [1 + (10/y_i²) e^(−10/y_i)].

Use a starting value for the iteration of y₀ = 2:

y_i        f(y_i)          f′(y_i)     y_i − f(y_i)/f′(y_i)
2          1.006738        1.016845    1.009939
1.009939   0.00999         1.000496    0.999955
0.999955   2.37 × 10⁻⁷     1.000454    0.999955
0.999955   −4.7 × 10⁻¹⁵    1.000454    0.999955
0.999955   −5.1 × 10⁻¹⁷    1.000454    0.999955
0.999955   −5.1 × 10⁻¹⁷    1.000454    0.999955

Thus, y* = 0.999955 and x = 5/0.999955 is the required solution. ♦
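A sketch reproducing Example 8.4's iteration (starting at y₀ = 2, as in the text):

```python
import math

# Newton iteration for Example 8.4: find y with f(y) = y - 1 + exp(-10/y) = 0.
def f(y):
    return y - 1 + math.exp(-10 / y)

def fp(y):  # f'(y)
    return 1 + (10 / y ** 2) * math.exp(-10 / y)

y = 2.0
for _ in range(10):
    y = y - f(y) / fp(y)
x = 5 / y   # the other input level, from x*y = 5
```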
The numerical optimization procedure generalizes directly to multivariate optimization. The second-order Taylor series approximation in the multivariate case is:

f(x) ≈ f(x₀) + ∇f(x₀)·(x − x₀) + (1/2!)(x − x₀)′ H(x₀)(x − x₀),

where ∇f(x₀) is the vector of partial derivatives of f evaluated at x₀, and where H(x₀) is the Hessian matrix of second-order partial derivatives, also evaluated at x₀. Maximizing the right side by choice of the vector x gives:

∇f(x₀) + H(x₀)(x − x₀) = 0.

Assuming the Hessian matrix is nonsingular, the solution x₁ satisfies

x₁ − x₀ = −H(x₀)⁻¹ ∇f(x₀),

and the resulting iteration is:

x_{i+1} − x_i = −H(x_i)⁻¹ ∇f(x_i).
EXAMPLE 8.5: Consider the objective function:

π(x, y) = 2x^(1/2) + 4y^(1/2) + xy − x² − y².

Using a second-order expansion, it is straightforward to develop an algorithm to locate the values of x and y that maximize π. First, observe that a second-order Taylor series expansion of π(x, y) around (x₀, y₀) is:

π(x, y) ≈ π(x₀, y₀) + π_x(x₀, y₀)(x − x₀) + π_y(x₀, y₀)(y − y₀)
  + (1/2){π_xx(x₀, y₀)(x − x₀)² + 2π_xy(x₀, y₀)(x − x₀)(y − y₀) + π_yy(x₀, y₀)(y − y₀)²}.

Maximize the right side by choice of x and y. Differentiating with respect to x and y, respectively, and setting the derivatives to 0 gives the solution equations. For x:

0 = π_x(x₀, y₀) + (1/2){2π_xx(x₀, y₀)(x − x₀) + 2π_xy(x₀, y₀)(y − y₀)}
  = π_x(x₀, y₀) + π_xx(x₀, y₀)(x − x₀) + π_xy(x₀, y₀)(y − y₀).

Similarly, for y:

0 = π_y(x₀, y₀) + π_xy(x₀, y₀)(x − x₀) + π_yy(x₀, y₀)(y − y₀).

In matrix terms:

(0)   (π_x(x₀, y₀))   (π_xx(x₀, y₀)  π_xy(x₀, y₀)) (x − x₀)
(0) = (π_y(x₀, y₀)) + (π_xy(x₀, y₀)  π_yy(x₀, y₀)) (y − y₀).

Rearranging:

(x − x₀)     (π_xx(x₀, y₀)  π_xy(x₀, y₀))⁻¹ (π_x(x₀, y₀))
(y − y₀) = − (π_xy(x₀, y₀)  π_yy(x₀, y₀))   (π_y(x₀, y₀)).

The solution has the form (with z = (x, y)):

z = z₀ − H_π(x₀, y₀)⁻¹ ∇π(x₀, y₀),

where H_π is the matrix of second-order partial derivatives and ∇π = (π_x, π_y). For the specific functional form:

π(x, y) = 2x^(1/2) + 4y^(1/2) + xy − x² − y²
π_x(x, y) = x^(−1/2) + y − 2x
π_y(x, y) = 2y^(−1/2) + x − 2y
π_xx(x, y) = −(1/2)x^(−3/2) − 2
π_yy(x, y) = −y^(−3/2) − 2
π_xy(x, y) = 1.

Thus,

H(x, y) = (−(1/2)x^(−3/2) − 2          1        )
          (        1           −y^(−3/2) − 2),

and so the algorithm, beginning at an initial value (x₀, y₀), is:

(x_{i+1})   (x_i)   (−(1/2)x_i^(−3/2) − 2          1        )⁻¹ (x_i^(−1/2) + y_i − 2x_i)
(y_{i+1}) = (y_i) − (        1           −y_i^(−3/2) − 2)      (2y_i^(−1/2) + x_i − 2y_i).

[Figure: a plot of the function π(x, y).] ♦
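The algorithm of Example 8.5 can be sketched with the 2×2 inverse written out by hand (the starting point (1, 1) and the iteration count are arbitrary choices, not from the text):

```python
# Newton iteration for Example 8.5:
# pi(x, y) = 2*sqrt(x) + 4*sqrt(y) + x*y - x^2 - y^2.
def grad(x, y):
    return (x ** -0.5 + y - 2 * x, 2 * y ** -0.5 + x - 2 * y)

def hess(x, y):
    # Returns (pi_xx, pi_xy, pi_yy); the Hessian is [[a, b], [b, d]].
    return (-0.5 * x ** -1.5 - 2, 1.0, -y ** -1.5 - 2)

x, y = 1.0, 1.0            # assumed starting point
for _ in range(20):
    gx, gy = grad(x, y)
    a, b, d = hess(x, y)
    det = a * d - b * b
    # (x, y) <- (x, y) - H^{-1} grad, with H^{-1} = (1/det)[[d, -b], [-b, a]].
    x -= (d * gx - b * gy) / det
    y -= (a * gy - b * gx) / det
# At convergence the gradient is (approximately) zero.
```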
8.3 Expected utility theory and behavior towards risk

The presence of risk (variability) lowers the value of an asset: other things equal, comparing two assets, the one with greater risk will have a lower value. In this section, Taylor series expansions are used to determine approximate welfare and monetary measures of the negative impact of risk. Before providing these measures, some introduction to utility and risk is necessary.
8.3.1 Expected utility theory
The usual consumer theory model has u(x₁, x₂, ..., x_n) = utility from consumption of the (x₁, ..., x_n) bundle of goods. In this model, the individual receives and consumes x₁ of good 1, x₂ of good 2, and so on. Suppose that one receives some random return instead of a sure consumption bundle. Then this formulation is inappropriate: when a person receives a random return X, we cannot attach a utility to the return in the same way. Let X be a random return:

X     Probability
x₁    p₁
x₂    p₂
⋮     ⋮
x_n   p_n

where p_i is the probability that X takes on the value x_i. With this random return, the individual receives income x_i with probability p_i. When you hold a lottery ticket that pays a million dollars if you win (and you win with probability p, say), the situation is described:

X           Probability
0           (1 − p)
1 million   p

Note that you either get 0 or 1 million – you do not get both! Above, the person receives some return (some x_i); only one x_i is actually received. Note also that we often have to make a decision about whether to purchase a random return or not, prior to knowing the outcome. We cannot choose to buy the lottery ticket after we know whether we have won or not. At the time of purchase, we face the possibility of getting either 0 or one million dollars – and have to make a decision. Similarly, when one buys a stock today, the value of the stock tomorrow is unknown.

In situations such as these, we need a way to compare the utility associated with different uncertain outcomes, so that we can choose the most preferred one. Expected utility theory is such a way. According to expected utility theory, an individual has a utility function, u, defined on income. To calculate the utility associated with the random income or return stream X given previously, calculate the utility of income in each state i, u(x_i), multiply by the probability p_i to get p_i u(x_i), and sum these terms: Σ p_i u(x_i). This is called expected utility and denoted E{u(X)}. Here, u(x_i) is utility from x_i. (Some outcome x_i will occur – and this is what the individual actually receives.) Now we can compare two random returns X and Y in expected utility terms:

X     Probability     Y     Probability
x₁    p₁              y₁    q₁
x₂    p₂              y₂    q₂
⋮     ⋮               ⋮     ⋮
x_n   p_n             y_m   q_m

Then the individual prefers the random return Y to X if:

Σ_{j=1}^{m} q_j u(y_j) > Σ_{i=1}^{n} p_i u(x_i).
EXAMPLE 8.6: Let X and Y be two random variables with the following distributions:

X     Probability     Y     Probability
100   1/3             200   1/4
196   1/3             220   1/4
400   1/3             244   1/4
                      264   1/4

Suppose that the utility function is the square root function: u(·) = √·. Then

E{u(X)} = Σ p_i u(x_i) = (1/3)[10 + 14 + 20] = 44/3 = 14.67
E{u(Y)} = Σ q_i u(y_i) = (1/4)[14.14 + 14.83 + 15.62 + 16.25] = 60.84/4 = 15.21.

Thus, Y is preferred to X. Why should this be? Some insight is given by the mean and variance:

E(X) = Σ p_i x_i = 696/3 = 232
E(Y) = Σ q_i y_i = 928/4 = 232
Var(X) = (1/3)[(100 − 232)² + (196 − 232)² + (400 − 232)²] = 46944/3 = 15648
Var(Y) = (1/4)[(200 − 232)² + (220 − 232)² + (244 − 232)² + (264 − 232)²] = 2336/4 = 584.

Thus, while both have the same mean return, Y has a smaller variance – it is "less risky". ♦
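The calculations in Example 8.6 can be reproduced as follows (the distributions and the square-root utility are from the example):

```python
import math

# Example 8.6: expected utility, mean and variance with u = square root.
X = [(100, 1 / 3), (196, 1 / 3), (400, 1 / 3)]
Y = [(200, 1 / 4), (220, 1 / 4), (244, 1 / 4), (264, 1 / 4)]

def expected_utility(dist):
    return sum(p * math.sqrt(v) for v, p in dist)

def mean(dist):
    return sum(p * v for v, p in dist)

def variance(dist):
    m = mean(dist)
    return sum(p * (v - m) ** 2 for v, p in dist)
```

Both returns have mean 232, but Y has a higher expected utility because its variance is much smaller.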
8.3.2 Risk-averse preferences and welfare loss

The reason that Y is preferred to X in Example 8.6 is related to the shape of u. In general, u(·) is assumed to be concave: this is said to represent risk-averse preferences. An individual is said to be risk averse if u is concave. It is easy to check that √· is a concave function. Loosely speaking, risk-averse individuals prefer low variance for a given mean return. To understand the role of concavity, consider the case where the random variable X has only two possible values, x₁ and x₂, as depicted in Figure 8.5:

X     Probability
x₁    p
x₂    1 − p

The individual prefers the sure amount μ = px₁ + (1 − p)x₂ to the uncertain return which gives x₁ with probability p and x₂ with probability (1 − p). Thus concavity means that the individual prefers the mean return μ, for sure, rather than the uncertain return X (with mean μ). One can check that the opposite is the case when the utility function is convex. This can be taken further to clarify the role of the variance.
Figure 8.5: Risk aversion

Suppose x₁, x₂ are close to μ, so that we can use a Taylor series approximation:

u(x₁) = u(μ) + u′(μ)(x₁ − μ) + (1/2)u′′(μ)(x₁ − μ)² + (1/3!)u′′′(μ₁)(x₁ − μ)³
u(x₂) = u(μ) + u′(μ)(x₂ − μ) + (1/2)u′′(μ)(x₂ − μ)² + (1/3!)u′′′(μ₂)(x₂ − μ)³.

We can use these equations to calculate the expected utility, E(u) = Σ p_i u(x_i), as follows:

E(u) = p₁u(x₁) + p₂u(x₂)
= [p₁u(μ) + p₂u(μ)] + [p₁u′(μ)(x₁ − μ) + p₂u′(μ)(x₂ − μ)]
+ (1/2)[p₁u′′(μ)(x₁ − μ)² + p₂u′′(μ)(x₂ − μ)²]
+ (1/3!)[p₁u′′′(μ₁)(x₁ − μ)³ + p₂u′′′(μ₂)(x₂ − μ)³].

If (x_i − μ) is "small", then the last term is relatively small and can be ignored. Then,

E{u(X)} ≈ [p₁u(μ) + p₂u(μ)] + [p₁u′(μ)(x₁ − μ) + p₂u′(μ)(x₂ − μ)] + (1/2)[p₁u′′(μ)(x₁ − μ)² + p₂u′′(μ)(x₂ − μ)²].
Adding terms:

E{u(X)} ≈ [p₁ + p₂]u(μ) + [p₁(x₁ − μ) + p₂(x₂ − μ)]u′(μ) + (1/2)[p₁(x₁ − μ)² + p₂(x₂ − μ)²]u′′(μ).

Note that [p₁ + p₂] = 1, and [p₁(x₁ − μ) + p₂(x₂ − μ)] = 0 from the definition of the mean; while [p₁(x₁ − μ)² + p₂(x₂ − μ)²] = σ², from the definition of the variance. Thus, we have the expression:

E{u(X)} ≈ u(μ) + (1/2)u′′(μ)σ².      (8.4)

If we write the last expression as E{u(X)} − u(μ) ≈ (1/2)u′′(μ)σ², we see that when u is concave, u′′ is negative – so that expected utility is less than u(μ). (See Figure 8.6.) Note also that the difference depends on the variance of the random return. Thus, the welfare loss due to small risk is:

(1/2)u′′(μ)σ².

So, (1/2)u′′(μ)σ² is the loss in utility, or the disutility, associated with the risk. Figure 8.6 illustrates the idea. The disutility associated with risk is measured in utility terms, but we can also give a monetary measure – discussed next.
Figure 8.6: Utility loss associated with risk
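Approximation (8.4) is easy to check numerically. A sketch using u = ln and a 50-50 gamble of 95 or 105, so μ = 100 and σ² = 25 (this is the same distribution used in Example 8.7 below):

```python
import math

# Check E{u(X)} ~ u(mu) + (1/2) u''(mu) sigma^2 for u = ln,
# X = 95 or 105 with probability 1/2 each (mu = 100, sigma^2 = 25).
exact = 0.5 * math.log(95) + 0.5 * math.log(105)
approx = math.log(100) + 0.5 * (-1 / 100 ** 2) * 25
gap = abs(exact - approx)   # the two agree to roughly 1e-6 here
```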
8.3.3 Risk aversion and the cost of small risk

Let income be represented by a random variable Y, where Y takes on the value y_i with probability p_i (i.e., Prob(Y = y_i) = p_i). Thus μ_y = Σ p_i y_i and σ_y² = Σ p_i(y_i − μ_y)². Instead of asking how risk affects the level of utility, one can focus on the monetary cost of bearing risk. How much would a person pay to avoid facing the risk and receive a sure income μ_y instead? If the individual receives μ_y and makes a payment of c, the amount received is y* = μ_y − c, and c is chosen so that the individual is indifferent between the sure income y* and facing the random return Y. The amount c is given as the solution of the equation u(μ_y − c) = Σ p_i u(y_i). In general, solving this explicitly for c is very difficult. However, we can use Taylor series expansions to find an approximate solution. This is done now, expanding each side separately.

First, consider an expansion of u(μ_y − c) around μ_y:

u(μ_y − c) = u(μ_y) − u′(μ*)c, for some μ* ∈ [μ_y − c, μ_y]
= u(μ_y) − u′(μ_y)c + [u′(μ_y) − u′(μ*)]c
≈ u(μ_y) − u′(μ_y)c, when c is small, so that [u′(μ_y) − u′(μ*)] is small.

By taking an expansion to the first order for u(μ_y − c), the approximation obtained is linear in c. (A better approximation is given by u(μ_y − c) ≈ u(μ_y) − u′(μ_y)c + (1/2)u′′(μ_y)c², but that is quadratic in c. Here, the linear approximation gives simplicity at the cost of less accuracy.)

Next, consider the right side of the expression u(μ_y − c) = Σ p_i u(y_i). An expansion of u(y_i) gives:

u(y_i) = u(μ_y) + u′(μ_y)(y_i − μ_y) + (1/2)u′′(μ_y)(y_i − μ_y)² + (1/2)[u′′(ξ) − u′′(μ_y)](y_i − μ_y)²,

where ξ is between μ_y and y_i. If y_i is close to μ_y, the last term is relatively small, and so:

u(y_i) ≈ u(μ_y) + u′(μ_y)(y_i − μ_y) + (1/2)u′′(μ_y)(y_i − μ_y)².

Multiplying by p_i and summing over i gives:

Σ p_i u(y_i) ≈ u(μ_y) + u′(μ_y) Σ p_i(y_i − μ_y) + (1/2)u′′(μ_y) Σ p_i(y_i − μ_y)²,

or, since Σ p_i(y_i − μ_y) = Σ p_i y_i − Σ p_i μ_y = μ_y − μ_y = 0,

Σ p_i u(y_i) ≈ u(μ_y) + (1/2)u′′(μ_y)σ_y².
Figure 8.7: The cost of risk
Combining these two approximations:

u(μ_y) − u′(μ_y)c ≈ u(μ_y − c) = Σ p_i u(y_i) ≈ u(μ_y) + (1/2)u′′(μ_y)σ_y²,

and so

u(μ_y) − u′(μ_y)c ≈ u(μ_y) + (1/2)u′′(μ_y)σ_y².

Solving for c (approximately) gives:

c ≈ (1/2) [−u′′(μ_y)/u′(μ_y)] σ_y².

Thus the "cost" of the risk is proportional to σ_y². The proportionality factor −u′′(μ_y)/u′(μ_y) is called the "degree of absolute risk aversion". Figure 8.7 indicates how c is determined.
EXAMPLE 8.7: Let u(x) = ln(x), so that u′(x) = 1/x and u′′(x) = −1/x². Suppose the random return is:

X     Probability
95    1/2
105   1/2

Therefore u′′(x)/u′(x) = −1/x and σ_x² = (1/2)[5² + 5²] = 25. From these, using the formula c ≈ (1/2)[1/μ]σ_x² with μ = 100, we calculate c = (1/2)[1/100]25 = 1/8. Similarly, the utility loss associated with the risk is |(1/2)u′′(μ)σ_x²| = (1/2)[1/100²]25 = 25/20,000. ♦
EXAMPLE 8.8: This example illustrates the computation of the cost of risk and the loss of utility using the approximations based on Taylor series expansions. Exact calculations (rather than approximations) are also given, to provide comparison.

Suppose that an individual has the utility function u(x) = x^(1/3) and that the individual faces a 50-50 gamble of $45 or $55 (i.e., with probability 1/2 the individual receives $45 and with probability 1/2 the individual receives $55), so that μ = 50 and σ² = (1/2)(45 − 50)² + (1/2)(55 − 50)² = 25. Since E{u(X)} ≈ u(50) + (1/2)u′′(50)σ², and u′′(x) = −(2/9)x^(−5/3),

u(50) − E{u(X)} ≈ −(1/2)u′′(50)σ² = (1/9)·50^(−5/3) × 25 = (0.00016375)25 = 0.004094.

Thus, the approximate loss of utility from facing the risk, relative to receiving the sure amount of 50, is 0.004094. Similarly, the approximate cost associated with the risk is given by c ≈ −(1/2)(u′′(50)/u′(50))σ². To compute this, observe that u(x) = x^(1/3), so that u′(x) = (1/3)x^(−2/3) and u′′(x) = −(2/9)x^(−5/3). Therefore,

−u′′(x)/u′(x) = [(2/9)x^(−5/3)] / [(1/3)x^(−2/3)] = (2/3)x⁻¹ = 2/(3x).

Since σ² = 25,

c ≈ −(1/2)(u′′(50)/u′(50))σ² = (1/2) × (2/(3 × 50)) × 25 = 1/6 = 0.16667.

Now, observe that in this example the exact value of expected utility is:

E{u(X)} = (1/2)u(45) + (1/2)u(55) = (1/2)45^(1/3) + (1/2)55^(1/3) = 3.679923.

Recall that c is chosen to satisfy u(50 − c) = E{u(X)}, so that c must satisfy (50 − c)^(1/3) = 3.679923, or 50 − c = 3.679923³ = 49.8329, which gives c = 50 − 49.8329 = 0.167101. Summarizing, the exact cost of the risk is 0.167101, and the approximation of this using the Taylor series expansion is 0.16667. ♦
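The exact and approximate costs of risk in Example 8.8 can be computed directly:

```python
# Example 8.8: exact vs. approximate cost of risk, u(x) = x^(1/3),
# 50-50 gamble of 45 or 55 (mu = 50, sigma^2 = 25).
u = lambda x: x ** (1 / 3)

expected_u = 0.5 * u(45) + 0.5 * u(55)
c_exact = 50 - expected_u ** 3            # solves u(50 - c) = E{u(X)}
c_approx = 0.5 * (2 / (3 * 50)) * 25      # (1/2)(-u''(50)/u'(50)) sigma^2
```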
8.4 Diversification and portfolio selection

Diversification of risk involves combining different random returns that are independent or have low correlation. In that way, the probability of all returns being low (or high) at the same time is small, so that extreme variation in return is less likely. This is briefly discussed and related to the cost of risk. Then approximately optimal portfolio selection using Taylor series expansions is considered.

8.4.1 Diversification

This section illustrates how portfolio diversification reduces risk. Suppose that we have n projects, and assume that the return on each project is independent of the returns on the other projects. Project j has return Y(j), a random variable with n_j possible values, (Y₁(j), Y₂(j), ..., Y_{n_j}(j)), and associated probabilities (p₁(j), p₂(j), ..., p_{n_j}(j)). The probability that Y(j) takes the value Y_i(j) is p_i(j):

μ(j) = Σ_{i=1}^{n_j} p_i(j) Y_i(j) = E(Y(j)),
σ²(j) = Σ_{i=1}^{n_j} p_i(j) [Y_i(j) − μ(j)]² = Var{Y(j)}.
Let μ(j) = μ and σ²(j) = σ² for all j – that is, we assume that each of the random variables has the same mean and variance. If an individual undertakes project j alone, the cost of risk is:

c = (1/2) [−u′′(μ)/u′(μ)] σ².

Suppose instead the individual forms a portfolio containing 1/n of each project. Denote this portfolio Z = (1/n) Σ_{j=1}^{n} Y(j), a random variable. The mean and variance are:

E(Z) = (1/n) Σ_{j=1}^{n} μ(j) = (1/n) nμ = μ

and Var(Z) = (1/n)σ². To see the latter:

Var(Z) = Var((1/n) Σ_{j=1}^{n} Y(j)) = (1/n²) Σ_{j=1}^{n} Var(Y(j)),

assuming the Y(j) are independent. This is then equal to

(1/n²) Σ_{j=1}^{n} σ² = (1/n²)[nσ²] = (1/n)σ².

Using our earlier results, we see that Z is a portfolio that has cost of risk to the individual given by:

c = (1/2) [−u′′(μ)/u′(μ)] (σ²/n).

We observe that as the number (n) of projects increases, the cost of risk goes to 0. You can easily see what is happening here – the portfolio Z has the same mean return as each of the Y(j) assets, but the variance of Z is 1/n times the variance of each of the Y(j).
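The variance reduction can also be illustrated by simulation. A sketch (the two-point return distribution 95 or 105, the portfolio size n = 25 and the number of draws are assumed example values, not from the text):

```python
import random

# Diversification: a portfolio averaging n independent returns has
# variance sigma^2/n. Assumed example: each project returns 95 or 105
# with probability 1/2 (mean 100, variance 25); with n = 25 projects
# the portfolio variance should be close to 25/25 = 1.
random.seed(0)
n, draws = 25, 20000

def portfolio():
    return sum(random.choice((95, 105)) for _ in range(n)) / n

sample = [portfolio() for _ in range(draws)]
m = sum(sample) / draws
var = sum((z - m) ** 2 for z in sample) / draws
```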
8.4.2 Portfolio selection

This section studies the optimal division of a portfolio into a risky and a safe asset. The purchase of the risky asset yields an end-of-year gross return Z per unit of risky asset purchased; let p_i = Prob(Z = z_i). The asset price (at the beginning of the year) is P, and the individual's wealth at the beginning of the year is W. Thus, if the individual purchases α units of the asset and if at the end of the year the risky asset return turns out to be z_i, then the individual's end-of-year wealth is W − αP + αz_i = W + α(z_i − P). Therefore, the (end-of-year) expected utility is given by E{u(W + α(Z − P))} = Σ p_i u(W + α(z_i − P)).

The individual's problem is to choose α to solve max_α Σ p_i u(W + α(z_i − P)). Differentiating with respect to α gives

Σ p_i u′(W + α(z_i − P))(z_i − P) = 0.

Solving this gives the optimal solution for α. However, this may be difficult to solve, so that finding an approximate solution via a Taylor series approximation is useful. If z_i − P is small,

u′(W + α(z_i − P)) ≈ u′(W) + u′′(W)α(z_i − P).
Note that by using a firstorder Taylor series expansion of u′ (W + α( z i − P )), the
expression above is linear in α. So X
p i {u′ (W ) + u′′ (W )α( z i − P )}( z i − P ) ≃ 0
or X
p i u′ (W )( z i − P ) +
X
p i u′′ (W )α( z i − P )2 ≃ 0.
To simplify this expression, observe that X
X p i u′ (W )( z i − P ) + p i u′′ (W )α( z i − P )2 X X = u′ (W ) p i ( z i − P ) + u′′ (W )α p i ( z i − P )2 .
Focus on the terms Σ p_i (z_i − P) and Σ p_i (z_i − P)². We see that:

Σ p_i (z_i − P) = Σ p_i z_i − Σ p_i P = µ − P.

Note that σ² = Σ p_i (z_i − µ)² is the variance of Z, but the expression above involves Σ p_i (z_i − P)². Writing z_i − P = (z_i − µ) + (µ − P),

Σ p_i (z_i − P)² = Σ p_i [(z_i − µ) + (µ − P)]²
  = Σ p_i (z_i − µ)² + 2(µ − P) Σ p_i (z_i − µ) + (µ − P)² Σ p_i
  = σ² + (µ − P)².

(Using the fact that Σ p_i = 1 and Σ p_i (z_i − µ) = 0.) This calculation gives Σ p_i (z_i − P)² = σ² + (µ − P)². So,
u′(W) Σ p_i (z_i − P) + u″(W) α Σ p_i (z_i − P)² = u′(W)(µ − P) + u″(W)α[σ² + (µ − P)²].

Therefore, the first-order condition is approximately:

u′(W)(µ − P) + u″(W)α[σ² + (µ − P)²] ≃ 0.

Solving for α:

α ≈ [1/(−u″(W)/u′(W))] · (µ − P)/(σ² + (µ − P)²).
EXAMPLE 8.9: Let u(ξ) = ln(ξ), W = 100, P = 2. Suppose that the random variable Z has the distribution Prob(Z = 1) = Prob(Z = 5) = 1/2. Thus, µ = 3,
CHAPTER 8. TAYLOR SERIES
σ² = 4. With these parameters:

−u″(100)/u′(100) = (1/100²)/(1/100) = 1/100 ⇒ 1/(−u″(W)/u′(W)) = 100.

Using the formula for α: α ≈ 100 · (3 − 2)/(4 + 1) = 100/5 = 20, so the approximate solution is to purchase roughly 20 units of the asset.
♦
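As a numerical check of Example 8.9 (a sketch; the bisection solver is just one convenient way to find the exact root), the code below computes the approximate α from the Taylor formula and compares it with the exact solution of the first-order condition:

```python
# Example 8.9 check: u = ln, W = 100, P = 2, Z in {1, 5} with probability 1/2 each.
W, P = 100.0, 2.0
outcomes = [(1.0, 0.5), (5.0, 0.5)]
mu = sum(z * p for z, p in outcomes)                    # 3
sigma2 = sum(p * (z - mu) ** 2 for z, p in outcomes)    # 4

# Taylor formula: alpha ~ [1/(-u''(W)/u'(W))] (mu - P)/(sigma2 + (mu - P)^2);
# for u = ln, 1/(-u''(W)/u'(W)) = W.
alpha_approx = W * (mu - P) / (sigma2 + (mu - P) ** 2)  # 100 * 1/5 = 20

def foc(a):
    # Exact first-order condition: sum_i p_i u'(W + a(z_i - P))(z_i - P), u'(x) = 1/x.
    return sum(p * (z - P) / (W + a * (z - P)) for z, p in outcomes)

# Solve foc(a) = 0 by bisection (here it can also be done by hand: alpha = 100/3).
lo, hi = 0.0, 50.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if foc(lo) * foc(mid) <= 0.0:
        hi = mid
    else:
        lo = mid
alpha_exact = 0.5 * (lo + hi)

assert abs(alpha_approx - 20.0) < 1e-12
assert abs(alpha_exact - 100.0 / 3.0) < 1e-6
```

The gap between 20 and 100/3 ≈ 33.3 reflects the fact that z_i − P is not small here, so the linearization of u′ is rough.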
8.5 The multivariate version of Taylor's theorem

Continuous functions of many variables can also be approximated by Taylor series expansions. Let f(x): R^n → R be a function of n variables, where R is the real number line and R^n = R × ··· × R, the n-times product of R. Recall that the partial derivative of f with respect to x_i is denoted f_{x_i}, and the second partial is f_{x_i x_j} = ∂²f/∂x_i ∂x_j. Write:

∇f = (f_{x_1}, f_{x_2}, ..., f_{x_n})′,

H = [ f_{x_1 x_1}  f_{x_1 x_2}  ···  f_{x_1 x_n}
      f_{x_2 x_1}  f_{x_2 x_2}  ···  f_{x_2 x_n}
      ···
      f_{x_n x_1}  f_{x_n x_2}  ···  f_{x_n x_n} ]

and

x − x_0 = (x_1 − x_{01}, x_2 − x_{02}, ..., x_n − x_{0n})′.

Also, write ∇f(x) and H(x) to denote these arrays of derivatives evaluated at x. A Taylor series expansion of f to the second order around x_0 is

f(x) = f(x_0) + ∇f(x_0)·(x − x_0) + (1/2!)(x − x_0)′ H(p)(x − x_0).
Theorem 8.3: Let f: R^n → R be continuous with continuous first and second partial derivatives. Then given x, x_0:

f(x) = f(x_0) + ∇f(x_0)·(x − x_0) + (1/2!)(x − x_0)′ H(p)(x − x_0),

where p is on the line between x and x_0: for some θ ∈ (0, 1), p = θx_0 + (1 − θ)x.
EXAMPLE 8.10: Let f(x, y) = x²y³. So, f_x(x, y) = 2xy³, f_y(x, y) = 3x²y², f_xx = 2y³, f_yy = 6x²y and f_xy = 6xy². Then a first-order Taylor series expansion of
f around x = 1, y = 1 is:

f(x, y) ≈ f(1, 1) + f_x(1, 1)(x − 1) + f_y(1, 1)(y − 1) = 1 + 2(x − 1) + 3(y − 1) = −4 + 2x + 3y.

The second-order Taylor series expansion has the form:

f(x, y) ≈ f(1, 1) + f_x(1, 1)(x − 1) + f_y(1, 1)(y − 1) + (1/2)[f_xx(1, 1)(x − 1)² + 2f_xy(1, 1)(x − 1)(y − 1) + f_yy(1, 1)(y − 1)²].

Substituting the values for the function:

x²y³ ≈ 1 + 2(x − 1) + 3(y − 1) + (1/2)[2(x − 1)² + 2·6(x − 1)(y − 1) + 6(y − 1)²]
     = 1 + 2(x − 1) + 3(y − 1) + (x − 1)² + 6(x − 1)(y − 1) + 3(y − 1)²
     = 6 − 6x − 9y + x² + 3y² + 6xy.
♦
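A quick numerical check of Example 8.10 (the evaluation point (1.05, 0.98) is an arbitrary choice near (1, 1)): the second-order expansion should track f = x²y³ much more closely than the first-order one.

```python
# Example 8.10 check: compare f(x, y) = x^2 y^3 with its Taylor expansions around (1, 1).
def f(x, y):
    return x ** 2 * y ** 3

def taylor1(x, y):
    # First-order expansion around (1, 1).
    return 1 + 2 * (x - 1) + 3 * (y - 1)

def taylor2(x, y):
    # Second-order expansion around (1, 1).
    return taylor1(x, y) + (x - 1) ** 2 + 6 * (x - 1) * (y - 1) + 3 * (y - 1) ** 2

x, y = 1.05, 0.98                     # an arbitrary point near (1, 1)
e1 = abs(f(x, y) - taylor1(x, y))
e2 = abs(f(x, y) - taylor2(x, y))
assert e2 < e1                        # the quadratic terms improve the fit
assert e2 < 1e-3                      # remaining error is third order in the step
```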
8.6 Proof of Taylor's theorem

The proof of Taylor's theorem makes use of Rolle's theorem:

Theorem 8.4: Let f be a continuous, differentiable function on the interval [a, b]. Suppose that f(a) = f(b) = 0. Then there is a point c ∈ (a, b) with f′(c) = 0.

PROOF: If f(x) = 0 for all x ∈ [a, b], then f′(x) = 0 on [a, b]. Otherwise, there is some point x ∈ [a, b] with f(x) > 0 or f(x) < 0. Suppose the former, so f(x) > f(a) = f(b). Since f is continuous, it has a maximum on [a, b], and this cannot be at a or b. Hence there is an interior maximum at some point c with f′(c) = 0. If instead f(x) < f(a) = f(b), there is an interior minimum on [a, b] at some point c with f′(c) = 0.
Theorem 8.5: Let f be a continuous function on an interval [a, b] with continuous derivatives f′, f″, ..., f⁽ⁿ⁾. Then given x, x_0 ∈ (a, b):

f(x) = f(x_0) + f′(x_0)(x − x_0) + (1/2!)f″(x_0)(x − x_0)² + (1/3!)f‴(x_0)(x − x_0)³ + ··· + (1/n!)f⁽ⁿ⁾(p)(x − x_0)ⁿ,
where p is between x and x_0, and

f(x) ≈ f(x_0) + f′(x_0)(x − x_0) + (1/2!)f″(x_0)(x − x_0)² + (1/3!)f‴(x_0)(x − x_0)³ + ··· + (1/n!)f⁽ⁿ⁾(x_0)(x − x_0)ⁿ

when x is close to x_0.

PROOF: Consider the second-order expansion – the generalization to the nth-order case is straightforward. Let

g(z) = f(x) − f(z) − f′(z)(x − z) − (K/2)(x − z)².

Observe that g(x) = 0 and g(x_0) = f(x) − f(x_0) − f′(x_0)(x − x_0) − (K/2)(x − x_0)². Choose K to make this 0, so that g(x_0) = 0. Then g(x) = g(x_0), so that by Rolle's theorem, there is a point ξ between x_0 and x with g′(ξ) = 0 (min{x_0, x} < ξ < max{x_0, x}). Differentiating g(z):

g′(z) = −f′(z) − f″(z)(x − z) + f′(z) + K(x − z) = −f″(z)(x − z) + K(x − z).

At ξ, g′(ξ) = −f″(ξ)(x − ξ) + K(x − ξ) = 0. Since ξ ≠ x, f″(ξ) = K. For the nth-order case:
g(z) = f(x) − f(z) − f′(z)(x − z) − (1/2!)f″(z)(x − z)² − ··· − (1/(n−1)!)f⁽ⁿ⁻¹⁾(z)(x − z)ⁿ⁻¹ − (K/n!)(x − z)ⁿ.

Again, g(x) = 0, and K is chosen so that g(x_0) = 0. Rolle's theorem implies that there is some ξ between x_0 and x with g′(ξ) = 0. In the derivative of g, all but two terms cancel:

g′(z) = −(1/(n−1)!)f⁽ⁿ⁾(z)(x − z)ⁿ⁻¹ + (K/(n−1)!)(x − z)ⁿ⁻¹.

At ξ,

g′(ξ) = −(1/(n−1)!)f⁽ⁿ⁾(ξ)(x − ξ)ⁿ⁻¹ + (K/(n−1)!)(x − ξ)ⁿ⁻¹ = 0.

Since x ≠ ξ, f⁽ⁿ⁾(ξ) = K. For the multivariate version, the proof is similar.
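The mean-value point ξ in the remainder can be exhibited concretely. A minimal sketch for f = exp, x_0 = 0, x = 1, using the first-order expansion (so the remainder involves f″(ξ)):

```python
import math

# For f = exp, x0 = 0, x = 1, Theorem 8.5 (to first order plus remainder) gives
#   e = f(0) + f'(0)(1 - 0) + (1/2!) f''(xi) (1 - 0)^2   for some xi in (0, 1).
# Since f'' = exp, xi solves e^xi = 2(e - 2), i.e. xi = ln(2(e - 2)).
xi = math.log(2.0 * (math.e - 2.0))
assert 0.0 < xi < 1.0                     # xi lies strictly between x0 and x

lhs = math.e
rhs = 1.0 + 1.0 + 0.5 * math.exp(xi)      # f(0) + f'(0) + (1/2) f''(xi)
assert abs(lhs - rhs) < 1e-12
```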
Theorem 8.6: Let f: R^n → R be continuous with continuous first and second partial derivatives. Then given x, x_0:

f(x) = f(x_0) + ∇f(x_0)·(x − x_0) + (1/2!)(x − x_0)′ H(p)(x − x_0),

where p is on the line between x and x_0: for some θ ∈ (0, 1), p = θx_0 + (1 − θ)x.

PROOF: To see why this is so, given z, ∆z, let g(t) = f(z + t∆z). Then g is a function of one variable, and a second-order Taylor series expansion around 0 gives:

g(t) = g(0) + g′(0)t + (1/2)g″(s)t², 0 < s < t.

Since

g′(t) = Σ_i f_i(z + t∆z)∆z_i ⇒ g′(0) = Σ_i f_i(z)∆z_i

and

g″(t) = Σ_j Σ_i f_{ij}(z + t∆z)∆z_i ∆z_j ⇒ g″(s) = Σ_j Σ_i f_{ij}(z + s∆z)∆z_i ∆z_j,

this gives

f(z + t∆z) = f(z) + Σ_i f_i(z) t∆z_i + (1/2) Σ_j Σ_i f_{ij}(z + s∆z) t²∆z_i ∆z_j.

If z = x_0 and ∆z = (x − x_0)/t, then t∆z = (x − x_0) = ∆x and s∆z = (s/t)(x − x_0) = α(x − x_0), with α = s/t ∈ (0, 1). Thus

f(x) = f(x_0) + Σ_i f_i(x_0)∆x_i + (1/2) Σ_j Σ_i f_{ij}(x_0 + α∆x)∆x_i ∆x_j.

Writing p = x_0 + α(x − x_0) = (1 − α)x_0 + αx gives:

f(x) = f(x_0) + ∇f(x_0)·(x − x_0) + (1/2!)(x − x_0)′ H(p)(x − x_0).
Exercises for Chapter 8
Exercise 8.1  Consider the following functions:

f(x) = eˣ − 1,       (8.5)
g(x) = ln(x + 1),    (8.6)
h(x) = x²,           (8.7)
u(x) = x^{1/2},      (8.8)
v(x) = 1/x,          (8.9)
w(x) = sin(x).       (8.10)

In all cases take x > 0. (a) Plot each function and determine whether each function is concave, convex or neither. (b) For each function, give the second-order Taylor series expansion.

Exercise 8.2
Give the second-order Taylor series expansions for the following functions,

f(x) ≈ f(x̄) + f′(x̄)(x − x̄) + (1/2)f″(x̄)(x − x̄)²:

f(x) = x^{1/3}    (8.11)
f(x) = ln(x)      (8.12)
f(x) = sin(x)     (8.13)
f(x) = cos(x)     (8.14)
f(x) = e^{−rx}    (8.15)
f(x) = x^α        (8.16)
f(x) = aˣ.        (8.17)
Exercise 8.3  Consider the average cost function c(Q) = e^{−Q} + (1/2)Q². Show that the cost function is strictly convex. Find the minimum of the average cost function c: find the value of Q that minimizes c. To do this, first observe that we can write a Taylor series approximation of c:

c(Q) ≅ c(Q_0) + c′(Q_0)(Q − Q_0) + (1/2)c″(Q_0)(Q − Q_0)².

When Q_0 is close to the value of Q that minimizes c, then minimizing c(Q) should give approximately the same answer as minimizing

c(Q_0) + c′(Q_0)(Q − Q_0) + (1/2)c″(Q_0)(Q − Q_0)².

If we minimize the latter expression by differentiating with respect to Q, we get the first-order condition: c′(Q_0) + c″(Q_0)(Q − Q_0) = 0. Solving for Q gives Q = Q_0 − [c′(Q_0)/c″(Q_0)]. This is the basis of an iteration: guess an initial Q_0; this gives Q_1 = Q_0 − [c′(Q_0)/c″(Q_0)]. With Q_1 we can obtain another estimate Q_2 = Q_1 − [c′(Q_1)/c″(Q_1)], with Q_2 we can obtain Q_3, and so on. At the nth iteration, Q_{n+1} = Q_n − [c′(Q_n)/c″(Q_n)]. The process converges when the change from Q_n to Q_{n+1} becomes small. In this case, we end up with (approximately) Q_n = Q_{n+1}. Looking at the iteration rule Q_{n+1} = Q_n − [c′(Q_n)/c″(Q_n)] yields 0 = Q_{n+1} − Q_n = −[c′(Q_n)/c″(Q_n)]. Thus c′(Q_n)/c″(Q_n) = 0, and since c″(Q_n) ≠ 0, we get c′(Q_n) = 0. Thus, since c is convex, Q_n minimizes c.
Exercise 8.4  Harvey's brewery has a monopoly in the beer market. The demand function is P(Q) = 100 − (1/3)Q² and the cost function is C(Q) = Q² + e^{0.01Q²}. (a) Is the profit function concave? (b) Using numerical optimization, find the profit-maximizing level of output. (Hint: start with Q_0 = 8 and perform four iterations.)

Exercise 8.5
A monopolist is faced with the following demand curve and cost function:

P(Q) = 154 − 2e^Q,   C(Q) = 4Q + 6Q².

(a) Confirm that the profit function is concave. (b) Find the profit-maximizing level of output using numerical optimization. Write down the algorithm. (Start iterations at q_0 = 2 and perform five iterations.)
Exercise 8.6  A manufacturer of jackets faces the following demand and cost functions:

P(q) = 1000 − 10q²,   C(q) = 100q + q ln q.

(a) Confirm that the profit function is concave. (b) Find the profit-maximizing level of output by numerical optimization.

Exercise 8.7  Faced with the following demand and cost functions
P(q) = 1000 − q²,   C(q) = 25 + q² + q(ln q)²,

a widget firm wishes to find the profit-maximizing level of output. Find the profit-maximizing level of output.

Exercise 8.8  Suppose a firm's profit function (as a function of output) is given by π(q) = 2 ln(q + 1) + √q − q. Find the profit-maximizing value of q.

Exercise 8.9  Suppose a firm's revenue function (as a function of output) is given by r(q) = 3(1 − e^{−q}) + 3 ln(q + 1) − 2q. Find the revenue-maximizing value of q.

Exercise 8.10
Consider the function f(x) = 1 + x + 4x² − 3x³. This function has two points at which the first derivative of f is 0. Denote these x* and x̄, so that f′(x*) = f′(x̄) = 0. It is easy to see that one of these is 1; say x̄ = 1. Find the value of x* using a Taylor series expansion.

Exercise 8.11
An individual has utility defined on income according to u(y) = ln(y + 1). The individual has initial income ȳ and accepts a bet on a coin toss: if heads, the individual receives a dollar; if tails, the individual pays a dollar. The coin is fair (50/50 chance of heads or tails), so the average utility from the bet is

(1/2)u(ȳ − 1) + (1/2)u(ȳ + 1).

If the individual had not accepted the bet, the utility obtained is u(ȳ) for sure. Compare the average utility of the bet with the sure utility from not betting. Should the individual have accepted the bet?

Exercise 8.12  An individual has the utility function u(y) = ln(y + 1). This gives
the utility of having y dollars. The individual faces a random income next period:
current income is $100 and next period income will either fall by $5 to $95 or increase by $5 to $105. There is a 50% chance that income will rise and a 50% chance that income will fall, so that average income next period is 100 = (1/2)·95 + (1/2)·105. The risk premium, c, is defined as the amount of money less than $100 that makes the individual indifferent between accepting that sum for sure and facing the risk
associated with receiving $95 or $105. That is, the individual is indifferent between $100 − c for sure and the risky return of $95 with 50% chance and $105 with 50% chance. Thus,

u(100 − c) = (1/2)u(95) + (1/2)u(105), or ln(101 − c) = (1/2)ln(101 − 5) + (1/2)ln(101 + 5).

Find the approximate value of c using Taylor series expansions. Plot the utility function and illustrate how c is calculated on your graph.
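The kind of computation involved here can be sketched numerically (a check of the method, not a substitute for the requested plot):

```python
import math

# u(y) = ln(y + 1); income is 95 or 105 with probability 1/2 each (mean 100).
# Exact risk premium: ln(101 - c) = (1/2) ln(96) + (1/2) ln(106),
# so 101 - c = sqrt(96 * 106).
c_exact = 101.0 - math.sqrt(96.0 * 106.0)

# Taylor approximation: c ~ (1/2)(-u''(mu)/u'(mu)) sigma^2 at mu = 100.
# Here u'(y) = 1/(y + 1), u''(y) = -1/(y + 1)^2, so -u''(100)/u'(100) = 1/101.
sigma2 = 25.0
c_approx = 0.5 * (1.0 / 101.0) * sigma2

assert 0.12 < c_exact < 0.13
assert abs(c_exact - c_approx) < 1e-3    # both are about 0.124
```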
Exercise 8.13  Repeat the previous question using the utility function u(y) = e^y − 1. In this case, note that the utility function is convex (second derivative positive), so that your picture will change significantly!

Exercise 8.14  A student has just received a present of a new computer. The student's yearly income is $50 and the student has a utility function u(y) = y^{1/2}. Unfortunately, there is a 10% chance that the computer will be stolen and the price of a new computer is $5, so in the event of theft, the replacement will leave the student with only $45 income for the year. Thus, in the absence of theft, the student's income is $50, while if the computer is stolen, the income drops to $45. Computer insurance is available at a cost of $1. Use the calculations from a Taylor series expansion to decide whether the student should buy insurance or not. Note that

u′(y) = (1/2)y^{−1/2} and u″(y) = −(1/4)y^{−3/2}, so that u″(y)/u′(y) = −(1/4)y^{−3/2}·[2y^{1/2}] = −(1/2)y^{−1}.
Exercise 8.15  Consider an individual with the following utility function defined over wealth: u(w) = √w. Suppose the person has the opportunity to take the following gamble: draw a ticket from a hat; a blue ticket yields a prize of $200, a yellow ticket yields $50 and an orange ticket yields $100. There are four orange tickets, two blue, and two yellow. (a) Calculate the expected payoff from the bet.
(b) Calculate the approximate cost of risk to this individual (using a Taylor series expansion). (c) How much is the person willing to pay to partake in this gamble? (d) Suppose instead the utility function is u(w) = w. Rework your answers to parts (b) and (c). (e) Suppose now the utility function is u(w) = w². Rework your answers to parts (b) and (c). (f) Explain intuitively why your answers differ in parts (c), (d) and (e).

Exercise 8.16  A risk-averse individual has the following utility function defined over income: u(w) = √w. The person is faced with the following lottery with
possible outcomes denoted x_i and their associated probabilities P(x_i):

  x     P(x)
  10    1/2
  15    1/8
  1     1/8
  7     1/4

(a) What is the expected payoff from this lottery? (b) Calculate this person's measure of absolute risk aversion. (c) If tickets for this lottery cost $7.50, will he buy a ticket? (Use a Taylor series expansion.) (d) Another individual has utility function v(x) = kx, k > 0. Will this individual buy a ticket?
Exercise 8.17  An individual owns a house worth $200,000 that is located next to a river. There is a 1% chance that the river will flood in the spring and cause $60,000 worth of damage. The utility function of the home owner is u(w) = ln w. (a) How much is the individual willing to pay for house insurance? (b) Suppose instead that the homeowner's utility function is u(w) = w. How much
would he now be willing to pay for insurance? Why is the amount in part (a) larger?

Exercise 8.18  A group of graduate students have recently pooled together their life's savings and bought a ski cabin. Apart from its inaccessibility, the major drawback with this cabin is that it is located near a hill that is somewhat prone to avalanches. The cabin is worth $100,000. There are two types of avalanche. The first, which occurs with 20% probability, will cause $30,000 worth of damage; the second occurs with 2% probability but causes $90,000 worth of damage. Suppose there are three students in this group with utility functions
u_1(w) = w^{1/2},   u_2(w) = w²,   u_3(w) = w.
How much will each student be willing to pay to insure the cabin? What is the relationship between these amounts and how is this related to their attitudes towards risk? Is it true that a risk-loving individual will never purchase insurance?

Exercise 8.19  According to expected utility theory, faced with uncertain outcomes x_1 with probability p and x_2 with probability (1 − p), an individual associates utility E(u) = pu(x_1) + (1 − p)u(x_2). The risk premium, c, associated with the risk is defined by u(µ − c) = E(u) = pu(x_1) + (1 − p)u(x_2), where µ = px_1 + (1 − p)x_2,
the mean of the random variable x. In the case where u(x) = √x, x_1 = 90, x_2 = 110 and p = 1/2, find c. An approximation for c is given by

c ≈ −(1/2) [u″(µ)/u′(µ)] σ²,  where σ² = p(x_1 − µ)² + (1 − p)(x_2 − µ)².

Estimate c using this approximation and compare it with the exact solution.

Exercise 8.20
(a) Let a risk-averse agent have utility function u(x) = ln(x + 1). Suppose that this individual faces the gamble: win $95 with probability 1/2 and win $105 with probability 1/2. How much less than $100 would the individual accept instead of the gamble?

(b) Let a random variable have two possible outcomes x_1 and x_2, with probabilities p_1 and p_2, respectively. Thus, the mean is µ = p_1x_1 + p_2x_2. Let the agent's utility function be u. Suppose that p_1u(x_1) + p_2u(x_2) = u(z) (and both x_1 and x_2 are close to z). Show that:

(µ − z) / [p_1(x_1 − z)² + p_2(x_2 − z)²] ≈ −(1/2) u″(z)/u′(z).
9 Vectors

9.1 Introduction

A vector is an array of numbers, x = (x_1, x_2, ..., x_n), and may be interpreted in a number of ways: as the values of a set of variables, as a coordinate position in R^n, as a direction (move x_i units in direction i, for i = 1, ..., n), and so on. Depending on context, different interpretations are appropriate. In what follows (Sections 9.2 and 9.3), some of the key summarizing features of a vector are considered: the length of a vector, the distance between vectors and the angle between vectors. As mentioned, from a geometric perspective, vectors can be interpreted as having direction – in subsequent chapters on optimization, the direction of improvement of a function is central to the characterization of an optimal choice. As it turns out, this direction is represented by a vector of partial derivatives of the function, the gradient vector. Vectors implicitly define hyperplanes and half-spaces, key concepts in economic theory – Section 9.4 discusses these concepts. Finally, Section 9.5 describes Farkas' lemma, an old and important result in optimization theory.
9.2 Vectors: length and distance

The vector x = (x_1, x_2, ..., x_n) is a point in R^n, a list of n real numbers. A vector may also be viewed in terms of direction: x_1 units along axis 1, x_2 units along axis 2, and so on, as illustrated in Figure 9.1. Expressed that way, the vector x describes a direction from 0 to a point.
Figure 9.1: Vectors and length
What is the length of the vector x, labeled c in Figure 9.1? To answer this, from the vector x form a right-angled triangle where the hypotenuse has length c and the adjacent and opposite sides have lengths a and b, respectively. The length of the vector x is the length of the hypotenuse, c, and the value of c can be determined from the opposite and adjacent sides. On the right side of the figure, the triangle is arranged around a square with side length c. This square with side length c is inscribed inside a square of side length a + b. The area of the larger square consists of four occurrences of the triangle and one of the square with side length c. The area of the triangle is (1/2)ab (half the base times the perpendicular height). Thus, (a + b)² = c² + 4(½ab) = c² + 2ab. Since (a + b)² = a² + b² + 2ab, a² + b² = c², so that c = √(a² + b²), which is Pythagoras' theorem. Thus, the length of the vector x, denoted ∥x∥, is the length of the line c:

∥x∥ = √(x_1² + x_2²).

This extends to n dimensions directly. The length of a vector x = (x_1, x_2, ..., x_n) is denoted ∥x∥ and defined

∥x∥ = √(x_1² + x_2² + ··· + x_n²) = √(Σ_{i=1}^n x_i²) = √(x′x).

From the definition, the length of x and −x are the same: ∥x∥ = ∥−x∥. The sum and difference of two vectors x and y define the vectors x + y and x − y, as depicted in Figure 9.2.
The distance between two vectors, x, y, is the length of the vector x − y:

∥x − y∥ = √( Σ_{i=1}^n (x_i − y_i)² ).
Figure 9.2: Vectors

Given a vector x, its length defines a circle or sphere with x on the boundary: a circle with center 0 has radius r = ∥x∥. Thus, the smaller the length of a vector, the smaller the circle on whose boundary it lies, as illustrated in Figure 9.3, where x and x′ have the same length (and y, y′ also have the same length). Given a vector z, points equidistant from z all lie on a circle centered on z of some fixed radius. In the second diagram in Figure 9.3, the point w, closest to y and on the horizontal axis, has the property that y − w is a vector perpendicular (orthogonal) to the horizontal axis. The circle centered on y and tangent to the horizontal axis has only the point w in common with the horizontal axis. The figure suggests that y − w is perpendicular to the horizontal axis and this is
dicular (orthogonal) to the horizontal axis. The circle centered on y and tangent to the horizontal axis has only the point w in common with the horizontal axis. The figure suggests that y − w is perpendicular to the horizontal axis and this is x2
x2
y
y− w
x
y
x′ 0
x1 y′
Figure 9.3: Length
e1 w
x1
in fact the case. To see this, let e_1 = (1, 0). Then ce_1 is the vector e_1 scaled by the factor c. Any point, such as w, on the horizontal axis may be written as ce_1 for some value of c. Consider the problem of choosing c to minimize the distance from y to the horizontal axis. This requires minimizing ∥y − ce_1∥, or equivalently, minimizing:

∥y − ce_1∥² = (y − ce_1)·(y − ce_1) = y·y − 2cy·e_1 + c²e_1·e_1.

Choosing c to minimize this expression gives −2y·e_1 + 2ce_1·e_1 = 0, so that

c̄ = (y·e_1)/(e_1·e_1).

So, w = c̄e_1 is the point on the horizontal axis closest to y. Observe that

(y − w)·e_1 = (y − c̄e_1)·e_1 = y·e_1 − c̄e_1·e_1 = y·e_1 − y·e_1 = 0.

Since any point, v, on the horizontal axis may be written v = ce_1 for some c, (y − w)·v = (y − w)·ce_1 = c(y − w)·e_1 = 0. In particular, (y − w)·w = 0. The explanation is simple. The choice of w yields a vector y − w with 0 in the first coordinate. Every vector on the horizontal axis has 0 in the second coordinate, so (y − w)·x = 0 for any x on the horizontal axis.
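The projection calculation above can be sketched directly (the point y = (3, 4) is an arbitrary illustration):

```python
# Projection of y onto the span of u: choose c to minimize ||y - cu||,
# giving c = (y . u)/(u . u); the residual y - cu is orthogonal to u.
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def project(y, u):
    c = dot(y, u) / dot(u, u)
    return c, [c * ui for ui in u]

y = (3.0, 4.0)                   # arbitrary illustrative point
e1 = (1.0, 0.0)                  # horizontal-axis direction, as in the text
c, w = project(y, e1)
assert c == 3.0 and w == [3.0, 0.0]      # closest point on the horizontal axis

residual = [yi - wi for yi, wi in zip(y, w)]
assert dot(residual, e1) == 0.0          # (y - w) . e1 = 0
```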
9.3 Vectors: direction and angles

Since vectors indicate direction, we may compare the relative directions of different vectors. One particularly important comparison occurs when the vectors are orthogonal or "perpendicular".

Definition 9.1: Two vectors a and b are said to be orthogonal if their inner product is 0: a·b = 0.

In two dimensions, a·b = a_1b_1 + a_2b_2, so that if the vectors are orthogonal, then a_1b_1 + a_2b_2 = 0; rearranging, a_2/a_1 = −b_1/b_2, provided a_1 and b_2 are both non-zero. From the definition, if a and b are orthogonal, then so are the vectors ca and db for any constants c and d, since ca·db = (cd)a·b = 0. The vectors u and v,
u = (4, 2)′  and  v = (−1, 2)′,
have inner product u·v = (4)(−1) + (2)(2) = −4 + 4 = 0, so these vectors are orthogonal (Figure 9.4).

Figure 9.4: Orthogonality

Consider two vectors, v and u, depicted in Figure 9.5. Two cases are depicted – one where the angle between them is acute and one where it is obtuse. In either case, drop a perpendicular line from the point v onto a point, cu, lying on the vector u. Orthogonality requires that this line, v − cu, be orthogonal to u: (v − cu)·u = 0. This implies that v·u − cu·u = 0, so that

c = (v·u)/(u·u) = (v·u)/∥u∥².

When the angle between v and u is acute, c > 0 and v·u > 0; when the angle is obtuse, c < 0 and v·u < 0.
Notice that when v·u = 0, cos(θ) = 0, corresponding to a 90° angle. For a different perspective consider Figure 9.8, where β and z are orthogonal: β·z = 0, or β_1z_1 + β_2z_2 = 0, so that z_2 = −(β_1/β_2)z_1. Thus, the angle between z and the horizontal
Figure 9.7: Trigonometric functions (cos(θ) = a/h, sin(θ) = b/h, tan(θ) = b/a)
Figure 9.8: Orthogonal vectors
axis is (in absolute terms) the same as the angle between β and the vertical axis. Since this angle and γ combine to form a 90° angle, the angle between β and z is 90°. Alternatively, from the second diagram in Figure 9.8, note that 2α + δ = 90°, so the angle between (β_1, β_2) and (β_2, −β_1) is 90° and (β_1, β_2)·(β_2, −β_1) = 0.
9.4 Hyperplanes and direction of increase

Let β = (β_1, ..., β_n) ≠ 0 be a vector of constants, c a constant, and consider the equation:

β·x = Σ_{i=1}^n β_i x_i = c.
The set of all x satisfying this equation defines a hyperplane: H(β, c) = {x | β·x = c}. If y and z are on this hyperplane, then β·y = β·z = c, so that β·(y − z) = 0. Thus, β and (y − z) are orthogonal, as depicted in Figure 9.9. Movement along the hyperplane defined by β and c is orthogonal to β.

Figure 9.9: The normal to a hyperplane

A hyperplane H(β, c) = {x | β·x = c} separates R^n into three regions:

H = {x | β·x = c},   H₊ = {x | β·x > c},   H₋ = {x | β·x < c}.
On which sides of H do H+ and H− lie? Consider the point y and move from y along the direction β a fraction δ > 0 to y + δβ, as depicted in Figure 9.10. Observe that
c′ = β·(y + δβ) = β·y + δβ·β = c + δβ·β > c.

Therefore, β points into the region H₊: starting at y and moving from y to w = y + δβ, w is in H₊. Equivalently, let f(x) = β·x; then β points in the direction in which f(x) increases. Furthermore, the direction β gives the greatest increase among all directions of equal length. For example, r has the same length as δβ (since they both lie on the same circle around y), but c″ = β·(y + r) < β·(y + δβ) = c′.
Figure 9.10: The normal to a hyperplane
As noted earlier, the inner product of β with any vector ξ may be positive, zero or negative. If β·ξ = 0, they are orthogonal. Informally, if β·ξ > 0, the angle between β and ξ is less than 90°; and if β·ξ < 0, the angle between β and ξ is greater than 90°. To see this from a different perspective, consider Figure 9.11, where z is chosen so that β·z > c, say β·z = c′ > c.
Figure 9.11: The sign of an inner product
Notice from the figure that the vector ξ satisfies β·ξ = β·(x − z) = c − c′ < 0, and ξ forms an angle greater than 90° with β. Similarly, if z were chosen with β·z = ĉ < c, then β·ξ = β·(x − z) > 0.

If β is normal (perpendicular) to H, then for any α ≠ 0, αβ is normal to the hyperplane. Dividing β by ∥β∥ gives the unit normal to the hyperplane H: β* = β/∥β∥. The unit normal is orthogonal to H and has length 1, since β*′β* = β′β/∥β∥² = 1.
9.5 Farkas' lemma

These ideas may be used to develop a well-known theorem in linear algebra and linear inequalities called Farkas' lemma. This result is particularly useful in optimization theory. Let A be an m × n matrix and b an m × 1 vector.

Theorem 9.1: Either (1) {Ax = b, x ≥ 0} or (2) {A′y ≤ 0 and y′b > 0} has a solution, but not both.
PROOF: Suppose that (1) has no solution: b ∉ K ≡ {Ax | x ≥ 0}. Let p ∈ K solve min_{z∈K} ∥b − z∥. Since K is convex, p is unique. Let w ≥ 0 satisfy p = Aw. The hyperplane perpendicular to b − p separates b and K: for all z ∈ K, (b − p)·(z − p) ≤ 0 (see Figure 9.12). Equivalently:

(b − p)·(Ax − Aw) ≤ 0, for all x ≥ 0.

Put y = b − p, so y·(Ax − Aw) ≤ 0 for all x ≥ 0. Let x = w + e_i, where e_i is the vector with 1 in the ith position and zeros elsewhere. Then A(x − w) = Ae_i = a_i, the ith column of A. Thus, y·a_i ≤ 0 for all i, or y′A ≤ 0. Taking x = 0 in y·(Ax − Aw) ≤ 0 and recalling Aw = p gives −y·p ≤ 0. Substituting b − y for p, −y·(b − y) ≤ 0, or −y·b + y·y ≤ 0, so y·y ≤ y·b; and y·y > 0 since y ≠ 0, so y·b > 0. Thus, if (1) does not hold, (2) must hold. Finally, (1) and (2) cannot both be satisfied: if there is x ≥ 0 with Ax = b and A′y ≤ 0, then y′b = y′Ax = (A′y)′x ≤ 0, contradicting y′b > 0. This completes the proof.
REMARK 9.1: Farkas' lemma has a simple geometric interpretation. In Figure 9.13, b is in the cone generated by the vectors a_1 and a_2: b = α_1a_1 + α_2a_2 with α_i ≥ 0. The shaded region is the set of y satisfying a_1·y ≤ 0 and a_2·y ≤ 0, or A′y ≤ 0, where A is the matrix with columns a_1 and a_2, and A′ is the transpose of A, with rows a_1′ and a_2′. As the figure illustrates, the region {y | A′y ≤ 0} is a subset of the region {y | b′y ≤ 0}. Thus, Ax = b has a non-negative solution, and A′y ≤ 0 implies b′y ≤ 0.
Figure 9.12: Proof of Farkas' lemma
The lemma may be rearranged as follows:

LEMMA 1: The pair A′y ≤ 0 and b′y > 0 has a solution if and only if the pair A′y ≥ 0 and b′y < 0 has a solution. Consequently, Farkas' lemma may be written as: either (1) {Ax = b, x ≥ 0} or (2) {A′y ≥ 0 and b′y < 0} has a solution, but not both. ♦

PROOF: If z satisfies A′z ≤ 0 and b′z > 0, then −A′z ≥ 0 and −b′z < 0, i.e. A′(−z) ≥ 0 and b′(−z) < 0, so y = −z satisfies A′y ≥ 0 and b′y < 0. The converse is symmetric.
Farkas' lemma may be expressed in other ways. One corollary is:

LEMMA 2: Either

Ax ≤ b, x ≥ 0 has a solution,

or

A′y ≥ 0, b′y < 0, y ≥ 0 has a solution,

but not both. ♦

PROOF: In Farkas' lemma, let Ā = (A  I), so that Ā′ is the matrix with A′ stacked above I. Then, applying Lemma 1, either [Āz = b, z ≥ 0] has a solution or [Ā′y ≥ 0 and b′y < 0] has a
Figure 9.13: Farkas' lemma: a geometric view

solution. Let z = (x, w), so that the first expression is equivalent to [Ax + Iw = b
and (x, w) ≥ 0]. There is an (x, w) satisfying this if and only if there is an x ≥ 0 with Ax ≤ b. The second expression is, in this context, Ā′y ≥ 0 and b′y < 0, that is, A′y ≥ 0, Iy ≥ 0 and b′y < 0, which is equivalent to there being a y with A′y ≥ 0, b′y < 0 and y ≥ 0. Summarizing, either there is an x ≥ 0 with Ax ≤ b, or there is a y ≥ 0 with A′y ≥ 0 and b′y < 0, but not both.
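Theorem 9.1's either/or alternative can be illustrated on a tiny made-up example (the matrix, vector and certificate below are illustrative, not from the text):

```python
# Farkas alternative on a made-up example: A = I (2x2), b = (1, -1).
# (1) Ax = b, x >= 0 would force x = (1, -1), which is not >= 0: no solution.
# (2) A certificate y = (0, -1) satisfies A'y <= 0 and y.b > 0, as the lemma predicts.
A = [[1, 0],
     [0, 1]]
b = [1, -1]

# System (1): since A is the identity, the only candidate is x = b itself.
x = b
assert not all(xi >= 0 for xi in x)      # (1) has no non-negative solution

# System (2): exhibit the certificate y.
y = [0, -1]
Aty = [sum(A[i][j] * y[i] for i in range(2)) for j in range(2)]  # A'y
assert all(v <= 0 for v in Aty)                                  # A'y <= 0
assert sum(yi * bi for yi, bi in zip(y, b)) > 0                  # y.b = 1 > 0
```

Geometrically, y is the normal of a hyperplane separating b from the cone generated by the columns of A.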
Exercises for Chapter 9
Exercise 9.1  Find the distance between the vectors x = (5, 3) and y = (4, 6).

Exercise 9.2  Let x = (1, 3), y = (2, 2) and z = (1, −1). Find the angle between x and y, and that between x and z.

Exercise 9.3  Let x = (4, 2) and y = (3, 3). Find c such that (y − cx)·(y − cx) ≤ (y − v)·(y − v) for all v = c̃x, for any c̃. Confirm that (y − cx)·x = 0.

Exercise 9.4
(a) Let β = (1, 2). Find the hyperplane β·x = 2. Noting that (2, 0) is on this hyperplane, plot (2, 0) + β. Identify the region {x | β·x ≥ 2}.
(b) Let β = (−1, 1). Find the hyperplane β·x = 2. Noting that (−1, 1) is on this hyperplane, plot (−1, 1) + β. Identify the region {x | β·x ≥ 2}.
(c) Let β = (−1, −1). Find the hyperplane β·x = 2. Noting that (−1, −1) is on this hyperplane, plot (−1, −1) + β. Identify the region {x | β·x ≥ 2}.
Exercise 9.5  Consider the matrix

R(θ) = [ cos(θ)  −sin(θ)
         sin(θ)   cos(θ) ],

where θ is an angle measured in degrees. For example:

R(90) = [ cos(90)  −sin(90)      [ 0  −1
          sin(90)   cos(90) ]  =   1   0 ].

The transformation R(θ)x moves the vector x through an angle of θ degrees.

(a) Letting x = (1, 1), find x′ = R(45)x. Using the formulae c = (x·x′)/∥x∥² and cos(θ) = c∥x∥/∥x′∥, confirm that cos(45) = 1/√2 = c∥x∥/∥x′∥.

(b) Letting y = (1, 3), find y′ = R(30)y. Using the formulae c = (y·y′)/∥y∥² and cos(θ) = c∥y∥/∥y′∥, confirm that cos(30) = √3/2 = c∥y∥/∥y′∥.
Exercise 9.6  Show that the system

[ 1  2  4 ]   [ y_1 ]    [ 0 ]
[ 3  1  2 ]   [ y_2 ]  ≥ [ 0 ],    ( 3  4  3 ) (y_1, y_2, y_3)′ < 0
[ 5  3  1 ]   [ y_3 ]    [ 0 ]

has a solution.

Exercise 9.7
Consider the following observations on two random variables, x
and y:

x = (9, 7, 3, 5)′,      y = (12, 8, 2, 6)′.

The correlation coefficient is defined:

r = Σᵢ(xᵢ − x̄)(yᵢ − ȳ) / [ √Σᵢ(xᵢ − x̄)²  √Σᵢ(yᵢ − ȳ)² ],

where x̄ and ȳ are the respective means. Find the correlation coefficient and relate it to the angle between the deviations around x and y (determine the angle between the vectors of deviations).
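The computation asked for in Exercise 9.7 can be sketched numerically. The following is an illustrative Python sketch (the helper name `correlation` is ours, not the text's):

```python
import math

def correlation(xs, ys):
    # r = sum((xi - xbar)(yi - ybar)) / [sqrt(sum dx^2) * sqrt(sum dy^2)]
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    dx = [xi - xbar for xi in xs]   # deviations around the mean of x
    dy = [yi - ybar for yi in ys]   # deviations around the mean of y
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    return num / den

x = [9, 7, 3, 5]
y = [12, 8, 2, 6]
r = correlation(x, y)
# r equals the cosine of the angle between the two deviation vectors
theta = math.degrees(math.acos(r))
print(round(r, 4), round(theta, 1))
```

Since r is exactly the cosine of the angle between the deviation vectors, a correlation close to 1 corresponds to a small angle between them.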
10 Quadratic forms
10.1 Introduction

A quadratic form is given by the expression x′Ax, where x is an n × 1 vector and A an n × n matrix. Quadratic forms arise in various branches of economics, and particularly in optimization theory, to identify a candidate solution as a maximum
or minimum. For example, in the study of quadratic functions, ax² + bx + c = (ax + b)x + c, the function is either "u-shaped" or the opposite, depending on whether a is positive or negative. So, the turning point locates either a minimum or a maximum, depending on the sign of a. If, for example, a > 0, then large values of x correspond with large values of the function and the turning point corresponds to a minimum value of the function.

Essentially the same situation occurs when the function is a quadratic in many variables: x′Ax + b′x + c. In this case, the matrix A plays a role analogous to a in the scalar case: whether a "turning point" is a maximum or minimum depends on properties of A. Letting

f(x) = x′Ax + b′x + c = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ aᵢⱼ xᵢ xⱼ + Σₖ₌₁ⁿ bₖ xₖ + c,

in the context of optimization, positive and negative definiteness are the central concepts. Section 10.2 introduces quadratic forms and shows how, in the two-variable case, properties of the A matrix determine the shape of the function f. In Section 10.3, positive and negative definiteness are defined and the properties of positive and negative definite matrices are characterized in terms of determinant conditions. When the function f(x) is arbitrary (not necessarily quadratic), the criteria developed here extend directly by using a second-order quadratic approximation
of the function f to identify local maxima or local minima. Finally, Section 10.4 considers the case where there are linear restrictions on x. A matrix A is positive definite if x′Ax > 0 for all x ≠ 0, and negative definite when x′Ax < 0 for all nonzero x. Suppose instead that the requirement x′Ax > 0 applies only to those nonzero x satisfying an additional condition, B′x = 0: under what circumstances is x′Ax > 0 for all nonzero x satisfying B′x = 0? This issue is considered in Section 10.4. The criteria developed there appear in the study of constrained optimization problems.
10.2 Quadratic forms

A quadratic form in n variables is given as:

x′Ax = (x1, x2, …, xn) ( a11  · · ·  a1n ) ( x1 )
                       (  ⋮    ⋱     ⋮  ) (  ⋮ )   =   Σᵢ Σⱼ aᵢⱼ xᵢ xⱼ,
                       ( an1  · · ·  ann ) ( xn )

where aᵢⱼ is the element in the ith row and jth column of A. In the case of two variables:

x = ( x1 ),      A = ( a11  a12 )
    ( x2 )           ( a21  a22 )

and

x′Ax = (x1  x2) ( a11  a12 ) ( x1 )   =   a11 x1² + (a12 + a21) x1 x2 + a22 x2²,
                ( a21  a22 ) ( x2 )

which is a quadratic form in the variables x1 and x2. An important question concerns the sign of x′Ax. Under what conditions is the quadratic form always positive or always negative, regardless of the value of x? The question arises in a variety of problems, and in particular in the area of optimization.
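The double sum Σᵢ Σⱼ aᵢⱼ xᵢ xⱼ translates directly into code. A minimal sketch (the helper name `quad_form` is our own, and the sample coefficients a11 = 3, a22 = 4, a12 + a21 = 3 are merely illustrative):

```python
def quad_form(A, x):
    # x'Ax = sum over i, j of a_ij * x_i * x_j, with A stored as a list of rows
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

A = [[3, 2], [1, 4]]           # a11 = 3, a22 = 4, a12 + a21 = 3
print(quad_form(A, [1, 1]))    # 3 + 4 + 3 = 10
print(quad_form(A, [1, -1]))   # 3 + 4 - 3 = 4
```

Evaluating the form at a few vectors, as here, is often the quickest way to see whether its sign can change with x.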
E XAMPLE 10.1: For example, if a11 = 3, a22 = 4 and a12 + a21 = 3, then the associated quadratic form is q(x1, x2) = 3x1² + 4x2² + 3x1x2. Let f(x) be the quadratic function plotted in the figure:

z = f(x) = x1 + x2 + q(x1, x2) = x1 + x2 + 3x1² + 4x2² + 3x1x2.

[Figure: graph of z = f(x1, x2), axes x1, x2, z]

This function is smallest when (x1, x2) = (−5/22, −2/11) = (x1∗, x2∗). That is:

f(x1∗, x2∗) ≤ f(x1, x2) for all (x1, x2).

Notice that the "bowl" shape of the function is crucial: around (x1∗, x2∗) the function curves up. This shape property is determined by the quadratic form q(x1, x2), which has the property that q(x1, x2) ≥ 0 for all (x1, x2), and illustrates the relationship between quadratic forms and optimization.
♦
In general, the sign of the quadratic form x′ Ax depends on both A and the vector
x, as Example 10.2 illustrates.
E XAMPLE 10.2: Let

A = ( a    0  ),
    ( 0   −â )

where a > 0 and â > 0. Then

x′Ax = (x1, x2) ( a    0  ) ( x1 )   =   a x1² − â x2².
                ( 0   −â ) ( x2 )

Then

x′Ax > 0  if  a x1² − â x2² > 0,  or  x1² > (â/a) x2²;
x′Ax < 0  if  a x1² − â x2² < 0,  or  x1² < (â/a) x2².

So, the sign of x′Ax depends on the value of x and the parameters of A. If, instead,

A = ( a   0 ),
    ( 0   â )

then x′Ax = a x1² + â x2², which is never negative, since a and â are positive and xᵢ² ≥ 0.
♦
Thus, as Example 10.2 illustrates, under suitable restrictions on A, the sign of a quadratic form, such as x′Ax = a x1² + â x2², may be completely determined by the parameters of the A matrix.
Consider again the two-variable quadratic form:

(x1  x2) ( a11  a12 ) ( x1 )   =   a11 x1² + (a12 + a21) x1 x2 + a22 x2²
         ( a21  a22 ) ( x2 )
                               =   a11 x1² + 2ā x1 x2 + a22 x2²,      ā = (1/2)(a12 + a21)
                               =   (x1  x2) ( a11   ā  ) ( x1 ).                          (10.1)
                                            ( ā    a22 ) ( x2 )

If a11 = 0, then x′ = (1, 0) gives x′Ax = 0; so, for a positive or negative definite matrix, aᵢᵢ ≠ 0 for all i. The expression a11 x1² + 2ā x1 x2 + a22 x2² may be rearranged (assume a11 ≠ 0; if instead a22 ≠ 0, a symmetric calculation applies):

a11 x1² + 2ā x1 x2 + a22 x2² = a11 [ x1 + (ā/a11) x2 ]² + [ a22 − ā²/a11 ] x2²
                             = a11 [ x1 + (ā/a11) x2 ]² + (1/a11)[ a11 a22 − ā² ] x2².

If a11 > 0 and a11 a22 − ā² > 0, then this expression is positive if (x1, x2) ≠ (0, 0). Conversely, if a11 < 0 and a11 a22 − ā² > 0, then the expression is negative for all (x1, x2) ≠ (0, 0). In either of the other two cases, [a11 > 0, a11 a22 − ā² < 0] and [a11 < 0, a11 a22 − ā² < 0], the sign of x′Ax can be positive or negative depending on the values of (x1, x2). Consequently, the quadratic form is strictly positive for all (x1, x2) ≠ (0, 0) if and only if a11 > 0 and a11 a22 − ā² > 0; and strictly negative for all (x1, x2) ≠ (0, 0) if and only if a11 < 0 and a11 a22 − ā² > 0.
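The two conditions just derived, on a11 and a11 a22 − ā², are easy to check mechanically. A sketch (the function name `classify_2x2` is ours):

```python
def classify_2x2(a11, a12, a21, a22):
    # abar = (a12 + a21) / 2; the sign of the form is decided by
    # a11 together with the quantity a11*a22 - abar**2
    abar = (a12 + a21) / 2
    d = a11 * a22 - abar ** 2
    if a11 > 0 and d > 0:
        return "positive definite"
    if a11 < 0 and d > 0:
        return "negative definite"
    return "indefinite or semidefinite"

print(classify_2x2(3, 1.5, 1.5, 4))    # positive definite
print(classify_2x2(-3, 0, 0, -4))      # negative definite
print(classify_2x2(4, 4, 4, 2))        # sign depends on x
```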
E XAMPLE 10.3: This example illustrates how different values of the aᵢⱼ parameters in the A matrix affect the sign of the quadratic form, Equation 10.1:

A₁ = ( 4  3 )   ⇒   x′A₁x = 4x² + 4xy + 2y²
     ( 1  2 )

A₂ = ( 4  3 )   ⇒   x′A₂x = 4x² + 8xy + 2y².
     ( 5  2 )

Considering A₁, a11 = 4 > 0 and a11 a22 − ā² = 4 · 2 − 2² = 4 > 0. Similarly, for A₂, a11 = 4 > 0 and a11 a22 − ā² = 4 · 2 − 4² = −8 < 0. The pair of conditions a11 > 0 and
a11 a22 − ā² > 0 are satisfied by the matrix A₁, but not by A₂.

[Figure: graphs of z = x′A₁x and z = x′A₂x over (x1, x2)]

In the graphs, the quadratic form based on A₁ is always nonnegative, and strictly positive for (x1, x2) ≠ (0, 0), whereas the quadratic form based on A₂ is negative for some values of (x1, x2). For example, at (x1, x2) = (−1, 1), 4x² + 8xy + 2y² = 4(−1)² + 8(−1)(1) + 2(1)² = −2.
♦
The next section describes a test to determine if a matrix is positive or negative definite.
10.3 Positive and negative definite matrices

For the following discussion let A be an n × n square matrix. Any vector x gives a quadratic form, x′Ax. Depending on the properties of the matrix A, it may be that the expression x′Ax is always positive or always negative (whenever x ≠ 0). Definition 10.1 formalizes this concept and the discussion provides criteria for checking when a matrix is positive or negative definite.
CHAPTER 10. QUADRATIC FORMS
326
Definition 10.1: Definite and semidefinite matrices:

• A is positive definite if x′Ax > 0, ∀x ≠ 0;
• A is positive semidefinite if x′Ax ≥ 0 for all x;
• A is negative definite if x′Ax < 0, ∀x ≠ 0;
• A is negative semidefinite if x′Ax ≤ 0 for all x.

Observe that if A is positive (semi)definite then −A is negative (semi)definite. If A is positive definite then A is positive semidefinite, since x′Ax > 0 ⇒ x′Ax ≥ 0. The converse is not true:

A = ( 1  1 ),      x = ( x1 ),
    ( 1  1 )           ( x2 )
so x′Ax = (x1 + x2)² ≥ 0, but setting x1 = −x2 ≠ 0 gives x′Ax = 0.

How can one determine if a matrix is positive or negative definite? Sometimes the nature of the problem provides an answer. For example, suppose that B is an n × k matrix and that A = B′B. Then x′Ax = x′B′Bx = z′z = z1² + · · · + zn² ≥ 0, where z = Bx. So, whenever A can be expressed as A = B′B, x′Ax ≥ 0 for all x. Furthermore, if the rank of B is k, then for any x ≠ 0, z = Bx ≠ 0, so that x′Ax = z1² + · · · + zn² > 0 whenever x ≠ 0.
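The B′B construction can be checked numerically. A sketch, with an arbitrary illustrative B of full column rank (the helper names are ours):

```python
def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    # naive product of matrices stored as lists of rows
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def quad_form(A, x):
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

B = [[1, 0], [1, 1], [0, 2]]       # 3 x 2 with rank 2 (illustrative choice)
A = matmul(transpose(B), B)        # A = B'B, so x'Ax = (Bx)'(Bx)
print(A)                           # [[2, 1], [1, 5]]
for x in [[1, 0], [0, 1], [1, -1], [-2, 3]]:
    assert quad_form(A, x) > 0     # full column rank makes A positive definite
```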
In general, the matrix A will have no obvious structure implying the sign of the corresponding quadratic form, which motivates the need for a systematic procedure to investigate definiteness of the matrix. The systematic procedure is based on the properties of "minors" derived from the matrix A.

Definition 10.2: Given a square matrix A, the determinant of any k × k matrix (0 < k ≤ n) obtained by deleting any n − k rows and n − k columns is called a minor (or a kth-order minor, to identify the dimensions of the matrix on which the determinant is derived).

• The (i, j) minor of a square matrix A, denoted mᵢⱼ, is the determinant of the matrix obtained from A by deleting the ith row and jth column.
• A minor obtained from a square matrix A as the determinant of a k × k matrix given by deleting n − k matching rows and columns is called a (kth-order) principal minor.
• A principal minor is called a leading principal minor if it is the determinant of the submatrix obtained from A by deleting the last n − k rows and columns, 0 < k ≤ n.
Definition 10.3: The trace, tr(A), of a square matrix A is the sum of the diagonal elements of the matrix: tr(A) = a11 + a22 + · · · + ann.

P ROPOSITION 10.1: The following observations are useful:

1. If A is positive definite then aᵢᵢ > 0, ∀i; and if A is positive semidefinite then aᵢᵢ ≥ 0, ∀i.
2. If A is positive definite then rank(A) = n.
3. If A is positive definite then tr(A) > 0.
4. The determinant of a symmetric positive definite matrix is positive. ♦

P ROOF: Let x = (0, 0, …, 1, …, 0), a vector with 1 in the ith position and 0 elsewhere. Then x′Ax = aᵢᵢ. If A is positive definite then aᵢᵢ > 0, ∀i; and if A is positive semidefinite then aᵢᵢ ≥ 0, ∀i. The third result follows by summing these inequalities. To show the second result, note that if A is positive definite, then for all x ≠ 0, x′Ax ≠ 0. Therefore, for all x ≠ 0, Ax ≠ 0, which implies linear independence of the columns of A, so that A has full rank, n. The positive determinant property follows from Theorem 10.1.
E XAMPLE 10.4: Consider the matrix A:

A = ( 1  −4 )
    ( 0   1 ).

This matrix has positive trace, determinant and leading principal minors, but is not positive definite. (With x′ = (2, 1), x′Ax = −3.) Notice that the matrix A is not symmetric. ♦

10.3.1 Symmetric matrices
Recall that a matrix A is symmetric if A = A ′ (or a i j = a ji for all i, j). The following considerations show that there is no loss of generality in considering only symmetric matrices when checking for positive definiteness. Consider any nonsymmetric matrix B . Noting that x′ Bx = x′ B ′ x, if B is positive definite then so is B + B ′ since 2 x′ Bx = x′ Bx + x′ B ′ x = x′ (B + B ′) x, so that x′ Bx > 0 if and only if x′ (B + B ′) x > 0. A matrix B is positive definite (semidefinite) if and only if the
matrix B + B ′ is positive definite (semidefinite). Note that B + B ′ is symmetric.
Therefore, to determine if B is positive definite (positive semidefinite) etc., one
need only confirm that (B + B ′ ) is positive definite (positive semidefinite). In discussing (semi)definite matrices we need only consider symmetric matrices.
10.3.2 Criteria for positive and negative definiteness
In general, it is difficult to check positive or negative definiteness by inspecting a matrix. An alternative criterion is based on considering a sequence of determinants. Let A be a symmetric n × n matrix and let Dᵢ be the i × i submatrix of A obtained by deleting the last n − i rows and the last n − i columns. Thus |Dᵢ| is the ith leading principal minor. In words, for positive definiteness, the leading principal minors {|Dᵢ|}, i = 1, …, n, must all be positive; for negative definiteness they must alternate in sign, starting negative:

A = ( a11  a12  · · ·  a1n )
    ( a21  a22  · · ·  a2n )
    (  ⋮    ⋮    ⋱     ⋮  )
    ( an1  an2  · · ·  ann ),

Dᵢ = ( a11  a12  · · ·  a1i )
     ( a21  a22  · · ·  a2i )
     (  ⋮    ⋮    ⋱     ⋮  )
     ( ai1  ai2  · · ·  aii ).
With this notation:

Theorem 10.1: Let A be a symmetric matrix. The matrix A is positive definite if and only if |Dᵢ| > 0, ∀i = 1, …, n. That is:

|D₁| > 0, |D₂| > 0, |D₃| > 0, …, |Dₙ| > 0.

The matrix A is negative definite if and only if (−1)ⁱ|Dᵢ| > 0, ∀i = 1, …, n. That is:

|D₁| < 0, |D₂| > 0, |D₃| < 0, …, (−1)ⁿ|Dₙ| > 0.
Thus, with positive definiteness, |Dᵢ| is positive for all i, whereas with negative definiteness, |Dᵢ| alternates in sign, beginning negative.
In the 2 × 2 case, because the matrix is symmetric, a12 = a21 and (assuming) a11 ≠ 0, the quadratic form is:

x′Ax = a11 x1² + 2 a12 x1 x2 + a22 x2² = a11 [ x1 + (a12/a11) x2 ]² + [ a22 − a12²/a11 ] x2².

From this expression, it is clear that if a11 > 0 and a11 a22 − a12² > 0, then provided x1 ≠ 0 or x2 ≠ 0, x′Ax > 0. Note that a11 is the determinant of the matrix D₁, and a11 a22 − a12² is the determinant of the matrix D₂. From these calculations, note that to check positive or negative definiteness, there are at most n determinants to evaluate.
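Theorem 10.1's leading-principal-minor test can be sketched with a small cofactor-expansion determinant (all helper names here are our own, and the sample matrix is merely illustrative):

```python
def det(M):
    # determinant by cofactor expansion along the first row
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(n))

def leading_minors(A):
    # |D_1|, ..., |D_n|: determinants of the top-left k x k submatrices
    n = len(A)
    return [det([row[:k] for row in A[:k]]) for k in range(1, n + 1)]

def is_positive_definite(A):
    return all(d > 0 for d in leading_minors(A))

def is_negative_definite(A):
    # signs must alternate, starting negative: (-1)^k |D_k| > 0
    return all((-1) ** k * d > 0 for k, d in enumerate(leading_minors(A), 1))

A = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]   # illustrative symmetric matrix
print(leading_minors(A))                # [2, 3, 4]
print(is_positive_definite(A))          # True
```

Cofactor expansion is adequate for the small matrices in this chapter; for large matrices one would use a factorization instead.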
10.3.3 Positive and negative semidefiniteness
The next theorem gives similar conditions for positive and negative semidefiniteness and uses the following notation. Let D̃ᵢ be any submatrix of A containing i rows and columns, obtained by deleting any n − i rows and the corresponding columns (if the kth row is deleted, then the kth column is also deleted). A typical D̃ᵢ has the form:

D̃ᵢ = ( akk  akj  · · ·  akl  · · ·  akr )
     ( ajk  ajj  · · ·  ajl  · · ·  ajr )
     (  ⋮    ⋮    ⋱     ⋮          ⋮  )
     ( alk  alj  · · ·  all  · · ·  alr )
     (  ⋮    ⋮          ⋮    ⋱     ⋮  )
     ( ark  arj  · · ·  arl  · · ·  arr ).

The determinant of any D̃ᵢ is an ith-order principal minor. For example, if A is a 3 × 3 matrix:
A = ( a11  a12  a13 )
    ( a21  a22  a23 )
    ( a31  a32  a33 ),

then there are three D̃₂ matrices:

( a11  a12 ),    ( a22  a23 ),    ( a11  a13 ),
( a21  a22 )     ( a32  a33 )     ( a31  a33 )

and hence three principal minors of order 2.

Theorem 10.2: Let A be a symmetric matrix. The matrix A is positive semidefinite if and only if:

all |D̃₁| ≥ 0, all |D̃₂| ≥ 0, …, all |D̃ᵢ| ≥ 0, …, all |D̃ₙ| ≥ 0.
The matrix A is negative semidefinite if and only if:

all |D̃₁| ≤ 0, all |D̃₂| ≥ 0, …, all (−1)ⁱ|D̃ᵢ| ≥ 0, …, all (−1)ⁿ|D̃ₙ| ≥ 0.

Observe that the statements for positive and negative semidefiniteness are not obtained from the definiteness criteria simply by replacing strict inequalities with non-strict inequalities: every principal minor, not just every leading principal minor, must be checked. To check for positive semidefiniteness, it is necessary to check that all 1st-order principal minors are nonnegative, all 2nd-order principal minors are nonnegative, all 3rd-order principal minors are nonnegative, and so on. There are n = n!/[(n − 1)!1!] 1st-order principal minors, n!/[(n − 2)!2!] 2nd-order principal minors, n!/[(n − 3)!3!] 3rd-order principal minors, and so on, giving Σᵢ₌₁ⁿ n!/[(n − i)!i!] = 2ⁿ − 1 such minors in total.
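Enumerating all 2ⁿ − 1 principal minors is mechanical with itertools. A sketch of the positive semidefiniteness test (helper names are ours):

```python
from itertools import combinations

def det(M):
    # determinant by cofactor expansion along the first row
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(n))

def principal_minors(A):
    # keep the same index set S of rows and columns, for every nonempty S
    n = len(A)
    return [det([[A[i][j] for j in S] for i in S])
            for k in range(1, n + 1) for S in combinations(range(n), k)]

def is_positive_semidefinite(A):
    return all(m >= 0 for m in principal_minors(A))

A = [[1, 1], [1, 1]]                  # x'Ax = (x1 + x2)^2
print(principal_minors(A))            # [1, 1, 0] -- 2**2 - 1 = 3 minors
print(is_positive_semidefinite(A))    # True
```

This recovers the earlier observation: the matrix with all entries 1 is positive semidefinite but not positive definite, since one principal minor is 0.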
10.4 Definiteness with equality constraints

In constrained optimization problems, checking for maxima or minima leads to considering definiteness of a quadratic form subject to linear constraints: for example, that x′Ax > 0 for all x satisfying Bx = 0, for some m × n matrix B, where

B = ( b11  b12  · · ·  b1n )
    ( b21  b22  · · ·  b2n )
    (  ⋮    ⋮    ⋱     ⋮  )
    ( bm1  bm2  · · ·  bmn ).
First, consider the simplest case, where there is one constraint, so Bx = 0 may be written b′x = 0 with b′ = (b1, b2, …, bn). Form the bordered matrix:

Mkk = ( 0    b1   b2   · · ·  bk  )
      ( b1   a11  a12  · · ·  a1k )
      ( b2   a21  a22  · · ·  a2k )
      ( ⋮     ⋮    ⋮    ⋱     ⋮  )
      ( bk   ak1  ak2  · · ·  akk ),

or, writing bₖ for the first k elements of b and Akk for the matrix obtained from A by deleting rows and columns k + 1, k + 2, …, n, Mkk may be written

Mkk = ( 0    b′ₖ  )
      ( bₖ   Akk ).
Theorem 10.3: Let A be a symmetric matrix and b1 ≠ 0. Then,

1. x′Ax > 0 for all x ≠ 0 with b′x = 0 if and only if |Mkk| < 0, k ≥ 2;
2. x′Ax < 0 for all x ≠ 0 with b′x = 0 if and only if (−1)ᵏ|Mkk| > 0, k ≥ 2.
In the case where m > 1, similar computations apply. Then,

Mkk = ( 0     0    · · ·  0     b11  b12  · · ·  b1k )
      ( 0     0    · · ·  0     b21  b22  · · ·  b2k )
      ( ⋮     ⋮    ⋱     ⋮      ⋮    ⋮    ⋱     ⋮  )
      ( 0     0    · · ·  0     bm1  bm2  · · ·  bmk )
      ( b11   b21  · · ·  bm1   a11  a12  · · ·  a1k )
      ( b12   b22  · · ·  bm2   a21  a22  · · ·  a2k )
      ( ⋮     ⋮    ⋱     ⋮      ⋮    ⋮    ⋱     ⋮  )
      ( b1k   b2k  · · ·  bmk   ak1  ak2  · · ·  akk ).

The matrix Mkk has m + k rows and columns. Writing Akk and Bmk for the component matrices,

Mkk = ( 0      Bmk )
      ( B′mk   Akk ).
Theorem 10.4: Let A be a symmetric matrix and let |Bmm| ≠ 0. Then,

1. x′Ax > 0 for all x ≠ 0 with Bx = 0 if and only if: (−1)ᵐ|Mkk| > 0, k = m + 1, …, n.
2. x′Ax < 0 for all x ≠ 0 with Bx = 0 if and only if: (−1)ᵏ|Mkk| > 0, k = m + 1, …, n.
R EMARK 10.1: This characterization may be expressed in different ways. Let I_p be an identity matrix of dimension p. Then:

( 0    I_k )( 0      Bmk )( 0    I_m )   =   ( B′mk   Akk )( 0    I_m )   =   ( Akk   B′mk )
( I_m  0   )( B′mk   Akk )( I_k  0   )       ( 0      Bmk )( I_k  0   )       ( Bmk   0    ).

From this, noting that

( 0    I_m )′   =   ( 0    I_k ),
( I_k  0   )        ( I_m  0   )

and that the determinant of this permutation matrix is ±1, so its square is 1,

|Mkk| = | ( 0    I_k ) |²  ×  | ( Akk   B′mk ) |   =   | ( Akk   B′mk ) |.
        | ( I_m  0   ) |      | ( Bmk   0    ) |       | ( Bmk   0    ) |

Therefore, the determinantal conditions given previously can also be expressed in terms of matrices of the form:

( Akk   B′mk )
( Bmk   0    ).
R EMARK 10.2: If there is just one constraint, m = 1, then constrained positive definiteness requires |Mkk| < 0, k = m + 1, …, n. More generally, with m > 1, m odd gives |Mkk| < 0, k = m + 1, …, n, while with m even the inequality is reversed, so that |Mkk| > 0, k = m + 1, …, n. In contrast, for constrained negative definiteness, |Mkk| alternates in sign, starting positive at k = m + 1 if m is odd and starting negative if m is even.
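The bordered determinant condition of Theorem 10.3 can be sketched numerically. The matrix and constraint below are hypothetical illustrative data, chosen so that A is indefinite yet the form is positive on the constraint set (helper names are ours):

```python
def det(M):
    # determinant by cofactor expansion along the first row
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(n))

def bordered(A, b, k):
    # M_kk: border the top-left k x k block of A with the constraint vector b
    M = [[0] + list(b[:k])]
    for i in range(k):
        M.append([b[i]] + A[i][:k])
    return M

# With b = (1, 2) the constraint b'x = 0 forces x1 = -2*x2, and then
# x'Ax = x1**2 - x2**2 = 3*x2**2 > 0, even though A itself is indefinite.
A = [[1, 0], [0, -1]]
b = (1, 2)
print(det(bordered(A, b, 2)))   # -3 < 0, as Theorem 10.3(1) requires
```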
Exercises for Chapter 10

Exercise 10.1 Consider the following symmetric matrices:

A = (  3  −1   1 ),      B = ( 3   2   1 )
    ( −1   3  −2 )           ( 2  12   4 )
    (  1  −2   2 )           ( 1   4   2 ).

Confirm that both A and B are positive definite.

Exercise 10.2 Consider the following matrices:

A = ( −3   2    5 ),     B = ( −5   2   1 )
    (  2  −6    1 )          (  2  −1  −1 )
    (  5   1  −13 )          (  1  −1  −7 ).
Confirm that both A and B are negative definite.

Exercise 10.3 Consider the matrix:

A = (  1/4   −1/3 )
    ( −1/3    2/9 ).

Show that A is positive definite for all x = (x1, x2) satisfying (1/2, 2/3) · (x1, x2) = 0.

Exercise 10.4 Consider the matrix:

A = (  1/4   −3/4  )
    ( −3/4    32/9 ).

Show that A is positive definite for all x = (x1, x2) satisfying (2, 3/8) · (x1, x2) = 0.
Exercise 10.5 Consider the matrix:

Q(α, β, γ) = ( α(α − 1)   αβ         αγ       )
             ( αβ          β(β − 1)  βγ       )
             ( αγ          βγ         γ(γ − 1) ).

Let x′Q(α, β, γ)x be a quadratic form. Also, let b = (w, r, p) be a vector of strictly positive entries. Suppose that α = β = γ = 1/4. Show that x′Q(α, β, γ)x < 0 for all x satisfying b′x = 0.
11 Multivariate optimization
11.1 Introduction

This chapter is concerned with multivariate optimization: finding values for (x1, x2, …, xn) to make the function f(x1, x2, …, xn) as large or small as possible. In Section 11.2, the two-variable case is described; Section 11.3 considers the n-variable case. The extension from two to n variables is natural. Section 11.2 provides sufficient conditions for a local optimum and provides motivation for those conditions. Also, in this section, the envelope theorem is described. Section 11.3 provides sufficient conditions for both local and global optima. These conditions are developed in terms of concavity of the objective function and the properties of Hessian matrices in Section 11.4.
11.2 The two-variable case

Consider the problem of choosing (x, y) to maximize z = f(x, y), a function of two variables. The first requirement is that no unilateral variations in x or y improve the value of f, the first-order conditions:

∂f(x, y)/∂x = f_x(x, y) = 0
∂f(x, y)/∂y = f_y(x, y) = 0.        (11.1)

The total differential is dz = f_x dx + f_y dy. This gives the change in the value of the function due to the variation (dx, dy) in x and y from the value (x, y).
Note that dz depends on (x, y), since dz = f_x(x, y)dx + f_y(x, y)dy. When the choice of x and y maximizes f, there can be no changes dx and dy that raise the value of f, as required by the partial derivative conditions above: that f_x and f_y are both 0. Consider the function plotted in Figure 11.1, f(x, y) = 1 − x² − y².

[Figure 11.1: The function f(x, y) = 1 − x² − y²]

There, at (0, 0), f_x = f_y = 0, and it is clear from the figure that this point locates the highest point of the function. Confirming that a point is the maximizing value requires a comparison similar to the univariate test f″(x) < 0. The next theorem summarizes the key result for local optimality of a point (x̄, ȳ).

Theorem 11.1: Suppose that (x̄, ȳ) satisfy

1. (a) f_x(x̄, ȳ) = f_y(x̄, ȳ) = 0
   (b) f_xx(x̄, ȳ) < 0, f_yy(x̄, ȳ) < 0 and [ f_xx(x̄, ȳ) f_yy(x̄, ȳ) − f_xy²(x̄, ȳ) ] > 0;
   then (x̄, ȳ) is a local maximum.

2. (a) f_x(x̄, ȳ) = f_y(x̄, ȳ) = 0
   (b) f_xx(x̄, ȳ) > 0, f_yy(x̄, ȳ) > 0 and [ f_xx(x̄, ȳ) f_yy(x̄, ȳ) − f_xy²(x̄, ȳ) ] > 0;
   then (x̄, ȳ) is a local minimum.

The following examples illustrate the use of the theorem.
E XAMPLE 11.1: Suppose that a firm has a product that is produced at unit cost c(x, y), to be minimized:

min c(x, y) = e^x e^y − ln x − ln y.

What are the unit-cost-minimizing levels of x and y? The first-order conditions are:

c_x = e^x e^y − 1/x = 0
c_y = e^x e^y − 1/y = 0.

Thus, 1/x = 1/y and x = y. From this, e^(2x) = 1/x, so that solving x e^(2x) = 1 gives the solution x = 0.4263. The second-order conditions are:

c_xx = e^x e^y + 1/x²
c_yy = e^x e^y + 1/y²
c_xy = e^x e^y.

So, c_xx > 0 and

c_xx c_yy − c_xy² = [ e^x e^y + 1/x² ][ e^x e^y + 1/y² ] − [ e^x e^y ]²
                  = e^x e^y (1/y²) + e^x e^y (1/x²) + 1/(x² y²) > 0.
♦
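The equation x e^(2x) = 1 in Example 11.1 has no closed-form solution. A bisection sketch (the bracketing interval (0, 1) is our own choice) recovers the quoted value:

```python
import math

def g(x):
    # the first-order conditions reduce to x * exp(2x) = 1 (with y = x)
    return x * math.exp(2 * x) - 1

lo, hi = 0.0, 1.0          # g(0) = -1 < 0 and g(1) = e**2 - 1 > 0
for _ in range(60):        # bisection: halve the bracket 60 times
    mid = (lo + hi) / 2
    if g(mid) > 0:
        hi = mid
    else:
        lo = mid
print(round(lo, 4))        # ≈ 0.4263, matching the text
```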
The next example considers a profit maximization problem.

E XAMPLE 11.2: Suppose that a firm has profit function

f(x, y) = √x + ln(y + 1) − x − (1/2)y.

What values of x and y maximize profit? In this example, the computations are simplified because the function may be written f(x, y) = f₁(x) + f₂(y): f(x, y) = [√x − x] + [ln(y + 1) − (1/2)y], and the cross-partial derivatives vanish.

[Figure: graph of z = f(x, y) = √x + ln(y + 1) − x − (1/2)y]

The first-order conditions are:

f_x = (1/2) x^(−1/2) − 1 = 0
f_y = 1/(y + 1) − 1/2 = 0.

So, x^(1/2) = 1/2 and 2 = y + 1, giving x = 1/4, y = 1. The second derivatives are: f_xx = −(1/4) x^(−3/2), f_yy = −(y + 1)^(−2) and f_xy = 0. So, f_xx and f_yy are both negative in a neighborhood of (x̄, ȳ) = (1/4, 1), and f(x, y) < f(1/4, 1) for (x, y) ≠ (1/4, 1). ♦
R EMARK 11.1: In Example 11.2, the function f(x, y) has cross-partial derivatives that are 0, so that f_xy(x, y) = 0 for all (x, y). Later, in Equation 11.2, with f_x(x̄, ȳ) = f_y(x̄, ȳ) = 0 and f_xy(x̄, ȳ) = f_yx(x̄, ȳ) = 0, the Taylor series expansion reduces to:

f(x, y) = f(x̄, ȳ) + (1/2)[ f_xx(x∗, y∗)(Δx)² + f_yy(x∗, y∗)(Δy)² ].

So, if f_xx(x∗, y∗) < 0 and f_yy(x∗, y∗) < 0, since (Δx)² and (Δy)² are nonnegative (and not both zero when (x, y) ≠ (x̄, ȳ)), f(x, y) < f(x̄, ȳ). If f_xx(x̄, ȳ) < 0 and f_yy(x̄, ȳ) < 0, then this will also hold for (x∗, y∗) close to (x̄, ȳ), so that for all (x, y) close to (x̄, ȳ), f(x, y) < f(x̄, ȳ).
E XAMPLE 11.3: A problem in physics leads to the following objective. Minimize f(x1, x2), where f is defined:

f(x1, x2) = (1/2)(γ1 x1² + γ3 (x2 − x1)² + γ2 x2²) − b x2,    γᵢ > 0, ∀i = 1, 2, 3.

Differentiating with respect to x1 and x2 gives:

f_x1 = γ1 x1 − γ3 (x2 − x1) = 0
f_x2 = γ3 (x2 − x1) + γ2 x2 − b = 0.

The second derivatives are: f_x1x1 = γ1 + γ3, f_x1x2 = −γ3, f_x2x2 = γ2 + γ3. This gives the matrix H as

H = ( f_x1x1   f_x1x2 )   =   ( γ1 + γ3    −γ3     )
    ( f_x1x2   f_x2x2 )       ( −γ3        γ2 + γ3 ).

To solve the first-order conditions, observe that they can be rearranged as:

( γ1 + γ3    −γ3     ) ( x1 )   =   ( 0 )
( −γ3        γ2 + γ3 ) ( x2 )       ( b ).

Cramer's rule may be used to solve for x1 and x2. The determinant of H is

|H| = (γ1 + γ3)(γ2 + γ3) − γ3² = γ1γ2 + γ1γ3 + γ2γ3 + γ3² − γ3² = γ1γ2 + γ1γ3 + γ2γ3.

So,

x1 = (1/|H|) | 0     −γ3     |,      x2 = (1/|H|) | γ1 + γ3    0 |.
             | b     γ2 + γ3 |                    | −γ3        b |

Or,

x1 = b γ3 / (γ1γ2 + γ1γ3 + γ2γ3),      x2 = b (γ1 + γ3) / (γ1γ2 + γ1γ3 + γ2γ3).

Since

f_x1x1 = γ1 + γ3 > 0,    f_x1x1 f_x2x2 − f_x1x2² = γ1γ2 + γ1γ3 + γ2γ3 > 0,

the second-order condition for a minimum is satisfied.

[Figure: graph of f(x, y) = x² + (y − x)² + y² − y, the case γᵢ = 2, ∀i, b = 1]
♦
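The closed-form solution by Cramer's rule in Example 11.3 can be coded directly (the function name `solve_foc` is ours):

```python
def solve_foc(g1, g2, g3, b):
    # H x = (0, b)' with H = [[g1 + g3, -g3], [-g3, g2 + g3]];
    # |H| = g1*g2 + g1*g3 + g2*g3, then Cramer's rule gives x1 and x2
    detH = g1 * g2 + g1 * g3 + g2 * g3
    return b * g3 / detH, b * (g1 + g3) / detH

x1, x2 = solve_foc(2, 2, 2, 1)     # the plotted case: gamma_i = 2, b = 1
# check the first-order conditions at the solution
assert abs(2 * x1 - 2 * (x2 - x1)) < 1e-12
assert abs(2 * (x2 - x1) + 2 * x2 - 1) < 1e-12
print(x1, x2)
```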
11.2.1 Motivation for the second-order conditions
Recall that in the univariate case, a Taylor series expansion gave:

f(x) = f(x̄) + f′(x̄)(x − x̄) + (1/2!) f″(p)(x − x̄)²

for some p between x and x̄. So, with f′(x̄) = 0, f(x) = f(x̄) + (1/2) f″(p)(x − x̄)², so that if f″(p) < 0 for all p close to x̄, then f(x) < f(x̄) and x̄ is a local maximum. Essentially
the same ideas apply to the multivariate case. In the two-variable case, a Taylor series expansion of f around (x̄, ȳ), with (x, y) = (x̄ + Δx, ȳ + Δy), is given by:

f(x, y) = f(x̄, ȳ) + [ f_x(x̄, ȳ)Δx + f_y(x̄, ȳ)Δy ]
          + (1/2)[ f_xx(x∗, y∗)(Δx)² + 2 f_xy(x∗, y∗)ΔxΔy + f_yy(x∗, y∗)(Δy)² ]     (11.2)
        ≈ f(x̄, ȳ) + [ f_x(x̄, ȳ)Δx + f_y(x̄, ȳ)Δy ]
          + (1/2)[ f_xx(x̄, ȳ)(Δx)² + 2 f_xy(x̄, ȳ)ΔxΔy + f_yy(x̄, ȳ)(Δy)² ],        (11.3)

where (x∗, y∗) is on the line segment joining the points (x, y) and (x̄, ȳ):

(x∗, y∗) ∈ { (x̃, ỹ) | (x̃, ỹ) = θ(x, y) + (1 − θ)(x̄, ȳ), θ ∈ [0, 1] },

and where Δx = x − x̄ and Δy = y − ȳ. The Taylor series expansion is exact with (x∗, y∗) (Equation 11.2) and approximate with (x̄, ȳ) (Equation 11.3). The expression may be written in matrix terms:

f(x, y) = f(x̄, ȳ) + [ f_x(x̄, ȳ)Δx + f_y(x̄, ȳ)Δy ] + (1/2) (Δx  Δy) H(x∗, y∗) ( Δx )
                                                                               ( Δy )     (11.4)
        ≈ f(x̄, ȳ) + [ f_x(x̄, ȳ)Δx + f_y(x̄, ȳ)Δy ] + (1/2) (Δx  Δy) H(x̄, ȳ) ( Δx ),
                                                                              ( Δy )      (11.5)

where

H(x, y) = ( f_xx(x, y)   f_xy(x, y) )
          ( f_yx(x, y)   f_yy(x, y) )

is called the Hessian matrix. In the case where f(x, y) = 1 − x² − y²,

H(x, y) = ( −2    0 ),
          (  0   −2 )

so the expression in Equation 11.4 reduces to

f(x, y) = f(x̄, ȳ) + [ f_x(x̄, ȳ)Δx + f_y(x̄, ȳ)Δy ] − [ (Δx)² + (Δy)² ],

and at (x̄, ȳ), if f_x(x̄, ȳ) = f_y(x̄, ȳ) = 0, the expression reduces to f(x, y) = f(x̄, ȳ) − [(Δx)² + (Δy)²]. Since the second term is less than or equal to 0 for all values of Δx and Δy, the value of f at (x̄, ȳ) is at least as large as at any other (x, y) value. In this case, and in general, the properties of the Hessian matrix are crucial in determining whether a point satisfying the first-order conditions (Equation 11.1) determines a local maximum (or minimum).
Note that the entries (partial derivatives) in the Hessian matrix depend on x and y: writing H(x, y) emphasizes that the entries are evaluated at (x, y). If the first-order conditions hold at (x̄, ȳ), then f_x(x̄, ȳ) = f_y(x̄, ȳ) = 0. In this case, Equations 11.2 and 11.3 become:

f(x, y) = f(x̄, ȳ) + (1/2)[ f_xx(x∗, y∗)(Δx)² + 2 f_xy(x∗, y∗)ΔxΔy + f_yy(x∗, y∗)(Δy)² ]     (11.6)
        ≈ f(x̄, ȳ) + (1/2)[ f_xx(x̄, ȳ)(Δx)² + 2 f_xy(x̄, ȳ)ΔxΔy + f_yy(x̄, ȳ)(Δy)² ],        (11.7)

where (x∗, y∗) is on the line segment joining (x, y) and (x̄, ȳ). In terms of the Hessian matrix:

f(x, y) = f(x̄, ȳ) + (1/2) (Δx  Δy) H(x∗, y∗) ( Δx )
                                              ( Δy )        (11.8)
        ≈ f(x̄, ȳ) + (1/2) (Δx  Δy) H(x̄, ȳ) ( Δx ).
                                             ( Δy )         (11.9)

The sign of this expression depends on properties of the matrix:

H(x, y) = ( f_xx(x, y)   f_xy(x, y) )
          ( f_xy(x, y)   f_yy(x, y) ).

The key property is called negative definiteness.

Definition 11.1: A square symmetric matrix

A = ( a   b )
    ( b   c )

is said to be negative definite if for all z = (z1, z2) ≠ (0, 0), z′Az < 0. The matrix is said to be positive definite if z′Az > 0 for all such z.
Recall the criteria for negative and positive definiteness:

P ROPOSITION 11.1: In terms of the elements of the matrix, positive definiteness and negative definiteness are characterized as follows:

1. A is negative definite if and only if a < 0 and ac − b² > 0, and
2. A is positive definite if and only if a > 0 and ac − b² > 0.
♦
R EMARK 11.2: This implies that if A is negative (positive) definite and Ã is (sufficiently) close to A, then Ã is negative (positive) definite. For example, if a < 0 and ac − b² > 0, and ã, b̃, c̃ are close to a, b, c, respectively, then ã < 0 and ãc̃ − b̃² > 0, so that Ã is negative definite.
Consequently, H(x, y) is negative definite at (x, y) if:

f_xx(x, y) < 0 and f_xx(x, y) f_yy(x, y) − f_xy²(x, y) > 0.

Likewise, H(x, y) is positive definite at (x, y) if:

f_xx(x, y) > 0 and f_xx(x, y) f_yy(x, y) − f_xy²(x, y) > 0.
f xx ( x, y) > 0 and f xx ( x, y) f yy ( x, y) − f x2y ( x, y) > 0. ¯ y¯ ) < 0, then for [ f xx ( x, ¯ y¯ ) f yy ( x, ¯ y¯ ) − R EMARK 11.3: As noted earlier, if f xx ( x,
¯ y¯ )] > 0 to hold, it must be that f yy ( x, ¯ y¯ ) < 0, since f x2y ( x, ¯ y¯ ) ≡ ( f x y ( x, ¯ y¯ ))2 ≥ 0. f x2y ( x, £ ¯ y¯ ) < 0 and f xx ( x, ¯ y¯ ) f yy So, condition 1(b) in Theorem 11.1 is equivalent to f xx ( x, ¤ 2 ¯ y¯ ) − f x y ( x, ¯ y¯ ) > 0. An analogous equivalence applies in the local minimum ( x,
case.
Given the previous discussion, the main theorem, Theorem 11.1, is straightforward to prove.

Theorem 11.1 Suppose that (x̄, ȳ) satisfy

1. (a) f_x(x̄, ȳ) = f_y(x̄, ȳ) = 0
   (b) f_xx(x̄, ȳ) < 0, f_yy(x̄, ȳ) < 0 and [ f_xx(x̄, ȳ) f_yy(x̄, ȳ) − f_xy²(x̄, ȳ) ] > 0;
   then (x̄, ȳ) is a local maximum.

2. (a) f_x(x̄, ȳ) = f_y(x̄, ȳ) = 0
   (b) f_xx(x̄, ȳ) > 0, f_yy(x̄, ȳ) > 0 and [ f_xx(x̄, ȳ) f_yy(x̄, ȳ) − f_xy²(x̄, ȳ) ] > 0;
   then (x̄, ȳ) is a local minimum.

P ROOF: Suppose that at (x̄, ȳ), f_x(x̄, ȳ) = f_y(x̄, ȳ) = 0, and suppose also that condition 1(b) is satisfied, so that H(x̄, ȳ) is negative definite. Given a small number η > 0, let −η < Δx < η and −η < Δy < η. Since (x∗, y∗) lies between (x̄, ȳ) and (x̄ + Δx, ȳ + Δy) in Equation 11.8, (x∗, y∗) is within η of (x̄, ȳ). In view of the preceding observation (Remark 11.2), it must be that H(x∗, y∗) is negative definite for small η (since H(x∗, y∗) is close to H(x̄, ȳ)). Therefore,

f(x, y) = f(x̄, ȳ) + (1/2) (Δx  Δy) H(x∗, y∗) ( Δx )      and      (Δx  Δy) H(x∗, y∗) ( Δx )  <  0,
                                              ( Δy )                                  ( Δy )

so it must be that f(x, y) < f(x̄, ȳ). Similar reasoning applies to the case of a local minimum.
As mentioned in Remark 11.3, the conditions f_xx(x, y) < 0 and f_xx(x, y) f_yy(x, y) − f_xy²(x, y) > 0 together imply that f_yy(x, y) < 0, so that the pair of conditions [ f_xx(x, y) < 0 and f_xx(x, y) f_yy(x, y) − f_xy²(x, y) > 0 ] is equivalent to [ f_xx(x, y) < 0, f_yy(x, y) < 0 and f_xx(x, y) f_yy(x, y) − f_xy²(x, y) > 0 ]. However, as a practical matter, it is often useful to check the simpler conditions f_xx(x, y) < 0 and f_yy(x, y) < 0 first: if exactly one of them fails, then the condition f_xx(x, y) f_yy(x, y) − f_xy²(x, y) > 0 cannot be satisfied.
¯ y) ¯ + R EMARK 11.4: In since the Taylor series expansion f (x, y) = f ( x, Ã fact, ! ∆ x ¯ y) ¯ with f x ( x, ¯ y) ¯ = (∆ x ∆ y)H(x∗ , y∗ ) holds generally (given any (x, y), and ( x, ∆y ¯ y) ¯ = 0, there is (x∗ , y∗ ) such that this equation is satisfied). So, for examf y ( x, ¯ y) ¯ for all ple, if H(x, y) were negative definite for all (x, y), then f (x, y) < f ( x, ¯ y) ¯ and the firstorder conditions give a global maximum. (x, y) 6= ( x,
With a view to generalization to the many-variable case, the second-order conditions can be expressed directly in terms of submatrices of the Hessian matrix $H$. Write $H_1$ for the matrix consisting of just the top left entry of $H$, $H_1 = (f_{xx})$, and $H_2$ for the matrix $H$ itself. Because the determinant of a matrix with one row and column is just the single entry of that matrix (if $A = [a]$, $|A| = a$),

$$|H_1| = f_{xx} \quad\text{and}\quad |H_2| = f_{xx} f_{yy} - f_{xy}^2.$$

Consequently:

REMARK 11.5: The second-order conditions for a maximum are $|H_1| < 0$ and $|H_2| > 0$. The second-order conditions for a minimum are $|H_1| > 0$ and $|H_2| > 0$.

In the case of two variables, the benefit from expressing the second-order conditions in terms of determinants derived from the Hessian matrix is not significant. However, with three or more variables, the matrix presentation is more succinct. (See Section 11.3.)
EXAMPLE 11.4: Let $f(x,y) = -2x^2 - 7y^2 + xy + 12x + 37y$. Then $f_x = -4x + y + 12$, $f_{xx} = -4$, $f_y = -14y + x + 37$, $f_{yy} = -14$, $f_{xy} = 1$. Solving $f_x = 0$, $f_y = 0$ gives $x = \frac{41}{11}$, $y = \frac{32}{11}$.

$$H = \begin{pmatrix} -4 & 1 \\ 1 & -14 \end{pmatrix}, \qquad |H_1| = -4 < 0, \qquad |H_2| = \begin{vmatrix} -4 & 1 \\ 1 & -14 \end{vmatrix} = 55 > 0,$$

so the Hessian is negative definite. Consequently, the solution to the first-order conditions is a local maximum. ♦

The next example involves somewhat more complex computation.

EXAMPLE 11.5: Consider the problem:

$$\max f(x,y) = \sqrt{x}\,[1 + \ln(y+1)] - x - y^2.$$

[graph of $f(x,y) = \sqrt{x}\,[1+\ln(y+1)] - x - y^2$]

The first-order conditions are:

$$f_x = \tfrac{1}{2} x^{-\frac{1}{2}}[1 + \ln(y+1)] - 1 = 0, \qquad f_y = \frac{\sqrt{x}}{y+1} - 2y = 0.$$

From the first-order conditions, $x^{\frac{1}{2}} = \tfrac{1}{2}[1 + \ln(y+1)]$. Substituting into the equation $f_y = 0$ gives $\tfrac{1}{2}[1 + \ln(y+1)] = 2y(y+1)$. Solving for $y$ gives $y = 0.2448129$. Thus, $x = \tfrac{1}{4}[1 + \ln(1.2448129)]^2$. Next, consider the second-order conditions:

$$f_{xx} = -\tfrac{1}{4} x^{-\frac{3}{2}}[1 + \ln(y+1)] < 0, \qquad f_{yy} = -\frac{\sqrt{x}}{(y+1)^2} - 2, \qquad f_{xy} = \tfrac{1}{2} x^{-\frac{1}{2}} \frac{1}{y+1}.$$

Thus,

$$f_{xx} f_{yy} - f_{xy}^2 = \left(-\tfrac{1}{4} x^{-\frac{3}{2}}[1+\ln(y+1)]\right)\left(-\frac{\sqrt{x}}{(y+1)^2} - 2\right) - \tfrac{1}{4}\, x^{-1} \frac{1}{(y+1)^2}$$
$$= \frac{1}{4x(y+1)^2}\left([1+\ln(y+1)] - 1\right) + \tfrac{1}{2} x^{-\frac{3}{2}}(1+\ln(y+1))$$
$$= \frac{\ln(y+1)}{4x(y+1)^2} + \tfrac{1}{2} x^{-\frac{3}{2}}(1+\ln(y+1)) > 0.$$

Since $\ln(y+1) = \ln(1.2448129) > 0$, the solution is a local maximum. ♦
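Both examples are easy to confirm numerically. The sketch below (an illustration, not part of the text) checks the critical point of Example 11.4 with exact rational arithmetic and plugs the approximate solution of Example 11.5 back into its first-order conditions.

```python
import math
from fractions import Fraction

# Example 11.4: f(x, y) = -2x^2 - 7y^2 + xy + 12x + 37y, critical point (41/11, 32/11).
x4, y4 = Fraction(41, 11), Fraction(32, 11)
foc_x4 = -4 * x4 + y4 + 12          # f_x, exactly zero at the critical point
foc_y4 = -14 * y4 + x4 + 37         # f_y, exactly zero at the critical point
H2_ex4 = (-4) * (-14) - 1 * 1       # |H_2| = f_xx f_yy - f_xy^2 = 55 > 0, with f_xx < 0

# Example 11.5: y* solves (1/2)[1 + ln(y+1)] = 2y(y+1); x* = (1/4)[1 + ln(1+y*)]^2.
ystar = 0.2448129
xstar = 0.25 * (1 + math.log(1 + ystar)) ** 2
foc_x5 = 0.5 * xstar ** -0.5 * (1 + math.log(ystar + 1)) - 1
foc_y5 = math.sqrt(xstar) / (ystar + 1) - 2 * ystar
```

The Example 11.4 conditions vanish exactly; those of Example 11.5 vanish to roughly the precision of the quoted root.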
The final example in this section considers a problem with general parameter specification.

EXAMPLE 11.6: A firm uses two inputs, gas ($x$) and electricity ($z$), to heat a building during winter. The problem is to decide on the appropriate combination of the inputs to minimize cost. The cost function is given by:

$$c(x,z) = p x^{-\alpha} + \beta \frac{x}{f(z)} + \gamma z, \qquad \alpha, \beta, \gamma > 0,$$

where $f(z)$ is an increasing function ($f'(z) > 0$). The first-order conditions are:

$$c_x(x,z) = -\alpha p x^{-\alpha-1} + \frac{\beta}{f(z)} = 0, \qquad c_z(x,z) = -\beta \frac{x f'(z)}{f(z)^2} + \gamma = 0.$$

And the second derivatives are given by:

$$c_{xx}(x,z) = \alpha(\alpha+1) p x^{-\alpha-2}, \qquad c_{zz}(x,z) = -\beta x \left[-2\frac{(f'(z))^2}{f(z)^3} + \frac{f''(z)}{f(z)^2}\right], \qquad c_{xz}(x,z) = -\frac{\beta}{f(z)^2} f'(z).$$

Thus, the second-order conditions are:

$$c_{xx} = \alpha(\alpha+1) p x^{-\alpha-2} > 0, \qquad c_{zz} = -\beta x \left[-2\frac{(f'(z))^2}{f(z)^3} + \frac{f''(z)}{f(z)^2}\right] > 0,$$

$$c_{xx} c_{zz} - c_{xz}^2 = -\left(\alpha(\alpha+1) p x^{-\alpha-2}\right) \beta x \left[-2\frac{(f'(z))^2}{f(z)^3} + \frac{f''(z)}{f(z)^2}\right] - \frac{\beta^2}{f(z)^4} (f'(z))^2 > 0.$$

These conditions are impossible to confirm without a specific functional form for $f$. Consider the case where $f(z) = e^z$, so that $f(z) = f'(z) = f''(z) = e^z$. Then $\frac{(f'(z))^2}{f(z)^3} = \frac{1}{f(z)} = \frac{f''(z)}{f(z)^2}$, so that

$$c_{zz}(x,z) = -\beta x \left[-\frac{2}{f(z)} + \frac{1}{f(z)}\right] = -\beta x \frac{1}{f(z)}(-1) = \frac{\beta x}{f(z)}.$$

Also, $c_{xz}(x,z) = -\frac{\beta}{f(z)^2} f'(z) = -\frac{\beta}{f(z)}$. Thus, the second-order conditions reduce to:

$$c_{xx} = \alpha(\alpha+1) p x^{-\alpha-2} > 0, \qquad c_{zz} = \frac{\beta x}{f(z)} > 0, \qquad c_{xx} c_{zz} - c_{xz}^2 = \left[\alpha(\alpha+1) p x^{-\alpha-1}\right] \frac{\beta}{f(z)} - \left(\frac{\beta}{f(z)}\right)^2 > 0.$$

The first-order condition gives $-\alpha p x^{-\alpha-1} + \frac{\beta}{f(z)} = 0$, or $\alpha p x^{-\alpha-1} = \frac{\beta}{f(z)}$. Substitute $\frac{\beta}{f(z)}$ for $\alpha p x^{-\alpha-1}$ in the last condition to obtain

$$\left[(\alpha+1)\frac{\beta}{f(z)}\right]\left[\frac{\beta}{f(z)}\right] - \left(\frac{\beta}{f(z)}\right)^2 = \alpha \left(\frac{\beta}{f(z)}\right)^2 > 0,$$

which verifies that the second-order condition is satisfied. To solve for $x$ and $z$, recall the first-order conditions (with $f(z) = e^z$):

$$c_x(x,z) = -\alpha p x^{-\alpha-1} + \frac{\beta}{f(z)} = 0, \qquad c_z(x,z) = -\beta \frac{x}{f(z)} + \gamma = 0.$$

From $c_x(x,z) = 0$, $\alpha p x^{-\alpha-1} = \frac{\beta}{f(z)}$, so $x^{-(\alpha+1)} = \frac{\beta}{\alpha p f(z)}$. Rearranging,

$$x = \left(\frac{1}{\alpha p}\right)^{\frac{-1}{\alpha+1}} \left(\frac{\beta}{f(z)}\right)^{\frac{-1}{\alpha+1}} = (\alpha p)^{\frac{1}{\alpha+1}} \left(\frac{f(z)}{\beta}\right)^{\frac{1}{\alpha+1}},$$

and from $c_z(x,z) = 0$, $\beta \frac{x}{f(z)} = \gamma$, so that, substituting for $x$,

$$\left(\frac{\beta}{f(z)}\right) \left(\frac{1}{\alpha p}\right)^{\frac{-1}{\alpha+1}} \left(\frac{\beta}{f(z)}\right)^{\frac{-1}{\alpha+1}} = \gamma.$$

Rearranging,

$$\gamma = \left(\frac{1}{\alpha p}\right)^{\frac{-1}{\alpha+1}} \left(\frac{\beta}{f(z)}\right)^{1-\frac{1}{\alpha+1}} = \left(\frac{1}{\alpha p}\right)^{\frac{-1}{\alpha+1}} \left(\frac{\beta}{f(z)}\right)^{\frac{\alpha}{\alpha+1}}.$$

So,

$$\left(\frac{\beta}{f(z)}\right)^{\frac{\alpha}{\alpha+1}} = \gamma \left(\frac{1}{\alpha p}\right)^{\frac{1}{\alpha+1}} \quad\text{and then}\quad \frac{\beta}{f(z)} = \gamma^{\frac{\alpha+1}{\alpha}} \left(\frac{1}{\alpha p}\right)^{\frac{1}{\alpha}}.$$

Thus,

$$f(z) = \beta\, \gamma^{\frac{-(\alpha+1)}{\alpha}} \left(\frac{1}{\alpha p}\right)^{\frac{-1}{\alpha}} \quad\text{or}\quad z = f^{-1}\!\left(\beta\, \gamma^{\frac{-(\alpha+1)}{\alpha}} \left(\frac{1}{\alpha p}\right)^{\frac{-1}{\alpha}}\right).$$

When $f(z) = e^z$, $q = f(z)$ gives $z = f^{-1}(q) = \ln(q)$. Thus,

$$z = \ln\!\left[\beta\, \gamma^{\frac{-(\alpha+1)}{\alpha}} \left(\frac{1}{\alpha p}\right)^{\frac{-1}{\alpha}}\right]. \qquad ♦$$
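The closed-form solution of Example 11.6 with $f(z) = e^z$ can be substituted back into the first-order conditions. The parameter values below ($\alpha = 1$, $\beta = 2$, $\gamma = 0.5$, $p = 3$) are illustrative choices, not taken from the text.

```python
import math

alpha, beta, gamma, p = 1.0, 2.0, 0.5, 3.0

# Closed form from the example: f(z*) = beta * gamma^(-(alpha+1)/alpha) * (alpha*p)^(1/alpha).
fz = beta * gamma ** (-(alpha + 1) / alpha) * (alpha * p) ** (1 / alpha)
z = math.log(fz)                                       # f(z) = e^z, so z* = ln(f(z*))
x = (alpha * p) ** (1 / (alpha + 1)) * (fz / beta) ** (1 / (alpha + 1))

# First-order conditions of c(x, z) = p x^(-alpha) + beta x / e^z + gamma z.
c_x = -alpha * p * x ** (-alpha - 1) + beta / fz
c_z = -beta * x / fz + gamma
```

With these values $f(z^*) = 24$ and $x^* = 6$, and both derivatives vanish.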
11.2.2 Failure of the second-order conditions
The second-order conditions confirm that at a candidate optimum, in a neighborhood of that point the function "bends down" in all directions in the case of maximization, and "bends up" in all directions in the case of minimization. There are a variety of ways in which this condition may fail. The following examples illustrate two common cases.

Consider the function $f(x,y) = x^2 - y^2$. The function is depicted in Figure 11.2 and is sometimes called a saddle function in view of its shape. As the figure indicates, the point $(0,0)$ satisfies the first-order conditions, but from there variations in $x$ raise $f$ while variations in $y$ reduce $f$. Observe that $f_x = 2x$, $f_y = -2y$, $f_{xx} = 2$, $f_{yy} = -2$ and $f_{xy} = 0$, so the Hessian matrix is:

$$H = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}.$$

Thus, $|H_1| = 2$ and $|H_2| = -4$, so that neither the conditions for a local maximum nor those for a local minimum are satisfied.

[Figure 11.2: The function $f(x,y) = x^2 - y^2$]

As a second example, consider the function $f(x,y) = x^2 y$, plotted in Figure 11.3. Again, the point $(0,0)$ satisfies the first-order conditions, since $f_x = 2xy$ and $f_y = x^2$, but the function twists at $(0,0)$. The Hessian matrix is:

$$H = \begin{pmatrix} 2y & 2x \\ 2x & 0 \end{pmatrix}.$$

So $|H_1| = 2y$ and $|H_2| = -4x^2$. Both of these are 0 at $(x,y) = (0,0)$.

[Figure 11.3: The function $f(x,y) = x^2 y$]
11.2.3 The envelope theorem
Consider the problem of maximizing a function of $x$ when the function depends on some exogenous parameter $\alpha$: $f(x,\alpha)$. For example: $f(x,\alpha) = x^{\frac{1}{2}} - \alpha x$. The usual procedure is to set the first derivative to zero: $f_x(x,\alpha) = 0$. This gives the solution $x^* = x(\alpha)$ as a function of $\alpha$, chosen to satisfy $f_x(x(\alpha),\alpha) = 0$. As $\alpha$ varies, so will the optimal choice of $x$, to maintain the identity $f_x(x(\alpha),\alpha) \equiv 0$. In the example, $f(x,\alpha) = x^{\frac{1}{2}} - \alpha x$ gives $f_x(x,\alpha) = \tfrac{1}{2} x^{-\frac{1}{2}} - \alpha = 0$. Rearranging, $2\alpha = x^{-\frac{1}{2}}$, or $x(\alpha) = \left(\frac{1}{2\alpha}\right)^2$. If the optimal choice of $x$ given $\alpha$, $x(\alpha)$, is substituted into the function $f$, then $f(x(\alpha),\alpha)$ is the value of the function being maximized, at the solution. Note that $f(x(\alpha),\alpha)$ depends only on $\alpha$; the value of the optimized function may be written $f^*(\alpha) \equiv f(x(\alpha),\alpha)$. In the example:

$$f^*(\alpha) = f(x(\alpha),\alpha) = \left[\left(\frac{1}{2\alpha}\right)^2\right]^{\frac{1}{2}} - \alpha\left(\frac{1}{2\alpha}\right)^2 = \left(\frac{1}{2\alpha}\right) - \left(\frac{1}{4\alpha}\right) = \left(\frac{1}{4\alpha}\right).$$

Now consider the following question. If $\alpha$ varies, how much will the value of the maximized function vary? That is, what is the value of $\frac{df^*(\alpha)}{d\alpha}$? Differentiating:

$$\frac{df^*(\alpha)}{d\alpha} = f_x(x(\alpha),\alpha)\frac{dx(\alpha)}{d\alpha} + f_\alpha(x(\alpha),\alpha).$$

Note now that from the first-order condition, $f_x(x(\alpha),\alpha) = 0$ at each value of $\alpha$, so that for each $\alpha$,

$$\frac{df^*(\alpha)}{d\alpha} = f_\alpha(x(\alpha),\alpha).$$

In words, the change in the maximized value of the function, $f^*(\alpha)$, as the parameter $\alpha$ changes, is equal to the derivative of the (unoptimized) function, $f(x,\alpha)$, with respect to the parameter, with the derivative evaluated at the solution $x(\alpha)$: $f_\alpha(x^*(\alpha),\alpha)$. In the example, $f^*(\alpha) = \left(\frac{1}{4\alpha}\right)$, so that

$$\frac{df^*(\alpha)}{d\alpha} = -\frac{1}{4\alpha^2} = -\left(\frac{1}{2\alpha}\right)^2 = -x(\alpha).$$

If we differentiate $f(x,\alpha) = x^{\frac{1}{2}} - \alpha x$ with respect to $\alpha$ and evaluate at the solution $x(\alpha)$, we get $-x(\alpha)$. To maximize $f(x,\alpha) = x^{\frac{1}{2}} - \alpha x$, the optimal choice of $x$ is that which makes the gap between $x^{\frac{1}{2}}$ and $\alpha x$ as large as possible. In Figure 11.4, this is illustrated with the gap maximized at $x(\alpha)$ and with resulting value $x(\alpha)^{\frac{1}{2}} - \alpha x(\alpha) = a - b$.

[Figure 11.4: The impact of parameter variation on the objective value]

In the figure, profit at $\alpha$ is $(a-b)$ and at $\alpha'$ profit is $(d-f)$. Observe that

$$\pi(\alpha') = d - f \approx (a - c) = (a - b) - (c - b) = \pi(\alpha) - (\alpha' - \alpha)x(\alpha),$$

so

$$\frac{\pi(\alpha') - \pi(\alpha)}{\alpha' - \alpha} \approx -x(\alpha).$$
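The approximation above can be checked with a finite difference. This sketch (illustrative, not from the text) compares a central difference of $f^*(\alpha) = \frac{1}{4\alpha}$ with $-x(\alpha)$.

```python
# f(x, a) = x^(1/2) - a*x has x(a) = (1/(2a))^2 and optimized value f*(a) = 1/(4a).
def f_star(a):
    return 1 / (4 * a)

a, h = 0.5, 1e-6
numeric = (f_star(a + h) - f_star(a - h)) / (2 * h)   # approximates df*/da
x_of_a = (1 / (2 * a)) ** 2                           # envelope theorem: df*/da = -x(a)
```

At $\alpha = 0.5$, both the finite difference and $-x(\alpha)$ equal $-1$.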
This observation extends to the general case. Let $f(x_1,\dots,x_n,\alpha_1,\dots,\alpha_m)$. For ease of notation write $x = (x_1,\dots,x_n)$ and $\alpha = (\alpha_1,\dots,\alpha_m)$. The first-order conditions are $f_{x_i}(x,\alpha) = 0$, $i = 1,\dots,n$. Solving these gives $x(\alpha) = (x_1(\alpha),\dots,x_n(\alpha))$. Then the optimized value of the function is $f^*(\alpha) = f(x(\alpha),\alpha)$. To calculate the change in $f^*$ due to a change in one parameter, say $\alpha_i$, take the partial derivative of $f^*(\alpha)$ with respect to $\alpha_i$. This gives

$$\frac{\partial f^*(\alpha)}{\partial \alpha_i} = \sum_{j=1}^{n} \frac{\partial f(x(\alpha),\alpha)}{\partial x_j}\frac{\partial x_j(\alpha)}{\partial \alpha_i} + \frac{\partial f(x(\alpha),\alpha)}{\partial \alpha_i}.$$

Note now that from the first-order conditions, the partial derivative of $f(x,\alpha)$ with respect to $x_j$, evaluated at $x = x(\alpha)$, is equal to 0 for each $j = 1,\dots,n$: $\frac{\partial f(x(\alpha),\alpha)}{\partial x_j} = 0$, $\forall j$. Thus,

$$\frac{\partial f^*(\alpha)}{\partial \alpha_i} = \frac{\partial f(x(\alpha),\alpha)}{\partial \alpha_i}.$$
The following examples illustrate this result.
EXAMPLE 11.7: The profit function of a firm depends on output $Q$ and advertising $A$. Profit is given by $\pi(Q,A) = 500 - 4Q^2 - 8Q + 6AQ - 10A^2 + 100A$. So, the first-order conditions are:

$$\pi_Q = -8Q + 6A - 8 = 0, \qquad \pi_A = 6Q - 20A + 100 = 0.$$

Thus,

$$\begin{pmatrix} -8 & 6 \\ 6 & -20 \end{pmatrix} \begin{pmatrix} Q \\ A \end{pmatrix} = \begin{pmatrix} 8 \\ -100 \end{pmatrix}.$$

Solving these gives $Q = \frac{110}{31}$ and $A = \frac{188}{31}$. Since $\pi_{QQ} = -8$, $\pi_{AA} = -20$ and $\pi_{QA} = 6$, $\pi_{QQ}\pi_{AA} - \pi_{QA}^2 = 160 - 36 = 124$. So, the second-order conditions for a maximum are satisfied. ♦
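A quick check of Example 11.7 (illustrative, not from the text): the candidate $(Q,A) = (110/31, 188/31)$ satisfies both first-order conditions exactly.

```python
from fractions import Fraction

Q, A = Fraction(110, 31), Fraction(188, 31)

pi_Q = -8 * Q + 6 * A - 8       # first-order condition for Q
pi_A = 6 * Q - 20 * A + 100     # first-order condition for A

det_H = (-8) * (-20) - 6 * 6    # |H_2| = 124 > 0, and pi_QQ = -8 < 0: a maximum
```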
Example 11.8 develops Example 11.7 to illustrate the envelope theorem.

EXAMPLE 11.8: Suppose that the profit function is now $\pi(Q,A) = 500 - \alpha Q^2 - \beta Q + \gamma AQ - 5A^2 + 50A$, where $\alpha$, $\beta$ and $\gamma$ are parameters. In this case, the first-order conditions become:

$$\pi_Q = -2\alpha Q - \beta + \gamma A = 0, \qquad \pi_A = \gamma Q - 10A + 50 = 0.$$

Rearranging:

$$\begin{pmatrix} -2\alpha & \gamma \\ \gamma & -10 \end{pmatrix} \begin{pmatrix} Q \\ A \end{pmatrix} = \begin{pmatrix} \beta \\ -50 \end{pmatrix}.$$

Using Cramer's rule:

$$\Delta = \begin{vmatrix} -2\alpha & \gamma \\ \gamma & -10 \end{vmatrix} = 20\alpha - \gamma^2, \qquad \Delta_1 = \begin{vmatrix} \beta & \gamma \\ -50 & -10 \end{vmatrix} = -10\beta + 50\gamma, \qquad \Delta_2 = \begin{vmatrix} -2\alpha & \beta \\ \gamma & -50 \end{vmatrix} = 100\alpha - \gamma\beta.$$

Thus,

$$Q(\alpha,\beta,\gamma) = \frac{\Delta_1}{\Delta} = \frac{-10\beta + 50\gamma}{20\alpha - \gamma^2}, \qquad A(\alpha,\beta,\gamma) = \frac{\Delta_2}{\Delta} = \frac{100\alpha - \gamma\beta}{20\alpha - \gamma^2}.$$

For the second-order conditions: $\pi_{QQ} = -2\alpha < 0$, and $\pi_{QQ}\pi_{AA} - \pi_{QA}^2 = 20\alpha - \gamma^2 > 0$ $\left(\text{if } \alpha > \frac{\gamma^2}{20}\right)$.

Finally, using the envelope theorem, the impact of parameter changes on the value of the objective may be calculated:

$$\frac{\partial \pi^*}{\partial \alpha} = -Q(\alpha,\beta,\gamma)^2, \qquad \frac{\partial \pi^*}{\partial \beta} = -Q(\alpha,\beta,\gamma), \qquad \frac{\partial \pi^*}{\partial \gamma} = Q(\alpha,\beta,\gamma)\,A(\alpha,\beta,\gamma).$$

For example, $\frac{\partial \pi^*}{\partial \alpha}$ is the change in the firm's profit (at the optimal solution) due to a change in the parameter $\alpha$. ♦
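The envelope formula $\partial\pi^*/\partial\alpha = -Q^2$ can be tested with a finite difference. The parameter values below ($\alpha = 4$, $\beta = 8$, $\gamma = 6$) are illustrative choices, not from the text.

```python
def solution(a, b, g):
    # Cramer's-rule solution from the example.
    d = 20 * a - g * g
    return (-10 * b + 50 * g) / d, (100 * a - g * b) / d

def profit(Q, A, a, b, g):
    return 500 - a * Q**2 - b * Q + g * A * Q - 5 * A**2 + 50 * A

def pi_star(a, b, g):
    Q, A = solution(a, b, g)
    return profit(Q, A, a, b, g)

a, b, g, h = 4.0, 8.0, 6.0, 1e-6
numeric = (pi_star(a + h, b, g) - pi_star(a - h, b, g)) / (2 * h)
Q, A = solution(a, b, g)        # here Q = 5, A = 8; expect numeric close to -Q^2 = -25
```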
The next example considers a standard economic model: profit maximization by a price-taking firm with two inputs. Profit maximization leads to optimal choices, which in turn determine the firm's profit. How does profit vary as input prices vary? The envelope theorem provides a simple way to answer this question. In addition, the example illustrates the impact of parameter changes on the choice variables.
EXAMPLE 11.9: A firm produces output, $Q$, with two inputs, $x$ and $y$, according to the production function $Q = x^\alpha y^\alpha$, $0 < \alpha < \frac{1}{2}$. The cost of input $x$ is $r > 0$ and the cost of input $y$ is $w > 0$. The output $Q$ sells at unit price $p$. The following discussion considers the problem of profit maximization and the impact of parameter changes on the optimal values of the choice variables and on the level of profit at the solution. The profit function is

$$\pi(x,y) = p x^\alpha y^\alpha - rx - wy.$$

Maximizing this determines solution values for $x$ and $y$, say $x^*$ and $y^*$, which depend on the parameters $(p,\alpha,r,w)$: $x^* = x(p,\alpha,r,w)$ and $y^* = y(p,\alpha,r,w)$. Substituting these into the profit function gives the level of profit at the optimal choices of $(x,y)$:

$$\pi^*(p,\alpha,r,w) = p(x^*)^\alpha (y^*)^\alpha - rx^* - wy^*,$$

with $x^*$ and $y^*$ functions of $(p,\alpha,r,w)$.

How does maximized profit vary with a small change in $r$? According to the envelope theorem, the answer may be obtained by differentiating the unoptimized profit function with respect to $r$ and evaluating the derivative at the solution. Here, the derivative of the unoptimized function with respect to $r$ is $-x$, and evaluating at the solution gives $-x^*$. (Likewise, the impact of an output price change is obtained by differentiating the unoptimized function with respect to $p$ and evaluating the derivative at the solution.) The following computations confirm these observations.

SOLVING FOR THE OPTIMAL SOLUTION

The first-order conditions for profit maximization are:

$$\pi_x(x,y) = \alpha p x^{\alpha-1} y^\alpha - r = \alpha p Q x^{-1} - r = 0,$$
$$\pi_y(x,y) = \alpha p x^\alpha y^{\alpha-1} - w = \alpha p Q y^{-1} - w = 0,$$

where $Q = x^\alpha y^\alpha$. Therefore $rx = wy$, or $x = \frac{w}{r} y$, and

$$0 = \alpha p x^{\alpha-1} y^\alpha - r = \alpha p \left(\frac{w}{r}\right)^{\alpha-1} y^{\alpha-1} y^\alpha - r = \alpha p \left(\frac{w}{r}\right)^{\alpha-1} y^{2\alpha-1} - r.$$

Therefore, $\frac{\alpha p}{r}\left(\frac{w}{r}\right)^{\alpha-1} = y^{1-2\alpha}$, so that

$$y = \left(\frac{\alpha p}{r}\right)^{\frac{1}{1-2\alpha}} \left(\frac{w}{r}\right)^{\frac{\alpha-1}{1-2\alpha}} = (\alpha p)^{\frac{1}{1-2\alpha}}\, w^{\frac{\alpha-1}{1-2\alpha}}\, r^{\frac{-\alpha}{1-2\alpha}}. \tag{11.10}$$

Similarly,

$$x = (\alpha p)^{\frac{1}{1-2\alpha}}\, r^{\frac{\alpha-1}{1-2\alpha}}\, w^{\frac{-\alpha}{1-2\alpha}}. \tag{11.11}$$

The second-order conditions are satisfied, since:

$$\pi_{xx} = \alpha(\alpha-1) p Q x^{-2} < 0, \qquad \pi_{yy} = \alpha(\alpha-1) p Q y^{-2} < 0,$$
$$\pi_{xx}\pi_{yy} - \pi_{xy}^2 = \alpha^2(\alpha-1)^2 p^2 \frac{Q^2}{x^2 y^2} - \alpha^2 \alpha^2 p^2 \frac{Q^2}{x^2 y^2} = \alpha^2 p^2 \frac{Q^2}{x^2 y^2}\left[(\alpha-1)^2 - \alpha^2\right] > 0,$$

using $\pi_{xy} = \alpha\alpha p Q x^{-1} y^{-1}$ and the fact that $[(\alpha-1)^2 - \alpha^2] = [\alpha^2 - 2\alpha + 1 - \alpha^2] = (1 - 2\alpha) > 0$.

THE IMPACT OF PARAMETER CHANGES ON CHOICE VARIABLES

From the solutions, the impact of parameter changes on the optimal values can be computed directly. For example:

$$\frac{\partial x}{\partial p} = \left(\frac{1}{1-2\alpha}\right)\alpha\,(\alpha p)^{\frac{1}{1-2\alpha}-1}\, r^{\frac{\alpha-1}{1-2\alpha}}\, w^{\frac{-\alpha}{1-2\alpha}} = \left(\frac{1}{1-2\alpha}\right)\left(\frac{1}{p}\right) x$$

and

$$\frac{\partial y}{\partial r} = \left(\frac{-\alpha}{1-2\alpha}\right)(\alpha p)^{\frac{1}{1-2\alpha}}\, w^{\frac{\alpha-1}{1-2\alpha}}\, r^{\frac{-\alpha}{1-2\alpha}-1} = \left(\frac{-\alpha}{1-2\alpha}\right)\left(\frac{1}{r}\right) y.$$

THE IMPACT OF PARAMETER CHANGES ON PROFIT (ENVELOPE CALCULATIONS)

At the optimal solution, the profit is $\pi^*(p,\alpha,r,w) = \pi(x(p,\alpha,r,w), y(p,\alpha,r,w))$. According to the envelope theorem, the impact of a parameter change on the objective may be computed by differentiating the unoptimized objective with respect to the parameter and evaluating the derivative at the solution. In the present context, for example:

$$\frac{\partial \pi^*}{\partial r} = \frac{\partial \pi(x(p,\alpha,r,w), y(p,\alpha,r,w))}{\partial r} = -x(p,\alpha,r,w), \qquad \frac{\partial \pi^*}{\partial w} = \frac{\partial \pi(x(p,\alpha,r,w), y(p,\alpha,r,w))}{\partial w} = -y(p,\alpha,r,w).$$

The following discussion confirms this. Recall that

$$\pi(x(p,\alpha,r,w), y(p,\alpha,r,w)) = p\, x(p,\alpha,r,w)^\alpha\, y(p,\alpha,r,w)^\alpha - r\, x(p,\alpha,r,w) - w\, y(p,\alpha,r,w).$$

Then, collecting terms,

$$\frac{\partial \pi(x(p,\alpha,r,w), y(p,\alpha,r,w))}{\partial r} = \left[\alpha p\, x(p,\alpha,r,w)^{\alpha-1} y(p,\alpha,r,w)^\alpha - r\right]\frac{\partial x(p,\alpha,r,w)}{\partial r} + \left[\alpha p\, x(p,\alpha,r,w)^\alpha y(p,\alpha,r,w)^{\alpha-1} - w\right]\frac{\partial y(p,\alpha,r,w)}{\partial r} - x(p,\alpha,r,w).$$

We know that $x(p,\alpha,r,w)$ and $y(p,\alpha,r,w)$ were chosen to satisfy the first-order conditions, so that

$$\alpha p\, x(p,\alpha,r,w)^{\alpha-1} y(p,\alpha,r,w)^\alpha - r = 0, \qquad \alpha p\, x(p,\alpha,r,w)^\alpha y(p,\alpha,r,w)^{\alpha-1} - w = 0.$$

Therefore,

$$\frac{\partial \pi(x(p,\alpha,r,w), y(p,\alpha,r,w))}{\partial r} = -x(p,\alpha,r,w).$$

So, the derivative of the optimized function with respect to the parameter $r$ is equal to the derivative of the unoptimized function with respect to the parameter, with the derivative evaluated at the optimal solution. Similarly,

$$\frac{\partial \pi(x(p,\alpha,r,w), y(p,\alpha,r,w))}{\partial w} = -y(p,\alpha,r,w).$$

These calculations are closely related to dual prices in linear programming. The partial derivative of $\pi(x(p,\alpha,r,w), y(p,\alpha,r,w))$ with respect to $r$ measures the change in the value of the objective at the optimal solution due to a change in the cost of one of the inputs ($x$).

THE IMPACT OF PARAMETER CHANGES ON PROFIT (DIRECT CALCULATIONS)

Finally, the optimized function can be differentiated directly. These tedious calculations confirm the result obtained from the envelope theorem. Observe, using Equations 11.11 and 11.10,

$$x^\alpha = (\alpha p)^{\frac{\alpha}{1-2\alpha}}\, r^{\frac{(\alpha-1)\alpha}{1-2\alpha}}\, w^{\frac{-\alpha^2}{1-2\alpha}}, \tag{11.12}$$
$$y^\alpha = (\alpha p)^{\frac{\alpha}{1-2\alpha}}\, w^{\frac{(\alpha-1)\alpha}{1-2\alpha}}\, r^{\frac{-\alpha^2}{1-2\alpha}}. \tag{11.13}$$

Because $\frac{(\alpha-1)\alpha}{1-2\alpha} + \frac{-\alpha^2}{1-2\alpha} = \frac{-\alpha}{1-2\alpha}$,

$$x^\alpha y^\alpha = (\alpha p)^{\frac{2\alpha}{1-2\alpha}}\, r^{\frac{-\alpha}{1-2\alpha}}\, w^{\frac{-\alpha}{1-2\alpha}} = \alpha^{\frac{2\alpha}{1-2\alpha}}\, p^{\frac{2\alpha}{1-2\alpha}}\, r^{\frac{-\alpha}{1-2\alpha}}\, w^{\frac{-\alpha}{1-2\alpha}}$$

and

$$p x^\alpha y^\alpha = \alpha^{\frac{2\alpha}{1-2\alpha}}\, p^{\frac{1}{1-2\alpha}}\, r^{\frac{-\alpha}{1-2\alpha}}\, w^{\frac{-\alpha}{1-2\alpha}},$$

since $\frac{2\alpha}{1-2\alpha} + 1 = \frac{2\alpha + (1-2\alpha)}{1-2\alpha} = \frac{1}{1-2\alpha}$. From the solution values of $x$ and $y$,

$$rx = (\alpha p)^{\frac{1}{1-2\alpha}}\, r^{\left(\frac{\alpha-1}{1-2\alpha}+1\right)}\, w^{\frac{-\alpha}{1-2\alpha}} = \alpha^{\frac{1}{1-2\alpha}}\, p^{\frac{1}{1-2\alpha}}\, r^{\frac{-\alpha}{1-2\alpha}}\, w^{\frac{-\alpha}{1-2\alpha}},$$
$$wy = (\alpha p)^{\frac{1}{1-2\alpha}}\, w^{\left(\frac{\alpha-1}{1-2\alpha}+1\right)}\, r^{\frac{-\alpha}{1-2\alpha}} = \alpha^{\frac{1}{1-2\alpha}}\, p^{\frac{1}{1-2\alpha}}\, w^{\frac{-\alpha}{1-2\alpha}}\, r^{\frac{-\alpha}{1-2\alpha}}.$$

Thus,

$$rx + wy = 2\alpha^{\frac{1}{1-2\alpha}}\, p^{\frac{1}{1-2\alpha}}\, w^{\frac{-\alpha}{1-2\alpha}}\, r^{\frac{-\alpha}{1-2\alpha}}.$$

So, $\pi^* = p(x^*)^\alpha(y^*)^\alpha - (rx^* + wy^*)$ gives

$$\pi^* = \left[\alpha^{\frac{2\alpha}{1-2\alpha}} - 2\alpha^{\frac{1}{1-2\alpha}}\right] p^{\frac{1}{1-2\alpha}}\, r^{\frac{-\alpha}{1-2\alpha}}\, w^{\frac{-\alpha}{1-2\alpha}} = \alpha^{\frac{1}{1-2\alpha}}\left[\alpha^{-1} - 2\right] p^{\frac{1}{1-2\alpha}}\, r^{\frac{-\alpha}{1-2\alpha}}\, w^{\frac{-\alpha}{1-2\alpha}} = \alpha^{\frac{1}{1-2\alpha}}\alpha^{-1}(1-2\alpha)\, p^{\frac{1}{1-2\alpha}}\, r^{\frac{-\alpha}{1-2\alpha}}\, w^{\frac{-\alpha}{1-2\alpha}},$$

using the fact that $\frac{2\alpha}{1-2\alpha} = \frac{1}{1-2\alpha} - 1$. Differentiating with respect to $r$ gives:

$$\frac{\partial \pi^*}{\partial r} = \frac{-\alpha}{1-2\alpha}\,\alpha^{\frac{1}{1-2\alpha}}\alpha^{-1}(1-2\alpha)\, p^{\frac{1}{1-2\alpha}}\, r^{\frac{-\alpha}{1-2\alpha}-1}\, w^{\frac{-\alpha}{1-2\alpha}} = -(\alpha p)^{\frac{1}{1-2\alpha}}\, r^{\frac{\alpha-1}{1-2\alpha}}\, w^{\frac{-\alpha}{1-2\alpha}} = -x^*,$$

using $\frac{-\alpha}{1-2\alpha} - 1 = \frac{-\alpha-(1-2\alpha)}{1-2\alpha} = \frac{\alpha-1}{1-2\alpha}$ and $\left[\frac{-\alpha}{1-2\alpha}\right]\alpha^{-1}(1-2\alpha) = -1$. ♦
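The closed forms in Example 11.9 can be verified numerically. The sketch below (with illustrative values $p = 2$, $\alpha = 0.25$, $r = w = 1$, not from the text) confirms $\partial\pi^*/\partial r = -x^*$ by finite difference.

```python
def inputs(p, a, r, w):
    # Equations 11.10 and 11.11.
    e = 1 / (1 - 2 * a)
    x = (a * p) ** e * r ** ((a - 1) * e) * w ** (-a * e)
    y = (a * p) ** e * w ** ((a - 1) * e) * r ** (-a * e)
    return x, y

def pi_star(p, a, r, w):
    x, y = inputs(p, a, r, w)
    return p * x ** a * y ** a - r * x - w * y

p, a, r, w, h = 2.0, 0.25, 1.0, 1.0, 1e-6
numeric = (pi_star(p, a, r + h, w) - pi_star(p, a, r - h, w)) / (2 * h)
x_star, y_star = inputs(p, a, r, w)   # envelope theorem: numeric close to -x_star
```

With these values $x^* = y^* = 0.25$, and the finite difference matches $-x^*$.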
11.3 Optimization: n variables

Consider the problem of maximizing or minimizing a function $f$: $\max f(x_1,\dots,x_n)$ or $\min f(x_1,\dots,x_n)$. For convenience, write $f(x)$, where $x = (x_1,\dots,x_n)$. The first derivatives of $f$ are $f_i = \frac{\partial f}{\partial x_i}$, $i = 1,\dots,n$. Set $\nabla f = (f_1, f_2, \dots, f_n)$. If at a point $x^*$ the partial derivative of $f$ with respect to $x_i$ is not equal to 0, say $f_i(x^*) = \frac{\partial f(x^*)}{\partial x_i} > 0$, this means that at the point $x^*$ the value of $f$ is increased if $x_i$ is increased from $x_i^*$. This implies that for $x^*$ to be a local maximum of the function $f$, it must be the case that every partial derivative is 0. A necessary condition for $x^*$ to be a local maximum is that the partial derivatives at $x^*$ are 0:

$$\frac{\partial f(x^*)}{\partial x_i} = 0, \quad i = 1,\dots,n.$$

Under what conditions can one be sure that $x^*$ is a local maximum? In other words, what are necessary and sufficient conditions for $x^*$ to be a local maximum? Let $H(p)$ be the Hessian matrix evaluated at $p$:
$$H = \left\{\frac{\partial^2 f}{\partial x_i \partial x_j}\right\}_{ij} = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n \partial x_n} \end{pmatrix} = \begin{pmatrix} f_{11} & f_{12} & \cdots & f_{1n} \\ f_{21} & f_{22} & \cdots & f_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ f_{n1} & f_{n2} & \cdots & f_{nn} \end{pmatrix}.$$

In general, $H$ will depend on $x$. This dependence is made explicit by writing $H(x)$. Determining whether $H$ is positive or negative definite is possible by checking a set of determinantal conditions. Define $H_j$ to be the $j \times j$ matrix obtained from $H$ by deleting the last $n-j$ rows and columns:

$$H_j = \begin{pmatrix} f_{11} & f_{12} & \cdots & f_{1j} \\ f_{21} & f_{22} & \cdots & f_{2j} \\ \vdots & \vdots & \ddots & \vdots \\ f_{j1} & f_{j2} & \cdots & f_{jj} \end{pmatrix}.$$
Theorem 11.2: Suppose that $x^*$ satisfies:

$$\frac{\partial f(x^*)}{\partial x_i} = 0, \quad i = 1,\dots,n.$$

If $|H_1| < 0$, $|H_2| > 0$, $|H_3| < 0, \dots$, or $(-1)^j |H_j| > 0$, $j = 1,\dots,n$, then $x^*$ is a local maximum. If $|H_1| > 0$, $|H_2| > 0$, $|H_3| > 0, \dots$, or $|H_j| > 0$, $j = 1,\dots,n$, then $x^*$ is a local minimum.

The following example illustrates the calculations.

EXAMPLE 11.10: Let $f(x,y,z) = 3x^2 + 12y^2 + 5z^2 - 13x + 12y + 2xz - 3yz$. Then: $f_x = 6x - 13 + 2z$, $f_{xx} = 6$, $f_{xy} = 0$, $f_{xz} = 2$, $f_y = 24y + 12 - 3z$, $f_{yy} = 24$, $f_{yz} = -3$, $f_z = 10z + 2x - 3y$, $f_{zz} = 10$, and

$$H = \begin{pmatrix} 6 & 0 & 2 \\ 0 & 24 & -3 \\ 2 & -3 & 10 \end{pmatrix}.$$

The first-order conditions solve to give $x = \frac{205}{86}$, $y = -\frac{25}{43}$ and $z = -\frac{28}{43}$. Since

$$|H_1| = 6 > 0, \qquad |H_2| = \begin{vmatrix} 6 & 0 \\ 0 & 24 \end{vmatrix} = 144 > 0, \qquad |H_3| = \begin{vmatrix} 6 & 0 & 2 \\ 0 & 24 & -3 \\ 2 & -3 & 10 \end{vmatrix} = 1290 > 0,$$

the Hessian is positive definite and, by Theorem 11.2, the solution is a local minimum. ♦
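A check of Example 11.10 (illustrative, not from the text): the stated critical point satisfies the first-order conditions exactly, and the leading principal minors are $6$, $144$, $1290$.

```python
from fractions import Fraction

x, y, z = Fraction(205, 86), Fraction(-25, 43), Fraction(-28, 43)

f_x = 6 * x - 13 + 2 * z        # first-order conditions
f_y = 24 * y + 12 - 3 * z
f_z = 10 * z + 2 * x - 3 * y

H = [[6, 0, 2], [0, 24, -3], [2, -3, 10]]
H1 = H[0][0]
H2 = H[0][0] * H[1][1] - H[0][1] * H[1][0]
H3 = (H[0][0] * (H[1][1] * H[2][2] - H[1][2] * H[2][1])
      - H[0][1] * (H[1][0] * H[2][2] - H[1][2] * H[2][0])
      + H[0][2] * (H[1][0] * H[2][1] - H[1][1] * H[2][0]))
```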
The next example is the standard profit-maximization problem, now with three inputs.

EXAMPLE 11.11: Consider the production function $q = f(x,y,z) = x^\alpha y^\beta z^\gamma$ (with $\alpha,\beta,\gamma > 0$ and $\alpha + \beta + \gamma < 1$). A price-taking firm maximizes $\pi(x,y,z) = p f(x,y,z) - rx - sy - tz$. The first-order conditions are $\pi_x = p f_x - r = 0$, $\pi_y = p f_y - s = 0$ and $\pi_z = p f_z - t = 0$. The first- and second-order partial derivatives are:

$$f_x = \alpha x^{\alpha-1} y^\beta z^\gamma, \qquad f_y = \beta x^\alpha y^{\beta-1} z^\gamma, \qquad f_z = \gamma x^\alpha y^\beta z^{\gamma-1}$$

and

$$f_{xx} = \alpha(\alpha-1)\frac{q}{x^2}, \qquad f_{yy} = \beta(\beta-1)\frac{q}{y^2}, \qquad f_{zz} = \gamma(\gamma-1)\frac{q}{z^2},$$
$$f_{xy} = \alpha\beta\frac{q}{xy}, \qquad f_{xz} = \alpha\gamma\frac{q}{xz}, \qquad f_{yz} = \beta\gamma\frac{q}{yz}.$$

From these calculations, the Hessian is:

$$H = \begin{pmatrix} \pi_{xx} & \pi_{xy} & \pi_{xz} \\ \pi_{yx} & \pi_{yy} & \pi_{yz} \\ \pi_{zx} & \pi_{zy} & \pi_{zz} \end{pmatrix} = p \begin{pmatrix} \alpha(\alpha-1)\frac{q}{x^2} & \alpha\beta\frac{q}{xy} & \alpha\gamma\frac{q}{xz} \\ \alpha\beta\frac{q}{xy} & \beta(\beta-1)\frac{q}{y^2} & \beta\gamma\frac{q}{yz} \\ \alpha\gamma\frac{q}{xz} & \beta\gamma\frac{q}{yz} & \gamma(\gamma-1)\frac{q}{z^2} \end{pmatrix}.$$

Since $p > 0$, the signs of the leading principal minors are determined by the matrix of $f$-derivatives, so in what follows the factor $p$ is suppressed. Here $H_1 = \left(\alpha(\alpha-1)\frac{q}{x^2}\right)$, $H_2$ is the top-left $2 \times 2$ submatrix and $H_3 = H$. Thus, $|H_1| = \alpha(\alpha-1)\frac{q}{x^2} < 0$. Also,

$$|H_2| = \alpha(\alpha-1)\beta(\beta-1)\frac{q^2}{x^2 y^2} - \alpha^2\beta^2\frac{q^2}{x^2 y^2} = \left[\alpha(\alpha-1)\beta(\beta-1) - \alpha^2\beta^2\right]\frac{q^2}{x^2 y^2} = \alpha\beta\left[1 - \alpha - \beta\right]\frac{q^2}{x^2 y^2}.$$

Since $1 - \alpha - \beta > 0$, this expression is positive. Finally, expanding $|H_3|$ along the first row,

$$|H_3| = \alpha(\alpha-1)\frac{q}{x^2}\left[\beta(\beta-1)\gamma(\gamma-1) - \beta^2\gamma^2\right]\frac{q^2}{y^2 z^2} - \alpha\beta\frac{q}{xy}\left[\alpha\beta\gamma(\gamma-1) - \alpha\gamma\beta\gamma\right]\frac{q^2}{xyz^2} + \alpha\gamma\frac{q}{xz}\left[\alpha\beta\beta\gamma - \alpha\gamma\beta(\beta-1)\right]\frac{q^2}{xy^2 z} = -\alpha\beta\gamma(1-\alpha-\beta-\gamma)\frac{q^3}{x^2 y^2 z^2},$$

so $|H_3| < 0$, since $(1-\alpha-\beta-\gamma) > 0$. ♦
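The minor signs in Example 11.11 can be spot-checked numerically. The values below ($\alpha = \beta = \gamma = 0.2$, $x = y = z = 1$ so $q = 1$, and $p$ set to 1) are illustrative choices, not from the text.

```python
a, b, g = 0.2, 0.2, 0.2
x = y = z = 1.0
q = x ** a * y ** b * z ** g      # q = 1 at this point

# Hessian of f (the factor p = 1 is suppressed, as in the example).
H = [[a * (a - 1) * q / x**2, a * b * q / (x * y),    a * g * q / (x * z)],
     [a * b * q / (x * y),    b * (b - 1) * q / y**2, b * g * q / (y * z)],
     [a * g * q / (x * z),    b * g * q / (y * z),    g * (g - 1) * q / z**2]]

H1 = H[0][0]
H2 = H[0][0] * H[1][1] - H[0][1] ** 2
H3 = (H[0][0] * (H[1][1] * H[2][2] - H[1][2] * H[2][1])
      - H[0][1] * (H[1][0] * H[2][2] - H[1][2] * H[2][0])
      + H[0][2] * (H[1][0] * H[2][1] - H[1][1] * H[2][0]))
```

The computed minors alternate in sign and match the closed forms $\alpha\beta(1-\alpha-\beta)q^2$ and $-\alpha\beta\gamma(1-\alpha-\beta-\gamma)q^3$.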
11.3.1 Motivation for the Hessian conditions

To motivate the previous discussion, consider a Taylor series expansion of $f$. The matrix version of Taylor's theorem gives:

$$f(x) = f(x^*) + \nabla f(x^*) \cdot (x - x^*) + (x - x^*)' H(p)(x - x^*),$$

where $p$ is between $x$ and $x^*$. If $x^*$ satisfies $f_i(x^*) = 0$, $\forall i$, then

$$f(\tilde x) = f(x^*) + (\tilde x - x^*)' H(p)(\tilde x - x^*). \tag{11.14}$$

From this expression, whether $f(\tilde x)$ is larger or smaller than $f(x^*)$ depends entirely on $H(p)$. If, for all vectors $z \ne 0$, $z' H(p) z < 0$, then $H(p)$ is negative definite; from Equation 11.14 this implies $f(\tilde x) < f(x^*)$. If, for all vectors $z \ne 0$, $z' H(p) z > 0$, the matrix $H(p)$ is positive definite and Equation 11.14 implies $f(\tilde x) > f(x^*)$. So the sufficiency conditions depend on properties of the Hessian matrix $H$, and the associated quadratic form, characterized in terms of positive and negative definiteness. Theorem 11.3 gives criteria for negative and positive definiteness.

Theorem 11.3: The matrix $H(x)$ is negative definite if: $|H_1(x)| < 0$, $|H_2(x)| > 0$, $|H_3(x)| < 0, \dots$, or $(-1)^j |H_j(x)| > 0$, for $j = 1,\dots,n$. The matrix $H(x)$ is positive definite if: $|H_1(x)| > 0$, $|H_2(x)| > 0$, $|H_3(x)| > 0, \dots$, or $|H_j(x)| > 0$, for $j = 1,\dots,n$.

Note that if $H(x^*)$ satisfies the conditions for positive or negative definiteness, then provided $p$ is close to $x^*$, $H(p)$ will also satisfy those conditions. If $\tilde x$ is close to $x^*$ in Equation 11.14, then so is $p$. So, for example, if at $x^*$, $f_i(x^*) = 0$ for all $i$, and $H(x^*)$ is negative definite, then so is $H(p)$, and $f(\tilde x) < f(x^*)$.
11.3.2 Definiteness and second-order conditions

Recall that the Hessian matrix is the matrix of second partial derivatives of the function. Because the order of differentiation does not matter, $f_{x_i x_j} = f_{x_j x_i}$, the Hessian matrix is symmetric and the expression $z' H(p) z$ is a quadratic form. The main theorem giving second-order conditions in terms of the Hessian matrix is:

Theorem 11.4: Let $x^*$ satisfy $f_i(x^*) = 0$, $i = 1,\dots,n$. Then

1. $x^*$ is a local maximum if $H(\tilde x)$ is negative semidefinite, $\forall \tilde x$ close to $x^*$.
2. $x^*$ is a unique local maximum if $H(x^*)$ is negative definite.
3. $x^*$ is a global maximum if $H(x)$ is negative semidefinite $\forall x$.
4. $x^*$ is a unique global maximum if $H(x)$ is negative definite $\forall x$.
5. $x^*$ is a local minimum if $H(\tilde x)$ is positive semidefinite, $\forall \tilde x$ close to $x^*$.
6. $x^*$ is a unique local minimum if $H(x^*)$ is positive definite.
7. $x^*$ is a global minimum if $H(x)$ is positive semidefinite $\forall x$.
8. $x^*$ is a unique global minimum if $H(x)$ is positive definite $\forall x$.

PROOF: These results are easy to verify by considering the Taylor series expansion:

$$f(x) = f(x^*) + \nabla f(x^*) \cdot (x - x^*) + (x - x^*)' H(p)(x - x^*);$$

if at $x^*$, $f_i(x^*) = 0$ so $\nabla f(x^*) = 0$, then the expansion gives:

$$f(x) = f(x^*) + (x - x^*)' H(p)(x - x^*).$$

For example, if $H(\tilde x)$ is negative semidefinite $\forall \tilde x$ close to $x^*$, then this will also be satisfied by $p$ in the Taylor series expansion, since $p$ is between $\tilde x$ and $x^*$. In this case, $(\tilde x - x^*)' H(p)(\tilde x - x^*) \le 0$, so that $f(\tilde x) \le f(x^*)$. This confirms that $x^*$ is a local maximum. A similar discussion applies to the other results.
The next example illustrates three-variable optimization; a variation of it is then used to illustrate the envelope theorem.

EXAMPLE 11.12: A monopolist can sell a product in three different countries. The demand in each country is $Q_1 = 45 - \frac{1}{2}P_1$, $Q_2 = 50 - \frac{1}{3}P_2$ and $Q_3 = 75 - \frac{1}{4}P_3$. The cost function for the monopolist is $C = 30 + Q^2$, where $Q = Q_1 + Q_2 + Q_3$. The profit function for the monopolist is:

$$\pi = TR - TC = P_1 Q_1 + P_2 Q_2 + P_3 Q_3 - 30 - (Q_1 + Q_2 + Q_3)^2.$$

Using the inverse demand functions, $P_1 = 90 - 2Q_1$, $P_2 = 150 - 3Q_2$, $P_3 = 300 - 4Q_3$, profit is:

$$\pi = (90 - 2Q_1)Q_1 + (150 - 3Q_2)Q_2 + (300 - 4Q_3)Q_3 - 30 - (Q_1 + Q_2 + Q_3)^2.$$

The first-order conditions for profit-maximizing output in each country are:

$$\pi_{Q_1} = 90 - 4Q_1 - 2(Q_1 + Q_2 + Q_3) = 0$$
$$\pi_{Q_2} = 150 - 6Q_2 - 2(Q_1 + Q_2 + Q_3) = 0$$
$$\pi_{Q_3} = 300 - 8Q_3 - 2(Q_1 + Q_2 + Q_3) = 0.$$

In matrix form, these conditions give:

$$\begin{pmatrix} 90 \\ 150 \\ 300 \end{pmatrix} - \begin{pmatrix} 6 & 2 & 2 \\ 2 & 8 & 2 \\ 2 & 2 & 10 \end{pmatrix} \begin{pmatrix} Q_1 \\ Q_2 \\ Q_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},$$

and solving:

$$\begin{pmatrix} Q_1 \\ Q_2 \\ Q_3 \end{pmatrix} = \begin{pmatrix} 6 & 2 & 2 \\ 2 & 8 & 2 \\ 2 & 2 & 10 \end{pmatrix}^{-1} \begin{pmatrix} 90 \\ 150 \\ 300 \end{pmatrix} = \begin{pmatrix} \frac{19}{100} & -\frac{1}{25} & -\frac{3}{100} \\ -\frac{1}{25} & \frac{7}{50} & -\frac{1}{50} \\ -\frac{3}{100} & -\frac{1}{50} & \frac{11}{100} \end{pmatrix} \begin{pmatrix} 90 \\ 150 \\ 300 \end{pmatrix} = \begin{pmatrix} \frac{21}{10} \\[2pt] \frac{57}{5} \\[2pt] \frac{273}{10} \end{pmatrix}.$$

Since $\pi_{Q_1 Q_1} = -6$, $\pi_{Q_2 Q_2} = -8$, $\pi_{Q_3 Q_3} = -10$ and $\pi_{Q_1 Q_2} = \pi_{Q_1 Q_3} = \pi_{Q_2 Q_3} = -2$,

$$H = \begin{pmatrix} -6 & -2 & -2 \\ -2 & -8 & -2 \\ -2 & -2 & -10 \end{pmatrix}, \qquad H = H_3;$$

$$|H_1| = -6 < 0, \qquad |H_2| = \begin{vmatrix} -6 & -2 \\ -2 & -8 \end{vmatrix} = 44 > 0, \qquad |H_3| = \begin{vmatrix} -6 & -2 & -2 \\ -2 & -8 & -2 \\ -2 & -2 & -10 \end{vmatrix} = -400 < 0,$$

and the solution is a maximum. ♦
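A quick check of Example 11.12 (illustrative, not from the text): the reported outputs satisfy the three first-order conditions exactly.

```python
from fractions import Fraction

Q1, Q2, Q3 = Fraction(21, 10), Fraction(57, 5), Fraction(273, 10)
S = Q1 + Q2 + Q3                    # total output Q

foc1 = 90 - 4 * Q1 - 2 * S          # first-order conditions from the example
foc2 = 150 - 6 * Q2 - 2 * S
foc3 = 300 - 8 * Q3 - 2 * S
```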
EXAMPLE 11.13: Suppose that in Example 11.12 the cost function is $C = 30 + \alpha Q^2$, where $\alpha$ is some positive constant. Then the profit function becomes

$$\pi = (90 - 2Q_1)Q_1 + (150 - 3Q_2)Q_2 + (300 - 4Q_3)Q_3 - 30 - \alpha(Q_1 + Q_2 + Q_3)^2,$$

and the first-order conditions are:

$$\pi_{Q_1} = 90 - 4Q_1 - 2\alpha(Q_1 + Q_2 + Q_3) = 0$$
$$\pi_{Q_2} = 150 - 6Q_2 - 2\alpha(Q_1 + Q_2 + Q_3) = 0$$
$$\pi_{Q_3} = 300 - 8Q_3 - 2\alpha(Q_1 + Q_2 + Q_3) = 0.$$

Thus,

$$\begin{pmatrix} 90 \\ 150 \\ 300 \end{pmatrix} - \begin{pmatrix} 4+2\alpha & 2\alpha & 2\alpha \\ 2\alpha & 6+2\alpha & 2\alpha \\ 2\alpha & 2\alpha & 8+2\alpha \end{pmatrix} \begin{pmatrix} Q_1 \\ Q_2 \\ Q_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$

Some calculation yields:

$$\begin{pmatrix} 4+2\alpha & 2\alpha & 2\alpha \\ 2\alpha & 6+2\alpha & 2\alpha \\ 2\alpha & 2\alpha & 8+2\alpha \end{pmatrix}^{-1} = \begin{pmatrix} \frac{7\alpha+12}{4(13\alpha+12)} & -\frac{\alpha}{13\alpha+12} & -\frac{3\alpha}{4(13\alpha+12)} \\ -\frac{\alpha}{13\alpha+12} & \frac{3\alpha+4}{2(13\alpha+12)} & -\frac{\alpha}{2(13\alpha+12)} \\ -\frac{3\alpha}{4(13\alpha+12)} & -\frac{\alpha}{2(13\alpha+12)} & \frac{5\alpha+6}{4(13\alpha+12)} \end{pmatrix}$$

and, applying this inverse to $(90, 150, 300)'$,

$$\begin{pmatrix} Q_1 \\ Q_2 \\ Q_3 \end{pmatrix} = \begin{pmatrix} \frac{15}{2}\cdot\frac{36-29\alpha}{13\alpha+12} \\[2pt] 15\cdot\frac{20-\alpha}{13\alpha+12} \\[2pt] \frac{15}{2}\cdot\frac{31\alpha+60}{13\alpha+12} \end{pmatrix}.$$

For the second-order conditions, the Hessian matrix is:

$$H = \begin{pmatrix} -(4+2\alpha) & -2\alpha & -2\alpha \\ -2\alpha & -(6+2\alpha) & -2\alpha \\ -2\alpha & -2\alpha & -(8+2\alpha) \end{pmatrix},$$

so $|H_1| = -(4+2\alpha) < 0$, $|H_2| = (4+2\alpha)(6+2\alpha) - (2\alpha)^2 = 24 + 20\alpha > 0$, and $|H_3| = -(208\alpha + 192) < 0$.

This example may be used to illustrate the envelope theorem. If the optimized function is denoted $\pi^*(\alpha)$, then

$$\frac{d\pi^*}{d\alpha} = -(Q_1^* + Q_2^* + Q_3^*)^2,$$

where $Q_i^*$ is the optimal value of $Q_i$. The following calculations confirm this. Observe that with the cost function $C = 30 + \alpha Q^2$, the optimal values of the $Q_i$ depend on $\alpha$, as calculated above. Write these functions $Q_1(\alpha)$, $Q_2(\alpha)$, $Q_3(\alpha)$. Then

$$R_1(Q_1(\alpha)) = [90 - 2Q_1(\alpha)]Q_1(\alpha), \qquad R_2(Q_2(\alpha)) = [150 - 3Q_2(\alpha)]Q_2(\alpha), \qquad R_3(Q_3(\alpha)) = [300 - 4Q_3(\alpha)]Q_3(\alpha)$$

and

$$\pi^*(\alpha) = R_1(Q_1(\alpha)) + R_2(Q_2(\alpha)) + R_3(Q_3(\alpha)) - 30 - \alpha(Q_1(\alpha) + Q_2(\alpha) + Q_3(\alpha))^2.$$

Thus,

$$\frac{d\pi^*}{d\alpha} = \sum_{i=1}^{3}\left[\frac{dR_i(Q_i(\alpha))}{dQ_i} - 2\alpha(Q_1(\alpha) + Q_2(\alpha) + Q_3(\alpha))\right]\frac{dQ_i}{d\alpha} - (Q_1(\alpha) + Q_2(\alpha) + Q_3(\alpha))^2.$$

Since, by the first-order conditions,

$$\frac{dR_i(Q_i(\alpha))}{dQ_i} - 2\alpha(Q_1(\alpha) + Q_2(\alpha) + Q_3(\alpha)) = 0, \quad i = 1,2,3,$$

it follows that $\frac{d\pi^*}{d\alpha} = -(Q_1(\alpha) + Q_2(\alpha) + Q_3(\alpha))^2$. Alternatively, recall that the profit function (before optimization) is:

$$\pi = (90 - 2Q_1)Q_1 + (150 - 3Q_2)Q_2 + (300 - 4Q_3)Q_3 - 30 - \alpha(Q_1 + Q_2 + Q_3)^2.$$

Differentiating with respect to the parameter $\alpha$ gives $-(Q_1 + Q_2 + Q_3)^2$, and evaluating this at the solution yields $-(Q_1(\alpha) + Q_2(\alpha) + Q_3(\alpha))^2$. Thus, the derivative of the unoptimized function with respect to the parameter $\alpha$, evaluated at the solution, is the same as the derivative of the optimized function with respect to the parameter – an illustration of the envelope theorem. ♦
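The envelope result $d\pi^*/d\alpha = -(Q_1 + Q_2 + Q_3)^2$ in Example 11.13 can be confirmed with a finite difference (a sketch, not from the text), evaluated at $\alpha = 1$, where the solution is $(2.1, 11.4, 27.3)$.

```python
def outputs(a):
    # Closed-form solution Q_i(alpha) from the example.
    d = 13 * a + 12
    return (7.5 * (36 - 29 * a) / d,
            15 * (20 - a) / d,
            7.5 * (31 * a + 60) / d)

def pi_star(a):
    Q1, Q2, Q3 = outputs(a)
    S = Q1 + Q2 + Q3
    return ((90 - 2 * Q1) * Q1 + (150 - 3 * Q2) * Q2
            + (300 - 4 * Q3) * Q3 - 30 - a * S ** 2)

a, h = 1.0, 1e-6
numeric = (pi_star(a + h) - pi_star(a - h)) / (2 * h)
S = sum(outputs(a))                 # at alpha = 1, S = 2.1 + 11.4 + 27.3 = 40.8
```

The central difference matches $-S^2$ to several decimal places.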
R EMARK 11.6: When f is additively separable, the second order conditions are P simpler. The function f is additively separable if f ( x1 ,... , x n ) = ni=1 f i ( x i ) =
f 1 ( x1 ) + · · · + f n ( x n ). The Hessian has the form: ∂2 f 1
H =
∂2 x12
0
0
∂2 f 2 ∂2 x22
.. .
.. .
0
0
Then f is negative definite if and only if
E XAMPLE 11.14 (P RICE
··· ···
..
.
0 ∂2 f i ∂ x2i
0
0 .. . . 2 n
∂ f ∂2 x2n
< 0, ∀ i .
DISCRIMINATION IN THREE MARKETS ):
Price and
revenue in market 1 are given by P1 = P1 (Q 1 ) and R1 (Q 1 ) = P1 (Q 1 )Q 1 ,
respectively. Similarly, in markets 2 and 3, price and revenue are,
CHAPTER 11. MULTIVARIATE OPTIMIZATION
366
respectively, P2 = P2 (Q 2 ), P3 = P3 (Q 3 ), R2 (Q 2 ) = P2 (Q 2 )Q 2 and R3 (Q 3 ) =
P3 (Q 3 )Q 3 . Production cost is given by c(Q 1 ,Q 2 ,Q 3 ). The profit function is
then: π(Q 1 ,Q 2 ,Q 3 ) = R1 (Q 1 ) + R2 (Q 2 ) + R3 (Q 3 ) − c(Q 1 ,Q 2 ,Q 3 ).
(If the cost function is additively separable, so that c(Q1, Q2, Q3) = γ1(Q1) + γ2(Q2) + γ3(Q3), then the problem of maximizing profit reduces to choosing Q1, Q2 and Q3 independently. In general, if the level of production of one good affects the cost of producing the other goods, then the cost function is not additively separable.) The first-order conditions are:

π1 = R1′(Q1) − c1 = 0
π2 = R2′(Q2) − c2 = 0
π3 = R3′(Q3) − c3 = 0,

where πi = ∂π/∂Qi and ci = ∂c/∂Qi.
The second derivatives are π11 = R1′′(Q1) − c11, π22 = R2′′(Q2) − c22 and π33 = R3′′(Q3) − c33. The cross derivatives are πij = −cij when i ≠ j (where πij = ∂²π/∂Qi∂Qj and cij = ∂²c/∂Qi∂Qj). The first-order conditions give:
MR1(Q1) = R1′(Q1) = P1(Q1) + Q1P1′(Q1) = c1(Q1, Q2, Q3) = MC1(Q1, Q2, Q3). So

P1(Q1)[1 + (Q1/P1)(dP1/dQ1)] = c1

or

P1(Q1)[1 + 1/η1] = c1,   η1 < 0.

Similarly, P2(Q2)[1 + 1/η2] = c2, with η2 < 0, and P3(Q3)[1 + 1/η3] = c3, with η3 < 0. If ηi is a large negative number, then market i is sensitive to price and the price in this market should be set close to marginal cost.
If the cost is linear, c(Q1, Q2, Q3) = γ1Q1 + γ2Q2 + γ3Q3, then the Hessian matrix has a simple form, since cij = 0 for i ≠ j. Take each market to have diminishing marginal revenue: Ri′′ < 0, i = 1, 2, 3. For the second-order conditions, observe that the Hessian matrix is:
H = diag(π11(Q1), π22(Q2), π33(Q3)) = diag(R1′′(Q1), R2′′(Q2), R3′′(Q3)).

Therefore, the second-order conditions are: π11 = R1′′(Q1) < 0, ∀Q1; π11π22 = R1′′(Q1)R2′′(Q2) > 0; and π11π22π33 = R1′′(Q1)R2′′(Q2)R3′′(Q3) < 0.
♦
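With linear demands and a linear (separable) cost, each first-order condition can be solved market by market, and the diagonal Hessian makes the second-order check immediate. A minimal sketch (the demand intercepts, slopes and marginal costs below are hypothetical numbers chosen for illustration):

```python
# Price discrimination with separable linear cost: each market is solved alone.
# Hypothetical demands P_i = a_i - b_i*Q_i and constant marginal costs g_i.
a = [100.0, 120.0, 80.0]
b = [2.0, 3.0, 1.0]
g = [20.0, 20.0, 20.0]

# R_i(Q) = (a_i - b_i*Q)*Q, so MR_i = a_i - 2*b_i*Q = g_i  =>  Q_i = (a_i - g_i)/(2*b_i)
Q = [(a[i] - g[i]) / (2 * b[i]) for i in range(3)]

# Second-order conditions: H = diag(R_i'') = diag(-2*b_i), so the leading
# principal minors alternate in sign (-, +, -), as in the text.
minors = [-2*b[0], (-2*b[0]) * (-2*b[1]), (-2*b[0]) * (-2*b[1]) * (-2*b[2])]
print(Q)        # [20.0, 16.66..., 30.0]
print(minors)   # [-4.0, 24.0, -48.0]
```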
11.4 Concavity, convexity and the Hessian matrix

The next theorem relates concavity and convexity to the “definiteness” of the Hessian.

Theorem 11.5:
1. If f is strictly concave, then f is concave; and concavity of f is equivalent to negative semidefiniteness of the Hessian matrix H(x), for all x.
2. If the Hessian matrix H(x) is negative definite for all x, then f is strictly concave.
3. If f is strictly convex, then f is convex; and convexity of f implies that the Hessian matrix H(x) is positive semidefinite for all x.
4. If the Hessian matrix H(x) is positive definite for all x, then f is strictly convex.

Proof: The following discussion proves 1. From the definitions, strict concavity implies concavity. The remaining calculations show that concavity is equivalent to negative semidefiniteness. Recall that a function, f, is concave if given
x and y, f (θ x + (1 − θ) y) ≥ θ f ( x) + (1 − θ) f ( y), θ ∈ [0, 1]. Concavity is determined
by comparing any two points with a third point on the line defined by those two points. Given any two points x, y the line defined by these two points is
L_xy = {w : w = (1 − θ)x + θy, some θ ∈ R}. Let f_Lxy be the function f defined on L_xy by: f_Lxy(θ) = f(θx + (1 − θ)y), ∀θ. Observe that for every line L_xy, f_Lxy is
concave in θ if and only if f is concave.
f_Lxy(λθ + (1 − λ)θ′) = f((λθ + (1 − λ)θ′)x + [1 − (λθ + (1 − λ)θ′)]y)
  = f(λθx + (1 − λ)θ′x + [(λ + (1 − λ)) − (λθ + (1 − λ)θ′)]y)
  = f(λθx + (1 − λ)θ′x + λ(1 − θ)y + (1 − λ)(1 − θ′)y)
  = f(λ[θx + (1 − θ)y] + (1 − λ)[θ′x + (1 − θ′)y])
  ≥ λf(θx + (1 − θ)y) + (1 − λ)f(θ′x + (1 − θ′)y) = λf_Lxy(θ) + (1 − λ)f_Lxy(θ′),

where the inequality holds if and only if f is concave. Thus, f_Lxy(θ) = f(θx + (1 − θ)y) = f(y + θ(x − y)) is concave for any x, y. Let z = x − y, so f_Lxy(θ) = f(y + θz). By assumption, f_Lxy is concave, so that f_Lxy,θθ ≤ 0. Now,

f_Lxy,θ = Σ_i f_i(y + θz)z_i

and

f_Lxy,θθ(θ) = Σ_j Σ_i f_ji(y + θz)z_j z_i = z′H(y + θz)z.

So, concavity of f_Lxy in θ for every L_xy is equivalent to z′H(y + θz)z ≤ 0 for all y, z and θ.
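The identity between the second derivative of f along a line and z′Hz can be illustrated numerically. A sketch, assuming the concave quadratic f(x1, x2) = −x1² − x2² + x1x2 (Hessian [[−2, 1], [1, −2]], which is negative definite):

```python
# f(x1, x2) = -x1**2 - x2**2 + x1*x2 is concave.  Restricted to any line
# y + theta*z, its second derivative in theta equals z'Hz, which is <= 0.
def f(x):
    return -x[0]**2 - x[1]**2 + x[0]*x[1]

def f_line(theta, y, z):
    return f([y[0] + theta*z[0], y[1] + theta*z[1]])

y, z, h = [0.3, -1.2], [2.0, 0.7], 1e-4   # arbitrary base point and direction
exact = -2*z[0]**2 + 2*z[0]*z[1] - 2*z[1]**2   # z'Hz computed by hand
for theta in (-1.0, 0.0, 0.5, 2.0):
    d2 = (f_line(theta + h, y, z) - 2*f_line(theta, y, z)
          + f_line(theta - h, y, z)) / h**2   # finite-difference 2nd derivative
    assert abs(d2 - exact) < 1e-3 and d2 <= 0
print("second derivative along the line =", exact)   # negative, so concave
```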
Remark 11.7: Note that strict concavity of a function f does not imply that H(x) is negative definite for all x; and strict convexity does not imply that H(x) is positive definite for all x. For example, consider f(x) = −(x − 1)⁴, a strictly concave function. Differentiating twice gives f′′(x) = −12(x − 1)². Note that f′′(1) = 0, so that f′′ is only negative semidefinite despite the fact that the function f is strictly concave.
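A quick numerical illustration of the remark (the finite-difference step is an assumption of the sketch):

```python
# f(x) = -(x - 1)**4 is strictly concave, yet f''(1) = 0: negative definiteness
# of the Hessian is sufficient, not necessary, for strict concavity.
def f(x):
    return -(x - 1.0)**4

def fpp(x, h=1e-4):
    # finite-difference second derivative
    return (f(x + h) - 2*f(x) + f(x - h)) / h**2

print(fpp(1.0))   # approximately 0: only negative semidefinite at x = 1
# Strict concavity still holds: any chord lies strictly below the graph.
x, y, t = 0.0, 2.0, 0.5
assert f(t*x + (1 - t)*y) > t*f(x) + (1 - t)*f(y)
```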
Exercises for Chapter 11
Exercise 11.1 The total weekly revenue (in dollars) that a company obtains from selling goods x and y is:

R(x, y) = −(1/4)x² − (3/8)y² − (1/4)xy + 300x + 240y.
The total weekly cost attributed to production is

C(x, y) = 180x + 140y + 5000.

Find the profit-maximizing levels of x and y and check the second-order conditions.

Exercise 11.2 A television relay station will serve three towns, A, B and C, located in the (x, y) plane at coordinates A = (30, 20), B = (−20, 10) and C = (10, −10). Plot the location of the three towns in the (x, y) plane and find the location of a transmitter, chosen to minimize the sum of the squares of the distances of the transmitter from each of the towns. For two points (x1, y1) and (x2, y2), the distance between them is [(x1 − x2)² + (y1 − y2)²]^{1/2}.

Exercise 11.3 Consider the following duopoly model. There are two firms supplying a market where demand is given by p(Q) = a − bQ. Firm i produces qi units of output and so the total level of production is Q = q1 + q2. Both firms face the same constant marginal cost, so the cost of producing qi for firm i is cqi. Thus the profit functions of firms 1 and 2, respectively, are given by:

π1(q1, q2) = p(q1 + q2)q1 − cq1 = (a − b[q1 + q2])q1 − cq1
π2(q1, q2) = p(q1 + q2)q2 − cq2 = (a − b[q1 + q2])q2 − cq2.
(a) Suppose that each firm takes the output of the other firm as fixed, and calculates its own best choice of quantity. In this case, firm i solves ∂πi/∂qi = 0, taking as given the value of qj. Calculate for the linear demand case the equations:

(∂π1/∂q1, ∂π2/∂q2) = (0, 0).

Solve for q1 and q2 and denote the solutions q1d and q2d, where d denotes duopoly. Compare the value of Qd = q1d + q2d with the monopoly level of output.

(b) Suppose now that we have n firms instead of two. This case is called oligopoly. Let the output of the ith firm be qio, where o denotes oligopoly, and see if you can solve for Qo = q1o + q2o + ··· + qno.

(c) Suppose now that a government tax per unit of output is imposed on each firm, with tax ti per unit on firm i. Then the profit functions become:

π1(q1, q2; t1) = p(q1 + q2)q1 − cq1 − t1q1 = (a − b[q1 + q2])q1 − cq1 − t1q1
π2(q1, q2; t2) = p(q1 + q2)q2 − cq2 − t2q2 = (a − b[q1 + q2])q2 − cq2 − t2q2.

Repeat the exercise in (a), and solve for q1 and q2. In this case, q1 and q2 will depend on t1 and t2 (and the other parameters a, b, c), so we can write q1(t1, t2) and q2(t1, t2). The total tax revenue of the government is T(t1, t2) = t1q1(t1, t2) + t2q2(t1, t2). Maximize T with respect to t1 and t2, and check that the second-order conditions for a maximum are satisfied.

Exercise 11.4 Consider a firm that sells its output, y, at a fixed price p. In the production process, the firm creates pollution. To control pollution, the firm can engage in pollution abatement, a. The technology is as follows: the cost of producing y units of output when operating at abatement level a is given by c(y, a), and the level of pollution emission is e(y, a). We assume that for the cost function,
cy > 0, cyy > 0, ca > 0 and caa > 0. Similarly, for the emission process, e, we assume that ey > 0, eyy > 0, ea < 0 and eaa > 0. In addition, we assume that cay (= cya) and eay (= eya) are both positive. Finally, the government imposes a (unit) tax, τ, on the level of pollution emission. The profit function for the firm is then given by:

π(y, a) = py − c(y, a) − τe(y, a).

(a) Find the first-order conditions for profit maximization with respect to choice of (y, a).

(b) Determine the second-order conditions.
(c) In the case where c(y, a) = y²a + γay and e(y, a) = y − a² + βay, with γ, β parameters, find the profit-maximizing levels of y and a. Check the second-order conditions. From the solution to the first-order conditions, calculate ∂y/∂τ and ∂a/∂τ.

Exercise 11.5 A firm’s costs of plant operation are a function of two variables, x and y. The cost function is given by:

c(x, y) = 3x² − 2xy + y² + 4x + 3y.

Give the solution to the first-order conditions and confirm that the solution minimizes the cost function.

Exercise 11.6 A box with edge lengths x, y and z has volume V = xyz. The surface area of a box with these dimensions (open at the top) is: S = xy + 2yz + 2xz. (The base of the box has length x and width y. Since there is no top, xy occurs only once in the definition of S.) Suppose you have to choose a box size with volume 32; what choice of x, y and z is optimal to minimize surface area? Note that, in this question, there are only two choice variables once the volume is fixed. The variable z can be substituted out: z = 32/(xy). Find the optimal choices of the variables to minimize surface area S, and check that the second-order conditions for a minimum are satisfied.

Exercise 11.7 Consider a firm that sells its output, y, at a fixed price p. In the production process, the firm creates pollution. To control pollution, the firm can engage in pollution abatement, a. The technology is as follows: the cost of producing y units of output when operating at abatement level a is given by c(y, a), and the level of pollution emission is e(y, a). Finally, the government imposes a (unit) tax, τ, on the level of pollution emission. The profit function for the firm is then given by:

π(y, a) = py − c(y, a) − τe(y, a).
Give the first- and second-order conditions for a maximum. Find the profit-maximizing levels of output and abatement when the cost function is given by c(y, a) = y² + (1/2)a², and e(y, a) = y(10 − a). Check that the second-order conditions are satisfied. We assume that the tax rate is less than 1 (0 < τ < 1), and p > 10.
Exercise 11.8 Find the maximizing values of (x, y) for the following function, and confirm that the second-order conditions are satisfied:

f(x, y) = √x + ln y − x² − 3y.
Exercise 11.9 Find the maximizing values of (x, y) for the following function, and confirm that the second-order conditions are satisfied:

f(x, y) = a ln x + b ln y − (1/2)c(x² + y²).

Exercise 11.10 Find the maximizing values of (x, y) for the following function, and confirm that the second-order conditions are satisfied:

f(x, y) = ln(ax + by) − (1/2)c(x² + y²).

Exercise 11.11 Consider the problem of maximizing a function, f(x, y; a), by
choice of x and y, where a is some parameter. Suppose that the solution values for
x and y are x(a) and y(a), respectively. The optimized value of the function is then f∗(a) = f(x(a), y(a); a).

(a) Explain why df∗(a)/da = ∂f(x(a), y(a); a)/∂a.

(b) Illustrate this in the one-variable case, where f(x; a) = a ln(x) − (1/a)x.

Exercise 11.12 Consider the following problem. Minimize f(x, y) = (x − a)² + (y − b)² − (1/2)cxy.

(a) Find the minimizing values for x and y. (Assume that c < 1.)

(b) Confirm that the second-order conditions are satisfied. Let f∗(a, b, c) be the value of the function at the solution.

(c) Find ∂f∗(a, b, c)/∂a and ∂f∗(a, b, c)/∂c.
Exercise 11.13 A firm maximizes profit, where profit is equal to revenue minus cost. Output (Q ) depends on two inputs, K and L, related according to the production function Q = f (K, L). Output sells at price p and the costs of the inputs K and
L are r and w, respectively. So the profit maximization problem is: max π(K, L) = p f (K, L) − rK − wL.
Give the first- and second-order conditions for profit maximization. In the particular case where f(K, L) = K^α L^β with α > 0, β > 0 and 0 < α + β < 1, verify that the second-order conditions are satisfied and solve for the optimal choice of (K, L).

Exercise 11.14
A standard econometric problem is determining the ordinary
least squares estimator (OLS). In the underlying model, the i th observation on one variable y is denoted yi and the corresponding observation on another variable x is denoted x i . It is postulated that y and x are linearly related with some
error, yi = β0 + β1xi + εi. If a and b are chosen as the intercept and slope, respectively, then the estimated error associated with the ith observation is ei = yi − a − bxi. The OLS estimator is obtained by minimizing the sum of squares of the estimated errors, Σ_{i=1}^n e_i², by choice of a and b. Thus, the OLS estimator is obtained by solving the problem:

min_{a,b} S = Σ_{i=1}^n e_i² = min_{a,b} Σ_{i=1}^n (yi − a − bxi)².
Find the OLS estimator and confirm the second-order conditions for a minimum.

Exercise 11.15 A monopolist can sell a product in three different countries. The demand in each country is:

Q1 = 50 − (1/2)P1
Q2 = 50 − (1/3)P2
Q3 = 75 − (1/4)P3.

The cost function for the monopolist is C(Q) = 30 + Q², where Q = Q1 + Q2 + Q3. Calculate the amount of the product that the profit-maximizing monopolist should sell in each country and check the second-order conditions for a maximum.

Exercise 11.16 For the following functions, calculate the Hessians and determine if they are positive or negative definite:

(a) f(x, y) = −2x² − 7y² + xy + 12x + 37y
(b) f(x, y) = 10x² + 5y² + 3x − 12y + 9xy
(c) f(x, y, z) = 3x² + 12y² + 5z² − 13x + 12y + 2xz − 3yz
(d) f(x, y, z) = 2x + 3y − 5xz + 2xy − 6x² − 8y² − 11z².
Exercise 11.17 For the following function, find all extreme points and check if they are maxima or minima:

f(x, y) = 3y³ + 2x² + 4xy + 8y² + 2y.
Exercise 11.18 Given the three-input Cobb–Douglas production function,

Y = AL^α K^β M^γ,

where L is the amount of labour, K the amount of capital and M the amount of other materials used, prove that the second-order conditions for profit maximization hold if and only if α + β + γ < 1 (α, β, γ > 0). The good sells for a price of a dollar per unit. The costs of labour, capital and materials are, respectively, w, r and p. Find the Hessian matrix and confirm that it is negative definite.

Exercise 11.19 The geometric average of three (positive) numbers x, y and z is given by (xyz)^{1/3}. Suppose that x + y + z = k. Show that the geometric average is maximized when x = y = z.

Exercise 11.20
A firm uses two inputs x, y to produce Q according to Q = A(x^α + y^β), with 0 < α, β < 1. The production cost is given by c(x, y), so that profit is π(x, y) = pA(x^α + y^β) − c(x, y), where p is the price of Q. Give the first- and second-order conditions for a maximum. In the case where c(x, y) = ax + by + dxy, a, b, d > 0, find the maximizing values of x, y and show that the conditions for a maximum are satisfied.
Exercise 11.21 For the function f(x, y) = 10x² + 10y² + 3x − 12y + 10xy, show that the Hessian is positive definite.

Exercise 11.22 For the function f(x, y, z) = 2x + 3y − 5xz + 2xy − 6x² − 9y² − 12z², show that the Hessian is negative definite.
12 Equality-constrained optimization
12.1 Introduction

Optimization of a function f(x1, ..., xn) involves choosing the xi variables to make the function f as large or as small as possible. With constrained optimization, permissible choices for the xi are restricted or constrained. In what follows, such constraints are represented by a function, g, whereby the xi variables must satisfy a condition of the form g(x1, ..., xn) = c. For example, in the standard utility maximization problem, the task is to maximize utility u(x1, ..., xn) subject to a budget constraint p1x1 + ··· + pnxn = c, where pi is the price of good i and c is the income available.
Constraints limit the choices that are feasible. In the case of two goods, the budget constraint is p1x1 + p2x2 = c, so that the feasible pairs lie on a line. In this case, there is one constraint with two choice variables. The choice of one variable determines the choice of the other: given x1, x2 is determined as x2 = c/p2 − (p1/p2)x1. Suppose that in addition to the budget constraint there is a second constraint – that the two goods are complements that must be consumed in a set ratio: x2/x1 = r, or x2 = rx1. Given this, from p1x1 + p2x2 = c one has p1x1 + p2rx1 = c, or (p1 + p2r)x1 = c, so x1 = c/(p1 + p2r) and x2 = cr/(p1 + p2r). Thus, there is a unique pair satisfying the two constraints. This observation is generally true; if the number of constraints equals the number of choice variables, there are generally at most a finite number of solutions to the set of constraints. For this reason, the number of constraints is generally smaller than the number of choice variables. The next section provides a motivating example showing how the Lagrangian technique relates to the familiar first-order condition (marginal rate of substitution equals the price ratio) for an optimum in the consumer choice problem.
Section 12.2 considers the case of two variables and one constraint. The case of n variables and one constraint is discussed in Section 12.4. It turns out that the Lagrange multiplier appearing in the definition of the Lagrangian has a significant economic interpretation: it provides a measure of the value of the constraint. Specifically, it measures the marginal change in the value of the objective resulting from a small change in the constraint. This is examined in Section 12.5. In the case of unconstrained optimization, identification of sufficient conditions for a global optimum is simplified when the objective has a specific shape, particularly concavity or convexity. Similarly, with constrained optimization, under suitable conditions, discussed in Section 12.6, the local conditions for an optimum are sufficient for the solution to be a global optimum.
12.1.1 A motivating example
The optimization techniques described next are based on the use of Lagrangian functions. The discussion here shows how that formulation may be related to optimization ideas introduced earlier. Consider the problem of maximizing u(y, l) where y is income and l is leisure. Work is time not used for leisure, 24 − l, and
income is given by the wage w times the amount of time spent working, so income and leisure are connected according to the equation or constraint y = w(24 − l). Assume that the marginal benefit of both income and leisure is positive: u_y > 0 and u_l > 0. Maximizing u(y, l) subject to the constraint y = w(24 − l) is an equality-constrained optimization problem. Preferences and the leisure–income constraint are depicted in Figure 12.1. Note that the condition y = w(24 − l) implies that there is only one variable to be chosen – once l is chosen, the value of y is determined. Using this fact, y can be eliminated from the problem – the constraint can be substituted into the objective to give u∗(l) = u(w(24 − l), l), and the function u∗ can be maximized with respect to l as an unconstrained problem. The function u∗ is depicted in Figure 12.2. Differentiating and setting to 0:

u∗′ = 0 ⇒ −u_y w + u_l = 0 ⇒ u_l/u_y = w.
For example, if u(y, l) = ln y + ln l, then u∗(l) = ln(w(24 − l)) + ln(l). Differentiating u∗(l) and setting the derivative to 0:

u∗′(l) = −w/(w(24 − l)) + 1/l = 0 ⇒ −l + (24 − l) = 0 = 24 − 2l.

This has solution l = 12, y = w(24 − 12) = 12w. The second-order condition for a maximum is easy to check: u∗′′(l) = −(1/(24 − l))² − (1/l)² < 0.

[Figure 12.1: Constrained optimal choice – indifference curves and the income–leisure constraint y = w(24 − l)]

[Figure 12.2: Optimal leisure choice – the substituted objective u∗(·)]

Thus, using the constraint to solve for one variable in terms of the other leads to an unconstrained problem in one variable that can be solved as the unconstrained maximization of a function of one variable. With just two choice variables, this approach is often satisfactory, but becomes less manageable as the number of choice variables increases. An alternative approach to constrained optimization is to define a function called a Lagrangian. The Lagrangian is denoted L. It is the original function plus the constraint multiplied by a “Lagrange multiplier”. The arguments of the function L are the choice variables and a Lagrange multiplier, λ. In the present
example, the Lagrangian is defined:

L(y, l, λ) = u(y, l) + λ(y − w(24 − l)).

This is now viewed as the objective and maximized with respect to all variables, including the Lagrange multiplier:

∂L/∂y = u_y + λ = 0   (12.1)
∂L/∂l = u_l + λw = 0   (12.2)
∂L/∂λ = y − w(24 − l) = 0.   (12.3)

The first two conditions give u_l/u_y = w, and this condition imposes the tangency of the indifference curve to the income–leisure constraint in Figure 12.1. The third equation gives y − w(24 − l) = 0. Solving these determines the optimal values for l and y.
To illustrate, consider the case where u(y, l) = ln y + ln l, so that u_y = 1/y, u_l = 1/l, and so w = u_l/u_y = (1/l)/(1/y) = y/l ⇒ wl = y. Since y = w(24 − l) = wl, this gives wl = 24w − lw, or 24w = 2lw. Therefore l = 12 and y = wl = 12w. This is the same solution as was obtained by substituting the constraint into the objective function. From Equation 12.2, u_l + λw = 0. This equation determines λ: λ = −u_l/w = −1/(wl) = −1/(12w). Thus, the solution is the same as that obtained by substitution, using the constraint to solve for one variable in terms of the other and substituting that into the function to be maximized. In this example, there were just two variables and y was given as an explicit function of l. More generally, with many variables, more complicated constraint functions and possibly multiple constraints, solution by substitution becomes infeasible and use of Lagrangians is the standard approach. Before discussing the general case, the two-variable case is considered in more detail.
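The substitution solution above can be confirmed with a direct grid search over the substituted objective u∗(l) (a sketch; the wage w = 2 is an illustrative assumption):

```python
import math

# Maximize u(y, l) = ln y + ln l subject to y = w(24 - l): the text finds
# l = 12, y = 12w.  Grid-search the substituted objective (w = 2 assumed).
w = 2.0

def u_star(l):
    return math.log(w * (24 - l)) + math.log(l)

# grid over (0, 24), avoiding the endpoints where the logs blow up
grid = [0.01 + 23.98 * i / 100000 for i in range(100001)]
l_best = max(grid, key=u_star)
print(l_best, w * (24 - l_best))   # close to l = 12 and y = 12w = 24
```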
12.2 The two-variable case

The following discussion considers in general terms the case where there are two choice variables and one equality constraint. Let the objective function be f(x, y) and let the constraint be c = g(x, y). Throughout, assume that (x, y) ∈ Z, where Z is an open set in R². Thus, if (x̄, ȳ) is in Z, all points (x′, y′) sufficiently close to (x̄, ȳ) are in Z. Furthermore, f and g are assumed to be continuously differentiable
(continuous with continuous derivatives). The task is to maximize or minimize f(x, y) subject to (x, y) satisfying the constraint c = g(x, y).

Definition 12.1: A point (x∗, y∗) is a constrained local maximum if g(x∗, y∗) = c and f(x∗, y∗) ≥ f(x, y) for all (x, y) in a neighborhood of (x∗, y∗) satisfying g(x, y) = c.

A constrained local minimum is similarly defined. To identify necessary and sufficient conditions for (x∗, y∗) to be a constrained local maximum or minimum, it is useful to formulate the Lagrangian, L:
(12.4)
In addition, define the bordered Hessian, H¯ :
0
H¯ = gx
gy
gx
gy
f xx − λ g xx
f xy − λgxy
f xy − λgxy .
(12.5)
f yy − λ g yy
Sometimes, the bordered Hessian is written with a negative sign before the partial derivatives of the constraint (− g x and − g y ). This is equivalent to multiplying the ¯ first row and column by −1, a transformation that leaves the determinant of H unchanged. With this notation:
Theorem 12.1: Consider the problem of maximizing or minimizing the function f(x, y) subject to a constraint g(x, y) = c. Let L(x, y, λ) = f(x, y) + λ[c − g(x, y)]. Suppose (x∗, y∗, λ∗) satisfies:

L_x = f_x(x∗, y∗) − λ∗g_x(x∗, y∗) = 0   (12.6)
L_y = f_y(x∗, y∗) − λ∗g_y(x∗, y∗) = 0   (12.7)
L_λ = c − g(x∗, y∗) = 0.   (12.8)

Then,
• if |H̄| > 0, (x∗, y∗) is a constrained local maximum;
• if |H̄| < 0, (x∗, y∗) is a constrained local minimum.

Assuming the derivatives are non-zero, conditions 12.6 and 12.7 imply that

f_x(x∗, y∗)/f_y(x∗, y∗) = λ∗g_x(x∗, y∗)/(λ∗g_y(x∗, y∗)) = g_x(x∗, y∗)/g_y(x∗, y∗).
This, along with the constraint, c − g(x∗, y∗) = 0, yields two equations in the two unknowns, (x∗, y∗):

f_x(x∗, y∗)/f_y(x∗, y∗) = g_x(x∗, y∗)/g_y(x∗, y∗)   (12.9)
g(x∗, y∗) = c.   (12.10)

These may be solved for (x∗, y∗), and λ∗ solved residually from Equations 12.6 or 12.7.

Example 12.1: A cylinder has circumference of length 2πr, where r is the radius. The area of a circle of radius r is πr². Thus, the surface area of a cylinder of radius r and height h is a(r, h) = 2πr² + 2πrh (top and bottom + wall). The cylinder has a volume of v(r, h) = πr²h.
Consider the problem of maximizing volume subject to total surface area being A. This leads to the Lagrangian:

L(r, h, λ) = v(r, h) + λ(A − a(r, h)) = πr²h + λ(A − 2πr² − 2πrh):

∂L/∂r = 2πrh − λ4πr − λ2πh = 0   (1)
∂L/∂h = πr² − λ2πr = 0   (2)
∂L/∂λ = A − 2πr² − 2πrh = 0.   (3)

Condition (2), ∂L/∂h = 0, gives λ = (1/2)r. Substituting into (1): 2πrh − 2πr² − πrh = 0, or πr(2h − 2r − h) = 0, or h = 2r. Substituting into (3): A − 2πr² − 4πr² = 0, which gives A = 6πr², or:

r² = A/(6π),  r = √(A/(6π)),  h = 2r = 2√(A/(6π)) = √(4A/(6π)).

Thus

L∗ = πr²h = π × (A/(6π)) × √(4A/(6π)) = √(4A³/(36 × 6π)) = √(A³/(54π)).
For example, if A = 24π then r = 2, h = 4, λ = 1, L∗ = 16π. To check the second-order condition (with g(r, h) = a(r, h), the constraint function being the area):

       | 0     g_r    g_h  |   | 0            4πr + 2πh    2πr        |
|H̄| =  | g_r   L_rr   L_rh | = | 4πr + 2πh    2πh − λ4π    2πr − λ2π  |
       | g_h   L_hr   L_hh |   | 2πr          2πr − λ2π    0          |

|H̄| = 32π³r³ − 16π³r²λ + 8π³r²h − 16π³rhλ.

With λ = (1/2)r, this reduces to:

|H̄| = 24π³r³.

This is sufficient for a (local) maximum.

[Figure: the area constraint A = a(r, h) and iso-volume curves V = v(r, h) in the (r, h) plane]

♦
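The bordered Hessian calculation can be verified numerically at the solution for A = 24π:

```python
import math

# Example 12.1 with A = 24*pi: solution r = 2, h = 4, lambda = 1 = r/2.
A = 24 * math.pi
r, h, lam = 2.0, 4.0, 1.0
assert abs(2*math.pi*r**2 + 2*math.pi*r*h - A) < 1e-9   # constraint holds

# Entries of the bordered Hessian (g is the surface-area constraint):
g_r = 4*math.pi*r + 2*math.pi*h
g_h = 2*math.pi*r
L_rr = 2*math.pi*h - 4*math.pi*lam
L_rh = 2*math.pi*r - 2*math.pi*lam
L_hh = 0.0

# det of [[0, g_r, g_h], [g_r, L_rr, L_rh], [g_h, L_rh, L_hh]], expanded
# along the first row:
det = -g_r*(g_r*L_hh - L_rh*g_h) + g_h*(g_r*L_rh - L_rr*g_h)
print(det, 24 * math.pi**3 * r**3)   # both 24*pi^3*r^3 > 0: a local maximum
```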
Remark 12.1: In fact, the problem in Example 12.1 is easily solved by substitution. For fixed surface area A, radius and height are connected by the formula A = 2πrh + 2πr², so that

h(r) = (A − 2πr²)/(2πr) = A/(2πr) − r.

Volume is given as a function of r and h by v(r, h) = πr²h. Substituting for h gives

v∗(r) = v(r, h(r)) = πr²h(r) = πr²[A/(2πr) − r] = (A/2)r − πr³.

From this, v∗′(r) = A/2 − 3πr² and v∗′′(r) = −6πr, so the function v∗ is concave, and solving for r in v∗′(r) = A/2 − 3πr² = 0 gives r² = A/(6π) and r = √(A/(6π)).
Example 12.2: A production technology gives combinations of x and y that yield a fixed level of output Q. The technology is described by the equation y = Q + e^{2x}. The cost of using these inputs is given by the equation C = y − 3x (so that raising x reduces cost!). The task is to minimize cost subject to the production constraint.

[Figure: the constraint y = Q + e^{2x} and an iso-cost line y = c + 3x in the (x, y) plane]

The Lagrangian for the problem is: L = y − 3x + λ[y − Q − e^{2x}]. Therefore,

L_x = −3 − λ2e^{2x} = 0
L_y = 1 + λ = 0
L_λ = y − Q − e^{2x} = 0.

From the second equation, λ = −1, and substituting into the first equation gives 3 = 2e^{2x}. Solving for x, x = (1/2)ln(3/2). Since y = Q + e^{2x}, we get y = Q + e^{2·(1/2)ln(3/2)} = Q + e^{ln(3/2)} = Q + 3/2.

For the second-order condition, L_xx = −λ4e^{2x}, L_yy = 0, L_xy = 0. The constraint is 0 = g(x, y), where g(x, y) = −y + Q + e^{2x}, so that g_x = 2e^{2x} and g_y = −1. The bordered Hessian is

      | 0         2e^{2x}      −1 |
H̄ =  | 2e^{2x}   −λ4e^{2x}    0  |  ⇒  |H̄| = λ4e^{2x} = −4e^{2x} < 0.
      | −1        0            0  |

Thus, the solution to the first-order conditions gives a local minimum. ♦
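The solution can be checked by substituting the constraint and scanning the cost along it (a sketch; Q = 5 is an illustrative assumption, and Q only shifts the cost level, not the minimizer):

```python
import math

# Example 12.2: on the constraint y = Q + e^(2x), cost is C(x) = Q + e^(2x) - 3x.
# The text's solution is x* = (1/2)ln(3/2) ~ 0.2027.
Q = 5.0
x_star = 0.5 * math.log(1.5)

def cost(x):
    return Q + math.exp(2 * x) - 3 * x

grid = [-2.0 + 4.0 * i / 100000 for i in range(100001)]
x_best = min(grid, key=cost)
print(x_best, x_star)   # the grid minimizer is close to (1/2)ln(3/2)
```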
Example 12.3: In this example, the task is to find a pair (x, y) that minimizes the radius of a circle, subject to (x, y) being on a line: min f(x, y) = x² + (y − 1)², subject to α = y − kx, 0 < α < 1, 0 < k. The Lagrangian is

L = x² + (y − 1)² + λ(α − y + kx).

This leads to first-order conditions:

L_x = 2x + λk = 0
L_y = 2(y − 1) − λ = 0
L_λ = α − y + kx = 0.

From these, y = α + kx, y − 1 = (α − 1) + kx, so 2[(α − 1) + kx] − λ = 0. Then 2x + 2[(α − 1) + kx]k = 0, or (1 + k²)x = (1 − α)k, so x = (1 − α)k/(1 + k²). Solving for y: y = α + (1 − α)k²/(1 + k²) = (α + k²)/(1 + k²), and λ = −(2/k)x = −2(1 − α)/(1 + k²). Next, L_xx = 2, L_yy = 2, L_xy = 0, and g_x = −k, g_y = 1. The bordered Hessian is

      | 0    −k   1 |
H̄ =  | −k   2    0 |
      | 1    0    2 |

and |H̄| = k(−2k) + 1(−2) = −2(1 + k²) < 0. A graph of the problem is given in the figure.

[Figure: the circle centred at (0, 1) tangent to the line y = α + kx]

♦
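Geometrically, the solution is the foot of the perpendicular from (0, 1) to the line, which gives an independent check of the closed form (α = 0.5 and k = 2 are illustrative assumptions):

```python
# Example 12.3: the solution x* = (1-alpha)k/(1+k^2), y* = (alpha+k^2)/(1+k^2)
# is the point of the line y = alpha + k*x nearest (0, 1), so the vector from
# (0, 1) to (x*, y*) must be perpendicular to the line's direction (1, k).
alpha, k = 0.5, 2.0      # illustrative parameter values
x_star = (1 - alpha) * k / (1 + k**2)
y_star = (alpha + k**2) / (1 + k**2)

assert abs(y_star - (alpha + k * x_star)) < 1e-12   # (x*, y*) lies on the line
dot = x_star * 1 + (y_star - 1) * k                 # orthogonality check
print(x_star, y_star, dot)                          # 0.2, 0.9, 0 (up to rounding)
```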
Remark 12.2: Theorem 12.1 gives sufficient conditions for a local maximum. But a local (or global) maximum may fail to satisfy these conditions. Suppose that f(x, y) = xy and g(x, y) = (x + y − 1)². Consider the problem of maximizing f subject to the constraint g(x, y) = 0. The constraint is satisfied if and only if x + y − 1 = 0, or x + y = 1, or y = 1 − x. Substituting y = 1 − x into the objective and maximizing f̂(x) = x(1 − x) gives f̂′(x) = 1 − 2x and f̂′′(x) = −2, so that the local and global maxima are found at x∗ = y∗ = 1/2. But observe that g_x(x, y) = g_y(x, y) = 2(x + y − 1), so that at the solution (x∗, y∗) = (1/2, 1/2), g_x(x∗, y∗) = g_y(x∗, y∗) = 0. However, (f_x, f_y) = (y, x), so that (f_x(x∗, y∗), f_y(x∗, y∗)) = (1/2, 1/2). So, for example, in Theorem 12.1, the condition f_x(x∗, y∗) = λ∗g_x(x∗, y∗) fails, although (x∗, y∗) is the global optimum. Because (g_x(x∗, y∗), g_y(x∗, y∗)) = (0, 0), it is impossible to write f_x(x∗, y∗) = λ∗g_x(x∗, y∗). The issue here is that at the solution, (g_x(x∗, y∗), g_y(x∗, y∗)) = (0, 0). Theorem 12.2 confirms this observation.
Theorem 12.2: Let f and g be continuously differentiable functions on an open set Z in R². Suppose that (x∗, y∗) is a constrained local maximum or minimum of f, subject to the constraint g(x, y) = c. Suppose that (g_x(x∗, y∗), g_y(x∗, y∗)) ≠ (0, 0). Then there is a λ∗ such that

(f_x(x∗, y∗), f_y(x∗, y∗)) = λ∗(g_x(x∗, y∗), g_y(x∗, y∗)).

Therefore, when (g_x(x∗, y∗), g_y(x∗, y∗)) ≠ (0, 0), the conditions f_x = λg_x, f_y = λg_y and g = c give three equations in three unknowns, (x, y, λ), which may be solved to identify a candidate solution to the optimization problem.
The following discussion motivates the first-order conditions and derives the bordered Hessian second-order condition using familiar arguments for the unconstrained optimization problem. Then the problem is described from a geometric perspective. Following that, the general result for the n-variable case is given, in Theorem 12.4.
12.3 Motivation for first- and second-order conditions

Observe that the constraint g(x, y) = c may be used to solve for y as a function of x. Assume that either ∂g/∂x ≠ 0 or ∂g/∂y ≠ 0, so that we can write one variable as a function of the other. Recall the discussion on implicit functions. Let F(x, y) = 0 = g(x, y) − c and assume that F has continuous partial derivatives. If ∂F/∂y ≠ 0, then y can be written as a function, h, of x, while if ∂F/∂x ≠ 0, then x can be written as a function, φ, of y. Since ∂F/∂y = ∂g/∂y, if ∂g/∂y ≠ 0, then there is a function with y = h(x). Similarly, since ∂F/∂x = ∂g/∂x, if ∂g/∂x ≠ 0, then there is a function φ with x = φ(y). (These are local
properties.) For example, if g(x, y) = xy = c, then y = c/x = h(x); if g(x, y) = ln(xy) = c, then xy = e^c, and y = (1/x)e^c = h(x); if g(x, y) = px + qy = c, then y = c/q − (p/q)x = h(x). Note that the presumption here is that g_y ≠ 0. (In particular, this must hold at the candidate solution to the optimization problem, so that the issue raised in Remark 12.2 is assumed not to arise.)
12.3.1 The first-order condition
Assume that we can write y as a function of x: y = h(x). Then, by substituting for y, the problem reduces to max_x f(x, h(x)) (or max_y f(ϕ(y), y) if x is substituted out). The original problem, with two choice variables and a constraint, has been reduced to a one-variable unconstrained problem – maximizing the function f̂(x) = f(x, h(x)). With just two variables, this procedure is, in principle, straightforward. With more than two variables, the approach is more complex and in general it is more convenient to use Lagrangians. The following calculations show that solving max_x f(x, h(x)) and then determining y from x (y = h(x)) gives the same solution as the Lagrangian method, solving max f(x, y) subject to g(x, y) = c. Specifically, solving max_x f(x, h(x)) leads to the same first-order conditions for (x, y) as obtained earlier, and the second-order condition is shown to be equivalent to the bordered Hessian condition. Maximizing f(x, h(x)) with respect to x, the first-order condition is
0 = df̂/dx = f_x + f_y h′, where h′ = dy/dx = −g_x/g_y.

Thus,

0 = f_x + f_y(−g_x/g_y), so that f_x/g_x = f_y/g_y.
This condition and c = g(x, y) give the same conditions as obtained from the Lagrangian first-order conditions.
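As a quick check of this equivalence, the sketch below works through a toy example of my own (not from the text): f(x, y) = xy subject to x + y = 10, so that h(x) = 10 − x. The substitution method and the Lagrangian ratio condition f_x/g_x = f_y/g_y pick out the same point.

```python
# Sketch (hypothetical example): compare the substitution method with the
# Lagrangian first-order condition for f(x, y) = x*y subject to x + y = 10.

def f(x, y):
    return x * y

# Substitution: maximize fhat(x) = f(x, h(x)) = x*(10 - x); fhat'(x) = 10 - 2x = 0.
x_star = 10 / 2.0          # = 5.0
y_star = 10 - x_star       # h(x_star) = 5.0

# Lagrangian condition: f_x / g_x = f_y / g_y at the optimum.
f_x, f_y = y_star, x_star  # partials of x*y
g_x, g_y = 1.0, 1.0        # partials of x + y
assert abs(f_x / g_x - f_y / g_y) < 1e-12

# The common ratio is the multiplier lambda* (= 5 here).
lam = f_x / g_x
```

The same λ∗ would appear as the multiplier in the Lagrangian formulation of this problem.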
12.3.2 The second-order condition
Maximizing the function f̂(x) = f(x, h(x)) gives a first-order condition for an optimum. The second-order condition, f̂″(x) < 0 for a maximum or f̂″(x) > 0 for a minimum, may also be related to the bordered Hessian condition: the discussion to follow shows that f̂″ < 0 or f̂″ > 0 is equivalent to the corresponding bordered Hessian conditions for a local constrained optimum.
CHAPTER 12. EQUALITY-CONSTRAINED OPTIMIZATION

For a maximum we require that d²f̂/dx² < 0, which is

d/dx{df̂/dx} = d/dx{f_x + f_y h′} < 0.

Evaluating this,

d/dx{f_x + f_y h′} = d/dx{f_x(x, h(x)) + f_y(x, h(x)) h′(x)}
= f_xx + f_xy h′ + f_xy h′ + f_yy(h′)² + f_y h″
= f_xx + 2 f_xy h′ + f_yy(h′)² + f_y h″.
Since

h′(x) = −g_x(x, h(x))/g_y(x, h(x)),

differentiating again,

h″(x) = −[g_y(g_xx + g_xy h′) − g_x(g_xy + g_yy h′)]/g_y²
= −[g_xx g_y − g_xy g_x − g_xy g_x + g_yy g_x²/g_y]/g_y²   (using h′ = −g_x/g_y)
= −[g_xx g_y² − 2 g_xy g_x g_y + g_yy g_x²]/g_y³.

Using h′ = −g_x/g_y, h″ as above, and λ = f_y/g_y gives the second derivative as:

d²f̂/dx² = f_xx + 2 f_xy(−g_x/g_y) + f_yy(g_x²/g_y²) + f_y h″
= (1/g_y²){f_xx g_y² − 2 f_xy g_x g_y + f_yy g_x² − (f_y/g_y)(g_xx g_y² − 2 g_xy g_x g_y + g_yy g_x²)}
= (1/g_y²){(f_xx − λg_xx) g_y² − 2(f_xy − λg_xy) g_x g_y + (f_yy − λg_yy) g_x²},

so

(d²f̂/dx²) × g_y² = (f_xx − λg_xx) g_y² − 2(f_xy − λg_xy) g_x g_y + (f_yy − λg_yy) g_x².

Now, consider the determinant:
      | 0     g_x            g_y           |
|H̄| = | g_x   f_xx − λg_xx   f_xy − λg_xy  |
      | g_y   f_xy − λg_xy   f_yy − λg_yy  |
Evaluating the determinant:

|H̄| = −g_x{g_x(f_yy − λg_yy) − g_y(f_xy − λg_xy)} − g_y{−g_x(f_xy − λg_xy) + g_y(f_xx − λg_xx)}
    = −{(f_xx − λg_xx) g_y² − 2(f_xy − λg_xy) g_x g_y + (f_yy − λg_yy) g_x²}
    = −(d²f̂/dx²) × g_y².

Therefore, |H̄| has the opposite sign to d²f̂/dx². For a maximum, the condition d²f̂/dx² < 0 is sufficient and this is equivalent to the condition:

| 0     g_x            g_y           |
| g_x   f_xx − λg_xx   f_xy − λg_xy  |  > 0.
| g_y   f_xy − λg_xy   f_yy − λg_yy  |

Since multiplying a row and a column by −1 does not change either the value or sign of the determinant, the determinant may also be written:

| 0      −g_x           −g_y          |
| −g_x   f_xx − λg_xx   f_xy − λg_xy  |
| −g_y   f_xy − λg_xy   f_yy − λg_yy  |

Conversely, in a minimization problem the determinant should be negative, i.e.,

| 0     g_x            g_y           |
| g_x   f_xx − λg_xx   f_xy − λg_xy  |  < 0.
| g_y   f_xy − λg_xy   f_yy − λg_yy  |

Finally, since L_xx = f_xx − λg_xx, etc., the determinant may be written succinctly:

      | 0     g_x    g_y  |
|H̄| = | g_x   L_xx   L_xy |
      | g_y   L_yx   L_yy |

12.3.3 A geometric perspective
With just one constraint, g, and one objective, f, constrained optimization involves locating a point where the objective is tangent to the constraint and then confirming that this is an optimum. The following discussion provides a different perspective. The slope of the constraint may also be measured in terms of a line perpendicular to the constraint. Figure 12.3 shows the perpendicular, P, to the line L and the curve f(x, y) = c in each panel. In fact, the perpendicular to a curve f(x, y) = c is proportional to the vector of partial derivatives, (f_x(x, y), f_y(x, y)). Furthermore, this vector P points in the direction of greatest local increase in the function and therefore has a direct connection to the optimization problem. To motivate this perspective, first consider the case where the curve is a line such as L in Figure 12.3A. From the figure, the slope of the line is −y/x and the angle between L and the horizontal axis is α. With the vertical dashed line as the baseline, the slope of the angle β is x/y. Because α and β are in two corners of a right-angled triangle, these two angles sum to 90°. Consider the line P, drawn perpendicular to L. Note that the angles α and α∗ are equal, and because the angles α∗ and β∗ sum to 90°, β∗ is the same angle as β, and so has the same slope, x/y. Noting that the slope associated with α times the slope associated with β is (−y/x) × (x/y) = −1, two lines, L and P, are perpendicular if and only if their slopes multiply to −1. Now, consider
[Figure 12.3: Orthogonal lines and normal vectors. Panel A shows the line L, its perpendicular P, and the angles α, β, α∗, β∗; panel B shows the level curve f(x, y) = c through the point (x̄, ȳ) with the normal vector (f_x, f_y).]
Figure 12.3B. Along the level surface f(x, y) = c, totally differentiating and setting to 0 gives: 0 = f_x(x, y)dx + f_y(x, y)dy, so that the slope along the level surface is −f_x/f_y. The slope at (x̄, ȳ), given by the dashed line, is dy/dx = −f_x(x̄, ȳ)/f_y(x̄, ȳ). The line P is chosen perpendicular to the dashed line. Since the slope of P, s_P, multiplied by the slope of the dashed line equals −1, −1 = −[f_x(x̄, ȳ)/f_y(x̄, ȳ)] × s_P, so s_P = f_y(x̄, ȳ)/f_x(x̄, ȳ). Thus, the direction P, starting from (x̄, ȳ), is proportional to (f_x(x̄, ȳ), f_y(x̄, ȳ)): the line perpendicular to the tangent to the level surface is proportional to (f_x(x̄, ȳ), f_y(x̄, ȳ)).
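This orthogonality is easy to spot-check numerically. The sketch below uses a function of my own choosing (f(x, y) = x² + 3y², not from the text) and finite differences to confirm that the gradient is perpendicular to the tangent direction of the level curve.

```python
# Sketch (illustrative f chosen here): verify numerically that the gradient
# (f_x, f_y) is perpendicular to the tangent of the level curve f(x, y) = c.

def f(x, y):
    return x**2 + 3 * y**2

x0, y0 = 1.0, 2.0
eps = 1e-6

# Central-difference gradient at (x0, y0).
fx = (f(x0 + eps, y0) - f(x0 - eps, y0)) / (2 * eps)
fy = (f(x0, y0 + eps) - f(x0, y0 - eps)) / (2 * eps)

# Tangent direction of the level curve: slope dy/dx = -f_x/f_y.
tangent = (1.0, -fx / fy)

# The dot product of gradient and tangent should vanish.
dot = fx * tangent[0] + fy * tangent[1]
assert abs(dot) < 1e-6
```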
To illustrate these ideas in the economic context, consider utility maximization where the utility function is u(x, y) = x^α y^β. The partial derivatives of u with respect to x and y are u_x = αx^{α−1}y^β = αu/x and u_y = βx^α y^{β−1} = βu/y. Thus, ∇u(x, y) = (u_x, u_y) = (αu/x, βu/y). Notice, for example, that at (x₁, y₁), x₁ is small relative to y₁, so that αu/x₁ is large relative to βu/y₁, and so the vector ∇u(x₁, y₁) = (αu/x₁, βu/y₁) has first component large relative to the second. In Figure 12.4, the dashed line
[Figure 12.4: Utility maximization: the indifference curve u = ū, with the gradient vectors ∇u(x₁, y₁) and ∇u(x₂, y₂) drawn from the points (x₁, y₁) and (x₂, y₂).]

starting from (x₁, y₁) is the vector ∇u(x₁, y₁). Similarly, the dashed line starting
from (x₂, y₂) is the vector ∇u(x₂, y₂). Notice that these dashed lines are perpendicular to the level surface of the function u (the set of points {(x, y) | u(x, y) = ū}). Also, notice that for i = 1, 2, starting from (x_i, y_i) and moving in the direction ∇u(x_i, y_i) increases the value of the function. For utility maximization, the constraint has the form g(x, y) = px + qy = c, where p and q are the prices of x and y, respectively. In this case, g_x = p and g_y = q, so that for all (x, y), ∇g(x, y) = (p, q) and the gradient of g is constant, independent of (x, y). This is depicted in Figure 12.5a. In Figure 12.5b, at z₁ = (x₁, y₁) there is no number λ such that ∇u(z₁) = λ∇g(z₁), and the indifference curve cuts the budget constraint. The situation is similar at z₂. Neither z₁ nor z₂ is optimal. Whenever this is the case (∇u(z) ≠ λ∇g(z) for any λ), the slope of the indifference curve and the slope of the budget constraint differ, so that z cannot be optimal. In contrast, at the point z₃, ∇u(z₃) = λ∇g(z₃): ∇u(z₃) is proportional to ∇g(z₃), and λ scales ∇g(z₃) to equal ∇u(z₃). This is the first-order condition.
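The tangency condition can be checked directly at the Cobb-Douglas demands. The sketch below uses the standard closed-form demands x∗ = αc/((α+β)p), y∗ = βc/((α+β)q) with illustrative numbers of my own, and confirms that ∇u is proportional to ∇g = (p, q) there.

```python
# Sketch (hypothetical numbers): for u(x, y) = x**a * y**b with budget
# p*x + q*y = c, the demands are x* = a*c/((a+b)*p), y* = b*c/((a+b)*q).
# Check that grad u(x*, y*) is proportional to grad g = (p, q).

a, b = 0.3, 0.7
p, q, c = 2.0, 5.0, 100.0

x_s = a * c / ((a + b) * p)   # = 15.0
y_s = b * c / ((a + b) * q)   # = 14.0

u = x_s**a * y_s**b
ux = a * u / x_s              # u_x = a*x**(a-1)*y**b = a*u/x
uy = b * u / y_s              # u_y = b*x**a*y**(b-1) = b*u/y

# Proportionality: u_x / p == u_y / q; the common value is lambda.
lam1 = ux / p
lam2 = uy / q
assert abs(lam1 - lam2) < 1e-12
```

At any non-optimal affordable bundle, the two ratios above would differ, which is exactly the "indifference curve cuts the budget line" case in the text.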
[Figure 12.5: Utility maximization: optimization. Panel (a) shows the budget line px + qy = c with the constant gradient ∇g(x_i, y_i) = (p, q) drawn at the points (x₁, y₁) and (x₂, y₂); panel (b) shows points z₁ and z₂, where ∇u(z_i) is not proportional to ∇g(z_i), and the tangency point z₃ on the indifference curve u = ū.]
12.4 The n-variable case

This section considers the problem of maximizing or minimizing a function f(x₁, x₂, ..., x_n) subject to the constraint g(x₁, x₂, ..., x_n) = c. The procedure is similar to the two-variable case: obtain first-order conditions relating the partial derivatives of the objective to those of the constraint, and then confirm that the solution to these equations actually is a local maximum or local minimum. The generalization of the Lagrangian to n variables is straightforward. Let

L = f(x₁, x₂, ..., x_n) + λ(c − g(x₁, ..., x_n)).

The first-order conditions are:

∂L/∂x_i = L_i = f_i(x) − λg_i(x) = 0, i = 1, ..., n
∂L/∂λ = L_λ = c − g(x) = 0.

Let (x∗, λ∗) be a solution to these n + 1 equations. Whether this is a local maximum or local minimum depends on second-order conditions. These in turn are derived in terms of bordered matrices and associated quadratic forms, developed in the next section. After that, the main result is developed. This gives sufficiency conditions for a solution to the first-order conditions to be a local maximum or minimum.
12.4.1 Quadratic forms and bordered matrices

This section recalls some material on quadratic forms and matrix definiteness. A matrix A determines a quadratic form according to the expression x′Ax. The matrix A is said to be positive or negative definite if x′Ax > 0 or x′Ax < 0, respectively, for all nonzero vectors x. The positive and negative definiteness criteria play a central role in identifying local optima in the study of unconstrained optimization problems. In constrained optimization, there is a comparable criterion based on bordered matrices. The following theorem presents a result on quadratic forms. Its application to constrained optimization is developed subsequently.

Theorem 12.3: Let A be an n × n symmetric matrix and let b be an n × 1 vector with b ≠ 0. Also, let C = C_n = ( 0  b′ ; b  A ); that is,

    | 0     b_1    b_2    ···   b_n  |            | 0     b_1    ···   b_j  |
    | b_1   a_11   a_12   ···   a_1n |            | b_1   a_11   ···   a_1j |
C = | b_2   a_21   a_22   ···   a_2n |  and  C_j = | ⋮     ⋮             ⋮   |
    | ⋮     ⋮      ⋮      ⋱     ⋮    |            | b_j   a_j1   ···   a_jj |
    | b_n   a_n1   a_n2   ···   a_nn |

Then

• ξ′Aξ < 0, for all ξ ≠ 0 with b′ξ = 0, if and only if (−1)^j |C_j| > 0, for all j = 2, 3, ..., n.
• ξ′Aξ > 0, for all ξ ≠ 0 with b′ξ = 0, if and only if |C_j| < 0, for all j = 2, 3, ..., n.

The logic for this can be seen in the two-variable case. Let
b = ( α ; β ),   A = | a_11   a_12 |,   C = | 0    α      β    |
                     | a_21   a_22 |       | α    a_11   a_12 |
                                            | β    a_21   a_22 |

Consider a quadratic form in A, with x′ = (x₁, x₂): x′Ax = x₁²a_11 + 2a_12 x₁x₂ + a_22 x₂² (note a_12 = a_21). Suppose that the constraint (α, β)·(x₁, x₂) = 0 is imposed. Then x₂ = −(α/β)x₁, and imposing this condition on the quadratic form gives:

x′Ax = x₁²a_11 + 2a_12 x₁(−(α/β)x₁) + a_22(−(α/β)x₁)²
     = x₁²[a_11 + 2a_12(−α/β) + a_22(α/β)²]
     = x₁²(1/β²)[β²a_11 − 2a_12αβ + a_22α²].

This expression is positive if and only if [β²a_11 − 2a_12αβ + a_22α²] is positive. Now, consider the determinant of C:

|C| = −α(αa_22 − βa_12) + β(αa_21 − βa_11)
    = −α²a_22 + αβa_12 + βαa_21 − β²a_11
    = −α²a_22 + 2αβa_12 − β²a_11
    = −[β²a_11 − 2a_12αβ + a_22α²].

For x ≠ 0, x′Ax > 0 for all x satisfying (α, β)·x = 0 if and only if |C| < 0.
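The two-variable equivalence just derived can be checked mechanically. The sketch below (matrix entries are illustrative numbers I chose) evaluates |C| from the formula above and compares its sign with the sign of the restricted quadratic form.

```python
# Sketch (illustrative matrices): in the two-variable case, x'Ax restricted to
# b'x = 0 is positive for all x != 0 iff |C| < 0, where C borders
# A = [[a11, a12], [a12, a22]] with b = (alpha, beta).

def detC(a11, a12, a22, alpha, beta):
    # |C| = -(beta^2*a11 - 2*a12*alpha*beta + a22*alpha^2), as derived above.
    return -(beta**2 * a11 - 2 * a12 * alpha * beta + a22 * alpha**2)

def restricted_form(a11, a12, a22, alpha, beta):
    # Coefficient on x1^2 after imposing x2 = -(alpha/beta)*x1.
    return (beta**2 * a11 - 2 * a12 * alpha * beta + a22 * alpha**2) / beta**2

cases = [
    (1.0, 2.0, 1.0, 1.0, 1.0),  # form is negative on the constraint line
    (1.0, 0.0, 1.0, 1.0, 1.0),  # form is positive on the constraint line
]
results = []
for a11, a12, a22, alpha, beta in cases:
    positive = restricted_form(a11, a12, a22, alpha, beta) > 0
    results.append(positive == (detC(a11, a12, a22, alpha, beta) < 0))

assert all(results)
```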
12.4.2 Bordered Hessians and optimization
In the present context of n-variable optimization, (x₁, ..., x_n) are chosen to maximize the objective f(x₁, ..., x_n), subject to the constraint g(x₁, ..., x_n) = c. Define the bordered Hessian matrices:

     | 0     g_1    g_2    ···   g_n  |             | 0     g_1    ···   g_j  |
     | g_1   L_11   L_12   ···   L_1n |             | g_1   L_11   ···   L_1j |
H̄ =  | g_2   L_21   L_22   ···   L_2n |  and  H̄_j = | ⋮     ⋮             ⋮   |
     | ⋮     ⋮      ⋮      ⋱     ⋮    |             | g_j   L_j1   ···   L_jj |
     | g_n   L_n1   L_n2   ···   L_nn |

Comparing notation, note that b corresponds to ∂g/∂x, the matrix A corresponds to the matrix of second derivatives of the Lagrangian, and C_j corresponds to the bordered Hessian principal minor H̄_j. With this notation, the main result on constrained optimization is:

Theorem 12.4: Let L(x₁, ..., x_n, λ) = f(x₁, ..., x_n) + λ(c − g(x₁, ..., x_n)) and suppose that at (x∗, λ∗) the following are satisfied:

∂L/∂x_i = L_i = 0, i = 1, ..., n
∂L/∂λ = L_λ = 0.
If, in addition, the bordered Hessian H̄ satisfies (1) at x∗, then x∗ is a local maximum, and if H̄ satisfies (2) at x∗, then x∗ is a local minimum, where

1. |H̄₂| > 0, |H̄₃| < 0, |H̄₄| > 0, ..., (−1)ⁿ|H̄_n| > 0
2. |H̄₂| < 0, |H̄₃| < 0, ..., |H̄_n| < 0.

Note that the theorem asserts that x∗ is a local maximum or minimum; x∗ may not be a global maximum or minimum. Some motivation for this result is as follows. The Lagrangian first-order necessary conditions for an optimum are:
∂L/∂x_i = f_i(x) − λg_i(x) = 0, i = 1, ..., n    (12.11)
∂L/∂λ = c − g(x) = 0.    (12.12)

Figure 12.6 illustrates failure of the tangency conditions (Equation 12.11). The figure shows that when there is no λ such that f_i = λg_i for all i, then generally there is an objective-improving change that satisfies the constraint. There is an n-variable analog of Theorem 12.2.
[Figure 12.6: Tangency condition fails. At x̄, g(x̄) = c, but there is no λ ≠ 0 such that ∇f(x̄) = λ∇g(x̄). Thus, the level curves of f and g cross at x̄, so there is a point x̂ with g(x̂) = c and f(x̂) > f(x̄).]
Theorem 12.5: Let f and g be continuously differentiable functions on an open set Z in Rⁿ. Suppose that x∗ is a constrained local maximum or minimum of f, subject to the constraint g(x) = c. Suppose that ∇g(x∗) ≠ 0. Then there is a λ∗ such that

∇f(x∗) = λ∗∇g(x∗).
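The condition is easy to verify at a known solution. The sketch below uses a hypothetical three-variable problem of my own (maximize x₁ + x₂ + x₃ on the sphere x₁² + x₂² + x₃² = 3, whose symmetric maximizer is (1, 1, 1)) and checks ∇f = λ∗∇g componentwise.

```python
# Sketch (hypothetical problem): maximize f(x) = x1 + x2 + x3 subject to
# g(x) = x1**2 + x2**2 + x3**2 = 3; by symmetry the maximizer is x* = (1, 1, 1).

x_star = [1.0, 1.0, 1.0]

grad_f = [1.0, 1.0, 1.0]              # f is linear
grad_g = [2 * xi for xi in x_star]    # gradient of the sphere constraint

# Recover lambda* from the first component, then verify the rest.
lam = grad_f[0] / grad_g[0]           # = 0.5
for fi, gi in zip(grad_f, grad_g):
    assert abs(fi - lam * gi) < 1e-12

# The constraint holds at x*.
assert abs(sum(xi**2 for xi in x_star) - 3.0) < 1e-12
```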
12.4.2.1 The sign of the Lagrange multiplier
Sometimes it is useful to know the sign of the Lagrange multiplier at the solution. However, with equality-constrained optimization, the multiplier may be positive or negative, depending on the way the constraint is expressed. Figure 12.7 illustrates the case where the first-order tangency conditions are satisfied. There are two possibilities. In the first, ∇f and ∇g "point" in the same direction, so that for some λ > 0, ∇f = λ∇g. This occurs when the upper contour sets {x | g(x) ≥ g(x̄)} and {x | f(x) ≥ f(x̄)} are on the same side of the g and f curves through x̄. This is depicted in (b). When ∇f and ∇g "point" in opposite directions, then for some λ < 0, ∇f = λ∇g. This occurs when the upper contour sets {x | g(x) ≥ g(x̄)} and {x | f(x) ≥ f(x̄)} are on opposite sides of the g and f curves through x̄. This is depicted in (a). These observations show that the sign of the Lagrange multiplier, λ, is indeterminate in equality-constrained optimization.
[Figure 12.7: Tangencies at the solution. Two panels show the upper contour set {x | f(x) ≥ f(x̄)} and the constraint curve g = c through x̄; the vectors x̄ + ∇f and x̄ + ∇g point in opposite directions in panel (a) and in the same direction in panel (b).]
To illustrate, suppose that f(x₁, x₂) = x₁x₂ and g(x₁, x₂) = x₁² + x₂² = c. Then

∇f = (f_x₁, f_x₂) = (x₂, x₁)  and  ∇g = (g_x₁, g_x₂) = (2x₁, 2x₂),

so that at the solution (where x₁ = x₂), ∇f = ½∇g, and λ = ½. Both ∇f and ∇g point into the positive orthant, with the vector ∇g twice the length of ∇f. Now, suppose that ĝ(x₁, x₂) = −x₁² − x₂², so that the constraint is unchanged if ĝ(x₁, x₂) = −c. Since ∇ĝ = (−2x₁, −2x₂) = −∇g, ∇f = −½∇ĝ and λ̂ = −½. The sign of the Lagrange multiplier depends on the way in which the constraint is expressed.
12.4.3 The second-order condition reconsidered
The following discussion elaborates on the connection between the bordered Hessian conditions and the optimality of x∗, obtained as a solution to the first-order conditions. In particular, the discussion provides additional perspective on the connection between the bordered Hessian and the second-order condition. The Lagrangian equals the objective function at any x satisfying the constraint: L(x, λ) = f(x) + λ[c − g(x)] = f(x).
At a constrained maximum, (x∗, λ∗), f(x∗ + v) − f(x∗) ≤ 0 for any v with g(x∗ + v) − c = 0. Consider a Taylor series expansion of L(x∗ + v, λ) around x∗. This gives:

L(x∗ + v, λ) = L(x∗, λ) + Σ_i [∂L(x∗, λ)/∂x_i] v_i + ½ v′H_L(x̂, λ)v
L(x∗ + v, λ) = L(x∗, λ) + ½ v′H_L(x̂, λ)v,

where H_L(x̂, λ) is the matrix {∂²L/∂x_i∂x_j} evaluated at (x̂, λ), x̂ is a point between x∗ and x∗ + v, and the second equality follows from the first-order conditions, ∂L/∂x_i = 0, ∀i.

Since x∗ and x∗ + v are feasible (c − g(x∗) = 0, c − g(x∗ + v) = 0), L(x∗ + v, λ) = f(x∗ + v) and L(x∗, λ) = f(x∗), so from the previous expression

f(x∗ + v) = f(x∗) + ½ v′H_L(x̂, λ)v.

Therefore f(x∗ + v) < f(x∗) if v′H_L(x̂, λ)v < 0.

The constraint function satisfies, using a first-order Taylor series, 0 = g(x∗ + v) − c = g(x∗) − c + g_x(x̃)·v = g_x(x̃)·v, where x̃ is (on the line) between x∗ and x∗ + v.
Recall that, given a vector b and matrix A, z′Az < 0 for all z satisfying b′z = 0 if and only if the matrix

Q = ( 0  b′ ; b  A )

satisfies the conditions (−1)^j |Q_j| > 0, j = 2, ..., n. Equivalently, |Q₂| > 0, |Q₃| < 0, and so on. (Here, Q_j is the northwest principal minor of Q, of order j + 1.)
Returning to the optimization problem, consider the matrices:

Q∗ = ( 0  g_x(x∗)′ ; g_x(x∗)  H_L(x∗, λ) )  and  Q(x̂, x̃) = ( 0  g_x(x̃)′ ; g_x(x̃)  H_L(x̂, λ) ).

Since these functions are continuous, for v small, Q(x̂, x̃) ≈ Q∗, so that if Q∗ satisfies (−1)^j |Q∗_j| > 0 for j ≥ 2, then (−1)^j |Q_j(x̂, x̃)| > 0 for j ≥ 2 also. Therefore, since g_x(x̃)·v = 0, v′H_L(x̂, λ)v < 0, and so f(x∗ + v) < f(x∗).
12.5 Interpretation of the Lagrange multiplier

The Lagrangian is L = f(x₁, ..., x_n) + λ(c − g(x₁, ..., x_n)). This has first-order conditions:

f_i(x₁∗, ..., x_n∗) − λg_i(x₁∗, ..., x_n∗) = 0, i = 1, ..., n
c − g(x₁∗, ..., x_n∗) = 0.

Let (x∗, λ) be a solution. Note that x∗ and λ depend on c: as c varies, so do they. Thus x∗ and λ may be written as functions of c: x∗(c), λ(c). These functions give a solution at each value of c. In particular, the first-order conditions hold at each value of c:

f_i(x₁∗(c), ..., x_n∗(c)) − λ(c)g_i(x₁∗(c), ..., x_n∗(c)) = 0
c − g(x₁∗(c), ..., x_n∗(c)) = 0.

The value of the objective at c is: L(c) = f(x₁∗(c), ..., x_n∗(c)) + λ(c)[c − g(x₁∗(c), ..., x_n∗(c))]. This function may be differentiated with respect to c to give:

dL(c)/dc = f₁(dx₁∗/dc) + f₂(dx₂∗/dc) + ··· + f_n(dx_n∗/dc) − λ(c)g₁(dx₁∗/dc) − λ(c)g₂(dx₂∗/dc) − ··· − λ(c)g_n(dx_n∗/dc)
         + (dλ/dc)[c − g(x₁∗(c), ..., x_n∗(c))] + λ(c).
Rearranging:

dL(c)/dc = Σ_{i=1}^n [f_i(x₁∗(c), ..., x_n∗(c)) − λ(c)g_i(x₁∗(c), ..., x_n∗(c))](dx_i∗/dc)
         + (dλ/dc)[c − g(x₁∗(c), ..., x_n∗(c))] + λ(c)
         = λ(c).

This follows using the first-order conditions. Thus, dL/dc = λ: the marginal impact on the value of the objective due to changing the constraint "a little" is equal to the Lagrange multiplier. The next example uses the Lagrange multiplier to calculate a derivative: the change in a function due to a small change in its argument.
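Before that example, the dL/dc = λ property can be checked numerically. The sketch below uses a toy problem of my own (maximize xy subject to x + y = c, with optimal value V(c) = c²/4 and multiplier λ(c) = c/2) and compares λ with a finite-difference derivative of the value function.

```python
# Sketch (own example): maximize f(x, y) = x*y subject to x + y = c. The
# solution is x = y = c/2, value V(c) = c**2/4, multiplier lambda(c) = c/2.
# Check dV/dc (finite differences) against lambda.

def V(c):
    return (c / 2.0) ** 2   # optimal value c**2/4

c0 = 10.0
lam = c0 / 2.0              # lambda(c0) = c0/2 from the first-order conditions

eps = 1e-6
dV_dc = (V(c0 + eps) - V(c0 - eps)) / (2 * eps)
assert abs(dV_dc - lam) < 1e-6
```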
EXAMPLE 12.4: Consider the problem: max f(x) = x² subject to x = c. The Lagrangian is:

L(x, λ) = x² + λ(c − x),

with first-order conditions:

L_x = 2x − λ = 0 ⇒ x = λ/2
L_λ = c − x = 0 ⇒ x = c,

so λ(c) = 2c, and x(c) = c.
[Figure: the function f(x) = x², showing the rise from c² to (c + Δc)² as the constraint moves from c to c + Δc.]
Thus, L∗(c), defined as L(x(c), λ(c)) = x(c)² = c², satisfies dL∗/dc = λ = 2c. The figure illustrates this. Note that varying c "a little" amounts to calculating the slope of the function x², which at c is equal to 2c. Since L∗(c) = L(x(c), λ(c)) = f(x(c)), the Lagrangian at the solution is identically equal to the value of the objective at the solution: L∗′(c) = f′(x(c))(dx/dc) = 2x(c)(dx/dc) = 2c. ♦
Example 12.5 shows that in the standard utility maximization problem, the Lagrange multiplier measures the marginal utility of income: the utility gain an optimizing individual would achieve from a small increase in income.

EXAMPLE 12.5: Consider the standard utility maximization problem: max u(x, y) = ln x + ln y, subject to px + qy = I. The Lagrangian is: L = ln x + ln y + λ(I − px − qy). So, the first-order conditions are:

L_x = 1/x − λp = 0
L_y = 1/y − λq = 0
L_λ = I − px − qy = 0.

Solving, x = I/(2p), y = I/(2q), and λ = 2/I. Also, L_xx = −1/x², L_yy = −1/y² and L_xy = 0. The bordered Hessian is
     | 0   p       q     |
H̄ =  | p   −1/x²   0     |
     | q   0       −1/y² |

Thus, |H̄| = p²/y² + q²/x² > 0. Denote the solution values as functions of I, x(I) = I/(2p) and y(I) = I/(2q):

u(x(I), y(I)) = ln(I/(2p)) + ln(I/(2q)) = ln(I²/(4pq)).
Thus,

du(x(I), y(I))/dI = (4pq/I²)·(2I/(4pq)) = 2/I = λ.

Observe that if we differentiate L with respect to I we get λ, and evaluating at the solution value for λ gives 2/I. The same reasoning allows us to conclude that the marginal impact on maximum utility of a change in the price p is given by −λ(I)x(I) = −(2/I)(I/(2p)) = −1/p: differentiate the Lagrangian with respect to p and evaluate at the solution. Differentiating u(x(I), y(I)) with respect to p directly gives −(4pq/I²)·(I²/(4p²q)) = −1/p, so that the marginal impact of an increase in the price of x is −1/p.
♦

EXAMPLE 12.6 (COST MINIMIZATION SUBJECT TO A PRODUCTION QUOTA): The cost function is c = rK + wL and the production function is Q = K^α L^β. The problem is:
min L = rK + wL + λ(Q − K^α L^β).

The first-order conditions are:

L_K = r − λα(K^α L^β)/K = 0 ⇒ r − λα(Q/K) = 0    (1)
L_L = w − λβ(K^α L^β)/L = 0 ⇒ w − λβ(Q/L) = 0    (2)
L_λ = Q − K^α L^β = 0.
Solving using (1) and (2),

r/w = (α/β)(L/K),    (3)

which implies that K = (αw/(βr))L. Then

Q = K^α L^β = (αw/(βr))^α L^α L^β = (αw/(βr))^α L^{α+β}.

Thus

L = (αw/(βr))^{−α/(α+β)} Q^{1/(α+β)}.

So

K = (αw/(βr))L = (αw/(βr))(αw/(βr))^{−α/(α+β)} Q^{1/(α+β)} = (αw/(βr))^{β/(α+β)} Q^{1/(α+β)}.

The solutions can also be written:

K = (r/α)^{−β/(α+β)} (w/β)^{β/(α+β)} Q^{1/(α+β)} = K(Q)
L = (r/α)^{α/(α+β)} (w/β)^{−α/(α+β)} Q^{1/(α+β)} = L(Q).
As required, these satisfy K(Q)^α L(Q)^β = Q, ∀Q. To solve for λ, observe from (1) that

λ = (r/α)(K/Q) = (r/α)(r/α)^{−β/(α+β)} (w/β)^{β/(α+β)} Q^{1/(α+β) − 1}
  = (r/α)^{α/(α+β)} (w/β)^{β/(α+β)} Q^{(1−(α+β))/(α+β)}.

Compare this with dc/dQ. Note that c(Q) = rK(Q) + wL(Q), so

c(Q) = rK(Q) + wL(Q)
     = α(r/α)K(Q) + β(w/β)L(Q)
     = α(r/α)^{α/(α+β)}(w/β)^{β/(α+β)} Q^{1/(α+β)} + β(r/α)^{α/(α+β)}(w/β)^{β/(α+β)} Q^{1/(α+β)}
     = (α + β)(r/α)^{α/(α+β)}(w/β)^{β/(α+β)} Q^{1/(α+β)}.

Therefore,

dc/dQ = (r/α)^{α/(α+β)}(w/β)^{β/(α+β)} Q^{(1−(α+β))/(α+β)} = λ(Q).

♦
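The closed forms in Example 12.6 can be sanity-checked numerically. The sketch below plugs in illustrative parameter values of my own, verifies that the quota is met, and confirms λ equals the marginal cost dc/dQ computed by finite differences.

```python
# Sketch using the closed forms from Example 12.6 (alpha, beta, r, w, Q are
# illustrative numbers): check K(Q)**alpha * L(Q)**beta = Q and that the
# multiplier equals the marginal cost dc/dQ.

alpha, beta = 0.3, 0.6
r, w, Q = 2.0, 1.5, 10.0
s = alpha + beta

L_s = (alpha * w / (beta * r)) ** (-alpha / s) * Q ** (1 / s)
K_s = (alpha * w / (beta * r)) * L_s

# Feasibility: the quota is met.
assert abs(K_s**alpha * L_s**beta - Q) < 1e-9

# lambda = (r/alpha) * K(Q) / Q, from first-order condition (1).
lam = (r / alpha) * K_s / Q

def cost(q):
    l = (alpha * w / (beta * r)) ** (-alpha / s) * q ** (1 / s)
    return r * (alpha * w / (beta * r)) * l + w * l

eps = 1e-6
mc = (cost(Q + eps) - cost(Q - eps)) / (2 * eps)
assert abs(mc - lam) < 1e-4
```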
EXAMPLE 12.7: A production technology relating output Q to inputs x and y is given by the equation y = Q + e^{kx}. The cost function is given by C = y − δx, where δ > k. Consider minimizing cost by choice of x and y.

In this case the Lagrangian is: L = y − δx + λ[y − Q − e^{kx}]. Therefore,

L_x = −δ − λk e^{kx} = 0
L_y = 1 + λ = 0
L_λ = y − Q − e^{kx} = 0.

From the second equation, λ = −1, and substituting into the first equation gives: δ = k e^{kx}. Solving for x, x = (1/k) ln(δ/k). Since y = Q + e^{kx}, we get y = Q + e^{k·(1/k)ln(δ/k)} = Q + e^{ln(δ/k)} = Q + δ/k.
For the second-order condition, L_xx = −λk²e^{kx}, L_yy = 0, L_xy = 0. The constraint is 0 = g(x, y), where g(x, y) = −y + Q + e^{kx}, so that g_x = k e^{kx} and g_y = −1. The bordered Hessian is

     | 0         k e^{kx}     −1 |
H̄ =  | k e^{kx}  −λk²e^{kx}   0  |
     | −1        0            0  |

⇒ |H̄| = −1 · | k e^{kx}   −λk²e^{kx} |
             | −1          0          |
   = λk²e^{kx} = −k²e^{kx} < 0.

Thus, the solution to the first-order conditions gives a local minimum. Write the solution values of x and y as x(k, δ) and y(k, δ), and the solution value of the Lagrange multiplier as λ(k, δ). Then the cost is given by C(k, δ) = y(k, δ) − δx(k, δ). Using the envelope theorem,

∂C/∂k = −λ(k, δ) x(k, δ) e^{k x(k,δ)},   ∂C/∂δ = −x(k, δ).

Note the answers are equal to the derivative of the Lagrangian with respect to the appropriate variable, evaluated at the solution. ♦
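The envelope result ∂C/∂δ = −x(k, δ) from Example 12.7 can be verified numerically. The sketch below uses illustrative numbers of my own with δ > k, solves for x and y from the closed forms above, and differentiates the minimized cost by finite differences.

```python
# Sketch (numbers are illustrative, with delta > k as in Example 12.7): check
# dC/d(delta) = -x(k, delta) for x = (1/k)*ln(delta/k), y = Q + delta/k,
# C = y - delta*x.

import math

k, delta, Q = 0.5, 2.0, 10.0

def solution(d):
    x = (1 / k) * math.log(d / k)
    y = Q + d / k
    return x, y

x0, y0 = solution(delta)

def C(d):
    x, y = solution(d)
    return y - d * x

eps = 1e-6
dC = (C(delta + eps) - C(delta - eps)) / (2 * eps)
assert abs(dC - (-x0)) < 1e-6
```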
EXAMPLE 12.8: Recall Example 12.3: max f(x, y) = x² + (y − 1)² subject to α = y − kx, 0 < α < 1 and k > 0. The solution: x∗ = (1 − α)k/(1 + k²), y∗ = (α + k²)/(1 + k²), and λ∗ = −2(1 − α)/(1 + k²). Thus, y∗ − 1 = [α + k² − (1 + k²)]/(1 + k²) = (α − 1)/(1 + k²), so that

L∗ = (x∗)² + (y∗ − 1)²
   = [(1 − α)k/(1 + k²)]² + [(α − 1)/(1 + k²)]²
   = [1/(1 + k²)]²[((1 − α)k)² + (α − 1)²]
   = [1/(1 + k²)]²[(1 − α)²(1 + k²)]
   = (1 − α)²/(1 + k²).
Therefore,

∂L∗/∂α = −[1/(1 + k²)]·2(1 − α) = λ∗.

Thus, the impact on the constrained objective is obtained by differentiating the Lagrangian with respect to the parameter and evaluating the derivative at the solution. ♦
EXAMPLE 12.9: Recall the optimal design of the cylinder in Example 12.1. There L∗ = [A³/(54π)]^{1/2}. Differentiating with respect to A,

∂L∗/∂A = (1/2)[A³/(54π)]^{−1/2}(3A²/(54π)).

Using the fact that r² = A/(6π),

∂L∗/∂A = r/2 = λ (or λ∗).

Again, this is the derivative of the Lagrangian with respect to A, evaluated at the solution. ♦
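The identity in Example 12.9 can be confirmed numerically. The sketch below picks an arbitrary surface area A and compares a finite-difference derivative of L∗(A) = [A³/(54π)]^{1/2} with r/2, where r² = A/(6π).

```python
# Sketch: check numerically that d/dA of (A**3/(54*pi))**0.5 equals r/2,
# where r**2 = A/(6*pi), as in Example 12.9 (A is chosen arbitrarily).

import math

A = 7.0

def Lstar(a):
    return (a**3 / (54 * math.pi)) ** 0.5

r = (A / (6 * math.pi)) ** 0.5

eps = 1e-6
deriv = (Lstar(A + eps) - Lstar(A - eps)) / (2 * eps)
assert abs(deriv - r / 2) < 1e-6
```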
The discussion thus far has focused on the identification of a local optimum. Section 12.6 provides conditions under which a local optimum is also a global optimum.
12.6 Optimization: concavity and convexity

In a constrained optimization problem, the shape of the objective function and the constraints play an important role in characterizing global optima – just as in the unconstrained case. For this discussion, assume that the domain of the function is convex.

Definition 12.2: A set X is convex if x, y ∈ X implies θx + (1 − θ)y ∈ X, ∀θ ∈ [0, 1].
The main result below relies on quasiconcavity of the objective function. Definition 12.3 introduces a variety of notions of quasiconcavity.

Definition 12.3: Concavity and quasiconcavity.

1. A function f is concave if f(θx + (1 − θ)y) ≥ θf(x) + (1 − θ)f(y), ∀x, y ∈ X, θ ∈ [0, 1].
2. A function f is strictly concave if f(θx + (1 − θ)y) > θf(x) + (1 − θ)f(y), ∀x ≠ y ∈ X, θ ∈ (0, 1).

3. A function f is quasiconcave if f(θx + (1 − θ)y) ≥ min{f(x), f(y)}, ∀x, y ∈ X, ∀θ ∈ [0, 1].

4. A function f is strictly quasiconcave if f(θx + (1 − θ)y) > min{f(x), f(y)}, ∀x ≠ y, θ ∈ (0, 1).

5. A function f is explicitly quasiconcave if it is quasiconcave and f(x) > f(y) ⇒ f(θx + (1 − θ)y) > f(y), ∀θ ∈ (0, 1).
Concavity is a more demanding requirement than quasiconcavity.

PROPOSITION 12.1: If f is (strictly) concave then f is (strictly) quasiconcave. ♦

PROOF: If f is concave then f(θx + (1 − θ)y) ≥ θf(x) + (1 − θ)f(y) ≥ min{f(x), f(y)}. With strict concavity and x ≠ y, f(θx + (1 − θ)y) > θf(x) + (1 − θ)f(y) ≥ min{f(x), f(y)}.

Furthermore,

PROPOSITION 12.2: If f is strictly quasiconcave then f is explicitly quasiconcave. ♦

PROOF: If f is strictly quasiconcave and f(x) > f(y) then f(θx + (1 − θ)y) > min{f(x), f(y)} = f(y), and this confirms the proposition.
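The inequality chain in the proof of Proposition 12.1 can be sampled directly. The sketch below checks it for a concave function of my own choosing, f(x) = −x², at a handful of points and mixing weights.

```python
# Sketch: sample-check Proposition 12.1 for the concave function f(x) = -x**2:
# concavity gives f(t*x + (1-t)*y) >= t*f(x) + (1-t)*f(y) >= min(f(x), f(y)).

def f(x):
    return -x**2

pts = [(-3.0, 1.0), (0.5, 2.0), (-1.0, -4.0)]
for x, y in pts:
    for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
        z = f(t * x + (1 - t) * y)
        assert z >= t * f(x) + (1 - t) * f(y) - 1e-12
        assert z >= min(f(x), f(y)) - 1e-12
```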
The relations are depicted in Figure 12.8. Notice that explicit quasiconcavity allows flat sections at the top of the function (a property excluded by strict quasiconcavity). Recall that concavity is related to the Hessian matrix H(x) = {∂²f(x)/∂x∂x′}:

       | f_11(x)   f_12(x)   ···   f_1n(x) |
H(x) = | f_21(x)   f_22(x)   ···   f_2n(x) |
       | ⋮         ⋮         ⋱     ⋮       |
       | f_n1(x)   f_n2(x)   ···   f_nn(x) |
[Figure 12.8: Quasiconcavity. Three panels of f(x): a quasiconcave function with a flat section, an explicitly quasiconcave function with a flat section only on top, and a strictly quasiconcave function with no flat section.]
where the dependence of the partial derivatives on x = ( x1 , x2 ,... , x n ) is made explicit.
As discussed earlier, if f is concave then the Hessian matrix H ( x) is negative semidefinite for all x. In fact, f is concave if and only if H ( x) is negative semidefinite for all x. If f is strictly concave then H ( x) is negative semidefinite for all
x; and, conversely, if H ( x) is negative definite for all x then f is strictly concave. Notice that strict concavity of f does not imply that H ( x) is negative definite for all x. This observation is clarified in the following remark.
REMARK 12.3: While an everywhere negative definite Hessian matrix implies strict concavity of the associated function, the converse is not the case: a strictly concave function need not have an everywhere negative definite Hessian matrix. The following example confirms the latter observation. Consider f(x) = −x⁴, with f′(x) = −4x³ and f″(x) = −12x². Thus, f″(x) ≤ 0 with f″(0) = 0, so, viewing f″(x) as the Hessian matrix (with just one entry), this is not negative definite for all x, since the requirement fails at x = 0. However, this function is strictly concave, as the following discussion demonstrates. To do so, it is necessary to show that when 0 < θ < 1 and y ≠ x, then f(θx + (1 − θ)y) > θf(x) + (1 − θ)f(y). With f(x) = −x⁴, let

δ = f(θx + (1 − θ)y) − [θf(x) + (1 − θ)f(y)] = −(θx + (1 − θ)y)⁴ + θx⁴ + (1 − θ)y⁴.

The function f is strictly concave if δ > 0 whenever θ ∈ (0, 1) and x ≠ y. Expanding the quartic and factoring gives

δ = θ(1 − θ)(x − y)²[θ²y² + 2θ(1 − θ)xy + θ²x² + 3(1 − θ)y² + θx² + 2xy + x²].
From this expression, it is clear that with x ≠ y and θ ∈ (0, 1), δ > 0 if xy ≥ 0. It remains to show that δ > 0 for xy < 0. Suppose that xy < 0. Take x < 0 and y > 0 (the reasoning is the same for the case y < 0 and x > 0), and set z = θx + (1 − θ)y, so that x < z < y. Regarding z there are two possibilities: z < 0 or z > 0. The analysis is the same in both cases, so consider the case where z < 0. Then there is a β, 0 < β < 1, with z = βx + (1 − β)·0 = βx. So, z = βx + (1 − β)·0 = θx + (1 − θ)y, and thus (β − θ)x = (1 − θ)y. Since y > 0 and x < 0, β < θ. Therefore,

f(z) = f(βx + (1 − β)·0) > βf(x) + (1 − β)f(0) > θf(x) + (1 − θ)f(0) > θf(x) + (1 − θ)f(y),

where the first inequality follows because f(βx) = −β⁴x⁴ > −βx⁴ = βf(x) and f(0) = 0; the second because β < θ and f(0) = 0; and the third because f(0) > f(y). An identical argument applies when z > 0.
So far the results apply only to finding local maxima or minima. For a global result it is necessary to impose additional assumptions. The following result assumes that the constraint set is convex: { x  g(x) = c} is convex. Observe that { x  g(x) = c} convex implies g(x) = a0 + a1 x1 + · · · + a n x n , (g(θ x + (1 − θ)y) = θ g(x) +
(1 − θ)g(y), ∀(x, y).) In Figure 12.9, the set of x i that satisfy g(x) = c lie on the straight line.
Theorem 12.6: Let L(x1, ..., xn, λ) = f(x1, ..., xn) + λ(c − g(x1, ..., xn)), where (a) f is explicitly quasiconcave and (b) {x | g(x) = c} is convex.
CHAPTER 12. EQUALITY-CONSTRAINED OPTIMIZATION
[Figure 12.9: A linear constraint. The constraint set {x̃ | g(x̃) = c} is a straight line through x∗ and x̄; the point θx∗ + (1 − θ)x̄ lies on this line, and the level set {x̃ | f(x̃) = f(x̄)} passes through x̄.]
If (x∗, λ∗) satisfies:

1. Li(x∗, λ∗) = 0, i = 1, ..., n; Lλ(x∗, λ∗) = 0
2. H̄2(x∗) > 0, H̄3(x∗) < 0, ..., (−1)ⁿ H̄n(x∗) > 0,

then x∗ is a global maximum of the constrained program.

PROOF: Suppose (x∗, λ∗) satisfies conditions (1) and (2) but is not a global maximum. Since x∗ is not a global maximum, ∃y, g(y) = c and f(y) > f(x∗). Since g is linear, g(θy + (1 − θ)x∗) = c, or g(x∗ + θ(y − x∗)) = c. So θy + (1 − θ)x∗ = x∗ + θ(y − x∗) is feasible and, since f is explicitly quasiconcave, f(x∗ + θ(y − x∗)) > f(x∗), ∀θ ∈ (0, 1). However, since (x∗, λ∗) satisfies (1) and (2), from Theorem 12.4, x∗ is a local maximum, so that in some neighborhood of x∗, all x̃ satisfying g(x̃) = c also satisfy f(x̃) ≤ f(x∗). However, θ can be chosen arbitrarily small and f(x∗ + θ(y − x∗)) > f(x∗). Thus, there are feasible points arbitrarily close to x∗ with a higher value of f; a contradiction.
The theorem is not valid without the assumption that (2) holds, as the next example illustrates.
E XAMPLE 12.10: Define the function f 1 ( x):
f 1 ( x) = ( x − 1)3 , x ≤ 2
= ( x − 1)3 − ( x − 2)4 , x > 2.
[Graph of f1(x) against x.]
Set f(x, y) = f1(x), g(x, y) = x − y and c = 0. Let (x∗, y∗, λ∗) = (1, 1, 0). Then fx(x, y) = 3(x − 1)² if x < 2 and fx(x, y) = 3(x − 1)² − 4(x − 2)³ if x > 2. Thus fx(1, 1) = 0, fy(x, y) = 0, gx(x, y) = 1 and gy = −1. Therefore,

fx(1, 1) − 0 · gx(1, 1) = 0
fy(1, 1) − 0 · gy(1, 1) = 0
0 − (1 − 1) = 0,

so (1) is satisfied at (x∗, y∗, λ∗) = (1, 1, 0). Next,

$$\left|\bar H_2(x^*, y^*, \lambda^*)\right| = \begin{vmatrix} 0 & g_x & g_y \\ g_x & f_{xx} & f_{xy} \\ g_y & f_{yx} & f_{yy} \end{vmatrix} = 0.$$

Thus (2) is not satisfied. Observe from the graph that (x∗, y∗, λ∗) = (1, 1, 0) does not correspond to a maximum.
♦
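The vanishing bordered-Hessian determinant in Example 12.10 can be confirmed numerically. The following sketch (added here as an illustration, assuming NumPy) builds |H̄2| at the candidate point:

```python
import numpy as np

# Bordered Hessian for L = f + lam*(c - g) at the candidate point, with
# f(x, y) = (x - 1)**3 (the x <= 2 branch), g(x, y) = x - y, and lam = 0.
def bordered_hessian(x, y, lam):
    gx, gy = 1.0, -1.0                 # gradient of g(x, y) = x - y
    fxx = 6.0 * (x - 1.0)              # f_xx on the x < 2 branch
    fxy = fyx = fyy = 0.0              # f does not depend on y
    # L_ij = f_ij since the constraint is linear (and here lam = 0 anyway)
    return np.array([[0.0, gx,  gy],
                     [gx,  fxx, fxy],
                     [gy,  fyx, fyy]])

H = bordered_hessian(1.0, 1.0, 0.0)
print(np.linalg.det(H))   # approximately 0: the second-order condition (2) fails
```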
REMARK 12.4: A variation on the proof of Theorem 12.6 may be given using Theorem 12.3 directly, as follows. Since {x | g(x) = c} is convex, g(x) = a0 + a1 x1 + · · · + an xn, because if x, y are in this set then so is θx + (1 − θ)y. Then g(θx + (1 − θ)y) = c = θg(x) + (1 − θ)g(y), so that g is linear (affine). Therefore
gi = ai and Lij = fij. Thus

$$\bar H = \begin{pmatrix}
0 & a_1 & a_2 & \cdots & a_n \\
a_1 & f_{11} & f_{12} & \cdots & f_{1n} \\
a_2 & f_{21} & f_{22} & \cdots & f_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_n & f_{n1} & f_{n2} & \cdots & f_{nn}
\end{pmatrix}.$$

Write H̄(x) to denote the matrix evaluated at x. Let a′ = (a1, a2, ..., an), so
g(x) = a0 + a′x. Suppose x∗ is not a global maximum, so ∃y, g(y) = c, f(y) > f(x∗). Explicit quasiconcavity implies that f(θy + (1 − θ)x∗) > f(x∗), ∀θ ∈ (0, 1), or f[x∗ + θ(y − x∗)] > f(x∗), ∀θ ∈ (0, 1). Since g(y) = a0 + a′y = c = g(x∗) = a0 + a′x∗, g(y) − g(x∗) = a′y − a′x∗ = c − c = 0, or a′(y − x∗) = 0. Also a′θ(y − x∗) = 0. Recall the notation in Theorem 12.3 and let
$$A(x) = \begin{pmatrix}
f_{11}(x) & f_{12}(x) & \cdots & f_{1n}(x) \\
f_{21}(x) & f_{22}(x) & \cdots & f_{2n}(x) \\
\vdots & \vdots & \ddots & \vdots \\
f_{n1}(x) & f_{n2}(x) & \cdots & f_{nn}(x)
\end{pmatrix}, \qquad b = a, \qquad C_j(x) = \bar H_j(x).$$
Given (2) in the statement of Theorem 12.6, (−1)ʲ|Cj(x∗)| > 0, j = 2, ..., n, so from Theorem 12.3, ξ′A(x∗)ξ < 0 for all ξ ≠ 0 with a′ξ = 0 (= b′ξ).

As f(x∗ + θ(y − x∗)) > f(x∗), take θ sufficiently small for a Taylor series expansion:

$$f(x^* + \theta(y - x^*)) = f(x^*) + f'(x^*)\theta(y - x^*) + \tfrac{1}{2}\theta^2 (y - x^*)' A(\tilde x)(y - x^*) > f(x^*), \tag{12.13}$$

where x̃ is close to x∗. Cancel f(x∗) to get θf′(x∗)(y − x∗) + ½θ²(y − x∗)′A(x̃)(y − x∗) > 0. Consider the term (y − x∗)′A(x̃)(y − x∗). At x∗, (y − x∗)′A(x∗)(y − x∗) < 0, since a′(y − x∗) = 0 and (y − x∗) ≠ 0. For x̃ sufficiently close to x∗, (−1)ʲ|Cj(x̃)| > 0, j = 2, ..., n, and since a′(y − x∗) = 0 and (y − x∗) ≠ 0, it must be that (y − x∗)′A(x̃)(y − x∗) < 0. Thus, to satisfy the inequality in expression 12.13, θf′(x∗)(y − x∗) > 0, so f′(x∗)(y − x∗) > 0. However,

$$0 < f'(x^*)(y - x^*) = \lim_{\theta \to 0} \left\{ \frac{f(x^* + \theta(y - x^*)) - f(x^*)}{\theta} \right\}.$$
This implies that for θ small, f ( x∗ + θ( y − x∗ )) > f ( x∗ ) and since x∗ + θ( y − x∗ ) is feasible ( g( x∗ + θ( y − x∗ )) = c) this contradicts the fact that x∗ satisfies the
conditions for a local maximum.
12.6.1 Quasiconcavity and bordered Hessians
Just as there is a close connection between concavity (and convexity) and the Hessian matrix, there is a similar connection for quasiconcavity (and quasiconvexity), except that the Hessian matrix is replaced by a bordered Hessian matrix. Augment the Hessian matrix with a border of first derivatives as follows. Let
$$Q = \begin{pmatrix}
0 & f_1 & f_2 & \cdots & f_n \\
f_1 & f_{11} & f_{12} & \cdots & f_{1n} \\
f_2 & f_{21} & f_{22} & \cdots & f_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
f_n & f_{n1} & f_{n2} & \cdots & f_{nn}
\end{pmatrix}, \qquad
Q_j = \begin{pmatrix}
0 & f_1 & f_2 & \cdots & f_j \\
f_1 & f_{11} & f_{12} & \cdots & f_{1j} \\
f_2 & f_{21} & f_{22} & \cdots & f_{2j} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
f_j & f_{j1} & f_{j2} & \cdots & f_{jj}
\end{pmatrix}.$$
Write Q(x) and Qj(x) to denote these matrices evaluated at x. These matrices are used to characterize quasiconcavity.

Theorem 12.7:

1. If f is quasiconcave, then |Q2(x)| ≥ 0, |Q3(x)| ≤ 0, ..., (−1)ⁿ|Qn(x)| ≥ 0, ∀x. (|Q1(x)| ≤ 0 always holds.)

2. If |Q1(x)| < 0, |Q2(x)| > 0, ..., (−1)ⁿ|Qn(x)| > 0, ∀x, then f is quasiconcave.

3. If f is quasiconvex, then |Q2(x)| ≤ 0, |Q3(x)| ≤ 0, ..., |Qn(x)| ≤ 0, ∀x.

4. If |Q1(x)| < 0, |Q2(x)| < 0, ..., |Qn(x)| < 0, ∀x, then f is quasiconvex.
These results can be related to constrained optimization as follows. Consider the problem:

max f(x1, x2, ..., xn) subject to g(x1, x2, ..., xn) = c.

Suppose that g(x1, ..., xn) = a0 + a1 x1 + a2 x2 + · · · + an xn. Then gi = ai. Let

L = f(x1, ..., xn) + λ(c − g(x1, ..., xn)).

Then Li = fi − λgi and Lij = fij. The first-order conditions are Li = 0, ∀i, or fi = λgi, ∀i, or gi = (1/λ)fi, ∀i. Theorem 12.4 gives second-order conditions in terms of the Hessian matrix. In the present context,
$$H_n = H = \begin{pmatrix}
0 & -g_1 & -g_2 & \cdots & -g_n \\
-g_1 & L_{11} & L_{12} & \cdots & L_{1n} \\
-g_2 & L_{21} & L_{22} & \cdots & L_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
-g_n & L_{n1} & L_{n2} & \cdots & L_{nn}
\end{pmatrix} = \begin{pmatrix}
0 & -f_1/\lambda & -f_2/\lambda & \cdots & -f_n/\lambda \\
-f_1/\lambda & L_{11} & L_{12} & \cdots & L_{1n} \\
-f_2/\lambda & L_{21} & L_{22} & \cdots & L_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
-f_n/\lambda & L_{n1} & L_{n2} & \cdots & L_{nn}
\end{pmatrix}.$$
Thus, factoring −1/λ from the first row and the first column of the determinant,

$$\bar H_j = \frac{1}{\lambda^2}\begin{vmatrix}
0 & f_1 & f_2 & \cdots & f_j \\
f_1 & f_{11} & f_{12} & \cdots & f_{1j} \\
f_2 & f_{21} & f_{22} & \cdots & f_{2j} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
f_j & f_{j1} & f_{j2} & \cdots & f_{jj}
\end{vmatrix} = \frac{1}{\lambda^2}|Q_j|,$$

since Lij = fij in the linear constraint case. Because 1/λ² > 0, H̄j has the same sign as |Qj|, so H̄j corresponds to Qj.

Theorem 12.8: If the bordered Hessian satisfies the conditions for f to be quasiconcave, then the bordered Hessian satisfies the sufficient conditions for a local maximum.
EXAMPLE 12.11: Consider the problem: max U(x1, ..., xn) subject to p1 x1 + p2 x2 + · · · + pn xn = Y, or p · x = Y = g(x). Thus, −gi = −pi and Lij = Uij. Therefore, a sufficient condition for x∗ to be a local maximum is

$$\begin{vmatrix}
0 & -p_1 & -p_2 \\
-p_1 & U_{11} & U_{12} \\
-p_2 & U_{21} & U_{22}
\end{vmatrix} > 0, \qquad
\begin{vmatrix}
0 & -p_1 & -p_2 & -p_3 \\
-p_1 & U_{11} & U_{12} & U_{13} \\
-p_2 & U_{21} & U_{22} & U_{23} \\
-p_3 & U_{31} & U_{32} & U_{33}
\end{vmatrix} < 0, \quad \text{etc.}$$

♦
12.7 Optimization with many constraints

Consider a square n × n matrix A, and let B be an m × n matrix with m < n. Thus,

$$A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}, \qquad
B = \begin{pmatrix}
b_{11} & b_{12} & \cdots & b_{1n} \\
\vdots & \vdots & \ddots & \vdots \\
b_{m1} & b_{m2} & \cdots & b_{mn}
\end{pmatrix}.$$

Let Bm denote the square matrix obtained from B by deleting the last n − m columns. Define the matrix:

$$C = \begin{pmatrix} 0 & B \\ B' & A \end{pmatrix}.$$
The matrix C and a secondary family of matrices (Cr) may be defined more explicitly:

$$C = \begin{pmatrix}
0 & \cdots & 0 & b_{11} & b_{12} & \cdots & b_{1n} \\
\vdots & \ddots & \vdots & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & b_{m1} & b_{m2} & \cdots & b_{mn} \\
b_{11} & \cdots & b_{m1} & a_{11} & a_{12} & \cdots & a_{1n} \\
b_{12} & \cdots & b_{m2} & a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & & \vdots & \vdots & \vdots & \ddots & \vdots \\
b_{1n} & \cdots & b_{mn} & a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix},$$

$$C_r = \begin{pmatrix}
0 & \cdots & 0 & b_{11} & b_{12} & \cdots & b_{1r} \\
\vdots & \ddots & \vdots & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & b_{m1} & b_{m2} & \cdots & b_{mr} \\
b_{11} & \cdots & b_{m1} & a_{11} & a_{12} & \cdots & a_{1r} \\
b_{12} & \cdots & b_{m2} & a_{21} & a_{22} & \cdots & a_{2r} \\
\vdots & & \vdots & \vdots & \vdots & \ddots & \vdots \\
b_{1r} & \cdots & b_{mr} & a_{r1} & a_{r2} & \cdots & a_{rr}
\end{pmatrix}.$$
Recall the matrices C and Cr from the earlier discussion on quadratic forms. These are used in Theorem 12.9 to provide the key result for the multi-constraint optimization problem.

Theorem 12.9: Suppose that A is symmetric and Bm is nonsingular.

1. Then x′Ax > 0 for every x ≠ 0 with Bx = 0, if and only if: (−1)ᵐ|Cr| > 0, r = m + 1, ..., n.
2. Then x′Ax < 0 for every x ≠ 0 with Bx = 0, if and only if: (−1)ʳ|Cr| > 0, r = m + 1, ..., n.
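Theorem 12.9 can be spot-checked numerically. In the sketch below (an added illustration; A, B and the helper are our choices, assuming NumPy), A = −I is negative definite, so x′Ax < 0 on the null space of B, and the alternating determinant condition of part 2 should hold:

```python
import numpy as np

def C_r(B, A, r):
    """Bordered matrix of Theorem 12.9: first r columns of B, leading r x r block of A."""
    m = B.shape[0]
    Br = B[:, :r]
    top = np.hstack([np.zeros((m, m)), Br])
    bottom = np.hstack([Br.T, A[:r, :r]])
    return np.vstack([top, bottom])

B = np.array([[1.0, 1.0, 1.0]])      # m = 1 constraint, n = 3
A = -np.eye(3)                       # x'Ax < 0 for every x != 0, hence also on Bx = 0
m, n = B.shape

# determinant test of part 2: (-1)^r |C_r| > 0 for r = m+1, ..., n
signs_ok = all((-1)**r * np.linalg.det(C_r(B, A, r)) > 0 for r in range(m + 1, n + 1))

# direct check on a vector projected onto the null space of B
rng = np.random.default_rng(0)
z = rng.standard_normal(3)
x = z - B[0] * (B[0] @ z) / (B[0] @ B[0])
print(signs_ok, float(x @ A @ x) < 0)   # True True
```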
Notice that condition 1 requires that all determinants have the same sign (negative if m is odd, positive if m is even), whereas in condition 2 the sign alternates with the size of the matrix. This result may be used to give a sufficiency condition for local optimality of a solution x∗ to a constrained optimization problem. Let f(x1, ..., xn) be the objective function and gk(x1, ..., xn) = ck, k = 1, ..., m, a collection of m constraints. Assume that m < n. Define the Lagrangian:

L = f(x1, ..., xn) + λ1[c1 − g1(x1, ..., xn)] + · · · + λm[cm − gm(x1, ..., xn)].
Define the matrix Dg as:

$$D_g = \begin{pmatrix}
\frac{\partial g_1}{\partial x_1} & \frac{\partial g_1}{\partial x_2} & \cdots & \frac{\partial g_1}{\partial x_n} \\
\frac{\partial g_2}{\partial x_1} & \frac{\partial g_2}{\partial x_2} & \cdots & \frac{\partial g_2}{\partial x_n} \\
\vdots & \vdots & & \vdots \\
\frac{\partial g_m}{\partial x_1} & \frac{\partial g_m}{\partial x_2} & \cdots & \frac{\partial g_m}{\partial x_n}
\end{pmatrix},$$

and define

$$C = \begin{pmatrix} 0 & D_g \\ D_g' & \nabla^2 L \end{pmatrix}, \qquad
C_r = \begin{pmatrix}
0 & \cdots & 0 & \frac{\partial g_1}{\partial x_1} & \frac{\partial g_1}{\partial x_2} & \cdots & \frac{\partial g_1}{\partial x_r} \\
\vdots & \ddots & \vdots & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & \frac{\partial g_m}{\partial x_1} & \frac{\partial g_m}{\partial x_2} & \cdots & \frac{\partial g_m}{\partial x_r} \\
\frac{\partial g_1}{\partial x_1} & \cdots & \frac{\partial g_m}{\partial x_1} & L_{11} & L_{12} & \cdots & L_{1r} \\
\frac{\partial g_1}{\partial x_2} & \cdots & \frac{\partial g_m}{\partial x_2} & L_{21} & L_{22} & \cdots & L_{2r} \\
\vdots & & \vdots & \vdots & \vdots & \ddots & \vdots \\
\frac{\partial g_1}{\partial x_r} & \cdots & \frac{\partial g_m}{\partial x_r} & L_{r1} & L_{r2} & \cdots & L_{rr}
\end{pmatrix},$$
where ∇²L is the matrix {Lij}, 1 ≤ i ≤ r, 1 ≤ j ≤ r, as indicated in the expression. Assume that the rank of Dg is m, and that the xi are labeled such that the first m columns of Dg are linearly independent at x∗. Thus:

Theorem 12.10: Suppose that x∗ is feasible (satisfies all the constraints). If (x∗, λ1∗, ..., λm∗) = (x∗, λ∗) satisfies:
1. Li(x∗, λ∗) = 0, i = 1, ..., n; Lλk(x∗, λ∗) = 0, k = 1, ..., m

2. (−1)ʳ|Cr| > 0, r = m + 1, ..., n,

then x∗ is a constrained local maximum. If condition 2 is replaced by (−1)ᵐ|Cr| > 0, r = m + 1, ..., n, then x∗ is a constrained local minimum.

So, for example, if there is one constraint, so that m = 1, the local maximum condition requires that (−1)²|C2| > 0, or |C2| > 0; (−1)³|C3| > 0, or |C3| < 0; and so on. Similarly, the local minimum condition requires (−1)¹|C2| > 0, or |C2| < 0; (−1)¹|C3| > 0, or |C3| < 0; and so on. This is the same set of conditions as appears in Theorem 12.4.
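Theorem 12.10 can be applied concretely. The sketch below (an added illustration with an example problem of our own, assuming NumPy) checks the local-maximum condition for maximizing f(x) = −(x1² + x2² + x3²) subject to x1 + x2 + x3 = 1, whose first-order conditions give x∗ = (1/3, 1/3, 1/3) and λ∗ = −2/3:

```python
import numpy as np

Dg = np.array([[1.0, 1.0, 1.0]])   # m = 1 constraint gradient (constant here)
L_hess = -2.0 * np.eye(3)          # Hessian of L = f + lam*(c - g): L_ij = f_ij
m, n = Dg.shape

def C_r(r):
    """Bordered matrix of Theorem 12.10, using the variables x_1, ..., x_r."""
    top = np.hstack([np.zeros((m, m)), Dg[:, :r]])
    bottom = np.hstack([Dg[:, :r].T, L_hess[:r, :r]])
    return np.vstack([top, bottom])

# condition 2 of Theorem 12.10: (-1)^r |C_r| > 0 for r = m+1, ..., n
is_local_max = all((-1)**r * np.linalg.det(C_r(r)) > 0 for r in range(m + 1, n + 1))
print(is_local_max)   # True: x* is a constrained local maximum
```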
Exercises for Chapter 12

Exercise 12.1 A firm has the production function Y = 30L^{1/2}K^{2/3}.

(a) What is the cost-minimizing level of L and K if the firm must produce 1000 units? Check the second-order conditions.
(b) By how much will cost increase if the firm must increase output by one unit?
(c) Given this production function, can we find a profit-maximizing level of output? Why or why not?

Exercise 12.2
Assume that a monopolist producing one good can sell to two distinct markets and price discriminate. The demand function in each market is

D1 = 300 − (1/3)P1
D2 = 300 − (1/2)P2.

The total cost to the monopolist is C = 2D1² + 3D1D2 + D2². Also, owing to a government restriction, the total output of the firm must be 100 units. What is the profit-maximizing output in each market? Check the second-order conditions.

Exercise 12.3
Consider a firm that produces beer and pretzels. It must sell a
bag of pretzels for every four bottles of beer. If its profit function is π = 150B − 4B² + 150P − 3P² + 10BP + 100
(P = number of bags of pretzels, B = number of beers), what is the profit-maximizing level of output?
Exercise 12.4 The geometric average of two positive numbers, x and y, is (xy)^{1/2}. Consider the task of maximizing the geometric average, subject to the condition x + y = 1. Show, using the method of Lagrange multipliers, that the optimal choice of (x, y) is (1/2, 1/2) and confirm that this gives the constrained maximum.
Exercise 12.5
A swimming pool is 25 meters by 100 meters. In coordinates, a position (x, y) gives the distances traveled from (0,0) along the two axes. Thus (15, 20) denotes the position of a swimmer who swam 15 meters along the x axis and 20 meters along the y axis from (0,0). There are four corners: (0,0), (0,25), (100,0) and (100,25). The diagonal line connecting corners (0,25) and (100,0) is given by 4y + x = 100. The distance from corner (0,0) to coordinate (x, y) is (x² + y²)^{1/2}. A swimmer wishes to swim from the (0,0) corner and cross the diagonal line in the shortest distance possible. What coordinate position should the person swim to? Use the method of Lagrange multipliers to determine the answer.

Exercise 12.6
A firm wishes to produce a level of output Q at minimum cost.
Two inputs, capital (K) and labour (L), are used to produce Q with the production function given by Q = K^α + L^α, where 0 < α < 1. The wage rate is ω and the cost of capital (per unit) is r, so that total cost given (K, L) is C = rK + ωL.

(a) Set up the Lagrangian to solve the problem of finding the cost-minimizing levels of K and L required to produce Q.
(b) Solve for K, L and the Lagrange multiplier, λ.
(c) Check that the second-order conditions for a minimum are satisfied.
(d) Observe that in your solution to part (b), both K and L depend on Q. To emphasize this dependency, write K(Q) and L(Q). Thus the cost of producing the output level Q is C(Q) = rK(Q) + ωL(Q). Find dC(Q)/dQ and compare your answer with λ.

Exercise 12.7
Dr. Who lives at coordinates (x, y) = (3, 12). A road in the (x, y) plane follows the path k = x² − 6x + y, where 0 < k < 3. On the bus journey home, each traveller must give the driver "drop-off" coordinates. Unfortunately for Dr. Who, the bus does not travel by his house (since (x, y) = (3, 12) is not on the curve k = x² − 6x + y).

(a) What drop-off coordinates should Dr. Who give the driver so that Dr. Who is as close to his house as possible? Hint: the distance between two points, (x1, y1) and (x2, y2), is d = [(x1 − x2)² + (y1 − y2)²]^{1/2}. However, since minimizing this is equivalent to minimizing d∗ = (x1 − x2)² + (y1 − y2)², take d∗ as the measure of distance.
(b) Confirm that the second-order conditions are satisfied.
(c) The solution depends on k. Let d∗(k) be the distance at the solution from the drop-off point to Dr. Who's house. Calculate d[d∗(k)]/dk and relate it to the Lagrange multiplier in (a).

Exercise 12.8
A firm sets both price and quantity in a market. It is known that the demand function for this market is P = k/Q. By law, the firm is required to choose a price-quantity combination lying on the demand curve. The cost of setting (P, Q) combinations is C(P, Q) = P + αQ².

(a) Minimize C(P, Q) subject to the demand constraint.
(b) Confirm that the second-order conditions are satisfied.
(c) The optimal choices of P and Q depend on (α, k): P(α, k), Q(α, k), so that the cost at the solution is C∗(α, k) = C(P(α, k), Q(α, k)). Find ∂C∗(α, k)/∂k and compare this with the Lagrange multiplier obtained in (a).

Exercise 12.9
Maximize xy subject to the constraint x² + y = k, where k is some positive constant. Confirm that the second-order conditions for a constrained optimum are satisfied. Interpret the Lagrange multiplier.

Exercise 12.10 Minimize (x − 1)² + y² subject to the constraint x² + k = y, where k is some positive constant. Confirm that the second-order conditions for a constrained optimum are satisfied. Interpret the Lagrange multiplier.

Exercise 12.11 Consider the following constrained optimization problem: maximize f(x, y) = y − bx² subject to the constraint y = k + a ln x. Here k, a and b are positive constants.
(a) Plot the problem.
(b) Find the solution values for x and y.
(c) Confirm that the second-order conditions are satisfied.
(d) Let the solution be x(a, b, k) and y(a, b, k). Then the value of the objective at the solution is f∗(a, b, k) = f(x(a, b, k), y(a, b, k)). Find ∂f∗(a, b, k)/∂a. Find ∂f∗(a, b, k)/∂k and interpret this in terms of the Lagrange multiplier.
Exercise 12.12 An individual faces the problem of minimizing expenditure subject to reaching some fixed level of utility u∗. There are two goods, x and y, and the utility function is given by u(x, y) = x^α y^α, 0 < 2α < 1. The cost of x (per unit) is p and the cost of y (per unit) is q, so the cost of the bundle (x, y) is px + qy.

(a) Set up the Lagrangian for this problem. Find the optimal values of x and y and the value of the Lagrange multiplier. Check the second-order conditions.
(b) The solution depends on p, q and u∗, so write x(p, q, u∗), y(p, q, u∗), λ(p, q, u∗). The expenditure required to achieve utility level u∗ given prices p and q is therefore e(p, q, u∗) = px(p, q, u∗) + qy(p, q, u∗). Then

∂e(p, q, u∗)/∂p and ∂e(p, q, u∗)/∂q

give the change in minimum income required to reach utility level u∗ when the price of x and y change. Show that

∂e(p, q, u∗)/∂p = x(p, q, u∗) and ∂e(p, q, u∗)/∂q = y(p, q, u∗).
13 Inequality-constrained optimization
13.1 Introduction

This chapter considers the problem of maximizing or minimizing a function f: Rⁿ → R subject to inequality constraints such as g(x) ≤ c or g(x) ≥ c. This type of problem arises frequently in economics. For example, maximizing output y = f(x1, x2) subject to an expenditure constraint, g(x1, x2) = w1 x1 + w2 x2 ≤ c, is a typical problem.
In Section 13.2, the constrained optimization problem is formally defined and the main result (Theorem 13.1) is given. All the results in this chapter are derived from this theorem and the constraint qualification. Section 13.2.1 discusses the constraint qualification, a (mild) condition that the derivatives of the constraints must satisfy to guarantee that the "standard" first-order conditions are necessary for a local optimum. In Section 13.2.2, the complementary slackness condition is discussed: on nonbinding constraints we can take the Lagrange multipliers to be 0. Section 13.2.3 discusses the logical equivalence of the maximization and minimization programs. In Section 13.2.4, conditions are given under which a local optimum is also a global optimum. In Section 13.2.5, problems with nonnegativity constraints are considered. In this case, the necessary conditions for a local optimum may be expressed in terms of the Kuhn–Tucker conditions, and this is done in Theorems 13.5 and 13.6, which are derived directly from the main result, Theorem 13.1. The remaining sections deal with a few miscellaneous topics.
First, Section 13.1.1 gives an introductory example that motivates the key characterizing condition for a solution: at the solution, the gradient of the objective may be written as a linear combination of the gradients of the constraints.
13.1.1 A motivating example

To develop some of the issues, consider the following optimization example.
EXAMPLE 13.1:

max x1 x2 subject to: x1² + x2² ≤ 1, x2 ≤ αx1.

Thus, the objective is f(x1, x2) = x1 x2, and these constraints may be written g1(x1, x2) = x1² + x2² − 1 ≤ 0 and g2(x1, x2) = −αx1 + x2 ≤ 0. The figure depicts the problem for two different values of the parameter α.

[Figure: two panels. Top panel: the line x2 = α̃x1 and the circle x1² + x2² = 1, with the solution xᵃ on the circle at the level curve x1 x2 = 0.5. Bottom panel: the line x2 = α̂x1 and the circle x1² + x2² = 1, with the solution xᵇ at the intersection of the two constraints, on the level curve x1 x2 = 0.3448.]
In the figures, constraint g2 is depicted for two possible values for α: α̃ = 2 > 1 > 4/10 = α̂. At parameter value α̃, only the first constraint is binding and the solution occurs at xᵃ = (√(1/2), √(1/2)). At parameter value α̂, both constraints are binding and the solution occurs at xᵇ. Let g̃2(x1, x2) = −α̃x1 + x2, and ĝ2(x1, x2) = −α̂x1 + x2.

At parameter value α̃, the constraints are g1(x1, x2) ≤ 0 and g̃2(x1, x2) ≤ 0, and the solution, xᵃ, satisfies g1(xᵃ) = 0, g̃2(xᵃ) < 0. Constraint g1 is active or binding at xᵃ, but constraint g̃2 is not active at xᵃ, because the constraint is not binding there. At parameter value α̂, the constraints are g1(x1, x2) ≤ 0 and ĝ2(x1, x2) ≤ 0, and, at the solution, xᵇ satisfies g1(xᵇ) = 0, ĝ2(xᵇ) = 0. Both constraints g1 and ĝ2 are active at xᵇ. Depending on circumstance, a constraint may or may not be binding, so the equations defining the solution must reflect this possibility.

In the case where α = α̃, observe that the slope of the objective is equal to the slope of the active constraint. In the case where α = α̂, the slope of the objective lies between the slopes of the constraints. The point xᵇ is located at the solution to the two constraints x2 = 0.4x1 and x1² + x2² = 1: xᵇ = (x1, x2) = (0.9285, 0.3714). The slope of constraint 2 is 0.4 and the slope of constraint 1 is −x1/x2 = −0.9285/0.3714 = −2.5. The slope of the objective is −x2/x1 = −0.3714/0.9285 = −0.4. Thus, since the slope of the objective lies between the slopes of the two constraints, the slope of the objective may be written as a weighted average of the slopes of the constraints.
♦
That the slope of the objective lies between the slopes of the constraints is an important point. Remark 13.1 clarifies this point.

REMARK 13.1: Rather than express conditions in terms of slopes, it is standard practice to use the normal to a curve to represent the slope. Thus, with f(x1, x2) = x1 x2 evaluated at xᵇ,

∇f ≝ (f_{x1}, f_{x2}) = (x2, x1) = (0.3714, 0.9285)

is the normal to the curve f(x1, x2) at (x1, x2). As a vector, this has slope x1/x2, which is the negative of the reciprocal of the slope of the objective f(x1, x2) = x1 x2, given by dx2/dx1 = −x2/x1. Similarly, the normal of the linear constraint g2 is ∇g2 = (∂g2/∂x1, ∂g2/∂x2) = (−0.4, 1) and the normal of the quadratic constraint g1 is ∇g1 = (∂g1/∂x1, ∂g1/∂x2) = (2x1, 2x2) = (1.857, 0.7428).
[Figure: at xᵇ, the vectors xᵇ + ∇g1, xᵇ + ∇f and xᵇ + ∇g2 all point outward from the feasible region, with the normal to f lying between the normals to g1 and g2.]

The slopes of the normals of the constraints are, respectively, x2/x1 = 0.4 (g1) and 1/(−0.4) = −2.5 (g2). The slope of the objective normal is x1/x2 = 2.5. An arc starting with 0 slope first reaches the normal to g1, then the normal to f, and finally the normal to g2. Thus, the normal of the objective may be expressed as a positive linear combination of the constraint normals:

$$\nabla f = \lambda_1 \nabla g_1 + \lambda_2 \nabla g_2:
\qquad
\begin{pmatrix} 0.3714 \\ 0.9285 \end{pmatrix} = 0.3448\begin{pmatrix} 1.8570 \\ 0.7428 \end{pmatrix} + 0.6724\begin{pmatrix} -0.4 \\ 1 \end{pmatrix}. \tag{13.1}$$
The observations in Example 13.1 apply for the general case.
13.1.2 Overview

Given a function h(x) = h(x1, ..., xn), the normal to the function at x is the gradient of the function at x, ∇h = (h_{x1}, h_{x2}, ..., h_{xn}). The key characterizing condition for an optimum is that the gradient (normal) of the objective may be written as a weighted sum of the gradients of the constraints. With k constraints {g_j}, the condition is ∇f = Σ_{j=1}^{k} λj∇gj, with λj ≥ 0 for all j, as in Theorem 13.1.

Provided a constraint qualification is satisfied, this condition, ∇f = Σ_{j=1}^{k} λj∇gj with λj ≥ 0 for all j, along with complementary slackness conditions, determines a system of equations that may be solved to provide necessary conditions for a local optimum. All results here are derived from these conditions. The constraint qualification is discussed in Section 13.2.1; complementary slackness is discussed in Section 13.2.2. Sufficiency conditions are discussed in Section 13.2.4: concavity of the objective and convexity of the set of feasible points imply that, given the constraint qualification, a solution to the necessary conditions yields a global maximum for the constrained problem. Finally, the necessary conditions can be specialized to include nonnegativity conditions, and this is done in Section 13.2.5.
13.2 Optimization

Since a constraint of the form g(x) ≤ c may be written as g(x) − c ≤ 0, and a constraint of the form g(x) ≥ c may be written as g(x) − c ≥ 0 or as −g(x) ≤ −c, we can express all such constraints in the form g(x) ≤ 0 (or g(x) ≥ 0). Equality constraints may be dealt with in the same way, since g(x) = c is equivalent to g(x) ≤ c and g(x) ≥ c, but at the cost of having two constraints instead of one. Similarly, nonnegativity constraints may be expressed in this way; setting g(x) = −x, the condition g(x) ≤ 0 implies that x ≥ 0.
An inequality-constrained maximization problem is given as:

max f(x) subject to gj(x) ≤ 0, j = 1, ..., r.   (13.2)

The function f(x) is the objective function and gj, j = 1, ..., r, the set of constraints. Feasible points and active constraints are defined next.
Definition 13.1: A point x∗ is feasible if it satisfies all of the constraints: gj(x∗) ≤ 0, ∀j. A constraint gj is active at a feasible point x∗ if gj(x∗) = 0, and otherwise it is inactive. Given the feasible point x∗, let A(x∗) be the set of active constraints at x∗: j ∈ A(x∗) if and only if gj(x∗) = 0.

A point x∗ is a constrained local maximum if there is no alternative point "close" to x∗ that satisfies the constraints and gives a higher value for the objective. Formally:

Definition 13.2: A point x∗ is a constrained local maximum of f if there is a neighborhood of x∗ (an open set in Rⁿ containing x∗), N(x∗), such that f(x∗) ≥ f(x̄) for all x̄ ∈ N(x∗) ∩ {x | g_l(x) ≤ 0, l = 1, ..., r}.
The following discussion identifies necessary conditions for a point x∗ to be a constrained local maximum. The discussion makes use of Farkas' lemma (see Remark 13.2), applied to the following equation system:

$$\begin{pmatrix}
\frac{\partial g_1(x^*)}{\partial x_1} & \frac{\partial g_2(x^*)}{\partial x_1} & \cdots & \frac{\partial g_r(x^*)}{\partial x_1} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial g_1(x^*)}{\partial x_n} & \frac{\partial g_2(x^*)}{\partial x_n} & \cdots & \frac{\partial g_r(x^*)}{\partial x_n}
\end{pmatrix}
\begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_r \end{pmatrix}
= \begin{pmatrix} \frac{\partial f(x^*)}{\partial x_1} \\ \vdots \\ \frac{\partial f(x^*)}{\partial x_n} \end{pmatrix}. \tag{13.3}$$

Each of the i rows may be read as:

$$\frac{\partial f(x^*)}{\partial x_i} = \sum_{j=1}^{r} \lambda_j \frac{\partial g_j(x^*)}{\partial x_i}, \quad i = 1, \ldots, n. \tag{13.4}$$
This system of equations is central to the characterization of necessary conditions for an optimum. For ease of notation, write ∇f(x∗) for the term on the right side of equality 13.3, ∇g for the n × r matrix on the left side and ∇g_l(x∗) for the l-th column of ∇g(x∗). With λ = (λ1, ..., λr), equation system 13.3 may be written:

∇g(x∗) · λ = ∇f(x∗).   (13.5)

This equation system plays a central role in inequality-constrained optimization. Subject to a qualification, if the equation system does not have a nonnegative solution, then the point x∗ is not a local optimum. Theorem 13.1 gives Equation 13.5 as a necessary condition for optimality, provided a "constraint qualification" (∇gj(x∗) · z < 0 for some z and every binding constraint j) is satisfied.

Theorem 13.1: Given the program 13.2, suppose that at x∗ there is some z with ∇gj(x∗) · z < 0 for each active constraint j. If x∗ is a constrained local maximum, then

$$\frac{\partial f(x^*)}{\partial x_i} = \sum_{j=1}^{r} \lambda_j \frac{\partial g_j(x^*)}{\partial x_i}, \quad i = 1, \ldots, n,$$

must have a nonnegative solution, λ = (λ1, ..., λr), λj ≥ 0, j = 1, ..., r.
PROOF: If Equation 13.3 has no nonnegative solution, then from Farkas' lemma (Remark 13.2), ∃z∗ ∈ Rⁿ such that:

$$\sum_{i=1}^{n} \frac{\partial g_j(x^*)}{\partial x_i} z_i^* \le 0 \ \text{ for } \ j = 1, \ldots, r, \quad \text{and} \quad \sum_{i=1}^{n} \frac{\partial f(x^*)}{\partial x_i} z_i^* > 0. \tag{13.6}$$

More succinctly, ∇gj(x∗) · z∗ ≤ 0, ∀j = 1, ..., r, and ∇f(x∗) · z∗ > 0. By assumption, there is some z with ∇gj(x∗) · z < 0 for each j ∈ A(x∗) (that is, Σᵢ [∂gj(x∗)/∂xᵢ] zᵢ < 0 for all j ∈ A(x∗)). Let zα = (1 − α)z + αz∗, so that ∇gj(x∗) · zα < 0 for each j ∈ A(x∗). Consider x∗ + δzα where δ > 0 is small, and note that lim_{δ→0} (1/δ)[f(x∗ + δzα) − f(x∗)] = ∇f(x∗) · zα > 0 when α is close to 1. Similarly, for each j ∈ A(x∗), lim_{δ→0} (1/δ)[gj(x∗ + δzα) − gj(x∗)] = ∇gj(x∗) · zα < 0. For the inactive constraints, j ∉ A(x∗), gj(x∗) < 0, so that for δ close to 0, gj(x∗ + δzα) < 0, j = 1, ..., r. Consequently, there is an α close to 1 and a δ close to 0 such that f(x∗ + δzα) − f(x∗) > 0 and gj(x∗ + δzα) < 0, j = 1, ..., r. Summarizing: given that at x∗ there is some z with ∇gj(x∗) · z < 0 for each active constraint j, if Equation 13.3 does not have a nonnegative solution, then x∗ cannot be a constrained local maximum.
REMARK 13.2: According to Farkas' lemma, if A is an n × r matrix, then exactly one of the following conditions holds:

1. Ax = b has a solution x ≥ 0, or
2. there exists y with A′y ≤ 0 and b′y > 0.

That is, if (1) is satisfied, (2) cannot be satisfied; and if (1) is not satisfied, then (2) must be satisfied.
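The two branches of Farkas' lemma can be illustrated with a non-negative least squares solve: a zero residual exhibits case (1), while a positive residual means case (2) must hold, and a certificate y can then be exhibited. A hedged sketch (added here; it assumes SciPy's nnls, and the matrices are illustrative choices of ours):

```python
import numpy as np
from scipy.optimize import nnls

A = np.array([[1.0, 0.0],
              [0.0, 1.0]])

# Case (1): Ax = b has a nonnegative solution
x, resid = nnls(A, np.array([1.0, 2.0]))
print(resid < 1e-10, (x >= 0).all())                 # True True

# Case (2): no nonnegative solution, so a separating y must exist
b = np.array([-1.0, 2.0])
_, resid = nnls(A, b)
y = np.array([-1.0, 0.0])                            # certificate: A'y <= 0, b'y > 0
print(resid > 0, (A.T @ y <= 0).all(), b @ y > 0)    # True True True
```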
The following remarks clarify a few points regarding this result, Theorem 13.1. Remark 13.3 explains how nonnegativity constraints are imposed; Remark 13.4 discusses the determinacy of the system (in terms of equations and unknowns) and Remark 13.5 provides some insight into the sign of the multiplier.

REMARK 13.3: Observe that the program given by expression 13.2, max f(x) subject to gj(x) ≤ 0, j = 1, ..., r, includes the possibility of imposing nonnegativity constraints. For example, if gk(x) = −xk, then gk(x) ≤ 0 is equivalent to −xk ≤ 0, or xk ≥ 0. In Section 13.2.5, a special case is considered in which some of the constraints are nonnegativity conditions. There, the Lagrangian together with complementary slackness conditions are used to define the necessary conditions for an optimum. This gives the Kuhn–Tucker conditions for constrained optimization with inequality constraints.

REMARK 13.4: Regarding the solution of these equation systems, note that in Equation 13.3 (or Equations 13.4 or 13.5), there are n equations in n + r variables (x, λ). In addition, there are r inequalities gj(x) ≤ 0, j = 1, ..., r, to be satisfied at a solution. Complementary slackness, discussed in Section 13.2.2, asserts that the λj may be chosen to satisfy λj gj = 0 for each j, giving an additional r equations. Provided the constraint qualification is satisfied, a necessary condition for an optimum is that the system ∇g(x) · λ = ∇f(x) and λj gj(x) = 0, j = 1, ..., r, of n + r equations in n + r unknowns has a solution (x1, ..., xn; λ1, ..., λr) with λj ≥ 0, ∀j = 1, ..., r.
REMARK 13.5: A motivation for the sign of the Lagrange multiplier may be given by considering Figure 13.1. At the solution, x1∗ = x2∗, ∇f(x∗) = (x2∗, x1∗), and ∇g(x∗) = (2x1∗, 2x2∗), so that ∇f(x∗) = λ∇g(x∗) with λ = 1/2. Notice that ∇g points out from the feasible region, as does ∇f. Thus, the direction of improvement of f is the same as the direction of infeasible choices. This is necessary for the candidate solution to be an optimum. This fact is reflected in the expression ∇f = λ∇g with λ > 0.

[Figure 13.1: Sign of the Lagrange multiplier. At x∗ = (x1∗, x2∗) on the constraint g(x1, x2) = x1² + x2² = c, both x∗ + ∇g and x∗ + ∇f point out of the feasible region; the level curve f(x1, x2) = x1 x2 = ū is tangent to the constraint at x∗.]
The following example illustrates the application of Theorem 13.1 by showing how the necessary condition for an optimum is satisfied at the solution.

EXAMPLE 13.2: Consider the following problem:

max f(x1, x2) = x1 x2 subject to:
g1(x1, x2) = x2 + x1² − c ≤ 0
g2(x1, x2) = x2 − α√x1 ≤ 0.

[Figure: the constraint curves g1 = 0 and g2 = 0, the level curve f = k, the gradients ∇f, ∇g1 and ∇g2 at the intersection of the constraints, and a direction z satisfying the constraint qualification.]

Taking the parameter values as c = 4 and α = 1, the constraints intersect at (x1, x2) = (1.648, 1.284). Let f(x1, x2) = x1 x2, so that at (x1, x2) = (1.648, 1.284),

$$\nabla f = \begin{pmatrix} x_2 \\ x_1 \end{pmatrix} = \begin{pmatrix} 1.284 \\ 1.648 \end{pmatrix}; \quad
\nabla g_1 = \begin{pmatrix} 2x_1 \\ 1 \end{pmatrix} = \begin{pmatrix} 3.296 \\ 1 \end{pmatrix}; \quad
\nabla g_2 = \begin{pmatrix} -\frac{1}{2}\frac{1}{\sqrt{x_1}} \\ 1 \end{pmatrix} = \begin{pmatrix} -0.389 \\ 1 \end{pmatrix}.$$

Therefore,

∇f = λ1∇g1 + λ2∇g2 = 0.5224∇g1 + 1.1256∇g2.

So, (x1, x2; λ1, λ2) = (1.648, 1.284, 0.5224, 1.1256) satisfies the necessary conditions for a maximum. ♦
REMARK 13.6: Note that the vector z′ = (−0.3, −0.6) satisfies ∇g1 · z = −1.589 and ∇g2 · z = −0.483. Thus, the constraint qualification is satisfied. The constraint qualification, if satisfied, ensures that the requirement that ∇g(x*)·λ = ∇f(x*) have a nonnegative solution λ is a necessary condition for a local optimum.
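The numbers in Example 13.2 and the inner products in Remark 13.6 are easy to reproduce. The following Python sketch (illustrative only; the bisection helper and variable names are not from the text) locates the intersection of the two constraints and recovers the multipliers by Cramer's rule:

```python
import math

def bisect(f, lo, hi, tol=1e-12):
    """Bisection root-finder; assumes f(lo) and f(hi) have opposite signs."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

c, alpha = 4.0, 1.0
# intersection of g1 = 0 and g2 = 0: c - x1^2 = alpha*sqrt(x1)
x1 = bisect(lambda t: (c - t * t) - alpha * math.sqrt(t), 1.0, 2.0)
x2 = alpha * math.sqrt(x1)                       # ~ (1.648, 1.284)

grad_f = (x2, x1)                                # f = x1*x2
grad_g1 = (2.0 * x1, 1.0)                        # g1 = x2 + x1^2 - c
grad_g2 = (-alpha / (2.0 * math.sqrt(x1)), 1.0)  # g2 = x2 - alpha*sqrt(x1)

# solve grad_f = lam1*grad_g1 + lam2*grad_g2 by Cramer's rule
det = grad_g1[0] * grad_g2[1] - grad_g2[0] * grad_g1[1]
lam1 = (grad_f[0] * grad_g2[1] - grad_g2[0] * grad_f[1]) / det
lam2 = (grad_g1[0] * grad_f[1] - grad_f[0] * grad_g1[1]) / det
# lam1 ~ 0.5224 and lam2 ~ 1.1256, both nonnegative, as required

# Remark 13.6: z = (-0.3, -0.6) has a negative inner product with both gradients
z = (-0.3, -0.6)
cq1 = grad_g1[0] * z[0] + grad_g1[1] * z[1]      # ~ -1.589
cq2 = grad_g2[0] * z[0] + grad_g2[1] * z[1]      # ~ -0.483
```

Both multipliers come out nonnegative and agree with the text's values to the stated precision, and both inner products with z are strictly negative, confirming the constraint qualification.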
The next example, Example 13.3, considers the quasilinear utility function and maximization of utility on a constraint set that is not convex. The example illustrates the key necessary condition, satisfied at the solution to the program. However, this example is reconsidered later (Example 13.11), where the nonconvexity permits the necessary conditions to be satisfied at a nonoptimal point.

EXAMPLE 13.3: Consider the problem of maximizing the function f(x, y) = α ln x + y, subject to the constraints g1(x, y) = y² + (x − 1)² ≤ 2 and g2(x, y) = xy ≤ k = 1.12. For the calculations, take α = 6/10.

[Figure: the circle (x − 1)² + y² = 2 and the curve xy = k, with the gradients ∇f, ∇g1 and ∇g2 at the solution, and the level curve (6/10)·ln x + y = β.]

The solution occurs at x = 8/10 and y = 14/10. At these values, xy = k = 1.12 and (6/10)ln(x) + y = β = 1.266. Note that

∇f(x, y) = ((6/10)(1/x), 1),  ∇g1(x, y) = (2(x − 1), 2y),  ∇g2(x, y) = (y, x).

At x = 8/10, y = 14/10, forming Equation 13.3 gives:

(6/8, 1) = λ1(−4/10, 28/10) + λ2(14/10, 4/10),

which has solution λ1 = 55/204 and λ2 = 125/204. This example is considered further in Section 13.2.6, where a second tangency point is identified, which fails to be a local optimum. ♦
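The fractions 55/204 and 125/204 can be recovered exactly. A short sketch using Python's fractions module (an assumed, illustrative calculation, not part of the text) solves the two stationarity equations by Cramer's rule in exact rational arithmetic:

```python
from fractions import Fraction as F

# gradients at (x, y) = (8/10, 14/10) with alpha = 6/10
grad_f = (F(6, 8), F(1))          # ((6/10)(1/x), 1) at x = 8/10
grad_g1 = (F(-4, 10), F(28, 10))  # (2(x - 1), 2y)
grad_g2 = (F(14, 10), F(4, 10))   # (y, x)

# solve grad_f = lam1*grad_g1 + lam2*grad_g2 exactly (Cramer's rule)
det = grad_g1[0] * grad_g2[1] - grad_g2[0] * grad_g1[1]
lam1 = (grad_f[0] * grad_g2[1] - grad_g2[0] * grad_f[1]) / det
lam2 = (grad_g1[0] * grad_f[1] - grad_f[0] * grad_g1[1]) / det
# lam1 == Fraction(55, 204) and lam2 == Fraction(125, 204), both positive
```

Exact arithmetic confirms the text's multipliers with no rounding, and both are strictly positive, so the necessary conditions hold at (8/10, 14/10).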
One feature of quasilinear preferences is that demand for one of the goods (the one entering the utility function nonlinearly) depends only on the price ratio. If income is not sufficiently large, the expenditure this demand requires may exceed income. This in turn implies that demand for the other good is negative. To avoid this outcome, it is necessary to impose nonnegativity constraints on the variable choices, as is done in Example 13.4.
EXAMPLE 13.4: Consider the problem of maximizing f(x, y) = x^(1/2) + y subject to g1(x, y) = px + qy − m ≤ 0, g2(x, y) = −x ≤ 0 and g3(x, y) = −y ≤ 0. The gradients are:

∇f = ((1/2)(1/√x), 1),  ∇g1 = (p, q),  ∇g2 = (−1, 0),  ∇g3 = (0, −1).

From the graph, the solution is at (x*, y*) = (m/p, 0). Consider the expression:

∇f = λ1∇g1 + λ2∇g2 + λ3∇g3,

[Figure: the budget line px + qy = m with intercepts m/q and m/p; at the solution (m/p, 0), the vectors ∇f, λ1∇g1 and λ3∇g3 are shown, together with the constraint gradients ∇g1, ∇g2, ∇g3.]

or

((1/2)(1/√(m/p)), 1) = λ1(p, q) + λ2(−1, 0) + λ3(0, −1).

Because the normal to the budget constraint is steeper than the normal to the indifference curve, the second coordinate of λ1·(p, q) will exceed the second coordinate of ∇f (which is 1), as depicted in the figure. Choose λ3 to reduce the second coordinate to 1. Thus, (1/2)(1/√(m/p)) = λ1 p, or λ1 = (1/2)(1/p)(1/√(m/p)) = (1/2)(1/√(mp)). Then λ1 q = (1/2)(q/√(mp)). Ignoring the nonnegativity condition, the optimal x value would be (1/4)(q²/p²), and this is greater than m/p. Thus, (1/4)(q²/p) > m, and so (1/2)(q/√(pm)) > 1. Putting λ3 = (1/2)(q/√(pm)) − 1 and λ2 = 0 completes the determination of the λ_i. This example is considered further in Example 13.10. ♦
The constraint qualification is discussed next. If the gradient of the objective is to be written as a weighted average of the gradients of the constraints, ∇f = Σ λ_j ∇g_j, this requires sufficient "variability" across the different constraints. The constraint qualification concerns this variation.
13.2.1 The constraint qualification
In Theorem 13.1, the constraint qualification requires that there exists some z with ∇g_j(x*)·z < 0 for each active constraint j. An alternative constraint qualification is given in terms of the linear independence of the gradients of the active constraints. Let ∇_A(x*) g(x*) be the matrix obtained from the matrix ∇g(x*) by deleting the columns corresponding to inactive constraints. Suppose that ∇_A(x*) g(x*) has rank equal to the number of active constraints – the gradients of the active constraints are linearly independent (so the number of active constraints must be no greater than n). Then there exists some z with ∇g_j(x*)·z < 0 for each active constraint j. To see this, let γ = (−1, −1,..., −1), a vector with the same number of entries as there are active constraints. Since ∇_A(x*) g(x*) has full column rank, there is some z ∈ R^n with z′∇_A(x*) g(x*) = γ. Thus, for each active constraint j, ∇g_j(x*)·z = −1 < 0.
The following examples illustrate the constraint qualification.

EXAMPLE 13.5: Consider the pair of constraints:

g1(x1, x2) = x2 − √x1 ≤ 0
g2(x1, x2) = x1² − 1 − x2 ≤ 0.

The two curves, g1(x1, x2) = 0 and g2(x1, x2) = 0, intersect at (x1, x2) = (1.49, 1.22).

[Figure: the curves g1 = 0 and g2 = 0 with the gradients ∇g1, ∇g2 and ∇f at their intersection; the shaded feasible region lies between the curves in the nonnegative orthant.]

The shaded area depicts the set of points in the nonnegative orthant satisfying the two constraints. The gradient of f, ∇f, lies in the cone determined by ∇g1 and ∇g2, so that ∇f may be written ∇f = λ1∇g1 + λ2∇g2, with λ1, λ2 nonnegative. ♦
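The construction described above – solving z′∇_A g(x*) = γ with γ = (−1, −1) – can be carried out concretely for the gradients of Example 13.5 at the intersection point. A hedged sketch (the point (1.49, 1.22) is taken from the example; everything else is illustrative):

```python
import math

x1 = 1.49                                  # intersection point from Example 13.5
g1 = (-1.0 / (2.0 * math.sqrt(x1)), 1.0)   # gradient of x2 - sqrt(x1)
g2 = (2.0 * x1, -1.0)                      # gradient of x1^2 - 1 - x2

# solve g1 . z = -1 and g2 . z = -1 (i.e. z' [g1 g2] = (-1, -1)) by Cramer's rule
det = g1[0] * g2[1] - g1[1] * g2[0]
z = ((g1[1] - g2[1]) / det, (g2[0] - g1[0]) / det)

# both inner products equal -1, strictly negative: the qualification holds
d1 = g1[0] * z[0] + g1[1] * z[1]
d2 = g2[0] * z[0] + g2[1] * z[1]
```

Because the two gradients are linearly independent (det ≠ 0), the linear system always has a solution, and the resulting z makes both inner products equal to −1, exactly as the rank argument predicts.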
13.2.1.1 Failure of the constraint qualification
When the constraint qualification fails, the representation in Equation 13.4 (or 13.5) may be impossible to satisfy at the solution to the problem, x*, even though that solution is an optimum. The following example illustrates.
EXAMPLE 13.6: The figure plots the constraints determined by two circles of radius √8:

g1(x) = (x1 − 3)² + (x2 − 3)² ≤ 8
g2(x) = (x1 − 7)² + (x2 − 7)² ≤ 8.

They have only one point in common: x = (5, 5).

[Figure: the two circles (x1 − 3)² + (x2 − 3)² = 8 and (x1 − 7)² + (x2 − 7)² = 8, tangent at (5, 5); the vectors (1/2)∇g1(5, 5) and (1/2)∇g2(5, 5) point in opposite directions, perpendicular to the common tangent line L.]

The constraints have gradients:

∇g1(x) = (2(x1 − 3), 2(x2 − 3)),  ∇g2(x) = (2(x1 − 7), 2(x2 − 7)).

The figure depicts these for x = (5, 5):

∇g1(5, 5) = (4, 4),  ∇g2(5, 5) = (−4, −4).

Thus, for any λ1, λ2,

λ1∇g1(5, 5) + λ2∇g2(5, 5) = (λ1 − λ2)(4, 4) = β(4, 4).

The line L passes through (5, 5). From the figure, note that {(x1, x2) | ∇g1(5, 5)·(x1, x2) ≤ 0} is the set of points on or below the line L. Likewise, {(x1, x2) | ∇g2(5, 5)·(x1, x2) ≤ 0} is the set of points on or above the line L. There is no point z with ∇g_i(5, 5)·z < 0 for i = 1, 2. Note also that ∇g1(5, 5) and ∇g2(5, 5) are linearly dependent: ∇g1(5, 5) = (4, 4) and ∇g2(5, 5) = (−4, −4). Unless ∇f(5, 5) = (α, α) for some number α, it is impossible to write ∇f(5, 5) = λ1∇g1(5, 5) + λ2∇g2(5, 5), even though the maximization problem has a solution at (x1*, x2*) = (5, 5), the only feasible point. ♦
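The failure can be confirmed by brute force: with ∇g1(5, 5) = (4, 4) and ∇g2(5, 5) = (−4, −4), no direction makes both inner products negative, since one gradient is the exact negative of the other. A small illustrative sketch:

```python
import math

g1 = (4.0, 4.0)      # gradient of g1 at (5, 5)
g2 = (-4.0, -4.0)    # gradient of g2 at (5, 5)

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

# linear dependence: the 2x2 determinant of [g1; g2] vanishes
det = g1[0] * g2[1] - g1[1] * g2[0]

# sweep 360 unit directions: none satisfies both g1.z < 0 and g2.z < 0,
# because g2 = -g1 forces the two inner products to have opposite signs
bad = [a for a in range(360)
       if dot(g1, (math.cos(math.radians(a)), math.sin(math.radians(a)))) < 0
       and dot(g2, (math.cos(math.radians(a)), math.sin(math.radians(a)))) < 0]
```

The sweep returns an empty list: the constraint qualification fails at (5, 5), exactly as the example argues geometrically.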
Considering Equation 13.5 in the context of Example 13.6, the matrix ∇g(x*) is singular and solving for (nonnegative) λ in ∇g(x*)·λ = ∇f(x*) is impossible. The next example illustrates the same point – singularity of ∇g(x*) makes it impossible to satisfy ∇g(x*)·λ = ∇f(x*). However, in the example, the shape of the feasible region seems "regular"; the problem arises because one of the gradients vanishes at the candidate solution point.
EXAMPLE 13.7: Let g1(x1, x2) = (x1 + x2 − 1)³, g2(x1, x2) = x2 and f(x1, x2) = x1x2. Consider the two constraints g1(x1, x2) ≤ 0 and g2(x1, x2) ≤ 1/2. Satisfying constraint 1 requires that (x1 + x2 − 1)³ ≤ 0, or (x1 + x2 − 1) ≤ 0, or x1 + x2 ≤ 1.

The gradients are:

∇f(x1, x2) = (x2, x1),  ∇g1(x1, x2) = (3(x1 + x2 − 1)², 3(x1 + x2 − 1)²),  ∇g2(x1, x2) = (0, 1).

[Figure: the feasible region below the line x1 + x2 = 1 and below the line x2 = 1/2, with ∇g2 and ∇f drawn at the solution (x̄1, x̄2).]

So, the set of (x1, x2) pairs satisfying both constraints are those pairs with x1 + x2 ≤ 1 and with x2 ≤ 1/2. The solution occurs at (x̄1, x̄2) = (1/2, 1/2), where the gradients are

∇f(1/2, 1/2) = (1/2, 1/2),  ∇g1(1/2, 1/2) = (0, 0),  ∇g2(1/2, 1/2) = (0, 1).

It is impossible to write ∇f(1/2, 1/2) as a linear combination of ∇g1(1/2, 1/2) and ∇g2(1/2, 1/2). Here, again, the matrix ∇g(x*) is singular at the candidate solution point. One might remark that a respecification of the constraint g1(x1, x2) ≤ 0 to the equivalent ĝ1(x1, x2) = x1 + x2 − 1 ≤ 0 provides two linearly independent constraint gradient vectors. ♦
EXAMPLE 13.8: The following example illustrates yet another case where the necessary conditions are not satisfied. The constraints are g1(x, y) = y − (1 − x)³ ≤ 0 and g2(x, y) = −y ≤ 0. Thus,

∇g1 = (∂g1/∂x, ∂g1/∂y) = (3(1 − x)², 1)  and  ∇g2 = (∂g2/∂x, ∂g2/∂y) = (0, −1).

[Figure: the curve y = (1 − x)³ and the feasible region between it and the horizontal axis, with ∇g1 and ∇g2 drawn near the point (1, 0).]

So, at (x, y) = (1, 0), ∇g1 = (0, 1), and unless ∇f = α(0, 1) for some real number α, there is no solution to the expression ∇f = λ1∇g1 + λ2∇g2. ♦
From Theorem 13.1, subject to the constraint qualification, a necessary condition for x* to be a solution is that

∂f(x*)/∂x_i = Σ_{j=1}^r λ_j ∂g_j(x*)/∂x_i, i = 1,..., n,

must have a nonnegative solution λ = (λ1,..., λ_r), λ_j ≥ 0, j = 1,..., r. Remark 13.4 notes that the Lagrange multipliers and associated constraints may be chosen to satisfy λ_j g_j = 0: at the solution, if the multiplier is positive, the corresponding constraint is binding (λ_j > 0 implies g_j = 0); and, if the constraint is not binding, the corresponding multiplier is 0 (g_j < 0 implies λ_j = 0). Imposing this requirement yields a system of n + k equations (∇f = Σ_{j=1}^k λ_j∇g_j gives n equations and λ_j g_j = 0, j = 1,..., k, gives k equations) in n + k unknowns, ({x_i}_{i=1}^n, {λ_j}_{j=1}^k). The following discussion confirms the validity of imposing the complementary slackness conditions.
13.2.2 Complementary slackness
Consider the program given in Equation 13.2 and let x* be a constrained local maximum satisfying the necessary conditions of Theorem 13.1 with Lagrange multipliers {λ_j}_{j=1}^k. At x*, the active constraints are A(x*) = {l | g_l(x*) = 0} and the inactive constraints A(x*)^c = {l | g_l(x*) < 0}. There is an open neighborhood of x*, O(x*), with g_l(x) < 0 for any l ∈ A(x*)^c and x ∈ O(x*).

Consider the modified program obtained from Equation 13.2 by deleting the constraints not binding at x*:

max f(x) subject to g_j(x) ≤ 0, j ∈ A(x*).  (13.7)

For this program, x* is a constrained local maximum – otherwise, in any neighborhood of x* there is an x′ satisfying the constraints A(x*) and with f(x′) > f(x*), violating the assumption that x* is a constrained local maximum of the original program. Furthermore, for this program the constraint qualification is satisfied at x* (if the qualification was satisfied at x* for the original program). Therefore (applying Theorem 13.1 to the modified program), there exists λ̃_j ≥ 0, j ∈ A(x*), with:

∂f(x*)/∂x_i = Σ_{j∈A(x*)} λ̃_j ∂g_j(x*)/∂x_i.

For j ∉ A(x*), let λ̃_j = 0 so that:

∂f(x*)/∂x_i = Σ_{j=1}^r λ̃_j ∂g_j(x*)/∂x_i.

The condition λ̃_j g_j = 0 for all j = 1,..., r is called the complementary slackness condition. Consequently, we can refine Theorem 13.1 with complementary slackness conditions:

Theorem 13.2: Given the program 13.2, suppose that at x* there is some z with ∇g_l(x*)·z < 0 for each active constraint l. Then, if x* is a constrained local maximum, there exist λ_j ≥ 0, j = 1,..., r, with:

∂f(x*)/∂x_i = Σ_{j=1}^r λ_j ∂g_j(x*)/∂x_i, i = 1,..., n,  (13.8)

and where λ_j = 0 if g_j(x*) < 0 (equivalently, λ_j g_j(x*) = 0, ∀j = 1,..., r). So, in expression 13.8 there are n equations, and the equations λ_j g_j(x*) = 0, ∀j = 1,..., r, give n + r equations in n + r unknowns (see Remark 13.4).
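The requirements of Theorem 13.2 – stationarity, feasibility, nonnegativity of the multipliers and complementary slackness – can be bundled into a small checking routine. The sketch below is illustrative (the function name and tolerance are assumptions, not from the text); it accepts the candidate from Example 13.2 and rejects arbitrary multipliers:

```python
def satisfies_theorem_13_2(grad_f, grads_g, g_vals, lams, tol=1e-2):
    """Check stationarity, feasibility, lam >= 0 and lam_j * g_j = 0 at a candidate."""
    n, r = len(grad_f), len(g_vals)
    stationary = all(
        abs(grad_f[i] - sum(lams[j] * grads_g[j][i] for j in range(r))) <= tol
        for i in range(n))
    feasible = all(g <= tol for g in g_vals)
    nonneg = all(l >= -tol for l in lams)
    comp_slack = all(abs(l * g) <= tol for l, g in zip(lams, g_vals))
    return stationary and feasible and nonneg and comp_slack

# data from Example 13.2, rounded to the text's figures: both constraints bind
ok = satisfies_theorem_13_2((1.284, 1.648), [(3.296, 1.0), (-0.389, 1.0)],
                            [0.0, 0.0], [0.5224, 1.1256])
bad = satisfies_theorem_13_2((1.284, 1.648), [(3.296, 1.0), (-0.389, 1.0)],
                             [0.0, 0.0], [1.0, 1.0])
```

The loose tolerance absorbs the rounding in the text's decimals; the second call fails because stationarity does not hold with the wrong multipliers.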
13.2.3 Minimization and maximization
The case where the objective is to minimize a function subject to constraints may be considered by adapting Theorem 13.2. Let f(x) be the function to be minimized, and specify the constraints in the form g_j(x) ≥ 0, j = 1,..., r. Observe that minimizing f(x) is the same as maximizing f̂(x) = −f(x); and that the constraint g_j(x) ≥ 0 is equivalent to −g_j(x) ≤ 0, so let ĝ_j(x) = −g_j(x) for each j. Then the program:

(A)  min f(x) subject to g_j(x) ≥ 0, j = 1,..., r

is equivalent to the program:

(B)  max f̂(x) subject to ĝ_j(x) ≤ 0, j = 1,..., r.

Considering program B, according to Theorem 13.2, if at x*, ∇ĝ_l(x*)·z < 0 for each active constraint l, then if x* is a local maximum of f̂, there are λ_j ≥ 0, j = 1,..., r, such that:

∂f̂(x*)/∂x_i = Σ_{j=1}^r λ_j ∂ĝ_j(x*)/∂x_i, i = 1,..., n,  (13.9)

and λ_j ĝ_j(x*) = 0 for all j = 1,..., r.

These conditions may be written in terms of f and g_j. Since f = −f̂ and g_j = −ĝ_j, Equation 13.9 is unchanged by multiplying by −1 to give

∂f(x*)/∂x_i = Σ_{j=1}^r λ_j ∂g_j(x*)/∂x_i, i = 1,..., n,  (13.10)

and λ_j ĝ_j(x*) = 0 is equivalent to λ_j g_j(x*) = 0. Finally, note that ∇ĝ_l(x*)·z < 0 is equivalent to ∇g_l(x*)·z > 0. Consequently,

Theorem 13.3: Suppose that at x*, there is some z such that ∇g_l(x*)·z > 0 for each active constraint l. Then, if x* is a local minimum in program A, there exist λ_j ≥ 0, j = 1,..., r, such that Equation 13.10 is satisfied and λ_j g_j(x*) = 0 for all j.
13.2.4 Global and local optima
Theorem 13.1 gives necessary conditions for a constrained local optimum. This leaves open the possibility that there is some alternative feasible x′ satisfying the constraints, for which f ( x′ ) > f ( x∗ ).
Theorem 13.4: Suppose that f is concave and the feasible region is convex. Then, given the constraint qualification, any x* satisfying Equation 13.3 solves the optimization problem 13.2.
PROOF: To see this, observe that if f is concave then for any α ∈ [0, 1],

f((1 − α)x* + αx′) ≥ (1 − α)f(x*) + αf(x′).

Rearranging gives

f(x* + α(x′ − x*)) ≥ f(x*) + α[f(x′) − f(x*)],

or

f(x* + α(x′ − x*)) − f(x*) ≥ α[f(x′) − f(x*)].

Dividing by α and taking limits gives ∇f(x*)·(x′ − x*) ≥ f(x′) − f(x*). If there is some feasible x′ with f(x′) > f(x*), then ∇f(x*)·(x′ − x*) > 0. Because the feasible set is convex, for every binding constraint, g_j(x* + α(x′ − x*)) = g_j((1 − α)x* + αx′) ≤ g_j(x*) = 0, so that ∇g_j(x*)·(x′ − x*) ≤ 0 for every binding constraint. Put z = (x′ − x*). So, 0 < ∇f(x*)·z and, for every binding constraint j, ∇g_j(x*)·z ≤ 0. According to the necessary condition for an optimum, ∇f(x*) = Σ_{j=1}^r λ_j ∇g_j(x*) with λ_j = 0 for every nonbinding constraint. Combining these inequalities:

0 < ∇f(x*)·z = [Σ_{j=1}^r λ_j ∇g_j(x*)]·z = Σ_{j=1}^r λ_j (∇g_j(x*)·z) ≤ 0.

This contradiction implies there is no such x′. An analogous result holds in the minimization case with f convex.
13.2.5 Nonnegativity constraints
Nonnegativity constraints may be accommodated in this framework by adapting Theorems 13.1 and 13.2. These calculations lead to the well-known Kuhn–Tucker conditions. The constraint x_i ≥ 0 may be written g°_i(x) = −x_i ≤ 0. Consider

max f(x) subject to g_j(x) ≤ 0, j = 1,..., r; x_i ≥ 0, i = 1,..., n,  (13.11)

or equivalently,

max f(x) subject to g_j(x) ≤ 0, j = 1,..., r; g°_i(x) ≤ 0, i = 1,..., n.  (13.12)
With β_i denoting the multiplier on constraint g°_i, Equation 13.4 becomes:

∂f(x*)/∂x_i = Σ_{j=1}^r λ_j ∂g_j(x*)/∂x_i + Σ_{k=1}^n β_k ∂g°_k(x*)/∂x_i, i = 1,..., n,  (13.13)

with λ_j ≥ 0 and β_k ≥ 0. Noting that ∂g°_i(x*)/∂x_i = −1 and ∂g°_k(x*)/∂x_i = 0 for k ≠ i, this is

∂f(x*)/∂x_i = Σ_{j=1}^r λ_j ∂g_j(x*)/∂x_i − β_i, i = 1,..., n.  (13.14)

The complementary slackness conditions give λ_j g_j(x*) = 0 and β_i g°_i(x*) = −β_i x*_i = 0. From Equation 13.14, the conditions β_i ≥ 0 and β_i x*_i = 0, along with x*_i ≥ 0, give:

∂f(x*)/∂x_i − Σ_{j=1}^r λ_j ∂g_j(x*)/∂x_i ≤ 0,  x*_i (∂f(x*)/∂x_i − Σ_{j=1}^r λ_j ∂g_j(x*)/∂x_i) = 0,  x*_i ≥ 0, ∀i.  (13.15)

Similarly,

g_j(x*) ≤ 0,  λ_j g_j(x*) = 0,  λ_j ≥ 0, ∀j.  (13.16)

Provided the constraint qualification is satisfied, the conditions in Equations 13.15 and 13.16 are necessary for a local maximum of Equation 13.11. These conditions may be expressed in terms of a Lagrangian function.

Theorem 13.5: Consider the program:

max f(x) subject to g_j(x) ≤ 0, j = 1,..., r; x_i ≥ 0, i = 1,..., n.  (13.17)

Let

L = f(x) − Σ_{j=1}^r λ_j g_j(x).

Provided the constraint qualification is satisfied, a local maximum in Equation 13.17 satisfies:

∂L/∂x_i ≤ 0;  x_i ∂L/∂x_i = 0;  x_i ≥ 0, for i = 1,..., n;
∂L/∂λ_j ≥ 0;  λ_j ∂L/∂λ_j = 0;  λ_j ≥ 0, for j = 1,..., r.  (13.18)
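A quick, self-contained illustration of the conditions in Equation 13.18 on a hypothetical program (assumed for illustration, not from the text): max f(x, y) = ln(1 + x) + y subject to x + y ≤ 1, x ≥ 0, y ≥ 0. Because ∂f/∂y = 1 ≥ ∂f/∂x = 1/(1 + x) everywhere on the feasible set, the candidate (x, y, λ) = (0, 1, 1) satisfies every condition:

```python
# L = ln(1 + x) + y - lam*(x + y - 1); candidate x = 0, y = 1, lam = 1
x, y, lam = 0.0, 1.0, 1.0

Lx = 1.0 / (1.0 + x) - lam     # dL/dx = 0 here, so x * Lx = 0 holds with x = 0
Ly = 1.0 - lam                 # dL/dy = 0, consistent with y > 0
Llam = -(x + y - 1.0)          # dL/dlam = 0: the budget-type constraint binds

conditions_13_18 = (Lx <= 0 and x * Lx == 0 and x >= 0 and
                    Ly <= 0 and y * Ly == 0 and y >= 0 and
                    Llam >= 0 and lam * Llam == 0 and lam >= 0)
```

Each line of (13.18) maps to one clause of the boolean expression, which evaluates to True at this candidate.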
For a local minimum, consider the program:

min f(x) subject to g_j(x) ≥ 0, j = 1,..., r; x_i ≥ 0, i = 1,..., n.  (13.19)

Using the earlier reasoning, the constraint qualification is modified (as in Theorem 13.3). Noting that minimizing f is equivalent to maximizing −f, and that the constraint g_j(x) ≥ 0 is equivalent to −g_j(x) ≤ 0, the minimization problem may be written as:

max [−f(x)] subject to −g_j(x) ≤ 0, j = 1,..., r; x_i ≥ 0, i = 1,..., n,  (13.20)

so the Lagrangian has the form L = −f − Σ_{j=1}^r λ_j[−g_j]. With L̂ = −L, ∂L̂/∂x_i = −∂L/∂x_i and ∂L̂/∂λ_j = −∂L/∂λ_j, so the conditions in Equation 13.18 translate directly to identify necessary conditions for x* to be a local solution in Equation 13.20.

Theorem 13.6: Consider the program:

min f(x) subject to g_j(x) ≥ 0, j = 1,..., r; x_i ≥ 0, i = 1,..., n.  (13.21)

Let

L = f(x) − Σ_{j=1}^r λ_j g_j(x).

Provided the constraint qualification is satisfied, a local minimum in Equation 13.21 satisfies:

∂L/∂x_i ≥ 0;  x_i ∂L/∂x_i = 0;  x_i ≥ 0, for i = 1,..., n;
∂L/∂λ_j ≤ 0;  λ_j ∂L/∂λ_j = 0;  λ_j ≥ 0, for j = 1,..., r.  (13.22)
The following example revisits an earlier example with inequality constraints, using complementary slackness explicitly.

EXAMPLE 13.9: Recall Example 13.2: max_x f(x1, x2) = x1x2 subject to the constraints g1(x1, x2) = x2² + x1² − 4 ≤ 0 and g2(x1, x2) = x2 − √x1 ≤ 0. Formulating this problem with the Lagrangian and imposing the nonnegativity constraints gives:

L = x1x2 − λ1[x2² + x1² − 4] − λ2[x2 − √x1]

∂L/∂x1 = x2 − 2λ1x1 + λ2(1/2)(1/√x1)  (13.23)
∂L/∂x2 = x1 − 2λ1x2 − λ2  (13.24)
∂L/∂λ1 = −[x2² + x1² − 4]  (13.25)
∂L/∂λ2 = −[x2 − √x1].  (13.26)

Since (x1, x2) = (1, 1) satisfies the constraints, in any solution, x_i > 0; otherwise x1x2 = 0 < 1·1. If both x1 and x2 are strictly positive, complementary slackness implies that x2 − 2λ1x1 + λ2(1/2)(1/√x1) = 0 and x1 − 2λ1x2 − λ2 = 0.

Next, observe that λ1 and λ2 are strictly positive. To see this, suppose that λ1 = 0. Then Equation 13.23 implies that x2 = −λ2(1/2)(1/√x1), and Equation 13.24 gives x1 = λ2. Since x1 > 0, λ2 > 0, so that x2 = −λ2(1/2)(1/√x1) implies that x1 and x2 have opposite signs. Hence λ1 > 0. If λ2 = 0, then Equation 13.23 gives x2 = 2λ1x1 and Equation 13.24 gives x1 = 2λ1x2. Therefore, x2/x1 = (2λ1x1)/(2λ1x2) = x1/x2, and so x2² = x1², or x1 = x2, since both are positive. Equation 13.25 then implies that x1 > 1 and x2 > 1, but Equation 13.26 requires x2 = √x1, which is impossible. Therefore, in the solution, x1, x2 are located at the intersection of the constraints and the multipliers are determined as in Example 13.2. ♦
EXAMPLE 13.10: A utility function that arises frequently in economics is the quasilinear function:

u(x, y) = v(x) + y,

where x and y are the two goods consumed. Denote income by m, so that the budget constraint is px + qy = m. Consider the problem of maximizing utility subject to the budget constraint. Utility is maximized at the tangency of the indifference curve and the budget constraint. Suppose that v(x) = √x, so that v′(x) = (1/2)(1/√x). The indifference curve has slope dy/dx = −(1/2)(1/√x) and the budget constraint has slope −p/q. Setting these equal, (1/2)(1/√x) = p/q, or √x = (1/2)(q/p), or x = (1/4)(q²/p²). Thus, expenditure on good x is px = (1/4)(q²/p). Since px + qy = m, if y ≥ 0, then px = (1/4)(q²/p) ≤ m. Otherwise, the situation is as depicted, with a solution where y < 0.

[Figure: the budget line px + qy = m and the indifference curve √x + y = ū, tangent at x = (1/4)(q²/p²) to the right of m/p, so that the tangency has y < 0.]

Write the budget constraint as g(x, y) = px + qy − m ≤ 0. The problem is then to maximize f(x, y) = √x + y subject to g(x, y) = px + qy − m ≤ 0, x ≥ 0 and y ≥ 0. Assume that (1/4)(q²/p) > m, so that, in the absence of nonnegativity constraints, the value of the x demanded would exceed m, as depicted.

The Lagrangian for this problem is L = √x + y − λ[px + qy − m]. Applying the conditions in Equation 13.18:

∂L/∂x = (1/2)(1/√x) − λp ≤ 0;  x ∂L/∂x = 0;  x ≥ 0
∂L/∂y = 1 − λq ≤ 0;  y ∂L/∂y = 0;  y ≥ 0
∂L/∂λ = −[px + qy − m] ≥ 0;  λ ∂L/∂λ = 0;  λ ≥ 0.

To solve this system, first observe that ∂L/∂y ≤ 0 implies that 1 − λq ≤ 0, so that λ ≥ 1/q. Next, ∂L/∂x ≤ 0 implies that (1/2)(1/√x) − λp ≤ 0, and rearranging gives √x ≥ (1/2)(1/(λp)). So, x > 0, and therefore x ∂L/∂x = 0 implies ∂L/∂x = 0, giving √x = (1/2)(1/(λp)), or x = (1/4)(1/(λ²p²)). To see that y = 0, suppose otherwise, so that y ∂L/∂y = 0 implies that ∂L/∂y = 0 and so 1 − λq = 0, implying that λ = 1/q. In that case, √x = (1/2)(1/(λp)) gives √x = (1/2)(q/p), and x = (1/4)(q²/p²), implying that px = (1/4)(q²/p) > m, which is inconsistent with px + qy ≤ m and y > 0. Hence y = 0.
Since λ > 0, λ ∂L/∂λ = 0 implies that [px + qy − m] = 0 and, with y = 0, px = m. And, since x = (1/4)(1/(λ²p²)), px = (1/4)(1/(λ²p)) = m, or 1/λ² = 4pm, or λ = 1/√(4pm). Substituting this into x = (1/4)(1/(λ²p²)) gives x = m/p. Thus, the solution values are: x = m/p, y = 0, λ = 1/√(4pm). ♦
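The closed-form solution can be checked against the Kuhn–Tucker conditions for concrete parameter values satisfying (1/4)(q²/p) > m (the values below are chosen only for illustration):

```python
import math

p, q, m = 1.0, 4.0, 1.0
assert q * q / (4.0 * p) > m          # the case depicted: the tangency has y < 0

x, y = m / p, 0.0                     # claimed solution: all income spent on x
lam = 1.0 / math.sqrt(4.0 * p * m)    # claimed multiplier, 1/sqrt(4pm)

Lx = 0.5 / math.sqrt(x) - lam * p     # = 0, consistent with x > 0
Ly = 1.0 - lam * q                    # < 0, consistent with y = 0
budget = p * x + q * y - m            # = 0: the budget constraint binds
```

The stationarity condition in x holds with equality, the condition in y holds as a strict inequality with y = 0 (complementary slackness), and the binding budget is consistent with λ > 0.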
13.2.6 Necessary conditions and nonconvexities
The necessary conditions for a local optimum are not sufficient for a local optimum. The next example presents an earlier example in terms of the Lagrangian and shows how the necessary conditions may be satisfied at a point that is not a local optimum.
EXAMPLE 13.11: Recall Example 13.3: maximize the function f(x, y) = α ln x + y, subject to the constraints g1(x, y) = y² + (x − 1)² ≤ 2 and g2(x, y) = xy ≤ k = 1.12. The Lagrangian is:

L = α ln x + y − λ1(y² + (x − 1)² − 2) − λ2(xy − k).

So, the necessary conditions become:

∂L/∂x = α(1/x) − 2λ1(x − 1) − λ2y ≤ 0;  x ∂L/∂x = 0;  x ≥ 0
∂L/∂y = 1 − 2λ1y − λ2x ≤ 0;  y ∂L/∂y = 0;  y ≥ 0
∂L/∂λ1 = −(y² + (x − 1)² − 2) ≥ 0;  λ1 ∂L/∂λ1 = 0;  λ1 ≥ 0
∂L/∂λ2 = −(xy − k) ≥ 0;  λ2 ∂L/∂λ2 = 0;  λ2 ≥ 0.

Suppose that there is a solution with λ1 = 0. Then ∂L/∂y ≤ 0 implies λ2x > 0 (since 1 − λ2x ≤ 0). With λ1 = 0, ∂L/∂x ≤ 0 gives α(1/x) − λ2y ≤ 0, which implies that y > 0. Next, λ2 ∂L/∂λ2 = 0 gives xy = k, so 1/x = y/k. Substituting into ∂L/∂x = 0 (α(1/x) − λ2y = 0) gives α(y/k) − λ2y = 0, or (α/k − λ2)y = 0, and since y > 0, λ2 = α/k. Turning to ∂L/∂y = 0 gives 1 − λ2x = 0, or 1 − (α/k)x = 0, so that x = k/α, and xy = k then implies y = α. Thus, the pair (x′, y′) = (k/α, α) satisfies the necessary conditions for an optimum. Since α = 0.6 and k = 1.12 in the discussion, here y′ = 0.6 and x′ = 1.12/0.6 = 1.867. The value of the objective at (x′, y′) is β′ = 0.6 ln(1.867) + 0.6 = 0.9745.

[Figure: the circle (x − 1)² + y² = 2 and the curve xy = k, with the solution (x*, y*) and the second point (x′, y′) where the necessary conditions also hold; the level curves (6/10)·ln x + y = β and (6/10)·ln x + y = β′ pass through these points.]

The solution to the constrained optimization problem is at (x*, y*), where the necessary conditions for a constrained maximum are satisfied. But the necessary conditions are also satisfied at (x′, y′), and this point is not a local maximum. As the figure illustrates, there are points in the neighborhood of (x′, y′) satisfying the constraint and yielding higher payoff. Along the constraint curve xy = k, y = k/x, and so the value of the objective along the constraint is f̄(x) = α ln x + k/x. A second-order Taylor series expansion gives:

f̄(x) = f̄(k/α) + f̄′(k/α)(x − k/α) + (1/2) f̄″(p)(x − k/α)²,

where p is between x and k/α. Since f̄′(x) = α(1/x) − k(1/x²), at x = k/α, f̄′(k/α) = α(α/k) − k(α/k)² = α²/k − α²/k = 0. Also, f̄″(x) = −α(1/x²) + 2k(1/x³) = [−α + 2k(1/x)](1/x²), so f̄″(p) = [−α + 2k(1/p)](1/p²). When x is close to k/α, p ≈ k/α and −α + 2k(1/p) ≈ −α + 2α = α > 0, so f̄″(p) ≈ α(α/k)² > 0 when x is close to k/α. Therefore,

f̄(x) ≈ f̄(k/α) + (1/2) α(α/k)² (x − k/α)².

So, f̄(x) > f̄(k/α) for x ≠ k/α in a neighborhood of x′ = k/α. Finally, observe that the constraint set is not convex (so Theorem 13.4 does not apply). ♦
13.2.7 The Lagrange multiplier
Consider a maximization program with objective f and a single constraint g. Since ∇f points in the direction of increase of f, at a candidate solution x* on the boundary of the constraint set defined by g, ∇f should point out from the feasible region. The gradient ∇g gives the direction of increase of the constraint function g; if ∇g also points out from the feasible set, then at a tangency, ∇f = λ∇g for some positive λ. For ∇g to point out from the feasible set, the feasible set must be defined as {x | g(x) ≤ 0}. Conversely, if the objective is to minimize f, then the direction of increase of f should point into the feasible region at the candidate solution x* on the boundary of the feasible region. If we are to write ∇f = λ∇g with λ > 0, then ∇g must point into the feasible region. For this to be so, we must define the feasible set as {x | g(x) ≥ 0}.
13.2.8 Equality and inequality constraints
Finally, consider the case where there are both equality and inequality constraints. Given the family of constraints

h_l(x) = 0, l = 1,..., m;  g_j(x) ≤ 0, j = 1,..., r,

consider the problem of minimizing f(x) subject to these constraints.

Theorem 13.7: Suppose that f, h_l, g_j are continuous with continuous partial derivatives. Let X be an open set in R^n and x* ∈ X, and suppose that f(x*) ≤ f(x) for all x ∈ X satisfying the constraints. Then there are numbers λ, µ1, µ2,..., µ_m, λ1, λ2,..., λ_r, not all 0, such that:

λ ∂f(x*)/∂x_i + Σ_{l=1}^m µ_l ∂h_l(x*)/∂x_i + Σ_{j=1}^r λ_j ∂g_j(x*)/∂x_i = 0, i = 1,..., n.  (13.27)

In addition,

1. λ ≥ 0 and λ_j ≥ 0
2. For each j such that g_j(x*) < 0, λ_j = 0
3. If the g_j and h_l satisfy a constraint qualification – the gradients of all h_l and the active g_j (those g_j with g_j(x*) = 0) are linearly independent vectors – then λ can be chosen equal to 1: λ = 1.
A result similar to Theorem 13.4 also holds with both equality and inequality constraints.

Theorem 13.8: Suppose that f, h_l, g_j are continuous with continuous partial derivatives. Suppose that at x*, Equations 13.27 are satisfied, the constraint qualification is satisfied, f is convex and the constraint set is convex. Then x* minimizes f subject to the constraints.
Exercises for Chapter 13
Exercise 13.1 Let

f(x, y) = y − αx − x^β,  α, β > 1
g(x, y) = y² − 1 − ln(x + 1).

Solve the problem max f(x, y) subject to g(x, y) ≤ 0 and x, y ≥ 0, using the Kuhn–Tucker conditions.

Exercise 13.2 Let

f(x, y) = xy
g1(x, y) = ax + by − 2a,  a = b > 0
g2(x, y) = y + cx^γ − 1,  c = 1/10, γ = 2.

Solve the problem max f(x, y) subject to g_i(x, y) ≤ 0 for i = 1, 2 and x, y ≥ 0, using the Kuhn–Tucker conditions.

Exercise 13.3 Let

f(x, y) = x²y
g1(x, y) = ax + by − 2a,  a = b > 0
g2(x, y) = y + cx^γ − 1,  c = 1/10, γ = 2.

Solve the problem max f(x, y) subject to g_i(x, y) ≤ 0 for i = 1, 2 and x, y ≥ 0, using the Kuhn–Tucker conditions.

Exercise 13.4 Let

f(x, y) = 14 − x² − y² − 2x + 2y
g(x, y) = x + y − 4.

Solve the problem max f(x, y) subject to g(x, y) ≤ 0 and x, y ≥ 0, using the Kuhn–Tucker conditions.

Exercise 13.5 Let

f(x, y) = xy
g1(x, y) = 1 − y + (x − 1)³
g2(x, y) = 2y − 2 + 4(x − 1)³.

Consider the program: max f(x, y) subject to g_i(x, y) ≤ 0, i = 1, 2. Confirm that the solution to this program occurs at (x, y) = (1, 1), but at this point there is no nonnegative solution ((λ1, λ2) pair) to the equation:

∇f = λ1∇g1 + λ2∇g2.

Explain why the Kuhn–Tucker conditions fail.
14 Integration
14.1 Introduction

The integral of a function f defines a new function, which may have the interpretation of an area or as the "reverse" of differentiation. For example, the computation of consumer surplus and deadweight loss involves the calculation of area. Given a demand function P(Q) and market price p, the consumer surplus is defined as the area under the demand curve above price and is used as a measure of the welfare generated by the existence of that market. The technique of integration provides a means of determining this area. In a second type of problem, it is useful to identify a function as the derivative of another function. For example, the rate of return on an asset – the change in the value of the asset per unit of time – may be viewed as the derivative of the value of the asset. Thus, knowing the derivative of the value of the asset with respect to time, say v′(t), it is necessary to determine the value of the asset, v(t), from this derivative – given v′(t), v(t) must be found. These problems are solved by integration.

In what follows, the integral of a function is formally defined. Following this, a variety of applications are developed: Section 14.4 considers the welfare measures of producer and consumer surplus. Section 14.5 considers the calculation of cost and profit for a firm. Taxation and deadweight loss are examined in Section 14.6. In the remaining sections (Sections 14.7, 14.8, 14.9 and 14.10), present-value calculations, Leibnitz's rule, measures of inequality and Ramsey pricing are discussed. Finally, Section 14.11 explores in detail various measures of welfare using these techniques.
14.2 The integral

Let f be a continuous function defined on an interval [a, b]. The integral of f over the interval [a, b] is denoted ∫_a^b f(x)dx. Considering this expression as the area under the function f: when a and b are very close, the value of the function f does not change much on the interval [a, b] (f(b) ≈ f(a)), so the area is roughly f(a)(b − a). To extend this idea to any interval (where b is not close to a), partition the interval [a, b] into subintervals [x1, x2], [x2, x3],..., [x_{k−1}, x_k], with a = x1 and b = x_k. A partition may be defined by the points determining the intervals: P = {x1,..., x_k}, with x_i < x_{i+1}. The norm or mesh of the partition is the length of the largest of these subintervals. Write |P| for the norm of the partition, defined |P| = max_i (x_{i+1} − x_i), giving a measure of the widest interval in the partition. For each interval [x_i, x_{i+1}], let s_i be a point in [x_i, x_{i+1}], x_i ≤ s_i ≤ x_{i+1}, i = 1,..., k − 1, write s = (s1,..., s_{k−1}), and consider the expression:

R(P, s) = Σ_{i=1}^{k−1} f(s_i)(x_{i+1} − x_i).  (14.1)

Definition 14.1: The function f is (Riemann) integrable on [a, b] if there is a number I such that for every ε > 0 there is a δ > 0 such that, if |P| < δ, then

|R(P, s) − I| < ε,

for any s = (s1,..., s_{k−1}), x_i ≤ s_i ≤ x_{i+1}, i = 1,..., k − 1. In this case, I is said to be the integral of f over [a, b], and the integral is denoted ∫_a^b f(x)dx = I.

Assuming the integral exists, the expression ∫_a^b f(x)dx denotes the area between the function f and the horizontal axis. Figure 14.1 depicts the function f(x) = sin(π·x) + ln(x + 1) + 1, where x is measured in radians.

In Figure 14.2, the Riemann approximation is illustrated. As the mesh size becomes smaller, the rectangles better approximate the area under the curve. So, the integral is defined as the limit of a sequence of approximations to the area under the curve between a and b. In the figure, the area represented by the rectangles is Σ_{j=1}^6 f(s_j)(x_{j+1} − x_j). Note that if a < c < d, then

∫_a^c f(x)dx + ∫_c^d f(x)dx = ∫_a^d f(x)dx.
A different perspective views the integral as the "reverse" of differentiation. With a fixed, varying b varies the integral, and we may write the integral as a function of b.

[Figure 14.1: The integral of f from a to b]

[Figure 14.2: The integral of f: definition]

Define

$$F(b) = \int_a^b f(x)\,dx.$$

Then $F(b+\epsilon) - F(b) = \int_b^{b+\epsilon} f(x)\,dx$. If $\epsilon$ is small, with f continuous, f is approximately constant on $[b, b+\epsilon]$ and equal to f(b). Then

$$F(b+\epsilon) - F(b) \cong f(b)[(b+\epsilon) - b] = f(b)\epsilon.$$

Thus $\frac{F(b+\epsilon) - F(b)}{\epsilon} \cong f(b)$. As $\epsilon \to 0$, $\frac{F(b+\epsilon) - F(b)}{\epsilon} \to F'(b)$ and the approximation becomes exact, so that $F'(b) = f(b)$. From this perspective, integration recovers a function F whose derivative is f.
14.3 Common integrals and rules of integration

It is common to write $\int f(x)\,dx$ to denote a function F(x) with the property that $F'(x) = f(x)$. The function F is not unique, since adding a constant leaves the derivative unchanged. However, the definite integral may be obtained directly from F: $F(b) - F(a) = \int_a^b f(x)\,dx$. Thus, it makes sense to have a record of some standard indefinite integrals. The following table summarizes some of the more frequently used integral formulae.

$\int f(x)\,dx$ | $F(x)$
$\int x^n\,dx$ | $= \frac{1}{n+1}x^{n+1}$, $n \ne -1$
$\int \frac{1}{x}\,dx$ | $= \ln x$, $x > 0$
$\int e^{ax}\,dx$ | $= \frac{1}{a}e^{ax}$
$\int \sin(ax)\,dx$ | $= -\frac{1}{a}\cos(ax)$
$\int \cos(ax)\,dx$ | $= \frac{1}{a}\sin(ax)$
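As a quick sanity check, each entry of the table can be verified numerically: the centered finite difference of the antiderivative F should match the integrand f. The constants below (the test point and the value of a) are arbitrary choices for illustration.

```python
import math

# Check each antiderivative F against its integrand f: the centered
# difference (F(x+h) - F(x-h)) / (2h) should approximate f(x).
a, h = 2.0, 1e-6
pairs = [
    (lambda x: x**3,          lambda x: x**4 / 4),          # x^n with n = 3
    (lambda x: 1 / x,         lambda x: math.log(x)),       # 1/x, x > 0
    (lambda x: math.exp(a*x), lambda x: math.exp(a*x) / a), # e^(ax)
    (lambda x: math.sin(a*x), lambda x: -math.cos(a*x)/a),  # sin(ax)
    (lambda x: math.cos(a*x), lambda x: math.sin(a*x)/a),   # cos(ax)
]
for f, F in pairs:
    x = 0.7
    deriv = (F(x + h) - F(x - h)) / (2 * h)
    assert abs(deriv - f(x)) < 1e-5
print("all antiderivatives verified")
```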
14.4 Measuring welfare: surplus

Two standard measures of welfare are consumer and producer surplus. Consumer surplus is defined as the area under the demand curve above price; producer surplus is revenue less variable cost. These are discussed next, and illustrate the use of integration in economics.

14.4.1 Consumer surplus
Suppose that there are six consumers of a good. Each consumer requires one unit of the good. Let w p i denote willingness of i to pay for a unit and suppose that, w p1 = 8, w p2 = 6, w p3 = 5, w p4 = 4, w p5 = 3 and w p6 = 1. With this information
we can plot aggregate demand as a function of price (Figure 14.3).
Each consumer pays the price $p^* = 3$. Consumer 1 is willing to pay 8, which is 5 more than consumer 1 must pay, and so has $5 of "surplus". Similarly, consumer 2 has $3 of surplus, and so on. The total surplus for the consumers is:

$$CS = (8 - 3) + (6 - 3) + (5 - 3) + (4 - 3) + (3 - 3) = 11.$$
If D(p) is the number of units demanded at price p (demand), and p(q) is the inverse demand, then

$$CS = \int_3^8 D(p)\,dp = \int_0^4 [p(q) - 3]\,dq.$$
[Figure 14.3: Consumer surplus]
14.4.2 Producer surplus

Firm supply is given by marginal cost. Let $c_i$ denote the extra cost of producing the ith unit, and set $c_1 = \frac{1}{2}$, $c_2 = \frac{2}{2}$, $c_3 = \frac{3}{2}$, $c_4 = \frac{4}{2}$, $c_5 = \frac{5}{2}$ and $c_6 = \frac{6}{2}$. The total producer surplus (Figure 14.4) is the sum of price less the cost for each additional unit:

$$PS = \left(3 - \frac{1}{2}\right) + \left(3 - \frac{2}{2}\right) + \left(3 - \frac{3}{2}\right) + \left(3 - \frac{4}{2}\right) + \left(3 - \frac{5}{2}\right) = 7\frac{1}{2}.$$
[Figure 14.4: Producer surplus]
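The two discrete surplus calculations above can be reproduced in a few lines. The willingness-to-pay and marginal cost values are those given in the text; the price is 3.

```python
# Discrete surplus computation for the six-consumer / six-unit example:
# willingness to pay wp_i, marginal costs c_i = i/2, price p = 3.
wp = [8, 6, 5, 4, 3, 1]            # willingness to pay per consumer
mc = [i / 2 for i in range(1, 7)]  # marginal cost of the i-th unit
p = 3

# Consumers buy while their willingness to pay covers the price.
cs = sum(w - p for w in wp if w >= p)
# The firm supplies units while price covers marginal cost.
ps = sum(p - c for c in mc if c <= p)

print(cs)  # 11, as in the text
print(ps)  # 7.5, i.e., 7 1/2
```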
Combining these gives Figure 14.5.
[Figure 14.5: Consumer and producer surplus]

With S(p) the supply function, we can write total surplus as

$$\int_{1/2}^{8} [D(p) - S(p)]\,dp.$$

Alternatively, if we write the demand and supply curves in inverse form, $p_d(q)$ and $p_s(q)$, the total surplus may be evaluated as:

$$\int_0^{q^*} [p_d(q) - p_s(q)]\,dq,$$

where $q^*$ is defined by the equation $p_d(q^*) = p_s(q^*)$. In this formulation, each individual buys only one unit of the good. Suppose instead that the individual's demand for the good depends on price. The following discussion shows how consumer surplus measures the net benefit to the consumer of having the good available.
14.4.3 Consumer surplus and welfare
Suppose that utility is determined by the level of income y and the amount of good x consumed. The individual has income y and, in the absence of consumption of the good, has utility $u^0 = y$, so that a dollar of income gives one unit of utility. The unit price of x is p, so that x units cost px and the remaining income is then $y - px$. The utility from x is increasing, but marginal utility from consumption of x is decreasing. To represent this, let the utility from gross income y and consumption x be given according to the formula

$$u(x, y) = x^{\frac{1}{2}} + y - px.$$

Maximizing utility by choice of x gives $\frac{1}{2}x^{-\frac{1}{2}} - p = 0$, or $x^{\frac{1}{2}} = \frac{1}{2p}$, giving demand for x, $x(p) = \frac{1}{4}\frac{1}{p^2}$, and the inverse demand is $p(x) = \frac{1}{2\sqrt{x}}$. In the absence of any purchase of good x, the individual has welfare $u^0 = y$. With x available at price p per unit, demand is $x(p) = \frac{1}{4}\frac{1}{p^2}$, and so total expenditure on x is $px(p) = p \cdot \frac{1}{4}\frac{1}{p^2} = \frac{1}{4}\frac{1}{p}$. Welfare with x available is

$$u^* = x(p)^{\frac{1}{2}} + y - px(p) = \frac{1}{2p} + y - \frac{1}{4}\frac{1}{p} = y + \frac{1}{4}\frac{1}{p}.$$

Comparing with $u^0$, the gain from the presence of market x is $u^* - u^0 = \frac{1}{4}\frac{1}{p}$: the individual would be willing to give up (or pay) income equal to $\frac{1}{4}\frac{1}{p}$ for access to the market for good x at price p.

Consider now the computation of consumer surplus associated with good x. From the demand curve x(p) (Figure 14.6), consumer surplus is given by:

$$CS = \int_p^\infty x(t)\,dt = \frac{1}{4}\int_p^\infty t^{-2}\,dt = \frac{1}{4}\left.\frac{1}{-1}t^{-1}\right|_p^\infty = \left.-\frac{1}{4}t^{-1}\right|_p^\infty = \frac{1}{4}\frac{1}{p}.$$
[Figure 14.6: Cost and profit]
Thus,

$$u^* - u^0 = CS.$$

In this example, there is an exact connection between welfare measured by variation in utility and welfare measured by consumer surplus. This arises because the utility function is quasilinear and, as a result, the level of income y does not appear in the demand function for x. If, for example, $u(x, y) = x^{\frac{1}{2}}y^{\frac{1}{2}}$, then the exact connection breaks down, but consumer surplus may still be considered a welfare measure. This is discussed further later.

In general, with income level y and m goods available to purchase at prices $p_1, \dots, p_m$, demand for good j will have the form $x_j(p_1, \dots, p_m, y)$. In that case, to compute consumer surplus, we take the prices of other goods as constant and let $\hat{x}_j(p_j) = x_j(p_1, \dots, p_{j-1}, p_j, p_{j+1}, \dots, p_m)$. Then, for example, the consumer surplus lost as a result of a price increase from $p'_j$ to $p''_j$ is given by $\int_{p'_j}^{p''_j} \hat{x}_j(p_j)\,dp_j$. Aggregation to many consumers is straightforward, using the aggregate demand function $X_j(p_j) = \sum_{i=1}^n \hat{x}^i_j(p_j)$, where $\hat{x}^i_j(p_j)$ is the demand for good j by individual i.
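The quasilinear example above can be checked numerically: with demand $x(p) = 1/(4p^2)$, the consumer surplus from price p upward should equal the utility gain $1/(4p)$. The price value and the simple midpoint integrator below are illustrative choices, with the improper integral truncated at a large upper limit.

```python
# Numerical check of the quasilinear example: CS = ∫_p^∞ x(t) dt = 1/(4p).
def x_demand(p):
    return 1.0 / (4.0 * p * p)

def integrate(f, lo, hi, n=100_000):
    # simple midpoint rule
    w = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * w) for i in range(n)) * w

p = 2.0
cs = integrate(x_demand, p, 1e4)   # truncate the upper limit at 10^4
gain = 1.0 / (4.0 * p)             # u* - u0 from the text
assert abs(cs - gain) < 1e-3
print(cs, gain)
```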
Consider the constant elasticity of substitution (CES) utility function $u(x_1, x_2) = (x_1^\rho + x_2^\rho)^{\frac{1}{\rho}}$, where $0 < \rho < 1$. Suppose the individual has income y and faces prices $p_i$, $p_j$. Then, with utility maximization, the demand for good i is:

$$x_i(p_i, p_j, y) = \frac{p_i^{r-1}y}{p_i^r + p_j^r}, \qquad r = \frac{\rho}{\rho - 1}.$$

For example, letting $\rho = \frac{1}{2}$, so that $r = -1$, this becomes

$$x_i(p_i, p_j, y) = \frac{p_i^{-2}y}{p_i^{-1} + p_j^{-1}} = \frac{y}{p_i + p_i^2 p_j^{-1}}.$$

Suppose $p_j = 1$ and $y = 100$; this then becomes $\hat{x}_i(p_i) = x_i(p_i, 1, 100) = \frac{100}{p_i + p_i^2}$. Then the consumer surplus loss due to a price increase from $p_i$ to $p'_i$ is:

$$\int_{p_i}^{p'_i} \frac{100}{\tilde{p}_i + \tilde{p}_i^2}\,d\tilde{p}_i = 100\left[\ln\left(\frac{p'_i}{1 + p'_i}\right) - \ln\left(\frac{p_i}{1 + p_i}\right)\right].$$

14.5 Cost, supply and profit

When a firm is a price taker, profit at price p is given by:

$$\pi(q) = pq - c(q),$$
where c(q) is the total cost (fixed and variable) of producing q units of output. Thus, $F = c(0)$ denotes the fixed cost, incurred whether or not output is produced. Define the variable cost function as $c_v(q) = c(q) - c(0)$; thus, $c'_v(q) = c'(q)$. Taking price as given, maximizing profit gives

$$\pi'(q) = p - c'(q) = 0,$$
or $p = c'(q) = mc(q)$. Thus, quantity is chosen to set marginal cost equal to price. Assuming $c'(q)$ is increasing, this equation defines the firm's supply curve (Figure 14.7). At price p, to maximize profit, q should be chosen to satisfy $p = c'(q) = mc(q)$, or $q = mc^{-1}(p) = s(p)$, where $s(p)$ is the firm's supply curve.

[Figure 14.7: The supply curve]

At price p, the firm will supply the quantity $mc^{-1}(p)$ to maximize profit. On the assumption that marginal cost is increasing, supply is upward sloping. Total cost is the sum of variable and fixed cost: $c(q) = c_v(q) + F$, and so $c'(q) = c'_v(q)$. The area under the marginal cost curve gives variable cost:

$$\int_0^{q^*} c'(q)\,dq = \int_0^{q^*} c'_v(q)\,dq,$$

which restates to:

$$c(q^*) - c(0) = c_v(q^*) - c_v(0) = c_v(q^*),$$
since $c_v(0) = 0$. Profit for the firm is $\pi(q^*) = p^*q^* - c(q^*) = p^*q^* - c_v(q^*) - F$. Rearranging:

$$\pi(q^*) + F = p^*q^* - c_v(q^*) = p^*q^* - \int_0^{q^*} c'_v(q)\,dq.$$

Thus,

$$[\pi(q^*) + F] + \int_0^{q^*} c'_v(q)\,dq = p^*q^*.$$

[Figure 14.8: Profit and cost]

So, producer surplus equals profit plus fixed cost (Figure 14.8). Producer surplus depends only on the marginal cost curve, and profit may be negative if F is large.
14.5.1 Average and marginal cost

These observations may also be seen in terms of average cost:

$$AC(q) = \frac{c(q)}{q} = \frac{c_v(q) + F}{q} = \frac{c_v(q)}{q} + \frac{F}{q}.$$

Then,

$$AC'(q) = \frac{qc'(q) - c(q)}{q^2} = \frac{1}{q}[c'(q) - AC(q)].$$
From this equation, one sees two things. First, average cost is minimized at the point where marginal cost equals average cost ($AC'(q) = 0$ gives $c'(q) = AC(q)$). Second, average cost is declining when $MC < AC$, and increasing when $MC > AC$. These observations are depicted in Figure 14.9. Note that, as depicted, $p^* = MC(q^*)$ at the supplied quantity, and average cost lies strictly above average variable cost, where the strict inequality follows provided $c(0) > 0$.
[Figure 14.9: Average cost curves and marginal cost]
Figure 14.9: Average cost curves and marginal cost Similarly, considering average variable cost AV C(q) =
AV C ′ (q) =
qc′v (q)− c v (q) , q2
c v (q) q ,
it has derivative
so that the minimum AV C occurs where average variable
cost coincides with marginal cost. In the case of increasing marginal cost, average variable cost is also increasing and equals marginal cost at 0. More generally, observe that, since c v (0) = 0: lim
q→0
AV C(q) c v (q) c v (q) − c v (0) = lim = lim = c′v (0) = MC(0), q→0 q q→0 q q
so that average variable cost and marginal cost always coincide at q = 0.
Figure 14.10 depicts the case where marginal cost declines initially.

[Figure 14.10: Cost curves]

Consider the total cost function $c(q) = F + 4q - \frac{7}{4}q^2 + \frac{1}{3}q^3$, where $F \ge 0$ is a constant, the fixed cost. The marginal cost function is

$$c'(q) = 4 - \frac{7}{2}q + q^2$$

and the average variable cost function is

$$AVC(q) = 4 - \frac{7}{4}q + \frac{1}{3}q^2.$$

At the minimum, $q^* = 2\frac{5}{8}$ and the value of either function is $c'\left(2\frac{5}{8}\right) = AVC\left(2\frac{5}{8}\right) = \frac{109}{64} = p^*$. At $p^*$, variable costs are (just) covered at output level $q^*$:

$$p^* = AVC(q^*) = \frac{c_v(q^*)}{q^*}.$$

Thus, in Figure 14.11,

$$B + D = p^*q^* = c_v(q^*) = \int_0^{q^*} c'(q)\,dq = A + D.$$
The logic for this is clear: marginal cost is high initially and, at price $p^*$, exceeds unit revenue. As output increases beyond $q'$, price exceeds marginal cost, and at $q^*$ revenue from sales fully covers variable cost. There are no sales (or production) below price $p^*$. Above that price, supply follows the marginal cost curve. The supply curve is given by the heavy line (Figure 14.12). No output level between 0 and $q^*$ is ever chosen: output is either 0 or at least as large as $q^*$.
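The shutdown point for the cubic cost example can be confirmed numerically: the minimum of AVC should occur at $q^* = 2\frac{5}{8}$, where AVC equals marginal cost with common value $\frac{109}{64}$. The grid search below is a simple illustrative check.

```python
# Shutdown point for c(q) = F + 4q - (7/4)q^2 + (1/3)q^3:
# minimum AVC is at q* = 2 5/8, where AVC = MC = 109/64.
def mc(q):
    return 4 - 3.5 * q + q * q

def avc(q):
    return 4 - 1.75 * q + q * q / 3

# Grid search for the minimizer of AVC on (0, 6].
qs = [i / 1000 for i in range(1, 6001)]
q_star = min(qs, key=avc)

assert abs(q_star - 2.625) < 1e-3          # q* = 2 5/8
assert abs(avc(q_star) - 109 / 64) < 1e-6  # p* = 109/64
assert abs(mc(q_star) - avc(q_star)) < 1e-3
print(q_star, avc(q_star))
```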
[Figure 14.11: Cost curves]
[Figure 14.12: Supply]
14.6 Taxation and consumer surplus

While taxation is necessary to raise government revenue, it imposes a burden on the consumer and changes consumer behavior. This change in behavior leads to inefficiency and welfare loss, measured in terms of the deadweight loss of consumer surplus. The discussion here explains how this arises. A standard demand function is depicted in Figure 14.13, along with the impact of taxation on price and quantity. Denote the demand function by P(Q), and let $P_t = P_0(1 + t)$, where t is the ad valorem tax rate.
[Figure 14.13: Taxation and consumer surplus. The left panel depicts willingness to pay; the right panel depicts the tax revenue, the deadweight loss, and the points A, B and C.]
The loss due to distortion in the market is the shaded triangle marked as deadweight loss. This is sometimes called the "excess burden" of the tax, since raising the revenue incurs this additional cost. The deadweight loss is calculated:

$$D(t) = \int_{Q_t}^{Q_0} P(Q)\,dQ - P_0(Q_0 - Q_t). \qquad (14.2)$$

In Figure 14.13, willingness to pay measures the overall benefit provided to consumers by this market and is measured by the region $\bar{P}OQ_0A$. At price $P_0$, consumers pay $P_0 \times Q_0$ (the area $P_0OQ_0A$). So, the surplus provided by the market is given by the region $\bar{P}P_0A$ (or $\bar{P}OQ_0A - P_0OQ_0A$). The imposition of the tax raises price to $P_t$, reducing surplus by $P_tP_0AB$ and generating revenue $(P_t - P_0) \times Q_t$. The surplus reduction exceeds the revenue raised by the amount $P_tP_0AB$ less $(P_t - P_0) \times Q_t$, which is the region ABC, the deadweight loss associated with the tax.
EXAMPLE 14.1: Let $P(Q) = \frac{1}{Q}$ and suppose that $P_0$ is given exogenously (horizontal supply). Consider the impact of an ad valorem tax. One may use the formula in Equation 14.2 to determine the deadweight loss, D. The area under the demand curve between $Q_0$ and $Q_t$ is given by:

$$\int_{Q_t}^{Q_0} P(Q)\,dQ = \int_{Q_t}^{Q_0} \frac{1}{Q}\,dQ = \left[\ln(Q)\right]_{Q_t}^{Q_0} = \ln(Q_0) - \ln(Q_t) = \ln\left[\frac{Q_0}{Q_t}\right].$$
Next, $Q_0$ and $Q_t$ are determined by the initial price and the tax:

$$P_0 = \frac{1}{Q_0} \text{ so that } Q_0 = \frac{1}{P_0}; \quad \text{and} \quad P_0(1+t) = \frac{1}{Q_t} \text{ so that } Q_t = \frac{1}{P_0(1+t)}.$$

These give $\frac{Q_0}{Q_t} = 1 + t$, so that $\int_{Q_t}^{Q_0} P(Q)\,dQ = \ln\left[\frac{Q_0}{Q_t}\right] = \ln(1+t)$. Next, observe that

$$Q_0 - Q_t = \frac{1}{P_0}\left(1 - \frac{1}{1+t}\right) = \frac{1}{P_0}\frac{t}{1+t}, \quad \text{so } P_0(Q_0 - Q_t) = \frac{t}{1+t}.$$

Therefore,

$$D(t) = \int_{Q_t}^{Q_0} P(Q)\,dQ - P_0(Q_0 - Q_t) \qquad (14.3)$$
$$= \ln(1+t) - \frac{t}{1+t}. \qquad (14.4)$$

With this formula, the impact of varying the tax on the deadweight loss can be calculated:

$$D'(t) = \frac{1}{1+t} - \frac{(1+t) - t}{(1+t)^2} = \left[\frac{1}{1+t}\right]\left[1 - \frac{1}{1+t}\right] = \left[\frac{1}{1+t}\right]\left[\frac{t}{1+t}\right] = \frac{t}{(1+t)^2} > 0,$$

$$D''(t) = \frac{(1+t)^2 - 2t(1+t)}{(1+t)^4} = \frac{(1+t) - 2t}{(1+t)^3} = \frac{1-t}{(1+t)^3}.$$

Therefore, the deadweight loss is increasing ($D'(t) > 0$), at an increasing rate ($D''(t) > 0$ for $t < 1$).
♦
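The closed form $D(t) = \ln(1+t) - t/(1+t)$ can be checked against a direct numerical evaluation of Equation 14.2. The value of $P_0$ and the midpoint integrator below are illustrative choices.

```python
import math

# Example 14.1: with P(Q) = 1/Q and exogenous P0, the deadweight loss
# of an ad valorem tax t is D(t) = ln(1 + t) - t/(1 + t).
def dwl_closed(t):
    return math.log(1 + t) - t / (1 + t)

def dwl_numeric(t, p0=2.0, n=100_000):
    q0, qt = 1 / p0, 1 / (p0 * (1 + t))
    w = (q0 - qt) / n
    area = sum(1 / (qt + (i + 0.5) * w) for i in range(n)) * w  # ∫ P(Q) dQ
    return area - p0 * (q0 - qt)

t = 0.25
assert abs(dwl_closed(t) - dwl_numeric(t)) < 1e-6
# D(t) is increasing in t
assert dwl_closed(0.3) > dwl_closed(0.2) > dwl_closed(0.1)
print(dwl_closed(t))
```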
E XAMPLE 14.2: In Example 14.1, the supply price was exogenous. This example considers the calculation of producer and consumer surplus where supply is upward sloping. Consider the following supply and demand functions:
$$P_d(Q) = kQ^{-\alpha},\ 0 < \alpha < 1, \qquad P_s(Q) = Q^{\beta}.$$
Together these determine the market-clearing quantity, $Q_0$, and both the consumer and producer surplus. The following discussion shows that the sum of producer and consumer surplus (overall surplus) is:

$$\left[\frac{\alpha + \beta}{(1-\alpha)(1+\beta)}\right] k^{\frac{\beta+1}{\alpha+\beta}}.$$

Equating supply and demand, $P_d(Q) = kQ^{-\alpha} = Q^{\beta} = P_s(Q)$ gives $k = Q^{\alpha+\beta}$, so the solution is $Q_0 = k^{\frac{1}{\alpha+\beta}}$. The associated price is $P_0 = k^{\frac{\beta}{\alpha+\beta}}$. Note, therefore, that $R_0 = P_0Q_0 = k^{\frac{\beta+1}{\alpha+\beta}}$. The consumer and producer surplus (CS and PS) are

$$CS = \int_0^{Q_0} P_d(Q)\,dQ - R_0 = \int_0^{Q_0} kQ^{-\alpha}\,dQ - R_0 = \frac{k}{1-\alpha}(Q_0)^{1-\alpha} - R_0,$$

$$PS = R_0 - \int_0^{Q_0} P_s(Q)\,dQ = R_0 - \int_0^{Q_0} Q^{\beta}\,dQ = R_0 - \frac{1}{1+\beta}(Q_0)^{\beta+1}.$$

Since $R_0$ cancels out when consumer and producer surplus are combined:

$$CS + PS = \int_0^{Q_0} [P_d(Q) - P_s(Q)]\,dQ.$$

Considering CS, from the solution for $Q_0$, $(Q_0)^{1-\alpha} = k^{\frac{1-\alpha}{\alpha+\beta}}$, so that $\frac{k}{1-\alpha}(Q_0)^{1-\alpha} = \frac{1}{1-\alpha}k^{1+\frac{1-\alpha}{\alpha+\beta}} = \frac{1}{1-\alpha}k^{\frac{\beta+1}{\alpha+\beta}}$, so that

$$CS = \frac{1}{1-\alpha}k^{\frac{\beta+1}{\alpha+\beta}} - k^{\frac{\beta+1}{\alpha+\beta}} = \frac{\alpha}{1-\alpha}k^{\frac{\beta+1}{\alpha+\beta}}.$$

Note that $1 + \frac{1-\alpha}{\alpha+\beta} = \frac{\alpha+\beta+1-\alpha}{\alpha+\beta} = \frac{\beta+1}{\alpha+\beta}$.

For the producer surplus, $(Q_0)^{\beta+1} = k^{\frac{\beta+1}{\alpha+\beta}}$, so that

$$PS = k^{\frac{\beta+1}{\alpha+\beta}} - \frac{1}{1+\beta}k^{\frac{\beta+1}{\alpha+\beta}} = \frac{\beta}{1+\beta}k^{\frac{\beta+1}{\alpha+\beta}}.$$
Therefore,

$$CS + PS = \left[\frac{\alpha}{1-\alpha} + \frac{\beta}{1+\beta}\right]k^{\frac{\beta+1}{\alpha+\beta}} = \left[\frac{\alpha+\beta}{(1-\alpha)(1+\beta)}\right]k^{\frac{\beta+1}{\alpha+\beta}},$$

using

$$\frac{\alpha}{1-\alpha} + \frac{\beta}{1+\beta} = \frac{\alpha(1+\beta)}{(1-\alpha)(1+\beta)} + \frac{\beta(1-\alpha)}{(1+\beta)(1-\alpha)} = \frac{\alpha(1+\beta) + \beta(1-\alpha)}{(1+\beta)(1-\alpha)} = \frac{\alpha+\beta}{(1+\beta)(1-\alpha)}.$$

♦
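The closed-form total surplus in Example 14.2 can be checked against direct numerical integration of $P_d(Q) - P_s(Q)$ from 0 to $Q_0$. The parameter values below are arbitrary illustrative choices; the tolerance is loose because the integrand has a (mild, integrable) singularity at $Q = 0$.

```python
# Example 14.2: with P_d(Q) = k Q^(-alpha) and P_s(Q) = Q^beta, total
# surplus is [(alpha+beta)/((1-alpha)(1+beta))] * k^((beta+1)/(alpha+beta)).
k, alpha, beta = 2.0, 0.5, 1.0

q0 = k ** (1 / (alpha + beta))   # market-clearing quantity

def gap(q):                      # P_d(Q) - P_s(Q)
    return k * q ** (-alpha) - q ** beta

n = 400_000
w = q0 / n
numeric = sum(gap((i + 0.5) * w) for i in range(n)) * w  # midpoint rule

closed = (alpha + beta) / ((1 - alpha) * (1 + beta)) * k ** ((beta + 1) / (alpha + beta))
assert abs(numeric - closed) < 1e-2
print(numeric, closed)
```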
EXAMPLE 14.3: Suppose that a value added tax t is imposed on the market given by:

$$P_d(Q) = kQ^{-\alpha},\ 0 < \alpha < 1, \qquad P_s(Q) = Q^{\beta}.$$

The following calculations determine the deadweight loss associated with the tax. The after-tax equilibrium quantity is determined by: $P_d(Q) = kQ^{-\alpha} = (1+t)Q^{\beta} = (1+t)P_s(Q)$. This can be rearranged to give $\frac{k}{1+t}Q^{-\alpha} = Q^{\beta}$, and letting $k' = \frac{k}{1+t}$ gives $k'Q^{-\alpha} = Q^{\beta}$. Repeating earlier calculations gives the after-tax equilibrium quantity $Q_t = (k')^{\frac{1}{\alpha+\beta}}$, while the before-tax equilibrium is at $Q^* = k^{\frac{1}{\alpha+\beta}}$. The after-tax equilibrium price is $P_t = (k')^{\frac{\beta}{\alpha+\beta}}$. The deadweight loss is:

$$DWL = \int_{Q_t}^{Q^*} [P_d(Q) - P_s(Q)]\,dQ = \int_0^{Q^*} [P_d(Q) - P_s(Q)]\,dQ - \int_0^{Q_t} [P_d(Q) - P_s(Q)]\,dQ$$
$$= \left[\frac{\alpha+\beta}{(1-\alpha)(1+\beta)}\right]k^{\frac{\beta+1}{\alpha+\beta}} - \left[\frac{\alpha+\beta}{(1-\alpha)(1+\beta)}\right](k')^{\frac{\beta+1}{\alpha+\beta}}$$
$$= \left[\frac{\alpha+\beta}{(1-\alpha)(1+\beta)}\right]\left(k^{\frac{\beta+1}{\alpha+\beta}} - (k')^{\frac{\beta+1}{\alpha+\beta}}\right) = \left[\frac{\alpha+\beta}{(1-\alpha)(1+\beta)}\right]k^{\frac{\beta+1}{\alpha+\beta}}\left(1 - \left[\frac{1}{1+t}\right]^{\frac{\beta+1}{\alpha+\beta}}\right).$$

♦
14.7 Present value

A firm operating a plant has revenue flow R(t) at time t. The cost of operating the plant in present-value terms to time T is c(T) and the discount rate is r.
From these functions, the present value (profit) from operating to period T is

$$\pi(T) = \int_0^T R(t)e^{-rt}\,dt - c(T).$$

As time goes on (T increases), cost c(T) increases and the increment to revenue decreases if $R(t)e^{-rt}$ is declining. For how long should the firm operate: what value of T maximizes $\pi(T)$? A special case of this problem occurs when $R(t) = kt$ and $c(T) = ce^{rT}$, with $k, c > 0$. In this case

$$\pi(T) = k\int_0^T te^{-rt}\,dt - ce^{rT}.$$

To calculate $\int_0^T te^{-rt}\,dt$, use the integration by parts rule, $\int_0^T u\,dv = [uv]_0^T - \int_0^T v\,du$. In the present case, let $u = t$, $v = e^{-rt}$, so $dv = (-r)e^{-rt}\,dt$. Rewrite $\int_0^T te^{-rt}\,dt$ as $\left(-\frac{1}{r}\right)\int_0^T t(-r)e^{-rt}\,dt$ and consider $\int_0^T t(-r)e^{-rt}\,dt$. Integrating by parts,

$$\int_0^T t(-r)e^{-rt}\,dt = \left[te^{-rt}\right]_0^T - \int_0^T e^{-rt}\,dt = Te^{-rT} - \int_0^T e^{-rt}\,dt.$$

The term $\int_0^T e^{-rt}\,dt$ can be integrated:

$$\int_0^T e^{-rt}\,dt = \left(-\frac{1}{r}\right)\int_0^T (-r)e^{-rt}\,dt = \left(-\frac{1}{r}\right)\left[e^{-rt}\right]_0^T = \left(-\frac{1}{r}\right)[e^{-rT} - 1].$$

Thus,

$$\int_0^T te^{-rt}\,dt = -\frac{1}{r}\left[Te^{-rT} + \frac{1}{r}e^{-rT} - \frac{1}{r}\right] = \frac{1}{r^2} - \frac{1}{r^2}e^{-rT} - \frac{1}{r}Te^{-rT},$$

so

$$\pi(T) = k\left[\frac{1}{r^2} - \frac{1}{r^2}e^{-rT} - \frac{1}{r}Te^{-rT}\right] - ce^{rT}.$$

The first-order condition is:

$$\pi'(T) = k\left[\frac{1}{r}e^{-rT} - \frac{1}{r}e^{-rT} + Te^{-rT}\right] - cre^{rT} = kTe^{-rT} - cre^{rT} = 0.$$

The second-order sufficient condition for a maximum is:

$$\pi''(T) = k\left[e^{-rT} - rTe^{-rT}\right] - cr^2e^{rT} < 0.$$

One feature of this example is that the variable T appears in the range of the integral. For the particular functional form, completing the integration and then computing the derivative was simple. Sometimes it is more convenient to differentiate directly, and Leibnitz's rule provides a means to do so.
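Before turning to that rule, the special case above can be checked numerically: a grid search over the profit function should locate a maximizer at which the first-order condition $kTe^{-rT} = cre^{rT}$ holds. The parameter values are arbitrary illustrative choices.

```python
import math

# Present-value example: R(t) = k t, c(T) = c e^(rT), so
# pi(T) = k[1/r^2 - e^(-rT)/r^2 - T e^(-rT)/r] - c e^(rT),
# with first-order condition k T e^(-rT) - c r e^(rT) = 0.
k, c, r = 10.0, 1.0, 0.5

def pi(T):
    return k * (1/r**2 - math.exp(-r*T)/r**2 - T*math.exp(-r*T)/r) - c*math.exp(r*T)

def foc(T):
    return k * T * math.exp(-r*T) - c * r * math.exp(r*T)

# Grid-search the maximizer, then confirm the FOC is (approximately) zero there.
grid = [i / 1000 for i in range(1, 10001)]   # T in (0, 10]
T_star = max(grid, key=pi)
assert abs(foc(T_star)) < 1e-2
# pi is locally concave at the optimum
assert pi(T_star) > pi(T_star - 0.5) and pi(T_star) > pi(T_star + 0.5)
print(T_star, pi(T_star))
```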
14.8 Leibnitz's rule

Let $f(x, k)$ be a continuous function of x and k, and let

$$g(k) = \int_{a(k)}^{b(k)} f(x, k)\,dx, \qquad c \le k \le d. \qquad (14.5)$$

Leibnitz's rule provides a formula for differentiating g.

Theorem 14.1: Suppose that $f(x, k)$ and $\frac{\partial f(x,k)}{\partial k}$ are continuous functions of $(x, k)$ on a region $[a, b] \times [c, d]$, and suppose that $a(k)$ and $b(k)$ are continuous with continuous derivatives on $[c, d]$ (continuously differentiable on $[c, d]$), both with range in $[a, b]$. Then

$$\frac{dg}{dk} = \int_{a(k)}^{b(k)} \frac{\partial f}{\partial k}\,dx + f(b(k), k)\frac{db(k)}{dk} - f(a(k), k)\frac{da(k)}{dk}.$$

Applied to the present-value calculation:

$$\pi(T) = \int_0^T R(t)e^{-rt}\,dt - c(T) \Rightarrow \frac{d\pi}{dT} = \frac{d}{dT}\left[\int_0^T R(t)e^{-rt}\,dt\right] - c'(T).$$

Here $f(t, T) = R(t)e^{-rt}$. Applying Leibnitz's rule:

$$\frac{d\pi}{dT} = \left[\int_0^T \frac{\partial f}{\partial T}\,dt + f(T, T)\frac{dT}{dT} - f(0, T)\frac{d0}{dT}\right] - c'(T).$$

Since $\frac{\partial f}{\partial T} = 0$, $\frac{d\pi}{dT} = f(T, T) - c'(T) = R(T)e^{-rT} - c'(T)$. Setting this to zero gives $R(T)e^{-rT} - c'(T) = 0$ (in the example, $kTe^{-rT} - cre^{rT} = 0$).
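Leibnitz's rule can be verified numerically on a small hypothetical example (the functions $f(x, k) = e^{-kx}$, $a(k) = k$, $b(k) = k^2$ below are arbitrary choices): the formula should agree with a finite-difference derivative of $g(k)$.

```python
import math

# Numerical check of Leibnitz's rule on a hypothetical example:
# f(x, k) = exp(-k x), a(k) = k, b(k) = k^2, so
# dg/dk = ∫ (-x) e^(-kx) dx + f(b(k), k) * 2k - f(a(k), k) * 1.
def f(x, k):
    return math.exp(-k * x)

def integrate(h, lo, hi, n=20_000):
    w = (hi - lo) / n
    return sum(h(lo + (i + 0.5) * w) for i in range(n)) * w  # midpoint rule

def g(k):
    return integrate(lambda x: f(x, k), k, k * k)

k = 1.5
# Leibnitz's rule:
leibnitz = (integrate(lambda x: -x * f(x, k), k, k * k)  # ∫ ∂f/∂k dx
            + f(k * k, k) * 2 * k                        # f(b(k), k) b'(k)
            - f(k, k) * 1.0)                             # f(a(k), k) a'(k)
# Finite-difference derivative of g:
eps = 1e-5
fd = (g(k + eps) - g(k - eps)) / (2 * eps)
assert abs(leibnitz - fd) < 1e-4
print(leibnitz, fd)
```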
EXAMPLE 14.4: A student has discovered the function that relates exam score to time spent studying. The key components of the function are as follows. Time spent studying improves exam performance at the rate $a^t$. However, with more study, fatigue reduces the ability to comprehend interesting math quickly, so effort over time should be discounted at some rate: $e^{-r(t+1)}$. (Note $e^{-r(t+1)}$ and not $e^{-rt}$, because it takes some time to "get going".) Time spent studying is deducted from sleeping time. One hour of sleep is worth c dollars, so studying for T hours costs cT. Putting all this together, the "profit from study" function is given by
$$\pi(T) = \int_0^T e^{-r(t+1)}a^t\,dt - cT,$$

where T is the number of hours spent studying. Assume that $0 < c < e^{-r}$ and $0 < a < e^r$. A natural question is to find the optimal choice of T. Here, using Leibnitz's rule will help. The first-order condition is:

$$\pi'(T) = e^{-r(T+1)}a^T - c = 0.$$

Now, $a^T = e^{T\ln a}$, so the first-order condition may be written $e^{-r(T+1)}e^{T\ln a} = c$, or $e^{-r}[e^{T(\ln a - r)}] = c$, or $[e^{T(\ln a - r)}] = e^rc$, or $T(\ln a - r) = r + \ln c$. Thus,

$$T^* = \frac{r + \ln c}{\ln a - r}.$$

Now, $0 < c < e^{-r}$, so $\ln c < -r$, or $r + \ln c < 0$; and $0 < a < e^r$ gives $\ln a < r$, or $\ln a - r < 0$. Therefore, $T^* > 0$. Finally, to check the second-order condition,

$$\pi'(T) = e^{-r(T+1)}a^T - c = e^{-r}e^{T(\ln a - r)} - c, \qquad \pi''(T) = (\ln a - r)e^{-r}[e^{T(\ln a - r)}] = (\ln a - r)e^{-r(T+1)}a^T < 0.$$
A more complex version of this has a depend on T: $a(T)$. In this case, the integral becomes:

$$\pi(T) = \int_0^T e^{-r(t+1)}a(T)^t\,dt - cT.$$

In this case, applying Leibnitz's rule:

$$\pi'(T) = \int_0^T \frac{\partial}{\partial T}\{e^{-r(t+1)}a(T)^t\}\,dt + e^{-r(T+1)}a(T)^T - c$$
$$= \int_0^T e^{-r(t+1)}\frac{\partial}{\partial T}\{a(T)^t\}\,dt + e^{-r(T+1)}a(T)^T - c$$
$$= \int_0^T e^{-r(t+1)}t\left\{\frac{1}{a(T)}a'(T)a(T)^t\right\}\,dt + e^{-r(T+1)}a(T)^T - c$$
$$= \frac{1}{a(T)}a'(T)\int_0^T te^{-r(t+1)}a(T)^t\,dt + e^{-r(T+1)}a(T)^T - c.$$

Setting this to 0 gives the first-order condition for T. ♦
14.9 Inequality measures

The Lorenz curve is often used as an indicator of the distribution of income in a population. A Lorenz curve L identifies the share of national income belonging to different percentiles of the population, ordered by relative income. In particular, it is often used as an indicator of income inequality. The Lorenz curve is a function attaching a value L(p) to each percentile p. The following discussion defines these curves.

Let F be a distribution function of income in a population: F(y) = proportion of the population with income $\le y$. Assume F is differentiable and let $F'(y) = f(y)$, where f is called the density of the distribution function. Assume $f(y) > 0$ for all y. For $\hat{y} > y$, $\hat{y}$ close to y,

$$F(\hat{y}) - F(y) = \int_y^{\hat{y}} f(s)\,ds \cong f(y)(\hat{y} - y)$$

gives the proportion of the population with income between y and $\hat{y}$.
gives the proportion of the population with income between y and yˆ . Given y, if p = F(y) ≤ 1, then 100p% is the percentage of the population with income ≤ y. Alternatively, given the fraction p, if y is defined by p = F(y) then y is the
level of income such that 100p% have income ≤ y. To each p we can associate a y, y(p) such that 100p% of population have income in the range [0, y(p)]. Thus,
F(y(p)) = p. The Lorenz curve, L(p), gives for each fraction, p, the total income of
the poorest 100p% divided by the total income of the population (Figure 14.14).
The following discussion defines the Lorenz curve, first for the finite population case, and then for a continuous population. Consider first the case where there is a finite number of people in the population. Suppose that the incomes earned in the population are $\{y_1, y_2, \dots, y_k\}$ with $y_i < y_{i+1}$, and $n_i$ people have income $y_i$. The total number of people in the population is $n = \sum_{i=1}^k n_i$, the total number with income no greater than $y_q$ is $\sum_{j=1}^q n_j$, and the fraction of the population with income no larger than $y_q$ is $\frac{\sum_{j=1}^q n_j}{\sum_{j=1}^k n_j}$. The total income of people with income no larger than $y_q$ is $\sum_{i=1}^q y_in_i$. The total income in the population is $\sum_{i=1}^k y_in_i$. So, the fraction of income held by the population members with income no larger than $y_q$ is:

$$L = \frac{\sum_{i=1}^q y_in_i}{\sum_{i=1}^k y_in_i} = \frac{\frac{1}{n}\sum_{i=1}^q y_in_i}{\frac{1}{n}\sum_{i=1}^k y_in_i} = \frac{\sum_{i=1}^q y_i\left(\frac{n_i}{n}\right)}{\sum_{i=1}^k y_i\left(\frac{n_i}{n}\right)} = \frac{\sum_{i=1}^q y_if(y_i)}{\sum_{i=1}^k y_if(y_i)} = \frac{\sum_{y=y_1}^{y_q} yf(y)}{\sum_{y=y_1}^{y_k} yf(y)},$$

where $f(y_i) = \frac{n_i}{n}$, the fraction of the population with income $y_i$.
By analogy with this expression, in the continuous case,

$$L(p) = \frac{\int_0^{y(p)} xf(x)\,dx}{\int_0^{\infty} xf(x)\,dx} = \frac{\left[\int_0^{y(p)} x\frac{f(x)}{F(y(p))}\,dx\right]F(y(p))}{\int_0^{\infty} xf(x)\,dx} = p\,\frac{\mu_p}{\mu},$$

where $\mu_p$ is the mean income of the poorest fraction p of the population. Thus L(p) gives the percentage of total income held by the poorest 100p% of the population. Here $\mu$ is the mean income of the population. To clarify:

$$\int_0^{y(p)} xf(x)\,dx = F(y(p))\int_0^{y(p)} x\frac{f(x)}{F(y(p))}\,dx.$$

Note that $\int_0^{y(p)} \frac{f(x)}{F(y(p))}\,dx = 1$ (since $\int_0^{y(p)} f(x)\,dx = F(y(p))$), and thus $\frac{f(x)}{F(y(p))}$ is a distribution: the distribution of income in the group with income in the range $[0, y(p)]$. If $\hat{x} > x$ and $\hat{x}$ is close to x, $\frac{f(x)(\hat{x} - x)}{F(y(p))} \approx$ the "proportion" of the income group $[0, y(p)]$ whose income is in the range $[x, \hat{x}]$. Therefore, $\int_0^{y(p)} x\frac{f(x)}{F(y(p))}\,dx = \mu_p$, the mean income of the poorest 100p% (i.e., mean income to the pth percentile). Also $F(y(p)) = p$, by definition, so the Lorenz curve may be written $L(p) = p\,\frac{\mu_p}{\mu}$. Since $\mu_p \le \mu$, $L(p) \le p$.

The following calculations show that the Lorenz curve is convex, as depicted in Figure 14.14. We have

$$L(p) = \frac{1}{\mu}\int_0^{y(p)} xf(x)\,dx,$$
[Figure 14.14: The Lorenz curve]
so (using Leibnitz's rule),

$$L'(p) = \frac{1}{\mu}\left[y(p)f(y(p))\right]\frac{dy(p)}{dp}.$$

Now $p = F(y(p))$, so

$$1 = F'(y(p))\frac{dy(p)}{dp} = f(y(p))\frac{dy(p)}{dp} \Rightarrow \frac{dy(p)}{dp} = \frac{1}{f(y(p))}.$$

Thus,

$$L'(p) = \frac{y(p)}{\mu}, \qquad L''(p) = \frac{1}{\mu}\cdot\frac{dy(p)}{dp} = \frac{1}{\mu}\cdot\frac{1}{f(y(p))} > 0.$$
A second measure of inequality is given by the Gini coefficient. This is a single number, G, defined:

$$G = \frac{\frac{1}{2} - \int_0^1 L(p)\,dp}{\frac{1}{2}} = 1 - 2\int_0^1 L(p)\,dp = \frac{A}{A+B}. \qquad (14.6)$$

In Figure 14.14, this is the area A divided by the area A + B. The area A is $\frac{1}{2} - \int_0^1 L(p)\,dp$ (the area under the 45° line and above L(p)), while the area A + B is $\frac{1}{2}$. While the Gini coefficient contains less information than the Lorenz curve, it has the virtue of being a single number, so that Gini coefficients for different countries are comparable.
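For a discrete population, the finite-population formula gives Lorenz curve values at the population percentiles, and $G = 1 - 2\int_0^1 L(p)\,dp$ can be approximated by the trapezoidal rule. The income list below is a hypothetical illustration; this is a sketch of the idea, not an exact Gini estimator for grouped data.

```python
# Lorenz curve and (approximate) Gini coefficient for a small discrete
# population, following the finite-population formula in the text.
incomes = [10, 10, 20, 30, 30, 50, 80, 120]   # hypothetical incomes, sorted
n = len(incomes)
total = sum(incomes)

# L(p) at the population percentiles p = q/n, q = 0, ..., n:
# share of total income held by the poorest q people.
L = [sum(incomes[:q]) / total for q in range(n + 1)]
assert all(L[q] <= q / n + 1e-12 for q in range(n + 1))   # L(p) <= p

# Gini via G = 1 - 2 * ∫_0^1 L(p) dp, trapezoidal rule over the percentiles.
area = sum((L[q] + L[q + 1]) / 2 * (1 / n) for q in range(n))
gini = 1 - 2 * area
print(round(gini, 4))
```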
14.10 Ramsey pricing

A standard rule of thumb for efficient pricing is that price be set equal to marginal cost: output should be increased up to the point where the cost of providing an additional unit is equal to the amount a consumer is willing to pay. Where there are fixed costs of production, this is not viable in the long term, since marginal cost pricing does not cover fixed cost. In the case of a single good, price must be at least as large as average cost for long-run profit to be non-negative. With more than one good, where there are fixed costs and joint production, the problem is more complicated. The following discussion considers efficient pricing (to maximize welfare) where more than one market is supplied, and where a fixed cost must be recovered by pricing above marginal cost.

A firm supplies two markets, X and Y. The variable cost of joint production is c(x, y), fixed cost is F, and inverse demand in the markets is $p_x$ and $p_y$, respectively. At output levels $(x^*, y^*)$, surplus is given by:

$$S(x^*, y^*) = \int_0^{x^*} p_x(x)\,dx + \int_0^{y^*} p_y(y)\,dy - c(x^*, y^*).$$
The problem is to maximize surplus subject to revenue from sales covering cost. For example, $c(x, y) = c_xx + c_yy + F$ is a cost function with constant marginal costs $c_x$ and $c_y$ and a fixed cost F. Thus, prices must be set above marginal cost to cover fixed cost. Define the Lagrangian:

$$L = \int_0^{x^*} p_x(x)\,dx + \int_0^{y^*} p_y(y)\,dy - c(x^*, y^*) + \lambda[p_x(x^*)x^* + p_y(y^*)y^* - c(x^*, y^*) - F].$$
At a solution $(x^*, y^*)$,

$$\frac{\partial L}{\partial x^*} = p_x(x^*) - c_x(x^*, y^*) + \lambda[p_x(x^*) + p'_x(x^*)x^* - c_x(x^*, y^*)] = 0$$
$$\frac{\partial L}{\partial y^*} = p_y(y^*) - c_y(x^*, y^*) + \lambda[p_y(y^*) + p'_y(y^*)y^* - c_y(x^*, y^*)] = 0$$
$$\frac{\partial L}{\partial \lambda} = p_x(x^*)x^* + p_y(y^*)y^* - c(x^*, y^*) - F = 0.$$
Rearranging these equations,

$$[p_x(x^*) - c_x(x^*, y^*)](1 + \lambda) + \lambda p'_x(x^*)x^* = 0$$
$$[p_y(y^*) - c_y(x^*, y^*)](1 + \lambda) + \lambda p'_y(y^*)y^* = 0$$

or

$$[p_x(x^*) - c_x(x^*, y^*)] = -\frac{\lambda}{1+\lambda}p'_x(x^*)x^* = -p'_x(x^*)x^*k$$
$$[p_y(y^*) - c_y(x^*, y^*)] = -\frac{\lambda}{1+\lambda}p'_y(y^*)y^* = -p'_y(y^*)y^*k,$$

where $k = \frac{\lambda}{1+\lambda}$. Dividing by $p_x(x^*)$ and $p_y(y^*)$, respectively, and noting that $-\frac{p'_x(x^*)x^*}{p_x(x^*)} = \frac{1}{\eta_x(x^*)}$,

$$\frac{[p_x(x^*) - c_x(x^*, y^*)]}{p_x(x^*)} = k\frac{1}{\eta_x(x^*)}, \qquad \frac{[p_y(y^*) - c_y(x^*, y^*)]}{p_y(y^*)} = k\frac{1}{\eta_y(y^*)}.$$
Thus, the markup of price in excess of marginal cost in each market should be proportional to the reciprocal of the elasticity in that market (with the same constant of proportionality, k, for all markets).

REMARK 14.1: In this discussion, differentiation involved differentiating with respect to a variable appearing in the range of an integral. This can be done using Leibnitz's rule, but also by direct computation. The derivative of $g(x^*) = \int_0^{x^*} p_x(x)\,dx$ with respect to $x^*$ is $p_x(x^*)$, and this can be established with a simple variational argument:

$$\frac{1}{\Delta}[g(x^* + \Delta) - g(x^*)] = \frac{1}{\Delta}\left[\int_0^{x^*+\Delta} p_x(x)\,dx - \int_0^{x^*} p_x(x)\,dx\right] = \frac{1}{\Delta}\int_{x^*}^{x^*+\Delta} p_x(x)\,dx \approx \frac{1}{\Delta}\int_{x^*}^{x^*+\Delta} p_x(x^*)\,dx,$$

the approximation following with $p_x(x)$ approximately constant on $[x^*, x^* + \Delta]$. Since $\int_{x^*}^{x^*+\Delta} p_x(x^*)\,dx = p_x(x^*)\Delta$,

$$\frac{1}{\Delta}[g(x^* + \Delta) - g(x^*)] \approx \frac{1}{\Delta}p_x(x^*)\Delta = p_x(x^*).$$
EXAMPLE 14.5: Consider $p_x(x) = a - bx$, $p_y(y) = \alpha - \beta y$, and suppose that cost is $c(x, y) = c_xx + c_yy + F$. Since the reciprocal of the elasticity is $\frac{1}{\eta_x} = -\frac{x}{p_x(x)}p'_x(x)$, the formula

$$\frac{[p_x(x^*) - c_x(x^*, y^*)]}{p_x(x^*)} = k\frac{1}{\eta_x(x^*)}$$

becomes

$$\frac{p_x(x^*) - c_x}{p_x(x^*)} = k \times \left[-\frac{x^*}{p_x(x^*)}p'_x(x^*)\right].$$

Cancelling $p_x(x^*)$ and with $p'_x(x) = -b$,

$$a - bx^* - c_x = k \times bx^*.$$

Thus, $(a - c_x) = (1 + k)bx^*$, so that $x^* = \frac{1}{1+k}\frac{a - c_x}{b} = \delta\frac{a - c_x}{b}$, with $\delta = \frac{1}{1+k}$. Let $x_{mc}$ be the quantity at which price is equal to marginal cost: $x_{mc} = \frac{a - c_x}{b}$ and, with similar notation, $y_{mc} = \frac{\alpha - c_y}{\beta}$. Thus, $x^* = \delta x_{mc}$ and $y^* = \delta y_{mc}$, so that relative to marginal cost pricing, both goods are scaled down by the same proportion. Thus, when $x_{mc} = y_{mc}$, $x^* = y^*$. Revenue over marginal cost in market x is:

$$[p_x(x^*) - c_x]x^* = [(a - c_x) - bx^*]x^* = [(a - c_x) - \delta(a - c_x)]x^* = [(1 - \delta)(a - c_x)]x^*,$$

and substituting for $x^*$ gives revenue over marginal cost as $(1 - \delta)\delta\frac{(a - c_x)^2}{b}$. The analogous calculation in market y gives revenue over cost as $(1 - \delta)\delta\frac{(\alpha - c_y)^2}{\beta}$. Therefore, setting total revenue over marginal cost equal to fixed cost gives:

$$(1 - \delta)\delta\left[\frac{(a - c_x)^2}{b} + \frac{(\alpha - c_y)^2}{\beta}\right] = F.$$

This expression (assuming a solution) will have two roots; select the larger root, corresponding to the larger output of x and y. ♦
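The quadratic in $\delta$ can be solved directly. With the larger root in hand, the resulting prices should cover cost exactly. The parameter values below are hypothetical illustrative choices.

```python
import math

# Linear-demand Ramsey example with hypothetical parameters:
# p_x(x) = a - b x, p_y(y) = alpha - beta y, c(x, y) = cx*x + cy*y + F.
a, b, cx = 10.0, 1.0, 2.0
alpha, beta, cy = 8.0, 0.5, 1.0
F = 10.0

S = (a - cx) ** 2 / b + (alpha - cy) ** 2 / beta
# (1 - delta) * delta * S = F  =>  S*delta^2 - S*delta + F = 0
disc = S * S - 4 * S * F
assert disc > 0, "F too large to recover"
delta = (S + math.sqrt(disc)) / (2 * S)   # larger root
x_star = delta * (a - cx) / b
y_star = delta * (alpha - cy) / beta

# Check the constraint: revenue covers cost exactly.
px = a - b * x_star
py = alpha - beta * y_star
budget = px * x_star + py * y_star - (cx * x_star + cy * y_star + F)
assert abs(budget) < 1e-9
print(delta, x_star, y_star)
```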
14.11 Welfare measures

When the price of a good changes, the welfare of a consumer is affected. How should the welfare change be measured? There are a number of answers to this question; here three are considered: consumer surplus, compensating variation and equivalent variation.

14.11.1 Consumer surplus

Maximizing utility $u(x_1, \dots, x_n)$ subject to the budget constraint $\sum p_ix_i = m$ yields demand functions $x_i(p_1, \dots, p_n, m)$, $i = 1, \dots, n$. The program is written:

$$\max u(x_1, \dots, x_n) \quad \text{subject to} \quad \sum_k p_kx_k = m. \qquad (14.7)$$
For ease of notation, write the demand function as x i ( p i , p− i , m) to highlight
“own” price, p i . The optimal choice occurs at the tangency of the budget constraint
and the indifference curve as depicted in Figure 14.15 for the case of two goods, (x i , x j ). A natural question arises: how does a price change affect welfare? For example, if p i increases to p′i , how is the individual’s welfare affected? Considering Figure 14.15, utility has fallen from u0 to u1 so that the loss in welfare might be measured by u0 − u1 . However, utility is an ordinal measure of welfare (u(x1 ,... , x n ) and 5u(x1 ,... , x n ) have exactly the same indifference curves so the distance u0 − u1 is arbitrary). A natural monetary measure of welfare loss uses the consumer surplus associated with demand for good i, and this measure is commonly used. However, a change in the price of one good affects demand for other goods through crossprice effects, so that consumer surplus variations resulting from a change in consumption of good i may not fully describe the impact on welfare. When the price of good x i increases to p′i , demand for good x i decreases from x i (p i , p j , m) to x i (p′i , p j , m) (assuming that the good is not a “Giffen” good). This gives a downward sloping demand curve – as p i increases, x i decreases.
Figure 14.15: Utility maximization and consumer surplus

In Figure 14.15, a price increase from $p_i$ to $p_i'$ moves the budget constraint inward, leading to a tangency at $x_i(p_i', p_j, m)$ on the new budget constraint $\{(x_i, x_j) \mid p_i' x_i + p_j x_j = m\}$. Demand for good $x_i$ falls from $x_i = x_i(p_i, p_j, m)$ to $x_i' = x_i(p_i', p_j, m)$.
CHAPTER 14. INTEGRATION
The welfare cost measured in terms of consumer surplus is given by the area under the demand curve between $p_i$ and $p_i'$:

$$CS = \int_{p_i}^{p_i'} x_i(\tilde p_i, p_j, m)\, d\tilde p_i. \qquad (14.8)$$
The demand function $x_i = x_i(p_i, p_j, m)$ is sometimes referred to as the Marshallian demand function, to distinguish it from the Hicksian (or “compensated”) demand function that is discussed later.
EXAMPLE 14.6: Suppose that $u(x_i, x_j) = a \ln x_i + b \ln x_j$. Optimality involves setting the marginal rate of substitution equal to the price ratio. Thus,

$$\frac{a x_j}{b x_i} = \frac{p_i}{p_j},$$

so that $x_i = \frac{a}{b}\frac{p_j}{p_i} x_j$. Substituting this into the budget constraint $p_i x_i + p_j x_j = m$ gives $\left[\frac{a}{b} p_j + p_j\right] x_j = m$, or $\frac{a+b}{b} p_j x_j = m$. A similar calculation applies for $x_i$, so that

$$x_i(p_i, p_j, m) = \frac{a}{a+b}\,\frac{m}{p_i}, \qquad x_j(p_i, p_j, m) = \frac{b}{a+b}\,\frac{m}{p_j}.$$

So, for example, considering a price rise from $p_i$ to $p_i'$, the reduction in consumer surplus is given by:

$$CS = \int_{p_i}^{p_i'} x_i(\tilde p_i, p_j, m)\, d\tilde p_i = \int_{p_i}^{p_i'} \frac{a}{a+b}\,\frac{m}{\tilde p_i}\, d\tilde p_i = \frac{a}{a+b}\, m\left[\ln(p_i') - \ln(p_i)\right].$$

♦
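The consumer-surplus formula in Example 14.6 can be cross-checked numerically. The sketch below (not from the text; the values of $a$, $b$, $m$ and the prices are illustrative) integrates the Marshallian demand by the trapezoid rule and compares the result with the logarithmic closed form:

```python
import math

def cs_closed_form(a, b, m, p, p_new):
    # CS = (a/(a+b)) * m * [ln(p') - ln(p)]  (Example 14.6)
    return a / (a + b) * m * (math.log(p_new) - math.log(p))

def cs_numeric(a, b, m, p, p_new, steps=100_000):
    # Trapezoidal integration of x_i(p~) = (a/(a+b)) * m / p~ over [p, p']
    h = (p_new - p) / steps
    demand = lambda q: a / (a + b) * m / q
    total = 0.5 * (demand(p) + demand(p_new))
    for k in range(1, steps):
        total += demand(p + k * h)
    return total * h

# Illustrative (hypothetical) parameter values
a, b, m, p, p_new = 1.0, 1.0, 100.0, 2.0, 3.0
print(cs_closed_form(a, b, m, p, p_new))  # ≈ 20.27
print(cs_numeric(a, b, m, p, p_new))      # ≈ 20.27
```

With $a = b$ half of income is spent on good $i$, so the surplus loss is $50\ln(3/2) \approx 20.27$; the two computations agree to several decimal places.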
14.11.2 Compensating variation

Two important alternative measures of welfare variation are compensating and equivalent variation. In this section, compensating variation is discussed in detail. The measure is illustrated in Figure 14.16. Suppose there are two goods, $i$ and $j$, and the initial price–income profile is $(p_i, p_j, m)$. Consumption is at $A$. Suppose the price of good $i$ increases from $p_i$ to $p_i'$. Then, consumption moves
Figure 14.16: Compensating variation
from $A$ to $B$, and the person has a lower level of welfare. Suppose now that the individual is compensated for the price increase with an increase in income – just enough to restore the person's initial level of utility: increase income from $m$ to $m'$. The amount $m' - m$ is exactly the monetary compensation required to offset the welfare-reducing impact of the price increase. After this increase in income $m' - m$, the person ends up consuming at $C$. This is called the compensating variation:

$$CV = \text{compensating variation} = m' - m.$$

The following discussion develops an expression for the compensating variation using the expenditure function and the Hicksian demand curve. Given income $m$ and prices $(p_i, p_{-i})$, utility maximization leads to demand $x^* = x(p_i, p_{-i}, m)$ and utility $u^0 = u(x(p_i, p_{-i}, m))$. Now, consider the problem where $u^0$ is fixed and the task is minimizing the cost of achieving the level of utility $u^0$:

$$\min\ p_1 x_1 + p_2 x_2 \quad \text{subject to} \quad u(x_1, x_2) = u^0. \qquad (14.9)$$

This problem involves moving the budget constraint in as far as possible subject to maintaining tangency with the indifference curve associated with utility level $u^0$. The solution is depicted in Figure 14.17.
Figure 14.17: The optimal cost-minimizing choice

Notice that the solution to this program depends on the prices and the level of utility $u^0$: the demand function for good $i$ is $x_i^h(p_i, p_j, u^0)$. The superscript “h” denotes the fact that this is the Hicksian (or compensated) demand function, with demand dependent on prices and the level of utility, $u^0$. Example 14.7 computes these demand functions in a simple case.
EXAMPLE 14.7: With $u(x_i, x_j) = a \ln x_i + b \ln x_j$, minimizing cost subject to attaining some utility level $u$ requires that the marginal rate of substitution equals the price ratio. Thus, $\frac{a x_j}{b x_i} = \frac{p_i}{p_j}$, or $x_i = \frac{a}{b}\frac{p_j}{p_i} x_j$. The choice of $(x_i, x_j)$ must satisfy $u = a \ln x_i + b \ln x_j = \ln\left(x_i^a x_j^b\right)$, or $e^u = x_i^a x_j^b$. Substituting the value for $x_i$ in terms of $x_j$:

$$e^u = x_i^a x_j^b = \left(\frac{a}{b}\frac{p_j}{p_i}\right)^a x_j^a\, x_j^b = \left(\frac{a}{b}\frac{p_j}{p_i}\right)^a x_j^{a+b}.$$

So, $x_j^{a+b} = \left(\frac{b}{a}\frac{p_i}{p_j}\right)^a e^u$ and $x_i^{a+b} = \left(\frac{a}{b}\frac{p_j}{p_i}\right)^b e^u$. Thus,

$$x_i^h(p_i, p_j, u) = \left[\left(\frac{a}{b}\frac{p_j}{p_i}\right)^b e^u\right]^{\frac{1}{a+b}}, \qquad x_j^h(p_i, p_j, u) = \left[\left(\frac{b}{a}\frac{p_i}{p_j}\right)^a e^u\right]^{\frac{1}{a+b}}.$$

♦
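As a numerical sanity check on these formulas (a sketch, not part of the text; the values of $a$, $b$, the prices and $u$ are illustrative), one can confirm that the bundle $(x_i^h, x_j^h)$ attains utility $u$ exactly and costs no more than other bundles on the same indifference curve:

```python
import math

def hicksian(a, b, pi, pj, u):
    # Hicksian demands x_i^h, x_j^h from Example 14.7
    xi = ((a / b * pj / pi) ** b * math.exp(u)) ** (1 / (a + b))
    xj = ((b / a * pi / pj) ** a * math.exp(u)) ** (1 / (a + b))
    return xi, xj

# Illustrative (hypothetical) parameter values
a, b, pi, pj, u = 2.0, 1.0, 1.5, 2.5, 3.0
xi, xj = hicksian(a, b, pi, pj, u)

# The bundle attains utility u exactly...
assert abs(a * math.log(xi) + b * math.log(xj) - u) < 1e-9

# ...and costs no more than other bundles on the same indifference curve
# (x_j adjusted so that a*ln(x_i) + b*ln(x_j) = u holds)
cost = pi * xi + pj * xj
for xi_alt in [0.5 * xi, 0.9 * xi, 1.1 * xi, 2.0 * xi]:
    xj_alt = math.exp((u - a * math.log(xi_alt)) / b)
    assert pi * xi_alt + pj * xj_alt >= cost - 1e-9
print("Hicksian demands verified")
```

The cost comparison reflects the convexity of the indifference curve: any deviation along it weakly raises expenditure.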
This contrasts with the Marshallian demand curve, $x_i(p_i, p_j, m)$, where demand depends on prices and the level of income and is found by maximizing utility subject to a budget constraint. From Figure 14.17 one can see that minimizing expenditure at given prices defines a budget constraint that is tangent to the indifference curve determined by $u^0$. Conversely, if this budget level is set, utility maximization will give a solution at the same tangency. Thus,

$$x_i(p_i, p_{-i}, m) = x_i^h(p_i, p_j, u^0), \quad \text{for all } i. \qquad (14.10)$$

In words, starting with income $m$ and maximizing utility gives demand $x(p_i, p_{-i}, m)$ and corresponding utility level $u^0$. If this utility level, $u^0$, is then fixed and expenditure minimized subject to attaining this utility level, the resulting expenditure is $m$ and the corresponding (Hicksian) demand is $x_i^h(p_i, p_j, u^0)$. This connection is illustrated in Example 14.9.

Minimizing cost subject to achieving a given utility level leads to the expenditure function. If prices are $(p_i, p_{-i})$ and a utility level $u^0$ is given, write $e(p_i, p_{-i}, u^0)$ for the (minimum) expenditure required to reach that utility level at those prices. If the price of good $i$ rises to, say, $p_i'$, the relevant expenditure becomes $e(p_i', p_{-i}, u^0)$. Thus, to compensate for the price rise in a way that would allow the individual to maintain the same utility level, the individual would require the additional amount $e(p_i', p_{-i}, u^0) - e(p_i, p_{-i}, u^0)$. This is called the compensating variation.

The following discussion shows how the Hicksian demand function can be used to compute the compensating variation. The Lagrangian associated with the cost-minimizing program is:

$$\mathcal{L}(x_i, x_j, \lambda) = p_i x_i + p_j x_j + \lambda\left(u^0 - u(x_i, x_j)\right),$$

with first-order conditions:

$$\mathcal{L}_i = p_i - \lambda u_i(x_i, x_j) = 0, \quad i = 1, 2, \qquad \mathcal{L}_\lambda = u^0 - u(x_i, x_j) = 0,$$

where $u_i = \frac{\partial u}{\partial x_i}$. They determine a solution: $x^h(p_i, p_j, u^0) = (x_i^h(p_i, p_j, u^0), x_j^h(p_i, p_j, u^0))$. Define the expenditure function:

$$e(p_i, p_j, u^0) = p_i\, x_i^h(p_i, p_j, u^0) + p_j\, x_j^h(p_i, p_j, u^0).$$
Note that

$$\mathcal{L}(x_i^h(p_i, p_j, u^0),\, x_j^h(p_i, p_j, u^0),\, \lambda(p_i, p_j, u^0)) = e(p_i, p_j, u^0)$$

for all $(p_i, p_j, u^0)$, since at any $(p_i, p_j, u^0)$, $x^h(p_i, p_j, u^0)$ satisfies $u^0 - u(x^h(p_i, p_j, u^0)) = 0$. Therefore, we may differentiate with respect to prices or utility. In particular:

$$\frac{\partial e}{\partial p_i} = \frac{\partial \mathcal{L}}{\partial p_i} = x_i^h(p_i, p_j, u^0) + \sum_{k=1}^{2} p_k \frac{\partial x_k^h}{\partial p_i} - \lambda \sum_{k=1}^{2}\frac{\partial u}{\partial x_k}\frac{\partial x_k^h}{\partial p_i} = x_i^h(p_i, p_j, u^0) + \sum_{k=1}^{2}\left[p_k - \lambda\frac{\partial u}{\partial x_k}\right]\frac{\partial x_k^h}{\partial p_i}.$$

From the first-order conditions, the second term is zero. Therefore,

$$\frac{\partial e(p_i, p_j, u^0)}{\partial p_i} = x_i^h(p_i, p_j, u^0). \qquad (14.11)$$
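Equation 14.11 (Shephard's lemma) can be illustrated numerically with the log utility of Example 14.7. In this sketch (parameter values are illustrative, not from the text), a central finite difference of the expenditure function is compared with the Hicksian demand:

```python
import math

def hicksian_i(a, b, pi, pj, u):
    return ((a / b * pj / pi) ** b * math.exp(u)) ** (1 / (a + b))

def hicksian_j(a, b, pi, pj, u):
    return ((b / a * pi / pj) ** a * math.exp(u)) ** (1 / (a + b))

def expenditure(a, b, pi, pj, u):
    # e(p_i, p_j, u) = p_i x_i^h + p_j x_j^h
    return pi * hicksian_i(a, b, pi, pj, u) + pj * hicksian_j(a, b, pi, pj, u)

# Illustrative (hypothetical) parameter values
a, b, pi, pj, u = 2.0, 3.0, 1.2, 0.8, 1.0
h = 1e-6
de_dpi = (expenditure(a, b, pi + h, pj, u) - expenditure(a, b, pi - h, pj, u)) / (2 * h)
print(de_dpi, hicksian_i(a, b, pi, pj, u))  # the two values agree
```

The agreement holds to the accuracy of the finite difference, as the derivation above predicts.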
Thus, the derivative of the expenditure function with respect to the price of a good gives the Hicksian demand function for that good.

Suppose that $p_i$ rises to $p_i'$ with $p_j$ remaining constant. If the individual's income were set at $e(p_i, p_j, u^0)$, then optimal consumption would occur at $\bar x$ (see Figure 14.18), yielding the lower level of utility $u'$. How much compensation does the consumer need to be as well off as before (i.e., to maintain utility level $u^0$)? From the definition of the cost function, $e(p_i', p_j, u^0)$ is the amount of money required to achieve the level of utility $u^0$ at prices $(p_i', p_j)$, whereas $e(p_i, p_j, u^0)$ is the amount required at prices $(p_i, p_j)$. Therefore, after the price increase, the individual needs $e(p_i', p_j, u^0) - e(p_i, p_j, u^0)$ to be as well off as before. In Figure 14.18, the impact of the price rise is given by the inner dotted line; the outer dotted line results from the compensating income. Let $(x_i^h, x_j^h) = x^h(p_i, p_j, u^0)$ be the bundle chosen at prices $(p_i, p_j)$ to achieve utility $u^0$, and let $(x_i'^h, x_j'^h) = x^h(p_i', p_j, u^0)$ be the bundle chosen at prices $(p_i', p_j)$ to achieve utility $u^0$. Thus

$$e(p_i, p_j, u^0) = p_i x_i^h + p_j x_j^h, \qquad e(p_i', p_j, u^0) = p_i' x_i'^h + p_j x_j'^h. \qquad (14.12)$$
Figure 14.18: The impact of price changes on quantities

The compensation required to make the individual as well off as before the price increase is then:

$$CV = e(p_i', p_j, u^0) - e(p_i, p_j, u^0). \qquad (14.13)$$
This amount can be given an alternative representation. Recall that $F(b) - F(a) = \int_a^b F'(x)\,dx$. In the present context this can be applied to give:

$$CV = e(p_i', p_j, u^0) - e(p_i, p_j, u^0) = \int_{p_i}^{p_i'}\frac{\partial e}{\partial p_i}\,d\tilde p_i = \int_{p_i}^{p_i'} x_i^h(\tilde p_i, p_j, u^0)\,d\tilde p_i.$$

Figure 14.19 illustrates the calculation.
EXAMPLE 14.8: Let $u(x_1, x_2) = x_1^\alpha x_2^{1-\alpha}$. The Lagrangian for the cost-minimization problem is

$$\mathcal{L}(x_1, x_2, \lambda) = p_1 x_1 + p_2 x_2 + \lambda\left(u^0 - x_1^\alpha x_2^{1-\alpha}\right),$$

with first-order conditions:

$$\mathcal{L}_{x_1} = p_1 - \frac{\lambda\alpha x_1^\alpha x_2^{1-\alpha}}{x_1} = 0, \qquad \mathcal{L}_{x_2} = p_2 - \frac{\lambda(1-\alpha)x_1^\alpha x_2^{1-\alpha}}{x_2} = 0, \qquad \mathcal{L}_\lambda = u^0 - x_1^\alpha x_2^{1-\alpha} = 0. \qquad (14.14)$$
Figure 14.19: Compensating variation
Thus,

$$\frac{p_1}{p_2} = \frac{\alpha}{1-\alpha}\frac{x_2}{x_1} \quad\Rightarrow\quad x_1 = \frac{\alpha}{1-\alpha}\frac{p_2}{p_1}\, x_2.$$

The third equation implies

$$u^0 = x_1^\alpha x_2^{1-\alpha} = \left[\frac{\alpha p_2}{(1-\alpha)p_1}\right]^\alpha x_2 \quad\Rightarrow\quad x_2 = \left[\frac{(1-\alpha)p_1}{\alpha p_2}\right]^\alpha u^0.$$

Solving for $x_1$:

$$x_1 = \frac{\alpha p_2}{(1-\alpha)p_1}\, x_2 = \frac{\alpha p_2}{(1-\alpha)p_1}\left[\frac{(1-\alpha)p_1}{\alpha p_2}\right]^\alpha u^0 = u^0\left[\frac{(1-\alpha)p_1}{\alpha p_2}\right]^{-(1-\alpha)} = u^0\left[\frac{(1-\alpha)}{\alpha p_2}\right]^{-(1-\alpha)} p_1^{-(1-\alpha)}.$$

This is the Hicksian demand for good 1, so

$$CV = \int_{p_1}^{p_1'} x_1(\tilde p_1, p_2, u^0)\, d\tilde p_1 = u^0\left[\frac{(1-\alpha)}{\alpha p_2}\right]^{-(1-\alpha)}\int_{p_1}^{p_1'}\tilde p_1^{-(1-\alpha)}\, d\tilde p_1 = u^0\left[\frac{(1-\alpha)}{\alpha p_2}\right]^{-(1-\alpha)}\frac{1}{\alpha}\left[(p_1')^\alpha - p_1^\alpha\right].$$

♦
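The CV formula of Example 14.8 can be cross-checked against direct numerical integration of the Hicksian demand, a sketch with illustrative (hypothetical) values for $\alpha$, the prices and $u^0$:

```python
def x1_h(alpha, p1, p2, u0):
    # Hicksian demand for good 1 from Example 14.8
    return u0 * (alpha * p2 / ((1 - alpha) * p1)) ** (1 - alpha)

def cv_closed_form(alpha, p1, p1_new, p2, u0):
    # CV = u0 * (alpha p2 / (1-alpha))^(1-alpha) * (p1'^alpha - p1^alpha) / alpha
    return u0 * (alpha * p2 / (1 - alpha)) ** (1 - alpha) * (p1_new ** alpha - p1 ** alpha) / alpha

def cv_numeric(alpha, p1, p1_new, p2, u0, steps=100_000):
    # Trapezoidal integration of x1_h over [p1, p1']
    h = (p1_new - p1) / steps
    total = 0.5 * (x1_h(alpha, p1, p2, u0) + x1_h(alpha, p1_new, p2, u0))
    for k in range(1, steps):
        total += x1_h(alpha, p1 + k * h, p2, u0)
    return total * h

alpha, p1, p1_new, p2, u0 = 0.3, 1.0, 2.0, 1.5, 10.0
print(cv_closed_form(alpha, p1, p1_new, p2, u0))
print(cv_numeric(alpha, p1, p1_new, p2, u0))
```

Both computations return the same positive compensation for the price rise, confirming the antiderivative step.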
14.11.3 Equivalent variation

Where compensating variation asks what monetary compensation is required to “neutralize” a price increase from lowering utility, equivalent variation asks what income reduction would have the same impact on utility as the price increase. As with compensating variation, equivalent variation has a simple expression in terms of the expenditure function. This is explained next.

Suppose that the price of $i$ increases from $p_i$ to $p_i'$ (see Figure 14.20).

Figure 14.20: Equivalent variation

The impact of the price increase is to move consumption from $A$ to $B$. Welfare drops from $u^0$ to $u^1$. Suppose instead that price remains unchanged, but income is reduced to achieve the same loss of welfare. At initial prices, income is reduced from $m$ to $m''$. In this case, the individual ends up consuming at $C'$. The loss of income $m - m''$ has an effect on welfare equivalent to the price increase from $p_i$ to $p_i'$. This is called the equivalent variation:

$$EV = \text{equivalent variation} = m - m''.$$

The following discussion provides detailed calculations for $EV$. Suppose that income is $m$ and prices are $(p_1, \dots, p_n)$. Utility maximization leads to demand $x(p_i, p_{-i}, m)$ and corresponding utility $u^0$. If now, starting from utility $u^0$, the consumption bundle is chosen to minimize expenditure, this is achieved at consumption bundle $x(p_i, p_{-i}, m)$ at a cost of $m$. In terms of the expenditure function, $e(p_i, p_{-i}, u^0) = m$. See Figure 14.21. Now suppose that the price of good
Figure 14.21: Equivalent variation
$i$ increases to $p_i'$. With income unchanged, utility maximization leads to demand $x(p_i', p_{-i}, m)$, at $B$ in Figure 14.21. So,

$$p_i'\, x_i(p_i', p_{-i}, m) + \sum_{k\neq i} p_k\, x_k(p_i', p_{-i}, m) = m. \qquad (14.15)$$
Conversely, with $u^1$ given, at price profile $(p_i', p_{-i})$ expenditure minimization subject to attaining utility level $u^1$ leads to demand at $B$. That is, solving the program:

$$\min\ \Big\{ p_i' x_i + \sum_{j\neq i} p_j x_j \Big\} \quad \text{subject to} \quad u(x) \ge u^1 \qquad (14.16)$$

gives the compensated or Hicksian demand $x^h(p_i', p_{-i}, u^1)$, at $B$ in Figure 14.21, with corresponding expenditure $e(p_i', p_{-i}, u^1)$. So,

$$x_i^h(p_i', p_{-i}, u^1) = x_i(p_i', p_{-i}, m), \quad \text{for all } i. \qquad (14.17)$$
Thus,

$$e(p_i', p_{-i}, u^1) = p_i'\, x_i^h(p_i', p_{-i}, u^1) + \sum_{k\neq i} p_k\, x_k^h(p_i', p_{-i}, u^1) = p_i'\, x_i(p_i', p_{-i}, m) + \sum_{k\neq i} p_k\, x_k(p_i', p_{-i}, m) = m. \qquad (14.18)$$
Next, at the original prices $(p_i, p_{-i})$, expenditure minimization subject to achieving utility level $u^1$ leads to demand at $C$. Solving the program:

$$\min\ \Big\{ p_i x_i + \sum_{j\neq i} p_j x_j \Big\} \quad \text{subject to} \quad u(x) \ge u^1 \qquad (14.19)$$

gives the compensated or Hicksian demand $x^h(p_i, p_{-i}, u^1)$, at $C$ in Figure 14.21, with corresponding expenditure $e(p_i, p_{-i}, u^1)$. The drop in welfare, from $u^0$ to $u^1$, resulting from the price increase from $p_i$ to $p_i'$ generates the same utility loss as would a fall in income from $m$ to $e(p_i, p_{-i}, u^1)$, because both $B$ and $C$ are on the same indifference curve. Since $m = e(p_i', p_{-i}, u^1)$, the income equivalent of the price rise, called the equivalent variation, is given by:

$$EV = e(p_i', p_{-i}, u^1) - e(p_i, p_{-i}, u^1) = \int_{p_i}^{p_i'}\frac{\partial e(\tilde p_i, p_{-i}, u^1)}{\partial p_i}\, d\tilde p_i \qquad (14.20)$$
$$= \int_{p_i}^{p_i'} x_i^h(\tilde p_i, p_{-i}, u^1)\, d\tilde p_i. \qquad (14.21)$$
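For the log utility of Examples 14.6–14.7, the two characterizations of $EV$, the income drop $m - e(p_i, p_{-i}, u^1)$ and the integral (14.21) of the Hicksian demand at $u^1$, can be checked against each other numerically. A sketch with illustrative (hypothetical) parameter values:

```python
import math

# Illustrative values: price of good i rises from 1.0 to 1.5
a, b, m, pi, pi_new, pj = 1.0, 2.0, 90.0, 1.0, 1.5, 2.0

def hicksian_i(p, u):
    # x_i^h from Example 14.7
    return ((a / b * pj / p) ** b * math.exp(u)) ** (1 / (a + b))

def expenditure(p, u):
    xj = ((b / a * p / pj) ** a * math.exp(u)) ** (1 / (a + b))
    return p * hicksian_i(p, u) + pj * xj

# Utility after the price rise, income unchanged (Marshallian demands, Example 14.6)
u1 = a * math.log(a / (a + b) * m / pi_new) + b * math.log(b / (a + b) * m / pj)

# EV two ways: as the income drop m - e(p_i, p_-i, u^1), and via the integral (14.21)
ev_income = m - expenditure(pi, u1)
steps = 100_000
h = (pi_new - pi) / steps
total = 0.5 * (hicksian_i(pi, u1) + hicksian_i(pi_new, u1))
for k in range(1, steps):
    total += hicksian_i(pi + k * h, u1)
ev_integral = total * h
print(ev_income, ev_integral)  # the two values agree
```

The first computation uses $m = e(p_i', p_{-i}, u^1)$ from (14.18); the second is the area under the compensated demand curve at $u^1$.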
14.11.4 Comparison of welfare measures
These welfare measures ($CV$ and $EV$) may be compared using the Hicksian and Marshallian demand curves $x^h(p_i, p_{-i}, u)$ and $x_i(p_i, p_{-i}, m)$. Consider the demand for good $i$, $x_i(p_i, p_{-i}, m)$, evaluated at the price–income profile $(p_i, p_{-i}, m)$. At these prices and income, the level of utility is $u^0$. Conversely, at prices $(p_i, p_{-i})$, the minimum expenditure required to achieve utility $u^0$ is $e(p_i, p_{-i}, u^0) = m$, and this occurs at the consumption bundle $x_i^h(p_i, p_{-i}, u^0)$. As $p_i$ varies to, say, $p_i'$, the value of $e(p_i', p_{-i}, u^0) = m'$ gives the amount of money required to maintain utility $u^0$. At the profile $(p_i', p_{-i}, m')$ the same observations apply: maximizing utility gives Marshallian demand $x_i(p_i', p_{-i}, m')$ and utility $u^0$, while minimizing expenditure subject to achieving utility $u^0$ gives compensated demand $x_i^h(p_i', p_{-i}, u^0)$ and expenditure $m'$. These observations are depicted in Figure 14.22.
Consider the function $x_i(p_i, p_{-i}, e(p_i, p_{-i}, u^0))$ and the impact of a price change on its value. As price $p_i$ changes to $p_i'$, the initial impact is to push consumption toward the point $A$. As income is adjusted through $e(p_i, p_{-i}, u^0)$ to allow the individual to maintain the same utility level, demand moves to $x(p_i', p_j, m') = x^h(p_i', p_{-i}, u^0)$ (at price $p_i'$, $m' = e(p_i', p_j, u^0)$).
Figure 14.22: Demand variation
Notice that

$$x_i^h(p_i, p_{-i}, u^0) = x_i(p_i, p_j, e(p_i, p_j, u^0)) \qquad (14.22)$$

for each value of $p_i$. This expression is an identity in $p_i$ and connects the Hicksian and Marshallian demand curves.
EXAMPLE 14.9: This example illustrates the connection between Marshallian and Hicksian demand functions that appears in Equations 14.10 and 14.22. Recall the Marshallian demands in Example 14.6:

$$x_i(p_i, p_j, m) = \frac{a}{a+b}\frac{m}{p_i} \quad \text{and} \quad x_j(p_i, p_j, m) = \frac{b}{a+b}\frac{m}{p_j}.$$

The corresponding utility attained at this level of income is $u^0$, say:

$$u^0 = a\ln[x_i(p_i, p_j, m)] + b\ln[x_j(p_i, p_j, m)] = \ln\left\{\left[\frac{a}{a+b}\frac{m}{p_i}\right]^a\left[\frac{b}{a+b}\frac{m}{p_j}\right]^b\right\},$$

so that $e^{u^0} = \left[\frac{a}{a+b}\frac{m}{p_i}\right]^a\left[\frac{b}{a+b}\frac{m}{p_j}\right]^b$. Consider:

$$x_i^h(p_i, p_j, u^0) = \left[\left(\frac{a}{b}\frac{p_j}{p_i}\right)^b e^{u^0}\right]^{\frac{1}{a+b}} = \left\{\left(\frac{a}{b}\frac{p_j}{p_i}\right)^b\left[\frac{a}{a+b}\frac{m}{p_i}\right]^a\left[\frac{b}{a+b}\frac{m}{p_j}\right]^b\right\}^{\frac{1}{a+b}}.$$

Inside the braces the factors $p_j^b$ cancel, and $\left(\frac{a}{b}\right)^b\left(\frac{b}{a+b}\right)^b = \left(\frac{a}{a+b}\right)^b$, leaving

$$x_i^h(p_i, p_j, u^0) = \left[\left(\frac{a}{a+b}\right)^{a+b}\left(\frac{m}{p_i}\right)^{a+b}\right]^{\frac{1}{a+b}} = \frac{a}{a+b}\frac{m}{p_i} = x_i(p_i, p_j, m).$$

This gives the equality in Equation 14.10. Conversely, starting with a fixed utility level $u^0$, the corresponding Hicksian demand functions imply the expenditure:

$$m = p_i\, x_i^h(p_i, p_j, u^0) + p_j\, x_j^h(p_i, p_j, u^0) = p_i\left[\left(\frac{a}{b}\frac{p_j}{p_i}\right)^b e^{u^0}\right]^{\frac{1}{a+b}} + p_j\left[\left(\frac{b}{a}\frac{p_i}{p_j}\right)^a e^{u^0}\right]^{\frac{1}{a+b}}.$$

So,

$$m = \left[p_i\left(\frac{a}{b}\right)^{\frac{b}{a+b}}\left(\frac{p_j}{p_i}\right)^{\frac{b}{a+b}} + p_j\left(\frac{b}{a}\right)^{\frac{a}{a+b}}\left(\frac{p_i}{p_j}\right)^{\frac{a}{a+b}}\right]e^{\frac{u^0}{a+b}}.$$

The Marshallian demand is then:

$$x_i(p_i, p_j, m) = \frac{a}{a+b}\frac{m}{p_i} = \frac{a}{a+b}\left(\frac{p_j}{p_i}\right)^{\frac{b}{a+b}}\left[\left(\frac{a}{b}\right)^{\frac{b}{a+b}} + \left(\frac{b}{a}\right)^{\frac{a}{a+b}}\right]e^{\frac{u^0}{a+b}},$$

and since $\left(\frac{b}{a}\right)^{\frac{a}{a+b}} = \left(\frac{a}{b}\right)^{\frac{b}{a+b}}\frac{b}{a}$, the bracketed term equals $\left(\frac{a}{b}\right)^{\frac{b}{a+b}}\frac{a+b}{a}$. Therefore,

$$x_i(p_i, p_j, m) = \left(\frac{a}{b}\right)^{\frac{b}{a+b}}\left(\frac{p_j}{p_i}\right)^{\frac{b}{a+b}}e^{\frac{u^0}{a+b}} = \left[\left(\frac{a}{b}\frac{p_j}{p_i}\right)^b e^{u^0}\right]^{\frac{1}{a+b}} = x_i^h(p_i, p_j, u^0).$$

♦
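The algebra of Example 14.9 can be corroborated numerically (a sketch; the values of $a$, $b$, the prices and $m$ are illustrative): the Marshallian demand, the utility $u^0$ it attains, and the Hicksian demand at $u^0$ coincide, and expenditure at $u^0$ returns $m$:

```python
import math

# Illustrative (hypothetical) parameter values
a, b, pi, pj, m = 2.0, 5.0, 1.3, 0.7, 140.0

# Marshallian demands (Example 14.6) and the utility they attain
xi_m = a / (a + b) * m / pi
xj_m = b / (a + b) * m / pj
u0 = a * math.log(xi_m) + b * math.log(xj_m)

# Hicksian demands at that utility level (Example 14.7)
xi_h = ((a / b * pj / pi) ** b * math.exp(u0)) ** (1 / (a + b))
xj_h = ((b / a * pi / pj) ** a * math.exp(u0)) ** (1 / (a + b))

assert math.isclose(xi_m, xi_h, rel_tol=1e-9)               # Equations 14.10 / 14.22
assert math.isclose(pi * xi_h + pj * xj_h, m, rel_tol=1e-9)  # e(p, u^0) = m
print("duality identities verified")
```

Both identities hold to floating-point accuracy for any positive parameter choice, which is the content of the duality between utility maximization and expenditure minimization.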
This completes the calculations. Returning to Equation 14.22 and differentiating,

$$\frac{\partial x_i^h(p_i, p_{-i}, u^0)}{\partial p_i} = \frac{\partial x_i(p_i, p_j, e(p_i, p_j, u^0))}{\partial p_i} + \frac{\partial x_i(p_i, p_j, e(p_i, p_j, u^0))}{\partial m}\,\frac{\partial e(p_i, p_j, u^0)}{\partial p_i}.$$

Using Equation 14.11,

$$\frac{\partial x_i^h(p_i, p_{-i}, u^0)}{\partial p_i} = \frac{\partial x_i(p_i, p_j, e(p_i, p_j, u^0))}{\partial p_i} + \frac{\partial x_i(p_i, p_j, e(p_i, p_j, u^0))}{\partial m}\, x_i^h(p_i, p_{-i}, u^0), \qquad (14.23)$$

an expression called the Slutsky equation, which connects the slopes of the compensated and uncompensated demand functions. Rearranging the expression,

$$\frac{\partial x_i(p_i, p_j, e(p_i, p_j, u^0))}{\partial p_i} = \frac{\partial x_i^h(p_i, p_{-i}, u^0)}{\partial p_i} - \frac{\partial x_i(p_i, p_j, e(p_i, p_j, u^0))}{\partial m}\, x_i^h(p_i, p_{-i}, u^0). \qquad (14.24)$$
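The Slutsky decomposition (14.23) can be verified by finite differences for the utility function of Examples 14.6–14.7. A sketch with illustrative (hypothetical) parameter values:

```python
import math

# Illustrative parameters for u = a*ln(x_i) + b*ln(x_j)
a, b, pj, u0 = 2.0, 3.0, 1.1, 4.0
h = 1e-6

def x_h(pi, u):
    # Hicksian demand for good i (Example 14.7)
    return ((a / b * pj / pi) ** b * math.exp(u)) ** (1 / (a + b))

def e(pi, u):
    # expenditure function e(p_i, p_j, u)
    xj = ((b / a * pi / pj) ** a * math.exp(u)) ** (1 / (a + b))
    return pi * x_h(pi, u) + pj * xj

def x_m(pi, m):
    # Marshallian demand (Example 14.6)
    return a / (a + b) * m / pi

pi = 0.9
lhs = (x_h(pi + h, u0) - x_h(pi - h, u0)) / (2 * h)                # compensated slope
own = (x_m(pi + h, e(pi, u0)) - x_m(pi - h, e(pi, u0))) / (2 * h)  # own-price effect, income fixed
inc = (x_m(pi, e(pi, u0) + h) - x_m(pi, e(pi, u0) - h)) / (2 * h)  # income effect
rhs = own + inc * x_h(pi, u0)
print(lhs, rhs)  # equal, per the Slutsky equation (14.23)
```

Evaluating at income $m = e(p_i, p_j, u^0)$ ensures the compensated and uncompensated demands pass through the same point, as Equation 14.22 requires.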
For a normal good (demand increases when income increases), $\frac{\partial x_i}{\partial m} > 0$. Also, $\frac{\partial x_i^h}{\partial p_i}$ is negative, since it involves a rotation along the indifference curve. So, for normal goods, $\frac{\partial x_i}{\partial p_i} < 0$.

The study of each of these problems is facilitated through the use of eigenvalues and eigenvectors.
In Section 15.2, eigenvalues and eigenvectors are defined and some of their key properties introduced. Section 15.3 describes two dynamic models, of price and population growth, that illustrate the use of eigenvectors and eigenvalues in the study of the time path of these variables. A key theorem, the Perron–Frobenius theorem, is summarized in Section 15.4. Section 15.5 describes two applications in detail: a simplified model of Internet webpage ranking and a generational model of population evolution. When the underlying matrix is symmetric, the associated eigenvalues and eigenvectors have additional properties, and these features are discussed in Section 15.6. Eigenvalues are found as the roots of a polynomial equation and so may be real or complex; therefore, some discussion of complex numbers in this context is provided in Section 15.7. Section 15.8 considers the diagonalization of a matrix (so that a matrix $A$ may be written in the form $A = P\Lambda P^{-1}$, with $\Lambda$ a diagonal matrix). This key property simplifies the computation of powers of $A$. A number of important properties of eigenvalues are described in Section 15.9.
15.2 Definitions and basic properties

Consider a square $n\times n$ matrix $A$. Given a vector $x$, this is transformed by $A$ to the vector $y = Ax$ by multiplication. In general, $y$ and $x$ will bear no particular relation, but in the special case where $y$ is proportional to $x$, $y = \lambda x$, $x$ is called an eigenvector of the matrix $A$.

EXAMPLE 15.1: Let

$$A = \begin{pmatrix} 4 & 6 \\ 2 & 8 \end{pmatrix}, \quad x_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad x_2 = \begin{pmatrix} -3 \\ 1 \end{pmatrix}, \quad \lambda_1 = 10, \quad \lambda_2 = 2.$$

Notice that

$$A x_1 = \begin{pmatrix} 10 \\ 10 \end{pmatrix} = 10\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad A x_2 = \begin{pmatrix} -6 \\ 2 \end{pmatrix} = 2\begin{pmatrix} -3 \\ 1 \end{pmatrix}.$$
In the first case, applying $A$ to $x_1$ scales $x_1$ up by a factor of 10; applying $A$ to $x_2$ scales $x_2$ up by a factor of 2. For both vectors, applying $A$ has the effect of scaling the vector, but the direction is unchanged. ♦
Let $A$ be a square matrix of order $n$, let $x$ be an $n$-dimensional vector and $\lambda$ a scalar. If $Ax = \lambda x$, $x \neq 0$, then $\lambda$ is called a characteristic root or an eigenvalue of $A$ and $x$ its associated characteristic vector or eigenvector. Formally:

Definition 15.1: Given the square matrix $A$, a nonzero vector $x$ is called an eigenvector of $A$ if there is some number $\lambda$ such that $Ax = \lambda x$. The scalar $\lambda$ is called the eigenvalue associated with $x$ (and may be 0). Different eigenvectors may have the same eigenvalue.

Note that there is not a unique eigenvector associated with an eigenvalue: because $Ax = \lambda x \Leftrightarrow A(cx) = \lambda(cx)$, if $x$ is an eigenvector of $A$, then so is $cx$, where $c$ is any nonzero real number. Conventionally, this is resolved by some normalization. Since $Ax = \lambda x$ implies $(A - \lambda I)x = 0$, if $x \neq 0$, then $(A - \lambda I)$ must be singular, so that $|A - \lambda I| = 0$. The determinantal equation $|A - \lambda I| = 0$ defines an $n$th-order polynomial in $\lambda$ called the characteristic equation or characteristic polynomial.

Definition 15.2: The characteristic polynomial of $A$, $p(\lambda)$, is defined:

$$p(\lambda) = |A - \lambda I|.$$

In the case where $A$ is a $2\times 2$ matrix,

$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \qquad |A - \lambda I| = \begin{vmatrix} a_{11} - \lambda & a_{12} \\ a_{21} & a_{22} - \lambda \end{vmatrix} = (a_{11} - \lambda)(a_{22} - \lambda) - a_{21}a_{12}.$$

So,

$$|A - \lambda I| = a_{11}a_{22} - (a_{11} + a_{22})\lambda + \lambda^2 - a_{21}a_{12} = \lambda^2 - \operatorname{tr}(A)\lambda + \det(A), \qquad (15.1)$$

where $\operatorname{tr}(A)$ is the sum of the diagonal elements of $A$ and $\det(A)$ is the determinant of $A$. From this, in the $2\times 2$ case,

$$\lambda_1, \lambda_2 = \frac{\operatorname{tr}(A) \pm \sqrt{\operatorname{tr}(A)^2 - 4\det(A)}}{2}.$$
EXAMPLE 15.2: Let

$$A = \begin{pmatrix} 4 & 6 \\ 3 & 1 \end{pmatrix}, \qquad A - \lambda I = \begin{pmatrix} 4 & 6 \\ 3 & 1 \end{pmatrix} - \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix} = \begin{pmatrix} 4 - \lambda & 6 \\ 3 & 1 - \lambda \end{pmatrix},$$

so that $|A - \lambda I| = (4 - \lambda)(1 - \lambda) - 18 = \lambda^2 - 5\lambda - 14 = (\lambda - 7)(\lambda + 2)$. So, $\lambda_1 = 7$ and $\lambda_2 = -2$ are the two eigenvalues. Finding the eigenvector $x_2$ associated with, say, $\lambda_2$ requires solving:

$$\begin{pmatrix} 4 - \lambda_2 & 6 \\ 3 & 1 - \lambda_2 \end{pmatrix}\begin{pmatrix} x_{21} \\ x_{22} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \quad\Leftrightarrow\quad \begin{pmatrix} 6 & 6 \\ 3 & 3 \end{pmatrix}\begin{pmatrix} x_{21} \\ x_{22} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$

So, $x_{21} = -x_{22}$ solves this equation. With a normalization, $x_{21} = 1$ gives $x_{22} = -1$ and $x_2 = (1, -1)$. A similar calculation gives $x_1 = (2, 1)$. ♦
REMARK 15.1: Sometimes it is useful to define left eigenvalues and eigenvectors. If $y'A = \lambda y'$ for some $\lambda$ and $y \neq 0$, then $y$ is called a left eigenvector of $A$. With this terminology, $Ax = \lambda x$ defines a right eigenvector–eigenvalue pair. The left eigenvalues are defined as the characteristic roots of the polynomial $|A' - \lambda I|$, and since $|A' - \lambda I| = |A - \lambda I|$ (a matrix and its transpose have the same determinant), the eigenvalues are unchanged: the set of left eigenvalues is the same as the set of right eigenvalues. However, the eigenvectors generally differ. For example:

$$A = \begin{pmatrix} 4 & 8 \\ 4 & 4 \end{pmatrix}, \qquad \lambda_1 = 4(1 + \sqrt 2), \quad \lambda_2 = 4(1 - \sqrt 2),$$

$$\text{left eigenvectors: } y_1 = \begin{pmatrix} \tfrac12\sqrt 2 \\ 1 \end{pmatrix}, \quad y_2 = \begin{pmatrix} -\tfrac12\sqrt 2 \\ 1 \end{pmatrix}; \qquad \text{right eigenvectors: } x_1 = \begin{pmatrix} \sqrt 2 \\ 1 \end{pmatrix}, \quad x_2 = \begin{pmatrix} -\sqrt 2 \\ 1 \end{pmatrix}.$$

However, they satisfy orthogonality conditions. Consider a left eigenvalue–eigenvector pair $(y_i, \lambda_i)$ and a right pair $(x_j, \lambda_j)$. Then

$$y_i'A = \lambda_i y_i' \Rightarrow y_i'Ax_j = \lambda_i y_i'x_j \quad \text{and} \quad Ax_j = \lambda_j x_j \Rightarrow y_i'Ax_j = \lambda_j y_i'x_j.$$

Therefore $\lambda_i y_i'x_j = \lambda_j y_i'x_j$, or $(\lambda_i - \lambda_j)y_i'x_j = 0$, so that $y_i'x_j = 0$ whenever $\lambda_i \neq \lambda_j$. If all the eigenvalues are distinct, then writing $Y$ for a matrix of left eigenvectors, each eigenvector occupying a column of $Y$, and $X$ for the matrix consisting of columns of right eigenvectors, these calculations imply that $Y'X$ is a diagonal matrix.
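The orthogonality claim can be checked numerically for the matrix in this remark. A sketch confirming that $Y'X$ is diagonal (variable names are ours):

```python
import math

s = math.sqrt(2)
# Left eigenvectors y1, y2 and right eigenvectors x1, x2 as columns
Y = [[s / 2, -s / 2],
     [1.0, 1.0]]
X = [[s, -s],
     [1.0, 1.0]]

# (Y'X)[i][j] = y_i . x_j; off-diagonal entries vanish since lambda_1 != lambda_2
YtX = [[sum(Y[k][i] * X[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
assert abs(YtX[0][1]) < 1e-12 and abs(YtX[1][0]) < 1e-12
print(YtX)
```

The off-diagonal entries are $-\tfrac12\sqrt 2\cdot\sqrt 2 + 1 = 0$ up to floating-point rounding, while the diagonal entries are nonzero.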
For $A$ an $n\times n$ matrix, the equation $|A - \lambda I| = 0$ is a polynomial equation with $n$ roots, $\lambda_1, \dots, \lambda_n$. The roots may be real or complex and may be repeated. This polynomial, $p(\lambda)$, may be written:

$$p(\lambda) = a_n\lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0 = a_n(\lambda - \lambda_1)(\lambda - \lambda_2)\cdots(\lambda - \lambda_n).$$

(Take the second form, multiply out the right side and match the coefficients on powers of $\lambda$ to connect the roots to the coefficients of $p(\lambda)$.) For example,

$$p(\lambda) = \lambda^2 - 3\lambda + 2 = (\lambda - \lambda_1)(\lambda - \lambda_2) = \lambda^2 - (\lambda_1 + \lambda_2)\lambda + \lambda_1\lambda_2,$$

so that $\lambda_1 + \lambda_2 = 3$ and $\lambda_1\lambda_2 = 2$, giving $\lambda_1 = 1$, $\lambda_2 = 2$. From the definition, if $A$ is singular, then $\lambda = 0$ is an eigenvalue of $A$. The fact that $|A - \lambda I| = 0$ has $n$ roots follows from the fundamental theorem of algebra, which states:

Theorem 15.1: Let $p(z)$ be an $n$th-order polynomial:

$$p(z) = a_n z^n + a_{n-1}z^{n-1} + \cdots + a_1 z + a_0$$

with $a_n \neq 0$ (and where each $a_i$ may be real or complex). Then $p$ has $n$ roots, $z_1, z_2, \dots, z_n$: $p(z_i) = 0$, $i = 1, \dots, n$.

Note that even if the entries of the matrix are real numbers (or even positive real numbers), eigenvalues and eigenvectors may be complex.
EXAMPLE 15.3: Let

$$A = \begin{pmatrix} 1 & 2 \\ -3 & 2 \end{pmatrix}.$$

Then $|A - \lambda I| = (1 - \lambda)(2 - \lambda) + 6 = \lambda^2 - 3\lambda + 8$. From this, the roots are:

$$\lambda_1, \lambda_2 = \frac{3 \pm \sqrt{9 - 32}}{2} = \frac{3 \pm \sqrt{-23}}{2}.$$

Therefore, the eigenvalues are complex. ♦
REMARK 15.2: In the case of a $3\times 3$ matrix, the characteristic polynomial from the determinant of $A - \lambda I$ suggests some useful features of the function. The polynomial is:

$$p(\lambda) = -\lambda^3 + (a_{11} + a_{22} + a_{33})\lambda^2 + (-a_{11}a_{22} - a_{11}a_{33} - a_{22}a_{33} + a_{23}a_{32} + a_{21}a_{12} + a_{31}a_{13})\lambda$$
$$\quad + (a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} + a_{21}a_{32}a_{13} - a_{21}a_{12}a_{33} + a_{31}a_{12}a_{23} - a_{31}a_{13}a_{22}).$$

Notice that the coefficient on $\lambda^3$ is $(-1)^3$, the coefficient on $\lambda^2$ is $\operatorname{tr}(A)$ and the constant is $|A|$. For the $n$-dimensional case, this becomes:

$$p(\lambda) = (-1)^n\lambda^n + (-1)^{n-1}\operatorname{tr}(A)\lambda^{n-1} + [\text{terms in } \lambda^{n-2}, \lambda^{n-3}, \dots, \lambda] + \det(A).$$

For example, recalling that $p(\lambda) = |A - \lambda I|$, $p(0) = |A| = \det(A)$.
One important observation is that if the eigenvalues are all distinct, then the eigenvectors are linearly independent.

Theorem 15.2: Let $A$ have eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$, all distinct, and let $x_1, x_2, \dots, x_n$ be associated eigenvectors. Then $x_1, x_2, \dots, x_n$ are linearly independent.

PROOF: Consider the case where $A$ is a $3\times 3$ matrix. Suppose that $x_1, x_2, x_3$ are not linearly independent, so there exist $\alpha_1, \alpha_2$ and $\alpha_3$, at least one nonzero, such that

$$\alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3 = 0.$$

Multiplying by $(A - \lambda_2 I)(A - \lambda_3 I)$ gives:

$$(A - \lambda_2 I)(A - \lambda_3 I)[\alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3] = 0.$$

Using the fact that $(A - \lambda_i I)x_j = Ax_j - \lambda_i x_j = \lambda_j x_j - \lambda_i x_j = (\lambda_j - \lambda_i)x_j$ gives:

$$0 = \alpha_1(A - \lambda_2 I)(A - \lambda_3 I)x_1 + \alpha_2(A - \lambda_2 I)(A - \lambda_3 I)x_2 + \alpha_3(A - \lambda_2 I)(A - \lambda_3 I)x_3$$
$$= \alpha_1(\lambda_1 - \lambda_3)(A - \lambda_2 I)x_1 + \alpha_2(\lambda_2 - \lambda_3)(A - \lambda_2 I)x_2 + \alpha_3(\lambda_3 - \lambda_3)(A - \lambda_2 I)x_3$$
$$= \alpha_1(\lambda_1 - \lambda_3)(\lambda_1 - \lambda_2)x_1.$$

Since $x_1 \neq 0$ and $(\lambda_1 - \lambda_3)(\lambda_1 - \lambda_2) \neq 0$, it must be that $\alpha_1 = 0$. Similar computations show that $\alpha_2 = \alpha_3 = 0$. For the $n$-variable case, the calculations generalize directly.
REMARK 15.3: Thus, distinct eigenvalues imply independent eigenvectors. Two real vectors $x$ and $y$ are said to be orthogonal if $x'y = 0$. Although distinct eigenvalues imply that the eigenvectors are independent, they may not be orthogonal. For example, the matrix $A$ below has eigenvectors $v_1$ and $v_2$ and eigenvalues $\lambda_1$, $\lambda_2$. The eigenvectors are linearly independent but not orthogonal, $v_1\cdot v_2 = 9$:

$$A = \begin{pmatrix} 0 & 1 \\ -8 & 6 \end{pmatrix}, \quad v_1 = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad v_2 = \begin{pmatrix} 1 \\ 4 \end{pmatrix}, \quad \lambda_1 = 2, \quad \lambda_2 = 4.$$
REMARK 15.4: Even when the eigenvalues are not distinct, it may be possible to find linearly independent eigenvectors. For example, the identity matrix

$$A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

has characteristic equation $(1 - \lambda)^2 = 0$, so that $\lambda_1 = \lambda_2 = 1$. Nevertheless,

$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \lambda_1\begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \lambda_2\begin{pmatrix} 0 \\ 1 \end{pmatrix},$$

so one can find two linearly independent eigenvectors associated with the common eigenvalue $\lambda_1 = \lambda_2$. However, this is not always possible. Consider the matrix:

$$B = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}.$$

The matrix $B$ has eigenvalues $\lambda_1 = \lambda_2 = 1$. Thus, any eigenvector must satisfy:

$$B\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix},$$

and this requires that $x_1 + x_2 = x_2$, or $x_1 = 0$, so any eigenvector $x = (x_1, x_2)$ has $x_1 = 0$. All eigenvectors of $B$ have the form $x = (x_1, x_2) = (0, z)$.
REMARK 15.5: One interesting property of the characteristic equation $p(\lambda) = 0$ is that the matrix satisfies its own characteristic equation, in the sense that:

$$p(A) = a_n A^n + a_{n-1}A^{n-1} + \cdots + a_1 A + a_0 I = 0.$$

This is called the Cayley–Hamilton theorem. In the case where the eigenvalues are all distinct, this is easy to see. Let $x_i$ be an eigenvector. Then

$$p(A)x_i = (a_n A^n + a_{n-1}A^{n-1} + \cdots + a_1 A + a_0 I)x_i = (a_n\lambda_i^n + a_{n-1}\lambda_i^{n-1} + \cdots + a_1\lambda_i + a_0)x_i = p(\lambda_i)x_i = 0.$$

So, $p(A)x_i = 0$ for each eigenvector $x_i$. Since the eigenvalues are distinct, the eigenvectors are linearly independent, and with the matrix $X = (x_1, \dots, x_n)$, $p(A)X = 0$; inverting $X$, $p(A) = 0$.
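For a $2\times 2$ matrix the theorem says $A^2 - \operatorname{tr}(A)A + \det(A)I = 0$, which is easy to check in code. A sketch using the matrix of Example 15.2 (helper names are ours):

```python
def matmul2(A, B):
    # 2x2 matrix product
    return [[A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]],
            [A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]]]

def cayley_hamilton_residual(A):
    # For 2x2: p(lambda) = lambda^2 - tr(A) lambda + det(A), so
    # p(A) = A^2 - tr(A) A + det(A) I should be the zero matrix
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    A2 = matmul2(A, A)
    I = [[1.0, 0.0], [0.0, 1.0]]
    return [[A2[i][j] - tr * A[i][j] + det * I[i][j] for j in range(2)] for i in range(2)]

A = [[4.0, 6.0], [3.0, 1.0]]  # matrix of Example 15.2
print(cayley_hamilton_residual(A))  # → [[0.0, 0.0], [0.0, 0.0]]
```

Here $A^2 = \begin{pmatrix}34 & 30\\ 15 & 19\end{pmatrix}$, $\operatorname{tr}(A) = 5$ and $\det(A) = -14$, so each entry of the residual cancels exactly.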
Because $Ax_i = \lambda_i x_i$, if $\Lambda$ is the diagonal matrix with the eigenvalues on the diagonal and $X = (x_1, \dots, x_n)$ the matrix of eigenvectors, then $AX = X\Lambda$. If $X$ is nonsingular, then $X^{-1}AX = \Lambda$ and $A = X\Lambda X^{-1}$. In this case, $X$ is said to diagonalize $A$.

Theorem 15.3: A square $n\times n$ matrix $A$ is diagonalizable if and only if it has $n$ linearly independent eigenvectors. A sufficient condition for this is that the eigenvalues of $A$ all be distinct.
15.3 An application: dynamic models

Consider a vector $y_t$ that evolves over time according to the equation:

$$y_t = A y_{t-1}.$$

Then, starting from $t = 1$, this may be written as $y_t = A^t y_0$, so that powers of $A$ are central to describing the evolution over time. One particularly useful application of eigenvalues and eigenvectors is in computing powers of a matrix $A$. Suppose that $A$ has $n$ distinct eigenvalues, $\lambda_1, \dots, \lambda_n$, with eigenvectors $x_1, \dots, x_n$, so that $Ax_i = \lambda_i x_i$ for each $i$. Arranging these $n$ equations in matrix form:

$$AX = A(x_1, x_2, \cdots, x_n) = (x_1, x_2, \cdots, x_n)\Lambda = X\Lambda,$$
where $X$ is the square matrix with $x_i$ in column $i$ and $\Lambda$ is the diagonal matrix with $\lambda_i$ in the $i$th row and column position. Because $X$ consists of linearly independent vectors, $X^{-1}$ exists, so that

$$A = X\Lambda X^{-1} \quad \text{and} \quad X^{-1}AX = \Lambda.$$

Then,

$$A^2 = AA = X\Lambda X^{-1}X\Lambda X^{-1} = X\Lambda^2 X^{-1}, \qquad A^k = X\Lambda^k X^{-1}.$$

So, the behavior of $A^k$ can be considered in terms of $\Lambda^k$. Therefore, if a system evolves according to $y_{t+1} = Ay_t$, then

$$y_t = A^t y_0 = X\Lambda^t X^{-1}y_0.$$
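This identity can be checked directly. The sketch below (matrix and eigen-decomposition taken from Example 15.1; helper names are ours) computes $A^5$ both by repeated multiplication and as $X\Lambda^5 X^{-1}$:

```python
def matmul2(A, B):
    # 2x2 matrix product
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

# A from Example 15.1, with eigenvectors (1,1) and (-3,1), eigenvalues 10 and 2
A = [[4.0, 6.0], [2.0, 8.0]]
X = [[1.0, -3.0], [1.0, 1.0]]
Xinv = [[0.25, 0.75], [-0.25, 0.25]]  # det(X) = 4

def power_direct(A, k):
    P = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(k):
        P = matmul2(P, A)
    return P

def power_diag(k):
    # A^k = X Lambda^k X^{-1}: only the diagonal entries are powered
    Lk = [[10.0 ** k, 0.0], [0.0, 2.0 ** k]]
    return matmul2(matmul2(X, Lk), Xinv)

print(power_direct(A, 5))
print(power_diag(5))  # → [[25024.0, 74976.0], [24992.0, 75008.0]], same as above
```

The diagonalized route replaces $k$ matrix multiplications with two, which is what makes it attractive for the dynamic models that follow.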
EXAMPLE 15.4: Let

$$A = \begin{pmatrix} -\tfrac32 & 3 \\ -\tfrac32 & \tfrac{11}{4} \end{pmatrix}.$$

Then

$$X = \begin{pmatrix} 3 & 4 \\ 2 & 3 \end{pmatrix}, \qquad X^{-1} = \begin{pmatrix} 3 & -4 \\ -2 & 3 \end{pmatrix}, \qquad \Lambda = \begin{pmatrix} \tfrac12 & 0 \\ 0 & \tfrac34 \end{pmatrix}.$$

So, for example, $A^t = X\Lambda^t X^{-1}$, and

$$A^2 = \begin{pmatrix} -\tfrac94 & \tfrac{15}{4} \\ -\tfrac{15}{8} & \tfrac{49}{16} \end{pmatrix}, \qquad A^3 = \begin{pmatrix} -\tfrac94 & \tfrac{57}{16} \\ -\tfrac{57}{32} & \tfrac{179}{64} \end{pmatrix}